For understanding generalization:

The hypothesis suggests that what matters is not the total number of parameters but the structure of the connections. This partly explains why overparameterized networks generalize well---they provide a rich search space for finding good subnetworks.