all neurons compute identical gradients and never differentiate.
- Biases are almost always initialized to zero.
- Proper initialization keeps activation variance and gradient variance stable across layers.
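The variance-stability point can be checked numerically. The sketch below (a NumPy illustration; the function and variable names are my own, not from any particular library) pushes a random batch through a deep tanh MLP with zero biases and compares a too-small fixed weight scale against Xavier/Glorot scaling, where the per-layer standard deviation is 1/sqrt(fan_in):

```python
import numpy as np

rng = np.random.default_rng(0)

def final_activation_std(weight_std, n_layers=20, width=512):
    """Push a random batch through a deep tanh MLP and return the
    standard deviation of the last layer's activations."""
    x = rng.standard_normal((256, width))
    for _ in range(n_layers):
        fan_in = x.shape[1]
        # Weight scale chosen by the given initialization rule.
        W = rng.standard_normal((fan_in, width)) * weight_std(fan_in)
        b = np.zeros(width)  # biases initialized to zero, as is standard
        x = np.tanh(x @ W + b)
    return float(x.std())

# Too-small fixed scale: activations shrink toward zero layer by layer.
tiny = final_activation_std(lambda fan_in: 0.01)
# Xavier/Glorot scaling keeps the activation variance at a usable level.
xavier = final_activation_std(lambda fan_in: 1.0 / np.sqrt(fan_in))
print(f"tiny init: {tiny:.2e}, Xavier init: {xavier:.3f}")
```

With the fixed 0.01 scale, each layer multiplies the activation variance by a factor well below one, so the signal collapses after a handful of layers; the fan-in-scaled initialization keeps it in a workable range, which is exactly why gradients remain stable as well.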