- **Activation means** should be near zero (for zero-centered activations like tanh) or around 0.5 times the standard deviation (for ReLU)
- **Activation standard deviations** should be roughly constant across layers (neither growing nor shrinking with depth)
- **Dead fraction** (the fraction of neurons outputting exactly zero) should stay small; a large or growing dead fraction is a sign of dying ReLUs
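
A minimal sketch of how these statistics could be logged, assuming PyTorch: forward hooks record the mean, standard deviation, and dead fraction of each activation layer's output on a batch. The model, the choice of hooking only `nn.ReLU`/`nn.Tanh` modules, and the printing loop are illustrative assumptions, not something prescribed by the text.

```python
import torch
import torch.nn as nn

def attach_activation_monitors(model):
    """Register forward hooks that record per-layer activation statistics."""
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            act = output.detach()
            stats[name] = {
                "mean": act.mean().item(),
                "std": act.std().item(),
                # "Dead" here means the unit outputs exactly zero on this batch.
                "dead_fraction": (act == 0).float().mean().item(),
            }
        return hook

    handles = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.ReLU, nn.Tanh)):
            handles.append(module.register_forward_hook(make_hook(name)))
    return stats, handles

if __name__ == "__main__":
    # Hypothetical toy model just to exercise the hooks.
    model = nn.Sequential(
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, 10),
    )
    stats, handles = attach_activation_monitors(model)
    model(torch.randn(64, 128))  # one forward pass populates `stats`
    for layer, s in stats.items():
        print(f"{layer}: mean={s['mean']:.3f} std={s['std']:.3f} dead={s['dead_fraction']:.2%}")
    for h in handles:
        h.remove()  # detach hooks when monitoring is done
```

In practice these numbers would be logged every few hundred steps rather than printed once, so that drifting standard deviations or a rising dead fraction show up as trends over training.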