- Behavior differs between training and evaluation modes (source of subtle bugs)
- Performance degrades with very small batch sizes (batch statistics become noisy)
- Not ideal for sequence models where batch statistics mix different sequence lengths
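
The first two points can be made concrete with a minimal sketch in plain Python. `TinyBatchNorm` and `batch_mean_spread` are hypothetical names written for illustration, not any library's API. The sketch assumes a single feature, no learnable scale or shift, and an exponential moving average with momentum 0.1 for the running statistics.

```python
import random
import statistics


class TinyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Training mode: normalize with statistics of the current batch.
            mean = statistics.fmean(batch)
            var = statistics.pvariance(batch, mu=mean)
            # Fold the batch statistics into the running estimates.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: normalize with the stored running estimates.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


def batch_mean_spread(batch_size, trials=200, seed=0):
    """Standard deviation of the batch mean across many random batches."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    return statistics.pstdev(means)


bn = TinyBatchNorm()
batch = [10.0, 12.0, 14.0]

train_out = bn(batch)   # uses the batch mean and variance
bn.training = False
eval_out = bn(batch)    # same input, but uses the running estimates

print("train:", train_out)
print("eval: ", eval_out)
print("spread of batch mean, size 2: ", batch_mean_spread(2))
print("spread of batch mean, size 64:", batch_mean_spread(64))
```

The same input yields different outputs in the two modes because the running estimates have seen only one update, which is the kind of mismatch that appears when a model is evaluated without switching modes or after too few training steps. The spread comparison shows why small batches hurt: the batch mean itself fluctuates much more from batch to batch when the batch is tiny.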