the surprising observation that simply adding more layers to a network can *decrease* accuracy, not because of overfitting, but because of optimization difficulties.