Improves model calibration (predicted probabilities better reflect true likelihoods). - Reduces overfitting, especially for datasets with noisy labels. - Encourages the model to learn more discriminative features for the penultimate layer. - Almost always helps for classification tasks.