performing well on new data — is the whole point. Everything else (the splits, the baselines, the complexity tuning) is in service of generalization.