a technique in which the old and new models each select some ads for the same query, and the results are compared directly. Interleaving is more statistically efficient than a standard A/B test because each query serves as its own control, which reduces variance, so fewer impressions are needed to detect sma