1D CNNs are much faster than transformers on short sequences because they avoid the $O(n^2)$ self-attention computation and parallelize well across positions. In a recommendation system serving millions of requests per second, latency matters. (2) **Model size** — 1D CNNs have far fewer parameters than transformers.
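The model-size point can be made concrete with a back-of-the-envelope parameter count. This is an illustrative sketch; the hidden size `d` and kernel width `k` below are assumed values, not figures from the text:

```python
# Compare parameters in one 1D conv layer vs. one self-attention block
# at the same model width (assumed values, for illustration only).
d = 256  # hidden size (assumption)
k = 3    # conv kernel width (assumption)

# Conv1d: out_channels * in_channels * kernel_size weights, plus biases.
conv_params = d * d * k + d

# Self-attention: Q, K, V, and output projections, each d x d, plus biases.
attn_params = 4 * (d * d + d)

print(f"conv1d: {conv_params:,}  attention: {attn_params:,}")
```

Note that a single attention block already outweighs a k=3 conv layer, and transformer blocks also carry a feed-forward sublayer (typically another $8d^2$ parameters), widening the gap in practice.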