it computes pairwise similarities and weighted averages without any notion of position or order. If the input sequence is permuted, the outputs are the same permutation of the original outputs (permutation equivariance). Without positional encoding, the transformer treats "the dog bit the man" ident