Glossary

tokenizer

The algorithm that converts raw text into the subword units a model processes. Unlike traditional word-level tokenization, modern approaches (e.g., BPE, WordPiece, SentencePiece) operate at the **subword** level, balancing vocabulary size against the ability to represent any text, including rare or unseen words, without out-of-vocabulary failures.
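A minimal sketch of the idea, using greedy longest-match lookup (WordPiece-style) against a hypothetical toy vocabulary; real tokenizers learn their vocabularies from data and handle whitespace and byte-level fallbacks more carefully:

```python
def tokenize(text, vocab, unk="[UNK]"):
    """Greedy longest-match subword tokenization over a toy vocabulary."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Find the longest vocab entry starting at this position.
            end = len(word)
            while end > start and word[start:end] not in vocab:
                end -= 1
            if end == start:          # no match at all: emit the unknown token
                tokens.append(unk)
                start += 1
            else:
                tokens.append(word[start:end])
                start = end
    return tokens

# Hypothetical subword vocabulary, for illustration only.
vocab = {"token", "iz", "er", "s", "un", "break", "able"}
print(tokenize("tokenizers unbreakable", vocab))
# → ['token', 'iz', 'er', 's', 'un', 'break', 'able']
```

Note how words absent from the vocabulary are still representable as sequences of smaller known pieces; this is the trade-off the definition above refers to.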
