convert the cleaned tokens into a numerical matrix. Bag-of-Words counts how many times each word appears in each document. TF-IDF reweights those counts, down-weighting words that appear in most documents (and thus carry little information). The result is a document-term matrix where each row is a document and each column is a