4. Evaluation

Measure whether the model actually works. For classification, the metrics from Chapter 16 apply directly. For topic modeling, use coherence scores. For sentiment analysis, combine accuracy with a qualitative review of misclassified examples.
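
For the sentiment case, pairing accuracy with a manual review of the errors can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `evaluate_sentiment`, the label strings, and the toy data are all assumptions made for the example.

```python
def evaluate_sentiment(texts, y_true, y_pred):
    """Return accuracy plus the misclassified examples for manual review.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if not (len(texts) == len(y_true) == len(y_pred)):
        raise ValueError("texts, y_true and y_pred must have the same length")
    if not texts:
        raise ValueError("need at least one example")

    # Keep every example where the prediction disagrees with the label.
    misclassified = [
        {"text": text, "expected": expected, "predicted": predicted}
        for text, expected, predicted in zip(texts, y_true, y_pred)
        if expected != predicted
    ]
    accuracy = 1 - len(misclassified) / len(texts)
    return accuracy, misclassified


# Toy data, invented for illustration.
texts = [
    "Loved every minute of it.",
    "A complete waste of time.",
    "Not bad, but I would not watch it again.",
    "Surprisingly good.",
]
y_true = ["pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "pos"]

accuracy, errors = evaluate_sentiment(texts, y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for error in errors:
    print(f"expected {error['expected']}, got {error['predicted']}: {error['text']}")
```

Reading the misclassified texts often shows patterns that the accuracy figure hides, such as negation or mixed sentiment, as in the third toy example.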