Manually categorize the errors from the test set into types:

- Factual errors (wrong information).
- Incomplete answers (missing key details).
- Hallucinations (fabricated information).
- Format errors (wrong structure or style).
- Refusal errors (refusing to answer a legitimate question).

- For each error type, p