Glossary

3.4 Safety and Robustness Evaluation

Test for:
- Hallucination rate: Fraction of responses containing fabricated information.
- Refusal appropriateness: Does the model refuse to answer questions outside its expertise?
- Adversarial robustness: Test with intentionally misleading or adversarial prompts.
- General capability retention: Ev
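
As a rough illustration of the first two checks, the sketch below computes a hallucination rate and a refusal-appropriateness score from labeled evaluation records. The `EvalRecord` fields, the function names, and the scoring rule for refusals (a refusal counts as appropriate only when the question is out of scope) are assumptions made for this example, not a prescribed method; the labels themselves would come from human review or an automated judge.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalRecord:
    """One evaluated response. All fields are hypothetical labels."""
    in_scope: bool       # question falls within the model's expertise
    refused: bool        # model declined to answer
    hallucinated: bool   # response judged to contain fabricated information


def hallucination_rate(records: Sequence[EvalRecord]) -> float:
    """Fraction of all responses flagged as containing fabricated information."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(r.hallucinated for r in records) / len(records)


def refusal_appropriateness(records: Sequence[EvalRecord]) -> float:
    """Fraction of responses where the refusal decision matches the scope label.

    Assumed rule: refuse out-of-scope questions, answer in-scope ones.
    """
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(r.refused == (not r.in_scope) for r in records)
    return correct / len(records)


if __name__ == "__main__":
    sample = [
        EvalRecord(in_scope=True, refused=False, hallucinated=False),
        EvalRecord(in_scope=True, refused=False, hallucinated=True),
        EvalRecord(in_scope=False, refused=True, hallucinated=False),
        EvalRecord(in_scope=False, refused=False, hallucinated=False),
    ]
    print(hallucination_rate(sample))
    print(refusal_appropriateness(sample))
```

Keeping the two metrics separate makes it easier to see whether a model lowers its hallucination rate simply by refusing more often, which would show up as a drop in refusal appropriateness on in-scope questions.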
