Glossary

After alignment, evaluate the model on:

- Safety benchmarks: TruthfulQA, HarmBench, BBQ
- Capability benchmarks: MMLU, HumanEval, GSM8K (to detect capability regression)
- User satisfaction: if in production, track thumbs up/down ratings
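
As a rough illustration of the capability-regression check, the sketch below compares benchmark scores recorded before and after alignment and flags any benchmark that dropped by more than a tolerance. The function names (`find_regressions`, `thumbs_up_rate`), the tolerance, and all score values are hypothetical placeholders, not measured results; real scores would come from whatever evaluation harness is in use.

```python
from typing import Dict, List


def find_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return names of benchmarks whose score fell by more than `tolerance`."""
    regressed = []
    for name, old_score in before.items():
        new_score = after.get(name)
        if new_score is None:
            # Benchmark was not re-run after alignment; nothing to compare.
            continue
        if old_score - new_score > tolerance:
            regressed.append(name)
    return sorted(regressed)


def thumbs_up_rate(up: int, down: int) -> float:
    """Fraction of ratings that were thumbs up (0.0 if there are no ratings)."""
    total = up + down
    return up / total if total else 0.0


# Placeholder scores for illustration only (assumed, not real measurements).
before_scores = {"MMLU": 0.70, "HumanEval": 0.40, "GSM8K": 0.55}
after_scores = {"MMLU": 0.695, "HumanEval": 0.33, "GSM8K": 0.56}

regressions = find_regressions(before_scores, after_scores)
print("Regressed benchmarks:", regressions)
print("Thumbs-up rate:", thumbs_up_rate(up=90, down=10))
```

The tolerance is a judgment call: benchmark scores vary slightly between runs, so a small drop may be noise rather than a real regression.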
