Glossary

RLHF (Reinforcement Learning from Human Feedback)

A training technique in which human preference judgments over model outputs are used to fit a reward model, which then guides reinforcement-learning fine-tuning so the model generates more helpful, honest, and harmless responses. (Ch. 2)
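
Below is a minimal sketch, in PyTorch, of the reward-modeling step mentioned above: a pairwise Bradley-Terry loss trains a scalar scorer to rank a human-preferred ("chosen") response above a less-preferred ("rejected") one. The architecture, dimensions, and data are illustrative placeholders, not taken from the source.

```python
import torch
import torch.nn as nn

# Hypothetical toy reward model: maps a response embedding to a scalar score.
# Embedding dimension and layer choice are illustrative assumptions.
class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in data: batch of embeddings for a human-preferred response and a
# rejected response to the same prompt (random tensors for demonstration).
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected),
# which pushes the chosen response's score above the rejected one's.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

In full RLHF, the trained reward model's scores would then serve as the reward signal for a reinforcement-learning algorithm that fine-tunes the language model itself.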
