Anthropic HH-RLHF

Description: Human preference data for helpfulness and harmlessness. Each example pairs two model responses with a human label indicating which one was preferred.
Chapters: 16, 17.
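
To make the pairwise structure concrete, here is a minimal Python sketch of one record. It assumes the layout of the public release as commonly distributed: each record has a `chosen` and a `rejected` field, each holding a full dialogue transcript with turns marked by `\n\nHuman:` and `\n\nAssistant:`. The `PreferencePair` class, the `split_final_response` helper, and the example text are hypothetical and written only for illustration; they are not part of the dataset or any library.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed turn marker; verify against the copy of the data you use.
ASSISTANT_TAG = "\n\nAssistant:"


@dataclass(frozen=True)
class PreferencePair:
    """One comparison: two transcripts, one preferred by the annotator."""
    chosen: str    # transcript ending in the preferred response
    rejected: str  # transcript ending in the other response


def split_final_response(transcript: str) -> Tuple[str, str]:
    """Split a transcript into (prompt, final assistant response)."""
    idx = transcript.rfind(ASSISTANT_TAG)
    if idx == -1:
        raise ValueError("no assistant turn found in transcript")
    cut = idx + len(ASSISTANT_TAG)
    return transcript[:cut], transcript[cut:].strip()


# Made-up example record, not taken from the dataset.
example = PreferencePair(
    chosen="\n\nHuman: How do I boil an egg?"
           "\n\nAssistant: Simmer it for about ten minutes.",
    rejected="\n\nHuman: How do I boil an egg?"
             "\n\nAssistant: I don't know.",
)

prompt, preferred = split_final_response(example.chosen)
_, other = split_final_response(example.rejected)
print(preferred)
print(other)
```

Splitting at the last assistant marker recovers the shared prompt and the two competing responses, which is the form most reward-model training code expects.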