Glossary

HumanEval

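A **benchmark** of 164 hand-written Python programming problems, released by OpenAI in 2021, used to evaluate AI code generation models. Each problem supplies a function signature and docstring, and a generated completion counts as correct only if it passes the problem's unit tests. Results are reported as **pass@k**: the probability that at least one of k generated samples for a problem passes. (Ch. 3, Appendix A)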
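Sampling exactly k completions and checking them gives a noisy pass@k. The usual approach instead draws n ≥ k samples per problem, counts the c that pass, and applies the unbiased estimator from the HumanEval paper (Chen et al., 2021): pass@k = 1 − C(n−c, k) / C(n, k). The minimal sketch below assumes per-problem `(n, c)` counts are already available. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and do not come from any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative helper).

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: sample budget being scored (must satisfy k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset holds only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark-level score: average per-problem pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 200 samples and 30 passing, `pass_at_k(200, 30, 1)` returns about 0.15, which is simply c/n when k = 1. Using exact integer binomials keeps the sketch short. For very large n, the product form used in the paper avoids building huge intermediate integers.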
