Glossary

The Pile

Description: An 800GB curated dataset combining 22 high-quality sub-datasets (academic papers, books, code, etc.) designed for LLM pre-training. - License: MIT (compilation); individual components vary. - Chapters: 14, 15.

Learn More

Related Terms