Glossary

C4 (Colossal Clean Crawled Corpus)

- Description: A cleaned version of Common Crawl used to train the T5 model. Approximately 750 GB of text.
- Access: `datasets.load_dataset("c4", "en")`
- Chapters: 14, 15.
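Because the corpus is far too large to download casually, a streaming read is often the practical way to inspect it. The following is a minimal sketch, not a definitive recipe. It assumes a recent release of the Hugging Face `datasets` library, in which the corpus is published under the `allenai/c4` repository id rather than the bare `c4` id shown in the access call above; if your installed version still accepts `"c4"`, pass that instead. The helper names `open_c4_stream` and `first_texts` are hypothetical and exist only for this illustration.

```python
from itertools import islice
from typing import Iterable


def open_c4_stream(repo_id: str = "allenai/c4", config: str = "en"):
    """Open the C4 train split lazily (needs network access and `datasets`)."""
    # Imported here so first_texts() below works without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over records without downloading the full corpus.
    # The repo id is an assumption; older library versions used "c4".
    return load_dataset(repo_id, config, split="train", streaming=True)


def first_texts(records: Iterable[dict], n: int) -> list[str]:
    """Return the `text` field of the first n records."""
    return [record["text"] for record in islice(records, n)]


if __name__ == "__main__":
    # Print the first 80 characters of three documents.
    for text in first_texts(open_c4_stream(), 3):
        print(text[:80].replace("\n", " "))
```

Keeping the network call in its own function means the record-handling logic can be tried on any iterable of dictionaries with a `text` key before touching the real corpus.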

Learn More

Related Terms