1.1 Data Collection

Collect raw domain text from at least two different sources.
- Document each source: URL, license/terms of use, approximate size, collection method.
- Collect at least 50,000 raw text passages (before filtering).
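
The requirements above could be tracked with a small source manifest and an automated check. The following is a minimal sketch, not a prescribed format: `SourceRecord`, `check_collection`, the field names, and the `example.org` URLs are all hypothetical, and it assumes that "approximate size" is recorded as a count of raw passages.

```python
from dataclasses import dataclass

MIN_SOURCES = 2        # "at least two different sources"
MIN_PASSAGES = 50_000  # "at least 50,000 raw text passages (before filtering)"


@dataclass(frozen=True)
class SourceRecord:
    """One manifest entry per source (hypothetical field names)."""
    url: str
    license: str            # license or terms of use
    num_passages: int       # approximate size, assumed to be a raw passage count
    collection_method: str  # e.g. "API export", "manual download"


def check_collection(sources):
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []

    # Requirement: at least two different sources (distinct URLs).
    if len({s.url for s in sources}) < MIN_SOURCES:
        problems.append(f"need at least {MIN_SOURCES} different sources")

    # Requirement: every source documents URL, license, and collection method.
    for s in sources:
        for field in ("url", "license", "collection_method"):
            if not getattr(s, field).strip():
                problems.append(f"source {s.url!r} is missing '{field}'")

    # Requirement: at least 50,000 raw passages in total, before filtering.
    total = sum(s.num_passages for s in sources)
    if total < MIN_PASSAGES:
        problems.append(f"only {total} raw passages, need {MIN_PASSAGES}")

    return problems


# Placeholder data for illustration only; these are not real sources.
example_sources = [
    SourceRecord("https://example.org/corpus-a", "CC BY 4.0", 30_000, "API export"),
    SourceRecord("https://example.org/corpus-b", "site terms of use", 25_000, "manual download"),
]
problems = check_collection(example_sources)
```

Running the check before any filtering step keeps the 50,000-passage count tied to the raw collection, as the requirement specifies.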
