Collecting the raw inputs (news articles, social media posts, and market data) that feed into the NLP pipeline.