Focuses on building and maintaining the infrastructure that makes data available
- Primary tools: SQL, Python, cloud platforms (AWS, GCP), Apache Spark, Airflow
- Outputs: data pipelines, warehouses, ETL systems
- Typical question: "How do we move 50 million rows of transaction data from production