When your data is too large for pandas (tens or hundreds of millions of rows), PySpark gives you the same DataFrame interface at distributed scale.