Learning Objectives
- Explain Streamlit's execution model: top-to-bottom re-run on every interaction
- Build a Streamlit app structure: page config, sidebar, main panel, columns, tabs, expanders
- Add interactive widgets: slider, selectbox, multiselect, date_input, file_uploader, text_input
- Embed matplotlib, seaborn, Plotly, and Altair charts with st.pyplot, st.plotly_chart, st.altair_chart
- Implement caching with @st.cache_data and @st.cache_resource
- Deploy a Streamlit app to Streamlit Community Cloud
- Handle state management with st.session_state for multi-page apps and persistent selections
In This Chapter
- 29.1 From Individual Charts to Interactive Applications
- 29.2 Streamlit's Execution Model
- 29.3 The Minimal Streamlit App
- 29.4 Layout: Sidebar, Columns, Tabs, Expanders
- 29.5 Widgets: The Interactive Layer
- 29.6 Embedding Charts
- 29.7 Caching: Making Re-Runs Fast
- 29.8 Session State for Persistent Values
- 29.9 Multi-Page Applications
- 29.10 Deployment: Streamlit Community Cloud
- 29.11 Streamlit Pitfalls
- 29.12 Progressive Project: Climate Dashboard
- 29.13 Forms for Batched Input
- 29.14 Data Editing with st.data_editor
- 29.15 Custom Components and Theming
- 29.16 When to Use Streamlit vs. Alternatives
- 29.17 Streamlit in Production
- 29.18 A Design Pattern: Filter → Transform → Display
- 29.19 Reactive Dependencies Made Explicit
- 29.20 Streamlit for Prototyping ML Models
- 29.21 Streamlit vs. Notebooks
- 29.22 A Note on Update Frequencies
- 29.23 Testing and Debugging Streamlit Apps
- 29.24 Check Your Understanding
- 29.25 Chapter Summary
- 29.26 Spaced Review
Chapter 29: Building Dashboards with Streamlit
"The best framework is the one that lets you ship the dashboard the same day the boss asks for it." — unattributed Python data engineering proverb
29.1 From Individual Charts to Interactive Applications
For 28 chapters, we have been building individual charts. A scatter plot in matplotlib. A bubble chart in Plotly. A network diagram in NetworkX. Each chart answers a specific question and lives in isolation — in a notebook, in a paper, in a slide deck. The skills are transferable but the outputs are discrete.
Real work often requires something different: a dashboard. A dashboard is a collection of charts and widgets arranged on a page, responding to user input, updating in real time. A stakeholder opens a dashboard to explore data interactively, filter to specific views, and drill down on details. The dashboard is not one chart but a workspace — an environment where several charts work together to answer a range of questions.
Building a dashboard in Python used to require web development skills. You would write HTML, CSS, and JavaScript; set up a Flask or Django backend; handle HTTP routes and form submissions; deploy to a web server. The barrier to entry was high enough that many data scientists stopped at the notebook stage. The dashboard was someone else's job — a web developer, a BI tool like Tableau or Power BI.
Streamlit, released in 2019 by a small startup acquired by Snowflake in 2022, changed this. Streamlit lets you build interactive dashboards in pure Python, no web development required. You write a Python script using Streamlit's API (st.title, st.sidebar, st.plotly_chart, etc.), run streamlit run app.py from the command line, and a dashboard appears in your browser. The same code runs locally for development and deploys to Streamlit Community Cloud (or another hosting provider) for production. The barrier to entry dropped from "several weeks of web development" to "an afternoon of Python scripting."
This chapter introduces Streamlit as the first dashboard tool in Part VII. The chapter's threshold concept — apps are scripts — is the key insight for understanding how Streamlit works. A Streamlit app is not a traditional web application with event handlers and component lifecycles. It is a Python script that re-runs from top to bottom on every interaction. Widgets return their current values as Python objects; the script reads the values and produces output; Streamlit displays the output. When the user changes a widget, the entire script re-runs with the new value. Once you internalize this, all of Streamlit's behavior becomes predictable.
Chapter 30 introduces Dash, a different dashboard framework with a different execution model. The comparison between Streamlit and Dash is informative — each has strengths and weaknesses, and the right choice depends on your use case. For most starting projects, Streamlit is the faster path.
29.2 Streamlit's Execution Model
The single most important thing to understand about Streamlit is the re-run model. When you load a Streamlit app, the Python script runs from top to bottom. Every line executes, every st.* call produces output, every widget renders. When you interact with the app — click a button, drag a slider, select from a dropdown — the entire script re-runs from top to bottom, with the widget now returning its new value.
This sounds wasteful, and in a naive implementation it would be. But Streamlit mitigates the cost with caching: expensive computations are wrapped in cached functions, so their bodies do not re-execute on every interaction — only the uncached parts of the script actually redo work. The re-run is the default behavior; caching is what makes it fast.
The model has profound implications for how you structure code:
1. No event handlers. Traditional web frameworks use callbacks: "when this button is clicked, call this function." Streamlit does not have this concept. Instead, the button's state (clicked or not) is a Python value returned by st.button(...), and you react to that value with a regular if statement:
if st.button("Calculate"):
    result = expensive_calculation()
    st.write(result)
When the user clicks the button, the script re-runs, and this time st.button returns True, so the if block executes.
2. No component lifecycle. React and similar frameworks have "mount," "update," and "unmount" phases. Streamlit has none of these. The script just runs from top to bottom. The only "lifecycle" is the re-run itself.
3. State comes from widgets and session_state. Between re-runs, the widget values are remembered (so a slider keeps its position after a re-run), and anything you store in st.session_state persists. Local variables in the script do not persist — they are recomputed on every re-run.
4. Top-to-bottom ordering matters. The script runs in the order you wrote it. If you compute a value on line 10 and use it on line 20, the computation happens before the use. This is trivial for linear scripts but important to remember when you have conditional logic.
For most Streamlit apps, this model is liberating. You write a script the way you would write a notebook cell — top to bottom, with computations producing results and Streamlit commands rendering them. The framework handles all the web plumbing. The mental overhead compared to a traditional web framework is dramatically lower.
29.3 The Minimal Streamlit App
The smallest Streamlit app is three lines:
# app.py
import streamlit as st
st.title("My First App")
st.write("Hello, world!")
Save as app.py and run streamlit run app.py from the command line. A browser tab opens showing a page with a title and "Hello, world!" text. The app is live — it responds to URL parameters, offers to re-run when you edit the file (via the "Rerun" / "Always rerun" prompt), and can be interacted with through any widgets you add.
st.write is the most versatile output function. It accepts almost anything — strings, numbers, pandas DataFrames, matplotlib figures, Plotly figures, Altair charts, dicts, JSON, markdown — and renders it appropriately. For quick dashboards, st.write(x) is often the right call. For more control, specific functions exist: st.dataframe(df) for interactive DataFrames, st.pyplot(fig) for matplotlib, st.plotly_chart(fig) for Plotly, and so on.
A slightly more interesting app loads some data and displays it:
import streamlit as st
import pandas as pd
st.title("Iris Dataset Explorer")
st.write("A quick look at the classic iris dataset.")
df = pd.read_csv("iris.csv")
st.dataframe(df)
st.write(f"Dataset has {len(df)} rows and {len(df.columns)} columns.")
Save, run, and you have an app that loads a CSV and displays it with an interactive table. The table supports sorting, filtering, and scrolling — features that Streamlit provides for free when you use st.dataframe instead of st.write.
29.4 Layout: Sidebar, Columns, Tabs, Expanders
Streamlit provides several layout primitives for organizing content on the page.
st.sidebar is the most common. Anything you write inside with st.sidebar: (or by calling st.sidebar.whatever(...)) appears in a left-hand sidebar that is separate from the main content area. Sidebars are typically used for controls (widgets) while the main area shows the results.
with st.sidebar:
    st.header("Filters")
    year_range = st.slider("Year range", 2000, 2024, (2010, 2024))
    category = st.selectbox("Category", ["All", "A", "B", "C"])
st.columns divides the main area into horizontal columns:
col1, col2, col3 = st.columns(3)
col1.metric("Revenue", "$1.2M", "+15%")
col2.metric("Users", "45,678", "+7%")
col3.metric("Conversion", "3.4%", "-0.2%")
st.columns(3) creates three equal-width columns. You can also pass a list of widths: st.columns([2, 1, 1]) creates a column twice as wide as the other two.
st.tabs creates tabbed content:
tab1, tab2, tab3 = st.tabs(["Overview", "Details", "Raw Data"])
with tab1:
    st.write("Summary charts here")
with tab2:
    st.write("Detailed analysis here")
with tab3:
    st.dataframe(df)
Tabs are useful for organizing dashboards with multiple views — users click to switch between sections without cluttering a single page.
st.expander hides content behind a collapsible section:
with st.expander("Advanced options"):
    threshold = st.slider("Threshold", 0.0, 1.0, 0.5)
    normalize = st.checkbox("Normalize")
Expanders are useful for optional or secondary controls — the user can expand them when needed and ignore them otherwise.
st.container groups content into a logical block, useful for styling or conditional display.
Combining these primitives produces typical dashboard layouts: sidebar for filters, multi-column KPI cards at the top, tabs for different analytical views, expanders for advanced options. The whole layout is defined in Python with no HTML or CSS.
29.5 Widgets: The Interactive Layer
Streamlit provides a rich set of widgets for user input. Each widget returns its current value when called:
Basic widgets:
name = st.text_input("Your name")
age = st.number_input("Age", min_value=0, max_value=150, value=30)
agree = st.checkbox("I agree")
choice = st.radio("Pick one", ["A", "B", "C"])
option = st.selectbox("Dropdown", ["Option 1", "Option 2", "Option 3"])
Numeric widgets:
value = st.slider("Value", 0, 100, 50)
range_val = st.slider("Range", 0, 100, (25, 75)) # tuple = range slider
Date and time:
import datetime
start_date = st.date_input("Start date", value=datetime.date(2020, 1, 1))
end_time = st.time_input("End time")
Multi-select:
selected = st.multiselect("Choose categories", ["A", "B", "C", "D"], default=["A"])
File upload:
file = st.file_uploader("Upload CSV", type="csv")
if file is not None:
    df = pd.read_csv(file)
    st.dataframe(df)
Buttons and actions:
if st.button("Run analysis"):
    result = expensive_analysis()
    st.write(result)
Each widget takes a label (the visible text) plus options specific to its type. The return value is what the user has selected or entered. You use it immediately in the same script.
A key idiom: the widget value flows directly into the data filtering or chart construction. A slider controls a date range; you filter the DataFrame by the range; you plot the filtered data. The whole thing is linear:
date_range = st.slider("Date range", min_date, max_date, (min_date, max_date))
filtered_df = df[(df["date"] >= date_range[0]) & (df["date"] <= date_range[1])]
fig = px.line(filtered_df, x="date", y="value")
st.plotly_chart(fig)
When the user moves the slider, the script re-runs, date_range has a new value, filtered_df is recomputed, the chart is rebuilt, and the updated chart appears. No explicit event handlers; no state management beyond the widget itself.
29.6 Embedding Charts
Streamlit supports every major Python plotting library natively:
matplotlib and seaborn:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(x, y)
st.pyplot(fig)
Plotly:
import plotly.express as px
fig = px.scatter(df, x="x", y="y")
st.plotly_chart(fig, use_container_width=True)
The use_container_width=True makes the chart responsive to the container width, which is usually what you want.
Altair:
import altair as alt
chart = alt.Chart(df).mark_point().encode(x="x:Q", y="y:Q")
st.altair_chart(chart, use_container_width=True)
Bokeh:
from bokeh.plotting import figure
p = figure()
p.line(x, y)
st.bokeh_chart(p, use_container_width=True)
Interactive charts (Plotly, Altair) remain interactive inside Streamlit — hover, zoom, and selection work normally. Static charts (matplotlib) are rendered as images and cannot be interacted with.
Streamlit also has built-in chart functions: st.line_chart(df), st.bar_chart(df), st.area_chart(df), st.scatter_chart(df), st.map(df). These produce simple charts with minimal customization. They are convenient for quick dashboards but lack the flexibility of dedicated libraries. For anything more than a trivial line chart, use Plotly or Altair.
29.7 Caching: Making Re-Runs Fast
The re-run model means every interaction causes the entire script to re-execute. For fast operations (widget rendering, simple chart construction), this is fine. For slow operations (loading a large CSV, training a model, making an API call), re-running on every interaction is unacceptable.
Streamlit solves this with the @st.cache_data and @st.cache_resource decorators. These cache the output of a function based on its input arguments — subsequent calls with the same arguments return the cached result without re-executing.
@st.cache_data for data (DataFrames, lists, dicts, most data types):
@st.cache_data
def load_data():
    df = pd.read_csv("large_file.csv")
    df["date"] = pd.to_datetime(df["date"])
    return df
df = load_data() # first call: slow, reads file
df = load_data() # subsequent calls: instant, returns cache
The first call runs the function normally and stores the result in the cache. Subsequent calls with no arguments return the cached value immediately. If the function takes arguments, each distinct combination of arguments gets its own cache entry.
@st.cache_resource for resources that cannot be serialized (database connections, ML models, large in-memory objects):
@st.cache_resource
def get_model():
    return load_huge_model("model.pkl")
model = get_model() # first call: slow, loads model
model = get_model() # subsequent: returns the same object
The difference between cache_data and cache_resource: cache_data deep-copies the data to prevent accidental mutation (safer for DataFrames), while cache_resource returns the same object reference (necessary for objects with internal state like connections).
Cache invalidation: pass ttl=60 (time-to-live in seconds) to expire the cache periodically, or use the "Clear cache" button in Streamlit's menu. For a dashboard connected to live data, you typically set ttl to a few minutes so the cache refreshes with new data.
Caching is usually the single most impactful performance optimization in a Streamlit app. A dashboard that takes 30 seconds to load on every interaction becomes instantly responsive once the data loading is cached. Always cache your data loading and any expensive computations.
29.8 Session State for Persistent Values
Local variables in a Streamlit script do not persist across re-runs. A counter defined as counter = 0 is reset to 0 on every re-run. For state that needs to persist, Streamlit provides st.session_state — a dictionary that lives for the duration of the user's session.
if "counter" not in st.session_state:
    st.session_state.counter = 0

if st.button("Increment"):
    st.session_state.counter += 1
st.write(f"Counter: {st.session_state.counter}")
On the first run, the counter is initialized to 0. Each time the button is clicked, the script re-runs, the button returns True, and the counter is incremented. The value persists because it lives in session_state, not in a local variable.
Common uses of session_state:
- Multi-page apps: persist the user's selections across pages.
- Conditional workflows: track which step of a multi-step process the user is on.
- Accumulated state: shopping carts, selected items, history of user actions.
- Form submissions: hold form values until the user clicks Submit.
Each widget also has a key parameter that, if set, links the widget's value to session_state under that key. This lets you set default values programmatically or clear widget values on demand.
if st.button("Reset"):
    st.session_state["date_range"] = (default_start, default_end)
date_range = st.slider("Date range", min_date, max_date, key="date_range")
Session state is how Streamlit handles the tension between the re-run model (which discards local state) and the need for stateful interactions (which require persistence). Use it deliberately — for most simple dashboards, you do not need it, but for anything with multi-step workflows or persistent user choices, it is essential.
29.9 Multi-Page Applications
Streamlit supports multi-page applications through a pages directory. Create a folder called pages next to your main app file, and add Python files inside it — each file becomes a page accessible from the sidebar navigation.
my_app/
    app.py               # main page
    pages/
        1_analysis.py    # second page (labeled "Analysis")
        2_settings.py    # third page (labeled "Settings")
The file names determine the page order (alphabetical) and the page labels (the filename minus the numeric prefix and extension). Each file is a standalone Streamlit script that can use the same Python modules, the same session_state, and the same caching as the main app.
For more control, newer Streamlit releases provide the st.navigation API for custom page definitions. This lets you dynamically create pages, conditionally hide pages, and specify custom icons and labels.
Multi-page apps are useful when the dashboard has several distinct workflows (e.g., a "Data" page for loading, an "Analysis" page for exploration, a "Reports" page for export). For simpler dashboards, a single-page layout with tabs or expanders is usually enough.
29.10 Deployment: Streamlit Community Cloud
One of Streamlit's killer features is Streamlit Community Cloud (formerly Streamlit Sharing), a free hosting service for Streamlit apps. You push your code to a GitHub repository, connect the repo to Streamlit Cloud, and the app is live on the internet within minutes. No server setup, no Docker, no CI/CD — just GitHub push.
The deployment workflow:
- Create a GitHub repository containing your app.py, any supporting files, and a requirements.txt listing dependencies.
- Sign in to share.streamlit.io with your GitHub account.
- Click "New app," select the repository, and specify the main file.
- Wait a minute for the app to build and deploy.
- Share the URL.
Streamlit Cloud has resource limits (memory, CPU) appropriate for small-to-medium apps. For production use at scale, Streamlit Cloud has paid tiers with more resources, and you can also self-host Streamlit apps on any server that runs Python — AWS, GCP, Azure, Heroku, or a personal VPS.
For internal company dashboards, the deployment story is similar but uses internal infrastructure instead of Streamlit Cloud. The app itself is the same code either way.
The requirements.txt is critical. Streamlit Cloud installs only the packages listed there, so missing a dependency causes the deployment to fail with a cryptic error. Pin versions (pandas==2.1.0) for reproducibility, especially for production apps.
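For example, a requirements.txt for a dashboard built on pandas and Plotly might read as follows (versions illustrative — pin whatever you actually developed against):

```text
streamlit==1.31.1
pandas==2.1.4
plotly==5.18.0
```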
29.11 Streamlit Pitfalls
A few common Streamlit pitfalls and their remedies:
Slow re-runs from uncached computation. If data loading or model training runs on every interaction, the app becomes sluggish. Fix: use @st.cache_data or @st.cache_resource for any expensive operation.
Widget key conflicts. If two widgets have the same key (or no key and similar labels), Streamlit may silently share state between them. Fix: always provide unique keys for widgets that should have independent state.
Large data in session_state. Storing large DataFrames in session_state works but wastes memory and slows down the app. Fix: prefer caching over session state for large data.
Mutating cached data. If you modify a cached DataFrame in place, the cache sees the mutation on the next call and may behave unexpectedly. Fix: treat cached data as immutable; make copies if you need to modify.
Unpicklable objects in cache. cache_data tries to serialize the cached value, which fails for some objects (database connections, ML models with custom state). Fix: use cache_resource instead of cache_data for these.
Top-down re-execution surprises. If you write code that assumes Streamlit will call a specific function on a button click, remember that the entire script re-runs. Your "button handler" is not a callback; it is a conditional block in the main script that executes when the button returns True.
Running from Jupyter. Streamlit apps cannot run from Jupyter notebooks. They must be run from the command line with streamlit run app.py. Jupyter's execution model is incompatible with Streamlit's.
Not pinning dependencies. A requirements.txt without version pins will install the latest version of each package on every deployment, which can break apps when a new version introduces breaking changes. Always pin.
29.12 Progressive Project: Climate Dashboard
The climate project in this chapter builds a Streamlit dashboard for exploring the climate dataset. The structure:
Sidebar: filters for date range, variable selection, and smoothing window.
Main panel: interactive Plotly chart showing the selected variable over the selected date range with the selected smoothing.
Summary metrics: st.metric cards showing the mean, max, min, and trend.
Raw data view: an st.expander containing the filtered DataFrame.
Download button: st.download_button to export the filtered data as CSV.
The complete app in about 80 lines:
import streamlit as st
import pandas as pd
import plotly.express as px
st.set_page_config(page_title="Climate Explorer", layout="wide")
@st.cache_data
def load_climate_data():
    df = pd.read_csv("climate.csv", parse_dates=["date"])
    return df

df = load_climate_data()

st.title("🌍 Global Climate Explorer")
st.markdown("Explore temperature anomalies and CO₂ concentrations over time.")

with st.sidebar:
    st.header("Filters")
    date_range = st.slider(
        "Date range",
        min_value=df["date"].min().to_pydatetime(),
        max_value=df["date"].max().to_pydatetime(),
        value=(df["date"].min().to_pydatetime(), df["date"].max().to_pydatetime()),
    )
    variable = st.selectbox("Variable", ["temperature_anomaly", "co2_ppm", "sea_level_mm"])
    smoothing = st.slider("Smoothing (years)", 0, 20, 10)

filtered = df[(df["date"] >= date_range[0]) & (df["date"] <= date_range[1])].copy()
if smoothing > 0:
    filtered[f"{variable}_smooth"] = filtered[variable].rolling(window=smoothing * 12).mean()

col1, col2, col3, col4 = st.columns(4)
col1.metric("Records", f"{len(filtered):,}")
col2.metric(f"Mean {variable}", f"{filtered[variable].mean():.2f}")
col3.metric(f"Max {variable}", f"{filtered[variable].max():.2f}")
col4.metric(f"Min {variable}", f"{filtered[variable].min():.2f}")

fig = px.line(
    filtered, x="date", y=variable,
    title=f"{variable} over time",
    template="simple_white",
)
if smoothing > 0:
    fig.add_scatter(x=filtered["date"], y=filtered[f"{variable}_smooth"],
                    mode="lines", name=f"{smoothing}-year MA",
                    line=dict(width=3, color="red"))
st.plotly_chart(fig, use_container_width=True)

with st.expander("Raw data"):
    st.dataframe(filtered)

st.download_button(
    "Download filtered data",
    filtered.to_csv(index=False),
    file_name="climate_filtered.csv",
    mime="text/csv",
)
Run with streamlit run app.py and the dashboard appears. Users can drag the date range, select a variable, adjust smoothing, and see the chart update. They can view the filtered data in the expander and download it as CSV. The entire dashboard is ~80 lines of Python and requires no web development.
This is the value proposition of Streamlit: a working interactive dashboard in a few dozen lines, deployable to the internet in a few minutes. For rapid prototyping, internal tools, and MVPs, it is hard to beat.
29.13 Forms for Batched Input
The re-run model has a subtle drawback: every interaction triggers a re-run. If you have five widgets and the user needs to adjust all of them before the dashboard should update, the script re-runs five times (once per widget change) — four wasted re-runs. This is fine for fast apps but frustrating for slow ones.
st.form solves this. A form groups several widgets together and defers the re-run until the user clicks a submit button:
with st.form("filter_form"):
    date_range = st.slider("Date range", min_date, max_date, (min_date, max_date))
    variable = st.selectbox("Variable", options)
    smoothing = st.slider("Smoothing", 0, 20, 10)
    submitted = st.form_submit_button("Apply")

if submitted:
    # process the form inputs
    filtered_df = filter_data(date_range, variable, smoothing)
    st.plotly_chart(build_chart(filtered_df))
All widgets inside the with st.form(...) block are held in a batch. Changing them does not trigger a re-run. Only clicking the submit button re-runs the script, and at that point all the widget values are read at once. This can dramatically improve the perceived performance of dashboards with many interacting controls.
Forms are particularly useful for dashboards with expensive downstream computations. If each filter change triggered a 10-second query, the user would wait 50 seconds to set up five filters. With a form, they wait once for the whole batch.
Caveat: forms are visually distinct (they have a subtle border) and some users find them confusing ("why isn't the chart updating?"). Use forms when the performance benefit is real and the dashboard has a clear "apply" step; stick with immediate updates when the controls are cheap.
29.14 Data Editing with st.data_editor
Streamlit 1.23+ provides st.data_editor, an editable DataFrame widget. Users can modify cells, add rows, delete rows, and the edited DataFrame is returned as a Python object.
edited_df = st.data_editor(
    df,
    num_rows="dynamic",   # allow adding/deleting rows
    disabled=["id"],      # id column is read-only
    column_config={
        "value": st.column_config.NumberColumn("Value", format="$%.2f"),
        "category": st.column_config.SelectboxColumn(
            "Category",
            options=["A", "B", "C"],
        ),
    },
)
The column_config system lets you specify column types (NumberColumn, TextColumn, SelectboxColumn, CheckboxColumn, LinkColumn, DateColumn, etc.) with formatting and validation. The result is a spreadsheet-like editing experience inside the dashboard.
st.data_editor is useful for:
- Annotation workflows: let users label or correct data points.
- Configuration: edit a table of parameters that drives the rest of the dashboard.
- Manual adjustments: override computed values with human judgment.
- Data entry: collect structured data from users without a database form.
The edited DataFrame is returned like any other widget value, so you can use it immediately:
st.metric("Total", f"${edited_df['value'].sum():.2f}")
As the user edits cells, the script re-runs and the metric updates live. This is one of the most impressive Streamlit features for non-developer audiences — it produces a "smart spreadsheet" feel with minimal code.
29.15 Custom Components and Theming
Streamlit's built-in widgets cover most dashboard needs, but sometimes you want something custom — an interactive map, a specialized chart, a complex form. Streamlit supports custom components through a JavaScript extension system. Popular custom components:
- streamlit-folium: embed Folium maps (Chapter 23) with click events.
- streamlit-aggrid: advanced grid with filtering, grouping, pinned columns.
- streamlit-lottie: animated Lottie files for loading screens.
- streamlit-chat: chat UI for LLM applications.
- streamlit-plotly-events: capture click/select events from Plotly charts.
These are installed separately (pip install streamlit-folium) and imported into your app. The library ecosystem is rapidly growing, and for specialized needs there is usually a custom component available.
For styling, Streamlit supports theming via a config.toml file in .streamlit/:
[theme]
primaryColor = "#1F77B4"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"
font = "sans serif"
These settings apply to the whole app and affect Streamlit's built-in widgets. For custom chart styling, you still use the chart library's own theming (Plotly templates, matplotlib style sheets). Streamlit's theme system is limited compared to full CSS — for dashboards requiring brand-perfect styling, Dash is often a better choice.
29.16 When to Use Streamlit vs. Alternatives
Streamlit is not the only dashboard tool in Python. The main alternatives:
Dash (Chapter 30): explicit callbacks, more flexibility, more complex. Better for production dashboards with custom interactivity, especially cross-filtering between charts.
Panel: part of the HoloViz ecosystem (with datashader and HoloViews). Best for big-data interactive visualization. More powerful but more complex than Streamlit.
Gradio: similar philosophy to Streamlit but focused on ML model demos. Widgets are ML-friendly (image upload, audio, text generation). Better for "show what my model does" apps than for traditional dashboards.
Voilà: turns Jupyter notebooks into dashboards by hiding the code cells. Good for data scientists who already work in notebooks and want to share without recoding.
Plotly Dash Design Kit (paid): Dash with professional UI components and enterprise features. Good for consulting projects or client-facing dashboards.
No Python at all: Tableau, Power BI, Looker. Drag-and-drop tools that require no code. Best for non-programmers or for organizations with existing BI infrastructure.
Decision criteria:
- Speed of development: Streamlit > Gradio > Panel > Dash > pure HTML/JS. If you need a dashboard by Friday, Streamlit is probably the answer.
- Flexibility: Dash > Panel > Streamlit > Gradio. For unusual interactions (cross-filtering, custom widgets, specific layouts), Dash gives more control.
- Learning curve: Streamlit (gentle) < Gradio < Panel < Dash < raw Flask. Streamlit is the easiest entry point.
- Scalability: Dash > Streamlit (both have paid tiers for scale). For thousands of concurrent users, Dash Enterprise or similar is more mature.
- Deployment: Streamlit (Community Cloud is free) > Dash (requires hosting) > Panel (also requires hosting).
- Integration with enterprise tools: Dash (Plotly's enterprise offering) > Streamlit (Snowflake integration). Depends on your stack.
For most new projects, the right first question is "is Streamlit enough?" If yes, build it in Streamlit and ship in an afternoon. If no, move to Dash or Panel. The cost of moving is real (rewriting) but not catastrophic — most of the business logic transfers, and only the UI layer needs to be redone.
29.17 Streamlit in Production
Streamlit's development experience is excellent. Production is more nuanced. Several considerations for production Streamlit apps:
Authentication: Streamlit Community Cloud has basic auth (password protection for private apps). For SSO, OAuth, or role-based access, you need a custom auth layer or a paid tier.
Scaling: a single Streamlit instance handles modest load. For high concurrency, run multiple instances behind a load balancer with session affinity (so users always hit the same instance, preserving session state).
Database connections: use @st.cache_resource for connections. Be aware of connection pooling — Streamlit's re-run model can thrash naive connection logic.
Environment variables and secrets: use st.secrets (reads from a secrets.toml file) for API keys and credentials. Do not hardcode them.
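As a minimal sketch of the secrets file (every key and value here is an illustrative placeholder):

```toml
# .streamlit/secrets.toml — keep this file out of version control (add it to .gitignore)
[db]
host = "db.example.com"
user = "dashboard_reader"
password = "replace-me"

[api]
key = "replace-me"
```

In the app these values are read as `st.secrets["db"]["password"]` (attribute access, `st.secrets.db.password`, also works). On Streamlit Community Cloud you paste the same TOML into the app's Secrets settings rather than committing a file.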
Logging: Python's standard logging works. For distributed deployments, route logs to a centralized system (CloudWatch, Datadog, etc.).
Monitoring: Streamlit Community Cloud provides basic metrics. For more, instrument your app with Python metrics libraries and export to a monitoring service.
Testing: Streamlit apps are hard to unit test because the st.* calls are stateful. Focus unit testing on the pure-Python functions (data loading, filtering, chart building) and integration test the whole app with tools like streamlit-test or by running it against a headless browser.
CI/CD: GitHub Actions can run tests and deploy to Streamlit Cloud automatically on push. For self-hosted, Docker + standard CI/CD tools work.
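As a sketch of the CI half, a minimal GitHub Actions workflow that runs the test suite on every push might look like this (file path, job name, and Python version are illustrative; the deploy half is handled by Streamlit Community Cloud's own GitHub integration, so CI only needs to gate on tests):

```yaml
# .github/workflows/ci.yml — hypothetical workflow; adjust to your project layout
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest tests/
```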
These concerns are not unique to Streamlit — any web application has them — but they are worth mentioning because Streamlit's easy development can lure you into skipping production hardening until late.
29.18 A Design Pattern: Filter → Transform → Display
A recurring pattern in Streamlit dashboards is the filter-transform-display pipeline. The sidebar has filter widgets; the main area shows transformed results. This linear flow maps cleanly onto Streamlit's re-run model and is the natural structure for most analytical dashboards.
The canonical skeleton:
```python
import streamlit as st
import pandas as pd

st.set_page_config(page_title="Dashboard", layout="wide")

# --- Data loading (cached) ---
@st.cache_data
def load_data():
    return pd.read_parquet("data.parquet")

df = load_data()

# --- Filter widgets in sidebar ---
with st.sidebar:
    st.header("Filters")
    date_range = st.slider("Date range", ...)
    category = st.multiselect("Category", df["category"].unique())
    region = st.selectbox("Region", ["All"] + sorted(df["region"].unique().tolist()))

# --- Apply filters ---
# Build the mask step by step; an empty selection means "no filter".
mask = (df["date"] >= date_range[0]) & (df["date"] <= date_range[1])
if category:
    mask &= df["category"].isin(category)
if region != "All":
    mask &= df["region"] == region
filtered = df[mask]

# --- Compute derived metrics ---
summary = filtered.groupby("category").agg(
    count=("id", "count"),
    total=("value", "sum"),
    mean=("value", "mean"),
).reset_index()

# --- Display ---
st.title("Dashboard")
col1, col2, col3 = st.columns(3)
col1.metric("Total records", f"{len(filtered):,}")
col2.metric("Sum of value", f"${filtered['value'].sum():,.0f}")
col3.metric("Mean of value", f"${filtered['value'].mean():.2f}")

tab1, tab2, tab3 = st.tabs(["Chart", "Summary table", "Raw data"])
with tab1:
    fig = build_chart(filtered)  # build_chart: your own chart-building function
    st.plotly_chart(fig, use_container_width=True)
with tab2:
    st.dataframe(summary)
with tab3:
    st.dataframe(filtered)
```
This skeleton is about 50 lines of Python and produces a fully functional dashboard. For most projects, you start with this template and adapt it — replace the data loading, customize the filters, build the charts. The structure stays the same.
The pattern has several virtues. It is linear: read top to bottom and the flow is obvious. It is testable: the filter logic and transform logic are pure functions that can be unit-tested in isolation. It is cacheable: the expensive load step is cached, and the cheap filter step runs on every interaction. It is predictable: users quickly learn that the sidebar has filters and the main area shows results.
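To make the testability point concrete, here is one way the filter step from the skeleton above could be factored into a pure function (the column names mirror the skeleton; everything else is illustrative):

```python
import pandas as pd

def apply_filters(df, date_range, categories, region):
    """Pure filter step: no Streamlit calls, so it can be unit-tested in isolation."""
    mask = (df["date"] >= date_range[0]) & (df["date"] <= date_range[1])
    if categories:                       # empty selection means "all categories"
        mask &= df["category"].isin(categories)
    if region != "All":
        mask &= df["region"] == region
    return df[mask]

# Quick check with a toy frame
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "category": ["a", "b", "a"],
    "region": ["EU", "EU", "US"],
})
subset = apply_filters(
    toy,
    (pd.Timestamp("2024-01-01"), pd.Timestamp("2024-02-15")),
    ["a"],
    "EU",
)
print(len(subset))  # only the January row matches all three filters
```

The app then calls `apply_filters(df, date_range, category, region)` where the skeleton builds the mask inline; the behavior is identical, but the logic now has a home that pytest can reach.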
Most Streamlit dashboards in the wild — at companies, in open-source projects, in academic demos — follow this pattern with only minor variations. Learning to recognize and reproduce it gets you 80% of the way to any dashboard you need to build.
29.19 Reactive Dependencies Made Explicit
In the filter-transform-display pattern, the dependencies between widgets and charts are implicit in the script order. A widget on line 20 affects a chart on line 40 because the chart reads the widget's value. Streamlit does not require you to declare this dependency explicitly; the re-run model makes it automatic.
This implicit dependency has trade-offs. The good: you write the code the way you think about it — linear, direct, no plumbing. The bad: there is no easy way to see all the dependencies at once. A large dashboard with many widgets and charts can become tangled because every widget potentially affects every chart, and the only documentation is the script itself.
Mitigations:
- Separate data loading from UI: put your data loading and transformation functions in a separate module, and import them into the app. The Streamlit file becomes mostly UI code.
- Use descriptive variable names: `filtered_by_date_and_region` is more informative than `df2`.
- Add comments at transition points: mark where data is loaded, where filters are applied, where transforms happen, where display begins. This makes the flow easier to follow.
- Limit the sidebar filter set: if you have twenty filters, consider grouping them (by category with `st.expander`) or restructuring the dashboard into multiple pages.
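One hedged sketch of the "separate data loading from UI" mitigation: keep pure transforms in their own module and let the app file stay thin (the file names and the `summarize_by_category` helper are illustrative, not from the chapter's project):

```python
# data_utils.py — pure functions, no Streamlit imports, fully unit-testable
import pandas as pd

def summarize_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate record count and total value per category."""
    return (
        df.groupby("category")
        .agg(count=("value", "count"), total=("value", "sum"))
        .reset_index()
    )

# app.py — thin UI layer (sketch; launched with `streamlit run app.py`)
# import streamlit as st
# from data_utils import summarize_by_category
# st.dataframe(summarize_by_category(load_data()))

# Quick check of the pure function
demo = pd.DataFrame({"category": ["a", "a", "b"], "value": [1, 2, 3]})
summary = summarize_by_category(demo)
print(summary)
```

With this split, the Streamlit file reads almost like a table of contents for the dashboard, and every dependency on the data layer is visible in its import list.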
For dashboards that grow beyond about 200 lines, consider whether Streamlit's implicit model is still serving you or whether you should switch to Dash's explicit callback model (Chapter 30). The threshold is usually around "five independent interactive regions with cross-filtering" — at that scale, Dash's explicit dependencies become easier to reason about than Streamlit's implicit ones.
29.20 Streamlit for Prototyping ML Models
Beyond dashboards, Streamlit is heavily used for ML model prototyping. The pattern:
```python
import pandas as pd
import streamlit as st
from PIL import Image

import model_loader  # your own module that loads the trained model

@st.cache_resource
def get_model():
    return model_loader.load()

model = get_model()

st.title("Image Classifier")
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "png"])

if uploaded_file:
    img = Image.open(uploaded_file)
    st.image(img, caption="Input", use_container_width=True)
    with st.spinner("Running model..."):
        prediction = model.predict(img)
    st.success(f"Predicted: {prediction.label} ({prediction.score:.1%} confidence)")
    # Show top 5 alternatives
    st.bar_chart(pd.DataFrame(prediction.top_5))
```
This 20-line app lets anyone upload an image and see a model's predictions. The model is loaded once (via cache_resource) and reused across requests. The user never has to install Python, load the model, or write code — they just use the web app.
For model demos, internal tools, and experimentation, this pattern is hugely productive. It is the reason Streamlit became popular among ML researchers — a new model can be demoed to collaborators in a few hours rather than a few weeks.
Gradio (mentioned in Section 29.16) is a specialized alternative optimized for ML demos. If the dashboard is primarily "show what this model does," Gradio may be a slightly better fit. For mixed dashboards that combine ML with data exploration, Streamlit is more general.
29.21 Streamlit vs. Notebooks
A specific comparison worth making: Streamlit vs. Jupyter notebooks. Both are Python-based, both are used for data exploration, both can produce interactive output. When should you use each?
Use a notebook when:
- You are doing exploratory analysis and do not yet know what the final output should look like.
- You need the ability to run cells out of order, go back and edit earlier steps, and iterate rapidly.
- The audience is you (or other data scientists) who will read the notebook and run it themselves.
- You want to mix narrative text, code, output, and interpretation in one document.
- The output is primarily for internal analysis or a paper, not for distribution to non-technical users.
Use a Streamlit app when:
- You know what the output should look like and want to package it for others.
- The audience is non-technical and will not run Python themselves.
- You need interactive controls (widgets) that update the output in real time.
- You want to deploy a persistent tool that users can access via URL.
- The dashboard will be used repeatedly with different inputs.
A common workflow is to start in a notebook (for exploration and analysis), then port the final version to Streamlit (for delivery). The notebook captures the thinking; the app captures the result. Both are first-class Python artifacts, and the choice between them depends on the audience and the persistence requirement.
Hybrid approaches:
- Jupyter Voilà: turns a notebook into a dashboard by hiding the code cells. Faster than rewriting to Streamlit but less flexible.
- Streamlit with notebook-style markdown: use `st.markdown` liberally to capture the narrative structure of a notebook in a Streamlit app.
- Export notebook to HTML: for static distribution, `jupyter nbconvert` produces a shareable file.
For most delivery contexts, Streamlit is the better choice. For exploration, notebooks remain irreplaceable. Treat them as complementary tools, not competitors.
29.22 A Note on Update Frequencies
Streamlit apps typically reload data on startup (via @st.cache_data) and reuse the cached data for the session. For dashboards that need to reflect live data, you need explicit refresh logic.
Options for live data:
Time-based cache expiration: set ttl=60 on @st.cache_data to refresh every 60 seconds. Simple but coarse — the user might see stale data for up to 60 seconds after a change.
Manual refresh button: a st.button("Refresh") that calls st.cache_data.clear() and re-runs. Gives users control but requires them to know when to click.
Auto-refresh with st.rerun: schedule a rerun with st.rerun() after a delay. Creates a polling loop. Works but can feel sluggish.
Live connection to a streaming source: websocket or server-sent events to a real-time data source. More complex to implement but gives true real-time updates.
Plotly Dash alternative: if real-time is critical, Dash's dcc.Interval component is designed for this use case and tends to be smoother than Streamlit's manual approaches.
For most analytical dashboards, "refresh once per session" is sufficient. For operational dashboards (trading, monitoring, alerts), real-time refresh matters more, and Dash is often the better tool.
29.23 Testing and Debugging Streamlit Apps
Streamlit's re-run model makes certain kinds of debugging unusual. A few strategies that help:
st.write everywhere during development. Sprinkle st.write(some_variable) calls throughout your script to see intermediate values. Streamlit will render them inline, letting you inspect the state of the app without a debugger. Remove them before deploying.
st.write(st.session_state): render the full session state to see exactly what is stored between re-runs. Useful for debugging persistent-state issues.
print still works but prints to the terminal where you ran streamlit run, not to the browser. Useful for logging.
st.exception: if an exception is raised inside your app, Streamlit catches it and displays a traceback in the browser. You can also catch exceptions yourself and render them with st.exception(e).
Rerun button: the top-right menu has a "Rerun" option that forces a full re-execution. Useful when you have edited the script and want to see the changes.
Clear cache: the same menu has "Clear cache" which invalidates all @st.cache_data and @st.cache_resource entries. Useful when cached data is stale.
Unit test the pure functions. The st.* calls are hard to test, but any data transformation or business logic in your app should be pure Python functions that can be tested with pytest. Keep the Streamlit layer thin.
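For example, if a groupby-summary step lives in a pure function, its test needs nothing from Streamlit (the function and file names here are illustrative):

```python
# test_transforms.py — run with `pytest test_transforms.py`
import pandas as pd

def compute_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Pure transform extracted from the app: mean value per region."""
    return df.groupby("region")["value"].mean().reset_index()

def test_compute_summary():
    df = pd.DataFrame({"region": ["EU", "EU", "US"], "value": [2.0, 4.0, 10.0]})
    out = compute_summary(df)
    assert list(out["region"]) == ["EU", "US"]
    assert list(out["value"]) == [3.0, 10.0]

test_compute_summary()  # also runnable directly, without pytest
```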
Integration test with Streamlit's built-in testing framework: recent Streamlit versions ship `AppTest` (in `streamlit.testing.v1`), which lets you run an app script headlessly, set widget values programmatically, and assert on the rendered output. Useful for catching regressions in complex apps.
Headless browser testing: for full end-to-end tests, use Playwright or Selenium to automate a real browser interacting with the app. More reliable but slower than unit tests.
The development cycle for Streamlit is fast: edit the script, save, the app auto-reloads in the browser, and you see the result. This tight loop is one of Streamlit's biggest productivity benefits. Treat it as a REPL for dashboards — iterate rapidly, fix issues live, and ship when it works.
A practical tip: keep the browser window open next to your text editor, on a second monitor if you have one or in a split view if you do not. With auto-reload, you can make a change, save, and see the effect in under a second. This feedback loop is faster than most IDE debuggers and encourages an experimental approach to dashboard building: try things, see what happens, adjust, and move on. For most dashboards, this iterative style produces a working first version faster than any amount of upfront planning, and planning is easier once you have something concrete to react to.
29.24 Check Your Understanding
- What is Streamlit's execution model, and how does it differ from traditional web frameworks?
- What is the difference between `@st.cache_data` and `@st.cache_resource`?
- When should you use `st.session_state`?
- What widget returns a tuple for a range slider?
- How do you make a Plotly chart responsive to the container width in Streamlit?
- What is `st.download_button`, and when would you use it?
- How does a multi-page Streamlit app organize its files?
- Name three Streamlit pitfalls and their remedies.
29.25 Chapter Summary
This chapter introduced Streamlit for building interactive dashboards in pure Python:
- Execution model: apps are scripts that re-run top-to-bottom on every interaction. Caching mitigates the cost.
- Layout primitives: `st.sidebar`, `st.columns`, `st.tabs`, `st.expander`, `st.container`.
- Widgets: `text_input`, `number_input`, `slider`, `selectbox`, `multiselect`, `date_input`, `file_uploader`, `button`, `checkbox`, `radio`.
- Chart embedding: `st.pyplot`, `st.plotly_chart`, `st.altair_chart`, `st.bokeh_chart` for each major library.
- Caching: `@st.cache_data` for data, `@st.cache_resource` for models and connections.
- Session state: `st.session_state` for values that persist across re-runs.
- Multi-page apps: `pages/` directory with numbered filenames.
- Deployment: Streamlit Community Cloud for free hosting via GitHub integration.
The chapter's threshold concept — apps are scripts — is the foundation for everything else. Once you accept that Streamlit re-runs the script on every interaction, the rest of the framework makes sense.
Chapter 30 introduces Dash, which uses a different execution model (explicit callbacks) and has different trade-offs. Knowing both tools lets you pick the right one for each project.
29.26 Spaced Review
- From Chapter 20 (Plotly Express): Streamlit embeds Plotly charts natively. Which Plotly features work inside Streamlit, and which require extra work?
- From Chapter 12 (Customization Mastery): Streamlit supports matplotlib via `st.pyplot`. How does this interact with the rcParams customization from Chapter 12?
- From Chapter 9 (Storytelling): A dashboard is a different kind of narrative than a presentation. How does Chapter 9's storytelling framework adapt to dashboards?
- From Chapter 19 (Multi-Variable Exploration): Dashboards are tools for exploration. How does this fit with Shneiderman's mantra (overview, zoom, filter, details)?
- From Chapter 4 (Honest Charts): Interactive dashboards can be manipulated by filter choices. How do you build a dashboard that resists misinterpretation?
Streamlit is the fastest path from "I have data" to "I have an interactive dashboard." For rapid prototyping, internal tools, ML model demos, and exploratory dashboards, it is often the single best tool choice in the Python ecosystem. Chapter 30 introduces Dash, the main alternative in the Python dashboard ecosystem, which gives you more control and more flexibility at the cost of more complexity and a slightly steeper learning curve.