Appendix E: Frequently Asked Questions

Q1: Why not just use Excel for data visualization?

Excel is fine for quick, one-off charts during exploration. However, Python-based visualization offers reproducibility (your chart is generated by a script, not mouse clicks), scalability (an Excel worksheet is capped at 1,048,576 rows, while Python tools can aggregate and plot millions), customization (fine-grained control over every element), and automation (generate 50 reports with one command). If you need to produce the same chart next month with updated data, rerunning a Python script is usually faster and more reliable than re-creating an Excel chart by hand.

Q2: Plotly or matplotlib -- which should I learn first?

Learn matplotlib first. It is the foundation that seaborn, pandas plotting, and many other libraries are built on. Once you understand the Figure/Axes model and can control every element of a static chart, picking up Plotly or Altair takes days rather than weeks. matplotlib is also the most established choice for print and publication figures, with mature vector export (PDF, SVG, EPS) and precise control over fonts and sizing.

Q3: When should I use Plotly instead of matplotlib?

Use Plotly when your audience will view the chart in a browser and benefits from hover tooltips, zooming, or filtering. Dashboards, web applications, and exploratory Jupyter notebooks are natural Plotly contexts. For static output -- PDF reports, journal figures, slide decks, printed posters -- matplotlib or seaborn will give you cleaner results with more control.

Q4: How do I make my chart colorblind-safe?

Three strategies: (1) Use a colorblind-safe palette: the Wong palette for categorical data, or viridis or cividis for continuous data. (2) Encode information redundantly -- pair color with shape, line style, or direct text labels. (3) Test your chart with a color vision deficiency simulator such as Color Oracle or Coblis. See Chapter 3 and Appendix B for palette recommendations.
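
As a rough illustration of strategies (1) and (2), the sketch below draws four series with Wong palette colors and gives each one its own line style and marker. The data is synthetic and the output file name is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Wong palette hex values (black omitted; yellow listed last because it
# has weak contrast on a white background)
WONG = ["#E69F00", "#56B4E9", "#009E73", "#0072B2",
        "#D55E00", "#CC79A7", "#F0E442"]
LINESTYLES = ["-", "--", "-.", ":"]
MARKERS = ["o", "s", "^", "D"]

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots()
for i, (color, linestyle, marker) in enumerate(zip(WONG, LINESTYLES, MARKERS)):
    # Redundant encoding: each series differs in color, dash pattern, and marker
    ax.plot(x, np.sin(x + i), color=color, linestyle=linestyle,
            marker=marker, markevery=10, label=f"Series {i + 1}")
ax.legend()
fig.savefig("colorblind_safe.png", dpi=150)
```

Because zip() stops at the shortest list, only the first four palette colors are used here. The chart stays readable in grayscale because the dash patterns and markers still distinguish the series.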

Q5: What DPI should I use for print?

Use 300 DPI for standard print (reports, handouts, posters). Use 600 DPI for high-quality journal submissions if the publisher requires it. For screen display (web pages, slides, dashboards), 72 to 150 DPI is sufficient. In matplotlib, set the DPI in fig.savefig("chart.png", dpi=300).
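
The pixel dimensions of a raster export are the figure size in inches multiplied by the DPI. A minimal sketch, with placeholder file names, that saves the same figure for print and for screen:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))  # width and height in inches
ax.plot([0, 1, 2], [0, 1, 4])

# Same figure, two resolutions: pixels = inches * dpi
fig.savefig("chart_print.png", dpi=300)   # 6 x 4 in at 300 DPI -> 1800 x 1200 px
fig.savefig("chart_screen.png", dpi=100)  # 6 x 4 in at 100 DPI -> 600 x 400 px
```

Fonts and line widths are specified in points, so they keep the same physical size at either resolution. Only the pixel count changes.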

Q6: Streamlit or Dash -- which dashboard framework should I pick?

Streamlit is faster to prototype. You can go from a Jupyter notebook to a working dashboard in under an hour. It uses a simple top-to-bottom script model with no callbacks. Choose Streamlit for internal tools, quick prototypes, and data science team dashboards.

Dash offers more architectural control through its callback system, supports multi-page applications with URL routing, and integrates deeply with Plotly. Choose Dash for production-facing applications, complex multi-view dashboards, and projects where you need fine-grained state management.

Q7: How do I handle overlapping points in a scatter plot?

Several options depending on the number of points: (1) Reduce alpha (transparency) to reveal density: alpha=0.3. (2) Add jitter (small random displacement) for discrete-valued data. (3) Switch to a hexbin plot (ax.hexbin()) or 2D KDE plot for large datasets. (4) Use datashader for millions of points. (5) Subsample for exploration, then use aggregation for the final chart.
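
Options (1) through (3) can be sketched as follows. The data is synthetic, and the jitter width, marker size, and gridsize are illustrative values rather than recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the jitter is reproducible

# Discrete-valued data: many points share the same x coordinate
ratings = rng.integers(1, 6, size=2_000)            # integers 1..5
scores = ratings * 10 + rng.normal(0, 5, 2_000)

# Larger continuous dataset for the density view
big_x = rng.normal(size=50_000)
big_y = big_x + rng.normal(size=50_000)

fig, (ax_jitter, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Options (1) and (2): transparency plus a small horizontal jitter
jitter = rng.uniform(-0.2, 0.2, size=ratings.size)
ax_jitter.scatter(ratings + jitter, scores, alpha=0.3, s=10)
ax_jitter.set_title("alpha + jitter")

# Option (3): hexbin counts points per hexagonal cell; mincnt=1 hides empty cells
hb = ax_hex.hexbin(big_x, big_y, gridsize=40, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="points per cell")
ax_hex.set_title("hexbin")

fig.savefig("overlap_options.png", dpi=150)
```

Jitter is applied only to the discrete axis so the continuous values are not distorted.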

Q8: How do I export a chart as a vector file for a publication?

Save as PDF or SVG: fig.savefig("chart.pdf", bbox_inches="tight"). Vector formats scale to any size without pixelation and are preferred or required by many academic publishers. If the journal requires EPS, matplotlib supports that too: fig.savefig("chart.eps"). Note that EPS cannot represent transparency, so artists drawn with alpha below 1 are rendered opaque. Avoid PNG for publication figures unless the journal specifically requests raster format.
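
A minimal sketch of a vector export follows. The two rcParams lines are optional additions, not something the answer above requires: they keep text as text (SVG) and embed TrueType fonts (PDF), which some publishers ask for. Check your journal's guidelines before relying on them.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

# Optional: keep text editable instead of converting glyphs to outlines
plt.rcParams["svg.fonttype"] = "none"  # SVG: write <text> elements
plt.rcParams["pdf.fonttype"] = 42      # PDF: embed fonts as TrueType (Type 42)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2, 3], [1, 3, 2, 5], marker="o")
ax.set_xlabel("Trial")
ax.set_ylabel("Response")

# The file extension selects the output format
fig.savefig("chart.pdf", bbox_inches="tight")
fig.savefig("chart.svg", bbox_inches="tight")
```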

Q9: My figure labels get cut off when I save. How do I fix that?

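Use bbox_inches="tight" in savefig() to automatically expand the bounding box around all visible elements. Alternatively, call fig.tight_layout() before saving, or create the figure with constrained_layout=True: fig, ax = plt.subplots(constrained_layout=True). Recent matplotlib releases spell the last option layout="constrained". tight_layout() and constrained layout are alternative layout engines, so pick one per figure rather than combining them.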
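
The sketch below uses the constrained-layout option on a small figure with long, rotated tick labels, a common case where labels get clipped. The category names and values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

# Made-up categories with long names that would otherwise run off the figure
labels = ["North America", "Latin America", "Western Europe", "Southeast Asia"]
values = [42, 17, 35, 28]

# Constrained layout reserves room for tick labels and axis labels
fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True)
ax.bar(labels, values)
ax.tick_params(axis="x", labelrotation=90)
ax.set_ylabel("Revenue (illustrative units)")

fig.savefig("labels_fit.png", dpi=150)
```

With constrained layout the axes shrink to make room for the labels, so the saved image keeps the requested 4 x 3 inch size. bbox_inches="tight" instead grows or crops the saved canvas, so the output size can differ from figsize.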

Q10: How do I display currency, percentages, or thousands separators on axes?

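Use matplotlib.ticker.FuncFormatter. Each call to set_major_formatter() replaces the previous formatter, so apply only one of the three lines below per axis:

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"${v:,.0f}"))      # 1234 -> $1,234
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{v:.0%}"))         # 0.45 -> 45%
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{v/1e6:.1f}M"))   # 2300000 -> 2.3M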

Q11: How many colors can I safely use in a single chart?

For categorical data, keep to 7 or fewer distinct colors in a single chart. Beyond that, the human eye struggles to map colors back to legend entries reliably. If you have more than 7 categories, consider grouping smaller categories into "Other," using small multiples (one category per panel), or using direct labels instead of a legend.
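
One way to do the "Other" grouping before plotting is sketched below. The helper name lump_small_categories and the sample counts are hypothetical; the default limit of 7 follows the guideline above.

```python
from collections import Counter


def lump_small_categories(counts, max_categories=7, other_label="Other"):
    """Keep the largest categories and merge the remainder into one bucket.

    The result has at most max_categories entries, counting the merged one.
    """
    if len(counts) <= max_categories:
        return dict(counts)
    ranked = Counter(counts).most_common()   # largest first
    keep = ranked[: max_categories - 1]      # leave one slot for "Other"
    rest = ranked[max_categories - 1:]
    lumped = dict(keep)
    lumped[other_label] = sum(value for _, value in rest)
    return lumped


# Hypothetical category counts: ten categories, four of them small
counts = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10,
          "F": 5, "G": 4, "H": 3, "I": 2, "J": 1}
lumped = lump_small_categories(counts)
print(lumped)
# {'A': 50, 'B': 40, 'C': 30, 'D': 20, 'E': 10, 'F': 5, 'Other': 10}
```

The total is preserved, so percentages computed from the lumped counts still sum to 100.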

Q12: Should I use a dark background for my charts?

Dark backgrounds work well for screen-only contexts: dashboards, presentations in dim rooms, and applications with a dark UI theme. However, they consume far more ink when printed, render poorly when pasted into light-background documents, and require careful color choices (avoid pure white text on pure black -- use off-white on dark gray). Default to a white or light gray background unless you have a specific reason for dark.
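
If you do want a dark theme, one option is to scope it to a single figure with plt.rc_context() so the rest of your charts keep the light default. The two hex colors below are illustrative choices for "dark gray" and "off-white", not values prescribed by this book.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

DARK_GRAY = "#1e1e1e"  # illustrative background color
OFF_WHITE = "#e6e6e6"  # illustrative text and axis color

dark_theme = {
    "figure.facecolor": DARK_GRAY,
    "axes.facecolor": DARK_GRAY,
    "axes.edgecolor": OFF_WHITE,
    "axes.labelcolor": OFF_WHITE,
    "text.color": OFF_WHITE,
    "xtick.color": OFF_WHITE,
    "ytick.color": OFF_WHITE,
}

# Settings apply only inside the with-block and are restored afterwards
with plt.rc_context(dark_theme):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [2, 1, 3, 4], color="#56B4E9")
    ax.set_title("Dark theme, scoped to this figure")
    # Pass the face color explicitly so the saved file keeps the dark background
    fig.savefig("chart_dark.png", dpi=150, facecolor=fig.get_facecolor())
```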

Q13: What is the difference between plt.plot() and ax.plot()?

plt.plot() uses the pyplot state machine and implicitly operates on the "current" Axes. ax.plot() explicitly targets a specific Axes object. The explicit ax.plot() approach is strongly recommended because it avoids ambiguity when you have multiple subplots and makes your code more readable and predictable. Use fig, ax = plt.subplots() and work with ax directly.
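
A small sketch of the difference, using two subplots with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Explicit: each call names the Axes it draws on
ax_left.plot([0, 1, 2], [0, 1, 4])
ax_left.set_title("Left panel")
ax_right.plot([0, 1, 2], [4, 1, 0])
ax_right.set_title("Right panel")

# Implicit: plt.plot() draws on the "current" Axes, which after
# plt.subplots() is the most recently created one (the right panel here)
plt.plot([0, 1, 2], [2, 2, 2], linestyle="--")

print(len(ax_left.lines), len(ax_right.lines))
```

The dashed line lands on the right panel even though nothing in the plt.plot() call says so. This is the ambiguity the explicit style avoids.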

Q14: How do I add interactivity to a matplotlib figure?

For Jupyter notebooks, use %matplotlib widget (requires ipympl) to get pan, zoom, and resize controls. For standalone applications, matplotlib supports event handling: fig.canvas.mpl_connect("button_press_event", callback). For general-purpose interactivity (hover tooltips, linked views, dropdowns), switch to Plotly, Altair, or a dashboard framework like Streamlit or Dash -- matplotlib was designed primarily for static output.
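
The event-handling route can be sketched as below. The callback name on_click and the clicked_points list are illustrative. The callback fires only with an interactive backend (a desktop window or %matplotlib widget).

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_title("Click inside the axes")

clicked_points = []  # data coordinates of each recorded click


def on_click(event):
    """Record clicks that land inside an Axes; ignore clicks in the margins."""
    if event.inaxes is None:
        return
    clicked_points.append((event.xdata, event.ydata))


# mpl_connect returns a connection id that can be passed to mpl_disconnect later
connection_id = fig.canvas.mpl_connect("button_press_event", on_click)

# plt.show()  # uncomment in a standalone script to open the window
```

event.xdata and event.ydata are in data coordinates, while event.x and event.y give the pixel position on the canvas.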

Q15: How do I make my seaborn plot look less "default"?

Three steps: (1) Set a context and style: sns.set_theme(context="notebook", style="whitegrid"). (2) Choose a non-default palette: sns.set_palette("Set2") or pass a custom list of hex colors. (3) Post-process with matplotlib -- seaborn returns matplotlib objects, so you can call ax.set_title(), remove spines, adjust tick formatting, and save at high resolution. Seaborn creates the statistical chart; matplotlib handles the polish.
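
The three steps might look like this in practice. The data is synthetic, and the title, labels, and file name are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Steps 1 and 2: context, style, and a non-default palette
sns.set_theme(context="notebook", style="whitegrid")
sns.set_palette("Set2")

# Synthetic data: three groups of 50 points each
rng = np.random.default_rng(1)
x = rng.normal(size=150)
y = 2 * x + rng.normal(size=150)
group = np.repeat(["A", "B", "C"], 50)

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(x=x, y=y, hue=group, ax=ax)

# Step 3: polish with matplotlib calls on the same Axes
ax.set_title("Synthetic example", loc="left", fontweight="bold")
ax.set_xlabel("Predictor")
ax.set_ylabel("Response")
sns.despine(ax=ax)  # hide the top and right spines
fig.savefig("seaborn_polished.png", dpi=300, bbox_inches="tight")
```

Passing ax=ax keeps the seaborn plot on an Axes you created yourself, which makes the matplotlib post-processing straightforward.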

Q16: Can I use Google Fonts or custom fonts in matplotlib?

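Yes. Download the .ttf or .otf file, then register it and select it by the family name stored inside the font file (which may differ from the file name):

import matplotlib.pyplot as plt
from matplotlib import font_manager

font_path = "/path/to/CustomFont-Regular.ttf"
font_manager.fontManager.addfont(font_path)
# Read the family name from the file instead of guessing it
plt.rcParams["font.family"] = font_manager.FontProperties(fname=font_path).get_name()

addfont() registers the font for the current Python session only, so run this at the top of each script. If a font you installed system-wide does not appear, delete the font-list cache file in the directory returned by matplotlib.get_cachedir() so that matplotlib rebuilds it on the next import. For reproducibility, include the font file in your project repository.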

Q17: How do I share an interactive Plotly chart with someone who does not have Python?

Export it as a self-contained HTML file: fig.write_html("chart.html", include_plotlyjs=True). The recipient can open this file in any modern web browser -- no Python, no server, no installation required. The file includes all data and the Plotly.js library, which adds a few megabytes. For smaller file sizes, use include_plotlyjs="cdn" to load the library from a CDN instead of embedding it; the recipient then needs an internet connection to view the chart.

Q18: What is the grammar of graphics, and why does it matter?

The grammar of graphics is a framework (originated by Leland Wilkinson and popularized by Hadley Wickham in ggplot2) that decomposes a chart into independent components: data, aesthetic mappings, geometric objects, scales, coordinate systems, and facets. Libraries like Altair implement this grammar closely, and Plotly Express borrows its core ideas of mapping data columns to visual properties and faceting. Both let you build charts by declaring these components rather than issuing drawing commands. It matters because it gives you a mental model for thinking about visualization systematically rather than memorizing chart recipes.

Q19: How do I handle datetime axes in matplotlib?

Pass Python datetime objects or pandas Timestamp values directly to plotting functions -- matplotlib recognizes them automatically. For formatting, use matplotlib.dates:

import matplotlib.dates as mdates
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate labels to prevent overlap

Q20: My notebook is slow when plotting large datasets. What can I do?

First, downsample for exploration: plot a random 10,000-point sample while developing the chart, then switch to the full dataset for the final render. Second, use aggregation: hexbin plots, 2D histograms, or datashader instead of raw scatter plots. Third, for interactive plots, Plotly's WebGL traces (go.Scattergl, or render_mode="webgl" in Plotly Express) handle hundreds of thousands of points far better than the default SVG rendering. Fourth, avoid calling plt.show() or displaying the figure inside a loop -- build the figure completely, then render it once.
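
The first two suggestions can be sketched with NumPy and matplotlib alone. The dataset is synthetic, and the sample size and bin count are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

fig, (ax_sample, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Exploration: draw a random 10,000-point sample without replacement
idx = rng.choice(n, size=10_000, replace=False)
ax_sample.scatter(x[idx], y[idx], s=2, alpha=0.3)
ax_sample.set_title("Random sample (10,000 points)")

# Final chart: aggregate all points into a 200 x 200 grid of counts
counts, xedges, yedges, image = ax_hist.hist2d(x, y, bins=200, cmap="viridis")
fig.colorbar(image, ax=ax_hist, label="points per bin")
ax_hist.set_title("2D histogram (all points)")

fig.savefig("large_data.png", dpi=150)
```

The 2D histogram draws a fixed number of cells no matter how many rows go in, so rendering cost depends on the bin count rather than on the size of the dataset.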