Chapter 22 Quiz: Data Analysis and Visualization

Test your understanding of AI-assisted data analysis workflows, trust calibration, and appropriate tool selection. After answering each question, check your reasoning against the explanation that follows.


Question 1

Which of the following is the most accurate description of Tier 1 (chat-based) data analysis?

A) Tier 1 requires the most technical skill because you must understand the underlying code
B) At Tier 1, you interact in natural language without writing code, making it accessible to non-programmers but requiring high verification effort since the analysis process is not directly visible
C) Tier 1 is less accurate than Tier 2 because it cannot handle large datasets
D) Tier 1 requires API keys and programming knowledge to use effectively

**Answer: B) At Tier 1, you interact in natural language without writing code, making it accessible to non-programmers but requiring high verification effort since the analysis process is not directly visible**

The chapter explicitly describes Tier 1 as: "You do not write code. You upload your data to a tool like ChatGPT Advanced Data Analysis or describe it in a chat interface, ask questions in natural language, and receive analysis results." It also notes that "Verification requirements at Tier 1: High. Because you are not seeing the underlying code, you have less ability to audit the analysis process." The accessibility comes with an obligation to verify outputs more carefully because you cannot read the code to check its logic.

Question 2

What is the most important verification step after receiving AI-generated analysis that includes specific calculated numbers?

A) Checking whether the visualization looks professional
B) Verifying calculated numbers manually or with a separate calculation before using them in a deliverable
C) Asking AI to calculate the same numbers again to confirm consistency
D) Running the analysis through a second AI model to cross-check

**Answer: B) Verifying calculated numbers manually or with a separate calculation before using them in a deliverable**

The chapter states: "Whenever AI performs a calculation — a sum, a percentage, an average, a growth rate — verify it manually or in a separate calculation. Do not trust that the number is correct because it was produced by a sophisticated model." The verification can be a quick back-of-envelope check: total divided by count to verify an average, or the ending value divided by the starting value to verify a growth rate. Asking AI to recalculate is not verification — it may produce the same error again.
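The back-of-envelope check the explanation describes takes only a few lines of Python. The revenue figures below are hypothetical, invented purely to illustrate the two checks:

```python
# Back-of-envelope verification of AI-reported figures
# (hypothetical monthly revenue values for illustration).
monthly_revenue = [120_000, 135_000, 128_000, 142_000]

# Suppose AI claimed: "average monthly revenue was $131,250"
average = sum(monthly_revenue) / len(monthly_revenue)  # total / count
print(average)  # prints 131250.0 — the claim checks out

# Suppose AI claimed: "revenue grew ~18.3% over the period"
growth = (monthly_revenue[-1] / monthly_revenue[0]) - 1  # end / start - 1
print(f"{growth:.1%}")  # prints "18.3%" — also consistent
```

The point is not the code itself but the independence: the check recomputes the number from the raw inputs rather than asking the model to confirm its own output.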

Question 3

What is "narrative overfitting" in the context of AI data interpretation?

A) AI producing interpretations that are too long and need to be shortened
B) AI constructing a compelling narrative from data patterns that may not be statistically significant — the story feels convincing even when the data is insufficient to support it
C) AI fitting a standard template narrative to every dataset regardless of what the data shows
D) AI producing interpretations that are too vague and need more specific direction

**Answer: B) AI constructing a compelling narrative from data patterns that may not be statistically significant — the story feels convincing even when the data is insufficient to support it**

The chapter states: "A compelling story about why the numbers look a certain way can feel convincing even when the data is insufficient to support the story." AI is skilled at constructing coherent narratives from patterns — sometimes too skilled. The danger is that the narrative's coherence creates a false sense of its validity. The counter is to explicitly ask: "What alternative narratives fit this data equally well?" and "Is this pattern practically significant, not just statistically interesting?"

Question 4

When using ChatGPT Advanced Data Analysis, what information should you include in the initial prompt beyond just uploading the file?

A) The specific code you want it to run
B) Nothing — the tool should figure out the context from the data
C) The size of the dataset and the number of rows
D) Context about what the data contains and specifically what you want to understand

**Answer: D) Context about what the data contains and specifically what you want to understand**

The chapter explicitly contrasts: "'Analyze this data' produces generic output" versus a detailed context prompt that "produces targeted, useful analysis." Context matters significantly. Tell the tool what the data represents, what time period it covers, and what business questions you want it to help answer. Without this context, the tool produces a generic EDA without regard for what matters to your specific situation.

Question 5

At Tier 2 (code-assisted analysis), what should you do before running AI-generated analysis code?

A) Run it immediately to save time — you can review the output rather than the code
B) Submit it to a second AI model to verify it is correct
C) Read and understand the code, checking for correct filtering, date parsing, aggregation functions, and chart axis labeling
D) Only review the first 10-15 lines since errors are most likely at the beginning

**Answer: C) Read and understand the code, checking for correct filtering, date parsing, aggregation functions, and chart axis labeling**

The chapter states: "At Tier 2, when AI writes analysis code, you must read and understand the code before running it." The specific things to check include: whether data is being filtered correctly, whether dates are parsed correctly, whether the aggregation function (sum, mean, count) is appropriate for the measure, and whether chart axes are labeled correctly. Running code without reading it is explicitly identified as "a significant risk" because code can work without errors while doing the wrong thing analytically.
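As a sketch of what that review looks like in practice, here is a hypothetical AI-generated pandas snippet with the checks marked inline as comments. The column names and values are invented for illustration; the fourth check, axis labeling, applies once the result is plotted:

```python
import pandas as pd

# Hypothetical AI-generated analysis code; the comments mark the
# review checks to perform before running it.
df = pd.DataFrame({
    "date": ["2024-01-15", "2024-02-10", "2024-02-20"],
    "region": ["East", "East", "West"],
    "revenue": [100, 200, 50],
})

# Check: date parsing — does the format string match the data?
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Check: filtering — does this condition keep the rows you intend?
east = df[df["region"] == "East"]

# Check: aggregation — is sum (not mean or count) right for this measure?
monthly = east.groupby(east["date"].dt.to_period("M"))["revenue"].sum()
print(monthly.tolist())  # [100, 200] — January and February totals
```

Each check takes seconds to perform, and each one targets a bug class that runs cleanly while producing a wrong result.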

Question 6

Which data types should NOT be uploaded to consumer AI tools without organizational approval or anonymization?

A) Publicly available economic data and aggregated product sales totals
B) Personally identifiable information, protected health information, and data covered by NDAs
C) Any data with more than 1,000 rows
D) Internal data of any kind — all internal data requires approval before using AI tools

**Answer: B) Personally identifiable information, protected health information, and data covered by NDAs**

The chapter provides a specific list: "Do not upload to external AI tools: Personally identifiable information (PII) — names, email addresses, social security numbers; Protected health information (PHI) covered by HIPAA; Non-public financial data with regulatory sensitivity; Data covered by NDAs or confidentiality agreements with specific handling requirements." The chapter also recommends anonymizing data before uploading when sensitive information is present.
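Where uploading is permitted only after anonymization, a minimal pseudonymization pass might look like the sketch below. The field names, values, and salt are all hypothetical, and real policies may require more (for example, dropping quasi-identifiers such as zip code or birth date):

```python
import hashlib

# Hypothetical rows containing PII (email addresses) that must not
# be uploaded to an external tool as-is.
rows = [
    {"email": "ana@example.com", "spend": 120},
    {"email": "bo@example.com", "spend": 85},
]

def pseudonymize(value: str, salt: str = "project-salt") -> str:
    # One-way hash so the same customer keeps a stable ID across
    # rows without exposing the address itself.
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

# Replace the PII column with a pseudonymous ID before upload.
safe_rows = [
    {"customer_id": pseudonymize(r["email"]), "spend": r["spend"]}
    for r in rows
]
```

Note that hashing alone is not a guarantee of anonymity — combinations of remaining fields can still re-identify people — which is why the chapter treats organizational approval as a separate requirement.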

Question 7

In the Alex scenario, what does she do after receiving the AI interpretation of her campaign data that demonstrates good trust calibration?

A) She publishes the AI interpretation directly in her quarterly business review report
B) She asks AI to verify its own interpretation
C) She critically evaluates each finding, challenges the email timing finding as potentially a sample size artifact, and adds her own domain context (competitive development) to the final summary
D) She runs the same analysis in a second tool to verify the findings

**Answer: C) She critically evaluates each finding, challenges the email timing finding as potentially a sample size artifact, and adds her own domain context (competitive development) to the final summary**

Alex's evaluation demonstrates appropriate trust calibration: she agrees with Finding 1, flags Finding 2 as potentially a small-sample artifact (two spikes over one quarter is not a robust pattern), and challenges Finding 3 by asking whether the trend is statistically significant. She also adds domain context — a competitive development affecting the paid search picture — that AI could not know. The final summary is her own analytical judgment, informed by AI analysis, not a direct republication of AI output.

Question 8

What is the key discovery in Raj's log analysis scenario, and what does it illustrate about AI-assisted investigation?

A) AI found the root cause independently; the scenario illustrates that AI can replace human investigation for infrastructure issues
B) AI produced incorrect analysis that led Raj to the wrong conclusion; the scenario illustrates AI's limitations in technical domains
C) AI-generated analysis surfaced an anomaly (the 2-4 AM latency window); human judgment recognized the security implications of the pattern; both contributions were necessary
D) The scenario illustrates that log analysis should not use AI because of the security risks involved

**Answer: C) AI-generated analysis surfaced an anomaly (the 2-4 AM latency window); human judgment recognized the security implications of the pattern; both contributions were necessary**

The Raj scenario is designed to illustrate the human-in-the-loop principle applied to technical analysis. AI-written code surfaced the latency anomaly pattern and the hypothesis about external API rate limiting (which proved correct). But the security implication — that seventeen of the twenty slowest requests originated from a suspicious IP pattern suggesting probing — was identified by a human team member reviewing the trace IDs, not by AI. Both contributions were necessary for the full picture.

Question 9

What is the chapter's framework for distinguishing what AI does well versus what requires human judgment in data interpretation?

A) AI handles visualization; humans handle statistical analysis
B) AI is good at pattern description and hypothesis generation; humans must provide causal claims, business context, statistical significance assessment, and challenge to narrative overfitting
C) AI handles the analysis for small datasets; humans handle large datasets
D) AI handles all interpretation; humans only need to verify the charts look correct

**Answer: B) AI is good at pattern description and hypothesis generation; humans must provide causal claims, business context, statistical significance assessment, and challenge to narrative overfitting**

The chapter explicitly lists what AI does well in interpretation (pattern description, hypothesis generation, implication articulation) and what requires human judgment (causal claims, business context, statistical significance assessment, and challenging narrative overfitting). These are not arbitrary distinctions — they reflect the specific ways that AI interpretation fails: it generates causal language beyond what data supports, lacks your specific business context, does not always flag statistical significance concerns, and can construct compelling narratives from insufficient data.

Question 10

What research finding about AI-generated analysis code should influence your Tier 2 verification practices?

A) AI-generated code is 99% accurate and rarely requires review
B) AI-generated code only fails on complex statistical analyses; simple pandas operations are always correct
C) Approximately 25% of first-draft AI analysis code contains logical or technical errors that produce incorrect results without causing code failures
D) AI-generated code contains errors only when the prompt is insufficiently detailed

**Answer: C) Approximately 25% of first-draft AI analysis code contains logical or technical errors that produce incorrect results without causing code failures**

The chapter cites a 2024 study finding approximately 25% error rates in first-draft AI analysis code — including errors that "did not cause code failures; they produced incorrect results silently." The silent failure mode is particularly important: code that fails with an error is obviously broken; code that runs successfully but produces wrong numbers is invisible without verification. This finding directly justifies the requirement to read and understand AI-generated code before running it.
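A tiny illustration of the silent failure mode, assuming numbers that arrive as strings from a CSV (a common real-world situation; the values are invented): the buggy line runs without any exception yet returns the wrong maximum.

```python
# Numbers stored as strings, as they often arrive from a raw CSV read
# (hypothetical values for illustration).
values = ["10", "20", "30", "9"]

# Silent bug: max() on strings compares lexicographically,
# so "9" beats "30" — no error raised, just a wrong answer.
wrong = max(values)
print(wrong)  # prints "9"

# Correct version: convert to numbers before comparing.
right = max(int(v) for v in values)
print(right)  # prints 30
```

This is exactly the kind of error that output inspection alone will often miss — on a dataset without a single-digit value, the buggy version would happen to return the right answer.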

Question 11

What makes Elena's approach to qualitative survey data different from her approach to quantitative data, and why?

A) She uses different AI models for each type of data
B) She uploads quantitative data to an external tool after de-identification, but reads qualitative data herself first and submits only her own theme notes to AI — avoiding the re-identification risk of uploading 800 individual text responses
C) She analyzes quantitative data herself and uses AI only for qualitative data
D) She does not use AI for qualitative analysis at any stage

**Answer: B) She uploads quantitative data to an external tool after de-identification, but reads qualitative data herself first and submits only her own theme notes to AI — avoiding the re-identification risk of uploading 800 individual text responses**

The scenario explains Elena's reasoning: "She cannot upload 800 free-text responses to an external tool — even de-identified, the text volume and content risk re-identification." Her solution is to read 100 responses herself, identify recurring themes, and submit her theme notes (not the original texts) to AI for synthesis. This approach applies AI's synthesis capability without the privacy risk of uploading potentially re-identifiable text. The scenario also illustrates that the qualitative synthesis is accurate because Elena is providing input from her own reading.

Question 12

What should you specify in a visualization request to get a useful chart on the first attempt?

A) Only the chart type — the AI will figure out the data mapping
B) Chart type, what goes on each axis, any grouping or color coding, the desired title, and any specific formatting requirements
C) The underlying data in the prompt — the AI needs to see the raw numbers to create a good chart
D) A description of the insight you want to communicate — the AI will choose the best chart type

**Answer: B) Chart type, what goes on each axis, any grouping or color coding, the desired title, and any specific formatting requirements**

The chapter recommends specifying: chart type, axis content, grouping/color coding, and title. The example prompt demonstrates this level of specificity: "Create a line chart showing monthly revenue for all three product lines on the same chart. Use distinct colors for each product line. Include a legend. Add a trend line for each product line. Title: 'Monthly Revenue by Product Line, 2022-2024'." Vague requests ("show me a chart of the data") produce generic outputs that require multiple iterations. Specific requests get closer to useful on the first attempt.
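Each clause of that example prompt maps onto a concrete chart property, which is why it leaves so little to guess. A matplotlib sketch with invented product-line names and revenue figures (the Agg backend renders without a display; the trend lines from the prompt are omitted to keep the sketch short):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical monthly revenue for three invented product lines.
months = list(range(1, 13))
lines = {
    "Alpha": [10 + m for m in months],
    "Beta": [8 + 1.5 * m for m in months],
    "Gamma": [12 + 0.5 * m for m in months],
}

fig, ax = plt.subplots()
for name, values in lines.items():
    ax.plot(months, values, label=name)  # one line per product, distinct colors
ax.set_title("Monthly Revenue by Product Line, 2022-2024")  # the specified title
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($K)")
ax.legend()  # the legend the prompt asks for
fig.savefig("revenue_by_line.png")
```

Seeing the mapping from prompt clause to chart property makes it easier to notice when an AI-generated chart silently dropped one of your requirements.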

Question 13

What is the appropriate use of Gemini in Google Sheets or Excel Copilot compared to ChatGPT Advanced Data Analysis?

A) Gemini and Copilot are more powerful than Advanced Data Analysis and should always be preferred
B) Advanced Data Analysis is better for large datasets; Gemini/Copilot are better for small datasets
C) Gemini and Copilot are integrated directly into the spreadsheet environment, making them convenient for analysis within existing spreadsheets; Advanced Data Analysis is better for exploratory analysis of new datasets and Python-based computation
D) Gemini and Copilot should not be used because they access your data for training purposes

**Answer: C) Gemini and Copilot are integrated directly into the spreadsheet environment, making them convenient for analysis within existing spreadsheets; Advanced Data Analysis is better for exploratory analysis of new datasets and Python-based computation**

The chapter describes spreadsheet AI as convenient for professionals "whose primary data tool is a spreadsheet application" — the integration within the familiar environment is the advantage. Both tools have the same trust calibration requirements as other AI analysis tools. The choice between them is primarily about workflow fit: if you are already working in Sheets or Excel, the integrated tool is convenient; for more complex analysis or unfamiliar datasets, Advanced Data Analysis offers Python computation capability.

Question 14

What does the research finding about analyst productivity gains suggest about where AI adds the most value for data professionals?

A) AI adds equal value across all analysis tasks for professional data analysts
B) The productivity gain is largest for boilerplate code (loading, cleaning, standard visualizations) and smallest for complex statistical analysis and interpretation
C) AI adds more value for statistical analysis than for data loading and cleaning
D) AI adds value only for junior analysts; senior analysts do not benefit significantly

**Answer: B) The productivity gain is largest for boilerplate code (loading, cleaning, standard visualizations) and smallest for complex statistical analysis and interpretation**

The chapter cites multiple studies finding 30-50% time reduction for professional analysts, with the gain concentrated in routine tasks. This reflects the general pattern: AI is strongest where the task is well-defined and conventional (data loading, cleaning, standard plot types) and weakest where the task requires contextual judgment (complex statistical methodology, domain-specific interpretation). For expert analysts, the implication is to leverage AI heavily for the setup and visualization work while maintaining more oversight of the analysis logic and interpretation.

Question 15

When should you ask AI "what are five plausible hypotheses?" rather than "what is the explanation?" for a data pattern?

A) Only when you are uncertain about the data quality
B) Always — AI should never be asked for a single explanation
C) When you are investigating a pattern rather than communicating a conclusion; hypothesis generation is for exploration, not for final interpretation
D) Only when the pattern involves time-series data

**Answer: C) When you are investigating a pattern rather than communicating a conclusion; hypothesis generation is for exploration, not for final interpretation**

The chapter describes hypothesis generation as "generating multiple plausible explanations efficiently" — the purpose is to "direct investigation productively," not to provide a conclusion. Asking for multiple hypotheses is appropriate at the investigation stage: you have found a pattern and want to understand what might be causing it. You then investigate the most plausible hypotheses through further analysis or domain knowledge. Asking for "the explanation" invites AI to overstate confidence in a single interpretation; asking for multiple hypotheses acknowledges the genuine uncertainty at the investigation stage.