Chapter 16 Quiz: Excel and CSV Integration
Instructions: Select the best answer for each question. The answer key with explanations follows question 20.
Question 1
You call pd.read_excel("report.xlsx") without specifying sheet_name. What does pandas read?
A) All sheets, returned as a dictionary B) The first sheet (index 0), returned as a DataFrame C) The sheet named "Sheet1" — if it doesn't exist, raises a KeyError D) The last sheet in the workbook
Question 2
You want to read all sheets from an Excel workbook at once and get back a dictionary mapping sheet names to DataFrames. Which call achieves this?
A) pd.read_excel("report.xlsx", sheet_name="*")
B) pd.read_excel("report.xlsx", sheet_name=True)
C) pd.read_excel("report.xlsx", sheet_name=None)
D) pd.read_excel("report.xlsx", all_sheets=True)
Question 3
You load an Excel file with load_workbook("report.xlsx") and read a cell that contains the formula =SUM(A2:A10). The cell shows the value 45,000 in Excel. What does ws["B5"].value return in Python?
A) 45000
B) "=SUM(A2:A10)"
C) None — formulas cannot be read
D) A Formula object that must be evaluated separately
Question 4
Which load_workbook argument makes cell .value return the last calculated result instead of the formula string?
A) read_only=True
B) data_only=True
C) formula_mode="result"
D) cache_values=True
Question 5
You want to write multiple DataFrames to different sheets in the same Excel workbook using pandas. Which is the correct approach?
A) Call df.to_excel() multiple times with different sheet_name arguments — each call modifies the existing file
B) Use pd.ExcelWriter as a context manager and call df.to_excel(writer, sheet_name=...) for each DataFrame inside the block
C) Call pd.DataFrame.to_multisheet_excel(sheets_dict) — a built-in pandas method
D) Open the workbook with openpyxl after the first write, then append additional sheets manually
Question 6
In openpyxl, you apply cell.number_format = '"$"#,##0' to a cell containing the integer 580000. What does Excel display?
A) 580000 B) $580,000 C) $580000 D) 580,000
Question 7
You want to apply a bold, white-text font to every cell in the first row of a worksheet. Which code correctly achieves this?
A) ws.row(1).font = Font(bold=True, color="FFFFFF")
B) for cell in ws[1]: cell.font = Font(bold=True, color="FFFFFF")
C) ws.apply_row_style(1, Font(bold=True, color="FFFFFF"))
D) ws.rows[0].style = {"font": {"bold": True, "color": "FFFFFF"}}
Question 8
What does ws.freeze_panes = "A3" do?
A) Freezes column A only, allowing rows to scroll B) Freezes rows 1 and 2 (everything above row 3) so they stay visible while scrolling down C) Locks cells A1 through A3 from editing D) Freezes columns to the left of column A — which means nothing is frozen
Question 9
You want to auto-fit column widths based on content. openpyxl does not have a built-in auto-fit function. What is the standard manual approximation approach?
A) Set all columns to ws.column_dimensions["A"].best_fit = True
B) Iterate over each column, find the maximum string length of any cell value, and set ws.column_dimensions[col_letter].width to that length plus padding
C) Call ws.auto_size_columns() after writing all data
D) Use pd.ExcelWriter with engine="xlsxwriter" which has built-in auto-fit
Question 10
Which openpyxl number format string produces the output 38.5% from the value 0.385?
A) "0.0%"
B) "38.5%"
C) "##.#%"
D) '"{0:.1%}".format(value)'
Question 11
You have a CSV file from a German ERP system that uses semicolons as delimiters. The values include amounts like "1.234,56" (European format: period as thousands separator, comma as decimal). Which pd.read_csv call handles this correctly?
A) pd.read_csv("data.csv", sep=";", decimal=",")
B) pd.read_csv("data.csv", sep=";") — pandas detects decimal format automatically
C) pd.read_csv("data.csv", sep=",", european=True)
D) pd.read_csv("data.csv", sep=";", thousands=".", decimal=",")
Question 12
A CSV file contains the value Smith, John (with a comma inside a person's name). The file uses commas as delimiters. How should this value be stored in the CSV to prevent incorrect parsing?
A) In single quotes: 'Smith, John'
B) In double quotes: "Smith, John" — the CSV standard allows values containing delimiters if surrounded by the quote character
C) With a backslash escape: Smith\, John
D) It cannot be stored in CSV format — use a different format
Question 13
You call pd.read_csv("export.csv", encoding="utf-8") on a Windows-generated file and get a UnicodeDecodeError. What should you try first?
A) encoding="ascii"
B) encoding="utf-16"
C) encoding="cp1252" or encoding="latin-1"
D) encoding="utf-32"
Question 14
What is the on_bad_lines parameter in pd.read_csv and what does on_bad_lines="skip" do?
A) It controls how pandas handles lines with fewer fields than expected; "skip" silently drops those rows
B) It controls encoding errors; "skip" ignores characters that don't decode
C) It controls empty rows; "skip" removes all blank rows automatically
D) It controls comment lines starting with #; "skip" ignores them
Question 15
You need to read a 2 GB CSV file but only care about rows where region == "West". What is the most memory-efficient approach?
A) Read the entire file with pd.read_csv(), then filter with df[df["region"] == "West"]
B) Use pd.read_csv(..., chunksize=50000) and filter each chunk, then concatenate the filtered chunks
C) Convert the CSV to SQLite first, then query with SQL
D) Use pd.read_csv(..., nrows=1000000) to limit the read
Question 16
In the context of openpyxl charts, what does titles_from_data=True do in chart.add_data(ref, titles_from_data=True)?
A) Reads the chart title from the first cell in the data reference range B) Uses the first row of the reference range as the data series name in the chart legend C) Automatically generates axis titles from the column headers D) Reads tooltip text from the cells adjacent to the data range
Question 17
What is the primary purpose of xlwings compared to openpyxl?
A) xlwings reads and writes .xlsx files; openpyxl only reads .xls files
B) xlwings connects to a running Excel process for live interaction including formula recalculation and VBA macros; openpyxl works with saved files
C) xlwings is faster for writing large datasets; openpyxl is faster for reading
D) xlwings supports newer Excel features (.xlsx); openpyxl only supports older formats
Question 18
You build a workbook with pd.ExcelWriter and then try to access the openpyxl workbook object to add formatting. Which is the correct way to access it?
A) openpyxl.load_workbook(writer.path) — load it after writing
B) writer.book — the openpyxl Workbook object accessible during the ExcelWriter context
C) writer.sheets["Sheet1"].workbook
D) writer.get_workbook()
Question 19
Marcus formats his Excel report headers manually every Monday in about 20 minutes. You offer to automate this step with Python. According to the chapter's decision framework, which statement best describes when Python-to-Excel automation is the right choice over doing it manually in Excel?
A) Always — Python is always faster than Excel for formatting B) When the report is generated repeatedly from fresh data, the same structure applies each time, and the manual work exceeds the development cost of the automation over time C) Only when the data comes from a Python analysis — if the data is already in Excel, stay in Excel D) Only when the report has more than 10,000 rows — below that, Excel pivot tables are more appropriate
Question 20
Maya's invoice generator uses Python dataclasses to hold invoice data before writing to Excel. What is the primary software design benefit of this separation?
A) Dataclasses are required by openpyxl for structured data input B) It separates the business logic (calculating totals, formatting line items) from the presentation layer (Excel formatting), making both independently testable and modifiable C) Dataclasses perform faster than dictionaries for accessing Excel cell data D) Python dataclasses automatically validate data types, preventing Excel formatting errors
Answer Key
| Q | Answer | Explanation |
|---|---|---|
| 1 | B | Without sheet_name, pd.read_excel reads the first sheet (index 0) and returns a single DataFrame. |
| 2 | C | sheet_name=None is the correct argument to read all sheets. It returns a dictionary with sheet names as keys and DataFrames as values. |
| 3 | B | Without data_only=True, openpyxl returns the formula string "=SUM(A2:A10)". The calculated value is not returned because openpyxl does not have a formula evaluation engine. |
| 4 | B | data_only=True makes openpyxl return the cached (last calculated) value instead of the formula string. The value is whatever Excel calculated the last time the file was saved. |
| 5 | B | pd.ExcelWriter as a context manager is the correct approach. Multiple df.to_excel(writer, sheet_name=...) calls within the same with block write to the same workbook. The workbook is saved when the context exits. |
| 6 | B | '"$"#,##0'` formats the integer as a dollar amount with comma thousands separator. For `580000` it produces `$580,000. The "$" in the format string is a literal dollar sign. |
| 7 | B | ws[1] in openpyxl returns a tuple of all cells in row 1. Iterating over it with for cell in ws[1] is the correct way to apply formatting to an entire row. |
| 8 | B | The freeze panes value is the first cell that is NOT frozen. "A3" means rows 1 and 2 are frozen (everything above row 3) and no columns are frozen. |
| 9 | B | The standard approach is to iterate over each column's cells, find the maximum string length, and set column_dimensions[col_letter].width to that length plus a small constant (typically 2-4). openpyxl has no built-in auto-fit function. |
| 10 | A | "0.0%" is the openpyxl number format for one decimal place percentage. Excel multiplies the stored value (0.385) by 100 and appends %, displaying 38.5%. |
| 11 | D | Both sep=";" for the semicolon delimiter AND thousands="." and decimal="," for European number formatting are needed. Without both, "1.234,56" would not parse as 1234.56. |
| 12 | B | The CSV standard (RFC 4180) allows any value to be enclosed in double quotes. Values containing the delimiter, double quotes (escaped as "" inside quotes), or newlines must be quoted. This is handled automatically by pandas' to_csv. |
| 13 | C | Files from Windows systems (particularly from Office applications) often use cp1252 (Windows-1252) or latin-1 encoding. These are the first encodings to try after UTF-8 fails. |
| 14 | A | on_bad_lines controls handling of lines with more or fewer fields than the header row. "skip" silently drops the problematic rows. "error" (the default) raises an exception. Use "skip" with caution — it can mask data quality issues. |
| 15 | B | Chunk processing is the correct approach. Each chunk is a smaller DataFrame that fits in memory. Filter within the chunk loop and concatenate only the matching rows. This keeps memory usage proportional to the filtered result, not the full file. |
| 16 | B | titles_from_data=True instructs openpyxl to treat the first row of the data reference as series name(s) for the chart legend rather than as data points. |
| 17 | B | xlwings connects to a live Excel process, enabling formula recalculation, macro invocation, and bidirectional real-time interaction. openpyxl reads and writes saved .xlsx files without requiring Excel to be installed. |
| 18 | B | writer.book is the underlying openpyxl Workbook object. Access it inside the with pd.ExcelWriter(...) as writer: context block. writer.sheets["Sheet1"] gives access to individual worksheets. |
| 19 | B | The chapter's decision framework states: use Python → Excel when the report is generated repeatedly from fresh data, the structure is consistent, and the automation cost is justified by time savings over multiple runs. One-time or highly variable reports are better done manually in Excel. |
| 20 | B | Separating data models from rendering (the "separation of concerns" principle) means the invoice business logic can be tested independently of the Excel code. If Maya changes her Excel formatting, she doesn't touch the calculation logic, and vice versa. |
Score guide: 18-20 correct = Mastery | 14-17 = Proficient | 10-13 = Review sections 16.2-16.8 | Below 10 = Re-read the full chapter