Chapter 7 Quiz: Aggregation and Grouping

DataField.Dev

Chapter 7 Quiz: Aggregation and Grouping

Test your understanding of Chapter 7. Attempt all questions from memory before revealing the answers. The act of retrieval — even when you get it wrong — strengthens learning more than re-reading.

Scoring Guide: - 22-25 correct: Excellent — you have strong command of aggregation and grouping - 17-21 correct: Good — review the sections where you missed questions - 12-16 correct: Fair — re-read the chapter with the quiz questions in mind - Below 12: Re-read the chapter carefully, then retake this quiz in 48 hours

Multiple Choice Questions

Question 1

What does COUNT(*) count?

A) Non-NULL values in the first column B) Distinct values in the result set C) All rows, regardless of NULL values D) Only rows where no column is NULL

Answer

**C) All rows, regardless of NULL values** COUNT(*) counts rows. It does not examine any column and is completely unaffected by NULLs. If the table has 35 rows, COUNT(*) returns 35 even if every column in every row is NULL.

Question 2

A column contains these values: 10, 20, NULL, 30, NULL. What does AVG(column) return?

A) 12 (60 / 5) B) 20 (60 / 3) C) NULL D) 0

Answer

**B) 20 (60 / 3)** AVG ignores NULLs. It computes SUM(non-null values) / COUNT(non-null values) = (10 + 20 + 30) / 3 = 20. The two NULL rows are excluded from both the numerator and the denominator.

Question 3

What does SUM return when applied to a column where every value is NULL (or the result set is empty)?

A) 0 B) NULL C) An error D) -1

Answer

**B) NULL** SUM over an empty set or a set of all NULLs returns NULL, not zero. This is why defensive coding uses COALESCE(SUM(column), 0) when the result might be used in arithmetic or displayed to users.

Question 4

Which of the following is a valid SELECT list when GROUP BY BRANCH_ID is specified?

A) SELECT BRANCH_ID, ACCOUNT_NUMBER, COUNT() B) SELECT BRANCH_ID, MAX(ACCOUNT_NUMBER), COUNT() C) SELECT ACCOUNT_NUMBER, COUNT(*) D) SELECT BRANCH_ID, ACCOUNT_NUMBER

Answer

**B) SELECT BRANCH_ID, MAX(ACCOUNT_NUMBER), COUNT(*)** Every column in the SELECT list must either appear in the GROUP BY clause or be inside an aggregate function. BRANCH_ID is in GROUP BY, MAX(ACCOUNT_NUMBER) is an aggregate, and COUNT(*) is an aggregate. Option A fails because ACCOUNT_NUMBER is neither grouped nor aggregated. Option C fails for the same reason. Option D has no aggregates at all, and ACCOUNT_NUMBER is not in GROUP BY.

Question 5

Where should you place a filter that tests an aggregate value, such as SUM(BALANCE) > 100000?

A) WHERE clause B) HAVING clause C) ON clause D) Either WHERE or HAVING

Answer

**B) HAVING clause** Aggregate conditions must go in HAVING because WHERE is evaluated before groups are formed and aggregates are computed. Placing SUM(BALANCE) > 100000 in WHERE causes a syntax error.

Question 6

You want to filter rows where BRANCH_ID = 1 before grouping. Where should this condition go?

A) HAVING clause only B) WHERE clause only C) Either WHERE or HAVING (same result) D) WHERE clause (preferred for performance)

Answer

**D) WHERE clause (preferred for performance)** BRANCH_ID is not an aggregate, so the filter works in both WHERE and HAVING and produces the same result. However, WHERE is preferred because it reduces the number of rows entering the GROUP BY operation, making the query more efficient.

Question 7

In the logical order of SQL evaluation, which step comes first?

A) HAVING B) GROUP BY C) WHERE D) SELECT

Answer

**C) WHERE** The logical order is: FROM -> JOIN/ON -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> FETCH FIRST. WHERE is evaluated before GROUP BY and HAVING.

Question 8

What does GROUPING SETS ((a, b), (a), ()) produce?

A) One grouping level: a and b together B) Two grouping levels: (a,b) detail and grand total C) Three grouping levels: (a,b) detail, subtotal by a, and grand total D) Four grouping levels: (a,b), (a), (b), and grand total

Answer

**C) Three grouping levels: (a,b) detail, subtotal by a, and grand total** Each set in GROUPING SETS defines one grouping level. (a, b) groups by both columns, (a) groups by a alone (a subtotal across all b values), and () is the grand total with no grouping columns.

Question 9

GROUP BY ROLLUP (x, y, z) produces how many grouping levels?

A) 3 B) 4 C) 7 D) 8

Answer

**B) 4** ROLLUP with N columns produces N+1 grouping levels. ROLLUP(x, y, z) is equivalent to GROUPING SETS ((x,y,z), (x,y), (x), ()). That is 4 levels: the detail, subtotal dropping z, subtotal dropping y and z, and the grand total.

Question 10

GROUP BY CUBE (a, b) produces how many grouping levels?

A) 2 B) 3 C) 4 D) 8

Answer

**C) 4** CUBE with N columns produces 2^N grouping levels. CUBE(a, b) = 2^2 = 4 levels: (a,b), (a), (b), and (). CUBE produces every possible combination of the grouping columns, including individual columns that ROLLUP would not produce.

Question 11

What does GROUPING(column_name) return when the column is part of a summary (rolled-up) row?

A) 0 B) 1 C) NULL D) The column's value

Answer

**B) 1** GROUPING(column) returns 1 when the column has been aggregated away (the NULL in the column represents a summary row, not actual NULL data). It returns 0 when the column is a real grouping value.

Question 12

When using a LEFT JOIN with GROUP BY, why should you use COUNT(right_table.column) instead of COUNT(*)?

A) COUNT() causes an error with LEFT JOIN B) COUNT() counts unmatched rows as 1; COUNT(column) correctly returns 0 for them C) COUNT(column) is faster D) There is no difference

Answer

**B) COUNT(*) counts unmatched rows as 1; COUNT(column) correctly returns 0 for them** With a LEFT JOIN, unmatched rows from the right table have NULLs in all right-table columns. COUNT(*) counts these rows because it counts rows regardless of values. COUNT(right_table.column) returns 0 for these groups because the column is NULL, giving you the correct count of matched rows.

Question 13

You have a LEFT JOIN between BRANCHES and ACCOUNTS with a filter a.STATUS = 'A'. Where should this filter go to preserve branches with no active accounts?

A) In the WHERE clause B) In the ON clause of the JOIN C) In the HAVING clause D) It does not matter

Answer

**B) In the ON clause of the JOIN** Putting the filter in WHERE would eliminate the NULL rows produced by the LEFT JOIN for branches with no active accounts, effectively converting the LEFT JOIN to an INNER JOIN. Placing it in the ON clause means "only join active accounts, but keep all branches regardless."

Question 14

What is the standard pattern for producing a financial report that shows branch names (not just IDs) with aggregate metrics?

A) Aggregate first, then join to get names B) Join to get names, then aggregate C) Use a subquery for names D) Use BRANCH_ID only; names are added by the application

Answer

**B) Join to get names, then aggregate** The standard reporting pattern is: JOIN tables to assemble raw data with descriptive columns, filter with WHERE, GROUP BY to aggregate, filter groups with HAVING if needed, and ORDER BY for presentation. The descriptive columns (like branch name) go in both SELECT and GROUP BY.

Question 15

Which technique would you use to split a SUM into "credit" and "debit" components in a single query?

A) Two separate queries with UNION ALL B) CASE expression inside SUM C) HAVING with two conditions D) Two GROUP BY clauses

Answer

**B) CASE expression inside SUM** Conditional aggregation uses `SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END)` for credits and a similar expression for debits. This computes both values in a single pass through the data, which is more efficient than running two separate queries.

Question 16

ROLLUP(STATE, BRANCH) will produce subtotals for each state. Will it also produce subtotals for each branch across all states?

A) Yes B) No

Answer

**B) No** ROLLUP removes columns from right to left. ROLLUP(STATE, BRANCH) produces: (STATE, BRANCH) detail, (STATE) subtotals per state, and () grand total. It does NOT produce (BRANCH) subtotals across all states. To get that, you would need CUBE(STATE, BRANCH) or an explicit GROUPING SETS that includes (BRANCH).

Question 17

A column has values: 5, 5, 10, 10, 10. What does COUNT(DISTINCT column) return?

A) 2 B) 3 C) 5 D) 10

Answer

**A) 2** COUNT(DISTINCT column) counts unique non-NULL values. There are two distinct values: 5 and 10. The count of rows (5) or the sum of values is irrelevant.

Question 18

When GROUP BY includes a column with NULL values, how are the NULLs handled?

A) Rows with NULL are excluded from all groups B) Rows with NULL cause an error C) All NULL values are placed in a single group D) Each NULL value gets its own group

Answer

**C) All NULL values are placed in a single group** This is one of the few places in SQL where NULLs are treated as "equal" to each other. All rows where the grouping column is NULL are collected into a single group and aggregated together.

Question 19

What is the purpose of COALESCE(SUM(column), 0)?

A) To replace individual NULL values before summing B) To guarantee a 0 result instead of NULL when SUM operates on an empty set C) To improve performance of the SUM operation D) To convert the SUM result to an integer

Answer

**B) To guarantee a 0 result instead of NULL when SUM operates on an empty set** SUM over an empty result set (or all NULLs) returns NULL. COALESCE converts that NULL to 0. This is defensive coding that prevents NULL propagation in reports and downstream calculations. To replace NULLs before summing, you would use SUM(COALESCE(column, 0)), which is a different pattern.

Question 20

CUBE(a, b, c) is equivalent to GROUPING SETS with how many individual sets?

A) 4 B) 6 C) 7 D) 8

Answer

**D) 8** CUBE with N columns produces 2^N grouping sets. CUBE(a, b, c) = 2^3 = 8 sets: (a,b,c), (a,b), (a,c), (b,c), (a), (b), (c), ().

Question 21

Which DB2 function returns the population standard deviation?

A) STDEV() B) STDDEV() C) STDDEV_SAMP() D) STD()

Answer

**B) STDDEV()** STDDEV() computes the population standard deviation. STDDEV_SAMP() computes the sample standard deviation (using N-1 in the denominator). The distinction matters when the data represents a sample from a larger population rather than the entire population.

Question 22

In a report using ROLLUP, you see a row where STATE_CODE is 'CT' and BRANCH_NAME is NULL. What does this row represent?

A) A branch in Connecticut whose name is unknown B) The subtotal for all branches in Connecticut C) An error in the data D) The grand total

Answer

**B) The subtotal for all branches in Connecticut** In ROLLUP(STATE_CODE, BRANCH_NAME), a NULL in BRANCH_NAME while STATE_CODE has a value indicates a state-level subtotal. The grand total would have NULLs in both columns. To confirm programmatically, use GROUPING(BRANCH_NAME) which returns 1 for summary rows.

Question 23

Which of the following is more efficient and why?

Version A: GROUP BY x HAVING x = 5 Version B: WHERE x = 5 GROUP BY x

A) Version A — HAVING is optimized for filters B) Version B — WHERE filters before grouping reduces work C) They are identical in performance D) Version A — it avoids an index scan

Answer

**B) Version B — WHERE filters before grouping reduces work** WHERE eliminates non-matching rows before they enter the GROUP BY operation, meaning fewer rows need to be sorted or hashed. HAVING filters after grouping, so all rows must first be grouped. Both produce the same result, but WHERE reduces the work performed by the GROUP BY step.

Question 24

What is a "covering index" in the context of aggregation queries?

A) An index that covers the entire table B) An index that contains all columns referenced in the query, enabling index-only access C) An index that automatically maintains aggregated values D) An index created specifically for ROLLUP operations

Answer

**B) An index that contains all columns referenced in the query, enabling index-only access** A covering index includes every column used in the SELECT, WHERE, GROUP BY, and HAVING clauses. When such an index exists, DB2 can satisfy the entire query by reading only the index without touching the base table. This is called index-only access and is extremely fast for aggregation queries.

Question 25

What is a Materialized Query Table (MQT) in DB2, and why is it relevant to aggregation?

A) A temporary table that exists only during a query B) A table that stores pre-computed aggregation results, refreshed periodically, to avoid recomputing expensive GROUP BY queries every time C) A virtual table defined by a view D) A table that stores EXPLAIN output

Answer

**B) A table that stores pre-computed aggregation results, refreshed periodically, to avoid recomputing expensive GROUP BY queries every time** Materialized Query Tables (MQTs) are DB2's implementation of materialized views. For aggregation queries that run frequently against large tables (e.g., a daily branch summary report run 50 times per day), the MQT computes and stores the result once, and subsequent queries read from the MQT instead of re-aggregating millions of rows.

Answer Summary

Question	Answer	Section
1	C	7.1
2	B	7.1, 7.4
3	B	7.4
4	B	7.2
5	B	7.3
6	D	7.3
7	C	7.3
8	C	7.5
9	B	7.6
10	C	7.7
11	B	7.8
12	B	7.9
13	B	7.9
14	B	7.9
15	B	7.10
16	B	7.6
17	A	7.1
18	C	7.4
19	B	7.4
20	D	7.7
21	B	7.1
22	B	7.6, 7.8
23	B	7.3
24	B	7.11
25	B	7.11