Chapter 15 Quiz: SIMD Programming

Instructions: Questions 1-18 are worth 1 point each. Questions 19-20 are worth 2 points each. Total: 22 points.


Question 1. How many 32-bit floats fit in a YMM register?

A) 2
B) 4
C) 8
D) 16


Question 2. Which instruction requires 16-byte alignment of its memory operand?

A) MOVUPS
B) MOVAPS
C) MOVDQU
D) LDDQU


Question 3. What does ADDPS xmm0, xmm1 compute?

A) Sums all 4 floats in xmm0 and stores the scalar result in xmm0[0]
B) Adds the low float of xmm1 to the low float of xmm0
C) Independently adds each of the 4 float pairs: xmm0[i] += xmm1[i]
D) Adds xmm0 and xmm1 as 128-bit integers


Question 4. Which ISA extension introduced the 256-bit YMM registers?

A) SSE2
B) SSE4.2
C) AVX
D) AVX-512


Question 5. What is the purpose of VZEROUPPER?

A) Zero all YMM registers
B) Zero the upper 128 bits of all YMM registers and prevent AVX-SSE transition penalties
C) Set all bits in the upper half of each YMM register to 1
D) Convert YMM registers to ZMM registers


Question 6. VFMADD213PS ymm0, ymm1, ymm2 computes:

A) ymm0 = ymm0 × ymm2 + ymm1
B) ymm0 = ymm1 × ymm0 + ymm2
C) ymm0 = ymm1 × ymm2 + ymm0
D) ymm0 = ymm0 × ymm1 − ymm2


Question 7. After executing SHUFPS xmm0, xmm0, 0x00, what does xmm0 contain?

A) xmm0 is zeroed
B) All 4 lanes contain xmm0's original element 0 (broadcast)
C) All 4 lanes contain xmm0's original element 3
D) xmm0 is unchanged


Question 8. How many AES rounds does AESENC perform?

A) 1 round
B) 4 rounds
C) 10 rounds
D) It depends on the key size


Question 9. For AES-128 encryption, how many AESENC instructions are needed per block (not counting the final AESENCLAST)?

A) 8
B) 9
C) 10
D) 11


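Question 10. In AES-128 key expansion, AESKEYGENASSIST xmm1, xmm0, 0x01 is typically followed by PSHUFD xmm1, xmm1, 0xFF. The dword that PSHUFD broadcasts is computed from which bytes of xmm0?

A) Bytes [3:0] (the low dword)
B) Bytes [15:12] (the high dword)
C) All 16 bytes equally
D) Bytes [7:4] (the second dword)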


Question 11. In AES-CTR mode, what is XORed with the plaintext to produce ciphertext?

A) The encryption key directly
B) The previous ciphertext block
C) The encrypted counter block
D) The AES S-box output for the counter


Question 12. What is the latency of VFMADD213PS on Intel Haswell?

A) 1 cycle
B) 3 cycles
C) 5 cycles
D) 10 cycles


Question 13. Which of the following is NOT a reason to prefer AES-NI over software AES?

A) AES-NI is faster (1-3 cycles per round vs. 20+ cycles)
B) AES-NI is constant-time and eliminates cache-timing attacks
C) AES-NI supports larger key sizes than software AES
D) AES-NI reduces code complexity


Question 14. After VMOVAPS ymm0, [memory], what is the relationship between ymm0 and xmm0?

A) xmm0 contains the upper 128 bits of ymm0
B) xmm0 contains the lower 128 bits of ymm0
C) xmm0 is unchanged; they are separate registers
D) xmm0 contains all zeros after any AVX instruction


Question 15. PCMPEQD xmm0, xmm1 produces:

A) A single bit: 1 if all dwords are equal, 0 otherwise
B) A packed result: 0xFFFFFFFF in each lane where equal, 0x00000000 where not equal
C) A single integer: the count of matching dword lanes
D) Sets ZF=1 if all dwords are equal


Question 16. A horizontal sum of 8 floats in ymm0 requires which step that would NOT be needed for 4 floats in xmm0?

A) Using FMA instead of ADDPS
B) Extracting the upper 128-bit lane with VEXTRACTF128 before the final reduction
C) Using VZEROUPPER before the reduction
D) Storing to memory before summing


Question 17. Software AES using T-tables is vulnerable to cache-timing attacks because:

A) T-table lookups always take more than 100 cycles
B) The memory addresses accessed depend on the key and plaintext, leaking information via cache state
C) T-tables cannot represent all 256 AES S-box values
D) Software AES produces incorrect output for certain key values


Question 18. What does PADDSB xmm0, xmm1 do differently from PADDB xmm0, xmm1?

A) PADDSB operates on 16-bit integers; PADDB operates on 8-bit integers
B) PADDSB saturates at 127/−128 instead of wrapping around
C) PADDSB is slower but more accurate
D) There is no difference; they are aliases


Question 19 (2 points). A programmer writes the following SIMD code:

vmovaps  ymm0, [rel array_a]     ; load 8 floats from A
vmovaps  ymm1, [rel array_b]     ; load 8 floats from B
vaddps   ymm0, ymm0, ymm1        ; ymm0 = A + B
; ... (many more AVX instructions) ...
movaps   xmm2, [rel array_c]     ; legacy SSE load
addps    xmm0, xmm2              ; legacy SSE add
; ... (more SSE instructions) ...
vaddps   ymm3, ymm4, ymm5        ; back to AVX

Identify all problems in this code sequence and explain:

(a) What AVX-SSE transition problem occurs, and on which specific instruction(s)?
(b) How does the programmer fix it?
(c) Immediately after ADDPS xmm0, xmm2, what does xmm0 contain, and what happens to the upper 128 bits of ymm0, which were non-zero before the transition?


Question 20 (2 points). Explain why AES-NI instructions eliminate cache-timing attacks against AES:

(a) What property of AESENC / AESENCLAST makes them immune to cache-timing attacks?
(b) A security researcher claims: "Even with AES-NI, an AES implementation can still leak key material via timing." Describe one realistic scenario where this is true (other than a side-channel in the key schedule expansion).
(c) The following pseudocode is from a "constant-time" AES implementation:

void aes_encrypt(uint8_t *ct, const uint8_t *pt, const uint8_t *key, int key_len) {
    if (key_len == 128) {
        aes128_encrypt_ni(ct, pt, key);
    } else if (key_len == 256) {
        aes256_encrypt_ni(ct, pt, key);
    }
    // key_len == 192 case not implemented, silently does nothing
}

Identify at least two security problems beyond the obvious missing case.