Chapter 34 Quiz: On-Chain Analytics

Multiple Choice Questions

Question 1

What is the primary advantage of blockchain data over traditional financial data for analytical purposes?

A) It is more accurate because smart contracts never have bugs B) Every transaction is permanently public, immutable, and verifiable without institutional permission C) It is easier to query because blockchains use SQL databases D) It updates faster because there are no regulatory reporting delays

Answer: B Explanation: The defining feature of public blockchain data is its unconditional transparency --- anyone can read every transaction ever recorded, without needing permission from a bank, regulator, or data provider. Option A is false (smart contracts can have bugs). Option C is false (raw blockchain data is not in SQL format --- platforms like Dune Analytics add that layer). Option D is partially true for real-time settlement, but the fundamental advantage is access, not speed.


Question 2

Which layer of the on-chain analytics stack would you use to run a complex SQL query joining Uniswap swap events with ERC-20 token transfers across 30 days of Ethereum history?

A) A full Ethereum archive node B) The Etherscan REST API C) Dune Analytics or Flipside Crypto D) The Whale Alert Twitter bot

Answer: C Explanation: Query platforms like Dune Analytics pre-index and decode blockchain data into relational tables, enabling complex SQL queries with joins across multiple data types. A full node (A) stores raw data but has no SQL interface. The Etherscan API (B) handles individual address lookups but cannot join across tables. Whale Alert (D) is a monitoring service, not an analytical platform.


Question 3

When fetching transaction data from the Etherscan API, what is the maximum number of results returned per query?

A) 1,000 B) 5,000 C) 10,000 D) Unlimited

Answer: C Explanation: Etherscan limits API responses to 10,000 results per query. For addresses with more transactions, you must paginate using block number ranges --- fetching transactions from block 0 to the last block in the previous batch, then continuing from the next block.


Question 4

What does the NVT (Network Value to Transactions) ratio measure?

A) The number of nodes validating transactions on the network B) The ratio of a network's market capitalization to its daily transaction volume in USD C) The percentage of network value locked in DeFi protocols D) The transaction throughput capacity of the network

Answer: B Explanation: The NVT ratio, proposed by Willy Woo, divides market cap by daily transaction volume (USD). It functions as a blockchain equivalent of the price-to-earnings ratio --- a high NVT suggests the network's valuation outpaces its economic utility, while a low NVT suggests the opposite.


Question 5

A Gini coefficient of 0.97 for a governance token's distribution indicates:

A) Near-perfect equality --- almost every address holds the same amount B) Moderate concentration --- the top holders control about half the supply C) Extreme concentration --- a very small number of addresses hold the vast majority of tokens D) That the token has experienced hyperinflation

Answer: C Explanation: The Gini coefficient ranges from 0 (perfect equality) to 1 (perfect inequality, where one entity holds everything). A value of 0.97 is extremely high, indicating severe concentration. However, as the chapter notes, this must be interpreted carefully --- exchange addresses hold tokens on behalf of many users, and smart contract addresses (like liquidity pools) represent collective ownership.


Question 6

Positive exchange net flow (more tokens flowing into exchanges than out) is generally interpreted as:

A) A bullish signal, indicating that investors are buying B) A bearish signal, indicating potential selling pressure C) A neutral signal with no market implications D) An indicator that the exchange is being hacked

Answer: B Explanation: Users typically deposit tokens to exchanges when they intend to sell. Therefore, net positive inflow (more going in than coming out) suggests selling pressure, which is interpreted as bearish. Net negative flow (more leaving exchanges) suggests accumulation, which is bullish. However, this is a heuristic --- deposits can also be for staking, lending, or collateral purposes.


Question 7

In Bitcoin, the common-input-ownership heuristic for wallet clustering works because:

A) Bitcoin transactions always have exactly one input and one output B) All input addresses in a multi-input transaction must be controlled by the same entity C) Bitcoin addresses are linked to real names through KYC requirements D) Bitcoin miners can see which wallets belong to the same user

Answer: B Explanation: In Bitcoin's UTXO model, a transaction can consume multiple inputs. To spend a UTXO, you must sign with its private key. Therefore, all inputs in a single transaction must be controlled by the entity that created the transaction. This allows analysts to cluster addresses that appear as co-inputs in any transaction.


Question 8

Which of the following is the strongest indicator of NFT wash trading?

A) An NFT selling for more than its floor price B) The buyer's wallet was directly funded by the seller's wallet shortly before the purchase C) The transaction occurred during low-gas-fee hours D) The NFT was listed on multiple marketplaces simultaneously

Answer: B Explanation: Circular fund flows --- where the seller funds the buyer before the "sale" --- are the strongest indicator of wash trading. This pattern reveals that a single entity is trading with itself to create artificial volume or inflate price history. The other options describe normal market behavior.


Question 9

What is a Sybil attack in the context of cryptocurrency airdrops?

A) A 51% attack on a proof-of-work blockchain B) Creating many wallets to qualify for multiple airdrop allocations from a single real person C) Exploiting a smart contract vulnerability to steal airdropped tokens D) Using a mixer to anonymize airdropped tokens after receiving them

Answer: B Explanation: A Sybil attack in the airdrop context involves one person creating many blockchain addresses and performing qualifying actions with each one, in order to receive multiple airdrop allocations. The term comes from the 1973 book "Sybil" about dissociative identity disorder. Detection relies on identifying clusters of addresses with common funding sources and identical behavioral patterns.


Question 10

Why does the Ethereum clustering heuristic differ from the Bitcoin clustering heuristic?

A) Ethereum is a private blockchain with no public transactions B) Ethereum uses an account model rather than a UTXO model, so multi-input transactions do not exist C) Ethereum addresses are already linked to real identities D) Ethereum transactions are encrypted and cannot be analyzed

Answer: B Explanation: Bitcoin uses the UTXO (Unspent Transaction Output) model where transactions consume multiple inputs, enabling the common-input-ownership heuristic. Ethereum uses an account model where each transaction has exactly one sender, eliminating this particular heuristic. Ethereum clustering instead relies on behavioral heuristics: funding source analysis, contract interaction patterns, and temporal correlations.


Question 11

Total Value Locked (TVL) as a DeFi metric can be misleading because:

A) It only counts tokens, not NFTs B) It can be inflated by recursive borrowing, double-counting across protocols, and pure price appreciation C) It does not include governance tokens D) It is calculated differently by each analytics platform

Answer: B Explanation: TVL has well-documented inflation mechanisms: recursive borrowing (deposit, borrow, re-deposit), double-counting when the same assets appear in multiple protocols, and price appreciation (if ETH doubles, all ETH-denominated TVL doubles without new deposits). While D is also partially true, B is the most comprehensive and significant limitation.


Question 12

The "measurability bias" in on-chain analytics refers to:

A) The tendency to only analyze tokens with high trading volume B) The systematic bias toward treating quantifiable on-chain metrics as the only metrics that matter, while ignoring unmeasurable factors like team quality, code security, and user satisfaction C) The bias introduced by Ethereum's gas pricing mechanism D) The tendency for on-chain metrics to be more accurate for Bitcoin than for Ethereum

Answer: B Explanation: Measurability bias is the tendency to equate "what can be measured" with "what matters." On-chain data provides precise metrics about transactions, addresses, and token movements, but tells you nothing about developer quality, regulatory risk, user experience, or protocol security. An analyst who relies exclusively on quantifiable on-chain data may miss critical qualitative factors.


Question 13

When analyzing governance participation for a DAO, which metric combination best indicates healthy decentralized governance?

A) High TVL + high token price B) High voter turnout + low voting power concentration + diverse proposal creators C) Many total token holders + high trading volume D) Fast block confirmation times + low gas costs

Answer: B Explanation: Healthy governance requires broad participation (high turnout), distributed power (low concentration, meaning no single whale dominates all votes), and diverse engagement (multiple addresses creating proposals, not just the founding team). TVL, token price, and trading volume measure economic activity, not governance health. Block times and gas costs are network-level metrics unrelated to governance.


Question 14

The Graph protocol uses which query language to serve indexed blockchain data?

A) SQL B) REST API with JSON C) GraphQL D) SPARQL

Answer: C Explanation: The Graph uses GraphQL, a query language that allows clients to specify exactly which fields they need. Developers define "subgraphs" that specify indexing schemas for specific smart contracts, and these are queried via GraphQL endpoints. This is different from Dune Analytics (SQL), Etherscan (REST/JSON), and traditional linked data systems (SPARQL).


Question 15

An on-chain analyst observes that a new DeFi protocol has 50,000 unique addresses, $200 million in TVL, but a cohort retention rate of only 5% after 30 days. The most likely explanation is:

A) The protocol is experiencing rapid organic growth B) The protocol is attracting mercenary capital and airdrop farmers who leave after claiming incentives C) The protocol has a critical security vulnerability D) The blockchain explorer is miscounting addresses

Answer: B Explanation: High initial user count with extremely low retention (5% at 30 days) is the classic signature of incentive farming --- users arrive for yield farming rewards or airdrop eligibility, interact with the protocol enough to qualify, and then leave. This pattern was widely observed during "DeFi Summer" (2020) and subsequent token launch periods. Real product-market fit shows much higher retention rates.


Short Answer Questions

Question 16

Explain why a full Ethereum archive node, despite being the most complete data source, is rarely used directly for analytics work. What trade-offs does it involve?

Model Answer: A full Ethereum archive node contains every historical state of every account and smart contract --- the most complete possible data source. However, it requires 12+ TB of storage and can take weeks to sync. More critically, its JSON-RPC API is designed for individual lookups, not analytical queries. Asking a question that requires scanning millions of blocks (like "total Uniswap volume in the last month") would involve sequential block-by-block queries, taking hours or days. Analytics platforms like Dune Analytics solve this by pre-indexing and decoding blockchain data into relational tables optimized for analytical queries. The trade-off is between completeness/independence (full node) and convenience/speed (indexed platform). Most analysts use indexed platforms for daily work and fall back to full nodes only when they need data that no third-party service provides, or when they require complete query privacy.


Question 17

Describe two distinct techniques for detecting Sybil attacks on airdrops, and explain one limitation of each technique.

Model Answer: (1) Common funding source analysis: Identify clusters of addresses that all received their initial ETH from the same address or small set of addresses. If 50 wallets were all funded by the same source and then performed identical qualifying actions, they are likely Sybil addresses. Limitation: Some legitimate funding sources (payroll contracts, faucets, centralized exchange withdrawals) fund many unrelated addresses, producing false positives. (2) Behavioral similarity clustering: Compare the sequence and timing of transactions across addresses. Sybil addresses often perform the same actions in the same order at evenly spaced time intervals, because they are driven by automated scripts. Limitation: Sophisticated farmers randomize their action sequences and timing to avoid detection, making behavioral clustering less effective against well-funded, technically sophisticated attackers who are aware of these detection methods.


Question 18

A colleague claims: "This token has a Gini coefficient of 0.98, so it's basically controlled by five whales and is a scam." Provide a nuanced response explaining why this interpretation may be incorrect.

Model Answer: A Gini coefficient of 0.98 indicates extreme concentration, but this alone does not mean the token is controlled by "five whales" or is a scam. Several legitimate factors inflate token Gini coefficients: (1) Exchange addresses hold tokens on behalf of millions of individual users --- Binance's hot wallet might hold 15% of a token's supply representing thousands of customers. (2) Team and treasury addresses hold tokens earmarked for grants, development, and ecosystem growth, not personal wealth. (3) Smart contract addresses (like Uniswap liquidity pools or staking contracts) hold tokens that are collectively owned by many liquidity providers or stakers. (4) Burned or locked tokens in dead addresses inflate apparent concentration. Before concluding that a token is whale-controlled, an analyst must decompose the top holders by entity type: separate exchange addresses, treasury addresses, smart contracts, and known locked/burned supplies from genuine individual whale holdings. Only the residual concentration among identifiable individual wallets indicates real plutocratic control.


Question 19

Explain the "observer effect" in on-chain analytics and give one concrete example of how it changes the behavior being observed.

Model Answer: The observer effect in on-chain analytics refers to the phenomenon where the act of publicly monitoring blockchain activity causes sophisticated actors to change their behavior to avoid detection. Because blockchain analysis tools and their methodologies are widely known, the entities being studied adapt their strategies. Concrete example: When whale alert services began monitoring large Bitcoin transfers and broadcasting them on social media, institutional traders and large holders began splitting their orders across multiple wallets and executing them over longer time periods rather than making single large transfers. A fund that might have previously moved 5,000 BTC in one transaction now moves 500 BTC through ten different addresses over a week, specifically to avoid triggering whale alerts that could move the market against them before their order is complete. This makes single-transfer whale metrics less reliable over time, because the most sophisticated whales are precisely the ones most likely to structure their transactions to avoid detection. The measurable activity becomes an increasingly biased sample skewed toward less sophisticated participants.


Question 20

You are building an on-chain analytics dashboard for a new DeFi lending protocol. Name four metrics you would include, explain what each one measures, and identify one limitation or caveat for each metric.

Model Answer: (1) Total Value Locked (TVL): Total USD value of assets deposited in the protocol's smart contracts. Limitation: TVL inflates when underlying asset prices rise, even without new deposits; also inflatable through recursive borrowing. (2) Daily active borrowers: Number of unique addresses with active loans. Limitation: A single entity can use multiple addresses, and bot activity can inflate unique address counts. (3) Utilization rate: Percentage of deposited assets currently lent out (borrowed amount / total deposits). Limitation: Extremely high utilization may indicate risk (depositors cannot withdraw), not health. (4) 30-day user retention rate: Percentage of users who interacted with the protocol 30 days ago who are still active today. Limitation: Short-term incentive programs can temporarily inflate retention; also, DeFi users may legitimately need to borrow only once (e.g., for a specific purchase), so low retention does not always indicate a problem. Each metric provides one facet of protocol health, but none tells the complete story alone.