Chapter 34 Exercises: On-Chain Analytics
Exercise 34.1: Etherscan API Exploration (Beginner)
Objective: Become comfortable querying the Etherscan API and converting results to pandas DataFrames.
Tasks:
- Register for a free Etherscan API key at etherscan.io/apis. Set it as an environment variable `ETHERSCAN_API_KEY`.
- Using the `EtherscanClient` class from `code/etherscan_query.py`, fetch the last 100 transactions for the Ethereum Foundation's donation address: `0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe`.
- Convert the results to a pandas DataFrame using `transactions_to_dataframe()`. Answer the following questions:
  - What is the date range of the 100 most recent transactions?
  - What are the mean and median transaction values in ETH?
  - What is the total gas cost (in ETH) across all 100 transactions?
  - How many unique sender addresses appear?
- Create a bar chart showing the number of transactions per day for this address.
- Identify the single largest transaction by value. What was the amount, and when did it occur?
Deliverable: A Python script that produces the five answers and one chart. Include comments explaining what each Etherscan API parameter does.
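As a starting point, the summary questions reduce to a few DataFrame operations. The sketch below assumes illustrative column names (`timestamp`, `value_eth`, `gas_used`, `gas_price_wei`, `from`) rather than the exact schema produced by `transactions_to_dataframe()`; adapt the names to whatever your DataFrame contains.

```python
import pandas as pd

def summarize_transactions(df: pd.DataFrame) -> dict:
    """Answer the exercise's five questions from a transaction DataFrame.

    Column names here are assumptions about a txlist-style result:
    `timestamp` (datetime), `value_eth`, `gas_used`, `gas_price_wei`,
    and `from` -- rename as needed for the chapter's actual schema.
    """
    # Gas cost per transaction in ETH: gas used times gas price, wei -> ETH
    gas_cost_eth = (df["gas_used"] * df["gas_price_wei"]) / 1e18
    return {
        "date_range": (df["timestamp"].min(), df["timestamp"].max()),
        "mean_value_eth": df["value_eth"].mean(),
        "median_value_eth": df["value_eth"].median(),
        "total_gas_eth": gas_cost_eth.sum(),
        "unique_senders": df["from"].nunique(),
    }
```

The bar chart then follows from `df.groupby(df["timestamp"].dt.date).size().plot(kind="bar")`, and the largest transaction from `df.loc[df["value_eth"].idxmax()]`.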
Exercise 34.2: Token Distribution Analysis (Beginner)
Objective: Analyze the distribution of an ERC-20 token using transfer event data.
Tasks:
- Choose any well-known ERC-20 token (USDC, LINK, UNI, or AAVE are good choices). Find its contract address on Etherscan.
- Using `get_token_transfers()`, fetch the first 10,000 transfer events for the token.
- Compute the balance of each address that appears in the transfer data (sum of incoming transfers minus sum of outgoing transfers). Filter out addresses with zero or negative balances.
- Calculate the Gini coefficient of the resulting distribution using the `gini_coefficient()` function.
- Create two visualizations:
  - A bar chart of the top 20 token holders by balance
  - A Lorenz curve (cumulative share of addresses vs. cumulative share of tokens)
- Write a paragraph interpreting your findings. Is the token highly concentrated? What types of addresses likely dominate the top of the distribution (exchanges, treasuries, whales)?
Deliverable: A Python script with the Gini calculation, both charts, and your written interpretation as a comment block.
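If you want to sanity-check the chapter's `gini_coefficient()` against an independent implementation, one common formulation works directly on the sorted balances:

```python
def gini(balances):
    """Gini coefficient of a collection of non-negative balances.

    Uses the sorted-weights formulation: 0.0 for perfect equality,
    approaching 1.0 for total concentration. A cross-check for the
    chapter's gini_coefficient(), not a replacement for it.
    """
    xs = sorted(b for b in balances if b > 0)  # drop zero/negative balances
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

The same sorted, cumulative balances also give you the Lorenz curve for free: plot cumulative share of holders on the x-axis against cumulative share of tokens on the y-axis.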
Exercise 34.3: Exchange Flow Tracker (Intermediate)
Objective: Build a tool that calculates and visualizes exchange inflows and outflows for ETH.
Tasks:
- Using the `EXCHANGE_ADDRESSES` dictionary from the chapter, fetch the last 1,000 transactions for three major exchange hot wallets:
  - Binance: `0x28c6c06298d514db089934071355e5743bf21d60`
  - Coinbase: `0xdfd5293d8e347dfe59e90efd55b2956a1343963d`
  - Kraken: `0x2910543Af39abA0Cd09dBb2D50200b3E800A63D2`
- For each exchange, calculate:
  - Total daily inflow (ETH received)
  - Total daily outflow (ETH sent)
  - Net flow (inflow - outflow)
- Create a dashboard with three panels (one per exchange) showing net flow over time. Color positive net flow (inflows > outflows) red and negative net flow (outflows > inflows) green.
- Calculate the aggregate net flow across all three exchanges. On which days was the aggregate flow most positive (potential selling pressure), and on which was it most negative (potential accumulation)?
- Discuss in comments: what are three reasons a large deposit to an exchange might NOT indicate intent to sell?
Deliverable: A Python script producing the three-panel dashboard and a summary of peak flow days.
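The daily flow calculation is a pair of groupbys. The sketch below assumes columns named `timestamp`, `from`, `to`, and `value_eth` on the fetched transaction DataFrame; those names are illustrative, not a fixed API.

```python
import pandas as pd

def daily_net_flow(txs: pd.DataFrame, exchange_addr: str) -> pd.DataFrame:
    """Daily inflow, outflow, and net flow for one exchange address.

    Assumes columns `timestamp` (datetime), `from`, `to`, `value_eth`.
    Addresses are lowercased before comparison, since Etherscan returns
    mixed-case and lowercase forms inconsistently.
    """
    addr = exchange_addr.lower()
    txs = txs.assign(day=txs["timestamp"].dt.date,
                     frm=txs["from"].str.lower(),
                     to_=txs["to"].str.lower())
    inflow = txs[txs["to_"] == addr].groupby("day")["value_eth"].sum()
    outflow = txs[txs["frm"] == addr].groupby("day")["value_eth"].sum()
    # Align on the union of days; a day with no inflow or outflow counts as 0
    flows = pd.DataFrame({"inflow": inflow, "outflow": outflow}).fillna(0.0)
    flows["net_flow"] = flows["inflow"] - flows["outflow"]
    return flows
```

Summing the three per-exchange `net_flow` series (after aligning their indices) gives the aggregate flow for task 4.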
Exercise 34.4: Whale Alert Clone (Intermediate)
Objective: Build a simplified "Whale Alert" system that monitors for large ETH transfers.
Tasks:
- Write a function `monitor_recent_blocks(client, num_blocks, threshold_eth)` that:
  - Fetches the latest block number using the Etherscan API
  - Queries transactions for the most recent `num_blocks` blocks
  - Filters for transactions where the value exceeds `threshold_eth` ETH
  - Returns a DataFrame with columns: block, timestamp, from, to, value_eth, tx_hash
- Run the function with `threshold_eth=100` (adjust downward if necessary to get results) and display the output.
- For each whale transfer found, check whether the sender or receiver is a known exchange address. Label the transfer as:
  - "Exchange Deposit" if the receiver is an exchange
  - "Exchange Withdrawal" if the sender is an exchange
  - "Whale to Whale" if neither is an exchange
  - "Unknown" otherwise
- Create a summary table showing the count and total value of each transfer type.
- Bonus: Add a function that sends a console notification (a simple formatted print statement) when a whale transfer above 1,000 ETH is detected.
Deliverable: A Python script that acts as a simple whale alert system, with labeled transfer categories and summary statistics.
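The labeling step in task 3 can be written as a small pure function, which makes it easy to test before wiring it into the monitor. The exchange set here is whatever you assemble from `EXCHANGE_ADDRESSES` (lowercased); the function itself is just one way to encode the exercise's scheme.

```python
def label_transfer(sender: str, receiver: str, exchange_addresses: set) -> str:
    """Classify a whale transfer against a set of known exchange addresses.

    `exchange_addresses` must contain lowercased addresses, e.g. built
    from the chapter's EXCHANGE_ADDRESSES dictionary.
    """
    s_is_ex = sender.lower() in exchange_addresses
    r_is_ex = receiver.lower() in exchange_addresses
    if r_is_ex and not s_is_ex:
        return "Exchange Deposit"
    if s_is_ex and not r_is_ex:
        return "Exchange Withdrawal"
    if not s_is_ex and not r_is_ex:
        return "Whale to Whale"
    # Both sides are exchanges (e.g. internal rebalancing) or otherwise ambiguous
    return "Unknown"
```

The summary table then falls out of `df.groupby("label")["value_eth"].agg(["count", "sum"])` once each row has been labeled.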
Exercise 34.5: DeFi Protocol Comparison (Intermediate)
Objective: Compare two DeFi protocols using on-chain metrics.
Tasks:
- Choose two DeFi protocols from the same category (e.g., Uniswap vs. SushiSwap for DEXes, Aave vs. Compound for lending). Find their primary smart contract addresses.
- For each protocol, calculate:
  - Total number of transactions in the last 30 days (using the Etherscan transaction count API)
  - Number of unique addresses interacting with the protocol
  - Average transaction value
  - Estimated gas costs paid by users
- If token transfer data is available, analyze the governance token distribution for both protocols (Gini coefficient, top 10 holder percentage).
- Create a comparison table:
| Metric | Protocol A | Protocol B |
|---|---|---|
| 30-day transactions | ... | ... |
| Unique users | ... | ... |
| Avg TX value | ... | ... |
| Gas costs | ... | ... |
| Token Gini | ... | ... |
- Write a 200-word analysis: Based purely on the on-chain data, which protocol appears healthier? What metrics are most informative, and which might be misleading?
Deliverable: A Python script producing the comparison table and your written analysis.
Exercise 34.6: Wash Trading Detector (Advanced)
Objective: Build a tool that identifies potential wash trading in NFT markets.
Tasks:
- Using the Etherscan API's NFT transfer endpoint (`tokennfttx`), fetch transfer data for a popular NFT collection (CryptoPunks, Bored Apes, or a more recent collection).
- For each NFT trade (a transfer with a non-zero value), check whether the buyer's address was funded by the seller's address within the preceding 7 days. Use `get_transactions()` to check funding sources.
- Implement a scoring system for wash trade likelihood:
  - High confidence: Seller directly funded buyer within 7 days before the trade
  - Medium confidence: Seller funded an intermediary that funded the buyer (2-hop chain)
  - Low confidence: Buyer and seller interacted with the same non-exchange address
- Generate a report showing:
  - Total trades analyzed
  - Number flagged at each confidence level
  - Percentage of total volume that is potentially wash trading
  - The top 5 most suspicious trader pairs by frequency
- Visualize the results: a scatter plot with trade price on the Y-axis and a color indicating wash trade confidence level.
Deliverable: A Python script with the wash trade detection pipeline, scoring system, report, and visualization.
Note: Due to API rate limits, you may need to work with a limited sample. Document your sampling strategy and discuss how the results might change with a complete dataset.
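The high-confidence check (seller directly funded buyer before the trade) is the core primitive; the 2-hop check is the same test applied through an intermediary. The sketch below assumes each transfer is a dict with `"from"`, `"to"`, and a datetime `"timestamp"`, which is a guess at the shape of `get_transactions()` output rather than its documented format.

```python
from datetime import datetime, timedelta

def funded_within(seller: str, buyer: str, trade_time: datetime,
                  transfers: list, days: int = 7) -> bool:
    """High-confidence wash signal: did `seller` send funds to `buyer`
    inside the `days`-long window before `trade_time`?

    `transfers` is a list of dicts with keys "from", "to", "timestamp"
    (datetime) -- an assumed shape for the seller's transaction history.
    """
    window_start = trade_time - timedelta(days=days)
    for t in transfers:
        if (t["from"].lower() == seller.lower()
                and t["to"].lower() == buyer.lower()
                and window_start <= t["timestamp"] < trade_time):
            return True
    return False
```

For the medium-confidence case, run `funded_within(seller, intermediary, ...)` and `funded_within(intermediary, buyer, ...)` over candidate intermediaries drawn from the seller's recent counterparties.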
Exercise 34.7: Governance Participation Analysis (Advanced)
Objective: Analyze governance participation for a real DAO using on-chain voting data.
Tasks:
- Choose a DAO with on-chain governance (Uniswap, Compound, Aave, or ENS). Find the governance contract address.
- Query the governance contract's event logs to extract:
  - `ProposalCreated` events (who proposed, when, description)
  - `VoteCast` events (who voted, which proposal, for/against, voting power)
- Calculate the following metrics:
  - Participation rate by proposal: What percentage of token holders voted?
  - Voting power concentration: What percentage of total voting power do the top 10 voters control?
  - Voter consistency: How many addresses voted on more than half of all proposals?
  - Proposal success rate: What percentage of proposals passed?
  - Time-to-vote: How quickly after proposal creation do most votes arrive?
- Create a visualization with four panels:
  - Participation rate over time (by proposal)
  - Voting power distribution (Lorenz curve)
  - Voter engagement histogram (number of proposals voted on per address)
  - Vote timing distribution (hours after proposal creation)
- Write a 300-word analysis addressing: Is this governance system functioning democratically? What evidence suggests concentrated power? How does participation compare to traditional shareholder voting?
Deliverable: A complete governance analysis script with metrics, four-panel visualization, and written analysis.
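Once the `VoteCast` events are decoded into plain records, the concentration and consistency metrics are a few dictionary passes. The record shape assumed here (`voter`, `proposal_id`, `voting_power`) is a convenient decoded form, not the raw log layout.

```python
from collections import defaultdict

def governance_metrics(votes):
    """Top-10 voting-power share and voter-consistency count.

    `votes` is a list of dicts with keys "voter", "proposal_id", and
    "voting_power" -- an assumed decoded shape for VoteCast events.
    """
    power = defaultdict(float)   # total voting power cast per address
    voted_on = defaultdict(set)  # proposals each address voted on
    proposals = set()
    for v in votes:
        power[v["voter"]] += v["voting_power"]
        voted_on[v["voter"]].add(v["proposal_id"])
        proposals.add(v["proposal_id"])
    total_power = sum(power.values())
    top10 = sorted(power.values(), reverse=True)[:10]
    top10_share = sum(top10) / total_power if total_power else 0.0
    # "Consistent" voters participated in strictly more than half of proposals
    consistent = sum(1 for ps in voted_on.values()
                     if len(ps) > len(proposals) / 2)
    return {"top10_power_share": top10_share, "consistent_voters": consistent}
```

The same `voted_on` mapping feeds the voter-engagement histogram directly (`len(ps)` per address), and the sorted `power` values feed the Lorenz curve.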
Exercise 34.8: Build Your Own Dashboard (Advanced / Project)
Objective: Build a comprehensive on-chain analytics dashboard for a protocol or token of your choice.
Tasks:
- Choose a protocol, token, or DeFi system to analyze. Define 5-7 metrics that are meaningful for evaluating its health and adoption.
- Build a Python script that:
  - Fetches all necessary data from the Etherscan API (with proper rate limiting and error handling)
  - Processes the data into the metrics you defined
  - Generates an interactive plotly dashboard with at least 4 panels
  - Saves the dashboard as an HTML file that can be opened in any browser
- Your dashboard must include at least:
  - One time-series metric (showing a trend over time)
  - One distribution metric (showing how something is distributed across addresses)
  - One comparison metric (comparing two related quantities)
  - One derived metric (calculated by combining multiple raw data sources)
- Add a "Data Quality" section to your dashboard that discusses:
  - What percentage of activity might be bots or wash trading?
  - What off-chain activity is this dashboard not capturing?
  - What assumptions does each metric make?
- Write documentation (as code comments) explaining your metric choices and what they reveal about the protocol's health.
Deliverable: A self-contained Python script that produces an interactive HTML dashboard with documentation.
Evaluation Criteria:
- Metric selection (are the metrics meaningful and non-redundant?)
- Data quality awareness (does the dashboard acknowledge limitations?)
- Visualization quality (are the charts clear, labeled, and interactive?)
- Code quality (proper rate limiting, error handling, modularity)
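For the rate-limiting requirement, a small wrapper that spaces out calls is usually enough. Etherscan's free tier has historically allowed around 5 calls per second, but treat that number as a parameter to verify against current policy, not a guarantee.

```python
import time

def rate_limited(fn, calls_per_second=5.0):
    """Wrap an API-calling function so consecutive calls are spaced out.

    Enforces a minimum interval between calls by sleeping when needed.
    The default of 5 calls/sec matches Etherscan's historical free-tier
    limit; confirm the current limit before relying on it.
    """
    min_interval = 1.0 / calls_per_second
    last_call = [0.0]  # mutable cell so the closure can update it

    def wrapper(*args, **kwargs):
        wait = min_interval - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        return fn(*args, **kwargs)

    return wrapper
```

Usage: `fetch = rate_limited(client.get_transactions)` and then call `fetch(...)` in a loop; every Etherscan request in the script should go through one shared wrapper so the limit applies globally.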
Exercise 34.9: Sybil Detection Challenge (Advanced)
Objective: Identify potential Sybil clusters in airdrop-eligible addresses.
Tasks:
- The Arbitrum airdrop (March 2023) distributed ARB tokens to addresses that had used the Arbitrum bridge and ecosystem. Using Etherscan or Arbiscan data, find 50 addresses that received the ARB airdrop.
- For each address, reconstruct a "behavioral profile":
  - When was the address first active?
  - What was its funding source (first incoming ETH transfer)?
  - How many unique contracts did it interact with?
  - What was the total number of transactions?
  - Was the address active after the airdrop snapshot date?
- Apply clustering to identify potential Sybil groups:
  - Group addresses by funding source
  - Within each group, check for behavioral similarity (similar transaction counts, activity dates, and contracts interacted with)
  - Score each cluster on a 1-5 Sybil likelihood scale
- Visualize the clusters as a network graph using NetworkX, with edges connecting addresses that share a funding source and similar behavioral profiles.
- Write a 200-word discussion: How confident are you in your Sybil identifications? What false positive rate do you expect? How could a sophisticated airdrop farmer avoid detection?
Deliverable: A Python script with behavioral profiling, clustering, network visualization, and analysis.
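The first clustering pass (grouping by funding source) can be done before any similarity scoring. The profile field names below (`funder`, `tx_count`) are assumptions about the behavioral profile you build in task 2; a shared funder is a first-pass Sybil signal, not proof.

```python
from collections import defaultdict

def cluster_by_funding_source(profiles):
    """Group airdrop recipients by their first funder.

    `profiles` maps address -> profile dict containing at least a
    "funder" key (the address of the first incoming ETH transfer).
    Returns {funder: [addresses]} for funders that seeded 2 or more
    recipients -- candidate Sybil clusters for the similarity pass.
    """
    clusters = defaultdict(list)
    for addr, prof in profiles.items():
        clusters[prof["funder"]].append(addr)
    # A funder with a single recipient is not a cluster; drop it
    return {funder: addrs for funder, addrs in clusters.items()
            if len(addrs) >= 2}
```

Each surviving cluster then gets the within-group similarity checks (transaction counts, activity dates, contract sets) before you assign a 1-5 likelihood score.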
Exercise 34.10: On-Chain Forensics Simulation (Advanced / Capstone)
Objective: Trace a simulated stolen fund flow through a series of blockchain transactions.
Setup: This exercise uses simulated data (provided below) rather than real stolen funds.
Scenario: An attacker has stolen 500 ETH from a DeFi protocol. The attacker's address is 0xAttacker1111111111111111111111111111111111. Your job is to trace where the funds went.
Simulated transaction data (create this as a DataFrame):
```python
import pandas as pd

theft_trace = pd.DataFrame([
    {"from": "0xAttacker111...", "to": "0xHop1_aaa...", "value_eth": 200, "timestamp": "2024-01-15 14:00"},
    {"from": "0xAttacker111...", "to": "0xHop1_bbb...", "value_eth": 150, "timestamp": "2024-01-15 14:02"},
    {"from": "0xAttacker111...", "to": "0xHop1_ccc...", "value_eth": 150, "timestamp": "2024-01-15 14:05"},
    {"from": "0xHop1_aaa...", "to": "0xHop2_ddd...", "value_eth": 100, "timestamp": "2024-01-16 09:00"},
    {"from": "0xHop1_aaa...", "to": "0xHop2_eee...", "value_eth": 95, "timestamp": "2024-01-16 09:15"},
    {"from": "0xHop1_bbb...", "to": "0xExchangeHotWallet...", "value_eth": 148, "timestamp": "2024-01-16 10:00"},
    {"from": "0xHop1_ccc...", "to": "0xMixerContract...", "value_eth": 145, "timestamp": "2024-01-17 02:00"},
    {"from": "0xHop2_ddd...", "to": "0xExchangeHotWallet...", "value_eth": 97, "timestamp": "2024-01-18 15:00"},
    {"from": "0xHop2_eee...", "to": "0xHop3_fff...", "value_eth": 90, "timestamp": "2024-01-19 08:00"},
    {"from": "0xHop3_fff...", "to": "0xExchangeHotWallet...", "value_eth": 85, "timestamp": "2024-01-20 12:00"},
])
```
Tasks:
- Build a directed graph of the fund flow using NetworkX. Each node is an address, each edge is a transfer with the value as the edge weight.
- Trace all paths from the attacker's address to terminal destinations (exchange or mixer). Calculate the total value reaching each destination.
- Calculate the "time to cash out": how many hours elapsed between the initial theft and the funds reaching an exchange or mixer?
- Identify the laundering strategy: how did the attacker attempt to obscure the trail (splitting, hopping, mixing, timing)?
- Create a network visualization showing the fund flow. Color-code nodes by type (attacker=red, intermediary=yellow, exchange=blue, mixer=purple). Scale edge widths by transfer value.
- Write a forensic report (300 words) summarizing: the theft amount, the laundering path, the percentage of funds that reached exchanges vs. mixers, the total time window, and recommendations for fund recovery.
Deliverable: A Python script with the network graph, path analysis, visualization, and forensic report.
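The path-tracing step can be verified with a plain dict-based depth-first search before (or alongside) the NetworkX version; on this graph it produces the same simple paths that `networkx.all_simple_paths` would.

```python
from collections import defaultdict

def trace_paths(edges, source, terminals):
    """Enumerate all transfer paths from `source` to any terminal address.

    `edges` is a list of (from, to, value_eth) tuples, e.g. built from
    theft_trace.itertuples(); `terminals` is the set of cash-out
    addresses (exchanges, mixers). Returns (path, last_hop_value) pairs,
    where last_hop_value is the amount that arrived at the terminal.
    """
    out = defaultdict(list)  # adjacency list: address -> [(next, value)]
    for frm, to, val in edges:
        out[frm].append((to, val))
    results = []

    def dfs(node, path, last_val):
        if node in terminals:
            results.append((path[:], last_val))
            return
        for nxt, val in out[node]:
            if nxt not in path:  # guard against cycles in messier data
                path.append(nxt)
                dfs(nxt, path, val)
                path.pop()

    dfs(source, [source], 0.0)
    return results
```

Summing `last_hop_value` per terminal gives the total value reaching each destination; the longest and shortest path timestamps bound the time-to-cash-out window.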