Quiz: Digital Divide, Data Justice, and Equity
Test your understanding before moving to the next chapter. Target: 70% or higher to proceed.
Section 1: Multiple Choice (1 point each)
1. The "third-level digital divide" refers to:
- A) The gap between countries with broadband infrastructure and those without.
- B) The disparity in device quality between wealthy and low-income users.
- C) Unequal outcomes from digital engagement, even among people with equivalent access and skills.
- D) The gap between those who understand algorithms and those who do not.
Answer
**C)** Unequal outcomes from digital engagement, even among people with equivalent access and skills. *Explanation:* Section 32.1.1 identifies three levels: access (first), skills (second), and outcomes (third). The third-level divide recognizes that even when people have the same access and skills, their digital engagement produces unequal outcomes based on social position — a job seeker in a wealthy suburb with professional networks benefits differently from the same tools than a job seeker in an underserved community. This is the most structurally embedded dimension of digital inequality.
2. A study by The Markup (2022) found that AT&T, Verizon, and other ISPs:
- A) Offered identical service quality across all neighborhoods in the cities studied.
- B) Offered slower speeds at higher prices in neighborhoods with higher percentages of Black, Hispanic, and low-income residents.
- C) Invested more in broadband infrastructure in low-income neighborhoods to close the digital divide.
- D) Were legally required to provide equal service quality across neighborhoods.
Answer
**B)** Offered slower speeds at higher prices in neighborhoods with higher percentages of Black, Hispanic, and low-income residents. *Explanation:* Section 32.2.2 cites The Markup's landmark study documenting that ISPs offered slower speeds at higher prices in neighborhoods with higher percentages of Black, Hispanic, and low-income residents. The disparities persisted even after controlling for housing density and distance from network infrastructure, establishing digital redlining as an empirically documented phenomenon rather than a theoretical concern.
3. Nick Couldry and Ulises Mejias argue that data colonialism represents:
- A) A metaphor with no structural connection to historical colonialism.
- B) A new stage of capitalism in which the raw material being extracted is human life itself — relationships, behaviors, emotions — converted into data.
- C) A positive development that brings digital services to underserved populations.
- D) A problem limited to the Global South that does not affect people in wealthy countries.
Answer
**B)** A new stage of capitalism in which the raw material being extracted is human life itself — relationships, behaviors, emotions — converted into data. *Explanation:* Section 32.3.1 presents Couldry and Mejias's framework: data colonialism is not merely a metaphor but a structural analysis of how powerful actors extract value from human life by converting it into data, with minimal compensation, consent, or reciprocal benefit. The parallels to historical colonialism are specific and structural — extraction, appropriation through legal mechanisms, value export, dependency creation, and erasure — though the raw material being extracted is behavioral data rather than physical resources.
4. The CARE Principles for Indigenous Data Governance stand for:
- A) Collection, Analysis, Reporting, Evaluation
- B) Community, Authority, Responsibility, Ethics
- C) Collective Benefit, Authority to Control, Responsibility, Ethics
- D) Consent, Access, Rights, Equity
Answer
**C)** Collective Benefit, Authority to Control, Responsibility, Ethics. *Explanation:* Section 32.4.2 describes the CARE Principles: Collective Benefit (data ecosystems should benefit Indigenous peoples), Authority to Control (Indigenous rights and authority over Indigenous data must be recognized), Responsibility (those working with Indigenous data must share how it is used), and Ethics (Indigenous rights and wellbeing should be the primary concern at all stages). CARE complements the FAIR Principles by adding a justice dimension to data accessibility.
5. Data feminism's concept of "missing data" refers to:
- A) Data that has been accidentally deleted from databases.
- B) The systematic absence of data about marginalized populations, which renders them invisible to data-driven decision-making.
- C) Data gaps caused by technical failures in collection systems.
- D) Information that individuals choose not to share with data collectors.
Answer
**B)** The systematic absence of data about marginalized populations, which renders them invisible to data-driven decision-making. *Explanation:* Section 32.5.2 explains that missing data is not accidental but reflects power structures. Examples include the absence of official femicide data (requiring activist-built databases), the absence of comprehensive police violence data (requiring journalist-built databases), and the absence of data on transgender populations (rendering their needs invisible to policymaking). The data is "missing by design — or, more precisely, by the design of systems that were built by and for people who didn't need that data."
6. Eli describes the vicious cycle of the "data divide" (Section 32.1.3). Which of the following correctly represents this cycle?
- A) More data leads to better services, which leads to more users, which leads to more data.
- B) Underrepresented communities generate less data; algorithms perform worse for them; worse performance reduces service value; reduced value discourages adoption; less adoption means less data.
- C) Government regulation reduces data collection, which reduces service quality, which reduces adoption.
- D) Communities with more data have better broadband, which produces more data.
Answer
**B)** Underrepresented communities generate less data; algorithms perform worse for them; worse performance reduces service value; reduced value discourages adoption; less adoption means less data. *Explanation:* Section 32.1.3 describes this self-reinforcing vicious cycle precisely. Communities without reliable internet generate less data; algorithms trained on incomplete data perform worse for them; worse performance makes digital services less useful; reduced usefulness discourages adoption; and reduced adoption means even less data. Eli captures this by saying his neighborhood is "the missing data" — the technology was never designed with his community as the audience.
7. VitraMed's equity audit (Section 32.6.2) found that its predictive models performed worst for:
- A) Patients with the most complete electronic health records.
- B) Patients in urban, well-insured populations.
- C) Low-income, rural, minority patients with limited healthcare access and incomplete records.
- D) Patients who used VitraMed's telehealth features most frequently.
Answer
**C)** Low-income, rural, minority patients with limited healthcare access and incomplete records. *Explanation:* Section 32.6.2 reports that the model's false negative rate was 31% higher for Black patients, sensitivity was 23% lower for rural ZIP codes, and the model performed worst for the patients who needed it most. The structural causes were training data skew (predominantly from insured, urban populations), feature availability gaps (less complete records for underserved patients), and outcome label bias (adverse events not recorded for patients not in contact with the healthcare system).
8. Which of the following is NOT identified in the chapter as a form of resistance to data colonialism?
- A) Data sovereignty movements
- B) Data cooperatives
- C) Increasing data collection by Global North corporations in the Global South
- D) Platform cooperatives owned by users
Answer
**C)** Increasing data collection by Global North corporations in the Global South. *Explanation:* Section 32.3.3 lists data sovereignty movements, data cooperatives, platform cooperatives, regulatory interventions, and counter-data practices as forms of resistance to data colonialism. Increasing data collection by Global North corporations would represent an intensification of the data colonial dynamic, not resistance to it. The chapter explicitly contrasts extractive data practices (the problem) with community-controlled governance (the solution).
9. Linnet Taylor's data justice framework identifies three pillars. Which of the following correctly lists all three?
- A) Privacy, security, and transparency
- B) (In)visibility, engagement with technology, and non-discrimination
- C) Consent, access, and portability
- D) Fairness, accountability, and transparency
Answer
**B)** (In)visibility, engagement with technology, and non-discrimination. *Explanation:* Section 32.8.1 presents Taylor's three pillars: (In)visibility concerns who is seen and unseen in data systems; engagement with technology concerns the terms on which people interact with data-collecting systems; and non-discrimination concerns whether data-driven decisions treat people equitably. The first pillar is notable for its ambivalence: visibility can enable both services and surveillance.
10. The chapter's core argument about the relationship between individual data rights and data justice is that:
- A) Individual data rights are sufficient to achieve data equity if properly enforced.
- B) Individual data rights are unnecessary — only collective governance matters.
- C) Individual data rights are necessary but not sufficient for data equity; collective mechanisms are also needed.
- D) Individual data rights and data justice are unrelated concepts.
Answer
**C)** Individual data rights are necessary but not sufficient for data equity; collective mechanisms are also needed. *Explanation:* Section 32.8.2 makes this argument explicitly: individual rights assume a level playing field, but the playing field is not level. Individual data rights mean little if you lack the broadband access, digital literacy, economic power, or political representation to exercise them. Data justice therefore requires collective mechanisms — community governance, cooperative structures, political organizing — that can counterbalance the structural power of data-collecting institutions.
Section 2: True/False with Justification (1 point each)
11. "The digital divide in the United States has largely been closed by market forces and private sector investment."
Answer
**False.** *Explanation:* Section 32.1 describes how the pandemic revealed that the digital divide was not narrowing — in many communities, it was deepening. Approximately 40% of Detroit households lacked broadband during the pandemic. The FCC estimates 24 million Americans lack broadband, with independent researchers suggesting the actual number is closer to 42 million. Market forces have not closed the gap because investment decisions follow expected return, which follows existing wealth patterns — reproducing inequality rather than addressing it.
12. "The FAIR Principles and the CARE Principles are contradictory frameworks that cannot be applied simultaneously."
Answer
**False.** *Explanation:* Section 32.4.2 explicitly states that the CARE Principles complement rather than compete with the FAIR Principles. FAIR prioritizes data accessibility (Findable, Accessible, Interoperable, Reusable); CARE prioritizes data justice (Collective Benefit, Authority to Control, Responsibility, Ethics). Without CARE, FAIR can facilitate data extraction — making Indigenous data maximally accessible without ensuring Indigenous benefit. Together, they point toward governance that is both scientifically productive and ethically grounded.
13. "Digital redlining is caused exclusively by explicit racial discrimination by telecommunications companies."
Answer
**False.** *Explanation:* Section 32.2.3 explains that digital redlining is primarily structural rather than explicitly discriminatory. As Eli observes, "It's not that someone decides 'let's give Black neighborhoods worse internet.' It's that investment decisions follow expected return, expected return follows existing wealth, existing wealth follows centuries of discriminatory policy." The result reproduces inequality without requiring any individual to make an explicitly discriminatory decision. This is structural discrimination — embedded in economic incentive systems rather than individual prejudice.
14. "VitraMed's biased health prediction models resulted from deliberate decisions by the company's engineers to disadvantage certain patient populations."
Answer
**False.** *Explanation:* Section 32.6 explicitly states that the model disparities were "structural, not intentional." The causes were training data skew (data from predominantly insured, white, urban populations), feature availability gaps (metrics correlated with healthcare access rather than health status), and outcome label bias (adverse events undercounted for patients not in regular contact with the healthcare system). VitraMed built on an unequal data foundation, and the resulting models encoded existing structural inequalities without deliberate design.
15. "Data feminism argues that data science should incorporate emotional knowledge and lived experience alongside quantitative analysis."
Answer
**True.** *Explanation:* Principle 3 of data feminism (Section 32.5.1) — "Elevate emotion and embodiment" — argues that data science's emphasis on objectivity and rationality often devalues emotional knowledge, lived experience, and embodied understanding. A more complete data science would incorporate multiple ways of knowing. This does not mean abandoning rigor; it means recognizing that quantitative analysis alone cannot capture the full picture and that the perspectives of affected communities provide essential context for interpreting data.
Section 3: Short Answer (2 points each)
16. Explain the concept of intersectionality (Section 32.1.2) and why it is important for understanding the digital divide. Use a specific example to illustrate how overlapping axes of disadvantage compound rather than simply add.
Sample Answer
Intersectionality, developed by Kimberlé Crenshaw, is the insight that overlapping systems of oppression cannot be understood by examining each axis of disadvantage in isolation — they compound multiplicatively rather than additively. For the digital divide, this means that an elderly, low-income, Black woman in rural Mississippi does not simply face the sum of age-based, income-based, race-based, and geography-based digital disadvantages; she faces their multiplication. Her rural location means limited broadband options. Her low income means she cannot afford premium internet service. Her age means she may lack the digital literacy to navigate complex online systems. Her race means she lives in a community more likely to have experienced digital redlining. And crucially, these factors interact: the digital literacy programs available in her area may not be designed for her needs, the telehealth platforms that could compensate for her rural healthcare access may require broadband she doesn't have, and the algorithms that drive service delivery may perform worst for someone in her demographic profile. Each disadvantage magnifies the others.
*Key points for full credit:*
- Defines intersectionality clearly
- Explains the multiplicative (not additive) nature of compounding disadvantage
- Provides a specific, concrete example with interacting factors
17. Describe how digital redlining compounds algorithmic bias. Reference at least two specific examples from the chapter (e.g., predictive policing, credit scoring, health technology).
Sample Answer
Digital redlining — discriminatory patterns in digital infrastructure investment — compounds algorithmic bias by systematically underrepresenting certain communities in the data that algorithms are trained on. First, predictive policing (Chapter 14) is deployed in the same communities that experience digital redlining, producing surveillance saturation in neighborhoods already denied equitable infrastructure. The same communities that lack adequate broadband are subjected to intensive algorithmic monitoring — they are the most watched and the least connected. Second, algorithmic credit scoring (Chapter 15) performs less accurately in communities with less digital footprint data, because the features that credit models rely on (online transaction history, digital financial behavior) are less available for people on the wrong side of the digital divide. The result is higher denial rates and worse terms for communities that already face infrastructure discrimination. Third, VitraMed's health technology (Section 32.6) is functionally unavailable to patients without broadband — precisely the patients most likely to need expanded healthcare access. Digital redlining creates a data desert that corrupts every algorithm built on that data.
*Key points for full credit:*
- Explains the mechanism (infrastructure inequality leads to data underrepresentation)
- Provides at least two specific examples with clear causal connections
- Connects digital redlining to downstream algorithmic outcomes
18. Explain what Sofia Reyes means when she says "Data colonialism isn't a metaphor — it's a structural analysis" (Section 32.3.2). What structural flows does she identify?
Sample Answer
Sofia means that data colonialism is not merely a rhetorical comparison to historical colonialism but an analysis of actual structural flows of value and harm. She identifies three specific flows that parallel colonial extraction: data flows from communities to corporations (analogous to raw materials flowing from colonies to imperial centers), profits flow from corporations to shareholders (analogous to wealth flowing to the colonizing nation), and harms flow from algorithms back to communities (analogous to the social and environmental damage left behind by extraction). The structural analysis holds because the logic is the same: powerful actors extract value from less powerful populations through legal mechanisms (terms of service, like colonial land seizure), with minimal compensation, consent, or reciprocal benefit, while creating dependency (platform lock-in, like colonial infrastructure designed to serve the colonizer). Calling it a "structural analysis" rather than a "metaphor" insists that these are not merely similar patterns — they are the same extractive logic operating through different technological means.
*Key points for full credit:*
- Distinguishes structural analysis from metaphor
- Identifies the specific flows (data, profits, harms)
- Connects to the mechanisms of extraction (terms of service, dependency, minimal consent)
19. The chapter identifies community broadband, political digital literacy, and data governance advocacy as three community responses to the digital divide in Detroit (Section 32.7.2). Select one of these three responses and explain how it addresses a specific dimension of data justice as defined by Linnet Taylor's framework (visibility, engagement, or non-discrimination).
Sample Answer
Data governance advocacy — Eli's testimony before the city council demanding community representation on data governance boards, mandatory equity audits, and public reporting on data investment distribution — addresses the **engagement** pillar of Taylor's data justice framework. Engagement asks: on what terms do people engage with data-collecting technologies? Are those terms genuinely voluntary, informed, and equitable? Currently, residents of Eli's neighborhood are subject to data collection (Smart City sensors, predictive policing algorithms) without meaningful participation in how those systems are designed, deployed, or governed. Data governance advocacy aims to change the terms of engagement by ensuring that affected communities have a voice in governance decisions. Rather than being passive subjects of data systems designed elsewhere, communities become active participants in shaping the rules that govern how their data is collected, used, and protected. This shifts the power dynamic from extraction (data taken from the community) to participation (data governed with the community).
*Key points for full credit:*
- Selects one community response and one data justice pillar
- Explains the specific connection between the response and the pillar
- Shows how the response changes the power dynamic
Section 4: Applied Scenario (5 points)
20. Read the following scenario and answer all parts.
Scenario: CommunityHealth AI
A health technology startup called CommunityHealth AI develops a predictive model to identify patients at risk of hospitalization. The model is trained on electronic health records from three large hospital systems in major cities. The startup markets the model to community health centers in rural areas, small towns, and tribal lands, claiming it will help identify at-risk patients before they require emergency care.
After six months of deployment, community health centers report that the model consistently underestimates risk for their patient populations. Native American patients, in particular, are flagged as "low risk" at significantly higher rates than their actual hospitalization rates would suggest. The model also performs poorly for patients who use traditional medicine alongside or instead of conventional healthcare, because their medical histories contain gaps that the model interprets as "healthy" rather than "undocumented."
A tribal health authority requests access to the model's training data and methodology. CommunityHealth AI declines, citing proprietary concerns.
(a) Using the concept of the "data divide" (Section 32.1.3), explain why the model performs poorly for rural and Indigenous populations. Identify at least two structural causes. (1 point)
(b) Apply the CARE Principles (Section 32.4.2) to the tribal health authority's request. For each of the four principles (Collective Benefit, Authority to Control, Responsibility, Ethics), explain how CommunityHealth AI's current practices either satisfy or violate the principle. (1 point)
(c) Using the data feminism principle of "missing data" (Section 32.5.2), analyze how the model's interpretation of incomplete medical histories as "healthy" reflects a structural bias. Who is made invisible by this design choice, and what are the consequences? (1 point)
(d) Apply the Data Equity Audit framework (Representation, Access, Benefit, Harm, Governance) to CommunityHealth AI's model. Identify at least one finding at each of the five steps. (1 point)
(e) Propose three specific interventions — one technical, one governance-related, and one community-based — that CommunityHealth AI should implement. For each, explain what it addresses and what its limitations are. (1 point)
Sample Answer
**(a)** The model performs poorly because of two structural causes rooted in the data divide: (1) **Training data skew**: the model was trained exclusively on data from large urban hospital systems, whose patient populations differ systematically from rural and tribal communities in demographics, disease patterns, healthcare access, and documentation practices. Urban hospital records reflect frequent encounters with insured patients; rural and tribal records reflect less frequent encounters with patients who may use traditional medicine or face access barriers. (2) **Feature availability gaps**: the model relies on features (visit frequency, specialist referrals, lab results) that correlate with healthcare access rather than health status. Patients in underserved communities have fewer recorded encounters not because they are healthier but because they have less access to the care that generates the data the model expects.
**(b)** CARE analysis:
- **Collective Benefit**: Violated. The model was marketed to tribal communities but developed without their input, and its poor performance actively harms them by underestimating risk.
- **Authority to Control**: Violated. The tribal health authority requested access to data and methodology and was denied. Indigenous authority over data about their community is not recognized.
- **Responsibility**: Violated. CommunityHealth AI has not shared how the model uses data, how it performs for Indigenous populations, or what steps it has taken to address disparities.
- **Ethics**: Violated. Indigenous patient wellbeing is not the primary concern — proprietary competitive advantage is prioritized over the health outcomes of the community the model is supposed to serve.
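The disaggregated check behind audits like the one this scenario calls for can be sketched in a few lines. This is a minimal illustration, not the scenario's actual model or data: the `false_negative_rate_by_group` helper and every record in `toy` are invented for demonstration.

```python
from collections import defaultdict

def false_negative_rate_by_group(records):
    """records: (group, actually_at_risk, flagged_at_risk) tuples.
    FNR per group = missed at-risk patients / all at-risk patients."""
    missed = defaultdict(int)
    at_risk = defaultdict(int)
    for group, actual, predicted in records:
        if actual:  # only actual positives enter the FNR denominator
            at_risk[group] += 1
            if not predicted:
                missed[group] += 1
    return {g: missed[g] / at_risk[g] for g in at_risk}

# Invented toy records: the model misses far more at-risk rural patients.
toy = [
    ("urban", True, True), ("urban", True, True),
    ("urban", True, False), ("urban", True, True),
    ("rural", True, False), ("rural", True, True),
    ("rural", True, False), ("rural", True, False),
]
print(false_negative_rate_by_group(toy))  # {'urban': 0.25, 'rural': 0.75}
```

A gap like 0.25 versus 0.75 is the quantitative signature of the disparity the scenario describes: at-risk patients in the underrepresented group flagged as "low risk" far more often than their actual outcomes warrant.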
**(c)** The model treats incomplete medical histories as evidence of health, but for many patients — especially those using traditional medicine, facing access barriers, or distrusting conventional healthcare systems — incomplete records reflect structural factors, not health status. This design choice makes invisible the patients who are most at risk: those whose relationship with the conventional healthcare system is limited or complicated. It reflects the data feminism insight that "silence in data is not just an absence" — it is a statement that the experiences of patients outside the conventional system are not worth measuring. The consequence is that the model systematically underestimates risk for the populations that most need early intervention.
**(d)** Data Equity Audit:
- **Representation**: Native American and rural patients are systematically underrepresented in training data, which was drawn exclusively from urban hospital systems.
- **Access**: The tribal health authority cannot access the model's data or methodology. Communities affected by the model's decisions have no access to the information necessary to evaluate or challenge those decisions.
- **Benefit**: Benefits accrue primarily to the startup (revenue from licensing) and to the well-represented populations for whom the model works well. Rural and Indigenous communities — the marketed beneficiaries — receive a model that underserves them.
- **Harm**: Indigenous patients are flagged as "low risk" when they are actually at risk, potentially delaying critical interventions. The harm falls disproportionately on the most vulnerable populations.
- **Governance**: No community input in model design or evaluation. No equity audit conducted before deployment. Proprietary claims prevent independent assessment.
**(e)** Three interventions:
1. **Technical: Diverse training data and calibration.** Partner with tribal and rural health systems to include representative data in model training, and recalibrate the model for different population subgroups. Limitation: data from these communities may be limited, requiring supplemental approaches, and data sharing must respect Indigenous data governance principles.
2. **Governance: Mandatory equity audit and transparency.** Conduct and publish equity audits before deploying the model in any new community, with disaggregated performance metrics by race, geography, and healthcare access. Grant requesting communities access to methodology. Limitation: audits measure existing performance but cannot guarantee future equity, and transparency may reveal proprietary methods.
3. **Community-based: Community advisory board with Indigenous representation.** Establish a community advisory board including representatives from tribal health authorities and rural providers, with decision-making power over model deployment, evaluation criteria, and data governance. Limitation: advisory boards can become tokenistic if not given real authority; meaningful participation requires resources and commitment.
Scoring & Review Recommendations
| Score Range | Assessment | Next Steps |
|---|---|---|
| Below 50% (0-13 pts) | Needs review | Re-read Sections 32.1-32.3 carefully, redo Part A exercises |
| 50-69% (14-19 pts) | Partial understanding | Review specific weak areas, focus on Part B exercises |
| 70-85% (20-23 pts) | Solid understanding | Ready to proceed to Chapter 33 |
| Above 85% (24-28 pts) | Strong mastery | Proceed to Chapter 33: Labor, Automation, and the Gig Economy |
| Section | Points Available |
|---|---|
| Section 1: Multiple Choice | 10 points (10 questions x 1 pt) |
| Section 2: True/False with Justification | 5 points (5 questions x 1 pt) |
| Section 3: Short Answer | 8 points (4 questions x 2 pts) |
| Section 4: Applied Scenario | 5 points (5 parts x 1 pt) |
| Total | 28 points |