Case Study 2: Salesforce's Office of Ethical and Humane Use — Operationalizing Responsible AI


Introduction

While Google's ATEAC (Case Study 1) illustrates responsible AI governance done wrong, Salesforce's Office of Ethical and Humane Use illustrates what it looks like when an organization takes operationalization seriously --- building internal structures, embedding review processes into product development, and sustaining the commitment over multiple years.

Salesforce's approach is particularly instructive for business leaders because Salesforce is a platform company. Its AI capabilities --- embedded in Salesforce Einstein and later Einstein GPT --- are used by over 150,000 customer organizations across industries. A bias in Salesforce's AI does not affect only Salesforce's direct users. It ripples through every organization that deploys Salesforce's AI features to make decisions about their own customers, employees, and operations. The responsible AI challenge at a platform company is therefore fundamentally different from the challenge at an end-user company: the platform must build AI that is trustworthy not just for its own context but for thousands of contexts it cannot anticipate.

This case study examines how Salesforce built its responsible AI function, how the product review process works, what challenges the company faced, and what lessons are transferable to other organizations.


Origins: From Principles to Office

Salesforce's journey toward responsible AI followed a path that echoes the maturity model discussed in Chapter 30.

Level 1 — Awareness (2017-2018). Salesforce CEO Marc Benioff began speaking publicly about the ethical responsibilities of technology companies. Internally, conversations about the ethical implications of Salesforce's products --- particularly its use by government agencies, law enforcement, and the military --- gained momentum. Individual employees raised concerns, but no formal process existed for addressing them.

Level 2 — Policy (2018-2019). In 2018, Salesforce appointed Paula Goldman as its first Chief Ethical and Humane Use Officer --- a C-suite title that signaled executive commitment. The same year, Salesforce published its "Ethical Use Policy," which identified categories of use that were prohibited (weapons development, unauthorized surveillance) and categories that required additional review (government use, criminal justice applications).

Level 3 — Practice (2019-2022). The Office of Ethical and Humane Use (OEHU) was established as a dedicated team with a defined mandate, budget, and organizational authority. The office developed the product review process that became its signature contribution to responsible AI practice.


The Organizational Structure

The OEHU operated with approximately 15-20 full-time staff and reported to the CEO through the Chief Ethical and Humane Use Officer. The team included:

  • Ethics strategists who worked directly with product teams to identify ethical risks in new features and products
  • Policy analysts who tracked global AI regulations, developed internal policies, and maintained Salesforce's ethical use guidelines
  • Research and engagement staff who conducted research on responsible AI best practices and engaged with external stakeholders (civil society, academia, government)
  • Technical advisors who understood the AI/ML pipeline well enough to evaluate specific model risks

This organizational positioning --- reporting to the CEO, not buried within legal or compliance --- gave the OEHU strategic authority. It was not a compliance function that rubber-stamped products after they were built. It was a strategic function that influenced product direction during design.

Business Insight: Organizational positioning matters. When a responsible AI function reports to the CTO, it is perceived as a technical function. When it reports to the General Counsel, it is perceived as a legal/compliance function. When it reports to the CEO, it is perceived as a strategic function. Each positioning sends a different signal about the organization's priorities. Salesforce's choice to position the OEHU at the CEO level was a deliberate signal that ethical use was a business-level concern, not a technical sub-function.


The Product Review Process

The OEHU's most concrete contribution was the Ethical Use Advisory Council --- an internal body that reviewed AI products and features before they were launched. The review process worked as follows:

Trigger Mechanisms

Not every product required OEHU review. Review was triggered by:

  1. Risk classification. Products and features were classified by risk level during the design phase. Features involving facial recognition, predictive scoring of individuals, automated decision-making about access to services, or government/law enforcement use were automatically flagged for review.

  2. Self-referral. Product managers and engineers could refer their own products for review if they had ethical concerns. The OEHU cultivated a culture where self-referral was seen as responsible, not as an admission of wrongdoing.

  3. External signal. Media reports, customer inquiries, civil society concerns, or employee feedback about specific use cases could trigger a review of an existing product.

  4. Regulatory change. New regulations (such as the EU AI Act or state-level AI legislation) triggered reviews of products that might be affected.
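The four trigger mechanisms amount to a simple rules engine: a feature goes to review if any trigger fires. The sketch below is illustrative only; the category names, the `FeatureProfile` type, and the `needs_ethics_review` function are hypothetical stand-ins for whatever intake tooling Salesforce actually used, which the case study does not describe.

```python
from dataclasses import dataclass, field

# Categories the case study says were automatically flagged for review.
AUTO_REVIEW_CATEGORIES = {
    "facial_recognition",
    "predictive_scoring_of_individuals",
    "automated_access_decisions",
    "government_or_law_enforcement_use",
}

@dataclass
class FeatureProfile:
    name: str
    categories: set = field(default_factory=set)
    self_referred: bool = False        # product team asked for review
    external_signal: bool = False      # media / customer / civil society concern
    regulation_changed: bool = False   # a new law touches this feature

def needs_ethics_review(profile: FeatureProfile) -> bool:
    """Return True if any of the four trigger mechanisms fires."""
    risk_flagged = bool(profile.categories & AUTO_REVIEW_CATEGORIES)
    return (risk_flagged or profile.self_referred
            or profile.external_signal or profile.regulation_changed)

lead_scoring = FeatureProfile(
    "lead_scoring", categories={"predictive_scoring_of_individuals"})
print(needs_ethics_review(lead_scoring))  # True: risk-classification trigger
```

The design point is that the first trigger is automatic and data-driven, while the other three are human- or event-initiated; the check itself should be cheap enough to run on every feature in the design phase.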

The Review Framework

When a product was flagged for review, the OEHU applied a structured assessment framework:

1. Use case analysis. How is this product likely to be used? By whom? In what contexts? What are the most likely beneficial uses? What are the most likely harmful uses? What uses are foreseeable but not intended?

2. Stakeholder impact mapping. Who is affected by this product --- directly and indirectly? This extended beyond Salesforce's direct customers to the end users and communities affected by how Salesforce's customers deploy the product. For a predictive lead-scoring feature, the direct users are sales professionals, but the indirect subjects are the potential customers being scored.

3. Bias and fairness assessment. Does the product use AI/ML in ways that could produce biased outcomes? What data is it trained on? What demographic groups might be disadvantaged? The assessment drew on the fairness metrics and testing methodologies discussed in Chapters 25 and 26 of this textbook.

4. Transparency evaluation. Can users understand how the AI feature works? Can they see why a particular output was generated? Is the AI's role disclosed to end users? The explainability frameworks from Chapter 26 --- SHAP, LIME, model cards --- informed this assessment.

5. Misuse potential. Could this product be used in ways that violate Salesforce's ethical use policy? Could it enable discrimination, surveillance, or other harms? What technical or policy safeguards can mitigate misuse?

6. Regulatory alignment. Does this product comply with current and anticipated AI regulations in all jurisdictions where it will be available?
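Step 3 of the framework, the bias and fairness assessment, can be made concrete with a standard selection-rate comparison. The sketch below computes a disparate impact ratio against the familiar "four-fifths" heuristic; it is a minimal illustration with toy data, not a description of Salesforce's actual test suite.

```python
def disparate_impact_ratio(selected, group):
    """Ratio of the lower group selection rate to the higher one.
    `selected`: list of 0/1 outcomes; `group`: parallel list of group labels."""
    rates = {}
    for g in set(group):
        outcomes = [s for s, grp in zip(selected, group) if grp == g]
        rates[g] = sum(outcomes) / len(outcomes)
    return min(rates.values()) / max(rates.values())

# Toy lead-scoring outcomes: group A selected 4 of 8, group B selected 2 of 8.
selected = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
group = ["A"] * 8 + ["B"] * 8
ratio = disparate_impact_ratio(selected, group)
print(f"{ratio:.2f}")  # 0.50 -- below the 0.8 threshold, so flag for review
```

A ratio below roughly 0.8 is a signal to investigate, not proof of discrimination; a full assessment would also examine the training data and candidate proxy variables, as the framework describes.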

Outcomes

The review could result in several outcomes:

  • Approved. The product proceeds as designed.
  • Approved with modifications. The product proceeds with specific changes --- additional safeguards, modified features, enhanced documentation, or restricted availability.
  • Approved with monitoring requirements. The product proceeds but with ongoing monitoring for specific risks, with defined thresholds for re-review.
  • Sent back for redesign. The product requires fundamental design changes before it can proceed.
  • Not approved. In rare cases, a product or use case was determined to be incompatible with Salesforce's ethical use policy. The OEHU had the authority to recommend against launch, with the final decision escalated to executive leadership.

Research Note: Salesforce publicly disclosed that the OEHU conducted reviews for approximately 30-40 products and features per quarter at its peak. The vast majority were approved with modifications or monitoring requirements. Complete rejections were rare --- estimated at fewer than 5 percent of reviews --- but their existence gave the review process credibility. A review process that never says "no" is not a review process.


Concrete Examples

The Facial Recognition Decision

In 2020, during nationwide protests against racial injustice following the killing of George Floyd, Salesforce announced that it would prohibit the use of its facial recognition technology by law enforcement customers. The decision was informed by OEHU analysis that concluded:

  • Facial recognition technology exhibited well-documented accuracy disparities across racial groups (per the Buolamwini and Gebru research discussed in Chapter 25)
  • Law enforcement use of facial recognition carried heightened risks of civil rights violations
  • Salesforce could not ensure that its customers would deploy the technology in ways consistent with Salesforce's ethical use principles

The decision cost Salesforce revenue --- government and law enforcement contracts represented a meaningful business. But it also built trust with employees, civil society organizations, and customers who valued Salesforce's commitment to ethical use.

The Einstein Prediction Builder Guardrails

Salesforce's Einstein Prediction Builder allows customers to build custom AI prediction models using their own Salesforce data --- without writing code. This is a powerful capability with significant misuse potential: a customer could build a model that predicts employee termination risk, customer creditworthiness, or insurance claim likelihood using features that correlate with protected characteristics.

The OEHU worked with the Einstein product team to implement several guardrails:

  • Prohibited fields. Certain data fields (race, ethnicity, religion) are blocked from use as prediction inputs by default.
  • Proxy variable warnings. When a user selects fields that are known proxies for protected characteristics (such as ZIP code, which correlates with race), the system displays a warning explaining the proxy variable risk and directing users to fairness testing resources.
  • Model cards. Einstein Prediction Builder automatically generates model card documentation for each custom model, including feature importance, data composition, and performance metrics.
  • Fairness testing recommendations. After a model is built, the system recommends that users test for disparate impact across relevant demographic groups before deployment.

These guardrails did not eliminate the possibility of misuse. A determined user could find workarounds. But they raised the bar --- making it harder to build biased models inadvertently and creating a record of the decisions made during model development.
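The first two guardrails, prohibited fields and proxy-variable warnings, are essentially an input validator run when a user configures a model. The sketch below shows the shape of such a check; the field names, proxy list, and `validate_prediction_inputs` function are hypothetical, not Einstein Prediction Builder's actual interface.

```python
PROHIBITED_FIELDS = {"race", "ethnicity", "religion"}  # blocked by default
PROXY_FIELDS = {  # allowed, but surfaced with an explanatory warning
    "zip_code": "correlates with race; consider fairness testing",
    "first_name": "can correlate with gender and ethnicity",
}

def validate_prediction_inputs(fields):
    """Split requested input fields into blocked, warned, and allowed sets."""
    blocked = sorted(f for f in fields if f in PROHIBITED_FIELDS)
    warnings = {f: PROXY_FIELDS[f] for f in fields if f in PROXY_FIELDS}
    allowed = sorted(f for f in fields if f not in PROHIBITED_FIELDS)
    return blocked, warnings, allowed

blocked, warnings, allowed = validate_prediction_inputs(
    ["zip_code", "race", "deal_size", "industry"])
print(blocked)   # ['race']
print(warnings)  # {'zip_code': 'correlates with race; consider fairness testing'}
```

Note the asymmetry: prohibited fields are hard-blocked, but proxies only trigger a warning, because a field like ZIP code has many legitimate uses. That trade-off is exactly why, as the text says, the guardrails raise the bar without eliminating misuse.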

The GPT Trust Layer

When Salesforce integrated generative AI into its platform (Einstein GPT, later rebranded as Einstein Copilot), the OEHU's role became even more critical. Generative AI introduces risks --- hallucination, prompt injection, data leakage, harmful content generation --- that are fundamentally different from traditional predictive AI risks.

Salesforce's response was the "Einstein Trust Layer" --- a set of technical and policy safeguards designed to manage generative AI risks:

  • Data masking. Customer data sent to third-party LLMs (such as OpenAI's models) is masked to prevent sensitive information from being transmitted outside Salesforce's infrastructure.
  • Toxicity detection. Generated outputs are screened for harmful, biased, or inappropriate content before being presented to users.
  • Grounding. Generated outputs are grounded in the customer's Salesforce data (a RAG approach, as discussed in Chapter 21) to reduce hallucination.
  • Audit trail. Every generative AI interaction is logged, creating an audit trail for compliance and review.
  • Zero data retention. Salesforce negotiated agreements with LLM providers ensuring that customer data used for inference is not retained for model training.
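The first four safeguards form a pipeline around each LLM call: mask the prompt, generate, screen the output, and log the interaction. The sketch below shows that control flow with deliberately toy components (a regex for masking, a keyword list for toxicity); production systems use trained classifiers, and grounding and zero data retention are retrieval and contractual measures not shown here. All names are hypothetical.

```python
import re
import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(text):
    """Data masking: redact emails before the prompt leaves the platform."""
    return EMAIL.sub("[MASKED_EMAIL]", text)

def is_toxic(text):
    """Toxicity-detection stand-in; real systems use a trained classifier."""
    return any(word in text.lower() for word in ("idiot", "worthless"))

def trust_layer(prompt, llm, audit_log):
    masked = mask(prompt)                  # 1. data masking
    output = llm(masked)                   # 2. call the (external) model
    released = not is_toxic(output)        # 3. screen the generated output
    audit_log.append({"ts": time.time(),   # 4. audit trail for every call
                      "prompt": masked, "output": output,
                      "released": released})
    return output if released else "[Response withheld by content filter]"

log = []
echo_llm = lambda p: f"Summary for: {p}"  # placeholder for a real LLM client
print(trust_layer("Follow up with ana@example.com", echo_llm, log))
```

The key design choice is that masking happens before the external call and logging happens unconditionally, so sensitive data never leaves the platform and withheld responses still leave an audit record.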

Business Insight: The Einstein Trust Layer illustrates a critical insight for platform companies: when your customers use your AI, your responsible AI practices are their responsible AI practices. If Salesforce's generative AI produces biased or harmful outputs, Salesforce's customers bear the consequences --- but Salesforce bears the reputational and legal liability. This creates a strong business incentive for responsible AI that goes beyond ethics: it is a product quality issue.


Challenges and Limitations

Salesforce's approach was not without challenges:

The Scale Problem

Salesforce has over 150,000 customer organizations. Each uses Salesforce's AI features in unique ways, with unique data, in unique contexts. The OEHU could review Salesforce's products and features, but it could not review how each customer deployed those features. A predictive scoring model that is fair in one customer's context might be discriminatory in another's, depending on the training data, the use case, and the affected population.

Salesforce addressed this partly through the guardrails approach (blocking risky fields, warning about proxies, recommending fairness testing) and partly through customer education (publishing responsible AI guidelines and best practices). But the fundamental challenge of responsible AI at platform scale remains unsolved: the platform can provide tools for responsible AI, but it cannot ensure that every customer uses them.

The Tension Between Ethics and Sales

Salesforce sales teams sometimes found the OEHU's review process frustrating. A government contract that required facial recognition was lost. A product launch was delayed by two weeks for ethics review. A customer's use case was flagged as potentially problematic, creating an awkward conversation.

These tensions are inherent in any responsible AI program and echo the NovaMart dynamic discussed in Chapter 30. The OEHU's effectiveness depended on executive support --- specifically, Marc Benioff's willingness to accept short-term revenue impacts in exchange for long-term trust and brand value.

The Sustainability Question

Like many responsible AI teams across the industry, Salesforce's OEHU faced questions about long-term sustainability. The office was restructured in 2023, with some responsibilities absorbed into other functions. The Chief Ethical and Humane Use Officer role was maintained, but the team's headcount was reduced.

This restructuring mirrored the broader industry pattern documented in Case Study 1. It raised questions about whether the OEHU model --- a dedicated team with its own budget and reporting line --- is sustainable, or whether responsible AI must eventually be distributed across the organization to survive business cycle pressures.

Measuring Impact

The OEHU struggled with the same metrics challenge discussed in Chapter 30: how do you measure the impact of something that prevents harm? The facial recognition decision prevented potential civil rights violations, but the magnitude of harm prevented is inherently counterfactual. The product review process caught potential biases before they reached customers, but the value of a prevented incident is difficult to quantify.

The OEHU relied on a combination of activity metrics (reviews conducted, products modified), qualitative evidence (case examples of interventions), and reputational indicators (media coverage, industry recognition, customer feedback). These were informative but fell short of the rigorous outcome metrics that business leadership typically requires for sustained investment.


Lessons for Business Leaders

Lesson 1: Operationalization Requires Embedded Processes

Salesforce's most important contribution was not its ethics principles or its organizational structure. It was the product review process --- a defined, repeatable, embedded process that touched every high-risk product before launch. This is the difference between Level 2 (Policy) and Level 3 (Practice) on the responsible AI maturity model.

Principles without processes are aspirations. Processes without principles are mechanical. The combination --- clear values operationalized through defined processes --- is what makes responsible AI functional.

Lesson 2: Platform Companies Face Unique Challenges

For platform companies, responsible AI is not just an internal discipline. It is a product attribute. Customers choose Salesforce partly based on trust --- trust that the AI features embedded in the platform will not create legal, ethical, or reputational problems for the customer. The Einstein Trust Layer is not just ethics. It is product design.

This lesson extends beyond platform companies. Any organization that provides AI capabilities to other organizations --- through APIs, white-label products, or embedded features --- must consider how its responsible AI practices affect its customers' responsible AI posture.

Lesson 3: Executive Sponsorship Is Non-Negotiable

The OEHU's effectiveness depended on Marc Benioff's personal commitment. He created the role, appointed a C-suite officer, and publicly supported decisions (like the facial recognition ban) that cost revenue. Without this executive sponsorship, the OEHU would have been overruled the first time it conflicted with a sales target.

The implication for other organizations: responsible AI programs that lack executive sponsorship will be defunded, restructured, or overruled when they create friction. The executive sponsor need not be the CEO, but they must have the authority and willingness to protect the program.

Lesson 4: Guardrails Scale Better Than Reviews

The OEHU could review 30-40 products per quarter. It could not review 150,000 customers' deployments. The guardrails approach --- building responsible AI constraints into the product itself (blocked fields, proxy warnings, automated model cards) --- scales in ways that human review processes do not.

This lesson is directly applicable to any organization scaling AI: invest in technical guardrails that make it hard to do the wrong thing by default, rather than relying solely on review processes that catch problems after they exist.

Lesson 5: Responsible AI Is an Ongoing Commitment, Not a Destination

Salesforce's responsible AI journey was not linear. It involved expansion (creating the OEHU), innovation (the Einstein Trust Layer), and contraction (team restructuring). The generative AI era introduced new risks that required new responses. Regulatory changes demanded new compliance capabilities.

Responsible AI is not a project with a completion date. It is an ongoing organizational capability that must evolve as technology, regulation, and societal expectations change.


Comparison: ATEAC vs. OEHU

The contrast between Google's ATEAC and Salesforce's OEHU is instructive:

Dimension          | Google ATEAC                             | Salesforce OEHU
Type               | External advisory council                | Internal operational team
Duration           | 10 days                                  | 5+ years
Authority          | Advisory, undefined                      | Operational, with review authority
Stakeholder input  | None before launch                       | Iterative development
Process            | None established                         | Defined product review process
Impact             | None (dissolved before first meeting)    | 30-40 product reviews per quarter
Transparency       | Public announcement, public dissolution  | Gradual, with published case examples
Sustainability     | Failed immediately                       | Sustained with restructuring

The comparison suggests that internal operational teams with defined processes are more effective and sustainable than external advisory councils with undefined mandates --- though the ideal is likely a combination of internal operations and external advisory input.


Discussion Questions

  1. Organizational Design. Salesforce positioned its OEHU at the CEO level rather than within legal, compliance, or engineering. What are the advantages and disadvantages of this positioning? Where would you position a responsible AI function in your organization?

  2. Platform Responsibility. Salesforce cannot control how its 150,000 customers use its AI features. To what extent is Salesforce responsible for how its customers deploy Einstein? Where does platform responsibility end and customer responsibility begin?

  3. The Guardrails Approach. Salesforce built responsible AI constraints into its products (blocked fields, proxy warnings, automated model cards). Could these guardrails be counterproductive --- for example, by giving users a false sense of security ("the system blocked bad fields, so my model must be fair")? How should organizations communicate the limitations of automated guardrails?

  4. Revenue vs. Ethics. Salesforce's facial recognition ban cost revenue. Under what circumstances should a company forgo revenue for ethical reasons? How should the decision be made? Who should make it?

  5. Sustainability. Salesforce's OEHU was restructured and reduced in 2023, mirroring the broader industry pattern. Does this suggest that the dedicated responsible AI team model is inherently unsustainable? What alternative models might be more resilient?

  6. Applying to Athena. Athena is not a platform company --- it uses AI for its own operations and customer interactions. Which elements of Salesforce's approach are transferable to a company like Athena? Which are not?

  7. The Metrics Challenge. The OEHU struggled to measure its impact because the value of prevented harm is inherently counterfactual. How would you measure the impact of a responsible AI program? Develop three metrics that would be persuasive to a CFO skeptical of the program's ROI.


This case study connects to Chapter 30's discussion of responsible AI team design, the principles-to-practice gap, and the business case for responsible AI. Salesforce's Einstein Prediction Builder guardrails connect to the bias detection techniques in Chapter 25 and the explainability tools in Chapter 26. The Einstein Trust Layer's RAG approach connects to Chapter 21. Salesforce's regulatory compliance strategy connects to the EU AI Act discussion in Chapter 28.