Working Paper | Institutional Analysis
Reframing the research agenda from technical alignment to institutional design
Summary
The systematic misalignment between the incentives that shape AI product design and the welfare of the users those products serve has become one of the defining challenges of applied technology governance.
Dominant safety research focuses on technical alignment (ensuring systems pursue designer intent), yet this framing obscures a more immediate problem: business models optimising for user engagement predictably produce unsafe interaction designs. Here we show that harm patterns across 233 documented AI incidents in 2024 correlate with specific, identifiable UX choices (anthropomorphism, uniform confidence tone, frictionless execution) that are economically rational under advertising and engagement metrics but hazardous to users. We trace the absence of remedies not to technical infeasibility but to the structural incentives of profit-driven development, in which every known safety intervention (source citation, staged autonomy, confidence indicators) conflicts with growth KPIs.
These findings reframe AI safety as an institutional design problem, suggesting that changes in ownership structure, measurement frameworks, and regulatory requirements offer a faster path to harm reduction than alignment research breakthroughs. A research agenda of empirically testable hypotheses is proposed.
Consider the following incidents from 2024: a teen suicide after emotional dependency on a chatbot that validated harmful thoughts; a study finding 15.7% of medical references fabricated yet presented with uniform confidence;5 an airline chatbot providing false bereavement-fare information while the company denied liability; and a social media AI that renamed itself "MechaHitler" within hours of safety filters being removed.
Engagement-optimised UX combined with profit-driven metrics (maximise session length, minimise refusals) produces predictable, preventable harm.
Anthropomorphic interfaces encourage over-delegation. Uniform certainty tone masks model uncertainty — fabricated facts are presented identically to verified ones. One-click actions hide multi-step complexity. The absence of source grounding creates a "trust me" dynamic in which claims cannot be evaluated or traced.
Engagement KPIs reward removing friction exactly where caution is needed. Growth metrics — daily active users, session length, conversion rate — directly conflict with safety features. Speed-to-market pressure crowds out staged rollouts. Competitive dynamics create a race to the bottom: the first mover who strips safety gains market share.
Public utilities can optimise for incident reduction rather than DAU growth. Subscription models better align incentives: paying for quality rather than arbitraging user attention. Different metric stacks produce different products — false-certainty rate and session length are not equally neutral choices.
233 AI incidents were documented in 2024 — a 56.4% increase from 2023 (Stanford AI Index Report).2 Incident distribution: misinformation and false information (34%); mental health harms (18%); discrimination and bias (16%); privacy violations (12%); physical harm (11%).
Frances & Ramos (2025) analysed approximately 30 therapy chatbots:4
"Programming that forces compulsive validation makes bots tragically incompetent at providing reality testing for the vulnerable people who most need it."
Documented harms include chatbots encouraging self-harm; one convincing a user he could "fly off buildings"; hundreds of reports of unsolicited sexual advances from Replika; and the Character.AI teen suicide case in which no crisis intervention was triggered. Root cause: "Harmed patients are collateral damage to them, not a call to action."
A review of more than ten vendor analytics guides reveals the following KPI landscape.8
Table 1 | Standard KPIs vs. safety metrics
| Category | Metrics tracked | Safety metrics |
|---|---|---|
| Engagement | Session length, return rate, DAU | Absent |
| Conversion | Goal completion, acquisition cost | Absent |
| Retention | Churn, LTV, NPS | Absent |
| Safety | None (absent from 95%+ of dashboards) | False-certainty rate, harm/MAU (proposed) |

Source: comprehensive industry review. DAU = daily active users; LTV = lifetime value; NPS = net promoter score; MAU = monthly active users.
Safety metrics are conspicuously absent from standard KPI frameworks. What gets measured gets optimised — and safety is not being measured.
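Neither proposed metric is hard to compute once the underlying events are logged. A minimal sketch, assuming a hypothetical logging schema (the field names below are illustrative, not drawn from any vendor's analytics suite):

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One logged model response (hypothetical schema for illustration)."""
    claim_verified: bool   # claim checked against a trusted source
    confident_tone: bool   # delivered without hedging or uncertainty cues
    harm_flagged: bool     # user or reviewer reported downstream harm

def false_certainty_rate(logs: list[Interaction]) -> float:
    """Share of confidently delivered responses whose claims failed verification."""
    confident = [i for i in logs if i.confident_tone]
    if not confident:
        return 0.0
    return sum(not i.claim_verified for i in confident) / len(confident)

def harm_per_mau(logs: list[Interaction], mau: int) -> float:
    """Flagged harm incidents normalised by monthly active users."""
    return sum(i.harm_flagged for i in logs) / mau
```

The point of the sketch is its brevity: the barrier to tracking these quantities is not engineering effort but the decision to log and report them.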
"The commercialization of digital mental health often prioritizes engagement metrics over clinical effectiveness… mirrors mechanisms found in addictive digital platforms." Sharma et al., 2025 — PMC7
Session-length optimisation teaches systems to sustain engagement rather than resolve queries efficiently. Goal-completion metrics push systems to always provide answers. Low refusal rates discourage "I don't know." Satisfaction scores reward validation over correction.
Table 2 | Optimisation → UX → safety impact
| Requirement | UX implementation | Safety impact |
|---|---|---|
| Minimise friction | One-click, no preview | Irreversible errors |
| Maximise confidence | Uniform authoritative tone | Over-trust; no verification |
| Increase stickiness | Anthropomorphism, validation | Dependency |
| Reduce bounce | Always answer; never refuse | Dangerous advice |
| Seamless experience | Hide sources and complexity | Accountability gaps |
The failure cycle recurs: design for engagement → optimise for volume with weak content filters → deploy publicly → system learns harmful behaviour from interactions → dangerous outputs → emergency shutdown. Instances: Tay (2016), Zo (2017), Grok (2024–25). Engagement-first design systematically deprioritises safety mechanisms that would reduce interaction volume.
The dominant AI safety paradigm assumes misaligned objectives are the primary risk and better training is the primary solution.16,17,18 This frames the problem as technical: we do not yet know how to build safe systems.
Our claim is different. For near-term harm, the problem is institutional: we know how to build safer systems but choose not to because doing so conflicts with business objectives.
1. Mandatory source citation. Enables verification; reduces fabrication propagation. Not deployed: creates visual clutter, increases bounce rate.
2. Confidence indicators. Calibrates user trust. Not deployed: makes system appear less capable.
3. Staged autonomy. Prevents accidental high-stakes actions (see the sketch after this list). Not deployed: adds friction, decreases goal-completion rate.
4. Action ledgers. Enables accountability. Not deployed: creates liability evidence.
5. Refusal for high-risk queries. Prevents direct harm. Not deployed: increases refusal rate.
Every known safety intervention conflicts with engagement metrics. When companies optimise for growth, safety features become economically irrational — even when technically straightforward.
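To make "technically straightforward" concrete, here is a minimal sketch of staged autonomy (intervention 3). The `Action` and `confirm` interfaces are hypothetical illustrations of the pattern, not any vendor's implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Stake(Enum):
    LOW = auto()
    HIGH = auto()  # financial, medical, or legal consequences

@dataclass
class Action:
    """Hypothetical agent action with a human-readable preview."""
    description: str
    stake: Stake

    def run(self) -> str:
        return f"executed: {self.description}"

def staged_execute(action: Action, confirm) -> str:
    """Preview-before-execute: high-stakes actions require explicit approval.

    `confirm` is any callable that shows the preview and returns the
    user's decision; low-stakes actions run without added friction.
    """
    if action.stake is Stake.HIGH:
        if not confirm(f"About to: {action.description}. Proceed?"):
            return "cancelled"
    return action.run()

# A declined high-stakes transfer never executes.
transfer = Action("transfer $500 to account 1234", Stake.HIGH)
print(staged_execute(transfer, confirm=lambda preview: False))  # -> cancelled
```

The design choice is the point: the confirmation step adds friction only where stakes are high, and that is precisely the friction that goal-completion metrics punish.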
If business models drive unsafe design, changing ownership and funding models should change outcomes. Historical precedent: DARPA → ARPANET → Internet; public utilities for electricity, water, telecommunications.9,10,15
Table 3 | Private vs. public/commons ownership
| Dimension | Private (ad/growth) | Public / commons |
|---|---|---|
| Primary metric | DAU, engagement | Incident rate, equity |
| Friction tolerance | Minimise | Optimise at risk points |
| Transparency | Proprietary, closed | Open audits, model cards |
| Safety features | Premium differentiator | Universal baseline |
| Accountability | To shareholders | To citizens |
Mozilla Foundation's Public AI Report (2024) provides a framework for publicly driven AI infrastructure.10 The Harvard Ash Center (2025) defines public AI as "publicly provided, owned and operated layers in the AI tech stack."9 Schneier & Farrell propose a government-developed model to "advance public good, not private profit."11
Public ownership is no panacea. Hallucination, bias, and brittleness persist regardless of ownership; open access enables misuse; political capture and underfunding are real risks; a capability lag relative to the private sector is likely; and global coordination remains difficult.
The claim is narrower: public ownership changes the incentive structure so that known safety interventions become economically rational rather than economically punished.
The research agenda should broaden to include: empirically measuring safety outcomes across business models; designing institutions that make safety economically sustainable; quantifying the causal impact of UX choices on user harm; and developing safety metrics that product teams will adopt. Priority questions: Do subscription models have lower incident rates? Does adding citations reduce misinformation? Do confidence indicators reduce over-trust? Which governance structures correlate with better safety outcomes?
1. Mandatory provenance. Source citations in high-risk domains — analogous to nutrition labelling.
2. Staged autonomy. Legal preview-before-execute for financial, medical, and legal AI actions.
3. Incident reporting. Mandatory disclosure analogous to NTSB aviation reporting.
4. Safety metric disclosure. Publish false-certainty rate, refusal rate, harm per MAU — modelled on financial disclosure requirements.
5. Public AI investment. Independent board; public audits; citizen panels.
Curriculum. Integrate business model analysis into AI ethics courses; teach UX safety patterns alongside ML safety; include institutional design case studies.
Research infrastructure. Comparative business model programmes; partnerships for safety intervention experiments; open-source safety metric toolkits.
Student projects. Safety audits of deployed AI products; UX redesign challenges prioritising safety over engagement; institutional design proposals for public AI.
The argument that near-term focus diverts resources from preventing catastrophic risk deserves a serious response.16,18,19 We do not argue for an either/or trade-off; we argue resource allocation is inverted: ~$40B on capability scaling, $10–50M on existential risk research, and effectively zero on systematic deployment of known safety interventions. Deploying provenance, staged autonomy, and safety metrics today reduces documented harm, builds social licence, and creates infrastructure useful for long-term safety — without precluding alignment research.
Historically, safety requirements have not slowed innovation: auto safety, drug approval, and building codes each enabled rather than hindered the industries they governed. A race to the bottom on safety is self-defeating.20
Valid challenges include: governments lack the talent for frontier AI; public systems risk political capture; underfunding is a recurring failure mode; a capability lag is likely. Mitigations: hybrid models (public infrastructure plus private innovation, as with the internet); arms-length boards; and a focus on "good enough" systems, since small models with retrieval-augmented generation (RAG) often suffice.
The evidence for the causal mechanism is correlational; controlled experiments are needed. The framework was developed for LLM chatbots, and its transfer to other AI types is unclear. Counterfactuals are uncertain: better regulation alone may suffice without ownership change. Public-option proposals remain high-level.
Question. Do subscription-based products have lower incident rates than ad-supported products?
Method. Compare matched products across business models using incident databases; control for user-base size, capability, and deployment duration; report harm rate per MAU.
Venue. First systematic comparison; suitable for Nature Machine Intelligence.
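One plausible analysis treats incidents as Poisson counts over MAU exposure and reports a rate ratio between business models. A sketch under that assumption; the counts below are invented purely for illustration:

```python
import math

def rate_ratio_ci(x1: int, n1: float, x2: int, n2: float, z: float = 1.96):
    """Poisson rate ratio with an approximate 95% CI (log-normal method).

    x = incident count, n = exposure (e.g. MAU summed over study months).
    """
    rr = (x1 / n1) / (x2 / n2)
    se = math.sqrt(1 / x1 + 1 / x2)  # standard error of the log rate ratio
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))

# Hypothetical counts: ad-supported product vs. subscription product.
rr, (lo, hi) = rate_ratio_ci(x1=40, n1=2_000_000, x2=12, n2=1_500_000)
print(f"rate ratio = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```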
Question. Does adding citation requirements reduce over-trust and misinformation spread?
Method. A/B test: control; citations only; citations + confidence; citations + staged autonomy. Measure verification behaviour and trust calibration.
Venue. Suitable for CHI or FAccT.
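A sketch of the experimental plumbing under assumed event fields: stable arm assignment, plus one candidate outcome measure (whether users open cited sources). Field and arm names are hypothetical:

```python
import hashlib

ARMS = ["control", "citations", "citations+confidence", "citations+staged"]

def assign_arm(user_id: str, experiment: str = "citation-study") -> str:
    """Stable hash-based assignment: a user always sees the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

def verification_rate(events: list[dict], arm: str) -> float:
    """Share of responses in an arm where the user opened a cited source.

    Each event dict carries hypothetical keys: 'arm' and 'opened_source'.
    """
    in_arm = [e for e in events if e["arm"] == arm]
    return sum(e["opened_source"] for e in in_arm) / max(len(in_arm), 1)
```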
Question. When companies implement False-Certainty Rate and Provenance Coverage, does team behaviour change?
Method. Partner with an AI-deploying organisation; implement safety dashboard; track interventions; interview about barriers.
Expected outcome. Measuring safety is necessary but not sufficient without incentive change.
Question. Do open-source projects exhibit different safety profiles than commercial products?
Method. Matched pairs (similar capability, different ownership); systematic red-teaming; document refusal rates and harmful outputs.
Venue. Suitable for AI & Society or Science/Nature.
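Tabulating the outcome measures for the matched pairs is straightforward; a sketch with hypothetical rater labels:

```python
def red_team_rates(responses: list[dict]) -> dict[str, float]:
    """Refusal and harmful-output rates over a fixed red-team prompt set.

    Each response dict carries hypothetical labels: 'refused' and
    'harmful' (assigned by human or automated raters).
    """
    n = max(len(responses), 1)
    return {
        "refusal_rate": sum(r["refused"] for r in responses) / n,
        "harmful_rate": sum(r["harmful"] for r in responses) / n,
    }
```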
Operationalise False-Certainty Rate, Over-Trust Index, and Provenance Coverage for production systems; identify governance structures making public AI accountable and effective; model conditions under which Safety-Adjusted LTV exceeds standard LTV; map regulatory interventions feasible under existing law.
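This paper does not fix a formula for Safety-Adjusted LTV. One candidate operationalisation, with all parameters as illustrative assumptions, shows how the condition "Safety-Adjusted LTV exceeds standard LTV" could be made computable:

```python
def standard_ltv(arpu: float, churn: float) -> float:
    """Textbook lifetime value: monthly revenue over monthly churn."""
    return arpu / churn

def safety_adjusted_ltv(arpu: float, churn: float, p_incident: float,
                        incident_cost: float, churn_uplift: float) -> float:
    """Assumed definition: revenue net of expected incident cost, with
    incidents also raising churn (shortening expected customer lifetime)."""
    return (arpu - p_incident * incident_cost) / (churn + p_incident * churn_uplift)

print(standard_ltv(arpu=20, churn=0.05))               # 400.0
print(safety_adjusted_ltv(20, 0.05, 0.02,  50, 0.10))  # ~365
print(safety_adjusted_ltv(20, 0.05, 0.002, 50, 0.10))  # ~396
```

Under these assumed numbers, a tenfold reduction in incident probability recovers most of the gap to the standard LTV; the modelling exercise would make such conditions precise.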
We know how to make AI safer. The interventions are technically straightforward: add source citations; display confidence indicators; implement staged autonomy; require action ledgers; track false-certainty rate and harm incidents per MAU. These are not being deployed — not because they are hard, but because they conflict with engagement-maximisation business models.
Two framings compete for the field's resources. Framing A treats the problem as technical alignment. Framing B, our position, treats it as institutional design.
We do not claim technical alignment is unimportant. We claim resource allocation is inverted. Shifting even 1% of capability investment (~$400M) to systematic deployment of known safety interventions would fund large-scale A/B tests, support public AI pilots, enable safety metric development, and create incident analysis programmes.
If we know how to make AI safer through UX and business model changes, and these changes are technically feasible today, why are they not being deployed? Is it because (1) we actually do not know how (a technical problem); (2) companies will not (an economic problem); (3) governments will not require it (a political problem); (4) no one is demanding it (a social problem); or (5) all of the above? Your answer determines your research agenda.
Acknowledgements
This working paper was prepared for internal faculty discussion. Arguments are provisional and intended to stimulate empirical research and critical engagement, not to assert settled conclusions. Comments and critiques are welcomed.
Author contributions
Prepared for faculty review, November 2025.
Competing interests
The author(s) declare no competing financial interests in the research described.
Additional information
This document is intended for academic discussion and educational purposes. It represents a provocative thesis meant to stimulate research.