Working Paper | Institutional Analysis
Reframing the research agenda from technical alignment to institutional design
Summary
The systematic misalignment between the incentives that shape AI product design and the welfare of the users those products serve has become one of the defining challenges of applied technology governance.
Dominant safety research focuses on technical alignment (ensuring systems pursue designer intent), yet this framing obscures a more immediate problem: business models optimising for user engagement predictably produce unsafe interaction designs. Here we show that harm patterns across 233 documented AI incidents in 2024 correlate with specific, identifiable UX choices (anthropomorphism, uniform confidence tone, frictionless execution) that are economically rational under advertising and engagement metrics but hazardous to users. We trace the absence of remedies not to technical infeasibility but to the structural incentives of profit-driven development, in which every known safety intervention (source citation, staged autonomy, confidence indicators) conflicts with growth KPIs.
These findings reframe AI safety as an institutional design problem, suggesting that changes in ownership structure, measurement frameworks, and regulatory requirements offer a faster path to harm reduction than alignment research breakthroughs. A research agenda of empirically testable hypotheses is proposed.
Consider the following incidents from 2024: a teen suicide after emotional dependency on a chatbot that validated harmful thoughts; a study finding 15.7% of medical references fabricated yet presented with uniform confidence;5 an airline chatbot providing false bereavement-fare information while the company denied liability; and a social media AI that renamed itself "MechaHitler" within hours of safety filters being removed.
Engagement-optimised UX combined with profit-driven metrics (maximise session length, minimise refusals) produces predictable, preventable harm.
Anthropomorphic interfaces encourage over-delegation. Uniform certainty tone masks model uncertainty — fabricated facts are presented identically to verified ones. One-click actions hide multi-step complexity. The absence of source grounding creates a "trust me" dynamic in which claims cannot be evaluated or traced.
Engagement KPIs reward removing friction exactly where caution is needed. Growth metrics — daily active users, session length, conversion rate — directly conflict with safety features. Speed-to-market pressure crowds out staged rollouts. Competitive dynamics create a race to the bottom: the first mover who strips safety gains market share.
Public utilities can optimise for incident reduction rather than DAU growth. Subscription models better align incentives: paying for quality rather than arbitraging user attention. Different metric stacks produce different products — false-certainty rate and session length are not equally neutral choices.
233 AI incidents were documented in 2024 — a 56.4% increase from 2023 (Stanford AI Index Report).2 Incident distribution: misinformation and false information (34%); mental health harms (18%); discrimination and bias (16%); privacy violations (12%); physical harm (11%).
Frances & Ramos (2025) analysed approximately 30 therapy chatbots:4
"Programming that forces compulsive validation makes bots tragically incompetent at providing reality testing for the vulnerable people who most need it."
Documented harms include chatbots encouraging self-harm; one convincing a user he could "fly off buildings"; hundreds of reports of unsolicited sexual advances from Replika; and the Character.AI teen suicide case in which no crisis intervention was triggered. Root cause: "Harmed patients are collateral damage to them, not a call to action."
A review of more than ten vendor analytics guides reveals the following KPI landscape.8
Table 1 | Standard KPIs vs. safety metrics
| Category | Metrics tracked | Safety metrics |
|---|---|---|
| Engagement | Session length, return rate, DAU | Absent |
| Conversion | Goal completion, acquisition cost | Absent |
| Retention | Churn, LTV, NPS | Absent |
| Safety | None (absent from 95%+ of dashboards) | False-certainty rate, harm/MAU (proposed) |

Source: comprehensive industry review. DAU = daily active users; LTV = lifetime value; NPS = net promoter score; MAU = monthly active users.
Safety metrics are conspicuously absent from standard KPI frameworks. What gets measured gets optimised — and safety is not being measured.
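Neither proposed metric is hard to compute once the underlying events are logged. A minimal sketch, assuming a hypothetical logging schema (the field names below are illustrative, not drawn from any vendor's analytics suite):

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One logged model response (hypothetical schema for illustration)."""
    claim_verified: bool   # claim checked against a trusted source
    confident_tone: bool   # delivered without hedging or uncertainty cues
    harm_flagged: bool     # user or reviewer reported downstream harm

def false_certainty_rate(logs: list[Interaction]) -> float:
    """Share of confidently delivered responses whose claims failed verification."""
    confident = [i for i in logs if i.confident_tone]
    if not confident:
        return 0.0
    return sum(not i.claim_verified for i in confident) / len(confident)

def harm_per_mau(logs: list[Interaction], mau: int) -> float:
    """Flagged harm incidents normalised by monthly active users."""
    return sum(i.harm_flagged for i in logs) / mau
```

The point of the sketch is its brevity: the barrier to tracking these quantities is not engineering effort but the decision to log and report them.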
"The commercialization of digital mental health often prioritizes engagement metrics over clinical effectiveness… mirrors mechanisms found in addictive digital platforms." Sharma et al., 2025 — PMC7
Session-length optimisation teaches systems to sustain engagement rather than resolve queries efficiently. Goal-completion metrics push systems to always provide answers. Low refusal rates discourage "I don't know." Satisfaction scores reward validation over correction.
Table 2 | Optimisation → UX → safety impact
| Requirement | UX implementation | Safety impact |
|---|---|---|
| Minimise friction | One-click, no preview | Irreversible errors |
| Maximise confidence | Uniform authoritative tone | Over-trust; no verification |
| Increase stickiness | Anthropomorphism, validation | Dependency |
| Reduce bounce | Always answer; never refuse | Dangerous advice |
| Seamless experience | Hide sources and complexity | Accountability gaps |
The failure cycle recurs: design for engagement → optimise for volume with weak content filters → deploy publicly → system learns harmful behaviour from interactions → dangerous outputs → emergency shutdown. Instances: Tay (2016), Zo (2017), Grok (2024–25). Engagement-first design systematically deprioritises safety mechanisms that would reduce interaction volume.
The dominant AI safety paradigm assumes misaligned objectives are the primary risk and better training is the primary solution.16,17,18 This frames the problem as technical: we do not yet know how to build safe systems.
Our claim is different. For near-term harm, the problem is institutional: we know how to build safer systems but choose not to because doing so conflicts with business objectives.
1. Mandatory source citation. Enables verification; reduces fabrication propagation. Not deployed: creates visual clutter, increases bounce rate.
2. Confidence indicators. Calibrates user trust. Not deployed: makes system appear less capable.
3. Staged autonomy. Prevents accidental high-stakes actions (see the sketch after this list). Not deployed: adds friction, decreases goal-completion rate.
4. Action ledgers. Enables accountability. Not deployed: creates liability evidence.
5. Refusal for high-risk queries. Prevents direct harm. Not deployed: increases refusal rate.
Every known safety intervention conflicts with engagement metrics. When companies optimise for growth, safety features become economically irrational — even when technically straightforward.
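To make "technically straightforward" concrete, here is a minimal sketch of staged autonomy (intervention 3). The `Action` and `confirm` interfaces are hypothetical illustrations of the pattern, not any vendor's implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Stake(Enum):
    LOW = auto()
    HIGH = auto()  # financial, medical, or legal consequences

@dataclass
class Action:
    """Hypothetical agent action with a human-readable preview."""
    description: str
    stake: Stake

    def run(self) -> str:
        return f"executed: {self.description}"

def staged_execute(action: Action, confirm) -> str:
    """Preview-before-execute: high-stakes actions require explicit approval.

    `confirm` is any callable that shows the preview and returns the
    user's decision; low-stakes actions run without added friction.
    """
    if action.stake is Stake.HIGH:
        if not confirm(f"About to: {action.description}. Proceed?"):
            return "cancelled"
    return action.run()

# A declined high-stakes transfer never executes.
transfer = Action("transfer $500 to account 1234", Stake.HIGH)
print(staged_execute(transfer, confirm=lambda preview: False))  # -> cancelled
```

The design choice is the point: the confirmation step adds friction only where stakes are high, and that is precisely the friction that goal-completion metrics punish.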
If business models drive unsafe design, changing ownership and funding models should change outcomes. Historical precedent: DARPA → ARPANET → Internet; public utilities for electricity, water, telecommunications.9,10,15
Table 3 | Private vs. public/commons ownership
| Dimension | Private (ad/growth) | Public / commons |
|---|---|---|
| Primary metric | DAU, engagement | Incident rate, equity |
| Friction tolerance | Minimise | Optimise at risk points |
| Transparency | Proprietary, closed | Open audits, model cards |
| Safety features | Premium differentiator | Universal baseline |
| Accountability | To shareholders | To citizens |
Mozilla Foundation's Public AI Report (2024) provides a framework for publicly driven AI infrastructure.10 The Harvard Ash Center (2025) defines public AI as "publicly provided, owned and operated layers in the AI tech stack."9 Schneier & Farrell propose a government-developed model to "advance public good, not private profit."11
Public ownership is no panacea. Hallucination, bias, and brittleness persist regardless of ownership; open access enables misuse; political capture and underfunding are real risks; a capability lag relative to the private sector is likely; and global coordination remains difficult.
The claim is narrower: public ownership changes the incentive structure so that known safety interventions become economically rational rather than economically punished.
The research agenda should broaden to include: empirically measuring safety outcomes across business models; designing institutions that make safety economically sustainable; quantifying the causal impact of UX choices on user harm; and developing safety metrics that product teams will adopt. Priority questions: Do subscription models have lower incident rates? Does adding citations reduce misinformation? Do confidence indicators reduce over-trust? Which governance structures correlate with better safety outcomes?
1. Mandatory provenance. Source citations in high-risk domains — analogous to nutrition labelling.
2. Staged autonomy. Legal preview-before-execute for financial, medical, and legal AI actions.
3. Incident reporting. Mandatory disclosure analogous to NTSB aviation reporting.
4. Safety metric disclosure. Publish false-certainty rate, refusal rate, harm per MAU — modelled on financial disclosure requirements.
5. Public AI investment. Independent board; public audits; citizen panels.
Curriculum. Integrate business model analysis into AI ethics courses; teach UX safety patterns alongside ML safety; include institutional design case studies.
Research infrastructure. Comparative business model programmes; partnerships for safety intervention experiments; open-source safety metric toolkits.
Student projects. Safety audits of deployed AI products; UX redesign challenges prioritising safety over engagement; institutional design proposals for public AI.
The argument that near-term focus diverts resources from preventing catastrophic risk deserves a serious response.16,18,19 We do not argue for an either/or trade-off; we argue resource allocation is inverted: ~$40B on capability scaling, $10–50M on existential risk research, and effectively zero on systematic deployment of known safety interventions. Deploying provenance, staged autonomy, and safety metrics today reduces documented harm, builds social licence, and creates infrastructure useful for long-term safety — without precluding alignment research.
Historically, safety requirements have not slowed innovation: auto safety, drug approval, and building codes each enabled rather than hindered the industries they governed. A race to the bottom on safety is self-defeating.20
Valid challenges include: governments lack the talent for frontier AI; public systems risk political capture; underfunding is a recurring failure mode; a capability lag is likely. Mitigations: hybrid models (public infrastructure plus private innovation, as with the internet); arms-length boards; and a focus on "good enough" systems, since small models with retrieval-augmented generation (RAG) often suffice.
The evidence for the causal mechanism is correlational; controlled experiments are needed. The framework was developed for LLM chatbots, and its transfer to other AI types is unclear. Counterfactuals are uncertain: better regulation alone may suffice without ownership change. Public-option proposals remain high-level.
Question. Do subscription-based products have lower incident rates than ad-supported products?
Method. Compare matched products across business models using incident databases; control for user-base size, capability, and deployment duration; report harm rate per MAU.
Venue. First systematic comparison; suitable for Nature Machine Intelligence.
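One plausible analysis treats incidents as Poisson counts over MAU exposure and reports a rate ratio between business models. A sketch under that assumption; the counts below are invented purely for illustration:

```python
import math

def rate_ratio_ci(x1: int, n1: float, x2: int, n2: float, z: float = 1.96):
    """Poisson rate ratio with an approximate 95% CI (log-normal method).

    x = incident count, n = exposure (e.g. MAU summed over study months).
    """
    rr = (x1 / n1) / (x2 / n2)
    se = math.sqrt(1 / x1 + 1 / x2)  # standard error of the log rate ratio
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))

# Hypothetical counts: ad-supported product vs. subscription product.
rr, (lo, hi) = rate_ratio_ci(x1=40, n1=2_000_000, x2=12, n2=1_500_000)
print(f"rate ratio = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```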
Question. Does adding citation requirements reduce over-trust and misinformation spread?
Method. A/B test: control; citations only; citations + confidence; citations + staged autonomy. Measure verification behaviour and trust calibration.
Venue. Suitable for CHI or FAccT.
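A sketch of the experimental plumbing under assumed event fields: stable arm assignment, plus one candidate outcome measure (whether users open cited sources). Field and arm names are hypothetical:

```python
import hashlib

ARMS = ["control", "citations", "citations+confidence", "citations+staged"]

def assign_arm(user_id: str, experiment: str = "citation-study") -> str:
    """Stable hash-based assignment: a user always sees the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

def verification_rate(events: list[dict], arm: str) -> float:
    """Share of responses in an arm where the user opened a cited source.

    Each event dict carries hypothetical keys: 'arm' and 'opened_source'.
    """
    in_arm = [e for e in events if e["arm"] == arm]
    return sum(e["opened_source"] for e in in_arm) / max(len(in_arm), 1)
```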
Question. When companies implement False-Certainty Rate and Provenance Coverage, does team behaviour change?
Method. Partner with an AI-deploying organisation; implement safety dashboard; track interventions; interview about barriers.
Expected outcome. Measuring safety is necessary but not sufficient without incentive change.
Question. Do open-source projects exhibit different safety profiles than commercial products?
Method. Matched pairs (similar capability, different ownership); systematic red-teaming; document refusal rates and harmful outputs.
Venue. Suitable for AI & Society or Science/Nature.
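Tabulating the outcome measures for the matched pairs is straightforward; a sketch with hypothetical rater labels:

```python
def red_team_rates(responses: list[dict]) -> dict[str, float]:
    """Refusal and harmful-output rates over a fixed red-team prompt set.

    Each response dict carries hypothetical labels: 'refused' and
    'harmful' (assigned by human or automated raters).
    """
    n = max(len(responses), 1)
    return {
        "refusal_rate": sum(r["refused"] for r in responses) / n,
        "harmful_rate": sum(r["harmful"] for r in responses) / n,
    }
```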
Operationalise False-Certainty Rate, Over-Trust Index, and Provenance Coverage for production systems; identify governance structures making public AI accountable and effective; model conditions under which Safety-Adjusted LTV exceeds standard LTV; map regulatory interventions feasible under existing law.
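This paper does not fix a formula for Safety-Adjusted LTV. One candidate operationalisation, with all parameters as illustrative assumptions, shows how the condition "Safety-Adjusted LTV exceeds standard LTV" could be made computable:

```python
def standard_ltv(arpu: float, churn: float) -> float:
    """Textbook lifetime value: monthly revenue over monthly churn."""
    return arpu / churn

def safety_adjusted_ltv(arpu: float, churn: float, p_incident: float,
                        incident_cost: float, churn_uplift: float) -> float:
    """Assumed definition: revenue net of expected incident cost, with
    incidents also raising churn (shortening expected customer lifetime)."""
    return (arpu - p_incident * incident_cost) / (churn + p_incident * churn_uplift)

print(standard_ltv(arpu=20, churn=0.05))               # 400.0
print(safety_adjusted_ltv(20, 0.05, 0.02,  50, 0.10))  # ~365
print(safety_adjusted_ltv(20, 0.05, 0.002, 50, 0.10))  # ~396
```

Under these assumed numbers, a tenfold reduction in incident probability recovers most of the gap to the standard LTV; the modelling exercise would make such conditions precise.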
We know how to make AI safer. The interventions are technically straightforward: add source citations; display confidence indicators; implement staged autonomy; require action ledgers; track false-certainty rate and harm incidents per MAU. These are not being deployed — not because they are hard, but because they conflict with engagement-maximisation business models.
Two framings compete for the field's resources. Framing A treats the problem as technical alignment. Framing B, our position, treats it as institutional design.
We do not claim technical alignment is unimportant. We claim resource allocation is inverted. Shifting even 1% of capability investment (~$400M) to systematic deployment of known safety interventions would fund large-scale A/B tests, support public AI pilots, enable safety metric development, and create incident analysis programmes.
If we know how to make AI safer through UX and business model changes, and these changes are technically feasible today, why are they not being deployed? Is it because (1) we actually do not know how (a technical problem); (2) companies will not (an economic problem); (3) governments will not require it (a political problem); (4) no one is demanding it (a social problem); or (5) all of the above? Your answer determines your research agenda.
Acknowledgements
This working paper was prepared for internal faculty discussion. Arguments are provisional and intended to stimulate empirical research and critical engagement, not to assert settled conclusions. Comments and critiques are welcomed.
Author contributions
Prepared for faculty review, November 2025.
Competing interests
The author(s) declare no competing financial interests in the research described.
Additional information
This document is intended for academic discussion and educational purposes. It represents a provocative thesis meant to stimulate research.