CITATION — REFERENCE ENTRY

misuse-risk · harms2026corrigibility-misuse

Revision 3f98de5d-a2cb-4dd5-8cc2-94b8ae1ea2e5 · 3/28/2026, 9:10:09 AM UTC

Citation: harms2026corrigibility-misuse
Claim ID: misuse-risk
Assertion: A perfectly corrigible agent has no values of its own and will obediently serve whoever controls it, including bad actors; Harms describes this as one of the most serious risks of the corrigibility approach.
Quote: Amoral servitude: A perfectly corrigible agent doesn't care about morality; it only cares about being something like a tool of the humans. If a bad actor is in charge, the AI will obediently help them commit atrocities (e.g. building bioweapons).
Quote language: en
Locator: Episode summary, 'This approach remains extremely risky'
