CITATION — REFERENCE ENTRY

misuse-risk · harms2026corrigibility-misuse

Revision 3f98de5d-a2cb-4dd5-8cc2-94b8ae1ea2e5 · 3/28/2026, 9:10:09 AM UTC
Claim ID
misuse-risk
Assertion
A perfectly corrigible agent is indifferent to morality and will obediently serve whoever controls it, including bad actors; Harms lists this "amoral servitude" among the reasons the corrigibility approach remains extremely risky.
Quote
Amoral servitude: A perfectly corrigible agent doesn't care about morality; it only cares about being something like a tool of the humans. If a bad actor is in charge, the AI will obediently help them commit atrocities (e.g. building bioweapons).
Quote language
en
Locator
Episode summary, 'This approach remains extremely risky'
Available in