CITATION — REFERENCE ENTRY

definition · soares2015corrigibility

Revision c41d88ed-2a37-4438-8fdf-3369e54b932f · 3/27/2026, 8:25:50 PM UTC
Claim ID
definition
Assertion
An AI system is corrigible if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.
Quote
We call an AI system "corrigible" if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.
Quote language
en
Locator
Abstract
Available in