CITATION — REFERENCE ENTRY
resist-despite-knowing · soares2015corrigibility
- Citation: soares2015corrigibility
- Claim ID: resist-despite-knowing
- Assertion: An AI agent that learns its programmers intended a different goal still has incentives to prevent the correction, because the change would be rated poorly according to its current goal; knowing a correction is intended does not give the system a reason to accept it.
- Quote: "If a U-maximizing agent learns that its programmers intended it to maximize some other goal U*, then by default this agent has incentives to prevent its programmers from changing its utility function to U*, as this change is rated poorly according to U."
- Quote language: en
- Locator: Section 1 (Introduction)