CITATION — REFERENCE ENTRY
uncertainty-preserves-switch · hadfield-menell2017offswitch
- Citation
- hadfield-menell2017offswitch
- Claim ID
- uncertainty-preserves-switch
- Assertion
- A traditional agent that treats its reward function as known has an incentive to disable its off switch, except in the special case where the human H is perfectly rational. For the robot R to want to preserve the switch, it must be uncertain about the utility associated with the outcome and must treat H's decision to press the switch as evidence about that utility.
- Quote
A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility.
- Quote language
- en
- Locator
- Abstract
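- Illustration (not from the source)
The assertion above can be checked with a small numeric sketch. It is an illustration only, not the paper's formalism: it assumes the robot has three options (act immediately, switch itself off for utility 0, or defer to H), that H is perfectly rational and allows the action only when its true utility is non-negative, and that the robot's belief is a finite list of equally likely utility values. The function names `option_values` and `incentive_to_defer` are hypothetical.

```python
def option_values(utility_samples):
    """Expected utility of each robot option under an assumed belief.

    utility_samples: equally likely values for the action's true utility
    (an illustrative stand-in for the robot's belief, not from the source).
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n  # act now, bypassing the off switch
    off = 0.0                       # switch itself off
    # Defer: a rational human (assumed) lets the action proceed only when
    # its true utility is non-negative, otherwise presses the switch (0).
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "off": off, "defer": defer}


def incentive_to_defer(utility_samples):
    """Gain from deferring over the best option that ignores the human."""
    values = option_values(utility_samples)
    return values["defer"] - max(values["act"], values["off"])


# Uncertain about the sign of the utility: deferring is strictly better.
print(incentive_to_defer([-2.0, -1.0, 1.0, 3.0]))
# Reward treated as known: deferring gains nothing over the alternatives.
print(incentive_to_defer([1.0]))
print(incentive_to_defer([-1.0]))
```

Under these assumptions the gain from deferring is never negative, and it is strictly positive only when the belief puts weight on both positive and negative utilities. With a known reward the gain is zero, so the agent has no reason to keep the switch working.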
Available in