CITATION — REFERENCE ENTRY
The Off-Switch Game — Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
- Key: hadfield-menell2017offswitch
- Authors: Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart
- Issued: 2017
- Type: paper-conference
- Container: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
- Publisher: IJCAI Organization
Raw CSL JSON
{
  "URL": "https://people.eecs.berkeley.edu/~russell/papers/ijcai17-offswitch.pdf",
  "type": "paper-conference",
  "title": "The Off-Switch Game",
  "author": [
    {
      "given": "Dylan",
      "family": "Hadfield-Menell"
    },
    {
      "given": "Anca",
      "family": "Dragan"
    },
    {
      "given": "Pieter",
      "family": "Abbeel"
    },
    {
      "given": "Stuart",
      "family": "Russell"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2017
      ]
    ]
  },
  "publisher": "IJCAI Organization",
  "container-title": "Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence"
}
Claims
- The incentives for a cooperative agent to defer to a human's decisions stem from the agent's uncertainty about the human's preferences and the assumption that the human is effective at choosing actions in accordance with those preferences.
  "The incentives for a cooperative agent to defer to another actor's (e.g., a human's) decisions stem from uncertainty about that actor's preferences and the assumption that actor is effective at choosing actions in accordance with those preferences."
- A traditional agent that treats its reward function as known has an incentive to disable its off switch, except in the special case where the human is perfectly rational. If the agent is instead uncertain about the utility of the outcome and treats the human's decision to press the switch as evidence about that utility, it has a positive incentive to preserve the switch.
  "A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility."
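The second claim can be illustrated with a small numerical sketch. This is not the paper's model or code: it assumes a discrete belief over the utility of the robot's action, a switch-off payoff of 0, and a deliberately simple human who makes the wrong call with a fixed probability. The function names and the `error_rate` parameter are hypothetical, introduced only for illustration.

```python
# Hypothetical sketch of the off-switch trade-off (not the paper's code).
# A belief is a list of (utility, probability) pairs for the robot's action.

def value_act(belief):
    """Expected utility if the robot acts without consulting the human."""
    return sum(p * u for u, p in belief)

def value_defer(belief, error_rate=0.0):
    """Expected utility if the robot leaves the off switch to the human.

    Assumed human model: allows the action when u > 0 and switches the
    robot off otherwise, but makes the opposite choice with probability
    error_rate. Being switched off is worth 0.
    """
    total = 0.0
    for u, p in belief:
        p_allow = (1.0 - error_rate) if u > 0 else error_rate
        total += p * p_allow * u
    return total

def incentive_to_defer(belief, error_rate=0.0):
    """Gain from deferring over the robot's best unilateral option
    (act directly, or switch itself off for a payoff of 0)."""
    best_alone = max(value_act(belief), 0.0)
    return value_defer(belief, error_rate) - best_alone

uncertain = [(-2.0, 0.5), (1.0, 0.5)]  # robot unsure whether acting helps
certain = [(1.0, 1.0)]                 # robot treats its reward as known

print(incentive_to_defer(uncertain))               # rational human
print(incentive_to_defer(certain))                 # rational human
print(incentive_to_defer(certain, error_rate=0.1)) # imperfect human
```

Under these assumptions, the uncertain robot gains from deferring to a rational human, the certain robot is indifferent when the human is rational, and the certain robot loses by deferring once the human can err, which is the incentive to disable the switch described in the claim.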
Available in