Functional Decision Theory
Functional decision theory (FDT) is a theory of rational choice developed by Eliezer Yudkowsky and Nate Soares at the Machine Intelligence Research Institute. It holds that a rational agent should take whichever action would be recommended by the decision-making procedure that, if adopted, would yield the best outcomes across all situations in which the agent finds itself — including situations in which other agents have made predictions about the agent's behavior. [1:1] FDT was introduced in a 2017 preprint and is presented as an improvement on the two dominant prior theories of rational choice: causal decision theory (CDT) and evidential decision theory (EDT). It is the third in a sequence of theories developed at MIRI, following timeless decision theory (TDT) and updateless decision theory (UDT). [2:1]
In plain terms, FDT asks not "what should I do right now?" but "what kind of decision-maker should I be?" — and then recommends acting in accordance with whichever decision procedure would, if consistently followed, produce the best results. [1:2] To see why this matters, consider a simple case: a highly accurate oracle has studied your reasoning and will give you $1,000,000 if it predicts you are the kind of person who trusts its predictions, and nothing if it predicts you are not. A purely situational reasoner might argue that whatever the oracle has already written down is fixed, so distrust costs nothing. But if your distrust is the very thing the oracle predicted, you will walk away empty-handed. FDT holds that the right response is to be — reliably and not just situationally — the kind of agent that trusts the oracle's track record, because that is the disposition that pays off across all the cases where such an oracle appears. This is the core intuition behind one-boxing in Newcomb's problem, FDT's canonical test case, described in detail below. [1:3]
Background: Causal and evidential decision theory
The modern debate in decision theory is largely structured around two families of theories and the problems they struggle with.
Causal decision theory, systematized by Allan Gibbard, William Harper, and James Joyce among others, holds that an agent should take the action with the highest expected utility computed by considering only the causal consequences of that action. [3] [4] CDT's core commitment is that only genuine causal influence on outcomes matters: correlation between an action and a good outcome that is not mediated by causation is no reason to take that action. In Newcomb's problem — introduced by Robert Nozick in 1969 — a highly accurate predictor places $1,000,000 in a closed box if they expect the agent to take only that box, and $0 otherwise; a second transparent box always contains $1,000. [5:1] Because the predictor has already acted before the agent chooses, CDT reasons that the contents of the closed box are fixed, so the agent should take both boxes and collect an extra $1,000. [3:1] Because the predictor is stipulated to be highly accurate, however, almost all one-boxers end up with $1,000,000 while two-boxers end up with $1,000 — a divergence many philosophers treat as a decisive mark against CDT. [4:1]
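The payoff divergence described above can be made concrete with a short expected-value calculation. This is a toy sketch: the 99% accuracy figure and the function name are illustrative choices, not drawn from the sources.

```python
# Expected payoffs in Newcomb's problem for a predictor of given accuracy.
# $1,000,000 sits in the closed box iff one-boxing was predicted;
# the transparent box always holds $1,000.

def expected_payoff(action: str, accuracy: float) -> float:
    """Expected dollars when the predictor forecasts the agent's
    choice correctly with probability `accuracy`."""
    big, small = 1_000_000, 1_000
    if action == "one-box":
        # Correct prediction (prob. `accuracy`) means the closed box is full.
        return accuracy * big
    # Correct prediction of two-boxing means the closed box is empty.
    return accuracy * small + (1 - accuracy) * (big + small)

# With a 99%-accurate predictor, one-boxers do far better on average:
print(expected_payoff("one-box", 0.99))   # ~990,000
print(expected_payoff("two-box", 0.99))   # ~11,000
```

The two-boxer's expected take only exceeds the one-boxer's when the predictor's accuracy drops near chance, which is why the problem stipulates a highly accurate predictor.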
Evidential decision theory, associated with Richard Jeffrey's work on the logic of decision, holds that an agent should take the action with the highest expected utility conditional on its being performed — in Jeffrey's phrase, the action that is the best "news" about the world. [6:1] The idea is that your choice serves as evidence about your situation: if people who take action X tend to end up well, then taking X is good news, regardless of whether X causes those good outcomes. EDT therefore one-boxes in Newcomb's problem, reasoning that one-boxing is strongly correlated with finding $1,000,000. EDT is criticized, however, for giving the wrong advice in cases where the correlation between action and outcome is purely symptomatic rather than causal. The standard counterexample is the smoking lesion: suppose a gene both causes the desire to smoke and independently causes cancer. An agent who enjoys smoking should simply smoke — their choice has no causal effect on whether they carry the gene. But EDT tells them not to smoke, because refraining is "good news" (it is correlated with not having the gene), even though abstaining cannot actually reduce their cancer risk. [7:1]
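The smoking-lesion divergence can likewise be illustrated with a toy model. All probabilities and utilities below are made-up illustration values, not figures from Ahmed's discussion; the point is only the structural contrast between conditioning on the action (EDT) and holding the common cause fixed (CDT).

```python
# A toy smoking lesion: a gene raises both the desire to smoke and the
# cancer risk; smoking itself has no causal path to cancer.

P_GENE = 0.2                           # prior probability of the gene
P_CANCER = {True: 0.8, False: 0.05}    # P(cancer | gene), independent of smoking
P_SMOKE = {True: 0.9, False: 0.1}      # P(smoke | gene): the gene causes the desire
U_SMOKE, U_CANCER = 10, -1000          # enjoyment of smoking vs. cost of cancer

def edt_value(smoke: bool) -> float:
    """EDT: treat the action as evidence about the gene (Bayes' rule)."""
    num = (P_SMOKE[True] if smoke else 1 - P_SMOKE[True]) * P_GENE
    den = num + (P_SMOKE[False] if smoke else 1 - P_SMOKE[False]) * (1 - P_GENE)
    p_gene = num / den
    p_cancer = p_gene * P_CANCER[True] + (1 - p_gene) * P_CANCER[False]
    return (U_SMOKE if smoke else 0) + p_cancer * U_CANCER

def cdt_value(smoke: bool) -> float:
    """CDT: smoking cannot cause cancer, so the gene keeps its prior."""
    p_cancer = P_GENE * P_CANCER[True] + (1 - P_GENE) * P_CANCER[False]
    return (U_SMOKE if smoke else 0) + p_cancer * U_CANCER

print(cdt_value(True) > cdt_value(False))   # CDT prefers smoking
print(edt_value(True) < edt_value(False))   # EDT prefers abstaining
```

CDT's value is higher for smoking by exactly the enjoyment term, since the cancer probability it uses does not move with the action; EDT's value swings because conditioning on smoking shifts the posterior on the gene.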
Historical development
FDT emerged from a sequence of decision theories developed within the MIRI research community, each responding to limitations in its predecessor.
Timeless decision theory (TDT), introduced by Yudkowsky in a 2010 manuscript, was the first departure from CDT and EDT within this program. [8:1] The central innovation was the use of logical counterfactuals rather than causal ones: instead of asking "what would happen if I were to physically perform action X?", TDT asks "what would follow if my decision algorithm were to output X?" This allows TDT to one-box in Newcomb's problem, because the predictor is modeled as having simulated the agent's algorithm — so reasoning about the algorithm's output is reasoning about what the predictor anticipated. [8]
Updateless decision theory (UDT), developed shortly after TDT, pushed the policy-level framing further. A recurring difficulty for TDT was that it reasoned about choices after already having received information about its situation, which created problems in cases where the agent's prior observations were themselves predicted. UDT addressed this by having the agent commit to a complete decision policy before updating on any observations at all, treating the choice of policy as the fundamental decision. [2]
FDT, introduced in 2017, is best understood as a more carefully specified successor to both. It retains TDT's logical-counterfactual approach and UDT's policy-level framing while attempting to provide a cleaner and more general formal foundation. [2:1] The extent to which the three theories diverge in edge cases, and whether FDT fully resolves the problems that motivated UDT, remains an active topic of discussion. [1]
Core idea
FDT reframes the question a rational agent should ask. Rather than asking "what action produces the best outcome from my current situation?" (CDT's framing) or "what action is most correlated with good outcomes?" (EDT's framing), FDT asks: "what decision procedure, if implemented, would produce the best outcomes across all situations in which I and agents like me find ourselves?" [1:2] The agent then acts in accordance with whatever procedure answers that question.
The key technical concept is that of the agent's decision function — the abstract mathematical procedure that maps the agent's situation to an action. FDT individuates decisions not by their physical implementation but by their functional role: two agents who implement the same decision function are treated as running the same procedure, even if they are physically distinct. [1:4] When a predictor accurately forecasts an agent's choice, FDT treats this as evidence that the predictor has modeled the agent's decision function. Changing the output of that function therefore changes what the predictor predicted, even though the prediction was made in the past. This is what allows FDT to one-box in Newcomb's problem: the agent reasons that if its decision procedure outputs "one-box," then a sufficiently accurate predictor will have anticipated this and placed $1,000,000 in the closed box. [1:3]
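This reasoning can be sketched in code. The sketch is ours, not from the paper: the function names are illustrative, and the predictor is idealized as a perfect copy of the agent's decision function so that the subjunctive link is exact.

```python
# FDT in Newcomb's problem: the predictor is modeled as running (a copy of)
# the agent's own decision function, so considering a different output for
# that function also changes the prediction it would have produced.

def newcomb_payoff(policy_output: str) -> int:
    """Payoff if the agent's decision function outputs `policy_output`
    and the (idealized, perfectly accurate) predictor ran the same function."""
    prediction = policy_output            # subjunctive dependence: one function
    closed_box = 1_000_000 if prediction == "one-box" else 0
    if policy_output == "one-box":
        return closed_box
    return closed_box + 1_000             # two-boxing also takes the $1,000 box

# FDT selects the output of its function with the best payoff across every
# place that function is instantiated, in the agent and in the predictor:
best = max(["one-box", "two-box"], key=newcomb_payoff)
print(best)   # one-box
```

The contrast with CDT is in the first line of the function body: CDT would hold `prediction` fixed as a past fact while varying `policy_output`, whereas FDT varies both together because they are outputs of the same function.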
FDT also recommends cooperation in the prisoner's dilemma when playing against a sufficiently similar agent, and handles a range of other decision-theoretic problems where CDT and EDT give divergent or counterintuitive results. [1:3]
Comparison with prior theories
| Scenario | CDT | EDT | FDT |
|---|---|---|---|
| Newcomb's problem | Two-box ($1,000) | One-box ($1,000,000) | One-box ($1,000,000) |
| Prisoner's dilemma vs. similar agent | Defect | Cooperate | Cooperate |
| Smoking lesion (symptomatic correlation) | Smoke | Don't smoke | Smoke |
| Transparent Newcomb's variant | Two-box | Two-box | One-box |
Yudkowsky and Soares argue that FDT is the only theory that wins consistently across this range of cases. [1:5] CDT handles the smoking lesion correctly but loses in Newcomb's problem and in its transparent variant, where the predictor fills the box only if it expects the agent to one-box even upon seeing the money. EDT one-boxes correctly in the standard problem but errs wherever correlation is non-causal, and it too two-boxes once the box contents are visible and no longer serve as evidence. FDT is designed to avoid both failure modes by reasoning about the counterfactual consequences of having a given decision procedure, rather than about either the causal consequences or the evidential correlations of individual actions. [1:5]
Relationship to AI alignment
FDT was developed partly in the context of research on beneficial AI alignment. The Machine Intelligence Research Institute, where Yudkowsky and Soares work, is concerned with ensuring that AI agents behave safely and predictably in a wide range of environments. An AI agent designed with CDT-style reasoning could in principle be exploited by predictors or by other agents who have modeled its decision procedure, while an FDT-style agent would behave robustly even when its reasoning is anticipated. [2:2] The connection between decision theory and AI alignment reflects a broader concern with what it means for an agent to reason correctly when other agents — human or artificial — may have detailed models of that agent's behavior. [2]
FDT's emphasis on decision procedures rather than individual acts also connects to questions in AI design about what kind of planning algorithm to implement, and whether an AI agent's behavioral commitments can be made transparent and reliable to outside observers. [2]
Criticisms
The implementation problem
A recurring objection is that FDT's notion of "the same decision function" is underspecified. Two agents can be running similar but not identical procedures, and it is unclear how similar they need to be before FDT treats their decisions as logically linked. Yudkowsky and Soares acknowledge this as an open technical challenge. [1:6] Critics argue that without a more precise account of functional identity, FDT's recommendations remain indeterminate across a range of realistic cases.
Handling of irrational predictors
FDT has been challenged on cases involving predictors who are not perfectly accurate, or who make their predictions on a basis that does not track the agent's actual decision procedure. In such cases the counterfactual "what would the predictor have done if my decision procedure had been different?" becomes difficult to evaluate, and FDT's advice can become indeterminate. [1]
Mainstream reception
FDT has received relatively limited uptake in mainstream academic philosophy of action and formal epistemology, where CDT and EDT remain the dominant reference points. [9] Some philosophers have argued that Newcomb's problem is less decisive than the MIRI researchers suggest — either because the scenario is physically incoherent, or because two-boxing is the correct response and the intuition that one-boxers do better is misleading. [9] The 2017 preprint had not been published in a peer-reviewed philosophy journal as of this article's writing.
Analysis: FDT and human agency
The following section reflects Agpedia's own evaluative judgment, applying the lens of human agency as a value.
FDT is primarily a formal theory of rational choice rather than a normative framework for human action, and most of its canonical cases are stylized thought experiments with limited direct real-world application. Evaluated from the perspective of human agency, the theory's most significant contribution may be conceptual rather than practical: it surfaces the possibility that the right question in a decision problem is not "what should I do now, given fixed prior conditions?" but "what kind of agent should I be, knowing that others may model and respond to my dispositions?"
This reframing has genuine relevance for human deliberation. Individuals who pre-commit to principled behavior — honesty, keeping promises, refusing bribes — often do better in repeated interactions than purely situational reasoners, because others learn to trust and cooperate with them. In this respect, FDT formalizes an insight with significant practical precedent. [1:2]
At the same time, the theory's current limitations — the underspecified notion of functional identity, its restricted reception in formal philosophy, and its origins in a research agenda focused on artificial rather than human agents — mean that its direct value as a guide to human decision-making remains modest. Its most credible contribution to human agency may be indirect: by pushing researchers and practitioners to think carefully about how agents are modeled by others, and how dispositions and commitments — not just individual choices — shape the structure of social and strategic interaction.
1. Yudkowsky, Eliezer; Soares, Nate (2017). Functional Decision Theory: A New Theory of Instrumental Rationality. arXiv. https://arxiv.org/abs/1710.05060.
2. Soares, Nate; Fallenstein, Benja (2014). Toward Idealized Decision Theory. Machine Intelligence Research Institute. https://arxiv.org/abs/1507.01986.
3. Joyce, James M. (1999). The Foundations of Causal Decision Theory. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511498497.
4. Gibbard, Allan; Harper, William L. (1978). Counterfactuals and Two Kinds of Expected Utility. In Foundations and Applications of Decision Theory. D. Reidel, Dordrecht.
5. Nozick, Robert (1969). Newcomb's Problem and Two Principles of Choice. In Essays in Honor of Carl G. Hempel. D. Reidel, Dordrecht.
6. Jeffrey, Richard C. (1965). The Logic of Decision. University of Chicago Press, Chicago.
7. Ahmed, Arif (2012). Evidential Decision Theory. WIREs Cognitive Science. https://doi.org/10.1002/wcs.1186.
8. Yudkowsky, Eliezer (2010). Timeless Decision Theory. Machine Intelligence Research Institute. https://intelligence.org/files/TDT.pdf.
9. Peterson, Martin (2009). An Introduction to Decision Theory. Cambridge University Press, Cambridge. ISBN 978-0-521-71654-3. https://doi.org/10.1017/CBO9780511800917.