Functional Decision Theory
Functional decision theory (FDT) is a theory of rational choice developed by Eliezer Yudkowsky and Nate Soares at the Machine Intelligence Research Institute. It holds that a rational agent should take whichever action would be recommended by the decision-making procedure that, if adopted, would yield the best outcomes across all situations in which the agent finds itself — including situations in which other agents have made predictions about the agent's behavior. [1:1] FDT was introduced in a 2017 preprint and is presented as an improvement on the two dominant prior theories of rational choice: causal decision theory (CDT) and evidential decision theory (EDT).
Background: Causal and evidential decision theory
The modern debate in decision theory is largely structured around two families of theories and the problems they struggle with.
Causal decision theory, systematized by James Joyce among others, holds that an agent should take the action with the highest expected utility computed by considering only the causal consequences of that action. [2] CDT's core commitment is that only genuine causal influence on outcomes matters: correlation between an action and a good outcome that is not mediated by causation is no reason to take that action. In Newcomb's problem — introduced by Robert Nozick in 1969 — a highly accurate predictor places $1,000,000 in a closed box if they expect the agent to take only that box, and $0 otherwise; a second transparent box always contains $1,000. [3:1] Because the predictor has already acted before the agent chooses, CDT reasons that the contents of the closed box are fixed, so the agent should take both boxes and collect an extra $1,000. [2:1] Yet because the predictor is highly accurate, almost all one-boxers end up with $1,000,000 while almost all two-boxers end up with only $1,000 — a result many philosophers and decision theorists consider a decisive mark against CDT.
Evidential decision theory, associated with Richard Jeffrey's work on the logic of decision, holds instead that an agent should take the action that, conditional on taking it, is associated with the best expected outcomes — the action that is "good news" about the world. [4:1] EDT one-boxes in Newcomb's problem, reasoning that one-boxing is strongly correlated with finding $1,000,000. EDT is criticized, however, for recommending actions in cases where the correlation between action and outcome is purely symptomatic rather than causal — for instance, refraining from smoking not to avoid cancer but because refraining is statistically associated with a cancer-resistant genotype.
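The divergence between the two theories in Newcomb's problem can be made concrete with a short expected-value calculation. The sketch below is illustrative only; the 0.99 predictor accuracy is an assumption for the example, not a figure from any of the cited sources.

```python
# Newcomb's problem: expected payoffs under EDT- and CDT-style reasoning.
# Illustrative assumption: the predictor is correct with probability 0.99.
ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def edt_value(action):
    """EDT conditions on the action: an accurate predictor probably foresaw it."""
    if action == "one-box":
        return ACCURACY * BIG                 # closed box is probably full
    return ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)  # probably empty

def cdt_value(action, p_box_full):
    """CDT treats the (already fixed) box contents as independent of the action."""
    base = p_box_full * BIG
    return base + SMALL if action == "two-box" else base

# EDT prefers one-boxing; CDT prefers two-boxing for any fixed belief
# about the box, since two-boxing always adds the $1,000.
assert edt_value("one-box") > edt_value("two-box")
assert all(cdt_value("two-box", p) > cdt_value("one-box", p)
           for p in (0.0, 0.5, 1.0))
```

The dominance structure is the point: CDT's verdict does not depend on how likely the box is to be full, which is exactly why it two-boxes.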
Core idea
FDT reframes the question a rational agent should ask. Rather than asking "what action produces the best outcome from my current situation?" (CDT's framing) or "what action is most correlated with good outcomes?" (EDT's framing), FDT asks: "what decision procedure, if implemented, would produce the best outcomes across all situations in which I and agents like me find ourselves?" [1:2] The agent then acts in accordance with whatever procedure answers that question.
The key technical concept is that of the agent's decision function — the abstract mathematical procedure that maps the agent's situation to an action. FDT individuates decisions not by their physical implementation but by their functional role: two agents who implement the same decision function are treated as running the same procedure, even if they are physically distinct. When a predictor accurately forecasts an agent's choice, FDT treats this as evidence that the predictor has modeled the agent's decision function. Changing the output of that function therefore changes what the predictor predicted, even though the prediction was made in the past. This is what allows FDT to one-box in Newcomb's problem: the agent reasons that if its decision procedure outputs "one-box," then a sufficiently accurate predictor will have anticipated this and placed $1,000,000 in the closed box. [1:3]
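The subjunctive dependence FDT posits can be sketched in a few lines: the predictor's behavior is modeled as a function of the agent's decision function itself, so intervening on the function's output also changes the prediction. This is a minimal illustration under the simplifying assumption of a perfectly accurate predictor.

```python
# FDT-style counterfactual in Newcomb's problem: the predictor runs a model
# of the agent's decision function, so the box contents covary with its output.
BIG, SMALL = 1_000_000, 1_000

def payoff(decision_function):
    """Evaluate a candidate decision function under FDT's counterfactual:
    a perfectly accurate predictor fills the box iff the function one-boxes."""
    prediction = decision_function()      # the predictor models the same function
    box = BIG if prediction == "one-box" else 0
    action = decision_function()          # the agent then runs it for real
    return box if action == "one-box" else box + SMALL

one_boxer = lambda: "one-box"
two_boxer = lambda: "two-box"

# Changing the function's output changes the prediction with it,
# so one-boxing comes out ahead: $1,000,000 versus $1,000.
assert payoff(one_boxer) == 1_000_000
assert payoff(two_boxer) == 1_000
```

Note that both call sites invoke the same function: that single shared dependency, rather than any causal path from the choice back to the box, is what FDT's counterfactual tracks.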
FDT also recommends cooperation in the prisoner's dilemma when playing against a sufficiently similar agent, and handles a range of other decision-theoretic problems where CDT and EDT give divergent or counterintuitive results. [1:3]
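The twin prisoner's dilemma admits a similarly compact sketch: against an opponent running the identical decision function, the two moves are logically linked, so only the diagonal outcomes are reachable. The payoff numbers below are the standard illustrative ordering, not values from the preprint.

```python
# Prisoner's dilemma against an exact copy: both players run the same
# decision function, so their moves necessarily coincide.
# Illustrative row-player payoffs: temptation 5 > reward 3 > punishment 1 > sucker 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def fdt_payoff(decision_function):
    """Only the diagonal outcomes are reachable when the opponent is a copy."""
    move = decision_function()
    return PAYOFF[(move, move)]   # the copy plays the same move by construction

assert fdt_payoff(lambda: "C") == 3   # mutual cooperation
assert fdt_payoff(lambda: "D") == 1   # mutual defection
```

Because the off-diagonal cells are unreachable, the dominance argument for defection never gets started, and cooperation wins 3 to 1.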
Comparison with prior theories
| Scenario | CDT | EDT | FDT |
|---|---|---|---|
| Newcomb's problem | Two-box ($1,000) | One-box ($1,000,000) | One-box ($1,000,000) |
| Prisoner's dilemma vs. similar agent | Defect | Cooperate | Cooperate |
| Smoking lesion (symptomatic correlation) | Smoke (correct) | Don't smoke (incorrect) | Smoke (correct) |
| Transparent Newcomb variant | Two-box | Two-box | One-box |
Yudkowsky and Soares argue that FDT is the only theory that wins consistently across this range of cases. [1:4] CDT gets the smoking lesion case right but loses in Newcomb's problem and its transparent-box variant. EDT wins in Newcomb's problem but refrains from smoking in the smoking lesion case, erring wherever the correlation between action and outcome is non-causal. FDT is designed to avoid both failure modes by reasoning about the counterfactual consequences of having a given decision procedure, rather than about either causal consequences or evidential correlations of individual actions. [1:4]
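The claim that one-boxers "win" can also be made quantitative: one-boxing has the higher expected payoff for any predictor accuracy above a break-even point just over one half. This is a sketch derived from the standard payoffs, not a calculation that appears in the preprint.

```python
# Break-even predictor accuracy at which one-boxing starts to pay.
BIG, SMALL = 1_000_000, 1_000

def ev(action, p):
    """Expected payoff when the predictor is correct with probability p."""
    if action == "one-box":
        return p * BIG                          # full box iff predicted one-box
    return p * SMALL + (1 - p) * (BIG + SMALL)  # full box iff mispredicted

# Setting ev("one-box", p) == ev("two-box", p) and solving for p gives
# p = (BIG + SMALL) / (2 * BIG) = 0.5005.
break_even = (BIG + SMALL) / (2 * BIG)
assert abs(break_even - 0.5005) < 1e-9
assert ev("one-box", 0.51) > ev("two-box", 0.51)
assert ev("one-box", 0.50) < ev("two-box", 0.50)
```

So even a predictor only slightly better than chance is enough to make one-boxing the higher-expected-value choice, which is why proponents treat the "highly accurate predictor" stipulation as doing little of the argumentative work.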
Relationship to AI alignment
FDT was developed partly in the context of research on beneficial AI. The Machine Intelligence Research Institute, where Yudkowsky and Soares work, is concerned with ensuring that AI agents behave safely and predictably in a wide range of environments. An AI agent designed with CDT-style reasoning could in principle be exploited by predictors or by other agents who have modeled its decision procedure, while an FDT-style agent would behave robustly even when its reasoning is anticipated. [5:1] The connection between decision theory and AI alignment reflects a broader concern with what it means for an agent to reason correctly when other agents — human or artificial — may have detailed models of that agent's behavior.
FDT's emphasis on decision procedures rather than individual acts also connects to questions in AI design about what kind of planning algorithm to implement, and whether an AI agent's behavioral commitments can be made transparent and reliable to outside observers.
Criticisms
The implementation problem
A recurring objection is that FDT's notion of "the same decision function" is underspecified. Two agents can be running similar but not identical procedures: it is unclear how similar they need to be before FDT counts them as implementing the same function and thus treats their decisions as logically linked. Critics argue that without a precise account of functional identity, FDT's recommendations are indeterminate in a range of realistic cases.
Handling of irrational predictors
FDT has been challenged on cases involving predictors who are not perfectly accurate, or who make their predictions on a basis that does not track the agent's actual decision procedure. In some formulations, FDT struggles to give consistent advice when the predictor uses a flawed model of the agent, because the counterfactual "what would the predictor have put in the box if my decision procedure had been different?" becomes difficult to evaluate.
Relationship to updateless decision theory
FDT is closely related to — and partly a refinement of — updateless decision theory (UDT), an earlier proposal developed within the MIRI research community. [5] UDT similarly recommends acting on the basis of policies rather than individual acts, and also handles Newcomb-like problems by one-boxing. Some researchers treat FDT and UDT as equivalent in most practical cases, while others regard FDT as a more carefully specified successor. The relationship between them and the extent to which they give different results in edge cases remains an active topic of discussion.
Mainstream reception
FDT has received relatively limited uptake in mainstream academic philosophy of action and formal epistemology, where CDT and EDT remain the dominant reference points. Some philosophers have argued that Newcomb's problem is less decisive than the MIRI researchers suggest — either because the scenario is physically incoherent, or because two-boxing is the correct response and the intuition that one-boxers do better is misleading. [6] As of this writing, the 2017 preprint has not appeared in a peer-reviewed philosophy journal, a gap that has been noted as limiting the theory's formal academic reception.
Analysis: FDT and human agency
The following section reflects Agpedia's own evaluative judgment, applying the lens of human agency as a value.
FDT is primarily a formal theory of rational choice rather than a normative framework for human action, and most of its canonical cases are stylized thought experiments with limited direct real-world application. Evaluated from the perspective of human agency, the theory's most significant contribution may be conceptual rather than practical: it surfaces the possibility that the right question in a decision problem is not "what should I do now, given fixed prior conditions?" but "what kind of agent should I be, knowing that others may model and respond to my dispositions?"
This reframing has genuine relevance for human deliberation. Individuals who pre-commit to principled behavior — honesty, keeping promises, refusing bribes — often do better in repeated interactions than purely situational reasoners, because others learn to trust and cooperate with them. In this respect, FDT formalizes an insight with significant practical precedent. [1:2]
At the same time, the theory's current limitations — the underspecified notion of functional identity, its restricted reception in formal philosophy, and its origins in a research agenda focused on artificial rather than human agents — mean that its direct value as a guide to human decision-making remains modest. Its most credible contribution to human agency may be indirect: by pushing researchers and practitioners to think carefully about how agents are modeled by others, and how dispositions and commitments — not just individual choices — shape the structure of social and strategic interaction.
- [1] Yudkowsky, Eliezer; Soares, Nate (2017). Functional Decision Theory: A New Theory of Instrumental Rationality. arXiv. https://arxiv.org/abs/1710.05060.
- [2] Joyce, James M. (1999). The Foundations of Causal Decision Theory. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511498497.
- [3] Nozick, Robert (1969). "Newcomb's Problem and Two Principles of Choice". In Essays in Honor of Carl G. Hempel. D. Reidel.
- [4] Jeffrey, Richard C. (1965). The Logic of Decision. University of Chicago Press, Chicago.
- [5] Soares, Nate; Fallenstein, Benja (2014). Toward Idealized Decision Theory. Machine Intelligence Research Institute. https://arxiv.org/abs/1507.01986.
- [6] Peterson, Martin (2009). An Introduction to Decision Theory. Cambridge University Press, Cambridge. ISBN 978-0-521-71654-3. https://doi.org/10.1017/CBO9780511800917.