
Decision making

Download the slides here


Introduction

This week’s sections don’t cover a single theme. Instead, we’ll look at a few different strands of work in neuroscience that could be of interest to people with a machine learning or quantitative background. We may also add more sections in future years.

In this section, we’ll focus on decision making. It’s a really broad topic, of course, because in a sense everything the brain does can be thought of as making a decision.

Two-alternative forced choice (2AFC)

Let’s make this even more concrete with a very specific sort of task, the two-alternative forced choice, or 2AFC.

In this task, participants are shown some sort of image or movie and asked to decide between two options. A common one is the random dot kinematogram: random dots are shown on the screen, some of which move coherently either to the left or to the right, while the rest move randomly. The participants have to determine which direction the coherent dots are moving, and sometimes they’re asked to make their decision as quickly as possible. The reaction time is the time between the start of the video and the moment when they press the button.

Random Dot Kinematogram

Figure 1: Random Dot Kinematogram Experiment

If you run this experiment, you see characteristic skewed distributions of reaction times. This data is actually from a slightly different task, in which participants were asked whether or not they had seen the image being shown before.

2AFC Experiment Results

Figure 2: 2AFC Experiment Results (Ratcliff, 1978).

Drift diffusion / random walk model

There’s a beautifully simple theory for how we make these decisions which can account for these reaction times:

  • Imagine that as time goes by, a bit of evidence occasionally arrives that is unreliable on its own, but suggests that the dots are moving right rather than left. Then another arrives, followed by one suggesting that left is more likely, and so on. We keep a running total of how much evidence we’ve received for right versus left, and once it crosses some threshold we make our decision. This is a biased random walk, in which a step in the correct direction is more likely than a step in the wrong direction (see the simulation sketch after this list).
  • As we increase the number of time points from 10 to 30, and all the way up to 1000, this random walk looks more and more like Brownian motion with a drift. This gives us a mathematical theory that lets us analytically compute expressions for the reaction time distributions.
  • Or we can just plot them numerically, here with either a low decision threshold or a high decision threshold.
  • And sure enough, we get something that looks very much like the data from experiments.
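
To make this concrete, here is a minimal simulation sketch of the model just described. Every parameter value below (the per-step probability p, the two thresholds, the trial count) is an illustrative assumption rather than a fit to data; the point is just that a biased ±1 random walk with absorbing thresholds produces skewed reaction time distributions, with a low threshold giving fast, error-prone decisions and a high threshold slow, accurate ones.

```python
# Minimal sketch of the biased random walk / drift diffusion model.
# All parameter values below are illustrative assumptions, not fits to data.
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(p=0.6, threshold=10, max_steps=10_000):
    """Accumulate +1/-1 evidence until the running total hits +/-threshold.

    Returns (decision, reaction_time), where decision is +1 ('right', the
    correct answer here since p > 0.5 biases steps rightward) or -1 ('left').
    """
    total = 0
    for t in range(1, max_steps + 1):
        total += 1 if rng.random() < p else -1  # one noisy piece of evidence
        if abs(total) >= threshold:
            return int(np.sign(total)), t
    return 0, max_steps  # no decision reached (rare for these parameters)

for threshold in (5, 20):  # compare a low and a high decision threshold
    trials = [simulate_trial(threshold=threshold) for _ in range(5_000)]
    rts = np.array([t for _, t in trials])
    accuracy = np.mean([d == 1 for d, _ in trials])
    print(f"threshold={threshold:2d}: accuracy={accuracy:.1%}, "
          f"mean RT={rts.mean():.0f} steps, median RT={np.median(rts):.0f}")
```

A histogram of the collected reaction times reproduces the right-skewed shape in Figure 2. In the continuum limit the walk becomes Brownian motion with drift, whose first-passage time to a single absorbing boundary follows the inverse Gaussian (Wald) distribution; that is one route to the analytical reaction time expressions mentioned above.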

Probabilistic interpretation

That’s nice, but still this model might seem a bit ad hoc. Fortunately, it turns out that there’s a neat probabilistic interpretation of this model.

  • Let’s set up a probabilistic model of the task. We’ll say there’s a true direction $D$ that can be either $L$ or $R$, and both options are equally likely. You can actually modify this to give the two options different probabilities too.
\begin{gather} \text{True direction } D=L \text{ or } D=R \\ \text{equally likely: } \textcolor{#fcad03}{P(D=L)}=\textcolor{#fcad03}{P(D=R)}=1/2 \end{gather}
  • Now, the observed data is a series of symbols $X_t$, one at each time point, each of which is $R$ or $L$.
\text{Observed data } \boldsymbol{X}=(R,R,L,R,L,R,R,R,\ldots)
  • The probability of observing the correct symbol is $p$ and the probability of observing the incorrect symbol is $1-p$, where $p>1-p$.
\begin{gather} \text{Probabilities } P(X_t=D)=\textcolor{#03bafc}{p} \text{ and } P(X_t=-D)=1-\textcolor{#03bafc}{p} \\ \text{where } \textcolor{#03bafc}{p}>1-\textcolor{#03bafc}{p} \end{gather}
  • Now, given the observed data, we want to infer the unknown value of $D$ by computing which of the two options is more likely. We’ll write $r(X)$ for the ratio of the probability that $D=R$ given the observations $X$, divided by the probability that $D=L$. If this ratio is high then $D=R$ is much more likely, and if it’s low then $D=L$ is more likely.
\begin{gather} \text{Write } \textcolor{#a503fc}{r(X)}=\frac{P(D=R|\boldsymbol{X})}{P(D=L|\boldsymbol{X})} \text{, the likelihood ratio.} \\ \text{If this is high then } D=R \text{ is more likely.} \end{gather}
  • We use Bayes’ theorem to rewrite the probability that $D=R$ given $X$ as the probability of $X$ given $D=R$, times the prior probability that $D=R$, divided by the probability of $X$. And the same for the probability that $D=L$ on the bottom.
\textcolor{#a503fc}{r(X)}=\frac{P(\boldsymbol{X}|D=R)\,\textcolor{#fcad03}{P(D=R)}/\textcolor{#128f04}{P(\boldsymbol{X})}}{P(\boldsymbol{X}|D=L)\,\textcolor{#fcad03}{P(D=L)}/\textcolor{#128f04}{P(\boldsymbol{X})}}
  • The $\textcolor{#128f04}{P(\boldsymbol{X})}$ terms cancel, and the prior probabilities of $D=R$ and $D=L$ are both $1/2$, so they cancel too.
\textcolor{#a503fc}{r(X)}=\frac{P(\boldsymbol{X}|D=R)}{P(\boldsymbol{X}|D=L)}
  • The observations at different time points are independent, so we can expand these as a product of the probabilities at each time point.
\textcolor{#a503fc}{r(X)}=\frac{\prod_t P(X_t|D=R)}{\prod_t P(X_t|D=L)}
  • And the products on the top and bottom run over the same time points, so we can combine them into a single product of ratios.
\textcolor{#a503fc}{r(X)}=\prod_t \frac{P(X_t|D=R)}{P(X_t|D=L)}
  • Now to see what’s going on more clearly we take the log of this ratio to get the log likelihood ratio.
\log \textcolor{#a503fc}{r(X)}=\log \prod_t \frac{P(X_t|D=R)}{P(X_t|D=L)}
  • The log of a product is the sum of the logs.
\log \textcolor{#a503fc}{r(X)}=\sum_t \log \frac{P(X_t|D=R)}{P(X_t|D=L)}
  • And we’ll write the individual terms as the “evidence at time $t$”, $\epsilon(X_t)$.
\log \textcolor{#a503fc}{r(X)}=\sum_t \epsilon(X_t)
  • This evidence is $\log(p/(1-p))$ when $X_t=R$. If $X_t=L$ it’s $\log((1-p)/p)$, which is just $-\log(p/(1-p))$. (The signs follow from $r(X)$ having $D=R$ on top.)
\begin{gather} \epsilon(X_t) = \log\frac{\textcolor{#03bafc}{p}}{1-\textcolor{#03bafc}{p}} \text{ if } X_t = R \\ \text{or } -\log\frac{\textcolor{#03bafc}{p}}{1-\textcolor{#03bafc}{p}} \text{ if } X_t = L \end{gather}
  • With that, we can write the log likelihood ratio as a constant multiplied by the sum $\textcolor{#fc0505}{\sum_t \delta(X_t)}$, where $\delta(X_t)$ is $+1$ if $X_t=R$ and $-1$ if $X_t=L$.
\log r(\boldsymbol{X})=\log\frac{\textcolor{#03bafc}{p}}{1-\textcolor{#03bafc}{p}} \cdot \textcolor{#fc0505}{\sum_t \delta(X_t)}
  • But this sum is precisely the random walk we saw previously.
\textcolor{#fc0505}{\sum_t \delta(X_t)} \text{ is the random walk!}
  • When $D=R$ the sum increases by 1 with probability $p$ and decreases by 1 with probability $1-p$, and vice versa if $D=L$.

  • We can now understand the decision threshold in terms of probability. We wait until the log likelihood ratio is bigger than some threshold $\theta$, or equivalently until the likelihood ratio is bigger than $e^\theta$.

\begin{gather} \text{Decision threshold } \theta \text{: } D=R \text{ more likely if} \\ \log \textcolor{#a503fc}{r(X)}>\theta \text{ or } \textcolor{#a503fc}{r(X)}>e^\theta \end{gather}
  • And this happens when the sum of the deltas is bigger than some constant, precisely as in the drift diffusion model (a short numerical check of this identity follows the list).
\text{This is true if } \textcolor{#fc0505}{\sum_t \delta(X_t)} > \text{const} = \theta \Big/ \log\frac{\textcolor{#03bafc}{p}}{1-\textcolor{#03bafc}{p}}
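
As a quick sanity check on this derivation, here is a short numerical sketch (the values of $p$ and the sequence length $T$ are arbitrary illustrative choices) confirming that the directly computed log likelihood ratio equals $\log\frac{p}{1-p}$ times the random walk $\sum_t \delta(X_t)$:

```python
# Check numerically that log r(X) = log(p/(1-p)) * sum_t delta(X_t).
# The values of p and T are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
p, T = 0.6, 20

# Simulate observations with true direction D = R, encoding R as +1 and L as -1.
X = np.where(rng.random(T) < p, 1, -1)

# Direct computation of the log likelihood ratio from its definition.
log_P_given_R = np.sum(np.where(X == 1, np.log(p), np.log(1 - p)))
log_P_given_L = np.sum(np.where(X == -1, np.log(p), np.log(1 - p)))
direct = log_P_given_R - log_P_given_L

# Random walk form: a constant times the sum of the +/-1 steps.
walk_form = np.log(p / (1 - p)) * X.sum()

print(direct, walk_form)  # identical up to floating point rounding
```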

Electrophysiological data

And finally, perhaps the best thing about this theory: the model was originally developed by fitting behavioural observations and was then given a rigorous probabilistic interpretation, and electrophysiological experiments have since found traces of these evidence accumulation processes across multiple brain regions in multiple species.

Experimental Evidence of Accumulation Processes Across Multiple Brain Regions and Species

Figure 3: Experimental Evidence of Accumulation Processes Across Multiple Brain Regions and Species (O’Connell et al., 2018).

Of course, it’s never quite as clear-cut as the theory suggests, and all sorts of modifications have been proposed, like adaptive thresholds that move closer to the origin over time to represent the increasing urgency of making some sort of decision, or thresholds that adapt to the wider context in various ways. There are also much more comprehensive Bayesian theories of decision making in which this model is just one special case, and so on.
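
As an illustration of one such modification, here is a minimal sketch of a collapsing ("urgency") threshold added to the earlier simulation. The exponential form $\theta(t)=\theta_0 e^{-t/\tau}$ and all parameter values are assumptions chosen for illustration, not any specific published model:

```python
# Sketch of a collapsing decision threshold ("urgency"). The exponential
# decay and all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def trial_collapsing(p=0.55, theta0=12.0, tau=200.0, max_steps=2_000):
    """Random walk against a bound that shrinks as theta0 * exp(-t / tau)."""
    total = 0
    for t in range(1, max_steps + 1):
        total += 1 if rng.random() < p else -1
        if abs(total) >= theta0 * np.exp(-t / tau):  # bound shrinks over time
            return int(np.sign(total)), t
    return 0, max_steps

decisions, rts = zip(*(trial_collapsing() for _ in range(2_000)))
print(f"accuracy: {np.mean(np.array(decisions) == 1):.1%}, "
      f"mean RT: {np.mean(rts):.0f} steps")
```

Relative to a fixed bound, the shrinking bound guarantees that a decision is eventually made, at the cost of lower accuracy on slow trials.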

References
  1. Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. 10.1037/0033-295x.85.2.59
  2. O’Connell, R. G., Shadlen, M. N., Wong-Lin, K., & Kelly, S. P. (2018). Bridging Neural and Computational Viewpoints on Perceptual Decision-Making. Trends in Neurosciences, 41(11), 838–852. 10.1016/j.tins.2018.06.005
  3. Gold, J. I., & Shadlen, M. N. (2007). The Neural Basis of Decision Making. Annual Review of Neuroscience, 30(1), 535–574. 10.1146/annurev.neuro.29.051605.113038