Medial prefrontal cell activity signaling prediction errors of action values

© 2007 Nature Publishing Group http://www.nature.com/natureneuroscience
ARTICLES
Madoka Matsumoto1,2,5, Kenji Matsumoto1,5, Hiroshi Abe1,3,5 & Keiji Tanaka1,4
To adapt behavior to a changing environment, one must monitor outcomes of executed actions and adjust subsequent actions
accordingly. Involvement of the medial frontal cortex in performance monitoring has been suggested, but little is known about
neural processes that link performance monitoring to performance adjustment. Here, we recorded from neurons in the medial
prefrontal cortex of monkeys learning arbitrary action-outcome contingencies. Some cells preferentially responded to positive
visual feedback stimuli and others to negative feedback stimuli. The magnitude of responses to positive feedback stimuli
decreased over the course of behavioral adaptation, in correlation with decreases in the amount of prediction error of action
values. Therefore, these responses in medial prefrontal cells may signal the direction and amount of error in prediction of values
of executed actions to specify the adjustment in subsequent action selections.
Organisms can survive in a changing environment by adapting their
behavior. Behavioral adaptation is composed of two complementary
processes: evaluating outcomes of an executed action (performance
monitoring) and adjusting the subsequent action (performance adjustment). Through the alternation between performance monitoring and
performance adjustment, the behavior adapts to the environmental
circumstances1–3.
The medial frontal cortex (MFC), located around the anterior
cingulate sulcus4, is thought to be involved in performance monitoring,
as the MFC is activated when an executed action is found to be
inappropriate. A negative deflection (error-related negativity, ERN) has
been repeatedly observed in human electroencephalogram (EEG)
studies5–7, and similar MFC activity has also been found in human
functional magnetic resonance imaging (fMRI) studies8–11 and in
monkey single-cell recording studies12,13.
During behavioral adaptation, the detection of both failure and
success is informative. An action that was not successful must be
changed, whereas a successful action must be actively maintained. For
the MFC to be involved in both the change and active maintenance of
action, the representation of both failures and successes is necessary.
However, it is controversial whether the MFC is involved in representing successes. Most EEG5–7 and fMRI studies9–11 have found that the
MFC activity is stronger in response to failures than to successes, thus
suggesting that the MFC is mainly involved in error detection. In some
other EEG14–16 and fMRI studies17,18, comparable magnitudes of
responses were observed in the MFC on failures and successes. This
inconsistency among previous studies might be related to differences in
the amount of information given by the detection of failure and
success. In many previous studies, successes were more frequent than
failures, meaning that failures were less expected and more informative
or salient than successes. To determine whether the MFC is involved in
detection of both success and failure or more involved in failure
detection, one must use a paradigm in which failure and success
occur at similar frequency.
If it is the case that the MFC represents both failures and successes,
another important question arises. Does the same group of cells
represent the failure and success, or do different groups of cells
represent them? The representations of failure and success by separate
cells could facilitate determination of whether an action should be
maintained or changed. Unless cells representing success are anatomically segregated from cells representing failure, EEG and fMRI measurements cannot address these questions, because cells responding to
failures cannot be discriminated from those responding to successes.
Instead, single-cell recording must be used to determine this important
aspect of failure and success representation in the MFC.
It has been proposed19 that the MFC uses signals of reward prediction errors conveyed by dopamine cells20 for performance adjustment
(also see ref. 21). The magnitude of reward prediction error depends on
the expected outcome as well as the actual outcome. In fact, a greater
ERN is elicited by unexpected unfavorable outcomes than by expected
unfavorable outcomes22, and a greater error-related activity is
evoked in MFC cells when the monkey expects a larger amount of
reward and misses it23. These findings imply that MFC activity is more
correlated with reward prediction error than with negative outcomes
themselves. To determine whether the MFC responses to action
outcomes are associated with subsequent performance adjustments,
the magnitude of responses should be compared with the reward
prediction error.
1Cognitive Brain Mapping Laboratory, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan. 2Department of Behavioral and Brain Sciences,
Primate Research Institute, Kyoto University, Kanrin, Inuyama, Aichi 484-8506, Japan. 3Graduate School of Decision Science and Technology, Tokyo Institute of
Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan. 4Graduate School of Science and Engineering, Saitama University, 255 Shimoohkubo, Sakura-ku,
Saitama, Saitama 338-8570, Japan. 5These authors contributed equally to this work. Correspondence should be addressed to K.M. ([email protected]).
Received 27 November 2006; accepted 9 March 2007; published online 22 April 2007; doi:10.1038/nn1890
NATURE NEUROSCIENCE VOLUME 10 | NUMBER 5 | MAY 2007
[Figure 1 panels: (a) trial events in the visual block and the action-learning block (go signal, left or right lever choice, positive or negative feedback; intervals of 0.5 s, 0.6 s and 0.8–1.3 s); (b) alternation of visual blocks (three trials) and action-learning blocks (three or four consecutive correct trials); (c) percentage correct (0–100) versus trial (1–4) for monkeys 1 and 2, shown as the average, for blocks starting with a correct trial and for blocks starting with an erroneous trial.]
Figure 1 Task design and behavioral results. (a) Events in visual and
action-learning blocks. Visual block: fixation point, visual stimulus and
primary reward (water drop). Action-learning block: fixation point, go signal,
lever choice, delay and visual feedback. (b) Sequence of visual and action-learning blocks. (c) Quick learning of the correct action. Percentages of
monkeys’ correct responses in the first, second, third and fourth trials,
indicated by 1, 2, 3 and 4, respectively, of action-learning blocks.
Error bars, s.d.
To determine the nature of the MFC contribution to performance
monitoring, we recorded the activity of single cells from an anterior
part of the MFC, the medial prefrontal cortex (medial PFC) (see
Supplementary Note online), while monkeys repeatedly learned to
select the correct action based on visual feedback signals. Visual stimuli,
but not primary reinforcers (for example, juice and water), were used
for both positive and negative feedback signals, so that the representation of positive and negative feedback could be examined more
purely than it could be if primary reinforcers and their
absence were used as the feedback signals. The monkeys did not know which of
two possible actions was correct at the beginning of learning in each
block, and, therefore, the positive and negative feedback signals were
equally informative. For the comparison, we also recorded from the
dorsolateral prefrontal cortex (lateral PFC). One previous study found
that the ERN is absent in humans with lesions in the lateral PFC24. We
found separate groups of medial PFC cells responding to the positive
and negative feedback signals. The magnitude of their activities was
linearly correlated with the amount of prediction error of action values
over the course of adaptation in each block. The discrimination
between positive and negative feedback stimuli was less clear in
responses of the lateral PFC cells. These findings indicate that the
medial PFC may monitor the outcome of executed actions to represent
the direction and amount of error in action value prediction.
Figure 2 Recording regions in two monkeys. Medial and top views of the left
frontal lobe and a representative coronal section of each monkey are drawn
based on magnetic resonance images of the brain. Recording regions in the
medial PFC are indicated by solid red rectangles in the medial view and by
dotted red rectangles in the top view, and those in the lateral PFC by solid
red circles in the top view. The extent of recording sites in the cingulate
sulcus is circumscribed by a red line in the coronal section drawing. The
anterior-posterior position of the illustrated coronal section is indicated
by an arrow in the top view. Lines marked with A20, A30, and A40 indicate
the planes 20, 30, and 40 mm anterior to the interaural line, respectively.
CS, cingulate sulcus; PS, principal sulcus; AS, arcuate sulcus;
CC, corpus callosum.
RESULTS
Behavioral adaptation based on visual feedback stimuli
We trained two monkeys on a task in which the monkeys adapted their
behavior on the basis of visual feedback (see Methods for details). It was
composed of two types of blocks: (i) visual fixation with primary
rewards and (ii) action learning with visual feedback stimuli (Fig. 1a).
In the visual blocks, viewing a visual stimulus was followed by a drop of
water. The trial was repeated three times with the same stimulus in each
visual block. Except for the eye fixation and central lever pressing
during the fixation and stimulus-presentation periods, the monkeys
were not required to make any action in the visual blocks. In the action-learning blocks, the monkeys were required to find a correct action (left
or right lever press) on the basis of visual feedback stimuli and execute
it on three or four consecutive trials. In each action-learning block, one
of the two actions was pseudo-randomly assigned as the correct action.
Execution of a correct action was followed by presentation of the
stimulus that had been presented in the preceding visual block (positive
feedback), whereas an incorrect action was followed by a different
visual stimulus (negative feedback). When the monkey repeated the
correct action in three or four consecutive trials, the task moved to a
new visual block. The visual and action-learning blocks thus alternated
(Fig. 1b). In order to obtain the water reward in the visual blocks, the
monkeys had to evaluate the appropriateness of their executed action
by monitoring the positive and negative visual feedback stimuli and
adjust their subsequent action selections in the action-learning blocks.
The mechanisms of action learning in the action-learning blocks were
the main subject of the present study: the visual blocks were necessary
to provide primary rewards to the monkeys and to let the monkeys
learn the visual stimulus that would work as positive feedback in the
following action-learning block. The learning of visual stimuli in the
visual blocks can be regarded as Pavlovian conditioning.
[Figure 2 panels: medial view, top view and magnified coronal section for monkeys 1 and 2; labels CC, CS, PS, AS and planes A20, A30 and A40; scale bar, 5 mm.]
[Figure 3 panels: (a, b) firing rate (spikes s–1) versus time from stimulus onset (s) for positive and negative feedback stimuli; (c) number of neurons versus indices A, B and C (–1 to 1) for the medial PFC and lateral PFC, with significant (P < 0.05) and nonsignificant (NS) cells marked.]
Figure 3 The representation of the type (positive versus negative) of feedback by medial PFC cells. (a) Responses of a positive feedback–preferring cell to
two positive feedback stimuli and two negative feedback stimuli in the first trials of action-learning blocks. Bin width, 50 ms. (b) Responses of a negative
feedback–preferring cell. (c) Population data indicating that the selectivity was determined by the feedback type, but not by individual stimuli, in the medial
PFC. Shown are the distributions of three indices A, B and C among 85 feedback-responsive cells in the medial PFC and 90 feedback-responsive cells in the
lateral PFC. See Methods for the definitions of the indices. Red bars indicate the cells for which there were significant differences between responses to the
two pairs of stimuli (P < 0.05, one-way factorial ANOVA). NS, not significant. A, B and C were 0.80, 0.17 and 0.09, respectively, for the cell in a, and
–0.94, –0.11 and 0.14, respectively, for the cell in b.
The monkeys quickly learned the correct action in each action-learning block (Fig. 1c). The percentage correct in the first trial
of action-learning blocks was 56 ± 8% (mean ± s.d.) for monkey 1
and 50 ± 3% for monkey 2, which is roughly chance-level performance, as expected. The average percentage correct rose above
90% in the second trial of action-learning blocks and stayed at a
high level in the following trials. Thus, the monkeys learned the
correct action by experiencing either a positive or negative feedback stimulus after the first action in the first trial. However, the
performance in the second trial after an error trial was slightly lower
than that in the second trial after a correct trial (Fig. 1c), which
indicates that the monkeys may have learned more from the positive
feedback stimulus than from the negative feedback stimulus. These
details of the monkeys’ behavior are analyzed later using reinforcement
learning models.
Responses to visual feedback stimuli in the first trials
We recorded 351 medial PFC cells from the dorsal bank and fundus of
the anterior part of cingulate sulcus and 396 lateral PFC cells from the
lateral surface both dorsal and ventral to the principal sulcus (Fig. 2;
also see Methods (discussion of recordings) and Supplementary Fig. 1
online). Of these, 85 (24%) medial PFC cells and 90 (23%) lateral PFC
cells showed a significant increase in firing rate to at least one visual
feedback stimulus in the first trials of action-learning blocks (P < 0.05,
one-way analysis of variance (ANOVA)). We focus on these ‘feedback-responsive cells’ in this paper.
Responses to the visual feedback stimuli in the first trials of action-learning blocks are illustrated for two medial PFC cells (Fig. 3a,b). Each
single cell was tested with two positive stimuli and two negative stimuli,
as we alternated the set of positive and negative stimuli after every four
pairs of visual and action-learning blocks. The cell in Figure 3a consistently showed transient excitatory responses to both positive stimuli,
whereas it did not show significant responses to either negative stimulus.
We will refer to such cells that showed significantly larger responses to
the positive feedback stimuli as ‘positive feedback–preferring cells’.
The cell in Figure 3b showed an opposite response pattern. It transiently
fired to both negative feedback stimuli, but showed a suppression of
ongoing firings to both positive feedback stimuli. Such cells with
significantly larger responses to the negative feedback stimuli will be
referred to as ‘negative feedback–preferring cells’.
Among the feedback-responsive cells, 51 (60%) medial PFC cells and
34 (40%) lateral PFC cells showed significantly differential responses
between positive and negative feedback stimuli in the first trials of
action-learning blocks (P < 0.05, one-way ANOVA). The proportion
of the differential cells was significantly larger in the medial PFC than in
the lateral PFC (P < 0.05, chi-squared test), and that of the other
feedback-responsive cells, which we will refer to as ‘nondifferential
cells’, was significantly smaller in the medial PFC than in the lateral PFC
(P < 0.01). The numbers of positive and negative feedback–preferring
cells were 16 and 32, respectively, in the medial PFC, and 9 and 25,
respectively, in the lateral PFC (three medial PFC cells were left
unclassified because their responses to either of the preferred category
of feedback stimuli did not reach significance as compared with the
cell’s spontaneous activity). In summary, the selectivity between
positive and negative feedback stimuli was more distinct in the medial
PFC than in the lateral PFC.
Because responses of each cell were tested with only two positive and
two negative feedback stimuli, the selectivity between positive and
negative feedback stimuli could have been only a reflection of selectivity
for physical features of visual stimuli. This possibility was examined in
the cell population by calculating three indices—A, B and C—for each
cell (see Methods for details). Briefly, the four stimuli were grouped
into two pairs, and the difference in averaged responses to the two pairs
was normalized by the total average of responses. The stimuli were
grouped into positive and negative feedback stimuli for the calculation
of A (positive – negative), but positive and negative feedback stimuli
were cross-paired for B and C. If the selectivity reflected the type of
feedback to which the stimuli were assigned, the distribution of index A
would be wider than the distributions of B and C. Among the 85
feedback-responsive cells in the medial PFC, the distribution of index A
values was much wider than the distributions for indices B and C
(Fig. 3c). The variance of index A was significantly larger than that
of index B (P < 0.001, test for equal variance) and that of index C
(P < 0.001). The differences between the distribution of A and those of
B and C were less prominent among the 90 feedback-responsive cells in
the lateral PFC. The difference between the distribution of A and that of
B was not significant (P > 0.05), although the difference between the
distribution of A and that of C was significant (P < 0.05). Moreover,
the variance of index A was significantly larger in the medial PFC cells
than in the lateral PFC cells (P < 0.001). These results indicate that
the responses of the medial PFC cells to visual feedback stimuli
represented the feedback type to which the stimuli were assigned
and that this representation of the feedback type was less prominent
in the lateral PFC.
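The index calculation described in this section can be sketched in Python. This is a minimal illustration rather than the authors' code: the exact definitions are given in the Methods, so the normalization used here (dividing by the sum of the two pair averages, which bounds each index between –1 and 1 for non-negative responses), the order of the cross-pairings for B and C, and all function and argument names are assumptions.

```python
def pair_index(pair_one, pair_two):
    """Normalized difference between the mean responses to two stimulus pairs.

    Assumption: the difference is divided by the sum of the two pair means,
    which keeps the index in [-1, 1] when responses are non-negative.
    """
    mean_one = sum(pair_one) / len(pair_one)
    mean_two = sum(pair_two) / len(pair_two)
    total = mean_one + mean_two
    if total == 0:
        return 0.0  # no response to any stimulus: treat as non-selective
    return (mean_one - mean_two) / total


def selectivity_indices(pos1, pos2, neg1, neg2):
    """Return (A, B, C) for one cell from its mean responses to four stimuli.

    A groups the stimuli by feedback type; B and C cross-pair one positive
    with one negative stimulus (hypothetical pairing order).
    """
    index_a = pair_index([pos1, pos2], [neg1, neg2])
    index_b = pair_index([pos1, neg1], [pos2, neg2])
    index_c = pair_index([pos1, neg2], [pos2, neg1])
    return index_a, index_b, index_c


# Synthetic responses (spikes/s) for a cell that prefers positive feedback.
a, b, c = selectivity_indices(pos1=20.0, pos2=16.0, neg1=2.0, neg2=2.0)
```

Under these assumptions, a cell whose selectivity follows the feedback type gives a large |A| with B and C near 0, whereas a cell that responds to one particular stimulus only gives comparable values for all three indices.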
Test of possible contribution of stimulus novelty
The negative feedback stimuli did not appear in the visual blocks, and
they appeared less frequently than did positive feedback stimuli in the
action-learning blocks. The preference for negative feedback stimuli in
negative feedback–preferring cells might be due to the relative novelty of
the stimuli. To examine this possibility, we compared responses of the
negative feedback–preferring cells in the first trials of the visual blocks
immediately after the change of stimulus pair with their responses in the
first trials of the other visual blocks. The stimulus was much more novel
in the former type of trials than in the latter, which followed frequent
presentation of the same stimulus in the preceding action-learning
block (see Supplementary Note online for the details).
[Figure 4 panels: population activity of medial PFC and lateral PFC cells versus time from stimulus onset (s) in visual blocks (switch, non-switch) and action-learning blocks (negative, positive); scatter plots of switch – non-switch against negative – positive response differences (spikes s–1).]
Figure 4 Effects of stimulus novelty examined in
responses in visual blocks. Left, responses of
negative feedback–preferring cells to the stimuli in
the first trials of visual blocks immediately after
the stimulus pair alternation (‘switch’) and
responses of the same population of cells in other
first trials of visual blocks (‘non-switch’). Center,
their responses to positive feedback stimuli and
to negative feedback stimuli in the first trials of
action-learning blocks. Right, the magnitude of
the effects of stimulus pair change observed in
each cell in the visual block plotted against the
magnitude of the difference between responses
to the positive and negative feedback stimuli
observed in the cell in the action-learning block.
The regression line is drawn by the parametric
test, although the correlation coefficient and the
significance of the correlation described in the text
were based on the non-parametric test. Bin width,
50 ms. Response of each cell was normalized by
its peak firing rate and then responses were
averaged across cells. Error bars, s.e.m.
In the medial PFC cells, the differences between responses in the two
types of visual blocks were much smaller than the differences between
responses to the negative and positive feedback stimuli (Fig. 4). In the
lateral PFC cells, the differences between responses in the two types of
visual block were as large as the differences between responses to the
negative and positive feedback stimuli, and the magnitudes of the two
differences were significantly correlated across cells (Spearman’s correlation coefficient r = 0.50, P = 0.012). Thus, the activity of negative
feedback–preferring cells in the lateral PFC largely reflected their
preference for stimulus novelty, whereas the effect of stimulus novelty
was small in the medial PFC.
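The across-cell comparison above relies on a rank correlation. As a rough sketch of how such a coefficient can be computed, the following Python code ranks both variables (averaging tied ranks) and takes the Pearson correlation of the ranks. The per-cell numbers are synthetic placeholders, not data from the study, and the function names are illustrative; the significance test is not included.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic per-cell values (not data from the study).
novelty_effect = [0.1, 0.4, 0.2, 0.9, 0.5]       # switch minus non-switch
feedback_difference = [0.2, 0.5, 0.1, 0.8, 0.9]  # negative minus positive
rho = spearman(novelty_effect, feedback_difference)
```

A coefficient near 0 would suggest that a cell's preference for negative feedback is unrelated to its sensitivity to stimulus novelty, whereas a clearly positive coefficient, as reported for the lateral PFC, suggests that the two go together.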
Low selectivity for action type and visual properties
Because the selectivity between the positive and negative feedback
stimuli was more common in the medial PFC cells than in the lateral
PFC cells and because the activity of negative feedback–preferring cells
in the lateral PFC largely reflected the novelty of the stimuli, we will
focus on the feedback-responsive cells in the medial PFC in this and
following sections.
To examine the dependence of responses of the feedback-responsive cells in the medial PFC on the type of preceding action
(left or right) and visual properties of the feedback stimuli, we
examined responses to the positive feedback stimuli in the first
correct trials by a two-way factorial ANOVA. The majority (80%)
of the cells did not show main effects of action type (P > 0.05, 12/16
positive feedback–preferring cells, 25/32 negative feedback–preferring cells,
and 28/34 nondifferential cells). Also, the majority (86%) of the
cells did not show significant visual selectivity (P > 0.05, 14/16 positive
feedback–preferring cells, 28/32 negative feedback–preferring cells
and 29/34 nondifferential cells). Thus, the responses of most
feedback-responsive cells to the positive feedback stimuli did not
represent information about the type of preceding action or the
visual properties of the stimuli. However, we could not apply the
analysis to responses to the negative feedback stimuli, because
there were not enough trials in each condition. Further studies
are necessary before the conclusion can be generalized to responses
to negative feedback stimuli. In the following sections, we will show
analyses based on the data combined for left and right actions and for
two stimuli.
Changes in responses during behavioral adaptation
We next examined changes in responses of the feedback-responsive
cells in the MFC during behavioral adaptation. Because the monkeys
usually made erroneous responses only in the first trial of each action-learning
block, most action-learning blocks were divided into two types: one consisting
only of correct trials (C1-C2-C3-…) and the other consisting of the first error
and following correct trials (E1-eC1-eC2-eC3-…). We focused on responses in the
first three correct trials and the first error trials in these blocks (Fig. 5).
Responses of positive feedback–preferring cells decreased during the behavioral
adaptation in each action-learning block. This trend is shown for responses of a
single cell (Fig. 5a) and for the population responses averaged over the 16
positive feedback–preferring cells (Fig. 6a). Responses in the first trials in
which the monkeys happened to make a correct action (C1) were the largest. They
were numerically larger than those in the first correct trial after the monkey
made an erroneous action in the first trial (eC1) but the difference was not
significant (P = 0.12, Wilcoxon matched-pairs test). The responses in C1 were
significantly larger than those in the second and third correct trials
(P = 0.017 for C1 versus C2, P = 0.004 for C1 versus C3, P = 0.013 for C1 versus
eC2, and P = 0.006 for C1 versus eC3). We obtained similar significant
differences when the significance of responses was determined based on halves of
C1 and E1, and the comparison with later trials was made by using the remaining
halves (Supplementary Note online).
There were consistently no excitatory responses in the negative
feedback–preferring cells in C1 and other types of correct trials
(Figs. 5b and 6b). In the population response in C1, there was a
suppression of firing after an initial small rising phase. The firing rate in
a window from 250 to 400 ms after the stimulus onset was significantly
lower than the firing rate in the 400 ms immediately before the stimulus
onset in about one-third (11/32, 34%) of the negative feedback–preferring
cells (P < 0.05, one-way ANOVA). In the other types of
correct trials, there was no obvious suppression in the population
responses, and few (2–6) cells showed significant suppression in the
single-cell analysis.
[Figure 5 panels: (a, b) firing rate (spikes s–1) versus time from stimulus onset (s) in trial types C1, C2, C3, E1, eC1, eC2 and eC3, following positive or negative feedback.]
Figure 5 Changes of neuronal responses along the course of correct-action learning: single cell
examples. (a,b) Responses of the same positive feedback–preferring cell (a) and negative feedback–preferring
cell (b) as those shown in Figure 3a,b. Bin width, 50 ms. C1, C2 and C3 are the first, second
and third trials of the action-learning blocks that started with a correct selection. E1 is the first error
trials and eC1, eC2 and eC3 are the following first, second and third correct trials of the action-learning
blocks that started with an error trial.
[Figure 6 panels: (a) positive feedback–preferring cells, (b) negative feedback–preferring cells, (c) nondifferential cells; upper graphs, population activity versus time from stimulus onset (s); lower graphs, averaged response magnitude for trial types C1, C2, C3, E1, eC1, eC2 and eC3.]
Figure 6 Changes of neuronal responses along the course of correct-action learning: population responses. (a–c) Averaged responses of 16 positive feedback–
preferring cells (a), 32 negative feedback–preferring cells (b) and 34 nondifferential cells (c) in the medial PFC. Bin width in upper graphs, 50 ms. The activity
of each cell was normalized by its peak activity, and then averaged across cells. Lower graphs show the averaged magnitude of responses in the time window of
100–400 ms after the stimulus onset. The activity of each cell in individual bins was normalized by its peak activity, averaged within the window, and then
averaged across cells. Error bars, s.e.m.
[Figure 7 panels: (a) prediction error of action value (–1 to 1) for trial types C1, C2, C3, E1, eC1, eC2 and eC3 in monkeys 1 and 2; (b) normalized response versus prediction error of action value for positive feedback–preferring cells, negative feedback–preferring cells and nondifferential cells.]
Figure 7 Relationship between neuronal responses and prediction errors.
(a) The amount of prediction error of action values calculated by the double-update model with the best-fit set of parameters. It is shown only for the
blocks consisting of only correct trials (C1-C2-C3) and those consisting of the
first error and following correct trials (E1-eC1-eC2-eC3). (b) Responses of
positive feedback–preferring cells, negative feedback–preferring cells and
nondifferential cells plotted against the prediction errors for the trials in the
two types of blocks. The activity of each cell was averaged within the window
100–400 ms after the stimulus onset, subtracted by the spontaneous activity
averaged in the 400-ms window immediately before the stimulus onset,
normalized by its peak activity, and then averaged across cells. Error
bars, s.e.m.
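The processing steps named in the Figure 7 caption (window average, baseline subtraction, peak normalization, averaging across cells) can be sketched as follows. This is an illustrative outline only: the 50-ms bin width is taken from the other figure captions, while the layout of the input (bins starting 400 ms before stimulus onset), the way the peak rate is supplied, and all names are assumptions.

```python
BIN_MS = 50              # bin width quoted in the figure captions
PRE_MS = 400             # baseline window immediately before stimulus onset
WINDOW_MS = (100, 400)   # response window after stimulus onset


def normalized_response(rates, peak_rate):
    """Baseline-subtracted, peak-normalized response of one cell.

    rates: firing rates (spikes/s) in consecutive 50-ms bins, starting
           400 ms before stimulus onset (assumed layout).
    peak_rate: the cell's peak firing rate; how the peak is defined is
               left to the caller (an assumption of this sketch).
    """
    onset_bin = PRE_MS // BIN_MS
    baseline_bins = rates[:onset_bin]
    first = onset_bin + WINDOW_MS[0] // BIN_MS
    last = onset_bin + WINDOW_MS[1] // BIN_MS
    window_bins = rates[first:last]
    baseline = sum(baseline_bins) / len(baseline_bins)
    response = sum(window_bins) / len(window_bins)
    return (response - baseline) / peak_rate


def population_response(cells):
    """Average the normalized responses across (rates, peak_rate) pairs."""
    values = [normalized_response(rates, peak) for rates, peak in cells]
    return sum(values) / len(values)


# Two synthetic cells (not data from the study): one responsive, one flat.
responsive = [5.0] * 8 + [5.0, 5.0] + [25.0] * 6 + [5.0] * 4
flat = [10.0] * 20
population = population_response([(responsive, 40.0), (flat, 20.0)])
```

Normalizing each cell before averaging keeps cells with high firing rates from dominating the population response.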
Responses of the nondifferential cells decreased during the behavioral adaptation in each action-learning block (Fig. 6c), as did those of
the positive feedback–preferring cells. In the cell population, responses
in eC1 were comparable in strength to those in C1 (P = 0.80) and E1
(P = 0.15), whereas responses in C2, C3, eC2 and eC3 were significantly
smaller than those in C1 and E1 (P < 0.0001 for all comparisons). We
obtained similar significant differences when the significance of
responses was determined based on halves of C1 and E1, and the
comparison with later trials was made by using the remaining halves
(Supplementary Note online).
Relation between neuronal responses and prediction errors
To quantitatively examine the relation between the magnitude of
medial PFC cell responses and the amount of prediction error, we
used reinforcement learning models (see Methods for details). In the
models, we assumed that the monkeys selected their actions by
estimating the values of actions, and that the value of each action
was updated by the difference between the estimated value and the
actual outcome of the action (prediction error of action value)25,26. The
outcome was the goodness of visual feedback in our task. We considered the ‘single-update’ model, in which only the value of the
selected action is updated, and the ‘double-update’ model, in which
the values of both selected and nonselected action types are updated.
We determined the set of parameters with which the model’s performance best fit the actual performance of the monkeys for each model
(Supplementary Table 1 online). Because the double-update model
gave better fits (Supplementary Fig. 2 and Supplementary Note
online), we used the double-update model to calculate the prediction
errors. It should be noted that the superiority of the double-update
model may be specific to our task condition.
By using these best-fit sets of parameters (Supplementary Table 1
online), we calculated the predicted values of actions and errors in the
prediction in each trial of each action-learning block (Fig. 7a). We reset
the values of both actions to 0 at the beginning of each action-learning
block and repeated the calculation of the prediction error for the
monkey’s action and given feedback stimulus (by equation (2) in the
Methods) and that of new action values (by equations (3) and (4))
along the series of the monkeys’ actions in the action-learning block.
The prediction errors in C1 and E1 were simply determined by the
goodness of the positive feedback stimuli (1.0) and that of the negative
feedback stimuli (–0.43 for monkey 1 and –0.45 for monkey 2). The
prediction errors in eC1 were 0.62 for monkey 1 and 0.55 for monkey 2.
These values in eC1 may appear too large considering the relatively
high performance in the second trial after an erroneous action in the
first trial (490%), but they are consistent with the difference between
the amount of information provided by the positive and negative
feedback stimuli (Supplementary Table 1 online). The prediction
errors in later trials (C2, C3, eC2 and eC3) were all small (maximally
0.075 in monkey 1 and 0.002 in monkey 2).
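The trial-by-trial calculation described above can be sketched in Python. The model equations themselves are in the Methods and are not reproduced here, so this sketch assumes a standard delta rule in which the prediction error is the outcome minus the current value of the chosen action, and a double-update step that moves the value of the unchosen action in the opposite direction. The two learning rates are hypothetical placeholders, not the best-fit parameters; only the goodness values (1.0 and –0.43, quoted for monkey 1) come from the text. Actions are labeled "correct" and "incorrect" rather than left and right for brevity.

```python
def run_block(outcomes_correct, goodness_pos=1.0, goodness_neg=-0.43,
              rate_selected=0.9, rate_nonselected=0.9):
    """Return the prediction error on each trial of one action-learning block.

    outcomes_correct: list of booleans, True when the correct action was
        chosen on that trial.
    goodness_pos / goodness_neg: outcome values of positive and negative
        feedback (1.0 and -0.43 are the values quoted for monkey 1).
    rate_selected / rate_nonselected: hypothetical learning rates, not the
        best-fit parameters of the study.
    """
    values = {"correct": 0.0, "incorrect": 0.0}  # reset at block start
    errors = []
    for chose_correct in outcomes_correct:
        chosen = "correct" if chose_correct else "incorrect"
        other = "incorrect" if chose_correct else "correct"
        outcome = goodness_pos if chose_correct else goodness_neg
        delta = outcome - values[chosen]            # prediction error
        values[chosen] += rate_selected * delta     # update chosen action
        values[other] -= rate_nonselected * delta   # assumed double-update step
        errors.append(delta)
    return errors


correct_only = run_block([True, True, True])         # C1, C2, C3
error_first = run_block([False, True, True, True])   # E1, eC1, eC2, eC3
```

In this sketch, setting `rate_nonselected` to 0 gives a single-update variant, in which the prediction error in eC1 equals the full goodness of positive feedback because the value of the unchosen action never changes after the first error.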
The averaged magnitudes of responses are plotted against the
amount of the prediction error for typical trials (Fig. 7b; see also
Supplementary Fig. 3 online for the results from each monkey). Note
that the prediction error plotted here was calculated for the selected
action, which was a right response in some blocks and a left response in
other blocks. Because responses of the feedback-responsive cells in the
medial PFC to feedback stimuli were not selective for action type, we
did not discriminate the left action from the right action. The changes
in the magnitude of responses in the positive feedback–preferring cells
covered the full range of changes in the amount of prediction error. The
magnitude of responses decreased as the amount of prediction error
decreased from C1 to eC1 and then to other later correct trials. The
responses were still positive, whereas the prediction errors were close to
0 in the later correct trials. There were nearly no responses in E1, where
the prediction errors were negative. Thus, the magnitude of responses
of the positive feedback–preferring cells to the visual feedback stimuli
was linearly correlated with the amount of prediction error in the full
range of prediction error.
The negative feedback–preferring cells showed positive responses
only to the negative prediction errors. Responses of the nondifferential
cells to positive prediction errors were correlated with the amount of
prediction error, as were responses of positive feedback–preferring cells.
In addition, the nondifferential cells also showed strong positive
responses to the negative prediction errors.
Positive feedback–preferring cells and negative feedback–preferring
cells in the medial PFC were recorded in the same electrode tracks and
there was no clear segregation in their localizations (Supplementary
Fig. 4 online). Thus, the two types of cells seem to be locally
intermingled within the medial PFC.
DISCUSSION
We found separate groups of medial PFC cells responding to positive
and negative visual feedback stimuli. The magnitude of the responses
was correlated with the amount of difference between the predicted
value of the executed action and the goodness of the given feedback
stimulus: a bigger response was evoked by the positive feedback
stimulus when the monkey was not confident in the selection of action
and the positive feedback was not necessarily predicted (in action-learning blocks). These results indicate that the medial PFC cells could
contribute to behavioral adaptation by representing the direction and
amount of error in action value prediction. By having both cells that
respond to positive feedback stimuli and those that respond to negative
feedback stimuli, the medial PFC can explicitly represent the direction
of error in action-value prediction. Therefore, the medial PFC is likely
to contribute to specifying the direction and amount of the adjustment
for subsequent action selection.
The responses of the positive feedback–preferring cells to positive
feedback stimuli were much reduced when the same stimuli were
presented in the first trials of visual blocks (Supplementary Fig. 5 and
Supplementary Note online). This reduction of responses supports the
notion that the responses represented the amount of error in action-value prediction, because there were no competitive action selections in
visual blocks. One may argue that the responses to positive and
negative feedback stimuli might encode the plan or decision to stay
or shift in action, respectively, in the next trial. However, this is unlikely,
because both positive feedback–preferring cells and negative feedback–
preferring cells maintained their preference for the type of feedback
when the monkey erroneously stayed in the same action after a negative
feedback stimulus was provided (Supplementary Fig. 6 and Supplementary Note online).
In the medial PFC, besides the cells that differentially responded to
positive and negative feedback stimuli, there were cells nondifferentially
responding to positive and negative feedback stimuli in the action-learning blocks. These findings are consistent with a recent human
EEG study suggesting that there are separate outcome-monitoring
systems, one sensitive to the direction of prediction errors and the other
sensitive only to their absolute magnitudes27. The nondifferential cells
responded to the stimuli even without preceding and subsequent
action selections in the visual blocks (see Supplementary Fig. 5 and
Supplementary Note online). The presence of such cells in the medial
PFC is consistent with a recent EEG result in humans that MFC activity
is evoked by both favorable and unfavorable outcomes in the absence of
actions27,28. The type of feedback stimulus was unpredictable in
the first trials in action-learning blocks, whereas the timing of the
appearance of the stimulus was unpredictable in the first trials in visual
blocks. The activities of the nondifferential cells might represent the
requirement of attention to the sensory event. A majority of the
feedback-responsive cells in the lateral PFC were nondifferential cells.
These cells may also contribute to performance monitoring by conveying information about the need of attention to the feedback stimuli.
A human study has suggested a contribution of the lateral PFC to
performance monitoring24.
The medial PFC receives strong projections from the midbrain
dopamine cells29, which are thought to encode reward prediction
errors20. The responses of the medial PFC cells to the feedback stimuli
may be reflections of activity in the dopaminergic afferents10,19,23.
However, although there are both cells activated by the positive feedback stimuli and those activated by the negative feedback stimuli in the
medial PFC, dopamine cells uniformly increase their firing to positive
prediction errors of reward30. Although they respond to negative
prediction errors of reward by stopping spontaneous firings, the
precision with which dopamine cells encode the magnitude of
negative prediction errors is under debate30–32. The serotonergic
afferents to the cerebral cortex may convey signals of negative or
aversive prediction errors33, because acute serotonin depletion impairs
reversal learning on the basis of negative feedback stimuli34. The medial
PFC may receive the signals of negative or aversive prediction errors
through projections from the serotonergic cells. The signals of negative
prediction errors may also be generated within the medial PFC. If the
latter is true, the generation of negative prediction errors in the medial
PFC should depend on the loop composed of the medial PFC and
striatum, because the ERN disappears after lesion of the striatum35.
There are two possible ways in which the visual stimuli might obtain
the goodness or reinforcement values. One possibility is that the
positive feedback stimulus had the goodness because it appeared in
the previous visual block and because the monkeys had learned that the
task approached the primary reward in the following visual block by
selecting actions that brought the stimulus seen in the previous visual
block36. The positive feedback stimulus had the goodness or positive
reinforcement value because it indicated approach to the primary
reward in the following block. The other possibility is that the positive
feedback stimulus obtained its own value by pavlovian conditioning
with the primary reward in the previous visual block. The monkeys
selected an action that brought the stimulus because the stimulus had
been associated with the primary reward. Our positive feedback stimuli
might work as conditioned reinforcers. Although the negative feedback
stimuli were not associated with the absence of primary rewards in
visual blocks, this asymmetry is common to typical conditioned
reinforcement paradigms37–39. Although we cannot determine which
was the case, our findings indicate, in either case, that the medial PFC
cells used the visual feedback stimuli for evaluation of executed actions.
In summary, we found medial PFC cells representing positive
prediction errors of action values and those representing negative
prediction errors of action values. By these neuronal activities, the
medial PFC may indicate the direction and amount of adjustment to be
made to the representation of action values. How the neuronal
activities are used for the adjustment of action value representation
remains to be studied (see Supplementary Note online). We also
showed that cells in the medial PFC responded to arbitrarily selected
visual stimuli that are working as positive and negative feedback
stimuli. Pairing an arbitrary stimulus with a primary reward, as in
the present study, is one way to provide it with the capability to direct
behavior37–39. However, actions can also be oriented to obtain outcomes that have not been paired with primary rewards40–43. The
neural mechanisms by which behavior is oriented to
such varied outcomes should be explored in future research.
METHODS
General procedures. We used two male rhesus monkeys (Macaca mulatta)
weighing 7–10 kg. A head holder and two recording chambers (20 mm in
diameter) were implanted by aseptic surgery under pentobarbital anesthesia
(35 mg per kilogram body weight intraperitoneally). All procedures were
approved by the RIKEN Animal Experiment Committee and were in accordance with the US NIH Guide for the Care and Use of Laboratory Animals.
During testing the monkey was seated in a primate chair inside a dark
room, with its head fixed. A video display was placed 57 cm from the monkey’s
eyes to present a fixation point and visual stimuli (full-color flower images).
Three lever switches were placed in front of the primate chair. Gaze position
was measured by an infrared system (http://staff.aist.go.jp/k.matsuda/eye).
The task was controlled and behavioral and neuronal data were recorded
by computers running a commercially available system (Tempo for Windows,
Reflective Computing).
Behavioral tasks. The monkeys were trained on a task in which they
adapted their behavior on the basis of visual feedback. The task consisted of two types
of blocks: one for visual fixation to instruct the monkey on a forthcoming
positive feedback signal and the other for action learning using the positive
feedback stimulus and another negative feedback stimulus (Fig. 1a). The two
types of blocks were alternated (Fig. 1b).
In the visual block, a white fixation point (0.44° wide) appeared after an
intertrial interval varying from 1 to 1.5 s. The monkey had to fixate its gaze on
the point and hold down the central lever with the right hand for 0.5 s, and
then a visual image (7° wide) was presented for 0.6 s. A drop of water was
delivered to the monkey at the end of the stimulus-presentation period. The
monkey had to maintain eye fixation and keep the central lever depressed until
the water delivery. A failure in gaze fixation or central lever pressing aborted the
trial. When the trial had been successfully repeated three times, the task moved
to the action-learning block.
In the action-learning block, after a 1–1.5 s intertrial interval, the fixation
point appeared and the monkey had to fixate it and hold down the central
lever. After a period varying from 0.8 to 1.3 s, the color of the fixation point
changed to red, which instructed the monkey to initiate an action. The required
action was to press either the left or right lever and return to the central lever
within 2 s. There was a 0.5 s delay after completion of the motor response, and
then a visual image was presented for 0.6 s as a feedback signal to the executed
action. A correct action was followed by the visual stimulus that was presented
in the preceding visual block as positive feedback, whereas an incorrect action
was followed by a different visual stimulus as negative feedback. The monkey
had to continue the gaze fixation and central-lever pressing until the offset of
the feedback presentation. The trial was immediately aborted after a failure of
either gaze fixation or central-lever pressing. The monkey had to refrain from
pressing any of the levers during any inter-trial intervals in the task. The
intertrial interval would have been reset to its beginning upon an erroneous
pressing during the intertrial interval, but this seldom occurred during
recordings of neuronal activity.
The correct motor response (left or right) was fixed within each action-learning block but pseudo-randomly changed between blocks, so that the
monkey could not know the correct response at the beginning of each action-learning block. When the monkey had repeated the correct response in three or
four consecutive trials (randomly determined by the computer) in an action-learning block, the task moved to a new visual block (Fig. 1b). When a trial was
aborted by a fixation break or central-lever release during the presentation of
the positive feedback signal, the trial was regarded as a correct trial, but the
monkey was required to perform one more correct trial before moving to a
visual block. Because of this requirement, the number of consecutive trials in
which the monkeys saw the positive feedback stimulus could be more than four
in some action-learning blocks.
Two pairs of positive and negative feedback stimuli were alternated every
four repetitions of visual and action-learning blocks in order to maintain the
monkey’s attention to the stimuli. One pair was used in four repetitions of
visual and action-learning blocks, and then the other pair was used for the next
four repetitions. After 32 repetitions of visual and action-learning blocks, the
two pairs were replaced with two new pairs.
Recordings. Action potentials of single neurons were recorded extracellularly
with tungsten electrodes (impedance of 8–10 MΩ, FHC) while the monkeys
performed the task. Electrodes were advanced by an oil-driven micromanipulator (Narishige) through a stainless steel guide tube with agarose
filling the recording chamber. Single neuronal discharges were collected at
1 kHz using a template-matching spike discriminator (Alpha-Omega).
We recorded the activity of single cells from both the medial PFC and lateral
PFC (Fig. 2 and Supplementary Fig. 1 online). The medial PFC cells were
recorded in the dorsal bank and fundus of the anterior part of cingulate sulcus,
in both hemispheres of both monkeys. The anterior-posterior range of
recordings was A30–A35 in monkey 1 and A31–A37 in monkey 2, which were
largely located anterior to the genu of the corpus callosum and the anterior tip
of the arcuate sulcal inferior limb. The recording regions corresponded to area
24b of Carmichael and Price44 and area 9 according to Barbas and Pandya45.
They overlapped with both areas 9 and 8B in the definition of Walker46,
probably more with area 9. Our notation of the medial recording regions as the
medial PFC is not accurate in that it partly included area 8B, which is the
transitional zone between the agranular and prefrontal granular regions. Our
recording regions from the medial PFC did not overlap with the rostral
cingulate motor area47–49 but partly overlapped with an anterior part of regions
called anterior cingulate cortex in some previous papers13,50. The lateral PFC
cells were recorded from the lateral surface both dorsal and ventral to the
principal sulcus, in left hemispheres of both monkeys. The regions of recordings corresponded to the middle part of the anterior-posterior extent of the
sulcus, and ranged from A31 to A38 in monkey 1 and A33–A38 in monkey 2.
They corresponded to area 46 (refs. 45,46). The position and extent of
recordings were determined on the basis of anatomical MRI images (4.0 T,
Varian NMR Instruments) taken before the surgery.
Data analyses. Because most of the action-learning blocks were accomplished
with none or one initial error trial, we classified them into two types: one
included the first (C1), second (C2), and third correct trials (C3) but did not
include any error trials, and the other included the initial error (E1) and
subsequent consecutive correct trials (eC1, eC2, and eC3). We analyzed the
neuronal data in C1, C2, C3, E1, eC1, eC2, and eC3 derived from these two
types of action-learning blocks. The fourth correct trial in each block was not
analyzed, because the number of these trials was small. We analyzed only cells
recorded for 16–32 repetitions of visual and action-learning blocks.
The magnitude of responses to the visual stimulus presentation in both
action-learning and visual blocks was quantified by the mean firing rate within
the window from 100 to 400 ms of the stimulus onset. The significance of
responses in the action-learning blocks was examined by comparing the firing
rate during the response window with the mean firing rate within a 400-ms
window immediately before the stimulus onset by a one-way repeated-measures ANOVA (P < 0.05) for each stimulus. For cells that showed
significant responses to at least one feedback stimulus, we compared the
responses to positive feedback stimuli with those to negative feedback stimuli
by one-way factorial ANOVA (P < 0.05). For this latter analysis, data were
combined for two positive or two negative feedback stimuli and for two actions
(left and right). We performed both analyses on responses in the first trials in
individual action-learning blocks, where the positive and negative feedback
stimuli were equally expected.
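The response and baseline windows defined above can be illustrated with a short Python sketch. It assumes spike times are given in milliseconds relative to stimulus onset and that windows are half-open (start included, end excluded); neither convention is stated in the text. The function names and the example spike train are hypothetical.

```python
def firing_rate(spike_times_ms, start_ms, end_ms):
    """Mean firing rate (spikes/s) in the window [start_ms, end_ms)."""
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return n_spikes / ((end_ms - start_ms) / 1000.0)


def response_and_baseline(spike_times_ms):
    """Rates in the 100-400 ms response window and the 400-ms pre-stimulus window."""
    response = firing_rate(spike_times_ms, 100, 400)
    baseline = firing_rate(spike_times_ms, -400, 0)
    return response, baseline


# Hypothetical spike train for one trial (ms relative to stimulus onset).
example_spikes = [-350, -120, 110, 150, 220, 310, 390, 450]
response, baseline = response_and_baseline(example_spikes)
print(response, baseline)
```

The per-trial pairs of response and baseline rates are the inputs to the significance tests described above; the tests themselves are not shown here.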
To further examine whether the selectivity for positive feedback stimuli
versus negative feedback stimuli was only a reflection of the selectivity for visual
features of the stimuli or representation of the type of feedback (positive or
negative) to which the stimuli were assigned, we calculated the following three
indices for each neuron.
$$A = \frac{(P_1 + P_2) - (N_1 + N_2)}{P_1 + P_2 + N_1 + N_2}$$
$$B = \frac{(P_1 + N_1) - (P_2 + N_2)}{P_1 + P_2 + N_1 + N_2}$$
$$C = \frac{(P_1 + N_2) - (P_2 + N_1)}{P_1 + P_2 + N_1 + N_2}$$
where $P_1$, $N_1$, $P_2$, and $N_2$ represent the magnitude of responses to the positive
stimuli and negative stimuli in the first and second pairs, respectively. If the
responses were determined only by the physical features of visual stimuli, the
three indices would be distributed around 0 and be similar to one another. If
the responses of many cells were determined by the type of feedback, the
distribution of index A would be wider than those of indices B and C.
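As a minimal sketch, the three indices can be computed for one neuron as follows. Each index is a difference over the summed response: A contrasts positive with negative stimuli, B contrasts the first stimulus pair with the second, and C contrasts the remaining cross-grouping. The function name and the example response magnitudes are hypothetical, and the guard against a zero denominator is an added assumption.

```python
def selectivity_indices(p1, n1, p2, n2):
    """Return indices (A, B, C) from response magnitudes to four stimuli.

    p1, n1: positive and negative stimuli of the first pair;
    p2, n2: positive and negative stimuli of the second pair.
    """
    total = p1 + p2 + n1 + n2
    if total == 0:
        raise ValueError("summed response is zero; indices are undefined")
    a = ((p1 + p2) - (n1 + n2)) / total  # feedback type (positive vs negative)
    b = ((p1 + n1) - (p2 + n2)) / total  # stimulus pair (first vs second)
    c = ((p1 + n2) - (p2 + n1)) / total  # cross-grouping control
    return a, b, c


# Hypothetical cell that prefers positive feedback stimuli (spikes/s).
index_a, index_b, index_c = selectivity_indices(p1=20.0, n1=5.0, p2=18.0, n2=7.0)
print(index_a, index_b, index_c)
```

For this hypothetical cell, index A is far from 0 while B and C stay near 0, which is the pattern expected when responses follow the type of feedback rather than the visual features of individual stimuli.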
We also examined responses to feedback presentation in the action-learning
blocks for the selectivity for visual features and for coupled actions (left versus
right). The visual selectivity was examined by comparing the magnitude of
responses between the two positive feedback stimuli. A two-way factorial
ANOVA with the factors of actions and stimuli was applied to responses of
each of the medial PFC cells that showed significant responses to at least one
feedback stimulus. The responses in C1 and eC1 trials were combined to have
enough numbers of trials in each action-stimulus combination. This combination was justified by the fact that the responses to the positive feedback stimuli
were not significantly different between C1 and eC1 in the populations of
positive feedback–preferring cells (P = 0.12, Wilcoxon matched-pairs
test), negative feedback–preferring cells (P = 0.09) and nondifferential
cells (P = 0.80).
Responses to the positive stimuli decreased from C1 to later trials in each
action-learning block in many medial PFC cells. To examine the significance of
this trend in cell population, we compared the magnitude of responses between
C1 and later correct trials (C2, C3, eC1, eC2, eC3) by Wilcoxon matched-pairs
test (P < 0.05). This test was applied to a population consisting of cells with
significant responses to positive stimuli and significant preference for positive
stimuli compared with negative stimuli (positive feedback–preferring medial
PFC cells) and to a population of cells with significant responses to some
stimuli but with no significant preference between positive and negative stimuli
(nondifferential medial PFC cells). To avoid possible effects of random
fluctuations of responses in C1, we repeated the procedure by using half of
the C1 trials (in odd blocks in each action-outcome combination) for cell
selection and the remaining half for the comparison between C1 and later
correct trials.
Reinforcement learning model. The amount of information that the monkey
obtained from each feedback presentation in the action-learning block for the
improvement of performance in the subsequent trial can be estimated by using
reinforcement learning models. A group of reinforcement learning models
assumes that the monkey keeps estimated values for each type of action and
selects an action depending on the values. The estimated values are updated
when the outcome turns up after execution of an action, based on the
difference between the goodness of the outcome and the estimated value of
the executed action.
We assumed a Boltzmann selection rule for action selection. The probability
of selecting an action a (either left (L) or right (R)) is given by
$$p(a) = \frac{\exp(\beta Q(a))}{\exp(\beta Q(L)) + \exp(\beta Q(R))} \qquad (1)$$
where Q(a) is the action value of action a and β is an inverse temperature, which
inversely relates to randomness in action selection (β ≥ 0).
The outcome of action in the action-learning block was the goodness of the
feedback stimulus. When the feedback stimulus was presented, the value of the
executed action was updated by
$$\delta Q(a) = r - Q(a) \qquad (2)$$
$$Q(a) \leftarrow Q(a) + \alpha\,\delta Q(a) \qquad (3)$$
where δQ(a) is the prediction error, r is the goodness of the feedback stimulus,
and α is the learning rate (0 < α < 1) (ref. 25). r was 1 for the positive
feedback stimuli and n_neg (–1 ≤ n_neg ≤ 0) for the negative feedback stimuli.
Because we do not know the relative size of impact evoked by the negative
feedback stimuli, we set the value of the negative feedback stimuli as a variable
parameter relative to that of the positive feedback stimuli. The action values
were reset to 0 at the beginning of each action-learning block and sequentially
changed along the series of actions within each action-learning block. This
model has three parameters, α, β, and n_neg.
We represented the goodness of feedback stimuli by the values (1 and n_neg)
that did not change along the series of actions within each action-learning
block, because we intended to analyze the process of performance monitoring
for performance adjustment. The goodness of feedback stimuli would have to
be a function of the number of accumulated correct trials in each block if we
focused on the motivational value of the situation, which was likely to increase
along the series of correct actions as the primary reward delivery approached.
Note that only the value of the selected action is updated in the above-described
model (single-update model), which is also true in original
Q-learning models. Because there were only two possible types of actions (left
and right) and one action was always correct in an action-learning block in our
paradigm, it is possible that the value of the action unselected in the trial (ā)
was also updated when the feedback stimulus was provided to the selected
action. Therefore, we considered a second model in which the value of ā is
updated also by
$$Q(\bar{a}) \leftarrow Q(\bar{a}) + i\,\alpha\,\delta Q(a) \qquad (4)$$
where i is an interaction factor (–1 ≤ i ≤ 0). This double-update model has
four parameters, α, β, n_neg and i.
To determine the parameters with which the model best fit the monkey’s
actual action selections, we calculated a likelihood function l(θ | y) for each set
of parameters (θ) with particular behavioral data (y) by the following
procedures. First, the action values, which were 0 at the beginning of each
action-learning block, were sequentially determined following actual action
series in each action-learning block by using equations (2) and (3) (for the
single-update model) or equations (2) through (4) (for the double-update
model) with a parameter set θ. Then, the estimate of the probability (p(a,t | θ))
for the monkey selecting the action that the monkey actually selected (a) in the
tth trial was calculated from the action values at that time by equation (1).
Finally, the likelihood function was obtained by a product of the probabilities in
all the trials included in y.
$$l(\theta \mid y) = \prod_{t} p(a, t \mid \theta)$$
The set of parameters that provided the largest value of the likelihood
function should be taken as the best-fit parameters. As y, we pooled, separately
for each monkey, the behavioral data of all the sessions in which we recorded
neuronal activity. To save the computation time to determine the best-fit
parameters, we used the Metropolis-Hastings algorithm (see Supplementary
Methods online).
To compare the goodness of the best-fit between the single-update and
double-update models, we calculated Akaike’s information criterion (AIC) and
the bayesian information criterion (BIC) by the following formulas.
$$\mathrm{AIC} = -2L + 2k$$
$$\mathrm{BIC} = -2L + k \log n$$
where L is the logarithm of the likelihood with the best-fit parameters, k is
the number of parameters (3 and 4 for the single- and double-update
models, respectively), and n is the total number of trials. Smaller values
indicate better fitting.
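The model-fitting quantities (the Boltzmann choice probability, the likelihood, and AIC and BIC) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names and the example data are hypothetical, the likelihood is accumulated as a sum of logarithms rather than a raw product (a standard substitution for numerical stability), the logarithm in BIC is assumed to be natural, and the Metropolis-Hastings parameter search is not shown.

```python
import math


def choice_prob(q, action, beta):
    """Boltzmann probability of choosing `action` given action values q."""
    z = math.exp(beta * q["L"]) + math.exp(beta * q["R"])
    return math.exp(beta * q[action]) / z


def log_likelihood(blocks, alpha, beta, n_neg, interaction=0.0):
    """Sum over all trials of the log probability of the observed choice.

    blocks: list of (actions, correct_action) pairs, one per block.
    interaction=0.0 corresponds to the single-update model.
    """
    total = 0.0
    for actions, correct in blocks:
        q = {"L": 0.0, "R": 0.0}  # values reset at the start of each block
        for a in actions:
            total += math.log(choice_prob(q, a, beta))
            other = "R" if a == "L" else "L"
            r = 1.0 if a == correct else n_neg
            delta = r - q[a]                         # prediction error
            q[a] += alpha * delta                    # update selected action
            q[other] += interaction * alpha * delta  # update unselected action
    return total


def aic(log_lik, k):
    """Akaike's information criterion; smaller is better."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n_trials):
    """Bayesian information criterion (natural log assumed); smaller is better."""
    return -2.0 * log_lik + k * math.log(n_trials)


# Hypothetical behavioral data: two blocks of observed choices.
blocks = [(["L", "R", "R", "R"], "R"), (["L", "L", "L"], "L")]
n_trials = sum(len(actions) for actions, _ in blocks)
ll_single = log_likelihood(blocks, alpha=0.8, beta=3.0, n_neg=-0.43)
ll_double = log_likelihood(blocks, alpha=0.8, beta=3.0, n_neg=-0.43,
                           interaction=-0.8)
print(aic(ll_single, 3), aic(ll_double, 4))
print(bic(ll_single, 3, n_trials), bic(ll_double, 4, n_trials))
```

In a full fit, `log_likelihood` would be evaluated over candidate parameter sets and the set with the largest value retained for each model before AIC and BIC are compared.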
Note: Supplementary information is available on the Nature Neuroscience website.
ACKNOWLEDGMENTS
This research was partly supported by the Grant-in-Aid for Scientific Research
on Priority Areas (17022047) from the Ministry of Education, Culture, Sports,
Science and Technology of Japan. We thank W. Schultz for advice about task
design, S. Shimamune and K. Murayama for discussion, R. A. Waggoner for
taking MRI images, A. Phillips for developing a program for presenting visual
stimuli, J. Helen for improving the English, and M. Tomonaga, H. Nakahara and
W. Schultz for comments on an early manuscript.
COMPETING INTERESTS STATEMENT
The authors declare no competing interests.
Published online at http://www.nature.com/natureneuroscience
Reprints and permissions information is available online at http://npg.nature.com/
reprintsandpermissions
1. Woodworth, R.S. Dynamics of Behavior (Holt, New York, 1958).
2. Daw, N.D. & Doya, K. The computational neurobiology of learning and reward. Curr. Opin.
Neurobiol. 16, 199–204 (2006).
3. Matsumoto, K. & Tanaka, K. The role of the medial prefrontal cortex in achieving goals.
Curr. Opin. Neurobiol. 14, 178–185 (2004).
4. Rushworth, M.F., Walton, M.E., Kennerley, S.W. & Bannerman, D.M. Action sets and
decisions in the medial frontal cortex. Trends Cogn. Sci. 8, 410–417 (2004).
5. Falkenstein, M., Hohnsbein, J., Hoormann, J. & Blanke, L. Effects of crossmodal divided
attention on late ERP components. II. Error processing in choice reaction tasks.
Electroencephalogr. Clin. Neurophysiol. 78, 447–455 (1991).
6. Gehring, W.J., Goss, B., Coles, M.G.H., Meyer, D.E. & Donchin, E. A neural system for
error detection and compensation. Psychol. Sci. 4, 385–390 (1993).
7. Miltner, W.H.R., Braun, C.H. & Coles, M.G.H. Event-related brain potentials following
incorrect feedback in a time-estimation task: evidence for a ‘‘generic’’ neural system for
error detection. J. Cogn. Neurosci. 9, 788–798 (1997).
8. Carter, C.S. et al. Anterior cingulate cortex, error detection, and the online monitoring of
performance. Science 280, 747–749 (1998).
9. Ullsperger, M. & von Cramon, D.Y. Error monitoring using external feedback: specific
roles of the habenular complex, the reward system, and the cingulate motor area revealed
by functional magnetic resonance imaging. J. Neurosci. 23, 4308–4314 (2003).
10. Holroyd, C.B. et al. Dorsal anterior cingulate cortex shows fMRI response to internal and
external error signals. Nat. Neurosci. 7, 497–498 (2004).
11. Mars, R.B. et al. Neural dynamics of error processing in medial frontal cortex. Neuroimage 28, 1007–1013 (2005).
12. Niki, H. & Watanabe, M. Prefrontal and cingulate unit activity during timing behavior in
the monkey. Brain Res. 171, 213–224 (1979).
13. Ito, S., Stuphorn, V., Brown, J.W. & Schall, J.D. Performance monitoring by the
anterior cingulate cortex during saccade countermanding. Science 302, 120–122
(2003).
14. Bartholow, B.D. et al. Strategic control and medial frontal negativity: beyond errors and
response conflict. Psychophysiology 42, 33–42 (2005).
15. Pailing, P.E. & Segalowitz, S.J. The effects of uncertainty in error monitoring on
associated ERPs. Brain Cogn. 56, 215–233 (2004).
16. Vidal, F., Burle, B., Bonnet, M., Grapperon, J. & Hasbroucq, T. Error negativity on
correct trials: a reexamination of available data. Biol. Psychol. 64, 265–282 (2003).
17. Knutson, B., Westdorp, A., Kaiser, E. & Hommer, D. FMRI visualization of brain activity
during a monetary incentive delay task. Neuroimage 12, 20–27 (2000).
18. Walton, M.E., Devlin, J.T. & Rushworth, M.F. Interactions between decision making and
performance monitoring within prefrontal cortex. Nat. Neurosci. 7, 1259–1265 (2004).
19. Holroyd, C.B. & Coles, M.G. The neural basis of human error processing: reinforcement
learning, dopamine, and the error-related negativity. Psychol. Rev. 109, 679–709
(2002).
20. Schultz, W., Dayan, P. & Montague, P.R. A neural substrate of prediction and reward.
Science 275, 1593–1599 (1997).
21. Frank, M.J., Woroch, B.S. & Curran, T. Error-related negativity predicts reinforcement
learning and conflict biases. Neuron 47, 495–501 (2005).
22. Holroyd, C.B., Nieuwenhuis, S., Yeung, N. & Cohen, J.D. Errors in reward prediction are
reflected in the event-related brain potential. Neuroreport 14, 2481–2484 (2003).
23. Amiez, C., Joseph, J.P. & Procyk, E. Anterior cingulate error-related activity is modulated
by predicted reward. Eur. J. Neurosci. 21, 3447–3452 (2005).
24. Gehring, W.J. & Knight, R.T. Prefrontal-cingulate interactions in action monitoring. Nat.
Neurosci. 3, 516–520 (2000).
25. Watkins, C. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).
26. Sutton, R.S. & Barto, A.G. Reinforcement Learning: An Introduction (MIT Press,
Cambridge, Massachusetts, 1998).
27. Yeung, N. & Sanfey, A.G. Independent coding of reward magnitude and valence in the
human brain. J. Neurosci. 24, 6258–6264 (2004).
28. Donkers, F.C., Nieuwenhuis, S. & van Boxtel, G.J. Mediofrontal negativities in the
absence of responding. Brain Res. Cogn. Brain Res. 25, 777–787 (2005).
29. Lewis, D.A., Foote, S.L., Goldstein, M. & Morrison, J.H. The dopaminergic innervation of
monkey prefrontal cortex: a tyrosine hydroxylase immunohistochemical study. Brain
Res. 449, 225–243 (1988).
30. Fiorillo, C.D., Tobler, P.N. & Schultz, W. Discrete coding of reward probability and
uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
31. Bayer, H.M. & Glimcher, P.W. Midbrain dopamine neurons encode a quantitative reward
prediction error signal. Neuron 47, 129–141 (2005).
32. Satoh, T., Nakai, S., Sato, T. & Kimura, M. Correlated coding of motivation and outcome
of decision by dopamine neurons. J. Neurosci. 23, 9913–9923 (2003).
33. Daw, N.D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and
dopamine. Neural Netw. 15, 603–616 (2002).
34. Robbins, T.W. Chemistry of the mind: neurochemical modulation of prefrontal cortical
function. J. Comp. Neurol. 493, 140–146 (2005).
35. Ullsperger, M. & von Cramon, D.Y. The role of intact frontostriatal circuits in error
processing. J. Cogn. Neurosci. 18, 651–664 (2006).
36. Fantino, E. Choice and rate of reinforcement. J. Exp. Anal. Behav. 12, 723–730 (1969).
37. Skinner, B.F. Science and Human Behavior (Macmillan, New York, 1953).
38. Williams, B.A. Conditioned reinforcement: experimental and theoretical issues. Behav.
Anal. 17, 261–285 (1994).
39. Parkinson, J.A. et al. The role of the primate amygdala in conditioned reinforcement.
J. Neurosci. 21, 7770–7780 (2001).
40. Allport, G.W. Pattern and Growth in Personality (Holt, Rinehart and Winston, New York,
1961).
41. Maslow, A.H. Motivation and Personality (Harper & Row, New York, 1970).
42. Schultz, W. Multiple reward signals in the brain. Nat. Rev. Neurosci. 1, 199–207
(2000).
43. Blatter, K. & Schultz, W. Rewarding properties of visual stimuli. Exp. Brain Res. 168,
541–546 (2006).
44. Carmichael, S.T. & Price, J.L. Architectonic subdivision of the orbital and
medial prefrontal cortex in the macaque monkey. J. Comp. Neurol. 346, 366–402
(1994).
45. Barbas, H. & Pandya, D.N. Architecture and intrinsic connections of the prefrontal
cortex in the rhesus monkey. J. Comp. Neurol. 286, 353–375 (1989).
46. Walker, A.E. A cytoarchitectural study of the prefrontal area of the macaque monkey.
J. Comp. Neurol. 73, 59–86 (1940).
47. Shima, K. et al. Two movement-related foci in the primate cingulate cortex observed in
signal-triggered and self-paced forelimb movements. J. Neurophysiol. 65, 188–202
(1991).
48. Procyk, E., Tanaka, Y.L. & Joseph, J.P. Anterior cingulate activity during routine and
non-routine sequential behaviors in macaques. Nat. Neurosci. 3, 502–508 (2000).
49. Shidara, M. & Richmond, B.J. Anterior cingulate: single neuronal signals related to
degree of reward expectancy. Science 296, 1709–1711 (2002).
50. Nakamura, K., Roesch, M.R. & Olson, C.R. Neuronal activity in macaque SEF and
ACC during performance of tasks involving conflict. J. Neurophysiol. 93, 884–908
(2005).