© 2007 Nature Publishing Group http://www.nature.com/natureneuroscience

ARTICLES

Medial prefrontal cell activity signaling prediction errors of action values

Madoka Matsumoto1,2,5, Kenji Matsumoto1,5, Hiroshi Abe1,3,5 & Keiji Tanaka1,4

To adapt behavior to a changing environment, one must monitor outcomes of executed actions and adjust subsequent actions accordingly. Involvement of the medial frontal cortex in performance monitoring has been suggested, but little is known about neural processes that link performance monitoring to performance adjustment. Here, we recorded from neurons in the medial prefrontal cortex of monkeys learning arbitrary action-outcome contingencies. Some cells preferentially responded to positive visual feedback stimuli and others to negative feedback stimuli. The magnitude of responses to positive feedback stimuli decreased over the course of behavioral adaptation, in correlation with decreases in the amount of prediction error of action values. Therefore, these responses in medial prefrontal cells may signal the direction and amount of error in prediction of values of executed actions to specify the adjustment in subsequent action selections.

Organisms can survive in a changing environment by adapting their behavior. Behavioral adaptation is composed of two complementary processes: evaluating outcomes of an executed action (performance monitoring) and adjusting the subsequent action (performance adjustment). Through the alternation between performance monitoring and performance adjustment, the behavior adapts to the environmental circumstances1–3.

The medial frontal cortex (MFC), located around the anterior cingulate sulcus4, is thought to be involved in performance monitoring, as the MFC is activated when an executed action is found to be inappropriate.
A negative deflection (error-related negativity, ERN) has been repeatedly observed in human electroencephalogram (EEG) studies5–7, and similar MFC activity has also been found in human functional magnetic resonance imaging (fMRI) studies8–11 and in monkey single-cell recording studies12,13. During behavioral adaptation, the detection of both failure and success is informative. An action that was not successful must be changed, whereas a successful action must be actively maintained. For the MFC to be involved in both the change and active maintenance of action, the representation of both failures and successes is necessary. However, it is controversial whether the MFC is involved in representing successes. Most EEG5–7 and fMRI studies9–11 have found that the MFC activity is stronger in response to failures than to successes, thus suggesting that the MFC is mainly involved in error detection. In some other EEG14–16 and fMRI studies17,18, comparable magnitudes of responses were observed in the MFC on failures and successes. This inconsistency among previous studies might be related to differences in the amount of information given by the detection of failure and success. In many previous studies, successes were more frequent than failures, meaning that failures were less expected and more informative or salient than successes. To determine whether the MFC is involved in detection of both success and failure or more involved in failure detection, one must use a paradigm in which failure and success occur at similar frequency. If it is the case that the MFC represents both failures and successes, another important question arises. Does the same group of cells represent the failure and success, or do different groups of cells represent them? The representations of failure and success by separate cells could facilitate determination of whether an action should be maintained or changed. 
Unless cells representing success are anatomically segregated from cells representing failure, EEG and fMRI measurements cannot address these questions, because cells responding to failures cannot be discriminated from those responding to successes. Instead, single-cell recording must be used to determine this important aspect of failure and success representation in the MFC. It has been proposed19 that the MFC uses signals of reward prediction errors conveyed by dopamine cells20 for performance adjustment (also see ref. 21). The magnitude of reward prediction error depends on the expected outcome as well as the actual outcome. In fact, a greater ERN is elicited by unexpected unfavorable outcomes than by expected unfavorable outcomes22, and a greater error-related activity is evoked in MFC cells when the monkey expects a larger amount of reward and misses it23. These findings imply that MFC activity is more correlated with reward prediction error than with negative outcomes themselves. To determine whether the MFC responses to action outcomes are associated with subsequent performance adjustments, the magnitude of responses should be compared with the reward prediction error.

1Cognitive Brain Mapping Laboratory, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
2Department of Behavioral and Brain Sciences, Primate Research Institute, Kyoto University, Kanrin, Inuyama, Aichi 484-8506, Japan.
3Graduate School of Decision Science and Technology, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan.
4Graduate School of Science and Engineering, Saitama University, 255 Shimoohkubo, Sakura-ku, Saitama, Saitama 338-8570, Japan.
5These authors contributed equally to this work.
Correspondence should be addressed to K.M. ([email protected]).
Received 27 November 2006; accepted 9 March 2007; published online 22 April 2007; doi:10.1038/nn1890

NATURE NEUROSCIENCE VOLUME 10 | NUMBER 5 | MAY 2007 647

[Figure 1 panels: (a) event timing in the visual block and the action-learning block (go signal, left or right lever choice, positive or negative feedback); (b) block sequence; (c) percentage correct versus trial (1–4) for monkey 1 and monkey 2, shown as the average, for blocks starting with a correct trial and for blocks starting with an erroneous trial.]

Figure 1 Task design and behavioral results. (a) Events in visual and action-learning blocks. Visual block: fixation point, visual stimulus and primary reward (water drop). Action-learning block: fixation point, go signal, lever choice, delay and visual feedback. (b) Sequence of visual and action-learning blocks. (c) Quick learning of the correct action. Percentages of monkeys’ correct responses in the first, second, third and fourth trials, indicated by 1, 2, 3 and 4, respectively, of action-learning blocks. Error bars, s.d.

To determine the nature of the MFC contribution to performance monitoring, we recorded the activity of single cells from an anterior part of the MFC, the medial prefrontal cortex (medial PFC) (see Supplementary Note online), while monkeys repeatedly learned to select the correct action based on visual feedback signals. Visual stimuli, rather than primary reinforcers (for example, juice and water), were used for both positive and negative feedback signals, so that the representation of the positive and negative feedback could be examined more purely than it could with the use of primary reinforcers and their absence for the feedback signals. The monkeys did not know which of two possible actions was correct at the beginning of learning in each block, and, therefore, the positive and negative feedback signals were equally informative.
For the comparison, we also recorded from the dorsolateral prefrontal cortex (lateral PFC). One previous study found that the ERN is absent in humans with lesions in the lateral PFC24. We found separate groups of medial PFC cells responding to the positive and negative feedback signals. The magnitude of their activities was linearly correlated with the amount of prediction error of action values over the course of adaptation in each block. The discrimination between positive and negative feedback stimuli was less clear in responses of the lateral PFC cells. These findings indicate that the medial PFC may monitor the outcome of executed actions to represent the direction and amount of error in action value prediction.

Figure 2 Recording regions in two monkeys. Medial and top views of the left frontal lobe and a representative coronal section of each monkey are drawn based on magnetic resonance images of the brain. Recording regions in the medial PFC are indicated by solid red rectangles in the medial view and by dotted red rectangles in the top view, and those in the lateral PFC by solid red circles in the top view. The extent of recording sites in the cingulate sulcus is circumscribed by a red line in the coronal section drawing. The anterior-posterior position of the illustrated coronal section is indicated by an arrow in the top view. Lines marked with A20, A30, and A40 indicate the planes 20, 30, and 40 mm anterior to the interaural line, respectively. CS, cingulate sulcus; PS, principal sulcus; AS, arcuate sulcus; CC, corpus callosum.

RESULTS

Behavioral adaptation based on visual feedback stimuli

We trained two monkeys on a task in which the monkeys adapted their behavior on the basis of visual feedback (see Methods for details). It was composed of two types of blocks: (i) visual fixation with primary rewards and (ii) action learning with visual feedback stimuli (Fig. 1a). In the visual blocks, viewing a visual stimulus was followed by a drop of water.
The trial was repeated three times with the same stimulus in each visual block. Except for the eye fixation and central lever pressing during the fixation and stimulus-presentation periods, the monkeys were not required to make any action in the visual blocks. In the action-learning blocks, the monkeys were required to find a correct action (left or right lever press) on the basis of visual feedback stimuli and execute it on three or four consecutive trials. In each action-learning block, one of the two actions was pseudo-randomly assigned as the correct action. Execution of a correct action was followed by presentation of the stimulus that had been presented in the preceding visual block (positive feedback), whereas an incorrect action was followed by a different visual stimulus (negative feedback). When the monkey repeated the correct action in three or four consecutive trials, the task moved to a new visual block. The visual and action-learning blocks thus alternated (Fig. 1b). In order to obtain the water reward in the visual blocks, the monkeys had to evaluate the appropriateness of their executed action by monitoring the positive and negative visual feedback stimuli and adjust their subsequent action selections in the action-learning blocks.
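The block-termination rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `trials_until_block_end` and the parameter `required_run` are hypothetical, and how the required run length (three or four) was chosen for a given block is not specified in this section, so it is left as an argument.

```python
def trials_until_block_end(outcomes, required_run=3):
    """Return the 1-based trial number on which an action-learning block ends.

    `outcomes` is a sequence of booleans (True = correct action).
    The block ends once `required_run` consecutive correct trials occur.
    Returns None if the criterion is never met within `outcomes`.
    `required_run` is a hypothetical parameter (3 or 4 in the task).
    """
    run = 0  # length of the current streak of correct trials
    for trial_number, correct in enumerate(outcomes, start=1):
        run = run + 1 if correct else 0  # an error resets the streak
        if run == required_run:
            return trial_number
    return None


# A block that starts with an error (E1) followed by three correct trials.
print(trials_until_block_end([False, True, True, True]))
# A block consisting only of correct trials (C1-C2-C3).
print(trials_until_block_end([True, True, True]))
```

Under this rule a block that starts with an error needs at least one more trial than a block that starts with a correct choice, which is why the two block types are treated separately in the analyses.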
[Figure 2 panels: medial view, top view and magnified coronal section (scale bar, 5 mm) for monkey 1 and monkey 2.]

[Figure 3 panels: (a,b) responses (spikes s–1) versus time from stimulus onset (s) for positive and negative feedback stimuli; (c) number of neurons versus index value (–1 to 1) for indices A, B and C in the medial PFC and lateral PFC.]

Figure 3 The representation of the type (positive versus negative) of feedback by medial PFC cells. (a) Responses of a positive feedback–preferring cell to two positive feedback stimuli and two negative feedback stimuli in the first trials of action-learning blocks. Bin width, 50 ms. (b) Responses of a negative feedback–preferring cell. (c) Population data indicating that the selectivity was determined by the feedback type, but not by individual stimuli, in the medial PFC. Shown are the distributions of three indices A, B and C among 85 feedback-responsive cells in the medial PFC and 90 feedback-responsive cells in the lateral PFC. See Methods for the definitions of the indices. Red bars indicate the cells for which there were significant differences between responses to the two pairs of stimuli (P < 0.05, one-way factorial ANOVA). NS, not significant. A, B and C were 0.80, 0.17 and 0.09, respectively, for the cell in a, and –0.94, –0.11 and 0.14, respectively, for the cell in b.

The mechanisms of action learning in the action-learning blocks were the main subject of the present study: the visual blocks were necessary to provide primary rewards to the monkeys and to let the monkeys learn the visual stimulus that would work as positive feedback in the following action-learning block. The learning of visual stimuli in the visual blocks can be regarded as Pavlovian conditioning.

The monkeys quickly learned the correct action in each action-learning block (Fig. 1c).
The percentage correct in the first trial of action-learning blocks was 56 ± 8% (mean ± s.d.) for monkey 1 and 50 ± 3% for monkey 2, which is roughly chance-level performance, as expected. The average percentage correct reached over 90% in the second trial of action-learning blocks and stayed at a high level in the following trials. Thus, the monkeys learned the correct action by experiencing either a positive or negative feedback stimulus after the first action in the first trial. However, the performance in the second trial after an error trial was slightly lower than that in the second trial after a correct trial (Fig. 1c), which indicates that the monkeys may have learned more from the positive feedback stimulus than from the negative feedback stimulus. These details of the monkeys’ behavior are analyzed later using reinforcement learning models.

Responses to visual feedback stimuli in the first trials

We recorded 351 medial PFC cells from the dorsal bank and fundus of the anterior part of cingulate sulcus and 396 lateral PFC cells from the lateral surface both dorsal and ventral to the principal sulcus (Fig. 2; also see Methods (discussion of recordings) and Supplementary Fig. 1 online). Of these, 85 (24%) medial PFC cells and 90 (23%) lateral PFC cells showed a significant increase in firing rate to at least one visual feedback stimulus in the first trials of action-learning blocks (P < 0.05, one-way analysis of variance (ANOVA)). We focus on these ‘feedback-responsive cells’ in this paper.

Responses to the visual feedback stimuli in the first trials of action-learning blocks are illustrated for two medial PFC cells (Fig. 3a,b). Each single cell was tested with two positive stimuli and two negative stimuli, as we alternated the set of positive and negative stimuli after every four pairs of visual and action-learning blocks.
The cell in Figure 3a consistently showed transient excitatory responses to both positive stimuli, whereas it did not show significant responses to either negative stimulus. We will refer to such cells that showed significantly larger responses to the positive feedback stimuli as ‘positive feedback–preferring cells’. The cell in Figure 3b showed an opposite response pattern. It transiently fired to both negative feedback stimuli, but showed a suppression of ongoing firings to both positive feedback stimuli. Such cells with significantly larger responses to the negative feedback stimuli will be referred to as ‘negative feedback–preferring cells’.

Among the feedback-responsive cells, 51 (60%) medial PFC cells and 34 (40%) lateral PFC cells showed significantly differential responses between positive and negative feedback stimuli in the first trials of action-learning blocks (P < 0.05, one-way ANOVA). The proportion of the differential cells was significantly larger in the medial PFC than in the lateral PFC (P < 0.05, chi-squared test), and that of the other feedback-responsive cells, which we will refer to as ‘nondifferential cells’, was significantly smaller in the medial PFC than in the lateral PFC (P < 0.01). The numbers of positive and negative feedback–preferring cells were 16 and 32, respectively, in the medial PFC, and 9 and 25, respectively, in the lateral PFC (three medial PFC cells were left unclassified because their responses to either of the preferred category of feedback stimuli did not reach significance as compared with the cell’s spontaneous activity). In summary, the selectivity between positive and negative feedback stimuli was more distinct in the medial PFC than in the lateral PFC.
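As a rough illustration of the comparison of proportions reported above, the following sketch computes a Pearson chi-squared statistic for a 2 × 2 table built from the stated counts (51 of 85 medial PFC cells and 34 of 90 lateral PFC cells were differential). It is an assumption that the uncorrected Pearson form is appropriate here; this section does not say which variant of the test was used, and the helper name `chi_squared_2x2` is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: medial PFC, lateral PFC. Columns: differential, nondifferential.
medial_diff, medial_total = 51, 85
lateral_diff, lateral_total = 34, 90
stat, p = chi_squared_2x2(
    medial_diff, medial_total - medial_diff,
    lateral_diff, lateral_total - lateral_diff,
)
print(f"chi-squared = {stat:.3f}, p = {p:.4f}")
```

Because the differential and nondifferential counts are complements within each area, the same table underlies both comparisons of proportions in the paragraph above.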
Because responses of each cell were tested with only two positive and two negative feedback stimuli, the selectivity between positive and negative feedback stimuli could have been only a reflection of selectivity for physical features of visual stimuli. This possibility was examined in the cell population by calculating three indices (A, B and C) for each cell (see Methods for details). Briefly, the four stimuli were grouped into two pairs, and the difference in averaged responses to the two pairs was normalized by the total average of responses. The stimuli were grouped into positive and negative feedback stimuli for the calculation of A (positive – negative), but positive and negative feedback stimuli were cross-paired for B and C. If the selectivity reflected the type of feedback to which the stimuli were assigned, the distribution of index A would be wider than the distributions of B and C.

Among the 85 feedback-responsive cells in the medial PFC, the distribution of index A values was much wider than the distributions for indices B and C (Fig. 3c). The variance of index A was significantly larger than that of index B (P < 0.001, test for equal variance) and that of index C (P < 0.001). The differences between the distribution of A and those of B and C were less prominent among the 90 feedback-responsive cells in the lateral PFC. The difference between the distribution of A and that of B was not significant (P > 0.05), although the difference between the distribution of A and that of C was significant (P < 0.05). Moreover, the variance of index A was significantly larger in the medial PFC cells than in the lateral PFC cells (P < 0.001). These results indicate that the responses of the medial PFC cells to visual feedback stimuli represented the feedback type to which the stimuli were assigned and that this representation of the feedback type was less prominent in the lateral PFC.
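The pairing logic behind the three indices can be sketched as follows. The exact definitions are given in the Methods, which are not reproduced here, so the normalization used below (dividing the difference of the two pair averages by their sum, which keeps the index between –1 and 1 for non-negative responses) is an assumption, as are the function names and the example firing rates.

```python
def pair_index(pair_x, pair_y):
    """Normalized difference between the mean responses to two stimulus pairs.

    Assumed form: (mean_x - mean_y) / (mean_x + mean_y).
    Returns 0.0 when the cell did not respond to any stimulus.
    """
    mean_x = sum(pair_x) / len(pair_x)
    mean_y = sum(pair_y) / len(pair_y)
    total = mean_x + mean_y
    return 0.0 if total == 0 else (mean_x - mean_y) / total


def selectivity_indices(p1, p2, n1, n2):
    """Indices A, B and C for one cell.

    p1, p2: responses to the two positive feedback stimuli.
    n1, n2: responses to the two negative feedback stimuli.
    A groups stimuli by feedback type; B and C cross-pair them.
    """
    index_a = pair_index((p1, p2), (n1, n2))  # positive - negative
    index_b = pair_index((p1, n1), (p2, n2))  # cross-pairing 1
    index_c = pair_index((p1, n2), (p2, n1))  # cross-pairing 2
    return index_a, index_b, index_c


# Hypothetical cell that fires to both positive stimuli (spikes per second).
a, b, c = selectivity_indices(p1=20.0, p2=16.0, n1=2.0, n2=2.0)
print(round(a, 2), round(b, 2), round(c, 2))
```

A cell driven by feedback type yields a large |A| with B and C near zero, whereas a cell driven by the physical features of one stimulus yields comparable magnitudes for all three, which is the contrast the population distributions are meant to expose.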
[Figure 4 panels: population activity in visual blocks (switch, non-switch) and in action-learning blocks for medial PFC cells; scatter plot of switch – non-switch (spikes s–1) against the response difference in action-learning blocks.]

Figure 4 Effects of stimulus novelty examined in responses in visual blocks. Left, responses of negative feedback–preferring cells to the stimuli in the first trials of visual blocks immediately after the stimulus pair alternation (‘switch’) and responses of the same population of cells in other first trials of visual blocks (‘non-switch’). Center, their responses to positive feedback stimuli and to negative feedback stimuli in the first trials of action-learning blocks. Right, the magnitude of the effects of stimulus pair change observed in each cell in the visual block plotted against the magnitude of the difference between responses to the positive and negative feedback stimuli observed in the cell in the action-learning block. The regression line is drawn by the parametric test, although the correlation coefficient and the significance of the correlation described in the text were based on the non-parametric test. Bin width, 50 ms. Response of each cell was normalized by its peak firing rate and then responses were averaged across cells. Error bars, s.e.m.

Test of possible contribution of stimulus novelty

The negative feedback stimuli did not appear in the visual blocks, and they appeared less frequently than did positive feedback stimuli in the action-learning blocks. The preference for negative feedback stimuli in negative feedback–preferring cells might be due to the relative novelty of the stimuli. To examine this possibility, we compared responses of the negative feedback–preferring cells in the first trials of the visual blocks immediately after the change of stimulus pair with their responses in the first trials of the other visual blocks.
The stimulus was much more novel in the former type of trials than in the latter, which followed frequent presentation of the same stimulus in the preceding action-learning block (see Supplementary Note online for the details). In the medial PFC cells, the differences between responses in the two types of visual blocks were much smaller than the differences between responses to the negative and positive feedback stimuli (Fig. 4). In the lateral PFC cells, the differences between responses in the two types of visual blocks were as large as the differences between responses to the negative and positive feedback stimuli, and the magnitudes of the two differences were significantly correlated across cells (Spearman’s correlation coefficient r = 0.50, P = 0.012). Thus, the activity of negative feedback–preferring cells in the lateral PFC largely reflected their preference for stimulus novelty, whereas the effect of stimulus novelty was small in the medial PFC.

Low selectivity for action type and visual properties

Because the selectivity between the positive and negative feedback stimuli was more common in the medial PFC cells than in the lateral PFC cells and because the activity of negative feedback–preferring cells in the lateral PFC largely reflected the novelty of the stimuli, we will focus on the feedback-responsive cells in the medial PFC in this and following sections.

To examine the dependence of responses of the feedback-responsive cells in the medial PFC on the type of preceding action (left or right) and visual properties of the feedback stimuli, we examined responses to the positive feedback stimuli in the first correct trials by a two-way factorial ANOVA. The majority (80%) of the cells did not show main effects of action type (P > 0.05, 12/16 positive feedback–preferring cells, 25/32 negative feedback–preferring cells, and 28/34 nondifferential cells).
Also, the majority (86%) of the cells did not show significant visual selectivity (P > 0.05, 14/16 positive feedback–preferring cells, 28/32 negative feedback–preferring cells and 29/34 nondifferential cells). Thus, the responses of most feedback-responsive cells to the positive feedback stimuli did not represent information about the type of preceding action or the visual properties of the stimuli. However, we could not apply the analysis to responses to the negative feedback stimuli, because there were not enough trials in each condition. Further studies are necessary before the conclusion can be generalized to responses to negative feedback stimuli. In the following sections, we will show analyses based on the data combined for left and right actions and for two stimuli.

Changes in responses during behavioral adaptation

We next examined changes in responses of the feedback-responsive cells in the MFC during behavioral adaptation. Because the monkeys usually made erroneous responses only in the first trial of each action-learning block, most action-learning blocks were divided into two types: one consisting only of correct trials (C1-C2-C3-…) and the other consisting of the first error and following correct trials (E1-eC1-eC2-eC3-…). We focused on responses in the first three correct trials and the first error trials in these blocks (Fig. 5).

Responses of positive feedback–preferring cells decreased during the behavioral adaptation in each action-learning block. This trend is shown for responses of a single cell (Fig. 5a) and for the population responses averaged over the 16 positive feedback–preferring cells (Fig. 6a). Responses in the first trials in which the monkeys happened to make a correct action (C1) were the largest. They were numerically larger than those in the first correct trial after the monkey made an erroneous action in the first trial (eC1) but the difference was not significant (P = 0.12, Wilcoxon matched-pairs test). The responses in C1 were significantly larger than those in the second and third correct trials (P = 0.017 for C1 versus C2, P = 0.004 for C1 versus C3, P = 0.013 for C1 versus eC2, and P = 0.006 for C1 versus eC3). We obtained similar significant differences when the significance of responses was determined based on halves of C1 and E1, and the comparison with later trials was made by using the remaining halves (Supplementary Note online).

There were consistently no excitatory responses in the negative feedback–preferring cells in C1 and other types of correct trials (Figs. 5b and 6b). In the population response in C1, there was a suppression of firing after an initial small rising phase. The firing rate in a window from 250 to 400 ms after the stimulus onset was significantly lower than the firing rate in the 400 ms immediately before the stimulus onset in about one-third (11/32, 34%) of the negative feedback–preferring cells (P < 0.05, one-way ANOVA). In the other types of correct trials, there was no obvious suppression in the population responses, and few (2–6) cells showed significant suppression in the single-cell analysis.

[Figure 5 panels: responses (spikes s–1) versus time from stimulus onset (s) to positive feedback in C1, C2, C3, eC1, eC2 and eC3 and to negative feedback in E1, for each of the two cells.]

Figure 5 Changes of neuronal responses along the course of correct-action learning: single cell examples. (a,b) Responses of the same positive feedback–preferring cell (a) and negative feedback–preferring cell (b) as those shown in Figure 3a,b. Bin width, 50 ms. C1, C2 and C3 are the first, second and third trials of the action-learning blocks that started with a correct selection. E1 is the first error trial, and eC1, eC2 and eC3 are the following first, second and third correct trials of the action-learning blocks that started with an error trial.
[Figure 6 panels: (a) positive feedback–preferring cells, (b) negative feedback–preferring cells, (c) nondifferential cells. Upper graphs: population activity versus time from stimulus onset (s) for C1, C2, C3, E1, eC1, eC2 and eC3. Lower graphs: averaged response magnitude for the same trial types.]

Figure 6 Changes of neuronal responses along the course of correct-action learning: population responses. (a–c) Averaged responses of 16 positive feedback–preferring cells (a), 32 negative feedback–preferring cells (b) and 34 nondifferential cells (c) in the medial PFC. Bin width in upper graphs, 50 ms. The activity of each cell was normalized by its peak activity, and then averaged across cells. Lower graphs show the averaged magnitude of responses in the time window of 100–400 ms after the stimulus onset. The activity of each cell in individual bins was normalized by its peak activity, averaged within the window, and then averaged across cells. Error bars, s.e.m.

Figure 7 Relationship between neuronal responses and prediction errors. (a) The amount of prediction error of action values calculated by the double-update model with the best-fit set of parameters. It is shown only for the blocks consisting of only correct trials (C1-C2-C3) and those consisting of the first error and following correct trials (E1-eC1-eC2-eC3). (b) Responses of positive feedback–preferring cells, negative feedback–preferring cells and nondifferential cells plotted against the prediction errors for the trials in the two types of blocks.
The activity of each cell was averaged within the window 100–400 ms after the stimulus onset, the spontaneous activity averaged in the 400-ms window immediately before the stimulus onset was subtracted from it, and the result was normalized by the cell’s peak activity and then averaged across cells. Error bars, s.e.m.

[Figure 7 panels: (a) prediction error of action value (–1 to 1) for C1, C2, C3, E1, eC1, eC2 and eC3 in monkey 1 and monkey 2; (b) normalized response versus prediction error of action value for positive feedback–preferring cells, negative feedback–preferring cells and nondifferential cells.]

Responses of the nondifferential cells decreased during the behavioral adaptation in each action-learning block (Fig. 6c), as did those of the positive feedback–preferring cells. In the cell population, responses in eC1 were comparable in strength to those in C1 (P = 0.80) and E1 (P = 0.15), whereas responses in C2, C3, eC2 and eC3 were significantly smaller than those in C1 and E1 (P < 0.0001 for all comparisons). We obtained similar significant differences when the significance of responses was determined based on halves of C1 and E1, and the comparison with later trials was made by using the remaining halves (Supplementary Note online).

Relation between neuronal responses and prediction errors

To quantitatively examine the relation between the magnitude of medial PFC cell responses and the amount of prediction error, we used reinforcement learning models (see Methods for details). In the models, we assumed that the monkeys selected their actions by estimating the values of actions, and that the value of each action was updated by the difference between the estimated value and the actual outcome of the action (prediction error of action value)25,26. The outcome was the goodness of visual feedback in our task.
We considered the ‘single-update’ model, in which only the value of the selected action is updated, and the ‘double-update’ model, in which the values of both selected and nonselected action types are updated. We determined the set of parameters with which the model’s performance best fit the actual performance of the monkeys for each model (Supplementary Table 1 online). Because the double-update model gave better fits (Supplementary Fig. 2 and Supplementary Note online), we used the double-update model to calculate the prediction errors. It should be noted that the superiority of the double-update model may be specific to our task condition.

By using these best-fit sets of parameters (Supplementary Table 1 online), we calculated the predicted values of actions and errors in the prediction in each trial of each action-learning block (Fig. 7a). We reset the values of both actions to 0 at the beginning of each action-learning block and repeated the calculation of the prediction error for the monkey’s action and given feedback stimulus (by equation (2) in the Methods) and that of new action values (by equations (3) and (4)) along the series of the monkeys’ actions in the action-learning block. The prediction errors in C1 and E1 were simply determined by the goodness of the positive feedback stimuli (1.0) and that of the negative feedback stimuli (–0.43 for monkey 1 and –0.45 for monkey 2). The prediction errors in eC1 were 0.62 for monkey 1 and 0.55 for monkey 2. These values in eC1 may appear too large considering the relatively high performance in the second trial after an erroneous action in the first trial (>90%), but they are consistent with the difference between the amount of information provided by the positive and negative feedback stimuli (Supplementary Table 1 online). The prediction errors in later trials (C2, C3, eC2 and eC3) were all small (maximally 0.075 in monkey 1 and 0.002 in monkey 2).
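The trial-by-trial calculation described above can be illustrated with a minimal sketch of a double-update scheme. Equations (2) to (4) are in the Methods and are not reproduced in this section, so the specific update for the nonselected action (moving it in the opposite direction by a separate learning rate) is an assumption made for illustration, and the learning rates and the negative-feedback goodness used in the example are hypothetical values, not the fitted parameters from Supplementary Table 1.

```python
def run_block(choices, correct_action, alpha_sel, alpha_non,
              r_pos=1.0, r_neg=-0.5):
    """Prediction errors for one action-learning block.

    choices: sequence of "left"/"right" actions taken by the animal.
    correct_action: the action assigned as correct in this block.
    alpha_sel, alpha_non: hypothetical learning rates for the selected
        and nonselected actions.
    r_pos, r_neg: goodness of positive and negative feedback
        (r_neg here is an illustrative value).
    """
    values = {"left": 0.0, "right": 0.0}  # values reset at block start
    errors = []
    for action in choices:
        other = "right" if action == "left" else "left"
        outcome = r_pos if action == correct_action else r_neg
        delta = outcome - values[action]  # prediction error of action value
        errors.append(delta)
        values[action] += alpha_sel * delta  # update the selected action
        # Assumed double-update rule: the nonselected action moves the
        # opposite way, so an error on one lever raises the other's value.
        values[other] -= alpha_non * delta
    return errors


# Block starting with an error (E1) followed by correct trials (eC1, eC2).
print(run_block(["left", "right", "right"], "right", 0.5, 0.5))
```

Because both values start at 0, the first-trial prediction error equals the goodness of the feedback itself, as stated in the text; under the assumed rule, the error in eC1 is smaller than in C1 only to the extent that the negative feedback in E1 raised the value of the alternative action.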
The averaged magnitudes of responses are plotted against the amount of the prediction error for typical trials (Fig. 7b; see also Supplementary Fig. 3 online for the results from each monkey). Note that the prediction error plotted here was calculated for the selected action, which was a right response in some blocks and a left response in other blocks. Because responses of the feedback-responsive cells in the medial PFC to feedback stimuli were not selective for action type, we did not discriminate the left action from the right action. The changes in the magnitude of responses in the positive feedback–preferring cells covered the full range of changes in the amount of prediction error. The magnitude of responses decreased as the amount of prediction error decreased from C1 to eC1 and then to other later correct trials. The responses were still positive, whereas the prediction errors were close to 0 in the later correct trials. There were nearly no responses in E1, where the prediction errors were negative. Thus, the magnitude of responses of the positive feedback–preferring cells to the visual feedback stimuli was linearly correlated with the amount of prediction error in the full range of prediction error. The negative feedback–preferring cells showed positive responses only to the negative prediction errors. Responses of the nondifferential cells to positive prediction errors were correlated with the amount of prediction error, as were responses of positive feedback–preferring cells. In addition, the nondifferential cells also showed strong positive responses to the negative prediction errors. Positive feedback–preferring cells and negative feedback–preferring cells in the medial PFC were recorded in the same electrode tracks and there was no clear segregation in their localizations (Supplementary Fig. 4 online). Thus, the two types of cells seem to be locally intermingled within the medial PFC. 
DISCUSSION
We found separate groups of medial PFC cells responding to positive and negative visual feedback stimuli. The magnitude of the responses was correlated with the amount of difference between the predicted value of the executed action and the goodness of the given feedback stimulus: a bigger response was evoked by the positive feedback stimulus when the monkey was not confident in the selection of action and the positive feedback was not necessarily predicted (in action-learning blocks). These results indicate that the medial PFC cells could contribute to behavioral adaptation by representing the direction and amount of error in action value prediction. By having both cells that respond to positive feedback stimuli and those that respond to negative feedback stimuli, the medial PFC can explicitly represent the direction of error in action-value prediction. Therefore, the medial PFC is likely to contribute to specifying the direction and amount of the adjustment for subsequent action selection. The responses of the positive feedback–preferring cells to positive feedback stimuli were much reduced when the same stimuli were presented in the first trials of visual blocks (Supplementary Fig. 5 and Supplementary Note online). This reduction of responses supports the notion that the responses represented the amount of error in action-value prediction, because there were no competitive action selections in visual blocks. One may argue that the responses to positive and negative feedback stimuli might encode the plan or decision to stay or shift in action, respectively, in the next trial.
However, this is unlikely, because both positive feedback–preferring cells and negative feedback–preferring cells maintained their preference for the type of feedback when the monkey erroneously stayed in the same action after a negative feedback stimulus was provided (Supplementary Fig. 6 and Supplementary Note online). In the medial PFC, besides the cells that differentially responded to positive and negative feedback stimuli, there were cells nondifferentially responding to positive and negative feedback stimuli in the action-learning blocks. These findings are consistent with a recent human EEG study suggesting that there are separate outcome-monitoring systems, one sensitive to the direction of prediction errors and the other sensitive only to their absolute magnitudes27. The nondifferential cells responded to the stimuli even without preceding and subsequent action selections in the visual blocks (see Supplementary Fig. 5 and Supplementary Note online). The presence of such cells in the medial PFC is consistent with a recent EEG result in humans that MFC activity is evoked by both favorable and unfavorable outcomes in the absence of actions27,28. The type of feedback stimulus was unpredictable in the first trials in action-learning blocks, whereas the timing of the appearance of the stimulus was unpredictable in the first trials in visual blocks. The activities of the nondifferential cells might represent the requirement of attention to the sensory event. A majority of the feedback-responsive cells in the lateral PFC were nondifferential cells. These cells may also contribute to performance monitoring by conveying information about the need of attention to the feedback stimuli. A human study has suggested a contribution of the lateral PFC to performance monitoring24. The medial PFC receives strong projections from the midbrain dopamine cells29, which are thought to encode reward prediction errors20.
The responses of the medial PFC cells to the feedback stimuli may be reflections of activity in the dopaminergic afferents10,19,23. However, although there are both cells activated by the positive feedback stimuli and those activated by the negative feedback stimuli in the medial PFC, dopamine cells uniformly increase their firing to positive prediction errors of reward30. Although they respond to negative prediction errors of reward by stopping spontaneous firings, the precision with which dopamine cells encode the magnitude of negative prediction errors is under debate30–32. The serotonergic afferents to the cerebral cortex may convey signals of negative or aversive prediction errors33, because acute serotonin depletion impairs reversal learning on the basis of negative feedback stimuli34. The medial PFC may receive the signals of negative or aversive prediction errors through projections from the serotonergic cells. The signals of negative prediction errors may also be generated within the medial PFC. If the latter is true, the generation of negative prediction errors in the medial PFC should depend on the loop composed of the medial PFC and striatum, because the ERN disappears after lesion of the striatum35. There are two possible ways in which the visual stimuli might obtain the goodness or reinforcement values. One possibility is that the positive feedback stimulus had the goodness because it appeared in the previous visual block and because the monkeys had learned that the task approached the primary reward in the following visual block by selecting actions that brought the stimulus seen in the previous visual block36. The positive feedback stimulus had the goodness or positive reinforcement value because it indicated approach to the primary reward in the following block.
The other possibility is that the positive feedback stimulus obtained its own value by Pavlovian conditioning with the primary reward in the previous visual block. The monkeys selected an action that brought the stimulus because the stimulus had been associated with the primary reward. Our positive feedback stimuli might work as conditioned reinforcers. Although the negative feedback stimuli were not associated with the absence of primary rewards in visual blocks, this asymmetry is common to typical conditioned reinforcement paradigms37–39. Although we cannot determine which was the case, our findings indicate, in either case, that the medial PFC cells used the visual feedback stimuli for evaluation of executed actions. In summary, we found medial PFC cells representing positive prediction errors of action values and those representing negative prediction errors of action values. By these neuronal activities, the medial PFC may indicate the direction and amount of adjustment to be made to the representation of action values. How the neuronal activities are used for the adjustment of action value representation remains to be studied (see Supplementary Note online). We also showed that cells in the medial PFC responded to arbitrarily selected visual stimuli that are working as positive and negative feedback stimuli. Pairing an arbitrary stimulus with a primary reward, as in the present study, is one way to provide it with the capability to direct behavior37–39. However, actions can also be oriented to obtain outcomes that have not been paired with primary rewards40–43. The neural mechanisms by which behavior is oriented to such varied outcomes should be explored in future research.

METHODS
General procedures. We used two male rhesus monkeys (Macaca mulatta) weighing 7–10 kg. A head holder and two recording chambers (20 mm in diameter) were implanted by aseptic surgery under pentobarbital anesthesia (35 mg per kilogram body weight intraperitoneally).
All procedures were approved by the RIKEN Animal Experiment Committee and were in accordance with the US NIH Guide for the Care and Use of Laboratory Animals. During testing the monkey was seated in a primate chair inside a dark room, with its head fixed. A video display was placed 57 cm from the monkey’s eyes to present a fixation point and visual stimuli (full-color flower images). Three lever switches were placed in front of the primate chair. Gaze position was measured by an infrared system (http://staff.aist.go.jp/k.matsuda/eye). The task was controlled, and behavioral and neuronal data were recorded, by computers running a commercially available system (Tempo for Windows, Reflective Computing).

Behavioral tasks. The monkeys were trained on a task in which they adapted their behavior on the basis of visual feedback. It consisted of two types of blocks: one for visual fixation to instruct the monkey on a forthcoming positive feedback signal and the other for action learning using the positive feedback stimulus and another negative feedback stimulus (Fig. 1a). The two types of blocks were alternated (Fig. 1b). In the visual block, a white fixation point (0.44° wide) appeared after an intertrial interval varying from 1 to 1.5 s. The monkey had to fixate its gaze on the point and hold down the central lever with the right hand for 0.5 s, and then a visual image (7° wide) was presented for 0.6 s. A drop of water was delivered to the monkey at the end of the stimulus-presentation period. The monkey had to maintain eye fixation and keep the central lever depressed until the water delivery. A failure in gaze fixation or central lever pressing aborted the trial. When the trial had been successfully repeated three times, the task moved to the action-learning block.
In the action-learning block, after a 1–1.5 s intertrial interval, the fixation point appeared and the monkey had to fixate it and hold down the central lever. After a period varying from 0.8 to 1.3 s, the color of the fixation point changed to red, which instructed the monkey to initiate an action. The required action was to press either the left or right lever and return to the central lever within 2 s. There was a 0.5 s delay after completion of the motor response, and then a visual image was presented for 0.6 s as a feedback signal to the executed action. A correct action was followed by the visual stimulus that was presented in the preceding visual block as positive feedback, whereas an incorrect action was followed by a different visual stimulus as negative feedback. The monkey had to continue the gaze fixation and central-lever pressing until the offset of the feedback presentation. The trial was immediately aborted after a failure of either gaze fixation or central-lever pressing. The monkey had to refrain from pressing any of the levers during any intertrial intervals in the task. The intertrial interval would have been reset to its beginning upon an erroneous pressing during the intertrial interval, but this seldom occurred during recordings of neuronal activity. The correct motor response (left or right) was fixed within each action-learning block but pseudo-randomly changed between blocks, so that the monkey could not know the correct response at the beginning of each action-learning block. When the monkey had repeated the correct response in three or four consecutive trials (randomly determined by the computer) in an action-learning block, the task moved to a new visual block (Fig. 1b). When a trial was aborted by a fixation break or central-lever release during the presentation of the positive feedback signal, the trial was regarded as a correct trial, but the monkey was required to perform one more correct trial before moving to a visual block.
Because of this requirement, the number of consecutive trials in which the monkeys saw the positive feedback stimulus could be more than four in some action-learning blocks. Two pairs of positive and negative feedback stimuli were alternated every four repetitions of visual and action-learning blocks in order to maintain the monkey’s attention to the stimuli. One pair was used in four repetitions of visual and action-learning blocks, and then the other pair was used for the next four repetitions. After 32 repetitions of visual and action-learning blocks, the two pairs were replaced with two new pairs.

Recordings. Action potentials of single neurons were recorded extracellularly with tungsten electrodes (impedance of 8–10 MΩ, FHC) while the monkeys performed the task. Electrodes were advanced by an oil-driven micromanipulator (Narishige) through a stainless steel guide tube with agarose filling the recording chamber. Single neuronal discharges were collected at 1 kHz using a template-matching spike discriminator (Alpha-Omega). We recorded the activity of single cells from both the medial PFC and lateral PFC (Fig. 2 and Supplementary Fig. 1 online). The medial PFC cells were recorded in the dorsal bank and fundus of the anterior part of cingulate sulcus, in both hemispheres of both monkeys. The anterior-posterior range of recordings was A30–A35 in monkey 1 and A31–A37 in monkey 2, which were largely located anterior to the genu of the corpus callosum and the anterior tip of the arcuate sulcal inferior limb. The recording regions corresponded to area 24b of Carmichael and Price44 and area 9 according to Barbas and Pandya45. They overlapped with both areas 9 and 8B in the definition of Walker46, probably more with area 9. Our notation of the medial recording regions as the medial PFC is not accurate in that it partly included area 8B, which is the transitional zone between the agranular and prefrontal granular regions.
Our recording regions from the medial PFC did not overlap with the rostral cingulate motor area47–49 but partly overlapped with an anterior part of regions called anterior cingulate cortex in some previous papers13,50. The lateral PFC cells were recorded from the lateral surface both dorsal and ventral to the principal sulcus, in left hemispheres of both monkeys. The regions of recordings corresponded to the middle part of the anterior-posterior extent of the sulcus, and ranged from A31 to A38 in monkey 1 and from A33 to A38 in monkey 2. They corresponded to area 46 (refs. 45,46). The position and extent of recordings were determined on the basis of anatomical MRI images (4.0 T, Varian NMR Instruments) taken before the surgery.

Data analyses. Because most of the action-learning blocks were accomplished with none or one initial error trial, we classified them into two types: one included the first (C1), second (C2), and third correct trials (C3) but did not include any error trials, and the other included the initial error (E1) and subsequent consecutive correct trials (eC1, eC2, and eC3). We analyzed the neuronal data in C1, C2, C3, E1, eC1, eC2, and eC3 derived from these two types of action-learning blocks. The fourth correct trial in each block was not analyzed, because the number of these trials was small. We analyzed only cells recorded for 16–32 repetitions of visual and action-learning blocks. The magnitude of responses to the visual stimulus presentation in both action-learning and visual blocks was quantified by the mean firing rate within the window from 100 to 400 ms after the stimulus onset. The significance of responses in the action-learning blocks was examined by comparing the firing rate during the response window with the mean firing rate within a 400-ms window immediately before the stimulus onset by a one-way repeated-measures ANOVA (P < 0.05) for each stimulus.
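As a rough sketch of the response measure just described, the snippet below computes the mean firing rate in the 100–400 ms response window and in the 400-ms pre-stimulus baseline from spike times aligned to stimulus onset. The function name and the example spike times are hypothetical, and the statistical comparison itself (the repeated-measures ANOVA) is not reproduced here.

```python
def mean_rate(spike_times_ms, start_ms, end_ms):
    """Mean firing rate (spikes/s) in [start_ms, end_ms).

    Spike times are in ms relative to stimulus onset (negative = before onset).
    """
    count = sum(start_ms <= t < end_ms for t in spike_times_ms)
    return count * 1000.0 / (end_ms - start_ms)

RESPONSE_WINDOW = (100, 400)   # ms after stimulus onset
BASELINE_WINDOW = (-400, 0)    # 400 ms immediately before stimulus onset

# Made-up spike times (ms) for a single trial, for illustration only.
trial = [-350, -120, 110, 150, 180, 240, 310, 390, 520]

response = mean_rate(trial, *RESPONSE_WINDOW)   # 6 spikes in 0.3 s
baseline = mean_rate(trial, *BASELINE_WINDOW)   # 2 spikes in 0.4 s
```

In a full analysis these two rates would be collected per trial and per stimulus and then compared statistically for each cell.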
For cells that showed significant responses to at least one feedback stimulus, we compared the responses to positive feedback stimuli with those to negative feedback stimuli by one-way factorial ANOVA (P < 0.05). For this latter analysis, data were combined for two positive or two negative feedback stimuli and for two actions (left and right). We performed both analyses on responses in the first trials in individual action-learning blocks, where the positive and negative feedback stimuli were equally expected. To further examine whether the selectivity for positive feedback stimuli versus negative feedback stimuli was only a reflection of the selectivity for visual features of the stimuli or representation of the type of feedback (positive or negative) to which the stimuli were assigned, we calculated the following three indices for each neuron.

$$A = \frac{(P_1 + P_2) - (N_1 + N_2)}{P_1 + P_2 + N_1 + N_2}$$

$$B = \frac{(P_1 + N_1) - (P_2 + N_2)}{P_1 + P_2 + N_1 + N_2}$$

$$C = \frac{(P_1 + N_2) - (P_2 + N_1)}{P_1 + P_2 + N_1 + N_2}$$

where P1, N1, P2, and N2 represent the magnitude of responses to the positive stimuli and negative stimuli in the first and second pairs, respectively. If the responses were determined only by the physical features of visual stimuli, the three indices would be distributed around 0 and be similar to one another. If the responses of many cells were determined by the type of feedback, the distribution of index A would be wider than those of indices B and C.

We also examined responses to feedback presentation in the action-learning blocks for the selectivity for visual features and for coupled actions (left versus right). The visual selectivity was examined by comparing the magnitude of responses between the two positive feedback stimuli. A two-way factorial ANOVA with the factors of actions and stimuli was applied to responses of each of the medial PFC cells that showed significant responses to at least one feedback stimulus.
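Returning to the three indices A, B and C defined earlier in this section, they reduce to simple arithmetic on four response magnitudes per cell. The sketch below is illustrative only: the function name is ours and the example firing rates are invented, chosen to resemble a cell that responds to both positive stimuli and only weakly to both negative ones.

```python
def selectivity_indices(p1, n1, p2, n2):
    """Return (A, B, C) from responses to the positive (p) and negative (n)
    stimuli of the first and second stimulus pairs."""
    total = p1 + p2 + n1 + n2
    if total == 0:
        raise ValueError("indices are undefined when all responses are zero")
    a = ((p1 + p2) - (n1 + n2)) / total   # positive versus negative feedback
    b = ((p1 + n1) - (p2 + n2)) / total   # first pair versus second pair
    c = ((p1 + n2) - (p2 + n1)) / total   # crossed grouping of the four stimuli
    return a, b, c

# Hypothetical response magnitudes (spikes/s), for illustration only.
a, b, c = selectivity_indices(p1=20.0, n1=2.0, p2=18.0, n2=4.0)
```

For a cell like this one, A is far from 0 while B and C stay near 0, which is the pattern expected when responses depend on the type of feedback and not on the visual features of particular stimuli.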
The responses in C1 and eC1 trials were combined to have enough numbers of trials in each action-stimulus combination. This combination was justified by the fact that the responses to the positive feedback stimuli were not significantly different between C1 and eC1 in the populations of positive feedback–preferring cells (P = 0.12, Wilcoxon matched-pairs test), negative feedback–preferring cells (P = 0.09) and nondifferential cells (P = 0.80).

Responses to the positive stimuli decreased from C1 to later trials in each action-learning block in many medial PFC cells. To examine the significance of this trend in cell population, we compared the magnitude of responses between C1 and later correct trials (C2, C3, eC1, eC2, eC3) by Wilcoxon matched-pairs test (P < 0.05). This test was applied to a population consisting of cells with significant responses to positive stimuli and significant preference for positive stimuli compared with negative stimuli (positive feedback–preferring medial PFC cells) and to a population of cells with significant responses to some stimuli but with no significant preference between positive and negative stimuli (nondifferential medial PFC cells). To avoid possible effects of random fluctuations of responses in C1, we repeated the procedure by using half of the C1 trials (in odd blocks in each action-outcome combination) for cell selection and the remaining half for the comparison between C1 and later correct trials.

Reinforcement learning model. The amount of information that the monkey obtained from each feedback presentation in the action-learning block for the improvement of performance in the subsequent trial can be estimated by using reinforcement learning models. A group of reinforcement learning models assumes that the monkey keeps estimated values for each type of action and selects an action depending on the values. The estimated values are updated when the outcome turns up after execution of an action, based on the difference between the goodness of the outcome and the estimated value of the executed action. We assumed a Boltzmann selection rule for action selection. The probability of selecting an action a (either left (L) or right (R)) is given by

$$p(a) = \frac{\exp(\beta Q(a))}{\exp(\beta Q(L)) + \exp(\beta Q(R))} \qquad (1)$$

where Q(a) is the action value of action a and β is an inverse temperature, which inversely relates to randomness in action selection (β ≥ 0). The outcome of action in the action-learning block was the goodness of the feedback stimulus. When the feedback stimulus was presented, the value of the executed action was updated by

$$\delta Q(a) = r - Q(a) \qquad (2)$$

$$Q(a) \leftarrow Q(a) + \alpha\,\delta Q(a) \qquad (3)$$

where δQ(a) is the prediction error, r is the goodness of the feedback stimulus, and α is the learning rate (0 < α < 1) (ref. 25). r was 1 for the positive feedback stimuli and n_neg (–1 ≤ n_neg ≤ 0) for the negative feedback stimuli. Because we do not know the relative size of impact evoked by the negative feedback stimuli, we set the value of the negative feedback stimuli as a variable parameter relative to that of the positive feedback stimuli. The action values were reset to 0 at the beginning of each action-learning block and sequentially changed along the series of actions within each action-learning block. This model has three parameters, α, β and n_neg. We represented the goodness of feedback stimuli by the values (1 and n_neg) that did not change along the series of actions within each action-learning block, because we intended to analyze the process of performance monitoring for performance adjustment. The goodness of feedback stimuli would have to be a function of the number of accumulated correct trials in each block if we focused on the motivational value of the situation, which was likely to increase along the series of correct actions as the primary reward delivery approached.

Note that only the value of the selected action is updated in the above-described model (single-update model), which is also true in original Q-learning models. Because there were only two possible types of actions (left and right) and one action was always correct in an action-learning block in our paradigm, it is possible that the value of the action unselected in the trial (ā) was also updated when the feedback stimulus was provided to the selected action. Therefore, we considered a second model in which the value of ā is also updated by

$$Q(\bar{a}) \leftarrow Q(\bar{a}) + i\,\alpha\,\delta Q(a) \qquad (4)$$

where i is an interaction factor (–1 ≤ i ≤ 0). This double-update model has four parameters, α, β, n_neg and i.

To determine the parameters with which the model best fit the monkey’s actual action selections, we calculated a likelihood function l(θ | y) for each set of parameters (θ) with particular behavioral data (y) by the following procedures. First, the action values, which were 0 at the beginning of each action-learning block, were sequentially determined following actual action series in each action-learning block by using equations (2) and (3) (for the single-update model) or equations (2) through (4) (for the double-update model) with a parameter set θ. Then, the estimate of the probability (p(a,t | θ)) for the monkey selecting the action that the monkey actually selected (a) in the tth trial was calculated from the action values at that time by equation (1). Finally, the likelihood function was obtained by a product of the probabilities in all the trials included in y.

$$l(\theta \mid y) = \prod_{t} p(a, t \mid \theta)$$

The set of parameters that provided the largest value of the likelihood function should be taken as the best-fit parameters. As y, we pooled, separately for each monkey, the behavioral data of all the sessions in which we recorded neuronal activity. To save the computation time to determine the best-fit parameters, we used the Metropolis-Hastings algorithm (see Supplementary Methods online).

To compare the goodness of the best-fit between the single-update and double-update models, we calculated Akaike’s information criterion (AIC) and the bayesian information criterion (BIC) by the following formulas.

$$\mathrm{AIC} = -2L + 2k$$

$$\mathrm{BIC} = -2L + k \log n$$

where L is the logarithm of the likelihood with the best-fit parameters, k is the number of parameters (3 and 4 for the single- and double-update models, respectively), and n is the total number of trials. Smaller values indicate better fitting.

Note: Supplementary information is available on the Nature Neuroscience website.

ACKNOWLEDGMENTS
This research was partly supported by the Grant-in-Aid for Scientific Research on Priority Areas (17022047) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. We thank W. Schultz for advice about task design, S. Shimamune and K. Murayama for discussion, R. A. Waggoner for taking MRI images, A. Phillips for developing a program for presenting visual stimuli, J. Helen for improving the English, and M. Tomonaga, H. Nakahara and W. Schultz for comments on an early manuscript.

COMPETING INTERESTS STATEMENT
The authors declare no competing interests.

Published online at http://www.nature.com/natureneuroscience
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions

1. Woodworth, R.S. Dynamics of Behavior (Holt, New York, 1958). 2. Daw, N.D. & Doya, K. The computational neurobiology of learning and reward. Curr. Opin.
Neurobiol. 16, 199–204 (2006). 3. Matsumoto, K. & Tanaka, K. The role of the medial prefrontal cortex in achieving goals. Curr. Opin. Neurobiol. 14, 178–185 (2004). 4. Rushworth, M.F., Walton, M.E., Kennerley, S.W. & Bannerman, D.M. Action sets and decisions in the medial frontal cortex. Trends Cogn. Sci. 8, 410–417 (2004). 5. Falkenstein, M., Hohnsbein, J., Hoormann, J. & Blanke, L. Effects of crossmodal divided attention on late ERP components. II. Error processing in choice reaction tasks. Electroencephalogr. Clin. Neurophysiol. 78, 447–455 (1991). 6. Gehring, W.J., Goss, B., Coles, M.G.H., Meyer, D.E. & Donchin, E. A neural system for error detection and compensation. Psychol. Sci. 4, 385–390 (1993). 7. Miltner, W.H.R., Braun, C.H. & Coles, M.G.H. Event-related brain potentials following incorrect feedback in a time-estimation task: evidence for a “generic” neural system for error detection. J. Cogn. Neurosci. 9, 788–798 (1997). 8. Carter, C.S. et al. Anterior cingulate cortex, error detection, and the online monitoring of performance. Science 280, 747–749 (1998). 9. Ullsperger, M. & von Cramon, D.Y. Error monitoring using external feedback: specific roles of the habenular complex, the reward system, and the cingulate motor area revealed by functional magnetic resonance imaging. J. Neurosci. 23, 4308–4314 (2003). 10. Holroyd, C.B. et al. Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nat. Neurosci. 7, 497–498 (2004). 11. Mars, R.B. et al. Neural dynamics of error processing in medial frontal cortex. Neuroimage 28, 1007–1013 (2005). 12. Niki, H. & Watanabe, M. Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res. 171, 213–224 (1979). 13. Ito, S., Stuphorn, V., Brown, J.W. & Schall, J.D. Performance monitoring by the anterior cingulate cortex during saccade countermanding.
Science 302, 120–122 (2003). 14. Bartholow, B.D. et al. Strategic control and medial frontal negativity: beyond errors and response conflict. Psychophysiology 42, 33–42 (2005). 15. Pailing, P.E. & Segalowitz, S.J. The effects of uncertainty in error monitoring on associated ERPs. Brain Cogn. 56, 215–233 (2004). 16. Vidal, F., Burle, B., Bonnet, M., Grapperon, J. & Hasbroucq, T. Error negativity on correct trials: a reexamination of available data. Biol. Psychol. 64, 265–282 (2003). 17. Knutson, B., Westdorp, A., Kaiser, E. & Hommer, D. FMRI visualization of brain activity during a monetary incentive delay task. Neuroimage 12, 20–27 (2000). 18. Walton, M.E., Devlin, J.T. & Rushworth, M.F. Interactions between decision making and performance monitoring within prefrontal cortex. Nat. Neurosci. 7, 1259–1265 (2004). 19. Holroyd, C.B. & Coles, M.G. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 109, 679–709 (2002). 20. Schultz, W., Dayan, P. & Montague, P.R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997). 21. Frank, M.J., Woroch, B.S. & Curran, T. Error-related negativity predicts reinforcement learning and conflict biases. Neuron 47, 495–501 (2005). 22. Holroyd, C.B., Nieuwenhuis, S., Yeung, N. & Cohen, J.D. Errors in reward prediction are reflected in the event-related brain potential. Neuroreport 14, 2481–2484 (2003). 23. Amiez, C., Joseph, J.P. & Procyk, E. Anterior cingulate error-related activity is modulated by predicted reward. Eur. J. Neurosci. 21, 3447–3452 (2005). 24. Gehring, W.J. & Knight, R.T. Prefrontal-cingulate interactions in action monitoring. Nat. Neurosci. 3, 516–520 (2000). 25. Watkins, C. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992). 26. Sutton, R.S. & Barto, A.G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, Massachusetts, 1998). 27. Yeung, N. & Sanfey, A.G. 
Independent coding of reward magnitude and valence in the human brain. J. Neurosci. 24, 6258–6264 (2004). 28. Donkers, F.C., Nieuwenhuis, S. & van Boxtel, G.J. Mediofrontal negativities in the absence of responding. Brain Res. Cogn. Brain Res. 25, 777–787 (2005). 29. Lewis, D.A., Foote, S.L., Goldstein, M. & Morrison, J.H. The dopaminergic innervation of monkey prefrontal cortex: a tyrosine hydroxylase immunohistochemical study. Brain Res. 449, 225–243 (1988). 30. Fiorillo, C.D., Tobler, P.N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003). 31. Bayer, H.M. & Glimcher, P.W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005). 32. Satoh, T., Nakai, S., Sato, T. & Kimura, M. Correlated coding of motivation and outcome of decision by dopamine neurons. J. Neurosci. 23, 9913–9923 (2003). 33. Daw, N.D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002). 34. Robbins, T.W. Chemistry of the mind: neurochemical modulation of prefrontal cortical function. J. Comp. Neurol. 493, 140–146 (2005). 35. Ullsperger, M. & von Cramon, D.Y. The role of intact frontostriatal circuits in error processing. J. Cogn. Neurosci. 18, 651–664 (2006). 36. Fantino, E. Choice and rate of reinforcement. J. Exp. Anal. Behav. 12, 723–730 (1969). 37. Skinner, B.F. Science and Human Behavior (Macmillan, New York, 1953). 38. Williams, B.A. Conditioned reinforcement: experimental and theoretical issues. Behav. Anal. 17, 261–285 (1994). 39. Parkinson, J.A. et al. The role of the primate amygdala in conditioned reinforcement. J. Neurosci. 21, 7770–7780 (2001). 40. Allport, G.W. Pattern and Growth in Personality (Holt, Rinehart and Winston, New York, 1961). 41. Maslow, A.H. Motivation and Personality (Harper & Row, New York, 1970). 42. Schultz, W. Multiple reward signals in the brain. Nat. Rev. Neurosci. 1, 199–207 (2000).
43. Blatter, K. & Schultz, W. Rewarding properties of visual stimuli. Exp. Brain Res. 168, 541–546 (2006). 44. Carmichael, S.T. & Price, J.L. Architectonic subdivision of the orbital and medial prefrontal cortex in the macaque monkey. J. Comp. Neurol. 346, 366–402 (1994). 45. Barbas, H. & Pandya, D.N. Architecture and intrinsic connections of the prefrontal cortex in the rhesus monkey. J. Comp. Neurol. 286, 353–375 (1989). 46. Walker, A.E. A cytoarchitectural study of the prefrontal area of the macaque monkey. J. Comp. Neurol. 73, 59–86 (1940). 47. Shima, K. et al. Two movement-related foci in the primate cingulate cortex observed in signal-triggered and self-paced forelimb movements. J. Neurophysiol. 65, 188–202 (1991). 48. Procyk, E., Tanaka, Y.L. & Joseph, J.P. Anterior cingulate activity during routine and non-routine sequential behaviors in macaques. Nat. Neurosci. 3, 502–508 (2000). 49. Shidara, M. & Richmond, B.J. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296, 1709–1711 (2002). 50. Nakamura, K., Roesch, M.R. & Olson, C.R. Neuronal activity in macaque SEF and ACC during performance of tasks involving conflict. J. Neurophysiol. 93, 884–908 (2005).