Reward system


The reward system is a group of neural structures responsible for incentive salience, associative learning, and positively-valenced emotions, particularly ones which involve pleasure as a core component. Reward is the attractive and motivational property of a stimulus that induces appetitive behavior, also known as approach behavior, and consummatory behavior. In its description of a rewarding stimulus, a review on reward neuroscience noted, "any stimulus, object, event, activity, or situation that has the potential to make us approach and consume it is by definition a reward". In operant conditioning, rewarding stimuli function as positive reinforcers; however, the converse statement also holds true: positive reinforcers are rewarding.
Primary rewards are a class of rewarding stimuli which facilitate the survival of one's self and offspring, and include homeostatic and reproductive rewards. Intrinsic rewards are unconditioned rewards that are attractive and motivate behavior because they are inherently pleasurable. Extrinsic rewards are conditioned rewards that are attractive and motivate behavior, but are not inherently pleasurable. Extrinsic rewards derive their motivational value as a result of a learned association with intrinsic rewards. Extrinsic rewards may also elicit pleasure after being classically conditioned with intrinsic rewards.
Survival for most animal species depends upon maximizing contact with beneficial stimuli and minimizing contact with harmful stimuli. Reward cognition serves to increase the likelihood of survival and reproduction by causing associative learning, eliciting approach and consummatory behavior, and triggering positively-valenced emotions. Thus, reward is a mechanism that evolved to help increase the adaptive fitness of animals.

Definition

In neuroscience, the reward system is a collection of brain structures and neural pathways that are responsible for reward-related cognition, including associative learning, incentive salience, and positively-valenced emotions, particularly emotions that involve pleasure.
Terms that are commonly used to describe behavior related to the "wanting" or desire component of reward include appetitive behavior, approach behavior, preparatory behavior, instrumental behavior, anticipatory behavior, and seeking. Terms that are commonly used to describe behavior related to the "liking" or pleasure component of reward include consummatory behavior and taking behavior.
The three primary functions of rewards are their capacity to:
  1. produce associative learning ;
  2. affect decision-making and induce approach behavior ;
  3. elicit positively-valenced emotions, particularly pleasure.

    Anatomy

The brain structures that compose the reward system are located primarily within the cortico-basal ganglia-thalamo-cortical loop; the basal ganglia portion of the loop drives activity within the reward system. Most of the pathways that connect structures within the reward system are glutamatergic interneurons, GABAergic medium spiny neurons, and dopaminergic projection neurons, although other types of projection neurons contribute. The reward system includes the ventral tegmental area, ventral striatum, dorsal striatum, substantia nigra, prefrontal cortex, anterior cingulate cortex, insular cortex, hippocampus, hypothalamus, thalamus, subthalamic nucleus, globus pallidus, ventral pallidum, parabrachial nucleus, amygdala, and the remainder of the extended amygdala. The dorsal raphe nucleus and cerebellum appear to modulate some forms of reward-related cognition and behaviors as well. The laterodorsal tegmental nucleus, pedunculopontine nucleus, and lateral habenula are also capable of inducing aversive salience and incentive salience through their projections to the ventral tegmental area. The LDT and PPTg both send glutaminergic projections to the VTA that synapse on dopaminergic neurons, both of which can produce incentive salience. The LHb sends glutaminergic projections, the majority of which synapse on GABAergic RMTg neurons that in turn drive inhibition of dopaminergic VTA neurons, although some LHb projections terminate on VTA interneurons. These LHb projections are activated both by aversive stimuli and by the absence of an expected reward, and excitation of the LHb can induce aversion.
Most of the dopamine pathways that project out of the ventral tegmental area are part of the reward system; in these pathways, dopamine acts on D1-like receptors or D2-like receptors to either stimulate or inhibit the production of cAMP. The GABAergic medium spiny neurons of the striatum are components of the reward system as well. The glutamatergic projection nuclei in the subthalamic nucleus, prefrontal cortex, hippocampus, thalamus, and amygdala connect to other parts of the reward system via glutamate pathways. The medial forebrain bundle, which is a set of many neural pathways that mediate brain stimulation reward, is also a component of the reward system.
Two theories exist with regard to the activity of the nucleus accumbens and the generation liking and wanting. The inhibition hypothesis proposes that the nucleus accumbens exerts tonic inhibitory effects on downstream structures such as the ventral pallidum, hypothalamus or ventral tegmental area, and that in inhibiting in the nucleus accumbens, these structures are excited, "releasing" reward related behavior. While GABA receptor agonists are capable of eliciting both "liking" and "wanting" reactions in the nucleus accumbens, glutaminergic inputs from the basolateral amygdala, ventral hippocampus, and medial prefrontal cortex can drive incentive salience. Furthermore, while most studies find that NAcc neurons reduce firing in response to reward, a number of studies find the opposite response. This had led to the proposal of the disinhibition hypothesis, that proposes that excitation or NAcc neurons, or at least certain subsets, drives reward related behavior.
After nearly 50 years of research on brain-stimulation reward, experts have certified that dozens of sites in the brain will maintain intracranial self-stimulation. Regions include the lateral hypothalamus and medial forebrain bundles, which are especially effective. Stimulation there activates fibers that form the ascending pathways; the ascending pathways include the mesolimbic dopamine pathway, which projects from the ventral tegmental area to the nucleus accumbens. There are several explanations as to why the mesolimbic dopamine pathway is central to circuits mediating reward. First, there is a marked increase in dopamine release from the mesolimbic pathway when animals engage in intracranial self-stimulation. Second, experiments consistently indicate that brain-stimulation reward stimulates the reinforcement of pathways that are normally activated by natural rewards, and drug reward or intracranial self-stimulation can exert more powerful activation of central reward mechanisms because they activate the reward center directly rather than through the peripheral nerves. Third, when animals are administered addictive drugs or engage in naturally rewarding behaviors, such as feeding or sexual activity, there is a marked release of dopamine within the nucleus accumbens. However, dopamine is not the only reward compound in the brain.

Pleasure centers

is a component of reward, but not all rewards are pleasurable. Stimuli that are naturally pleasurable, and therefore attractive, are known as intrinsic rewards, whereas stimuli that are attractive and motivate approach behavior, but are not inherently pleasurable, are termed extrinsic rewards. Extrinsic rewards are rewarding as a result of a learned association with an intrinsic reward. In other words, extrinsic rewards function as motivational magnets that elicit "wanting", but not "liking" reactions once they have been acquired.
The reward system contains – i.e., brain structures that mediate pleasure or "liking" reactions from intrinsic rewards. hedonic hotspots have been identified in subcompartments within the nucleus accumbens shell, ventral pallidum, parabrachial nucleus, orbitofrontal cortex, and insular cortex. The hotspot within the nucleus accumbens shell is located in the rostrodorsal quadrant of the medial shell, while the hedonic coldspot is located in a more posterior region. The posterior ventral pallidum also contains a hedonic hotspot, while the anterior ventral pallidum contains a hedonic coldspot. Microinjections of opioids, endocannabinoids, and orexin are capable of enhancing liking in these hotspots. The hedonic hotspots located in the anterior OFC and posterior insula have been demonstrated to respond to orexin and opioids, as has the overlapping hedonic coldspot in the anterior insula and posterior OFC. On the other hand, the parabrachial nucleus hotspot has only been demonstrated to respond to benzodiazepine receptor agonists.
Hedonic hotspots are functionally linked, in that activation of one hotspot results in the recruitment of the others, as indexed by the induced expression of c-Fos, an immediate early gene. Furthermore, inhibition of one hotspot results in the blunting of the effects of activating another hotspot. Therefore, the simultaneous activation of every hedonic hotspot within the reward system is believed to be necessary for generating the sensation of an intense euphoria.

Wanting and liking

is the "wanting" or "desire" attribute, which includes a motivational component, that is assigned to a rewarding stimulus by the nucleus accumbens shell. The degree of dopamine neurotransmission into the NAcc shell from the mesolimbic pathway is highly correlated with the magnitude of incentive salience for rewarding stimuli.
Activation of the dorsorostral region of the nucleus accumbens correlates with increases in wanting without concurrent increases in liking. However, dopaminergic neurotransmission into the nucleus accumbens shell is responsible not only for appetitive motivational salience towards rewarding stimuli, but also for aversive motivational salience, which directs behavior away from undesirable stimuli. In the dorsal striatum, activation of D1 expressing MSNs produces appetitive incentive salience, while activation of D2 expressing MSNs produces aversion. In the NAcc, such a dichotomy is not as clear cut, and activation of both D1 and D2 MSNs is sufficient to enhance motivation, likely via disinhibiting the VTA through inhibiting the ventral pallidum.
Robinson and Berridge's 1993 incentive-sensitization theory proposed that reward contains separable psychological components: wanting and liking. To explain increasing contact with a certain stimulus such as chocolate, there are two independent factors at work – our desire to have the chocolate and the pleasure effect of the chocolate. According to Robinson and Berridge, wanting and liking are two aspects of the same process, so rewards are usually wanted and liked to the same degree. However, wanting and liking also change independently under certain circumstances. For example, rats that do not eat after receiving dopamine act as though they still like food. In another example, activated self-stimulation electrodes in the lateral hypothalamus of rats increase appetite, but also cause more adverse reactions to tastes such as sugar and salt; apparently, the stimulation increases wanting but not liking. Such results demonstrate that our reward system includes independent processes of wanting and liking. The wanting component is thought to be controlled by dopaminergic pathways, whereas the liking component is thought to be controlled by opiate-benzodiazepine systems.

Learning

Rewarding stimuli can drive learning in both the form of classical conditioning and operant conditioning. In classical conditioning, a reward can act as an unconditioned stimulus that, when associated with the conditioned stimulus, causes the conditioned stimulus to elicit both musculoskeletal and vegetative responses. In operant conditioning, a reward may act as a reinforcer in that it increases or supports actions that lead to itself. Learned behaviors may or may not be sensitive to the value of the outcomes they lead to; behaviors that are sensitive to the contingency of an outcome on the performance of an action as well as the outcome value are goal-directed, while elicited actions that are insensitive to contingency or value are called habits. This distinction is thought to reflected two forms of learning, model free and model based. Model free learning involves the simple caching and updating of values. In contrast, model based learning involves the storage and construction of an internal model of events that allows inference and flexible prediction. Although pavlovian conditioning is generally assumed to be model-free, the incentive salience assigned to a conditioned stimulus is flexible with regard to changes in internal motivational states.
Distinct neural systems are responsible for learning associations between stimuli and outcomes, actions and outcomes, and stimuli and responses. Although classical conditioning is not limited to the reward system, the enhancement of instrumental performance by stimuli requires the nucleus accumbens. Habitual and goal directed instrumental learning are dependent upon the lateral striatum and the medial striatum, respectively.
During instrumental learning, opposing changes in the ratio of AMPA to NMDA receptors and phosphorylated ERK occurs in the D1-type and D2-type MSNs that constitute the direct and indirect pathways, respectively. These changes in synaptic plasticity and the accompanying learning is dependent upon activation of striatal D1 and NMDA receptors. The intracellular cascade activated by D1 receptors involves the recruitment of protein kinase A, and through resulting phosphorylation of DARPP-32, the inhibition of phosphatases that deactivate ERK. NMDA receptors activate ERK through a different but interrelated Ras-Raf-MEK-ERK pathway. Alone NMDA mediated activation of ERK is self-limited, as NMDA activation also inhibits PKA mediated inhibition of ERK deactivating phosphatases. However, when D1 and NMDA cascades are co-activated, they work synergistically, and the resultant activation of ERK regulates synaptic plasticity in the form of spine restructuring, transport of AMPA receptors, regulation of CREB, and increasing cellular excitability via inhibiting Kv4.2

Disorders

Addiction

– a gene transcription factoroverexpression in the D1-type medium spiny neurons of the nucleus accumbens is the crucial common factor among virtually all forms of addiction that induces addiction-related behavior and neural plasticity. In particular, ΔFosB promotes self-administration, reward sensitization, and reward cross-sensitization effects among specific addictive drugs and behaviors. Certain epigenetic modifications of histone protein tails in specific regions of the brain are also known to play a crucial role in the molecular basis of addictions.
Addictive drugs and behaviors are rewarding and reinforcing due to their effects on the dopamine reward pathway.
The lateral hypothalamus and medial forebrain bundle has been the most-frequently-studied brain-stimulation reward site, particularly in studies of the effects of drugs on brain stimulation reward. The neurotransmitter system that has been most-clearly identified with the habit-forming actions of drugs-of-abuse is the mesolimbic dopamine system, with its efferent targets in the nucleus accumbens and its local GABAergic. The reward-relevant actions of amphetamine and cocaine are in the dopaminergic synapses of the nucleus accumbens and perhaps the medial prefrontal cortex. Rats also learn to lever-press for cocaine injections into the medial prefrontal cortex, which works by increasing dopamine turnover in the nucleus accumbens. Nicotine infused directly into the nucleus accumbens also enhances local dopamine release, presumably by a presynaptic action on the dopaminergic terminals of this region. Nicotinic receptors localize to dopaminergic cell bodies and local nicotine injections increase dopaminergic cell firing that is critical for nicotinic reward. Some additional habit-forming drugs are also likely to decrease the output of medium spiny neurons as a consequence, despite activating dopaminergic projections. For opiates, the lowest-threshold site for reward effects involves actions on GABAergic neurons in the ventral tegmental area, a secondary site of opiate-rewarding actions on medium spiny output neurons of the nucleus accumbens. Thus the following form the core of currently characterised drug-reward circuitry; GABAergic s to the mesolimbic dopamine neurons, the mesolimbic dopamine neurons themselves, and GABAergic efferents to the mesolimbic dopamine neurons.

Motivation

Dysfunctional motivational salience appears in a number of psychiatric symptoms and disorders. Anhedonia, traditionally defined as a reduced capacity to feel pleasure, has been reexamined as reflecting blunted incentive salience, as most anhedonic populations exhibit intact “liking”. On the other end of the spectrum, heightened incentive salience that is narrowed for specific stimuli is characteristic of behavioral and drug addictions. In the case of fear or paranoia, dysfunction may lie in elevated aversive salience.
Neuroimaging studies across diagnoses associated with anhedonia have reported reduced activity in the OFC and ventral striatum. One meta analysis reported anhedonia was associated with reduced neural response to reward anticipation in the caudate nucleus, putamen, nucleus accumbens and medial prefrontal cortex.

Mood disorders

Certain types of depression are associated with reduced motivation, as assessed by willingness to expend effort for reward. These abnormalities have been tentatively linked to reduced activity in areas of the striatum, and while dopaminergic abnormalities are hypothesized to play a role, most studies probing dopamine function in depression have reported inconsistent results. Although postmortem and neuroimaging studies have found abnormalities in numerous regions of the reward system, few findings are consistently replicated. Some studies have reported reduced NAcc, hippocampus, medial prefrontal cortex, and orbitofrontal cortex activity, as well as elevated basolateral amygdala and subgenual cingulate cortex activity during tasks related to reward or positive stimuli. These neuroimaging abnormalities are complemented by little post mortem research, but what little research has been done suggests reduced excitatory synapses in the mPFC. Reduced activity in the mPFC during reward related tasks appears to be localized to more dorsal regions, while the more ventral sgACC is hyperactive in depression.
Attempts to investigate underlying neural circuitry in animal models has also yielded conflicting results. Two paradigms are commonly used to simulate depression, chronic social defeat, and chronic mild stress, although many exist. CSDS produces reduced preference for sucrose, reduced social interactions, and increased immobility in the forced swim test. CMS similarly reduces sucrose preference, and behavioral despair as assessed by tail suspension and forced swim tests. Animals susceptible to CSDS exhibit increased phasic VTA firing, and inhibition of VTA-NAcc projections attenuates behavioral deficits induced by CSDS. However, inhibition of VTA- projections exacerbates social withdrawal. On the other hand, CMS associated reductions in sucrose preference and immobility were attenuated and exacerbated by VTA excitation and inhibition, respectively. Although these differences may be attributable to different stimulation protocols or poor translational paradigms, variable results may also lie in the heterogenous functionality of reward related regions.
Optogenetic stimulation of the mPFC as a whole produces antidepressant effects. This effect appears localized to the rodent homologue of the pgACC, as stimulation of the rodent homologue of the sgACC produces no behavioral effects. Furthermore, deep brain stimulation in the infralimbic cortex, which is thought to have an inhibitory effect, also produces an antidepressant effect. This finding is congruent with the observation that pharmacological inhibition of the infralimbic cortex attenuates depressive behaviors.

Schizophrenia

is associated with deficits in motivation, commonly grouped under other negative symptoms such as reduced spontaneous speech. The experience of “liking” is frequently reported to be intact, both behaviorally and neurally, although results may be specific to certain stimuli, such as monetary rewards. Furthermore, implicit learning and simple reward related tasks are also intact in schizophrenia. Rather, deficits in the reward system present during reward related tasks that are cognitively complex. These deficits are associated with both abnormal striatal and OFC activity, as well as abnormalities in regions associated with cognitive functions such as the dorsolateral prefrontal cortex.

History

The first clue to the presence of a reward system in the brain came with an accident discovery by James Olds and Peter Milner in 1954. They discovered that rats would perform behaviors such as pressing a bar, to administer a brief burst of electrical stimulation to specific sites in their brains. This phenomenon is called intracranial self-stimulation or brain stimulation reward. Typically, rats will press a lever hundreds or thousands of times per hour to obtain this brain stimulation, stopping only when they are exhausted. While trying to teach rats how to solve problems and run mazes, stimulation of certain regions of the brain where the stimulation was found seemed to give pleasure to the animals. They tried the same thing with humans and the results were similar. The explanation to why animals engage in a behavior that has no value to the survival of either themselves or their species is that the brain stimulation is activating the system underlying reward.
In a fundamental discovery made in 1954, researchers James Olds and Peter Milner found that low-voltage electrical stimulation of certain regions of the brain of the rat acted as a reward in teaching the animals to run mazes and solve problems. It seemed that stimulation of those parts of the brain gave the animals pleasure, and in later work humans reported pleasurable sensations from such stimulation. When rats were tested in Skinner boxes where they could stimulate the reward system by pressing a lever, the rats pressed for hours. Research in the next two decades established that dopamine is one of the main chemicals aiding neural signaling in these regions, and dopamine was suggested to be the brain's "pleasure chemical".
Ivan Pavlov was a psychologist who used the reward system to study classical conditioning. Pavlov used the reward system by rewarding dogs with food after they had heard a bell or another stimulus. Pavlov was rewarding the dogs so that the dogs associated food, the reward, with the bell, the stimulus.
Edward L. Thorndike used the reward system to study operant conditioning. He began by putting cats in a puzzle box and placing food outside of the box so that the cat wanted to escape. The cats worked to get out of the puzzle box to get to the food. Although the cats ate the food after they escaped the box, Thorndike learned that the cats attempted to escape the box without the reward of food. Thorndike used the rewards of food and freedom to stimulate the reward system of the cats. Thorndike used this to see how the cats learned to escape the box.

Other species

Animals quickly learn to press a bar to obtain an injection of opiates directly into the midbrain tegmentum or the nucleus accumbens. The same animals do not work to obtain the opiates if the dopaminergic neurons of the mesolimbic pathway are inactivated. In this perspective, animals, like humans, engage in behaviors that increase dopamine release.
Kent Berridge, a researcher in affective neuroscience, found that sweet and bitter tastes produced distinct orofacial expressions, and these expressions were similarly displayed by human newborns, orangutans, and rats. This was evidence that pleasure has objective features and was essentially the same across various animal species. Most neuroscience studies have shown that the more dopamine released by the reward, the more effective the reward is. This is called the hedonic impact, which can be changed by the effort for the reward and the reward itself. Berridge discovered that blocking dopamine systems did not seem to change the positive reaction to something sweet. In other words, the hedonic impact did not change based on the amount of sugar. This discounted the conventional assumption that dopamine mediates pleasure. Even with more-intense dopamine alterations, the data seemed to remain constant. However, a clinical study from January 2019 that assessed the effect of a dopamine precursor, antagonist, and a placebo on reward responses to music – including the degree of pleasure experienced during musical chills, as measured by changes in electrodermal activity as well as subjective ratings – found that the manipulation of dopamine neurotransmission bidirectionally regulates pleasure cognition in human subjects. This research demonstrated that increased dopamine neurotransmission acts as a sine qua non condition for pleasurable hedonic reactions to music in humans.
Berridge developed the incentive salience hypothesis to address the wanting aspect of rewards. It explains the compulsive use of drugs by drug addicts even when the drug no longer produces euphoria, and the cravings experienced even after the individual has finished going through withdrawal. Some addicts respond to certain stimuli involving neural changes caused by drugs. This sensitization in the brain is similar to the effect of dopamine because wanting and liking reactions occur. Human and animal brains and behaviors experience similar changes regarding reward systems because these systems are so prominent.