Learning

After classical conditioning, an unrelated stimulus elicits a behavior that is normally exhibited in the presence of some other stimulus.

Pavlov's Dogs: Principles of Classical Conditioning

Classical conditioning is also called Pavlovian conditioning.

Pavlov (1849-1936) was a Nobel Prize-winning (1904) physiologist who studied digestion.

Here's how classical conditioning works:

Before conditioning: Some unlearned, natural stimulus (unconditioned stimulus-UCS) brings about an instinctive response (unconditioned response-UCR) while a neutral stimulus does not.

During conditioning: The neutral stimulus is presented with the UCS. The UCR follows the UCS.

After conditioning: The neutral stimulus becomes a conditioned stimulus (CS) which brings about the conditioned response or conditioned reflex (CR) (which is the same behavior as the UCR).

Unlearning

Extinction

Repeated presentation of the CS without the UCS seems to decrease the association with the CR.
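
One way to picture acquisition and extinction is as an "associative strength" that moves part of the way toward a target on every trial. The sketch below is only an illustration: the update rule (a simple linear learning rule, similar in spirit to later formal models of conditioning), the learning rate, and the name run_trials are assumptions, not part of Pavlov's account.

```python
def run_trials(strength, ucs_present, n_trials, rate=0.3):
    """Return the associative strength of the CS after each trial.

    strength    -- starting strength (0 = neutral, 1 = fully conditioned)
    ucs_present -- True if the UCS follows the CS on these trials
    rate        -- hypothetical learning rate (an assumption, not a measured value)
    """
    target = 1.0 if ucs_present else 0.0
    history = []
    for _ in range(n_trials):
        # Move a fixed fraction of the remaining distance toward the target.
        strength += rate * (target - strength)
        history.append(strength)
    return history


# Acquisition: the CS is paired with the UCS, so strength climbs toward 1.
acquisition = run_trials(0.0, ucs_present=True, n_trials=10)

# Extinction: the CS is presented alone, so strength decays toward 0.
extinction = run_trials(acquisition[-1], ucs_present=False, n_trials=10)
```

A rule this simple treats extinction as plain erasure, so it has no way to produce an extinguished CR that reappears after a rest.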

Spontaneous recovery

An extinguished CR may reappear after a rest period.

Factors Affecting Conditioning

Stimulus characteristics

Traditional classical conditioning theory holds that the nature of the neutral stimulus is unimportant.

Stimulus generalization

A stimulus similar to the original CS also elicits the CR.

My cats come running to the sound of Fritos, because they sound like potato chips (which they love).

Stimulus discrimination

A stimulus distinct from the CS does not elicit the CR.

The cats don't trouble their furry little heads over a box of raisin bran; it doesn't sound enough like chips.
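
Generalization and discrimination can be pictured as two ends of one curve: the more a test stimulus resembles the CS, the stronger the CR. The sketch below assumes a bell-shaped falloff and a made-up "similarity scale"; the function name, the width parameter, and the scores for the three packages are all hypothetical and chosen only for illustration.

```python
import math


def generalized_response(cs_value, test_value, cr_strength=1.0, width=1.0):
    """Predicted CR strength for a test stimulus.

    cs_value, test_value -- positions of the two stimuli on some
                            similarity scale (hypothetical units)
    width                -- how broadly the CR generalizes (assumed)
    """
    distance = test_value - cs_value
    # Bell-shaped falloff: full response at the CS, less as stimuli differ.
    return cr_strength * math.exp(-(distance ** 2) / (2 * width ** 2))


# Hypothetical "crinkliness" scores for three packages.
potato_chips, fritos, raisin_bran = 5.0, 5.5, 9.0
print(generalized_response(potato_chips, fritos))       # strong CR: generalization
print(generalized_response(potato_chips, raisin_bran))  # almost no CR: discrimination
```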

Timing

Conditioning is strongest when the CS is presented immediately before the UCS (usually less than a few seconds).

If the CS is presented after or at the same time as the UCS, there is little or no conditioning. It was traditionally believed that if the UCS is presented after too long a delay, conditioning does not occur.

Predictability

Conditioning is strongest when the CS is always followed by the UCS (ie, reliably predicts the UCS).

Signal Strength

Conditioning is faster and stronger when the UCS is stronger (ie, louder, brighter, more painful, etc).

Attention

A subject is more likely to become conditioned to a stimulus that they are paying attention to.

Second order conditioning

Once conditioned, a CS can serve as the UCS to another neutral stimulus.

Pavlov to Watson

In 1913 JB Watson (1878-1958) founded behaviorism with the article "Psychology as the Behaviorist Views It."

Psychology should be the scientific study of behavior.

This was a reaction against the study of subjective mental processes carried out in the early 1900's.

Behavior can be observed. "Let us limit ourselves to things that can be observed, and formulate laws concerning only those things" (1924).

Watson held that nearly all human attributes are learned, even personality and intelligence. In 1924, he wrote:

"Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one of them at random and train him to become any type of specialist I might select--doctor, lawyer…beggar man…thief."

Emotional Conditioning

In 1920, Watson conditioned an infant (~9 mos) named Albert B to fear a white rat; the fear generalized to other furry things, including Santa Claus' beard.

He paired a white rat with a loud noise.

Albert became conditioned to the CS after only 7 pairings.

The CR was never extinguished in poor Al (he moved away).

There are countless examples of emotional associations with particular stimuli.

I had a girlfriend in 1986 who eventually broke my heart. We spent hours listening to Bonnie Raitt's "Give It Up." If I hear any of the songs from that album now, I feel a pang of sadness.

This is a somewhat complex case of conditioning--what is the UCS?

Perhaps I continued to listen to the tape after she dumped me.

Advertisers take advantage of classical conditioning to have consumers associate their products with sex, comfort and affection.

Physiological Conditioning

The UCR (and CR) may be physiological responses (as was the dogs' salivation).

Some cancer patients who have received chemotherapy experience nausea and vomiting at the mere sight of a nurse.

The response may even be unconscious.

Chemotherapy recipients show decreased immune response before treatment.

Single Trial Learning

When I was about 6 years old, someone gave me a 2-lb Hershey bar, which I consumed in a single session with predictable results.

For years, even the smell of a Hershey bar made me queasy. (Interestingly, I had no problem eating Nestle's chocolate or otherwise overeating.)

This conditioning runs counter to classical principles:

There was only a single pairing.

The UCR followed the CS by at least an hour.

In the 1960's, John Garcia (b 1917) found that an induced taste aversion could be conditioned in rats with a single presentation, followed by a delayed (drug-induced) gastrointestinal upset.

The rats did not associate the taste with external stimuli such as an electric shock.

This finding also contradicts Pavlov and Watson, who believed that one stimulus was as good as another.

Garcia and others have found that an internal stimulus such as taste is more easily associated with internal responses such as nausea.

Likewise, external stimuli such as lights and noises are more readily associated with externally painful stimuli such as electric shock.

This ability to become conditioned to an appropriate stimulus is called biopreparedness.

There may be evolutionary benefit to associating some stimuli more easily than others.

Taste aversion has been used for pest control (coyotes).

Habituation

Response to a stimulus often takes a different route than that of Pavlovian conditioning.

A stimulus which is repeated over and over without changing starts to have less of an effect on an organism.

You notice things less after a short while. For example, immediately after you put your shoes on, you notice that you are not barefoot. But right now you are probably unaware of the pressure of your shoes on your feet.

You will notice it more if the pressure is uncomfortable, but even then you will get used to it to a large degree.

This decreased effect of a repeated stimulus is called habituation.

Solomon proposed an opponent process theory to explain habituation.

The stimulus triggers an unlearned response he called the A-Process.

The A-Process triggers a reaction in the opposite direction, the B-Process.

With repeated exposure to the stimulus, the A-Process does not change, but the B-Process becomes faster and stronger, partially cancelling the effects of the A-Process.
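
The interplay of the two processes can be sketched numerically. In the sketch below the A-Process is held constant and the B-Process grows with each exposure toward a ceiling, so the net reaction shrinks but never vanishes. The particular growth curve, the parameter values, and the name net_response are assumptions made for illustration; the theory as summarized above does not specify them.

```python
def net_response(exposure, a_strength=1.0, b_max=0.8, b_growth=0.5):
    """Net reaction to the stimulus on a given exposure (counting from 0).

    a_strength -- size of the unlearned A-Process (assumed constant)
    b_max      -- ceiling for the opposing B-Process (assumed below a_strength)
    b_growth   -- fraction of the remaining gap the B-Process closes per exposure
    """
    # A-Process: does not change with repetition.
    a = a_strength
    # B-Process: absent at first, then strengthens toward b_max.
    b = b_max * (1 - (1 - b_growth) ** exposure)
    # The B-Process only partially cancels the A-Process.
    return a - b


for n in range(5):
    print(n, round(net_response(n), 3))
```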

Phobias

A phobia is an extreme (even debilitating) irrational fear of an animal, object or situation.

The frightening stimulus may be slightly threatening (or not) but the fearful response is way out of proportion to the threat.

Little Albert was conditioned to a rat phobia.

Certain phobias are much more common than others, suggesting an innate quality.

Martin Seligman noticed that we are more likely to have a phobic aversion to spiders than to highway driving even though many more people are killed by auto accidents than by black widows.

There may be biological preparedness for certain aversions.

Some stimulus/response combinations are more easily associated by an organism.

This connects with Garcia's taste aversions.

Birds that select their food by sight (and have little or no sense of smell and taste) can be conditioned to develop an aversion to colored water.

Cognitive View of Classical Conditioning

Cognitive psychologists are interested in the mental processes that accompany learning (and other behavior).

Reliable vs unreliable signals

The CS is not merely associated with the UCS; it is also interpreted as information.

For conditioning to occur, the CS must be judged to be a reliable signal (that the UCS will be presented).

Pavlov suggested that the CS has predictive value: the dogs salivated because the bell reliably predicted the presentation of food.

Robert Rescorla (b 1940) studied conditioning in rats and concluded that classical conditioning is a mechanism that animals use in learning about the relationships between events in the world.

This is different from Watson's mechanical view.
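
One way to put a number on "reliable signal" is to compare how often the UCS arrives after the CS with how often it arrives without the CS. The measure below is an illustrative assumption (the notes do not say how reliability should be computed), and the function name and trial counts are hypothetical.

```python
def contingency(cs_and_ucs, cs_only, ucs_only, neither):
    """Difference between P(UCS | CS) and P(UCS | no CS).

    Each argument is a count of trials (or time slices) of that kind.
    Assumes at least one trial with the CS and one without it.
    """
    p_ucs_given_cs = cs_and_ucs / (cs_and_ucs + cs_only)
    p_ucs_given_no_cs = ucs_only / (ucs_only + neither)
    return p_ucs_given_cs - p_ucs_given_no_cs


# The bell is always followed by food, and food never comes without it.
print(contingency(10, 0, 0, 10))

# Food is just as likely with or without the bell: the bell predicts nothing.
print(contingency(5, 5, 5, 5))
```

A value near 1 means the CS is a reliable signal; a value near 0 means the CS carries no information about the UCS, even if the two are often paired.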

Operant Conditioning

Operant, or instrumental, conditioning affects the rate at which a non-reflexive or voluntary behavior is exhibited.

Behavior is formed and sustained by its results.

Thorndike's Law of Effect

Responses (behaviors) that are followed by a satisfying stimulus are strengthened, or increased in frequency. Responses followed by an unpleasant stimulus are weakened, or decreased in frequency.

Thorndike (1874-1949) studied cats in puzzle boxes.

A cat would be trapped in a box that would open if a lever was moved or a loop was pulled, or some such device was activated.

The cats would try various behaviors to get out and eventually would inadvertently activate the escape mechanism.

On repeated trials, the cats would take less time to spring the door (and earn a treat).

Thorndike believed that the cats learned to solve the puzzle by trial and error, attempting various behaviors and associating the successful ones with a reward.

Thorndike called this learning instrumental conditioning.
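
The Law of Effect can be sketched as a bookkeeping rule: each behavior has a weight, and the weight goes up when the behavior is followed by a satisfying outcome and down when it is followed by an unpleasant one. Everything specific here (the weights, the step size, the behavior names, and both function names) is a hypothetical illustration, not Thorndike's own formulation.

```python
def law_of_effect(weights, behavior, satisfying, step=0.5):
    """Return new response weights after one trial.

    weights    -- dict mapping each behavior to a non-negative weight
    behavior   -- the behavior the animal just tried
    satisfying -- True if the outcome was satisfying, False if unpleasant
    """
    updated = dict(weights)
    if satisfying:
        updated[behavior] += step  # strengthened
    else:
        updated[behavior] = max(0.0, updated[behavior] - step)  # weakened
    return updated


def response_probabilities(weights):
    """Chance of each behavior, proportional to its weight.

    Assumes at least one weight is greater than zero.
    """
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# A cat in a puzzle box starts with no preference among three behaviors.
cat = {"scratch walls": 1.0, "meow": 1.0, "pull loop": 1.0}

# Pulling the loop opens the door (and earns a treat) on four trials.
for _ in range(4):
    cat = law_of_effect(cat, "pull loop", satisfying=True)

print(response_probabilities(cat))
```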

BF Skinner (1904-1990)

Skinner was a behaviorist.

He sought the natural laws of behavior.

Psychology should study only observable behavior.

The explanation of a behavior can be found in the environment.

Skinner admitted that internal states (thoughts, feelings, beliefs, etc) exist, but held that they cannot be observed and cannot be used to explain behavior.

Eg, you do not have difficulty with public speaking because you are afraid. Rather, you speak with difficulty and you have a racing pulse, sweaty palms, etc, because you have been conditioned to have both responses.

Skinner coined the term operant conditioning, the process of changing the frequency of a behavior by altering the consequences of that behavior.

Technically there is a difference between instrumental and operant conditioning:

In instrumental conditioning, a situation is devised by an experimenter and the time to response is the dependent variable.

In operant conditioning, the subject is free to respond ad lib and the rate of responding is the dependent variable.

An operant is a spontaneously emitted behavior (ie, voluntary) that operates on the organism's surroundings to produce certain consequences.

Much of the terminology and laws of operant conditioning were worked out by Skinner.

Skinner invented the operant chamber, better known as the Skinner box.

A Skinner box is a small cage with a food dispenser.

In a Skinner box, a rat or a pigeon (or a person for that matter) can be conditioned to press a lever or peck at a spot in response to some discriminative stimulus such as a light or a tone.

Reinforcement Increases Response

Reinforcement occurs when the consequences of a behavior increase the frequency of the behavior.

The behavior is called an operant.

The stimulus is called a reinforcing stimulus or reinforcer.

Positive and negative reinforcement:

Positive reinforcement is the presentation of a reinforcing stimulus following an operant thereby increasing the frequency of the operant.

We are tempted to call positively reinforcing stimuli "pleasant," or "desirable."

Skinner would not approve of the use of such subjective descriptions.

Not all positive reinforcers are pleasant (but all increase frequency of behavior).

Negative reinforcement is the withdrawal of a stimulus following an operant, thereby increasing the frequency of the operant.

NEGATIVE REINFORCEMENT IS NOT PUNISHMENT (though you could think of it as the removal of punishment).

We are tempted to call negative reinforcers "unpleasant," or "distressing."

Behaviorists use the term "aversive."

Negative reinforcement teaches us to escape or avoid distress.

Escape conditioning is straight negative reinforcement (you turn off the TV when an ad comes on).

Avoidance conditioning has an element of classical conditioning: A neutral stimulus becomes a signal that an aversive stimulus is coming (you turn off the TV when the announcer says "We'll be right back").

Avoidance conditioning is hard to extinguish even after the original painful stimulus is gone because there is an element of self-fulfilling prophecy: The avoiding behavior is successful in reducing anxiety.

Punishment Decreases Response

Punishment occurs when the consequences of a behavior decrease the frequency of the behavior.

In technical terms, punishment has only occurred when a behavior is diminished (thus jail time may not serve as punishment to a repeat offender).

Punishment by application vs punishment by removal (penalty)

Punishment by application is the presentation of an aversive stimulus following an operant to decrease the occurrence of the operant.

Punishment by removal is the withdrawal of a positive stimulus to decrease the occurrence of an operant.
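
The four terms defined so far form a two-by-two table: was a stimulus presented or withdrawn, and did the behavior become more or less frequent? The lookup below simply restates those definitions; the function name and the argument labels are hypothetical.

```python
def classify_consequence(stimulus_change, behavior_change):
    """Name the operant procedure.

    stimulus_change -- "presented" or "withdrawn" (what followed the operant)
    behavior_change -- "increases" or "decreases" (effect on its frequency)
    """
    table = {
        ("presented", "increases"): "positive reinforcement",
        ("withdrawn", "increases"): "negative reinforcement",
        ("presented", "decreases"): "punishment by application",
        ("withdrawn", "decreases"): "punishment by removal (penalty)",
    }
    return table[(stimulus_change, behavior_change)]


# Turning off the TV removes the ad, and you turn it off more often.
print(classify_consequence("withdrawn", "increases"))
```

Note that the classification depends on what happens to the behavior, not on whether the stimulus seems pleasant.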

Aversive stimuli often do not work well in decreasing an undesirable behavior (ie, they are not effective punishments).

Punishment should be applied immediately and consistently for best results.

Punished behavior often returns when the aversive consequences are no longer presented or even when the controlling stimulus is absent (ie, the offender thinks he can get away with it).

Punishment does not provide an alternative (desirable) behavior.

Severe aversive stimuli may reduce the frequency of all behavior (producing passivity or timidity).

Primary and Secondary Reinforcers

A primary reinforcer is naturally reinforcing.

Food, water, warmth, sex, etc, are reinforcers for humans and most other animals.

A secondary, or conditioned, reinforcer gains effectiveness by association with other reinforcers.

Money is the most familiar example.

Discriminative Stimuli

The environmental cues that accompany conditions under which a behavior is reinforced are called discriminative stimuli (eg, the light in the Skinner box).

Stimulus discrimination occurs when an organism learns which stimuli signal an opportunity for reinforcement (or punishment).

When animals learn that a stimulus signals the potential consequences of a response, we say that the response is under stimulus control.

Stimulus discrimination occurs fastest for stimuli that indicate that a behavior is appropriate (reinforceable) and slowest for stimuli that indicate that a behavior is inappropriate.

As in classical conditioning, stimulus generalization occurs in operant conditioning.

Skinner believed that the antecedents (discriminative stimuli) and consequences (reinforcement or punishment) of behavior determine human actions.

Thus, changing the environment and the pattern of reinforcement can change behavior.

Shaping

A given behavior, even if quite complex, can be brought about by a process called shaping.

Successive approximations of some goal behavior are reinforced.

First a food pellet is given for turning toward the lever; then for moving near the lever; then for touching the lever; then for pressing the lever (bingo).

Without shaping, the goal behavior might not be spontaneously emitted with a high enough frequency to permit reinforcement.
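
The lever-pressing example can be sketched as a criterion that is raised each time it is met. The code below is a simplified illustration: it assumes the animal's behavior arrives as a list of labels and that only the current approximation earns a pellet. The names STEPS and shape are hypothetical.

```python
# Successive approximations, from crudest to the goal behavior.
STEPS = ["turn toward lever", "move near lever", "touch lever", "press lever"]


def shape(observed, steps=STEPS):
    """Return a list of True/False: was each observed behavior reinforced?"""
    criterion = 0  # index of the approximation currently being reinforced
    reinforced = []
    for behavior in observed:
        hit = behavior == steps[criterion]
        reinforced.append(hit)
        # Once the current approximation is emitted, demand the next one.
        # The final step (the goal) keeps being reinforced.
        if hit and criterion < len(steps) - 1:
            criterion += 1
    return reinforced


session = [
    "turn toward lever",
    "turn toward lever",  # no longer enough: the criterion has moved on
    "move near lever",
    "groom",
    "touch lever",
    "press lever",
    "press lever",
]
print(shape(session))
```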

Skinner taught pigeons to play the piano using shaping.

Schedules of Reinforcement

Continuous vs partial reinforcement

The presentation of a reinforcer every time an operant is emitted very efficiently strengthens that behavior.

The rat gets a food pellet for every lever press.

Partial reinforcement occurs when the reinforcer is only presented after some of the instances of the behavior.

Extinction

A learned response that is no longer reinforced tends to disappear.

Partially reinforced behaviors are more resistant to extinction than continuously reinforced behaviors.

The rate of responding is often higher for partially reinforced behaviors.

Different patterns of reinforcement produce different rates of responding:
Fixed-Ratio Schedule (FR): the reinforcer is presented after a fixed number of responses have occurred. Rats show burst-and-pause behavior when paid for piece work.
Variable-Ratio Schedule (VR): the reinforcer is presented after a number of responses that varies from one reinforcer to the next. Rats and humans alike will become addicted to slot machines.
Fixed-Interval Schedule (FI): the reinforcer is presented for the first response after a definite period of time has passed. Rats and humans work differently for a paycheck than for piece work.
Variable-Interval Schedule (VI): the reinforcer is presented for the first response after a period of time (which varies) has passed. This produces a moderate but steady rate of responding.
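
The four schedules differ only in the rule that decides whether a given response earns a reinforcer. The sketch below expresses each rule as a small function; it is an illustration under assumptions, not a model of any particular apparatus. The function names, the way the "varying" amounts are drawn (uniformly around a mean), and the use of plain numbers for time are all hypothetical choices.

```python
import random


def fixed_ratio(n):
    """FR: reinforce every n-th response."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """VR: reinforce after a varying number of responses averaging mean_n."""
    required = rng.randint(1, 2 * mean_n - 1)
    count = 0

    def respond():
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """FI: reinforce the first response at least `seconds` after the last reinforcer."""
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """VI: like FI, but the required wait varies around mean_seconds."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond


# A rat on FR-3 is reinforced on every third lever press.
press = fixed_ratio(3)
print([press() for _ in range(6)])
```

On the ratio schedules, responding faster earns reinforcers faster; on the interval schedules it does not, since only the first response after the wait counts.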

Why do reinforcers work?

They may fulfill a biological need (or not).

Certain reinforcers are hard-wired into the brain.

They may follow the Premack principle, ie, we each have a hierarchy of preferred behaviors. Each behavior above can reinforce one below.

They may reduce our disequilibrium. Any behavior that has been restricted can become reinforcing.
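
The Premack principle mentioned above amounts to a comparison of ranks in a person's hierarchy of preferred behaviors. The sketch below is a minimal illustration; the function name and the example hierarchy are hypothetical.

```python
def can_reinforce(preferences, reward_activity, target_activity):
    """True if reward_activity ranks above target_activity.

    preferences -- activities ordered from most to least preferred
    """
    return preferences.index(reward_activity) < preferences.index(target_activity)


# One (hypothetical) child's hierarchy, most preferred first.
hierarchy = ["video games", "playing outside", "reading", "homework"]

print(can_reinforce(hierarchy, "video games", "homework"))  # games can reward homework
print(can_reinforce(hierarchy, "homework", "reading"))      # homework cannot reward reading
```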

Applied Behavioral Conditioning

Strategies to Change Behavior (Skinner's Alternatives to Punishment)

Reinforce an Incompatible Behavior -- choose an alternative that is constructive and incompatible with the problem behavior.

Reinforce sharing (rather than hoarding).

Stop Reinforcing the Problem Behavior

Don't smile politely at some jerk who wants to waste your time with gossip.

Reinforce the Nonoccurrence of the Problem Behavior

Give a cookie to the child who sits quietly (assuming you want a quiet sitter).

Remove the Opportunity to Obtain Positive Reinforcement

Remove the troublemaker from the attention-giving environment ("time out").

Strategies to improve self-control

Precommitment--make an advance commitment to a long-term goal.
Self-Reinforcement--give yourself rewards for staying on track.
Stimulus Control--put yourself in situations where you can work toward your goal; stay out of ones that tempt you away.
Focus on the Delayed Reinforcer--visualize the eventual attainment of your goal.
Observe Good Role Models--stick with others who have achieved the goal you are seeking.

The most effective positive reinforcement:

uses strong reinforcers
is immediate
is always presented at first, then is gradually decreased in frequency
has variety
can employ the Premack principle--a more preferred activity can be used to reinforce a less preferred activity
encourages self-reinforcement

Observational Learning

Learning by watching others is called observational learning.

This process is believed to be active rather than passive.

Albert Bandura (b 1925) asserts that much of human behavior is acquired through observational learning.

In the 1960's, Bandura showed children three versions of a film in which an adult attacked a Bobo doll (an inflatable clown with sand in the base).

The aggressive behavior had three outcomes:

treats

scolding and spanking

no consequences

The children were allowed to play spontaneously with a Bobo doll.

The ones who observed the spanking were less aggressive with Bobo.

When encouraged to show the experimenter all of the things that the adults did, all three groups of kids were equally willing to demonstrate the kicking, pushing, etc.

The expectation of reinforcement or punishment affects the likelihood of imitated behaviors.

Bandura formulated four cognitive processes that interact to bring about observationally learned behavior:

Attention: you must actively observe the behavior in others.
Memory: you must remember the behavior sufficiently to reproduce it.
Ability: you must have the capability of performing the behavior.
Motivation: you must have some reason to perform the behavior.

Bandura and others have summarized the factors that make it more likely that we will imitate the behavior of another:

The behavior is rewarded in others.
You have been rewarded for the behavior.
The behavior is shown by nurturing people.
The behavior is shown by those with power.
The people are similar to us.
The people are perceived to have higher social status.
The behavior is not too easy or too hard.
You lack an alternative behavior.
You are in an unfamiliar situation.

Cognitive Factors in Learning

You might think that cognitive factors in learning have received considerable attention from psychologists, but this is not so. Psychological theories of learning have been dominated by behaviorists until recently.

Tolman's Cognitive Maps

Three groups of rats were placed in a maze.

The first group was reinforced for successful runs and steadily decreased their error rate.

The second group was never reinforced for successful runs and continued to make a large number of errors.

The third group was not initially reinforced and made errors just like group two. But, as soon as these rats were reinforced, their error rate dropped.

The sudden improvement was interpreted to support the existence of a cognitive map, or mental representation of the maze. This cognitive map was learned without reinforcement.

Kohler's Chimps

On the island of Tenerife during World War I, Kohler studied learning in chimps. He would pose a problem, such as a banana hanging on a string just out of reach.

These creatures showed cognitive learning:

A successful approach was immediately repeated in full (no shaping was necessary).

An unsuccessful approach was rarely tried again.

Solutions were often arrived at suddenly after a chimp seemed to sit and ponder over the problem.

These chimps were thought to have insight into the problem.

The solution may have been the result of mental trial-and-error (also a cognitive process).

Anthony G Benoit abenoit@trcc.commnet.edu
(860) 885-2386
