Dog Training Methods: Positive Reinforcement
When It Comes to Learning, Are Dogs Really No Different From Lab Rats?
A Brief History of Positive Reinforcement
Most positive trainers advertise their methods as being based on “the science of how animals learn,” referring to a scientific discipline known alternately as learning theory, behavioral science, or behavior analysis, which is based primarily on Ivan Pavlov's work on salivation in dogs (1902), Edward Thorndike's "law" of effect (1905), and B. F. Skinner's "operant conditioning" (1938).
Pavlov was a gastroenterologist, doing research on salivation in dogs, when he read a paper by Sigmund Freud about neurotic behavior in humans. Pavlov wondered if it were possible for animals, dogs in particular, to exhibit neurotic behaviors as well.
In his most famous study, Pavlov noticed that any object or event the dogs associated with food seemed to make them salivate. To test this, he used a neutral auditory stimulus, either from a metronome, a harmonium, or a buzzer (he never actually used a bell). The dogs were kept standing in place, restrained in harnesses that kept them from sitting or lying down, unable to relax. Each dog had a tube strapped to its mouth to collect saliva for measurement.
Whenever Pavlov "fed" the dogs (he actually sprayed meat powder into their mouths), he also provided the auditory stimulus at the exact same time. After a number of repetitions, over several days or longer, he again provided the auditory stimulus, but this time he didn't spray the meat powder into their mouths. The dogs salivated anyway, having come to associate the sound with the "food."
A few years later Edward Thorndike began experimenting with cats. He fasted the cats until they were approximately two-thirds of their normal body weight, then placed them inside puzzle boxes designed so that a cat couldn't get out unless it accidentally triggered a mechanism inside. Once a cat triggered the mechanism and escaped, it was put back in the box to see how long it took to escape again.
Thorndike found it took numerous repetitions before a cat learned to trigger the mechanism on its own. He called this "the law of effect," which states that behaviors followed by pleasurable outcomes tend to be repeated while those followed by unpleasant outcomes tend to be extinguished. (Of course, for the cats the pleasurable outcome was escaping from an unpleasant situation, which, oddly enough, is Sigmund Freud's famous definition of pleasure.)
B. F. Skinner, who expanded on Thorndike's work, believed it wasn't necessary to bring feeling states like "pleasure" into explanations of learning. He felt all we needed to know was how often a new behavior was repeated and whether it could be paired, statistically speaking, with a prior stimulus.
Skinner began his experiments in the mid-1930s. Like Thorndike, he fasted animals (rats and pigeons) to two-thirds of their normal body weight and put them inside a puzzle box, but of a different kind. Instead of an escape mechanism it contained a lever and a light. When the animal learned to press the lever when the light came on, a morsel of food was dropped into the box. Skinner called this "operant conditioning," because the animals were supposedly "operating on their environment."
Skinner said there were four quadrants of learning: positive and negative reinforcement, and positive and negative punishment. Positive and negative reinforcement increase the likelihood that a behavior will be repeated (not necessarily learned), while positive and negative punishment tend to decrease that likelihood, extinguishing the behavior. Remember, since Skinner didn't believe it was necessary to know how an animal is feeling, positive reinforcement has nothing to do with whether or not the animal is having a positive experience. The word "positive" only denotes that something has been added, while "negative" denotes that something has been subtracted or removed.
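Skinner's 2x2 scheme is easy to misread precisely because "positive" and "negative" sound like value judgments. Here's a minimal sketch of the terminology (Python used purely for illustration; the function and its names are mine, not anything from behavior analysis):

```python
# Skinner's four "quadrants": "positive"/"negative" describe whether a
# stimulus is added or removed; "reinforcement"/"punishment" describe
# whether the behavior becomes more or less likely. Nothing here refers
# to how the animal feels.

def quadrant(stimulus_added: bool, behavior_increases: bool) -> str:
    """Name the operant-conditioning quadrant for a given outcome."""
    sign = "positive" if stimulus_added else "negative"
    effect = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {effect}"

# A treat is added and sitting becomes more frequent:
print(quadrant(stimulus_added=True, behavior_increases=True))    # positive reinforcement
# Leash pressure is removed and walking at heel becomes more frequent:
print(quadrant(stimulus_added=False, behavior_increases=True))   # negative reinforcement
# A scolding is added and jumping up becomes less frequent:
print(quadrant(stimulus_added=True, behavior_increases=False))   # positive punishment
# A toy is taken away and biting becomes less frequent:
print(quadrant(stimulus_added=False, behavior_increases=False))  # negative punishment
```

Note that the treat-giving example and the scolding example are both "positive" in Skinner's sense, which is exactly the point.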
So when "positive trainers" say they use only "positive" methods, it actually has nothing to do with whether or not your dog is having a positive experience when she's being trained. It's also important to note that, just as with Pavlov's dogs and Thorndike's cats, Skinner's form of conditioning, which relied heavily on statistical analysis, took numerous repetitions. What's more, Skinner had to add some fairly complicated variations to the conditioning process in order for the learning to hold.
Unnecessarily Complicated
There's a vast difference between how dogs learn new behaviors in a controlled setting, such as a laboratory, and how they learn in a puppy class. We don't strap our puppies into harnesses and put tubes into their mouths to measure their saliva. Nor do we place them inside puzzle boxes or operant conditioning chambers. Still, the idea that animals learn through repetition seems inescapable.
But is it really?
No. Modern researchers have found that learning takes place through a process called pattern recognition, which is why positive reinforcement only works (that is, only has a genuine, lasting effect) when you change the pattern of reward. But knowing how to do that pretty much requires a college degree, and lots and lots of time. Here are just a few of the contingencies used by behavioral scientists:
* fixed ratio
* continuous reinforcement
* fixed interval
* variable interval
* variable ratio
* differential reinforcement of incompatible behavior
* differential reinforcement of other behavior
* differential reinforcement of low response rate
* differential reinforcement of high response rate

...and so on, and so on.
And these are considered “simple” reinforcement schedules!
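To get a feel for why these schedules matter, here's a minimal sketch contrasting two of the "simple" ones, a fixed-ratio and a variable-ratio schedule (illustrative code only; the function names and the interval-drawing rule are my own simplifications, not a behavioral scientist's protocol):

```python
import random

def fixed_ratio(n, responses):
    """FR-n schedule: reward every n-th response. Fully predictable."""
    return [(i + 1) % n == 0 for i in range(responses)]

def variable_ratio(mean_n, responses, rng=None):
    """VR-n schedule: reward after a varying number of responses
    averaging mean_n.

    This is the slot-machine schedule: the subject can never predict
    which response will pay off, which is what makes the behavior so
    resistant to extinction.
    """
    rng = rng or random.Random()
    rewards = []
    until_next = rng.randint(1, 2 * mean_n - 1)  # next payoff distance
    for _ in range(responses):
        until_next -= 1
        rewards.append(until_next == 0)
        if until_next == 0:
            until_next = rng.randint(1, 2 * mean_n - 1)
    return rewards

print(fixed_ratio(3, 9))       # rewards land on responses 3, 6, and 9
print(variable_ratio(3, 9))    # rewards land unpredictably
```

Both schedules hand out roughly the same number of treats per response; only the *pattern* differs, and it's the unpredictable pattern that grips behavior.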
Dr. Ian Dunbar, a figurehead of the positive training movement, writes on his blog: "The first gift that we can give to all animal owners, parents and teachers is to simplify the ridiculously ambiguous and unnecessarily complicated and confusing terminology. Second, let’s simplify the underlying theory by going back to Thorndike’s original premise—that behavior is influenced by [its] consequences."
Karen Pryor, a former dolphin trainer (who has become an expert on dog training, though she's not a dog trainer), writes, “Casinos, believe me, use the power of the variable ratio schedule to develop behaviors, such as playing slot machines, that are very resistant to extinction, despite highly variable and unpredictable reinforcement.”
Are we training our dogs for obedience or turning them into gambling addicts?
At about the 2:00 mark in the video above, Skinner proudly compares conditioning to addictive behavior.
A Salience Detector
One of the biggest problems with our current understanding of what motivates learning is the idea that animals learn new behaviors because a neurotransmitter called dopamine creates a feeling of well-being in connection with an external reward, and that even the anticipation of a reward causes a release of dopamine. At least, that was the old view. And, unfortunately, it's one that many positive trainers and behavior analysts still believe in.
Here’s the problem though: in testing this idea directly on the brains of various animals, researchers have found an interesting set of anomalies.
In his paper "Dopamine and Reward: Comment on Hernandez et al. (2006)," neuroscientist Dr. Randy Gallistel of Rutgers University writes: "In the monkey, dopamine neurons do not fire in response to an expected reward, only in response to an unexpected or uncertain one, and, most distressingly of all, to the omission of an expected one." [Emphases mine.]
So missing out on a reward is pleasurable? How could that be?
In another article, "Deconstructing the Law of Effect," Gallistel poses the problem of learning from an information theory perspective, contrasting Edward Thorndike's model of a feedback system with a feedforward model. It's well known that shaping animal behavior via operant or classical conditioning requires lots and lots of time and repetition. But in the feedforward model learning takes place instantly, in real time.
Why the difference? And is it important?
I think so. Which is more adaptive, being able to learn a new behavior on the fly, in the heat of the moment, or waiting for more and more repetitions of the exact same experience to set a new behavior into place? An animal that has to think things through in a linear fashion doesn't stand much of a chance of surviving long enough to pass his genes on to the next generation. But an animal who's responding to rapidly changing patterns in the environment can learn new behaviors instantly, on the fly, giving him a much better chance.
What Really Motivates Learning?
We’re now discovering that the real purpose of dopamine is to motivate us to gather new information about the outside world quickly and efficiently. In fact dopamine is released during negative experiences as well as positive ones! It's not just released when we eat ice cream or fall in love, it's also released when we drink sour milk or stub our toes. In the first category it's telling us, "This is good. Remember this." In the second it's saying, "This is bad! Don't do that again!"
This suggests that learning is not so much about pairing behaviors with their consequences, or learning by cause and effect. It's about paying close attention to changes in the environment: the bigger the change, the more dopamine is released, and, therefore, the deeper the learning. And I've found a simple way to test this idea: do things backwards.
When I train a puppy to sit, for example, I'll show her a treat, then move it around in a certain way so that she finds she can't grab it or get it into her mouth. Once she stops trying and sits, on her own, I give her the treat. Then, after she's already eating it, I finally say "Sit!" This means I'm giving the command after the puppy has already obeyed it. Again, here's the sequence: 1) I show the dog a treat, 2) I move it around so she can't get it, 3) she sits, 4) I give her the treat (while she's already sitting), and 5) I say "Sit."
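To make the reversal explicit, the two orderings can be laid out side by side (a plain data sketch; the step labels are my own shorthand for the sequence just described, not a training protocol):

```python
# Conventional order: the cue comes first, the reward comes last.
conventional = ["say 'Sit'", "dog sits", "give treat"]

# The 'backwards' order described above: the cue arrives only after
# the behavior and the reward have both already happened.
backwards = ["show treat", "move treat so dog can't grab it",
             "dog sits on her own", "give treat", "say 'Sit'"]

# The only thing that changes is the cue's position in the pattern:
print(conventional.index("say 'Sit'"))  # 0 (first step)
print(backwards.index("say 'Sit'"))     # 4 (last step)
```

On an associative account the backwards order shouldn't work at all, since the cue never precedes the behavior; that's what makes it a useful test.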
Here's what happens on the third or fourth time I do this. Without doing the rest of the sequence, and without even showing the dog a treat, I simply, out of nowhere, say "Sit," then wait a fraction of a second or two, and the puppy sits. Remember, during each previous iteration I gave the command after the puppy was not only sitting but actually eating the treat.
Why does this happen?
It happens because learning in animals doesn't take place by making a mental association between a behavior and a reward. It happens unconsciously and automatically, through pattern recognition.
Mind you, the puppy hasn't yet learned to sit on command under any and all circumstances. You have to repeat this process a few times in different locations, at different times of day, etc. But the point is, by doing things backwards I've effectively proven that dogs don't learn through association, which, remember, is said to be one of the primary features of positive training.
Dr. Gallistel again: “...behavior is not the result of a learning process that selects behaviors on the basis of their consequences ... both the appearance of ‘conditioned’ responses and their relative strengths may depend simply on perceived patterns of reward without regard to the behavior that produced those rewards.” (“The Rat Approximates an Ideal Detector of Changes in Rates of Reward: Implications for the Law of Effect,” Journal of Experimental Psychology: 2001.) [emphasis mine]
So when "positive" trainers say their training is based on the "science of how animals learn," it's not exactly true. It would be if we were still living in the 1930s. But the actual, 21st-century science of learning is radically different from what most "positive" trainers believe it is.
Granted, this is an honest mistake on their part. But it is a mistake.
LCK
“Life Is an Adventure—Where Will Your Dog Take You?”