Is Temperament Testing Really Worth the Effort?
This is another post that’s probably going to upset a few people. It’s about the predictiveness of puppy temperament testing.
Temperament testing has been an important fixture in determining suitability of dogs for service work, therapy work, police and military service, and even for companion dogs.
I started temperament testing in the 1990s in the belief that it was the only responsible way to place a puppy. All of the other good breeders did it, my mentors did it (including those involved with major service dog organizations and the US Department of Defense puppy raising program), so I did it.
But like many things involved in breeding, it’s not necessarily a sound practice simply because everyone is doing it. We also need to look at the science.
Back then, we didn’t have much, if any, science. We just had the best efforts of the best in the business. Now we have some studies to look at, so let’s take a look at what they say.
Do puppy temperament tests actually predict adult behavior?
Service dog organizations have a great deal invested in the performance of adult dogs, run regimented puppy-rearing programs, and are able to collect data and follow up, so many of the results and studies we have come from them.
There are more than a few studies available. Here are some highlights, and you can pore over the references if you really want to get lost in the weeds.
In 1997, a study was performed on 630 eight-week-old German Shepherd puppies born into a service dog program with a follow-up evaluation at 14-19 months. The ability of the testers to predict adult behavior from puppy temperament tests was “negligible and the puppy test was therefore not found useful in predicting adult suitability for service dog work.” In fact, the correlation of behavior from puppyhood to adulthood was “exactly what would be expected by pure chance.” The authors conclude “… adult behaviour cannot be predicted as early as at eight weeks of age. Breeding programs aimed to improve behaviour in dogs may not be based on information collected on tests performed as early as at eight weeks of age.” This study also found that maternal effects are present in puppies, but that effect wanes once the puppies reach full adulthood.
In 2013, a study of 465 puppies in a guide dog program found low predictability between puppy temperament and certification as guide dogs as adults. The most predictive characteristic in the test was not success, but failure.
A study in 2014 evaluated 134 Border Collie puppies at days 2-10, days 40-50, and then again at 1.5-2 years. There was little correlation between puppy evaluation results and behavior at 1.5-2 years. Only exploratory behavior was found to be correlated into adulthood. The study concluded “the predictive validity of early tests for predicting specific behavioural traits in adult pet dogs is limited.” The really interesting thing about this study is that fear in puppies was NOT correlated with fear in adulthood. In fact, the inverse was shown and some of the most fearful puppies ended up being the most friendly adults.
In a study of potential guide dogs, fearfulness was somewhat predictive at 3 months of age, and prediction accuracy improved with age. The same researchers conducted another study two years later and concluded that none of the tests they performed were predictive of ability to learn specific tasks.
Another guide dog program study concluded that “when applied at 7 weeks of age without an additional criterion, the test has no predictive value regarding future social tendencies.”
In a study of specific AKC breeds, tests were, interestingly, predictive of breed. They were not, however, predictive of adult temperament: “the puppy temperament scores were unreliable in predicting adult temperament.”
A few characteristics, such as playfulness, have some correlation.
A couple of studies had results that conflicted with those I list above.
This is normal in science, and the responsible way to handle these conflicts is to look at the studies individually for quality and also to look at the evidence as a whole.
In other words, you need to ask yourself if there is more evidence supporting a particular conclusion. The truth is found in a preponderance of information, not in a specific single study.
Something else to consider is that a closer look at these few outlying studies shows they tend to have smaller sample sizes, and the studies with larger sample sizes are more reliable as a whole.
A 2013 meta-analysis of personality consistency in dogs found some correlation in puppy testing for aggression and submissiveness, but lower correlation for responsiveness to training, fearfulness, and sociability. “Overall, we found evidence to suggest substantial consistency (r = 0.43). Furthermore, personality consistency was higher in older dogs, when behavioral assessment intervals were shorter, and when the measurement tool was exactly the same in both assessments. In puppies, aggression and submissiveness were the most consistent dimensions, while responsiveness to training, fearfulness, and sociability were the least consistent dimensions.”
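To put that r = 0.43 in perspective, a common rule of thumb is to square a correlation to estimate how much of the variation in one measurement is statistically accounted for by the other. The sketch below is my own illustration, not something the meta-analysis reports: the function name is made up, and the only number taken from the study is 0.43.

```python
def shared_variance(r: float) -> float:
    """Return r squared, the share of variation two measures have in common."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# 0.43 is the overall consistency figure quoted from the meta-analysis.
overall = shared_variance(0.43)
print(f"{overall:.1%} of the variation is shared")  # prints 18.5% of the variation is shared
```

Read this way, even a correlation described as “substantial” leaves a bit over 80% of the variation between an earlier and a later assessment unaccounted for.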
In a study from a South African police dog program, retrieval was highly correlated from puppyhood to adulthood. Other traits did not correlate from puppyhood to adulthood, but did correlate from the juvenile stage to adulthood.
A study of 206 German Shepherd dogs in a police dog program showed correlation between certain behaviors at 7 weeks (catch, chase, fetch, and follow a dragged rag) and certification as adults. These, however, are more indicative of specific drives those dogs possess and not necessarily of personality or temperament traits.
If testing of temperament in puppyhood is not predictive, when is temperament evaluation reliable?
A 2012 study of guide dog candidates applied C-BARQ criteria (a standardized behavioral assessment) to a whopping 8,000 dogs. It determined that while the test was not predictive of success, it did allow them to rule out dogs likely to fail when those dogs were tested at 6 and 12 months.
So while evaluations at those ages were still not predictive of success, they were predictive of failure.
So if temperament testing isn’t predictive, what is?
A recent study that used C-BARQ evaluation identified things we as breeders can do, and can educate and encourage in our puppy homes, to stack the deck in favor of better outcomes for our puppies. The traits each factor was linked to are in parentheses:
Experience or ability of families raising the puppy (aggression toward humans and dogs, fear, and touch sensitivity)
Another dog in the household (lowered aggression toward household members)
Avoidance of traumatic events (fear and aggression)
My Personal Choice
I’ve been reading the research on this for a couple of years. I’ve talked about it a little in some online discussions, but I’ve been hesitant to make any boldly public statements. But here it is now.
Two years ago I took the plunge and followed the science.
I stopped temperament testing.
And the sky hasn’t fallen.
People are happy with their companion dogs and just as successful with their service dog candidates or therapy dog candidates.
Granted, this is anecdotal. My program is a small sample size, I don't have a robust (or even close) data collection method, and I make no claim of my experience being a scientific representation.
Now if you temperament test and are happy with it, that’s great. I know from experience that some customers like it because it makes them feel better.
But if we are honest with ourselves about what the science is saying, for the most part temperament testing doesn’t really mean a whole lot, if anything.
And I decided that instead of taking all of the time and energy to evaluate puppies, write it all up, and communicate all of that to families, I would rather take that time and energy and put it back into working with the puppies.
It’s paying off for me and I hope you find what works best for you.
References and Footnotes
Wilsson E, PE Sundgren. “Behaviour test for eight-week old puppies—heritabilities of tested behaviour traits and its correspondence to later behavior.” Applied Animal Behaviour Science 58 1998 151–162 https://www.sciencedirect.com/science/article/abs/pii/S0168159197000932
Asher L, Blythe S, Roberts R, Toothill L, Craigon PJ, et al. (2013) A standardized behavior test for potential guide dog puppies: Methods and association with subsequent success in guide dog training. J Vet Behav Clin Appl Res 8: 431–438. https://www.sciencedirect.com/science/article/pii/S1558787813001925
Riemer S, Müller C, Virányi Z, Huber L, Range F. The predictive value of early behavioural assessments in pet dogs--a longitudinal study from neonates to adults. PLoS One. 2014;9(7):e101237. Published 2014 Jul 8. doi:10.1371/journal.pone.0101237 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086890/
Goddard ME, Beilharz RG (1984) A factor analysis of fearfulness in potential guide dogs. Appl Anim Behav Sci 12: 253–265. https://www.sciencedirect.com/science/article/abs/pii/0168159184901187
Goddard ME, Beilharz RG (1986) Early prediction of adult behaviour in potential guide dogs. Appl Anim Behav Sci 15: 247–260. https://www.sciencedirect.com/science/article/abs/pii/016815918690095X
Beaudet R, Chalifoux A, Dallaire A (1994) Predictive value of activity level and behavioral evaluation on future dominance in puppies. Appl Anim Behav Sci 40: 273–284. https://www.sciencedirect.com/science/article/abs/pii/016815919490068X
Robinson, LM, RS Thompson, JC Ha. Puppy Temperament Assessments Predict Breed and American Kennel Club Group but Not Adult Temperament. Journal of Applied Animal Welfare Science. 19:2, 2016.
Scott JP, Bielfelt SW (1976) Analysis of the puppy testing program. In: Pfaffenberger, C.J., Scott, J.P., Fuller, J.L., Ginsburg, B.E., Bielfelt SW, editor. Guide Dogs for the Blind: Their Selection, Development and Training. pp. 39–75
Fratkin JL, Sinn DL, Patall EA, Gosling SD. Personality consistency in dogs: a meta-analysis. PLoS One. 2013; 8(1):e54907. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054907
Slabbert JM, Odendaal JSJ (1999) Early prediction of adult police dog efficiency - a longitudinal study. Appl Anim Behav Sci 64: 269–288. https://www.sciencedirect.com/science/article/pii/S0168159199000386
Svobodova I, Vapenik P, Pinc L, Bartos L (2008) Testing German shepherd puppies to assess their chances of certification. Appl Anim Behav Sci 113: 139–149. https://www.sciencedirect.com/science/article/abs/pii/S0168159107003000
Duffy DL & JA Serpell 2012 Predictive validity of a method for evaluating temperament in young guide and service dogs. App. Anim. Behav. Sci. 138: 99-109. https://linkinghub.elsevier.com/retrieve/pii/S0168159112000433
Serpell JA, Duffy DL. Aspects of Juvenile and Adolescent Environment Predict Aggression and Fear in 12-Month-Old Guide Dogs. Front Vet Sci. 2016;3:49. Published 2016 Jun 22. doi:10.3389/fvets.2016.00049 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4916180/
In 2003, the University of Pennsylvania developed a behavioral test called C-BARQ, which measures aggression, fearfulness, and a few other behavioral problems in dogs. C-BARQ has become a standard for certain behavioral studies and U Penn has a database with over 50,000 test results. The use of C-BARQ makes it easier to compare results among studies that use it. However, it is limited in its scope and doesn’t cover a number of qualities a breeder may want to be able to evaluate in puppies or adults. http://vetapps.vet.upenn.edu/cbarq/