Bias Bias: The Inclination to Accuse People of Bias, by James Thompson

17 June 2019


Early in any psychology course, students are taught to be very cautious about accepting people’s reports. A simple trick is to stage some sort of interruption to the lecture by confederates, and later ask the students to write down what they witnessed. Typically, they will misremember the events, sequences and even the number of people who staged the tableaux. Don’t trust witnesses, is the message.


Another approach is to show visual illusions, such as getting estimates of line lengths in the Muller-Lyer illusion, or studying simple line lengths under social pressure, as in the Asch experiment, or trying to solve the Peter Wason logic problems, or the puzzles set by Kahneman and Tversky. All these appear to show severe limitations of human judgment. Psychology is full of cautionary tales about the foibles of common folk.


As a consequence of this softening up, psychology students come to regard themselves and most people as fallible, malleable, unreliable, biased and generally irrational. No wonder psychologists feel superior to the average citizen, since they understand human limitations and, with their superior training, hope to rise above such lowly superstitions.


However, society still functions, people overcome errors and many things work well most of the time. Have psychologists, for one reason or another, misunderstood people, and been too quick to assume that they are incapable of rational thought?


Gerd Gigerenzer thinks so.


https://www.nowpublishers.com/article/OpenAccessDownload/RBE-0092


He is particularly interested in the economic consequences of apparent irrationality, and whether our presumed biases really result in us making bad economic decisions. If so, some argue we need a benign force, say a government, to protect us from our lack of capacity. Perhaps we need a tattoo on our forehead: Diminished Responsibility.



The argument leading from cognitive biases to governmental paternalism—in short, the irrationality argument—consists of three assumptions and one conclusion:


1. Lack of rationality. Experiments have shown that people’s intuitions are systematically biased.


2. Stubbornness. Like visual illusions, biases are persistent and hardly corrigible by education.


3. Substantial costs. Biases may incur substantial welfare-relevant costs such as lower wealth, health, or happiness.


4. Biases justify governmental paternalism. To protect people from their biases, governments should “nudge” the public toward better behavior.


The three assumptions—lack of rationality, stubbornness, and costs—imply that there is slim chance that people can ever learn or be educated out of their biases; instead governments need to step in with a policy called libertarian paternalism (Thaler and Sunstein, 2003).



So, are we as hopeless as some psychologists claim we are? In fact, probably not. Not all the initial claims have been substantiated. For example, it seems we are not as loss averse as previously claimed. Does our susceptibility to printed visual illusions show that we lack judgement in real life?


In Shepard’s (1990) words, “to fool a visual system that has a full binocular and freely mobile view of a well-illuminated scene is next to impossible” (p. 122). Thus, in psychology, the visual system is seen more as a genius than a fool in making intelligent inferences, and inferences, after all, are necessary for making sense of the images on the retina.


Most crucially, can people make probability judgements? Let us see. Try solving this one:



A disease has a base rate of .1, and a test is performed that has a hit rate of .9 (the conditional probability of a positive test given disease) and a false positive rate of .1 (the conditional probability of a positive test given no disease). What is the probability that a random person with a positive test result actually has the disease?



Most people fail this test, including 79% of gynaecologists giving breast screening tests. Some researchers have drawn the conclusion that people are fundamentally unable to deal with conditional probabilities. On the contrary, there is a way of laying out the problem such that most people have no difficulty with it. Watch what it looks like when presented as natural frequencies:



Among every 100 people, 10 are expected to have a disease. Among those 10, nine are expected to correctly test positive. Among the 90 people without the disease, nine are expected to falsely test positive. What proportion of those who test positive actually have the disease?



In this format the positive test result gives us 9 people with the disease and 9 people without the disease, so the chance that a positive test result shows a real disease is 50/50. Only 13% of gynaecologists fail this presentation.
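For anyone who wants to check the arithmetic, here is a minimal sketch in Python that works the example both ways; the base rate, hit rate and false positive rate are exactly those given above, and nothing else is assumed.

```python
# The disease-test example above, worked in both formats.
base_rate = 0.1        # P(disease)
hit_rate = 0.9         # P(positive | disease)
false_positive = 0.1   # P(positive | no disease)

# Conditional-probability format: Bayes' theorem
p_positive = hit_rate * base_rate + false_positive * (1 - base_rate)
p_disease_given_positive = hit_rate * base_rate / p_positive
print(f"Conditional probabilities: {p_disease_given_positive:.2f}")   # 0.50

# Natural-frequency format: the same numbers per 100 people
population = 100
with_disease = round(base_rate * population)                           # 10
true_positives = round(hit_rate * with_disease)                        # 9
false_positives = round(false_positive * (population - with_disease))  # 9
print(f"Natural frequencies: {true_positives} of "
      f"{true_positives + false_positives} positives have the disease")  # 9 of 18
```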


Summing up the virtues of natural frequencies, Gigerenzer says:



When college students were given a 2-hour course in natural frequencies, the number of correct Bayesian inferences increased from 10% to 90%; most important, this 90% rate was maintained 3 months after training (Sedlmeier and Gigerenzer, 2001). Meta-analyses have also documented the “de-biasing” effect, and natural frequencies are now a technical term in evidence-based medicine (Akl et al., 2011; McDowell and Jacobs, 2017). These results are consistent with a long literature on techniques for successfully teaching statistical reasoning (e.g., Fong et al., 1986). In sum, humans can learn Bayesian inference quickly if the information is presented in natural frequencies.



If the problem is set out in a simple format, almost all of us can do conditional probabilities.


I taught my medical students about the base rate screening problem in the late 1970s, based on Robyn Dawes (1962), “A note on base rates and psychometric efficiency”. Decades later, alarmed by the positive scan detection of an unexplained mass, I confided my fears to a psychiatrist friend. He did a quick differential diagnosis on bowel cancer, showing I had no relevant symptoms, and reminded me that I had lectured him as a student on base rates decades before, so I ought to relax. Indeed, it was a false positive.


The relevant figures can be set out in terms of natural frequencies.



Every test has a false positive rate (though every effort is made to reduce these), and when screening is applied to entire populations many patients have to undergo further investigations, sometimes including surgery.
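As a rough sketch of what such a natural-frequency layout looks like for population screening, the calculation below uses hypothetical values for prevalence, hit rate and false positive rate (illustrative values, not the figures from the scan mentioned above or from any particular test); the point is simply that a rare condition plus an imperfect test produces far more false positives than true ones.

```python
# Hypothetical screening example: all three rates below are assumed values
# chosen for illustration, not real test characteristics.
population = 100_000
prevalence = 0.001          # assume 1 in 1,000 actually has the condition
hit_rate = 0.95             # assumed sensitivity
false_positive_rate = 0.05  # assumed

sick = round(population * prevalence)                               # 100
true_positives = round(sick * hit_rate)                             # 95
false_positives = round((population - sick) * false_positive_rate)  # 4,995

ppv = true_positives / (true_positives + false_positives)
print(f"{true_positives + false_positives} people test positive, "
      f"but only {true_positives} are actually ill (about {ppv:.0%})")  # about 2%
```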


Setting out frequencies in a logical sequence can often prevent misunderstandings. Say a man on trial for having murdered his spouse has previously physically abused her. Should his previous history of abuse not be raised in Court because only 1 woman in 2500 cases of abuse is murdered by her abuser? Of course, whatever a defence lawyer may argue and a Court may accept, this is back to front. OJ Simpson was not on trial for spousal abuse, but for the murder of his former partner. The relevant question is: what is the probability that a man murdered his partner, given that she has been murdered and that he previously battered her?


Accepting the figures used by the defence lawyer, if 1 in 2500 abused women is murdered every year by her abusive male partner, how many women are murdered by men who did not previously abuse them? Using the government figure that 5 women in 100,000 are murdered every year, and putting everything onto the same population of 100,000 abused women, the frequencies look like this: roughly 40 are murdered by their abusers, and roughly 5 are murdered by someone else.



So, at 40 to 5, it is 8 times more probable that a murdered woman with a history of abuse was killed by her abuser rather than by someone else. A relevant issue to raise in Court about the past history of an accused man.
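The arithmetic can be checked in a few lines, using only the two figures already quoted: 1 in 2500 and 5 in 100,000.

```python
# Reproducing the 40-to-5 comparison from the figures quoted above.
population = 100_000                      # abused women

murdered_by_abuser = population // 2_500  # 1 in 2,500 -> 40 per 100,000 per year
murdered_by_others = 5                    # general rate of 5 per 100,000 per year

ratio = murdered_by_abuser / murdered_by_others
print(f"{murdered_by_abuser} to {murdered_by_others}: roughly {ratio:.0f} times "
      f"more likely that a murdered, abused woman was killed by her abuser")
```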


Are people’s presumed biases costly, in the sense of making them vulnerable to exploitation, such that they can be turned into a money pump, or is it a case of “once bitten, twice shy”? In fact, there is no evidence that these apparently persistent logical errors actually result in people continually making costly errors. That presumption turns out to be a bias bias.


Gigerenzer goes on to show that people are in fact correct in their understanding of the randomness of short sequences of coin tosses, and Kahneman and Tversky wrong. Elegantly, he also shows that the “hot hand” of successful players in basketball is a real phenomenon, and not a stubborn illusion as claimed.


With equal elegance he disposes of a result I had depended upon since Slovic (1982), which is that people over-estimate the frequency of rare risks and under-estimate the frequency of common risks. This finding has led to the belief that people are no good at estimating risk. Who could doubt that a TV series about Chernobyl will lead citizens to have an exaggerated fear of nuclear power stations?



The original Slovic study was based on 39 college students, not exactly a fair sample of humanity. The conceit of psychologists knows no bounds. Gigerenzer looks at the data and shows that it is yet another example of regression to the mean. This apparent effect arises whenever the predictor is less than perfect (the most common case); it is a consequence of unsystematic error, and is already evident when you calculate the correlation coefficient. Parental heights and their children’s heights are positively but not perfectly correlated, at about r = 0.5. Predictions made in either direction will regress toward the mean, simply because the correlation is not perfect and does not capture all the variation. Try drawing the correlation as an ellipse to see the effect of regression, compared with the perfect case of the straight line of r = 1.0.


What diminishes in the presence of noise is the variability of the estimates, both the estimates of the height of the sons based on that of their fathers, and vice versa. Regression toward the mean is a result of unsystematic, not systematic error (Stigler, 1999).
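A quick simulation makes the point; it assumes parent and child heights are normally distributed and correlated at the r = 0.5 mentioned above, with a mean and standard deviation chosen purely for illustration.

```python
# Simulated parent/child heights with correlation r = 0.5 (assumed normal;
# mean 175 cm and SD 7 cm are arbitrary illustrative values).
import numpy as np

rng = np.random.default_rng(0)
n, r, mean, sd = 100_000, 0.5, 175.0, 7.0

parents = rng.normal(mean, sd, n)
noise = rng.normal(0.0, sd * np.sqrt(1 - r**2), n)
children = mean + r * (parents - mean) + noise   # correlation with parents ~ 0.5

tall = parents > mean + sd   # parents at least one SD above the mean
print("Mean height of tall parents:    %.1f cm" % parents[tall].mean())
print("Mean height of their children:  %.1f cm" % children[tall].mean())
# The children of unusually tall parents are, on average, closer to the mean,
# and the same happens in reverse when predicting parents from tall children:
# an effect of unsystematic error, not a systematic bias.
```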


Gigerenzer also looks at the supposed finding that people are over-confident in their predictions, and finds that it is another regression to the mean problem.


Gigerenzer then goes on to consider that old favourite, that most people think they are better than average, which supposedly cannot be the case, because average people are average.



Consider the finding that most drivers think they drive better than average. If better driving is interpreted as meaning fewer accidents, then most drivers’ beliefs are actually true. The number of accidents per person has a skewed distribution, and an analysis of U.S. accident statistics showed that some 80% of drivers have fewer accidents than the average number of accidents (Mousavi and Gigerenzer, 2011).
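A toy model shows how this can be literally true; the Poisson distribution and the accident rate used below are assumptions chosen for illustration, not the actual U.S. statistics analysed by Mousavi and Gigerenzer.

```python
# Skewed accident counts: with a low average rate, most drivers have zero
# accidents and are therefore below the mean. Poisson model and rate assumed.
import math

rate = 0.2                  # assumed mean accidents per driver per period
p_zero = math.exp(-rate)    # Poisson probability of zero accidents

# Everyone with zero accidents is below the mean of 0.2, so at least this
# share of drivers really does have fewer accidents than average.
print(f"Mean accidents: {rate}; share below the mean: at least {p_zero:.0%}")  # ~82%
```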



Then he looks at the classical demonstration of framing, that is to say, the way people appear to be easily swayed by how the same facts are “framed” or presented to the person who has to make a decision.



A patient suffering from a serious heart disease considers high-risk surgery and asks a doctor about its prospects.


The doctor can frame the answer in two ways:


Positive Frame: Five years after surgery, 90% of patients are alive.
Negative Frame: Five years after surgery, 10% of patients are dead.


Should the patient listen to how the doctor frames the answer? Behavioral economists say no, because both frames are logically equivalent (Kahneman, 2011). Nevertheless, people do listen. More are willing to agree to a medical procedure if the doctor uses positive framing (90% alive) than if negative framing is used (10% dead) (Moxey et al., 2003). Framing effects challenge the assumption of stable preferences, leading to preference reversals. Thaler and Sunstein (2008), who presented the above surgery problem, concluded that “framing works because people tend to be somewhat mindless, passive decision makers” (p. 40).



Gigerenzer points out that in this particular example, subjects are having to make their judgements without knowing a key fact: how many survive without surgery. If you know that, you have a datum which is more influential. These are the sorts of questions patients will often ask about, and discuss with other patients, or with several doctors. Furthermore, you don’t have to spin a statistic. You could simply say: “Five years after surgery, 90% of patients are alive and 10% are dead”.
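A small sketch shows why the missing baseline matters; the 90% survival-with-surgery figure comes from the example above, while the survival rates without surgery are invented, purely to show how the very same frame can support opposite decisions.

```python
# The same "90% alive" frame, compared against two hypothetical baselines
# for survival without surgery (both invented for illustration).
survival_with_surgery = 0.90

for survival_without in (0.60, 0.95):
    verdict = "favours" if survival_with_surgery > survival_without else "argues against"
    print(f"If {survival_without:.0%} survive five years without surgery, "
          f"the 90% figure {verdict} the operation.")
```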


Gigerenzer gives an explanation which is very relevant to current discussions about the meaning of intelligence, and about the power of intelligence tests:



In sum, the principle of logical equivalence or “description invariance” is a poor guide to understanding how human intelligence deals with an uncertain world where not everything is stated explicitly. It misses the very nature of intelligence, the ability to go beyond the information given (Bruner, 1973)


The key is to take uncertainty seriously, take heuristics seriously, and beware of the bias bias.



One important conclusion I draw from this entire paper is that the logical puzzles enjoyed by Kahneman, Tversky, Stanovich and others are rightly rejected by psychometricians as usually being poor indicators of real ability. They fail because they are designed to lead people up the garden path, and depend on idiosyncratic interpretations.


For more detail: http://www.unz.com/jthompson/the-tricky-question-of-rationality/


Critics of examinations of either intellectual ability or scholastic attainment are fond of claiming that the items are “arbitrary”. Not really. Scholastic tests have to be close to the curriculum in question, but still need to have question forms which are simple to understand, so that the stress lies in how students formulate the answer, not in how they decipher the structure of the question.


Intellectual tests have to avoid particular curricula and restrict themselves to the common ground of what most people in a community understand. Questions have to be super-simple, so that the correct answer follows easily from the question, with minimal ambiguity. Furthermore, in the case of national scholastic tests, and particularly in the case of intelligence tests, legal authorities will pore over the test, looking at each item for suspected biases of a sexual, racial or socio-economic nature. Designing an intelligence test is a difficult and expensive matter. Many putative new tests of intelligence never even get to the legal hurdle, because they founder on matters of reliability and validity, and reveal themselves to be little better than the current range of assessments.


In conclusion, both in psychology and behavioural economics, some researchers have probably been too keen to allege bias in cases where there are unsystematic errors, or no errors at all. The corrective is to learn about base rates, and to use natural frequencies as a guide to good decision-making.


Don’t bother boosting your IQ. Boost your understanding of natural frequencies.

