Statistics Impact on Modern Society and Climate

by Dr. Tim Ball on May 26, 2011

in Data, History, Philosophy, Theory

Most people know Benjamin Disraeli’s comment,

There are three kinds of lies: lies, damn lies, and statistics.

…but few understand how the application of statistics affected our lives in the 20th century. We sense it when everything sort of fits everyone but doesn’t precisely fit anyone.

Many years ago I monitored the development of a housing estate for low-income residents. Because the qualifying criteria identified the residents beforehand, the planners surveyed them first to determine their desires and expectations. After people had lived there for a while, a second survey sought their judgment. The response was, “It’s alright, but…” The outcome was predictable because the planners designed for the average. The chance of any individual request being included was very small. Any population contains a wide range of individuals, but modern society accommodates only the majority near the middle – those within one standard deviation of the average.
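The “one standard deviation” point can be illustrated with a minimal sketch. The numbers below are simulated stand-ins, not the planners’ actual survey data: for roughly normal data, about 68% of individuals fall within one standard deviation of the mean – the “majority near the middle” that planning serves.

```python
import random
import statistics

random.seed(42)
# Hypothetical, simulated resident scores -- illustrative only.
scores = [random.gauss(50, 10) for _ in range(1000)]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

# Count how many individuals sit within one standard deviation of the mean.
within_one_sd = sum(1 for s in scores if abs(s - mean) <= sd) / len(scores)

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"fraction within one sd: {within_one_sd:.0%}")
# For roughly normal data this fraction is about 68%, leaving
# nearly a third of the population outside the "design target".
```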

Application of statistics to all elements of our lives is an outgrowth of what is generally called logical positivism. Wikipedia defines it as

…a school of philosophy that combines empiricism, the idea that observational evidence is indispensable for knowledge of the world, with a version of rationalism incorporating mathematical and logico-linguistic constructs and deductions in epistemology.

In simple terms, this means that if you can’t quantify something, it doesn’t exist. It makes mathematics, and its practical application, statistics, paramount. The idea grew out of Ludwig Wittgenstein’s work at the turn of the 20th century. Wikipedia notes,

Wittgenstein’s influence has been felt in nearly every field of the humanities and social sciences, yet there are widely diverging interpretations of his thought.

Interpretations may diverge, but the influence dominates our world and is at the center of why we have lost our way. The dominance is in the pure logical analysis of life and society.

Growth of a mathematically dominant view of the world actually started earlier, with the idea that all aspects of the world could be quantified. René Descartes was an advocate and created the Cartesian grid system, the basis of the computer climate models. Antonio Damasio examines the problems of a purely logical view of the world, drawing on his knowledge and experience as a neurologist, in his book Descartes’ Error. He noticed a pattern of behavior change following certain brain injuries. He begins with a classic 19th-century case in which the part of a man’s brain handling emotion was damaged, and he switched from being popular, personable, and very human to being personally and socially intolerable. Damasio argues that if the rational, pattern-recognition side of the brain is damaged, the person remains recognizably human; if the abstract, emotional side is lost, however, humanity and effective decision making go with it. The cover summarizes it as follows:

Far from interfering with rationality, his research shows us, the absence of emotion and feeling can break down rationality and make wise decision making almost impossible.

Most people – about 80% – are proud to say they can’t do math; for them the abstract and emotional side dominates. In the remaining 20% the rational side dominates, in some almost completely. They are the people who are comfortable with, and advocate, logical positivism. A small percentage has great facility with numbers, and while they make positive contributions to society, we are in trouble if they become the leaders.

At the beginning of the 20th century, statistics began to be applied to society. In universities, previously divided into the Natural Sciences and the Humanities, a new and ultimately larger division called the Social Sciences emerged. Many in the Natural Sciences view Social Science as an oxymoron and not a ‘real’ science.

In order to justify the name, social scientists began to apply statistics to their research. The Statistical Package for the Social Sciences (SPSS) and its manual became the handbook for students and researchers. Plug in some numbers and the program provided results. Few understood the underlying mathematical principles or assumptions, so the work simply reinforced the Garbage In, Garbage Out (GIGO) adage. Derision for it is summarized in the comment that Social Science proves scientifically what everybody already knew. Unfortunately, it got more credibility than it deserved because, as Pierre Gallois said:

If you put tomfoolery into a computer, nothing comes out but tomfoolery. But this tomfoolery, having passed through a very expensive machine, is somehow ennobled and no-one dares criticize it.

Until the 1960s, averages dominated. This was true in climate research, where individual station data were recorded and average conditions calculated and published. Few understood how meaningless a measure the average was. A farmer phoned one year and asked about the chances of an average summer. He was angry when I said virtually zero, because he didn’t understand that ‘average’ is a statistic. Had he asked whether the summer would be above or below average, a more informed and useful answer would have been possible.
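The farmer’s question can be put in numbers. For a continuous quantity such as mean summer temperature, the probability of landing exactly on the long-term average is effectively zero, while “above or below average” is close to a coin flip. A sketch with simulated summers – the figures are illustrative, not station data:

```python
import random

random.seed(1)

long_term_average = 18.0  # degrees C -- an assumed, illustrative value
# Simulate 10,000 summers with natural year-to-year variability.
summers = [random.gauss(long_term_average, 1.5) for _ in range(10_000)]

# Chance of a summer landing exactly on the average (a continuous variable).
exactly_average = sum(1 for t in summers if t == long_term_average) / len(summers)
# Chance of a summer above the average.
above_average = sum(1 for t in summers if t > long_term_average) / len(summers)

print(f"P(exactly average): {exactly_average}")   # effectively zero
print(f"P(above average):  {above_average:.2f}")  # near a coin flip
```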

The next development came from the need to make predictions for planning and social engineering as modern postwar societies evolved: simple trend analysis. That mentality still dominates, as the recent assumption that house prices and stock markets would keep trending upward proves, and it persists despite repeated evidence of reversals. The application of trends to climate data began in the 1970s with the prediction of a coming ice age as temperatures declined from 1940. When the temperature turned around in the mid-1980s, we were told the rise would continue unabated. By then, we were also told, human CO2 was the cause, and since human additions would keep increasing, the upward trend was certain to continue. Like all previous trends it did not last; temperatures trended down from about 2000.

A major problem with any trend is that it is predetermined by the start and end points of the record used. Look at this plot by John McLean of temperatures from Greenland ice cores for the last 11,000 years.

Greenland temperatures (GISP ice cores)

Present temperatures are on the left. The sudden rise in temperature 10,500 years ago marked the onset of a long period warmer than today called the Holocene Optimum. Start your trend 3,000 years ago and you can argue the world has cooled significantly since. Pick any segment of the graph and you can make almost any argument, including that the trend from 10,500 years ago to today is warming – but you cannot forget that each segment is part of a much larger pattern. One pattern is obvious: the temperature fluctuates from decade to decade and century to century. The Intergovernmental Panel on Climate Change (IPCC) claims the 0.6°C ± 0.2°C warming of the last 130 years is unnatural, but look at the changes in this record. Then consider that the ±0.2°C error range spans two-thirds of the claimed warming – a statistical range of error of 66%. No pollster would ever publish a poll with such an error factor.
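The dependence of a trend on its start and end points is easy to demonstrate. The sketch below uses a synthetic fluctuating record (a slow oscillation plus noise, standing in for the ice-core data – an assumption for illustration only): the same series yields ordinary least-squares trends of opposite sign depending on the window chosen.

```python
import math
import random

random.seed(0)

def linear_trend(ys):
    """Ordinary least-squares slope of ys against indices 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Synthetic record: a slow oscillation plus small noise, one value per step.
series = [math.sin(i / 15) + random.gauss(0, 0.1) for i in range(110)]

rising_window = series[0:40]    # starts in a trough, ends near a peak
falling_window = series[40:90]  # starts near a peak, ends in a trough

print(f"trend over first window:  {linear_trend(rising_window):+.4f}")
print(f"trend over second window: {linear_trend(falling_window):+.4f}")
# Same series, opposite signs: the chosen endpoints dictate the trend.
```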

The one statistical measure rarely considered in climate research is variation: how much the factor in question varies about the average. A major reason it is not considered is that it is eliminated by creating a moving average, which smooths the curve and makes trends more visible. Look at the following chart of continental US temperatures according to NASA GISS. The black line is the annual average temperature; the red line is a 5-year moving average. This graph was later revised when an error was detected: 1934, not 1998, turned out to be the warmest year in the record, changing the trend. The error had conveniently and incorrectly made the 1990s the warmest period in the record.

US temperature anomaly

For year-to-year living and business, the variability is very important. Farmers know you don’t plan next year’s operation on last year’s weather, but reduced variability reduces risk considerably. Notice how the variability was generally lower during the cooling phase from 1940 to 1980, and especially from 1950 to 1980. The increased variability after 1980 was exploited to increase people’s fear about the brief warming trend up to 2000.
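The smoothing effect is easy to quantify. A minimal sketch on synthetic annual anomalies (assumed values, not the GISS record): a 5-year moving average markedly shrinks the year-to-year variability that matters for planning, while leaving the long-run trend visible.

```python
import random
import statistics

random.seed(7)

# Synthetic annual anomalies: a mild warming trend plus large
# year-to-year noise -- illustrative values only.
years = 60
annual = [0.01 * i + random.gauss(0, 0.3) for i in range(years)]

def moving_average(values, window=5):
    """Simple moving average; output is window-1 points shorter."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

smoothed = moving_average(annual, 5)

print(f"sd of annual values:    {statistics.pstdev(annual):.3f}")
print(f"sd of 5-year averages:  {statistics.pstdev(smoothed):.3f}")
# The moving average suppresses exactly the variability that
# year-to-year planning depends on.
```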

Statistician Steve McIntyre, who showed the misuse of techniques in the infamous “hockey stick”, used the following quote from the Wegman Report of 2006 in his presentation to the recent Heartland Conference in Montreal.

A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions.

Selecting the statistical method first is fine, but in most weather and climate research there is insufficient data even if you do. The temptation to find a method that yields the necessary result is high, especially if the result is politically motivated. In addition, the lack of data demands ever more complicated techniques to extract something from very little. A professor of statistics told me the general rule: the more sophisticated the statistical techniques applied, the weaker the data. As the US National Research Council report of February 3, 1999 noted,

Deficiencies in the accuracy, quality and continuity of the records place serious limitations on the confidence that can be placed in the research results.

These inadequacies allow the data to be manipulated to achieve the result you want. The classic example is the global average annual temperature. In 2008, two agencies gave distinctly different results, ostensibly from the same world data set: NASA GISS said it was the second warmest year on record; NOAA said it was the seventh warmest. This is not surprising, because NASA GISS data consistently show warmer modern conditions than any other agency’s. It also confirms Evan Esar’s observation,

Statistics: The only science that enables different experts using the same figures to draw different conclusions.

Most people think the Intergovernmental Panel on Climate Change (IPCC) makes predictions of future climate. It doesn’t; it produces what it calls “scenarios” – a series of possible future climate trends tempered by assumptions about what the world economy will do. Through this approach the IPCC manages to combine the most serious limitations of statistics with a severe lack of data. Worse, it combines the supposedly precise statistics of the natural sciences with the vague application of the social sciences. It assumes human-produced CO2 is causing global warming and that the amount will increase as economies develop, which means all the scenarios trend up. Presumably the lowest scenario applies if we reduce CO2 production by legislation – but what if emissions fall because the current recession becomes a depression?

Ironically, economic forecasts underscore a major difference between statistics in the natural and social sciences – part of the difference Damasio identifies. How do you quantify human behavior? How do you predict how people will react? A simple definition of science is the ability to predict, yet a social science prediction invariably invalidates itself, because people respond to a prediction by changing their behavior.

Perhaps the sarcasm that summarizes Disraeli’s view best is the comment that 78% of all statistics are made up on the spot.