Applied statistics

From Citizendium
Revision as of 04:44, 15 January 2010 by Nick Gardner (section: Financial misinterpretation)

The term applied statistics is used both to denote the work of professional statisticians, who are responsible for the production and presentation of statistics, and to refer to the methods of statistical interpretation that are undertaken by professional statisticians, by professional users of statistics and by the general public. It is an activity that makes use of the results of statistics theory but is distinct from the activity of developing that theory. It has made major contributions to scientific research and to the conduct of operational activities in a variety of fields. It has provided both a familiar source of information and a notorious source of misinformation. Mistakes of interpretation commonly arise from misplaced confidence in intuitive judgements, including errors by medical and legal professionals that have led to false diagnoses and wrongful imprisonment. Deliberate misinterpretation of statistics by politicians and marketing professionals is so widely suspected that genuine uses of statistics are often treated with the same mistrust. The use of statistics is nevertheless generally unavoidable, and misinterpretation can be avoided given a grasp of elementary mathematics, the readily understood rules of chance, and basic verbal logic.

Overview: the basics

Statistics are observations that are recorded in numerical form. It is essential to their successful handling to accept that statistics are not facts, and therefore incontrovertible, but observations about facts, and therefore fallible. The reliability of the information that they provide depends not only upon their successful interpretation, but also upon the accuracy with which the facts are observed and the extent to which they truly represent the subject matter of that information. An appreciation of the means by which statistics are collected is thus an essential part of the understanding of statistics, and is at least as important as a familiarity with the tools that are used in their interpretation.

The basic laws of chance from which much of statistics theory has been derived are little more than a formalisation of intuitive concepts. Combined with the use of advanced mathematics, they have been used to create a toolbox of concepts and theories that are available for the use of statistics in the solution of problems arising in other disciplines.

The Statistics profession

The American Statistical Association defines a professional statistician as an expert in producing trustworthy data, analyzing data to make their meaning clear, and drawing practical conclusions from data[1]. Professional statisticians are normally qualified in mathematics and are normally familiar with the underlying mathematics of statistics theory, but their professional activity involves them in the use of the results of that theory rather than in its development. They are engaged in activities that include the compilation of national accounts and the design, operation and analysis of:
  social, industrial, environmental and occupational surveys;
  medical trials and experiments, and health statistics;
  industrial quality assurance systems;
  financial risk management systems; and
  marketing pilots and programmes.

The International Statistical Institute has adopted a "Declaration on Professional Ethics" that treats the provision of well-grounded information as a service to society. It requires professional statisticians to "communicate their findings, for the benefit of the widest possible community"; "not engage or collude in selecting methods designed to produce misleading results, or in misrepresenting statistical findings by commission or omission"; not to "accept contractual conditions that are contingent upon a particular outcome from a proposed statistical inquiry"; and to "take appropriate measures to prevent their data from being published or otherwise released in a form that would allow any subject's identity to be disclosed or inferred"[2]. Statisticians' professional associations are listed on the external links subpage.

Major applications of statistics

Statistical analysis has been widely used for scientific research, drawing upon both published statistics and the results of surveys and laboratory experiments, and it has enabled major scientific advances to be made in subjects as diverse as medicine and criminology. It has also had operational uses in the fields of business decision-making[3], climatology[4], metrology[5], cost/benefit analysis[6], and finance[7]. For those operational applications it has often had to draw upon theoretical knowledge and professional judgement, as well as upon records of past experience. As the distinguished economist Frank Knight observed in 1921, "business decisions ... deal with situations which are far too unique, generally speaking, for any sort of statistical tabulation to have any value for guidance. The conception of an objectively measurable probability or chance is simply inapplicable."[8] He drew a distinction between risk, which can sometimes be estimated objectively, and uncertainty, which never can. Business decisions necessarily involve the use of subjective estimates, which can be combined using a "decision tree"[9] to produce a probability estimate for each of a range of possible outcomes, enabling the values of those outcomes to be combined to yield a "net present expected value"[10]. The estimation of net present value in that way is an essential feature of cost/benefit analysis, and in those and other applications it may be necessary to base decisions upon a combination of "aleatory" (chance-related) and "epistemic" (cognitive) factors. In the practice of risk management, in particular, it is important not to neglect epistemic factors.

The data used for statistical quality control[11] are usually the results of tests on samples drawn from a batch of the products of a manufacturing process, and its purpose is to decide whether to accept that batch. The normal criteria for acceptance are designed to ensure, at a specified level of confidence, that the quality of the batch does not depart by more than a specified percentage from a target level (but in cases where the costs of departures depend upon their magnitude, the more complex Taguchi criterion[12] is sometimes adopted). Statistical analysis has also been used to devise economical sampling methods[13].
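The decision-tree calculation described above can be sketched as follows; this is a minimal illustration in which the probabilities, payoffs, discount rate and time horizon are all invented for the example.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs whose probabilities sum to 1."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * v for p, v in outcomes)

def net_present_expected_value(outcomes, rate, years):
    """Discount the expected payoff back to the present at a given annual rate."""
    return expected_value(outcomes) / (1 + rate) ** years

# Hypothetical decision branch: 60% chance of a 500 payoff, 40% chance of a
# -100 payoff, realised two years from now, discounted at 5% a year.
launch = expected_value([(0.6, 500), (0.4, -100)])
npev = net_present_expected_value([(0.6, 500), (0.4, -100)], 0.05, 2)
print(launch, round(npev, 2))  # 260.0 235.83
```

In a fuller decision tree each branch's expected value would be computed in the same way and the branch with the highest net present expected value chosen.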
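The Taguchi criterion mentioned above rests on a quadratic loss function, L(x) = k(x − T)², under which the cost of a departure grows with the square of its distance from the target; a minimal sketch, with an invented cost constant k:

```python
def taguchi_loss(x, target, k):
    """Taguchi quadratic loss: cost rises with the square of the deviation from target."""
    return k * (x - target) ** 2

# With an illustrative cost constant k = 50, a part measuring 10.2 against a
# 10.0 target incurs a loss of 50 * 0.2**2 = 2.0; doubling the deviation
# (10.4) quadruples the loss to 8.0.
print(taguchi_loss(10.2, 10.0, 50), taguchi_loss(10.4, 10.0, 50))
```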

The collection of statistics

The methodology adopted for the collection of observations has a profound influence upon the problem of extracting useful information from the resulting statistics. That problem is at its easiest when the collecting authority can minimise disturbing influences by conducting a "controlled experiment"[14]. A range of more complex methodologies (and associated software packages), referred to as "the design of experiments"[15], is available for use when the collecting authority has various lesser degrees of control. The object of the design in each case is to facilitate the testing of a hypothesis by helping to exclude the influence of factors that the hypothesis does not take into account. When extraneous influences cannot be eliminated by physical exclusion, their effects can sometimes be estimated by purely analytical techniques such as the "analysis of variance"[16]. In economics, for example, the attitude of the statistics authorities to extraneous influences has usually been passive, and the testing of economic hypotheses has used a version of the analysis of variance termed "econometrics"[17] (sometimes confused with economic modelling, which is a purely deterministic technique).
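The analysis of variance mentioned above can be illustrated by a minimal one-way ANOVA F statistic in plain Python; the three treatment groups are invented data, and a full analysis would compare the F value against the appropriate F distribution.

```python
def one_way_anova_f(groups):
    """F statistic: variance between group means over variance within groups."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three invented treatment groups; the third clearly differs from the first two,
# so the between-group variance dominates and F is large.
f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f)  # 21.0
```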

The taking of samples[18] reduces the cost of collecting observations but increases the opportunities for generating false information. One source of error arises from the fact that every time a sample is taken there will be a different result. That source of error is readily quantified as the sample's "standard error", or as the "confidence interval" within which the mean observation may be expected to lie[19]. It cannot be eliminated, but it can be reduced to an acceptable level by increasing the size of the sample. The other source of error arises from the likelihood that the characteristics of the sample differ from those of the "population" that it is intended to represent. That source of error does not diminish with sample size and cannot be estimated by a mathematical formula. Careful attention to what is known about the composition of the "population", and the reflection of that composition in the sample, is the only available precaution. The composition of the respondents to an opinion poll, for example, is normally chosen to reflect as far as possible the composition of the intended "population" as regards sex, age, income bracket and so on. The remaining difference is referred to as the "sample bias".
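The standard error and confidence interval described above can be sketched as follows, using the normal approximation (appropriate for large samples) and invented observations:

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Standard error of the mean and an approximate 95% confidence interval."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # sample standard deviation / sqrt(n)
    return se, (m - z * se, m + z * se)

sample = [2, 4, 4, 4, 5, 5, 7, 9]            # invented observations, mean 5
se, (lo, hi) = mean_confidence_interval(sample)
print(round(se, 3), round(lo, 2), round(hi, 2))
```

Quadrupling the sample size halves the standard error, which is why this source of error, unlike sample bias, can be driven down by collecting more observations.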

The use by statisticians of the term "population" refers, not to people in general, but to the category of things or people about which information is sought. A precise definition of the target population is an essential starting point in a statistical investigation, and also a possible source of misinformation. Difficulty can arise when, as often happens, the definition has to be arbitrary. If the intended population were the output of the country's farmers, for example, it might be necessary to draw an arbitrary dividing line between farmers and owners of smallholdings such as market gardens. Any major change over time in the relative output of farm products by the included and excluded categories might then lead to misleading conclusions. Technological change, such as the change from typewriters to word processors has sometimes given rise to serious difficulties in the construction of the price indexes used in the correction of GDP for inflation[20]. Since there is no objective solution to those problems, it is inevitable that national statistics embody an element of judgement exercised by the professional statisticians in the statistics authorities.

Sources of statistics

Published statistics are widely used for research into subjects such as economics, health and demography. A large volume of time-series statistics is published by national statistics authorities. In the United States, for example, statistical series are published regularly by over 70 federal agencies[21]. There are also a number of databases and collations of national statistics published by international agencies such as the United Nations [22] and the OECD [23], most of which are available on the internet [24]. Although a number of attempts have been made to introduce common standards, nationally-published statistics are seldom directly comparable, except after extensive adjustments [25].

Many of the published series involve the quarterly or monthly collection of data concerning a specified topic such as an industry, a disease, or a category of crime or of economic activity, often by a comprehensive survey (and sometimes by sample surveys) of holders of the relevant data. Another type of source, which has been of value in medical, social and criminological research, is the occasional collection of comparative data from a group of people born in a particular place on a particular date, by a technique termed a "birth-cohort study" (such as the study of the people born in Britain in one week in April 1970[26], the study of people born in Philadelphia in 1958[27], and others[28]).

Links to national statistics agencies are available on the external links subpage.

Adjustments, reliability, accuracy and bias

Those who regard statistics as being on a par with the proverbial "damned lies"[29], and those who believe them always to be totally objective estimates, are both mistaken. The concept of an objective statistic is a mathematical abstraction and, although some, such as some financial statistics, approximate to that ideal, most published and unpublished statistics embody some element of subjective judgement. The embodiment of subjective adjustments in published statistics for the purpose of increasing their usefulness[30] is the responsibility of professional statisticians, whose competence and objectivity are usually subject to oversight by an independent body. (Politicians in the major democracies have seldom had any influence upon the collection and publication of national statistics, and many countries have sought to allay suspicions to the contrary by delegating those functions to public bodies that are free from possible government influence[31].)

It has seldom been feasible to assess the accuracy of published statistics, and indications of their reliability are available only when a variable is estimated from more than one source or when revisions are made. The substantial discrepancies between police-reported and victim-reported crime rates [32], for example, are presumably the result of differences in the reliability of the two methods, and the differences between income-based and expenditure-based estimates of GDP [33] may be similarly attributable. The major revisions that have been made to early estimates of GDP [34] also provide an indication of the reliability to be expected of such estimates.

The commonest cause of bias in published statistics has been faulty sampling. An early example was the "Lanarkshire Milk Trial", which yielded a perverse estimate of the effect of giving milk to school children because teachers had given the milk to sickly rather than to healthy children[35]. Similar errors that did not produce perverse results may have gone undetected.

Statistical inference

Although statistics is sometimes thought of as a branch of mathematics, some of its findings can be successfully interpreted by verbal inference, while others also require the use of the rules of chance that are set out in paragraph 1 of the tutorials subpage. The development of the statistics theory that has enabled more complex statistical problems to be tackled has been the work of mathematical geniuses such as Bernoulli, Laplace and Pascal, but the skills required for the effective use of statistics are different from those required for understanding the mathematical derivation of their theorems. Statistics theory is mainly concerned with the special circumstances under which events are governed solely by chance, without any influence from human action or recognisable natural forces, whereas applied statistics is usually concerned with its use under less restricted circumstances. Also, the terminology of statistical theory attaches precise meanings to some everyday words that may not always be applicable under less restricted circumstances. When, for example, a degree of statistical confidence is expressed in the statement that the strength of a type of steel will not fall below a stated level, that degree of confidence may apply only to certain circumstances - and not, for example, under conditions of widely varying temperatures. Thus the successful user of statistics has to combine an awareness of the theoretical tools of inference that are available with an appreciation of the extent to which they can safely be applied to a particular problem - if indeed they can be so applied, bearing in mind the financial disasters that have resulted from mistaken reliance upon statistics in situations containing deterministic risks[36].
The user who plans to employ those tools for the analysis of data must also be prepared to spend a good deal of time acquiring a grasp of the relevant theorems of statistics theory, and mastering the intricacies of the free statistical software that is available for that purpose. Managers who supervise such work, and users of its results, may seek to be excused from such expenditure of effort, but cannot escape responsibility for acquiring an understanding of statistical concepts that is at least sufficient for an awareness of the limitations of such analysis. And, as the statistician M J Moroney has emphasised, there can never be any question of making a decision solely on the basis of a statistical test: an engineer doing a statistical test must remain an engineer, an economist must remain an economist, a pharmacist a pharmacist[37].

A major contribution of statistics theory to the interpretation of results obtained from a sample is the quantification of the intuitive concept of "significance" in a way that enables an objective answer to be given to the question of how likely it is that what might appear to be information is really only a matter of chance (although the way that question is usually put by statisticians is "whether the result is significant at the 5 per cent level"). If - and only if - it can be established by other methods that the sample used was not biased, then one of a variety of statistical tests can be used to answer that question[38]. When established, the conclusion is best reported in jargon-free English, using a phrase such as "this result could arise by chance once in twenty trials". Equally useful is the quantification of the concept of "confidence", enabling an objective answer to be given to a question such as "how confident are we that the toxicity of a drug does not exceed a stipulated level?" (or that a structure can bear at least a stipulated load, or that a variable lies within a stipulated confidence interval). Among the most powerful of the techniques of statistical analysis is the use of "correlation" and "regression" to explore a relationship when the available evidence is subject to errors that are attributable solely to chance. It could be used, for example, to estimate the average constant of proportionality in a (hypothesised) linear relation between IQs and examination marks. In a more complex case it is used to explore the relationship between household income and household saving, taking account of other factors believed to affect that relationship. But, besides being very powerful, regression methods are especially prone to the production of false information. Success in interpreting the data requires the tackling of problems that are often more difficult than the statistical manipulation[39].
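The regression example above (a hypothesised linear relation between IQs and examination marks) can be sketched as an ordinary least-squares fit; the data below are invented and lie exactly on a line, so the fit is exact:

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for y ≈ slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented data lying exactly on y = 2x + 1:
slope, intercept = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

Real data scatter about the line, and the interpretive work the text warns about lies in judging whether the fitted slope reflects a genuine relationship rather than chance or a confounding factor.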

Statistical misinterpretation

Probability blindness

The human brain is so well adapted to intuitive judgements - of speed, distance, the pitch of a note and so on - that it is tempting to assume that it can also make intuitive probability judgements. In fact, such judgements are usually wide of the mark[40][41]. For example, most people will agree that it is intuitively very unlikely that two members of a group of two dozen people will share a birthday, but in fact it is more likely than not[42]. The experimental psychologists Amos Tversky and Daniel Kahneman have identified a variety of ways in which the use of "heuristics" in place of analysis distorts people's interpretation of numerical data[43]. The "availability heuristic", for example, leads to overestimates of the incidence of newsworthy events such as rail disasters, and underestimates of more commonplace occurrences such as road accidents. In a well-known series of experiments it was found that subjects tended to overestimate the likelihood of death from rare causes, underestimate that from likely causes, and be influenced by drama or vividness[44]. Similarly, there are commonly gross discrepancies between the fear of crime and its true incidence, and a widespread conviction that the crime rate is rising at times when it is stable or falling[45].
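The birthday result cited above can be checked directly: assuming 365 equally likely birthdays, the probability that all n birthdays are distinct is the product (365/365) × (364/365) × ... × ((365 − n + 1)/365), and the probability of at least one shared birthday is its complement.

```python
def prob_shared_birthday(n):
    """Probability that at least two of n people share a birthday (365 equally likely days)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

print(round(prob_shared_birthday(24), 3))  # 0.538 — better than even for two dozen people
```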

Medical misinterpretation

Errors in the reporting of medical research have not been uncommon. Researchers at the University of Girona in Spain found that 38% of a sample of papers in Nature, and a quarter of those sampled in the British Medical Journal - two of the world's most respected journals - contained one or more statistical errors, some resulting in non-significant findings being misrepresented as significant[46]. In another study, an eminent Greek statistician has estimated that most medical research findings are false, and that they may often be simply measures of a prevailing bias[47]. It is also believed that misinterpretation of the results of medical tests is not uncommon. When the following question was put to students and staff of the Harvard Medical School: "If a test of a disease that has a prevalence rate of 1 in 1000 has a false positive rate of 5%, what is the chance that a person who has been given a positive result actually has the disease?", 45 per cent gave the intuitive answer of 95%[48]: the true answer is about 2%[49].
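The Harvard question above is an application of Bayes' theorem; the arithmetic, assuming (as the puzzle implicitly does) that the test never misses a real case, runs as follows:

```python
prevalence = 0.001        # 1 in 1000 people have the disease
false_positive = 0.05     # 5% of healthy people nevertheless test positive
sensitivity = 1.0         # assume the test never misses a real case

# Total probability of a positive result (true positives plus false positives):
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive
# Bayes' theorem: probability of disease given a positive result.
p_disease_given_positive = prevalence * sensitivity / p_positive
print(round(p_disease_given_positive, 3))  # 0.02
```

The intuitive 95% answer ignores the base rate: among 1000 people, roughly 50 healthy people test positive for every 1 genuine case, so a positive result is far more likely to be a false alarm.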

Legal misinterpretation

There have also been cases of statistical misinterpretation by expert witnesses and lawyers, some of which have resulted in serious injustice[50]. In 1999 the British solicitor Sally Clark was convicted of murdering her two babies on the evidence of an expert witness who told the jury that the odds against the deaths being natural were one in 73 million, a figure he obtained by taking the square of the odds against a single cot death. The UK's Royal Statistical Society issued a formal statement that the 1 in 73 million figure "...would only be valid if SIDS (cot death) cases arose independently within families, an assumption that would need to be justified empirically. Not only was no such empirical justification provided in the case, but there are very strong a priori reasons for supposing that the assumption will be false"[51]. Wrongful convictions in rape cases have also arisen from a confusion between the probability of effect given cause and that of cause given effect, known as the "prosecutor's fallacy"[52].
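The squaring error described above can be shown numerically; the 1-in-8,543 single-death figure below is the one widely reported in accounts of the case, and the tenfold dependence factor is purely illustrative.

```python
p_single = 1 / 8543          # single cot-death probability cited in the case
p_naive = p_single ** 2      # valid only if the two deaths were independent
print(round(1 / p_naive))    # 72982849 — the "1 in 73 million" figure

# If a second death were, say, ten times more likely once a first had occurred
# (an illustrative dependence, not an estimate), the joint probability would be
# ten times the naive square:
p_dependent = p_single * (10 * p_single)
print(round(1 / p_dependent))  # about 7.3 million, not 73 million
```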

Financial misinterpretation

Financial analysts have been known to exploit "survivor bias" to mislead potential clients, by using statistics that demonstrate the benefits that clients would have enjoyed had they previously invested in the analysts' currently recommended portfolio - thus drawing attention away from previous recommendations that had been withdrawn because they had done badly. Another example of survivor bias in the interpretation of financial statistics was a study of financial returns that covered only the firms existing at the end of the period under review, thereby excluding those that had failed in the interim: it showed shares to have earned a return of 8.8 per cent when the true return had been 3.8 per cent[53].
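A small simulation, with entirely invented parameters, shows how survivor bias of the kind described above inflates an estimate of average returns:

```python
import random

random.seed(0)
# Invented setup: 1000 firms with a true mean annual return of 4% (sd 10%);
# firms losing more than 10% fail and vanish from the end-of-period record.
returns = [random.gauss(0.04, 0.10) for _ in range(1000)]
survivors = [r for r in returns if r > -0.10]

true_mean = sum(returns) / len(returns)
survivor_mean = sum(survivors) / len(survivors)
print(f"all firms: {true_mean:.3f}, survivors only: {survivor_mean:.3f}")
```

Because the failed firms are exactly the worst performers, averaging over survivors alone must overstate the return an investor in the whole market would actually have earned.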

Probably the most damaging consequence of statistical misinterpretation was the financial crisis that followed the acceptance by many of the world's financial institutions of the ill-founded risk assessments that were issued in the early 21st century by the credit-rating agencies [54]. The agencies had assumed that the risks would be due to random fluctuations and not to human behaviour, and that the underlying probability distribution would be the same as in a past period [55]. Neither assumption was supported by the available evidence [56].

Notes and references

  1. What Do Statisticians Do?, The careers center of the American Statistical Association
  2. Declaration on Professional Ethics, International Statistical Institute, August 1985 [1]
  3. Kent Holzinger: Decision Making Under Uncertainty: Statistical Decision Theory, University of Connecticut, 2007
  4. Climate adaptation: Risk, Uncertainty and Decision-making UK Department of the Environment, Food and Rural Affairs, May 2003
  5. Bayesian Metrology: Overview Topics, Statistical Engineering Division, National Institute of Standards and Technology, May 2001
  6. Appraisal and Evaluation in Central Government, HM Treasury, UK Stationery Office
  7. David Harper: Introduction to Value at Risk, Investopedia.com
  8. Frank Knight: Risk, Uncertainty and Profit Part III, Chapter IV, paragraph 47, Boston, MA: Hart, Schaffner & Marx; Houghton Mifflin Co, 1921
  9. Decision Tree: How to understand it, The Quality Toolbook, 2007
  10. For an explanation of net present expected value, see the article on net present value
  11. Introduction to Statistical Process Control, Statit Software, Inc, 2007
  12. Taguchi Loss Function Visual and Equation, Marc Timothy Smith Elsmar Cove Forums 2007
  13. What is Sequential Sampling?, The Quality Toolbook, 2007
  14. In a controlled experiment, a "control group", that is in all relevant respects similar to the experimental group, receive a "placebo", while the experimental group receive the treatment that is on trial
  15. Valerie Easton and John McCall: The Design of Experiments and ANOVA. STEPS 1997
  16. Anova Manova
  17. Econometrics 2005
  18. Valerie Easton and John McCall: Sampling, STEPS 1997,
  19. Robin Levine-Wissing and David Thiel, Confidence Intervals, AP Statistics Tutorial
  20. See the article on Gross domestic product
  21. Fedstats (an index of statistics published by US federal agencies)
  22. UNdata (United Nations statistics database)
  23. OECD Statistics Portal
  24. International Statistics Guide, University of Maryland Library. August 2008
  25. For example the United Nations International Comparison Programme for economic statistics[2]
  26. The British Cohort Study 1970, Centre for Longitudinal Studies,2009
  27. Robert M. Figlio, Paul E. Tracy, and Marvin E. Wolfgang: Delinquency in a Birth Cohort II: Philadelphia, 1958-1988, Inter-university Consortium for Political and Social Research, September 1999
  28. Cohort Studies Throughout the World, Centre for Longitudinal Studies, 2008
  29. The quip "lies, damned lies and statistics" is generally attributed to Benjamin Disraeli, but there have been various other attributions[3]
  30. Data Editing in Federal Statistical Agencies, Statistical Policy Working Paper 18, Federal Committee on Statistical Methodology, Office of Management and Budget, April 1990
  31. In the United Kingdom, for example, supervision is the responsibility of the UK Statistics Authority[4]. The standard adopted by European authorities are set out in an EU code of practice [5]. Arrangements in the United States are discussed in the Federal Register of 4th June 2002[6]
  32. An Overview of the Differences between Police-Reported and Victim-Reported Crime, Statistics Canada, 1997
  33. Bruce T. Grimm: The Statistical Discrepancy, WP2007-01, US Bureau of Economic Analysis, March 2, 2007
  34. See paragraph 3.4 of the article on gross domestic product
  35. "Student" (W S Gosset): The Lanarkshire Milk Experiment, Biometrika 23 1931 pp 398-406.
  36. Among the factors held to have contributed to the Crash of 2008 was the use of judgement-free statistical risk assessments in a situation containing deterministic risks, such as the bursting of a real estate bubble [7]
  37. M J Moroney: Facts from Figures, page 218, Penguin, 1951
  38. For example the procedure of the tutorial Tests of Significance and its following chapters , in Stat Trek Statistics Tutorials[8]
  39. M J Moroney: Facts from Figures, page 303, Penguin, 1951
  40. Daniel Kahneman and Amos Tversky: "Prospect Theory". Econometrica" vol 47 No2 1979
  41. Massimo Piattelli-Palmarini: Inevitable Illusions: How Mistakes of Reason Rule Our Minds, Chapter 4, "Probability Illusions", Wiley, 1994
  42. A proof of the birthday problem result is at paragraph 2.1 of the tutorials subpage
  43. Amos Tversky and Daniel Kahneman: Judgment under Uncertainty: Heuristics and Biases, Science, New Series, Vol. 185, No. 4157. September 27, 1974
  44. Sara Lichtenstein et al: Judged Frequency of Lethal Events, Journal of Experimental Psychology: Human Learning and Memory, v4 n6 p551-78 Nov 1978
  45. As noted by the UK Home Office Director of Statistics in a letter to the Times of 22nd January 2003
  46. Emili García-Berthou and Carles Alcaraz: Incongruence between test statistics and P values in medical papers, BMC Medical Research Methodology 2004, 4:13 [9] (reported in The Economist of 30th June 2004 [10])
  47. John P. A. Ioannidis: Why Most Published Research Findings Are False, PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124, August 2005
  48. Michael Eysenck and Mark Keane: Cognitive Psychology, page 483 [11]
  49. Proved in paragraph 2.2 of the tutorials subpage
  50. Evaluating Legal Evidence, Department of Computer Science, Queen Mary College, University of London, 2009
  51. Issues Raised in the Sally Clark Case, News release by the Royal Statistical Society, 23 October 2000
  52. The prosecutor's fallacy is explained at paragraph 2.3 of the tutorials subpage)
  53. Dimson, Marsh and Staunton: The Millennium Book: a Century of Investment Returns, ABN Amro, 2000
  54. See Sources of risk-management errors[12], on the tutorials subpage of "Crash of 2008"
  55. Written statement of Frank L. Raiter On “Credit Rating Agencies and the Financial Crisis” before the Committee on Oversight and Government Reform, United States House of Representatives, October 22, 2008
  56. Barry du Toit: Risk, theory, reflection: Limitations of the stochastic model of uncertainty in financial risk analysis, Riskworx, June 2004 [13]