"It's not about the figures themselves, it's what you do with them that matters."
Excerpts from The Undoing of Lamia Gurdleneck by K. A. C. Manderville
"Statistics lie. They are designed to sway opinions. Take the time to keep yourself informed on things that matter."
Dr Steve Maraboli
"There are three types of lies -- lies, damn lies, and statistics."
Benjamin Disraeli
To many people, statistics probably seems a humdrum subject. Few realise, however, that statistical findings arguably affect every minute, even every second, of their lives: the make and model of the mobile phone that wakes them in the morning, the toothpaste they use, the taste and price of the coffee they drink, the quality and price of the bread they eat, the jam they spread on it, and so on from the moment they step out of the house, whether they are awake or even asleep.
Almost all major products and services that consumers use in everyday life reflect decisions based on statistical findings. But how do these statistics come about? Many would be aware that it is usually impractical to take a census of an entire population on a given matter. Assume a population, or consumer base, of a million. Given time and financial constraints, hardly anybody, not even a government, would conduct a census every time a statistical finding is needed as a basis for a decision. This explains why a general census covering population size, age profile, spending patterns, education level, unemployment, healthcare needs and so on is usually taken only once a decade or even less often. The cost and time needed are prohibitive. By the time a million people have been surveyed and the results tabulated, processed and analysed, many months or even years would have elapsed and some of the data might be obsolete, for example data on people's tastes, preferences and budgets for fast-moving consumer goods or technology products such as mobile phones.
In practice, a small sample of size n is drawn from the whole population of size N, and conclusions about the population are inferred from it. Big decisions are often based on, and justified by, such findings: government policies, infrastructure, education, bus and train arrivals, housing, tax increases, food prices, rents, clothing, medicines, social welfare and so on. But are the statistics published to justify a decision always correct, or are some intended to mislead, through selective omission or the choice of statistical methodology?
Which summary measure is reported: the mean, the median or the mode? What sample was used? Was it a random sample or a biased one? Who designed the statistical model, and with what intentions? Is the inference based on a single sample or on several? What was the sample size, and what is a good sample size: 50, 500, 5,000? When a published statistic is based on a survey, are the relevant details disclosed, such as the sample size, the measures used and the methodology behind the inference? Readers of published statistics would be well served by adopting a critical approach and keeping such questions in mind.
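To put rough numbers on the sample-size question, here is a minimal sketch using the textbook normal approximation for the margin of error of a sample proportion. It assumes a simple random sample, a yes/no survey question, the worst case p = 0.5 and a 95% confidence level (z = 1.96). The function name margin_of_error and the population of one million are illustrative choices, not taken from any particular survey.

```python
import math


def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). When a
    population size N is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied as well.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        moe *= math.sqrt((population_size - n) / (population_size - 1))
    return moe


# Illustrative population of one million, as in the example above.
for n in (50, 500, 5000):
    moe = margin_of_error(n, population_size=1_000_000)
    print(f"n = {n:>5}: about +/- {moe * 100:.1f} percentage points")
```

Under these assumptions the margin of error shrinks from roughly 14 percentage points at n = 50 to about 4.4 at n = 500 and about 1.4 at n = 5,000. The formula only describes random sampling error; it says nothing about a biased sample or a badly worded question.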
What exactly does a statistical finding provide? Is it proof that something is true? That is a common misconception. As noted above, time, financial and other constraints usually make a census of the entire population impractical, so a small sample is used instead. A statistical finding is therefore an estimate, subject to uncertainty, of what a census of the entire population would have yielded. In other words, the conclusion depends on how the data are used or misused, and on choices made in the design of the study even before any data are collected. Can a sample of a few thousand be representative of a population of a few million or tens of millions? How were the data collected, and how accurate are they? Who designed the statistical model in the first place? Are they biased? What are their experience, competency and track record? Many things can go wrong before the data are even collected, let alone processed and analysed. If the findings are erroneous, can the conclusion inferred from them be accurate at all? Would not any estimate, inference or conclusion be erroneous if the whole exercise is flawed?
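Whether a few thousand respondents can stand in for a million depends far more on how they are chosen than on how many there are. The small simulation below is only a sketch under made-up assumptions: a hypothetical population of one million in which 30% would answer "yes", a simple random sample of 2,000, and an invented bias mechanism in which "yes" people are three times as likely to respond.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: one million people, 30% of whom say "yes".
population = [1] * 300_000 + [0] * 700_000
true_share = sum(population) / len(population)

# Simple random sample: every person is equally likely to be picked.
random_sample = random.sample(population, 2_000)
random_estimate = sum(random_sample) / len(random_sample)

# Biased sample (assumed mechanism): "yes" people are three times as likely
# to respond. random.choices draws with replacement, which is adequate here.
weights = [3 if answer == 1 else 1 for answer in population]
biased_sample = random.choices(population, weights=weights, k=2_000)
biased_estimate = sum(biased_sample) / len(biased_sample)

print(f"true share:    {true_share:.1%}")
print(f"random sample: {random_estimate:.1%}")
print(f"biased sample: {biased_estimate:.1%}")
```

The random sample should typically land within a few percentage points of the true 30%. The biased sample, by construction, is expected to report about 56% (0.9 / 1.6), and taking a larger biased sample would not fix that.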
In everyday life, when statistics are used to measure the well-being of a society, for example the general income of the population, does the median income or the GDP per capita paint the more accurate picture? Policies such as welfare and minimum wage legislation are often based on such findings. Should outliers, such as the incomes of the billionaires and multi-millionaires in the top 0.1%, be omitted from the calculation so that the finding is not skewed? When GDP per capita is used as the measure, will the arrival of a few new immigrants who are billionaires distort the data and paint a picture different from the reality on the ground?
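The effect of a single outlier on an average can be seen with a toy example. The income figures below are invented for illustration, and the mean income is used only as a rough stand-in for a per-capita average; GDP per capita itself is computed from national output rather than from personal incomes.

```python
import statistics

# Hypothetical annual incomes for a community of 1,000 residents.
incomes = [30_000] * 400 + [50_000] * 400 + [90_000] * 200

# The same community after one billionaire moves in (assumed income: 1 billion).
with_arrival = incomes + [1_000_000_000]

for label, data in (("before", incomes), ("after", with_arrival)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:>6}: mean = {mean:>12,.0f}   median = {median:>8,.0f}")
```

In this sketch the mean jumps from 50,000 to a little over a million, while the median stays at 50,000. Nobody in the original community is any better off, yet the average suggests otherwise, which is why it matters to ask which measure a published figure is using.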
Introductory statistics courses cover basic concepts such as probability, sampling, discrete distributions (discrete uniform, Bernoulli, binomial, geometric, Poisson), continuous distributions (normal, Student's t, chi-square and so on), expectation and variance, and point estimation. Readers and consumers need not be directly involved in the statistical process. However, as stakeholders in society, they may benefit from understanding certain statistical concepts, so that they know how findings are arrived at and can bring a critical mind to them instead of "swallowing wholesale" findings whose underlying process and data could have been "dishonestly dealt with".
PSS
*The writer blogs at http://pro-sustainable-sg.blogspot.sg/