Wednesday, 25 June 2014
Dear Social Scientist: The Stool Sample 25.06.14
Dear Social Scientist,
I am not one of you. I am watching your struggles from the outside, yet I think I understand your pain. Other scientists have it much easier. They can get funding and run experiments to prove or disprove theories. Nuclear physicists keep shooting particles at each other. Plant geneticists can produce tons of clones to abuse and then dissect. If entomologists want to find out the lethal dose of a pesticide, they can just take a lot of critters and poison them. You can’t do any of that. You can’t produce clones, subject people to experimentation or kill them. Even if your ethics didn’t stop you, it’d be hellishly hard to find funding for that. All you’re left to play with is “studies”. To the uninitiated, that can be translated into “asking people questions”, or, if you’re really pushing the boat out, “making up completely non-dangerous scenarios, and then asking people questions”. There’s nothing else you can do. It’s all stacked up against you, and it’s repressive and harsh and unfair.
To add insult to injury, apparently to call yourself a “scientist” these days you need to produce some “data”, so you can carry out some “statistical analysis” to back up your “theories”. You’re kidding me, right? Those white-coated nerds in the boring sciences may think that numbers are fun – hell, you’ve probably seen them around getting all excited over one ANOVA or another, leaping in joy at finding their data statistically significant, complaining about their variables being dependent – do they turn up at their house and ask for food and money, I wonder? – and so on. They may get a kick out of this stuff, but they are essentially soulless freaks.
You’re not like them. You got into your chosen field because you really care, because you feel passionately about these issues. You don’t want to sit in an office and crunch numbers for hours on end! You want to change the world! Plus, let’s face it, nobody told you that you’d have to be good at maths to do humanities. You are seriously into philosophy and ethics and real life, man. You’re just not a number person – you’re a people person. Do they care about that? They don’t. They’ve force-fed you through a basic statistics course, and now they’re demanding numbers, numbers, and more numbers, and they all want them to make sense.
I want to help you out. Statistics don’t have to be your enemy. We can make this nice and simple, with a handy step-by-step guide.
The first thing we need to work out is what statistics are all about. All those percentages and proportions and pie charts – what do they really mean? How does it all work? Fear not – it’s actually very simple.
Basically, if you want to find out something that applies to the whole population (e.g. the ratio of boys to girls, the average weight, the favourite political party – anything, really), you have two options:
- You can look at the entire population and find out what’s really going on.
- You can look at a sample of the population, and extrapolate.
Why don’t we just ask the entire population all the time? Well, it’s a teeny tiny bit impractical. There are a whole load of people out there, so it can be a touch expensive. Furthermore, even when we do run giant statistical efforts, such as universal censuses or elections, there are still a lot of people who just refuse to play ball. For all intents and purposes we’re stuck with asking a sample of the population, and then assuming that the answers we get from them apply to everyone else. That’s actually allowed in science, and it’s been shown to work. Unfortunately, there is a big catch. The whole process hinges on your sample being valid. And that, my friend, is where you appear to be going wrong – consistently, awfully, sadly wrong.
In order to carry out a lot of the simple statistical techniques that you seem to prefer, your sample has to be big enough and random. Check this out:
“Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined based on the expense of data collection, and the need to have sufficient statistical power. In complicated studies there may be several different sample sizes involved in the study: for example, in a survey sampling involving stratified sampling there would be different sample sizes for each population. In a census, data are collected on the entire population, hence the sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.”
“In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. This process and technique is known as simple random sampling, and should not be confused with systematic random sampling. A simple random sample is an unbiased surveying technique.”
…none of that went in, did it? Sorry about that. Let’s make it easy-peasy. Basically, a sample has to be pretty damn big for anyone to take your work seriously. If your sample is too small, your work will be worthless. Yes, there’s a huge size unfairness going on here. To make matters worse, you can’t make up for having a small one. You can’t work harder with a small sample, get some flashy moves on and, you know, still find a way to please your audience. It’s one of those horrid situations where size really matters and you truly can’t do a damn thing about it.
Furthermore, for a lot of the tests and analyses your sample ought to be random. What you are doing is effectively assuming that the people you are asking are representative of the entire population. If you pre-sort them in any way, you just can’t make that assumption anymore. There’s no wiggle room on that – any sort of pre-sorting just messes the whole thing up. That makes things rather tricky, if you think about it. You can’t pick people solely from within any given organisation, club, church, hobby, etc. If they all belong to or participate in something, anything, then the sample just isn’t random.
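If you'd rather see it than take my word for it, here is a minimal Python sketch of what pre-sorting does. Everything in it is made up for illustration: the population, the "club", the 10% membership rate and the two agreement rates are assumptions, not data from any study. It draws one simple random sample from everybody and one sample from club members only, then compares both to the truth.

```python
import random


def make_population(size=100_000, seed=1):
    """Build an invented population of (in_club, agrees) pairs."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        in_club = rng.random() < 0.10        # assumption: 10% belong to the club
        p_agree = 0.80 if in_club else 0.30  # assumption: members agree far more often
        people.append((in_club, rng.random() < p_agree))
    return people


def share_agreeing(people):
    """Fraction of the given people who agree with the statement."""
    return sum(agrees for _, agrees in people) / len(people)


population = make_population()
rng = random.Random(2)

# Simple random sample: every person has the same chance of being picked.
random_sample = rng.sample(population, 1000)

# Pre-sorted sample: same size, but drawn from club members only.
club_members = [person for person in population if person[0]]
club_sample = rng.sample(club_members, 1000)

print(f"whole population: {share_agreeing(population):.2f}")
print(f"random sample:    {share_agreeing(random_sample):.2f}")
print(f"club-only sample: {share_agreeing(club_sample):.2f}")
```

Both samples are the same size. Under these assumptions the random one should land close to the population figure, while the club-only one describes the club and nothing else.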
It gets worse. If you send your questionnaires into the wind hoping for people to be kind enough to answer, you won’t get a random return. People only tend to bother answering any sort of volunteer questionnaire or survey if they care about an issue. Most people who feel neutral about something just won’t waste the time. And even if people care about a subject, it tends to be those who have some problem with it who’ll feel the most compelled to answer. For instance, most business owners have to accept the fact that unless they force customers to review them, they will always get more complaints than compliments, regardless of how well they are actually doing. So, if your data collection is based on people volunteering their contribution, you are already pre-selecting your sample. Oops.
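The business owner's problem can be sketched in a few lines of Python. The customer mix and the response rates below are invented purely for illustration (assumptions, not measurements): a small unhappy minority answers often, while the neutral and the happy mostly can't be bothered.

```python
import random

rng = random.Random(42)

# Assumption: invented chance that a customer of each kind bothers to reply.
RESPONSE_RATE = {"unhappy": 0.30, "neutral": 0.02, "happy": 0.03}

# Assumption: an invented customer base that is 10% unhappy, 30% neutral, 60% happy.
customers = rng.choices(
    ["unhappy", "neutral", "happy"], weights=[10, 30, 60], k=50_000
)

# Volunteer survey: a customer only ends up in the data if they choose to reply.
responses = [c for c in customers if rng.random() < RESPONSE_RATE[c]]


def share(group, label):
    """Fraction of the group carrying the given label."""
    return group.count(label) / len(group)


print(f"unhappy among all customers: {share(customers, 'unhappy'):.2f}")
print(f"unhappy among respondents:   {share(responses, 'unhappy'):.2f}")
print(f"complaints vs compliments:   "
      f"{responses.count('unhappy')} vs {responses.count('happy')}")
```

With these made-up rates, roughly one customer in ten is unhappy, yet complaints outnumber compliments in the replies. Nobody lied; the sample simply selected itself.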
If your sample isn’t right, then you can’t make that leap that goes from it to the whole population. Whatever you have found out can only be used to describe the particular group you’re looking at. Any attempt to do otherwise is a misuse of statistics and very poor science indeed. If you’ve been wondering why you’ve been working so hard on all those “studies” that people ignore or laugh at, now you know. Those meanies are not so set in their ways that they can’t accept your point of view. They are not deliberately ignoring the truth because it clashes with their primitive beliefs. Your sampling is atrocious, so your whole study is just a joke.
Now, I know this may be a bit difficult not only to accept but also to absorb. It’s all complicated and dry concepts. It’s hard to grasp them and even harder to care. In order to help you out further, I’m going to give you some examples along with a clear visual-emotive reminder: the sad kitten.
You make bad statistics, you make kitten sad.
“A study of 22 people….”
A study of 22 people is not a scientific study, because a sample of 22 is so small as to be utterly ludicrous. As a general guideline, any study with a sample size so small that you could fit all participants on a bus makes kitten sad. Seriously, you need to aim a lot higher. Think multiplexes, at the very least.
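To put a rough number on "utterly ludicrous", here is a small Python sketch using the textbook 95% margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n). It assumes a simple random sample and the worst case p = 0.5, and the normal approximation behind it is itself shaky at n = 22, so treat the output as a ballpark rather than gospel. The function name and the sample sizes in the loop are my own choices for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumes the normal approximation; p=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (22, 100, 1000, 10000):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

For 22 people this works out at about 21 percentage points either way, so "half of respondents agreed" could mean anything from under a third to over two thirds. At 1,000 people the margin drops to about 3 points, and that is before any of the pre-sorting problems are taken into account.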
“A study of 22 university students….”
The sample is not only pitifully small, but it is pre-sorted. University students are a pre-sorted group – not only do they tend to be of a certain age and socio-economic class, but they also go through a specific selection process. Non-random sampling makes kitten sad.
“A study of 22 Social Studies students….”
No, no, and thrice no. Your sample is too small and pre-sorted twice. Anyone who decides to spend money and time on studying a certain topic so they can make it their life’s work has a definite interest. This interest is likely not only to colour their point of view, but to be a reflection of their background and beliefs. Very non-random sampling makes kitten very sad.
“We sent a questionnaire to all Social Studies students at this University, and out of 22 responses….”
Seriously – you're killing the kitten. What you are looking at is a pathetically small selection of answers from people who are affected by the issue you are investigating, share a common interest, AND belong to a selected group. Carry on if you really want to, but don’t try to sell this as science.
I honestly hope this doesn’t upset you. I am not judging you – I am just trying to help you. I was once like you, and just couldn’t see what numbers meant. It all changed for me when I finished third grade, but we all grow at our own rate. There is no shame in this. I can honestly promise you that if your studies were only worth the paper they are printed on, people would probably really listen and care. At the moment, you’re cutting trees down for nothing. And that makes kitten sad.