TL;DR: Science has biases, but there are ways to reduce these biases.
I’ve been bingeing lately. No, not on Netflix. Way nerdier than that. On a podcast. About economics.
I love Planet Money
The quintessential “boring high school” movie scene is probably this one from Ferris Bueller’s Day Off. In it, Ben Stein bores his students to death by describing the Smoot-Hawley Tariff Act.
Can you GET any more boring than this? Anyone? Anyone?
Believe it or not, Planet Money did a podcast episode on the same topic and it was fascinating. It was so fascinating that I’d be willing to bet that high school students would stay awake for its entirety. That’s why I love Planet Money.
Planet Money takes on science
I recently listened to a Planet Money episode called The Experiment Experiment. It shocked me because it revealed some real limitations of science. It also got me thinking harder about the “science” behind User Experience design, metrics, and research.
The scientific method
The scientific method is a rigorous, disciplined approach to experimentation. It involves forming a question, coming up with a hypothesis, performing an experiment, then analyzing the results. Before a scientific study is published, it must be peer-reviewed: other scientists check that the experimenters followed the scientific method.
Imagine you see a headline that says that chocolate actually makes you thinner. You might doubt or question this unbelievable claim. But there’s one magic word that will probably stop you in your tracks. One word that turns doubters into believers: “science”.
Chocolate does not help you lose weight. The study’s authors measured many outcomes in a small group of participants and cherry-picked the one that looked good; the hypothesis wasn’t defined in advance. The study also wasn’t peer-reviewed. Media outlets should have caught these problems before reporting the findings as truth. Many didn’t. The whole thing was a journalist’s exposé of pseudoscience.
Unfortunately, even statistically significant, peer-reviewed studies can be “fake news”.
Scientific studies use something called statistical significance to describe how likely it is that a result is a fluke rather than a reflection of reality. For example, a 5% significance threshold means there’s only a 5% chance of seeing a result at least this extreme if no real effect exists.
A fluke result is not likely to be repeatable.
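To make this concrete, here’s a toy simulation in Python (my own sketch, not from the episode): we “study” a perfectly fair coin over and over, and count how often a single run looks like a biased coin purely by chance. Every threshold and number here is made up for illustration.

```python
import random

random.seed(42)

def coin_experiment(n_flips=100):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

# Run the "same study" many times. The coin is genuinely fair, yet
# some runs will look like the coin is biased -- those are flukes.
n_studies = 10_000
flukes = sum(1 for _ in range(n_studies) if coin_experiment() >= 60)

fluke_rate = flukes / n_studies
print(f"Runs that looked 'significant' by chance: {fluke_rate:.1%}")
```

With a fair coin, only a few percent of 100-flip runs land on 60+ heads, so a small but real fraction of identical studies will look “significant” by luck alone. Repeat the study and the fluke usually disappears.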
Back to the experiment on experiments from the Planet Money episode. Psychologists chose 100 peer-reviewed studies and repeated them. Only 39 of the repeats confirmed the findings of the original studies; the other 61 came up with different results.
The file drawer effect
Studies with boring, unsurprising results are often not published. 97% of studies published in psychology describe a “positive result”, meaning that they discovered something new. Only 3% of published studies confirmed things that psychology already knew (a “negative result”).
For unpublished studies, the numbers are probably much different. There are piles of studies that get done, produce a boring “negative result”, and end up in the file drawer instead of being submitted to a publication. Hence the name “file drawer effect”.
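A toy simulation (my invention, not from the episode) shows how the file drawer distorts the published record: every study below measures an effect that is truly zero, but only the lucky “positive” ones get published, so the published average looks like a large effect.

```python
import random

random.seed(1)

def run_study():
    """A study of a treatment with NO real effect: the measured
    difference is pure noise. Call it 'positive' if it looks big."""
    diff = random.gauss(0, 1)          # true effect is zero
    return diff, diff > 1.65           # crude one-sided 5% threshold

all_studies = [run_study() for _ in range(1000)]
published = [d for d, positive in all_studies if positive]    # flukes only
filed_away = [d for d, positive in all_studies if not positive]

avg_published_effect = sum(published) / len(published)
print(f"Published: {len(published)} studies, "
      f"average 'effect' {avg_published_effect:.2f}")
print(f"File drawer: {len(filed_away)} studies")
```

The true effect is zero in every study, but a reader who only sees the published ones would conclude the treatment works.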
Sample size bias
Researchers can also skew results by expanding a study to a larger and larger sample size until they get an interesting “positive result”. In other words, if a 15-person study doesn’t show any link between chocolate-eating and health, double it and see if a 30-person study does.
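Here’s a hypothetical sketch of that bias, sometimes called optional stopping: keep adding participants to a study of a non-existent effect and stop the moment it looks “significant”. Even though the nominal threshold is 5%, peeking after every batch inflates the false-positive rate well above that. All the batch sizes and limits are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(7)

def peeking_study(batch=15, max_n=200):
    """Add participants in batches and stop as soon as the (null) data
    crosses a naive 'significance' threshold -- optional stopping."""
    data = []
    while len(data) < max_n:
        data.extend(random.gauss(0, 1) for _ in range(batch))
        mean = statistics.fmean(data)
        se = statistics.stdev(data) / len(data) ** 0.5
        if abs(mean / se) > 1.96:      # looks "significant" at this peek
            return True
    return False

trials = 2000
false_positive_rate = sum(peeking_study() for _ in range(trials)) / trials
print(f"False-positive rate with peeking: {false_positive_rate:.1%}")
```

A single fixed-size study with the same threshold would be wrong about 5% of the time; peeking after every batch makes it wrong far more often.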
Science doesn’t always rule
All of these issues are nicely summarized by the following xkcd comic:
In this comic, researchers test to see if there’s a link between jelly beans and acne. They find no link. This is a negative result. Later, they decide to test different colors of jelly beans. 19 of the color studies find no link (again, negative results). But 1 study does find a link. This single positive result is the one that makes the front page of the newspaper.
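The comic’s arithmetic can be checked with a quick simulation (mine, not xkcd’s): run 20 independent studies of a non-existent jelly-bean/acne link and see how often at least one color produces a front-page “result”. The sample sizes and threshold are made up for illustration.

```python
import random

random.seed(3)

def jelly_bean_study(n=50):
    """One color's study: acne counts in two groups of n people.
    No color actually causes acne; any difference is pure noise."""
    treated = sum(random.random() < 0.2 for _ in range(n))
    control = sum(random.random() < 0.2 for _ in range(n))
    return abs(treated - control) >= 8   # crude stand-in for p < 0.05

# Repeat the comic's scenario many times: 20 colors, 20 separate studies.
trials = 2000
front_pages = sum(
    any(jelly_bean_study() for _ in range(20)) for _ in range(trials)
)
rate = front_pages / trials
print(f"At least one color 'linked' to acne in {rate:.0%} of trials")
```

Each individual study is only wrong about 5% of the time, but with 20 chances, a spurious front-page link turns up in most trials.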
How to fix biases in scientific studies
Yes, the media should keep checking sources before reporting claims as fact, to prevent spreading misinformation.
Yes, the general public can check sources before believing or sharing fake news.
But the onus should be on the scientific community.
Luckily, there’s a solution.
“Before you do the study, you write down how you’re going to do it, how you’re going to analyze your data, and what you’re going to try to learn.” (Brian Nosek, psychologist, on Planet Money, January 15, 2016)
Fields such as drug research have adopted this practice, leading to huge reductions in the file drawer effect: more visibility into the “boring” study results, and less room for sample size bias.
This “study registry” is another layer of rigor that can be applied on top of the scientific method. When done correctly, it can give the general public, the media, and other scientists more confidence in study results.
User experience and science
User experience (UX) doesn’t always have the most disciplined approach to research because it doesn’t always need to. In other words, we rarely actually apply the scientific method.
As UX designers and researchers, we need to communicate the limitations of our conclusions. We should also work to reduce the bias in our work.
Don’t call it science
If it’s not science, don’t call it science. There’s nothing wrong with running a quick-and-dirty usability test with your friends and family. The problem comes when you present findings from that test as fact. Your friends and family are probably not just like your real customers, and your small sample size leaves a lot of room for flukes.
Be honest and transparent about your approach and limitations.
A quick win is to report results as counts of “test participants” instead of percentages of “users” or “customers”.
Bad: 80% of users completed the purchase.
Good: 4/5 test participants completed the purchase.
Define your study up front
As a UX designer, after you launch a new feature, you might watch the analytics data over the next few months. Looking at the data, you could probably cherry-pick one metric or one time period to show that the feature is a smashing success.
To prevent this, define your success metrics or your research study beforehand.
Again, this prevents the sample size bias and the file drawer effect.
The same thing goes for quantitative research. Don’t just pay for 20 users at usabilityhub, then “see if the results look right” and pay for 20 more if not. Commit up front. Put it in writing. Then do it.
This Planet Money episode really rattled me. I used to take “science” as gospel. Now I have a little more insight into the limitations of science, as well as a healthy (I hope) dose of skepticism. And I hope you do too.
If you’re a scientist, commit to your study methodology before running the study.
If you’re not applying the scientific method, don’t call what you do “science”.
If you’re in the media or reading the media, check your sources.