
Monday, February 27, 2017

Stat-ception II: How to fix statistics in psychology



Stat-ception Part II

I'm a star!

OK, my public speaking skills may not exactly have made me a star (yet!), but I AM on YouTube! Below, I've included a link to my recent (Feb 2017) Cognition Forum presentations, which cover the weaknesses of current statistical practice in psychology, along with my current thinking about easy, immediately implementable solutions to ameliorate those weaknesses.

The first video goes into depth about the issues; the second describes my proposed solutions to those problems.

https://www.youtube.com/playlist?list=PLvPJKAgYsyoKcGOCKEYT2GyzK0yLVXvzN

For your viewing pleasure, I've also embedded the videos here:
 


Any feedback or advice is welcome!

I've also made the slideshows available on Google Drive. Here's the link to the first slideshow, so you can follow along: https://drive.google.com/file/d/0B4ZtXTwxIPrjTktiMGdoQ3JBSHM/view. And here's the link to the slideshow for the second video as well: https://drive.google.com/file/d/0B4ZtXTwxIPrjalZxdFJfUWNKTVU/view?usp=sharing

A draft of my manuscript on the topic (intended for eventual publication) is freely available for download at https://osf.io/preprints/psyarxiv/hp53k/. Since I'm an advocate of the open science movement, it's only right that I make my own work publicly available, which is why I uploaded these videos (and my manuscript) to public repositories.

You may not trust my own take on these issues, in which case I commend you for your skepticism! In the videos, I made numerous references to Ziliak & McCloskey (2009), Gigerenzer (2004), and Open Science Collaboration (2015)--all are worth reading, for anyone who cares about scientific integrity and the research process. All three works were highly influential in my thinking on this topic, though I cited a variety of other papers as well in my aforementioned manuscript.

You may disagree with my recommendations in the second video, and if so, that's okay! How to address the limitations of NHST and fix science is absolutely a discussion worth having; I advance my own ideas in the spirit of jump-starting such a discussion.

So, please put your thoughts in the comments, and share my work with colleagues who may be interested in the topic!

Monday, February 20, 2017

Stat-ception: Everything you think you know about psych stats is wrong!




In the spirit of open science, I have posted a video of a talk on statistical practice that I gave in the Cognition Forum at Bowling Green State University.


This talk was in two parts: the first part summarizes many of the common objections to null hypothesis significance testing (NHST) that thinkers have raised over the decades, and the second part goes over my current recommendations for tackling the problem.

 

Part I is available at https://youtu.be/JgZZkMJhPvI; Part II is forthcoming! I've also embedded the video right here:

You can view and download the full slideshow at https://drive.google.com/open?id=0B4ZtXTwxIPrjTktiMGdoQ3JBSHM. The free (and very easy-to-use!) statistical program JASP can be found at https://jasp-stats.org/. JASP is useful if you want to run the analysis on the precision-vs-oomph example that I discuss at the end of the video (at the 39:41 mark).

I have already tackled some of the issues with NHST on more than one occasion in prior posts here, and I have also provided a practical guide to psych stats as a freely available educational resource!

There are a variety of excellent papers on the topic of statistical practice in social science fields; my working paper on the subject summarizes them. In the interest of open science, I've made this working paper available at https://osf.io/preprints/psyarxiv/hp53k/. Other great resources on the topic include Gigerenzer (2004) and Ziliak & McCloskey (2009), which are also freely available.

Sunday, January 8, 2017

What You Think You Know About Psychology is Wrong




What You Think You Know About Psychology is Wrong:
The limitations of null hypothesis significance testing

By: Zach Basehore

Are college students psychic?!

Let's say someone claims that people who go to college are more psychic than people who do not attend college. So I decide to test this claim!

How would I do that? Well, a simple test would be to examine people's ability to correctly predict whether a coin will land on heads or tails when I flip it. There are 10,000 college students and 10,000 non-college students; each person predicts the results of 100,000 coin flips, one flip at a time.

The results:
Each participant had a proportion of correct predictions. The mean proportion of correct predictions among college students was .50006 (that is, 50.006% correct), and the non-college-students had a mean proportion of correct predictions equal to .49999. The SDs are .00160 and .00155, respectively.

When you run an independent-samples t test, this difference is statistically significant at an alpha level of .01! The 95% CI for the difference is also quite narrow (indicating that the difference between the group means has been estimated very precisely).

So the statistical test gives us very strong evidence that college students really are more prescient than non-college students! We've made a new discovery that revolutionizes our understanding of the human mind, and opens up a whole new field of inquiry! Why are college students more psychic? Is it because they're smarter? More sensitive? Do they pay closer attention to the world around them?

The problem:
In this example, I've found evidence of psychic abilities! Specifically, I've shown that college students predict the outcome of coin flips more accurately than non-college students, and there's less than a 1% probability that the difference I found is due to chance alone, if the null hypothesis is true at the population level)! How exciting—I can establish a huge name for myself among scientific psychologists, and have my pick of schools at which to continue my groundbreaking research! I could continue this research at Oxford… nah, let's find a better climate; like Miami or USC. I could get multi-million-dollar grants to fund an elaborate lab with fancy equipment! I can give TED talks, write books and go on lucrative speaking tours...my research will grab headlines the world over! I’ll be a household name!

The gut-check:
But wait a second...what was the actual difference again? On average, college students are right on 7 more trials (out of 100,000) than non-college students?...

Any time you gather real-world data, you'd expect there to be some small difference between groups, even if it's really not due to any systematic effect. In the research described above, everything happened in just the right way to give me a spurious result:
  1. low variance within each group [thanks in part to the enormous number of flips per participant; see the law of large numbers];
  2. a small but statistically significant difference that can easily be explained by a seemingly reasonable mechanism; and
  3. a very large sample.
These factors explain how I found a statistically significant difference between college students and non-college students despite the tiny difference in means.
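
If you'd like to see how this plays out with actual numbers, below is a minimal simulation sketch in Python (assuming NumPy and SciPy are installed). The group sizes, flip counts, and the tiny "true" advantage are taken straight from the example above; everything else is just illustrative.

```python
# Minimal simulation of the "psychic college students" example above.
# Numbers mirror the example; the seed is arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2017)

n_per_group = 10_000   # participants in each group
n_flips = 100_000      # coin-flip predictions per participant

# Give college students a minuscule "true" advantage, as in the example.
p_college, p_noncollege = 0.50006, 0.49999

# Each participant's score is their proportion of correct predictions.
college = rng.binomial(n_flips, p_college, n_per_group) / n_flips
noncollege = rng.binomial(n_flips, p_noncollege, n_per_group) / n_flips

t, p = stats.ttest_ind(college, noncollege)

pooled_sd = np.sqrt((college.var(ddof=1) + noncollege.var(ddof=1)) / 2)
cohens_d = (college.mean() - noncollege.mean()) / pooled_sd

print(f"t = {t:.2f}, p = {p:.5f}, Cohen's d = {cohens_d:.3f}")
# With 10,000 people per group, p will often sneak below .01,
# even though Cohen's d hovers around a trivial 0.04.
```

Run it a few times with different seeds: the p-value bounces around, but the effect size never stops being tiny.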

Excited by the significant result and the chance to trumpet my new 'discovery' [thereby launching a career, positioning myself as an expert who can charge ridiculously high consulting or speaking fees], I've failed to critically evaluate the implications of my results. And therefore, I've failed as a scientist. :(

How can we avoid falling into that trap?

One solution:
A standardized measure of effect size, like Cohen's d, will reveal what SHOULD be obvious from a look at the raw data: this difference between groups is tiny and practically insignificant, and it shouldn't convince anyone that college students are actually psychic!

In the spirit of scientific inquiry, you can test this for yourself! At GraphPad QuickCalcs, enter a mean of .50006 for Group 1 and .49999 for Group 2. Next, enter the SD of .00160 for Group 1 and .00155 for Group 2. The N for each group is 10000. Hit "Calculate now" and see what you get.

Now, enter the same means and SDs, but change the N to 100 for both groups, and observe the results.

Then, go to the Cohen's d calculator here and enter the same information (it doesn't ask for sample size). So what does all of this information mean?…

I’ve already done the easy part for you:

[Screenshot: t test results for the sample of 20,000 (10,000 per group)]

[Screenshot: t test results for the sample of 200 (100 per group)]

[Screenshot: Cohen's d calculation]
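
If you'd rather skip the online calculators, here's a rough Python equivalent (again, just a sketch assuming SciPy is available); it runs the same t tests directly from the summary statistics above and then computes Cohen's d by hand.

```python
# Re-running the QuickCalcs exercise from summary statistics alone.
from math import sqrt
from scipy import stats

m1, sd1 = 0.50006, 0.00160   # college students
m2, sd2 = 0.49999, 0.00155   # non-college students

for n in (10_000, 100):      # per-group N for the two scenarios above
    t, p = stats.ttest_ind_from_stats(m1, sd1, n, m2, sd2, n)
    print(f"n per group = {n:>6}: t = {t:.2f}, p = {p:.4f}")

# Cohen's d is the mean difference divided by the pooled SD,
# so it doesn't change with sample size at all.
pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
d = (m1 - m2) / pooled_sd
print(f"Cohen's d = {d:.3f}")   # roughly 0.04: a negligible effect
```

Either way, the take-away matches the screenshots: the difference looks 'significant' with 10,000 people per group, unimpressive with 100 per group, and Cohen's d stays tiny in both cases.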


***

Statistical significance is a concept that has been called idolatrous, mindless, and an educational failure that proves essentially nothing! But every psychology major and minor has to learn it nonetheless...

The absurd focus on p-values in many social science fields (like psychology, education, economics, and biomedical science) leads to articles like John Ioannidis's highly influential piece "Why Most Published Research Findings Are False", which has been cited over 4,000 times!

A variety of ridiculous conclusions have been published based on small p-values, such as:

This is exactly why I pound the figurative table so hard about using effect sizes and well-designed, targeted experimental research. Don't just run NHST procedures on autopilot, or collect a huge dataset and mine for significance, or draw conclusions based solely on the arbitrary p < .05 standard.

But that's not how math works! How is the .05 standard arbitrary? And where did it come from? Well, Gigerenzer (2004) identifies the source of this practice as a 1935 textbook by the influential statistician Sir R.A. Fisher, and Gigerenzer also notes that Fisher himself wrote in 1956 that the practice of always relying on the .05 standard is absurdly academic and is not useful for scientific inquiry!

So, one of the early thinkers on whose work current psychological statistical practice is based would likely recoil in horror at what has become of statistical practice in our field today! [Note, however, that Cowles and Davis (1982) identified similar, though less absolute, rules about an older statistical practice called probable error.]

Remember the greatest scientific discoveries: gravity, the laws of thermodynamics, Darwin's description of natural selection, Pavlov's discovery of classical conditioning. Not one of them relied on anything like p-values.

Not

One.

There is truly no substitute, none whatsoever, for thinking critically about the quality of your research design, the strengths and limitations of your procedure, and the size and replicability of your effect. Attempts to automate interpretation based on the .05 standard (or any such universal line-in-the-sand!) result in most researchers pumping out mounds of garbage and hoping to find a diamond in the rubbish heap, rather than setting out specifically to find a genuine diamond...

Conclusion? The validity of most psychological research is questionable (at best)! We're taught to base research around statistical procedures that are of dubious help in understanding a phenomenon, and our work is almost always published solely on that basis! This pervasive problem will not be easy to fix: we need the entire field to stop doing analyses on autopilot, and to start thinking deeply and critically!

The most powerful evidence is, and will always be, to show that an effect occurs over, and over, and over again.

*** 
If you need further explanations, here are a couple of helpful links:
Some interesting links on the investigation of people who claim to have paranormal powers:

Wednesday, November 30, 2016

A practical guide to Psych Stats



I've previously found the document "Reporting Statistics in Psychology" highly useful, so I made a presentation for a stats course that I think is worth sharing! My own guide, a supplement of sorts, covers a slightly broader range of topics than that document, and it also offers a 'bottom-line' approach that should help people who just want to know what they should do!


Mine is called "A practical guide to Psych Stats," and I've made a freely downloadable PDF of that presentation available here:
https://drive.google.com/open?id=0B4ZtXTwxIPrjUzJ2a0FXbHVxaXc


This is probably going to be useful to you if any of the following are true:

Early-career/inexperienced students:
  • You've been unsure which test is appropriate for a certain dataset
  • You've struggled to understand psych stats from a conceptual perspective
  • You've struggled to write up statistical results in APA style
  • You've wished there was an easier-to-use stats program
  • You've wished there was a free stats program that you can run on your own computer
 More experienced/advanced students:
  • You've thought that null hypothesis significance testing (NHST) procedures didn't make sense
  • You think that the APA's reporting standards for statistical tests aren't stringent enough
  • You're not sure how to interpret standardized measures of effect size
  • You want to know a little bit more about Bayesian statistics
  • You're not sure how to interpret your Bayesian statistics
  • You're looking for a free/better/more user-friendly/more widely-compatible stats program to run on your own computer
Instructors:
  • You're looking for a quick, easy, free, relatively brief resource to guide your students through the morass that is psych stats
    • Bonus: links are embedded! :D
      However, for best effect, you must download the PDF, as the online preview version may randomly insert characters that will break the links :(
 Enjoy, and I hope you find this helpful!
