Showing posts with label understanding stats. Show all posts

Tuesday, May 29, 2018

Lies, Damned Lies, and Statistics




In an interesting post, Michael Batnick, the Irrelevant Investor, makes a critical point about the oft-overlooked limitations of data in the world of behavioral finance: http://theirrelevantinvestor.com/2018/04/04/the-limits-to-data/

Using Excel shows you how a robot should allocate its lottery winnings.
It doesn't show you that 70% of human lottery winners go bankrupt.

Darwin famously didn't trust complicated mathematics ("I have no faith in anything short of actual measurement and the Rule of Three," he wrote in a letter). He wasn't wrong: complex procedures can obscure what's going on 'under the hood.' This can render a formula's weaknesses virtually invisible.

Have you heard about the studies showing that irrelevant neuroscientific information in a research summary makes people rate the conclusion as more credible? The same seems to go for math—when people see some complex, technical information, they'd often rather just believe it instead of thinking critically.

http://www.bcps.org/offices/lis/researchcourse/images/statisticsirony.gif
 By Signe Wilkinson, for the Philadelphia Daily News


Wednesday, August 9, 2017

Multiple Regression Explained



How to interpret multiple regression

Regression is useful for building a predictive model. Let's say there's a positive linear correlation between Factor K and Outcome N, but you suspect that Factors L and M also contribute to Outcome N.


Make up a story: say that Factors K, L, and M represent intelligence, persistence, and amount of sleep per night, and that N refers to a course grade.

So, to test the relative impacts of Factors K, L, and M on Outcome N, you can add the factors to a regression model one at a time and test whether each addition improves the fit. Start with Factor K alone: say the correlation between Factor K and Outcome N yields a Pearson's r of .64, and therefore an R² of .4096 (that is, .64 squared).

But when you run a regression testing the effect of Factors K and L together on Outcome N, you find an R² of .5625, and the change in R² is significant. That means that Factors K and L together do a better job of explaining Outcome N than Factor K alone.

Then you run a regression with Factors K, L, and M together and find an R² of .5929, with no significant change in R². This means that Factor M adds little explanatory power once Factors K and L are already in the model. Outcome N is explained mostly by Factors K and L; Factor M is an unimportant predictor of Outcome N.
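If you're wondering how "significant change" gets decided, here is a minimal sketch of the usual F test for a change in R² between nested models. The R² values come from the story above; the sample size of 50 students is my own made-up assumption, and the function name f_change is just for illustration.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added=1):
    """F statistic for the R-squared gain when k_added predictors are added.

    n is the sample size and k_full is the number of predictors in the
    larger model. Degrees of freedom are (k_added, n - k_full - 1).
    """
    gain_per_predictor = (r2_full - r2_reduced) / k_added
    unexplained_per_df = (1 - r2_full) / (n - k_full - 1)
    return gain_per_predictor / unexplained_per_df


N_STUDENTS = 50  # hypothetical sample size, not from the example above

# Step 1 -> 2: add Factor L to a model that already contains Factor K.
f_adding_l = f_change(0.4096, 0.5625, N_STUDENTS, k_full=2)

# Step 2 -> 3: add Factor M to a model that already contains K and L.
f_adding_m = f_change(0.5625, 0.5929, N_STUDENTS, k_full=3)

print(f"Adding L: F(1, {N_STUDENTS - 3}) = {f_adding_l:.2f}")
print(f"Adding M: F(1, {N_STUDENTS - 4}) = {f_adding_m:.2f}")
```

With this assumed sample size, adding Factor L gives an F of about 16.4 and adding Factor M gives an F of about 3.4. The .05 critical value for F with 1 and roughly 46 degrees of freedom is roughly 4.05, so the first change would count as significant and the second would not. A different sample size could change that verdict, which is why stats software reports the test for you.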

Voilà! There's regression in a nutshell! 

And, if you're confused about the math...remember in middle school or high school math, when you learned about "rise over run" and learned the formula y = mx + b? Yeah, that's the equation for a simple linear regression. With multiple regression, you can add multiple terms, such that y = ax₁ + bx₂ + cx₃ + ... + z, where z is the intercept. But it's still the same concept, just with more predictors than that lone "mx" term.
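To make the "rise over run" connection concrete, here is a small sketch in plain Python. The data points, the coefficient values, and the function names fit_line and predict are all invented for illustration; real stats software does the fitting for the multi-predictor case.

```python
def fit_line(xs, ys):
    """Least-squares slope (m) and intercept (b) for y = m*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The slope is how much y changes per unit change in x: "rise over run".
    rise = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    run = sum((x - mean_x) ** 2 for x in xs)
    m = rise / run
    b = mean_y - m * mean_x
    return m, b


def predict(coefficients, intercept, predictors):
    """Multiple-regression prediction: y = a*x1 + b*x2 + c*x3 + ... + z."""
    return sum(c * x for c, x in zip(coefficients, predictors)) + intercept


# Made-up points that fall exactly on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# Made-up coefficients for three predictors (x1, x2, x3) plus an intercept.
y_hat = predict([0.5, 0.3, 0.1], 1.0, [2, 3, 4])
print(y_hat)
```

The only difference between the two functions is the number of slope terms: one "mx" in the first, one term per predictor in the second.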

In case you missed it, there are some fantastic, easy-to-use, and FREE stats programs available now! I review them here.
For more help explaining statistical concepts and when to use them, 
please download my freely available PDF guide here!
https://drive.google.com/open?id=0B4ZtXTwxIPrjUzJ2a0FXbHVxaXc

Saturday, August 5, 2017

When to use a chi-square

 
 

Not clear about when you should use a chi-square vs. when to use a t test? 

First, you should check out my free, downloadable PDF, A Practical Guide to Psych Stats.

Now that that's out of the way, if you're still not sure, how about a tasty example?
Let's say that we want to know whether the colors in a bag of Original Skittles are evenly distributed. If so, we'd expect to find roughly equal numbers of red, green, purple, yellow, and orange Skittles, right?

A chi-square goodness-of-fit test [that is, a one-variable chi-square] can help us evaluate this. If there are 18 red, 13 green, 18 purple, 19 yellow, and 17 orange, the chi-square goodness-of-fit test tells us whether this distribution is different enough from an even distribution of 17 apiece (85 Skittles / 5 colors) that we can reject the notion that the colors are evenly distributed. 
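If you'd like to see the arithmetic behind that test, here is a minimal sketch in plain Python using the made-up Skittles counts above. The function names are my own, and the p-value shortcut shown only works when the degrees of freedom are an even number (here, 5 colors minus 1 = 4).

```python
import math

observed = {"red": 18, "green": 13, "purple": 18, "yellow": 19, "orange": 17}


def chi_square_gof(counts):
    """Chi-square statistic against an even split across all categories."""
    expected = sum(counts) / len(counts)  # 85 Skittles / 5 colors = 17
    return sum((o - expected) ** 2 / expected for o in counts)


def chi_square_p_even_df(stat, df):
    """Upper-tail p-value; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this shortcut only handles even degrees of freedom")
    half = stat / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


stat = chi_square_gof(list(observed.values()))
df = len(observed) - 1
p_value = chi_square_p_even_df(stat, df)
print(f"chi-square({df}) = {stat:.2f}, p = {p_value:.3f}")
```

The statistic works out to 22/17, or about 1.29, with a p-value of roughly .86. That is nowhere near the usual .05 cutoff, so these counts give no reason to reject the idea that the colors are evenly distributed.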

If you're really curious about my made-up numbers, by the way, here's a straightforward, easy-to-use online calculator to help you: http://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx

***
Now, let's say we’re looking for differences in the proportion of red Skittles to the other colors in a bag of Original vs. a bag of Tropical Skittles. 



In this case, we have two categorical variables (bag type: Original vs. Tropical; and color: red vs. not red), so we would need a chi-square test for independence. The additional variable makes the calculation a little more complex (but not if you use statistical software to handle the dirty work! 😊), but ultimately we're asking a similar question to before: is the proportion of red Skittles roughly the same in each bag?
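Here is a rough sketch of that calculation in plain Python. The Original counts reuse the made-up bag from earlier (18 red out of 85); the Tropical counts are invented purely for illustration, and the function name is my own. This version applies no continuity correction, which some software adds for 2x2 tables.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic and df for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if bag type and color were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Original, Tropical. Columns: red, not red.
skittles = [
    [18, 67],  # Original bag from the earlier example
    [10, 75],  # Tropical bag: hypothetical counts
]

stat, df = chi_square_independence(skittles)
# For df = 1 only, the upper-tail p-value has this closed form.
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi-square({df}) = {stat:.2f}, p = {p_value:.3f}")
```

With these invented counts the statistic is about 2.74 on 1 degree of freedom, which falls short of the 3.84 needed for significance at the .05 level, so we couldn't conclude that the proportion of red differs between the two bags.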


Wednesday, November 30, 2016

A practical guide to Psych Stats



I've previously found the document "Reporting Statistics in Psychology" highly useful, and so I made a presentation for a stats course that I think is worth sharing! My own guide, a supplement of sorts, goes into a slightly broader variety of topics than the previous link, and mine also lists a 'bottom-line' approach that I think will be helpful to the people who just want to know what they should do!


Mine is called "A practical guide to Psych Stats," and I've made a freely available, freely downloadable PDF of that presentation here.
https://drive.google.com/open?id=0B4ZtXTwxIPrjUzJ2a0FXbHVxaXc


This is probably going to be useful to you if any of the following are true:

Early-career/inexperienced students:
  • You've been unsure which test is appropriate for a certain dataset
  • You've struggled to understand psych stats from a conceptual perspective
  • You've struggled to write up statistical results in APA style
  • You've wished there was an easier-to-use stats program
  • You've wished there was a free stats program that you can run on your own computer
More experienced/advanced students:
  • You've thought that null hypothesis significance testing (NHST) procedures didn't make sense
  • You think that the APA's reporting standards for statistical tests aren't stringent enough
  • You're not sure how to interpret standardized measures of effect size
  • You want to know a little bit more about Bayesian statistics
  • You're not sure how to interpret your Bayesian statistics
  • You're looking for a free/better/more user-friendly/more widely-compatible stats program to run on your own computer
Instructors:
  • You're looking for a quick, easy, free, relatively brief resource to guide your students through the morass that is psych stats
    • Bonus: links are embedded! :D
      However, for best effect, you must download the PDF, as the online preview version may randomly insert characters that will break the links :(
Enjoy, and I hope you find this helpful!
