Exit polls aren't what you think they are (nuttersandnuttier.com)
159 points by mcbrown on Nov 8, 2016 | 15 comments


He didn't choose Springfield at random for his example: this particular bit of counter-intuitive statistics is called Simpson's paradox.

https://en.wikipedia.org/wiki/Simpson%27s_paradox


I don't think this is really Simpson's paradox (which is named after Edward Simpson, who wrote about it in 1951). In Simpson's paradox, a statistic X is lower than Y in each of two cases, yet overall X is higher than Y.

It would look like Simpson's paradox if one candidate won a higher percentage of the vote in both East and West Springfield yet lost the election. But in a two-candidate race that is impossible, because the overall share is just a weighted average of the district shares.
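To make the "impossible" concrete: with only two candidates, A's overall share is the district shares weighted by each district's fraction of the voters, and a weighted average can't fall below its smallest input. A small sketch with made-up vote counts (the numbers are hypothetical, not from the article):

```python
from fractions import Fraction

# Hypothetical two-candidate race: A narrowly wins a small East Springfield
# and wins a larger West Springfield by more.
total_votes = {"East": 10_000, "West": 90_000}
votes_for_a = {"East": 5_100, "West": 49_500}

# Per-district shares for A (51% and 55%).
district_shares = {d: Fraction(votes_for_a[d], total_votes[d]) for d in total_votes}

# Overall share = district shares weighted by each district's fraction of all voters.
weights = {d: Fraction(total_votes[d], sum(total_votes.values())) for d in total_votes}
share = sum(weights[d] * district_shares[d] for d in total_votes)

# A weighted average never falls below the smallest district share,
# so a majority in every district forces a majority overall.
assert min(district_shares.values()) <= share <= max(district_shares.values())
print(float(share))
```

Exact fractions are used only so the arithmetic has no floating-point noise; the argument works the same with any counts.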

Simpson's paradox arises when you compare two different percentages, say those of Candidates A and B, across two different treatments, say East and West Springfield, without accounting for the sample sizes behind each percentage. It doesn't apply here because everyone who votes is assumed to vote for either A or B, so both candidates' percentages in a district share the same denominator.

An example of Simpson's paradox would look like this. We look at the percentage of their own party that each candidate wins. It could be that Candidate A wins 90% of the Democrats in East Springfield while B wins only 80% of the Republicans, and in West Springfield, A wins 60% of the Democrats while B wins 50% of the Republicans. Yet, because of differences in population between East and West, overall A wins only 65% of the Democrats while B wins 75% of the Republicans.
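The percentages in that example only work out for particular populations, which the comment doesn't give. One hypothetical choice that reproduces them: East has 100 Democrats and 500 Republicans, West has 500 Democrats and 100 Republicans (any split where Democrats are 1:5 East:West and Republicans 5:1 does the same). A quick check:

```python
from fractions import Fraction

# Hypothetical party populations, chosen to reproduce the comment's percentages.
democrats = {"East": 100, "West": 500}
republicans = {"East": 500, "West": 100}

# Own-party share won by each candidate in each half of Springfield (from the comment).
a_share = {"East": Fraction(90, 100), "West": Fraction(60, 100)}
b_share = {"East": Fraction(80, 100), "West": Fraction(50, 100)}

def overall(share, population):
    """Population-weighted share across both districts."""
    won = sum(share[d] * population[d] for d in population)
    return won / sum(population.values())

a_overall = overall(a_share, democrats)    # (90 + 300) / 600
b_overall = overall(b_share, republicans)  # (400 + 50) / 600

# A beats B in each district, but B beats A overall.
print(f"A: {float(a_overall):.0%}, B: {float(b_overall):.0%}")  # A: 65%, B: 75%
```

The reversal comes entirely from the weights: most of A's Democrats live in West, where A does worse, while most of B's Republicans live in East, where B does better.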


The problem described by the article looks like a violation of dimensional analysis rather than Simpson's paradox. It might be more obvious if we use explicit units: say 50 scores of people voted for A in West, and 80 dozen did in East.

The article talks about "weighting" the results, which is exactly figuring out the conversion from "Democrats in West" to a common unit, "single person", so that arithmetic on the numbers is valid.
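Taking the units in that illustration literally (a score is 20, a dozen is 12), converting both counts to single persons before comparing them flips the naive reading. A toy sketch; the unit table is just the standard meaning of the words, and the counts are the comment's made-up example:

```python
# Standard sizes of the units used in the comment.
PEOPLE_PER_UNIT = {"score": 20, "dozen": 12}

def to_people(amount, unit):
    """Convert a count expressed in some unit into a count of single persons."""
    return amount * PEOPLE_PER_UNIT[unit]

west_for_a = to_people(50, "score")  # 1000 people
east_for_a = to_people(80, "dozen")  # 960 people

# Comparing the raw numbers (80 > 50) points the wrong way;
# in a common unit, West actually contributed more votes for A.
print(west_for_a, east_for_a, west_for_a > east_for_a)  # 1000 960 True
```

Weighting exit-poll results by turnout plays the same role: a percentage from one region is multiplied by that region's number of voters before it is added to anything else.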


His example isn't Simpson's paradox. I don't think you can have an analogue of Simpson's paradox in this case: if every subgroup votes more for Ronald Clump than for Trillary Swinton, then Ronald Clump wins, period. You could, however, have something like this: in both districts, women are more likely than men to vote for Trillary Swinton, but overall, women still vote for Ronald Clump more than men do. That would require one district that votes mostly for Ronald Clump and has lots of women in it, while the other district votes mostly for Trillary Swinton but has mostly men.
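Here is one set of hypothetical counts (invented for illustration) that produces that women/men reversal: District 1 leans Clump and is 90% women, District 2 leans Swinton and is 90% men.

```python
from fractions import Fraction

# Hypothetical counts per district and group: (voters, votes for Ronald Clump).
data = {
    "District 1": {"women": (900, 630), "men": (100, 80)},   # leans Clump, mostly women
    "District 2": {"women": (100, 20), "men": (900, 270)},   # leans Swinton, mostly men
}

def clump_share(voters, for_clump):
    """Fraction of a group that voted for Ronald Clump."""
    return Fraction(for_clump, voters)

# Within each district, women back Clump less often than men do.
for district, groups in data.items():
    print(district, {g: float(clump_share(*groups[g])) for g in groups})
# District 1 {'women': 0.7, 'men': 0.8}
# District 2 {'women': 0.2, 'men': 0.3}

def pooled_share(group):
    """Clump share for one group after pooling both districts."""
    voters = sum(d[group][0] for d in data.values())
    for_clump = sum(d[group][1] for d in data.values())
    return Fraction(for_clump, voters)

# Pooled across districts, the ordering flips: women back Clump more often.
print({g: float(pooled_share(g)) for g in ("women", "men")})
# {'women': 0.65, 'men': 0.35}
```

Note that this is a comparison between groups of voters, not between the candidates, which is why it can reverse while the overall winner can't.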


Oh man, you said Springfield and Simpsons in the same sentence, so I thought the name might be derived from The Simpsons. Well, it isn't.


This is why apostrophes are important: "Simpson's paradox" is the paradox belonging to Simpson, while "The Simpsons Paradox" would be a paradox named after "The Simpsons".

And don't blame me, I voted for Kodos.


I like how the Wikipedia article uses "Lisa and Bart" for their examples, instead of "Alice and Bob".


Doh! That wasn't what I thought it would be.


One thing I rarely see brought up is that people lie to exit poll takers. Maybe I'm just stupid, or politics is too far above my understanding, but I don't understand why exit polling is taken as gospel (see 2000 shrub vs bore, or Brexit if you prefer an international flavor), given that there's absolutely no requirement that responses be truthful.

I know I wasn't great at statistics (I managed low-to-mid A's, but I never really understood the steps I was reproducing or the why behind them), but how do you correct for that type of uncertainty?

Is there a good, basic statistics reference that HN would recommend? We used Devore's "Probability and Statistics for Engineering and the Sciences", and it didn't "click" with me. I'd love to find a good textbook on the subject.


> The reason for the initial error in the 2016 primary is obvious: the rural/urban split caught exit pollsters — who probably assumed things would look a lot like 2012 — completely by surprise.

Wouldn't it make a lot more sense to use the 2012 dem primary as a basis instead?

> And if you hear anyone say the exit polls are a sign of a rigged election, please do tell them that I told you to tell them that I said to say that they’re not very knowledgeable about the subject.

Yeah, it's too bad. Now that I know how exit poll results work, it would be nice if they could qualify the numbers a little when reporting them.


There wasn't a 2012 Democratic primary. Not a meaningfully contested one anyway.

In New York State in 2008, Clinton beat Obama with support coming from both urban and rural areas (with Clinton perhaps doing better in rural areas than Obama):

http://uselectionatlas.org/RESULTS/state.php?year=2008&fips=...

It's also not necessarily the case that the turnout for past elections will be a sound guide to the turnout for the next election. A lot of people vote as a result of affinity for a particular candidate or because of a motivating issue.


> Wouldn't it make a lot more sense to use the 2012 dem primary as a basis instead?

Not if you're interested in cases where unaffiliated voters went DEM in the general election, or if GOP-registered voters crossed over. Certainly those are interesting cases.


So obviously, early results from exit polls aren't reliable because we don't know turnout numbers.
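A toy illustration of why the early numbers move: the same per-region exit-poll shares produce different toplines depending on the turnout used to weight them. The regions, shares, and turnout figures below are all hypothetical; "assumed" stands in for a turnout model carried over from a past election, "actual" for the turnout known once votes are counted.

```python
from fractions import Fraction

def weighted_share(shares, turnout):
    """Overall share for candidate A: per-region exit-poll shares weighted by turnout."""
    total = sum(turnout.values())
    return sum(shares[region] * turnout[region] for region in shares) / total

# Hypothetical exit-poll shares for candidate A in each region.
shares = {"urban": Fraction("0.65"), "rural": Fraction("0.35")}

# Turnout guessed from a past election vs. turnout actually counted later (hypothetical).
assumed_turnout = {"urban": 60_000, "rural": 40_000}
actual_turnout = {"urban": 40_000, "rural": 60_000}

early = weighted_share(shares, assumed_turnout)
final = weighted_share(shares, actual_turnout)
print(f"early estimate: {float(early):.0%}, after real turnout: {float(final):.0%}")
# early estimate: 53%, after real turnout: 47%
```

Nothing about what respondents told the pollsters changed between the two lines; only the weights did, and that alone is enough to flip the apparent winner.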

But after the election, we know exactly who voted, so it seems we could use exit polls at that point to sanity-check the results. Given a paper trail, a significant discrepancy could trigger a recount.
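A rough sketch of what such a sanity check could look like, assuming the exit-poll share has already been reweighted to actual turnout. It flags a gap larger than the textbook simple-random-sample margin of error, z·sqrt(p(1−p)/n). That formula is an assumption for illustration: real exit polls sample clusters of precincts, and respondents may misreport, so their true error is larger, and any real recount trigger would need a wider, properly justified threshold.

```python
import math

def exceeds_margin(poll_share, official_share, sample_size, z=1.96):
    """Return True if the gap between a turnout-weighted exit-poll share and the
    official share is larger than a simple-random-sample margin of error.
    (Hypothetical check; ignores clustering and misreporting.)"""
    margin = z * math.sqrt(poll_share * (1 - poll_share) / sample_size)
    return abs(poll_share - official_share) > margin

# Hypothetical numbers: 1,000 respondents, margin of error about 3.1 points.
print(exceeds_margin(0.48, 0.50, 1_000))  # False: 2-point gap is within the margin
print(exceeds_margin(0.48, 0.55, 1_000))  # True: 7-point gap is outside it
```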


> Unfortunately, everyone (you, your family, that egg on Twitter, most pundits, and at least one organization purporting to be doing exit polling) has no fucking idea how exit polls are conducted and why those initial figures are a steaming pile of crap until real figures on turnout (i.e. the votes themselves) have been tabulated.

We do. Don't be so condescending and profane. The popular press just wants some cheap sound bites to get more views and clicks. They do the same with academic research: they conflate statistical significance with economic significance, ignore any shortcomings, ignore hypotheses that were not rejected, and extrapolate the findings beyond their (often very limited) scope.


This whole article uses condescension as a comedic device; it works for some readers and not others. But he wasn't actually being condescending; he was just trying to entertain.



