
I don't think this is really Simpson's paradox (which is named after Edward Simpson, from 1951). In Simpson's paradox, you have a statistic X that is lower than Y in each of two subgroups, yet when the subgroups are pooled, X is higher than Y.

It would sound like Simpson's paradox if one candidate won a higher percentage of the vote in both East and West Springfield, yet lost the election. But this is, of course, impossible.

Simpson's paradox arises when you compare two different percentages, say those belonging to Candidates A and B, across two different treatments, say East and West Springfield, but ignore the sample sizes behind them. It doesn't apply here because everyone who votes is assumed to vote for either A or B.

An example of Simpson's paradox would look like this. Look at the percentage of their own party that each candidate wins. It could be that Candidate A wins 90% of the Democrats in East Springfield while B wins only 80% of the Republicans, and in West Springfield, A wins 60% of the Democrats while B wins 50% of the Republicans. Yet, due to differences in population between East and West, A overall wins only 65% of the Democrats while B wins 75% of the Republicans.
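To check that these numbers are consistent, here is a small Python sketch. The district head counts (100 and 500) are hypothetical, not from the comment; they are chosen so that Democrats live mostly in West and Republicans mostly in East, which makes the pooled percentages come out to 65% and 75%.

```python
# Hypothetical party sizes per district (not given in the comment),
# chosen so the pooled shares match the 65% / 75% figures above.
democrats = {"East": 100, "West": 500}
republicans = {"East": 500, "West": 100}

# Share of their own party each candidate wins in each district.
a_share = {"East": 0.90, "West": 0.60}  # Candidate A among Democrats
b_share = {"East": 0.80, "West": 0.50}  # Candidate B among Republicans


def overall_share(share, population):
    """Pool the districts: total votes won divided by total party members."""
    won = sum(share[d] * population[d] for d in population)
    return won / sum(population.values())


# A leads B within each district...
a_leads_everywhere = all(a_share[d] > b_share[d] for d in ("East", "West"))

# ...but the pooled shares are weighted by very different populations.
a_total = overall_share(a_share, democrats)
b_total = overall_share(b_share, republicans)

print("A leads in every district:", a_leads_everywhere)
print(f"A overall: {a_total:.0%}, B overall: {b_total:.0%}")
```

The reversal comes entirely from the weights: A's pooled share is dominated by the weaker West result, B's by the stronger East result.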



The problem described by the article seems like a case of violating dimensional analysis, rather than Simpson's paradox. It might be more obvious if we use explicit units: 50 score voted for A in West, and 80 dozen voted in East.

The article talks about "weighting" the results, which amounts to working out the conversion from "Democrats in West" to a common unit, "single person", so that arithmetic on the numbers is meaningful.
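The units example can be made concrete in a few lines of Python. The conversion factors are the ordinary meanings of the words (a score is 20, a dozen is 12); the `to_people` helper is hypothetical and only illustrates that the bare numbers 50 and 80 can't be added or compared until both are expressed in single persons.

```python
# Ordinary meanings of the units used in the comment.
PEOPLE_PER_UNIT = {"score": 20, "dozen": 12}


def to_people(count, unit):
    """Convert a count expressed in some unit into single persons."""
    return count * PEOPLE_PER_UNIT[unit]


west_for_a = to_people(50, "score")  # 50 score voted for A in West
east_voters = to_people(80, "dozen")  # 80 dozen voted in East

# Adding or comparing the raw numbers (50 vs 80) mixes units.
# After conversion, the arithmetic is in a common unit: persons.
total_people = west_for_a + east_voters
print(west_for_a, east_voters, total_people)
print("West figure larger once converted:", west_for_a > east_voters)
```

Note that the raw numbers suggest East is larger (80 > 50), while the converted counts say the opposite; "weighting" is this same conversion step applied to percentages.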



