Simpson's paradox

Consider two doctors, Doc and Tor, who treat two diseases: 🔵 (heart disease) and ▲ (flu). Doc has a higher success rate than Tor for each disease.

However, the annual report shows that Tor has a higher overall success rate than Doc. How is this possible?

Interactive demo

The answer: by carefully choosing how many patients each doctor sees for each disease, we can craft an example that shows Tor in a favorable light.


Tor

Success rates: [0.15, 0.35]

Doc

Success rates: [0.20, 0.40]

Explanation

The core of the paradox lies in the fact that we naturally assume the overall success rate sits at the center of the segment between the two per-disease success rates. That is not always so. The overall rate is an average of the two rates weighted by the number of patients with each disease, so by choosing those numbers we can shift it to the left or to the right. It always stays within the segment, but it is not necessarily the midpoint.
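A minimal sketch of this weighted average, using the segment [0.15, 0.35] from the demo. The function name `overall_rate` and the patient counts are hypothetical, chosen only to show how the overall rate moves along the segment as the patient mix changes.

```python
def overall_rate(rate_a, n_a, rate_b, n_b):
    """Overall success rate: per-disease rates weighted by patient counts."""
    return (rate_a * n_a + rate_b * n_b) / (n_a + n_b)

# Same two per-disease rates, three hypothetical patient mixes.
for n_a, n_b in [(90, 10), (50, 50), (10, 90)]:
    print(n_a, n_b, round(overall_rate(0.15, n_a, 0.35, n_b), 3))
# 90 10 0.17   (close to the left end)
# 50 50 0.25   (the midpoint, only when the counts are equal)
# 10 90 0.33   (close to the right end)
```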

If we have 2 segments such that ...

... we can still choose two points, p1 = S1(✳) from the first segment and p2 = S2(✳) from the second segment, such that S1(✳) > S2(✳). We can then choose the patient numbers so that each doctor's overall success rate lands exactly on the ✳ of their segment. Specifically, let TOTAL be the total number of cases, 🔵 and ▲ the success rates for the two diseases (with 🔵 ≠ ▲), and ✳ the desired overall rate. Writing N for the number of cases in category 🔵, the overall rate satisfies 🔵 * N + ▲ * (TOTAL - N) = ✳ * TOTAL. Solving for N, the number of cases in category 🔵 is (▲ - ✳) / (▲ - 🔵) * TOTAL, and the number in category ▲ is TOTAL minus that.
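The formula for the number of cases can be sketched as a small function. The name `cases_for_target` and the sample values (rates 0.15 and 0.35, target 0.30, 1000 cases) are assumptions for illustration; the result may be fractional, so in practice TOTAL would be picked to make the counts whole.

```python
def cases_for_target(rate_circle, rate_triangle, target, total):
    """Split `total` cases between the two diseases so that the
    overall success rate equals `target`."""
    if rate_circle == rate_triangle:
        raise ValueError("the two success rates must differ")
    low, high = sorted((rate_circle, rate_triangle))
    if not low <= target <= high:
        raise ValueError("target must lie within the segment")
    n_circle = (rate_triangle - target) / (rate_triangle - rate_circle) * total
    n_triangle = total - n_circle
    return n_circle, n_triangle

n_circle, n_triangle = cases_for_target(0.15, 0.35, 0.30, 1000)
# Roughly 250 and 750 cases, up to floating-point rounding.
check = (0.15 * n_circle + 0.35 * n_triangle) / 1000
print(round(n_circle), round(n_triangle), round(check, 3))
```

Plugging the counts back into the weighted average recovers the target rate, which is a quick way to check the algebra.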

Now take the intersection of S1 and S2; it is some segment S3. Take its midpoint. Then ✳ for S1 can be any point of S1 to the right of that midpoint, and ✳ for S2 can be any point of S2 to the left of it.
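Putting the steps together for the demo's segments gives the sketch below. It assumes that each endpoint of a segment is the success rate for the same disease for both doctors, and it picks both targets inside S3, which is one valid choice among many. The helper names and the `offset` parameter are hypothetical.

```python
def intersection(s1, s2):
    """Overlap S3 of two segments given as (left, right) pairs."""
    lo, hi = max(s1[0], s2[0]), min(s1[1], s2[1])
    if lo >= hi:
        raise ValueError("segments must overlap in more than one point")
    return lo, hi

def pick_targets(s1, s2, offset=0.5):
    """Targets on either side of the midpoint of S3: right for s1, left for s2."""
    if not 0 < offset <= 1:
        raise ValueError("offset must be in (0, 1]")
    lo, hi = intersection(s1, s2)
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return mid + offset * half, mid - offset * half

def mix_for_target(segment, target):
    """Shares of cases at the left and right endpoint rates (TOTAL = 1)."""
    left, right = segment
    share_left = (right - target) / (right - left)
    return share_left, 1 - share_left

tor, doc = (0.15, 0.35), (0.20, 0.40)  # segments from the demo
tor_target, doc_target = pick_targets(tor, doc)
tor_mix = mix_for_target(tor, tor_target)
doc_mix = mix_for_target(doc, doc_target)
tor_overall = tor[0] * tor_mix[0] + tor[1] * tor_mix[1]
doc_overall = doc[0] * doc_mix[0] + doc[1] * doc_mix[1]

print("Doc better per disease:", doc[0] > tor[0] and doc[1] > tor[1])
print("Tor better overall:", tor_overall > doc_overall)
```

In this construction Doc ends up with most cases at the lower-rate endpoint and Tor with most cases at the higher-rate one, which is what pulls the two overall rates past each other.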

Outro

Note: there are many scenarios where this paradox can arise.


© Copyright 2025, Iaroslav Tymchenko