Love this! I am learning to (assistant) coach my son's soccer team. We only do tournaments, which start as pool play and end in a playoff game. We recently moved up an age division from U10 to U12 because one of our players had a birthday. That player happened to be my son and my husband happens to be the head coach, so we moved the whole team up. At any rate, we are now one of the younger and smaller teams in our division.
At one of the tourneys, we got creamed yet still qualified for the playoffs. Except that no one, including us, believed that from the way we looked on the field. We did eke out 2 wins, but lost 2 more by a landslide. The total points should have qualified us, but they didn't put us in the playoffs, and we didn't realize it until we were on the way home, when I actually checked the refs' numbers and the total points we earned. We were cheated out of losing badly in the playoffs because we didn't understand Simpson's Paradox and didn't see it as necessary to double-check the numbers until later.
Moving forward, I can now absolutely strategize the pool play better to take advantage of Simpson's Paradox. I can also do a better job at making sure the refs are treating us fairly. My son and his soccer team thank you!!
Additionally, now that I know about this concept (and totally love it!), I really am seeing it everywhere. What fun!
Awesome!
Another example of Simpson's Paradox can be found in evolutionary biology in connection with the concept of evolutionary altruism. Altruists donate fitness benefits to others at a fitness cost to themselves. Selfish individuals do not donate but they receive fitness benefits from altruists in the same group. So in all mixed groups, altruists are less fit than selfish individuals. However, if you look across groups it is possible that altruists are on average fitter than selfish individuals. The following simple example shows how this can happen.
                        Column player
                    Altruistic   Selfish
Row      Altruistic    3,3         1,4
player   Selfish       4,1         2,2
This table gives the fitnesses (= number of offspring) of asexual organisms in groups of size two.
Within mixed groups, altruists have 1 offspring and selfish individuals have 4. But within unmixed groups altruists have 3 offspring, while selfish individuals have 2. If altruists tend to live with other altruists, and selfish individuals tend to live with other selfish individuals, the average altruist in a metapopulation made of many such groups will have a fitness that is a bit less than 3 and the average selfish individual will have a fitness that is a bit more than 2.
Individual selection is selection within groups; group selection is selection among groups. In both, selection requires variation in fitness. So individual selection favors selfishness, but group selection favors altruism.
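To make the arithmetic above concrete, here is a minimal Python sketch. The payoff table is the one from the comment; the group counts (45 altruist pairs, 10 mixed pairs, 45 selfish pairs) and the names `FITNESS`, `mean_fitness`, and `assorted` are made up for illustration and are not from the original.

```python
from fractions import Fraction

# Payoff table from the comment: FITNESS[(own type, partner type)] = offspring.
FITNESS = {
    ("A", "A"): 3,
    ("A", "S"): 1,
    ("S", "A"): 4,
    ("S", "S"): 2,
}

def mean_fitness(group_counts):
    """Return the metapopulation mean fitness of each type.

    group_counts maps a pair such as ("A", "S") to the number of
    two-member groups with that composition.
    """
    totals = {"A": 0, "S": 0}
    heads = {"A": 0, "S": 0}
    for (first, second), n_groups in group_counts.items():
        # Count each member of the pair once, with the other as partner.
        for me, partner in ((first, second), (second, first)):
            totals[me] += n_groups * FITNESS[(me, partner)]
            heads[me] += n_groups
    return {t: Fraction(totals[t], heads[t]) for t in totals}

# Hypothetical, strongly assorted metapopulation (the counts are made up).
assorted = {("A", "A"): 45, ("A", "S"): 10, ("S", "S"): 45}
result = mean_fitness(assorted)
print(float(result["A"]), float(result["S"]))  # prints 2.8 2.2
```

With these assumed counts the average altruist has fitness 2.8 and the average selfish individual 2.2, even though the selfish member does better in every mixed group. With random pairing instead (say 25, 50, and 25 groups), the same function gives 2 for altruists and 3 for selfish individuals, so the result depends on how strongly like pairs with like.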
Thanks. That helps me to understand group selection much better.
I have that t shirt! Actually my adult son who studied philosophy, math, and CS at college inherited it. Made good use of it while I was teaching logic and philosophy of language.
One of my favorite philosophy blog posts of all time -- just a nonstop stream of cool illustrations and provocative connections.
I read this when it came out, and I still think about it regularly.
Thanks!
You clearly recognize quality beer. Though, to be fair, I'm not sure that anyone who drinks could not enjoy a Black Tuesday.
I don't get how the utilitarianism above is supposed to reflect Simpson's paradox. What is the superset and what are the partitions? It seems to me you are talking about two different sets. Oh wait, I guess each individual is considered a subset vs. the entire set. In Scenario 1, subsets have higher utility and in Scenario 2, the entire set has higher utility.
Alright, sometimes writing about something clears it up.
Truly insightful for me. You have summarized by effective examples the subtleties and limitations of Simpson’s Paradox. This, without devolving into chaotic and depressing perspectivism (see Nietzsche).
Thank you for this informative article
Where does Homer come in? 🥸
D’oh!
My favourite exposition of Simpson's Paradox is by Larry Wasserman: https://normaldeviate.wordpress.com/2013/06/20/simpsons-paradox-explained/
One point he makes is worth emphasizing:
"Some texts make it seem as if the conditional answers (P2) and (P3) are correct and (P1) is wrong. This is not necessarily true. There are several possibilities:"
1. {Z} is a confounder and is the only confounder. Then (P3) is misleading and (P1) and (P2) are correct causal statements.
2. There is no confounder. Moreover, conditioning on {Z} causes confounding. Yes, contrary to popular belief, conditioning on a non-confounder can sometimes cause confounding. (I discuss this more below.) In this case, (P3) is correct and (P1) and (P2) are misleading.
3. {Z} is a confounder but there are other unobserved confounders. In this case, none of (P1), (P2) or (P3) are causally meaningful.
Another is the necessary linkage to causal language:
"Without causal language— counterfactuals or causal graphs— it is impossible to describe Simpson’s paradox correctly. For example, Lindley and Novick (1981) tried to explain Simpson’s paradox using exchangeability. It doesn’t work. This is not meant to impugn Lindley or Novick— known for their important and influential work— but just to point out that you need the right language to correctly resolve a paradox. In this case, you need the language of causation."
Not so much a comment about the article (which I liked) but a side note: I didn't "get" Simpson's Paradox, a concept that was new to me. So I opened CoPilot and input this prompt: <Please generate for me an explanation of Simpson's Paradox at a 9th-grade reading level. Also, please provide a couple of examples in contexts other than mathematics or statistics.> Bada bing bada boom! Simple definition, examples in about five contexts (sports, relationships, commerce, etc.), & an explanation of how SimPara affects decision- and policy-making. Is this an example of the wonders of AI ... or the abandonment of personal agency (since I didn't "do my own homework")?
I think looking for examples to help understand a new concept *is* doing your own homework, even if AI gave them to you.
See also the Will Rogers phenomenon: when a slightly different partition of a population moves up (or down) all the subset means while the whole population mean remains unchanged.
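A tiny numeric sketch of the Will Rogers phenomenon, with made-up values: moving one borderline member from the "high" group to the "low" group raises both group means while the overall mean stays put. The lists and variable names here are illustrative assumptions only.

```python
from statistics import mean

# Hypothetical scores, split into two groups by some cutoff.
low = [1, 2, 3, 4]
high = [5, 6, 7, 8, 9]
before = (mean(low), mean(high), mean(low + high))

# Redraw the partition slightly: the value 5 is reclassified as "low".
new_low = low + [5]
new_high = [x for x in high if x != 5]
after = (mean(new_low), mean(new_high), mean(new_low + new_high))

print(before)  # low mean, high mean, overall mean before the move
print(after)   # the same three means after the move
```

Here the low-group mean goes from 2.5 to 3 and the high-group mean from 7 to 7.5, while the overall mean is 5 both times, because no individual value changed, only its label.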
I'm confused by most of your examples. They don't neatly seem to fit with your definition of Simpson's paradox (which itself is much more general than the definition that you quote, not "saying the same thing").
For example, the Voting, Gerrymandering, and Climate Change ones seem to involve a *majority* of the cells of the partition going one way, but the whole going the other. That's not nearly as counterintuitive or "paradoxical" as *every* cell in the partition going one way, but the whole going the other.
The reason, I think, is that in the former kind of case, I'll just assume that the minority somehow ended up having more weight in the overall calculation. That may not be the right explanation but it's one that seems available. On the other hand, when you tell me that every cell of the partition went one way, but the whole went the other, that explanation isn't available, which is what makes it puzzling.
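For what it's worth, here is a toy Python check of the stricter "every cell" case. The counts are invented purely for illustration, and the names `counts`, `rate`, and `overall` are hypothetical. In this made-up example, weighting still seems to be what drives the reversal, except that each option gets its own set of weights across the cells.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts per cell; the numbers are made up.
counts = {
    "X": {"cell 1": (9, 10), "cell 2": (30, 100)},
    "Y": {"cell 1": (80, 100), "cell 2": (2, 10)},
}

def rate(successes, trials):
    """Success rate within a single cell."""
    return Fraction(successes, trials)

def overall(option):
    """Pooled success rate for one option across all of its cells."""
    total_successes = sum(s for s, _ in counts[option].values())
    total_trials = sum(t for _, t in counts[option].values())
    return Fraction(total_successes, total_trials)

# X beats Y inside every cell (9/10 > 80/100 and 30/100 > 2/10) ...
x_wins_every_cell = all(
    rate(*counts["X"][c]) > rate(*counts["Y"][c]) for c in counts["X"]
)
# ... yet Y beats X once the cells are pooled (82/110 > 39/110).
y_wins_overall = overall("Y") > overall("X")

print(x_wins_every_cell, y_wins_overall)  # prints True True
```

X has most of its trials in the cell where both options do poorly, and Y has most of its trials in the cell where both do well, which is enough to flip the pooled comparison.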
Thanks. I should have said "most of which", not "each of which." I will fix.
You could also point out that an increasing number of women in younger age groups say they're bisexual, so they might be pairing off with each other!
Otherwise, excellent article on this mathematical conundrum!
Thanks. I was assuming that the number of gay men and lesbians is equal, so the numbers stay the same. Bisexual people might skew things a little, but I don't think all that much, just because the base rate is low.
Somewhere in the middle I realized this was not about Homer Simpson
Gene Heyman’s “Addiction: A Disorder of Choice” seems to dive into this in terms of individual choices aggregated over time. And brings in the psychosocial roles of Values (based on Prudential Rules) as culturally shared shorthands for accepting short-term constraints in order to optimize future outcomes (e.g. wearing a seatbelt). Additionally, the prudential choice must also be readily available (e.g. effective and easy to use seatbelts in every seat).