In each course I have taught with an enrollment under 75, I have conducted the first day of class in essentially the same way: a brief self-introduction, a discussion of course policies, and two closely related probability questions, designed as a proverbial “ice breaker” to build rapport and stretch the students’ intuition. I originally heard the questions from Spelman College Professor Colm Mulcahy in a talk honoring the late legendary puzzlemaster Martin Gardner. Typically, I write them on the blackboard as follows and then exit the room for about ten minutes:
Part 1: A man has exactly two children, at least one of which is a boy. What is the chance he has two boys?
Part 2: A man has exactly two children, at least one of which is a boy born on a Tuesday. What is the chance he has two boys?
For the purposes of these discussions, we make the (not completely accurate) assumptions that every child is born either a boy or a girl, with equal likelihood, and that a child is born on each day of the week with equal likelihood. Since some people have difficulty conceptualizing the abstract probability concept of a random variable in such a concrete context (“What do you mean? It’s one guy, he either has two boys or he doesn’t…”), it is often useful to rephrase these and future questions as “What portion of ALL families with exactly two children, at least one of which is a boy, would you expect to have two boys?,” et cetera. From a pure mathematical perspective, the questions are nothing even remotely special, completely appropriate as a quick homework exercise in a discrete math or first probability course. What distinguishes them, rather, is their lack of compliance with potentially blinding intuition.
After the allotted time elapses, I am usually pleased to reenter a loud room with multiple enduring discussions, and I solicit volunteers to offer an answer to Part 1. Whether it is in this context, or if I simply ask the question in casual conversation (several of my friends are quite sick of it by now, I’m sure), the first guess is almost invariably 50%. The offered logic?
“He has two children, one is a boy, the other is either a boy or a girl, each occurring 50% of the time. Therefore, the chance that he has two boys is 50%.”
When presented with this seemingly sound logic, I diplomatically respond that I would not consider this an incorrect answer so much as a correct answer to a different question or questions. Most notably, if the question had begun “A man has two children, the OLDER of which is a boy…” or “A man has two children, the YOUNGER of which is a boy…”, then 50% would be correct. So how is the question as stated any different?
Solution to Part 1: When two children are born, there are exactly four possible outcomes with respect to the children’s gender in chronological order, each occurring with equal probability: Boy/Boy, Boy/Girl, Girl/Boy, and Girl/Girl. The additional provided information that at least one of the children is a boy eliminates the Girl/Girl outcome, leaving three equally likely outcomes, one of which, Boy/Boy, qualifies as a success. Therefore, the correct probability is 1/3=33.33…%.
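If you would rather count than argue, the four outcomes are easy to enumerate by brute force. The following is just a minimal sketch under the same assumptions as above (boy and girl equally likely, births independent); the variable names are mine and purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Every (older, younger) gender pair, assumed equally likely.
families = list(product("BG", repeat=2))

# Keep only the families with at least one boy (this drops Girl/Girl).
at_least_one_boy = [f for f in families if "B" in f]

# Among those, count the families with two boys.
two_boys = [f for f in at_least_one_boy if f == ("B", "B")]

part1 = Fraction(len(two_boys), len(at_least_one_boy))
print(part1)  # 1/3
```

Changing the filter to `f[0] == "B"` (the OLDER child is a boy) leaves only two outcomes and gives the 50% answer to the different question mentioned above.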
For the correct analysis of Part 1 with visual aid, along with discussion of the common flaws in logic yielding the answer of 50%, check out this diagram: Part 1 Analysis
To field another common objection before moving on, it is indeed correct to consider Boy/Girl and Girl/Boy as distinct outcomes, as they each occur 25% of the time when considering two random births. If, for whatever reason, you feel the need to identify these two outcomes as being the same, that is perfectly fine, but you then lose the convenient property that each distinct outcome occurs with equal likelihood (i.e. for two random births you would have three outcomes: 25% two boys, 25% two girls, and 50% one of each).
Now that everyone is on their toes, we strike at the common intuition’s jugular vein with Part 2.
Solution to Part 2: Now considering gender and day of birth, there are a total of 14 possible outcomes in which the older child is a boy born on a Tuesday, 7 of which contain two boys. Similarly, there are 14 outcomes in which the younger child is a boy born on a Tuesday, 7 of which contain two boys. These two components of the sample space overlap at only one outcome, when both children are boys born on Tuesdays, which in particular contains two boys. Therefore, the total number of possible outcomes is 14+14-1=27, each occurring with equal likelihood, and 7+7-1=13 of which contain two boys. Therefore, the answer is 13/27≈48.15%.
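The count of 27 can also be checked by listing all 196 gender-and-day combinations and filtering. Again, this is only a sketch under the stated assumptions (equally likely genders and days, independent births); numbering the days 0 through 6 and calling day 1 "Tuesday" is an arbitrary choice of mine.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)   # days of the week, numbered 0-6
TUESDAY = 1       # arbitrary label; any fixed day gives the same count

# A child is a (gender, birth day) pair; a family is (older, younger).
children = list(product("BG", DAYS))
families = list(product(children, repeat=2))  # 14 * 14 = 196 outcomes

def is_tuesday_boy(child):
    return child == ("B", TUESDAY)

# Families with at least one boy born on a Tuesday.
qualifying = [f for f in families if any(is_tuesday_boy(c) for c in f)]

# Among those, families where both children are boys.
two_boys = [f for f in qualifying if all(c[0] == "B" for c in f)]

part2 = Fraction(len(two_boys), len(qualifying))
print(len(two_boys), len(qualifying), part2)  # 13 27 13/27
```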
Once again, visualized analysis and discussion is provided here: Part 2 Analysis
The biggest expressed hangup with this solution? “How could the day of the week possibly be relevant? The kid has to be born on SOME day of the week, why would it matter which?” The big issue with that last sentence is the phrase “THE kid”, and to try to wrap your mind around this issue more, check out the linked diagrams. In particular, we could consider seven different versions of the Part 2 question by running through each day of the week, thus breaking the full sample space from Part 1 into seven pieces. Restricted to each piece, the success rate of having two boys is 13/27, but that doesn’t mean the total success rate is 13/27, because those seven pieces OVERLAP WITH EACH OTHER. To use some mathematical terminology, there is a big difference between a decomposition of a set (simply expressing the set as a union of subsets) and a partition of a set (expressing the set as a union of DISJOINT subsets).
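To see the overlap in actual numbers, here is a rough sketch that builds the seven day-of-the-week pieces and compares their combined size to the size of their union. It uses the same assumptions as before, with days numbered 0 through 6; the helper name `piece` is just for illustration.

```python
from itertools import product

DAYS = range(7)
children = list(product("BG", DAYS))          # (gender, birth day)
families = list(product(children, repeat=2))  # (older, younger)

def piece(day):
    """Families with at least one boy born on the given day."""
    return {f for f in families if ("B", day) in f}

pieces = [piece(d) for d in DAYS]
union = set().union(*pieces)  # all families with at least one boy

total_piece_sizes = sum(len(p) for p in pieces)
union_size = len(union)
two_boys_in_union = sum(1 for f in union if f[0][0] == "B" and f[1][0] == "B")

print(total_piece_sizes, union_size, two_boys_in_union)  # 189 147 49
```

The seven pieces contain 7 × 27 = 189 outcomes counted with repetition, but their union has only 147, so the pieces form a decomposition and not a partition. Over the union the success rate is 49/147 = 1/3, matching Part 1, even though it is 13/27 on every individual piece.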
For another example of this phenomenon, let’s turn to the world of sports: What if I told you that during the 2014 season, a AAA pitcher named John Smith earned a decision in every start, won 60% of his starts in games that began before 5pm, and won 60% of his starts in games that began after 2pm? Was Smith necessarily a winning pitcher? Nope, he could easily have gone 0-4 at 1pm, 6-0 at 4pm, and 0-4 at 7pm. All the given conditions are satisfied, yet he winds up a disappointing 6-8 on the year. OVERLAP IS IMPORTANT!
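The arithmetic for Smith's made-up season is quick to verify. This sketch simply encodes the three start times from the example on a 24-hour clock; the tuple layout and function name are my own choices.

```python
from fractions import Fraction

# (start hour on a 24-hour clock, wins, losses) for the hypothetical season above
record = [(13, 0, 4), (16, 6, 0), (19, 0, 4)]

def win_rate(games):
    """Fraction of decisions won across the given groups of starts."""
    wins = sum(w for _, w, _ in games)
    losses = sum(l for _, _, l in games)
    return Fraction(wins, wins + losses)

before_5pm = [g for g in record if g[0] < 17]  # the 1pm and 4pm starts
after_2pm = [g for g in record if g[0] > 14]   # the 4pm and 7pm starts

print(win_rate(before_5pm), win_rate(after_2pm), win_rate(record))  # 3/5 3/5 3/7
```

The 4pm starts are counted in both groups, which is exactly why two 60% figures can sit on top of an overall 6-8 record.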
To return to the more practical rephrasing of the two original questions, we have effectively changed the sample space from “ALL families with exactly two children, at least one of which is a boy” to “ALL families with exactly two children, at least one of which is a boy born on a Tuesday”, and whether or not your gut tells you that this change should affect what portion of these families have two boys, it is undeniable that at face value we are considering a different collection of families. Perhaps the theme of this piece is that when conducting any sort of analysis, regardless of intuition, any new information is potentially relevant.
An observation worth noting: the answer to Part 2 is much closer to the common gut reaction of 50% than the answer to Part 1 is. Is this just an odd occurrence or indicative of a more general phenomenon?
Definitely the latter. As discussed in the linked diagrams, the only thing keeping the answer from being 50% is the overlap between the cases when the older child satisfies the given condition and the cases when the younger child satisfies the given condition. The more unlikely the given condition is, the smaller that overlap will be compared to the entirety of the sample space, and hence the closer the answer will be to 50%. For example, suppose we posed the question “A man has exactly two children, at least one of which is a boy who was born on Christmas. What is the chance he has two boys?” As long as we agree that the probability of being born on Christmas is 1/365 (probably not actually true), then we can find the answer, which works out to 729/1459≈49.97%. Visualized analysis and discussion of a more generalized version are provided here: Generalized Analysis 1
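For those who want to play with other conditions, here is a small sketch of the same counting argument with a general probability p in place of 1/7. It assumes gender is 50/50 and that the condition holds for each child independently with probability p; under those assumptions the result simplifies to (2 - p)/(4 - p). The function name is hypothetical, and this is my own shorthand for the argument, not a transcription of the linked diagram.

```python
from fractions import Fraction

def chance_two_boys(p):
    """P(two boys | at least one boy satisfying a condition of probability p).

    Assumes gender is 50/50 and the condition holds independently per child.
    """
    p = Fraction(p)
    q = p / 2                     # chance a given child is a qualifying boy
    at_least_one = 2 * q - q * q  # older qualifies + younger qualifies - both
    # Two boys (1/4), minus two boys where neither one qualifies.
    two_boys = Fraction(1, 4) - ((1 - p) / 2) ** 2
    return two_boys / at_least_one

conditions = [("any boy", 1), ("Tuesday", Fraction(1, 7)), ("Christmas", Fraction(1, 365))]
for label, p in conditions:
    answer = chance_two_boys(p)
    print(label, answer, float(answer))
```

With p = 1 the condition says nothing beyond "is a boy" and the function returns 1/3; with p = 1/7 it returns 13/27; with p = 1/365 it returns 729/1459. As p shrinks, the value creeps up toward 1/2.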
While discussing these questions with my good friend and EPA scientist Daniel Garver, we had the following exchange:
Daniel: Ok, so what if you tell me that the man has exactly two children, at least one of which is a boy, and ask me the chance he has two boys, and I respond by asking you the day of the week the boy was born on? Does the probability change just by you answering my question?
Alex: Well, you can’t say ‘the day of the week THE BOY was born on’, because that makes it seem like one particular child has been specified…
Daniel: Ok, ok, fine, I ask you to provide me with the day of the week on which A MALE CHILD belonging to this man was born, and you answer me. Does the probability change?
Alex: As strange as it sounds, yes.
Daniel: …Ok, but what if you ask me the original question, and I respond by requesting that you provide me with a PICTURE of A MALE CHILD belonging to this man. There’s only one of that exact kid, so does that make the answer exactly 50%?
Alex: Yes, because then it is totally sound logic to say that this man has two kids, one of which is THAT KID, and the other of which is either a boy or a girl with equal probability. They can’t both be THAT KID, so there is no overlap issue.
(slightly unsure pause)
You know, it might seem like nothing can change just by showing you a picture, but if you take the practical as opposed to the abstract approach, the picture really changes the situation as much as it could possibly be changed, because the sample space went from all families with exactly two children and at least one boy to JUST ONE FAMILY.
Daniel: Yeah, that’s right… weird.
It is weird, but it is right. Daniel’s question got me thinking a little more, and I realized that the question could be generalized beyond just conditions that occur in each child independently to include conditions which occur non-independently. After all, it’s ALL ABOUT THE OVERLAP, so as long as we know the conditional probability that BOTH the children are boys satisfying the given condition given that at least one is, we can find the answer. In particular, Daniel’s inquiry can be greatly simplified, because in order to make the answer dead on 50%, we don’t need a condition that only one child in the world satisfies, we just need a condition that two brothers CAN’T SATISFY SIMULTANEOUSLY. For example, if we rule out crazy people like George Foreman, we could just specify a particular name, like “A man has exactly two children, at least one of which is a boy named Jack. What is the chance he has two boys?” He presumably can’t have two children named Jack, so the answer is exactly 50%, and the initial gut reaction is finally right. Visualized analysis and discussion of this further generalized version relying only on overlap probability are provided here: Generalized Analysis 2
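Here is one hedged way to put the overlap idea into a formula. Write r for the conditional probability that BOTH children are boys satisfying the condition, given that at least one is. If we also assume that whenever one particular child (say, the older) is a qualifying boy, the other child is a boy half the time, then inclusion-exclusion gives an answer of (1 - r)/2. That extra assumption and the function name below are mine, so treat this as a sketch and not as the content of the linked diagram.

```python
from fractions import Fraction

def chance_two_boys_from_overlap(r):
    """P(two boys | at least one qualifying boy), in terms of the overlap r.

    r = P(both children are qualifying boys | at least one is).
    Assumes that, given a particular child is a qualifying boy,
    the other child is a boy with probability 1/2.
    """
    return (1 - Fraction(r)) / 2

print(chance_two_boys_from_overlap(Fraction(1, 3)))   # Part 1: 1 of 3 outcomes overlaps
print(chance_two_boys_from_overlap(Fraction(1, 27)))  # Part 2: 1 of 27 outcomes overlaps
print(chance_two_boys_from_overlap(0))                # Jack: no overlap possible
```

The three calls reproduce 1/3, 13/27, and 1/2 respectively, so the earlier answers are all special cases of how much overlap the condition allows.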
Once you make your peace with these questions and their solutions, try them on your friends, family, teachers, students, coworkers, et cetera. Again, they’re nothing Earth-shattering, but with the right crowd they tend to lead to engaging and stimulating discussions. I’ve gotten a lot of good mileage out of them over the years, and I figured it was time to pass them on. Enjoy!