Modern Software Experience

2009-05-31

formula

impossible?

I just came across a blog post that claims that it is impossible to calculate an average number of nth-generation descendants for ancestors that lived n generations ago.
The blog author claims it is impossible because you do not know how many children they had, whether those children lived to adulthood, or whether they had children themselves, and because the number of children per couple is not constant.

actual versus average

It is true that the only way to determine the actual number of nth-generation descendants for any particular person is to research the genealogy, but that does not imply you cannot calculate an average without doing so.

It is fairly easy to come up with a reasonable estimate for the average number of nth-generation descendants for ancestors n (or more) generations ago by combining a simple formula and some population statistics.

what you need

To calculate the average number of nth-generation descendants, you need a few numbers: the number of individuals in the generation back then, the number of individuals in the current generation, and the number of generations in between. You need to look up or estimate these numbers.

Moreover, for best results, you need numbers that apply to the group you are researching. For example, if you are investigating French ancestry, you’d want to use the numbers for France.
Now, if you were to stick to just the French numbers, you’d be calculating the average number of French descendants and excluding any American descendants. So, you may or may not want to use emigration figures and some numbers for the other country to adjust your numbers a bit.

what you do not need

What makes this calculation interesting are all the numbers you do not need. You do not need to estimate how many children each couple has, or how many of these children go on to have children of their own. You do not need to estimate how many couples each person formed (how often they remarried). You do not need to know their average age. You do not need to know how many are male or female.

You do not even need to know or estimate the number of individuals in the generations in between. All you need to know is the number of individuals in the current generation and the number of individuals in the generation back then.

Perhaps most interesting of all is that you do not need to know the distribution of children. It does not matter that one couple has two children, another has eight children and yet another has none. It does not matter if for one couple all children go on to have many children of their own, while for another only a few have a few children of their own.

how to calculate

Once you are happy that you have good numbers for the group you are researching, the formula is very simple:

$$D = C \times 2^n \div A \times c$$

where

C: number of individuals in the Current generation
A: number of individuals in the Ancestral generation
n: number of generations in between
D: average number of Descendants
c: correction factor

I’ll explain the correction factor in a minute. Let’s assume it is 1 (i.e. no correction) for now and look at an example calculation.

example

Say you are studying a family in which there were five individuals two generations ago, and there are 25 individuals in the current generation. Put that into the formula, and we get

$$D = C \times 2^n \div A \times c, \quad \text{with } C = 25,\ n = 2,\ A = 5,\ c = 1$$
$$D = 25 \times 2^2 \div 5 \times 1$$
$$D = 25 \times 4 \div 5$$
$$D = 20$$

Thus, the average number of descendants for those five ancestors back then is 20.
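As a minimal sketch, the formula fits in a one-line Python function. The function name `average_descendants` and its parameter names are hypothetical; the correction factor defaults to 1, as in the example.

```python
def average_descendants(current: float, ancestral: float, n: int, c: float = 1.0) -> float:
    """Average number of nth-generation descendants: D = C * 2**n / A * c."""
    if ancestral <= 0:
        raise ValueError("the ancestral generation must be non-empty")
    return current * 2 ** n / ancestral * c


# The worked example: C = 25, n = 2, A = 5, c = 1
print(average_descendants(25, 5, 2))  # 20.0
```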

You can easily construct an example tree to verify that this number is right. Say that, of the five individuals two generations ago, one did not have any descendants at all, while the other four formed two couples that each had just one child. Those two children married each other and had five sets of quintuplets, making 25 children in the current generation.

That then makes four grandparents that each have 25 second-generation descendants, while the fifth person in that ancestral generation has none. So, the average number of second-generation descendants for those five individuals is 20. This example tree is definitely unbalanced, yet the formula works perfectly.
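The example tree can also be checked mechanically. The sketch below uses hypothetical labels (`a1` to `a5` for the five ancestors, `x` and `y` for their two children, `q0` to `q24` for the quintuplets) and simply follows each ancestor's lines down two generations, collecting distinct descendants.

```python
# Hypothetical labels for the example tree: parent -> list of children.
children = {
    "a1": ["x"], "a2": ["x"],           # first couple, one child x
    "a3": ["y"], "a4": ["y"],           # second couple, one child y
    "a5": [],                           # no descendants at all
    "x": [f"q{i}" for i in range(25)],  # x and y: five sets of quintuplets
    "y": [f"q{i}" for i in range(25)],
}
ancestors = ["a1", "a2", "a3", "a4", "a5"]


def descendants_n_down(person, n):
    """Set of distinct descendants exactly n generations below person."""
    generation = {person}
    for _ in range(n):
        generation = {kid for p in generation for kid in children.get(p, [])}
    return generation


counts = [len(descendants_n_down(a, 2)) for a in ancestors]
print(counts)                        # [25, 25, 25, 25, 0]
print(sum(counts) / len(ancestors))  # 20.0
```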

how it works

The reasoning behind this formula is very simple. We do not know how many children each ancestor had, or how those children were distributed among them, so the task may seem hopeless, but this formula makes good use of the one thing we do know: each nth-generation descendant has $2^n$ ancestors in the earlier generation (which we call generation zero). That we do know, and that we use.

This is the crucial bit of reasoning: there are $2^n$ ancestors for whom that nth-generation descendant is a descendant. It is a matter of shifting your viewpoint. From the viewpoint of the nth-generation descendant, there are $2^n$ ancestors. From the viewpoint of each of those $2^n$ ancestors, that person counts as one of their nth-generation descendants.

calculating

Calculating the average number of nth-generation descendants in the current generation is easy once you know how many nth-generation descendants each individual in the zeroth generation has: add up those numbers and divide by the number of individuals. That is exactly what this formula does.

That the formula is doing exactly this may not be immediately obvious, so let’s explain how it works with a physical metaphor. Imagine for a moment that you have figured out the actual tree already. If you have the tree, figuring out the number of nth-generation descendants for each individual in the zeroth generation is easy; just ask the computer to do it. Now imagine you have to figure it out yourself.

coins

Here’s one way to do it. You place a drawing or print of the tree on a large table, and place small empty containers, for example glass jars, on each of the zeroth-generation ancestors.

descendants

You could then somehow follow the lines down to their descendants and throw the right number of small items, say coins, into the corresponding jar. Once you have thus enumerated all nth-generation descendants for each ancestor, you count the number of coins in each jar, and then calculate the average.

reverse direction

That would work in theory, but following all the lines to all descendants to end up with the right number of coins is a complex and error-prone procedure.

So, let’s do it the other way round, using the fact that you might just as well trace each line between ancestor and descendant in the other direction. Repeat the following procedure for each nth-generation descendant: trace their ancestry back to the zeroth generation and throw one coin into the jar of each ancestor in that generation. You trace the same lines, so you end up with the same result: the same number of coins in each jar.

same result

It is not hard to see that this will give the same result. One procedure starts at all the zeroth-generation ancestors to find all nth-generation descendants for each, the other starts at all nth-generation descendants to find all zeroth-generation ancestors for each. Both procedures always follow all the branches exactly once. One procedure traces all descendants for each ancestor, the other traces all ancestors for each descendant. Both procedures end up tracing each unique ancestor-descendant relationship exactly once.

procedure

Now, the beauty of starting with the descendants is that you can use the coins themselves to guide you. Start by placing one coin on each individual in the current generation, and then follow this recursive procedure: remove the coin and replace it with two coins, one on each parent, then repeat for those parents until you reach the glass jars, and put the coins in the jars. Do the same for the next person in the current generation, until you have done this for all the nth-generation descendants. You are left with a bunch of glass jars with a varying number of coins in each. The number of coins in each jar is the exact number of nth-generation descendants for that ancestor.
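The coin procedure is easy to sketch in Python. The tree below is a small hypothetical one without pedigree collapse; `coins_upward` (a hypothetical name) replaces each coin by one coin per parent until it lands in a generation-zero jar, and `count_downward` follows the lines the other way, from each ancestor down, to check that both directions fill the jars identically.

```python
from collections import Counter

# Small hypothetical tree without pedigree collapse: child -> (parent, parent).
parents = {
    "p": ("a", "b"), "r": ("a", "b"),  # p and r are siblings
    "q": ("c", "d"), "s": ("e", "f"),
    "k1": ("p", "q"), "k2": ("p", "q"), "k3": ("p", "q"),
    "k4": ("r", "s"),
}
generation_zero = ["a", "b", "c", "d", "e", "f", "g"]  # g has no children
current = ["k1", "k2", "k3", "k4"]
N = 2  # generations between generation zero and the current generation


def coins_upward(person, n, jars):
    """Replace the coin on person by one coin per parent, n times over."""
    if n == 0:
        jars[person] += 1  # the coin lands in this ancestor's jar
        return
    for parent in parents[person]:
        coins_upward(parent, n - 1, jars)


jars = Counter()
for person in current:
    coins_upward(person, N, jars)

# Forward direction: invert the parent links and follow the lines down.
children = {}
for child, pair in parents.items():
    for parent in pair:
        children.setdefault(parent, []).append(child)


def count_downward(person, n):
    """Number of lines from person down to descendants n generations below."""
    if n == 0:
        return 1
    return sum(count_downward(kid, n - 1) for kid in children.get(person, []))


forward = {a: count_downward(a, N) for a in generation_zero}
backward = {a: jars[a] for a in generation_zero}
print(backward)  # {'a': 4, 'b': 4, 'c': 3, 'd': 3, 'e': 1, 'f': 1, 'g': 0}
assert forward == backward
```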

averaging

To calculate the average number of nth-generation descendants for the individuals in the zeroth generation, add those numbers up, and divide by the number of individuals. In other words: count all the coins, and divide by the number of individuals.

how many

Now, before you start drawing a family tree and searching the house for jars and coins, think a moment about just what we end up calculating and how.
To calculate the average, we merely count all the coins. We are only calculating an average, so we do not really care how the coins end up being distributed over those ancestors, we only care how many coins there are. But if we do not care how they are distributed, why bother with the coins and jars in the first place? Why draw a tree at all? We already know exactly how many coins there are.

formula

We know that we will be placing one coin for each ancestor of each nth-generation descendant, and we know that each nth-generation descendant has $2^n$ zeroth-generation ancestors, so we would be busy distributing $C \times 2^n$ coins over A jars, only to end up dividing the total number of coins, $C \times 2^n$, by A. Well, if that is the formula anyway, we can do without all the props and simply plug the numbers into the formula: $D = C \times 2^n \div A$.
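To see that the total does not depend on how the children are distributed, the sketch below builds a hypothetical random population (the generation sizes are made up, and each person gets two distinct parents drawn at random from the previous generation) and runs the coin procedure iteratively. Whatever the seed, the jars always hold exactly $C \times 2^n$ coins in total. This counts coins before any correction for pedigree collapse, which a small random population is likely to have.

```python
import random


def random_population(sizes, seed=1):
    """Hypothetical population: sizes[g] individuals in generation g.

    Each individual in generation g > 0 gets two distinct parents drawn at
    random from generation g - 1, so family sizes are very uneven.
    """
    rng = random.Random(seed)
    parents = {}
    generations = [[(0, i) for i in range(sizes[0])]]
    for g in range(1, len(sizes)):
        previous = generations[-1]
        current = [(g, i) for i in range(sizes[g])]
        for person in current:
            parents[person] = tuple(rng.sample(previous, 2))
        generations.append(current)
    return parents, generations


def coin_jars(parents, current, n):
    """Backward coin procedure: one coin per ancestral slot of each descendant."""
    jars = {}
    stack = [(person, n) for person in current]
    while stack:
        person, steps = stack.pop()
        if steps == 0:
            jars[person] = jars.get(person, 0) + 1
        else:
            stack.extend((parent, steps - 1) for parent in parents[person])
    return jars


sizes = [40, 55, 70, 90]  # made-up generation sizes, so n = 3
parents, generations = random_population(sizes)
n = len(sizes) - 1
jars = coin_jars(parents, generations[-1], n)

total = sum(jars.values())
print(total, sizes[-1] * 2 ** n)  # 720 720
print(total / sizes[0])           # 18.0
```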

correction factor

There is one important thing that I’ve ignored so far, and that is pedigree collapse. There are always exactly $2^n$ ancestors n generations ago, but those $2^n$ ancestors are not necessarily $2^n$ different individuals. Some may be the same person, and if you go back far enough, you are sure to find the same ones over and over again.
There are always exactly $2^n$ descendant-ancestor relationships (lines of descent), but at most $2^n$ unique ancestors. That is where the correction factor comes in.

If you were to place all coins in front of the jars, and wait to throw them in until you’ve placed all the ancestral coins for one of the nth-generation descendants, you would sometimes find multiple coins in front of the same jar. To end up with the right number of descendants, you would throw only one of those coins in. You would always trace exactly $2^n$ descendant-ancestor relationships for each nth-generation descendant, but throw at most $2^n$ coins into the ancestor jars.

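Without a correction factor, the formula counts a descendant who reaches the same ancestor along two lines twice, along three lines thrice, and so on. The correction factor c is the fraction of the $2^n$ coins per descendant that actually go into a jar: it is 1 when there is no pedigree collapse, and smaller than 1 when there is.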
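A sketch of the correction factor, assuming c is taken as the share of ancestral slots that are filled by distinct individuals. The tree is hypothetical: first cousins x and y (they share grandparents a and b) have two children, z1 and z2, each of whom has 8 ancestral slots three generations up but only 6 distinct ancestors. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

# Hypothetical tree with pedigree collapse: child -> (parent, parent).
parents = {
    "p": ("a", "b"), "q": ("a", "b"),  # siblings
    "r": ("c", "d"), "s": ("e", "f"),
    "x": ("p", "r"), "y": ("q", "s"),  # x and y are first cousins
    "z1": ("x", "y"), "z2": ("x", "y"),
}
generation_zero = ["a", "b", "c", "d", "e", "f"]
current = ["z1", "z2"]
n = 3


def ancestral_slots(person, n):
    """All 2**n ancestral slots n generations up, duplicates included."""
    if n == 0:
        return [person]
    return [anc for parent in parents[person] for anc in ancestral_slots(parent, n - 1)]


C, A = len(current), len(generation_zero)
slots = sum(len(ancestral_slots(p, n)) for p in current)        # C * 2**n coins
unique = sum(len(set(ancestral_slots(p, n))) for p in current)  # coins thrown in

c = Fraction(unique, slots)              # share of coins that go into a jar
uncorrected = Fraction(C * 2 ** n, A)
corrected = uncorrected * c

print(slots, unique, c)        # 16 12 3/4
print(uncorrected, corrected)  # 8/3 2
```

Each of the six ancestors in this tree really has exactly two third-generation descendants, so the corrected value of 2 is right, while the uncorrected 8/3 counts z1 and z2 twice for a and b.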

unknown parents

You do not always know all ancestors: sometimes the father is unknown, and sometimes the mother was known at the time but not recorded.
This does not influence the formula at all.

Consider the entire population: you may not know who the ancestor is, but you do know that the ancestor is within the population. Now consider other individuals with their family trees spread out on other tables. Each of you is busy placing coins into jars and is left with a number of coins that should go into, well, unknown jars. Some of these jars may actually correspond to zeroth-generation individuals in your tree, in which case the formula is not affected: you do not know which of your jars to place the coin in, but for the calculation of the average, you do not care which jar it is anyway.

Now consider the coins that belong in other trees. If you were somehow able to figure out where these coins belong, you could walk over to the right table and hand them your coins. But the people at the other tables would be doing the very same thing, and various tables would be handing you some coins. On average, each table would receive just as many coins as it gave away, and that is why it doesn’t matter.

There are unknown parents outside your documented tree, but there are also children in other trees for which the individuals in your tree are the undocumented parents. On average, those two values compensate each other perfectly, and that is why the calculation of the average number of descendants is not affected.

Similar reasoning applies to so-called non-parental events. As long as you are merely calculating an average, and are not trying to incorporate specific knowledge of such events, they do not affect the calculation.

limits

The formula merely calculates an average, not a lower or upper limit. The lower limit for the number of descendants is always zero. There is no hard upper limit, but you could use the known record for most children to calculate an impractically high number.

You could also use available statistics on the number of children, and on the number of children that have children themselves, to calculate limits with varying degrees of certainty, so that you can make statements such as "it is 98.76 % certain that this group has fewer than 4312 descendants in the 5th generation".

population statistics

The formula applies to specific generations, without regard to them being alive or not. It does not automatically apply to the number of people alive at a particular time. Any such number includes multiple generations, and does not include those who’ve already died.

You should not carelessly mix up the number of individuals in a particular generation with the number of people alive at a certain time. Those are related but different values.

In practice, you’ll probably have to use population statistics as your source of information. The most basic population statistic is the number of people alive at a certain time.

If you are after the number of individuals in a specific generation, you will have to guesstimate that number from the population statistics you have. For example, you could assume that there are roughly three generations alive at the time, with more people in the most recent one than in the older ones, and guesstimate that the generation you are after makes up 30 percent of the total population at that time. If you have more population numbers, you could use these to make a better estimate.
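A sketch of that guesstimate, with made-up population figures and shares; `generation_size` is a hypothetical helper, and the shares are assumptions you would have to justify from your own population statistics.

```python
def generation_size(population_alive: float, share: float) -> float:
    """Guesstimate one generation's size as a share of everyone alive then.

    The share (e.g. 0.30) is an assumption you have to supply yourself.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return population_alive * share


def average_descendants(current, ancestral, n, c=1.0):
    """D = C * 2**n / A * c"""
    return current * 2 ** n / ancestral * c


# Made-up figures, for illustration only.
A = generation_size(1_000_000, 0.30)  # generation back then
C = generation_size(6_000_000, 0.25)  # current generation
print(A, C)                            # 300000.0 1500000.0
print(average_descendants(C, A, n=8))  # 1280.0
```

If you assume the same share for both generations, it cancels out of the formula, which is one reason why plugging in raw population counts is often not far off.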

people alive

You could ignore this issue for simplicity’s sake, and simply plug in the number of people alive at the time as the number of individuals in that particular generation. When you do so, you are actually calculating something else; the average number of living descendants for all people living back then assuming an average of n generations in between.

That is not the same thing, but it is a fairly reasonable estimate for the average number of nth-generation descendants for the main generation back then. It may be off by a small factor, but it is very unlikely to be off by an order of magnitude.

conclusion

Because of the correction factor for pedigree collapse and the difference between individuals in a generation and people alive, it is hard to calculate the average exactly, but straightforward plugging in of population figures does produce a reasonable estimate of the average, and that is likely to be good enough for most purposes. After all, even if it produced exactly the average, it would still be an average; individual genealogies will vary.

Still, it is possible to calculate a reasonable estimate for the average number of nth-generation descendants from nothing but estimates for the total number of individuals in both generations.
