I just came across a blog post that claims that it is impossible to calculate
an average number of nth-generation descendants for ancestors that lived n
generations ago.
The blog author claims it is impossible because you do not know how many
children they had, whether these children lived to adulthood, whether these had
children themselves and because the number of children per couple isn’t a
constant.
It is true that the only way to determine the actual number of nth-generation descendants for any particular person is to research the genealogy, but that does not imply you cannot calculate an average without doing so.
It is fairly easy to come up with a reasonable estimate for the average number of nth-generation descendants for ancestors n (or more) generations ago by combining a simple formula and some population statistics.
To calculate the average number of nth-generation descendants, you need a few numbers: the number of individuals in the generation back then, the number of individuals in the current generation, and the number of generations in between. You need to look up or estimate these numbers.
Moreover, for best results, you need numbers that apply to the group you are
researching. For example, if you are investigating French ancestry, you’d want
to use the numbers for France.
Now, if you were to stick to just the French numbers, you’d be calculating the
average number of French descendants, and be excluding any American descendants.
So, you may or may not want to use emigration figures and some numbers for the
other country to adjust your numbers a bit.
What makes this calculation interesting are all the numbers you do not need. You do not need to estimate how many children each couple has, or how many of these children go on to have children of their own. You do not need to estimate how many couples each person formed (how often they remarried). You do not need to know their average age. You do not need to know how many are male or female.
You do not even need to know or estimate the number of individuals in the generations in between. All you need to know is the number of individuals in the current generation and the number of individuals in the generation back then.
Perhaps most interesting of all is that you do not need to know the distribution of children. It does not matter that one couple has two children, another has eight children and yet another has none. It does not matter if for one couple all children go on to have many children of their own, and for another only a few have a few children of their own.
Once you are happy that you have good numbers for the group you are researching, the formula is very simple:
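D = C × 2^n ÷ A × c
Here D is the average number of nth-generation descendants, C is the number of individuals in the current generation, A is the number of individuals in the ancestral generation, n is the number of generations in between, and c is a correction factor.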
I’ll explain the correction factor in a minute. Let’s assume it is 1 (i.e. no correction) for now and look at an example calculation.
Say you are studying a family in which there were five individuals two generations ago and there are 25 individuals in the current generation. Put that into the formula, and we get:
D = C × 2^n ÷ A × c, with C = 25, n = 2, A = 5 and c = 1
D = 25 × 2^2 ÷ 5 × 1
D = 25 × 4 ÷ 5
D = 20
Thus, the average number of descendants for those five ancestors back then is 20.
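The calculation above can be written as a small function. This is only a minimal sketch; the function name and parameter names are made up for illustration and simply mirror the symbols C, n, A and c from the formula.

```python
def average_descendants(current, generations, ancestors, correction=1.0):
    """Average number of nth-generation descendants: D = C * 2^n / A * c.

    current     -- C, individuals in the current generation
    generations -- n, number of generations in between
    ancestors   -- A, individuals in the ancestral generation
    correction  -- c, correction factor (1.0 means no correction)
    """
    if ancestors <= 0:
        raise ValueError("the ancestral generation must not be empty")
    return current * 2 ** generations / ancestors * correction


# The example from the text: 25 descendants, 2 generations, 5 ancestors.
print(average_descendants(current=25, generations=2, ancestors=5))  # 20.0
```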
You can easily construct an example tree to verify that this number is right. Say that of the five individuals two generations ago, one did not have any descendants at all, and the other four formed two couples that had just one child each. Those two children married each other and had five sets of quintuplets, thus making 25 children in the current generation.
That then makes four grandparents that each have 25 second-generation descendants, while the fifth person in that ancestral generation has none. So, the average number of second-generation descendants for those five individuals is 20. This example tree is definitely unbalanced, yet the formula works perfectly.
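For readers who prefer to check this by machine, here is a rough sketch that builds the example tree and counts the second-generation descendants of each ancestor directly. The labels G1 to G5, P1, P2 and K1 to K25 are invented for this illustration; they are not names from any real genealogy.

```python
# Hypothetical labels: G1..G5 are the five individuals two generations ago,
# P1 and P2 are the two children in between, K1..K25 are the quintuplets.
kids = [f"K{i}" for i in range(1, 26)]
children = {
    "G1": ["P1"], "G2": ["P1"],   # first couple, one child
    "G3": ["P2"], "G4": ["P2"],   # second couple, one child
    "G5": [],                     # no descendants at all
    "P1": kids, "P2": kids,       # P1 and P2 married each other
}
generation_zero = ["G1", "G2", "G3", "G4", "G5"]


def descendants_at(person, depth):
    """Return the set of descendants exactly `depth` generations below `person`."""
    if depth == 0:
        return {person}
    found = set()
    for child in children.get(person, []):
        found |= descendants_at(child, depth - 1)
    return found


counts = {g: len(descendants_at(g, 2)) for g in generation_zero}
average = sum(counts.values()) / len(generation_zero)

print(counts)   # {'G1': 25, 'G2': 25, 'G3': 25, 'G4': 25, 'G5': 0}
print(average)  # 20.0
```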
The reasoning behind this formula is very simple. We do not know how many children each ancestor had, and have no idea how these were distributed among them, so it may seem hopeless, but this formula makes good use of the one thing we do know: each nth-generation descendant has 2^n ancestors in the early generation (which we call generation zero). That we do know, and that we use.
This is the crucial bit of reasoning: there are 2^n ancestors for whom that nth-generation descendant is a descendant. It is a matter of shifting your viewpoint. From the viewpoint of the nth-generation descendant, there are 2^n ancestors. From the viewpoint of each of those 2^n ancestors, that person counts as one of their nth-generation descendants.
Calculating the average number of nth-generation descendants in the current generation is easy once you know how many nth-generation descendants each individual in the zeroth generation has; add up the numbers for each individual and divide by the number of individuals - and that is exactly what this formula does.
That the formula is doing exactly this may not be immediately obvious, so let’s explain how it works with a physical metaphor. Imagine for a moment that you have figured out the actual tree already. If you have the tree, figuring out the number of nth-generation descendants for each individual in the zeroth generation is easy; just ask the computer to do it. Now imagine you have to figure it out yourself.
Here’s one way to do it. You place a drawing or print of the tree on a large table, and place small empty containers, for example glass jars, on each of the zeroth-generation ancestors.
You could then follow the lines from each ancestor to their descendants and throw the right number of small items, say coins, into the corresponding jar. Once you have thus enumerated all nth-generation descendants for each ancestor, you count the number of coins in each jar, and then calculate the average.
That would work in theory, but following all the lines to all descendants to end up with the right number of coins is a complex and error-prone procedure.
So, let’s do it the other way round, using the fact that you might just as well trace each line between ancestor and descendant in the other direction. Repeat the following procedure for each nth-generation descendant: trace their ancestry back to the zeroth generation and throw one coin into the jar of each ancestor in that generation. You trace the same lines, so you end up with the same result, the same number of coins in each jar.
It is not hard to see that this will give the same result. One procedure starts at all the zeroth-generation ancestors to find all nth-generation descendants for each, the other starts at all nth-generation descendants to find all zeroth-generation ancestors for each. Both procedures always follow all the branches exactly once. One procedure traces all descendants for each ancestor, the other traces all ancestors for each descendant. Both procedures end up tracing each unique ancestor-descendant relationship exactly once.
Now, the beauty of starting with the descendants is that you can use the coins to help you do so. Start by placing one coin on an individual in the current generation, and then follow this recursive procedure: remove the coin and replace it with two coins, one on each parent, then repeat for those parents until you hit the glass jars, and put the coins in the jars. Then do the same thing for the next person in the current generation, until you’ve done this for all the nth-generation descendants, and you are left with a bunch of glass jars with a varying number of coins in each. The number of coins in each jar is the exact number of nth-generation descendants for that ancestor.
To calculate the average number of nth-generation descendants for the individuals in the zeroth generation, add those numbers up, and divide by the number of individuals. In other words: count all the coins, and divide by the number of individuals.
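The coin procedure translates almost literally into a short recursive routine. The sketch below runs it on the five-ancestor example tree from earlier in this article; the labels (G1 to G5, P1, P2, K1 to K25) and the function name are invented for illustration, and the sketch assumes a tree without repeated ancestors.

```python
from collections import Counter

# Hypothetical labels for the example tree: child -> (parent, parent).
parents = {"P1": ("G1", "G2"), "P2": ("G3", "G4")}
parents.update({f"K{i}": ("P1", "P2") for i in range(1, 26)})

generation_zero = ["G1", "G2", "G3", "G4", "G5"]   # G5 has no descendants
current = [f"K{i}" for i in range(1, 26)]
n = 2                                              # generations in between

jars = Counter()                                   # one jar per ancestor


def move_coin_up(person, generations_left):
    """Replace the coin on `person` with one coin on each parent, recursively."""
    if generations_left == 0:
        jars[person] += 1                          # reached a jar: drop the coin in
        return
    for parent in parents[person]:
        move_coin_up(parent, generations_left - 1)


for descendant in current:
    move_coin_up(descendant, n)

total_coins = sum(jars.values())
average = total_coins / len(generation_zero)

print(total_coins, len(current) * 2 ** n)  # 100 100
print(average)                             # 20.0
```

Note that the total number of coins equals the number of descendants times 2^n, whatever the shape of the tree.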
Now, before you start drawing a family tree and searching the house for jars and coins, think a moment about just what we end up calculating and how.
To calculate the average, we merely count all the coins. We are only calculating an average, so we do not really care how the coins end up being distributed over those ancestors, we only care how many coins there are. But if we do not care how they are distributed, why bother with the coins and jars in the first place? Why draw a tree at all? We already know exactly how many coins there are.
We know that we will be placing one coin for each ancestor of each nth-generation descendant, and we know that each nth-generation descendant has 2^n zeroth-generation ancestors; so we would be busy distributing C times 2^n coins over A jars, only to end up dividing the total number of coins, C × 2^n, by A. Well, if that is the formula anyway, we can do without the tree, the coins and the jars, and simply plug the numbers into the formula: D = C × 2^n ÷ A.
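The same argument can be written as a single line of double counting. In this sketch, d_i is an ad hoc symbol (not used elsewhere in this article) for the number of coins that end up in the jar of ancestor i, and the correction factor is left at 1:

```latex
\[
D \;=\; \frac{1}{A}\sum_{i=1}^{A} d_i
  \;=\; \frac{1}{A}\sum_{j=1}^{C} 2^{n}
  \;=\; \frac{C \times 2^{n}}{A}
\]
```

The first sum counts the coins jar by jar; the second counts the very same coins descendant by descendant, each of the C descendants contributing 2^n coins.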
There is one important thing that I’ve ignored so far, and that is the issue of pedigree collapse. There are always exactly 2^n ancestors n generations ago, but those 2^n ancestors are not necessarily 2^n different ancestors. Some might be the same ones, and if you go back far enough, you are sure to find the same ones over and over again.
There are always exactly 2^n unique descendant-ancestor relationships, but merely at most 2^n unique ancestors. That is where the correction factor comes in.
If you were to place all coins in front of each jar, and wait with throwing them in until you’ve placed all the ancestral coins for one of the nth-generation descendants, you would sometimes find multiple coins in front of the same jar. To end up calculating the right number of descendants, you would throw only one of those coins in. You would always trace exactly 2^n unique descendant-ancestor relationships for each nth-generation descendant, but throw at most 2^n coins in the ancestor jars.
Without a correction factor, the formula counts a descendant twice for an ancestor who appears twice in that descendant’s pedigree, thrice for an ancestor who appears three times, and so on.
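A tiny example may make the effect of pedigree collapse concrete. The tree below is invented for illustration: one descendant X whose parents F and M are first cousins, so two of the six ancestors three generations back (G1 and G2) each appear twice in the pedigree. The sketch treats the correction factor as the ratio of coins actually thrown in to coins placed; that reading is an assumption made here, not a definition taken from the text.

```python
from collections import Counter

# Hypothetical tree: child -> (parent, parent). S1 and S2 are siblings,
# so their children F and M are first cousins, and X is the child of F and M.
parents = {
    "S1": ("G1", "G2"), "S2": ("G1", "G2"),
    "T1": ("G3", "G4"), "T2": ("G5", "G6"),
    "F": ("S1", "T1"), "M": ("S2", "T2"),
    "X": ("F", "M"),
}
generation_zero = ["G1", "G2", "G3", "G4", "G5", "G6"]
current = ["X"]
n = 3


def ancestral_paths(person, depth):
    """Return one entry per ancestral line, so repeated ancestors repeat."""
    if depth == 0:
        return [person]
    lines = []
    for parent in parents[person]:
        lines.extend(ancestral_paths(parent, depth - 1))
    return lines


placed = Counter()   # every coin placed in front of a jar
thrown = Counter()   # at most one coin per jar per descendant
for descendant in current:
    lines = ancestral_paths(descendant, n)
    placed.update(lines)
    thrown.update(set(lines))

uncorrected = sum(placed.values()) / len(generation_zero)   # 8 coins over 6 jars
actual = sum(thrown.values()) / len(generation_zero)        # 6 coins over 6 jars
correction = sum(thrown.values()) / sum(placed.values())    # assumed meaning of c

print(placed["G1"], thrown["G1"])  # 2 1
print(actual, correction)          # 1.0 0.75
```

In this tree the uncorrected formula gives 8 ÷ 6, about 1.33, while each of the six ancestors really has exactly one third-generation descendant.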
You do not always know all ancestors; sometimes the father is unknown, and sometimes the mother was known at the time but not recorded. This does not influence the formula at all.
Consider the entire population; you may not know who the ancestor is, but you do know that the ancestor is within the population. Now consider other individuals with their family tree spread out on other tables. Each of you is busy placing coins into jars and is left with a number of coins that should go into, well, unknown jars. Some of these jars may actually correspond to zeroth-generation individuals in your tree, in which case the formula is not affected: you do not know which of your jars to place the coin in, but for the calculation of the average, you do not care which jar it is anyway.
Now consider the coins that belong in other trees. If you were somehow able to figure out where these coins belong, you could walk over to the right table and hand them your coins. But those people at the other tables would be doing the very same thing, and various tables would be handing you some coins. On average, each table would receive just as many coins as it gave away, and that is why it doesn’t matter.
There are unknown parents outside your documented tree, but there are also children in other trees for which the individuals in your tree are the undocumented parents. On average, those two values compensate each other perfectly, and that is why the calculation of the average number of descendants is not affected.
Similar reasoning applies to so-called non-parental events. As long as you are merely calculating an average, and are not trying to incorporate some specific knowledge of such events, they do not affect the calculation.
The formula merely calculates an average, not a lower or upper limit. The lower limit for the number of descendants is always zero. There is no hard upper limit, but you could use the known record for most children to calculate an impractically high number.
You could also use available statistics on the number of children and the number of children that have children themselves to calculate limits with varying degrees of certainty, to be able to make statements such as: it is 98.76% certain that this group has fewer than 4312 descendants in the 5th generation.
The formula applies to specific generations, without regard to them being alive or not. It does not automatically apply to the number of people alive at a particular time. Any such number includes multiple generations, and does not include those who’ve already died.
You should not carelessly mix the number of individuals in a particular generation with the number of people alive at a certain time. Those are related but different values.
In practice, you’ll probably have to use population statistics as your source for information. The most basic population statistic is the number of people alive at a certain time.
If you are after the number of individuals in a specific generation, you will have to guesstimate that number from the population statistics you have. For example, you could assume that there are roughly three generations alive at the time, with more people in the most recent one than the older ones, and guesstimate that the generation you are after makes up 30 percent of the total population at that time. If you have more population numbers, you could use these to make a better estimate.
You could ignore this issue for simplicity’s sake, and simply plug in the number of people alive at the time as the number of individuals in that particular generation. When you do so, you are actually calculating something else: the average number of living descendants for all people living back then, assuming an average of n generations in between.
That is not the same as, but it is a fairly reasonable estimate for, the average number of nth-generation descendants for the main generation back then. It may be off by a small factor, but it is very unlikely to be off by an order of magnitude.
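To get a feel for how much the shortcut matters, the sketch below compares both approaches. All figures in it are placeholders chosen for easy arithmetic, not real population statistics; the 30 percent share is the guess used as an example above, the 40 percent share for the current generation is an arbitrary assumption, and the correction factor is left at 1.

```python
def average_descendants(current, generations, ancestors, correction=1.0):
    """D = C * 2^n / A * c, as in the formula from this article."""
    return current * 2 ** generations / ancestors * correction


# Placeholder figures, NOT real statistics.
population_then = 1_000_000   # people alive back then
population_now = 4_000_000    # people alive today
n = 5                         # assumed number of generations in between
share_then = 0.30             # guessed share of the generation of interest
share_now = 0.40              # arbitrary guess for the current generation

# Refined estimate: guesstimate the size of each generation first.
refined = average_descendants(
    current=population_now * share_now,
    generations=n,
    ancestors=population_then * share_then,
)

# Crude estimate: plug in the raw population figures.
crude = average_descendants(population_now, n, population_then)

print(crude)            # 128.0
print(refined / crude)  # ratio of the two shares, roughly 1.33
```

With these made-up numbers the two estimates differ by the ratio of the two guessed shares, which is a small factor rather than an order of magnitude.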
Because of the correction factor for pedigree collapse and the difference between individuals in a generation and people alive, it is hard to calculate the average exactly, but straightforward plugging in of population figures does produce a reasonable estimate of the average, and that is likely to be good enough for most purposes. After all, even if it produced exactly the average, it would still be an average; individual genealogies will vary.
Still, it is possible to calculate a reasonable estimate for the average number of nth-generation descendants from nothing but estimates for the total number of individuals in both generations.
Copyright © Tamura Jones. All Rights reserved.