Modern Software Experience

2009-06-04

Genealogy

Average

What is the average size of a digital genealogy? What was it a generation ago, what is it today, and what will it be a generation from now?

What is the size of the average genealogy today, one generation after the introduction of genealogy software?

today

Two of those three questions are easily answered. Genealogy software appeared a generation ago, so back then, the average size of a digital genealogy was close to zero. A generation from now, the average size will have more than doubled, but just what is today’s average size?

statistics

Well, calculating an average is in the realm of statistics, and you know what they say about statistics…

To make sure that I do not disappoint those who expect numbers worse than lies, I will first show that the average is less than ten, and then use the same numbers to show that it is more than four million.

today’s average

What is today’s average genealogy size? It is hard to answer that. Anyone who has been researching for a few years is likely to have several thousand individuals in their database, but there are many smaller databases around, and quite a few much larger ones as well.

vendor limits

Vendors whose software is limited to handling just a few thousand individuals are likely to argue that the average is just a few hundred, or at most a few thousand, as only that position allows them to conclude that they are able to handle most databases and serve most users.

Did these vendors figure out the average first and then make sure their limits are higher? Or did they arbitrarily impose some limit, and are they now trying to rationalise the limitation of their product?
What is the average genealogy size anyway?

official numbers

The article My Large is Smaller than Yours highlights the maximum GEDCOM sizes of several vendors. Several vendors have or had a limit of 5.000 individuals. Some have even smaller ones, but if we were to go by this somewhat popular vendor limit, we’d guess that 5.000 must be pretty large and guess that the average is a few hundred or a few thousand at most.

Truth is, as long as we do not know how these vendors decided on the limitations of their products, those limitations are not even a poor indicator of average size, but no indicator at all.

Luckily, some other vendors have published actual numbers from which an average can be calculated. Surprisingly, these numbers suggest an even smaller average size. Way smaller.

official numbers

Press releases for social genealogy applications and sites such as We’re Related and Geni include numbers to boast how popular these applications are. Press releases that mention both the number of profiles and the number of users allow calculating an average.

The article How Geni beats We’re Related argues that Geni’s profiles/users ratio (P/U ratio) is three times that of We’re Related simply because Geni supports GEDCOM import and We’re Related does not.

That is an interesting thought, but let’s get back to the more immediate observation that both ratios are remarkably small.

application    date        profiles     users       ratio
Geni.com       2009-03-30  50.000.000   3.000.000   15
We’re Related  2009-05-21  200.000.000  40.000.000  5

If we simply average the Geni and We’re Related ratios (we’re after an average, after all), the average size appears to be 10. If we take into account that We’re Related claims more users, and calculate the weighted average, we find that it is considerably less than 10 (250 M / 43 M = 5,8).
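The difference between the two averages can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function and variable names are my own, and the ratios are the rounded ones from the table (50 M / 3 M is actually closer to 16,7 than to 15).

```python
def simple_average(ratios):
    """Unweighted mean of the per-site profiles/users ratios."""
    return sum(ratios) / len(ratios)


def weighted_average(profiles, users):
    """Mean weighted by user count: total profiles over total users."""
    return sum(profiles) / sum(users)


# Rounded P/U ratios as listed in the table (Geni.com, We're Related).
table_ratios = [15, 5]

# Profile and user counts from the same table.
profiles = [50_000_000, 200_000_000]
users = [3_000_000, 40_000_000]

print(simple_average(table_ratios))                  # 10.0
print(round(weighted_average(profiles, users), 1))   # 5.8
```

The weighted figure is lower because We’re Related, with its smaller ratio, contributes more than ten times as many users as Geni.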

In My Large is Smaller than Yours, I disagreed with Brigham Young University calling a database as small as just seven individuals large, but these numbers suggest they were actually right to do so, because seven is above average size already.

The numbers for Geni and We’re Related suggest that the average genealogy is so small, that a limit of 5.000 individuals is not just large, but positively gigantic.

MyHeritage

MyHeritage home page numbers

The numbers on the home page of MyHeritage seem to confirm the average of 10 profiles per user; 344.242.531 names for 32.092.296 members is 10,7.
Interestingly, MyHeritage lists a third number, the number of family sites, and we should probably divide by that instead of the number of users anyway. If we do so, the average becomes 344.242.531 / 6.791.344 = 50,7.
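For those who want to check the two MyHeritage divisions, here is a minimal sketch using the three home page numbers quoted above. The variable names are mine, not MyHeritage’s.

```python
# Numbers as quoted from the MyHeritage home page.
names = 344_242_531
members = 32_092_296
family_sites = 6_791_344

# Average per member, and average per family site.
per_member = names / members
per_site = names / family_sites

print(f"{per_member:.1f}")  # 10.7
print(f"{per_site:.1f}")    # 50.7
```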

From my personal experience browsing MyHeritage, that number strikes me as quite possible. However, MyHeritage’s average isn’t a good estimate of today’s average tree size, simply because all three MyHeritage numbers are strongly influenced by the pushy behaviour of Family Tree Builder and their low limit for free use. Still, for what it is worth, the MyHeritage numbers seem to confirm that the average is just a 2-digit value.

merging trees

The Geni and We’re Related numbers show that the average size of a tree is less than ten individuals. Or do they?

Perhaps the calculation of the average was not entirely correct. The calculation did not take tree merging into account. The ability to merge trees of different users into a single larger tree and then work together on it is the defining feature of social genealogy applications, and I ignored that feature altogether.

The calculation of the average genealogy size should take tree merging into account. After all, even if every user were to merge their tree with only one other user, the average tree would still be about twice as big as the previous calculation suggests.
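A small sketch makes the doubling visible. The numbers are made up for illustration (four hypothetical users with six profiles each), and the sketch assumes that the merged trees do not share any profiles, which is why the text says "about" twice as big.

```python
def average_tree_size_per_user(tree_sizes, users_per_tree):
    """Average size of the tree that a user belongs to.

    tree_sizes[i] is the number of profiles in tree i,
    users_per_tree[i] is the number of users working on that tree.
    """
    total_users = sum(users_per_tree)
    weighted_sizes = sum(s * u for s, u in zip(tree_sizes, users_per_tree))
    return weighted_sizes / total_users


# Hypothetical data: four users, each with a separate tree of 6 profiles.
before = average_tree_size_per_user([6, 6, 6, 6], [1, 1, 1, 1])

# The same users after merging pairwise, assuming no overlapping profiles:
# two trees of 12 profiles, each shared by two users.
after = average_tree_size_per_user([12, 12], [2, 2])

print(before, after)  # 6.0 12.0
```

The total number of profiles and users has not changed, so the profiles/users ratio is still 6; only the size of the tree each user is part of has doubled.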

The Big Tree

Geni users can merge their trees, and the largest fragment that has resulted so far is known as The Big Tree. Never mind that it isn’t a tree; that’s what they call it.

Geni.com has been using this tree to promote the site. Their 2008 Aug 4 press release Family Trees Grow Virally on Geni.com boasts that “The largest tree on Geni now contains profiles of over 600.000 people and was built by over 40.000 users”, which averages out to 15 profiles per user.

There is a Google spreadsheet that tracks the size of The Big Tree, and it is between 15 million and 16 million profiles large already. The users in The Big Tree may have contributed only 15 profiles each (600.000 / 40.000 = 15), but now all connect to more than 15 million profiles, a million times as much. Perhaps we should recalculate the average size...

average

To estimate the average size of each user’s tree, taking into account that many are connected to The Big Tree, we need to know how many users are connected to it. The 2009 Apr 15 press release Geni.com Launches Geni Facebook Application helps us out: “Over 650.000 users have already joined Geni’s largest family tree, which has over 12 million connected profiles”.

If we assume that the current number of connected users is about ¾ million, then ¼ of the 3 million users in total are part of The Big Tree. So, ignoring the size of all other trees, and assuming a current size for The Big Tree of 16 million profiles, the average genealogy size is 4 million already (¼ × 16 million).
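The same calculation as a short sketch, under the assumptions just stated: ¾ million connected users, 3 million users in total, a Big Tree of 16 million profiles, and every tree outside The Big Tree counted as size zero. The variable names are my own.

```python
# Assumptions taken from the paragraph above.
total_users = 3_000_000
connected_users = 750_000          # about 3/4 million users in The Big Tree
big_tree_profiles = 16_000_000     # assumed current size of The Big Tree

# Each connected user's tree counts as the whole Big Tree;
# all other trees are ignored, that is, counted as zero.
average = connected_users * big_tree_profiles / total_users

print(f"{average:,.0f}")  # 4,000,000
```

Counting the other trees at their real size would only push the average up further, so 4 million is a lower bound under these assumptions.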

Size is relative, and average size is a statistic.

updates

2009-06-05 Social Genealogy Metrics

The Social Genealogy Metrics article defines some metrics that do make sense.
