Modern Software Experience

2009-06-13

Genealogy Hosting

Average Size

Average GEDCOM Upload Size discussed factors that bias the average size of a genealogy hosting service and set some criteria for a site to calculate an average for. FamilyLink’s WebTree meets these criteria, and Average WebTree Size calculated the average for this site.

That average for WebTree is roughly 12.500 individuals per database. How does that compare to other sites, biased or not?

GENDEX Sites

There are not that many genealogy hosting sites that publish usable numbers, but there is another way to get numbers for a large sample of online genealogy databases: GENDEX Sites.

The GENDEX Sites article lists several historical and current numbers for GENDEX sites.

vendor bias

The figures for GENDEX sites do have some vendor bias. Few applications support GENDEX directly, but those that do are likely to be overrepresented. Still, with multiple applications supporting GENDEX directly, and each one of these having a different bias, the GENDEX bias might well be negligible.

GENDEX versus GEDCOM

The GEDCOM and GENDEX formats serve different purposes. Simply put, GEDCOM provides full data, and GENDEX provides just the index.
Both formats yield the same individual count, yet the GEDCOM average and GENDEX average are not likely to be exactly the same.

There is not only a vendor bias, but GENDEX is also lesser known. The group of GENDEX users is not just smaller than the group of GEDCOM users, but also has a slightly different make-up.

GENDEX

The original GENDEX site was operational for eight years (1996 - 2004). The last submission to GENDEX was made some five years ago, and it included databases submitted 13 years ago, so we expect a relatively low average.

When GENDEX shut down in 2004, it had indexed more than 60.000.000 records in more than 22.000 databases; that is an average of 2.727,27.

FamilyTreeSeeker

The FamilyTreeSeeker site offers an English language interface to the Dutch StamboomZoeker site, which was started on 2006 Sep 25.
FamilyTreeSeeker contains mostly Dutch data. FamilyTreeSeeker is naturally biased towards statistics for Dutch genealogies and several popular Dutch applications.

FamilyTreeSeeker has a statistics page. There currently are 15.492.073 records in 5.709 databases. Thus the average size is 2.713,62.

The statistics page makes it clear that the largest tree it indexes is one I recognise as another site, one which contains an unknown number of databases. Still, it also shows that FamilyTreeSeeker indexes 17 databases that contain more than 100.000 individuals each, of which four even contain more than 250.000 individuals each. The statistics page lists the 25 largest databases, and the smallest of these is just under 80.000 individuals.

Correcting by removing the largest tree from the total, the average becomes (15.492.073 - 934.949) / 5.708 = 14.557.124 / 5.708 = 2.550,30.
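The correction above is easy to reproduce. The following is a minimal Python sketch using only the figures quoted in this article; the function names average_size and dutch are my own, and dutch merely mimics the number formatting used here.

```python
def average_size(records: int, databases: int) -> float:
    """Mean number of records per database."""
    return records / databases


def dutch(value: float) -> str:
    """Format with '.' as thousands separator and ',' as decimal mark."""
    english = f"{value:,.2f}"  # e.g. "2,713.62"
    return english.replace(",", "X").replace(".", ",").replace("X", ".")


total_records = 15_492_073   # FamilyTreeSeeker total, as quoted above
total_databases = 5_709
largest_tree = 934_949       # the largest indexed tree, as quoted above

raw = average_size(total_records, total_databases)
# Remove the outlier from both the record count and the database count.
corrected = average_size(total_records - largest_tree, total_databases - 1)

print(dutch(raw))        # 2.713,62
print(dutch(corrected))  # 2.550,30
```

Note that the outlier has to be removed from both the numerator and the denominator; subtracting it from the record count alone would understate the corrected average slightly.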

It is fairly remarkable that, so many years later, the average is not considerably higher than that for the original GENDEX site. It is actually lower. Perhaps this relates to the ease of GENDEX export in several Dutch applications.
The average and the size of the largest databases do suggest that FamilyTreeSeeker is indexing a lot of very small trees. Perhaps that is because tracing ancestors before 1811, the country-wide introduction of family names, is more difficult, and researchers stop there.

GenServ

When I just visited GenServ’s home page, it boasted 23068778 individuals/notes online in 16037 GEDCOM databases, which averages to just 1.438,47 individuals per GEDCOM.

smaller

That GenServ’s average is remarkably smaller than the other averages is easily explained by two things.
Their offer of 15 days free access in exchange for any GEDCOM with at least 20 valid names is likely to prompt many small uploads. More than one visitor will have uploaded many small files instead of one large one.
Additionally, GenServ has been in operation since 1991, a time when the average size of genealogy databases was a lot smaller than it is now.

TNG Network

Technically, the TNG Network is a GENDEX system, but practically it is a vendor-specific system. It was started on 2006 Jan 16.

I just checked the TNG Network. The home page states that there are 7.729.635 records in 1.236 databases. That is an average size of 6.253,75.

When I visited for the 2009 Feb 21 article, there were 7.065.060 records in 1.055 databases, which averages to 6.696,74 individuals per database.

According to the Camera on the Road article GENDEX Database Project: Connecting Genealogy Family Trees Online, published on 2006 Oct 19, there were 4 million records in 230 databases in the TNG Network back then, and that averages to 17.391,30 records per database.

There are several obvious observations here.
First of all, the TNG Network average is higher than that of the other GENDEX sites. That makes sense, as the TNG Network is a more recent site, containing more recent databases, and the average has gone up over time.

Surprising is that the 2006 average is higher than the WebTree average, and that the average has decreased since then. What happened here?
Is something similar happening to WebTree? Will its average decrease towards 7.000?
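One way to probe that question is to recompute the three TNG Network snapshots and look at what the added databases must have averaged. The Python sketch below uses only the figures quoted above. Its second half rests on an assumption of my own, not on anything the sites state: that existing databases did not change size between snapshots, so all added records belong to newly added databases. The 2006 record count is the rounded 4 million, so that row is approximate. The output uses English number formatting.

```python
# (label, records, databases) as quoted in the article.
snapshots = [
    ("2006 Oct 19", 4_000_000, 230),    # rounded "4 million"
    ("2009 Feb 21", 7_065_060, 1_055),
    ("2009 Jun", 7_729_635, 1_236),
]

averages = [records / databases for _, records, databases in snapshots]
for (label, _, _), avg in zip(snapshots, averages):
    print(f"{label}: {avg:,.2f} records per database")

# Assumption (mine): existing databases kept their size, so every added
# record belongs to a database added between two snapshots.
added = []
for (_, r0, d0), (_, r1, d1) in zip(snapshots, snapshots[1:]):
    added.append((r1 - r0) / (d1 - d0))

print([round(a) for a in added])
```

Under that assumption, the databases added after 2006 average well under 4.000 records each, which would be enough to pull the overall average down from its early high. It is only one possible reading; existing databases may also have been updated or removed.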

GEDCOM sites

Many genealogy hosting sites that allow you to upload a GEDCOM do boast the total number of records in their database, but do not list how many users contributed that data, making it impossible to calculate an average.
Some sites do provide the necessary data.

GenCircles

GenCircles is a GEDCOM hosting site. The domain was registered in 1999, and the site started in 2000. That is almost ten years ago, so we expect a relatively low average.

GenCircles used to be quite popular, but it was put up for sale in 2006, and in 2007, owner Pearl Street Software was bought by MyHeritage.

average

The GenCircles site is still operational, but the rate of new uploads has slowed significantly since then. So, although the average calculated is for 2009, it is not representative for the period 2000-2009, but perhaps more for the period 2000-2007.

There is enough public data on the GenCircles site to calculate the average size of the currently more than 100.000 trees. That average size turns out to be 1.946,98.

bias

GenCircles is associated with Pearl Street Software, the maker of Family Tree Legends, and will therefore have a bias towards the average for that application.
However, the most important factor dragging down GenCircles’s average is its Smart Matching feature. The possibility of a match with already uploaded databases makes it attractive to upload your database, so many users did so earlier than they would otherwise have done.

MyHeritage

When I visited on 2009 Jun 4, the MyHeritage home page claimed 344.242.531 names in 6.971.344 family sites, which averages to 49,38 individuals per database. That is better than Geni and We’re Related, but it is extremely low compared to other averages listed here.

The explanation for the low number is simply that MyHeritage has a limit for free use, one that was recently lowered from 500 to 250. That is so recent that it seems safe to say that MyHeritage has an average database size of 50 (instead of 2.500) because its free use limit is 500.

RootsWeb WorldConnect

RootsWeb WorldConnect was started in 1999 and is still going strong.

When I checked on 2009 Jun 10, it stated exactly this: Names: 580,636,456 Surnames: 5,087,186 Databases: 421,867. Well, 580.636.456 divided by 421.867 is 1.376,35.

One explanation for the low average is that this collection does contain databases contributed as much as ten years ago, when the average size was a lot less than it is now.

However, a brief look into WorldConnect’s history shows another significant factor. The current numbers include data from Ancestry World Tree (AWT), which has a much lower average than RootsWeb WorldConnect.
Before the merger of AWT into WorldConnect back in 2001, the WorldConnect average was about 2.500 individuals per database, and after the merger, the average of the combined collection was close to 1.000 individuals per database.

It seems that Ancestry World Tree’s low average size has been dragging down the RootsWeb average ever since. It has climbed back up from 1.000 to about 1.350, but it is still way below the about 2.500 it used to be.
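How a merger drags an average down can be sketched in Python. The average of a combined collection is the mean weighted by database count, not the mean of the two averages. The database counts and the AWT average below are hypothetical; I chose them only so that the result matches the rough before (about 2.500) and after (about 1.000) figures mentioned above. The function name combined_average is my own.

```python
def combined_average(collections):
    """Average size over several collections.

    Each collection is a (database_count, average_size) pair. The result is
    the mean weighted by database count, not the mean of the averages.
    """
    total_records = sum(count * avg for count, avg in collections)
    total_databases = sum(count for count, _ in collections)
    return total_records / total_databases


# Hypothetical figures, for illustration only.
worldconnect = (100_000, 2_500)  # hypothetical count, article's rough average
awt = (400_000, 625)             # hypothetical count and hypothetical average

merged = combined_average([worldconnect, awt])
print(merged)  # 1000.0
```

With these made-up inputs, a large collection of small databases outweighs the smaller collection of large ones, and the combined average lands far below 2.500.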

2.500

increasing

The numbers for these databases suggest that the average upload size was about 2.500 in say 2000 already. It seems reasonable to expect the current average to be higher. Various databases have a low average for a variety of reasons, but the average for the TNG Network, about 7.000, is in line with the expectation of a growing average. Its average was larger than the WebTree average, and the current difference between the two is less than a factor two, so perhaps the actual average is in between, say at about 10.000?

decreasing

Then again, perhaps the average upload size has been near constant or really is declining, because genealogy is becoming more popular. So each database is increasing in size, but there is a growing number of databases, and the new databases keep the average low.
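That every database can grow while the average still declines may seem counter-intuitive, so here is a small Python simulation. All inputs are hypothetical and chosen purely for illustration: ten starting databases of 2.500 individuals, 10% growth per year for every database, and ten newcomers of 500 individuals each year. The function name simulate is my own.

```python
def simulate(years, start_count=10, start_size=2_500,
             growth=0.10, new_per_year=10, new_size=500):
    """Return the average database size per year (all inputs hypothetical)."""
    sizes = [float(start_size)] * start_count
    averages = [sum(sizes) / len(sizes)]
    for _ in range(years):
        sizes = [s * (1 + growth) for s in sizes]   # every database grows
        sizes += [float(new_size)] * new_per_year   # newcomers start small
        averages.append(sum(sizes) / len(sizes))
    return averages


averages = simulate(years=3)
print([round(a) for a in averages])  # [2500, 1625, 1358, 1246]
```

In this toy model no database ever shrinks, yet the average falls every year, because the steady influx of small new databases outweighs the growth of the existing ones.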

Perhaps it is not the average database size that is decreasing, but the average upload size. As applications and web sites are becoming easier to use, and the web has become a common part of life, users are quicker to upload their database than they were in the past?

Perhaps WebTree and TNG somehow mostly appeal to users of large databases, while the other sites appeal to a more general audience?

median

Interestingly, the average size of WebTree databases is about 12.500, but the median size of WebTree databases is about 2.500.
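A gap like that between average and median is typical of a skewed distribution, where a few very large databases pull the average up while the median stays with the typical database. The Python sketch below shows the effect with nine hypothetical database sizes. These are not actual WebTree data; I picked them only so that the result matches the two figures above.

```python
from statistics import mean, median

# Hypothetical database sizes, for illustration only: many modest trees
# and one very large one.
sizes = [500, 1_000, 2_000, 2_500, 2_500, 3_000, 4_000, 10_000, 87_000]

print(mean(sizes))    # 12500
print(median(sizes))  # 2500

# Dropping the single largest database changes the average drastically,
# while the median barely moves.
without_largest = sorted(sizes)[:-1]
print(mean(without_largest))
print(median(without_largest))
```

In this made-up sample, one database accounts for most of the records, so the average says more about that one database than about the typical upload.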

updates

2011-10-02: Camera on the Road

The Camera on the Road web site times out. The link has been removed.
