Modern Software Experience

2009-06-13

WebTree

average size

Average GEDCOM Upload Size discussed how features of genealogy hosting sites bias their averages, and concluded that the best estimate for the average GEDCOM upload size would come from a site that is free, is not associated with a desktop vendor, has no upload limits, and allows you to delete your database. It should of course publish its numbers, and ideally be a fairly new site, to avoid older databases dragging the average down.

WebTree

There is a site that meets all these criteria: FamilyLink’s WebTree.
FamilyLink has no ties to any desktop application. It has several web applications, but there is no direct connection with WebTree, and the GEDCOM support of FamilyLink’s applications is poor or non-existent.
WebTree was introduced about a year ago, is free, has no upload limits, and does publish numbers.

homepage

FamilyLink’s WebTree prominently displays the current number of ancestors (actually, profiles) that have been contributed by its users, but the home page does not show how many users or trees there are, so it is not possible to calculate any average using just the information on the home page.

I seem to recall that the site looked rather nice when it started, but they’ve apparently slapped several items onto the page without regard for looks, and it sure looks messy now.

The home page does display the latest ten contributed trees and their sizes. When I just looked, the smallest one had just 10 individuals, the largest nearly 4.000 individuals, and most had more than 1.000.

all trees

There is a See All link next to the Latest Family Trees heading. If you follow that, it shows an overview of all trees. When I just visited, there were 1.022 trees, and the number of profiles was 13.317.144, so the average size of those trees is 13.317.144 / 1.022 = 13.030,47.
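The division above is easy to check. The following is a minimal sketch, not anything WebTree provides; the helper names `parse_dotted` and `format_dotted` are made up for this illustration, and they merely handle the dotted thousands separators and decimal comma used in this article.

```python
def parse_dotted(text: str) -> int:
    """Parse an integer written with dots as thousands separators, e.g. '13.317.144'."""
    return int(text.replace(".", ""))


def format_dotted(value: float, decimals: int = 2) -> str:
    """Format a number with dots as thousands separators and a decimal comma."""
    english = f"{value:,.{decimals}f}"  # e.g. '13,030.47'
    # Swap the two separators via a temporary placeholder character.
    return english.replace(",", "\0").replace(".", ",").replace("\0", ".")


profiles = parse_dotted("13.317.144")  # total profiles shown on the overview page
trees = parse_dotted("1.022")          # number of trees claimed by the site
print(format_dotted(profiles / trees))
```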

12.500

I’ll make a small practical adjustment to this number in a bit. Right now I just want to remark that this average is more than 10.000, even more than 12.500.

If we accept 12.500 as the rounded estimate of average size, then the average size of genealogy databases is one-eighth of 100.000. To put that in some perspective: if this average doubles every generation, it will take three generations to reach an average of 100.000.
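The doubling argument can be spelled out in a few lines. This is only a sketch of the arithmetic; the function name `generations_to_reach` is invented for the illustration, and doubling once per generation is the assumption stated in the text, not a measured growth rate.

```python
def generations_to_reach(start: int, target: int, factor: int = 2) -> int:
    """Count how many times `start` must be multiplied by `factor` to reach `target`."""
    generations = 0
    size = start
    while size < target:
        size *= factor
        generations += 1
    return generations


# 12.500 -> 25.000 -> 50.000 -> 100.000
print(generations_to_reach(12_500, 100_000))
```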

single page

WebTree defaults to showing the trees in reverse chronological order. It offers an alphabetical view too, but not one that orders the trees by size. It will show an overview of all trees on a single page.

When I just visited, the first ten trees in the alphabetical view were 36.585, 12.382, 25.628, 15.792, 53.360, 20, 18.279, 18.437, 52 and 835 profiles respectively.

Quickly browsing through the page showing all the databases, I spotted eleven trees of more than 100.000 profiles each, and three containing more than 90.000 profiles. I later found that there are a few more.

spreadsheet

I was looking for a site that would give totals and the number of databases, but I had found something better: a site that gives all the numbers. I decided to take advantage of this happy circumstance.

I saved the page that shows the numbers for all trees as a file, and then massaged that file a bit to turn it into a spreadsheet for analysis.
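To give an idea of what such a conversion involves, here is a rough sketch. It is not the procedure actually used, and the page structure is an assumption: it pretends each tree is a table row with two cells, a name and a dotted profile count. The class `RowCollector`, the function `html_to_csv` and the tree names in the sample are all hypothetical; the two counts are taken from the alphabetical listing quoted earlier.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped per <tr> row (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # guard against an unclosed cell


def html_to_csv(html: str) -> str:
    """Turn the assumed two-column table into CSV text with plain integers."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tree", "profiles"])
    for row in collector.rows:
        if len(row) == 2:  # skip rows that do not match the assumed layout
            name, count = row
            writer.writerow([name, int(count.replace(".", ""))])
    return out.getvalue()


# Hypothetical sample; the real page markup was messier than this.
sample = (
    "<table>"
    "<tr><td>Example Tree A</td><td>36.585</td></tr>"
    "<tr><td>Example Tree B</td><td>20</td></tr>"
    "</table>"
)
print(html_to_csv(sample))
```

The resulting CSV text opens directly in any spreadsheet program. A tolerant parser is used here because it copes with markup that is not well-formed.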

That was a bit harder than it should have been, because FamilyLink does not exactly respect web standards. Their page coding is needlessly convoluted and messy too, but in the end the conversion into a spreadsheet succeeded.

fewer trees

It turns out that although the site claims there are 1.022 trees, the overview page shows only 1.015 records. There is no doubt about that at all. I could not have miscounted it if I wanted to. The page code shows a number for each record, and the last number in the page is 1.015.

Perhaps seven trees were updated, and the count at the top of the page actually shows the number of uploads instead of the number of trees.
Whatever the exact reason for the difference, all calculations that follow were done with those 1.015 records.

average

Those 1.015 trees contain 12.246.941 profiles, so the average size is 12.065,95 profiles. That is well over 10.000, and still close to 12.500, ⅛ of 100.000.
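To check how close both averages are to the rounded 12.500, here is a small sketch that uses only the figures quoted in this article. The names `average_size` and `deviation_percent` are made up for the illustration.

```python
ROUNDED_ESTIMATE = 12_500  # one-eighth of 100.000


def average_size(profiles: int, trees: int) -> float:
    """Average number of profiles per tree."""
    return profiles / trees


def deviation_percent(value: float, reference: float = ROUNDED_ESTIMATE) -> float:
    """Signed difference between value and reference, as a percentage of reference."""
    return (value - reference) / reference * 100


listed = average_size(12_246_941, 1_015)   # sum over the records actually listed
claimed = average_size(13_317_144, 1_022)  # totals shown at the top of the page

print(f"{listed:.2f} ({deviation_percent(listed):+.1f}%)")
print(f"{claimed:.2f} ({deviation_percent(claimed):+.1f}%)")
```

Both averages are within five percent of 12.500, one below and one above it.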

right?

An average size of 12.500 individuals? Is this number right? Is this the average size of today’s GEDCOM upload?

I do not know. All I know is that this number is right for WebTree, a site that actually fits the criteria I came up with. Perhaps I need better criteria, but I cannot simply dismiss this number because I expected a higher or lower value. I set the criteria and found a site that fits them in order to calculate the number I wanted to know, not to confirm some value I expected.

objection

Perhaps the most significant objection against this particular number is that it is based on a sample of not much more than a thousand databases. That is a rather small sample compared to some of the other sites, which are collections containing millions of databases. That does leave a relatively large margin of error. Still, there is no particular reason to assume the actual average is much larger or smaller.

not unreasonable

I have calculated numbers for more sites. Most of these sites have some bias, and most of their averages are lower, yet they generally do suggest that the average size is in the thousands. Therefore, I dare say that this estimate is not unreasonable.

more

There is much more to say about the statistics for WebTree and what these statistics mean, if anything. But it is perhaps best to first discuss other averages, biased or not, to put this number in some perspective.

updates

2010-05-03 WebTree no more

FamilyLink has abandoned WebTree. See WebTree no more.

2010-04-23 WebTree link

The broken WebTree link has been removed.
