Average GEDCOM Upload Size discussed how features of genealogy hosting sites bias their average, and concluded that the best estimate for the average GEDCOM upload size would be provided by a site that is free, is not associated with a desktop vendor, has no upload limits, and allows deleting your database. It should of course provide public numbers and ideally be a fairly new site, to avoid older databases dragging the average down.
There is a site that meets all these criteria: FamilyLink’s WebTree.
FamilyLink has no ties to any desktop application. It has several web
applications, but there is no direct connection with WebTree, and the GEDCOM
support of FamilyLink’s applications is poor or non-existent.
WebTree was introduced about a year ago, is free, has no upload limits, and it
does publish numbers.
FamilyLink’s WebTree prominently displays the current number of ancestors
(actually, profiles) that have been contributed by its users, but the home page
does not show how many users or trees there are, so it is not possible to
calculate any average using just the information on the home page.
I seem to recall that the site looked rather nice when it started, but they’ve apparently slapped several items onto the page without regard for looks, and it sure looks messy now.
The home page does display the latest ten contributed trees and their sizes. When I just looked, the smallest one was just 10 individuals, the largest nearly 4.000 individuals, and most were larger than 1.000.
There is a See All link next to the Latest Family Trees heading.
If you follow that, it shows an overview of all trees. When I just visited,
there were 1.022 trees, and the number of profiles was 13.317.144, so the
average size of those trees is 13.317.144 / 1.022 = 13.030,47.
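The division above is easy to check. A minimal sketch in Python, using only the two figures quoted from the overview page; the variable names are mine, and the numbers are written without the thousands separators used in the text:

```python
# Figures quoted from the WebTree overview page at the time of the visit.
total_profiles = 13_317_144
claimed_trees = 1_022

# Mean tree size: total number of profiles divided by the number of trees.
average_size = total_profiles / claimed_trees

print(f"{average_size:.2f}")
```

This prints 13030.47, which is 13.030,47 in the notation used here.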
I’ll make a small practical adjustment to this number in a bit. Right now I just want to remark that this average is more than 10.000, even more than 12.500.
If we accept 12.500 as the rounded estimate of average size, then the average size of genealogy databases is one-eighth of 100.000. Thus, to put it in some perspective: if this average doubles every generation, it will take three generations to reach an average of 100.000.
WebTree defaults to showing the trees in antichronological order. It offers an alphabetic view too, but not one that orders the trees by size. It will show an overview of all trees on a single page.
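The three-generation figure follows from 100.000 / 12.500 = 8 = 2³. A small sketch that checks it two ways, assuming nothing beyond the rounded estimate and the doubling per generation stated above:

```python
import math

average_size = 12_500   # rounded estimate from the text
target_size = 100_000

# Number of doublings needed: log2(target / current).
doublings = math.log2(target_size / average_size)

# Cross-check by doubling step by step until the target is reached.
size = average_size
steps = 0
while size < target_size:
    size *= 2
    steps += 1

print(doublings, steps)
```

Both the logarithm and the step-by-step loop arrive at three doublings.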
When I just visited, the first ten trees in the alphabetical view were 36.585, 12.382, 25.628, 15.792, 53.360, 20, 18.279, 18.437, 52 and 835 profiles respectively.
Quickly browsing through the page showing all the databases, I spotted eleven trees of more than 100.000 profiles each, and three containing more than 90.000 profiles. I later found that there are a few more.
I was looking for a site that would give totals and the number of databases, but had found something better: a site that gives all the numbers. I decided to take advantage of this happy circumstance.
I saved the page that shows the numbers for all trees as a file, and then massaged that file a bit to turn it into a spreadsheet for analysis.
That was a bit harder than it should have been, because FamilyLink does not exactly respect web standards. Their page coding is needlessly convoluted and messy too, but in the end the conversion into a spreadsheet succeeded.
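For readers who want to try something similar, here is a rough sketch of such a conversion using only the Python standard library. It is not the procedure actually used for this article. The markup in `sample_html`, the tree names, the column order and the helper names `RowCollector` and `parse_size` are all hypothetical; the real page was messier and would need more cleanup.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_size(text):
    """Turn a size written as '13.030' into the integer 13030."""
    return int(text.replace(".", ""))


# Hypothetical markup and tree names; an assumption, not the real page.
sample_html = """
<table>
  <tr><td>1</td><td>Example Tree A</td><td>36.585</td></tr>
  <tr><td>2</td><td>Example Tree B</td><td>20</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample_html)

# Write record number, tree name and numeric size as CSV for a spreadsheet.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["record", "name", "profiles"])
for record, name, size in collector.rows:
    writer.writerow([int(record), name, parse_size(size)])

print(output.getvalue())
```

Stripping the thousands separators before the import matters, because a spreadsheet set to a different locale may otherwise read 36.585 as a decimal fraction.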
It turns out that although the site claims there are 1.022 trees, the overview page shows only 1.015 records. There is no doubt about that at all. I could not have miscounted it if I wanted to. The page code shows a number for each record, and the last number in the page is 1.015.
Perhaps seven trees were updated, and the count at the top of the page actually
shows the number of uploads instead of the number of trees.
Whatever the exact reason for the difference, all calculations that follow were
done with those 1.015 records.
Those 1.015 trees contain 12.246.941 profiles, so the average size is 12.065,95 profiles. That is well over 10.000, and still close to 12.500, ⅛ of 100.000.
An average size of 12.500 individuals? Is this number right? Is this the average size of today’s GEDCOM upload?
I do not know. All I know is that this number is right for WebTree, a site that actually fits the criteria I came up with. Perhaps I need better criteria, but I cannot simply dismiss this number because I expected a higher or lower value. I set criteria and found a site that fits them in order to calculate the number I wanted to know, not to confirm some value I expected.
Perhaps the most significant objection against this particular number is that it is based on a sample of not much more than a thousand databases. That is a rather small sample compared to some of the other sites, which are collections containing millions of databases. That does leave a relatively large margin of error. Still, there is no particular reason to assume the actual average is much larger or smaller.
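To give an idea of how such a margin of error could be quantified, here is a sketch that computes the standard error of a mean. The full list of tree sizes is not reproduced in this article, so the ten sizes quoted earlier from the alphabetical view serve as stand-in data. Ten trees picked that way are not a random sample, so the result only illustrates the arithmetic and says nothing reliable about WebTree as a whole.

```python
import math
import statistics

# The ten tree sizes quoted from the alphabetical view; stand-in data only.
sizes = [36585, 12382, 25628, 15792, 53360, 20, 18279, 18437, 52, 835]

mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)  # sample standard deviation

# Standard error of the mean: it shrinks with the square root of the sample size.
standard_error = stdev / math.sqrt(len(sizes))

print(f"mean={mean:.1f} standard error={standard_error:.1f}")
```

Because the standard error falls with the square root of the number of trees, a sample of about a thousand databases leaves a noticeably wider margin than a collection of millions would.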
I have calculated numbers for more sites. Most of these sites have some bias, and most averages are lower, yet generally do suggest that the average size is in the thousands. Therefore, I dare say that this estimate is not unreasonable.
There is much more to say about the statistics for WebTree and what these statistics mean, if anything. But it is perhaps best to first discuss other averages, biased or not, to put this number in some perspective.
FamilyLink has abandoned WebTree. See WebTree no more.
The broken WebTree link has been removed.
Copyright © Tamura Jones. All Rights reserved.