Average Genealogy Hosting Size mentioned the average size for various genealogy hosting sites, including those for WebTree and GenCircles.
The numbers for WebTree and GenCircles were not derived from totals listed on each site’s home page, but from an analysis of the sizes of all trees.
Average WebTree Size discussed the derivation of the WebTree average. This article discusses how I got the data for GenCircles, as well as how I dealt with some issues I encountered.
Because GenCircles and Family Tree Legends were both products of Pearl Street Software, we can expect Family Tree Legends databases to be overrepresented on GenCircles, and thus a bias towards the average size of Family Tree Legends databases.
When Pearl Street Software put GenCircles up for sale they claimed 135
million records and 160.000 registered users. That would work out to less than
850 records per user, but not all the registered users contributed a tree.
Many users registered because they thought that was necessary; others registered
but failed to upload a database, and so on.
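The upper bound above is a simple division of the two claimed figures. A quick check, using only the numbers from the sales claim (the variable names are just illustrative):

```python
# Figures claimed by Pearl Street Software when GenCircles was put up for sale.
claimed_records = 135_000_000
registered_users = 160_000

# Records per registered user, assuming every registered user contributed a tree
# (which, as noted above, is not the case).
records_per_user = claimed_records / registered_users
print(records_per_user)  # 843.75
```

Because not every registered user uploaded a tree, the real average per tree has to be higher than this figure.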
There is enough public data on the GenCircles site to calculate the average
size of the currently more than 100.000 trees.
That average size turns out to be 1.946,98.
GenCircles has a “list all files” link on the home page. This leads to 27 pages with a list on each one: pages A through Z and page 0-9. Analysing that data is just a matter of copying those pages into a spreadsheet. Well, that is easier said than done.
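For readers who prefer a script over a spreadsheet, here is a minimal sketch of the same idea. It only shows why there are 27 pages and how one pasted page could be turned into rows. The row layout (tree name, a tab, then the size) is an assumption for illustration, not the actual GenCircles page format, and `parse_page` is a hypothetical helper.

```python
import string

# 26 letter pages (A through Z) plus one page for 0-9 gives 27 pages.
PAGE_LABELS = list(string.ascii_uppercase) + ["0-9"]


def parse_page(text):
    """Parse one pasted page into (name, size) rows.

    ASSUMPTION: each non-blank line is 'name<TAB>size'; the real listing
    may be laid out differently.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab, so a nameless tree yields an empty name.
        name, _, size = line.rpartition("\t")
        rows.append((name.strip(), int(size)))
    return rows


print(len(PAGE_LABELS))  # 27
```

Splitting on the last tab keeps nameless trees in the data with an empty name, which matters because such trees do occur in the listing.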
One practical issue is that the total number of trees (more than 100.000) is more than 65.536, and even 32-bit spreadsheets may have several 16-bit limits in them.
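To make the size of the problem concrete: a 16-bit row counter tops out at 65.536 rows, so a list of more than 100.000 trees does not fit on one such sheet. The sketch below shows one possible workaround, splitting the rows over several sheets. It is not necessarily what was done for this article, and the function names are made up for the example.

```python
import math

ROW_LIMIT_16_BIT = 2 ** 16  # 65,536 rows


def sheets_needed(row_count, rows_per_sheet=ROW_LIMIT_16_BIT):
    """Number of sheets required to hold row_count rows."""
    return math.ceil(row_count / rows_per_sheet)


def split_rows(rows, rows_per_sheet=ROW_LIMIT_16_BIT):
    """Split a list of rows into chunks that each fit on one sheet."""
    return [rows[i:i + rows_per_sheet]
            for i in range(0, len(rows), rows_per_sheet)]


# 101,370 is the number of trees left after cleanup; the raw list is longer.
print(sheets_needed(101_370))  # 2
```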
Once I had all the data copied into a spreadsheet, I noticed a few odd things. Not only do many trees have the same name, some trees have no name. Some names are hurriedly chosen names such as My Family, which may have been chosen by anyone, but at other times it is clear that a user has uploaded multiple versions of the same database over time. I have done nothing to correct either issue.
That GenCircles has quite a few trees with just one individual in them is not a unique issue, but that GenCircles has 1.975 trees with zero records, often trees with the same name as other ones, suggests that users have regularly encountered uploading issues that were never fixed.
Although databases with just one record arguably are not trees either, only the zero-sized trees have been removed from the analysis.
That included 1.326 nameless zero-sized trees. Once the zero-sized trees were removed, there were 649 nameless trees left. After removing another 8.396 empty trees, there were 101.370 trees left.
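The cleanup rule itself is simple: drop every tree with zero records, keep everything else, including one-record and nameless trees. A minimal sketch on made-up sample rows (the data and the helper name `drop_empty_trees` are purely illustrative, not taken from the GenCircles listing):

```python
def drop_empty_trees(trees):
    """Keep only trees with at least one record; names are left untouched."""
    return [(name, size) for name, size in trees if size > 0]


# Hypothetical sample: (tree name, number of records).
sample = [
    ("My Family", 0),    # zero-sized, removed
    ("", 0),             # nameless and zero-sized, removed
    ("", 12),            # nameless but not empty, kept
    ("My Family", 350),  # duplicate name, kept as-is
    ("Smith", 1),        # single record, kept
]

kept = drop_empty_trees(sample)
nameless_kept = sum(1 for name, _ in kept if not name)
print(len(kept), nameless_kept)  # 3 1
```

Duplicate and hurriedly chosen names are deliberately not deduplicated here, in line with the choice described above.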
There are 140 files larger than 100.000 individuals, of which 20 are larger than 250.000 individuals, and the largest is 846.886 individuals. The name of that database makes it clear it was contributed in 2007, so it is not unlikely that the corresponding desktop database has passed 1.000.000 records already. I guess that it is currently the largest genealogical research database owned by a single individual.
The 101.370 databases left after removing the empty databases from consideration contain 197.365.966 records in total, so that is an average of 1.946,98 records per database.
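The division can be reproduced from the two totals above. The quotient is about 1946,986, so the quoted 1.946,98 corresponds to cutting off after two decimals rather than rounding. That is an observation about the arithmetic, not a claim about how the spreadsheet was set up. The sketch uses integer arithmetic to truncate, and formats the result in the notation used in this article.

```python
total_records = 197_365_966
database_count = 101_370

# Average in hundredths, truncated (integer division), to avoid float rounding.
hundredths = total_records * 100 // database_count
whole, fraction = divmod(hundredths, 100)

# Dot as thousands separator, comma as decimal separator.
formatted = f"{whole:,}".replace(",", ".") + f",{fraction:02d}"
print(formatted)  # 1.946,98
```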
Copyright © Tamura Jones. All Rights reserved.