In June of 2009, I wrote a series of articles examining genealogy database size.
It all started with three genealogy articles I published at the end of May.
First, there was Medium Size Genealogy, the first ever attempt to estimate the size of a medium size genealogy:
the size of your database if you investigated your ancestry thoroughly, and did not investigate anything else.
It introduced a simple formula to calculate a ballpark figure, and for various estimates of the key values,
the medium size varies from 12.000 through 50.000 to 88.000.
The second article was My Large is Smaller than Yours, an article that poked fun at vendors
using "large" to describe medium, small, tiny and even minuscule databases, mostly to try and hide the fact that the capacity of their
application is severely limited.
That 2009 article concluded with the remark that "In an age of 32-bit desktop computers and 64-bit servers, every 16-bit value is small".
The third article was Average Number of Nth Generation Descendants. Some bloggers had claimed that it is impossible to calculate an average number of nth-generation descendants for ancestors that lived n generations ago; I proved otherwise by providing a formula to do so.
The first two articles prompted discussion in the geneasphere about just what is large; see the links in Average Size Month.
During the month of June, I wrote two interrelated series, one looking into the average database size of various genealogy hosting sites,
and one introducing social genealogy metrics; metrics for social genealogy sites.
Some unscientific polling suggested that many expected the average to be just a few thousand.
The final article in the first series, Genealogy Database Average, Median and Mode, notes that the average is
higher, but that the median fits the expectations; perhaps we actually estimate the median when we try to estimate the average.
The ballpark figures found are an average size of about 12.500, and a median size of about 2.500 individuals.
| database | size |
|---|---|
| Randy Seaver | 41.324 |
| Karen | 24.978 |
| Becky Higgins | 4.483 |
| Caroline Gurney | 5.096 |
| Pamela Wile | 1.845 |
| Carol | 16.661 |
| Celia | 8.066 |
| MNFamilyHistorian | 11.922 |
| Mel | 9.453 |
| Jacqueline Foster | 10.167 |
| Liz Tapley | 3.808 |
| Ginger Smith | 8.561 |
| Jen Smart | 1.308 |
| Julie | 4.837 |
| GeneaPopPop | 5.012 |
| Geniaus | 8.473 |
| Nastrond | 91.380 |
| MidWestAncestree | 23.476 |
| Elizabeth Handler | 4.347 |
| Doris Wheeler | 4.105 |
| Bill West | 25.971 |
| Reba Mc | 5.645 |
| Lyn Swan | 14.579 |
| Lis K | 517 |
| Sébastien Comeau | 43.018 |
| Tim Forsythe | 6.054 |
| Tessa Keough | 5.657 |
| number | 27 |
| total | 390.563 |
| average | 14.465 |
| median | 8.066 |
Every week, Randy Seaver of Genea-Musings posts Saturday Night Genealogy Fun (SNGF), challenging participants to perform some random genealogy or family history related task.
This week, the challenge was to post your genealogy database numbers.
Randy posted his own, and participants left theirs in comments or in a post on their own blog.
I decided to collect the key value, the number of individuals in each database, and calculate some statistics.
Most numbers are copied directly from the comments on Randy's blog post, or from the blog post the comment linked to. MidWestAncestree broke the database into five parts, probably because of performance or capacity issues with Ancestry Family Trees. The five numbers posted have been added together again.
There are a few tiny genealogy databases, quite a few small ones, five medium size genealogy databases, and one large one.
I did include Randy's database, but I did not include my own database, simply because I do not want to skew the results towards large databases.
That leaves 27 participants who posted their database size, totalling 390.563 individuals,
with an average database size of 14.465, and a median size of 8.066.
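
The arithmetic behind these statistics is simple enough to sketch in a few lines of Python. This is only an illustration, not the method actually used for the table: the helper names (`parse_size`, `format_size`, `summarise`) are made up for this example, the five sizes are merely a sample taken from the table, and the code assumes Python 3.9 or later. It reads figures written with dots as thousands separators, and reports the number, total, average and median in the same notation.

```python
from statistics import mean, median


def parse_size(text: str) -> int:
    """Parse a count written with dots as thousands separators, e.g. '41.324'."""
    return int(text.replace(".", ""))


def format_size(value: float) -> str:
    """Round to a whole number and format it with dots as thousands separators."""
    return f"{round(value):,}".replace(",", ".")


def summarise(sizes: list[str]) -> dict[str, str]:
    """Return number, total, average and median for a list of database sizes."""
    values = [parse_size(size) for size in sizes]
    return {
        "number": str(len(values)),
        "total": format_size(sum(values)),
        "average": format_size(mean(values)),
        "median": format_size(median(values)),
    }


# Five sizes taken from the table, used as sample input only (an assumption
# made to keep the example short; this is not the full list of participants).
sample = ["41.324", "24.978", "4.483", "5.096", "1.845"]
summary = summarise(sample)
for label, value in summary.items():
    print(f"{label:8} {value}")

# The published average follows from the published total and number.
published_average = format_size(parse_size("390.563") / 27)
print(f"{'published average':8} {published_average}")
```

Even in this small sample, the two largest databases pull the average well above the median, which is the same pattern the full list shows.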
Both the average and the median for the SNGF participants are higher than the average and median calculated in 2009.
Part of the difference is simply that, in the more than two years since, databases have grown larger.
Part of the reason is that participants are self-selecting, and some bloggers with tiny or small databases may feel too intimidated by the medium size databases to reveal their numbers.
Perhaps, but surely that effect was more than compensated for by leaving my own database out.
The main reason is probably that genealogy bloggers aren't average; the average (ahem) genealogy blogger is more enthusiastic about genealogy than most other genealogists.
Updated with more numbers from latest participants, and a comment on my Google+ post.
Updated with more numbers from comments on Randy's Google+ post. No comments on either of his two Facebook posts.
Copyright © Tamura Jones. All Rights reserved.