Modern Software Experience

2009-05-30

What’s Large?

different

Different vendors and organisations have completely different ideas as to what constitutes a large genealogy database.

Genbox

For example, the 2004 Nov 14 Release Notes: Version 3.3.1 page for Genbox Family History states: “Improved performance for importing large GEDCOM files. Tested with 100-MByte file (about 6 million lines; 400,000 individuals): analysis pass under 6 minutes; full import just under 6 hours (about 1100 individuals per minute).”

I consider that a rather modest use of large; for a file that size, use of very large seems appropriate. Not every vendor is as modest in their usage of the adjective large. In fact, quite a few apply it to databases that are small, tiny or minuscule.
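The Genbox figures quoted above are easy to sanity-check. The following is a minimal sketch that takes the numbers straight from the release note; the function and constant names are made up for illustration, and the import time is rounded up to a full 6 hours, so the computed rate is a slight underestimate.

```python
# Figures taken from the Genbox 3.3.1 release note quoted above.
INDIVIDUALS = 400_000
LINES = 6_000_000
IMPORT_HOURS = 6  # "just under 6 hours", so this is an upper bound on the time


def individuals_per_minute(individuals: int, hours: float) -> float:
    """Average import throughput in individuals per minute."""
    return individuals / (hours * 60)


def lines_per_individual(lines: int, individuals: int) -> float:
    """Average number of GEDCOM lines per individual in the file."""
    return lines / individuals


if __name__ == "__main__":
    rate = individuals_per_minute(INDIVIDUALS, IMPORT_HOURS)
    print(f"{rate:.0f} individuals per minute")
    print(f"{lines_per_individual(LINES, INDIVIDUALS):.0f} lines per individual")
```

At exactly 6 hours this works out to roughly 1,111 individuals per minute and 15 lines per individual, which agrees with the “about 1100 individuals per minute” in the release note.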

Northern Hill Software

The Change Log for Northern Hill Software’s Pocket Genealogist includes the following Bug Fix description: “Desktop Program would run out of memory when Importing LARGE gedcom files (more than 15,000 individuals).”

So, Northern Hill Software wants you to think a file of 15.000 individuals is large, when a well-researched medium-size ancestry can easily contain 25.000 individuals or more (see Medium-Size Genealogy). It would be a somewhat improper usage of the adjective if they had merely referred to it as large (lower case), but the fact that they even capitalise LARGE makes it a definite and quite deliberate improper usage. Still, their abusage of the adjective large to downplay the limitations of their software is mild compared to some of the following examples.

Geni.com

When Geni.com added GEDCOM import more than a year ago, that feature was limited to GEDCOM files of just 5.000 individuals or fewer. Yet their announcement on the Geni.com forum added that “you can now import a GEDCOM with up to 5.000 names!”, with an exclamation mark, as if 5.000 is a whole lot. It is not, and Geni.com user ScottHib was quick to tag the thread with LIFT GEDCOM IMPORT LIMITS.

Arcalife

Arcalife sent out emails a few days ago announcing that they had upgraded their GEDCOM import capability and now allow up to 10.000 individuals instead of just 5.000. I agree with the development direction, but I disagree with the way they describe it: “…now support very large sizes – more than 10,000 tree nodes or profiles.” A database of just 10.000 individuals is not large, and certainly not very large. It is smallish.

Ged2Web

The Ged2Web change history contains this sentence: “Version 2.63 dramatically speeds up the import and indexing of large GEDCOM files (in excess of about 5000 individuals).”

Referring to such a small limit as large may have made some sense in the early 1980s, when many people were just starting to enter their research into digital databases and few had entered even one thousand individuals yet, but it makes no sense today. The limit itself is also rather puzzling. Why 5.000? That is such a random number.

PGVHosting

The signup page for PGVHosting, a PHPGedView hosting service, states: “Large gedcom files (> 5MB) may timeout before uploading can be completed. You should compress large Gedcom files into ZIP files before uploading them.”

I’ve upgraded my five-year-old Windows XP machine to 2 GB RAM, and my Vista computer has 4 GB of RAM. A 5 MB file is not insignificant, but it is not really large either. Many documents are larger, and even that five-year-old computer has more than enough memory and computing power to handle it easily.

PHPGedView

Of course, the authors of the PGVHosting service merely referred to 5 MB as large because the PHPGedView documentation itself contains a similar statement. The FAQ: Installing PHPGedview page of the PGVWiki states: “There are two problems that could cause this when working with large GEDCOM files (>2MB): insufficient memory and insufficient time.”

A 2 MB file is not large. Memory is measured in gigabytes, disk space is measured in terabytes and 2 MB is less than a single digital photo from a 99 Euro camera. A 2 MB file is nothing special.

MyHeritage

Late in 2008, MyHeritage silently decreased the limits on its free and subscription plans. The limit for free use of their service was lowered from the already tiny 500 individuals to just half that, 250 people.

The message those low limits send to visitors just starting out in genealogy is misleading. Their limits of 250 individuals for Basic and 2500 for Premium suggest that 250 is large and 2500 is huge, while the truth is that 2500 is a very small maximum already.
To their credit, MyHeritage itself does not actually refer to either limit as large, but then again, they do actively tweet links to reviews that do.

TUFaT

TUFaT (The Ultimate Family Tree) is a little-known PHP and MySQL-based web application for creating a genealogy site.

The TUFaT web site probably has the most over-the-top reference to a tiny file as large; it actually skips large altogether and describes a GEDCOM file of less than half an MB as very large: “If your GEDCOM is very large (>500KB), then you may upload in multiple parts.”
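To put the file-size thresholds in terms of individuals, one can borrow the Genbox test file quoted earlier (100 MByte for 400,000 individuals) as a rough yardstick. The sketch below rests on two loud assumptions: that “100-MByte” means decimal megabytes, and that other GEDCOM files have the same average size per individual as that one test file. Real files vary widely with the amount of notes and sources, so treat the results as order-of-magnitude estimates only. All names in the code are illustrative.

```python
# Yardstick from the Genbox release note quoted earlier in this article.
GENBOX_BYTES = 100 * 1_000_000  # assumption: "100-MByte" read as decimal megabytes
GENBOX_INDIVIDUALS = 400_000

# Average bytes per individual in that one test file (250.0).
BYTES_PER_INDIVIDUAL = GENBOX_BYTES / GENBOX_INDIVIDUALS


def estimated_individuals(file_bytes: int) -> int:
    """Rough number of individuals in a GEDCOM file of the given size.

    Assumes the same average record size as the Genbox test file.
    """
    return round(file_bytes / BYTES_PER_INDIVIDUAL)


# File-size thresholds the quoted sites call "large" or "very large".
THRESHOLDS = {
    "PGVHosting (5 MB)": 5_000_000,
    "PHPGedView (2 MB)": 2_000_000,
    "TUFaT (500 KB)": 500_000,
}

if __name__ == "__main__":
    for label, size in THRESHOLDS.items():
        print(f"{label}: about {estimated_individuals(size)} individuals")
```

Under these assumptions, the 5 MB, 2 MB and 500 KB thresholds correspond to roughly 20.000, 8.000 and 2.000 individuals respectively.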

LDS BYU

You would expect the LDS-owned Brigham Young University (BYU) to lead by example, but they are actually the worst offender.

Lesson 4: GEDCOM Files of their BYU: Religion 261: Introduction to Family History Online Lessons contains the following sentence: “If you are importing a large GEDCOM, (over 7 generations), the process goes much faster if all files are on the ‘C’ drive.”

So, according to BYU, an ancestry of seven generations is large, despite the fact that you can construct one with as few as seven individuals. They presumably mean a full ancestry, but even a full 7-generation fan chart still contains just 127 persons. That is not a large database at all; it is a tiny, nearly negligible GEDCOM size that should import in mere milliseconds.
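The two numbers in that paragraph follow from simple counting. In a full ancestry, generation 1 is the root person, generation 2 the two parents, and each further generation doubles, so the total is 2^n − 1 for n generations. The smallest possible ancestry spanning n generations has one person per generation. A minimal sketch, with illustrative function names:

```python
def full_ancestry_size(generations: int) -> int:
    """Number of people in a complete ancestry of the given depth.

    Generation g holds 2**(g - 1) people (1 root, 2 parents, 4 grandparents,
    and so on), so the total over all generations is 2**generations - 1.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 2**generations - 1


def minimal_ancestry_size(generations: int) -> int:
    """Smallest ancestry spanning that many generations: one person each."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return generations


if __name__ == "__main__":
    print(full_ancestry_size(7))     # full 7-generation fan chart
    print(minimal_ancestry_size(7))  # single line of descent
```

For seven generations this gives 127 and 7, the two figures used above.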

conclusion

The suggestion that a small GEDCOM file of just 5.000 individuals is large seems a deceit common to several vendors, and some others are happy to present even smaller files as large or even very large.

anachronistic

In some cases the ridiculous statements can be explained as some old text that has not been updated in some time, but other limits belong to new products from new vendors.
That these vendors introduce limitations that an early MS-DOS programmer would be ashamed to admit to, and then describe their ridiculously small limit as large, is surprisingly anachronistic.

In an age of 32-bit desktop computers and 64-bit servers, every 16-bit value is small.
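One way to read that remark: every limit quoted in this article fits comfortably in a 16-bit integer. The sketch below checks this for the individual-count limits mentioned above. It is only an illustration of the size comparison, not a claim about how any of these products store their counts, and the dictionary and function names are made up for the example.

```python
UINT16_MAX = 2**16 - 1  # largest unsigned 16-bit value: 65535
INT16_MAX = 2**15 - 1   # largest signed 16-bit value: 32767

# Limits and thresholds quoted in this article, in individuals.
QUOTED_LIMITS = {
    "Pocket Genealogist": 15_000,
    "Arcalife": 10_000,
    "Geni.com": 5_000,
    "Ged2Web": 5_000,
    "MyHeritage Premium": 2_500,
    "MyHeritage Basic": 250,
    "BYU (full 7 generations)": 127,
}


def fits_in_int16(value: int) -> bool:
    """True if the value can be stored in a signed 16-bit integer."""
    return 0 <= value <= INT16_MAX


if __name__ == "__main__":
    for vendor, limit in QUOTED_LIMITS.items():
        print(f"{vendor}: {limit} fits in 16 bits: {fits_in_int16(limit)}")
```

Every entry fits even in a signed 16-bit integer, whose maximum is 32767.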

updates

2010-07-14 Ancestry.com Tree To Go

The web and iTunes shop page for Ancestry.com Tree To Go includes this telling claim: “Optimized to handle trees of all sizes, including trees with over 2,000 people.”

2011-04-24 BYU Religion 261

The BYU Religion 261 course has been updated. The new course does not seem to include the same error. BYU broke the original link. The broken link has been removed.

links