Modern Software Experience

2012-06-15

obsolete GEDCOM feature

multi-volume GEDCOM

It is a little-known feature of the GEDCOM specification that it supports multi-volume GEDCOM files. This isn't documented in chapter 1, but near the end of chapter 2, so it isn't a basic feature of the GEDCOM grammar, but a feature of the lineage-linked form.

how

Here's how it works. Normally, a single GEDCOM file has the *.GED extension. The full name could be Jones.GED.
When the file is too large to fit on a single volume, it is split into multiple files, and each of these files is placed onto a separate volume. The first file retains the *.GED extension, but the second file gets extension *.G00, the third one gets extension *.G01, etcetera. The file is split over as many volumes as necessary, but no more than one hundred and one volumes; *.G99 is the last extension this multi-volume scheme supports.
If the file Jones.GED were to be split over three volumes, the three filenames used would be Jones.GED, Jones.G00 and Jones.G01.
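
The naming scheme is easy to express in code. The following is a minimal Python sketch; the function name volume_name and the constant MAX_VOLUMES are made up for illustration, and the sketch only covers the *.GED, *.G00 to *.G99 scheme described above, not the Macintosh variant.

```python
from pathlib import PurePath

MAX_VOLUMES = 101  # *.GED plus *.G00 through *.G99


def volume_name(first_name: str, index: int) -> str:
    """Return the file name of volume `index`, where 0 is the *.GED file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not 0 <= index < MAX_VOLUMES:
        raise ValueError("the scheme supports at most 101 volumes")
    stem = PurePath(first_name).stem  # "Jones.GED" -> "Jones"
    if index == 0:
        return f"{stem}.GED"
    # The second volume is *.G00, so the extension number lags the index by one.
    return f"{stem}.G{index - 1:02d}"


print([volume_name("Jones.GED", i) for i in range(3)])
```

For a file split over three volumes, this prints the names Jones.GED, Jones.G00 and Jones.G01.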

The GEDCOM specification specifies a slightly different naming scheme for use with Macintosh floppies, but the overall approach is the same; the file is split into multiple smaller files, and the file name used for each volume ensures these parts will be put back together in the right order; if you accidentally inserted a disk out of order, the application would tell you that it expected another one.

writing

The rules for splitting are very simple; a split has to be made between two GEDCOM lines. So, each volume contains a whole number of lines.
There are no other rules for splitting a file. In practice, files would be split at roughly the capacity of the disk used, to minimise the number of disks needed.
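
A writer that follows this rule could look roughly like the Python sketch below. It is an illustration only: the function name split_gedcom and the volume_size parameter are assumptions, and the greedy strategy of filling each volume as far as possible is just one way to keep the number of disks low.

```python
MAX_VOLUMES = 101  # *.GED plus *.G00 through *.G99


def split_gedcom(data: bytes, volume_size: int) -> list:
    """Split GEDCOM data into parts of at most `volume_size` bytes.

    Hypothetical helper. Splits are made only between lines,
    so every part contains a whole number of GEDCOM lines.
    """
    volumes = []
    current = b""
    for line in data.splitlines(keepends=True):
        if len(line) > volume_size:
            raise ValueError("a single line does not fit on a volume")
        # Start a new volume when this line would overflow the current one.
        if current and len(current) + len(line) > volume_size:
            volumes.append(current)
            current = b""
        current += line
    if current:
        volumes.append(current)
    if len(volumes) > MAX_VOLUMES:
        raise ValueError("more than 101 volumes needed")
    return volumes
```

Joining the returned parts in order yields the original data again, byte for byte.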

reading

Detecting a multi-volume GEDCOM file is very easy. Nothing in the GEDCOM header informs a GEDCOM reader that it is reading a multi-volume GEDCOM file, but only a complete GEDCOM file is terminated with a TRLR record.

If a GEDCOM file terminates with a TRLR record at the end, the GEDCOM file is complete.
If there is no TRLR record, the GEDCOM file is incomplete, is apparently a multi-volume GEDCOM file, and the reader should ask for the next volume. The GEDCOM reader should continue to ask for the next volume until the part it just read terminates with a TRLR record. If a part ends with the TRLR record, the GEDCOM reader is done.
Although the numbering scheme used obviously limits multi-volume GEDCOM files to a total of 101 volumes, the GEDCOM specification does not anticipate this case, and does not specify how to handle it. The situation is unlikely, but it needs to be handled. Luckily, there is only one sensible thing to do. If the reader has handled the *.G99 volume and still not found the TRLR record, even a GEDCOM reader that supports multi-volume GEDCOM files will have to report that the GEDCOM file is incomplete.
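
The reading logic described above can be sketched in Python as follows. This is a simplified illustration, not a full GEDCOM reader: the names ends_with_trailer and read_all_volumes are invented here, and load_volume is a hypothetical callback that stands in for "ask the user for the volume with this file name and return its text".

```python
def ends_with_trailer(text: str) -> bool:
    """True if the last non-blank line is the level-0 TRLR record."""
    lines = [line for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1].split() == ["0", "TRLR"]


def read_all_volumes(first_name: str, load_volume) -> str:
    """Read volumes in order until one ends with the TRLR record.

    `load_volume(name)` is an assumed callback returning the text of
    the named volume; a real reader would prompt for the next disk.
    """
    stem = first_name.rsplit(".", 1)[0]
    # *.GED first, then *.G00 through *.G99: 101 names in total.
    names = [f"{stem}.GED"] + [f"{stem}.G{n:02d}" for n in range(100)]
    parts = []
    for name in names:
        text = load_volume(name)
        parts.append(text)
        if ends_with_trailer(text):
            return "".join(parts)  # complete GEDCOM file
    # Volume *.G99 has been read and there is still no TRLR record.
    raise ValueError("incomplete GEDCOM file: no TRLR record found")
```

A single-volume file is simply the case where the *.GED file itself already ends with the TRLR record, so the loop stops after the first volume.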

outdated feature

The current de facto GEDCOM standard is GEDCOM 5.5.1 from 1999, but the text on multi-volume GEDCOM files is identical to that of the GEDCOM 5.5 specification from 1995. It is also identical to the text in the GEDCOM 5.3 specification from 1993…

Back in 1995, the 5¼" HD floppy disk with a formatted capacity of 1.2 MB had been common for more than a decade already, and 3½" diskettes with a formatted capacity of 1.44 MB had been around for about half that time, and were common too.

Now, although most genealogical databases were still rather small, a few may have been too big for those capacities, but even so, the multi-volume GEDCOM file already was a solution in search of a problem. If a file did not fit a single disk, you would use an archiver such as ARC or ZIP, and if it still did not fit, the archiver provided the necessary multi-volume support.

Users could create a GEDCOM file on their hard disk, and then use an archiver to package it on multiple disks. There still were PCs without hard disks around, but anyone whose database was so large that the GEDCOM file needed multiple floppy disk volumes would probably have been unable to fit the application's native database on a floppy disk in the first place.

Moreover, Iomega introduced ZIP drives and disks late in 1994, and they quickly became popular. The initial ZIP drives supported ZIP disks with a capacity of 100 MB, and later models supported ZIP disks with capacities of 250 MB and 750 MB. Just a few years later, ZIP disks were surpassed in popularity by re-writable CDs, with a capacity of more than 650 MB.

superfluous

Even when they were still releasing updates of the GEDCOM specification, FamilySearch was already neglecting to update the multi-volume feature. It seems FamilySearch never bothered to rethink the status of this feature, but merely kept copying the same text from one GEDCOM specification to the next.

When the GEDCOM 5.5 specification was released late in 1995, the multi-volume GEDCOM support should have been deprecated already; it should have been something that new applications should still recognise and read, but no longer create.
When the GEDCOM 5.5.1 specification was introduced in 1999, support for the multi-volume format should have become optional. A few years later, it should have been removed from the specification proper and moved into an informational appendix.

The multi-volume feature should have been phased out of the GEDCOM specification. Even the backwards compatibility, a generally desirable feature, wasn't sufficient reason to keep the feature.
The ability to read existing multi-volume files was truly superfluous; if you ever happened across a GEDCOM file split into multiple volumes, you would not even need a utility to join them up, you could just join them back together into a single file with a COPY command or a text editor.
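
To show how little is involved, here is a minimal Python sketch that does what the COPY command does: it concatenates the volumes, in order, into one file. The function name join_volumes and the file names in the usage note are made up for illustration.

```python
import shutil


def join_volumes(volume_paths, target_path):
    """Concatenate the volumes, in the given order, into one file.

    Hypothetical helper; the caller is responsible for listing the
    volumes in the right order (*.GED, *.G00, *.G01, ...).
    """
    with open(target_path, "wb") as target:
        for path in volume_paths:
            with open(path, "rb") as part:
                # Copy the bytes unchanged; no GEDCOM parsing is needed.
                shutil.copyfileobj(part, target)
```

A call such as join_volumes(["Jones.GED", "Jones.G00", "Jones.G01"], "Joined.GED") would produce a single complete file, assuming all three volumes have first been copied to the current directory.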

today

It is an oddity that, more than a dozen years of GEDCOM neglect later, this superfluous feature is still part of the GEDCOM specification.
New computers do not even have floppy drives anymore, most desktop computers have DVD writers, Blu-ray writers are already a common feature, and most computers are networked. Tablets and cell phones accept multi-gigabyte USB sticks or SD cards.

Although many applications that have been around for years still read multi-volume GEDCOM files, many newer applications do not. Few developers bother to implement it anymore, and users certainly aren't clamouring for it.

The multi-volume GEDCOM feature is thoroughly obsolete, yet it is still part of the GEDCOM specification. It is still legal to split a GEDCOM file into multiple files, and for that annoying reason alone, vendors that desire to claim full GEDCOM support still have to support it.

Best Practice

GEDCOM writer

GEDCOM reader

GEDCOM validator

links