Modern Software Experience

2010-08-24

GEDCOM

Unlike many other introductions to GEDCOM, this text is not about the technical details of the GEDCOM data format, but about basic facts and real-world issues.

misnomer

GEDCOM is an acronym for Genealogical Data Communications.
That name is an unfortunate misnomer. If you knew nothing but the name, you would probably guess that GEDCOM is some kind of communication protocol, but it is not. GEDCOM is not a language through which genealogy applications talk to each other. A few such languages do exist, but GEDCOM is not one of them. GEDCOM is not about data communications at all; it is just a file format for data exchange.


data transfer

GEDCOM is a data file format for the transfer of genealogy data. The idea is that you can transfer data between two genealogy applications by exporting it from one application into a GEDCOM file and then importing that GEDCOM file into the other application.
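What travels between the two applications is a plain text file. In the GEDCOM 5.5 grammar, every line holds a level number, an optional cross-reference identifier such as @I1@, a tag, and an optional value; a line belongs to the nearest preceding line one level lower. The Python sketch below splits lines into those four parts. It is a minimal illustration only: it ignores continuation lines (CONT/CONC) and character sets, and the sample record is made up.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # cross-reference ID such as "@I1@", or None
    tag: str              # e.g. "INDI", "NAME", "BIRT", "DATE"
    value: Optional[str]  # rest of the line after the tag, or None


# level, optional @xref@, tag, optional value; leading whitespace tolerated
LINE_RE = re.compile(r"^\s*(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def parse_line(text: str) -> GedcomLine:
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


SAMPLE = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
1 FAMS @F1@
0 TRLR"""

for line in SAMPLE.splitlines():
    print(parse_line(line))
```

The fourth sample line, for example, comes out as GedcomLine(level=2, xref=None, tag='DATE', value='12 MAR 1850'): a birth date nested under the BIRT event of individual @I1@.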
There are several reasons why, in actual practice, GEDCOM does not fully live up to that ideal.

de facto standard

GEDCOM is a de facto standard for transfer of data between genealogy applications. GEDCOM is not a de jure standard managed by some official standards body. There is no de jure standard for genealogy, but almost every genealogy application supports GEDCOM.
There are several alternative specifications, but none is as widely supported.

before GEDCOM

rekeying

Users who switched between the first few genealogy applications had to print out their data from the old application and then rekey all of it into the new one. No one had very big databases yet, so that was doable, but it was also cumbersome and error-prone. Nobody enjoyed rekeying their data, and many early adopters had years of research on paper, so their databases grew quickly and rekeying became less and less practical.

direct import

Soon, several genealogy applications supported direct import from competing products. This is easiest for the user, so even today, many genealogy applications support direct import from several competing products.
However, direct import is generally limited to a handful of major products, while there are literally hundreds of genealogy applications on the market. It is impractical for any vendor to support them all. And even if direct import from a particular product is supported, that support is unlikely to include a recently released version of that product.

GEDCOM versions

early versions

The first version of GEDCOM, introduced in 1984, was just a paper specification. There was no application that supported it.

The first application to support GEDCOM was PAF 2.0, which supported GEDCOM 2.0. For several years, the LDS developed PAF and GEDCOM side by side. PAF 2.1 supports GEDCOM 2.1.

compatibility

The various versions of GEDCOM are not very compatible with each other. That hardly matters anymore, as GEDCOM 5.5 was released in 1995 and GEDCOM 5.5.1 in 1999. Practically every genealogy application supports GEDCOM 5.5 or 5.5.1.

versions

This table shows the main GEDCOM versions.

date        version  brief note
1984        1.0      first version
1985-12     2.0      PAF 2.0
1987-02     2.1      PAF 2.1
1987-10-09  3.0      lineage-linked form
1989-08-04  4.0
1991-12-31  5.0      lineage-linked structures
1993-11-04  5.3      Unicode, schema (abandoned), used draft
1995-12-11  5.5      official standard
1999-10-02  5.5.1    de facto standard
2000-12-18  5.6      unreleased draft
2001-12-28  6.0      abandoned draft

There have been additional drafts in between these versions. GEDCOM version 5.3 is a draft, but is included in this table because several applications, most notably several versions of Family Tree Maker, use it anyway.

GEDCOM 5.5 versus 5.5.1

Officially, version 5.5.1 is still a draft, but that is only because the LDS forgot to make it official. Many applications, including their own PAF application, use GEDCOM features introduced in GEDCOM 5.5.1.

GEDCOM 5.6

GEDCOM 5.6 is an unreleased GEDCOM draft from 2000 that surfaced early in 2011. It does not really offer any new features, except that its GEDXML foreshadows GEDCOM XML.

GEDCOM XML

GEDCOM XML 6.0 is an ill-chosen name. It is not version 6.0 of GEDCOM, but a draft of an intended replacement of GEDCOM, and should really have received another name and version number 0.9. The LDS abandoned development of this replacement, and did not resume maintaining GEDCOM.

current version

Version 5.5.1, introduced on 1999 Oct 2, is still the latest version of GEDCOM and the de facto standard.

GEDCOM

Several genealogy software vendors started talking about a standard for exchanging data and one of them created GEDCOM.
GEDCOM soon enjoyed widespread support among genealogy software, not because it is the best standard for genealogy data, but merely because it was the first one. Once several major vendors supported it, every new genealogy application had to support it.

owner

GEDCOM was created by the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS), an organisation that has an interest in genealogy for religious reasons. The LDS is one of the earliest genealogy software vendors; they started selling their application, Personal Ancestral File (PAF) in 1984. PAF 2.0 was the first application to support GEDCOM.

The LDS owns and officially maintains the GEDCOM specification, but the LDS has been remarkably inactive in its role as keeper of the standard since the release of GEDCOM version 5.5.1.

religious data format

In some sense GEDCOM is perfect. We tend to think of GEDCOM as a genealogical data format, but to the LDS it is a religious data format, a format to exchange data between databases they maintain for religious reasons.
That GEDCOM has shortcomings as a genealogical data format is because the LDS is not primarily interested in genealogy, but in recording religious rites performed for their ancestors.

problems

Practically all genealogy applications support GEDCOM, but that still does not mean that you can expect a flawless transfer of your data by exporting your data to a GEDCOM file from one product and then importing that GEDCOM file into another product.

insufficient

The GEDCOM specification is far from perfect. There are various known errors and unnecessary limitations that should have been fixed immediately, but the LDS refuses to fix or update the specification. The most unbelievable shortcoming is that the GEDCOM specification still does not provide a standard for any partnership type other than marriage.

extensions

Vendors are allowed to extend GEDCOM to add support for genealogical data that standard GEDCOM does not support, but other genealogy applications may not support these extensions.

The combination of whatever idiosyncrasies and shortcomings a product's GEDCOM files have and the GEDCOM extensions that product uses is known as that product's GEDCOM dialect. Vendors do try to support each other's GEDCOM dialects, but at the same time generally do not bother to document their own.
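GEDCOM 5.5 requires user-defined tags to begin with an underscore, so one quick way to get a feel for a product's dialect is to count the underscore tags in one of its files. The Python sketch below does just that. The tags in the sample (_UID, _MILT) are illustrative only and not attributed to any particular product, and underscore tags are only the visible part of a dialect: misused standard tags do not show up this way.

```python
from collections import Counter


def tag_of(line: str) -> str:
    parts = line.split()
    if len(parts) < 2:
        return ""  # malformed line: no tag
    # parts[0] is the level; a record's @xref@ shifts the tag one place right
    if len(parts) > 2 and parts[1].startswith("@"):
        return parts[2]
    return parts[1]


def extension_tags(gedcom_text: str) -> Counter:
    """Count user-defined tags, i.e. tags that start with an underscore."""
    lines = [line for line in gedcom_text.splitlines() if line.strip()]
    return Counter(tag for tag in map(tag_of, lines) if tag.startswith("_"))


SAMPLE = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Anna /Berg/
1 _UID 4F2A9C
1 _MILT Served 1914-1918
0 @I2@ INDI
1 NAME Karl /Berg/
1 _UID 7B01DE
0 TRLR"""

print(extension_tags(SAMPLE))  # Counter({'_UID': 2, '_MILT': 1})
```

Running this over files exported by two different products quickly shows which extensions each one relies on, and therefore which data the other product is likely to drop.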

quality

So, some problems that users encounter are inherent in the limitations of the GEDCOM specification itself, but many problems are caused by the low quality of vendors' GEDCOM implementations.
The GEDCOM specification allows several character sets to be used. A common problem with old genealogy applications is that they do not support all the character sets that they should, which limits their ability to import GEDCOM files correctly, or even at all.
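GEDCOM 5.5.1 lists four values for the CHAR line in the file header: ANSEL, UTF-8, UNICODE and ASCII. An importer has to read that declaration before it can decode the rest of the file. The Python sketch below shows the idea and also shows why ANSEL support takes real effort: Python's standard library has no ANSEL codec. Treating UNICODE as 16-bit text that starts with a byte order mark is an assumption of this sketch.

```python
import codecs

# Python codec for each CHAR value listed in GEDCOM 5.5.1. None marks a
# character set the standard library cannot decode: there is no ANSEL codec.
CODEC_FOR_CHAR = {
    "ASCII": "ascii",
    "UTF-8": "utf-8-sig",  # also accepts a leading byte order mark
    "UNICODE": "utf-16",   # assumption: 16-bit text that starts with a BOM
    "ANSEL": None,
}


def declared_charset(raw: bytes) -> str:
    """Return the value of the header's CHAR line, e.g. 'ANSEL'."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")
    else:
        # In the other character sets the header tags are plain ASCII,
        # and latin-1 maps every byte to a character, so this never fails.
        text = raw.decode("latin-1")
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[:2] == ["1", "CHAR"]:
            return parts[2].strip()
        if len(parts) >= 2 and parts[0] == "0" and parts[1] != "HEAD":
            break  # the header has ended without a CHAR line
    raise ValueError("no CHAR line in the GEDCOM header")


def decode_gedcom(raw: bytes) -> str:
    charset = declared_charset(raw)
    if charset not in CODEC_FOR_CHAR:
        raise ValueError(f"{charset!r} is not a GEDCOM 5.5.1 character set")
    codec = CODEC_FOR_CHAR[charset]
    if codec is None:
        raise NotImplementedError(f"{charset} needs a dedicated decoder")
    return raw.decode(codec)
```

An application that stops at the NotImplementedError branch is exactly the kind of application described above: it cannot import a perfectly valid ANSEL file. A more lenient importer might fall back to some other encoding for undeclared or unknown values, but then it should say so in its import log.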

Another common problem is that implementations provide incomplete support for the GEDCOM standard. In practice, many applications support no more than the application itself uses. A common shortcoming of many genealogy applications is that they allow just one name per individual, while the GEDCOM specification allows more than one.
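The single-name shortcoming is easy to check for. The Python sketch below collects every level-1 NAME line per individual and lists the names that an importer keeping only the first one would silently drop. The sample person and both function names are hypothetical.

```python
from typing import Dict, List, Optional


def names_per_individual(gedcom_text: str) -> Dict[str, List[str]]:
    """Map each individual's xref to its level-1 NAME values, in file order."""
    names: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in gedcom_text.splitlines():
        parts = line.split(maxsplit=2)
        if not parts:
            continue
        if parts[0] == "0":
            # a new record starts; follow it only if it is an INDI record
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
            if current is not None:
                names[current] = []
        elif current is not None and parts[0] == "1" and parts[1:2] == ["NAME"]:
            names[current].append(parts[2] if len(parts) == 3 else "")
    return names


def names_lost_by_single_name_import(gedcom_text: str) -> Dict[str, List[str]]:
    """Names an importer that keeps only the first NAME per person would drop."""
    return {
        xref: found[1:]
        for xref, found in names_per_individual(gedcom_text).items()
        if len(found) > 1
    }


SAMPLE = """0 @I1@ INDI
1 NAME Maria /Jansen/
1 NAME Maria /de Vries/
2 TYPE married
0 @I2@ INDI
1 NAME Pieter /de Vries/
0 TRLR"""

print(names_lost_by_single_name_import(SAMPLE))  # {'@I1@': ['Maria /de Vries/']}
```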

import log

On import of a GEDCOM file, a genealogy application should produce an import log, a simple text file that lists any issues encountered during the import.
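A Python sketch of what such a log could contain, assuming a hypothetical importer that only understands the tags in SUPPORTED_TAGS: each malformed line and each unsupported structure is reported once with its line number, and the lines nested under a dropped structure are dropped along with it. The tag list and the message wording are made up for illustration; a real import log would cover many more kinds of issues.

```python
from typing import List, Optional, Set

# Hypothetical: the tags this imaginary importer knows how to store.
SUPPORTED_TAGS: Set[str] = {
    "HEAD", "SOUR", "CHAR", "INDI", "NAME", "SEX", "BIRT", "DEAT", "DATE",
    "PLAC", "FAM", "HUSB", "WIFE", "CHIL", "FAMS", "FAMC", "TRLR",
}


def import_log(gedcom_text: str, supported: Set[str] = SUPPORTED_TAGS) -> List[str]:
    """Return one log entry per problem found while importing."""
    log: List[str] = []
    dropped_level: Optional[int] = None  # level of the last dropped structure
    for number, line in enumerate(gedcom_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        if len(parts) < 2 or not parts[0].isdigit():
            log.append(f"line {number}: malformed line skipped: {line.strip()!r}")
            continue
        level = int(parts[0])
        if dropped_level is not None and level > dropped_level:
            continue  # subordinate to a structure that was already reported
        dropped_level = None
        tag = parts[2] if len(parts) > 2 and parts[1].startswith("@") else parts[1]
        if tag not in supported:
            log.append(f"line {number}: {tag} not supported; "
                       "dropped with its subordinate lines")
            dropped_level = level
    return log


SAMPLE = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Anna /Berg/
1 _MILT Served 1914-1918
2 DATE FROM 1914 TO 1918
1 OCCU Teacher
0 TRLR"""

for entry in import_log(SAMPLE):
    print(entry)
```

For the sample this reports lines 5 and 7. The DATE on line 6 is not reported separately, because it belongs to the dropped _MILT structure; a log that listed every nested line would be harder for a user to read.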

What makes many of the GEDCOM import limitations worse is that many genealogy applications do not bother to produce an import log, or are not honest about the application's limitations. Some vendors would rather falsely claim that your GEDCOM file is wrong than admit to a limitation in their product.
Even with an honest import log, it can be difficult to understand what went wrong. Without an honest import log the average user is completely unable to judge how well the import went.

multimedia

GEDCOM does support multimedia. However, this was only added to GEDCOM after several applications had already decided on their own approach. Although the current standard has been around for some time, transfer of multimedia between applications remains problematic, not least because the standard is insufficient.

There are two main issues. One is that the multimedia files must be transferred along with the GEDCOM file, but that the standard does not specify any format for packaging all the files together, leaving the user to manage the file transfers themselves.
The second problem is that the specification does not specify where multimedia files should be stored with respect to the database or GEDCOM file; in practice GEDCOM files contain full directory paths that are unlikely to match those of another application on another system.
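One common workaround for the path problem, sketched here in Python under the assumption that the user has copied all media files into a single folder: rewrite each FILE line to point at a file of the same name in that folder, and report the names that are missing. PureWindowsPath is used because it splits on both backslashes and forward slashes. The function name is hypothetical, and matching on file name alone will confuse two different files that happen to share a name.

```python
from pathlib import Path, PureWindowsPath
from typing import List, Tuple


def relink_media(gedcom_text: str, media_dir: Path) -> Tuple[str, List[str]]:
    """Point FILE lines at same-named files in media_dir.

    Returns the rewritten text and the names of files not found there.
    Matching on the file name alone is an assumption of this sketch.
    """
    out: List[str] = []
    missing: List[str] = []
    for line in gedcom_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[1] == "FILE":
            # PureWindowsPath splits on both "\" and "/", so it handles
            # paths written on Windows as well as on Unix-like systems.
            name = PureWindowsPath(parts[2]).name
            candidate = media_dir / name
            if candidate.is_file():
                line = f"{parts[0]} FILE {candidate}"
            else:
                missing.append(name)
        out.append(line)
    return "\n".join(out), missing
```

A tool built on this would read the GEDCOM file, call relink_media with the folder the media files were copied to, write out the rewritten text, and show the user the list of missing files, which is the kind of bookkeeping the standard currently leaves to the user.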

export

When it comes to GEDCOM support, vendors still tend to focus on GEDCOM import rather than GEDCOM export: the ability of their application to import GEDCOM files created by other applications. Many vendors even proudly include in their feature list all the applications whose GEDCOM files they believe their application imports perfectly.

However, what is more important to you as a user is the quality of the GEDCOM export, and how well other applications support the product's GEDCOM dialect. After all, if no other application can import those files, you have been locked into that product, unable to switch to another.

FTW TEXT

Some vendors have taken so many liberties with the GEDCOM specification that what their application produces isn't GEDCOM at all. Family Tree Maker is rightly infamous for producing an FTW GEDCOM dialect so awful that it seems deliberately incompatible.

Even worse, several versions of Family Tree Maker default to creating ostensible GEDCOM files that are not GEDCOM files, but FTW TEXT files. The product's dialog boxes are dishonest about this in a way that makes a user who does not know better believe that FTW TEXT is real GEDCOM. The current owner of Family Tree Maker, Ancestry.com, should release a free FTW TEXT to GEDCOM conversion tool, but still has not done so.

GEDCOM alternatives

alternatives

Over the years, various alternatives to GEDCOM have been proposed, including the LDS's own ill-named GEDCOM XML 6.0. The sheer number of available alternatives is an embarrassment of riches.
These alternatives generally offer worthwhile advantages over GEDCOM, yet not one alternative has achieved significant industry adoption.

adoption

One reason for limited support for alternatives is that most vendors are not eager to support a standard controlled by another vendor.
Some proposals are vendor-independent, but getting any new standard - however good - adopted is difficult. Vendors are unlikely to invest in a format unless it is about to become the new industry standard, but it will not become a new standard unless vendors invest in it.

common extensions

One approach to solving some of GEDCOM's limitations that has been successful is the development of common extensions: a collection of GEDCOM extensions shared by a group of products.
GEDCOM 5.5 EL (Extended Location) was developed by a group of German genealogy vendors in collaboration with the Verein für Computergenealogie e.V. (Society for Computer Genealogy). GEDCOM 5.5 EL is supported by many German genealogy applications and is freely available for other vendors to implement in their products.

replacement

Another approach to deal with GEDCOM's limitations is to create another, better standard, to replace GEDCOM. Many GEDCOM alternatives have been proposed. Most have been forgotten. None enjoy wide industry support.
The GEDCOM Alternatives article provides an overview. Two current developments are FHISO and GEDCOM X.

FHISO

BetterGEDCOM, an informal grassroots project to create a GEDCOM replacement, has spawned the formal Family History Information Standards Organisation (FHISO). FHISO aims to develop modern standards for genealogy data.

GEDCOM X

Late in 2011, FamilySearch's GEDCOM X project was uncovered. FamilySearch officially introduced it early in 2012. The name is likely to cause confusion; like GEDCOM XML, GEDCOM X is not a new version of GEDCOM, but another GEDCOM alternative.

issues

GEDCOM is a standard for transferring data from one genealogy application to another, but because of inherent GEDCOM limitations, incomplete specifications, unsupported dialects and poor implementations, that transfer may be less than perfect. On top of that, many applications do not even provide an import log to help you figure out how well the transfer went.

In practice, basic data such as names and vital events transfer just fine, and that is already a large improvement on a world without any standard for genealogy data. A lot of other data such as notes and sources generally transfers successfully as well. Moreover, GEDCOM dialects of popular products tend to be supported by many other products.

conclusion

GEDCOM is a data format for genealogical data. It is not perfect, and it is not perfectly supported, but it is the only widely supported standard for genealogy data.

Vendors tend to stress the ability of their product to import data from other products, but to a user, the more important thing is the quality of the GEDCOM files it exports, as that largely determines the ability of other products to import those GEDCOM files. Only when other applications will import the file can you use a GEDCOM file to do what it was designed to do: move your data from one application to another.

updates

2010-11-05: GEDCOM Alternatives

The GEDCOM Alternatives article provides an overview of the many GEDCOM alternatives proposed over the years.

2011-01-07: GEDCOM 5.6

The hitherto unknown and never officially released GEDCOM 5.6 draft has surfaced.

2011-12-12: GEDCOM X

FamilySearch's GEDCOM X project to replace GEDCOM revealed.

2012-02-01: Family History Information Standards Organisation

Family History Information Standards Organisation (FHISO) officially introduced.

2012-02-02: FamilySearch releases GEDCOM X

The secret GEDCOM X site has been made public.
