Modern Software Experience

2010-08-24

GEDCOM

Unlike many other introductions to GEDCOM, this text is not about the technical details of the GEDCOM data format, but about practical facts and real-world issues.

misnomer

Many introductions to GEDCOM will still tell you that GEDCOM is an acronym for Genealogical Data Communications. That is where the name GEDCOM comes from, but the original expansion is a confusing misnomer. If you knew nothing but that expansion, you'd probably guess that GEDCOM is some kind of communication protocol, but it is not. GEDCOM isn't some language through which genealogy applications talk to each other either. GEDCOM is not about data communications at all; it is just a file format for genealogy data. That is why, starting with GEDCOM 5.5.5 (2019), GEDCOM is no longer an abbreviation but just a name.

data transfer

GEDCOM is a data file format, specifically created for the transfer of genealogy data. The idea is that you can transfer data between two genealogy applications by exporting it from one application into a GEDCOM file and then importing that GEDCOM file into the other application.
There are several reasons why, in actual practice, GEDCOM does not fully live up to that ideal.

de facto standard

GEDCOM is a de facto standard for data transfer between genealogy applications. There is no de jure standard for genealogy, but almost every genealogy application supports GEDCOM.
There are several alternative data formats, but none is as widely supported.

before GEDCOM

re-enter data

Users that switched between the first few genealogy applications had to print out their data from the old application, and then re-enter all of their data into the new one. No one had very big databases yet, so re-entering your data was still doable, but it was also error-prone and cumbersome. Nobody enjoyed re-entering their data, and many early adopters had years of research on paper, so their databases grew quickly.

direct import

Soon, several genealogy applications supported direct import from competing products. This is easiest for the user, so even today, many genealogy applications support direct import from several competing products.
However, the direct import is generally limited to a handful of major products, while there are literally hundreds of genealogy applications on the market. It is impractical for any vendor to support them all, and even if direct import from a particular product is supported, that support does not always include the most recently released version of that product.

GEDCOM

Several genealogy software vendors started talking about a standard for exchanging data and one of them created GEDCOM.
GEDCOM soon enjoyed widespread support among genealogy software, not because it is the best standard for genealogy data, but merely because it was the first one. Once several major vendors supported it, every new genealogy application had to support it.

FamilySearch

GEDCOM was originally created by FamilySearch. Well, at the time they created GEDCOM, FamilySearch was still known as the Family History Department (FHD) of the Corporation of the President of The Church of Jesus Christ of Latter-day Saints (LDS), better known as the Mormons. Mormons have a limited interest in genealogy for religious reasons.

legacy GEDCOM

FamilySearch created what we now call legacy GEDCOM; the GEDCOM versions up to and including GEDCOM 5.5.1, all released in the last two decades of the 20th century.

A controversial aspect of legacy GEDCOM is the inclusion of LDS-specific and even application-specific extensions, especially because this was done without identifying these as extensions to the base specification. Few genealogy applications support these extensions; most do not. That a product vendor claims full GEDCOM support does not imply that they support these extensions.

Personal Ancestral File

FamilySearch is one of the earliest genealogy software vendors. FamilySearch started selling Personal Ancestral File (PAF) back in 1984, and heavily promoted it on the front page of their website for many years, even long after the release of the last version.

By the way, FamilySearch did not develop PAF themselves. PAF is essentially a slightly modified and rebranded release of Ancestral Quest version 3, developed by Incline Software. While FamilySearch abandoned PAF after the release of PAF 5.2.18 back in 2002, Incline Software continues to develop and market new versions of Ancestral Quest.

FamilySearch GEDCOM and PAF

FamilySearch GEDCOM and PAF are strongly associated with each other. For a while, PAF and GEDCOM version numbers matched each other, and FamilySearch abandoned both around the turn of the century.

Although FamilySearch GEDCOM is strongly associated with PAF, PAF wasn't the first application to support GEDCOM; it was the second.
The first ever application to support GEDCOM, and the only one to support GEDCOM 1.0 when it was the current version, is Family History System (FHS), an MS-DOS application created by Phillip Brown. The first version of PAF to support GEDCOM is PAF 2.0, and it supports GEDCOM 2.0.

compatibility

The various FamilySearch versions of GEDCOM are not very compatible with each other. That hardly matters anymore, as the last versions of FamilySearch GEDCOM, versions 5.5 and 5.5.1, were released back in 1995 and 1999. Older versions of GEDCOM are of historic interest only.

"GEDCOM" alternatives

When they released GEDCOM 5.5.1 in 1999, FamilySearch was already working on alternatives to GEDCOM. FamilySearch has since published multiple proprietary GEDCOM alternatives, and keeps creating confusion by using GED or GEDCOM in the names of these alternatives. For example, the name GEDCOM X they use for one of their GEDCOM alternatives strongly suggests that it is version 10 of GEDCOM, but it isn't GEDCOM at all, and not remotely compatible either.

GEDCOM today

GEDCOM version 5.5.1 is the last legacy specification, but no longer the latest GEDCOM version, nor the de facto standard. Most applications that claim to support GEDCOM 5.5.1 actually try to support GEDCOM 5.5.1 Annotated Edition, and leading applications support GEDCOM 5.5.5.

For some two decades since the release of FamilySearch GEDCOM 5.5 and 5.5.1, I've published technical articles that identified issues, and provided solutions and best practices to resolve those issues. Many vendors had already been following the advice from those articles for many years when it was all collected together in the GEDCOM 5.5.1 Annotated Edition (GEDCOM 5.5.1 AE). Thus, the GEDCOM 5.5.1 Annotated Edition is unique in already being the de facto standard before it was published.
Some time after the release of the Annotated Edition, a group of developers, including yours truly, worked together to produce an official successor, GEDCOM 5.5.5.

GEDCOM versions

This table provides an overview of major GEDCOM versions.

GEDCOM Versions
date        version   status             brief notes
1984        1.0       proposed standard  First Version
1985-12     2.0       standard           PAF 2.0
1987-02     2.1       standard           PAF 2.1
1987-10-09  3.0       standard           lineage-linked form
1989-08-04  4.0       standard
1991-09-25  5.0       standard           lineage-linked structures
1992-09-18  5.1       draft
1993-11-04  5.3       draft              Unicode, multimedia, name parts, schema (abandoned)
1995-08-21  5.4       draft              citation-source-repository model, more name parts, submission record (obsolete)
1995-12-11  5.5       standard           structured addresses, more name parts
1996-01-02  5.5       standard           GEDCOM 5.5 with errata
1996-01-10  5.5       errata             GEDCOM 5.5 errata sheet
1999-10-02  5.5.1     standard           UTF-8, email, URLs, geographic coordinates
2000-12-18  5.5.2     unpublished draft  no more escape sequences, GEDXML
2018-05-10  5.5.1 AE  special release    Annotated Edition; annotations & corrections.
2019-10-02  5.5.5     standard           Maintenance release. Quality. Simpler & Stricter.

GEDCOM version 5.3 was released as a draft, but is still included in this table because several applications, most notably several versions of Family Tree Maker, use it anyway.

GEDCOM 5.5.1

For decades, FamilySearch kept claiming that version 5.5.1 is merely a draft, while using it themselves, including in their very public and heavily promoted Personal Ancestral File (PAF) application.

GEDCOM 5.5.2

GEDCOM 5.5.2 is an unreleased GEDCOM draft from 2000 that surfaced early in 2011, and that was apparently at one time meant to become GEDCOM version 5.6. It does not really offer any new features, except that its GEDXML foreshadows GEDCOM XML, one of multiple proprietary alternatives to GEDCOM proposed by FamilySearch.

current

The current de facto standard by usage is GEDCOM 5.5.1 Annotated Edition, which has an official successor in GEDCOM 5.5.5. At this time (2023), GEDCOM 5.5.5 is the latest specification. The GEDZIP specification is version-agnostic; it can be used with both GEDCOM versions, and even some legacy GEDCOM versions.

shortcomings

religious data format

In some sense GEDCOM is perfect for purpose. Genealogists tend to think of GEDCOM as a genealogical data format, but to the LDS it was a religious data format, a format to exchange data between databases they maintain for religious reasons.
That FamilySearch GEDCOM always had serious shortcomings as a genealogical data format is in part because the Mormons' primary interest isn't genealogy, but recording religious rites they perform for their ancestors.

problems

Practically all genealogy applications support GEDCOM, but that still does not imply that you can expect a flawless transfer of your data by exporting your data to a GEDCOM file from one product and then importing that GEDCOM file into another product.

insufficient

The legacy GEDCOM specifications are far from perfect. Even the last version, GEDCOM 5.5.1, suffers from known errors and unnecessary limitations that should really have been fixed in earlier versions already. Most of these have now been addressed in GEDCOM 5.5.1 Annotated Edition and GEDCOM 5.5.5, but there are still issues to be resolved.

extensions

Vendors are allowed to extend GEDCOM to add support for things that the GEDCOM specification does not support explicitly. The obvious issue with that is that most other genealogy applications will not support these extensions.

The combination of GEDCOM extensions, implementation limitations and idiosyncrasies of a product's GEDCOM is known as that product's GEDCOM dialect. Vendors do try to support the dialects of major competitors when importing GEDCOM files, but at the same time generally do not bother to document their own GEDCOM dialect.
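To get a rough impression of how much a file relies on a product's dialect, you can count the extension tags it contains. The sketch below is only an illustration and rests on two pieces of background that this article does not cover: that a GEDCOM line has the layout "level, optional cross-reference id, tag, optional value", and that the GEDCOM specifications ask vendors to start user-defined tags with an underscore. The sample data, including the tags _FAVCOLOR and _PHOTO and the application name, is made up.

```python
from collections import Counter


def count_extension_tags(gedcom_text):
    """Count tags that start with an underscore (user-defined tags)."""
    counts = Counter()
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a well-formed GEDCOM line; ignored in this sketch
        # A record line may carry a cross-reference id such as @I1@ before the tag.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


# Hypothetical sample file; the underscore tags are invented for illustration.
sample = """0 HEAD
1 SOUR HypotheticalApp
0 @I1@ INDI
1 NAME Ann /Example/
1 _FAVCOLOR blue
1 _PHOTO portrait.jpg
0 @I2@ INDI
1 _FAVCOLOR green
0 TRLR"""

for tag, count in sorted(count_extension_tags(sample).items()):
    print(tag, count)
```

A count like this says nothing about implementation limitations or idiosyncrasies, which are also part of a dialect; it only shows which extensions another application would have to understand.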

quality

While various problems that users encounter are inherent in limitations of the GEDCOM specification itself, other problems are caused by the less than stellar quality of vendors' GEDCOM implementations.

For example, the FamilySearch GEDCOM specification allows several character sets to be used. A common problem with old genealogy applications is that they do not support the character sets that they should support, which limits their ability to import GEDCOM files correctly or in fact import them at all.
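As a minimal sketch of what handling this honestly could look like, the code below reads the character set that a file declares in its header and refuses to guess when it cannot decode that set. It assumes background not covered in this article: that the header record contains a level-1 CHAR line naming the character set, with values such as ANSEL, ASCII and UTF-8. The function names are hypothetical. Python's standard library has no ANSEL codec, so this sketch reports ANSEL as unsupported, and it does not handle UTF-16 encoded files at all.

```python
# Character sets this sketch can decode, mapped to Python codec names.
PYTHON_CODECS = {"UTF-8": "utf-8-sig", "ASCII": "ascii"}


def declared_charset(raw):
    """Return the value of the header's CHAR line, or None if there is none."""
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]  # drop a UTF-8 byte order mark
    # latin-1 maps every byte to a character, so this decode never fails;
    # it is only used to find the ASCII-only CHAR line.
    text = raw.decode("latin-1")
    in_head = False
    for line in text.splitlines():
        fields = line.strip().split(" ", 2)
        if fields[:2] == ["0", "HEAD"]:
            in_head = True
        elif fields[0] == "0":
            if in_head:
                break  # the header record has ended
        elif in_head and fields[:2] == ["1", "CHAR"] and len(fields) == 3:
            return fields[2].strip()
    return None


def decode_gedcom(raw):
    """Decode the file, or say clearly why that is not possible."""
    name = declared_charset(raw)
    codec = PYTHON_CODECS.get(name or "")
    if codec is None:
        raise ValueError(f"unsupported or missing character set: {name!r}")
    return raw.decode(codec)


utf8_file = b"0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME Ren\xc3\xa9 /Example/\n0 TRLR\n"
ansel_file = b"0 HEAD\n1 CHAR ANSEL\n0 TRLR\n"

print(declared_charset(utf8_file))
print(declared_charset(ansel_file))
```

Raising an error for an unsupported character set is a deliberate choice here: an application that silently decodes the file with the wrong character set will import damaged names without telling the user.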

Another common problem is that implementations provide incomplete support for the GEDCOM standard. In practice, many applications support only the GEDCOM features that the application itself uses. For example, a shortcoming of some genealogy applications is that they allow just one name per individual, while the GEDCOM specification allows more than one.
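The name example can be made concrete with a small sketch. It assumes, as background not covered in this article, that each individual is a level-0 INDI record and that each of that individual's names is a level-1 NAME line below it. The function name and the sample data are made up.

```python
def names_by_individual(gedcom_text):
    """Collect every NAME value found under each INDI record."""
    names = {}
    current = None  # cross-reference id of the INDI record being read
    for line in gedcom_text.splitlines():
        fields = line.strip().split(" ", 2)
        if len(fields) < 2:
            continue
        if fields[0] == "0":
            # A new record starts; remember its id only if it is an individual.
            is_indi = len(fields) == 3 and fields[2] == "INDI"
            current = fields[1] if is_indi else None
            if current is not None:
                names[current] = []
        elif current is not None and fields[:2] == ["1", "NAME"] and len(fields) == 3:
            names[current].append(fields[2])
    return names


# Hypothetical sample: the first individual has two names.
people = """0 HEAD
0 @I1@ INDI
1 NAME Mary /Jones/
1 NAME Mary /Smith/
1 SEX F
0 @I2@ INDI
1 NAME John /Smith/
0 TRLR"""

all_names = names_by_individual(people)

# What an application that keeps only one name per individual would lose.
dropped = {xref: found[1:] for xref, found in all_names.items() if len(found) > 1}
print(dropped)
```

An application with the one-name limitation would keep only the first name of each individual; the `dropped` dictionary shows the names that would silently disappear from the sample.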

import log

On import of a GEDCOM file, a genealogy application should produce an import log, a simple text file that provides a log of any issues encountered during the import. That is not something the GEDCOM specification demands, that is just something the application should do.

What makes many of the GEDCOM import limitations worse is that many genealogy applications do not bother to make an import log, or are not honest about the application's limitations. Some vendors would rather have their product claim that your GEDCOM file is wrong than admit to a limitation in their product.
When an import log is provided, it can still be difficult to understand what went wrong, but without an import log the average user is completely unable to judge how well the import went.
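What such a log might contain can be sketched in a few lines. This is not how any particular product works; it is a toy importer, with a made-up set of supported tags and made-up message wording, that records the line number of everything it skips instead of dropping it silently. It assumes the basic line layout "level, optional cross-reference id, tag, optional value", which this article does not describe.

```python
# Tags this hypothetical importer understands; everything else is logged.
SUPPORTED_TAGS = {"HEAD", "TRLR", "INDI", "NAME", "SEX", "BIRT", "DATE"}


def import_with_log(gedcom_text):
    """Return (imported lines, log lines) for a GEDCOM text."""
    imported = []
    log = []
    for number, line in enumerate(gedcom_text.splitlines(), start=1):
        fields = line.strip().split(" ")
        if len(fields) < 2 or not fields[0].isdigit():
            log.append(f"line {number}: not a valid GEDCOM line: {line!r}")
            continue
        # A record line may carry a cross-reference id such as @I1@ before the tag.
        if fields[1].startswith("@") and len(fields) > 2:
            tag = fields[2]
        else:
            tag = fields[1]
        if tag in SUPPORTED_TAGS:
            imported.append(line.strip())
        else:
            log.append(f"line {number}: skipped unsupported tag {tag}")
    return imported, log


# Hypothetical input with one standard tag the importer lacks (OCCU),
# one extension tag, and one line that is not GEDCOM at all.
incoming = """0 HEAD
0 @I1@ INDI
1 NAME Ann /Example/
1 OCCU Weaver
1 _FAVCOLOR blue
this is not GEDCOM
0 TRLR"""

kept, import_log = import_with_log(incoming)
print(f"imported {len(kept)} lines, {len(import_log)} issues")
print("\n".join(import_log))  # the text an application could save as its log file
```

Note that the message for OCCU blames the importer rather than the file: the tag is merely unsupported here, and the log says so.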

multimedia

GEDCOM does support multimedia, but that support is insufficient; it only allows referencing multimedia files by filename. Well, FamilySearch GEDCOM version 5.5 allows actually including multimedia inside GEDCOM files, but that feature was removed again in GEDCOM 5.5.1, and almost no genealogy application supports it. FamilySearch's decades-long claim that 5.5 was their latest specification, and that 5.5.1 was merely a draft, still did not get widespread support for this feature going.

The GEDCOM 5.5.1 and later specifications do not include a mechanism to bundle media files with a GEDCOM file. The GEDZIP specification created by The GEDCOM Group (gedcom.org) aims to resolve that. Meanwhile, several products allow syncing your database between your desktop application and some major websites, and that sync can include your media files.

export

When it comes to GEDCOM support, vendors tend to focus on GEDCOM import rather than GEDCOM export. Vendors focus on the ability of their application to import GEDCOM files created by other applications. Many vendors even proudly list all the applications that they believe their application to import perfectly in their feature list.

However, what is more important to you as a user is the quality of the GEDCOM export, and how well other applications support the product's GEDCOM dialect. After all, if no other application can import those files, you have been locked into that product, unable to switch to another.

FTW TEXT

Some vendors have taken so many liberties with the GEDCOM specification, that what their application produces isn't GEDCOM at all. Several now historic versions of Family Tree Maker for Windows (FTW) produce a GEDCOM dialect so awful, that it seems deliberately incompatible.

Some versions of Family Tree Maker even default to creating ostensible GEDCOM files that are not GEDCOM files, but what we now call FTW TEXT files. The product's dialog boxes are dishonest about this in a way that makes a user who does not know better believe that FTW TEXT is real GEDCOM.
Software Mackiev, the current owner of Family Tree Maker, offers the Family Tree Maker 2005 Starter Edition which can be used to import an FTW TEXT file and then export the data as an FTW GEDCOM file.

GEDCOM alternatives

alternatives

Over the years, various alternatives to GEDCOM have been proposed, including FamilySearch's own ill-named GEDCOM XML and later GEDCOM X, neither of which is GEDCOM or GEDCOM compatible. The sheer number of available alternatives is an embarrassment of riches.
While many of the proposed alternatives do offer worthwhile advantages over GEDCOM, no alternative has achieved significant industry adoption.

adoption

One reason for limited support for alternatives is that most vendors are not eager to support a standard controlled by another vendor.
Some proposals are vendor-independent, but getting any new standard - however good - adopted is difficult. Vendors are unlikely to invest in a format unless it is about to become the new industry standard, but it will not become a new standard unless vendors invest in it.

replacement

One approach to deal with GEDCOM's issues is to create another, better standard, to replace GEDCOM. Many GEDCOM alternatives have been proposed. Most have been forgotten. None enjoy wide industry support.
The GEDCOM Alternatives article provides an overview.

solutions

GEDCOM 5.5.1 Annotated Edition solves many of the GEDCOM 5.5.1 issues, which explains why it replaced 5.5.1 as the de facto standard. Its official successor, GEDCOM 5.5.5, simplifies GEDCOM considerably. It also provides a few technically minor, but conceptually significant new features to resolve even more 5.5.1 issues. GEDCOM 5.5.5 is the first GEDCOM version to support intersex, treat all religions the same, include explicit support for relationships other than marriage, and explicitly support same-sex relationships.

conclusion

GEDCOM is a data format for genealogical data. GEDCOM allows transferring data from one genealogy application to another, but because of inherent GEDCOM limitations, incomplete specifications, unsupported dialects and poor implementations, that transfer may be less than perfect. On top of that, many applications do not even provide an import log to help you figure out how the import went.

GEDCOM is not perfect, and not perfectly supported either, but it is the only widely supported standard.
In practice, basic data such as names and vital events transfers just fine, and that is already a large improvement on a world without any standard for genealogy data. A lot of other data such as notes and sources generally transfers successfully as well. Moreover, GEDCOM dialects of popular products tend to be supported by many other products.

Vendors tend to stress the ability of their product to import GEDCOM files created by competing products, but to a user, the more important thing is the quality of the GEDCOM files it exports, as that largely determines the ability of other products to import those GEDCOM files. Only when other applications will import the exported file can you use a GEDCOM file to do what it was designed to do: move your data from one application to another.

Many of the problems users experience are issues with legacy GEDCOM, specifically the last versions, GEDCOM 5.5 and 5.5.1. Legacy GEDCOM suffers from inconsistencies, contradictions and some serious limitations. Many GEDCOM 5.5.1 issues have been resolved in GEDCOM 5.5.1 Annotated Edition and GEDCOM 5.5.5.

updates

2023-05-20: extensive update

Extensive revision of the original 2010 article to bring it up to date.

links