Modern Software Experience

GEDCOM

best practice

One thing the GEDCOM specification does not provide is a set of GEDCOM usage rules. In the absence of such rules, several conventions have arisen.
This article documents these conventions as best practice.

ALL-UPPERCASE

GEDCOM is an abbreviation of GEnealogical Data COMmunication, and it is always written ALL-UPPERCASE. That is how the GEDCOM specification writes it, so that is how it is done.

Not only some users, but some vendors (!) write Gedcom or gedcom. If you are such a vendor, you can of course ignore this advice. However, you should be aware that such usage not only invites mild ridicule, but is also a red flag to any experienced genealogy software reviewer. Not even writing GEDCOM correctly raises immediate doubts about how seriously you take GEDCOM support, and thus directs the reviewer’s attention to the application’s GEDCOM support as an area for review.

abbreviation

GEDCOM is an abbreviation. As a general rule, it is good to explain an abbreviation on its first use.

However, to do so again and again in every review of genealogy software would be insulting to your readers. It is perfectly fine to presuppose some level of knowledge. Readers who come across it for the first time will probably deduce from the context that it is some genealogical file format.

Readers who do not recognise what GEDCOM is from the context can always look it up. If you have an introductory article on GEDCOM, you can link to it.

Genealogy application documentation often includes a brief explanation of GEDCOM in the section about import and export of files. Many also have an appendix that explains common jargon.

This website has a Genealogy Jargon page. It is one of the informative pages that every page on this site links to. If you like it, feel free to link to it yourself.

plural

GEDCOM arguably does not have a plural form. It is a file format. We do speak of one GEDCOM file and two GEDCOM files.

However, in colloquial speech we often talk about a GEDCOM. That colloquial usage should not be encouraged in writing, but when you do find yourself referring to a bunch of GEDCOMs, the plural form is formed by appending a single lower-case s.

LDS

The GEDCOM specification is not published by an internationally recognised standards body, but by the Family History Department, a department of the Church of the Latter Day Saints (LDS). They tend to insist that you spell out their full name, which is The Church of Jesus Christ of Latter-day Saints, so the LDS abbreviation is a true blessing. It is common to omit any mention of the Family History Department and simply say that GEDCOM is a specification of the LDS.

de-facto standard

Despite some confused claims to the contrary, GEDCOM is a standard.

The full name of version 5.5 of the GEDCOM specification is The GEDCOM Standard Release 5.5, but GEDCOM is not a standard merely because it includes Standard in its name. All that confirms is that the creators of GEDCOM want you to think of GEDCOM as a standard.

GEDCOM is a standard. However, as it is not published or endorsed by an internationally recognised standards body, it is not a de jure standard, but merely a company-specific specification that soon after its creation became the de facto standard for data exchange between different genealogical applications.

versions

There are multiple versions of GEDCOM. Specific versions are referred to by including the version number directly after the word GEDCOM. For example, version 5.5 of the GEDCOM specification is commonly referred to as GEDCOM 5.5. It is not incorrect to write GEDCOM version 5.5 or even use the full name, The GEDCOM Standard Release 5.5, but it is somewhat unusual to do so.

current version

Officially, the current (2009 Jul 23) version of GEDCOM is GEDCOM 5.5. However, most genealogical applications have been using GEDCOM tags introduced in GEDCOM 5.5.1 for years. So, practically, GEDCOM 5.5.1 is the current version of GEDCOM.

Although context may make clear whether you are referring to the official current version or the de facto current version, it is best to be explicit.

Moreover, current is a time-sensitive word, so it should only be used in combination with a date. Often a publication date is readily apparent, but it does not hurt to add the date you are writing in brackets directly after the word current, as done above.

legal and illegal extensions

The GEDCOM specification allows extension of the GEDCOM specification and defines how vendors should do that.
Extensions that follow the rules are known as legal extensions; those that do not follow the rules are known as illegal extensions.
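
As a rough illustration, the distinction can be sketched in a few lines of Python. The sketch assumes the GEDCOM 5.5 rule that user-defined tags must begin with an underscore. The STANDARD_TAGS set is only a small illustrative subset of the specification's tags, and the name classify_tag and the example tags _EXAMPLE and XYZZY are hypothetical.

```python
# A minimal sketch, not a validator.
# Assumption: GEDCOM 5.5 requires user-defined (extension) tags to begin
# with an underscore.
# STANDARD_TAGS is a small illustrative subset, NOT the full tag list, so
# any standard tag missing from it would be misreported as an illegal
# extension.
STANDARD_TAGS = {
    "HEAD", "TRLR", "INDI", "FAM", "NAME",
    "BIRT", "DEAT", "DATE", "PLAC", "CHAR",
}


def classify_tag(tag: str) -> str:
    """Return 'standard', 'legal extension' or 'illegal extension'."""
    if tag in STANDARD_TAGS:
        return "standard"
    if tag.startswith("_"):
        return "legal extension"
    return "illegal extension"


if __name__ == "__main__":
    for tag in ("BIRT", "_EXAMPLE", "XYZZY"):
        print(tag, "is a", classify_tag(tag))
```

A real check would need the complete tag list of the GEDCOM version in question; the point here is only that legality of an extension depends on following the rules, not on which vendor introduced it.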

dialects

Vendors rarely provide full support for all GEDCOM features and often extend GEDCOM to support application features the GEDCOM specification does not cover. The resulting vendor-specific variation on GEDCOM is known as a dialect of the GEDCOM language.

write

To be precise, a GEDCOM dialect is the dialect that the application writes. Most applications read their own dialect and several others. Applications such as GEDCOM viewers that do not write GEDCOM files do not have a GEDCOM dialect.

application

GEDCOM dialects are not specific to a vendor, but to an application. Therefore, GEDCOM dialects are not indicated by vendor name, but by application name. This is a fortunate convention, as some products have changed owner more than once.

naming

As a general rule, a GEDCOM dialect is indicated by putting the application name in front of GEDCOM. For example, the GEDCOM dialect supported by RootsMagic is RootsMagic GEDCOM.

abbreviations

Some application names are rather long and commonly abbreviated. If the application is commonly referred to by its abbreviation, the GEDCOM dialect is known by the abbreviation instead of the full name.

For example, Personal Ancestral File is commonly referred to by its PAF abbreviation, so its GEDCOM dialect is PAF GEDCOM. Ancestral Quest is commonly abbreviated as AQ, so its GEDCOM dialect is AQ GEDCOM.
Legacy Family Tree provides a slightly different example. Legacy Family Tree is rarely abbreviated to LFT, but commonly referred to as just Legacy, so its GEDCOM dialect is Legacy GEDCOM.

Family Tree Maker

Family Tree Maker is a special case. There was a DOS product known as Family Tree Maker, and then there was a Windows product known as Family Tree Maker for Windows. These product names were abbreviated to FTM and FTW respectively, so the GEDCOM dialects were known as FTM GEDCOM and FTW GEDCOM respectively. Ancestry.com stopped using the product name suffix for Windows after a while, but the FTW abbreviation stuck.

With the introduction of Family Tree Maker 2008 the FTM abbreviation returned. Thus, FTW 16 was followed by FTM 2008, and its GEDCOM dialect is FTM GEDCOM. In practice, it is unlikely that reuse of the same name will cause confusion.

version dialects

GEDCOM dialects are not just specific to an application, but even to a particular version of an application. When discussing the differences in GEDCOM dialect between say PAF 4 and PAF 5.2, the convention to prefix GEDCOM with the application name or abbreviation can be extended to prefix it with the exact version, e.g. The differences between PAF 4 GEDCOM and PAF 5 GEDCOM are minor.

GEDCOM encodings

The GEDCOM 5.5.1 specification allows the use of different character sets and encodings, to wit ASCII, ANSEL and UTF-8.
When discussing differences between GEDCOM files based on different encodings, it is customary to prefix GEDCOM with the name of the encoding used, e.g. The GEDCOM 5.5.1 specification allows the same data to be encoded as either an ASCII GEDCOM, ANSEL GEDCOM or UTF-8 GEDCOM.
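
To show where such a label can come from, here is a minimal Python sketch that reads the character set a file declares about itself. It relies on the fact that a GEDCOM header (the level-0 HEAD record) declares the character set on a level-1 CHAR line. The function names declared_encoding and encoding_label are hypothetical, and the sketch assumes the lines have already been decoded to text.

```python
from typing import Iterable, Optional


def declared_encoding(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the header's CHAR line, or None if absent."""
    in_header = False
    for raw in lines:
        parts = raw.strip().split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_header:
                break  # the next level-0 record ends the header
            in_header = tag == "HEAD"
        elif in_header and level == "1" and tag == "CHAR" and len(parts) == 3:
            return parts[2]
    return None


def encoding_label(lines: Iterable[str]) -> str:
    """Prefix GEDCOM with the declared encoding, per the naming convention."""
    encoding = declared_encoding(lines)
    return f"{encoding} GEDCOM" if encoding else "GEDCOM"


if __name__ == "__main__":
    sample = ["0 HEAD", "1 GEDC", "2 VERS 5.5", "1 CHAR ANSEL", "0 TRLR"]
    print(encoding_label(sample))
```

Note that this only reports what the header claims; the actual bytes in the file may be encoded differently.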

illegal encodings

This convention is commonly extended to whatever encoding is being used, even if that encoding is not legal GEDCOM. Thus, although technically not a proper GEDCOM file, a GEDCOM 5.5 file encoded in ANSI is still referred to as an ANSI GEDCOM, and one encoded in MacRoman as a MacRoman GEDCOM.

combining

When the conventions for GEDCOM dialects and character encodings are combined, the encoding is kept next to GEDCOM. For example, a GEDCOM 5.5 file encoded in ANSEL and created by PAF 5.2 is a PAF 5.2 ANSEL GEDCOM 5.5 file.
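
The ordering of the parts can be captured in a small helper. This is merely a sketch of the naming convention described above; the function name describe_gedcom and its parameters are hypothetical.

```python
def describe_gedcom(application: str = "", encoding: str = "",
                    gedcom_version: str = "") -> str:
    """Compose a name in the order: application, encoding, GEDCOM, version.

    Empty parts are left out, so the same helper covers plain dialect
    names, plain encoding names and the combined form.
    """
    parts = [application, encoding, "GEDCOM", gedcom_version]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(describe_gedcom("RootsMagic"))
    print(describe_gedcom(encoding="UTF-8"))
    print(describe_gedcom("PAF 5.2", "ANSEL", "5.5") + " file")
```

Keeping the encoding as the element directly before GEDCOM means the application name, with or without its version, always comes first.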

not GEDCOM

When you want to communicate that an ostensible GEDCOM file is not a proper GEDCOM file, use quotes around GEDCOM.
This situation typically occurs with illegal encodings or illegal extensions, but may also occur with FTW TEXT.

FTW TEXT

FTW TEXT is an undocumented proprietary format of Family Tree Maker for Windows that causes problems because FTW tries to pass it off as GEDCOM. It also uses incorrect and deliberately misleading terminology such as abbreviated tags.

FTW TEXT discusses what FTW TEXT is, Dealing with FTW TEXT discusses how to deal with it, and Documenting FTW TEXT discusses how to document what you’ve done.

GEDCOM alternatives

There are several alternatives to GEDCOM. These are known as GEDCOM alternatives.

GEDCOM 6

Surprisingly, one of the alternatives is known as GEDCOM 6. That is an ill-chosen name, because GEDCOM 6 does not use the GEDCOM grammar.
Because it is not GEDCOM, this name is best quoted. That underscores that generally, references to GEDCOM without a version number do not mean to include GEDCOM 6.

GedXML

The GEDCOM 6 name is easily explained; the LDS proposed this new file format, also known as GEDCOM XML, as the successor to GEDCOM 5.x. The GEDCOM XML name is abbreviated to GedXML, and that is the better name, as it avoids the unnecessary confusion that arises from including GEDCOM in the name, without using a name so different that you’d think the two are completely unrelated.

Most discussions of GEDCOM implicitly exclude GedXML, but from time to time, for example when discussing GEDCOM alternatives, it may be prudent to be explicit about the exclusion.

conclusion

None of the above is particularly original. These conventions have evolved over time and are in use already. I just thought it might be a good idea to write it all down, to make it easier for others to learn about these conventions and adopt them as best practice.

updates

2011-06-12 More GEDCOM articles

Added links to A Gentle Introduction to GEDCOM, GEDCOM Magic, GEDCOM Tags, GEDCOM Alternatives and GEDCOM Validation.

links