Modern Software Experience

2011-07-15

GEDCOM header

system id        application
ANSTFILE         Ancestral File
BROSKEEP         Brother's Keeper
CFTREE           Cumberland Family Tree
FTL              Family Tree Legends
GEDFAN           The GEDCOM Fan Creator
GRAMPS           GRAMPS
KITHKIN_PRO      Kith and Kin Pro
PAF              Personal Ancestral File
FTW              Family Tree Maker for Windows
TMG              The Master Genealogist
UFTREE           Ultimate Family Tree
AncestQuest      Ancestral Quest
FamilyOrigins    Family Origins for Windows
FamTreesQE       Family Trees Quick & Easy
FamTreeHrtg      Family Tree Heritage
Legacy           Legacy Family Tree
Reunion          Reunion
RootsMagic       RootsMagic

SOUR and DEST

A GEDCOM header often contains both a SOUR and a DEST tag. The HEAD.SOUR tag identifies the source of the GEDCOM file, and the HEAD.DEST tag identifies the destination.

HEAD.SOUR

The HEAD.SOUR tag identifies the source. That source is not the database the GEDCOM file was created from, but the system that created the GEDCOM file.
The optional HEAD.SOUR.DATA tag may identify the database the GEDCOM file was created from, the optional HEAD.FILE may identify the original filename of the GEDCOM file, and the optional HEAD.NOTE tag may provide a description of the content of the GEDCOM file.

The HEAD.SOUR tag identifies the system that created the GEDCOM file. The value of the HEAD.SOUR tag is a short and unique system identifier, meant for automated processing. The table provides some examples of such system identifiers for several popular genealogy applications.

Subtags of HEAD.SOUR provide additional information, such as the version number of the application, the full name of the product, the company that produced it, even their full address. All that information is obviously meant for humans interested in the source of the GEDCOM file. The value of the HEAD.SOUR tag itself is meant for processing by other applications.
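
To make the tag paths used in this article concrete, here is a minimal Python sketch of how a reader might collect the header values. The function name read_header and the sample system id EXAMPLE_APP are hypothetical, and the sketch assumes well-formed lines of the form "level tag value"; it is an illustration, not a full GEDCOM parser.

```python
def read_header(lines):
    """Map tag paths such as 'HEAD.SOUR.VERS' to their values.

    Assumes each line looks like 'level tag [value]' and that levels
    never skip. A repeated tag overwrites the earlier value.
    """
    header = {}
    path = []  # tags of the enclosing lines, indexed by level
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0 and tag != "HEAD":
            break  # first record after the header; stop reading
        del path[level:]  # drop tags at this level and deeper
        path.append(tag)
        header[".".join(path)] = value
    return header


# Hypothetical sample; EXAMPLE_APP is not a real system id.
sample = [
    "0 HEAD",
    "1 SOUR EXAMPLE_APP",
    "2 VERS 1.0",
    "2 NAME Example Application",
    "1 DEST ANSTFILE",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
]
header = read_header(sample)
print(header["HEAD.SOUR"], header.get("HEAD.DEST"))
```

The value stored under HEAD.SOUR is the system identifier meant for other applications; the entries below it, such as HEAD.SOUR.VERS and HEAD.SOUR.NAME, are the human-oriented details.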

<APPROVED_SYSTEM_ID>

The GEDCOM specification states that the HEAD.SOUR tag value has to be an <APPROVED_SYSTEM_ID>. That sounds very definitive, but it is not.
There is no public authoritative list of approved system identifiers. FamilySearch used to maintain a list of such names on their gedcom.org site, but stopped maintaining that site years ago. They may still have a list, probably an outdated one, but they no longer publish it.
That there is no official list does not mean that the tag value cannot be checked. For starters, there is a maximum length of 20 characters.
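
A minimal Python sketch of such a check might look like this. It tests only the two things the specification's definition of <APPROVED_SYSTEM_ID> does state: a size of 1 to 20 characters, and underscores instead of spaces. The function name system_id_problems is hypothetical, and passing the check does not make an identifier registered or approved.

```python
def system_id_problems(value):
    """Return a list of problems found in a HEAD.SOUR value.

    Checks only size (1 to 20 characters) and embedded whitespace.
    An empty list means no problem was found, nothing more.
    """
    problems = []
    if not 1 <= len(value) <= 20:
        problems.append("length must be 1 to 20 characters")
    if any(ch.isspace() for ch in value):
        problems.append("spaces must be replaced with underscores")
    return problems


print(system_id_problems("KITHKIN_PRO"))
print(system_id_problems("Kith and Kin Pro"))
```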

A small table of system identifiers like the one I've included here is easy to create by examining a bunch of GEDCOM files, but it is not likely to be complete. There are hundreds of genealogy applications already; old ones are hard to come by and new ones continue to be introduced.

usage

The HEAD.SOUR tag is mandatory; a GEDCOM header must include it. Many GEDCOM readers do not use the system identifier provided by the HEAD.SOUR tag at all, but others do. If all genealogy applications always produced nothing but perfect GEDCOM files, the HEAD.SOUR tag would be merely informational; it would not be needed. In the real world, the system identifier is used to detect GEDCOM dialects and errors.

The system identifier is mostly used to detect GEDCOM dialects, specifically the application's own dialect, i.e. GEDCOM files produced by the same system, although perhaps another version of that system. Several applications do so to recognise and correct errors that earlier versions of the same application produced. That is how it should be; a vendor should correct errors, but at the same time continue to support GEDCOM files produced by earlier versions that still had the now corrected error.
A GEDCOM reader can also use the system identifier to recognise a GEDCOM file produced by another system, for example some system that does not use the CONC and CONT tags correctly, to try and correct its errors.

HEAD.DEST

The HEAD.SOUR tag is mandatory; a GEDCOM header must include it. The HEAD.DEST tag is optional; a GEDCOM header may include it.
A GEDCOM reader that tries to support GEDCOM dialects it knows about, must process the HEAD.DEST tag. When the HEAD.DEST is present, it is the HEAD.DEST tag, not the HEAD.SOUR tag, that specifies the GEDCOM dialect used.

Several applications do not merely allow you to generate a GEDCOM file for import into another application, but explicitly support several other applications. They allow you to generate a GEDCOM file specifically meant to be imported by a particular application. When you choose to do so, the generating application tries to take advantage of all it knows about that other application's GEDCOM dialect to make sure as much data as possible gets imported without any problems.
It still specifies its own system id as the creator of the GEDCOM file. It additionally specifies the system id of the other application as the destination of the GEDCOM file.

Thus, GEDCOM readers for applications that try to detect their own dialect must not only process the HEAD.SOUR tag, but should also look for and process the HEAD.DEST tag. That the system identifier specified by HEAD.SOUR is for their own application does not immediately imply that the GEDCOM file is in their own GEDCOM dialect. That the system identifier specified by HEAD.SOUR is for some other application does not immediately imply that the GEDCOM file is in another GEDCOM dialect.
If the HEAD.DEST tag is present, it specifies the GEDCOM dialect used. The HEAD.SOUR tag continues to specify the system that created the GEDCOM file, and should not be ignored, not even if the GEDCOM dialect used is all the receiving system is interested in.
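
The rule can be sketched in a few lines of Python. The names Origin and identify, and the system ids APP_A and APP_B, are hypothetical; the point is merely that the reader keeps both values, and lets HEAD.DEST decide the dialect only when it is present.

```python
from typing import NamedTuple, Optional


class Origin(NamedTuple):
    creator: str  # the system that wrote the file (HEAD.SOUR)
    dialect: str  # the dialect the file claims to be in


def identify(sour: str, dest: Optional[str] = None) -> Origin:
    """Combine HEAD.SOUR and an optional HEAD.DEST value.

    HEAD.DEST names the dialect when present; otherwise HEAD.SOUR does.
    The creator is always taken from HEAD.SOUR and is never discarded.
    """
    dialect = dest if dest else sour
    return Origin(creator=sour, dialect=dialect)


print(identify("APP_A"))           # no HEAD.DEST in the header
print(identify("APP_A", "APP_B"))  # written by APP_A for APP_B
```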

dialect of a dialect

Few vendors document their GEDCOM dialect. When one vendor tries to create GEDCOM files in the GEDCOM dialect of another vendor, the receiving system may need to deal with two types of errors; the GEDCOM writer may not only get the third party GEDCOM dialect wrong, but is also likely to continue to incorporate shortcomings of its own dialect that its developers do not recognise as such.
When an application generates a GEDCOM file in another application's dialect, it should not only try to support that dialect well, it should also remember to support only that dialect, instead of its own, but it is not impossible that it accidentally continues to include some of its own GEDCOM extensions.

When a vendor tries to create files in a third party GEDCOM dialect, what they end up creating is often best described as their dialect of that dialect…
That is the reality the receiving system has to deal with. When the HEAD.DEST specifies its own dialect, the system needs to look at the HEAD.SOUR to recognise the creator's dialect of its own dialect. Luckily, few vendors try to create GEDCOM files in another vendor's GEDCOM dialect.
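
From the receiving system's point of view, that distinction might be sketched as follows. This is a Python illustration under stated assumptions: MY_APP is a hypothetical system id for the reading application, APP_B is a hypothetical third party, and the three categories are my own labels, not anything the specification defines.

```python
OWN_ID = "MY_APP"  # hypothetical system id of the reading application


def classify(sour, dest=None):
    """Classify a GEDCOM file from the reader's point of view."""
    dialect = dest or sour  # HEAD.DEST wins when present
    if dialect != OWN_ID:
        return "foreign dialect"
    if sour == OWN_ID:
        return "own dialect"
    # HEAD.DEST names us, but another system wrote the file.
    return f"own dialect as written by {sour}"


print(classify("MY_APP"))
print(classify("APP_B", "MY_APP"))
print(classify("APP_B"))
```

A reader could use the third category to select corrections specific to the creating system, in addition to whatever it does for files it wrote itself.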

specification

The GEDCOM specification fails to specify what the legal values for HEAD.DEST are. It does not even state that the values used with HEAD.DEST are the same system identifiers as those used with HEAD.SOUR. That is sloppy. Luckily, it is so obvious to experienced developers that HEAD.DEST and HEAD.SOUR use the same system identifiers, that most do not even notice that the specification fails to tell them this.

These are the relevant parts of the GEDCOM 5.5.1 specification:

HEADER:=
n HEAD
+1 SOUR <APPROVED_SYSTEM_ID>

+1 DEST <RECEIVING_SYSTEM_NAME>

The header structure provides information about the entire transmission. The SOURce system name identifies which system sent the data. The DESTination system name identifies the intended receiving system.

APPROVED_SYSTEM_ID:={Size=1:20}
A system identification name which was obtained through the GEDCOM registration process. This name must be unique from any other product. Spaces within the name must be substituted with a 0x5F (underscore _) so as to create one word.

RECEIVING_SYSTEM_NAME:={Size=1:20}
The name of the system expected to process the GEDCOM-compatible transmission. The registered RECEIVING_SYSTEM_NAME for all GEDCOM submissions to the Family History Department must be one of the following names:

  • "ANSTFILE" when submitting to Ancestral File.
  • "TempleReady" when submitting for temple ordinance clearance.

The GEDCOM specification is so sloppy, that it is downright misleading.
The GEDCOM specification states that the value for HEAD.SOUR should be an <APPROVED_SYSTEM_ID>, and that the value for HEAD.DEST should be a <RECEIVING_SYSTEM_NAME>; one is a system id, the other is a system name.
That one value is called a system id and the other is called a system name within the same specification, strongly suggests that systems have both a system id and system names and that these are two different things.

The inconsistent choice of names suggests that <APPROVED_SYSTEM_ID> and <RECEIVING_SYSTEM_NAME> are two different things, and nowhere in the GEDCOM specification does it state that the <APPROVED_SYSTEM_ID> and <RECEIVING_SYSTEM_NAME> do in fact use the same system identifier values. The specification does not even mention either <APPROVED_SYSTEM_ID> or <RECEIVING_SYSTEM_NAME> anywhere else at all.
<RECEIVING_SYSTEM_NAME> should really have been called <RECEIVING_SYSTEM_ID>. The only hint that the so-called system name is nothing more than an ill-conceived alias for system id is provided by the few sentences right after the header syntax, which mention the SOUR and DEST tag on equal footing.

legal values

The way the GEDCOM specification is written encourages another misinterpretation.
The specification for the GEDCOM header states that the HEAD.DEST tag is optional, and the specification for <RECEIVING_SYSTEM_NAME> states which two values are legal for submission to the Family History Department. Some developers have misunderstood this as saying that ANSTFILE and TempleReady are the only two legal values for the HEAD.DEST tag.
That is not what the specification says. It only says that GEDCOM submissions to the Family History Department should use either their ANSTFILE or TempleReady system. It does not state that other values are not allowed.

The specification not only encourages this misinterpretation by only mentioning these two values, and by not stating that the HEAD.SOUR and HEAD.DEST tags use the same system identifiers, but also by not specifying at all what to do in the much more common case that you do not generate GEDCOM files for submission to the Family History Department. That encourages the misinterpretation that the HEAD.DEST tag should only be used when generating GEDCOM files for the Family History Department. That is not what the specification states, and few developers doubt that other system identifiers are legal. The specification does not forbid any system identifier as the HEAD.DEST value.

The brief statement the GEDCOM specification provides right after the GEDCOM header syntax is right; HEAD.SOUR specifies the source, and HEAD.DEST specifies the destination.

made-up system identifiers

The GEDCOM specification fails to state that ANSTFILE and TempleReady are two system ids. The GEDCOM specification fails to state that HEAD.SOUR and HEAD.DEST use the same system identifiers. Despite these shortcomings, most developers still understand that they can use any system identifier with HEAD.DEST. Alas, a few developers do not understand that a sloppy but widely understood specification still does not allow them to use anything but system identifiers. Some vendors, including FamilySearch, use made-up values with HEAD.DEST.

Made-up system values I have encountered include GEDCOM, ANY, and Other. I do not believe that these are system identifiers; I do not know of any system that uses either GEDCOM, ANY or Other as its system id.
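
A GEDCOM reader has to cope with these values anyway. One possible approach, sketched here in Python, is to treat a made-up HEAD.DEST value as if the tag were absent, so that HEAD.SOUR identifies the dialect. That handling is my assumption, not something the specification prescribes; the set contains only the three values listed above, and the exact, case-sensitive comparison is an assumption too.

```python
# The made-up values named above; not system identifiers.
MADE_UP_DEST_VALUES = {"GEDCOM", "ANY", "Other"}


def normalise_dest(dest):
    """Return None for a missing or made-up HEAD.DEST value.

    Any other value is returned unchanged and taken at face value.
    """
    if dest is None or dest in MADE_UP_DEST_VALUES:
        return None
    return dest


print(normalise_dest("Other"))     # treated as no destination
print(normalise_dest("ANSTFILE"))  # kept as is
```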

It is not hard to understand what happened. The GEDCOM specification fails to specify that HEAD.DEST uses the same system identifiers as HEAD.SOUR, and that any of these system ids can be used to indicate the receiving system's GEDCOM dialect. The GEDCOM specification also does not specify what a system should do in the most common case of all: data is simply being exported to a GEDCOM file, without that GEDCOM file being tailored to any particular destination system. The result is that some vendors have decided to use made-up system identifiers such as GEDCOM, ANY and Other. That is wrong.
That FamilySearch's own Personal Ancestral File (PAF) gets this wrong (it uses DEST Other) is not as surprising as it initially seems when you know that PAF was not created by FamilySearch, but merely adapted from Ancestral Quest, a third-party application.

There are two correct ways to export to a GEDCOM file that does not have a particular destination system. The first, and most obvious way, is to simply omit the optional HEAD.DEST tag, in recognition of the fact that there is no particular destination. The second one is to use the same system id with HEAD.DEST as used with HEAD.SOUR. After all, the HEAD.DEST value tells the receiving system which GEDCOM dialect is being used, and any GEDCOM file produced by the system is likely to be in a GEDCOM dialect that only the originating system itself fully understands.
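
On the writing side, the two options could be sketched like this in Python. The function name header_lines, its parameters, and the system ids MY_APP and APP_B are hypothetical, and the sketch emits only the SOUR and DEST lines; a real header carries further tags that are left out here.

```python
def header_lines(own_id, dest=None, name_own_dialect=False):
    """Build the SOUR and DEST lines of a GEDCOM header.

    dest: system id of a specific target system, if any.
    name_own_dialect: with no specific target, write the writer's own
    id as HEAD.DEST instead of omitting the tag.
    """
    lines = ["0 HEAD", f"1 SOUR {own_id}"]
    if dest is None and name_own_dialect:
        dest = own_id  # second option: name our own dialect
    if dest is not None:
        lines.append(f"1 DEST {dest}")
    # First option: with no dest at all, HEAD.DEST is simply omitted.
    return lines


print(header_lines("MY_APP"))
print(header_lines("MY_APP", name_own_dialect=True))
print(header_lines("MY_APP", dest="APP_B"))
```

Neither branch ever writes a placeholder such as GEDCOM, ANY or Other.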

summary

The remarkably brief discussion of the HEAD.SOUR and HEAD.DEST tags in the GEDCOM specification has proven to be insufficient. The specification is not only incomplete, but manages to be misleading too, and that has resulted in erroneous implementations.
The specification does not tell you, but the HEAD.SOUR and HEAD.DEST tags use the same system identifiers. There is no public authoritative list of system identifiers, but most systems only need to recognise their own system identifier, and perhaps those of a few other systems they are often used with.
The mandatory HEAD.SOUR tag identifies the system that generated the GEDCOM file.
The HEAD.SOUR identifies the GEDCOM dialect used, unless the HEAD.DEST tag is present. The optional HEAD.DEST tag is used to specify the receiving system, i.e. the GEDCOM dialect the receiving system should support to fully understand the GEDCOM file.

GEDCOM readers can use the HEAD.SOUR and HEAD.DEST values to recognise the GEDCOM dialect and, especially when used in combination with the HEAD.SOUR.VERS value, to recognise and correct known errors. Systems that want to recognise their own dialect must process HEAD.DEST in addition to HEAD.SOUR, and can use HEAD.SOUR to identify the generating system's dialect of their own dialect.

Most GEDCOM files are not created for a particular receiving system. Vendors should not make up a system identifier for this most common case, but either simply omit the optional HEAD.DEST tag, or specify their own system id, in recognition of the simple fact that their output is unlikely to be pure GEDCOM, and certain to be their own dialect.
