Modern Software Experience

2012-02-27

more GEDCOM magic

GEDCOM 5.5EL

GEDCOM 5.5EL is a set of GEDCOM extensions.
The GEDCOM 5.5EL (Extended Locations) specification is on GenWiki, the genealogy wiki maintained by the Verein für Computergenealogie e.V., the German Society for Computer Genealogy. The one-page specification is available in both German and English. It mentions half a dozen genealogy applications that read and write GEDCOM 5.5EL, but it does not tell implementers how to recognise a GEDCOM 5.5EL file.

detecting GEDCOM 5.5EL

There is no official method for detecting the GEDCOM 5.5EL extensions. Detecting GEDCOM 5.5EL would be easy if the GEDCOM 5.5EL specification called for the addition of some tell-tale marker to the GEDCOM header, but it does not. However, some applications, such as PC-AHNEN, do add some information to the GEDCOM header.


0 HEAD
1 SOUR PCAHNEN
2 VERS 2010
1 DATE 27 FEB 2012
1 DEST PAF
1 GEDC
2 VERS 5.5.1 EL
3 _EXTENDED_LOCATIONS
2 FORM LINEAGE-LINKED
1 CHAR UTF-8

PC-AHNEN GEDCOM header

Specifically, PC-AHNEN adds a space and EL to the version number, and adds the extra line 3 _EXTENDED_LOCATIONS below it.
The 3 _EXTENDED_LOCATIONS line is legal; the non-standard tag _EXTENDED_LOCATIONS starts with an underscore, and is 19 characters long, while GEDCOM tags are allowed to be up to 31 characters in length. However, the addition of a space and EL to the version number is illegal, and the non-standard version number may prevent GEDCOM readers from recognising the file as a GEDCOM 5.5.1 file.
The illegal VERS value invalidates the GEDCOM header and as a rule, GEDCOM readers should reject files that do not even contain a valid GEDCOM header. A forgiving GEDCOM reader that recognises the illegal value could issue a non-fatal error and continue.
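
The forgiving behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any particular GEDCOM reader works: the function name `parse_gedc_version` and the wording of the warnings are made up for this example. It separates the leading dotted version number from any trailing text, and reports the trailing text as a non-fatal problem instead of rejecting the file.

```python
import re
from typing import Optional, Tuple


def parse_gedc_version(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a GEDC.VERS value into (version, warning).

    Hypothetical helper: version is the leading dotted number, or None
    when the value does not start with one. warning is None when the
    value is a plain version number.
    """
    match = re.match(r"(\d+(?:\.\d+)*)(.*)", value.strip())
    if match is None:
        return None, f"unrecognised GEDCOM version {value!r}"
    version, rest = match.groups()
    if rest:
        # Non-fatal: keep the version, but report the illegal suffix.
        return version, f"illegal text {rest!r} after version number {version}"
    return version, None


version, warning = parse_gedc_version("5.5.1 EL")
print(version)   # 5.5.1
print(warning)   # illegal text ' EL' after version number 5.5.1
```

A strict reader would treat any warning as fatal; a forgiving one would log it and carry on with the version it recovered.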

Once you know this GEDCOM header extension exists, it makes some sense to look for it, but this GEDCOM header addition is completely non-standard. The _EXTENDED_LOCATIONS tag is not in the GEDCOM 5.5EL specification, so you cannot rely on it being present in every GEDCOM 5.5EL file.

GEDCOM 5.5EL application list

One possibility is using a list of GEDCOM 5.5EL-supporting applications and version numbers to recognise GEDCOM 5.5EL, but that kludge has serious limitations. First of all, there is no official list of GEDCOM 5.5EL-supporting applications. The half dozen applications on the GEDCOM 5.5EL page do not constitute a complete list of applications that support GEDCOM 5.5EL. It is a list of the applications whose vendors joined together to create GEDCOM 5.5EL as an extension supported by all. It lists neither the minimum version numbers nor any of the applications that came to support GEDCOM 5.5EL later.

Besides, even if such a list were available, it would not do much good. After all, an application that supports GEDCOM 5.5EL might offer an option to turn GEDCOM 5.5EL output on or off. So, even when you know the application and version number, and know it supports GEDCOM 5.5EL, you still do not know whether the GEDCOM file actually contains GEDCOM 5.5EL extensions or not.
Perhaps the biggest issue with the application list approach is that it depends on each GEDCOM reader having an up-to-date version of that list; GEDCOM 5.5EL files from brand new applications will not be recognised by older applications that are no longer being maintained.

_LOC tag

Here is a simple, reliable and practical way to detect GEDCOM 5.5EL: just look for the _LOC tag. A GEDCOM file that uses GEDCOM 5.5EL is practically sure to contain the _LOC tag; the _LOC tag will only be absent in the highly unlikely case that the GEDCOM file does not contain any place names. A GEDCOM file that contains PLAC tags but no _LOC tags isn't a GEDCOM 5.5EL file.

If the file does not contain PLAC tags, it might still be a GEDCOM 5.5EL file. A GEDCOM reader could look for the _GODP and _WITN tags, but there are several issues with that.
It is not wise to make detection logic more complex than it needs to be, and it is not wise to rely on the _GODP and _WITN tags. Because GODP and WITN used to be GEDCOM tags (removed in GEDCOM 5.4), it is not unlikely that several applications now support _GODP and _WITN instead without supporting GEDCOM 5.5EL.
You also have to wonder how much detection effort a GEDCOM file without place names is still worth.
The real clincher is that looking for the _LOC tag is a reasonable detection method, and that it is trivial for a GEDCOM writer to ensure that its GEDCOM 5.5EL files contain at least one _LOC record. After all, a valid GEDCOM file should contain two addresses: the submitter address in the SUBM record, and the vendor address in the GEDCOM header.

detection

A GEDCOM reader can recognise a GEDCOM 5.5EL file by looking for the _LOC tag. A GEDCOM reader need not bother using any vendor-specific extensions to GEDCOM 5.5EL to recognise GEDCOM 5.5EL, should not rely on a list of applications and version numbers, and should not try to use the _GODP or _WITN tag to detect GEDCOM 5.5EL. A GEDCOM reader should keep it simple, and stick to a straightforward check for the _LOC tag.
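
The straightforward check can be sketched as follows. This is a minimal illustration, not a full GEDCOM parser: the function names `gedcom_tag` and `uses_extended_locations` are invented for this example, and the sample lines are hypothetical. The one thing it takes care of is comparing the tag field of each line, so that a line value that merely mentions _LOC does not trigger a false positive.

```python
from typing import Iterable, Optional


def gedcom_tag(line: str) -> Optional[str]:
    """Return the tag of one GEDCOM line, or None if the line is malformed.

    A GEDCOM line is: level, optional @xref@, tag, optional value.
    """
    parts = line.strip().lstrip("\ufeff").split(None, 3)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    if parts[1].startswith("@") and parts[1].endswith("@"):
        # Record line such as "0 @L1@ _LOC": the tag follows the xref.
        return parts[2] if len(parts) > 2 else None
    return parts[1]


def uses_extended_locations(lines: Iterable[str]) -> bool:
    """True if any line carries the _LOC tag, as a record or a pointer."""
    return any(gedcom_tag(line) == "_LOC" for line in lines)


# Hypothetical sample lines, for illustration only.
sample = ["0 HEAD", "0 @I1@ INDI", "1 BIRT", "2 PLAC Berlin", "0 @L1@ _LOC", "0 TRLR"]
print(uses_extended_locations(sample))                           # True
print(uses_extended_locations(["0 HEAD", "1 NOTE about _LOC"]))  # False
```

Because the function accepts any iterable of lines, it can be fed an open file object directly and stops reading at the first _LOC tag it finds.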

A GEDCOM writer using GEDCOM 5.5EL should ensure detection of the GEDCOM 5.5EL extensions by all GEDCOM readers using this straightforward method, even for databases that are devoid of place names, by including a _LOC record for the vendor address in the GEDCOM header.
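
On the writer side, the safeguard could look something like the sketch below. It rests on assumptions: the function name `ensure_loc_record`, the cross-reference `@L0@`, the placeholder name, and the single `1 NAME` line used as the record body are all illustrative and should be checked against the GEDCOM 5.5EL specification. How the header's vendor address would point at the record is left out entirely. The sketch only shows the idea: if the output contains no _LOC tag at all, add one _LOC record before the trailer.

```python
from typing import List, Optional


def tag_of(line: str) -> Optional[str]:
    """Return the tag of one GEDCOM line (level, optional @xref@, tag, value)."""
    parts = line.split()
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    if parts[1].startswith("@") and parts[1].endswith("@"):
        return parts[2] if len(parts) > 2 else None
    return parts[1]


def ensure_loc_record(lines: List[str], xref: str = "@L0@",
                      name: str = "Vendor location") -> List[str]:
    """Return a copy of lines that contains at least one _LOC record.

    The record body (a single NAME line) is an assumption made for
    this illustration, not a quotation from the specification.
    """
    out = list(lines)
    if any(tag_of(line) == "_LOC" for line in out):
        return out  # already detectable, nothing to add
    record = [f"0 {xref} _LOC", f"1 NAME {name}"]
    # Insert just before the trailer, or at the end if there is none.
    at = next((i for i, line in enumerate(out) if tag_of(line) == "TRLR"), len(out))
    out[at:at] = record
    return out


for line in ensure_loc_record(["0 HEAD", "1 CHAR UTF-8", "0 @I1@ INDI", "0 TRLR"]):
    print(line)
```

Running the function a second time on its own output changes nothing, so a writer can call it unconditionally as the last step before saving.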

links