Modern Software Experience

2018-05-10

This is the original text, archived for reference. It was replaced by a slightly updated text on 23 May 2018 (link below).

GEDCOM 5.5.1 specification, improved and enhanced

GEDCOM Annotated Edition

Today I announce The FamilySearch GEDCOM 5.5.1 Specification Annotated Edition (TFG551SAE).
This is not a new GEDCOM version. This is an enhanced edition of the current GEDCOM version.
The Annotated Edition is the full FamilySearch GEDCOM 5.5.1 Specification, improved with corrections and enhanced with annotations.

The annotations are varied in nature. There are corrections of errors, commentary about differences from GEDCOM version 5.5, observations on the structure of GEDCOM files, notes on the meaning of some sections, clarifications of confusing parts and resolutions of contradictions. There is commentary on bad examples within the specification and on some real-world issues the specification never mentions. Several annotations provide advice, guidelines and best practices, often with links to relevant articles.
Obsolete and deprecated sections have been clearly marked as such, but have not otherwise been ignored; those sections still contain corrections and annotations.

A bonus section provides extensive ANSEL / Unicode tables, ANSEL / Unicode conversion algorithms, a brief overview of GEDCOM validators, and a GEDCOM 5.5.1 version detection algorithm (needed because of truncated GEDCOM version numbers).
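One detail that makes ANSEL / Unicode conversion more than a byte-for-byte table lookup is that ANSEL places a non-spacing diacritic before its base letter, while Unicode places combining characters after it. The sketch below is not the algorithm from the Annotated Edition; it is a minimal illustration that assumes a tiny, hypothetical subset of the ANSEL combining diacritics (a real converter needs the full table) and that reorders each diacritic behind its base character.

```python
import unicodedata

# Hypothetical, deliberately partial subset of ANSEL combining diacritics,
# mapped to the corresponding Unicode combining characters.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE5: "\u0304",  # macron
    0xE8: "\u0308",  # diaeresis (umlaut)
    0xEA: "\u030A",  # ring above
    0xF0: "\u0327",  # cedilla
}


def ansel_to_unicode(data: bytes) -> str:
    """Convert ASCII plus the partial diacritic subset above to Unicode."""
    out = []
    pending = []  # diacritics seen before their base character
    for b in data:
        if b in ANSEL_COMBINING:
            # ANSEL: the diacritic comes BEFORE the base character.
            pending.append(ANSEL_COMBINING[b])
        elif b < 0x80:
            out.append(chr(b))
            # Unicode: combining marks come AFTER the base character.
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this partial table")
    if pending:
        raise ValueError("diacritic at end of input without a base character")
    # Compose e + U+0301 into U+00E9 and so on, where precomposed forms exist.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `ansel_to_unicode(b"M\xE8uller")` yields `Müller`. Normalising to NFC is a design choice of this sketch; a converter could just as well leave the decomposed sequence in place.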

examples

The Annotated Edition corrects obvious errors, such as the impractically small maximum length of media file names. It notes non-obvious errors, like the omission of tab as a legal character in user text. It corrects erroneous definitions in the GEDCOM grammar, in the lineage-linked form and in the Appendix. It adds helpful observations, facts that developers may not immediately notice, such as the fact that it is legal for identifiers to contain spaces. It provides various best practices, including best practices for character sets and encodings as well as for CONC and CONT usage.
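To see why spaces in identifiers matter, consider a reader that splits every GEDCOM line on spaces: a cross-reference identifier such as `@I 1@` would be torn apart. The sketch below is a hypothetical line splitter, not taken from the Annotated Edition; it assumes the usual `level [xref] tag [value]` layout, finds the closing `@` instead of the next space, and keeps the line value verbatim, so tabs in user text survive.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(line: str) -> GedcomLine:
    """Hypothetical splitter for one GEDCOM line (no leading whitespace assumed)."""
    line = line.rstrip("\r\n")
    level_str, _, rest = line.partition(" ")
    if not level_str.isdigit():
        raise ValueError(f"bad level in {line!r}")
    xref = None
    if rest.startswith("@"):
        # Identifiers cannot contain '@' but may contain spaces,
        # so look for the closing '@' rather than the next space.
        end = rest.find("@", 1)
        if end == -1 or rest[end + 1 : end + 2] != " ":
            raise ValueError(f"malformed cross-reference in {line!r}")
        xref = rest[: end + 1]
        rest = rest[end + 2 :]
    # Split off the tag only; the value is kept as-is, tabs included.
    tag, _, value = rest.partition(" ")
    return GedcomLine(int(level_str), xref, tag, value)
```

Because only the first space after the tag is used as a delimiter, a pointer value such as `@F 1@` in `1 FAMS @F 1@` also stays intact.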

acknowledgements

This is my creation, which means that I am to blame for all its shortcomings and defects, but I did not create this alone.
Several experts did not hesitate to review one or more pre-releases, and the publication is considerably better for it. They caught mistakes and contributed observations that led to new annotations and corrections.

why

This project started back in 2013. I had already written many articles that mention GEDCOM issues and a fair number of articles focussed on particular aspects of GEDCOM. These articles address issues with either the GEDCOM specification itself, vendor interpretation and implementation of GEDCOM, or both.
The problem with any solutions or best practices these articles provided was perhaps not so much that developers were unwilling to implement changes, but that developers had to find these articles first. The way to make sure they would find relevant information was to put that information into the GEDCOM specification itself. Thus arose the idea of an annotated GEDCOM specification: the full GEDCOM 5.5.1 specification, improved through annotations that provide helpful information and links.

annotations and corrections

The annotations are not written next to the original text, as that would require tiny text in wide margins. Nor are they footnotes at the bottom of each page. The annotations are clearly recognisable boxes with a light yellow background, inserted in the flow of the text.

An early draft had annotations for every little thing, but inserting an annotation box with explanatory text for each minor correction is ridiculous. I felt that it made more sense to simply make a minor textual correction than to create an annotation for it, so I decided to forgo annotation purity; it is still called the Annotated Edition, but it really is an Annotated and Corrected Edition.
Some may take issue with this approach, but it not only significantly reduces the number of annotations, it also ensures that corrections are hard to overlook.

edits

Some textual corrections have an annotation; many do not need one. All edits have been clearly marked, so there is never any doubt about what is original text and what is new or replacement text. The original text is struck through, in dark red on a light red background. New and replacement text is dark green, on a light green background.

obsolete & deprecated

Some parts of the specification are obsolete, other parts are deprecated. These parts have all been marked with both a foreground and a background colour.
If there is one thing that surprised me during the creation of the Annotated Edition, it is just how much of the GEDCOM 5.5.1 specification is obsolete or deprecated.

Other parts of the specification highlighted by both foreground and background colours are the explicit use of the CONC and CONT tags in the lineage-linked form and the use of C-style comments in some places.
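For readers less familiar with these two tags: a CONT line continues the value of its superior line after a line break, while a CONC line is concatenated to it without any separator. The sketch below is a hypothetical reader step, not one of the Annotated Edition's best practices; it assumes lines have already been split into `(level, tag, value)` tuples and folds each CONC or CONT line that sits exactly one level below the preceding line back into that line's value.

```python
def join_continuations(lines):
    """Fold CONC/CONT lines into the value of the line they continue.

    lines: list of (level, tag, value) tuples, in file order.
    Returns a new list without the CONC/CONT lines.
    """
    result = []
    for level, tag, value in lines:
        if tag in ("CONC", "CONT") and result and level == result[-1][0] + 1:
            prev_level, prev_tag, prev_value = result[-1]
            # CONT starts a new line within the value; CONC just appends.
            sep = "\n" if tag == "CONT" else ""
            result[-1] = (prev_level, prev_tag, prev_value + sep + value)
        else:
            result.append((level, tag, value))
    return result
```

A CONC or CONT line at an unexpected level is simply kept as an ordinary line here; a stricter reader might report it instead.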

page numbers

An advantage of not trying to force-fit annotations into a small margin is that you can use a regular font size. An issue with including annotations in the main text flow is that the text becomes longer, so it becomes practically impossible to keep everything on the same page as in the original document.
Not only does the Annotated Edition have things on different pages than the original text, different pre-releases of the Annotated Edition may have the same thing on different pages as well.
This is not unimportant. The GEDCOM specification contains many explicit references to page numbers; it is annoying if they aren't right, and perhaps even more annoying if the page numbers in the Annotated Edition differ from those in the unannotated edition.

To avoid confusion between the original page numbers and those of the Annotated Edition, the pages of the Annotated Edition are not numbered at all. Instead, the original page breaks are shown within the text, and the original page numbers are shown in the margin.

some other numbers

The source file that the PDF is created from contains more than 800,000 characters, including all the mark-up instructions.
The FamilySearch GEDCOM 5.5.1 Specification is 100 pages long. Using a slightly smaller font, the Annotated Edition is roughly 175 pages.

The FamilySearch GEDCOM 5.5.1 Specification contains hyper-links to navigate from one section or definition to another. The Annotated Edition contains all the same links and more. Annotations often contain links to related definitions or other annotations.
There are more than 750 internal links, and close to 100 external links.

The Annotated Edition includes more than 200 textual replacements, more than 50 textual additions, and more than 300 annotations. That's quite a lot, but we probably still missed things.

availability

The FamilySearch GEDCOM 5.5.1 Specification Annotated Edition is not available for download yet.
Right now, it seems best to have one more review round, with a wider selection of reviewers.
Developers and advanced users interested in providing corrections, additions and other feedback can request a review invitation by emailing me. The email address to use is on the contact page.

Frequently Asked Questions

Is this a new version of GEDCOM?

This is not a new version of GEDCOM. Some errors have been corrected, some contradictions have been resolved, helpful notes have been added, etcetera, but this is still GEDCOM 5.5.1.

why 5.5.1? Isn't 5.5.1 a draft?

No, GEDCOM 5.5.1 isn't a draft, and never really was a draft. FamilySearch themselves started to use it immediately.

what's it for?

The Annotated Edition benefits developers of genealogy software. The observations, additions, corrections, resolutions of contradictions, and included solutions and best practices help them improve their GEDCOM support. Improved GEDCOM support benefits users.

what does it cost?

The Annotated Edition is a copyrighted publication, available as a free download.

The Annotated Edition is still under technical review by multiple experts and not publicly available yet.

updates

2018-05-13 pre-release sent

A pre-release dated 13 May 2018 has been sent to reviewers.

links