Modern Software Experience

2013-09-09

This article was originally titled @IDENT@ CONT.
It has been renamed and is part II of the GEDCOM Identifiers quadrology now.

GEDCOM syntax error

GEDCOM line

This is how the FamilySearch GEDCOM 5.5.1 specification defines a GEDCOM line:

gedcom_line:=

level + delim + optional_xref_ID + tag + optional_line_value + terminator

A GEDCOM line starts with a level number, must contain a tag, and is always terminated by a newline. That, and the delimiter (space) between the level number and the tag, are the mandatory parts of a GEDCOM line. There are two optional parts: the cross-reference identifier and the line value.
Many GEDCOM lines have a line value. Those line values are the data being transferred; they are what GEDCOM is about.
To allow cross-references between records, GEDCOM lines may also have a cross-reference identifier. Delimiters (spaces) to separate all parts from each other are mandatory.
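The line definition above can be sketched as a small parser. This is a minimal illustration, not a complete GEDCOM reader: the names `GEDCOM_LINE`, `GedcomLine` and `parse_line` are made up for this example, and the pattern assumes single-space delimiters and does no escape handling.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern modelled on the gedcom_line definition quoted above.
# Simplified on purpose: single-space delimiters, no escape handling.
GEDCOM_LINE = re.compile(
    r"^(?P<level>\d+)"              # level number (mandatory)
    r" (?:(?P<xref>@[^@ ]+@) )?"    # optional cross-reference identifier
    r"(?P<tag>[A-Za-z0-9_]+)"       # tag (mandatory)
    r"(?: (?P<value>.*))?$"         # optional line value
)


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into level, identifier, tag and line value."""
    m = GEDCOM_LINE.match(text.rstrip("\r\n"))  # drop the terminator first
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    return GedcomLine(int(m["level"]), m["xref"], m["tag"], m["value"])
```

Because the identifier is matched as its own optional part, before the tag, a line such as `2 @IDENT@ CONT Second line of a note.` parses just like any other line.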

0 HEAD
1 SOUR TJ
2 NAME Tamura Jones
2 VERS 1.0
1 DATE 9 Sep 2013
1 FILE IdentCONT.GED
1 NOTE Test File: CONT line with identifier.
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 LANG English
1 SUBM @U1@
0 @U1@ SUBM
1 NAME Name
0 @I1@ INDI
1 NAME One /Note/
2 SURN Note
2 GIVN One
1 NOTE First line of a note.
2 @IDENT@ CONT Second line of a note.
2 CONT Third line of a note.
0 TRLR

This GEDCOM syntax allows an identifier on any GEDCOM line. A specific GEDCOM form, such as the LINEAGE-LINKED form, restricts the usage of identifiers, by specifying which records have identifiers and which do not.
The LINEAGE-LINKED form defined in the FamilySearch GEDCOM 5.5.1 specification demands identifiers for most top-level records, but does not allow identifiers on the HEAD and TRLR records.
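The difference between a syntax rule and a form rule can be illustrated with a small check on top-level lines. This is only a sketch: `check_top_level` and `NO_IDENTIFIER_TAGS` are hypothetical names, and treating every record other than HEAD and TRLR as requiring an identifier is a simplifying assumption standing in for "most top-level records".

```python
from typing import Iterable, Iterator, Tuple

# Records that must not carry an identifier, per the paragraph above.
NO_IDENTIFIER_TAGS = {"HEAD", "TRLR"}


def check_top_level(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, message) for level-0 lines that break the form rule."""
    for number, raw in enumerate(lines, start=1):
        fields = raw.strip().split(" ")
        if fields[0] != "0":
            continue  # only level-0 lines start a record
        has_xref = len(fields) > 1 and fields[1].startswith("@")
        tag_index = 2 if has_xref else 1
        if len(fields) <= tag_index:
            yield number, "missing tag"
            continue
        tag = fields[tag_index]
        if tag in NO_IDENTIFIER_TAGS and has_xref:
            yield number, f"{tag} must not have an identifier"
        elif tag not in NO_IDENTIFIER_TAGS and not has_xref:
            # Assumption: all other top-level records need an identifier.
            yield number, f"{tag} record has no identifier"
```

Note that this check belongs to the form, not the syntax; a line that fails it is still a syntactically valid GEDCOM line.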

CONC and CONT

The GEDCOM syntax defines two special tags: CONC and CONT. They are special because they are part of the GEDCOM syntax, not of some GEDCOM form, and while most GEDCOM tags define records, CONC and CONT extend line values.
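How CONC and CONT extend a line value can be sketched in a few lines: CONT starts a new line of text, CONC appends to the current one. The function name `join_value` and the list-of-pairs input are assumptions made for this illustration.

```python
from typing import Iterable, Tuple


def join_value(first_value: str, continuation: Iterable[Tuple[str, str]]) -> str:
    """Combine a line value with the CONC/CONT lines that follow it.

    `continuation` holds (tag, value) pairs in file order.
    """
    text = first_value
    for tag, value in continuation:
        if tag == "CONT":
            text += "\n" + value   # CONT: continue on a new line
        elif tag == "CONC":
            text += value          # CONC: concatenate, no newline added
        else:
            raise ValueError(f"not a continuation tag: {tag}")
    return text
```

For example, `join_value("First line", [("CONT", "Second line"), ("CONC", " continued")])` yields a two-line text whose second line reads `Second line continued`.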

According to FamilySearch GEDCOM 5.5.1 syntax, it is legal for CONC and CONT lines to have an identifier. That is a mistake.
Identifiers are used for cross-references between GEDCOM records. An identifier for a CONC or CONT line would allow a cross-reference into the middle of some line value. An interesting idea perhaps, but not a practical one, and surely not FamilySearch's intention either.
Realistically, CONC and CONT lines should not have identifiers. That is a GEDCOM syntax restriction.

The FamilySearch GEDCOM 5.5.1 specification does not state that GEDCOM syntax restriction, so, formally, there is no such restriction.
Technically, it is legal for CONC and CONT lines to have an identifier. However, there is no GEDCOM writer that creates such lines, and no GEDCOM reader that truly expects it.
In fact, correct CONC and CONT handling isn't the strongest feature of most genealogy applications anyway. That is fairly well known among advanced users and genealogy application developers, and the Three Torture Tests I published back in 2010 continue to be too much of a challenge for most genealogy applications.
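Since an identifier on a CONC or CONT line is legal but unexpected, a checker that wants to point it out could report it as a warning rather than reject the file. The sketch below is one way to do that; the pattern and the function name are hypothetical and it assumes single-space delimiters.

```python
import re
from typing import Iterable, List, Tuple

# Matches a CONC or CONT line that carries an identifier,
# for example "2 @IDENT@ CONT some text".
IDENT_ON_CONTINUATION = re.compile(r"^\d+ @[^@ ]+@ (CONC|CONT)( |$)")


def continuation_identifier_warnings(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, warning) for CONC/CONT lines with an identifier."""
    warnings = []
    for number, line in enumerate(lines, start=1):
        m = IDENT_ON_CONTINUATION.match(line)
        if m:
            warnings.append((number, f"{m.group(1)} line has an identifier"))
    return warnings
```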

IdentCONT.GED

Still, here's another tiny GEDCOM file to test GEDCOM readers with: IdentCONT.GED. It barely contains anything; apart from the mandatory header and such, there's just one INDI record, with a NOTE record.
That NOTE record consists of just three lines:

First line of a note.
Second line of a note.
Third line of a note.

So far, nothing special or challenging. The twist is that the CONT line for the second line starts with an identifier. That's all.
That identifier is superfluous; the file is fine without it. It is there because, although it is perfectly legal, it is unexpected.
Some GEDCOM readers may parse GEDCOM lines in such a way that they aren't bothered by a superfluous identifier, but other GEDCOM readers have CONC and CONT handling code that gets confused when it encounters one. In fact, as a few quick tests revealed, quite a few applications get more than a little bit confused.
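To illustrate the difference between the two kinds of reader, here is a hypothetical sketch; it does not reproduce the code of any actual application. `naive_fields` assumes the tag is always the second field on a line, while `tolerant_fields` skips an optional identifier first. All names are invented for this example.

```python
def naive_fields(line):
    # Assumes "level tag value"; an identifier ends up in the tag position.
    level, tag, *rest = line.split(" ", 2)
    return int(level), tag, rest[0] if rest else ""


def tolerant_fields(line):
    # Skips an optional @identifier@ between the level and the tag.
    level, rest = line.split(" ", 1)
    if rest.startswith("@"):
        _xref, rest = rest.split(" ", 1)
    tag, _, value = rest.partition(" ")
    return int(level), tag, value


def read_note(lines, fields):
    """Collect the text of a NOTE and its CONT lines using the given splitter."""
    note = []
    for line in lines:
        _level, tag, value = fields(line)
        if tag == "NOTE":
            note = [value]
        elif tag == "CONT":
            note.append(value)
        # Any other tag is silently skipped in this sketch.
    return "\n".join(note)


NOTE_LINES = [
    "1 NOTE First line of a note.",
    "2 @IDENT@ CONT Second line of a note.",
    "2 CONT Third line of a note.",
]
```

With `naive_fields`, `read_note(NOTE_LINES, naive_fields)` loses the second line, because `@IDENT@` is taken for the tag; with `tolerant_fields` all three lines of the note survive.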

VGed IdentCONT.GED

GEDCOM validators

VGed 3.04

VGed 3.04 confirms that IdentCONT.GED is a valid GEDCOM file.
VGed warns that the one INDI record in there isn't referenced, and that's arguably a minor shortcoming of VGed; that warning makes a lot of sense in general, but not for GEDCOM files containing just one INDI record.

GED-inline

GED-inline also confirms that IdentCONT.GED is a valid GEDCOM file.
The GED-inline site's validation report is nothing but header data and statistics.
There is no error or warning.

Chronoplex GEDCOM Validator 1.0.8.0

The Chronoplex GEDCOM Validator 1.0.8.0 does not agree that the GEDCOM file is valid. The Chronoplex GEDCOM Validator gets really confused and complains that line 21 has Invalid user defined tag 'CONT'.

Chronoplex GEDCOM Validator IdentCONT.GED

That's a truly remarkable error message for a GEDCOM validator, as the CONT tag isn't a user-defined tag at all, but a standard GEDCOM tag, one that's always present, whatever the GEDCOM form, because it is part of the GEDCOM syntax.

There are some other strange things about the Chronoplex GEDCOM Validator output. The Chronoplex GEDCOM Validator claims to have processed 2 people (INDI records) and 2 sources, while the file contains just one INDI record and no sources at all.

Windows desktop applications

Chronoplex My Family Tree 3.0.5.0

The Chronoplex GEDCOM Validator failed to recognise IdentCONT.GED as a valid GEDCOM file, so it's hardly a surprise that Chronoplex My Family Tree 3.0.5.0 fails to import the file correctly. Then again, looking at the Chronoplex Import GEDCOM screen, I do find it a bit surprising that Chronoplex didn't get it right. They are not only aware that many applications get CONC and CONT wrong, they even offer a Fix CONC tag spacing option to deal with it.

Chronoplex My Family Tree IdentCONT.GED

Chronoplex My Family Tree 3.0.5.0 imports the file without producing any errors or warnings, but the results are surprising. The text of the second line (Second line of a note.) has been replaced by the GEDCOM identifier and the enclosing at signs (@IDENT@):

First line of a note.
@IDENT@
Third line of a note.

Personal Ancestral File 5.2.18.0

PAF 5.2.18.0 imports the file without problems.
The import list shows nothing but information from the GEDCOM header and some statistics.

RootsMagic 6.3.0.3

RootsMagic 6.3.0.3 imports the file without errors or warnings.
The import listing shows information from the GEDCOM header.

Behold 1.0.5.1

Louis Kessler's Behold 1.0.5.1 imports the file without problems.
Behold does not produce any error or warning.

Gramps 3.4.5.2

Gramps 3.4.5.2 is still the latest version for Windows, despite the release of version 4.0 in June of this year. It does not understand the GEDCOM line and says so: Line ignored as not understood.

Gramps import IdentCONT.GED

Gramps is seriously confused by the CONT line and makes a remarkable mistake while reporting the problematic line: the error message includes the line value of the next CONT line; not the entire next line, just its line value, some white space and a newline in between. It looks like Gramps was halfway through transforming the NOTE line and subsequent CONT lines into a single Gramps note record already, when it noticed the unexpected identifier and reported it.

The import does not produce the correct result. Only the first of the three lines has been imported as it should.
A second note, of type GEDCOM Import, has been created. It claims to report Records not imported into INDI (individual) Gramps ID I0001. That message is helpful, but technically incorrect, as CONT lines aren't GEDCOM records. It suggests that Gramps does not properly distinguish between GEDCOM lines and GEDCOM records, and that in turn may be part of the reason it gets confused by this particular CONT line.

MyHeritage Family Tree Builder 7 import IdentCONT.GED

MudCreek GENViewer 1.23

MudCreek GENViewer 1.23 gets confused by this GEDCOM file. GENViewer does not produce an import log, but after import, the note is gone.

I tried GENViewer Lite 1.15, and it showed only the first line of the note.

MyHeritage Family Tree Builder Basic 7.0.0.7105

MyHeritage Family Tree Builder isn't eager to highlight any import issues. The dialog you get after importing is titled Success even if there was a failure…
Text along the top proclaims success too. It is only near the bottom of the dialog box that it says Some issues were encountered during the import process, and you need to choose the View issues button to see what those issues are. Only when you choose that button, do you get to see the Issues Encountered during GEDCOM Import dialog box.

MyHeritage Family Tree Builder Basic 7.0.0.7105 gets confused too, and reports that the line contains an unrecognised tag. Put that way, it sounds like just the same error as the Chronoplex GEDCOM Validator makes, but it is not. The Chronoplex GEDCOM Validator claimed that CONT is an invalid user-defined tag, while MyHeritage Family Tree Builder claims that @IDENT@ (the identifier, not the tag) is an unrecognised tag.

Upon cursory examination of Family Tree Builder's dialog box, it seems to report that the sole INDI record in the file isn't referenced by any other record, just like the VGed validator does. However, upon closer examination, it becomes apparent that FTB claims something quite different; it does not claim that the @I1@ identifier isn't referenced, it claims that the non-existent identifier N1 is referenced but not defined…
That's utter nonsense that has nothing to do with the GEDCOM file.

After import, the second line of the note is missing:

First line of a note.
Third line of a note.

Family Tree Builder 2014 IdentCONC.GED import complete

Family Tree Maker 2014

Family Tree Maker 2014's Import Complete dialog box reports that there are 4 warnings for five records. Choosing the View Log file button opens the identCONT.ftm.import.log file in Notepad.
It lists the following four errors:

Line 21 : error 8 : "CONT" subordinated to wrong item. Line ignored.
Line 21 : error 8 : "CONT" subordinated to wrong item. Line ignored.
Line 21 : error 8 : "CONT" subordinated to wrong item. Line ignored.
Line 21 : error 4 : Invalid tag: @IDENT@. Line ignored.

None of these FTM 2014 errors makes much sense.
The CONT lines are subordinate to the NOTE line, and that is perfectly fine.
There are three text lines in the NOTE, so there are two CONT lines in the GEDCOM file, yet somehow, Family Tree Maker manages to issue the same error thrice, and all three times for the same line…
The fourth error is nonsense; @IDENT@ isn't a tag, it is an identifier.

The result of the import is as you would expect after reading those errors: only the first line of the note is imported correctly. The second and third lines are gone.

MyBlood 1.31

MyBlood 1.31 crashed while reading the file.
The MyBlood process kept running and had to be killed from Windows Task Manager.

conclusion

So, what we have here is a simple GEDCOM file of just 23 lines that contains one completely legal line that many well-known applications get seriously confused by. They fail to import it correctly, produce erroneous error messages, or even crash. Although this particular legal construct is not likely to occur in real-world GEDCOM files, it remains disappointing to see how brittle the CONC and CONT handling of many applications is. It should not be that easy to confuse or crash an application.

Best Practice

GEDCOM Writer

GEDCOM Reader

GEDCOM Validator

updates

2013-09-10: Gramps fix

Doug Blank of the Gramps project has committed a fix.
This fix will be included in the next updates for Gramps 3.4 and 4.0.

2013-11-09: Gramps 4.0.2

Gramps 4.0.2 has been released and includes the following fix:

  • Fix some regressions on GEDCOM file format export and enhancement on CONT/CONC handling
