A basic check an application should perform when trying to import a file is
making sure that it is importing the right kind of file. This is generally done by
recognising the file header. Many file formats deliberately include a so-called
magic value in the header that identifies the file format. For example, all Windows bitmap
images start with the ASCII values for the two letters B and M, in that
order.
The first line of a valid GEDCOM file is 0 HEAD. As a result, many genealogy
applications check that the file starts with 0 HEAD.
Some slightly more flexible readers allow a few empty lines, spaces and tabs before the magic value. That flexibility is nice, but less important than getting the magic value right.
All but seriously outdated GEDCOM readers will recognise 0 HEAD when the file is
encoded in UTF-16 ("UNICODE" in confused GEDCOM terminology) instead of
UTF-8, ANSEL or ASCII. Sadly, quite a few current GEDCOM readers are seriously
outdated, but this text is not about Unicode support.
The fact is that most GEDCOM readers get the magic value wrong.
The magic value is not 0 HEAD but a first line that contains
nothing but 0 HEAD; in other words, 0 HEAD followed by a newline.
A reader that generously wants to be flexible beyond the GEDCOM specification
could accept some additional whitespace on that line.
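The difference between the two tests can be sketched in a few lines of Python. This is only an illustration, not the code of any actual reader: the names `naive_check`, `strict_check` and `allow_trailing_whitespace` are made up for this example, the text is assumed to be decoded already, and CR LF, CR and LF are assumed to be the line terminators.

```python
import re

# Assumed line terminators: CR LF, CR or LF.
_NEWLINE = r"(?:\r\n|\r|\n)"
_STRICT = re.compile(r"0 HEAD" + _NEWLINE)
_LENIENT = re.compile(r"0 HEAD[ \t]*" + _NEWLINE)


def naive_check(text: str) -> bool:
    """The erroneous test: only looks at the first six characters."""
    return text.startswith("0 HEAD")


def strict_check(text: str, allow_trailing_whitespace: bool = False) -> bool:
    """The correct test: the first line contains nothing but 0 HEAD."""
    pattern = _LENIENT if allow_trailing_whitespace else _STRICT
    return pattern.match(text) is not None
```

With these definitions, `naive_check` accepts text starting with 0 HEAD 123 or 0 HEADER, while `strict_check` rejects both. Passing `allow_trailing_whitespace=True` gives the more generous variant that tolerates spaces and tabs between 0 HEAD and the newline.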
The newline may seem an insignificant detail. You are not likely to encounter many stories that start with "0 HEADING OUT INTO" or something similar. Nor is it likely to match a numbered list that starts with "0 HEADQUARTERS".
That GEDCOM files start with a zero and that tags are ALL-CAPS makes it
very unlikely that any random text file will pass the erroneous magic value test.
Ordinary text rarely starts with a zero, most people start numbering their lists
at one instead of zero, and most text intended for humans is not written in
ALL-CAPS.
That the erroneous magic value could match some random text is not the issue. There are two issues: real errors and another format.
One issue is that a carelessly coded test may fail to recognise
real errors, such as some value following the HEAD tag, for example 0 HEAD 123.
That is illegal. The HEAD tag should not be followed by any value. If there is a
value anyway, that’s an error; perhaps it isn’t a GEDCOM file after all.
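One way to catch such a value is to split the first line into level, tag and value instead of comparing prefixes. The sketch below does that under simplifying assumptions: the line is already decoded, fields are separated by a single space, and trailing whitespace is tolerated. The names `HeaderError` and `check_header_line` are made up for this example.

```python
class HeaderError(ValueError):
    """Raised when the first line is not a valid GEDCOM header line."""


def check_header_line(line: str) -> None:
    """Raise HeaderError unless the line is level 0, tag HEAD, no value."""
    # Drop the line terminator and any trailing whitespace, then split
    # into at most three fields: level, tag and the rest of the line.
    stripped = line.rstrip(" \t\r\n")
    fields = stripped.split(" ", 2)
    level = fields[0]
    tag = fields[1] if len(fields) > 1 else ""
    value = fields[2] if len(fields) > 2 else ""
    if level != "0" or tag != "HEAD":
        raise HeaderError(f"expected 0 HEAD, found {stripped!r}")
    if value:
        raise HeaderError(f"HEAD cannot have a value, found {value!r}")
```

Calling `check_header_line("0 HEAD 123\n")` raises a `HeaderError`, which the caller can report as a fatal error instead of silently importing the file.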
Here is a table summarising how several current genealogy applications respond to a GEDCOM file
that has a value following the HEAD tag.
| application | version | action |
|---|---|---|
| Family Tree Maker Classic | 16.0.350 | warning message |
| New Family Tree Maker | "2009" 18.0.0.307 | no error or warning |
| Personal Ancestral File | 5.2.18.0 | no error or warning |
| Aldfaer | 4.1 | no error or warning |
| Behold | 0.98.9.91 "alpha" | message |
| Brother’s Keeper | 6.3.11 | no error or warning |
| Cognatio | 1.4.1 | no error or warning |
| GEDCOM Explorer | 1.0.0.85 | no error or warning |
| Legacy Family Tree | 7.0.0.90 | no error or warning |
| MyBlood | 1.0.600 "Alpha 4.1" | no error or warning |
| RootsMagic | 4.0.1.1 | no error or warning |
Family Tree Maker Classic produces a message: “Line 1 : Tag: HEAD, cannot have a value
field: 123, ignored.” The log file does not clearly label the message as
either a warning or an error message, but the dialog box that prompts you to
view the log file states that the import generated one warning and zero errors.
Behold produces the following message: “level 0 record(s) should not have
additional text following the record type. GEDCOM only allow this on "0 NOTE"
records. Behold will include and display the extra text.” Behold does not
identify its messages as either an error or a warning message, merely as “Possible
GEDCOM problems”.
The table shows that most genealogy applications fail this incredibly basic test; they report neither an error nor a warning. Behold reports the issue, but does not indicate that it is an error. Family Tree Maker Classic reports the issue, but miscategorises the error as a warning.
The inclusion of illegal text in a GEDCOM header is not some minor issue, but
a very serious one, that should be treated as the fatal error it is.
Even flexible
GEDCOM readers that try to support as many GEDCOM dialects as possible have to
draw a hard line somewhere. The GEDCOM header is the natural compatibility
boundary.
If there is a valid GEDCOM header, the reader can use it to determine what
extensions to support and what errors to compensate for. If there is not even a valid GEDCOM header, the file simply
isn’t a GEDCOM file and should be rejected. It makes little sense to bother
reading the rest if the application that created the file cannot even
get the GEDCOM header right.
That most current genealogy applications fail to report an error in the GEDCOM header, even when there is illegal content on the very first line, the one line the reader must examine to check whether it could be a GEDCOM file at all, is very disappointing.
The other issue is that a test that merely checks that a file starts with 0 HEAD
is not just likely, but certain to match the
magic value of another file format. Such a test
will not only match 0 HEAD, but also 0 HEADER.
FTW TEXT files start with 0 HEADER and FTW TEXT is not just some
random file format, but a proprietary format of Family Tree Maker, a genealogy
application. What’s worse,
Family Tree Maker actively misleads its users into thinking that FTW TEXT is
GEDCOM. Thus, a GEDCOM reader is quite likely to be presented with an FTW TEXT
file for processing and should produce an appropriate response when that
happens.
If the GEDCOM recognition is done right, the GEDCOM reader will, upon being
presented with an FTW TEXT file, inform the user that the file is not a GEDCOM
file, even if it does not know about FTW TEXT files.
Alas, as related in The FTW TEXT Problem, only a few of the already
mentioned applications act correctly.
The following table summarises how these applications respond to an otherwise valid
GEDCOM file that starts with 0 HEADER, like an FTW TEXT file, instead of 0 HEAD.
| application | version | action |
|---|---|---|
| Family Tree Maker Classic | 16.0.350 | no error or warning |
| New Family Tree Maker | "2009" 18.0.0.307 | no error or warning |
| Personal Ancestral File | 5.2.18.0 | no error or warning |
| Aldfaer | 4.1 | erroneous message |
| Behold | 0.98.9.91 "alpha" | message |
| Brother’s Keeper | 6.3.11 | no error or warning |
| Cognatio | 1.4.1 | error message & abort |
| GEDCOM Explorer | 1.0.0.85 | error message & abort |
| Legacy Family Tree | 7.0.0.90 | no error or warning |
| MyBlood | 1.0.600 "Alpha 4.1" | no error or warning |
| RootsMagic | 4.0.1.1 | no error or warning |
Aldfaer’s import dialog shows the progress and highlights the message “GEDCOM
bestand heeft geen HEADER record” (GEDCOM file has no HEADER record) in red.
The problem is that the message is as wrong as it gets. A HEADER record is
exactly what this file does have instead of a HEAD record, so this
message only adds to the confusion. Aldfaer does not
abort, but continues the import.
Behold produces the following message: “The HEAD record is missing.
This may not be a GEDCOM file. Behold will use what it can from it.” Behold
does not indicate whether this is an error or warning message.
Cognatio puts up a messagebox with the text “Serious errors were found when searching the GEDCOM file. The import was
cancelled.” The messagebox icon makes it clear this is an error, and the import is
aborted.
The import log additionally notes that “HEADER is an unknown tag”.
GEDCOM Explorer puts up a messagebox with the text “Not a valid Gedcom file, Error processing "GEDCOMHEADER.GED", Processing aborted”. Again, the
messagebox icon used conveys this is an error, and the import is aborted.
Cognatio did best; it detects the problem, provides an error message to the user, aborts the import, and clearly documents the errors in the import log.
If you compare these results with those in The FTW TEXT Problem, you may notice that both Legacy and RootsMagic are confused by an actual FTW TEXT file and that neither produced any message about the erroneous header. This immediately suggests why Legacy and RootsMagic are confused; neither supports FTW TEXT, but both allow themselves to get confused by not performing proper header checks before proceeding with the rest of the file.
Detecting GEDCOM is easy; just check for the magic value, but do get the magic
value right. The magic value is not 0 HEAD but 0 HEAD␤; 0 HEAD followed by a newline. ␤ is Unicode character U+2424, the Symbol for NewLine.
If a file does not start with that, it is not a GEDCOM file.
If a GEDCOM file is expected, processing should be aborted with an error message.
| magic value | file format | action |
|---|---|---|
| 0 HEAD␤ | GEDCOM | process as GEDCOM |
| anything else | unknown | abort |
A forgiving GEDCOM reader could allow whitespace in between 0 HEAD and the newline
as a non-fatal error. For ease of discussion, the rest of this text assumes a
regular GEDCOM reader.
Detecting FTW TEXT is just as easy; its magic value is only slightly different
from that for GEDCOM. The magic value for FTW TEXT is 0 HEADER
followed by a newline. If a file does not start with that, it is not an
FTW TEXT file.
Because Family Tree Maker misleads its users into thinking that its FTW TEXT files are GEDCOM files, these users are likely to be confused when their ostensible GEDCOM file is refused because it is not GEDCOM.
The magic values for GEDCOM and FTW TEXT both start with 0 HEAD, so it is
tempting to think that a reader that supports both could get away with checking
just that, but that is a mistake. Not only does the reader need to make sure it
is either GEDCOM or FTW TEXT and issue an error when it encounters a value such
as 0 HEADING, which is neither, it also needs to distinguish
between GEDCOM and FTW TEXT.
A GEDCOM reader that supports both GEDCOM and FTW TEXT should detect both and provide an informational message that it detected a GEDCOM header or an FTW TEXT header, as the case may be.
| magic value | file format | action |
|---|---|---|
| 0 HEAD␤ | GEDCOM | process as GEDCOM |
| 0 HEADER␤ | FTW TEXT | process as FTW TEXT |
| anything else | unknown | abort |
I recommend detecting FTW TEXT even if it is not supported by the reader, just to provide the user with a really helpful message: not just an error message telling the user that the file is not a GEDCOM file, but also an informational message that the file is an FTW TEXT file instead of a GEDCOM file. The application help file can explain that further.
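The detection table and the recommended messages could be sketched as follows. This is a minimal illustration under assumptions, not a reference implementation: `FileFormat`, `detect_format`, `import_message` and the message wording are invented for this example, the text is assumed to be decoded already, and CR LF, CR and LF are assumed to be the line terminators.

```python
import re
from enum import Enum


class FileFormat(Enum):
    GEDCOM = "GEDCOM"
    FTW_TEXT = "FTW TEXT"
    UNKNOWN = "unknown"


# The magic value must be the whole first line, terminated by a newline.
_MAGIC = re.compile(r"0 (HEAD|HEADER)(?:\r\n|\r|\n)")


def detect_format(text: str) -> FileFormat:
    """Classify the text by its first line only."""
    match = _MAGIC.match(text)
    if match is None:
        return FileFormat.UNKNOWN
    if match.group(1) == "HEAD":
        return FileFormat.GEDCOM
    return FileFormat.FTW_TEXT


def import_message(text: str, supports_ftw_text: bool = False) -> str:
    """Return the message a reader might show before importing or aborting."""
    detected = detect_format(text)
    if detected is FileFormat.GEDCOM:
        return "Detected a GEDCOM header."
    if detected is FileFormat.FTW_TEXT:
        if supports_ftw_text:
            return "Detected an FTW TEXT header."
        return "Error: this is not a GEDCOM file. It is an FTW TEXT file."
    return "Error: this is not a GEDCOM file."
```

A reader without FTW TEXT support still calls `detect_format`, so that it can tell the user what the file is instead of merely what it is not.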
Because FTW GEDCOM is a very problematic GEDCOM dialect, it even makes sense to
recognise FTW GEDCOM and issue some warnings that FTW GEDCOM files are
problematic. Recognising a GEDCOM file as produced by a particular application
is merely a matter of examining the GEDCOM header’s SOUR field.
There is no known need to examine an FTW TEXT header’s SOURCE field;
all FTW TEXT files were produced by FTW. However, it is not a bad idea to
examine that field anyway and issue a warning when it contains a value other
than the expected one.
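Reading the source field from the header could look like the sketch below. It rests on several assumptions: the function name `header_source` is made up, the text is already decoded, the header record ends at the next level 0 line, and the source line sits at level 1 with the tag SOUR (GEDCOM) or SOURCE (FTW TEXT). The sample value FTW in the usage note is likewise an assumption about what Family Tree Maker writes.

```python
import re
from typing import Optional


def header_source(text: str, tag: str = "SOUR") -> Optional[str]:
    """Return the value of the level 1 `tag` line in the header, if any."""
    lines = re.split(r"\r\n|\r|\n", text)
    prefix = f"1 {tag} "
    for line in lines[1:]:         # skip the level 0 header line itself
        if line.startswith("0 "):  # next level 0 record: the header has ended
            break
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```

A reader could then warn about a problematic dialect with something like `if header_source(text) == "FTW": ...`, and check an FTW TEXT file with `header_source(text, tag="SOURCE")`.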
GEDCOM Tags provides an overview of GEDCOM tags.
GEDCOM 5.5EL discusses detection of GEDCOM 5.5EL.
Copyright © Tamura Jones. All Rights reserved.