Modern Software Experience

2011-05-10

quality assurance

GEDCOM Validation is not easy. Validators are picky by nature. What makes GEDCOM validation particularly hard is that there is no up-to-date GEDCOM validator.

GedFan

Genealogy Software Performance Testing explains that I am about to release GedFan. Right now, the pre-release is available to anyone who finds the not-so-secret download page.
Before I officially release it, I need to do some quality assurance.

The GedFan utility generates GEDCOM files. That's all it does. I did not plan to release GedFan as a utility; I intended to release the files that GedFan produces. However, these files get so large that I am releasing the utility to save on bandwidth.

The GedFan utility is about the files it creates. It is important to keep this in mind, and not get carried away worrying about something like GedFan's MS-DOS look and feel. GedFan quality assurance is about making sure the GEDCOM files it produces really are valid GEDCOM files.

straight face test

The absolute minimum of quality assurance is the straight-face test: checking for obvious errors, so you can state, with a straight face, that you've done some testing. For a GEDCOM file, that means loading it into Notepad to verify that it looks like a GEDCOM file, visually scanning through it to make sure it doesn't contain any obvious oddities, and, when it seems okay, loading it into at least one genealogy application.
If it looks like a GEDCOM file, and works like a GEDCOM file, it probably is a GEDCOM file. Well, not really, but this hurried test is much better than not testing at all.
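The visual scan can be partly automated. The following is a minimal sketch, not part of GedFan or of any of the tools discussed here; the function name looks_like_gedcom and the line pattern are assumptions made for illustration. It only checks that the text starts with 0 HEAD, ends with 0 TRLR, that every line has the level-and-tag shape of a GEDCOM line, and that no line jumps more than one level deeper than the previous one.

```python
import re

# Assumed shape of a GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d{1,2}) (?:(@[^@ ]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


def looks_like_gedcom(text):
    """Return a list of obvious problems; an empty list means nothing was found."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]

    # A UTF-8 byte order mark may precede the first line; ignore it.
    lines[0] = lines[0].lstrip("\ufeff")

    problems = []
    if lines[0] != "0 HEAD":
        problems.append("first line is not '0 HEAD'")
    if lines[-1] != "0 TRLR":
        problems.append("last line is not '0 TRLR'")

    previous_level = -1
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.match(line)
        if match is None:
            problems.append(f"line {number} is not a GEDCOM line: {line!r}")
            continue
        level = int(match.group(1))
        # A line may go at most one level deeper than the line before it.
        if level > previous_level + 1:
            problems.append(f"line {number} skips a level")
        previous_level = level
    return problems
```

Like the manual straight-face test, a clean result here says little; it merely catches the oddities that would be embarrassing to miss.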

A straight-face test may be good enough for a pre-release, but it definitely isn't good enough for the official release. The GEDCOM files GedFan produces will be used to test the capabilities and performance of genealogy software. They should be perfect GEDCOM files.

PAF

Part of the straight-face test was importing a GEDCOM file into Personal Ancestral File (PAF) for Windows. When I started checking the validity of the GEDCOM files, PAF already imported the files without any error or warning. That is reassuring, but does not mean the GEDCOM files are flawless.

What I need is a GEDCOM validator.

GedChk

The GEDCOM specification was created by FamilySearch. Back in the days when MS-DOS was popular, FamilySearch not only created PAF for DOS, they also created GedChk, their first and only GEDCOM validator.
FamilySearch did not lavish much attention on GedChk. The latest version they released is GedChk Beta 0.9, dated 1998 Sep 9.
You are supposed to be able to download it from their GEDCOM FAQ page, but ever since FamilySearch changed their site design a few months back, several things have been missing in action. GedChk is one of these things.

The GedChk validator is an MS-DOS application, but apparently not a well-behaved one. When you try to run it from the Command Prompt (either CMD.EXE or COMMAND.COM) on a 64-bit edition of Windows, you'll be shown a messagebox titled Unsupported 16-Bit Application, which tells you that The program or feature "\??\P:\gedcheck\GEDCHK.EXE" cannot start or run due to incompatibity [sic] with 64-bit versions of Windows. Please contact the software vendor to ask if a 64-bit Windows compatible version is available., and when you dismiss the messagebox, the text This version of P:\gedcheck\GEDCHK.EXE is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher. will be shown in the Command Prompt window itself.

GedChk 0.9 in DOSBox 0.74

You can run GedChk inside a DOS Box. The screenshot shows GedChk inside the DOSBox emulator. That is GedChk Beta 0.9 running inside DOSBox 0.74, to validate GEDCOM files created by GedFan 0.1; that isn't the most reassuring combination of version numbers ever. Even when you have a DOS Box already, GedChk as provided by FamilySearch does not work. You need to rename the file GED55.GMR.txt to GED55.GMR first.

GEDCOM 5.5.1

The GED55.GMR file contains a GEDCOM 5.5 grammar. It does not include the GEDCOM 5.5.1 tags, so the validator predictably complains about the WWW tag. I wanted the files to pass GedChk validation, but I also wanted to use GEDCOM 5.5.1, as that is the de facto standard. To make that happen, I partly updated GED55.GMR to GEDCOM 5.5.1. I added the EMAIL, FAX and WWW tags and changed the CONT subtag for the ADDR record from mandatory to optional, as that is what both the GEDCOM 5.5 and GEDCOM 5.5.1 specifications say.
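Before feeding a file to a validator that only knows GEDCOM 5.5, it can help to know which lines it is going to complain about. The sketch below is an illustration under stated assumptions: the function name count_new_tags is made up, and the tag set contains only the three tags mentioned above, not everything GEDCOM 5.5.1 introduced.

```python
from collections import Counter

# Only the three tags mentioned in the text; this is NOT a complete list
# of the tags that GEDCOM 5.5.1 introduced.
TAGS_ADDED_TO_GRAMMAR = frozenset({"EMAIL", "FAX", "WWW"})


def count_new_tags(text, new_tags=TAGS_ADDED_TO_GRAMMAR):
    """Count how often each of the given tags occurs in GEDCOM text."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        # The tag is the second field, unless an @xref@ sits in between.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag in new_tags:
            counts[tag] += 1
    return counts
```

A non-empty result predicts complaints from a GEDCOM 5.5-only validator that are not real errors in a GEDCOM 5.5.1 file.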

terminator

GedChk also complained about a missing terminator. The GEDCOM files already ended with 0 TRLR as they should, but that is not good enough. The GEDCOM grammar demands that each line, and that includes the last one, be terminated with the line terminator (newline). The GEDCOM specification even states explicitly that The required GEDCOM HEADer record appears only on the first disk and the required TRLR (trailer) record appears only on the last disk and must be followed by the terminator. (their bold). I added a newline right after 0 TRLR, and GedChk stopped complaining.
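This check is easy to script. The following is a minimal sketch, assuming a single-byte or UTF-8 encoded file (not UTF-16); the function names are made up for this illustration and it treats any run of carriage returns and line feeds after 0 TRLR as a terminator.

```python
def trailer_is_terminated(data: bytes) -> bool:
    """True if the data ends with '0 TRLR' followed by at least one CR or LF."""
    stripped = data.rstrip(b"\r\n")
    return stripped.endswith(b"0 TRLR") and len(stripped) < len(data)


def ensure_terminator(data: bytes, terminator: bytes = b"\n") -> bytes:
    """Append a terminator when the data ends with a bare '0 TRLR'."""
    if data.endswith(b"0 TRLR"):
        return data + terminator
    return data
```

Reading the file in binary mode matters here; text mode may translate or hide the very line ending being checked.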

submitter name

The GEDCOM validator complained that the SUBM record, which originally contained only a NAME tag, should contain at least one ADDR tag, so I fixed that issue.

After all the preceding corrections, GedChk still wasn't happy about the submitter record. It continued to report one error: The NAME in submitter record (@SUB1@ SUBM) may not be complete.. That error is actually a warning that is miscategorised as an error. GedChk does not like it that the name is GedFan. GedChk is happy when I change the name to Ged Fan, but that isn't its name. I decided to solve this by adding the version number. That gets a space in there, in between the utility name and its version number, and that space is probably all GedChk checks for.

VGED

VGED: FAN18.GED

VGED by Tim Forsythe was originally known as The GEDCOM Validator (TGV). Last year, all his projects were still on his RumbleFische site. Everything has moved to the new AncestorsNow site. Tim has terminated the VGED project. Version 3.02 created last year is the last release. The download link on the VGED page works after you sign in as a member.

VGED is easier to use than GedChk, and runs just fine on Windows 64-bit. VGED is a validator for GEDCOM 5.5. VGED does not support GEDCOM 5.5.1, so, like GedChk, it complains about the WWW tag.

VGED does not have an editable grammar file like GedChk does, so it is not possible to make it recognise the WWW tag.

VGED does have an options dialog box that allows you to turn checks on and off. I turned all the checks on, but VGED found nothing else to complain about.

GedPad

GedPad: FAN19.GED

GedPad is a product of Nigel Button Software, better known for The Complete Genealogy Reporter. GedPad is included with The Complete Genealogy Reporter, but can be downloaded separately.

GedPad isn't a validator. GedPad is a GEDCOM editor with some validation features. The buttons to the right of the edit window allow you to search for non-standard dates, non-standard names, missing files, dead links, unlinked records, duplicate records, and parentless families. It will even fix many of these issues.
GedPad's user-interface is a bit unusual and it does not support ASCII or ANSEL files, but it is a handy tool to have around when some GEDCOM file gives you trouble.

It is no surprise that, after validation with both GedChk and VGED, GedPad did not find any issues with the GedFan-generated files, but it does not hurt to check.

GEDCOM Explorer

GEDCOM Explorer: FAN18.GED

GEDCOM Explorer is a product of GedCom Solutions (Yes, they IniCapped GEDCOM in their company name). Like GedPad, the user interface is a bit unusual, and, as the screenshot shows, it uppercases all names for no particular reason.

GEDCOM Explorer provides a range of GEDCOM sanity checks, available from the Utility menu: Name Problems..., Family Name Duplicates..., Partner Age Problems..., Marriage/Birth Problems..., Family Birth Span..., Child Birth Problems..., Same Marriage Date..., Same Partners..., No Parents..., Empty Families..., Sex Problems..., Duplicate People..., INDI Duplicates... and Source Duplicates....

The only possible problem GEDCOM Explorer finds in the GedFan-generated files is that the home individual has unknown sex, and that is deliberate. All its other checks produced No Data.

Genealogica Grafica

Genealogica Grafica: FAN12.GED

Genealogica Grafica is a web page generator by Tom de Neef. It is the successor to KStableau. Genealogica Grafica can create web pages with graphical family trees. Genealogica Grafica is still relatively unknown, but those who do know it also know that it includes extensive consistency checks. If you honestly think your genealogy doesn't contain any inconsistencies, because your current software does not report any problems anymore, you probably haven't tried Genealogica Grafica yet.

Genealogica Grafica does not only perform genealogical consistency checks, it also reports GEDCOM errors and inconsistencies. The most important check, shown in the dialog box, is that all the links between individuals and GEDCOM families are consistent.
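To illustrate what such a link check involves, here is a minimal sketch. It is not how Genealogica Grafica works internally; the function name find_link_problems is made up, and the sketch assumes well-formed lines. Every FAMS or FAMC link in an individual record should be mirrored by a HUSB, WIFE or CHIL link in the family record it points to, and the other way around.

```python
def find_link_problems(text):
    """Report individual/family links that are not mirrored on the other side."""
    from_individuals = set()  # (individual, family, role) seen in INDI records
    from_families = set()     # (individual, family, role) seen in FAM records
    record_id, record_type = None, None

    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # Level-0 lines start a record: "0 @I1@ INDI", or "0 HEAD" without id.
            if parts[1].startswith("@") and len(parts) == 3:
                record_id, record_type = parts[1], parts[2].split()[0]
            else:
                record_id, record_type = None, None
            continue
        if parts[0] != "1" or len(parts) < 3:
            continue
        tag, value = parts[1], parts[2].strip()
        if record_type == "INDI" and tag in ("FAMS", "FAMC"):
            role = "spouse" if tag == "FAMS" else "child"
            from_individuals.add((record_id, value, role))
        elif record_type == "FAM" and tag in ("HUSB", "WIFE", "CHIL"):
            role = "child" if tag == "CHIL" else "spouse"
            from_families.add((value, record_id, role))

    problems = []
    for indi, fam, role in sorted(from_individuals - from_families):
        problems.append(f"{indi} links to {fam} as {role}, but {fam} does not link back")
    for indi, fam, role in sorted(from_families - from_individuals):
        problems.append(f"{fam} links to {indi} as {role}, but {indi} does not link back")
    return problems
```

For a generator like GedFan, which writes both sides of every link itself, an empty result is what you would expect; the check earns its keep on files that have been edited by hand or passed through several applications.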

Genealogica Grafica did not find any problems with the GedFan-generated files.

Behold

Behold: FAN19.GED

Behold is a product of Louis Kessler. Behold 2.0 should be a genealogy editor, but Louis is currently working towards the release of Behold 1.0, a genealogy viewer.

Behold has a very forgiving GEDCOM import that accepts many invalid GEDCOM files, and produces a detailed import log with errors and warnings. It shows a summary of that log in its main window. For the GedFan-generated files, it reports that there are No problems to report.

GEDCOM validation

GEDCOM Validation is not easy. Validators are picky by nature. What makes GEDCOM validation particularly hard is that there is no up-to-date GEDCOM validator.

GedChk

FamilySearch GedChk is a 16-bit MS-DOS application. That implies some limitations already, but it does not even run on today's 64-bit Windows. GedChk has not been maintained since 1998 and although FamilySearch PAF uses GEDCOM 5.5.1, GedChk does not support it. You can update GedChk by modifying its GEDCOM grammar file, but FamilySearch should have done so in 1999, when it released GEDCOM 5.5.1.

VGED

Tim Forsythe's VGED is an easy-to-use Windows application, but it is limited to GEDCOM 5.5 as well. VGED has various options that allow you to control what it checks, but you cannot update its GEDCOM grammar. Tim recently announced that VGED will no longer be updated, and when asked about it, informed me that he has no plans for a final update to support GEDCOM 5.5.1.

more tools

Several genealogy applications are helpful in verifying GEDCOM files. GedPad is a GEDCOM editor that includes consistency checks. GEDCOM Explorer is a GEDCOM utility that provides many sanity checks. Genealogica Grafica is a web page generator with extensive genealogy consistency checks that reports GEDCOM errors as well. Behold is a genealogy application with a forgiving GEDCOM importer that provides detailed errors and warnings.

current situation

The current situation is definitely less than perfect. There is not one up-to-date GEDCOM validator, and the only validator that is kind-of official requires an emulator to run on modern systems. That still is no excuse to not perform any validation at all. The existing validators do support GEDCOM 5.5 and will catch many violations of the GEDCOM specification. There are other applications that aren't validators, but will report a bunch of GEDCOM errors. Thus, there are plenty of tools to check GEDCOM files before releasing them.

GedFan

I had to upgrade the GedChk grammar file, VGED does not understand the WWW tag, and GEDCOM Explorer warns about an individual whose gender is specified as unknown, but none of the tools reports any real errors for the GedFan-generated files. So the files seem okay, this bit of quality assurance is done, and GedFan can be released now.

updates

2011-07-32 GED-inline

Nigel Munro Park has created GED-inline, a new GEDCOM validator. GED-inline is the first GEDCOM validator to support GEDCOM 5.5.1.

2011-08-05 VGed 3.03

Tim Forsythe has updated VGed with support for GEDCOM 5.5.1. VGed 3.03 supports GEDCOM 5.5.1, but does not auto-detect the GEDCOM version.

2012-01-15 VGedX 1.00

Tim Forsythe has introduced VGedX, a command-line GEDCOM validator.

2012-05-03 My Family Tree 2.0

Chronoplex My Family Tree 2.0 includes a GEDCOM validator.

2012-08-30 Chronoplex GEDCOM Validator

Chronoplex just released My Family Tree 2.0.2.0, and Chronoplex GEDCOM Validator 1.0.0.0 is a separate utility now.

links