Modern Software Experience

2010-10-10

GEDCOM Import

quality

StamboomNederland GEDCOM Import and StamboomNederland GEDCOM Export provided an overview of how StamboomNederland supports GEDCOM. StamboomNederland GEDCOM Export already looked at the overall quality of the GEDCOM export. This text focuses on the quality of the GEDCOM Import, and adds a few observations on the quality of the GEDCOM export.

character encodings

The GEDCOM specification allows four different character encodings: ASCII, ANSEL, UTF-8 and UNICODE (a misnomer; it should have been called UTF-16, as UTF-8 is Unicode too). It also specifically forbids the use of Windows ANSI (code page 1252).

In practice, a GEDCOM reader does not need to support UTF-16; several desktop genealogy applications do support it, but it is rarely used. Then again, support for UTF-16 comes naturally for Unicode applications.
A GEDCOM reader should support ASCII, ANSEL and UTF-8. Although code page-based programs should write ANSEL GEDCOM files, many don't bother to do so, so a GEDCOM reader should additionally support MS-DOS, MacRoman and Windows ANSI.

no import log

A major shortcoming of StamboomNederland is that it does not produce an import log. It produces an import process report, but that is almost useless; the report does not state which character encoding the import process recognised. However, it does contain the sentence ATTENTION!! GEDCOM file does not have a header tag (HEAD) when it does not recognise the header.

UNICODE GEDCOM

Although StamboomNederland is a Unicode application, it does not support UNICODE GEDCOM. I tried little-endian UTF-16 and big-endian UTF-16, both with and without a Byte Order Mark, but for all of these the import process report erroneously complains that the file lacks a header. Visual inspection within StamboomNederland showed the database to be empty.
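How a reader could recognise the four UTF-16 variants tested here is sketched below in Python. It is not how StamboomNederland works; the function names are made up for illustration. The sketch relies on two facts: a Byte Order Mark identifies the encoding when present, and every GEDCOM file starts with the characters "0 HEAD", so the position of the zero bytes reveals the byte order when there is no Byte Order Mark.

```python
import codecs


def sniff_gedcom_encoding(data: bytes) -> str:
    """Guess the Unicode flavour of a GEDCOM file from its first bytes.

    Hypothetical helper; returns a Python codec name, or "unknown-8-bit"
    when the file is not recognisably UTF-8 with a BOM or UTF-16.
    """
    # A Byte Order Mark, when present, settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # this codec reads the BOM and picks the byte order
    # No BOM: the file must start with "0 ", so look for that in both byte orders.
    if data.startswith("0 ".encode("utf-16-le")):
        return "utf-16-le"
    if data.startswith("0 ".encode("utf-16-be")):
        return "utf-16-be"
    return "unknown-8-bit"


def starts_with_head(data: bytes) -> bool:
    """Check for the header tag only after choosing a codec."""
    encoding = sniff_gedcom_encoding(data)
    if encoding == "unknown-8-bit":
        encoding = "latin-1"  # any 8-bit codec will do to read "0 HEAD"
    text = data[:64].decode(encoding, errors="replace")
    return text.startswith("0 HEAD")
```

A reader that looks for the bytes of "0 HEAD" before deciding on an encoding will not find them in a UTF-16 file, which would explain a "no header" message for a file whose header is fine.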

ASCII GEDCOM

Replacement Character

StamboomNederland gets something right that many applications get wrong; an ASCII GEDCOM file is treated as ASCII, as it should be.
Many applications pretend to support ASCII-encoded GEDCOM files by treating them as Windows ANSI-encoded files; the end result is that they accept byte codes that are illegal in ASCII as Windows ANSI characters.
StamboomNederland does what it should do; it replaces all illegal codes with Unicode character U+FFFD, the Replacement Character, usually displayed as a white question mark inside a black diamond.
I only discovered this because I deliberately tested it. StamboomNederland does not produce an import log and does not report to you that your GEDCOM file contains illegal characters.
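The behaviour described above, combined with the import log that is missing, could look like the following Python sketch. The function name and the wording of the log messages are made up for illustration; only the strict ASCII decoding with U+FFFD replacement reflects what StamboomNederland was observed to do.

```python
def decode_ascii_with_log(data: bytes):
    """Decode strictly as ASCII and report every line that needed repair.

    Hypothetical helper: returns the decoded text plus a list of log lines.
    """
    # Each byte above 0x7F is illegal in ASCII and becomes U+FFFD.
    text = data.decode("ascii", errors="replace")
    log = []
    for line_no, raw in enumerate(data.splitlines(), start=1):
        illegal = [byte for byte in raw if byte > 0x7F]
        if illegal:
            log.append(
                f"line {line_no}: {len(illegal)} byte(s) outside ASCII "
                "replaced with U+FFFD"
            )
    return text, log
```

With a log like this, the user learns that the file claimed to be ASCII but was not, and on which lines to look.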

MS-DOS GEDCOM

An attempt to import an MS-DOS (code page 850) GEDCOM gave the same results as the ASCII import, so it seems that StamboomNederland does not recognise MS-DOS GEDCOM files, and treats any encoding it does not recognise as ASCII. Once again, there is no message stating whether the encoding was recognised or not.

ANSEL GEDCOM

Developers new to genealogy do not always understand how important ANSEL support is. During the presentation just a few weeks before the official introduction, the word was that StamboomNederland did not support ANSEL. I tested today, and double-checked that my test file was indeed an ANSEL file; StamboomNederland does support ANSEL GEDCOM. I did not examine the quality of the ANSEL support; I merely note that it is there.
The ANSEL support is limited to GEDCOM import; StamboomNederland always exports using UTF-8.

ANSI and UTF-8

The ANSI and UTF-8 test files imported just fine. StamboomNederland imports UTF-8 files both with and without a Byte Order Mark.

GEDCOM dialects

According to the StamboomNederland article the Central Bureau for Genealogy published in their own quarterly, StamboomNederland supports the dialects of Aldfaer, GensDataPro, Haza-Data and PRO-GEN. These are the most popular Dutch genealogy applications. During the presentation we were told that only Aldfaer support was working so far.
I have not tried to establish how well each dialect is supported, but simply decided to investigate some problems reported on bulletin boards.

Aldfaer doubling

One problem reported on the StamboomNederland sub-forum of StamboomForum was that after import of an Aldfaer GEDCOM into StamboomNederland, names like Jantje van Leiden turned into Jantje van Leiden van Leiden. I tried this but could not replicate this problem, nor did anyone confirm the complaint of the original poster. Perhaps it was a temporary problem, perhaps it only happens with GEDCOM files produced by an older version of Aldfaer, but right now, this problem does not seem to exist.

Someone else, who did not identify the genealogy application used, reported that names containing an umlaut became double names. I have not experienced that either. It seems likely that both users experienced the same resolved defect in different ways.

GensDataPro header


0 HEAD
1 GEDC
2 VERS 5.5
1 SOUR GensDataPro
2 VERS 2.8.
1 DATE 18-9-2010
1 CHAR ANSI

A problem reported on 2009 Oct 19 in the Google Group for StamboomNederland is that an attempt to import a GensDataPro 2.8 ANSI GEDCOM 5.5 file failed. The StamboomNederland import process report for the file claimed that the GEDCOM file lacks a header. The poster included the GensDataPro header to show that the header is just fine; ironic, because it is not.
The header contains the line 1 DATE 18-9-2010; that date is in an illegal format, it should read 1 DATE 18 Sep 2010. The GEDCOM header produced by GensDataPro is wrong. Technically, it simply isn't a GEDCOM header; it specifies ANSI and contains an illegal date format.
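A tolerant reader does not have to choose between rejecting the header and silently ignoring the date. The Python sketch below accepts the legal day, month abbreviation, year form, repairs the numeric form GensDataPro writes, and returns a warning for the import log. The function name is made up, and reading 18-9-2010 as day-month-year is an assumption based on Dutch usage.

```python
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Legal header date: day, three-letter month, year (case is tolerated here).
EXACT_DATE = re.compile(
    r"^(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{3,4})$", re.IGNORECASE
)
# Illegal numeric form as written by GensDataPro, assumed day-month-year.
NUMERIC_DATE = re.compile(r"^(\d{1,2})-(\d{1,2})-(\d{4})$")


def normalise_header_date(value: str):
    """Return (date, warning); warning is None when the date was legal."""
    value = value.strip()
    if EXACT_DATE.match(value):
        return value, None
    match = NUMERIC_DATE.match(value)
    if match and 1 <= int(match.group(2)) <= 12:
        day, month, year = match.groups()
        fixed = f"{int(day)} {MONTHS[int(month) - 1]} {year}"
        return fixed, f"non-standard header date {value!r} read as {fixed!r}"
    # Unreadable date: keep importing, but say so.
    return None, f"unreadable header date {value!r} ignored"
```

Either way the header tag itself is recognised, so the misleading "no header" message never needs to appear.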

It seems that StamboomNederland was not ready to handle anything but the correct date format, so trying to read the header failed, and it then produced the only error message it has for what it believes to be an invalid header: ATTENTION!! GEDCOM file does not have a header tag (HEAD), the same error message it produces for UNICODE GEDCOM files.

That error message is wrong; there is a perfectly valid header tag, but the header ain't right. I tried to reproduce the problem by creating an equally invalid header, but StamboomNederland processed it just fine. I guess they solved the problem by ignoring the date format used in headers, but this problem was never a defect in StamboomNederland; it is a long-existing defect in GensDataPro.

GensDataPro marriages

Another problem a user reported is that StamboomNederland does not read church marriages, and the explanation for that was given as well. For church marriage notices, GensDataPro does not use the MARR tag, but the ORDI tag, which StamboomNederland does not support. That explanation is only partially right. GensDataPro does use the MARR (marriage) and MARB tags, but it also uses the ORDI tag.
This problem will be resolved as StamboomNederland's support for Dutch GEDCOM dialects improves.

sources

Another message about PRO-GEN import gives a large list of tags and subtags that StamboomNederland does not support. A quick summary of that message is that support for sources leaves much to be desired - and that is a complaint several users have voiced.

privacy

A new StamboomNederland project defaults to being private, but the individuals in a GEDCOM file default to being public. That's true of all the individuals in the GEDCOM file, including the living ones. When you make your project public, everyone can view all the data of all living persons. This unlimited disclosure is a violation of Dutch privacy rules.

StamboomNederland lacks reasonable privacy defaults. Simply making all individuals public isn't reasonable; it is not reasonable to expect the user to edit each individual in their project. StamboomNederland should make an effort to determine which individuals are alive.
Part of that reasonable effort is support for the Dutch GEDCOM dialects. Several Dutch genealogy applications save information about whether an individual is living or not; StamboomNederland should take advantage of that.
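When a GEDCOM file carries no dialect-specific living flag, a reasonable default can still be derived from the standard tags. The Python sketch below is one such heuristic, not a description of any existing application: the function name, the 100-year cut-off, and the choice to treat any death, burial or cremation tag as proof of death are all assumptions.

```python
import re


def probably_living(record_lines, current_year, max_age=100):
    """Guess whether the individual in one INDI record is still alive.

    Hypothetical heuristic: deceased if a DEAT, BURI or CREM tag is present,
    otherwise living unless born (or baptised) max_age or more years ago.
    """
    birth_year = None
    context = None
    for line in record_lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "1":
            context = tag
            if tag in ("DEAT", "BURI", "CREM"):
                return False
        elif level == "2" and tag == "DATE" and context in ("BIRT", "BAPM", "CHR"):
            match = re.search(r"\b(\d{4})\b", value)
            if match and birth_year is None:
                birth_year = int(match.group(1))
    if birth_year is None:
        return True  # no evidence either way: err on the side of privacy
    return current_year - birth_year < max_age
```

The important design choice is the last one: an individual without any dates stays private until the user says otherwise, the opposite of the current default.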

unsupported tags

StamboomNederland treats unsupported tags inconsistently; the data does not show up in your tree, but does show up in the odd sources that StamboomNederland makes up.

sources

GEDCOM sources

StamboomNederland's support for sources leaves a lot to be desired. There are two huge problems with the current support. One is that it simply does not support citations and sources as it should; I imported several files containing citations and sources, but when I looked for these in StamboomNederland, none of it was there.

StamboomNederland: Jantje van Leiden Citation

What is on each citation page instead of a citation is completely useless to the user: a raw XML representation of the entire user record. An ability to link to the source is absent.
There is no citation or source support for uploaded GEDCOM files at all; it just shows the raw XML for the individual. That isn't of any use to anyone.

online sourcing

I tried creating a source and citation in StamboomNederland itself. I tried hard. I guess it worked in a demo once.
You cannot just add some source to a fact as you can in practically every other genealogy application.
It seems you need to leave the individual edit page to go to the sources page and create a source, and then - unless you remember to use the browser back button - have to go back to the persons page and reselect the same individual to try and create a citation of that source.

StamboomNederland: Search for Citation

StamboomNederland does not allow you to select a source from a list of sources, but prompts you to perform a search. So, StamboomNederland apparently expects you to find the desired source by searching for its title or some text contained within that source, using the confusingly titled Search for citation page (you are definitely not searching for an existing citation, but trying to create one, and apparently have to search for a source). When you try to perform a search from that page, you have to wait half a minute for the all too familiar Interne fout (Internal error) page. I tried this several times, even started new sessions, but whatever I did, the search result always was Interne fout.

StamboomNederland: search result: interne fout

It is hard to make any sense of what StamboomNederland is trying to do with citations and sources, and to discover what works at all. What I am sure of is that StamboomNederland discourages the use of sources with its considerably less than optimal user experience.

colossal conceptual confusion

The Central Bureau for Genealogy did not ask the StamboomNederland developers to support EE-style source templates. All the CBG asked for was straightforward support for citations and sources. Yet, StamboomNederland fails to import citations and sources from GEDCOM files, and its online user-interface for adding sources and citations is dysfunctional. Yet that still isn't the biggest problem.

The biggest problem became apparent during the live demonstration a few weeks before the introduction: the colossal conceptual confusion that has made it into the product. StamboomNederland treats a GEDCOM file as a source for the data inside that file. It does not treat a GEDCOM file as a database containing data complete with citations and sources; it treats each GEDCOM file as the source for all the data inside that GEDCOM. That does not make any sense. This is a serious mistake, a colossal conceptual confusion between sources inside a database and a file used to transfer the database from one system to another.

experiment

StamboomNederland: Jantje van Leiden


0 HEAD
1 SOUR SNL
2 VERS 2200
1 SUBM @SUB1@
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 DEST GED55
1 DATE 13 FEB 2000
1 CHAR UTF-8
0 @SUB1@ SUBM
1 NAME SNL user
1 ADDR SNL user's address
0 @I1@ INDI
1 REFN 1
1 NAME Jantje /van Leiden/
1 SEX
1 BIRT
0 TRLR

I've come up with a simple experiment that demonstrates the problem. This experiment does not involve any other application. First, I created a project and added just one person; just a name, no details, no pictures, no citations or sources. Only an empty project is simpler than that.

Then, I exported that project to a GEDCOM file, and imported the resulting GEDCOM file into a fresh project. The result should be the same as the original project; just a name, nothing more.

The user interface makes this simple plan an exercise in jumping back and forth between pages; the Dashboard page to find the project and go to the Project details page to click the Export GEDCOM button that starts the export process, the Processes tab of the My Profile page to download the GEDCOM, the Dashboard page to create a new project, and the Sources page to import the GEDCOM. The user experience is plain awful.

GEDCOM quality

The project is about as simple as possible, yet the GEDCOM file that StamboomNederland creates for it still manages to disappoint.
Below the NAME tag for the complete name should be a GIVN tag for the given name, a SURN tag for the surname, and a SPFX tag for the surname prefix. StamboomNederland makes you enter the surname prefix separately, yet does not export it separately.
Also notice the SEX tag without a value; that is wrong. I forgot to indicate a gender, so StamboomNederland should list the value U for unknown. The GEDCOM syntax specifies that the SEX tag may be omitted, but that any SEX tag that is included must have a value.
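What the export should have produced for this one-person project can be sketched in a few lines of Python. The function name and parameters are made up for illustration; the tags and the U value are the ones discussed above.

```python
def name_and_sex_lines(given, prefix, surname, sex=""):
    """Build the GEDCOM lines for a name entered in separate parts.

    Hypothetical exporter fragment: writes the complete name, then the
    name pieces, and always gives the SEX tag a value.
    """
    full_surname = f"{prefix} {surname}".strip()
    lines = [f"1 NAME {given} /{full_surname}/"]
    if given:
        lines.append(f"2 GIVN {given}")
    if prefix:
        lines.append(f"2 SPFX {prefix}")
    if surname:
        lines.append(f"2 SURN {surname}")
    # An empty SEX tag is illegal; fall back to U for unknown.
    lines.append(f"1 SEX {sex if sex in ('M', 'F') else 'U'}")
    return lines


for line in name_and_sex_lines("Jantje", "van", "Leiden"):
    print(line)
```

For the test individual this prints the NAME line followed by GIVN Jantje, SPFX van, SURN Leiden and SEX U, so an importing application no longer has to guess where the prefix ends and the surname begins.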

import

The exported GEDCOM file contains just the name, no other facts, and no citations or sources.
However, when you import this GEDCOM file into another project, StamboomNederland treats the entire GEDCOM file as the single source for all the data in the file, and the citation page for the lone individual suddenly shows the raw XML for that individual as a citation.

StamboomNederland: Jantje van Leiden Citation

StamboomNederland makes up that source and those citations, yet when you export the project to a GEDCOM file again, StamboomNederland does not export the source or citation! Although the new project is about twice as big, the exported GEDCOM looks exactly the same.

I expected this experiment to show how round-tripping the data to and from a GEDCOM file creates increasingly ridiculous citations. Instead I discovered something else; although StamboomNederland has been live for weeks, it still does not export citations or sources at all.

three sources

I decided to try another experiment; I created an empty project, added three sources and exported the project to a GEDCOM file. Creation of sources seemed to fail. I saw the Interne fout page several times, but after some struggling, the Sources page showed Source One, Source Two and Source Three. The exported GEDCOM file was essentially empty; just a GEDCOM header, immediately followed by the GEDCOM trailer.

This simple experiment established that the StamboomNederland GEDCOM export doesn't support sources or citations at all, but when I decided to export the raw XML, I found that the XML file does contain the sources.

defective

I went back to the original experiment and had a look at its XML file; the sources and citations that StamboomNederland makes up do not export to GEDCOM files, but do export to XML files. The StamboomNederland GEDCOM export is seriously defective.

what's going on?

After some more experiments I now believe that I have a good idea of what works and what doesn't.
The import from and export to raw XML works just fine. That makes sense, as that functionality was not created by the StamboomNederland programmers, but is standard functionality provided by the database system itself. The only problem with StamboomNederland's raw XML format is that no other system supports it.

The problems with the GEDCOM import and export are problems with the GEDCOM to XML and the XML to GEDCOM converters. The GEDCOM to XML converter used during GEDCOM import ignores all the citations and sources in the GEDCOM file and then goes on to create an XML file that treats the GEDCOM file as the only source, cited for every fact. The XML to GEDCOM converter used for GEDCOM export does not support citations or sources yet, and that is why it produces a GEDCOM file that contains no citations or sources at all.
In plain English: StamboomNederland's GEDCOM support is unfinished and seriously messed up.

conclusion

The StamboomNederland GEDCOM import process supports all the character encodings it should support. Several problems reported on public forums seem to have been fixed since they were first reported. The export quality of names is poor. StamboomNederland allows sharing files publicly, but lacks reasonable support for privacy of living persons. Support for the Dutch GEDCOM dialects that StamboomNederland is supposed to support is still poor, but likely to improve over time.

StamboomNederland's ostensible support for citations and sources is the worst I have ever seen. Online creation of citations seems impossible and is definitely impractical. The GEDCOM export does not support citations or sources at all. The GEDCOM import does not support citations and sources contained within a GEDCOM either, but instead treats the entire GEDCOM as the single source for every fact.

In short, real citation and sources support is completely absent, and the GEDCOM import displays colossal conceptual confusion that results in lots of useless raw XML in your tree. All in all, the ostensible support for citation and sources is so awful that StamboomNederland would be a better application without it.
