Modern Software Experience

2009-03-25

GEDCOM Tests

PC switch

Some of my readers have been worrying whether I still do GEDCOM import and export tests. Yes, I do. I did switch from a four-year-old PC running Windows XP to a new one running Windows Vista, but the old PC is still around, so I can keep comparing with all the performance data I already have, while compiling new performance data for the new machine.

RootsMagic 4

I should have done tests for Family Tree Builder 3 already, but I don’t need to take a poll to know that most of my readers would rather hear about RootsMagic version 4, released today.

This then, is about RootsMagic 4. It is not a full review, but only about its GEDCOM support.

RootsMagic GEDCOM

I have been playing with RootsMagic 4 since the early Previews and throughout the Public Beta. I’ve sent in quite a few defect reports, and these were quickly handled.

During all this, I hardly looked at the GEDCOM support. I usually imported a large PAF database to experiment with. Now that I’ve done some tests, I regret not taking time to study the GEDCOM support a bit sooner, as I was surprised by some of the things I found.

the different PC difference

Let’s start with the difference a different PC makes. There are many differences between the two PCs. They do not even run the same version of Windows. I’ve discussed this in some detail in my review of the new PC. The quick summary is that although the old one running Windows XP has a higher clock speed, the new one running Windows Vista is the faster one.

import speed

I redid the RootsMagic 3 import tests on the new PC. The Windows XP tests were done with RootsMagic 3.2.5, the new Windows Vista tests were done using RootsMagic 3.2.6. That is not exactly the same version, but it is not really different either. The real difference is made by the machine and the operating system.

1 MB GEDCOM

On the old computer running Windows XP, RootsMagic 3 needed 16 seconds to import the 1 MB GEDCOM. On the new computer running Windows Vista, it needs only 5 seconds; just switching to the new PC makes the import about three times as fast.

Of course, for such small time intervals, time measured to one second of precision is not very accurate, but then it does not have to be; when the import is that fast, it hardly matters how fast it is exactly.

100k INDI GEDCOM

The 100k INDI GEDCOM test is a more accurate indicator of the performance difference. On the old computer running Windows XP, RootsMagic 3 needed 5m21s to import the 100k INDI GEDCOM. On the new computer running Windows Vista, it needs only 1m58s; just switching to the new PC makes the import close to three times as fast.
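The ratios quoted for both files are plain divisions of the old time by the new time. A minimal Python sketch of that arithmetic, using only the times given in this article; the function names are mine and purely illustrative:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration given as minutes and seconds to whole seconds."""
    return minutes * 60 + seconds


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is compared to the old one."""
    return old_seconds / new_seconds


# RootsMagic 3 import times: old PC (Windows XP) versus new PC (Windows Vista).
small_old, small_new = 16, 5
large_old, large_new = to_seconds(5, 21), to_seconds(1, 58)

print(f"1 MB GEDCOM:      {speedup(small_old, small_new):.2f}x")
print(f"100k INDI GEDCOM: {speedup(large_old, large_new):.2f}x")
```

The first ratio comes out a little above three and the second a little below it, which is why both are described as roughly three times as fast.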

RootsMagic 4

GEDCOM Import

menu item

RootsMagic 4 does not have an Import GEDCOM menu item; it has an Import… menu item. When you choose that item, RootsMagic presents the Import File from Another Program dialog box.

direct import

RootsMagic supports direct import from all previous versions of RootsMagic, Family Origins version 4 and later, Personal Ancestral File (PAF) and Family Tree Maker Classic, but not New Family Tree Maker.

RootsMagic 4 import dialog box

import

RootsMagic allows import of a GEDCOM file into an empty project.

The direct import and GEDCOM import options are combined in a single import dialog. When you choose to import a GEDCOM file and I know where the file is, RootsMagic presents a normal File Open dialog box.

Once you’ve picked the GEDCOM file to import, RootsMagic does not immediately start importing it, but prompts you for information on the GEDCOM source and whether to include that. The default is NO - Do not add additional source to imported data, so for the import test I just chose OK.

That is the only question RootsMagic asks. Once you answer it, it starts the import.

progress

During import, RootsMagic displays a progress dialog. It is conceptually the same as the one shown by RootsMagic 3, but shows more detail. RootsMagic 3 shows people, families, sources and repositories imported so far. RootsMagic 4 additionally shows the number of events and citations.

RootsMagic 4 import progress dialog box

RootsMagic still does not display an overall progress bar, time elapsed or time remaining, but that is hardly a problem, as it is pretty fast.

passes

RootsMagic processes the GEDCOM in a single pass. The picture above merely shows that in this particular GEDCOM file all People, Events and Citations come before the Families, Sources and Repositories.

The Final Progress listed at the bottom of the dialog box is the finalisation of the import, which took perhaps a second.

Once the import is done, the progress dialog box disappears. RootsMagic does not show any import summary or statistics.

import log

file name

The RootsMagic 4 import log file has the same base name as the file being imported, but has file extension *.LST. The import log is written to the same directory as the database the file is imported into.

show

Back when I looked at RootsMagic 3’s GEDCOM support, I remarked upon the fact that it makes an import log, but does not offer to show it to the user upon completion. This has not changed.

unknown error

I additionally remarked that the quality of RootsMagic 3’s import log is low, and I am sorry to report that it has not improved either. RootsMagic still knows just one type of error; anything and everything it cannot handle for whatever reason is Unknown info (line number). RootsMagic tells you the line number, but never tells you what the problem is.

interpretation

It is easy to understand what the problem is when a line contains a PAF-specific GEDCOM extension such as _MARNM, but the line itself does not always make interpretation of the uninformative Unknown info message that easy.

For example, the message Unknown info (line 1455855) for line 1455855 containing 4 DATE 3 Nov 1911 tells me nothing; that looks like a perfectly valid date in a perfectly valid tag, so how am I supposed to figure out what the (supposed) error is?
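Since the log only gives line numbers, the practical workaround is to look up each reported line in the GEDCOM file yourself. Here is a rough Python sketch of that lookup. It assumes that every log message looks like the one quoted above, and that the line numbers are 1-based; I have not verified either for all cases, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed message format, based on the single message quoted in this article.
UNKNOWN_INFO = re.compile(r"Unknown info \(line (\d+)\)")


def rejected_line_numbers(log_text: str) -> list:
    """Collect the GEDCOM line numbers mentioned in the import log."""
    return [int(number) for number in UNKNOWN_INFO.findall(log_text)]


def rejected_lines(gedcom_lines: list, numbers: list) -> dict:
    """Map each reported line number (assumed 1-based) to the GEDCOM line."""
    return {
        number: gedcom_lines[number - 1]
        for number in numbers
        if 1 <= number <= len(gedcom_lines)
    }


def report(log_path: str, gedcom_path: str) -> None:
    """Print every rejected GEDCOM line next to its line number."""
    log_text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    # Split the raw bytes so that only CR and LF count as line breaks, then
    # decode as latin-1, which never fails and keeps ASCII tags readable.
    raw_lines = Path(gedcom_path).read_bytes().splitlines()
    gedcom_lines = [raw.decode("latin-1") for raw in raw_lines]
    numbers = rejected_line_numbers(log_text)
    for number, line in rejected_lines(gedcom_lines, numbers).items():
        print(f"{number}: {line}")
```

Calling `report` with the paths of the *.LST file and the GEDCOM file lists the offending lines, but it still cannot tell you why RootsMagic rejected them.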

wrong

Perhaps RootsMagic is right to complain about that line. That other programs do not complain about it still does not mean the GEDCOM file is right. However, RootsMagic is most definitely wrong; RootsMagic is wrong to complain about an error without giving any hint just what that error might be.

issues

The big issue here is not whether RootsMagic 4 is right or wrong to reject this particular line. The issue is that the import log does not give the slightest hint why it rejects this line. A yet bigger issue is that the import process finishes without informing the user that there were import errors.

That was my major complaint about RootsMagic 3’s GEDCOM import, and it remains my major complaint about RootsMagic 4’s GEDCOM import.

import speed

The RootsMagic 4 GEDCOM import speed is nothing short of impressive.

On the old PC, RootsMagic 3 imports the 1 MB GEDCOM in 16s and the 100k INDI GEDCOM in 321s. RootsMagic 4 imports the same files in just 6s and 226s.

Its average import speed for the 100k INDI GEDCOM is about 440 INDI per second or 170.000 bytes per second.

The import time of just 226 seconds makes RootsMagic 4 nearly as fast as PAF, which needed 224 seconds. That is a very nice performance, as RootsMagic has a more complex database than PAF, in support of its many features.

On the new PC, RootsMagic 3 imports the same files in 5s and 118s, while RootsMagic 4 performs the imports in just 4s and 80s. That is an import speed of more than 1250 INDI per second, and close to half a MB per second.
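The per-second figures follow from dividing the totals by the import time. The sketch below shows that calculation in Python. The article never states the exact number of individuals or the file size, so the two totals are my own back-calculation from the tables at the end (time multiplied by rate) and should be read as approximations.

```python
# Approximate totals for the 100k INDI GEDCOM. Assumption: back-calculated
# from the tables at the end of the article, not stated in the text itself.
INDI_COUNT = 100_067
FILE_BYTES = 38_799_392


def throughput(total: int, seconds: float) -> float:
    """Average number of units processed per second."""
    return total / seconds


# RootsMagic 4 import times for the 100k INDI GEDCOM on both machines.
for label, seconds in [("old PC", 226), ("new PC", 80)]:
    indi_rate = throughput(INDI_COUNT, seconds)
    byte_rate = throughput(FILE_BYTES, seconds)
    print(f"{label}: {indi_rate:.0f} INDI/s, {byte_rate / 1000:.0f} kB/s")
```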

GEDCOM export

GEDCOM only

RootsMagic supports direct import from various competing programs, but not direct export. Export is always to GEDCOM. The Export… menu item brings up the GEDCOM Export dialog box.

options

The GEDCOM Export dialog box has quite a few options. All I wanted to know right now is how fast the export is and what the result looks like, so I simply opted to export all data for everyone.

speed

Export of the 100k INDI GEDCOM takes longer than import of the same database.
On the Windows XP machine, it takes 8m29s (509s); on the Vista machine, it takes 4m43s (283s). That is 509 ÷ 226 = 2,25 times the time it takes to import on the old computer, and 283 ÷ 80 = 3,54 times the time it takes to import on the new one.
So GEDCOM export takes roughly three times as long as import - and is still slower than GEDCOM import into RootsMagic version 3.

character encoding

When you export a GEDCOM file from RootsMagic 3, you can not choose the encoding, because RootsMagic 3 supports nothing but ANSI. When you export a GEDCOM file from RootsMagic 4, you do not need to choose an encoding; RootsMagic always uses UTF-8, as it should. There is no option to choose a lesser encoding.

Byte Order Mark

A UTF-8 file should start with a Byte Order Mark (BOM). A Byte Order Mark tells editors such as Notepad which Unicode encoding has been used. If the Byte Order Mark is absent, the editor must use heuristics to guess the encoding, and will sometimes guess wrong (Windows has a function to do the guessing). Including the BOM makes sure an editor does not have to guess, but will know, and will therefore interpret the file correctly.
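Checking for a Byte Order Mark is a matter of comparing the first few bytes of the file. A small Python sketch using the BOM constants from the standard `codecs` module; the function names are illustrative, and UTF-32 is deliberately left out to keep it short:

```python
import codecs
from typing import Optional

# Byte Order Marks to look for, paired with a readable name.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding announced by a leading BOM, or None without one."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read just the first bytes of a file and check them for a BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

When `sniff_bom` returns None, a reader has nothing but heuristics to go on, which is exactly the situation the BOM is meant to prevent.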

The GEDCOM files that RootsMagic creates do include the BOM as they should. There seems to be no option to create UTF-8 files without a BOM, and that is a good thing.

other encodings

RootsMagic does not export BOM-less UTF-8 files and it does not support anything but UTF-8 export. All that is as it should be.

There are genealogical applications on the market that still do not support UTF-8. The inability of these programs to read those GEDCOM files is a problem for those vendors, not for the RootsMagic user.

Once you are using a fully Unicode-enabled application, you just do not want to switch back to the limitations of a code-page based application.

ANSEL

I did a few quick tests to determine which character sets and encodings RootsMagic 4 supports on import. I found that it supports ANSI, UTF-8 and UTF-16 (what the GEDCOM standard confusingly calls UNICODE, as if UTF-8 isn’t Unicode), but does not support ANSEL.

That RootsMagic 4 does not support ANSEL is a major limitation already. What’s even worse than the lack of support is that RootsMagic 4.0 does not warn you that it does not support ANSEL, but bluntly imports ANSEL GEDCOM files as if they are ANSI GEDCOM files, mangling your data on import. That is unacceptable.
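A GEDCOM file declares its character set in the header, on a level 1 CHAR line, so a reader can check it before importing anything. The Python sketch below shows one way an importer could refuse or warn instead of mangling. It is not how RootsMagic works internally; the set of supported values simply mirrors what my quick tests found for RootsMagic 4.0, and all names are hypothetical.

```python
from typing import Iterable, Optional

# Character sets RootsMagic 4.0 handled in my tests (UNICODE means UTF-16).
SUPPORTED = {"ANSI", "UTF-8", "UNICODE"}


def declared_charset(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the header's level 1 CHAR line, if there is one."""
    for line in lines:
        parts = line.lstrip("\ufeff").strip().split(None, 2)
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2].strip().upper()
        # Any level 0 line other than HEAD means the header has ended.
        if len(parts) >= 2 and parts[0] == "0" and parts[1] != "HEAD":
            break
    return None


def is_supported(charset: Optional[str]) -> bool:
    """True when the declared character set is one the importer can read."""
    return charset in SUPPORTED


header = ["0 HEAD", "1 SOUR PAF", "1 CHAR ANSEL", "0 @I1@ INDI"]
charset = declared_charset(header)
if not is_supported(charset):
    print(f"Warning: character set {charset} is not supported")
```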

version 5.5.1

There’s another thing that RootsMagic 4 does right. RootsMagic does not just use GEDCOM 5.5.1 features, but also correctly labels its GEDCOM files as version 5.5.1 files.

There are few genealogy applications that do this right, mainly because even FamilySearch’s latest version of PAF (still PAF 5.2.18) does not lead by example; PAF 5.2.18 writes GEDCOM 5.5.1 files but mislabels them as GEDCOM 5.5, a mistake that makes part of their content illegal. RootsMagic 4 writes GEDCOM 5.5.1 files and labels them as such.

quality

Browsing through the GEDCOM file RootsMagic produced, I did not notice anything really out of the ordinary. Overall, the GEDCOM it had produced looked fine. I noticed some RootsMagic-specific tags, but those start with an underscore as they should.

identifiers

Like most genealogy applications, RootsMagic maintains the record number used in the GEDCOM, so that each person has the same number in both RootsMagic and the originating application. The same is true for family identifiers.

RootsMagic additionally supports the unique identifiers (_UID in GEDCOM) as also supported by PAF, Ancestral Quest, and RootsMagic's predecessor, Family Origins.

unknown tags

As a quick and dirty test of GEDCOM quality, I imported RootsMagic’s GEDCOM back into PAF 5 again. PAF obliged by producing more than 80.000 errors. Most of these are about PAF not supporting the _TMPLT tag, but there are also a few errors about the unknown _SUBQ, _BIBL and _EVDEF tags. There is a pattern here; all of this has to do with citations and sources.
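You do not need PAF to get a first impression of how many vendor-specific tags a GEDCOM file contains; counting the tags that start with an underscore is enough. A minimal Python sketch, with a made-up sample that does not claim to reflect the real structure of a RootsMagic export:

```python
from collections import Counter
from typing import Iterable


def custom_tag_counts(lines: Iterable[str]) -> Counter:
    """Count tags starting with an underscore (vendor-specific extensions)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        # A line is "level tag value" or "level @id@ tag value".
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "0 @S1@ SOUR",
    "1 _TMPLT",
    "1 _SUBQ short footnote",
    "1 _BIBL bibliography entry",
    "0 @S2@ SOUR",
    "1 _TMPLT",
]
for tag, count in custom_tag_counts(sample).most_common():
    print(tag, count)
```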

sourcing

That seems understandable. After all, RootsMagic 4 supports Evidence Explained-templates for sourcing, and PAF 5 does not. Then again, RootsMagic still supports non-templated sources, so it is hard to understand just why the exported GEDCOM is so problematic.

multiple names

Another import problem that PAF 5 complains quite a lot about is that PAF cannot store more than one name per individual.

also known as

What happened is this: although the GEDCOM standard supports multiple names per individual, PAF supports just one name per individual. Well, that is not exactly true; PAF has a PAF-specific also-known-as field, which shows up in PAF GEDCOM files as the PAF-specific _AKA tag.

RootsMagic

RootsMagic’s GEDCOM reader supports that PAF-specific tag, and then treats it as another name for the same person. RootsMagic’s GEDCOM writer does not use that non-standard _AKA tag, but simply lists multiple names, as GEDCOM allows. PAF does not support that GEDCOM feature, and then reports an import error.
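To see how many individuals in a GEDCOM file would trigger this PAF complaint, you can count the level 1 NAME lines per INDI record. A rough Python sketch; the names in the sample are invented and the function name is mine:

```python
from typing import Dict, Iterable


def multiple_name_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Return the INDI records that carry more than one level 1 NAME line."""
    counts: Dict[str, int] = {}
    current = None  # identifier of the INDI record being read, if any
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # "0 @I1@ INDI" starts an individual; any other record ends it.
            is_indi = len(parts) >= 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
        elif parts[0] == "1" and parts[1] == "NAME" and current is not None:
            counts[current] = counts.get(current, 0) + 1
    return {record: n for record, n in counts.items() if n > 1}


# Invented sample: the first individual has two names, the second has one.
sample = [
    "0 @I1@ INDI",
    "1 NAME Anna /Jansen/",
    "1 NAME Anna /de Vries/",
    "0 @I2@ INDI",
    "1 NAME Piet /Jansen/",
    "0 @F1@ FAM",
    "1 HUSB @I2@",
]
print(multiple_name_counts(sample))
```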

PAF

This is a defect in PAF. RootsMagic went beyond the call of duty by making sense of the PAF-specific tag and processing it correctly, and is not to blame for PAF’s inability to read a proper GEDCOM file.

RootsMagic 3

Out of curiosity, I decided to try the same PAF to RootsMagic and back conversion with RootsMagic version 3 instead of 4. PAF complained about the same things, but not as often as it complained about the RootsMagic 4 GEDCOM. Somehow, the RootsMagic version 4.0 GEDCOM files are more problematic than RootsMagic version 3 GEDCOM files. This seems an unintended effect of the RootsMagic rewrite, and will perhaps be addressed with the first RootsMagic 4 patch.

conclusion

fast

RootsMagic is Unicode-based now. Both the GEDCOM import and GEDCOM export speed have improved considerably. The RootsMagic 4 GEDCOM import speed is impressive, and on a new computer, it is blazing.

failed

Alas, because the 100k INDI GEDCOM is an ANSEL GEDCOM, and RootsMagic 4.0 mangles ANSEL GEDCOM files, I consider the import to have failed. Hence the strike-through in the test overview below.

RootsMagic 3

I did a quick test to make sure; RootsMagic 3 does support ANSEL. That ANSEL support is limited because RootsMagic 3 is a code-page based application, but it is there.

RootsMagic 4 was supposed to have all features of RootsMagic 3, but the ANSEL support seems to have been overlooked.

import errors

That RootsMagic does not warn the user about import errors is a serious omission, and that the import log lists errors but does not identify the errors remains the weakest part of RootsMagic’s GEDCOM import handling. RootsMagic’s handling of import errors is so uninformative that the import log might almost just as well not be there at all.

RootsMagic 4’s GEDCOM import is speedy, very speedy, but I’d rather have a somewhat slower one that produces a better GEDCOM import log.

character encodings

RootsMagic 4 never bothers the user with questions about character encodings, which is a good thing. However, it does not admit to its own limitations, and that is a bad thing.

RootsMagic supports import of ANSI, UTF-8 and even UTF-16 GEDCOM files, but does not support ANSEL GEDCOM files. RootsMagic 4 does not warn about its limitation, but bluntly mangles your data if it is ANSEL encoded.

The GEDCOM export always produces UTF-8 files complete with Byte Order Mark, as it should. There are still genealogy applications on the market that cannot read that, but that is a limitation of those applications. RootsMagic is doing the right thing.

changes

RootsMagic 4 was rewritten to be a solid basis for future versions. Some big and small features were added but it is still in many ways the same program as before. The rewrite gave the GEDCOM import an impressive speed boost, but it seems functionally unchanged; the quality of the import log is still deplorable.

Apparently, improving import functionality was not on the agenda for version 4.0. That’s understandable; supporting Unicode and all the new features of RootsMagic 4 is quite enough change for one version already.

That the export quality of the first version to fully support EE-style sourcing is a bit problematic is no reason for despair yet. 
However, import functionality seems to have been completely off the agenda for version 4; no one noticed that RootsMagic 4 lacks ANSEL support, although that is still a fundamental feature for genealogy software.

Here’s hoping that version 4.1 will solve the problems, bring back ANSEL support and improve the import log.

updates

2009-04-02 ANSEL

A RootsMagic patch released 2009 Mar 30 upgrades the RootsMagic version to 4.0.1.1. Version 4.0.1.1 imports ANSEL GEDCOM as ANSEL.

The GEDCOM import is slightly slower now, but still blazing fast. On the old computer, RootsMagic imports more than 100 KB per second, and on the new one, more than a thousand INDI per second. RootsMagic 4 is still a bit faster than RootsMagic 3.

2010-02-22: UTF-8 mangling

RootsMagic 4 imports UTF-8 GEDCOM files just fine as long as they start with a Byte Order Mark. It mangles a UTF-8 GEDCOM without Byte Order Mark by importing it as if it were Windows ANSI.
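The kind of damage this causes is easy to reproduce: encode a name as UTF-8 and decode the bytes as Windows ANSI (code page 1252). The Python sketch below shows that, followed by one possible way to avoid it, namely trying strict UTF-8 first and falling back to the code page only when that fails. That fallback is my own suggestion, not a description of what any RootsMagic version does.

```python
def misread_as_ansi(text: str) -> str:
    """Show what UTF-8 text looks like when read as Windows code page 1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def decode_without_bom(data: bytes) -> str:
    """Decode BOM-less bytes: strict UTF-8 first, code page 1252 otherwise."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the bytes as Windows ANSI after all.
        return data.decode("cp1252", errors="replace")


print(misread_as_ansi("André"))                       # every accent becomes two characters
print(decode_without_bom("André".encode("utf-8")))    # valid UTF-8, read as UTF-8
print(decode_without_bom("André".encode("cp1252")))   # invalid UTF-8, read as ANSI
```

The fallback works because text with accented letters in a Windows code page is almost never valid UTF-8, so a strict UTF-8 decode is a fairly reliable test.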

2010-08-09: RootsMagic 4.0.9.5

The RootsMagic 4.0.9.5 update fixes several defects, including import of UTF-8 without BOM.

import speed

RootsMagic 3

RootsMagic 3.2.5 (old computer, Windows XP)

file               1 MB GEDCOM    100k INDI GEDCOM
time               16s            5m21s
time in seconds    16             321
INDI per second    303,88         311,74
bytes per second   65.993,44      120.870,38

RootsMagic 3.2.6 (new computer, Windows Vista)

file               1 MB GEDCOM    100k INDI GEDCOM
time               5s             1m58s
time in seconds    5              118
INDI per second    972,40         848,03
bytes per second   211.179,00     328.808,41

RootsMagic 4 original release

RootsMagic 4.0.1.0 (old computer, Windows XP)

file               1 MB GEDCOM    100k INDI GEDCOM
time               6s             3m46s
time in seconds    6              226
INDI per second    810,33         442,77
bytes per second   175.982,50     171.678,73

RootsMagic 4.0.1.0 (new computer, Windows Vista)

file               1 MB GEDCOM    100k INDI GEDCOM
time               4s             1m20s
time in seconds    4              80
INDI per second    1.215,50       1.250,84
bytes per second   262.523,75     484.992,41

RootsMagic 4 with ANSEL patch

RootsMagic 4.0.1.1 (old computer, Windows XP)

file               1 MB GEDCOM    100k INDI GEDCOM
time               6s             4m56s
time in seconds    6              296
INDI per second    810,33         338,06
bytes per second   175.982,50     131.079,03

RootsMagic 4.0.1.1 (new computer, Windows Vista)

file               1 MB GEDCOM    100k INDI GEDCOM
time               4s             1m37s
time in seconds    4              97
INDI per second    1.215,50       1.031,62
bytes per second   262.523,75     399.993,74

links