Modern Software Experience

2008-08-22

Confucius Cup 2008

Welcome to the Confucius Cup 2008, where the best genealogy programs will compete for the coveted Confucius Cup. It’s entirely virtual, but coveted nonetheless. It’s new, but every vendor craves it already. The Confucius Cup is the torture test for the true titans of genealogy software.

The Confucius Cup is a real-world test based on The Confucius Challenge. To claim the cup, the contestants must complete a brutal Confucius Cascade consisting of increasingly gigantic GEDCOMs, tough time limits and an unforgiving judge.

Confucius Cascade 2008

The Confucius Cascade 2008 consists of four challenges leading up to the fifth and final challenge.

the files

Files this big truly test a program’s import efficiency; it cannot afford to waste either CPU cycles or memory. The fifth and final challenge would be the Confucius database itself. I do not have this handy, but that matters little. So far, not one program comes close to completing this cascade of challenges.

The 1MB GEDCOM and the 100k INDI GEDCOM are private files you will recognise from my genealogy software reviews. The ITIS and LIFE GEDCOMs are public files (see Two Huge GEDCOM Files) that you can download to perform these tests with your own software on your own PC.

details

The exact details of these files are as follows.

file               size in bytes   INDI records   FAM records
1MB GEDCOM             1.055.895          4.862         1.700
100k INDI GEDCOM      38.799.393        100.067        45.613
ITIS GEDCOM           95.711.087        472.676        65.799
LIFE GEDCOM          595.039.769      2.080.787       225.673
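Record counts like the ones in the table above need only a single streaming pass over the file, which is also the frugal way to read files this big. The Python sketch below is only an illustration, not how any of the contestants works. It assumes that every record starts with a level-0 line such as `0 @I1@ INDI`. The function name `count_records` and the sample data are hypothetical.

```python
import io
from collections import Counter
from typing import Iterable


def count_records(lines: Iterable[str]) -> Counter:
    """Count records by type in one pass; only level-0 lines start a record."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] != "0":
            continue  # skip subordinate lines (level 1 and deeper) and blank lines
        if parts[1].startswith("@") and len(parts) >= 3:
            counts[parts[2]] += 1  # "0 @I1@ INDI": the tag follows the xref
        else:
            counts[parts[1]] += 1  # "0 HEAD", "0 TRLR": no xref
    return counts


# Hypothetical sample; a real file would be iterated line by line from open().
sample = io.StringIO(
    "0 HEAD\n1 CHAR ANSEL\n"
    "0 @I1@ INDI\n1 NAME Kong /Qiu/\n"
    "0 @I2@ INDI\n"
    "0 @F1@ FAM\n1 HUSB @I1@\n"
    "0 TRLR\n"
)
counts = count_records(sample)
print(counts["INDI"], counts["FAM"])  # 2 1
```

Because the function only ever holds one line in memory, its memory use stays flat no matter how large the GEDCOM is.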

contestants

The Confucius Cup 2008 does not feature the first two challenges of the cascade. The contestants are the programs that completed the first two challenges already.

The Contestants

Qualification

100k INDI GEDCOM

Only the best programs stand a chance of completing the challenges. The Confucius Cup 2008 is about the final challenges of the Confucius Cascade 2008. The field of participants is limited to those programs that completed the first two challenges already (see GEDCOM Import Speed), but not all the programs that completed the import of the 100k INDI GEDCOM qualify.

performance

The Confucius Cup is not just about capability to perform a task, but about performance too.
For the Confucius Cup 2008, the maximum time to import the 100k INDI GEDCOM was set at 10 minutes on a 2,7 GHz PC with 2 GB of RAM. For a challenge like this, a program that takes more than ten minutes to import a file containing just 100.000 INDI records is out of its league already.

sidelines

Some well-known names that completed these warm-up challenges were disqualified for being too slow. Although PAF made the cut (3m44s), Ancestral Quest 12 (52m25s) did not. Some other well-known programs that did not meet the bar, programs that are simply too slow to compete at this level, are Family Tree Maker 2008 (2h31m11s), The Master Genealogist 7 (1h26m00s), and Legacy Family Tree (1h01m05s). These programs will have to watch from the sidelines while their more capable competition battles it out.

GedWise needed 10 minutes and 3 seconds for the 100k INDI GEDCOM and was given a wild card.

contestants

This scoreboard shows the remaining contestants and their performance on the 100k INDI GEDCOM.

program              version      time        seconds   INDI / s      bytes / s
GENViewer Lite       1.15         2,344s        2,344  42.690,70  16.552.642,06
Legacy Charting      7.0.071.394  12s              12   8.338,92   3.233.282,75
Relatives demo       1.1          50s              50   2.001,34     775.987,86
Relatives full       1.1          55s              55   1.819,40     705.443,51
Family Tree Builder  2.0          1m47s           107     935,21     362.611,15
PAF                  5.2.18.0     3m44s           224     446,73     173.211,58
RootsMagic           3.2.5        5m21s           321     311,74     120.870,38
Aldfaer              4.0          6m30s           390     256,58      99.485,62
GedWise              6.2          10m03s          603     165,95      64.343,94
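The last two columns of the score board are simple ratios. Each is the number of INDI records in the file, or its size in bytes, divided by the import time in seconds. Here is a minimal Python sketch that assumes the sizes and record counts from the details table. The names `FILES` and `import_speed` are hypothetical.

```python
# Sizes and INDI counts from the details table: file -> (bytes, INDI records).
FILES = {
    "100k INDI GEDCOM": (38_799_393, 100_067),
    "ITIS GEDCOM": (95_711_087, 472_676),
    "LIFE GEDCOM": (595_039_769, 2_080_787),
}


def import_speed(file: str, seconds: float) -> tuple[float, float]:
    """Return (INDI records per second, bytes per second) for one import run."""
    size_bytes, indi_records = FILES[file]
    return indi_records / seconds, size_bytes / seconds


# PAF imported the 100k INDI GEDCOM in 3m44s, i.e. 224 seconds.
indi_per_s, bytes_per_s = import_speed("100k INDI GEDCOM", 224)
print(f"{indi_per_s:.2f} INDI/s, {bytes_per_s:.2f} bytes/s")
```

For PAF's 224 seconds, this prints 446.73 INDI/s and 173211.58 bytes/s.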

upgrades

Upgrades are not merely allowed or encouraged, but mandatory. The cup is open to current release versions only. Family Tree Maker 16 is a lot faster than Family Tree Maker 2008, but FTM 2008 is the latest version, so even if FTM 16 were fast enough, it would still not be allowed to compete.

PAF is the oldest entrant. FamilySearch has been neglecting PAF maintenance for years now. There are no updates or defect fixes at all; version 5.2.18.0 from 2002 is still the latest version. Battery Park Software seems to be displaying similar product neglect; GedWise 6.2 for Palm OS, released on 2006 Jan 15, is still the latest version.

RootsMagic has been updated from 3.2.5 to 3.2.6. Version 4 will be released soon, but RootsMagic 3.2.6 is still the latest public release. MyHeritage Family Tree Builder (FTB) 2.0.0.676 is still the latest version. The latest version of Aldfaer is 4.0.5 and the latest version of Legacy Charting is 7.0.0.118. Relatives updated from 1.0 to 1.1 just in time for the cup.

These are the versions that will be competing for the Confucius Cup 2008.

Two Groups

The contestants have been divided into two groups, the genealogy editors and the genealogy viewers. The genealogy editors must save the database they import in their own format; the genealogy viewers need not support their own format at all.
Note that Relatives is a genealogy editor, but that the Relatives demo qualifies as a viewer because the demo does not save files.

The original Confucius Challenge is a challenge aimed at genealogy editors, but viewers should not be forgotten.
Viewers do not need to save files. Saving a large file is a matter of seconds, yet viewers tend to be a lot faster than genealogy editors. Part of the performance difference is explained by the innate advantages of a viewer, but that hardly explains it all. The more important difference might well be that the authors of editors are still focusing on yet more features, while the authors of viewers actually put performance on the list of criteria for their design.

Order of Contestants

First, all contestants will try to complete the ITIS challenge. Then, those that succeeded will try to complete the LIFE challenge.
The order of the contestants is determined by their performance on the previous challenge. The genealogy editors will compete first, the genealogy viewers last. Within each group, the slowest programs will go first and the fastest programs will go last.

There is one exception. Because both files have been created with PAF, PAF will go first to demonstrate that the files can indeed be read.
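The running order described above amounts to a sort with one manual override. The Python sketch below assumes that each contestant is described by a hypothetical `Contestant` record holding its group and its time on the previous challenge. The function `running_order` is also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contestant:
    name: str
    is_editor: bool          # editors compete before viewers
    previous_seconds: float  # import time on the previous challenge


def running_order(field: list[Contestant], goes_first: str = "PAF") -> list[str]:
    """Editors before viewers, slowest first within each group, one override."""
    # False sorts before True, so editors (not is_editor == False) come first;
    # negating the time puts the slowest program of each group first.
    ordered = sorted(field, key=lambda c: (not c.is_editor, -c.previous_seconds))
    names = [c.name for c in ordered]
    if goes_first in names:  # the program that created the files opens the round
        names.remove(goes_first)
        names.insert(0, goes_first)
    return names


field = [
    Contestant("GENViewer Lite", False, 2.344),
    Contestant("Legacy Charting", False, 12),
    Contestant("Relatives demo", False, 50),
    Contestant("Relatives full", True, 55),
    Contestant("Family Tree Builder", True, 107),
    Contestant("PAF", True, 224),
    Contestant("RootsMagic", True, 321),
    Contestant("Aldfaer", True, 390),
    Contestant("GedWise", True, 603),
]
print(running_order(field))
```

With the 100k INDI GEDCOM times from the score board, this yields the following order: PAF, GedWise, Aldfaer, RootsMagic, Family Tree Builder and Relatives full, then Relatives demo, Legacy Charting and GENViewer Lite.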


Third challenge


ITIS GEDCOM

The challenge is to read the ITIS GEDCOM, a file of less than 100 MB and less than 500.000 INDI records, but not much less.
To stand a reasonable chance of completing the next challenge, this challenge should be completed in less than a quarter of an hour, but all programs that succeed within half an hour will move on to the next challenge.

Genealogy Editors

PAF 5.2.18.0

As both files were created with PAF, it was up to PAF to prove it could import its own GEDCOM files before challenging the other contenders.

PAF imports the ITIS GEDCOM in 17m50s (1070s). During the import, the Task Manager did not show any serious increase in memory at all. The resulting import speed of some 441 INDI per second, and close to 90 kilobytes per second, is respectable.

GedWise 6.2

GedWise 6.2 actually exceeded the maximum import time for the 100k INDI GEDCOM by three seconds, and was given a wildcard.

GedWise takes 3h04m14s to complete the import to GedWise format. The dialog box shows that import time, but truncated; it shows the minutes and seconds, but not the hours. The time to HotSync the resulting database with the Palm is not included in this challenge.

GEDCOM error

The import log points out one error with the ITIS GEDCOM: "Header Record on line 16 ignored: 2 Form LINEAGE - LINKED". GedWise’s error message is correct. "Form" is not a GEDCOM tag, and "LINEAGE - LINKED" is not a known GEDCOM form. The header should read "FORM LINEAGE-LINKED", with just one space between the tag and its value.
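A GEDCOM line consists of a level number, a tag and an optional value, separated by single spaces, and the form belongs under HEAD.GEDC.FORM. The Python sketch below shows a check along the lines of what GedWise reports, though it is not GedWise's code. It assumes that the header has already been read into a list of lines, and it ignores cross-references and continuation lines. The helper names and the lines around the faulty FORM line are hypothetical.

```python
def parse_line(line: str) -> tuple[int, str, str]:
    """Split 'level tag [value]' on single spaces; xrefs are not handled here."""
    level, tag, *rest = line.rstrip("\r\n").split(" ", 2)
    return int(level), tag, rest[0] if rest else ""


def is_lineage_linked(header: list[str]) -> bool:
    """True only if HEAD.GEDC.FORM is exactly 'LINEAGE-LINKED'."""
    in_gedc = False
    for line in header:
        level, tag, value = parse_line(line)
        if level == 1:
            in_gedc = tag == "GEDC"  # FORM must sit under the GEDC structure
        elif level == 2 and in_gedc and tag == "FORM":
            return value == "LINEAGE-LINKED"
    return False  # no FORM found: the header does not declare the form


# Hypothetical header excerpts; only the faulty FORM line is from the ITIS GEDCOM.
itis_header = ["0 HEAD", "1 GEDC", "2 VERS 5.5", "2 Form LINEAGE - LINKED"]
fixed_header = ["0 HEAD", "1 GEDC", "2 VERS 5.5", "2 FORM LINEAGE-LINKED"]
print(is_lineage_linked(itis_header), is_lineage_linked(fixed_header))  # False True
```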

This error immediately reveals another one: the header claims that the file was produced by PAF 5.2.18.0, but PAF would never generate that header. Apparently, the header was copied from a PAF header, and it should name a different creator than it currently does.

GedWise does not claim excessive amounts of memory and the GedWise database is less than 30 MB large, but the import time of more than 3 hours is way too slow. GedWise imports less than 50 INDI per second, and less than 10 KB per second.

Aldfaer 4.05

Windows Task Manager shows Aldfaer to be using some 800 MB while importing. The import takes 23m23s, but that is without saving the imported file. Clicking the necessary buttons and waiting for the save to complete takes about five seconds, so the total import time is 23m28s. The resulting file is less than 110 MB. The log file does not show any import issues.

RootsMagic 3.2.6

RootsMagic imports the ITIS GEDCOM in 10m16s (616s). RootsMagic lost to PAF on the 100k INDI GEDCOM, but turns out to be faster than PAF on a file roughly five times as big. Task Manager did not show any serious increase in memory at all. The resulting import speed of more than 750 INDI per second, some 150 KB per second, is very nice.

One possible explanation for RootsMagic passing PAF could be that PAF spends a lot of time on the many fields that are not present, while RootsMagic only spends its time dealing with the fields that are actually present in the GEDCOM file.

MyHeritage Family Tree Builder 2.0.0.676

MyHeritage Family Tree Builder (FTB) managed to complete the challenge, but took way too long to do so. It took only a few minutes for the progress bar in the import dialog box to reach 100 %. At that point, Task Manager already showed FTB to be using 910 MB of RAM, with CPU usage at approximately 50 %. It stayed like that for hours, without any progress indication other than the import dialog box that simply kept claiming to be 100 % done.

After more than 5½ hours, FTB decided to save the file it had imported, and took about five minutes to do so. The measured import time so far is an embarrassing 5h41m28s. Once FTB is done saving the file, it pops up a dialog with statistics. That box claims it took 19.981,4 seconds, that is 5h33m01s400ms. FTB started to save after 5h33m13s. Never mind the few other seconds; it is pretty clear that MyHeritage has programmed FTB to try and pass off the import time minus the save time as its import time. That’s an attempt to influence the results by presenting false information to the judge, and would be grounds for disqualification if FTB had not failed the test already.
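The discrepancy comes down to simple arithmetic. The Python sketch below uses a hypothetical helper, `hms`, that formats seconds in the h/m/s notation used here. It shows that FTB's reported 19.981,4 seconds ends just 11,6 seconds before the measured start of the save.

```python
def hms(seconds: float) -> str:
    """Format seconds as e.g. '5h33m01s400ms' (milliseconds only if non-zero)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    text = f"{hours}h{minutes:02d}m{secs:02d}s"
    return text + (f"{ms:03d}ms" if ms else "")


reported = 19_981.4                     # FTB's statistics dialog, in seconds
save_started = 5 * 3600 + 33 * 60 + 13  # save began at 5h33m13s (measured)
print(hms(reported))                    # 5h33m01s400ms
print(hms(save_started - reported))     # 0h00m11s600ms
```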

The box with statistics has a button titled View Issues. When you click it, nothing seems to happen for about an hour (!), after which another dialog box appears to show the import issues that FTB encountered. The errors FTB claims to have found are Indexes Referenced but not defined, Unrecognised tags and Miscellaneous Warnings. I told FTB to save this import log as a file, and only with that action did the GEDCOM import really conclude, more than six hours after it was started.

I then tried to view the import log by clicking on the TreeView that the FTB dialog presents. As soon as I did so, FTB decided to crash. I searched several directories for the log file, but FTB apparently crashed trying to write it. FTB’s poor time already shows it to be out of its league, and not ready to take on the LIFE GEDCOM challenge. This is a tough challenge, and there is no wiggle room: failing any part of the import task is failure to import. The failure to write the import log means there will be no import time on the score board.

Relatives 1.1.01 (full)

Relatives does not display large trees correctly yet, but this new program did impress with its import speed. Although version 1.1 had improved its practical import time with a much faster file save, this challenge proved too much for the newcomer. The Task Manager showed its memory increasing to 1.363 MB, and Windows even had to increase the size of the paging file. After a bit more than 10 minutes, a message box popped up: "Uncaught exception: Out of memory". That ended the file import. Relatives immediately started to draw the tree, but that ended with "FATAL: Out of memory". When I killed the process, paging file usage dropped by about 1,3 GB.

Genealogy Viewers

Relatives 1.1.01 (demo)

The Relatives demo does not save files and therefore qualifies as a genealogy viewer, but that matters little. Relatives runs out of memory.

Legacy Charting 7.0.0.118

Legacy Charting shows an hourglass during import. The import is not finished until that hourglass disappears and the Chart Creation Wizard displays a list of records. Legacy Charting imports the ITIS GEDCOM in just 30 seconds. That is more than 15.000 INDI records per second, and more than 3 MB per second.

GENViewer Lite 1.15

GENViewer Lite is so fast that you generally don’t notice that it has a multi-phase import. The first phase reads the file and counts the records. This seems to happen about as fast as possible. How many more phases there are is not clear, but with a large file, there is a noticeable wait between GENViewer displaying the counts in its status bar and finishing the import.

During this second phase, the hard disk LED started burning. That is another thing you hardly notice with small files, but GENViewer does create several temporary work files. I looked at a few of them, and my guess is that these are index files.
GENViewer’s memory usage was not excessive yet; for the ITIS GEDCOM, the Task Manager showed the memory usage to be climbing to 171 MB.

GENViewer shows the time it takes to import, and this is a time I’ve come to trust as more accurate than the Windows clock applet I use. GENViewer imported the ITIS GEDCOM in less than 19 seconds. Its import speed is more than 25.000 INDI records per second, and more than 5 MB per second.

results

PAF, Aldfaer, RootsMagic, Legacy Charting and GENViewer Lite succeeded. GedWise, Family Tree Builder and Relatives failed. GedWise ran out of time, Relatives ran out of memory, and Family Tree Builder ran out of both. It is back to the drawing board for these vendors.

RootsMagic was the only program to complete the challenge in less than a quarter of an hour. PAF needed more than 17 minutes and Aldfaer more than 23 minutes. All three completed the challenge in less than half an hour, and return for the next challenge.

Wildcard GedWise 6.2 was too slow to continue to the next round, but it gets a positive mention nonetheless. GedWise found an error in the GEDCOM header. None of the other programs reported this error, which suggests that they do not bother to really read the header, and blindly default to processing a lineage-linked GEDCOM without checking whether the header states that it actually is one.

score board

program              version      time        seconds   INDI / s      bytes / s
GENViewer Lite       1.15         18s625ms     18,625  25.378,58   5.138.850,31
Legacy Charting      7.0.071.394  30s              30  15.755,87   3.190.369,57
Relatives demo       1.1.01       -                 -          -              -

RootsMagic           3.2.6        10m16s          616     767,33     155.375,14
PAF                  5.2.18.0     17m50s        1.070     441,75      89.449,61
Aldfaer              4.05         23m28s        1.408     335,71      67.976,62
GedWise              6.2          3h04m12s     11.052      42,77       8.660,07
Family Tree Builder  2.0.0.676    -                 -          -              -
Relatives full       1.1.01       -                 -          -              -
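The score boards write times in a compact notation such as 3h04m12s or 18s625ms, while the derived columns need plain seconds. The Python sketch below shows the conversion. The function `to_seconds` is hypothetical and accepts only the units h, m, s and ms.

```python
import re

# Milliseconds per unit used in the score boards.
UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
# "ms" is tried first; otherwise "625ms" would split into "625m" and a stray "s".
PART = re.compile(r"(\d+)(ms|h|m|s)")


def to_seconds(text: str) -> float:
    """Convert a time such as '3h04m12s' or '18s625ms' to seconds."""
    parts = PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised time: {text!r}")
    total_ms = sum(int(n) * UNIT_MS[u] for n, u in parts)
    return total_ms / 1000


for t in ["18s625ms", "10m16s", "23m28s", "3h04m12s", "8m17s750ms", "1h20m56s"]:
    print(t, to_seconds(t))
```

For example, 23m28s converts to 1.408 seconds and 8m17s750ms to 497,75 seconds.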

Fourth Challenge

LIFE GEDCOM

Several genealogy programs fail to import the 100k INDI GEDCOM because they gobble memory like crazy. The programs that made the cut for this test are better than that. These programs have already demonstrated their ability to import a file of not much less than 100 MB. However, even with 2 GB of RAM installed, a GEDCOM file of more than half a GB is going to be quite a challenge.

This time, the GEDCOM header was checked before proceeding. The header of the LIFE GEDCOM was found to contain the same defect as the header of the ITIS GEDCOM. The challenge proceeded in the predetermined order anyway.

The desired completion time for this challenge is less than one hour, but all programs that complete the challenge within two hours will be deemed successful.
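The time limits of the cascade can be captured in a small judging function. The Python sketch below assumes the limits stated in the rules:

- a 10-minute maximum to qualify on the 100k INDI GEDCOM,
- a quarter of an hour desired and half an hour allowed for the ITIS GEDCOM,
- one hour desired and two hours allowed for the LIFE GEDCOM.

The names `LIMITS` and `verdict` are hypothetical.

```python
from typing import Optional

# Time limits in seconds per challenge: (desired, maximum).
LIMITS = {
    "100k INDI": (None, 10 * 60),    # qualification: no desired time, 10 minutes max
    "ITIS": (15 * 60, 30 * 60),      # desired under a quarter hour, half an hour max
    "LIFE": (60 * 60, 2 * 60 * 60),  # desired under an hour, two hours max
}


def verdict(challenge: str, seconds: Optional[float]) -> str:
    """Judge one import; None means the import did not complete."""
    desired, maximum = LIMITS[challenge]
    if seconds is None or seconds > maximum:
        return "failed"
    if desired is not None and seconds < desired:
        return "desired"
    return "passed"


print(verdict("ITIS", 616), verdict("LIFE", 4856), verdict("100k INDI", 603))
```

RootsMagic's 616 seconds on the ITIS GEDCOM come out as "desired" and its 4.856 seconds on the LIFE GEDCOM as "passed". GedWise's 603 seconds on the 100k INDI GEDCOM fail without a wild card.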

Genealogy Editors

PAF 5.2.18.0

PAF took almost an hour to fail the import. When the error box appeared, the progress box did not repaint anymore. Judging by the speed up until that time, it was close to the 1 million individual mark. If you have used PAF, you may know that it restricts the ID in its search box to 6 digits, so my first thought was that there are more 6-digit limitations in the program, and that the import failed on record 1.000.000. That would be weird though, as the GEDCOM file was created with PAF.

post-mortem

Time for a little post-mortem software research. PAF left two temporary files. The largest of these was 536.887.296 bytes large. That is ( 512 × 1024 + 16 ) × 1024 bytes, so my first guess was that PAF failed because its 32-bit arithmetic could not handle some internal overflow that occurred shortly after the file size passed 512 × 1024 × 1024 bytes, the most that 29 bits can address. It is a little known but easily observed fact that PAF allocates file space in blocks of 8192 bytes. A file size of 536.887.296 bytes is 65.538 blocks. Up to 65.536 blocks, the block numbers (0 to 65.535) fit in a 16-bit integer, and PAF crashed shortly after the file size passed 65.536 blocks. This suggests that PAF failed because it is using a 16-bit value for the number of blocks, instead of the 32-bit value it should be using.
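The block arithmetic behind this diagnosis is easy to check. The Python sketch below assumes the observed 8192-byte block size, and the helper names are hypothetical. Masking with 0xFFFF merely models what happens when a block count is stored in an unsigned 16-bit field; it is not PAF's actual code.

```python
BLOCK_SIZE = 8192  # observed PAF allocation unit, in bytes


def blocks_in(size_bytes: int) -> int:
    """Number of whole allocation blocks in a file of the given size."""
    return size_bytes // BLOCK_SIZE


def as_uint16(value: int) -> int:
    """Model storing a value in an unsigned 16-bit field (it wraps at 65536)."""
    return value & 0xFFFF


temp_file = 536_887_296                   # largest temporary file PAF left
assert temp_file == (512 * 1024 + 16) * 1024
print(blocks_in(temp_file))               # 65538
print(as_uint16(blocks_in(temp_file)))    # 2: the count has wrapped around
print(as_uint16(65_535))                  # 65535: the last block number that fits
```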

There is also a *.PAF file of some 686 MB, and there are quite definitely records in there, but when I open it with PAF, it appears empty.

That PAF could create the GEDCOM file (if it did) suggests that this defect does not occur during editing or GEDCOM export, but is particular to its GEDCOM import. PAF is old and started as a 16-bit application. Apparently, both the editing and export code were upgraded to 32-bit to handle large databases and export them. Perhaps the import of large files was even tested, but apparently never with a file large enough to make the temporary work file exceed 65.536 blocks.

Oh well, the bottom line is that PAF fails to import the ostensible PAF GEDCOM. If this issue were fixed, the import time would probably be roughly 2½ hours.

Aldfaer 4.05

During the attempt to import the ITIS GEDCOM, I had already noticed that Aldfaer would fail within seconds of starting the import if anything else had been imported just before. This experience repeated with the LIFE GEDCOM. The ITIS import succeeded after a clean start, and apart from Aldfaer again not updating its progress dialog box for a while, the LIFE import seemed to go fine. After two minutes, however, Aldfaer’s import failed with an "external exception".

RootsMagic 3.2.6

RootsMagic proved to be up to the challenge. It imported the more than 2 million INDI records in roughly 1 hour and twenty minutes. Task Manager continued to show low memory usage while the import progressed. The import speed is more than 400 INDI per second, and more than 100.000 bytes per second. That’s respectable performance.

Genealogy Viewers

Legacy Charting 7.0.0.118

Legacy Charting failed with a complaint that it could not access the temporary file it created, and then made the ridiculous suggestion that its temporary file may be in use by another program. I double-checked to make sure that it was not tripping up because of a temporary file it had left behind after an earlier import. I also double-checked that there was more free space than the size of the GEDCOM file. Legacy Charting fails with an error message that is obviously wrong. Legacy Charting failed the LIFE GEDCOM challenge.

GENViewer Lite 1.15

GENViewer struggled to import the LIFE GEDCOM. Task Manager showed GENViewer to be using 891 MB of RAM. The hard disk LED was on throughout the second import phase, which lasted more than 8 minutes, but GENViewer came through. Later, exiting the program took about ten seconds; GENViewer apparently needed that time to do the right thing and clean up its temporary files before exiting.

results

Only one genealogy editor and one genealogy viewer managed to complete the challenge.

GENViewer used quite a bit of RAM, but not all of it. It created several temporary files on disk, and that’s probably what slowed it down. Still, it imported the file in less than ten minutes, at more than 1 megabyte per second. I wonder how well Legacy Charting will do once its import defect is fixed.

RootsMagic was the only genealogy editor to complete this challenge. It took more than an hour, and the huge difference with GENViewer suggests that there is still room for improvement, but it is an acceptable time. You can start the import, have a meeting and a lunch, and come back to find the import completed.

RootsMagic’s performance would be good enough to tackle the Confucius database, if it were a Unicode program. Alas, RootsMagic 3 is a code page-based program limited to the Windows ANSI character set.

score board

program              version      time          seconds   INDI / s      bytes / s
GENViewer Lite       1.15         8m17s750ms    497,750   4.180,38   1.195.459,10
Legacy Charting      7.0.071.394  -                   -          -              -

RootsMagic           3.2.6        1h20m56s        4.856     428,50     122.537,02
PAF                  5.2.18.0     -                   -          -              -
Aldfaer              4.05         -                   -          -              -
GedWise              6.2          -                   -          -              -

Fifth Challenge

Confucius database

The fifth and final challenge of the Confucius Cascade 2008 is the Confucius database, the database of Confucius’s descendants. I do not have the GEDCOM handy, but that matters little.

So far, the programs had to process one "ANSI", one ANSEL and two ASCII files. The final challenge in the cascade requires the program to support either UTF-8 or UTF-16 ("UNICODE").
Both remaining programs claim to support UTF-8, but that is a half-truth. Both are code page-based designs limited to the 256 characters of Windows code page 1252 ("Windows ANSI"). Their UTF-8 support is limited to those 256 characters. Both turn "孔夫子" (in a UTF-8 GEDCOM) into "???", and changing names into question marks isn’t importing a file, it’s mangling a file.
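The mangling is easy to reproduce. In this Python sketch, encoding to code page 1252 with the `replace` error handler stands in for whatever conversion the two programs actually perform. The handler substitutes a question mark for every character that code page 1252 lacks.

```python
name = "孔夫子"                        # Kong Fuzi, i.e. Confucius
utf8_bytes = name.encode("utf-8")     # how the name is stored in a UTF-8 GEDCOM
decoded = utf8_bytes.decode("utf-8")  # a Unicode-capable reader gets the name back

# A program limited to code page 1252 has no slot for these characters; the
# "replace" error handler substitutes "?" for each one it cannot represent.
mangled = decoded.encode("cp1252", errors="replace").decode("cp1252")
print(len(utf8_bytes), decoded == name, mangled)  # 9 True ???
```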

score board

program              version      time        seconds   INDI / s      bytes / s
GENViewer Lite       1.15         -                 -          -              -

RootsMagic           3.2.6        -                 -          -              -

closing

Well-known players simply aren’t good enough to compete. Only three genealogy viewers and six genealogy editors qualified. Only one genealogy viewer and one genealogy editor completed the import of the LIFE GEDCOM. The rest ran out of memory, ran out of time, ran out of both, or simply crashed. Neither of the two remaining programs would be able to handle the Confucius database, as they do not support Unicode. All programs failed.

links