Modern Software Experience

2009-09-11

Embla Family Treasures

GEDCOM Import

first time failure

One problem I experienced trying to import a GEDCOM was that, after creating a new family project, the File | Import/Export menu remained grey; I had to restart the application to be able to choose it. The second time I created a new file to try and import a GEDCOM, the File | Import/Export menu was immediately available.


easy

A general issue with Embla Family Treasures is that it rarely uses proper dialog boxes, but uses the main window as a dialog box instead, just like Brother’s Keeper does.

Other than the main-window-as-dialog box, GEDCOM import starts fairly normally, but soon wants you to choose between Easy Gedcom and Comprehensive Gedcom.

The mere fact that a vendor does not bother to capitalise GEDCOM properly should already make you wonder how seriously they take their GEDCOM support at all. Having to choose between Easy and Comprehensive import? That suggests that the Easy import is not comprehensive and the Comprehensive import is not easy. So what do I do if I am like most users, and prefer an import that is both comprehensive and easy?

Comprehensive GEDCOM import

GEDCOM versus Embla

The Comprehensive option lets you pick a file, and then starts importing it. However, after doing a first pass to analyse the file, it pauses the import, and then asks you to choose between GEDCOM definitions and Embla Family Treasures standard. Never mind that the choice should read GEDCOM standard and Embla Family Treasures definitions; the real issues are more fundamental. It is not at all clear what the difference between the two options is, how a user is supposed to choose between them, or even why a user should have to choose anything at all; surely the application can simply import the file to the best of its ability, without bothering the user. Moreover, if there really is something to choose, why not let the user choose these options before starting the import?

mapping tags to events

The application defaults to Embla Family Treasures standard. Whichever of the two options you pick, the next step asks you to map GEDCOM tags to their corresponding events. Users should not have to do that.

progress

After confirming that you want the next step twice, the import continues. During import, the main-window-as-dialog-box displays the text Import in Progress below an edit box that shows log messages being added to it.

Family Treasures does display a progress indicator; it just isn’t in the main-window-as-dialog-box, but in the status bar instead. When the import appears finished, the demo puts up a messagebox to inform you that you entered more than 50 people into the demo version. What Embla does not tell you is that the import is not finished yet, but only quasi-finished.

log file

Embla Family Treasures writes an import log file. Family Treasures seems to create a log by writing messages to an edit box, and to finish off with a final message about writing it to a file. So you might think that it makes the same mistake as MyHeritage Family Tree Builder: that it only writes the log file once the import is done, that it only writes a post-import log instead of an import log.

Embla Family Treasures does write a real import log; it writes the log messages to the file as the import progresses. If Embla Family Treasures crashes during import, you are able to see how far it got and what issues it encountered. Family Treasures simply writes its messages to both the import log file and the edit box.
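
What distinguishes a real import log from a post-import log is that every message reaches the disk before the next record is processed. A minimal Python sketch of that idea follows; the class name `ImportLog` and the `on_message` callback (standing in for the edit box) are hypothetical, and nothing here is taken from Embla's actual code.

```python
import os
import tempfile


class ImportLog:
    """Hypothetical import log: each message goes to the file and to the UI."""

    def __init__(self, path, on_message):
        self._file = open(path, "w", encoding="utf-8")
        self._on_message = on_message  # stand-in for the edit box

    def write(self, message):
        self._file.write(message + "\n")
        self._file.flush()  # hand the line to the OS now, not when the import ends
        self._on_message(message)

    def close(self):
        self._file.close()


def demo_import_log():
    """Log one message and read the file back while the log is still open."""
    shown = []
    path = os.path.join(tempfile.mkdtemp(), "import.log")
    log = ImportLog(path, shown.append)
    log.write("Importing individual I1")
    with open(path, encoding="utf-8") as handle:
        on_disk = handle.read().splitlines()
    log.close()
    return shown, on_disk
```

Because the file is flushed after every message, whatever was logged before a crash is still there to read afterwards.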

log file quality

The log messages lack line numbers and do not distinguish between error, warning and info messages, but the log file is still pretty good. Messages about individuals include both the id and the name of the individual as well as the lines that the messages apply to. For messages that apply to families, Family Treasures actually takes the trouble to list the names of both partners.
The log starts with a message stating which file is being imported and when, and ends with a claimed import time, which is not accurate.

The biggest complaint about the log file is that it only covers the first few phases of the import, and does not contain any messages for the later phases.


quasi-finished

I remarked that the import is quasi-finished. It surely has every appearance of being finished; the title of this main-window-as-dialog box is Step 4 of 4 - Import Options and this step just completed, one of the last log messages in the edit box is GEDCOM Import complete, the application claims a particular import time, there is a Finished button to click, and when you click it, the main window displays your data in a tree view.

All this is seriously misleading. The GEDCOM import is not over yet. As soon as the main window displays the tree view, Family Treasures starts the next two phases of the GEDCOM import process; the place names phase and the database audit phase.

What Family Treasures claims as its GEDCOM import time is in fact the time elapsed from the start of phase 2 (which starts after reading and analysing the header) till the moment it quasi-finishes. The actual import time is significantly longer.

place name phase

Structured place names are new in version 8. Family Treasures version 7 and earlier used a single line for place names, on which you used commas between the various parts, just like you do in PAF and many other genealogy applications. Version 8 has separate fields like TMG, and demands that you tell it what part goes into which field.

After quasi-finishing the import, Embla Family Treasures throws up a dialog box (and a real dialog box this time) asking how to map a particular place name to Embla’s Place/Farm/Parish/Municipality/County (Province)/Country structure. It then asks the same question for the next place name, and so on, for each place name in the database…

Obviously, this way of splitting each place field name gets tedious very fast, so I checked the Import all place automatically option to be done with it. I then saw it map all places one by one. That automatic mapping is a relatively fast phase, but only because it contains no smarts at all.

I am not impressed by how Family Treasures maps structured place name strings to its separate fields; it does not seem any smarter than TMG. However, in its defence, Family Treasures does offer a Continue later option, and that is probably the option you should take if you are seriously considering switching to this.
During one attempt to import place names, the application crashed.
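
To make concrete what a mapping without any smarts looks like, here is a minimal Python sketch of a purely positional split. It is an illustration only, not Embla's algorithm: the field names follow the Place/Farm/Parish/Municipality/County (Province)/Country structure mentioned above, while the function name `split_place`, the right-alignment rule and the example place line are my own assumptions.

```python
# Field order taken from the structure described above; the rest is hypothetical.
FIELDS = ["place", "farm", "parish", "municipality", "county", "country"]


def split_place(place_line):
    """Naively assign comma-separated parts to fields, largest unit last."""
    parts = [part.strip() for part in place_line.split(",")]
    if len(parts) > len(FIELDS):
        # Too many parts: fold the surplus leading parts into the first field.
        extra = len(parts) - len(FIELDS) + 1
        parts = [", ".join(parts[:extra])] + parts[extra:]
    # Right-align, on the assumption that the last part is the country.
    padded = [""] * (len(FIELDS) - len(parts)) + parts
    return dict(zip(FIELDS, padded))


print(split_place("Hundvåg, Stavanger, Rogaland, Norway"))
```

A split like this is fast, but it puts a part in the wrong field as soon as a place line omits a level or lists the levels in another order, which is exactly why automatic mapping needs more than counting commas.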

Once the place name splitting is done, Family Treasures once again pops up a message about the 50-people limit in the demo edition, but that only means you cannot edit your data. The Family Treasures demo does not artificially restrict the size of the GEDCOM you can import.


audit database phase

Once the place name import phase is over, the audit phase starts. A messagebox appears stating that You need to run the audit to complete the upgrade process. The audit screen will be opened next; press the Start button to begin.. That is a confused message; this is an import process, not an upgrade from one version of Embla to another - or does Embla’s GEDCOM import for version 8 import data into a version 7 database and then upgrade it? That is what this message seems to imply. After working with Family Treasures for a while it is also what I believe to be the case.

database integrity checks

Anyway, the last step of the import procedure is to perform an Audit Structure command on the new project database. That may sound like overkill; surely the application knows how to import data into its own database and there should be no need to check the integrity of the database now?
Maybe, but then again, if you trusted all database operations never to introduce any errors, no application would ever need to check its own database.

Database integrity checks are a good thing, and performing such a check as the last step of a GEDCOM import is arguably something all applications should be doing, just to make sure everything went well.


During the audit, Family Treasures removed unlinked events, blank families, bad links to events and empty events. Do not think that this shows how useful the audit is; what it really shows is how bad the GEDCOM import is.

Embla did not add the database audit as the final phase of the GEDCOM import process to make sure everything went fine, but because everything is not fine. Embla knows the Family Treasures GEDCOM import creates database errors, and includes the audit phase to fix these errors.

The particular GEDCOM file it imported is not perfect, but all the issues the audit found should be addressed by the GEDCOM import phases that write the database, not by tacking on an audit command.

Moreover, when the actual import code is so poor that the audit phase is mandatory, the user should not be able to choose to cancel that audit…

twice

By the way, once you choose to run the audit, a messagebox appears that says The audit tool could take a long time to run and it should not be interrupted [sic] during running..

That is not the entire message. It goes on to say that If error messages appear, the audit should be re-run until no more error messages are displayed.. That often means that Embla is actually demanding that you run the audit phase twice; the first time to clean up the mess the earlier import phases made, the second time to confirm that everything is okay now.
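
The instruction in that messagebox amounts to a loop the application could run itself. A minimal Python sketch, assuming a hypothetical `run_audit` callable that returns the number of errors it reported; neither the name nor the pass limit comes from Family Treasures.

```python
def audit_until_clean(run_audit, max_passes=10):
    """Re-run the audit until a pass reports no errors; return the pass count."""
    for passes in range(1, max_passes + 1):
        if run_audit() == 0:
            return passes
    raise RuntimeError(f"audit still reports errors after {max_passes} passes")


# A first pass that finds four errors, followed by a clean second pass.
results = iter([4, 0])
print(audit_until_clean(lambda: next(results)))
```

With a first pass that reports errors, the loop cannot stop before the second pass, which is the twice of this section's title.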

final

Family Treasures does not write anything to the import log during either the place name splitting or database audit phase of the GEDCOM import. All the messages it produces during these phases are lost as soon as you move on.

After the audit command, Family Treasures showed the Import/Export main-window-as-dialog again. Despite the fact that the place name import and audit had taken considerable time, the log text in the edit box had not been updated; it did not even mention either place name or audit phase, and was still claiming the same import time. The GEDCOM import is really finished when you have clicked the Finish button on that main-window-as-dialog.


Easy GEDCOM import

mess

The Comprehensive Gedcom import is a comprehensive mess and the Easy Gedcom option is not as easy to deal with as its name suggests.

The Easy GEDCOM import logic is seriously messed up; it actually asks you to specify the GEDCOM’s character encoding, something it should really not ask at all, before asking you to select a GEDCOM file…

Embla Family Treasures Easy Import

GEDCOM source

As if asking users to specify the character encoding, even doing so before they have selected a GEDCOM file to import, is not confusing enough, the same dialog asks the user to select a GEDCOM source; the first and default choice is GEDCOM, but all the database formats it can read directly are the other choices. It is not clear whether Family Treasures expects you to opt for direct import instead of GEDCOM, or expects you to specify the GEDCOM dialect of the GEDCOM file you have not chosen yet. Either way, this main-window-as-dialog is messed up.

Because the Character set option defaults to From GEDCOM as it should, and the GEDCOM source option defaults to From GEDCOM too, you can simply ignore this particular main-window-as-dialog, and immediately move on to choosing a file to import.

The one thing that’s easy about the Easy Gedcom option is that, apart from the one messed up main-window-as-dialog just discussed, there are no more options; you can just choose Start, and Family Treasures will display the dialog with import messages scrolling by.

Once the import is quasi-done, you are prompted to perform the place name and the database audit phase.

import timing choices

I’ve done all import measurements using the Easy import option, opting to split place names automatically. I do not recommend that option, but did want to measure import times for a complete import that includes all import phases.

According to Family Treasures itself, the audit process should be run until there are no errors anymore. After considering the issue for a while, I decided to always run this phase of the import process just once.

Embla hurts Family Treasures usability and performance by pausing and demanding user input during import. I’ve tried to respond quickly. Embla can avoid the inclusion of these minor delays by improving the import experience.

1 MB GEDCOM

On my 2.4 GHz quad-core Vista machine, Family Treasures claims to be done in 10m58s, but the actual import time is 12m23s. That is an import speed of barely 6,5 individuals per second.

On my 2.7 GHz single-core Windows XP machine, Family Treasures claims to be done in 21m58s, but actually takes 23m31s. That is an import speed of less than 3,5 individuals per second.
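
The gap between claimed and actual import time is easy to quantify. The following Python sketch converts the durations quoted above to seconds and subtracts them; the helper name `to_seconds` and the dictionary labels are mine, the four durations are the ones measured above.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '12m23s' or '2h20m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# (claimed, actual) import times for the 1 MB GEDCOM, as reported above.
MEASUREMENTS = {
    "Vista quad-core": ("10m58s", "12m23s"),
    "Windows XP single-core": ("21m58s", "23m31s"),
}


def unreported_seconds():
    """Seconds of import time that the claimed figure leaves out."""
    return {
        machine: to_seconds(actual) - to_seconds(claimed)
        for machine, (claimed, actual) in MEASUREMENTS.items()
    }


print(unreported_seconds())
```

On both machines, roughly a minute and a half of import time goes unreported.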


100k INDI GEDCOM

When I tried to import the 100k INDI GEDCOM on the quad-core Vista machine, import of GEDCOM records aborted after about 2 hours and 20 minutes, at which point only 9.350 individuals had been imported.

The last message in the import log file is Birth: The import terminated because data was found that could not be handled. The line with the birth date is 2 DATE 1947, so it seems hard to believe that Family Treasures really had any problem processing that. All other lines in the vicinity of that one look just fine too.

Although this phase failed with a fatal error, the import continued with the place name and audit phase as if nothing was wrong, but Family Treasures crashed when I tried to start the audit phase.

By the way, 9.350 individuals in 2h20m is 9.350 individuals per 140 minutes, that is 1,113 individuals per second - on a 2.4 GHz quad-core with 4GB RAM. So, if all phases had completed properly, the import speed would probably be less than one individual per second; a truly stunning unaccomplishment.

tiny test file

When I created a smaller set of data including the person born in 1947 and surrounding family, just some 500 individuals in total, Family Treasures complained Not a valid GEDCOM file - operation cancelled. That complaint is in error. The GEDCOM file I tried to import was just fine; the problem I encountered is that Family Treasures does not support UTF-8 encoded GEDCOM files. If the file starts with a Byte Order Mark (as it should, and PAF always writes), it even fails to recognise the file as GEDCOM. That is a rather basic fail that I simply do not expect from an application that is already at version 8.

When I removed the BOM from the GEDCOM file to get past this Family Treasures limitation, it seemed to import just fine. So, despite the log file message, the failure to import seems unrelated to any birth date.

That import was not really fine. Family Treasures does not support UTF-8, yet it imported the file without even an error or warning, which implies that it bluntly imported the UTF-8 file using another character encoding, thus mangling all text upon import.
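
Both halves of this problem are easy to illustrate. The Python sketch below first recognises a GEDCOM file regardless of a leading Byte Order Mark, and then shows what happens when UTF-8 bytes are read using a single-byte encoding. The function names are hypothetical, and Windows-1252 is only my assumption for the other character encoding; I do not know which one Family Treasures actually applies.

```python
import codecs


def sniff_bom(raw):
    """Return (codec name, BOM length) for a leading BOM, or (None, 0)."""
    for bom, codec in (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if raw.startswith(bom):
            return codec, len(bom)
    return None, 0


def looks_like_gedcom(raw):
    """True if the data starts with a GEDCOM header line, BOM or no BOM."""
    codec, skip = sniff_bom(raw)
    start = raw[skip:skip + 64].decode(codec or "latin-1", errors="replace")
    return start.lstrip().startswith("0 HEAD")


def mangle(text, wrong_codec="cp1252"):
    """Show text as it appears when its UTF-8 bytes are read as wrong_codec."""
    return text.encode("utf-8").decode(wrong_codec)


print(looks_like_gedcom(codecs.BOM_UTF8 + b"0 HEAD\r\n1 CHAR UTF-8\r\n"))
print(mangle("Hundvåg"))
```

Plain ASCII survives such a misreading unchanged, which is why a file with few non-ASCII characters can seem to import just fine.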

Windows XP

I tried the import on my old Windows XP machine. It took about 3h02m to fail in exactly the same way; it produced the same fatal error on individual number 9350, once again happily continued with the place name phase, and once again crashed when the audit phase was started.

Family Treasures is one of those applications that reopen the project you last worked on, and after a crash it tries to continue with whatever it thinks is useful or necessary, which does not make any sense after a fatally failed import. This behaviour is rather annoying. I found that the easiest way to prevent it is to delete the project directory.

slowest?

The import speed until the fatal failure is 9.350 individuals in 182 minutes, and that works out to 51,374 individuals per minute, just 0,856 individuals per second. That would make this the slowest desktop GEDCOM import measured so far.
However, this is not a slow import, this is a failed import.

The import of the 5K INDI GEDCOM is slow, and indeed one of the slowest desktop GEDCOM imports I ever measured. It is slower than Family Tree Maker 2008, slower than Hereditree 2008 and slower than WinFamily 7. Of all the applications measured so far, only Genea 1.4.1 managed an even more epic waste of CPU cycles on this basic task.

why

I do not know why the import fails. The error message does not make sense, so I am guessing that Family Treasures has somehow confused itself already, and this is merely how it shows.

another try

I decided to try a medium sized file, one with 34.3111 individuals, which MudCreek GENViewer loads in about half a second, just to see what would happen.

After seven hours, the progress indicator claimed the import was 33% done. After almost exactly twelve hours, the Windows XP system crashed. I am not positive that the crash is related to some error in Family Treasures. I am sure that the Family Treasures log file showed that it had imported 17.818 individuals in those twelve hours.

Family Treasures still needed to perform several more import phases, but let’s keep the calculation simple; an import speed of 17.818 individuals per 12 hours works out to 17.818 / 43.200 individuals per second. That is 0,412 individuals per second, surely better expressed as 2,424 seconds per individual.
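
For those who want to check the arithmetic, here is a small Python sketch that computes both individuals per second and seconds per individual for the three incomplete imports discussed in this article. The function name `rate` and the labels are mine; the counts and durations are the ones reported above.

```python
def rate(individuals, seconds):
    """Return (individuals per second, seconds per individual)."""
    per_second = individuals / seconds
    return per_second, 1 / per_second


# (individuals imported, elapsed seconds) for each incomplete import.
RUNS = {
    "100k INDI GEDCOM, Vista, aborted after 2h20m": (9350, 140 * 60),
    "100k INDI GEDCOM, Windows XP, aborted after 3h02m": (9350, 182 * 60),
    "medium file, Windows XP, crashed after 12h": (17818, 12 * 3600),
}

for label, (individuals, seconds) in RUNS.items():
    per_second, seconds_each = rate(individuals, seconds)
    print(f"{label}: {per_second:.3f} individuals/s, {seconds_each:.3f} s/individual")
```

Note that Python prints a decimal point where this article uses a decimal comma.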

The Embla Family Treasures GEDCOM import is so slow, that its tardiness is better expressed in seconds per individual than individuals per second.

prohibitively slow

This additional result really drives home that even if Embla Family Treasures is able to import a large file at all, it would take prohibitively long to do so. The import speed is so embarrassingly slow that this one aspect alone is a strong indicator that Embla employees never use their own product. It certainly proves that, if they ever bother to test this at all, they never tested with anything but minuscule GEDCOM files.

The bottom line remains that Embla Family Treasures failed to import the large file.

character encoding

The GEDCOM import speed is a serious issue, but not everything about the GEDCOM import is so bad. Embla Family Treasures supports three of the four legal GEDCOM character encodings; ASCII, ANSEL and 16-bit UNICODE (properly known as UTF-16). Weirdly, although it supports UTF-16, support for UTF-8 is missing. Family Treasures does additionally support the GEDCOM-illegal but still often seen IBMPC and Windows ANSI character sets.
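
The encodings just listed can be summarised as a small lookup table. In the Python sketch below, the code page choices for ANSI (Windows-1252) and IBMPC (code page 437) are my assumptions; that ambiguity is precisely why these two values are problematic. Python's standard library has no ANSEL codec, so that entry is left empty. The names `CHAR_TO_CODEC`, `LEGAL_CHAR_VALUES` and `describe_char` are hypothetical.

```python
# GEDCOM CHAR value -> Python codec name (None: no codec in the standard library).
CHAR_TO_CODEC = {
    "ASCII": "ascii",
    "ANSEL": None,       # would need a third-party or hand-written codec
    "UNICODE": "utf-16",
    "UTF-8": "utf-8",
    "ANSI": "cp1252",    # assumption: Western European Windows code page
    "IBMPC": "cp437",    # assumption: original IBM PC code page
}

# The four encodings referred to above as legal.
LEGAL_CHAR_VALUES = {"ASCII", "ANSEL", "UNICODE", "UTF-8"}


def describe_char(value):
    """Look up a CHAR header value; raise ValueError for unknown values."""
    key = value.strip().upper()
    if key not in CHAR_TO_CODEC:
        raise ValueError(f"unknown CHAR value: {value!r}")
    return {"legal": key in LEGAL_CHAR_VALUES, "codec": CHAR_TO_CODEC[key]}


print(describe_char("UNICODE"))
print(describe_char("IBMPC"))
```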


GEDCOM export

The Embla Family Treasures GEDCOM export is easier than the GEDCOM import. The GEDCOM export supports the same character encodings as the GEDCOM import.

Alas, Embla Family Treasures fails my superficial GEDCOM export examination as hard as Family Tree Maker 200x; like Family Tree Maker 200x, Embla Family Treasures does not even get the GEDCOM header right.

The header contains illegal GEDCOM tags Date and Time instead of the legal tags DATE and TIME. It also uses the CORP (corporation) tag where they clearly intended to use the COPR (copyright) tag. A simple transposition of characters, but one that you do not expect in an application that is at version 8 already. You expect their testing to have weeded such errors out many versions ago already.


0 HEAD
1 SOUR Embla_Familie_og_Slekt
2 VERS 8.0.30
2 NAME Embla Family Treasures
2 CORP Embla Norsk Familiehistorie AS
3 ADR1 Bj²rn²ygeilen 21
3 CITY Hundvêag
3 POST 4085
3 CTRY Norway
3 PHON +47-5189 1307
2 Data Test
3 CORP Copyright 2009 of FirstName LastName
3 Date 10 SEP 2009
2 Time 16:53:15
1 SUBM @SUB1@  
1 FILE FileName.GED
1 CORP The content of this Gedcom file is copyright of FirstName LastName
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 LANG English (British)
0 @SUB1@ SUBM
1 NAME FirstName LastName

Inclusion of illegal tags in the GEDCOM header is a serious error. Every other genealogy application can and should reject Embla Family Treasures GEDCOM files as invalid as soon as it has processed this header.
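
Errors like these are trivial to catch automatically. The Python sketch below is a deliberately minimal check, not a GEDCOM validator: it flags tags that are not upper case, and flags CORP wherever it is not subordinate to SOUR, on my assumption that the header's source system record is the only place where CORP belongs. The function name `check_header` is hypothetical; the sample lines are taken from the header shown above.

```python
import re

# level, optional @XREF@, tag, optional value
LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")

SAMPLE = [
    "0 HEAD",
    "1 SOUR Embla_Familie_og_Slekt",
    "2 CORP Embla Norsk Familiehistorie AS",
    "2 Data Test",
    "3 CORP Copyright 2009 of FirstName LastName",
    "3 Date 10 SEP 2009",
    "2 Time 16:53:15",
    "1 CORP The content of this Gedcom file is copyright of FirstName LastName",
    "1 CHAR ANSEL",
]


def check_header(lines):
    """Return (line number, problem) pairs for suspicious header tags."""
    problems = []
    parents = {}  # level -> most recent tag seen at that level
    for number, line in enumerate(lines, start=1):
        match = LINE.match(line.rstrip())
        if not match:
            problems.append((number, "not a GEDCOM line"))
            continue
        level, tag = int(match.group(1)), match.group(3)
        if tag != tag.upper():
            problems.append((number, f"tag {tag!r} is not upper case"))
        if tag == "CORP" and parents.get(level - 1) != "SOUR":
            problems.append((number, "CORP outside SOUR; was COPR intended?"))
        parents[level] = tag
    return problems


for number, problem in check_header(SAMPLE):
    print(f"line {number}: {problem}")
```

Even a check this crude flags five of the nine sample lines.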

header encoding

Notice the ADR1 and CITY lines in the Embla Family Treasures GEDCOM header. Those lines are not in error. This is how ANSEL-encoded lines show up in an editor such as NotePad that does not support ANSEL. If NotePad supported ANSEL, you would see that it says:


3 ADR1 Bjørnøygeilen
3 CITY Hundvåg

Family Treasures’ encoding of these two lines is correct. When you choose another encoding for Embla’s GEDCOM export, the header uses that other encoding too, and that is not correct. The GEDCOM specification states that whatever encoding the file uses, the header should be encoded in ANSEL. Most, if not all, vendors seem to agree that this is an error in the GEDCOM specification, and that the entire file should use just one encoding, the one specified in the header.


Unicode GEDCOM

What truly convinced me that Embla never bothered to test the GEDCOM export at all is its ostensible Unicode (UTF-16) output. NotePad supports Unicode but somehow did not show the GEDCOM file as it should; it seemed to think the file is encoded in Windows ANSI, and displayed spaces between all consecutive letters. Now, NotePad relies on a Windows function that is pretty good at recognising text encodings in the absence of a Byte Order Mark, so this is unusual.

Embla Family Treasures Unicode GEDCOM in NotePad

I soon discovered Embla’s mistake, one that even the simplest test (such as trying to import into PAF) would have revealed: all the lines are in Unicode, but the CR/LF linefeed is not! The result is that the file is not in any valid encoding at all. Embla Family Treasures’ ostensible Unicode GEDCOM export actually exports gibberish.
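
The effect of such a mistake is easy to reproduce. The Python sketch below assumes little-endian UTF-16 without a Byte Order Mark, and assumes the line terminator is written as the two single bytes CR and LF; I have not verified the exact bytes Family Treasures writes, so take it as an illustration of the class of error. The function names are hypothetical.

```python
LINES = ["0 HEAD", "1 CHAR UNICODE", "0 TRLR"]


def export_correct(lines):
    """Encode the text and its line terminators together as UTF-16."""
    return "".join(line + "\r\n" for line in lines).encode("utf-16-le")


def export_broken(lines):
    """Encode each line as UTF-16, but append CR LF as two single bytes."""
    return b"".join(line.encode("utf-16-le") + b"\r\n" for line in lines)


good = export_correct(LINES).decode("utf-16-le")
bad = export_broken(LINES).decode("utf-16-le")

print(len(good.splitlines()))  # number of lines a UTF-16 reader finds
print(len(bad.splitlines()))
print(ascii(bad))              # shows what took the place of the line breaks
```

Under these assumptions, a reader that treats the file as UTF-16 sees the two terminator bytes as one 16-bit code unit that is not a line break at all, so the records run together instead of ending where they should.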

If you do try to import this into PAF, which supports Unicode GEDCOMs, PAF reports ERROR 1: ExportTestUnicode.GED, line 1: Missing delimiter, and then repeats that error for every other line. That is pretty clear error reporting. That Embla Family Treasures contains this mistake regardless is practically proof that they never performed this test.

conclusion

convoluted import

Family Treasures’ GEDCOM import is weird, and needlessly complex. It seems to import data into a version 7 database and then run an upgrade process to turn it into a version 8 database. The quality of the import process is so poor that Family Treasures includes a database audit to clean up the errors earlier phases made. Because of the low quality of the earlier phases, that final phase is necessary, yet Embla allows the user to cancel that phase…

The awful GEDCOM import experience is hardly acceptable in proof-of-concept code, yet Embla ships it as part of a commercial application.

log file

Embla Family Treasures writes an import log file and what it contains is pretty good, but it only contains messages for the first few import phases, and lacks messages for the final phases. It also claims an import time that is considerably lower than the actual import time, because it is actually the total time for just some of the import phases.

import speed

Embla Family Treasures failed to import the 100k INDI GEDCOM.
Perhaps that is just as well, as the Embla Family Treasures GEDCOM import is so stunningly slow that it actually makes the already remarkably slow GEDCOM import of applications such as Family Tree Maker 200x, Hereditree 2008 and WinFamily 7 look good in comparison.
Its GEDCOM import is actually so slow that it is better expressed in seconds per individual than individuals per second.

character encoding

Embla Family Treasures supports ASCII, ANSEL and UNICODE (UTF-16), but lacks support for UTF-8. It fails to recognise properly encoded UTF-8 GEDCOM as GEDCOM, and does not warn you about its limitation, but imports UTF-8 GEDCOM files using some other encoding, thus mangling your data.
It does support the GEDCOM-illegal IBMPC and ANSI encodings.

GEDCOM export

The GEDCOM output contains multiple errors so basic that it is clear that Embla never bothered to test the GEDCOM export at all. The UNICODE output is not UTF-16, but gibberish. The ANSEL output is ANSEL, but its GEDCOM header contains both illegal and erroneous tags.

overall

The GEDCOM support in Embla Family Treasures is unreliable, slow and defective. The GEDCOM import crashes easily, mangles UTF-8 files and cannot handle a large file. The GEDCOM export produces invalid GEDCOM and its Unicode export does not seem to have been tested at all. The overall quality of the GEDCOM support is so far below par that Embla should consider changing its name to Embarrassing.

import speed

Embla Family Treasures 8.0.30 (Windows XP PC)

file               1 MB GEDCOM    100k INDI GEDCOM
time               23m31s         -
time in seconds    1.411          -
INDI per second    3,460          -
bytes per second   748,330        -

Embla Family Treasures 8.0.30 (Vista PC)

file               1 MB GEDCOM    100k INDI GEDCOM
time               12m23s         -
time in seconds    743            -
INDI per second    6,540          -
bytes per second   1.421,120      -
