Modern Software Experience

2010-09-26

import issues

e-depot

StamboomNederland allows online editing, but is first and foremost an e-depot: an electronic depot for safekeeping of genealogical research. It is a place to store research so that it isn't lost and future researchers may benefit from it.

Researchers can import their current database into StamboomNederland via either GEDCOM or XML. The particular XML format used is specific to StamboomNederland. It is not supported by any other application. The XML format used is not documented yet, and it remains to be seen whether any genealogy application will adopt it. Right now, GEDCOM import is the only practical option.

GEDCOM import

The buttons to import either a GEDCOM file or an XML file are not available in the Dashboard or Projects view, where you would expect them. They are on the Sources view.

StamboomNederland Sources

This illogical placement seems a minor oddity, but actually hints at a colossal conceptual mistake: the third-party developers think of GEDCOM files as a source for the data they contain. No, that doesn't make any sense.

Another oddity is that choosing the IMPORT GEDCOM button does not bring up a file selection dialog box as you might expect, but only leads to an Import GEDCOM File page where you must click another button to bring up a file selection dialog box.

A minor issue with that button is that it is titled Choose File while it should be titled Select file, as it only lets you make a selection. Once you have selected a file, its file name appears next to the button. You can then import that file by choosing the IMPORT button.

StamboomNederland Import GEDCOM File

import into current project

StamboomNederland allows importing files into the current project. An import adds to the data that is already there. Some early users have complained about this; they expected each new import to replace the data that is already there. When they imported an updated version of their database, they found that they had many duplicate individuals.

Several web-based systems replace the current data when you import new data. That is a logical approach for the way such systems are used; users edit their data on their desktop, and then upload it to share the new version of their database with the world.
Now, StamboomNederland isn't just a site for hosting your data. StamboomNederland allows online editing. StamboomNederland isn't an online viewer, but an online editor, and it makes sense for an editor to allow import of data into existing projects.

e-depot

Then again, an e-depot should allow you to update your database by uploading the latest version from your PC. It should perhaps keep one or two previous versions, but it should definitely not demand that you create a new project. Demanding that users create a new project every time they want to upload a new version breaks other StamboomNederland functionality. From the moment you upload your data for the first time, other users may have bookmarked your project. Those bookmarks will not be updated automatically, so they continue to link to the old project, which either contains old data, including mistakes you have since corrected, or has been deleted. Thus, not being able to upload and import an updated database weakens the ability to cooperate.

It may make some sense for StamboomNederland GEDCOM import to default to adding the new data, but it should definitely support replacement of a previous version.
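
The difference between the two import behaviours can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the Person record, the function names and the sample names are mine, and nothing here reflects how StamboomNederland stores its data. It only shows why adding an updated upload to an existing project duplicates individuals, while replacing does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    xref: str  # GEDCOM cross-reference id, e.g. "@I1@"
    name: str


def import_append(project, incoming):
    """Add every incoming record to what is already in the project."""
    return list(project) + list(incoming)


def import_replace(project, incoming):
    """Discard the previous version and keep only the new upload."""
    return list(incoming)


# First upload, then an updated version with one correction and one new person.
v1 = [Person("@I1@", "Jan /Jansen/"), Person("@I2@", "Piet /Pietersen/")]
v2 = [
    Person("@I1@", "Jan /Janssen/"),
    Person("@I2@", "Piet /Pietersen/"),
    Person("@I3@", "Klaas /Klaassen/"),
]

appended = import_append(v1, v2)   # old and new versions side by side
replaced = import_replace(v1, v2)  # only the new version remains
```

With append, the project ends up with five records for three people; with replace, it holds exactly the three records of the latest upload.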

import process

upload

The import process starts when you choose the IMPORT button.
StamboomNederland does not display an immediately noticeable progress bar in the middle of the page, but it does display the upload progress on the browser's status bar.

StamboomNederland Upload GEDCOM Success

Read carefully; this Import GEDCOM File success page claims the file was uploaded successfully. It does not claim the file has been imported yet.

Once the upload is done, StamboomNederland displays a success screen. That success screen directs you to the processes overview for more information. The blue Look in the processes overview for more information text is clickable. It leads to the Import and export Processes page. This page displays the status of each import and export process.

import status

Once a GEDCOM has been uploaded, its status initially becomes Waiting; it has to wait for earlier imports to complete. Once a file gets its turn, the import status becomes Started, and once the import is done the status becomes Finished. At least, that is how it happens when you perform multiple imports into the same database. When you import GEDCOM files into different databases, the import processes can run in parallel.

StamboomNederland processes: Waiting Started Finished

When the import of a file is finished, the Import and export Processes page shows an information icon for that file. You can click that icon to view an import log. You can click the downward pointing green arrow to download your GEDCOM again.

upload speed

With web servers, the import speed of files does not depend on the desktop PC, but on the connection and the server.

Upload of the 1 MB GEDCOM took about 10 seconds. The upload of the 100K INDI GEDCOM took about six minutes, after which the server seemed to crash and restart automatically; first, no StamboomNederland page would display anymore, then my session had expired. Development on StamboomNederland isn't finished, and what seems to have happened is that the server restarted for an upgrade as soon as the upload was finished.
I was not sure of that, so I uploaded the file again. Uploading HundredThousand.ged took 6 minutes and 5 seconds, and it took another 10 seconds before the success page displayed. That is a total upload time of 6 minutes and 15 seconds.

import

I initially thought that the server was showing import progress, but it was only showing upload progress. Actual import of the GEDCOM into the database happens after that, and takes longer. The Import and export Processes page showed that the import was not Finished yet, but had only Started, and that the next import was Waiting.

The 1 MB GEDCOM imported just fine, but the import of HundredThousand.ged never completed. It failed.

log file?

The import is supposed to produce a log file, but I doubt that it really is a log file. If it were a real log file, there would have been one for HundredThousand.ged that showed how far the import got, but there was no log file at all. Log files are particularly useful when an import fails, as a log file may provide information about what succeeded and what went wrong. It seems that StamboomNederland actually writes a post-import report instead of a log file, a report that is only created after a successful import, which largely defeats the purpose of making the file in the first place…
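
To make the distinction concrete, here is a minimal Python sketch of what I mean by a real log: each step is written out as it happens, so a failed import still leaves a trail. The step names, the run_import function and the simulated failure are all hypothetical; this is not how StamboomNederland is implemented.

```python
import io
import logging


def run_import(steps, log_stream):
    """Run import steps in order, logging each one as it happens."""
    logger = logging.getLogger("gedcom_import_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(log_stream)
    handler.setFormatter(logging.Formatter("%(levelname)s : %(message)s"))
    logger.addHandler(handler)
    name = None
    try:
        for name, step in steps:
            logger.info("Start %s", name)
            step()
            logger.info("Finish %s", name)
        return True
    except Exception as exc:
        # The lines already written show how far the import got.
        logger.error("Aborted in %s: %s", name, exc)
        return False
    finally:
        logger.removeHandler(handler)


def out_of_memory():
    # Simulated failure, standing in for a server running out of memory.
    raise MemoryError("out of memory")


stream = io.StringIO()
ok = run_import(
    [("read header", lambda: None), ("map individuals", out_of_memory)],
    stream,
)
log_text = stream.getvalue()
```

After the simulated failure, log_text still contains the Start and Finish lines of the first step, the Start line of the failing step and a final error line. A report written only after success would contain nothing.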

contents

The report that StamboomNederland creates is of little use to genealogists. It does not list illegal tags, and it does not warn that values are too large for the database or report anything else that tells you how well it processed the GEDCOM. It tells which steps the import process executed and how much time each step took. For some record types, it also tells how many records it encountered and how many of these were processed without error. As mentioned in other reviews, other applications do find things to complain about in this file, but StamboomNederland does not, which makes me wonder how much error detection it does.


 13-09-2010  02:20:26  : Start mapIndividualElementsAndPersistCitations
 13-09-2010  02:20:26  : Number of INDI tag found in this file : 4862
 13-09-2010  02:20:27  : Number of INDI tag processed without error : 4862
 13-09-2010  02:20:27  : nl.snl.domain.source.Citation to persist : 4862
 13-09-2010  02:20:27  : Start persisting nl.snl.domain.source.Citation
 13-09-2010  02:20:28  : Finish persisting nl.snl.domain.source.Citation
 map individual records time (millisec)                 : 1695

This extract shows what the StamboomNederland report looks like.
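
Because the report lines follow a regular pattern, the counts and timestamps can be pulled out mechanically. The following is a minimal Python sketch, assuming every timestamped line has the day-month-year layout of the extract above; the parse_report function and its regular expressions are my own and are not part of StamboomNederland.

```python
import re
from datetime import datetime

# A timestamped report line: date, time, colon, message.
STAMPED = re.compile(r"^\s*(\d{2}-\d{2}-\d{4})\s+(\d{2}:\d{2}:\d{2})\s+:\s+(.*)$")
# A record count line, e.g. "Number of INDI tag found in this file : 4862".
COUNT = re.compile(
    r"^Number of (\w+) tag (found in this file|processed without error) : (\d+)$"
)

EXTRACT = """\
 13-09-2010  02:20:26  : Start mapIndividualElementsAndPersistCitations
 13-09-2010  02:20:26  : Number of INDI tag found in this file : 4862
 13-09-2010  02:20:27  : Number of INDI tag processed without error : 4862
 13-09-2010  02:20:27  : nl.snl.domain.source.Citation to persist : 4862
 13-09-2010  02:20:27  : Start persisting nl.snl.domain.source.Citation
 13-09-2010  02:20:28  : Finish persisting nl.snl.domain.source.Citation
 map individual records time (millisec)                 : 1695
"""


def parse_report(text):
    """Return (found, clean, first, last): counts per tag and the time span."""
    found, clean = {}, {}
    first = last = None
    for line in text.splitlines():
        stamped = STAMPED.match(line)
        if not stamped:
            continue  # untimed lines, such as the millisecond summary
        when = datetime.strptime(
            stamped.group(1) + " " + stamped.group(2), "%d-%m-%Y %H:%M:%S"
        )
        first = first or when
        last = when
        count = COUNT.match(stamped.group(3).strip())
        if count:
            target = found if count.group(2).startswith("found") else clean
            target[count.group(1)] = int(count.group(3))
    return found, clean, first, last


found, clean, first, last = parse_report(EXTRACT)
```

For the extract, both dictionaries map INDI to 4862, and the difference between last and first is two seconds.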

The file that StamboomNederland creates isn't a GEDCOM import log, it is an import process action report. To the average user, the report is useless.
It is nice that the report shows which import steps have succeeded, but that is only useful in a real log file, where a premature end to these messages indicates which step failed. The report does contain some bonus information I did not expect but like to see: timings for each step and the total import time. This allows calculating the import speed.

import speed

In a Dutch PDF about importing GEDCOM files, the CBG states that importing a GEDCOM file containing 5.000 individuals should normally take about five minutes, which is a mere 16,67 individuals per second. That would be extremely slow, but the total import time you experience is the upload time plus a wait time plus the actual import time. The wait time may depend on how busy the server is. Naturally, the CBG does not want to set expectations too high and provides a conservative estimate.

The import report that StamboomNederland produces does not list the upload time or the wait time, only the time it took the import process to handle the file. For the 1 MB GEDCOM, that time is 71 seconds. The file contains 4862 individuals, so that is an import speed of 4862 / 71 = 68,48 individuals per second.
Many applications I tried on my now seven-year-old single-core desktop PC running Windows XP manage an import speed of more than a thousand individuals per second for this file. The StamboomNederland import process is running on a brand new server, yet does not get close to one-tenth of that. Even Family Tree Maker 2009 on that old PC beats StamboomNederland on its brand new server. StamboomNederland GEDCOM import is embarrassingly slow.
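
The arithmetic behind these figures is simple enough to check in a few lines of Python. The function name is mine; the numbers are the ones quoted above, with a flat 1000 individuals per second standing in for "more than a thousand".

```python
def individuals_per_second(individuals, seconds):
    """Import speed as individuals handled per second."""
    return individuals / seconds


# CBG estimate: 5000 individuals in about five minutes.
cbg_estimate = individuals_per_second(5000, 5 * 60)

# Measured: the 1 MB GEDCOM, 4862 individuals in 71 seconds.
measured = individuals_per_second(4862, 71)

# Lower bound for the desktop applications mentioned above.
desktop = 1000

print(f"CBG estimate : {cbg_estimate:.2f} individuals/s")
print(f"Measured     : {measured:.2f} individuals/s")
print(f"Fraction of desktop speed: {measured / desktop:.3f}")
```

The measured speed works out to less than one-tenth of the desktop lower bound.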

StamboomNederland: Out of Memory

During the import, several pages, including the profile page (shown above), would not display anything but a so-called exception report. That exception report provides programmers with a lot of information about what went wrong and where. Notice the first line of the root cause section, visible just above the browser's bottom window border: java.lang.OutOfMemoryError. Apparently, the StamboomNederland GEDCOM import process is so immensely inefficient that it was already taking all the server's memory, leaving nothing for the other server processes.

temporary upload limit

I informed the Central Bureau of Genealogy about this show-stopper. A few hours after my attempt to import HundredThousand.ged, the CBG limited GEDCOM uploads to files of 20 MB. That is roughly 40.000, maybe 45.000 individuals. The announcement of this limitation admits that the server has trouble handling current traffic, and that the slowdown was largely caused by the upload of a few large GEDCOM files. Thus, their decision seems to confirm my impression that the GEDCOM import is very memory-inefficient, causing trouble for other server processes. StamboomNederland cannot handle large files.

The import of HundredThousand.ged never finished. The import was aborted, yet the Import and export Processes page continues to show the import as Started. It seems the programmers forgot to account for aborted import processes with an Aborted state.
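
A minimal Python sketch of what such a status model could look like follows. Waiting, Started and Finished are the states the site shows; the Aborted state, the transition table and the function names are hypothetical additions of mine, not StamboomNederland code.

```python
from enum import Enum


class ImportStatus(Enum):
    WAITING = "Waiting"
    STARTED = "Started"
    FINISHED = "Finished"
    ABORTED = "Aborted"  # hypothetical: the state the site lacks


# Which status may follow which; Finished and Aborted are final.
ALLOWED = {
    ImportStatus.WAITING: {ImportStatus.STARTED, ImportStatus.ABORTED},
    ImportStatus.STARTED: {ImportStatus.FINISHED, ImportStatus.ABORTED},
    ImportStatus.FINISHED: set(),
    ImportStatus.ABORTED: set(),
}


def advance(current, new):
    """Move to a new status, rejecting transitions that make no sense."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def run(job):
    """Run one import job and return its final status."""
    status = advance(ImportStatus.WAITING, ImportStatus.STARTED)
    try:
        job()
    except Exception:
        # A failed job ends as Aborted instead of staying Started forever.
        return advance(status, ImportStatus.ABORTED)
    return advance(status, ImportStatus.FINISHED)


def failing_job():
    raise MemoryError("out of memory")


good = run(lambda: None)
bad = run(failing_job)
```

The point of the design is that every import ends in a final state, so the processes overview never shows a dead import as still running.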

The limitation is temporary, presumably until the extraordinary memory hunger of the GEDCOM import process has been reduced to more reasonable proportions. I am not privy to the CBG's discussions with the commercial company that created StamboomNederland, but I am pretty sure that the CBG is not yet serving champagne to celebrate the successful completion of StamboomNederland.

no delete

There is more to tell about GEDCOM import, but I'll end this overview with another major issue. When you decide to import a GEDCOM into a project, StamboomNederland will first upload the GEDCOM and then import the GEDCOM into that project. You can do with that project whatever you want, and that includes deleting it. Even after deletion of the entire project, the GEDCOM file you uploaded remains on the StamboomNederland server. You can delete the project, but you cannot delete the GEDCOM. The downward-pointing green arrows on the Import and export Processes page continue to allow you to download the GEDCOM you uploaded. There is no trash icon to delete the GEDCOM files you've uploaded. StamboomNederland keeps all your GEDCOM files forever; it does not delete your GEDCOM files after import, and does not allow you to delete them later either.

links