Modern Software Experience

2012-11-27

Switching from PAF to RootsMagic

RootsMagic does not import my PAF database correctly. That's because I messed up, RootsMagic messed up, and, first of all, FamilySearch messed up.

Personal Ancestral File

I've mentioned it on occasion, so regular readers are aware that I've been using Personal Ancestral File (PAF) for years.
That isn't so odd; the article Why PAF is still popular lists many reasons why the decade-old PAF remains popular with many users.
Still, RootsMagic has won the GeneAward for Best Genealogy Product three years in a row, so it makes sense to consider a switch to RootsMagic. One major reason I have not switched to RootsMagic yet is that RootsMagic does not import my PAF database correctly. That's because I messed up, RootsMagic messed up, and, first of all, FamilySearch messed up.

multiple names

One issue you come across in genealogical research is that one person may have multiple names. Often the different names are just different spellings of what's really the same name, but truly different names are not rare either. Genealogy software supports this by allowing you to enter multiple names. You generally designate one particular name as the main name, and the others as alternates.

RootsMagic does an above average job of supporting multiple names; while most genealogy software will show just the main name in its name list, RootsMagic can show the alternative names as well. This is one of the reasons I'd like to switch from PAF to RootsMagic, as the properly sorted alternative names will help me spot duplicate individuals.

GEDCOM

The de facto standard for genealogical data interchange is GEDCOM 5.5.1, which, like PAF, is published by FamilySearch. As A Gentle Introduction to GEDCOM explains in a bit more detail, GEDCOM is not perfect, and the various GEDCOM implementations do have idiosyncrasies and shortcomings.

GEDCOM includes support for multiple names, and it's very simple; just list the main name first, followed by all the alternatives. That was in the GEDCOM specification in the previous millennium already, yet even today many applications still do not support it.
The GEDCOM 5.5.1 specification says this about it:

Multiple Names

GEDCOM 5.x requires listing different names in different NAME structures, with the preferred instance first, followed by less preferred names. However, Personal Ancestral File and other products that only handle one name may use only the last instance of a name from a GEDCOM transmission. This causes the preferred name to be dropped when more than one name is present. The same thing often happens with other multiple-instance tags when only one instance was expected by the receiving system.
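To make the rule concrete, here is a minimal Python sketch that reads the level 1 NAME lines of a single individual record in file order. It assumes the record has already been cut out of the file as a list of lines, and it ignores CONC/CONT continuation lines; the function names and the example record are made up for illustration. The first name is the preferred one, and the last variable shows what a receiving system that keeps only the last instance ends up with, which is the behaviour the quote above warns about.

```python
def parse_line(line):
    """Split one GEDCOM line into (level, tag, value), skipping any @xref@ id."""
    fields = line.strip().split(" ")
    level = int(fields[0])
    rest = fields[1:]
    if rest and rest[0].startswith("@"):  # e.g. "0 @I1@ INDI"
        rest = rest[1:]
    return level, rest[0], " ".join(rest[1:])


def names_of(record_lines):
    """Return the level 1 NAME values of one INDI record, in file order."""
    return [value for level, tag, value in map(parse_line, record_lines)
            if level == 1 and tag == "NAME"]


record = [
    "0 @I1@ INDI",
    "1 NAME Johann /Schmidt/",  # preferred name comes first
    "1 NAME John /Smith/",      # alternate name
    "1 SEX M",
]

names = names_of(record)
preferred, alternates = names[0], names[1:]
last_only = names[-1]  # all a one-name importer keeping the last instance retains
print(preferred, alternates, last_only)
```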

Personal Ancestral File

The root of the problems is that Personal Ancestral File does not support multiple names as it should. The above quote from FamilySearch's own GEDCOM specification admits as much. With FamilySearch's own PAF not setting the right example, in fact even losing data on import, few other vendors bothered to implement multiple names correctly.

RootsMagic supports multiple names correctly.
Now, if all you knew about PAF was the above quote from the GEDCOM specification, you might think that PAF does not support alternative names at all. The truth is worse: PAF does support multiple names, but does so in a non-standard way.

Also Known As

PAF does not simply support multiple names, as it should, but instead supports both a Full Name and an Also Known As field. You use the Full Name field for recording the preferred name, and the Also Known As field for recording alternate names. That PAF way of doing things is not the standard way of doing things, and that's a problem.

When you export a PAF database to GEDCOM, the Also Known As field gets exported as the _AKA tag. The Also Known As field is a non-standard field, so PAF uses a non-standard tag. All non-standard GEDCOM tags should start with an underscore, and the _AKA tag starts with an underscore, but that does not mean everything is peachy.
The fact remains that the alternate names could and should have been written to the GEDCOM file using the NAME tag. Now that the alternate names are in non-standard tags, there is a good chance that they'll be lost. After all, you can't expect all vendors to start supporting each other's non-standard tags. In fact, the GEDCOM standard explicitly states that importing applications are free to ignore non-standard tags, and when they ignore the non-standard _AKA tag, all the alternate names you recorded are lost.
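The loss is easy to picture. The Python sketch below mimics an importer that ignores user-defined tags by dropping every underscore tag together with any lines subordinate to it. It illustrates the effect only; it is not the code of any particular product, and the example record is made up.

```python
def parse_line(line):
    """Split one GEDCOM line into (level, tag, value), skipping any @xref@ id."""
    fields = line.strip().split(" ")
    rest = fields[1:]
    if rest and rest[0].startswith("@"):
        rest = rest[1:]
    return int(fields[0]), rest[0], " ".join(rest[1:])


def drop_user_defined(record_lines):
    """Remove every underscore tag and its subordinate lines."""
    kept = []
    skipping_below = None  # level of the ignored tag whose subtree is being skipped
    for line in record_lines:
        level, tag, _ = parse_line(line)
        if skipping_below is not None and level > skipping_below:
            continue
        skipping_below = None
        if tag.startswith("_"):
            skipping_below = level
            continue
        kept.append(line)
    return kept


paf_record = [
    "0 @I1@ INDI",
    "1 NAME Johann /Schmidt/",
    "1 _AKA John /Smith/",  # PAF's non-standard alternate name
    "1 SEX M",
]
imported = drop_user_defined(paf_record)
```

Had the alternate name been written as a second NAME line, the same importer would have kept it.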

RootsMagic and PAF GEDCOM

The current version of RootsMagic, RootsMagic 6, is the third major release of the RootsMagic rebuild that was released as RootsMagic 4. The RootsMagic 4 GEDCOM article examined RootsMagic's GEDCOM support, and found a few issues with character set support that were subsequently fixed through early RootsMagic 4 updates. It noted that the GEDCOM import is fast, the import log is poor, that RootsMagic correctly identifies its GEDCOM 5.5.1 files as GEDCOM 5.5.1 files (another thing FamilySearch itself does wrong), and that it supports the _UID tag.

Import of a PAF GEDCOM into RootsMagic worked just fine, but import of the RootsMagic GEDCOM into PAF did not go fine. PAF not only complained about various RootsMagic-specific tags it does not know about, it also complained that PAF cannot store more than one name per individual.

What happened is this: the PAF database contains alternate names in Also Known As fields, which PAF exports to a PAF GEDCOM using the _AKA tag. RootsMagic knows about and supports the PAF-specific _AKA tag, and manages to import the alternate names. When RootsMagic exports the database again, the alternate names are exported to GEDCOM as they should be: as multiple NAME records. PAF still doesn't support multiple NAME records, so when it is asked to import that RootsMagic GEDCOM file, it complains that PAF cannot store more than one name per individual. When the import is done, PAF remembers only one name per individual; all the other names are lost on import.
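What RootsMagic does with the _AKA tag amounts to turning each _AKA line into an ordinary NAME structure placed after the preferred name. The Python sketch below shows one way such a conversion could work. It is a hypothetical illustration, not RootsMagic's code, and it assumes each _AKA line holds a single name, a simplification that the comma-separated lists discussed further on do not satisfy.

```python
def parse_line(line):
    """Split one GEDCOM line into (level, tag, value), skipping any @xref@ id."""
    fields = line.strip().split(" ")
    rest = fields[1:]
    if rest and rest[0].startswith("@"):
        rest = rest[1:]
    return int(fields[0]), rest[0], " ".join(rest[1:])


def aka_to_name(record_lines):
    """Rewrite level 1 _AKA lines as NAME lines after the preferred NAME structure."""
    kept, moved = [], []
    in_aka = False
    for line in record_lines:
        level, tag, value = parse_line(line)
        if level == 1 and tag == "_AKA":
            moved.append(f"1 NAME {value}")
            in_aka = True
        elif in_aka and level > 1:
            moved.append(line)  # carry anything subordinate to the _AKA line along
        else:
            in_aka = False
            kept.append(line)

    # Insert after the first NAME line and its subordinate lines, so the
    # preferred name stays first, as GEDCOM requires.
    insert_at = len(kept)
    for i, line in enumerate(kept):
        level, tag, _ = parse_line(line)
        if level == 1 and tag == "NAME":
            insert_at = i + 1
            while insert_at < len(kept) and parse_line(kept[insert_at])[0] > 1:
                insert_at += 1
            break
    return kept[:insert_at] + moved + kept[insert_at:]


paf_record = [
    "0 @I1@ INDI",
    "1 NAME Johann /Schmidt/",
    "2 GIVN Johann",
    "1 SEX M",
    "1 _AKA John /Smith/",
]
converted = aka_to_name(paf_record)
```

Feeding the converted record to a one-name importer such as PAF brings the problem back from the other side: only one of the two NAME lines survives.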


direct PAF import

It is worth noting that RootsMagic will not only import PAF GEDCOM files, but will also import PAF databases directly. However, it made no difference whether I took advantage of RootsMagic's ability to import a PAF file directly, or imported the same PAF database via a PAF GEDCOM; RootsMagic handles the Also Known As field the same way with both import methods.

RootsMagic Alternate Names

As RootsMagic 4 GEDCOM already noted, round-tripping the database through GEDCOM fails despite RootsMagic's effort, because FamilySearch PAF is defective; PAF should not be using the non-standard _AKA record, PAF should support multiple NAME records. By not doing so, FamilySearch PAF ignores FamilySearch's own GEDCOM specification.

It is great that RootsMagic went beyond the call of duty by supporting the Also Known As field and corresponding _AKA tag at all, but RootsMagic's handling of the Also Known As field is not perfect.
New in RootsMagic 4 was the ability to have alternate names show up in the name index. I examined this feature in the RootsMagic Alternate Names article, and in doing so, uncovered a serious defect in RootsMagic's import.


splitting names

When the RootsMagic import encounters a name that isn't split into a first name and last name, it tries to split the name itself. The 2009 article RootsMagic Alternate Names provides an example of how that goes wrong.
You might be tempted to argue that it was a new feature, that the name-splitting logic wasn't very smart yet, but would improve over time. However, teaching a computer to split names into given names and surnames isn't easy. The Splitting Names article already observed that correctly splitting a name is considerably more complex than recognising a few surname prefixes.
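To see why prefix recognition alone falls short, consider the Python sketch below. It is a hypothetical, deliberately simple prefix-based splitter, not RootsMagic's actual logic: it takes the last word as the surname and pulls in any known prefixes in front of it. The prefix list is an assumption and far from complete.

```python
# Hypothetical prefix list; real applications would need far longer, locale-aware lists.
SURNAME_PREFIXES = {"van", "von", "der", "den", "de", "la", "le", "du", "da", "di", "ter", "ten"}


def naive_split(full_name):
    """Guess a GEDCOM-style split: the last word plus any prefixes before it is the surname."""
    words = full_name.split()
    if not words:
        return ""
    start = len(words) - 1  # assume the last word is (part of) the surname
    while start > 0 and words[start - 1].lower() in SURNAME_PREFIXES:
        start -= 1          # absorb prefixes such as "van der"
    given = " ".join(words[:start])
    surname = " ".join(words[start:])
    return f"{given} /{surname}/".strip()


print(naive_split("Jan van der Berg"))    # → Jan /van der Berg/  (right)
print(naive_split("Maria Garcia Lopez"))  # → Maria Garcia /Lopez/  (wrong for a double surname)
print(naive_split("Johann"))              # → /Johann/  (a lone given name becomes a surname)
```

Both wrong results are syntactically valid GEDCOM names, so nothing downstream will flag them.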

The key issue isn't how often a simplistic surname prefix recognition algorithm gets it right, or how you can improve upon that. The key issue is that, whenever RootsMagic gets it wrong, RootsMagic introduces an error into your database, and that is simply unacceptable.
An import routine must import your data. It should highlight your errors, it may not introduce new ones.

One could argue that, technically, RootsMagic does not introduce an error when it gets a split wrong, but merely replaces one error (no split) with another (wrong split).
However, a wrong split is considerably worse than no split; it is easy to automatically identify unsplit names, but identifying which names have been split incorrectly is hard, even for knowledgeable persons. That's the real problem with what RootsMagic is doing.
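The asymmetry is easy to demonstrate. Detecting an unsplit name takes a single line of Python, as in the sketch below, but the same check happily accepts a wrong split, because a wrong split has exactly the right number of slashes. The function name is hypothetical.

```python
def is_split(name):
    """True if the name marks its surname with exactly two forward slashes."""
    return name.count("/") == 2


names = ["John /Smith/", "John Smith", "Maria Garcia /Lopez/"]
unsplit = [n for n in names if not is_split(n)]
# "Maria Garcia /Lopez/" passes the check even if the surname really is "Garcia Lopez".
```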

There is one solution: let the user do it. After all, the user should have done it in the first place.
Splitting Names points out that a surname prefix recognition algorithm can still be helpful:

The simplistic surname prefix recognition algorithm that several genealogy application use is not completely useless. It can be used to guess the split and present that guess to the user for approval.
However, that algorithm should not be used to split the name and silently assume that the split was done right. The application should involve the user.
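A guess-and-confirm workflow along those lines might look like the Python sketch below. The guessing rule (last word is the surname) and the `ask` callback are assumptions for illustration; in an application, `ask` would be a dialog showing the guess, and here it is any function that returns the name the user approved.

```python
from typing import Callable


def guess_split(name: str) -> str:
    """Hypothetical guess: treat the last word as the surname."""
    given, _, surname = name.strip().rpartition(" ")
    return f"{given} /{surname}/".strip()


def split_with_approval(name: str, ask: Callable[[str, str], str]) -> str:
    """Leave split names alone; otherwise let the user approve or correct a guess."""
    if name.count("/") == 2:
        return name
    return ask(name, guess_split(name))


def ask_on_console(name: str, guess: str) -> str:
    """Console version: pressing Enter accepts the guess, typing replaces it."""
    answer = input(f"Split {name!r} as {guess!r}? Enter to accept, or type a correction: ")
    return answer.strip() or guess
```

The point of the design is that the guess never reaches the database without the user having seen it.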

which names

Many users will never notice that RootsMagic's import routine contains broken name-split logic. RootsMagic does not alert users to what it did, the name splitting does split many names correctly and gets only a small percentage wrong, and most names are already split. Most databases do not contain unsplit names to begin with.

In a GEDCOM file, the NAME field contains two forward slashes: the first slash marks the beginning of the surname, and the second slash marks the end of the surname, like this: GivenName /Surname/.
PAF additionally supports the GIVN (given name) and SURN (surname) tags.
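Because the surname is delimited by slashes, taking a split name apart into given name and surname is purely mechanical. A minimal Python sketch, with a hypothetical function name; anything after the second slash is returned as a suffix, and a value without exactly two slashes is rejected rather than guessed at:

```python
def parse_gedcom_name(value):
    """Split 'Given /Surname/ Suffix' into (given, surname, suffix), or None if unsplit."""
    if value.count("/") != 2:
        return None  # not split; do not guess
    given, surname, suffix = value.split("/")
    return given.strip(), surname.strip(), suffix.strip()
```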

When you are entering data into PAF, which has just one big Full Name field, you indicate the surname the same way as is done within GEDCOM, by using two forward slashes to mark the beginning and end of the surname. If you omit the slashes, PAF will add them; it uses a simple algorithm to guess where the slashes should go, and then allows you to correct its guess. This way, PAF ensures that these names are always split into given name and surname.
However, PAF only does this for names entered into the Full Name field; it does not do this for names entered into the Also Known As field.

RootsMagic does not have to split the names the PAF user entered into the Full Name field, as those names are already split. The names RootsMagic splits, not always correctly, are names that were entered into the Also Known As field.


what I did wrong

I've already explained that RootsMagic should not silently split names, but involve the user. What I haven't explained yet is why the names in the Also Known As field aren't split already. The answer is that I fouled up.

As soon as I knew what PAF's Also Known As field is for, I started using it enthusiastically. Now, it is true that PAF hardly takes advantage of the data in the Also Known As field, but I figured that I'd better enter all extra names as I encountered them, so as not to forget them, and that one day, some other application would be smart enough to make good use of them.
That much was right; it is good to enter useful data you learn about before you forget about it, and RootsMagic 4 and later do make good use of alternate names.

It was entering the alternative names that I got wrong; for many years, I never bothered to use the slashes. In fact, I often did not even bother to enter full names, but merely entered the alternative given name or surname I'd learned about. Over the course of many years, I entered lots of helpful data, but never thought to enter it in a structured format like that of the Full Name field. It was only when I imported my database into RootsMagic 4 that I realised I'd been doing it wrong all this time.

RootsMagic's import behaviour may be wrong, but I would not have experienced any problems if all the names in the Also Known As fields had been split in the first place. The solution to the problem was obvious: start fixing all the names in the Also Known As fields:

While I wait for RootsMagic to improve its product so it will import multiple alternate names from PAF, I need to fix my database by inserting surname slashes for all the alternate names. Once I've done that, RootsMagic will import these names just fine and not try to split them itself.

That's what I wrote in RootsMagic Alternate Names and I've been checking and fixing Also Known As fields as I come across them ever since.
The RootsMagic Alternate Names article shows how to make a custom report that searches a PAF database for Also Known As fields that aren't empty, and therefore should contain slashes, but do not contain a single slash. That isn't a perfect validity check, but it does provide a list of records that still need to be fixed. However, because I have a rather large database with thousands of Also Known As fields, I'm not done yet.
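The same check is easy to express outside PAF once the data has been exported. The Python sketch below assumes each individual is available as a dictionary with hypothetical `id` and `aka` keys; like the PAF report, it only flags fields without a single slash, and will miss a field in which some names have slashes and others do not.

```python
def aka_needing_fixes(records):
    """Yield (id, text) for every non-empty Also Known As field that contains no slash."""
    for record in records:
        aka = record.get("aka", "").strip()
        if aka and "/" not in aka:
            yield record["id"], aka


people = [
    {"id": 1, "aka": "John /Smith/"},
    {"id": 2, "aka": ""},
    {"id": 3, "aka": "Johnny"},
    {"id": 4},
]
for person_id, aka in aka_needing_fixes(people):
    print(person_id, aka)
```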

The question why I did not do it right has one very simple answer; no one told me how to do it right. In all the years I've been using PAF I never came across any guidance on how to use the Also Known As fields correctly. The PAF Help file is no help. There is no guidance on how to use the Also Known As field in there at all. In fact, PAF's behaviour made me think I was doing it right; PAF adds slashes to the Full Name field but not the Also Known As field, so apparently you're not supposed to add them there…

Also Known As guidelines

These are the guidelines that FamilySearch should have provided for the Also Known As field:

Using the Also Known As field

Use full names

Always record full names, even if only part is different from the name in the Full Name field.

Make a comma-separated list

There is only one Also Known As field. You can use it to record multiple names. Separate these names from each other with a comma followed by a space.

Do not add anything else

Do not terminate the list of names with a full stop or any other character. Do not use tabs or any superfluous spaces. Just use the comma followed by a space to separate different names from each other.

Use the regular name format

Each name should be in the same format as names in the Full Name field: use forward slashes to mark the beginning and the end of the surname.
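Guidelines like these have the advantage that most of them can be checked automatically. The Python sketch below is one possible checker; the function name and messages are hypothetical, and it cannot verify the first guideline, since whether a name is complete is something only the researcher knows. A trailing full stop is only flagged directly after a closing slash, so that a suffix such as "Jr." is not mistaken for a terminated list.

```python
import re


def check_aka(field):
    """Return a list of guideline violations for one Also Known As field."""
    if not field.strip():
        return []
    problems = []
    if "\t" in field:
        problems.append("contains a tab")
    if field != field.strip() or "  " in field:
        problems.append("contains superfluous spaces")
    if re.search(r"/[.,;:]\s*$", field):
        problems.append("list is terminated by punctuation")
    # Names are separated by a comma followed by a space; anything else
    # leaves two names glued together, which the slash count then catches.
    for name in field.strip().split(", "):
        if name.count("/") != 2:
            problems.append(f"no surname slashes or malformed separator: {name!r}")
    return problems
```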

the good, the bad and the fix

the good

Here's what's good about the situation:

the bad

Here's what's bad about the situation:

FamilySearch messed up support for alternate names in PAF
I messed up entering alternate names
RootsMagic messes up import of alternate names from PAF

the fix

FamilySearch

FamilySearch should provide Also Known As guidelines to users of version 5.2.18.0, for example by updating the help file, and releasing it as version 5.2.19.0.
The real solution would be for FamilySearch to show some sense of responsibility, and finally fix all of PAF's known issues. I'm not suggesting they add new features, merely that they fix what's known to be wrong. The upgrade from PAF 5.2.18.0 to, say, PAF 5.3 should provide some assistance to help you fix any alternate names you've entered.

Sadly, this is not likely to happen soon. FamilySearch abandoned PAF more than a decade ago already, and FamilySearch is much better at making claims of technological leadership than actually delivering anything.

me, myself and I

It is easy to avoid the broken split logic in RootsMagic's PAF import; just make sure all your names are split already. I am fixing all the unsplit names, by splitting them myself.
I can help RootsMagic do the right thing by informing them about the problem. This I have done through bug reports and articles for RootsMagic 4, and I have now done again, through bug reports and this article, for RootsMagic 6.

RootsMagic

The RootsMagic import seems to assume there is at most one name in the Also Known As field. Alas, there is only one Also Known As field, so users have been using that single field to record multiple alternate names. RootsMagic's support for the Also Known As field should be upgraded to support multiple, comma-separated names.

RootsMagic should check the format of each name in the Also Known As field the same way it checks the format of GEDCOM NAME fields: there should be two forward slashes to mark the beginning and the end of the surname. For any name not in this format, RootsMagic should report an error in the import log.
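An import along those lines could look like the Python sketch below: it splits the field at commas, keeps every name exactly as the user typed it, and writes a log entry for each name lacking the two slashes. It sketches the proposed behaviour, not RootsMagic's actual import; the function and the log format are hypothetical, and it assumes commas only appear as separators between names.

```python
def import_aka(person_id, aka_field, log):
    """Return the alternate names in one Also Known As field, unchanged.
    Names without surname slashes are reported in the log, not split."""
    names = [name.strip() for name in aka_field.split(",") if name.strip()]
    for name in names:
        if name.count("/") != 2:
            log.append(f"{person_id}: alternate name is not split: {name!r}")
    return names


import_log = []
alternates = import_aka("I1", "John /Smith/, Johnny", import_log)
print(f"Import finished with {len(import_log)} issue(s); see the import log.")
```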

RootsMagic does not need to throw away their name splitting logic, but they do need to acknowledge that a simplistic surname prefix recognition algorithm is not good enough, and must stop applying their splitting logic silently.
RootsMagic can use their name splitting logic to suggest name splits, asking the user to either confirm the suggested split or make another split, just like PAF does when you enter a name in the Full Name field.

RootsMagic should not try to split names on import. An import should import data, not change it, and certainly not introduce new errors.
RootsMagic should write a list of unsplit and otherwise problematic names to the import log. RootsMagic should call attention to any import issues and the import log detailing these issues by using the import dialog to inform the user of the number of issues encountered during import.
Silently making the changes, and then listing them in the import log may be presented as an option, but should not be the default, and users should be warned that such automatic splitting may introduce errors.

It would be nice if RootsMagic listed not only the unsplit names it encountered, but also how it would split them; this would allow users to focus their fixing efforts on the names it splits wrong, and then let RootsMagic split the rest.
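Such a preview needs little more than the guessing logic an application already has. In the Python sketch below, `guess_split` is a hypothetical stand-in for that logic; the preview only pairs each unsplit name with the suggested split and changes nothing.

```python
def guess_split(name):
    """Hypothetical stand-in for an application's split logic: last word is the surname."""
    given, _, surname = name.strip().rpartition(" ")
    return f"{given} /{surname}/".strip()


def preview_splits(names, guess=guess_split):
    """List (original, suggested split) for unsplit names; the names are not modified."""
    return [(name, guess(name)) for name in names if name.count("/") != 2]


for original, suggestion in preview_splits(["John /Smith/", "Maria Garcia Lopez", "Johann"]):
    print(f"{original!r} would become {suggestion!r}")
```

A user scanning this list would correct both entries here by hand, the double surname and the lone given name, and could then let the application apply any remaining suggestions that are right.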

PAF to RootsMagic

PAF is still a popular product, and many PAF users are switching to RootsMagic. What's more, because of its Unicode support, good performance and fast GEDCOM export, PAF is one of the few products favoured by users with large databases. I am one of those users, and after many years of using it, I have created more than 20.000 non-empty Also Known As fields. If I were to let RootsMagic do its thing, it would literally mess up hundreds of names. It may mess up just a few names in smaller databases, but then again, there are many more of these. However, it is not the extent of the problem that's the real issue.

cleaning the mess

Three parties messed up; FamilySearch, I, and RootsMagic. History has taught me to not suggest holding your breath while waiting for FamilySearch to fix anything. I am fixing the issues in my database within the limitations of the product, and trust that RootsMagic will fix the issues with its import.

RootsMagic does not have to support PAF's non-standard extensions. That it does so anyway is an extra that makes RootsMagic more attractive to PAF users considering alternatives (pun intended), but only if RootsMagic does it right.

This is not just about name splitting, the Also Known As field or RootsMagic import. This is not just about how FamilySearch messing up, me messing up and RootsMagic messing up combine into a royal mess.
There is a matter of principle here, a principle that too many vendors brush aside at their whim: an import routine should import data, and report what it cannot handle, it should never introduce any errors or otherwise make things worse.

updates

2012-11-28: Clunk 0.01

Tim Forsythe has introduced Clunk, the GEDCOM Fixer, a Win32 console application. Version 0.01 offers just one feature: splitting _AKA names.
