A name consists of several parts, such as the given name and the surname. The GEDCOM specification recognises several parts and specifies how to encode these for correct transfer between different genealogy applications.
The GEDCOM specification states that the full name should be followed by several tags that list its constituent parts. In other words, a GEDCOM file should tell you exactly how the full name is split into parts, which GEDCOM calls name pieces. However, not all genealogy applications bother to specify those name pieces in their GEDCOM export.
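To make the idea of name pieces concrete, here is a minimal sketch of reading them in Python. The GEDCOM fragment, the person in it and the function name `read_names` are made up for illustration; the tag names are the ones the specification lists.

```python
# Illustrative GEDCOM fragment; the individual and the record ID are made up.
SAMPLE = """\
0 @I1@ INDI
1 NAME Jan /de Vries/
2 GIVN Jan
2 SPFX de
2 SURN Vries
1 SEX M
"""

NAME_PIECE_TAGS = {"NPFX", "GIVN", "NICK", "SPFX", "SURN", "NSFX"}


def read_names(gedcom_text):
    """Return a list of (record_id, full_name, pieces) for every NAME line."""
    names = []
    record_id = None
    current = None  # pieces dict of the NAME structure being read, if any
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines in this sketch
        level = int(parts[0])
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # Level 0 lines look like "0 @I1@ INDI": the ID precedes the tag.
            record_id = parts[1] if parts[1].startswith("@") else None
            current = None
        elif level == 1:
            current = None
            if parts[1] == "NAME":
                current = {}
                names.append((record_id, value, current))
        elif level == 2 and current is not None and parts[1] in NAME_PIECE_TAGS:
            current[parts[1]] = value
    return names


print(read_names(SAMPLE))
```

A file that omits the name pieces yields an empty dictionary for each name, which is exactly the situation the rest of this article is about.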
The specification makes it clear that applications do not have to provide additional tags. It says that “The NPFX, GIVN, NICK, SPFX, SURN, and NSFX tags are provided optionally for systems that cannot operate effectively with less structured information”; not even the tags for the given name (GIVN) and surname (SURN) are mandatory.
The specification not only states that these tags are optional instead of mandatory, it also suggests that these tags are not important at all, by stating that they are only needed “for systems that cannot operate effectively with less structured information”.
Apparently, FamilySearch thinks that these tags are superfluous, that most systems, or at least their own systems, do not need these tags at all. It even seems to imply that systems that do use these tags are inferior, and that is so wrong.
The clearly prejudiced way of describing systems that do fully support name structure is one of the mistakes in the GEDCOM specification that seems to result from FamilySearch’s preconception that if they do not care about something, nobody else needs it.
That they go so far as to describe systems that do the right thing and do support these tags as “systems that cannot operate effectively with less structured information” is reprehensible.
Systems that do support name structure and do use these tags are
superior to systems that do not support name structure and do not use these tags.
The right thing to do is distinguish between systems that do fully support the
GEDCOM name structure and systems that provide limited support for that name
structure.
Aldfaer is a popular Dutch genealogy application for Windows. Its dialog box for entering an individual has separate fields for the given name, surname and surname prefix, but its GEDCOM export does not include the GIVN, SURN or SPFX tags.
Upon import, Aldfaer recognises the most common surname prefixes, but that does not guarantee that it will import its own GEDCOM files without making a mistake.
First of all, it only recognises the most common surname prefixes, not all of them. A slight spelling variation in an otherwise common surname prefix is enough for it to not be recognised as the prefix it is.
Secondly, even recognising all prefixes is simply not good enough. A surname may contain (instead of start with) what such a recogniser believes to be a surname prefix.
Thus, because Aldfaer GEDCOM provides only limited support for name structures, export from and import into Aldfaer cannot be relied upon to recreate the same database! It may have been changed, and it will have been changed without any warning or error.
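Both failure modes are easy to reproduce. The following Python sketch is not Aldfaer's actual algorithm, which I have not seen; it is a hypothetical recogniser of the simplistic kind described above, with a deliberately small prefix table and made-up names.

```python
# Hypothetical prefix table, longest prefixes first; real tables are longer.
KNOWN_PREFIXES = ["van der", "van de", "van", "de"]


def naive_split(full_name):
    """Split at the first known prefix found after the first word.

    Returns (given, prefix, surname). When no prefix is recognised,
    the last word is taken to be the surname.
    """
    words = full_name.split()
    if not words:
        return ("", "", "")
    for i in range(1, len(words)):
        for prefix in KNOWN_PREFIXES:
            p = prefix.split()
            end = i + len(p)
            # The prefix must match and leave at least one word for the surname.
            if [w.lower() for w in words[i:end]] == p and end < len(words):
                return (" ".join(words[:i]),
                        " ".join(words[i:end]),
                        " ".join(words[end:]))
    return (" ".join(words[:-1]), "", words[-1])


print(naive_split("Jan van der Berg"))          # ('Jan', 'van der', 'Berg')
print(naive_split("Jan v.d. Berg"))             # ('Jan v.d.', '', 'Berg')
print(naive_split("Anna Bosch van Rosenthal"))  # ('Anna Bosch', 'van', 'Rosenthal')
```

The first name is split as intended. The second shows a spelling variation of the prefix ending up in the given name. The third assumes that Bosch van Rosenthal is one surname; the recogniser sees a prefix in the middle of it and moves half the surname into the given name. None of the three calls reports a problem.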
Correctly splitting a name is considerably more complex than recognising a few surname prefixes. The example in RootsMagic Alternate Names contains a space. There are much more complex cases, such as double names. On top of that, users may erroneously add titles in a name field, or use it to record an alias when the application has no field for it. Surname prefixes, names with spaces, double names, and user errors make splitting a name complex already, and I have not even mentioned patronyms, nicknames, call names or alternative spellings yet.
Splitting names into their constituent parts is not a trivial issue that can be solved by just a few lines of code and a small table of surname prefixes. It actually needs an extensive expert system backed by an extensive database of double names and with knowledge of common abuses of the name field.
The simplistic surname prefix recognition algorithm that several genealogy applications use is not completely useless. It can be used to guess the split and present that guess to the user for approval. However, that algorithm should not be used to split the name and silently assume that the split was done right. The application should involve the user.
A basic import rule is that import should not introduce errors.
An application
that relies on nothing more than its surname prefix recognition algorithm to
split names is sure to introduce errors, and that is unacceptable.
The Aldfaer GEDCOM export that fails to include full name structure information does not introduce any error. It provides incomplete information. On export, name information is lost. The GEDCOM that Aldfaer produces is of inferior quality, but it is not wrong.
There are several things an application that does support name structure can do when presented with a GEDCOM that does not provide that structure.
Several applications silently auto-split names into parts using nothing more than some simplistic algorithm. However, because that is practically sure to introduce errors, that approach is wrong.
The best thing an application can do is to complain about the inferior quality of the file it is asked to import. The application can suggest to the user that they may want to choose different export options, upgrade to a better version of the application that created it, or simply complain in their turn to the vendor of that application until the inferior export has been fixed. This avoids importing inferior quality files by encouraging users to provide better files and creating awareness about the issue.
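Detecting such a file is cheap. The Python sketch below counts NAME lines that are not followed by any GIVN, SURN or SPFX line and turns the result into a message for the user. The function names and the wording of the message are my own assumptions, and checking only these three tags is a simplification.

```python
import re

NAME_LINE = re.compile(r"^\s*1\s+NAME\b")
PIECE_LINE = re.compile(r"^\s*2\s+(GIVN|SURN|SPFX)\b")
OTHER_TOP_LINE = re.compile(r"^\s*[01]\s")  # any other level 0 or level 1 line


def count_unstructured_names(gedcom_text):
    """Return (number of NAME lines, number of those without name pieces)."""
    total = unstructured = 0
    in_name = False    # currently inside a NAME structure
    has_piece = False  # a name piece was seen for the current NAME
    for line in gedcom_text.splitlines():
        if NAME_LINE.match(line):
            if in_name and not has_piece:
                unstructured += 1
            total += 1
            in_name, has_piece = True, False
        elif in_name and PIECE_LINE.match(line):
            has_piece = True
        elif OTHER_TOP_LINE.match(line):
            # Another record or another level 1 tag ends the NAME structure.
            if in_name and not has_piece:
                unstructured += 1
            in_name = False
    if in_name and not has_piece:
        unstructured += 1
    return total, unstructured


def quality_warning(gedcom_text):
    """Return a warning for the user, or None when every name has pieces."""
    total, unstructured = count_unstructured_names(gedcom_text)
    if unstructured == 0:
        return None
    return (f"{unstructured} of {total} names in this file lack name pieces. "
            "Consider different export options or a newer version of the "
            "exporting application before importing.")
```

Showing this message before the import starts gives the user the chance to go back and produce a better file first.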
The import log should tell how each name has been split, and provide enough information, typically a record ID, to make it easy to find each record in the original database, to make it easy for the user to solve the problem at the source, if possible.
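What such a log entry could look like is sketched below in Python. The field layout, the `source` labels and all names in the code are assumptions for illustration, not an existing log format.

```python
import io
from dataclasses import dataclass


@dataclass
class SplitDecision:
    record_id: str  # e.g. "@I2@", as found in the imported GEDCOM file
    full_name: str  # the name as it appeared in the file
    given: str
    prefix: str
    surname: str
    source: str     # "guessed" or "confirmed"; these labels are an assumption


def format_log_line(d):
    """One tab-separated line per name: ID, original name, split, source."""
    return (f"{d.record_id}\t{d.full_name}\t"
            f"given={d.given!r} prefix={d.prefix!r} surname={d.surname!r}\t"
            f"{d.source}")


def write_split_log(decisions, stream):
    """Write guessed splits first, since those are the ones to check."""
    for d in sorted(decisions, key=lambda d: d.source != "guessed"):
        stream.write(format_log_line(d) + "\n")


log = io.StringIO()
write_split_log(
    [
        SplitDecision("@I1@", "Jan de Vries", "Jan", "de", "Vries", "confirmed"),
        SplitDecision("@I2@", "Anna Bosch van Rosenthal",
                      "Anna Bosch", "van", "Rosenthal", "guessed"),
    ],
    log,
)
print(log.getvalue(), end="")
```

Because every line starts with the record ID, the user can look up the individual in the originating application and correct the name there.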
The user may not be able to provide a better file, in which case it should be imported as well as possible. Ideally, the application should provide an interactive interface in which the user can approve or correct suggested splits, which are then written to an editable configuration file that will be used upon the next import. That way, the application will not bother the user with the same question again and again, and a few tries should be enough to accomplish an import that does not need to ask about names again.
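The core of that interaction is small. In the Python sketch below, `guess` stands for whatever guessing algorithm the application has and `ask` for its user interface; both are hypothetical callbacks, and the decisions are kept in a plain dictionary that the application would load from and save to its configuration file.

```python
def resolve_split(full_name, decisions, guess, ask):
    """Return the approved split for full_name, asking at most once per name.

    decisions: dict mapping a full name to the split the user approved
    guess:     function(full_name) -> suggested split
    ask:       function(full_name, suggestion) -> approved or corrected split
    """
    if full_name not in decisions:
        # The guess is only a suggestion; the user's answer is what gets stored.
        decisions[full_name] = ask(full_name, guess(full_name))
    return decisions[full_name]


def demo():
    """Resolve the same made-up name twice; the user is asked only once."""
    questions = []

    def guess(full_name):
        return "Anna Bosch /van Rosenthal/"   # what a simplistic guesser says

    def ask(full_name, suggestion):
        questions.append((full_name, suggestion))
        return "Anna /Bosch van Rosenthal/"   # the user's correction

    decisions = {}
    first = resolve_split("Anna Bosch van Rosenthal", decisions, guess, ask)
    second = resolve_split("Anna Bosch van Rosenthal", decisions, guess, ask)
    return first, second, len(questions)


print(demo())  # ('Anna /Bosch van Rosenthal/', 'Anna /Bosch van Rosenthal/', 1)
```

The second call finds the stored decision and returns it without involving the user.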
It is important to keep such a configuration file simple and editable. That may even make it usable with more than one application.
The simplest approach to such a file would be just one name per line, with surname slashes marking the surname (as is done in GEDCOM).
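Reading such a file takes only a few lines. The Python sketch below assumes that anything before the first slash is the given part, anything after the second slash is a suffix, and that names are looked up by their text with the slashes removed; none of that is prescribed anywhere, and the names in the example are made up.

```python
import re

# "Given /Surname/ Suffix", with the given part and the suffix both optional.
NAME_PATTERN = re.compile(r"\s*([^/]*?)\s*/([^/]*)/\s*(.*?)\s*")


def parse_name_line(line):
    """Return (given, surname, suffix), or None if no surname is marked."""
    m = NAME_PATTERN.fullmatch(line)
    if m is None:
        return None
    return m.group(1), m.group(2).strip(), m.group(3)


def load_name_list(text):
    """Map each name, written without slashes, to its marked split."""
    splits = {}
    for line in text.splitlines():
        parsed = parse_name_line(line)
        if parsed is None:
            continue  # blank or unmarked lines are skipped in this sketch
        key = " ".join(" ".join(parsed).split())  # normalise whitespace
        splits[key] = parsed
    return splits


NAME_LIST = """\
Anna /Bosch van Rosenthal/
Jan /de Vries/
"""

splits = load_name_list(NAME_LIST)
print(splits["Anna Bosch van Rosenthal"])  # ('Anna', 'Bosch van Rosenthal', '')
```

On import, the application looks up each incoming name in this dictionary first and only falls back to guessing, and asking, when the name is not listed.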
A simple list of names should allow the importing application to generally make the right decision upon import, so that the user need not fix all the errors a simplistic guessing algorithm would otherwise make after each import. Any name that still does not import correctly only highlights that the originating application should provide not just the name, but the name structure as well.
The real beauty of this idea is that you should be able to create such a configuration file by editing the list of decisions the importing application writes to its import log, especially if that part of the import log was written with that possibility in mind.
Meanwhile, informing the user about low quality GEDCOM files does not just create awareness about the issue, but also creates awareness about which programs fully support GEDCOM name structure and which ones do not.
That awareness will see users avoid the applications with limited name structure support and favour applications with full name structure support, and there is nothing like user demand to get vendors to upgrade their limited name structure support to full name structure support.
Copyright © Tamura Jones. All Rights reserved.