Modern Software Experience

2009-04-24

Genealogy

Consistency Checks

Good genealogy software offers consistency checks: checks that look for impossible and implausible things that probably indicate a research or data entry error.

This article describes a check that I’d like genealogy software to have. It is not a very complex one but, as far as I know, it has not yet been implemented in any genealogical software.

Naming Children

The children of a single couple should all have different names. It may be required by law, but it hardly need be as a civil servant is sure to protest when you want to give two different children the same name. Any law forbidding identical names exists only to avoid legal disputes about the issues. Having multiple children with the same name is just not practical.

In fact, having multiple children with the same initials is not practical either. So much mail is addressed using initials instead of names, that siblings with identical initials are likely to end up opening each other’s official mail. Just something to keep in mind when you pick names for your children.

Same Name Check

It would be naive to think that a couple having multiple children with the same name is the problem and that the solution is to have the software check for this. If any vendor were dumb enough to implement that straightforward check, users would be sure to complain about how useless the check is.

Same Name Children

Despite the laws and customs that forbid a couple to have two children with the same name, many couples do in fact give the same name to multiple children. That is not as contradictory as it may sound. The laws and customs apply to the living family; no child may be given the name of another living child. It is allowed to reuse the name of a deceased child, and throughout history, many couples have done so.

Human laws and customs are not laws of nature. Sometimes couples do give the same name to two different living children. This is rare, and a challenge for the researcher when it happens.
Implementations of the consistency check can deal with this by allowing a researcher to override the check.

Naming Conventions

The naming of children is often decided by naming conventions. Children are often named after their ancestors. That is not very creative, and does lead to many people with exactly the same name.

Same Name Cousins

A particularly annoying problem for genealogists is that cousins with the same grandfather are not at all unlikely to be born on the same day, in the same place and be given the exact same name.

I was initially surprised when I noticed a living example of that in my genealogy, but soon realised that this is not rare at all. With families staying in one place and naming their children after grandparents, it is very common for two grandchildren to be born in the same place and receive the same name. It is also very natural for both grandchildren to be born around the same time, and once in a while they will be born on exactly the same date.

Even if both couples meet at the town hall, neither is very likely to change the name of their child, as both had already decided on it months ago. Any practical problems that ensue later, when both kids go to the same school, are solved as they are always solved for children with the same name: with call names or nicknames.

Same Name Convention

A single couple can have multiple children with the same name. To be more precise, they may give a new child the same name as a previous child if the previous child is no longer alive.

Nowadays many people feel that all children should have different names, to emphasise that each child is unique. In the tradition of naming children after ancestors the emphasis is not on uniqueness, but on having a child named after the ancestor, so if a child dies young, the couple is likely to give the next child the same name.

Male versus Female

Many names have both male and female forms. That allows some flexibility in naming children after ancestors when the child and the ancestor it is named after are not of the same gender. Adrian can become Adriana, Juliana can become Julian, Peter can become Peterlina.

Within the context of the tradition, these names may be regarded as the same; to the civil servant they are different. A single couple may name their son Adrian and their daughter Adriana without anyone getting confused. The children may even have both a grandfather called Adrian and a grandmother called Adriana. That is not unlikely at all.

The civil servant will not allow the parents to call both their son and their daughter Alex, and will probably suggest that one be called Alexander and the other Alexandria.

For the Same Name Children Check, the male and female forms of a name are considered different names; the check is just about the names being identical.
Children of different gender may have the same name. A boy named Alex may be followed by a girl named Alex.

The Same Name Children Check does not consider gender. It simply considers children with exactly the same name, whatever their gender.
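The starting point of the check, finding the children of one couple that share exactly the same name, could be sketched as follows. This is only an illustration; the record layout (a dictionary with hypothetical "name" and "gender" keys) is an assumption and not taken from any particular genealogy application.

```python
from collections import defaultdict

def same_name_groups(children):
    """Group one couple's children by exact name; gender is ignored."""
    groups = defaultdict(list)
    for child in children:
        groups[child["name"]].append(child)
    # Only names given to more than one child are of interest.
    return {name: kids for name, kids in groups.items() if len(kids) > 1}

# Hypothetical sample data: Adrian and Adriana are different names,
# the two children called Alex share one name despite their gender.
children = [
    {"name": "Adrian", "gender": "M"},
    {"name": "Adriana", "gender": "F"},
    {"name": "Alex", "gender": "M"},
    {"name": "Alex", "gender": "F"},
]
print(sorted(same_name_groups(children)))  # ['Alex']
```

Finding such a group is not an error in itself; it merely selects the children whose lifetimes need to be compared.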

spelling variations

A straightforward check, as described here, works perfectly for modern civil registrations, in which the exact spelling of names is fixed. In earlier times, spelling was fluid; Johannes and Joannes are the same name. A smart implementation of the Same Name Children Check will recognise that, but getting that right is not easy, because what were variants of the same name back then are different names today.
Then again, in modern civil registrations Joannes isn't an acceptable spelling variation of Johannes, but it is unlikely that parents would call one child Johannes and another Joannes; while it is not unlikely that Joannes is a misspelling of Johannes by the researcher. A smart implementation that recognises name variations should report errors for truly identical names, and strong warnings for nearly identical names and spelling variations.
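One way to distinguish errors from strong warnings might look like the sketch below. The table of spelling variants is entirely hypothetical and holds just the one example from the text; a real implementation would need a far larger, period-aware table, and nothing here reflects how any existing product does it.

```python
# Hypothetical table mapping lower-case spelling variants to one canonical form.
CANONICAL = {
    "joannes": "johannes",
    "johannes": "johannes",
}

def canonical(name):
    """Return the canonical spelling of a name, or the name itself if unknown."""
    key = name.strip().lower()
    return CANONICAL.get(key, key)

def compare_names(a, b):
    """Return 'error' for identical names, 'warning' for variants, else None."""
    if a == b:
        return "error"
    if canonical(a) == canonical(b):
        return "warning"
    return None

print(compare_names("Johannes", "Johannes"))  # error
print(compare_names("Johannes", "Joannes"))   # warning
print(compare_names("Adrian", "Adriana"))     # None
```

A lookup table is used here instead of a generic string-similarity measure, because a similarity measure would also flag Adrian and Adriana, which this check treats as different names.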

Lifetime Overlap

It is common for a couple to have multiple children. It is not uncommon for a couple to have multiple children with the same name, but there is a difference; in general, their children may be alive at the same time, whereas children with the same name are not alive at the same time; after all, if the older child was still alive, the younger one would not get the name.

The lifespans of children with different names may overlap; the lifespans of children with identical names may not overlap. That way of describing the issue suggests an obvious implementation of this check: just compare their lifetimes. Look at their birth and death dates to check whether the lifetimes of the different children with the same name overlap.

That sounds like the obvious approach, but it will rarely work. More often than not, birth or baptism dates are known, but death or burial dates are not. Without the death or burial dates, there simply are no lifetimes to compare.
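A minimal sketch of this obvious approach shows where it breaks down. It assumes, for illustration only, records with hypothetical "birth" and "death" keys that hold years, with None for an unknown death.

```python
def lifetimes_overlap(first, second):
    """Return True or False when decidable, None when the death year is missing."""
    older, younger = sorted((first, second), key=lambda p: p["birth"])
    if older["death"] is None:
        return None  # no lifetime to compare
    # With year precision, a death and a birth in the same year need not overlap.
    return younger["birth"] < older["death"]

jan_1814 = {"name": "Jan", "birth": 1814, "death": None}
jan_1816 = {"name": "Jan", "birth": 1816, "death": None}
print(lifetimes_overlap(jan_1814, jan_1816))  # None
```

Whenever the death of the older child is unknown, which is the usual case, the function cannot decide anything.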

You might think there is no problem either; just put the children in order by birth or baptism dates, and then assume the early ones are dead when the later ones are named.

There is a problem, and there is a solution.

Problem

example

The problem is perhaps best explained with an example. Consider a couple, Peter and Mary, married in 1812, who have three boys, in 1814, 1816 and 1818. The first two die young, and all three boys are called Jan. The last Jan, born in 1818, marries Harriet in 1841, and then has children of his own. The last child is Martin, born in 1853, and registered by his father.

So, what actually happened is this:

Peter m. Mary in 1812,

1. Jan, b. 1814, died young

2. Jan b. 1816, died young

3. Jan, b. 1818, m. Harriet in 1841

In 1853 Jan and Harriet have a child named Martin

research gone wrong

Suppose that Martin is one of your ancestors. That makes Jan and Harriet your ancestors, so you look for information. Their marriage certificate lists his parents as Peter and Mary. You find that Peter and Mary were married in 1812, and then start looking through the birth registers for Jan’s birth. You find that they had a son Jan in 1814, are satisfied with the result, and continue to research the ancestors of Peter and Mary.

Several years later, you decide to add all the children of all your ancestors to the family tree. So you find all three Jans, dutifully add the 1816 and 1818 births and move on to the next couple in your tree.

Your genealogy now contains this fragment:

Peter m. Mary in 1812

1. Jan, b. 1814, m. Harriet in 1841

2. Jan b. 1816, died young

3. Jan, b. 1818

error checking

That fragment is wrong. If you take the time to look at it, you will notice that it is wrong, but how often do you look at the children of each couple looking for errors? Chances are that you do not know about this particular error at all, because no one ever told you about it, and if you do not even know this error might exist, you certainly do not know how to spot it.

Many genealogists never perform any error check. Those who do rely on the error checking of their software. I run these checks often and have become used to doing them, but I do remember being somewhat surprised and overwhelmed by the sheer number of problems I had accumulated in a few years.
Going through the entire list, correcting all the problems was a lot of work, and when I was done, I had a sense of accomplishment. I was glad it was done, and did not feel like questioning the quality of the checks. Nowadays I do question the quality of the checks, but most people do not. They are happy when their software tells them there are no errors anymore - and that software fails to catch this problem.

Do not think that the problem I sketched is artificial. I used a made up example to protect the guilty, but I have seen this particular mistake in many trees.

I am not calling attention to this problem to make fun of anyone who makes the mistake; I am calling attention to this problem to create awareness that it exists - and to explain how easy it is for software to detect this, to save genealogists from the embarrassment of publishing a mistake like this.

Solution

deducing death dates

One solution to detecting the problem is to let the software do what many genealogists do, and fill in the death dates based on the birth dates of later children.

The software need not modify the actual database, just perform the additions in working memory, to support the next step.

That next step is to simply perform the usual checks, during which it will discover a marriage that occurred years after death:

Peter m. Mary in 1812

1. Jan, b. 1814, d. Bef 1816, m. Harriet in 1841

2. Jan b. 1816, d. Bef 1818

3. Jan, b. 1818

Died before 1816, yet married in 1841? Obviously, something is wrong.
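The two steps above can be sketched roughly as follows. The record layout and the "died_before" key are assumptions made for this illustration; the deduction works on a copy, so the actual database records are left untouched.

```python
from copy import deepcopy

def deduce_deaths(children):
    """Return a working copy in which each child without a known death gets a
    'died_before' year taken from the birth of the next same-name sibling."""
    work = sorted(deepcopy(children), key=lambda c: c["birth"])
    for index, child in enumerate(work):
        if child.get("died_before") is not None:
            continue
        for later in work[index + 1:]:
            if later["name"] == child["name"]:
                child["died_before"] = later["birth"]
                break
    return work

def events_after_death(children):
    """Report events dated in or after the deduced 'died before' year."""
    problems = []
    for child in deduce_deaths(children):
        limit = child.get("died_before")
        if limit is None:
            continue
        for label, year in child.get("events", []):
            if year >= limit:
                problems.append(
                    f"{child['name']} (b. {child['birth']}, d. Bef {limit}): "
                    f"{label} in {year}"
                )
    return problems

# The erroneous fragment: the marriage is attached to the 1814 Jan.
family = [
    {"name": "Jan", "birth": 1814, "events": [("marriage", 1841)]},
    {"name": "Jan", "birth": 1816},
    {"name": "Jan", "birth": 1818},
]
for problem in events_after_death(family):
    print(problem)  # Jan (b. 1814, d. Bef 1816): marriage in 1841
```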

If this is implemented on top of the existing checks by performing this preparation first, the actual check will produce a message that tells the user that Jan married Harriet after Jan’s death.
That is not the best possible message. It is even a confusing one that makes the software seem defective, as there is no death date for Jan in the database at all.

check the overlap

The best method is probably to stay true to the original idea, and check the lifetime overlap. You may not know any death date, but events happen between birth and burial. This method does not start by assuming anything is wrong; it starts by using all the information there is to construct a minimum lifetime, something you may need to do for more checks. If Jan married in 1841, then he was alive in 1841.

That results in three lifetimes:

1. Jan, 1814 - Aft 1853

2. Jan, 1816 - Bef 1818

3. Jan, 1818 -

The first lifetime overlaps with the other two, so something is wrong.

With the consistency check performed this way, the application could report the issue as Jan (1816 - Bef 1818) overlaps with Jan (1814 - 1853), followed by Jan (1818 -) overlaps with Jan (1814 - 1853). Those messages clearly communicate what the problem is, and do so in a way that makes it clear how the program discovered the problem.

lifetime

Notice that I did not list the first lifetime as 1814 - Aft 1841, but as 1814 - Aft 1853. Jan and Harriet had several children, the last of which was born in 1853, and registered by his father, so Jan was alive in 1853.

If we had additional information that Jan was witness at Martin’s wedding, we would use that too; the minimal lifetime calculation uses all events the individual themselves participated in, in whatever role.
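The minimal lifetime calculation and the overlap check built on it could look something like this sketch. The event list per individual, given as hypothetical (label, year) pairs, and the exact message wording are assumptions for illustration. The sample data contains no death or burial events, so labelling the last known year as "Aft" is safe here; a fuller version would have to treat a death or burial as a hard end of the lifetime.

```python
from itertools import combinations

def minimal_lifetime(person):
    """First and last year in which the person is known to have been alive,
    based on every event the person took part in, in whatever role."""
    years = [year for _, year in person["events"]]
    return min(years), max(years)

def describe(person):
    start, end = minimal_lifetime(person)
    span = f"{start} - Aft {end}" if end > start else f"{start} -"
    return f"{person['name']} ({span})"

def same_name_overlaps(children):
    """Report same-name children whose minimal lifetimes overlap."""
    messages = []
    for a, b in combinations(children, 2):
        if a["name"] != b["name"]:
            continue
        (a_start, a_end), (b_start, b_end) = minimal_lifetime(a), minimal_lifetime(b)
        # Strict comparison: with year precision, touching years are not flagged.
        if a_start < b_end and b_start < a_end:
            messages.append(f"{describe(b)} overlaps with {describe(a)}")
    return messages

family = [
    {"name": "Jan", "events": [("birth", 1814), ("marriage", 1841),
                               ("registered birth of Martin", 1853)]},
    {"name": "Jan", "events": [("birth", 1816)]},
    {"name": "Jan", "events": [("birth", 1818)]},
]
for message in same_name_overlaps(family):
    print(message)
# Jan (1816 -) overlaps with Jan (1814 - Aft 1853)
# Jan (1818 -) overlaps with Jan (1814 - Aft 1853)
```

The second and third Jan do not overlap with each other in this sketch, because all that is known about the second Jan is his birth year.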

advantages

The beauty of using all existing information to construct a minimal lifetime to use in comparisons is that it is both general and simple.
It is general because it does not just look at birth and death events, but takes advantage of all known events.
It is simple because it does not need to look at any events of other individuals, but only at the events related to the individual itself.

That simplicity keeps the calculation of the minimum lifetime a straightforward procedure that does not need to get information from anywhere else. It also keeps it a general procedure that applies to all individuals (not just same name children) and that can be used for more than this particular consistency check.

conclusion

problem

The practice of giving multiple children the same name introduces a particular consistency problem; it is fairly easy to associate descendants with the wrong child, and existing consistency checks do not deal with that.

conceptual solution

Such errors can be discovered by checking for overlap between the lifetimes of the same name children, but a straightforward implementation of that check is no solution. If, as is often the case, no death or burial information is available, there are no lifetimes to compare.

emulation approach

One solution is to make the software emulate what many genealogists do, and deduce the death of the early children from the birth dates of the later children, and then go on to check each of them for inconsistencies as the software already does.

minimal lifetime approach

Another solution, one that takes full advantage of all available information, is to construct minimal lifetimes from all the events associated with the individual, and then check these lifetimes for overlap, as originally envisioned.

The beauty of calculating the minimal lifetime is that it uses all information for each involved individual, and does not require any information from events the individual itself is not involved in. It is a general procedure that is not specific to either same name children or this check, yet it does allow the actual check to do exactly what it was conceptually envisioned to do; simply check for overlap between the lifetimes of the same name individuals.

updates

2013-03-16: Genealogie Online

Bob Coret has implemented the Same Name Children Consistency Check in Genealogie Online.

2013-03-20: half-siblings clarification

Parents that remarry may bring children with the same name into the new marriage; those same-name children are still from two separate couples.
A parent may have multiple partners, and that may lead to half-siblings with a different last name. The check foremost applies to all children of a parent that have the same full name, be they siblings or half-siblings. A milder warning may be appropriate for siblings that share the same given name, but do not share the same last name.
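The distinction between the full check and the milder warning might be expressed as in this small sketch. The "given" and "surname" keys and the sample surnames are hypothetical, chosen only for illustration.

```python
def name_clash(a, b):
    """Classify two children of the same parent by how much of the name they share."""
    if a["given"] != b["given"]:
        return None
    # Same full name: the full check applies. Same given name only: milder warning.
    return "full" if a["surname"] == b["surname"] else "mild"

# Hypothetical half-siblings from two partners of the same parent.
first = {"given": "Jan", "surname": "Jansen"}
second = {"given": "Jan", "surname": "Pietersen"}
print(name_clash(first, second))  # mild
print(name_clash(first, first))   # full
```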

2013-03-20: spelling variations

Added paragraph about spelling variations.

2013-05-27: Bonkers

Tim Forsythe has added the start of Same Name Children Consistency Checking to Bonkers; it finds children with identical names. Tim Forsythe intends to make it a smart implementation, that handles spelling variations well.
