Modern Software Experience

2013-03-20

the consistency check you need most

statistics

I've more than once thought and written about consistency & reasonability checks, yet until a few days ago, I never wondered about the statistics for these checks. Sure, as I work through my own genealogy database, I regularly look at the numbers for that database, but I had not wondered which mistakes are most common.
A few days ago, I received an email from Bob Coret that drew attention to these numbers.

A minor genealogy trend of 2012 is that several major online genealogy sites added consistency checks. Late in 2012, Geneanet added consistency checks which are only shown to the owner of the database. A few months earlier, Genealogie Online added consistency checks that are shown to all visitors. On top of that, Tim Forsythe introduced Bonkers, an online GEDCOM sanity checker.

Same Name Children Consistency Check

Which consistency or reasonability checks are done varies from one application to another. Most applications that provide consistency checks perform the basic checks, such as comparing birth and death dates, but few check whether a couple's children are born too soon after one another. Back in 2009, I introduced a consistency check that no application was performing yet: the Same Name Children Consistency Check.

To recap the check briefly: children may not be given the same name as a living sibling, but may be given the same name as an already deceased sibling. If a couple has multiple children with the same name, the first few must have died young. Quite a few online trees have the first such child marrying a spouse, when it should of course be the last child with that name marrying that spouse.

The Same Name Children Consistency Check is that the lifespans of different children with the same name may not overlap. The check must be performed using the minimal lifespan of each child, as implied by the events in the database.
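The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Genealogie Online or any other product. The `Child` record, its `life_event_dates` field, and both function names are hypothetical. The sketch assumes that the listed dates are events from the child's own life (birth, baptism, marriage, death), that the minimal lifespan runs from the earliest to the latest such date, and that two lifespans only conflict when they overlap by more than a single shared day.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Child:
    """Hypothetical record: a given name plus the dated events of that child's life."""
    name: str
    life_event_dates: List[date]


def minimal_lifespan(child: Child) -> Optional[Tuple[date, date]]:
    """Earliest and latest known event; None when no dated events exist."""
    if not child.life_event_dates:
        return None
    return min(child.life_event_dates), max(child.life_event_dates)


def same_name_conflicts(children: List[Child]) -> List[Tuple[Child, Child]]:
    """Return pairs of same-name siblings whose minimal lifespans overlap."""
    conflicts = []
    for a, b in combinations(children, 2):
        if a.name != b.name:
            continue
        span_a, span_b = minimal_lifespan(a), minimal_lifespan(b)
        if span_a is None or span_b is None:
            continue  # nothing is implied by the database, so no message
        # Strict comparison: a child with only a birthdate has a zero-length
        # lifespan, so two such children never conflict with each other.
        if span_a[0] < span_b[1] and span_b[0] < span_a[1]:
            conflicts.append((a, b))
    return conflicts


# Example family: the first Jan is recorded as marrying in 1825,
# although a second Jan was born in 1803 and died in 1804.
first_jan = Child("Jan", [date(1800, 3, 1), date(1825, 6, 10)])
second_jan = Child("Jan", [date(1803, 2, 2), date(1804, 1, 5)])
piet = Child("Piet", [date(1806, 1, 1)])

for a, b in same_name_conflicts([first_jan, second_jan, piet]):
    print(f"{a.name}: {minimal_lifespan(a)} overlaps {minimal_lifespan(b)}")
```

The example family reproduces the typical error: the marriage is attached to the first child of that name, which stretches that child's minimal lifespan across the birth of the second.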

Genealogie Online

About a week ago, I was talking to Bob Coret and mentioned this new consistency check again. About a day later, he had implemented it in Genealogie Online. Another two days later, I received an excited email telling me that het is een echte topperrrrrrr!!!! (it is a real topperrrrrrr!!!!). Only about half the publications on Genealogie Online had been checked, yet the new check had already surpassed every other check in number of issues found. The number one issue used to be Age at marriage below 16, with about 27.000 occurrences; the Same Name Children Consistency Check had already found 47.000 issues. Not all the occurrences of people marrying younger than 16 are errors; people did marry young, and those actual marriages are false positives that inflate the count. False positives for the Same Name Children Consistency Check are rare.
The number for the Same Name Children Consistency Check has to be divided by at least two, because Genealogie Online issues a message for each child involved in an issue and counts every one of those messages towards the total. Still, the Same Name Children Consistency Check finds more issues than any other check.

most common issue

I urged Bob to double-check this result. He confirmed that he was using the minimal lifespan approach, so if children have the same name but there are no other dates than birthdates, no message is produced at all. Every link to a family with same name children whose lifespans overlap checked out.

The new consistency check seems to be working just fine, and Genealogie Online is more than large enough to consider it representative, at least of Dutch genealogy, so that leaves just one conclusion.
Same name children with overlapping lifespans is the most common genealogy mistake.

check                                    count
Same Name Children Consistency Check     56.068
Age at marriage below 16 years           26.835
Mother younger than 16                   16.621
Mother deceased before birth             14.161
Father older than 65                     13.763
Age above 100                            13.556
Father deceased more than 9 months       13.465
Father younger than 16                   10.701
Died before marriage                      9.311
Married before born                       8.291

statistics

Bob Coret provided some statistics, published here with permission. The table shows the top ten issues found by the Genealogie Online consistency checks. That is for a site with 54.276 registered members, who created 5.797 family trees, containing 19.482.764 profiles, close to twenty million.
All these numbers are for today, 2013 Mar 20.
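The per-child counting of the Same Name Children Consistency Check can be applied to the table's figures as a quick back-of-the-envelope calculation. The sketch below only halves the message count, which gives an upper bound on the number of distinct issues; families with three or more same-name children would lower the real number further. The variable names are made up for this illustration.

```python
# Figures from the table for 2013 Mar 20; the dots in the table are
# thousands separators, so 56.068 means 56068.
same_name_messages = 56_068
marriage_below_16 = 26_835

# Every issue involves at least two children and each child gets a message,
# so the number of distinct issues is at most half the message count.
max_distinct_issues = same_name_messages // 2

print(max_distinct_issues)                      # 28034
print(max_distinct_issues > marriage_below_16)  # True
```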

The ranking of the various issues has remained fairly constant since last year's introduction of consistency checks to Genealogie Online. Since its introduction a few days ago, the Same Name Children Consistency Check tops the list as the check that finds more issues than any other consistency or reasonability check.

most common genealogy mistake

Genealogie Online is the first and currently only service to implement the Same Name Children Consistency Check, and Bob Coret's latest blog post invites readers to upload their database to his site.
When I wrote Same Name Children Consistency Check back in 2009, I knew the check would be effective at finding serious issues in genealogy databases, and knew from looking at many trees that the issues it finds aren't uncommon. I had not thought about ranking the various checks by issues found, so I certainly did not expect the new consistency check to outrank all the existing ones.

It turns out that the mistakes this new check finds are not just common, but the most common genealogy mistakes. The Same Name Children Consistency Check is a must-have consistency check. It is the one check likely to find more issues in your database than any other check.

Now that these statistics are known, more leading vendors are likely to augment their consistency checking repertoire with the Same Name Children Consistency Check.
If the vendor of the genealogy editor you're using is not offering this consistency check yet, ask for it. According to these numbers, we need this check more than any other.

updates

2013-03-20 instant update: Overview frequent inconsistencies

Genealogie Online has a public Overview frequent inconsistencies page now. Numbers are updated daily.
