Modern Software Experience

2012-11-18

Consistent advice & plausible tips

Genealogy without consistency checks is unlikely to be genealogy.

easy

Performing genealogy consistency & plausibility checks is easy. Just start your genealogy editor and choose the menu item to perform consistency checks.
Well, it should be that easy, but many genealogy editors still do not offer consistency checks. The solution is to upgrade to something better. I'm not joking. You need consistency checks. Genealogy without consistency checks is unlikely to be genealogy. A genealogy editor without consistency checks hardly deserves to be called a genealogy editor.

essential

Consistency checks are an essential, must-have feature of genealogy editors. You should not even bother trying a genealogy editor without consistency checks. If you are trying out a genealogy editor, start by examining the menu and help file to find out whether it offers consistency checks. If it does not, remove it from your system, and try something else.
Don't feel sorry for vendors that put all their efforts into other features. A vendor that cannot be bothered to offer consistency checks in their genealogy editor, should not expect you to bother with their product.

guidance

The vendors that do provide consistency & plausibility checks, even the ones that provide very good checks, provide no guidance on how to use these checks effectively.
Effective genealogy consistency & plausibility checking seems to be a topic few have ever written about. Many authors dismiss it with no more than the obligatory remark that you should do it. Even vendors that have gone to the trouble of implementing consistency & plausibility checks in their product seem so uninterested in making sure you actually take advantage of that feature that they do not even bother to provide practical default settings.

This article provides guidance on making the most of consistency & plausibility checks, such as which checks you should use, which checks you should turn off, how to tweak plausibility tests for best results, and which issues to tackle first.

check the surroundings

Performing genealogy consistency & plausibility checks is easy. Solving the problems is the hard part.
This article isn't about how to solve the problems that the checks uncover, but I still want to give one general piece of advice: check the surroundings.

Errors happen. We make more errors when we are in a hurry or have a bad day, and all those errors occur in the profiles for the individuals we examined that day. That's why errors tend to be clustered.

Even the most stringent consistency & plausibility checks cannot catch all errors; they only catch those errors that result in inconsistencies and unlikely situations. Other errors remain undetected.
However, we can use the fact that errors tend to be clustered to catch a few more. Whenever the consistency check uncovers an error, do not just fix that error, but double-check all the data for that individual, its partners and families. Double-check who the parents are. If a family seems incomplete, perform research to find all the children.
Not only are you likely to discover any data entry errors you made while you do so, the additional data you enter also gives the consistency checker more opportunity to discover inconsistencies and alert you to other errors in the data you already had.

unrecognised dates

Genealogy editors may provide all kinds of checks. Place name checks are particularly useful. This article focuses on consistency & plausibility checks.
These checks work by comparing the dates at which various events occur. Comparison is only possible if there are dates to compare, and those dates are in a format the application recognises. Thus, the first result of a consistency check may very well be a list of unrecognised dates, and that should be fixed first.
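
To make that concrete, here is a minimal Python sketch of how a checker might collect unrecognised dates before it compares anything. The list of accepted formats, the function names and the sample events are all assumptions made for illustration; real genealogy editors recognise many more formats.

```python
from datetime import datetime

# Hypothetical list of accepted formats, for illustration only.
FORMATS = ("%Y-%m-%d", "%d-%m-%Y")

def parse_date(text):
    """Return a date, or None when the text is not in a recognised format."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def unrecognised_dates(events):
    """List the (person, event, text) entries whose date cannot be parsed."""
    return [event for event in events if parse_date(event[2]) is None]

# Made-up sample data.
events = [
    ("I1", "birth", "1850-03-12"),
    ("I1", "death", "12-03-1912"),
    ("I2", "birth", "spring 1875"),
]
print(unrecognised_dates(events))
```

Only the last entry is reported; until it is fixed, no check that involves that birth can say anything useful.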

first time

The first time you perform a consistency & plausibility check, you are likely to be surprised and more than a little intimidated by the results. Don't panic. First of all, you are doing the right thing. That you decided to perform the checks makes you a better researcher already.

Performing consistency checks is the first step to becoming a real genealogist. Actually solving the consistency issues is arguably the number one thing that separates real genealogists from the rest; after all, inconsistent genealogy really isn't genealogy.

Secondly, your database probably isn't as bad as that first check makes it seem. If you went with the default settings for the various checks, there are likely to be a lot of false positives in that list of errors; situations that have been listed as a possible error, but that aren't wrong at all.
In general, the default settings aren't very good. Not only are there various settings that need a bit of tweaking, there are even various checks that are best turned off!

turn off

The intimidating size of the list produced by your first consistency check is likely to be a turn-off. However, once you start going through the list to solve issues, false positives become the real turn-off. Luckily, you can probably turn off the checks that produce most of those.

born before marriage

A lot of consistency & plausibility checkers will warn you when a child is born before their parents' marriage. Some slightly smarter ones may take the pregnancy period into account, but that matters little. What matters is that being pregnant before marriage, or even having children before marriage is not rare at all.
The born-before-marriage check isn't a consistency check; there is nothing genealogically inconsistent about having children without or before marriage. The born-before-marriage check isn't a plausibility check either; there is nothing implausible about having kids before marriage, au contraire, it is so common that it is unreasonable to call attention to it.

The born-before-marriage check is neither a consistency check nor a plausibility check, but a scandal check; it checks whether a couple violated particular mores.
The born-before-marriage check is not useless; you may want to turn it back on one day, to go through the results and make sure you didn't make any mistakes, but it should be turned off during regular consistency checking. After all, you are checking for genealogical consistency & plausibility, not looking for gossip.

duplicate check

Some applications include a duplicate check as part of the consistency & plausibility checks. Technically, a duplicate check is neither a consistency nor a plausibility check, but an even more fundamental check; each detected duplicate represents a failure to record a single individual as such.

Duplicate checks often produce a truly impressive amount of false positives. Feel free to try it, but don't hesitate to turn duplicate checking off during regular consistency & plausibility checking.
By the way, if you know that you have many duplicates, for example because you recently merged several databases that you maintained separately, it may make sense to tackle that issue first.

same last name check

Several applications offer the option to warn you when two partners have the same last name. That check should be turned off during regular consistency & plausibility checking, as partners having the same last name is neither inconsistent nor implausible.

The same last name check is not useless. You will want to take advantage of that check if you made the mistake of entering many women by their married name instead of their maiden name.

unknown gender

Technically, the unknown gender check isn't a consistency check, but a data completeness check. The idea behind the unknown gender check is that every individual in your database has a gender, and that it is a mistake to omit it. So when some individual is not listed as male or female, but as unknown, the check alerts you, so you can update the profile. That may sound like a useful check, but in practice, it is mostly an annoying one.

As your research progresses and your database grows, you'll often find yourself entering stillborn children. It is not unusual for the gender of stillborn children to be unlisted, and over time, your database will come to contain quite a few stillborn children of unknown gender. So, when you perform the unknown gender check, most results will be stillborn children for which you will never have information about their gender anyway.

You may want to perform this check once in a while, to make sure you have provided a gender for all living born. When you focus on this check, you should find it very easy to ignore stillborn, as these profiles generally lack a first name.
During regular consistency checking, the unknown gender check should be turned off, to keep focus on consistency issues.

born after death

Several applications will check for children born after their parents' death. The notion of children born after a parent's death may have an impossible ring to it, but just a moment's thought should convince you otherwise. In a biological genealogy, the mother has to be alive to give birth, but the father may have died between conception and birth. That is why genealogy applications must distinguish between born after mother's death and born after father's death. Sadly, most applications do not take the pregnancy period into account, and will complain about a child born just days after the father's death, although that is perfectly possible. If you have a large database, the born after father's death check will result in many such false positives.

You should perform the born after father's death check once in a while, to make sure there aren't any cases where the father died long before the child was born, but during regular consistency checking, the born after father's death check should be turned off, while the born after mother's death check should remain turned on.
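
For the technically inclined, this is roughly what a check that does take the pregnancy period into account could look like. It is a sketch in Python, not how any particular application does it; the function name and the 300-day allowance are my own assumptions.

```python
from datetime import date, timedelta

# Assumed allowance for the pregnancy period, chosen generously.
GESTATION = timedelta(days=300)

def born_after_death(child_birth, mother_death=None, father_death=None):
    """Return warnings for a birth that falls after a parent's death."""
    warnings = []
    # The mother has to be alive to give birth; a birth on the day
    # of her death is still possible.
    if mother_death is not None and child_birth > mother_death:
        warnings.append("born after mother's death")
    # The father may have died between conception and birth.
    if father_death is not None and child_birth > father_death + GESTATION:
        warnings.append("born long after father's death")
    return warnings

# Born twelve days after the father's death: perfectly possible, no warning.
print(born_after_death(date(1850, 6, 1), father_death=date(1850, 5, 20)))
# Born two years after the father's death: worth a look.
print(born_after_death(date(1852, 6, 1), father_death=date(1850, 5, 20)))
```

With such an allowance, the born after father's death check would produce far fewer false positives.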

out of order

Children should be listed in the order they were born. Most reports print children in the order they occur. A random order is confusing. For couples with lots of children, anything but chronological order may easily cause you to overlook the presence of a child, and make you decide to add it again, creating a duplicate.

Many consistency checkers will not only alert you to children that are out of order, but to partnerships that are out of order as well. Those errors are easy to fix, all you need to do is to rearrange the order of children or partnerships, so do not delay fixing them. In the few cases where the children or partnerships are actually in chronological order, but seem out of order because of a typo, you'll often notice that typo right away.
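
The out-of-order check itself is simple. The Python sketch below is an illustration under my own assumptions: a family is a list of (name, birth date) pairs, children without a birth date are skipped by the check, and the fix places them last.

```python
from datetime import date

def is_out_of_order(children):
    """True when a child with a known birth date is listed before an older sibling."""
    dated = [birth for _, birth in children if birth is not None]
    return any(later < earlier for earlier, later in zip(dated, dated[1:]))

def in_birth_order(children):
    """Return the children sorted by birth date; undated children go last."""
    return sorted(children, key=lambda c: (c[1] is None, c[1] or date.min))

# Made-up sample family.
family = [
    ("Anna", date(1801, 1, 1)),
    ("Pieter", date(1798, 5, 3)),
    ("Jan", None),
]
print(is_out_of_order(family))
print([name for name, _ in in_birth_order(family)])
```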

individual consistency first

A consistency check may produce a variety of errors. Unless the list is small, it is unlikely that you will fix them all in one go, but even when the list is small, it still pays to fix the most fundamental issues first.
Deal with individual consistency first, relationship consistency second; if a mother's death date is before her birth date, there's no way birth dates for her children can make sense.
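
A small Python sketch shows why that order matters. The dictionary layout and function names are assumptions for illustration; the point is that the relationship check is not even attempted while the mother's own dates are inconsistent.

```python
from datetime import date

def individual_errors(person):
    """Strong consistency check on a single profile."""
    birth, death = person.get("birth"), person.get("death")
    if birth and death and death < birth:
        return ["death before birth"]
    return []

def check_family(mother, children):
    """Run relationship checks only once the mother's own dates are consistent."""
    errors = individual_errors(mother)
    if errors:
        return errors  # fix these first; relationship results would be noise
    death = mother.get("death")
    return [
        f"{child['name']} born after mother's death"
        for child in children
        if death and child.get("birth") and child["birth"] > death
    ]

# Made-up sample data: the death year 1802 is a typo for 1882.
mother = {"name": "Maria", "birth": date(1820, 1, 1), "death": date(1802, 1, 1)}
children = [{"name": "Jan", "birth": date(1845, 3, 9)}]
print(check_family(mother, children))
```

Once the death date is corrected, the same call reports nothing; had the relationship check run first, it would have complained about a child who is perfectly fine.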

settings

Strong consistency checks, such as birth before death, should simply always be done. Some checks are best turned off during regular consistency checking. Other checks should be turned on, but their settings have to be tweaked for best results.

maximum age

People can live to be more than 100 years, even more than 110 years. Someone who lives to be 100 is a centenarian, someone who lives to be 110 is a supercentenarian.
Many genealogy consistency checks include a maximum age check. The Guinness Book of World Records has been tracking the oldest living person in the world, and their data provides practical upper limits, but there is no absolute upper limit. The maximum age check isn't a consistency check, but a plausibility check.

Many applications default to 100 years as the maximum age to check against, and that is too low for regular checks; you'll get warnings for every centenarian in your database (false positives). On the other hand, if you set the boundary too high, the check will fail to report real issues (false negatives). You should probably set the boundary value higher than 100, but at the same time keep it as low as is practical for your database.
How low you can set this value without being swamped by false positives does to a large extent depend on the size of your database. After all, a larger database is more likely to contain centenarians or even supercentenarians.
I've found, for a database of more than a quarter million profiles, that 105 is a practical boundary value; it results in only a few false positives, which I have come to recognise as such.
That is how you determine the boundary value for this plausibility check; as low as possible, but still so high that you can remember the few exceptions to the rule.

100

Vendors are far from wrong to default the boundary value for the maximum age check to 100. Not only do you need to determine the optimal value yourself, 100 actually is the optimal default, not because it corresponds to a particular likelihood, but because it catches a particular class of errors; cases where the digit for the century is off by one for a child that died young. Children often died very young and off-by-one typos are easily made, so this particular error is far from unlikely - and the maximum age check with the boundary value set to 100 will issue a warning for all those cases. There will be many false positives for actual centenarians, but it remains a useful check that you should take advantage of.
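
The trade-off between the two boundary values can be illustrated with a few lines of Python. This is a sketch with made-up sample data and hypothetical function names, not any vendor's implementation.

```python
from datetime import date

def age_at_death(birth, death):
    """Age in completed years."""
    return death.year - birth.year - ((death.month, death.day) < (birth.month, birth.day))

def too_old(people, max_age):
    """Names of the people whose age at death exceeds the boundary value."""
    return [name for name, birth, death in people if age_at_death(birth, death) > max_age]

people = [
    # Died aged 2, but the death year was typed as 1952 instead of 1852.
    ("century typo", date(1850, 4, 2), date(1952, 9, 14)),
    ("genuine centenarian", date(1800, 1, 5), date(1903, 2, 1)),
    ("ordinary lifespan", date(1810, 7, 7), date(1880, 3, 3)),
]
print(too_old(people, 100))
print(too_old(people, 105))
```

With the boundary at 100, both the typo and the genuine centenarian are reported. With the boundary at 105, neither is; the false positive is gone, but so is the warning for the century typo.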

parents' age at birth

Most children are born to parents in their twenties or thirties, maybe forties. Some ages are considerably more plausible than others, and a plausibility check takes advantage of that, to alert you to unlikely situations that may very well be errors. The parents' age at birth plausibility check has both an upper and a lower limit; neither very young children nor very old people are likely to have children.
There should be separate values for fathers and mothers; while the father and mother are often about the same age, their ages can differ significantly, and old fathers are considerably less implausible than old mothers.

The default values that various genealogy vendors supply vary a lot. The default maximum age of parents may be 45 or 50 years. I generally set the maximum age of the mother to 55, and that of the father to 75, to avoid too many false positives. The default minimum age may be 16, 15 or 14. You should set it to a low value like that, but as high as possible; there should only be a few false positives.
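
Separate limits for fathers and mothers are easy to express in code. In the Python sketch below, the maximum ages are the ones I use; the minimum of 15 and all names are assumptions for illustration.

```python
from datetime import date

# (minimum, maximum) age at the birth of a child.
# The maxima follow the text above; the minimum of 15 is an assumed value.
LIMITS = {"mother": (15, 55), "father": (15, 75)}

def years_between(earlier, later):
    """Completed years from one date to another."""
    return later.year - earlier.year - ((later.month, later.day) < (earlier.month, earlier.day))

def parent_age_warning(role, parent_birth, child_birth):
    """Return a warning string, or None when the age is plausible."""
    minimum, maximum = LIMITS[role]
    age = years_between(parent_birth, child_birth)
    if age < minimum:
        return f"{role} was only {age} at birth of child"
    if age > maximum:
        return f"{role} was already {age} at birth of child"
    return None

# The same age is plausible for a father, but not for a mother.
print(parent_age_warning("father", date(1800, 1, 1), date(1860, 6, 1)))
print(parent_age_warning("mother", date(1800, 1, 1), date(1860, 6, 1)))
```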

age at partnership

Another common plausibility check is age at partnership, often called age at marriage.
Partnership and parenthood are strongly related, but their age distributions are not identical. People often form new partnerships after a break-up or the death of their partner, and keep doing so until death. The age distribution for first marriages is not very different from the age distribution for first children, but the age distribution of second and third marriages is quite different.
To avoid too many false positives, I generally set the maximum marriage age to 85, 90 or even 95. Historically, people have married as young as 14 years old, but you can still choose to use 15 or 16 as the minimum to check against; try several values to see what works best for your database.

override warnings

Applications that provide plausibility checks often provide the ability to override the warning; check a box that tells the application to not show that warning again. Simply checking every box will clear all the warnings, but defeats the purpose of the plausibility check. Exercise restraint and do not nonchalantly override warnings you don't like. That said, the ability to override warnings is useful; use it to suppress warnings when you are sure the data is correct, so the consistency check does not keep bothering you with that issue, but lets you focus on other cases. For example, when you are sure that some individual is a centenarian, you can override the warning, and when you've done so for many of them, you can tighten the consistency check by reducing the maximum age it tests against.
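
Conceptually, an override is nothing more than a remembered exception for one individual and one check. A minimal Python sketch, with hypothetical identifiers and check names:

```python
def visible_warnings(warnings, overrides):
    """Drop warnings that the user has explicitly confirmed as correct data."""
    return [warning for warning in warnings if warning not in overrides]

# Each warning is a (profile identifier, check name) pair; sample data is made up.
warnings = [
    ("I17", "maximum age"),
    ("I42", "maximum age"),
    ("I42", "mother too old"),
]
overrides = {("I17", "maximum age")}  # I17 is a verified centenarian

print(visible_warnings(warnings, overrides))
```

Note that the override applies to one check for one individual only; the other warnings stay visible.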

limitations

Consistency & plausibility checking is no panacea. Consistency & plausibility checks will only highlight mistakes that lead to inconsistencies and implausibilities; other issues will remain undetected.

There is no standard for genealogical consistency & plausibility checking. Different applications do not all perform the same checks. To get the maximum benefit from consistency & plausibility checking, regularly export your database to perform checks using other applications.

Consistency & plausibility checks are rarely flawless. For example, when children are adopted by their grandparents, the consistency check may start complaining that the adoptive parents are too old, a complaint it should not give.

summary

Consistency checks should be performed regularly. That's obvious advice. Not so obvious, perhaps even counter-intuitive, is that, for best results, several checks should be turned off. Moreover, several default boundary values should be tweaked.

The goal of consistency & plausibility checking is to find problems and fix them; and that's easiest to do when you've minimised distracting noise and false positives. Any checks that aren't performed during regular consistency checking can still be performed at other times.

Some issues may need attention before performing consistency checks; if you know your database contains many duplicates, you may want to fix that issue first, to perform consistency checking on a much smaller, less duplicated database.

first time

The first few times you perform checks, you are likely to feel overwhelmed. Tweak the settings for best results, then focus on the most egregious errors and some easy fixes.
Consistency & plausibility checking depends on comparison of dates, so any unrecognised dates should be fixed first. Children & partnerships that are out of order are confusing but easy to fix, so do not hesitate to give some priority to solving those issues. Ideally, after the first few consistency checks, there should be no unrecognised dates, and no children or partnerships should be out of order anymore.

regular checking

During regular consistency checking, you should fix bad dates first, fix consistency issues second, examine plausibility issues third, and not bother with scandal checks at all.
Deal with individual consistency checks before relationship consistency checks.
Use the ability to override warnings wisely.
Last but not least, regularly export your database to perform consistency checks with other applications.
