Modern Software Experience

2017-12-02

A Technical but Important Issue

capable database

Your genealogy software needs a capable database; a database capable of handling your genealogy.
What is the Best Genealogy Program already mentioned that your software should support Unicode, and that means its database must support Unicode. With very few exceptions, modern genealogy programs are Unicode-based.
Genealogy Software Limits already mentioned that the software (and its database) should not have low arbitrary limits on, for example, the length of names, the number of individuals, or the total database size. As a general rule, low arbitrary limits are a misfeature of early, 16-bit genealogy software; the limits of typical 32-bit programs are hardly ever reached in practice. Most genealogy software sold today is 32-bit or 64-bit software.
Deliberately Limited Genealogy Programs pointed out that some vendors add an anti-genealogical misfeature, deliberately blocking you from entering same-sex marriages. Few vendors still do so. Most genealogy software sold today allows you to enter same-sex marriages.

There is another database issue to consider when selecting a genealogy program. Your main genealogy editor should use a real database system. You particularly need to avoid genealogy software that uses GEDCOM as its database format.

genealogy database

Genealogy applications keep your genealogy data in some database, your genealogy database. There is no standard file format for genealogy databases. Different genealogy applications by different vendors use different file formats, just like different word processors by different vendors use different file formats. Each vendor makes up their own file format.

The file format used by genealogy applications varies not only from one product to another, but even from one version to another. Different versions of the same product often use slightly different file formats, and sometimes wildly different ones. Vendors often make small and sometimes big changes to their file format for new versions of their product, to support new features.

database engine

Genealogy applications are database applications; very simply put, a genealogy application is a user interface on top of a database. Most vendors of database applications do not create their own database from scratch, but rely on proven database technology. Most vendors use a so-called database engine, a ready-made database system that they can use within their program. This database engine provides all the general database stuff, so that the programmer need not worry about that, but can concentrate on application-specific stuff.
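To make "a database engine used within the program" concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The `individual` table, its columns and the `idx_surname` index are hypothetical, chosen for illustration only; they are not the layout of any actual genealogy product. The engine takes care of storage, indexing and Unicode text; the application merely declares its layout and issues queries.

```python
import sqlite3


def open_genealogy_db(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical genealogy database in an embedded SQLite engine."""
    conn = sqlite3.connect(path)
    # The engine handles storage, locking and indexing; the application only declares its layout.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS individual (
               id      INTEGER PRIMARY KEY,
               given   TEXT NOT NULL,
               surname TEXT NOT NULL,
               birth   TEXT
           )"""
    )
    # An index lets the engine find people by surname without scanning every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_surname ON individual(surname)")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = open_genealogy_db(":memory:")
    # SQLite stores TEXT as Unicode, so non-ASCII names need no special handling.
    conn.execute(
        "INSERT INTO individual (given, surname, birth) VALUES (?, ?, ?)",
        ("Søren", "Kierkegaard", "1813"),
    )
    conn.commit()
    print(conn.execute(
        "SELECT given, surname FROM individual WHERE surname = ?", ("Kierkegaard",)
    ).fetchall())
```

Swapping in another engine would change the connection code and some SQL details, but not the division of work between engine and application.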

The database engine a vendor chooses for their product is influenced by many factors, including the operating system the application runs on, and the programming language used to build the application. For example, many Visual Basic applications use an Access database, FileMaker is popular on MacOS, many programs written in C use something like SQLite, while many web applications use MySQL.
Each of these choices has advantages and disadvantages that affect the capabilities and performance of the end product. Generally speaking, you need not worry about which database engine a genealogy program uses. Practically any choice is fine, as even the largest genealogy, the some two million descendants of Confucius, is fairly small compared to what most database engines can handle.
There are exceptions, of course. Some older database systems, particularly some that enjoyed early popularity with MS-DOS and Windows programmers, have serious limitations. Two such systems that were popular with developers of genealogy software were FoxPro and Access.

limitations

FoxPro: no Unicode

FoxPro was a programming environment, aimed at beginning programmers, that made it fairly easy to create database applications for the PC. When Microsoft decided to stop supporting FoxPro back in 2007, developers using FoxPro were practically forced to switch to something new, and many switched to Microsoft Access.
At the time, some popular genealogy programs based on FoxPro, such as CommSoft Roots II, IV and V, and Palladium Interactive’s Ultimate Family Tree, had already been discontinued, and most users of these programs had already switched to something else.
The few application vendors who stuck with FoxPro were limited by an obsolete product that lacks Unicode support. One such vendor, Wholly Genes Software, did not move beyond FoxPro, and ultimately discontinued its product, The Master Genealogist, in 2014.

Access: only 2GB

Microsoft FoxPro is practically forgotten. Microsoft Access remains very popular.
Microsoft Access has supported Unicode since Access 2000, but even the latest version of Microsoft Access (Access 2016) limits the total size of a database to 2 gigabytes. That may sound like plenty of room, but whether it actually is depends on how smartly an application uses that space; a nonchalantly written application can gobble up space like crazy.
One still-popular genealogy application that uses an Access database is Millennia's Legacy Family Tree. A few years ago, I discovered the hard way that my genealogy database is so large, and Legacy Family Tree's GEDCOM import so inefficient, that Legacy manages to run out of database space trying to import my GEDCOM file into an empty database...
Microsoft's position on Access' database size limitation is simple; if you need more space than Access can provide, you should not be using Microsoft Access, but Microsoft SQL Server. The free Microsoft SQL Server Express Edition allows databases up to 10 GB, and that provides plenty of breathing room if 2GB is just a little cramped, as well as an easy migration path to a more powerful edition of SQL Server, or some competing database system.

reasons

There are very good reasons why application developers rely on database engines. There are technical reasons, such as transaction integrity, but that is developer jargon, and it is possible to express the reasons in plain English.
Simply put, using a database engine makes writing database applications a lot easier, makes the resulting applications very reliable, and typically, even makes those applications a lot faster than they would be without the database engine.

Among the reasons to use a ready-made database system is that it allows efficient updating of individual records. The practical upshot is that, when you use a typical database application, you never have to save your work; every record you edit is saved as soon as you move on to something else.
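A minimal sketch of that save-as-you-go behaviour, again assuming SQLite via Python's sqlite3 module and a hypothetical `individual` table: each edit is its own small transaction that rewrites only the affected record, so there is nothing left to save afterwards. The function names `rename_individual` and `demo` are made up for this illustration.

```python
import sqlite3


def rename_individual(conn: sqlite3.Connection, person_id: int, given: str, surname: str) -> None:
    """Apply and commit a single edit; only the affected record is rewritten."""
    with conn:  # commits when the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE individual SET given = ?, surname = ? WHERE id = ?",
            (given, surname, person_id),
        )


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # a real application would use a file on disk
    conn.execute("CREATE TABLE individual (id INTEGER PRIMARY KEY, given TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO individual (given, surname) VALUES (?, ?)",
        [("Anna", "Smit"), ("Pieter", "de Vries")],
    )
    conn.commit()
    rename_individual(conn, 1, "Anna", "Smith")  # committed as soon as this returns
    return conn


if __name__ == "__main__":
    print(demo().execute("SELECT given, surname FROM individual ORDER BY id").fetchall())
```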

data transfer

Every vendor defines their own database file format, typically by choosing a database engine and defining their database layout. That each vendor can and does define their own file format is a Good Thing; it allows them to pick a file format that fits their application and its features. There is a drawback for users: it may lead to vendor lock-in, the inability to change to any other product because your data is locked into their database format, with no obvious way to get it out.
That's why Choosing your First Genealogy Program mentioned The Most Important Genealogy Software Feature; the ability to get your data in and get it out again. Today's practical translation of that conceptual demand is that your primary genealogy editor should import from and export to the GEDCOM format, the GEDCOM 5.5.1 format to be precise.

GEDCOM is a data transfer format, a file format specifically designed to transfer data between disparate systems. Each vendor can use whatever database format they like for their genealogy application; as long as it supports import from and export to GEDCOM files, you can transfer your data from and to other genealogy applications.
Well, that's the idea and it works pretty well, because practically every genealogy application supports GEDCOM. GEDCOM is not perfect, and vendors do not always make it a top priority to make their GEDCOM export work well, but that's another topic.
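A minimal sketch of the export side, assuming a hypothetical list of (number, given name, surname) rows read from the application's own database. It writes just enough structure for a small GEDCOM 5.5.1 file: a header declaring version 5.5.1 and UTF-8, a submitter record, one INDI record per person, and the trailer. `export_gedcom`, `HYPOTHETICAL_APP` and the submitter name are placeholders; a real export would carry far more data.

```python
from typing import Iterable, Tuple


def export_gedcom(people: Iterable[Tuple[int, str, str]], source: str = "HYPOTHETICAL_APP") -> str:
    """Export (number, given, surname) rows as a minimal GEDCOM 5.5.1 text."""
    lines = [
        "0 HEAD",
        f"1 SOUR {source}",
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "1 SUBM @U1@",
        "0 @U1@ SUBM",        # submitter record referenced from the header
        "1 NAME Unknown",
    ]
    for number, given, surname in people:
        lines.append(f"0 @I{number}@ INDI")
        # GEDCOM marks the surname part of a name by enclosing it in slashes.
        lines.append(f"1 NAME {given} /{surname}/")
    lines.append("0 TRLR")    # trailer closes the file
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(export_gedcom([(1, "Søren", "Kierkegaard"), (2, "Anna", "Smit")]), end="")
```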

plain text

GEDCOM files are not databases. GEDCOM files are text files. To be more precise, GEDCOM files are plain text files; GEDCOM files contain nothing but text, without any layout or formatting information. You can load a GEDCOM file into an editor such as Windows NotePad to see what it looks like; it's a text file consisting of many lines, most of them quite short, with each line containing a bit of information.
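Each of those short lines follows the same simple layout: a level number, an optional cross-reference ID between @ signs, a tag, and an optional value. The sketch below splits one line into those parts; it assumes well-formed GEDCOM 5.5.1 lines and ignores CONT/CONC continuation lines and character-set questions. `parse_line` and `GedcomLine` are names made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

# level, optional @xref@, tag, optional value (leading whitespace tolerated)
GEDCOM_LINE = re.compile(r"^\s*(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its level, cross-reference ID, tag and value."""
    m = GEDCOM_LINE.match(text.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = m.groups()
    return GedcomLine(int(level), xref, tag, value)


SAMPLE = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAY 1850
0 TRLR"""

if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        print(parse_line(line))
```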

GEDCOM-as-a-database

Most genealogy application developers use a ready-made database engine, some developers create their own custom database system, and then there are developers who use GEDCOM files as a database...

Using GEDCOM as a database format is a serious design blunder, and one that tends to create serious problems. That is not a shortcoming of GEDCOM; GEDCOM was designed as a data transfer format, for transfer of data between genealogy databases, not as a genealogy database itself.

You should make sure that your main genealogy editor will import and export GEDCOM files, but actively avoid genealogy programs that use GEDCOM as their database format.
It is rarely hard to recognise these products; most vendors that make this blunder are lone developers so convinced that they had an absolutely brilliant idea that they proudly list this misfeature as a selling point...

flat text

GEDCOM files contain data, but that does not make them databases. GEDCOM files are flat files; files without any readily usable file structure.
The data in a GEDCOM file corresponds to a structure (your genealogy), but the only way to discover that structure is to read and interpret the entire file; and that's exactly what your genealogy application does when it imports your GEDCOM file into a database.
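The sketch below shows what "read and interpret the entire file" means in practice: to build even a simple map from cross-reference IDs to records, a program has to walk every line, because nothing in the file says where a record starts or ends except the next level-0 line. It assumes well-formed lines without leading whitespace; `index_records` and the sample data are hypothetical. A database engine keeps an index like this on disk; a GEDCOM file has to be rescanned on every load.

```python
from typing import Dict, List


def index_records(lines: List[str]) -> Dict[str, List[str]]:
    """Group GEDCOM lines into level-0 records keyed by their cross-reference ID."""
    records: Dict[str, List[str]] = {}
    current: List[str] = []
    for line in lines:
        if line.startswith("0 "):
            # Every level-0 line ends the previous record and starts a new one.
            parts = line.split(" ", 2)
            current = []
            # Only records with an @xref@ (INDI, FAM, ...) are addressable; HEAD and TRLR are not.
            if len(parts) >= 3 and parts[1].startswith("@"):
                records[parts[1]] = current
        current.append(line)
    return records


SAMPLE = [
    "0 HEAD", "1 CHAR UTF-8",
    "0 @I1@ INDI", "1 NAME John /Smith/", "1 FAMS @F1@",
    "0 @I2@ INDI", "1 NAME Mary /Jones/", "1 FAMS @F1@",
    "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@",
    "0 TRLR",
]

if __name__ == "__main__":
    recs = index_records(SAMPLE)
    print(sorted(recs))
    print(recs["@F1@"])
```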

GEDCOM dialects

Most programmers who decided to use GEDCOM as a database probably did so believing that an application that uses GEDCOM as a database does not need a GEDCOM import or GEDCOM export feature. That would be true if GEDCOM were a very strict, well-defined standard, but it is not. Some parts of GEDCOM are open to interpretation, others are misunderstood, some features are supported by many vendors, others by only a few, and GEDCOM specifically allows vendors to create their own GEDCOM extensions. In practice, each application has its own GEDCOM dialect, and genealogy applications need their GEDCOM import and export to deal with the differences between the dialects.
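GEDCOM 5.5.1 lets vendors add their own tags, as long as those user-defined tags start with an underscore. A quick way to see how much of a file is vendor dialect is to count such tags. In the sketch below, `extension_tags` is a hypothetical helper, and `_MYTAG` and `_OTHERREC` are made-up examples, not tags of any real product.

```python
from collections import Counter
from typing import Iterable


def extension_tags(lines: Iterable[str]) -> Counter:
    """Count user-defined (vendor extension) tags, which begin with an underscore."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        # The tag follows the level number and an optional @xref@.
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


SAMPLE = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 _MYTAG hypothetical vendor data",
    "1 _MYTAG more vendor data",
    "0 @X1@ _OTHERREC",
    "0 TRLR",
]

if __name__ == "__main__":
    print(extension_tags(SAMPLE))
```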

update speed

A database file has a readily usable file structure (blocks, tables, indexes, etcetera) designed to allow quick and efficient updating of any individual record anywhere within the database; a GEDCOM file does not. GEDCOM files are flat plain text files. To update even a single record anywhere within a GEDCOM file, a program must read the entire GEDCOM file, make the change, and then write the entire GEDCOM file. That is both slow and error-prone.
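A deliberately naive sketch of that read-everything, write-everything update, changing one person's name in a GEDCOM file. `rename_in_gedcom` is a hypothetical helper; it only handles the first NAME line of the record and, like the programs criticised here, overwrites the original file in place, so a crash during the write can leave a truncated file.

```python
import tempfile
from pathlib import Path


def rename_in_gedcom(path: Path, xref: str, new_name: str) -> None:
    """Change one individual's NAME by reading and rewriting the entire GEDCOM file."""
    lines = path.read_text(encoding="utf-8").splitlines()  # read the whole file
    in_record = False
    for i, line in enumerate(lines):
        if line.startswith("0 "):
            # A level-0 line starts a new record; check whether it is the one we want.
            in_record = line.startswith(f"0 {xref} ")
        elif in_record and line.startswith("1 NAME "):
            lines[i] = f"1 NAME {new_name}"
            break
    # Write the whole file back, even though only one line changed (unsafe overwrite).
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ged = Path(tmp) / "tree.ged"
        ged.write_text("0 HEAD\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n", encoding="utf-8")
        rename_in_gedcom(ged, "@I1@", "Johannes /Smit/")
        print(ged.read_text(encoding="utf-8"), end="")
```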

auto-save

The typical genealogy application is a real database application that automatically saves every edit. A genealogy application that abuses GEDCOM as its database format cannot do that. GEDCOM-as-a-database applications essentially expect the user to save their changes, and will lose all unsaved changes when they crash.
Because of this, the vendor may add an auto-save feature, but that only solves the problem if it is done right. With GEDCOM-as-a-database, a straightforward auto-save is not good enough.

safe save

With GEDCOM-as-a-database, a straightforward auto-save, especially a slow one, is dangerous.
The technical term transaction integrity simply means that a change gets either done or not done, never half-way done, not even when the application crashes during a change; upon restart, the application will either complete the change or roll back the parts that were already done. That's important, because many user edits require changes to multiple records and indexes at once, and you really want your database to remain consistent.
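In SQLite terms, such a multi-record edit might look like the sketch below (Python's sqlite3 module; the three-table layout and the `add_marriage` function are hypothetical). The `with conn:` block turns the inserts into one transaction. The failure here is a simulated exception rather than a real crash, but the outcome is the same: none of the writes survive. After a real crash, SQLite rolls back the unfinished transaction from its journal when the database is next opened.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical minimal layout: people, families, and a table linking spouses to families."""
    conn.executescript(
        """
        CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE family (id INTEGER PRIMARY KEY);
        CREATE TABLE spouse (
            family_id INTEGER NOT NULL REFERENCES family(id),
            person_id INTEGER NOT NULL REFERENCES individual(id)
        );
        """
    )


def add_marriage(conn: sqlite3.Connection, partner_a: int, partner_b: int,
                 fail_halfway: bool = False) -> int:
    """Create a family and link both partners to it as one all-or-nothing transaction."""
    with conn:  # commit if every statement succeeds, roll back everything otherwise
        family_id = conn.execute("INSERT INTO family DEFAULT VALUES").lastrowid
        conn.execute("INSERT INTO spouse VALUES (?, ?)", (family_id, partner_a))
        if fail_halfway:
            # Stand-in for a crash between two related writes.
            raise RuntimeError("simulated failure after the first spouse link")
        conn.execute("INSERT INTO spouse VALUES (?, ?)", (family_id, partner_b))
    return family_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany("INSERT INTO individual (name) VALUES (?)", [("Ann",), ("Bob",)])
    conn.commit()
    try:
        add_marriage(conn, 1, 2, fail_halfway=True)
    except RuntimeError:
        pass
    # Neither the family nor the first spouse link survived the failed transaction.
    print(conn.execute("SELECT COUNT(*) FROM family").fetchone())
```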

When an application uses GEDCOM as a database, it saves changes by rewriting the entire GEDCOM file; the old GEDCOM file gets overwritten by the new one. If the application crashes during the save of the new GEDCOM file, you may find that your entire database is gone; the old GEDCOM file had already been deleted, but the new one had not been written yet...

Applications that use GEDCOM as their database must provide their own transaction integrity; they have to save changes by first saving your data to a new GEDCOM file, then renaming the GEDCOM files, and finally deleting the old GEDCOM file. Sadly, a programmer who makes the GEDCOM-as-a-database mistake is not likely to do this until after several users have lost their data.
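A minimal sketch of such a safe save in Python, under stated assumptions: it writes the new file completely and flushes it to disk first, then uses `os.replace`, which renames the new file over the old one in a single atomic step when both are on the same file system, so the rename and the deletion of the old file happen together. The `safe_save` name and the optional `.bak` copy are hypothetical extras, not part of any product described here; a production version might also fsync the containing directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def safe_save(path: Path, text: str, keep_backup: bool = True) -> None:
    """Save text so that a crash leaves either the old or the new file, never neither."""
    path = Path(path)
    # 1. Write the new version to a temporary file in the same directory.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the new data to disk before touching the old file
        # 2. Optionally keep a copy of the previous version (hypothetical extra).
        if keep_backup and path.exists():
            shutil.copy2(path, path.with_name(path.name + ".bak"))
        # 3. Atomically swap the new file into place; the old file is replaced, not deleted first.
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, discard the temporary file; the original file is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because the old file is only replaced after the new one is complete, a crash at any point leaves at least one intact copy on disk.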

examples

It's Our Tree Home Edition

Verwandt, also known as DynasTree and It's Our Tree, was an online family tree system in several languages. Before MyHeritage bought Verwandt in 2010, Verwandt released a desktop program called It's Our Tree Home Edition, later DynasTree Home Edition. It's Our Tree Home Edition (IOTHE) is a desktop genealogy application that Verwandt offered as a companion to their website, one that only reads and writes GEDCOM files. Its GEDCOM load and save are slow, and the save simply overwrites the existing GEDCOM file, so you stand a real risk of losing your file...
The real kicker is that IOTHE isn't a program Verwandt created, but a stripped-down edition of Ahnenblatt; it's Ahnenblatt Anorexic. What Verwandt's IOTHE presents as GEDCOM load and save are really Ahnenblatt's GEDCOM import and export.

The full version of Ahnenblatt saves to and loads from Ahnenblatt files, and that full version has always been free. Back then, Ahnenblatt's GEDCOM support was quite slow, but slow GEDCOM import and export isn't a big deal as long as the regular load and save of its own format is fast. Ahnenblatt's regular load and save of its own format is plenty fast, and the GEDCOM import and export are faster now than they were back then. It's Our Tree Home Edition (IOTHE) is dangerous trash, but Ahnenblatt is a free desktop genealogy application worth checking out.

MagiKey Family Tree

MagiKey Family Tree, later renamed MagiTree, is a commercial product of The MagiKey, introduced in 2010. Its most interesting feature is the census tracker, and it is now being sold as MagiCensus. MagiKey uses GEDCOM as its database format. There is an auto-save feature, but it cannot make up for MagiKey's less than stellar GEDCOM support. MagiKey's GEDCOM import tends to hang, with MagiKey becoming unresponsive and demanding the next disk for no good reason. MagiKey is a highly unstable product with a bad design. In fact, MagiKey is so crash-prone that it often crashes when trying to open a GEDCOM file, even if that's a MagiKey GEDCOM file...
That this poor product got a FamilySearch Certified logo only confirms that FamilySearch certification is a certifiable idea.

MyHeritage Family Tree Builder

It is a little-known fact that many versions of MyHeritage Family Tree Builder suffer from the GEDCOM-as-a-database mistake. It is little-known because MyHeritage never advertised that Family Tree Builder uses GEDCOM as a database, but actively tried to hide that fact by zipping the GEDCOM file, turning it into a ZEDCOM file. Family Tree Builder's use of a zipped GEDCOM file is even worse than use of a plain GEDCOM file; the unzipping and zipping increase the time and memory needed to load and save the file.
All pre-2016 versions of Family Tree Builder suffer from various issues, including slow operations, excessive memory usage, and even messed-up databases, that all have the ill-considered use of ZEDCOM as the database format as their root cause. MyHeritage addressed the root cause of these issues with the release of Family Tree Builder 8.0 early in 2016. Family Tree Builder 8.0 and later use a real database system.

Calico Pie Family Historian

The best-known genealogy application to use GEDCOM as its database format is Calico Pie's Family Historian, and it is an exception to the rule. Developer Simon Orde knows what he is doing; he knows about databases, but deliberately does not use a database system because he considers them a bad fit for genealogy, and he is right about that; genealogy applications should really be using objectbases instead of databases. Alas, Family Historian does not use an objectbase but a GEDCOM file, and using a GEDCOM file instead of a database brings all the GEDCOM-as-a-database issues with it, but Family Historian handles all of these issues well. It supports GEDCOM import and export to deal with the differences between its own GEDCOM dialect and those of other applications. It loads and saves its GEDCOM files quite fast, often faster than other desktop programs load and save their own database format. Auto-save works well and does not get in the way, not even with large databases. Family Historian not only uses a temporary file to avoid losing data on a crash, but also keeps multiple older copies, so-called snapshots, around, and allows you to switch back to an old version.
In the case of Family Historian, the use of GEDCOM as a database isn't a mistake, but a conscious choice by a developer who knows what he's doing. Family Historian is a fast and stable genealogy application that handles large genealogy databases with ease.

general rule

A genealogy editor should use its own database format, with import from and export to GEDCOM. The only editors that should use GEDCOM as their file format are GEDCOM editors, such as GedPad. The rule hardly applies to genealogy viewers, applications that only import your data, but never modify or export it. It is perfectly fine for something like GED2HTML, a utility that produces HTML pages, to import GEDCOM only and not have its own database format at all.

As a general rule, you should avoid genealogy editors that use GEDCOM as their database format. Calico Pie's Family Historian is an exception to that rule.
