Modern Software Experience

2011-06-04

genealogy software limits

Fan Value

genealogy software testing

Genealogy Software Performance Testing discussed how the genealogy software testing and comparison I have done in the past, although innovative at the time, is less than perfect, and suggested some properties that genealogy software tests should have.

Fan Value

The Fan Value introduced the fan value (FV), an easy-to-understand genealogy software capability metric with a genealogical meaning; an application with fan value 18 is an application that can handle an 18-generation ancestral tree.
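A perfect ancestral tree doubles in width with every generation, so a fan value translates directly into a number of people. Generation g holds 2^(g-1) ancestors, with the root person as generation 1. An n-generation tree therefore holds 2^n - 1 individuals. It also holds 2^(n-1) - 1 families, one for each person whose parents are included. The following minimal Python sketch does that arithmetic; the function name is only illustrative:

```python
def ancestral_tree_size(generations: int) -> tuple[int, int]:
    """Return (individuals, families) in a perfect ancestral tree.

    Generation 1 is the root person; every person in generations
    1..n-1 has exactly two parents, who together form one family.
    """
    if generations < 1:
        raise ValueError("a tree has at least one generation")
    individuals = 2 ** generations - 1      # 1 + 2 + 4 + ... + 2^(n-1)
    families = 2 ** (generations - 1) - 1   # one family per non-leaf person
    return individuals, families


if __name__ == "__main__":
    for n in (16, 18, 19, 21, 23):
        people, fams = ancestral_tree_size(n)
        print(f"{n:2d} generations: {people:>10,} individuals, {fams:>9,} families")
```

For 18 generations this gives 262,143 individuals in 131,071 families.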

32-bit Windows memory

Although 32 bits allow addressing 4 GB, ordinary Win32 applications are normally limited to addressing 2 GB, because the other half of the 4 GB virtual address space is reserved for use by Windows itself.

4-Gigabyte Tuning (4GT)

Windows can be told to allow 3 GB for applications and reserve only 1 GB for the system. This feature is officially known as 4-gigabyte tuning (4GT) and popularly known as the /3GB switch.

However, to actually address more than 2 GB, the application has to be large-address aware, and let the system know it is through the IMAGE_FILE_LARGE_ADDRESS_AWARE bit in its executable header. When it does, a properly tuned 32-bit Windows system allows its memory usage to increase to 3 GB, while continuing to reserve 1 GB of the address space for itself.
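The following sketch shows where that bit lives, assuming a well-formed PE image read into memory. The MS-DOS header stores the offset of the PE signature at 0x3C, and the 20-byte COFF file header follows that signature. The last field of the COFF header, Characteristics, carries IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020). The function name is hypothetical, and the code only reads the flag; it does not validate the rest of the file.

```python
import struct

# Flag value from the PE/COFF file header Characteristics field.
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(image: bytes) -> bool:
    """Report whether a PE executable image has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature: not a PE executable")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF header after the signature: Machine(2), NumberOfSections(2),
    # TimeDateStamp(4), PointerToSymbolTable(4), NumberOfSymbols(4),
    # SizeOfOptionalHeader(2), then Characteristics(2) at offset 18.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, is_large_address_aware(f.read()))
```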

On 64-bit Windows, a large-address aware 32-bit application can increase its memory usage not just to 3 GB, but to close to 4 GB.

Even large-address aware applications continue to be limited to 2 GB on most 32-bit Windows systems, simply because these systems default to reserving 2 GB of the virtual address space for use by the system, leaving only a 2 GB virtual address space for each application.
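The combinations above can be summarised in a small, hypothetical helper that returns the approximate user-mode address space of a 32-bit process. It assumes that the /3GB switch gives applications the full 3 GB, although the split can be configured lower. It also rounds "close to 4 GB" up to 4.

```python
GB = 2 ** 30


def user_address_space_gb(os_is_64bit: bool, large_address_aware: bool,
                          tuned_4gt: bool = False) -> int:
    """Approximate user-mode virtual address space (in GB) of a 32-bit process."""
    total = 2 ** 32 // GB          # 32-bit pointers: 4 GB of addresses in all
    if not large_address_aware:
        return total // 2          # 2 GB, whatever the operating system
    if os_is_64bit:
        return total               # close to the full 4 GB on 64-bit Windows
    if tuned_4gt:
        return 3                   # /3GB: 3 GB for the application, 1 GB for the system
    return total // 2              # default split: 2 GB application, 2 GB system
```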

Address Windowing Extensions (AWE)

Address Windowing Extensions (AWE) is a Windows technology that allows 32-bit applications to access more than 4 GB of physical RAM.
The Address Windowing Extensions (AWE) feature of Windows is not directly based on the Physical Address Extension (PAE) feature of the CPU, but 32-bit Windows does need PAE to address more than 4 GB of physical RAM. Using AWE without PAE is possible, just not very practical performance-wise.

The Win32 application must use the AWE API to get access to up to 64 GB of RAM, and is only allowed to do so when the administrator has granted the Lock Pages in Memory privilege to the account it runs under.
Some 32-bit server applications, such as Microsoft SQL Server 2000, use this technology, but it is much easier to create a 64-bit application instead.
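AWE itself is a Win32 API, built from AllocateUserPhysicalPages, MapUserPhysicalPages and related functions, and cannot be shown portably. The windowing idea behind it can be shown, though. The following Python class is only a toy simulation, not the real API. It keeps a large pool of "physical" pages and lets a small "window" of address space point at any of them. Over time, the program can reach more memory than fits in its window at once.

```python
class WindowedMemory:
    """Toy model of address windowing: a small window onto a large page pool.

    This only simulates the idea behind AWE; it does not call the Win32 API.
    """

    def __init__(self, physical_pages: int, window_pages: int, page_size: int = 4096):
        self.page_size = page_size
        # "Physical" pages: far more than the window can expose at one time.
        self.pages = [bytearray(page_size) for _ in range(physical_pages)]
        # Each window slot holds the index of the physical page mapped there.
        self.window = [None] * window_pages

    def map(self, slot: int, physical_page: int) -> None:
        """Point one window slot at another physical page (no data is copied)."""
        self.window[slot] = physical_page

    def _locate(self, address: int) -> tuple[bytearray, int]:
        slot, offset = divmod(address, self.page_size)
        page = self.window[slot]
        if page is None:
            raise MemoryError(f"window slot {slot} is not mapped")
        return self.pages[page], offset

    def write(self, address: int, value: int) -> None:
        page, offset = self._locate(address)
        page[offset] = value

    def read(self, address: int) -> int:
        page, offset = self._locate(address)
        return page[offset]
```

Remapping a window slot only changes which page an address refers to; no data is copied, which is what makes windowing cheap.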

My large is smaller than yours…

Quite a lot of genealogy software has rather limited capabilities. Back in 2009, My Large is Smaller than Yours discussed how vendors of such products redefine large to mean medium, small, tiny or even minuscule, just so they can claim to support large genealogies. Thus, when some vendor claims their product is capable of handling large genealogies, you really have to wonder what that means.
The new capability metric addresses this issue head-on. When some vendor claims their product is capable of handling large files, you can simply ask them what its fan value is.

FAN files

The GEDCOM fan value is determined using FAN files: GEDCOM files created by the GedFan utility. The FAN files are perfect ancestral trees. Because these files get quite large, I am not distributing the files themselves, but the GedFan utility that creates them.
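GedFan's own output format is not reproduced here, but a perfect ancestral tree is easy to generate. The sketch below is a hypothetical Python generator, not GedFan's code. It assumes Ahnentafel numbering, in which person k has father 2k and mother 2k+1. It emits a minimal GEDCOM 5.5.1 header with a placeholder source ID and submitter, and gives every individual a placeholder name:

```python
from typing import Iterator


def fan_gedcom(generations: int) -> Iterator[str]:
    """Yield the lines of a GEDCOM file holding a perfect ancestral tree.

    Ahnentafel numbering: person k has father 2k and mother 2k+1,
    and family F{k} joins those two parents with child k.
    """
    last_person = 2 ** generations - 1
    last_family = 2 ** (generations - 1) - 1   # persons whose parents are included
    yield "0 HEAD"
    yield "1 SOUR FANSKETCH"                    # hypothetical source system ID
    yield "1 SUBM @U1@"
    yield "1 GEDC"
    yield "2 VERS 5.5.1"
    yield "2 FORM LINEAGE-LINKED"
    yield "1 CHAR UTF-8"
    yield "0 @U1@ SUBM"
    yield "1 NAME Unknown"
    for k in range(1, last_person + 1):
        yield f"0 @I{k}@ INDI"
        yield f"1 NAME Person /{k}/"
        yield f"1 SEX {'M' if k % 2 == 0 else 'F'}"  # even numbers are fathers
        if k <= last_family:
            yield f"1 FAMC @F{k}@"                   # child in its parents' family
        if k > 1:
            yield f"1 FAMS @F{k // 2}@"              # spouse in its child's family
    for k in range(1, last_family + 1):
        yield f"0 @F{k}@ FAM"
        yield f"1 HUSB @I{2 * k}@"
        yield f"1 WIFE @I{2 * k + 1}@"
        yield f"1 CHIL @I{k}@"
    yield "0 TRLR"
```

For example, "\n".join(fan_gedcom(3)) produces a three-generation file with 7 individuals and 3 families; each additional generation roughly doubles the output.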

GEDCOM validation

GEDCOM Validation discussed some tools for validating GEDCOM output, and how these were used to validate GedFan's output.

Fan Values

Let's look at some fan values now, starting with the tools mentioned in GEDCOM Validation:

GedChk is the only MS-DOS application on this list.
All others are 32-bit Windows applications.

GedChk 0.9

FamilySearch GedChk is an MS-DOS application. Although it can address 1 MB (20 bits), MS-DOS is known as a 16-bit operating system. You might expect GedChk to fail on FAN17.GED because it cannot handle more than 65,535 (2^16-1) individuals. You might expect it to fail on FAN13.GED or FAN14.GED because it cannot handle more than 65,535 lines. You might expect it to fail on FAN9.GED because it cannot handle files larger than 65,535 bytes.
GedChk actually processes FAN23.GED just fine. Most MS-DOS applications use 32-bit values for file sizes. GedChk fails to open FAN24.GED, presumably because FAN24.GED is more than 4,294,967,295 (2^32-1) bytes large. GedChk's fan value is 23.
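The expectation about individuals is easy to check: an n-generation tree holds 2^n - 1 individuals. A hypothetical unsigned 16-bit counter of individuals would therefore overflow first on FAN17.GED. The following minimal Python check uses illustrative names:

```python
UINT16_MAX = 2 ** 16 - 1   # 65,535
UINT32_MAX = 2 ** 32 - 1   # 4,294,967,295


def first_fan_exceeding(limit: int) -> int:
    """Smallest n for which an n-generation ancestral tree has more than `limit` people."""
    n = 1
    while 2 ** n - 1 <= limit:
        n += 1
    return n


if __name__ == "__main__":
    print(first_fan_exceeding(UINT16_MAX))  # 17: FAN16 has exactly 65,535 people
```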

VGED 3.02

VGED does not support GEDCOM 5.5.1. VGED processes the FAN files as GEDCOM 5.5 files. That leads to VGED complaining about the GEDCOM 5.5.1 tag WWW as an unknown record, but that hardly limits its usefulness as a validator.
VGED allows you to enable or disable some of its checks; to determine its fan value, I chose the Check All button on the options dialog.
The largest FAN file that VGED handles is FAN19.GED, but it needs more than a gigabyte of RAM to do so, close to nine times the size of the FAN19.GED file itself. For FAN20.GED and larger files, VGED crashes just before its memory usage, as reported by the Windows Task Manager, hits two gigabytes.
VGED's fan value is 19.
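Each GEDCOM line consists of a level number, an optional cross-reference ID such as @I1@, a tag and an optional value. A validator written for GEDCOM 5.5 can only compare tags against the 5.5 tag set, so it flags WWW. This Python sketch illustrates that check. It uses a deliberately small subset of 5.5 tags, which is an assumption and not the full standard. It is also not how VGED works internally. It skips user-defined tags, which start with an underscore:

```python
import re

# GEDCOM line: level, optional @xref@, tag, optional value.
GEDCOM_LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")

# A small, illustrative subset of GEDCOM 5.5 tags (not the full standard).
KNOWN_TAGS_55 = {
    "HEAD", "TRLR", "SOUR", "SUBM", "GEDC", "VERS", "FORM", "CHAR",
    "INDI", "FAM", "NAME", "SEX", "FAMC", "FAMS", "HUSB", "WIFE", "CHIL",
}


def unknown_tags(lines, known=KNOWN_TAGS_55):
    """Yield (line_number, tag) for each tag outside the known tag set."""
    for number, line in enumerate(lines, start=1):
        match = GEDCOM_LINE.match(line)
        if match is None:
            raise ValueError(f"line {number} is not a valid GEDCOM line: {line!r}")
        tag = match.group(3)
        if tag.startswith("_"):
            continue  # user-defined extension tag
        if tag not in known:
            yield number, tag
```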

GedPad Build 101008

GedPad takes a few minutes to load FAN20.GED, and uses about 1¾ gigabytes of RAM once it is done, more than seven times the size of the FAN20.GED file. However, GedPad apparently needs even more than that, more than the 2 GB a 32-bit Windows application is allowed: when I clicked its button to find the next parentless family, it threw up an Unexpected Program Error message box that told me it had run out of memory. GedPad handles FAN19.GED without problems, although it needs close to one gigabyte of RAM to do so.
GedPad's fan value is 19.

GEDCOM Explorer 2.1.1.5

GEDCOM Explorer has no problem dealing with FAN21.GED. It does use about one gigabyte of RAM, and when you try to load FAN22.GED, it reports an out-of-memory error. Oddly, GEDCOM Explorer does not abort after the out-of-memory error, but it does not allow you to perform any check either; all the menu items are greyed out.
GEDCOM Explorer's fan value is 21.

Genealogica Grafica 1.18.3

Genealogica Grafica runs out of memory loading FAN21.GED. Genealogica Grafica will load FAN20.GED, but becomes unresponsive after reading it, while trying to build the index of names it displays. To pass the test, an application has to remain responsive while in use. Genealogica Grafica is unresponsive while loading a file, and really should show a progress dialog box while it builds its index, as this may take several minutes, but once the index has been built, responsiveness is fine. Genealogica Grafica might be able to handle FAN20.GED, but when I tried to load FAN20.GED, it remained unresponsive for more than ten minutes, which is more than enough reason to consider it a hanging application and terminate it.
Genealogica Grafica remained unresponsive for more than three minutes after reading FAN19.GED, and that's on a 3 GHz multi-core system. Unsurprisingly, its search function was unresponsive as well. The largest FAN file for which the search function barely escaped Windows' detection of unresponsive applications is FAN17.GED, but even for FAN17.GED, the unresponsiveness was both noticeable and annoying. For FAN16.GED, the search function responds just when you begin to be annoyed.
Its HTML generation takes some time, but every other feature I tried remained responsive. However, when I set the tableau layout to 16 generations, Genealogica Grafica crashed during HTML generation.
Genealogica Grafica's fan value is less than 16.
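A long-running step such as building a name index need not make an application look hung if it reports progress as it goes. The Python sketch below is hypothetical and not Genealogica Grafica's code. It indexes (person ID, surname) pairs and calls a progress callback at a fixed interval and once more at the end. In a real GUI the callback would update a progress dialog. The work itself would run on a background thread or yield to the event loop, which this sketch does not show.

```python
from collections import defaultdict
from typing import Callable, Iterable


def build_name_index(names: Iterable[tuple[str, str]], total: int,
                     progress: Callable[[int, int], None],
                     report_every: int = 10_000) -> dict[str, list[str]]:
    """Build a surname -> [person IDs] index, reporting progress as it goes.

    `progress(done, total)` is called every `report_every` records and
    once more at the end, so a caller can keep a progress display current.
    """
    index: dict[str, list[str]] = defaultdict(list)
    done = 0
    for person_id, surname in names:
        index[surname].append(person_id)
        done += 1
        if done % report_every == 0:
            progress(done, total)
    progress(done, total)  # final report, even if it repeats the last one
    return dict(index)
```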

Behold 0.99.21

Louis Kessler has tried the FAN files himself, and tweeted that Behold's fan value is 19. Sure enough, when you try to load FAN20.GED, Behold runs out of memory, and puts up a dialog box to tell you so. One of the options offered on that dialog box is to continue trying to use Behold, an option that may be handy when debugging, but should probably not be offered to end users. Behold loads FAN19.GED just fine, but uses more than a gigabyte of RAM. Expanding the Index of Names takes several seconds, but searching and navigating are just fine, and that is about all the functionality Behold currently offers.
Behold's fan value is 19.

out of memory

application                   FV
GedChk 0.9                    23
VGED 3.02                     19
GedPad Build 101008           19
GEDCOM Explorer 2.1.1.5       21
Genealogica Grafica 1.18.3    < 16
Behold 0.99.21                19

Several applications fail because they run out of memory, while there is plenty of unused RAM left in the system. All the aforementioned Windows applications are 32-bit Windows (Win32) applications, which are normally limited to 2 GB of virtual address space.
Of these applications, Genealogica Grafica was the only one that failed because of unresponsiveness. Working with the FAN16.GED file, which is just over 12 MB, Genealogica Grafica already used more than half a gigabyte of RAM, an expansion factor of more than forty, so it remains to be seen how much better it will do once the unresponsiveness issue is addressed. Still, of the various apps mentioned here, it is the one I recommend most, because of its excellent consistency checks.
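The expansion factor is simply the memory in use divided by the size of the file loaded. The following quick Python calculation uses the FAN16.GED figures above, taking 12 MB and half a gigabyte as lower bounds:

```python
MB = 2 ** 20
GB = 2 ** 30


def expansion_factor(ram_bytes: float, file_bytes: float) -> float:
    """Ratio of memory in use to the size of the GEDCOM file loaded."""
    return ram_bytes / file_bytes


# FAN16.GED: just over 12 MB on disk, more than half a gigabyte in memory.
print(f"{expansion_factor(0.5 * GB, 12 * MB):.1f}")  # prints 42.7
```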

The most remarkable conclusion is that all the 32-bit Windows applications were bested by an ancient MS-DOS application; GedChk managed a fan value of 23, while GEDCOM Explorer managed 21, and VGED, GedPad and Behold managed no more than 19 before running out of memory. However, GedChk does not do much more than some basic consistency checks on top of the GEDCOM syntax check; the Windows applications offer a lot more functionality.

Several applications could improve their fan value a bit through more optimal use of memory, but it will probably take a 64-bit application to best GedChk.
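Better memory use would buy only a bit because each extra generation roughly doubles a perfect ancestral tree. Assuming memory use grows in proportion, halving an application's memory footprint gains about one generation. The following hypothetical Python sketch illustrates this. It uses 1.1 GB at FAN19 purely as an illustrative figure for "more than a gigabyte":

```python
def max_fan_value(memory_at_n: float, n: int, limit: float) -> int:
    """Largest fan value that stays under `limit`, given memory use at FAN n.

    Assumes memory use doubles with every extra generation, as the number
    of individuals in a perfect ancestral tree (2**n - 1) nearly does.
    """
    if memory_at_n >= limit:
        raise ValueError("already over the limit at the measured fan value")
    fan, memory = n, memory_at_n
    while memory * 2 < limit:
        memory *= 2
        fan += 1
    return fan
```

With these assumptions, max_fan_value(1.1, 19, 2.0) returns 19, while halving the footprint, max_fan_value(0.55, 19, 2.0), returns only 20.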

links