Modern Software Experience

2011-05-08

better genealogy software testing

two files

For several years now, I've written genealogy software reviews that focus on the basic features and capabilities of genealogy applications, including their ability to handle GEDCOM files.
After the first few reviews, I settled on two GEDCOM files that I kept using: a small one and a large one. The small GEDCOM is about 1 MB in size; the large one contains about 100,000 individuals. The 1 MB GEDCOM allowed me to get a feel for an application using a small file, while the 100 k INDI GEDCOM allowed me to evaluate its capabilities and performance with a large file. Using the same two GEDCOM files for every test made the results for different applications comparable with each other.

comparisons

I collected the results in GEDCOM Import Speed, thus creating the first comparative performance review of genealogy applications. I often drew from that overview when reviewing another genealogy application, noting that an application was faster or slower than another one.
The performance differences between genealogy applications are huge. Some applications import the 100 k INDI GEDCOM in a few seconds; others require hours, and some fail to import it altogether.
The title of the overview is a bit misleading. There is more in GEDCOM Import Speed than just GEDCOM import speed measurements. The reviews discuss the quality of the GEDCOM support, and the overview provides summaries of my observations on that quality.

reviews

All this was very nice, informative and innovative, but it was not perfect. One issue is that it was so much work that I am still the only one to discuss GEDCOM quality and application performance in genealogy software reviews as a matter of course. That really has to change; GEDCOM quality and application performance are important issues that should not be glossed over. Reviews have to cover these issues, and consumers should demand that they be covered.
Vendors read reviews, and tend to focus their attention on the things the reviewers cover. For example, if reviewers regularly mention the number of different reports supported, vendors are likely to provide more reports. If reviewers regularly mention GEDCOM quality and application performance, then vendors are likely to spend their resources improving those.

I'd love to make things a bit easier for other reviewers by giving them my 1 MB and 100 k INDI GEDCOM, but I cannot…

not public

From time to time, a vendor would request the files I tested their product with, so that they could test their product with the same files. Invariably, my answer had to be that they could not have these files.
Perhaps even worse, considering the popularity of online genealogy, was that I could not upload these files to any web site, unless I was sure that I could keep my database private and delete it again.

I've been collecting GEDCOM files for years. The small file is from my collection. I picked that file because of its size, and it turned out to be an interesting choice, as it certainly isn't a perfect GEDCOM. It's in my collection, but I cannot share it with a third party, because it still isn't mine to share. I'd love to point vendors to the original web site to download it themselves, just like I often refer vendors to the Good, Engle, Hanks Family GEDCOM when they ask for a large GEDCOM file to test with, but the 1 MB GEDCOM does not seem to be available for download anymore.

The 100 k INDI GEDCOM is mine. Back when my database reached a bit more than 100,000 individuals, I decided to save a copy, thinking that it might be a good database to test with. Alas, that it is mine still does not mean I can share it. The database contains details on living people which I am not allowed to share. I could remove those details, but then it wouldn't be the same database anymore. I could mess up the data for the living individuals, but I do not want to be responsible for distributing deliberately erroneous information.
Besides, as my research continued, I discovered plenty of mistakes. So even if there were no privacy issues, I would still not want to distribute the old database.

different PC

I did a bunch of tests on the same PC running Windows XP, but inevitably I bought a new one. It was an off-the-shelf PC running Windows Vista. A few years later still, I am enjoying my current made-to-order PC running 64-bit Windows Vista.
These different PCs have different performance characteristics, so numbers from different PCs are not directly comparable.

The numbers on my PC weren't directly comparable with numbers on your PC to begin with, but I always included numbers for several well-known and free applications, so that it was not too difficult to get some idea of how an application would perform on your PC. Still, now that my numbers are no longer all for the same PC, comparison isn't as straightforward as it used to be.

test yourself

If you had the test files, you could try the tests yourself.
The ability to test applications without buying them has improved. My repeated statements that users should be able to try the GEDCOM import with their own data and examine the quality of the GEDCOM output before buying the product have been an important factor in more vendors releasing free editions that include both GEDCOM import and export. You can now try most well-known genealogy applications for free.

You can and should test applications with your own data, but it does not hurt to have some numbers for a standard test. Once you know how well some well-known applications fare in these tests on your PC, third-party numbers for the same test will give you a reasonable idea of how well other applications will do.
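
To illustrate that kind of ballpark calibration, here is a minimal sketch in Python, with made-up numbers: time a free, well-known application on your own PC, compare with a reviewer's published time for the same application and test, and scale the reviewer's other numbers accordingly.

    # Ballpark calibration sketch with made-up numbers; not a substitute
    # for testing with your own data.
    reviewer_time = 30.0   # reviewer's import time (s) for a free, well-known application
    my_time = 60.0         # your own time (s) for the same application and test file

    scale = my_time / reviewer_time    # this PC is about 2x slower for this workload

    reviewer_other = 10.0              # reviewer's time (s) for another application
    estimate = reviewer_other * scale  # ballpark expectation on this PC: about 20 s
    print(f"expected on this PC: roughly {estimate:.0f} s")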

standard computer?

One way to standardise tests is to define some kind of reference platform, a standard computer, to perform the test on; when everybody uses the same computer to perform tests, results are directly comparable. There is more than one problem with that idea. There is enormous variation in the computers that people use for genealogy; it is not just desktop computers, but tablets and smartphones as well. The differences between these are significant. Besides, even if it were possible to decide on a reference platform, it would be sure to become outdated.

The practical approach is to accept that there are many different computers, and create a standardised test, one that can be run on any computer and will not become obsolete with advances in hardware, but will instead demonstrate these advances (i.e. better numbers on better hardware). That way, the results of recently performed tests on modern hardware are likely to be relevant to your situation.

Even badly designed genealogy applications perform well for a small genealogy of just a few thousand individuals. Only well-designed genealogy applications perform well with large genealogies.

size

The sizes of the two test files, one file that is 1 MB in size and one that contains 100,000 individuals, were chosen to allow easy mental calculations.

Ideally, all tests would be done with your file, or, more realistically, a file that's about the same size. The nicely rounded numbers were chosen to allow quick mental calculations to arrive at a ballpark figure. These numbers aren't more than ballpark figures, because performance isn't a linear function of size.

It is precisely because performance isn't linear with database size that it is important to test applications with a large database. Creating an application that handles a small genealogy well isn't hard; today's multi-gigahertz multi-core PCs will hide performance issues. Even badly designed genealogy applications perform well for a small genealogy of just a few thousand individuals. Only well-designed genealogy applications perform well with large genealogies.
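
A toy illustration, with hypothetical timings rather than measurements: if import time grows quadratically instead of linearly with database size, a ten times larger file takes a hundred times longer, so a small test file hides the difference entirely.

    # Hypothetical timings: an import that scales linearly versus one that
    # scales quadratically, both tuned to take 1 s at 10,000 individuals.
    for n in (1_000, 10_000, 100_000):
        linear = n / 10_000            # seconds, if time grows linearly
        quadratic = (n / 10_000) ** 2  # seconds, if time grows quadratically
        print(f"{n:>7} INDI: linear {linear:6.1f} s, quadratic {quadratic:8.1f} s")

At 1,000 individuals both finish in well under a second; at 100,000 individuals the difference is ten seconds versus a hundred.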

You hardly need more than two files to gauge performance when an application performs well; that it performs well is all you really need to know. Things aren't as simple when the application doesn't perform so well, and several applications are not just slow, but even fail to import the larger file. I have at times used a medium-size genealogy of about 25 thousand individuals to get some additional impressions. Failure to import is a function of database size; at some size, the application runs out of memory. The size at which the application fails tells you how efficient or inefficient it is with memory. That an application cannot import large files is good to know, but it is better to have some idea of just what its limits are.

Failure isn't an option. Failure is a fact of life.

When I was using two test files, the failure to import the large file was a major annoyance. It not only meant that I had to try some medium-sized file, it also meant that I had no import time to compare with other applications.
Failure isn't an option. Failure is a fact of life. The solution to this issue is to accept the failure, and in fact make it part of the performance test; that the application fails to import larger files isn't a problem, but the desired result. We don't try to figure out how much memory the application needs for one particular file, but turn the question around, and try to get an idea of the largest size it can handle with the available memory. We use a range of files, increasing in size, and figure out the largest one the application can handle.
Ideally, we would have a series of files, each about twice as large as the previous one, to test with.
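
As a sketch of that idea, assuming a hypothetical helper try_import(size) that runs the application's import for a test file of the given number of individuals and reports success or failure:

    # Double the test file size until the import fails; the last success
    # is the largest size the application handles with available memory.
    # try_import() is a hypothetical helper, not part of any real tool.
    def largest_handled(try_import, start=1_000, limit=10_000_000):
        size, largest = start, None
        while size <= limit:
            if not try_import(size):
                break          # the application failed at this size
            largest = size     # largest size imported successfully so far
            size *= 2          # the next file is about twice as large
        return largest

With a doubling series, even an application that fails early yields a useful number: the last size it did handle.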

different tests

It is important to remember that you need different tests for different things.
The GEDCOM torture test files I've shared already aren't performance tests. The torture test files are tiny files containing genealogical extremes, to test an application's ability to handle those extremes. A performance test is neither a stress test, nor a GEDCOM compliance test. Files for a performance test may be of unusual size, but should avoid unusual content.

ideal

Ideally, performance tests are done with your data on your computer, but that is not a very practical proposition. A real-world performance test should be done in such a way that it gives you a fair idea of what performance with your data on your computer would be like.

Ideally, a performance test should be done according to some standard. It does not seem feasible to define a meaningful reference computer. The pragmatic approach is to create some reference test that can be performed on any computer. That way, the wide variety in computers can become a strength instead of a weakness; allowing tests on all computers increases the chance that some test results are for a computer similar to yours.

The test should not depend on particular hardware, but run on any computer, and not become obsolete with advances in hardware, but rather be a way to demonstrate these advances: better numbers for better hardware.

The test should not favour any specific platform or application. The test should not depend on private databases. All test files should be publicly available, so that anyone can verify results. Tests should be royalty-free, so that anyone can publish tests results.

The test should probably be implemented using GEDCOM, but should not depend on it. The initial test implementation may depend on GEDCOM, but the test itself should be portable to successor technologies.

implementation

I've implemented these ideas on genealogy software testing. I have created a new and fairly simple synthetic genealogy performance test that does not depend on any private database. I will publish that test, so that everyone can use it.

There is one problem though: distributing this test as a bunch of GEDCOM files is not practical, because some of the files are, quite naturally, rather large.
GEDCOM files are space-inefficient and a tad repetitive, so they compress very well, yet I cannot distribute the files in a ZIP archive. The largest file I have now is more than 4 GB, the maximum size supported by the ZIP format. I could opt for ZIP64 or some other compression format, but the resulting archive would still be hundreds of megabytes.
That may not be too large to download, but it is too large to host. Just a few dozen downloads of such a file equals tens of gigabytes already, and that would be a rather noticeable dent in my bandwidth budget.
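
For what it's worth, the ZIP64 route itself is simple enough; Python's standard zipfile module, for example, supports it through its allowZip64 flag (the file names below are made up). It just would not solve the hosting problem.

    import zipfile

    # The classic ZIP format caps sizes at 4 GB; ZIP64 lifts that limit.
    # File names are hypothetical examples.
    with zipfile.ZipFile("fanfiles.zip", "w",
                         compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as archive:
        archive.write("fan22.ged")  # a generated GEDCOM file of more than 4 GB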

To save on bandwidth, I've decided to distribute the utility that generates the GEDCOM files instead.
It's called GedFan.
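
To show what generating instead of distributing can look like, here is a minimal sketch of a generator. It is not GedFan; it merely assumes, for illustration, a complete-ancestor fan numbered ahnentafel-style, so that a file of n generations contains 2**n - 1 individuals.

    # Minimal sketch, not GedFan: write a complete-ancestor fan as GEDCOM.
    # Ahnentafel numbering: person i has father 2i and mother 2i+1.
    def write_fan(path, generations):
        total = 2 ** generations - 1  # individuals in a fan of this depth
        with open(path, "w", encoding="utf-8") as f:
            f.write("0 HEAD\n1 SOUR FAN_SKETCH\n1 GEDC\n2 VERS 5.5.1\n"
                    "2 FORM LINEAGE-LINKED\n1 CHAR UTF-8\n")
            for i in range(1, total + 1):
                f.write(f"0 @I{i}@ INDI\n1 NAME Person{i} /Fan/\n")
                f.write(f"1 SEX {'M' if i % 2 == 0 or i == 1 else 'F'}\n")
                if 2 * i + 1 <= total:     # parents are inside the fan
                    f.write(f"1 FAMC @F{i}@\n")
                if i > 1:                  # everyone but the root is a parent
                    f.write(f"1 FAMS @F{i // 2}@\n")
            for i in range(1, total + 1):
                if 2 * i + 1 <= total:     # family with person i as the child
                    f.write(f"0 @F{i}@ FAM\n1 HUSB @I{2 * i}@\n"
                            f"1 WIFE @I{2 * i + 1}@\n1 CHIL @I{i}@\n")
            f.write("0 TRLR\n")

    write_fan("fan10.ged", 10)  # 1,023 individuals; 20 generations would hold 1,048,575

A generator like this produces the whole doubling series of test files on demand, on anyone's computer, without any download worth mentioning.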

links