Modern Software Experience

2008-08-16

Confucius Challenge

import this

The Confucius Challenge is the challenge to handle the database of Confucius’s descendants.
Alas, most current genealogical software is not up to the Confucius Challenge at all.

Part 1 of the Confucius Challenge, and the first step toward completing it, is to import the database of Confucius’ descendants: a GEDCOM file containing more than 2 million INDI records.

impractical

Simply trying to import that file into an arbitrary genealogy program is likely to fail. Without earlier tests to go by, it is hard to say how long the program will take to succeed or fail. Testing a bunch of programs this way could take many weeks. It just isn’t a very practical approach.
Then again, if you already knew that a program imports a quarter million INDI records in fifteen minutes, you could reasonably expect the Confucius import to take well over an hour (eight times as many records at the same rate already adds up to two hours), and if you knew it failed to import 100.000 INDI records after hogging the CPU for two days, you’d know better than to start the test.
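The estimate above is simple proportional arithmetic. The sketch below shows it as a small helper; the function name `estimate_minutes` is hypothetical, and it assumes import time grows linearly with the number of INDI records. A program that slows down as its database grows will take longer than this estimate, so treat the result as a lower bound.

```python
def estimate_minutes(known_records: int, known_minutes: float, target_records: int) -> float:
    """Extrapolate import time from one earlier measurement.

    Assumes linear scaling (a hypothetical simplification): if the program
    slows down as the database grows, the real time will be longer.
    """
    if known_records <= 0:
        raise ValueError("known_records must be positive")
    return known_minutes * target_records / known_records


# A quarter million INDI records in fifteen minutes, scaled to 2 million:
estimate = estimate_minutes(250_000, 15, 2_000_000)
print(f"{estimate:.0f} minutes")  # 120 minutes
```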

cascade of challenges

A practical way to perform such a test is to build a cascade of increasingly difficult challenges. You then demand that each challenge be completed successfully and in a reasonable amount of time. Completing one challenge leads to the next challenge in the cascade. Failure to complete a challenge ends the cascade; the program would not be able to handle the next challenge anyway, and certainly would not be able to handle the last challenge in the cascade.
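The cascade rule can be sketched as a loop that stops at the first failure. In this minimal sketch, each challenge is a hypothetical `(name, run, limit_s)` tuple where `run()` stands in for "import this file" and returns `True` on success; the names and time limits are illustrative assumptions, not measured values. Note that the time limit is only checked after `run()` returns, so it cannot interrupt a program that hangs; when driving an external program, Python’s `subprocess.run(..., timeout=...)` can kill it instead.

```python
import time
from typing import Callable, List, Tuple

# A challenge is (name, run, time limit in seconds). run() performs the
# import and returns True on success; here it is a hypothetical stand-in.
Challenge = Tuple[str, Callable[[], bool], float]


def run_cascade(challenges: List[Challenge]) -> List[str]:
    """Run the challenges in order and return the names of those completed.

    The cascade ends at the first challenge that fails or exceeds its time
    limit; the harder challenges after it are never attempted.
    """
    completed: List[str] = []
    for name, run, limit_s in challenges:
        start = time.perf_counter()
        ok = run()
        elapsed = time.perf_counter() - start
        if not ok or elapsed > limit_s:
            break  # no point trying a harder challenge
        completed.append(name)
    return completed


cascade = [
    ("1MB", lambda: True, 60.0),
    ("100k", lambda: True, 3600.0),
    ("100MB", lambda: False, 3600.0),  # simulated failure
    ("2M", lambda: True, 4 * 3600.0),  # never attempted
]
print(run_cascade(cascade))  # ['1MB', '100k']
```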

practical

Using a cascade of challenges is a practical approach. It allows you to pick challenges that will complete in a reasonable time, and to avoid wasting time on a program that can’t hack it anyway.

how to build a cascade

Building a cascade is easy. Just pick a series of challenges in which each challenge is a substantial increase over the previous one, but not too substantial. How big each step should be depends on what you are testing, why you are testing, and how much patience you have.

A fairly obvious series is powers of ten: 1, 10, 100, 1000, etcetera. It is instantly recognisable. For many computer-based challenges, powers of two may be a better fit: 1, 2, 4, 8, 16, 32, 64, etcetera.
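Both series are geometric: each step multiplies the previous one by a fixed factor. A minimal sketch, with a hypothetical helper name `geometric_cascade`, that lists the steps up to a limit:

```python
from typing import List


def geometric_cascade(start: int, factor: int, limit: int) -> List[int]:
    """Return start, start*factor, start*factor**2, ... while <= limit."""
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor >= 2")
    steps: List[int] = []
    value = start
    while value <= limit:
        steps.append(value)
        value *= factor
    return steps


print(geometric_cascade(1, 10, 1000))  # [1, 10, 100, 1000]
print(geometric_cascade(1, 2, 64))     # [1, 2, 4, 8, 16, 32, 64]
```

The factor trades the number of tests against the size of each jump: up to 2 million records, powers of ten give 7 steps (1 to 1,000,000), while powers of two give 21 steps (1 to 2^20 = 1,048,576).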

The cascade itself does not matter much. It is just a practical approach to the final test. The cascade exists only to avoid wasting time on tests that are not going to succeed anyway.

The cascade for the Confucius Cup 2008 was built using files I already had. Again, a practical decision. By the way, that cascade actually lacks the final file, the very challenge the cascade was constructed for...

Confucius Cascade

The Confucius Cascade is a cascade of increasingly difficult import challenges that culminates with the Confucius database itself. Thus, the Confucius Cascade is a practical approach to testing a program’s ability to complete the import part of the Confucius Challenge.

The cascade I constructed for the Confucius Cup 2008 looks like this:

file     brief description
1MB      size of roughly 1 MB
100k     count of roughly 100.000 INDI records
100MB    size of roughly 100 MB
2M       count of roughly 2 million INDI records

This cascade happens to alternate between file size and number of INDI records as the defining factor, but that is not necessary. It would be quite reasonable to put a medium-sized file of, say, 25.000 INDI records between the small 1MB test and the large 100k test. I picked these files because I already use a 1 MB and a 100k file in my reviews, and have a 100 MB and a 2M file to continue testing with. You can build your own cascade any way you like.
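To place files you already have into a cascade, you need both defining factors: file size and INDI count. In GEDCOM, each individual record begins with a level-0 line such as `0 @I1@ INDI`, so counting those lines counts the INDI records. This is a minimal sketch with a hypothetical helper name `describe_gedcom`; it assumes an 8-bit or UTF-8 encoded file (not UTF-16) with lines ending in LF or CR LF, and it reports size in decimal megabytes.

```python
import os
import tempfile
from typing import Tuple


def describe_gedcom(path: str) -> Tuple[float, int]:
    """Return (size in MB, number of INDI records) for a GEDCOM file.

    Assumes an 8-bit or UTF-8 file with LF or CR LF line endings.
    An INDI record starts with a level-0 line like "0 @I1@ INDI".
    """
    size_mb = os.path.getsize(path) / 1_000_000  # decimal megabytes
    indi_count = 0
    with open(path, "rb") as f:
        for raw in f:
            parts = raw.split()  # level, xref, tag, ...
            if (len(parts) >= 3 and parts[0] == b"0"
                    and parts[1].startswith(b"@") and parts[1].endswith(b"@")
                    and parts[2] == b"INDI"):
                indi_count += 1
    return size_mb, indi_count


# Demonstration on a tiny made-up GEDCOM file.
sample = (b"0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME John /Doe/\n"
          b"0 @I2@ INDI\n1 NAME Jane /Doe/\n0 @F1@ FAM\n0 TRLR\n")
with tempfile.NamedTemporaryFile(suffix=".ged", delete=False) as tmp:
    tmp.write(sample)
size_mb, indi = describe_gedcom(tmp.name)
os.remove(tmp.name)
print(indi)  # 2
```

Sorting your files by these two numbers gives a ready-made cascade.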
