Modern Software Experience

2012-08-21

third party fix

WikiTree

WikiTree is one of many online genealogy apps. More to the point, it is one of the relatively small group of sites that do not let you edit your own tree, but maintain a shared tree instead.

WikiTree's most distinguishing features are its per-profile privacy controls and its honour code, which was inspired by my Geniology article.

WikiTree started in 2008. I created my WikiTree account back in 2009, and manually created a few profiles, but I prefer desktop applications.
I never uploaded my data. WikiTree can't handle large GEDCOM files, and even if it could, I still should not have done so. WikiTree does not want users to upload large GEDCOMs, but only small ones, a few thousand individuals at most. WikiTree does not want you to upload lots of data; WikiTree wants you to be involved with the data you do upload.

GEDCOM Export

I never used WikiTree's GEDCOM import. I did challenge Chris Whitten to determine WikiTree's fan value, but he has not taken that challenge yet. As WikiTree is not interested in importing large GEDCOM files anyway, a small fan value of 10 or 12 would be plenty, but I recently discovered that its fan value is zero.
I had no reason to look at WikiTree's GEDCOM import, and until a few days ago, I had no reason to look at their GEDCOM export either.

GEDCOM Fixer 0.2

A few days ago I came across Virtual Productions' GEDCOM Fixer 0.2. This is a utility that fixes WikiTree GEDCOM files in preparation for reading by GedStar's desktop utility.
Now, GedStar's GEDCOM import probably isn't perfect, but I never noticed either the current GedStar Pro product for Android or its predecessor, GedStar for Palm OS, to be very picky. The main issue with the GedStar GEDCOM importer is that it becomes remarkably slow for large GEDCOM files.
This GEDCOM Fixer 0.2 isn't about fixing GedStar's GEDCOM import limitations; it is about fixing the quality of WikiTree's GEDCOM export.

Many genealogy applications produce less than perfect GEDCOM files, or rather not-quite-GEDCOM files, yet there still are very few genealogy applications that produce something so atrocious that a third party decided to develop a GEDCOM fixer. That Virtual Productions is developing one for WikiTree is a dubious honour.

GEDCOM fixers

The application most infamous for producing bad GEDCOM files is Family Tree Maker Classic. Many versions of Family Tree Maker for Windows (FTW) actually default to producing FTW TEXT files instead of FTW GEDCOM files, and that naturally creates problems, but the FTW GEDCOM dialect is not without issues either.
The FTW GEDCOM article mentions several third-party utilities developed to deal with FTW defects and limitations. Ken Morse's FixDates utility does not fix FTW's GEDCOM output, but prepares third-party GEDCOM files for input into FTW, which has less flexible date support than most genealogy applications.
Rick Parsons's FTWGEDfx does fix FTW GEDCOM files, as does Jane Taubman's GedFix for Family Tree Maker.
The old Family Tree Maker for Windows may be the poster child for bad GEDCOM output, but it is not the only application for which fixers were developed. If you follow the link to Jane Taubman's GedFix, you'll see that she is not only offering GedFix for Family Tree Maker, but GedFix for Legacy 4 and GedFix for Generations 8.5 as well.

0 HEAD
1 SOUR WikiTree.com
2 NAME WikiTree: The Family Tree Wiki
2 CORP Interesting.com, Inc.
1 DATE 18 Aug 2012
2 TIME 10:30:16 EDT
1 CHAR UTF-8
1 FILE JONES-77-b2c78d.ged
1 COPR Interesting.com, Inc. and Tamura Jones
1 SUBM Tamura Jones
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 NOTE This file contains private information
and may not be redistributed, published, or made public.

WikiTree GEDCOM

The fact that someone is developing a GEDCOM fixer for WikiTree piqued my interest. I edited my WikiTree tree a bit, and then exported a GEDCOM file to have a look at it. You choose to export your data, and that's it; there are no export options. I had my GEDCOM file a few seconds later.

UTF-8: GEDCOM 5.5.1

I immediately noticed something wrong with the WikiTree GEDCOM file; it claims to be a UTF-8 GEDCOM 5.5 file. There is no such thing as a UTF-8 GEDCOM 5.5 file. GEDCOM 5.5 does not support UTF-8; support for that Unicode encoding was only introduced in GEDCOM 5.5.1.
WikiTree needs to fix the GEDCOM version number to read 5.5.1.
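The check itself is simple enough to sketch. The Python below is a minimal illustration, not how any particular validator does it; the function names parse_lines and header_encoding_problem are made up for this example, and it only looks at the one combination discussed here (CHAR UTF-8 together with GEDC.VERS 5.5).

```python
def parse_lines(text):
    """Split GEDCOM text into (level, tag, value) tuples, skipping malformed lines."""
    parsed = []
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # no numeric level, so not a usable GEDCOM line
        value = parts[2] if len(parts) == 3 else ""
        parsed.append((int(parts[0]), parts[1], value))
    return parsed


def header_encoding_problem(text):
    """Return a message if HEAD claims UTF-8 together with GEDCOM 5.5, else None."""
    charset = version = parent = None
    in_head = False
    for level, tag, value in parse_lines(text):
        if level == 0:
            if in_head:
                break  # the header record has ended
            in_head = tag == "HEAD"
        elif in_head and level == 1:
            parent = tag  # remember which level-1 tag the level-2 lines belong to
            if tag == "CHAR":
                charset = value.strip()
        elif in_head and level == 2 and parent == "GEDC" and tag == "VERS":
            version = value.strip()  # HEAD.GEDC.VERS, not HEAD.SOUR.VERS
    if charset == "UTF-8" and version == "5.5":
        return "header claims UTF-8, which GEDCOM 5.5 does not support"
    return None


sample = "\n".join([
    "0 HEAD",
    "1 SOUR WikiTree.com",
    "1 CHAR UTF-8",
    "1 GEDC",
    "2 VERS 5.5",
    "2 FORM LINEAGE-LINKED",
    "0 TRLR",
])
print(header_encoding_problem(sample))
```

Tracking the parent tag matters because a header may carry a VERS line under SOUR as well; only the one under GEDC states the GEDCOM version.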

fan value

That is a small and easy thing to fix, but until it is done, the WikiTree GEDCOM header is invalid, and when the header is invalid, the GEDCOM file is invalid. Because WikiTree does not export valid GEDCOM files, WikiTree's fan value is zero.
FamilySearch's PAF still makes the same mistake when exporting UTF-8 GEDCOM, but that does not mean its fan value is zero too; PAF provides the option to export to other encodings that are legal in GEDCOM 5.5.

submitter

That isn't the only problem with the header. For example, it appears to list me as the submitter, but it does not do so correctly. It should use a separate SUBM record (completely missing), and the HEAD.SUBM value should be a cross-reference to that SUBM record. My name isn't a valid GEDCOM cross-reference, so that is another GEDCOM header error.
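To make that concrete, here is a rough Python sketch of the two things a reader would check: that the HEAD.SUBM value looks like a cross-reference, and that a SUBM record with that identifier exists. The function name check_submitter is hypothetical, and the regular expression is a simplification of the GEDCOM pointer grammar (it ignores the length limit, for one).

```python
import re

# Simplified pointer pattern: "@", an alphanumeric, any non-"@" characters, "@".
XREF = re.compile(r"^@[A-Za-z0-9][^@]*@$")


def check_submitter(text):
    """Return a list of problems with HEAD.SUBM; an empty list means it looks fine."""
    head_subm = None
    subm_records = set()
    in_head = False
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a usable GEDCOM line
        level = int(parts[0])
        if level == 0:
            in_head = parts[1] == "HEAD"
            # A record line looks like "0 @U1@ SUBM": identifier first, then tag.
            if len(parts) == 3 and parts[2].strip() == "SUBM" and XREF.match(parts[1]):
                subm_records.add(parts[1])
        elif level == 1 and in_head and parts[1] == "SUBM":
            head_subm = parts[2].strip() if len(parts) == 3 else ""
    if head_subm is None:
        return ["HEAD has no SUBM line"]
    if not XREF.match(head_subm):
        return ["HEAD.SUBM value %r is not a cross-reference" % head_subm]
    if head_subm not in subm_records:
        return ["no SUBM record found for %s" % head_subm]
    return []


print(check_submitter("0 HEAD\n1 SUBM Tamura Jones\n0 TRLR"))
print(check_submitter("0 HEAD\n1 SUBM @U1@\n0 @U1@ SUBM\n1 NAME Tamura Jones\n0 TRLR"))
```

The first call reports the problem described above; the second, with an illustrative @U1@ identifier and a matching SUBM record, comes back empty.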

email

There is also the rather silly issue of using an _EMAIL tag. That _EMAIL tag starts with an underscore, so it is a perfectly legal GEDCOM extension. The use of that GEDCOM extension even seems reasonable because GEDCOM 5.5 does not include the EMAIL tag. However, GEDCOM 5.5.1 does include the EMAIL tag.

copyright

It is interesting that Interesting.com is claiming joint copyright of data I created on their system. In general, the data in the GEDCOM file will have been created and modified by multiple WikiTree users. I am not entirely sure what Interesting.com is trying to claim here, or whether their claim affects your right to import the data into a competing shared web tree. American law isn't my forte, so I'll leave the copyright claim for the lawyers to worry over.

name

It is not just the GEDCOM header that contains errors.

0 @I1@ INDI
1 NAME Tamura /Jones/
1 SURN Jones
...

The fragment above shows the first few lines of the INDI record that WikiTree created for my profile.

At first blush it seems fine; it starts by listing the full name, and then lists the name parts. However, look again. Oddly, the WikiTree GEDCOM lists the SURN (surname) part but not the GIVN (given name) part. Moreover, it lists the SURN tag as level 1, while it should be level 2; there should be an INDI.NAME.SURN value, but this GEDCOM file presents an INDI.SURN value instead.
It is unreasonable to expect any GEDCOM reader to make sense of that. A good GEDCOM reader will report each INDI.SURN occurrence as an error. The reason that most GEDCOM readers still manage to split most names correctly is that the NAME value does contain slashes to mark the surname.
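Both halves of that observation fit in a few lines of Python. This is only a sketch under my own assumptions: the helper names split_name and misplaced_name_parts are invented for the example, and the second function checks just the two name-part tags discussed here rather than every tag that belongs under NAME.

```python
import re

# Everything before the first slash, between the slashes, and after the second slash.
NAME_PATTERN = re.compile(r"^(.*?)/(.*?)/(.*)$")


def split_name(value):
    """Split a GEDCOM NAME value into (given, surname, suffix) using the slashes."""
    match = NAME_PATTERN.match(value)
    if not match:
        return value.strip(), "", ""  # no slashes, so no marked surname
    given, surname, suffix = match.groups()
    return given.strip(), surname.strip(), suffix.strip()


def misplaced_name_parts(text):
    """Return 1-based line numbers where GIVN or SURN appears at level 1."""
    hits = []
    for number, raw in enumerate(text.splitlines(), start=1):
        parts = raw.strip().split(" ", 2)
        if len(parts) >= 2 and parts[0] == "1" and parts[1] in ("GIVN", "SURN"):
            hits.append(number)  # should have been level 2, under NAME
    return hits


fragment = "0 @I1@ INDI\n1 NAME Tamura /Jones/\n1 SURN Jones"
print(split_name("Tamura /Jones/"))
print(misplaced_name_parts(fragment))
```

The slashes alone are enough to recover the surname, which is why the misplaced SURN line does little harm in practice, even though a strict reader should still report it.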

validator

Browsing through the small WikiTree GEDCOM file, I got the impression that they mean well, but never bothered to have any expert look at it, or to run a WikiTree GEDCOM file through a GEDCOM validator.

more

The things I mentioned are just some of the issues I noticed. There are more issues, and not all issues have to do with the validity of the GEDCOM file.

biography

One issue that struck me is that WikiTree always exports a default biography for each individual.
On WikiTree, each individual has its own profile, on its own web page. That profile includes a biography, and there is some default text; as soon as you create a profile, WikiTree enters default text in the biography field. That default text gets exported like this:

1 _BIO
2 TEXT CONT == Biography ==
3 CONT 
3 CONT John Doe ...  <ref>Entered by Tamura Jones, Aug 18, 2012</ref>
3 CONT 
3 CONC ''No more info is currently available for John Doe. Can you add to 
3 CONC his biography?''
3 CONT 
3 CONT == Sources ==
3 CONT 
3 CONC * [[Jones-77 | Tamura Jones]], firsthand knowledge. Click the Changes 
3 CONC tab for the details of edits by Tamura and others.
3 CONT 
3 CONT <references />
3 CONT 
3 CONC 
1 REFN 1234567
2 TYPE wikitree.user_id
1 REFN 7654321
2 TYPE wikitree.page_id

WikiTree is using the CONC and CONT tags correctly; CONC to concatenate lines, and CONT to continue text on a new line. However, when a line is broken on a space, that space should be on the second line, right after the CONC tag and the single space that separates the CONC tag from its value.
That the first line of this starts with CONT is a small GEDCOM error; they surely intended to have an empty value for the TEXT tag itself, and immediately follow it with a CONC value.
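Why the placement of that space matters is easiest to see in code. The Python sketch below reassembles a text value from its CONC and CONT lines. It assumes a reader that takes everything after the single delimiter space as the value and drops trailing whitespace; that behaviour is my assumption for illustration, not a description of any specific product, and the function names are made up.

```python
def line_value(raw):
    """Return (tag, value) for one GEDCOM line.

    The value is everything after the single space that follows the tag,
    so leading spaces survive; trailing whitespace is dropped (assumed reader behaviour).
    """
    _level, _, rest = raw.partition(" ")
    tag, _, value = rest.partition(" ")
    return tag, value.rstrip()


def join_text(raw_lines):
    """Join a tag's own value with its CONC and CONT continuation lines."""
    text = ""
    for index, raw in enumerate(raw_lines):
        tag, value = line_value(raw)
        if index == 0 or tag == "CONC":
            text += value          # CONC: glue directly onto the previous text
        elif tag == "CONT":
            text += "\n" + value   # CONT: start a new line first
    return text


# Space left at the end of the first line: this reader loses it.
trailing = ["1 NOTE Can you add to ", "2 CONC his biography?"]
# Space carried over to the start of the CONC value: it survives.
leading = ["1 NOTE Can you add to", "2 CONC  his biography?"]

print(join_text(trailing))
print(join_text(leading))
```

With the space at the end of the first line, the two words run together; with the space carried over to the CONC line, the sentence comes out intact.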

Anyway, this form text does not occur in your WikiTree GEDCOM files just once, but once for every profile that was not edited to say something else.
WikiTree should not export a biography unless some user actually entered a biography. If that default text is the part of the GEDCOM file Interesting.com wants to claim copyright on, they are welcome to it. To me, it is noise bloating the GEDCOM file.

WikiTree GEDCOM fix

The GEDCOM Fixer page lists the WikiTree GEDCOM problems it fixes. There is some overlap with the GEDCOM to-do list that WikiTreer-in-Chief Chris Whitten provided me with. Most of the more fundamental GEDCOM validity problems I mentioned were not on that list, but they are now.
I've been in contact with Chris and Daniël Bouman, the man behind Virtual Productions, and they are already in contact with each other.
Chris does not want you to have to use GEDCOM Fixer, and is giving some priority to fixing the WikiTree GEDCOM export.

updates

2012-08-21 instant update: GEDCOM fixes

Several fixes have been made today.
The GEDCOM version number is 5.5.1 now. The proprietary _EMAIL and _URL tags have been replaced with the GEDCOM 5.5.1 EMAIL and WWW tags.
The WikiTree GEDCOM uses a SUBM record now. WikiTree GEDCOM uses both the GIVN and SURN tags within the NAME record now. The proprietary _BIO tag has been replaced with the standard NOTE tag.
These changes do not fix all issues, but WikiTree GEDCOM quality is getting attention now, and more improvements will follow.

2012-08-21 instant update: CONC space

Louis Kessler spotted that the CONC usage is not entirely correct.
Added a remark on the placement of the space after CONC when a line is broken at a space.

2012-08-30: Chronoplex GEDCOM Validator

Chronoplex just released the Chronoplex GEDCOM Validator 1.0.0.0. The Chronoplex GEDCOM Validator article uses a fresh WikiTree GEDCOM to quickly compare Chronoplex GEDCOM Validator and VGed 3.04.
