Modern Software Experience

2011-12-12

FamilySearch GEDCOM Alternative

GEDCOM alternatives

The de facto standard for exchange of genealogical data is GEDCOM, a file format created by FamilySearch, that they stopped maintaining more than a decade ago. That the GEDCOM specification has some serious design flaws is one reason that quite a few GEDCOM alternatives have been proposed over the years. One such alternative was FamilySearch's own GEDCOM XML, which they, quite confusingly, also referred to as GEDCOM 6.0. They introduced a draft specification, they introduced a beta and then they abandoned it again, just as they had abandoned GEDCOM itself.

New FamilySearch

FamilySearch has been neglecting its responsibilities as keeper of the GEDCOM standard for many years. In recent years, FamilySearch employee Gordon Clarke, who was at the centre of their GEDCOM and GEDCOM XML failures, and is now at the centre of their Geni.com-like New FamilySearch (NFS) project, has more than once been quoted as making the nonsensical claim that NFS is the replacement for GEDCOM.

First of all, NFS has been coming Real Soon Now for about a decade…
It has been in extensive alpha (LDS members are allowed access) for years, and only went into limited private beta early this year. It still is not even in public beta yet, so to suggest that it is available as an alternative for anything is ridiculous already.
Even if NFS were available and working perfectly, it still wouldn't be an alternative to the GEDCOM file format. A common file format allows direct exchange of data between users. Having to exchange data through a system like NFS complicates things by introducing an unnecessary third party and a performance bottleneck, and probably some transfer limitations of its own. The real issue, however, is that they will not merely facilitate the transfer, but will also hang on to all the data you upload to their system.

FamilySearch SORD

These issues prompted the creation of the BetterGEDCOM project late in 2010. Early this year, at their new RootsTech conference, FamilySearch let it slip that they are working on FamilySearch SORD. There was talk about a data model, a file format and source code, but nothing concrete was announced or provided. The only thing that became really clear is that FamilySearch SORD is a project distinct from NFS.

New GEDCOM

FamilySearch intends to make RootsTech an annual conference. In September, the preliminary schedule for RootsTech 2012 showed the FamilySearch talks New gedcom (how to produce and consume it) and New gedcom (what it is, what's it's scope, how is the project managed and maintained?). Apparently, FamilySearch is planning to officially introduce New GEDCOM at RootsTech 2012.

The current RootsTech 2012 schedule does not just include slightly changed titles for these talks, but brief descriptions as well:

A New GEDCOM: Project Scope, Goals, and Governance

The GEDCOM standard is stale. What does a new GEDCOM initiative look like? What is its scope and goals? How is the project governed?

A New GEDCOM: Tools, Syntax and Semantics

The GEDCOM standard is stale. What would a new GEDCOM look like in terms of its syntax and semantics? What tools would be made available to promote and apply it?

The speaker for both talks is FamilySearch employee Ryan Heaton.

GEDCOM X

The GEDCOM X name has been used before. GEDCOM X is an abbreviation of GEDCOM Explorer, GedcomX is the name of an ActiveX control for reading GEDCOM files, and GEDx is an old Mac OS utility for converting GEDCOM files to tab-delimited ASCII.

new standard?

The phrase New GEDCOM suggests that FamilySearch wants to be keeper of a new standard to replace GEDCOM. It remains to be seen how many developers are willing to invest time in yet another FamilySearch standard. After all, FamilySearch abandoned GEDCOM, announced GEDCOM XML as its replacement, and abandoned that even before the specification was finished.

GEDCOM X

FamilySearch's new standard isn't called New GEDCOM, it is called GEDCOM X. There is no information on GEDCOM X on the FamilySearch site. The FamilySearch site has a GEDCOM page, it does not have a GEDCOM X page. There is a GEDCOM page on the FamilySearch DevNet site, there is no GEDCOM X page.

There used to be a gedcom.org site. The GEDCOM specification still refers to it. That site has been defunct for many years, but there is a gedcomx.org site now.

GEDCOM X home page 2011: not public yet

When you visit the gedcomx.org site, you are greeted with a message that tells you the site is not public yet:

Welcome!

This site isn't public just yet.

We're going to ask Github if you've got the necessary permissions. You just need to tell Github it's okay for us to access your profile.

The gedcomx.org domain was created on 2011 Feb 12, and is registered to Gordon Clarke. Gordon Clarke is a FamilySearch employee associated with PAF, GEDCOM and New FamilySearch. On that same date, he registered gedcomx.com and gedcomx.net as well, so he seems very serious about claiming the name.

standard and tools

It is a little known fact that there is more to GEDCOM than just a specification. FamilySearch published a specification, as well as GedLib, some C source code to read GEDCOM files and GedChk, a GEDCOM validator. The GEDCOM specification is sloppy, the C source is of poor quality and the GEDCOM validator has never been finished. Still, they had the right idea; a specification is nice, but a specification supported by source code and a validator is better.

GEDCOM X isn't just a new specification, it is a specification supported by open source. The gedcomx.org site hosts the specification, and some handy links. The source code is at github, a popular site for software development projects.

GEDCOM X project

GEDCOM X is still in development. The message on the GEDCOM X site says it needs to check github for permission. The GEDCOM X site is still in alpha. There is a GEDCOM X project on the github site, and only the FamilySearch employees associated with that project are allowed to access the GEDCOM X site.
The GEDCOM X project on github is still private, and remains hidden from view. You can follow the links to find the GEDCOM X project, but when you try to access it, github will tell you that This is not the web page you are looking for.

Build and Test on CloudBees

CloudBees

When you visit either www.gedcomx.com or www.gedcomx.net, you get to see the same not-so-welcome page as on gedcomx.org, but there is a difference. When you visit www.gedcomx.org, the address bar displays www.gedcomx.org. When you visit either www.gedcomx.com or www.gedcomx.net, the URL changes to x.gedcom.cloudbees.net.

The cloudbees.net domain isn't some little known FamilySearch domain. It isn't a FamilySearch domain at all. CloudBees is a commercial service aimed at software developers. Their own home page describes CloudBees thus:

The CloudBees platform is the first Platform as a Service that lets companies build, test and deploy Java web applications in the cloud. With CloudBees, software teams can move their development and production activities instantly to the cloud, without restrictions or infrastructure costs.

CloudBees is a freemium service; you can get started using CloudBees for free, and only need pay if you want their premium services.
Whatever subscription FamilySearch has, it is clear that they are using the CloudBees platform to develop GEDCOM X. Read the CloudBees description again; CloudBees isn't some generic platform for every programming language, but a platform specifically created to build, test and deploy Java web applications, and it so happens that Ryan Heaton is a Java programmer. Taken together, these two facts leave little doubt that GEDCOM X is coded in Java.

CloudBees open source projects

open source

The Open Source Project on CloudBees page lists the GEDCOM X project. This tells us that GEDCOM X is an open source project. The GEDCOM X project on github is private for now, but will probably go public once the code is stable enough.

CloudBees was developed to support open source projects. When you use the free service, you agree to publish your source code, and make your repository accessible to anyone in read-only mode. To keep your github repository secret, as FamilySearch does, you need to upgrade to their premium service, but even when you do so, your Jenkins UI remains visible to the public.
In plain English: the private github project is public on CloudBees. You can download the GEDCOM X source code today. You can even subscribe to an RSS feed.
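
You do not need a feed reader to follow such a feed. The sketch below is a minimal example, assuming the feed is served in the Atom format; the sample feed, the job name example-job and the function name list_builds are made up for illustration and are not taken from the GEDCOM X project.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (assumption: the build feed is Atom).
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed; not a capture of the real GEDCOM X feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example-job all builds</title>
  <entry>
    <title>example-job #2 (stable)</title>
    <updated>2011-12-10T09:30:00Z</updated>
  </entry>
  <entry>
    <title>example-job #1 (broken)</title>
    <updated>2011-12-09T17:05:00Z</updated>
  </entry>
</feed>"""


def list_builds(feed_xml):
    """Return (title, updated) pairs for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    builds = []
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="")
        updated = entry.findtext(ATOM_NS + "updated", default="")
        builds.append((title, updated))
    return builds


for title, updated in list_builds(SAMPLE_FEED):
    print(updated, title)
```

To use it on a real feed, you would download the feed text first and pass that to list_builds instead of the sample.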

CloudBees GEDCOM X Dashboard

Jenkins

Jenkins is an open source continuous integration (CI) server. You don't need to know what that means. All you need to know is that Jenkins is a central part of the CloudBees platform, and that the Jenkins user interface is public.

The screenshot above shows the CloudBees Jenkins dashboard for the GEDCOM X project. There are five Jenkins build jobs in the GEDCOM X project:

The separate snapshot and release builds will look familiar to Maven users, and for good reason; Apache Maven is one of the tools that make up the CloudBees platform.

CloudBees GEDCOM X Site Comments

changes

The CloudBees platform works with several version control systems, including github. It is common for developers to add a brief comment whenever they commit a change to the system. The Jenkins interface shows these comments, and the files the changes are associated with.

A brief look through just these commit comments shows several interesting things. All commit changes are made by Ryan or heatonra; Ryan Heaton or Ryan Heaton. There might be other programmers involved who aren't allowed to commit changes, but it is not impossible that GEDCOM X is a one-developer project.
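
That count is easy to automate once the commit comments have been copied out of the Jenkins interface. The following is a minimal sketch under stated assumptions: the record layout, the sample comments and the function name tally_committers are hypothetical, and the alias table merely encodes the inference above that both login names belong to Ryan Heaton.

```python
from collections import Counter

# Hypothetical commit records; only the two login names come from the article.
SAMPLE_CHANGES = [
    {"author": "Ryan", "comment": "sample comment one"},
    {"author": "heatonra", "comment": "sample comment two"},
    {"author": "Ryan", "comment": "sample comment three"},
]

# Assumption: both login names refer to the same person.
ALIASES = {"Ryan": "Ryan Heaton", "heatonra": "Ryan Heaton"}


def tally_committers(changes, aliases=None):
    """Count commits per person, mapping login names through an alias table."""
    aliases = aliases or {}
    return Counter(aliases.get(c["author"], c["author"]) for c in changes)


tally = tally_committers(SAMPLE_CHANGES, ALIASES)
print(len(tally), "distinct committer(s):", dict(tally))
```

Without the alias table the same data shows two committers, which is why the mapping of login names to people matters for the conclusion.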

The commit changes for the GEDCOM X site show that the GEDCOM X documentation has recently been updated to version 0.6. The file extension *.md used for the documentation indicates that these are in the Markdown format. Markdown is a light-weight plain-text mark-up language that can be converted to valid web pages. The GEDCOM X site includes such pages as Self-Guided-Tour-Project-Site.md, RS-Developers-Guide.md, Application-Profiles.md, Metadata-Model.md, File-Format.md, JSON-Format.md and Legacy-GEDCOM-Migration-Path.md.
By the way, Legacy does not refer to Millennia's Legacy Family Tree product or file format, but to the notion that GEDCOM is a legacy format.

download

CloudBees GEDCOM X Workspace

The workspace for the GEDCOM X snapshot clearly shows the project structure. According to the README.md file, GEDCOM X is the technological standard whereby genealogical data is stored, shared, searched, and secured across all phases of the genealogical research process. The LICENSE-header states that the code is licensed under the Apache License, version 2.0. That, briefly, means that you are free to use it, to redistribute it, to modify it, and even to distribute modified versions, as long as you keep the original copyright, trademark and attribution notices intact.
Notice the (all files in zip) link near the bottom. That's the download link for the entire GEDCOM X snapshot.
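
Once you have such a zip file, you do not need to unpack it to see which documentation it contains. Here is a minimal sketch, assuming the download is an ordinary zip archive; the function names and the docs/ folder in the sample archive are hypothetical, and the sample is built in memory so the example runs without downloading anything.

```python
import io
import zipfile


def markdown_files(zip_bytes):
    """Return the sorted names of all *.md files inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".md")
        )


def build_sample_zip():
    """Build a small stand-in archive in memory (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.md", "sample readme")
        archive.writestr("LICENSE-header", "sample header")
        archive.writestr("docs/File-Format.md", "sample documentation")
    return buffer.getvalue()


print(markdown_files(build_sample_zip()))
```

For a real download, you would read the saved file as bytes and pass those to markdown_files.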

updates

2012-01-31: What's in a name?

What does GEDCOM X mean?

2012-02-02: GEDCOM X public

See FamilySearch releases GEDCOM X.

2012-05-20: RootsTech schedule

FamilySearch has broken the RootsTech 2012 schedule link. The link has been removed.

2012-06-06: GEDCOM X Converter

FamilySearch has introduced some specifications and a GEDCOM X Converter.
