FamilySearch's GEDCOM X project recently released a new version of the GEDCOM X Converter, which they first released last year. That first release had version number 0.1; the second has version number 0.2.
The GEDCOM X file format has changed in response to the brief high-level observations provided in last year's GEDCOM X Converter. GEDCOM X Converter 0.2: Changes describes and illustrates the changes in some detail with fragments of GEDCOM X files created by versions 0.1 and 0.2. The GEDCOM X Converter 0.2 supports the current format only.
The version numbering of the GEDCOM X Converter does not match the version number of the GEDCOM X project. If you were to go by the numbering of the GEDCOM X Converter, you might think that release 1.0 of GEDCOM X is years away. However, if you judge the readiness of the GEDCOM X project by the number of closed and open issues on GitHub, it seems to be approaching a 1.0 release.
Right now, the GEDCOM X Converter is the only executable product of the GEDCOM X project. This utility is not cast in stone. It is just as much in flux as the libraries it depends on, but you do not need to be a programmer to download and run it.
The GEDCOM X Converter version 0.2 does not have a GUI, but it does have command-line parameters. The converter itself will show all supported command-line parameters when you start it without any command-line parameters.
parameter | description |
---|---|
-P (--pause) | Pause before starting the conversion process (experimental, used for profiling) |
-bs (--bson) | Use binary JSON instead of XML for serialization (experimental, used for proof-of-concept) |
-i (--input) FILE | GEDCOM 5.5 input file |
-ix (--inputx) FILE | GEDCOM X input file (experimental, used for benchmarking) |
-js (--json) | Use JSON instead of XML for serialization (experimental, used for proof-of-concept) |
-o (--output) FILE | GEDCOM X output file |
-v (--verbose) | Output all the warnings that are generated during the conversion. |
-vv (--very-verbose) | Output all the warnings and informational messages that are generated during the conversion. |
Notice the option to input a GEDCOM X file instead of a GEDCOM file. Version 0.2 will not convert a GEDCOM X file to a GEDCOM file; it will merely convert a GEDCOM X file to a GEDCOM X file. That option is used by the developers to benchmark performance of the converter.
Several options are for variations of the GEDCOM X file format. The GEDCOM X format is XML-based. When you invoke the -js option, the file uses JSON instead of XML. When you invoke the -bs option, the file uses binary JSON instead of XML.
These options are remnants of the indecisiveness that has plagued the GEDCOM X project. A major goal of the GEDCOM X project is to define a new genealogy file format, yet years after the GEDCOM X project started, more than 1½ years after the project and public availability of the source code were announced, and more than a year after the former FamilySearch CEO promoted the GEDCOM X project in his RootsTech 2012 keynote speech, FamilySearch's GEDCOM X programmers still hadn't decided whether they wanted to use XML, JSON or binary JSON as the basis for their GEDCOM X file format. Never mind which one of these three technologies is best or why that one is best; the real issue is that a decision should have been made years ago.
Reading through some of the many issues on the GEDCOM X GitHub, I found that the GEDCOM X project leader settled on XML a few days ago. That decision makes the options to produce variations of the GEDCOM X format superfluous, so it is reasonable to expect that these will be removed from future versions of the converter.
The GEDCOM X Converter 0.2 will convert GEDCOM to GEDCOM X. This second release still does not convert back from GEDCOM X to GEDCOM, so it is still not possible to round-trip a file, and that limits the evaluation possibilities.
The changes to the GEDCOM X file format were made to make GEDCOM X files smaller, and the GEDCOM to GEDCOM X converter is all we need to compare the size of corresponding GEDCOM and GEDCOM X files. However, you cannot simply pick any random GEDCOM file.
The main page for the GEDCOM X Converter project on GitHub, from where you can now download version 0.2, still notes that it does not support all of GEDCOM yet:
All Records
The following are not currently converted on all types of records:
- Notes (NOTE tag)
- Multimedia (OBJE tag)
- LDS Ordinances
- ID's such as RIN, RFN, REFN and AFN tags
- RESN tag
- AGE tag is not supported on the event structures
- Generic events (EVEN tag)
Individual
The following are not currently converted on an individual records:
- Tags: ALIA ASSO
- Generic facts (FACT tag)
Family
Families are converted into binary relationships (couple and parent-child). All tags are supported except the tags not supported on all records.
Contributor
All tags are supported except the tags not supported on all records.
Source
The following are not currently converted on an individual records:
- Tags: TEXT
Repository
All tags are supported except the tags not supported on all records.
To compare the size of GEDCOM and GEDCOM X files, you must limit yourself to GEDCOM features the GEDCOM X Converter supports.
It is important to be aware of these limitations when evaluating FamilySearch's GEDCOM X Converter, because the GEDCOM X Converter itself still does not automatically warn you when encountering unsupported GEDCOM features.
Many GEDCOM files use features that aren't supported. For most GEDCOM files, version 0.2 of the converter still isn't good enough. That's hardly surprising for an early version of the converter, but it is the flip side that matters: most GEDCOM files aren't good test files for evaluating the GEDCOM X Converter.
For example, most GEDCOM files contain NOTE records. If you convert a GEDCOM file containing NOTE records, those records will not be converted; they will not become part of the GEDCOM X file in any way, but simply disappear. If you are not aware that data is being thrown away, you might easily draw the wrong conclusion and misinterpret failure to convert as stellar compression.
You cannot assume the GEDCOM X Converter did its job perfectly just because it did not report an error or warning.
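Because the converter stays silent by default, a rough pre-flight check of the GEDCOM file can help decide whether it is a fair test file. The Python sketch below counts occurrences of the tags that the converter's GitHub page lists as not converted. It is a minimal sketch under stated assumptions, not part of the GEDCOM X project: the function name and sample lines are made up for illustration, LDS ordinances are left out because the list names no tags for them, and the check ignores context, so it over-reports tags such as AGE and TEXT that are unsupported only in particular structures.

```python
import re
from collections import Counter

# Tags taken from the converter's "not currently converted" list quoted above.
# LDS ordinances are omitted: the list does not name specific tags for them.
UNSUPPORTED_TAGS = {
    "NOTE", "OBJE", "RIN", "RFN", "REFN", "AFN", "RESN",
    "AGE", "EVEN", "ALIA", "ASSO", "FACT", "TEXT",
}

# A GEDCOM line consists of a level, an optional @xref@, a tag, and an optional value.
GEDCOM_LINE = re.compile(r"^\s*(\d+)\s+(?:@[^@]+@\s+)?(\w+)")


def count_unsupported_tags(lines):
    """Count lines whose tag is on the unsupported list (context is ignored)."""
    counts = Counter()
    for line in lines:
        match = GEDCOM_LINE.match(line)
        if match:
            tag = match.group(2).upper()
            if tag in UNSUPPORTED_TAGS:
                counts[tag] += 1
    return counts


# Hypothetical sample: one individual with a pointer to one NOTE record.
sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 NOTE @N1@",
    "0 @N1@ NOTE A large note",
    "1 CONT continues here",
    "0 TRLR",
]
print(count_unsupported_tags(sample))
```

For a real file, pass an open file object instead of the sample list. A non-empty result suggests that the file is a poor candidate for size comparisons.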
To illustrate the issue, I used PAF to create LargeNote.GED. It's nothing special, just a small GEDCOM file containing one large NOTE record. The screenshot shows what happens.
When you run the GEDCOM X Converter normally, it simply creates a GEDCOM X file, without any error or warning. Only when you choose to use the -v (verbose) option does the converter warn that the NOTE record was not processed. The converter additionally warns that it does not support _UID records.
You cannot assume the GEDCOM X Converter did its job perfectly just because it did not report an error or warning. The GEDCOM X Converter defaults to silently failing whenever it encounters an unsupported record. You need to use the verbose option to be informed.
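One way to avoid being misled by the silent default is to run the same conversion at each verbosity level and compare what the converter reports. The Python sketch below only builds and runs the command lines. The -i, -o, -v and -vv parameters come from the table above; everything else is an assumption for illustration: the jar file name is hypothetical, launching the converter with `java -jar` is assumed, and the output file name is made up.

```python
import subprocess

# Hypothetical jar name (assumption): use the file name of the converter you downloaded.
CONVERTER_JAR = "gedcomx-converter.jar"


def build_command(gedcom_path, output_path, verbosity=0):
    """Build the argument list for one run: 0 = default, 1 = -v, 2 = -vv."""
    # Assumes the converter is started with "java -jar"; adjust if yours differs.
    command = ["java", "-jar", CONVERTER_JAR, "-i", gedcom_path, "-o", output_path]
    if verbosity == 1:
        command.append("-v")
    elif verbosity >= 2:
        command.append("-vv")
    return command


def run_at_all_levels(gedcom_path, output_path):
    """Run the conversion three times and collect whatever the converter prints."""
    reports = {}
    for level in range(3):
        result = subprocess.run(
            build_command(gedcom_path, output_path, level),
            capture_output=True,
            text=True,
        )
        # Keep both streams, since it is not documented where warnings go.
        reports[level] = result.stdout + result.stderr
    return reports


# Show the three command lines without running anything.
for level in range(3):
    print(" ".join(build_command("LargeNote.GED", "LargeNote.gedx", level)))
```

Calling `run_at_all_levels` requires Java and the converter jar to be present. Differences between the three reports show which messages appear only at the verbose or very verbose level.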
The GEDCOM X Converter did not produce any error or warning when it converted FAN4.GED, but that conversion was done without using the verbose option. When you use the -v (verbose) option, the converter still does not produce any error or warning. Only when you use the -vv (very verbose) option does the converter issue a few warnings, informing you that it does not understand the _STS or WWW tags.
The _STS record is a GEDCOM extension. That the GEDCOM X Converter does not recognise the WWW tag is an undocumented shortcoming. There is no real problem; the _STS record occurs just once, in the header, and the WWW tag occurs only twice, in the header and in the submitter. The GEDCOM X Converter does not complain about any of the data records, but converts them just fine.
You cannot use just any GEDCOM file if you want to compare GEDCOM and GEDCOM X file sizes, but you can use the fan files. The fan files use just a few GEDCOM features, and the converter's current limitations aren't a real issue. The sizes of the fan files deliberately form a nice exponential progression, originally to measure the capacity of genealogy software and express it as a fan value, but the fan files can also be used to evaluate the performance of a range of things as a function of GEDCOM file size. Most importantly, anyone can download GedFan, create the fan files, and use them to run tests themselves.
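The size comparison itself is simple arithmetic: divide the size of the GEDCOM X file by the size of the GEDCOM file it was converted from. The Python sketch below does that for a list of file pairs. It is a minimal sketch; the function names are made up for illustration, and the output file names in the usage note are hypothetical.

```python
import os


def size_ratio(gedcom_path, gedcomx_path):
    """Return GEDCOM X size divided by GEDCOM size; below 1.0 means smaller."""
    gedcom_size = os.path.getsize(gedcom_path)
    gedcomx_size = os.path.getsize(gedcomx_path)
    if gedcom_size == 0:
        raise ValueError("GEDCOM file is empty: " + gedcom_path)
    return gedcomx_size / gedcom_size


def compare_series(pairs):
    """Compute the ratio for each (GEDCOM, GEDCOM X) pair, keeping the order."""
    return [(gedcom, gedcomx, size_ratio(gedcom, gedcomx)) for gedcom, gedcomx in pairs]


def print_report(pairs):
    """Print one line per pair with both sizes in bytes and the ratio."""
    for gedcom, gedcomx, ratio in compare_series(pairs):
        print(f"{gedcom}: {os.path.getsize(gedcom)} bytes, "
              f"{gedcomx}: {os.path.getsize(gedcomx)} bytes, ratio {ratio:.2f}")
```

For example, `print_report([("FAN4.GED", "FAN4.gedx")])` reports on one converted fan file, where `FAN4.gedx` stands for whatever output name you passed to the converter. The ratio is meaningful only when the converter actually converted every record in the input.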
The GEDCOM X and GEDCOM X Converter version numbers have been changed to 1.0.0 M1.
Copyright © Tamura Jones. All Rights reserved.