Modern Software Experience

2011-04-08

Documenting GEDCOM

GEDCOM versions

Different GEDCOM versions support different GEDCOM tags. Newer versions of GEDCOM do not just introduce new tags, but drop support for older tags as well. The GEDCOM specification has an Introduction in front of Chapter 1. That introduction lists changes with respect to earlier versions. It even explicitly lists which tags were added or removed. Sadly, these statements are neither entirely correct nor complete.

This article documents all the discrepancies I've noted over the years, and presents an overview of GEDCOM tags that incorporates all these observations. I do not claim that this overview is entirely accurate, but it certainly addresses a lot of misinformation published by FamilySearch itself. Additions and corrections are very welcome.

GEDCOM 5.0

GEDCOM 5.0 was introduced in 1991. It was followed by GEDCOM 5.5 in 1995. In between, several drafts were published, most notably GEDCOM 5.3.
The GEDCOM 5.0 specification is not publicly available. Last year, I asked FamilySearch about copies of older GEDCOM specifications, and after several delays and evasive replies, they admitted that they are unable to find their own standards…
The GEDCOM 5.0 column below is based on a reading of the GEDCOM 5.3 and GEDCOM 5.4 specifications.

GEDCOM tag  brief description
_IIC        Individual Record Count
_FIC        Family Record Count
_RUL        RULes used
ADFA        ADoption by FAther
ADMO        ADoption by MOther
DIVO        DIVOrce.
EOF         End Of File.
FAMI        FAMIly.
PARE        PAREnt record
SYST        SYSTem section

before GEDCOM 5.0

Information on earlier GEDCOM specifications is hard to come by. The table above lists some of the tags that earlier versions of GEDCOM included, but this table is not complete.

GEDCOM 5.3

The GEDCOM 5.3 specification does provide a list titled "Some changes in Version 5.2 - 5.3 that were not in previous 5.x versions", but does not provide simple enumerations of the tags that were added or removed like later GEDCOM versions do.
The only explicitly mentioned change in this section is the addition of the MSTAT tag. The MSTAT tag does not appear in later versions of the GEDCOM specification.

GEDCOM 5.3 is remembered for the introduction of the SCHEMA tags: SCHEMA, DEFN, LABL and ISA. This mechanism for declaring vendor-defined tags was dropped in GEDCOM 5.4, but is present in many Family Tree Maker GEDCOM files.

The GEDCOM 5.3 tag overview in Appendix A presents an alphabetical list of GEDCOM tags, but is neither accurate nor complete. Appendix A lists the tag ISSUE, while the actual tag documented in the rest of the GEDCOM 5.3 specification is ISSU. The tags AUDIO, PHOTO and VIDEO are missing from the list.

The CPLR, XLTR and INFT tags do occur in the GEDCOM 5.3 introduction and Appendix A, but they do not occur in the specification itself. Thus, these three tags were actually dropped in GEDCOM 5.3.

GEDCOM 5.4

The last two bullet points of the section Changes introduced or Modified in Version 5.4 are:

  • The following tags were eliminated:
    ARVL, BROT, BUYR, CEME, CNTC, CPLR, DEFM, DPRT, EDTR, FIDE, FILM, GODP, HDOH, HEIR, HFAT, HMOT, INFT, INDX, INTV, ISA, ISSU, ITEM, LABL, LCCN, LGTE, MBR, NAMS, NAMR, OFFI, ORIG, OWNR, PERI, PORT, PWIF, PUBR, RECO, SELR, SEQU, SERS, SIBL, SIGN, SIST, SITE, TXPY, XLTR, WFAT, WITN, WMOT, AUDIO, IMAGE, PHOTO, SCHEMA, VIDEO
  • The following tags were added:
    BLOB, CTRY, CREM, EOBJ, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI, RELA, RESI, RESN, SUBN, SURN, STAT, END

Notice that neither of these two lists of tags is in alphabetical order.

The MSTAT tag introduced in GEDCOM 5.3 does not appear in the GEDCOM 5.4 specification, yet is not listed as one of the tags that have been removed. The GEDCOM 5.4 specification lists the non-existent MSTA tag as removed instead.

The GEDCOM 5.4 specification supports multimedia through the BLOB tag. This tag allows multimedia objects to be encoded inside the GEDCOM file. For compatibility with earlier GEDCOM versions, the BLOBs appear after the TRLR tag. The end of a BLOB object is indicated by the EOBJ tag, while the actual end of the GEDCOM file is indicated by the END tag.

The GEDCOM 5.4 draft lists the following tags as removed from GEDCOM: ARVL, AUDIO, BROT, BUYR, CEME, CNTC, CPLR, DEFM, DPRT, EDTR, FIDE, FILM, GODP, HDOH, HEIR, HFAT, HMOT, IMAGE, INFT, INDX, INTV, ISA, ISSU, ITEM, LABL, LCCN, LGTE, MBR, NAMR, NAMS, OFFI, ORIG, OWNR, PERI, PHOTO, PORT, PWIF, PUBR, RECO, SCHEMA, SELR, SEQU, SERS, SIBL, SIGN, SIST, SITE, TXPY, VIDEO, WFAT, WITN, WMOT and XLTR.
The GEDCOM 5.4 specification states that it ended support for DEFM, but it actually ended support for DEFN. There never was a DEFM tag.
The GEDCOM 5.4 specification lists the IMAGE tag as removed in version 5.4, but there is no IMAGE tag in GEDCOM 5.3.
The tags CLAS, PHUS, REFS and SOUND are present in GEDCOM 5.3 but not in 5.4, yet, like MSTAT, they are not included in the list of removed tags.

The list of tags removed in 5.4 lists LGTE as removed, while it should list LEGA as removed. The GEDCOM 5.3 specification is the last one in which the LEGA tag actually occurs in the GEDCOM form. The GEDCOM 5.3 specification uses LEGA in the GEDCOM form, yet lists LGTE in the Appendix. Both the GEDCOM 5.4 and GEDCOM 5.5 documents list LEGA in their Appendix, without listing either LEGA or LGTE in the GEDCOM form.

The GEDCOM 5.4 introduction lists ORDL as new in GEDCOM 5.4, and lists the tag in the Appendix, but ORDL does not appear in the GEDCOM form.

Although the CPLR, XLTR and INFT tags do occur in the GEDCOM 5.3 specification, they only occur in the introduction and the Appendix, not in the specification itself. Thus, these three tags were actually dropped in GEDCOM 5.3 already.

The GEDCOM 5.4 draft introduced the tags BLOB, CREM, CTRY, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI, RELA, RESI, RESN, SUBN, SURN and STAT. These remained in GEDCOM 5.5.

GEDCOM 5.5

The GEDCOM 5.5 specification is the official successor to the GEDCOM 5.0 specification. The GEDCOM 5.3 and 5.4 specifications were drafts only. The GEDCOM 5.5 introduction lists changes between the various versions.

5.5 from 5.4

The section Modification in Version 5.5 as a result of the 5.4 (draft) review lists a number of bullet points that summarise the changes. The text of the last bullet point is:

  • The following tags were added:
    ADR1, ADR2, CITY, NICK, POST, SPFX

This list of added tags is not complete. As mentioned in another bullet point, GEDCOM 5.5 also introduced the RIN tag.

5.4 from 5.3

The last two bullet points of the section Changes introduced or Modified in Draft Version 5.4 are:

  • The following tags are no longer used in the Lineage-Linked Form:
    ARVL, BROT, BUYR, CEME, CNTC, CPLR, DEFM, DPRT, EDTR, FIDE, FILM, GODP, HDOH, HEIR, HFAT, HMOT, INFT, INDX, INTV, ISA, ISSU, ITEM, LABL, LCCN, LGTE, MBR, NAMS, NAMR, OFFI, ORIG, OWNR, PERI, PORT, PWIF, PUBR, RECO, SELR, SEQU, SERS, SIBL, SIGN, SIST, SITE, TXPY, XLTR, WFAT, WITN, WMOT, AUDIO, IMAGE, PHOTO, SCHEMA, VIDEO
  • The following tags were added:
    BLOB, CTRY, CREM, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI, RELA, RESI, RESN, SUBN, SURN, STAT

Notice that neither of these two lists of tags is in alphabetical order.

Note that those two lists should be, but are not, identical to those in the GEDCOM 5.4 specification itself. The GEDCOM 5.5 specification does not mention the END or EOBJ tags introduced in GEDCOM 5.4. Both seem to have been silently withdrawn after FamilySearch realised that both tags are superfluous.
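The comparison between the two "added" lists can be checked mechanically. The following is a minimal sketch in Python, using the two lists exactly as quoted above; the variable and function names are my own and purely illustrative.

```python
# The "tags were added" lists as quoted above. Names are illustrative only.
ADDED_PER_54 = """BLOB, CTRY, CREM, EOBJ, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI,
RELA, RESI, RESN, SUBN, SURN, STAT, END"""
ADDED_PER_55 = """BLOB, CTRY, CREM, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI,
RELA, RESI, RESN, SUBN, SURN, STAT"""


def parse_tags(text):
    """Split a comma-separated tag list into a list of tags, keeping order."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def compare(first, second):
    """Return the tags found only in the first list and only in the second."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    return only_first, only_second


added_54 = parse_tags(ADDED_PER_54)
added_55 = parse_tags(ADDED_PER_55)
only_54, only_55 = compare(added_54, added_55)

print(only_54)                        # ['END', 'EOBJ']
print(only_55)                        # []
print(added_54 == sorted(added_54))   # False: the list is not alphabetical
```

The same set difference works for the lists of removed tags, or for any other pair of lists quoted in this article.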

5.3 from 5.0

The section Changes introduced in Draft Version 5.3 does not provide a list of new or removed tags. Other bullet points do mention that the MSTA tag and the SCHEMA tags were introduced in 5.3 and removed again in 5.4, and that use of the CPLR, XLTR and INFT tags in source substructures was discontinued.
Those statements are not entirely correct; GEDCOM 5.3 did not introduce the MSTA tag, but the MSTAT tag.

The GEDCOM 5.5 document does not correct, but repeats, misinformation from the GEDCOM 5.4 document. Although the CPLR, XLTR and INFT tags do occur in the GEDCOM 5.3 document, they only occur in the introduction and the Appendix, not in the specification itself. Thus, these three tags were actually dropped in GEDCOM 5.3 already.

summary

The GEDCOM 5.4 draft dropped support for the SCHEMA mechanism introduced in GEDCOM 5.3, and several other tags. The statements provided about this in the GEDCOM 5.5 document are not accurate.

The GEDCOM 5.4 draft introduced BLOB, CREM, CTRY, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI, RELA, RESI, RESN, SUBN, SURN and STAT. These remained in GEDCOM 5.5.
GEDCOM 5.5 additionally added the tags ADR1, ADR2, CITY, NICK, POST and SPFX as a result of the GEDCOM 5.4 draft review.

The list of tags introduced in 5.5 is not complete. The GEDCOM 5.5 specification additionally introduced RIN. The inclusion of this new tag is mentioned in one of the introduction's bullet points, yet the tag is missing from the list of new tags.

GEDCOM 5.5.1

GEDCOM 5.5.1 removed the tag BLOB and added EMAIL, FAX, FACT, FONE, ROMN, WWW, MAP, LATI and LONG.

GEDCOM 5.5.1 is the de facto GEDCOM standard. Most genealogy applications, even those that claim to support GEDCOM version 5.5, use some or all of these tags.

5.5.1 from 5.5

The section Modifications in Version 5.5.1 ends with the following two points:

The following tags were added:

EMAIL  electronic mailing address
FAX    FAX address
FACT   A fact or characteristic.
FONE   Phonetic variation of a text.
ROMN   Romanised variation of a text.
WWW    Web home page address.
MAP    Pertaining to maps.
LATI   value of a latitudinal coordinate pertaining to the place of an event
LONG   value of a longitudinal coordinate pertaining to the place of an event.

The following tag was removed:

BLOB 

Notice that the list of tags added in 5.5.1 is not in alphabetical order.

GEDCOM 5.6

GEDCOM 5.6 was never officially released. It became public early in 2011, more than a decade after its creation. It is historically interesting because it introduced the GEDXML format.
The article GEDCOM 5.6 examined the changes introduced in GEDCOM 5.6 and concluded that although GEDCOM readers should be updated to support GEDCOM 5.6, the recommended output default remains GEDCOM 5.5.1.

5.6 from 5.5

The section Modifications in Version 5.6 ends with the following two points:

The following tags were added:

CLNDR  Calendar type
EMAIL  Electronic mailing address
FAX    FAX address
FACT   A fact or characteristic.
FONE   Phonetic variation of a text.
MAP    Pertaining to maps.
LATI   value of a latitudinal coordinate pertaining to the place of an event
LONG   value of a longitudinal coordinate pertaining to the place of an event.
ROMN   Romanised variation of a text.
URL    Web page address.
WAC    LDS Temple ordinance event.

The following tag was removed:

BLOB   decision not to allow imbedded multimedia objects.
LEGA   not used in any valid substructures.

Notice that most of the tags that the GEDCOM 5.6 specification lists as added in version 5.6 were actually added in version 5.5.1.

The GEDCOM 5.6 specification additionally added ORDL, a tag first mentioned in the GEDCOM 5.4 specification.

GEDCOM 5.6 contains a small table that highlights the differences between GEDCOM versions 5.5, 5.5.1 and 5.6. The only truly new tags are CLNDR and WAC; URL is not really a new tag, but a new name for the WWW tag, which was introduced in GEDCOM 5.5.1.
The BLOB and LEGA tags were dropped in GEDCOM 5.5.1 already. The EMAIL, FACT, FAX, FONE, LATI, LONG, MAP and ROMN tags were all introduced in GEDCOM 5.5.1.

Overview Table

5.0  5.3  5.4  5.5  5.5.1  5.6  GEDCOM tag  brief description
N    N    Y    Y    Y      Y    ABBR        Abbreviation.
Y    Y    Y    Y    Y      Y    ADDR        Postal address.
Y    Y    Y    Y    Y      Y    ADOP        Adoption.
N    N    N    Y    Y      Y    ADR1        Address line 1.
N    N    N    Y    Y      Y    ADR2        Address line 2.
N    N    N    N    Y      Y    ADR3        Address line 3.
Y    Y    Y    Y    Y      Y    AGE         Age at time of event.
Y    Y    Y    Y    Y      Y    AGNC        Government agency.
Y    Y    Y    Y    Y      Y    ALIA        alias link to another record.
Y    Y    Y    Y    Y      Y    ANCI        Ancestral interest.
Y    Y    Y    Y    Y      Y    ANUL        Annulment.
Y    Y    N    N    N      N    ARVL        Arrival.
Y    Y    Y    Y    Y      Y    ASSO        Associates.
Y    Y    N    N    N      N    AUDIO       Audio.
Y    Y    Y    Y    Y      Y    AUTH        Author.
Y    Y    Y    Y    Y      Y    BAPM        Baptism.
Y    Y    Y    Y    Y      Y    BARM        Bar Mitzvah (Jewish boy).
Y    Y    Y    Y    Y      Y    BASM        Bas Mitzvah (Jewish girl).
Y    Y    Y    Y    Y      Y    BIRT        Birth.
Y    Y    Y    Y    Y      Y    BLES        Blessing.
N    N    Y    Y    N      N    BLOB        Binary Large OBject.
Y    Y    N    N    N      N    BROT        Brother.
Y    Y    Y    Y    Y      Y    BURI        Burial.
Y    Y    N    N    N      N    BUYR        Buyer.
Y    Y    Y    Y    Y      Y    CALN        Call number within repository.
Y    Y    Y    Y    Y      Y    CAST        Caste.
Y    Y    Y    Y    Y      Y    CAUS        Cause.
Y    Y    N    N    N      N    CEME        Cemetery.
Y    Y    Y    Y    Y      Y    CENS        Census.
Y    Y    Y    Y    Y      Y    CHAN        Change.
Y    Y    Y    Y    Y      Y    CHAR        Character set or encoding.
Y    Y    Y    Y    Y      Y    CHIL        Child.
Y    Y    Y    Y    Y      Y    CHR         Christening.
Y    Y    Y    Y    Y      Y    CHRA        Adult christening.
N    N    N    Y    Y      Y    CITY        City.
Y    Y    N    N    N      N    CLAS        Classification.
Y    Y    N    N    N      N    CNTC        Contact person.
Y    Y    Y    Y    Y      Y    CONC        Concatenate lines.
Y    Y    Y    Y    Y      Y    CONF        Confirmation.
Y    Y    Y    Y    Y      Y    CONT        Continue on next line.
Y    Y    Y    Y    Y      Y    COPR        Copyright statement.
Y    Y    Y    Y    Y      Y    CORP        Corporation.
Y    N    N    N    N      N    CPLR        Compiler (person, not program).
N    N    Y    Y    Y      Y    CREM        Cremation.
N    N    Y    Y    Y      Y    CTRY        Country.
Y    Y    Y    Y    Y      Y    DATA        Data.
Y    Y    Y    Y    Y      Y    DATE        Date.
Y    Y    Y    Y    Y      Y    DEAT        Death.
Y    Y    Y    Y    Y      Y    DESI        Descendant Interest.
Y    Y    Y    Y    Y      Y    DEST        Destination system.
Y    Y    Y    Y    Y      Y    DIV         Divorce.
Y    Y    Y    Y    Y      Y    DIVF        Divorce filed.
Y    Y    N    N    N      N    DPRT        Departure.
Y    Y    Y    Y    Y      Y    DSCR        Physical description.
Y    Y    N    N    N      N    EDTR        Editor (person).
Y    Y    Y    Y    Y      Y    EDUC        Education.
N    N    N    N    Y      Y    EMAIL       email address.
Y    Y    Y    Y    Y      Y    EMIG        Emigration.
N    N    Y    N    N      N    END         END of file.
Y    Y    Y    Y    Y      Y    ENGA        Engagement.
N    N    Y    N    N      N    EOBJ        End of OBJect.
Y    Y    Y    Y    Y      Y    EVEN        Event.
N    N    N    N    Y      Y    FACT        Fact.
Y    Y    Y    Y    Y      Y    FAM         "Family".
Y    Y    Y    Y    Y      Y    FAMC        Child within "family".
Y    Y    Y    Y    Y      Y    FAMS        Spouse within "family".
Y    Y    Y    Y    Y      Y    FATH        Father in "family".
N    N    N    N    Y      Y    FAX         fax (phone) number.
N    N    Y    Y    Y      Y    FCOM        First communion.
Y    Y    N    N    N      N    FIDE        Fidelity (of a record).
Y    Y    Y    Y    Y      Y    FILE        Filename.
Y    Y    N    N    N      N    FILM        Film number.
N    N    N    N    Y      Y    FONE        Phonetic spelling.
Y    Y    Y    Y    Y      Y    FORM        Format.
Y    Y    Y    Y    Y      Y    GEDC        GEDCOM details.
N    N    Y    Y    Y      Y    GIVN        Given name.
Y    Y    N    N    N      N    GODP        Godparent.
Y    Y    Y    Y    Y      Y    GRAD        Graduation.
Y    Y    N    N    N      N    HDOH        Head of household.
Y    Y    Y    Y    Y      Y    HEAD        GEDCOM header.
Y    Y    N    N    N      N    HEIR        Heir.
Y    Y    N    N    N      N    HFAT        Husband's father.
Y    Y    N    N    N      N    HMOT        Husband's mother.
Y    Y    Y    Y    Y      Y    HUSB        Husband in "family".
Y    Y    Y    Y    Y      Y    IDNO        Identity number.
Y    N    N    N    N      N    IMAGE       Image.
Y    Y    Y    Y    Y      Y    IMMI        Immigration.
Y    Y    Y    Y    Y      Y    INDI        Individual record.
Y    Y    N    N    N      N    INDX        Indexed.
Y    N    N    N    N      N    INFT        Informant.
Y    Y    N    N    N      N    INTV        Interviewer.
Y    Y    N    N    N      N    ISSU        Issue (periodical).
Y    Y    N    N    N      N    ITEM        Item.
Y    Y    Y    Y    Y      Y    LANG        Language.
N    N    N    N    Y      Y    LATI        Latitude.
Y    Y    N    N    N      N    LCCN        Library of Congress Call Number.
Y    Y    N    N    N      N    LEGA        Legatee (LGTE).
N    N    N    N    Y      Y    LONG        Longitude.
N    N    N    N    Y      Y    MAP         Map coordinates.
Y    Y    Y    Y    Y      Y    MARB        Marriage Bann (announcement).
Y    Y    Y    Y    Y      Y    MARC        Marriage Contract.
Y    Y    Y    Y    Y      Y    MARL        Marriage License.
Y    Y    Y    Y    Y      Y    MARR        Marriage.
Y    Y    Y    Y    Y      Y    MARS        Marriage settlement.
Y    Y    Y    Y    Y      Y    MEDI        Media.
Y    Y    N    N    N      N    MBR         Member.
Y    Y    Y    Y    Y      Y    MOTH        Mother in "family".
N    Y    N    N    N      N    MSTAT       Marriage Status.
Y    Y    Y    Y    Y      Y    NAME        Name.
Y    Y    N    N    N      N    NAMR        Religious name.
Y    Y    N    N    N      N    NAMS        Name sake (godparent).
Y    Y    Y    Y    Y      Y    NATI        Nationality.
Y    Y    Y    Y    Y      Y    NATU        Naturalisation.
Y    Y    Y    Y    Y      Y    NCHI        Number of children.
N    N    N    Y    Y      Y    NICK        Nickname.
Y    Y    Y    Y    Y      Y    NMR         Number of "marriages".
Y    Y    Y    Y    Y      Y    NOTE        Note.
N    N    Y    Y    Y      Y    NPFX        Name prefix.
N    N    Y    Y    Y      Y    NSFX        Name suffix.
N    N    Y    Y    Y      Y    OBJE        Object.
Y    Y    Y    Y    Y      Y    OCCU        Occupation.
Y    Y    N    N    N      N    OFFI        Officiator.
Y    Y    Y    Y    Y      Y    ORDN        Ordination.
Y    Y    N    N    N      N    ORIG        Origination.
Y    Y    N    N    N      N    OWNR        Owner of property.
Y    Y    Y    Y    Y      Y    PAGE        Page number.
N    N    Y    Y    Y      Y    PEDI        Pedigree.
Y    Y    N    N    N      N    PERI        Period in time.
Y    Y    Y    Y    Y      Y    PHON        Phone number.
Y    Y    N    N    N      N    PHOTO       Photograph.
Y    Y    N    N    N      N    PHUS        Previous husband.
Y    Y    Y    Y    Y      Y    PLAC        Place.
Y    Y    N    N    N      N    PORT        Port.
N    N    N    Y    Y      Y    POST        Postal code.
Y    Y    Y    Y    Y      Y    PROB        Probate.
Y    Y    Y    Y    Y      Y    PROP        Property.
Y    Y    Y    Y    Y      Y    PUBL        Publication.
Y    Y    N    N    N      N    PUBR        Publisher.
Y    Y    N    N    N      N    PWIF        Previous wife.
Y    Y    Y    Y    Y      Y    QUAY        Quality of data.
Y    Y    N    N    N      N    RECO        Recorder.
Y    Y    Y    Y    Y      Y    REFN        Reference number.
Y    Y    N    N    N      N    REFS        Referenced source.
N    N    Y    Y    Y      Y    RELA        Relationship.
Y    Y    Y    Y    Y      Y    RELI        Religion.
Y    Y    Y    Y    Y      Y    REPO        Repository.
N    N    Y    Y    Y      Y    RESI        Residence.
N    N    Y    Y    Y      Y    RESN        Access restriction.
Y    Y    Y    Y    Y      Y    RETI        Retirement.
Y    Y    Y    Y    Y      Y    RFN         Record File Number (within a file).
N    N    N    Y    Y      Y    RIN         Record Identification Number.
Y    Y    Y    Y    Y      Y    ROLE        Role.
N    N    N    N    Y      Y    ROMN        Romanisation.
Y    Y    N    N    N      N    SELR        Seller.
Y    Y    N    N    N      N    SEQU        Sequence.
Y    Y    N    N    N      N    SERS        Not the series, but volume within a series.
Y    Y    Y    Y    Y      Y    SEX         Gender.
Y    Y    N    N    N      N    SIBL        Sibling.
Y    Y    N    N    N      N    SIGN        Signature.
Y    Y    N    N    N      N    SIST        Sister.
Y    Y    N    N    N      N    SITE        Site (specific building).
Y    Y    N    N    N      N    SOUND       Sound bytes.
Y    Y    Y    Y    Y      Y    SOUR        Source.
Y    Y    Y    Y    Y      Y    SPOU        Spouse.
N    N    N    Y    Y      Y    SPFX        Surname prefix.
Y    Y    Y    Y    Y      Y    SSN         Social Security Number.
N    N    N    Y    Y      Y    STAE        State.
Y    Y    N    N    N      N    STAT        Search Status.
N    N    Y    Y    Y      Y    STAT        Object Status.
Y    Y    Y    Y    Y      Y    SUBM        Submitter (creator of GEDCOM file).
N    N    Y    Y    Y      Y    SUBN        Submission.
N    N    Y    Y    Y      Y    SURN        Surname.
Y    Y    Y    Y    Y      Y    TEXT        Text.
Y    Y    Y    Y    Y      Y    TIME        Time.
Y    Y    Y    Y    Y      Y    TITL        Title.
Y    Y    Y    Y    Y      Y    TRLR        Trailer.
Y    Y    N    N    N      N    TXPY        Taxpayer.
Y    Y    Y    Y    Y      Y    TYPE        Type.
Y    Y    Y    Y    Y      Y    VERS        Version number.
Y    Y    N    N    N      N    VIDEO       Video.
Y    Y    N    N    N      N    WFAT        Wife's father.
Y    Y    Y    Y    Y      Y    WIFE        Wife.
Y    Y    Y    Y    Y      Y    WILL        Will.
Y    Y    N    N    N      N    WITN        Witness.
Y    Y    N    N    N      N    WMOT        Wife's mother.
N    N    N    N    Y      N    WWW         World wide web address (URL).
Y    N    N    N    N      N    XLTR        Translator.

GEDCOM 5.3 SCHEMA tags
N    Y    N    N    N      N    SCHEMA      Schema.
N    Y    N    N    N      N    DEFN        Definition.
N    Y    N    N    N      N    LABL        Label.
N    Y    N    N    N      N    ISA         is-a: inherits characteristics.

GEDCOM 5.6 tags
N    N    N    N    N      Y    CLNDR       CaLenDaR.
N    N    N    N    N      Y    URL         URL.

LDS tags
N    N    Y    Y    Y      Y    ANCE        Ancestors.
Y    Y    Y    Y    Y      Y    AFN         Ancestral File Number.
Y    Y    Y    Y    Y      Y    BAPL        LDS baptism.
Y    Y    Y    Y    Y      Y    CONL        LDS confirmation.
N    N    Y    Y    Y      Y    DESC        Descendants.
Y    Y    Y    Y    Y      Y    ENDL        LDS Endowment.
N    N    N    Y    Y      Y    FAMF        Family File.
N    N    Y    Y    Y      Y    ORDI        Ordinance.
N    N    N    N    N      Y    ORDL        LDS Ordination.
Y    Y    Y    Y    Y      Y    SLGC        Sealing of child.
Y    Y    Y    Y    Y      Y    SLGS        Sealing of spouse.
Y    Y    Y    Y    Y      Y    TEMP        LDS Temple code.
Y    Y    N    N    N      Y    WAC         Washing And Clothing.
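
A table like this is easy to turn into a lookup. The sketch below, in Python, encodes a handful of rows copied from the overview table as six Y/N flags per tag, in the column order 5.0, 5.3, 5.4, 5.5, 5.5.1, 5.6. The names VERSIONS, FLAGS, versions_supporting and is_supported are hypothetical, chosen for this illustration. STAT is deliberately left out, because it has two rows with different meanings and would need a richer key than the tag alone.

```python
# Column order of the overview table.
VERSIONS = ("5.0", "5.3", "5.4", "5.5", "5.5.1", "5.6")

# A few rows copied from the overview table: tag -> six Y/N flags.
# This is a sample, not the full table.
FLAGS = {
    "ABBR": "NNYYYY",
    "BLOB": "NNYYNN",
    "EMAIL": "NNNNYY",
    "MSTAT": "NYNNNN",
    "WWW": "NNNNYN",
    "URL": "NNNNNY",
}


def versions_supporting(tag):
    """Return the GEDCOM versions in which the table marks the tag as present."""
    flags = FLAGS[tag]
    return [version for version, flag in zip(VERSIONS, flags) if flag == "Y"]


def is_supported(tag, version):
    """Return True if the table marks the tag as present in that version."""
    return version in versions_supporting(tag)


print(versions_supporting("BLOB"))   # ['5.4', '5.5']
print(is_supported("WWW", "5.5.1"))  # True
print(is_supported("WWW", "5.6"))    # False
```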
        

dealing with misspelled tags

GEDCOM 5.3 introduced the MSTAT tag, yet newer specifications erroneously state it introduced the MSTA tag. GEDCOM 5.3 also introduced the ISSU tag, yet the Appendix lists it as the ISSUE tag instead. GEDCOM 5.5.1 introduced the EMAIL tag, but some products use EMAI instead.

The GEDCOM specification takes precedence over implementations. The specification that actually introduced the tag takes precedence over mentions in other specifications. The actual specification takes precedence over the introduction and the Appendix.
The actual tag names are MSTAT, ISSU and EMAIL; MSTA, ISSUE and EMAI aren't GEDCOM tags, but misspellings of these three GEDCOM tags.

A GEDCOM reader should recognise GEDCOM tags and produce errors for anything else. For these three misspellings, the GEDCOM reader should produce a non-fatal error to let the user know the input is erroneous, and then continue parsing the GEDCOM file as if the correct tag was used.
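
The recommended behaviour can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular GEDCOM reader; the names MISSPELLINGS and resolve_tag, and the idea of collecting messages in a list, are assumptions made for this example. The set of known tags would come from the specification version being read.

```python
# The three misspellings discussed above, mapped to the actual GEDCOM tags.
MISSPELLINGS = {"MSTA": "MSTAT", "ISSUE": "ISSU", "EMAI": "EMAIL"}


def resolve_tag(tag, known_tags, messages):
    """Return the tag to parse as, or None if the tag is not recognised.

    known_tags: the set of tags valid for the GEDCOM version being read.
    messages:   a list that collects error messages for the user.
    """
    if tag in known_tags:
        return tag
    corrected = MISSPELLINGS.get(tag)
    if corrected is not None:
        # Non-fatal: report the problem, then continue as if the
        # correct tag had been used.
        messages.append(f"non-fatal error: {tag} is a misspelling of {corrected}")
        return corrected
    messages.append(f"error: {tag} is not a GEDCOM tag")
    return None


messages = []
known = {"EMAIL", "NAME"}  # tiny sample set, for illustration only
print(resolve_tag("NAME", known, messages))  # NAME
print(resolve_tag("EMAI", known, messages))  # EMAIL
print(resolve_tag("XYZ", known, messages))   # None
print(len(messages))                         # 2
```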

sources

The text and table were created from my personal notes, which reference the GEDCOM specifications mentioned. The relatively recent documents, GEDCOM 5.5 and GEDCOM 5.5.1, are easy to find; older ones are hard to come by.
Several genealogy software authors contributed corrections to the original table. Please report any additions and corrections you may have.

updates

2011-04-08 instant update

David A. Knight, author of Gedcom.NET and the GedView application for iOS, spotted that all the check marks and crosses in the FACT row were switched around.

2011-05-18 & 2011-05-22: corrections

Louis Kessler, author of Behold, reported corrections for ADR3, EMAIL, INDX, MSTAT, NAMR, RIN and WAC. He also noted a few tags that were missing, out of order or misspelled.

2011-07-01 GEDCOM 5.3 SCHEMA

Added a brief paragraph about the GEDCOM 5.3 SCHEMA tags.
