Modern Software Experience

2012-05-31

Structured GEDCOM addresses

address structure

Chapter 2 of the GEDCOM 5.5.1 specification has a section titled Substructures of the Lineage-Linked Form that lists the syntax of various non-top-level records in alphabetical order. The first record listed is the ADDRESS_STRUCTURE.

ADDRESS_STRUCTURE:=
    n ADDR <ADDRESS_LINE> {1:1}  p.41
        +1 CONT <ADDRESS_LINE> {0:3}  p.41
        +1 ADR1 <ADDRESS_LINE1> {0:1}  p.41
        +1 ADR2 <ADDRESS_LINE2> {0:1}  p.41
        +1 ADR3 <ADDRESS_LINE3> {0:1}  p.41
        +1 CITY <ADDRESS_CITY> {0:1}  p.41
        +1 STAE <ADDRESS_STATE> {0:1}  p.42
        +1 POST <ADDRESS_POSTAL_CODE> {0:1}  p.41
        +1 CTRY <ADDRESS_COUNTRY> {0:1}  p.41
    n PHON <PHONE_NUMBER> {0:3}  p.57
    n EMAIL <ADDRESS_EMAIL> {0:3}  p.41
    n FAX <ADDRESS_FAX> {0:3}  p.41
    n WWW <ADDRESS_WEB_PAGE> {0:3}  p.42 

The address structure should be formed as it would appear on a mailing label using the ADDR and the CONT lines to form the address structure. The ADDR and CONT lines are required for any address. The additional subordinate address tags such as STAE and CTRY are provided to be used by systems that have structured their addresses for indexing and sorting. For backward compatibility these lines are not to be used in lieu of the required ADDR.and CONT line structure.

There is more than one thing wrong with that quote, and I'm not even counting the full stop in the last sentence that should have been a space.

GEDCOM 5.5

The GEDCOM 5.6 draft contains the same text as the GEDCOM 5.5.1 specification, down to the erroneous full stop, but the GEDCOM 5.5 specification is slightly different, and not just because several tags were only introduced in GEDCOM 5.5.1.
The paragraph of text accompanying the ADDRESS_STRUCTURE syntax is different too.
By the way, in the GEDCOM 5.5 specification, the ADDR record is an optional part of the ADDRESS_STRUCTURE, while in the GEDCOM 5.5.1 specification, the ADDR record is mandatory. The GEDCOM 5.5 specification allows an address consisting of just a PHON record; the GEDCOM 5.5.1 specification does not.

ADDRESS_STRUCTURE:=
    n ADDR <ADDRESS_LINE> {0:1}  p.37
        +1 CONT <ADDRESS_LINE> {0:M}  p.37
        +1 ADR1 <ADDRESS_LINE1> {0:1}  p.37
        +1 ADR2 <ADDRESS_LINE2> {0:1}  p.37
        +1 CITY <ADDRESS_CITY> {0:1}  p.37
        +1 STAE <ADDRESS_STATE> {0:1}  p.37
        +1 POST <ADDRESS_POSTAL_CODE> {0:1}  p.37
        +1 CTRY <ADDRESS_COUNTRY> {0:1}  p.50
    n PHON <PHONE_NUMBER> {0:3}  p.57

The address structure should be formed as it would appear on a mailing label using the ADDR and ADDR.CONT lines. These lines are required if an ADDRess is present. Optionally, additional structure is provided for systems that have structured their addresses for indexing and sorting.

CONT

One issue that is not limited to the ADDRESS_STRUCTURE is that the CONT tag is shown at all. The CONC and CONT tags are not part of this GEDCOM form; they are part of the basic GEDCOM grammar. As such, they can be used with any GEDCOM tag that has a line value.
The CONC and CONT tags should not be shown in the syntax for any record in any GEDCOM form, yet FamilySearch's lineage-linked GEDCOM form makes that fundamental mistake over and over again, and that has caused misinterpretation and confusion.

The remark that the CONT lines are required is even more wrong. The underlining of the word required does not make it true. It is a fundamental fact of the GEDCOM grammar that use of the CONC and CONT tags to create long line values and insert line breaks is always allowed, never required.

It is true that, to turn the free-form ADDR line value into a multi-line address complete with line breaks, you need to use the CONT tag to create those line breaks. Still, needing the CONT tag to create the line breaks you want simply isn't the same thing as the CONT tag being mandatory. If all you have for an address is the name of a castle or a farm, then a single line will do.
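The grammar rule can be illustrated with a minimal Python sketch. It assumes the lines are already split and carry no cross-reference identifiers; the helper names split_line and line_value are made up for this illustration and are not part of any GEDCOM library.

```python
def split_line(line):
    """Split a simple GEDCOM line into (level, tag, value).

    Sketch only: lines with a cross-reference id (such as '0 @SUB1@ SUBM')
    are not handled.
    """
    parts = line.rstrip("\r\n").lstrip().split(" ", 2)
    level = int(parts[0])
    tag = parts[1]
    value = parts[2] if len(parts) > 2 else ""
    return level, tag, value


def line_value(lines, start):
    """Return the complete line value of the line at index start.

    Directly subordinate CONT lines add a line break plus their value,
    CONC lines are concatenated without a break. Both are optional.
    """
    level, _tag, value = split_line(lines[start])
    for line in lines[start + 1:]:
        sub_level, sub_tag, sub_value = split_line(line)
        if sub_level <= level:
            break  # the next record at the same or a higher level starts here
        if sub_level == level + 1:
            if sub_tag == "CONT":
                value += "\n" + sub_value
            elif sub_tag == "CONC":
                value += sub_value
    return value


submitter = [
    "1 ADDR Address Line 1",
    "2 CONT Address Line 2",
    "2 CONT Address Line 3",
    "2 CTRY Country",
    "1 PHON Phone",
]
print(line_value(submitter, 0))
print(line_value(["1 ADDR Some Castle"], 0))
```

The second call shows the point made above: an ADDR line without any CONT line is still a complete line value.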

What's really stunning about FamilySearch showing the CONT tag as if it were part of the GEDCOM form syntax, and the CONT lines are required remark, is that these two absurdities are at odds with each other.
The GEDCOM 5.5.1 syntax explicitly states that CONT <ADDRESS_LINE> may occur zero times: {0:3}. So the syntax clearly states that CONT <ADDRESS_LINE> is optional and the accompanying text claims that it is mandatory. FamilySearch's lineage-linked form specification contradicts itself.

ADR1, ADR2, ADR3

The <ADDRESS_STRUCTURE> used to be little more than the ADDR tag, a free-form address field, for which the CONT tag could be used to insert line breaks. The structured address format was introduced later.
The ADR1, ADR2 and CITY tags first appeared in GEDCOM 5.4. The STAE, POST and PHON tags first appeared in GEDCOM 5.5. The GEDCOM 5.5.1 spec added not only the ADR3 tag, but also the EMAIL, FAX and WWW tags.

backward compatibility

The introduction of these extra tags created an explicit address structure. Having individual tags for the various address parts has advantages, but at the time, the introduction of this explicit structure created a backward compatibility issue: applications that did not yet support the new GEDCOM version did not understand the new tags. So, if an older application tried to read a GEDCOM file using the new tags, most of the address would be lost.

What the GEDCOM 5.5.1 specification seems to be trying to say is not that the CONT tag itself is mandatory, but that providing a backward-compatible address is mandatory. The last sentence clearly states that, for backward compatibility, (some of the) newer tags are not to be used in lieu of the required ADDR.and CONT line structure.

That remark does not mean that the GEDCOM specification has a bunch of handy new tags, but that you should not use them. That would not make sense. The new tags were introduced to be used, not to be forbidden.
It is saying that you are not allowed to use only the new structured format; if you do use the new structured format, you should still provide the address in the older free-form format as well. You should provide each address twice, once using the new tags, and once using the ADDR and CONT tags.

ADDR line value

A structured GEDCOM address does not use the ADDR line value, but the ADDR subtags instead. The ADDR line value is used for free-form addresses only.
What would be the first line of the free-form ADDR line value is coded using the ADR1 tag, and what would be the second line (line value of the first CONT tag) is coded using the ADR2 tag.
That is not immediately clear from the GEDCOM 5.5 and GEDCOM 5.5.1 sections already cited, but it is clearly stated in the Primitive Elements of the Lineage-Linked Form section:

ADDRESS_LINE1:= {Size=1:60}
The first line of the address used for indexing. This is the value of the line corresponding to the ADDR tag line in the address structure.

ADDRESS_LINE2:= {Size=1:60}
The second line of the address used for indexing. This is the value of the first CONT line subordinate to the ADDR tag in the address structure.

ADDRESS_LINE3:= {Size=1:60}
The third line of the address used for indexing. This is the value of the second CONT line subordinate to the ADDR tag in the address structure.
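Taken together, these definitions suggest how a writer could emit one address in both forms. The following Python function is a sketch under stated assumptions, not something the specification provides: the function name address_record is hypothetical, the street lines are assumed to be non-empty strings, and the "City, State ZIPcode" layout of the mailing-label line is a choice made for this illustration.

```python
def address_record(level, street_lines, city="", state="", post="", country=""):
    """Return GEDCOM lines for one address in free-form and structured form.

    street_lines: list of non-empty street-level lines; the first three
    become ADR1, ADR2 and ADR3.
    """
    # Mailing-label text: street lines, then a locality line, then the country.
    # The "City, State ZIPcode" layout is an assumption of this sketch.
    region = " ".join(part for part in (state, post) if part)
    locality = ", ".join(part for part in (city, region) if part)
    label = [text for text in (*street_lines, locality, country) if text]

    out = []
    # The first label line is the ADDR line value; the rest become CONT lines.
    first = label[0] if label else ""
    out.append(f"{level} ADDR {first}".rstrip())
    for text in label[1:]:
        out.append(f"{level + 1} CONT {text}")

    # Structured form: ADR1..ADR3 repeat the first three free-form lines.
    for number, text in enumerate(street_lines[:3], start=1):
        out.append(f"{level + 1} ADR{number} {text}")
    for tag, text in (("CITY", city), ("STAE", state), ("POST", post), ("CTRY", country)):
        if text:
            out.append(f"{level + 1} {tag} {text}")
    return out


for line in address_record(1, ["Address line 1", "Address line 2"],
                           city="City", state="State", post="ZIPcode",
                           country="Country"):
    print(line)
```

Because the street lines come first in the label, ADR1 always equals the ADDR line value and ADR2 the first CONT line, as the definitions require. Note that the sketch does not enforce the {0:3} limit on CONT lines from the GEDCOM 5.5.1 syntax.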

example

It would have been helpful if the GEDCOM specification contained an example of a properly coded address. Alas, although the ADR1 and ADR2 tags were introduced in GEDCOM 5.4, neither the GEDCOM 5.5 nor the GEDCOM 5.5.1 specification contains an example of an address using these tags. The only address examples in both the GEDCOM 5.5 and the GEDCOM 5.5.1 specification are free-form addresses using ADDR and CONT.

PAF Preparer address

PAF Submitter


0 HEAD
1 SOUR PAF
...
1 SUBM @SUB1@
0 @SUB1@ SUBM
1 NAME FirstName LastName
1 ADDR Address Line 1
2 CONT Address Line 2
2 CONT Address Line 3
2 CONT Address Line 4
2 CTRY Country
1 PHON Phone
1 EMAIL email@address.com

FamilySearch's PAF is no help either. It is easy enough to create a database and enter an address; just enter a submitter's address. While the GEDCOM specification calls this a submitter's address, PAF calls this a preparer's address. I entered an address as shown in PAF 5.2.18.0, and then exported to a GEDCOM file.

Notice that PAF did not use the ADR1 and ADR2 tags, but the free-form ADDR line value. That is not because PAF 5.2.18.0 does not know about the new tags. PAF sticks with the ADDR line value, but also takes advantage of the EMAIL tag.

PAF versus GEDCOM

FamilySearch PAF and FamilySearch GEDCOM mismatch in a few other remarkable ways. The Preparer tab of PAF's Preferences dialog box has four address lines, yet GEDCOM 5.5.1 only added ADR3; it did not add ADR4. That may explain why PAF's submitter address is still in the old format, but it makes you wonder why they didn't add ADR4.
While PAF asks you to enter a so-called Ancestral File Number (AFN), a FamilySearch-specific identifier, PAF does not export that value to the GEDCOM.

PAF Contact address

PAF Contact

PAF will use the ADR1 and ADR2 tags, just not for the preparer's address. When you add a Contact address for an individual, PAF does use the ADR1 and ADR2 tags. Notice that PAF uses the non-standard but legal _NAME tag to capture the contact name. PAF does so even when the contact name is identical to the individual's name.


0 @I1@ INDI
1 NAME GivenName /FullName/
2 SURN FullName
2 GIVN GivenName
...
1 ADDR
2 _NAME Contact Name
2 ADR1 Address line 1
2 ADR2 Address line 2
2 CITY City
2 STAE State
2 POST ZIPcode
2 CTRY Country
1 PHON phone
1 EMAIL email@address.com
1 URL www.homepage.org

PAF versus GEDCOM

Notice that the ADDR tag does not have a line value.
PAF exports the Contact address using the new tags, and does not export the address the old-fashioned way. The FamilySearch GEDCOM specification says that applications are not allowed to merely use the new tags, but must also support the free-form address for backward compatibility. FamilySearch PAF does not comply with FamilySearch's GEDCOM specification.

Header and Submitter

PAF uses free-form for both the submitter record and the corporation listed in the GEDCOM header.
The GEDCOM specification does not provide any reason for PAF to use free-form addresses for HEAD.CORP and SUBM. The GEDCOM specification does not make any exception for HEAD.CORP and SUBM. On the contrary, in both cases, the GEDCOM specification simply specifies that the ADDRESS_STRUCTURE should be used.

HEADER:=
n HEAD {1:1}
...
+2 CORP <NAME_OF_BUSINESS> {0:1}  p.54
+3 <ADDRESS_STRUCTURE> {0:1}  p.31
...
+1 SUBM @<XREF:SUBM>@ {1:1}  p.28
...

SUBMITTER_RECORD:=
n @<XREF:SUBM>@ SUBM {1:1}
+1 NAME <SUBMITTER_NAME> {1:1}  p.63
+1 <ADDRESS_STRUCTURE>* {0:1}  p.31
...

* Note: submissions to the ancestral file require the name and address of the submitter

GEDCOM addresses

There is just one ADDRESS_STRUCTURE in the GEDCOM specification. Everywhere an address is used, the specification refers to this address structure. It does not make any exceptions.
The GEDCOM 5.5.1 specification supports both free-form and structured addresses. It explicitly states that, for backward compatibility, the structured addresses should not be used in lieu of free-form addresses. That is another way of saying that, if you use structured addresses, you should still provide the free-form addresses.

PAF addresses

PAF uses structured addresses for contacts, but still uses free-form addresses for both the submitter and the corporation listed in the GEDCOM header, and does not provide both formats at the same time.
PAF is neither consistent nor in compliance with the GEDCOM specification.
So, in this case, PAF's behaviour does not seem a good example to follow.

double addresses

The FamilySearch GEDCOM 5.5.1 specification says addresses should be provided twice, but FamilySearch PAF 5.2.18 doesn't do it. Makes you wonder whether there are applications that do provide addresses twice. FormalSoft's Family Origins 5.0 is one application that does so.


0 HEAD
1 SOUR RootsMagic
2 NAME RootsMagic
2 VERS 5.0
2 CORP RootsMagic, Inc.
3 ADDR PO Box 495
4 CONT Springville, UT 84663
4 CONT USA
3 PHON 1-800-ROOTSMAGIC
3 WWW www.RootsMagic.com
1 DEST RootsMagic
1 DATE 31 MAY 2012
1 SUBM @SUB1@
1 FILE RMAddress.ged
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
0 @SUB1@ SUBM
1 NAME FirstName LastName
1 ADDR Address Line 1
2 CONT Address Line 2
2 CONT Address Line 3
1 PHON phone
1 _EMAIL email@address.com
0 @I1@ INDI
1 NAME GivenName /FullName/
2 GIVN GivenName
2 SURN FullName
...
1 ADDR Address line 1
2 CONT Address line 2
2 CONT City, State ZIPcode
2 CONT Country
2 _NAME GivenName FullName
2 ADR1 Address line 1
2 ADR2 Address line 2
2 CITY City
2 STAE State
2 POST ZIPcode
2 CTRY Country
1 PHON phone
1 EMAIL email@address.com
1 WWW www.homepage.org
1 FAX fax
...
0 TRLR

Family Origins 5.0

FormalSoft Family Origins 5.0 was released in September of 1996. That is after the release of GEDCOM 5.5, which supports the structured addresses, but before the release of GEDCOM 5.5.1, which demands that free-form addresses continue to be supported. Family Origins 5.0 was doing the right thing, ensuring that its GEDCOM files remained backward compatible, even before the GEDCOM specification demanded it.

RootsMagic 5.0

Through several acquisitions, the marketing rights to Family Origins ended up with Genealogy.com, and in 2003, Genealogy.com discontinued Family Origins in favour of its own Family Tree Maker product. The developers of Family Origins created a new genealogy application, called RootsMagic. The current version of RootsMagic is RootsMagic version 5.0. RootsMagic 5.0 exports to UTF-8 GEDCOM 5.5.1 and does export addresses in both formats.

That sounds good, but not all addresses are provided in both formats.
PAF 5.2.18.0 supports the structured format, but provides the addresses for HEAD.CORP and SUBM in the free-form format. RootsMagic provides the submitter address in both formats, but its own corporate address in the old free-form format only.

An oddity is the use of the non-standard _EMAIL tag in the submitter address. RootsMagic exports to GEDCOM 5.5.1, so it can use the EMAIL tag, and it does so for contact addresses, yet the submitter address uses the _EMAIL tag. Not using the standard EMAIL tag but the proprietary _EMAIL tag instead is wrong.

GEDitCOM


0 HEAD
1 SOUR GEDitCOM
2 NAME GEDitCOM
2 VERS 2.9.4
2 CORP RSAC Software
3 ADDR 7108 South Pine Cone Street
4 CONT Salt Lake City, UT 84121
4 CONT USA
4 ADR1 RSAC Software
4 ADR2 7108 South Pine Cone Street
4 CITY Salt Lake City
4 STAE UT
4 POST 84121
4 CTRY USA
...

Family Origins and RootsMagic are not the only applications that export addresses twice. RSAC Software GEDitCOM 2.9.4 does so, and it does so for all addresses, including its own corporate address.
However, GEDitCOM 2.9.4 doesn't do it right; it repeats the company name in the company's address. That is not according to the specification, which clearly states that the line value of ADR1 should equal the ADDR line value up to the first CONT.
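The primitive element definitions quoted earlier tie ADR1 to the ADDR line value, and ADR2 and ADR3 to the first and second CONT lines. A validator could check that along the following lines. This is a hedged Python sketch: the names split_line and check_address are invented for the illustration, and the input is assumed to be just the ADDR line followed by its subordinate lines.

```python
def split_line(line):
    """Split a simple GEDCOM line (no cross-reference id) into (level, tag, value)."""
    parts = line.rstrip("\r\n").lstrip().split(" ", 2)
    return int(parts[0]), parts[1], parts[2] if len(parts) > 2 else ""


def check_address(lines):
    """Compare ADR1..ADR3 with the free-form ADDR and CONT lines.

    lines: the ADDR line followed by its subordinate lines.
    Returns a list of problem descriptions; an empty list means no mismatch.
    """
    level, _tag, first = split_line(lines[0])
    free_form = [first]   # ADDR line value, then one entry per CONT line
    structured = {}       # 1, 2 or 3 mapped to the ADRn line value
    for line in lines[1:]:
        sub_level, sub_tag, value = split_line(line)
        if sub_level != level + 1:
            continue
        if sub_tag == "CONT":
            free_form.append(value)
        elif sub_tag in ("ADR1", "ADR2", "ADR3"):
            structured[int(sub_tag[3])] = value

    problems = []
    for number, value in sorted(structured.items()):
        expected = free_form[number - 1] if number <= len(free_form) else None
        if expected != value:
            problems.append(
                f"ADR{number} is {value!r} but free-form line {number} is {expected!r}"
            )
    return problems


geditcom_corp = [
    "3 ADDR 7108 South Pine Cone Street",
    "4 CONT Salt Lake City, UT 84121",
    "4 CONT USA",
    "4 ADR1 RSAC Software",
    "4 ADR2 7108 South Pine Cone Street",
    "4 CITY Salt Lake City",
    "4 STAE UT",
    "4 POST 84121",
    "4 CTRY USA",
]
for problem in check_address(geditcom_corp):
    print(problem)
```

For the GEDitCOM header address shown above, the sketch reports both ADR1 and ADR2, because putting the company name in ADR1 shifts every following line by one.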

structured address only

Many applications export addresses to the structured format only, and do not bother to export the old free-form format as well.
That is hardly surprising. Many applications still do not export to GEDCOM 5.5.1, but only to GEDCOM 5.5, and the GEDCOM 5.5 specification does not include any remark on backward compatibility.

backward compatibility

FormalSoft Family Origins 5.0 did the right thing. When the structured address format was new, providing the old free-form format in addition to the newer structured format ensured compatibility with applications that did not support GEDCOM 5.5 yet.
FamilySearch forgot to include any remarks on backward compatibility in the GEDCOM 5.5 specification, and only added the backward compatibility requirement in the GEDCOM 5.5.1 specification, validating FormalSoft's approach.
It should not have been made a demand in the GEDCOM 5.5.1 specification. It should have been made a demand in the GEDCOM 5.5 specification and then become merely a recommendation in the GEDCOM 5.5.1 specification.

The GEDCOM 5.5 specification was released back in 1995 and the GEDCOM 5.5.1 specification was released back in 1999. Usage of pre-GEDCOM 5.5 applications has been negligible for years. Thus, the practical need to keep providing the old free-form format in addition to the structured format ceased to exist years ago. What RootsMagic 5 is doing, providing addresses in both formats, is technically superior, but hardly necessary anymore. One can even argue that it is detrimental, as it bloats GEDCOM files even more.
The GEDCOM 5.5.1 specification is the last one that FamilySearch publicly released (GEDCOM 5.6 became public much later, despite FamilySearch). FamilySearch has abandoned development of GEDCOM, but if they had not, a newer version would probably not demand the backward compatible address format anymore.

Best Practice

The GEDCOM 5.5.1 specification is the latest, but it is dated. The demand to support a pre-GEDCOM 5.5 format is out of date now.
The GEDCOM 5.5.1 specification is so dated that developers should not even treat that demand as a recommendation anymore. However, because of one particular mistake many developers made, namely continuing to use only the old format for some addresses, support for the old format cannot be dropped completely yet.
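For an application that reads GEDCOM files, one possible interpretation of this is to prefer the structured tags and fall back to the free-form lines when they are absent. The Python sketch below is only that, an interpretation: the names split_line and read_address and the shape of the returned dictionary are assumptions made for the illustration.

```python
STRUCTURED_TAGS = ("ADR1", "ADR2", "ADR3", "CITY", "STAE", "POST", "CTRY")


def split_line(line):
    """Split a simple GEDCOM line (no cross-reference id) into (level, tag, value)."""
    parts = line.rstrip("\r\n").lstrip().split(" ", 2)
    return int(parts[0]), parts[1], parts[2] if len(parts) > 2 else ""


def read_address(lines):
    """Read one ADDR record, preferring structured subtags over free-form text.

    lines: the ADDR line followed by its subordinate lines.
    """
    level, _tag, first = split_line(lines[0])
    free_form = [first] if first else []
    structured = {}
    for line in lines[1:]:
        sub_level, sub_tag, value = split_line(line)
        if sub_level != level + 1:
            continue
        if sub_tag == "CONT":
            free_form.append(value)
        elif sub_tag in STRUCTURED_TAGS:
            structured[sub_tag] = value

    # A lone CTRY next to free-form lines still counts as a free-form address.
    if any(tag in structured for tag in STRUCTURED_TAGS if tag != "CTRY"):
        return {"kind": "structured", **structured}
    return {"kind": "free-form", "lines": free_form, **structured}


paf_submitter = [
    "1 ADDR Address Line 1",
    "2 CONT Address Line 2",
    "2 CONT Address Line 3",
    "2 CONT Address Line 4",
    "2 CTRY Country",
]
paf_contact = [
    "1 ADDR",
    "2 _NAME Contact Name",
    "2 ADR1 Address line 1",
    "2 ADR2 Address line 2",
    "2 CITY City",
    "2 STAE State",
    "2 POST ZIPcode",
    "2 CTRY Country",
]
print(read_address(paf_submitter))
print(read_address(paf_contact))
```

With this approach, both PAF exports shown earlier are read without loss: the submitter address comes back as free-form lines plus a country, the contact address as structured fields.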

GEDCOM writers

GEDCOM readers

GEDCOM validators

updates

2012-05-31 instant update: GedFan & Siblings1200.ged

The GedFan util and the Siblings1200.ged file have been updated to comply with the best practice documented here.
Both use structured addresses now.

links