Modern Software Experience

2013-05-14

The original title of this article was GEDCOM ANSEL Table. It was changed to GEDCOM 5.5.1 specification ANSEL Table after the introduction of the ANSEL / Unicode Conversion Tables article.

GEDCOM 5.5.1

Appendix C

The GEDCOM 5.5.1 specification has several appendices. The ANSEL character set used by many GEDCOM files is presented in Appendix C: ANSEL Character Set.
Various interesting observations can be made about how this GEDCOM 5.5.1 ANSEL table relates to earlier versions of the GEDCOM specification, or to the ANSEL standard itself. However, the first problem you'll notice when you look up GEDCOM 5.5.1 Appendix C is that the table is not what it should be. The FamilySearch GEDCOM specification admits this shortcoming: "The graphic characters shown are not always accurate, however the name of the diacritic and the decimal equivalent should agree with the ANSEL standard." The names are indeed intact, but the examples are damaged too.

graphic

The GEDCOM 5.5.1 specification is provided as an Adobe PDF file. When you view the table in Appendix C using Adobe Reader, many characters that should be there have been replaced by • (U+2022 Bullet). Where it should show a graphic displaying slash D - uppercase, it shows • (U+2022 Bullet) instead of Đ (U+0110 Latin Capital Letter D with Stroke). Where it should show a graphic displaying cedilla, it does not show ◌̧ (U+0327 Combining Cedilla) as it should. It does not even show ¸ (U+00B8 Cedilla). Instead, it shows • (U+2022 Bullet). The example for the macron should be gājējs, but the table shows g•j•js.

Several browsers can display PDF files themselves. When you view the GEDCOM 5.5.1 PDF in these browsers, you will see characters other than a bullet, but still not the right characters.

Adobe Reader cannot find font WPMultinationalARoman

WPMultinationalARoman

When you open the PDF and scroll to the table, Adobe Reader tells you something is wrong. It displays a message box with the text "Cannot find or create the font 'WPMultinationalARoman'. Some characters may not display or print correctly."
Adobe is not to blame. This is entirely FamilySearch's fault. FamilySearch could have embedded the font in the PDF.
Another possible approach would have been to rely on images rather than on a proprietary font.

WordPerfect

The font WP MultinationalA Roman is a WordPerfect font. It and the other WordPerfect 9 fonts can be downloaded from Corel's FTP site as wpfonts.exe, a self-extracting ZIP file containing TrueType fonts. The WPCO01NA.TTF file contains WP MultinationalA Courier, the WPHV01NA.TTF file contains WP MultinationalA Helvetica, and the WPRO01NA.TTF file contains WP MultinationalA Roman. The wpfonts_readme.txt file confirms that these font files may be embedded in Adobe PDF files.

GEDCOM 5.5

The Adobe PDF for the original GEDCOM 5.5 specification has the same problem.
However, a later release of the GEDCOM 5.5 specification, which includes a few errata dated 11 December 1995, contains a table that looks fine. The GEDCOM 5.5.1 ANSEL tables can be repaired by consulting both that older FamilySearch GEDCOM 5.5 specification and the original ANSEL specification.
The corrected table is presented below.

GEDCOM 5.5.1 ANSEL tables

ANSEL examples

The ANSEL table in GEDCOM contains examples. These examples were copied from the ANSEL specification, but on more than one occasion some letters or combining characters were lost. These letters and combining characters have been restored.

spaces for readability

The name column of the GEDCOM table contains text such as ligature ae—lowercase.
These are not quite the ANSEL character names. The corresponding ANSEL character names have spaces around the dash: ligature ae — lowercase. These spaces have been restored to enhance readability.
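Restoring the spaces is a simple text substitution. Here is a minimal Python sketch. It assumes the GEDCOM 5.5.1 names use U+2014 Em Dash with no surrounding spaces. The function name spaced_name is hypothetical.

```python
import re

def spaced_name(name: str) -> str:
    """Surround the em dash (U+2014) in a character name with exactly one space on each side."""
    # \s* absorbs any spaces already present, so a name that is already
    # spaced comes back unchanged and the function is safe to apply twice.
    return re.sub(r"\s*\u2014\s*", " \u2014 ", name)

for name in ("ligature ae\u2014lowercase", "slash L\u2014uppercase", "grave accent"):
    print(spaced_name(name))
```

Names without a dash, such as grave accent, pass through untouched.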

characters

The characters in the graphics column are not images, but actual characters. These should display correctly unless you are using a browser or system that belongs in the previous millennium.
These tables aren't images, but real web tables, with their layout controlled through CSS.
You can change the size of the font for the page, and the table will resize with it.

combining characters

The table below is a further improvement on the GEDCOM 5.5.1 ANSEL table in that it displays the actual combining characters, not some non-combining character that approximates it, in the graphics column. For example, the first character shown is U+0300 Combining Grave Accent, not U+0060 Grave Accent. To show the combining characters correctly and to clearly show the relative position of the combining character, the table uses ◌ (U+25CC Dotted Circle) as a placeholder character.
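Python's standard unicodedata module can show the difference between a combining character and its spacing look-alike. The sketch below is illustrative only. The helper graphic is hypothetical, and it treats Unicode category Mn (nonspacing mark) as the test for "needs a placeholder". That test holds for every non-spacing character in the table.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # the placeholder base used in the Graphic column

def graphic(ch: str) -> str:
    """Return ch ready for display, followed by its code point and Unicode name."""
    # Non-spacing marks (category Mn) need a base character to sit on,
    # so prefix them with the dotted circle; spacing characters stand alone.
    shown = DOTTED_CIRCLE + ch if unicodedata.category(ch) == "Mn" else ch
    return f"{shown} U+{ord(ch):04X} {unicodedata.name(ch)}"

# Combining grave accent and cedilla versus their spacing look-alikes.
for ch in ("\u0300", "\u0060", "\u0327", "\u00B8"):
    print(graphic(ch))
```

The spacing forms U+0060 and U+00B8 have category Sk (modifier symbol), so they are printed without a placeholder.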

ANSEL Non-spacing graphic characters
HEX wpcode Dec Graphic Name example of use
E1 1,0 225 ◌̀ grave accent règle
E2 1,6 226 ◌́ acute accent está
E3 1,3 227 ◌̂ circumflex accent même
E4 1,2 228 ◌̃ tilde niño
E5 1,8 229 ◌̄ macron gājējs
E6 1,22 230 ◌̆ breve altă
E7 1,15 231 ◌̇ dot above żaba
E8 1,7 232 ◌̈ umlaut (diaeresis) öppna
E9 1,19 233 ◌̌ hacek vždy
EA 1,14 234 ◌̊ circle above (angstrom) hår
ED 1,10 237 ◌̕ high comma, off center rozdel̕ovac
EE 1,16 238 ◌̋ double acute accent időszaki
F0 1,17 240 ◌̧ cedilla ça
F1 1,18 241 ◌̨ right hook, ogonek vietą
F6 2,7 246 ◌̲ underscore s̲amar
FE 1,9 254 ◌̓ high comma, centered ge̓otermika
ANSEL Spacing graphic characters
C/R wpcode Dec Graphic Name example of use
A1 1,152 161 Ł slash L — uppercase Łódź
A2 1,80 162 Ø slash O — uppercase Øst
A3 1,78 163 Đ slash D — uppercase Đuro
A4 1,88 164 Þ thorn — uppercase Þann
A5 1,36 165 Æ ligature AE — uppercase Ægir
A6 1,166 166 Œ ligature OE — uppercase Œuvre
A8 1,1 168 · middle dot novel·la
A9 5,28 169 ♭ musical flat B♭
AA 4,32 170 ® registered trademark ABC®
AB 6,1 171 ± plus or minus A±B
AE 1,11 174 ◌ʼ alif Unʼyusho
B0 2,11 176 ◌ʻ ayn faʻil
B1 1,153 177 ł slash l — lowercase rozbił
B2 1,81 178 ø slash o — lowercase høj
B3 1,79 179 đ slash d — lowercase đavola
B4 1,89 180 þ thorn — lowercase þann
B5 1,37 181 æ ligature ae — lowercase skæg
B6 1,167 182 œ ligature oe — lowercase œuvre
B8 1,24 184 ı dotless i — lowercase masalı
B9 4,11 185 £ British pound £5.00
BA 1,87 186 ð eth verður
C3 4,23 195 © copyright mark ©1993
C5 4,8 197 ¿ inverted question mark ¿Qué?
C6 4,7 198 ¡ inverted exclamation mark ¡Esta!
CF 1,23 207 ß Ess Zed Preußen
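The table can be used directly for decoding. Here is a minimal Python sketch of an ANSEL-to-Unicode decoder that covers only the characters listed above. The Unicode code points are read off the Graphic column. The name decode_ansel is hypothetical, and bytes below 0x80 are assumed to be plain ASCII.

One structural difference has to be handled: ANSEL places a non-spacing diacritic before the letter it modifies, whereas Unicode places the combining character after it. Any byte that is not in the tables raises an error rather than being guessed at.

```python
import unicodedata

# Non-spacing rows above: ANSEL byte -> Unicode combining character.
ANSEL_COMBINING = {
    0xE1: "\u0300", 0xE2: "\u0301", 0xE3: "\u0302", 0xE4: "\u0303",
    0xE5: "\u0304", 0xE6: "\u0306", 0xE7: "\u0307", 0xE8: "\u0308",
    0xE9: "\u030C", 0xEA: "\u030A", 0xED: "\u0315", 0xEE: "\u030B",
    0xF0: "\u0327", 0xF1: "\u0328", 0xF6: "\u0332", 0xFE: "\u0313",
}

# Spacing rows above: ANSEL byte -> Unicode character.
ANSEL_SPACING = {
    0xA1: "\u0141", 0xA2: "\u00D8", 0xA3: "\u0110", 0xA4: "\u00DE",
    0xA5: "\u00C6", 0xA6: "\u0152", 0xA8: "\u00B7", 0xA9: "\u266D",
    0xAA: "\u00AE", 0xAB: "\u00B1", 0xAE: "\u02BC", 0xB0: "\u02BB",
    0xB1: "\u0142", 0xB2: "\u00F8", 0xB3: "\u0111", 0xB4: "\u00FE",
    0xB5: "\u00E6", 0xB6: "\u0153", 0xB8: "\u0131", 0xB9: "\u00A3",
    0xBA: "\u00F0", 0xC3: "\u00A9", 0xC5: "\u00BF", 0xC6: "\u00A1",
    0xCF: "\u00DF",
}

def decode_ansel(data: bytes) -> str:
    """Decode ANSEL bytes covered by the tables above into an NFC Unicode string."""
    out: list[str] = []
    pending: list[str] = []  # diacritics read before their base character
    for b in data:
        if b in ANSEL_COMBINING:
            # ANSEL stores the diacritic BEFORE the letter it modifies.
            pending.append(ANSEL_COMBINING[b])
            continue
        if b < 0x80:
            base = chr(b)  # assumption: the lower half is plain ASCII
        elif b in ANSEL_SPACING:
            base = ANSEL_SPACING[b]
        else:
            raise ValueError(f"byte 0x{b:02X} is not in the table")
        # Unicode stores combining characters AFTER the base character.
        out.append(base)
        out.extend(pending)
        pending.clear()
    if pending:
        raise ValueError("diacritic at end of data has no base character")
    # Compose where possible, e.g. a + U+0304 becomes U+0101.
    return unicodedata.normalize("NFC", "".join(out))

print(decode_ansel(bytes([0x67, 0xE5, 0x61, 0x6A, 0xE5, 0x65, 0x6A, 0x73])))
```

Decoding the macron example bytes 67 E5 61 6A E5 65 6A 73 this way yields gājējs, which the damaged PDF table shows as g•j•js.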

updates

2013-05-18: LDS ANSEL versus LDS ANSEL

LDS ANSEL versus LDS ANSEL compares the ANSEL tables provided in different GEDCOM versions.

2014-08-29: GEDCOM specifications

FamilySearch has removed the GEDCOM 5.5 and 5.5.1 specifications from their site. These and other specifications can now be found on the FamilySearch GEDCOM Specifications page.

2018-04-08: GEDCOM ANSEL Series

This article is now part of the GEDCOM ANSEL series.

links