These are the official ANSEL to Unicode conversion tables. Well, these are as official as it gets.
Technically, these aren't ANSEL to Unicode tables, but LDS ANSEL (LANSEL) to Unicode tables. The tables presented here include all the LDS extensions to ANSEL proper, found in different versions of the FamilySearch GEDCOM specifications.
The tables presented here are based on multiple previous articles, now combined into the GEDCOM ANSEL series. The first article presents a correct ANSEL table for GEDCOM 5.5.1, as that specification contains a messed-up table. The second article combines the ANSEL standard and ANSEL tables from different GEDCOM versions into a single table. The third article discusses the mapping of the Alif and Ayn characters, as well as a few other characters.
This article presents the LDS ANSEL to Unicode conversion tables, based upon the previous articles, the official ISO 5426 (and ANSEL) to Unicode conversion table, the official MARC-21 to Unicode conversion, and existing implementations, particularly FamilySearch's Personal Ancestral File (PAF) 5.2.18. The ANSEL Alif and Ayn article explains why PAF's alif and ayn conversion is wrong.
There is more to conversion from ANSEL to Unicode than just these tables.
In ANSEL, modifying characters come before the base character; in Unicode, they come after the base character.
Conversion from ANSEL to Unicode and back is easiest for Unicode Normalization Form D (decomposed characters), which Mac OS X uses, and takes extra work for Unicode Normalization Form C (composed characters), which Windows uses.
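
The reordering described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a complete converter: the `COMBINING` and `SPACING` dictionaries hold only a handful of rows from the tables in this article, the function names are made up for this example, bytes missing from the sketch tables become U+FFFD, characters that cannot be mapped back become `?`, and the order of multiple marks on one base is simply preserved.

```python
import unicodedata

# Partial excerpts of the tables in this article, for illustration only.
COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE2: "\u0301",  # acute accent
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut (diaeresis)
    0xF0: "\u0327",  # cedilla
}
SPACING = {
    0xA1: "\u0141",  # slash L, uppercase
    0xA2: "\u00D8",  # slash O, uppercase
    0xB2: "\u00F8",  # slash o, lowercase
    0xCF: "\u00DF",  # Ess Zed
}
# Reverse lookup (Unicode character -> ANSEL byte) built from the excerpts.
TO_ANSEL = {ch: b for table in (COMBINING, SPACING) for b, ch in table.items()}
MARKS = set(COMBINING.values())


def ansel_to_unicode(data: bytes, form: str = "NFC") -> str:
    """Decode ANSEL bytes, moving each modifier behind its base character."""
    out, pending = [], []
    for byte in data:
        if byte in COMBINING:
            pending.append(COMBINING[byte])  # hold until the base arrives
            continue
        # Assumption: anything not in the sketch tables becomes U+FFFD.
        out.append(chr(byte) if byte < 0x80 else SPACING.get(byte, "\uFFFD"))
        out.extend(pending)
        pending.clear()
    out.extend(pending)  # assumption: dangling marks are emitted as-is
    # At this point the text is decomposed; NFC is the extra composing step.
    return unicodedata.normalize(form, "".join(out))


def unicode_to_ansel(text: str) -> bytes:
    """Encode to ANSEL: decompose first, then put modifiers before the base."""
    clusters = []  # each entry: [base byte or None, bytearray of mark bytes]
    for ch in unicodedata.normalize("NFD", text):
        if ch in MARKS:
            if not clusters:
                clusters.append([None, bytearray()])
            clusters[-1][1].append(TO_ANSEL[ch])
        else:
            # Assumption: unmappable characters are replaced by "?".
            byte = ord(ch) if ord(ch) < 0x80 else TO_ANSEL.get(ch, ord("?"))
            clusters.append([byte, bytearray()])
    out = bytearray()
    for base, marks in clusters:
        out.extend(marks)
        if base is not None:
            out.append(base)
    return bytes(out)


ansel = b"ni\xe4no"  # E4 (tilde) precedes the "n" it modifies
text = ansel_to_unicode(ansel)
assert unicode_to_ansel(text) == ansel
```

Decoding produces decomposed text as a by-product of the table lookup, so Normalization Form D needs no further work, while Normalization Form C needs the additional composition pass done here by `unicodedata.normalize`. Encoding always decomposes first, whatever form the input is in.
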
| Hex | wpcode | Dec | Graphic | ANSEL name | Example of use | Unicode code point | Unicode name |
|---|---|---|---|---|---|---|---|
| E0 | 2,4 | 224 | ◌̉ | low rising tone mark | củi | U+0309 | Combining Hook Above |
| E1 | 1,0 | 225 | ◌̀ | grave accent | règle | U+0300 | Combining Grave Accent |
| E2 | 1,6 | 226 | ◌́ | acute accent | está | U+0301 | Combining Acute Accent |
| E3 | 1,3 | 227 | ◌̂ | circumflex accent | même | U+0302 | Combining Circumflex Accent |
| E4 | 1,2 | 228 | ◌̃ | tilde | niño | U+0303 | Combining Tilde |
| E5 | 1,8 | 229 | ◌̄ | macron | gājējs | U+0304 | Combining Macron |
| E6 | 1,22 | 230 | ◌̆ | breve | altă | U+0306 | Combining Breve |
| E7 | 1,15 | 231 | ◌̇ | dot above | żaba | U+0307 | Combining Dot Above |
| E8 | 1,7 | 232 | ◌̈ | umlaut (diaeresis) | öppna | U+0308 | Combining Diaeresis |
| E9 | 1,19 | 233 | ◌̌ | hacek | vždy | U+030C | Combining Caron |
| EA | 1,14 | 234 | ◌̊ | circle above (angstrom) | hår | U+030A | Combining Ring Above |
| EB | 2,11 | 235 | ◌︠ | ligature, left half | akademii︠a︡ | U+FE20 | Combining Ligature Left Half |
| EC | 2,12 | 236 | ◌︡ | ligature, right half | akademii︠a︡ | U+FE21 | Combining Ligature Right Half |
| ED | 1,10 | 237 | ◌̕ | high comma, off center | rozdel̕ ovac | U+0315 | Combining Comma Above Right |
| EE | 1,16 | 238 | ◌̋ | double acute accent | időszaki | U+030B | Combining Double Acute Accent |
| EF | 2,25 | 239 | ◌̐ | candrabindu | Alii̐ev | U+0310 | Combining Candrabindu |
| F0 | 1,17 | 240 | ◌̧ | cedilla | ça | U+0327 | Combining Cedilla |
| F1 | 1,18 | 241 | ◌̨ | right hook, ogonek | vietą | U+0328 | Combining Ogonek |
| F2 | 2,0 | 242 | ◌̣ | dot below | teḍa | U+0323 | Combining Dot Below |
| F3 | 2,1 | 243 | ◌̤ | double dot below | k̲h̲ut̤bah | U+0324 | Combining Diaeresis Below |
| F4 | 2,3 | 244 | ◌̥ | circle below | Samskr̥ta | U+0325 | Combining Ring Below |
| F5 | 2,6 | 245 | ◌̳ | double underscore | G̳hulam | U+0333 | Combining Double Low Line |
| F6 | 2,7 | 246 | ◌̲ | underscore | s̲amar | U+0332 | Combining Low Line |
| F7 | 2,16 | 247 | ◌̦ | left hook | dārzin̦a | U+0326 | Combining Comma Below |
| F8 | 2,14 | 248 | ◌̜ | right cedilla | kho̜ng | U+031C | Combining Left Half Ring Below |
| F9 | 2,9 | 249 | ◌̮ | half circle below (upadhmaniya) | ḫumantuš | U+032E | Combining Breve Below |
| FA | | 250 | ◌︢ | double tilde, left half | n︢g︣alan | U+FE22 | Combining Double Tilde Left Half |
| FB | | 251 | ◌︣ | double tilde, right half | n︢g︣alan | U+FE23 | Combining Double Tilde Right Half |
| FC | 1,5 | 252 | ◌̸ | diacritic slash through char | | U+0338 | Combining Long Solidus Overlay |
| FD | | 253 | � | unused | | U+FFFD | Replacement Character |
| FE | 1,9 | 254 | ◌̓ | high comma, centered | ge̓otermika | U+0313 | Combining Comma Above |
| FF | | 255 | � | illegal | | U+FFFD | Replacement Character |
| Hex | wpcode | Dec | Graphic | ANSEL name | Example of use | Unicode code point | Unicode name |
|---|---|---|---|---|---|---|---|
| A0 | | 160 | � | unused | | U+FFFD | Replacement Character |
| A1 | 1,152 | 161 | Ł | slash L — uppercase | Łódź | U+0141 | Latin Capital Letter L with Stroke |
| A2 | 1,80 | 162 | Ø | slash O — uppercase | Øst | U+00D8 | Latin Capital Letter O with Stroke |
| A3 | 1,78 | 163 | Đ | slash D — uppercase | Đuro | U+0110 | Latin Capital Letter D with Stroke |
| A4 | 1,88 | 164 | Þ | thorn — uppercase | Þann | U+00DE | Latin Capital Letter Thorn |
| A5 | 1,36 | 165 | Æ | ligature AE — uppercase | Ægir | U+00C6 | Latin Capital Letter AE |
| A6 | 1,166 | 166 | Œ | ligature OE — uppercase | Œuvre | U+0152 | Latin Capital Ligature OE |
| A7 | 1,6 | 167 | ◌ʹ | mjagkij znak | fakulʹtet | U+02B9 | Modifier Letter Prime |
| A8 | 1,1 | 168 | · | middle dot | novel·la | U+00B7 | Middle Dot |
| A9 | 5,28 | 169 | ♭ | musical flat | B♭ | U+266D | Music Flat Sign |
| AA | 4,32 | 170 | ® | registered trademark | ABC® | U+00AE | Registered Sign |
| AB | 6,1 | 171 | ± | plus or minus | A±B | U+00B1 | Plus-Minus Sign |
| AC | 1,230 | 172 | Ơ | hook O - uppercase | BƠ | U+01A0 | Latin Capital Letter O with Horn |
| AD | 1,232 | 173 | Ư | hook U - uppercase | XƯA | U+01AF | Latin Capital Letter U with Horn |
| AE | 1,11 | 174 | ◌ʼ | alif | Unʼyusho | U+02BC | Modifier Letter Apostrophe |
| AF | | 175 | � | unused | | U+FFFD | Replacement Character |
| B0 | 2,11 | 176 | ◌ʻ | ayn | faʻil | U+02BB | Modifier Letter Turned Comma |
| B1 | 1,153 | 177 | ł | slash l— lowercase | rozbił | U+0142 | Latin Small Letter L with Stroke |
| B2 | 1,81 | 178 | ø | slash o— lowercase | høj | U+00F8 | Latin Small Letter O with Stroke |
| B3 | 1,79 | 179 | đ | slash d— lowercase | đavola | U+0111 | Latin Small Letter D with Stroke |
| B4 | 1,89 | 180 | þ | thorn— lowercase | þann | U+00FE | Latin Small Letter Thorn |
| B5 | 1,37 | 181 | æ | ligature ae— lowercase | skæg | U+00E6 | Latin Small Letter AE |
| B6 | 1,167 | 182 | œ | ligature oe— lowercase | œuvre | U+0153 | Latin Small Ligature OE |
| B7 | 1,16 | 183 | ◌ʺ | hard sign (tvjordyj znak) | obʺi︠a︡vlenie | U+02BA | Modifier Letter Double Prime |
| B8 | 1,24 | 184 | ı | dotless i— lowercase | masalı | U+0131 | Latin Small Letter Dotless I |
| B9 | 4,11 | 185 | £ | British pound | £5.00 | U+00A3 | Pound Sign |
| BA | 1,87 | 186 | ð | eth | verður | U+00F0 | Latin Small Letter Eth |
| BB | | 187 | � | unused | | U+FFFD | Replacement Character |
| BC | 1,231 | 188 | ơ | hook o - lowercase | Sơ | U+01A1 | Latin Small Letter O with Horn |
| BD | 1,233 | 189 | ư | hook u - lowercase | Tự Đức | U+01B0 | Latin Small Letter U with Horn |
| BE | | 190 | □ | empty box | | U+25A1 | White Square |
| BF | | 191 | ■ | black box | | U+25A0 | Black Square |
| C0 | 6,33 | 192 | ° | degree sign | 10°C. | U+00B0 | Degree Sign |
| C1 | 6,49 | 193 | ℓ | script l | 25 ℓ. | U+2113 | Script Small L |
| C2 | 4,71 | 194 | ℗ | phono copyright mark | Decca℗ | U+2117 | Sound Recording Copyright |
| C3 | 4,23 | 195 | © | copyright mark | ©1993 | U+00A9 | Copyright Sign |
| C4 | 5,27 | 196 | ♯ | music sharp sign | D♯ | U+266F | Music Sharp Sign |
| C5 | 4,8 | 197 | ¿ | inverted question mark | ¿Qué? | U+00BF | Inverted Question Mark |
| C6 | 4,7 | 198 | ¡ | inverted exclamation mark | ¡Esta! | U+00A1 | Inverted Exclamation Mark |
| C7 | | 199 | � | unused | | U+FFFD | Replacement Character |
| C8 | | 200 | � | unused | | U+FFFD | Replacement Character |
| C9 | | 201 | � | unused | | U+FFFD | Replacement Character |
| CA | | 202 | � | unused | | U+FFFD | Replacement Character |
| CB | | 203 | � | unused | | U+FFFD | Replacement Character |
| CC | | 204 | � | unused | | U+FFFD | Replacement Character |
| CD | | 205 | e | e in middle of line | | U+0065 | Latin Small Letter E |
| CE | | 206 | o | o in middle of line | | U+006F | Latin Small Letter O |
| CF | 1,23 | 207 | ß | Ess Zed | Preußen | U+00DF | Latin Small Letter Sharp S |
Grey text: code points not documented in the FamilySearch GEDCOM 5.5.1 specification.
Brown text: LDS extensions.
Copyright © Tamura Jones. All Rights reserved.