#1 Latin (Small) Letter Components by Christoph Päper 21.05.2014 11:15

I would like to encode constituent components of lowercase roman letters, as described by [URL=http://idsl1.phil-fak.uni-koeln.de/fileadmin/IDSLI/dozentenseiten/artikel_primus/Primus_Featural_Analysis_2004.pdf]Primus[/URL], Brekle, Watt and others. The analyses differ in details, but generally identify a head-dependent structure of the abstracted glyphs. [i]Heads[/i] or [i]stems[/i] or [i]hastas[/i] or [i]vexilla[/i] are usually vertical and include the ascender or descender. [i]Dependents[/i] or [i]codas[/i] or [i]augments[/i] are usually restricted to the middle band and are often rounded or diagonal. Some letters, e.g. ‹x›, ‹l›, ‹o›, ‹c›, ‹w›, ‹s›, ‹m›, ‹j›, are harder than others to analyze in a useful way. Some letters have graphic variants that need separate consideration, e.g. ‹a:ɑ›, ‹g:ɡ›, ‹s:ſ›, ‹z:ʒ› and ‹f:ƒ›.

Some of the glyphs required may already be available in fonts as phonetic, punctuation or math characters, but often in a clumsy form that doesn’t quite look like part of a letter. Also, if you wanted to apply OpenType ligature features, i.e. [code]liga[/code], [code]dlig[/code] etc., to them when they appear next to each other, you’d better be able to access them unambiguously.

The following set of idealized components should work with most approaches:

LATIN LETTER COMPONENT VERTICAL LINE ≈ U+007C VERTICAL LINE ‹|›
LATIN LETTER COMPONENT HORIZONTAL LINE ≈ U+002D HYPHEN-MINUS ‹-›
LATIN LETTER COMPONENT VERTICAL LINE WITH RIGHT-BEND BOTTOM ≈ U+0269 LATIN SMALL LETTER IOTA ‹ɩ›
LATIN LETTER COMPONENT VERTICAL LINE WITH RIGHT-BEND TOP ≈ U+027E LATIN SMALL LETTER R WITH FISHHOOK ‹ɾ›
LATIN LETTER COMPONENT VERTICAL LINE WITH LEFT-BEND BOTTOM ≈ U+0279 LATIN SMALL LETTER TURNED R ‹ɹ›
LATIN LETTER COMPONENT VERTICAL LINE WITH LEFT-BEND TOP ≈ U+027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK ‹ɿ›
LATIN LETTER COMPONENT CIRCLE ≈ U+006F LATIN SMALL LETTER O ‹o›
LATIN LETTER COMPONENT LEFT HALF-CIRCLE ≈ U+0063 LATIN SMALL LETTER C ‹c›
LATIN LETTER COMPONENT RIGHT HALF-CIRCLE ≈ U+0254 LATIN SMALL LETTER OPEN O ‹ɔ›
LATIN LETTER COMPONENT BOTTOM HALF-CIRCLE ≈ U+222A UNION ‹∪›
LATIN LETTER COMPONENT TOP HALF-CIRCLE ≈ U+2229 INTERSECTION ‹∩›
LATIN LETTER COMPONENT LEFT DIAGONAL LINE ≈ U+002F SOLIDUS ‹/›
LATIN LETTER COMPONENT RIGHT DIAGONAL LINE ≈ U+005C REVERSE SOLIDUS ‹\›
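
The approximations in the list above can be checked mechanically. The following is only a sketch: the table is transcribed from the list, while [code]PROTOTYPES[/code] and [code]mismatches[/code] are names made up for this illustration. It compares each stated character name with the one Python's [code]unicodedata[/code] module reports for that code point.

```python
import unicodedata

# Prototype component -> (closest existing code point, its stated Unicode name),
# transcribed from the list above. The pairing is an approximation only.
PROTOTYPES = {
    "VERTICAL LINE": (0x007C, "VERTICAL LINE"),
    "HORIZONTAL LINE": (0x002D, "HYPHEN-MINUS"),
    "VERTICAL LINE WITH RIGHT-BEND BOTTOM": (0x0269, "LATIN SMALL LETTER IOTA"),
    "VERTICAL LINE WITH RIGHT-BEND TOP": (0x027E, "LATIN SMALL LETTER R WITH FISHHOOK"),
    "VERTICAL LINE WITH LEFT-BEND BOTTOM": (0x0279, "LATIN SMALL LETTER TURNED R"),
    "VERTICAL LINE WITH LEFT-BEND TOP": (0x027F, "LATIN SMALL LETTER REVERSED R WITH FISHHOOK"),
    "CIRCLE": (0x006F, "LATIN SMALL LETTER O"),
    "LEFT HALF-CIRCLE": (0x0063, "LATIN SMALL LETTER C"),
    "RIGHT HALF-CIRCLE": (0x0254, "LATIN SMALL LETTER OPEN O"),
    "BOTTOM HALF-CIRCLE": (0x222A, "UNION"),
    "TOP HALF-CIRCLE": (0x2229, "INTERSECTION"),
    "LEFT DIAGONAL LINE": (0x002F, "SOLIDUS"),
    "RIGHT DIAGONAL LINE": (0x005C, "REVERSE SOLIDUS"),
}


def mismatches(table=PROTOTYPES):
    """Return entries whose stated name differs from the Unicode database."""
    return [
        (component, f"U+{cp:04X}", stated, unicodedata.name(chr(cp)))
        for component, (cp, stated) in table.items()
        if unicodedata.name(chr(cp)) != stated
    ]


if __name__ == "__main__":
    for component, (cp, stated) in PROTOTYPES.items():
        print(f"LATIN LETTER COMPONENT {component} ~ U+{cp:04X} {stated}")
    print("mismatches:", mismatches())
```

An empty result from [code]mismatches()[/code] only confirms that code points and names agree; it says nothing about how well the existing glyphs fit as letter components.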

A single row with 16 slots would be enough for these, but I’m not sure yet whether I should define a more detailed set of characters instead:

LATIN LETTER COMPONENT VERTICAL LINE BASE – as in ‹i› or ‹n› or ‹u›
LATIN LETTER COMPONENT VERTICAL LINE UPPER – as in Fuhrmann/Buchmann’s ‹l› ≠ U+02C8 MODIFIER LETTER VERTICAL LINE
LATIN LETTER COMPONENT VERTICAL LINE LOWER – hypothetic ≠ U+02CC MODIFIER LETTER LOW VERTICAL LINE
LATIN LETTER COMPONENT BROKEN VERTICAL LINE – hypothetic in upper and lower band
LATIN LETTER COMPONENT LONG VERTICAL LINE UPPER – as in ‹b› or ‹l› or most uppercase letters
LATIN LETTER COMPONENT LONG VERTICAL LINE LOWER – as in ‹p› or ‹q›
LATIN LETTER COMPONENT FULL VERTICAL LINE – as in ‹þ› or italic ‹f›

LATIN LETTER COMPONENT HORIZONTAL LINE ABOVE BASE – as in ‹f› or ‹t› or ‹z›
LATIN LETTER COMPONENT HORIZONTAL LINE INSIDE BASE – as in ‹e›
LATIN LETTER COMPONENT HORIZONTAL LINE AT BASE – as in ‹z› or maybe ‹s›
LATIN LETTER COMPONENT HORIZONTAL LINE AT TOP – as in some uppercase letters like ‹F›
LATIN LETTER COMPONENT HORIZONTAL LINE AT BOTTOM – hypothetic

LATIN LETTER COMPONENT VERTICAL LINE WITH RIGHT-BEND BOTTOM BASE – as maybe in ‹i›
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH RIGHT-BEND BOTTOM UPPER – as in ‹t›
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH RIGHT-BEND BOTTOM LOWER – as maybe in ‹g›
LATIN LETTER COMPONENT FULL VERTICAL LINE WITH RIGHT-BEND BOTTOM – hypothetic

LATIN LETTER COMPONENT VERTICAL LINE WITH RIGHT-BEND TOP BASE – as in ‹r›
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH RIGHT-BEND TOP UPPER – as in ‹f› or ‹ſ› or ‹ß›
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH RIGHT-BEND TOP LOWER – hypothetic
LATIN LETTER COMPONENT FULL VERTICAL LINE WITH RIGHT-BEND TOP – as maybe in italic ‹f›

LATIN LETTER COMPONENT VERTICAL LINE WITH LEFT-BEND BOTTOM BASE – hypothetic
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH LEFT-BEND BOTTOM UPPER – hypothetic
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH LEFT-BEND BOTTOM LOWER – as in ‹j› or ‹ɡ› or maybe ‹y›
LATIN LETTER COMPONENT FULL VERTICAL LINE WITH LEFT-BEND BOTTOM – hypothetic

LATIN LETTER COMPONENT VERTICAL LINE WITH LEFT-BEND TOP BASE – as in ‹a› or ‹n› or ‹h›
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH LEFT-BEND TOP UPPER – hypothetic
LATIN LETTER COMPONENT LONG VERTICAL LINE WITH LEFT-BEND TOP LOWER – hypothetic
LATIN LETTER COMPONENT FULL VERTICAL LINE WITH LEFT-BEND TOP – hypothetic

LATIN LETTER COMPONENT CIRCLE BASE – as in ‹o› or maybe ‹b›, ‹d›, ‹p›, ‹q›
LATIN LETTER COMPONENT CIRCLE UPPER – as maybe in uppercase ‹B›
LATIN LETTER COMPONENT CIRCLE LOWER – as maybe in ‹g›

LATIN LETTER COMPONENT LEFT HALF-CIRCLE BASE – as in ‹c› or ‹d› or ‹q› or maybe ‹s› or ‹k›
LATIN LETTER COMPONENT LEFT HALF-CIRCLE UPPER – hypothetic, maybe in uppercase ‹E:Ɛ› ≈ U+02BF MODIFIER LETTER LEFT HALF RING
LATIN LETTER COMPONENT LEFT HALF-CIRCLE LOWER – as maybe in ‹g›

LATIN LETTER COMPONENT RIGHT HALF-CIRCLE BASE – as in ‹b› or ‹p› or maybe ‹s›
LATIN LETTER COMPONENT RIGHT HALF-CIRCLE UPPER – as maybe in ‹ß› ≈ U+02BE MODIFIER LETTER RIGHT HALF RING
LATIN LETTER COMPONENT RIGHT HALF-CIRCLE LOWER – as maybe in ‹ɡ› or ‹j›

LATIN LETTER COMPONENT BOTTOM HALF-CIRCLE BASE – as maybe in ‹u› or ‹y›
LATIN LETTER COMPONENT BOTTOM HALF-CIRCLE UPPER – as maybe in uppercase ‹Y›
LATIN LETTER COMPONENT BOTTOM HALF-CIRCLE LOWER – hypothetic

LATIN LETTER COMPONENT TOP HALF-CIRCLE BASE – as maybe in ‹n› or ‹h›
LATIN LETTER COMPONENT TOP HALF-CIRCLE UPPER – hypothetic
LATIN LETTER COMPONENT TOP HALF-CIRCLE LOWER – hypothetic

LATIN LETTER COMPONENT LEFT DIAGONAL LINE BASE – as in ‹v› or maybe ‹k›
LATIN LETTER COMPONENT LEFT DIAGONAL LINE UPPER – hypothetic
LATIN LETTER COMPONENT LEFT DIAGONAL LINE LOWER – as maybe in ‹y›
LATIN LETTER COMPONENT LONG LEFT DIAGONAL LINE UPPER – as in uppercase ‹V›
LATIN LETTER COMPONENT LONG LEFT DIAGONAL LINE LOWER – as in ‹y›

LATIN LETTER COMPONENT RIGHT DIAGONAL LINE BASE – as in ‹v› or maybe ‹k› or uppercase ‹R›
LATIN LETTER COMPONENT RIGHT DIAGONAL LINE UPPER – hypothetic
LATIN LETTER COMPONENT RIGHT DIAGONAL LINE LOWER – hypothetic
LATIN LETTER COMPONENT LONG RIGHT DIAGONAL LINE UPPER – as in uppercase ‹N› or ‹V›
LATIN LETTER COMPONENT LONG RIGHT DIAGONAL LINE LOWER – hypothetic
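
The naming scheme of the verbose list is regular enough to be generated from a small table. The sketch below is one possible reading of that scheme, not a fixed proposal: it assumes that every name consists of an optional prefix (LONG, FULL, BROKEN), the shape, and an optional position suffix. The helper names [code]component_name[/code] and [code]build_inventory[/code] are invented for this example.

```python
PREFIX = "LATIN LETTER COMPONENT"

# Variants as (prefix, suffix) pairs, following the verbose list above.
PLAIN = [("", "BASE"), ("", "UPPER"), ("", "LOWER")]
LONG = [("LONG", "UPPER"), ("LONG", "LOWER")]
FULL = [("FULL", "")]


def component_name(shape, prefix, suffix):
    """Join the non-empty parts into one character name."""
    return " ".join(part for part in (PREFIX, prefix, shape, suffix) if part)


def build_inventory():
    """Return all verbose component names in the order of the list."""
    families = {}
    families["VERTICAL LINE"] = PLAIN + [("BROKEN", "")] + LONG + FULL
    families["HORIZONTAL LINE"] = [
        ("", pos)
        for pos in ("ABOVE BASE", "INSIDE BASE", "AT BASE", "AT TOP", "AT BOTTOM")
    ]
    # Bent stems have no short UPPER/LOWER variants in the list.
    for side in ("RIGHT", "LEFT"):
        for end in ("BOTTOM", "TOP"):
            shape = f"VERTICAL LINE WITH {side}-BEND {end}"
            families[shape] = [("", "BASE")] + LONG + FULL
    for shape in ("CIRCLE", "LEFT HALF-CIRCLE", "RIGHT HALF-CIRCLE",
                  "BOTTOM HALF-CIRCLE", "TOP HALF-CIRCLE"):
        families[shape] = PLAIN
    for shape in ("LEFT DIAGONAL LINE", "RIGHT DIAGONAL LINE"):
        families[shape] = PLAIN + LONG
    return [
        component_name(shape, prefix, suffix)
        for shape, variants in families.items()
        for prefix, suffix in variants
    ]


if __name__ == "__main__":
    names = build_inventory()
    print(len(names), "names")
    for name in names:
        print(name)
```

Keeping the variants in a table like this makes it easy to add or drop a position for a whole family and see how the total changes.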

Diacritics, including the tittle, are not encoded again. Existing code points are used instead, e.g.:
LATIN LETTER COMPONENT DOT ABOVE = U+02D9 DOT ABOVE

That’s 53 more or less systematic entries, plus perhaps the 13 prototypes, and I’ve marked 19 of the 53 as hypothetic. Some of the hypothetic components, though, have appeared in uncial, insular and other obsolete variants as well as in experimental or decorative forms of letters. Either way, the set would occupy at least 4 rows, maybe 6.
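
The row estimate is plain ceiling division over 16-slot rows. Here is a small sketch of that arithmetic and of a dense assignment to consecutive code points. The base [code]0xE000[/code] is a placeholder (the first code point of the BMP private use area), not a suggested location, and [code]rows_needed[/code] and [code]assign[/code] are names made up for this example.

```python
ROW = 16                               # slots per row, as in the post
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF   # private use area in the BMP


def rows_needed(count, row=ROW):
    """Rows of `row` slots needed for `count` characters (ceiling division)."""
    return -(-count // row)


def assign(names, base):
    """Map names to consecutive code points, starting on a row boundary."""
    names = list(names)
    last = base + len(names) - 1
    if base % ROW or base < PUA_FIRST or last > PUA_LAST:
        raise ValueError("base must start a row inside the private use area")
    return {name: base + i for i, name in enumerate(names)}


SYSTEMATIC, PROTOTYPES = 53, 13
HYPOTHETICAL_BASE = 0xE000  # placeholder only, not a proposal

if __name__ == "__main__":
    print(rows_needed(PROTOTYPES))               # 1
    print(rows_needed(SYSTEMATIC))               # 4
    print(rows_needed(SYSTEMATIC + PROTOTYPES))  # 5
    demo = assign([f"COMPONENT {i}" for i in range(PROTOTYPES)], HYPOTHETICAL_BASE)
    print(f"U+{min(demo.values()):04X}..U+{max(demo.values()):04X}")
```

Dense packing gives 4 rows for the systematic set and 5 with the prototypes included. Leaving gaps so that each group of shapes starts on a row boundary would need more.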

My actual question is this: Where in the LINCUA PUA space should I put these characters?
