Req: Improve string utilities for Korean Hangul script
Currently it appears that Kodi's Unicode string utilities treat Hangul codepoints like any other characters. They don't account for the fact that a precomposed syllable is equivalent to a sequence of Hangul Jamo. As a result, methods/functions such as string compare don't work as expected when the Unicode strings contain a mix of Hangul precomposed syllables and Hangul Jamo.

Background: the Korean language is written using a script known as Hangul. The sounds of spoken Korean are represented by characters known as "Jamo", which are the individual consonants and vowels. In typical written Korean text, however, these Jamo are combined into syllables.

The Unicode spec, v8.0 (current), provides codepoint blocks for both the Jamo and the syllables (the latter are considered "precomposed"). Section 3.12 of the spec also gives the arithmetic relationship between the Unicode Jamo codepoints and the precomposed syllables, so it is possible to convert between the two representations.
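Here is a minimal sketch of the section 3.12 arithmetic, in Python for readability. Kodi's own string utilities are C++, and the function names here are hypothetical, not part of Kodi. A syllable's offset from U+AC00 splits into a leading-consonant index, a vowel index and a trailing-consonant index using the constants the standard defines. Composition reverses the calculation.

```python
# Constants from Unicode section 3.12 (Conjoining Jamo Behavior).
S_BASE = 0xAC00  # first precomposed syllable (가)
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one below the first trailing consonant; index 0 = none
L_COUNT = 19
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Return the conjoining Jamo for a precomposed syllable, else ch unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = chr(L_BASE + s_index // N_COUNT)
    v = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    return l + v + (chr(T_BASE + t_index) if t_index else "")


def compose_hangul(l: str, v: str, t: str = "") -> str:
    """Combine a leading consonant, a vowel and an optional trailing consonant."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("expected a leading consonant followed by a vowel")
    if t and not 0 < t_index < T_COUNT:
        raise ValueError("expected a trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)
```

For example, `decompose_hangul("김")` gives the three Jamo U+1100, U+1175, U+11B7, and `compose_hangul` turns that sequence back into 김. In Python the same decomposition is also available through `unicodedata.normalize("NFD", ...)`, because Unicode defines the canonical decomposition of Hangul syllables by this algorithm.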

There are probably a number of use cases where it would be useful to apply Unicode string functions to arguments that contain either Jamo or precomposed syllables. One example is Kodi's "startswith" function. With the way Kodi's Unicode string utils currently work, if the argument to "startswith" is a Jamo character, it will never match a precomposed syllable.

As an example, say a user wants to find all library items that start with the letter "k". In Korean, this would be expressed by looking for all items that start with "ᄀ". The user would expect, for example, artists named "kim" (김) and "kang" (강) to match. But because these names are written with precomposed syllables, they aren't returned.

If, however, "startswith" were called with an argument from the Unicode Hangul Jamo block, the function could decompose the characters in the string using the method in the Unicode spec and then test whether the decomposed string starts with the argument. For example, 김 (U+AE40) decomposes into the Jamo sequence "ᄀ" "ᅵ" "ᆷ" (U+1100, U+1175, U+11B7). Note that the final consonant is the trailing (jongseong) form of mieum, U+11B7, rather than the leading form ᄆ (U+1106).
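The proposed "startswith" behaviour can be sketched as follows, again in Python and with hypothetical names (`decompose_hangul`, `hangul_startswith`). This is not Kodi's actual string API. Both strings are decomposed to conjoining Jamo, and an ordinary prefix test is then applied to the results.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT       # 11172 precomposed syllables


def decompose_hangul(text: str) -> str:
    """Replace every precomposed Hangul syllable in text with its conjoining Jamo."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
            if s_index % T_COUNT:
                out.append(chr(T_BASE + s_index % T_COUNT))
        else:
            out.append(ch)  # non-syllables pass through unchanged
    return "".join(out)


def hangul_startswith(text: str, prefix: str) -> bool:
    """Jamo-aware prefix test: compare both strings in decomposed form."""
    return decompose_hangul(text).startswith(decompose_hangul(prefix))


if __name__ == "__main__":
    artists = ["김", "강", "박"]  # kim, kang, park
    print([a for a in artists if hangul_startswith(a, "\u1100")])  # ['김', '강']
```

Decomposing the prefix as well means a prefix that is itself a syllable still works. For example, 김 matches 김치. A partial syllable such as 기 would also match 김, which may or may not be wanted. One further assumption: a lone consonant typed on a Korean keyboard is often a Hangul Compatibility Jamo (e.g. ㄱ, U+3131) rather than the conjoining ᄀ (U+1100). This sketch does not map between the two.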

scott s.