Kodi Community Forum

Full Version: utf-8 file names and content
If you need to access a file with a utf-8 path, you need to encode the path explicitly. If the file's text is also utf-8, you need to specify the encoding as utf-8 as well:

io.open(filename.encode('utf-8'), mode='rt', encoding='utf-8')

Normally, Python discovers the filesystem encoding (used for filenames) and sets it for you. However, a patch introduced in Kodi 19.2 (https://github.com/xbmc/xbmc/issues/19883), which works around what looks like a nasty Kodi string handling problem with Turkish (and other languages), leaves the filename encoding as 'ASCII' instead of 'utf-8' (at least on Linux). This means that you have to specify the encoding explicitly (at least until the other bug is fixed).
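To see which filesystem encoding Python has picked up, something like the following can be run from an addon. This is only a sketch: sys.getfilesystemencoding() is standard Python, but is_utf8 is a helper name made up for this example, and I'm assuming the value it reports is the one affected by the patch described above.

```python
import sys

def is_utf8(encoding_name):
    # Hypothetical helper: treat 'utf-8', 'UTF8', 'utf_8' etc. as the same name.
    return encoding_name.lower().replace('-', '').replace('_', '') == 'utf8'

# Encoding Python uses to convert str filenames to bytes for the OS.
fs_encoding = sys.getfilesystemencoding()
print('filesystem encoding:', fs_encoding)

if not is_utf8(fs_encoding):
    # Paths with non-ASCII characters should be encoded by hand.
    print("encode paths explicitly, e.g. filename.encode('utf-8')")
```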

I'm not sure how utf-8 filenames behave on different Windows versions or on OSes that don't support utf-8 filenames. Most modern systems support utf-8 paths.

Failure to specify filename.encode('utf-8') can cause errors about out-of-range ASCII characters when the filename contains non-ASCII characters.
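One way to avoid forgetting the encode step is to wrap it in a small helper. The sketch below is just an illustration: open_utf8 and the sample file name are made up for this example, and it has only been reasoned through for Python 3 on a system that accepts utf-8 paths.

```python
import io
import os
import tempfile

def open_utf8(path, mode='rt'):
    # Hypothetical helper: encode the path explicitly so the filesystem
    # encoding reported by Python does not matter, and read/write the
    # content as utf-8 text.
    if not isinstance(path, bytes):
        path = path.encode('utf-8')
    return io.open(path, mode=mode, encoding='utf-8')

# Round trip with a non-ASCII file name and non-ASCII content.
tmp_dir = tempfile.mkdtemp()
name = os.path.join(tmp_dir, u'T\u00fcrk\u00e7e.txt')

with open_utf8(name, 'wt') as f:
    f.write(u'\u0130stanbul')

with open_utf8(name) as f:
    text = f.read()
```

Passing the path as bytes means Python hands it to the OS as-is instead of converting it with the (possibly ASCII) filesystem encoding.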

Issue 19883 is a cautionary tale about the subtleties of character comparison and similar operations in different languages. Characters don't always obey the rules that we expect.