utf-8 file names and content
If you need to access a file whose path contains utf-8 (non-ASCII) characters, you need to encode the path explicitly. If the file's content is utf-8 text, you also need to specify the encoding as utf-8:

import io
io.open(filename.encode('utf-8'), mode='rt', encoding='utf-8')
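A minimal sketch of the same call wrapped in a helper, with a round trip through a file whose name contains non-ASCII characters. `open_utf8` is a hypothetical name used only for illustration (it is not a Kodi API); the temporary directory and the sample file name are assumptions made so the example runs outside Kodi.

```python
import io
import os
import tempfile


def open_utf8(filename, mode='rt'):
    """Hypothetical helper: open a file whose path may contain non-ASCII characters.

    The path is encoded to UTF-8 bytes explicitly, so Python does not rely on
    the (possibly ASCII) filesystem encoding; the content is read and written
    as UTF-8.
    """
    return io.open(filename.encode('utf-8'), mode=mode, encoding='utf-8')


# Usage: write and read back a file with a non-ASCII name in a temp directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'café_ğüş.txt')
with open_utf8(path, 'wt') as f:
    f.write('İstanbul ünïcödé\n')
with open_utf8(path) as f:
    content = f.read()
```

On Linux a bytes path is passed straight to the OS, which is why encoding it yourself sidesteps whatever filesystem encoding Python has chosen.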

Normally, Python detects the filesystem encoding (used for filenames) and applies it automatically. However, a patch introduced in Kodi 19.2 (https://github.com/xbmc/xbmc/issues/19883) to work around what looks like a nasty Kodi string-handling problem with Turkish (and other) locales makes the filename encoding 'ASCII' instead of 'utf-8' (at least on Linux). This means you have to specify the encoding explicitly, at least until the underlying bug is fixed.
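To check what your environment is actually using, `sys.getfilesystemencoding()` (standard library) reports the encoding Python applies to filenames. The sketch below normalizes that name with `codecs.lookup` so aliases such as 'UTF8' and 'utf-8' compare equal; `needs_explicit_utf8` is a hypothetical helper, not part of Kodi or Python.

```python
import codecs
import sys


def needs_explicit_utf8(fs_encoding=None):
    """Hypothetical helper: True if the filesystem encoding is not UTF-8.

    When this returns True (e.g. 'ascii' under the Kodi 19.2 workaround on
    Linux), paths with non-ASCII characters should be encoded explicitly.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # codecs.lookup() maps aliases ('UTF8', 'utf_8', ...) to a canonical name.
    return codecs.lookup(fs_encoding).name != 'utf-8'


print(sys.getfilesystemencoding(), needs_explicit_utf8())
```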

I'm not sure how utf-8 filenames behave on different Windows versions, or on OSes that don't support utf-8 filenames. Most modern systems do support utf-8 paths.

Failing to use filename.encode('utf-8') can cause errors about out-of-range ASCII characters (typically a UnicodeEncodeError ending in "ordinal not in range(128)") when the filename contains non-ASCII characters.
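The kind of error described above can be reproduced without Kodi by encoding a non-ASCII filename with the ASCII codec, which is roughly what happens when the filesystem encoding is 'ascii'. `ascii_path_problem` is a hypothetical name used only for this illustration.

```python
def ascii_path_problem(filename):
    """Return Python's error message if filename cannot be encoded as ASCII, else None."""
    try:
        filename.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


print(ascii_path_problem('café.txt'))   # message ends with "ordinal not in range(128)"
print(ascii_path_problem('plain.txt'))  # None
```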

Issue 19883 is a cautionary tale about how subtle character comparison and related string handling can be in different languages: they don't always obey the rules we expect.
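As a general illustration (not a reproduction of issue 19883 itself), Python's Unicode case mappings already break some English-centric assumptions, for example with the Turkish dotless and dotted i and the German sharp s:

```python
dotless_i = '\u0131'  # Turkish dotless small i
dotted_I = '\u0130'   # Turkish capital I with dot above

print(dotless_i.upper())          # I
print(dotless_i.upper().lower())  # i  (does not round-trip back to dotless i)
print(len(dotted_I.lower()))      # 2  ('i' followed by U+0307 COMBINING DOT ABOVE)
print('\u00df'.upper())           # SS (one character becomes two)
```

Code that compares names with `lower()` or assumes one character maps to one character can therefore behave differently depending on the language of the text.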