Release [Module] youtube-dl - multi-site playable URL resolver
@fbacher thanks for the detailed post. Are you still working on this?

I don't know exactly how Kodi's utf-8 hack for Turkish works, but I do notice Kodi changing otherwise reasonable locale and filesystem encoding settings. In my addon I was able to download files with non-ASCII names (from yle, not youtube) and display them correctly within Kodi. I will give a quick explanation of my approach here in case it becomes useful.

Overview:
- Extract filename (and download link) into python string (unicode code points).
- Set download directory, using `xbmcvfs.translatePath()` if required.
- Append filename (utf-8 encoded) to the download directory to form the complete download path.
- Download file.
- Open filename (utf-8 encoded) and write downloaded data as binary data.
- Pass filename (utf-8 encoded) and path to Kodi as list item.

python:

import os
from urllib.parse import urlencode

import requests
import xbmcgui
import xbmcvfs

filename = "non-ääsci"
url = "https://example.xyz/video/12345"  # requests needs the scheme

# Download path
target_dir = xbmcvfs.translatePath("special://temp").encode("utf-8", "surrogateescape")
filepath = os.path.join(target_dir, filename.encode("utf-8", "surrogateescape"))

# Download remote file.
res = requests.get(url, stream=True)

# Open filename (utf-8 encoded) and write downloaded data as binary data.
with open(filepath, "wb") as file:
    for chunk in res.iter_content(chunk_size=1024**2):
        file.write(chunk)

# Create list item entry for downloaded file.
list_item = xbmcgui.ListItem(label=filename.encode("utf-8", "surrogateescape"))
# From memory it is best to give unicode to urlencode
attrs = {"local_file_playback": filepath.decode("utf-8", "surrogateescape")}
param_string = urlencode(attrs, encoding="utf-8", errors="surrogateescape")
callback_url = f"plugin://{addon_id}/?{param_string}"
listing = (callback_url, list_item, False)

Obviously this code is incomplete: it requires setting the relevant `addon_id` and `_handle`, and a function that handles the "local_file_playback" parameter to play the video.
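
A minimal sketch of how the "local_file_playback" parameter could survive the trip through the callback URL and come back as the original bytes path. It runs outside Kodi. The helper names `build_query` and `parse_query` are hypothetical, and it assumes the handler receives the query string with a leading "?" (as Kodi plugins typically do via `sys.argv[2]`). The actual playback call is left out.

```python
from urllib.parse import parse_qsl, urlencode


def build_query(filepath: bytes) -> str:
    """Encode a bytes path into a URL query string (hypothetical helper)."""
    # urlencode is given text; surrogateescape keeps invalid bytes recoverable.
    attrs = {"local_file_playback": filepath.decode("utf-8", "surrogateescape")}
    return urlencode(attrs, encoding="utf-8", errors="surrogateescape")


def parse_query(query: str) -> dict:
    """Decode the query string back into bytes values (hypothetical helper)."""
    pairs = parse_qsl(query.lstrip("?"), encoding="utf-8", errors="surrogateescape")
    # Re-encode each value so the handler gets the same bytes that were written.
    return {key: value.encode("utf-8", "surrogateescape") for key, value in pairs}


# A path holding valid utf-8 ("ää") plus one invalid byte (0xff).
original = b"/tmp/non-\xc3\xa4\xc3\xa4sci-\xff.mp4"
query = build_query(original)
params = parse_query("?" + query)
print(query)
print(params["local_file_playback"] == original)
```

Using the same encoding and error handler on both sides is what makes the round trip lossless here.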

In summary:
- Changes Kodi makes to Python locale are ignored by explicitly using utf-8 for filenames.
- Changes Kodi makes to Python filesystem encoding are ignored by writing the file as binary data, i.e. a video file is not a text format, so encoding is not an issue.
- Filenames and paths passed to Kodi functions are utf-8 encoded bytes.
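
The file-writing part of this can be tried outside Kodi. The following is only a sketch under assumptions: a temporary directory stands in for the Kodi download directory, and the payload is dummy data rather than a real download.

```python
import os
import tempfile

filename = "non-ääsci.bin"
payload = b"\x00\x01dummy video data"  # stand-in for the downloaded content

# Passing a bytes prefix makes mkdtemp return a bytes path.
target_dir = tempfile.mkdtemp(prefix=b"kodi-demo-")

# Explicit utf-8 encoding of the filename, independent of the Python locale.
filepath = os.path.join(target_dir, filename.encode("utf-8"))

# Binary mode: no text encoding is applied to the file content.
with open(filepath, "wb") as file:
    file.write(payload)

# Bytes directory in, bytes names out.
names = os.listdir(target_dir)

with open(filepath, "rb") as file:
    data = file.read()

print(names)
print(data == payload)
```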


Extra notes:
- Python os functions accept either bytes or unicode strings: return type matches input type.
- Everything in the OS is bytes (e.g. filenames), so use bytes as input to os functions.
- Not all bytes have valid unicode representations when decoded from utf-8.
- Since the underlying bytes can possibly be invalid utf-8, error handling is required.
  The surrogateescape error mode represents the invalid utf-8 bytes as reserved unicode code points.
  It is the only error mode in python that provides lossless handling of the invalid utf-8 bytes.
  e.g. b"\xff" == b"\xff".decode("utf-8", "surrogateescape").encode("utf-8", "surrogateescape")
- Many Kodi functions can accept either python unicode strings or utf-8 bytestrings, but they don't accept python unicode strings containing reserved code points (i.e. invalid utf-8 bytes decoded using the surrogateescape error handler).
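
The notes above can be checked in plain Python. This is a small sketch with a made-up filename containing one invalid byte. It shows nothing Kodi-specific, only the standard library behaviour.

```python
import os

# Valid utf-8 ("ää") followed by one byte (0xff) that is never valid utf-8.
raw = b"non-\xc3\xa4\xc3\xa4sci-\xff.mp4"

# Strict decoding rejects the invalid byte.
try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# surrogateescape maps the invalid byte to a reserved code point (U+DCFF).
text = raw.decode("utf-8", "surrogateescape")
roundtrip = text.encode("utf-8", "surrogateescape")

# The "replace" handler substitutes U+FFFD, so the original byte is lost.
lossy = raw.decode("utf-8", "replace").encode("utf-8")

# A string holding a surrogate cannot be encoded as strict utf-8, which is
# the kind of string that causes trouble when passed on.
try:
    text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# os.path functions return the same type they are given.
joined_bytes = os.path.join(b"downloads", raw)
joined_text = os.path.join("downloads", text)

print(strict_decode_ok, strict_encode_ok)
print(roundtrip == raw, lossy == raw)
print(type(joined_bytes).__name__, type(joined_text).__name__)
```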
RE: [Module] youtube-dl - multi-site playable URL resolver - by finnhubb - 2022-03-28, 16:01