Music library causing JSON encoding errors
#1
I'm just throwing this out in case it is a known issue; I don't know that it's worth spending time investigating.  I was working on an error in a Python 3 script that happens when a music library query result is converted to JSON but isn't correctly encoded in UTF-8.  From what I gather, the problem seems to be that there is string data in the library that is implicitly encoded in an 8-bit encoding such as Windows-1251 and is not converted to UTF-8 prior to insertion into the db.  So bytes with values 128-255 that do not form valid UTF-8 sequences are being passed into the JSON response.  When the system tries to decode the response as UTF-8, it fails because the byte stream is invalid.

So I guess my question is, are the db routines supposed to enforce utf-8 encoding on text during an update?

edit:  doing some more digging, the error I get is on byte 0x81, and I don't know what that byte is supposed to encode.
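For anyone wanting to reproduce the symptom outside Kodi, here is a minimal Python 3 sketch. The helper name `first_invalid_utf8` and the sample data are made up for illustration; it only shows how text stored in an 8-bit encoding, or a stray 0x81 byte, fails strict UTF-8 decoding, and it says nothing about where the byte in my library came from.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None when the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset at which the decoder gave up
        return exc.start, data[exc.start]
    return None


# Hypothetical sample: a Cyrillic word stored as Windows-1251 bytes
cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"
cp1251_bytes = cyrillic.encode("cp1251")

print(first_invalid_utf8(cp1251_bytes))              # 8-bit encoded text
print(first_invalid_utf8(b"abc\x81def"))             # lone 0x81 byte
print(first_invalid_utf8(cyrillic.encode("utf-8")))  # properly encoded text
```

The same word decodes cleanly once it has been encoded as UTF-8, which is why the failure looks data-dependent.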

scott s.
.
maintainer of skin  Aeon MQ5 mods for post-Gotham Kodi releases:
Krypton
Leia
#2
@scott967 can you post the query ?
Learning Linux the hard way !!
#3
Quote: are the db routines supposed to enforce utf-8 encoding on text during an update?
I have no idea @scott967, but a good question. From various posts it feels like a modern trend to have all kinds of fancy chars in data and file names. I'm just too old for all this I guess, it took me years to get used to being able to have spaces in names let alone emojis (a user asked).

I am keeping out of the file/folder name thing as best I can, but if you can post a concrete example of the data and the JSON call then I will take a look.
#4
(2019-11-22, 19:35)black_eagle Wrote: @scott967 can you post the query ?
Here is my thread on the issue 349455 (thread); the debug log uwazumitaw.kodi (paste) contains the query at line 887.  That thread gives the background on the issue and a PR which I interpret to be a work-around: it escapes C char data that isn't in a UTF-8 encoding so that Python can handle it (the Python 3 string data type is a sequence of Unicode code points, so Kodi must handle conversion of char or wchar in the Python interface).  I don't know that the issue is unique to the music database.  It just seemed to occur after a music update, so I assume it's data-dependent.  My assumption is that when the query results are formatted into JSON they should be a valid UTF-8 byte array at that point.  I didn't know if you all were aware of this issue.
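To illustrate the general idea of escaping non-UTF-8 bytes so Python 3 can hold them in a string, here is a rough sketch. It is not the code from the PR; the function name `to_json_safe_text` and the sample record are assumptions, and it uses Python's standard `backslashreplace` error handler as a stand-in for whatever the PR actually does.

```python
import json


def to_json_safe_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing any undecodable byte with a
    visible \\xNN escape instead of raising UnicodeDecodeError."""
    return raw.decode("utf-8", errors="backslashreplace")


# Hypothetical record containing one stray 0x81 byte
record = {"label": to_json_safe_text(b"Caf\x81 Album")}
encoded = json.dumps(record)

print(record["label"])
print(encoded)
```

Valid UTF-8 passes through unchanged. The escaped form is only a display aid: once decoded, an escaped byte can no longer be told apart from a literal backslash sequence in the original text, so this hides the bad data rather than fixing it.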

scott s.
#5
I for one was happily oblivious.

Can you share the data that the GetAlbumsByWhereJSON call would have returned? Running the query (listed in the debug log) in a SQLite browser should show it, assuming you still have the same library available. Seeing the data, or having some idea of where the offending byte was in the JSON data, will help me get my head around it better.
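If a SQLite browser hides the problem, a small script can hunt for the offending rows instead. The sketch below is only an illustration: the function `find_invalid_utf8_rows` is made up, and the table and column names (`album`, `strAlbum`) and the in-memory sample data are assumptions standing in for the real library file. It relies on the standard `sqlite3` module's `text_factory` setting to read TEXT values back as raw bytes.

```python
import sqlite3


def find_invalid_utf8_rows(conn, table, column):
    """Return [(rowid, raw_bytes)] for values in table.column that are
    not valid UTF-8. Table and column names are trusted input here."""
    old_factory = conn.text_factory
    conn.text_factory = bytes  # hand back TEXT values undecoded
    try:
        bad = []
        query = f"SELECT rowid, {column} FROM {table}"
        for rowid, value in conn.execute(query):
            if not isinstance(value, bytes):
                continue  # NULL or numeric value, nothing to check
            try:
                value.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((rowid, value))
        return bad
    finally:
        conn.text_factory = old_factory


# Hypothetical stand-in for the music database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (idAlbum INTEGER PRIMARY KEY, strAlbum TEXT)")
conn.execute("INSERT INTO album VALUES (1, ?)", ("Good Name",))
# CAST stores the raw bytes as TEXT without validating them
conn.execute("INSERT INTO album VALUES (2, CAST(? AS TEXT))", (b"Bad\x81Name",))

print(find_invalid_utf8_rows(conn, "album", "strAlbum"))
```

Pointing `sqlite3.connect` at a copy of the real database file, rather than `:memory:`, would be the obvious next step.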

GetAlbumsByWhereJSON builds the JSON results as a variant array, but nothing is done to ensure that the bytes are valid UTF-8. I don't think anywhere else checks it either.

Is this a Python 3 thing compared to Python 2?  If so, then the answer may lie in the Python interface side of things rather than where the JSON is created.