Kodi Community Forum

Full Version: How to get unicode from python to $INFO label
I have some code that uses this string:

Code:
u'Sigur R\xc3\xb3s'

(python repr())

...which would appear to be a utf-8 encoded unicode string (although I am very weak in this area!)

and I am setting that to a window property via:

Code:
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

(in a WindowXML)

I suspect I am going wrong somewhere basic but an arvo of researching various encoding things has got me no closer...

anyone have ideas??

...however, this results in gobbledygook on screen.
Try to convert it to a bytestring

s = u'Sigur R\xc3\xb3s'.encode('utf-8')
Unfortunately that doesn't work...same result.

Any other ideas - I think the info IS unicode utf-8, but I think maybe XBMC isn't interpreting it as such
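A short sketch of why .encode('utf-8') gives the same gobbledygook, assuming the string really is the u'Sigur R\xc3\xb3s' shown above. The unicode object already holds the two UTF-8 bytes as two separate characters (U+00C3 and U+00B3), so encoding it again turns each of them into two bytes. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# The value from the thread: a unicode string whose characters are really
# the UTF-8 bytes of the accented letter, one character per byte.
broken = u'Sigur R\xc3\xb3s'

# Encoding to UTF-8 encodes U+00C3 as C3 83 and U+00B3 as C2 B3,
# so the two wrong characters become four bytes (double encoding).
double_encoded = broken.encode('utf-8')

# What was wanted instead: the single character U+00F3 encoded once.
wanted = u'Sigur R\xf3s'.encode('utf-8')

print(repr(double_encoded))
print(repr(wanted))
```

The second value is the byte string that displayed correctly when it was passed as a plain 'Sigur R\xc3\xb3s'.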
Hmmm ok, passing it just artist = 'Sigur R\xc3\xb3s' WITHOUT making it a unicode string works!

That's odd...must be a double translation thing I guess?

Now, how to get the unicode strings into a basic string in python - i.e. cast them, I guess. I find this area a bit confusing....
The problem is I am using a downstream library and it is returning strings with these characters in them, so 'Sigur R\xc3\xb3s' - and these are typed as unicode.

If I then pass them as this type, they come out in xbmc wonky. I need to just cast them or get the literal value of the string...but I can't seem to just get the literal value from a unicode string in a variable...

I think I am missing something obvious but have been missing it for two days now and it's driving me nuts!

Any python experts know how to do this??
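One possible way to do that "cast" outside the library, as a sketch only. It assumes every character in the broken string is below U+0100, so that encoding as Latin-1 maps each character straight back to the byte it came from. The helper name repair_mojibake is made up for this example and is not part of any library mentioned here.

```python
# -*- coding: utf-8 -*-
def repair_mojibake(text):
    """Hypothetical helper: undo 'UTF-8 bytes stored as unicode characters'.

    Latin-1 maps code points 0-255 to the same byte values, so encoding
    with it recovers the original bytes, which are then decoded as UTF-8.
    If either step fails the text is returned unchanged.
    """
    try:
        raw = text.encode('latin-1')
        return raw.decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = u'Sigur R\xc3\xb3s'
fixed = repair_mojibake(broken)          # proper unicode text
as_utf8 = fixed.encode('utf-8')          # the byte string that displayed correctly
already_fine = repair_mojibake(u'Sigur R\xf3s')
```

The byte string in as_utf8 is the same value as the plain 'Sigur R\xc3\xb3s' that displayed correctly. Text that is already correct is left alone, because a lone \xf3 is not valid UTF-8 and the decode step fails.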
I thought it looked like a unicoded utf-8 string...

I use the following python code to ensure that the string is in utf-8 coding.
Code:
def get_unicode( to_decode ):
    # Return to_decode unchanged if it encodes cleanly; otherwise decode it
    # piece by piece, treating any byte that is not valid utf8 as latin1.
    final = []
    try:
        temp_string = to_decode.encode('utf8')
        return to_decode
    except:
        while True:
            try:
                final.append(to_decode.decode('utf8'))
                break
            except UnicodeDecodeError, exc:
                # everything up to crazy character should be good
                final.append(to_decode[:exc.start].decode('utf8'))
                # crazy character is probably latin1
                final.append(to_decode[exc.start].decode('latin1'))
                # remove already encoded stuff
                to_decode = to_decode[exc.start+1:]
        # all the pieces are unicode objects, so join them as unicode
        return u"".join(final)

Then I send the string to XBMC with a '.decode("utf-8")'. This shows the artist in the proper format (usually...).
mmm, that seemed to give me the same results. This might make it clearer (perhaps)!

Code:
title, artist, album = self.player.getCurrentTrack()
print "artist (raises exception about ordinal out of range if printed as is) "
print repr(artist)
artist2 = 'Sigur R\xc3\xb3s'
print "artist2 is " + artist2
print type(artist2)

#newa =self.get_unicode(artist)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTTITLE", title)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

and output:

Code:
14:06:58 T:756  NOTICE: artist (raises exception about ordinal out of range if printed as is)
14:06:58 T:756  NOTICE: u'Sigur R\xc3\xb3s'
14:06:58 T:756  NOTICE: artist2 is Sigur Rós
14:06:58 T:756  NOTICE: <type 'str'>

If I pass artist 2 - correct onscreen display

pass artist 1 - gobbledygook
What's the code in self.player.getCurrentTrack()? I think the problem is there. Without the u' prefix it works properly, as you say, but nothing seems to be able to strip it out.

Code:
artist = self.playlist[currentIndex]['artist']

...which is looking at the result of getplaylist:

    self.playlist = self.sb.playlist_get_info()

...

    def playlist_get_info(self):
        """Get info about the tracks in the current playlist"""
        amount = self.playlist_track_count()
        response = self.request('status 0 %i' % amount, True)
        encoded_list = response.split('playlist%20index')[1:]
        playlist = []
        for encoded in encoded_list:
            data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
            item = {}
            for info in data:
                info = info.split(':')
                key = info.pop(0)
                if key:
                    item[key] = ':'.join(info)
            item['position'] = int(item['position'])
            item['id'] = int(item['id'])
            item['duration'] = float(item['duration'])
            playlist.append(item)
        return playlist

and __unquote is:

    def __unquote(self, text):
        try:
            import urllib.parse
            return urllib.parse.unquote (text, encoding=self.charset)
        except ImportError:
            import urllib
            return urllib.unquote(text)

(It does raise the exception and falls through to just urllib.unquote(text) rather than the .parse version.)

I wrote basically none of those functions, they are from pysqueezecenter and I use this in lots of places, so ideally I want to fix it externally if I can...as if I change the output it will likely break other things.

I even tried using repr() on it and then stripping off the u' and the final ' in a gross hack but that didn't work...which surprised me.
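For anyone following along, here is a rough standalone sketch of what that parsing loop does to one playlist entry. It is not the pysqueezecenter code: parse_entry is a made-up name, it assumes Python 3 (where urllib.parse is available), and the sample string is a shortened entry of the kind the server returns.

```python
from urllib.parse import unquote


def parse_entry(encoded, charset='utf-8'):
    """Hypothetical helper: turn 'key%3Avalue key%3Avalue ...' into a dict.

    Fields are separated by spaces, each field is percent-decoded, and the
    first ':' splits the key from the value, so values may contain ':'.
    """
    item = {}
    for field in encoded.split(' '):
        key, _, value = unquote(field, encoding=charset).partition(':')
        if key:
            item[key] = value
    return item


sample = ('position%3A0 id%3A11144 title%3AIntro genre%3APop '
          'artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun '
          'duration%3A100.493')
entry = parse_entry(sample)
ip_entry = parse_entry('player_ip%3A192.168.1.9%3A49712')
```

Splitting on only the first ':' gives the same result as the pop(0) and ':'.join(info) steps in the original, which matters for values such as the player IP and port.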
I know you really don't want to change the coding, but can you change the response line to the following:
Code:
response = self.request('status 0 %i' % amount, False)

Unfortunately that breaks the entire function...the data that comes back from the server looks like:

Code:
response = self.request('status 0 %i' % amount, True)
print "response" + str(response)
encoded_list = response.split('playlist%20index')[1:]
playlist = []
for encoded in encoded_list:
    print "encoded" + encoded
    data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
    print "data" + str(data)


20:08:06 T:5232  NOTICE: response1 player_name%3ASqueezeslave player_connected%3A1 player_ip%3A192.168.1.9%3A49712 power%3A1 signalstrength%3A0 mode%3Astop time%3A0 rate%3A1 duration%3A603.826 can_seek%3A1 mixer%20volume%3A50 playlist%20repeat%3A0 playlist%20shuffle%3A0 playlist%20mode%3Aoff seq_no%3A0 playlist_cur_index%3A1 playlist_timestamp%3A1330160627.81035 playlist_tracks%3A11 playlist%20index%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493 playlist%20index%3A1 id%3A11145 title%3ASvefn-g-englar genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A603.826 playlist%20index%3A2 id%3A11146 title%3AStar%C3%A1lfur genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A406.933 playlist%20index%3A3 id%3A11147 title%3AFlugufrelsarinn genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A467.84 playlist%20index%3A4 id%3A11148 title%3AN%C3%BD%20batter%C3%AD genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A489.533 playlist%20index%3A5 id%3A11149 title%3AHjarta%C3%B0%20hamast%20(bamm%20bamm%20bamm) genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A430.546 playlist%20index%3A6 id%3A11150 title%3AVi%C3%B0ar%20vel%20tl%20loft%C3%A1rasa genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A617.013 playlist%20index%3A7 id%3A11151 title%3AOlsen%20Olsen genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A484.24 playlist%20index%3A8 id%3A11152 title%3A%C3%81g%C3%A6tis%20byrjun genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A474.653 playlist%20index%3A9 id%3A11153 title%3AAvalon genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A246.146 playlist%20index%3A10 id%3A19959 title%3ASvefn-G-Englar genre%3APop artist%3ASigur%20R%C3%B3s 
album%3AThe%20Pitchfork%20500 duration%3A604.081

20:08:06 T:5232  NOTICE: encoded%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493

20:08:06 T:5232  NOTICE: data[u'position:0', u'id:11144', u'title:Intro', u'genre:Pop', u'artist:Sigur R\xc3\xb3s', u'album:\xc3\x81g\xc3\xa6tis byrjun', u'duration:100.493', u'']
Found the problem. It's a bug in Python's urllib.unquote() function: http://bugs.python.org/issue8136.

Now to find the way to correct it...
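The symptom in the data line of the log can be reproduced in isolation. This sketch assumes Python 3, where urllib.parse.unquote takes an encoding argument, and uses encoding='latin-1' only to imitate what the Python 2 function appears to do with a unicode argument (one character per escaped byte). The variable names are illustrative.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = 'Sigur%20R%C3%B3s'   # the artist field as sent by the server

# Imitates the broken result: each %XX becomes one character with that
# code point, which gives the u'Sigur R\xc3\xb3s' seen in the log.
wrong = unquote(quoted, encoding='latin-1')

# Treating the escaped bytes as UTF-8 gives the intended text.
right = unquote(quoted, encoding='utf-8')

# The raw bytes behind both results are identical.
raw = unquote_to_bytes(quoted)

print(repr(wrong))
print(repr(right))
```

So the escaped bytes themselves are fine, and only the step that turns them into characters goes wrong.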

The easiest fix is to modify __unquote() in server.py from:
Code:
def __quote(self, text):
        try:
            import urllib.parse
            return urllib.parse.quote(text, encoding=self.charset)
        except ImportError:
            import urllib
            return urllib.quote(text)

TO
Code:
def __quote(self, text):
        try:
            import urllib.parse
            return urllib.parse.quote(text, encoding=self.charset)
        except ImportError:
            #import urllib
            #return urllib.quote(text)
            if isinstance(text, unicode):
                text = text.encode('utf-8')
            res = text.split('%')
            for i in xrange(1, len(res)):
                item = res[i]
                try:
                    res[i] = _hextochr[item[:2]] + item[2:]
                except KeyError:
                    res[i] = '%' + item
                except UnicodeDecodeError:
                    res[i] = unichr(int(item[:2], 16)) + item[2:]
            return "".join(res)


This puts the patched code to fix the urllib.quote() in place of calling the urllib.quote() code.
That looks like some amazing searching, and it is indeed this issue...

However, you seem to have modified __quote instead of __unquote - is that right?

I tried it as __unquote (change the name and the call to __unquote) - I am currently stuck on _hextochr not being recognised....
Yep, my bad... It should be in the __unquote() section.

Here's the real code (found the missing _hextochr):

Code:
def __unquote(self, text):
        try:
            import urllib.parse
            return urllib.parse.unquote(text, encoding=self.charset)
        except ImportError:
            #import urllib
            #return urllib.unquote(text)
            _hexdig = '0123456789ABCDEFabcdef'
            _hextochr = dict((a+b, chr(int(a+b,16))) for a in _hexdig for b in _hexdig)
            if isinstance(text, unicode):
                text = text.encode('utf-8')
            res = text.split('%')
            for i in xrange(1, len(res)):
                item = res[i]
                try:
                    res[i] = _hextochr[item[:2]] + item[2:]
                except KeyError:
                    res[i] = '%' + item
                except UnicodeDecodeError:
                    res[i] = unichr(int(item[:2], 16)) + item[2:]
            return "".join(res)
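
For testing the idea outside XBMC, here is a standalone sketch of the same byte-level approach. It is not the code above: unquote_to_utf8_bytes is a made-up name, it uses binascii.unhexlify instead of the _hextochr table, and it is written so that it should also run on Python 3.

```python
import binascii


def unquote_to_utf8_bytes(text):
    """Hypothetical helper: percent-decode text into a UTF-8 byte string.

    Unicode input is encoded to bytes first, so every %XX escape becomes
    one raw byte and no implicit character decoding takes place.
    """
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    parts = text.split(b'%')
    out = [parts[0]]
    for item in parts[1:]:
        pair = item[:2]
        if len(pair) == 2:
            try:
                out.append(binascii.unhexlify(pair) + item[2:])
                continue
            except (TypeError, ValueError):
                pass  # not two hex digits, fall through and keep the '%'
        out.append(b'%' + item)
    return b''.join(out)


artist = unquote_to_utf8_bytes(u'Sigur%20R%C3%B3s')
album = unquote_to_utf8_bytes(u'%C3%81g%C3%A6tis%20byrjun')
```

The result is a plain UTF-8 byte string, the same kind of value as the 'Sigur R\xc3\xb3s' that displayed correctly earlier in the thread. A stray '%' that is not followed by two hex digits is kept as it is.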
Give that man a cigar...

Yep, that works, and has the by-product of changing some other funky-ness in my code to something much simpler & neater.

Many many thanks mate, you went above and beyond.