How to get unicode from python to $INFO label
#1
I have some code that uses this string:

Code:
u'Sigur R\xc3\xb3s'

(python repr())

...which would appear to be a UTF-8 encoded unicode string (although I am very weak in this area!)

I am setting that to a window property via:

Code:
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

(in a WindowXML)

...however, this results in gobbledygook on screen.

I suspect I am going wrong somewhere basic, but an arvo of researching various encoding things has got me no closer...

Anyone have ideas??
Addons I wrote &/or maintain:
OzWeather (Australian BOM weather) | Check Previous Episode | Playback Resumer | Unpause Jumpback | XSqueezeDisplay | (Legacy - XSqueeze & XZen)
Sorry, no help w/out a *full debug log*.
#2
Try to convert it to a bytestring

s = u'Sigur R\xc3\xb3s'.encode('utf-8')
#3
Unfortunately that doesn't work...same result.

Any other ideas - I think the info IS unicode utf-8, but I think maybe XBMC isn't interpreting it as such
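
For anyone following along, here is a minimal sketch of why .encode('utf-8') probably gives the same result. It is written for Python 3 (under Python 2 the first literal would carry a u prefix), and the variable names are only illustrative. The string already holds one code point per UTF-8 byte, so encoding it again double-encodes the accented character:

```python
# Python 3 sketch; under Python 2 the literal would be u'Sigur R\xc3\xb3s'.
# U+00C3 and U+00B3 are standing in for the two UTF-8 bytes of "o with acute".
mojibake = 'Sigur R\xc3\xb3s'

# Encoding again turns each stray code point into two bytes (double encoding).
double_encoded = mojibake.encode('utf-8')
print(repr(double_encoded))

# What a correctly encoded name looks like ('\xf3' is the real accented o).
expected = 'Sigur R\xf3s'.encode('utf-8')
print(repr(expected))
```

The two byte strings differ, which matches the garbled display.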
#4
Hmmm, OK: passing it just artist = 'Sigur R\xc3\xb3s' WITHOUT making it a unicode string works!

That's odd...must be a double translation thing I guess?

Now, how do I get the unicode strings into a basic string in Python, i.e. cast them, I guess? I find this area a bit confusing....
#5
The problem is that I am using a downstream library that returns strings with these characters in them, e.g. 'Sigur R\xc3\xb3s', and these are typed as unicode.

If I then pass them as this type, they come out wonky in XBMC. I need to just cast them or get the literal value of the string...but I can't seem to get the literal value from a unicode string in a variable...

I think I am missing something obvious but have been missing it for two days now and it's driving me nuts!

Any python experts know how to do this??
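
One way to get at the "literal value" is sketched below, assuming every code point in the string is below 256 (which holds for the example in this thread). Latin-1 maps each such code point back to exactly one byte, so encoding with it recovers the original UTF-8 bytes. The helper name is hypothetical and the code is written for Python 3; under Python 2 the same .encode('latin-1') call on the unicode object should yield the plain str form.

```python
def unicode_wrapped_bytes_to_raw(text):
    """Hypothetical helper: turn each code point (all < 256) back into one byte.

    Raises UnicodeEncodeError if the text contains a code point above 255,
    which would mean the string was not byte-per-code-point to begin with.
    """
    return text.encode('latin-1')


artist = 'Sigur R\xc3\xb3s'              # u'Sigur R\xc3\xb3s' under Python 2
raw = unicode_wrapped_bytes_to_raw(artist)
print(repr(raw))                          # the underlying UTF-8 bytes

fixed = raw.decode('utf-8')               # proper text, one code point for the accented o
print(len(artist), len(fixed))
```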
#6
I thought it looked like a unicoded utf-8 string...

I use the following Python code to ensure that the string is in UTF-8 coding.
Code:
def get_unicode( to_decode ):
    # Return to_decode unchanged if it encodes cleanly; otherwise decode it
    # as utf8, falling back to latin1 for any byte that is not valid utf8.
    final = []
    try:
        to_decode.encode('utf8')
        return to_decode
    except UnicodeError:
        while True:
            try:
                final.append(to_decode.decode('utf8'))
                break
            except UnicodeDecodeError as exc:
                # everything up to crazy character should be good
                final.append(to_decode[:exc.start].decode('utf8'))
                # crazy character is probably latin1
                final.append(to_decode[exc.start].decode('latin1'))
                # remove already encoded stuff
                to_decode = to_decode[exc.start+1:]
        return "".join(final)

Then I send the string to XBMC with a '.decode("utf-8")'. This shows the artist in the proper format (usually...)
#7
mmm, that seemed to give me the same results. This might make it clearer (perhaps)!

Code:
title, artist, album = self.player.getCurrentTrack()
print "artist (raises exception about ordinal out of range if printed as is) "
print repr(artist)
artist2 = 'Sigur R\xc3\xb3s'
print "artist2 is " + artist2
print type(artist2)

#newa =self.get_unicode(artist)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTTITLE", title)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

and output:

Code:
14:06:58 T:756  NOTICE: artist (raises exception about ordinal out of range if printed as is)
14:06:58 T:756  NOTICE: u'Sigur R\xc3\xb3s'
14:06:58 T:756  NOTICE: artist2 is Sigur Rós
14:06:58 T:756  NOTICE: <type 'str'>

If I pass artist2 - correct onscreen display

If I pass artist - gobbledygook
#8
What's the code in self.player.getCurrentTrack()? I think the problem is there. Without the u' prefix it works properly, as you say, but nothing seems to be able to strip it out.

#9
Code:
artist = self.playlist[currentIndex]['artist']

...which is looking at the result of getplaylist:

    self.playlist = self.sb.playlist_get_info()

...

    def playlist_get_info(self):
        """Get info about the tracks in the current playlist"""
        amount = self.playlist_track_count()
        response = self.request('status 0 %i' % amount, True)
        encoded_list = response.split('playlist%20index')[1:]
        playlist = []
        for encoded in encoded_list:
            data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
            item = {}
            for info in data:
                info = info.split(':')
                key = info.pop(0)
                if key:
                    item[key] = ':'.join(info)
            item['position'] = int(item['position'])
            item['id'] = int(item['id'])
            item['duration'] = float(item['duration'])
            playlist.append(item)
        return playlist

and __unquote is:

    def __unquote(self, text):
        try:
            import urllib.parse
            return urllib.parse.unquote (text, encoding=self.charset)
        except ImportError:
            import urllib
            return urllib.unquote(text)

(It does raise the exception and falls through to just urllib.unquote(text) rather than the .parse version.)

I wrote basically none of those functions, they are from pysqueezecenter and I use this in lots of places, so ideally I want to fix it externally if I can...as if I change the output it will likely break other things.

I even tried using repr() on it and then stripping off the u' and the final ' in a gross hack but that didn't work...which surprised me.
#10
I know you really don't want to change the coding, but can you change the response line to the following:
Code:
response = self.request('status 0 %i' % amount, False)

#11
Unfortunately that breaks the entire function...the data that comes back from the server looks like:

Code:
response = self.request('status 0 %i' % amount, True)
print "response" + str(response)
encoded_list = response.split('playlist%20index')[1:]
playlist = []
for encoded in encoded_list:
    print "encoded" + encoded
    data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
    print "data" + str(data)


20:08:06 T:5232  NOTICE: response1 player_name%3ASqueezeslave player_connected%3A1 player_ip%3A192.168.1.9%3A49712 power%3A1 signalstrength%3A0 mode%3Astop time%3A0 rate%3A1 duration%3A603.826 can_seek%3A1 mixer%20volume%3A50 playlist%20repeat%3A0 playlist%20shuffle%3A0 playlist%20mode%3Aoff seq_no%3A0 playlist_cur_index%3A1 playlist_timestamp%3A1330160627.81035 playlist_tracks%3A11 playlist%20index%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493 playlist%20index%3A1 id%3A11145 title%3ASvefn-g-englar genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A603.826 playlist%20index%3A2 id%3A11146 title%3AStar%C3%A1lfur genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A406.933 playlist%20index%3A3 id%3A11147 title%3AFlugufrelsarinn genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A467.84 playlist%20index%3A4 id%3A11148 title%3AN%C3%BD%20batter%C3%AD genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A489.533 playlist%20index%3A5 id%3A11149 title%3AHjarta%C3%B0%20hamast%20(bamm%20bamm%20bamm) genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A430.546 playlist%20index%3A6 id%3A11150 title%3AVi%C3%B0ar%20vel%20tl%20loft%C3%A1rasa genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A617.013 playlist%20index%3A7 id%3A11151 title%3AOlsen%20Olsen genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A484.24 playlist%20index%3A8 id%3A11152 title%3A%C3%81g%C3%A6tis%20byrjun genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A474.653 playlist%20index%3A9 id%3A11153 title%3AAvalon genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A246.146 playlist%20index%3A10 id%3A19959 title%3ASvefn-G-Englar genre%3APop artist%3ASigur%20R%C3%B3s 
album%3AThe%20Pitchfork%20500 duration%3A604.081

20:08:06 T:5232  NOTICE: encoded%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493

20:08:06 T:5232  NOTICE: data[u'position:0', u'id:11144', u'title:Intro', u'genre:Pop', u'artist:Sigur R\xc3\xb3s', u'album:\xc3\x81g\xc3\xa6tis byrjun', u'duration:100.493', u'']
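
To make the log above easier to follow, here is a rough sketch of the field parsing, written for Python 3 where urllib.parse.unquote takes an encoding argument. The function name parse_track is hypothetical and only stands in for the inner loop of playlist_get_info(); it is not part of pysqueezecenter. Each space-separated field is percent-decoded as UTF-8 and split on the first colon:

```python
from urllib.parse import unquote


def parse_track(chunk):
    """Hypothetical stand-in for the inner loop of playlist_get_info()."""
    item = {}
    for field in chunk.split(' '):
        # Percent-decode the field, interpreting the escaped bytes as UTF-8.
        decoded = unquote(field, encoding='utf-8')
        # Split on the first colon only, so values may contain colons.
        key, _, value = decoded.partition(':')
        if key:
            item[key] = value
    return item


chunk = ('id%3A11144 title%3AIntro genre%3APop '
         'artist%3ASigur%20R%C3%B3s duration%3A100.493')
track = parse_track(chunk)
print(sorted(track))
```

With UTF-8 decoding the artist comes out as a single accented character rather than the two-code-point form shown in the log.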
#12
Found the problem. It's a bug in Python's urllib.unquote() function -> http://bugs.python.org/issue8136.
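
The data list logged earlier in the thread is consistent with each percent-escaped byte becoming its own code point, as a Latin-1 decode would do. A small Python 3 sketch (not the Python 2 code path itself, just an illustration using urllib.parse) reproduces both outcomes from the same quoted text:

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = 'Sigur%20R%C3%B3s'

# The raw bytes behind the percent escapes.
raw = unquote_to_bytes(quoted)

# Treating each escaped byte as its own character gives the garbled form.
as_latin1 = unquote(quoted, encoding='latin-1')

# Treating the escaped bytes as UTF-8 gives the intended name.
as_utf8 = unquote(quoted, encoding='utf-8')

print(repr(raw), len(as_latin1), len(as_utf8))
```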

Now to find the way to correct it...

The easiest fix is to modify __unquote() in server.py from:
Code:
def __quote(self, text):
        try:
            import urllib.parse
            return urllib.parse.quote(text, encoding=self.charset)
        except ImportError:
            import urllib
            return urllib.quote(text)

TO
Code:
def __quote(self, text):
        try:
            import urllib.parse
            return urllib.parse.quote(text, encoding=self.charset)
        except ImportError:
            #import urllib
            #return urllib.quote(text)
            if isinstance(text, unicode):
                text = text.encode('utf-8')
            res = text.split('%')
            for i in xrange(1, len(res)):
                item = res[i]
                try:
                    res[i] = _hextochr[item[:2]] + item[2:]
                except KeyError:
                    res[i] = '%' + item
                except UnicodeDecodeError:
                    res[i] = unichr(int(item[:2], 16)) + item[2:]
            return "".join(res)


This puts the patched code inline, in place of the call to urllib.quote().
#13
That looks like some amazing searching and indeed this issue...

However, you seem to have modified __quote instead of __unquote - is that right?

I tried it as __unquote (change the name and the call to __unquote) - I am currently stuck on _hextochr not being recognised....
#14
Yep, my bad... It should be in the __unquote() section.

Here's the real code (found the missing _hextochr):

Code:
def __unquote(self, text):
        try:
            import urllib.parse
            return urllib.parse.unquote(text, encoding=self.charset)
        except ImportError:
            #import urllib
            #return urllib.unquote(text)
            _hexdig = '0123456789ABCDEFabcdef'
            _hextochr = dict((a+b, chr(int(a+b,16))) for a in _hexdig for b in _hexdig)
            if isinstance(text, unicode):
                text = text.encode('utf-8')
            res = text.split('%')
            for i in xrange(1, len(res)):
                item = res[i]
                try:
                    res[i] = _hextochr[item[:2]] + item[2:]
                except KeyError:
                    res[i] = '%' + item
                except UnicodeDecodeError:
                    res[i] = unichr(int(item[:2], 16)) + item[2:]
            return "".join(res)
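
For reference, a rough Python 3 port of the same split-on-% idea is sketched below. The names unquote_to_utf8, _HEXDIG and _HEX_TO_BYTE are illustrative and not part of pysqueezecenter, and unlike the Python 2 version above it returns decoded text rather than a byte string, since str is Unicode in Python 3:

```python
_HEXDIG = '0123456789ABCDEFabcdef'
# Map every two-digit hex pair (as bytes) to the single byte it encodes.
_HEX_TO_BYTE = {
    (a + b).encode('ascii'): bytes([int(a + b, 16)])
    for a in _HEXDIG for b in _HEXDIG
}


def unquote_to_utf8(text):
    """Percent-decode text, treating the escaped bytes as UTF-8."""
    parts = text.encode('utf-8').split(b'%')
    out = [parts[0]]
    for item in parts[1:]:
        try:
            # Valid escape: replace the two hex digits with the byte they encode.
            out.append(_HEX_TO_BYTE[item[:2]] + item[2:])
        except KeyError:
            # Not a valid escape: keep the percent sign and the text as they were.
            out.append(b'%' + item)
    # Strict decode: raises UnicodeDecodeError if the bytes are not valid UTF-8.
    return b''.join(out).decode('utf-8')


album = unquote_to_utf8('%C3%81g%C3%A6tis%20byrjun')
artist = unquote_to_utf8('Sigur%20R%C3%B3s')
print(len(album), len(artist))
```

A stray percent sign that is not followed by two hex digits is passed through unchanged, as in the original.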
#15
Give that man a cigar...

Yep, that works, and has the by-product of changing some other funky-ness in my code to something much simpler & neater.

Many many thanks mate, you went above and beyond.