Question: problem about encoding (i think utf-8)
#1
I'm trying to get XML data from a movie site,

but when the language is not English, XBMC can't display it.
Example of the XML data:
Code:
<PartNumber>จบ</PartNumber>
<LastUpdated>15/9/2555</LastUpdated>
<PartNumber>19</PartNumber>
<LastUpdated>15/9/2555</LastUpdated>

Picture 1: in my virtualenv the text is displayed correctly:
Image

Picture 2: when it is shown in XBMC, the text is not correct:
Image

XBMC can show this language correctly when it comes from settings.xml, but this text is not a setting, it is data.

Please help me!

edit:
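This is my function to decode UTF-8:
Code:
def __decodefunction(text):
    # Decode a UTF-8 byte string to unicode (Python 2).
    try:
        return unicode(text, 'utf-8')
    except TypeError:
        # text is already unicode, so return it unchanged
        return text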
#2
Can we see the full source of your xml? XBMC should handle utf-8 just fine without decoding to unicode
#3
(2013-06-02, 19:40)Bstrdsmkr Wrote: Can we see the full source of your xml? XBMC should handle utf-8 just fine without decoding to unicode

Yes, sir.

I use xbmcswift2.

This is the function that builds the data returned to addon.py.
episode['part'] is where the problem is.
Code:
def get_episodes(episode_id):
    url = MAIN_URL + '?type=serielink&ps=100&id=%d' % int(episode_id)
    tree, html = __get_tree(url)
    pattern = re.compile("\<Link\>(?P<link>[^\<]*)\<\/Link\>\s*")
    total = re.finditer(pattern,html)
    bufferr = []
    for i in total:
        bufferr.append({'source' : i.group('link')})
    episodes = [{
        'title': item.seriename.string,
        'part': item.partnumber.string,
        'date': item.lastupdated.string
    } for item in tree.findAll('item')]
    count = 0
    for i in episodes:
        i.update(bufferr[count])
        count+=1
    log('_get_episodes got %d item' % len(episodes))
    return episodes

And this is my scraper:
Code:
def __get_tree(url):
    log('__get_tree opening url: %s' % url)
    req = urllib2.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    try:
        html = urllib2.urlopen(req).read()
    except urllib2.HTTPError, error:
        raise NetworkError('HTTPError: %s' % error)
    log('__get_tree got %d bytes' % len(html))
    tree = BeautifulSoup(html, convertEntities=BeautifulSoup.XML_ENTITIES)
    return tree, html
#4
Sorry, we need to see the xml you're parsing to see what the encoding looks like
#5
Maybe the font of the skin you're using doesn't support those characters?

Test it in Confluence and select the 'Arial based' font.
#6
When I try to put this language directly in my code, it raises an error.

At the top of the file I use:
Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-

For example, in addon.py:
Code:
@plugin.route('/')
def index():
    items = [
        {'label': 'ภาษาไทย',
        'path': plugin.url_for('show_series', stype='us')}
    ]
    return plugin.finish(items)

Code:
12:19:48 T:9716   ERROR: EXCEPTION Thrown (PythonToCppException) : -->Python callback/script returned the following error<--
                                             - NOTE: IGNORING THIS CAN LEAD TO MEMORY LEAKS!
                                            Error Type: <type 'exceptions.UnicodeDecodeError'>
                                            Error Contents: 'ascii' codec can't decode byte 0xe0 in position 2: ordinal not in range(128)
                                            Traceback (most recent call last):
                                              File "C:\Users\Kanin\AppData\Roaming\XBMC\addons\plugin.video.\addon.py", line 232, in <module>
                                                plugin.run()
                                              File "C:\Users\Kanin\AppData\Roaming\XBMC\addons\script.module.xbmcswift2\lib\xbmcswift2\plugin.py", line 332, in run
                                                items = self._dispatch(self.request.path)
                                              File "C:\Users\Kanin\AppData\Roaming\XBMC\addons\script.module.xbmcswift2\lib\xbmcswift2\plugin.py", line 306, in _dispatch
                                                listitems = view_func(**items)
                                              File "C:\Users\Kanin\AppData\Roaming\XBMC\addons\plugin.video.\addon.py", line 120, in show_names
                                                } for i, name in enumerate(names)])
                                            UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 2: ordinal not in range(128)
                                            -->End of Python script error report<--
#7
Looks like you're using code from one of my add-ons *g*

Without seeing the full code, my guess is that you are trying to put a unicode character into an (xbmcswift2) route or query string.
#8
If you can give a link to the source you're pulling from, I can help you. You need to know the encoding the site is sending it in. From there you need to get it into utf-8. That line at the top only helps your editor, it doesn't affect the code
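As a rough illustration of that step: if the site turned out to send a legacy Thai encoding, the bytes could be decoded with that codec and re-encoded as UTF-8. The encoding cp874 is assumed here purely for the example (the thread never states what the site really sends), and transcode_to_utf8 is a hypothetical helper name.

```python
def transcode_to_utf8(raw, source_encoding):
    """Decode bytes from the site's encoding, then re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode('utf-8')


# Simulated response body: the two Thai characters from the XML sample,
# encoded as cp874 (an assumption, only for illustration).
sample = u'\u0e08\u0e1a'.encode('cp874')

as_utf8 = transcode_to_utf8(sample, 'cp874')
print(repr(as_utf8))
```

If the site already sends UTF-8, this step is unnecessary and the bytes can be used as they are.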
#9
(2013-06-03, 16:39)Bstrdsmkr Wrote: ...That line at the top only helps your editor, it doesn't affect the code...

That's a common mistake. See PEP 263.

Quote: This PEP proposes to introduce a syntax to declare the encoding of
a Python source file. The encoding information is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of Unicode literals in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in an Unicode aware editor.
#10
(2013-06-03, 16:11)sphere Wrote: Looks like you use code from any of my add-ons *g*

Without seeing the full code I guess you are trying to put any unicode char as (xbmcswift2) route or querystring.

Yes, sir, I'm using xbmcswift2.

This is my unicode string: u'\u0e08\u0e1a'
---------------------------------------------------------------

When I get the HTML:
Code:
html = BeautifulSoup(html, convertEntities=BeautifulSoup.XML_ENTITIES)

I think the unicode gets changed when I use:
Code:
tests = []
for item in html.findAll('item'):
    tests.append({
        'title': item.seriename.string
    })
print tests
When I print tests, it prints the items in the list as u'aaaa', u'\u0e08\u0e1a':

[{'title': u'\u0e08\u0e1a'}]

How can I fix it? Please help me.
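A short sketch of why the list output looks like that, assuming the title really is the two Thai characters from the XML sample: in Python 2, printing a list shows the repr() of each element, and the repr of a unicode string uses \uXXXX escapes. The escapes are only a notation; the string itself still holds the two characters.

```python
title = u'\u0e08\u0e1a'  # the value shown in the printed list

# ASCII-only notation, like the one seen when the list is printed
escaped = title.encode('unicode_escape')

# The same text as UTF-8 bytes
utf8_bytes = title.encode('utf-8')

print(len(title))        # number of characters in the string
print(repr(escaped))
print(repr(utf8_bytes))
```

So the output in the post does not by itself show that BeautifulSoup damaged the text.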
#11
BeautifulSoup always returns unicode, you can directly use it for any listitem-property (label, plot, ...) except the listitem path (or any other path-related properties like context-menu-url) or any log output (like any print, plugin.log(), xbmc.log()).
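A minimal sketch of that rule, assuming a small helper is used at the places that need byte strings. The name to_utf8 is hypothetical and not part of xbmcswift2 or BeautifulSoup.

```python
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def to_utf8(value):
    """Return UTF-8 bytes for unicode text; pass byte strings through unchanged."""
    if isinstance(value, TEXT_TYPE):
        return value.encode('utf-8')
    return value


title = u'\u0e08\u0e1a'     # e.g. what item.seriename.string returns

label = title               # label, plot, ...: unicode is used directly
log_value = to_utf8(title)  # paths and log output: encode first
```

Passing byte strings through unchanged avoids encoding a value twice, which in Python 2 would trigger an implicit ASCII decode.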
#12
(2013-06-03, 17:19)sphere Wrote: BeautifulSoup always returns unicode, you can directly use it for any listitem-property (label, plot, ...) except the listitem path (or any other path-related properties like context-menu-url) or any log output (like any print, plugin.log(), xbmc.log()).

I use it for "label".

Is there a way to fix it (by calling .encode('utf-8') on it)?

Thank you very much for your answer.

Are you the creator of xbmcswift2?

Sorry for my bad English.
#13
If it's always Unicode, then item.seriename.string.encode('utf-8') should do the job. UTF-8 can be printed and used in list items.
#14
(2013-06-03, 17:47)Bstrdsmkr Wrote: If it's always Unicode, then item.seriename.string.encode('utf-8')
Should do the job. Utf-8 can be printed and used in list items

I tried it, but it raises an error:
Code:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
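The message can be reproduced outside XBMC. The UTF-8 bytes for the Thai text start with 0xe0, and the ASCII codec rejects any byte above 0x7f. In Python 2 that ASCII decode happens implicitly when .encode('utf-8') is called on a value that is already a byte string, or when byte strings and unicode are mixed. That is one plausible cause of the error above, but the thread does not confirm it. The sketch below performs the decode explicitly so the failure is visible:

```python
title = u'\u0e08\u0e1a'
raw = title.encode('utf-8')  # already-encoded UTF-8 bytes

try:
    # Python 2 runs this step silently before re-encoding a byte string
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as error:
    message = str(error)

print(message)
```

If this is what happens in the add-on, the value being encoded is already a byte string at that point, so checking its type before calling .encode() would narrow it down.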
#15
(2013-06-03, 16:51)sphere Wrote:
(2013-06-03, 16:39)Bstrdsmkr Wrote: ...That line at the top only helps your editor, it doesn't affect the code...

That's a common mistake. See PEP0263.

Quote: This PEP proposes to introduce a syntax to declare the encoding of
a Python source file. The encoding information is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of Unicode literals in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in an Unicode aware editor.

Sorry, bad wording on my part. I meant that it doesn't change the encoding of the strings at run time.
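A small sketch of that distinction, using compile() on an in-memory source so the declaration can be varied. The helper name literal_length is hypothetical. The same two raw bytes inside a unicode literal become one character or two depending on the declared source encoding:

```python
def literal_length(declared_encoding):
    """Compile a tiny source whose u'' literal holds the raw bytes 0xC3 0xA9."""
    source = (b"# -*- coding: " + declared_encoding + b" -*-\n"
              b"s = u'\xc3\xa9'\n")
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return len(namespace['s'])


print(literal_length(b'utf-8'))    # bytes read as one UTF-8 character
print(literal_length(b'latin-1'))  # bytes read as two Latin-1 characters
```

Only literals written in the source file are affected. Bytes that arrive at run time, for example from urllib2, keep whatever encoding the server used.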

I'm on my phone so I can't look at the code right now, but it sounds sort of like xbmcswift is implicitly encoding to ascii somewhere
