[Release] Parsedom and other functions


+- Kodi Community Forum (https://forum.kodi.tv)
+-- Forum: Development (https://forum.kodi.tv/forumdisplay.php?fid=32)
+--- Forum: Add-ons (https://forum.kodi.tv/forumdisplay.php?fid=26)
+--- Thread: [Release] Parsedom and other functions (/showthread.php?tid=116498)


RE: [Release] Parsedom and other functions - bossanova808 - 2012-09-20

Yep, looks like this broke my addon as well. What happened to 'we do unit testing' blah blah? Seems like a big regression really... even if you extend it to do UTF-8, surely the default behaviour should remain what it was??

Sorry, that sounds rude; I didn't mean it to be. Parsedom is really nifty and made things very easy to write, but this is a big-ish change without forewarning...

RE: [Release] Parsedom and other functions - bossanova808 - 2012-09-20

I adjusted my addon to cope with the new unicode strings and it's all working well again

RE: [Release] Parsedom and other functions - newatv2user - 2012-09-20

How did you fix it? I am getting errors too.

"UnicodeDecodeError: 'ascii' codec can't decode byte.........."

RE: [Release] Parsedom and other functions - bossanova808 - 2012-09-21

For me it wasn't really errors in fetchpage itself, it was my crappy code. This was the first Python I ever wrote, and I was calling member functions of str directly on the type (e.g. str.strip(mystr)) rather than on the string object. All the str functions exist for unicode as well, so I just called them on the string object instead, i.e. mystr.strip().
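The str.strip(mystr) trap can be reproduced outside XBMC. This sketch is written for Python 3, where bytes plays the role of Python 2's str and str plays the role of Python 2's unicode; the two helper names are made up for illustration.

```python
def strip_via_type(value):
    # Old pattern: the method is looked up on the byte-string type itself.
    # Python 2's str corresponds to Python 3's bytes, so this mirrors
    # calling str.strip(mystr) in Python 2.
    return bytes.strip(value)


def strip_via_object(value):
    # Fix: the method is looked up on value's own type, so it works for
    # both text (str) and byte strings (bytes).
    return value.strip()
```

strip_via_type only accepts bytes and raises TypeError for text, just as Python 2's str.strip(u" x ") refuses a unicode object. Calling the method on the object lets Python pick the right implementation for whichever string type parsedom returns.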

Honestly, I looked in there and was embarrassed at what I saw!! It was quite a while ago now, but geez, some ugly code is in there. Still, it works; indeed, if I'd just written it properly to start with, it would have worked even with the parsedom changes, so my bad really.

In your case, maybe pinch the fetchpage function from the older parsedom as an ugly workaround to keep going??

(At a guess on your issue: I would say you're getting strings back with embedded unicode in them. There's a well-known bug for this in Python that requires a hack to get around; I came across it in XSqueeze and solved it with this function):

Code:
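def unquoteUni(text, charset='utf-8'):
    # Percent-decode text that may contain non-ASCII (e.g. UTF-8) bytes.
    try:
        # Python 3: urllib.parse.unquote decodes with the given charset.
        import urllib.parse
        return urllib.parse.unquote(text, encoding=charset)
    except ImportError:
        # Python 2 fallback, adapted from urllib.unquote.
        _hexdig = '0123456789ABCDEFabcdef'
        _hextochr = dict((a + b, chr(int(a + b, 16)))
                         for a in _hexdig for b in _hexdig)
        if isinstance(text, unicode):
            # Work on bytes so each %XX escape maps to one raw byte.
            text = text.encode(charset)
        res = text.split('%')
        for i in xrange(1, len(res)):
            item = res[i]
            try:
                res[i] = _hextochr[item[:2]] + item[2:]
            except KeyError:
                # Not a valid escape: keep the literal '%'.
                res[i] = '%' + item
            except UnicodeDecodeError:
                res[i] = unichr(int(item[:2], 16)) + item[2:]
        return "".join(res)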

giftie helped me find (or wrote? dude is awesome) that function, which deals with unicode in strings. It is for the case where you have a str type that actually contains encoded unicode, like 'The message is \xe8\x91\xa3' or similar. Those bytes are UTF-8 encoded (even though the type is str) and fall outside the ASCII range, hence the error you get above. The normal unquote does not handle this...
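On Python 3 the same failure can be reproduced with the standard library alone. This is a minimal sketch, not part of parsedom or XSqueeze; decode_or_none is a hypothetical helper. Decoding the percent-escapes as latin-1 gives one character per byte, which is roughly what Python 2's urllib.unquote produced when handed a unicode string.

```python
from urllib.parse import unquote

# UTF-8 bytes for the CJK character U+8463, as they might arrive in a page.
raw = b"The message is \xe8\x91\xa3"


def decode_or_none(data, charset):
    """Decode bytes with the given charset; return None if that fails."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


as_ascii = decode_or_none(raw, "ascii")  # fails: byte 0xe8 is >= 128
as_utf8 = decode_or_none(raw, "utf-8")   # succeeds: one CJK character

# Percent-escaped form of the same three bytes, as seen in a URL.
escaped = "%E8%91%A3"
right = unquote(escaped)                      # UTF-8 is the default charset
wrong = unquote(escaped, encoding="latin-1")  # one character per byte
```

The bytes are fine; the problem is only which charset is used to turn them into text.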

RE: [Release] Parsedom and other functions - newatv2user - 2012-09-21

Sorry, but I don't understand. Here are more details about what I'm facing.

This is the portion of my code:

Code:
show_page = 'http://www.vice.com/shows'
try:
    soup = get_remote_data(show_page)
except HTTPError:
    return ''
story_list = common.parseDOM(soup, "ul", attrs={"class": "story_list.*?"})

And this is the error I'm getting:

Code:
10:50:14 T:5592 ERROR: Error Type: <type 'exceptions.UnicodeDecodeError'>
10:50:14 T:5592 ERROR: Error Contents: 'ascii' codec can't decode byte 0xd0 in position 8508: ordinal not in range(128)
10:50:14 T:5592 ERROR: Traceback (most recent call last):
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\plugin.video.vice2\default.py", line 95, in <module>
    Main()
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\plugin.video.vice2\default.py", line 62, in __init__
    for show in cache.cacheFunction(vice.get_episodes):
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\script.common.plugin.cache\lib\StorageServer.py", line 541, in cacheFunction
    ret_val = funct(*args)
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\plugin.video.vice2\resources\lib\vice.py", line 147, in get_episodes
    story_list = common.parseDOM(soup, "ul", attrs={"class": "story_list.*?"})
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 267, in parseDOM
    temp = _getDOMContent(item, name, match, ret).strip()
  File "C:\Documents and Settings\xxxxxx\Application Data\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 144, in _getDOMContent
    end = html.find(endstr, start)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8508: ordinal not in range(128)

So do I use unquoteUni before I try parsedom? Thanks for the help.
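The traceback suggests parseDOM received a byte string containing non-ASCII bytes (0xd0), and Python 2 tried to decode it as ASCII when mixing it with unicode inside html.find. Decoding the page to unicode yourself before calling parseDOM is one likely fix. Below is a Python 3 sketch of such a helper; to_text is hypothetical, and its charset detection only looks at a Content-Type value you pass in.

```python
import re


def to_text(raw, content_type="", default="utf-8"):
    """Return page content as text, decoding bytes if necessary.

    Hypothetical helper: the charset is read from a Content-Type value
    such as 'text/html; charset=ISO-8859-1' when one is supplied.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    match = re.search(r"charset=([\w-]+)", content_type, re.IGNORECASE)
    charset = match.group(1) if match else default
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or mislabelled page: fall back to the
        # default charset and substitute U+FFFD for undecodable bytes.
        return raw.decode(default, errors="replace")
```

In the add-on you would pass the decoded text, rather than the raw page, to common.parseDOM.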

[Release] Parsedom and other functions - bossanova808 - 2012-09-22

Try fetching the page with fetchpage from parsedom for a start I'd say...

RE: [Release] Parsedom and other functions - stacked - 2012-09-22

Code:
18:26 T:3860  NOTICE: CommonFunctions-1.2.0
18:28 T:3860   ERROR: Traceback (most recent call last):
18:28 T:3860   ERROR:   File "C:\Program Files (x86)\XBMC\portable_data\addons\plugin.video.revision3\default.py", line 382, in <module>
18:28 T:3860   ERROR:     build_sub_directory(url, name)
18:28 T:3860   ERROR:   File "C:\Program Files (x86)\XBMC\portable_data\addons\plugin.video.revision3\default.py", line 139, in build_sub_directory
18:28 T:3860   ERROR:     blah2 = common.fetchPage({"link": blah})['status']
18:28 T:3860   ERROR:   File "C:\Program Files (x86)\XBMC\portable_data\addons\script.module.parsedom\lib\CommonFunctions.py", line 410, in fetchPage
18:28 T:3860   ERROR:     ret_obj["content"] = inputdata.decode("utf-8")
18:28 T:3860   ERROR:   File "C:\Program Files (x86)\XBMC\system\python\Lib\encodings\utf_8.py", line 16, in decode
18:28 T:3860   ERROR:     return codecs.utf_8_decode(input, errors, True)
18:28 T:3860   ERROR: UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

I'm also having the UnicodeDecodeError problem. It happens when checking the status on this url. The error goes away if you edit line 410 in fetchPage and remove decode("utf-8"). I'm not sure if that would be the permanent solution.

Code:
# line 410, before:
ret_obj["content"] = inputdata.decode("utf-8")
# line 410, after:
ret_obj["content"] = inputdata
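Removing the decode makes fetchPage return raw bytes for every page, which may break callers that expect the unicode strings 1.2 returns. A middle ground, sketched here in Python 3 with a hypothetical decode_content helper (not parsedom's code), is to decode only when the bytes really are valid in the expected charset and hand back the raw bytes otherwise, for example when the response starts with 0xff as in the log above.

```python
def decode_content(inputdata, charset="utf-8"):
    """Return text when the bytes decode cleanly, else the raw bytes.

    Hypothetical middle ground between always calling
    inputdata.decode("utf-8") and never decoding at all.
    """
    try:
        return inputdata.decode(charset)
    except UnicodeDecodeError:
        return inputdata
```

Callers then need to check isinstance(content, bytes) before treating the result as text.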

RE: [Release] Parsedom and other functions - bossanova808 - 2012-09-23

Maybe try using the unquoteUni I posted above as an alternative to decode??

You'll probably have to ask TobiasTheCommie for help with stuff inside of parseDOM itself.

RE: [Release] Parsedom and other functions - newatv2user - 2012-09-26

Well, the easiest fix for me was removing 1.2 and installing 1.1. Unfortunately, even with automatic updates disabled, it just keeps updating to the borked 1.2. Wtf? Is there any way to disable update for this?

Update:
Installed 1.1 and changed the version in addon.xml to 1.2. Hope that works.

RE: [Release] Parsedom and other functions - mrstealth - 2012-09-27

(2012-09-26, 04:28)newatv2user Wrote: Well, the easiest fix for me was removing 1.2 and installing 1.1. Unfortunately, even with automatic updates disabled, it just keeps updating to the borked 1.2. Wtf? Is there any way to disable update for this?

Update:
Installed 1.1 and changed the version in addon.xml to 1.2. Hope that works.

I just put the script.module.parsedom-1.1.0 version in my xbmc repository and raised the version to 1.2.1 in addon.xml.

Code:
<addon id='script.module.parsedom' version='1.2.1' name='Parsedom for xbmc plugins' provider-name='TheCollective'>

But this is a workaround and I hope the issue will be fixed in the next parsedom version
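The version bump works because the add-on manager only replaces an add-on when the repository offers a higher version than the one installed. The sketch below shows a plain numeric comparison of dotted versions; it is an assumption for illustration, and Kodi's own comparison may accept formats this one does not.

```python
def parse_version(version):
    """Turn '1.2.1' into (1, 2, 1) so the parts compare as numbers."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate, installed):
    """True if candidate is a strictly higher dotted version."""
    return parse_version(candidate) > parse_version(installed)
```

Comparing the parts as integers matters: as plain strings, "1.10.0" sorts before "1.9.0".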

RE: [Release] Parsedom and other functions - stacked - 2012-11-23

I don't understand the 1.3.0 update (View the diff).

The unquoting causes the statement below to raise an IndexError in getParameters.

Code:
params = common.getParameters("?path=/root/favorites&login=true&name=Tom+%26+Jerry")

I found a workaround of just using quote_plus. Is this the intended use?

Code:
params = common.getParameters(urllib.quote_plus("?path=/root/favorites&login=true&name=Tom+%26+Jerry"))
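Why the unquote-first order fails can be shown in Python 3 with the standard library. This is a sketch, not the parsedom source: params_unquote_first only approximates what getParameters does in 1.3.0, and both helper names are hypothetical. Decoding first turns %26 into a literal '&', which is then treated as a parameter separator.

```python
from urllib.parse import parse_qsl, unquote_plus

query = "?path=/root/favorites&login=true&name=Tom+%26+Jerry"


def params_unquote_first(qs):
    """Decode first, then split: the decoded '&' splits the value."""
    result = {}
    for pair in unquote_plus(qs.lstrip("?")).split("&"):
        pieces = pair.split("=")
        result[pieces[0]] = pieces[1]  # IndexError on the stray ' Jerry'
    return result


def params_split_first(qs):
    """Split on the raw '&' and '=' first, then decode each name/value."""
    return dict(parse_qsl(qs.lstrip("?"), keep_blank_values=True))
```

parse_qsl gives the split-then-decode order for free, so name comes back as "Tom & Jerry".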

RE: [Release] Parsedom and other functions - Popeye - 2012-11-23

The commit is just crazy IMHO. The issues others have had with unicode characters are probably due to unquote_plus's inability to handle them properly.
I've been using the code below for ages without any issues whatsoever...

Code:
# FROM plugin.video.youtube.beta -- converts the request url passed on by xbmc to our plugin into a dict
def get_parameters(parameterString):
    commands = {}
    # Everything after the first '?' is the query string.
    splitCommands = parameterString[parameterString.find('?') + 1:].split('&')
    for command in splitCommands:
        if len(command) > 0:
            # Split on the first '=' only, so values may contain '='
            # and a bare "flag" maps to '' instead of raising IndexError.
            name, _, value = command.partition('=')
            commands[name] = value
    return commands

RE: [Release] Parsedom and other functions - stacked - 2012-11-25

Quote: Version 1.4.0
- Version 1.3 was too aggressive on frodo and not needed in eden, so we're doing a rollback on eden and fix on frodo

Great. Everything seems to be working again in eden, but I'm still having the same issue with frodo. I think frodo also needs a rollback.

(2012-11-23, 21:50)Popeye Wrote: The commit is just crazy IMHO. The issues others have had with unicode characters are probably due to unquote_plus's inability to handle them properly.
I've been using the code below for ages without any issues whatsoever...


Thanks, that's what I was planning on using.

RE: [Release] Parsedom and other functions - Popeye - 2012-11-25

What's the deal with Frodo? To me it seems as if XBMC Frodo URL-encodes the whole plugin:// URI. If so, this is crucial information for all add-on devs and must be shared ASAP....
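If Frodo really does percent-encode the whole plugin:// URI (a guess in this thread, not something it confirms), the safe response is to undo exactly one layer before splitting, then decode each value. The Python 3 sketch below simulates that extra layer on the query string from the getParameters example; parse_plugin_query is a hypothetical helper.

```python
from urllib.parse import parse_qsl, quote, unquote

# Query string as a plugin would normally receive it.
inner = "?path=/root/favorites&login=true&name=Tom+%26+Jerry"
# Simulated extra layer of encoding over the whole string.
outer = quote(inner, safe="")


def parse_plugin_query(query):
    """Split on raw '&' and '=' first, then decode each name and value."""
    return dict(parse_qsl(query.lstrip("?"), keep_blank_values=True))


once = unquote(outer)   # removes only the extra layer
twice = unquote(once)   # also turns %26 into a bare '&'

good = parse_plugin_query(once)
bad = parse_plugin_query(twice)
```

With one unquote, name is "Tom & Jerry"; with two, the value is cut at the decoded '&' and " Jerry" becomes a bogus parameter name.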

RE: [Release] Parsedom and other functions - newatv2user - 2012-11-29

Any fix for the Unicode error yet? If not, I'm just gonna go back to 1.1.

Code:
  File "C:\Documents and Settings\***\Application Data\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 278, in parseDOM
    temp = _getDOMContent(item, name, match, ret).strip()
  File "C:\Documents and Settings\***\Application Data\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 144, in _getDOMContent
    end = html.find(endstr, start)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 15193: ordinal not in range(128)