2012-02-04, 12:08
If you do:
result = parseDOM(data, "p")
Then the second p should be in result[1].
Quote:print item
Plot = common.parseDOM(item, "p")
print 'ParseDOM returned: ' + str(len(Plot))
Quote:08:36:34 T:828 NOTICE: <div class="post-left"><a href="http://documentarystorm.com/last-chance-to-see/" title="Last Chance to See"><img src="http://documentarystorm.com/files/2012/01/last-chance-to-see1.jpg" alt="Last Chance to See (documentary)" height="150" width="150" /></a></div><div class="post-right"><h3><a href="http://documentarystorm.com/last-chance-to-see/" rel="bookmark" title="Stream this documentary: Last Chance to See">Last Chance to See</a></h3><p class="post-meta">Jan 29th, 2012 // <a href="http://documentarystorm.com/category/nature-biology/animals-nature-biology/" title="View all posts in Animals" rel="category tag">Animals</a>, <a href="http://documentarystorm.com/category/nature-biology/" title="View all posts in Nature" rel="category tag">Nature</a> // <a href="http://documentarystorm.com/last-chance-to-see/#comments" title="Comment on Last Chance to See">2 Comments »</a></p><p>Stephen Fry and zoologist Mark Carwardine head to the ends of the earth in search of animals on the edge of extinction.</p><div class="gdsrcacheloader gdsrclsmall" id="gdsrc_asr.7827.0.1.1327816953.48.1.20.6.4.0"><strong>GD Star Rating</strong><br /><em>a WordPress rating system</em></div></div><div class="clearfix"></div>
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'start: 'p' - {} - False - <type 'str'>'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'no list found, making one on just the element name'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'Getting element content for 1 matches '
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'match: <p class="post-meta">'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'start: 441, len: 21, end: 887'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'done html length: 425'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'Done'
08:36:34 T:828 NOTICE: ParseDOM returned: 1
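The log above reports a single match (the <p class="post-meta"> element), although the quoted HTML contains two <p> elements. As a cross-check that does not depend on parseDOM, here is a minimal sketch using only the standard re module. find_paragraphs is a hypothetical helper, not part of CommonFunctions, and the sample string is abbreviated from the logged HTML.

```python
import re

def find_paragraphs(html):
    """Return the inner HTML of every <p> element, with or without attributes."""
    # \b keeps tags such as <pre> from matching; [^>]* allows optional attributes.
    pattern = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
    return pattern.findall(html)

# Abbreviated from the HTML in the log: one <p> with a class, one without.
sample = ('<p class="post-meta">Jan 29th, 2012 // Animals</p>'
          '<p>Stephen Fry and zoologist Mark Carwardine head to the ends of the earth.</p>')

paragraphs = find_paragraphs(sample)
print(len(paragraphs))
print(paragraphs[1])
```

If this finds two paragraphs on the same input where parseDOM finds one, the problem is more likely in how the mixed tags are matched than in the data.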
newatv2user Wrote:That is exactly what I think I am doing.
But I am not getting the desired result:
I think my problem in post #74 is similar. When <li> elements with and without attributes are mixed, it causes problems.
Or maybe I have a corrupted copy of parsedom. How do I check or reinstall?
Thanks.
Quote:suckerfishDOM = common.parseDOM(contents, "ul", attrs = { "id": "suckerfishnav"})[0]
catDOM = common.parseDOM(suckerfishDOM, "li", attrs = { "class": "cat-item cat-item-[0-9]{1,}"})
print 'Debug Info - catDOM length: ' + str(len(catDOM))
for dCat in catDOM:
    print 'looping through catDOM'
    print 'Debug Info: ' + dCat
    if dCat is None or dCat == '':
        continue
newatv2user Wrote:More problems.
Portion of HTML I'm using: http://pastebin.com/C1imeTMG
My code:
Resulting portion of log: http://pastebin.com/zAbk89n9
In summary, only the first match in catDOM is non-empty; all the rest are empty. Am I doing it correctly?
Thanks.
ret = common.parseDOM(self.readTestInput("documentarystorm2.html", False), "ul", attrs = { "id": "suckerfishnav"})
print repr(ret)
ret2 = common.parseDOM(ret, "ul", attrs = { "class": "children"})
print "2: " + repr(ret2[0])
for ret in ret2:
    ret3 = common.parseDOM(ret, "li", attrs = { "class": "cat-item cat-item-[0-9]{1,}"})
    print "3: " + repr(ret3)
T:3232 NOTICE: [SoundCloud] fetchPage : 'called for : 'https://soundcloud.com/connect/login''
20:44:39 T:3232 NOTICE: [SoundCloud] fetchPage : 'Posting data: username=*******&redirect_uri=plugin%3A%2F%2Fplugin.audio.soundcloud%2Foauth_callback&response_type=token&client_id=hijuflqxoOqzLdtr6W4NA&scope=non-expiring&password=******&display=popup'
20:44:39 T:3232 NOTICE: [SoundCloud] fetchPage : 'Added refering url: http://soundcloud.com'
20:44:39 T:3232 NOTICE: [SoundCloud] fetchPage : 'connecting to server...'
20:44:39 T:3232 NOTICE: [SoundCloud] fetchPage : 'URLError : <urlopen error unknown url type: plugin>'
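The URLError above says the URL opener was handed a plugin:// address, which is the scheme used in the redirect_uri of the posted data. One way to guard against that, sketched here under the assumption that only http and https URLs should ever be fetched, is to check the scheme first. is_fetchable is a hypothetical helper, not something fetchPage provides.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

def is_fetchable(url, schemes=("http", "https")):
    """Return True if the URL uses a scheme a plain HTTP client can open."""
    return urlparse(url).scheme.lower() in schemes

print(is_fetchable("https://soundcloud.com/connect/login"))
print(is_fetchable("plugin://plugin.audio.soundcloud/oauth_callback"))
```

A redirect target that fails this check would have to be handled by the add-on itself rather than fetched.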
html = '<div id="player" class="loading tv " \r \tdata-media="http://nordond25a-f.akamaihd.net/z/no/open/db/db70c9ca4be6c56b4813f550d822b27e77116bd9/db70c9ca4be6c56b4813f550d822b27e77116bd9_,141,316,563,1266,2250,.mp4.csmil/manifest.f4m" \r \tdata-timezoneoffset="2" \r \tdata-startingbitrateindex="3"\r \tdata-streamingerrormessageurl="/streamingerror"\r \tdata-outoflivebuffermessageurl="/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/koid21008710"\r \t\t\t data-IsRatedR = "False"\r >\r\n\t<!--googleoff: all-->\r\n\t\t\t<div id="nrkFlashContainer">\r\n\t\t\t\t<div class="msg-board">\r\n\t\t\t\t\t\r\n\t<img width="960" \r \t\t class=""\r \t\t alt="" \r \t\t src="http://gfx.nrk.no/iiUIuSEgJNUZ5ESHnHRXHgpqjzVQx3q0AqWf4v5n3sEQ" />\r\n\t<div class="msg no-js-msg">\r\n\t\t<h2><strong class="heading">Ooops, Javascript mangler!</strong></h2>\r\n\t\t<p>\r\n\t\t\tVi kan ikke se at du har aktivert Javascript p\xc3\xa5 din PC, dette m\xc3\xa5 v\xc3\xa6re aktivert for at v\xc3\xa5r videoavspiller skal fungere.<br />\r\n\t\t\t<a href="http://www.nrk.no/some/support/page" target="_blank">Les mer<span class="offscreen"> om hvorfor vi krever javascript</span></a> p\xc3\xa5 v\xc3\xa5re hjelpesider.\r\n\t\t</p>\r\n\t</div>\r\n\r\n\t\t\t\t\t<div class="msg no-flash-msg">\r\n\t\t\t\t\t\t <h2><strong class="heading">Ooops, vi har problemer med \xc3\xa5 laste Flash for avspilling!</strong></h2>\r\n\t\t\t\t\t\t <p>\r\n\t\t\t\t\t\t\t\t<a href="http://get.adobe.com/flashplayer" target="_blank">Klikk her for \xc3\xa5 installere Flash p\xc3\xa5 din maskin.</a><br /><br />\r\n\t\t\t\t\t\t\t\tVirker det fortsatt ikke?<br/>\r\n\t\t\t\t\t\t\t <a href="/hjelp/1.7916314">Les mer<span class="offscreen"> om flash og hvorfor vi krever det</span></a> p\xc3\xa5 v\xc3\xa5re hjelpesider.\r\n\t\t\t\t\t\t </p>\r\n\t\t\t\t\t</div>\r\n\t\t\t\t</div>\r\n\t\t\t</div>\r\n\t<!--googleon: all-->\r\n</div>\r\n\r\n\r\n\r\n\r\n\t<section id="programMetaData" class="container tight">\r\n\t\t<aside 
id="episode2" class="span-5 clearfix">\t\t\r\n\r\n\t<img width="300" \r \t\t class="episode-image"\r \t\t alt="Verda vi skaper" \r \t\t src="http://gfx.nrk.no/iiUIuSEgJNUZ5ESHnHRXHgeDOYjDhHYN0qWf4v5n3sEQ" />\r\n\t\t\t<!--googleoff: snippet-->\r\n\t\t\t<ul class="infolist clearfix">\r\n\t\t\t\r\n\t\t\t\t\t<li><mark class="age-restriction"><span>A</span></mark> Tillatt for alle</li>\r\n\r\n\t\t\t\t<li><strong>Tilgjengelig til:</strong> \r\n<time datetime="2012-06-16T16:25:00+02:00">16.06.2012</time>\r\n\t\t\t\t</li>\r\n\t\t\t</ul>\r\n\t\t\r\n\t\t\t\r\n\t\t\t<ul class="sharethis clearfix">\r\n\t\t\t\t<li><a href="http://twitter.com/home?status=\r \t\t\t\t\t\t\t\tSe+%27Verda+vi+skaper%27+p%c3%a5+NRK+TV+http%3a%2f%2ftv.nrk.no%2fserie%2fverda-vi-skaper%2fkoid21008710%2fsesong-1%2fepisode-6"\r \t\t\t\t\t target="_blank" title="Del/tips på Twitter"><img src="http://psfil.nrk.no/content/images/tweet.png?1.1.4533.14084a" alt="Del/tips på Twitter" /></a></li>\r\n\t\t\t\t<li><a href="http://www.facebook.com/sharer.php?u=http://tv.nrk.no/serie/verda-vi-skaper/koid21008710/sesong-1/episode-6"\r \t\t\t\t\t target="_blank" title="Del/tips på Facebook"><img src="http://psfil.nrk.no/content/images/facebook.png?1.1.4533.14084a" alt="Del/tips på Facebook" /></a></li>\r\n\t\t\t</ul>\r\n\t\t\t<!--googleon: snippet-->\r\n\t\t</aside>\r\n\r\n\t\t<article id="episode" class="span-10 last">\r\n\t\t\t<hgroup>\r\n\t\t\r\n\t\t\t\t\t<h2><a href="http://tv.nrk.no/serie/verda-vi-skaper">Verda vi skaper</a> \r\n\t\t\t\t\t</h2>\r\n\t\t\t\t<h1>\r\n\t\t\t\t\tVerda vi skaper \r\n\t\t\t\t\t\t<span class="small">6:8</span> \t\t\r\n\t\t\t\t</h1>\t\t\r\n\t\t\t</hgroup>\r\n\t \r\n\t\t\t<section id="taglist" class="stack-links">\r\n\t\t\t\t<strong>Emner:</strong>\r\n\t\t\t\r\n<a href="/kategori/dokumentar-og-fakta" title="Vis flere programmer i kategori "Dokumentar og fakta"">Dokumentar og fakta</a>, <a class="thin" href="/sok?m=tv&q=Kenya&filter=rettigheter&side=1" title="Vis flere programmer tagget med 
"Kenya"">Kenya</a>, <a class="thin" href="/sok?m=tv&q=Slettelandet&filter=rettigheter&side=1" title="Vis flere programmer tagget med "Slettelandet"">Slettelandet</a>, <a class="thin" href="/sok?m=tv&q=rovdyr&filter=rettigheter&side=1" title="Vis flere programmer tagget med "rovdyr"">rovdyr</a>, <a class="thin" href="/sok?m=tv&q=urbefolkning&filter=rettigheter&side=1" title="Vis flere programmer tagget med "urbefolkning"">urbefolkning</a>, <a class="thin" href="/sok?m=tv&q=tilpasning&filter=rettigheter&side=1" title="Vis flere programmer tagget med "tilpasning"">tilpasning</a>, <a class="thin" href="/sok?m=tv&q=kultur&filter=rettigheter&side=1" title="Vis flere programmer tagget med "kultur"">kultur</a>\r\n\t\t\t</section>\r\n\t\t\r\n\t\t\r\n\t\t\t<div class="tab">\r\n\t\t\t\t<ul class="tab-nav line-sep clearfix">\r\n\t\t\t\t\t<li class="active"><h2><a href="#information">Programinformasjon</a></h2></li>\r\n\t\t\r\n\t\t\t\t\t\t<li><a href="/programreview/koid21008710" id="reviewLink" rel="nofollow">Omtale</a>\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t</li>\r\n\t\t\t\t\t\t<li><a href="/programsubtitles/koid21008710/html" id="subtitlesLink" rel="nofollow">Teksting</a></li>\r\n\t\t\t\t</ul>\r\n\t\t\t\t<div class="tab-panels">\r\n\t\t\t\t\t<section id="information" class="tab-panel">\r\n\t\t\t\t\t\t<div class="mod toggle closed">\r\n\t\t\t\t\t\t\t<p>\r\n\t\t\t\t\t\t\t\tBr. naturserie. På slettelandet veks gras som gir mat til dyr og menneske. Men nokre gonger er kampen for føda farleg. Dorobo-folket i Kenya må jage vekk svoltne løver for å skaffe levebrød. Mennesket og dyra lever tett saman på stepper over heile kloden. Norsk kommentar: Ola Bøe. 
(Human Planet: Grasslands) (6:8)\r\n\t\t\t\t\t\t\t\t<a href="#" class="control hide-when-open" title="Vis mer om Verda vi skaper">Vis mer</a>\r\n\t\t\t\t\t\t\t</p>\r\n\t\t\t\t\t\t\t<div class="details">\r\n\t\t\t\t\t\t\t\t<dl class="infolist">\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t<dt>Tilgjengelig i:</dt> <dd>Norge</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Første gang sendt:</dt> <dd> <strong></strong> 08.06.2012 20:05</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Siste gang sendt:</dt> <dd> <strong></strong> 08.06.2012 20:05</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Planlagt sendt:</dt> <dd> <strong></strong> 09.06.2012 16:25</dd>\r\n\t\t\t\t\t\t\t\t</dl>\r\n\t\t\t\t\t\t\t\t<dl class="infolist">\r\n\t\t\t\t\t\t\t\t\t\t<dt>Serietittel:</dt> <dd>Verda vi skaper</dd>\r\n\r\n\t\t\t\t\t\t\t\t\t\t<dt>Episodetittel:</dt><dd>Verda vi skaper 6:8</dd>\r\n\r\n\t\t\t\t\t\t\t\t\t\t<dt>Orginal episodetittel:</dt> <dd>Human Planet</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Varighet:</dt> <dd>48 minutter</dd>\r\n\t\t\t\t\t\t\t\t</dl>\r\n\t\t\t\t\t\t\t\t\r\n\r\n\r\n\t\t\t\t\t\t\t\t\t<h3>Seriebeskrivelse:</h3><p>Britisk dokumentarserie</p>\r\n\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t<a href="#" class="control" title="Vis mindre om Verda vi skaper">Vis mindre</a>\r\n\t\t\t\t\t\t\t</div>\r\n\t\t\t\t\t\t</div>\r\n\t\t\t\t\t</section>\r\n\t\t\t\t</div>\r\n\t\t\t</div>\r\n\t\t</article>\r\n\r\n\t</section>'
>>> html ='<div id="player" class="loading tv " \r \tdata-media="http://nordond2b-f.akamaihd.net/z/no/open/1e/1ee465d30cdea83ac036714a0d4e7c7ff7a1095d/1ee465d30cdea83ac036714a0d4e7c7ff7a1095d_,141,316,563,1266,2250,.mp4.csmil/manifest.f4m" \r \tdata-timezoneoffset="2" \r \tdata-startingbitrateindex="3"\r \tdata-streamingerrormessageurl="/streamingerror"\r \tdata-outoflivebuffermessageurl="/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/mkds61000910"\r \t\t\t data-IsRatedR = "False"\r >dsgdsfsdf</div>'
>>> parseDOM(html, 'div', {'id':'player'}, ret='\tdata-outoflivebuffermessageurl')
['/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/mkds61000910"\r \t\t\t data-IsRatedR = "False']
>>> parseDOM(html, 'div', {'id':'player'}, ret='data-subtitlesurl')
[]
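In the session above, the lookup returns nothing for data-subtitlesurl, which is written with spaces around the equals sign, and the lookup for the preceding attribute runs past its closing quote. Here is a minimal sketch of an attribute lookup that tolerates whitespace around "=", using only the standard re module. get_attribute is a hypothetical helper, not the parseDOM implementation, and the tag string is shortened from the example. It can be fooled by attribute-like text inside another attribute's value.

```python
import re

def get_attribute(tag, name):
    """Return the value of attribute `name` from an opening tag string, or None."""
    # (?<![\w-]) stops "subtitlesurl" from matching inside "data-subtitlesurl";
    # \s*=\s* accepts optional whitespace on both sides of the equals sign.
    pattern = r'(?<![\w-])' + re.escape(name) + r'\s*=\s*(["\'])(.*?)\1'
    match = re.search(pattern, tag, re.DOTALL | re.IGNORECASE)
    return match.group(2) if match else None

# Shortened from the opening <div> tag in the session above.
tag = ('<div id="player" class="loading tv " \r '
       '\tdata-outoflivebuffermessageurl="/outoflivebuffer"\r '
       '\t\t\t\t data-subtitlesurl = "/programsubtitles/mkds61000910"\r '
       '\t\t\t data-IsRatedR = "False"\r >')

print(get_attribute(tag, "data-subtitlesurl"))
print(get_attribute(tag, "data-outoflivebuffermessageurl"))
```

The non-greedy (.*?) with the \1 backreference stops at the first matching closing quote, so a value does not swallow the attributes that follow it.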
Type <class 'socket.timeout'>
Message timed out
Stacktrace File "/home/xbmc/.xbmc/addons/plugin.video.revision3/default.py", line 335, in <module>
build_main_directory(url)
File "/home/xbmc/.xbmc/addons/plugin.video.revision3/default.py", line 51, in build_main_directory
html = common.fetchPage({"link": url})['content']
File "/home/xbmc/.xbmc/addons/script.module.parsedom/lib/CommonFunctions.py", line 399, in fetchPage
ret_obj["content"] = con.read()
File "/usr/lib/python2.7/socket.py", line 351, in read
data = self._sock.recv(rbufsize)
File "/usr/lib/python2.7/httplib.py", line 541, in read
return self._read_chunked(amt)
File "/usr/lib/python2.7/httplib.py", line 592, in _read_chunked
value.append(self._safe_read(amt))
File "/usr/lib/python2.7/httplib.py", line 647, in _safe_read
chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/usr/lib/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
Type <type 'exceptions.KeyError'>
Message 'content'
Stacktrace File "/storage/sdcard0/Android/data/org.xbmc.xbmc/files/.xbmc/addons/plugin.video.revision3/default.py", line 339, in <module>
get_video(url, name, plot, studio, episode, thumb, date)
File "/storage/sdcard0/Android/data/org.xbmc.xbmc/files/.xbmc/addons/plugin.video.revision3/default.py", line 231, in get_video
result = common.fetchPage({"link": url})['content']
(2012-08-23, 11:08)takoi Wrote: What did you expect to happen? When it times out, it times out. If you have a way to recover, then catch it.
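A minimal sketch of "catching it", covering both tracebacks above: the socket.timeout raised during the read, and the KeyError raised when the result has no 'content' key. fetch_content is a hypothetical wrapper and the retry count is an assumption. The fetch function is passed in as an argument so the sketch runs on its own; in an add-on it would presumably be common.fetchPage.

```python
import socket

def fetch_content(fetch, url, retries=2):
    """Call fetch({"link": url}) and return its "content", or None on failure."""
    for attempt in range(retries + 1):
        try:
            result = fetch({"link": url})
        except socket.timeout:
            # Timed out: try again until the retry budget is used up.
            continue
        # .get avoids a KeyError when the result carries no "content" key.
        return result.get("content")
    return None

# Stand-in for a real fetcher: times out once, then succeeds.
calls = []
def flaky_fetch(params):
    calls.append(params["link"])
    if len(calls) < 2:
        raise socket.timeout("timed out")
    return {"content": "<html></html>"}

print(fetch_content(flaky_fetch, "http://example.com/"))
```

The caller then only has to check for None before handing the page to parseDOM.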
response = common.fetchPage({"link": url})
File ".../Library/Application Support/XBMC/addons/script.module.parsedom/lib/CommonFunctions.py", line 410, in fetchPage
ret_obj["content"] = inputdata.decode("utf-8")
File "/Applications/XBMC.app/Contents/Frameworks/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 275-276: invalid data
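The traceback above comes from a strict inputdata.decode("utf-8") on a page that is not valid UTF-8. A minimal sketch of a more forgiving decode, assuming the raw bytes are available to the caller: try UTF-8, then a fallback encoding, then decode with replacement characters. decode_lenient is a hypothetical helper, and cp1252 as the fallback is only a guess at what such pages use.

```python
def decode_lenient(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes by trying each encoding in turn, then fall back to replacement."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep what can be read, mark the rest with U+FFFD.
    return raw.decode("utf-8", "replace")

print(repr(decode_lenient(b"p\xc3\xa5")))    # valid UTF-8 input
print(repr(decode_lenient(b"p\xe5 NRK")))    # invalid as UTF-8, decoded via the fallback
```

Replacement loses the offending bytes, so it suits scraping titles and plots better than anything that has to round-trip the data.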