2012-02-04, 16:42
That is exactly what I think I am doing.
Quote:
print item
Plot = common.parseDOM(item, "p")
print 'ParseDOM returned: ' + str(len(Plot))
But I am not getting the desired result:
Quote:
08:36:34 T:828 NOTICE: <div class="post-left"><a href="http://documentarystorm.com/last-chance-to-see/" title="Last Chance to See"><img src="http://documentarystorm.com/files/2012/01/last-chance-to-see1.jpg" alt="Last Chance to See (documentary)" height="150" width="150" /></a></div><div class="post-right"><h3><a href="http://documentarystorm.com/last-chance-to-see/" rel="bookmark" title="Stream this documentary: Last Chance to See">Last Chance to See</a></h3><p class="post-meta">Jan 29th, 2012 // <a href="http://documentarystorm.com/category/nature-biology/animals-nature-biology/" title="View all posts in Animals" rel="category tag">Animals</a>, <a href="http://documentarystorm.com/category/nature-biology/" title="View all posts in Nature" rel="category tag">Nature</a> // <a href="http://documentarystorm.com/last-chance-to-see/#comments" title="Comment on Last Chance to See">2 Comments »</a></p><p>Stephen Fry and zoologist Mark Carwardine head to the ends of the earth in search of animals on the edge of extinction.</p><div class="gdsrcacheloader gdsrclsmall" id="gdsrc_asr.7827.0.1.1327816953.48.1.20.6.4.0"><strong>GD Star Rating</strong><br /><em>a WordPress rating system</em></div></div><div class="clearfix"></div>
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'start: 'p' - {} - False - <type 'str'>'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'no list found, making one on just the element name'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'Getting element content for 1 matches '
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'match: <p class="post-meta">'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'start: 441, len: 21, end: 887'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] _getDOMContent : 'done html length: 425'
08:36:34 T:828 NOTICE: [DocumentaryStorm - 0.0.1] parseDOM : 'Done'
08:36:34 T:828 NOTICE: ParseDOM returned: 1
I think my problem in post #74 is similar. Mixing <li> elements with and without attributes seems to cause the problem.
Or maybe I have a corrupted copy of parseDOM. How do I check or reinstall it?
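As a sanity check that does not depend on parseDOM at all, a minimal sketch like the one below counts every <p> whether or not it has attributes. It assumes Python 3 and its standard html.parser module. The ParagraphCollector class and the trimmed sample string (with a placeholder href) are made up for illustration and are not part of parseDOM. For the item logged above, two matches would be expected rather than one.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <p>, with or without attributes."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # text content of each finished <p>
        self._buffer = None    # None while the parser is outside a <p>

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored on purpose, so <p> and <p class="..."> both match.
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        # Keep text only while inside a <p>, including text of nested tags.
        if self._buffer is not None:
            self._buffer.append(data)


# Trimmed copy of the logged item: one <p> with a class, one without.
# The href value is a placeholder, not the real link.
sample = (
    '<div class="post-right">'
    '<p class="post-meta">Jan 29th, 2012 // <a href="#">Animals</a></p>'
    '<p>Stephen Fry and zoologist Mark Carwardine head to the ends of the '
    'earth in search of animals on the edge of extinction.</p>'
    '</div>'
)

collector = ParagraphCollector()
collector.feed(sample)
collector.close()
print("Standard-library parser found: " + str(len(collector.paragraphs)))
```

If this reports 2 while parseDOM reports 1 for the same input, the difference would point at how parseDOM matches tags with mixed attributes rather than at the HTML itself.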
Thanks.