fluidman · 2005-04-30, 12:30

hi,

could one of you get me started on cachedhttp and parsing html, please?

i just want to download http://www.fluidboxmedia.de/test.html via cachedhttp.py and then parse the content and get it displayed in xbmc

the first part should be easy (please correct me if i'm wrong):

Quote:
import cachedhttp

c = mycachedhttp()
data = c.urlopen('http://www.fluidboxmedia.de/test.html')

the test.html just contains one table:

Quote:
<table>
<tr><td width="100" height="20">header1</td><td width="900" height="20">header2</td><tr>
<tr><td width="100" height="20">row1cell1</td><td width="200" height="20">row1cell2</td><tr>
<tr><td width="100" height="20">row2cell1</td><td width="200" height="20">row2cell2</td><tr>
<tr><td width="100" height="20">row3cell1</td><td width="200" height="20">row3cell2</td><tr>
</table>

can you please show me how i would use re to get the text from the cells from rows 2-4 (in a list maybe)? i can learn best by looking at easy examples. you would help me a lot!

thanks in advance!

lolol · 2005-04-30, 14:58

this is what you look for:

Quote:
import re

result = re.findall('<tr><td width="100" height="20">(.+?)</td><td width="200" height="20">(.+?)</td><tr>', data, re.IGNORECASE | re.DOTALL)

print result

hope that helps, not tested yet

cu lolol

lolol · 2005-04-30, 15:26

just tested, my sample doesn't work...

i think i made a mistake with the "greedy" something...
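
a small sketch of one likely reason the first pattern misbehaves, assuming the flags are spelled re.IGNORECASE and re.DOTALL and that `data` holds test.html exactly as posted (here it is just inlined as a string, nothing is downloaded). the name `strict` is only for illustration:

```python
import re

# inlined copy of test.html as posted above (stand-in for the downloaded data)
data = """<table>
<tr><td width="100" height="20">header1</td><td width="900" height="20">header2</td><tr>
<tr><td width="100" height="20">row1cell1</td><td width="200" height="20">row1cell2</td><tr>
<tr><td width="100" height="20">row2cell1</td><td width="200" height="20">row2cell2</td><tr>
<tr><td width="100" height="20">row3cell1</td><td width="200" height="20">row3cell2</td><tr>
</table>"""

# first attempt: literal attributes, lazy groups, and DOTALL so '.' also matches newlines
strict = '<tr><td width="100" height="20">(.+?)</td><td width="200" height="20">(.+?)</td><tr>'
result = re.findall(strict, data, re.IGNORECASE | re.DOTALL)

# the header row has width="900" in its second cell, so the literal width="200"
# is first found one row later and the first group runs across the line break
print(len(result))
print(result[0][0])
```

so the lazy `.+?` is not the real problem here: with re.DOTALL it is still allowed to swallow the rest of the header row plus the start of the next one, and the header and row 1 end up merged into one match.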

here is the correct pattern:

Quote:
result = re.findall('<tr><td .+?>(.+?)</td><td .+?>(.+?)</td><tr>', data, re.IGNORECASE)

gives back:

Quote:
>>>
[('header1', 'header2'), ('row1cell1', 'row1cell2'), ('row2cell1', 'row2cell2'), ('row3cell1', 'row3cell2')]
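
since the question was only about rows 2-4, here is a self-contained sketch that runs the pattern above and drops the header row. it assumes the page content is exactly the table posted earlier (inlined as a string instead of fetched with cachedhttp); the names `pattern`, `rows` and `cells` are just for illustration:

```python
import re

# inlined copy of test.html as posted above (stand-in for the downloaded data)
data = """<table>
<tr><td width="100" height="20">header1</td><td width="900" height="20">header2</td><tr>
<tr><td width="100" height="20">row1cell1</td><td width="200" height="20">row1cell2</td><tr>
<tr><td width="100" height="20">row2cell1</td><td width="200" height="20">row2cell2</td><tr>
<tr><td width="100" height="20">row3cell1</td><td width="200" height="20">row3cell2</td><tr>
</table>"""

# same pattern as above: .+? skips whatever attributes each <td> carries,
# and without re.DOTALL a match cannot run past the end of a line
pattern = '<tr><td .+?>(.+?)</td><td .+?>(.+?)</td><tr>'
rows = re.findall(pattern, data, re.IGNORECASE)

# the first match is the header row, so slice it off to keep table rows 2-4
cells = rows[1:]
print(cells)
# [('row1cell1', 'row1cell2'), ('row2cell1', 'row2cell2'), ('row3cell1', 'row3cell2')]
```

note that the pattern ends in `<tr>` because that is how the rows are closed in the posted test.html; if the real page closes rows with `</tr>`, the end of the pattern has to change to match.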

cu lolol

fluidman · 2005-04-30, 17:27

thanks for your help.. i will try it out and see if i get it, hehe