XML parsing in plugin
#1
Hi guys,
I want to code a video plugin, but I have some problems. I've seen Voinage's plugin tutorial, where he uses a site whose video links are right in the HTML page. On this site, some links are loaded through JavaScript. I've found the XML files where the video IDs are stored; the problem is that I need to parse the links out of various XML files, and I don't know how.

Thanks

kimx
#2

Try looking at minidom, ElementTree, BeautifulStoneSoup, or one of the many other XML parsing modules for Python. Googling something like 'parse xml with python' or even 'parse links from xml with python' should return some good results.
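A minimal sketch of that approach with `xml.etree.ElementTree` from the standard library (the `iter()` call needs Python 2.7 or 3.x). The `<video id="..."><url>...</url></video>` layout and the sample strings are hypothetical placeholders; swap in the tag and attribute names used by the real XML files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; in a plugin these strings would be the
# contents of the downloaded XML files.
SAMPLE_DOCS = [
    "<videos><video id='101'><url>http://example.com/a.flv</url></video></videos>",
    "<videos><video id='102'><url>http://example.com/b.flv</url></video>"
    "<video id='103'/></videos>",
]


def extract_links(xml_text):
    """Return (id, url) pairs for every <video> element that has a <url> child."""
    root = ET.fromstring(xml_text)
    links = []
    # iter() walks the whole tree, so the nesting depth doesn't matter
    for video in root.iter("video"):
        url = video.findtext("url")  # None if there is no <url> child
        if url:  # skip entries without a link
            links.append((video.get("id"), url.strip()))
    return links


def extract_all(docs):
    """Merge the links found in several XML documents."""
    result = []
    for doc in docs:
        result.extend(extract_links(doc))
    return result


if __name__ == "__main__":
    for video_id, url in extract_all(SAMPLE_DOCS):
        print(video_id, url)
```

Each file can go through `extract_links()` on its own; `extract_all()` just concatenates the results.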
Always read the XBMC online manual and FAQ, and search the forum before posting.
For troubleshooting and bug reporting please read how to submit a proper bug report.

If you're interested in writing addons for xbmc, read docs and how-to for plugins and scripts ||| http://code.google.com/p/xbmc-addons/
#3
I haven't played with minidom myself, but here is an example:
http://sebsauvage.net/python/snyppets/in...#parse_rss
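For comparison, a small minidom sketch in the same spirit as the linked RSS example (written independently, not copied from it). The feed string is made up, and `node_text()` is a hypothetical helper that gathers an element's text and CDATA children, since the text may be spread over more than one child node.

```python
from xml.dom import minidom

# Made-up RSS feed used only for illustration
RSS = """<rss version="2.0"><channel>
<item><title>First clip</title><link>http://example.com/1</link></item>
<item><title>Second clip</title><link>http://example.com/2</link></item>
</channel></rss>"""


def node_text(parent, tag):
    """Return the text of the first <tag> element under parent, or None."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return None
    # Collect the text of plain text and CDATA child nodes
    parts = [n.data for n in nodes[0].childNodes
             if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)]
    return "".join(parts).strip()


def parse_rss(xml_text):
    """Return a list of (title, link) tuples, one per <item>."""
    dom = minidom.parseString(xml_text)
    try:
        return [(node_text(item, "title"), node_text(item, "link"))
                for item in dom.getElementsByTagName("item")]
    finally:
        dom.unlink()  # break the DOM's circular references


if __name__ == "__main__":
    for title, link in parse_rss(RSS):
        print(title, link)
```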

Concerning ElementTree and BeautifulSoup, I have used both on the same XML and, from my point of view:
- BeautifulSoup copes better with errors in the XML file but is a little slower.
Here is a quick example (I haven't tested this one specifically, but it should work):
Code:
import os
from BeautifulSoup import BeautifulStoneSoup

# CACHEDIR and XMLFile are defined elsewhere in the plugin
xml_file = open(os.path.join(CACHEDIR, XMLFile), 'r')
soup = BeautifulStoneSoup(xml_file.read())
xml_file.close()
cat_scrapers = soup.find("scrapers")

if cat_scrapers is not None:
    for item in cat_scrapers.findAll("entry"):
        # Reset per entry so a missing tag doesn't keep the previous entry's value
        title = version = language = date = previewVideoURL = None
        # item.title is shorthand for item.find("title"); it is None if the tag is absent.
        # BeautifulStoneSoup lowercases tag names, hence "previewvideourl" below.
        if item.title is not None and item.title.string is not None:
            title = item.title.string.encode("cp1252")
        if item.version is not None and item.version.string is not None:
            version = item.version.string.encode("utf-8")
        if item.lang is not None and item.lang.string is not None:
            language = item.lang.string.encode("utf-8")
        if item.date is not None and item.date.string is not None:
            date = item.date.string.encode("cp1252")
        if item.previewvideourl is not None and item.previewvideourl.string is not None:
            previewVideoURL = item.previewvideourl.string.encode("utf-8")


- ElementTree doesn't tolerate errors (which means you have more to write in order to cover those cases), but it is very fast at parsing XML; I have seen a huge difference in speed.
Here is the same example with ElementTree (not tested):
Code:
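import os
try:
    import xml.etree.ElementTree as ET     # Python 2.5 and later
except ImportError:
    import elementtree.ElementTree as ET   # standalone package for Python 2.4

# CACHEDIR and XMLFile are defined elsewhere in the plugin
root = ET.parse(os.path.join(CACHEDIR, XMLFile)).getroot()
scrapers = root.find("scrapers")
cat_scrapers = []
if scrapers is not None:
    cat_scrapers = scrapers.findall("entry")

for item in cat_scrapers:
    # findtext() returns None when the child element is missing
    title           = item.findtext("title")
    version         = item.findtext("version")
    language        = item.findtext("lang")
    date            = item.findtext("date")
    added           = item.findtext("added")
    previewVideoURL = item.findtext("previewVideoURL")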

In both cases you will, of course, need to handle exceptions (using try/except blocks) and edge cases.
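A minimal sketch of what that handling could look like with the standard-library ElementTree in Python 2.7/3.x, where malformed input raises `ET.ParseError` (older versions raise a different exception class). The `<data>` root element is an assumption; `<scrapers>`, `<entry>` and `<title>` follow the examples above.

```python
import xml.etree.ElementTree as ET


def safe_parse(xml_text):
    """Parse xml_text and return the root element, or None if it is malformed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem
        print("XML error at line %d, column %d" % err.position)
        return None


def entry_titles(xml_text):
    """Return the <title> of every <entry> under <scrapers>, skipping blanks."""
    root = safe_parse(xml_text)
    if root is None:
        return []
    scrapers = root.find("scrapers")
    if scrapers is None:  # edge case: the container element is missing
        return []
    titles = []
    for entry in scrapers.findall("entry"):
        title = entry.findtext("title")
        if title:  # edge case: <title> missing or empty
            titles.append(title.strip())
    return titles
```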

Here it is. I hope it helps.
_____________________________

Repositories Installer: select and install unofficial repositories / TAC.TV: watch videos on TAC.TV
Installer Passion-XBMC: Download and Install Add-ons (pre-Dharma only)

#4
Minidom sits between the two in terms of speed. Like ElementTree, it requires mostly well-formed XML. Its only benefit over ElementTree is that it is in the standard library in Python 2.4 (the version XBMC ships).

If you're very concerned about speed, check out lxml.
#5
First of all, thank you. I'm playing with lxml right now, but I have problems selecting a node based on the value of an attribute. I've tried this example on the lxml homepage:
http://codespeak.net/lxml/tutorial.html#elementpath
but there are errors. Any suggestions?
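Without knowing which errors you got, here is a sketch of selecting nodes by attribute value with the standard-library ElementTree (Python 2.7 or 3.x), whose ElementPath supports the `[@attr='value']` predicate. The playlist XML and its `type`/`id` attributes are hypothetical. In lxml the same query can be a full XPath expression with a bound variable, e.g. `root.xpath("//video[@type=$t]", t="episode")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical playlist used only for illustration
PLAYLIST = """<playlist>
  <video id="a1" type="trailer"><url>http://example.com/a1.flv</url></video>
  <video id="b2" type="episode"><url>http://example.com/b2.flv</url></video>
  <video id="c3" type="episode"><url>http://example.com/c3.flv</url></video>
</playlist>"""


def urls_by_type(xml_text, video_type):
    """Return the URLs of all <video> elements whose type attribute matches."""
    root = ET.fromstring(xml_text)
    # [@attr='value'] is ElementPath's attribute predicate; the value is
    # pasted into the path, so it must not contain quote characters.
    path = ".//video[@type='%s']" % video_type
    return [v.findtext("url") for v in root.findall(path)]


def url_for_id(xml_text, video_id):
    """Same idea without a path predicate: compare the attribute in Python."""
    root = ET.fromstring(xml_text)
    for video in root.iter("video"):
        if video.get("id") == video_id:
            return video.findtext("url")
    return None
```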

Thanks, and good night
kimx
#6
rwparris2 Wrote: If you're very concerned about speed, check out lxml.

lxml is a really good parser, but it's huge. The Plex devs use lxml in Plex apps. I've thought about using lxml, but it isn't handy because I would have to bundle a different lxml build for every platform.

I wish XBMC supported a user Lib folder too (ticket 7246). If it did, we could create an additional Python library package on the Google Code addons page (containing all the parsers, like feedparser, BeautifulSoup, lxml, etc.) and install it with SVN Installer. After that, we wouldn't need tons of bundled parsers in every plugin.
#7
Those techniques are all well and "proper", but parsing a large document can have a severe RAM impact, and on the Xbox that might simply mean it won't work.
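One standard-library option not mentioned in the thread is `ElementTree.iterparse`, which parses incrementally and lets you discard elements once they have been handled. A sketch for modern Python; the `<video id="...">` layout is hypothetical, and whether this is enough for the Xbox's memory limits is untested.

```python
import io
import xml.etree.ElementTree as ET


def stream_video_ids(source):
    """Yield the id attribute of each <video> element as soon as it is parsed.

    source can be a filename or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "video":
            yield elem.get("id")
            # Free the finished element's children, attributes and text
            elem.clear()


if __name__ == "__main__":
    data = io.BytesIO(b"<videos><video id='1'/><video id='2'/></videos>")
    print(list(stream_video_ids(data)))  # ['1', '2']
```

The root element still holds references to the cleared (now empty) elements, so this limits memory growth rather than eliminating it.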

An alternative, "cheating" way would be to use regex; done right, it can also be the fastest way of parsing.
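A sketch of the regex approach, assuming a hypothetical `<video id="..."><url>...</url></video>` layout. It stays deliberately narrow: it skips self-closing `<video .../>` tags, ignores CDATA, comments and nested `<video>` elements, and decodes only `&amp;`, `&lt;` and `&gt;` via `xml.sax.saxutils.unescape`. Limits like these are what "done right" has to account for.

```python
import re
from xml.sax.saxutils import unescape

# Hypothetical layout: <video id="..."> ... <url>...</url> ... </video>
# (?<!/)> refuses to match a self-closing <video .../> start tag.
VIDEO_RE = re.compile(r"<video\b([^>]*)(?<!/)>(.*?)</video>", re.DOTALL)
ID_RE = re.compile(r"""\bid\s*=\s*["']([^"']*)["']""")
URL_RE = re.compile(r"<url>\s*(.*?)\s*</url>", re.DOTALL)


def regex_links(xml_text):
    """Return (id, url) pairs by pattern matching, without building a tree."""
    links = []
    for attrs, body in VIDEO_RE.findall(xml_text):
        id_match = ID_RE.search(attrs)
        url_match = URL_RE.search(body)
        if id_match and url_match:
            # unescape() decodes &amp;, &lt; and &gt; inside the URL text
            links.append((id_match.group(1), unescape(url_match.group(1))))
    return links
```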
Retired from Add-on dev
