Kodi Community Forum

Full Version: Xml parsing in plugin
Hi guys,
I want to code a video plugin but I have some problems. I've seen the plugin tutorial from Voinage, and there he uses a site where the links to the videos are on the HTML page. On the site I'm targeting, some of the links are loaded through JavaScript. I've found the XML files where the video ids are stored. The problem is that I need to parse the links out of various XML files, but I don't know how.

Thanks

kimx

Try looking at minidom, ElementTree, BeautifulStoneSoup, or one of the many other XML parsing modules for Python. Googling something like 'parse xml with python' or even 'parse links from xml with python' should return some good results.
I haven't played with minidom myself, but here is an example:
http://sebsauvage.net/python/snyppets/in...#parse_rss
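
In case that link is hard to reach, here is a minimal minidom sketch. The sample XML, the `<video>` tag and its `id` attribute are made up for illustration, since the real file layout isn't shown in this thread, and `get_video_ids` is just a helper name I picked.

```python
from xml.dom import minidom

# Hypothetical sample; the real XML layout is an assumption.
SAMPLE_XML = """<videos>
  <video id="101"><title>First clip</title></video>
  <video id="102"><title>Second clip</title></video>
</videos>"""

def get_video_ids(xml_text):
    """Return the id attribute of every <video> element in the document."""
    dom = minidom.parseString(xml_text)
    try:
        return [node.getAttribute("id")
                for node in dom.getElementsByTagName("video")]
    finally:
        # minidom trees contain reference cycles; unlink() frees them early
        dom.unlink()

video_ids = get_video_ids(SAMPLE_XML)
```

For a file on disk, `minidom.parse(path)` can be used instead of `parseString`.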

Concerning ElementTree or BeautifulSoup, I used both on the same XML, and from my point of view:
- BeautifulSoup tolerates errors in the XML file better, but it is a little slower.
Here is a quick example (I haven't tested this one specifically, but it should work):
Code:
import os
from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3.x

# CACHEDIR and XMLFile are assumed to be defined elsewhere in the plugin
soup = BeautifulStoneSoup(open(os.path.join(CACHEDIR, XMLFile), 'r').read())
cat_scrapers = soup.find("scrapers")

if cat_scrapers is not None:
    for item in cat_scrapers.findAll("entry"):
        # A missing child tag comes back as None, hence the hasattr checks;
        # an empty tag has .string set to None
        if hasattr(item.title, 'string'):
            if item.title.string is not None:
                title = item.title.string.encode("cp1252")
        if hasattr(item.version, 'string'):
            if item.version.string is not None:
                version = item.version.string.encode("utf-8")
        if hasattr(item.lang, 'string'):
            if item.lang.string is not None:
                language = item.lang.string.encode("utf-8")
        if hasattr(item.date, 'string'):
            if item.date.string is not None:
                date = item.date.string.encode("cp1252")
        if hasattr(item.previewvideourl, 'string'):
            if item.previewvideourl.string is not None:
                previewVideoURL = item.previewvideourl.string.encode("utf-8")


- ElementTree doesn't tolerate errors (which means you have more to write in order to cover those cases), but it is very fast at parsing XML. I have seen a huge difference in speed.
Here is the same example with ElementTree (not tested):
Code:
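import os
import elementtree.ElementTree as ET  # on Python 2.5+: import xml.etree.ElementTree as ET

# CACHEDIR and XMLFile are assumed to be defined elsewhere in the plugin
elems          = ET.parse( open( os.path.join( CACHEDIR, XMLFile ), "r" ) ).getroot()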
cat_scrapers = elems.find( "scrapers" ).findall( "entry" )

for item in cat_scrapers:
    title             = item.findtext( "title" )
    version           = item.findtext( "version" )
    language          = item.findtext( "lang" )
    date              = item.findtext( "date" )
    added             = item.findtext( "added" )
    previewVideoURL   = item.findtext( "previewVideoURL" )

In both cases you will of course need to cover the exceptions (using a try/except block) and the edge cases.
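
As a rough sketch of what that could look like with ElementTree: `parse_entries` is a hypothetical helper name, the tag names are the ones from the example above, and returning an empty list on bad input is just one possible policy.

```python
try:
    import xml.etree.ElementTree as ET    # Python 2.5 and later
except ImportError:
    import elementtree.ElementTree as ET  # standalone package on Python 2.4

def parse_entries(xml_text):
    """Return a list of dicts, or [] if the XML is malformed or has no <scrapers>."""
    try:
        root = ET.fromstring(xml_text)
    except Exception:
        # Malformed XML; the exact exception class depends on the
        # ElementTree version, so this sketch catches broadly.
        return []

    scrapers = root.find("scrapers")
    if scrapers is None:
        # Edge case: well-formed file without the expected section
        return []

    entries = []
    for item in scrapers.findall("entry"):
        entries.append({
            # findtext() returns the given default when the tag is missing
            "title": item.findtext("title", ""),
            "version": item.findtext("version", ""),
        })
    return entries
```

Passing a default to `findtext()` avoids having to check every field for None afterwards.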

Here it is. I hope it helps.
Minidom sits between the two in terms of speed. Like ElementTree, it requires mostly correct XML. Its only benefit over ElementTree is that it is in the standard library in Python 2.4 (the version in XBMC).

If you're very concerned about speed, check out lxml.
First of all, thank you. I'm playing with lxml right now, but I have problems getting to a node based on the value of an attribute. I've tried this example from the lxml homepage:
http://codespeak.net/lxml/tutorial.html#elementpath
but there are errors. Any suggestions?

Thanks and gn8
kimx
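
Without the error message it is hard to say what went wrong, but selecting a node by attribute value usually comes down to an ElementPath predicate. The sketch below uses the standard library's ElementTree so it runs without extra installs; as far as I know, `find()` in lxml.etree accepts the same path. The `<video>` and `<url>` tags and the `id` attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real XML layout is an assumption.
SAMPLE_XML = """<playlist>
  <video id="101"><url>http://example.com/a.flv</url></video>
  <video id="102"><url>http://example.com/b.flv</url></video>
</playlist>"""

root = ET.fromstring(SAMPLE_XML)

# [@id='102'] keeps only the <video> elements whose id attribute equals 102
node = root.find(".//video[@id='102']")

url = None
if node is not None:  # find() returns None when nothing matches
    url = node.findtext("url")
```

Attribute predicates need ElementTree 1.3 (Python 2.7 or later). On older versions, looping over `findall("video")` and comparing `item.get("id")` does the same job.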
rwparris2 Wrote:If you're very concerned about speed, check out lxml.

lxml is a really good parser, but it's huge. The Plex devs use lxml in Plex apps. I've thought about using lxml, but it isn't practical because I would have to ship a different lxml build for every platform.

I wish XBMC supported a user Lib folder too (ticket 7246). If it did, we could create an additional Python library package on the Google add-on page (with all the parsers in it, such as feedparser, BeautifulSoup, or lxml) and install it with SVN Installer. After that, we wouldn't need to bundle tons of parsers in every plugin.
Those techniques are all well and "proper", but parsing a large doc can have a severe RAM impact, and on Xbox that might simply mean it won't work.

An alternative "cheating" way would be to use regex. If done right, it can also be the fastest way of parsing.
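
A minimal sketch of the regex route, assuming a made-up layout where each `<video>` tag carries an `id` attribute and wraps a `<url>` tag. The pattern is tied to that exact layout and will silently miss anything formatted differently, which is the price of "cheating".

```python
import re

# Hypothetical sample; the real XML layout is an assumption.
SAMPLE_XML = '''<playlist>
  <video id="101"><url>http://example.com/a.flv</url></video>
  <video id="102"><url>http://example.com/b.flv</url></video>
</playlist>'''

# Group 1 captures the id, group 2 the URL; .*? is non-greedy so each
# match stops at the first closing </url>
VIDEO_RE = re.compile(r'<video\s+id="(\d+)">\s*<url>(.*?)</url>', re.DOTALL)

# findall() returns one (id, url) tuple per match
videos = VIDEO_RE.findall(SAMPLE_XML)
```

On a really large document, `VIDEO_RE.finditer()` yields the matches one at a time instead of building the whole list in memory.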