Kodi Community Forum
[Release] Parsedom and other functions - Printable Version


[Release] Parsedom (fast alternative to BeautifulSoup) and other functions - TobiasTheCommie - 2011-12-08

This is version 0.9.1 of a DOM parser (and other assorted functions) for XBMC.

This addon will reach its final 1.0 release together with YouTube 3.0.

The parseDOM function greatly cuts down on the time it takes to scrape a site, as compared to BeautifulSoup. The implementation structure is quite different from BeautifulSoup though.

Some scraper functions in the YouTube plugin have been cut from more than 20 s to under 3 s by changing from BeautifulSoup to parseDOM.

Besides this feature, the addon also provides the following functions:
- A urllib2 wrapper function called fetchPage.
- Strip <tags> from a string.
- Convert call path key=value pairs to a dictionary.
- Ask the user for keyboard input.
- Ask the user for numpad input.
- Replace certain HTML codes like &amp; with the real character.
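
As a rough illustration of what three of the helpers listed above do, here is a sketch built only on the Python standard library. It is not the addon's code: the names strip_tags, params_to_dict and replace_html_codes are made up for this example, the plugin URL is invented, and Python 3 modules are assumed rather than the Python 2 that XBMC shipped at the time.

```python
import re
from html import unescape
from urllib.parse import parse_qsl


def strip_tags(text):
    """Remove anything that looks like <...> from a string (hypothetical helper)."""
    return re.sub(r"<[^>]+>", "", text)


def params_to_dict(call_path):
    """Turn the key=value pairs after '?' in a call path into a dictionary."""
    query = call_path.split("?", 1)[-1]
    return dict(parse_qsl(query))


def replace_html_codes(text):
    """Replace HTML codes such as &amp; with the real character."""
    return unescape(text)


# Example inputs are invented for illustration.
plain = strip_tags("<b>Hello</b> <i>world</i>")
params = params_to_dict("plugin://plugin.video.example/?action=play&id=42")
title = replace_html_codes("Tom &amp; Jerry")
```

The addon's own function names and signatures are documented on the wiki and may differ from these stand-ins.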

Implementation details can be found on the wiki.


This is a dependency used by our other plugins, and we run continuous integration and unit tests during development.


Questions, requests, suggestions are welcome.

Note: Before the 1.0 final we do not promise the interface will stay locked.
After the 1.0 version there will be "DeprecatedFunction" warnings on changes for a limited time.


- Popeye - 2011-12-09

Very interesting. What is the reason for the time reduction? It would be interesting to hear. Also, are there any caveats when moving from BeautifulSoup?

- TobiasTheCommie - 2011-12-09

The speed increase is mostly because BeautifulSoup actually parses the DOM correctly, and has very good error handling.

Our parseDOM function doesn't even try to parse the document; it just extracts the tag attributes/content you want. parseDOM is basically a few (very hardened by now) regexes.

That means that pages with broken HTML may work in BeautifulSoup, but may fail in parseDOM.
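
To make the trade-off concrete, here is a deliberately tiny sketch of regex-based extraction. It is not the parseDOM implementation: extract_contents is a hypothetical function written for this example, it only understands double-quoted attributes, and it does not handle nested tags of the same name.

```python
import re


def extract_contents(html, tag, attrs=None):
    """Return the inner text of every <tag ...>...</tag> whose attributes match attrs.

    Hypothetical, simplified stand-in for a regex-based extractor.
    """
    attrs = attrs or {}
    pattern = re.compile(
        r"<%s\b([^>]*)>(.*?)</%s>" % (re.escape(tag), re.escape(tag)),
        re.DOTALL | re.IGNORECASE,
    )
    results = []
    for match in pattern.finditer(html):
        attr_text, inner = match.group(1), match.group(2)
        # Collect name="value" pairs from the opening tag.
        found = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', attr_text))
        if all(found.get(key) == value for key, value in attrs.items()):
            results.append(inner)
    return results


good = '<div class="a">one</div><div class="b">two</div>'
broken = '<div class="a">one'  # closing tag missing

matched = extract_contents(good, "div", {"class": "a"})
missed = extract_contents(broken, "div", {"class": "a"})
```

With the well-formed input the first div is found. With the unclosed div the pattern never matches and the result is an empty list, which is the kind of failure on broken HTML described above. A real parser would usually repair the markup instead.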

With that said we have yet to hit anything that we couldn't extract with parseDOM, and we use that extensively for both YouTube and BlipTV. Our Vimeo plugin will be moved to parseDOM as well in the future.

I hope that answers your question.

- Popeye - 2011-12-09

Thanks for the information. Sounds very interesting and something I will look into using in my pinkbike and happymtb addons.

- TobiasTheCommie - 2011-12-10

If you have any problems feel free to report them.

Hopefully we can find a solution.

- _Pierre_ - 2011-12-10

Cool, will try to use this.

- bossanova808 - 2011-12-11

Ok simple question I presume - I try and import this using

import CommonFunctions

# plugin constants
version = "0.0.1"
plugin = "OzWeather-" + version
author = "Bossanova808"
url = "www.bossanova808.net"

common = CommonFunctions.CommonFunctions()
common.plugin = plugin

and this gives me (when running in xbmc - only way I know how):

15:32:38 T:2868   ERROR: Traceback (most recent call last):
                                              File "C:\Users\X\AppData\Roaming\XBMC\addons\weather.OzWeather\default.py", line 32, in <module>
                                                common = CommonFunctions.CommonFunctions()
                                              File "C:\Users\X\AppData\Roaming\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 41, in __init__
                                                if sys.modules[ "__main__" ].dbglevel:
                                            AttributeError: 'module' object has no attribute 'dbglevel'

I looked in the youtube plugin and it appears to be doing just this the same way....

Any ideas for a Python novice?

- TobiasTheCommie - 2011-12-11

Yeah, sorry about that. The released version has some problems with the fallbacks.

This has been fixed in trunk.

This should work.

import xbmc, xbmcgui
# plugin constants
version = "0.0.1"
plugin = "OzWeather-" + version
author = "Bossanova808"
url = "www.bossanova808.net"
dbg = True # Set to False if you don't want debugging
dbglevel = 3 # Do NOT change from 3

import CommonFunctions
common = CommonFunctions.CommonFunctions()
common.plugin = plugin
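
For anyone wondering why the order matters: the traceback above shows the module reading dbglevel from the add-on's main script through sys.modules[ "__main__" ], so the variable has to exist there before CommonFunctions() is created. A minimal sketch of such a lookup with a fallback follows. The helper read_main_setting is hypothetical and the actual fix in trunk may look different.

```python
import sys


def read_main_setting(name, default):
    """Read a module-level setting from the main script, or return a default.

    Hypothetical helper: getattr with a default avoids the AttributeError
    raised when the main script never defined the setting.
    """
    main = sys.modules.get("__main__")
    return getattr(main, name, default)


# Falls back to 3 when the main script does not define dbglevel.
dbglevel = read_main_setting("dbglevel", 3)
```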

- bossanova808 - 2011-12-11

Yep but I am not getting very far...

given this url:

and this parse:
ret = common.parseDOM(page, "div", attrs = { "class": "boxed_blue_nopads" })

I get:
File "C:\Users\\AppData\Roaming\XBMC\addons\weather.OzWeather\default.py", line 261, in <module>
                                              File "C:\Users\\AppData\Roaming\XBMC\addons\weather.OzWeather\default.py", line 150, in forecast
                                              File "C:\Users\\AppData\Roaming\XBMC\addons\weather.OzWeather\default.py", line 153, in propertiesPDOM
                                                ret = common.parseDOM(page, "div", attrs = { "class": "boxed_blue_nopads" })
                                              File "C:\Users\\AppData\Roaming\XBMC\addons\script.module.parsedom\lib\CommonFunctions.py", line 188, in parseDOM
                                                item = item.replace("\n", "")
                                            AttributeError: addinfourl instance has no attribute 'replace'

- TobiasTheCommie - 2011-12-11

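You are calling parseDOM with something that is not a string: the traceback shows an addinfourl object, i.e. the urllib2 response that has not been read yet. parseDOM must be called with a "string" or a [ "list", "of", "strings"]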

And now the next version will bitch and complain on that error. :)

- bossanova808 - 2011-12-11

Yup that would be it. Sorry I don't get python types yet! I thought my fetch was returning a string...thanks!
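
For other Python newcomers hitting the same thing: the object returned by opening a URL has to be read before it is a string. A small sketch of a guard that could sit in front of a parseDOM call is below. The helper to_text is hypothetical (not part of the addon) and it assumes Python 3, where the raw body arrives as bytes.

```python
import io


def to_text(page, encoding="utf-8"):
    """Return page as a string, reading and decoding it first if needed.

    Hypothetical helper: accepts a string, bytes, or a file-like object
    with a read() method, such as an HTTP response.
    """
    if hasattr(page, "read"):
        page = page.read()
    if isinstance(page, bytes):
        page = page.decode(encoding, "replace")
    if not isinstance(page, str):
        raise TypeError("expected a string, got %s" % type(page).__name__)
    return page


# io.BytesIO stands in for an unread HTTP response here.
response = io.BytesIO(b'<div class="boxed_blue_nopads">forecast</div>')
html = to_text(response)
```

Anything that is neither text nor readable (a dictionary, for example) raises a TypeError right away instead of failing deep inside the parser.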

- TobiasTheCommie - 2011-12-12

I have updated the wiki with some examples with XML instead of HTML.

The parseDOM function has only had limited XML use so far, so this is in no way tested.

But feel free to experiment and report back.

- bossanova808 - 2011-12-12

Ok I have spent some time with this today and wow it's super easy to use - very slick. I am pulling all sorts of things very easily....

I have nothing to compare to really, but this certainly works easily and fast.

- _Pierre_ - 2011-12-12

Can this be used to post an HTML form and get the result?

- TobiasTheCommie - 2011-12-12

_Pierre_ wrote: Can this be used to post an HTML form and get the result?

The included fetchPage cannot currently be used to POST.

We are currently converting the Vimeo plugin to use these dependencies as well. And we are updating the dependencies as we go along.

I "hope" to have the Vimeo login procedure go through fetchPage, but that is currently not the case, and it would require POST support.
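
Until fetchPage supports it, a form can be posted with the standard library directly. The sketch below only builds the request and does not send it. It is not part of the addon: build_post_request is a hypothetical helper, the URL and field names are invented, and Python 3's urllib is assumed (under the Python 2 of XBMC the equivalent calls live in urllib and urllib2).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields):
    """Build a form-encoded POST request (hypothetical helper, nothing is sent)."""
    data = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(url, data=data, headers=headers)


# Invented URL and form fields, for illustration only.
request = build_post_request(
    "http://example.invalid/login",
    {"user": "me", "password": "secret"},
)
```

Passing the request to urllib.request.urlopen and calling read() on the result would send the form and return the response body as bytes.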
