Python scraper
#1
I've started a project similar to ScraperXML, but in Python, with the goal of compatibility with Dharma+ addons.
However, all the information about scraper development is kind of outdated (well, that's an understatement), or perhaps I've missed something?

I'm trying to reverse engineer the scrapers included in the Dharma release, but I'm getting very confused. Is there any information on how the Dharma engine works with scrapers?

Perhaps a flowchart?
Reply
#2
The code. See addons/Scraper.cpp and video/VideoInfoDownloader.cpp.
Reply
#3
Oh, my C/C++ is very rusty. This will be interesting.

-Z
Reply
#4
I've put up a Git repository on GitHub for the project. There's not much yet since I started today, but here it is anyway.

https://github.com/ztripez/pyScraper
Reply
#5
OK, I've built an addon class that builds a stack with all the functions from its addon and from its dependencies.

I have a couple of questions though:

* There are 20 buffer slots. Is there a local set of buffers in every function, or is there one global set?


* A snippet from tmdb.xml:
Quote:
<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="<url>http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1</url>" dest="3">
    <RegExp input="$$2" output="+\1" dest="4">
      <expression clear="yes">(.+)</expression>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</CreateSearchUrl>

The basics are simple:
- Use buffer 1 as the source, apply the regex replace with the output template, and put the result in buffer 3.
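A minimal Python sketch of that single step, for illustration only. The function name `apply_regexp`, the dict of numbered buffers, and the placeholder URL are all assumptions made for the example; this is not how XBMC itself implements it.

```python
import re


def apply_regexp(buffers, input_ref, expression, output, dest):
    """Apply one scraper-style RegExp step to a dict of numbered buffers.

    Hypothetical helper: reads buffers[input_ref], matches `expression`
    against it, fills \\1..\\9 in `output` with the captured groups and
    stores the result in buffers[dest].
    """
    source = buffers.get(input_ref, "")
    match = re.search(expression, source, re.DOTALL)
    if match is None:
        # Assumption: with no match, the destination is left untouched.
        return buffers

    def fill(token):
        index = int(token.group(1))
        if index > match.re.groups:
            return ""  # the template references a group that does not exist
        return match.group(index) or ""

    buffers[dest] = re.sub(r"\\(\d)", fill, output)
    return buffers


# Buffer 1 holds the search title, as in the CreateSearchUrl snippet.
buffers = {1: "Blade Runner"}
apply_regexp(
    buffers,
    input_ref=1,
    expression="(.+)",
    output=r"<url>http://example.invalid/search/\1</url>",  # placeholder URL
    dest=3,
)
print(buffers[3])
```

A real implementation would probably also URL-encode the captured text before it goes into the URL; that is left out here.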

However, since there is a nested RegExp, should I run the nested regex on the parent's buffer, and if so, should I do it before or after I've applied the parent's regex?
Reply
#6
The buffers are global to the parser. If you dig a bit you'll see the 'clearbuffers=no' tag. That's a way to pass information between functions.

Expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
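A rough Python sketch of that depth-first order, assuming the scraper function is parsed with `xml.etree.ElementTree`. The names `render`, `evaluate` and `run_function` are hypothetical, the URL is a placeholder, the `<url>` markup is written as escaped entities so the fragment is well-formed XML, and the `$$4` in the outer output was added only so the inner result is visible. An empty expression is assumed to fall back to `(.*)`.

```python
import re
import xml.etree.ElementTree as ET

SNIPPET = r"""
<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="&lt;url&gt;http://example.invalid/search/\1$$4&lt;/url&gt;" dest="3">
    <RegExp input="$$2" output="+\1" dest="4">
      <expression>(.+)</expression>
    </RegExp>
    <expression>(.+)</expression>
  </RegExp>
</CreateSearchUrl>
"""

# Matches either a group reference (\1) or a buffer reference ($$4).
TOKEN = re.compile(r"\\(\d)|\$\$(\d+)")


def render(template, match, buffers):
    """Fill \\N from the regex match and $$N from the buffers in one pass."""
    def fill(token):
        if token.group(1) is not None:
            index = int(token.group(1))
            if match is None or index > match.re.groups:
                return ""
            return match.group(index) or ""
        return buffers.get(int(token.group(2)), "")
    return TOKEN.sub(fill, template)


def evaluate(node, buffers, order):
    """Evaluate nested RegExp children first, then the node itself."""
    for child in node.findall("RegExp"):
        evaluate(child, buffers, order)
    expr = node.find("expression")
    pattern = (expr.text if expr is not None else None) or "(.*)"
    source = render(node.get("input", "$$1"), None, buffers)
    dest = int(node.get("dest", "1"))
    order.append(dest)  # record the evaluation order for inspection
    match = re.search(pattern, source, re.DOTALL)
    if match:
        buffers[dest] = render(node.get("output", ""), match, buffers)


def run_function(xml_text, buffers):
    root = ET.fromstring(xml_text.strip())
    order = []
    for regexp in root.findall("RegExp"):
        evaluate(regexp, buffers, order)
    return buffers.get(int(root.get("dest", "1")), ""), order


result, order = run_function(SNIPPET, {1: "Blade Runner", 2: "1982"})
print(order)   # destination buffers in the order they were written
print(result)
```

Because the inner RegExp runs first, buffer 4 is already filled by the time the outer output template is rendered.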
Reply
#7
spiff Wrote:the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.
But if the buffers are global to the scraper, why is 'clearbuffers=no' needed? When do the buffers get cleared?

spiff Wrote:expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
Alright, I thought so, thanks.


Thanks for the info
-Z
Reply
#8
By default, if that tag isn't set, the buffers are cleared at the end of a function call (or, well, somewhere before the next function is called, but logic-wise it's easiest to do it at the end of an evaluation).
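As a sketch of that behaviour in Python (all names here are hypothetical, and plain callables stand in for the XML-defined scraper functions): the parser owns one set of 20 buffers and wipes it after each call unless told not to.

```python
class ScraperParser:
    """Toy model of parser-wide buffers with an optional clear per call."""

    BUFFER_COUNT = 20

    def __init__(self):
        self.buffers = self._empty()

    def _empty(self):
        # Buffers are numbered 1..20 to mirror the $$1..$$20 references.
        return {n: "" for n in range(1, self.BUFFER_COUNT + 1)}

    def call(self, function, clear_buffers=True):
        """Run one function, then clear unless clear_buffers is False."""
        result = function(self.buffers)
        if clear_buffers:
            self.buffers = self._empty()
        return result


def store_id(buffers):
    buffers[5] = "tt0083658"  # stash a value for a later function
    return "stored"


def read_id(buffers):
    return "id=" + buffers[5]


parser = ScraperParser()
parser.call(store_id, clear_buffers=False)  # like clearbuffers="no"
kept = parser.buffers[5]                    # still set for the next call
second = parser.call(read_id)               # default: cleared afterwards
print(kept, second, repr(parser.buffers[5]))
```

Clearing at the end of `call` rather than at the start of the next one keeps the logic in one place, which matches the "easiest to have it at the end" remark above.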
Reply
#9
Alright, thanks
Reply
#10
Thanks for the info
Reply


