2012-06-16, 17:05
(2012-06-16, 12:02) olympia Wrote: So you're saying ALL scrapers will need to be rewritten to accommodate the new API?
Can you already give a hint about what converting involves? Is it just a matter of changing the tags, or will the whole concept change?
Are you actually going to keep the buffer- and regexp-based scraper framework, or will it be something different?
Yeah, but obviously I will add code so that the old scrapers work just fine in the new engine. What I will not guarantee is that a single scraper can mix and match the new API and the old one (perhaps it will work, but I won't guarantee it).
So the concept will change in some ways, I guess. I'm still sketching and want input on how best to do it.
What I want is to move away from using only regexp. Regexp is great for lots of things, but it's not the best for everything; for HTML traversal in particular there are better tools. So I want the new engine to allow for more tools, such as things similar to jQuery and BeautifulSoup. But as I said, this is why I'm making this post, so we can discuss what's wanted.
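To illustrate the regexp versus HTML traversal point, here is a rough sketch in Python. It is not the scraper engine or its API; the HTML snippets, the function names and the ClassTextExtractor class are made up for the example. It uses only the standard library's html.parser as a stand-in for jQuery/BeautifulSoup-style tools, and pulls the same title out of two equivalent pieces of markup.

```python
import re
from html.parser import HTMLParser

# Two hypothetical snippets carrying the same information.
# The second only changes quoting, attribute order and adds a class.
HTML_A = '<span class="title">Alien</span>'
HTML_B = "<span id='t' class='title main'>Alien</span>"

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def title_by_regex(html):
    """Regexp approach: tied to the exact spelling of the opening tag."""
    match = re.search(r'<span class="title">([^<]*)</span>', html)
    return match.group(1) if match else None


class ClassTextExtractor(HTMLParser):
    """Collects the text inside every element that has a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0    # > 0 while we are inside a matching element
        self.texts = []   # one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def title_by_parser(html):
    """Parser approach: looks at the element's class, not at its spelling."""
    parser = ClassTextExtractor("title")
    parser.feed(html)
    parser.close()
    return parser.texts[0].strip() if parser.texts else None


if __name__ == "__main__":
    for snippet in (HTML_A, HTML_B):
        print(title_by_regex(snippet), title_by_parser(snippet))
```

The regexp only matches the first snippet, while the parser-based version finds the title in both, which is the kind of robustness a traversal tool would give a scraper. The sketch assumes reasonably well-formed markup; a real library would cope with broken HTML much better.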