Standalone XBMC Scraper Utilities
#1
I am desperately trying to implement XBMC web scraping into my 3D Frontend by any means short of diving into the source code of XBMC myself.

After some searching I found a couple of projects that look like they could solve my problem, but I can't get either of them to work properly.

These projects are:
Scrap - http://wiki.xbmc.org/?title=Scrap
and
ScraperXML - http://sourceforge.net/projects/scraperxml/

I followed the instructions for the Scrap project's command-line options, but I think the scraper XML file I was testing with was too new for the old Scrap project to recognize. I could not find any instructions for ScraperXML, and I got a fatal error every time I launched the binary.

Here is an overview of exactly what I am trying to accomplish:

Purpose - Use XBMC scraper XML files to scrape media information from the internet independently of XBMC.

Use - I want to be able to start the tool with a command-line argument such as: scraper.exe imdb.xml "Fight Club"
Then I want the tool to write the results to an output file, such as output.xml, for my frontend to use when extracting the scraped information.

If there is more than one possible match, the list of candidates could be written to output.xml instead. My frontend would then decide which match to use and launch the tool again, this time with the new search string.
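
The contract described above can be sketched in Python. This is only an illustration, not how Scrap or ScraperXML work: `search_scraper` is a hypothetical placeholder for whatever engine actually evaluates the scraper XML, and the element names (`output`, `details`, `results`, `entity`) are assumptions made up for this example.

```python
import sys
import xml.etree.ElementTree as ET


def search_scraper(scraper_path, title):
    """Hypothetical placeholder: run the scraper XML file against a title.

    Expected to return a list of dicts, one per candidate match.
    A real engine would have to be plugged in here.
    """
    raise NotImplementedError("no scraper engine is wired in")


def build_output(query, matches):
    """Build the output tree: <details> for one match, <results> otherwise."""
    root = ET.Element("output", query=query)
    if len(matches) == 1:
        # Exactly one match: write its fields directly.
        details = ET.SubElement(root, "details")
        for key, value in matches[0].items():
            ET.SubElement(details, key).text = str(value)
    else:
        # Zero or several matches: write the candidate list so the
        # frontend can choose one and launch the tool again.
        results = ET.SubElement(root, "results", count=str(len(matches)))
        for match in matches:
            entity = ET.SubElement(results, "entity")
            for key, value in match.items():
                ET.SubElement(entity, key).text = str(value)
    return root


def main(argv, search=search_scraper, out_path="output.xml"):
    """Entry point for: scraper.py <scraper.xml> <title>"""
    if len(argv) != 3:
        print("usage: scraper.py <scraper.xml> <title>", file=sys.stderr)
        return 2
    scraper_path, title = argv[1], argv[2]
    matches = search(scraper_path, title)
    tree = ET.ElementTree(build_output(title, matches))
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return 0
```

A script would call `sys.exit(main(sys.argv))` at the bottom. Passing the search function in as a parameter keeps the output format testable without a working scraper engine.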


So, does anybody know of a tool that could do this? And if it happens to be ScraperXML or Scrap, could you explain how to use it? Any help would be much appreciated. I'm trying to wrap up this project I've been working on for 2 years, and this XBMC-powered media info scraping would be the icing on the cake!
#2
I am working on a Python interface to scrapers which loads XBMC as a shared library, but it'll be a while before it's available. When it is, it would be easy to write a small wrapper program that took a scraper name and media info (e.g., movie title) and output matches and/or match details in whatever format you wanted.
#3
dbrobins Wrote:I am working on a Python interface to scrapers which loads XBMC as a shared library, but it'll be a while before it's available. When it is, it would be easy to write a small wrapper program that took a scraper name and media info (e.g., movie title) and output matches and/or match details in whatever format you wanted.

Having a Python interface like that would be really awesome. The XBMC scrapers are superb thanks to the huge efforts of everyone involved in the project, so making them available via Python or any other language would be very useful for everyone.

May I ask what the timeframe on this solution is? Are we talking short term or long term?
#4
djace Wrote:Having a python interface like that would be really awesome. The XBMC scrapers are superb thanks to the huge efforts by everyone involved in the project, so making them available via python or any other language would be very useful for everyone.

May I ask what the timeframe on this solution is? Are we talking short term or long term?

Probably rather long term right now: I'm moving next month, so I won't be able to get back to it for a bit.

So far, I have:
  • XBMC loadable as a shared library (for standalone testing/use)
  • Partial Python test harness (I'm moving it to SWIG, which is a bit painful), able to run existing scrapers, get results back, get details, etc.

When I finish moving the test harness to SWIG, I'll work on a general interface to scrapers (API or I/O). That part will essentially be a capstone and fairly small in itself. It's important to me to have a standalone test harness first, both to verify that everything works the same way and to let people test eventual Python scrapers in an automated manner.
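
To illustrate what automated scraper testing of this kind could look like, here is a minimal sketch using Python's `unittest`. It is not the harness described above: `fake_scraper` and the `EXPECTED` data are made up for the example, and a real harness would call into the actual scraper engine instead.

```python
import unittest


def fake_scraper(title):
    """Hypothetical scraper under test; a real one would query a site."""
    catalog = {"fight club": [{"title": "Fight Club", "year": "1999"}]}
    return catalog.get(title.lower(), [])


# Recorded reference results (invented for this illustration).
EXPECTED = {
    "Fight Club": [{"title": "Fight Club", "year": "1999"}],
    "No Such Movie": [],
}


def make_regression_case(scraper, expected):
    """Create a TestCase class comparing scraper output to reference data."""

    class ScraperRegression(unittest.TestCase):
        def test_matches_reference(self):
            for query, want in expected.items():
                # subTest reports each query separately on failure.
                with self.subTest(query=query):
                    self.assertEqual(scraper(query), want)

    return ScraperRegression


def run_regression(scraper, expected):
    """Run the regression case and return the unittest.TestResult."""
    case = make_regression_case(scraper, expected)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_regression(fake_scraper, EXPECTED).wasSuccessful()` tells you whether the scraper still produces the recorded results, so the same reference data can be replayed against a new scraper implementation.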
#5
Nice,
Can you share your work somewhere?
#6
Shipis Wrote:Nice,
Can you share your work somewhere ?

The shared library work is up on GitHub (PR 214). It has been ready to merge for a while, though other changes have probably outdated it by now; I can update the diffs if someone is ready to merge it. It just takes a committer with interest, I guess; I'm not sure who that would be, or how to move it along.