[Proposal] Separating the scraper into a library.
#1
Hi all,

I'm a Belgian Computer Science student (1st Master) at the University of Ghent. I have successfully participated in GSoC for 2 years now and I really love the program. It allows me to code the entire summer without having to worry about earning money for the next year, and it gives me the opportunity to get into some new FOSS projects and get to know interesting people.

I've been using XBMC for almost 3 years now, I think, and I'm a bit obsessive about my media library. That led me to the IRC channel a few months ago, where I asked around and concluded that having the scraper support in a library would solve some of my problems. People told me then that the idea has been around for a while. I had a quick look at the code but didn't continue because I got some paid work. GSoC and the summer would be my opportunity to finally get into the XBMC code base for real.

Proposal
The main idea is to separate the scraper support into a library. There are several sub-parts: the intelligent file iteration, the XML scraper parsing, the callbacks that implement the XML scrapers' functionality, the acquiring of the data, and finally the storing of the results.
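To make the split a bit more concrete, here is a rough Python sketch of how those parts could sit behind one library entry point. Everything in it is an assumption for illustration only: the names iterate_files and scrape_library, the list of extensions, and the idea of passing the scraper and the storage as plain callables are hypothetical and not taken from the XBMC code base.

```python
from typing import Callable, Dict, Iterable, List

Metadata = Dict[str, str]

# Hypothetical list of extensions; a real implementation would be smarter.
VIDEO_EXTENSIONS = (".mkv", ".avi", ".mp4")


def iterate_files(paths: Iterable[str]) -> List[str]:
    """File iteration part: keep only paths that look like video files."""
    return [p for p in paths if p.lower().endswith(VIDEO_EXTENSIONS)]


def scrape_library(
    paths: Iterable[str],
    scraper: Callable[[str], Metadata],
    store: Callable[[str, Metadata], None],
) -> int:
    """Run each media file through the scraper and hand the result to storage.

    `scraper` stands in for the parsing, callback and data acquisition parts;
    `store` stands in for the storage part (a database, an .nfo writer, ...).
    Returns the number of files that were scraped.
    """
    count = 0
    for path in iterate_files(paths):
        store(path, scraper(path))
        count += 1
    return count


if __name__ == "__main__":
    database: Dict[str, Metadata] = {}
    # A fake scraper that only derives a title from the file name.
    fake_scraper = lambda p: {"title": p.rsplit("/", 1)[-1].rsplit(".", 1)[0]}
    scraped = scrape_library(
        ["/films/Alien (1979).mkv", "/films/readme.txt"],
        fake_scraper,
        database.__setitem__,
    )
    print(scraped, database)
```

Because the scraper and the storage are just passed in, the same entry point could be driven by XBMC itself, by a tool on a file server, or by a test with fake callables.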

My proposal will focus on the middle two parts, and if I have time left I will include the first. I would add some new features and options to the XML scrapers to make them even more powerful. For example, there is the idea of two-way communication, allowing the scraper to provide feedback to a back-end (for example, using the users' choices about which file is which movie for better predictions, some recommendation systems, ...). I would also like to see scraping become a lot more modular, so people can "put together" their own custom scraper from prefab blocks. People could then combine "small scraper blocks". Ratings, posters, fan-art, plot, actors, ...: all of these would be small blocks which could get their data from different sources. I might like IMDB's ratings but the plots from RT and the posters from somewhere else, and there might be no existing scraper with this combination. This would allow such user-defined combinations. It would also make it possible to rescrape only the ratings, or only the plots, and so on.
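As a minimal sketch of what I mean by small scraper blocks, assuming each block is simply a function that takes a title and returns one field: the class name ComposedScraper, the block functions and their return values are all made up for illustration, and none of them talk to a real site.

```python
from typing import Callable, Dict, Iterable, Optional

# A block takes a movie title and returns the value of one metadata field.
Block = Callable[[str], str]


def imdb_rating(title: str) -> str:
    """Placeholder block; a real one would query IMDB."""
    return "rating for " + title


def rt_plot(title: str) -> str:
    """Placeholder block; a real one would query RT."""
    return "plot for " + title


class ComposedScraper:
    """A user-defined scraper assembled from independent field blocks."""

    def __init__(self, blocks: Dict[str, Block]) -> None:
        self.blocks = dict(blocks)

    def scrape(
        self,
        title: str,
        fields: Optional[Iterable[str]] = None,
        existing: Optional[Dict[str, str]] = None,
    ) -> Dict[str, str]:
        """Scrape all fields, or only `fields`, on top of `existing` data."""
        result = dict(existing or {})
        names = list(fields) if fields is not None else list(self.blocks)
        for name in names:
            result[name] = self.blocks[name](title)
        return result


if __name__ == "__main__":
    scraper = ComposedScraper({"rating": imdb_rating, "plot": rt_plot})
    full = scraper.scrape("Alien")
    # Later on: refresh only the rating and keep the plot that is already stored.
    refreshed = scraper.scrape("Alien", fields=["rating"], existing=full)
    print(full)
    print(refreshed)
```

Swapping the source of one field is then just a matter of replacing one entry in the dictionary of blocks, and a partial rescrape is a call with a shorter list of fields.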

I have worked this out in a lot more technical detail, but I don't think that is interesting for everyone, so if you have any questions feel free to PM me, or PM me for my e-mail address so we can mail.

Benefits
There really are a huge number of benefits to having a library for this. I'll name some, but the list surely isn't exhaustive.
  • It will result in more modular, less interwoven code in XBMC's code base.
  • Other applications (utility tools, ...) could use the library and its bindings for Python/Java/... to support the same XML scrapers and have the same scraper functionality as XBMC does. This will enable a more uniform way of scraping, and tools like Ember would do things 100% the XBMC way. The resulting data should be 100% XBMC-compatible. The tools would support our regular scrapers!
  • Scraping could be done off-site. Your NAS/file server could do the scraping when a new file comes in, even if your media center isn't powered on (the data could go into the MySQL database if enabled, or otherwise be stored until the media center PC is on). Nice browser-based libraries could be made that show your media library with all the metadata XBMC would have, without your system being on.
  • Modular scrapers have a lot of benefits, some of which were mentioned above; I won't list all the others here.
  • Two-way communication.
  • If I get to the file iteration part, I would make it smart enough that XBMC would only need a single path and no longer an indication of what kind of data is to be found there (movies, shows, music, ...).
  • The whole library idea fits perfectly with the unified back-end movement documented on the wiki.
  • ...

I hope devs (and users) who read this are as excited as I am about this project. Please feel free to ask any questions here, suggest extra features, or provide feedback. I only have my own experience with the media library, so there might be awesome features I just don't think of because I'd never use them.

Greets,
Sander


Messages In This Thread
[Proposal] Separating the scraper into a library. - by dzan - 2012-03-24, 12:15