Clean scraping API
So for those who know, and those who don't: I will be working throughout the summer on a new generalized scraper API for xbmc. I will build it outside xbmc, in Python, which will hopefully make it usable in xbmc quickly.

As a first step I've created a script that gathers scraped data; I hope to release it into the official repository soon. The data it generates will a) give us an idea of how well the old engine works and b) serve as training data for a new engine. All data will be anonymous, and I will write a blog post about it when it hits the official repository.

To highlight what I want to achieve with a new engine, here is what I think is important:
  • It must be generalized: adding a new media type should be trivial, and no part of the core should be bound to specific media types.
  • Which fields are of interest is not tied to the engine; any scraper may add metadata as it sees fit. The users of the engine (xbmc and skinners) may choose what data they understand, but scrapers can emit any type of data they want.
  • Parallelism! As much as possible needs to be parallelism-friendly, ideally not only between files but across all parts of scraping a single file too.
  • Everything is linked: a movie can have a soundtrack and a game associated with it. The director of a movie can be the singer in a band and share photographs on a site.
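
To make these goals a bit more concrete, here is a rough Python sketch of one possible shape for such an engine. It is only an illustration under my own assumptions, not the actual design: every name in it (Item, Link, scrape_all, link, and the two toy scrapers) is made up for the example, and the scrapers only look at the file path instead of querying any real site.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Item:
    """A scraped entity of any kind. The core never inspects the fields."""
    metadata: Dict[str, Any] = field(default_factory=dict)
    links: List["Link"] = field(default_factory=list)


@dataclass
class Link:
    """A named relation from one item to another (hypothetical model)."""
    relation: str
    target: Item


# A scraper is just a callable: path in, arbitrary metadata dict out.
Scraper = Callable[[str], Dict[str, Any]]


def filename_scraper(path: str) -> Dict[str, Any]:
    """Toy scraper: derive a title from the file name."""
    stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return {"title": stem.replace(".", " ")}


def extension_scraper(path: str) -> Dict[str, Any]:
    """Toy scraper: guess a media kind from the extension."""
    ext = path.rsplit(".", 1)[-1].lower()
    return {"kind": {"mkv": "movie", "mp3": "song"}.get(ext, "unknown")}


def scrape_all(paths: List[str], scrapers: List[Scraper],
               max_workers: int = 4) -> Dict[str, Item]:
    """Run every scraper on every path in one shared thread pool.

    Each (path, scraper) pair is its own task, so work is parallel both
    between files and between the scraping steps of a single file.
    """
    items = {path: Item() for path in paths}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(path, pool.submit(scraper, path))
                   for path in paths for scraper in scrapers]
        # Merge in submission order so the result is deterministic:
        # a later scraper overrides an earlier one on conflicting keys.
        for path, future in futures:
            items[path].metadata.update(future.result())
    return items


def link(source: Item, relation: str, target: Item) -> None:
    """Record that `source` is related to `target`."""
    source.links.append(Link(relation, target))


if __name__ == "__main__":
    found = scrape_all(["/media/Alien.mkv", "/media/Alien.Theme.mp3"],
                       [filename_scraper, extension_scraper])
    movie = found["/media/Alien.mkv"]
    song = found["/media/Alien.Theme.mp3"]
    link(movie, "soundtrack", song)
    print(movie.metadata, [l.relation for l in movie.links])
```

Note that the media kind is just another metadata field emitted by a scraper, so adding a new media type here means adding a scraper rather than touching scrape_all. A real engine would also need error handling, caching and rate limiting, all of which are left out of this sketch.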

I will go over my plan here in more detail later, but first I'd love to hear from all current scraper creators what features they would like to see and, perhaps even more importantly, what features they like in the old system.

I hope it's going to be a great summer!

Messages In This Thread
Clean scraping API - by topfs2 - 2012-06-14, 19:33

