Clean scraping API

Martijn Offline
Team Kodi
Posts: 12,189
Joined: Jul 2011
Reputation: 170
Location: Dawn of time
Post: #16
I made an add-on for Frodo that does exactly that:

http://forum.xbmc.org/showthread.php?tid=132714

Always read the XBMC online-manual, FAQ and search the forums before posting.
Do NOT e-mail Team-XBMC members asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting, make sure you read this first

For your mediacenter artwork go to
[Image: fanarttv.png]
(This post was last modified: 2012-06-20 16:35 by Martijn.)
Martijn Offline
Team Kodi
Posts: 12,189
Joined: Jul 2011
Reputation: 170
Location: Dawn of time
Post: #17
@topfs2
I saw this reply on the blog post at xbmc.org:

Quote:June 20th, 2012 at 08:42 | #13

I tried the plugin, but on the screen where you select which source directories to submit, the list of directories is too long for my display (running at 1080p) and the list isn't scrollable. This has to be solved; otherwise my data, at least, will be useless or counterproductive.

(This post was last modified: 2012-06-20 21:04 by Martijn.)
jmarshall Offline
Team-XBMC Developer
Posts: 26,221
Joined: Oct 2003
Reputation: 178
Post: #18
topfs2 is on holiday for the next 2 weeks.

If someone else could please take the script and fix it up it would be much appreciated.

DonJ Offline
Team-Kodi Member
Posts: 524
Joined: May 2005
Reputation: 5
Post: #19
(2012-06-16 11:30)topfs2 Wrote:  I'm not 100% sure I follow; I'd love some more examples. What I'm not sure about is whether you want the configuration as an XBMC user, as XBMC itself (code using this engine), or as a scraper developer. I haven't 100% decided what will trigger the scanning; what I'm focusing on mostly now is what to do when you know file X exists and want to gather data on it. I'd love some thoughts on the actual scanning process too if it's of interest in this project.

I think the file scanning process should be completely decoupled from the scraping process. The process would then be as follows:

1) "Something" finds a file (this something might be an add-on, an external tool that notifies XBMC via e.g. JSON, or the integrated XBMC file scanner).

2) The path to the file or directory is pushed to the scraper, which starts gathering metadata via XML scrapers etc.

Therefore, the important part, in my opinion, is to create a good, easy-to-use API to push paths/directories to the scraper. The same API should probably also allow the deletion of data from the library.

I think this would really open up the file scanning process to third-party add-on/tool developers. Hope this helps.
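
To make the two steps above concrete, here is a minimal sketch of what such a push API could look like. Everything in it (ScrapeQueue, push_path, remove_path, fake_scraper) is a made-up name for illustration; nothing like this exists in XBMC as far as this thread shows.

```python
from collections import deque


class ScrapeQueue:
    """Hypothetical entry point: accepts paths from any file finder."""

    def __init__(self, scrape):
        self._scrape = scrape      # callable: path -> metadata dict
        self._pending = deque()    # paths pushed but not yet scraped
        self.library = {}          # path -> scraped metadata

    def push_path(self, path):
        """Called by an add-on, an external tool or the built-in scanner."""
        self._pending.append(path)

    def remove_path(self, path):
        """Drop a path's data from the library; True if something was removed."""
        return self.library.pop(path, None) is not None

    def process(self):
        """Scrape everything queued so far, in the order it was pushed."""
        while self._pending:
            path = self._pending.popleft()
            self.library[path] = self._scrape(path)


def fake_scraper(path):
    # Stand-in for the real metadata lookup (assumption, no network access).
    return {"title": path.rsplit("/", 1)[-1]}


queue = ScrapeQueue(fake_scraper)
queue.push_path("/movies/Iron Man (2008).mkv")
queue.process()
```

The point of the design is that whoever finds the file only needs to know push_path and remove_path; how the metadata is gathered stays hidden behind the queue.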

Thorbear Offline
Junior Member
Posts: 43
Joined: Aug 2010
Reputation: 0
Location: Uppsala, Sweden
Post: #20
I've posted a comment on the article at xbmc.com.
Even though I have version 0.0.4 of the script (stated to be working), it fails for me.
I'd love to contribute my data if it gets fixed.

My log: http://xbmclogs.com/show.php?id=4280

solidsatras Offline
Senior Member
Posts: 297
Joined: Mar 2010
Reputation: 10
Post: #21
@Thorbear
(2012-06-20 23:46)jmarshall Wrote:  topfs2 is on holiday for the next 2 weeks.

If someone else could please take the script and fix it up it would be much appreciated.

I think nobody did.
lboregard Offline
Junior Member
Posts: 5
Joined: Aug 2010
Reputation: 0
Post: #22
(2012-06-27 12:59)DonJ Wrote:  
(2012-06-16 11:30)topfs2 Wrote:  I'm not 100% sure I follow; I'd love some more examples. What I'm not sure about is whether you want the configuration as an XBMC user, as XBMC itself (code using this engine), or as a scraper developer. I haven't 100% decided what will trigger the scanning; what I'm focusing on mostly now is what to do when you know file X exists and want to gather data on it. I'd love some thoughts on the actual scanning process too if it's of interest in this project.

I think the file scanning process should be completely decoupled from the scraping process. The process would then be as follows:

1) "Something" finds a file (this something might be an add-on, an external tool that notifies XBMC via e.g. JSON, or the integrated XBMC file scanner).

2) The path to the file or directory is pushed to the scraper, which starts gathering metadata via XML scrapers etc.

Therefore, the important part, in my opinion, is to create a good, easy-to-use API to push paths/directories to the scraper. The same API should probably also allow the deletion of data from the library.

I think this would really open up the file scanning process to third-party add-on/tool developers. Hope this helps.

On a similar note, it should be possible to push IDs (imdbid, tmdbid, tvdbid and such) to the scrapers in a way that can be called from the JSON-RPC interface that's under development.
TheAstronaut Offline
Member
Posts: 58
Joined: Aug 2011
Reputation: 0
Location: Michigan
Post: #23
I hope this post is in the right forum, if not, moderators please feel free to move it.


I would like to say that I think this is a fantastic project. I was also wondering if I could add a feature request to the list for the new scraper?

Could it be possible to write the scraper in a way so that it could read either an IMDB or themoviedb.org ID# directly from the filename itself?

For example, the movie Iron Man normally would be named something like:

Iron.Man.2008.mkv or Iron Man (2008).mkv

But instead could be named:

Iron.Man.2008.tt0371746.mkv

The 'tt' in this case could be any arbitrary token that does not use reserved/special characters on Windows/Mac/Linux filesystems and would not occur naturally in a movie title.

This naming convention would allow a user to set up a library once, write the ID#s to the filenames, and not have to worry about things changing if the need to re-scrape ever arose. It would also provide an easy way to correctly tag some of the trickier movie titles out there (e.g. 2012 (2009).mkv or M (1931).mkv, to name a few).
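
As a rough illustration of the idea, a filename could be checked for such a token like this. The function name and the exact pattern are assumptions: it only looks for an IMDb-style "tt" followed by at least seven digits, set off from the rest of the name by a non-alphanumeric character.

```python
import re

# Assumed token format: "tt" plus seven or more digits, not glued to other
# letters or digits, so ordinary words in a title do not match by accident.
_ID_TOKEN = re.compile(r"(?<![A-Za-z0-9])(tt\d{7,})(?![A-Za-z0-9])")


def extract_imdb_id(filename):
    """Return the embedded ID token, or None if the filename has none."""
    match = _ID_TOKEN.search(filename)
    return match.group(1) if match else None


print(extract_imdb_id("Iron.Man.2008.tt0371746.mkv"))
print(extract_imdb_id("Iron Man (2008).mkv"))
```

A scraper could fall back to the usual title/year guess whenever this returns None, so existing libraries would keep working unchanged.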

Does anyone else feel this would be useful to them?
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #24
This will be possible.
TheAstronaut Offline
Member
Posts: 58
Joined: Aug 2011
Reputation: 0
Location: Michigan
Post: #25
(2012-07-03 18:53)spiff Wrote:  this will be possible.

That is great news, thanks :)
topfs2 Offline
Team-Kodi Developer
Posts: 4,269
Joined: Dec 2007
Reputation: 15
Post: #26
(2012-06-27 12:59)DonJ Wrote:  
(2012-06-16 11:30)topfs2 Wrote:  I'm not 100% sure I follow; I'd love some more examples. What I'm not sure about is whether you want the configuration as an XBMC user, as XBMC itself (code using this engine), or as a scraper developer. I haven't 100% decided what will trigger the scanning; what I'm focusing on mostly now is what to do when you know file X exists and want to gather data on it. I'd love some thoughts on the actual scanning process too if it's of interest in this project.

I think the file scanning process should be completely decoupled from the scraping process. The process would then be as follows:

1) "Something" finds a file (this something might be an add-on, an external tool that notifies XBMC via e.g. JSON, or the integrated XBMC file scanner).

2) The path to the file or directory is pushed to the scraper, which starts gathering metadata via XML scrapers etc.

Therefore, the important part, in my opinion, is to create a good, easy-to-use API to push paths/directories to the scraper. The same API should probably also allow the deletion of data from the library.

I think this would really open up the file scanning process to third-party add-on/tool developers. Hope this helps.

This is a good thing and I will for sure keep it in mind. The first version will not do any of the file finding but will instead be given the files it is meant to scan. Later we could add file finders in Python as well.


(2012-07-01 04:57)lboregard Wrote:  On a similar note, it should be possible to push IDs (imdbid, tmdbid, tvdbid and such) to the scrapers in a way that can be called from the JSON-RPC interface that's under development.

You want to be able to ask the engine for information about a specific movie even if it's not scraped and has no file coupled to it? E.g. I don't have an Avatar file, but a remote could still ask for the data about it? If so, that is a very good suggestion and I will keep it in mind!
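
For illustration only, a remote asking for data by ID might send something like the request built below. The envelope follows JSON-RPC 2.0, but the method name "Scraper.LookupById" and its parameters are invented for this sketch; no such method is confirmed anywhere in this thread.

```python
import json


def build_lookup_request(provider, external_id, request_id=1):
    """Build a JSON-RPC 2.0 request asking for metadata by external ID."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "Scraper.LookupById",   # hypothetical method name
        "params": {"provider": provider, "id": external_id},
        "id": request_id,
    })


print(build_lookup_request("imdb", "tt0371746"))
```

Note that no file path appears anywhere in the request, which is the whole point: the engine would be asked about a title it has never scanned.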

(This post was last modified: 2012-07-04 10:34 by topfs2.)
Martijn Offline
Team Kodi
Posts: 12,189
Joined: Jul 2011
Reputation: 170
Location: Dawn of time
Post: #27
I made some changes to your script, so I will update my PR tonight.
There were also some other bug reports that I haven't looked at yet.

lboregard Offline
Junior Member
Posts: 5
Joined: Aug 2010
Reputation: 0
Post: #28
(2012-07-04 10:34)topfs2 Wrote:  
(2012-06-27 12:59)DonJ Wrote:  
(2012-06-16 11:30)topfs2 Wrote:  I'm not 100% sure I follow; I'd love some more examples. What I'm not sure about is whether you want the configuration as an XBMC user, as XBMC itself (code using this engine), or as a scraper developer. I haven't 100% decided what will trigger the scanning; what I'm focusing on mostly now is what to do when you know file X exists and want to gather data on it. I'd love some thoughts on the actual scanning process too if it's of interest in this project.

I think the file scanning process should be completely decoupled from the scraping process. The process would then be as follows:

1) "Something" finds a file (this something might be an add-on, an external tool that notifies XBMC via e.g. JSON, or the integrated XBMC file scanner).

2) The path to the file or directory is pushed to the scraper, which starts gathering metadata via XML scrapers etc.

Therefore, the important part, in my opinion, is to create a good, easy-to-use API to push paths/directories to the scraper. The same API should probably also allow the deletion of data from the library.

I think this would really open up the file scanning process to third-party add-on/tool developers. Hope this helps.

This is a good thing and I will for sure keep it in mind. The first version will not do any of the file finding but will instead be given the files it is meant to scan. Later we could add file finders in Python as well.


(2012-07-01 04:57)lboregard Wrote:  On a similar note, it should be possible to push IDs (imdbid, tmdbid, tvdbid and such) to the scrapers in a way that can be called from the JSON-RPC interface that's under development.

You want to be able to ask the engine for information about a specific movie even if it's not scraped and has no file coupled to it? E.g. I don't have an Avatar file, but a remote could still ask for the data about it? If so, that is a very good suggestion and I will keep it in mind!

Yes! That would be awesome!
topfs2 Offline
Team-Kodi Developer
Posts: 4,269
Joined: Dec 2007
Reputation: 15
Post: #29
You can now follow the work in https://github.com/topfs2/heimdall

The base design of the engine is that the process of scraping is split into tasks, which can run in parallel. These tasks are triggered automatically by the engine when the scraping item has certain properties. For example, we will run the task of searching on tmdb when an item is of type "movie". This work is still in the early stages and the triggering is extremely basic for now, but it's something to show.
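
A toy version of that design, to show the shape of it, might look like the sketch below. It is not Heimdall's actual API: Task, TASKS, scrape and both stand-in task functions are hypothetical names, and the "tmdb" task does no real lookup.

```python
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

# A task has a name, a trigger (item -> bool) and a run function (item -> dict).
Task = namedtuple("Task", "name trigger run")


def classify(item):
    # Stand-in: treat every .mkv file as a movie.
    return {"type": "movie"} if item["path"].endswith(".mkv") else {}


def search_tmdb(item):
    # Stand-in for a real lookup; no network access here.
    return {"searched": "tmdb"}


TASKS = [
    Task("classify", lambda item: "path" in item and "type" not in item, classify),
    Task("tmdb", lambda item: item.get("type") == "movie", search_tmdb),
]


def scrape(item, tasks=TASKS, max_workers=4):
    """Run every task whose trigger matches until no new task fires."""
    item = dict(item)
    done = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            ready = [t for t in tasks if t.name not in done and t.trigger(item)]
            if not ready:
                return item
            snapshot = dict(item)  # tasks in one round see the same input
            results = list(pool.map(lambda t: t.run(snapshot), ready))
            for task, result in zip(ready, results):
                item.update(result)
                done.add(task.name)


print(scrape({"path": "Iron.Man.2008.mkv"}))
```

Here the "tmdb" task only fires in the second round, after "classify" has added the "movie" type, which mirrors the property-based triggering described above.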

An example for tmdb is https://raw.github.com/topfs2/heimdall/m...rc/tmdb.py and, as you can see, there is no need to use regexps; it's possible to just parse the JSON instead. Each task can choose the tool that fits the job.
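
To show why JSON makes this simpler than regexps, here is a small stand-alone sketch. The payload and its field names are made up for the example and are not taken from tmdb's documented responses or from the linked file.

```python
import json

# Made-up response body; the "results"/"title"/"year" fields are assumptions.
SAMPLE = '{"results": [{"id": 1, "title": "Iron Man", "year": 2008}]}'


def first_result(payload):
    """Parse a JSON search response and return its first hit, if any."""
    data = json.loads(payload)
    results = data.get("results", [])
    return results[0] if results else None


print(first_result(SAMPLE)["title"])
```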

aptalca Offline
The Dude
Posts: 938
Joined: Sep 2009
Reputation: 25
Post: #30
Hi topfs2,

I didn't realize you were working on restructuring the whole "scraping" in xbmc. I had a couple of suggestions, sparked by minor things that have been bugging me from time to time. I think those suggestions would really help a lot of people not have to rely on third party media managers.

It is about partial scraper updates. As you know, the scraped options for metadata become out-of-date after a while. Or they are simply non-existent if the library was imported from nfo's and local artwork. It is annoying when a new season for a show starts, and there is no season thumb in the library for the new season, because it simply didn't exist when the show was originally scraped. Rescraping the show to get the new season thumb often results in all the other thumbs and fanart changing as well.

I suggested partial scraper updates just to retrieve the latest scraper results (without a full rescrape) that could be done on demand or automatically as a background service.
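
In code terms, the season thumb case could be as small as the sketch below: keep whatever artwork is already in the library and only add what is missing. The function name and the dict layout (season number mapped to a thumb URL) are assumptions for illustration.

```python
def add_missing_season_thumbs(existing, fresh):
    """Return existing thumbs plus any seasons that only appear in fresh."""
    merged = dict(existing)
    for season, url in fresh.items():
        merged.setdefault(season, url)   # never overwrite a chosen thumb
    return merged


library = {1: "season1-old.jpg"}
scraped = {1: "season1-new.jpg", 2: "season2-new.jpg"}
print(add_missing_season_thumbs(library, scraped))
```

Season 1 keeps its existing thumb and only season 2 is added, so a refresh would not shuffle artwork the user has already settled on.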

Anyway, you can find my full rant here: http://forum.xbmc.org/showthread.php?tid=136411

Thanks and good luck with your project
(This post was last modified: 2012-07-24 21:40 by aptalca.)