[Proposal] Separating the scraper into a library.
#21
First of all, thanks a lot for your reply! I really appreciate feedback like this and the opinion of experienced developers; these things make me think about the code, see the flaws in my plan, and so on.

(2012-04-07, 03:19)AnalogKid Wrote: I've just discovered this thread having raised a 'similar' proposal in the feature suggestions thread (for non developers).

I'm a former software architect for elements of the Symbian operating system, along with being a former developer on a number of handsets for Nokia, SonyEricsson and Motorola. Sadly, like most folks I stopped coding and spent more time drawing UML, then more time in front of managers, on the way to becoming a manager. Suffice to say, I don't know the XBMC code, but I'm not a total 'flake' either.

At the risk of sounding patronising, or being too simplistic I think it's important to get right back to basics:

1) We have media on the users system, and we 'force' the user to at least classify it as TV, Movie, Music etc. Whether this is entirely necessary is up for debate, but it's how things stand right now.

2) In some cases, we ask the user to go a little further still and help us 'classify' media with more detail by having them stick to SOME semblance of a scheme... e.g. using a filename that might help define the media, or a TV season/episode scheme that regex can parse etc.

3) Based on what we can 'deduce' from 1 and 2 we then try to obtain additional meta information and content (trailers, thumbs, fanart etc).

3a) it SEEMS that XBMC introduces some 'core' functionality that attempts to find some additional meta info and content from the 'local' sources - typically side by side, or within a sub folder of the media itself.... (I'm talking about the NFO file, tbn, jpg etc)

3b) If we can't find that 'local' stuff, or we deem there wasn't enough stuff found, or that local stuff explicitly pointed to online resources, we then use 'scrapers' to try and get meta info and content from online resources

My assertion is this:

3a and 3b are the same thing.... 'an attempt to obtain meta info and content'. The highly abstract concept is that XBMC is saying to a scraper:

[The quoted post included a short pseudo-code "scheme" here that did not survive the forum quote.]

You are right, this is more or less how it's done right now. I do think you oversimplified the scraper's role a bit in the scheme: the scraper has to determine what data from the online resource matches XBMC's request. It sort of maps "title", "plot", ... to parts of that data, so it contains a bit more intelligence than you suggest, but I'm guessing you know this and just wanted to keep things simple here.
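
To make that concrete, here's a rough C++ sketch of the abstract exchange; all the names (ScrapeRequest, VideoMetadata, IScraper) are made up for illustration and are not the actual XBMC scraper API:

Code:
#include <string>
#include <vector>

// Hypothetical request: everything XBMC can tell the scraper up front.
struct ScrapeRequest {
    std::string path;        // where the media file lives
    std::string titleGuess;  // what XBMC thinks the title is
    bool isTvShow = false;   // rough classification (movie vs. TV)
};

// Hypothetical result: the fields the scraper maps out of the online data.
struct VideoMetadata {
    std::string title;
    std::string plot;
    std::vector<std::string> genres;
    std::string thumbUrl;
};

// Hypothetical scraper interface: XBMC hands in a request, gets metadata
// back, and does not care how the scraper obtained it.
class IScraper {
public:
    virtual ~IScraper() = default;
    virtual bool GetMetadata(const ScrapeRequest& request, VideoMetadata& result) = 0;
};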

(2012-04-07, 03:19)AnalogKid Wrote: How 'Scraper' finds its information should be of no concern to XBMC. XBMC only needs to provide the scraper with as much information as it possibly can in order the 'help' the scraper.
It would be a mistake to make assumptions on how Scraper might do its work... so assuming it will search online and only providing it with simplistic hints such as 'we think the movie is called Lady In Red' isn't enough.

We COULD provide it with:
- The media resides at 'C:\my movies\lady in red\man in blue.mp4'
- We THINK the movie title is Lady In Red

This allows the scraper to think for itself!... it can go along with our suggestion of 'lady in red' OR it can attempt to be smarter and opt for 'man in blue'.


With me so far?

Yes I am ;-) You are 100% correct in stating that XBMC shouldn't have to know or care how it gets the metadata. It should just have to pass "something" the file name, and possibly whether it's a TV show or a movie, and get the metadata in return. My whole proposal is about this "something": it would be the library I create! A total separation from the current codebase, not depending on anything else. The library would be used by the VideoInfoScanner class, for example. That class would get an instance of it, initialized with the chosen scraper, and then either hand it some callbacks (if the library itself is threaded, which some said wouldn't be a good idea and I rather agree) or do some API calls in its own thread (which is the path I'm 90% sure I'll take).
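
To illustrate the path I'm leaning towards (plain blocking calls made from XBMC's side), the usage from VideoInfoScanner could look roughly like this; everything here is an invented sketch, not real VideoInfoScanner or library code:

Code:
#include <string>

// Invented sketch of the separated library's API; stubbed so it compiles.
namespace scraperlib {
    struct VideoMetadata { std::string title; std::string plot; };

    class Library {
    public:
        explicit Library(const std::string& scraperId) : m_scraperId(scraperId) {}

        // Blocking call: it runs in whatever thread the caller uses,
        // so all threading stays on XBMC's side.
        bool GetVideoMetadata(const std::string& path, VideoMetadata& out)
        {
            // The library would run the chosen scraper here; stubbed for the sketch.
            (void)path; (void)out;
            return false;
        }

    private:
        std::string m_scraperId;  // id of the chosen scraper add-on
    };
}

// Roughly how VideoInfoScanner (which already has its own thread) could use it.
void ScanItem(const std::string& path)
{
    scraperlib::Library lib("some.scraper.id");  // placeholder id
    scraperlib::VideoMetadata meta;
    if (lib.GetVideoMetadata(path, meta)) {
        // ... store meta in the video database ...
    }
}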

I hope that after a while the file iteration would also be separated from XBMC and become part of the lib, so that all XBMC would have to do is pass some paths to movie/show folders and register callbacks (then the threading would have to be done in the lib, I'd imagine).
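
If the file iteration does move into the lib one day, I imagine the interface growing into something callback-based along these lines (again, all names invented; the threading would then live inside the library):

Code:
#include <functional>
#include <string>
#include <vector>

struct VideoMetadata { std::string title; std::string plot; };

// Hypothetical future interface: XBMC hands over source paths and callbacks,
// the library iterates the files and scrapes them on its own worker threads.
class LibraryScanner {
public:
    using OnItemScraped = std::function<void(const std::string& file, const VideoMetadata& meta)>;
    using OnFinished    = std::function<void()>;

    void Scan(const std::vector<std::string>& sourcePaths,
              OnItemScraped onItem,
              OnFinished onDone);
};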

(2012-04-07, 03:19)AnalogKid Wrote: So here's my particular small suggestion in the grand scheme of things (and I'll comment more on the bigger picture later!)....

***** Make the 'local meta / content' detection a scraper like all other scrapers (putting aside the current limitations of scraper APIs). *****

The fact that that content exists in the local file system as opposed to an online resource simply doesn't matter. The local file system IS a 'scrapeable resource' and should be scraped by a scraper.

That means of course that scrapers would have to be able to exercise logic and effectively be 'executable modules' in some way. But it makes sense to me that scrapers have this ability.

You are right that on a very abstract level, using the data in the NFO and data from an online source makes no difference: both are paths that end up with XBMC acquiring metadata about a file. However, extreme abstraction is never very good in my opinion; in this case, and for my idea, there really is a big difference... the availability, speed and all the other factors related to the use of a network connection. These things are of no importance when using local data. Also, there isn't a lot of local data to work with when new movies come in, so I don't really see how it would benefit end-users or scraper developers much to have a scraper for it. Also (I'm not entirely sure) I think currently both sources are used in conjunction, but you would probably cover this with the daisy chaining.

All in all I think you are right on a software-architecture level about treating local data the same way, but I have my doubts about how far this concept should be followed for overall convenience in the actual code.
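
Just so we're talking about the same thing, this is how I read your suggestion against the same kind of hypothetical interface: a 'local' scraper that reads the NFO next to the file instead of touching the network. Purely illustrative:

Code:
#include <fstream>
#include <sstream>
#include <string>

struct VideoMetadata { std::string title; std::string plot; };

// Your idea, sketched: the local .nfo is just another "scrapeable resource",
// answered through the same shape of call an online scraper would get.
class LocalNfoScraper {
public:
    bool GetMetadata(const std::string& mediaPath, VideoMetadata& out)
    {
        const std::string nfoPath =
            mediaPath.substr(0, mediaPath.find_last_of('.')) + ".nfo";
        std::ifstream nfo(nfoPath.c_str());
        if (!nfo.is_open())
            return false;  // nothing local, let an online scraper try instead

        std::stringstream buffer;
        buffer << nfo.rdbuf();
        // A real implementation would parse the NFO's XML fields; the point
        // is only that the job is the same mapping as for online data.
        out.plot = buffer.str();
        return true;
    }
};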

(2012-04-07, 03:19)AnalogKid Wrote: Benefits:
It abstracts metadata retrieval away from core XBMC in a consistent manner through the use of 'scrapers'.
XBMC makes no assumptions whatsoever on how a scraper deduces the information
It allows (theoretically) for an entirely different local NFO / tbn / jpg scheme to be implemented as long as there's a scraper that supports it
It moves the NFO / tbn / jpg scanning functionality out of XBMC core and into a scraper

The first two benefits would be achieved by the library anyway, since it's a separate library. About the last benefit, I already expressed my doubts about its usefulness, but if you replace the word "scraper" with "library", the proposal achieves it too. About the possibility of implementing an entirely different local scheme... I really don't see the use, but since this would also be part of the library it wouldn't be hard to do anyway, and if there really is demand for it I could abstract this a bit more in the lib, so it becomes even simpler and allows different schemes without treating them like remote scraping.
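
Concretely, that abstraction could be as small as an internal interface the library tries before falling back to remote scraping; a hypothetical sketch (none of this exists yet):

Code:
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct VideoMetadata { std::string title; std::string plot; };

// Hypothetical internal abstraction: a "local scheme" knows how to find
// side-by-side files (NFO/tbn/jpg or any alternative layout) for a media path.
class ILocalScheme {
public:
    virtual ~ILocalScheme() = default;
    virtual bool TryRead(const std::string& mediaPath, VideoMetadata& out) = 0;
};

class MetadataLibrary {
public:
    void RegisterLocalScheme(std::unique_ptr<ILocalScheme> scheme)
    {
        m_schemes.push_back(std::move(scheme));
    }

    bool GetMetadata(const std::string& path, VideoMetadata& out)
    {
        // Try the registered local layouts first...
        for (auto& scheme : m_schemes)
            if (scheme->TryRead(path, out))
                return true;
        // ...and only then fall back to the chosen remote scraper.
        return ScrapeOnline(path, out);
    }

private:
    bool ScrapeOnline(const std::string& path, VideoMetadata& out)
    {
        // Stub: this is where the chosen remote scraper would run.
        (void)path; (void)out;
        return false;
    }

    std::vector<std::unique_ptr<ILocalScheme>> m_schemes;
};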

(2012-04-07, 03:19)AnalogKid Wrote: Cons:
It's probably a lot of work initially, and a widening of scraper capability
I wouldn't overestimate the extra work this brings either: if the back-end is adapted a bit so XML scrapers can take "file:///" URLs (after all, those are UNIFORM resource locators), it's definitely doable; I'm just not really convinced it's worth it.
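
What I mean is something as small as dispatching on the URL scheme in the fetch step, so "file:///" sources go through the same XML-scraper pipeline as "http://" ones; a made-up sketch, where HttpGet stands in for the existing network fetch:

Code:
#include <fstream>
#include <sstream>
#include <string>

// Stand-in for the existing network fetch; stubbed for the sketch.
std::string HttpGet(const std::string& url)
{
    (void)url;
    return std::string();
}

// Hypothetical fetch helper: URLs are uniform, so the scraper back-end only
// needs to pick the transport based on the scheme.
std::string Fetch(const std::string& url)
{
    const std::string filePrefix = "file://";
    if (url.compare(0, filePrefix.size(), filePrefix) == 0) {
        std::ifstream in(url.substr(filePrefix.size()).c_str());  // "file:///x/y" -> "/x/y"
        std::stringstream buffer;
        buffer << in.rdbuf();
        return buffer.str();  // local file content, fed to the same XML scraper
    }
    return HttpGet(url);      // remote resource, as before
}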

(2012-04-07, 03:19)AnalogKid Wrote: There's more to come.... e.g. a strategy for 'daisy chaining' scrapers so that Meta Data and content can progressively be enhanced (sequentially / via priority)
There's even scope for a more complex parallel scrape where multiple sources of Meta Data and content are collated and rationalised.

It's a lot of words, but a simple concept, and I THINK it's in keeping with the OP's line of thinking.

If it's way off, I'll gladly drop out and leave you guys to it. I'm at an 'abstract' level... you guys are at a practical level... but there's a chance somewhere in between lies perfection ;-)

I'll possibly come back with the 'daisy chaining' / sequentially scraping stuff later....
Parallel scraping of different parts of the metadata could be done with my initial suggestion of "modular scrapers" too. I am very curious what exactly you mean by "daisy chaining" scrapers, so please continue your post!
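
In case it helps the discussion, here's one possible reading of "daisy chaining" (purely my guess at what you mean): run the scrapers in priority order and let each one only fill in the fields the previous ones left empty. All names here are invented:

Code:
#include <string>
#include <vector>

struct VideoMetadata { std::string title; std::string plot; std::string thumbUrl; };

class IScraper {
public:
    virtual ~IScraper() = default;
    virtual bool GetMetadata(const std::string& path, VideoMetadata& out) = 0;
};

// Hypothetical daisy chain: scrapers ordered by priority, each one only
// contributing the fields that are still missing.
VideoMetadata ChainScrape(const std::string& path,
                          const std::vector<IScraper*>& scrapersByPriority)
{
    VideoMetadata merged;
    for (IScraper* scraper : scrapersByPriority) {
        VideoMetadata partial;
        if (!scraper->GetMetadata(path, partial))
            continue;
        if (merged.title.empty())    merged.title    = partial.title;
        if (merged.plot.empty())     merged.plot     = partial.plot;
        if (merged.thumbUrl.empty()) merged.thumbUrl = partial.thumbUrl;
    }
    return merged;
}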

Thanks again for your input! It's a good idea and I'd like to ask the XBMC developers' opinion on this too.

Greets