Posts: 2
Joined: Dec 2014
Reputation:
0
A feature that I would welcome, and that would probably be welcomed in general, is the ability to manually override, in-app, where a specific file is scraped from: namely, an editable text field in "Movie Information" where a URL can be pasted, and the system would simply scrape from that page without attempting to identify the film itself.
From personal experience, and from reviewing a variety of threads, .nfo files with URLs are inconsistent at worst and inelegant at best.
On many an occasion, right now included, I've sat pulling my hair out because the system will not scrape a page no matter how many ways I've tried to get it to. Creating .nfo files from scratch seems like a manual workaround for an over-automated system that is failing. This would be a half-step where the onus is on the user to locate a suitable page on a supported source.
Support could be limited to the essentials (IMDb, TMDb, TVDB), and the content type could be selected with a drop-down or radio buttons in situ. Again, by design, the system would not need to interpret or intuit whether this is the right choice.
A genuine override: "take my word for it, this is the movie in question, now please populate the database".
Posts: 31,445
Joined: Jan 2011
What people are supposed to do, in the event that a scanned movie gets identified wrongly from its file name, is to bring up the movie info screen and select "refresh"; you will then be presented with a list of all matches. You can even do additional manual searches for the title, and then select the exact entry that goes with that video file.
The point of using NFOs with URLs is for when you don't want to (or for some reason are unable to) correct the actual file name itself, and want a way to automatically correct the info as it gets scanned. Normally you just use the correct file name and year, and you never have to touch NFO files at all.
If an .nfo file with a URL doesn't work, that means the URL points to a page that can't be used by the scraper. The scraper isn't reading the actual HTML on the page; it's just using the URL to find the exact ID for the movie. I don't know why you think they're "inconsistent". They either work or they don't.
There is very little that is being "interpreted" or "automated" in the entire scanning process.
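For reference, such a "URL NFO" can be as simple as a plain-text file placed next to the video, containing nothing but the link (file name and title ID below are illustrative, not from this thread):

```text
# entire content of Movie.nfo, sitting beside Movie.mkv
https://www.imdb.com/title/tt0133093/
```

The scanner hands that URL to the installed scrapers; a scraper that recognizes it extracts the site's ID for the movie and fetches all the details itself.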
Posts: 1,506
Joined: Nov 2013
uhm, of course the scraper is reading the html pages. where do you think the information comes from? (and where do you think the term "scraping" originates?) it doesn't *parse* the html files, however.
Posts: 31,445
Joined: Jan 2011
Correct me if I am wrong, but I believe our default scrapers are using an API for most of the work in talking to the scraper site. I am well aware that some scrapers are able to grab the actual raw HTML page that a person would see, and extract the information from that.
Posts: 1,506
Joined: Nov 2013
2014-12-05, 11:23
(This post was last modified: 2014-12-05, 11:34 by ironic_monkey.)
Oh really? I must have forgotten how to read the scrapers I invented.
Whether it is XML or HTML is rather irrelevant, as long as it isn't parsed. The IMDB API costs money and requires you to hide your source.
As far as API usage goes, nothing has changed since I initially created this back in 2007. TVDB used the API from the first moment; IMDB scrapes HTML. For this reason the scraper system was designed the way it is: not around parsing API outputs, but as a regex system that can handle both API-driven and non-API-driven backends. Some things have been added since (XSLT and JSON), but those are not used in your examples.
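The design described above can be sketched in a few lines: because both raw HTML and API output are just text, one regex engine can serve both kinds of backend. The patterns below are purely illustrative, not Kodi's actual scraper rules:

```python
import re

# Hypothetical rules in the spirit of Kodi's regex scraper system:
# one pattern aimed at a raw HTML page, one at an XML API response.
HTML_TITLE = re.compile(r'<title>(?P<title>[^<]+?)\s*\(\d{4}\)')
API_NAME = re.compile(r'<SeriesName>(?P<title>[^<]+)</SeriesName>')

def scrape_title(text: str):
    """Run each rule in order and return the first captured title."""
    for rule in (HTML_TITLE, API_NAME):
        m = rule.search(text)
        if m:
            return m.group('title')
    return None
```

The same function works whether `text` came from an HTML fetch or an API call, which is the point of building on regexes rather than on parsing any one backend's output format.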
Posts: 31,445
Joined: Jan 2011
2014-12-05, 13:09
(This post was last modified: 2014-12-05, 21:56 by Ned Scott.)
Thank you for the information.
EDIT: redacted previous statement. My bad.
Posts: 1,506
Joined: Nov 2013
i never meant to bitch. sorry if i came off that way. i just jumped on wrong information, that is all. pointing out an error != bitching from my pov.
Posts: 5,184
Joined: Jan 2009
Reputation:
131
My examples were probably badly chosen, but my point remains the same:
Providing the possibility to enter a URL that the scraper should get its information from is, IMO, a bad idea, because the user would have to know how and from where the scraper gets its data. For IMDb it scrapes the HTML page, for TVDB it uses an API, for TADB it uses a JSON API, for MusicBrainz it uses an XML API, and the specific URLs used by the scrapers are usually not meant to be found or handled manually by users.
The only thing that might make sense is to allow specifying, for an item, a unique identifier for a scraper already defined in Kodi. But isn't this what we already support through NFOs? (I've never used this myself, so I don't really know.)
Posts: 1,506
Joined: Nov 2013
2014-12-05, 15:49
(This post was last modified: 2014-12-05, 15:50 by ironic_monkey.)
sure thing, never meant to contend that point. and yeah, that's exactly what url nfos are. the scrapers have a dedicated function to translate such urls into something they understand. the first taker is used, with some prioritisation around it (default scraper, scraper assigned to the source, and so on).
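The "first taker with prioritisation" resolution described above can be sketched as follows (the scraper names, URL patterns, and priority rule here are illustrative assumptions; Kodi's real implementation lives in its scraper infrastructure):

```python
import re

# Each scraper contributes an NfoUrl-style matcher that turns a pasted
# URL into an ID it understands. Patterns are hypothetical examples.
SCRAPERS = [
    ('imdb', re.compile(r'imdb\.com/title/(tt\d+)')),
    ('tvdb', re.compile(r'thetvdb\.com/.*[?&]id=(\d+)')),
]

def resolve_nfo_url(url: str, preferred: str = None):
    """Return (scraper, id) from the first scraper whose matcher fires.

    A preferred scraper (e.g. the one assigned to the source) is tried
    first; otherwise declaration order decides. Returns None when no
    scraper recognizes the URL."""
    ordered = sorted(SCRAPERS, key=lambda s: s[0] != preferred)
    for name, pattern in ordered:
        m = pattern.search(url)
        if m:
            return name, m.group(1)
    return None
```

If no scraper claims the URL, resolution fails, which matches the thread's earlier point: a URL NFO either works or it doesn't.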
Posts: 31,445
Joined: Jan 2011
Sorry for misreading your comment, ironic_monkey.