Kodi Community Forum
Scraper is querying instead of using URL in .nfo directly - Printable Version



Scraper is querying tmdb.org instead of using URL in .nfo directly - freezy - 2009-01-31

Hello folks,

I have a scraper issue. When scanning for movies (scraper for the parent dir set to tmdb.org), some of them aren't fetched correctly: I get "OFDB: <title>" as the movie title and no other metadata. When I delete the movie from the library and run a scan for new content, I get this (the movie in question is Birthday Girl):
Code:
DEBUG: Sort, sorting took 0 millis
   DEBUG: DoScan Scanning dir 'smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/' as not in the database
   DEBUG: Hash[movies,smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/]:DB=[],Computed=[361B0D095D7BAAA13D48AC11379E3E86]
   DEBUG: GetMovieId (smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.avi), query = select idMovie from movie where idFile=98
   DEBUG: Found matching nfo file: smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.nfo
   DEBUG: CFileSMB::Open - opened media/divx/_all/Birthday.Girl.DVDivX-CPY/cpy-bdg.nfo, fd=10000
   DEBUG: CFileSMB::Close closing fd 10000
   DEBUG: Not a proper xml nfo file (Error document empty., col 0, row 0)
    INFO: Get URL: http://api.themoviedb.org/2.0/Movie.imdbLookup?imdb_id=tt0188453&api_key=57983e31fb435df4df77afb854740ea9
    INFO: Get URL: http://api.themoviedb.org/2.0/Movie.imdbLookup?imdb_id=tt0188453&api_key=57983e31fb435df4df77afb854740ea9
    INFO: Get URL: http://api.themoviedb.org/2.0/Movie.imdbLookup?imdb_id=tt0188453&api_key=57983e31fb435df4df77afb854740ea9
    INFO: Get URL: http://www.ofdb.de/view.php?SText=0188453&Kat=IMDb&page=suchergebnis&sourceid=mozilla-search
   DEBUG: -- nfo-scraper: ofdb.xml
   DEBUG: -- nfo url: http://www.ofdb.de/film/18703,
    INFO: Get URL: http://www.ofdb.de/film/18703,
    INFO: Get URL: http://www.ofdb.de/plot/18703,73247,
    INFO: Get URL: http://www.ofdb.de/view.php?page=fassung&fid=18703&vid=132962
    INFO: Get URL: http://www.ofdb.de/view.php?page=fassung&fid=18703&vid=132962
    INFO: Get URL: http://www.imdb.com/title/tt0188453/
    INFO: Get URL: http://www.imdb.com/title/tt0188453/fullcredits#cast
    INFO: Get URL: http://api.themoviedb.org/2.0/Movie.imdbLookup?imdb_id=tt0188453&api_key=57983e31fb435df4df77afb854740ea9
    INFO: Get URL: http://www.movieposterdb.com/browse/search?title=0188453
    INFO: Get URL: http://www.movieposterdb.com/movie/0188453/Birthday-Girl.html
   ERROR: InternalGetDetails: Unable to parse web site [http://www.movieposterdb.com/movie/0188453/Birthday-Girl.html]
    INFO: Get URL: http://www.ofdb.de/film/18703,
    INFO: Get URL: http://www.ofdb.de/film/18703,
   DEBUG: Adding new item to movies:smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.avi
   DEBUG: OpenDir - Using authentication url smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY
   DEBUG: GetMovieId (smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.avi), query = select idMovie from movie where idFile=98
   DEBUG: GetMovieId (smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.avi), query = select idMovie from movie where idFile=98
   DEBUG: OpenDir - Using authentication url smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY
    INFO: Creating thumb from: smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.tbn as: /home/xbmc/.xbmc/userdata/Thumbnails/Video/a/ada628d5.tbn
   DEBUG: SECTION:LoadDLL(Q:\system\ImageLib-i486-linux.so)
   DEBUG: Loading: /usr/local/share/xbmc/system/ImageLib-i486-linux.so
   DEBUG: CFileSMB::Open - opened media/divx/_all/Birthday.Girl.DVDivX-CPY/cpy-bdg.tbn, fd=10000
   DEBUG: CFileSMB::Close closing fd 10000
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTIwNTkyOTcwMl5BMl5BanBnXkFtZTcwOTk3NTYyMQ@@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/7/77df3ab5.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTM1NzcyMjYyNV5BMl5BanBnXkFtZTcwNDA1NzQxMQ@@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/1/1625091a.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTIzNzg2MzczOF5BMl5BanBnXkFyZXN1bWU@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/1/17a9f8b4.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTg5MzU4NDI0NF5BMl5BanBnXkFtZTcwODY3NjkxMQ@@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/4/4072f0df.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTYzMTYxMjc5NV5BMl5BanBnXkFtZTYwMjQ3MjM4._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/6/6a77f631.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMjAzMzA4NzExOF5BMl5BanBnXkFtZTcwMTkwMjMyMg@@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/2/2dae02bc.tbn
    INFO: Get URL: http://ia.media-imdb.com/images/M/MV5BMTM0ODU0MzA1MV5BMl5BanBnXkFtZTcwNjk3MjA4MQ@@._V1._SX_SY_.jpg
    INFO: Creating album thumb from memory: /home/xbmc/.xbmc/userdata/Thumbnails/Video/a/a6a4146f.tbn
   DEBUG: DoScan - Finished dir: smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/
   DEBUG: OpenDir - Using authentication url smb://user:pass@server/divx/Black.Hawk.Down.SCREENER.DVDrip.DiVX-JAR
   DEBUG: Sort, sorting took 0 millis
So it finds the .nfo, but obviously doesn't use the imdb URL in it directly. Instead it queries tmdb.org for it, which (weirdly) returns an ofdb.de URL instead of imdb, resulting in ofdb.de getting scraped instead of imdb.com.

So there seem to be two issues (bugs?):
  1. I read somewhere that, independently of the scraper, a URL found in the .nfo file always overrides the scraper, i.e. an "allocine.fr" URL should scrape data from allocine even though the scraper of the parent folder is set to imdb. Is this no longer valid? Does the .nfo file HAVE to be an xml file with the correct structure for the URL to get parsed?
  2. Even though the .nfo file and the URL are obviously found, why does the tmdb scraper still query the tmdb website? And why does it return the ofdb URL?
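
On the XML question in point 1: the log above reports "Not a proper xml nfo file" and then still looks up tt0188453, which suggests that a plain-text .nfo is scanned for an id anyway. A minimal Python sketch of that kind of fallback, under the assumption that this is roughly what happens; it is not XBMC's actual code, and the function names and the regex are made up for illustration:

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed shape of an IMDb id: "tt" followed by at least seven digits.
IMDB_ID = re.compile(r"tt\d{7,}")


def classify_nfo(text: str) -> str:
    """Return "xml" if the text parses as XML, otherwise "text"."""
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        # Free-form scene .nfo files end up here.
        return "text"


def find_imdb_id(text: str) -> Optional[str]:
    """Return the first IMDb id found anywhere in the text, or None."""
    match = IMDB_ID.search(text)
    return match.group(0) if match else None
```

With a free-form .nfo that merely contains http://us.imdb.com/title/tt0188453/, classify_nfo() reports "text" and find_imdb_id() still returns the id.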

Build is 17419. Comments/suggestions welcome :)

Cheers,

-freezy.


- mkortstiege - 2009-01-31

Please pastebin the given .nfo file.

This is most likely due to a non-existent tmdb entry or an invalid imdb id. Whenever the selected primary scraper fails, XBMC falls through to the next nfochains-enabled scraper.


- freezy - 2009-01-31

http://pastebin.com/m4e151805

i also tried to change us.imdb.com to http://www.imdb.com, same result.


- mkortstiege - 2009-01-31

Seems as if it's not an XBMC issue; it's simply the tmdb API not returning a proper result for the imdb id lookup. Maybe you should try again in a few days, and if it's still not working, report a bug to the tmdb guys.


- freezy - 2009-01-31

Shouldn't it take the imdb scraper directly anyway? From the wiki:

Quote:The scraper tries to match url's to all scrapers of the content type a dir is set to. E.g. if you set the content type to movies all movie scrapers check nfo files for a matching url. This means that nfo's override the scraper setting. I.e. a directory is set to use the imdb scraper but you have a german movie in it. Simply create a nfo for that movie with the ofdb link in it and you are sorted!

I don't understand why tmdb is used anyway since it should directly take the imdb scraper.
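
The behaviour described in the wiki quote can be sketched roughly like this in Python. This is only an illustration of "every scraper checks the nfo for a URL it recognises", not how XBMC implements it; the pattern table and the function name are hypothetical, and the two URL shapes are taken from the log in the first post:

```python
import re
from typing import Optional, Tuple

# Hypothetical per-scraper URL patterns (the real scrapers carry their own).
SCRAPER_URL_PATTERNS = {
    "imdb": re.compile(r"imdb\.com/title/(tt\d+)", re.IGNORECASE),
    "ofdb": re.compile(r"ofdb\.de/film/(\d+)", re.IGNORECASE),
}


def match_scraper(nfo_text: str) -> Optional[Tuple[str, str]]:
    """Return (scraper name, site id) for the first scraper whose URL
    pattern occurs in the nfo text, or None if no scraper matches."""
    for name, pattern in SCRAPER_URL_PATTERNS.items():
        match = pattern.search(nfo_text)
        if match:
            return name, match.group(1)
    return None
```

Under that model an nfo containing http://us.imdb.com/title/tt0188453/ would go straight to the imdb scraper, whatever the directory is set to.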


- mkortstiege - 2009-01-31

Nah, the wiki is just not yet updated.
The newly added nfochains stuff will allow scraper artists to add a function to determine the correct url for the given scraper based on the url (imdb id) found in the original .nfo.

FYI, nfochains enabled scrapers are tmdb, ofdb and moviemaze.

EDIT: There's NO imdb scraper at the moment, as it was disabled and tmdb became the new default movie scraper.
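
Since there is no documentation yet, here is a rough Python sketch of the fall-through as described in this thread: take the imdb id from the .nfo, ask the selected scraper for its own URL, and move on to the next nfochains-enabled scraper if that fails. The names, the ordering and the stub lookups are assumptions for illustration only (no network access, and not the actual XBMC source):

```python
from typing import Callable, Dict, List, Optional, Tuple

# A resolver maps an imdb id to that scraper's own URL, or None on failure.
Resolver = Callable[[str], Optional[str]]


def resolve_via_chain(
    imdb_id: str,
    selected: str,
    resolvers: Dict[str, Resolver],
    order: List[str],
) -> Optional[Tuple[str, str]]:
    """Try the selected scraper first, then the remaining chain-enabled
    scrapers in order; return (scraper name, url) of the first success."""
    candidates = [selected] + [name for name in order if name != selected]
    for name in candidates:
        resolver = resolvers.get(name)
        if resolver is None:
            continue
        url = resolver(imdb_id)
        if url:
            return name, url
    return None


# Stub lookups mimicking the log: tmdb has no entry, ofdb does.
def tmdb_stub(imdb_id: str) -> Optional[str]:
    return None


def ofdb_stub(imdb_id: str) -> Optional[str]:
    return "http://www.ofdb.de/film/18703," if imdb_id == "tt0188453" else None


def moviemaze_stub(imdb_id: str) -> Optional[str]:
    return None


resolvers = {"tmdb": tmdb_stub, "ofdb": ofdb_stub, "moviemaze": moviemaze_stub}
result = resolve_via_chain("tt0188453", "tmdb", resolvers, ["tmdb", "ofdb", "moviemaze"])
```

With these stubs, result names ofdb and its URL, which matches the log: the tmdb lookup comes back empty, so the ofdb page gets scraped.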


- freezy - 2009-01-31

"scraper artists", well put. :)

anyway, that explains it. though i'm surprised no one else has noticed that yet, since the new default scraper in this case will populate the library with german info even though the imdb urls are set correctly.

is the imdb scraper cancelled completely? i understand that tmdb has additional fanart and an api, but in terms of ratings, completeness and probably also uptime and bandwidth, imdb is still unbeaten. granted, the tmdb site can refetch this data from imdb, but then we're completely dependent on how tmdb dispatches the urls (and it obviously doesn't always do so correctly)...


- mkortstiege - 2009-01-31

freezy Wrote:"scraper artists", well put. :)

anyway, that explains it. though i'm surprised no one else has noticed that yet, since the new default scraper in this case will populate the library with german info even though the imdb urls are set correctly.

is the imdb scraper cancelled completely? i understand that tmdb has additional fanart and an api, but in terms of ratings, completeness and probably also uptime and bandwidth, imdb is still unbeaten. granted, the tmdb site can refetch this data from imdb, but then we're completely dependent on how tmdb dispatches the urls (and it obviously doesn't always do so correctly)...

Well, maybe it's a newly created movie entry and the API does not yet know of it. As the IMDB scraper is not deleted or lost, you could easily re-activate it for yourself.

Mind creating a ticket for the nfochain language issue so i don't forget about it? Maybe we can group the scrapers by language in order to get rid of the foreign-language information. Will have to think about it once again :)


- freezy - 2009-01-31

OK, will do. Do you have a link on the nfochain thingie? Apparently that substantially changes the way scrapers are dispatched, so if I could update my understanding I probably can describe the bug more accurately. Neither wiki nor forum search came up with anything.


- mkortstiege - 2009-01-31

Nope, unfortunately there's no documentation except the source itself .) I will try to modify the way the scrapers are chosen by this weekend, so no need to file a new bug report.


- freezy - 2009-01-31

Okay. If you can drop a short paragraph on how the scrapers now work then I (or you?) could update the wiki as well?


- spozen - 2009-01-31

Is anyone else getting german title/info? It happens on some of my movies so i have to manually add the plot to the nfo, really annoying.


- freezy - 2009-01-31

ok so i'm not the only one, that's news ;)


- freezy - 2009-02-02

Okay, so I saw that you fixed the language issue, great news! Now, I know there are certain, uh, legal uncertainties concerning imdb scraping, so can you confirm that the current imdb scraper doesn't work at all anymore?
Code:
DEBUG: OpenDir - Using authentication url smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY
   DEBUG: Sort, sorting took 0 millis
   DEBUG: DoScan Scanning dir 'smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/' as not in the database
   DEBUG: Hash[movies,smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/]:DB=[],Computed=[361B0D095D7BAAA13D48AC11379E3E86]
   DEBUG: GetMovieId (smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.avi), query = select idMovie from movie where idFile=98
   DEBUG: Found matching nfo file: smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/cpy-bdg.nfo
   DEBUG: CFileSMB::Open - opened media/divx/_all/Birthday.Girl.DVDivX-CPY/cpy-bdg.nfo, fd=10001
   DEBUG: CFileSMB::Close closing fd 10001
   DEBUG: Not a proper xml nfo file (Error document empty., col 0, row 0)
    INFO: Get URL: http://api.themoviedb.org/2.0/Movie.imdbLookup?imdb_id=tt0188453&api_key=57983e31fb435df4df77afb854740ea9
   DEBUG: SQLite collision
   DEBUG: DoScan - Finished dir: smb://user:pass@server/divx/Birthday.Girl.DVDivX-CPY/
   DEBUG: SQLite collision
   DEBUG: OpenDir - Using authentication url smb://user:pass@server/divx/Black.Hawk.Down.SCREENER.DVDrip.DiVX-JAR
   DEBUG: Sort, sorting took 0 millis

In my opinion, tmdb rocks as far as posters and fanart are concerned, but too much info is missing (e.g. "charlies angels" doesn't even show up), the ratings are not very meaningful, and the lack of votes results in xbmc displaying "Rating: 8.1 ( votes)". I know that imdb is very accommodating when it comes to scraping their content, as long as you don't do it commercially. On the other hand, studios are VERY picky about how their artwork is used (they're already proceeding against movieposterdb.com and basically against everyone who offers images with a resolution higher than 300x500), so I'd worry a lot more about the tmdb scraper than about imdb (but that's another discussion).

I sincerely hope, though, that the original imdb scraper will work again soon, with fanart from tmdb, so we get each type of data from the site that does it best.

Cheers,

-freezy.


- spozen - 2009-02-02

Read this: http://www.xbmc.org/forum/showpost.php?p=274620&postcount=7

Remember it wasn't me :D