Anyone using themoviedb.org to scrape?
#1
I see that in XBMC 9.04 the new default scraper is themoviedb.org. Anyone already using 9.04 from SVN that would like to share their experience of using themoviedb.org?

I'm still scraping using IMDB, and the data retrieved is mostly OK except for the movie synopsis, which can vary greatly in quality (basically depending on the grasp of the English language of the user posting their synopsis on imdb.com). Sometimes it's laughably bad.

For this reason I'm really looking forward to switching to themoviedb.org, so I'd like to know how the switch will go when it happens. Would I just rescan my entire library? Movie posters and fan art still working OK? Happy with it?
#2
Image quality is much, much better in general on tmdb. Plot descriptions are much more to the point and not enormous elaborate narrations like with IMDB.

Occasionally there's no information available for a certain movie, but that's no wonder given that the site is relatively new. I do find that for me tmdb is more accurate in picking the right movie. Unless you have a very particular preference for movies you should be fine.
And when you do come across a movie that isn't in the database, take a couple of minutes to add it yourself.

One thing that bugs me about tmdb though is the sometimes ridiculously long genre information. IMO a movie should be tagged with three genres max, but some go way over that.
Some kind of hybrid scraper would be awesome though.
#3
Jeroen wrote: One thing that bugs me about tmdb though is the sometimes ridiculously long genre information. IMO a movie should be tagged with three genres max, but some go way over that.
Some kind of hybrid scraper would be awesome though.

Yea a hybrid scraper would be nice. Another thing that themoviedb is missing is MPAA ratings.

It's a good move though, making it the default means that more people will use it and it should get more support and user contributions.
#4
Jeroen, many thanks for the info.

So, I installed 9.04 from an SVN build today and rescanned my library using themoviedb.org. Overall, I'm quite happy though in my experience tmdb has been LESS accurate in picking the movie. For example, it somehow managed to choose The Dark Knight from a folder (and movie file) titled 'Batman Begins (2005)'. There were numerous other examples also (though only a small percentage of the total) ... definitely more mismatches than the IMDB scan in any event.

Another gripe regards the tmdb movie user rating. The ratings on tmdb are obviously not as mature as on IMDB (not many people having voted). As both you and migueld have said, a solution to this would be some kind of a hybrid scanner that will allow me to pluck the rating from IMDB.

Nevertheless, I'd say overall, given the higher quality artwork and plot synopsis, this is a great improvement over IMDB.

PS. I've registered on themoviedb.org and have already started adding missing data.
#5
If you do a manual refresh on the movie that came back as the wrong one, is the correct title available to choose?

If so, please identify the exact filename/foldername used for the lookup, as the fuzzy matching I added should be taking care of that for you.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


#6
Jonathan, thanks for taking the time to look into this. Each time I do a manual refresh, the correct title is indeed available. Where the scraper has chosen the wrong title, I think the correct title has almost always been the second choice on the list. These are the results from a few I manually refreshed just now:


Filename: Batman Begins (2005)/Batman Begins (2005).avi
Results:
1. The Dark Knight
2. Batman Begins


Filename: Before Sunset (2004)/Before Sunset (2004).avi
Results:
1. Sunset Blvd.
2. Before Sunset
3. etc. [...]


Filename: Catch Me If You Can (2002)/Catch Me If You Can (2002).avi
Results:
1. Catch 'em if you can
2. Catch Me If You Can

Then there were a few cases when the scraper chose a sequel, instead of the original:

Filename: Die Hard (1988)/Die Hard (1988).avi
Results:
1. Live Free or Die Hard
2. Die Hard
3. etc. [...]


Filename: Back to the Future (1985)/Back to the Future (1985).avi
Results:
1. Back to the Future Part II
2. Back to the Future
3. etc. [...]

Hope this helps,
Rob
#7
Jonathan, one more thing I wanted to mention. I'm not sure how tmdb publishes the title of the film in their feed, but what's coming back from the scraper doesn't always quite match up with what's on the film's page on tmdb.

Consider The Return of the King (http://themoviedb.org/movie/122).

The main title in the content of this page is 'The Lord of the Rings: The Return of the King'. Yet just below that and indeed between the <title> tags of the HTML, it is simply 'The Return of the King'. And it is this shorter title that is returned by the scraper.

In this case I would prefer the full title if only to group together the trilogy on the list of movies when I'm browsing through them in XBMC.

However, in the case of Princess Mononoke (http://themoviedb.org/movie/128), it is a little more problematic. Once again, the main title in the content of that page is 'Princess Mononoke'. However, it's the Japanese title in the <title> tags of the HTML and also what is returned by the scraper. XBMC (at least my install) is unable to display the Japanese characters, so my title is simply a collection of squares.

Cheers,
Rob
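
One possible workaround for the unreadable-title case, sketched in Python. This is only an illustration and not how XBMC or the tmdb scraper actually works: the function names and the idea of a second, alternate title being available are assumptions. It keeps the title returned by the scraper unless that title contains letters outside the Latin script, in which case it falls back to the alternate.

```python
import unicodedata


def is_latin_displayable(text):
    """Return True if every letter in text belongs to the Latin script.

    Hypothetical helper: uses Unicode character names as a rough script test.
    """
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


def pick_display_title(primary, alternate):
    """Prefer the primary title; fall back to the alternate (assumed to
    exist in the scraped data) only when the primary has non-Latin letters."""
    if is_latin_displayable(primary) or not alternate:
        return primary
    return alternate


if __name__ == "__main__":
    print(pick_display_title("もののけ姫", "Princess Mononoke"))
    print(pick_display_title("The Return of the King",
                             "The Lord of the Rings: The Return of the King"))
```

Note that this would not change the 'The Return of the King' case, since that shorter title displays fine; getting the long title there would still depend on what the data source returns.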
#8
And "from an SVN build" means what exactly? As that's the exact problem I fixed about a month ago. A debug log would tell you for sure.

EDIT: Grrr - found the problem - sorting by relevance was commented out. No doubt by me whilst testing :p

Fixed in r18946.

Cheers,
Jonathan
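
For anyone curious what "sorting by relevance" means for the mismatches listed earlier in the thread, here is a minimal Python sketch. It is not the actual XBMC code: the function names are made up for illustration, and Python's difflib similarity ratio stands in for whatever fuzzy matching the scraper really uses. The idea is to strip the year from the folder name and order the returned titles by how closely they match.

```python
import re
from difflib import SequenceMatcher


def strip_year(name):
    """Drop a trailing '(YYYY)' from a folder or file name, if present."""
    return re.sub(r"\s*\(\d{4}\)\s*$", "", name).strip()


def relevance(query, candidate):
    """Similarity between 0.0 and 1.0, ignoring case (stand-in metric)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def rank_results(folder_name, candidates):
    """Return the candidate titles ordered from most to least similar."""
    title = strip_year(folder_name)
    return sorted(candidates, key=lambda c: relevance(title, c), reverse=True)


if __name__ == "__main__":
    print(rank_results("Batman Begins (2005)",
                       ["The Dark Knight", "Batman Begins"]))
    print(rank_results("Die Hard (1988)",
                       ["Live Free or Die Hard", "Die Hard"]))
```

With the sort step skipped, the results stay in whatever order the search returned them, which would explain the correct title showing up second in the list.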
#9
Not sure how to build a release myself (or indeed install it when it's built), but I used r18298, which is the latest pre-built release available on http://xbmcsvnosx.blogspot.com/. I used the helper app he provides to install it.

Great that it was an easy issue to fix!

Any thoughts on the title issue? :)
#10
We use the title returned by the API. There is no issue. If you want the long titles, ask the tmdb guys.
#11
I meant "issue" only in the sense of what I had just mentioned, not in the bug-tracker sense. However, as a software developer of sorts myself (web apps), I probably would have taken more of an interest had someone alerted me to the fact that there were problems in the data I'm pulling in.

I've signed up for an API key for TMDb to check it out.
#12
Let us know if there's a better way to do things. I think we already have an "alternate title" field to fill in (not sure if it's in use, though).