Kodi Community Forum

Full Version: Universal Movie Scraper
(2012-12-20, 22:30)olympia Wrote:
(2012-12-19, 14:36)PistachioPedro Wrote: I think I've found another bug here. I have the cast scraping setting on IMDB full, and when an actor plays two characters I don't get a sensible result. Here are a couple of screenshots; the first is from Buried (2010), and you can see that what should be one line is spread over three. There is also supposed to be a third character after Donna Mitchell.

I see and can replicate the issue, but honestly I don't know why this is happening. I will look into this in more detail later.

This is fixed in v2.1.0 for IMDb full cast (I can't fix it at the moment for non-full cast; that will be more difficult).
(2012-12-19, 17:03)NEOhidra Wrote: The scraper works fine, but I have a feature request regarding set/collection names.
It scrapes movie titles and plots correctly according to the selected language, but it is not possible to choose a language for sets/collections. Would it be possible to scrape translated set/collection names, using Frodo?

Added in version 2.1.0.
Wonderful! Thank you! With 2.1.0, "Collateral (2007)" now actually matches correctly. The manual refresh list no longer shows the Finnish version "... väärä paikka..". :)

There is one really strange thing with one title: despite the "en-us" setting in my browser, "Percy Jackson & the Olympians: The Lightning Thief (2010)" still comes up as "Percy Jackson salamavaras (2010)" on IMDB. Nevertheless, Universal Scraper does the job perfectly and picks the English version.

Something is off with IMDB: it does not treat the language setting as absolute, and many titles still come up in their Finnish version (in the browser as well).

EDIT: One small thing... "Tinker Bell (2008)" is not found at all (not even manually) with Universal, perhaps because of the IMDB classification "Tinker Bell (Video 2008)". The same goes for "The Little Mermaid - Ariel's Beginning (2008)": not found at all, not even with ":" instead of "-".

"Cavalcade of Cartoon Comedy (2008)" is also not found, IMDB says (TV-Series).

Some titles are also matched incorrectly:
"Resident Evil - Afterlife (2010).iso" matches "The Blockbuster Buster: Resident Evil - Afterlife", although a browser search finds the correct one.
When searching manually, the search string must be "Resident Evil:Afterlife (2010)"; then it matches correctly. However, filenames can't contain ":" characters.

So, could there be a setting to change the "-" marking to ":" before the search?
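A minimal sketch of the substitution being asked for, assuming the title has already been separated from the file extension. `colon_variant` is a hypothetical helper, not part of Universal Scraper. It only rewrites a hyphen with whitespace on both sides, so hyphenated words such as "Spider-Man" are left alone; whether IMDB then returns the right match is not verified here.

```python
import re

# Hypothetical pre-search helper (not part of Universal Scraper).
# Only a hyphen surrounded by whitespace is treated as a title separator,
# so hyphenated words keep their hyphen.
_SPACED_DASH = re.compile(r"\s+-\s+")


def colon_variant(title: str) -> str:
    """Return the title with ' - ' separators rewritten as ': '."""
    return _SPACED_DASH.sub(": ", title)


if __name__ == "__main__":
    print(colon_variant("Resident Evil - Afterlife (2010)"))
```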
(2012-12-21, 14:24)realjobe Wrote: There is one really strange thing with one title: despite the "en-us" setting in my browser, "Percy Jackson & the Olympians: The Lightning Thief (2010)" still comes up as "Percy Jackson salamavaras (2010)" on IMDB. Nevertheless, Universal Scraper does the job perfectly and picks the English version.

Something is off with IMDB: it does not treat the language setting as absolute, and many titles still come up in their Finnish version (in the browser as well).
Yes, I saw that. This is happening with many movies, although as you say the English title should be there as well, so XBMC can pick the right movie.

(2012-12-21, 14:24)realjobe Wrote: EDIT: One small thing... "Tinker Bell (2008)" is not found at all (not even manually) with Universal, perhaps because of the IMDB classification "Tinker Bell (Video 2008)". The same goes for "The Little Mermaid - Ariel's Beginning (2008)": not found at all, not even with ":" instead of "-".
Right, I will add 'videos' to the search results.

(2012-12-21, 14:24)realjobe Wrote: "Cavalcade of Cartoon Comedy (2008)" is also not found, IMDB says (TV-Series).
I will not add this, this is a movie scraper.

(2012-12-21, 14:24)realjobe Wrote: Some titles are also matched incorrectly:
"Resident Evil - Afterlife (2010).iso" matches "The Blockbuster Buster: Resident Evil - Afterlife", although a browser search finds the correct one.
When searching manually, the search string must be "Resident Evil:Afterlife (2010)"; then it matches correctly. However, filenames can't contain ":" characters.
This is an IMDb search engine issue; it would be great to report it to IMDb. It happens because IMDb automatically redirects us to the movie page it considers the right match, instead of returning multiple search results.
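One way a scraper could at least notice this behaviour is to check where the search request finally landed. The sketch below is hypothetical: the helper name is made up, and the URL pattern is an assumption based on the akas.imdb.com/title/ URLs the scraper already uses. It does not change what IMDb returns; it only tells a single redirected hit apart from a results page.

```python
import re
from typing import Optional

# Assumption: a search that IMDb resolves on its own ends on a title page
# such as http://akas.imdb.com/title/tt0000001/ instead of a results list.
_TITLE_PAGE = re.compile(r"/title/(tt[0-9]+)/")


def redirected_title_id(final_url: str) -> Optional[str]:
    """Return the IMDb id if the search landed on a title page, else None."""
    match = _TITLE_PAGE.search(final_url)
    return match.group(1) if match else None


# urllib follows redirects automatically; the final address is available
# from the response object, e.g.:
#   with urllib.request.urlopen(request) as response:
#       title_id = redirected_title_id(response.geturl())
```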

(2012-12-21, 14:24)realjobe Wrote: So, could there be a setting to change the "-" marking to ":" before the search?
I will not add such a setting.
I think you may have jumped the gun on the accept-language notion. There may still be a geographical component involved.

I've been doing some test searches for 'The Avengers' with various languages set, using a variety of methods: via a browser, using curl from the command line, and with the scraper (edited to cache the search results). The results were consistent: with the language set to en-us, the first result was 'The Avengers (2012)'; with anything else, the first result was 'Avengers Assemble (2012) aka "Marvel's The Avengers"'.

'Avengers Assemble' is unambiguously UK-only (which is where I am), so it seems that anything other than en-us is being ignored in favour of geographical location.
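The comparison above can be reproduced by sending the same search with different Accept-Language headers. A minimal Python sketch, assuming the search lives at http://akas.imdb.com/find?q=... (an assumption; the scraper's own search URL is not shown in this thread). Note that the header cannot change the client's geographical location, which is exactly the variable in question.

```python
import urllib.parse
import urllib.request

# Assumed search endpoint; the scraper's actual search URL may differ.
SEARCH_URL = "http://akas.imdb.com/find"


def build_search_request(query: str, language: str) -> urllib.request.Request:
    """Build a search request that asks for results in `language`."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    # curl equivalent:
    #   curl -H "Accept-Language: en-gb" "http://akas.imdb.com/find?q=The+Avengers"
    return urllib.request.Request(url, headers={"Accept-Language": language})


# One request per language; fetching each one and comparing the first
# result reproduces the test (network access is not performed here).
requests_by_language = {
    lang: build_search_request("The Avengers", lang)
    for lang in ("en-us", "en-gb", "fi")
}
```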
(2012-12-21, 18:53)scudlee Wrote: so it seems like anything other than en-us is being ignored in favour of geographical location.

No, setting Accept-Language works for me for most titles, although there are indeed some exceptions.
I did many tests today with many languages; the search results differed for me and matched the language I set. However, there are some exception titles. I'm not sure why or how this can be title-dependent. Very weird behavior...
(2012-12-21, 19:02)olympia Wrote:
(2012-12-21, 18:53)scudlee Wrote: so it seems like anything other than en-us is being ignored in favour of geographical location.

No, setting Accept-Language works for me for most titles, although there are indeed some exceptions.
I did many tests today with many languages; the search results differed for me and matched the language I set. However, there are some exception titles. I'm not sure why or how this can be title-dependent. Very weird behavior...

Ha, if I'd only scrolled along to check the second search result!

...That is random.
...Besides, not ALL of the movies we scrape have to match every time. I have 520 titles and 98% match automatically, and that IS GREAT! :) The rest I update manually. Thanks for the work, really appreciated.
(2012-12-21, 18:11)olympia Wrote:
(2012-12-21, 14:24)realjobe Wrote: EDIT: One small thing... "Tinker Bell (2008)" is not found at all (not even manually) with Universal, perhaps because of the IMDB classification "Tinker Bell (Video 2008)". The same goes for "The Little Mermaid - Ariel's Beginning (2008)": not found at all, not even with ":" instead of "-".
Right, I will add 'videos' to the search results.

Videos have been added to the search results as of v2.1.1.
(2012-12-21, 22:00)olympia Wrote:
(2012-12-21, 18:11)olympia Wrote:
(2012-12-21, 14:24)realjobe Wrote: EDIT: One small thing... "Tinker Bell (2008)" is not found at all (not even manually) with Universal, perhaps because of the IMDB classification "Tinker Bell (Video 2008)". The same goes for "The Little Mermaid - Ariel's Beginning (2008)": not found at all, not even with ":" instead of "-".
Right, I will add 'videos' to the search results.

Videos have been added to the search results as of v2.1.1.

Beautiful! Both "problem cases" work now! THANK YOU! :)
The certification option for themoviedb.org is missing the BR (Brazil) country. I hope you add it in the next update.
I was wondering about "TV-Series" not being included in the search results. I understand that this is a movie scraper, but there are cases where IMDB has the match and TheTVDB does not, e.g. http://www.imdb.com/title/tt1295036/. This is a full-length movie based on the Family Guy TV series.
Also http://www.imdb.com/title/tt1118511/


I added the TV-Series & TV Short criteria to Universal Scraper, and now the search also returns TV series from IMDB:

metadata.universal/universal.xml
Line 54.

Code:
<RegExp input="$$4" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;\1-main.html&quot;&gt;http://akas.imdb.com/title/\1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5+">
<expression repeat="yes" noclean="1,2">&lt;td\sclass=&quot;result_text&quot;&gt;\s&lt;a\shref=&quot;/title/([t0-9]*)/[^&gt;]*&gt;(?:&amp;#x22;)?([^&lt;]*?)(?:&amp;#x22;)?&lt;/a&gt;\s*(?:\([IV]+\) )?\([^\(]*?([0-9]{4})[^\)]*\)\s(?:\(TV Movie\)\s|\(Video\)\s|\(TV Series\)\s|\(TV Short\)\s)?&lt;</expression>
</RegExp>
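For testing the expression outside XBMC, here is a rough Python transcription. It is a sketch that assumes the scraper engine applies the expression as an ordinary regular expression after XML entity decoding, with repeat="yes" approximated by re.findall; the sample HTML rows are made up, not taken from IMDB.

```python
import re
from typing import List, Tuple

# The <expression> above with its XML entities decoded
# (&lt; -> <, &gt; -> >, &quot; -> ", &amp; -> &). Groups: 1 = IMDb id,
# 2 = title, 3 = year. The final optional group accepts the labels
# (TV Movie), (Video), (TV Series) and (TV Short) after the year.
RESULT_RE = re.compile(
    r'<td\sclass="result_text">\s<a\shref="/title/([t0-9]*)/[^>]*>'
    r'(?:&#x22;)?([^<]*?)(?:&#x22;)?</a>\s*'
    r'(?:\([IV]+\) )?\([^\(]*?([0-9]{4})[^\)]*\)\s'
    r'(?:\(TV Movie\)\s|\(Video\)\s|\(TV Series\)\s|\(TV Short\)\s)?<'
)


def parse_results(html: str) -> List[Tuple[str, str, str]]:
    """Return (imdb_id, title, year) for every result row the expression matches."""
    return RESULT_RE.findall(html)


# Hypothetical result rows; ids and titles are placeholders.
SAMPLE = (
    '<td class="result_text"> <a href="/title/tt0000001/" >Example Movie</a> (2010) </td>'
    '<td class="result_text"> <a href="/title/tt0000002/" >&#x22;Example Show&#x22;</a>'
    ' (2008) (TV Series) </td>'
)

if __name__ == "__main__":
    for row in parse_results(SAMPLE):
        print(row)
```

Rows carrying any other label after the year, such as "(Video Game)", still fail the final "<" and are skipped.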
(2012-12-20, 00:04)olympia Wrote:
(2012-12-19, 17:03)NEOhidra Wrote: The scraper works fine, but I have a feature request regarding set/collection names.
It scrapes movie titles and plots correctly according to the selected language, but it is not possible to choose a language for sets/collections. Would it be possible to scrape translated set/collection names, using Frodo?

I will add this eventually.

Fast as always olympia, thank you!
(2012-12-22, 10:12)realjobe Wrote: I was wondering about "TV-Series" not being included in the search results. I understand that this is a movie scraper, but there are cases where IMDB has the match and TheTVDB does not, e.g. http://www.imdb.com/title/tt1295036/. This is a full-length movie based on the Family Guy TV series.
Also http://www.imdb.com/title/tt1118511/


I added the TV-Series & TV Short criteria to Universal Scraper, and now the search also returns TV series from IMDB:

metadata.universal/universal.xml
Line 54.

Code:
<RegExp input="$$4" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;\1-main.html&quot;&gt;http://akas.imdb.com/title/\1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5+">
<expression repeat="yes" noclean="1,2">&lt;td\sclass=&quot;result_text&quot;&gt;\s&lt;a\shref=&quot;/title/([t0-9]*)/[^&gt;]*&gt;(?:&amp;#x22;)?([^&lt;]*?)(?:&amp;#x22;)?&lt;/a&gt;\s*(?:\([IV]+\) )?\([^\(]*?([0-9]{4})[^\)]*\)\s(?:\(TV Movie\)\s|\(Video\)\s|\(TV Series\)\s|\(TV Short\)\s)?&lt;</expression>
</RegExp>

Neat, and it works fine, but even better if Olivia would put it into the scraper so it survives the next upgrade.
Olivia? Huhh? ;)

I will definitely not put it in, as the odds are pretty high that it would cause many false-positive matches during bulk scraping. There are many similar titles for TV series and movies, even from the same year, so XBMC might pick the wrong one.