IMDB Scraping Failure (Well Known Movies)
#16
(2012-11-24, 09:44)olympia Wrote: This should be fixed by now. Yes, it should update automatically. In case not, use force refresh from the context menu on the team-xbmc repo.

1st post. I'm running xbmc 10.1 (if it's not broken, don't fix it) and the IMDB scraper still returns to the keyboard on all movie database queries. Please explain the force refresh, and what is a "repo"?
#17
(2012-11-30, 22:27)peachville Wrote: I'm running xbmc 10.1 (if it's not broken, don't fix it) and the IMDB scraper still returns to the keyboard on all movie database queries. Please explain the force refresh and what is a "repo"?

Probably because Dharma 10.1 isn't really maintained any longer.
We have been on Eden 11.0 for many months now and are heading to Frodo 12.0,
so maybe it's time to upgrade.

Note:
A "repo" (repository) is the place where all scripts, scrapers and plugins come from.
#18
(2012-11-30, 22:53)Martijn Wrote: So maybe it's time to upgrade

Thanks for the fast reply, Martijn.

The 10.1 IMDB scraper worked just fine till it suddenly dropped dead.
If upgrading xbmc is anything like a Microsoft OS upgrade, I'd rather take a sharp stick in the eye.
I'm still running WinXP on my main computer and only upgraded that from Win98 2 years ago... lol.
The thought of losing data on over 2000 movie titles plus 1.5 TB of TV shows
(I collect movies/TV series like stamps) scares the absolute hell out of me.
Cloning just one 2TB drive takes about 16 hrs, so do I really have to go to Eden or Frodo?
Isn't there an IMDB script file that can just be edited to point to the new scraper location?

Thanks
peachville
#19
(2012-11-30, 23:30)peachville Wrote: Cloning just one 2TB drive takes about 16 hrs so do I really have to go Eden or Frodo?

Well, it isn't that hard to upgrade, though.

Make sure you back up your userdata to a safe place first. If anything goes wrong, you can always put everything back and be safe with Dharma. Then download the latest XBMC version and just install it. If it goes wrong, install 10.1 again and put all the userdata back.
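
The backup step can be sketched in a few lines of Python. This is only an illustration, not an official XBMC tool: the function name `backup_userdata` is made up here, and the userdata location is an assumption (commonly `~/.xbmc/userdata` on Linux and `%APPDATA%\XBMC\userdata` on Windows, but check your own install).

```python
import shutil
import time
from pathlib import Path


def backup_userdata(userdata: Path, backup_root: Path) -> Path:
    """Copy the whole userdata folder into a new timestamped folder and return its path.

    Hypothetical helper for illustration; both paths must be supplied by the caller.
    """
    if not userdata.is_dir():
        raise FileNotFoundError(f"userdata folder not found: {userdata}")
    target = backup_root / f"userdata-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree refuses to write into an existing folder, so an older backup
    # is never silently overwritten.
    shutil.copytree(userdata, target)
    return target
```

A call such as `backup_userdata(Path.home() / ".xbmc" / "userdata", Path("/mnt/backup"))` (both paths are examples) copies only the library databases, settings and thumbnails, not the media files, so it should take far less time than cloning a 2TB drive. Close XBMC first so the databases are not being written during the copy.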

One possible problem when upgrading could be graphics card support.

It comes down to this: we will be dropping support for the old version soon. Scrapers aren't my department, but at the moment the IMDb scraper is broken across all versions, and I don't know whether the 11.0 version could be used on 10.1 once it is fixed.


Note:
Maybe a strange request, but I would be interested in someone's complete userdata folder to test whether upgrading from 10.1 to 12.0 works with a very old data/library.
#20
Hello all,
Not sure if it's really fixed already as olympia wrote - for me it wasn't, so I decided to patch it myself.
This should fix the problem:
File: .xbmc/addons/metadata.imdb.com/imdb.xml
Code:
diff imdb_orig.xml imdb.xml
2c2
< <scraper framework="1.1" date="2012-02-19">
---
> <scraper framework="1.1" date="2012-12-01">
12c12
<         <RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4&lt;/url&gt;" dest="3">
---
>         <RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt&amp;q=\1$$4&lt;/url&gt;" dest="3">
28c28
<                 <expression noclean="1">(&gt;&lt;a href=&quot;/title.*)</expression>
---
>                 <expression noclean="1">(&gt;\s*&lt;a href=&quot;/title.*)</expression>
30,31c30,31
<             <RegExp input="$$4" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;\1-main.html&quot;&gt;http://akas.imdb.com/title/\1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5+">
<                 <expression repeat="yes" noclean="1,2">&gt;&lt;a href=&quot;/title/([t0-9]*)/[^&gt;]*&gt;(?:&amp;#x22;)?([^&lt;]*?)(?:&amp;#x22;)?&lt;/a&gt; *\([^\(]*?([0-9]{4})</expression>
---
>             <RegExp input="$$4" output="&lt;entity&gt;&lt;title&gt;\2 \4&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;\1-main.html&quot;&gt;http://akas.imdb.com/title/\1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5+">
>                 <expression repeat="yes" noclean="1,2">&gt;\s*&lt;a href=&quot;/title/([t0-9]*)/[^&gt;]*&gt;(?:&amp;#x22;)?([^&lt;]*?)(?:&amp;#x22;)?&lt;/a&gt;\s*(?:\([IV]*\)\s*)?\([^\(]*?([0-9]{4})\)\s*([^&lt;]*)</expression>

I used the base file from: http://xbmc.git.sourceforge.net/git/gitweb.cgi?p=xbmc/scrapers

Just in case, this is the relevant part of my working imdb.xml:
Code:
    <CreateSearchUrl dest="3" SearchStringEncoding="iso-8859-1">
        <RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt&amp;q=\1$$4&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="%20(\1)" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="\1" dest="7">
                <expression clear="yes">/title/([t0-9]*)/(combined|faq|releaseinfo|vote)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;year&gt;\2&lt;/year&gt;&lt;url cache=&quot;$$7-main.html&quot;&gt;http://akas.imdb.com/title/$$7/&lt;/url&gt;&lt;id&gt;$$7&lt;/id&gt;&lt;/entity&gt;" dest="5">
                <expression clear="yes" noclean="1">&lt;meta name=&quot;title&quot; content=&quot;(?:&amp;#x22;)?([^&quot;]*?)(?:&amp;#x22;)? \([^\(]*?([0-9]{4})\)</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="4">
                <expression noclean="1">(&gt;\s*&lt;a href=&quot;/title.*)</expression>
            </RegExp>
            <RegExp input="$$4" output="&lt;entity&gt;&lt;title&gt;\2 \4&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;\1-main.html&quot;&gt;http://akas.imdb.com/title/\1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes" noclean="1,2">&gt;\s*&lt;a href=&quot;/title/([t0-9]*)/[^&gt;]*&gt;(?:&amp;#x22;)?([^&lt;]*?)(?:&amp;#x22;)?&lt;/a&gt;\s*(?:\([IV]*\)\s*)?\([^\(]*?([0-9]{4})\)\s*([^&lt;]*)</expression>
            </RegExp>
            <expression clear="yes" noclean="1"/>
        </RegExp>
    </GetSearchResults>

IMDb changed the query URL (the parameters are now separated by & instead of ;) and also added whitespace before the links in the results.

A minor addition I made is that the scraper now also lists the type of media (TV Episode, Video Game, etc.) in the result list.
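
The effect of the whitespace change can be reproduced outside XBMC with a small Python sketch. The two patterns are the XML-unescaped forms of the old and patched `<expression>` lines from the diff; the sample rows are hypothetical HTML written for this illustration, not captured from IMDb, and `parse_results` is a made-up helper name.

```python
import re

# Old pattern: requires "<a" to follow ">" with nothing in between.
OLD = re.compile(
    r'><a href="/title/([t0-9]*)/[^>]*>(?:&#x22;)?([^<]*?)(?:&#x22;)?</a>'
    r' *\([^\(]*?([0-9]{4})'
)
# Patched pattern: tolerates whitespace, skips an optional "(I)"-style marker,
# and captures the trailing text (the media type) as group 4.
NEW = re.compile(
    r'>\s*<a href="/title/([t0-9]*)/[^>]*>(?:&#x22;)?([^<]*?)(?:&#x22;)?</a>'
    r'\s*(?:\([IV]*\)\s*)?\([^\(]*?([0-9]{4})\)\s*([^<]*)'
)

# Hypothetical result rows (assumption); note the space between ">" and "<a".
SAMPLE = (
    '<td class="result_text"> <a href="/title/tt0000001/">Example Movie</a> (1999) </td>'
    '<td class="result_text"> <a href="/title/tt0000002/">Example Show</a> (I) (2001) (TV Series) </td>'
)


def parse_results(html):
    """Return (id, title, year, media type) tuples using the patched pattern."""
    return [
        (m.group(1), m.group(2), m.group(3), m.group(4).strip())
        for m in NEW.finditer(html)
    ]


if __name__ == "__main__":
    print("old pattern matches:", OLD.search(SAMPLE) is not None)
    for row in parse_results(SAMPLE):
        print(row)
```

On rows shaped like the sample, the old pattern finds nothing because of the space before `<a`, while the patched one returns both entries. In the scraper itself, group 4 is appended to the title via `\2 \4`, which is how the media type ends up in the result list.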
#21
Wow groxxda great first post!

Thanks man!!! (+1) I used your fix and my IMDB scraper is working 100% again. Yeah!
#22
(2012-12-01, 06:33)groxxda Wrote: Not sure if it's really fixed already as olympia wrote

My post is from 24 November, when the original issue had been fixed.
Following that, the entire search broke completely on 28 November, when IMDb changed their search engine and result page layout.
#23
I suggest everyone use the Universal Scraper for now, until the new search engine at IMDb stabilizes. It currently uses the search engine of The Movie Database (TMDb), but it can get all information from IMDb (you can configure it very flexibly via its settings). The downside, of course, is that you cannot scrape movies that are not on TMDb, but this shouldn't affect many movies.
#24
Also +1 from me, groxxda.
IMDb is working nicely with your fix.
#25
This worked for me as well. Thanks groxxda!
#26
(2012-12-01, 06:33)groxxda Wrote: Not sure if it's really fixed already as olympia wrote - for me it wasn't, so I decided to patch it myself.

Maybe I am missing something here, but I tried creating a new imdb.xml containing the code from groxxda's post, and XBMC just crashes. Could someone explain what I am doing wrong? Also, could someone host the file on Dropbox so I could just wget it?
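
The thread never states what caused this crash. One plausible cause, and this is purely an assumption, is that the posted listing is only a fragment of imdb.xml (the search sections), so a file containing nothing else has no enclosing `<scraper>` element and is not a complete scraper definition. A minimal Python sketch for checking an edited file before restarting XBMC follows; `check_scraper_xml` is a hypothetical helper, and the expectation that `<CreateSearchUrl>` and `<GetSearchResults>` sit directly under `<scraper>` is inferred from the diff and listing in the thread.

```python
import xml.etree.ElementTree as ET


def check_scraper_xml(text):
    """Return a list of problems found in imdb.xml contents (empty list if none).

    `text` may be str or bytes; bytes are preferable for files read from disk
    because the XML declaration's encoding is then honoured.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "scraper":
        problems.append(f"root element is <{root.tag}>, expected <scraper>")
    # find() with a bare tag name only looks at direct children of the root.
    for required in ("CreateSearchUrl", "GetSearchResults"):
        if root.find(required) is None:
            problems.append(f"missing <{required}> section")
    return problems
```

For example, `check_scraper_xml(open("imdb.xml", "rb").read())` on a file that holds only the two pasted sections reports that it is not well-formed, since a document may have only one root element. An empty list means only that the basic structure looks right, not that the scraper works.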
#27
It took me a little while, but I finally saw what my issue was. To save others the hassle I went through to figure it out, here is the file hosted on Dropbox: http://dl.dropbox.com/u/80929459/imdb.xml (just wget it into the directory).

The only thing I noticed with this quick fix is that some of my movies are misidentified and labeled in a foreign language.
#28
Just for the record: on my test folder with 50 movies, the IMDb scraper with the above patch identifies only 20 movies correctly. I still don't see why the Universal Scraper with IMDb fields enabled wouldn't be the much better option for all of you...
#29
Thanks groxxda.
#30
olympia, did you apply the patch, or did you replace the file with my contents? I'm not sure the bundled imdb.xml and the one from the repository I used match. I'm almost sure they don't, even though the date was identical.
If you could point me to the source of the bundled file, I'll use that instead. I guess it was more advanced and prevented the wrong identifications. (I'm on Gentoo and use a live build, so I don't want to recompile just to get the bundled file again.)