IMDB not accepting certain useragent strings?
#1
Just noticed today that I couldn't scrape any movies from my Ubuntu system. Looking at the logs, I get:

Code:
16:51:57 T:140189498325328 M:835964928   DEBUG: FileCurl::Open(0x7fffd6390d08) http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
16:51:57 T:140189498325328 M:835964928    INFO: easy_aquire - Created session to http://akas.imdb.com
16:51:57 T:140189498325328 M:835727360   DEBUG: FillBuffer: curl failed with code 22

attempting a wget:

Code:
$ wget 'http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)'
--2010-01-02 16:52:32--  http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
Resolving akas.imdb.com... 72.21.206.70
Connecting to akas.imdb.com|72.21.206.70|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2010-01-02 16:52:33 ERROR 403: Forbidden.

but doing a wget with -U:

Code:
$ wget -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14' 'http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)'
--2010-01-02 16:54:33--  http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
Resolving akas.imdb.com... 207.171.166.140
Connecting to akas.imdb.com|207.171.166.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `find?s=tt;q=the warlords (2007)'

    [   <=>                                                        ] 42,356      69.8K/s   in 0.6s    

2010-01-02 16:54:34 (69.8 KB/s) - `find?s=tt;q=the warlords (2007)' saved [42356]

also wget works fine on my mac...
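
The same comparison can be reproduced without wget. Below is a minimal Python sketch using only the standard library. The helper names (`build_request`, `fetch_status`, `compare_user_agents`) are made up for illustration and have nothing to do with XBMC, and the status codes you see depend entirely on how the server treats each user agent at the time you run it.

```python
import urllib.error
import urllib.request

# URL and browser user agent copied from the wget runs above.
URL = "http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)"
BROWSER_UA = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) "
    "Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14"
)


def build_request(url, user_agent=None):
    """Build a request; with no user_agent, urllib sends its own default."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(request, timeout=10):
    """Return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def compare_user_agents(url=URL):
    """Print the status for the default and the browser-like user agent."""
    print("default UA:", fetch_status(build_request(url)))
    print("browser UA:", fetch_status(build_request(url, BROWSER_UA)))
```

Calling `compare_user_agents()` performs two live requests, so it needs network access. If the two status codes differ, the user agent is the likely cause.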

Is there an option somewhere that lets us override the user-agent string that the XBMC IMDb scraper uses?
Reply
#2
I am really new to XBMC, but I think I may have a similar problem. When I add a movie source to XBMC and use IMDb as the scraper, nothing gets added to the library after scanning for about 60 seconds. However, when I change the scraper to tvdb.com it seems to be able to download all the information.

Is there a limit as to how much you can use/download info from IMDB?!

Is there a fix for this?

Thanks
Reply
#3
That is correct. I've been getting the following for every movie it tries to scrape since around 9:00 PM yesterday:

00:15:06 T:2619337584 M:2954203136 ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.

Changing the scraper to themoviedb.org seems to work (though I like IMDb better).

regards,
del
Reply
#4
I'm having the same problem with IMDb on Apple TV. Is IMDb scraping down? I can scrape with TMDb, but it's not as big as IMDb.
Reply
#5
Having trouble too. Tried searching for help in other areas, but found nothing so far. Maybe I should update? I think I'm using a build from November.
Reply
#6
try adding |User-Agent={your valid user agent} to end of the urls in imdb.xml

urlencoded of course
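
For anyone unsure what "urlencoded" means here, a minimal Python sketch follows. It assumes that the `|` and `=` stay literal and that only the user-agent value itself is encoded; leaving the parentheses unencoded is also an assumption, chosen to match the working line posted later in this thread. `user_agent_suffix` is a hypothetical helper name, not something XBMC provides.

```python
from urllib.parse import quote


def user_agent_suffix(user_agent):
    """Return the '|User-Agent=...' suffix with the value percent-encoded."""
    # safe="()" keeps parentheses literal and, unlike the default safe="/",
    # also percent-encodes the "/" characters in the user agent.
    return "|User-Agent=" + quote(user_agent, safe="()")


ua = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) "
    "Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14"
)
print(user_agent_suffix(ua))
```

Paste the printed suffix at the end of the URL in imdb.xml, without curly braces around the value.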
For python coding questions first see http://mirrors.xbmc.org/docs/python-docs/
Reply
#7
Stupid question... what is a valid user agent?
Reply
#8
rebaker501,

'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14' is an example of a valid one. Basically, the user-agent is how your browser identifies itself when requesting a website.

For example: when you go to download Firefox, you automatically get the link for your platform (Mac, Windows, Linux). That's because the website uses the user-agent to determine which download is the right one.

If anyone gets this to work, please let us know.

del
Reply
#9
No luck after adding my Firefox user agent string to the end of the url. My situation is somewhat similar to that described above (some time this afternoon, IMDB stopped yielding any results).
Code:
03:31:59 T:4316 M:4294967295   DEBUG: CVideoDatabase::GetMovieId (F:\Movies\12 Angry Men.m4v), query = select idMovie from movie where idFile=1
03:31:59 T:4316 M:4294967295   DEBUG: No NFO file found. Using title search for 'F:\Movies\12 Angry Men.m4v'
03:31:59 T:4316 M:4294967295   DEBUG: CIMDB::InternalFindMovie: Searching for '12 angry men' using IMDb.com scraper (file: 'imdb.xml', content: 'movies', language: 'en', date: '2009-08-10', framework: '1.1')
03:31:59 T:4316 M:4294967295   DEBUG: FileCurl::Open(06D9E628) http://akas.imdb.com/find?s=tt;q=12%20angry%20men%7cUser-Agent%3d%7bMozilla%2f5.0+(Windows%3b+U%3b+Windows+NT+6.0%3b+en-US%3b+rv%3a1.9.1.6)+Gecko%2f20091201+Firefox%2f3.5.6+GTB6+(.NET+CLR+3.5.30729)%7d
03:31:59 T:4316 M:4294967295   DEBUG: XFILE::CFileCurl::CReadState::FillBuffer: curl failed with code 22
03:31:59 T:4316 M:4294967295   ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.
03:31:59 T:4316 M:4294967295   DEBUG: FileCurl::Close(06D9E628) http://akas.imdb.com/find?s=tt;q=12%20angry%20men%7cUser-Agent%3d%7bMozilla%2f5.0+(Windows%3b+U%3b+Windows+NT+6.0%3b+en-US%3b+rv%3a1.9.1.6)+Gecko%2f20091201+Firefox%2f3.5.6+GTB6+(.NET+CLR+3.5.30729)%7d
Reply
#10
Nuka1195 wrote: try adding |User-Agent={your valid user agent} to end of the urls in imdb.xml

urlencoded of course

Could you clarify what you mean, please? I tried putting in:

Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4&lt;/url&gt;%7CUser-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14" dest="3">

but it still doesn't work :( Better still, could someone update the imdb.xml file on svn?

cheers,
Reply
#11
Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4|User-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14&lt;/url&gt;" dest="3">
Reply
#12
thanks spiff!

Seems to be working now.
Reply
#13
I'm totally new to all this XML editing, and I can't get IMDb scraping to work. Can someone please either upload their edited XML or give me a noob's guide to exactly what I need to do?

Thanks
Reply
#14
Spiff: Thanks, this works.

ChrisWad: find system/scrapers/video/imdb.xml in the folder where you installed XBMC. Open the file with a text editor. Line 46 should contain "http://akas.imdb.com/find" (it's the only instance of "find" in the file). Replace the original line with the one Spiff posted.
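
If you'd rather not edit by hand, the replacement can be sketched in Python. This is only an illustration: `patch_scraper_text` is a hypothetical helper, and it assumes the original line ends its URL with `&lt;/url&gt;`, as in the attempts quoted earlier in the thread. It inserts the suffix just before that closing tag, which is where Spiff's working line puts it.

```python
# Suffix taken from the working line posted in this thread.
SUFFIX = (
    "|User-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US"
    "%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)"
    "%20Firefox%2F2.0.0.14"
)
MARKER = "http://akas.imdb.com/find"
URL_CLOSE = "&lt;/url&gt;"  # the escaped </url> as it appears in imdb.xml


def patch_scraper_text(text, suffix=SUFFIX):
    """Insert the suffix before the escaped </url> on the 'find' line."""
    patched = []
    for line in text.splitlines(keepends=True):
        start = line.find(MARKER)
        # Look for the closing tag only after the find URL itself.
        end = line.find(URL_CLOSE, start) if start >= 0 else -1
        # Skip lines that are already patched so the edit is repeatable.
        if end >= 0 and "|User-Agent=" not in line:
            line = line[:end] + suffix + line[end:]
        patched.append(line)
    return "".join(patched)
```

To apply it, read imdb.xml into a string, pass it through `patch_scraper_text`, and write the result back. Keep a backup copy of the original file first.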
Reply
#15
Does that mean you need to have Mozilla Firefox installed? Apologies if that is an obvious question.
Reply


