2011-11-21, 17:19
forget about my previous post: I finally worked out how to get around google's search limitations, which were stopping me from batch updating my entire library. in summary, I suggest only 2 modifications to the current filmaffinity v1.4.1 scraper, both very simple yet very useful:
- include year in filmaffinity search
this allows a more refined search, which will point to the exact movie if your library file names follow the XBMC file naming convention (broadly filename(year).ext). it still searches exactly as before if no year is present, so those who bothered to name their files properly get the bonus, and those who didn't lose nothing. simply change regular expression line 11 to this (note that inside the scraper XML the &lt;, &gt; and &amp; characters of the output attribute must be escaped):
Code:
<RegExp input="$$1" output="&lt;url&gt;http://www.filmaffinity.com/es/advsearch.php?stype[]=title&amp;fromyear=$$2&amp;toyear=$$2&amp;stext=\1&lt;/url&gt;" dest="3">
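to see what this regexp ends up requesting, here is a minimal Python sketch that splits an XBMC-style file name into title and year and builds the same advsearch URL. it assumes, as in XBMC's CreateSearchUrl, that $$1 holds the title and $$2 the year; the helper names and the file-name pattern are hypothetical, not part of the scraper.

```python
import os
import re
from urllib.parse import quote_plus

# Hypothetical pattern for XBMC-style names: "Title (Year).ext" or "Title(Year).ext".
# This is an illustrative assumption, not XBMC's own file-name parser.
NAME_RE = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)\s*$")


def split_title_year(path):
    """Return (title, year); year is "" when the name carries none."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    match = NAME_RE.match(stem)
    if match:
        return match.group("title"), match.group("year")
    return stem.strip(), ""


def filmaffinity_search_url(title, year=""):
    """Build the advsearch URL that the modified line 11 asks for.

    An empty year leaves fromyear/toyear blank, which (per the post)
    behaves like the old title-only search.
    """
    return ("http://www.filmaffinity.com/es/advsearch.php"
            "?stype[]=title&fromyear={y}&toyear={y}&stext={t}"
            .format(y=year, t=quote_plus(title)))


if __name__ == "__main__":
    for name in ("Nixon (1995).avi", "Nixon.avi"):
        print(filmaffinity_search_url(*split_title_year(name)))
```

the first name yields fromyear=1995&toyear=1995, the second leaves both fields empty, which is the "bonus only for properly named files" behaviour described above.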
- remove the google keyword "site:" from the IMDB id lookup
this was triggering some kind of special search tracking on google's side, which after a couple of hundred queries started serving a captcha that XBMC can neither detect nor solve. I ended up changing "q=site:imdb.com\1" to "q=imdb\1" and found no loss in performance at all. simply change regular expression line 124 to this:
I added the "&btnI=745" parameter (which forces google's "I'm feeling lucky" behaviour) in case it could make the results easier to parse, and the "&pws=0" parameter (which turns off personalized results) in case any stored personal settings could be stopping the scraper from reaching its destination. I'm pretty sure these 2 options are not necessary, but since they were there in all my tests and everything worked fine, I would suggest just leaving them in as I did.
Code:
<RegExp input="$$9" output="&lt;url function=&quot;GoogleToIMDB&quot;&gt;http://www.google.com/search?q=imdb\1&amp;btnI=745&amp;pws=0&lt;/url&gt;" dest="5+">
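for comparison, a small Python sketch of the query strings before and after the change. it assumes the captured \1 is the movie title; since "site:imdb.com\1" worked in the original, \1 apparently already starts with a separator, so here the space is added explicitly instead. the function names are hypothetical.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "http://www.google.com/search"


def old_imdb_lookup_url(title):
    # v1.4.1 style: query restricted with the "site:" operator.
    return GOOGLE_SEARCH + "?" + urlencode({"q": "site:imdb.com " + title})


def new_imdb_lookup_url(title):
    # Modified style: plain "imdb" keyword, plus btnI ("I'm feeling lucky")
    # and pws=0 (personalized results off), as suggested in the post.
    params = {"q": "imdb " + title, "btnI": "745", "pws": "0"}
    return GOOGLE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(old_imdb_lookup_url("Nixon"))
    print(new_imdb_lookup_url("Nixon"))
```

urlencode percent-encodes the colon of "site:" and turns spaces into "+", so the new query carries no search operator at all, only the plain keyword.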
some future improvement? sure, there is plenty, but what struck me as obvious was that some thumbnails are not being downloaded properly from filmaffinity. I can download them manually through XBMC, as it suggests IMDB's ones, but I was wondering why movies like Nixon (http://www.filmaffinity.com/es/film737736.html) don't get a thumbnail. in fact there's no lightbox over them, so the img markup is probably different. I guess I'll leave this to the actual scraper developers, so they can debug it and release a new scraper version including my 2 suggestions above.