(2018-01-22, 02:58)Lee Thompson Wrote: I wonder if the IMDB scraping issue might be mitigated by having a set sleep time between scrape attempts.
Yes, I've done that. I also used Google, DuckDuckGo, Bing, IMDb, etc. to search for the IMDb ID, and experimented with rotating through all of them to spread the load.
Bing started timing out very quickly. With DuckDuckGo, I couldn't confirm that the first result was always the movie being searched for. Google and IMDb were the most reliable.
But every time I restarted scraping new movies, the timeouts would kick in sooner.
For example:
Scan 50 movies: OK.
Scan the next 100 movies: 80% success.
Scan another 100 movies: 40% success.
And less and less after that.
This is because IMDb has no API, so their HTML pages have to be mined instead.
Google is pretty much the same.
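
For anyone who wants to try the same thing, here is a rough sketch of the "fixed sleep plus rotating through search sources" idea in Python. It is not the code I actually used. The function name rotate_lookups, the provider tuples, and the 2 second default delay are all made up for illustration, and each lookup function is assumed to take a title and return an IMDb ID string or None.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider shape: (name, lookup). The lookup takes a movie title
# and returns an IMDb ID string (placeholder form: "tt0000001") or None.
Provider = Tuple[str, Callable[[str], Optional[str]]]


def rotate_lookups(
    titles: List[str],
    providers: List[Provider],
    delay_s: float = 2.0,  # assumed value, tune to taste
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Look up each title with the next provider in turn, pausing between attempts."""
    if not providers:
        raise ValueError("at least one provider is required")

    results: Dict[str, Tuple[str, Optional[str]]] = {}
    rotation = itertools.cycle(providers)  # round-robin over the providers

    for index, title in enumerate(titles):
        if index > 0:
            sleep(delay_s)  # fixed pause between scrape attempts
        name, lookup = next(rotation)
        try:
            imdb_id = lookup(title)
        except Exception:
            imdb_id = None  # treat a timeout or parse failure as a miss
        results[title] = (name, imdb_id)

    return results
```

You would pass in your own lookup functions for Google, IMDb and so on. Note that this only spaces the requests out and spreads them around. In my experience the success rate still dropped off with each new scan.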