2011-08-21, 00:16
I have just started to fiddle around with scraping proxy sites, and the lists are very flaky: they report a proxy as up with a good response time, only for it to turn out to be down.
Is there a way to set the timeout of the net.http_HEAD function? If I try to scrape 10 proxy lists to find a few decent ones, it could take ages before the ones that don't work time out.
Also, is there a way in Python to measure the response time of a net.http_HEAD call?
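To make the question concrete, here is a rough sketch of what I'm after, using only the standard library rather than net.http_HEAD (I don't know whether net.http_HEAD accepts a timeout). The name timed_head and the 5 second default are made up for illustration, and this is Python 3 syntax, so it may need adapting for the interpreter the addon runs under.

```python
import time
import urllib.error
import urllib.request


def timed_head(url, timeout=5.0):
    """Send a HEAD request and return (status, elapsed_seconds).

    status is the HTTP status code, or None if the request failed
    or timed out. timed_head is a hypothetical helper, not part of net.py.
    """
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        # timeout applies to each blocking socket operation,
        # not to the request as a whole.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        status = err.code
    except OSError:
        # Covers URLError, refused connections and socket timeouts.
        status = None
    elapsed = time.monotonic() - start
    return status, elapsed
```

Calling timed_head(url, timeout=3.0) on each candidate and keeping the ones with a status and a small elapsed value would do the filtering. If the request has to go through net.http_HEAD itself, socket.setdefaulttimeout(seconds) might work as a blunt process-wide alternative, assuming net.py does not set its own timeout.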
In net.py, line 314, you have forgotten a "self" in get_headers(); it should be get_headers(self).
It could be useful to have a plugin test function to see whether a plugin exists for a URL before adding it. Say I scrape a page and find 10 different hosting sites, but only 8 of those have plugins: it would be nice to test in the addon before adding the link, rather than have the user click on it later only to find the movie link isn't working. Something like addon.check_for_plugin(url) returning True / False.
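Roughly what I picture for the check, as a standalone sketch. Everything here is an assumption: the SUPPORTED_HOSTS set and its example domains are placeholders, and I have no idea how the plugins actually register which hosts they handle, so the real lookup would have to ask the plugins instead of a hard-coded set.

```python
from urllib.parse import urlparse

# Placeholder host list; a real version would be built from the
# installed plugins rather than hard-coded.
SUPPORTED_HOSTS = {"hostone.example", "hosttwo.example"}


def check_for_plugin(url, supported_hosts=SUPPORTED_HOSTS):
    """Return True if some plugin claims the host of this URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Accept the bare domain as well as any subdomain of it.
    return any(host == known or host.endswith("." + known)
               for known in supported_hosts)
```

The addon could then filter scraped links before listing them, for example playable = [u for u in links if check_for_plugin(u)].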
I agree with bypassing the RFC on caching as long as it's optional for the addon. That way it can be the end user's choice, so they can disable it in case it causes issues with a site.
I'm off to bed now ... I'm not fully recovered after the business trip (at times I was wondering if I was on hidden camera, it was that outlandish).