2012-04-19, 04:00
Hey guys, I'm new to XBMC add-on development and just starting out. I was getting the hang of it until I hit a roadblock. I'm trying to do a simple scrape of a site using the code below:
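import urllib2

# url holds the address of the page to scrape
req = urllib2.Request(url)
# pretend to be Firefox 3 so the site does not see the default Python user agent
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link = response.read()
response.close()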
This code works fine with every other site I've tried so far, but with "http://khmerportal.com/videos" it keeps returning a 403 Access Denied error and tells me to go to "http://www.ioerror.us/bb2-support-key?key=17566707". That suggests the site is running something like "Bad Behavior" to protect itself from malicious scripts. I've been able to scrape this site just fine with .NET, so I'm not sure what the Python library is doing that lets the site detect the script. Any help would be appreciated.
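For anyone who wants to see exactly what the server sends back with the 403, here is a rough sketch that catches the error and returns the status code together with the body of the error page (which is where the support key link shows up). The function name fetch_or_explain is made up for illustration, and the Python 3 import fallback is only there so the snippet also runs outside XBMC.

```python
try:
    import urllib2  # Python 2, as used by XBMC add-ons
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback
    from urllib.error import HTTPError


def fetch_or_explain(url, user_agent):
    """Return (status, body). On an HTTP error, return the error page instead of raising.

    Hypothetical helper, not part of urllib2.
    """
    req = urllib2.Request(url)
    req.add_header('User-Agent', user_agent)
    try:
        response = urllib2.urlopen(req)
    except HTTPError as err:
        # HTTPError is file-like, so read() gives the body of the error page
        return err.code, err.read()
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

Calling fetch_or_explain(url, 'Mozilla/5.0 ...') and printing the second item should show the full error page rather than just the traceback.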
Oh, and if you can't help with the error, could you at least check whether you can scrape "http://khmerportal.com/videos" successfully without getting the same error?
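One thing that might be worth testing, and this is only a guess rather than a confirmed cause: urllib2 sends very few headers by default (no Accept or Accept-Language), while a real browser sends both, so a filter that inspects the whole request could tell the two apart even with a browser User-Agent. The sketch below adds browser-like values for those headers. BROWSER_HEADERS, build_request and fetch are names invented for this example, and the header values are just typical Firefox 3 ones.

```python
try:
    import urllib2  # Python 2, as used by XBMC add-ons
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

# Assumed header set: typical values sent by Firefox 3, not taken from the site
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) '
                  'Gecko/2008092417 Firefox/3.0.3',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-gb,en;q=0.5',
}


def build_request(url):
    """Build a request that carries the browser-like headers above."""
    req = urllib2.Request(url)
    for name, value in BROWSER_HEADERS.items():
        req.add_header(name, value)
    return req


def fetch(url):
    """Download the page body, always closing the response."""
    response = urllib2.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Usage would be link = fetch("http://khmerportal.com/videos"). Whether this actually gets past the 403 on that site is untested here.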