looping url ARRGGHH!!
#1
Can someone look at this? It's doing my head in. Just run it (F5) and AAARGGGGGGHHHHHHH

Code:
import urllib2,urllib,re

url='http://www.imdb.com/title/tt0844441/'
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match = re.compile('.+?/rg/tt-episodes/season.+?/images/.+?season.+?     href="(.+?)"    >(.+?)</a>').findall(link)
for url1, name in match:
    url= str(url)+url1
    print url

This is how it should come out:
Code:
http://www.imdb.com/title/tt0844441/episodes?season=6
http://www.imdb.com/title/tt0844441/episodes?season=5
http://www.imdb.com/title/tt0844441/episodes?season=4
http://www.imdb.com/title/tt0844441/episodes?season=3
http://www.imdb.com/title/tt0844441/episodes?season=2
http://www.imdb.com/title/tt0844441/episodes?season=1
#2
Uhm, you're modifying 'url' in a loop, so you'll keep tacking on at the end.
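To see the tacking-on without hitting the network, here is a rough sketch. The `fragments` list is a hypothetical stand-in for what the regex would capture (inferred from the expected output above), and `buggy` is just an illustrative name, not something from the original script.

```python
# Hypothetical stand-ins for the regex matches; no request is made.
base = 'http://www.imdb.com/title/tt0844441/'
fragments = ['episodes?season=6', 'episodes?season=5']

def buggy(url, fragments):
    """Mimic the original loop: 'url' is reassigned on every pass."""
    out = []
    for url1 in fragments:
        url = str(url) + url1  # the next pass starts from this longer string
        out.append(url)
    return out

buggy_urls = buggy(base, fragments)
for u in buggy_urls:
    print(u)
```

The first line comes out fine, but the second already carries the first fragment in front of its own, and it only gets longer from there.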
#3
How can I get around it?
#4
Change the variable you're storing the final URL in. Like spiff said, right now you're using the "url" variable for both the beginning fragment and the final URL, so it's being overwritten each time. Store the final URL in url2 or the like.
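Something along these lines, as a minimal sketch: the base stays in `url` untouched and each result goes into `url2`. The `fragments` list is again a hypothetical stand-in for the regex matches, and `season_urls` is an illustrative helper name, not part of the original script.

```python
# Hypothetical stand-ins for the regex matches; no request is made.
url = 'http://www.imdb.com/title/tt0844441/'
fragments = ['episodes?season=6', 'episodes?season=5', 'episodes?season=4']

def season_urls(url, fragments):
    """Build one full URL per fragment without ever reassigning 'url'."""
    out = []
    for url1 in fragments:
        url2 = url + url1  # 'url' is only read here, never overwritten
        out.append(url2)
    return out

fixed_urls = season_urls(url, fragments)
for u in fixed_urls:
    print(u)
```

Because `url` is only read inside the loop, every pass starts from the same base string.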
#5
Still doesn't work:
Code:
import urllib2,urllib,re

url='http://www.imdb.com/title/tt0844441/'
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match = re.compile('.+?/rg/tt-episodes/season.+?/images/.+?season.+?     href="(.+?)"    >(.+?)</a>').findall(link)
for url1, name in match:
    url= str(url)+url1
    url2 = url
    print url2

It doesn't matter how much I change it, it keeps looping.
Ah, I get you: I should change the str(url) line so the result goes into url2 instead of back into url.

i.e.

url2 = str(url) + url1

Edit: just done it and it works, thank you.