Return Strict Amount
#1
How can I return only 12 items instead of the whole lot?

Code:
url = 'http://deturl.com/www.youtube.com/results?search_query=adele+karaoke'        
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match=re.compile('src="http://i1.ytimg.com/vi/(.+?)/1.jpg').findall(link)
for url in match:
  url1='http://www.flipbooth.com/yt/%s/' % url
  req = urllib2.Request(url1)
  req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
  response = urllib2.urlopen(req)
  link=response.read()
  response.close()
  match1 = re.compile('property="og:title" content="(.+?)"/>\r\n<meta').findall(link)
Reply
#2
Do you need to limit the search or the for statement?

For instance, is it OK for match1 to obtain all matches if the for loop only deals with the first 12?
If it is a limit on the for statement, you could just add a counter, e.g.:

Code:
nums = [1,2,3,4,5,6,7,8,9,10,11,12]  # mock match1 list (findall returns a list, not a set)
count = 0  # "sum" would shadow the built-in function
for i in nums:  # same as your for name in match
    count = count + 1
    if count > 4:  # count is incremented before this check, so 4 items are printed
        break  # this breaks out of the for statement
    else:
        print str(i)  # whatever code you need goes here
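
If only the loop needs limiting, a list slice can do the same job without a counter. This is a minimal sketch, assuming `match` is the list returned by `findall`; the IDs and the `LIMIT` name are made-up placeholders, and the code avoids Python 2-only syntax so it should run on either version.

```python
# Placeholder IDs standing in for the list that findall() returns.
match = ['id%02d' % n for n in range(30)]

LIMIT = 12  # hypothetical name: how many items to process

# Slicing does not raise if the list is shorter than LIMIT;
# it simply returns whatever is there.
first_items = match[:LIMIT]

for video_id in first_items:
    print(video_id)  # whatever code you need goes here
```

Because the slice is taken before the loop starts, there is no off-by-one to worry about in the break condition.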
Reply
#3
I changed the code in the first post to show all of it.

Basically, when people do a search, match spits out about 25 to 30 results, but the whole thing takes forever.

So I only want it to return about 12.
Reply
#4
Looks to me like this is the slow one:
Code:
match1 = re.compile('property="og:title" content="(.+?)"/>\r\n<meta').findall(link)
which is being called however many times
Code:
match=re.compile('src="http://i1.ytimg.com/vi/(.+?)/1.jpg').findall(link)
matches. So doing what I was saying should work:

Code:
url = 'http://deturl.com/www.youtube.com/results?search_query=adele+karaoke'        
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match=re.compile('src="http://i1.ytimg.com/vi/(.+?)/1.jpg').findall(link)
i = 0
for url in match:
  i +=1
  if i > 11:  # i is incremented before this check, so 11 items get through; "i > 12" would give 12
    break
  else:
    url1='http://www.flipbooth.com/yt/%s/' % url
    req = urllib2.Request(url1)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    response = urllib2.urlopen(req)
    link=response.read()
    response.close()
    match1 = re.compile('property="og:title" content="(.+?)"/>\r\n<meta').findall(link)

Suppose it is "sloppy" but I do not see a way to limit re matches via the re command.
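
For what it is worth, `findall` itself takes no limit argument, but the matches can still be capped by combining `finditer` with `itertools.islice`. The sketch below is only an illustration: the `html` string is made-up sample markup, not the real results page, and the dots in the pattern are escaped so they match literal dots.

```python
import re
from itertools import islice

# Made-up HTML standing in for the search results page.
html = ''.join('<img src="http://i1.ytimg.com/vi/vid%02d/1.jpg">' % n
               for n in range(30))

pattern = re.compile(r'src="http://i1\.ytimg\.com/vi/(.+?)/1\.jpg')

# finditer() yields matches lazily, so islice() stops the scan
# once 12 matches have been produced.
first_12 = [m.group(1) for m in islice(pattern.finditer(html), 12)]
print(first_12)
```

Since the regex scan is cheap compared with the page requests, this mostly saves building the full list; slicing the `findall` result (`match[:12]`) would give the same items.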
Reply
#5
lol, not sloppy.

But match still returns 24 and match1 only returns 1.

match is actually pretty quick, but match1 uses the URL from match and scrapes a different website 24 times to get the name.
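
One possible reason match1 seems to hold only one result is that it is reassigned on every pass through the loop, so after the loop it contains just the last page's title. The sketch below collects the titles in a list instead. `fetch_page` and `titles` are hypothetical names, and `fetch_page` returns canned HTML in place of the real request so the example runs without network access.

```python
import re

def fetch_page(video_id):
    # Hypothetical stand-in for the urllib2 request to flipbooth.com;
    # returns canned HTML instead of hitting the network.
    return ('<meta property="og:title" content="Title for %s"/>\r\n<meta'
            % video_id)

title_re = re.compile('property="og:title" content="(.+?)"/>\r\n<meta')

match = ['aaa', 'bbb', 'ccc']  # placeholder IDs
titles = []
for video_id in match:
    match1 = title_re.findall(fetch_page(video_id))  # rebound on every pass
    titles.extend(match1)  # keep each page's title(s)

print(match1)  # only the last page's result survives here
print(titles)  # every title collected so far
```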
Reply
#6
Hmm, gotta be a typo. Yeah, it is OK for match to have 24, and like you are saying, match1 is the slowpoke because of all the websites. That's why we want it to stop after 12...

Really has to be a typo somewhere... either with me or you?

Wanna repaste the code? Oh, and "if i > 11:" really should be "if i >= 11:".
Reply
#7
OK I ran this:
Code:
# This test program is for finding the correct Regular expressions on a page to insert into the plugin template.
# After you have entered the url between the url='here' - use ctrl-v
# Copy the info from the source html and put it between the match=re.compile('here')
# press F5 to run if match is blank close and try again.

import urllib2,urllib,re

url = 'http://deturl.com/www.youtube.com/results?search_query=adele+karaoke'

req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match=re.compile('src="http://i1.ytimg.com/vi/(.+?)/1.jpg').findall(link)
print match
i = 0
for url in match:
    i += 1
    if i >= 11:  # i is incremented before this check, so the loop stops after 10 pages
        break
    else:
        url1 = 'http://www.flipbooth.com/yt/%s/' % url
        req = urllib2.Request(url1)
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
        response = urllib2.urlopen(req)
        link = response.read()
        response.close()
        match1 = re.compile('property="og:title" content="(.+?)"/>\r\n<meta').findall(link)
        print match1

and I get the result:
Code:
['TGVhQ61C6IU', 'sTE9s43rdT8', 'v5pJMwgKQcw', 'XUQVk4oG8LM', 'x-nTWktLBL8', 'Cgdes6lFjzM', 'vWaXO1wnOKU', 'L7nmEjjrKGc', 'fgfJLw9Zug4', '1l0drm-lM5M', 'uqhVqsPI5v8', 'KMOmkQ_LhYA', 'AX-kWqAwH1I', 'HHL0q-z7CyM', '59CPfYsIKc4', '2QC8DfcSe_0', 'SFRJpyNQ1K0', '3lrBve6xLw8', 'MgEyI3EFPP4', '_4fNgNCxQWk', '7D26e6l_5iI', 'JFwd16raBKU', 'LrW7Umr_5wM', 'SRn4Fh47kok', 'li9W-yEjK2g', 'uQCaVs5FGpc', '3bxsKcbKVa0', 'buG0HCAFy3s', '2AHVUH_bGBY', 'LtMb_fGHUY0', '4W5TO-woLmg', '0jF6XyW3QY4']
['Turning Tables ~ Adele Karaoke/Intrumental']
['Make You Feel My Love - Adele [Karaoke/Instrumental]']
['Someone Like You - ADELE (Karaoke)']
['Adele - Set Fire To The Rain (Karaoke)']
['One And Only -- Adele (karaoke - full version)']
['Adele - Someone Like You Karaoke']
[]
[]
['Set Fire To The Rain - Adele - Karaoke']
["I'll Be Waiting - Adele Karaoke/Instrumental + Lyrics"]

So it is definitely limiting. The two blanks were removed, by the way; you might want to check for those before adding 1 to i.
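
Checking for blanks before counting could look something like the sketch below. It is an illustration under assumptions, not a drop-in fix: `fetch_page`, `titles`, and `LIMIT` are hypothetical names, and `fetch_page` returns canned HTML (IDs starting with "x" simulate pages that yield no title) so the example runs without network access.

```python
import re

def fetch_page(video_id):
    # Hypothetical stand-in for the flipbooth.com request.
    # IDs starting with "x" simulate pages with no og:title tag.
    if video_id.startswith('x'):
        return '<html></html>'
    return ('<meta property="og:title" content="Title %s"/>\r\n<meta'
            % video_id)

title_re = re.compile('property="og:title" content="(.+?)"/>\r\n<meta')

match = ['a1', 'x1', 'a2', 'x2', 'a3', 'a4']  # placeholder IDs
LIMIT = 3  # hypothetical name: how many non-blank titles to collect
titles = []
for video_id in match:
    if len(titles) >= LIMIT:
        break  # enough non-blank titles collected
    match1 = title_re.findall(fetch_page(video_id))
    if not match1:
        continue  # blank result: skip it and do not count it
    titles.append(match1[0])

print(titles)
```

The trade-off is that skipped blanks still cost a request each, so the loop may fetch more than `LIMIT` pages before it stops.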

Reply
#8
That's brilliant, thank you.

You're a star.
Reply
