Appeal for help with a Python routine/module for 2shared, reCaptcha, and Captcha?
#1
Question 
I am attempting to expand the scope of my Icefilms plugin, which is incredibly shoddy code, but works rather well.

1) adding support for streaming from 2shared
2shared's actual direct download link structure is like this:
http://dc[server location].2shared.com/download1/?i=[some random letters/numbers]

the letters and numbers are plainly available and scrapeable from the source code of the 2shared file webpage.

the server location is not, and seems to be encoded in some complicated way.

there is also a bit of javascript in the page source that, when executed on the page, loads the direct link and downloads the file.

if one tries to guess the server location (by trying all locations from 1-250), 2shared says you've reached your download limit.




reCaptcha is a feature of far greater urgency than 2shared
2) passing the reCaptcha image through to the user, so they can type the text into a box to complete it.
reCaptcha is not worth trying to crack! so i figured passing the image through and having the user complete it was the best thing to do. (reCaptcha is on icefilms.info, not 2shared)
I thought about maybe displaying the reCaptcha image as the folder icon... but i think that would stretch it and make it harder to read.

anyway, these 2 things are slightly beyond my [lack of] python skills at the moment, and i would appreciate any advice/patches/code.
#2
I quickly looked at 2shared and I think you would get the server address by doing the following:

First, you look at the 2shared page's source. In the Javascript, there is a variable named 'key' found beneath the <!-- 0:-1 --> comment:

Code:
<!--  0:-1 -->
    function random_function_name(){
        window.location = $('#random_characters').text();
    }
    var key='RANDOM_KEY';
  </script>

A request is then made to the URL 'http://www.2shared.com/pageDownload1/retrieveLink.jsp?id=RANDOM_KEY', which returns the full download link in the format you outlined above. However, retrieveLink only returns the link if the correct headers are sent. In particular, you have to set the Referer header to the 2shared HTML page linked to from Icefilms, and you have to include the cookie set by 2shared when you visit that page.
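
A minimal sketch of those two steps, in case it helps. The function names and the regular expression are my own, and I am assuming the key sits in single quotes exactly as in the snippet above. The request is only built here, not sent, so the cookie string is whatever 2shared handed back when the file page was loaded.

```python
import re

try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2

RETRIEVE_URL = 'http://www.2shared.com/pageDownload1/retrieveLink.jsp?id='


def extract_key(page_html):
    """Return the value of the 'key' Javascript variable, or None if absent."""
    match = re.search(r"var\s+key\s*=\s*'([^']+)'", page_html)
    return match.group(1) if match else None


def build_retrieve_request(key, file_page_url, cookie_header):
    """Build (but do not send) the retrieveLink request with both headers set."""
    request = Request(RETRIEVE_URL + key)
    # Referer must be the 2shared file page that Icefilms linked to.
    request.add_header('Referer', file_page_url)
    # Cookie is the value 2shared set when that page was fetched.
    request.add_header('Cookie', cookie_header)
    return request
```

Passing the returned request to urlopen() and reading the body should then give the download link, assuming the site still behaves as described. If the key ever contains characters that are not URL-safe it would need quoting first; I have not checked whether that happens.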

Hope this helps. No idea about the Captcha, though it should be doable.
#3
Regarding ReCaptcha...

First, grab the URL of the iframe with id="videoframe" and then parse the source of that page for a URL in the following format:

http://www.google.com/recaptcha/api/nosc...=THE_TOKEN
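
Both scraping steps could be sketched roughly like this. The helper names are hypothetical, and the pattern for the second step is an assumption based on the 'recaptcha/api/noscript?k=' prefix; the token is assumed to contain only letters, digits, '_' and '-'.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class VideoFrameFinder(HTMLParser):
    """Record the src of the first <iframe id="videoframe"> seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'iframe' and attrs.get('id') == 'videoframe' and self.src is None:
            self.src = attrs.get('src')


def find_videoframe_url(page_html):
    """Return the videoframe iframe's src attribute, or None if not found."""
    finder = VideoFrameFinder()
    finder.feed(page_html)
    finder.close()
    return finder.src


def find_noscript_url(frame_html):
    """Return the first reCaptcha noscript URL in the iframe's source, or None."""
    match = re.search(
        r'https?://www\.google\.com/recaptcha/api/noscript\?k=[\w-]+', frame_html)
    return match.group(0) if match else None
```

The src may well be a relative URL, in which case it has to be joined with the address of the page it came from before it can be fetched.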


Now you can retrieve the ReCaptcha image.

Code:
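import urllib
# recaptchaUrl is the noscript URL scraped from the videoframe page
token = recaptchaUrl.split('recaptcha/api/noscript?k=')[1]
url = 'http://www.google.com/recaptcha/api/challenge?k=' + token
# read the body directly; urlretrieve() would only save it to a temp file
challengeJs = urllib.urlopen(url).read()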

The response is some Javascript that includes a very long string called the 'challenge token'. You now have everything you need to get the image, which is at the following URL:

http://www.google.com/recaptcha/api/imag...ENGE_TOKEN
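
Pulling the challenge token out of that Javascript might look something like this. I am assuming the response holds the token in a field written as challenge : '...' with single quotes, so treat the pattern as a guess to verify against a real response. The image URL prefix is passed in as a parameter rather than hard-coded, and both function names are made up for this sketch.

```python
import re


def extract_challenge(challenge_js):
    """Return the value of the 'challenge' field in the Javascript, or None."""
    match = re.search(r"challenge\s*:\s*'([^']+)'", challenge_js)
    return match.group(1) if match else None


def build_image_url(image_url_prefix, challenge_token):
    """Append the challenge token to the image URL prefix given by the caller."""
    return image_url_prefix + challenge_token
```

If extract_challenge() returns None, the response format is probably different from what I assumed, and it is worth printing the raw Javascript to see what it actually looks like.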

The next step would be displaying this in XBMC. I'm not sure how to do this, but it is possible. I'm sure someone on the forums or in the IRC channel could help you out with this.

Once you've shown the user the image and gotten their input, you then post your response:

Code:
videoPageUrl = "http://www.icefilms.info/membersonly/components/com_iceplayer/video.php?h=377&w=626&vid=0000&img=&ttl=xxxxxxxxx"
params = urllib.urlencode({'recaptcha_challenge_field': challengeToken, 'recaptcha_response_field': userInput})
resp = urllib.urlopen(videoPageUrl, params)

Be cautious of headers... cookies/referrer may need to be sent with the POST request.
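
One way to handle that, as a rough sketch: route every request through a single opener with a cookie jar, and set the Referer header on the POST explicitly. Which referrer the site actually expects is an assumption on my part, and build_captcha_post is a hypothetical helper name.

```python
try:
    from urllib.request import Request, build_opener, HTTPCookieProcessor  # Python 3
    from urllib.parse import urlencode
    from http.cookiejar import CookieJar
except ImportError:
    from urllib2 import Request, build_opener, HTTPCookieProcessor  # Python 2
    from urllib import urlencode
    from cookielib import CookieJar

# One opener shared by every request, so cookies set by earlier page loads
# are sent back automatically with the POST.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))


def build_captcha_post(video_page_url, challenge_token, user_input, referer):
    """Build (but do not send) the POST carrying the user's captcha answer."""
    params = urlencode({
        'recaptcha_challenge_field': challenge_token,
        'recaptcha_response_field': user_input,
    }).encode('ascii')
    # Supplying data makes this a POST request.
    request = Request(video_page_url, data=params)
    request.add_header('Referer', referer)
    return request
```

Sending it would then be opener.open(request).read(), provided the same opener was also used to load the earlier pages so the jar already holds their cookies.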
#4
reCaptcha implemented! thanks maruchan
my displaying code was a little shoddy (captcha displayed as folder icon) but it works perfectly
give it a try :)

i'll look at adding 2shared soon
#5
for 2shared, after browsing Stack Overflow for a while, the recommendations about scraping javascript led me to think about using webkit to load the page, since i am sure that it uses javascript to generate the link.

this might be a bit difficult/slow, but i think it will be more robust against future site changes.

EDIT: 2shared site changed, might not need webkit...
