Trying to login to a site/tearing my hair out!
#1
Hi folks,

I'm pretty new to Python and XBMC development but do have some programming knowledge.

I'm trying to build an addon which will scrape http://www.falkirkfc.tv using a legitimate, working username and password.

The login form on the site contains two hidden fields which act as tokens. One is randomly generated on page load and acts as a field name; its value is 1. The other is static and is the value of a field whose name is 'return'.

After passing the tokens at login, as well as other data, it still wouldn't log me in to the home page. I then realised that cookies were being passed in the POST request.

Once I'd passed the cookies, login was successful.

Here's where I'm stuck.

Whenever I try to scrape a restricted page on the site after logging in, it logs me back out and outputs the original welcome page.

I've been using urllib and urllib2, and my code is as follows:

Code:
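import cookielib
import re
import urllib
import urllib2

# settings, LOGINURL, ROOTURL, USERAGENT and addDir are defined elsewhere in the addon

def LOGIN():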

        USERNAME = settings.getSetting(id="username")
        PASSWORD = settings.getSetting(id="password")

        #GET SITE COOKIE AND TOKENS
        CJ = cookielib.CookieJar()
        COOKIEHANDLER = urllib2.HTTPCookieProcessor(CJ)
        OPENER = urllib2.build_opener(COOKIEHANDLER)
        
        REQ = urllib2.Request(LOGINURL)
        REQ.add_header('User-agent', USERAGENT)
        RESPONSE = OPENER.open(REQ)
        LINK=RESPONSE.read()
        RESPONSE.close()
        TOKEN1=re.compile('<input type="hidden" name="return" value="(.+?)" />').findall(LINK)
        TOKEN2=re.compile('<input type="hidden" name="(.+?)" value="1" />').findall(LINK)

        #ADD THE OTHER COOKIES
        c1 = cookielib.Cookie(version=0, name='sb_username28', value=USERNAME, port=None, port_specified=False, domain='www.falkirkfc.tv', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None})
        c2 = cookielib.Cookie(version=0, name='sb_url28', value="http%3A%2F%2F", port=None, port_specified=False, domain='www.falkirkfc.tv', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None})


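        # Put the hand-built cookies in the jar so OPENER sends them with the login POST
        CJ.set_cookie(c1)
        CJ.set_cookie(c2)

        OPENER.addheaders = [('User-agent', USERAGENT)]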
        LOGINDATA = urllib.urlencode({'username' : USERNAME, 'password' : PASSWORD, 'return' : TOKEN1[0], TOKEN2[0] : '1', 'Submit' : 'Log in', 'option' : 'com_users', 'task' : 'user.login'})
        RESPONSE = OPENER.open(LOGINURL, LOGINDATA)

        print RESPONSE.read() #OUTPUT SHOWS ME SUCCESSFULLY LOGGED IN

def INDEX(url):

        REQ = urllib2.Request(url) #THE URL I'M PASSING IS TO A PROTECTED PAGE
        REQ.add_header('User-agent', USERAGENT)
        RESPONSE = urllib2.urlopen(REQ)
        LINK=RESPONSE.read()
        RESPONSE.close()
        print LINK #THIS OUTPUT TAKES ME BACK TO THE HOMEPAGE, LOGGED OUT
        MATCH=re.compile('<div class="show-title-container"><a href="(.+?)" class="show-title-gray info_hover"> (.+?)</a></div>').findall(LINK)
        for href,title in MATCH:
               href = ROOTURL + href
               addDir(title,href,2,'')

If anyone could help me better understand where I am going wrong, I'd much appreciate it.

Thanks in advance.
#2
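You probably need your cookies to be sent along with every request. INDEX calls urllib2.urlopen, which uses a default opener that knows nothing about the CookieJar you built in LOGIN, so the session cookie never reaches the protected page. Try reusing your CookieJar (and the opener built around it) in INDEX.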
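
Something along these lines might do it. This is only a rough sketch, not your exact code: it keeps one CookieJar and one opener at module level so both LOGIN and INDEX can share them. The fetch helper and the USERAGENT value are names I made up for illustration, and the try/except import is only there so the same snippet also runs on Python 3.

```python
# The thread uses Python 2 (urllib2/cookielib); the same classes live in
# urllib.request/http.cookiejar on Python 3.
try:
    import urllib2 as urlrequest
    import cookielib as cookiejar
except ImportError:
    import urllib.request as urlrequest
    import http.cookiejar as cookiejar

USERAGENT = 'Mozilla/5.0'  # placeholder value (assumption), use your own

# One jar and one opener, created once and shared by every request.
CJ = cookiejar.CookieJar()
OPENER = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(CJ))
OPENER.addheaders = [('User-agent', USERAGENT)]


def fetch(url, data=None):
    """Hypothetical helper: open url through the shared opener.

    Cookies stored in CJ by earlier responses are sent automatically.
    data, if given, is the urlencoded POST body (bytes on Python 3).
    """
    response = OPENER.open(url, data)
    try:
        return response.read()
    finally:
        response.close()
```

LOGIN would then post through the shared OPENER instead of building its own, and INDEX would call fetch(url) rather than urllib2.urlopen(REQ), so the cookies set at login go out with the request for the protected page.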
#3
Got it working now.

Cheers for that!