stacked · 2009-11-18, 19:42

jpickle wrote: This works, but only if you go to smotri.com, get a dynamic link for a live broadcast, and pass it to this code. A scraper would have to scrape the smotri.com/broadcast/live page and then pass all the links to the above code, and bam, I believe you have a valid scraper for smotri.com's live feeds.
What do you think?
jpickle

Yeah, that should be fairly simple for your first plugin. Let me know how you get on. ;)

jpickle · (This post was last modified: 2009-11-19, 02:27 by jpickle.)

I would let you know, but I can't, as I don't know how to parse out the video link, the title, and the thumb. I can get the page, but I don't know how to extract only the info needed to pass to the processor code that turner wrote. I am studying, but every example I look up on Google uses all these third-party libraries and whatnot, and after a couple of hours of research I just go numb.
I am trying, but I feel like I am banging my head against a locked door, you know what I mean? Is there any way you or someone else could (please) just do it? Once I see how it is done, I can fully understand it and start making a real (hopefully meaningful) contribution to the XBMC project. I know that is a lot to ask, and all of you guys are really busy, but I'm afraid that is, unfortunately, the only way I will be able to do it. Anyway, thanks for all of your help, and sorry that I don't understand well enough to take some of the burden off others.
Thanks
jpickle

stacked · (This post was last modified: 2009-11-19, 03:58 by stacked.)

Start with a basic Python script and go on from there. Just look at the page source, find the section with the important info, and use a 'lazy' regex (a non-greedy group such as (.+?), which stops at the first possible match instead of the last).

Code:
import re
import urllib2

# Browser-like User-Agent sent with every request.
HEADER = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1'

def open_url(url):
    # Fetch the page and return its raw HTML as a byte string.
    req = urllib2.Request(url)
    req.add_header('User-Agent', HEADER)
    content = urllib2.urlopen(req)
    data = content.read()
    content.close()
    return data

url = 'http://smotri.com/broadcast/list/'
data = open_url(url)

# Each lazy group captures one field of an entry: id, thumbnail, alt text, title.
info = re.compile(r'<a href="/broadcast/view/\?id=(.+?)"\n><img src="(.*?)" width="100" height="75" class="Iframe" alt="(.*?)" title="(.*?)" /></a>        </div>', re.DOTALL).findall(data)

for stream_id, thumb, alt, title in info:
    url = 'http://smotri.com/broadcast/view/?id=' + stream_id
    label = title.decode('utf-8')
    print label
    print url
    print thumb
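
In case the 'lazy' part is not obvious, here is a small offline sketch of the difference. The HTML snippet is made up for the demo (the real smotri.com markup has more attributes), and find_ids is just a hypothetical helper, not part of the script above. It needs no network access.

```python
import re

# Hypothetical two-entry snippet; the real page markup may differ.
SAMPLE = (
    '<a href="/broadcast/view/?id=111"><img src="http://example.com/a.jpg" title="First" /></a>'
    '<a href="/broadcast/view/?id=222"><img src="http://example.com/b.jpg" title="Second" /></a>'
)

def find_ids(html, lazy=True):
    """Return whatever is captured between '?id=' and a closing quote."""
    # (.+?) stops at the first closing quote; (.+) runs on to the last one.
    group = '(.+?)' if lazy else '(.+)'
    pattern = re.compile(r'\?id=' + group + '"')
    return pattern.findall(html)

lazy_ids = find_ids(SAMPLE, lazy=True)     # one short capture per entry
greedy_ids = find_ids(SAMPLE, lazy=False)  # a single capture that swallows both entries
```

With the lazy group you get one id per link. With the greedy group the first match runs all the way to the last quote in the string, so everything ends up in one useless capture.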

jpickle · 2009-11-19, 03:56

Wow, thanks for that. You're good. I didn't realize it was that simple. I really appreciate your help, and I won't forget it. I will study this and do my best to help XBMC.
Thank you
jpickle

stacked · 2009-11-19, 05:47

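The code above isn't picking up all the streams on that page. Use this instead. It first narrows the page down to the VideoList table and then matches every link inside it, not just the /broadcast/view/?id= ones.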

Code:
import re
import urllib2

# Browser-like User-Agent sent with every request.
HEADER = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1'

def open_url(url):
    # Fetch the page and return its raw HTML as a byte string.
    req = urllib2.Request(url)
    req.add_header('User-Agent', HEADER)
    content = urllib2.urlopen(req)
    data = content.read()
    content.close()
    return data

url = 'http://smotri.com/broadcast/list/'
data = open_url(url)

# Step 1: cut the page down to the stream list so nothing outside it can match.
limit = re.compile(r'<table class="VideoList">(.+?)<div id="ajax_slider_">', re.DOTALL).findall(data)

# Step 2: capture link, thumbnail, alt text and title of each entry.
# limit[0] raises IndexError if the table was not found on the page.
info = re.compile(r'    <a href="(.+?)"\n><img src="(.*?)" width="100" height="75" class="Iframe" alt="(.*?)" title="(.*?)" /></a>        </div>', re.DOTALL).findall(limit[0])

for path, thumb, alt, title in info:
    url = 'http://smotri.com' + path
    label = title.decode('utf-8')
    print label
    print url
    print thumb
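
If you want to play with the narrow-first-then-extract idea without hitting the site, something like the sketch below should do. Everything in it is an assumption for the demo: the sample page, the simplified entry pattern, the /promo/1 and /live/222/ paths, and the extract_streams helper. Unlike the script above, it returns an empty list instead of raising when the table is missing.

```python
import re

# Hypothetical page: one promo link outside the list, two streams inside it.
SAMPLE_PAGE = (
    '<div class="promo"><a href="/promo/1"><img src="/p.jpg" title="Promo" /></a></div>'
    '<table class="VideoList">'
    '<a href="/broadcast/view/?id=111"><img src="/a.jpg" title="First" /></a>'
    '<a href="/live/222/"><img src="/b.jpg" title="Second" /></a>'
    '</table><div id="ajax_slider_"></div>'
)

# Step 1 pattern: everything between the list table and the slider div.
BLOCK_RE = re.compile(r'<table class="VideoList">(.+?)<div id="ajax_slider_">', re.DOTALL)
# Step 2 pattern: simplified entry markup, assumed for this demo only.
ENTRY_RE = re.compile(r'<a href="(.+?)"><img src="(.*?)" title="(.*?)" /></a>', re.DOTALL)

def extract_streams(html, base='http://smotri.com'):
    """Return (url, thumb, title) tuples found inside the VideoList block."""
    block = BLOCK_RE.search(html)
    if block is None:
        # Layout changed or wrong page: report nothing rather than crash.
        return []
    entries = ENTRY_RE.findall(block.group(1))
    return [(base + href, thumb, title) for href, thumb, title in entries]

streams = extract_streams(SAMPLE_PAGE)
for stream_url, thumb, title in streams:
    print(title + ' ' + stream_url + ' ' + thumb)
```

Because the entry pattern only ever sees the text inside the table, the promo link is skipped even though it has the same shape, and links that are not /broadcast/view/?id= ones are still picked up.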