Kodi Community Forum

In my addon I want to scrape some JSON and turn it into a dict so I can grab the data I'm interested in. What's the easiest way to do this?

The string is all one line in the page and looks like this:

Code:
<script>var data = { "foo": "bar" };</script>

I want to get it into a Python dict somehow. I'm assuming I would scrape just the JSON object with a regex and then convert it from JSON to a dict.

Let's assume the URL is http://example.com. I'm also not sure how to make an HTTP request from my addon.

Kodi's Python includes the whole standard library, which is very versatile. For example, a web client can be implemented with the urllib2 module.

Code:
import urllib2
import socket

def load_page(url):
    """
    Minimalistic web-client
    """
    # Browser-like headers sent with the request
    request = urllib2.Request(url, None,
                              {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
                               'Accept-Charset': 'UTF-8',
                               'Accept': 'text/html'})
    try:
        session = urllib2.urlopen(request, None)
    except (urllib2.URLError, socket.timeout):
        # Any network or HTTP error (HTTPError is a subclass of URLError)
        page = '404'
    else:
        page = session.read()
        session.close()
    return page
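The client above is Python 2, which is what Kodi shipped when this thread was written. Kodi 19 (Matrix) and later run Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`. A rough Python 3 sketch of the same minimal client follows; it keeps the `'404'` sentinel on errors (assuming you still want that behaviour) and adds an explicit `timeout` argument so that catching `socket.timeout` is meaningful. The `HEADERS` name and the 10-second default are illustrative choices, not from the posts.

```python
import socket
import urllib.error
import urllib.request

# Same browser-like headers as the Python 2 version
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) '
                  'Gecko/20100101 Firefox/34.0',
    'Accept-Charset': 'UTF-8',
    'Accept': 'text/html',
}


def load_page(url, data=None, timeout=10):
    """Minimalistic web client (Python 3 sketch).

    Returns the response body as bytes, or the string '404' if the
    request fails (URLError also covers HTTPError).
    """
    request = urllib.request.Request(url, data, HEADERS)
    try:
        session = urllib.request.urlopen(request, timeout=timeout)
    except (urllib.error.URLError, socket.timeout):
        return '404'
    # The response object closes itself when the with-block exits
    with session:
        return session.read()
```

Note that the body comes back as `bytes`; call `.decode('utf-8')` (or whatever charset the page uses) before running a regex over it as text.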

To add to the above:

Code:
import json
import re

html = load_page('http://example.com')
# Capture everything between "var data = " and the first ";"
match = re.search('<script>var data = (.+?);', html, re.DOTALL)
if match:
    js = json.loads(match.group(1))
    # js['foo'] should equal 'bar'
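The non-greedy `(.+?);` pattern stops at the first semicolon, so if any JSON string value contains `;` the captured text is cut short and `json.loads` fails. A sketch of a more tolerant approach, assuming the page assigns the object as `var data = ...` like in the question: use the regex only to find where the value starts, then let `json.JSONDecoder().raw_decode()` parse exactly one JSON value from that index and ignore whatever follows it. `extract_js_object` is a hypothetical helper name.

```python
import json
import re


def extract_js_object(html, var_name='data'):
    """Return the JSON value assigned to `var <var_name> = ...`, or None.

    The regex only locates the start of the value; raw_decode() parses one
    complete JSON value from there and stops, so a ';' inside a string
    value or the trailing ';</script>' does not matter.
    """
    match = re.search(r'var\s+%s\s*=\s*' % re.escape(var_name), html)
    if match is None:
        return None
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


page = '<script>var data = { "foo": "bar;baz" };</script>'
print(extract_js_object(page))  # Python 3 prints: {'foo': 'bar;baz'}
```

It still requires strict JSON: a JavaScript literal with single quotes or unquoted keys raises `ValueError`.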

One other extremely irritating thing about web servers is that they can return the data gzip-encoded (compressed) even though your request never said it accepts gzip. In that case you need to modify the Minimalistic web-client from this:

Code:
    else:
        page = session.read()
        session.close()
    return page

to this:

Code:
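# at the top of the module
import gzip
from StringIO import StringIO

    # inside load_page()
    else:
        if session.info().getheader('Content-Encoding') == 'gzip':
            # Wrap the compressed bytes in a file-like object for GzipFile
            buf = StringIO(session.read())
            f = gzip.GzipFile(fileobj=buf)
            page = f.read()
        else:
            page = session.read()
        session.close()  # stays inside else: session is undefined if urlopen failed
    return page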
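Checking `Content-Encoding` assumes the server labels the response honestly. If you suspect a server sends compressed bodies without that header, one option (a sketch, not something proposed in the posts) is to sniff the gzip magic number, the two bytes `1f 8b` that every gzip stream starts with, and decompress only when it is present. `maybe_gunzip` is a hypothetical name.

```python
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream (RFC 1952)


def maybe_gunzip(body):
    """Return body decompressed if it starts like gzip, else unchanged."""
    if body[:2] == GZIP_MAGIC:
        # MAX_WBITS + 16 tells zlib to expect a gzip header and trailer
        return zlib.decompress(body, zlib.MAX_WBITS + 16)
    return body
```

In the `else:` branch of the client you would write `page = maybe_gunzip(session.read())`. HTML or JSON text does not normally begin with those two bytes, so plain pages pass through untouched.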

(2015-05-09, 00:14) learningit wrote: One other extremely irritating thing with web servers is that they can return the data encoded gzip (compressed) even though you specifically tell the web server not to. In that case you need to modify the Minimalistic web-client from this:

I wanted to keep it minimal, but you're right. However, IMO your code is a bit excessive. If the page contents may be returned gzipped, the following will do the trick:

Code:
import urllib2
import socket
import zlib

def load_page(url, data=None):
    """
    Minimalistic web-client
    """
    request = urllib2.Request(url, None,
                              {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
                               'Accept-Charset': 'UTF-8',
                               'Accept': 'text/html'})
    try:
        # Passing data (a urlencoded string) turns the request into a POST
        session = urllib2.urlopen(request, data)
    except (urllib2.URLError, socket.timeout):
        page = '404'
    else:
        page = session.read()
        if session.info().getheader('Content-Encoding') == 'gzip':
            # MAX_WBITS + 16 tells zlib to expect a gzip header and trailer
            page = zlib.decompress(page, zlib.MAX_WBITS + 16)
        session.close()
    return page
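The `zlib.MAX_WBITS` argument can do a little more: adding 32 instead of 16 makes zlib auto-detect either a gzip or a zlib header. A sketch that uses this to also cover `Content-Encoding: deflate`, with a fallback for servers that send raw deflate data without a zlib header (that fallback is an assumption about misbehaving servers, not something from the thread). `decode_body` is a hypothetical helper.

```python
import zlib


def decode_body(body, content_encoding):
    """Undo gzip/deflate Content-Encoding; return other bodies unchanged."""
    encoding = (content_encoding or '').strip().lower()
    if encoding in ('gzip', 'x-gzip', 'deflate'):
        try:
            # MAX_WBITS | 32: accept either a zlib or a gzip header
            return zlib.decompress(body, zlib.MAX_WBITS | 32)
        except zlib.error:
            # Negative wbits: raw deflate stream with no header at all
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body
```

In the Python 2 client this would be `page = decode_body(session.read(), session.info().getheader('Content-Encoding'))`; on Python 3 the header is available as `session.headers.get('Content-Encoding')`.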

(2015-05-09, 11:18) Roman_V_M wrote:
(2015-05-09, 00:14) learningit wrote: One other extremely irritating thing with web servers is that they can return the data encoded gzip (compressed) even though you specifically tell the web server not to. In that case you need to modify the Minimalistic web-client from this:

I've wanted to stick to the minimal, but you are right. However, IMO your code is a bit excessive. The following will do the trick, if page contents may be returned gzipped:

Code:
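import requests  # third-party; in Kodi it comes from the script.module.requests add-on

url = 'http://example.com'
page = requests.get(url).text  # requests decompresses gzip responses automatically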

It "just works". Life's too short for urllib2.

Yep, they all work. The bigger point I was trying to make is that getting gzipped text back from a web server when you haven't listed gzip in the Accept-Encoding header can be really frustrating for someone who hasn't dealt with it before.

utuxia

Roman_V_M

learningit

Roman_V_M

takoi

learningit