Kodi Community Forum

Video Add-on scraper
Hello,

I have a very basic video add-on which works fine; however, I would like to develop it a bit further.

I searched for related topics on writing a scraper/crawler that feeds the video sources automatically, but I have not found a proper source or tutorial that covers this in more depth.

Here is my simple code:

Code:
# -*- coding: utf-8 -*-
# Module: default
# According to the "Roman V. M." project by: AAM
# Created on: 11.07.2017
# License: GPL v.3 https://www.gnu.org/copyleft/gpl.html

import sys
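try:
   # Python 3 (Kodi 19 and later)
   from urllib.parse import urlencode, parse_qsl
except ImportError:
   # Python 2 (Kodi 18 and earlier)
   from urllib import urlencode
   from urlparse import parse_qsl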
import xbmcgui
import xbmcplugin

# Get the plugin url in plugin:// notation.
_url = sys.argv[0]
# Get the plugin handle as an integer number.
_handle = int(sys.argv[1])

# Free TV videos are provided by www.telewebion.com
# Here we use a fixed set of properties simply for demonstrating purposes
# In a "real life" plugin you will need to get info and links to video files/streams
# from some web-site or online service.
VIDEOS = {'TELEWEBION':   [{'name': '1- TV1',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv1.png',
                           'video': 'http://sl16.telewebion.com:1935/devices/_definst_/tv1-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDoxODo0NiBQTSZoYXNoX3ZhbHVlPWVVcWtRdXA4Mm1UV045S1A5NGo1Qnc9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==',
                           'genre': 'TV'},
                           {'name': '2- TV2',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv2.png',
                           'video': 'http://sa19.telewebion.com:1935/devices/_definst_/tv2-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDoxOTozMiBQTSZoYXNoX3ZhbHVlPXl1YWZ4RHZueEZFNHRIT0czVXpTeUE9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==',
                           'genre': 'TV'},
                           {'name': '3- TV3',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv3.png',
                           'video': 'http://sl14.telewebion.com:1935/devices/_definst_/bck2tv3-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE3LzIwMTcgMTI6NDo5IFBNJmhhc2hfdmFsdWU9b3FqYis0UVZscDE5ampFWFFTSFU3UT09JnZhbGlkbWludXRlcz02MDAw',
                           'genre': 'TV'},
                           {'name': '5- TV5',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv5.png',
                           'video': 'http://sl19.telewebion.com:1935/devices/_definst_/tehran-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNTo0OjE2IFBNJmhhc2hfdmFsdWU9WmxCMEw0dzFXVmFQeE5tZUQ0b2l5UT09JnZhbGlkbWludXRlcz02MDAw',
                           'genre': 'TV'},
                           {'name': '6- IRINN',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/irinn.png',
                           'video': 'http://sl15.telewebion.com:1935/devices/_definst_/irinn-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDoyMjo0OCBQTSZoYXNoX3ZhbHVlPVRGL0loNmFtclhoR1NOR1o2dE15Snc9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==',
                           'genre': 'TV'},
                           {'name': 'IRAN NASIM',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/nasim.png',
                           'video': 'http://sa16.telewebion.com:1935/devices/_definst_/nasim-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDo1MDo1OCBQTSZoYXNoX3ZhbHVlPVdndFlDMHU5WnU4MUVBakZIS3ptRVE9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==',
                           'genre': 'TV'},
                           {'name': 'IRAN iFILM',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/ifilm.png',
                           'video': 'http://sa16.telewebion.com:1935/devices/_definst_/ifilm-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDo1MzozMCBQTSZoYXNoX3ZhbHVlPW0wK2hPejJjY0hFZGk4Q1YxTlNJcnc9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==',
                           'genre': 'TV'},
                           {'name': 'IRAN VARZESH',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/varzesh.png',
                           'video': 'http://s12.telewebion.com:1935/devices/_definst_/bck2varzesh-1000k.stream/playlist.m3u8?wmsAuthSign=aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE3LzIwMTcgMTE6MzQ6MjAgQU0maGFzaF92YWx1ZT05YlBDOEFPY1Y4YUVYUXZSdENoOXBBPT0mdmFsaWRtaW51dGVzPTYwMDA=',
                           'genre': 'TV'}
                       ],
               'APARAT': [{'name': '1- TV1',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv1.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/tv1.m3u8?wmsAuthSign=4eb70fe8758b8ecd23e39d1ec4dcdf98',
                           'genre': 'TV'},
                           {'name': '2- TV2',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv2.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/tv2.m3u8?wmsAuthSign=7790eb90e12db3ef4aaec51e4d29cf03',
                           'genre': 'TV'},
                           {'name': '3- TV3',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv3.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/tv3.m3u8?wmsAuthSign=b5d399b23dc7fc20770abea664f039a3',
                           'genre': 'TV'},
                           {'name': '5- TV5',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/tv5.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/tv5.m3u8?wmsAuthSign=7861a9ac8b4bc0c884839d9be9baf265',
                           'genre': 'TV'},
                           {'name': '6- IRINN',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/irinn.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/irinn.m3u8?wmsAuthSign=6b7920274ca5fcb2af3e536faa1d0ba1',
                           'genre': 'TV'},
                           {'name': 'IRAN NASIM',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/nasim.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/nasim.m3u8?wmsAuthSign=ddb0cfb0e34d96237bda861abb125663',
                           'genre': 'TV'},
                           {'name': 'IRAN iFILM',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/ifilm.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/ifilm.m3u8?wmsAuthSign=0acd2828da97315abfe149b7bf812456',
                           'genre': 'TV'},
                           {'name': 'IRAN VARZESH',
                           'thumb': 'http://staticfiles.telewebion.com/web/content_images/channel_images/thumbs/new/240/v3/varzesh.png',
                           'video': 'https://live.cdn.asset.aparat.com/astv1/edge/varzesh.m3u8?wmsAuthSign=46a5e3272e58f4c472f1c649d0639181',
                           'genre': 'TV'}
                       ],
                'OTHER': [{'name': 'BBC PERSIAN',
                           'thumb': 'http://www.bbc.co.uk/news/special/2015/newsspec_11063/persian_1024x576.png',
                           'video': 'http://bbcwshdlive01-lh.akamaihd.net/i/ptv_1@78015/master.m3u8',
                           'genre': 'TV'},
                           {'name': 'RADIO JAVAN',
                           'thumb': 'https://www.radiojavan.com/images/nav_logo.png',
                           'video': 'https://rjtv1.rjtv.me/live/smil:rjtv.smil/playlist.m3u8',
                           'genre': 'TV'},
                           {'name': 'MANOTO TV',
                           'thumb': 'https://www.manototv.com/Content/Images/manoto-logo-8ec71c.png',
                           'video': 'https://d1lcmmejnwm81j.cloudfront.net/manotolive.m3u8',
                           'genre': 'TV'}
                          ]}


def get_url(**kwargs):
   """
   Create a URL for calling the plugin recursively from the given set of keyword arguments.

   :param kwargs: "argument=value" pairs
   :type kwargs: dict
   :return: plugin call URL
   :rtype: str
   """
   return '{0}?{1}'.format(_url, urlencode(kwargs))


def get_categories():
   """
   Get the list of video categories.

   Here you can insert some parsing code that retrieves
   the list of video categories (e.g. 'Movies', 'TV-shows', 'Documentaries' etc.)
   from some site or server.

   .. note:: Consider using `generator functions <https://wiki.python.org/moin/Generators>`_
       instead of returning lists.

   :return: The list of video categories
   :rtype: list
   """
   # list() works on both Python 2 and 3 and matches the documented return type.
   return list(VIDEOS)


def get_videos(category):
   """
   Get the list of videofiles/streams.

   Here you can insert some parsing code that retrieves
   the list of video streams in the given category from some site or server.

   .. note:: Consider using `generator functions <https://wiki.python.org/moin/Generators>`_
       instead of returning lists.

   :param category: Category name
   :type category: str
   :return: the list of videos in the category
   :rtype: list
   """
   return VIDEOS[category]


def list_categories():
   """
   Create the list of video categories in the Kodi interface.
   """
   # Get video categories
   categories = get_categories()
   # Iterate through categories
   for category in categories:
       # Create a list item with a text label and a thumbnail image.
       list_item = xbmcgui.ListItem(label=category)
       # Set graphics (thumbnail, fanart, banner, poster, landscape etc.) for the list item.
       # Here we use the same image for all items for simplicity's sake.
       # In a real-life plugin you need to set each image accordingly.
       list_item.setArt({'thumb': VIDEOS[category][0]['thumb'],
                         'icon': VIDEOS[category][0]['thumb'],
                         'fanart': VIDEOS[category][0]['thumb']})
       # Set additional info for the list item.
       # Here we use a category name for both properties for simplicity's sake.
       # setInfo allows setting various information for an item.
       # For available properties see the following link:
       # http://mirrors.xbmc.org/docs/python-docs/15.x-isengard/xbmcgui.html#ListItem-setInfo
       list_item.setInfo('video', {'title': category, 'genre': category})
       # Create a URL for a plugin recursive call.
       # Example: plugin://plugin.video.example/?action=listing&category=Animals
       url = get_url(action='listing', category=category)
       # is_folder = True means that this item opens a sub-list of lower level items.
       is_folder = True
       # Add our item to the Kodi virtual folder listing.
       xbmcplugin.addDirectoryItem(_handle, url, list_item, is_folder)
   # Add a sort method for the virtual folder items (alphabetically, ignore articles)
   xbmcplugin.addSortMethod(_handle, xbmcplugin.SORT_METHOD_LABEL_IGNORE_THE)
   # Finish creating a virtual folder.
   xbmcplugin.endOfDirectory(_handle)


def list_videos(category):
   """
   Create the list of playable videos in the Kodi interface.

   :param category: Category name
   :type category: str
   """
   # Get the list of videos in the category.
   videos = get_videos(category)
   # Iterate through videos.
   for video in videos:
       # Create a list item with a text label and a thumbnail image.
       list_item = xbmcgui.ListItem(label=video['name'])
       # Set additional info for the list item.
       list_item.setInfo('video', {'title': video['name'], 'genre': video['genre']})
       # Set graphics (thumbnail, fanart, banner, poster, landscape etc.) for the list item.
       # Here we use the same image for all items for simplicity's sake.
       # In a real-life plugin you need to set each image accordingly.
       list_item.setArt({'thumb': video['thumb'], 'icon': video['thumb'], 'fanart': video['thumb']})
       # Set 'IsPlayable' property to 'true'.
       # This is mandatory for playable items!
       list_item.setProperty('IsPlayable', 'true')
       # Create a URL for a plugin recursive call.
       # Example: plugin://plugin.video.example/?action=play&video=http://www.vidsplay.com/vids/crab.mp4
       url = get_url(action='play', video=video['video'])
       # Add the list item to a virtual Kodi folder.
       # is_folder = False means that this item won't open any sub-list.
       is_folder = False
       # Add our item to the Kodi virtual folder listing.
       xbmcplugin.addDirectoryItem(_handle, url, list_item, is_folder)
   # Add a sort method for the virtual folder items (alphabetically, ignore articles)
   xbmcplugin.addSortMethod(_handle, xbmcplugin.SORT_METHOD_LABEL_IGNORE_THE)
   # Finish creating a virtual folder.
   xbmcplugin.endOfDirectory(_handle)


def play_video(path):
   """
   Play a video by the provided path.

   :param path: Fully-qualified video URL
   :type path: str
   """
   # Create a playable item with a path to play.
   play_item = xbmcgui.ListItem(path=path)
   # Pass the item to the Kodi player.
   xbmcplugin.setResolvedUrl(_handle, True, listitem=play_item)


def router(paramstring):
   """
   Router function that calls other functions
   depending on the provided paramstring

   :param paramstring: URL encoded plugin paramstring
   :type paramstring: str
   """
   # Parse a URL-encoded paramstring to the dictionary of
   # {<parameter>: <value>} elements
   params = dict(parse_qsl(paramstring))
   # Check the parameters passed to the plugin
   if params:
       if params['action'] == 'listing':
           # Display the list of videos in a provided category.
           list_videos(params['category'])
       elif params['action'] == 'play':
           # Play a video from a provided URL.
           play_video(params['video'])
       else:
           # If the provided paramstring does not contain a supported action
           # we raise an exception. This helps to catch coding errors,
           # e.g. typos in action names.
           raise ValueError('Invalid paramstring: {0}!'.format(paramstring))
   else:
       # If the plugin is called from Kodi UI without any parameters,
       # display the list of video categories
       list_categories()


if __name__ == '__main__':
   # Call the router function and pass the plugin call parameters to it.
   # We use string slicing to trim the leading '?' from the plugin call paramstring
   router(sys.argv[2][1:])

As you may have noticed immediately, this is basically the Kodi example video add-on; I just grabbed the video URLs and replaced them. I am looking for a way to connect a scraper that refreshes the video sources automatically whenever they change over time.
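
One likely reason the links go stale: the wmsAuthSign value in each Telewebion URL above looks like a Base64-encoded query string that carries a server timestamp and a validity window, so each URL is presumably only good for a limited time. Here is a minimal sketch that decodes the token from the '1- TV1' entry. The helper name decode_wms_token is only illustrative, and how the server actually uses these fields is an assumption.

```python
# -*- coding: utf-8 -*-
import base64

# wmsAuthSign value copied from the '1- TV1' TELEWEBION entry above.
TOKEN = ('aXNfZnJlZT0xJnNlcnZlcl90aW1lPTEwLzE0LzIwMTcgNDoxODo0NiBQTSZoYXNo'
         'X3ZhbHVlPWVVcWtRdXA4Mm1UV045S1A5NGo1Qnc9PSZ2YWxpZG1pbnV0ZXM9NjAwMA==')


def decode_wms_token(token):
    """Decode a Base64 wmsAuthSign token into a dict of its fields.

    Hypothetical helper: it only assumes that the token is a
    Base64-encoded "key=value&key=value" string.
    """
    raw = base64.b64decode(token).decode('utf-8')
    # Split by hand instead of using parse_qsl so that '+' characters
    # inside the hash value are not turned into spaces.
    return dict(pair.split('=', 1) for pair in raw.split('&') if '=' in pair)


if __name__ == '__main__':
    for key, value in sorted(decode_wms_token(TOKEN).items()):
        print('{0} = {1}'.format(key, value))
```

For this token the decoded fields include is_free=1, server_time=10/14/2017 4:18:46 PM and validminutes=6000, which would explain why hard-coded URLs stop working after a while.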

I have a fair knowledge of Python, and I am more or less familiar with Scrapy, Selenium and some other scraping tools.

I am sorry if this post is a duplicate, but I have not found anything good on the subject, so I created a new thread.

Thanks!
Thread moved to scraper development
I'm afraid that with such a broad question you won't get any real answer. With so many different websites and APIs out there, there is no single approach.
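
That said, for the simple case where a channel page contains the .m3u8 link directly in its HTML, one possible approach can be sketched as follows. This is only an illustration, not a general solution: the function names (fetch_html, extract_stream_urls, build_videos), the 'page' key and the regular expression are all assumptions, and pages that build the stream URL in JavaScript or fetch it from an API would need site-specific code.

```python
# -*- coding: utf-8 -*-
import re

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Absolute http(s) URLs ending in .m3u8, with an optional query string.
M3U8_RE = re.compile(r'https?://[^\s"\'<>\\]+?\.m3u8(?:\?[^\s"\'<>\\]*)?')


def fetch_html(page_url, timeout=10):
    """Download a page and return its body as text."""
    request = Request(page_url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urlopen(request, timeout=timeout)
    try:
        return response.read().decode('utf-8', 'replace')
    finally:
        response.close()


def extract_stream_urls(html):
    """Return the unique .m3u8 URLs found in the HTML, in page order."""
    urls = []
    for url in M3U8_RE.findall(html):
        if url not in urls:
            urls.append(url)
    return urls


def build_videos(channel_pages, fetch=fetch_html):
    """Build entries shaped like the items in the VIDEOS dict.

    channel_pages is a hypothetical list of dicts with 'name', 'thumb'
    and 'page' keys, where 'page' is the web page to scrape.
    Channels without a detectable stream URL are skipped.
    """
    videos = []
    for channel in channel_pages:
        urls = extract_stream_urls(fetch(channel['page']))
        if urls:
            videos.append({'name': channel['name'],
                           'thumb': channel['thumb'],
                           'video': urls[0],
                           'genre': 'TV'})
    return videos
```

Since signed stream URLs tend to expire, it is probably safer to call something like build_videos from get_videos each time the listing is opened, rather than storing the result as a fixed VIDEOS dictionary.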
(2017-11-04, 23:38) Roman_V_M Wrote: I'm afraid that with such a broad question you won't get any real answer. With so many different websites and APIs out there, there is no single approach.
Hey Roman, can you help me out?