Help with a TV Guide script for France (www.telerama.fr)?
#1
Hello,

I'd like to develop a script that scrapes and then displays French TV listings from http://television.telerama.fr/tele/grille.php.

So far I have completed these steps:
- fetch the page
- parse it
- store the data I need

Here is my partial code (I only started 2 days ago ...)

Excerpt of the fetched page:
Quote:<div id="leprogramme"> <!-- ####################################################################chaine -->
<div id="item_192" class="sortable">
<div class="chaine" alt="TF 1" title="TF 1">
<div class="logo logo_ch_192" title="TF 1" alt="TF 1">
<a href="javascript:void(0);" onclick="return retirerChaine('192');" class="pointer" ><img src="http://icon.telerama.fr/iconsv2/grille_croix.gif" alt="" border="0" /></a>
</div><!-- logo -->
<div class="sep-15">.</div>
<div class="programme" id="ch_programme_192" alt="" title="">
<div class="emission genre-vide jeunesse pointer" style="width:563px;left:0px;z-index: 1;" id="emission_12711016" onclick="return afficherEmission('12711016', '192');" alt="TFou - Mercredi 22 avril de 06h30 à 11h05" title="TFou - Mercredi 22 avril de 06h30 à 11h05">
<div class="conteneur">
<span class="genre">Jeunesse</span><br />
<span class="titre">
<a href="/tele/emission.php?id=12711016" onclick="return false;" class="annulahref">TFou</a>
</span><br />
<span style="resume">Au sommaire : - Tweenies - Charlie et Lola - Ni hao,&nbsp;...</span>
<div id="data_12711016" style="display:none;">{"Id_Diffusion":"12711016","Id_Emission":"12550526","Id_Chaine":"192","Date_Debut":"2009-04-22 06:30:00","Date_Fin":"2009-04-22 11:05:00","Titre":"TFou","Sous_Titre":"","ShowViewFr":"25041803","note_T":"0","Id_Rubrique":"8403","Rubrique_Libelle":"Magazine jeunesse","Rubrique_Niveau":"3","Type":"Jeunesse","Chaine_Nom":"TF 1","Logo":"192.gif","Id_Hierarchie":"030302","DureeEnSecondes":"10500","resume_court":"Au sommaire : - Tweenies - Charlie et Lola - Ni hao,&nbsp;...","resume_long":"Au sommaire : - Tweenies - Charlie et Lola - Ni hao, Kai-Lan - Chuggington - La Maison de Mickey - Le Petit Dinosaure - Casper, l'école de la peur - Spiez, nouvelle génération - Totally Spies - Bob l'éponge - Monster Buster Club - Power Rangers - Les Fées&nbsp;...","dateheurechaine":"Mercredi 22 avril de 06h30 à 11h05 sur TF 1","intervenant":"","Url_Fiche":""}</div>
</div>
</div><!-- emission-->
<div class="sep" style="left:563px;"> </div>

<div class="emission genre-vide serie pointer" style="width:223px;left:565px;z-index: 2;" id="emission_12711017" onclick="return afficherEmission('12711017', '192');" alt="7 à la maison - Mercredi 22 avril de 11h05 à 11h55" title="7 à la maison - Mercredi 22 avril de 11h05 à 11h55">
<div class="conteneur">
<span class="genre">Série</span><br />
<span class="titre">
<a href="/tele/emission.php?id=12711017" onclick="return false;" class="annulahref">7 à la maison</a>
</span><br />
<span style="resume">Les jumeaux, Sam et David, acceptent de dévoiler des&nbsp;...</span>
<div id="data_12711017" style="display:none;">{"Id_Diffusion":"12711017","Id_Emission":"6585632","Id_Chaine":"192","Date_Debut":"2009-04-22 11:05:00","Date_Fin":"2009-04-22 11:55:00","Titre":"7 à la maison","Sous_Titre":"Petits secrets de famille","ShowViewFr":"1241984","note_T":"0","Id_Rubrique":"8534","Rubrique_Libelle":"Série sentimentale","Rubrique_Niveau":"3","Type":"Série","Chaine_Nom":"TF 1","Logo":"192.gif","Id_Hierarchie":"060902","DureeEnSecondes":"3000","resume_court":"Les jumeaux, Sam et David, acceptent de dévoiler des&nbsp;...","resume_long":"Les jumeaux, Sam et David, acceptent de dévoiler des secrets en contrepartie de lait et de gâteaux. Rapidement, la situation s'envenime...","dateheurechaine":"Mercredi 22 avril de 11h05 à 11h55 sur TF 1","intervenant":"<strong>R&eacute;alisateur : </strong> Harry Harris<br><strong>Acteur : </strong> Stephen Collins (Eric Camden), Catherine Hicks (Annie Camden) ...<br><br>","Url_Fiche":""}</div>
</div>
</div><!-- emission-->


Code:
#!/usr/bin/env python
# -*- coding: cp1252 -*-

#############################################################################

import httplib
import urllib
import sys

import re

import csv

from BeautifulSoup import BeautifulSoup
from BeautifulSoup import NavigableString

#############################################################################

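url_a_parser = 'http://television.telerama.fr/tele/grille.php'

# Assumed connection setup: httplib.HTTP provides the
# getreply()/getfile() calls used below.
conn = httplib.HTTP('television.telerama.fr')
conn.putrequest('GET', url_a_parser)
conn.putheader('Accept', 'text/html')
conn.putheader('Accept', 'text/plain')

conn.endheaders()

## Read the response
errcode, errmsg, headers = conn.getreply()

## ToDo : Add a check on errors


myPage = conn.getfile()

myPageBuffer = myPage.read()
mySoup = BeautifulSoup(myPageBuffer)

la_liste = [] # list that will hold the extracted data strings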

for resultats in mySoup.findAll('div'):
    machaine = resultats.string
    taillechaine = len(str(machaine))
    if taillechaine > 30: # the data strings I want are longer than 30 characters
        trim_left = str(machaine)[1:]
        trim_result = trim_left[:len(trim_left)-1] # strip the leading and trailing characters
        la_liste.append(trim_result) # store the wanted data in a list

# test
split_datas =  str(la_liste[-1]).split(',')  
print "split_datas: \n" + str(split_datas[18])

I am trying to parse each data string, but my problem is that the data is badly formatted (CSV-like): each field is enclosed in double quotes and fields are separated by a ',', but the delimiter can also appear inside a field.
Example: "blah,blahblah","none here","here yes, grrr","here:not","last"

Is there an easy way to solve this?
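
For reference, the hidden data_... blocks in the page excerpt look like JSON objects, so one possible route is a JSON parser, which copes with commas inside quoted values. This is a minimal sketch, assuming the blob really is valid JSON as the sample suggests. The shortened `blob` string and the `parse_emission` helper are illustrative and not part of the script above. The json module needs Python 2.6 or later; on older interpreters simplejson provides the same loads call.

```python
import json

# Shortened copy of the first data_... block from the excerpt (illustrative).
blob = ('{"Id_Diffusion":"12711016","Id_Chaine":"192",'
        '"Date_Debut":"2009-04-22 06:30:00","Date_Fin":"2009-04-22 11:05:00",'
        '"Titre":"TFou","Type":"Jeunesse",'
        '"resume_long":"Au sommaire : - Tweenies - Charlie et Lola - Ni hao, Kai-Lan"}')


def parse_emission(text):
    """Hypothetical helper: turn one JSON blob into a dict of fields."""
    return json.loads(text)


emission = parse_emission(blob)
print(emission["Titre"])
print(emission["resume_long"])
```

Note that the parser wants the string with its surrounding braces, so the step that trims the first and last character would have to be skipped for this route.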

Next steps:
- re-generate HTML with timeline style
- include ability to choose channels (with cookies)
- add the ability to view the programs for "this evening", "tomorrow", "next week", "Thursday" ...
- maybe add RSS scrolling
- ...

Thanks a lot for your help!
#2
Hi samsam,

Nice to see a project for Telerama. I didn't look too closely at your example, but it seems you could split on something like '","' instead of ',' alone.
Otherwise you could also use a regex to get exactly what you want, but I don't think that would be necessary in your case.
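
Both ideas could look roughly like this. It is a sketch using the sample line from the first post, and the variable names are illustrative. Splitting on '","' requires dropping the outer quotes first; the regex variant captures whatever sits between each pair of double quotes and assumes that no field contains a double quote itself.

```python
import re

line = '"blah,blahblah","none here","here yes, grrr","here:not","last"'

# Variant 1: drop the outer quotes, then split on the three-character separator.
fields_split = line[1:-1].split('","')

# Variant 2: capture the text between each pair of double quotes.
fields_regex = re.findall(r'"([^"]*)"', line)

print(fields_split)
print(fields_regex)
```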
_____________________________

Repositories Installer: select and install unofficial repositories / TAC.TV: watch videos on TAC.TV
Installer Passion-XBMC: Download and Install Add-ons (pre-Dharma only)

#3
Hello,

Thanks for your answer.
The simplest workaround I found was to replace '","' with '";"' and then split on ';'.
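
A sketch of this workaround, with one point worth checking: the page excerpt contains entities such as &nbsp;, and their trailing ';' would be split on as well. The sample strings are shortened from the first post, and `split_fields` is an illustrative name.

```python
def split_fields(text):
    """Replace the '","' separator with '";"', then split on ';'."""
    return text.replace('","', '";"').split(';')


simple = '"blah,blahblah","none here","here yes, grrr"'
print(split_fields(simple))

# A field holding an HTML entity contains ';' and is cut in two.
with_entity = '"Titre":"TFou","resume_court":"Ni hao,&nbsp;..."'
pieces = split_fields(with_entity)
print(len(pieces))
```

Splitting directly on '","' avoids the extra cut, since that three-character sequence is much less likely to appear inside a field than a lone ';'.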
#4
Hello,

I've finished the data part.

The next part is the GUI itself!

After a long search for Python timeline / TV listings GUIs, I found this very interesting link:

http://forum.xbmc.org/showpost.php?p=65321&postcount=25

The GUI shown there is exactly what I'd like to build.

My problem is that the source code is no longer available on SourceForge.

Does anyone have it on their hard drive, or a backup of it, please?

Any ideas to help me build such a GUI?

Thanks a lot!
#5
I think you will find the latest version and its code here (if we are talking about the same thing):
http://code.google.com/p/python-xmltv/

You should also have a look here:
http://code.google.com/p/mythbox/
Look at the screenshots: http://code.google.com/p/mythbox/wiki/Screenshots
It seems it is also similar to what you want to do.

There is also the Football script (available at http://code.google.com/p/xbmc-scripting/).
I am not sure, but I think its latest version, which uses WindowXML, has this kind of GUI; give it a try.
In any case, even if WindowXML is a little more difficult to use at the beginning than Window, you should use it: it is 100 times more powerful and you can do very nice things with it.
#6
Thanks a lot for your help, Temhil.

I'd really like to keep the program descriptions from http://www.telerama.fr rather than the XMLTV ones (otherwise I wouldn't be doing this work; python-xmltv seems to be powerful for that).

I'm going to check all the links you provided and will try, as you suggest, to explore the WindowXML route.

Another question:
I grab the TV programs from an HTML web page.
Do I have to generate an XML file to store the data, or should I always fetch it "just in time" for each display?
My guess is that the "query" will only be done a couple of times, maybe 2 or 3 at most ... !?!

Thanks a lot!
#7
I guess it really depends on what you want to do and how you can get the information.
Keep in mind that scripts are a little slower than XBMC itself (especially on an Xbox) and loading times can be long. I usually prefer to do several short downloads instead of one big one; users do not like to wait.
If you can get an XML page directly from the web, that would be the easiest to parse (using a library such as ElementTree or BeautifulSoup); if you can only get an HTML page, then regex (regular expressions) would be the best option.
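
As an illustration of the regex route for HTML, here is a sketch that pulls the hidden data_... blocks out of a page string. The pattern is an assumption based on the markup quoted in the first post (a div with id data_<number> and style display:none), and `html` is a shortened stand-in for the downloaded page.

```python
import re

html = (
    '<div class="conteneur">'
    '<span class="genre">Jeunesse</span>'
    '<div id="data_12711016" style="display:none;">'
    '{"Id_Diffusion":"12711016","Titre":"TFou"}</div>'
    '</div>'
    '<div class="conteneur">'
    '<div id="data_12711017" style="display:none;">'
    '{"Id_Diffusion":"12711017","Sous_Titre":"Petits secrets de famille"}</div>'
    '</div>'
)

# Capture the broadcast id and the raw text up to the next closing div (non-greedy).
DATA_DIV = re.compile(r'<div id="data_(\d+)" style="display:none;">(.*?)</div>', re.S)

blocks = DATA_DIV.findall(html)
for diffusion_id, raw in blocks:
    print(diffusion_id, raw)
```

The non-greedy match stops at the first closing div tag, so it only works as long as the data itself never contains one.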

Concerning storing data, as I said it depends on what you want to do: keep the data between two runs of your script, or just have it available while the script is running. In the second case, don't bother: use a dictionary or a list ...
You also have the option (if you do need to store it) of creating a local database with a library like SQLite and running queries on it.
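
A minimal sketch of the local database option, using Python's sqlite3 module (in the standard library from Python 2.5 onward; older interpreters would need an external binding such as pysqlite). The table layout and column names are assumptions chosen to mirror a few fields of the page data. An in-memory database keeps the example self-contained; a file path would be used to keep data between runs.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; pass a file path to persist data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emission ("
    " id_diffusion TEXT PRIMARY KEY,"
    " chaine TEXT, date_debut TEXT, date_fin TEXT, duree_secondes INTEGER)"
)

rows = [
    ("12711016", "TF 1", "2009-04-22 06:30:00", "2009-04-22 11:05:00", 10500),
    ("12711017", "TF 1", "2009-04-22 11:05:00", "2009-04-22 11:55:00", 3000),
]
conn.executemany("INSERT INTO emission VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Which broadcast is on TF 1 at 11:30? These timestamps compare correctly as text.
now = "2009-04-22 11:30:00"
current = conn.execute(
    "SELECT id_diffusion FROM emission"
    " WHERE chaine = ? AND date_debut <= ? AND date_fin > ?",
    ("TF 1", now, now),
).fetchall()
print(current)
```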
#8
In fact, the website I scrape doesn't offer an XML data file, only data embedded in HTML.
I use the nice BeautifulSoup library, which has some limits in this case (badly formatted data fields that I have to tidy up with regex or replace).

Each page query (roughly 6 hours of programs) plus full data parsing takes about 10 to 20 seconds at most, so I guess there's no need to focus on optimizing the code for now !?!
As you said, my code is based on lists and works fine.
When the GUI part is finished, I'll focus on grabbing data for X days in the background and then storing it in an XML file or a DB like SQLite.
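
For the XML file option, a sketch with xml.etree.ElementTree from the standard library, written for Python 3. The element and attribute names (`programmes`, `emission`, `debut`, `fin`) are made up for the example, and the two entries are taken from the page excerpt in the first post.

```python
import xml.etree.ElementTree as ET

emissions = [
    {"id": "12711016", "chaine": "TF 1", "debut": "2009-04-22 06:30:00",
     "fin": "2009-04-22 11:05:00", "titre": "TFou"},
    {"id": "12711017", "chaine": "TF 1", "debut": "2009-04-22 11:05:00",
     "fin": "2009-04-22 11:55:00", "titre": "7 à la maison"},
]

# Build one <emission> element per broadcast under a <programmes> root.
root = ET.Element("programmes")
for e in emissions:
    node = ET.SubElement(root, "emission", id=e["id"], chaine=e["chaine"],
                         debut=e["debut"], fin=e["fin"])
    node.text = e["titre"]

# Serialize to bytes; ET.ElementTree(root).write(path, encoding="utf-8")
# would write the same content to a file instead.
xml_bytes = ET.tostring(root, encoding="utf-8")

# Read it back to check the round trip.
titles = [n.text for n in ET.fromstring(xml_bytes).findall("emission")]
print(titles)
```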

But everything in its own time.
