[dev] Project Free TV
#1
New addon I've been developing off the following site:

hxxp://www.free-tv-video-online.me/

They catalog links to many different sources for movies and TV shows. Most of the sources are typical video streaming sites that usually serve FLV, so quality isn't always quite as high as AVIs from MegaUpload or other file shares

Source & repository:
https://github.com/Eldorados/eldorado-xb...jectfreetv

Heavily dependent on t0mm0's common library as well as the new urlresolver:
http://forum.xbmc.org/showthread.php?tid=105707

I'm reaching the point where I could use some help due to my lack of Python knowledge, and I'm hoping to get some input from others, as well as tips on how to improve, suggestions on what to add, etc.

Currently the addon works quite well for playing movies. A number of the sources are already included in urlresolver, and I'm sure more will come
#2
First issue: - SOLVED

This page - hxxp://www.free-tv-video-online.me/movies/

Lists the most recently added movies/links to the site, but I am unable to make use of it

Currently I have an option for 'Latest' that scrapes the page for the links and removes duplicate movie names, but as of now it does not do anything beyond that

What I would like: the list displays as it does now, the user clicks on a movie, and my sources dialog box pops up so they can select one to play

But I'm not sure how to get from my list of latest movies to their sources. Even if I re-scrape the same page for just the links reported there, I'm a bit lost on how to do so: how do I scrape a second time for the list of links pertaining only to the movie you just clicked on?


Solved with some nifty regex work and by passing a couple of extra queries when selecting an item to play; a valid list of sources now shows
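
For anyone following along, a rough sketch of the idea: scrape the Latest page for (link, title) pairs, show the de-duplicated titles, and pass the chosen title back as an extra query so a second pass keeps only that movie's links. The markup in SAMPLE_HTML and the helper names are made up for illustration; the real page is laid out differently, and this is not the addon's actual code.

```python
import re

# Hypothetical markup standing in for the 'Latest' page; the real site differs.
SAMPLE_HTML = '''
<a href="/movies/a/source1.html">Movie A</a>
<a href="/movies/b/source1.html">Movie B</a>
<a href="/movies/a/source2.html">Movie A</a>
'''

LINK_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')


def latest_titles(html):
    """Return movie titles in page order with duplicates removed."""
    seen = set()
    titles = []
    for _href, title in LINK_RE.findall(html):
        if title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def links_for(html, wanted_title):
    """Second pass: keep only the links whose text matches the chosen title."""
    return [href for href, title in LINK_RE.findall(html)
            if title == wanted_title]


titles = latest_titles(SAMPLE_HTML)
movie_a_links = links_for(SAMPLE_HTML, 'Movie A')
```

The title the user clicked on would travel in the plugin URL as one of the extra queries, so the second scrape knows what to filter on.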

Second issue: SOLVED

TV Shows - hxxp://www.free-tv-video-online.me/internet/

Even though I find the site rather well organized, I have an issue with how they treated the TV show section: all shows listed on the page are separated/linked using HTML anchors

So I'm lost on how to scrape the page and only return matches within a specified anchor

Solved with a simple search between tags, as suggested by t0mm0

Third issue:

Sources dialog box: I'm struggling to find a way to properly sort the list. Ideally I would like all 'Watch Trailer ##' entries at the very top, followed by the sources in alphabetical order

Any tips? All the searches I've done say I cannot sort a dict. Any coding suggestions on how to compile my links differently?
#3
Eldorado Wrote:Third issue:

Sources dialog box: I'm struggling to find a way to properly sort the list. Ideally I would like all 'Watch Trailer ##' entries at the very top, followed by the sources in alphabetical order

Any tips? All the searches I've done say I cannot sort a dict. Any coding suggestions on how to compile my links differently?

Sorting is a good idea. Why isn't xbmc able to sort the list, like it is in library view? I have been reading up on tuples: they can't be sorted in place because they are immutable. But you can convert a tuple to a list by
Code:
name_of_list = list(name_of_tuple)
and then sort by
Code:
name_of_list.sort()

Does anybody know why we use tuples instead of lists?
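
For what it's worth, sorted() accepts any iterable, tuples included, and returns a new sorted list, so the conversion step is optional. A small sketch with made-up source names:

```python
# Made-up source names, purely for illustration.
sources = ('putlocker', 'Novamov', 'megavideo')

# A tuple is immutable and has no .sort() method, so convert it first...
as_list = list(sources)
as_list.sort()  # in place and case-sensitive: uppercase sorts before lowercase

# ...or call sorted() on the tuple directly; it returns a new list.
alphabetical = sorted(sources, key=str.lower)  # case-insensitive
```

The plain sort puts 'Novamov' first because of its capital letter, while the key=str.lower version gives the order a user would expect.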
#4
Eldorado Wrote:First issue:

This page - hxxp://www.free-tv-video-online.me/movies/

Lists the most recently added movies/links to the site, but I am unable to make use of it

Currently I have an option for 'Latest' that scrapes the page for the links and removes duplicate movie names, but as of now it does not do anything beyond that

What I would like: the list displays as it does now, the user clicks on a movie, and my sources dialog box pops up so they can select one to play

But I'm not sure how to get from my list of latest movies to their sources. Even if I re-scrape the same page for just the links reported there, I'm a bit lost on how to do so: how do I scrape a second time for the list of links pertaining only to the movie you just clicked on?
i would probably cheat and not bother ;) in short i don't think it fits nicely into the quite narrow situation urlresolver et al is currently designed for. maybe we can fix that somehow?
once we get caching sorted re-scraping the page would be quick as you wouldn't need to download it again.


Eldorado Wrote:Second issue:

TV Shows - hxxp://www.free-tv-video-online.me/internet/

Even though I find the site rather well organized, I have an issue with how they treated the TV show section: all shows listed on the page are separated/linked using HTML anchors

So I'm lost on how to scrape the page and only return matches within a specified anchor
you could do one regex to find the section, something like:
Code:
re.search('<a name="%s">(.+?)<a name=' % letter, html, re.DOTALL)

and then do your scraping within that?
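
a rough sketch of that two-step approach, for illustration only. the sample html, the link pattern and the function name are invented (the real page will differ), and `letter` is assumed to be the name of the anchor you want:

```python
import re

# Invented sample; the real page's markup will differ.
html = '''
<a name="A"></a>
<a href="/internet/alf/">Alf</a>
<a href="/internet/angel/">Angel</a>
<a name="B"></a>
<a href="/internet/bones/">Bones</a>
<a name="C"></a>
'''


def shows_in_section(html, letter):
    """Return (url, name) pairs found between one anchor and the next."""
    # Step 1: grab only the text between this anchor and the following one.
    section = re.search('<a name="%s">(.+?)<a name=' % re.escape(letter),
                        html, re.DOTALL)
    if section is None:
        return []
    # Step 2: scrape links inside that section only.
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', section.group(1))


shows_a = shows_in_section(html, 'A')
```

one thing to watch: the last section on a page has no following anchor, so this pattern won't match it as written and it would need its own handling.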
Eldorado Wrote:Third issue:

Sources dialog box: I'm struggling to find a way to properly sort the list. Ideally I would like all 'Watch Trailer ##' entries at the very top, followed by the sources in alphabetical order

Any tips? All the searches I've done say I cannot sort a dict. Any coding suggestions on how to compile my links differently?

yeah you can't sort a dict as a dict doesn't have an order. this is something that needs to be fixed in urlresolver.choose_sources() (i'm on the case - it also interacts with the videoid stuff we were discussing in the other thread)
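
in the meantime, one possible workaround on the addon side (just a sketch, not what urlresolver does): pull the items out of the dict into a list and sort that with a key that puts the trailers first. the labels and urls below are made up:

```python
# Made-up labels and URLs: label -> url. Don't rely on the dict's own order.
sources = {
    'putlocker': 'http://example.com/3',
    'Watch Trailer 1': 'http://example.com/1',
    'megavideo': 'http://example.com/2',
    'Watch Trailer 2': 'http://example.com/4',
}


def sort_key(item):
    label, _url = item
    # False sorts before True, so trailers come first; then alphabetical.
    return (not label.startswith('Watch Trailer'), label.lower())


ordered = sorted(sources.items(), key=sort_key)
labels = [label for label, _url in ordered]
```

the result is a list of (label, url) pairs, which does keep its order. note this is a plain string sort, so 'Watch Trailer 10' would land before 'Watch Trailer 2'.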

t0mm0
#5
t0mm0 Wrote:i would probably cheat and not bother ;) in short i don't think it fits nicely into the quite narrow situation urlresolver et al is currently designed for. maybe we can fix that somehow?
once we get caching sorted re-scraping the page would be quick as you wouldn't need to download it again.

Yep, I'm really thinking of not bothering :)

Though it does suck not knowing when new releases are added.. I'll keep that one disabled for now until I can figure it out

t0mm0 Wrote:you could do one regex to find the section, something like:
Code:
re.search('<a name="%s">(.+?)<a name=' % letter, html, re.DOTALL)

and then do your scraping within that?

Good idea, I'm very green with the regex stuff

So this will get me to the section and I can read everything past that, but how do I stop reading before the start of the next section?
#6
Eldorado Wrote:Good idea, I'm very green with the regex stuff

So this will get me to the section and I can read everything past that, but how do I stop reading before the start of the next section?

that regex will only get you the one section (note the second '<a name=' at the end) so that regex should already do what you are asking (captures everything between the anchor with the letter you are looking for and the next anchor). try it out and see....
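
to see what the '?' in (.+?) is doing, here is a tiny made-up example: the lazy version stops at the first following anchor, while a greedy (.+) would run on to the last one.

```python
import re

# Made-up markup with three anchors.
html = '<a name="A">alf angel<a name="B">bones<a name="C">'

# Lazy: match as little as possible before the next '<a name='.
lazy = re.search('<a name="A">(.+?)<a name=', html, re.DOTALL).group(1)

# Greedy: match as much as possible, so it only stops at the last '<a name='.
greedy = re.search('<a name="A">(.+)<a name=', html, re.DOTALL).group(1)
```

here lazy holds only the "A" section, while greedy swallows the "B" anchor and its contents as well.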

i find playing with http://www.myregextester.com/ is a good way to visualise what is going on with regexes.

t0mm0
#7
t0mm0 Wrote:that regex will only get you the one section (note the second '<a name=' at the end) so that regex should already do what you are asking (captures everything between the anchor with the letter you are looking for and the next anchor). try it out and see....

i find playing with http://www.myregextester.com/ is a good way to visualise what is going on with regexes.

t0mm0

Completely missed that the first time around, muchas gracias!

Thanks for the link, I could have used this earlier :)
#8
I'm unable to pull in this page; it seems the spaces in the url are causing my troubles, even though they have been converted to +'s

Any tips?

http://www.free-tv-video-online.me/movie...20earlier/

Using t0mm0's net.http_GET(url).content

Log:
Code:
13:35:12 T:6120   ERROR: Error Type: <class 'urllib2.HTTPError'>
13:35:12 T:6120   ERROR: Error Contents: HTTP Error 406: Not Acceptable
13:35:12 T:6120   ERROR: Traceback (most recent call last):
                                              File "C:\Users\M33282\AppData\Roaming\XBMC\addons\plugin.video.projectfreetv\default.py", line 222, in <module>
                                                GetMovieList(url)
                                              File "C:\Users\M33282\AppData\Roaming\XBMC\addons\plugin.video.projectfreetv\default.py", line 78, in GetMovieList
                                                html = net.http_GET(url).content
                                              File "C:\Users\M33282\AppData\Roaming\XBMC\addons\script.module.t0mm0.common\lib\t0mm0\common\net.py", line 179, in http_GET
                                                return self._fetch(url, headers=headers, compression=compression)
                                              File "C:\Users\M33282\AppData\Roaming\XBMC\addons\script.module.t0mm0.common\lib\t0mm0\common\net.py", line 260, in _fetch
                                                response = urllib2.urlopen(req)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 126, in urlopen
                                                return _opener.open(url, data, timeout)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 397, in open
                                                response = meth(req, response)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 510, in http_response
                                                'http', request, response, code, msg, hdrs)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 435, in error
                                                return self._call_chain(*args)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 369, in _call_chain
                                                result = func(*args)
                                              File "C:\Program Files (x86)\XBMC\system\python\Lib\urllib2.py", line 518, in http_error_default
                                                raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
                                            HTTPError: HTTP Error 406: Not Acceptable
13:35:12 T:6120    INFO: -->End of Python script error report<--
13:35:12 T:6120    INFO: Python script stopped
13:35:12 T:6120   DEBUG: Thread XBPyThread 6120 terminating
13:35:12 T:5996   DEBUG:  XFILE::CPluginDirectory::WaitOnScriptResult - plugin exited prematurely - terminating
13:35:12 T:3852   ERROR: XFILE::CDirectory::GetDirectory - Error getting plugin://plugin.video.projectfreetv/?url=http%3A%2F%2Fwww.free-tv-video-online.me%2Fmovies%2F1960%27s+and+earlier&section=movies&mode=movieslist
13:35:12 T:3852   ERROR: CGUIMediaWindow::GetDirectory(plugin://plugin.video.projectfreetv/?url=http%3A%2F%2Fwww.free-tv-video-online.me%2Fmovies%2F1960%27s+and+earlier&section=movies&mode=movieslist) failed
#9
Eldorado Wrote:I'm unable to pull in this page; it seems the spaces in the url are causing my troubles, even though they have been converted to +'s

Any tips?

http://www.free-tv-video-online.me/movie...20earlier/

Using t0mm0's net.http_GET(url).content

works for me...
#10
t0mm0 Wrote:works for me...

Take out the %20's and replace them with spaces or plus signs, then give er a go

This is the exact url that I'm passing from my logs:

Code:
15:10:09 T:772   DEBUG: Project Free TV: adding dir: 1960's And Earlier - plugin://plugin.video.projectfreetv/?url=http%3A%2F%2Fwww.free-tv-video-online.me%2Fmovies%2F1960%27s+and+earlier&section=movies&mode=movieslist
#11
Eldorado Wrote:Take out the %20's and replace them with spaces or plus signs, then give er a go

This is the exact url that I'm passing from my logs:

Code:
15:10:09 T:772   DEBUG: Project Free TV: adding dir: 1960's And Earlier - plugin://plugin.video.projectfreetv/?url=http%3A%2F%2Fwww.free-tv-video-online.me%2Fmovies%2F1960%27s+and+earlier&section=movies&mode=movieslist

ah, it's not getting url encoded enough times. the path part of the url needs to be valid and urlencoded before passing it to add_video_item() (looks like you are passing "http://www.free-tv-video-online.me/movies/1960's and earlier"?) - is it not urlencoded where you are scraping it from? if it is already urlencoded there is a bug somewhere in my code - can you post the code that causes this to fail and i'll take a look.

edit: guess it is not properly encoded on the site:
Code:
<a href="1960's and earlier">
very sloppy!

running just that bit through urllib.quote() before appending it to the rest of the URL should do the trick!
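
roughly like this (a sketch; `base` and `href` are just names picked for the example, and the import fallback is only there so it also runs on python 3, where quote() lives in urllib.parse):

```python
try:
    from urllib import quote          # Python 2, as in the traceback above
except ImportError:
    from urllib.parse import quote    # Python 3

base = 'http://www.free-tv-video-online.me/movies/'
href = "1960's and earlier"           # as scraped from the page, not urlencoded

# Quote only the scraped piece so the ':' and '/' in the base are left alone.
url = base + quote(href)
```

that gives http://www.free-tv-video-online.me/movies/1960%27s%20and%20earlier, which is a valid path to fetch and to pass along in the plugin url.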



t0mm0
#12
That did the job! Yep, I was surprised by that link!

For anyone following along, I've pushed v0.0.4 to my repo; almost everything is playable

Things I would like to clean up:

- order the source links in the popup dialog box
- TV show sources currently list sources for all episodes instead of just the one you clicked on
- Latest sections for both movies and TV shows have the same problem as above

Adding metadata support would be nice too; maybe for my next project I can pull the metadata piece out of icefilms
#13
Eldorado Wrote:That did the job! Yep, I was surprised by that link!
cool, some websites are pretty badly coded - browsers are far too lenient and silently fix up too many mistakes so web devs get lazy ;)

Eldorado Wrote:For anyone following along, I've pushed v0.0.4 to my repo; almost everything is playable

Things I would like to clean up:

- order the source links in the popup dialog box
i'm currently working on a branch of urlresolver which will hopefully fix this and the hoster/videoid issue, nothing in my github yet but should have something for comments in the next few days
Eldorado Wrote:Adding metadata support would be nice too; maybe for my next project I can pull the metadata piece out of icefilms
that sounds like it would be another useful building block to go alongside urlresolver and t0mm0.common....

great work on this addon, getting someone else to put my modules through their paces and come up with suggestions has really helped me think about how to make stuff better before their first stable release. i can't wait to get them finalised enough for a release so that we can get this stuff out to a wider audience. hopefully the changes i make to fix the stuff mentioned above won't cause too much work for you to update!

thanks,

t0mm0.
#14
t0mm0 Wrote:i'm currently working on a branch of urlresolver which will hopefully fix this and the hoster/videoid issue, nothing in my github yet but should have something for comments in the next few days

that sounds like it would be another useful building block to go alongside urlresolver and t0mm0.common....

great work on this addon, getting someone else to put my modules through their paces and come up with suggestions has really helped me think about how to make stuff better before their first stable release. i can't wait to get them finalised enough for a release so that we can get this stuff out to a wider audience. hopefully the changes i make to fix the stuff mentioned above won't cause too much work for you to update!

thanks,

t0mm0.

Your common class and urlresolver have made this one very easy! And I chose a rather simple site for my first go

I finally got my TV show sources fixed up. I had to create a new add_video method that accepts an optional dict of queries. When I create a video item for an episode, I pass the episode name, so that when a user clicks on the episode I can grab only the HTML portion containing source links for just that episode. It's a unique way this site lays out its TV show content

This might be useful for others, should I submit a patch to update your add_video_item() and add_item() methods?
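
To make the idea concrete, something along these lines (a sketch only: build_plugin_url, its parameters and the 'play' mode are names made up for this illustration, not the actual t0mm0.common API or the patch itself):

```python
try:
    from urllib import urlencode                      # Python 2
    from urlparse import parse_qsl
except ImportError:
    from urllib.parse import urlencode, parse_qsl     # Python 3


def build_plugin_url(base, queries, extra=None):
    """Merge optional extra queries into the usual ones and build the URL."""
    merged = dict(queries)
    if extra:
        merged.update(extra)
    # Sort the pairs so the query string comes out in a predictable order.
    return base + '?' + urlencode(sorted(merged.items()))


plugin_url = build_plugin_url(
    'plugin://plugin.video.projectfreetv/',
    {'mode': 'play', 'url': 'http://www.free-tv-video-online.me/internet/'},
    extra={'episode': 'Season 1 Episode 2'},  # hypothetical extra query
)

# When the user clicks the item, the extra query comes back with the rest.
params = dict(parse_qsl(plugin_url.split('?', 1)[1]))
```

Because the extras are just a dict merged into the query string, anything can be passed around this way, not only an episode name.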
#15
Eldorado Wrote:I finally got my TV show sources fixed up. I had to create a new add_video method that accepts an optional dict of queries. When I create a video item for an episode, I pass the episode name, so that when a user clicks on the episode I can grab only the HTML portion containing source links for just that episode. It's a unique way this site lays out its TV show content

This might be useful for others, should I submit a patch to update your add_video_item() and add_item() methods?

yeah, please do, as long as it's generic enough that you can pass anything around - maybe it will be useful in other situations too.

t0mm0
