2011-08-29, 23:22
Eldorado Wrote:
First issue:
This page - hxxp://www.free-tv-video-online.me/movies/
Lists the most recently added movies/links to the site, but I am unable to make use of it.
Currently I have an option for 'Latest' that scrapes the page for the links and removes duplicate movie names, but as of now it does nothing beyond that.
What I would like: the list displays as it does now, the user clicks on a movie, and my sources dialog box pops up so they can select one to play.
But I'm not sure how to get from my list of latest movies to their sources. Even if I re-scrape the same page for just the links reported there, I'm a bit lost on how to do so. How do I scrape a second time for the list of links pertaining only to the movie you just clicked on?

i would probably cheat and not bother. in short i don't think it fits nicely into the quite narrow situation urlresolver et al is currently designed for. maybe we can fix that somehow?
once we get caching sorted re-scraping the page would be quick as you wouldn't need to download it again.
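for the re-scrape itself, here is a rough sketch of one way it could work: group every link on the page by movie title in a single pass, then look up just the title the user clicked on. the link markup matched by LINK_RE and both function names are made up for illustration, i haven't checked them against the real page, so the regex will need adjusting.

```python
import re
from collections import OrderedDict

# ASSUMPTION: each link on the page looks like <a href="URL">Title</a>.
# The real markup will differ, so adjust this pattern to match it.
LINK_RE = re.compile(r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>')


def group_links_by_title(html):
    """Map each movie title to the list of link URLs found for it.

    Titles keep the order in which they first appear on the page, so the
    keys can double as the de-duplicated 'Latest' list.
    """
    grouped = OrderedDict()
    for match in LINK_RE.finditer(html):
        title = match.group('title').strip()
        grouped.setdefault(title, []).append(match.group('url'))
    return grouped


def links_for_movie(html, title):
    """Return only the links belonging to the movie that was clicked."""
    return group_links_by_title(html).get(title, [])
```

the keys of the grouped dict would be the 'Latest' list, and links_for_movie(html, title) would feed the sources dialog when a title is clicked.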
Eldorado Wrote:
Second issue:
TV Shows - hxxp://www.free-tv-video-online.me/internet/
Even though I find the site rather well organized, I have an issue with how they handled the TV show section: all shows listed on the page are separated and linked using HTML anchors.
So I'm lost on how to scrape the page and return only the matches within a specified anchor.

you could do one regex to find the section, something like:
Code:
re.search('<a name="%s">(.+?)<a name=' % letter, html, re.DOTALL)
and then do your scraping within that?
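a slightly fuller sketch of that two-step idea, in case it helps. the section regex is the one above with two tweaks: the letter is escaped, and the match is allowed to run to the end of the page so the last anchor section is not lost. the show link pattern and the function names are assumptions, not taken from the real page.

```python
import re


def section_for_anchor(html, letter):
    """Return the HTML between <a name="letter"> and the next named anchor.

    If there is no following anchor, the section runs to the end of the page.
    Returns an empty string when the anchor is not found.
    """
    pattern = r'<a name="%s">(.+?)(?=<a name=|\Z)' % re.escape(letter)
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1) if match else ''


def shows_in_section(html, letter):
    """Scrape (url, title) pairs, but only inside the chosen section."""
    section = section_for_anchor(html, letter)
    # ASSUMPTION: shows are plain <a href="URL">Title</a> links.
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', section)
```

so the second regex only ever sees the chunk of html for the letter you asked for.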
Eldorado Wrote:
Third issue:
Sources dialog box - I'm struggling to find a way to sort the list properly. Ideally I would like all 'Watch Trailer ##' entries at the very top, followed by the sources; alphabetical order would be nice.
Any tips? All the searches I've done say I cannot sort a dict. Any coding suggestions on how to compile my links differently?

yeah you can't sort a dict as a dict doesn't have an order. this is something that needs to be fixed in urlresolver.choose_sources() (i'm on the case - it also interacts with the videoid stuff we were discussing in the other thread)
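in the meantime, a possible workaround on the addon side is to sort the dict's items into a list before showing them. this is only a sketch: it assumes the dict maps a display label to a url, and that trailer labels look like 'Watch Trailer 2'. sort_sources is a made-up helper, not part of urlresolver.

```python
import re


def sort_sources(sources):
    """Return (label, url) pairs with trailers first, then sources A-Z.

    ASSUMPTION: `sources` maps a display label to a URL, and trailer
    labels have the form 'Watch Trailer <number>'.
    """
    def sort_key(item):
        label = item[0]
        trailer = re.match(r'Watch Trailer\s*(\d+)', label)
        if trailer:
            # Group 0 sorts first; trailers are ordered by their number.
            return (0, int(trailer.group(1)), '')
        # Group 1: everything else, case-insensitive alphabetical.
        return (1, 0, label.lower())

    return sorted(sources.items(), key=sort_key)
```

the result is a list of pairs rather than a dict, so the order survives when you build the dialog from it.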
t0mm0