Kodi Community Forum
script.module.urlresolver development - Printable Version



- t0mm0 - 2011-08-19

pieh Wrote:<import addon="script.module.urlresolver" version="0.0.1"/> means that the addon needs script.module.urlresolver version 0.0.1 or higher. You can only have one version of an addon installed (previous versions are stored in addons/packages). If a higher version of the addon is available in any installed repository, xbmc will auto-update it (that is, unless the user turned off auto-updates).

thanks pieh, that nicely clears up how it works. so that means that when changing the public api of our module we need to maintain backwards compatibility, and addons should request the version they were tested with (as any later version should also work).

Eldorado Wrote:The idea in itself is much better than taking a user to yet another screen.. the popup doesn't feel like a screen Smile
yes i like it because you can actually resolve the url for the entry labelled with the name of the video - it means you don't need to do anything messy to keep track of the title, icon, plot etc.
Eldorado Wrote:Youtube:

I scrape this - http://www.youtube.com/v/VihlsPKMh4U&hl=en&fs=1

I need to reformat to be this to work - http://www.youtube.com/watch?v=VihlsPKMh4U&hl=en&fs=1
i consider this a bug. resolver plugins should accept all legit hoster urls. i'll fix that.
Eldorado Wrote:MegaVideo - I usually have the ID rather than the full link

so something like 'MegavideoResolver://VIDEOID' might be useful there. i want to try and work out the best way of doing that kind of thing without making it more complex to write the plugin.

t0mm0.


- Eldorado - 2011-08-20

t0mm0 Wrote:so something like 'MegavideoResolver://VIDEOID' might be useful there. i want to try and work out the best way of doing that kind of thing without making it more complex to write the plugin.

t0mm0.

I'm thinking there could be quite a few suggestions on different ways to launch urlresolver for different uses.

This sounds like a definitely useful idea!

What do you think of having the ability to simply pass in a host name along with the video id? Again I'm being a bit selfish as I'm being lazy in my addon Smile But I can see other addons for sites that use multiple sources possibly benefiting.

The benefit would be that in the addon you would never need to figure out which host your source is on - simply scrape the source name and the video id and pass both into urlresolver. Most sites I have seen state the name of the host where the source is located.

Currently I need to do a bunch of if's to determine the host so that I know what to pass in.


- t0mm0 - 2011-08-20

Eldorado Wrote:What do you think of having the ability to simply pass in a host name along with the video id? Again I'm being a bit selfish as I'm being lazy in my addon Smile But I can see other addons for sites that use multiple sources possibly benefiting.

The benefit would be that in the addon you would never need to figure out which host your source is on - simply scrape the source name and the video id and pass both into urlresolver. Most sites I have seen state the name of the host where the source is located.
yes that sounds better to me. maybe we could add a method like 'valid_host()' to the UrlResolver interface, which returns true if the domain name (or even just the hoster name, such as 'megavideo') matches. that wouldn't be much more work on the plugin author's side, but you could then have a function in urlresolver that takes a domain (or hoster name) and video id as arguments for resolving.

edit: thinking more about it, actually it will be a bit more work than that. there would also need to be a method to resolve by video id. or maybe get_media_url() should accept either a web_url or a video_id (you'd know it was a url if it started with 'http://')?
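
something like this is what i have in mind (a rough sketch only - nothing here is committed and the names might well change):
Code:
#rough sketch only - illustrative names, not committed code
class MegavideoResolver:
    def valid_host(self, host):
        #true if this plugin handles the given hoster name or domain
        return host.lower() in ('megavideo', 'megavideo.com',
                                'www.megavideo.com')

    def get_media_url_from_id(self, video_id):
        #build the canonical web url and resolve it as normal
        web_url = 'http://www.megavideo.com/?v=%s' % video_id
        return self.get_media_url(web_url)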

any thoughts?

earlier today i fixed the youtube stuff you mentioned, and i also abstracted out the source choice dialog code into urlresolver.choose_source() so take a look and see what you think (docs are updated to include the new functions).

t0mm0


- DragonWin - 2011-08-20

Hey T0mm0

Regarding the scraper function for anon proxies, I'm all for putting it into t0mm0.common.net, that was my original hope Big Grin

I'm going to look into writing the code that will scrape it, but as I have never written or modified classes before, my thought was that I would create the functions that do the work, and hopefully you can then put them into t0mm0.common.net?

I have started to mess around with git, but it takes a little while to get used to how it works.

I downloaded urlresolver from git this morning, and when I was testing my solarmovie addon, I noticed that videoweed did not work. Seems that either the plugin was changed / rolled back or they changed their setup. I have modified the plugin to use their api, and it has worked on the links I had to test with.

This is the modified get_media_url function http://pastebin.com/sHNCK4s8

I like the idea of caching support, it would not only lessen the load on the servers, but also the wait time for the users Nod


In regards to also accepting media id's in get_media_url, I guess it depends on how it's implemented. I'm thinking if we just check to see if it's an ID, we could then construct the url we were expecting (as we already know it from building the regexes) from the hostname and ID, and then continue down the function without having to think more about what was delivered to the plugin.


- t0mm0 - 2011-08-20

DragonWin Wrote:Regarding the scraper function for anon proxies, I'm all for putting it into t0mm0.common.net, that was my original hope Big Grin

I'm going to look into writing the code that will scrape it, but as I have never written or modified classes before, my thought was that I would create the functions that do the work, and hopefully you can then put them into t0mm0.common.net?
sounds good to me! i haven't looked at that stuff at all yet.
DragonWin Wrote:I have started to mess around with git, but it takes a little while to get used to how it works.
cool, it can be a bit daunting at first (i am no expert and only started using git when i started writing xbmc stuff) but it is worth sticking with as there are some cool (and useful!) features.

DragonWin Wrote:I downloaded urlresolver from git this morning, and when I was testing my solarmovie addon, I noticed that videoweed did not work. Seems that either the plugin was changed / rolled back or they changed their setup. I have modified the plugin to use their api, and it has worked on the links I had to test with.

This is the modified get_media_url function http://pastebin.com/sHNCK4s8

thank you for fixing that! it seems they actually added some protection to their streams (before, the stream address was right there on the page). i committed your fix (with a couple of minor changes) here - many thanks for the patch. seems we are building up a nice list of contributors to this project now!


DragonWin Wrote:I like the idea of caching support, it would not only lessen the load on the servers, but also the wait time for the users Nod
i played with caching a bit earlier today but ended up not committing anything so far. i implemented proper caching taking into account server headers and everything, but found that most of the servers i was testing with (letmewatchthis etc.) were not set up to allow caching and therefore nothing got cached.

i might instead add optional caching of pages for an addon-specified time regardless of server headers - although this is not technically correct, clearly we can live with cached video listings for a short time.
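
something roughly like this is the idea (just a sketch, not what would actually get committed - the real version would live in t0mm0.common.net):
Code:
#rough sketch of addon-specified caching - not committed code
import time
import urllib2

_cache = {}  #url -> (time fetched, page contents)

def get_cached(url, cache_for=300):
    '''return a cached copy of url if it is younger than cache_for seconds'''
    fetched, contents = _cache.get(url, (0, None))
    if time.time() - fetched < cache_for:
        return contents
    contents = urllib2.urlopen(url).read()
    _cache[url] = (time.time(), contents)
    return contents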
DragonWin Wrote:In regards to also accepting media id's in get_media_url, I guess it depends on how it's implemented. I'm thinking if we just check to see if it's an ID, we could then construct the url we were expecting (as we already know it from building the regexes) from the hostname and ID, and then continue down the function without having to think more about what was delivered to the plugin.

yes i think that could be the way to go. i will work on this video id stuff tonight/tomorrow and see what seems best.

thanks,

t0mm0


- DragonWin - 2011-08-21

I have just started to fiddle around with scraping proxy sites, and well, the lists are very flaky (saying a proxy is up with a good response time only to find it's down).

Is there a way to set the timeout of the net.http_HEAD function? If I'm to try and scrape 10 proxy lists to find a few that are decent, it could take ages before those that don't work time out.

Also, is there a way in python to measure the response time of a net.http_HEAD?

in net.py line 314 you have forgotten a "self" in get_headers() - it should be get_headers(self).

It could be useful to have a plugin test function to see if a plugin for that url exists before adding it ... thinking if I scrape a page and find 10 different hosting sites, but only 8 of those have plugins, it would be nice to do a test in the addon before adding it, rather than the user clicking on it later only to find the movie link ain't working. Something like addon.check_for_plugin(url) returning True / False.

I agree with bypassing the rfc on the caching as long as it's optional for the addon - that way it can be the end user's choice so they can disable it in case it causes issues with a site.

I'm off to bed now ... I'm not fully recovered from the business trip (at times I was wondering if I was on hidden camera, it was that outlandish) Shocked


- t0mm0 - 2011-08-21

DragonWin Wrote:I have just started to fiddle around with scraping proxy sites, and well, the lists are very flaky (saying a proxy is up with a good response time only to find it's down).

Is there a way to set the timeout of the net.http_HEAD function? If I'm to try and scrape 10 proxy lists to find a few that are decent, it could take ages before those that don't work time out.
i'll add that. there are some bugs with HEAD requests at the moment too (they turn into GET requests if there is a redirect on the original URL) that i need to try and work out.

i assume we'll be saving the last known good proxy (for each country) and trying that first?

coupled with the option to enter a proxy, or x-forwarded-for address (not sure where this UI stuff goes, would be nice to have it global), this would hopefully be the last resort anyway so i guess slow is better than nothing Wink

i guess we will also need a way of telling t0mm0.common.net which country the addon (or urlresolver plugin) would prefer to be coming from?

DragonWin Wrote:Also, is there a way in python to measure the response time of a net.http_HEAD?
you could call time.time() before and after the request and take the difference (assuming that all xbmc platforms measure time in small enough increments).
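
something like this (where net is your t0mm0.common.net object):
Code:
#minimal sketch of timing a request
import time

start = time.time()
net.http_HEAD(url)  #the request you want to time
response_time = time.time() - start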

DragonWin Wrote:in net.py line 314 you have forgotten a "self" in get_headers() - it should be get_headers(self).
oops! thanks, fixed.

DragonWin Wrote:It could be useful to have a plugin test function to see if a plugin for that url exists before adding it ... thinking if I scrape a page and find 10 different hosting sites, but only 8 of those have plugins, it would be nice to do a test in the addon before adding it, rather than the user clicking on it later only to find the movie link ain't working. Something like addon.check_for_plugin(url) returning True / False.
see filter_urls(), the new (as of today) choose_source() (which handles everything for you, including the UI) and filter_dict().
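
to give a rough idea of the kind of usage i mean (just a sketch - see the docs for the exact signatures):
Code:
#rough usage sketch - see the docs for the real signatures
import urlresolver

scraped = ['http://www.megavideo.com/?v=ABC123XY',
           'http://www.example.com/unsupported']
#keep only the urls that an installed resolver plugin can handle
supported = urlresolver.filter_urls(scraped)
#or hand the whole list over and let the user pick via the dialog
chosen = urlresolver.choose_source(scraped)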
DragonWin Wrote:I agree with bypassing the rfc on the caching as long as it's optional for the addon - that way it can be the end user's choice so they can disable it in case it causes issues with a site.
yeah i'll add this.

DragonWin Wrote:I'm off to bed now ... I'm not fully recovered from the business trip (at times I was wondering if I was on hidden camera, it was that outlandish) Shocked

sounds like fun Wink

thanks for your comments!

t0mm0.


- t0mm0 - 2011-08-21

Eldorado Wrote:What do you think of having the ability to simply pass in a host name along with the video id? Again I'm being a bit selfish as I'm being lazy in my addon Smile But I can see other addons for sites that use multiple sources possibly benefiting.

i did some playing around and have come up with a proposal....

without changing the interface at all it can be done fairly simply.

i was playing with the videoweed plugin. i changed valid_url() to:
Code:
def valid_url(self, web_url):
    return re.match('http://(www\.)?videoweed\.(es|com)/file/[0-9a-z]+',
                    web_url) or re.match('videoweed.*>.+',
                                         web_url, re.IGNORECASE)


and added the following to the top of get_media_url():
Code:
#construct url from video_id
r = re.search('videoweed.*>(.+)', web_url, re.IGNORECASE)
if r:
    video_id = r.group(1)
    web_url = 'http://videoweed.es/file/' + video_id

so you can call urlresolver.resolve('videoweed>crirmdz3tj116') or urlresolver.resolve('videoweed.com>crirmdz3tj116') for example.

i chose '>' as the separator as it is not valid in a URL. can anyone see a problem with this approach? if not i'll add this facility to the other plugins and document the convention. i think it might also be a good time to make a page in the docs with a list of plugins and what can be used to call them.

let me know what you think.....

t0mm0


- t0mm0 - 2011-08-21

DragonWin Wrote:Is there a way to set the timeout of the net.http_HEAD function? If I'm to try and scrape 10 proxy lists to find a few that are decent, it could take ages before those that don't work time out.
looked at this, and i think you may as well handle it in your own code rather than in t0mm0.common.net.

to set the timeout you just need to do
Code:
import socket
socket.setdefaulttimeout(SECS)
see here
then be ready to catch a socket.timeout exception which will mean a timeout has occurred.
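
so in your scraper you would do something like (net again being your t0mm0.common.net object):
Code:
import socket
socket.setdefaulttimeout(5)  #e.g. give up after 5 seconds
try:
    headers = net.http_HEAD(proxy_url)
except socket.timeout:
    pass  #proxy too slow - skip it and try the next one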

t0mm0


- DragonWin - 2011-08-21

Ahh thanks :-)

unfortunately I'm not at that stage yet, I have hit a bit of an issue with javascript on the page I found. It's the only page I have found that looks to be fairly up to date on the proxies.

Javascript I found
Code:
</script></td></tr></table><script type="text/javascript">h8z6=2391;n4t0=5349;f6x4=4457;g7j0=3397;w3s9=6722;o5b2=6330;c3h8=6005;p6d4=5521;k1a1=5412;d4y5=1599;m3q7r8=0^h8z6;r8h8z6=1^n4t0;s9w3p6=2^f6x4;x4k1e5=3^g7j0;q7p6y5=4^w3s9;j0t0c3=5^o5b2;w3m3q7=6^c3h8;l2z6k1=7^p6d4;e5d4m3=8^k1a1;h8n4t0=9^d4y5;</script>

<script type="text/javascript">document.write("<font class=spy2>:<\/font>"+(r8h8z6^n4t0)+(s9w3p6^f6x4)+(s9w3p6^f6x4)+(r8h8z6^n4t0))</script>

The above should result in 1221, but I'm not getting anywhere with deciphering it.

Also it looks like the first line is "randomly" generated when I reload the page, so I can't just say that "r8h8z6^n4t0" is always 1.

This is the source: http://spys.ru/free-proxy-list/US/

Any suggestions?


No problem, I'll do the timeout in the code I'm writing, thanks for the links Big Grin

I'll look into how I can optimize the proxy selection when I have something to work with, but yes, saving e.g. the last 2 chosen proxies from each country could be a way to speed it up - like having 2 lists where list1 contains previously used proxies and the other is generated from scraping proxy lists, or something similar.
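
That could end up looking something like this (all of these helper names are made up, just to show the flow):
Code:
#rough sketch of the two-list idea - all of these helpers are made up
def pick_proxy(country):
    known_good = load_saved_proxies(country)  #proxies that worked before
    scraped = scrape_proxy_lists(country)     #freshly scraped candidates
    for proxy in known_good + scraped:
        if test_proxy(proxy):                 #e.g. a timed HEAD request
            save_proxy(country, proxy)        #remember it for next time
            return proxy
    return None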


- t0mm0 - 2011-08-21

DragonWin Wrote:Javascript I found
Code:
</script></td></tr></table><script type="text/javascript">h8z6=2391;n4t0=5349;f6x4=4457;g7j0=3397;w3s9=6722;o5b2=6330;c3h8=6005;p6d4=5521;k1a1=5412;d4y5=1599;m3q7r8=0^h8z6;r8h8z6=1^n4t0;s9w3p6=2^f6x4;x4k1e5=3^g7j0;q7p6y5=4^w3s9;j0t0c3=5^o5b2;w3m3q7=6^c3h8;l2z6k1=7^p6d4;e5d4m3=8^k1a1;h8n4t0=9^d4y5;</script>

<script type="text/javascript">document.write("<font class=spy2>:<\/font>"+(r8h8z6^n4t0)+(s9w3p6^f6x4)+(s9w3p6^f6x4)+(r8h8z6^n4t0))</script>

^ is a bitwise XOR (in both python and javascript) which (rather handily!) is its own inverse (so if you do a bitwise XOR twice you end up back where you started).

so for example the first number is found using:
Code:
n4t0=5349
r8h8z6=1^n4t0
we want:
Code:
x = r8h8z6^n4t0
so we can combine to get:
Code:
x = 1^n4t0^n4t0
the 2 bitwise XORs cancel each other out! so we end up with
Code:
x = 1

so basically no calculations are required at all - all you have to do is grab the numbers out of the initial javascript line and it becomes a simple process of substitution to find the port number.

Code:
...
m3q7r8=0^h8z6; r8h8z6=1^n4t0; s9w3p6=2^f6x4; x4k1e5=3^g7j0; q7p6y5=4^w3s9;
j0t0c3=5^o5b2; w3m3q7=6^c3h8; l2z6k1=7^p6d4; e5d4m3=8^k1a1; h8n4t0=9^d4y5
...
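
in python the whole thing boils down to something like this (rough sketch):
Code:
#rough sketch - grab the digit out of each 'name=digit^key' assignment
import re

js = ('m3q7r8=0^h8z6;r8h8z6=1^n4t0;s9w3p6=2^f6x4;x4k1e5=3^g7j0;'
      'q7p6y5=4^w3s9;j0t0c3=5^o5b2;w3m3q7=6^c3h8;l2z6k1=7^p6d4;'
      'e5d4m3=8^k1a1;h8n4t0=9^d4y5;')
digits = dict(re.findall(r'([a-z0-9]+)=(\d)\^', js))
#each (var^key) term in the document.write() is just digits[var]
port = digits['r8h8z6'] + digits['s9w3p6'] + digits['s9w3p6'] + digits['r8h8z6']
print port  #prints '1221'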

i knew all those maths classes would be useful some time Wink

t0mm0


- DragonWin - 2011-08-21

haha awesome thanks :-)


- DragonWin - 2011-08-21

It's getting there, at least I seem to have the US proxies under control now. Below is a sample output of my, ehmm Eek I think it's called a dict in python, but a hash of hashes Cool

As you can see it contains more information than is strictly required right now, but it might come in handy at a later stage, like making a filter on proxytype (HIA = high anonymity, ANM = anonymous, NOA = no anonymity at all).

The socket timeout was set to 2 sec during that run, to also test for timeouts etc.

Code:
{'208.101.63.210:8118': {'proxytype': 'HIA', 'location': 'United States  (Dallas)', 'testtime': 0.312000036239624},
 '205.209.188.235:80': {'proxytype': 'NOA', 'location': 'United States  (Redwood City)', 'testtime': 0.38499999046325684},
 '173.0.50.237:3128': {'proxytype': 'HIA', 'location': 'United States  (Kansas City)', 'testtime': 1.8899998664855957},
 '198.36.222.8:80': {'proxytype': 'ANM', 'location': 'United States  (Rush City)', 'testtime': 0.31799983978271484},
 '204.93.211.219:80': {'proxytype': 'ANM', 'location': 'United States  (Skokie)', 'testtime': 0.41899991035461426},
 '69.114.243.229:8909': {'proxytype': 'HIA', 'location': 'United States  (Brooklyn)', 'testtime': 0.42099976539611816},
 '198.36.222.8:3128': {'proxytype': 'ANM', 'location': 'United States  (Rush City)', 'testtime': 1.0750000476837158}}

With a bit of luck I can reuse the function for all the other countries.


- rogerthis - 2011-08-21

I posted here http://forum.xbmc.org/showthread.php?tid=108110 about passing metadata and watched status through to addons, the way favourites is.

I think both features would be a big advantage to our addons. It would save a lot of work on our side, and it is something that would hopefully be easy enough to implement on their side. It might be that they are presently blocking it for addons, I don't know.

I have put it up as a feature request http://forum.xbmc.org/showthread.php?tid=108304. Please feel free to make any comments you think are needed.


- t0mm0 - 2011-08-21

DragonWin Wrote:It's getting there, at least I seem to have the US proxies under control now. Below is a sample output of my, ehmm Eek I think it's called a dict in python, but a hash of hashes Cool

As you can see it contains more information than is strictly required right now, but it might come in handy at a later stage, like making a filter on proxytype (HIA = high anonymity, ANM = anonymous, NOA = no anonymity at all).
looks like you're making some interesting progress there!
rogerthis Wrote:I posted here http://forum.xbmc.org/showthread.php?tid=108110 about passing metadata and watched status through to addons, the way favourites is.

I think both features would be a big advantage to our addons. It would save a lot of work on our side, and it is something that would hopefully be easy enough to implement on their side. It might be that they are presently blocking it for addons, I don't know.

I have put it up as a feature request http://forum.xbmc.org/showthread.php?tid=108304. Please feel free to make any comments you think are needed.
i don't know enough about the metadata stuff, but the watched status would be really nice. i can't think of a way of doing it in the addon itself other than marking as watched as soon as you press play - with setResolvedUrl() you can't specify which player is used (i posted a thread about that a while ago), so i don't think you can use your own sub-classed player to mark something as watched when it gets near the end. but maybe that would be ok with an option in the context menu to 'mark as unwatched'?

i'm afraid i haven't done much on this code today as i've been slacking Wink

one thing i tried and failed at was using stacking for multi-part videos. it looks like you can't use stack:// on plugin:// urls, and i guess links would expire if you resolved them first. i guess i'll have to do something with playlists, unless someone has any ideas?

t0mm0