Extend the scraper interpreter/engine to scrape for subtitles
#1
I suspect this has been suggested or thought of before...
How about a subtitle scraper, something like the undertexter script and opensubtitles.org? Seeing that the maintainer likes XBMC, we could probably get quite good support from it.
If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
#2
I like the idea, though as I am not a programmer I am not sure whether such an API and subtitle scrapers would be easier or harder to maintain than the current python scripts (on the other hand, the few scripts that exist are not maintained very well by the community as it is).

Is it possible to extend the existing scraper interpreter/engine to download and handle subtitles without too much trouble (I mean it will need to be able to handle unrar and unzip as well)?
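
To illustrate the unzip half of that requirement, here is a minimal sketch in Python. It is not XBMC code: the function name `extract_subtitles` and the list of subtitle extensions are assumptions made for the example, and it only covers zip archives, since Python's standard library has no rar support.

```python
import io
import zipfile

# Assumed list of subtitle extensions, for illustration only.
SUBTITLE_EXTENSIONS = (".srt", ".sub", ".ssa", ".ass", ".smi")


def extract_subtitles(archive_bytes):
    """Return {filename: bytes} for every subtitle file in a zip archive."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # Case-insensitive match on the file extension.
            if info.filename.lower().endswith(SUBTITLE_EXTENSIONS):
                found[info.filename] = archive.read(info)
    return found
```

The function takes the downloaded archive as bytes, so whatever fetches the file from the subtitle site can stay separate from the unpacking step.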

I know that opensubtitles.org has an open API available to the public, and that is what the opensubtitles.org script uses.
http://forum.xbmc.org/showthread.php?tid=31499

PS! Do you mind if I at the same time bump this request for weather scrapers, which I think one could use the same API for?:
http://forum.xbmc.org/showthread.php?tid=28376
I think it would probably be ideal to use the same API/framework for subtitles as for weather.

Gamester17 Wrote: I would like to suggest that the scraper interpreter/engine in XBMC be extended to support the weather forecast function if possible, so that weather.com is scraped using a scraper XML file instead of a hardcoded function.

That way people could also write alternative scrapers for the weather function just like for movies and TV-Shows. I know for example that many people have requested that we use weather.MSN.com or accuweather.com as the source instead of weather.com (accuweather support has been requested many times before as it supports more locations outside the US, even most smaller towns in Europe), but this way the user would select for themselves which source to use for their weather forecasts.

This way the weather scrapers can also be maintained by end-users themselves (as there is no need to be a C++ programmer to use regex), and they can easily be updated on an existing build in case a website changes.

Would be nice to be able to utilize alternative weather forecast sites through scrapers this way, example:
weather.com
accuweather.com
weather.msn.com
HAMweather.com
Weather Underground (wunderground.com)
Intellicast (intellicast.com)
MyForecast (MyForecast.com, powered by CustomWeather)
timeanddate.com (powered by CustomWeather)
Gamester17 Wrote: Yes, I like my idea much better (but it would be better to use a new type of python plugin API instead of the scraper API):
http://forum.xbmc.org/showthread.php?tid=28376

Trac ticket for this feature request, see => http://trac.xbmc.org/ticket/4973
Quote: Create a new python plugin API for weather and make the GUI support several weather sites via those plugins

This is a slightly modified request of this:
http://forum.xbmc.org/showthread.php?tid=28376

It would be better to use python scripts instead of extending the scraper parser as suggested in that forum thread. These should not be python scripts that draw the GUI, but something more like how plugins work in XBMC: they only fetch the data and hand it to XBMC, and XBMC's own GUI displays it natively.

So what I would like to suggest is that a new (python) plugin API be added to XBMC, an API for plugins that work as the back-end scrapers to download weather forecast metadata, instead of having it hardcoded in the source code as XBMC has today.

So XBMC also needs to be able to get that metadata via the API through the plugins and then display it natively in the GUI (using libGUI).
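
The split described above (plugins only fetch data, the host displays it) could look roughly like the sketch below. Everything in it is hypothetical: `WeatherPlugin`, `fetch_forecast`, `register`, `forecast_for` and the field names are invented for illustration and are not an existing XBMC API. The demo plugin returns canned data instead of contacting a weather site.

```python
from abc import ABC, abstractmethod


class WeatherPlugin(ABC):
    """Hypothetical back-end interface: fetch data only, no GUI code."""

    name = "unnamed"

    @abstractmethod
    def fetch_forecast(self, location):
        """Return a list of dicts, one per day, holding plain data fields."""


class StaticDemoPlugin(WeatherPlugin):
    """Stand-in source that returns canned data instead of scraping a site."""

    name = "demo"

    def fetch_forecast(self, location):
        return [
            {"location": location, "day": "today", "high_c": 21, "outlook": "Sunny"},
            {"location": location, "day": "tomorrow", "high_c": 18, "outlook": "Rain"},
        ]


# Registry of available sources; the user would pick one by name.
PLUGINS = {}


def register(plugin):
    PLUGINS[plugin.name] = plugin


def forecast_for(source, location):
    """What the host would call before rendering the result in its own GUI."""
    return PLUGINS[source].fetch_forecast(location)


register(StaticDemoPlugin())
```

With this shape, adding another weather site means writing one more subclass and registering it; the display code never changes.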

This way people could also write alternative plugins for the weather function similar to how the scrapers work for Movies and TV-Shows. I know for example that many people have requested that we use weather.MSN.com or accuweather.com as the source instead of weather.com (accuweather support has been requested many times before as it supports more locations outside the US, even most smaller towns in Europe), but this way the user would select for themselves which source to use for their weather forecasts.

This way the weather scrapers can also be maintained by end-users themselves (as there is no need to be a C++ programmer to use regex), and they can easily be updated on an existing build in case a website changes.

What do you think?

PS! I also think it would be great if someone would like to extend the weather function in XBMC to have the option to go into forecast details for a specific day to get an hourly listing, and options for a longer forecast of, say, 10 days ahead. Also things like motion/animation maps, Doppler radar, and satellite images, which both weather.com and accuweather support (AccuWeather maps and radars are animated GIF images, in gif87a or gif89a image formats).
#3
I'm not too well read up on the scrapers, so I don't know the current limitations, but from what I know it's possible to mix them, so it might be possible to choose a subtitle scraper for each source.
Also, thetvdb has zips or rars on there which the scrapers can handle. I don't know if it's limited to that one, but the code is there, so I suspect it might not be too much work to add it to the other ones as well (again, I haven't looked, so it might be harder than I think).

One of the main advantages of using scrapers instead of the scripts is that we could perhaps have an automatic solution: either try to get the subtitles when loading the library, or autodownload them when we start a movie. It would also be a more generic approach, so it might be easier to add new ones as well.
#4
The scraper API is pretty well documented in the wiki from the scraper author's point of view, which is great.
http://wiki.xbmc.org/?title=Category:Scraper

PS! MeediOS calls its equivalent of a scraper an "importer"; does anyone know if they have importers for subtitles?
For reference, MediaPortal calls its equivalent of a scraper a "grabber"; not sure if it handles subtitle downloading?

#5
Since English isn't my primary language, I often download subtitles from various sites.

My suggestion is a tweak of the current scraper system so that it also downloads subtitles from different sites, based of course on a separate subtitle scraper XML.

I know there are some python scripts that download from different sites, but I haven't got them to work that well.
#6
I'm of the opinion that this should be done when a user requests subtitles, not during the scanning system. With that said, however, there are some subtitle sites that we can actually use to get better IMDb hits, by using their hashing schemes...
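
As an example of such a hashing scheme, the sketch below follows the file hash that opensubtitles.org publishes, as I understand it: the file size plus the sum of the first and last 64 KiB read as little-endian 64-bit integers, truncated to 64 bits. Treat the details as an assumption to verify against their documentation; the function name `opensubtitles_hash` is made up for this example.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024  # bytes read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep the running sum within 64 bits


def opensubtitles_hash(path):
    """Return a 16-hex-digit hash of a video file; raise ValueError if too small."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK_SIZE:
        raise ValueError("file is too small to hash")

    total = size
    with open(path, "rb") as f:
        # Hash the first and the last chunk of the file.
        for offset in (0, size - CHUNK_SIZE):
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            # Add the chunk as unsigned little-endian 64-bit words.
            for (word,) in struct.iter_unpack("<Q", chunk):
                total = (total + word) & MASK_64
    return "%016x" % total
```

Because the hash depends only on the file contents and size, a lookup by hash can identify a movie even when the filename is useless for matching.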
#7
The site www.divxplanet.com is the biggest subtitle site in Turkey. They also publish good ed2k (eMule) movie links. I couldn't write a scraper or script for this site; my knowledge is not enough for that.

Is there anybody who can write one for all Turkish people?

Thanks to the great XBMC team.
#8
we don't have subtitle scrapers. i think you want a script
#9
spiff Wrote:we don't have subtitle scrapers. i think you want a script


Yes, you are right, my fault. We need a script to fetch movie subtitles from this site.

Is there any guide to building one? I will try to write it with my limited knowledge...

I use the r18899 Windows build. I installed the opensubtitles script, but it cannot use files/folders, probably because the paths are written for the Xbox, so I can't test the script. If I could, maybe I could edit it for divxplanet.com.
#10
i'd try to grab some "inspiration" from the opensubtitles one.