2007-10-16, 21:47
I don't know if this has already been suggested (a search for "opensubtitles scraper" here gave me no results), but I think http://www.opensubtitles.org could be a great aid in making a scraper (nothing to do with subtitles as such, although it would be nice to be able to download subtitles with it too).
OpenSubtitles has an algorithm to identify a file (it computes a hash for the file) and, based on that hash, users indicate the IMDB ID and then upload a subtitle. Right now, their database contains IDs for 411,649 files, which translates into 28,561 movies. Every TV show episode is referenced to the IMDB ID of the whole TV show, which is one of the reasons there are so few movies for so many files. There are also many different releases and different (foreign) titles/dubbings of the same movie.
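For reference, here is a minimal Python sketch of that hash as I understand OpenSubtitles documents it: the file size plus the sum of the first and last 64 KiB of the file, read as little-endian unsigned 64-bit integers, truncated to 64 bits and printed as 16 hex digits. The function name `opensubtitles_hash` is my own; check it against their published reference code before relying on it.

```python
import os
import struct

CHUNK = 64 * 1024          # 64 KiB read from the start and from the end
MASK64 = (1 << 64) - 1     # keep the running sum within 64 bits

def opensubtitles_hash(path):
    """Sketch of the OpenSubtitles file hash (assumed algorithm, see above)."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        # The reference code refuses files smaller than two chunks.
        raise ValueError("file too small to hash: %d bytes" % size)
    h = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK):
            f.seek(offset)
            data = f.read(CHUNK)
            # Interpret each 8-byte block as a little-endian uint64 and add it.
            for (word,) in struct.iter_unpack("<Q", data):
                h = (h + word) & MASK64
    return "%016x" % h
```

Because only the size and 128 KiB of data are read, the hash is cheap to compute even for large files, but it is not a cryptographic hash, so two different files can in principle collide.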
It is of course not exhaustive, but it could be really valuable as an aid when a file is not correctly identified (because it is badly named, because several movies share the same title, or because it doesn't carry the English title).
Also, storing the hash in the database would make it possible to identify the movie even if the file is moved to another path or renamed, since the hash is computed from the file's size and content, not its name. The stored hash could also be used by a script to get the subtitles, without needing to calculate the hash in Python over a Samba link, which I think is problematic...
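To illustrate the "moved or renamed" case, here is a minimal sketch using Python's built-in `sqlite3`. The table layout and the helper names `open_db`, `remember` and `relocate` are hypothetical (this is not XBMC's actual database schema), and the IMDB IDs in the usage example are placeholders.

```python
import sqlite3

def open_db(path=":memory:"):
    # Hypothetical table keyed by the file hash rather than by its path.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " hash TEXT PRIMARY KEY, path TEXT NOT NULL, imdb_id TEXT)"
    )
    return conn

def remember(conn, file_hash, path, imdb_id):
    # Store (or overwrite) what we know about a file with this hash.
    conn.execute(
        "INSERT OR REPLACE INTO files (hash, path, imdb_id) VALUES (?, ?, ?)",
        (file_hash, path, imdb_id),
    )
    conn.commit()

def relocate(conn, file_hash, new_path):
    """Return the IMDB ID of an already-known file found at a new path,
    updating the stored path; return None if the hash is unknown."""
    row = conn.execute(
        "SELECT imdb_id FROM files WHERE hash = ?", (file_hash,)
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE files SET path = ? WHERE hash = ?", (new_path, file_hash))
    conn.commit()
    return row[0]
```

For example, after `remember(conn, "0000000000020000", "/movies/a.avi", "tt0000001")`, a later scan that finds the same hash at `/movies/b.avi` gets `"tt0000001"` back from `relocate` without any scraping.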