Scan external subtitle files to video library (not only subs inside video container)
#1
Hi all,

Since I would also like the video library to hold subtitle information for external subtitle files (as feature request http://trac.xbmc.org/ticket/8472 suggests), I browsed through the code a bit to see whether this would be very hard to implement, and concluded that it should be possible (even for me). I'm not really familiar with the XBMC code, but I was thinking of adding the code in

Code:
CDVDFileInfo::DemuxerToStreamDetails(CDVDDemux *pDemux, CStreamDetails &details, const CStdString &path)

where the path of the file should be available, so the subtitle files should be detectable. Then I could probably reuse some code from

Code:
CDVDFactorySubtitle::GetSubtitles
or
Code:
CUtil::CacheSubtitles

Could someone with more experience comment on whether I'm looking in the right direction?

Thanks!
#2
That's basically the right direction. When I wrote the hdflagging stuff, I looked at including external subtitles as well. I was sort of pressed for time so this feature was "delayed" to get hdflagging usable, and I've never had time to go back to it.

To answer your question, yes, you do want to use CUtil::CacheSubtitles, however that function also copies the subtitle file out of its location (be it in a subdirectory, or inside a ZIP/RAR file, etc) into the "cache". The cache, if I remember correctly, is a directory that contains support files for the currently playing media. Obviously, this doesn't work very well for the scanner threads, considering there can be multiple of them and someone might also be watching a video while the scan is taking place.

So what would have to be done is to split the caching out of the CacheSubtitles() function, probably turning it into something more like a ProbeSubtitles() that returns, say, a vector<CStdString> of the files it found. There are problems here as well.
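A minimal sketch of that probe-only split, assuming the caller already has the directory listing as plain strings. `ProbeSubtitleCandidates` and the extension list are hypothetical, not existing XBMC code; the function only collects names (nothing is copied or cached), matching e.g. `movie.srt` and `movie.eng.srt` against `movie.avi`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of s (ASCII only) so matching is case-insensitive.
static std::string ToLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Strip the last extension: "movie.avi" -> "movie".
static std::string StripExtension(const std::string& name)
{
  const std::string::size_type dot = name.rfind('.');
  return dot == std::string::npos ? name : name.substr(0, dot);
}

// Hypothetical prober: returns the entries that look like external subtitles
// for videoName, i.e. "<stem>.<ext>" or "<stem>.<something>.<ext>" where <ext>
// is a known subtitle extension. It only returns names; nothing is copied.
std::vector<std::string> ProbeSubtitleCandidates(const std::string& videoName,
                                                 const std::vector<std::string>& dirEntries)
{
  // Assumed extension list, for illustration only.
  static const std::vector<std::string> kSubExts = {".srt", ".sub", ".idx", ".ssa", ".ass", ".smi"};
  const std::string stem = ToLower(StripExtension(videoName));
  std::vector<std::string> found;
  for (const std::string& entry : dirEntries)
  {
    const std::string lower = ToLower(entry);
    const std::string::size_type dot = lower.rfind('.');
    if (dot == std::string::npos)
      continue;
    const std::string ext = lower.substr(dot);
    if (std::find(kSubExts.begin(), kSubExts.end(), ext) == kSubExts.end())
      continue;
    const std::string base = lower.substr(0, dot);  // e.g. "movie.eng"
    // Exact stem, or stem followed by a dot ("movie2.srt" must not match).
    if (base == stem || base.compare(0, stem.size() + 1, stem + ".") == 0)
      found.push_back(entry);
  }
  return found;
}
```

Returning the original entry strings (rather than lower-cased copies) keeps them usable as real paths later.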

If you don't copy subtitles to a common location, then when the media is actually played, subtitles stored inside a RAR/ZIP would require the player to re-open the archive and copy them out, which could be a costly operation. Also, I'm not very familiar with subtitle formats, but I assume even the stream probe would need to open them to determine the language. So I guess you'd need to copy any that are inside an archive to some common place for probing, leave real files where they are, and clean up the copies after you probe them. The copies can't go to "cache" for the reason above, so they'll have to be copied to XBMC's temp directory, I reckon.
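The decision of which candidates need a temporary copy and which can be probed in place could look roughly like this. It assumes archive members are addressed with `rar://` or `zip://` URLs; `ProbePlan` and `SplitForProbing` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// How each subtitle candidate must be probed.
struct ProbePlan
{
  std::vector<std::string> probeInPlace;   // plain files: read where they are
  std::vector<std::string> needsTempCopy;  // inside an archive: copy to temp, probe, delete
};

// True if path begins with prefix.
static bool StartsWith(const std::string& path, const std::string& prefix)
{
  return path.compare(0, prefix.size(), prefix) == 0;
}

ProbePlan SplitForProbing(const std::vector<std::string>& candidates)
{
  ProbePlan plan;
  for (const std::string& path : candidates)
  {
    // Assumption: archive members use rar://... or zip://... URLs.
    if (StartsWith(path, "rar://") || StartsWith(path, "zip://"))
      plan.needsTempCopy.push_back(path);
    else
      plan.probeInPlace.push_back(path);
  }
  return plan;
}
```

The `needsTempCopy` entries would then be extracted to a temp location (not the playback cache) and deleted once probed, as described above.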

Also, I think there was an issue with the subtitle reader returning the wrong kind of language code: 3-character vs 2-character vs long name or something, I forget.

As for where to put your code, I think it would go up the call stack from CDVDFileInfo::DemuxerToStreamDetails, in the... was it called the thumb generator thread?
For troubleshooting and bug reporting please make sure you read this first.
#3
Thanks for your reply, I'll try to split it into probing and caching.

If we extract subtitle archives to a temp dir in the probing function, I guess we'll also have to do that probing each time you play a movie file. That's probably better than relying on what's saved in the library anyway. Should it also update the library each time that happens, or would that generate too much overhead? Anyway, I'll see how far I get and keep you up to date. It will probably take some time (next weekend at the very earliest).
#4
the caching of subs is just a remnant from the xbox days. there is no reason why we have to do it these days, and i would much prefer a list of urls being passed from the subtitle prober.
#5
sergej Wrote:If we extract subtitle archives in the probing function to a temp dir, I guess we'll have to do that probing also each time you play a movie file. That's probably better than if we rely on what's saved in the library anyway.

IMHO, that is the best course of action: probing for external subs every time the file is played, maybe just to check whether they have changed from what the db says.

Quote:Should it also update the library each time that happens? Or would that generate too much overhead? Anyway, I'll see how far I get and keep you up-to-date. It will probably take some time (next weekend at the really earliest).

Also, I don't think it will hurt much if the db is updated each time the file is played and new subs are found. I think that is how the other stream details are handled (although they are probed in the background).
#6
sergej Wrote:If we extract subtitle archives in the probing function to a temp dir, I guess we'll have to do that probing also each time you play a movie file. That's probably better than if we rely on what's saved in the library anyway. Should it also update the library each time that happens? Or would that generate too much overhead?
Well the player already does the probe every time you play a file, which is where the CacheSubtitles() function is currently called. That would stay the same, except for the fact that it shouldn't cache at all any more as per spiff's recommendation (do anything this man says).

Also, if you look in... somewhere like Application.cpp, I forget exactly, you'll see that when a video is stopped the stream details are stored before the file is closed (not always, though; I think only if the background probe is disabled or something). If you want to update the database with an updated list of subtitles, this would probably be the place to do it.
#7
Thanks for your input. It's going OK so far; however, when parsing the vobsub subtitles I get 2-character language codes ("en", "fr", ...), but in the mkv stream details, for example, they are 3 characters ("eng", ...). In what format should they be stored in the database? Is there some conversion table, or possibly already some code in XBMC?

As for SubRip or similar subtitles (e.g. movie.english.srt), do you have any ideas how I should parse the filename to get the correct language? Users could have named them .english.srt, .en.srt, .eng.srt... or what does XBMC currently assume? I guess something like:
"en" or "eng" or "english" -> store as "eng" in database (if the 3-char is indeed the one to use) would be reasonable.
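Pulling the candidate language token out of the filename is the easy part. A minimal sketch with a hypothetical helper on plain `std::string`: it returns the segment between the video stem and the extension, or an empty string for `movie.srt`. Keeping only the last segment of something like `movie.cd1.eng.srt` is a design assumption, not existing XBMC behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: for "movie.english.srt" with stem "movie" returns
// "english"; for "movie.srt" returns "" (no language in the filename).
// Assumes subName starts with "<stem>." (e.g. it came from a stem-matching probe).
std::string LanguageTokenFromFilename(const std::string& subName, const std::string& stem)
{
  const std::string::size_type extDot = subName.rfind('.');
  if (extDot == std::string::npos || extDot <= stem.size())
    return "";  // nothing between the stem and the extension
  // Everything after "<stem>." and before ".<ext>".
  std::string token = subName.substr(stem.size() + 1, extDot - stem.size() - 1);
  // If several segments remain ("cd1.eng"), keep only the last one.
  const std::string::size_type lastDot = token.rfind('.');
  if (lastDot != std::string::npos)
    token = token.substr(lastDot + 1);
  return token;
}
```

The returned token still has to be normalized to a 3-character code before it goes into the database.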
#8
Yeah, I standardized on 3-character codes for audio languages, since that's what most containers that carry language info use and what the latest ISO spec defines. The ISO spec is 639-2. There is utility code in XBMC for converting between them; see CLangCodeExpander in xbmc/utils/LangCodeExpander.cpp.

I can't remember for sure, but I also think there is utility code for determining the language of the file, which returns a 2-character code, because that's what language/[name]/langinfo.xml uses to match. You may want to ask some of the actual developers about standardizing on 3-character codes across the board? Someone might take my head off for suggesting that, though. Either way, the info in the streamdetails database should be 3-character, as that's what skinners and users of language-criteria smart playlists currently expect.
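To illustrate the normalization (not CLangCodeExpander's actual API), here is a tiny table-driven sketch that accepts a 2-character code, a 3-character code or an English name and returns the ISO 639-2 code. The table is an illustrative excerpt only; it uses the ISO 639-2/B ("bibliographic") forms such as `ger` and `fre`, which is the form Matroska stores, while ISO 639-2/T has `deu`/`fra` for the same languages.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Illustrative excerpt: ISO 639-1, ISO 639-2/B and lower-case English name.
struct LangEntry { const char* iso6391; const char* iso6392; const char* name; };
static const LangEntry kLangs[] = {
  {"en", "eng", "english"},
  {"fr", "fre", "french"},
  {"de", "ger", "german"},
  {"nl", "dut", "dutch"},
};

// Lower-case a copy of s (ASCII only).
static std::string ToLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Returns the ISO 639-2 code for a 2-letter code, 3-letter code or English
// name, or "" if the token is not in the table.
std::string ToIso6392(const std::string& token)
{
  const std::string t = ToLower(token);
  for (const LangEntry& e : kLangs)
  {
    if (t == e.iso6391 || t == e.iso6392 || t == e.name)
      return e.iso6392;
  }
  return "";
}
```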
#9
There is a languagecodes setting in advancedsettings.xml which could help convert some "custom" language codes for the DB, I guess.

Quote: <languagecodes>

Translation table for subtitle and audio names. Contains entries of the form

<code><short>alt</short><long>Alternate</long></code>

So if a user has some non-standard language codes in subtitle filenames, they could just define them here to get them imported into the DB, if they aren't already handled by some predefined conversion table in code.

Let's say:
<code><short>en</short><long>English</long></code>
<code><short>eng</short><long>English</long></code>
<code><short>engl</short><long>English</long></code>

would translate
moviename.en.srt, moviename.eng.srt, moviename.engl.srt
to English, which would then be converted to a 3-character code by a predefined language table implemented in code (English -> eng) and saved to the DB.

Just an idea...
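The two-step chain described above (custom short code -> long name -> 3-character code) could be sketched like this. `UserLangCodes` is a hypothetical in-memory form of the `<languagecodes>` entries (keys stored lower-cased), and the built-in table is an illustrative excerpt, not XBMC's real one.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical in-memory form of <languagecodes>: lower-cased <short> -> <long>.
using UserLangCodes = std::map<std::string, std::string>;

// Built-in English name -> ISO 639-2 table (tiny illustrative excerpt).
static const std::map<std::string, std::string> kNameToIso6392 = {
  {"english", "eng"}, {"french", "fre"}, {"german", "ger"},
};

// Lower-case a copy of s (ASCII only).
static std::string ToLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// 'engl' -> user table -> 'English' -> built-in table -> 'eng'.
// Returns "" if either lookup fails.
std::string ResolveViaUserCodes(const std::string& token, const UserLangCodes& user)
{
  const auto u = user.find(ToLower(token));
  if (u == user.end())
    return "";
  const auto n = kNameToIso6392.find(ToLower(u->second));
  return n == kNameToIso6392.end() ? "" : n->second;
}
```

The second lookup is why the long name has to be one the built-in table knows: a user entry mapping to an unknown name resolves to nothing.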
#10
I didn't know about the languagecodes setting, but the existing code uses it, so it is kinda built in if you leverage the existing code to determine the language of the external subtitle file.

It is good to know that case is handled in the future when a user pops up with the feature request "I have hundreds of subtitle files with the name of the language as 'en-US' and refuse to rename them." You fool! en-US is a locale not a language!
#11
Just to give you a small update, and possibly get some input:

The code seems to be working now, with support to grab the language from the filename, e.g. movie.eng.srt, movie.en.srt or movie.english.srt. It should also work (not tested yet) for user-defined codes in advancedsettings, like:
Code:
<code><short>engl</short><long>English</long></code>
but only if the long name can be found in the predefined ISO 639-1 or ISO 639-2 tables. It then looks up the 3-character ISO 639-2 code and saves that to the library (e.g. 'engl' -> 'English' -> 'eng'). I don't have a better solution for this. Ideally, users would define their codes differently, like:
Quote:<code><custom>engl</custom><standard>eng</standard></code>
but I guess it will do for now. When the ISO 639-2 code cannot be determined, whatever is in movie.whatever.srt is saved to the library as-is.

When no language is specified in the filename (e.g. movie.srt), the value "default" is saved to the library. I'm not sure whether this should be converted to the standard subtitle language during scanning and saved to the library that way; I'd rather it be done while playing the file. Otherwise, if the setting is wrong the first time, the whole library would have to be rescanned. What do you guys think?
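The play-time option could be as small as the following sketch: the library keeps the "default" placeholder and it is only mapped to the user's preferred subtitle language when the file is played, so changing that setting later needs no rescan. `EffectiveSubtitleLanguage` and its `preferredSubLang` parameter are hypothetical stand-ins for whatever setting XBMC would actually read.

```cpp
#include <cassert>
#include <string>

// Placeholder stored by the scanner when the filename carries no language.
static const std::string kDefaultLang = "default";

// Hypothetical play-time resolution of the stored language value.
std::string EffectiveSubtitleLanguage(const std::string& stored,
                                      const std::string& preferredSubLang)
{
  if (stored == kDefaultLang && !preferredSubLang.empty())
    return preferredSubLang;
  return stored;  // a real code (or an unresolved raw token) is kept as-is
}
```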

For vobsub files (.idx / .sub ), languages are extracted from the stream.

TODO:
- if vobsub .sub file is compressed, extract it to temporary location, copy .idx file there, too, and scan for languages. could be a bit time-consuming, so maybe implement an option to disable that.
- update the library each time the movie is played (if a change happened). currently it's just done when it's added to the library
#12
I have exactly one video with subtitles, so I'm not qualified to say whether the "default" value should work one way or another (that's partly another reason I didn't support subs originally).

I can tell you that to update the library when a video is played, you should check xbmc/Application.cpp. Just grep it for "// Save information about the stream if we currently have no data", and you can probably change it to use a bool CStreamDetails::operator==(const CStreamDetails &other) const {} and only save the details if they are different.
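A much-reduced sketch of that "save only if different" check. `SubtitleDetails` is a hypothetical stand-in for CStreamDetails that only tracks subtitle languages; the real class holds video, audio and subtitle streams, so its operator== would have to compare all of them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for CStreamDetails: only subtitle languages.
struct SubtitleDetails
{
  std::vector<std::string> languages;  // ISO 639-2 codes, in stream order

  bool operator==(const SubtitleDetails& other) const
  {
    return languages == other.languages;  // order-sensitive comparison
  }
  bool operator!=(const SubtitleDetails& other) const { return !(*this == other); }
};

// Write back the details probed at playback only when they differ from
// what the library already holds.
bool ShouldSaveStreamDetails(const SubtitleDetails& fromDb, const SubtitleDetails& probed)
{
  return probed != fromDb;
}
```

Because the comparison is order-sensitive, the prober would need to return subtitles in a stable order (e.g. sorted by file name) to avoid needless writes.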
#13
sergej Wrote:TODO:
- if vobsub .sub file is compressed, extract it to temporary location, copy .idx file there, too, and scan for languages. could be a bit time-consuming, so maybe implement an option to disable that.

why? you can read them straight out of the archives, no problem..
#14
That would be great of course, but I'm not sure if the subtitle demuxer supports this.

I'm using the CDVDDemuxVobsub::CDVDDemuxVobsub method, which is also used to play back the subtitles (as far as I could see), with the following code:

Code:
// Open the .idx file with the VobSub demuxer. The auto_ptr keeps ownership,
// so the demuxer is freed automatically when this scope is left.
std::auto_ptr<CDVDDemuxVobsub> demux(new CDVDDemuxVobsub());
if (!demux->Open(strSubtitleFilename))
    return false;

for (int iStream = 0; iStream < demux->GetNrOfStreams(); iStream++)
{
    CDemuxStream *vobSubStream = demux->GetStream(iStream);
    if (!vobSubStream)
        continue;

    CStreamDetailSubtitle *p = new CStreamDetailSubtitle();

    CStdString langCode;
    if (g_LangCodeExpander.ConvertTwoToThreeCharCode(langCode, vobSubStream->language))
    {
        // adding standard ISO 639-2 code
        p->m_strLanguage = langCode;
    }
    else
    {
        // could not find ISO 639-2 code; basically shouldn't happen for any vobsub files
        p->m_strLanguage = vobSubStream->language;
    }

    pStreamDetails->AddStream(p);
}

It works if you pass it the .idx file in 'strSubtitleFilename', when it has a corresponding (extracted) .sub file. It does not work however if the .sub file is inside a rar, at least not with the one I tried. Am I doing something wrong?
#15
AH. my bad, i didn't think of the case where idx and sub aren't in the same folder (that is, only sub is rarred).

this should be remedied somehow. ideally we'd pass both paths, both to the idx and the sub. how that fits in the current scheme i'm not sure, but it will definitely be needed if we plan to remove the sub caching.
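One way to picture "pass both paths": a hypothetical `VobSubPaths` pair that the prober fills by matching an `.idx` with a `.sub` of the same name, wherever that `.sub` lives. It assumes archive members are addressed with a URL whose last `/`-separated segment is the member's file name; none of these names exist in XBMC.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pair of paths handed to a vobsub demuxer instead of only the
// .idx path, so the .sub can live elsewhere (e.g. inside a RAR).
struct VobSubPaths
{
  std::string idxPath;
  std::string subPath;  // empty if no matching .sub was found
};

// File name part of a path or URL (text after the last '/').
static std::string FileName(const std::string& path)
{
  const std::string::size_type slash = path.rfind('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Pair an .idx with the candidate named "<same stem>.sub", regardless of
// which directory or archive that candidate lives in.
VobSubPaths PairVobSub(const std::string& idxPath, const std::vector<std::string>& candidates)
{
  VobSubPaths result{idxPath, ""};
  const std::string idxName = FileName(idxPath);
  const std::string wanted = idxName.substr(0, idxName.rfind('.')) + ".sub";
  for (const std::string& c : candidates)
  {
    if (FileName(c) == wanted)
    {
      result.subPath = c;
      break;
    }
  }
  return result;
}
```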