GSOC 2018- Interested in the project "Intro-Outro Detection"
#46
This is the current draft of my proposal.
https://docs.google.com/document/d/1kPma...sp=sharing
I have worked on it and added to it based on the suggestions given by Razze.
It was great interacting with several Team Kodi members over the last month on IRC and this forum; I hope to carry this interaction forward into the SoC period. :)

Regards.
#47
Thanks a lot to all the Kodi mentors for accepting my proposal.
Looking forward to working with you and Kodi.

Regards.
#48
Thanks for accepting mine too! :)

@mohit-0212 Congratulations!

By the way, I'm in Delhi too and have come to IIIT a bunch of times. We should meet sometime!
#49
The proposal selected for Intro/Outro Detection in GSoC 2018 uses audio fingerprinting. This has to be done for every new TV show that is added to the database, and a human then has to analyse the intro-outro time intervals for that show and enter them into the database (or, every time audio fingerprinting + scene detection predicts the intro-outro timings, the prediction has to be corrected manually). On top of that, many TV shows replay the audio clip from the intro-outro during the episode itself, where there is no intro-outro scene, and audio fingerprinting might not match very accurately.

Another approach to this problem is to build a machine learning model and train it on audio and video features of intro-outro scenes (the marking of intro-outro in the training data needs to be done only once, for training). The model would then work automagically on every movie/TV show, without any human having to analyse the video and enter its intro-outro time intervals into the database. The timing of intro-outro scenes also needs to be taken into account while training the model, since in some movies/TV shows the intro-outro is not at the immediate beginning/end.
#50
Hey @manish_khurana 
Thanks a lot for your suggestion. We can take your approach into account. 

In my view, your idea of building a machine learning model on audio and video features (I'm assuming video frames) can work well for detecting outros, but it might not be good for intros.

Let's assume we build a simple machine learning model for intros. It would essentially be a two-class classifier: intro vs. non-intro. For the intro class, we take the intro (video frames + audio sample) of one show, convert it to some sort of feature vector and add it to our training data, and we repeat this for several shows. To get training data for the negative (non-intro) class, we take random samples of video and audio from several TV shows. So we have (some video frames + audio) over several shows for the intro class and, similarly, (some video frames + audio) over several shows for the non-intro class, and we train a classifier on this data. In my view, a classifier won't be able to distinguish well between the two sets, since both are essentially video frames + audio. Accumulating the intro training set over several shows would eventually make it non-generalizable and indistinguishable from the negative training set. At best, the classifier would overfit the training data.
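To make the two-class setup above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the two features (`music_score`, `brightness`), the toy numbers, and the nearest-centroid classifier are stand-ins, not something proposed or measured in this thread.

```python
from statistics import mean

# Hypothetical per-segment features: (music_score, brightness).
# All values are made-up toy numbers, not measurements.
intro_segments = {
    "show_a": [(0.9, 0.4), (0.8, 0.5)],
    "show_b": [(0.7, 0.6)],
}
random_segments = {
    "show_a": [(0.2, 0.5), (0.3, 0.4)],
    "show_b": [(0.1, 0.6), (0.8, 0.5)],  # last one: a song inside the episode
}

def build_training_set(positives, negatives):
    """Flatten per-show segments into (features, label) pairs."""
    data = []
    for segments in positives.values():
        data.extend((features, 1) for features in segments)
    for segments in negatives.values():
        data.extend((features, 0) for features in segments)
    return data

def train_centroids(data):
    """Nearest-centroid 'training': average the feature vectors per class."""
    centroids = {}
    for label in (0, 1):
        rows = [features for features, y in data if y == label]
        centroids[label] = tuple(mean(column) for column in zip(*rows))
    return centroids

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)

training = build_training_set(intro_segments, random_segments)
model = train_centroids(training)

print(predict(model, (0.2, 0.5)))  # dialogue-like segment
print(predict(model, (0.8, 0.5)))  # in-episode song with intro-like features
```

With these toy numbers the in-episode song lands on the intro side, which is the kind of overlap between the two classes that the argument above is worried about.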
Adding video frames to the training data for intros would not add much value: for both classes, the video frames would mostly be scenes from the TV show itself and thus hard to tell apart. If we take only the audio features into account, this might work, as the classifier might be able to tell apart songs (intro sequence) and non-song audio (the rest). But it might also produce a lot of false positives in cases where there is a song sequence in scenes other than the intro (which can happen).
And if we assume that applying deep learning techniques to the acquired data might eventually work things out, we'd end up having to collect a huge amount of training data, as I don't know whether transfer learning would work with any of the existing architectures for audio and video on this particular problem statement.
The reason I thought it would work for outros is that outros have distinguishable video frames (black screen with credits), so together with the audio a classifier might learn better.
I thought about the above problems with training a proper classifier on video and audio input and thus didn't go with it. I tried out the fingerprinting approach and found it fast and effective. I understand your concern about the database problem. We'll try to figure something out as the project progresses, in discussion with the mentors. Also, the approach in my proposal is a very rudimentary one and is bound to improve iteratively as we proceed.

If you have some positive results with your classifier approach, please share the evaluation protocol, the metrics you used and the results you obtained. I'd be happy to try them out and use them if successful. Thanks a lot for your suggestions.

Regards. :)
#51
Are there any weekly updates for this like the other project thread?

I enjoy reading along with the progress.
#52
Hey @docwra, I did not want to make a long post, so I have compiled them in a Google doc. The compiled weekly updates are at https://docs.google.com/document/d/13ssD...sp=sharing and I will make sure to post them here as well.

Thanks for your interest :)
#53
I have updated the doc https://docs.google.com/document/d/13ssD...sp=sharing for further weekly updates.

Thanks and Regards. :)
#54
Updated the doc https://docs.google.com/document/d/13ssD...sp=sharing with further updates.

Regards :)
#55
Just out of interest, could you perhaps explain in pseudocode how the detection works? What is it looking for and how is it triggered?
#56
(2018-07-25, 19:42)docwra Wrote: Just out of interest, could you perhaps explain in pseudocode how the detection works? What is it looking for and how is it triggered?
You need at least two episodes.
We then run scene detection on both of them, which yields the start frame of each scene.
Then we look for (nearly) matching scene frames between both episodes and declare those the intro (if they are in the first 5 minutes) or the outro (if they are in the last 5 minutes).

That's without any settings changed
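
The steps described in this post can be sketched roughly as follows. This is only an illustration under assumptions, not the tool's actual code: it assumes each scene's start frame has already been reduced to an integer perceptual hash, that "nearly matching" means a small Hamming distance, and the names `hamming`, `matching_scenes`, `classify` and `max_distance` are hypothetical. The hashes are made-up toy values.

```python
WINDOW_SECONDS = 5 * 60  # the "first/last 5 mins" default mentioned above

def hamming(a, b):
    """Number of differing bits between two integer frame hashes."""
    return bin(a ^ b).count("1")

def matching_scenes(scenes_a, scenes_b, max_distance=2):
    """Scenes of episode A whose start-frame hash nearly matches any scene of B.

    Each scene is a (start_time_seconds, frame_hash) pair.
    """
    return [
        (t_a, h_a)
        for t_a, h_a in scenes_a
        if any(hamming(h_a, h_b) <= max_distance for _, h_b in scenes_b)
    ]

def classify(scenes_a, scenes_b, duration_a, window=WINDOW_SECONDS):
    """Split the matched scene starts of episode A into intro and outro candidates."""
    intro, outro = [], []
    for t, _ in matching_scenes(scenes_a, scenes_b):
        if t <= window:
            intro.append(t)
        elif t >= duration_a - window:
            outro.append(t)
    return intro, outro

# Toy scene lists for two episodes (made-up hashes).
episode_a = [(0, 0x0F0F), (60, 0xABCD), (95, 0x1234), (700, 0xFFFF), (1250, 0x00F0)]
episode_b = [(0, 0xF0F0), (45, 0xABCF), (80, 0x1234), (650, 0x5555), (1240, 0x00F1)]

print(classify(episode_a, episode_b, duration_a=1300))  # ([60, 95], [1250])
```

Passing a different `window` value is how a sketch like this would widen or narrow the search range; whether and how the real tool exposes that is not shown here.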
#57
(2018-08-05, 10:51)Razze Wrote:
(2018-07-25, 19:42)docwra Wrote: Just out of interest, could you perhaps explain in pseudocode how the detection works? What is it looking for and how is it triggered?
You need at least two episodes.
We then run scene detection on both of them, which yields the start frame of each scene.
Then we look for (nearly) matching scene frames between both episodes and declare those the intro (if they are in the first 5 minutes) or the outro (if they are in the last 5 minutes).

That's without any settings changed

Can those timings be configured somehow in the current state? A lot of shows start with scenes from previous episodes, then a little bit of the new episode, then the intro, then the episode. I tried the tool with two TV shows (Money Heist and The Americans), and while "intros" were detected, they were wrongly positioned. The tool detected a common set of frames at the very beginning of all files (correct according to the description) but not the real intro. After your explanation I now understand why.
#58
(2018-08-05, 15:57)enen92 Wrote:
(2018-08-05, 10:51)Razze Wrote:
(2018-07-25, 19:42)docwra Wrote: Just out of interest, could you perhaps explain in pseudocode how the detection works? What is it looking for and how is it triggered?
You need at least two episodes.
We then run scene detection on both of them, which yields the start frame of each scene.
Then we look for (nearly) matching scene frames between both episodes and declare those the intro (if they are in the first 5 minutes) or the outro (if they are in the last 5 minutes).

That's without any settings changed

Can those timings be configured somehow in the current state? A lot of shows start with scenes from previous episodes, then a little bit of the new episode, then the intro, then the episode. I tried the tool with two TV shows (Money Heist and The Americans), and while "intros" were detected, they were wrongly positioned. The tool detected a common set of frames at the very beginning of all files (correct according to the description) but not the real intro. After your explanation I now understand why.

Sounds like you want to match with the method "longest_common"; have a look at the parameters the tool offers.
I thought we wanted that as the default, but in the current code base it's not the default matcher, and I'm not sure why we decided on that. We should look at that again.
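
How "longest_common" is implemented in the tool is not shown in this thread. Going only by the name, one plausible reading is "pick the longest run of consecutive scenes that match in both episodes", which would skip a short shared recap card and land on the real intro. The sketch below illustrates that idea; the function name `longest_common_run`, the `max_distance` threshold and the toy hashes are all assumptions.

```python
def hamming(a, b):
    """Number of differing bits between two integer frame hashes."""
    return bin(a ^ b).count("1")

def longest_common_run(hashes_a, hashes_b, max_distance=2):
    """Longest run of consecutive scenes that nearly match in both episodes.

    Returns (start_index_a, start_index_b, length), using the classic
    longest-common-substring dynamic programme with a fuzzy equality test.
    """
    best = (0, 0, 0)
    prev = [0] * (len(hashes_b) + 1)
    for i, h_a in enumerate(hashes_a, start=1):
        curr = [0] * (len(hashes_b) + 1)
        for j, h_b in enumerate(hashes_b, start=1):
            if hamming(h_a, h_b) <= max_distance:
                curr[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best

# Toy scene hashes (made up): both episodes open with the same recap card,
# followed by different recap scenes, then a three-scene intro.
episode_a = [0x00FF, 0x0F0F, 0x3333, 0x5555, 0x0FF0, 0x33CC]
episode_b = [0x00FE, 0xF0F0, 0xCCCC, 0x3333, 0x5557, 0x0FF0, 0xAAAA]

print(longest_common_run(episode_a, episode_b))  # (2, 3, 3)
```

Here the single matching recap card at index 0 is ignored in favour of the three-scene run starting at scene 2 of the first episode and scene 3 of the second.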
#59
This is so awesome!

Will this eventually be written as a Kodi addon that can be run from the box on which Kodi is run (and the storage connected to it)? I envision having this run within Kodi each time I update the library with new TV show episodes and having the List View of the skin show the revised play-duration (minus the intro & outro).

I currently have a Mi Box with an external drive. I'd like eventually to get a Vero 4K+ - in either case, I'd like to run notrobo from the Kodi-playing-box instead of a separate computer.

Or would this make sense to include in a separate media-management app that maintains a media DB, scrapes info from TVDB, downloads posters, and scans/creates the EDL for each episode, then transfers the entire thing to the Kodi box? (This would be nice as I don't like to have my Kodi box connected to the internet.)

Looking forward to being able to use this! Great work everyone!
#60
Well, what you can do now is automate it: if you have a tool that helps you with renaming, you could just add the new tool to the chain.
But it's a command line tool, to make it easy to have it at the core of other programs, e.g. a program with a UI or Kodi itself, if we can handle the reliance on FFmpeg somehow.