GSOC 2018- Interested in the project "Intro-Outro Detection"
#1
Hi

I am an undergrad student in Computer Science.
I found this project quite interesting as it can improve the user experience of the Kodi platform by adding a simple utility of auto-skipping intros. Also, doing this involves machine learning which aligns well with my interests. I'd like to work on this and know more about your requirements and how to start.

Regards.
#2
Well, the way I envisioned it is as a Python addon that asks some backend service on the internet for a machine learning model for a certain show/movie. If the service has one, the addon downloads it and applies it to a file.

So the work would be:
- Doing the backend service, which I would do last, as it has the lowest risk and unknown variables
- Looking at Python addons to see if we have enough hooks into Kodi to solve this -> if we have, write our addon
- If that turns out not to be a solution, either add the missing parts in C++ or write the whole thing in C++. So you could also start by looking at the Kodi core code.
- Create something to train a model from a file easily and upload it; we might even be able to use something that's already done for this

So you should probably start by looking at how to write Python addons, the API possibilities and maybe even the C++ code enabling that.
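
A minimal sketch of the lookup flow described above (ask for a model, download it if one exists, reuse it afterwards). Everything named here is an assumption for illustration: `fetch_model` is a hypothetical stand-in for the backend request, and the `.model` file suffix and cache layout are made up. It is not an existing Kodi or backend API.

```python
import os
import tempfile


def get_model_path(show_id, cache_dir, fetch_model):
    """Return a local path to the model for show_id, or None if there is none.

    fetch_model is a hypothetical callable standing in for the backend
    request: it takes a show id and returns the model as bytes, or None.
    """
    path = os.path.join(cache_dir, show_id + ".model")
    if os.path.exists(path):
        return path  # already downloaded, no need to ask the backend again
    data = fetch_model(show_id)
    if data is None:
        return None  # the backend has no model for this show
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    # A dict plays the role of the backend in this demo.
    fake_backend = {"one-piece": b"model-bytes"}
    with tempfile.TemporaryDirectory() as cache:
        print(get_model_path("one-piece", cache, fake_backend.get))
        print(get_model_path("unknown-show", cache, fake_backend.get))
```

Passing the fetch step in as a callable keeps the addon logic testable without a network, which fits doing the backend service last.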
#3
Hi

I have cloned the repo and am trying to build it (I faced some issues on Ubuntu 14.04).
 
Quote:- Doing the backend service, which I would do last, as it has the lowest risk and unknown variables.
Agreed. Once we have everything set up, we can move on to the backend service.
 
Quote:- Looking at python addons to see if we have enough hooks into kodi to solve this -> if we have, write our addon.
Can you elaborate a bit on what you mean by this? Being new to the Kodi environment, some help on where to start looking and begin from will really be helpful.

I have started to look at https://kodi.wiki/view/Python_development for Python addon development, as you mentioned.
 
Quote:- Create something to train a model of a file easily and upload it, we might even be able to use something that's already done for this.
Yes, surely we can look at the existing literature on this and recreate it for our use.

As far as I recall, Netflix has this skip intro and credits feature for most of their shows, and I think we are trying to build something similar for the Kodi platform (correct me if I've got it wrong).

Regards.
#4
Hi,
I am an undergraduate student studying I.T. Engineering.
I am quite interested in this project. A program to detect intros is something I have been planning to write for a long time, and it would be great if I could contribute to Kodi.
Further, I am quite interested in the application of ML in the project, and would like to work on it.
#5
(2018-02-16, 12:14)mohit-0212 Wrote:  
Quote:- Looking at python addons to see if we have enough hooks into kodi to solve this -> if we have, write our addon.
Can you elaborate a bit on what you mean by this? Being new to the Kodi environment, some help on where to start looking and begin from will really be helpful.

I have started to look at https://kodi.wiki/view/Python_development for Python addon development, as you mentioned.
 
Quote:- Create something to train a model of a file easily and upload it, we might even be able to use something that's already done for this.
Yes, surely we can look at the existing literature on this and recreate it for our use.

As far as I recall, Netflix has this skip intro and credits feature for most of their shows, and I think we are trying to build something similar for the Kodi platform (correct me if I've got it wrong).

Regards. 
What I mean by hooks: addons can communicate with Kodi in two ways, via the Python interface or via JSON-RPC. Depending on which solution we target, we need different "hooks". So if the Python addon needs to process the video frames, we need to see if we can get them through one of those interfaces. Or is it acceptable/fast enough to ask Kodi for the filename, fetch the file from the Python addon and do all the processing there, then only send the intro times to Kodi? Speaking of which, there is currently no API to set something like an intro or ending. We do have bookmarks/chapters, but that's not 100% the same, so it might be nice to extend that.

Yes, we're essentially building what Netflix has, but I'm pretty sure they set these values by hand for each show, which is okay if you only need to do it once for every episode. But as we don't want to ask a central server for timestamps every time we play something, I think this solution scales better.
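
To make the "ask Kodi for the filename" option concrete, here is a rough sketch of the JSON-RPC requests involved. It only builds and parses the JSON text; `Player.GetActivePlayers` and `Player.GetItem` are existing Kodi JSON-RPC methods, while the helper function names are mine. Inside an addon the request string would typically be passed to `xbmc.executeJSONRPC`, which is not called here so the snippet runs outside Kodi.

```python
import json


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a string."""
    body = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


def active_players_request():
    """Request listing the active players (used to find the player id)."""
    return jsonrpc_request("Player.GetActivePlayers")


def current_file_request(player_id):
    """Request asking for the file path of the item a player is playing."""
    params = {"playerid": player_id, "properties": ["file"]}
    return jsonrpc_request("Player.GetItem", params)


def file_from_response(response_text):
    """Extract the file path from a Player.GetItem response, or None."""
    result = json.loads(response_text).get("result", {})
    return result.get("item", {}).get("file")


if __name__ == "__main__":
    print(active_players_request())
    print(current_file_request(1))
```

Sending the detected intro times back is the open part, since, as said above, there is no API for that yet.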
#6
Maybe start reading up here then :)
https://forum.kodi.tv/showthread.php?tid...pid2705372

Also slightly ML-related: https://forum.kodi.tv/showthread.php?tid=328511
#7
I would definitely be interested in this. I know someone is trying to make a service for Plex that provides this capability. There is a GitHub repo for it found here and a Reddit discussion about it found here
#8
Hi

I was going through some papers I could find related to this topic.
One of the first papers I came across was this: https://dl.acm.org/citation.cfm?id=2661714.2661729

It tries to detect intros/outros using two types of input available to us, video frames and audio signals.
  • Video Frames: There is usually a black screen just after the intro ends, so it averages the greyscale values of each video frame over the first few minutes and then takes the minimum of these per-frame averages. Since the black screen has the lowest intensity, this gives the time when the intro ends.
  • Audio Signals: The black screen above is accompanied by a silent gap of 0.5-1 second. They compute the root mean square (RMS) of the sound energy and the zero-crossing rate (ZCR) over the first few minutes, and if both stay below a certain threshold for 0.5-1 second, that stretch is labelled the silent gap.
The above is for a single episode, but the results can be averaged over the whole series if the intro/outro of that series falls at around the same time in each episode.
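
The audio part can be sketched roughly as below. This is my reading of the description, not the paper's code: the window length and both thresholds (`rms_max`, `zcr_max`) are placeholder assumptions that would have to be tuned, and `samples` is assumed to be a mono signal already decoded to floats in [-1, 1].

```python
import math


def rms(window):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(window, window[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)


def find_silent_gaps(samples, sample_rate, window_s=0.1, min_gap_s=0.5,
                     rms_max=0.01, zcr_max=0.05):
    """Return (start, end) times in seconds where both features stay low."""
    size = max(2, int(round(sample_rate * window_s)))
    n_windows = len(samples) // size
    gaps = []
    start = None
    for i in range(n_windows):
        window = samples[i * size:(i + 1) * size]
        quiet = (rms(window) <= rms_max
                 and zero_crossing_rate(window) <= zcr_max)
        t = i * window_s
        if quiet and start is None:
            start = t  # a quiet stretch begins
        elif not quiet and start is not None:
            if t - start >= min_gap_s:
                gaps.append((start, t))  # long enough to count as a gap
            start = None
    # Close a quiet stretch that runs to the end of the signal.
    end = n_windows * window_s
    if start is not None and end - start >= min_gap_s:
        gaps.append((start, end))
    return gaps


if __name__ == "__main__":
    loud = [0.5, -0.5] * 50  # 1 second of loud signal at 100 Hz
    print(find_silent_gaps(loud + [0.0] * 100 + loud, sample_rate=100))
```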

I tried to implement this in a very primitive manner. The rough script I wrote can be seen here https://gist.github.com/mohit-0212/31ffa...9f845b51a3. It uses only the video frames so far, as I could not figure out a suitable threshold value for the audio signals. I also wrote a separate script for audio signals (https://gist.github.com/mohit-0212/540bf...b88e68ff4b) which for now just plots the RMS sound energy and ZCR, but it could be made useful once the threshold pattern is estimated. Both are written using available Python libraries.

So I tested it (using only video frames) on a few TV show videos I had, and it worked roughly fine for most of them. I took the first 5 minutes of each episode to cover the intro sequence.
One exception was the Breaking Bad intro sequence, which is very short and, most importantly, begins and ends with a black screen, so the intro end time detected for this video was actually the intro's start time.
Another problem was that some episodes 'start' with a black screen, so the time it gave was ~0.x sec, which is not correct. I filtered that out by ignoring the first 15 seconds of the video file.
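
For reference, the darkest-frame search with the 15 second skip and the 5 minute limit boils down to something like the sketch below. It is a simplified stand-in, not the gist itself: frames are assumed to be already decoded to greyscale (rows of 0-255 values), and the function and parameter names are made up for illustration.

```python
def frame_brightness(gray_frame):
    """Mean greyscale value of a frame given as rows of 0-255 pixel values."""
    total = sum(sum(row) for row in gray_frame)
    count = sum(len(row) for row in gray_frame)
    return total / count


def darkest_frame_time(brightness, fps, skip_s=15.0, limit_s=300.0):
    """Time in seconds of the darkest frame between skip_s and limit_s.

    brightness holds one mean greyscale value per frame, in order.
    Returns None if the clip is shorter than skip_s.
    """
    first = int(skip_s * fps)
    last = min(len(brightness), int(limit_s * fps))
    if first >= last:
        return None
    index = min(range(first, last), key=lambda i: brightness[i])
    return index / fps


if __name__ == "__main__":
    # One value per second: black at the very start, black again at 100 s.
    values = [0.0] * 5 + [120.0] * 95 + [3.0] + [120.0] * 50
    print(darkest_frame_time(values, fps=1))
    print(darkest_frame_time(values, fps=1, skip_s=0.0))
```

With the skip in place the opening black frames are ignored, but an intro that both begins and ends with black (the Breaking Bad case) can still return the wrong one of the two.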
Also, looking at the video frames takes some time, and it increases a bit with the video resolution, so we need to figure out how we would like this service to function for a Kodi user (i.e. how to carry out the processing efficiently with minimal (or no) delay for the user).

Overall it was a decent method for the initial attempt and can be refined by further discussions. Or we can come up with a totally different approach, whatever suits our requirements for the platform.
Any input/feedback from your side on this or anything you want me to do/look at would be great.
#9
Thanks for the tips. I'll try to run your scripts vs. the solution I'm using today. Too bad the PDF was behind a paywall. :(
#10
Hey
What sort of solution are you using currently? 
Also, the script was tested only on a handful of episodes of a few series, so I can't guarantee correct results for everything yet; the cases where it doesn't work are mentioned above.
#11
Edit: wrong thread.
#12
(2018-02-24, 01:30)mohit-0212 Wrote: Hey
What sort of solution are you using currently? 
Also, the script was tested only on a handful of episodes of a few series, so I can't guarantee correct results for everything yet; the cases where it doesn't work are mentioned above.

I'm using the audio of the theme song to identify the end of the intro. You can find the code here: https://github.com/Hellowlol/bw_plex

Here's an ffmpeg script that uses the ffmpeg filters blackdetect and silencedetect: https://gist.github.com/Hellowlol/96e4e7...e9f5883fdf
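
For anyone who wants to try the blackdetect/silencedetect combination from Python, a rough sketch follows. It is not the contents of the gist above: the thresholds in `build_command` are placeholder assumptions, and the sample log in the demo is hand-written in the format those two filters print to stderr, not output from a real run.

```python
import re

BLACK_RE = re.compile(r"black_start:\s*(-?[\d.]+)\s+black_end:\s*(-?[\d.]+)")
SILENCE_START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
SILENCE_END_RE = re.compile(r"silence_end:\s*(-?[\d.]+)")


def build_command(path, min_duration=0.5, noise="-50dB"):
    """ffmpeg arguments that run both detectors and discard the output."""
    return ["ffmpeg", "-i", path,
            "-vf", "blackdetect=d={}".format(min_duration),
            "-af", "silencedetect=noise={}:d={}".format(noise, min_duration),
            "-f", "null", "-"]


def parse_intervals(log_text):
    """Return (black, silence) lists of (start, end) tuples from the log."""
    black = [(float(a), float(b)) for a, b in BLACK_RE.findall(log_text)]
    starts = [float(s) for s in SILENCE_START_RE.findall(log_text)]
    ends = [float(e) for e in SILENCE_END_RE.findall(log_text)]
    return black, list(zip(starts, ends))


def overlaps(black, silence):
    """Stretches that are both black and silent."""
    result = []
    for b_start, b_end in black:
        for s_start, s_end in silence:
            start, end = max(b_start, s_start), min(b_end, s_end)
            if start < end:
                result.append((start, end))
    return result


if __name__ == "__main__":
    # Hand-written sample in the filters' log format (illustrative only).
    sample = (
        "[blackdetect @ 0x1] black_start:62.5 black_end:63.4 black_duration:0.9\n"
        "[silencedetect @ 0x2] silence_start: 62.6\n"
        "[silencedetect @ 0x2] silence_end: 63.3 | silence_duration: 0.7\n"
    )
    black, silence = parse_intervals(sample)
    print(overlaps(black, silence))
```

To use it on a real file, the command from `build_command` could be run with `subprocess.run(..., stderr=subprocess.PIPE, text=True)` and the captured stderr passed to `parse_intervals`.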
#13
To be honest, I don't think I will personally be happy with black screen detection. While that might be the quickest way to solve this, it will also fail on quite a few episodes.
The reason I had this idea is One Piece, which always starts with a title card that's basically Japanese text (the episode name) over different backgrounds. So my idea would be to build a model for One Piece and let an AI learn what that card looks like. It would get a training set of those and should be able to figure out the pattern. Then we could take that model and apply it to the first 10 minutes of the show (probably only taking a frame every 10 seconds or so) and figure out where the title card is.
In the case of One Piece it might even be better to do this by audio. Anyway, we will probably need to configure that per show?
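
Roughly, applying such a model would look like the sketch below. The model itself is left out: `score_at` is a hypothetical callable that grabs the frame at a timestamp and returns the model's title-card probability, and the 0.5 threshold is an arbitrary placeholder.

```python
def sample_times(limit_s=600.0, step_s=10.0):
    """Timestamps in seconds: one every step_s over the first limit_s."""
    count = int(limit_s // step_s)
    return [i * step_s for i in range(count + 1)]


def find_title_card(score_at, limit_s=600.0, step_s=10.0, threshold=0.5):
    """Return the sampled time with the highest score, or None.

    score_at is a hypothetical callable (timestamp -> probability that the
    frame at that time is the title card), standing in for the trained model.
    """
    times = sample_times(limit_s, step_s)
    scores = [score_at(t) for t in times]
    best = max(range(len(times)), key=lambda i: scores[i])
    return times[best] if scores[best] >= threshold else None


if __name__ == "__main__":
    # Pretend the title card is on screen from 120 s to 130 s.
    print(find_title_card(lambda t: 0.9 if 120 <= t < 130 else 0.1))
```

One thing to watch: if the title card is on screen for less than the sampling step, a 10 second step can miss it entirely, so the step probably has to be chosen per show as well.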

Or do you think we can find a common factor that solves 90% of shows?
#14
I know black screen detection isn't the best way to do this. I was trying my hand at it by implementing some of the literature available on this. As I also wrote, black screen detection doesn't work for every show; in my case, for example, it failed for Breaking Bad, whose intro itself starts with a black screen.
To be more accurate for each show, we would need a model per show. Your idea seems interesting: we take the title card of an individual show, use it to train our model, and then apply it to upcoming episodes. A proper training and deployment pipeline needs to be developed for that, so that the process for each new show becomes easier.


Quote:Or do you think we can find a common factor that solves 90% of shows?

This depends on how we want the Kodi platform to work. It's a trade-off between better accuracy and faster output. I think we should aim for better accuracy, i.e. developing a model for individual shows, and then try to make this process faster.
#15
Sounds good. Maybe we can even convince some metadata providers like TMDb to ship these models for us if we have a specification that's usable by everybody, not just Kodi. So good API design and a focus on standards in that part would be important.