GSoC 2018 - Interested in the project "Intro-Outro Detection"
#16
Hi,
So I have been running into trouble while building Kodi (I went from 14.04 to Windows and then to 16.04, where I am making progress, so it's no longer a problem), hence, apologies for the delay.
I read up on how we may implement the project. We can try two ways, training on either the video or the audio:
1. Video: we look for frames that are repeated across multiple episodes and mark the corresponding times, but this is more computationally complex (a rough sketch of this idea follows below).
2. Audio: we use audio that is repeated, such as theme songs (intro), or perhaps the absence of music across multiple episodes (credits).
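A rough sketch of idea 1 (not a final design), assuming OpenCV and the ImageHash library; the file names and the hash-distance threshold are placeholders:
Code:
# Hash one frame per second of two episodes and collect the seconds whose
# frames repeat across both episodes (candidate intro material).
# Assumes: pip install opencv-python ImageHash Pillow
import cv2
import imagehash
from PIL import Image

def hashes_per_second(path, max_seconds=600):
    cap = cv2.VideoCapture(path)
    hashes = []
    for sec in range(max_seconds):
        cap.set(cv2.CAP_PROP_POS_MSEC, sec * 1000)
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        hashes.append(imagehash.phash(Image.fromarray(rgb)))
    cap.release()
    return hashes

a = hashes_per_second("episode1.mkv")
b = hashes_per_second("episode2.mkv")
# A Hamming distance of <= 8 counts as "the same shot" here (tunable).
shared = [sec for sec, h in enumerate(a) if any(h - other <= 8 for other in b)]
print(shared)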

As far as I have understood, Kodi can pull data about your locally stored video files (movies or TV shows), so the intro and outro durations need to be stored somewhere and the model trained individually for every TV show.
Are we considering analyzing the episodes in the background, and training on the device itself?
#17
(2018-02-25, 20:09)ssaluja16 Wrote: Are we considering analyzing the episodes in the background, and training on the device itself?
Not training on the device itself, that's too expensive. Analyzing with the finished model on the device is a must, as we can't rely on fixed intro and outro durations.
#18
(2018-02-25, 15:48)mohit-0212 Wrote: I know black screen detection isn't the best way to do this. I was trying my hand at it by implementing some of the literature available on the subject. As I also wrote, black screen detection doesn't work for all shows; in my case, for example, it failed for Breaking Bad, whose intro scene starts with a black screen itself.
To be more accurate for each show, we would have to have a model per individual show. Your idea seems interesting: we take the title card for an individual show, use it to train our model, and then apply it to further upcoming episodes. A proper training and deployment pipeline needs to be developed for that, which would make the process for each new show easier.
 
Quote: Or do you think we can find a common factor that solves 90% of shows?

This depends on how we want our Kodi platform to work: a trade-off between better accuracy and faster output. I think we should aim for better accuracy, i.e. developing a model per individual show, and then try to make this process faster.
You can get pretty decent results if you combine both blackdetect and audio. I have just tried my ffmpeg script on 3 Breaking Bad episodes and it got them all. The boring task is to verify the results :P I used this part: https://github.com/Hellowlol/bw_plex/blo...#L166-L235 Dunno if it's useful for you, but at least it's a starting point. I hope you come up with something awesome I can steal :D
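In case it helps, here is a minimal sketch of that blackdetect-plus-audio combination, driving ffmpeg's blackdetect and silencedetect filters from Python; the thresholds and file name are illustrative placeholders, not the values my script uses:
Code:
# Run ffmpeg's blackdetect and silencedetect filters over an episode and
# print the detected spans. Both filters log their results to stderr.
import subprocess

cmd = [
    "ffmpeg", "-hide_banner", "-i", "episode.mkv",
    "-vf", "blackdetect=d=0.5:pix_th=0.10",
    "-af", "silencedetect=n=-50dB:d=0.5",
    "-f", "null", "-",
]
log = subprocess.run(cmd, capture_output=True, text=True).stderr

for line in log.splitlines():
    if "blackdetect" in line or "silencedetect" in line:
        print(line.strip())
# Spans where black video and silent audio overlap are good candidates for
# the boundary between the intro and the episode proper.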
#19
I think that such a tool should be standalone, independent of Kodi.

Identifying intros could be done by timestamping camera scene changes, which should work for most videos. Fingerprinting the audio might be an option.

The actual jump can be done with an EDL file. The format is pretty simple and can be edited by hand. It could be named something like filename_intro.edl
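For illustration, Kodi's documented EDL format is one edit per line as "start end action", with times in seconds and action 3 marking a commercial break that the player can skip; a hypothetical intro running from 0:05 to 1:30 would be:
Code:
5.00 90.00 3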


I feel like the actual workflow should be like this:
1. The user marks the intro (via EDL) and sets a few parameters.
2. The program analyzes the video (by marking scene changes; a sketch of this step follows below).
3. The program finds the intro (by comparing the frame timestamps with the scene changes of the episodes).
4. The program generates an EDL for each episode.

The resulting EDLs can be put in the directory of the episodes, so that Kodi can use them. Alternatively, they can be shared online.
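A minimal sketch of step 2, assuming ffmpeg's scene-change detection is used as the detector; the 0.3 threshold and the file name are placeholders:
Code:
# List scene-change timestamps: the select expression keeps only frames whose
# scene score exceeds 0.3, and showinfo logs their pts_time to stderr.
import re
import subprocess

cmd = [
    "ffmpeg", "-hide_banner", "-i", "episode.mkv",
    "-vf", "select='gt(scene,0.3)',showinfo",
    "-f", "null", "-",
]
log = subprocess.run(cmd, capture_output=True, text=True).stderr

scene_changes = [float(m.group(1)) for m in re.finditer(r"pts_time:([\d.]+)", log)]
print(scene_changes)
# Comparing these lists across episodes (step 3) should reveal the shared
# intro boundaries.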
#20
(2018-02-25, 22:13)Razze Wrote: Not training on the device itself, that's too expensive. Analyzing with the finished model on the device is a must, as we can't rely on fixed intro and outro durations.

So I have built Kodi on my system (Ubuntu 16.04) and have been playing with it.
About the actual implementation: we just need to push the timestamps of intros and outros to the device and use them to skip, so I'll start looking into implementing add-ons for Kodi as you mentioned in https://kodi.wiki/view/Python_development.
Any advice on getting started with implementing the function?
#21
(2018-02-26, 17:31)ssaluja16 Wrote: Any advice on getting started with implementing the function?
I don't think it's that simple. While we should abstract one layer to basically just take timestamps, I still think we need to compute those on the device with the model, as your file might be different from mine and have its intro in a different place.

If you want to get an addon going fast, check this out: https://github.com/xbmc/generator-kodi-addon
But you will likely still have to read some material before you start with that.
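Just to make the "take timestamps" layer concrete, a rough sketch of the player side of such a service addon, assuming the intro span for the current file is already known (the timestamps are placeholders; xbmc is Kodi's standard Python module):
Code:
# Skip a known intro during playback. Runs inside Kodi as a service addon;
# the intro span would come from the detection model's output.
import xbmc

INTRO_START, INTRO_END = 5.0, 90.0  # placeholder timestamps in seconds

player = xbmc.Player()
monitor = xbmc.Monitor()

while not monitor.abortRequested():
    if player.isPlayingVideo():
        # Once playback enters the intro span, jump past it.
        if INTRO_START <= player.getTime() < INTRO_END:
            player.seekTime(INTRO_END)
    monitor.waitForAbort(1)  # poll once a second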
#22
(2018-02-25, 15:53)Razze Wrote: Sounds good, maybe we can even convince some metadata providers like tmdb to ship these models for us if we have a specification that's usable for everybody, not just Kodi. So good API design / focusing on standards on that part would be important.
Hey Razze,

I have posted a proposal for the project of enhancing the user experience in the x86 variant of Kodi. I would just request that you kindly comment on the post; I'll be thankful. The link to the post is as follows:

https://forum.kodi.tv/showthread.php?tid=328959

I currently don't have the option to send a private message, so I opted for this way of getting your attention. Sorry if this was inappropriate.

Regards,
#23
Hey

I tried another approach; the basic implementation can be found here: https://gist.github.com/mohit-0212/e907e...812b9952f4.
What it does is: it reads the video file and the title cover provided to it, then compares one image per second of the video for the first 5 minutes. The image which best matches the title cover returns the highest score. For computing similarity, I have used an already available library method for now. For the title cover images, I took suitable images found on the net, which can be stored as a sort of model for that particular show.
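Since the gist link is abbreviated above, here is a minimal stand-in sketch of that pipeline, assuming OpenCV for decoding and scikit-image's SSIM as the similarity measure (not necessarily the exact library method the gist uses; file names are placeholders):
Code:
# Score one frame per second of the first 5 minutes against a title-card
# image; the best-scoring second marks the title card's position.
import cv2
from skimage.metrics import structural_similarity

title = cv2.imread("title_card.png", cv2.IMREAD_GRAYSCALE)
title = cv2.resize(title, (320, 180))

cap = cv2.VideoCapture("episode.mkv")
best_sec, best_score = None, -1.0
for sec in range(300):  # first 5 minutes
    cap.set(cv2.CAP_PROP_POS_MSEC, sec * 1000)
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (320, 180))
    score = structural_similarity(title, gray)
    if score > best_score:
        best_sec, best_score = sec, score
cap.release()
print("best match at %ds (score %.2f)" % (best_sec, best_score))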
 
(2018-02-26, 03:39)sarbes Wrote: 1. The user marks the intro (via EDL) and sets a few parameters.
It can be improved if we let the user mark the frame up to which they want to skip. This can personalise the skip feature according to the user's needs and also improve the accuracy.

Also, @Razze, in one of your earlier replies you mentioned this:
(2018-02-25, 15:29)Razze Wrote: always start with a title card that's basically Japanese text (the episode name) above different backgrounds.
Can you give examples or names of shows where you know this happens (where the title cards consist of different backgrounds)? The script uses one title cover image, which usually comes at the end of each intro sequence with the title of the show, so the similarity matching function I used will most likely not work if the title card has different backgrounds.
In that case, we can:
(1) train a proper pattern recognition model for that show (one that works across different backgrounds, or detects the show's title text on video frames; a rough sketch of the text route follows below) and use that; however, as a generalised approach this might be expensive for shows which do not have multiple title cards, so we'd need a workaround for that
(2) come up with a totally new approach
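For the title-text route in (1), a rough sketch using off-the-shelf OCR, assuming pytesseract (the frame file, show title, and threshold are placeholders; Japanese title cards would additionally need Tesseract's jpn language data):
Code:
# OCR a candidate frame and fuzzy-match the recognised text against the
# show's title. Assumes pytesseract plus a local Tesseract install.
import difflib
import pytesseract
from PIL import Image

SHOW_TITLE = "one piece"  # placeholder

def looks_like_title_card(frame_path, threshold=0.6):
    text = pytesseract.image_to_string(Image.open(frame_path)).strip().lower()
    return difflib.SequenceMatcher(None, SHOW_TITLE, text).ratio() >= threshold

print(looks_like_title_card("frame_0042.png"))  # hypothetical frame dump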

Your suggestions?

Also, I had already compiled Kodi on my system from source and I'm going through the codebase. I'm reading about addon development, Python scripting for Kodi, generator-kodi-addon, and a bit about accessing the database via Kodi Python. Some tips, links, or proper directions on what to read and try out on the platform would be great.

Thanks.
#24
One example would be One Piece: https://duckduckgo.com/?q=one+piece+titl...&ia=images
This shows some examples.

If we can manage to keep both approaches in one format, I'm perfectly fine with that. Not sure what we could do instead; it would be pretty sad if those shows would not work, as they were my inspiration for this ;)

If you're checking out the codebase, I would suggest reading the Python and JSON-RPC docs, and probably joining us in our IRC Freenode channel at #kodi-gsoc.
#25
Hey @Razze 

Which do you think will be better?
1. We let the user mark the frame up to which they want to skip, using one episode of a TV show, then store that frame locally and use it as a reference for getting intro skip times for the rest of the episodes.
(This will personalise the skip timings for each user; we'll need to make the addon call the code and compute the timings using the stored frame.)
or
2. Make it so the user doesn't have to interact and the addon does everything itself. In that case, supposing we go with the title cover approach, we'd need to retrieve the title cover image (we'd need to create a dataset) using the show name or some metadata associated with it, and then run the code to get the timings.
3. Something else?

Also, I have joined the IRC channel; I will post my queries there regarding the codebase.

Thanks
#26
(2018-03-02, 20:20)mohit-0212 Wrote: Which do you think will be better? [...]
I think that with only one reference title cover it might be very hard to get a good match. So I would actually expect a title cover library or a pre-trained model. Maybe we can even reach out to tvdb/tmdb and they might be able to distribute a trained model per show.
#27
Yes, agreed. One title cover only works for shows which have a consistent title cover across the whole series, so for the example you gave, One Piece, it might not work correctly. For that, we'd need a denser dataset.
Do tvdb/tmdb have such a model or an exhaustive set of title cover images for the majority of shows?

Also, one other question: how can we handle the case where the user watches a show for which we don't have a pre-trained model available?
#28
(2018-03-04, 23:23)mohit-0212 Wrote: Do tvdb/tmdb have such a model or an exhaustive set of title cover images for the majority of shows? [...] How can we handle the case where the user watches a show for which we don't have a pre-trained model available?
1. Tvdb/Tmdb would only be an option for distribution, as they are running an API either way. I don't think they have training data.

2. I don't think we can handle that case without a trained model; it's more likely to do the wrong thing. So do nothing instead, if you ask me.
#29
Quote: I don't think we can handle that case without a trained model; it's more likely to do the wrong thing. So do nothing instead, if you ask me.
Agreed. We can probably give the user an option to request intro/outro times, which reaches us, and then we update our db (if needed). This will make it a continual update process, though.

Q) How large a database can we store on the Kodi server?

Also, I was thinking of trying out an audio fingerprinting approach.
Here our database could look like: { tv_show1: audio1, audio2; tv_show2: audio1; tv_show3: audio1, audio2, audio3; ... }
Then, based on the title of the file the user plays, we get the show name, and using the gallery for that particular show, we match the audio from the file the user plays against it and obtain our intro durations.
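A rough sketch of that matching step, assuming the stored "audio" entries are raw theme clips located by cross-correlation (file names are placeholders, and both files are assumed to be pre-decoded to mono WAV, e.g. with ffmpeg -i episode.mkv -ac 1 -ar 8000 episode.wav):
Code:
# Locate a known intro theme inside an episode's audio track by finding the
# peak of the cross-correlation between the two signals.
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate, episode = wavfile.read("episode.wav")
_, theme = wavfile.read("theme.wav")

episode = episode.astype(np.float32)
theme = theme.astype(np.float32)

# method="fft" keeps this fast on long signals.
corr = correlate(episode, theme, mode="valid", method="fft")
start = np.argmax(corr) / rate
print("intro: %.1fs - %.1fs" % (start, start + len(theme) / rate))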

The same database structure could also be applied to the title card approach by curating an exhaustive set of title cards for the respective shows. Your thoughts on which approach would be better? We can keep the other as a backup/alternative method (if required).

Q) I looked at the basic video addon structure for Kodi and am reading more about it. Can you point me to where I should start looking to get the video frames/audio signal from the file being played in Kodi?

(And I downloaded a few episodes of One Piece, and one of the random episodes I started (ep. 819) had a huge intro, 2.5-3 minutes, so now I can see where you got the inspiration :D )

Thanks.
#30
I still think we should distribute the machine learning model that we've trained for a show. I don't know how big something like that gets, but that's something we need to figure out.

And I also think that machine learning on the audio side of things makes more sense than matching against a strict reference, which might fail as soon as something changes slightly.

As mentioned in #kodi-gsoc, there's the RenderCapture interface, but I don't know if there is something similar for audio at the moment.
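For reference, a minimal sketch of grabbing the current frame with RenderCapture from a Python addon; this is my reading of the API, and the capture size and timeout are arbitrary:
Code:
# Grab the currently rendered video frame via Kodi's RenderCapture.
# Runs inside a Kodi Python addon; yields raw BGRA bytes at the requested size.
import xbmc

capture = xbmc.RenderCapture()
capture.capture(320, 180)        # request a 320x180 capture of the frame
pixels = capture.getImage(2000)  # wait up to 2000 ms for the BGRA buffer
if pixels:
    xbmc.log("captured %d bytes" % len(pixels))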
