Proposal: Voice Commands for Kodi.
#1
Voice Commands for Kodi.

Motivation: I can never find my remote. It could also be used for home automation plugins.


Name: Cole

Forum/e-mail: Posting an e-mail address on a public forum is not a good idea. PM me, please.

Summary: As an alternative to using a keyboard or mouse, I would like to add the ability to control the interface with voice commands.

How I will achieve this:

Step 1: Investigate the pros and cons of publicly available APIs that could be used. An online service is a possibility, but the Google API is limited to 50 calls per day, and having a microphone in the living room that communicates with the outside world makes me want to put on a tin foil hat. Kaldi looks like a good option and appears to support GPU acceleration.
Step 2: Identify a proposed command set and open it to public review and suggestions.
Step 3: Plan and write an extensible API.
Step 4: Implement the commands.
Step 5: Implement search.
Step 6: Background noise filtering: filter out the audio that is currently playing.
Step 7: Testing, testing, testing, and more testing.
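Step 6 is essentially acoustic echo cancellation: Kodi knows exactly what it is sending to the speakers, so it can learn how that audio reaches the microphone and subtract it. Below is a minimal sketch using a normalized LMS (NLMS) adaptive filter. It assumes a hypothetical hook that hands us the playback samples alongside the mic samples (no such Kodi API is implied), and it is pure Python for readability, not speed.

```python
"""Sketch: remove the audio Kodi is playing from the microphone signal.

Assumes access to the exact samples sent to the speakers (the "reference").
That hook is hypothetical; it is not an existing Kodi API.
"""
import random


def nlms_cancel(mic, reference, taps=8, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive filter.

    Learns an FIR estimate of the speaker-to-microphone path from the
    reference signal and subtracts the predicted echo from each mic sample.
    Returns the residual, which should mostly contain the user's voice.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate
        norm = eps + sum(x * x for x in history)
        # Move the weights toward a smaller error, scaled by the input power.
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    """Mean squared amplitude."""
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    random.seed(0)
    n = 4000
    playback = [random.uniform(-1, 1) for _ in range(n)]
    # Simulated room with no one talking: the mic hears only the playback,
    # delayed by 3 samples and attenuated.
    mic = [0.6 * playback[i - 3] if i >= 3 else 0.0 for i in range(n)]
    out = nlms_cancel(mic, playback)
    print(energy(mic[-500:]), energy(out[-500:]))
```

Because nobody is speaking in the simulation, the residual should shrink toward zero once the filter converges; with a real voice present, the voice is what remains.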

What the project will focus on: Training will be the most important part. Efficient training requires a structured set of commands, and a model can be built from that training data. The model will continue to be refined each time a command is issued; this is the important part. For the first few iterations of the commands the user experience will probably suck. In the future this can be solved by shipping models for common languages and dialects with distributions. One option would be to let users opt in to uploading their feature vectors back to a Kodi server. Accuracy will also be greatly improved because we can limit the vocabulary to the metadata in the library.
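One simple way to exploit the library metadata is to snap whatever the recognizer heard onto the closest title that actually exists. This sketch uses Python's `difflib` as a stand-in for proper grammar-constrained decoding inside the recognizer; the titles and the 0.6 similarity cutoff are illustrative assumptions.

```python
import difflib


def match_title(transcript, library_titles, cutoff=0.6):
    """Map a noisy recognizer transcript onto a title that exists in the library.

    Restricting matches to the library's own metadata means a slightly wrong
    transcription can still resolve to the right item.
    """
    lowered = {title.lower(): title for title in library_titles}
    candidates = difflib.get_close_matches(
        transcript.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[candidates[0]] if candidates else None


library = ["The Big Lebowski", "Big Fish", "Breaking Bad", "Fargo"]
print(match_title("the big lebowsky", library))         # The Big Lebowski
print(match_title("quantum physics lecture", library))  # None
```

The second call shows the other half of the benefit: a phrase with no plausible match in the library is rejected instead of being forced onto some random item.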

This will improve on other implementations by using deep neural networks rather than GMMs.
http://static.googleusercontent.com/medi.../41176.pdf

This will also be an "always on" implementation. A wake phrase such as "OK Kodi" will precede each requested action.
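The gating logic for an always-on listener can be sketched at the transcript level: anything not preceded by the wake phrase is ignored. Real systems typically run a small keyword-spotting model directly on the audio rather than transcribing everything; this minimal sketch only shows the gating step, and the normalization rules are assumptions.

```python
import re

WAKE_PHRASE = "ok kodi"  # assumption: a configurable wake phrase


def extract_command(transcript, wake_phrase=WAKE_PHRASE):
    """Return the command that follows the wake phrase, or None if it was not said.

    Ignoring everything heard without the wake phrase keeps an always-on
    listener from reacting to dialogue coming out of the TV.
    """
    # Lowercase, turn punctuation into spaces, and collapse repeated spaces.
    normalized = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower())
    normalized = " ".join(normalized.split())
    # The trailing space stops "ok kodiac ..." from counting as the wake phrase.
    if not normalized.startswith(wake_phrase + " "):
        return None
    return normalized[len(wake_phrase) + 1:]


print(extract_command("OK Kodi, pause"))         # pause
print(extract_command("Okay, pause the movie"))  # None
```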


Benefits: All users will benefit from this project, people with disabilities in particular.

Goals: A basic implementation that is well documented.

Requirements: C++, Python, a microphone, a license-compatible API

Possible mentors:




Future contributions that could build on this framework include:

1. Implement Kinect/OpenNI sensor support for gestures/occupancy detection/person identification.
2. Subtitle generation from audio, and translation.


I am currently an undergraduate CS major. I am also employed by the Naval Research Lab; I was kept on as a student contractor after my internship last summer. My work focuses on robotics and the interaction between sensors, and how their importance changes depending on the environment. More specifically, I do a lot with RDMS, GIS, computer vision, and machine learning. I am most comfortable with Python, Java, and C++, but will use whichever language best fits the job, as long as it is not assembly.

http://www.github.com/colek42
Please note, most of my "good" code is closed source. I am looking to start doing some contributions to the OSS community.

Let me know if you are interested, and I will refine the proposal and submit it to GSoC.

Edited for refinement
#2
Wouldn't it be a good idea to add an API in Kodi (or extend one of the existing ones) so that developers could make binary client addons for many different voice recognition frameworks?

Other than "always on" listening for home automation add-ons, I'm not that interested in voice commands for controlling Kodi as a media player; a voice search feature, on the other hand, would be great! Voice search specifically would be a very nice feature to have in Kodi, and Amazon Fire TV and Google Nexus Player (Android TV) have already proved the concept.

http://forum.kodi.tv/showthread.php?tid=199486

Amazon Fire TV, at least, has a pretty cool "Voice Search" feature, which shows how voice commands add accessibility to the search function in a media player.

This of course relates both directly and indirectly to this other old thread discussing speech recognition and voice control: http://forum.kodi.tv/showthread.php?tid=9280

As you already noted, nowadays there are also ways of accessing third-party open APIs online, such as Google Speech and Apple HomeKit / Siri. There is also the Zypr API, which aggregates many of those third-party APIs for voice commands: https://www.zypr.net

Even so, it would still be nice to have support for offline open source speech recognition engines like CMU Sphinx too, as kodivc does: http://forum.kodi.tv/showthread.php?tid=123621
#3
Amazon's search works well because it can use a specific dictionary of all the movies available in its library. Just using the Google API without restricting it to a set of keywords would not give great results; constraining the possible matches to the movies and TV shows in the library would improve them. Also note that, as I understand it, to use the Google API users would have to go through a complicated set of instructions to enable the API and enter the API key into Kodi. Even then, the maximum is 50 searches per day per API key. The other options are for Kodi to pay for API usage, or to support voice commands only through a remote app, since the API can be accessed through Google Play services.

The Kaldi framework is more efficient than CMU Sphinx, as it supports DNN-based acoustic models.
#4
Well, it wouldn't be an issue if remote apps used the speech-to-text features of their respective OS (Google voice search, Siri, ...) and then simply forwarded the pre-parsed command to Kodi (this should already be possible). But adding a native API is a nice idea as well. Just make sure that the "backends" are implemented as add-ons (Google Voice API add-on, Kaldi add-on, ...).
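Forwarding a pre-parsed command to Kodi can go through Kodi's JSON-RPC API. A minimal sketch, assuming Kodi's web server is enabled and reachable at localhost:8080 (a common default; check your settings) and that playerid 1 is the active video player. `Input.Up`, `Player.PlayPause`, `Player.Stop`, and `Application.SetMute` are Kodi JSON-RPC methods; the phrase-to-method table itself is illustrative.

```python
import json
import urllib.request

# Spoken command -> (Kodi JSON-RPC method, params).
# playerid 1 assumes the active player is video; a fuller version would ask
# Kodi first via Player.GetActivePlayers.
COMMANDS = {
    "up": ("Input.Up", {}),
    "down": ("Input.Down", {}),
    "left": ("Input.Left", {}),
    "right": ("Input.Right", {}),
    "select": ("Input.Select", {}),
    "back": ("Input.Back", {}),
    "play": ("Player.PlayPause", {"playerid": 1}),   # toggles playback
    "pause": ("Player.PlayPause", {"playerid": 1}),  # toggles playback
    "stop": ("Player.Stop", {"playerid": 1}),
    "mute": ("Application.SetMute", {"mute": True}),
    "shut up": ("Application.SetMute", {"mute": True}),
}


def build_request(command, request_id=1):
    """Turn a recognized command into a JSON-RPC 2.0 request body, or None."""
    entry = COMMANDS.get(command.strip().lower())
    if entry is None:
        return None
    method, params = entry
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def send(body, url="http://localhost:8080/jsonrpc"):
    """POST a request body to Kodi's HTTP JSON-RPC endpoint (URL is an assumption)."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be along the lines of `send(build_request("stop"))` once a recognizer (on the phone or elsewhere) has produced the text.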
#5
This is an interesting idea, but I have two thoughts. First, I agree with RockerC that this should be done with an API and a plugin-style setup, so it can be maintained outside of Kodi releases (plus, modularity is always good for programs with a large feature set like Kodi). This also makes sense because there is already an idea here on the GSoC forum for adding a microphone feature to Kodi's audio engine. These things all tie in to each other in some way.

My other thought concerns your idea of using deep neural networks for voice recognition: it's a problem of performance. Neural nets are fast once built, but building them takes A LOT of power (like melt-a-high-end-CPU-with-one-data-set power), and Kodi runs on things like the Raspberry Pi, which doesn't have much room to spare for things like that. This is also where a plugin-style approach makes sense: someone could make a neural-net-based voice recognition plugin that sends sample data to a more powerful machine (on the network or in the cloud) that can handle updating the sample vectors with ever-larger data sets. People could also build plugins for use with Google or Amazon systems, etc.
Raspberry Pi Model B 2 1024MB @ 1.0GHz w/ OSMC
--Decommissioned-- Raspberry Pi Model B 512MB @ 1.0GHz w/ 3TB USB Drive Running Open Media Vault
#6
(2015-03-24, 23:12)poplap Wrote: This is an interesting idea, but I have two thoughts. First, I agree with RockerC that this should be done with an API and a plugin-style setup, so it can be maintained outside of Kodi releases (plus, modularity is always good for programs with a large feature set like Kodi). This also makes sense because there is already an idea here on the GSoC forum for adding a microphone feature to Kodi's audio engine. These things all tie in to each other in some way.

My other thought concerns your idea of using deep neural networks for voice recognition: it's a problem of performance. Neural nets are fast once built, but building them takes A LOT of power (like melt-a-high-end-CPU-with-one-data-set power), and Kodi runs on things like the Raspberry Pi, which doesn't have much room to spare for things like that. This is also where a plugin-style approach makes sense: someone could make a neural-net-based voice recognition plugin that sends sample data to a more powerful machine (on the network or in the cloud) that can handle updating the sample vectors with ever-larger data sets. People could also build plugins for use with Google or Amazon systems, etc.

The DNN training would have to be done during idle time. We trade high preprocessing time for low overhead and data storage later. I think that works well for something like Kodi. I know my system is always on, but then again maybe I'm the outlier.
I agree with the modularity aspect.

Quote: Kodi runs on things like the Raspberry Pi, not much room to spare for things like that.
I don't even want to know what would happen if I tried to train a DNN on a Pi.
#7
(2015-03-24, 21:53)da-anda Wrote: Well, it wouldn't be an issue if remote apps used the speech-to-text features of their respective OS (Google voice search, Siri, ...) and then simply forwarded the pre-parsed command to Kodi (this should already be possible). But adding a native API is a nice idea as well. Just make sure that the "backends" are implemented as add-ons (Google Voice API add-on, Kaldi add-on, ...).

Do most people use their phones as remotes? I use a small wireless keyboard with a trackpad. In my use case it makes no sense to search from a phone. By the time I pick up my phone, unlock it, open the remote app, and tap on a... well, you get the point. I want to be able to yell "Kodi, shut up!" at my TV if my wife is watching Dancing with the Stars or some other crap.
#8
(2015-03-25, 02:39)colek42 Wrote:
(2015-03-24, 21:53)da-anda Wrote: Well, it wouldn't be an issue if remote apps used the speech-to-text features of their respective OS (Google voice search, Siri, ...) and then simply forwarded the pre-parsed command to Kodi (this should already be possible). But adding a native API is a nice idea as well. Just make sure that the "backends" are implemented as add-ons (Google Voice API add-on, Kaldi add-on, ...).

Do most people use their phones as remotes? I use a small wireless keyboard with a trackpad. In my use case it makes no sense to search from a phone. By the time I pick up my phone, unlock it, open the remote app, and tap on a... well, you get the point. I want to be able to yell "Kodi, shut up!" at my TV if my wife is watching Dancing with the Stars or some other crap.
Not most people, but many. Also, the remote can have a lock-screen widget with an instant recording button (if the OS allows it). I'm just saying that there can and will be different use cases, and that it's best to make it modular/add-on driven: basic logic/API in core, and specific implementations/backends in add-ons.
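The "basic logic/API in core, backends in add-ons" split could look roughly like the sketch below. Everything here (`SpeechBackend`, `register`, `recognize`, the backend names) is hypothetical and only illustrates the shape of such an interface; it is not an existing Kodi API.

```python
from abc import ABC, abstractmethod


class SpeechBackend(ABC):
    """Hypothetical interface a recognizer add-on (Kaldi, Sphinx, Google, ...) would implement."""

    name = "base"

    @abstractmethod
    def transcribe(self, pcm: bytes, sample_rate: int, vocabulary: list) -> str:
        """Return the best transcript for 16-bit mono PCM audio.

        `vocabulary` lets the core pass library titles so that backends which
        support grammars or hints can constrain recognition.
        """


_REGISTRY = {}


def register(backend):
    """Called by an add-on when it loads."""
    _REGISTRY[backend.name] = backend


def recognize(pcm, sample_rate, vocabulary, preferred=("kaldi", "sphinx", "google")):
    """Core logic: try installed backends in order of preference."""
    for name in preferred:
        backend = _REGISTRY.get(name)
        if backend is not None:
            return backend.transcribe(pcm, sample_rate, vocabulary)
    raise RuntimeError("no speech backend installed")


class DummyBackend(SpeechBackend):
    """Stand-in backend for testing: treats the audio bytes as ASCII text."""

    name = "dummy"

    def transcribe(self, pcm, sample_rate, vocabulary):
        return pcm.decode("ascii")
```

The core never imports a specific engine, so a phone-based backend, an offline Kaldi add-on, and a cloud add-on can coexist and be swapped by user preference.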
#9
@da-anda I tried to put advanced voice commands and the like into Yatse, but Google, Siri, and the rest are limited, as explained in one of the posts, by the fact that they do not have the list of media for better search.

I've done tons of things to make it work more or less well, but something directly in Kodi would be way better, as Kodi often has more processing power and storage space than a phone.

Anyway, what would be cool, if this happens, is for remotes to be able to send a WAV or raw sound via JSON for the parsing to occur in Kodi.
#10
(2015-03-25, 17:50)Tolriq Wrote: @da-anda I tried to put advanced voice commands and the like into Yatse, but Google, Siri, and the rest are limited, as explained in one of the posts, by the fact that they do not have the list of media for better search.

I've done tons of things to make it work more or less well, but something directly in Kodi would be way better, as Kodi often has more processing power and storage space than a phone.

Anyway, what would be cool, if this happens, is for remotes to be able to send a WAV or raw sound via JSON for the parsing to occur in Kodi.

I think most phones have enough power to calculate the feature vectors (PCA/PLCA?) of the sound and send those rather than the entire WAV.
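To make the bandwidth argument concrete, here is a minimal sketch of client-side feature extraction. It deliberately uses very simple per-frame features (log energy and zero-crossing rate) as a stand-in for real speech features such as MFCCs or a PCA projection; the 25 ms frame and 10 ms hop are common choices, assumed here rather than taken from any particular recognizer.

```python
import math


def frame_features(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Compute (log energy, zero-crossing rate) for each overlapping frame.

    A simple stand-in for real speech features: the point is that the phone
    sends a few numbers per frame instead of every raw sample.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((math.log(energy + 1e-10), crossings / (frame_len - 1)))
    return features


one_second = [0.0] * 16000
print(len(frame_features(one_second)))  # 98
```

With these settings, one second of 16 kHz audio (16,000 samples) becomes 98 frames of 2 values each, i.e. 196 numbers. Real feature sets use more values per frame (often a dozen or more), but the reduction relative to raw audio is still large.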
#11
I have no idea what you are talking about but I believe you :p

As long as the user can use the phone's mic, that will avoid all the noise when the HTPC is far, far away.
#12
I don't either! Thanks for your work on Yatse.
#13
I have made a project that adds voice commands to Kodi using node.js.
It supports basic commands like stop, play, up, down, left, right, but can be easily extended.
You may find it useful.
https://bitbucket.org/FREEZX/xbmcspk
#14
Another obstacle to consider is whether you also want to make a framework model that can support internationalization in the future, to enable foreign languages, as not everyone speaks English well.

https://github.com/kempniu/kodivc/issues/3
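One way to keep the framework ready for other languages is to treat spoken phrases as per-language data that all map onto the same language-independent action names, so the core never hard-codes English. A minimal sketch; the phrase lists are illustrative only, not a proposed official command set.

```python
# Canonical actions the core understands; spoken phrases are per-language data.
# These phrase lists are illustrative, not a proposed official command set.
PHRASES = {
    "en": {"pause": "pause", "stop": "stop", "volume up": "volume_up"},
    "de": {"pause": "pause", "stopp": "stop", "lauter": "volume_up"},
    "es": {"pausa": "pause", "detener": "stop", "subir volumen": "volume_up"},
}


def to_action(phrase, language, fallback="en"):
    """Resolve a spoken phrase to a language-independent action name, or None."""
    table = PHRASES.get(language, PHRASES[fallback])
    return table.get(phrase.strip().lower())


print(to_action("Lauter", "de"))  # volume_up
print(to_action("stop", "fr"))    # stop (no French table, falls back to English)
```

Adding a language then means adding a phrase table (and, for the recognizer, an acoustic/language model) rather than touching the command-handling code.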

(2015-03-28, 11:44)FREEZX Wrote: I have made a project that adds voice commands to Kodi using node.js.
It supports basic commands like stop, play, up, down, left, right, but can be easily extended.
You may find it useful.
https://bitbucket.org/FREEZX/xbmcspk
Also check out http://forum.kodi.tv/showthread.php?tid=123621

kodivc / xbmcvc is another open source project: an application for controlling Kodi with simple voice commands via Kodi's JSON-RPC API.

https://github.com/kempniu/kodivc