JARVIS system with xbmc
#1
I wanted to see if there was interest in getting a JARVIS-like system that would integrate with XBMC. Currently a developer is making a program that could fit with XBMC perfectly.

See some videos here

and visit his Facebook page here

I am in no way affiliated with the developer; I just thought it would be cool to have this with my HTPC setup.

Show your interest by posting in this thread, and I will try to get the developer over to these forums to start discussion on the project.
Nvidia Shield with Kodi 18
#2
You know, I don't especially care for speech controlled anything. I'll be impressed when "Jarvis" can understand mumbled, conversational commands and questions. Like, "Hey Jarvis, what are we doing today?" Instead of "TO DO LIST!"

Or, "Hey, shut up." Instead, "PAUSE MUSIC!"
#3
Well, it looks like this system is starting to get some conversational speech recognition. You can see a conversation here.
#4
What's there to program? All you need to do is look at Jarvis, tell him to go pick up the remote, and have him change things for you.


For the record, I understand perfectly what you're talking about, but the original Jarvis was an actual person, not a computer as he became in the movies.
#5
We have the wives for that. Lol.
#6
Hi all, I am a representative of the J.A.R.V.I.S System team that rflores2323 mentioned.

@natethomas... we share the same vision as you do! We are creating a conversational system (with the odd witty comment) as opposed to just a command system. There are a few ways to say the same thing, and a few ways for Jarvis to reply. Take a look for yourself on our site! http://www.facebook.com/pages/JARVIS-Sys...9203654039
#7
Very interesting indeed

I shall keep my eye on this.
#8
I had played with ALICE a while back, messing around with enabling it for speech. Text-to-speech and vice versa wasn't necessarily all that difficult a concept to make work. Making them all work together, however...

Basically your TTS talked to the ALICE daemon, which, with a slight source modification for an extra "command" field in the AIML, would, when a pattern was recognized, send a command to the LIRC daemon, which then did whatever it was programmed to do.

I had it going for a while, using basic things like lights on, lights off, computer on/off, and play music (random; I never got it to recognize playlists, movie names or anything like that), installed, running and talking with LinuxMCE for some basic home automation.

The great thing with ALICE was that even in its stock format it would recognize a couple of different phrases asking the same thing, so long as there was a common set of keywords, e.g.:

Light on = light + on or on + lights.

Would you turn the lights on...
Please turn the lights on...
Lights on...
Turn the *#*$Q# !@#!$ lights on before I whoop the ever-living #$%!@%$!@# out of you!!!

All of the above would work most of the time, since "lights + on" were the recognizable keywords.

Now, getting TTS to play nicely with ALICE was not easy for me, and that's pretty much where I let it lie. Too many times I would be talking on the phone and inadvertently turn lights on and off, TVs on and off, or pop on some random music.
#9
I think a start command would work: if you say the name Jarvis, the program will start to listen for commands. This is the way VoxCommando has it (start a command with "pay attention") and it works pretty well.
#10
rflores2323 Wrote:I think a start command would work: if you say the name Jarvis, the program will start to listen for commands. This is the way VoxCommando has it (start a command with "pay attention") and it works pretty well.

Using the "pay attention" and "ignore me" commands in VoxCommando works reasonably well. A timer can put VoxCommando back into ignore mode after a period of inactivity. Also using the "prefix" mode where any command must start with a special keyword can be effective. In any case you still need a relatively quiet environment unless you are using a wireless headset.

Ultimately I have found the only truly worry-free practical use is with something like the Amulet remote. It actually works incredibly well even when I am listening to very loud music, but it does require you to use a hand to lift the device.

I am also waiting to see if the Kinect Microphone will be of use when the drivers are released, but I am not expecting miracles.

Looking forward a few years, for a system to be really usable by the general public will probably require a combination of technologies such as facial and skeletal tracking paired with voice recognition and a powerful form of AI. For example (and I am sure this is technically doable now, but I haven't tried it yet), the speech recognition could ignore you until you looked at the camera. This is just one technique that could be employed.

The other problem is that it is very difficult to allow for natural-language speech commands, where you can just say anything. If you have a relatively small number of commands you can make this appear to work, but once you get into a library of hundreds of commands, possibly accessing tens of thousands of "items" like the names of songs, it becomes very difficult to maintain flexibility and accuracy at the same time.

The example where you can say anything you want to turn the lights on breaks down when you consider that you may end up instead listening to the song "You Should Have Left the Light On"... (OK, I'm not sure that is the actual title of that Sinéad O'Connor song, but you get the idea). In this case the user must accept that they will have to adhere to a certain syntax to let the computer know what general class of command they are trying to access. With VoxCommando you can customize it how you want and use as many variations as you want, but at some point you will find that you are getting in your own way.

In the end it's not very different from the other ways that we interact with computers. If you are using the mouse and keyboard, you still have to do things a certain way, or the computer won't know what you want.

Still, the learning curve is not so bad, and you can start with a printed "menu" of commands. The pay-off is huge when you want to listen to a particular song or artist, or watch a certain TV show or movie: with a few simple speech commands you have instant access. Even if you have a mouse and keyboard available, it is much faster to use voice commands, and obviously way cooler and more fun!
VoxCommando.com
#11
I suppose in that respect... you could have a top menu and sub-menus that could be accessed with spoken commands.

For example, some static commands could be literal, such as "lights on" or "lights off". Those are as simple as can be and shouldn't be misinterpreted as music, movies or other phrases.

But... say you wanted to listen to music using your above example...

"Jarvis, let's listen to some music..."

Jarvis responds and from this point on, it is in the music sub-menu where other commands wouldn't be matched.

"Jarvis, sinead oconner, ...lights song..."

Playing blah blah blah... Would you like me to queue the entire album?

"yes."

Queuing...

Now... if you wanted to dim the lights...

"Jarvis, exit music..."

Would you like to leave the music playing?

"Yes."

[exits music submenu]

"Turn the lights to 40%."

[lights dim]


See where I'm going?


Depending on the sub-menus available, it might even be pretty easy to assign priority of one sub-menu over another... so if in doubt, when Jarvis detects two matches (using your lights example above, with the environmental sub-menu given priority), it could always choose lighting over music.



One thing I wanted to try but never got around to was using something along the lines of simple X10 cameras and setting them up as zones within LinuxMCE, which was completely doable. Using the mic built into those cameras, it shouldn't be too complicated to pass that audio to your Jarvis system with a zone tag. That way, if you were in a room and wanted the lights on, you could just say "lights on", the mic would pick it up and respond. Pair that with a set of room speakers and a multi-zone amp and you could control most of your house from anywhere.
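The whole sub-menu flow above amounts to a small state machine: a current context restricts what can match, literal global commands always work, and a priority order breaks ties at the top level. A minimal sketch, with made-up command tables and names:

```python
# Illustrative sub-menu ("context") state machine; all tables are assumptions.

GLOBAL = {"lights on", "lights off"}        # literal commands, always valid
CONTEXTS = {
    "music":       {"play", "pause", "queue album"},
    "environment": {"dim lights", "lights to 40"},
}
PRIORITY = ["environment", "music"]         # environment wins a tie

class Jarvis:
    def __init__(self):
        self.context = None                 # None means we are at the top menu

    def handle(self, utterance: str) -> str:
        text = utterance.lower().strip()
        if text.startswith("enter "):
            self.context = text.removeprefix("enter ")
            return f"entered {self.context}"
        if text == "exit":
            self.context = None
            return "back at top menu"
        if text in GLOBAL:                  # literal commands always work
            return f"ran global: {text}"
        if self.context:                    # inside a sub-menu, only its commands match
            if text in CONTEXTS.get(self.context, set()):
                return f"ran {self.context}: {text}"
            return f"no match in {self.context}"
        for ctx in PRIORITY:                # at top level, ties resolve by priority
            if text in CONTEXTS[ctx]:
                return f"ran {ctx}: {text}"
        return "no match"

j = Jarvis()
print(j.handle("enter music"))      # entered music
print(j.handle("dim lights"))       # no match in music (blocked by context)
print(j.handle("pause"))            # ran music: pause
print(j.handle("exit"))             # back at top menu
print(j.handle("dim lights"))       # ran environment: dim lights
```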
#12
fryed_1 Wrote:I had played with ALICE a while back, messing around with enabling it for speech. Text-to-speech and vice versa wasn't necessarily all that difficult a concept to make work. Making them all work together, however...

Basically your TTS talked to the ALICE daemon, which, with a slight source modification for an extra "command" field in the AIML, would, when a pattern was recognized, send a command to the LIRC daemon, which then did whatever it was programmed to do.

I had it going for a while, using basic things like lights on, lights off, computer on/off, and play music (random; I never got it to recognize playlists, movie names or anything like that), installed, running and talking with LinuxMCE for some basic home automation.

The great thing with ALICE was that even in its stock format it would recognize a couple of different phrases asking the same thing, so long as there was a common set of keywords, e.g.:

Light on = light + on or on + lights.

Would you turn the lights on...
Please turn the lights on...
Lights on...
Turn the *#*$Q# !@#!$ lights on before I whoop the ever-living #$%!@%$!@# out of you!!!

All of the above would work most of the time, since "lights + on" were the recognizable keywords.

Now, getting TTS to play nicely with ALICE was not easy for me, and that's pretty much where I let it lie. Too many times I would be talking on the phone and inadvertently turn lights on and off, TVs on and off, or pop on some random music.

That's good work! On our J.A.R.V.I.S System Facebook page we have concentrated primarily on the intelligence technology. The various ways of inputting (speech-to-text, Kinect, etc.) are being developed as we speak and will improve in time; we want the engine to be ready when that happens. For now, we are using what technology is available to us. I think you meant that getting speech-to-text to play nicely with ALICE wasn't so easy, as opposed to TTS. As has been mentioned, a 'trigger' should solve that problem (e.g. calling Jarvis's name).

In our case, we have made use of Android by creating a prototype app. You tend to carry your phone around with you, so it acts as our speech-to-text microphone in situations where you're not actually at your computer. See our second video demo: http://www.youtube.com/watch?v=SaMHW6Rr2VQ
#13
fryed_1 Wrote:I suppose in that respect... you could have a top menu and sub-menus that could be accessed with spoken commands.

For example, some static commands could be literal, such as "lights on" or "lights off". Those are as simple as can be and shouldn't be misinterpreted as music, movies or other phrases.

But... say you wanted to listen to music using your above example...

"Jarvis, let's listen to some music..."

Jarvis responds and from this point on, it is in the music sub-menu where other commands wouldn't be matched.

"Jarvis, sinead oconner, ...lights song..."

Playing blah blah blah... Would you like me to queue the entire album?

"yes."

Queuing...

Now... if you wanted to dim the lights...

"Jarvis, exit music..."

Would you like to leave the music playing?

"Yes."

[exits music submenu]

"Turn the lights to 40%."

[lights dim]


See where I'm going?


Depending on the sub-menus available, it might even be pretty easy to assign priority of one sub-menu over another... so if in doubt, when Jarvis detects two matches (using your lights example above, with the environmental sub-menu given priority), it could always choose lighting over music.



One thing I wanted to try but never got around to was using something along the lines of simple X10 cameras and setting them up as zones within LinuxMCE, which was completely doable. Using the mic built into those cameras, it shouldn't be too complicated to pass that audio to your Jarvis system with a zone tag. That way, if you were in a room and wanted the lights on, you could just say "lights on", the mic would pick it up and respond. Pair that with a set of room speakers and a multi-zone amp and you could control most of your house from anywhere.

We have obviously run into this problem a few times, and your solution (the sub-menu concept you were referring to) tends to be the way to handle it for the most part. This is simply called 'context'. During our development (http://www.facebook.com/pages/JARVIS-Sys...9203654039) we try to remember that although Jarvis is a computer, we are trying to simulate talking to a person. You tend not to jump from one subject to another in conversation with a person, so there is no need for us to waste time trying to make Jarvis understand in that situation.

A very simple example is when you answer 'yes': it can be in response to any number of things. In our next video we are going to introduce a feature that involves referring to something by position (i.e. "the second one"). We opted for being specific (i.e. "the second article"), but will eventually allow context to decide what 'one', 'it' or 'that' refers to (as we have in other circumstances).
#14
Voice control for XBMC is already here and bloody brilliant. Not sure if anyone has already mentioned VoxCommando, but below is a video of me demonstrating controlling my lights. If you look through my videos you'll see some early demos of me controlling XBMC too. I didn't make this software; an awesome fellow in Canada called James did.

It's ridiculously easy to set up and costs about £15...

http://www.youtube.com/watch?v=z1_5BUbcBgg

Thanks,

-P

and... I'm an idiot. I just noticed James is already in this thread and VoxCommando has already been mentioned :p

It is the way forward though; if you haven't tried it then I strongly recommend you do.

-P
