Voice recognition and control?, just basic!
#31
a little more on the setvolume command might help, where we use a "payload range":
Image

so you could say for example: "master volume 58" and it would generate the command volume(58) and send it through the web interface.
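
As a rough illustration of the payload-range idea, here is a minimal Python sketch of how a phrase like "master volume 58" could be turned into `volume(58)`. This is not VoxCommando's actual code: the function name and the 0 to 100 bounds are assumptions made for the example.

```python
import re

# Assumed bounds for the payload range; the thread does not state them.
VOLUME_MIN, VOLUME_MAX = 0, 100

def phrase_to_command(phrase):
    """Turn 'master volume <n>' into 'volume(<n>)', or None if it does not fit."""
    match = re.fullmatch(r"master volume (\d+)", phrase.strip().lower())
    if match is None:
        return None  # the phrase is not the volume command
    level = int(match.group(1))
    if not VOLUME_MIN <= level <= VOLUME_MAX:
        return None  # the number falls outside the payload range
    return f"volume({level})"

print(phrase_to_command("master volume 58"))  # volume(58)
```

A number outside the range is simply rejected here; a real recognizer would more likely never offer it as a possible phrase in the first place.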

you can also use a "payload list", which is a list of strings separated by commas, which you could use for example for your playlists,

and we can use xml payloads, which are lists stored in an external xml file. Each item in the list can be assigned multiple names or triggers that translate to its value. This is still hands-on (text editor), but it works. I need to develop an editor!

it looks like this (for using x10 devices):

Code:
<PayloadsRoot>
    <payload>
        <value>a1</value>
        <phrase>living room lamp,lamp,lamp1</phrase>
    </payload>
    <payload>
        <value>a2</value>
        <phrase>spot one,living room lights,living room spot</phrase>
    </payload>
    <!-- ... more payload entries ... -->
</PayloadsRoot>
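
To make the phrase-to-value idea concrete, here is a small Python sketch that reads a payload file in the layout shown above and builds a lookup from each spoken phrase to its value. It only illustrates the file format; the function name is made up and this is not how VoxCommando itself loads the file.

```python
import xml.etree.ElementTree as ET

# Two complete entries in the same layout as the x10 example above.
SAMPLE = """
<PayloadsRoot>
    <payload>
        <value>a1</value>
        <phrase>living room lamp,lamp,lamp1</phrase>
    </payload>
    <payload>
        <value>a2</value>
        <phrase>spot one,living room lights,living room spot</phrase>
    </payload>
</PayloadsRoot>
"""

def load_payloads(xml_text):
    """Map every comma-separated phrase to the value of its <payload> entry."""
    lookup = {}
    root = ET.fromstring(xml_text.strip())
    for payload in root.findall("payload"):
        value = payload.findtext("value", default="").strip()
        phrases = payload.findtext("phrase", default="")
        for phrase in phrases.split(","):
            phrase = phrase.strip().lower()
            if phrase:
                lookup[phrase] = value
    return lookup

payloads = load_payloads(SAMPLE)
print(payloads["living room lights"])  # a2
```

Note that the comma is the separator, so a trigger phrase cannot itself contain a comma in this format.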
#32
http://voxcommando.com/files/VoxCommando...swords.rar

this should connect to xbmc with user name and password. (just overwrite your old .exe)

Now I need to test not using an xbmc web password (or whatever the default setup is), but now that I've set a password in xbmc, I don't know how to remove it!

Any suggestions?
#33
It varies depending on skin and version number. I'm running the unofficial nightly build from a few days ago using the confluence skin.

In that instance, Under settings, network, http you should have a username and password field. You should be able to just click in the password box and backspace it out. I think you need to restart XBMC for it to take effect, but I'm not totally sure.

I tried a few sample commands (info, play trailer). I hit a few bumps, but I'm sure I'm overlooking something on my end. I tried before your posted tutorial, so I'll walk through it again tomorrow and see if I missed anything.

I saw a YouTube video for music playback that had TTS responses. That could be awesome. Packaged with a good open-source voice package, it would be cool to read back movie/TV info summaries, music reviews etc.

I'll play around with it a bit tomorrow and see if I can get anything working. Thanks for the detailed response.
#34
arkryal Wrote:It varies depending on skin and version number. I'm running the unofficial nightly build from a few days ago using the confluence skin.

In that instance, Under settings, network, http you should have a username and password field. You should be able to just click in the password box and backspace it out. I think you need to restart XBMC for it to take effect, but I'm not totally sure.

I tried a few sample commands (info, play trailer). I hit a few bumps, but I'm sure I'm overlooking something on my end. I tried before your posted tutorial, so I'll walk through it again tomorrow and see if I missed anything.

I saw a YouTube video for music playback that had TTS responses. That could be awesome. Packaged with a good open-source voice package, it would be cool to read back movie/TV info summaries, music reviews etc.

I'll play around with it a bit tomorrow and see if I can get anything working. Thanks for the detailed response.

I downloaded and installed xbmc within the last week or two and I'm using the confluence skin (I think - I never changed from the default), but after I delete the password and hit ok, it still has **** in it. Same thing with the username, except that it shows the letters. It works ok to change it to something else, but not to change it to "blank".

Let me know if I can help with anything. You can send me your voicecommands.xml if there is something you want me to look at.

I have a bunch of videos of me doing stuff with Vox and EventGhost, including some TTS stuff. I don't know much about open-source TTS; I was just calling the default Microsoft voice from EventGhost. I added some TTS capability in WMC so you can ask what song is playing.

VC also has some basic code in it for gmail lookup and weather lookup and it can read you your new email, but I haven't created the interface for people to enter their login data, location etc. so it is still a "hidden feature"

If you can suggest a good open source TTS, maybe I can find a way to hook up to it, but for now it just uses Anna.

One feature I will add soon is an optional TTS response field for each command, so you can get it to say whatever you want after any given command. I'll probably let you put in multiple responses too, and then VC will pick one at random to give a slightly more AI feel to it.
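
The "pick one at random" part could look something like the Python sketch below. This is only a guess at the shape of a feature that did not exist yet at the time of writing; the function name and the sample responses are invented for the example.

```python
import random

def pick_response(responses, rng=random):
    """Return one response chosen at random, or None if the field was left empty."""
    if not responses:
        return None  # the response field is optional
    return rng.choice(responses)

# Hypothetical responses attached to a single command.
responses = ["OK", "Done", "Volume changed"]
print(pick_response(responses))
```

Passing in the random source (`rng`) is just a convenience so the choice can be made repeatable when testing.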
#35
P.S. If I had the time right now, I would go ahead and get all the xbmc commands set up, but unfortunately I am swamped with "real" work for the time being.
#36
Okay, now that I'm playing with this, I'm finding all kinds of fun stuff to do.

I've got a bit of feedback.

First, I'm having trouble chaining multiple commands to a single phrase. Being able to do so would open the door for many cool features. In the case of a media center, a person could say "What was that", have the video jump back 30 seconds, enable subtitles, play for 30 seconds and disable subtitles.

In addition to adding multiple commands, a timing and delay system would be advantageous.

I've also noticed there are some areas where the same command would carry different meanings. If in the menu, "Back" may imply go up to the parent directory, but in video or music mode, it means jump back a few seconds. The HTTP API and the event server are both capable of returning what state the program is in, but the program doesn't know what to do with that. If there were a contextual scripting system, we could establish rules that would verify the state of the program and then respond accordingly. Not a high priority now, but you can see how such a system could increase usability.

I'll keep working on the basic remote style functions, but I'd be interested in methods of creating more elaborate functions. It could be done through scripting of course, and then a command to run a specific script, but that requires a bit too much on the user end to install everything. And then of course you don't want empty commands when a user doesn't have one script or another installed, so it's a good solution now, but not very elegant. I'll post my xml when I get a few dozen of the basic commands in.
#37
cool. I can see that we are on the same page. That's mostly stuff that I plan to do, it's just a matter of finding the time.

You are right, there is currently no support in VC for macros. One day...

You can pretty much do all of that stuff (context sensitive commands, macros etc.) using EventGhost, which is what I originally designed VoxCommando to work with, when I was just looking to do stuff for myself. Unfortunately, having to set up both programs was a barrier for new users, so I've been trying to make VC work more as a standalone application. Newbies having to deal with VC and EventGhost at the same time usually give up quickly.

For someone like you, who I would describe as a "power user" I would strongly suggest that you look into EventGhost. It is simply a fantastic program, even if you don't use VC, and especially if you aren't afraid of doing a little python programming. Vox currently broadcasts events to UDP and a simple plugin in EventGhost receives them as events. It is also possible to send commands back from EG to VC. From there the possibilities are pretty much limitless. There are some EventGhost plugins for xbmc already, though I'm not sure there is one that taps into the http interface which I guess is the most powerful. I could probably make one given a bit of time. Then again, the http in xbmc is deprecated now right? and I know nothing of json.

you said:
Quote: The HTTP API and the event server are both capable of returning what state the program is in

could you please point me to these functions in the http api?

One thing I would like to be able to do is enable and disable groups of VC commands on the fly. This would to some extent address the context situation, especially if it could be combined with some kind of macro system.

One of my first priorities is to scan the user's library for all your music and tv shows on startup so that you can jump to media by name.

My current problem is that I have suddenly got way too much paying work! Until a few days ago I had gads of free time. I will try to squeeze in what I can. If there is anything quick that I could add or improve that would help you significantly, let me know. I had intended to nail down a few new functions and then concentrate on documentation, but that will have to wait now.

I am excited to have a new person interested and I appreciate all of your great suggestions. I hope to be able to implement most of them eventually.
#38
jitterjames Wrote:could you please point me to these functions in the http api?

I don't have specifics on it, I just remember a script from last year that parsed text via the http. It seemed a bit half-assed in the way it was implemented, but it illustrated the possibility. I'll have to dig around in the archives to find that code and see how they did it exactly. If memory serves, they hijacked a deprecated command and used it to trigger a python script that actually returned the value, but I don't know if the code cleanup would have affected that. Even if that were the case, a launched script with a custom keymap entry should still work, but I would defer to someone who knows a bit more about scripting (I'm not well versed in Python).

Edit:
okay, here's an easy way to get the current window, try this in your browser:
http://127.0.0.1:8080/xbmcCmds/xbmcHttp?...GuiStatus()
*Change port if applicable, may need user and password, which can be added to the URL if known
that returns some text with the active window number. It should be easy to parse that and rip out the window number. From there a simple Execute (1) If window# = xxxxx Else Execute (2) function would do the trick.
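
A rough Python sketch of that parse-then-branch idea follows. The exact text the call returns is not shown in this thread, so the sketch assumes the response contains a field of the form `ActiveWindow:<number>`; the sample string and the window number 10025 are made up for illustration, and fetching the text over HTTP is left out.

```python
import re

def active_window(status_text):
    """Pull the window number out of the status text, or None if it is missing.

    Assumes the text contains 'ActiveWindow:<number>' (an assumption, see above).
    """
    match = re.search(r"ActiveWindow:(\d+)", status_text)
    return int(match.group(1)) if match else None

def choose_command(status_text, window_id, if_match, otherwise):
    """Pick if_match when window_id is the active window, otherwise the fallback."""
    if active_window(status_text) == window_id:
        return if_match
    return otherwise

# Made-up sample response and window number.
sample = "<li>ActiveWindow:10025<li>ActiveWindowName:Videos"
print(choose_command(sample, 10025, "command 1", "command 2"))  # command 1
```

If the window number cannot be found, the fallback command is chosen, which seems the safer default for a voice command.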


I tried creating a launch command for xbmc, in a "Launch" folder as instructed in your video. It throws an error.
Code:
Launch error: System.ComponentModel.Win32Exception: The system cannot find the file specified
   at System.Diagnostics.Process.StartWithShellExecuteEx(ProcessStartInfo startInfo)
   at VoxCommando.Launch.LaunchApp(String applicationPath, String argDelimiter)

I have the path set explicitly to "C:\XBMC_SVN\XBMC\xbmc.exe". This was copied and pasted from my desktop shortcut, so I know the path and file name are correct. The spoken command is recognized, but the program doesn't launch.
Running Win7x64
Any advice?
#39
thanks for the reference to getguistatus. I was aware of its existence, but not that it returned the current window/mode/whatever. I'll check it out soon.

as for your error, can you please send me your voicecommands.xml file? (just send it to [email protected])

I don't know if you know already, but VC does keyboard emulation, and you can string keys together, which is semi macro-like. The image below shows what I was using for a situation where I was forced to do raw html coding and didn't want to do the tags by hand over and over. the idea here was to select the text and then say "under line" and it would surround the text with <u> </u> by using cut and paste.

Image

also, I don't know if you know, but any command can be put into loop mode by preceding it with [repeat:n] where n is the delay between loops in millisecs.
this illustration shows how to do slow and fast scrolling with only 2 commands by using looping and payload lists for the directions. (pretty cool, I think! - I'm proud of this).
Image

ANY command should kill the loop. So you could create a dummy command that does nothing if you want: "stop scrolling" etc.
#40
OK, the timed key sends are cool. That solves one issue with jumplists.
Sadly, it's a pain to jump to a given letter through the HTTP API. But it does support the SMS-style jumplists used for phones. So let's say I need to jump to "J", the 10th letter of the alphabet. I could SMS-jump to 2 (A) and then send next-letter 9 more times. Or SMS-jump to 5 one time (twice for K, 3 times for L and so on). Basically using whatever combination results in fewer keystrokes. Less common characters may be a little tricky on the user end. If a user was looking for "Æon Flux", most would say "A E" instead of "ash", so I think I'll skip those altogether and let them navigate with next/previous letter commands.
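
The letter-to-keypress conversion can be sketched in a few lines of Python. This assumes the standard phone keypad layout and that each extra press of the same key cycles to the next letter, as described above; the function name is made up, and sending the actual keypresses to XBMC is not shown.

```python
# Standard phone keypad layout (assumed to match the SMS jump behaviour).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def sms_presses(letter):
    """Return (key, number_of_presses) for a letter, or None if it has no key."""
    letter = letter.lower()
    if len(letter) != 1:
        return None
    for key, letters in KEYPAD.items():
        if letter in letters:
            return key, letters.index(letter) + 1
    return None  # e.g. "Æ", digits or punctuation

print(sms_presses("J"))  # ('5', 1)
print(sms_presses("L"))  # ('5', 3)
```

Characters without a key return None, which matches the plan of leaving those to the next/previous letter commands.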

Edit:
Ultimately, I fear jumplists will have to rely on custom filters being loaded separately, jumpsms just doesn't work well for Voice command, it's too awkward.

I got the Launch command to work, it seems I had tried a previous time and didn't delete the command with the same trigger, so it was running the bad shortcut first. All seems fine now.

I'm also wondering how it will recognize starting a movie or sound file based on the name. Commands could be dumped to a new command group from the database, but that would require a restart of VC with every update.

I've ripped about 10,000 CDs to my PC (auctioned for $100 by a radio station that was reformatting). So it seems like a library with that many entries may bloat the voicecommands.xml if each artist, track, album etc. were to have a unique entry. Just wondering if you have any more tricks up your sleeve for handling media by name calls.

I've got a Google spreadsheet to track my progress and document what I'm doing (makes for easier cleanup later). Google's giving me shit about sharing now, but I'll post the link here when I can so others can see how things were accomplished, add functionality, recommend a better method for executing a certain command, add phrases etc. So far, everything is being done through the httpAPI instead of using EventGhost, just for the purpose of making it easy for casual users to pick up.

Just rediscovered the notification system.
Code:
execbuiltin(Notification(Header,'Message',5000))
This may be useful for on-screen help with voice commands. I'll start adding them to each function as I go.
#41
Hi jitterjames

I've just been giving VoxCommando a go, but I can't get it to detect my microphone. I've got a basic plug-in mic and a webcam mic. Both are detected by Windows and I can see the levels rise in the Windows sound control when sound is made, so they are definitely both working. VoxCommando just displays "audio warnings" under the microphone status and I get no level rising.

Any idea what i could be doing wrong?
#42
Quote:Edit:
Ultimately, I fear jumplists will have to rely on custom filters being loaded separately, jumpsms just doesn't work well for Voice command, it's too awkward.
It is unfortunate that we have to rely on sms keypad style input to jump to a fricken letter... wtf? I could probably write a special function for XBMC, where if you have a command like this: "jump frog" (you wouldn't have to say frog it could be any word starting with f) it would grab the first letter ('f') and convert it to sms numbers. I don't really know how this works in xbmc though, can you get f by pressing sms3 button 3 times? Maybe it would be better to use search, or filter, or maybe smart playlists? I'm not that familiar with xbmc. I used to have a classic xbox and used xbmc a lot, but when I went HD I switched to using mediaportal and other stuff. That was a while ago.

Quote:I'm also wondering how it will recognize starting a movie or sound file based on the name. Commands could be dumped to a new command group from the database, but that would require a restart of VC with every update.
it is theoretically possible to add new items on the fly, but I gotta be honest, it will be a long time before I can move this to the top of the list of things I want to do, and in the case of music it would be impractical because of the time it takes to reinitialize a grammar of that size.

Quote:I've ripped about 10,000 CDs to my PC (auctioned for $100 by a radio station that was reformatting). So it seems like a library with that many entries may bloat the voicecommands.xml if each artist, track, album etc. were to have a unique entry. Just wondering if you have any more tricks up your sleeve for handling media by name calls.
ya, 10,000 albums is going to put a lot of strain on the system. It's not just the xml, that's not really an issue; it's going to be the time it takes to create the language model for all those names, and the memory requirements. I have about 6500 songs in my library and it takes a bit of time to load the names, but not too bad. Recall still works very well on a fast machine, but my netbook can't handle it. In your case you can do artist, and *maybe* album, but requesting songs by name from a library that large is probably going to be too much for it. You'd be better off using straight dictation. This has its own problems of course in terms of accuracy, weird spellings etc.

Quote:I've got a Google spreadsheet to track my progress and document what I'm doing (makes for easier cleanup later). Google's giving me shit about sharing now, but I'll post the link here when I can so others can see how things were accomplished, add functionality, recommend a better method for executing a certain command, add phrases etc. So far, everything is being done through the httpAPI instead of using EventGhost, just for the purpose of making it easy for casual users to pick up.
that's awesome. I look forward to seeing it, and contributing. Purely from my point of view, it would be best if you used the wiki on voxcommando.com (just click on guide.) It's currently totally open so you could just start a new page without even having to sign up. I haven't explored the wiki too much yet but I assume we can lock it down once there is some stuff on there that we want to protect from trouble makers.

Quote:Just rediscovered the notification system.
Code:
execbuiltin(Notification(Header,'Message',5000))
This may be useful for on-screen help with voice commands. I'll start adding them to each function as I go.
that's excellent, it could be useful for a lot of things, in particular the "alternates" feature of VC, but to really be useful we would need to be able to create multiple lines of text. Do you know if there is a way to insert newline characters? I tried \r\n but it didn't do anything.

we could also use some kind of external OSD instead, but it becomes an issue for cross-platform stuff. It would probably be better if we could create dialog boxes or something... again, I don't really know enough about xbmc.
#43
have you set the mic that you want to use as default in sound settings?

if that doesn't work, try posting the log. the logging isn't that sophisticated yet but maybe I'll see something.

edit: the log is generated by default and is just a txt file in the same folder as the .exe
#44
Hi JJ

Yes, I've got the mic selected as the default recording device in Windows. You weren't kidding when you said the logging isn't that sophisticated.

LOG

09/07/2010 12:48:28 VoxLog created:
09/07/2010 12:48:28 Starting VoxCommando, version: 0.73
09/07/2010 12:48:28 error starting directory watcher. wrong folder?
09/07/2010 12:48:28 installed language:English (United States)
09/07/2010 12:48:28 installed language:English (United Kingdom)
09/07/2010 12:48:28 Loading Command Grammar
09/07/2010 12:48:28 idle timer set for 3000 msec.
#45
ya. can you zip up your whole installation and send it to me? If that's too big, just send the options.xml and voicecommands.xml, but safer to do the whole thing.

btw you may not want your idle timer set for 3 seconds... set to 0 to turn it off, or use a larger number.