Xbmc not working for blind users.
#76
Not really, except that we can use custom keystrokes, and probably many of us would prefer to type the name of a song or video file into a search rather than using arrow keys to find items. For a good example of a media platform that is already accessible and has a lot of blind fans, check out Apple TV. You can use a remote or opt to use a keyboard, although I've never used a keyboard on Apple TV. I hope keyboard use is not overlooked, since I personally use XBMC on my computer and like using a keyboard.

Here's a video of Apple TV in action. http://www.youtube.com/watch?v=cQqY1325BQY
#77
I don't know if this is related, but I just noticed that my navigation sounds have disappeared. Are they gone because of the script you built, and does anyone else have this problem?
#78
Hello,
Yes, while using my solution, where basic key bindings are overridden inside the user's keymap, navigation sounds unfortunately get killed. I haven't yet found a way around it, other than possibly playing the navigation sound as part of the addon.
After seeing ruuk's comments I wonder whether the skin is responsible for playing these.
Since I am totally blind I haven't researched or touched skins. I had a strong assumption that skins mainly define the visual appearance, and I would very likely break that appearance while tweaking one of them.
Yesterday I also tried live TV with a tvheadend backend in XBMC on Linux. In the EPG timeline I can hear show titles; however, with the current approach I have no access to show times and channel names. Perhaps the only way to improve this would be changing a skin, so while it may add an extra level of complexity, forking a skin might be the easiest solution at this time.
Of the goals I described last week, I have been unable to work out how to send keyboard events to XBMC instead of built-in actions.
Currently I have built the TTS output components, as well as the actual voice-menus addon, as simple scripts. I am looking into how to turn this into a service addon. Either I need a way to store a TTS object somewhere so I can retrieve it at runtime when my command is triggered by a keypress, or I will make it run all the time so that, in response to events, it speaks the menus and other items.
@ruuk could you please share your experimental code? What took you some 20 minutes took me several evenings to figure out, and I am afraid my solution is most likely more limited than yours. The fact that I am using speech-dispatcher and you are using Festival for the actual TTS output is not significant from an accessibility point of view; I assume this bit can become modular enough that people will be able to choose which TTS output module they want to use.
Also, could you please describe, or point me to proper documentation on, how to deal with skins, perhaps overriding some common stuff in the Confluence skin?
E.g. how would I tweak the simple OK dialog so it is possible to arrow over the text displayed inside it?

Thanks everyone for the hints
#79
The code I tried before I just dropped into one of my current addons, because it already had a service component, so it was an easy place for me to do some tests. I intend to make a proper addon and see how well I can access all the text elements of a skin. I have several ideas on different methods to achieve this, and I'm pretty confident it can be done. I plan to create a TTS base class and subclass it with one that handles Festival, because I'm familiar with it. The idea is that any number of other speech engines could be dropped in with their own subclass, where possible and as needed per platform.
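To make the base-class idea concrete, here is a minimal sketch under names of my own choosing, not the actual addon code; the `which`-based availability check is just one way to probe for an installed engine:

```python
import subprocess

class TTSBackend(object):
    """Base class: each speech engine supplies its own say()."""
    provider = 'base'

    @staticmethod
    def available():
        # Subclasses report whether their engine is installed.
        return False

    def say(self, text):
        raise NotImplementedError

class FestivalTTSBackend(TTSBackend):
    provider = 'festival'

    @staticmethod
    def available():
        # Probe the PATH for the festival binary.
        return subprocess.call(['which', 'festival'],
                               stdout=subprocess.PIPE) == 0

    def say(self, text):
        # Pipe the text to festival's --tts mode.
        proc = subprocess.Popen(['festival', '--tts'],
                                stdin=subprocess.PIPE)
        proc.communicate(text.encode('utf-8'))

def get_backend(backends=(FestivalTTSBackend,)):
    """Pick the first available engine, or None if there is none."""
    for cls in backends:
        if cls.available():
            return cls()
    return None
```

A per-platform engine then only needs to override `available()` and `say()`.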
I will then post the addon to my repository so people can try it and give me feedback.
The code will also be on GitHub.
At the moment I am on vacation, so it will be a couple of days till I do anything, plus I have a couple of coding things I'm in the middle of. I'll try to have something available within a week to two weeks.
#80
Okay, thanks for the reply. Most likely I am not going to speculate about this until you give me some direction, then.
I must say I'm really curious what's doable. If you think I might be helpful in some way, please give me a hint.

Thanks and greetings

Peter
#81
(2014-03-04, 06:44)pvagner Wrote: Also, could you please describe, or point me to proper documentation on, how to deal with skins, perhaps overriding some common stuff in the Confluence skin?
E.g. how would I tweak the simple OK dialog so it is possible to arrow over the text displayed inside it?

Thanks everyone for the hints

Labels and textboxes are not focusable controls. Often a textbox will have an associated slider that is focusable, but the slider does not reference back to the textbox. If you wanted to modify a skin to make a label focusable, you could change it to a button that does nothing and edit the navigation appropriately. Textboxes would be trickier, because their text isn't accessed the same way in code. You could perhaps have a separate button that gets its label from the textbox.

Here's a link to the XBMC skinning manual:
http://wiki.xbmc.org/?title=XBMC_Skinning_Manual

Personally I'm not trying to react to control triggers. I'm monitoring changes from the service, where I have direct access to all the infolabels, JSON and skin files. It will all be clearer (I hope) when I've actually got working code. Until then even I am not sure of the details :)
#82
I've made a start on a service addon. It will be found in Add-ons under Services, and it's called XBMC TTS.
It's in my repository, which can be installed with the zip from the link below.

http://ruuks-repo.googlecode.com/files/r...-1.0.0.zip

Here's the GitHub URL for the source code:

https://github.com/ruuk/service.xbmc.tts

It speaks most of the control text on windows and dialogs.
Settings windows are a problem that I'm trying to overcome.
I haven't addressed non-control text at all.

It uses Festival for the speech. There is a class in the code that uses pico2wave, which works but is not enabled; I need to add some settings to the addon so it can be switched to.

I can only attest to this working on Ubuntu, since that is my dev platform.

This is just the initial work to get started and share what I've done so far.

BTW, it may need an XBMC restart to work.
#83
Added a new version to my repository: 0.0.3

Repo

This version adds TTS via the Windows internal TTS. It also automatically selects an available TTS engine, falling back to logging if none are available.

Windows TTS should work without installing anything.

On Ubuntu:
- For pico2wave: sudo apt-get install libttspico-utils
- For festival: sudo apt-get install festival

I'm currently concentrating on the Quartz skin, and it is recommended for best results.

I can add other speech engines as long as I can use them from the command line, via TCP, or similar. Of course, if it's not a free program, I won't be able to do anything, as the pay from this free open-source development is lousy :)
#84
Added a new version to my repository: 0.0.5

Repo

Added support for Flite TTS while trying to get XBMC speech on ATV2, which so far is a bust.
This version also adds settings. They are:
- 'Enable' which enables the service on startup
- 'Default TTS Engine' which selects the preferred TTS backend to use if available
- 'Voice' which opens a dialog to select from the voices in the current backend (only works with Festival and Flite)
#85
Hello,
Thanks for the source and for the rapid development.
I can see the TTS API is evolving nicely.
If possible I would again bring up the original discussion, where we were talking about how to retrieve the needed info from XBMC.
I think info labels are good for retrieving items from the file list or from the lists of videos and audio tracks in the media library, so in the end we will be able to override it for particular content. As an example, recall the timeline view of EPG events in the PVR: on the screen we have all the labels, but the only focusable controls are the individual show names. Using info labels we may be able to speak the other details.
As for the other windows, e.g. the home screen, settings screens, OSD menus, etc., we are using hardcoded mappings from IDs to their names. This may bring a localization issue, in that we may need to localize these separately from the main XBMC localization. Isn't the JSON-RPC API better for this? It can return the text of all the focusable controls. If nothing else, this could be used as a fallback.
I am going to try adding my favourite TTS backends, and then I'll be able to test more.
#86
(2014-03-10, 12:18)pvagner Wrote: If possible I would again bring up the original discussion, where we were talking about how to retrieve the needed info from XBMC.
I think info labels are good for retrieving items from the file list or from the lists of videos and audio tracks in the media library, so in the end we will be able to override it for particular content. As an example, recall the timeline view of EPG events in the PVR: on the screen we have all the labels, but the only focusable controls are the individual show names. Using info labels we may be able to speak the other details.
As for the other windows, e.g. the home screen, settings screens, OSD menus, etc., we are using hardcoded mappings from IDs to their names. This may bring a localization issue, in that we may need to localize these separately from the main XBMC localization.
Well, window names are not all available programmatically, and are not localized in XBMC in any case. I plan on localizing them in the addon in the future. I'm still in a sort of exploratory mode and much of the code may change; I tend to wait until things 'settle down' and then localize strings, so I don't have to keep changing the localization strings when I change other things.
As for other elements, what could be done is to reference the ID of the item with a label in a map, instead of hardcoding a localized string. I think I could change the tables in skintables.py to use this sort of method instead. It would be nice to avoid mappings like this altogether, but not all controls that are 'connected' visually have that connection accessible through code, plus different skins use widely different layouts.
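For illustration, the map-based approach might look like the following sketch; every ID in it is made up, and the `xbmc` fallback exists only so the snippet runs outside XBMC:

```python
# Map control IDs to localized-string IDs instead of hardcoded
# English text, so the user's language comes from XBMC itself.
try:
    import xbmc  # only importable inside XBMC

    def localized(string_id):
        return xbmc.getLocalizedString(string_id)
except ImportError:
    # Tiny English fallback table for running outside XBMC.
    _FALLBACK = {30001: 'Home', 30002: 'Settings'}

    def localized(string_id):
        return _FALLBACK.get(string_id, '')

# control ID -> localized-string ID (illustrative values only)
CONTROL_LABELS = {
    90: 30001,   # hypothetical home-menu control
    91: 30002,   # hypothetical settings control
}

def label_for_control(control_id):
    """Resolve a control to its localized label, or '' if unmapped."""
    string_id = CONTROL_LABELS.get(control_id)
    return localized(string_id) if string_id else ''
```

The table stays language-neutral; only the string IDs are stored, and the lookup goes through XBMC's own localization.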
(2014-03-10, 12:18)pvagner Wrote: Isn't the JSON-RPC API better for this? It can return the text of all the focusable controls. If nothing else, this could be used as a fallback.
I'm not sure there is any info relevant to this that you can get with the JSON-RPC API but can't get with infolabels. I just find infolabels less cumbersome to work with :)
I've used the JSON-RPC API in the past when I needed to, but I find its documentation tends to be unclear, and that has made me less inclined to use it when there is an easy alternative.
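For comparison, querying the current window over JSON-RPC is roughly the following; this assumes API v6's `GUI.GetProperties` with the `currentwindow` property, and the wrapper names are mine:

```python
import json

def current_window_request():
    """Build the JSON-RPC request asking for the current window."""
    return json.dumps({
        'jsonrpc': '2.0',
        'method': 'GUI.GetProperties',
        'params': {'properties': ['currentwindow']},
        'id': 1,
    })

def current_window_name():
    """Send the request through XBMC and pull out the window label."""
    import xbmc  # only importable inside XBMC
    reply = json.loads(xbmc.executeJSONRPC(current_window_request()))
    return reply['result']['currentwindow']['label']
```

This is presumably where the localized Slovak window titles mentioned above come from, since the reply carries the label in the interface language.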
(2014-03-10, 12:18)pvagner Wrote: I am going to try adding my favourite TTS backends, and then I'll be able to test more.
If you get anything working, I'd love to add it to the current backends in the addon.



The biggest obstacles I'm considering right now are:
  1. Accessing the text of controls without IDs
  2. XBMC Settings views. Much of this is dynamically generated and harder to get info from.
  3. Interrupting speech when a control is changed. Right now the current phrase has to finish before the next one can start.
  4. I'd like to get speech available on all major platforms. ATV2 has already proven difficult, and Android probably will as well.

Right now I think I'm going to concentrate primarily on accessing all the elements of the main interface and secondarily on more general access to addons and such.
I'm going to focus on making the addon work well with the Quartz skin, because it has a simpler layout, and I think focusing on one skin initially will be the fastest route to a usable interface.


On a side note, I'm thinking it would be useful to have this addon provide a module so other addons could easily add speech, even for non-accessibility uses.
So an addon could:
Code:
import xbmctts
xbmctts.say('download finished')
#87
I've got an NVDA backend implemented, so on Windows we are not limited to SAPI 5.
And only the window titles are hardcoded. I am getting 'home', 'videos', 'full screen', etc. while using your addon; however, when I use the JSON-RPC API I get these window titles localized in Slovak. So at least some of them are already translated somewhere.
Currently I am testing on Windows, and I have found that it even reacts to mouse movements and state changes, for example when the player changes to full screen, when the video stops playing, etc.

Questions:
- Is it reasonable to lower the sleep delays? The current 400 ms feels a little sluggish when quickly arrowing over a list.
- Is there no way to play to the XBMC output audio channel? Is it necessary to use aplay and similar?

Speculations / feature requests:
Before speaking new text it would be a good idea to interrupt the speech if the TTS is already speaking. I think we can just kill aplay / whatever is playing before speaking a new string. On Windows, NVDA does this by default, and SAPI 5 has a method for it.
It would be nice to add a playback-status keyboard shortcut that speaks aloud the title of the currently playing media, for example while using party mode or playing music in shuffle mode.
A bigger thing to think about: is there an API for intercepting subtitles? Of course it has low priority, but it would be nice to lower the volume a bit and speak the individual subtitles aloud as they appear on the screen.

Later on I'll move to Linux and try to get a speech-dispatcher TTS backend working, as this is what most blind people running Linux have preconfigured already.

I will send some pull requests once I find it working well over here.

Edit:
The Windows internal TTS has issues speaking non-ASCII characters. The generated VBS file is UTF-8 encoded; however, the VBScript interpreter handles it as if it were ANSI (cp1250 in my case). I don't know much about VBScript, and I can't find a way to make it Unicode-aware with a simple Google search.
Edit2:
Okay, the VBS issue is solved by generating a UTF-16 encoded file.
Edit3:
SAPI 5 support will have to be rewritten fully in Python, because spawning a new process each time we want to speak something makes it impossible to cancel the speech before speaking a new phrase, so every phrase has to be spoken to the end.
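For reference, generating the UTF-16 script from Python might look like this sketch (the function name is mine; cscript.exe reads a BOM-prefixed UTF-16 file as Unicode, which is what fixed the cp1250 mangling described above):

```python
import codecs

def write_sapi_vbs(path, text):
    """Write a VBScript that speaks `text` through SAPI 5.
    Doubling the quotes escapes them in a VBS string literal."""
    script = 'CreateObject("SAPI.SpVoice").Speak "%s"' % (
        text.replace('"', '""'))
    # codecs with 'utf-16' emits the BOM the VBScript host needs.
    f = codecs.open(path, 'w', encoding='utf-16')
    f.write(script)
    f.close()
```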
#88
(2014-03-10, 21:04)pvagner Wrote: - Is it reasonable to lower the sleep delays? The current 400 ms feels a little sluggish when quickly arrowing over a list.
Lowering the delay means it will try to speak too soon when you quickly change focus. Of course, if we can find a way to interrupt speech, then that would not be a problem.
If you can't interrupt, the other possibility is to lower the delay and put a second delay before it actually speaks, so that it only speaks when you stop moving. Somewhere there is an optimal set of delay(s).
Ultimately, I really hope to get at least some TTS backends to be interruptible.
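The two-delay idea could be sketched roughly like this; the class name and the delay value are illustrative, not from the addon:

```python
import time

class FocusDebouncer(object):
    """Speak only after focus has rested for `settle` seconds.
    The service loop polls update() with the focused text."""

    def __init__(self, settle=0.18):
        self.settle = settle
        self.pending = None
        self.changed_at = 0.0
        self.spoken = True

    def update(self, text, now=None):
        now = time.time() if now is None else now
        if text != self.pending:
            # Focus moved: restart the settle timer, say nothing yet.
            self.pending = text
            self.changed_at = now
            self.spoken = False
            return None
        if not self.spoken and now - self.changed_at >= self.settle:
            # Focus has rested long enough: speak it exactly once.
            self.spoken = True
            return self.pending
        return None
```

With a short poll interval plus the settle delay, rapid arrowing produces no speech until the user pauses, which should feel less sluggish than one fixed 400 ms sleep.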
(2014-03-10, 21:04)pvagner Wrote: - Is there no way to play to the XBMC output audio channel? Is it necessary to use aplay and similar?
You can use xbmc.playSFX(), but that is positively un-interruptible, and is non-blocking. On top of that it caches the sound, so if you reuse the same filename it will play the last sound. You can overcome the caching by saving to a unixtime-based filename (and hope XBMC doesn't keep filling memory with cached sounds), and you can overcome the non-blocking by importing the wave module, using it to calculate the length of the wav file, and then sleeping for that duration.
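A sketch of that workaround, with `play_blocking` as a hypothetical wrapper name (the `xbmc` import only resolves inside XBMC):

```python
import time
import wave

def wav_duration(path):
    """Length of a wav file in seconds, via the stdlib wave module."""
    w = wave.open(path, 'rb')
    try:
        return w.getnframes() / float(w.getframerate())
    finally:
        w.close()

def play_blocking(wav_path):
    """Dodge xbmc.playSFX() caching with a unixtime-based copy,
    then block for the clip's length to fake a blocking call."""
    import shutil
    import xbmc  # only importable inside XBMC
    uncached = '%s.%d.wav' % (wav_path, int(time.time()))
    shutil.copy(wav_path, uncached)
    xbmc.playSFX(uncached)
    time.sleep(wav_duration(uncached))
```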
(2014-03-10, 21:04)pvagner Wrote: Speculations / feature requests:
Before speaking new text it would be a good idea to interrupt the speech if the TTS is already speaking. I think we can just kill aplay / whatever is playing before speaking a new string. On Windows, NVDA does this by default, and SAPI 5 has a method for it.
Somehow I didn't find NVDA in my searches. It sounds great for this, as it is written in Python.
Getting interruptible speech is an important goal. I'll be working on ways of doing this for the various backends.
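One way to sketch the kill-before-speak idea for the subprocess-based backends (names are mine; `aplay` is just the default command):

```python
import subprocess

class InterruptiblePlayer(object):
    """Keep a handle on the player process so a new phrase can
    terminate the one still speaking before starting its own."""

    def __init__(self, command=('aplay',)):
        self.command = list(command)
        self.proc = None

    def stop(self):
        # Kill whatever is currently speaking, if anything.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()
        self.proc = None

    def play(self, wav_path):
        self.stop()  # cut off the current phrase first
        self.proc = subprocess.Popen(self.command + [wav_path])
```

Any backend that ends in "play this wav" could reuse the same wrapper, which would give interruptible speech everywhere aplay (or an equivalent) is used.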
(2014-03-10, 21:04)pvagner Wrote: It would be nice to add a playback-status keyboard shortcut that speaks aloud the title of the currently playing media, for example while using party mode or playing music in shuffle mode.
A bigger thing to think about: is there an API for intercepting subtitles? Of course it has low priority, but it would be nice to lower the volume a bit and speak the individual subtitles aloud as they appear on the screen.
It is possible that the subtitles are displayed on one of the labels in the FullScreenVideo window...
(2014-03-10, 21:04)pvagner Wrote: Later on I'll move to Linux and try to get a speech-dispatcher TTS backend working, as this is what most blind people running Linux have preconfigured already.

I will send some pull requests once I find it working well over here.
That sounds great.
(2014-03-10, 21:04)pvagner Wrote: Edit:
The Windows internal TTS has issues speaking non-ASCII characters. The generated VBS file is UTF-8 encoded; however, the VBScript interpreter handles it as if it were ANSI (cp1250 in my case). I don't know much about VBScript, and I can't find a way to make it Unicode-aware with a simple Google search.
Edit2:
Okay, the VBS issue is solved by generating a UTF-16 encoded file.
Edit3:
SAPI 5 support will have to be rewritten fully in Python, because spawning a new process each time we want to speak something makes it impossible to cancel the speech before speaking a new phrase, so every phrase has to be spoken to the end.
I haven't investigated it yet, but you can use the Windows internal TTS directly from Python. I intend to try it out soon.
Code:
import win32com.client
voice = win32com.client.Dispatch("SAPI.SpVoice")
voice.Speak(phrase)
#89
(2014-03-11, 01:45)ruuk Wrote: I haven't investigated it yet, but you can use the Windows internal TTS directly from Python. I intend to try it out soon.
Code:
import win32com.client
voice = win32com.client.Dispatch("SAPI.SpVoice")
voice.Speak(phrase)
Did some testing. This works in Python on Windows, but you have to install pywin32. Unfortunately it does not install to the XBMC Python directory, and the installer for XBMC's Python version fails because it can't find XBMC's Python version in the registry. If the contents of the install could be put into the XBMC Python site-packages directory, I'm sure this could be made to work, but in the end this is not a user-friendly solution.
#90
I haven't moved further; however, I prefer comtypes over pywin32.
Anyway, I have some packaging issues to consider:
- What about 3rd-party Python modules such as comtypes? Should we create separate XBMC addons for them, or should we bundle them into this addon?
The same goes for speech-dispatcher on Linux. Speech-dispatcher has Python bindings which communicate with the speech-dispatcher daemon over sockets. I have packaged this as a separate addon.
Edit:
Also, in order to send the speech to NVDA, we need to package its client library. That's a DLL file about 100 KB in size.
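If it helps, driving NVDA's client library from Python via ctypes might look roughly like this; the DLL filename/path and the wrapper are assumptions on my part, not tested code:

```python
import ctypes
import os

# NVDA ships a controller client DLL; the exact filename and where
# the addon bundles it are assumptions here.
NVDA_DLL = 'nvdaControllerClient32.dll'

def nvda_speak(text, dll_path=NVDA_DLL):
    """Speak through a running NVDA, if its client DLL is present.
    Returns False when the DLL is missing so callers can fall back
    to another backend."""
    if not os.path.exists(dll_path):
        return False
    client = ctypes.windll.LoadLibrary(dll_path)  # Windows only
    client.nvdaController_speakText(text)
    return True
```

Since the DLL is only ~100 KB, bundling it with the addon and falling back silently when NVDA isn't running seems workable.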