I installed the latest release of Ubuntu this weekend, excited to see all of the latest changes. The system is again more polished, and they are finally getting their act together with PulseAudio.
For the vast majority of users, PulseAudio is an improvement when it works. Not for text-to-speech users. When small clips of audio are played in quick succession, delays, however small, are very noticeable. And guess what: PulseAudio, as another layer in the audio subsystem, adds a delay. Now, when using espeak for text to speech in Karmic, there is a noticeable delay.
Open-sapi now has a chance to step up to the mark. A Wine package for Karmic, an essential component of the open-sapi install system, is still missing, but this will come with time. In the meantime this gives me an opportunity to iron out a few bugs and package it for release.
Watch this space
Tom
Monday, 2 November 2009
Sunday, 18 October 2009
Threads are the key
The first release of the open-sapi tool for rbutil has been confirmed to work under Windows and Linux, but Mac testing is still needed. This tool generates the supporting speech files needed by the speech accessibility features in Rockbox. It lets the user take a standard SAPI voice and use it on Windows, Mac and Linux systems to produce text-to-speech clips in WAV format.
Reworking the code from the main project has made it more efficient and faster. Analysing the time-critical functions has let me use the most time-efficient commands in those areas.
Now, using the system under Ubuntu Linux in combination with Speech Dispatcher, PulseAudio and Orca, we get performance close to what is currently seen with espeak.
Making the server multi-threaded has increased the scalability of the system. Further speed improvements are still to come from full integration into Speech Dispatcher through a native open-sapi module.
Further development of the threading model will enable the server to handle multiple clients efficiently.
This has helped ease the looming worry of whether the system will ever be efficient enough for real-time speech synthesis. There are still problems to iron out, but we are well on the way.
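The post doesn't describe the open-sapi server's actual protocol, so the following is only a minimal sketch of the general thread-per-client pattern behind a multi-threaded server, written in Python under loudly stated assumptions: the `synthesize` function, the newline-terminated request and the 4-byte length-prefixed reply are all hypothetical, and the stand-in synthesizer simply echoes the text back instead of driving a SAPI voice under Wine. What it illustrates is that `socketserver.ThreadingTCPServer` gives each connection its own thread, so one client waiting on synthesis does not hold up the others.

```python
import socket
import socketserver
import struct
import threading


def synthesize(text: str) -> bytes:
    # HYPOTHETICAL stand-in for a call into a SAPI voice running under Wine.
    # It just returns the UTF-8 bytes so the sketch runs anywhere.
    return text.encode("utf-8")


class SpeakHandler(socketserver.StreamRequestHandler):
    # One handler instance runs per client connection, each on its own thread.
    def handle(self) -> None:
        for raw in self.rfile:  # assumed protocol: one request per line
            text = raw.decode("utf-8").rstrip("\n")
            audio = synthesize(text)
            # Assumed length-prefixed reply so the client knows how much to read.
            self.wfile.write(struct.pack("!I", len(audio)) + audio)
            self.wfile.flush()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    server = socketserver.ThreadingTCPServer((host, port), SpeakHandler)
    server.daemon_threads = True  # client threads won't block interpreter exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(addr: tuple, text: str) -> bytes:
    # Minimal client: send one line, read back one length-prefixed reply.
    with socket.create_connection(addr) as sock, sock.makefile("rb") as f:
        sock.sendall(text.encode("utf-8") + b"\n")
        (length,) = struct.unpack("!I", f.read(4))
        return f.read(length)
```

In a real deployment the per-connection thread would block inside the speech engine call while other clients continue to be served; a fixed-size worker pool would be a natural refinement if the number of clients grows.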
Saturday, 5 September 2009
Vista voice imported!
A few days ago one of America's most revered educational institutions contacted me asking how they could assist me with open-sapi.
This is very exciting, as it is the first organisation to see the potential use of open-sapi in their projects. One of their requirements was to move from SAPI 4 voices to the newer SAPI 5.3 voices that have been in use in Windows since the release of Vista.
Open-sapi did not quite meet their needs, as it ran SAPI 5.1, but it still looked promising for helping to automate their back-end text-to-speech production.
So I started on this seriously last night, and now (the day after) I have Microsoft Anna chattering away on my Ubuntu machine.
So, a big milestone for open-sapi: it now supports the latest and greatest free SAPI TTS voice from Microsoft.
Wednesday, 26 August 2009
The Blog Begins
This is not really the start of Open Sapi, as it has been running for almost a year now, having started in October 2008.
This project was inspired by the lack of high-quality speech engines in a variety of languages on other operating systems. The aim of the project is to use the Microsoft Speech API in combination with high-quality SAPI speech engines on any other operating system.
I have been concentrating on Linux, in particular Ubuntu. The project is still in a pre-release development stage and can be found at http://code.google.com/p/open-sapi/.
Currently the client and server are both stable. The system's performance and reliability in use are questionable, however, because the project relies on many other components.
This has kept me busy for almost a year of development in my free time. I will shortly be recording a video of my system demonstrating a native Microsoft speech engine in use on Linux.
The primary use of the project has been in combination with Orca and Speech Dispatcher to provide TTS feedback using any SAPI engine on the Linux desktop.
So far there have been three other branches where the work I have been doing has been useful.
The first is a modified version of the server that can run on any operating system and gives the Rockbox Utility access to the full features available through SAPI 5.1, so that it can generate speech for the accessibility features of Rockbox with high-quality SAPI engines.
The second is running speech-enabled Windows games on Linux. This allows the games to run as they would on Windows, giving the same speech output they give when run under Microsoft Windows.
The final project is a new joint venture to use the other element of the Microsoft Speech API, getting speech recognition working and integrated into Linux. The idea is to have the SR engine from Dragon NaturallySpeaking process speech in the background and to use an architecture similar to Open Sapi's for integrating into the Linux desktop. I have been helping out on the side of this project, as I gained a lot of knowledge when I implemented the text-to-speech side.
There is a large amount of work to get on with, so I will say good day for the moment and keep adding information as I go.
