Bielefeld University, Faculty of Technology, winter term 2011/2012

Intelligent Systems Lab Project: Dialoginterface for Webservices

Participants

Supervisors

Motivation

One aspect of easing everyday tasks is to offer a service that lets people obtain information, or even use other services such as food delivery or quick public transport information, using natural speech.
Advantages would be:

Application Scenario

Objectives

The project goals are

Description

The project is designed to be used with any phone. To achieve this, a central service was created that manages incoming calls, speech recognition, speech synthesis, data retrieval and dialogue flow of all kinds. The VoiceXML standard provides a well-defined format on which the dialogue management is based. JVoiceXML serves as the VoiceXML browser that interprets the dialogue files generated by the service. To make the service reachable from any phone, the open-source software Zanzibar, which uses the JVoiceXML project, can be combined with an Asterisk PBX server.
Speech recognition is handled by Windows Speech Recognition, and MARY TTS is used for speech synthesis. As mentioned above, the dialogue flow is managed within VoiceXML files. These files and the corresponding grammar files are generated by PHP scripts using information obtained from PHP-based web scrapers.
Thanks to the flexibility of JVoiceXML and Asterisk, the information gathered for an order or query can be transmitted in various ways: a simple HTTP request, an email sent to a specific address (e.g. that of the pizza place), a call or a fax to an appropriate phone number, or any message passing between a script and a corresponding backend.
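To illustrate how a dialogue file with its grammar might be generated from scraped data, here is a minimal sketch. It is written in Python for brevity and is not the project's PHP code. The function name `build_order_dialogue`, the form and field names, the prompt wording and the sample menu entries are all hypothetical, and the grammar is embedded inline rather than written to a separate grammar file.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_order_dialogue(menu_items, language="en-US"):
    """Return a VoiceXML document (as a string) that asks for one menu item.

    menu_items is assumed to be a list of item names, e.g. the output of a
    web scraper. All element ids and prompt texts are illustrative only.
    """
    vxml = ET.Element(
        "vxml", {"version": "2.1", "xmlns": VXML_NS, "xml:lang": language}
    )
    form = ET.SubElement(vxml, "form", {"id": "order"})
    field = ET.SubElement(form, "field", {"name": "item"})

    prompt = ET.SubElement(field, "prompt")
    prompt.text = "What would you like to order?"

    # Inline grammar: one alternative per scraped menu entry.
    grammar = ET.SubElement(
        field, "grammar", {"version": "1.0", "root": "menu", "mode": "voice"}
    )
    rule = ET.SubElement(grammar, "rule", {"id": "menu"})
    one_of = ET.SubElement(rule, "one-of")
    for name in menu_items:
        ET.SubElement(one_of, "item").text = name

    # Read the recognized item back to the caller once the field is filled.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You ordered "
    ET.SubElement(confirm, "value", {"expr": "item"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_order_dialogue(["Pizza Margherita", "Spaghetti Bolognese"]))
```

Building the document with an XML library rather than string concatenation means that special characters in scraped item names (such as `&`) are escaped automatically, so the VoiceXML browser always receives well-formed input.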

Results

Our results are best seen in the demonstration video, which shows how to order anything on the delivery service's menu. The conversation flow can be improved by editing the VoiceXML file accordingly.
A demonstration is available as an interaction video (mp4, 20 MB; YouTube link: Video). The video also shows some problems of the speech recognizer: it does not recognize words in foreign languages, so they have to be pronounced as in the initial language. For example, Italian terms cannot be recognized when the initial language is German. In addition, the voice synthesizer sounds very robotic and its speech needs to become more fluent, although this can be improved in several ways.

Discussion and Conclusion

In conclusion, our project showed that it is possible to create a remarkably powerful service with a wide field of application, but that it is hindered by the following problems:

Outlook

SIWI can be enhanced and improved. For example, the voice synthesizer can be improved by adding new voices to the MARY TTS project, which would increase the system's acceptance.
The voice could also be offered as a personalization option in the service's settings.
Speech recognition is a subject of current research and is therefore not yet fully mature. As research progresses, future developers should consider integrating a different, more advanced speech recognizer. Unfortunately, most available speech recognizers are not open source, but a financial investment would pay off.