.. _speech_recognition_tool:

Speech Recognition Tool
=======================

`Sphinx 4 `_ is an open source speech recognition toolkit written in Java. It can recognize speech from audio files or from live microphone input. To work, it needs an acoustic model, a dictionary, and either a language model or a grammar. Using a grammar excludes using a language model. You can supply your own models or use the provided language models. These parameters are set via the configuration type. You can also supply a modified Sphinx configuration file to adjust the recognizer.

Related resources
-----------------

Related projects:

- `Sphinx4 at sourceforge `_.
- `Sphinx4 at github `_.
- :ref:`jsgf_parser`

Component repository:

- Browse component repository: `speech-recognition-tools `_.
- ``git clone https://projects.cit-ec.uni-bielefeld.de/git/lsp-csra.speech-recognition-tools.git/``

System startup:

The start script ``sphinxrecognizer.sh`` can be found at ``/opt/sphinx4-recognizer/bin/sphinx4-recognizer``. The program can be invoked with the following parameters:

-d, --data-path
    The location of the sound files to be recognized. Results are also stored here.

-i, --microphone
    Enable microphone input.

-e, --folder-exclusion
    Names of directories that should be ignored.

-r, --recursive-folder-crawling
    Boolean. If set, Sphinx looks for sound files in every directory inside the data path.

-m, --speechmodel-type
    Speech model configuration. You can use ``-m verbmobil`` or ``-m cocolab``.

-g, --grammar
    The grammar Sphinx should use. Using a grammar excludes using a language model.

-c, --sphinxconfig
    Configuration file that adjusts the recognizer's behavior.

-s, --scope
    The RSB scope on which the speech hypothesis is published.

Interfaces
----------

Input/Output:

.. attention:: The sound files you want to recognize must be 16 kHz, mono .wav files!

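
The format requirement above can be checked before a file is handed to the recognizer. The following is a minimal sketch using Python's standard ``wave`` module. It is not part of the speech-recognition-tool, and the function name ``check_wav`` is a hypothetical helper introduced only for illustration.

.. code-block:: python

    import wave


    def check_wav(path):
        """Return a list of problems; an empty list means 16 kHz mono WAV."""
        problems = []
        try:
            with wave.open(path, "rb") as wav:
                channels = wav.getnchannels()
                rate = wav.getframerate()
                if channels != 1:
                    problems.append("expected mono, found %d channels" % channels)
                if rate != 16000:
                    problems.append("expected 16000 Hz, found %d Hz" % rate)
        except (wave.Error, EOFError) as err:
            # Raised for files that are not valid (PCM) WAV data.
            problems.append("not a readable WAV file: %s" % err)
        return problems

Calling ``check_wav("file01.wav")`` returns an empty list for a usable file and a list of human-readable messages otherwise.
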

- Sphinx recognizes speech from sound files and stores the result in a text file, which you will find in the folder of the recognized file.
- If you use microphone input, the result is stored at the location given by the ``-d`` (data-path) option.

The speech-recognition-tool can publish the speech recognition results via RSB:

=========================================== ========================================
Scope (Informer)                            Type
------------------------------------------- ----------------------------------------
``/scope/scope``                            `SpeechHypotheses`_
=========================================== ========================================

- The result sent via RSB holds up to five best hypotheses.
- The best result carries timestamp tags for the individual words and for the whole hypothesis.
- The timestamps give the position at which the recognized word appears in the source data.
- If microphone input is used instead of source data, the timestamps give the time at which the recognized word occurred since the program was started, plus the system time at which the tool was started.
- A grammar tree is generated if the result was produced using a grammar.

.. _SpeechHypotheses: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#rst.dialog.SpeechHypotheses

Language Model Information
--------------------------

The language of the language models is German. The two speech models "Verbmobil" and "Cocolab" can be found at `speechmodel-repo `_ and are installed at ``/etc/speechconfig/``. The default model is the Verbmobil speech model.

Examples
--------

The following scenario is the basis for all examples. You have a folder ``/root/datafolder/`` with the contents shown in the table below.


+----------------------------------------------------------------------+
| /root/datafolder/                                                    |
+====================+===============+================+================+
| *file01.wav*       | *file02.wav*  | *file03.mp3*   | *file04.wav*   |
+--------------------+---------------+----------------+----------------+
| *newer_folder*     | *new_folder*  | *crypt_folder* | *file05.wav*   |
+--------------------+---------------+----------------+----------------+
| *log.log*          | *picture.png* | *file06.wav*   | *trash_folder* |
+--------------------+---------------+----------------+----------------+

This folder contains sound files with recorded speech.

Example 1: You want Sphinx to recognize these files. Start it with ``./sphinxrecognizer -d /root/datafolder/``. The ``-d`` parameter tells it where your audio data is. Sphinx then looks in ``/root/datafolder/`` for .wav files and recognizes them one after another. It ignores everything that does not have the .wav suffix. When it has finished, you will find a new file named ``parentFileName_dateAndTime`` that contains the recognized text. Result files are created in the same folder as the recognized data.

Example 2: You also have sound files stored in *new_folder*, *newer_folder* and *crypt_folder*. In that case use ``./sphinxrecognizer -d /root/datafolder/ -r true``. Sphinx then also looks for .wav files in all subdirectories.

Example 3: You know that *trash_folder* does not contain any interesting data. Extend the command line to ``./sphinxrecognizer -d /root/datafolder/ -r true -e trash_folder``.

Example 4: If you want the speech hypothesis to be published via RSB, use ``./sphinxrecognizer -d /root/datafolder/ -s /scope``. The results are then published via RSB on ``/scope``.

Example 5: You want to recognize live microphone input. Start the tool with ``./sphinxrecognizer -d /root/resultFolder/ -i true``.
You will then see that the tool is waiting for your input. Once it has recognized a complete phrase, it stores the result in a file in your result path. It keeps listening until you close the tool.

Things to keep in mind
----------------------

- If Sphinx gives you no results, consider the following:

  - sound file in the wrong format
  - sound file too noisy
  - sound file too long (speech blurred)

- Using a grammar is notably faster than using a language model.
- Grammar usage excludes language model usage.
- Sphinx has big problems recognizing files that contain noise.
- The sound files you want to recognize must be 16 kHz, mono .wav files!
- If microphone input doesn't work correctly, check the system default audio input device.
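
The file selection described in Examples 1 to 3 can be illustrated with a short Python sketch. This is not the tool's actual implementation. The function name ``find_wav_files`` is hypothetical, and matching excluded directories by their exact name is an assumption made for illustration.

.. code-block:: python

    import os


    def find_wav_files(data_path, recursive=False, excluded=()):
        """Collect .wav files under data_path, mimicking -d, -r and -e."""
        found = []
        for root, dirs, files in os.walk(data_path):
            if recursive:
                # Prune excluded directories in place so os.walk skips them.
                dirs[:] = sorted(d for d in dirs if d not in excluded)
            else:
                # Without -r only the top-level folder is inspected.
                dirs[:] = []
            for name in sorted(files):
                if name.endswith(".wav"):
                    found.append(os.path.join(root, name))
        return found

For the folder layout above, ``find_wav_files("/root/datafolder/")`` corresponds to Example 1, and ``find_wav_files("/root/datafolder/", recursive=True, excluded=("trash_folder",))`` corresponds to Example 3.
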