.. _speech_recognition_tool:

Speech Recognition Tool
=======================

`Sphinx 4 <http://cmusphinx.sourceforge.net/>`_ is an open-source speech recognition toolkit written in Java.

It can recognize speech from audio files or from live microphone input.

It requires additional resources, such as speech models or a grammar, to work.
You can supply your own resources or use the provided language models.

The recognizer needs an acoustic model, a dictionary, and either a language model or a grammar.
Using a grammar excludes using a language model.
These parameters are set via the configuration type.

You can also supply a modified Sphinx configuration file to adjust the recognizer.


Related resources
-----------------

Related projects:
 - `Sphinx4 at sourceforge <http://cmusphinx.sourceforge.net/>`_.
 - `Sphinx4 at github <https://github.com/cmusphinx/>`_.
 - :ref:`jsgf_parser`


Component repository: 
 - Browse component repository: `speech-recognition-tools <https://projects.cit-ec.uni-bielefeld.de/projects/lsp-csra/repository/speech-recognition-tools/>`_.
 - ``git clone https://projects.cit-ec.uni-bielefeld.de/git/lsp-csra.speech-recognition-tools.git/``


System startup:


The start script ``sphinxrecognizer.sh`` can be found at ``<prefix>/opt/sphinx4-recognizer/bin/sphinx4-recognizer``.

The program can be invoked with the following parameters:

 -d <ABSOLUTE_PATH>, --data-path <ABSOLUTE_PATH>  Directory containing the sound files to be recognized. Results are also stored here.

 -i <BOOLEAN>, --microphone <BOOLEAN>  Enable microphone input.

 -e <FOLDER;ANOTHER_FOLDER>, --folder-exclusion <FOLDER;ANOTHER_FOLDER>  Names of directories to be ignored.

 -r <BOOLEAN>, --recursive-folder-crawling <BOOLEAN>  Whether Sphinx looks for sound files in every directory inside the data path.

 -m <NAME_OF_MODEL>, --speechmodel-type <NAME_OF_MODEL>  Speech model configuration. Available models are ``verbmobil`` and ``cocolab``.

 -g <PATH_TO_GRAMMAR_FILE>, --grammar <PATH_TO_GRAMMAR_FILE>  The grammar Sphinx should use. Using a grammar excludes using a language model.

 -c <PATH_TO_CONFIG_FILE>, --sphinxconfig <PATH_TO_CONFIG_FILE>  Configuration file to adjust the recognizer's behavior.

 -s <SCOPE>, --scope <SCOPE>  The RSB scope on which the speech hypothesis is published.
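
When the recognizer is started from another program, the options above can be assembled into an argument list. The following is a minimal Python sketch and not part of the tool: the helper ``build_command`` and its keyword arguments are hypothetical, the spelling ``true``/``false`` for booleans follows the examples in this document, and joining several excluded folders with ``;`` is an assumption based on the ``<FOLDER;ANOTHER_FOLDER>`` placeholder.

```python
def build_command(executable, data_path, recursive=None, excluded=(),
                  model=None, grammar=None, scope=None, microphone=None):
    """Assemble an argument list for the recognizer start script.

    Hypothetical helper: it only uses the options documented above and
    is not shipped with the tool.
    """
    args = [executable, "-d", data_path]
    if microphone is not None:
        args += ["-i", "true" if microphone else "false"]
    if recursive is not None:
        args += ["-r", "true" if recursive else "false"]
    if excluded:
        # Assumption: several folder names are separated by semicolons.
        args += ["-e", ";".join(excluded)]
    if model is not None:
        args += ["-m", model]
    if grammar is not None:
        args += ["-g", grammar]
    if scope is not None:
        args += ["-s", scope]
    return args


command = build_command("./sphinxrecognizer", "/root/datafolder/",
                        recursive=True, excluded=["trash_folder"])
print(" ".join(command))
# prints: ./sphinxrecognizer -d /root/datafolder/ -r true -e trash_folder
```

Passing such a list to ``subprocess.run`` without a shell avoids having to quote the ``;`` separator, which a shell would otherwise treat as a command separator.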


Interfaces
----------

Input/Output: 

.. attention:: Sound files to be recognized must be mono ``.wav`` files with a sample rate of 16 kHz!

- Sphinx recognizes speech from sound files and stores each result in a text file in the same folder as the recognized file.
- If you use microphone input, the result is stored at the location given by the ``-d`` (``--data-path``) option.
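
Whether a file meets the format requirement can be checked before starting the recognizer. The sketch below uses Python's standard ``wave`` module. It is not part of the tool, and the function name ``check_wav_format`` is hypothetical.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file fits."""
    try:
        # str() keeps this working for pathlib.Path objects as well.
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
            channels = wav_file.getnchannels()
    except (wave.Error, EOFError) as error:
        # Raised when the file is not a readable PCM WAV file.
        return [f"not a readable WAV file: {error}"]

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channels, expected {expected_channels}")
    return problems
```

An empty list means the header reports 16 kHz mono. The check says nothing about noise or recording length.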

The speech-recognition-tool can publish the speech recognition results via RSB:

=========================================== ========================================
Scope (Informer)                            Type
=========================================== ========================================
``/scope/scope``                            `SpeechHypotheses`_
=========================================== ========================================

 - The result sent via RSB holds up to the five best hypotheses.
 - The best result carries timestamps for the individual words and for the whole hypothesis.
 - The timestamps give the position at which the recognized word occurs in the source data.
 - If microphone input is used instead of source data, each timestamp is the time at which the recognized word occurred since the program was started, plus the system time at which the tool was started.
 - A grammar tree is generated if the result was produced with a grammar.
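
The timestamp rule for microphone input is a plain addition of an offset and a start time. A small Python illustration follows, assuming both values are in milliseconds. The unit is not stated in this document, and the function and parameter names are hypothetical.

```python
def microphone_timestamp(offset_since_start_ms, tool_start_ms):
    """Timestamp of a word recognized from microphone input.

    Assumption: both arguments use the same unit (milliseconds here).
    """
    return tool_start_ms + offset_since_start_ms


# A word heard 2500 ms after a tool start at system time 1000000 ms.
print(microphone_timestamp(2500, 1_000_000))  # prints 1002500
```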

.. _SpeechHypotheses: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#rst.dialog.SpeechHypotheses

Language Model Information
--------------------------
The provided language models are German.

The two speech models "Verbmobil" and "Cocolab" can be found at `speechmodel-repo <https://projects.cit-ec.uni-bielefeld.de/projects/lsp-csra/repository/lsp-csra.speechmodels/>`_
and are installed at ``<prefix>/etc/speechconfig/``.
The default model is the Verbmobil speech model.

Examples
--------

The following scenario is the basis for all examples below.
Assume a folder ``/root/datafolder/`` with the contents shown in the table.

+----------------------------------------------------------------------+
| /root/datafolder/                                                    |
+====================+===============+================+================+
| *file01.wav*       | *file02.wav*  | *file03.mp3*   | *file04.wav*   |
+--------------------+---------------+----------------+----------------+
| *newer_folder*     | *new_folder*  | *crypt_folder* | *file05.wav*   |
+--------------------+---------------+----------------+----------------+
| *log.log*          | *picture.png* | *file06.wav*   | *trash_folder* |
+--------------------+---------------+----------------+----------------+

This folder contains sound files with recorded speech.

Example 1:

You want Sphinx to recognize these files.

To start Sphinx, run ``./sphinxrecognizer -d /root/datafolder/``.
The ``-d`` parameter tells it where your audio data is located.

Sphinx then looks in ``/root/datafolder/`` for ``.wav`` files and tries to recognize them one after another.
It ignores every file that does not have the suffix ``.wav``.

When it has finished, you will find a new file named ``parentFileName_dateAndTime`` that contains the recognized text.
Result files are created in the same folder as the recognized data.

Example 2:

Some sound files are also stored in *new_folder*, *newer_folder* and *crypt_folder*.
In that case, run ``./sphinxrecognizer -d /root/datafolder/ -r true``.

Sphinx then also looks for ``.wav`` files in all subdirectories.

Example 3:

You also know that *trash_folder* does not contain any relevant data.
To skip it, extend the command line to ``./sphinxrecognizer -d /root/datafolder/ -r true -e trash_folder``.
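
The file selection described in Examples 1 to 3 can be pictured with the following Python sketch. It is an illustration only, not the tool's actual implementation. The function name ``find_wav_files`` is hypothetical, and the sketch assumes that an excluded name skips a directory of that name at any depth and that the ``.wav`` suffix is matched as written.

```python
from pathlib import Path


def find_wav_files(data_path, recursive=False, excluded=()):
    """List the .wav files the recognizer would be expected to process."""
    root = Path(data_path)
    # "**/*.wav" also matches files directly inside the data path.
    pattern = "**/*.wav" if recursive else "*.wav"
    selected = []
    for path in sorted(root.glob(pattern)):
        # Directory names between the data path and the file itself.
        parent_names = path.relative_to(root).parts[:-1]
        if any(name in excluded for name in parent_names):
            continue
        if path.is_file():
            selected.append(path)
    return selected


# Corresponds to: -d /root/datafolder/ -r true -e trash_folder
files = find_wav_files("/root/datafolder/", recursive=True,
                       excluded=["trash_folder"])
```

In the example scenario, *file03.mp3*, *log.log* and *picture.png* are never selected, whatever the values of ``recursive`` and ``excluded``.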

Example 4:

To publish the speech hypothesis via RSB, run ``./sphinxrecognizer -d /root/datafolder/ -s /scope``; the results are then published via RSB on the scope ``/scope``.

Example 5:

You want to recognize live microphone input.
Start the tool with ``./sphinxrecognizer -d /root/resultFolder/ -i true``.
The tool then waits for your input.
Once it has recognized a complete phrase, it stores the result in a file in your result path.
It keeps listening until you close the tool.


Things to keep in mind
----------------------
 - If Sphinx gives you no results, consider the following possible causes:

   - The sound file is in the wrong format.
   - The sound file is too noisy.
   - The sound file is too long (speech blurred).

 - Using a grammar is notably faster than using a language model.
 - Grammar usage excludes language model usage.
 - Sphinx has considerable difficulty recognizing files that contain noise.
 - Sound files to be recognized must be mono ``.wav`` files with a sample rate of 16 kHz!
 - If microphone input doesn't work correctly, check the system default audio input device.