.. _sphinx_evaluation_tool:

Sphinx4 Evaluation Tool
=======================

`Sphinx 4 <http://cmusphinx.sourceforge.net/>`_ is an open source speech recognition toolkit, written in Java.

This tool runs speech recognition on sets of sound files and can compare the results with a keyword file.
It is very similar to :ref:`speech_recognition_tool`.

It needs speech models or a grammar to work.
You can supply your own or use the provided language models.

You will need an acoustic model, a dictionary, and either a language model or a grammar.
Using a grammar excludes using a language model.
These parameters are set via the configuration type.

You can also pass a modified Sphinx configuration file to adjust the recognizer.


Related resources
-----------------

Related projects:
 - `Sphinx4 at SourceForge <http://cmusphinx.sourceforge.net/>`_.
 - `Sphinx4 at GitHub <https://github.com/cmusphinx/>`_.
 - :ref:`speech_recognition_tool`

Component repository: 
 - Browse component repository: `microphone-evaluation <https://projects.cit-ec.uni-bielefeld.de/projects/lsp-csra/repository/microphone-evaluation/>`_.
 - ``git clone https://projects.cit-ec.uni-bielefeld.de/git/lsp-csra.microphone-evaluation.git/``


System startup
--------------


The start script ``sphinxevaluator`` can be found at ``/vol/csra/releases/trusty/lsp-csra-nightly/opt/sphinx4-evaluation-tool/bin/sphinxevaluator``.

The program can be invoked with the following parameters:

 -d <ABSOLUTE_PATH>, --data-path <ABSOLUTE_PATH>  The location of the sound files you want to be recognized. The results are also stored here.

 -k <ABSOLUTE_PATH>, --keywords <ABSOLUTE_PATH>  The path to a keyword file. The result will then contain only keywords.

 -e <FOLDER;ANOTHER_FOLDER>, --folder-exclusion <FOLDER;ANOTHER_FOLDER>  Names of directories that should be ignored, separated by semicolons.

 -r <BOOLEAN>, --recursive-folder-crawling <BOOLEAN>  Whether Sphinx looks for sound files in every directory inside the data path.

 -m <NAME_OF_MODEL>, --speechmodel-type <NAME_OF_MODEL>  Speech model configuration. You can use ``-m verbmobil`` or ``-m cocolab``.

 -g <'NAME PATH_TO_GRAMMAR_FILE'>, --grammar <'NAME PATH_TO_GRAMMAR_FILE'>  The grammar Sphinx should use. Using a grammar excludes using a language model.

 -c <PATH_TO_CONFIG_FILE>, --sphinxconfig <PATH_TO_CONFIG_FILE>  Configuration file to adjust the recognizer's behaviour.
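
If you start the tool from a script, the parameters above can be assembled programmatically.
The following is a minimal Python sketch and not part of the tool: ``build_command`` and its argument names are hypothetical, only flags documented above are used, and the start script is assumed to be reachable as ``sphinxevaluator``.

.. code-block:: python

   def build_command(data_path, recursive=False, exclude=(), keywords=None,
                     model=None, executable="sphinxevaluator"):
       """Return the argument list for one run (hypothetical helper)."""
       command = [executable, "-d", data_path]
       if recursive:
           # -r takes a boolean value, as in "-r true".
           command += ["-r", "true"]
       if exclude:
           # Folder names are joined with semicolons: <FOLDER;ANOTHER_FOLDER>.
           command += ["-e", ";".join(exclude)]
       if keywords:
           command += ["-k", keywords]
       if model:
           command += ["-m", model]
       return command

Passing the returned list to ``subprocess.run`` avoids having to quote the semicolon-separated exclusion list for the shell.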

 
Interfaces
-----------

Input/Output: 

.. attention:: The sound files you want to recognize must be mono .wav files with a sample rate of 16 kHz!

- Sphinx recognizes speech from sound files and stores the result in a text file.
- A new folder "hypotheses" is created in the data folder you specified with the ``-d`` parameter. It contains the recognition results.
- The result files are named after your parameters.
- If you enable keyword output, the result file will *only* contain recognized keywords.
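
Before starting a long run it can help to check that your sound files meet the format requirement.
The following is a minimal sketch using Python's standard ``wave`` module; it is not part of the tool, ``check_wav`` is a hypothetical helper, and it only checks the channel count and sample rate named above.

.. code-block:: python

   import wave


   def check_wav(path, expected_rate=16000, expected_channels=1):
       """Return a list of format problems; an empty list means none were found."""
       try:
           with wave.open(path, "rb") as wav_file:
               channels = wav_file.getnchannels()
               rate = wav_file.getframerate()
       except (wave.Error, EOFError) as error:
           # The wave module only reads uncompressed PCM .wav files.
           return ["not a readable PCM .wav file: {}".format(error)]
       problems = []
       if channels != expected_channels:
           problems.append("expected {} channel(s), found {}".format(
               expected_channels, channels))
       if rate != expected_rate:
           problems.append("expected {} Hz, found {} Hz".format(
               expected_rate, rate))
       return problems

An empty list for every file means the channel count and sample rate are as required; it says nothing about noise or length.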
 

Language Model Information
--------------------------
The provided language models are German.

The two speech models "Verbmobil" and "Cocolab" can be found in the `speechmodel-repo <https://projects.cit-ec.uni-bielefeld.de/projects/lsp-csra/repository/lsp-csra.speechmodels/>`_
and are installed at ``/vol/csra/releases/trusty/lsp-csra-nightly/etc/speechconfig/``.
The default is the Verbmobil speech model.

Examples
----------

The following scenario is the basis for all examples in this section.
You have a folder ``/root/datafolder/`` with the contents shown in the table below.

+----------------------------------------------------------------------+
| /root/datafolder/                                                    |
+====================+===============+================+================+
| *set01*            | *set02*       | *set03*        | *set04*        |
+--------------------+---------------+----------------+----------------+
| *set05*            | *new_folder*  | *crypt_folder* | *fileXY.wav*   |
+--------------------+---------------+----------------+----------------+
| *log.log*          | *picture.png* | *set06*        | *trash_folder* |
+--------------------+---------------+----------------+----------------+

This folder contains the speech recordings that you want to evaluate.

Example 1:

Now you want to use Sphinx to recognize them.

To start Sphinx, run ``./sphinxevaluator -d /root/datafolder/``.
The ``-d`` parameter tells it where your audio data is.

Sphinx will then look in ``/root/datafolder/`` for .wav files and try to recognize them one after another.
It ignores every file that does not have the suffix .wav.

When it has finished, you will find a new folder containing the results.
The result folder is created in the same folder as the recognized data.

Example 2:

You also have some sound sets stored in the *set0X* folders.
In that case you can use ``./sphinxevaluator -d /root/datafolder/ -r true``.

Sphinx will then also look for .wav files in all subdirectories.
Every *set0X* folder will then contain a new directory called *hypothesis* where the results are saved.
If other folders contain matching .wav files, Sphinx will try to recognize them as well.

Example 3:

You also know that *trash_folder* and *crypt_folder* do not contain any interesting data.
So you can extend your command line to ``./sphinxevaluator -d /root/datafolder/ -r true -e 'trash_folder;crypt_folder'``.
Now the recognizer will ignore the given directories.
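
To preview which files such a run would pick up, the search described in Examples 1 to 3 can be imitated in a few lines of Python.
This is only a sketch of the described behaviour, not the tool's actual code: ``find_wav_files`` is a hypothetical helper, and it assumes that excluded names match directories at any depth and that only the lowercase suffix ``.wav`` counts.

.. code-block:: python

   import os


   def find_wav_files(data_path, recursive=False, excluded=()):
       """Collect .wav files below data_path, skipping excluded directory names."""
       found = []
       for directory, subdirectories, files in os.walk(data_path):
           if recursive:
               # Pruning the list in place stops os.walk from entering these folders.
               subdirectories[:] = sorted(
                   name for name in subdirectories if name not in excluded)
           else:
               # Without recursion only the data path itself is searched.
               subdirectories[:] = []
           for name in sorted(files):
               if name.endswith(".wav"):
                   found.append(os.path.join(directory, name))
       return found

For the scenario above, ``find_wav_files("/root/datafolder/", recursive=True, excluded=("trash_folder", "crypt_folder"))`` corresponds to the command line of Example 3.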

Example 4:

If you want to know whether the recognizer understood designated keywords, you can start the tool like this: ``./sphinxevaluator -d /root/datafolder/ -k /root/keywordfile``.
It will then recognize the files as usual, but compare its results with the keyword file and write only the spotted keywords to the result file.
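
The idea of keyword spotting can be illustrated with a short Python sketch.
It is not the tool's implementation, and the keyword file format is an assumption made for this illustration only: one keyword per line, matched case-insensitively against whitespace-separated words. ``load_keywords`` and ``spot_keywords`` are hypothetical helpers.

.. code-block:: python

   def load_keywords(path):
       """Read one keyword per line (assumed format), ignoring blank lines."""
       with open(path, encoding="utf-8") as keyword_file:
           return {line.strip().lower() for line in keyword_file if line.strip()}


   def spot_keywords(hypothesis, keywords):
       """Return the words of a recognition result that are keywords, in order."""
       return [word for word in hypothesis.split() if word.lower() in keywords]

With the keywords "kaffee" and "licht", the recognition result "mach bitte das Licht an" would be reduced to "Licht".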

.. todo:: how to set up and where to store keyword-file


Things to keep in mind
----------------------
 - If Sphinx gives you no results, consider the following:

   - The sound file is in the wrong format.
   - The sound file is too noisy.
   - The sound file is too long (speech blurred).

 - The recognizer spends at most ten seconds on each file. If the limit is exceeded, the file is skipped.
 - Using a grammar is notably faster than using a language model.
 - Grammar usage excludes language model usage.
 - Sphinx has big problems recognizing files that contain noise.
 - The sound files you want to recognize must be mono .wav files with a sample rate of 16 kHz!