.. _speech_recognition:

Speech Recognition
==================

The speech recognition component, based on CMU Sphinx-4, is part of the incremental speech processing toolkit InproTK. It produces `Dialog Acts`_ or `Speech Hypotheses`_ based on keyword spotting on the ASR results. At present we have one dialog flow configuration for each interaction island (``/citec/csra/home/kitchen/assistance``, ``/citec/csra/home/hallway/entrance``).

Related resources
-----------------

Component repository:

- Browse component repository: inprotk-conf.
- ``git clone https://projects.cit-ec.uni-bielefeld.de/git/inprotk-conf.git/``

System startup:

- The speech recognition can be found in ``lsp-csra-base.sh`` on the ``inpro`` tab.
- The component script is called ``component_inprotk_speechreco``.

Related projects:

- Browse repository system-startup.
- Browse project InproTK-DSG.

Interfaces
----------

================================ ==============
Scope (Listener)                 Type
================================ ==============
``//audio/in/16bit/16000Hz/LE``  `Sound Chunk`_
================================ ==============

The component publishes the speech recognition results via RSB:

====================== ====================
Scope (Informer)       Type
====================== ====================
``/dialogact``         `Dialog Act`_
``/speechhypotheses``  `Speech Hypotheses`_
====================== ====================

.. _Dialog Acts: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#message-dialogact
.. _Dialog Act: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#message-dialogact
.. _Sound Chunk: http://docs.cor-lab.de//rst-manual/trunk/html/generated/stable/package-rst-audition.html#rst.audition.SoundChunk
.. _Speech Hypotheses: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#rst.dialog.SpeechHypotheses

.. _speech_recognition_visualization:

Speech recognition visualizations
---------------------------------

The speech recognition component provides several visualizations, which are shown on startup. The first window shows the current speech hypothesis of the speech recognition. The second window visualizes the speech state; the speech recognition can be paused by clicking the red circle button. The last window shows the prosody monitor.

.. figure:: /images/software/inpro1.png
   :figclass: align-center

   The current speech hypothesis.

.. figure:: /images/software/inpro2.png
   :figclass: align-center

   Voice activity detection.

.. figure:: /images/software/inpro3.png
   :figclass: align-center

   Prosody monitor.

Examples
--------

React to a human greeting (in the hallway)
###########################################

.. code-block:: java
   :emphasize-lines: 3,19

   // RSB Listener
   Listener listener = Factory.getInstance().
       createListener("/citec/csra/home/hallway/entrance/dialogact");
   listener.activate();

   // Add a local event handler
   listener.addHandler(new TaskHandler(){

       @Override
       public void internalNotify(Event event) {

           // Only handle dialog act types
           if (event.getData() instanceof DialogActType.DialogAct) {

               DialogActType.DialogAct dialogact = (DialogActType.DialogAct) event.getData();

               // Only react on final results
               if (dialogact.getIU().getEdittype().equals(DialogAct.EditType.COMMIT)) {
                   switch(dialogact.getType()){
                       case GREET:
                           System.out.println("Greeting");
                           break;
                       case GOODBYE:
                           System.out.println("Goodbye");
                           break;
                       default:
                           System.out.println("something else");
                   }
               }
           }
       }
   // Close the handler registration; "true" waits until the handler is installed
   }, true);

Configure own speech recognition
################################

1. Check out the configuration project:
   .. code-block:: bash

      git clone -b minimal https://projects.cit-ec.uni-bielefeld.de/git/lsp-csra.inprotk-conf.git

2. Change the grammar in the config folder: ``src/main/resources/de/unibi/agai/inproapp/config/test.gram``
3. Change configurations, e.g., the output scope, in ``iu-config.xml``.

Create own JSGF
###############

In this example we will create a "simpleCommand.gram" grammar. The grammar is defined in a file with the ``.gram`` extension and consists of two parts, the header and the body. The header itself consists of up to three parts:

self-identification:

- looks like: ``#JSGF version char-encoding locale;``
- Example: ``#JSGF V1.0 UTF-8 de;``
- ``#JSGF`` is required and ``version char-encoding locale`` is optional.

grammar-name:

- looks like: ``grammar grammarName;`` or ``grammar packageName.grammarName;``
- Example: ``grammar simpleCommand;``
- The grammar-name is required.

imports:

- looks like: ``import ;`` or ``import ;``
- Example: ``import ;``
- The grammar header can optionally include import declarations. An import declaration allows one or all of the public rules of another grammar to be referenced locally.
- In the CSRA all grammars need to be in the same folder for imports!

The complete header would look like this:

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

The body contains the rules for this grammar. Each rule should be defined only once; a duplicate definition overwrites the earlier one. The order in which rules are defined is not significant. We structure the rules by `DialogActTypes`_, so that we have only one public rule which defines these DialogActTypes. Only the first public rule can be used as an entry point! All further public rules can only be imported into other grammars.

Step by step we will write some rules for this grammar. The patterns for rule definitions are:
.. code-block:: bash

   = ruleExpansion ;
   public = ruleExpansion ;

The components of the rule definition are an optional public declaration, the name of the rule being defined, an equals sign ' = ', the expansion of the rule, and a closing semicolon ' ; '. The rule expansion defines how the rule may be spoken. It is a logical combination of tokens (text that may be spoken).

Let's define a simple rule to say "Hello Flobi":

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = ;
   = Hello Flobi ;

Next we want to add more robots as alternatives. A rule can be defined as a set of alternative expansions separated by vertical bar characters ' | '.

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = ;
   = Hello Flobi | Hello Meka ;

We could also combine parentheses and alternatives to make it more elegant.

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = ;
   = Hello (Flobi | Meka) ;

A rule expansion can also refer to another rule, so we could create a rule that contains all the robot names.

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = ;
   = Hello ;
   = Flobi | Meka ;

Now we can say either "Hello Flobi" or "Hello Meka", but we cannot simply say "Hello". For this we can use optional grouping: square brackets may be placed around any rule definition to indicate that the contents are optional.

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = ;
   = Hello [] ;
   = Flobi | Meka;

Let's add more rules besides the greeting:

.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = | ;
   = Hello [] ;
   = Flobi | Meka;
   = = (wer bin ich | wie ist mein Name | wie heiße ich);

Or a slightly more complex one:
.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = | | ;
   = Hello [] ;
   = Flobi | Meka;
   = = (wer bin ich | wie ist mein Name | wie heiße ich);

   = ;
   = [] [(mach | schalte | stell)] [] ([das] Licht|[die] Lichter |[die] Lampen) [] [];
   = (überall | hier | alle | in der Küche | im Bad | im Wohnzimmer);
   = (an | aus | heller |dunkler);

Bigger expressions should be used with care, since they tend to make the recognition less precise.

We can also modify how often an expansion may be spoken by using a Kleene star ' * ' or a plus symbol ' + '. A rule expansion followed by the Kleene star indicates that the expansion may be spoken **zero or more** times, and a rule expansion followed by the plus symbol indicates that the expansion may be spoken **one or more** times.

.. code-block:: bash
   :emphasize-lines: 10

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = | | ;
   = Hello [] ;
   = Flobi | Meka;
   = = (wer bin ich | wie ist mein Name | wie heiße ich);

   = ;
   = [] (bitte)* [(mach | schalte | stell)] [] ([das] Licht|[die] Lichter |[die] Lampen) [] [];
   = (überall | hier | alle | in der Küche | im Bad | im Wohnzimmer);
   = (an | aus | heller |dunkler);

We added a little politeness ("bitte") to the action-request switch-light. It can be said as often as we want and can also be omitted. If we used a plus symbol there, it would have to be said at least once!

The grammar format also supports right recursion, so a rule can refer to itself as the last part of its definition. We can add a filler rule to handle hesitation noises etc., and we want to allow fillers to be said multiple times.
.. code-block:: bash

   #JSGF V1.0 UTF-8 de;
   grammar simpleCommand;

   public = | | | ;
   = Hello [] ;
   = Flobi | Meka;
   = = (wer bin ich | wie ist mein Name | wie heiße ich);

   = ;
   = [] (bitte)* [(mach | schalte | stell)] [] ([das] Licht|[die] Lichter |[die] Lampen) [] [];
   = (überall | hier | alle | in der Küche | im Bad | im Wohnzimmer);
   = (an | aus | heller |dunkler);
   = (ähm | hm | hä | aha | argh | och | oje | öh) [];

So a filler has to be said once and may then be repeated as often as we want.

| These are the basics for usage; there are more features, such as tagging and weighting tokens.
| See https://www.w3.org/TR/jsgf/ for detailed information about JSGF.
| The grammar is read by the speech recognition tool and by the jsgf-parser tool. They should not differ in their features, but the usage of certain features is not tested or stable. Here is a list of what is supported:

==================== =============================== ===============================
Feature              sphinx                          jsgf-parser
==================== =============================== ===============================
rulename characters:
text                 yes                             yes
numbers only         no                              no
text and numbers     yes                             yes
special characters   ``_ $ - : , | \ @ % ! ^ & ~ #`` ``_ $ - : , | \ @ % ! ^ & ~ #``
rulenames and        yes                             yes
quoted Tokens        yes                             no(?)
comments             yes                             yes
imports              yes                             yes
rule expansions:
sequences            yes                             yes
alternatives         yes                             yes
parentheses          yes                             yes
optional grouping    yes                             yes
weights              yes                             no(?)
kleene-star          yes                             yes
plus-symbol          yes                             yes
tags                 yes                             no(?)
right recursion      yes                             yes
==================== =============================== ===============================

.. _DialogActTypes: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.DialogAct.Type

Creation of Dialog Acts
#######################

Here we have a very simple example grammar:
.. code-block:: xml
   :emphasize-lines: 2

   grammar example;
   public = ( | | );
   = (hello | nice to meet you ) Flobi;
   = ( Goodbye | bye | see you soon) Flobi;
   = (what time is it | how is the weather);

Assumption: someone said "hello Flobi" as input for this tool. A `SpeechHypothesis`_ will be generated and sent. The generated SpeechHypothesis contains:

======================================================================== ==================
Field contains                                                           In this example
======================================================================== ==================
List of the words that describe the understood input.                    ``[hello, flobi]``
Confidence for the SpeechHypothesis.                                     1 (totally sure)
Grammar-tree containing grammar rules and spoken tokens in XML format.   ``hello flobi``
Flag if the result is final.                                             final
======================================================================== ==================

The grammar-tree contains the name of the grammar as its root. This tag is ignored when inspecting the tree, so that a rule becomes the root element. A `DialogAct`_ can be created from the SpeechHypothesis and its related grammar-tree. The root element of the tree will be the `DialogActType`_. See which `DialogActTypes`_ are available.

.. attention::

   In this example the grammar rule and the `DialogActType`_ match exactly, and it is highly recommended to create grammars that match these dialog act types! Small deviations can be handled (is the rule a prefix of a type?), but usually the type will be ``OTHER`` if the triggered rule cannot be matched with a type!

This means that if there is no grammar-tree for the SpeechHypothesis, the DialogActType will be ``OTHER``.

Final and non-final SpeechHypotheses differ at least in their grammar-tree. The tree for a non-final hypothesis contains all possible trees for that hypothesis, so there may be more than one.
For non-final hypotheses with more than one grammar-tree, only the first tree is inspected. The final flag of the hypothesis also affects the EditType: for final SpeechHypotheses the EditType of the DialogAct will be ``COMMIT``, otherwise it will be ``ADD``.

The generated DialogAct contains:

============================================================================ ====================================
Field contains                                                               In this example
============================================================================ ====================================
Type of the current DialogAct                                                ``GREET``
incremental_unit, contains information about the `EditType`_, id and more    EditType = ``COMMIT``
Input `SpeechHypotheses`_                                                    Complete SpeechHypothesis from above
============================================================================ ====================================

The field lists above are incomplete; see the marked links for more details.

.. _DialogAct: http://docs.cor-lab.de//rst-manual/trunk/html/generated/sandbox/package-rst-dialog.html#message-dialogact
.. _EditType: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.IncrementalUnit.state
.. _DialogActType: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.DialogAct.Type
.. _DialogActTypes: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.DialogAct.Type
.. _SpeechHypothesis: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.SpeechHypothesis
.. _SpeechHypotheses: http://docs.cor-lab.de//rst-manual/0.15/html/generated/stable/package-rst-dialog.html#rst.dialog.SpeechHypotheses
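
The mapping rules described in this section can be illustrated with a small, self-contained sketch. It is not the InproTK implementation: the class ``DialogActSketch``, its enums and its method names are hypothetical, only the types named in this document (``GREET``, ``GOODBYE``, ``OTHER``) are listed, and the prefix check is just one possible reading of "is the rule a prefix of a type?".

.. code-block:: java

   import java.util.Locale;

   public class DialogActSketch {

       // Hypothetical subset of the dialog act types; see the linked documentation for the full list.
       enum Type { GREET, GOODBYE, OTHER }

       // The two edit types mentioned in this section.
       enum EditType { ADD, COMMIT }

       // Final hypotheses lead to COMMIT, non-final ones to ADD.
       static EditType editTypeFor(boolean isFinal) {
           return isFinal ? EditType.COMMIT : EditType.ADD;
       }

       // Map the root rule of the grammar-tree to a type (assumed matching strategy).
       static Type typeForRule(String rootRule) {
           // No grammar-tree (or no rule name) means OTHER.
           if (rootRule == null || rootRule.isEmpty()) {
               return Type.OTHER;
           }
           String name = rootRule.toUpperCase(Locale.ROOT);
           // First try an exact match between rule name and type name.
           for (Type t : Type.values()) {
               if (t != Type.OTHER && t.name().equals(name)) {
                   return t;
               }
           }
           // Then accept a rule name that is a prefix of a type name.
           for (Type t : Type.values()) {
               if (t != Type.OTHER && t.name().startsWith(name)) {
                   return t;
               }
           }
           return Type.OTHER;
       }

       public static void main(String[] args) {
           // Final hypothesis whose root rule is "greet"
           System.out.println(typeForRule("greet") + " " + editTypeFor(true));
           // Non-final hypothesis whose root rule is only a prefix of a type
           System.out.println(typeForRule("good") + " " + editTypeFor(false));
           // Rule that matches no type
           System.out.println(typeForRule("smalltalk") + " " + editTypeFor(true));
       }
   }

With these assumptions, a final hypothesis whose root rule is ``greet`` yields ``GREET`` with ``COMMIT``, while a rule that matches no type falls back to ``OTHER``, which is why rule names should follow the dialog act types.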