Marcus Spies, IBM Global Services, Heidelberg / Universität Heidelberg

Abstract
This talk gives an overview (with acoustic demonstrations) of the state of research in auditory perception, emphasizing its consequences for new audio products and media. Its focal points are the constitution of auditory objects, the reconstruction of articulation in speech perception, and the role of learned motor programs in parsing complex auditory scenes. Research results in this area influence speech technology, the possibilities of data reduction in new media, the design of new musical instruments, and the use of acoustic feedback in work tasks with high monitoring load.
This talk will give an overview of the state of the art in research on auditory perception with a special emphasis on the impact of this research on new applications and media. Auditory demonstrations of some reported phenomena will be presented.
In speech recognition, office products with high recognition rates have been developed on the basis of statistical procedures applied to samples of spoken text. However, starting with the sensory apparatus of the cochlea, human auditory perception uses highly elaborate processing mechanisms, some properties of which have become clear only quite recently. Central processing then leads to an organization of sounds into auditory objects, analogous to objects in vision. This allows us to perceive and distinguish phonemes and musical patterns even in noisy environments or under "cocktail-party" mixture conditions. If speech technology with large vocabularies is to be applied in noisy environments, equivalents of these cognitive mechanisms will have to be employed.
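The statistical approach mentioned above can be illustrated with a toy hidden Markov model decoded by the Viterbi algorithm, the core of classical statistical recognizers. The two phoneme states, the probabilities, and the "low"/"high" acoustic feature labels below are illustrative assumptions, not parameters of any real system.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (phoneme) sequence for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Pick the best predecessor state for s at this time step.
            prob, prev = max(
                (V[-2][ps] * trans_p[ps][s] * emit_p[s][obs], ps) for ps in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Hypothetical two-phoneme model over coarse acoustic features.
states = ["a", "i"]
start_p = {"a": 0.6, "i": 0.4}
trans_p = {"a": {"a": 0.7, "i": 0.3}, "i": {"a": 0.4, "i": 0.6}}
emit_p = {"a": {"low": 0.8, "high": 0.2}, "i": {"low": 0.3, "high": 0.7}}

print(viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p))
# → ['a', 'a', 'i']
```

A real recognizer trains such probabilities on large corpora of spoken text; the decoding principle, however, is exactly this maximization over state sequences.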
Another major area of influence from cognitive science on speech and music technology concerns the close connection between perception and motor systems in humans. Perceptual distinctions between environmental sounds and speech sounds are enhanced by cognitive articulatory models: we hear "somebody talking," not just "speech." Motor experience also seems to play a significant role in the temporal resolution of acoustic events. Practicing a (traditional) musical instrument amounts to establishing motor programs involving complex hierarchical subdivisions of time. The impact of active musical experience on time-related processes such as the identification of auditory objects is still poorly understood. New instruments are currently being designed that will allow humans to explore sounds in completely new ways, analogous to existing tools in visual virtual reality. Properties of timing in human movements will influence the design of such musical instruments as well as the use of auditory cues in working environments.
Therefore, in forthcoming systems of speech and music technology, results from cognitive science are highly likely to have a strong practical impact. Among the first examples of this tendency are phoneme models expressed in terms of articulation parameters, and data reduction algorithms based on psychoacoustics, which are already employed in the technology of new media.
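The psychoacoustic data reduction mentioned above can be sketched in miniature: perceptual audio coders exploit masking, i.e. weak spectral components near a strong one are inaudible and need not be stored. The linear dB spreading function, the 16 dB offset, and the slope below are illustrative assumptions only; real coders use far more refined masking models.

```python
import math

def mask_spectrum(magnitudes, offset_db=16.0, slope_db_per_bin=10.0):
    """Zero out spectral bins that fall below the masking threshold
    of a stronger nearby component (a crude simultaneous-masking model)."""
    kept = list(magnitudes)
    for i, masker in enumerate(magnitudes):
        if masker <= 0:
            continue
        masker_db = 20 * math.log10(masker)
        for j, x in enumerate(magnitudes):
            if j == i or x <= 0:
                continue
            # Masking threshold: masker level minus an offset, falling off
            # linearly (in dB) with spectral distance from the masker.
            threshold_db = masker_db - offset_db - slope_db_per_bin * abs(j - i)
            if 20 * math.log10(x) < threshold_db:
                kept[j] = 0.0  # inaudible: can be dropped before quantization
    return kept

# A strong component at bin 1 masks its weak neighbours; bin 4 survives.
spectrum = [0.01, 1.0, 0.02, 0.001, 0.5]
print(mask_spectrum(spectrum))
# → [0.0, 1.0, 0.0, 0.0, 0.5]
```

Dropping (or coarsely quantizing) the masked bins is what buys the large compression ratios of perceptual coders without audible loss.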