ES 202 211-2003
Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Extended front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm (V1.1.1)

Standard No.
ES 202 211-2003
Release Date
2003
Published By
ETSI - European Telecommunications Standards Institute
Latest
ES 202 211-2003
Scope
The present document specifies algorithms for extended front-end feature extraction, their transmission, back-end pitch tracking and smoothing, and back-end speech reconstruction, which form part of a system for distributed speech recognition. The specification covers the following components:

a) the algorithm for front-end feature extraction to create Mel-Cepstrum parameters;
b) the algorithm for extraction of additional parameters, viz., fundamental frequency F0 and voicing class;
c) the algorithm to compress these features to provide a lower data transmission rate;
d) the formatting of these features with error protection into a bitstream for transmission;
e) the decoding of the bitstream to generate the front-end features at a receiver together with the associated algorithms for channel error mitigation;
f) the algorithm for pitch tracking and smoothing at the back-end to minimize pitch errors;
g) the algorithm for speech reconstruction at the back-end to synthesize intelligible speech.

NOTE: The components (a), (c), (d), and (e) are already covered by ES 201 108 [1]. Besides these four components, the present document covers the components (b), (f), and (g) to provide back-end speech reconstruction and enhanced tonal language recognition capabilities. If these capabilities are not of interest, the reader is better served by the un-extended ES 201 108 [1].

The present document does not cover the "back-end" speech recognition algorithms that make use of the received DSR front-end features.

The algorithms are defined in a mathematical form, pseudo-code, or as flow diagrams. Software implementing these algorithms written in the 'C' programming language will be provided with the final published version of the present document. Conformance tests are not specified as part of the standard. The recognition performance of proprietary implementations of the standard can be compared with those obtained using the reference 'C' code on appropriate speech databases.

It is anticipated that the DSR bitstream will be used as a payload in other higher level protocols when deployed in specific systems supporting DSR applications.

The Extended Front-End (XFE) standard incorporates tonal information, viz., fundamental frequency F0 and voicing class, as additional parameters. This information can be used for enhancing the recognition accuracy of tonal languages, e.g. Mandarin, Cantonese, and Thai.

The Extended Front-End (XFE) standard incorporates Voice Activity information as part of the voicing class information. This can be used for segmentation (or end-point detection) of the speech data for improved recognition performance.
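As an illustration of the segmentation use mentioned in the scope, the following is a minimal sketch of end-point detection driven by per-frame voicing class labels that have already been decoded at the back-end. It is not the method defined in the standard: the label string "non-speech", the function name speech_segments, and the min_frames threshold are all hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Hypothetical label marking frames that carry no speech. The actual
# voicing-class encoding of the standard is not given in this summary.
NON_SPEECH = "non-speech"


def speech_segments(voicing: List[str], min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, end exclusive, for every run
    of consecutive speech frames that is at least min_frames long."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current speech run began, if any
    for i, label in enumerate(voicing):
        if label != NON_SPEECH:
            if start is None:
                start = i  # a new speech run begins here
        elif start is not None:
            # The run ended at frame i; keep it only if long enough.
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(voicing) - start >= min_frames:
        segments.append((start, len(voicing)))
    return segments


if __name__ == "__main__":
    frames = (["non-speech"] * 2
              + ["unvoiced", "voiced", "voiced", "voiced"]
              + ["non-speech", "voiced", "non-speech"])
    print(speech_segments(frames))
```

Dropping runs shorter than min_frames is one simple way to ignore isolated frames; a recognizer could pass only the returned frame ranges to its decoder.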

ES 202 211-2003 history

  • 2003 ES 202 211-2003 Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Extended front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm (V1.1.1)


