European Telecommunications Standards Institute (ETSI)

Contact Information

650, route des Lucioles
Sophia Antipolis CEDEX, Alpes Maritimes 06921 France

Phone:

33 492-944200

Fax:

33 493-654716

Business Type:

Service

Supplier Website

ETSI - ES 202 212

Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Extended advanced front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm

Organization:	ETSI
Publication Date:	1 August 2003
Status:	inactive
Page Count:	93

Scope:

The present document specifies algorithms for extended advanced front-end feature extraction, their transmission, back-end pitch tracking and smoothing, and back-end speech reconstruction which form part of a system for distributed speech recognition. The specification covers the following components:

a) the algorithm for advanced front-end feature extraction to create Mel-Cepstrum parameters;

b) the algorithm for extraction of additional parameters, viz., fundamental frequency F0 and voicing class;

c) the algorithm to compress these features to provide a lower data transmission rate;

d) the formatting of these features with error protection into a bitstream for transmission;

e) the decoding of the bitstream to generate the advanced front-end features at a receiver together with the associated algorithms for channel error mitigation;

f) the algorithm for pitch tracking and smoothing at the back-end to minimize pitch errors;

g) the algorithm for speech reconstruction at the back-end to synthesize intelligible speech.

NOTE: The components a), c), d) and e) are already covered by the ES 202 050 [2]. Besides these (four) components, the present document covers the components b), f) and g) to provide back-end speech reconstruction and enhanced tonal language recognition capabilities. If these capabilities are not of interest, the reader is better served by (un-extended) ES 202 050 [2].

The present document does not cover the "back-end" speech recognition algorithms that make use of the received DSR advanced front-end features.

The algorithms are defined in mathematical form, as pseudo-code, or as flow diagrams. Software implementing these algorithms, written in the 'C' programming language, is contained in the ZIP file es_202212v010101p0.zip, which accompanies the present document. Conformance tests are not specified as part of the standard. The recognition performance of proprietary implementations of the standard can be compared with that obtained using the reference 'C' code on appropriate speech databases.

It is anticipated that the DSR bitstream will be used as a payload in other higher level protocols when deployed in specific systems supporting DSR applications. In particular, for packet data transmission, it is anticipated that the IETF AVT RTP DSR payload definition (see bibliography) will be used to transport DSR features using the frame pair format described in clause 7.

The extended advanced DSR standard is designed for use with discontinuous transmission and to support the transmission of Voice Activity information. Annex A describes a VAD algorithm that is recommended for use in conjunction with the Advanced DSR standard; however, it is not part of the present document, and manufacturers may choose to use an alternative VAD algorithm.

The Extended Advanced Front-End (XAFE) incorporates tonal information, viz., fundamental frequency F0 and voicing class, as additional parameters. This information can be used for enhancing the recognition accuracy of tonal languages, e.g. Mandarin, Cantonese, and Thai.

Document History

ES 202 212

November 1, 2005

ES 202 212

November 1, 2003

ES 202 212

August 1, 2003
