
IEEE - WHITE PAPER: INDIAN LANGUAGE RESOURCES - SPEECH SUBCOMMITTEE REPORT

Organization: IEEE
Publication Date: 1 January 2023
Status: active
Page Count: 36
Scope:

The document represents the output of the speech workgroup, covering use cases, the current landscape of standards, and the gaps that exist in speech-language resources for Indian languages. The scope of Indian languages covered in this document is limited to the 22 officially recognized scheduled languages of the country (listed explicitly below), together with the acknowledged dialects of each language. The document covers the following areas:

Dominant use cases driving the deployment of speech-language technology for the scheduled Indian languages, the majority of which belong to the Indo-Aryan or Dravidian language families. These use cases can be driven from both public-utility and industry perspectives.

List of speech-language resources required for the identified use cases for the officially recognized scheduled languages of India: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.

Identification of the standards required by the use cases driving deployment. This should lead to a detailed description of the existing gaps. The study can incorporate results from similar undertakings around the world; a good example is the study undertaken by the European Union, which the authors will use as a reference for their work.

Detailed listing of the available speech resources for the listed official languages. The listing should show how well these resources conform to the required standards; a good model for such a view is the language matrix used by the W3C [1]. The resources should also include any available computer code for the identified use cases.

The document may also propose workshops, conferences, research initiatives, contests, and similar activities to enable detailed discussion and deliberation and to promote the investigations needed to expose the gaps.

A critical disclaimer is that this study addresses only the gaps in standards and makes no attempt to define the required standards. The authors make no claim that the work applies equally to the other, nonofficial languages of India.

Before delving into the details of the points above, the authors first outline the main technology components typically implemented in speech-based solutions. This highlights the main requirements and evaluation metrics used for the various speech-based components, which in turn helps contextualize the need for standardization.
