IEEE - WHITE PAPER: INDIAN LANGUAGE RESOURCES - TEXT PROCESSING SUBCOMMITTEE REPORT


active, Most Current
Organization: IEEE
Publication Date: 1 January 2023
Status: active
Page Count: 41
Scope:

The scope of this survey includes the following:

NLP comprises many text processing tasks. Some of them require standards [for example, POS tagsets and named entity recognition (NER)], while others may not (for example, word sense and domain terms). The survey identifies the tasks for which standardization is required, covering standardization of annotation (including tagset and guidelines), definitions, and formatting for the different text processing tasks. The authors categorize the tasks by input along the following dimensions, and explore the tasks in each dimension and their need for standardization:

a. Character-level

b. Word-level

c. Sentence-level

d. Discourse-level

e. Code-mixed

In addition, the survey includes a few end-user applications [e.g., question answering (QnA) and summarization] as case studies to understand the need for standardization in specific tasks.

A detailed survey of the existing standards, focusing primarily on Indian languages (ILs). The survey also includes global standards, where available, to show the broader picture across international languages.

Furthermore, the scope of the survey also includes documenting available resources and tools.

Identifying the gaps.

The scope of the task does not include the following:

There are many approaches to solving each text processing task. Some of them may correlate with the granularity of annotation (e.g., the number of tags). The survey does not include any study of the approaches used to solve the tasks.

Often, multiple standards are available for a task, having originated from different research groups at the same time. This survey lists all of them but does not compare them side by side, because such a comparison is outside the scope of prestandardization. Instead, it generalizes the gaps across them.
