IEEE - Institute of Electrical and Electronics Engineers, Inc.

Contact Information

445 Hoes Lane
Piscataway, NJ 08854 USA

Phone:

(732) 981-0060
(800) 701-IEEE

Fax:

(732) 981-9667

Business Type:

Service

IEEE - WHITE PAPER: INDIAN LANGUAGE RESOURCES - TEXT PROCESSING SUBCOMMITTEE REPORT

INDIAN LANGUAGE RESOURCES—TEXT PROCESSING SUBCOMMITTEE REPORT

active, Most Current

Organization:	IEEE
Publication Date:	1 January 2023
Status:	active
Page Count:	41

Scope:

The scope of this survey includes the following:

NLP encompasses many text processing tasks. Some of them require standards [such as a POS tagset or named entity recognition (NER)], while others may not (word sense, domain terms, etc.). The survey identifies the tasks for which standardization is required, covering standardization of the annotation (including tagset and guidelines), the definition, and the formatting for the different text processing tasks. The authors categorize the tasks by input along the following dimensions, and explore the tasks in each dimension and their need for standardization:

a. Character-level

b. Word-level

c. Sentence-level

d. Discourse-level

e. Code-mixed

In addition to the above, the survey includes a few end-user applications [e.g., question answering (QnA) and summarization] as case studies to understand the need for standardization for specific tasks.

A detailed survey of the existing standards, focusing primarily on Indian languages (ILs). The survey also includes global standards, where available, to show the bigger picture across international languages.

Furthermore, the scope of the survey also includes documenting available resources and tools.

Identifying the gaps.

The scope of the task does not include the following:

There are many approaches to solving each text processing task, and some of them may correlate with the granularity of annotation (e.g., number of tags). The survey does not study the approaches used to solve the tasks.

Often, multiple standards (originating from different research groups at the same time) are available for a task. This survey lists all of them but does not compare them side by side, since such a comparison is outside the scope of prestandardization. Instead, it generalizes the gaps across them.

Document History

WHITE PAPER: INDIAN LANGUAGE RESOURCES - TEXT PROCESSING SUBCOMMITTEE REPORT

January 1, 2023

