DSF/ISO/DIS 24614-2
Language resource management - Word segmentation of written texts - Part 2: Word segmentation for Chinese, Japanese and Korean
| Field | Value |
|---|---|
| Organization: | DS |
| Status: | inactive |
| Page Count: | 38 |
| ICS Code (Writing and transliteration): | 01.140.10 |
Scope:
The basic concepts and general principles for word segmentation defined in Part 1 are applied to Chinese, Japanese and Korean (CJK). Word segmentation is intended to meet the requirements of computational applications of language resources, of natural language processing, and of other specific applications such as information retrieval (IR) and machine translation (MT). Part 2 is restricted to the task of word segmentation itself, which is distinct from morphological or syntactic analysis per se, although word segmentation depends heavily on morpho-syntactic analysis. The main task of Part 2 is to define the word segmentation unit (WSU) for Chinese, Japanese and Korean. Although the three languages are related at the lexical level, each has distinct structural characteristics, and these differences have to be reflected in the definition of word segmentation and in its practical guidelines. Because the three languages share words composed of Chinese characters, the general rules for identifying WSUs in Chinese text can, to some extent, also be applied to the processing of Japanese and Korean.