Constraint-based Word Segmentation for Chinese

Publication: Contribution to book/anthology/report › Contribution to book/anthology › Research

Abstract

Written Chinese text does not separate words in the way that European languages do with space characters, and this creates the Chinese Word Segmentation Problem (CWSP): given a text in Chinese, divide it correctly into segments corresponding to words. Good solutions are in demand for virtually any nontrivial computational processing of Chinese text, ranging from spell checking and internet search to deep analysis.
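
To make the problem statement concrete, the following minimal Python sketch contrasts a greedy, dictionary-based segmenter with an exhaustive enumeration of all dictionary-consistent segmentations. It is an illustration only and not the CHR Grammar approach of the paper; the function names, the toy lexicon, and the `max_len` parameter are assumptions introduced for this example.

```python
# Illustrative sketch only: a toy lexicon and two naive segmenters.
# None of these names come from the paper; they are assumptions for this example.

LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, lexicon, max_len=4):
    """Greedy segmentation: at each position take the longest lexicon word.

    A single character is accepted as a fallback when nothing longer matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def all_segmentations(text, lexicon):
    """Enumerate every way to split text into lexicon words only."""
    if not text:
        return [[]]
    results = []
    for size in range(1, len(text) + 1):
        head = text[:size]
        if head in lexicon:
            for rest in all_segmentations(text[size:], lexicon):
                results.append([head] + rest)
    return results


sentence = "研究生命起源"  # "to study the origin of life"
greedy = forward_max_match(sentence, LEXICON)
consistent = all_segmentations(sentence, LEXICON)
print(greedy)      # ['研究生', '命', '起源']
print(consistent)  # [['研究', '生命', '起源']]
```

The greedy pass commits to 研究生 ("graduate student") and leaves 命 stranded, whereas the only segmentation that uses lexicon words throughout is 研究 / 生命 / 起源. Choosing between such alternatives in general requires information beyond the lexicon, which is why segmentation interacts with later analysis phases.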

Isolating the individual words is usually the first phase in the analysis of a text but, as with many other language analysis tasks, doing it perfectly essentially requires insight into the syntactic and pragmatic content of the text. While handling these levels in parallel is easy for a competent human language user, computer-based methods tend to be separated into phases with little or no interaction. Accepting this as a fact means that CWSP becomes a playground for a plethora of different ad hoc and statistically based methods.

In this paper, we present experiments in implementing different approaches to CWSP in the framework of CHR Grammars [Christiansen, 2005], which provide a constraint-solving approach to language analysis. CHR Grammars are based on Constraint Handling Rules, CHR [Frühwirth, 1998, 2009], a declarative, high-level programming language for the specification and implementation of constraint solvers.

Original language: English
Book title: Constraints and Language
Editors: Philippe Blache, Henning Christiansen, Veronica Dahl, Denys Duchier, Jørgen Villadsen
Publisher: Cambridge Scholars Publishing
Publication date: 2014
Pages: 237-251
Chapter: 11
ISBN (Print): 978-1-4438-6052-9
Status: Published - 2014
