Constraint-based Word Segmentation for Chinese

Research output: Chapter in Book/Report/Conference proceeding > Book chapter > Research

Abstract

Written Chinese does not separate words the way European languages do with space characters. This gives rise to the Chinese Word Segmentation Problem (CWSP): given a text in Chinese, divide it correctly into segments corresponding to words. Good solutions are in demand for virtually any nontrivial computational processing of Chinese text, ranging from spell checking and internet search to deep analysis.
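
To make the problem statement concrete, the following minimal Python sketch enumerates every way of splitting an unspaced string into words from a lexicon. It is an illustration only and not the method of the chapter: the function `segmentations`, the toy lexicon `TOY_LEXICON`, and the example string are all assumptions introduced here.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        # The empty string has exactly one segmentation: no words at all.
        return [[]]
    results: List[List[str]] = []
    # Try every prefix as a candidate first word, shortest first.
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Combine this word with every segmentation of the remainder.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon, chosen only to produce an ambiguity.
TOY_LEXICON = {"研究", "研究生", "生命", "命", "起源"}

if __name__ == "__main__":
    for seg in segmentations("研究生命起源", TOY_LEXICON):
        print(" / ".join(seg))
```

With this toy lexicon the string has two candidate segmentations, roughly "study / life / origin" and "graduate student / fate / origin". A lexicon alone cannot tell which one is correct, which is the difficulty CWSP names.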

Isolating the individual words is usually the first phase in the analysis of a text, but, as with many other language analysis tasks, doing it perfectly essentially requires insight into the syntactic and pragmatic content of the text. While this parallelism is easy for a competent human language user, computer-based methods tend to be separated into phases with little or no interaction. Accepting this as a fact means that CWSP becomes a playground for a plethora of ad hoc and statistically based methods.

In this paper, we present experiments in implementing different approaches to CWSP in the framework of CHR Grammars [Christiansen, 2005], which provide a constraint-solving approach to language analysis. CHR Grammars are based on Constraint Handling Rules, CHR [Frühwirth, 1998, 2009], a declarative, high-level programming language for the specification and implementation of constraint solvers.
Original language: English
Title of host publication: Constraints and Language
Editors: Philippe Blache, Henning Christiansen, Veronica Dahl, Denys Duchier, Jørgen Villadsen
Publisher: Cambridge Scholars Publishing
Publication date: 2014
Pages: 237-251
Chapter: 11
ISBN (Print): 978-1-4438-6052-9
Publication status: Published - 2014

Cite this

Christiansen, H., & Bo, L. (2014). Constraint-based Word Segmentation for Chinese. In P. Blache, H. Christiansen, V. Dahl, D. Duchier, & J. Villadsen (Eds.), Constraints and Language (pp. 237-251). Cambridge Scholars Publishing.