Uncovering Prepositional Senses

Tine Lassen

Research output: Book/Report › Ph.D. thesis

Abstract

This dissertation is concerned with the semantics of Danish prepositions in an ontology-based information retrieval framework. Such a framework requires conceptual indexing of texts, and our goal is to index texts based on the conceptual content of larger text chunks – ideally the conceptual content of sentences. The conceptual content of text chunks is mapped into a so-called generative ontology, which is to be understood as a non-finite set of concepts. Basically, a generative ontology consists of a given finite ontology ordered by the ISA relation, called the skeleton ontology, and a set of production rules (cf. generative grammars) that allow for the production of compound concepts. We represent such compound concepts in the ontology language ONTOLOG. In this language, compound concepts are represented as conceptual feature structures of the form c[r1:c1]. The attributions consist of pairs of relations and concept arguments, which function as conceptual restrictions on the core concept. However, the generative ontology should not admit arbitrary combinations of relations and concepts: we thus propose to introduce ontological affinities that specify ontologically admissible ways of combining concepts. The main focus of the dissertation is to identify such ontological affinities for semantic relations denoted by a selection of Danish prepositions. We describe two experiments. The first, small-scale experiment concerns a domain-specific corpus of texts from the domain of nutrition. For this corpus, sentences containing syntactic structures of the form NP-PREP-NP are annotated with information such as the semantic types of the heads of the noun phrases and the relation denoted by the preposition. The relations used in the annotation of these data stem from a small pre-defined set of relations, and the ontological type information stems from the SIMPLE ontology.
The resulting data set was used as input to a machine-learning algorithm, and the result was a set of rules that predict the semantic relation of a preposition based on the ontological types of its arguments. Based on the encouraging results of this first experiment, the second and larger experiment was launched. This experiment concerns a general-language corpus for which the same type of syntactic structure was annotated. This time, the annotation used the newly released Danish wordnet, DanNet, as a source of ontological type information, while the relations stem from a larger set of relations that resulted from an analysis of dictionary entries and corpus evidence containing prepositions. Again, machine learning was applied, and the result was a set of rules. These rules were transformed into a dictionary of prepositional senses in which, given a preposition and a sense, ontological affinities are expressed as restrictions on the ontological types of the arguments. Thus, the essential result of this research is knowledge about the relations that a subset of Danish prepositions can denote and the ontological affinities for these relations.
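
As a rough illustration of the idea summarised above, the following sketch shows how a dictionary of prepositional senses could restrict which compound concepts are admitted. It is not taken from the thesis: the skeleton ontology, the relation labels (LOC, PRP), the sense entries, and the function names are all hypothetical and chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical skeleton ontology: child -> parent under the ISA relation.
# The concept names are illustrative and are not drawn from SIMPLE or DanNet.
ISA = {
    "vitamin": "substance",
    "food": "substance",
    "substance": "entity",
    "disease": "state",
    "state": "entity",
}


def isa(concept: Optional[str], ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or lies below it in the ISA order."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


@dataclass(frozen=True)
class Affinity:
    """One ontological affinity: a relation plus type restrictions on both arguments."""
    relation: str
    head_type: str
    arg_type: str


# Hypothetical sense dictionary: preposition -> admissible affinities.
# The relation labels are placeholders, not the thesis's relation inventory.
SENSES = {
    "i": [Affinity("LOC", "substance", "food")],       # Danish "i" ("in")
    "mod": [Affinity("PRP", "substance", "disease")],  # Danish "mod" ("against")
}


def compound(preposition: str, head: str, arg: str) -> Optional[str]:
    """Build a feature structure head[relation:arg] if some affinity admits it."""
    for aff in SENSES.get(preposition, []):
        if isa(head, aff.head_type) and isa(arg, aff.arg_type):
            return f"{head}[{aff.relation}:{arg}]"
    return None  # no affinity admits this combination
```

Under these assumptions, `compound("i", "vitamin", "food")` yields the structure `vitamin[LOC:food]`, whereas `compound("i", "vitamin", "disease")` yields `None`, because no listed sense of the preposition accepts that pair of ontological types.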
Original language: English
Publisher: Roskilde Universitet
Number of pages: 314
Publication status: Published - Sept 2010
Series: Datalogiske Skrifter
Number: 131
ISSN: 0109-9779
