Ontology-based Information Retrieval

Research output: Book/Report, Ph.D. thesis, Research

Abstract

In this thesis, we present methods for introducing ontologies into information retrieval. The main hypothesis is that including conceptual knowledge, such as ontologies, in the information retrieval process can help solve major problems in current information retrieval.


Using ontologies in this way poses a number of challenges. We focus on similarity measures derived from knowledge about the relations between concepts in ontologies; on recognizing semantic information in texts and mapping this knowledge into the ontologies in use; and on fusing ontological similarity and ontological indexing into a realistic information retrieval scenario.


To recognize semantic knowledge in a text, shallow natural language processing is applied during indexing, revealing knowledge down to the level of noun phrases. We also briefly cover the identification of semantic relations within and between noun phrases. We then discuss the kinds of problems that increasing compoundness in the structure of concepts causes during query evaluation.
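As a rough illustration of shallow parsing down to the noun-phrase level, the sketch below groups already POS-tagged tokens into noun phrases with a greedy pattern: an optional determiner, any number of adjectives, then one or more nouns. The Penn Treebank tags, the pattern, and the function name `chunk_noun_phrases` are assumptions made for this example. They are not the grammar used in the thesis, and tagging itself is assumed to have happened upstream.

```python
from typing import List, Tuple

# Assumed input: (word, Penn Treebank POS tag) pairs produced by some upstream tagger.
Tagged = List[Tuple[str, str]]

def chunk_noun_phrases(tagged: Tagged) -> List[List[str]]:
    """Greedy chunker for the illustrative pattern (DT)? (JJ)* (NN|NNS|NNP|NNPS)+."""
    phrases: List[List[str]] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":       # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:
            # At least one noun follows the modifiers: emit the whole span as a phrase.
            phrases.append([word for word, _ in tagged[i:k]])
            i = k
        else:
            # No noun here: skip this token and try again from the next one.
            i += 1
    return phrases

if __name__ == "__main__":
    sentence = [("The", "DT"), ("black", "JJ"), ("dog", "NN"), ("chased", "VBD"),
                ("a", "DT"), ("small", "JJ"), ("cat", "NN")]
    print(chunk_noun_phrases(sentence))  # [['The', 'black', 'dog'], ['a', 'small', 'cat']]
```

The left-to-right scan keeps the chunker linear in the number of tokens. It only delimits phrases and does not attempt to find relations inside or between them.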


We discuss measures of similarity between concepts based on distances in the structure of the ontology. In addition, we introduce a shared nodes measure. We compare it with a number of other measures against a set of intuitive similarity properties. In this comparison, the shared nodes measure appears superior, though computationally more complex. We then discuss some major problems with shared nodes. These problems stem from the fact that different relations bring the concepts they connect closer to different degrees. To address them, we introduce a generalized measure called weighted shared nodes.
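Below is a minimal sketch of a shared-nodes style similarity. It assumes a formulation in which each concept is represented by the set α(x) of nodes reachable from it by following is-a links upward, the concept itself included. Similarity is then a ρ-weighted mix of the fraction of each concept's set that the two concepts share: sim(x, y) = ρ·|α(x) ∩ α(y)|/|α(x)| + (1 - ρ)·|α(x) ∩ α(y)|/|α(y)|. The toy ontology, the parameter `rho`, and the function names are hypothetical. The weighted variant, which distinguishes relation types, is not modelled here.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each concept maps to its direct parents (is-a links).
Ontology = Dict[str, Set[str]]

ONTOLOGY: Ontology = {
    "anything": set(),
    "animal": {"anything"},
    "dog": {"animal"},
    "cat": {"animal"},
    "poodle": {"dog"},
}

def upward_closure(ontology: Ontology, concept: str) -> Set[str]:
    """Return the concept plus every node reachable by following parent links upward."""
    seen = {concept}
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in ontology.get(node, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def shared_nodes_similarity(ontology: Ontology, x: str, y: str, rho: float = 0.5) -> float:
    """rho-weighted shared-nodes similarity; symmetric only when rho == 0.5."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    ax = upward_closure(ontology, x)
    ay = upward_closure(ontology, y)
    shared = len(ax & ay)
    return rho * shared / len(ax) + (1.0 - rho) * shared / len(ay)

if __name__ == "__main__":
    # poodle reaches {poodle, dog, animal, anything}; cat reaches {cat, animal, anything}.
    print(round(shared_nodes_similarity(ONTOLOGY, "poodle", "cat"), 3))  # 0.583
```

Choosing `rho` other than 0.5 makes the measure asymmetric. For example, with `rho=0.8` the similarity from `dog` to `poodle` differs from the similarity from `poodle` to `dog`.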


Finally, we discuss the use of concept similarity in query evaluation. We introduce a semantic expansion approach that incorporates concept similarity. We also present a generalized fuzzy set retrieval model that applies expansion during query evaluation. The fuzzy set model is not commonly used in current information retrieval systems. Nevertheless, it appears to offer the flexibility needed to generalize to an ontology-based retrieval model. With the introduction of a hierarchical fuzzy aggregation principle, compound concepts can be handled in a straightforward and natural way.
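The abstract does not give the operators of the fuzzy model, so the following is only a minimal sketch of the general idea under assumed operators. Each query concept is expanded into a fuzzy set whose membership degrees come from a concept-similarity function. A document is indexed as a fuzzy set of weighted concepts, and it matches an expanded concept to the degree max over c of min(expansion degree, index weight). Compound concepts are handled by a two-level aggregation: min over a compound's constituents, then the arithmetic mean over the compounds. The functions `expand`, `match`, and `score`, the `threshold`, the toy similarity values, and the choice of min and mean are all hypothetical stand-ins. They are not the hierarchical aggregation defined in the thesis.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Any concept-similarity function returning a degree in [0, 1].
Similarity = Callable[[str, str], float]

def expand(concept: str, vocabulary: Iterable[str], sim: Similarity,
           threshold: float = 0.1) -> Dict[str, float]:
    """Semantic expansion: the concept gets membership 1.0, others get sim(concept, c).

    Memberships below `threshold` are dropped from the fuzzy set.
    """
    expanded: Dict[str, float] = {}
    for c in vocabulary:
        degree = 1.0 if c == concept else sim(concept, c)
        if degree >= threshold:
            expanded[c] = degree
    return expanded

def match(expanded: Dict[str, float], document: Dict[str, float]) -> float:
    """Fuzzy overlap: max over shared concepts of min(expansion degree, index weight)."""
    return max((min(degree, document[c]) for c, degree in expanded.items() if c in document),
               default=0.0)

def score(query: Sequence[Sequence[str]], document: Dict[str, float],
          vocabulary: Iterable[str], sim: Similarity, threshold: float = 0.1) -> float:
    """Two-level aggregation: min within each compound, arithmetic mean across compounds."""
    vocab: List[str] = list(vocabulary)  # materialize so it can be reused per concept
    if not query:
        return 0.0
    compound_scores = []
    for compound in query:
        parts = [match(expand(c, vocab, sim, threshold), document) for c in compound]
        compound_scores.append(min(parts, default=0.0))
    return sum(compound_scores) / len(compound_scores)

# Made-up similarity values, for illustration only.
TOY_SIM = {("dog", "poodle"): 0.8, ("dog", "cat"): 0.5, ("black", "dark"): 0.7}

def toy_sim(a: str, b: str) -> float:
    """Symmetric lookup in TOY_SIM; unknown pairs are treated as dissimilar."""
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))

VOCABULARY = ["dog", "poodle", "cat", "black", "dark"]
DOCUMENT = {"poodle": 1.0, "black": 0.6}  # hypothetical fuzzy index: concept -> weight

if __name__ == "__main__":
    # The query "black dog" as a single compound with two constituents.
    print(score([["dog", "black"]], DOCUMENT, VOCABULARY, toy_sim))  # 0.6
```

Swapping in a different aggregation operator only touches `score`. Any concept-similarity function with the same signature can replace `toy_sim`.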


Original language: English
Place of publication: Roskilde
Publisher: Roskilde Universitet
Number of pages: 196
Publication status: Published, 2006
Series: Datalogiske Skrifter
Number: 107
ISSN: 0109-9779

Keywords

  • information retrieval
  • ontologies
  • natural language processing
  • knowledge representation

Cite this

Styltsvig, H. B. (2006). Ontology-based Information Retrieval. Roskilde: Roskilde Universitet. Datalogiske Skrifter, No. 107
Styltsvig, Henrik Bulskov. / Ontology-based Information Retrieval. Roskilde : Roskilde Universitet, 2006. 196 p. (Datalogiske Skrifter; No. 107).
@phdthesis{6e2e89a0b77211db974c000ea68e967b,
title = "Ontology-based Information Retrieval",
keywords = "information retrieval, ontologies, natural language processing, knowledge representation",
author = "Styltsvig, {Henrik Bulskov}",
year = "2006",
language = "English",
publisher = "Roskilde Universitet",
school = "Roskilde Universitet",

}

Styltsvig, HB 2006, Ontology-based Information Retrieval. Datalogiske Skrifter, no. 107, Roskilde Universitet, Roskilde.

Ontology-based Information Retrieval. / Styltsvig, Henrik Bulskov.

Roskilde : Roskilde Universitet, 2006. 196 p.

TY - BOOK

T1 - Ontology-based Information Retrieval

AU - Styltsvig, Henrik Bulskov

PY - 2006

Y1 - 2006

KW - information retrieval

KW - ontologies

KW - natural language processing

KW - knowledge representation

M3 - Ph.D. thesis

BT - Ontology-based Information Retrieval

PB - Roskilde Universitet

CY - Roskilde

ER -

Styltsvig HB. Ontology-based Information Retrieval. Roskilde: Roskilde Universitet, 2006. 196 p. (Datalogiske Skrifter; No. 107).