Ontology-based Information Retrieval

Publication: Book/anthology/thesis/report, Ph.D. thesis, Research

Abstract

The focus of this thesis is the use of ontologies in information retrieval. The overall hypothesis is that introducing conceptual knowledge, such as ontologies, into query evaluation can help solve significant problems in existing methods.


Bringing in ontologies in this way raises a number of significant challenges. We have chosen to focus on similarity measures based on knowledge about relations between concepts, on the recognition of semantic knowledge in text, and on how ontology-based similarity measures and semantic indexing can be combined in a realistic approach to information retrieval.


Semantic knowledge in text is recognized by means of simple natural language processing during indexing, with the aim of identifying noun phrases. In addition, we outline the problems involved in identifying which semantic relations simple noun phrases are built from, and discuss how an increase in the compounding of concepts influences query evaluation.


We describe how a measure of similarity can be based on distance in the structure of ontologies, and introduce a new distance measure, "shared nodes". This measure is compared with a number of other measures using a collection of intuitive properties of similarity measures. The comparison shows that "shared nodes" has advantages over the other measures, but also that it is computationally more complex. We also describe a number of significant problems with "shared nodes", which stem from the fact that relations differ in the degree to which they bring the concepts they connect closer together. A more general measure, "weighted shared nodes", is introduced as a solution to these problems.
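To illustrate the idea behind a shared nodes measure, here is a minimal Python sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: the toy ontology `PARENTS`, the helper names, the choice to count a concept together with all of its ancestors, and the mixing weight `rho` are hypothetical and should not be read as the thesis's definition.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each concept maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "anything": set(),
    "animal": {"anything"},
    "pet": {"anything"},
    "dog": {"animal", "pet"},
    "cat": {"animal", "pet"},
    "poodle": {"dog"},
}


def upward_closure(concept: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the concept together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def shared_nodes_similarity(
    x: str, y: str, parents: Dict[str, Set[str]], rho: float = 0.5
) -> float:
    """Assumed form: share of common upward nodes, seen from x and from y.

    rho weights the view from x; 1 - rho weights the view from y.
    """
    ax = upward_closure(x, parents)
    ay = upward_closure(y, parents)
    shared = len(ax & ay)
    return rho * shared / len(ax) + (1 - rho) * shared / len(ay)
```

With `rho = 0.5` the sketch is symmetric; any other value makes the similarity from a specific concept to a general one differ from the reverse direction. Counting nodes rather than path length means that two concepts sharing several ancestors score higher than two that share only one, even at the same edge distance.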


Finally, we focus on how a similarity measure that compares concepts can be incorporated into query evaluation. The solution we present introduces a semantic expansion based on similarity measures. The evaluation method used is a generalized "fuzzy set retrieval" model that includes query expansion. Although the fuzzy set model is not commonly used in information retrieval, it turns out to have the flexibility needed for a generalization to ontology-based query evaluation, and the introduction of a hierarchical aggregation principle makes it possible to handle compound concepts in a simple and natural way.
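The combination of similarity-based expansion and fuzzy set retrieval can be sketched roughly as follows. This is only an illustration under stated assumptions: the similarity table `SIMILARITY`, the cut-off `threshold`, the use of the maximum membership per query concept, and the arithmetic mean as the aggregation are all hypothetical choices, not the model defined in the thesis (which uses a hierarchical aggregation principle that the abstract does not spell out).

```python
from typing import Dict, List, Set

# Hypothetical concept similarities, e.g. produced by an ontology-based measure.
SIMILARITY: Dict[str, Dict[str, float]] = {
    "dog": {"dog": 1.0, "poodle": 0.9, "cat": 0.75},
    "black": {"black": 1.0, "grey": 0.6},
}


def expand(
    concept: str, similarity: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Expand a query concept into a fuzzy set of similar concepts.

    Membership degree equals similarity; concepts below the threshold are dropped.
    """
    candidates = similarity.get(concept, {concept: 1.0})
    return {c: s for c, s in candidates.items() if s >= threshold}


def score(
    query: List[str],
    doc_concepts: Set[str],
    similarity: Dict[str, Dict[str, float]],
    threshold: float = 0.5,
) -> float:
    """Degree to which a document matches the expanded query (assumed: mean of maxima)."""
    if not query:
        return 0.0
    degrees = []
    for concept in query:
        expansion = expand(concept, similarity, threshold)
        # Best membership among the expanded concepts that occur in the document.
        degrees.append(
            max((m for c, m in expansion.items() if c in doc_concepts), default=0.0)
        )
    return sum(degrees) / len(degrees)
```

In this sketch a document indexed with "poodle" and "grey" still matches the query "dog", "black" to a partial degree, which is the effect the semantic expansion is meant to achieve; replacing the mean with another aggregation operator changes how strictly all query concepts must be satisfied.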


Original language: English
Place of publication: Roskilde
Publisher: Roskilde Universitet
Number of pages: 196
Status: Published - 2006
Series name: Datalogiske Skrifter
Number: 107
ISSN: 0109-9779

Keywords

  • information retrieval
  • ontologies
  • natural language processing
  • knowledge representation

Cite this

Styltsvig, H. B. (2006). Ontology-based Information Retrieval. Roskilde: Roskilde Universitet. Datalogiske Skrifter, No. 107.
@phdthesis{6e2e89a0b77211db974c000ea68e967b,
title = "Ontology-based Information Retrieval",
abstract = "In this thesis, we will present methods for introducing ontologies in information retrieval. The main hypothesis is that the inclusion of conceptual knowledge such as ontologies in the information retrieval process can contribute to the solution of major problems currently found in information retrieval. This utilization of ontologies has a number of challenges. Our focus is on the use of similarity measures derived from the knowledge about relations between concepts in ontologies, the recognition of semantic information in texts and the mapping of this knowledge into the ontologies in use, as well as how to fuse together the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To achieve the recognition of semantic knowledge in a text, shallow natural language processing is used during indexing that reveals knowledge to the level of noun phrases. Furthermore, we briefly cover the identification of semantic relations inside and between noun phrases, as well as discuss which kind of problems are caused by an increase in compoundness with respect to the structure of concepts in the evaluation of queries. Measuring similarity between concepts based on distances in the structure of the ontology is discussed. In addition, a shared nodes measure is introduced and, based on a set of intuitive similarity properties, compared to a number of different measures. In this comparison the shared nodes measure appears to be superior, though more computationally complex. Some of the major problems of shared nodes which relate to the way relations differ with respect to the degree they bring the concepts they connect closer are discussed. A generalized measure called weighted shared nodes is introduced to deal with these problems. Finally, the utilization of concept similarity in query evaluation is discussed. A semantic expansion approach that incorporates concept similarity is introduced and a generalized fuzzy set retrieval model that applies expansion during query evaluation is presented. While not commonly used in present information retrieval systems, it appears that the fuzzy set model comprises the flexibility needed when generalizing to an ontology-based retrieval model and, with the introduction of a hierarchical fuzzy aggregation principle, compound concepts can be handled in a straightforward and natural manner.",
keywords = "informations{\o}gning, ontologier, natursprogsanalyse, videnrepresentation, information retrieval, ontologies, natural language processing, knowledge representation",
author = "Styltsvig, {Henrik Bulskov}",
year = "2006",
language = "English",
publisher = "Roskilde Universitet",

}


TY - BOOK

T1 - Ontology-based Information Retrieval

AU - Styltsvig, Henrik Bulskov

PY - 2006

Y1 - 2006

N2 - In this thesis, we will present methods for introducing ontologies in information retrieval. The main hypothesis is that the inclusion of conceptual knowledge such as ontologies in the information retrieval process can contribute to the solution of major problems currently found in information retrieval. This utilization of ontologies has a number of challenges. Our focus is on the use of similarity measures derived from the knowledge about relations between concepts in ontologies, the recognition of semantic information in texts and the mapping of this knowledge into the ontologies in use, as well as how to fuse together the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To achieve the recognition of semantic knowledge in a text, shallow natural language processing is used during indexing that reveals knowledge to the level of noun phrases. Furthermore, we briefly cover the identification of semantic relations inside and between noun phrases, as well as discuss which kind of problems are caused by an increase in compoundness with respect to the structure of concepts in the evaluation of queries. Measuring similarity between concepts based on distances in the structure of the ontology is discussed. In addition, a shared nodes measure is introduced and, based on a set of intuitive similarity properties, compared to a number of different measures. In this comparison the shared nodes measure appears to be superior, though more computationally complex. Some of the major problems of shared nodes which relate to the way relations differ with respect to the degree they bring the concepts they connect closer are discussed. A generalized measure called weighted shared nodes is introduced to deal with these problems. Finally, the utilization of concept similarity in query evaluation is discussed. A semantic expansion approach that incorporates concept similarity is introduced and a generalized fuzzy set retrieval model that applies expansion during query evaluation is presented. While not commonly used in present information retrieval systems, it appears that the fuzzy set model comprises the flexibility needed when generalizing to an ontology-based retrieval model and, with the introduction of a hierarchical fuzzy aggregation principle, compound concepts can be handled in a straightforward and natural manner.


KW - informationsøgning

KW - ontologier

KW - natursprogsanalyse

KW - videnrepresentation

KW - information retrieval

KW - ontologies

KW - natural language processing

KW - knowledge representation

M3 - Ph.D. thesis

BT - Ontology-based Information Retrieval

PB - Roskilde Universitet

CY - Roskilde

ER -
