Probabilistic Topic Modelling

Malthe Frøkjær-Rubbås, Kidane Mahari Tesfai & Christoffer Olling Back

Student project: Subject module project

Abstract

This paper investigates Statistical Natural Language Processing, a broad field of research concerned with the mathematical categorization of words in texts. More precisely, the report focuses on one specific topic model, Probabilistic Latent Semantic Analysis (PLSA), in an attempt to demonstrate its mathematical modeling power. The paper gives a comprehensive explanation of the mathematics behind this topic model, taking the reader from some of the basics of probability theory to the more complex structures of information theory and Expectation Maximization. The report presents key aspects of these different fields of mathematics and then demonstrates their usefulness through their common relation to the PLSA model. The aim of the paper is thus to provide the reader with a thorough understanding of PLSA and the mathematical concepts that it draws upon. Finally, the paper presents an artificially generated example that both deepens the reader's understanding of the model's assumptions and exhibits the model's accuracy on the kind of data it is designed to analyze.

Education: Mathematics (Bachelor/Master's programme), Bachelor
Language: English
Publication date: 20 Dec. 2016
Number of pages: 49
Supervisors: Eva Uhre

Keywords

  • PLSA