Abstract
Programs for gene prediction in computational biology are examples of systems for which authentic test data are difficult to acquire, because producing such data requires years of extensive research. This has led to test methods based on semi-artificially produced test data, often produced by ad hoc techniques complemented by statistical models such as hidden Markov models (HMMs). The quality of such a test method depends on how well the test data reflect the regularities in known data and how well they generalize these regularities. So far, only very simplified and generalized artificial data sets have been tested, and a more thorough statistical foundation is required.
We propose using logic-statistical modelling methods from machine learning to analyze existing, manually marked-up data, integrated with the generation of new, artificial data. More specifically, we suggest using the PRISM system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears to be a powerful modelling environment that subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here by showing parts of a model under development for genetic sequences, and we report initial experiments that produce test data for evaluating existing gene finders, exemplified by GENSCAN, HMMGene and genemark.hmm.
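The idea of generating annotated artificial sequences from a statistical model can be sketched with a plain HMM sampler. The following is a minimal illustration in Python, not the PRISM model described in the paper; the state names, the probability tables and the function names `draw` and `sample_annotated_sequence` are all hypothetical and chosen only for the example.

```python
import random

# Hypothetical toy model: the states and probabilities below are
# illustrative assumptions, not parameters learned from real data.
TRANSITIONS = {
    "intergenic": {"intergenic": 0.95, "exon": 0.05},
    "exon": {"exon": 0.90, "intron": 0.05, "intergenic": 0.05},
    "intron": {"intron": 0.90, "exon": 0.10},
}
EMISSIONS = {
    "intergenic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "exon": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def draw(rng, dist):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_annotated_sequence(length, seed=None, start="intergenic"):
    """Sample a nucleotide string together with its per-base state labels."""
    rng = random.Random(seed)
    state = start
    bases, labels = [], []
    for _ in range(length):
        bases.append(draw(rng, EMISSIONS[state]))  # emit a base in this state
        labels.append(state)                       # record the true annotation
        state = draw(rng, TRANSITIONS[state])      # move to the next state
    return "".join(bases), labels


if __name__ == "__main__":
    seq, labels = sample_annotated_sequence(60, seed=1)
    print(seq)
    print("".join(label[0] for label in labels))  # compact annotation track
```

Because each sampled sequence comes with its generating state path, the labels can serve as the reference annotation against which a gene finder's predictions are compared. In a learning setting, the probability tables would be estimated from marked-up data rather than written by hand.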
Original language | English |
---|---|
Title | Proc. International Conference on Machine Learning and Data Mining MLDM'2007 : Lecture Notes in Artificial Intelligence |
Number of pages | 15 |
Volume | 4571 |
Publisher | Springer |
Publication date | 2007 |
Pages | 741-755 |
ISBN (Print) | 978-3-540-73498-7 |
Status | Published - 2007 |
Event | International Conference on Machine Learning and Data Mining MLDM'2007 - Leipzig, Germany. Duration: 18 Jul 2007 → 20 Jul 2007 |
Conference
Conference | International Conference on Machine Learning and Data Mining MLDM'2007 |
---|---|
Country/Territory | Germany |
City | Leipzig |
Period | 18/07/2007 → 20/07/2007 |
Name | Lecture Notes in Artificial Intelligence |
---|---|
Number | 4571 |
ISSN | 0302-9743 |
Keywords
- bioinformatics
- sequence analysis
- software testing
- machine learning