Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes

Christian Theil Have, Sine Zambach, Henning Christiansen

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Background
In certain organisms and under certain circumstances, pyrrolysine (the 22nd amino acid) is encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. A predicted mRNA structure in the region downstream of UAG has been suggested to be involved, but the structure does not seem to be present in all pyrrolysine-incorporating genes.

Results
We propose a strategy to predict pyrrolysine-encoding genes in the genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity, and rank these clusters according to several features that may influence pyrrolysine translation. We assess the effect of each feature on the ranking and propose a weighted combination of features that best explains the currently known pyrrolysine-incorporating genes. We devote special attention to structural conservation and provide further evidence that it may be influential but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potential pyrrolysine-incorporating genes.

Conclusions
We propose a method for predicting pyrrolysine-incorporating genes in the genomes of bacteria and archaea, which yields insights into the factors driving pyrrolysine translation and identifies new gene candidates. The method recovers known conserved genes with high recall and predicts several other promising candidates for experimental verification. It is implemented as a computational pipeline, which is available on request.
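The weighted ranking and the recall claim can be illustrated with a small sketch. It assumes each cluster carries three numeric features named after the title (coding potential, sequence conservation, structure conservation), combines them linearly, and measures recall of known clusters among the top k ranks. The feature values, weights, cluster identifiers, and the linear form itself are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: rank clusters by a weighted feature sum and compute recall at k.
# All numbers and names below are invented for illustration.

def weighted_score(features, weights):
    """Linear combination of feature values; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_clusters(clusters, weights):
    """Return cluster ids sorted from highest to lowest weighted score."""
    return sorted(clusters,
                  key=lambda cid: weighted_score(clusters[cid], weights),
                  reverse=True)


def recall_at_k(ranking, known, k):
    """Fraction of known positive clusters found among the top k ranks."""
    if not known:
        raise ValueError("known must contain at least one cluster id")
    return len(set(ranking[:k]) & set(known)) / len(known)


clusters = {
    "c1": {"coding_potential": 0.9, "sequence_conservation": 0.8, "structure_conservation": 0.1},
    "c2": {"coding_potential": 0.2, "sequence_conservation": 0.3, "structure_conservation": 0.9},
    "c3": {"coding_potential": 0.7, "sequence_conservation": 0.9, "structure_conservation": 0.6},
    "c4": {"coding_potential": 0.1, "sequence_conservation": 0.2, "structure_conservation": 0.2},
}
weights = {"coding_potential": 0.5, "sequence_conservation": 0.4, "structure_conservation": 0.1}
known = {"c1", "c3"}  # stand-ins for clusters containing known pyrrolysine genes

ranking = rank_clusters(clusters, weights)
print(ranking)
print(recall_at_k(ranking, known, k=2))
```

In this toy data, a low weight on structure conservation still ranks both known clusters at the top, which mirrors the abstract's point that structure may help without being necessary. How the authors actually chose their weights is not stated in the abstract.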
Original language: English
Journal: B M C Bioinformatics
Volume: 14
Issue number: 1
Number of pages: 12
ISSN: 1471-2105
DOI: 10.1186/1471-2105-14-118
Publication status: Published - 2013

Keywords
Messenger RNA, Pyrrolysine, Amino Acids, Archaebacteria, Bacteria

    Cite this

    @article{66d0c38d6ad245028bf7e851216235a3,
    title = "Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes",
    abstract = "Background: Pyrrolysine (the 22nd amino acid) is in certain organisms and under certain circumstances encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream UAG has been suggested, but the structure does not seem to be present in all pyrrolysine incorporating genes. Results: We propose a strategy to predict pyrrolysine encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine incorporating genes. We devote special attention to the effect of structural conservation and provide further substantiation to support that structural conservation may be influential – but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine incorporating genes. Conclusions: We propose a method for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.",
    keywords = "Messenger RNA, Pyrrolysine, Amino Acids, Archaebacteria, Bacteria",
    author = "Have, {Christian Theil} and Sine Zambach and Henning Christiansen",
    year = "2013",
    doi = "10.1186/1471-2105-14-118",
    language = "English",
    volume = "14",
    journal = "B M C Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central Ltd.",
    number = "1",

    }

    Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes. / Have, Christian Theil; Zambach, Sine; Christiansen, Henning.

    In: B M C Bioinformatics, Vol. 14, No. 1, 2013.


    TY - JOUR

    T1 - Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes

    AU - Have, Christian Theil

    AU - Zambach, Sine

    AU - Christiansen, Henning

    PY - 2013

    Y1 - 2013

    N2 - Background: Pyrrolysine (the 22nd amino acid) is in certain organisms and under certain circumstances encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream UAG has been suggested, but the structure does not seem to be present in all pyrrolysine incorporating genes. Results: We propose a strategy to predict pyrrolysine encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine incorporating genes. We devote special attention to the effect of structural conservation and provide further substantiation to support that structural conservation may be influential – but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine incorporating genes. Conclusions: We propose a method for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.


    KW - Messenger RNA

    KW - Pyrrolysine

    KW - Amino Acids

    KW - Archaebacteria

    KW - Bacteria

    U2 - 10.1186/1471-2105-14-118

    DO - 10.1186/1471-2105-14-118

    M3 - Journal article

    VL - 14

    JO - B M C Bioinformatics

    JF - B M C Bioinformatics

    SN - 1471-2105

    IS - 1

    ER -