
Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells

  • Wilson K.M. Wong
  • Vinod Thorat
  • Mugdha V. Joglekar
  • Charlotte X. Dong
  • Hugo Lee
  • Yi Vee Chew
  • Adwait Bhave
  • Wayne J. Hawthorne
  • Feyza Engin
  • Aniruddha Pant
  • Louise T. Dalgaard
  • Sharda Bapat
  • Anandwardhan A. Hardikar*
  • *Corresponding author

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Machine-learning (ML) workflows enable unbiased, robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML workflows in a large (N = 11,652) training dataset of human pancreatic single-cell (sc) transcriptomes, with the aim of identifying genes associated with the presence or absence of insulin transcript(s). The prediction accuracy and sensitivity of each ML workflow were tested in a separate validation dataset (N = 2,913). Ensemble ML workflows, in particular the Random Forest algorithm, delivered high predictive power (AUC = 0.83) and sensitivity (0.98) compared with the other algorithms. The transcripts identified through these analyses also correlated significantly with insulin in bulk RNA-seq data from human islets. The top 10 features common to the three Ensemble ML workflows (including IAPP, ADCYAP1, LDHA and SST) were significantly dysregulated in two settings: scRNA-seq datasets from Ire-1αβ-/- mice, which show dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D), and pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML workflows in big-data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses.
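The abstract reports classifier performance on the held-out validation set as AUC and sensitivity. The Python below is a minimal sketch of how those two metrics are defined. It is not the authors' pipeline. Several things in it are assumptions for illustration only: the labeling rule (a cell counts as insulin-positive if its INS count is above zero), the 0.5 decision threshold, the function names, and the toy counts and scores.

```python
"""Hypothetical sketch: scoring a binary 'insulin transcript present' classifier.

Assumptions (not taken from the paper): a cell is labeled 1 if its INS count is
> 0; the classifier outputs one probability-like score per cell; a score >= 0.5
is called positive. All toy values below are invented.
"""
from typing import Sequence


def label_insulin_positive(ins_counts: Sequence[int]) -> list[int]:
    """Hypothetical rule: label a cell 1 if any insulin (INS) read was detected."""
    return [1 if c > 0 else 0 for c in ins_counts]


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy validation set (invented values).
ins_counts = [5, 0, 12, 0, 3, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.35, 0.7]

y_true = label_insulin_positive(ins_counts)       # [1, 0, 1, 0, 1, 1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # [1, 0, 1, 0, 0, 1]

print(sensitivity(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))      # 0.875
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is why a model can report a high sensitivity alongside a more moderate AUC.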

Original language: English
Article number: 853863
Journal: Frontiers in Endocrinology
Volume: 13
Publication status: Published, 23 Mar 2022

Keywords

  • beta-cell
  • diabetes
  • human islet
  • insulin
  • machine-learning (ML) algorithms
  • single-cell RNA-sequencing (scRNAseq)
