Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells

Wilson K.M. Wong, Vinod Thorat, Mugdha V. Joglekar, Charlotte X. Dong, Hugo Lee, Yi Vee Chew, Adwait Bhave, Wayne J. Hawthorne, Feyza Engin, Aniruddha Pant, Louise T. Dalgaard, Sharda Bapat, Anandwardhan A. Hardikar*

*Corresponding author

Publication: Contribution to journal › Journal article › Research › peer-review

Abstract

Machine learning (ML)-workflows enable unprejudiced and robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML-workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). The prediction accuracy and sensitivity of each ML-workflow were tested in a separate validation dataset (N=2,913). Ensemble ML-workflows, in particular the Random Forest ML-algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared to other algorithms. The transcripts identified through these analyses also demonstrated significant correlation with insulin in bulk RNA-seq data from human islets. The top-10 features (including IAPP, ADCYAP1, LDHA and SST) common to the three Ensemble ML-workflows were significantly dysregulated in scRNA-seq datasets from Ire-1αβ-/- mice that demonstrate dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D) and in pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML-workflows in big data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses.
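The two metrics reported in the abstract, sensitivity and AUC, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, with label 1 standing for a cell in which insulin transcript is present.

```python
def sensitivity(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 1 = insulin transcript present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores for each cell.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("sensitivity:", sensitivity(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a workflow can combine a very high sensitivity with a more moderate AUC.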

Original language: English
Article number: 853863
Journal: Frontiers in Endocrinology
Volume: 13
DOI
Status: Published - 23 Mar 2022

Keywords

  • beta-cell
  • diabetes
  • human islet
  • insulin
  • machine-learning (ML) algorithms
  • single-cell RNA-sequencing (scRNAseq)
