Supervised learning of Plasmodium falciparum life cycle stages using single-cell transcriptomes identifies crucial proteins
Author(s): Swarnim Shukla, Soham Choudhuri, Gayathri Priya Iragavarapu, and Bhaswar Ghosh.
The expression of vital genes forms the basis for detecting malaria infection levels. Quantification of infected erythrocytes and classification of their life cycle stages are performed at a macroscopic level by experts in order to make informed decisions about the diagnosis and treatment of malaria. Recently, multiple computational approaches have been proposed to circumvent the problem of high dimensionality and thereby yield accurate predictions. In this work, a dimensionality reduction technique based on a Genetic Algorithm (GA) is applied to Plasmodium falciparum single-cell transcriptomics to arrive at an optimized subset of features from the larger dataset. Features are chosen based on their class variants, with a view to increased efficiency and accuracy, and the selected elements are separately transformed into a lower dimension. To classify the life cycle stages of malaria parasites from single-cell transcriptome data, a three-pronged approach employing multiclass Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) techniques is used. Further, we constructed protein interaction networks of the genes identified by the feature selection method, and gene ontology analysis elucidated the role of the proteins in the progression of the parasite through its life cycle. Our approach presents a novel protocol for applying ML techniques to scRNA-seq datasets and subsequently harnessing the extracted information for biomarker and drug target detection.
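As an illustration of GA-based feature selection of the kind summarized in the abstract, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic cells-by-genes matrix, the three stage labels, the GA settings (population size, mutation rate, size penalty), and the fitness function. To keep the example dependency-free, the fitness uses a simple nearest-centroid classifier as a stand-in; in practice one would score each gene subset with an SVM, LR, or RF classifier instead.

```python
import random

def make_toy_data(n_cells=90, n_genes=30, n_informative=4, seed=0):
    """Synthetic stand-in for a cells x genes matrix with 3 stage labels (hypothetical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cells):
        label = i % 3
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # only the first few genes carry stage signal
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier restricted to the genes where mask is True."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    classes = sorted(set(y_train))
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, lab in zip(X_test, y_test):
        pred = min(classes, key=lambda c: sum((x[j] - m) ** 2
                                              for j, m in zip(idx, centroids[c])))
        correct += pred == lab
    return correct / len(y_test)

def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.05, penalty=0.005, seed=1):
    """Evolve a boolean gene mask; fitness = validation accuracy minus a subset-size penalty."""
    rng = random.Random(seed)
    n_genes = len(X_train[0])

    def fitness(mask):
        return centroid_accuracy(X_train, y_train, X_val, y_val, mask) - penalty * sum(mask)

    population = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]               # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

X, y = make_toy_data()
X_train, y_train = X[:60], y[:60]   # simple hold-out split, for illustration only
X_val, y_val = X[60:], y[60:]
best_mask = ga_select(X_train, y_train, X_val, y_val)
selected = [j for j, keep in enumerate(best_mask) if keep]
acc = centroid_accuracy(X_train, y_train, X_val, y_val, best_mask)
print("selected gene indices:", selected)
print("validation accuracy:", acc)
```

The size penalty is one possible way to push the search toward smaller gene subsets; the selected indices would then be passed to the downstream classifiers and to the network and ontology analyses. Because the same validation split is used for both selection and scoring here, the reported accuracy is optimistic, and a separate test set would be needed for an honest estimate.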