RITUPAM SARMA
Bioinformatician & Demonstrator | University of Glasgow
Profile
Bioinformatician with broad expertise spanning cancer biology, infection biology, and computational genomics. Proficient in analysing complex multi-omic datasets, including spatial transcriptomics, from cancer models and protozoan parasites such as Plasmodium and Leishmania. Adept at developing and applying computational workflows in R, Python, and UNIX/Linux for NGS data processing, variant analysis, and network mapping. Passionate about translating complex biological questions into robust computational strategies, with strong interdisciplinary training and a proven ability to collaborate effectively in international research settings.
Core Skills
Programming & Workflow Development
- Languages: R, Python, SQL, UNIX/Linux.
- Workflow Automation: Automated pipelines for RNA-seq, GWAS, and WGS on HPC clusters using job schedulers (qsub).
- Database Management: Designed bioinformatics databases in MySQL, PostgreSQL, and SQLite; automated CSV/TSV processing and performed data extraction with SQL.
Statistical Analysis & Machine Learning
- Tools & methods: DESeq2, scikit-learn, SciPy; linear models, hypothesis testing, enrichment analysis.
- Applied statistical models in R and Python for differential expression, variant enrichment, and GWAS interpretation.
- Recent training in PyTorch, TensorFlow and unsupervised learning.
Genomic & Transcriptomic Analysis
- Techniques: RNA-seq (bulk & single-cell), WGS, GWAS, variant annotation.
- Tools: PLINK, VEP, SIFT, PolyPhen-2, IGV, Seurat.
- Experience includes variant filtering (VCF, GFF, FASTA), BRCA1 variant analysis, SNP-level QC, and Manhattan/QQ plots in R.
NGS & Multi-Omics Platforms
- Microbial genomics: Analysed E. coli O157:H7 genomes for quality control, error correction, and downstream assembly.
- Spatial & single-cell transcriptomics: Analysed 10x Visium data and applied Seurat for clustering, integration, and biomarker identification.
- Multi-omics integration in Leishmania: Combined WGS, metabolomics, and protein modelling to study drug resistance.
Data Visualisation & Reporting
- Tools & plot types: ggplot2, Matplotlib, Seaborn; PCA plots, volcano plots, heatmaps, expression maps.
- Communicated insights from spatial, multi-omic, and single-cell studies with customised plots and visual summaries.
Molecular Visualisation & Bioinformatics Tools
- PyMOL for 3D molecular interactions and educational visuals.
- Familiar with MetaboAnalyst, SnpEff, FreeBayes, vcffilter, and the Galaxy platform.
Projects
Chronic Stress Effects on Lung Cancer
Collaborative project (CRUK, University of Glasgow, Medical University of Vienna) using 10x Visium to investigate spatial gene expression changes in lung cancer models under chronic stress, focusing on biomarker discovery.
Chronic stress has been linked to cancer progression, but its impact on the tumor microenvironment (TME) at a spatial level remains unclear. This project aims to determine how stress influences spatial gene expression in lung cancer and whether these effects differ between tumor and non-tumor regions.
Institutions: University of Glasgow, CRUK Scotland Institute, Medical University of Vienna
Key Contributors: Dr. Chrysoula Vraka (CRUK Scotland Institute & Medical University of Vienna), Mr. John Cole (University of Glasgow)
- Cohort: 19 lung cancer samples (9 high-stress, 10 low-stress), stress quantified via questionnaires and glucocorticoid levels
- Technology: 10x Genomics Visium spatial transcriptomics with tumor and non-tumor regions annotated by pathology
- Analysis Pipeline: Seurat workflow including QC (filtering low-quality spots), normalization, scaling, PCA/UMAP, clustering, differential expression, marker gene identification, and spatial visualization
- Built and executed the Seurat analysis pipeline from QC to clustering and visualization
- Conducted differential expression analysis to compare high- vs. low-stress groups
- Performed cluster marker identification and interpreted the top 10 genes per cluster
- Collaborated with supervisors to plan next steps, including validation and multi-omic integration
- High- vs. low-stress groups showed distinct expression differences, with low-stress samples generally having higher gene expression
- Median detected genes per spot: 400–5,500; average reads per spot: ~25,000
- Identified clear spatial clustering across tumor and non-tumor regions
- Top marker genes successfully characterized spatial domains of the tissue
Interpretation: Chronic stress significantly influences spatial gene expression patterns in lung cancer. Stress-related effects are evident in both tumor and non-tumor regions, suggesting a broader impact on TME interactions.
Colorectal Cancer Liver Metastasis (CRLM)
Developed an end-to-end bioinformatics pipeline for multi-scale, multi-omic assessment of CRLM using spatial transcriptomics. Analysis identified region-specific insights and nominated MET and IL17 as potential therapeutic targets.
Colorectal cancer (CRC) often metastasizes to the liver. The tumor microenvironment (TME) and spatial heterogeneity within the metastases are poorly understood. This project aimed to characterize spatial gene expression patterns in CRLM to discover region-specific drivers and therapeutic targets.
Cohort: 41 patients (29 untreated) who underwent synchronous CRC and CRLM resection (2002–2010)
- CosMx SMI (NanoString): Paired CRC & CRLM from 4 patients; single-cell ST, 23 fields of view (FOV)
- GeoMx DSP (NanoString): Paired CRC & CRLM from 4 patients; regional profiling across PanCK+, PanCK-, and αSMA+ compartments, 116 FOV
- 10x Genomics Visium: Paired CRC & CRLM from 2 patients; regional ST, 4 FOV
- Focused on the 10x Visium analysis, building an end-to-end bioinformatics pipeline (QC, alignment, clustering, differential expression, visualization)
- Integrated CRC vs. CRLM samples with Seurat (R) to explore region-specific patterns
- Performed pathway enrichment and visualization (heatmaps, volcano plots, SpatialFeaturePlots)
- Collaborated closely with clinicians and wet-lab scientists to validate findings
- Applied SingleR for cell-type annotation of clusters, with clinical validation confirming accuracy and translational relevance
- Identified distinct spatial expression signatures between primary CRC and liver metastases
- Found enrichment of MET and IL17 pathways in CRLM, highlighting potential druggable targets
- Results formed part of an AACR 2025 abstract and a manuscript in preparation
MSc Project: Plasmodium falciparum Gene Families
Investigated the role of variant surface antigens (VSAs), specifically rifin and stevor superfamilies, in malaria pathogenesis. Employed network mapping and integrated expression data across life cycle stages to delineate novel gene sub-groups. Discovered new patterns of co-expression, suggesting specific roles in immune evasion and deepening understanding of host-parasite interactions.
Variant surface antigens (VSAs) like rifin and stevor families in Plasmodium falciparum play key roles in malaria pathogenesis and immune evasion. This MSc project aimed to map sequence-similarity networks for these families to detect sub-groups and relate clusters to parasite life cycle stages and cell-line phenotypes.
Institutions: MSc Bioinformatics, University of Glasgow
Key Contributors: Project by Ritupam Sarma (GUID: 2734454s), supervised by Dr Virginia Howick, with contributions from Dr Priscilla Ngotho
- Sequence Analysis: All-vs-all BLASTn on rifin and stevor FASTA sets to identify sequence similarities
- Data Processing: Filtering and table integration using Python and R for network preparation
- Network Construction: Exported networks in GEXF format; community detection via Gephi modularity (Louvain algorithm)
- Visualization: Applied ForceAtlas2 and Fruchterman-Reingold layouts in Gephi for cluster representation
- Integration: Related clusters to expression data across parasite life cycle stages
- Led the entire MSc project, from data collection and BLASTn analysis to network construction and visualization
- Developed Python and R scripts for data filtering, integration, and initial cluster analysis
- Performed Gephi-based community detection and layout optimization to identify novel sub-groups
- Integrated expression data with networks to uncover co-expression patterns and their biological implications
- Collaborated with supervisors to interpret results and propose future functional studies
- Rifin Family: Identified four major clusters and one minor cluster, with clear separation of rifin A vs. rifin B subfamilies
- Isolate Structure: Minimal isolate-driven clustering, with ring-stage sequences dominating the networks
- Stevor Family: Revealed two major clusters, with unexpected isolate-specific grouping and limited correlation to life cycle stages
- Clinical Insights: Majority of stevor sequences originated from acute clinical samples, suggesting stage-specific roles
Interpretation: The clear intra-family clustering indicates underlying functional sub-structures within rifin and stevor families, potentially linked to immune evasion strategies. These findings deepen understanding of host-parasite interactions in malaria pathogenesis.
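The BLASTn-to-network step in this project can be sketched roughly as follows. This is a minimal illustration rather than the project's actual scripts: it assumes BLASTn was run with default tabular output (`-outfmt 6`), and the function names, the identity and e-value thresholds, and the toy sequence IDs are all hypothetical. Community detection in the project was done in Gephi; the connected-components step here is only a simple stand-in to show how a filtered edge list yields groups.

```python
from collections import defaultdict

# Column order of BLAST tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast_to_edges(lines, min_identity=80.0, max_evalue=1e-10):
    """Turn all-vs-all BLAST hits into undirected weighted edges.

    Thresholds are illustrative assumptions, not the project's values.
    Returns {(id_a, id_b): best_bitscore} with id_a < id_b.
    """
    edges = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        q, s = row["qseqid"], row["sseqid"]
        if q == s:  # drop self-hits
            continue
        if float(row["pident"]) < min_identity or float(row["evalue"]) > max_evalue:
            continue
        key = tuple(sorted((q, s)))  # A-B and B-A collapse to one edge
        score = float(row["bitscore"])
        edges[key] = max(score, edges.get(key, 0.0))
    return edges


def connected_components(edges):
    """Group sequence IDs that are linked by at least one retained edge."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Toy hits with made-up IDs (not real rifin/stevor data).
toy_hits = [
    "seqA\tseqA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t555",
    "seqA\tseqB\t92.5\t280\t21\t0\t1\t280\t1\t280\t1e-90\t400",
    "seqB\tseqA\t92.5\t280\t21\t0\t1\t280\t1\t280\t1e-90\t398",
    "seqC\tseqD\t85.0\t250\t37\t1\t5\t254\t3\t252\t1e-60\t300",
    "seqA\tseqC\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-5\t60",
]
edges = blast_to_edges(toy_hits)
groups = connected_components(edges)
```

Keeping only the best bitscore per unordered pair gives one weighted edge per sequence pair, which is the shape a network tool such as Gephi expects before layout and modularity analysis.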
Amphotericin B Resistance in Leishmania
Applied a multi-omic approach (WGS, metabolomics, protein modelling) to investigate drug resistance mechanisms. Analyzed CYP51 mutations affecting ergosterol synthesis, identifying specific mutations and metabolic shifts contributing directly to resistance.
Amphotericin B is a frontline drug for leishmaniasis, but resistance in Leishmania mexicana compromises treatment efficacy. This project used a polyomics approach to identify genomic and metabolic determinants of resistance, focusing on CYP51's role in sterol biosynthesis and ergosterol pathway disruption.
Institutions: University of Glasgow (Glasgow Polyomics and HPCC resources)
Key Contributors: MSc Bioinformatics 2022-23 project (author ID: 2734454); datasets from Glasgow Polyomics; prior resistant lines from Mwenechanya et al. (2017)
- Genomics Pipeline: Whole-genome sequencing (WGS) analysis with FastQC, Trim Galore!, Bowtie2 alignment, Samtools, IGV visualization, VariantToolChest (VTC), SnpEff/SnpSift annotation, bamaddrg, FreeBayes variant calling, and vcffilter
- Metabolomics: LC-MS data processing with PiMP for QC and MetaboAnalyst for PCA, volcano plots, and pathway analysis
- Protein Modeling: BLAST homology search, SignalP 5.0 for signal peptides, SWISS-MODEL for structure prediction, visualization in PyMOL and UCSF ChimeraX
- Structural Analysis: Ligand docking with SwissDock (obtusifoliol), molecular dynamics (MD) simulations with UNRES, and protein-protein interaction (PPI) docking with ClusPro
- Designed and executed the multi-omics bioinformatics pipeline for WGS variant calling and annotation, identifying resistance-associated SNPs
- Processed LC-MS metabolomics data, performing QC, statistical analysis, and pathway enrichment to link metabolic shifts to resistance
- Conducted protein structure modeling and docking simulations for CYP51 variants, interpreting structural impacts on ergosterol synthesis
- Integrated genomic, metabolomic, and structural data to converge on CYP51 as a key resistance mediator
- Collaborated with the Glasgow Polyomics team to validate findings and propose therapeutic re-sensitization strategies
- Genomic Variants: Identified 127 non-synonymous variants between resistant and wild-type (WT) strains, with a key Asn176Ile substitution in CYP51
- Metabolomics: PCA separated resistant vs. WT groups; sterol-linked metabolites showed significant fold-changes, indicating disrupted ergosterol synthesis
- Structural Insights: Docking and MD simulations revealed altered CYP51 binding behavior; PPI analysis showed reduced interacting surface area
- Pathway Impact: Convergent evidence of CYP51-mediated sterol pathway disruption as the primary resistance mechanism
Interpretation: The multi-omics integration highlights CYP51 mutations and downstream metabolic shifts as central to Amphotericin B resistance, providing targets for re-sensitization strategies to restore WT activity and improve drug efficacy.
Developed and deployed web applications for bioinformatics data analysis, demonstrating proficiency in translating computational workflows into user-friendly tools.
Teaching Experience
Demonstrator
University of Glasgow, UK | 09/2024 – 09/2025
Assisted multiple cohorts of undergraduate, MSc, and PhD students in mastering bioinformatics concepts, focusing on R, command-line tools, and genomic data analysis. Courses included: Omics & R, Statistics & Clinical Data, Command Line Omics & Genome Analysis, and RNA-seq.
Demonstrator
BSI Bioinformatics Course | 12/2024 – 09/2025
Led hands-on training in basic R and advanced scRNA-seq workflows using Seurat (R), covering data preprocessing, QC, visualisation, batch correction, cell-type annotation, and differential expression analysis.
Education
MSc Bioinformatics (Project Grade: Merit)
University of Glasgow, UK | 2022 – 2023
BSc Zoology (Grade: 83.36%)
Cotton University, India | 2018 – 2021
- Core studies in animal biology, systematics, comparative anatomy, and evolutionary biology.
- Coursework in Cell & Molecular Biology, Genetics, Microbiology, and Biotechnology.
- Foundation in Physiology, Biochemistry, and Immunology.
Courses & Certifications
- RNA-Seq with Bioconductor in R - DataCamp (Completed Sep 2025)
- Analyzing Genomic Data in R - DataCamp (Completed Sep 2025) - 16 hours
- Intermediate SQL - DataCamp (Completed Nov 2025)
- Introduction to SQL - DataCamp (Completed Nov 2025)
- Introduction to TensorFlow in Python - DataCamp (On track for completion)
- Introduction to Deep Learning with PyTorch - DataCamp (Completed June 2025)
- Unsupervised Learning in Python - DataCamp (Completed May 2025)
- Unsupervised Learning in R - DataCamp (Completed May 2025)
Publications
Colin Wood, Luke McNickle, Andrew Cameron, Ritupam Sarma, Vaidehi Pandya, Joao Da Silva Filho, Colin Steele, John Cole, Joanne Edwards, Paul Horgan, Campbell Roxburgh. Multi-scale multi-omic assessment of matched synchronous colorectal cancer liver metastases using multiple spatial transcriptomic tools [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 4586.
https://doi.org/10.1158/1538-7445.AM2025-4586