Soham Mukherjee

Lead Software Engineer

Hi! I’m Soham, a Lead Software Engineer at Cadence Design Systems. I completed my PhD in Computer Science at Purdue University, where I worked in the CTDA group under the guidance of Prof. Tamal K. Dey. My doctoral research focused on integrating concepts from computational topology into machine learning, culminating in my dissertation titled “Unveiling Patterns in Data: Harnessing Computational Topology in Machine Learning.” (You can find it here — feel free to take a look!)

I started my PhD journey at The Ohio State University and later moved to Purdue in Fall 2020 with my advisor. Prior to that, I earned my Bachelor’s degree in Electronics and Telecommunication Engineering from Jadavpur University, Kolkata, where I conducted undergraduate research in the Circuits and Systems domain in the ADESL lab with Prof. Mrinal Kanti Naskar.

I’m originally from Kolkata, the “City of Joy” in eastern India—a place that never fails to inspire me. Outside of research and engineering, I enjoy exploring the world through my camera lens. Photography is my creative escape from the equations and algorithms I wrestle with during the day.


Experience

  1. Lead Software Engineer

    Cadence Design Systems

    Responsibilities include:

    • Built an end-to-end pipeline to convert raw unstructured data into attributed graphs and benchmarked GNN architectures — GCN, GIN, and EdgeConv — for a regression task on sparse, high-dimensional targets.
    • Designed a multimodal predictor combining graph embeddings, log embeddings, and state-space models to capture both structural and sequential signals from heterogeneous data sources.
    • Developed a pre-training strategy using Triplet Loss to learn similarity structure from a proxy label before fine-tuning on sparse downstream targets, improving generalization on limited labeled data.
    • Built a scalable data augmentation framework that programmatically generates and validates new graph instances, enabling robust dataset enrichment for ML benchmarking and robustness testing.
    • Wrote an NLP-based feature extraction framework to parse unstructured logs into structured features for downstream ML modeling.
    • Scaled GNN training to graphs with up to 300K nodes using distributed GPUs; built experiment tracking, dataset versioning, and containerized workflows for fast iteration across architectures.
  2. Research Intern

    IBM T.J. Watson Research Center

    Responsibilities include:

    • Researched graph generation under geometric and topological constraints (patent filed).
  3. Engineering Intern

    Physna Inc.

    Responsibilities include:

    • Deployed CNNs to predict 3D computer-aided design (CAD) models from 2D images.
    • Automated segmentation and registration of point-cloud data obtained from scanning machine parts, enabling efficient and accurate inspection.

Education

  1. PhD Computer Science

    Purdue University

    Thesis on Computational Topology and Machine Learning. Supervised by Prof. Tamal K. Dey. Courses included:

    • Deep Learning
    • Machine Learning
    • Machine Learning on Graphs
  2. MS Computer Science and Engineering

    The Ohio State University
    GPA: 3.8/4.0
  3. BE Electronics & Telecommunication Engineering

    Jadavpur University
    GPA: 9.5/10.0
Skills & Hobbies
Programming Languages
Python
PyTorch
TensorFlow
HuggingFace
AI Topics
Graph Neural Networks
Gen-AI
LLMs
Agentic AI
Recent Publications
(2024). D-GRIL: End-to-end topological learning with 2-parameter persistence. arXiv preprint arXiv:2406.07100.
(2023). GRIL: A 2-parameter Persistence Based Vectorization for Machine Learning. Proceedings of the 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML).
(2023). Topological Deep Learning: Going Beyond Graph Data. arXiv preprint arXiv:2206.00606.
(2022). Determining clinically relevant features in cytometry data using persistent homology. PLoS Computational Biology.
(2022). GEFL: Extended filtration learning for graph classification. Learning on Graphs Conference.