// ml.engineer · builder

Nikhil
Mourya

ML Engineer · Builder

I optimize models for production.
Production still finds a way to optimize me.

About

I'm Nikhil Mourya — I shrink models for a living; .from_pretrained() hands you weights, not a thesis.

I'm based out of BIT Mesra; half the "RAG stacks" I meet are a vector DB cosplaying as a product requirement, and the rest are just vibes billing rent.

I'll hold unbroken eye contact with a single block of code for hours, like we have beef.

Model Compression

Pruning and LoRA are quiet admissions that full fine-tuning is often billing for capacity you never needed. I like models that shrink without forgetting what they're for.

efficiency

NLP & Text Systems

Summarization, classification, pipelines where fluent isn't the same as faithful. Production NLP is mostly telling confident hallucinations they can't sit with us.

production

ML Engineering

Training loops, eval harnesses, Flask on GCP — the glue between notebook and someone else's pager. If it only runs on my laptop, it's a demo; if it survives Friday, it's work.

infra

Competitive Programming

Codeforces Specialist — 1400+. Graphs have a way of humbling you on a schedule; the upside is you stop trusting clever one-liners without proof.

cf · 1400+
ongoing

Codeforces Specialist · 1400+

Still grinding rated rounds — the graphs are optional; the ego damage isn't. Proof you can think with a clock breathing down your neck.

peak: 1400+ · still climbing
2024

Finalist, Smart India Hackathon (SIH) · Hospital Management System

99.5% uptime with real models in the loop — the 0.5% was character development. Backend stayed polite even when the night shift wasn't.

99.5% uptime · production ML
2023

Finalist IIIT Delhi — ResNet50

94% accuracy after grid search stopped me from brute-forcing the hyperparameter void. Sometimes the boring search is the clever move.

94% acc · −30% train time
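The boring search, sketched: a minimal grid search of the kind that beats blind tuning. The dataset, estimator, and grid below are illustrative stand-ins, not the actual ResNet50 setup.

```python
# Hedged sketch: grid search over one axis of the hyperparameter void.
# Iris + logistic regression stand in for the real model and data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # one knob, searched honestly
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Swap in the real estimator and a wider `param_grid` and the shape of the loop doesn't change.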

Skills

PyTorch
TensorFlow
HuggingFace
Scikit-learn
NumPy
Pandas
Python
Flask
FastAPI
Git
GitHub
MySQL
LangChain
LangGraph
RAG
C++
Google Cloud
Linux
Jupyter
Weights & Biases
Matplotlib

Projects

01

Pruned U-Net for Biomedical Image Segmentation (IIT KGP)

Removed 97% of a model's parameters. It still works.

97.3% params removed · 92% FLOPs cut · IoU > 0.95
PyTorch · U-Net · Model Pruning · Computer Vision · MoNuSeg
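The core move, sketched: unstructured magnitude pruning on a single conv layer. This is the idea, not the project; the real U-Net and its 97% schedule are not reproduced here.

```python
# Hedged sketch: zero out the smallest-magnitude weights of one layer.
# A toy Conv2d stands in for the U-Net.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)

# Remove (mask to zero) the ~90% of weights with smallest L1 magnitude.
prune.l1_unstructured(conv, name="weight", amount=0.9)

sparsity = (conv.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2%}")  # roughly 90% of weights are now zero

# Fold the mask into the weight tensor so the pruning is permanent.
prune.remove(conv, "weight")
```

Iterating prune-then-finetune cycles is what gets you to 97% without the IoU collapsing.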
02

PEGASUS + LoRA: Efficient Text Summarization

Fine-tuned 767M parameters using only 1.57M of them.

99.8% param reduction · 27× faster training · 767M → 1.57M
PyTorch · HuggingFace · PEGASUS · LoRA · NLP
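The trick behind those numbers, sketched in raw PyTorch: freeze the base weight, train only a low-rank update A·B. Shapes here are toy; PEGASUS itself is not loaded.

```python
# Hedged sketch of the LoRA idea on a single linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        # Low-rank factors: A starts small, B starts at zero so the
        # adapter is a no-op before training.
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total} ({trainable / total:.1%})")
```

Apply the same wrapper to every attention projection in a 767M model and the trainable fraction drops to the 1.57M quoted above.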
03

AttentionIsALLICode

Not "used a framework." Actually from scratch.

Full architecture · Multi-head attention · Custom training loop
PyTorch · Transformers · NLP · From Scratch
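What "from scratch" means here, in miniature: scaled dot-product multi-head attention written directly, no `nn.MultiheadAttention`. Dimensions are illustrative, not the repo's config.

```python
# Hedged sketch of multi-head attention, the block the repo builds by hand.
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, d_head).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(ctx)

mha = MultiHeadAttention(d_model=64, n_heads=8)
y = mha(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```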
04

Vectorless RAGs

Vector databases are the default. I wanted to know if the default was actually necessary.

Zero embeddings · Zero vector DB · Full retrieval
Python · Ollama · LLaMA 3 · RAG · Tree Traversal · Streamlit
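The shape of the answer, sketched: organize documents as a tree of summaries and let a relevance scorer pick which branch to descend. The real project uses LLaMA 3 via Ollama as the scorer; a keyword-overlap stand-in keeps this sketch self-contained.

```python
# Hedged sketch of retrieval without embeddings or a vector DB.
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str
    text: str = ""
    children: list = field(default_factory=list)

def score(query: str, summary: str) -> int:
    # Stand-in relevance score: shared lowercase tokens with the query.
    # The actual project would ask an LLM to rank the children instead.
    return len(set(query.lower().split()) & set(summary.lower().split()))

def retrieve(node: Node, query: str) -> str:
    # Greedy tree traversal toward the most relevant leaf, no vectors involved.
    while node.children:
        node = max(node.children, key=lambda c: score(query, c.summary))
    return node.text

tree = Node("root", children=[
    Node("training loops and eval harnesses",
         text="Notes on training loop design."),
    Node("model pruning and sparsity",
         text="Notes on magnitude pruning."),
])
print(retrieve(tree, "how does pruning work"))  # Notes on magnitude pruning.
```

Whether the default vector DB is necessary reduces to whether this traversal finds the right leaf as often as nearest-neighbor search does.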

System Evolution [Inference Mode]

[sys_msg]: education_weights_pruned.sh --mode=aggressive --silent
LAYER_01: HireBuddy_Founding_Member (2025) [STATUS: DEPLOYED]
[ROLE] Founding Member — Machine Learning & Product Hybrid
[INIT] Built AI resume-job matching sys (RAG, Transformers)
[PROCESS] NLP_Model.screen(volume="10k/day", target="shortlist")
[METRIC] shortlisting_acc: +35% | langChain_perf: 0.92
[DEPLOY] Serving via Flask + Docker on GCP; REST API integr.
[OPTIMIZE] User_Engagement.backprop(boost=+40%)
[SUCCESS] ML workflows compiled to production features.