ML Engineer & AI Researcher
Himanshu Kumar
Final-year CS student, IIIT Nagpur
I build and deploy end-to-end ML systems — from production RAG pipelines and multi-agent architectures to fine-tuned LLMs and vision-language models. My work spans applied engineering (FastAPI, vector databases, LangGraph) and research (published at AAAI 2026), bridging the gap between model development and real-world deployment.
Publications
Research
Under Review · Vizuara AI Labs
The Geometry of Entanglement: Bridging Representation Probing and Mechanistic Interpretability
Multi-scale investigation of how deep vision models implicitly encode attributes beyond their training objective, revealing a fundamental geometric asymmetry between task-relevant and implicitly encoded features.
Task features: ~750 localized neurons with strong correlations (|r|>0.6)
Implicit attributes: ~1,200 neurons with distributed weak signals (|r|<0.2), 94% linearly separable
95% of gender info concentrates in 1/16 of ResNet-50 dimensions (causal entanglement)
LM4UC Workshop · AAAI 2026
When Gujarati Meets English: Toward Robust Translation of Code-Mixed Low-Resource Indian Language
Created the first large-scale Gujlish–English parallel corpus addressing translation for millions of Gujarati speakers who naturally code-mix Gujarati with English. Fine-tuned NLLB-200 for Romanized Gujarati and intra-sentential code-mixing.
30K sentence pairs via BPCC + GPT-4o generation with human validation
1.5–2× BLEU and ChrF++ improvements over Google Translate
New Gujlish evaluation benchmarks adapted from XNLI and IN22
arXiv Preprint
NanoVLM: How Small Can Vision Language Models Be and Still Generate Coherent Text?
Systematically studied the lower bound of VLM scale for coherent image captioning. Designed a family of parameter-efficient vision-language models achieving 10× parameter reduction vs. standard VLMs.
NanoVLM (mini/base/large) with up to 10× parameter reduction
Curated ShortDesc (20–25 words) and LongDesc (60–70 words) minimal alignment datasets
New evaluation axes: creativity, consistency, semantic coherence
Work Experience
Industry & Research
AI Research Intern — Mechanistic Interpretability
- Investigated implicit demographic encoding in vision models using a multi-scale interpretability pipeline combining linear probing, filter-level correlation analysis, and representation geometry across ResNet and ViT architectures.
- Identified distributed subspace encoding of sensitive attributes and performed targeted filter ablations, reducing gender probing accuracy from 85% to 43% while preserving task performance.
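The probing-and-ablation recipe above can be sketched on toy data. Everything below is illustrative, not the actual ResNet/ViT pipeline: synthetic Gaussian features stand in for frozen activations, and the "attribute" signal is planted in a known subset of units so the ablation has a clear target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for frozen CNN features: 512-dim activations where a small
# subset of units carries a weak, distributed binary-attribute signal.
n, d, signal_units = 2000, 512, 32
attribute = rng.integers(0, 2, size=n)           # hypothetical binary attribute
X = rng.normal(size=(n, d))
X[:, :signal_units] += 0.5 * attribute[:, None]  # distributed weak encoding

Xtr, Xte, ytr, yte = train_test_split(X, attribute, random_state=0)

# Linear probe: if a logistic regression on frozen features predicts the
# attribute above chance, the representation encodes it linearly.
probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
acc_before = probe.score(Xte, yte)

# Targeted ablation: zero out the signal-carrying units and re-probe.
Xtr_abl, Xte_abl = Xtr.copy(), Xte.copy()
Xtr_abl[:, :signal_units] = 0.0
Xte_abl[:, :signal_units] = 0.0
probe_abl = LogisticRegression(max_iter=1000).fit(Xtr_abl, ytr)
acc_after = probe_abl.score(Xte_abl, yte)

print(f"probe accuracy before ablation: {acc_before:.2f}")
print(f"probe accuracy after ablation:  {acc_after:.2f}")  # near chance
```

The before/after probe gap is the evidence: high accuracy on intact features plus near-chance accuracy after ablating the identified units indicates the attribute lived in that subspace.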
AI Intern — Generative AI & RLHF
- Developed an ambient soundscape music generation pipeline using MusicGen, training on a curated 10-hour dataset and building a web platform to collect structured human preference feedback.
- Applied Reinforcement Learning from Human Feedback (RLHF) to align generated audio with human aesthetic and perceptual preferences using reward modeling and preference-based optimization.
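Reward modeling from pairwise preferences is typically trained with a Bradley-Terry objective. A dependency-light numpy sketch under toy assumptions (a linear reward model and synthetic preference pairs, not the audio pipeline itself):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bradley_terry_loss(w, X_chosen, X_rejected):
    """Pairwise preference loss -log σ(r(chosen) - r(rejected))
    for a linear reward model r(x) = w·x."""
    margin = (X_chosen - X_rejected) @ w
    return -np.mean(np.log(sigmoid(margin)))

rng = np.random.default_rng(0)
w_true = rng.normal(size=8)        # hypothetical "true" preference direction
X_a = rng.normal(size=(256, 8))
X_b = rng.normal(size=(256, 8))
# Label the winner by the true reward, then order pairs as (chosen, rejected).
prefer_a = (X_a @ w_true) > (X_b @ w_true)
X_chosen = np.where(prefer_a[:, None], X_a, X_b)
X_rejected = np.where(prefer_a[:, None], X_b, X_a)

# Plain gradient descent on the reward model's weights.
w = np.zeros(8)
print(f"initial loss: {bradley_terry_loss(w, X_chosen, X_rejected):.3f}")  # log 2
for _ in range(500):
    margin = (X_chosen - X_rejected) @ w
    grad = -((1 - sigmoid(margin))[:, None] * (X_chosen - X_rejected)).mean(axis=0)
    w -= 0.5 * grad

# The learned reward should rank chosen above rejected on most pairs.
rank_acc = np.mean((X_chosen @ w) > (X_rejected @ w))
print(f"pairwise ranking accuracy: {rank_acc:.2f}")
```

The trained reward model then serves as the optimization target for preference-based fine-tuning of the generator.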
AI Research Intern — Multimodal Learning
- Built a multimodal image-captioning VLM with a pretrained ViT encoder and GPT-2/BERT decoders, trained on Flickr30k.
- Applied Bottom-Up and Top-Down attention for enhanced feature extraction, achieving measurable improvements in caption quality.
- Research directly informed the NanoVLM arXiv paper.
ML Engineering Intern
- Architected a production MultiPDF RAG system with semantic chunking, embedding-based retrieval, and vector similarity search across 100+ documents, achieving 85% retrieval accuracy with Supabase pgvector storage.
- Deployed end-to-end ML inference pipeline serving 50+ daily queries with sub-2s latency using FastAPI.
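At its core, the embedding-based retrieval step ranks chunks by cosine similarity to the query embedding. A minimal numpy sketch, with random vectors standing in for real embeddings and for the pgvector store:

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Cosine-similarity retrieval: normalize, dot product, take top-k."""
    q = query_vec / np.linalg.norm(query_vec)
    C = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = C @ q
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]

# Toy corpus of "embedded" chunks; in the real system these come from an
# embedding model and are indexed in the vector database.
rng = np.random.default_rng(1)
chunks = rng.normal(size=(100, 64))
query = chunks[42] + 0.1 * rng.normal(size=64)  # query near chunk 42

idx, scores = top_k_chunks(query, chunks, k=3)
print(idx[0])  # chunk 42 ranks first
```

A production system replaces the brute-force `argsort` with the database's approximate nearest-neighbor index, but the ranking criterion is the same.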
Selected Work
Projects
Multimodal RAG over HuggingFace Courses
Full-stack multimodal RAG system over 8 HuggingFace courses (2,200+ chunks) using BGE-small text and CLIP image embeddings in Qdrant. Streaming inference with FastAPI SSE and Gemini 2.5 Flash / Llama 3.3 70B fallback.
Healthcare Appointment Scheduling — Multi-Agent System
LangGraph-based multi-agent workflow with specialized agents for intent parsing, doctor availability retrieval, and appointment booking. Real-time slot validation, conflict resolution, and async email/SMS notifications.
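A LangGraph workflow models this as a state graph whose nodes read and update shared state. A dependency-free sketch of the pattern (node names, fields, and the doctor/slot values are illustrative, not the project's actual code):

```python
# Each node is a function that updates a shared state dict and returns the
# name of the next node; "done" terminates the graph.

def parse_intent(state):
    state["intent"] = "book" if "book" in state["message"].lower() else "other"
    return "check_availability" if state["intent"] == "book" else "done"

def check_availability(state):
    # Stand-in for a real availability lookup against a calendar service.
    taken = {"10:00"}
    state["slot"] = next(s for s in ["10:00", "11:00"] if s not in taken)
    return "book_appointment"

def book_appointment(state):
    state["confirmation"] = f"Booked Dr. Rao at {state['slot']}"
    return "done"

NODES = {
    "parse_intent": parse_intent,
    "check_availability": check_availability,
    "book_appointment": book_appointment,
}

def run(state, entry="parse_intent"):
    node = entry
    while node != "done":
        node = NODES[node](state)
    return state

result = run({"message": "Please book me a checkup"})
print(result["confirmation"])  # Booked Dr. Rao at 11:00
```

LangGraph adds checkpointing, conditional edges, and async execution on top of this basic loop, which is what makes the real-time validation and notification steps practical.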
Sankshipt: Multilingual News Summarization
Transformer-based summarization pipeline for 10 Indian languages with language detection, cross-lingual normalization, and entity-aware summarization preserving key events and names.
Multi-Label Sentiment Analysis
9-label sentiment classifier using DistilBERT for overlapping emotional categories. Addressed severe class imbalance via weighted binary cross-entropy and threshold tuning, achieving 88% accuracy.
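The imbalance fix combines two ideas: up-weight rare positive labels in the loss, and tune a separate decision threshold per label instead of a global 0.5. A numpy sketch on toy data (the label counts, weights, and thresholds here are illustrative):

```python
import numpy as np

def weighted_bce(probs, targets, pos_weight):
    """Per-label binary cross-entropy with positive-class weights so that
    rare labels contribute more when missed."""
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)
    loss = -(pos_weight * targets * np.log(probs)
             + (1 - targets) * np.log(1 - probs))
    return loss.mean()

# Toy multi-label setup: 3 labels, the last one rare.
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [1, 1, 1]], dtype=float)
probs = np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.8, 0.2],
                  [0.7, 0.6, 0.4]])
pos_weight = targets.shape[0] / (targets.sum(axis=0) + 1)  # heuristic weights

loss = weighted_bce(probs, targets, pos_weight)

# Per-label threshold tuning: pick each label's cutoff on a validation
# split (e.g. to maximize F1); the lower 0.35 lets the rare label fire.
thresholds = np.array([0.5, 0.5, 0.35])
preds = (probs >= thresholds).astype(int)
print(preds[2])  # rare third label is predicted for the third example
```

With a DistilBERT backbone the same loss is applied to the sigmoid outputs of a 9-way classification head, and thresholds are swept per label on held-out data.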
NanoVLM — Tiny Vision-Language Model
ViT encoder + GPT-2 decoder VLM family achieving coherent image captioning at 10× smaller scale. Includes curated alignment datasets and novel evaluation criteria for caption quality.
Technical
Skills
AI / ML Domains
Frameworks & Libraries
Languages
Tools & Infrastructure
Recognition
Awards & Certifications
Jagriti, IIT BHU
2nd Runner-up
Built Multi-label Sentiment Analysis system · 700+ teams
Codefest, IIT BHU
2nd Runner-up
1,600+ teams
Mar 2025
CUDA C/C++ Fundamentals
NVIDIA Deep Learning Institute
Feb 2025
Deep Learning Fundamentals
NVIDIA Deep Learning Institute
Nov 2024
AI for Anomaly Detection
NVIDIA Deep Learning Institute
Oct 2024
Transformer NLP Applications
NVIDIA Deep Learning Institute
Download
Resume
Get in touch
Contact
Seeking ML Engineer, AI Researcher, or Data Scientist roles. Open to research collaborations and interesting problems at the frontier of AI.
What I Work On
Production RAG & multi-agent systems
LLM fine-tuning & deployment pipelines
Mechanistic interpretability of vision & language models
Multilingual & low-resource NLP
Efficient vision-language model architectures
RLHF and human preference alignment