ML Engineer & AI Researcher
Himanshu Kumar
Final-year CS student, IIIT Nagpur
I build and deploy end-to-end ML systems — from production RAG pipelines and multi-agent architectures to fine-tuned LLMs and vision-language models. My work spans applied engineering (FastAPI, vector databases, LangGraph) and research (published at AAAI 2026), bridging the gap between model development and real-world deployment.
Publications
Research
Under Review · Vizuara AI Labs
The Geometry of Entanglement: Bridging Representation Probing and Mechanistic Interpretability
Multi-scale investigation of how deep vision models implicitly encode attributes beyond their training objective, revealing a fundamental geometric asymmetry between task-relevant and implicitly encoded features.
Task features: ~750 localized neurons with strong correlations (|r|>0.6)
Implicit attributes: ~1,200 neurons with distributed weak signals (|r|<0.2), 94% linearly separable
95% of gender info concentrates in 1/16 of ResNet-50 dimensions (causal entanglement)
LM4UC Workshop · AAAI 2026
When Gujarati Meets English: Toward Robust Translation of Code-Mixed Low-Resource Indian Language
Created the first large-scale Gujlish–English parallel corpus addressing translation for millions of Gujarati speakers who naturally code-mix Gujarati with English. Fine-tuned NLLB-200 for Romanized Gujarati and intra-sentential code-mixing.
30K sentence pairs via BPCC + GPT-4o generation with human validation
1.5–2× BLEU and ChrF++ improvements over Google Translate
New Gujlish evaluation benchmarks adapted from XNLI and IN22
arXiv Preprint
NanoVLM: How Small Can Vision Language Models Be and Still Generate Coherent Text?
Systematically studied the lower bound of VLM scale for coherent image captioning. Designed a family of parameter-efficient vision-language models achieving 10× parameter reduction vs. standard VLMs.
NanoVLM (mini/base/large) with up to 10× parameter reduction
Curated ShortDesc (20–25 words) and LongDesc (60–70 words) minimal alignment datasets
New evaluation axes: creativity, consistency, semantic coherence
Work Experience
Industry & Research
AI Research Intern — Mechanistic Interpretability
- Investigated implicit demographic encoding in vision models using a multi-scale interpretability pipeline combining linear probing, filter-level correlation analysis, and representation geometry across ResNet and ViT architectures.
- Identified distributed subspace encoding of sensitive attributes and performed targeted filter ablations, reducing gender probing accuracy from 85% to 43% while preserving task performance.
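The probing-and-ablation recipe above can be sketched on toy data. Everything below is illustrative, not the actual ResNet/ViT pipeline: synthetic Gaussian features stand in for frozen activations, and the "attribute" signal is planted in a known subset of units so the ablation has a clear target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for frozen CNN features: 512-dim activations where a small
# subset of units carries a weak, distributed binary-attribute signal.
n, d, signal_units = 2000, 512, 32
attribute = rng.integers(0, 2, size=n)           # hypothetical binary attribute
X = rng.normal(size=(n, d))
X[:, :signal_units] += 0.5 * attribute[:, None]  # distributed weak encoding

Xtr, Xte, ytr, yte = train_test_split(X, attribute, random_state=0)

# Linear probe: if a logistic regression on frozen features predicts the
# attribute above chance, the representation encodes it linearly.
probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
acc_before = probe.score(Xte, yte)

# Targeted ablation: zero out the signal-carrying units and re-probe.
Xtr_abl, Xte_abl = Xtr.copy(), Xte.copy()
Xtr_abl[:, :signal_units] = 0.0
Xte_abl[:, :signal_units] = 0.0
probe_abl = LogisticRegression(max_iter=1000).fit(Xtr_abl, ytr)
acc_after = probe_abl.score(Xte_abl, yte)

print(f"probe accuracy before ablation: {acc_before:.2f}")
print(f"probe accuracy after ablation:  {acc_after:.2f}")  # near chance
```

The before/after probe gap is the evidence: high accuracy on intact features plus near-chance accuracy after ablating the identified units indicates the attribute lived in that subspace.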
AI Intern — Generative AI & RLHF
- Developed an ambient soundscape music generation pipeline using MusicGen, training on a curated 10-hour dataset and building a web platform to collect structured human preference feedback.
- Applied Reinforcement Learning from Human Feedback (RLHF) to align generated audio with human aesthetic and perceptual preferences using reward modeling and preference-based optimization.
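Reward modeling from pairwise preferences is typically trained with a Bradley-Terry objective. A dependency-light numpy sketch under toy assumptions (a linear reward model and synthetic preference pairs, not the audio pipeline itself):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bradley_terry_loss(w, X_chosen, X_rejected):
    """Pairwise preference loss -log σ(r(chosen) - r(rejected))
    for a linear reward model r(x) = w·x."""
    margin = (X_chosen - X_rejected) @ w
    return -np.mean(np.log(sigmoid(margin)))

rng = np.random.default_rng(0)
w_true = rng.normal(size=8)        # hypothetical "true" preference direction
X_a = rng.normal(size=(256, 8))
X_b = rng.normal(size=(256, 8))
# Label the winner by the true reward, then order pairs as (chosen, rejected).
prefer_a = (X_a @ w_true) > (X_b @ w_true)
X_chosen = np.where(prefer_a[:, None], X_a, X_b)
X_rejected = np.where(prefer_a[:, None], X_b, X_a)

# Plain gradient descent on the reward model's weights.
w = np.zeros(8)
print(f"initial loss: {bradley_terry_loss(w, X_chosen, X_rejected):.3f}")  # log 2
for _ in range(500):
    margin = (X_chosen - X_rejected) @ w
    grad = -((1 - sigmoid(margin))[:, None] * (X_chosen - X_rejected)).mean(axis=0)
    w -= 0.5 * grad

# The learned reward should rank chosen above rejected on most pairs.
rank_acc = np.mean((X_chosen @ w) > (X_rejected @ w))
print(f"pairwise ranking accuracy: {rank_acc:.2f}")
```

The trained reward model then serves as the optimization target for preference-based fine-tuning of the generator.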
AI Research Intern — Multimodal Learning
- Built a multimodal image-captioning VLM with a pretrained ViT encoder and GPT-2/BERT decoders, trained on Flickr30k.
- Applied Bottom-Up and Top-Down attention for enhanced feature extraction, achieving measurable improvements in caption quality.
- Research directly informed the NanoVLM arXiv paper.
ML Engineering Intern
- Architected a production MultiPDF RAG system with semantic chunking, embedding-based retrieval, and vector similarity search across 100+ documents, achieving 85% retrieval accuracy with Supabase pgvector storage.
- Deployed end-to-end ML inference pipeline serving 50+ daily queries with sub-2s latency using FastAPI.
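At its core, the embedding-based retrieval step ranks chunks by cosine similarity to the query embedding. A minimal numpy sketch, with random vectors standing in for real embeddings and for the pgvector store:

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Cosine-similarity retrieval: normalize, dot product, take top-k."""
    q = query_vec / np.linalg.norm(query_vec)
    C = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = C @ q
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]

# Toy corpus of "embedded" chunks; in the real system these come from an
# embedding model and are indexed in the vector database.
rng = np.random.default_rng(1)
chunks = rng.normal(size=(100, 64))
query = chunks[42] + 0.1 * rng.normal(size=64)  # query near chunk 42

idx, scores = top_k_chunks(query, chunks, k=3)
print(idx[0])  # chunk 42 ranks first
```

A production system replaces the brute-force `argsort` with the database's approximate nearest-neighbor index, but the ranking criterion is the same.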
Selected Work
Projects
Multimodal RAG over HuggingFace Courses
Full-stack multimodal RAG system over 8 HuggingFace courses (2,200+ chunks) using BGE-small text and CLIP image embeddings in Qdrant. Streaming inference with FastAPI SSE and Gemini 2.5 Flash / Llama 3.3 70B fallback.
Healthcare Appointment Scheduling — Multi-Agent System
LangGraph-based multi-agent workflow with specialized agents for intent parsing, doctor availability retrieval, and appointment booking. Real-time slot validation, conflict resolution, and async email/SMS notifications.
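A LangGraph workflow models this as a state graph whose nodes read and update shared state. A dependency-free sketch of the pattern (node names, fields, and the doctor/slot values are illustrative, not the project's actual code):

```python
# Each node is a function that updates a shared state dict and returns the
# name of the next node; "done" terminates the graph.

def parse_intent(state):
    state["intent"] = "book" if "book" in state["message"].lower() else "other"
    return "check_availability" if state["intent"] == "book" else "done"

def check_availability(state):
    # Stand-in for a real availability lookup against a calendar service.
    taken = {"10:00"}
    state["slot"] = next(s for s in ["10:00", "11:00"] if s not in taken)
    return "book_appointment"

def book_appointment(state):
    state["confirmation"] = f"Booked Dr. Rao at {state['slot']}"
    return "done"

NODES = {
    "parse_intent": parse_intent,
    "check_availability": check_availability,
    "book_appointment": book_appointment,
}

def run(state, entry="parse_intent"):
    node = entry
    while node != "done":
        node = NODES[node](state)
    return state

result = run({"message": "Please book me a checkup"})
print(result["confirmation"])  # Booked Dr. Rao at 11:00
```

LangGraph adds checkpointing, conditional edges, and async execution on top of this basic loop, which is what makes the real-time validation and notification steps practical.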
Sankshipt: Multilingual News Summarization
Transformer-based summarization pipeline for 10 Indian languages with language detection, cross-lingual normalization, and entity-aware summarization preserving key events and names.
Multi-Label Sentiment Analysis
9-label sentiment classifier using DistilBERT for overlapping emotional categories. Addressed severe class imbalance via weighted binary cross-entropy and threshold tuning, achieving 88% accuracy.
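The imbalance fix combines two ideas: up-weight rare positive labels in the loss, and tune a separate decision threshold per label instead of a global 0.5. A numpy sketch on toy data (the label counts, weights, and thresholds here are illustrative):

```python
import numpy as np

def weighted_bce(probs, targets, pos_weight):
    """Per-label binary cross-entropy with positive-class weights so that
    rare labels contribute more when missed."""
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)
    loss = -(pos_weight * targets * np.log(probs)
             + (1 - targets) * np.log(1 - probs))
    return loss.mean()

# Toy multi-label setup: 3 labels, the last one rare.
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [1, 1, 1]], dtype=float)
probs = np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.8, 0.2],
                  [0.7, 0.6, 0.4]])
pos_weight = targets.shape[0] / (targets.sum(axis=0) + 1)  # heuristic weights

loss = weighted_bce(probs, targets, pos_weight)

# Per-label threshold tuning: pick each label's cutoff on a validation
# split (e.g. to maximize F1); the lower 0.35 lets the rare label fire.
thresholds = np.array([0.5, 0.5, 0.35])
preds = (probs >= thresholds).astype(int)
print(preds[2])  # rare third label is predicted for the third example
```

With a DistilBERT backbone the same loss is applied to the sigmoid outputs of a 9-way classification head, and thresholds are swept per label on held-out data.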
NanoVLM — Tiny Vision-Language Model
ViT encoder + GPT-2 decoder VLM family achieving coherent image captioning at 10× smaller scale. Includes curated alignment datasets and novel evaluation criteria for caption quality.
Technical
Skills
AI / ML Domains
Frameworks & Libraries
Languages
Tools & Infrastructure
Recognition
Awards & Certifications
Jagriti, IIT BHU
2nd Runner-up
Built Multi-label Sentiment Analysis system · 700+ teams
Codefest, IIT BHU
2nd Runner-up
1,600+ teams
Mar 2025
CUDA C/C++ Fundamentals
NVIDIA Deep Learning Institute
Feb 2025
Deep Learning Fundamentals
NVIDIA Deep Learning Institute
Nov 2024
AI for Anomaly Detection
NVIDIA Deep Learning Institute
Oct 2024
Transformer NLP Applications
NVIDIA Deep Learning Institute
Download
Resume
Get in touch
Contact
Seeking ML Engineer, AI Researcher, or Data Scientist roles. Open to research collaborations and interesting problems at the frontier of AI.
What I Work On
Production RAG & multi-agent systems
LLM fine-tuning & deployment pipelines
Mechanistic interpretability of vision & language models
Multilingual & low-resource NLP
Efficient vision-language model architectures
RLHF and human preference alignment