AI Researcher & ML Scientist

Himanshu Kumar

Final-year CS student, IIIT Nagpur

I work at the intersection of multimodal learning, multilingual NLP, and mechanistic interpretability, building and understanding AI systems that operate across languages, modalities, and scales. My work includes a first-author paper at an AAAI 2026 workshop on code-mixed translation, along with research on representation geometry and efficient vision-language models.

Mechanistic Interpretability · Multilingual NLP · Multimodal Learning · Representation Geometry · Low-Resource AI · Vision-Language Models
Institution: IIIT Nagpur
Degree: B.Tech CSE
CGPA: 8.41 / 10
Publications: 3
Research Internships: 3
NVIDIA Certifications: 5

Publications

2026

LM4UC Workshop · AAAI 2026

When Gujarati Meets English: Toward Robust Translation of Code-Mixed Low Resourced Indian Language

Himanshu Kumar — First Author

Created the first large-scale Gujlish–English parallel corpus, addressing translation for the millions of Gujarati speakers who naturally code-mix Gujarati with English. Fine-tuned NLLB-200 to handle Romanized Gujarati and intra-sentential code-mixing (see the sketch after the highlights below).

30K sentence pairs via BPCC + GPT-4o generation with human validation

1.5–2× BLEU and ChrF++ improvements over Google Translate

New Gujlish evaluation benchmarks adapted from XNLI and IN22
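
A minimal inference sketch of the baseline setup the fine-tuning starts from, using NLLB-200 through HuggingFace Transformers. The distilled-600M checkpoint and the guj_Gujr/eng_Latn language codes are illustrative stand-ins; the paper's fine-tuned weights and its Romanized-Gujarati handling are not reproduced here.

```python
# Baseline NLLB-200 Gujarati -> English inference; fine-tuning for Gujlish
# starts from this setup. Checkpoint and language codes are stand-ins.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="guj_Gujr")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt")
    # Force English as the generation target language.
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=64,
    )
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]

# Code-mixed example: Gujarati with an embedded English word.
print(translate("મારે કાલે ઓફિસમાં meeting છે"))
```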

2025

arXiv Preprint

NanoVLM: How Small Can Vision Language Models Be and Still Generate Coherent Text?

Himanshu Kumar — First Author

Systematically studied the lower bound of VLM scale for coherent image captioning. Designed a family of parameter-efficient vision-language models achieving a 10× parameter reduction over standard VLMs, revealing that caption length matters more than parameter count for alignment (architecture sketch after the highlights below).

NanoVLM (mini/base/large) with up to 10× parameter reduction

Curated ShortDesc (20–25 words) and LongDesc (60–70 words) minimal alignment datasets

New evaluation axes: creativity, consistency, semantic coherence
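
A sketch of the general architecture, wired from public checkpoints with HuggingFace's VisionEncoderDecoderModel. The ViT and GPT-2 checkpoints are stand-ins, not the NanoVLM mini/base/large configurations, and the cross-attention bridge is untrained here, so outputs only become meaningful after fine-tuning on caption data.

```python
# ViT encoder + GPT-2 decoder captioner, the architecture family NanoVLM
# scales down. Checkpoints are public stand-ins, not the paper's configs.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no pad token; reuse EOS, and tell the model how decoding starts.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
model.config.decoder_start_token_id = tokenizer.bos_token_id

# Parameter count: the axis the paper pushes down by roughly 10x.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Caption generation on a placeholder image (untrained bridge: the output
# is not a real caption until the model is fine-tuned on caption data).
pixel_values = processor(images=Image.new("RGB", (224, 224)), return_tensors="pt").pixel_values
ids = model.generate(pixel_values, max_new_tokens=20)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```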

2025

Under Review · Vizuara AI Labs

The Geometry of Entanglement: Bridging Representation Probing and Mechanistic Interpretability

Himanshu Kumar — Primary Researcher

Multi-scale investigation of how deep vision models implicitly encode attributes beyond their training objective, revealing a fundamental geometric asymmetry between task-relevant and implicitly encoded features (method sketch after the findings below).

Task features: ~750 localized neurons with strong correlations (|r|>0.6)

Implicit attributes: ~1,200 neurons with distributed weak signals (|r|<0.2), 94% linearly separable

95% of gender information concentrates in 1/16 of ResNet-50 dimensions (causal entanglement)
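
A synthetic sketch of the two core analyses (per-neuron correlation with an attribute, then a linear probe), with planted signals standing in for real ResNet-50 activations. Dimensions, thresholds, and effect sizes are illustrative only.

```python
# Synthetic stand-in for frozen features: a few strongly attribute-correlated
# neurons (task-like) plus many weakly correlated ones (implicit attribute).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
attr = rng.integers(0, 2, size=2000).astype(float)  # binary attribute
feats = rng.normal(size=(2000, 2048))
feats[:, :8] += 2.0 * attr[:, None]      # localized, strong signal
feats[:, 8:136] += 0.2 * attr[:, None]   # distributed, weak signal

# 1) Per-neuron Pearson correlation with the attribute.
fc = feats - feats.mean(0)
ac = attr - attr.mean()
r = (fc * ac[:, None]).mean(0) / (feats.std(0) * attr.std())
print("strong neurons (|r| > 0.6):", int((np.abs(r) > 0.6).sum()))
print("weak neurons (0.05 < |r| < 0.2):",
      int(((np.abs(r) > 0.05) & (np.abs(r) < 0.2)).sum()))

# 2) Linear probe on the weak, distributed neurons only: individually faint
# signals can still be highly linearly separable in aggregate.
weak = feats[:, 8:136]
Xtr, Xte, ytr, yte = train_test_split(weak, attr, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("probe accuracy on weak neurons:", round(probe.score(Xte, yte), 2))
```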

Work Experience

Jan 2026 – Feb 2026

Vizuara AI Labs

AI Research Intern — Mechanistic Interpretability

  • Led a mechanistic interpretability study of representation entanglement in vision models, using correlation analysis and linear probing across layers and neurons.
  • Distinguished task-relevant features from implicitly encoded attributes; performed targeted ablations to quantify bias–utility trade-offs (ablation sketch after this list).
  • Built reproducible representation-analysis pipelines that became the basis of a publication.
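
A minimal, synthetic rendering of the targeted-ablation step: rank neurons by attribute correlation, zero the top ones, and re-probe to see how much decodable attribute information survives. All data and sizes are stand-ins for the actual pipeline.

```python
# Targeted ablation sketch: zero the k most attribute-correlated neurons,
# then compare linear-probe accuracy before and after. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
attr = rng.integers(0, 2, size=2000).astype(float)
feats = rng.normal(size=(2000, 512))
feats[:, :32] += 0.5 * attr[:, None]           # planted attribute signal

r = np.array([np.corrcoef(feats[:, j], attr)[0, 1] for j in range(feats.shape[1])])
top_k = np.argsort(-np.abs(r))[:32]            # most attribute-correlated neurons

ablated = feats.copy()
ablated[:, top_k] = 0.0                        # targeted ablation

for name, X in [("original", feats), ("ablated", ablated)]:
    Xtr, Xte, ytr, yte = train_test_split(X, attr, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: probe accuracy = {acc:.2f}")
```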

May 2025 – Jul 2025

Wadhwani School of AI, IIT Madras

AI Research Intern — Generative AI & RLHF

  • Developed a soundscape music generation system using MusicGen, trained on curated ambient datasets scraped from YouTube (generation sketch after this list).
  • Designed and deployed a web-based human feedback platform for rating 30-second generated audio clips.
  • Applied RLHF to align generative outputs with human aesthetic and perceptual preferences.
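
A sketch of the generation step with a public MusicGen checkpoint via Meta's audiocraft library. The fine-tuned soundscape model, the feedback platform, and the RLHF loop are not shown; the checkpoint name and prompt are illustrative.

```python
# Generate a 30-second ambient clip with a public MusicGen checkpoint.
# The internship's fine-tuned soundscape weights are not reproduced here.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # 30-second clips, as rated on the platform

prompts = ["calm forest soundscape with distant birdsong and light rain"]
wav = model.generate(prompts)  # (batch, channels, samples)

# Write a loudness-normalized WAV next to the script.
audio_write("soundscape_sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```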

Nov 2024 – Mar 2025

IIT Guwahati

AI Research Intern — Multimodal Learning

  • Built a multimodal image-captioning VLM with a pretrained ViT encoder and GPT-2/BERT decoder, trained on Flickr30k.
  • Applied Bottom-Up Top-Down attention for richer visual grounding, yielding measurable gains in caption quality (attention-module sketch after this list).
  • This research directly informed the NanoVLM arXiv paper.
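
A minimal PyTorch rendering of the top-down attention step from the Bottom-Up Top-Down scheme (Anderson et al., 2018): the decoder state scores region features and pools them. Dimensions and the smoke test are hypothetical.

```python
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    """Top-down attention: the decoder hidden state scores bottom-up
    region features and pools them into a single attended feature."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, regions: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # regions: (B, K, feat_dim) region features; hidden: (B, hidden_dim)
        joint = torch.tanh(self.feat_proj(regions) + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(joint).squeeze(-1), dim=1)  # (B, K)
        return (alpha.unsqueeze(-1) * regions).sum(dim=1)            # (B, feat_dim)

# Smoke test with hypothetical sizes: 36 regions of 768-d features,
# a 768-d decoder state.
attn = TopDownAttention(feat_dim=768, hidden_dim=768, attn_dim=256)
pooled = attn(torch.randn(2, 36, 768), torch.randn(2, 768))
print(pooled.shape)  # torch.Size([2, 768])
```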

Jun 2024 – Aug 2024

ProCohat Technologies

ML Engineering Intern

  • Architected a MultiPDF RAG system with semantic search across 100+ documents, reaching 85% retrieval accuracy with Supabase-backed vector storage (retrieval-core sketch after this list).
  • Deployed the production pipeline via FastAPI, serving 50+ daily queries at sub-2s latency.
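
An in-memory sketch of the retrieval core: embed chunks, embed the query, rank by cosine similarity. The Supabase/pgvector storage and FastAPI serving layers are omitted, and the embedding model is a common public choice, not necessarily the one deployed.

```python
# RAG retrieval core: embed document chunks once, then rank them against
# each query by cosine similarity. Chunks and model are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

chunks = [
    "Invoices must be submitted within 30 days of delivery.",
    "The warranty covers manufacturing defects for two years.",
    "Support is available Monday through Friday, 9am to 6pm IST.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                  # cosine similarity (unit vectors)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

print(retrieve("how long is the warranty?"))
```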

Selected Projects

AI-Powered Appointment Scheduling — Multi-Agent System

Multi-agent architecture with specialized agents for intent parsing, availability reasoning, and booking confirmation. Supports real-time slot validation, conflict resolution, and async email/SMS notifications (a minimal graph sketch follows the links below).

LangChain · LangGraph · Multi-Agent · NLP
GitHub
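
A minimal LangGraph sketch of the three-agent flow, with each agent stubbed out. The node logic, state fields, and the notification step are illustrative only, not the project's actual implementation.

```python
# Three-node agent pipeline: intent parsing -> availability -> booking.
# Each node is a stub returning a partial state update.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    request: str
    intent: str
    slot: str
    confirmed: bool

def parse_intent(state: State) -> dict:
    return {"intent": "book_appointment"}       # stand-in for an LLM call

def check_availability(state: State) -> dict:
    return {"slot": "2025-05-12 10:00"}         # stand-in for calendar lookup

def confirm_booking(state: State) -> dict:
    return {"confirmed": True}                  # stand-in for booking + notify

builder = StateGraph(State)
builder.add_node("intent", parse_intent)
builder.add_node("availability", check_availability)
builder.add_node("booking", confirm_booking)
builder.add_edge(START, "intent")
builder.add_edge("intent", "availability")
builder.add_edge("availability", "booking")
builder.add_edge("booking", END)

graph = builder.compile()
print(graph.invoke({"request": "Book me a dentist slot next Monday morning"}))
```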

Sankshipt: Multilingual News Summarization

Transformer-based summarization pipeline covering 10 Indian languages, with language detection, normalization, and entity preservation for cross-lingual topic coverage (core loop sketched below the links).

mBART · Transformers · Multilingual NLP · Flask
GitHub
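
A sketch of the core encode-generate-decode loop with an mBART-50 checkpoint. The base checkpoint shown is not summarization-tuned; the deployed pipeline's language detection, normalization, and fine-tuned weights are omitted, and the sample article is invented.

```python
# Same-language summarization loop with mBART-50. Fine-tune the base
# checkpoint on summarization data before expecting real summaries.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/mbart-large-50"  # base checkpoint, shown for the loop only
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="hi_IN")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "भारत में सौर ऊर्जा की क्षमता इस वर्ष तेज़ी से बढ़ी है और कई राज्यों ने नई परियोजनाएँ शुरू की हैं।"

inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hi_IN"),  # keep output in Hindi
    max_new_tokens=96,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```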

Multi-Label Sentiment Analysis

9-label multi-label sentiment classifier using DistilBERT for overlapping emotional categories in user-generated text. Mitigated severe class imbalance with a weighted loss, reaching 88% accuracy (setup sketched below the links).

DistilBERT · PyTorch · Multi-Label · Class Imbalance
GitHub
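
A sketch of the imbalance-aware multi-label setup: DistilBERT with nine sigmoid outputs and a pos_weight-ed BCE loss. The label layout, weights, and example text are illustrative, not the competition data.

```python
# DistilBERT multi-label head with class-weighted BCE to counter imbalance.
import torch
from torch.nn import BCEWithLogitsLoss
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=9,
    problem_type="multi_label_classification",  # sigmoid + BCE head
)

# pos_weight > 1 boosts the loss on rare positive labels (values illustrative).
pos_weight = torch.tensor([1.0, 3.2, 1.5, 7.8, 2.1, 1.2, 5.4, 1.1, 4.3])
loss_fn = BCEWithLogitsLoss(pos_weight=pos_weight)

batch = tokenizer(["the plot thrilled me but the ending hurt"],
                  return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([[0, 1, 0, 0, 1, 0, 0, 0, 0]], dtype=torch.float)

logits = model(**batch).logits
loss = loss_fn(logits, labels)   # weighted loss replaces the default BCE
loss.backward()
print(f"weighted BCE loss: {loss.item():.3f}")
```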

NanoVLM — Tiny Vision-Language Model

ViT encoder + GPT-2 decoder VLM family achieving coherent image captioning at 10× smaller scale. Includes curated alignment datasets and new evaluation criteria for caption quality.

ViT · GPT-2 · Multimodal · PyTorch
GitHub

Technical Skills

AI / ML Domains

NLP · Computer Vision · Multimodal Learning · Generative AI · RLHF · RAG Systems · Mechanistic Interpretability · Low-Resource NLP · Representation Geometry

Frameworks & Tools

PyTorch · TensorFlow · HuggingFace · LangChain · LangGraph · FastAPI · Flask · Docker

Languages

Python · C/C++ · SQL · CUDA

Core Coursework

Linear Algebra · Probability & Statistics · Machine Learning · NLP · Reinforcement Learning · CUDA Programming · DSA

Awards & Certifications

AI Hackathon
Jagriti, IIT BHU

2nd Runner-up

Built a multi-label sentiment analysis system · Competed against 700+ teams

Enigma CTF
Codefest, IIT BHU

2nd Runner-up

Competed against 1,600+ teams

NVIDIA DLI
Mar 2025

CUDA C/C++ Fundamentals

NVIDIA Deep Learning Institute

NVIDIA DLI
Feb 2025

Deep Learning Fundamentals

NVIDIA Deep Learning Institute

NVIDIA DLI
Nov 2024

AI for Anomaly Detection

NVIDIA Deep Learning Institute

NVIDIA DLI
Oct 2024

Transformer NLP Applications

NVIDIA Deep Learning Institute

Contact

Seeking ML Engineer, AI Researcher, or Data Scientist roles starting May 2025. Open to research collaborations and interesting problems at the frontier of AI.

Current Research Interests

Mechanistic Interpretability of vision & language models

Representation geometry and feature disentanglement

Low-resource and code-mixed multilingual NLP

Efficient vision-language model architectures

RLHF and human preference alignment