Himanshu Kumar

AI/ML Engineer & Researcher

Passionate about building intelligent systems that solve real-world problems. Specializing in Computer Vision, NLP, and Vision-Language Models with experience in cutting-edge AI research and development.

About Me

8.35 CGPA
3 Internships
5 Certifications (NVIDIA)
1 Publication

I'm a final-year Computer Science & Engineering student at IIIT Nagpur, passionate about pushing the boundaries of Artificial Intelligence. My journey in AI began with curiosity about how machines can understand and interpret the world around us.

Currently, I'm focused on multimodal AI systems, particularly Vision-Language Models, where I've developed novel approaches to building compact, efficient models. My research is available on arXiv, and I've gained hands-on experience through internships at IIT Madras and IIT Guwahati.

I'm actively seeking opportunities as an ML Engineer, AI Engineer, or Data Scientist where I can contribute to impactful projects and continue learning from industry experts.

Work Experience

May 2025 - July 2025

AI Intern - Generative AI and RLHF

IIT Madras - Wadhwani School of AI
  • Working on soundscape music generation with MusicGen, trained on soothing, ambient, and relaxing audio scraped from YouTube
  • Designed and deployed a web-based human feedback collection platform to evaluate 30-second generated audio clips
  • Implementing Reinforcement Learning from Human Feedback to align generative sound models with human aesthetic preferences
November 2024 - March 2025

AI Intern

Indian Institute of Technology Guwahati
  • Built a multimodal image-captioning VLM using a pretrained Vision Transformer (ViT) as the encoder and GPT-2/BERT-based models as the decoder
  • Trained on the Flickr30k dataset and implemented the Bottom-Up Top-Down technique for enhanced image feature extraction
  • Achieved significant improvements in caption quality and coherence through advanced attention mechanisms
June 2024 - August 2024

Developer Intern

ProCohat Technology Pvt. Ltd.
  • Contributed to building a MultiPDF Chatting RAG Application enabling users to query multiple PDFs using natural language
  • Integrated PDF text extraction and similarity matching using Supabase for efficient storage and retrieval
  • Enhanced personalized document management with features for retrieving from past uploads
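
To illustrate the similarity-matching step in the RAG application above, here is a minimal in-memory sketch of ranking stored PDF chunks against a query by cosine similarity. It is not the application's code and does not call Supabase; the function names `cosine` and `top_k`, the chunk texts, and the two-dimensional "embeddings" are all hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query.

    `chunks` is a list of (text, embedding) pairs, standing in for
    rows that a real system would fetch from its vector store.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; a real pipeline would use an embedding model.
    stored = [
        ("invoice terms", [1.0, 0.0]),
        ("leave policy", [0.0, 1.0]),
        ("payment schedule", [0.9, 0.1]),
    ]
    for score, text in top_k([1.0, 0.0], stored):
        print(f"{score:.3f}  {text}")
```

The retrieved chunks would then be passed to the language model as context for answering the user's question.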

Featured Projects

NanoVLM: Tiny Multimodal Vision Language Model

Developed compact Vision-Language Models (mini, base, large) that generate coherent textual descriptions from images while being up to 10 times smaller than existing small VLMs. Achieved an average creativity score of 39.84/50, with ROUGE-1 scores below 0.5, taken as a sign of originality (low unigram overlap with the reference text).
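
For readers unfamiliar with ROUGE-1, the metric counts unigram overlap between a generated text and a reference. The sketch below is a simplified illustration with whitespace tokenization, not the evaluation code used for NanoVLM; the function name `rouge1` and the example captions are hypothetical.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1(
        "a dog runs on the grass",
        "a dog is running on green grass",
    )
    print(scores)
```

A lower score means the generated description shares fewer words with the reference, which is why a value below 0.5 is read here as originality rather than as an error.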

PyTorch, Vision Transformer, GPT-2, Computer Vision, NLP

Sankshipt: News Summarizer Application

Designed an intelligent application to summarize news articles and identify related topics using advanced NLP and machine learning models. Supports 10 Indian languages and provides topic-based article retrieval with high-quality summaries.

NLP, Machine Learning, Multi-language, Flask, Text Summarization

MultiLabel Sentiment Analysis

Designed a multi-label sentiment analysis model covering 9 emotion labels under severe class imbalance. Fine-tuned DistilBERT with a weighted loss to counter the imbalance and achieved 88% accuracy.
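
As a rough illustration of how a weighted loss counters class imbalance in multi-label classification, the sketch below computes a per-label positive weight (negatives divided by positives) and applies it in a binary cross-entropy loss. This is an assumption about the general technique, not the project's actual implementation; the function names and label counts are hypothetical. In PyTorch, the same weighting is available through the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
import math


def pos_weights(label_counts, num_samples):
    """Per-label weight = negatives / positives, so rare labels count more."""
    return [(num_samples - c) / c for c in label_counts]


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def weighted_bce(logits, targets, weights):
    """Mean binary cross-entropy over labels, with positives up-weighted."""
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        # log(1 - sigmoid(z)) equals log_sigmoid(-z)
        total += -(w * y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z))
    return total / len(logits)


if __name__ == "__main__":
    # Hypothetical counts: label 0 is rare, label 1 is common.
    weights = pos_weights([100, 900], num_samples=1000)
    loss = weighted_bce(logits=[0.2, -0.4], targets=[1, 0], weights=weights)
    print(weights, loss)
```

With these weights, missing a positive for the rare label costs nine times as much as it would unweighted, which pushes the model away from always predicting the majority outcome.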

DistilBERT, Transformers, Multi-label Classification, Sentiment Analysis, PyTorch

Technical Expertise

Machine Learning

Supervised Learning, Unsupervised Learning, Deep Learning, Computer Vision, NLP, Vision-Language Models, Large Language Models, RAG Systems

Programming Languages

Python, C, C++, SQL

Frameworks & Libraries

PyTorch, TensorFlow, NumPy, Pandas, Scikit-learn, Flask, FastAPI, Seaborn

Tools & Platforms

GitHub, Supabase, CUDA, Docker, Jupyter

Publications & Certifications

Research Publication

"NanoVLMs: How small can we go and still make coherent Vision Language Models?" Published in ArXiv (2025)

CUDA C/C++ Fundamentals

NVIDIA Deep Learning Institute
Issued: March 8, 2025

Deep Learning Fundamentals

NVIDIA Deep Learning Institute
Issued: February 16, 2025

Transformer NLP Applications

NVIDIA Deep Learning Institute
Issued: October 15, 2024

AI for Anomaly Detection

NVIDIA Deep Learning Institute
Issued: November 9, 2024

Competitive Achievements

2nd Runner-up

AI Hackathon, Jagriti, IIT BHU
(Out of 700+ participants)

2nd Runner-up

Enigma, Codefest, IIT BHU
(Out of 1600+ participants)

Education & Key Courses

B.Tech Computer Science & Engineering

Indian Institute of Information Technology, Nagpur
CGPA: 8.35/10
2022 - Present (Final Year)

Data Structures, Algorithms, DBMS, Operating Systems, Computer Networks

Mathematics Foundation

Strong mathematical foundation essential for AI/ML research and development.

Linear Algebra, Calculus, Discrete Mathematics, Graph Theory, Probability & Statistics

Artificial Intelligence Specialization

Comprehensive AI curriculum covering cutting-edge technologies and methodologies.

Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Conversational AI

Let's Connect

I'm actively seeking opportunities in AI/ML Engineering and Data Science. Let's discuss how we can work together to build the future of AI!

Open to Opportunities

ML Engineer, AI Engineer, Data Scientist, Research Intern