Vaibhav Sharma

Machine Learning Engineer & Software Developer

Building intelligent solutions and crafting elegant code to solve complex problems

About Me

I am a passionate Machine Learning Engineer and Software Developer with a strong foundation in AI/ML, big data, and scalable systems. I enjoy building intelligent solutions, exploring new technologies, and contributing to open-source projects. My work spans research, engineering, and deployment of ML systems at scale.

Education

MS, Data Science
Indiana University Bloomington, USA
GPA: 3.8/4.0
Relevant Coursework: Applied Machine Learning, Data Mining, Statistics, Reinforcement Learning, Advanced Database Concepts, Applied NLP

Quick Facts

  • Passionate about AI/ML and Software Development
  • Strong background in both the theoretical and practical aspects of these fields
  • Always learning and exploring new technologies
  • Open to collaboration and new opportunities

Skills & Expertise

Languages

Python: 95%
Java: 85%
Scala: 80%
SQL: 90%
Shell: 80%

Frameworks & Libraries

PyTorch: 90%
TensorFlow: 90%
scikit-learn: 90%
MLlib: 80%
Hugging Face: 85%
LangChain: 80%
OpenAI API: 80%
GPT-4: 80%

Big Data & ML Infra

Apache Spark: 90%
Hadoop: 80%
Airflow: 85%
Delta Lake: 80%
Kafka: 80%
Hive: 75%
Databricks: 85%
MLflow: 80%

Cloud & Deployment

AWS (SageMaker, Lambda): 85%
GCP (Vertex AI): 80%
AzureML: 75%
Docker: 85%
Kubernetes: 75%
MLflow: 80%

Systems & Tools

Spring Boot: 70%
Flask: 85%
RESTful APIs: 90%
CI/CD: 80%

Techniques

NLP: 90%
Time Series: 85%
Forecasting: 80%
Recommendation Systems: 85%
A/B Testing: 80%
PCA: 75%
Causal Inference: 70%
Embedding Models: 80%
LLM fine-tuning: 80%
Prompt engineering: 80%

Professional Experience

Google Summer of Code | Open-source AI Developer

Apr 2025 – Present

Remote, USA

  • Building an open-source tool that uses AI agents to reproduce, map, and evaluate the contributions of research papers.
  • Leveraged retrieval pipelines and prompt chaining strategies to interpret and map research contributions to modular code components.
  • Worked on integrating multimodal inputs into LLM workflows for structured content understanding and alignment.

Radical AI | AI Engineer

Jun 2024 – Aug 2024

Remote, USA

  • Designed and deployed a worksheet generator for educators using a RAG architecture and LangChain, optimizing data retrieval pipelines with the GCP Vertex AI API.
  • Integrated evaluation metrics into a question-answering system to compare fine-tuning against RAG performance.

InMobi | ML Engineer

Apr 2021 – Jul 2023

Bangalore, India

  • Optimized inference latency of a lookalikes ML recommendation engine by 67% through systematic experimentation with Spark query plans and caching strategies, significantly improving online performance.
  • Designed an augmentation algorithm over internal identity graphs to enrich user profiles, enabling efficient user ID transactions to Cosmos DB and improving CTR by 20% while reducing RU consumption by 40%.
  • Integrated the HyperLogLog data structure into the DB storage model and Spark jobs of the ETL framework, mitigating data availability delays and reducing compute costs by 30%.
  • Evaluated BERT embeddings vs traditional LDA for topic modelling on app descriptions; deployed a hybrid NLP model that improved coherence scores by 60%, enhancing semantic relevance in recommendations.
  • Redesigned big data pipeline quality checks with automated data validation and dashboarding, reducing manual QA effort by 50% and ensuring reliable monitoring at scale.
  • Built threshold-based forecasting models on high-dimensional user activity data; reduced feature space by 80% while preserving model accuracy, enabling scalable deployment for downstream tasks.
  • Developed an ensemble model combining LightGBM and TF-IDF features for user lookalike classification, achieving 20% offline ROC-AUC lift and 25% improvement in business KPIs during A/B testing.
  • Modernized the A/B testing framework and its pipelines for performance benchmarking, reducing manual effort by 50%.

Fintech Unicorn (CRED) | Data Scientist

Jul 2020 – Mar 2021

Remote

  • Achieved 87% accuracy using ARIMA to forecast peak signups and bureau fetches on the CRED app during high-demand periods in 2021.
  • Built pipelines for real-time monitoring and analysis of traffic spikes to support business insights and load balancing.

Get In Touch

Let's Connect

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.