Hello, I am

Smit Pancholi

Data Scientist | Data Analyst | ML Engineer

LinkedIn Profile | GitHub Profile

Get To Know More

About Me



I am a Data Scientist with a strong foundation in analytics, machine learning, and AI-driven systems. With hands-on experience in NLP, cloud computing, and full-stack development, I thrive on solving real-world problems using data. I have worked in academic research and industry settings, developing scalable pipelines, predictive models, and intelligent applications across diverse domains. I recently graduated with a Master of Science in Data Science from The George Washington University.


Education

The George Washington University
Washington, D.C.

Master of Science in Data Science
GPA 3.95/4.00

Global Leaders Fellowship recipient (scholarship covering 25% of tuition per year)

Relevant Coursework: Data Science, Data Mining, Data Warehousing, Data Visualization, Deep Learning, NLP, ML


GLS University
Gujarat, India

Bachelor of Computer Applications
GPA 3.70/4.00

Relevant Coursework: Statistics, DBMS, Advanced Object-Oriented Programming, Data Communication and Networks

Work Experience

3+ years of combined academic and industry experience in Data Science & ML

Research Assistant

The George Washington University, Washington, D.C.
Jan 2025 – Apr 2025

  • Developed a transcription system for mental health podcast data using Whisper and PyAnnote on AWS GPU instances, helping organize and label raw audio by speaker.
  • Processed long recordings by splitting, denoising, and aligning audio segments with speaker turns using voice activity detection (VAD).
  • Produced clean, time-stamped, and speaker-attributed transcripts, improving accessibility for research on mental health communication.
NLP | Whisper | PyAnnote | AWS

Machine Learning Engineer Intern

STL Digital Inc., Fremont, CA
Oct 2024 – Dec 2024

  • Created a Proof of Concept (PoC) system using FastAPI, PyTesseract, Sentence Transformers, and ChromaDB to extract and search through both structured and unstructured business documents.
  • Connected the system to Google Cloud Storage for syncing and archiving, and added built-in error handling, reducing manual review time by over 60%.
  • Added a conversational interface using Mistral-7B-Instruct v0.3, allowing users to ask questions about uploaded documents in natural language.
PoC | FastAPI | GCP | LLMs

Data Analyst

Aumento, Mumbai, India
Nov 2021 – Jul 2023

  • Processed and validated over 700,000 rows of pharmaceutical sales and distribution data, ensuring accuracy across multiple dimensions including sales value, free units, plant codes, and customer segments.
  • Analyzed sales performance totaling over USD 9.7M, distinguishing between NHD and non-NHD channels, and uncovered division-wise trends across key brands, materials, and business zones.
  • Built interactive dashboards in SAP Analytics Cloud to track ~4M free units in Q4 across top-performing brands, 15+ plants, and 5 customer zones, enabling real-time business insights and decision-making.
Data Validation | KPI Analysis | Dashboarding | SAP Analytics Cloud

Browse My Recent

Projects

Capstone

Multi-Agent AI System for Personal Finance

Built a two-agent system using LLaMA3 and BERT to classify expenses and provide budget optimization insights. Submitted a research paper on this work to Springer (under review).


Chatbot

Presidential Chatbot

Developed a GPT-2-based chatbot that uses retrieval-augmented generation (RAG) with cosine-similarity search to retrieve U.S. and Russian speech excerpts, built with LangChain and ChromaDB behind a Streamlit UI.


Music

Music Generation Using Deep Neural Networks

Used LSTMs and variational autoencoders in PyTorch/TensorFlow to generate original melodies, leveraging time-series pattern recognition in MIDI datasets.


Airbnb

Airbnb Trends Analysis

Created a Dash-based dashboard using GCP services to analyze 102K+ NYC listings. Uncovered winter booking trends and pricing patterns via EDA and PCA.


View My

Certifications

Generative AI with LLMs

Completed a hands-on course by DeepLearning.AI and AWS on transformers, LoRA, RLHF, and deployment using SageMaker and LangChain.


Docker Professional Certificate

Covered Docker architecture, container orchestration, image optimization, and CI/CD integration in production environments.


Explore My

Technical Skills

🖥️ Programming & Databases


Python

Advanced

R

Intermediate

C / C++

Intermediate

Java

Intermediate

SQL

Experienced

MongoDB

Intermediate

JavaScript

Intermediate

🤖 AI & ML Frameworks


PyTorch

Experienced

TensorFlow

Intermediate

Hugging Face Transformers

Intermediate

LangChain

Intermediate

Sentence Transformers

Intermediate

🌐 Web & Development


HTML5

Experienced

CSS3

Experienced

AngularJS

Intermediate

PHP

Intermediate

Node.js

Intermediate

React.js

Intermediate

🔧 Data Analytics Tools


Tableau

Intermediate

Power BI

Intermediate

Jupyter Notebook

Experienced

RStudio

Intermediate

MySQL

Intermediate

SAP Analytics Cloud

Beginner

MS Office / SharePoint

Experienced

☁️ Cloud Platforms


AWS (S3, Lambda, EC2, RDS)

Experienced

GCP (BigQuery, Storage, Functions)

Experienced

Azure (ML, Blob, VMs)

Intermediate

Get in Touch

Contact Me