Hello, I am

Smit Pancholi

Data Scientist | Data Analyst | ML Engineer

My LinkedIn Profile | My GitHub Profile

Get To Know More

About Me



I am a Data Scientist with a strong foundation in analytics, machine learning, and AI-driven systems. With hands-on experience in NLP, cloud computing, and full-stack development, I thrive on solving real-world problems using data. I have worked in academic research and industry settings, developing scalable pipelines, predictive models, and intelligent applications across diverse domains. I recently graduated with a Master of Science in Data Science from The George Washington University.


Education

The George Washington University
Washington, D.C.

Master of Science in Data Science
GPA 3.95/4.00

Global Leaders Fellowship recipient (scholarship covering 25% of tuition per year)

Relevant Coursework: Data Science, Data Mining, Data Warehousing, Data Visualization, Deep Learning, NLP, ML


GLS University
Gujarat, India

Bachelor of Computer Applications
GPA 3.70/4.00

Relevant Coursework: Statistics, DBMS, Advanced Object Oriented Programming, Data Communication and Networks

Work Experience

3+ years of combined academic and industry experience in Data Science & ML

Full-Stack Engineer (Blockchain)

EDA Clinical, Pennsylvania
Oct 2025 – Present

  • Built a Zcash Light Wallet Server by deploying a fully synced Zebra full node on Azure and containerizing services with Docker Compose and YAML-based infrastructure.
  • Created a secure app that ingests user UFVKs, syncs wallet data, and displays detailed transactions with search, memo indexing, sorting, and Azure-backed persistent storage.
  • Developed an anonymous forum where users post and comment via minimal ZEC memo transactions, integrating blockchain verification, post rendering, and a smooth interaction workflow.
Blockchain | Zcash | Azure | Docker | Full-Stack

AI/ML Engineer

EDA Clinical, Pennsylvania
Aug 2025 – Oct 2025

  • Built an end-to-end pipeline that converts unstructured clinical protocols into USDM-ready JSON across eight core sections, from study design to schedules.
  • Combined LLM-based extraction with rule-based checks and a HITL editor for audit-ready, schema-valid outputs.
  • Led artifact merge and validation across versions to preserve traceability, IDs, and timelines, reducing manual cleanup and accelerating protocol digitization.
  • Delivered a RAG-powered search and review layer using FAISS and validations to verify and refine protocol content.
LLMs | RAG | FAISS | Clinical NLP | USDM

Research Assistant

The George Washington University, Washington, D.C.
Jan 2025 – Apr 2025

  • Developed a transcription system for mental health podcast data using Whisper and PyAnnote on AWS GPU instances, helping organize and label raw audio by speaker.
  • Processed long recordings by splitting, denoising, and aligning audio segments with speaker turns using voice activity detection (VAD).
  • Produced clean, time-stamped, and speaker-attributed transcripts, improving accessibility for research on mental health communication.
NLP | Whisper | PyAnnote | AWS

Machine Learning Engineer Intern

STL Digital Inc., Fremont, CA
Oct 2024 – Dec 2024

  • Created a Proof of Concept (PoC) system using FastAPI, PyTesseract, Sentence Transformers, and ChromaDB to extract and search through both structured and unstructured business documents.
  • Connected the system to Google Cloud Storage for syncing and archiving, and added built-in error handling that reduced manual review time by over 60%.
  • Added a conversational interface using Mistral-7B-Instruct v0.3, allowing users to ask questions about uploaded documents in natural language.
PoC | FastAPI | GCP | LLMs

Data Analyst

Aumento, Mumbai, India
Nov 2021 – Jul 2023

  • Processed and validated over 700,000 rows of pharmaceutical sales and distribution data, ensuring accuracy across multiple dimensions including sales value, free units, plant codes, and customer segments.
  • Analyzed sales performance totaling over USD 9.7M, distinguishing between NHD and Non-NHD channels, and uncovered division-wise trends across key brands, materials, and business zones.
  • Built interactive dashboards in SAP Analytics Cloud to track ~4M free units in Q4 across top-performing brands, 15+ plants, and 5 customer zones, enabling real-time business insights and decision-making.
Data Validation | KPI Analysis | Dashboarding | SAP Analytics Cloud

Browse My Recent

Projects

Capstone

Multi Agent AI System for Personal Finance

Built a two-agent system using LLMs (LLaMA3, BERT) to classify expenses and provide budget optimization insights. Submitted research paper to Springer (under review).


Presidential Chatbot

Developed a GPT-2-based chatbot that uses RAG with cosine-similarity search to retrieve U.S. and Russian speech excerpts, built with LangChain, ChromaDB, and a Streamlit UI.


Music Generation Using Deep Neural Networks

Used LSTM and Variational Autoencoders in PyTorch/TensorFlow to generate original melodies, leveraging time-series pattern recognition in MIDI datasets.


Airbnb Trends Analysis

Created a Dash-based dashboard using GCP services to analyze 102K+ NYC listings. Uncovered winter booking trends and pricing patterns via EDA and PCA.


View My

Certifications

Generative AI with LLMs

Completed hands-on course by DeepLearning.AI and AWS on transformers, LoRA, RLHF, and deployment using SageMaker & LangChain.


Docker Professional Certificate

Covered Docker architecture, container orchestration, image optimization, and CI/CD integration in production environments.


Explore My

Technical Skills

🖥️ Programming & Databases


  • Python: Advanced
  • R: Intermediate
  • C / C++: Intermediate
  • Java: Intermediate
  • SQL: Experienced
  • MongoDB: Intermediate
  • JavaScript: Intermediate

🤖 AI & ML Frameworks


  • PyTorch: Experienced
  • TensorFlow: Intermediate
  • Hugging Face Transformers: Intermediate
  • LangChain: Intermediate
  • Sentence Transformers: Intermediate

🌐 Web & Development


  • HTML5: Experienced
  • CSS3: Experienced
  • AngularJS: Intermediate
  • PHP: Intermediate
  • Node.js: Intermediate
  • React.js: Intermediate

🔧 Data Analytics Tools


  • Tableau: Intermediate
  • Power BI: Intermediate
  • Jupyter Notebook: Experienced
  • RStudio: Intermediate
  • MySQL: Intermediate
  • SAP Analytics Cloud: Beginner
  • MS Office / SharePoint: Experienced

☁️ Cloud Platforms


  • AWS (S3, Lambda, EC2, RDS): Experienced
  • GCP (BigQuery, Storage, Functions): Experienced
  • Azure (ML, Blob, VMs): Intermediate

Get in Touch

Contact Me