Featured Projects
A mix of production systems, machine-learning research, and full-stack tools — built across coursework, internships, and side projects.

JobFlow
AI-powered resume tailoring for every application
Intelligent job application tool with an 8-step AI pipeline that generates ATS-optimized resumes and personalized cover letters from any job posting URL.
8-step pipeline with keyword matching, relevance scoring, and ATS optimization
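JobFlow's pipeline internals aren't described here, so what follows is only a hypothetical sketch of a keyword-matching and relevance-scoring step. It pulls the most frequent terms from a posting and measures how many of them a resume already covers. The stop-word list, `top_keywords`, and `keyword_coverage` are illustrative names rather than JobFlow's code, and a plain frequency count stands in for whatever scoring the real pipeline (and Claude) performs.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "with", "on", "or", "is", "are", "we", "you"}

def tokenize(text: str) -> list[str]:
    """Lowercase and split into tokens, keeping '+' and '#' so terms like 'c++' survive."""
    return [t for t in re.findall(r"[a-z0-9+#]+", text.lower()) if t not in STOP_WORDS]

def top_keywords(posting: str, k: int = 10) -> list[str]:
    """Return the k most frequent non-stop-word tokens in the job posting."""
    return [word for word, _ in Counter(tokenize(posting)).most_common(k)]

def keyword_coverage(posting: str, resume: str, k: int = 10) -> tuple[float, list[str]]:
    """Share of the posting's top-k keywords found in the resume, plus the ones it is missing."""
    keywords = top_keywords(posting, k)
    resume_tokens = set(tokenize(resume))
    missing = [kw for kw in keywords if kw not in resume_tokens]
    score = 1 - len(missing) / len(keywords) if keywords else 0.0
    return score, missing

posting = "Data analyst: SQL, Python, Tableau. Python and SQL required; Tableau preferred."
resume = "Built Python ETL jobs and SQL dashboards."
score, missing = keyword_coverage(posting, resume, k=3)
print(f"coverage={score:.2f}, missing={missing}")
```

The `missing` list is the useful part for tailoring: it names the terms a rewrite step would try to work into the resume where they are truthful.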

Hilton Invoice Code Finder
Offline invoice classification web app
Pro bono offline-first web app for a Hilton corporate manager. Matches free-text invoice descriptions to 93 GL codes across 10 departments using TF-IDF + cosine similarity in the browser. iPhone-optimized UI; localStorage persistence for use in the field.
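The matching step can be sketched in dependency-free Python, which mirrors the constraint of running offline without a server. This is an illustrative re-creation, not the app's source (the real app runs in the browser): the tokenizer, the smoothed IDF formula (the same form scikit-learn uses by default), the `GLCodeMatcher` class, and the three GL codes below are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(token_lists: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, where df counts descriptions containing the term."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector; terms never seen in the GL descriptions are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Both vectors have unit length, so their dot product equals cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

class GLCodeMatcher:
    def __init__(self, gl_codes: dict[str, str]):
        # gl_codes maps a GL code to a short description of what gets billed to it.
        self.codes = list(gl_codes)
        token_lists = [tokenize(desc) for desc in gl_codes.values()]
        self.idf = build_idf(token_lists)
        self.vectors = [tfidf_vector(tokens, self.idf) for tokens in token_lists]

    def match(self, invoice_text: str, top_n: int = 3) -> list[tuple[str, float]]:
        """Rank GL codes by cosine similarity to a free-text invoice description."""
        query = tfidf_vector(tokenize(invoice_text), self.idf)
        scored = [(code, cosine(query, vec)) for code, vec in zip(self.codes, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Made-up codes for illustration only; not the client's real chart of accounts.
matcher = GLCodeMatcher({
    "6100": "guest room linens towels laundry",
    "6200": "kitchen food produce supplies",
    "6300": "hvac repair maintenance service call",
})
print(matcher.match("Replacement towels for guest rooms", top_n=1))
```

Because the IDF table and code vectors are computed once from the fixed list of descriptions, they are small enough to cache locally, and each lookup is just one sparse dot product per code.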
DALL-E + SAM Image Editing
Generative pipeline: text → image → mask → inpaint
Three-stage generative-image pipeline combining OpenAI DALL-E 3 for generation, Meta's Segment Anything Model for region selection, and DALL-E's edit endpoint for mask-based inpainting. Demonstrated on a fashion design concept.
1024×1024 generation + SAM segmentation + 3-variant inpainting
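Between the SAM and inpainting stages, the selected region has to become a mask the edit endpoint understands. OpenAI's image-edit API treats fully transparent pixels (alpha 0) in the mask PNG as the area to repaint, and the mask must match the image's dimensions. The sketch below assumes SAM returned boolean masks of shape (N, H, W) with one score per mask, as its multimask prediction mode typically does; the helper names and toy arrays are hypothetical, and no API calls are made.

```python
import numpy as np

def pick_best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Given SAM-style multimask output (N, H, W) and per-mask scores (N,), keep the top-scoring mask."""
    return masks[int(np.argmax(scores))]

def to_edit_mask(mask: np.ndarray) -> np.ndarray:
    """Turn a boolean (H, W) mask (True = region to repaint) into an RGBA uint8 array:
    alpha 0 (transparent) inside the region to edit, alpha 255 (opaque) everywhere else."""
    if mask.ndim != 2:
        raise ValueError(f"expected a single (H, W) mask, got shape {mask.shape}")
    rgba = np.zeros((*mask.shape, 4), dtype=np.uint8)
    rgba[..., 3] = np.where(mask, 0, 255)
    return rgba

# Toy stand-in for SAM output: three candidate 1024x1024 masks with confidence scores.
masks = np.zeros((3, 1024, 1024), dtype=bool)
masks[1, 300:700, 400:600] = True
scores = np.array([0.61, 0.94, 0.78])

edit_mask = to_edit_mask(pick_best_mask(masks, scores))
# With Pillow installed, Image.fromarray(edit_mask).save("mask.png") would write the PNG.
```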
Time-Series Retail Sales Forecasting
Daily retail sales forecasting + descriptive analytics
End-to-end pipeline on a year of daily POS data for a small retail client. Compared naive, seasonal-naive, ARIMA, Random Forest, and Gradient Boosting; explored the global-pool approach from Montero-Manso & Hyndman (2021). Paired with a descriptive sales report (day-of-week, seasonal, by-department) delivered to the client.
Gradient Boosting: MAE $494, sMAPE 13.0% — 24% MAE reduction vs ARIMA
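The baseline and scoring side of this comparison can be sketched with NumPy. The sketch assumes weekly seasonality (period 7) for the seasonal-naive baseline and uses one common sMAPE definition; sMAPE has several variants and the project's exact formula isn't stated. The function names are illustrative, and the ARIMA, Random Forest, and Gradient Boosting models themselves are out of scope here.

```python
import numpy as np

def seasonal_naive(history, horizon: int, season: int = 7) -> np.ndarray:
    """Forecast by repeating the last observed season (a weekly cycle for daily sales)."""
    last_season = np.asarray(history, dtype=float)[-season:]
    reps = -(-horizon // season)  # ceiling division
    return np.tile(last_season, reps)[:horizon]

def mae(actual, forecast) -> float:
    """Mean absolute error, in the units of the series (dollars for sales)."""
    return float(np.mean(np.abs(np.asarray(actual, float) - np.asarray(forecast, float))))

def smape(actual, forecast) -> float:
    """Symmetric MAPE in percent: mean of |F - A| / ((|A| + |F|) / 2).
    Days where actual and forecast are both 0 count as zero error."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(a) + np.abs(f)) / 2
    ratio = np.divide(np.abs(f - a), denom, out=np.zeros_like(a), where=denom != 0)
    return float(np.mean(ratio) * 100)

def pct_mae_reduction(baseline_mae: float, model_mae: float) -> float:
    """Relative improvement of a model over a baseline, in percent."""
    return (baseline_mae - model_mae) * 100 / baseline_mae
```

With both models' test-set MAEs in hand, `pct_mae_reduction(arima_mae, gb_mae)` gives a relative figure of the same kind as the "24% vs ARIMA" quoted above, assuming that figure is a relative MAE reduction.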
La Liga Ranking
Team ranking from match-result averaging
Ranked all 20 teams of the 2020-21 La Liga season from 760 match records. Encoded wins/draws/losses (1, 0.5, 0), pivoted into a team-vs-team matrix, and computed per-opponent average performance. Team project with Jenisha Shrestha and Bipin Bisural.
Top 4 (Atlético, Real Madrid, Barcelona, Sevilla) — exact match to actual 2020-21 standings
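The ranking method reduces to a map, a pivot, and two means in pandas. The three-team fixture list below is invented. The sketch also assumes each match is expanded into one record per team, which is one way a 380-match season (20 teams, home and away) could yield 760 records, though the original data layout isn't stated.

```python
import pandas as pd

POINTS = {"W": 1.0, "D": 0.5, "L": 0.0}  # win / draw / loss encoding
FLIP = {"W": "L", "D": "D", "L": "W"}     # the same result seen from the opponent's side

# Invented double round robin: (home, away, result for the home team).
matches = [("A", "B", "W"), ("B", "A", "D"), ("A", "C", "W"),
           ("C", "A", "D"), ("B", "C", "D"), ("C", "B", "W")]

# One record per team per match.
rows = []
for home, away, result in matches:
    rows.append((home, away, result))
    rows.append((away, home, FLIP[result]))
records = pd.DataFrame(rows, columns=["team", "opponent", "result"])

def rank_teams(df: pd.DataFrame) -> pd.Series:
    """Encode results, pivot into a team-vs-opponent matrix of mean scores, then average each row."""
    scored = df.assign(score=df["result"].map(POINTS))
    matrix = scored.pivot_table(index="team", columns="opponent", values="score", aggfunc="mean")
    # The diagonal (a team against itself) is NaN; mean() skips NaN by default.
    return matrix.mean(axis=1).sort_values(ascending=False)

print(rank_teams(records))
```

Because every team meets every opponent the same number of times in a double round robin, averaging per opponent first gives the same values as a plain overall average; the matrix form earns its keep when inspecting individual head-to-head cells.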
Heart Disease Prediction
Clinical prediction model
Cardiovascular risk assessment on the UCI Heart Disease dataset. Team project with Jenisha Shrestha; compared multiple classifiers, with Logistic Regression emerging as the most promising model.
0.152 misclassification rate (Logistic Regression)
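A classifier comparison like this can be sketched with scikit-learn, where misclassification rate is simply the share of test predictions that disagree with the labels (one minus accuracy). To stay self-contained, `make_classification` generates a synthetic stand-in sized like the commonly used 303-row, 13-feature Cleveland subset. The competing classifiers, split ratio, and hyperparameters are assumptions, so the printed rates say nothing about the 0.152 reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 303 patients x 13 features, binary "disease present" label.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Candidate classifiers; scale-sensitive models get a StandardScaler in front.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

rates = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Misclassification rate = fraction of test cases predicted wrongly = 1 - accuracy.
    rates[name] = float(np.mean(model.predict(X_test) != y_test))

for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name:>20}: {rate:.3f}")
```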
Data science, applied from problem to production.
I'm a junior studying Data Science at the University of Colorado Boulder, graduating in May 2027. My work spans machine learning, statistical modeling, and full-stack AI development, building applications end-to-end from problem definition through production deployment.
Four production AI applications are currently live: WhaleWatch (real-time SEC filing intelligence), Zenith (Claude + Pinecone RAG motivational platform), JobFlow (an 8-step Claude pipeline for ATS-optimized resumes), and Hilton Invoice Code Finder (browser-side TF-IDF classification). Academic projects cover time-series forecasting (Gradient Boosting on retail POS data, 24% MAE reduction vs ARIMA), ensemble classification methods, and multi-modal generative pipelines combining DALL-E with Meta's Segment Anything Model.
Coursework includes Time Series Analysis (APPM-STAT 4720/5720), Machine Learning, Deep Learning, and Statistical Modeling. I hold an IBM SkillsBuild Enterprise Design Thinking Practitioner certification (December 2025) and am currently pursuing an additional data science certification while advancing my SQL skills. Spoken languages: Nepali, Hindi, and English.
Tech Stack
Languages
- Python (Proficient)
- R (Proficient)
- SQL (Intermediate)
- Excel (Proficient)
Frameworks
- Next.js
- Flask
- FastAPI
- React
Data Tools
- Tableau (Intermediate)
- Power BI (Intermediate)
ML / AI
- Scikit-learn
- Claude API
- Pinecone
Databases
- PostgreSQL
- SQLite
Let's Connect
Open to data science internship opportunities, research collaborations, and interesting problems.
- Email: manasacharya2004@gmail.com
- LinkedIn: manas-acharya-969702379
- GitHub: acharyaww
- Location: Lafayette, Colorado


