// Data Engineer · Analyst · AI Builder
Pipelines, models, and insights — end to end.
MS in Business Analytics & AI at UT Dallas. I build the data infrastructure that makes analytics possible — from 700M-row/day Spark pipelines to ML models and LLM-powered agents.
// 00. about
I'm a Data Engineer and ML practitioner originally from India, now pursuing my MS in Business Analytics & AI at the University of Texas at Dallas. My path into data started with electronics engineering, moved through analytics, and evolved into building the pipelines and systems that make data actually useful at scale.
At Cognizant, I spent 2.5 years going from writing SQL reports to architecting PySpark pipelines that processed 700M+ rows daily — learning along the way that the best data work lives at the intersection of engineering rigor and business impact.
Outside of work, I enjoy experimenting with LLMs and agents, exploring ML research, and building side projects that solve real problems. I'm currently looking for Data Engineering or ML Engineering roles where I can keep building things that matter.
// 01. experience
Full-stack data career — from pipeline architecture to BI dashboards and ML delivery.
// 02. projects
ML engineering, deep learning, LLM-powered apps, and data visualization.
Academic advising assistant on Google Cloud Vertex AI + Claude Agent. Integrates LLM prompting with structured academic data and rule-based logic — reducing hallucination on structured course and degree rules via prompt engineering and API-based workflows.
Custom CNN in PyTorch for multi-class MRI tumor classification. Achieved 94%+ F1-score via ablation studies across optimizers (Adam vs SGD) and batch sizes. Deployed batch inference on AWS EC2 with predictions streamed into Databricks for real-time clinical review.
CNN-based model classifying handwritten letters and digits across MNIST and USPS datasets — achieving 98.5% accuracy. Built with PyTorch using batch normalization and dropout regularization for robust generalization across both datasets.
Ensemble classification models predicting loan default risk on 50K+ applications — 98.3% accuracy. Feature engineering in Spark SQL (credit ratios, risk tiers) with class-imbalance handling. Insights delivered via Power BI backed by Delta Lake.
Predicted water potability from key quality parameters using six ML algorithms, with accuracy ranging from 92.2% to 99.8% across models. Conducted comprehensive EDA and visualization on the Kaggle dataset; tuned hyperparameters via RandomizedSearchCV.
Predictive model forecasting Kickstarter campaign success to help creators optimize project strategies. Compared Decision Tree, Random Forest, Gradient Boosting, and AdaBoost, reaching up to 85.78% accuracy after hyperparameter tuning.
Supervised ML model and web application that detects plant nutrient (NPK) deficiencies from images with 87.6% accuracy. Built end to end, from model training through deployment as an interactive web app.
End-to-end retail analytics covering market basket analysis (MLxtend), customer segmentation via K-means clustering, and regression modeling on transaction data. Documented insights to improve targeted marketing and customer value.
Tableau dashboard comparing 15-year returns of mutual funds versus a market index. Formulated an asset allocation strategy targeting a 20% CAGR across different index instruments.
Power BI dashboard providing comprehensive insights into sales, profit, orders, and profit margin. Designed for executive-level visibility with drill-through capability across product lines and time periods.
// 03. skills
Spanning data engineering, analytics, ML, and cloud infrastructure.
// 04. certifications
// 05. education
// 06. blog
Notes on data engineering, ML, and things I've learned building in the field.
// 07. contact
Open to data engineering and analytics opportunities. Let's talk.