Projects
Selected side projects.
- Loan Underwriting: A logistic regression model for predicting one-year probability of default and improving loan underwriting decisions.
- Music Recommender System: A PySpark recommender system for large-scale music listening data stored in HDFS.
- NLP Topic Modeling: Topic modeling on scraped news articles using LDA, K-means clustering, BERT, and BART summarization.
- NLP Text Stock Prediction: Stock price prediction using LSTM and BERT models to analyze daily news headlines and forecast stock movement.
- Twitter Data Pipeline: An Airflow ETL pipeline for Twitter data using Docker, Python, and AWS S3.
- Health Insurance Premium Price Analysis: Health insurance premium analysis using causal inference, regression, clustering, and machine learning methods.
- Bank Customer Segmentation Dashboard: An interactive dashboard for analyzing dummy bank customer data from the United Kingdom.
- Data Analysis on Research Text Data: Exploratory text analysis on National Science Foundation research grant data.
- NYC Park Crime Tableau Dashboard: A Tableau dashboard analyzing crime incidents at New York City parks.