This big data project uses PySpark to extract and transform large-scale music listening data stored in HDFS and build a collaborative-filtering recommender system.
- Built the recommender with an Alternating Least Squares (ALS) matrix factorization model.
- Evaluated the model against a popularity baseline.
- Used Mean Average Precision at K (MAP@K) as the main performance metric.
- Improved MAP@100 by 16.7x compared with the popularity baseline.
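
To make the metric and the baseline concrete, here is a minimal pure-Python sketch of MAP@K and a popularity recommender. It is illustrative only and is not the project's PySpark code. The function names, the toy data, and the choice to normalize average precision by `min(len(relevant), k)` are assumptions; other MAP@K variants normalize differently, so absolute scores may not match the project's numbers.

```python
from collections import Counter


def average_precision_at_k(recommended, relevant, k):
    """Average precision of one ranked list, truncated at k.

    Assumes `recommended` contains no duplicate items.
    Normalizes by min(len(relevant), k), which is an assumed convention.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision_at_k(recs_by_user, truth_by_user, k):
    """Mean of per-user average precision over users with ground truth."""
    users = [u for u, items in truth_by_user.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recs_by_user.get(u, []), truth_by_user[u], k)
        for u in users
    )
    return total / len(users)


def popularity_baseline(train_interactions, k):
    """Return the k most frequently interacted-with items.

    `train_interactions` is an iterable of (user, item) pairs (hypothetical schema).
    """
    counts = Counter(item for _, item in train_interactions)
    return [item for item, _ in counts.most_common(k)]


# Toy data, invented for illustration.
train = [("u1", "a"), ("u2", "a"), ("u3", "a"),
         ("u1", "b"), ("u2", "b"), ("u3", "c")]
truth = {"u1": ["a"], "u2": ["b", "c"], "u3": ["d"]}

top_items = popularity_baseline(train, k=2)
# The baseline recommends the same list to every user.
baseline_recs = {user: top_items for user in truth}
baseline_map = mean_average_precision_at_k(baseline_recs, truth, k=2)
```

Under this convention, a relative improvement such as the one reported above is the model's MAP@K divided by the baseline's MAP@K, both computed on the same held-out users with the same K.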