Q: Develop a streamlined alternative to Spotify. This
project will feature a music recommendation system, playback, and streaming
capabilities, alongside real-time suggestions derived from user activity.
Phase #1: Extract, Transform, Load (ETL) Pipeline [40 Marks]
The first task involves creating an Extract, Transform, Load (ETL) pipeline
using the Free Music Archive (FMA), a freely available dataset suited to
evaluating tasks in music information retrieval (MIR), a field concerned
with browsing, searching, and organising large music collections.
You'll be working with the fma_large.zip dataset, comprising 106,574
tracks, each lasting 30 seconds, and spanning 161 unevenly distributed
genres. Compressed, the dataset totals 93 GiB in size. Moreover, you may
find the fma_metadata.zip data necessary for track details like title, artist,
genres, tags, and play counts, covering all 106,574 tracks. Your selection of
features will vary based on your approach to music recommendation.
After downloading, you'll use Python to load the dataset and execute
important feature extraction methods like Mel-Frequency Cepstral Coefficients
(MFCC), spectral centroid, or zero-crossing rate. These techniques will
convert the audio files into numerical and vector formats. Additionally,
consider exploring normalisation or standardisation techniques, as well as
dimensionality reduction, if necessary, as they can greatly enhance the
accuracy of your recommendation model.
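
As a minimal sketch of the transformation step, the code below computes the zero-crossing rate per frame and then standardises the result across tracks. It uses only the Python standard library and synthetic sine waves in place of decoded FMA clips; the function names, SAMPLE_RATE, and the frame length of 256 are illustrative assumptions, not part of the assignment. In practice an audio library would usually handle MP3 decoding and MFCC extraction.

```python
import math
import statistics

SAMPLE_RATE = 8000  # assumed sample rate for the synthetic signals


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(samples, frame_length):
    """Split samples into non-overlapping frames, dropping a short tail."""
    last_start = len(samples) - frame_length
    return [
        samples[i:i + frame_length]
        for i in range(0, last_start + 1, frame_length)
    ]


def standardise(values):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def sine(freq_hz, n_samples):
    """Synthetic stand-in for a decoded audio clip."""
    return [
        math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


tracks = [sine(100, 1024), sine(1000, 1024)]

# One feature per track: the mean zero-crossing rate over its frames.
zcr_per_track = [
    statistics.fmean([zero_crossing_rate(f) for f in frame_signal(t, 256)])
    for t in tracks
]
scaled = standardise(zcr_per_track)
```

The higher-pitched signal crosses zero more often, so its feature value is larger; after standardisation the feature has zero mean across tracks, which keeps it on a comparable scale with other features.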
Finally, due to the dataset's vast size, it's important to store it in a scalable
and accessible manner. Fortunately, MongoDB fulfils these requirements
seamlessly. After transformation, you can effortlessly load the audio features
into a MongoDB collection for further utilisation.
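
The load step could be sketched as follows. The document layout (an `_id` holding the track ID and a `features` list), the helper names, and the batch size are assumptions made for illustration. `InMemoryCollection` is a hypothetical stand-in so the sketch runs without a database server; it mimics only the `insert_many` method that a pymongo collection also provides.

```python
def to_documents(track_features):
    """Turn {track_id: feature_vector} into MongoDB-style documents."""
    return [
        {"_id": track_id, "features": list(vector)}
        for track_id, vector in track_features.items()
    ]


def load_in_batches(collection, documents, batch_size=1000):
    """Insert documents in fixed-size batches; return the batch count."""
    batches = 0
    for start in range(0, len(documents), batch_size):
        collection.insert_many(documents[start:start + batch_size])
        batches += 1
    return batches


class InMemoryCollection:
    """Hypothetical stand-in exposing only insert_many."""

    def __init__(self):
        self.docs = []

    def insert_many(self, documents):
        self.docs.extend(documents)


# Illustrative feature vectors keyed by made-up track IDs.
features = {2: [0.1, 0.2], 5: [0.3, 0.4], 10: [0.5, 0.6]}

collection = InMemoryCollection()
n_batches = load_in_batches(collection, to_documents(features), batch_size=2)
```

Batching keeps each insert request small, which matters when the full set of 106,574 tracks is loaded. Against a real server, a pymongo collection object could be passed in place of the stand-in.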
Phase #2: Music Recommendation Model [70 Marks]
Now that the data is securely stored in MongoDB, the next step involves using
Apache Spark to train a music recommendation model. You have the option to
leverage Apache Spark's MLlib machine learning library or explore deep
learning methodologies for enhanced accuracy, utilising emerging frameworks
like the TorchDistributor API for PyTorch. Algorithms such as collaborative
filtering and Approximate Nearest Neighbours (ANN) can be used in this
process.
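
To make the nearest-neighbour idea concrete, here is a minimal sketch of an exact, brute-force cosine-similarity search over feature vectors. It is not an approximate method and does not use Spark; it is the baseline that an ANN index approximates at scale. The track IDs and vectors are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors, k=2):
    """Return the k track IDs most similar to query_id, best first."""
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), track_id)
        for track_id, vec in vectors.items()
        if track_id != query_id  # never recommend the query track itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:k]]


# Made-up feature vectors for four tracks.
vectors = {
    "track_a": [1.0, 0.0, 0.1],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 1.0, 0.0],
    "track_d": [0.1, 0.9, 0.2],
}

neighbours = most_similar("track_a", vectors, k=2)
```

Brute-force search compares the query against every track, so its cost grows linearly with the library size; that is the motivation for approximate indexes on a dataset of this scale.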
Following the training phase, it is important to assess your music
recommendation model using different evaluation metrics. It's worth noting
that hyperparameter tuning for the model holds considerable importance, and
the parameters you select must be supported by your implementation.
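
Two common ranking metrics, precision@k and recall@k, can be sketched as below. The assignment does not prescribe particular metrics, so these are one reasonable choice among several; the track IDs and the set of relevant tracks are invented for illustration.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for track in recommended[:k] if track in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant tracks that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for track in recommended[:k] if track in relevant)
    return hits / len(relevant)


# Ranked model output and the tracks the user actually played (made up).
recommended = ["t1", "t7", "t3", "t9"]
relevant = {"t3", "t7", "t8"}

p_at_3 = precision_at_k(recommended, relevant, 3)
r_at_3 = recall_at_k(recommended, relevant, 3)
```

Computing the same metric on a held-out set for each hyperparameter setting gives a simple way to justify the values finally chosen.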
Phase #3: Deployment [30 Marks]
Upon completing the model training, your next task is to deploy it onto a web
application. However, the challenge lies in the fact that it's not just any web
application but a streaming service. Your objective is to develop an interactive
music streaming web application that incorporates the mentioned features.
Crafting a well-structured and user-friendly web interface carries substantial
weight. You have the freedom to utilise frameworks such as Flask or
Django for this purpose.
You'll leverage Apache Kafka to dynamically generate music
recommendations for users in real-time, using historical playback data to
inform future suggestions. All of this will seamlessly occur concurrently in the
background of the web application.
The web application must not include a form prompting users to upload audio
files. Instead, recommendations must be exclusively generated by Apache
Kafka in real-time, monitoring user activity to tailor suggestions accordingly.
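
The event-driven flow described above might look like the following sketch. It processes JSON-encoded playback events and suggests tracks from the user's most-played genre. The event fields (`user_id`, `track_id`, `genre`), the catalogue, and the genre-tally rule are all assumptions for illustration; the tally stands in for the trained model, and a plain list stands in for the Kafka topic so the code runs without a broker.

```python
import json
from collections import Counter, defaultdict


def update_play_counts(messages, play_counts):
    """Consume JSON playback events and tally plays per user and genre."""
    for raw in messages:
        event = json.loads(raw)
        play_counts[event["user_id"]][event["genre"]] += 1
    return play_counts


def recommend(user_id, play_counts, catalogue, n=2):
    """Suggest up to n tracks from the user's most-played genre."""
    genre_counts = play_counts.get(user_id)
    if not genre_counts:
        return []  # no history yet for this user
    top_genre, _ = genre_counts.most_common(1)[0]
    return catalogue.get(top_genre, [])[:n]


# Stand-in for message values read from a playback topic (made-up events).
events = [
    b'{"user_id": "u1", "track_id": 10, "genre": "Rock"}',
    b'{"user_id": "u1", "track_id": 11, "genre": "Rock"}',
    b'{"user_id": "u1", "track_id": 42, "genre": "Jazz"}',
    b'{"user_id": "u2", "track_id": 42, "genre": "Jazz"}',
]

# Hypothetical catalogue mapping each genre to candidate tracks.
catalogue = {
    "Rock": ["rock_1", "rock_2", "rock_3"],
    "Jazz": ["jazz_1"],
}

counts = update_play_counts(events, defaultdict(Counter))
suggestions = recommend("u1", counts, catalogue)
```

With a Kafka client, the `messages` argument would instead be the stream of message values read by a consumer running in a background thread or separate process, so that recommendations refresh as playback events arrive.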
