Zeta - Math Journal, Volume 10, No. 1, May 2025, pp. 70-80
E-ISSN: 2579-5864 P-ISSN: 2459-9948
https://doi.org/10.31102/zeta.
Music Artist Recommendation System Based on Listening History Using SVD and MICE Imputation Approaches
David Vijanarco Martal1, Mohamad Khoirun Najib1*
School of Data Science, Mathematics, and Informatics, IPB University, Indonesia
*Corresponding Author Email: mkhoirun@apps.
ABSTRACT
In the digital era, music streaming platforms face challenges in providing relevant music recommendations to users.
This research aims to develop a music artist recommendation system based on users' listening history using the SVD and MICE methods.
In this research, MICE was applied together with ALS as its predictive model.
SVD is used to identify latent patterns between users and artists, while MICE addresses the problem of missing data in listening history.
The data used comes from the online music platform Last.fm.
Analysis was carried out with Julia 1.5 software.
The results show that the model with MICE provides more accurate and consistent recommendations than SVD, especially in the context of missing data.
The MICE model achieves an accuracy of up to 96%, while the SVD model achieves an accuracy of 90.22%.
This approach can increase the relevance of recommendations, helping users find artists according to their preferences.
These findings support the application of MICE in music recommendation systems, with the potential to improve user experience on music streaming platforms.
Keywords: Recommendation system, SVD, MICE, missing data, music recommendations
Article info:
Submitted: November 20, 2024 Accepted: May 30, 2025
How to cite this article:
Martal, D. V., & Najib, M. K. (2025). Music Artist Recommendation System Based on Listening History Using SVD and MICE Imputation Approaches. Zeta - Math Journal, 10(1), 70-80. https://doi.org/10.31102/zeta.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-ShareAlike 4.0 International License.
INTRODUCTION
In the increasingly developing digital era, music streaming platforms have become an essential part of daily life.
These platforms allow users to access millions of songs from various genres and artists around the world.
Music possesses a transformative power that can shape and alter the character of those who listen to it (Panjaitan, 2.).
The main challenge for users is finding music that matches their personal preferences.
The combination of music and lyrics can influence listeners' emotions and even evoke deep feelings (Mihalcea & Strapparava, 2.).
Therefore, music recommendation systems are a key component in enhancing the user experience when listening to music.
A good recommendation system not only helps users discover new music that aligns with their interests but also increases user loyalty to the platform.
Young listeners spend an average of 25 hours per week streaming music (Roettgers, 2.).
According to Spotify, one of the largest music platforms, approximately 1.8 million new songs are uploaded each month, with a total of 11 million artists and music creators worldwide (Spotify, 2.).
The same applies to other music platforms as well.
There are several types of recommendation systems, and the most commonly used technique is collaborative filtering.
Collaborative filtering is a method that provides recommendations based on user similarity (Februariyanti et al., 2.).
One popular family of collaborative filtering algorithms in recommendation systems is the matrix-based model, such as Singular Value Decomposition (SVD), which can identify hidden patterns in user data and reduce the dimensionality of user-artist interaction data.
In previous studies, SVD has been applied in land mapping (Nagaputra et al., 2.), facial recognition (Septian, 2.), and document watermarking (Sinurat & Siagian, 2.).
In music recommendation, SVD maps users and artists into a latent factor space where the relationships between them can be calculated and analyzed more effectively.
To address the issue of missing data, the Multiple Imputation by Chained Equations (MICE) technique is used as an imputation method.
MICE fills in missing data with values predicted from other variables in the dataset, creating a more complete and consistent dataset for analysis.
In previous research, MICE has been applied to impute missing values in health data (Hedge et al., 2.) and in the development of event management strategies (Handayani et al., 2.).
Additionally, MICE has been compared with various other methods, such as regression imputation, in handling missing data in BKKBN survey datasets (Putri et al.).
A music recommendation system using MICE can produce more accurate recommendations even in the presence of incomplete listening history data.
This research aims to develop a music artist recommendation system based on users' listening history using the SVD and MICE approaches.
This approach is expected to provide more relevant recommendations and assist users in discovering artists that match their preferences, even when the listening history data is incomplete.
LITERATURE REVIEW
1 Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD) is one of the matrix factorization techniques frequently used in model-based recommendation systems, particularly for predicting user preferences for certain items (Zulaeka et al., 2.).
SVD works by decomposing or approximating a large matrix as a product of several smaller matrices, enabling the identification of latent patterns or hidden relationships between users and items.
In the context of music recommendation systems, SVD helps uncover latent relationships between users and artists based on listening history, thus allowing the system to recommend relevant artists to each user.
SVD factorizes the interaction matrix $R$ of size $m \times n$ into three component matrices (Heath, 2.).
These three matrices can be expressed as follows:
$$R \approx U \Sigma V^T$$
$U$: an orthogonal matrix of size $m \times m$, $V$: an orthogonal matrix of size $n \times n$, $\Sigma$: a diagonal matrix of size $m \times n$.
The matrix $\Sigma$ follows the rule:
$$\Sigma_{ij} = \begin{cases} \sigma_i, & i = j \\ 0, & i \neq j \end{cases}$$
The SVD approach utilizes this decomposition to project users and artists into a latent factor space, where the distance between users and artists in this space indicates the degree of preference matching.
In music recommendation systems, SVD enables the prediction of a user's preference for artists they have not yet listened to.
Once the decomposed matrices $U$, $\Sigma$, and $V^T$ are obtained, the system multiplies them to estimate the predicted interaction matrix $\hat{R}$, which represents the likelihood of a user's interest in each artist.
Thus, the artists with the highest predicted values in $\hat{R}$ for a specific user are recommended as relevant music choices.
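The prediction step described above can be illustrated with a minimal Python sketch using NumPy. The function names `truncated_svd_scores` and `recommend`, the toy play-count matrix, and the value of `k` are illustrative assumptions, not the study's Julia implementation.

```python
import numpy as np

def truncated_svd_scores(R, k):
    """Rank-k reconstruction of R from its k largest singular triplets."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def recommend(R, R_hat, user, top_n=10):
    """Column indices of the top_n highest-scored artists the user has not played."""
    unseen = np.flatnonzero(R[user] == 0)            # zeros mark unplayed artists
    order = np.argsort(-R_hat[user, unseen], kind="stable")
    return unseen[order[:top_n]]

# Hypothetical play-count matrix: 4 users x 5 artists.
R = np.array([
    [5.0, 3.0, 0.0, 1.0, 0.0],
    [4.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
])
R_hat = truncated_svd_scores(R, k=2)
recs = recommend(R, R_hat, user=1, top_n=2)
```

Only entries that were zero in the original matrix are ranked, so an artist the user has already played is never recommended again.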
2 Multiple Imputation by Chained Equations (MICE)
Multiple Imputation by Chained Equations (MICE) is a method that fills in missing data values iteratively using predictive models built from other variables in the dataset.
In this study, MICE is applied to impute missing values using the mean values within the user-artist listening history matrix.
As the predictive model in MICE, Alternating Least Squares (ALS) is used, a matrix factorization method that minimizes the error between actual and predicted values.
One previous application of ALS is in spectroscopic image analysis (Wang et al., 2.).
ALS has also been applied in recommendation systems, such as e-commerce recommendations (Gosh et al., 2.).
The use of MICE with ALS in this study aims to produce a more accurate recommendation system despite the presence of missing data, thus yielding more relevant artist recommendations for users.
In the MICE method, ALS serves as the predictive model that approximates the data matrix using two latent factor matrices that are optimized iteratively.
The main steps in MICE with ALS are initializing the missing values, factorizing the matrix using ALS, alternately optimizing the two factor matrices, and imputing the missing values.
Initialization of missing values is performed by filling in the missing entries in the user-artist matrix $R$ with temporary values, such as the average weight for each artist.
The matrix $R$ has dimensions $m \times n$, where $m$ is the number of users and $n$ is the number of artists, with each element $r_{ij}$ representing the interaction between user $i$ and artist $j$.
This can be expressed as:
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{bmatrix}$$
The matrix with imputed missing values is then factorized using the ALS model.
ALS decomposes $R$ into two latent factor matrices: the user matrix $P$ (of size $m \times k$) and the artist matrix $Q$ (of size $n \times k$), where $k$ is the number of latent factors (Kuroda et al., 2.).
The ALS approach aims to approximate the matrix $R$ as the product
$$R \approx P Q^T$$
This means each element is approximated by:
$$r_{ij} \approx p_i^T q_j$$
where $p_i$ is the preference vector of user $i$, and $q_j$ is the factor vector of artist $j$.
The factorized matrices are then optimized to minimize a specific objective function.
ALS works by alternately updating the matrices $P$ and $Q$ to minimize the squared error between the known values in $R$ and the predicted values in $\hat{R} = P Q^T$.
The objective function to be minimized is as follows:
$$\min_{P, Q} \sum_{(i,j) \in K} \left( r_{ij} - p_i^T q_j \right)^2 + \lambda \left( \lVert p_i \rVert^2 + \lVert q_j \rVert^2 \right)$$
where $K$ is the set of index pairs $(i, j)$ for which $r_{ij}$ is known, and $\lambda$ is a regularization parameter that prevents overfitting.
At each iteration, the missing values in $R$ are estimated using the predicted values from the product $P Q^T$.
Once all missing values are filled in, the algorithm repeats the optimization process to adjust the results based on the updated information until convergence or a predefined number of iterations is reached.
The final recommendations are generated from the matrix $R$ obtained at the end of the iteration process.
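The alternating updates can be sketched in Python with NumPy as follows. This is a simplified illustration under stated assumptions: zeros are treated as missing, the regularization term is counted once per known entry, the factors are fitted on the known entries only, and the missing entries are filled once at the end instead of being re-imputed at every iteration. The function name `als_complete`, the positive random start, and the toy rank-1 matrix are hypothetical.

```python
import numpy as np

def als_complete(R, k=2, lam=0.1, n_iter=50, seed=0):
    """Fit R ~ P @ Q.T on the known (non-zero) entries and fill the rest."""
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    known = R != 0                          # zeros mark missing interactions
    rng = np.random.default_rng(seed)
    P = 0.5 + rng.random((m, k))            # user factors, positive start
    Q = 0.5 + rng.random((n, k))            # artist factors, positive start
    for _ in range(n_iter):
        # Fix Q and solve a small ridge regression for each user vector p_i.
        for i in range(m):
            cols = known[i]
            if cols.any():
                Qi = Q[cols]
                A = Qi.T @ Qi + lam * cols.sum() * np.eye(k)
                P[i] = np.linalg.solve(A, Qi.T @ R[i, cols])
        # Fix P and solve the same problem for each artist vector q_j.
        for j in range(n):
            rows = known[:, j]
            if rows.any():
                Pj = P[rows]
                A = Pj.T @ Pj + lam * rows.sum() * np.eye(k)
                Q[j] = np.linalg.solve(A, Pj.T @ R[rows, j])
    R_filled = np.where(known, R, P @ Q.T)  # keep known values, impute the rest
    return R_filled, P, Q

# Toy rank-1 matrix with two entries removed (set to zero).
R_true = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
R_obs = R_true.copy()
R_obs[0, 2] = 0.0
R_obs[3, 0] = 0.0
# A small penalty is used here so the toy matrix is recovered closely.
R_filled, P, Q = als_complete(R_obs, k=1, lam=1e-3, n_iter=100)
```

Each inner update is a closed-form least-squares solve, which is why the method alternates between the two factor matrices instead of optimizing them jointly.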
RESEARCH METHODS
1 Tools and Methods
The tool used in this research is a personal computer (PC).
The PC is an ASUS X441M laptop with 4 GB of RAM and an Intel(R) Celeron(R) N4000 CPU @ 1.10 GHz processor.
The software used is Julia version 1.5, along with several packages available within the Julia ecosystem.
These packages include CSV, DataFrames, LinearAlgebra, Plots, Statistics, and BenchmarkTools.
2 Data
The data used in this research is user data from a music platform, containing social network information, tagging, and artist listening data from a group of users on the online music site Last.fm.
This dataset was released for the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)
, available at http://ir.
es/hetrec2011, held in conjunction with the 5th ACM Conference on Recommender Systems (RecSys 2011)
, accessible at http://recsys.
org/2011.
3 Research Stages
The research was conducted through the following steps:
1. Collecting music listener data from the online music site Last.fm.
2. Performing initial data exploration, including analysis of the mean, standard deviation, minimum, maximum, and total number of data points.
3. Sampling a subset of the entire dataset. This sampling was intended to improve computational efficiency in modeling recommendations for users of the Last.fm online music platform.
4. Constructing a user-artist matrix containing weights representing the number of times each artist was listened to by each user.
5. Identifying zero entries in the matrix, which represent the entries to be predicted and recommended for each user.
6. Modeling artist recommendations using the Singular Value Decomposition (SVD) method.
7. Modeling artist recommendations using the Multiple Imputation by Chained Equations (MICE) method with Alternating Least Squares (ALS) as the predictive model.
8. Comparing the recommendation results of the SVD model and the MICE method with ALS predictive modeling across different numbers of principal components.
9. Calculating the accuracy of the models by assessing the consistency of the recommendations provided to each user.
RESULTS AND DISCUSSION
Data Analysis
This research uses music listener data from the Last.fm music platform.
The platform contains 170623 artists, each with a profile and ID assigned by Last.fm.
The collected data consists of historical listening records from 2100 users on the Last.fm music platform.
The dataset includes 3 variables, each containing 92834 entries in the form of integers.
A summary of the data is presented in Table 1 below.
Table 1. Summary of music listener data on the Last.fm music platform
Variable | Mean | Minimum | Median | Maximum
(rows: User ID, Artist ID, Weight; cell values missing from the source)
From the data summarized in Table 1, a sample of the first 500 listeners out of the total 2,100 was taken.
Then, from all the artists ever listened to by these 500 users, the list was reduced to the first 1,510 artists out of all artists available on the Last.fm music platform.
A summary of the sampled data consisting of 500 users and 1510 selected artists is presented in Table 2 below.
Table 2. Summary of data for the first 500 music listeners
Variable | Mean | Minimum | Median | Maximum
(rows: User ID, Artist ID, Weight; cell values missing from the source)
From Table 2, the data has become simpler, and the maximum number of times a user listened to an artist has also decreased.
The data size is reduced to 13633 entries.
This filtered data is then formed into a matrix of size 500 × 1510, where the rows represent User IDs and the columns represent Artist IDs.
The matrix is further filtered by removing columns that contain only zeros.
The resulting matrix contains only artists that have been listened to by at least one of the sampled users on the Last.fm music platform.
The filtered matrix has dimensions of 500 × 1496.
This sampled data, organized in matrix form, is the dataset used for modeling in this study.
Some of this data is shown in Table 3 below.
Table 3. Some music listener data on the Last.fm platform
(matrix excerpt garbled in the source; the surviving weights are 102, 182, 288, 254, 222, 42, 203, 1001, and 11)
Table 3 above shows the filtered data.
Each row represents a User ID, and each column represents an Artist ID listened to by each user.
The values in the table indicate the weights corresponding to how many times each artist was listened to by each user on the Last.fm music platform.
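The construction of the user-artist matrix and the removal of all-zero columns can be sketched in Python with NumPy. The function names, the record layout (user ID, artist ID, weight), and the small ID lists are hypothetical and stand in for the sampled Last.fm data.

```python
import numpy as np

def build_user_artist_matrix(records, user_ids, artist_ids):
    """Dense play-count matrix with one row per user and one column per artist."""
    row = {u: i for i, u in enumerate(user_ids)}
    col = {a: j for j, a in enumerate(artist_ids)}
    R = np.zeros((len(user_ids), len(artist_ids)))
    for user, artist, weight in records:
        if user in row and artist in col:    # ignore records outside the sample
            R[row[user], col[artist]] = weight
    return R

def drop_unplayed_artists(R, artist_ids):
    """Remove columns that contain only zeros."""
    keep = R.any(axis=0)
    kept_ids = [a for a, flag in zip(artist_ids, keep) if flag]
    return R[:, keep], kept_ids

# Hypothetical (user ID, artist ID, weight) records.
records = [(2, 10, 5), (2, 11, 3), (3, 11, 4), (4, 13, 2), (9, 10, 7)]
user_ids = [2, 3, 4]            # sampled users
artist_ids = [10, 11, 12, 13]   # sampled artists
R = build_user_artist_matrix(records, user_ids, artist_ids)
R_filtered, kept_ids = drop_unplayed_artists(R, artist_ids)
```

In this toy case artist 12 is never played by a sampled user, so its column is dropped, mirroring the reduction from 1510 to 1496 columns described above.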
Recommendation Modeling with SVD
SVD modeling is performed by reconstructing the matrix from its decomposition $U \Sigma V^T$ and reducing the principal components until the error of the data matrix falls below a given threshold.
The user matrix from the Last.fm music platform is used in this modeling.
The modeling is carried out by decomposing the user matrix into three matrices: the orthogonal matrix $U$, the diagonal matrix of singular values $\Sigma$, and the transpose of the orthogonal matrix $V$.
The resulting decomposition matrices are then truncated by retaining only $k$ principal components.
This simplifies the user matrix by leaving only the principal components used for matrix reconstruction.
This approach reduces computational complexity while preserving the essential information needed to generate accurate recommendations.
Suppose the SVD modeling is performed with $k = 2$ principal components.
The matrices $U$, $\Sigma$, and $V$ formed in the first iteration are as follows.
(numerical entries of $U$, $\Sigma$, and $V$ garbled in the source)
Then, the matrices $U$, $\Sigma$, and $V$ are truncated based on the number of principal components to be retained.
In the example given, the matrices are truncated to keep 2 principal components.
The matrix $U$ becomes a 500 × 2 matrix, $\Sigma$ becomes a 2 × 2 diagonal matrix, and $V^T$ becomes a 2 × 1496 matrix, so that their product has the same 500 × 1496 shape as the user matrix.
The truncated matrices $U$, $\Sigma$, and $V$ are then multiplied to reconstruct the user matrix.
The resulting reconstructed matrix reflects the changes caused by truncating the principal components.
The reconstructed matrix for this first iteration is shown as follows.
(numerical entries of the reconstructed matrix garbled in the source)
The matrix reconstruction process is performed iteratively.
At each iteration, the missing entries in the user matrix, which are marked by zeros, are estimated using the truncated matrices $U_k$, $\Sigma_k$, and $V_k$ obtained from the previous step.
The reconstruction error is then calculated by considering the error only at the positions in the matrix that were originally zero.
In this study, the error of the matrix is measured using the mean absolute percentage error (MAPE).
The MAPE for this matrix can be calculated using the following equation.
$$\text{MAPE} = \frac{\operatorname{norm}(A_t - A_{t-1})}{\operatorname{norm}(A_t)}$$
where $A_t$ is the matrix at iteration $t$ for $t = 0, 1, 2, \ldots$ (McKenzie, 2.).
The iteration is continued until the resulting MAPE value falls below the threshold of 0.
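The iterative reconstruction loop can be sketched in Python with NumPy. This is an illustration only: the relative change between successive matrices, measured with the Frobenius norm, is an assumed stand-in for the study's MAPE, and the function name `iterative_svd_impute`, the tolerance, and the toy matrix are hypothetical.

```python
import numpy as np

def iterative_svd_impute(R, k=2, tol=1e-3, max_iter=500):
    """Repeatedly refill the originally-zero cells with a rank-k SVD estimate."""
    R = np.asarray(R, dtype=float)
    known = R != 0
    X = R.copy()                          # zeros stand in for missing entries
    errors = []
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        X_new = np.where(known, R, X_hat)  # only originally-zero cells change
        # Assumed stopping measure: relative change between iterations.
        err = np.linalg.norm(X_new - X) / np.linalg.norm(X_new)
        errors.append(err)
        X = X_new
        if err < tol:
            break
    return X, errors

# Hypothetical play-count matrix: 4 users x 5 artists.
R = np.array([
    [5.0, 3.0, 0.0, 1.0, 0.0],
    [4.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
])
X, errors = iterative_svd_impute(R, k=2, tol=1e-3, max_iter=500)
```

Because the known play counts are restored after every reconstruction, only the cells that were originally zero contribute to the change between iterations.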
In this research, various numbers of principal components ($k$) are used to evaluate the effectiveness of the recommendation modeling with SVD.
As an initial assumption, the retained principal components ($k$) are set to 2, 3, 4, and 5.
The SVD recommendation modeling with different $k$ values was implemented using Julia.
A summary of the modeling results can be seen in Table 4 below.
Table 4. Summary of data from modeling results with SVD
Principal component | Iteration | MAPE | Time
2 | | 0.00099809 | 384.849
3 | | 0.00099773 | 394.669
4 | | 0.00099979 | 383.179
5 | | 0.00099801 | 394.181
(iteration counts missing from the source)
From Table 4, it can be seen that $k = 5$ yields the smallest number of iterations.
This indicates that $k = 5$ produces the fastest modeling results compared to 2, 3, and 4 principal components, based on the number of iterations used.
The progression of the error values for different $k$ values can be seen in Figure 1.
Figure 1. The error values across iterations for principal components 2, 3, 4, and 5 (y-axis: MAPE; x-axis: iteration; series: $k = 2, 3, 4, 5$).
From Figure 1, the progression of MAPE values for each factor appears to be similar or nearly the same.
To provide a clearer view of the MAPE values as the iteration approaches convergence, see Figure 2 below.
Figure 2. The movement of the error values just before the iteration stops (y-axis: MAPE; x-axis: iteration; series: $k = 2, 3, 4, 5$).
Figure 2 shows how the iteration stops when the MAPE values for each principal component approach the predetermined tolerance threshold.
The model with 5 principal components reaches the tolerance threshold earlier compared to the other components.
Recommendations are made by sorting the matrix values according to each user ID.
The sorted values correspond to the entries in the matrix that were originally zero.
The column indices of these values represent the artist ID recommended for each user.
The top 10 recommended artists for user ID 2 using the SVD model with 2, 3, 4, and 5 principal components can be seen in Table 5 below.
Table 5. Artist recommendation results for ID 2
k = 2: Savage G, Leona L, Andre M, Massive A, Easybeats, Elton J, Dilated P, Jordin S, Suede, Aeoau uiiA
k = 3 and k = 4 (columns interleaved in the source): Savage G, God IAA, Blur, Pussycat D, Sunset R, Danity K, Massive A, All That R, Jordin S, Shakespear's S, Dilated P, Rachel S, Funeral FAF, Bob D, Rachel S, Suede, Bob D, Ke$ha, Jordin S
k = 5: Elton J, Danity K, Aeoau uiiA, Jordin S, Shakespear's S, Savage G, Miley C, Blackmore's N, Bob D
From Table 5, the top 10 recommended artists for user ID 2 with 2, 3, 4, and 5 principal components can be seen.
The recommendation results are not consistent with each other.
For example, when comparing 3 and 4 principal components, only 3 recommended artists overlap.
Therefore, further analysis is needed to determine the number of principal components that can provide consistent recommendation results.
The consistency of the model will be evaluated by comparing the recommendation results from two consecutive values of $k$.
Then, the accuracy percentage will be calculated using the following equation.
$$\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} c_i \times 100\%$$
where $c_i$ is the ratio of the number of matching recommendations to the total recommendations given for user ID $i$, and $n$ is the total number of user IDs, which is 500.
In this study, the value of $k$ that achieves an SVD model accuracy of $\geq 90\%$ will be sought.
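The consistency-based accuracy can be sketched in plain Python. The function name `consistency_accuracy` and the two small recommendation dictionaries are hypothetical; each dictionary maps a user ID to the list of artists recommended under one number of principal components.

```python
def consistency_accuracy(recs_a, recs_b):
    """Mean share of recommendations that two models agree on, in percent."""
    ratios = []
    for user, items_a in recs_a.items():
        matches = len(set(items_a) & set(recs_b[user]))
        ratios.append(matches / len(items_a))
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical top-4 lists for two users under two consecutive settings.
recs_k = {1: ["A", "B", "C", "D"], 2: ["A", "B", "C", "D"]}
recs_k_next = {1: ["A", "B", "C", "E"], 2: ["D", "C", "B", "A"]}
accuracy = consistency_accuracy(recs_k, recs_k_next)  # (0.75 + 1.0) / 2 -> 87.5
```

The comparison uses sets, so two lists that contain the same artists in a different order still count as fully matching.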
Using Julia 1.5, it was found that $k = 85$ provides consistent results with an accuracy of 90.22%.
The top 10 recommendations for user ID 2 using the SVD model with 85 and 86 principal components can be seen in Table 6 below.
Table 6. Artist recommendation results for ID 2 with 85 and 86 principal components
Principal components 85 | Principal components 86
Miles Davis | Miles Davis
Teta Lando | Teta Lando
Great Like Swimmer | Great Like Swimmer
Destiny's Child | Destiny's Child
VAST | VAST
Rock Star Supernova | Simian Mobile Disco
Simian Mobile Disco | José González
José González | Rock Star Supernova
Westlife | Westlife
Groove Coverage | Alicia Keys
From Table 6, the SVD recommendation model with $k = 85$ provides results consistent with those of $k = 86$, with 9 matching recommendations.
The simulation results using $k = 85$ can be seen in Table 7 below.
Table 7. SVD modeling results with 85 principal components
Principal component | Iteration | Time | MAPE
(cell values missing from the source)
From Table 7, the SVD model with $k = 85$ requires a considerable amount of time.
However, the results produced with 85 principal components are consistent when compared to those with 86 components, achieving an accuracy of 90.22%.
If the number of principal components is further increased, the accuracy will also improve, but not significantly.
Recommendation Modeling with MICE
Modeling with MICE is carried out by reconstructing the matrix as the product $P Q^T$ and performing ALS optimization of the regularized objective function.
This iteration continues until the changes in the matrices $P$ and $Q$ between iterations fall below a certain threshold.
In this study, the threshold value used is 0.001, the regularization parameter $\lambda$ is 0.1, and the imputation method applied is mean imputation.
Imputation is used to modify the user matrix so that the parts of the matrix with a value of 0 are assigned a certain value.
In this study, imputation is done by calculating the mean of how many times each artist was listened to by each user.
Users who have a weight of 0 for an artist are not included in the mean calculation.
As a result, the mean obtained represents the average number of times each artist was listened to, excluding users who did not listen to that artist.
With this mean imputation, all imputed entries within a column of the user matrix share the same value.
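The mean imputation described above can be sketched in Python with NumPy. The function name `impute_artist_means` and the toy matrix are hypothetical; zeros are treated as missing, and a column with no listeners is left at zero.

```python
import numpy as np

def impute_artist_means(R):
    """Replace zeros in each column with the mean of that column's non-zero values."""
    R = np.asarray(R, dtype=float)
    known = R != 0
    counts = known.sum(axis=0)            # number of listeners per artist
    totals = R.sum(axis=0)                # zeros add nothing to the sum
    means = np.divide(totals, counts, out=np.zeros_like(totals), where=counts > 0)
    return np.where(known, R, means)      # known values are kept unchanged

# Hypothetical 3 users x 3 artists matrix; the last artist has no listeners.
R = np.array([[2.0, 0.0, 0.0],
              [4.0, 6.0, 0.0],
              [0.0, 0.0, 0.0]])
R_imputed = impute_artist_means(R)
```

In this toy case the first column mean is 3 (from 2 and 4) and the second is 6, so every missing cell in a column receives the same value.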
The user matrix after imputation using the mean can be seen in the following matrix.
The results of the recommendation modeling using MICE with ALS as the predictive model were obtained using Julia 1.5.
The numbers of latent components used in this model are 2, 3, 4, and 5.
A summary of the modeling results can be seen in Table 8 below.
Table 8. Summary of data from modeling results with MICE using the ALS predictive model
Principal component | Iteration | MAPE
(cell values missing from the source)
From Table 8, the modeling process produces a model below the threshold, with MAPE reaching 0% in just 3 iterations.
This applies to all four values of latent components used.
The top 10 recommended artists for user ID 2 using the MICE model with ALS as the predictive model and 2, 3, 4, and 5 latent components can be seen in Table 9 below.
Table 9. Artist recommendation results for ID 2 with MICE model
k = 2 | k = 3 | k = 4 | k = 5
Funeral FAA | Elton J | Elton J | Elton J
Elton J | Funeral FAA | Funeral FAA | Funeral FAA
Easybeats | Easybeats | Easybeats | Easybeats
Blur | Andre M | Blur | Leona L
Andre M | Pleq & Segue | Pleq & Segue | Pleq & Segue
Pleq & Segue | Dilated P | Andre M | Dilated P
Dilated P | Swedem | Dilated P | Andre M
Swedem | Danity K | Swedem | Swedem
Leona L | Bullet FMV | Leona L | Blur
Danity K | Blur | Danity K | Bullet FMV
From Table 9, the different numbers of latent components produce consistent recommendation results.
For example, with 2 and 3 latent components, it is evident that for user ID 2, these components produce consistent recommendations with 9 out of 10 recommendations matching.
Next, the model accuracy is calculated using the accuracy equation described previously.
This model accuracy reflects how consistently the model provides recommendations for each number of latent components.
The accuracy results comparing latent components of 2, 3, 4, and 5 can be seen in Table 10 below.
Table 10. Accuracy of MICE model using ALS prediction model
(4 × 4 comparison of k = 2, 3, 4, 5 against k = 2, 3, 4, 5; cell values missing from the source)
From Table 10, it can be concluded that the MICE model using ALS as the predictive model provides the most consistent results when using $k = 2$.
This model achieves an accuracy of up to 96%.
CONCLUSION
This research demonstrates that both the SVD and MICE models achieve good accuracy in modeling a recommendation system.
Modeling with SVD gives an accuracy of 90.22%, while modeling with MICE gives an accuracy of 96%.
The MICE model is therefore more accurate and more consistent than the SVD model.
This indicates that the MICE model can generate more accurate predictions than the SVD model without additional imputation processes, particularly in matrices with significant missing data.
It reinforces the notion that MICE can effectively address missing data issues, thereby allowing the recommendation model to provide more relevant and personalized recommendations for users.
Based on these results, it can be concluded that the MICE approach is able to reduce the impact of missing data on recommendation accuracy, making the resulting recommendation model more relevant.
These findings support the potential application of both SVD and MICE methods in various music recommendation system scenarios that face missing data challenges, with broader implications for enhancing user experience in accessing content tailored to their preferences.
REFERENCES
Februariyanti, Laksono, Wibowo, & Utomo. Implementasi Metode Collaborative Filtering untuk Sistem Rekomendasi Penjualan pada Toko Mebel. Jurnal Khatulistiwa Informatika, 9.
Gosh, Nahar, Wahab, Biswas, Hossain, & Andersson. Recommendation