Responsible Artificial Intelligence for
Music Recommendation
Sudi Murindanyi, Audrey Nakate, Moses Ntanda Kyebambe, Rose Nakibuule,
and Ggaliwango Marvin
Abstract The widespread adoption of music streaming platforms has necessitated the development of accurate music genre classification systems. These systems, which are often based on machine learning (ML) and artificial intelligence (AI) techniques, have raised ethical and social concerns. As such, implementing responsible AI practices has become essential in developing music recommendation systems. This chapter presents a responsible AI-based music recommendation approach that employs classical ML models, neural networks, and deep neural networks to accurately classify music genres while adhering to ethical principles and ensuring accountability. Feature retrieval techniques were utilized to extract relevant information from music data, and model performance was evaluated using a range of metrics. The deep neural network model demonstrated superior performance, achieving 93% accuracy on both the training and test datasets, a micro-average Receiver Operating Characteristic (ROC) curve area of 0.99, and a mean accuracy of 91.4% with a 95% confidence interval of 89.9–92.9%. Using Explain Like I'm 5 (ELI5), permutation importance was utilized to pinpoint the dataset's most significant features, and these features were subsequently used to retrain the models. Finally, the SHapley Additive exPlanations (SHAP) technique was employed to provide interpretability of the model predictions. The chapter concludes that the developed responsible AI-based music recommendation system can offer personalized recommendations to users while minimizing potential risks and ensuring accountability through transparency and explainability.
S. Murindanyi · A. Nakate · M. N. Kyebambe · R. Nakibuule · G. Marvin (B)
Department of Computer Science, CoCIS, Makerere University, Kampala, Uganda
e-mail: ggaliwango.marvin@mak.ac.ug
S. Murindanyi
e-mail: sudi.murindanyi@students.mak.ac.ug
A. Nakate
e-mail: audrey.nakate@students.mak.ac.ug
M. N. Kyebambe
e-mail: mntanda@cis.mak.ac.ug
R. Nakibuule
e-mail: rnakibuule@cis.mak.ac.ug
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. J. Nanda et al. (eds.), Data Science and Applications, Lecture Notes in Networks
and Systems 818, https://doi.org/10.1007/978-981-99-7862-5_22
Keywords Artificial intelligence (AI) · Responsible AI · Intelligent systems · Music recommendation
1 Introduction
Music genre classification has become increasingly important with the rise of music streaming platforms. It involves categorizing music into groups based on its musical and cultural characteristics, which can be used to develop a recommendation system based on the categories of music that users listen to [11]. In addition, the field of music recommendation systems has changed due to AI and ML, enabling developers to create algorithms that can suggest personalized playlists based on individual user preferences. However, there are worries about the ethical and social ramifications of using AI in music recommendation systems [17]. As a result, responsible AI has emerged as a critical concept in recent years, aiming to ensure that AI is developed and deployed in ethical, transparent, and accountable ways [24, 34]. This study aims to apply responsible AI to music recommendation systems, specifically focusing on classifying music into its respective genres using machine learning techniques.
Feature retrieval in music refers to extracting and representing musical features from a piece of music for analysis and categorization [21]. These extracted features can describe the music for different purposes, such as genre classification, categorization, and recommendation. Therefore, this study discusses how these features are used to categorize music, evaluating the performance of the models using various evaluation metrics such as confidence intervals, the area under the Receiver Operating Characteristic (ROC) curve, the F1-score, accuracy, recall, and precision. The ultimate goal of this project is a responsible AI-based music recommendation system that can provide personalized recommendations to users while minimizing the risk of harm and ensuring transparency and accountability [23, 24]. We explore classical ML models, neural networks, and deep neural networks to classify music genres accurately. Furthermore, we utilize explainable AI techniques like SHAP and ELI5 to provide interpretable explanations for the decisions made by the models. Additionally, we use grid search to find optimal hyperparameters that can significantly improve model performance. By exploring these techniques, we aim to develop a responsible AI-based music recommendation system that provides users with personalized recommendations while minimizing potential risks and ensuring accountability.
The remaining sections of this chapter are structured as follows: Sect. 2 covers the project's background and motivations, exploring the use of AI in music recommendation systems, ethical issues, the importance of categorizing music genres, and the use of musical features for classification. The literature review in Sect. 3 discusses various approaches to classifying music genres in related work, points out gaps in the literature, and emphasizes the importance of this study. Sect. 4 details the methodology used to complete the task, and Sect. 5 presents the results of our experiment. The study is concluded in Sect. 6, and future work is covered in Sect. 7.
2 Background and Motivation
Music streaming platforms have become increasingly popular among music lovers due to their convenience [15]. However, with a vast collection of music available on these platforms, it can be challenging for users to navigate and find music they like. Music recommendation systems have emerged as a solution to this issue, with ML and AI methods suggesting customized playlists to individuals according to their listening patterns and preferences [3]. Nevertheless, using AI in music recommendation systems raises ethical and social questions [32]. For instance, the recommendations made by these systems can influence users' music choices and preferences, leading to potential issues such as filter bubbles and privacy concerns [25]. As such, there is a need for responsible AI-based music recommendation systems that ensure the ethical, transparent, and accountable deployment of these systems.
Music genre classification is an essential component of music recommendation systems, as it involves categorizing music into groups based on its musical and cultural characteristics [6]. This classification is essential for organizing the various genres of music that are available, making it simpler for listeners to discover new music and for scholars interested in understanding the relationships between distinct musical genres and the fundamental structure of musical pieces [30]. Previously, approaches for classifying music genres relied on manual annotation and statistical algorithms [29]. However, due to their capacity to recognize patterns in data, ML techniques have recently gained increasing traction in this task.
Music features are specific characteristics of a piece of music that can be used to describe its content [35]. The need to study music features comes from a desire to understand the underlying structure of different pieces of music and to support the advancement of music information retrieval technology [8]. With the growth of digital music and large music datasets, machine learning algorithms have been applied to music information retrieval tasks, since they offer more accurate and efficient data analysis methods [33]. Feature retrieval is a crucial element in music genre classification and involves extracting relevant features from a piece of music and using them to group it [20]. In the past, features such as rhythm and timbre were standard features used to categorize music [16]. Recently, machine learning algorithms have been developed for this process that are highly accurate, efficient, and easily adapted to new and changing musical styles.
This study applies responsible AI principles to music recommendation systems, employing ML methods specifically to classify music genres. By exploring various models and evaluation metrics, we aim to develop a responsible AI-based music recommendation system that can provide personalized recommendations to users while minimizing potential risks and ensuring transparency and accountability. The goal is to contribute to the development of reliable AI applications in the music industry and to promote ethical and accountability standards for AI in society. The following section reviews the techniques used in prior work on music genre classification algorithms and models.
3 Literature Review
Over the years, researchers have implemented and proposed a variety of music classification methods, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), which are discussed in this section. Among those using SVM for classification, Kobayashi et al. [19] suggested using sub-band signal correlations as a feature-extraction method for audio files, showing how an audio signal can be converted into numerical representations of musical features. Nirmal et al. [27] and Ford et al. [12] used spectrograms to obtain visual representations of music signals as input to a Convolutional Neural Network (CNN) classifier for recognizing music genres; the networks are trained to recognize what the spectrogram of a song in a specific genre would most likely look like. Chen et al. [7] proposed the Active Transfer Music Genre Classification Method (ATMGCM), a technique that combines transfer learning with active learning and was found to achieve higher accuracy than SVM and Random Forest. Liang et al. [22] showed the effectiveness of transfer learning, which consists of an initial task and a target task, both of which can be performed by a neural network customized for the dataset. Kikuchi et al. [18] expanded the music summarization technique, in which multiple sections of music data are summarized and the most prevalent features are assigned to the related genre. Pelchat et al. [28] utilized neural networks, machine learning techniques for extracting features from datasets: short-time segments of audio songs are passed through a spectrogram to obtain spectrogram images, which are then used as inputs to the networks. Fulzele et al. [13] used a hybrid LSTM and SVM method, implemented by first training the two classifiers separately and then combining them by summing the posterior probabilities obtained from each model for each predicted genre, where the posterior probability is the probability that a sample belongs to a given genre. Das et al. [9] tweaked existing architectures such as CNN, ResNet50, VGG19, and VGG16 to develop an enhanced version of the spectrographic audio representation, feeding these architectures a double layer of neural networks and multiple images per audio file. Aguiar et al. [2] expanded on data augmentation techniques that can improve the use of convolutional networks in music genre classification, including noise addition, generating spectrograms of music at different loudness ranges, time stretching (increasing the length of the audio passed through the spectrogram), and pitch shifting, among other techniques. Duggirala et al. [10] explored using Natural Language Processing to classify music: starting from the tempo of the audio track, they gauged the average sentence length used by an artist and could then calculate the number of sentences and notes in a track. Ginsel et al. [14] analyzed features usable for music recognition, such as the number of chords played every ten frames, harmony, timbre, and tempo and rhythm, and studied how different combinations of features contribute to the classification of songs. Bassiou et al. [5] explored using lyrics and audio to classify Greek folk music; when processing the lyrics, punctuation marks, special characters, numbers, and other non-letter content were deleted, and frequently occurring terms in Greek folk songs were extracted from the remaining lyrics and considered when training the classifier. Sharma et al. [31] built a hybrid model based on SVM, RVM, and an ensemble classifier, extracting features such as Mel Frequency Cepstral Coefficients (MFCC), statistical spectrogram descriptors, the spectral centroid, spectral roll-off, and the zero-crossing rate from the audio; they trained two models separately and then added them to the remaining model to improve accuracy, and the hybrid achieved the highest accuracy compared with each model used individually. Al-Tamimi et al. [4] explored feature selection for music genre classification; in their study using Artificial Neural Networks and Support Vector Machines, they conclude that using many features may lower model accuracy, which can be seen as an advantage since less processing time is needed when fewer features are used.
3.1 Identifying Research Gaps in the Literature
1. Lack of implementation of responsible AI models: While responsible AI models have been proposed to address the ethical and social implications of classification systems, little of the literature actually applies them.
2. Limited use of evaluation metrics: The literature on music genre classification often lacks proper evaluation metrics, such as confidence intervals, for assessing how well classification models are doing. This makes it challenging to compare the effectiveness of various models and may produce skewed or incorrect results.
3. Inadequate data usage and limited feature selection: Imbalanced datasets in music genre classification can result in models that perform poorly on underrepresented genres. Moreover, relying on only a few features to differentiate between music pieces can lead to unsatisfactory genre classification results.
3.2 Contributions of This Paper
1. We employ Explain Like I'm 5 (ELI5) and SHAP as Explainable AI techniques to ensure that our music genre classification model is accountable and responsible.
2. We utilized various evaluation metrics to thoroughly assess the models, including accuracy, recall, and F1-score, and additionally incorporated confidence intervals and AUC for a more comprehensive evaluation.
3. The paper's techniques utilize a balanced dataset to avoid data-imbalance issues and genre bias in algorithm development. In addition, the dataset's multiple features provide flexibility for selecting impactful music classification features.
4 Methodology
This study aims to create a responsible intelligent music recommendation system that can accurately classify different genres of music. To achieve this, a meticulous methodology was employed. Initially, a comprehensive and diverse dataset, the GTZAN dataset, was procured from Kaggle, an online resource dedicated to research. Next, a thorough Exploratory Data Analysis (EDA) was conducted to gain insights into and understand the dataset, greatly aiding the model's development. The dataset was then split into training, testing, and validation sets. Finally, various classical machine learning models, a neural network model, and a deep neural network model were developed.
Additionally, the hyperparameters were optimized using grid search to attain superior results. The primary aim was to identify the best-performing model using accuracy, precision, F1-score, recall, AUC, and confidence-interval metrics. Furthermore, permutation importance via eli5 was utilized to identify the most impactful features in the dataset, and these features were employed to retrain the models [23]. Finally, the SHAP technique was employed to provide interpretability of the model predictions [26]. Figure 1 shows the step-by-step process followed in our study.
Fig. 1 Methodology overview
Fig. 2 Records per music genre
4.1 Data Acquisition and EDA
4.1.1 Data
The GTZAN dataset [1], comprising ten genres with 100 audio recordings each, each lasting 30 s, was utilized in this study. First, the audio recordings were converted to Mel Spectrograms, and features such as Mel Frequency Cepstral Coefficients (MFCC), roll-off, and spectral bandwidth, to mention just a few, were extracted and stored in CSV files, recording the mean and variance of each feature over every 30-second song. Then, to increase the amount of data fed into the classification models, each audio file was split into ten 3-second clips whose features were saved in a CSV file with the same format as the original. The dataset's genres include blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock. There was no class imbalance upon importing the dataset into the workspace, as shown in Fig. 2. Furthermore, all music genres contained approximately 1000 data points, and the dataset had no missing values or duplicates.
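The feature-extraction step can be sketched as follows with librosa; this is a minimal illustration, and the exact feature set, sampling rate, and file handling are assumptions rather than the authors' exact pipeline.

```python
# Minimal sketch of per-segment feature extraction, assuming librosa.
# The feature names mirror the GTZAN CSV columns; paths are illustrative.
import numpy as np
import librosa

def extract_features(path, offset=0.0, duration=3.0, sr=22050):
    """Load one segment and return the mean and variance of several features."""
    y, sr = librosa.load(path, sr=sr, offset=offset, duration=duration)
    feats = {
        "chroma_stft": librosa.feature.chroma_stft(y=y, sr=sr),
        "rms": librosa.feature.rms(y=y),
        "spectral_centroid": librosa.feature.spectral_centroid(y=y, sr=sr),
        "spectral_bandwidth": librosa.feature.spectral_bandwidth(y=y, sr=sr),
        "rolloff": librosa.feature.spectral_rolloff(y=y, sr=sr),
        "zero_crossing_rate": librosa.feature.zero_crossing_rate(y),
    }
    row = {}
    for name, values in feats.items():
        row[f"{name}_mean"] = float(np.mean(values))
        row[f"{name}_var"] = float(np.var(values))
    # 20 MFCCs, each summarized by its mean and variance
    for i, coeff in enumerate(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20), 1):
        row[f"mfcc{i}_mean"] = float(np.mean(coeff))
        row[f"mfcc{i}_var"] = float(np.var(coeff))
    return row

# Ten 3-second segments per 30-second GTZAN clip
rows = [extract_features("blues.00000.wav", offset=3.0 * k) for k in range(10)]
```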
4.1.2 EDA
Exploratory Data Analysis (EDA) is a crucial step in creating prediction models, as it helps in understanding the dataset and identifying trends, patterns, and correlations between variables. In this study, three essential components of EDA were conducted: Data Assessment, Data Cleaning, and Data Visualization. Data Assessment was used to evaluate the dataset's quality and completeness. During Data Cleaning, several actions were taken, including scaling the data features so that features with larger values would not dominate those with smaller values in the algorithm's predictions, thus enhancing the models' performance. Finally, we employed data visualization techniques to produce graphical representations of the data, which provided insight into the results. For example, we gained valuable insight into the beats per minute (BPM) of various music genres. BPM is a crucial indicator of a musical piece's tempo or speed, representing the number of beats that occur in one minute of music. The box plot in Fig. 3 shows that most music genres have a similar BPM range; disco music, however, shows slight variations, with numerous outlier points.
Fig. 3 Beats per minute feature description for every music genre
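As a concrete illustration of the scaling step described above, here is a minimal sketch assuming scikit-learn; the chapter does not name the scaler or the exact CSV columns, so MinMaxScaler and the column names are assumptions.

```python
# Hypothetical realization of the feature-scaling step with scikit-learn.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("features_3_sec.csv")                # assumed GTZAN CSV name
X = df.drop(columns=["filename", "length", "label"])  # keep numeric features only
y = df["label"]

# Scale every feature to [0, 1] so large-valued features (e.g., rolloff_mean)
# do not dominate small-valued ones (e.g., zero_crossing_rate_mean).
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
# In practice the scaler should be fit on the training split only and then
# applied to the validation and test splits to avoid information leakage.
```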
4.2 Model Development and Evaluation
4.2.1 AI Models
After conducting Exploratory Data Analysis, we proceeded to preprocess our dataset, which consisted of approximately 9990 records. First, we split the data into three sets: 70% for training (6993 records), 20% for validation (1978 records), and 10% for testing (1019 records). Next, we developed models using various techniques. Initially, we used the SVM, DT, LR, and RF traditional machine learning models, all from sklearn with default hyperparameters. Additionally, we employed a Multi-layer Perceptron classifier from sklearn with default hyperparameters and a deep neural network (DNN) with four layers and a 20% dropout rate after each layer.
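A sketch of such a DNN, assuming Keras, is shown below; the chapter specifies four layers with 20% dropout after each, but the layer widths here are assumptions (57 input features and 10 genre classes match the dataset).

```python
# Sketch of the described DNN: four hidden layers, 20% dropout after each.
# Layer widths are assumptions; labels are assumed to be integer-encoded.
from tensorflow.keras import layers, models

def build_dnn(n_features=57, n_classes=10):
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_dnn()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)
```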
After completing the initial model development, we applied grid search to identify optimal hyperparameters for all four classical ML models and the two neural network models. This step aimed to obtain hyperparameters that could produce superior models compared to those using default hyperparameters. For instance, when using logistic regression, we explored various hyperparameters such as the solver (Newton-CG, LBFGS, and Liblinear), the penalty (l1 and l2), and the C values (100, 10, 1, 0.1, and 0.01). The grid search process helped us determine the best combination of hyperparameters for logistic regression, which in this case was the LBFGS solver, an l2 penalty, and a C value of 100. We applied similar methods for the other models in our analysis.
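A minimal sketch of this search for logistic regression, assuming scikit-learn's GridSearchCV and the training split from above (variable names are illustrative):

```python
# Grid search over the logistic-regression hyperparameters listed above.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "solver": ["newton-cg", "lbfgs", "liblinear"],
    "penalty": ["l2"],               # l1 is only compatible with liblinear
    "C": [100, 10, 1, 0.1, 0.01],
}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                    scoring="accuracy", cv=5, n_jobs=-1)
grid.fit(X_train, y_train)           # X_train, y_train: the 70% training split
print(grid.best_params_)             # e.g., {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}
```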
4.2.2 Evaluation
We utilized a variety of criteria to evaluate the performance of our models, including F1-score, precision, recall, and accuracy. Additionally, we employed AUC and confidence intervals as further measures of model performance; a minimal code sketch computing these metrics is given after the list below.
1. Accuracy: The metric evaluates the ratio of correctly identified positive and neg-
ative instances to the total number of samples.
2. Precision: The metric assesses the proportion of correctly predicted positive
instances relative to the total number of instances predicted to be positive.
3. Recall: The metric quantifies the fraction of actual positive instances that are
correctly identified among all truly positive instances.
4. F1-score: The metric combines precision and recall in a balanced way using a
weighted harmonic mean.
5. AUC: The metric evaluates the area under the ROC plot, illustrating the trade-off
between the true positive rate and the false positive rate across various threshold
values.
6. Confidence intervals: The metric assesses the range of values within which a population parameter is expected to lie with a certain confidence level, thereby quantifying the precision and reliability of the estimated parameter.
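The sketch below computes the first five of these metrics with scikit-learn, assuming a fitted classifier and the held-out test split; variable names are illustrative.

```python
# Computing the listed metrics for a fitted scikit-learn classifier.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = model.predict(X_test)          # hard class labels
y_prob = model.predict_proba(X_test)    # per-class probabilities for AUC

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="weighted"))
print("recall   :", recall_score(y_test, y_pred, average="weighted"))
print("f1-score :", f1_score(y_test, y_pred, average="weighted"))
# One-vs-rest AUC averaged over the ten genre classes
print("auc      :", roc_auc_score(y_test, y_prob, multi_class="ovr"))
```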
4.3 Feature Importance and Model Retraining
In order to further improve our models, we conducted a feature importance analysis.
Our dataset initially contained a total of 57 features, and we used the permutation
importance method via the eli5 package to rank their importance in descending order.
Based on these results, we decided to focus on the top 30 features and ignore or drop
the remaining 27. Table 2 displays the permutation importance algorithm’s results,
which reveal each feature’s relative importance in driving the model predictions. By
reducing the feature space in this way, we were able to build more accurate models
while also gaining valuable insights into the key drivers of our models’ predictions.
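A minimal sketch of this ranking and retraining step, assuming eli5's PermutationImportance wrapped around a fitted scikit-learn estimator and the validation split from Sect. 4.2.1 (variable names are illustrative):

```python
# Rank features by permutation importance on the validation split.
import numpy as np
import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(model, random_state=42).fit(X_val, y_val)
eli5.show_weights(perm, feature_names=list(X_val.columns), top=30)  # notebook view

# Keep the 30 highest-ranked features, drop the remaining 27, and retrain.
top30 = np.argsort(perm.feature_importances_)[::-1][:30]
X_train_top, X_val_top = X_train.iloc[:, top30], X_val.iloc[:, top30]
model.fit(X_train_top, y_train)
```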
4.4 Explainable AI
After achieving satisfactory results with our music genre classification models, we delved deeper into understanding the relative importance of each feature in determining the genre label of a given track by employing SHapley Additive exPlanations (SHAP). This approach makes it possible to calculate the impact of individual features on the model's prediction for a given instance, analyze the SHAP values of different features, and identify which audio features had the greatest impact on the model's genre predictions. Additionally, SHAP gave us insights into the interactions between different features, which was crucial in understanding how different audio features worked together to influence the genre classification.
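One way to realize this analysis, assuming the shap package and a scikit-learn-style classifier exposing predict_proba, is sketched below; KernelExplainer is a model-agnostic choice (the chapter does not state which explainer was used), and genre_names is an illustrative variable.

```python
# Model-agnostic SHAP analysis of the genre classifier (a sketch).
import shap

background = shap.sample(X_train_top, 100)       # small background sample
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X_test_top.iloc[:50])  # a subset, for speed

# Global view: which audio features drive each genre's predictions
shap.summary_plot(shap_values, X_test_top.iloc[:50], class_names=genre_names)
```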
5 Results
The initial round of training yielded promising results across all models. However, some models, such as the Decision Tree, overfitted the dataset, as illustrated by the significant disparity between its high training accuracy of 99% and its low test accuracy of 63%. As the primary objective is to create models that perform well on new data, the DNN model outperformed the others, achieving 93% accuracy on the training samples and 92% accuracy on the test samples. The MLP classifier was not far behind, with 99% accuracy on the training data and 87% accuracy on the test data, although these values suggest some degree of overfitting. Table 1 shows the initial training results.
As outlined in the methodology section, we utilized permutation importance from ELI5 to establish the relative importance of features, with the rankings presented in Table 2. The table lists the features in descending order of importance, with mfcc1_mean occupying the highest rank in the music classification hierarchy, followed by spectral_centroid_mean and subsequent features accordingly.
The second round of training utilized the outcomes from ELI5, wherein only the top 30 most significant features were employed for training. The results of this training are presented in Table 3. It is well-established that reducing the number
Table 1 Model performances for the first experiment
Model           Train accuracy  Test accuracy  Test precision  Test recall  Test F1-score
LR              0.743           0.715          0.712           0.715        0.712
DT              0.999           0.633          0.633           0.633        0.633
RF              0.999           0.853          0.859           0.858        0.858
SVM             0.923           0.846          0.846           0.846        0.846
MLP classifier  0.999           0.877          0.878           0.877        0.877
DNN             0.93            0.92           0.92            0.92         0.92
Table 2 Feature importances using permutation importance from the eli5 library
Weight           Feature
0.1841 ± 0.0085  mfcc1_mean
0.1718 ± 0.0217  spectral_centroid_mean
0.1472 ± 0.0076  rms_mean
0.1362 ± 0.0106  roll-off_mean
0.1352 ± 0.0092  spectral_bandwidth_mean
0.1317 ± 0.0039  zero_crossing_rate_mean
0.0943 ± 0.0170  perceptr_var
0.0840 ± 0.0044  mfcc2_mean
0.0796 ± 0.0139  chroma_stft_mean
0.0793 ± 0.0061  mfcc3_mean
0.0652 ± 0.0102  mfcc4_mean
0.0563 ± 0.0125  mfcc9_mean
0.0533 ± 0.0100  harmony_var
0.0444 ± 0.0040  mfcc6_mean
0.0440 ± 0.0085  spectral_centroid_var
0.0320 ± 0.0110  rms_var
0.0312 ± 0.0100  mfcc5_mean
0.0299 ± 0.0059  mfcc8_mean
0.0273 ± 0.0066  mfcc17_mean
0.0262 ± 0.0087  mfcc11_mean
… (37 more features)
Table 3 Model performances for the second experiment
Model           Train accuracy  Test accuracy  Test precision  Test recall  Test F1-score
LR              0.692           0.690          0.685           0.690        0.686
DT              0.999           0.674          0.676           0.674        0.675
RF              0.999           0.868          0.869           0.863        0.867
SVM             0.999           0.905          0.906           0.905        0.905
MLP classifier  0.999           0.866          0.866           0.866        0.865
DNN             0.93            0.93           0.93            0.93         0.93
of features in machine learning does not always guarantee better outcomes, since it essentially entails reducing the amount of information. Nevertheless, it is also known that important features can offer valuable insights that enable the model to better understand the underlying patterns and relationships in the data. In this case, logistic regression did not benefit from feature reduction; however, the other models slightly improved their performance, with the DNN achieving an accuracy of 93% in both training and testing. Additionally, Fig. 4 shows the micro-average ROC curve with an AUC score of 0.99, indicating outstanding model performance in discriminating between positive and negative instances across all classes. This observation implies that the model exhibits a low rate of false positives relative to true positives, thereby serving as a reliable and informative measure of the model's overall performance. As illustrated in Fig. 5, the confusion matrix generated using the test data for the DNN model provides clear evidence of its strong performance. In addition, the results indicate that the model could classify most of the data accurately.
Fig. 4 ROC curve for DNN model
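A micro-average ROC of the kind shown in Fig. 4 can be computed as in the sketch below, assuming scikit-learn and per-class probability predictions; variable names are illustrative.

```python
# Micro-average ROC/AUC: binarize labels so all ten genres pool into one curve.
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

classes = np.unique(y_test)
y_bin = label_binarize(y_test, classes=classes)   # shape: (n_samples, 10)
y_prob = model.predict_proba(X_test)              # per-class probabilities

fpr, tpr, _ = roc_curve(y_bin.ravel(), y_prob.ravel())
print("micro-average AUC:", auc(fpr, tpr))        # ~0.99 in the chapter
```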
As shown in Fig. 6, our DNN model had a mean accuracy of 0.914, meaning that it correctly predicted the outcome 91.4% of the time. The baseline accuracy, which represents the accuracy of a simple, naive model that predicts the most common outcome for every instance, was slightly lower at 0.903. The confidence interval for the model accuracy was found to be between 0.899 and 0.929 at a 95% level of confidence, indicating that we can be quite confident that the true accuracy of the model lies within this range. Overall, these results suggest that our model performed better than the baseline with high confidence, indicating its potential utility in the context of the studied task.
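The chapter does not state how this interval was computed; one common procedure consistent with these numbers is bootstrap resampling of test-set accuracy, sketched below as an assumption.

```python
# Hypothetical bootstrap estimate of the accuracy confidence interval.
import numpy as np
from sklearn.metrics import accuracy_score

y_true, y_hat = np.asarray(y_test), np.asarray(y_pred)
rng = np.random.default_rng(42)
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
    scores.append(accuracy_score(y_true[idx], y_hat[idx]))

lo, hi = np.percentile(scores, [2.5, 97.5])           # 95% interval
print(f"mean accuracy={np.mean(scores):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```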
The SHAP plot generated for the model, illustrated in Fig. 7, showed that the "mfcc1_mean" feature had a high impact on classifying music genres such as classical and country while having a low impact on genres such as blues and hip-hop. On the other hand, features like "zero_crossing_rate_mean" were found to be better at classifying blues and hip-hop. These findings highlight the importance of specific musical characteristics for genre classification and can be used to improve the model's accuracy.
Fig. 5 Confusion matrix for DNN model on the test data
Fig. 6 Confidence interval for DNN model
6 Conclusion
This chapter demonstrates the significant contributions of responsible AI to the field of music genre recommendation. By developing an interpretable and accountable model using techniques such as grid search, ELI5, and SHAP, we have shown that high levels of accuracy can be achieved while adhering to ethical principles and ensuring transparency. The deep neural network model outperformed several
Fig. 7 DNN Shapley values scores on the test data
other state-of-the-art classifiers and achieved 93% accuracy after feature-importance-based retraining.
Furthermore, by employing the SHAP method to identify the most important features
in the dataset and explain how these features contribute to the model’s predictions,
we have demonstrated the potential of responsible AI to provide personalized recom-
mendations to users while minimizing potential risks. The integration of responsible
AI principles into music genre recommendation systems has several benefits. Firstly,
it ensures that the system is transparent and accountable, allowing users to under-
stand how recommendations are generated and providing a means for redress in case
of errors or biases. Another key benefit of this approach is that it mitigates poten-
tial risks by upholding ethical principles and preventing harm to users and society,
thereby fostering trust in the system and reinforcing a sense of responsibility.
7 Future Work
The dataset used in this study was limited to tabular data featuring music genre attributes. However, the GTZAN dataset available on Kaggle provides access to additional data, including music genre audio files and their corresponding Mel Spectrograms. In future work, we plan to develop an interpretable multi-label music genre classification pipeline using Mel Spectrograms and Convolutional Neural Networks (CNNs) to provide accurate predictions and insights into how the model arrives at its predictions. The goal is to create a comprehensive system that takes audio input and generates a music genre classification output, with practical applications such as music recommendation. Furthermore, this system will avoid the black-box nature of many current systems, making it a valuable contribution to the field.
References
1. GTZAN dataset—music genre classification. https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
2. Aguiar, R. L., Costa, Y. M., & Silla, C. N. (2018). Exploring data augmentation to improve
music genre classification with convnets. In 2018 international joint conference on neural
networks (IJCNN) (pp. 1–8). IEEE
3. Akimchuk, D., Clerico, T., & Turnbull, D. (2019). Evaluating recommender system algorithms
for generating local music playlists. arXiv:1907.08687
4. Al-Tamimi, A. K., Salem, M., Al-Alami, A. (2020). On the use of feature selection for music
genre classification. In 2020 Seventh international conference on information technology trends
(ITT) (pp 1–6). IEEE
5. Bassiou, N., Kotropoulos, C., & Papazoglou-Chalikias, A. (2015). Greek folk music classifi-
cation into two genres using lyrics and audio via canonical correlation analysis. In 2015 9th
international symposium on image and signal processing and analysis (ISPA) (pp. 238–243).
IEEE
6. Budhrani, A., Patel, A., & Ribadiya, S. (2020). Music2vec: music genre classification and
recommendation system. In 2020 4th International Conference on Electronics, Communication
and Aerospace Technology (ICECA) (pp. 1406–1411). IEEE
7. Chen, C., & Steven, X. (2021). Combined transfer and active learning for high accuracy music
genre classification method. In 2021 IEEE 2nd international conference on big data, artificial
intelligence and Internet of Things Engineering (ICBAIE) (pp. 53–56). IEEE
8. Cheng, Y. (2020). Music information retrieval technology: Fusion of music, artificial intelli-
gence and blockchain. In 2020 3rd international conference on smart blockchain (SmartBlock)
(pp. 143–146). IEEE
9. Das, P. P., & Acharjee, A., et al. (2019). Double coated vgg16 architecture: An enhanced
approach for genre classification of spectrographic representation of musical pieces. In 2019
22nd international conference on computer and information technology (ICCIT) (pp. 1–5).
IEEE
10. Duggirala, S., & Moh, T. S. (2020). A novel approach to music genre classification using
natural language processing and spark. In 2020 14th international conference on ubiquitous
information management and communication (IMCOM) (pp. 1–8). IEEE
11. Elbir, A., & Aydin, N. (2020). Music genre classification and music recommendation by using
deep learning. Electronics Letters, 56(12), 627–629.
12. Ford, L., Bhattacharya, S., Hayes, R., & Inman, W. (2020). Using deep learning to identify
multilingual music genres. In 2020 SoutheastCon (Vol. 2, pp. 1–5). IEEE
13. Fulzele, P., Singh, R., Kaushik, N., & Pandey, K. (2018). A hybrid model for music genre clas-
sification using LSTM and SVM. In 2018 eleventh international conference on contemporary
computing (IC3) (pp. 1–3). IEEE
14. Ginsel, P., Vatolkin, I., & Rudolph, G. (2020). Analysis of structural complexity features for
music genre recognition. In 2020 IEEE congress on evolutionary computation (CEC) (pp. 1–8).
IEEE
15. Hampton-Sosa, W. (2017). The impact of creativity and community facilitation on music
streaming adoption and digital piracy. Computers in Human Behavior, 69, 444–453.
16. Jiang, Y., & Jin, X. (2022). Using k-means clustering to classify protest songs based on con-
ceptual and descriptive audio features. In Proceedings on Culture and computing: 10th inter-
national conference, C&C 2022, held as part of the 24th HCI international conference, HCII
2022, virtual event, June 26–July 1, 2022 (pp. 291–304). Springer
17. Kamehkhosh, I., Jannach, D., & Bonnin, G. (2018). How automated recommendations affect
the playlist creation behavior of users. In ACM IUI 2018-Workshops
18. Kikuchi, Y., Naofumi, A., & Dobashi, Y. (2020). A study on automatic music genre classifica-
tion based on the summarization of music data. In 2020 international conference on artificial
intelligence in information and communication (ICAIIC) (pp. 705–708). IEEE
19. Kobayashi, T., Kubota, A., & Suzuki, Y. (2018). Audio feature extraction based on sub-band
signal correlations for music genre classification. In 2018 IEEE international symposium on
multimedia (ISM) (pp. 180–181). IEEE
20. Kumaraswamy, B. (2022). Optimized deep learning for genre classification via improved moth
flame algorithm. Multimedia Tools and Applications, 81(12), 17071–17093.
21. Li, H., Fei, X., Yang, M., Chao, K. M., & He, C. (2021). From music information retrieval
to stock market analysis: Theoretical discussion on feature extraction transfer. In 2021 IEEE
international conference on e-business engineering (ICEBE) (pp. 54–58). IEEE
22. Liang, B., & Gu, M. (2020). Music genre classification using transfer learning. In: 2020 IEEE
conference on multimedia information processing and retrieval (MIPR) (pp. 392–393). IEEE
23. Marvin, G., & Alam, M. G. R. (2021). Cardiotocogram biomedical signal classification and
interpretation for fetal health evaluation. In 2021 IEEE Asia-Pacific conference on computer
science and data engineering (CSDE) (pp. 1–6). https://doi.org/10.1109/CSDE53843.2021.
9718415
24. Marvin, G., Nakatumba-Nabende, J., Hellen, N., & Alam, M. G. R. (2022). Responsible arti-
ficial intelligence for preterm birth prediction in vulnerable populations. In 2022 IEEE Asia-
Pacific conference on computer science and data engineering (CSDE) (pp. 1–6). https://doi.
org/10.1109/CSDE56538.2022.10089301
25. Millecamp, M., Htun, N. N., Jin, Y., & Verbert, K. (2018). Controlling spotify recommenda-
tions: Effects of personal characteristics on music recommender user interfaces. In Proceedings
of the 26th Conference on user modeling, adaptation and personalization (pp. 101–109)
26. Murindanyi, S., Mugalu, B. W., Nakatumba-Nabende, J., & Marvin, G. (2023). Interpretable
machine learning for predicting customer churn in retail banking. In 2023 7th International
conference on trends in electronics and informatics (ICOEI) (pp. 967–974). IEEE
27. Nirmal, M., & Mohan, S. (2020). Music genre classification using spectrograms. In 2020
International conference on power, instrumentation, control and computing (PICC) (pp. 1–5).
IEEE
28. Pelchat, N., & Gelowitz, C. M. (2020). Neural network music genre classification. Canadian
Journal of Electrical and Computer Engineering, 43(3), 170–173.
29. Sanden, C., & Zhang, J. Z. (2011). Enhancing multi-label music genre classification through
ensemble techniques. In Proceedings of the 34th international ACM SIGIR conference on
research and development in information retrieval (pp. 705–714)
30. Shah, M., Pujara, N., Mangaroliya, K., Gohil, L., Vyas, T., & Degadwala, S. (2022). Music
genre classification using deep learning. In 2022 6th international conference on computing
methodologies and communication (ICCMC) (pp. 974–978). IEEE
31. Sharma, S., Fulzele, P., & Sreedevi, I. (2018). Novel hybrid model for music genre classifica-
tion based on support vector machine. In 2018 IEEE symposium on computer applications &
industrial electronics (ISCAIE) (pp. 395–400). IEEE
32. Sorbán, K. (2021). Ethical and legal implications of using ai-powered recommendation systems
in streaming services. Információs Társadalom: társadalomtudomáinyi folyóirat, 21(2), 63–82.
33. Wang, S., Cao, J., & Yu, P. (2020). Deep learning for spatio-temporal data mining: A survey.
IEEE Transactions on Knowledge and Data Engineering
34. Whang, S. E., Tae, K. H., Roh, Y., & Heo, G. (2021). Responsible AI challenges in end-to-end
machine learning. arXiv:2101.05967
35. Zhang, K. (2021). Music style classification algorithm based on music feature extraction and
deep neural network. Wireless Communications and Mobile Computing, 2021, 1–7.