Pattern Annotation and Classification in Broadcast Content of Radio Productions

Abstract

In the current digital century, there are plenty of radio stations to choose from. However, the choice is usually based only on music genre, and listeners have to find out for themselves whether the program, schedule, and amount of talking suit their needs. The ratio of music to talk on a radio station can be measured manually by listening, but it can also be automated with machine learning. This study concentrates on recognizing speech and non-speech patterns in radio productions, and on optimizing the extraction of numerical values, the algorithms, and the methods that combine them, in order to improve the accuracy of distinguishing the different categories and labels. Building on earlier research combined with recently introduced technologies and ideas, the study experiments with a multi-layer classical machine learning setup. The numerical values are extracted from the audio input using existing research and techniques from the digital signal processing and audio processing fields, in combination with optimized parameters. Based on the literature review, the experimental setup extracts a set of features from the audio tracks, which are manually labeled to create ground-truth label data. The experiments cover three algorithms and compare not only the algorithms but also the extraction methods, by tuning the hop and window sizes. Furthermore, two algorithms in the multi-layer setup are tuned with grid search to obtain a setup specialized for the numerical data. The results indicate that the numerical extraction, and in particular the choice of hop and window size, is among the most critical factors. The results also indicate that both MLP and XGBoost perform very well, with similar results and negligible differences.
Further research and experiments are needed to optimize and increase the performance of the models, for example by focusing on silence periods and by reducing the impact of background noise.
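$$sr \geq 2 \cdot f_{max}$$
where $sr$ is the sampling rate.
$$X_m(\omega) = \sum_{n=-\infty}^{\infty} x(n)\, w(n - mR)\, e^{-j\omega n}$$
where
$x(n)$ = input signal at time $n$
$w(n)$ = window function of length $M$
$X_m(\omega)$ = DTFT of the windowed data centered about time $mR$
$R$ = hop size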
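The short-time Fourier transform above can be sketched in a few lines of NumPy. This is a minimal illustration and not the thesis implementation: the function name `stft`, the Hann window, and the choice to drop trailing samples that do not fill a whole window are all assumptions. Each frame is transformed relative to its own start, so the phase reference differs from the formula, while the magnitudes are the same.

```python
import numpy as np

def stft(x, window_length, hop_size):
    """Return one row of spectrum bins per frame m (hypothetical helper).

    window_length plays the role of M and hop_size the role of R.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < window_length:
        raise ValueError("signal is shorter than one window")
    w = np.hanning(window_length)  # assumed window shape
    n_frames = 1 + (len(x) - window_length) // hop_size
    frames = np.stack([
        x[m * hop_size : m * hop_size + window_length] * w
        for m in range(n_frames)
    ])
    # rfft keeps the non-negative frequency bins of each real-valued frame
    return np.fft.rfft(frames, axis=1)
```

A smaller hop size yields more, heavily overlapping frames; a larger window improves frequency resolution at the cost of time resolution.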
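$$x_{rms} = \sqrt{\frac{1}{n} \cdot \sum_{i=1}^{n} x_i^2}$$
$$ZCR_r = \frac{1}{2N} \sum_{n=1}^{N} \left| \operatorname{sign}(x_r(n)) - \operatorname{sign}(x_r(n-1)) \right|$$
$$\operatorname{sign}(x) = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases}$$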
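The RMS and zero-crossing-rate definitions above can be computed per frame as in the following sketch. The helper names and the framing strategy (no padding, trailing samples dropped) are assumptions made for illustration. Because a frame of N samples contains only N - 1 neighboring pairs, the sum here starts at the second sample of each frame.

```python
import numpy as np

def frame_signal(x, window_size, hop_size):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if len(x) < window_size:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(x) - window_size) // hop_size
    starts = hop_size * np.arange(n_frames)[:, None]
    return x[starts + np.arange(window_size)[None, :]]

def rms(frames):
    """Root mean square of every frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def sign(x):
    """sign(x) = 1 for x >= 0 and -1 for x < 0, as defined above."""
    return np.where(x >= 0, 1, -1)

def zero_crossing_rate(frames):
    """Each sign change contributes 2 to the sum, hence the factor 1 / (2N)."""
    s = sign(frames)
    changes = np.abs(s[:, 1:] - s[:, :-1])
    return np.sum(changes, axis=1) / (2 * frames.shape[1])
```

Speech typically alternates between high and low zero-crossing rates more than music does, which is one reason this feature is common in speech/non-speech discrimination.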
$$C_r = \frac{\sum_{k=1}^{N/2} f[k]\, |X_r[k]|}{\sum_{k=1}^{N/2} |X_r[k]|}$$
$$RF = f[K] = K \cdot df : \quad \sum_{k=1}^{K} |X(k)| \leq \sum_{k=1}^{N/2} |X(k)|$$
$$BW = \sqrt{\frac{\sum_{k=1}^{N/2} (f - f_c)^2 \cdot |X(k)|^2}{\sum_{k=1}^{N/2} |X(k)|^2}} = \sqrt{\frac{\sum_{k=1}^{N/2} (k \cdot df - SP_c)^2 \cdot |X(k)|^2}{\sum_{k=1}^{N/2} |X(k)|^2}}$$
$$SFM = \frac{\sqrt[N]{\prod_{n=0}^{N-1} x(n)}}{\left( \frac{\sum_{n=0}^{N-1} x(n)}{N} \right)}$$
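As a rough illustration of the spectral centroid, bandwidth, and spectral flatness measure defined above, the sketch below computes all three for a single frame. It is written under several assumptions that are not taken from the thesis: a Hann window, the magnitude spectrum as the input x(n) of the flatness measure, a small `eps` to avoid division by zero and log(0) on silent frames, and the hypothetical function name `spectral_features`.

```python
import numpy as np

def spectral_features(frame, sr, eps=1e-12):
    """Return (centroid, bandwidth, flatness) of one frame (hypothetical helper)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Magnitude spectrum for bins k = 1 .. N/2 (the DC bin is skipped)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))[1:]
    df = sr / n                                # frequency resolution
    freqs = df * np.arange(1, len(mag) + 1)    # f[k] = k * df

    # Centroid: magnitude-weighted mean frequency
    centroid = np.sum(freqs * mag) / (np.sum(mag) + eps)

    # Bandwidth: power-weighted spread around the centroid
    power = mag ** 2
    bandwidth = np.sqrt(
        np.sum((freqs - centroid) ** 2 * power) / (np.sum(power) + eps)
    )

    # Flatness: geometric mean over arithmetic mean, computed in the log
    # domain so that the product of many small values does not underflow
    flatness = np.exp(np.mean(np.log(mag + eps))) / (np.mean(mag) + eps)
    return centroid, bandwidth, flatness
```

A pure tone gives a flatness close to 0 and a narrow bandwidth, while noise-like content pushes the flatness toward 1.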
$$\text{samples} = \text{seconds} \cdot sr$$
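The conversion from seconds to samples is what turns a window or hop duration into the frame parameters used during extraction. A minimal sketch follows; the function name is hypothetical and the durations and sampling rate in the usage lines are illustrative values, not settings reported by the thesis.

```python
def seconds_to_samples(seconds, sr):
    """samples = seconds * sr, rounded to the nearest whole sample."""
    return int(round(seconds * sr))

# Illustrative values only: a 50 ms window with a 25 ms hop at 16 kHz
window_size = seconds_to_samples(0.05, 16000)
hop_size = seconds_to_samples(0.025, 16000)
```

Rounding matters because many sampling rates do not divide evenly into common window durations, for example 25 ms at 22050 Hz.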
$$\text{perf} = \frac{(\text{main} \cdot \text{speech} + \text{main} \cdot \text{nonspeech})}{2}$$
Result values recovered from the figures (labels not recoverable): 0.9739981, 0.9708935, 0.9021245, 0.8364094, 0.8894666; 0.9712075, 0.9728739, 1.0, 0.9036042, 0.8305369, 0.8932156.