Fig 4 - uploaded by Huy Phan
Category-wise performance comparison between the proposed system with triplet loss and the DCASE 2018 baseline on Task 1A.
Source publication
In this work, we propose an approach to Acoustic Scene Classification (ASC) that combines deep feature embedding learning with hierarchical classification using a triplet loss function. On the one hand, a deep convolutional neural network is first trained to learn a feature embedding from scene audio signals. Via the trained convolutional neural net...
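The abstract names a triplet loss but does not state its exact form. The sketch below illustrates the commonly used hinge formulation, L = max(0, d(a, p) − d(a, n) + margin), where a is an anchor embedding, p an embedding of the same scene class, and n an embedding of a different class. The squared Euclidean distance and the default margin of 0.2 are assumptions for illustration only, not values taken from the source publication.

```python
from typing import Sequence, Tuple


def squared_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared L2 distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Hinge-style triplet loss (assumed formulation).

    The loss is zero once the negative is at least `margin` farther from
    the anchor (in squared distance) than the positive is; otherwise it
    grows linearly with the violation.
    """
    d_pos = squared_euclidean(anchor, positive)
    d_neg = squared_euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(
    triplets: Sequence[Tuple[Sequence[float], Sequence[float], Sequence[float]]],
    margin: float = 0.2,
) -> float:
    """Mean triplet loss over (anchor, positive, negative) tuples."""
    if not triplets:
        raise ValueError("need at least one triplet")
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

For example, with anchor [0, 0], positive [0.1, 0] and negative [1, 0], the negative is already far enough away and the loss is 0. Moving the negative to [0.2, 0] gives 0.01 − 0.04 + 0.2 = 0.17, so training would push that negative away from the anchor. How triplets are sampled and how the learned embedding feeds the hierarchical classifier are not described in this excerpt.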
Contexts in source publication
Context 1
... system, the developed baseline, and the DCASE 2018 baseline are shown in Table IV. As can be seen, the proposed system outperforms the DCASE 2018 baseline by a large margin: 15.6% absolute on Task 1A (with triplet loss) and 16.6% absolute on Task 1B (without triplet loss). Improvements on individual categories can also be seen in Fig. 4, which compares the proposed system with triplet loss against the DCASE 2018 baseline on Task 1A; several categories enjoy a significant gain of more than 20%, such as "shopping mall", "tram", "metro", ...
Similar publications
Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments. Therefore, for strong segmentation performance,...
An important part of the human-computer interaction process is speech emotion recognition (SER), which has been receiving more attention in recent years. However, although a wide variety of methods have been proposed for SER, their performance remains limited. A key issue behind the low performance of SER systems is how to effect...
Music genre classification is a trending topic in current Music Information Retrieval (MIR) research. Since genre depends not only on the audio profile, we also make use of the textual content provided by the lyrics of the corresponding song. We implemented a CNN-based feature extractor for spectrograms in ord...
Road traffic monitoring is very important for intelligent transportation. Detecting the traffic state from acoustic information is a new research direction. A vehicle acoustic event classification algorithm based on a sparse autoencoder is proposed to analyze the traffic state. First, the multidimensional Mel-cepstrum features and energy f...