December 2019 · 42 Reads · Phonetics and Speech Sciences
June 2019 · 324 Reads · 32 Citations · EURASIP Journal on Audio Speech and Music Processing
We propose a new method for music detection from broadcast content using a convolutional neural network with a Mel-scale kernel. In this detection task, music segments must be annotated in broadcast data in which music, speech, and noise are mixed. The convolutional neural network contains a convolutional layer whose kernels are trained to extract robust features. The Mel scale determines the kernel size, and the backpropagation algorithm trains the kernel shape. We used 52 h of mixed broadcast data (25 h of music) to train the network and 24 h of collected broadcast data (music ratio of 50–76%) for testing. The test data consisted of various genres (drama, documentary, news, kids, reality, and so on) broadcast in British English, Spanish, and Korean. The proposed method consistently outperformed the baseline system in all three languages, with F-scores ranging from 86.5% for the British data to 95.9% for the Korean drama data. Our music detection system takes about 28 s to process a 1-min signal using a single CPU with 4 cores.
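A minimal sketch of a CNN music/non-music classifier over a Mel-spectrogram input may help illustrate the setup. The paper learns Mel-scale kernel shapes by backpropagation; the sketch below approximates that with a fixed log-Mel spectrogram front end followed by ordinary convolutional layers, and all layer sizes are illustrative assumptions rather than the authors' configuration.

```python
# Sketch of a music-segment classifier on log-Mel spectrogram patches.
# Assumes input of shape (batch, 1, n_mels, n_frames); the learned
# Mel-scale kernel from the paper is replaced here by standard 2-D kernels.
import torch
import torch.nn as nn

class MusicDetectorSketch(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(5, 5), padding=2),   # kernels over (mel, time)
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(16, 32, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)      # music vs. non-music

    def forward(self, mel_spec):
        h = self.features(mel_spec)
        return self.classifier(h.flatten(1))

# Toy usage: 8 random spectrogram patches of 64 Mel bands x 100 frames.
scores = MusicDetectorSketch()(torch.randn(8, 1, 64, 100))
print(scores.shape)   # (8, 2) class scores per patch
```

In practice, patch-level decisions would be smoothed over time to produce the segment annotations the detection task requires.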
September 2018 · 5 Reads · Phonetics and Speech Sciences
September 2018 · 66 Reads · Phonetics and Speech Sciences
June 2015 · 14 Reads · Phonetics and Speech Sciences
We propose a new method for automatic fluency scoring of English speaking tests taken by nonnative speakers in a free-talking style. The proposed method differs from previous methods in that it does not require transcribed texts for the spoken utterances. First, an input utterance is segmented into a phone sequence using a phone recognizer trained on native speech databases. For each utterance, a feature vector with 6 features is extracted by processing the segmentation results of the phone recognizer. Then, a fluency score is computed by applying support vector regression (SVR) to the feature vector. The SVR parameters are learned from the rater scores for the utterances. In experiments with 3 tests taken by 48 Korean adults, we show that speech rate, phonation time ratio, and smoothed unfilled pause rate are best for fluency scoring. The correlation between the rater scores and the SVR scores is 0.84, which is higher than the inter-rater correlation of 0.78. Although it is slightly lower than the correlation of 0.90 obtained when transcribed texts are given, this result implies that the proposed method can be used as a preprocessing tool for fluency evaluation of speaking tests.
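A minimal sketch of the feature-and-SVR pipeline, assuming phone segmentations are available as (label, start, end) tuples from a separate phone recognizer. The feature definitions (speech rate, phonation time ratio, unfilled pause rate) follow common fluency literature; the paper's exact six features and smoothing constants may differ, and the segmentations and rater scores below are made-up placeholders.

```python
# Fluency feature extraction from phone segmentations, scored with SVR.
import numpy as np
from sklearn.svm import SVR

def fluency_features(segments, pause_label="sil", min_pause_s=0.2):
    """Compute simple fluency features from (phone, start_s, end_s) tuples."""
    total = segments[-1][2] - segments[0][1]        # utterance duration in seconds
    phones = [s for s in segments if s[0] != pause_label]
    pauses = [s for s in segments
              if s[0] == pause_label and (s[2] - s[1]) >= min_pause_s]
    speech_time = sum(end - start for _, start, end in phones)
    return np.array([
        len(phones) / total,      # speech rate (phones per second)
        speech_time / total,      # phonation time ratio
        len(pauses) / total,      # unfilled pause rate
    ])

# Toy usage with hypothetical segmentations and human rater scores.
utt_a = [("sil", 0.0, 0.3), ("ae", 0.3, 0.45), ("t", 0.45, 0.55),
         ("sil", 0.55, 0.9), ("ow", 0.9, 1.1)]
utt_b = [("sil", 0.0, 0.5), ("iy", 0.5, 0.7), ("sil", 0.7, 1.2),
         ("n", 1.2, 1.3), ("ah", 1.3, 1.5)]
X = np.array([fluency_features(utt_a), fluency_features(utt_b)])
y = np.array([4.0, 2.5])                            # rater fluency scores
svr = SVR(kernel="rbf").fit(X, y)
print(svr.predict(X))
```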
June 2014 · 19 Reads · 1 Citation · Phonetics and Speech Sciences
In this paper, we propose an automatic fluency evaluation algorithm for English speaking tests. In the proposed algorithm, acoustic features are extracted from an input spoken utterance, and a fluency score is then computed using support vector regression (SVR). We estimate the parameters of the feature modeling and the SVR using speech signals and the corresponding scores given by human raters. Correlation analysis shows that speech rate, articulation rate, and mean length of runs are best for fluency evaluation. Experimental results show that the correlation between the human scores and the SVR scores is 0.87 for 3 speaking tests, which suggests the potential of the proposed algorithm as a secondary fluency evaluation tool.
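The reported figure of 0.87 is a Pearson correlation between human and predicted scores; a small sketch of that check is shown below. The score arrays are hypothetical placeholders, not data from the paper.

```python
# Correlation check between human rater scores and SVR-predicted scores.
import numpy as np

human = np.array([3.0, 4.5, 2.0, 5.0, 3.5])     # placeholder rater scores
svr_pred = np.array([3.2, 4.1, 2.4, 4.8, 3.3])  # placeholder SVR outputs

r = float(np.corrcoef(human, svr_pred)[0, 1])   # Pearson correlation coefficient
print(f"Pearson r = {r:.2f}")                   # prints the r for these toy arrays
```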
... In [24][25][26], the main objective was the application of Mel-spectrograms and a Mel-scale kernel to the detection of elements such as speech, a singer's voice, or music in noisy audio recordings. Based on these studies, it can be stated that Mel-spectrograms may be a successful approach to feature extraction and may serve well as input to classification models based on convolutional neural networks. ...
June 2019 · EURASIP Journal on Audio Speech and Music Processing
... Jang and Kwon recently proposed a method for fluency and pronunciation evaluation using an aligner in the case where the transcribed text is given [10]. However, it cannot evaluate the fluency of free-talking utterances when the transcribed text is not available. ...
June 2014 · Phonetics and Speech Sciences