Figure 3 - uploaded by Yu. V. Andreyev
Source publication
Various phonemes are considered in terms of nonlinear dynamics. Phase portraits of the signals in the embedding space, estimates of the correlation dimension, and the largest Lyapunov exponent are analyzed. It is shown that speech signals have a comparatively small dimension and a positive largest Lyapunov exponent.
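The abstract names the correlation dimension but does not say which estimator was used. A common choice is the Grassberger-Procaccia correlation sum $C(r)$, whose log-log slope estimates the dimension. The sketch below assumes that estimator, a time-delay embedding, and a least-squares fit over a user-chosen range of radii; the function names (`delay_embed`, `pair_distances`, `correlation_sum`, `correlation_dimension`) and the parameters `dim`, `tau`, and `radii` are hypothetical and do not come from the source publication.

```python
import numpy as np


def delay_embed(signal, dim, tau):
    """Time-delay embedding: row i is (s_i, s_{i+tau}, ..., s_{i+(dim-1)tau})."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim/tau")
    return np.stack([signal[k * tau : k * tau + n] for k in range(dim)], axis=1)


def pair_distances(points):
    """Euclidean distances between all distinct pairs (each pair counted once)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)  # upper triangle excludes self-pairs
    return dist[iu]


def correlation_sum(distances, r):
    """Grassberger-Procaccia correlation sum C(r): fraction of pairs closer than r."""
    return np.mean(distances < r)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the chosen radii."""
    radii = np.asarray(radii, dtype=float)
    distances = pair_distances(points)
    c = np.array([correlation_sum(distances, r) for r in radii])
    if np.any(c == 0):
        raise ValueError("C(r) is zero for some radius; choose larger radii")
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return slope


# Usage: a sinusoid embeds as a closed curve, so the estimate should be near 1.
if __name__ == "__main__":
    s = np.sin(0.1 * np.arange(1000))
    pts = delay_embed(s, dim=3, tau=5)
    radii = np.logspace(np.log10(0.05), np.log10(0.2), 8)
    print(correlation_dimension(pts, radii))
```

The full pairwise-distance matrix costs O(N^2) memory, so long recordings would need subsampling or a neighbor-search structure. The radius range is the critical choice: it has to lie in the scaling region where log C(r) is linear in log r, which in practice is usually read off a plot rather than fixed in advance.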
Contexts in source publication
Context 1
... is constructed from the eigenvectors of the covariance matrix of the set of vectors, and the sum of squared projections of the set onto the axes of the new basis (i.e., de facto, the energy corresponding to a specific direction) is determined by the eigenvalues of the covariance matrix. It follows that if the set can be embedded in a subspace of $\mathbb{R}^d$, then some eigenvalues will be equal to zero or, if the initial data were corrupted by noise, close to zero. In the numerical experiment the procedure was carried out as follows. A matrix $X = (x_1, x_2, \ldots, x_N)$ was constructed from the vectors $x_i$. Then the $d \times d$ covariance matrix $A = XX^T$ was formed, and its eigenvalues and eigenvectors were calculated. Finally, the set was projected onto the basis of eigenvectors [16]. Let us consider the phase portraits of two sounds 'a' pronounced by the same speaker in different fragments of speech. The sets are constructed in 3D space, but for easier comparison they are shown projected onto the same plane (fig. 3). In addition, the normalized eigenvalues, which determine the dimension of the space, are shown in fig. 4. The latter graph shows that most of the energy is associated with the first and second eigenvalues, while the energy corresponding to the remaining eigenvalues is significantly lower. The fourth eigenvalue is of the order of $10^{-2}$. Thus, to within $10^{-2}$, the phase space can be treated as three-dimensional, spanned by the eigenvectors of the three largest eigenvalues. In general, the phase portraits of a sound pronounced by the same speaker show some similarity, and this also holds for the other vowels. Moreover, as seen in fig. 3, the phase portraits are markedly smooth, which implies that only low-frequency components are present, as is typical of vowels.
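The procedure described above (delay vectors stacked as the columns of $X$, $A = XX^T$, eigendecomposition, projection onto the eigenvectors) is essentially principal component analysis of the embedded vectors, and it can be sketched in NumPy as follows. The names `delay_vectors` and `principal_axes`, the parameters `d` and `tau`, and normalizing the eigenvalues by the largest one are assumptions for illustration, since the excerpt does not specify them; following the text's $A = XX^T$, the vectors are not mean-centred. A pure sinusoid stands in for real vowel data.

```python
import numpy as np


def delay_vectors(signal, d, tau):
    """Stack x_i = (s_i, s_{i+tau}, ..., s_{i+(d-1)tau}) as the columns of X (d x N)."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (d - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this d/tau")
    return np.stack([signal[k * tau : k * tau + n] for k in range(d)])


def principal_axes(X):
    """Eigen-analysis of A = X X^T, following the text (no mean removal).

    Returns the eigenvalues in decreasing order, the matching eigenvectors as
    columns, the eigenvalues normalized by the largest one (an assumed
    normalization), and the coordinates of every vector in the eigenvector
    basis (row j = projection onto axis j).
    """
    A = X @ X.T                           # d x d, symmetric positive semidefinite
    eigvals, eigvecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder: largest first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    normalized = eigvals / eigvals[0]
    projected = eigvecs.T @ X
    return eigvals, eigvecs, normalized, projected


if __name__ == "__main__":
    s = np.sin(0.3 * np.arange(500))      # a pure tone as a stand-in for a vowel
    X = delay_vectors(s, d=4, tau=3)
    eigvals, eigvecs, normalized, projected = principal_axes(X)
    print(normalized)                     # only the first two are far from zero
    # The "energy" along axis j (sum of squared projections) equals eigenvalue j.
    print(np.allclose((projected ** 2).sum(axis=1), eigvals))
```

Plotting `projected[0]` against `projected[1]` gives a planar projection of the phase portrait, and `normalized` corresponds to the kind of eigenvalue curve described for fig. 4. Because the delay vectors of a pure tone lie in a plane, only two normalized eigenvalues are non-negligible here; with real speech one would instead look for the index at which the normalized values fall to the noise level.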
The slight difference in the dynamics of the same sound can be explained by the reduction of 'a' in different fragments, according to the difference of ...
Citations
A new method is proposed for measuring an index of acoustic voice quality using the Kullback–Leibler information metric. The primary benefit of this method lies in its favorable dynamic properties, achieved by eliminating the problem of a small number of observation samples. A theoretical study of the method's efficiency was conducted, and its conclusions were confirmed experimentally. It has been established that a sufficiently precise assessment of a speaker's voice quality requires a speech signal 2–3 minutes long.
Automatic segmentation of the vocal signal precedes the feature extraction stage and, in turn, emotion recognition/classification. Prosodic parameters such as the fundamental frequency (F0) and the formants (F1–F4), as well as the LPCC and MFCC cepstral coefficients, are extracted only in the vowel regions. The analysis tools of the SROL corpus use a hybrid hierarchical system with four segmentation methods based on the autocorrelation function, the AMDF method, cepstral analysis, and the HPS method. Since the performance of this tool is not yet satisfactory, we analyzed other segmentation approaches in order to obtain the best possible segmentation accuracy. The predictive neural network used in this paper is in fact a simple perceptron, which can approximate quasi-periodic signals such as vowels with high accuracy. Consonants have noisy properties and are complicated transition processes. With a simple neural network architecture, the prediction error is higher for consonants than for vowels.
Speech and music prosody deal with the presence of rhythm. This work analyzes the presence of chaotic behavior in the duration patterns of speech segments. A chaotic system is one whose behavior appears random yet retains some degree of predictability. The study is carried out both qualitatively and quantitatively: the qualitative analysis uses recurrence plots and phase-space plots, whereas the quantitative analysis uses statistical measures such as the Hurst exponent and recurrence period density entropy (RPDE). The results of the chaotic analysis are useful in electronic music synthesis and in speech synthesis, for improving the naturalness of the generated speech.