Moe Pwint’s research while affiliated with University of Computer Studies Yangon and other places


Publications (16)


Figure and table previews: Table 1, relative error (RE[%]) for the TIDIGITS database; Figure 2, (a) the noisy signal, (b) the corresponding clean signal, (c) the modified signal using the basis function φ_m; Table 2, ANOVA table for the relative error; Figure 3, (a) the noisy signal (5 dB), (b) the modified signal (red) and the envelope of the MMSE filter output (green); Figure 4, the sample entropy of a clean speech signal; and 3 more.

An evolutionary approach for segmentation of noisy speech signals for efficient voice activity detection
  • Article
  • Full-text available

October 2015 · 395 Reads · 1 Citation

Artificial Intelligence Research · Moe Pwint

This paper presents a new approach to automatically segmenting speech signals in noisy environments. Segmentation of speech signals is formulated as an optimization problem and the boundaries of the speech segments are detected using a genetic algorithm (GA). The number of segments present in a signal is initially estimated from the reconstructed sequence of the original signal using the minimal number of Walsh basis functions. A multi-population GA is then employed to determine the locations of segment boundaries. The segmentation results are improved through the generations by introducing a new evaluation function which is based on the sample entropy and a heterogeneity measure. Experimental results show that the proposed approach can accurately detect the noisy speech segments as well as noise-only segments under various noisy conditions.
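The evaluation function is described only at a high level (sample entropy plus a heterogeneity measure). The following is a minimal, hypothetical sketch of how such a boundary fitness might be assembled in Python; the variance-based regularity term is a crude stand-in for SampEn, and the energy-contrast heterogeneity term is an assumption for illustration, not the paper's actual formula.

```python
import numpy as np

def boundary_fitness(signal: np.ndarray, boundaries: list[int]) -> float:
    """Score a candidate set of segment boundaries (illustrative only).
    Term 1: within-segment regularity, approximated by negative variance
            (a stand-in for the paper's sample-entropy term).
    Term 2: heterogeneity, here the short-time-energy contrast between
            adjacent segments (assumed form, not the paper's measure)."""
    edges = [0, *sorted(boundaries), len(signal)]
    segments = [signal[a:b] for a, b in zip(edges, edges[1:])]
    seg_energy = [float(np.mean(s ** 2)) for s in segments]
    regularity = -sum(float(np.var(s)) for s in segments)
    heterogeneity = sum(abs(e1 - e2) for e1, e2 in zip(seg_energy, seg_energy[1:]))
    return regularity + heterogeneity
```

A GA individual is then simply a sorted list of boundary positions scored by this function; the multi-population aspect is sketched under the 2008 respiratory-sound paper below.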

Structuring Sport Video through Audio Event Classification

September 2010 · 33 Reads · 1 Citation

Lecture Notes in Computer Science

Automatic audio information retrieval faces a growing challenge: as information technology advances, more and more digital audio, images, and video are captured, produced, and stored. Building an audio classifier that scales to large datasets remains difficult in existing work. The proposed system combines two classifiers, an SVM and a decision tree, to classify the audio information in video: the decision tree performs the classification, while the SVM is applied as a decision stage for feature selection. The aim is to achieve high accuracy in classifying mixed types of audio by combining the two types of classifiers. Four audio classes are considered, and the classification and analysis are intended to reveal the structure of the sports video. Experiments on soccer videos indicate that the proposed framework can produce satisfactory results.
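The abstract does not spell out how the SVM and the decision tree are coupled; one plausible reading, sketched below with scikit-learn, is that a linear SVM's weights drive feature selection and a decision tree is trained on the reduced feature set. Class labels and hyperparameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectFromModel

def train_audio_classifier(X: np.ndarray, y: np.ndarray):
    """SVM-driven feature selection feeding a decision tree.
    X: (n_clips, n_audio_features); y: one of four audio class labels
    (e.g., whistle / applause / commentary / music -- assumed names)."""
    selector = SelectFromModel(LinearSVC(C=1.0, dual=False))  # SVM weights rank features
    tree = DecisionTreeClassifier(max_depth=8)                # assumed depth
    X_sel = selector.fit_transform(X, y)
    tree.fit(X_sel, y)
    return selector, tree
```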


An approach for multi-label music mood classification

August 2010 · 54 Reads · 18 Citations

Music can express emotion succinctly yet effectively. People select different music at different times, in accordance with the mood and purpose of the listening occasion. Music classification and retrieval by perceived emotion is natural and functionally powerful. Since human perception of music mood varies from individual to individual, multi-label music mood classification has become a challenging problem. Because the mood may change one or more times within an entire music clip, a single song may offer more than one mood to the listener. Therefore, tracking mood changes across an entire music clip is given precedence in multi-label music mood classification tasks. This paper presents self-colored music mood segmentation and a hierarchical framework based on a new mood taxonomy model to automate the task of multi-label music mood classification. The proposed mood taxonomy model combines Thayer's two-dimensional (2D) model and Schubert's Updated Hevner adjective Model (UHM) to mitigate the probability of error by classifying over at most 4 classes at a time instead of 9. The verse and chorus parts of each song, approximately 50 to 110 s, are extracted manually as input music clips. Consecutive self-colored mood segments are obtained with an image region-growing method. The feature sets extracted from these segmented music pieces are fed to a Fuzzy Support Vector Machine (FSVM) for classification. A one-against-one (O-A-O) multi-class classification method is used for the 9-class classification under the updated Hevner labeling. The hierarchical framework with the new mood taxonomy model has the advantage of reducing computational complexity, as the number of classifiers employed for the O-A-O approach is only 19 instead of 36.
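The classifier counts follow from the one-against-one construction: a flat O-A-O scheme over k classes needs one binary classifier per unordered class pair, i.e. k(k-1)/2 = 36 for k = 9. A tiny sketch of that arithmetic (the 19-classifier figure for the hierarchy depends on the paper's specific taxonomy and is quoted here only as reported):

```python
from math import comb

k = 9                  # UHM mood classes
flat_oao = comb(k, 2)  # one binary classifier per class pair
print(flat_oao)        # 36 for a flat O-A-O scheme
# The hierarchical taxonomy reportedly needs only 19 classifiers,
# since each decision node distinguishes at most 4 coarse groups.
```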


A Framework for supporting consistent lookup in Distributed Hash Table

June 2010 · 20 Reads

Many structured P2P systems use a Distributed Hash Table (DHT) to map data items onto nodes in various ways for scalable routing and location. DHTs are algorithms used in modern peer-to-peer applications that provide a reliable, scalable, and efficient way to manage peer-to-peer networks. As a fundamental problem in DHT-based P2P systems, efficiently locating the node that stores a desired data item matters: both performance and consistent lookup are important to avoid performance degradation and guarantee system fairness. This paper presents a structural prevention strategy to remove inconsistent lookups, on the basis that inconsistent lookups are generated by inconsistent routing tables. The algorithms keep the routing tables consistent with the state of the nodes in the DHT and maintain a ring structure that guarantees consistent lookup results in the presence of node joins and leaves. The goal is to be able to trust a lookup result to reflect the actual state of the DHT.
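For reference, the ring structure the paper maintains is the same invariant that underlies Chord-style DHTs: every key is owned by the first node whose identifier follows it clockwise on the ring, so a lookup is consistent exactly when every node agrees on that successor relation. The sketch below illustrates the successor rule only; it is not the paper's prevention strategy.

```python
import hashlib
from bisect import bisect_right

def ring_id(key: str, bits: int = 32) -> int:
    """Hash a key onto the identifier ring [0, 2**bits)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(node_ids: list[int], key_id: int) -> int:
    """First node clockwise from key_id, wrapping around the ring.
    node_ids must be sorted; divergent views of this list across
    nodes are precisely what produces inconsistent lookups."""
    i = bisect_right(node_ids, key_id)
    return node_ids[i % len(node_ids)]

# Example: a key is looked up on a ring of 8 hypothetical nodes.
nodes = sorted(ring_id(f"node-{n}") for n in range(8))
print(successor(nodes, ring_id("some-data-item")))
```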


An Efficient Approach for Classification of Speech and Music

December 2008 · 18 Reads · 3 Citations

Lecture Notes in Computer Science

A new method to classify an audio segment into speech or music, in the context of automatic transcription of broadcast news, is presented. To discriminate between speech and music, sample entropy (SampEn), a time-series complexity measure, serves as the main feature. SampEn is a variant of approximate entropy (ApEn) that measures the regularity of a time series. The basic idea is to label a given audio segment as speech or music depending on its regularity, which is measured via the SampEn sequence calculated over a sliding window. The effectiveness of the proposed method is tested in experiments including broadcast news shows from BBC radio stations, WBAI news, UN news, and music genres with different temporal distributions. Results show the robustness of the proposed method, which achieves high discrimination accuracy in all tested experiments.
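For reference, sample entropy SampEn(m, r) is defined as -ln(A/B), where B counts pairs of length-m templates that match within tolerance r (Chebyshev distance) and A counts the same for length m+1, excluding self-matches. A simplified Python sketch; parameter choices and windowing here are illustrative, not the paper's:

```python
import numpy as np

def sample_entropy(x: np.ndarray, m: int = 2, r: float = 0.2) -> float:
    """Simplified SampEn(m, r): -ln(A/B) with Chebyshev distance.
    r is scaled by the signal's standard deviation, a common convention."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def matches(length: int) -> int:
        t = np.array([x[i:i + length] for i in range(len(x) - length + 1)])
        total = 0
        for i in range(len(t) - 1):
            dist = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
            total += int(np.sum(dist <= tol))
        return total

    A, B = matches(m + 1), matches(m)
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")
```

Regular (more self-similar) windows yield low SampEn and irregular ones high SampEn; the method labels audio by which regime a window falls in.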


On the Discrimination of Speech/Music Using a Time Series Regularity

December 2008 · 24 Reads · 4 Citations

A new method to discriminate between speech and music, in the context of automatic transcription of broadcast news, is presented. In the proposed method, a time-series regularity measure, sample entropy (SampEn), is used as an efficient feature to discriminate speech from music in a broadcast audio stream. SampEn is a variant of approximate entropy (ApEn) that measures the regularity of a time series. Depending on the regularity of the time series, a segment of a given audio stream is classified as speech or music. The first step of the method is the calculation of the SampEn sequence over windows; the second step classifies each segment with a rule-based classification scheme applied to the sample entropy sequence. Experimental results show the effectiveness of the proposed method for broadcast news shows with different music styles.
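A minimal sketch of that two-step pipeline, reusing the sample_entropy sketch above; the window length and the decision threshold are placeholders (the paper derives its rules empirically), and which side of the threshold maps to speech versus music is likewise an assumption.

```python
import numpy as np

def classify_stream(signal: np.ndarray, fs: int, win_s: float = 1.0,
                    threshold: float = 0.5) -> list[str]:
    """Step 1: SampEn per non-overlapping window.
    Step 2: rule-based labeling by comparing SampEn to a threshold."""
    win = int(win_s * fs)
    labels = []
    for start in range(0, len(signal) - win + 1, win):
        e = sample_entropy(signal[start:start + win])
        labels.append("speech" if e > threshold else "music")
    return labels
```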


Application of Walsh Transform Based Method on Tracheal Breath Sound Signal Segmentation.

January 2008 · 24 Reads · 2 Citations

This paper proposes a robust segmentation method for differentiating consecutive inspiratory/expiratory episodes of different types of tracheal breath sounds. This is done by applying minimal Walsh basis functions to transform the original input respiratory sound signals. A decision module is then applied to divide the transformed signal into respiration segments and gap segments. The segmentation results are improved through a refinement scheme with a new evaluation algorithm based on the duration of each segment. The results of experiments carried out on various types of tracheal breath sounds show the robustness and effectiveness of the proposed segmentation method.
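Binary Walsh functions are (up to a sequency reordering) the rows of a Hadamard matrix, which makes the transform cheap: only additions and sign flips. A minimal reconstruction sketch; the natural (Hadamard) ordering and the choice of k below are simplifications of the paper's minimal-basis selection.

```python
import numpy as np
from scipy.linalg import hadamard

def reconstruct_with_k_basis(signal: np.ndarray, k: int) -> np.ndarray:
    """Project a length-N signal (N a power of 2) onto the first k
    Walsh-Hadamard basis functions and synthesize from those k only."""
    N = len(signal)
    H = hadamard(N)                 # rows are +/-1 basis functions
    coeffs = H @ signal / N         # forward transform (H @ H == N * I)
    return H[:k].T @ coeffs[:k]     # synthesis from k coefficients
```

The smoothed reconstruction is the kind of signal a decision module could then threshold into respiration versus gap segments.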


Phase Segmentation of Noisy Respiratory Sound Signals using Genetic Approach.

January 2008 · 61 Reads · 2 Citations

In this paper, a new approach to automatically segmenting noisy respiratory sound signals is proposed. Segmentation is formulated as an optimization problem and the boundaries of the signal segments are detected using a genetic algorithm (GA). Once an estimate of the number of segments present in the signal has been obtained, a multi-population GA is employed to determine the locations of segment boundaries. The segmentation results are refined through the generations of the GA by introducing a new evaluation function based on the sample entropy and a heterogeneity measure. Illustrative results for respiratory sound signals contaminated by loud heartbeats and other high-level noises show that the proposed genetic segmentation method is accurate and threshold-independent in finding the noisy respiratory segments as well as the pause segments under different noisy conditions.
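A skeleton of a multi-population GA of the kind described, with ring migration between subpopulations; all hyperparameters and the mutation operator are illustrative assumptions, and fitness stands for an evaluation such as the sample-entropy/heterogeneity function sketched under the 2015 paper above.

```python
import random

def multi_population_ga(fitness, n_boundaries, signal_len,
                        n_pops=4, pop_size=30, generations=100,
                        migrate_every=10):
    """Evolve several populations of candidate boundary sets in parallel,
    periodically migrating the best individual around a ring of populations.
    Hyperparameters are placeholders, not the paper's settings."""
    def random_individual():
        return sorted(random.sample(range(1, signal_len), n_boundaries))

    pops = [[random_individual() for _ in range(pop_size)] for _ in range(n_pops)]
    for gen in range(1, generations + 1):
        for p, pop in enumerate(pops):
            pop.sort(key=fitness, reverse=True)
            elite = pop[:pop_size // 2]
            children = []
            for parent in elite:        # refill by mutating the elites
                child = list(parent)
                i = random.randrange(n_boundaries)
                child[i] = min(signal_len - 1, max(1, child[i] + random.randint(-50, 50)))
                children.append(sorted(child))
            pops[p] = elite + children
        if gen % migrate_every == 0:    # ring migration of the best individual
            for p in range(n_pops):
                pops[(p + 1) % n_pops][random.randrange(pop_size)] = \
                    max(pops[p], key=fitness)
    return max((ind for pop in pops for ind in pop), key=fitness)
```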


Speech/Nonspeech Detection Using Minimal Walsh Basis Functions

January 2007 · 144 Reads · 4 Citations

EURASIP Journal on Audio, Speech, and Music Processing

This paper presents a new method to detect the speech/nonspeech components of a given noisy signal. Employing a combination of binary Walsh basis functions and an analysis-synthesis scheme, the original noisy speech signal is first modified. From the modified signal, the speech components are distinguished from the nonspeech components using a simple decision scheme. The minimal number of Walsh basis functions to apply is determined using singular value decomposition (SVD). The main advantages of the proposed method are low computational complexity, fewer parameters to adjust, and simple implementation. The use of Walsh basis functions makes the proposed algorithm efficiently applicable in real-world situations where processing time is crucial. Simulation results indicate that the proposed algorithm achieves high speech and nonspeech detection rates while maintaining a low error rate under different noisy conditions.
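The abstract leaves the SVD criterion unspecified; a common way to turn singular values into a minimal basis count is a cumulative-energy threshold, sketched below as an assumption (the 95% figure is illustrative, not the paper's criterion).

```python
import numpy as np

def minimal_basis_count(frames: np.ndarray, energy: float = 0.95) -> int:
    """Smallest number of components whose singular values capture the
    given fraction of total energy. frames: one signal frame per row."""
    s = np.linalg.svd(frames, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)
```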


Figure previews: Fig. 1, (a) clean speech signal, (b) noisy speech signal, (c) the modified signal; Fig. 2, (a) noisy input signal, (b) segmentation results (dashed) shown together with the clean speech signal.

A Segmentation Method for Noisy Speech Using Genetic Algorithm

April 2005 · 366 Reads · 15 Citations

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

The paper presents a technique to automatically segment a speech signal in noisy environments. Segmentation is formulated as an optimization problem and the boundaries of the speech segments are detected using a genetic algorithm (GA). The initial number of segments is estimated from a modified version of the signal constructed with a minimal number of binary Walsh basis functions. The segmentation results are improved through the generations of the GA by introducing a new evaluation function based on the sample entropy and a heterogeneity measure. Experiments have been carried out on the TIDIGITS database with different types and levels of noise; the results show the efficiency of the proposed genetic segmentation algorithm.


Citations (10)


... For our specific application, the initial population is generated using uniform pdf with a size twice that of the number of segments obtained from the fusion step [55]. To focus on the application of interest, we only discuss here the cost function and leave out the details of the algorithm. ...

Reference:

A Hybrid Unsupervised Segmentation Algorithm for Arabic Speech Using Feature Fusion and a Genetic Algorithm (July 2018)
An evolutionary approach for segmentation of noisy speech signals for efficient voice activity detection

Artificial Intelligence Research

... Therefore, in most of recent studies [21,24,26], these two stages are the main concerns of researchers for AED due to larger diversity and variability of acoustic event compared with speech signal. Classification is routinely performed by statistical machine learning algorithms like Gaussian Mixture Model (GMM) [13], Hidden Markov Model (HMM) [31], Support Vector Machine (SVM) [25,35,39,42,44] or decision tree [18] that model class-specific feature distributions or estimate discriminative system parameters. Recently, some works using DNN as a classifier for AED were reported in [4,10,22]. ...

Structuring Sport Video through Audio Event Classification
  • Citing Conference Paper
  • September 2010

Lecture Notes in Computer Science

... Many works are aimed on the design of such complex systems that are able to classify the audio stream into speech and non-speech segments, as a basic separation requirement. For example, Swe et al. [2] proposed a new algorithm for robust speech and music discrimination using a single feature SampEn. It has been shown that windowed SampEn can be used as an effective feature for speech/music discrimination. ...

On the Discrimination of Speech/Music Using a Time Series Regularity
  • Citing Conference Paper
  • December 2008

... Based on specifics of the algorithm and VAD score, a speech/nonspeech decision is taken. Over the time, one or more different features of voice signals and techniques were included in VAD algorithms, such as energy or/and subband energy [2]- [4], entropy [5], correlation coefficients [6], wavelet transform [7] [8], Walsh basis function representation [9], long-term speech information [10]- [12], periodic to aperiodic component ratio [13] [14]. Also, in [15] it is shown that the image processing technique called "local binary pattern" could be used for VAD algorithms. ...

A New Speech/Non-Speech Classification Method using Minimal Walsh Basis Functions

... Cam et al. [10] have used the triplet Markov chain in a Bayesian framework to determine the breath boundaries and detect the phases. Feng et al. [11] have proposed segmentation of tracheal breath sounds and their phases using Walsh basis functions. Jin et al. [12] have proposed respiratory phase segmentation in single-channel tracheal sound through the multi-population genetic algorithm with sample entropy as an evaluation metric. ...

Application of Walsh Transform Based Method on Tracheal Breath Sound Signal Segmentation.
  • Citing Conference Paper
  • January 2008

... The proposed method therefore should be independent of amplitude variation between these two respiratory phases and able to perform accurate annotation without knowing the structures of the respiratory cycles. The presented method is applied on the consecutive inspiration/expiration segments as obtained by using the segmentation method presented in [10]. Each pair of consecutive respiratory phase segments are first aligned using phase shift information. ...

Phase Segmentation of Noisy Respiratory Sound Signals using Genetic Approach.
  • Citing Conference Paper
  • January 2008

... In [6], the researchers found that the signal entropy with respect to time can function as an effective fingerprint for audio data, providing highly competitive accuracy in recognition tasks. Entropy has also been used to distinguish speech from singing as a measure of "regularity" [11]. ...

An Efficient Approach for Classification of Speech and Music
  • Citing Conference Paper
  • December 2008

Lecture Notes in Computer Science

... Unlike our previously proposed single-stage approach [10], here we propose a two-stage GA-based segmentation approach. In the first stage, the original signal is reconstructed into a modified sequence using minimal binary Walsh basis functions. ...

Speech/Nonspeech Detection Using Minimal Walsh Basis Functions

EURASIP Journal on Audio, Speech, and Music Processing

... The problem of multivariate time-series segmentation has been approached from a number of directions. Omranian et al. (2015) suggested four broad categories of segmentation approaches which are particularly relevant to biological data: clustering (Abonyi et al., 2005;Maya et al., 2020), graphical models (Angelosante & Giannakis, 2011;Xuan & Murphy, 2007), genetic algorithms (Nikolaou et al., 2015;Pwint & Sattar, 2005), and regression (Chamroukhi et al., 2013;Omranian et al., 2015). These modeling approaches are parametric in nature. ...

A Segmentation Method for Noisy Speech Using Genetic Algorithm

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)