ABSTRACT: The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques for the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can be simply added to the reference set in real time, greatly improving its usability. The system was implemented in an FPGA-based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimum signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.
Full-text Article · Jul 2016 · IEICE Transactions on Information and Systems
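The in-field learning idea in the abstract above (a k-NN classifier over binary features, where a new sound is simply added to the reference set) can be illustrated with a minimal sketch. The Hamming distance, the class name `BinaryKNN`, and the toy feature vectors are assumptions made for illustration; the abstract does not state the distance measure or the feature layout used in the paper.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class BinaryKNN:
    """Hypothetical k-NN classifier over binary feature vectors."""

    def __init__(self, k=1):
        self.k = k
        self.refs = []  # list of (feature_vector, label) pairs

    def add_reference(self, features, label):
        # "In-field learning": a new sound is appended to the reference
        # set; no retraining step is needed.
        self.refs.append((tuple(features), label))

    def classify(self, features):
        # Take the k references closest in Hamming distance, then vote.
        nearest = sorted(self.refs, key=lambda r: hamming(r[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = BinaryKNN(k=1)
knn.add_reference((1, 1, 0, 0), "doorbell")
knn.add_reference((0, 0, 1, 1), "alarm")
print(knn.classify((1, 0, 0, 0)))  # closest reference is "doorbell"

# A new sound can be learned at any time by adding one more reference.
knn.add_reference((1, 0, 1, 0), "phone")
print(knn.classify((1, 0, 1, 0)))
```

Because adding a class only appends a reference, memory grows linearly with the number of stored sounds, which is consistent with the abstract's fixed limit on reference sounds for the embedded platform.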
ABSTRACT: This paper presents a Complex-Valued Neural Network-based sound localization method. The proposed approach uses two microphones to localize sound sources in the whole horizontal plane. The method uses time delay and amplitude difference to generate a set of features which are then classified by a Complex-Valued Multi-Layer Perceptron. The advantage of using complex values is that the amplitude information can naturally mask the phase information. The proposed method is analyzed experimentally with regard to the spectral characteristics of the target sounds and its tolerance to noise. The obtained results emphasize and confirm the advantages of using Complex-Valued Neural Networks for the sound localization problem in comparison to the traditional Real-Valued Neural Network model.
Article · Oct 2013 · IEICE Transactions on Information and Systems
ABSTRACT: Many applications would emerge from the development of artificial systems able to accurately localize and identify sound sources.
However, one of the main difficulties of such systems is the natural presence of mixed sound sources in real environments.
This paper proposes a pulsed neural network based system for extraction and recognition of target sound sources from background
sound sources. The system uses short-term depression, implemented by weight decay in the output layer and by changing
the weights by frequency component in the competitive learning network. Experimental results show that target sounds could
be successfully extracted and recognized.
ABSTRACT: Sound localization is an important ability intrinsic to animals, and is currently being explored by several researchers. Even though
several systems and implementations have been proposed, the majority are very complex and not suitable for embedded systems.
This paper proposes a new approach for binaural sound localization and the corresponding implementation in a Field Programmable
Gate Array (FPGA) device. The system is based on the signal processing modules of a previously proposed sound processing system,
which converts the input signal to spike trains. The time difference extraction and feature generation methods introduced
in this paper create simple binary feature vectors, used as training data for a standard LVQ neural network. An output temporal
layer uses the time information of the sound signals in order to reduce the misclassifications of the classifier. Preliminary
experimental results show high accuracy with small logic and memory requirements.
ABSTRACT: The detection of approaching vehicles is a very important topic in the development of complementary traffic safety systems. However, the majority of the proposed approaches are very complex and not suitable for embedded applications. This paper proposes a new approaching-sound detection algorithm specifically intended for hardware implementation. Experimental results show higher accuracy and earlier detection compared to other methods.
ABSTRACT: Several applications would emerge from the development of efficient and robust sound classification systems able to identify the nature of non-speech sound sources. This paper proposes a novel approach that combines a simple feature generation procedure, a supervised learning process and fewer parameters in order to obtain an efficient sound classification system solution in hardware. The system is based on the signal processing modules of a previously proposed sound processing system, which converts the input signal into spike trains. The feature generation method creates simple binary feature vectors, used as the training data of a standard LVQ neural network. An output temporal layer uses the time information of the sound signals in order to eliminate the misclassifications of the classifier. The result is a robust, hardware-friendly model for sound classification, presenting high accuracy for the eight sound source signals used in the experiments, while requiring small FPGA logic and memory resources.
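As a rough illustration of how a standard LVQ network can be trained on binary feature vectors, the following minimal sketch implements the textbook LVQ1 update rule. It is an assumption that the paper uses this variant; the function names, learning rate, and toy vectors are hypothetical, and the temporal output layer is not modeled.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(prototypes, x):
    """Index of the prototype whose weights are closest to x."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i][0], x))


def lvq1_train(prototypes, samples, lr=0.1, epochs=10):
    """LVQ1: pull the winning prototype toward a sample with the same
    label and push it away from a sample with a different label.

    prototypes: list of [weights, label]; samples: list of (x, label).
    """
    for _ in range(epochs):
        for x, label in samples:
            i = nearest(prototypes, x)
            w, proto_label = prototypes[i]
            sign = 1.0 if proto_label == label else -1.0
            prototypes[i][0] = [wi + sign * lr * (xi - wi) for wi, xi in zip(w, x)]
    return prototypes


def lvq_classify(prototypes, x):
    """Label of the prototype closest to x."""
    return prototypes[nearest(prototypes, x)][1]


# Toy binary feature vectors for two hypothetical sound classes.
samples = [([1, 1, 0, 0], "a"), ([0, 0, 1, 1], "b")]
prototypes = [[[0.6, 0.6, 0.4, 0.4], "a"], [[0.4, 0.4, 0.6, 0.6], "b"]]
lvq1_train(prototypes, samples)
print(lvq_classify(prototypes, [1, 1, 1, 0]))  # one flipped bit, still class "a"
```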
ABSTRACT: The Pulsed Neuron (PN) model was proposed as one of the simplest models working with pulse trains. The PN model has a membrane potential
to deal with temporal information, and its calculation process is inexpensive. However, as the output function of the PN model
is a unit step function, the PN model cannot directly use the back-propagation (BP) method. It would be possible to solve general
pattern recognition problems if the PN model could be trained by the BP method. In this paper, we propose a BP method for
multilayer pulsed neural networks. The proposed method uses the duality of the PN model, in which the desired output of a hidden
layer neuron is calculated from the output layer neurons’ weights and outputs. Experimental results show that multilayer pulsed
neural networks can learn and recognize non-linear problems using the proposed method.
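To make the role of the membrane potential and the unit step output concrete, here is a minimal sketch of a single pulsed neuron. The exponential decay, the parameter names (`tau`, `dt`, `threshold`), and the absence of a refractory period are simplifying assumptions and may differ from the PN model used in the paper; the proposed BP training method is not shown.

```python
import math


def pulsed_neuron(inputs, weights, threshold, tau=5.0, dt=1.0):
    """Simulate one pulsed neuron over a sequence of input pulses.

    inputs: list of time steps, each a list of 0/1 pulses (one per synapse).
    Returns the list of 0/1 output pulses, one per time step.
    """
    decay = math.exp(-dt / tau)          # per-step decay of each local potential
    local = [0.0] * len(weights)         # one local potential per synapse
    outputs = []
    for pulses in inputs:
        # An input pulse adds its weight; past contributions decay.
        local = [w * x + decay * p for w, x, p in zip(weights, pulses, local)]
        potential = sum(local)           # membrane (inner) potential
        # Unit step output: fire only when the potential reaches the threshold.
        outputs.append(1 if potential >= threshold else 0)
    return outputs


pulses = [[1, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
print(pulsed_neuron(pulses, weights=[1.0, 1.0], threshold=1.6))
```

The step function in the last line of the loop is what makes the output non-differentiable, which is the obstacle to plain back-propagation that the abstract refers to.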
ABSTRACT: Modern applications of pattern recognition generate very large amounts of data, which require large computational effort to process. However, the majority of the methods intended for large-scale problems aim to merely adapt standard classification methods without considering whether those algorithms are appropriate for large-scale problems. CombNET-II was one of the first methods specifically proposed for this kind of task. Recently, an extension of this model, named CombNET-III, was proposed. The main modifications over the previous model were the substitution of the expert networks by Support Vector Machines (SVM) and the development of a general probabilistic framework. Although the previous model's performance and flexibility were improved, the low accuracy of the gating network was still compromising CombNET-III's classification results. In addition, due to the use of SVM-based experts, the computational complexity is higher than that of CombNET-II. This paper proposes a new two-layered gating network structure that reduces the compromise between number of clusters and accuracy, increasing the model's performance with only a small complexity increase. This high-accuracy gating network also enables the removal of the low-confidence expert networks from the decoding procedure. This, in addition to a new, faster strategy for calculating multiclass SVM outputs, significantly reduces the computational complexity. Experimental results on problems with a large number of categories show that the proposed model outperforms the original CombNET-III, while presenting a computational complexity more than one order of magnitude smaller. Moreover, when applied to a database with a large number of samples, it outperformed all compared methods, confirming the proposed model's flexibility.
Full-text Article · Feb 2008 · IEICE Transactions on Information and Systems
ABSTRACT: Several applications would emerge from the development of artificial systems able to accurately localize and identify sound
sources. This paper proposes an integrated sound localization and classification system based on the human auditory system
and a corresponding compact hardware implementation. The proposed models are based on spiking neurons, which are suitable for
processing time series data, like sound signals, and can be easily implemented in hardware. The system uses two microphones,
extracting the time difference between the two channels with a chain of coincidence detection spiking neurons. A spiking neural
network processes the time-delay pattern, giving a single directional output. Simultaneously, an independent spiking neural
network processes the spectral information of one audio channel in order to classify the source. Experimental results show that
the proposed system could successfully locate and identify several sound sources in real time with high accuracy.
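The time difference extraction described above can be sketched as a delay line in which each candidate lag has one coincidence detector that counts spikes arriving together from the two channels. This is a simplified, non-spiking illustration of the general idea; the function names, the sample-based lag, and the toy spike trains are assumptions, not the paper's implementation.

```python
def coincidence_counts(left, right, max_lag):
    """Count coincident spikes for each candidate lag (in samples).

    left, right: equal-rate binary spike trains (lists of 0/1).
    A positive lag means the right channel is delayed relative to the left.
    """
    counts = {}
    n = min(len(left), len(right))
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            u = t + lag
            # One detector per lag: fires when both channels spike together
            # after the left channel has been delayed by `lag` samples.
            if 0 <= u < n and left[t] and right[u]:
                total += 1
        counts[lag] = total
    return counts


def estimate_lag(left, right, max_lag):
    """Lag whose coincidence detector fired most often."""
    counts = coincidence_counts(left, right, max_lag)
    return max(counts, key=counts.get)


left = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
right = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]  # same pattern, two samples later
print(estimate_lag(left, right, max_lag=3))
```

In a full system, the per-lag firing pattern (rather than only the best lag) would be passed on to the network that produces the directional output.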
ABSTRACT: Current automobile safety systems based on video cameras and movement sensors fail when objects are out of the line of sight.
This paper proposes a system based on pulsed neural networks able to detect whether a sound source is approaching a microphone
or moving away from it. The system, based on PN models, compares the sound level difference between consecutive instants of
time in order to determine its relative movement. Moreover, the combined level difference information of all frequency channels
makes it possible to identify the type of the sound source. Experimental results show that, for three different vehicle sounds, the
relative movement and the sound source type could be successfully identified.
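The core idea of comparing sound levels at consecutive instants can be illustrated with a minimal, non-neural sketch. The majority vote, the `margin` parameter, and the dB values are assumptions for illustration; the paper performs this comparison with pulsed neurons and per frequency channel, which is not modeled here.

```python
def level_differences(levels):
    """Differences between sound levels at consecutive time instants."""
    return [b - a for a, b in zip(levels, levels[1:])]


def relative_movement(levels, margin=0.5):
    """Vote over consecutive level differences.

    levels: sound level per time frame (e.g. in dB) for one channel.
    margin: hypothetical dead band that ignores small fluctuations.
    """
    diffs = level_differences(levels)
    rising = sum(1 for d in diffs if d > margin)
    falling = sum(1 for d in diffs if d < -margin)
    if rising > falling:
        return "approaching"
    if falling > rising:
        return "moving away"
    return "undetermined"


print(relative_movement([50.0, 52.0, 55.0, 59.0]))  # level keeps rising
print(relative_movement([60.0, 57.0, 55.0]))        # level keeps falling
```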
ABSTRACT: Pulsed neurons are suitable for processing time series data, like sound signals, and can be easily implemented in hardware. In this paper, we propose an aural information processing system based on the human auditory system using a pulsed neuron model and a corresponding implementation in an FPGA device. Experimental results show that an FPGA-based implementation of the proposed system can successfully obtain results faster than a similar software implementation. Noise tolerance experimental results are also presented.
ABSTRACT: Many applications would emerge from the development of artificial systems able to accurately localize and identify sound sources.
However, one of the main difficulties of such systems is the natural presence of multiple sound sources in real environments.
This paper proposes a pulsed neural network based system for separation and recognition of multiple sound sources based on
the difference in the time lag of the different sources. The system uses two microphones, extracting the time difference between
the two channels with a chain of coincidence detection pulsed neurons. An unsupervised neural network processes the firing
information corresponding to each time lag in order to recognize the type of the sound source. Experimental results show that
three simultaneous musical instruments’ sounds could be successfully separated and recognized.
ABSTRACT: Chronic hepatitis C is a disease that is difficult to treat. At present, interferon might be the only drug that can cure this kind of disease, but its efficacy is limited and patients face the risk of side effects and high expense, so doctors considering interferon must make a serious choice. The purpose of this study is to establish a simple model and use clinical data to predict interferon efficacy. The model combines feature subset selection with a Support Vector Machine (SVM) classifier. The study indicates that when five features have been selected, the identification by the SVM is as follows: the identification rate for the effective group is 85%, and for the ineffective group 83%. Analysis of the selected features shows that HCV-RNA level, hepatobiopsy, HCV genotype, ALP and CHE are the most significant features. The results thus serve as a reference for doctors when they make decisions regarding interferon treatment.
Full-text Article · May 2007 · Journal of Medical Systems
ABSTRACT: X-ray CT has a high resolution with respect to the absorption coefficient, while its spatial resolution is inferior to that of other X-ray imaging techniques. Consequently, the improvement of its spatial resolution is strongly desired. This paper theoretically analyzes the frequency component of the projection data obtained by the third-generation CT system using a wide-angle fan beam. It is shown that the projection data contain an effective frequency component above the Nyquist frequency determined by the spacing between detectors. Then a new algorithm is proposed which utilizes that effective frequency component. The usefulness of the algorithm is examined by an experiment using a phantom and the human body. The results show that the new algorithm can realize a spatial resolution corresponding to approximately twice the Nyquist frequency determined by the detector spacing. Using this algorithm, the spatial resolution of the CT image can be improved without decreasing the detector spacing.
Article · Mar 2007 · Systems and Computers in Japan
ABSTRACT: Although liver biopsy is currently regarded as the gold standard for staging liver fibrosis in chronic hepatitis C (CHC), it is a costly invasive procedure and carries a small risk of complications. Our aim in this study was to construct a simple model to distinguish between patients with no or mild fibrosis (METAVIR F0-F1) and those with clinically significant fibrosis (METAVIR F2-F4). We retrospectively studied 204 consecutive CHC patients. Thirty-four serum markers, together with age, gender, and duration of infection, were assessed to classify fibrosis with a classifier known as the support vector machine (SVM). A feature selection method known as sequential forward floating selection (SFFS) was applied before SVM classification. When four serum markers were extracted with SFFS-SVM, F2-F4 could be predicted accurately in 96% of cases. Our study showed that application of this model could identify CHC patients with clinically significant fibrosis with a high degree of accuracy and may decrease the need for liver biopsy.
ABSTRACT: Several research fields have to deal with very large classification problems, e.g. handwritten character recognition and speech recognition. Many works have proposed methods to address problems with a large number of samples, but little work has been done on problems with a large number of classes. CombNET-II was one of the first methods proposed for this kind of task. It consists of a sequential clustering VQ-based gating network (stem network) and several Multilayer Perceptron (MLP) based expert classifiers (branch networks). With the objectives of increasing the classification accuracy and providing a more flexible model, this paper proposes a new model based on the CombNET-II structure, the CombNET-III. The new model, intended for, but not limited to, problems with a large number of classes, replaces the MLP branch networks with multiclass Support Vector Machines (SVM). It also introduces a new probabilistic framework that outputs posterior class probabilities, enabling the model to be applied in different scenarios (e.g. together with Hidden Markov Models). These changes permit the use of a larger number of smaller clusters, which reduces the complexity of the final classifiers. Moreover, the use of binary SVMs with probabilistic outputs and a probabilistic decoding scheme permits the use of a pairwise output encoding on the branch networks, which reduces the computational complexity of the training stage. The experimental results show that the proposed model outperforms both the previous model CombNET-II and a single multiclass SVM, while presenting considerably smaller complexity than the latter. It is also confirmed that CombNET-III classification accuracy scales better with the increasing number of clusters, in comparison with CombNET-II.
Full-text Article · Sep 2006 · IEICE Transactions on Information and Systems
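The stem/branch structure described above, with a gating network selecting clusters and expert classifiers producing class probabilities, can be illustrated with a generic mixture-style combination of posteriors. This is only a sketch of the general idea; the abstract does not give the exact probabilistic formulation used by CombNET-III, and the function name and toy probabilities are hypothetical.

```python
def combine_posteriors(gate_probs, expert_probs):
    """Combine gating and expert outputs into class posteriors.

    gate_probs: list with P(cluster j | x) for each cluster j.
    expert_probs: list of dicts mapping class -> P(class | x, cluster j).
    Returns a dict mapping class -> sum_j P(cluster j | x) * P(class | x, cluster j).
    """
    combined = {}
    for gate, probs in zip(gate_probs, expert_probs):
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + gate * p
    return combined


# Two clusters, each with its own expert over a subset of the classes.
gate = [0.8, 0.2]
experts = [{"a": 0.9, "b": 0.1}, {"c": 0.7, "d": 0.3}]
posterior = combine_posteriors(gate, experts)
print(max(posterior, key=posterior.get))  # most probable class overall
```

Since each expert only sees the classes of its own cluster, a weak gating network directly limits the final accuracy, which matches the motivation given in the abstracts for improving the stem network.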
ABSTRACT: The linear gating classifier (stem network) of the large-scale model CombNET-II has always been the limiting factor which restricts the number of expert classifiers (branch networks). The linear boundaries between its clusters cause a rapid decrease in performance with an increasing number of clusters and, consequently, impair the overall performance. This work proposes the use of a non-linear classifier to learn the complex boundaries between the clusters, which increases the gating performance while keeping the balanced split of samples produced by the original sequential clustering algorithm. The experiments have shown that, for some problems, the proposed model outperforms the monolithic classifier.