Tapankumar Basu

  • Managing Director at Retd, Prof., IIT Kharagpur

About

96
Publications
6,121
Reads
652
Citations
Current institution
Retd, Prof., IIT Kharagpur
Current position
  • Managing Director

Publications (96)
Article
Full-text available
The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in the presence of duration variability. This article also reports a comparison of the performance of the GMM-SVM classifier wit...
Chapter
This paper describes the impact of spoken language and emotional variation in a multilingual speaker identification (SID) system. The development of speech technology applications in low resource languages (LRL) is challenging due to the unavailability of proper speech corpus. This paper illustrates performance analysis of SID in six Eastern and No...
Article
Full-text available
Research and development of speech technology applications in low-resource languages (LRL) are challenging due to the non-availability of proper speech corpus. Especially, for most of the Indian languages, the amount and type of data found in different digital sources are sparse and prior works are too few to serve the purpose of large-scale develo...
Chapter
In the Indian scenario, the inadequacy of digitally available resources of language restricts the expansion of speech technology applications. This paper describes an experimental study of two such low-resource tribal languages (LRTL) of India, Santali, and Hrangkhawl for language identification (LID) purposes. Two different approaches have been ta...
Preprint
The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in the presence of duration variability. This article also reports a comparison of the performance of the GMM-SVM classifier wit...
Conference Paper
Full-text available
This paper describes the subjective evaluation of a Bengali emotional speech corpus for the application of automatic emotion recognition. To develop emotional speech corpora, it is important that all the basic and natural emotions assumed to be produced by humans are considered. A speech corpus containing seven basic human emot...
Conference Paper
Full-text available
This paper describes the vowel characteristics of five languages of Nagaland, namely Nagamese, Ao, Lotha, Sumi and Angami. Vowel duration and formant structure (first and second formants, i.e. F1 and F2) are investigated and analyzed for these languages. A detailed analysis is carried out on six vowels, namely /u/, /o/, /ə/, /a/, /e/, /i/, from readout spee...
Article
Full-text available
In this paper, a procedure to recover the Fourier phase of a signal from the phase of its bispectrum, namely the Matsuoka and Ulrych algorithm, is first presented. It is also shown that the Matsuoka and Ulrych algorithm fails when the wrapped bispectral phase is used. An alternative scheme which reduces the computational complexity is presented....
Conference Paper
Full-text available
This paper describes the implementation of an unsupervised speaker segmentation and clustering system. The main objective of the work is to study the performance of a speaker diarization system using a new feature set called Temporal Energy of Subband Cepstral Coefficients (TESBCC) and pitch-based features. The system first classifie...
Conference Paper
Full-text available
The study of closed set text-independent speaker identification using whisper speech is presented in this paper. A new feature called temporal Teager energy based sub band cepstral coefficients (TTESBCC) is proposed. The work presented compares the performance of four feature sets: Mel frequency cepstral coefficients (MFCC), temporal energy of sub...
Conference Paper
Full-text available
One of the challenging and difficult problems under the category of Music Information Retrieval (MIR) is to identify a singer of a given song under the strong influence of instrumental sounds. The performance of Singer Identification (SID) system is also severely affected by the quality of recording devices, transmission channels and singing voice(...
Conference Paper
This work investigates whether vocal emotion expressions of (i) discrete emotion can be distinguished from 'no-emotion' (i.e. neutral), (ii) one discrete emotion can be distinguished from another, (iii) surprise, which is actually a cognitive component that could be present with any emotion, can also be recognized as a distinct emotion, (iv) dis...
Conference Paper
Cases of cybercrime and terrorism on IP networks are increasing day by day. In addition, there is a tendency to defraud phone-banking systems and gain access to secure premises or accounts, which may be protected through a voice-based biometric system. To minimize these problems, we need a voice/speaker recognition system with utmost accuracy. Number...
Chapter
In this paper dynamic stability analysis of power system is investigated considering proportional-integral power system stabilizer (P-I PSS) for two-area power system. Gains of P-I PSS are optimized by minimizing an objective function using genetic algorithm (GA). Participation factor method is used to find out the suitable location of PSS. Analysi...
Article
The study of text-independent speaker identification in emotional environments is presented in this paper. The study includes identifying the speaker using speech samples in five basic emotions viz. anger, happiness, sadness, disgust, and fear. The work presented compares the performance of four feature sets: Mel frequency cepstral coefficients (MF...
Article
In this paper, various temporal features (i.e., zero crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) have been derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features has been analyzed for the classification of normal and dysphonic voices by compar...
Chapter
This paper compares performances between Gaussian Mixture Model (GMM) classifier and polynomial classifier for text-independent speaker identification. The MFCC feature set has been used for this comparison. Experimental evaluation was conducted on the POLYCOST database with 130 speakers. The importance of the prior in the polynomial classifier has...
Conference Paper
A query-by-humming (QBH) system retrieves the original song or music from its hummed tune. In this paper, we present a novel Derivative Dynamic Time Warping (DDTW) based method for querying desired songs in Hindi (an Indian language) from a database by humming the tune. The system presented here uses both intuitive as w...
Article
Full-text available
This paper presents the use of a fuzzy min-max neural network for text-independent speaker identification. The fuzzy min-max neural network utilizes fuzzy sets as pattern classes. It is a three-layer feedforward network that grows adaptively to meet the demands of the problem. The database containing speech utterances recorded from fifty speakers...
Conference Paper
This paper compares the feature sets extracted using frequency-time analysis approach and time-frequency analysis approach for text-independent speaker identification. The impetus for the frequency-time analysis approach comes from the band pass filtering view of STFT. Nyquist filter bank and Gaussian filter bank both have been used for extracting...
Article
This paper demonstrates the relation between the flatness index of the eigenvalues of the covariance matrix of the feature vectors and the correlation between the features in a multidimensional feature space. The constant-distance loci of the Mahalanobis metric have been used to interpret the relation. The intuitive interpretation of the flatness index and...
Conference Paper
This paper compares the feature sets extracted using time-frequency analysis approach and frequency-time analysis approach for text-independent speaker verification. Mel-frequency cepstral coefficient (MFCC) feature set is extracted using time-frequency analysis approach. Temporal energy subband cepstral coefficient (TESBCC) feature set is extracte...
Article
This paper compares the feature sets extracted using time-frequency analysis approach and frequency-time analysis approach for text-independent speaker identification. Mel-frequency cepstral coefficient (MFCC) feature set and Inverted Mel-frequency cepstral coefficient (IMFCC) feature set are extracted using time-frequency analysis approach. Tempo...
Article
This paper demonstrates the use of two new methods of feature extraction called temporal energy of subband cepstral coefficient (TESBCC) and temporal correlation of subband cepstral coefficient (TCSBCC) for text-independent speaker identification. The focus of this work is on applications which yield higher identification accuracy without increasin...
Article
This paper introduces the use of a new method of feature extraction based on frequency-time analysis approach for text-independent speaker identification. The impetus for this new feature extraction technique comes from the filter bank summation method of STFT using Nyquist filter bank. The focus of this work is on applications which yield higher i...
Article
In this paper, the coordinated operation of conventional power system stabilizer (CPSS) and thyristor controlled series capacitor (TCSC) is studied. The analysis of mode controllability is used to select the effective location for TCSC. The performance of TCSC equipped with a phase lead-lag controller is investigated. The controllers design problem...
Conference Paper
This paper introduces the use of a new method of feature extraction for robust text-independent speaker identification. The focus of this work is on applications which yield higher identification accuracy without increasing the computational effort. The impetus for this new feature extraction technique comes from a new transformation which is based...
Article
Full-text available
In this paper, tuning of power system stabilizer (PSS) and thyristor controlled series capacitor (TCSC) is studied. The analysis of mode controllability is used to select the effective location for TCSC. The performances of TCSC equipped with a proportional-integral-derivative controller (P-I-D controller) and proportional-integral-derivative power...
Conference Paper
This paper proposes a new method of feature extraction for robust text-independent speaker identification. The focus of this work is on applications which yield higher identification accuracy without increasing the computational effort. The impetus for this new feature extraction technique comes from a new transformation. We have proposed this tran...
Conference Paper
This paper introduces a new Nyquist window. The proposed window has been compared with the Gaussian window. The time-bandwidth product of the proposed window is very close to the time-bandwidth product of the Gaussian window.
Conference Paper
This work investigates whether vocal emotion expressions of (i) discrete emotion can be distinguished from 'no-emotion' (i.e. neutral), (ii) one discrete emotion can be distinguished from another, (iii) surprise, which is actually a cognitive component that could be present with any emotion, can also be recognized as a distinct emotion, (iv) discrete emotion be...
Article
Most physical systems exhibit some degree of time-varying behavior, for a number of reasons. Some systems are inherently time-varying and cannot be modeled effectively using time-invariant models. This paper deals with the identification of time-varying systems using Haar basis functio...
Conference Paper
This paper deals with the use of a Kalman filter approach for the identification of linear fast time-varying processes. Most physical processes exhibit some degree of time-varying behavior, for a number of reasons. Some processes are inherently time-varying and cannot effectively be modeled...
Conference Paper
In this paper, a new method of machine learning, viz., Modified Polynomial Networks (MPN), is proposed for the Dialect Recognition (DR) problem in an Indian language, viz., Marathi. The proposed algorithm for machine learning is interpreted as designing a neural network by viewing it as a curve-fitting (approximation) problem in a high-dimensional sp...
Article
Automatic Speaker Recognition (ASR) refers to the task of identifying a person based on his or her voice with the help of machines. ASR finds its potential applications in telephone based financial transactions, purchase of credit card and in forensic science and social anthropology for the study of different cultures and languages. Results of ASR...
Article
Automatic Speaker Recognition (ASR) is an economic tool for voice biometrics because of availability of low cost and powerful processors. For an ASR system to be successful in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics which may be either identical twins or professional m...
Article
The present work investigates the following specific research questions concerning voice emotion recognition: whether vocal emotion expressions of discrete emotion (i) can be distinguished from no-emotion (i.e. neutral), (ii) can be distinguished from another, (iii) of surprise, which is actually a cognitive component that could be present with any em...
Conference Paper
This work investigates whether vocal emotion expressions of full-blown discrete emotions can be recognized cross-lingually. This study will enable us to get more information regarding nature and function of emotion. Furthermore, this work will help in developing a generalized vocal emotion recognition system, which will increase the efficiency requ...
Article
In this paper, the basic principle of parameter identification of TVARX model using Recursive method with Forgetting Factor is given. Physical phenomena exhibit nonstationary or time varying behavior for a number of reasons. System identification is an experimental approach for determining the dynamic model of a system. One of the key elements for...
Conference Paper
This work investigates whether vocal emotion expression of discrete emotion (i) can be recognized cross-lingually, (ii) of surprise, which is actually a cognitive component that could be present with any emotion, can also be recognized as a distinct emotion. This study will enable us to get more information regarding nature and function of emotion....
Conference Paper
This paper deals with the identification of slowly time-varying systems using Legendre basis functions. Physical phenomena exhibit time-varying behaviour for a number of reasons. To model such systems, models with time-dependent parameters are required. In this paper, a system with slowly varying parameters is considered for identification. Th...
Conference Paper
This paper presents a method based on Gaussian mixture model (GMM) classifier and Mel-frequency cepstral coefficients (MFCC) as features for emotion recognition from Assamese speeches. For training and testing of the method, data collection is carried out in Jorhat (Assam, India), which consisted of acted speeches of one short emotionally biased se...
Chapter
In this paper, a new method of classifier design, viz., Modified Polynomial Networks (MPN) is developed for the Language Identification (LID) problem. The novelty of the proposed method consists of building up language models for each language using the normalized mean of the training feature vectors for all the speakers in particular language clas...
Conference Paper
Language Identification (LID) refers to the task of identifying an unknown language from the test utterances. In this paper, a new feature set, viz., T-MFCC, is developed by amalgamating the Teager Energy Operator (TEO) and the well-known Mel frequency cepstral coefficients (MFCC). The effectiveness of the newly derived feature set is demonstrated for identi...
Conference Paper
Stroke point matching by dynamic time warping (DTW) and stroke-wise segment alignment using spline interpolation for time and coordinate signals are presented in this paper. The algorithm uses both feature-based and signal-based verification methodologies. The top level in signature representation deals with the global geometric shape. This m...
Conference Paper
Full-text available
Language Identification (LID) refers to the task of identifying an unknown language from the test utterances. In this paper, a new method of feature extraction, viz., Teager Energy Based Mel Frequency Cepstral Coefficients (T-MFCC) is developed for identification of phonetically similar languages. Finally, an LID system is presented for Hindi and U...
Conference Paper
Automatic Speaker Recognition (ASR) deals with the identification or verification of a person's identity from his or her voice with the help of machines. A typical ASR system consists of two major blocks, viz., feature extractor and pattern classifier. The feature extractor does the job of mapping speech signals into speaker-specific features where...
Article
This correspondence proposes a novel nonlinear adaptive algorithm named as filtered-s least mean square (FSLMS) algorithm for multichannel active control of nonlinear noise processes. A reduced complexity FSLMS algorithm using filter bank approach is also suggested. The performance of the proposed algorithm is validated through computer simulations...
Conference Paper
Full-text available
In this paper, a new method of feature extraction based on design of cubic spline wavelet has been described. Dialectal zone based speaker classification in Marathi language has been attempted in the open set mode using polynomial classifier. The method consists of dividing the speech signal into nonuniform subbands in approximate Mel-scale using a...
Article
Full-text available
Automatic Speaker Recognition (ASR) is an economic tool for voice biometrics because of availability of low cost and powerful processors. For an ASR system to be successful in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics which may be either identical twins or professional m...
Article
Voice biometrics is an economic method of person authentication with the help of machines because of low cost and high power computers. In this paper, we investigate the problem of spectral resolution in female speech for speaker identification. Finally, a speaker recognition system is presented to compare the relative performance of different LP-b...
Article
Automatic speaker recognition (ASR) refers to the detection of a person's identity from his/her voice with the help of machines. ASR finds its potential applications in telephone based financial transactions, purchase of credit card, information retrieval for surveillance in defense and intelligence organization and study of different dialects in a...
Article
Greenhouse is the most practical method of achieving the objectives of protected agriculture where the natural environment is modified by using sound engineering principles to achieve optimum plant growth and yield. The methodology proposed in this paper applies artificial intelligence (AI) techniques to the modeling and control of some climate var...
Conference Paper
Full-text available
The goal of this paper is to present our work on the analysis of speech and handwriting biometrics related to metadata, which are based on the one hand on system hardware specifics (technical metadata) and on the other hand on personal attributes (non-technical metadata). System-related metadata represent physical characteristics of biometric senso...
Article
In the present work, dynamic stability analysis of power system is investigated considering proportional-integral-derivative power system stabilizer (P-I-D PSS). Gain settings of the P-I-D stabilizers are optimized at several operating conditions by minimizing an objective function using genetic algorithm (GA). Dynamic responses for a step change i...
Conference Paper
In this paper, the effect of speech coding on the performance of a speaker identification system is presented. Experiments were performed with the G711, G726 and GSM-FR standard coding algorithms. Results are shown for mel frequency cepstrum coefficients (MFCC) as the feature set and a polynomial classifier of 3rd-order approximation for modelling of speaker....
Conference Paper
In this paper, a new feature set T-MFCC amalgamating Teager energy operator (TEO) and mel frequency cepstral coefficients (MFCC) is developed. The effectiveness of the newly derived feature set is demonstrated for the identification of twins in Marathi. The results are also compared with linear prediction cepstral coefficients (LPCC) for polynomial...
Conference Paper
In this paper, a new method of feature extraction based on multirate signal processing and wavelet analysis using a polynomial classifier has been described. Dialectal zone based speaker classification in Marathi language has been attempted in the open set mode. The method consists of dividing the speech signal into nonuniform subbands in approxima...
Conference Paper
In the present work, dynamic stability analysis of power system is investigated considering proportional-integral-derivative power system stabilizer (P-I-D PSS) for multimachine power systems. Gain settings of P-I-D PSS are optimized by minimizing an objective function using genetic algorithm (GA). Dynamic responses are also compared considering P-...
Conference Paper
In this paper, the application of thyristor controlled phase shifter (TCPS) in damping power system oscillation is investigated. Analysis is carried out considering TCPS equipped with conventional lead-lag controller and with proportional-integral-derivative (P-I-D) controller. Parameters and gain settings of the TCPS controller are optimized using...
Conference Paper
Automatic Speaker Recognition (ASR) has been an active area of research in speech processing. ASR deals with the identification of a person's voice with the help of machines. An important question which must be answered for the ASR system is how well the system resists effects of determined mimics especially identical twins or triplets. In this pap...
Conference Paper
Full-text available
Today, biometric techniques are based on either passive (e.g. IrisScan, Face) or active methods (e.g. voice and handwriting). In our work, we focus on the evaluation of the latter. These methods, also described as behavioral biometrics, are characterized by a trait that is learnt and acquired over time. Several approaches for user authentication have been...
Conference Paper
In this paper, a new method of feature extraction based on perceptually meaningful subband decomposition of speech signal has been described. Dialectal zone based speaker classification in Marathi language has been attempted in the open set mode using a polynomial classifier. The method consists of dividing the speech signal into nonuniform subband...
Conference Paper
Automatic Speaker Recognition (ASR) is an economic method of biometrics because of the availability of low cost and powerful processors. An important question which must be answered for the ASR system is how well the system resists the effects of determined mimics such as those based on physiological characteristics especially identical twins or tr...
Article
Full-text available
In this paper we summarize our first research results in the field of Cross Culture user authentication. We will investigate intercultural aspects of biometrics, both of technical and legal nature. Besides biometric based user authentication, Human-to-Computer interfaces are an important part of our work. We present a methodology for intercultural...
Article
Medium range forecasts of power system load for a span of one day to one week are often required for the planning of short term maintenance schedules of unit auxiliaries and peaking stations as well as for maintaining security constraints and minimizing operational costs. A time series model of multiplicative SARIMA (seasonal autoregressive integra...
Article
The most commonly used Box-Jenkins approach for forecasting purposes has been modified to account for seasonality, employing a powerful orthogonal transform called the Walsh transform. The non-stationarity has been identified and after removing trends, the series is divided into segments of one period length. The segments have been transformed and...
Article
Medium range forecasts of hourly load demand for a span of 24 h to 168 h (one week) are required for the preparation of short term maintenance schedules of unit auxiliaries and peaking stations apart from maintaining security constraints and minimizing operational costs. Forecasts of the daily load for seven days by the available multiplicative SAR...
Article
The most popular Box-Jenkins method, generally used for short-term forecasting, is modified to make it suitable for medium and long-range forecasting. The non-stationarity and seasonality have been identified and, after removing trends and/or seasonality, the series are tested for stationarity by various methods. The series have been fitted for dif...
Article
Interconnected power systems containing large hydro-stations experience building up of oscillations following disturbances. These, at times, cause instability rendering the system deficient in generation. It is thought that negative damping caused by the turbine due to water inertia effect, is chiefly responsible for such behavior. Compensation sch...
Article
Hydroturbines introduce negative damping in power systems, sometimes giving rise to instability. Earlier works have demonstrated the effectiveness of compensation schemes in improving the dynamic stability of the system under parameter variations. But the improvement in the stability limit may not be the only desirable feature of a system's perform...
Article
Large hydrostations in an interconnected power system show oscillation following a disturbance. These oscillations may build up after a long time and may cause stability problems. The effects of governor turbine control loops on the system stability were studied. Two compensating schemes for the turbine model have been suggested in this paper, cons...
Article
Hydroturbines introduce negative damping in power systems because of water inertia effect reflected in the numerator to their transfer functions. Compensation schemes for the hydroturbine may counter the negative damping effect to a large extent, and further improvement in the stability may be achieved by additional stabilizing signals. This paper...
Article
Following a disturbance, large hydro stations set up oscillations in the power system, which may build up after a long time and may cause instability. The inherent destabilising effects of the hydroturbine due to the water inertia can be countered by selecting suitable compensating schemes. The effectiveness of such a scheme for improving transient sta...
Article
The paper describes the development of a mathematical model in the d-q-0 frame of reference for the analysis of unbalanced operation of synchronous machines. The equations for faults between two phases and between one line and the generator neutral are described. The solution of the equations of the model by digital computer is presented. Initially...
Article
Full-text available
In this paper a new approach for combining biometric authentication and digital watermarking is presented. A digital audio watermark method is used to embed metadata into the reference data of biometric speaker recognition. Metadata in our context may consist of feature template representations complementary to the speech modality, such as iris cod...
Article
Full-text available
This paper describes a technique for unsupervised audio segmentation. The main objective of the work is to study the performance of an audio segmentation system using a metric-based method. The system first classifies the audio signal into speech and nonspeech using the variance of the zero crossing rate. The feature Line spectral pai...
Article
Full-text available
Speaker Recognition (SR) is an economic method of biometrics because of the availability of low-cost and high-power computers. An important question which must be answered for the SR system is how well the system resists the effects of determined mimics such as those based on physiological characteristics, especially identical twins or triplets. In thi...
Article
Full-text available
Automatic Speaker Recognition (ASR) is an economic method of biometrics because of the availability of low-cost and powerful processors. Results of ASR are highly dependent on the database, i.e., the results obtained in an ASR system are meaningless if the recording conditions are not standard. In this paper, a methodology and a typical experime...
