
Tanujit Chakraborty
Université Paris Sorbonne Abu Dhabi · Mathematics
Ph.D. from Indian Statistical Institute
Research Interests: Machine Learning, Time Series Forecasting, and Statistical Analysis of Networks.
About
Publications: 49
Reads: 7,423
Citations: 660
Introduction
I am broadly interested in various areas of Statistics and Machine Learning with real-life applications. My research involves developing statistical methodologies for "data-driven problems" from various applied disciplines (e.g., Epidemiology, Biology, Business, Software Reliability, Quality Engineering, Macroeconomics, Nonlinear Dynamics, and Network Analysis). My research interests lie in Hybrid Machine Learning, Time Series Forecasting, and Statistical Analysis of Networks.
Additional affiliations
January 2022 - present
Publications (49)
Dengue fever is a virulent disease spreading over 100 tropical and subtropical countries in Africa, the Americas, and Asia. This arboviral disease affects around 400 million people globally, severely distressing the healthcare systems. The unavailability of a specific drug and ready-to-use vaccine makes the situation worse. Hence, policymakers must...
Deep Learning has received increased attention due to its unbeatable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open...
Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time s...
Infectious diseases remain among the top contributors to human illness and death worldwide, among which many diseases produce epidemic waves of infection. The unavailability of specific drugs and ready-to-use vaccines to prevent most of these epidemics makes the situation worse. These force public health officials, health care providers, and policy...
Forecasting time series data is an emerging field of data science with applications ranging from stock price and exchange rate prediction to the early prediction of epidemics. Numerous statistical and machine learning methods have been proposed in the last five decades with the demand for generating high-quality and reliable forecasts....
An unprecedented outbreak of the novel coronavirus (COVID-19) in the form of peculiar pneumonia has spread globally since its first case in Wuhan, China, in December 2019. Soon after, the infected cases and mortality increased rapidly. The future of the pandemic’s progress was uncertain, and thus, predicting it became crucial for public he...
A new method has been proposed to generalize Burr-XII distribution, also called Burr distribution, by adding an extra parameter to an existing Burr distribution for more flexibility. In this method, the exponent of the Burr distribution is modeled using a nonlinear function of the data and one additional parameter. The models of this newly introduc...
In this study, we consider large-scale network data sets from different disciplines, namely social networks, collaboration networks, web graphs, citation networks, biological networks, product co-purchasing networks, temporal networks, communication networks, ground-truth networks, and brain networks. We study several individual data sets from each...
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting more than 200 countries and territories worldwide. As of September 30, 2020, it has caused a pandemic outbreak with more than 33 million confirmed infections, and more than 1 million reported deaths worldwide. Several statistical, machine...
Perhaps the most recent controversial topic in network science research is to determine whether real-world complex networks are scale-free or not. Recently, Broido and Clauset [A.D. Broido, A. Clauset, Nature Communications, 10, 1017 (2019)] asserted that the degree distributions of real-world networks are rarely power law under statistical tests. S...
Real-world networks are generally claimed to be scale-free. This means that the degree distributions follow the classical power-law, at least asymptotically. However, closer observation shows that the classical power-law distribution is often inadequate to meet the data characteristics due to the existence of an identifiable nonlinearity in the ent...
The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation of their application in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble de...
Time series forecasting has been a perpetual topic of research in statistical machine learning for the last five decades. Due to the unprecedented outbreak of the novel coronavirus (COVID-19), forecasting the COVID-19 pandemic became a key research interest for both epidemiologists and statisticians. These future predictions are useful for the effec...
The ongoing coronavirus disease 2019 (COVID-19) pandemic is one of the major health emergencies in decades that affected almost every country in the world. As of June 30, 2020, it has caused an outbreak with more than 10 million confirmed infections, and more than 500,000 reported deaths globally. Due to the unavailability of an effective treatment...
Real-world heavy-tailed networks are claimed to be scale-free, meaning that the degree distributions follow the classical power-law. But it is evident from a closer observation that there exists a clearly identifiable non-linear pattern in the entire degree distribution in a log-log scale. Thus, the classical power-law distribution is often inadequ...
Unemployment has always been a very focused issue causing a nation as a whole to lose its economic and financial contribution. Unemployment rate prediction of a country is a crucial factor for the country’s economic and financial growth planning and a challenging job for policymakers. Traditional stochastic time series models, as well as modern non...
Real-world time series data sets contain a combination of linear and nonlinear patterns, making the time series forecasting problem more challenging. In this paper, a new hybrid methodology is introduced for forecasting univariate time series data sets using a multiplicative error modeling approach. An autoregressive integrated moving average (ARIM...
Software defect prediction (SDP) is a convenient way to identify defects in the early phases of the software development life cycle. This early warning system can help in the removal of software defects and yield a cost-effective and good quality of software products. A wide range of statistical and machine learning models have been employed to pre...
The ongoing COVID-19 pandemic is one of the major health emergencies in decades that affected almost every country in the world. As of May 10, 2020, it has caused an outbreak with more than 4,178,000 confirmed infections and more than 283,000 reported deaths globally. Due to the unavailability of an effective treatment (or vaccine) and insufficien...
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting 201 countries and territories around the globe. As of April 4, 2020, it has caused a pandemic outbreak with more than 1,116,643 confirmed infections and more than 59,170 reported deaths worldwide. The main focus of this paper is two-fold:...
Learning from an imbalanced data set presents a tricky problem in which traditional learning algorithms perform poorly. Traditional classifiers usually aim to optimize the overall accuracy without considering the relative distribution of each class. To improve predictions in imbalanced classification problems, this article presents a superensemble...
In this article, we propose a novel hybridization of regression trees (RTs) and radial basis function networks, namely, radial basis neural tree model, for waste recovery process (WRP) improvement in a paper industry. As a by-product of the paper manufacturing process, a lot of waste along with valuable fibers and fillers come out from the paper ma...
Software defect prediction (SDP) is a practical way to identify defects in the early phases of the software development life cycle. This early warning system can help in the removal of software defects and yield a cost-effective and good quality of software product. A wide range of statistical and machine learning models have been employed to predict...
Frequentist and Bayesian methods differ in many aspects, but share some basic optimal properties. In real-life classification and regression problems, situations exist in which a model based on one of the methods is preferable based on some subjective criterion. Nonparametric classification and regression techniques, such as decision trees and neur...
In this work, we propose a hybrid binary classifier which combines a decision tree with a support vector machine. The proposed hybrid model has the advantages of improved accuracy and easy interpretability. The model will be useful for feature selection cum classification tasks in real-world supervised learning problems. Numerical evidence is also...
Forecasting the unemployment rate has been a perpetual topic of research over the past three decades. Unemployment rate prediction of a country is a key factor for the country's economic and financial growth planning and a challenging job for policymakers. Traditional stochastic time series models, as well as modern nonlinear time series techniques, were emp...
Private business schools in India face a regular problem of picking quality students for their MBA programs to achieve the desired placement percentage. Generally, such datasets are biased towards one class, i.e., imbalanced in nature. And learning from the imbalanced dataset is a difficult proposition. This paper proposes an imbalanced ensemble cl...
In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties including universal consistency and upper bound of an important parameter of the proposed classifier are shown. Numerical evidence is also provided using various real-life data sets to assess the performance of the...
In this work, we propose a hybrid regression model to solve a specific problem faced by a modern paper manufacturing company. Boiler inlet water quality is a major concern for the paper machine. If the water treatment plant cannot produce water of desired quality, then it results in poor health of the boiler water tube and consequently affects the qua...
Dengue case management is an alarmingly important global health issue. The effective allocation of resources is often difficult due to external and internal factors imposing nonlinear fluctuations in the prevalence of dengue fever. We aimed to construct an early-warning system that could accurately forecast subsequent dengue cases in three dengue e...
In this article, we propose a novel hybridization of regression trees (RT) and radial basis function networks (RBFN), namely, radial basis neural tree (RBNT) model, for waste recovery process improvement in the paper industry. As a by-product of the paper manufacturing process, a lot of waste along with valuable fibers and fillers come out from the...
Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towards one class, most existing classifiers tend not to perform well on minority class examples. Conventional classifiers usually aim to optimize the overall accuracy without considering the relative distribution of each class. This article presents a su...
Private business schools in India face a common problem of selecting quality students for their MBA programs to achieve desired placement percentage. Business school data set is biased towards one class, i.e., imbalanced in nature. And learning from imbalanced data set is a difficult proposition. Most existing classification methods tend not to per...
In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties including universal consistency of the classifier are shown and numerical evidence is provided on a real life data set to assess the performance of the model. Our proposed nonparametric ensemble classifier doesn't s...
This work is motivated by a particular problem in a modern paper manufacturing industry, in which maximum efficiency of the process fiber-filler recovery equipment, also known as Krofta supracell, is desired. As a by-product of the paper manufacturing process, a lot of unwanted materials along with valuable fibers and fillers come out as waste mate...
In recent years, business schools face a common problem of selecting quality students for their Master of Business Administration (MBA) programs so that the target placement percentage is achieved. Selecting a wrong student may increase the number of unplaced students. Also, the greater the number of unplaced students, the greater the negative impact on the in...