Tanujit Chakraborty

Université Paris Sorbonne Abu Dhabi · Mathematics

Ph.D. from Indian Statistical Institute
Research Interests: Machine Learning, Time Series Forecasting, and Statistical Analysis of Networks.

About

49 Publications
7,423 Reads
660 Citations
Citations since 2017: 49 research items, 660 citations (chart of citations per year, 2017 to 2023; vertical axis 0 to 200)
Introduction
I am broadly interested in various areas of Statistics and Machine Learning with real-life applications. My research involves developing statistical methodologies for data-driven problems from various applied disciplines (e.g., Epidemiology, Biology, Business, Software Reliability, Quality Engineering, Macroeconomics, Nonlinear Dynamics, and Network Analysis). My research interests lie in Hybrid Machine Learning, Time Series Forecasting, and Statistical Analysis of Networks.
Additional affiliations
January 2022 - present
Sorbonne Université
Position
  • Researcher and Advisor

Publications (49)
Preprint
Full-text available
Dengue fever is a virulent disease that has spread to more than 100 tropical and subtropical countries in Africa, the Americas, and Asia. This arboviral disease affects around 400 million people globally, severely straining healthcare systems. The unavailability of a specific drug and ready-to-use vaccine makes the situation worse. Hence, policymakers must...
Preprint
Full-text available
Deep Learning has received increased attention due to its unbeatable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open...
Preprint
Full-text available
Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time s...
Preprint
Full-text available
Infectious diseases remain among the top contributors to human illness and death worldwide, among which many diseases produce epidemic waves of infection. The unavailability of specific drugs and ready-to-use vaccines to prevent most of these epidemics makes the situation worse. These force public health officials, health care providers, and policy...
Preprint
Full-text available
Time series forecasting is an emerging field of data science with applications ranging from stock price and exchange rate prediction to the early prediction of epidemics. Numerous statistical and machine learning methods have been proposed in the last five decades to meet the demand for high-quality and reliable forecasts....
Article
Full-text available
An unprecedented outbreak of the novel coronavirus (COVID-19) in the form of peculiar pneumonia has spread globally since its first case in Wuhan, China, in December 2019. Soon after, the infected cases and mortality increased rapidly. The future of the pandemic’s progress was uncertain, and thus, predicting it became crucial for public he...
Article
Full-text available
A new method has been proposed to generalize Burr-XII distribution, also called Burr distribution, by adding an extra parameter to an existing Burr distribution for more flexibility. In this method, the exponent of the Burr distribution is modeled using a nonlinear function of the data and one additional parameter. The models of this newly introduc...
Preprint
Full-text available
In this study, we consider large-scale network data sets from different disciplines, namely social networks, collaboration networks, web graphs, citation networks, biological networks, product co-purchasing networks, temporal networks, communication networks, ground-truth networks, and brain networks. We study several individual data sets from each...
Chapter
Full-text available
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting more than 200 countries and territories worldwide. As of September 30, 2020, it has caused a pandemic outbreak with more than 33 million confirmed infections, and more than 1 million reported deaths worldwide. Several statistical, machine...
Article
Full-text available
Perhaps the most recent controversial topic in network science research is to determine whether real-world complex networks are scale-free or not. Recently, Broido and Clauset [A.D. Broido, A. Clauset, Nature Communications, 10, 1017 (2019)] asserted that the degree distributions of real-world networks are rarely power law under statistical tests. S...
Article
Full-text available
Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time s...
Article
Full-text available
Real-world networks are generally claimed to be scale-free. This means that the degree distributions follow the classical power-law, at least asymptotically. However, closer observation shows that the classical power-law distribution is often inadequate to meet the data characteristics due to the existence of an identifiable nonlinearity in the ent...
Article
Full-text available
The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation of their applications in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble de...
Conference Paper
Full-text available
Time series forecasting has been a perpetual topic of research in statistical machine learning for the last five decades. Due to the unprecedented outbreak of the novel coronavirus (COVID-19), forecasting the COVID-19 pandemic became a key research interest for both epidemiologists and statisticians. These future predictions are useful for the effec...
Preprint
Full-text available
The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation of their applications in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble de...
Article
Full-text available
The ongoing coronavirus disease 2019 (COVID-19) pandemic is one of the major health emergencies in decades that affected almost every country in the world. As of June 30, 2020, it has caused an outbreak with more than 10 million confirmed infections, and more than 500,000 reported deaths globally. Due to the unavailability of an effective treatment...
Conference Paper
Full-text available
Real-world heavy-tailed networks are claimed to be scale-free, meaning that the degree distributions follow the classical power-law. But it is evident from a closer observation that there exists a clearly identifiable non-linear pattern in the entire degree distribution in a log-log scale. Thus, the classical power-law distribution is often inadequ...
Article
Full-text available
Unemployment has always been a closely watched issue, as it causes a nation as a whole to lose economic and financial contribution. Unemployment rate prediction of a country is a crucial factor for the country’s economic and financial growth planning and a challenging job for policymakers. Traditional stochastic time series models, as well as modern non...
Conference Paper
Full-text available
Real-world time series data sets contain a combination of linear and nonlinear patterns, making the time series forecasting problem more challenging. In this paper, a new hybrid methodology is introduced for forecasting univariate time series data sets using a multiplicative error modeling approach. An autoregressive integrated moving average (ARIM...
Preprint
Full-text available
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting more than 200 countries and territories worldwide. As of September 30, 2020, it has caused a pandemic outbreak with more than 33 million confirmed infections and more than 1 million reported deaths worldwide. Several statistical, machine...
Article
Full-text available
Software defect prediction (SDP) is a convenient way to identify defects in the early phases of the software development life cycle. This early warning system can help in the removal of software defects and yield cost-effective, good-quality software products. A wide range of statistical and machine learning models have been employed to pre...
Preprint
Real-world time series data sets contain a combination of linear and nonlinear patterns, making the time series forecasting problem more challenging. In this paper, a new hybrid methodology is introduced for forecasting univariate time series data sets using a multiplicative error modeling approach. An autoregressive integrated moving average (ARIM...
Preprint
Full-text available
The ongoing COVID-19 pandemic is one of the major health emergencies in decades that affected almost every country in the world. As of May 10, 2020, it has caused an outbreak with more than 4,178,000 confirmed infections and more than 283,000 reported deaths globally. Due to the unavailability of an effective treatment (or vaccine) and insufficien...
Preprint
Full-text available
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting 201 countries and territories around the globe. As of April 4, 2020, it has caused a pandemic outbreak with more than 1,116,643 confirmed infections and more than 59,170 reported deaths worldwide. The main focus of this paper is two-fold:...
Article
The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern affecting 201 countries and territories around the globe. As of April 4, 2020, it has caused a pandemic outbreak with more than 1,116,643 confirmed infections and more than 59,170 reported deaths worldwide. The main focus of this paper is two-fold:...
Article
Full-text available
Learning from an imbalanced data set presents a tricky problem in which traditional learning algorithms perform poorly. Traditional classifiers usually aim to optimize the overall accuracy without considering the relative distribution of each class. To improve predictions in imbalanced classification problems, this article presents a superensemble...
Article
Full-text available
In this article, we propose a novel hybridization of regression trees (RTs) and radial basis function networks, namely, radial basis neural tree model, for waste recovery process (WRP) improvement in a paper industry. As a by-product of the paper manufacturing process, a lot of waste along with valuable fibers and fillers come out from the paper ma...
Preprint
Software defect prediction (SDP) is a convenient way to identify defects in the early phases of the software development life cycle. This early warning system can help in the removal of software defects and yield a cost-effective, good-quality software product. A wide range of statistical and machine learning models have been employed to predict...
Preprint
Full-text available
Frequentist and Bayesian methods differ in many aspects, but share some basic optimal properties. In real-life classification and regression problems, situations exist in which a model based on one of the methods is preferable based on some subjective criterion. Nonparametric classification and regression techniques, such as decision trees and neur...
Conference Paper
Full-text available
In this work, we propose a hybrid binary classifier which combines a decision tree with a support vector machine. The proposed hybrid model has the advantages of improved accuracy and easy interpretability. The model will be useful for feature selection cum classification tasks in real-world supervised learning problems. Numerical evidence is also...
Preprint
Forecasting the unemployment rate has been a perpetual topic of research over the past three decades. Unemployment rate prediction of a country is a key factor for the country's economic and financial growth planning and a challenging job for policymakers. Traditional stochastic time series models, as well as modern nonlinear time series techniques, were emp...
Article
Full-text available
Private business schools in India face a regular problem of picking quality students for their MBA programs to achieve the desired placement percentage. Generally, such datasets are biased towards one class, i.e., imbalanced in nature. Learning from an imbalanced dataset is a difficult proposition. This paper proposes an imbalanced ensemble cl...
Article
In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties including universal consistency and upper bound of an important parameter of the proposed classifier are shown. Numerical evidence is also provided using various real-life data sets to assess the performance of the...
Article
Full-text available
In this work, we propose a hybrid regression model to solve a specific problem faced by a modern paper manufacturing company. Boiler inlet water quality is a major concern for the paper machine. If the water treatment plant cannot produce water of the desired quality, the result is poor health of the boiler water tube, which consequently affects the qua...
Article
Dengue case management is an alarmingly important global health issue. The effective allocation of resources is often difficult due to external and internal factors imposing nonlinear fluctuations in the prevalence of dengue fever. We aimed to construct an early-warning system that could accurately forecast subsequent dengue cases in three dengue e...
Preprint
Dengue case management is an alarmingly important global health issue. The effective allocation of resources is often difficult due to external and internal factors imposing nonlinear fluctuations in the prevalence of dengue fever. We aimed to construct an early-warning system that could accurately forecast subsequent dengue cases in three dengue e...
Preprint
Full-text available
In this work, we propose a hybrid regression model to solve a specific problem faced by a modern paper manufacturing company. Boiler inlet water quality is a major concern for the paper machine. If the water treatment plant cannot produce water of the desired quality, the result is poor health of the boiler water tube, which consequently affects the qua...
Preprint
In this article, we propose a novel hybridization of regression trees (RT) and radial basis function networks (RBFN), namely, radial basis neural tree (RBNT) model, for waste recovery process improvement in the paper industry. As a by-product of the paper manufacturing process, a lot of waste along with valuable fibers and fillers come out from the...
Preprint
Full-text available
Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towards one class, most existing classifiers tend not to perform well on minority class examples. Conventional classifiers usually aim to optimize the overall accuracy without considering the relative distribution of each class. This article presents a su...
Preprint
Private business schools in India face a common problem of selecting quality students for their MBA programs to achieve the desired placement percentage. Business school data sets are biased towards one class, i.e., imbalanced in nature, and learning from an imbalanced data set is a difficult proposition. Most existing classification methods tend not to per...
Preprint
Full-text available
In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties including universal consistency of the classifier are shown and numerical evidence is provided on a real-life data set to assess the performance of the model. Our proposed nonparametric ensemble classifier doesn't s...
Article
Full-text available
This work is motivated by a particular problem in a modern paper manufacturing industry, in which maximum efficiency of the process fiber-filler recovery equipment, also known as Krofta supracell, is desired. As a by-product of the paper manufacturing process, a lot of unwanted materials along with valuable fibers and fillers come out as waste mate...
Article
Full-text available
In recent years, business schools have faced a common problem of selecting quality students for their Master of Business Administration (MBA) programs so that the target placement percentage is achieved. Selecting the wrong student may increase the number of unplaced students. Also, the greater the number of unplaced students, the greater the negative impact on the in...
