Soudeep Deb

Indian Institute of Management Bangalore | IIMB · Decision Sciences

PhD

About

Publications: 38
Reads: 5,626
Citations: 144
Introduction
My primary research areas are forecasting, time series analysis, spatio-temporal modeling, clustering problems, and inference for random processes. I am also interested in sports analytics, especially problems related to soccer.
Additional affiliations
September 2018 - February 2020
NBC Universal
Position
  • Analyst
March 2020 - present
Indian Institute of Management Bangalore
Position
  • Professor (Assistant)
Education
September 2013 - August 2018
University of Chicago
Field of study
  • Statistics
July 2011 - June 2013
Indian Statistical Institute
Field of study
  • Statistics
July 2008 - June 2011
Indian Statistical Institute
Field of study
  • Statistics

Publications (38)
Preprint
Full-text available
Modeling and forecasting air quality plays a crucial role in informed air pollution management and protecting public health. The air quality data of a region, collected through various pollution monitoring stations, are nonlinear, nonstationary, and highly dynamic, and exhibit intense stochastic spatiotemporal correlation. Geometric d...
Preprint
Full-text available
We propose a nonparametric algorithm to detect structural breaks in the conditional mean and/or variance of a time series. Our method does not assume any specific parametric form for the dependence structure of the regressor, the time series model, or the distribution of the model noise. This flexibility allows our algorithm to be applicable to a w...
Preprint
Full-text available
Statistical research in real estate markets, particularly in understanding the spatio-temporal dynamics of house prices, has garnered significant attention in recent times. Although Bayesian methods are common in spatio-temporal modeling, standard Markov chain Monte Carlo (MCMC) techniques are usually slow for large datasets such as house price dat...
Article
It is often of primary interest to analyze and forecast the levels of a continuous phenomenon as a categorical variable. In this paper, we propose a new spatio-temporal model to deal with this problem in a binary setting, with an interesting application related to the COVID-19 pandemic, a phenomenon that depends on both spatial proximity and tempora...
Preprint
Full-text available
In this paper, we develop a new and effective approach to nonparametric quantile regression that accommodates ultrahigh-dimensional data arising from spatio-temporal processes. This approach proves advantageous in staving off computational challenges that constitute known hindrances to existing nonparametric quantile regression methods when the num...
Article
Predicting the winner of an election is of importance to multiple stakeholders. To formulate the problem, we consider an independent sequence of categorical data with a finite number of possible outcomes in each. The data is assumed to be observed in batches, each of which is based on a large number of such trials and can be modeled via multinomial...
Article
This article employs a Bayesian methodology to predict the results of soccer matches in real-time. Using sequential data of various events throughout the match, we utilise a multinomial probit regression in a novel framework to estimate the time-varying impact of covariates and to forecast the outcome. English Premier League data from eight seasons...
Article
Full-text available
In this work, we develop a methodology to detect structural breaks in multivariate time series data using the t-distributed stochastic neighbour embedding (t-SNE) technique and non-parametric spectral density estimates. By applying the proposed algorithm to the exchange rates of the Indian rupee against four primary currencies, we establish that the co...
Preprint
Full-text available
Although there is substantial literature on identifying structural changes for continuous spatio-temporal processes, the same is not true for categorical spatio-temporal data. This work bridges that gap and proposes a novel spatio-temporal model to identify changepoints in ordered categorical data. The model leverages an additive mean structure wit...
Preprint
Full-text available
This paper employs a Bayesian methodology to predict the results of soccer matches in real-time. Using sequential data of various events throughout the match, we utilize a multinomial probit regression in a novel framework to estimate the time-varying impact of covariates and to forecast the outcome. English Premier League data from eight seasons a...
Preprint
Full-text available
The success of a football team depends on the individual skills and performances of the selected players as well as how cohesively they perform. This work proposes a two-stage process for selecting the optimal playing eleven of a football team from its pool of available players. In the first stage, for the reference team, a LASSO-induced modified t...
Article
The COVID-19 pandemic has caused a significant disruption in the social lives and mental health of people across the world. This study aims to assess the effect of using internet search volume data. We categorize the widely searched keywords on the internet in several categories, which are relevant in analyzing the public mental health status. Corre...
Article
Competitive balance in a football league is extremely important from the perspective of economic growth of the industry. Many researchers have earlier proposed different measures of competitive balance, which are primarily adapted from standard economic theory. However, these measures fail to capture the finer nuances of the game. In this work, we...
Preprint
Full-text available
In this article, we study the effect of vector-valued interventions in votes under a binary voter model, where each voter expresses their vote as a $\{0,1\}$-valued random variable to choose between two candidates. We assume that the outcome is determined by the majority function, which is true for a democratic system. The term intervention includes ca...
Article
The ongoing pandemic of Coronavirus disease has already affected more than 300,000 people. In this study, we propose an appropriate auto-regressive integrated moving-average model with time-varying parameters to analyze the trend pattern of the early incidence of COVID-19 outbreak, and subsequently, estimate the basic reproduction number $R_0$ for dif...
Preprint
Full-text available
The selection of essential variables in logistic regression is vital because of its extensive use in medical studies, finance, economics and related fields. In this paper, we explore four main typologies (test-based, penalty-based, screening-based, and tree-based) of frequentist variable selection methods in the logistic regression setup. Primary objec...
Article
The coronavirus pandemic has affected the whole world extensively, and it is of immense importance to understand how the disease is spreading. In this work, we provide evidence of spatial dependence in the pandemic data and accordingly develop a new statistical technique that captures the spatio-temporal dependence pattern of the COVID-19 spread appropr...
Article
Predicting a dengue outbreak well ahead of time is of immense importance to healthcare personnel. In this study, an ensemble method based on three different types of models has been developed. The proposed approach combines negative binomial regression, autoregressive integrated moving average model and generalized linear autoregressive moving aver...
Preprint
This paper proposes a model-free nonparametric estimator of the conditional quantile of a time series regression model where the covariate vector is repeated many times for different values of the response. This type of data abounds in climate studies. To tackle such problems, our proposed method exploits the replicated nature of the data and improve...
Preprint
Full-text available
We consider a sequence of variables having multinomial distribution with the number of trials corresponding to these variables being large and possibly different. The multinomial probabilities of the categories are assumed to vary randomly depending on batches. The proposed framework is interesting from the perspective of various applications in pr...
Preprint
Full-text available
Count data appears in various disciplines. In this work, a new method to analyze time series count data has been proposed. The method assumes an exponentially decaying covariance structure, a special class of the Matérn covariance function, for the latent variable in a Poisson regression model. It is implemented in a Bayesian framework, with the help...
Preprint
Full-text available
Competitive balance in a football league is extremely important from the perspective of economic growth of the industry. Many researchers have earlier proposed different measures of competitive balance, which are primarily adapted from the standard economic theory. However, these measures fail to capture the finer nuances of the game. In this work,...
Article
Full-text available
The recent coronavirus pandemic has prompted many regulations which are affecting the stock market. Especially because of lockdown policies across the world, the airline industry is suffering. We analyse the stock price movements of three major airline companies using a new approach which leverages a measure of internet concern on different topics. I...
Preprint
Full-text available
AIMS: Diabetes mellitus is a public health problem worldwide, with diabetic neuropathy (DN) being a common complication. Studies indicate that neurons can develop insulin resistance (IR) and cannot respond to the neurotrophic properties of insulin. Although studies exist on the relation between DN and glycemic exposure index ($\mathrm{GE}_i$), papers about c...
Preprint
Full-text available
The ongoing pandemic of Coronavirus disease (COVID-19) emerged in Wuhan, China at the end of 2019. It has already affected more than 300,000 people, with the number of deaths nearing 13,000 across the world. As it has been posing a huge threat to global public health, it is of utmost importance to identify the rate at which the disease is spreading....
Article
Full-text available
In this study, we develop a clustering method for multivariate time series data. In practical situations, such problems can arise in finance, economics, control theory, and health science. First, we propose to use a simulation-based approximation to the test statistic and develop a method to test if two multivariate time series are coming from same...
Article
BACKGROUND: Firearm-related fatalities are a top 3 cause of death among children in the United States. Despite historical declines in firearm ownership, the firearm-related mortality rate among young children has risen over the past decade. In this study, we examined changes in firea...
Article
This paper introduces improved methods for statistically assessing birth seasonality and intra‐annual variation in δ18O from faunal tooth enamel. The first method estimates input parameters for use with a previously developed parametric approach by C. Tornero et al. The second method uses a non‐parametric clustering procedure to group individuals w...
Thesis
Full-text available
In this thesis, three different problems in time series and random fields are discussed. First, for a general class of stationary random fields, we study the asymptotic properties of different parametric and nonparametric spectral density estimators under an easily verifiable short-range dependence condition. The theory developed here allows b...
Article
Full-text available
It is of utmost importance to have a clear understanding of the status of air pollution and to provide forecasts and insights about the air quality to the general public and researchers in environmental studies. Previous studies of spatio-temporal models showed that even a short-term exposure to high concentrations of atmospheric fine particulate m...
Article
Full-text available
Goals are the results of pinpoint shots, and it is a pivotal decision in soccer when, how and where to shoot. The main contribution of this study is two-fold. First, after showing that there exists high spatial correlation in the data of shots across games, we introduce a spatial process in the error structure to model the probability of conversion...
Article
Full-text available
For a general class of stationary random fields we study asymptotic properties of the discrete Fourier transform (DFT), periodogram, parametric and nonparametric spectral density estimators under an easily verifiable short-range dependence condition expressed in terms of functional dependence measures. We allow irregularly spaced data which is inde...
Conference Paper
Stochastic simulation algorithms provide a powerful means to understand complex biochemical processes as well as to solve the inverse problem of reconstructing hidden states and parameters from experimental single-cell data. At present, a repertoire of efficient algorithms for simulating and calibrating stochastic reaction networks is available. H...
