Gautam Shroff

  • Principal Investigator at Tata Consultancy Services Limited

About

169 Publications
91,801 Reads
5,059 Citations

Publications (169)
Preprint
Full-text available
We consider the application of machine learning models for short-term intra-day trading in equities. We envisage a scenario wherein machine learning models are submitted by independent data scientists to predict discretised ten-candle returns every five minutes, in response to five-minute candlestick data provided to them in near real-time. An ense...
Preprint
Full-text available
Large Language Models (LLMs) excel in diverse applications including generation of code snippets, but often struggle with generating code for complex Machine Learning (ML) tasks. Although existing LLM single-agent based systems give varying performance depending on the task complexity, they purely rely on larger and expensive models such as GPT-4....
Preprint
Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual under...
Chapter
We propose a program synthesis challenge inspired by the Abstraction and Reasoning Corpus (ARC) [3]. The ARC is intended as a touchstone for human intelligence. It consists of 400 tasks, each with very small numbers (3–5) of ‘input-output’ image pairs. It is known that the tasks are ‘human-solvable’ in the sense that, for any of the tasks, there ex...
Preprint
Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm whi...
Preprint
Full-text available
We model short-duration (e.g. day) trading in financial markets as a sequential decision-making problem under uncertainty, with the added complication of continual concept-drift. We, therefore, employ meta reinforcement learning via the RL2 algorithm. It is also known that human traders often rely on frequently occurring symbolic patterns in price...
Preprint
Full-text available
Deep neural networks (DNN) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remai...
Preprint
Full-text available
We are interested in neurosymbolic systems consisting of a high-level symbolic layer for explainable prediction in terms of human-intelligible concepts, and a low-level neural layer for extracting symbols required to generate the symbolic explanation. Real data is often imperfect, meaning that even if the symbolic theory remains unchanged, we may st...
Preprint
Full-text available
Analogical Reasoning problems challenge both connectionist and symbolic AI systems as these entail a combination of background knowledge, reasoning and pattern recognition. While symbolic systems ingest explicit domain knowledge and perform deductive reasoning, they are sensitive to noise and require inputs be mapped to preset symbolic features. Co...
Conference Paper
Full-text available
Identifying all possible user intents for a dialog system at design time is challenging even for skilled domain experts. For practical applications, novel intents may have to be inferred incrementally on the fly. This typically entails repeated retraining of the intent detector on both the existing and novel intents, which can be expensive and woul...
Conference Paper
Full-text available
Intent Detection is a crucial component of Dialogue Systems wherein the objective is to classify a user utterance into one of the multiple pre-defined intents. A prerequisite for developing an effective intent identifier is a training dataset labeled with all possible user intents. However, even skilled domain experts are often unable to foresee al...
Article
We consider a class of visual analogical reasoning problems that involve discovering the sequence of transformations by which pairs of input/output images are related, so as to analogously transform future inputs. This program synthesis task can be easily solved via symbolic search. Using a variation of the ‘neural analogical reasoning’ approach, w...
Preprint
We consider a sequence of related multivariate time series learning tasks, such as predicting failures for different instances of a machine from time series of multi-sensor data, or activity recognition tasks over different individuals from multiple wearable sensors. We focus on two under-explored practical challenges arising in such settings: (i)...
Preprint
We consider learning a trading agent acting on behalf of the treasury of a firm earning revenue in a foreign currency (FC) and incurring expenses in the home currency (HC). The goal of the agent is to maximize the expected HC at the end of the trading episode by deciding to hold or sell the FC at each time step in the trading episode. We pose this...
Preprint
Medical professionals evaluating alternative treatment plans for a patient often encounter time-varying confounders, or covariates that affect both the future treatment assignment and the patient outcome. The recently proposed Counterfactual Recurrent Network (CRN) accounts for time-varying confounders by using adversarial training to balance recur...
Preprint
Full-text available
We consider a class of visual analogical reasoning problems that involve discovering the sequence of transformations by which pairs of input/output images are related, so as to analogously transform future inputs. This program synthesis task can be easily solved via symbolic search. Using a variation of the ‘neural analogical reasoning’ approach of...
Conference Paper
Full-text available
Recommender Systems (RS) tend to recommend more popular items instead of the relevant long-tail items. Mitigating such popularity bias is crucial to ensure that less popular but relevant items are part of the recommendation list shown to the user. In this work, we study the phenomenon of popularity bias in session-based RS (SRS) obtained via deep l...
Preprint
Full-text available
The ability to recognise and make analogies is often used as a measure or test of human intelligence. The ability to solve Bongard problems is an example of such a test. It has also been postulated that the ability to rapidly construct novel abstractions is critical to being able to solve analogical problems. Given an image, the ability to construc...
Article
Twitter, used in 200 countries with over 250 million tweets a day, is a rich source of local news from around the world. Many events of local importance are first reported on Twitter, including many that never reach news channels. Further, there are often only a few tweets reporting each such event, in contrast with the larger volumes that follow events...
Chapter
We present an effective technique for training deep learning agents capable of negotiating on a set of clauses in a contract agreement using a simple communication protocol. We use Multi-Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial...
Preprint
Advertising channels have evolved from conventional print media, billboards and radio advertising to online digital advertising (ad), where the users are exposed to a sequence of ad campaigns via social networks, display ads, search etc. While advertisers revisit the design of ad campaigns to concurrently serve the requirements emerging out of new...
Preprint
Most of the existing deep reinforcement learning (RL) approaches for session-based recommendations either rely on costly online interactions with real users, or rely on potentially biased rule-based or data-driven user-behavior models for learning. In this work, we instead focus on learning recommendation policies in the pure batch or offline setti...
Article
We consider the problem of estimating the remaining useful life (RUL) of a system or a machine from sensor data. Many approaches for RUL estimation based on sensor data make assumptions about how machines degrade. Additionally, sensor data from machines is noisy and often suffers from missing values in many practical settings. We propose Embed-RUL:...
Chapter
As the COVID-19 pandemic threatens to overwhelm healthcare systems across the world, there is a need for reducing the burden on medical staff via automated systems for patient screening. Given the limited availability of testing kits with long turn-around test times and the exponentially increasing number of COVID-19 positive cases, X-rays offer an...
Preprint
Full-text available
We address the problem of counterfactual regression using causal inference (CI) in observational studies consisting of high dimensional covariates and high cardinality treatments. Confounding bias, which leads to inaccurate treatment effect estimation, is attributed to covariates that affect both treatments and outcome. The presence of high-dimensi...
Article
Full-text available
In this paper we seek to identify data instances with a low value of some objective (or cost) function. Normally posed as optimisation problems, our interest is in problems that have the following characteristics: (a) optimal, or even near-optimal solutions are very rare; (b) it is expensive to obtain the value of the objective function for large n...
Preprint
Several applications of Internet of Things (IoT) technology involve capturing data from multiple sensors, resulting in multi-sensor time series. Existing neural-network-based approaches for such multi-sensor or multivariate time series modeling assume a fixed input dimension or number of sensors. Such approaches can struggle in the practical setting w...
Preprint
Automated equipment health monitoring from streaming multisensor time-series data can be used to enable condition-based maintenance, avoid sudden catastrophic failures, and ensure high operational availability. We note that most complex machinery has a well-documented and readily accessible underlying structure capturing the inter-dependencies betw...
Conference Paper
Full-text available
We address the problem of counterfactual regression using causal inference (CI) in observational studies consisting of high dimensional covariates and high cardinality treatments. Confounding bias, which leads to inaccurate treatment effect estimation, is attributed to covariates that affect both treatments and outcome. The presence of high-dimensi...
Chapter
Recently, neural networks trained as optimizers under the “learning to learn” or meta-learning framework have been shown to be effective for a broad range of optimization tasks including derivative-free black-box function optimization. Recurrent neural networks (RNNs) trained to optimize a diverse set of synthetic non-convex differentiable function...
Preprint
Full-text available
Causal inference (CI) in observational studies has received a lot of attention in healthcare, education, ad attribution, policy evaluation, etc. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. In a multiple treatment scenario, we propose the neural-network-based MultiMBNN, where we overcome co...
Conference Paper
Full-text available
Causal inference (CI) in observational studies has received a lot of attention in healthcare, education, ad attribution, policy evaluation, etc. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. In a multiple treatment scenario, we propose the neural-network-based MultiMBNN, where we overcome co...
Conference Paper
Full-text available
Performing inference on data obtained through observational studies is becoming extremely relevant due to the widespread availability of data in fields such as healthcare, education, retail, etc. Furthermore, this data is accrued from multiple homogeneous subgroups of a heterogeneous population, and hence, generalizing the inference mechanism over...
Preprint
Full-text available
Performing inference on data obtained through observational studies is becoming extremely relevant due to the widespread availability of data in fields such as healthcare, education, retail, etc. Furthermore, this data is accrued from multiple homogeneous subgroups of a heterogeneous population, and hence, generalizing the inference mechanism over...
Preprint
Full-text available
Deep neural networks (DNNs) have achieved state-of-the-art results on time series classification (TSC) tasks. In this work, we focus on leveraging DNNs in the often-encountered practical scenario where access to labeled training data is difficult, and where DNNs would be prone to overfitting. We leverage recent advancements in gradient-based meta-l...
Chapter
Enterprise data is usually stored in the form of relational databases. Question Answering systems provide an easier way for business analysts to get data insights without struggling with the syntax of SQL. However, building a supervised machine-learning-based question answering system is a challenging task involving large manual annotations f...
Preprint
Full-text available
The goal of session-based recommendation (SR) models is to utilize the information from past actions (e.g. item/product clicks) in a session to recommend items that a user is likely to click next. Recently it has been shown that the sequence of item interactions in a session can be modeled as graph-structured data to better account for complex item...
Conference Paper
Recent advances in sequence-aware approaches for session-based recommendation, such as those based on recurrent neural networks, highlight the importance of leveraging sequential information from a session while making recommendations. Further, a session based k-nearest-neighbors approach (SKNN) has proven to be a strong baseline for session-based...
Article
Full-text available
Deep neural networks are prone to overfitting, especially in small training data regimes. Often, these networks are overparameterized and the resulting learned weights tend to have strong correlations. However, convolutional networks in general, and fully convolutional neural networks (FCNs) in particular, have been shown to be relatively parameter e...
Preprint
Full-text available
Recently, neural networks trained as optimizers under the "learning to learn" or meta-learning framework have been shown to be effective for a broad range of optimization tasks including derivative-free black-box function optimization. Recurrent neural networks (RNNs) trained to optimize a diverse set of synthetic non-convex differentiable function...
Chapter
Recent advancements in the area of Computer Vision with state-of-the-art Neural Networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information fr...
Preprint
Full-text available
Our interest in this paper is in meeting a rapidly growing industrial demand for information extraction from images of documents such as invoices, bills, receipts etc. In practice users are able to provide a very small number of example images labeled with the information that needs to be extracted. We adopt a novel two-level neuro-deductive, appro...
Preprint
In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time streaming technology stacks. iPrescribe uses an ensemble of deep learning and machine learning algo...
Preprint
Full-text available
Training deep neural networks often requires careful hyper-parameter tuning and significant computational resources. In this paper, we propose ConvTimeNet (CTN): an off-the-shelf deep convolutional neural network (CNN) trained on diverse univariate time series classification (TSC) source tasks. Once trained, CTN can be easily adapted to new TSC tar...
Chapter
Answering natural language questions posed on a knowledge graph requires traversing an appropriate sequence of relationships starting from the mentioned entities. To answer complex queries, we often need to traverse more than two relationships. Traditional approaches traverse at most two relationships, as well as typically first retrieve candidate...
Preprint
Full-text available
Deep neural networks have shown promising results for various clinical prediction tasks. However, training deep networks such as those based on Recurrent Neural Networks (RNNs) requires large labeled data, significant hyper-parameter tuning effort and expertise, and high computational resources. In this work, we investigate as to what extent can tr...
Preprint
Full-text available
Prognostics or Remaining Useful Life (RUL) Estimation from multi-sensor time series data is useful to enable condition-based maintenance and ensure high operational availability of equipment. We propose a novel deep learning based approach for Prognostics with Uncertainty Quantification that is useful in scenarios where: (i) access to labeled failu...
Chapter
Full-text available
Survival analysis refers to a gamut of statistical techniques developed to infer the survival time from time-to-event data. In particular, we are interested in recurrent event survival analysis in the presence of one or more competing risks in each recurrent time-step, in order to obtain the probabilistic relationship between the input covariates a...
Conference Paper
In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time streaming technology stacks. iPrescribe uses an ensemble of deep learning and machine learning algorith...
Preprint
In this paper, we present Meeting Bot, a reinforcement-learning-based conversational system that interacts with multiple users to schedule meetings. The system is able to interpret user utterances and map them to preferred time slots, which are then fed to a reinforcement learning (RL) system with the goal of converging on an agreeable time slot. Th...
Preprint
Recent advancements in the area of Computer Vision with state-of-the-art Neural Networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information fr...
Conference Paper
Full-text available
Helpdesk is a key component of any large IT organization, where users can log a ticket about any issue they face related to IT infrastructure, administrative services, human resource services, etc. Normally, users have to assign an appropriate set of labels to a ticket so that it can be routed to the right domain expert who can help resolve the issue. I...
Conference Paper
Artificial Intelligence (AI) and Machine Learning (ML) approaches, well known from IT disciplines, are beginning to excite the networking and networked systems community. Of late, we are seeing huge excitement about applying AI and ML to networked systems. Is this merely hype? Are there use cases and genuine applications that could lead to real...
Conference Paper
Full-text available
In this work, we attempt to address two practical limitations when using Recurrent Neural Networks (RNNs) as classifiers for fault detection using multi-sensor time series data: Firstly, there is a need to understand the classification decisions of RNNs. It is difficult for engineers to diagnose the faults when multiple sensors are being monitored...
Article
Full-text available
In this work, we attempt to address two practical limitations when using Recurrent Neural Networks (RNNs) as classifiers for fault detection using multi-sensor time series data: Firstly, there is a need to understand the classification decisions of RNNs. It is difficult for engineers to diagnose the faults when multiple sensors are being monitored...
Conference Paper
Full-text available
We describe the approach, submitted as part of the 2018 PHM Data Challenge, for estimating time-to-failure or Remaining Useful Life (RUL) of Ion Mill Etching Systems in an online fashion using data from multiple sensors. RUL estimation from multi-sensor data can be considered as learning a regression function that maps a multivariate time series to...
Article
We describe the approach, submitted as part of the 2018 PHM Data Challenge, for estimating time-to-failure or Remaining Useful Life (RUL) of Ion Mill Etching Systems in an online fashion using data from multiple sensors. RUL estimation from multi-sensor data can be considered as learning a regression function that maps a multivariate time series...
Preprint
Full-text available
We present an effective technique for training deep learning agents capable of negotiating on a set of clauses in a contract agreement using a simple communication protocol. We use Multi Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial...
Conference Paper
The recent proliferation of conversational systems has resulted in an increased demand for more natural dialogue systems, capable of more sophisticated interactions than merely providing factual answers. This is evident from the usage pattern of a conversational system deployed within our organization. Users not only expect it to perform co-reference resol...
Conference Paper
Full-text available
We present an effective technique for training deep learning agents capable of negotiating on a set of clauses in a contract agreement using a simple communication protocol. We use Multi Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial...
Conference Paper
Full-text available
Estimating remaining useful life (RUL) for equipment using sensor data streams is useful to enable condition-based maintenance, and avoid catastrophic shutdowns due to impending failures. Recently, supervised deep learning approaches have been proposed for RUL estimation that leverage historical sensor data of failed instances for training the mode...
Conference Paper
Full-text available
Online anomaly detection in time series is an important component for automated monitoring. In many applications, time series are high-dimensional with tens or even hundreds of variables being monitored simultaneously. We note that existing anomaly detection approaches based on recurrent autoencoders may not be very effective for high-dimensional t...
Conference Paper
Full-text available
Predictive models based on Recurrent Neural Networks (RNNs) for clinical time series have been successfully used for various tasks such as phenotyping, in-hospital mortality prediction, and diagnostics. However, RNNs require large labeled data for training and are computationally expensive to train. Pre-training a network for some supervised or u...
Preprint
Full-text available
Deep neural networks have shown promising results for various clinical prediction tasks such as diagnosis, mortality prediction, predicting duration of stay in hospital, etc. However, training deep networks -- such as those based on Recurrent Neural Networks (RNNs) -- requires large labeled data, high computational resources, and significant hyperp...
Preprint
Full-text available
Loading containers onto a ship from a yard is an important part of port operations. Finding the optimal sequence for loading the containers is known to be computationally hard and is an example of combinatorial optimization, which leads to the application of simple heuristics in practice. In this paper, we propose an approach which uses...
Conference Paper
Anomaly detection in time series is an important task with several practical applications. The common approach of training one model in an offline manner using historical data is likely to fail under dynamically changing and non-stationary environments, where the definition of normal behavior changes over time, making the model irrelevant and ineffec...
Conference Paper
In this paper, we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accurac...
Article
Full-text available
News reports in media contain records of a wide range of socio-economic and political events in time. Using a publicly available, large digital database of news records, and aggregating them over time, we study the network of ethnic conflicts and human rights violations. Complex network analyses of the events and the involved actors provide importa...
Conference Paper
We describe an automated assistant for answering frequently asked questions; our system has been deployed and is currently answering HR-related queries in two different areas (leave management and health insurance) to a large number of users. The needs of a large global corporation lead us to model a frequently asked question (FAQ) to be an equivale...
Article
As the world population increases and arable land decreases, it becomes vital to improve the productivity of the agricultural land available. Given the weather and soil properties, farmers need to take critical decisions such as which seed variety to plant and in what proportion, in order to maximize productivity. These decisions are irreversible a...
Conference Paper
Aggregate analysis, such as comparing country-wise sales versus global market share across product categories, is often complicated by the unavailability of common join attributes, e.g., category, across diverse datasets from different geographies or retail chains. Sometimes this is a missing data issue, while in other cases it may be inherent, e.g...
Conference Paper
Full-text available
A large number of sensors are being installed on machines to capture the operational behavior of machines. The time series data collected from these sensors is then used to monitor the health of machines. We consider the situation where health monitoring of machines by engineers using raw time series visualizations is augmented by a machine learnin...
Article
In this paper, we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accurac...
