Shubhra Kanti Karmaker Santu

Auburn University | AU · Department of Computer Science & Software Engineering

Ph.D.
Always Looking for Excellent PhD Students

21
Publications
11,032
Reads
156
Citations
Introduction
I am a Tenure-Track Assistant Professor in the Department of Computer Science and Software Engineering at Auburn University, Alabama. My primary research interest lies at the intersection of Big Data, Artificial Intelligence, and Natural Language Processing. Before joining Auburn University, I was a Postdoctoral Research Associate in the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology (MIT), hosted by Dr. Kalyan Veeramachaneni.
Additional affiliations
January 2019 - December 2019
Massachusetts Institute of Technology
Position
  • PostDoc Position
May 2018 - present
Microsoft
Position
  • Research Intern
Description
  • Summer Research Intern with Riham Mansour
May 2017 - August 2017
Microsoft
Position
  • Research Intern
Description
  • Summer Research Intern with Hao Ma
Education
August 2014 - December 2018
University of Illinois, Urbana-Champaign
Field of study
  • Computer Science

Publications (21)
Preprint
Full-text available
In this paper, we present a novel perspective towards IR evaluation by proposing a new family of evaluation metrics where the existing popular metrics (e.g., nDCG, MAP) are customized by introducing a query-specific lower-bound (LB) normalization term. While original nDCG, MAP etc. metrics are normalized in terms of their upper bounds based on an i...
Conference Paper
Full-text available
In this demo, we focus on analyzing COVID-19 related symptoms across the globe reported through tweets by building an interactive spatio-temporal visualization tool, i.e., COVID19α. Using around 462 million tweets collected over a span of six months, COVID19α provides three different types of visualization tools: 1) Spatial Visualization with a foc...
Conference Paper
Full-text available
In most existing works, nDCG is computed for a fixed cutoff k, i.e., nDCG@k, and some fixed discounting coefficient. Such a conventional query-independent way to compute nDCG does not accurately reflect the utility of search results perceived by an individual user and is thus non-optimal. In this paper, we conduct a case study of the impact of using...
Preprint
Full-text available
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to...
Preprint
Full-text available
Tracking sexual violence is a challenging task. In this paper, we present a supervised learning-based automated sexual violence report tracking model that is more scalable and reliable than its crowdsource-based counterparts. We define the sexual violence report tracking problem by considering victim, perpetrator contexts and the nature of the vio...
Conference Paper
Full-text available
Learning to Rank is an important framework used in search engines to optimize the combination of multiple features in a single ranking function. In the existing work on learning to rank, such a ranking function is often trained on a large set of different queries to optimize the overall performance on all of them. However, the optimal parameters to...
Conference Paper
Full-text available
Content of text data are often influenced by contextual factors which often evolve over time (e.g., content of social media are often influenced by topics covered in the major news streams). Existing language models do not consider the influence of such related evolving topics, and thus are not optimal. In this paper, we propose to incorporate such...
Preprint
Full-text available
Most automation in machine learning focuses on model selection and hyper parameter tuning, and many overlook the challenge of automatically defining predictive tasks. We still heavily rely on human experts to define prediction tasks, and generate labels by aggregating raw data. In this paper, we tackle the challenge of defining useful prediction pr...
Preprint
Full-text available
Most automation in machine learning focuses on model selection and hyper parameter tuning, and many overlook the challenge of automatically defining predictive tasks. We still heavily rely on human experts to define prediction tasks, and generate labels by aggregating raw data. In this paper, we tackle the challenge of defining useful prediction pr...
Thesis
Full-text available
A crucial component of any intelligent system is to understand and predict the behavior of its users. A correct model of user's behavior enables the system to perform effectively to better serve the user's need. While much work has been done on user behavior modeling based on historical activity data, little attention has been paid to how external...
Research Proposal
Full-text available
The goal of my research is to develop intelligent systems by exploiting the web-scale Big Data which can enhance human’s perception of the real world and thus, facilitate informed decision making in a wide range of application areas. Broadly, my research interest lies at the intersection of Text Mining, Natural Language Processing and Information R...
Conference Paper
Full-text available
The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, it has been scalable and convenient to protect the privacy of these individuals by de-identification. However, studies show that the combination of de-identified data sets with other data sets risks re-i...
Article
Full-text available
As data reported by humans about our world, text data play a very important role in all data mining applications, yet how to develop a general text analysis system to support all text mining applications is a difficult challenge. In this position paper, we introduce SOFSAT, a new framework that can support set-like operators for semantic analysis o...
Conference Paper
Full-text available
Previous work has shown that popular trending events are important external factors which pose significant influence on user search behavior and also provided a way to computationally model this influence. However, their problem formulation was based on the strong assumption that each event poses its influence independently. This assumption is unre...
Conference Paper
Full-text available
Time series are ubiquitous in the world since they are used to measure various phenomena (e.g., temperature, spread of a virus, sales, etc.). Forecasting of time series is highly beneficial (and necessary) for optimizing decisions, yet is a very challenging problem; using only the historical values of the time series is often insufficient. In this...
Conference Paper
Full-text available
E-Commerce (E-Com) search is an emerging important new application of information retrieval. Learning to Rank (LETOR) is a general effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of LETOR for web search has been well studied, its use for E-Com search has not yet been well explored....
Conference Paper
Full-text available
Understanding how users’ search behavior is influenced by real world events is important both for social science research and for designing better search engines for users. In this paper, we study how to model the influence of events on user queries by framing it as a novel data mining problem. Specifically, given a text description of an event,...
Conference Paper
Full-text available
Online customer reviews are very useful for both helping consumers make buying decisions on products or services and providing business intelligence. However, it is a challenge for people to manually digest all the opinions buried in large amounts of review data, raising the need for automatic opinion summarization and analysis. One fundamental cha...
Conference Paper
Full-text available
Generalization ability of a classifier is an important issue for any classification task. This paper proposes a new evolutionary system, i.e., EDARIC, based on the Pittsburgh approach for evolutionary machine learning and classification. The new system uses a destructive approach that starts with large-sized rules and gradually decreases the sizes...
Conference Paper
Full-text available
Time series forecasting (TSF) has been widely used in many application areas such as science, engineering and finance. The characteristics of the phenomenon generating a series are usually unknown, and the information available for forecasting is limited to the past values of the series. It is, therefore, necessary to use an appropriate number of past...

Projects (9)
Project
Imagine an intelligent agent which can assist you in performing regular Data Science tasks. To take the first step towards this ambitious goal, we are proposing a dialogue-based system to guide users through the process of formulating the goal Machine Learning (ML) task by themselves.