Subhabrata Mukherjee

Subhabrata Mukherjee
Microsoft · Microsoft Research

Doctor of Engineering

About

83
Publications
34,559
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
1,483
Citations
Additional affiliations
April 2019 - present
Microsoft
Position
  • Senior Researcher
October 2017 - April 2019
Amazon
Position
  • Researcher
August 2015 - December 2015
Google Inc.
Position
  • Intern
Education
November 2013 - September 2017
Max Planck Institute for Informatics
Field of study
  • Computer Science
July 2010 - July 2012
Indian Institute of Technology Bombay
Field of study
  • Natural Language Processing, Machine Learning, Artificial Intelligence

Publications

Publications (83)
Preprint
Full-text available
Fine-tuning large-scale pre-trained language models to downstream tasks require updating hundreds of millions of parameters. This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation. Parameter-efficient techniques have been developed that tune s...
Preprint
Full-text available
Traditional multi-task learning (MTL) methods use dense networks that use the same set of shared weights across several different tasks. This often creates interference where two or more tasks compete to pull model parameters in different directions. In this work, we study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learn...
Preprint
Full-text available
The transformer architecture is ubiquitously used as the building block of most large-scale language models. However, it remains a painstaking guessing game of trial and error to set its myriad of architectural hyperparameters, e.g., number of layers, number of attention heads, and inner size of the feed forward network, and find architectures with...
Preprint
Full-text available
Knowledge distillation (KD) methods compress large models into smaller students with manually-designed student architectures given pre-specified computational cost. This requires several trials to find a viable student, and further repeating the process for each student or computational budget change. We use Neural Architecture Search (NAS) to auto...
Preprint
Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data...
Preprint
Full-text available
Recent works have focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the compressed model performance for downstream tasks. However, there has been no study in analyzing the impact of compression on the generalizability and robustness of these models. Towards this end, we study two popular...
Preprint
Full-text available
We present a new method LiST for efficient fine-tuning of large pre-trained language models (PLMs) in few-shot learning settings. LiST significantly improves over recent methods that adopt prompt fine-tuning using two key techniques. The first one is the use of self-training to leverage large amounts of unlabeled data for prompt-tuning to significa...
Preprint
Full-text available
While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process. While some recent works focus on rationalizing neural predictions by highlighting salient concepts in the text as justifications or rationales, they rely on t...
Preprint
Full-text available
Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a lot of instance-level annotations for sensitive attributes, it also does not guarantee that all fairness sensitive information has been removed from the encoder. To address these limitations, we explore the following resear...
Preprint
Full-text available
While deep and large pre-trained models are the state-of-the-art for various natural language processing tasks, their huge size poses significant challenges for practical uses in resource constrained settings. Recent works in knowledge distillation propose task-agnostic as well as task-specific methods to compress these models, with task-specific o...
Preprint
The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. However, for extremely low-resource languages without large-scale monolingual corpora for pre-training or sufficient annotated data for fine-tuning, tran...
Preprint
Full-text available
State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to automatically generate weakly labeled training data. However, learning with weak rules is challenging...
Chapter
Social media has greatly enabled people to participate in online activities at an unprecedented rate. However, this unrestricted access also exacerbates the spread of misinformation and fake news which cause confusion and chaos if not detected in a timely manner. Given the rapidly evolving nature of news events and the limited amount of annotated d...
Preprint
Neural sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems and semantic parsing. Large-scale pre-trained language models obtain very good performance on these tasks when fine-tuned on large amounts of task-specific labeled data...
Conference Paper
Full-text available
Email remains one of the most frequently used means of online communication. People spend significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to r...
Preprint
Full-text available
Recent success of large-scale pre-trained language models crucially hinge on fine-tuning them on large amounts of labeled data for the downstream task, that are typically expensive to acquire. In this work, we study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck by making use of large-sc...
Preprint
Full-text available
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to...
Preprint
Full-text available
Web search engines are frequently used to access information about products. This has increased in recent times with the rising popularity of e-commerce. However, there is limited understanding of what users search for and their intents when it comes to product search on the web. In this work, we study search logs from Bing web search engine to cha...
Preprint
Full-text available
Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks. In this work, we explore a new application, Smart-To-Do, that helps users with task management over emails. We introduce a new task and dataset for automatically generating To-D...
Preprint
Full-text available
Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is dep...
Conference Paper
Full-text available
Deep and large pre-trained language models are the state-of-the-art for various natural language processing tasks. However, the huge size of these models could be a deterrent to use them in practice. Some recent and concurrent works use knowledge distillation to compress these huge models into shallow ones. In this work we study knowledge distillat...
Conference Paper
Full-text available
Multilingual representations embed words from all languages into a single semantic space such that words with similar meanings are close to each other regardless of languages. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language deploys to...
Conference Paper
Full-text available
Smart contextualized features in email service applications increase the ease with which people organize their folders, write their emails and respond to pending tasks. In this work, we explore an interesting application that allow users to diligently organize their tasks and schedules via Smart To-Do. We introduce a new task and dataset for automa...
Preprint
Full-text available
Deep and large pre-trained language models are the state-of-the-art for various natural language processing tasks. However, the huge size of these models could be a deterrent to use them in practice. Some recent and concurrent works use knowledge distillation to compress these huge models into shallow ones. In this work we study knowledge distillat...
Preprint
Full-text available
Social media has greatly enabled people to participate in online activities at an unprecedented rate. However, this unrestricted access also exacerbates the spread of misinformation and fake news online which might cause confusion and chaos unless being detected early for its mitigation. Given the rapidly evolving nature of news events and the limi...
Preprint
Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classi...
Preprint
Full-text available
Recent advances in pre-training huge models on large amounts of text through self supervision have obtained state-of-the-art results in various natural language processing tasks. However, these huge and expensive models are difficult to use in practise for downstream tasks. Some recent efforts use knowledge distillation to compress these models. Ho...
Preprint
Full-text available
Social influence plays a vital role in shaping a user's behavior in online communities dealing with items of fine taste like movies, food, and beer. For online recommendation, this implies that users' preferences and ratings are influenced due to other individuals. Given only time-stamped reviews of users, can we find out who-influences-whom, and c...
Preprint
Full-text available
In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall in two extremes: either they perform instance-level inference relying on embeddin...
Conference Paper
Full-text available
In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall in two extremes: either they perform instance-level inference relying on embeddin...
Conference Paper
Full-text available
Social influence plays a vital role in shaping a user's behavior in online communities dealing with items of fine taste like movies, food, and beer. For online recommendation, this implies that users' preferences and ratings are influenced due to other individuals. Given only time-stamped reviews of users, can we find out who-influences-whom, and c...
Conference Paper
Full-text available
Extraction of missing attribute values is to find values describing an attribute of interest from a free text input. Most past related work on extraction of missing attribute values work with a closed world assumption with the possible set of values known beforehand, or use dictionaries of values and hand-crafted features. How can we discover new a...
Preprint
Full-text available
Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim....
Preprint
Full-text available
Extraction of missing attribute values is to find values describing an attribute of interest from a free text input. Most past related work on extraction of missing attribute values work with a closed world assumption with the possible set of values known beforehand, or use dictionaries of values and hand-crafted features. How can we discover new a...
Conference Paper
Full-text available
Rapid increase of misinformation online has emerged as one of the biggest challenges in this post-truth era. This has given rise to many fact-checking websites that manually assess doubtful claims. However, the speed and scale at which misinformation spreads in online media inherently limits manual verification. Hence, the problem of automatic cred...
Article
Full-text available
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), o...
Thesis
Full-text available
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), o...
Article
Full-text available
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature contin...
Article
Full-text available
Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and...
Article
Full-text available
Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews...
Article
Full-text available
Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by expl...
Article
Full-text available
Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of it...
Article
Full-text available
Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review c...
Conference Paper
Full-text available
The web is a huge source of valuable information. However, in recent times, there is an increasing trend towards false claims in social media, other web-sources, and even in news. Thus, factchecking websites have become increasingly popular to identify such misinformation based on manual analysis. Recent research proposed methods to assess the cred...
Conference Paper
Full-text available
There is an increasing amount of false claims in news, social media, and other web sources. While prior work on truth discovery has focused on the case of checking factual statements, this paper addresses the novel task of assessing the credibility of arbitrary claims made in natural-language text - in an open-domain setting without any assumptions...
Conference Paper
Full-text available
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature contin...
Conference Paper
Full-text available
Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers’ purchasing decisions. However, the proliferation of non-credible reviews — either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased — entails the problem of identifying credible reviews....
Conference Paper
Full-text available
Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review c...
Conference Paper
Full-text available
Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and...