Vasudeva Varma

Vasudeva Varma
International Institute of Information Technology, Hyderabad | IIIT · Software Engineering Research Center

About

255
Publications
72,209
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
3,765
Citations
Citations since 2017
75 Research Items
2716 Citations
20172018201920202021202220230100200300400500600
20172018201920202021202220230100200300400500600
20172018201920202021202220230100200300400500600
20172018201920202021202220230100200300400500600

Publications

Publications (255)
Preprint
Full-text available
Commonsense question-answering (QA) methods combine the power of pre-trained Language Models (LM) with the reasoning provided by Knowledge Graphs (KG). A typical approach collects nodes relevant to the QA pair from a KG to form a Working Graph (WG) followed by reasoning using Graph Neural Networks(GNNs). This faces two major challenges: (i) it is d...
Preprint
Full-text available
Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for \emph{low resource (LR) languages} a critical problem. Existing work on Wikipedia text generation has focused on \emph{English only} where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages...
Preprint
Full-text available
Clause recommendation is the problem of recommending a clause to a legal contract, given the context of the contract in question and the clause type to which the clause should belong. With not much prior work being done toward the generation of legal contracts, this problem was proposed as a first step toward the bigger problem of contract generati...
Chapter
Full-text available
Clause recommendation is the problem of recommending a clause to a legal contract, given the context of the contract in question and the clause type to which the clause should belong. With not much prior work being done toward the generation of legal contracts, this problem was proposed as a first step toward the bigger problem of contract generati...
Chapter
Body shaming, a criticism based on the body’s shape, size, or appearance, has become a dangerous act on social media. With a rise in the reporting of body shaming experiences on the web, automated monitoring of body shaming posts will help rescue individuals, especially adolescents, from the emotional anguish they experience. To the best of our kno...
Article
Full-text available
Sexism, a permeate form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the battle against sexism, since it can promote successful evaluations by gender studies researchers and government representa...
Article
An overwhelming amount of data is generated everyday onsocial media, encompassing a wide spectrum of topics. With almost every business decision depending on customer opinion, mining of social media data needs to be quick and easy.For a data analyst to keep up with the agility and the scale of the data, it is impossible to bank on fully supervised...
Article
The central idea in designing various marketing strategies for online social networks is to identify the influencers in the network. The influential individuals induce ``word-of-mouth" effects in the network. These individuals are responsible for triggering long cascades of influence that convince their peers to perform a similar action (buying a p...
Article
Micro-blogging sites such as Facebook, Twitter, Google+ present a nice opportunity for targeting advertisements that are contextually related to the microblog content. By virtue of the sparse and noisy text makes identifying the microblogs suitable for advertising a very hard problem. In this work, we approach the problem of identifying the microbl...
Article
Sexism, an injustice that subjects women and girls to enormous suffering, manifests in blatant as well as subtle ways. In the wake of growing documentation of experiences of sexism on the web, the automatic categorization of accounts of sexism has the potential to assist social scientists and policymakers in studying and thereby countering sexism....
Preprint
We propose a knowledge-based approach for extraction of Cause-Effect (CE) relations from biomedical text. Our approach is a combination of an unsupervised machine learning technique to discover causal triggers and a set of high-precision linguistic rules to identify cause/effect arguments of these causal triggers. We evaluate our approach using a c...
Preprint
Full-text available
The evolution of social media platforms have empowered everyone to access information easily. Social media users can easily share information with the rest of the world. This may sometimes encourage spread of fake news, which can result in undesirable consequences. In this work, we train models which can identify health news related to COVID-19 pan...
Chapter
Full-text available
Witness testimonies are important constituents of a court case description and play a significant role in the final decision. We propose two techniques to identify sentences representing witness testimonies. The first technique employs linguistic rules whereas the second technique applies distant supervision where training set is constructed automa...
Chapter
Full-text available
Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the rising number of experiences of sexism reported online, categorizing these recollections automatically can aid the fight against sexism, as it can facilitate effective analyses by gender studies researchers and government officials involved i...
Article
We consider the task of Medical Concept Normalization (MCN) which aims to map informal medical phrases such as “loosing weight” to formal medical concepts, such as “Weight loss”. Deep learning models have shown high performance across various MCN datasets containing small number of target concepts along with adequate number of training examples per...
Article
Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate aut...
Preprint
With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for str...
Preprint
Extractive text summarization has been an extensive research problem in the field of natural language understanding. While the conventional approaches rely mostly on manually compiled features to generate the summary, few attempts have been made in developing data-driven systems for extractive summarization. To this end, we present a fully data-dri...
Preprint
Automated multi-document extractive text summarization is a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute in some form the worthiness of a sentence to be included into the summary. While the conventional approaches rely on human crafted document-independent features to generate a...
Preprint
Sexism, an injustice that subjects women and girls to enormous suffering, manifests in blatant as well as subtle ways. In the wake of growing documentation of experiences of sexism on the web, the automatic categorization of accounts of sexism has the potential to assist social scientists and policy makers in utilizing such data to study and counte...
Conference Paper
Full-text available
With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to pro-actively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for st...
Conference Paper
In recent times, fake news and misinformation have had a disruptive and adverse impact on our lives. Given the prominence of microblogging networks as a source of news for most individuals, fake news now spreads at a faster pace and has a more profound impact than ever before. This makes detection of fake news an extremely important challenge. Fake...
Article
Identifying medical persona from a social media post is critical for drug marketing, pharmacovigilance and patient recruitment. Medical persona classification aims to computationally model the medical persona associated with a social media post. We present a novel deep learning model for this task which consists of two parts: Convolutional Neural N...
Article
Given the explosion in availability of scientific documents (books and research papers), automatically extracting cause–effect (CE) relation mentions, along with other arguments such as polarity, uncertainty and evidence, is becoming crucial for creating scientific knowledge bases from scientific text documents. Such knowledge bases can be used for...
Chapter
Full-text available
Users have been trained to type keyword queries on search engines. However, recently there has been a significant rise in the number of verbose queries. Often times such queries are not well-formed. The lack of well-formedness in the query might adversely impact the downstream pipeline which processes these queries. A well-formed natural language q...
Chapter
Spatial Role Labelling involves identification of text segments which emit spatial semantics such as describing an object of interest, a reference point or the object’s relative position with the reference. Tasks in SemEval exercises of 2012 and 2013 propose problems and datasets for Spatial Role Labelling. In this paper, we propose a simple two-st...
Chapter
Sentence compression has traditionally been tackled as syntactic tree pruning, where rules and statistical features are defined for pruning less relevant words. Recent years have witnessed the rise of neural models without leveraging syntax trees, learning sentence representations automatically and pruning words from such representations. We invest...
Conference Paper
Full-text available
An important aspect of understanding narrative text is identification of actors, its mentions and coreferences among them. Coreference Resolution in Hindi is a relatively under-explored area. In this paper, we focus on the task of resolving corefer-ences of actor mentions in Hindi narrative text. We propose a linguistically grounded approach for th...
Conference Paper
Popular methods for news recommendation which are based on collaborative filtering and content-based filtering have multiple drawbacks. The former method does not account for the sequential nature of news reading and suffers from the problem of cold-start, while the latter, suffers from over-specialization. In order to address these issues for news...
Conference Paper
An effective news recommendation system should harness the historical information of the user based on her interactions as well as the content of the articles. In this paper we propose a novel deep learning model for news recommendation which utilizes the content of the news articles as well as the sequence in which the articles were read by the us...
Preprint
Full-text available
Text segmentation plays an important role in various Natural Language Processing (NLP) tasks like summarization, context understanding, document indexing and document noise removal. Previous methods for this task require manual feature engineering, huge memory requirements and large execution times. To the best of our knowledge, this paper is the f...
Preprint
In order to expand their reach and increase website ad revenue, media outlets have started using clickbait techniques to lure readers to click on articles on their digital platform. Having successfully enticed the user to open the article, the article fails to satiate his curiosity serving only to boost click-through rates. Initial methods for this...
Conference Paper
Online media outlets, in a bid to expand their reach and subsequently increase revenue through ad monetisation, have begun adopting clickbait techniques to lure readers to click on articles. The article fails to fulfill the promise made by the headline. Traditional methods for clickbait detection have relied heavily on feature engineering which, in...
Chapter
Prediction of trust relations between users of social networks is critical for finding credible information. Inferring trust is challenging, since user-specified trust relations are highly sparse and power-law distributed. In this paper, we explore utilizing community memberships for trust prediction in a principled manner. We also propose a novel...
Article
Full-text available
Background: Social media is a useful platform to share health-related information due to its vast reach. This makes it a good candidate for public-health monitoring tasks, specifically for pharmacovigilance. We study the problem of extraction of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from Twitter. Medical information...
Chapter
Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official rep...
Chapter
Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official rep...
Conference Paper
Text segmentation plays an important role in various Natural Language Processing (NLP) tasks like summarization, context understanding, document indexing and document noise removal. Previous methods for this task require manual feature engineering, huge memory requirements and large execution times. To the best of our knowledge, this paper is the f...
Article
Full-text available
Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official rep...
Article
Full-text available
Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official rep...
Conference Paper
Full-text available
Automated multi-document extractive text summarization is a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute in some form the worthiness of a sentence to be included into the summary. While the conventional approaches rely on human crafted document-independent features to generate a...
Conference Paper
Full-text available
Bizarre news items are those news items so strange and unusual that readers might question the claims presented in the news. This paper presents the first machine learning approach to bizarre news detection in online news media. We contribute by compiling the first bizarre news corpus of 23754 bizarre news headlines, and by developing a bizarre new...
Article
Full-text available
Social media is a rich source of information and opinion, with exponential data growth rate. However social media posts are difficult to analyze since they are brief, unstructured and noisy. Interestingly, many social media posts are about an entity or entities. Understanding which entity is central (Salient Entity) to a post, helps better analyze...
Conference Paper
Full-text available
Extractive text summarization has been an extensive research problem in the field of natural language understanding. While the conventional approaches rely mostly on manually compiled features to generate the summary, few attempts have been made in developing data-driven systems for extractive summarization. To this end, we present a fully data-dri...
Conference Paper
Research in analysis of microblogging platforms is experiencing a renewed surge with a large number of works applying representation learning models for applications like sentiment analysis, semantic textual similarity computation, hashtag prediction, etc. Although the performance of the representation learning models has been better than the tradi...
Conference Paper
Identifying medical persona from a social media post is of paramount importance for drug marketing and pharmacovigilance. In this work, we propose multiple approaches to infer the medical persona associated with a social media post. We pose this as a supervised multi-label text classification problem. The main challenge is to identify the hidden cu...
Conference Paper
Inferring trust relations between social media users is critical for a number of applications wherein users seek credible information. The fact that available trust relations are scarce and skewed makes trust prediction a challenging task. To the best of our knowledge, this is the first work on exploring representation learning for trust prediction...
Conference Paper
Full-text available
Automatic categorization of computer science research papers using just the abstracts, is a hard problem to solve. This is due to the short text length of the abstracts. Also, abstracts are a general discussion of the topic with few domain specific terms. These reasons make it hard to generate good representations of abstracts which in turn leads t...
Article
Full-text available
Inferring trust relations between social media users is critical for a number of applications wherein users seek credible information. The fact that available trust relations are scarce and skewed makes trust prediction a challenging task. To the best of our knowledge, this is the first work on exploring representation learning for trust prediction...
Article
Full-text available
Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform...
Chapter
Social media is an important venue for information sharing, discussions or conversations on a variety of topics and events generated or happening across the globe. Application of automated text summarization techniques on the large volume of information piled up in social media can produce textual summaries in a variety of flavors depending on the...
Conference Paper
Scientific article recommendation problem deals with recommending similar scientific articles given a query article. It can be categorized as a content based similarity system. Recent advancements in representation learning methods have proven to be effective in modeling distributed representations in different modalities like images, languages, sp...
Article
Research in analysis of microblogging platforms is experiencing a renewed surge with a large number of works applying representation learning models for applications like sentiment analysis, semantic textual similarity computation, hashtag prediction, etc. Although the performance of the representation learning models has been better than the tradi...
Conference Paper
In this work we propose a novel representation learning model which computes semantic representations for tweets accurately. Our model systematically exploits the chronologically adjacent tweets (‘context’) from users’ Twitter timelines for this task. Further, we make our model user-aware so that it can do well in modeling the target tweet by explo...
Article
Full-text available
Developers need to comprehend a program before modifying it. In such cases, developers use cues to establish the relevance of a piece of information with a task. Being familiar with different kinds of cues will help the developers to comprehend a program faster. But unlike the experienced developers, novice developers fail to recognize the relevant...
Article
In this work we propose a novel representation learning model which computes semantic representations for tweets accurately. Our model systematically exploits the chronologically adjacent tweets ('context') from users' Twitter timelines for this task. Further, we make our model user-aware so that it can do well in modeling the target tweet by explo...
Article
Research in social media analysis is experiencing a recent surge with a large number of works applying representation learning models to solve high-level syntactico-semantic tasks such as sentiment analysis, semantic textual similarity computation, hashtag prediction and so on. Although the performance of the representation learning models are bett...
Article
Full-text available
Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment anal...
Article
Opportunistic reuse is a need based sourcing of software modules without a prior reuse plan. It is a common tactical approach in software development. Developers often reuse an external software module opportunistically to improve their productivity. But, studies have shown that this results in extensive refactoring and adds maintenance owes. We as...
Conference Paper
Doc2Sent2Vec is an unsupervised approach to learn low-dimensional feature vector (or embedding) for a document. This embedding captures the semantics of the document and can be fed as input to machine learning algorithms to solve a myriad number of applications in the field of data mining and information retrieval. Some of these applications includ...
Conference Paper
In this paper, we present a novel N-gram (N> = 1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we introduce the combined use of: 1) statistical feature (obtained by using weighted betweenness centrality scores of words, which isgenerally used to identify the border nodes/edges in communi...