Tanmoy Chakraborty

Indraprastha Institute of Information Technology | IIITD · Department of Computer Science

About

272
Publications
42,795
Reads
2,717
Citations
Citations since 2016
240 Research Items
2615 Citations
[Chart: yearly counts, 2016 to 2022; vertical axis 0 to 600]

Publications (272)
Preprint
Conversations emerge as the primary media for exchanging ideas and conceptions. From the listener's perspective, identifying various affective qualities, such as sarcasm, humour, and emotions, is paramount for comprehending the true connotation of the emitted utterance. However, one of the major hurdles faced in learning these affect dimensions is...
Preprint
The widespread diffusion of medical and political claims in the wake of COVID-19 has led to a voluminous rise in misinformation and fake news. The current vogue is to employ manual fact-checkers to efficiently classify and verify such data to combat this avalanche of claim-ridden misinformation. However, the rate of information dissemination is suc...
Preprint
Full-text available
Existing self-supervised learning strategies are constrained to either a limited set of objectives or generic downstream tasks that predominantly target uni-modal applications. This has isolated progress for imperative multi-modal applications that are diverse in terms of complexity and domain-affinity, such as meme analysis. Here, we introduce two...
Article
With an increasing outreach of digital platforms in our lives, researchers have taken a keen interest in studying different facets of social interactions. Analyzing the spread of information (aka diffusion) has brought forth multiple research areas such as modelling user engagement, determining emerging topics, forecasting the virality of online p...
Preprint
Full-text available
The psychotherapy intervention technique is a multifaceted conversation between a therapist and a patient. Unlike general clinical discussions, psychotherapy's core components (viz. symptoms) are hard to distinguish, thus becoming a complex problem to summarize later. A structured counseling conversation may contain discussions about symptoms, hist...
Preprint
Curbing online hate speech has become the need of the hour; however, a blanket ban on such activities is infeasible for several geopolitical and cultural reasons. To reduce the severity of the problem, in this paper, we introduce a novel task, hate speech normalization, that aims to weaken the intensity of hatred exhibited by an online post. The in...
Article
Social reputation (e.g., likes, comments, shares, etc.) on YouTube is the primary tenet to popularize channels/videos. However, the organic way to improve social reputation is tedious, which often provokes content creators to seek services of online blackmarkets for rapidly inflating content reputation. Such blackmarkets act underneath a thriving c...
Article
Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD...
Preprint
Full-text available
Internet memes have emerged as an increasingly popular means of communication on the Web. Although typically intended to elicit humour, they have been increasingly used to spread hatred, trolling, and cyberbullying, as well as to target specific individuals, communities, or society on political, socio-cultural, and psychological grounds. While prev...
Chapter
Humans like to express their opinions and crave the opinions of others. Mining and detecting opinions from various sources are beneficial to individuals, organisations, and even governments. One such organisation is news media, where a general norm is not to showcase opinions from their side. Anchors are the face of the digital media, and it is req...
Preprint
Full-text available
The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes, which are of particular interest due to their vir...
Article
The current special issue of Neurocomputing was designed to encourage researchers from interdisciplinary domains working on multilingual social media analytics to think beyond the conventional way of combating online hostile posts. The special issue was primarily based on the theme of the First Workshop on Combating Online Hostile Posts in Regio...
Preprint
Being a popular mode of text-based communication in multilingual communities, code-mixing in online social media has become an important subject to study. Learning the semantics and morphology of code-mixed language remains a key challenge, due to the scarcity of data and the unavailability of robust and language-invariant representation learning technique...
Preprint
Full-text available
Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD...
Preprint
Indirect speech such as sarcasm achieves a constellation of discourse goals in human communication. While the indirectness of figurative language warrants speakers to achieve certain pragmatic goals, it is challenging for AI agents to comprehend such idiosyncrasies of human communication. Though sarcasm identification has been a well-explored topic...
Preprint
Full-text available
Fake news can spread quickly on social media, and it is important to detect it before it causes a lot of damage. Automatic fact/claim verification has recently become a topic of interest among diverse research communities. We present the findings of the Factify shared task, which aims to undertake multi-modal fact verification, organized as a part of th...
Preprint
Full-text available
We propose the task of automatically identifying papers used as baselines in a scientific article. We frame the problem as a binary classification task where all the references in a paper are to be classified as either baselines or non-baselines. This is a challenging problem due to the numerous ways in which a baseline reference can appear in a pa...
Article
Efficient discovery of a speaker’s emotional states in a multi-party conversation is important for designing human-like conversational agents. During a conversation, the cognitive state of a speaker often alters due to certain past utterances, which may lead to a flip in their emotional state. Therefore, discovering the reasons (triggers) behind the...
Preprint
Full-text available
Memes are commonly used on social media platforms for humour. Generally, memes consist of an image and embedded text. Memes can be used to spread hate or misinformation, hence it is important to study them. The Memotion task [1], conducted at SemEval 2020, released a dataset of 10k memes annotated with sentiment label (task A), emotion label (task B) a...
Preprint
Full-text available
Detecting and labeling stance in social media text is strongly motivated by hate speech detection, poll prediction, engagement forecasting, and concerted propaganda detection. Today's best neural stance detectors need large volumes of training data, which is difficult to curate given the fast-changing landscape of social media text and issues on wh...
Preprint
Full-text available
Since the proliferation of social media usage, hate speech has become a major crisis. Hateful content can spread quickly and create an environment of distress and hostility. Further, what can be considered hateful is contextual and varies with time. While online hate speech reduces the ability of already marginalised groups to participate in discus...
Chapter
We propose the task of automatically identifying papers used as baselines in a scientific article. We frame the problem as a binary classification task where all the references in a paper are to be classified as either baselines or non-baselines. This is a challenging problem due to the numerous ways in which a baseline reference can appear in a pa...
Article
The behavior of information cascades (such as retweets) has been modeled extensively. While point process-based generative models have long been in use for estimating cascade growths, deep learning has greatly enhanced the integration of diverse features and signals. We observe two significant temporal signals in cascade data that have not been rep...
Preprint
Full-text available
With an increasing outreach of digital platforms in our lives, researchers have taken a keen interest in studying different facets of social interactions that seem to be evolving rapidly. Analysing the spread of information (aka diffusion) has brought forth multiple research areas such as modelling user engagement, determining emerging topics, forecas...
Article
Online media platforms have enabled users to connect with individuals and organizations, and share their thoughts. Other than connectivity, these platforms also serve multiple purposes, such as education, promotion, updates, and awareness. Increasing the reputation of individuals in online media (aka social reputation) is thus essential these day...
Preprint
Full-text available
Social reputation (e.g., likes, comments, shares, etc.) on YouTube is the primary tenet to popularize channels/videos. However, the organic way to improve social reputation is tedious, which often provokes content creators to seek the services of online blackmarkets for rapidly inflating content reputation. Such blackmarkets act underneath a thrivi...
Preprint
Full-text available
The onset of the COVID-19 pandemic has put people's mental health at risk. Social counselling has gained remarkable significance in this environment. Unlike general goal-oriented dialogues, a conversation between a patient and a therapist is considerably implicit, though the objective of the conversation is quite apparent. In such a cas...
Article
Aggression is a prominent trait of human beings that can affect social harmony in a negative way. Hate mongers misuse freedom of speech on social media platforms, flooding them with venomous comments in many forms. Identifying the different traits of online offense is thus essential and the need of the hour. Existing studies usually handle one...
Article
YouTube sells advertisements on the posted videos, which in turn enables the content creators to monetize their videos. As an unintended consequence, this has proliferated various illegal activities such as artificial boosting of views, likes, comments, and subscriptions. We refer to such videos (gaining likes and comments artificially) and channel...
Article
Many classes of network growth models have been proposed in the literature for capturing real-world complex networks. Existing research primarily focuses on global characteristics of these models, e.g., degree distribution. We aim to shift the focus towards studying the network growth dynamics from the perspective of individual nodes. In this paper...
Preprint
Full-text available
The Transformer and its variants have been proven to be efficient sequence learners in many different domains. Despite their staggering success, a critical issue has been the enormous number of parameters that must be trained (ranging from $10^7$ to $10^{11}$) along with the quadratic complexity of dot-product attention. In this work, we investigat...
Preprint
Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent days have witnessed a proliferation of harmful memes targeted to abuse various social entities. As most harmful m...
Chapter
The number of citations received by a scientific article has long been used as a proxy for its influence/impact on the research field. Raw citation count, however, treats all the citations received by a paper equally and ignores the evolution and organization of follow-up studies inspired by the article, thereby failing to capture the...
Preprint
Full-text available
Internet memes have become powerful means to transmit political, psychological, and socio-cultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abusing social entities. Detecting such harmful memes is challenging as they can be highly satirical and cryp...
Preprint
Full-text available
Due to the over-emphasis on the quantity of data, data quality has often been overlooked. However, not all training data points contribute equally to learning. In particular, a mislabeled data point might actively damage the performance of the model and its ability to generalize out of distribution, as the model might end up learning spurious artifa...
Preprint
Full-text available
The formulation of a claim rests at the core of argument mining. To demarcate between a claim and a non-claim is arduous for both humans and machines, owing to latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the increase in the usage of online social media has resulted in an ex...
Article
An overwhelming amount of data is generated every day on social media, encompassing a wide spectrum of topics. With almost every business decision depending on customer opinion, mining of social media data needs to be quick and easy. For a data analyst to keep up with the agility and the scale of the data, it is impossible to bank on fully supervised...
Preprint
Full-text available
The behaviour of information cascades (such as retweets) has been modelled extensively. While point process-based generative models have long been in use for estimating cascade growths, deep learning has greatly enhanced diverse feature integration. We observe two significant temporal signals in cascade data that have not been emphasized or reporte...
Preprint
Full-text available
Understanding the linguistics and morphology of resource-scarce code-mixed texts remains a key challenge in text processing. Although word embedding comes in handy to support downstream tasks for low-resource languages, there is plenty of scope for improving the quality of language representation, particularly for code-mixed languages. In this paper, w...
Article
Sarcasm detection and humor classification are inherently subtle problems, primarily due to their dependence on contextual and non-verbal information. Furthermore, existing studies on these two topics are usually limited for non-English languages such as Hindi, owing to the unavailability of high-quality annotated datasets. In this work, we mak...
Article
In this article, we address the problem of data scarcity for the sequence classification tasks. We propose AugmentGAN, a simple-yet-effective generative adversarial network-based text augmentation model, which ensures syntactic coherency in the newly generated samples. Given an input with a label, AugmentGAN aims to generate a semantically similar...
Article
The rise of online media has incentivized users to adopt various unethical and artificial ways of gaining social growth to boost their credibility within a short time period. In this paper, we introduce ABOME, a novel multi-platform data repository consisting of artificially boosted online media entities (also known as blackmarket-driven collusive...
Article
Short text is a popular avenue of sharing feedback, opinions and reviews on social media, e-commerce platforms, etc. Many companies need to extract meaningful information (which may include thematic content as well as semantic polarity) out of such short texts to understand users’ behaviour. However, obtaining high quality sentiment-associated and...
Article
Today's Internet is awash in memes, as they are humorous, satirical, or ironic and make people laugh. According to a survey, 33% of social media users in the age bracket [13-35] send memes every day, whereas more than 50% send them every week. Some of these memes spread rapidly within a very short time-frame, and their virality depends on the novelty of th...
Preprint
Full-text available
In recent years, abstractive text summarization with multimodal inputs has started drawing attention due to its ability to accumulate information from different source modalities and generate a fluent textual summary. However, existing methods use short videos as the visual modality and short summaries as the ground truth, and therefore perform poorly o...
Preprint
Sarcasm detection and humor classification are inherently subtle problems, primarily due to their dependence on contextual and non-verbal information. Furthermore, existing studies on these two topics are usually limited for non-English languages such as Hindi, owing to the unavailability of high-quality annotated datasets. In this work, we mak...
Conference Paper
Full-text available
Curbing hate speech is undoubtedly a major challenge for online microblogging platforms like Twitter. While there have been studies around hate speech detection, it is not clear how hate speech finds its way into an online discussion. It is important for a content moderator to not only identify which tweet is hateful, but also to predict which twee...
Article
Code-mixing is the practice of alternating between two or more languages. Most sentiment analysis research has been monolingual, and the resulting methods perform poorly on code-mixed text. We introduce methods that use multilingual and cross-lingual embeddings to transfer knowledge from monolingual text to code-mixed text for code-mixed sentiment ana...
Article
The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently generate a human-like description for the image (natural language understanding). In recent times, encoder-dec...
Chapter
Fake tweets are observed to be ever-increasing, demanding immediate countermeasures to combat their spread. During COVID-19, tweets with misinformation should be flagged and neutralised in their early stages to mitigate the damage. Most of the existing methods for early detection of fake news assume that enough propagation information is available for large...
Article
Cohesive subgraph discovery is an important problem in bipartite graph mining. In this paper, we focus on one kind of cohesive structure, called k-biplex, where each vertex of one side is disconnected from at most k vertices of the other side. We consider the large maximal k-biplex enumeration problem which is to list all those maximal k-biplexes w...
Article
In recent years, abstractive text summarization with multimodal inputs has started drawing attention due to its ability to accumulate information from different source modalities and generate a fluent textual summary. However, existing methods use short videos as the visual modality and short summaries as the ground truth, and therefore perform poorly o...
Chapter
Fake news has become the norm in our times. With the coming of social media platforms, where anyone can write/post on any issues without any regulations, sometimes based on what they read from various platforms and sometimes based on what they are asked to believe in by political ideologies, fake news has become a phenomenon that requires serious a...
Chapter
Nowadays, the diversified services on social media cause news to be diffused at a higher rate and in larger volumes, which poses unique challenges in terms of the efficiency, scalability, and accuracy of fake news detection. To solve these issues, graph mining, as a promising direction of data mining, has successfully attracted the attention of recent studies...
Chapter
With the rise of social media, the world is faced with the challenge of increasing health-related fake news more than ever before. We are constantly flooded with health-related information through various online platforms, many of which turn out to be inaccurate and misleading. This chapter provides an overview of various health fake news and relat...
Chapter
To date, there is no comprehensive linguistic description of fake news. This chapter surveys a range of fake news detection research, focusing specifically on that which adopts a linguistic approach as a whole or as part of an integrated approach. Areas where linguistics can support fake news characterisation and detection are identified, namely, i...
Chapter
Diverse demographics, culture, and language; a troubled history of communal violence, polarised politics, and sensationalist media; and a recent explosion in smartphone ownership and Internet access have created a “fake news” crisis in India which threatens both its democratic values and the security of its citizens. One of the unique features of I...