
Shammur Absar Chowdhury
Qatar Computing Research Institute · ALT
PhD
Publications: 70
Reads: 28,880
Citations: 595
Introduction
I am interested in analyzing and understanding human conversation. I have authored more than 30 papers for different speech and NLP challenges, with a main focus on speech overlaps, turn-taking, speech discourse, and code-switching, along with the explainability of speech modules. My work also includes studying the linguistic task understanding capabilities of language models.
Publications (70)
User satisfaction is an important aspect of the user experience while interacting with objects, systems or people. Traditionally, user satisfaction is evaluated a posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis we aim at measuring the user's emotional states and their descriptions as they unfold during the...
The paper explores the ability of LSTM networks trained on a language modeling task to detect linguistic structures which are ungrammatical due to extraction violations (extra arguments and subject-relative clause island violations), and considers its implications for the debate on language innatism. The results show that the current RNN model can...
Overlapping speech is a natural and frequently occurring phenomenon in human–human conversations with an underlying purpose. Speech overlap events may be categorized as competitive and non-competitive. While the former is an attempt to grab the floor, the latter is an attempt to assist the speaker to continue the turn. The presence and distribution...
An end-to-end dialect identification system generates the likelihood of each dialect, given a speech utterance. The performance relies on its capabilities to discriminate the acoustic properties between the different dialects, even though the input signal contains non-dialectal information such as speaker and channel. In this work, we study how non...
With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. W...
We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Allia...
Gender analysis of Twitter can reveal important socio-cultural differences between male and female users. In the past, there has been a significant effort to analyze and automatically infer gender for content in the most widely spoken languages; however, to our knowledge, very limited work has been done for Arabic. In this paper, we perform an extensive an...
The emergence of the COVID-19 pandemic and the first global infodemic have changed our lives in many different ways. We relied on social media to get the latest information about the COVID-19 pandemic and at the same time to disseminate information. The content in social media consisted not only of health-related advice, plans, and informative news f...
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets -- anal...
The pervasiveness of intra-utterance Code-switching (CS) in spoken content has forced ASR systems to handle mixed input. Yet, designing a CS-ASR has many challenges, mainly due to data scarcity, grammatical structure complexity, and mismatch along with unbalanced language usage distribution. Recent ASR studies showed the predominance of E2E-A...
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over mono-lingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph a...
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QA...
Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered as a low-resource language in the natural language processing (NLP) community. With three decades of research, Bangla NLP (BNLP) is still lagging behind mainly due to the...
End-to-end deep neural network architectures have pushed the state-of-the-art in speech technologies, as well as in other spheres of Artificial Intelligence, subsequently leading researchers to train more complex and deeper models. These improvements came at the cost of transparency. Deep neural networks are innately opaque and difficult to interpr...
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph ap...
Personified big data and rapidly developing data science techniques enable previously unforeseen methodological developments for longitudinal analysis of online audiences. Applying data-driven persona generation on online customer statistics from a real organizational social media channel, we demonstrate how personas can be deployed to understand o...
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QASR c...
In this paper, we present the Kanari/QCRI (KARI) system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages. The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English, collected in a real-life scenario. To tac...
False preconceptions about users can result in poor design, product development, and marketing decisions, so rectifying these preconceptions is essential for organizations. This research quantitatively evaluates the ability of data-driven personas to alter decision makers’ preconceptions about their online social media users. We conduct a within-pa...
Sentiment analysis has been widely used to understand our views on social and political agendas or user experiences over a product. It is one of the core, well-researched areas in NLP. However, for low-resource languages, like Bangla, one of the prominent challenges is the lack of resources. Another important limitation, in the current literatur...
Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems. In this paper, we use such text categorization i.e., labeling the social media posts to categories like 'sports', 'politics', 'human-rights' among others, to showcase the efficacy of...
Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Despite the fact that spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at th...
Algorithmic fairness criteria for machine learning models are gathering widespread research interest. They are also relevant in the context of data-driven personas that rely on online user data and opaque algorithmic processes. Overall, while technology provides lucrative opportunities for the persona design practice, several ethical concerns need...
To predict personality traits of data-driven personas, we apply an automatic persona generation methodology to generate 15 personas from the social media data of an online news organization. After generating the personas, we aggregate each persona’s YouTube comments and predict the “Big Five” personality traits of each persona from the comments per...
Access to social media often enables users to engage in conversation with limited accountability. This allows a user to share their opinions and ideology, especially regarding public content, occasionally adopting offensive language. This may encourage hate crimes or cause mental harm to targeted individuals or groups. Hence, it is important to det...
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection. The shared task consists of two subtasks: offensive language detection (Subtask A) and hate speech detection (Subtask B). For offensive language detection, a system combination of Support Vector Machines (SVMs) and Deep Neural Networks (DNNs) achieved the b...
Artificial generation of facial images is increasingly popular, with machine learning achieving photo-realistic results. Yet, there is a concern that the generated images might not fairly represent all demographic groups. We use a state-of-the-art method to generate 10,000 facial images and find that the generated images are skewed towards young pe...
The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate det...
Due to the rapid advancement of different neural network architectures, the task of automated translation from one language to another is now in a new era of Machine Translation (MT) research. In the last few years, Neural Machine Translation (NMT) architectures have proven to be successful for resource-rich languages, trained on a large dataset of...
Social media analytics is insightful, but it can also be difficult to use within organizations due to lack of analytics skills and empathy towards raw numbers portraying target groups. To address this concern, we present Automatic Persona Generation (APG), a system and methodology [1] for quantitatively generating personas using large amounts of on...
Machine translation systems facilitate our communication and access to information, taking down language barriers. It is a well-researched area of Natural Language Processing (NLP), especially for resource-rich languages (e.g., language pairs in Europarl Parallel corpus). Besides these languages, there is also work on other language pairs including...
We propose a novel approach to the study of how artificial neural networks perceive the distinction between grammatical and ungrammatical sentences, a crucial task in the growing field of synthetic linguistics. The method is based on performance measures of language models trained on corpora and fine-tuned with either grammatical or ungrammatical se...
Named Entity Recognition is one of the fundamental problems for Information Extraction and the task is to find the mentioned entities in text. Over the years there has been significant progress in Named Entity Recognition (NER) research for resource-rich languages such as English, Chinese, and Italian. Although there are a number of studies for Ba...
Depression is a major debilitating disorder which can affect people of all ages. With a continuous increase in the number of annual cases of depression, there is a need to develop automatic techniques for the detection of the presence and extent of depression. In this AVEC challenge we explore different modalities (speech, language and visual fea...
Silence is an integral part of the most frequent turn-taking phenomena in spoken conversations. Silence is sized and placed within the conversation flow and it is coordinated by the speakers along with the other speech acts. The objective of this analytical study is twofold: to explore the functions of silence with duration of one second and above,...
Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional met...
The study of human interaction dynamics has been at the center of multiple research disciplines, including computer and social sciences, conversational analysis and psychology, for decades. Recent interest has been shown with the aim of designing computational models to improve human-machine interaction systems as well as support humans in thei...
The motivation behind the research on overlapping speech has always been dominated by the need to model human-machine interaction for dialog systems and conversation analysis. To have more complex insights of the interlocutors’ intentions behind the interaction, we need to understand the type of overlaps. Overlapping speech signals the interlocu...
Part-of-speech (POS) information is one of the fundamental components in the natural language processing pipeline, which helps in extracting higher-level information such as named entities, discourse, and syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, it is considered as a solved problem due to the highe...
In this paper, we aim to investigate the coordination of interlocutors' behavior in different emotional segments. Conversational coordination between the interlocutors is the tendency of speakers to predict and adjust to each other over the course of an ongoing conversation. In order to find such coordination, we investigated 1) lexical similarities betw...
Discourse parsing is an important task in Language Understanding with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and parsers heavily rely on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the...
Overlapping speech is one of the most frequently occurring events in the course of human-human conversations. Understanding the dynamics of overlapping speech is crucial for conversational analysis and for modeling human-machine dialog. Overlapping speech may signal the speaker’s intention to grab the floor with a competitive vs non-competitive act...
Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks like semantic annotation transfer require workers to take simultaneous decisions on chunk segmentation and labeling while acquir...