Shammur Absar Chowdhury

Qatar Computing Research Institute · ALT

PhD

About

Publications: 70
Reads: 28,880
Citations: 595
Introduction
I am interested in analyzing and understanding human conversation. I have authored more than 30 papers on a range of speech and NLP challenges, focusing mainly on speech overlaps, turn-taking, speech discourse, and code-switching, as well as the explainability of speech modules. My work also includes studying the potential of language models and their capabilities for understanding linguistic tasks.
Additional affiliations
May 2019 - present
Qatar Computing Research Institute
Position
  • PostDoc Position
September 2017 - April 2019
Università degli Studi di Trento
Position
  • PostDoc Position
November 2012 - April 2017
Università degli Studi di Trento
Position
  • PhD
Description
  • Analyzing turn-taking behavior and different types of overlap and silence in a conversation.
Education
November 2012 - April 2017
Università degli Studi di Trento
Field of study
  • Department of Information Engineering and Computer Science
January 2007 - December 2010
BRAC University
Field of study
  • Computer Science
January 2007 - December 2010
BRAC University
Field of study
  • Electronics and Communication Engineering

Publications (70)
Conference Paper
Full-text available
User satisfaction is an important aspect of the user experience while interacting with objects, systems or people. Traditionally, user satisfaction is evaluated a posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis, we aim at measuring the user's emotional states and their descriptions as they unfold during the...
Conference Paper
Full-text available
The paper explores the ability of LSTM networks trained on a language modeling task to detect linguistic structures which are ungrammatical due to extraction violations (extra arguments and subject-relative clause island violations), and considers its implications for the debate on language innatism. The results show that the current RNN model can...
Article
Full-text available
Overlapping speech is a natural and frequently occurring phenomenon in human–human conversations with an underlying purpose. Speech overlap events may be categorized as competitive and non-competitive. While the former is an attempt to grab the floor, the latter is an attempt to assist the speaker to continue the turn. The presence and distribution...
Conference Paper
Full-text available
An end-to-end dialect identification system generates the likelihood of each dialect, given a speech utterance. The performance relies on its capabilities to discriminate the acoustic properties between the different dialects, even though the input signal contains non-dialectal information such as speaker and channel. In this work, we study how non...
Preprint
Full-text available
With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. W...
Preprint
We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Allia...
Preprint
Full-text available
Gender analysis of Twitter can reveal important socio-cultural differences between male and female users. There has been significant past effort to analyze and automatically infer gender for content in the most widely spoken languages; however, to our knowledge, very limited work has been done for Arabic. In this paper, we perform an extensive an...
Preprint
Full-text available
The emergence of the COVID-19 pandemic and the first global infodemic have changed our lives in many different ways. We relied on social media to get the latest information about the COVID-19 pandemic and at the same time to disseminate information. The content in social media consisted not only of health-related advice, plans, and informative news f...
Preprint
Full-text available
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets -- anal...
Preprint
Full-text available
The pervasiveness of intra-utterance Code-switching (CS) in spoken content has forced ASR systems to handle mixed input. Yet, designing a CS-ASR has many challenges, mainly due to data scarcity, grammatical structure complexity, and mismatch along with unbalanced language usage distribution. Recent ASR studies showed the predominance of E2E-A...
Conference Paper
Full-text available
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over mono-lingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph a...
Conference Paper
Full-text available
With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. W...
Conference Paper
Full-text available
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QA...
Preprint
Full-text available
Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered as a low-resource language in the natural language processing (NLP) community. With three decades of research, Bangla NLP (BNLP) is still lagging behind mainly due to the...
Preprint
Full-text available
End-to-end deep neural network architectures have pushed the state-of-the-art in speech technologies, as well as in other spheres of Artificial Intelligence, subsequently leading researchers to train more complex and deeper models. These improvements came at the cost of transparency. Deep neural networks are innately opaque and difficult to interpr...
Preprint
Full-text available
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph ap...
Article
Personified big data and rapidly developing data science techniques enable previously unforeseen methodological developments for longitudinal analysis of online audiences. Applying data-driven persona generation on online customer statistics from a real organizational social media channel, we demonstrate how personas can be deployed to understand o...
Preprint
Full-text available
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QASR c...
Preprint
Full-text available
In this paper, we present the Kanari/QCRI (KARI) system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages. The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English, collected in a real-life scenario. To tac...
Article
False preconceptions about users can result in poor design, product development, and marketing decisions, so rectifying these preconceptions is essential for organizations. This research quantitatively evaluates the ability of data-driven personas to alter decision makers’ preconceptions about their online social media users. We conduct a within-pa...
Conference Paper
Full-text available
Sentiment analysis has been widely used to understand our views on social and political agendas or user experiences over a product. It is one of the core, well-researched areas in NLP. However, for low-resource languages like Bangla, one of the prominent challenges is the lack of resources. Another important limitation, in the current literatur...
Conference Paper
Full-text available
Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems. In this paper, we use such text categorization i.e., labeling the social media posts to categories like 'sports', 'politics', 'human-rights' among others, to showcase the efficacy of...
Preprint
Full-text available
Sentiment analysis has been widely used to understand our views on social and political agendas or user experiences over a product. It is one of the core, well-researched areas in NLP. However, for low-resource languages like Bangla, one of the prominent challenges is the lack of resources. Another important limitation, in the current literatur...
Conference Paper
Full-text available
Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Despite the fact that spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at th...
Chapter
Full-text available
Algorithmic fairness criteria for machine learning models are gathering widespread research interest. They are also relevant in the context of data-driven personas that rely on online user data and opaque algorithmic processes. Overall, while technology provides lucrative opportunities for the persona design practice, several ethical concerns need...
Chapter
Full-text available
To predict personality traits of data-driven personas, we apply an automatic persona generation methodology to generate 15 personas from the social media data of an online news organization. After generating the personas, we aggregate each persona’s YouTube comments and predict the “Big Five” personality traits of each persona from the comments per...
Conference Paper
Full-text available
Access to social media often enables users to engage in conversation with limited accountability. This allows a user to share their opinions and ideology, especially regarding public content, occasionally adopting offensive language. This may encourage hate crimes or cause mental harm to targeted individuals or groups. Hence, it is important to det...
Article
Full-text available
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection. The shared task consists of two subtasks: offensive language detection (Subtask A) and hate speech detection (Subtask B). For offensive language detection, a system combination of Support Vector Machines (SVMs) and Deep Neural Networks (DNNs) achieved the b...
Conference Paper
Full-text available
Artificial generation of facial images is increasingly popular, with machine learning achieving photo-realistic results. Yet, there is a concern that the generated images might not fairly represent all demographic groups. We use a state-of-the-art method to generate 10,000 facial images and find that the generated images are skewed towards young pe...
Article
Full-text available
The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate det...
Preprint
Full-text available
Due to the rapid advancement of different neural network architectures, the task of automated translation from one language to another is now in a new era of Machine Translation (MT) research. In the last few years, Neural Machine Translation (NMT) architectures have proven to be successful for resource-rich languages, trained on a large dataset of...
Conference Paper
Full-text available
Social media analytics is insightful, but it can also be difficult to use within organizations due to lack of analytics skills and empathy towards raw numbers portraying target groups. To address this concern, we present Automatic Persona Generation (APG), a system and methodology [1] for quantitatively generating personas using large amounts of on...
Preprint
Full-text available
Machine translation systems facilitate our communication and access to information, taking down language barriers. It is a well-researched area of Natural Language Processing (NLP), especially for resource-rich languages (e.g., language pairs in Europarl Parallel corpus). Besides these languages, there is also work on other language pairs including...
Conference Paper
Full-text available
We propose a novel approach to the study of how artificial neural networks perceive the distinction between grammatical and ungrammatical sentences, a crucial task in the growing field of synthetic linguistics. The method is based on performance measures of language models trained on corpora and fine-tuned with either grammatical or ungrammatical se...
Conference Paper
Full-text available
Named Entity Recognition is one of the fundamental problems for Information Extraction and the task is to find the mentioned entities in text. Over the years there has been significant progress in Named Entity Recognition (NER) research for resource-rich languages such as English, Chinese, and Italian. Although, there are a number of studies for Ba...
Article
Full-text available
Depression is a major debilitating disorder which can affect people of all ages. With a continuous increase in the number of annual cases of depression, there is a need to develop automatic techniques for the detection of the presence and extent of depression. In this AVEC challenge we explore different modalities (speech, language and visual fea...
Conference Paper
Full-text available
Silence is an integral part of the most frequent turn-taking phenomena in spoken conversations. Silence is sized and placed within the conversation flow and it is coordinated by the speakers along with the other speech acts. The objective of this analytical study is twofold: to explore the functions of silence with duration of one second and above,...
Article
Full-text available
Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional met...
Thesis
Full-text available
The study of human interaction dynamics has been at the center of multiple research disciplines, including computer and social sciences, conversational analysis, and psychology, for decades. Recent interest has been shown with the aim of designing computational models to improve human-machine interaction systems as well as support humans in thei...
Conference Paper
Full-text available
The motivation behind the research on overlapping speech has always been dominated by the need to model human-machine interaction for dialog systems and conversation analysis. To have more complex insights of the interlocutors’ intentions behind the interaction, we need to understand the type of overlaps. Overlapping speech signals the interlocu...
Conference Paper
Full-text available
Part-of-speech (POS) information is one of the fundamental components in the natural language processing pipeline, which helps in extracting higher-level information such as named entities, discourse, and syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, it is considered as a solved problem due to the highe...
Conference Paper
Full-text available
In this paper, we aim to investigate the coordination of interlocutors' behavior in different emotional segments. Conversational coordination between the interlocutors is the tendency of speakers to predict and adjust to each other accordingly in an ongoing conversation. In order to find such coordination, we investigated 1) lexical similarities betw...
Conference Paper
Full-text available
Discourse parsing is an important task in Language Understanding with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and parsers heavily rely on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the...
Conference Paper
Full-text available
Overlapping speech is one of the most frequently occurring events in the course of human-human conversations. Understanding the dynamics of overlapping speech is crucial for conversational analysis and for modeling human-machine dialog. Overlapping speech may signal the speaker’s intention to grab the floor with a competitive vs non-competitive act...
Conference Paper
Full-text available
Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks like semantic annotation transfer require workers to take simultaneous decisions on chunk segmentation and labeling while acquir...