Shammur Absar Chowdhury

Qatar Computing Research Institute · ALT

PhD

About

100
Publications
52,786
Reads
1,405
Citations
Introduction
I am interested in analyzing and understanding human conversation. I have authored more than 30 papers for different speech and NLP challenges, focusing mainly on speech overlaps, turn-taking, speech discourse, and code-switching, along with the explainability of speech modules. My work also includes studying the linguistic task understanding capabilities of language models.
Additional affiliations
May 2019 - present
Qatar Computing Research Institute
Position
  • PostDoc Position
September 2017 - April 2019
University of Trento
Position
  • PostDoc Position
November 2012 - April 2017
University of Trento
Position
  • PhD
Description
  • Analyzing turn-taking behavior and different types of overlap and silence in a conversation.
Education
November 2012 - April 2017
University of Trento
Field of study
  • Department of Information Engineering and Computer Science
January 2007 - December 2010
BRAC University
Field of study
  • Computer Science
January 2007 - December 2010
BRAC University
Field of study
  • Electronics and Communication Engineering


Publications (100)
Conference Paper
Full-text available
User satisfaction is an important aspect of the user experience while interacting with objects, systems or people. Traditionally user satisfaction is evaluated a-posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis we aim at measuring the user emotional states and its descriptions as they unfold during the...
Conference Paper
Full-text available
The paper explores the ability of LSTM networks trained on a language modeling task to detect linguistic structures which are ungrammatical due to extraction violations (extra arguments and subject-relative clause island violations), and considers its implications for the debate on language innatism. The results show that the current RNN model can...
Article
Full-text available
Overlapping speech is a natural and frequently occurring phenomenon in human–human conversations with an underlying purpose. Speech overlap events may be categorized as competitive and non-competitive. While the former is an attempt to grab the floor, the latter is an attempt to assist the speaker to continue the turn. The presence and distribution...
Conference Paper
Full-text available
An end-to-end dialect identification system generates the likelihood of each dialect, given a speech utterance. The performance relies on its capabilities to discriminate the acoustic properties between the different dialects, even though the input signal contains non-dialectal information such as speaker and channel. In this work, we study how non...
Preprint
Full-text available
With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. W...
Preprint
Full-text available
Arabic, with its rich diversity of dialects, remains significantly underrepresented in Large Language Models, particularly in dialectal variations. We address this gap by introducing seven synthetic datasets in dialects alongside Modern Standard Arabic (MSA), created using Machine Translation (MT) combined with human post-editing. We present AraDiC...
Preprint
Full-text available
This paper presents a novel Dialectal Sound and Vowelization Recovery framework, designed to recognize borrowed and dialectal sounds within phonologically diverse and dialect-rich languages, that extends beyond its standard orthographic sound sets. The proposed framework utilized a quantized sequence of input with(out) continuous pretrained self-su...
Preprint
Full-text available
Natural Question Answering (QA) datasets play a crucial role in developing and evaluating the capabilities of large language models (LLMs), ensuring their effective usage in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own l...
Conference Paper
Full-text available
Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP)...
Preprint
Full-text available
Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model sim...
Preprint
Full-text available
Children's speech recognition is considered a low-resource task mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes, and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information bu...
Article
Full-text available
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets—analyzi...
Preprint
Full-text available
The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In thi...
Preprint
Full-text available
We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets; and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances...
Preprint
Full-text available
The disparity in phonology between learner's native (L1) and target (L2) language poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data assisted by...
Preprint
Full-text available
With large Foundation Models (FMs), language technologies (AI in general) are entering a new paradigm: eliminating the need for developing large-scale task-specific datasets and supporting a variety of tasks through set-ups ranging from zero-shot to few-shot learning. However, understanding FMs capabilities requires a systematic benchmarking effort...
Preprint
Full-text available
This paper introduces a novel Arabic pronunciation learning application QVoice, powered with end-to-end mispronunciation detection and feedback generator module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence from regional...
Preprint
Full-text available
The success of the multilingual automatic speech recognition systems empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge, due to its dependency on manually transcribed speech data in both mono- and multilingual scenarios. In this paper, we propose a novel multilingual framework -- e...
Preprint
Full-text available
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minima...
Preprint
Full-text available
One of the biggest challenges in designing mispronunciation detection models is the unavailability of labeled L2 speech data. To overcome such data scarcity, we introduce SpeechBlender -- a fine-grained data augmentation pipeline for generating mispronunciation errors. The SpeechBlender utilizes varieties of masks to target different regions of a p...
Preprint
Full-text available
We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Allia...
Preprint
Full-text available
Gender analysis of Twitter can reveal important socio-cultural differences between male and female users. There has been a significant effort to analyze and automatically infer gender in the past for most widely spoken languages' content, however, to our knowledge very limited work has been done for Arabic. In this paper, we perform an extensive an...
Preprint
Full-text available
The emergence of the COVID-19 pandemic and the first global infodemic have changed our lives in many different ways. We relied on social media to get the latest information about the COVID-19 pandemic and at the same time to disseminate information. The content in social media consisted not only health related advises, plans, and informative news f...
Preprint
Full-text available
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets -- anal...
Preprint
Full-text available
The pervasiveness of intra-utterance Code-switching (CS) in spoken content has enforced ASR systems to handle mixed input. Yet, designing a CS-ASR has many challenges, mainly due to the data scarcity, grammatical structure complexity, and mismatch along with unbalanced language usage distribution. Recent ASR studies showed the predominance of E2E-A...
Conference Paper
Full-text available
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph a...
Conference Paper
Full-text available
With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. W...
Conference Paper
Full-text available
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QA...
Preprint
Full-text available
Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered as a low-resource language in the natural language processing (NLP) community. With three decades of research, Bangla NLP (BNLP) is still lagging behind mainly due to the...
Preprint
Full-text available
End-to-end deep neural network architectures have pushed the state-of-the-art in speech technologies, as well as in other spheres of Artificial Intelligence, subsequently leading researchers to train more complex and deeper models. These improvements came at the cost of transparency. Deep neural networks are innately opaque and difficult to interpr...
Preprint
Full-text available
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph ap...
Article
Personified big data and rapidly developing data science techniques enable previously unforeseen methodological developments for longitudinal analysis of online audiences. Applying data-driven persona generation on online customer statistics from a real organizational social media channel, we demonstrate how personas can be deployed to understand o...
Preprint
Full-text available
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QASR c...
Preprint
Full-text available
In this paper, we present the Kanari/QCRI (KARI) system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages. The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English, collected in a real-life scenario. To tac...
Article
Full-text available
Automatic Speech Recognition refers to the process through which speech is converted into text. Over the decades, automatic speech recognition has achieved many milestones, thanks to advances in machine learning and low-cost computer hardware. As a result, the best systems for English have achieved a single-digit word error rate (WER) and, in some...
Article
False preconceptions about users can result in poor design, product development, and marketing decisions, so rectifying these preconceptions is essential for organizations. This research quantitatively evaluates the ability of data-driven personas to alter decision makers’ preconceptions about their online social media users. We conduct a within-pa...
Conference Paper
Full-text available
Sentiment analysis has been widely used to understand our views on social and political agendas or user experiences over a product. It is one of the core, well-researched areas in NLP. However, for low-resource languages, like Bangla, one of the prominent challenges is the lack of resources. Another important limitation, in the current literatur...
Conference Paper
Full-text available
Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems. In this paper, we use such text categorization i.e., labeling the social media posts to categories like 'sports', 'politics', 'human-rights' among others, to showcase the efficacy of...
Preprint
Full-text available
Sentiment analysis has been widely used to understand our views on social and political agendas or user experiences over a product. It is one of the core, well-researched areas in NLP. However, for low-resource languages, like Bangla, one of the prominent challenges is the lack of resources. Another important limitation, in the current literatur...
Conference Paper
Full-text available
The Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Despite the fact that spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at th...
Chapter
Full-text available
Algorithmic fairness criteria for machine learning models are gathering widespread research interest. They are also relevant in the context of data-driven personas that rely on online user data and opaque algorithmic processes. Overall, while technology provides lucrative opportunities for the persona design practice, several ethical concerns need...
Chapter
Full-text available
To predict personality traits of data-driven personas, we apply an automatic persona generation methodology to generate 15 personas from the social media data of an online news organization. After generating the personas, we aggregate each personas’ YouTube comments and predict the “Big Five” personality traits of each persona from the comments per...
Conference Paper
Full-text available
Access to social media often enables users to engage in conversation with limited accountability. This allows a user to share their opinions and ideology, especially regarding public content, occasionally adopting offensive language. This may encourage hate crimes or cause mental harm to targeted individuals or groups. Hence, it is important to det...
Article
Full-text available
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection. The shared task consists of two subtasks: offensive language detection (Subtask A) and hate speech detection (Subtask B). For offensive language detection, a system combination of Support Vector Machines (SVMs) and Deep Neural Networks (DNNs) achieved the b...
Conference Paper
Full-text available
User perceptions of personas affect the adoption of personas for decision-making in real organizations. To investigate how experience affects the way an individual perceives a persona, we conduct an experimental study with individuals less and more experienced with personas. Quantitative results show that previous experience increases several impor...
Conference Paper
Full-text available
Artificial generation of facial images is increasingly popular, with machine learning achieving photo-realistic results. Yet, there is a concern that the generated images might not fairly represent all demographic groups. We use a state-of-the-art method to generate 10,000 facial images and find that the generated images are skewed towards young pe...
Conference Paper
Full-text available
Personas are a well-known technique in human computer interaction. However, there is a lack of rigorous empirical research evaluating personas relative to other methods. In this 34-participant experiment, we compare a persona system and an analytics system, both using identical user data, for efficiency and effectiveness for a user identification t...
Article
Full-text available
The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate det...
Preprint
Full-text available
Due to the rapid advancement of different neural network architectures, the task of automated translation from one language to another is now in a new era of Machine Translation (MT) research. In the last few years, Neural Machine Translation (NMT) architectures have proven to be successful for resource-rich languages, trained on a large dataset of...