About
268
Publications
45,783
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
8,535
Citations
Publications
Publications (268)
We propose a lesion-aware graph neural network (LEGNet) to predict language ability from resting-state fMRI (rs-fMRI) connectivity in patients with post-stroke aphasia. Our model integrates three components: an edge-based learning module that encodes functional connectivity between brain regions, a lesion encoding module, and a subgraph learning mo...
Predicting emotions elicited by news headlines can be challenging as the task is largely influenced by the varying nature of people's interpretations and backgrounds. Previous works have explored classifying discrete emotions directly from news headlines. We provide a different approach to tackling this problem by utilizing people's explanations of...
Handwriting synthesis, the task of automatically generating realistic images of handwritten text, has gained increasing attention in recent years, both as a challenge in itself, as well as a task that supports handwriting recognition research. The latter task is to synthesize large image datasets that can then be used to train deep learning models...
News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called \say{frames} in communication research. We study, f...
Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architectur...
See also the commentary by Sitek in this issue. Supplemental material is available for this article.
Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architectur...
A major bottleneck of interdisciplinary computer vision (CV) research is the lack of a framework that eases the reuse and abstraction of state-of-the-art CV models by CV and non-CV researchers alike. We present here BU-CVKit, a computer vision framework that allows the creation of research pipelines with chainable Processors. The community can crea...
Background
Although speech-language therapy (SLT) is proven to be beneficial to recovery of post-stroke aphasia, delivering sufficiently high amounts of dosage remains a problem in real-world clinical practice. Self-managed SLT was introduced to solve the problem. Previous research showed in a 10-week period, increased dosage frequency could lead t...
Focusing on a polarized issue—U.S. gun violence—this study examines agenda setting as an antecedent of political expression on social media. A state-of-the-art machine-learning model was used to analyze news coverage from 25 media outlets—mainstream and partisan. Those results were paired with a two-wave panel survey conducted during the 2018 U.S....
Hand-held ultrasound devices are emerging as a promising intervention to aid in diagnosing deadly early childhood pneumonia in the developing world. Lung ultrasound (LUS) data, however, can be difficult to read and interpret accurately, and thus require trained professionals. A variety of deep learning (DL) models have been developed to aid profess...
While transformers have greatly boosted performance in semantic segmentation, domain adaptive transformers are not yet well explored. We identify that the domain gap can cause discrepancies in self-attention. Due to this gap, the transformer attends to spurious regions or pixels, which deteriorates accuracy on the target domain. We propose to perfo...
Accurate tracking of the 3D pose of animals from video recordings is critical for many behavioral studies, yet there is a dearth of publicly available datasets that the computer vision community could use for model development. We here introduce the Rodent3D dataset that records animals exploring their environment and/or interacting with each other...
While pose estimation is an important computer vision task, it requires expensive annotation and suffers from domain shift. In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision. While several domain adaptive pose estimatio...
A major challenge for online learning systems is supporting students’ engagement. Online systems are sometimes boring, repetitive, and unappealing; external distractions often lead to off-task behavior, and a decline in learning. Student engagement and emotion are also tightly correlated with learning gains because emotion drives attention and atte...
While pose estimation is an important computer vision task, it requires expensive annotation and suffers from domain shift. In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision. While several domain adaptive pose estimatio...
We propose a five-step computational framing analysis framework that researchers can use to analyze multilingual news data. The framework combines unsupervised and supervised machine learning and leverages a state-of-the-art multilingual deep learning model, which can significantly enhance frame prediction performance while requiring a considerably...
Background:
Poststroke recovery depends on multiple factors and varies greatly across individuals. Using machine learning models, this study investigated the independent and complementary prognostic role of different patient-related factors in predicting response to language rehabilitation after a stroke.
Methods:
Fifty-five individuals with chr...
We propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS) by analyzing their faces and gestures. The ability to predict such outcomes enables tutoring systems to adjust interventions and ultimately yield improved student learning. We collected and released a lab...
Background
Accurate patient identification is essential for delivering longitudinal care. Our team developed an ear biometric system (SEARCH) to improve patient identification. To address how ear growth affects matching rates longitudinally, we constructed an infant cohort, obtaining ear image sets monthly to map a 9-month span of observations. Thi...
Abstract— In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student’s face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning th...
Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more tha...
When journalists cover a news story, they can cover the story from multiple angles or perspectives. These perspectives are called "frames", and usage of one frame or another may influence public perception and opinion of the issue at hand. We develop a web-based system for analyzing frames in multilingual text documents. We propose and guide users...
News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called "frames" in communication research. We study, for t...
Deep learning is a powerful tool for assessing pathology data obtained from digitized biopsy slides. In the context of supervised learning, most methods typically divide a whole slide image (WSI) into patches, aggregate convolutional neural network outcomes on them and estimate
overall disease grade. However, patch-based methods introduce label noi...
While using online learning software, students demonstrate many reactions, various levels of engagement, and emotions (e.g. confusion, boredom, excitement). Having such information automatically accessible to teachers (or digital tutors) can aid in understanding how students are progressing, and suggest who and when needs further assistance. As par...
Unsupervised domain adaptation for semantic segmentation has been intensively studied due to the low cost of the pixel-level annotation for synthetic data. The most common approaches try to generate images or features mimicking the distribution in the target domain while preserving the semantic contents in the source domain so that a model can be t...
Interstitial fibrosis and tubular atrophy (IFTA) on a renal biopsy are strong indicators of disease chronicity and prognosis. Techniques that are typically used for IFTA grading remain manual, leading to variability among pathologists. Accurate IFTA estimation using computational techniques can reduce this variability and provide quantitative asses...
Despite several transient spikes in response to the deadliest mass shootings, the U.S. population continues to perceive gun violence as less important than other issues, and public opinion remains divided along partisan lines. Drawing upon literature of compelling arguments and partisan media, this study investigates what kind of news framing—episo...
While visual object detection with deep learning has received much attention in the past decade, cases when heavy intra-class occlusions occur have not been studied thoroughly. In this work, we propose a Non-Maximum-Suppression (NMS) algorithm that dramatically improves the detection recall while maintaining high precision in scenes with heavy occl...
Patient identification in low- to middle-income countries is one of the most pressing public health challenges of our day. Given the ubiquity of mobile phones, their use for health-care coupled with a biometric identification method, present a unique opportunity to address this challenge. Our research proposes an Android-based solution of an ear bi...
Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It will need focus on page regions with text skipping non-text regions that include...
Unsupervised domain adaptation for semantic segmentation has been intensively studied due to the low cost of the pixel-level annotation for synthetic data. The most common approaches try to generate images or features mimicking the distribution in the target domain while preserving the semantic contents in the source domain so that a model can be t...
Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in a scene image and potentially highly complex backgrounds make text recognition challenging. Previous approaches employ character sequence generators to analyze text regions and, subsequently, com...
When journalists cover a news story, they can cover the story from multiple angles or perspectives. A news article written about COVID-19 for example, might focus on personal preventative actions such as mask-wearing, while another might focus on COVID-19's impact on the economy. These perspectives are called "frames," which when used may influence...
We present a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (...
In this paper, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of details in the image. We explore unsupervised information extraction from intermediate convolutional layers of deep neural networks to measure visual complexity. We derive an activation energy metric that combines convolution...
In the context of building an intelligent tutoring system (ITS), which improves student learning outcomes by intervention, we set out to improve prediction of student problem outcome. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. For this, we present a n...
We report results of a comparison of the accuracy of crowdworkers and seven NaturalLanguage Processing (NLP) toolkits in solving two important NLP tasks, named-entity recognition (NER) and entity-level sentiment(ELS) analysis. We here focus on a challenging dataset, 1,000 political tweets that were collected during the U.S. presidential primary ele...
Crowdcoding, a method that outsources “coding” tasks to numerous people on the internet, has emerged as a popular approach for annotating texts and visuals. However, the performance of this approach for analyzing social media data in the context of journalism and mass communication research has not been systematically assessed. This study evaluated...
In the past decade, deep learning based visual object detection has received a significant amount of attention, but cases when heavy intra-class occlusions occur are not studied thoroughly. In this work, we propose a novel Non-MaximumSuppression (NMS) algorithm that dramatically improves the detection recall while maintaining high precision in scen...
We introduce a new framework for text detection named SA-Text meaning "Simple but Accurate," which utilizes heatmaps to detect text regions in natural scene images effectively. SA-Text detects text that occurs in various fonts, shapes, and orientations in natural scene images with complicated backgrounds. Experiments on three challenging and public...
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior.We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56...
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher...
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56...
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) d...
Home-based exercising is a vital part of any physical therapy program. With correct execution of the exercises, faster recovery from physical impairments can be achieved. With the conventional approaches, however, a home-based physical therapy program may not be as effective due to the lack of supervision by the therapist at home. ExerciseCheck is...
ExerciseCheck is a scalable, accessible platform designed and developed for the remote monitoring and evaluation of physical therapy. Physical rehabilitation is an important aspect of a patient's recovery from injury and often requires the patient to perform a prescribed set of exercises over medium- to long-term periods. In the absence of a physic...
We present a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10objects (c...
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher...
Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. We here show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze easy-to-label data and more workers to analyze data th...