About
471 Publications
193,651 Reads
45,025 Citations
Introduction
Nitesh Chawla is the Frank M. Freimann Professor of Computer Science and Engineering at the University of Notre Dame.
Additional affiliations
January 2007 - present
August 1997 - August 2002
Publications (471)
Introduction
Maintaining an affordable and nutritious diet can be challenging, especially for those living under the conditions of poverty. To fulfill a healthy diet, consumers must make difficult decisions within a complicated food landscape. Decisions must factor information on health and budget constraints, the food supply and pricing options at...
As artificial intelligence becomes more pervasive, explainability and the need to interpret machine learning models’ behavior emerge as critical issues. Discussions are usually bounded by those who defend that interpretable models must be the rule or that non-interpretable models’ ability to capture more complex patterns warrants their use. In this...
Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint tra...
Message Passing Neural Networks (MPNNs) have emerged as the de facto standard in graph representation learning. However, when it comes to link prediction, they often struggle, surpassed by simple heuristics such as Common Neighbor (CN). This discrepancy stems from a fundamental limitation: while MPNNs excel in node-level representation, they...
BACKGROUND
Older adults often face technological exclusion as their needs and contexts are not considered in the design of digital tools. The user-centered design (UCD) approach, centered on the specific needs and contexts of the user, can potentially address this exclusion.
OBJECTIVE
This study aimed to use the UCD approach to develop a connected...
Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks. While sharing the same message passing framework, our study shows that different GNNs learn distinct knowledge from the same graph. This implies potential performance improvement by distilling the complementary knowledge from multiple models. However, know...
Generative self-supervised learning (SSL), especially masked autoencoders, has become one of the most exciting learning paradigms and has shown great potential in handling graph data. However, real-world graphs are always heterogeneous, which poses three critical challenges that existing methods ignore: 1) how to capture complex graph structure? 2)...
Cross-domain graph few-shot learning attempts to address the prevalent data scarcity issue in graph mining problems. However, the utilization of cross-domain data induces another intractable domain shift issue which severely degrades the generalization ability of cross-domain graph few-shot learning models. The combat with the domain shift issue is...
This tutorial paper provides a general overview of symbolic regression (SR) with specific focus on standards of interpretability. We posit that interpretable modeling, although its definition is still disputed in the literature, is a practical way to support the evaluation of successful information fusion. In order to convey the benefits of SR as a...
Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and have been rapidly applied in various kinds of areas such as science, finance and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, we establish a comprehensive benchmark contai...
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors preserving the molecular structures and features, on top of which the downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved consider...
Data augmentation forms the cornerstone of many modern machine learning training pipelines; yet, the mechanisms by which it works are not clearly understood. Much of the research on data augmentation (DA) has focused on improving existing techniques, examining its regularization effects in the context of neural network over-fitting, or investigatin...
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes and their decisions are difficult to interpret. These problems are related because the method by which CNNs generalize to minority classes, which requires improvement, is wrapped in a black-...
The rapid advancement in data-driven research has increased the demand for effective graph data analysis. However, real-world data often exhibits class imbalance, leading to poor performance of machine learning models. To overcome this challenge, class-imbalanced learning on graphs (CILG) has emerged as a promising solution that combines the streng...
The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs o...
With the increasing deployment of small Unmanned Aerial Systems (sUAS) on various tasks, it becomes crucial to analyze and detect anomalies from their flight logs. To support research in this area, we curate DLA, the first real-world time series anomaly...
Spread of nonindigenous species by shipping is a large and growing global problem that harms coastal ecosystems and economies and may blur coastal biogeographic patterns. This study coupled eukaryotic environmental DNA (eDNA) metabarcoding with dissimilarity regression to test the hypothesis that ship-borne species spread homogenizes port communiti...
Graph Neural Networks (GNNs) have attracted tremendous attention by demonstrating their capability to handle graph data. However, they are difficult to deploy on resource-limited devices due to model sizes and scalability constraints imposed by the multi-hop data dependency. In addition, real-world graphs usually possess complex structural inf...
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes and their decisions are difficult to interpret. These problems are related because the method by which CNNs generalize to minority classes, which requires improvement, is wrapped in a blackb...
Mobile health (mHealth) technologies offer an opportunity to enable the care and support of community-dwelling older adults; however, research examining the use of mHealth in delivering quality of life (QoL) improvements in the older population is limited. We developed a tablet application (eSeniorCare) based on the Successful Aging framework and i...
Graph Neural Networks (GNNs) have been widely used on graph data and have shown exceptional performance in the task of link prediction. Despite their effectiveness, GNNs often suffer from high latency due to non-trivial neighborhood data dependency in practical deployments. To address this issue, researchers have proposed methods based on knowledge...
Tools that can help older adults self-manage multiple health goals in collaboration with their care managers are rare to find. Informed by the Self-Determination Theory, Goal-Oriented Care paradigm and our prior findings, we used an iterative, user-centered process to design a tablet application to facilitate Goal-Oriented care in community-dwellin...
While Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with non-Euclidean structural data, they are difficult to deploy in real applications due to the scalability constraint imposed by multi-hop data dependency. Existing methods attempt to address this scalability issue by training multi-layer perceptrons (MLPs) exclus...
Background
Febrile neutropenia (FN) is an early indicator of infection in oncology patients post-chemotherapy. We aimed to determine clinical predictors of septic shock and/or bacteremia in pediatric cancer patients experiencing FN and to create a model that classifies patients as low-risk for these outcomes.
Methods
This is a retrospective analys...
The self-supervised learning (SSL) paradigm is an essential exploration area, which tries to eliminate the need for expensive data labeling. Despite the great success of SSL methods in computer vision and natural language processing, most of them employ contrastive learning objectives that require negative samples, which are hard to define. This be...
Objectives
The impact and risk of SARS-CoV-2 transmission from asymptomatic and presymptomatic hosts remains an open question. This study measured the secondary attack rates (SARs) and relative risk (RR) of SARS-CoV-2 transmission from asymptomatic and presymptomatic index cases as compared with symptomatic index cases.
Methods
We used COVID-19 te...
The prevalence of wearable sensors (e.g., smart wristband) is creating unprecedented opportunities to not only inform health and wellness states of individuals, but also assess and infer personal attributes, including demographic and personality attributes. However, the data captured from wearables, such as heart rate or number of steps, presen...
Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommendation systems. In recent years, reinforcement learning (RL) based solutions for knowledge graphs have been demonstrated to be more interpretab...
Recipe recommendation systems play an essential role in helping people decide what to eat. Existing recipe recommendation systems typically focused on content-based or collaborative filtering approaches, ignoring the higher-order collaborative signal such as relational structure information among users, recipes and food items. In this paper, we for...
Learning effective recipe representations is essential in food studies. Unlike what has been developed for image-based recipe retrieval or learning structural text embeddings, the combined effect of multi-modal information (i.e., recipe images, text, and relation data) receives less attention. In this paper, we formalize the problem of multi-modal...
Graph representation learning has attracted tremendous attention due to its remarkable performance in many real-world applications. However, prevailing supervised graph representation learning models for specific tasks often suffer from the label sparsity issue, as data labeling is always time- and resource-consuming. In light of this, few-shot learning...
Terrorism is a major problem worldwide, causing thousands of fatalities and billions of dollars in damage every year. To address this threat, we propose a novel feature representation method and evaluate machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given calendar date and...
UNSTRUCTURED
Older adults remain susceptible to technological exclusion because digital tools aimed at them are often not informed by their needs and contexts. We used a Human-Centered Design (HCD) approach to implement a connected health system to support Goal-Oriented Care paradigm for community dwelling older adults. Along with 31 older adults,...
Graph neural networks (GNNs) continue to achieve state-of-the-art performance on many graph learning tasks, but rely on the assumption that a given graph is a sufficient approximation of the true neighborhood structure. In the presence of higher-order sequential dependencies, we show that the tendency of traditional graph representations to underfi...
It is well known that unhealthy food consumption plays a significant role in dietary and lifestyle-related diseases. Therefore, it is important for researchers to examine methods that may encourage the consumer to consider healthier dietary and lifestyle habits as diseases such as obesity, heart disease, and high blood pressure remain a worldwide i...
Graph representation learning has attracted tremendous attention due to its remarkable performance in many real-world applications. However, prevailing (semi-)supervised graph representation learning models for specific tasks often suffer from the label sparsity issue, as data labeling is always time- and resource-consuming. In light of this, few-shot le...
Although some research highlights the benefits of behavioral routines for individual functioning, other research indicates that routines can reflect an individual's inflexibility and lower well-being. Given conflicting accounts on the benefits of routine, research is needed to examine how routineness versus flexibility in health-related behaviors c...
COVID-19 remains a global threat in the face of emerging SARS-CoV-2 variants and gaps in vaccine administration and availability. In this study, we analyze a data-driven COVID-19 testing program implemented at a mid-sized university, which utilized two simple, diverse, and easily interpretable machine learning models to predict which students were...
Representation learning has overcome the often arduous and manual featurization of networks through (unsupervised) feature learning as it results in embeddings that can apply to a variety of downstream learning tasks. Representation learning on graphs has focused mainly on shallow (node-centric) or deep (graph-based) learning approache...
Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have further magnified the importance of the imbalanced data problem, especially when learning from images. Therefore, there is a need for an oversampling method that is specifi...
Recipe recommendation systems play an important role in helping people find recipes that are of their interest and fit their eating habits. Unlike what has been developed for recommending recipes using content-based or collaborative filtering approaches, the relational information among users, recipes, and food items is less explored. In this paper...
Dozens of terrorist attacks are perpetrated in the United States every year, often causing fatalities and other significant damage. Toward the end of better understanding and mitigating these attacks, we present a set of machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given...
Tablet technology and its associated applications have the potential to improve the quality of life of older adults. Current tablet usability studies involving older adults have been performed using qualitative measures focused on older generations of tablets, limited by their weight, power, resolution and availability of appropriate applications. We...
Tablets can open a new world for older adults and potentially improve their quality of life. We taught tablet skills to forty-two older adults, who were novice technology users. Sixteen socialized, group-based technology workshops were conducted and observational data was collected by the workshop facilitators. Thematic analysis revealed that older...
We hypothesize that behavioral patterns of people are reflected in how they interact with their mobile devices and that continuous sensor data passively collected from their phones and wearables can infer their job performance. Specifically, we study day-to-day job performance (improvement, no change, decline) of N=298 information workers using mobi...
Nucleosides are fundamental building blocks of DNA and RNA in all life forms and viruses. In addition, natural nucleosides and their analogs are critical in prebiotic chemistry, innate immunity, signaling, antiviral drug discovery and artificial synthesis of DNA / RNA sequences. Combined with the fact that quantitative structure activity relationsh...
Spread of nonindigenous organisms by shipping is one of the largest threats to coastal ecosystems. Limited monitoring and understanding of this phenomenon currently hinder development of effective prevention policies. Surveying ports in North America, South America, Europe, Southeast Asia, and Australia we explored environmental DNA community profi...
Natural language interfaces to databases are a growing field that enables end users to interact with relational databases without technical database skills. These interfaces solve the problem of synthesizing SQL queries based on natural language input from the user. There is considerable research interest in the topic but there are few systems...
To improve consumer engagement and satisfaction, online news services employ strategies for personalizing and recommending articles to their users based on their interests. In addition to news agencies’ own digital platforms, they also leverage social media to reach out to a broad user base. These engagement efforts are often disconnected with each...
Abstract
COVID-19 remains a global threat in the face of emerging SARS-CoV-2 variants and gaps in vaccine administration and availability, and organizations must be prepared to detect and mitigate its risk to their people and activities. In this report we share key lessons learned from an adaptive COVID-19 testing program implemented at a mid-size...
Representation learning on graphs has emerged as a powerful mechanism to automate feature vector generation for downstream machine learning tasks. The advances in representation on graphs have centered on both homogeneous and heterogeneous graphs, with the latter presenting the challenges associated with multi-typed nodes and/or edges. In this pap...
Importance: Asymptomatic and presymptomatic carriers of SARS-CoV-2 are an ongoing and significant risk for community spread of the virus, especially with the majority of the world still unvaccinated and new variants emerging.
Objective: To quantify the presence and effects of symptom presentation (or lack thereof) on the community transmission of SA...
Most graph neural network models learn embeddings of nodes in static attributed graphs for predictive analysis. Recent attempts have been made to learn temporal proximity of the nodes. We find that real dynamic attributed graphs exhibit a complex phenomenon of co-evolution between node attributes and graph structure. Learning node embeddings for fore...
People are looking for complementary contexts, such as team members of complementary skills for project team building and/or reading materials of complementary knowledge for effective student learning, to make their behaviors more likely to be successful. Complementarity has been revealed by behavioral sciences as one of the most important factors...
Web personalization, e.g., recommendation or relevance search, tailoring a service/product to accommodate specific online users, is becoming increasingly important. Inductive personalization aims to infer the relations between existing entities and unseen new ones, e.g., searching relevant authors for new papers or recommending new items to users....
Negative life events, such as the death of a loved one, are an unavoidable part of life. These events can be overwhelmingly stressful and may lead to the development of mental health disorders. To mitigate these adverse developments, prior literature has utilized measures of psychological responses to negative life events to better understand their...
Chemical reactions are a complex process, as they involve interaction between several molecular compounds. As a result, predicting the success of a reaction is a non-trivial task, which often requires running several experiments in the lab. This process is expensive, time consuming, and inefficient. As a result, in recent years, researchers have...
Documentation and review of patient heart rate are a fundamental process across a myriad of clinical settings. While historically recorded manually, bedside monitors now provide for the automated collection of such data. Despite the availability of continuous streaming data, patients' charts continue to reflect only a subset of this information as...
Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have magnified the importance of the imbalanced data problem. The two main approaches to address this issue are based on loss function modifications and instance resampling. Ins...
Assessment of individuals' job performance, personalized health and psychometric measures are domains where data-driven ubiquitous computing will have a profound impact in the near future. Existing work in these domains focuses on techniques that use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits to assess wel...