Publications: 238
Reads: 48,917
Citations: 3,546
Publications (238)
Dynamic concepts in time series are crucial for understanding complex systems such as financial markets, healthcare, and online activity logs. These concepts help reveal structures and behaviors in sequential data for better decision-making and forecasting. However, existing models often struggle to detect and track concept drift due to limitations...
Kernel-based subspace clustering, which addresses the nonlinear structures in data, is an evolving area of research. Despite noteworthy progress, prevailing methods are limited by (i) the influence of predefined kernels on model performance; (ii) the difficulty of preserving the original manifold struct...
In time series analysis, concept drift poses a significant challenge. Concept drift, characterized by the evolving statistical properties of time series data, affects the reliability and accuracy of conventional analysis models. This is particularly evident in co-evolving scenarios where interactions among...
Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to bypass ethical safeguards in large language models, potentially enabling misuse by cybercriminals. This paper analyzes jailbreak prompts from a cyber defense perspective, exploring techniques like prompt injection and context manipulation that allow harmful...
Dynamic concepts in time series are crucial for understanding complex systems such as financial markets, healthcare, and online activity logs. These concepts help reveal structures and behaviors in sequential data for better decision-making and forecasting. Existing models struggle with detecting and tracking concept drift due to limitations in int...
Forecasting in probabilistic time series is a complex endeavor that extends beyond predicting future values to also quantifying the uncertainty inherent in these predictions. Gaussian process regression stands out as a Bayesian machine learning technique adept at addressing this multifaceted challenge. This paper introduces a novel approach that bl...
The increasing sophistication of cyber threats necessitates innovative approaches to cybersecurity. In this paper, we explore the potential of psychological profiling techniques, particularly focusing on the utilization of Large Language Models (LLMs) and psycholinguistic features. We investigate the intersection of psychology and cybersecurity, di...
Kolmogorov-Arnold Networks (KAN) is a groundbreaking model recently proposed by the MIT team, representing a revolutionary approach with the potential to be a game-changer in the field. This innovative concept has rapidly garnered worldwide interest within the AI community. Inspired by the Kolmogorov-Arnold representation theorem, KAN utilizes spli...
Predicting the time for an event to occur while simultaneously exploring the coexisting effects of various risk factors has captivated considerable research interest. However, the profusion of repeated measurements involving a diverse array of risk factors has outpaced the capabilities of current methods for analyzing time-to-event data. In this pa...
Children diagnosed with Autism Spectrum Disorder (ASD) often exhibit agitated behaviors that can isolate them from their peers. This study aims to examine if wearable data, collected during everyday activities, could effectively detect such behaviors. First, we used the Empatica E4 device to collect real data including Blood Volume Pulse (BVP), Ele...
Virtually all countries in the world are experiencing growth in the number and proportion of seniors in their population. Almost half of these seniors live with one or more disabling conditions. This raises the question of when, and with what probability, a disability will occur in aging people. In this paper, we mathematicize this concern as...
We investigate the problem of discovering and modeling regime shifts in an ecosystem comprising multiple time series known as co-evolving time series. Regime shifts refer to the changing behaviors exhibited by series at different time intervals. Learning these changing behaviors is a key step toward time series forecasting. While advances have been...
Stock trend prediction has received a significant amount of attention in recent years. Existing methods cannot exploit peculiar trends for prediction, although these trends are valuable in rising-falling trend analysis for short-term or long-term investments. In this paper, we propose an integrated model that can discover peculiar trend patterns for stock t...
This paper scrutinizes a database of over 4900 YouTube videos to characterize financial market coverage. Financial market coverage generates a large number of videos. Therefore, watching these videos to derive actionable insights could be challenging and complex. In this paper, we leverage Whisper, a speech-to-text model from OpenAI, to generate a...
Machine learning (ML) algorithms have become popular in recent years and have found increasing utility in the field of medical imaging, specifically in positron emission tomography (PET) imaging. The interest in ML in PET imaging for the study of neurodegenerative diseases stems from the potential of these techniques to analyze and predict the phys...
Moral foundations theory helps understand differences in morality across cultures. Web trending topics assemble diverse opinions on the matters covered in the community. Detecting moral foundations within trending topics-related opinions can be of crucial importance in preventing moral shock and outrage, and extreme actions. In this paper, we propo...
Regime switching analysis is extensively advocated in many fields to capture complex behaviors underlying an ecosystem, such as the economic or financial system. A regime can be defined as a specific group of complex patterns that share common characteristics in a specific time interval. Regime switch, caused by external and/or internal drivers, re...
For medical treatments, pain is often measured by self-report. However, the current subjective pain assessment highly depends on the patient’s response and is therefore unreliable. In this paper, we propose a physiological-signals-based objective pain recognition method that can extract new features, which have never been discovered in pain detecti...
Sequence representation, which is aimed at embedding sequentially symbolic data in a real space, is a foundational task in sequence pattern recognition. It is a difficult problem due to the challenges entailed in learning the intrinsic structural features within sequences in small sample size cases, in an unsupervised way. In this paper, we propose...
The new coronavirus outbreak has been officially declared a global pandemic by the World Health Organization. To grapple with the rapid spread of this ongoing pandemic, most countries have banned indoor and outdoor gatherings and ordered their residents to stay home. Given the developing situation with coronavirus, mental health is an important cha...
Psychology research findings suggest that personality is related to differences in friendship characteristics and that some personality traits correlate with linguistic behavior. In this paper, we investigate the influence that personality may have on affinity formation. To this end, we derive affinity relationships from social media interactions,...
Multi-view clustering, which optimally integrates complementary information from different views to improve clustering performance, has drawn considerable attention in recent years. Despite recent advances, issues remain when dealing with data of high dimensionality and heterogeneity, especially in categorical sequences. These unique challenges and...
Markov models are extensively used for categorical sequence clustering and classification due to their inherent ability to capture complex chronological dependencies hidden in sequential data. Existing Markov models are based on an implicit assumption that the probability of the next state depends on the preceding context/pattern which is consist o...
Most existing approaches for electricity load forecasting perform the task based on overall electricity consumption. However, using such a global methodology can affect load forecasting accuracy, as it does not consider the possibility that customers’ consumption behavior may change at any time. Predicting customers’ electricity consumption in the...
Moral foundations theory helps understand differences in morality across cultures. In this paper, we propose a model to predict moral foundations (MF) from social media trending topics. We also investigate whether differences in MF influence emotional traits. Our results are promising and leave room for future research avenues.
This paper presents a new approach for cross-community mining and discovery using topic modeling. Our approach automatically identifies the communities in a dataset in an unsupervised way and extracts relationships between these communities. These relationships represent the interaction between communities which helps to identify the cross communit...
The concept of affinity relationship discovery is relatively new in the context of online discussion communities and there has been little work addressing it to date. This problem entails finding affinity relationships in a community by combining structural features and the content of interactions. Affinity discovery seeks not only to identify thes...
Time-to-event prediction has been an important practical task for longitudinal studies in many fields such as manufacturing, medicine, and healthcare. While most of the conventional survival analysis approaches suffer from the presence of censored failures and statistically circumscribed assumptions, few attempts have been made to develop survival...
This paper addresses the problem of discovering hidden affinity relationships in online communities. Online discussions assemble people to talk about various types of topics and to share information. People progressively develop the affinity, and they get closer as frequently as they mention themselves in messages and they send positive messages to...
Chronic obstructive pulmonary disease (COPD) yields a high rate of failures such as hospital readmission and death in the United States, Canada and worldwide. COPD failure imposes a significant social and economic burden on society, and predicting such failure is crucial to early intervention and decision-making, making this a very important resear...
Remaining useful life (RUL) prediction has been a topic of practical interest in many fields involving preventive intervention, including manufacturing, medicine and healthcare. While most of the conventional approaches suffer from censored failures arising and statistically circumscribed assumptions, few attempts have been made to predict RUL by d...
In real-world social networks, there is an increasing interest in tracking the evolution of groups of users and detecting the various changes they are liable to undergo. Several approaches have been proposed for this. In studying these approaches, we observed that most of them use a two-stage process. In the first stage, they run an algorithm to id...
Discovering function-related structural features, such as the cloverleaf shape of transfer RNA secondary structures, is essential to understand RNA function. With this aim, we have developed a platform, named Structurexplor, to facilitate the exploration of structural features in populations of RNA secondary structures. It has been designed and dev...
We study the problem of privacy preservation in multiple independent data publishing. An attack on personal privacy which uses independent datasets is called a composition attack. For example, a patient might have visited two hospitals for the same disease, and his information is independently anonymized and distributed by the two hospitals. Much o...
Text categorization is widely characterized as a multi-label classification problem. Robust modeling of the semantic similarity between a query text and training texts is essential to construct an effective and accurate classifier. In this paper, we systematically investigate the Web page/text classification problem via integrating sparse represent...
Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationall...
In dynamic social networks, communities may undergo various changes over time. For example, a community may split into several other communities, expand into a larger community, or shrink to a smaller community, or several communities may merge into one community. This is an important and difficult issue in the study of social networks. In the curr...
Users in real-world social networks are organized into communities that differ from each other in terms of influence, authority, interest, size, etc. This article addresses the problems of detecting communities of authority and of estimating the influence of such communities in dynamic social networks. These are new issues that have not yet been ad...
The presence of complex distributions of samples concealed in high-dimensional, massive sample-size data challenges all of the current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, revealing the clas...
Background: Adverse events (AEs) in acute care hospitals are frequent and associated with significant morbidity, mortality, and costs. Measuring AEs is necessary for quality improvement and benchmarking purposes, but current detection methods lack accuracy, efficiency, and generalizability. The growing availability of electronic health records (E...
Motivation: Comparing ribonucleic acid (RNA) secondary structures of arbitrary size uncovers structural patterns that can provide a better understanding of RNA functions. However, performing fast and accurate secondary structure comparisons is challenging when we take into account the RNA configuration (i.e. linear or circular), the presence of ps...
Motivation: The identification of contaminating sequences in a de novo assembly is challenging because of the absence of information on the target species. For sample types where the target organism is impossible to isolate from its matrix, such as endoparasites, endosymbionts and soil-harvested samples, contamination is unavoidable. A few post-as...
Survival prediction is crucial to healthcare research, but is confined primarily to specific types of data involving only the present measurements. This paper considers the more general class of healthcare data found in practice, which includes a wealth of intermittently varying historical measurements in addition to the present measurements. Makin...
Survival prediction is crucial to healthcare research, but is confined primarily to specific types of data involving only time-invariant or synchronously time-varying measurements. This paper considers the more general class of intermittently varying data found in practice, which includes a wealth of unaligned historical measurements. Making surviv...
This poster paper presents an approach for tracking community structures. In contrast to the vast majority of existing methods, which are based on time-to-time consecutive evaluation, the proposed approach uses a similarity measure that involves the global temporal aspect of the network under investigation. A notable feature of our approach is that...
Table S1 Results from structural equations models showing lagged effects of dysregulation across systems.
Appendix S1 Additional details on data sets, health outcomes measures, and structural equations models.
Fig. S1 Correlations among dysregulation scores of the a priori systems. The only difference from Fig. 1 in the main text is we did not adjust for age.
In real-world social networks, there is increasing interest in tracking the evolution of groups of users. Existing approaches track evolving communities, in a time-sequential way, by comparing communities in terms of nodes using a similarity measure such as the Jaccard or a modified Jaccard measure. The measure allows the use of a one-to-one compar...
Categorical data clustering is an important subject in pattern recognition. Currently, subspace clustering of categorical data remains an open problem due to the difficulties in estimating attribute interestingness according to the statistics of categories in clusters. In this paper, a new algorithm is proposed for clustering categorical data with...
An increasing number of aging researchers believe that multi-system physiological dysregulation may be a key biological mechanism of aging, but evidence of this has been sparse. Here, we used biomarker data on nearly 33 000 individuals from four large datasets to test for the presence of multi-system dysregulation. We grouped 37 biomarkers into si...
This paper addresses a new problem concerning the evolution of influence relationships between communities in dynamic social networks. A weighted temporal multigraph is employed to represent the dynamics of the social networks and analyze the influence relationships between communities over time. To ensure the interpretability of the knowledge disc...
This paper presents a novel and practical model for behavioral user profile modeling using causal relationships. In this model, causal relationships, which represent the influence among variables, are discovered from event sequences representing users' behaviors, and used for modeling behavioral user profiles. Our model first discovers significant p...
Kernel-based methods have become popular in machine learning; however, they are typically designed for numeric data. These methods are established in vector spaces, which are undefined for categorical data. In this paper, we propose a new kind of kernel trick, showing that mapping of categorical samples into kernel spaces can be alternatively descr...
Clustering categorical sequences is an important and difficult data mining task. Despite recent efforts, the challenge remains, due to the lack of an inherently meaningful measure of pairwise similarity. In this paper, we propose a novel variable-order Markov framework, named weighted conditional probability distribution (WCPD), to model clusters o...
A new model for the automatic detection of target ships in SAR images is developed in this paper. The model uses a statistical method combined with image processing techniques. The probabilistic neural network (PNN) is a very efficient model for data classification. It is based on the approach no...
Viroids are small circular single-stranded infectious RNAs characterized by a relatively high mutation level. Knowledge