Herna Lydia Viktor
  • PhD (Computer Science)
  • Professor at University of Ottawa

About

  • Publications: 195
  • Reads: 59,664
  • Citations: 2,752
Introduction
I am a full professor of Computer Science at the School of Electrical Engineering and Computer Science at the University of Ottawa. My areas of expertise are applied AI and databases. Specifically, my team and I work on machine learning algorithms, advanced techniques for data-driven discovery, and Big Data solutions for decision support.
Current institution
University of Ottawa
Current position
  • Professor

Publications (195)
Article
Full-text available
Lifelong Machine Learning (LML) denotes a scenario involving multiple sequential tasks, each accompanied by its respective dataset, in order to solve specific learning problems. In this context, the focus of LML techniques is on utilizing already acquired knowledge to adapt to new tasks efficiently. Essentially, LML is concerned with facing new tasks...
Article
Full-text available
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acq...
Article
Machine Learning’s widespread application owes to its ability to develop accurate and scalable models. In cyber-security, where labeled data is scarce, Semi-Supervised Learning (SSL) emerges as a potential solution. SSL excels at tasks challenging traditional supervised and unsupervised algorithms by leveraging limited labelled data alongside abund...
Article
Full-text available
Price prediction remains a crucial aspect of financial market research as it forms the basis for various trading strategies and portfolio management techniques. However, traditional models such as ARIMA are not effective for multi-horizon forecasting, and current deep learning approaches do not take into account the conditional heteroscedasticity o...
Article
Full-text available
Protein generation has numerous applications in designing therapeutic antibodies and creating new drugs. Still, it is a demanding task due to the inherent complexities of protein structures and the limitations of current generative models. Proteins possess intricate geometry, and sampling their conformational space is challenging due to its high di...
Article
Full-text available
The accuracy of price forecasts is important for financial market trading strategies and portfolio management. Compared to traditional models such as ARIMA and other state-of-the-art deep learning techniques, temporal Transformers with similarity embedding perform better for multi-horizon forecasts in financial time series, as they account for the...
Chapter
Protein structural properties are often determined by experimental techniques such as X-ray crystallography and nuclear magnetic resonance. However, both approaches are time-consuming and expensive. Conversely, protein amino acid sequences may be readily obtained from inexpensive high-throughput techniques, although such sequences lack structural i...
Article
Full-text available
Lifelong machine learning concerns the development of systems that continuously learn from diverse tasks, incorporating new knowledge without forgetting the knowledge they have previously acquired. Multi-label classification is a supervised learning process in which each instance is assigned multiple non-exclusive labels, with each label denoted as...
Article
Full-text available
This paper presents a new approach for protein generation based on one-shot learning and hybrid quantum neural networks. Given a single protein complex, the system learns how to predict the remaining unknown properties, without resorting to autoregression, from the physicochemical properties of the receptor and a prior on the physicochemical proper...
Article
Full-text available
The design of binder proteins for specific target proteins using deep learning is a challenging task that has a wide range of applications in both designing therapeutic antibodies and creating new drugs. Machine learning-based solutions, as opposed to laboratory design, streamline the design process and enable the design of new proteins that may be...
Article
Full-text available
Background: Conducting clinical trials for traumatic spinal cord injury (tSCI) presents challenges due to patient heterogeneity. Identifying clinically similar subgroups using patient demographics and baseline injury characteristics could lead to better patient-centered care and integrated care delivery. Purpose: We sought to (1) apply an unsupervis...
Chapter
Log sequences generated by heterogeneous systems are critical for understanding computer system behaviour and ensuring operational and security integrity. However, the diverse formats, structures, and content of logs pose challenges for traditional log anomaly detection approaches that rely on log parsing, which can be imperfect and incomplete in i...
Article
Background: Traumatic spinal cord injuries (TSCI) greatly affect the lives of patients and their families. Prognostication may improve treatment strategies, health care resource allocation, and counseling. Multivariable clinical prediction models (CPMs) for prognosis are tools that can estimate an absolute risk or probability that an outcome will oc...
Article
Patient-reported outcome measures (PROMs) are an important metric to assess total knee arthroplasty (TKA) patients. The purpose of this study was to use a machine learning (ML) algorithm to identify patient features that impact PROMs after TKA.
Preprint
Full-text available
A common approach to quantifying model interpretability is to calculate faithfulness metrics based on iteratively masking input tokens and measuring how much the predicted label changes as a result. However, we show that such metrics are generally not suitable for comparing the interpretability of different neural text classifiers as the response t...
Article
Full-text available
Online supervised learning from fast-evolving data streams, particularly in domains such as health, the environment, and manufacturing, is a crucial research area. However, these domains often experience class imbalance, which can skew class distributions. It is essential for online learning algorithms to analyze large datasets in real-time while a...
Conference Paper
Studies of protein-protein interactions facilitate the development of new drugs and can aid understanding of the mechanisms behind disease pathogenesis. Finding the sites of interaction on the molecular surface is key to understanding protein-protein interactions and the role of molecular pathways. However, this is still an open area of research. T...
Article
Full-text available
In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous including cyber threat mitigation and security infrastructure enhancement through pattern recogniti...
Article
Full-text available
Research into Intrusion and Anomaly Detectors at the Host level typically pays much attention to extracting attributes from system call traces. These include window-based, Hidden Markov Models, and sequence-model-based attributes. Recently, several works have been focusing on sequence-model-based feature extractors, specifically Word2Vec and GloVe,...
Preprint
Full-text available
Artificial Intelligence and Machine Learning have witnessed rapid, significant improvements in Natural Language Processing (NLP) tasks. Utilizing Deep Learning, researchers have taken advantage of repository comments in Software Engineering to produce accurate methods for detecting Self-Admitted Technical Debt (SATD) from 20 open-source Java projec...
Chapter
Protein-protein interactions play an important role in the development of new therapeutic treatments and prophylactic vaccines. For instance, the efficacy of a vaccine strongly depends on the extent to which an antibody may form a stable bond with an antigen. In-laboratory experiments are both time-consuming and expensive, which limits their scope to only...
Article
Full-text available
Proteins mainly perform their functions by interacting with other proteins. Protein–protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-cons...
Preprint
Full-text available
We apply a large multilingual language model (BLOOM-176B) in open-ended generation of Chinese song lyrics, and evaluate the resulting lyrics for coherence and creativity using human reviewers. We find that current computational metrics for evaluating large language model outputs (MAUVE) have limitations in evaluation of creative writing. We note th...
Article
Full-text available
Machine-generated text is increasingly difficult to distinguish from text authored by humans. Powerful open-source models are freely available, and user-friendly tools that democratize access to generative models are proliferating. ChatGPT, which was released shortly after the first edition of this survey, epitomizes these trends. The great potenti...
Article
Full-text available
Flattening shapes without distortion is a problem that has been intriguing scientists for centuries. It is a fundamental problem of high importance in computer vision as many approaches may greatly benefit from its implementation. This paper introduces a new approach that allows flattening without distortion, by transforming the shape from Riemanni...
Chapter
Recently, there has been growing interest in fairness considerations in Artificial Intelligence (AI) and AI-based systems, as the decisions made by AI applications may negatively impact individuals and communities with ethical or legal consequences. Indeed, it is crucial to ensure that decisions based on AI-based systems do not reflect discriminato...
Preprint
Full-text available
Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is te...
Chapter
Price prediction is essential in financial market research, as it is often used as a primary component for trading strategy or portfolio management specialisations. As these strategies rely on more than one future prediction point, the accuracy of a multi-horizon forecast is very important. Classical models, such as autoregressive integrated moving...
Article
Full-text available
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein–protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental app...
Article
Full-text available
The widespread usage of machine learning in different mainstream contexts has made deep learning the technique of choice in various domains, including finance. This systematic survey explores various scenarios employing deep learning in financial markets, especially the stock market. A key requirement for our methodology is its focus on research pa...
Chapter
Log parsing is the process of extracting logical units from system, device or application generated logs. It holds utmost importance in the field of log analytics and forensics. Many security analytic tools rely on logs to detect, prevent and mitigate attacks. It is critical for these tools to extract information from large volumes of logs from mul...
Article
Full-text available
Due to the rapid technological advances that have been made over the years, more people are changing their way of living from traditional ways of doing business to those featuring greater use of electronic resources. This transition has attracted (and continues to attract) the attention of cybercriminals, referred to in this article as “attackers”,...
Preprint
Full-text available
The detection of computer-generated text is an area of rapidly increasing significance as nascent generative models allow for efficient creation of compelling human-like text, which may be abused for the purposes of spam, disinformation, phishing, or online influence campaigns. Past work has studied detection of current state-of-the-art models, but...
Chapter
Online semi-supervised learning (SSL) from data streams is an emerging area of research with many applications due to the fact that it is often expensive, time-consuming, and sometimes even unfeasible to collect labelled data from streaming domains. State-of-the-art online SSL algorithms use clustering techniques to maintain micro-clusters, or, alt...
Article
Full-text available
The classification of deformable protein shapes, based solely on their macromolecular surfaces, is a challenging problem in protein–protein interaction prediction and protein design. Shape classification is made difficult by the fact that proteins are dynamic, flexible entities with high geometrical complexity. In this paper, we introduce a novel de...
Article
Full-text available
This work introduces novel approaches, based on geometrical deep learning, for predicting protein–protein interactions. A dataset containing both interacting and non-interacting proteins is selected from the Negatome Database. Interactions are predicted from a graph representing the proteins’ three-dimensional macromolecular surfaces. The nodes are...
Chapter
Online influence operations (OIOs) present a serious threat to the integrity of online social spaces and to real-world democratic elections. While many OIO detection approaches have focused on classification algorithms for individual social media posts (often with artificially balanced datasets), we present a novel system centering around a human a...
Article
In recommendation systems, the grey-sheep problem refers to users with unique preferences and tastes that make it difficult to develop accurate profiles. That is, the similarity search approach typically followed during the recommendation process fails to yield good results. Most research does not focus on such users and thus fails to cater to more...
Chapter
In e-business, recommender systems have been instrumental in guiding users through their online experiences. However, these systems are often limited by the lack of labelled data and by data sparsity. Increasingly, data-mining techniques are utilized to address this issue. In most research, recommendations to be made are achieved via supervised learning...
Chapter
Mining data streams has become an important topic due to the increased availability of vast amounts of online data. In such incremental learning scenarios, observations arrive in a sequence over time and are subject to changes in data distributions, also known as concept drifts. Interleaved test-then-train evaluations are often used during supervis...
Chapter
Full-text available
Recommendation systems, which are employed to mitigate the information overload e-commerce users face, have succeeded in aiding customers during their online shopping experience. However, to be able to make accurate recommendations, these systems require information about the items for sale and about users’ individual preferences. Making recommenda...
Preprint
The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence...
Chapter
The MapReduce programming paradigm is a prominent model for expressing parallel computations, especially in the context of data processing of vast data sets. However, modern data processing runtimes, implementing the MapReduce programming paradigm, do not generally support the use of arbitrary programming languages. Access to programming-language i...
Preprint
In machine learning, the one-class classification problem occurs when training instances are only available from one class. It has been observed that making use of this class's structure, or its different contexts, may improve one-class classifier performance. Although this observation has been demonstrated for static data, a rigorous application o...
Poster
Full-text available
Objectives: This study aims to assess the relationships of psychosocial risk factors and resettlement stress to cardiovascular health among adult immigrants (Figure 1) who landed in Canada after 1985. It further aims to develop Machine Learning (ML) prediction models based on pre- and post-immigration data to predict the risk of CVD for new arrivals of adul...
Chapter
Clustering naturally addresses many of the challenges of data streams and many data stream clustering algorithms (DSCAs) have been proposed. The literature does not, however, provide quantitative descriptions of how these algorithms behave in different circumstances. In this paper we study how the clusterings produced by different DSCAs change, rel...
Conference Paper
Recommendation systems, which are employed to mitigate the information overload faced by e-commerce users, have succeeded in aiding customers during their online shopping experience. However, to be able to make accurate recommendations, these systems require information about the items for sale and information about users’ individual preferences. M...
Article
Full-text available
The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Conside...
Article
Full-text available
Ab initio molecular dynamics is an irreplaceable technique for the realistic simulation of complex molecular systems and processes from first principles. This paper proposes a comprehensive and self-contained review of ab initio molecular dynamics from a computational perspective and from first principles. Quantum mechanics is presented from a mole...
Article
Full-text available
The success of data stream mining techniques has allowed decision makers to analyze their data in multiple domains, ranging from monitoring network intrusion to financial markets analysis and online sales transactions exploration. Specifically, online ensembles that construct accurate models against drifting data streams have been developed. Recent...
Article
Full-text available
Increasingly, Internet of Things (IoT) domains, such as sensor networks, smart cities, and social networks, generate vast amounts of data. Such data are not only unbounded and rapidly evolving; their content also evolves dynamically over time, often in unforeseen ways. These variations are due to so-called concept drifts, caused by changes...
Conference Paper
The identification of changes in data distributions associated with data streams is critical in understanding the mechanics of data generating processes and ensuring that data models remain representative through time. To this end, concept drift detection methods often utilize statistical techniques that take numerical data as input. However, many...
Preprint
The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Conside...
Chapter
Data mining has been successfully applied in many businesses, thus aiding managers to make informed decisions that are based on facts, rather than having to rely on guesswork and incorrect extrapolations. Data mining algorithms equip institutions to predict the movements of financial indicators, enable companies to move towards more energy-efficien...
Conference Paper
Adaptive online learning algorithms have been successfully applied to fast-evolving data streams. Such streams are susceptible to concept drift, which implies that the most suitable type of classifier often changes over time. In this setting, a system that is able to seamlessly select the type of learner that presents the current “best” model holds...
Conference Paper
Decision makers increasingly require near-instant models to make sense of fast evolving data streams. Learning from such evolving environments is, however, a challenging task. This challenge is partially due to the fact that the distribution of data often changes over time, thus potentially leading to degradation in the overall performance. In part...
Conference Paper
Full-text available
Recently, there has been a growing trend to use data mining algorithms to explore datasets modeled as graphs. In most cases, these graphs evolve over time, thus exhibiting more complex patterns and relationships among nodes. In particular, social networks are believed to manifest the preferential attachment property which assumes that new g...
Conference Paper
Online ensemble methods have been very successful in creating accurate models against data streams that are susceptible to concept drift. The success of data stream mining has allowed diverse users to analyse their data in multiple domains, ranging from monitoring stock markets to analysing network traffic and exploring ATM transactions. Increasingly...
Chapter
Selecting the optimal subset of views for materialization provides an effective way to reduce the query evaluation time for real-time Online Analytic Processing (OLAP) queries posed against a data warehouse. However, materializing a large number of views may be counterproductive and may exceed storage thresholds, especially when considering very la...
Conference Paper
Twitter feeds provide data scientists with a large repository for entity based sentiment analysis. Specifically, the tweets of individual users may be used in order to track the ebb and flow of their sentiments and opinions. However, this domain poses a challenge for traditional classifiers, since the vast majority of tweets are unlabeled. Further,...
Article
Full-text available
Macromolecular structures, such as neuraminidases, hemagglutinins, and monoclonal antibodies, are not rigid entities. Rather, they are characterised by their flexibility, which is the result of the interaction and collective motion of their constituent atoms. This conformational diversity has a significant impact on their physicochemical and biolog...
Conference Paper
Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that conta...
Article
Acquisition systems based on laser triangulation or structured light are becoming commonplace in anthropometry. Such systems allow one to capture very detailed data to be used when addressing the sizing problem. This chapter introduces state-of-the-art approaches to describe, to segment and to cluster the data acquired by such systems. We describe...
Conference Paper
Imbalanced data, where the number of instances of one class is much higher than the others, are frequent in many domains such as fraud detection, telecommunications management, oil spill detection, and text classification. Traditional classifiers do not perform well when considering data that are susceptible to both within-class and between-class i...
Conference Paper
In data warehousing, selecting a subset of views for materialization has been widely employed as a way to reduce the query evaluation time for real-time OLAP queries. However, materialization of a large number of views may be counterproductive and may exceed storage thresholds, especially when considering very large data warehouses. Thus, an import...
Conference Paper
Finding correspondences between deformable objects has wide application in many domains. In information retrieval, researchers may be interested in finding similar objects, while computer animation experts may be considering ways to morph shapes. The correspondence problem is especially challenging when the objects under consideration are suspect t...
Conference Paper
Non-rigid shapes are generally known as objects where the three dimensional geometry may deform by internal and/or external forces. Deformable shapes are all around us, ranging from macromolecules, to natural objects such as the trees in the forest or the fruits in our gardens, and even human bodies. The development of measurements to accurately de...
Article
Full-text available
Meta-model merging is the process of incorporating data models into an integrated, consistent model, against which accurate queries may be processed. The efficiency of such a process is very much reliant on effective semantic representation of chosen data models, as well as the mapping relationships between the schema and data instance elements of...
Conference Paper
The protein docking problem refers to the task of predicting the appropriate matching of one protein molecule (the receptor) to another (the ligand), when attempting to bind them to form a stable complex. Research shows that matching the three-dimensional geometric structures of proteins plays a key role in determining a so-called docking pair. How...
Conference Paper
Full-text available
Recommender Systems have been applied in a large number of domains. However, current approaches rarely consider multiple criteria or the level of mobility and location of a user. In this paper, we introduce a novel algorithm to construct personalized multi-criteria Recommender Systems. Our algorithm incorporates the user's current context, and tech...
Conference Paper
Recently, a number of researchers have turned their attention to the creation of isometrically invariant shape descriptors based on the heat equation. The reason for this surge in interest is that the Laplace-Beltrami operator, associated with the heat equation, is highly dependent on the topology of the underlying manifold, which may lead to the c...
Conference Paper
Research has shown that the functionalities of proteins are largely influenced by their three dimensional (3D) shapes. This observation is especially relevant in drug design, where the knowledge of the 3D structure of a protein enables pharmacologists to select the best binding proteins when aiming to moderate functions. However, a relatively small...
Article
Multirelational classification aims to discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a database often spans numerous departments and/or subdivisions, which are involved in different aspects of the enterprise such as customer profiling, fraud detection, inventory manageme...
Conference Paper
Full-text available
Meta-model merging is the process of incorporating data models into an integrated, consistent model against which accurate queries may be processed. Within the data warehousing domain, the integration of data marts is often time-consuming. In this paper, we introduce an approach for the integration of relational star schemas, which are instances of...
