
Patricia Jean Riddle, University of Auckland
About
Publications: 79
Reads: 7,173
Citations: 1,202
Current institution: University of Auckland
Publications (79)
The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A sy...
Missing data imputation for location-based sensor data has attracted much attention in recent years. The state-of-the-art imputation methods based on graph neural networks rely on the a priori assumption that the spatial correlations between sensor locations are static. However, real-world data sets often exhibit dynamic spatial correlations. This paper p...
Learning from streaming data is challenging as the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience learned under one distribution may become irrelevant as conditions change under concept drift, but may become relevant once again when conditions reoccur. Adaptive learnin...
Background Automated clinical decision support for risk assessment is a powerful tool in combating cardiovascular disease (CVD), enabling targeted early intervention that could avoid issues of overtreatment or undertreatment. However, current CVD risk prediction models use observations at baseline without explicitly representing patient history as...
Continual learning of a stream of tasks is an active area in deep neural networks. The main challenge investigated has been the phenomenon of catastrophic forgetting or interference of newly acquired knowledge with knowledge from previous tasks. Recent work has investigated forward knowledge transfer to new tasks. Backward transfer for improving kn...
Attention describes cognitive processes that are important to many human phenomena including reading. The term is also used to describe the way in which transformer neural networks perform natural language processing. While attention appears to be very different under these two contexts, this paper presents an analysis of the correlations between t...
Data collected over time often exhibit changes in distribution, or concept drift, caused by changes in factors relevant to the classification task, e.g. weather conditions. Incorporating all relevant factors into the model may be able to capture these changes; however, this is usually not practical. Data stream based methods, which instead explicit...
A data stream is a sequence of observations produced by a generating process which may evolve over time. In such a time-varying stream the relationship between input features and labels, or concepts, can change. Adapting to changes in concept is most often done by destroying and incrementally rebuilding the current classifier. Many systems addition...
Learning a sequence of tasks is a long-standing challenge in machine learning. This setting applies to learning systems that observe examples of a range of tasks at different points in time. A learning system should become more knowledgeable as more related tasks are learned. Although the problem of learning sequentially was acknowledged for the fi...
Healthcare systems are exposed to the increasing impact of chronic diseases, including cardiovascular disease; analyzing and understanding health trajectories is important for efficient planning and fair allocation of resources. This work proposes an approach based on mining clinical data to support the exploration of health trajecto...
Chronic conditions, especially cardiovascular disease, account for a large burden on modern healthcare systems. These conditions by their nature unfold over a long period of time, typically involving many healthcare events, treatments and changes of patient status. The gold standard in public health informatics for risk assessment is r...
Neural networks are known to be very sensitive to the initial weights. There has been a lot of research on initialization that aims to stabilize the training process. However, very little research has studied the relationship between initialization and generalization. We demonstrate that a poorly initialized model leads to lower test accuracy. We...
Urban air pollution poses a significant global health risk, but due to the high expense of measuring air quality, the amount of available data on pollutant exposure has generally been wanting. In recent years this has motivated the development of several cheap, portable air quality monitoring instruments. However, these instruments also tend to be...
In our research, we consider transfer learning scenarios where a target learner does not have access to the source data, but instead to hypotheses or models induced from it. This is called the Hypothesis Transfer Learning (HTL) problem. Previous approaches concentrated on transferring source hypotheses as a whole. We introduce a novel method for se...
High utility itemset mining is the problem of finding sets of items whose utilities are higher than or equal to a specific threshold. We propose a novel technique called mHUIMiner, which utilises a tree structure to guide the itemset expansion process to avoid considering itemsets that are nonexistent in the database. Unlike current techniques, it...
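The entry above concerns high utility itemset mining: finding itemsets whose total utility meets a threshold. As background only, here is a brute-force enumeration sketch of that problem statement (not the paper's tree-guided mHUIMiner; the toy database and unit profits are invented):

```python
from itertools import combinations

# Toy database: each transaction maps item -> purchase quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
profit = {"a": 5, "b": 3, "c": 1}  # external utility per unit of each item

def utility(itemset, txn):
    # Utility of an itemset in one transaction; 0 if not fully present.
    if not all(i in txn for i in itemset):
        return 0
    return sum(txn[i] * profit[i] for i in itemset)

def high_utility_itemsets(min_util):
    # Enumerate every candidate itemset and keep those meeting the threshold.
    items = sorted(profit)
    result = {}
    for r in range(1, len(items) + 1):
        for s in combinations(items, r):
            total = sum(utility(s, t) for t in transactions)
            if total >= min_util:
                result[s] = total
    return result

print(high_utility_itemsets(15))  # {('a',): 20, ('a', 'b'): 24}
```

Real miners avoid this exponential enumeration; the point of structures like the tree used in mHUIMiner is precisely to prune itemsets that never occur in the database.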
Energy-based Markov–Gibbs random field (MGRF) image models describe images by statistics of localised features; hence selecting statistics is crucial. This paper presents a procedure for searching families of linear-filter-based statistics much broader than is typical, by alternately optimising in continuous parameter space and discrete graphical str...
The evolution of data, or concept drift, is a common phenomenon in data streams. Currently most drift detection methods are able to locate the point of drift, but are unable to provide important information on the characteristics of change, such as the magnitude of change, which we refer to as drift severity. Monitoring drift severity provides crucial i...
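As a rough illustration of the drift-severity idea described above (not the paper's measure), one can compare a classifier's error rate over a reference window against a recent window; the gap between the two rates serves as a crude severity proxy:

```python
def drift_severity(ref_errors, cur_errors):
    """Severity proxy: absolute difference between the mean error
    rates of a reference window and the current window.
    Illustrative only -- not the definition used in the paper."""
    ref_rate = sum(ref_errors) / len(ref_errors)
    cur_rate = sum(cur_errors) / len(cur_errors)
    return abs(cur_rate - ref_rate)

# Stable stream: errors stay rare in both windows -> low severity.
print(drift_severity([0, 0, 1, 0], [0, 1, 0, 0]))  # 0.0
# After drift: errors jump in the current window -> high severity.
print(drift_severity([0, 0, 1, 0], [1, 1, 0, 1]))  # 0.5
```

A larger value suggests a bigger change in the concept; a detector would typically pair such a statistic with a significance test before signalling drift.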
Currently, Markov-Gibbs random field (MGRF) image models which include high-order interactions are almost always built by modelling responses of a stack of local linear filters. Actual interaction structure is specified implicitly by the filter coefficients. In contrast, we learn an explicit high-order MGRF structure by considering the learning pro...
We introduce a new simple framework for texture modelling with Markov–Gibbs random fields (MGRF). The framework learns texture-specific high order pixel interactions described by feature functions of signal patterns. Currently, modelling of high order interactions is almost exclusively achieved by linear filtering. Instead we investigate 'binary p...
Optimization based techniques have emerged as important methods to tackle the problems of efficiency and accuracy in data mining. One of the current application areas is outlier detection that has not been fully explored yet but has enormous potential. Web bots are an example of outliers, which can be found in the web usage analysis process. Web bo...
Optimization based pattern discovery has emerged as an important field in knowledge discovery and data mining (KDD), and has been used to enhance the efficiency and accuracy of clustering, classification, association rules and outlier detection. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent...
Association Rule Mining (ARM) is a promising method to provide insights for better management of chronic diseases. However, ARM tends to give an overwhelming number of rules, leading to the long-standing problem of identifying the 'interesting' rules for knowledge discovery. Therefore, this paper proposes a hybrid clustering-ARM approach to gain in...
Recommender systems have become one of the necessary tools to help a web user find a potentially interesting resource based on their preferences. In implicit recommender systems, the recommendations are made based on the implicit information of web users, i.e. data collected from web logs or cookies, without knowing users' preferences. Developing...
In this paper, we present new experimental results supporting the Seeding Genetic Algorithm (SGA). We evaluate the algorithm’s performance with various parameterisations, making comparisons to the Canonical Genetic Algorithm (CGA), and use these as guidelines as we establish reasonable parameters for the seeding algorithm. We present experimental r...
Techniques exist that enable problem-solvers to automatically generate an almost unlimited number of heuristics for any given problem. Since they are generated for a specific problem, the cost of selecting a heuristic must be included in the cost of solving the problem. This involves a tradeoff between the cost of selecting the heuristic and the be...
Descriptive abilities of translation-invariant Markov-Gibbs random fields (MGRF), common in texture modelling, are expected to increase if higher-order interactions, i.e. conditional dependencies between larger numbers of pixels, are taken into account. But the complexity of modelling grows as well, so that most of the recent high-order MGRFs are b...
Background: Despite the number of Web effort estimation techniques investigated, there is no consensus as to which technique produces the most accurate estimates, an issue shared by effort estimation in the general software estimation domain. A previous study in this domain has shown that ensembles of estimation techniques can be used to addr...
We give an overview of our journal paper on applying iterative-deepening A* to the traveling tournament problem, a combinatorial optimization problem from the sports scheduling literature. This approach involved combining past ideas and creating new ideas to help reduce node expansion. This resulted in a state-of-the-art approach for optimally solv...
Data clustering aims to group data based on similarities between the data elements. Recently, due to the increasing complexity and amount of heterogeneous data, modeling of such data for clustering has become a serious challenge. In this paper we tackle the problem of modeling heterogeneous web usage data for clustering. The main contribution is a n...
Implicit web usage data is sparse and noisy and cannot be used for usage clustering unless passed through a sophisticated pre-processing phase. In this paper we propose a systematic way to analyze and preprocess the web usage data so that data clustering can be applied effectively to extract similar groups of users. We split the entire process into...
This work presents an iterative-deepening A∗ (IDA∗) based approach to the traveling tournament problem (TTP). The TTP is a combinatorial optimization problem which abstracts the Major League Baseball schedule. IDA∗ is able to find optimal solutions to this problem, with performance improvements coming from the incorporation of various past concepts...
Background: Web development plays an important role in today's industry, so an in-depth view into Web resource estimation would be valuable. However, a systematic review (SR) on Web resource estimation in its entirety has not been done.
Aim: The aim of this paper is to present a SR of Web resource estimation in order to define the current state of t...
Due to a marked increase in the number of web users and their activities, many application areas that use patterns generated from their activities have been proposed. Web-based implicit recommender systems are one such application. An implicit recommender system is a tool that helps guide a user to a particular web resource based on implicit data. I...
The efficiency and quality of the data mining process are challenging questions for researchers. Different methods have been proposed in the literature to tackle these problems. Optimization based methods are one way to address this issue. We addressed the problem of data clustering by implementing swarm intelligence based optimization tec...
This paper explores six different representations of the BlocksWorld Domain. It compares the results of seven planners run on these representations. It shows that the rankings for the International Planning Competition, using the non-satisficing scoring function, would change for every representation.
To deal with the quantity and quality issues with online healthcare resources, creating web portals centred on particular health topics and/or communities of users is a strategy to provide access to a reduced corpus of information resources that meet quality and relevance criteria. In this paper we use hyperspace analogue to language (HAL) to model...
Clustering, an important data mining task which groups data on the basis of similarities, can be divided into two broad categories: partitional and hierarchical. We combine these two methods and propose a novel clustering algorithm called Hierarchical Particle Swarm Optimization (HPSO) data clustering. The proposed algo...
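For context on the partitional half of the combination described above, a minimal sketch of standard k-means (Lloyd's algorithm) on 1-D data follows; this is the textbook baseline, not the HPSO algorithm itself, and the data points are invented:

```python
def kmeans_1d(data, centroids, iters=10):
    # Lloyd's algorithm on 1-D points: assign each point to its
    # nearest centroid, then move each centroid to its cluster mean.
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        centroids = [sum(pts) / len(pts) if pts else centroids[c]
                     for c, pts in clusters.items()]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(sorted(kmeans_1d(data, [0.0, 5.0])))  # approximately [1.0, 9.0]
```

k-means converges to a local optimum that depends on the initial centroids; swarm-based schemes such as PSO are one way of searching over initialisations more globally.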
Outlier detection is an important field in data mining and knowledge discovery, which aims to identify abnormal observations in a large dataset. Common application areas of outlier detection are intrusion detection in computer networks, credit cards fraud detection, detecting abnormal changes in stock prices, and identifying abnormal health conditi...
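A minimal illustration of the general outlier-detection idea mentioned above (a simple z-score rule, not any specific method from the work itself; the threshold and sensor readings are invented for the example):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    # Flag points more than `threshold` sample standard deviations
    # from the mean -- the simplest statistical outlier test.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(readings, threshold=2.0))  # [50]
```

The z-score rule assumes roughly unimodal data; density- or distance-based detectors are preferred when the normal data itself forms several clusters.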
While modelling spatially uniform or low-order polynomial contrast and offset changes is mostly a solved problem, there has been limited progress in models which could represent highly inhomogeneous photometric variations. A recent quadratic programming (QP) based matching allows for almost arbitrary photometric deviations. However, this QP-based ap...
A recently proposed multi-parametric quadratic programming (QP) based approach to image matching under complex photometric variations has shown quite an improvement over other state-of-the-art algorithms. However, it is not an entirely satisfactory solution for many practical applications as it does not account for geometric dissimilarities between...
The traveling tournament problem has proven to be a difficult problem for the ant colony optimization metaheuristic, with past approaches showing poor results. This is due to the unusual problem structure and feasibility constraints. We present a new ant colony optimization approach to this problem, hybridizing it with a forward checking and confli...
Our paper presents a new exact method to solve the traveling tournament problem. More precisely, we apply DFS* to this problem and improve its performance by keeping the expensive heuristic estimates in memory, which greatly cuts down the computational time needed. We further improve the performance by exploiting a symmetry property found in the tr...
In recent years the integration and interaction of data mining and multi-agent systems (MAS) has become a popular approach for tackling the problem of distributed data mining. The use of intelligent optimization techniques in the form of MAS has been demonstrated to be beneficial for the performance of complex, real-time, and costly data mining proc...
Web session clustering is one of the important Web usage mining techniques which aims to group usage sessions on the basis of some similarity measures. In this paper we describe a new Web session clustering algorithm that uses particle swarm optimization. We review the existing Web usage clustering techniques and propose a swarm intelligence based...
Clustering is an important data mining task and has been explored extensively by a number of researchers for different application areas such as finding similarities in images, text data and bio-informatics data. Various optimization techniques have been proposed to improve the performance of clustering algorithms. In this paper we propose a novel...
In this paper, we apply the ant colony optimization metaheuristic to the Single Round Robin Maximum Value Problem, a problem from sports scheduling. This problem contains both feasibility constraints and an optimization goal. We approach this problem using a combination of the metaheuristic with backtracking search. We show how using constraint sat...
The discrete cosine transform is proposed as a basis for representing fundamental frequency (F0) contours of speech. The advantages over existing representations include deterministic algorithms for both analysis and synthesis and a simple distance measure in the parameter space. A two-tier model using the DCT is shown to be able to model F0 contou...
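The entry above proposes the DCT as a basis for F0 contours. As a hedged sketch of the general mechanism (a hand-rolled type-II DCT and its inverse, truncated to a few coefficients to obtain a smooth low-order approximation; the contour values are invented and this is not the paper's two-tier model):

```python
import math

def dct2(x):
    # Type-II DCT: X[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]

def idct2(coeffs, keep=None):
    # Inverse (type-III DCT), optionally truncated to the first `keep`
    # coefficients -- a smooth low-order approximation of the contour.
    n_pts = len(coeffs)
    keep = n_pts if keep is None else keep
    return [coeffs[0] / n_pts + (2.0 / n_pts) * sum(
                coeffs[k] * math.cos(math.pi * k * (n + 0.5) / n_pts)
                for k in range(1, keep))
            for n in range(n_pts)]

contour = [120.0, 135.0, 150.0, 140.0, 125.0, 110.0]  # toy F0 values in Hz
smooth = idct2(dct2(contour), keep=3)  # keep only 3 DCT coefficients
print([round(v, 1) for v in smooth])
```

Keeping all coefficients reconstructs the contour exactly; dropping high-order terms yields the compact, smooth parameterisation and the simple Euclidean distance in coefficient space that the abstract highlights.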
Efficient discovery of lowest level building blocks is a fundamental requirement for a successful genetic algorithm. Although considerable effort has been directed at techniques for combining existing building blocks there has been little emphasis placed on discovering those blocks in the first place. This paper describes an analysis of the canonic...
Despite extensive research of the meteorological effects on air pollution trends and pollutant effects on public health, there has been no attempt to combine both meteorological factors and pollutant factors to estimate the levels of risk to human health. This is not simply a matter of the number of predictors but also the complexity of their i...
Goebel et al. (4) presented a unified decomposition of ensemble loss for explaining ensemble performance. They considered democratic voting schemes with uniform weights, where the various base classifiers each can vote for a single class once only. In this article, we generalize their decomposition to cover weighted, probabilistic voting scheme...
This paper provides an explanation for recent results that displayed a tendency for variable length genomes to shrink when random selection is used. This shrinking effect is also observed under a form of "null selection" that allows every member of the population to reproduce in every generation. It is theoretically and empirically shown that the prese...
Choosing the right crossover operator for the problem at hand is a difficult problem. We describe an experiment that shows a surprising result when comparing 1-point and uniform crossover on the Royal Road problem, and derive equations for calculating the expected rates of building block discovery, retention and combination. These equations provide...
In this paper, an algorithm is presented for learning concept classification rules. It is a hybrid between evolutionary computing and inductive logic programming (ILP). Given input of positive and negative examples, the algorithm constructs a logic program to classify these examples. The algorithm has several attractive features including the abili...
We consider the application of Evolutionary Algorithms (EAs) to the problem of automating the locomotion of computer-simulated creatures. We introduce niching as a way of maintaining genetic diversity and show that it results in the generation of a range of locomotion controllers and increases the probability of finding difficult or rare locomotion...
Inductive logic programming (ILP) algorithms are classification algorithms that construct classifiers represented as logic programs. ILP algorithms have a number of attractive features, notably the ability to make use of declarative background (user-supplied) knowledge. However, ILP algorithms deal poorly with large data sets (>10^4 examples) and th...
The accuracy is the (empirical) conditional probability of the rule's postcondition being true given that the rule's precondition is true. The positive coverage is the (empirical) conditional probability of the rule's precondition being true given that the rule's postcondition is true. The combination of these two measures gives the str...
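The two measures defined above can be estimated directly from labelled examples. The following sketch (the rule, predicates and data are all invented for illustration) computes the empirical estimates:

```python
def rule_stats(examples, pre, post):
    """Empirical accuracy P(post | pre) and positive coverage
    P(pre | post) of a rule over a list of example dicts."""
    pre_true = [e for e in examples if pre(e)]
    post_true = [e for e in examples if post(e)]
    accuracy = sum(post(e) for e in pre_true) / len(pre_true)
    positive_coverage = sum(pre(e) for e in post_true) / len(post_true)
    return accuracy, positive_coverage

# Toy rule "large => heavy": 2 of 3 large items are heavy (accuracy),
# and 2 of 3 heavy items are large (positive coverage).
data = [
    {"large": True,  "heavy": True},
    {"large": True,  "heavy": True},
    {"large": True,  "heavy": False},
    {"large": False, "heavy": True},
]
acc, cov = rule_stats(data, lambda e: e["large"], lambda e: e["heavy"])
print(acc, cov)  # both 2/3
```

Combining the two estimates, as the abstract describes, rewards rules that are both reliable when they fire and responsible for most of the positive examples.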
In this paper we will focus on the description of the machine learning task. We are at an early experimental stage of our research and therefore have only preliminary results.
We applied inductive classification techniques to data collected in a Boeing plant with the goal of uncovering possible flaws in the manufacturing process. This application led us to explore two aspects of classical decision-tree induction: Pre- and Post-processing: much of our effort was focused on the pre-processing of raw data to make it suitable for i...
Research in the fields of problem solving, expert systems, and machine learning has been converging on the issue of problem representation. A system’s ability to solve problems, answer questions, and acquire knowledge has traditionally been bounded by its initial problem representation. One solution to this dilemma is to develop systems which can a...
This chapter explores the problem relating to automatically shifting from one problem representation to another representation that is more efficient, with respect to a given problem solving method and a given problem class. A system's ability to solve problems, answer questions, or acquire knowledge has always been bounded by the problem represent...
My research deals with automatically shifting from one representation of a certain problem to another representation which is more efficient for the problem class to which that problem belongs. I am attempting to discover general purpose primitive representation shifts and methods for automating them.