# Sheridan K. Houghten

Brock University · Department of Computer Science

PhD

## About

106 Publications

15,523 Reads

635 Citations

Introduction

Sheridan K. Houghten currently works at the Department of Computer Science, Brock University. Sheridan does research in Combinatorial Optimization, Algorithms, and Computational Intelligence. Application areas include Bioinformatics and Coding Theory.

Additional affiliations

June 1999 - present

September 1991 - May 1999

Education

September 1993 - June 1999

September 1991 - September 1993

September 1988 - May 1991

## Publications

Publications (106)

Many computational intelligence approaches have been used for the fragment assembly problem. However, the comparison and analysis of these approaches is difficult due to the lack of availability of standard benchmarks. Although similar datasets may be used as a starting point, there is not enough information to reproduce the exact overlaps matrix f...

Disease-gene association attempts to understand the relationship between genetic diseases and the genes associated with them. Many genetic diseases are not due to defects in a single gene, but rather are a result of various genetic components interacting in a complex network. We examine the use of two evolutionary computation approaches for disease...

DNA error correcting codes over the edit metric consist of embeddable markers for sequencing projects that are tolerant of sequencing errors. When a genetic library has multiple sources for its sequences, use of embedded markers permits tracking of sequence origin. This study compares different methods for synthesizing DNA error correcting codes. A...

An extremal self-dual doubly-even binary (n,k,d) code has a minimum weight $d = 4\lfloor n/24 \rfloor + 4$. Of such codes with length divisible by 24, the Golay code is the only (24,12,8) code, the extended quadratic residue code is the only known (48,24,12) code, and there is no known (72,36,16) code. One may partition the search for a (48,24,12) self-dual doubly...
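As a small worked check of the bound quoted above, the sketch below evaluates d = 4⌊n/24⌋ + 4 for the three lengths mentioned. The function name `extremal_min_weight` is illustrative and does not come from the paper.

```python
def extremal_min_weight(n: int) -> int:
    """Minimum weight of an extremal self-dual doubly-even binary code of length n."""
    # Self-dual doubly-even binary codes only exist when 8 divides n.
    if n % 8 != 0:
        raise ValueError("length must be divisible by 8")
    return 4 * (n // 24) + 4


# Print (n, k, d) for the three lengths named in the abstract; k = n / 2 for a self-dual code.
for n in (24, 48, 72):
    print((n, n // 2, extremal_min_weight(n)))
```

The three printed triples are (24, 12, 8), (48, 24, 12) and (72, 36, 16), matching the parameters in the abstract.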

Parkinson's disease (PD) is a neurodegenerative disease represented by the progressive loss of dopamine producing neurons, with motor and non-motor symptoms that may be hard to distinguish from other disorders. Affecting millions of people across the world, its symptoms include bradykinesia, tremors, depression, rigidity, postural instability, cogn...

The impact of different lockdown strategies upon the total number of infections in an epidemic is evaluated for two models of infection: one in which the disease confers permanent immunity, and one in which it does not. The strategies are based upon the proportion of the population infected at a time in order to trigger lockdown, combined with the...

The development of computational intelligence based approaches for the compression of graphs is an under-explored area of research. Further, compression of weighted graphs is significantly more complicated than compression of unweighted graphs. In this paper a multi-objective approach using NSGA-II is applied to the problem of weighted graph compre...

The 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB 2022) was held from August 15th-17th in Ottawa, Canada. This conference has been held annually since 2004. After two years of virtual conferences, IEEE CIBCB 2022 was held primarily in person, with some remote participation. It was great t...

In this paper we evaluate the impact of different lockdown strategies upon the total number of infections during an epidemic. The strategies are based upon the percentage of the population infected during a given time step, as well as upon the amount by which interactions must be reduced during lockdown. We use a weighted personal contact network t...

A generative evolutionary algorithm is used to evolve weighted personal contact networks that represent physical contact between individuals, and thus possible paths of infection during an epidemic. The evolutionary algorithm evolves a list of edge-editing operations applied to an initial graph. Two initial graphs are considered, a ring graph and a...

A number of applications use DNA as a storage mechanism. Because processes in these applications may cause errors in the data, the information must be encoded as one of a chosen set of words that are well separated from one another — a DNA error-correcting code. Typically, the types of errors that may occur include insertions, deletions and substit...

This study introduces a new game that models competition in foraging behavior. Two moose decide, in each time period, which of three foraging areas to visit. Moose in the same foraging area fight, gaining no forage and also damaging some forage during their conflict. Moose alone in a foraging area eat, with the forage in each field being replenishe...

A multi-objective genetic algorithm is applied to the problem of identifying genes associated with Alzheimer’s disease. The input to the genetic algorithm is a set of centrality measures obtained by merging various biological evidence types into a complex network, based on a set of 11 genes already known to be associated with this disease. In terms...

An evolutionary algorithm is used to create personal contact networks representing the social connections between members of a population. We introduce a vaccinated state to the Susceptible-Infected-Removed (SIR) model whereby vaccinated individuals are less likely to become infected. As such this study examines the impact of a single vaccine dose...

Evolutionary algorithms are used to generate personal contact networks, modelling human populations, that are most likely to match a given epidemic profile. The Susceptible-Infected-Removed (SIR) model is used and also expanded upon to allow for an extended period of infection, termed the SIIR model. The networks generated for each of these models...
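To make the idea of an epidemic profile on a contact network concrete, here is a minimal sketch of a discrete-time SIR simulation. It is not the authors' implementation: the function name, the per-contact transmission probability `p_transmit`, and the reading of a "profile" as the count of new infections per time step are all assumptions. The `infectious_steps` parameter is likewise an assumption; setting it to 2 is one plausible way to model an extended period of infection such as the SIIR variant.

```python
import random


def simulate_sir(adjacency, patient_zero, p_transmit, infectious_steps=1, seed=0):
    """Run a discrete-time SIR epidemic; return new infections per time step."""
    rng = random.Random(seed)
    state = {node: "S" for node in adjacency}
    state[patient_zero] = "I"
    remaining = {patient_zero: infectious_steps}  # infected node -> steps left
    profile = [1]  # patient zero counts as the first infection
    while remaining:
        newly = []
        # Each infected node tries to infect each susceptible neighbour once per step.
        for node in remaining:
            for nb in adjacency[node]:
                if state[nb] == "S" and nb not in newly and rng.random() < p_transmit:
                    newly.append(nb)
        # Infected nodes use up one step and are removed when none are left.
        for node in list(remaining):
            remaining[node] -= 1
            if remaining[node] == 0:
                state[node] = "R"
                del remaining[node]
        for nb in newly:
            state[nb] = "I"
            remaining[nb] = infectious_steps
        if remaining:
            profile.append(len(newly))
    return profile


# A ring of six people (hypothetical toy network).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(simulate_sir(ring, patient_zero=0, p_transmit=1.0))
```

An evolutionary search of the kind described would repeatedly call such a simulator on candidate networks and score how closely the resulting profile matches a target.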

The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur when using error correcting codes in biological applications. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric...

Epidemic contact tracing examines the movement of infection through a population based upon links in a contact network, and weighted networks represent the potential of transfer of the contagion. Graph compression reduces the size of a network by merging groups of nodes into *supernodes*. This study considers the use of genetic algorithms to...

In recent years, researchers have been exploring alternative methods to solving Integer Prime Factorization, the decomposition of an integer into its prime factors. This has direct application to cryptanalysis of RSA, as one means of breaking such a cryptosystem requires factorization of a large number that is the product of two prime numbers. This...

Personal contact networks that represent social interactions can be used to identify who can infect whom during the spread of an epidemic. The structure of a personal contact network has great impact upon both epidemic duration and the total number of infected individuals. A vaccine, with varying degrees of success, can reduce both the length and s...

Cities, while exciting in their visualization and permitting several layouts, do not take into account the placement of crucial characters which might be part of the narrative. Narrative graphs, a connected graph of all potential and existing relations within a game, can enable an ability to find a Non-player Character (NPC) who is likely to live n...

Parkinson's disease is a neurodegenerative disease that affects close to 10 million people, with symptoms including tremors and changes in gait. Observing differences or changes in an individual's manifestations of gait may provide a mechanism to identify Parkinson's disease and understand specific changes. In this study, timeseries data from both...

Disease Gene Association finds genes that are involved in the presentation of a given genetic disease. We present a hybrid approach which implements a multi-objective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. Multiple objective settings and par...

Parkinson's Disease is a disorder with diagnostic symptoms that include a change to a walking gait. The disease is problematic to diagnose. An objective method of monitoring the gait of a patient is required to ensure the effectiveness of diagnosis and treatments. We examine the suitability of Extreme Gradient Boosting (XGBoost) and Artificial Neur...

The dropping cost of sequencing human DNA has allowed for fast development of several projects around the world generating huge amounts of DNA sequencing data. This deluge of data has run up against limited storage space, a problem that researchers are trying to solve through compression techniques. In this study we address the compression of SAM f...

In this study, deep learning will be used to test the predictability of stock trends. Stock markets are known to be volatile, prices fluctuate, and there are many complicated financial indicators involved. Various data including news or financial indicators can be used to predict stock prices. In this study, the focus will be on using past stock pr...

Network graphs appear in a number of important biological data problems, recording information relating to protein-protein interactions, gene regulation, transcription regulation and much more. These graphs are of such a significant size that they are impossible for a human to understand. Furthermore, the ever-expanding quantity of such information...

Parkinson's disease (PD) is a degenerative disorder of the central nervous system that has many debilitating symptoms which affect the patient's motor system and can cause significant changes in their gait. By using genetic programming, we aim to develop descriptive symbolic nonlinear models of PD patient gait from time series data recorded from pr...

Creating a representation capable of generating personal contact networks that are most likely to exhibit specific epidemic behavior is difficult due to the inherent volatility of an epidemic and the numerous parameters accompanying the problem. To surpass these hurdles, evolutionary algorithms are used to create a generative solution combined with...

Side Effect Machines (SEMs) are an extension of finite state machines which place a counter on each node that is incremented when that node is visited. Previous studies examined a genetic algorithm to discover node connections in SEMs for edit metric decoding for biological applications, namely to handle sequencing errors. Edit metric codes, while...
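The counter mechanism described above can be sketched in a few lines. The three-state transition table below is hypothetical, chosen only for illustration, and the convention that only states entered on reading a symbol are counted (so the start state is not counted before any input) is an assumption.

```python
def sem_features(transitions, start, word):
    """Feed `word` through a side effect machine; return per-state visit counts."""
    counts = [0] * len(transitions)
    state = start
    for symbol in word:
        state = transitions[state][symbol]
        counts[state] += 1  # the "side effect": bump the counter of the state entered
    return counts


# Hypothetical 3-state machine over the DNA alphabet.
TRANSITIONS = [
    {"A": 0, "C": 1, "G": 2, "T": 0},
    {"A": 2, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 2, "T": 2},
]

print(sem_features(TRANSITIONS, start=0, word="ACGT"))
```

Whatever the length of the input word, the result is a vector with one entry per state, which is what makes the counts usable as fixed-length features for a decoder.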

Hierarchical clustering via neighbor joining, widely used in biology, can be quite sensitive to the addition or deletion of single taxa. In an earlier study it was found that neighbor joining trees on random data were commonly quite unstable in the sense that large re-arrangements of the tree occurred when the tree was reconstructed after the delet...

The accurate modeling of epidemics on social contact networks is difficult due to the variation between different epidemics and the large number of parameters inherent to the problem. To reduce complexity, evolutionary computation is used to create a generative representation of the epidemic model. Previous gains from the use of local, versus globa...

Alzheimer's disease (AD) is an irreversible, progressive neurological disorder that causes memory and thinking skill loss. Many different methods and algorithms have been applied to extract patterns from neuroimaging data in order to distinguish different stages of Alzheimer's disease (AD). However, the similarity of the brain patterns in older adu...

The Disease Gene Association Problem (DGAP) is a bioinformatics problem in which search strategies inspired by computer science take biological data and attempt to provide, efficiently and accurately, the best solution possible. Specifically, the DGAP seeks to find and rank various genes based on their involvement in the presentation of a given d...

Embeddable biomarkers are short strands of DNA that can be incorporated into genetic constructs to enable later identification. They are drawn from error correcting codes on the DNA alphabet relative to the Levenshtein metric. This study uses three types of evolutionary algorithms to improve the best known size of DNA error correcting codes, improv...

The salmon algorithm is a metaheuristic inspired by the behaviour of salmon swimming upstream to spawn. It has previously shown success when used for the creation of sets of robust tags for DNA sequencing applications, as well as for the travelling salesman problem. In this paper the salmon algorithm is evaluated for the construction of optimal cov...

Storage and processing of biological networks is challenging and costly due to the large sizes of many of these networks. Compression of such graphs is one possible solution to this problem. This study presents two single-objective genetic algorithms, along with one multi-objective algorithm, to address the problem of graph compression. The fitness...

Public-key cryptography is a fundamental component of modern electronic communication that can be constructed with many different mathematical processes. Presently, cryptosystems based on elliptic curves are becoming popular due to strong cryptographic strength per small key size. At the heart of these schemes is the intractability of the elliptic...

All simple finite groups are classified as members of specific families. With one exception, these families are infinite collections of groups sharing similar structures. The exceptional family of sporadic groups contains exactly twenty-six members. The five Mathieu groups are the most accessible of these sporadic cases. In this article, we explore...

Permutation problems are a very common classification of optimization problems. Because of their popularity countless algorithms have been developed in an attempt to find high quality solutions. It is also common to see many different types of search spaces reduced to permutation problems as there are many heuristics and metaheuristics for them due...

Many real-world graphs, including those storing various forms of biological data, are of such large size that storing and processing their information has too high a cost. As a result, one possible solution is to compress the graphs by merging nodes into supernodes. This study introduces a genetic algorithm for graph compression that is based on th...
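The merge step at the heart of this kind of compression can be sketched as follows. This shows only the operation of collapsing a group of nodes into one supernode on an unweighted, undirected graph; it is not the genetic algorithm from the paper, and the function name and the choice to drop edges internal to the group are assumptions.

```python
def merge_into_supernode(adjacency, group, label):
    """Return a new graph in which the nodes in `group` become one supernode `label`."""
    group = set(group)
    merged = {}
    for node, neighbours in adjacency.items():
        src = label if node in group else node
        targets = merged.setdefault(src, set())
        for nb in neighbours:
            dst = label if nb in group else nb
            if dst != src:  # edges inside the group would become self-loops; drop them
                targets.add(dst)
    return merged


# Hypothetical 4-node graph with 5 undirected edges.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
compressed = merge_into_supernode(graph, {"b", "c"}, "bc")
print(compressed)
```

A search procedure would then choose which groups to merge, trading the reduction in size against the information lost about the original edges.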

DNA fragment assembly, an NP-hard problem, is one of the major steps in DNA sequencing. Multiple strategies have been used for this problem, including greedy graph-based algorithms, de Bruijn graphs, and the overlap-layout-consensus approach. This study focuses on the overlap-layout-consensus approach. Heuristics and computational intelligence...

Disease-gene association attempts to determine which genes are involved with genetic diseases. Various methodologies have been applied to this problem for different diseases. In earlier work, two evolutionary approaches were used to analyze the complex network of gene interaction. This paper presents an improvement upon the genetic programming appro...

Detection and correction of errors within communications using error correcting codes allows for safer transmission over noisy channels. Generation of these codes however can be extremely time consuming, especially with more complex types of errors such as insertion and deletion of bits. This research looks at optimal codes based on edit distance,...

Embeddable biomarkers are short strands of DNA that can be incorporated into genetic constructs to enable later identification. They are drawn from error correcting codes on the DNA alphabet relative to the Levenshtein metric. This study revisits the Conway variation operator which can serve as a population initializer, mutation operator, or crosso...

Understanding the relationship between genetic diseases and the genes associated with them is an important problem regarding human health. The vast amount of data created from a large number of high-throughput experiments performed in the last few years has resulted in an unprecedented growth in computational methods to tackle the disease-gene asso...

DNA Assembly is among the most fundamental and challenging problems in bioinformatics. Near optimal solutions are available for bacterial and small genomes. However assembling large and complex genomes including the human genome using Next-Generation-Sequencing (NGS) technologies is shown to be very difficult. This paper presents an algorithm for c...

Recentering-Restarting Genetic Algorithms have been used successfully to evolve multiple epidemic networks and perform DNA error correction. This work studies variations of the Recentering-Restarting Genetic Algorithm for the purpose of evaluating its effectiveness for ordered gene problems. These variations use multiple seeds and two adaptive repr...

Experimental extended x-ray absorption fine structure (EXAFS) spectra carry information about the chemical structure of metal protein complexes. However, predicting the structure of such complexes from EXAFS spectra is not a simple task. Currently methods such as Monte Carlo Optimization or simulated annealing are used in structure refinement of EX...

The Fragment Assembly Problem is a major component of the DNA sequencing process that is identified as being NP-Hard. A variety of approaches to this problem have been used, including overlap-layout-consensus, de Bruijn graphs, and greedy graph based algorithms. The overlap-layout-consensus approach is one of the more popular strategies which has b...

Recentering-restarting evolutionary algorithms have been used successfully to evolve epidemic networks. This study develops multiple variations of this algorithm for the purpose of evaluating its use for ordered-gene problems. These variations are called recentering or reanchoring-restarting evolutionary algorithms. Two different adaptive represent...

Quaternary error-correcting codes defined over the edit metric may be used as labels to track the origin of sequence data. When used in such applications there are typically additional restrictions that are biologically motivated, such as a required GC content or the avoidance of certain patterns. As a result such codes can not be expected to have...

Self-dual doubly even linear binary error-correcting codes, often referred to as type II codes, are codes closely related to many combinatorial structures such as 5-designs. Extremal codes are codes that have the largest possible minimum distance for a given length and dimension. The existence of the extremal [72,36,16] type II code is still open....

Codes capable of correcting insertions or deletions due to synchronization errors are of increasing importance as the speed of transmission grows. Finding optimal deletion-correcting codes is particularly difficult because, unlike for traditional codes defined over Hamming distance, the spheres about the codewords vary in size. Our...

The maximum possible number of codewords in a q-ary code of length n and minimum distance d is denoted $A_q(n,d)$. It is a fundamental problem in coding theory to determine this value for given parameters q, n and d. Codes that attain the maximum are said to be optimal. Unfortunately, for many different values of these parameters, the maximum number o...
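For very small parameters the quantity defined above can be computed by exhaustive search, which also illustrates why larger cases are hard. The sketch below assumes Hamming distance (the abstract does not name the metric) and uses a plain backtracking search; the function names are illustrative and this is not the method of the paper.

```python
from itertools import product


def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def max_code_size(q, n, d):
    """Exhaustively compute the largest q-ary code of length n with minimum distance d."""
    words = list(product(range(q), repeat=n))
    best = 0

    def extend(code, start):
        nonlocal best
        best = max(best, len(code))
        for i in range(start, len(words)):
            # Prune: even adding every remaining word could not beat the best so far.
            if len(code) + (len(words) - i) <= best:
                return
            if all(hamming(words[i], c) >= d for c in code):
                code.append(words[i])
                extend(code, i + 1)
                code.pop()

    extend([], 0)
    return best


print(max_code_size(2, 3, 2))  # the four even-weight words of length 3 are optimal
```

The search space grows as the number of subsets of all q^n words, so this approach is only practical for toy cases.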

Error-correcting codes allow for reliable transmission of data over mediums subject to interference. They guarantee detection and recovery from a level of transmission corruption. Larger error-correcting codes increase the maximum sizes of messages transmittable, which improves communication efficiency. However, discovering optimal error-correcting...

Understanding the machinery of gene regulation to control gene expression has been one of the main focuses of bioinformaticians for years. We use a multi-objective genetic algorithm to evolve a specialized version of side effect machines for degenerate motif discovery. We compare some suggested objectives for the motifs they find and report prelimi...

We present a novel method for the creation of photographic mosaic images using fractals generated via evolutionary techniques. A photomosaic is a rendering of an image performed by placing a grid of smaller images that permit the original image to be visible when viewed from a distance. The problem of selecting the smaller images is a computational...

The prediction of protein side-chain conformation is central for understanding protein functions. Side-chain packing is a sub-problem of protein folding and its computational complexity has been shown to be NP-hard. We investigated the capabilities of a hybrid (genetic algorithm/simulated annealing) technique for side-chain packing and for the gene...

DNA error correcting codes over the edit metric can be used to correct sequencing errors. The codewords may be used as embeddable markers that allow one to track the origin of sequence data. The Salmon Algorithm is a search meta-heuristic inspired by the behaviour of salmon swimming upstream to spawn. This algorithm consists of a number of parame...

Error correcting codes over the DNA alphabet are used as embeddable biomarkers. Error correction provides resilience of identification in spite of sequencing errors. Ring optimization is a type of spatially structured evolutionary algorithm derived from models of ring species in nature. This paper compares the performance of a ring optimizer with a...

DNA edit metric codes are used as labels to track the origin of sequence data. This study is the first to treat sophisticated decoders for these error-correcting codes. Side effect machines can provide efficient decoding algorithms for such codes. Two methods for automatically producing decoding algorithms are presented. Side Effect Machines (SEMs)...

The design of a large and reliable DNA codeword library is a key problem in DNA based computing. DNA codes, namely sets of fixed length edit metric codewords over the alphabet {A, C, G, T}, satisfy certain combinatorial constraints with respect to biological and chemical restrictions of DNA strands. The primary constraints that we consider are the re...

In this position paper we examine preliminary results of a new type of general error correction decoder for Edit Metric Codes. The Single Classifier Machine Decoder uses the concept of Side Effect Machines (SEMs) created via Genetic Algorithms (GAs) in order to create a mapping from the Edit Metric to the Euclidean Metric to create a decoder. By not...

We provide a preliminary exploration of the use of genetic algorithms (GA) upon a substitution permutation network (SPN) cipher. The purpose of the exploration is to determine how to find weak keys. The size of the selected SPN created by Stinson gives a sample for showing the methodology and suitability of an attack using GA. We divide the types o...

DNA error correcting codes over the edit metric create embeddable markers for sequencing projects that are tolerant of sequencing errors. When a sequence library has multiple sources for its sequences, use of embedded markers permits tracking of sequence origin. Evolutionary algorithms are currently the best known technique for optimizing DNA error...

The edit distance, also known as Levenshtein distance, between two words is the minimum number of substitutions, insertions and/or deletions required to change one word into another. An $(n, M, d)_q$ edit code is a q-ary code with minimum edit distance d and in which the longest codeword has length n. A code is optimal if it has the maximum number of c...
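The definition above translates directly into the standard dynamic-programming computation. The sketch below is a textbook implementation rather than anything taken from the paper; `min_edit_distance` is a hypothetical helper that computes the parameter d of a code by checking all pairs of codewords.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def min_edit_distance(code):
    """Smallest pairwise edit distance in a code with at least two words."""
    return min(edit_distance(u, v) for i, u in enumerate(code) for v in code[i + 1:])


print(edit_distance("ACGT", "CGTA"))
```

The words ACGT and CGTA differ in all four positions, yet their edit distance is only 2 (delete the leading A, insert an A at the end). This gap between Hamming and edit distance is one reason edit codes need separate treatment.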

A weighing matrix W(n, k) of order n with weight k is an n × n matrix with entries from {0, 1, −1} which satisfies $WW^T = kI_n$. Such a matrix is group-developed if its rows and columns can be indexed by elements of a finite group G so that $w_{g,h} = w_{gf,hf}$ for all g, h, and f in G. Group-developed weighing matrices are a natural generalization of perfect...
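A circulant matrix is the simplest group-developed case, with the cyclic group Z_n indexing rows and columns. The sketch below builds a circulant matrix from a first row and checks both defining conditions. The particular first row is an illustrative choice that yields a W(7, 4), as the check itself confirms; the helper names are assumptions and nothing here is taken from the paper.

```python
def circulant(first_row):
    """Matrix whose (i, j) entry is first_row[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def is_weighing_matrix(w, k):
    """Check that entries lie in {0, 1, -1} and that W times its transpose is k times the identity."""
    n = len(w)
    if any(entry not in (-1, 0, 1) for row in w for entry in row):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(w[i][t] * w[j][t] for t in range(n))
            if dot != (k if i == j else 0):
                return False
    return True


def is_cyclic_developed(w):
    """Check w[g][h] == w[g+f][h+f] for all g, h, f in Z_n."""
    n = len(w)
    return all(
        w[g][h] == w[(g + f) % n][(h + f) % n]
        for g in range(n) for h in range(n) for f in range(n)
    )


W = circulant([-1, 1, 1, 0, 1, 0, 0])
print(is_weighing_matrix(W, 4), is_cyclic_developed(W))
```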
