
Bruce William Watson
Stellenbosch University (SUN) · Centre for AI Research, School for Data Science and Computational Thinking
Doctor of Philosophy
Chair of the Centre for AI Research, specialised in Cybersecurity and Algorithms
About
178 Publications · 57,884 Reads
1,718 Citations (since 2017)
Publications (178)
‘Long COVID’ is the term used to describe the phenomenon in which patients who have survived a COVID-19 infection continue to experience prolonged SARS-CoV-2 symptoms. Millions of people across the globe are affected by Long COVID. Solving the Long COVID conundrum will require drawing upon the lessons of the COVID-19 pandemic, during which thousand...
The data ecosystem is complex and involves multiple stakeholders. Researchers and scientists engaging in data-intensive research collect, analyse, store, manage and share large volumes of data. Consequently, capturing researchers’ and scientists’ views from multidisciplinary fields on data use, sharing and governance adds an important African persp...
In the process of data modelling, interpretation of the results in a straightforward manner is often challenging. The model's statistical performance is linked to specific parameters, while the interpretation of the results in the context of the problem often relies on visualisation aids. The aim of this study was to develop an analytical pipeline...
Background
Fibrin(ogen) amyloid microclots and platelet hyperactivation previously reported as a novel finding in South African patients with the coronavirus 2019 disease (COVID-19) and Long COVID/Post-Acute Sequelae of COVID-19 (PASC), might form a suitable set of foci for the clinical treatment of the symptoms of Long COVID/PASC. A Long COVID/PAS...
Deductive program verification is a post-hoc quality assurance technique following the design-by-contract paradigm, where correctness of the program is proven only after it has been written. In contrast, correctness-by-construction (CbC) is an incremental program construction technique. Starting with the functional specification, the program’s correctness i...
Contract Driven Development formalizes functional requirements within component contracts. The process aims to produce higher quality software, reduce quality assurance costs and improve reusability. However, the perceived complexity and cost of requirements formalization has limited the adoption of this approach in industry. In this article, we co...
Real-time distributed Internet of Things (IoT) systems are increasingly using complex event processing to make inferences about the environment. This mode of operation is able to reduce communication requirements, improve robustness and scalability, and avoid the need for big data storage and processing. With systems making many inferences about...
Under the coronavirus pandemic, governments and corporations around the world have adopted a work-from-home (WFH) mode of operations to continue governing and operating. Over two years into the COVID-19 pandemic, many of us continue to work from home and a large majority have few plans to return to the office.
Early on, governments and companies s...
This is a study of microclots in long-COVID patients, including their treatment, and a data-driven correlation of comorbidities with long-COVID symptoms.
This is a corrected version. The preprint original is also at: https://doi.org/10.21203/rs.3.rs-1205453/v1
Background: Fibrin(ogen) amyloid microclots and platelet hyperactivation previously reported as a novel finding in South African patients with the coronavirus 2019 disease (COVID-19) and Long COVID/Post-Acute Sequelae of COVID-19 (PASC), might form a suitable set of foci for the clinical treatment of the symptoms of long COVID/PASC. A Long COVID/PA...
Text mining to produce a sensory description of Gins and Craft Beers.
In early 2020, the rapid adoption of remote working and communications tools by governments, companies, and individuals around the world increased dependency on cyber infrastructure for the normal functioning of States, businesses, and societies. For some, the urgent need to communicate whilst safeguarding human life took priority over ensuring tha...
Internationally, digital technology is widely used in support of elections. While most countries depend on technological advances in some form or another, electronic voting as such has been far less universally adopted. Thus far, only about 20 per cent of the world has used electronic voting for national elections, and with mixed success. While ove...
In recent years, researchers have started to investigate X-by-Construction (XbC) as a refinement approach to engineer systems that by-construction satisfy certain non-functional properties, beyond correctness as considered by the more traditional Correctness-by-Construction (CbC). In line with increasing attention for fault-tolerance and the use of...
Correctness-by-construction (CbC) is a refinement-based methodology to incrementally create formally correct programs. Programs are constructed using refinement rules which guarantee that the resulting implementation is correct with respect to a pre-/postcondition specification. In contrast, with post-hoc verification (PhV) a specification and a pr...
Governments around the world commonly use Cloud Service Providers (CSPs) that are headquartered in other nations. How do they ensure data sovereignty when these CSPs, storing a nation’s data within that nation’s borders, are subject to long-arm statutes on data stored abroad? And what if, in turn, the governmental data is stored abroad, would acces...
Cloud Service Providers (CSP) offer the opportunity for individuals, companies, and governments to rapidly leverage current capabilities dynamically and with great elasticity. At the time of writing, unlike the U.S., Canada does not have large sovereign CSPs with global presence.
Although one may debate overall cost effectiveness and value of mov...
Information system security threats continue to plague organisations in spite of enormous investments in security measures. The academic literature and the media reflect the huge financial losses and reputational harm to organisations due to computer-related security breaches. Although technical safeguards are indispensable, the academic literature highligh...
In many software applications, it is necessary to preserve confidentiality of information. Therefore, security mechanisms are needed to enforce that secret information does not leak to unauthorized users. However, most language-based techniques that enable information flow control work post-hoc, deciding whether a specific program violates a confid...
Regularities in strings are often related to periods and covers, which have extensively been studied, and algorithms for their efficient computation have broad application. In this paper we concentrate on computing cyclic regularities of strings, in particular, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all...
Over the last few decades, dead-zone algorithms have emerged as highly performant on certain types of data. Such algorithms solve the exact keyword matching problem over strings, though extensions to trees and two-dimensional data have also been devised. In this short paper, we give an overview of such algorithms.
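The dead-zone idea can be illustrated with a small sketch. This is my own reconstruction, not any of the published variants: it treats a range of candidate start positions as a "live zone", tests the middle candidate, and then uses a Horspool-style bad-character shift to declare a dead zone to the right (conservatively, only the tested position is declared dead to the left) before recursing on the remaining live zones.

```python
def dead_zone_match(text, pat):
    """Illustrative dead-zone single-keyword matcher (a sketch, not the
    published algorithm): divide-and-conquer over candidate start positions."""
    n, m = len(text), len(pat)
    assert m >= 1
    # Horspool-style bad-character table over pat[:-1]: for a character c,
    # the distance from its last occurrence to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pat[:-1])}
    matches = []

    def search(lo, hi):
        # Live zone: candidate start positions in [lo, hi).
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        if text[mid:mid + m] == pat:
            matches.append(mid)
        # Bad-character rule on the window's last text character: no match
        # can start at mid+1 .. mid+d-1, so that range is a dead zone.
        d = shift.get(text[mid + m - 1], m)
        search(lo, mid)        # left of mid (conservative: only mid is dead)
        search(mid + d, hi)    # right of the dead zone

    search(0, n - m + 1)
    return sorted(matches)
```

The real dead-zone family uses shift functions in both directions, which typically kills larger zones to the left as well; the conservative left shift above keeps the sketch short while remaining correct.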
Regularities in strings are often related to periods and covers, which have extensively been studied, and algorithms for their efficient computation have broad application. In this paper we concentrate on computing cyclic regularities of strings, in particular, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all...
Correctness-by-Construction (CbC) is an approach to incrementally create formally correct programs guided by pre- and postcondition specifications. A program is created using refinement rules that guarantee the resulting implementation is correct with respect to the specification. Although CbC is supposed to lead to code with a low defect rate, it...
After decades of progress on Correctness-by-Construction (CbC) as a scientific discipline of engineering, it is time to look further than correctness and investigate a move from CbC to XbC, i.e., considering also non-functional properties. X-by-Construction (XbC) is concerned with a step-wise refinement process from specification to code that autom...
Guaranteeing that information processed in computing systems remains confidential is vital for many software applications. To this end, language-based security mechanisms enforce fine-grained access control policies for program variables to prevent secret information from leaking through unauthorized access. However, approaches for language-based s...
In this paper, we propose a reduction of the minimization problem for a bottom-up deterministic tree automaton (DFTA), making the latter a minimization of a string deterministic finite automaton (DFA). To achieve this purpose, we proceed first by the transformation of the tree automaton into a particular string automaton, followed by minimizing thi...
Over the last few decades, several technology specialists have collected computer viruses and other malware. Today, one can download current malware collections from Internet-based sources. It could be argued that a large majority of older malware would not be as effective as the day it was written, due to the target systems o...
The increasingly large volumes of publicly available sensory descriptions of wine raises the question whether this source of data can be mined to extract meaningful domain-specific information about the sensory properties of wine. We introduce a novel application of formal concept lattices, in combination with traditional statistical tests, to visu...
A method for developing concurrent software is advocated that centres on using CSP to specify the behaviour of the system. A small example problem is used to illustrate the method. The problem is to develop a simulation system that keeps track of and reports on the least unique bid of multiple streams of randomly generated incoming bids. The proble...
Technologies have evolved so rapidly that companies and governments seem to be regularly trying to catch up to new capabilities and thereby making quick decisions that have the potential to set precedents and present international challenges. Is cyber capability changing so fast that our sensemaking is lagging? Is cyber shape-shifting?
With the op...
A degenerate or indeterminate string on an alphabet $\Sigma$ is a sequence of non-empty subsets of $\Sigma$. Given a degenerate string $t$ of length $n$, we present a new method based on the Burrows--Wheeler transform for searching for a degenerate pattern of length $m$ in $t$ running in $O(mn)$ time on a constant size alphabet $\Sigma$. Furthermor...
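The paper's method is based on the Burrows-Wheeler transform; as a point of comparison, the matching relation itself can be shown with a naive baseline (my sketch, with a hypothetical function name): position $j$ of the pattern matches position $i+j$ of the text whenever the two symbol sets intersect, giving the same $O(mn)$ bound per the number of set intersections.

```python
def degenerate_occurrences(text, pat):
    """Naive matcher for degenerate strings: text and pat are lists of
    non-empty sets of symbols; pattern position j matches text position
    i + j when the two sets share at least one symbol."""
    n, m = len(text), len(pat)
    return [i for i in range(n - m + 1)
            if all(text[i + j] & pat[j] for j in range(m))]
```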
Failure deterministic finite automata (FDFAs) represent regular languages more compactly than deterministic finite automata (DFAs). Four algorithms that convert arbitrary DFAs to language-equivalent FDFAs are empirically investigated. Three are concrete variants of a previously published abstract algorithm, the DFA-Homomorphic Algorithm (DHA). The...
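The compactness of an FDFA comes from storing only some symbol transitions per state and falling back along a failure link when a symbol transition is missing. A minimal sketch of the lookup mechanism (my own illustration, assuming failure chains are acyclic; the class and example automaton are hypothetical, not from the paper):

```python
class FDFA:
    """Minimal failure-DFA sketch: sparse symbol transitions plus an
    optional failure link per state (failure chains assumed acyclic)."""

    def __init__(self, delta, fail, start, finals):
        self.delta = delta      # dict: (state, symbol) -> state
        self.fail = fail        # dict: state -> fallback state
        self.start = start
        self.finals = finals

    def step(self, q, a):
        # Follow failure links until a symbol transition on `a` is found.
        while (q, a) not in self.delta:
            if q not in self.fail:
                return None     # no transition anywhere along the chain
            q = self.fail[q]
        return self.delta[(q, a)]

    def accepts(self, word):
        q = self.start
        for a in word:
            q = self.step(q, a)
            if q is None:
                return False
        return q in self.finals
```

For example, an automaton over {a, b} for strings containing "ab" needs six transitions as a complete DFA; with a failure link from state 1 back to state 0, the transition (1, 'a') can be dropped, since the fallback reaches (0, 'a') with the same effect.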
The data explosion problem continues to escalate requiring novel and ingenious solutions. Pattern inference focusing on repetitive structures in data is a vigorous field of endeavor aimed at shrinking volumes of data by means of concise descriptions. The Burrows–Wheeler transformation computes a permutation of a string of letters over an alphabet,...
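The Burrows-Wheeler transformation mentioned above can be computed naively by sorting all rotations of the input and reading off the last column. A sketch (assuming a sentinel character that sorts before every other symbol; real implementations use suffix arrays instead of explicit rotations):

```python
def bwt(s):
    """Naive Burrows-Wheeler transform: append a sentinel assumed to sort
    before all other symbols, sort every rotation, take the last column.
    O(n^2 log n); fine for illustration, not for large inputs."""
    s += "\x00"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)
```

The output tends to cluster equal symbols into runs (e.g. the repeated a's below), which is what makes the transform useful as a preprocessing step for compression and for pattern inference on repetitive structures.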
Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, the matching engines of, for example, Perl, Java, Ruby, and .NET also provide operators, such as atomic operators, that constrain the backtracking behavior of t...
Being an unsupervised machine learning and data mining technique, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Apart from conventional clustering techniques, biclustering is searching for homogeneous groups of objects while keeping their common description, e.g., in...
Growing SmartCities means that the amount of information processed and stored to manage a city’s infrastructure (e.g., traffic, public transport, electricity) is growing as well. To manage this, SmartCities are deploying truly distributed and highly scalable information and communication (ICT) infrastructure, connecting a conglomerate of smart devi...
Correctness-by-construction (CbC) is an approach for developing algorithms in line with rigorous correctness arguments. A high-level specification is evolved into an implementation in a sequence of small, tractable refinement steps guaranteeing the resulting implementation to be correct. CbC facilitates the design of algorithms that are more efficie...
Correctness-by-construction (CbC), traditionally based on weakest precondition semantics, and post-hoc verification (PhV) aspire to ensure functional correctness. We argue for a lightweight approach to CbC where lack of formal rigour increases productivity. In order to mitigate the risk of accidentally introducing errors during program construction...
We apply results from ambiguity of non-deterministic finite automata to the problem of determining the asymptotic worst-case matching time, as a function of the length of the input strings, when attempting to match input strings with a given regular expression, where the matcher being used is a backtracking regular expression matcher.
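The exponential worst case that ambiguity induces in a backtracking matcher can be made concrete with a hand-rolled counter (my illustration, analogous to matching a pattern like (a?){n}a{n} against the text a^n; not the paper's construction):

```python
def backtracking_steps(n):
    """Count the calls a naive backtracking matcher makes when matching
    the pattern (a?)^n a^n (n optional 'a's, then n required 'a's)
    against the text a^n. Exactly one choice of options succeeds, but a
    backtracker that tries 'consume' before 'skip' explores ~2^n paths."""
    steps = 0

    def m(i, j):                    # i: pattern position, j: text position
        nonlocal steps
        steps += 1
        if i == 2 * n:              # pattern exhausted
            return j == n           # must have consumed the whole text
        if i < n:                   # optional 'a': try consuming, then skip
            if j < n and m(i + 1, j + 1):
                return True
            return m(i + 1, j)
        return j < n and m(i + 1, j + 1)   # required 'a'

    assert m(0, 0)                  # the match does succeed in the end
    return steps
```

The ambiguity here is that every text character can be consumed by either an optional or a required pattern position; the number of explored alternatives, and hence the matching time, grows exponentially in n.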
Modern software systems, in particular in mobile and cloud-based applications, exist in many different variants in order to adapt to changing user requirements or application contexts. Software product line engineering allows developing these software systems by managed large-scale reuse in order to achieve shorter time to market. Traditional softw...
Taxonomy-Based Software Construction (TABASCO) applies extensive domain analyses to create conceptual hierarchies of algorithmic domains. Those are used as basis for the implementation of software toolkits. The monolithic structure of TABASCO-based toolkits restricts their adoption on resource-constrained or special-purpose devices. In this paper,...
This extended abstract sketches some of the most recent advances in hardware implementations (and surrounding issues) of finite automata and regular expressions.
A method of determining a set of prescribed actions includes receiving a configuration script identifying a set of influencers, a set of performance indicators, a model type, a target time, and a prescription method. The method further includes deriving a model of the model type based on data associated with the set of influencers or with the set o...
We propose a reduction of the minimization problem for a bottom-up deterministic tree automaton (DFTA) to the minimization problem for a string deterministic finite automaton (DFA). We proceed by a transformation of the tree automaton into a particular string automaton and then minimize the string automaton. We show that for our transformation, the...
• We discuss the correctness-by-construction approach to software development.
• We discuss our experience with this approach in various algorithmic settings.
• We argue that its application to algorithmically complex system parts is worthwhile.
The timing performance data of ten related algorithms (solving the single keyword pattern matching problem) executing under a wide variety of operating conditions, was gathered and analysed. Using the resulting 15 million items of timing data, various metrics to estimate algorithm performance were computed and compared. An assessment is made of whe...
In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to...
As long as software has been produced, there have been efforts to strive for quality in software products. In order to understand quality in software products, researchers have built models of software quality that rely on metrics in an attempt to provide a quantitative view of software quality. The aim of these models is to provide software produc...
Deep packet inspection (DPI) systems are required to perform at or near network line-rate speeds, matching thousands of rules against the network traffic. The engineering performance and price trade-offs are such that DPI is difficult to virtualize, either because of very high memory consumption or the use of custom hardware; similarly, a running D...
In indexing of and pattern matching on DNA sequences, representing all factors of a sequence is important. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple sy...
http://dl.acm.org/citation.cfm?id=2564892
TABLE OF CONTENTS:
1. Abstraction, Refinement, Enrichment
2. Ceteris Paribus Preferences: Prediction via Abduction
3. Minimal Weighted Automata over the Galois Field with Two Elements
4. Symmetric Difference NFA: the State of the Art
5. Analyzing Strings with Ordered Lyndon-like Structures
6. Verifying an E...
The design and implementation of FireµSat2, an algorithm to detect microsatellites (short approximate tandem repeats) in DNA, is discussed. The algorithm relies on deterministic finite automata. The parameters are designed to support requirements expressed by molecular biologists in data exploration. By setting the parameters of FireµSat2 as lib...
Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in reg...
The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m + 1 states which are all final, is homogeneous, and has m to 2m − 1 transitions. The factor storacle [6] is an alternative automaton that satisf...
I introduce two performance improvements to the Commentz-Walter family of multiple-keyword (exact) pattern matching algorithms. These algorithms (which are in the Boyer-Moore-Horspool style of keyword pattern matchers) consist of two nested loops: the outer one to process the input text and the inner one processing possible keyword matches. The gua...
A so-called dead-zone pattern matching family of algorithms has previously been proposed as a concept. Here the performance of several instances of the family are empirically investigated. An abstract description of the algorithm family is given, as well as of these instances. This leads to a total of five different implementations of the algorithm...
Earlier publications provided an abstract specification of a family of single-keyword pattern matching algorithms [18] which search unexamined portions of the text in a divide-and-conquer fashion, generating dead zones in the text as they progress. These dead zones are areas of text that require no further examination. Here the results are described...
The consequences of regular expression hashing as a means of finite state automaton reduction are explored, based on variations of Brzozowski's algorithm. In this approach, each hash collision results in the merging of the automaton's states, and it is subsequently shown that a super-automaton will always be constructed, regardless of the hash funct...
The previous chapter illustrated the potency of software correctness by construction for developing a new and elegant algorithm. In this chapter we focus on classifying and taxonomising algorithmic problems by relying on correctness by construction thinking.
In this chapter, a number of fairly elementary algorithms are developed. They are, namely: linear search; finding the maximal element in an array; a version of binary search; a simple pattern matching algorithm; raising a number to a specific integer power; and finding the integer approximation of a logarithm to the base 2.
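Two of the listed algorithms can be sketched in the invariant-driven style the chapter advocates. These are my reconstructions of the standard solutions, not the book's code:

```python
def int_log2(n):
    """Integer approximation of log2(n) for n >= 1.
    Loop invariant: p == 2**r and p <= n."""
    assert n >= 1
    r, p = 0, 1
    while 2 * p <= n:
        r, p = r + 1, 2 * p
    return r                       # on exit: 2**r <= n < 2**(r + 1)


def power(x, k):
    """x**k for integer k >= 0 by binary exponentiation.
    Loop invariant: result * base**e == x**k."""
    result, base, e = 1, x, k
    while e > 0:
        if e % 2 == 1:
            result *= base
        base, e = base * base, e // 2
    return result
```

In both cases the stated loop invariant, together with the negated guard at loop exit, yields the postcondition directly, which is the essence of the construction method.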
This chapter provides further examples of the software correctness by construction method. The examples are fairly diverse. They range from sorting in a specialised context (the Dutch National Flag problem), discovering segmental properties of an array (the longest segment and the longest palindrome problems), raster drawing algorithms, the majorit...
The correctness by construction methodology advocated by this book starts off with a predicate-based specification of the problem at hand, and then incrementally refines that specification to code. However, to be able to do this, several preliminary notational and theoretical matters have to be in place.
Procedures offer a well-known way of reusing code. (Synonyms are subprocedure, subprogram, routine, subroutine, function and method; in this text, we keep to the terms procedure and function as they were classically used in languages such as Pascal.) A procedure may be viewed as a named block of code, characterised by its pre- and postconditions...
In this chapter, the correctness by construction approach is applied to an algorithmic problem that lies well off the beaten track of classical text book examples. The algorithm has been in the public domain since about 2000, but was only clearly explained and its correctness shown in 2010 [26]. The algorithm has also been shown to be considerably...
The focus of this book is on bridging the gap between two extreme methods for developing software. On the one hand, there are texts and approaches that are so formal that they scare off all but the most dedicated theoretical computer scientists. On the other, there are some who believe that any measure of formality is a waste of time, resulting in...
Inspired by failure functions found in classical pattern matching algorithms, a failure deterministic finite automaton (FDFA) is defined as a formalism to recognise a regular language. An algorithm, based on formal concept analysis, is proposed for deriving from a given deterministic finite automaton (DFA) a language-equivalent FDFA. The FDFA's tra...
Formal concept analysis is used as the basis for two new multiple keyword string pattern matching algorithms. The algorithms addressed are built upon a so-called position encoded pattern lattice (PEPL). The algorithms presented are in conceptual form only; no experimental results are given. The first algorithm to be presented is easily understood a...
In this paper two concurrent versions of Brzozowski's deterministic finite automaton (DFA) construction algorithm are developed from first principles, the one being a slight refinement of the other. We rely on Hoare's CSP as our notation.
The proposed specifications of the Brzozowski algorithm are in terms of the concurrent composition of...
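The paper develops concurrent CSP versions of the construction; the sequential core it builds on is the derivative operation. A plain sketch (my own encoding of regular expressions as tuples; the DFA construction takes simplified derivatives as states, while the membership test below just iterates derivatives over the input word):

```python
# Regular expressions as tuples:
# ('empty',) ('eps',) ('sym', c) ('alt', r, s) ('cat', r, s) ('star', r)

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag in ('empty', 'sym'):
        return False
    if tag == 'eps':
        return True
    if tag == 'alt':
        return nullable(r[1]) or nullable(r[2])
    if tag == 'cat':
        return nullable(r[1]) and nullable(r[2])
    return True  # star

def deriv(r, c):
    """Brzozowski derivative of r with respect to symbol c."""
    tag = r[0]
    if tag in ('empty', 'eps'):
        return ('empty',)
    if tag == 'sym':
        return ('eps',) if r[1] == c else ('empty',)
    if tag == 'alt':
        return ('alt', deriv(r[1], c), deriv(r[2], c))
    if tag == 'cat':
        left = ('cat', deriv(r[1], c), r[2])
        return ('alt', left, deriv(r[2], c)) if nullable(r[1]) else left
    return ('cat', deriv(r[1], c), r)  # star

def matches(r, word):
    """Membership test by repeated derivatives."""
    for c in word:
        r = deriv(r, c)
    return nullable(r)
```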
In model-based testing (MBT, also known as “specification-based” testing, or “model-driven” testing: MDT), the test cases, according to which a hardware or software unit (module, component) shall be tested after its development or implementation, are typically not created “ab initio”. They are derived by some means of formal reasoning from a formal...
Many keyword pattern matching algorithms use precomputation subroutines to produce lookup tables, which in turn are used to improve performance during the search phase. If the keywords to be matched are known at compile time, the precomputation subroutines can be implemented to be evaluated at compile time versus at run time. This will provide a pe...
This paper describes an experimental study to compare the performance of various dynamically resizable bit-vector implementations for the C++ programming language. We compare the std::vector from the Standard Template Library (STL), boost::dynamic_bitset from Boost, Qt::QBitArray from QT Software, and BitMagic's bm::bvector. We also compare std::vec...
Previous work on implementations of FA-based string recognizers suggested a range of implementation strategies (and therefore, algorithms) aiming at improving their performance for fast string recognition. However, an efficient exploitation of suggested algorithms by domain-specific FA-implementers requires prior knowledge of the behaviour (perform...
The tourist slogan used to market South Africa, 'A World in One Country', cuts across many more dimensions than just those of interest to tourists. Everywhere in the country, there is evidence of both a highly advanced and sophisticated economy and lifestyle, as well as of poverty and underdevelopment. The purpose of this session is to reflect on wheth...
An object-oriented framework is proposed for constructing a virtual machine (VM) to be used in the context of incrementally and iteratively developing a domain-specific language (DSL). The framework is written in C#. It includes abstract instruction and environment classes. By extending these, a concrete layer of classes is obtained whose instances...
These proceedings contain the final versions of the papers presented at the 7th International Workshop on Finite-State Methods and Natural Language Processing (FSMNLP 2008). The workshop was held in Ispra, Italy, on September 11–12, 2008. The event was the seventh instance in the series of FSMNLP workshops, and the third that was arranged as a stand...
We introduce a new CSP operator for modeling scenarios characterised by partial or optional parallelism. We provide examples of such scenarios and sketch the semantics of our operator. Relevant properties are proven.
An incremental algorithm to construct a lattice from a collection of sets is derived, refined, analyzed, and related to a similar previously published algorithm for constructing concept lattices. The lattice constructed by the algorithm is the one obtained by closing the collection of sets with respect to set intersection. The analysis explains the...
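The lattice the paper describes, obtained by closing a collection of sets under intersection, can be computed with a simple batch worklist sketch (my illustration; the paper's algorithm is incremental, adding one set at a time):

```python
def intersection_closure(sets):
    """Close a collection of sets under pairwise intersection. Ordered by
    inclusion, the result forms the meet-semilattice the incremental
    algorithm builds; this batch version is for illustration only."""
    closed = {frozenset(s) for s in sets}
    worklist = list(closed)
    while worklist:
        s = worklist.pop()
        for t in list(closed):
            u = s & t
            if u not in closed:
                closed.add(u)
                worklist.append(u)
    return closed
```

For the input {1,2}, {2,3}, {1,3}, the closure adds the singletons {1}, {2}, {3} and the empty set, giving seven elements in total.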
We propose a concept lattice-based approach to multiple two-dimensional pattern matching problems. It is assumed that a pattern can be described as a set of vertices (or pixels) and that a small set of vertices around each vertex corresponds to an attribute in a concept lattice. Typically, an attribute should be a succinct characterisation of domai...
We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classi...
Please obtain the Festschrift from ==> http://www.cs.up.ac.za/cs/sgruner/Festschrift/