Dimitar Kazakov

  • PhD
  • Professor (Associate) at University of York

Reader, AI Group, Dept. of Computer Science, University of York

About

159 Publications
49,127 Reads
1,014 Citations
Introduction
Current research interests: concurrent learning in description logic, financial modelling and forecasting, modelling of syntax for historical linguistics, computational chemistry, health applications of ML, NLP.
Current institution
University of York
Current position
  • Professor (Associate)
Additional affiliations
March 1998 - present
University of York
Position
  • Professor (Associate)

Publications

Conference Paper
This paper presents a novel approach to constructing a Question Answering model for analysing Nationally Determined Contributions (NDC) reports within the environmental sector. The approach is based on Large Language Models (LLMs) equipped with Retrieval Augmented Generation (RAG) and enhanced by ontology integration. Acknowledging the challenges i...
Preprint
Mitigating bias in automated decision-making systems, specifically deep learning models, is a critical challenge in achieving fairness. This complexity stems from factors such as nuanced definitions of fairness, unique biases in each dataset, and the trade-off between fairness and model accuracy. To address such issues, we introduce FairVIC, an inn...
Preprint
The presence of specific linguistic signals particular to a certain sub-group of people can be picked up by language models during training. This may lead to discrimination if the model has learnt to pick up on a certain group's language. If the model begins to associate specific language with a distinct group, any decisions made based upon this la...
Conference Paper
Consideration of multiple viewpoints on a contentious issue is critical for avoiding bias and assisting in the formulation of rational decisions. We observe that the current model imposes a constraint on diversity. This is because the conventional attention mechanism is biased toward a single semantic aspect of the claim, whereas the claim may cont...
Chapter
There is a history of hybrid machine learning approaches where the result of an unsupervised learning algorithm is used to provide data annotation from which ILP can learn in the usual supervised manner [7, 8]. Here we consider the task of predicting the property of cointegration between the time series of stock price of two companies, which can be...
Conference Paper
In this paper we present a framework for automatically predicting the gender and age of a patient using chest x-rays (CXRs). The work of this paper derives from common situations in medical imaging where the gender/age of a patient might be missing or in situations where the x-ray is of poor quality, thus leaving the medical practitioner unable to...
Conference Paper
COVID-19 was declared a pandemic by the World Health Organization (WHO) in January 2020. Many studies found that some specific age groups of people have a higher risk of contracting the disease. The gold standard test for the disease is a condition-specific test based on Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR). We have previously s...
Article
Each argument begins with a conclusion, which is followed by one or more premises supporting the conclusion. The warrant is a critical component of Toulmin's argument model; it explains why the premises support the claim. Despite its critical role in establishing the claim's veracity, it is frequently omitted or left implicit, leaving readers to in...
Article
The warrant element of the Toulmin model is critical for fact-checking and assessing the strength of an argument. As implicit information, warrants justify the arguments and explain why the evidence supports the claim. Despite the critical role warrants play in facilitating argument comprehension, the fact that most works aim to select the best war...
Conference Paper
The warrant element of the Toulmin model is critical for fact-checking and assessing the strength of an argument. As implicit information, warrants justify the arguments and explain why the evidence supports the claim. Despite the critical role warrants play in facilitating argument comprehension, the fact that most works aim to select the best war...
Conference Paper
There is a history of hybrid machine learning approaches where the result of an unsupervised learning algorithm is used to provide data annotation from which ILP can learn in the usual supervised manner. Here we consider the task of predicting the property of cointegration between the time series of stock price of two companies, which can be used t...
Article
COVID-19 was declared a global pandemic by the World Health Organization (WHO) in January 2020. Researchers have been working on formulating the best approaches and solutions to cure the disease and help prevent such pandemics in the future. Much effort has been made to develop a fast and accurate early clinical assessment of the disea...
Chapter
Fact-checking is the task of capturing the relation between a claim and evidence (premise) in order to decide whether the claim is true. Detecting the factuality of a claim, as in fake news, using only news knowledge, e.g., evidence text, is generally inadequate, since fake news is intentionally written to mislead readers. Most of the previous models on this task...
Article
The massive growth in the number of internet users can be a big opportunity for businesses to promote their services. This opportunity is not limited to e-commerce but extends to other e-services, such as e-tourism. In this paper, we propose a personalized recommender system approach with pairwise preference elicitation for the e-tourism domain are...
Preprint
The massive growth in the number of internet users can be a big opportunity for businesses to promote their services. This opportunity is not limited to e-commerce but extends to other e-services, such as e-tourism. In this paper, we propose a personalized recommender system approach with pairwise preference elicitation for the e-tourism domain are...
Preprint
Made available under the CC BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/). Recent work in parametric syntax (Guardiano, Longobardi, Cordoni, & Crisma, to appear) has produced a list of 91 binary parameters which express cross-linguistic variation in a well-defined module of grammar (the nominal domain). These parameters ha...
Conference Paper
Pair trading is a market-neutral strategy based on standard, well-known statistical tests applied to price time series to identify suitable pairs of stocks. This article studies the potential benefits of using additional qualitative information for this type of trade. Here we use an ontology to represent and structure informat...
Article
In recent years, many daily processes, such as internet web searching, e-mail filtering, social media services and e-commerce, have benefited from machine learning (ML) techniques. The implementation of ML techniques has largely focused on black-box methods whose general conclusions are not easily interpretable. Hence, the elaboration with othe...
Chapter
Machine Learning (ML) approaches can achieve impressive results, but many lack transparency or have difficulties handling data of high structural complexity. The class of ML known as Inductive Logic Programming (ILP) draws on the expressivity and rigour of subsets of First Order Logic to represent both data and models. When Description Logics (DL)...
Preprint
We conduct an exhaustive survey of adaptive selection of operators (AOS) in Evolutionary Algorithms (EAs). We simplified the AOS structure by adding more components to the framework to build upon the existing categorisation of AOS methods. In addition to simplifying, we looked at the commonality among AOS methods from literature to generalise them....
Article
Argument generation aims to automatically produce additional information, given a piece of text as input, that helps in decision making. These arguments can reason about the available information, including conflicting or uncertain information, and about newly obtained information to construct beliefs or goals that assert the agreement of a clai...
Article
In recent years, many daily processes, such as internet web searching, e-mail filtering, social media services and e-commerce, have benefited from Machine Learning (ML) techniques. The implementation of ML techniques has largely focused on black-box methods whose general conclusions are not easily interpretable. Hence, the elaboration with othe...
Book
This book constitutes the refereed conference proceedings of the 29th International Conference on Inductive Logic Programming, ILP 2019, held in Plovdiv, Bulgaria, in September 2019. The 11 papers presented were carefully reviewed and selected from numerous submissions. Inductive Logic Programming (ILP) is a subfield of machine learning, which orig...
Conference Paper
This paper describes a method for creating multilingual dictionaries using Wikipedia as a resource. A lucky strike on the road to multilingual information retrieval, the main idea is simple: taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages produces a multilingual dictionary...
Conference Paper
In this paper, we propose a recommender system using pairwise comparisons as the main source of information in the user preference elicitation process. We use a logic-based approach implemented in APARELL, an inductive learner modelling the user's preferences in description logic. A within-subject preliminary user study with a large dataset fro...
Conference Paper
Simulated data generated from an accurate modelling tool can demonstrate real-life events. This approach mimics pipeline operation without the need for maintaining the original anomaly record that is already scarce in the pipeline industry. This synthetic data carries precise signatures and the shape of the curve depicts the type of alarm. Learni...
Article
Machine Learning (ML) approaches can achieve impressive results, but many lack transparency or have difficulties handling data of high structural complexity. The class of ML known as Inductive Logic Programming (ILP) draws on the expressivity and rigour of subsets of First Order Logic to represent both data and models. When Description Logics (DL)...
Conference Paper
Adaptive Operator Selection (AOS) is an approach that controls discrete parameters of an Evolutionary Algorithm (EA) during the run. In this paper, we propose an AOS method based on Double Deep Q-Learning (DDQN), a Deep Reinforcement Learning method, to control the mutation strategies of Differential Evolution (DE). The application of DDQN to DE re...
Preprint
Adaptive Operator Selection (AOS) is an approach that controls discrete parameters of an Evolutionary Algorithm (EA) during the run. In this paper, we propose an AOS method based on Double Deep Q-Learning (DDQN), a Deep Reinforcement Learning method, to control the mutation strategies of Differential Evolution (DE). The application of DDQN to DE re...
Article
Adaptive Operator Selection (AOS) is an approach that controls discrete parameters of an Evolutionary Algorithm (EA) during the run. In this paper, we propose an AOS method based on Double Deep Q-Learning (DDQN), a Deep Reinforcement Learning method, to control the mutation strategies of Differential Evolution (DE). The application of DDQN to DE re...
Conference Paper
This article describes a novel framework for the detection of causal links between financial news and the subsequent movements of the stock market. The approach builds on and substantially improves a previously published in-house design for the detection and measurement of correlation between news and time series in the financial domain, which has...
Chapter
Probability Matching is one of the most successful methods for adaptive operator selection (AOS), that is, online parameter control, in evolutionary algorithms. In this paper, we propose a variant of Probability Matching, called Recursive Probability Matching (RecPM-AOS), that estimates reward based on progress in past generations and estimates qua...
Poster
Demographic events often leave traces in languages and genes: this prompted Darwin’s prediction that the evolutionary tree of human populations would provide the best possible phylogeny of language relationships. We tested Darwin’s expectation through long-distance genome-language comparisons across Eurasia, relying on independently assessed quanti...
Conference Paper
Demographic events often leave traces in languages and genes: this prompted Darwin’s prediction that the evolutionary tree of human populations would provide the best possible phylogeny of language relationships. We tested Darwin’s expectation through long-distance genome-language comparisons across Eurasia, relying on independently assessed quanti...
Conference Paper
Here we describe a Description Logic (DL) based Inductive Logic Programming (ILP) algorithm for learning relations of order. We test our algorithm on the task of learning user preferences from pairwise comparisons. The results have implications for the development of customised recommender systems for e-commerce, and more broadly, wherever DL-based...
Chapter
The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky's descriptive adequacy (Chomsky, 1964), and the complexity of the acquisition task that a large number of parameters would imply,...
Article
ILP learners are commonly implemented to consider sequentially each training example for each of the hypotheses tested. Computing the cover set of a hypothesis in this way is costly, and introduces a major bottleneck in the learning process. This computation can be implemented more efficiently through the use of data level parallelism. Here we prop...
Conference Paper
The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky's (1964) descriptive adequacy, and the complexity of the acquisition task that a large number of parameters would imply, which is...
Article
Here we describe a Description Logic (DL) based Inductive Logic Programming (ILP) algorithm for learning relations of order. We test our algorithm on the task of learning user preferences from pairwise comparisons. The results have implications for the development of customised recommender systems for e-commerce, and more broadly, wherever DL-based...
Article
The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky’s (1964) descriptive adequacy, and the complexity of the acquisition task that a large number of parameters would imply, which is...
Conference Paper
It is a truth universally acknowledged that e-commerce platform users in search of an item that best suits their preferences may be offered a lot of choices. An item may be characterised by many attributes, which can complicate the process. Here the classic approach in decision support systems – to put weights on the importance of each attribute –...
Conference Paper
Hybridisation of algorithms in evolutionary computation (EC) has been used by researchers to overcome drawbacks of population-based algorithms. The introduced algorithm, called the mutated Artificial Bee Colony algorithm, is a novel variant of the standard Artificial Bee Colony (ABC) algorithm that successfully moves out of local optima. First, new paramet...
Conference Paper
Multi-agent systems can play a critical role in Islamic banking development by creating new financial products or reforming existing ones to improve final output, and by solving a number of legal issues related to Islamic Sharia law. This paper introduces novel multi-agent protocols for a commodity-trading platform to be used for Islamic banking. This work fo...
Conference Paper
Financial news and stocks appear linked to the point where the use of online news to forecast the markets has become a major selling point for some traders. The correlation between news content and stock returns is clearly of interest, but has been mostly centred on news meta-data, such as volume and popularity. We address this question here by mea...
Conference Paper
This article describes a novel dataset aiming to provide insight into the relationship between stock market prices and news on social media, such as Twitter. While several financial companies advertise that they use Twitter data in their decision process, it has been hard to demonstrate whether online postings can genuinely affect market prices. By f...
Conference Paper
This article describes an empirical approach to the macroeconomic modelling of the Euro zone. Data for the period 1971–2007 has been used to learn systems of ordinary differential equations (ODE) linking inflation, real interest and output growth. The equation discovery algorithm LAGRAMGE was used in conjunction with a grammar defining a potentiall...
Conference Paper
We present results from simulations studying the hypothesis that mechanisms for landmark-based navigation could have served as preadaptations for compositional language. It is argued that sharing directions would significantly have helped bridge the gap between general and language-specific cognitive faculties. The experiments in this study are bui...
Article
In time series analysis research, there is a strong interest in discrete representations of real valued data streams. One approach still considered state-of-the-art is the Symbolic Aggregate Approximation (SAX) algorithm. The interest of this paper concerns the SAX assumption of data being highly Gaussian and the use of the standard normal curve to...
Conference Paper
This paper presents a method of lexical semantic disambiguation in multilingual corpora and describes the construction of an artificial word-aligned and lexically disambiguated gold-standard corpus from an existing multilingual resource. The suggested approach uses sets of aligned words and phrases across languages as unique semantic tags similar t...
Conference Paper
This paper presents a method of lexical semantic disambiguation in multilingual corpora and describes the construction of an artificial word-aligned and lexically disambiguated gold-standard corpus from an existing multilingual resource. The suggested approach uses sets of aligned words and phrases across languages as unique semantic tags similar t...
Article
This article presents results from simulations studying the hypothesis that mechanisms for landmark-based navigation could have served as preadaptations for compositional language. It is argued that sharing directions would significantly have helped bridge the gap between general and language-specific cognitive faculties. A number of different leve...
Article
This paper presents a method of lexical semantic disambiguation in multilingual corpora and describes the construction of an artificial word-aligned and lexically disambiguated gold-standard corpus from an existing multilingual resource. The suggested approach uses sets of aligned words and phrases across languages as unique semantic tags similar t...
Article
In time series analysis research there is a strong interest in discrete representations of real valued data streams. One approach that emerged over a decade ago and is still considered state-of-the-art is the Symbolic Aggregate Approximation algorithm. This discretization algorithm was the first symbolic approach that mapped a real-valued time seri...
Article
This work discusses the challenge of developing self-cognisant artificial intelligence systems, looking at the possible benefits and the main issues in this quest. It is argued that the degree of complexity, variation, and specialisation of technological artefacts used nowadays, along with their sheer number, represent an issue that can and should...
Conference Paper
This paper introduces a novel forecasting algorithm that is a blend of micro and macro modelling perspectives when using Artificial Intelligence (AI) techniques. The micro component concerns the fine-tuning of technical indicators with population based optimization algorithms. This entails learning a set of parameters that optimize some economicall...
Conference Paper
This study analyzes two implications of the Adaptive Market Hypothesis: variable efficiency and cyclical profitability. These implications are, inter alia, in conflict with the Efficient Market Hypothesis. Variable efficiency has been a popular topic amongst econometric researchers, where a variety of studies have shown that variable efficiency doe...
Chapter
Multi-Agent Systems can play a critical role in Islamic banking development by creating new financial products or reforming existing ones in order to promote profit, reduce risk and solve a number of legal issues related to Islamic Sharia law. This paper introduces a novel multi-agent platform for commodity trading in support of Islamic banking. The artic...
Conference Paper
Current approaches to instruction cache analysis for determining worst-case execution time rely on building a mathematical model of the cache that tracks its contents at all points in the program. This requires perfect knowledge of the functional behaviour of the cache and may result in extreme complexity and pessimism if many alternative paths thr...
Article
This paper describes a machine-learning-based equation discovery approach that uses LAGRAMGE as its equation discovery tool, with two sources of input: a dataset and a model expressed as a context-free grammar. The approach searches a large range of potential equations using a specific model. The parameters of the equation are fitted to find the be...
Article
This study investigates the characteristic of non-stationarity in a financial time-series and its effect on the learning process for Artificial Neural Networks (ANN). It is motivated by previous work where it was shown that non-stationarity is not static within a financial time series but quite variable in nature. Initially unit-root tests were p...
Article
This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual 'synsets'. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the results can be evaluated....
Article
Determination of accurate estimates for the Worst-Case Execution Time of a program is essential for guaranteeing the correct temporal behavior of any Real-Time System. Of particular importance is tightly bounding the number of iterations of loops in the program or excessive undue pessimism can result. This paper presents a novel approach to determi...
Conference Paper
The use of technical indicators to derive stock trading signals is a foundation of financial technical analysis. Many of these indicators have several parameters, which creates a difficult optimization problem given the highly non-linear and non-stationary nature of a financial time-series. This study investigates a popular financial indicator, Boll...
Conference Paper
This study analyzes the effectiveness of an Artificial Immune System (AIS) to model and predict the movements of the stock market. To aid in this research the AIS models are compared with a k-Nearest Neighbors (kNN) algorithm, an artificial neural network (ANN) and a benchmark market portfolio to compare simulated trading results. The analysis show...
Article
This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual 'synsets'. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach...
Conference Paper
It has been suggested that human language emerged as either a new, critical faculty to handle recursion, which linked two other existing systems in the brain, or as an exaptation of an existing mechanism, which had been used for a different purpose to that point. Of these two theories, the latter appears more parsimonious, but, somewhat surprisingl...
Article
This paper outlines an approach to the unsupervised construction from unannotated parallel corpora of a lexical semantic resource akin to WordNet. The paper also describes how this resource can be used to add lexical semantic tags to the text corpus at hand. Finally, we discuss the possibility to add some of the predicates typical for WordNet t...
Conference Paper
Static analysis can be used to determine safe estimates of Worst Case Execution Time. However, overestimation of the number of loop iterations, particularly in nested loops, can result in substantial pessimism in the overall estimate. This paper presents a method of determining exact parametric values of the number of loop iterations for a particul...
Conference Paper
This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of an exponential increase in the size of multilingual information on the web...
Conference Paper
This article describes a machine learning based approach applied to acquiring empirical forecasting models. The approach makes use of the LAGRAMGE equation discovery tool to define a potentially very wide range of equations to be considered for the model. Importantly, the equations can vary in the number of terms and types of functors linking the v...
Conference Paper
This paper outlines an approach to the unsupervised construction from unannotated parallel corpora of a lexical semantic resource akin to WordNet. The paper also describes how this resource can be used to add lexical semantic tags to the text corpus at hand. Finally, we discuss the possibility to add some of the predicates typical for WordNet to i...
Conference Paper
The problem of determining the Worst-Case Execution Time (WCET) of a piece of code is a fundamental one in the Real-Time Systems community. Existing methods either try to gain this information by analysis of the program code or by running extensive timing analyses. This paper presents a new approach to the problem based on using Machine Learning i...
Conference Paper
Most software engineering methods require some form of model populated with appropriate information. Real-time systems are no exception. A significant issue is that the information needed is not always freely available, and deriving it using manual methods is costly in terms of time and money. Previous work showed how machine learning information der...
Conference Paper
In our previous work we introduced a hybrid, GA&ILP-based approach for learning of stem-suffix segmentation rules from an unmarked list of words. Evaluation of the method was made difficult by the lack of word corpora annotated with their morphological segmentation. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. A...
Chapter
In order to be truly autonomous, agents need the ability to learn from and adapt to the environment and other agents. This chapter introduces key concepts of machine learning and how they apply to agent and multi-agent systems. Rather than present a comprehensive survey, we discuss a number of issues that we believe are important in the design of l...
Conference Paper
Clustering multiple-instances in a multi-relational environment requires data transformations (e.g. data aggregation) from datasets stored in multiple tables into a single table. Unfortunately, most relational databases are limited to a few basic methods of aggregation (e.g. max, min, sum, count, ave) to aggregate continuous and categorical values....
Conference Paper
Handling numerical data stored in a relational database is different from handling numerical data stored in a single table, due to the multiple occurrences of an individual record in the non-target table and non-determinate relations between tables. Most traditional data mining methods only deal with a single table and discretize columns that...
Article
Nature-inspired algorithms such as genetic algorithms, particle swarm optimisation and ant colony algorithms have successfully solved computer science problems of search and optimisation. The initial implementations of these techniques focused on static ...
Conference Paper
In solving the classification problem in relational data mining, traditional methods, for example C4.5 and its variants, usually require data transformations from datasets stored in multiple tables into a single table. Unfortunately, we may lose some information when we join tables with a high degree of one-to-many association. Therefore, data...
Conference Paper
Clustering is an essential data mining task with various types of applications. Traditional clustering algorithms are based on a vector space model representation. A relational database system often contains multi-relational information spread across multiple relations (tables). In order to cluster such data, one would need to restrict the anal...
