Michail Basios’s research while affiliated with University College London and other places


Publications (17)


Fig. 1: Overview of evoML software architecture
evoML Yellow Paper: Evolutionary AI and Optimisation Studio
  • Preprint
  • File available

December 2022 · 38 Reads · Leslie Kanthan · Michail Basios · [...] · Vardan Voskanyan

Machine learning model development and optimisation can be a cumbersome and resource-intensive process. Custom models are often more difficult to build and deploy, and they require infrastructure and expertise that are costly to acquire and maintain. The machine learning product development lifecycle must therefore account for the difficulties of developing and deploying machine learning models. evoML is an AI-powered tool that provides automated functionalities for machine learning model development, optimisation, and model code optimisation. Core functionalities of evoML include data cleaning, exploratory analysis, feature analysis and generation, model optimisation, model evaluation, model code optimisation, and model deployment. A key feature of evoML is that it embeds code and model optimisation into the model development process and includes multi-objective optimisation capabilities.
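
The abstract does not expose evoML's actual interface, so the following is only a minimal, generic sketch of the multi-objective idea it mentions: trading off validation accuracy against prediction latency over a pool of scikit-learn models and keeping the Pareto-optimal candidates. The candidate pool and the two objectives are illustrative assumptions, not evoML's API.

```python
# Illustrative sketch only: evoML's real API is not shown in the abstract above.
# A generic multi-objective selection over two objectives (validation accuracy
# vs. prediction latency), keeping the Pareto-optimal candidates.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical candidate pool; an AutoML studio would generate these automatically.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf_small": RandomForestClassifier(n_estimators=50, random_state=0),
    "rf_large": RandomForestClassifier(n_estimators=500, random_state=0),
}

results = []
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    start = time.perf_counter()
    acc = model.score(X_val, y_val)
    latency = time.perf_counter() - start          # proxy for inference cost
    results.append((name, acc, latency))

def pareto_front(points):
    """Keep candidates not dominated on (higher accuracy, lower latency)."""
    front = []
    for name, acc, lat in points:
        dominated = any(a >= acc and l <= lat and (a, l) != (acc, lat)
                        for _, a, l in points)
        if not dominated:
            front.append((name, acc, lat))
    return front

for name, acc, lat in pareto_front(results):
    print(f"{name}: accuracy={acc:.3f}, latency={lat * 1e3:.1f} ms")
```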


Cryptocurrency trading: a comprehensive survey

December 2022 · 27,064 Reads · 309 Citations

Financial Innovation

In recent years, the tendency of financial institutions to include cryptocurrencies in their portfolios has accelerated. Cryptocurrencies are the first pure digital assets to be included by asset managers. Although they share some commonalities with more traditional assets, they have a distinct nature of their own and their behaviour as an asset is still in the process of being understood. It is therefore important to summarise existing research papers and results on cryptocurrency trading, including available trading platforms, trading signals, trading strategy research and risk management. This paper provides a comprehensive survey of cryptocurrency trading research, covering 146 research papers on various aspects of cryptocurrency trading (e.g., cryptocurrency trading systems, bubbles and extreme conditions, prediction of volatility and return, crypto-asset portfolio construction, technical trading and others). It also analyses datasets, research trends and the distribution of research across objects (contents/properties) and technologies, concluding with some promising opportunities that remain open in cryptocurrency trading.


Fig. 1: Execution time as the Lines of Code increases for the open source benchmarks. The execution time of SourcererCC falls mostly in the range of 0.087 to 192 seconds, with an outlier of 2,183 seconds.
Fig. 2: Execution time as the Number of Methods increases for the open source benchmarks.
Clone Detection on Large Scala Codebases

April 2022 · 91 Reads

Code clones are identical or similar code segments. The widespread presence of code clones can increase the cost of maintenance and jeopardise the quality of software. The research community has developed many techniques to detect code clones; however, there is little evidence of how these techniques perform in industrial use cases. In this paper, we aim to uncover the differences when such techniques are applied in industrial use cases. We conducted large-scale experimental research on the performance of two state-of-the-art code clone detection techniques, SourcererCC and AutoenCODE, on both open source projects and an industrial project written in the Scala language. Our results reveal that both algorithms perform differently on the industrial project, with the largest drop in precision being 30.7% and the largest increase in recall being 32.4%. By having samples of the industrial project manually labelled by its developers, we discovered that there are substantially fewer Type-3 clones in that project than in the open source projects.
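
As a minimal illustration of the kind of evaluation behind the reported precision and recall figures (not the study's actual harness), clone-pair precision and recall against a manually labelled ground truth can be computed as follows; the pairs below are invented placeholders.

```python
# Minimal sketch of precision/recall over clone pairs; the pairs here are
# invented placeholders, not data from the study.

def clone_metrics(reported, labelled_true):
    """Precision/recall of a detector's reported clone pairs against a
    manually labelled ground truth (pairs are unordered, hence frozensets)."""
    true_positives = reported & labelled_true
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(labelled_true) if labelled_true else 0.0
    return precision, recall

reported = {frozenset({"A.scala:f", "B.scala:g"}),
            frozenset({"C.scala:h", "D.scala:k"})}
labelled = {frozenset({"A.scala:f", "B.scala:g"}),
            frozenset({"E.scala:m", "F.scala:n"})}

p, r = clone_metrics(reported, labelled)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.50, recall=0.50
```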



Ascertaining price formation in cryptocurrency markets with machine learning

April 2021 · 672 Reads · 61 Citations

European Journal of Finance

The cryptocurrency market is amongst the fastest-growing of all financial markets in the world. Unlike traditional markets such as equities, foreign exchange and commodities, the cryptocurrency market is considered to exhibit greater volatility and illiquidity. This paper is inspired by the recent success of using machine learning for stock market prediction. In this work, we analyse and present the characteristics of the cryptocurrency market in a high-frequency setting. In particular, we apply a machine learning approach to predict the direction of the mid-price change on the upcoming tick. We show that there are universal features amongst cryptocurrencies which lead to models that outperform asset-specific ones. We also show that there is little point in feeding machine learning models with long sequences of data points, as predictions do not improve. Furthermore, we address the technical challenge of designing a lean predictor that performs well on live data downloaded from crypto exchanges; a novel retraining method is defined and adopted towards this end. Finally, the trade-off between model accuracy and frequency of training is analysed in the context of multi-label prediction. Overall, we demonstrate that promising results are possible for cryptocurrencies on live data, achieving a consistent 78% accuracy in predicting mid-price movement on the live Bitcoin/US dollar exchange rate.
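
The paper's exact features, model and retraining scheme are not reproduced here; the sketch below only illustrates the basic setup it describes, labelling the direction of the next tick's mid-price from top-of-book quotes and fitting a simple classifier, using synthetic quote data rather than real exchange feeds.

```python
# Minimal sketch of the basic prediction setup, not the paper's actual model:
# label the direction of the next tick's mid-price and fit a simple classifier
# on a few book-derived features. Quote data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
bid = 100 + np.cumsum(rng.normal(0, 0.01, n))
ask = bid + rng.uniform(0.01, 0.05, n)              # positive spread
bid_size = rng.integers(1, 100, n).astype(float)
ask_size = rng.integers(1, 100, n).astype(float)

mid = (bid + ask) / 2.0
spread = ask - bid
imbalance = (bid_size - ask_size) / (bid_size + ask_size)

# Direction of the next tick's mid-price: 1 = up, 0 = down/flat.
y = (np.diff(mid) > 0).astype(int)
X = np.column_stack([spread, imbalance, np.r_[0.0, np.diff(mid)]])[:-1]

# Keep the time ordering: train on the past, test on the future.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"directional accuracy: {clf.score(X_te, y_te):.3f}")
```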


Genetic Improvement @ ICSE 2020

October 2020 · 55 Reads · 2 Citations

ACM SIGSOFT Software Engineering Notes

Following the keynote by Prof. Mark Harman of Facebook and the formal presentations (which are recorded in the proceedings), there was a wide-ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take-up, human factors, explainability (including justifiability and exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops over the Internet without face-to-face interaction. Finally, we speculate on how the COVID-19 pandemic will affect research next year and into the future.


Genetic Improvement @ ICSE 2020

October 2020 · 14 Reads

Following the keynote by Prof. Mark Harman of Facebook and the formal presentations (which are recorded in the proceedings), there was a wide-ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take-up, human factors, explainability (including justifiability and exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops over the Internet without face-to-face interaction. Finally, we speculate on how the COVID-19 pandemic will affect research next year and into the future.


Better Model Selection with a new Definition of Feature Importance

September 2020 · 68 Reads · 1 Citation

Feature importance aims to measure how crucial each input feature is for model prediction. It is widely used in feature engineering, model selection and explainable artificial intelligence (XAI). In this paper, we propose a new tree-model explanation approach for model selection. Our novel concept leverages the coefficient of variation of a feature weight (measured in terms of the contribution of the feature to the prediction) to capture the dispersion of importance over samples. Extensive experimental results show that our feature explanation performs better than the general cross-validation method for model selection, in terms of both time efficiency and accuracy.
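
The paper defines its measure on tree models; as a simplified illustration of the underlying idea rather than the paper's exact formulation, the coefficient of variation of per-sample feature contributions can be computed for a plain linear model, where a feature's contribution to a prediction is simply its weight times its value.

```python
# Simplified illustration of the idea, not the paper's exact tree-based method:
# measure how dispersed each feature's per-sample contribution is via the
# coefficient of variation (std / |mean|) of its contributions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
X = X + 3.0   # shift features away from zero so mean contributions are non-trivial
model = LinearRegression().fit(X, y)

# Per-sample contribution of feature j to the prediction: coef_j * x_ij.
contributions = X * model.coef_            # shape (n_samples, n_features)

mean_contrib = contributions.mean(axis=0)
std_contrib = contributions.std(axis=0)
cv = std_contrib / (np.abs(mean_contrib) + 1e-12)   # avoid division by zero

for j, (m, c) in enumerate(zip(mean_contrib, cv)):
    print(f"feature {j}: mean contribution={m:+.2f}, coefficient of variation={c:.2f}")
```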


IEO: Intelligent Evolutionary Optimisation for Hyperparameter Tuning

September 2020 · 29 Reads

Hyperparameter optimisation is a crucial step in searching for the optimal machine learning model. The efficiency of finding the optimal hyperparameter settings has been a major concern in recent research, since the optimisation process can be time-consuming, especially when the objective functions are highly expensive to evaluate. In this paper, we introduce an intelligent evolutionary optimisation algorithm which applies machine learning techniques to a traditional evolutionary algorithm to accelerate the overall optimisation process of tuning machine learning models for classification problems. We demonstrate our Intelligent Evolutionary Optimisation (IEO) in a series of controlled experiments, comparing it with traditional evolutionary optimisation for hyperparameter tuning. The empirical study shows that our approach accelerates the optimisation speed by 30.40% on average and up to 77.06% in the best scenarios.
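
IEO's exact algorithm is not reproduced here; the sketch below only conveys the general surrogate-assisted idea, an evolutionary loop in which a cheap regressor screens offspring so that only promising hyperparameter settings pay for full cross-validation. The search space (a random forest's hyperparameters) and all numeric choices are illustrative assumptions.

```python
# Minimal sketch of surrogate-assisted evolutionary hyperparameter tuning,
# in the spirit of (but not identical to) the IEO approach described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def sample():                       # a random hyperparameter setting
    return np.array([rng.integers(10, 300),     # n_estimators
                     rng.integers(2, 20),       # max_depth
                     rng.integers(2, 10)])      # min_samples_split

def evaluate(h):                    # the expensive objective: CV accuracy
    model = RandomForestClassifier(n_estimators=int(h[0]), max_depth=int(h[1]),
                                   min_samples_split=int(h[2]), random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

def mutate(h):
    child = h + rng.normal(0, [30, 2, 1]).astype(int)
    return np.clip(child, [10, 2, 2], [300, 20, 10])

# Initial population, fully evaluated with the expensive objective.
population = [sample() for _ in range(8)]
scores = [evaluate(h) for h in population]

for generation in range(5):
    # Cheap surrogate fitted on everything evaluated so far.
    surrogate = GradientBoostingRegressor().fit(np.array(population), scores)
    # Generate many offspring but only truly evaluate the ones the surrogate
    # predicts to be promising -- this screening is the source of the speed-up.
    offspring = [mutate(population[rng.integers(len(population))]) for _ in range(20)]
    predicted = surrogate.predict(np.array(offspring))
    promising = [offspring[i] for i in np.argsort(predicted)[-4:]]
    for h in promising:
        population.append(h)
        scores.append(evaluate(h))

best = int(np.argmax(scores))
print("best hyperparameters:", population[best], "CV accuracy:", round(scores[best], 3))
```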


Figure 1: Surge in downloads via the GP bibliography during the GI workshop (14:00-17:21 BST). (Each day is divided into four 6-hour periods.)
Genetic Improvement @ ICSE 2020

July 2020 · 61 Reads

Following the keynote by Prof. Mark Harman of Facebook and the formal presentations (which are recorded in the proceedings), there was a wide-ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take-up, human factors, explainability (including justifiability and exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops over the Internet without face-to-face interaction. Finally, we speculate on how the COVID-19 pandemic will affect research next year and into the future.


Citations (8)


... ML-based code optimizations have also been adopted by the open-source community [22,37,127] and industry. For example, Artemis++ [43] employs mutation algorithms to generate optimized C++ code, improving runtime, CPU, and memory usage. ...

Reference:

Language Models for Code Optimization: Survey, Challenges and Future Directions
Genetic Optimisation of C++ Applications
  • Citing Conference Paper
  • November 2021

... Blockchain addresses banking-related issues, such as cost reduction, advantages, risks, future trends (Osmani et al. 2021), supply chain finance (Gong et al. 2022a, b), opportunities and policy implications (O'Dair and Owen 2019), and challenges and responses (Fang et al. 2022). Studies concentrate on the challenges and limitations of applying blockchain technology to banking and financial systems (Gan et al. 2021; Jaiswal et al. 2022). ...

Cryptocurrency trading: a comprehensive survey

Financial Innovation

... Their approach achieves 10.47% higher accuracy than standard GRU models, demonstrating the value of optimization in managing volatile financial markets and supporting practical risk management strategies. Fang et al. [28] develop an LSTM framework with adaptive retraining, achieving 78% accuracy by processing high-frequency order book data. Their results illustrate the importance of continuous learning and real-time adjustment to sustain model performance in dynamic trading environments. ...

Ascertaining price formation in cryptocurrency markets with machine learning

European Journal of Finance

... Feature Importance corresponds to a range of strategies designed to assign a quantifiable importance score to each predictor variable included within a predictive model. These particular scores serve as a measure of the extent to which each predictor contributes to the model's predictive power [50,51]. A score of greater magnitude suggests a pronounced influence of the specific predictor on the model's outcome predictions. ...

Better Model Selection with a new Definition of Feature Importance
  • Citing Preprint
  • September 2020

... The lowest precision and recall values were achieved for Type-4 clone detection: 94.5% and 87.6%, respectively. [The citing text's table of worst-case time and space complexities for clone detection approaches (CloneDR, CP-Miner, DECKARD, LSH variants, SourcererCC, AutoenCODE, and others) is flattened in the excerpt and omitted here.] ...

Clone Detection on Large Scala Codebases
  • Citing Conference Paper
  • February 2020

... For example, note that all dynamic approaches support representation switching while most static approaches do not. De Wael et al. [13] and Xu [41] both allow the user to specify the data and switching criteria with varying levels of compiler support. [The citing text's Table 2, an overview of related work (including Artemis [4,5]) highlighting fulfilment of the non-quantitative challenges from its Section 1, is flattened in the excerpt and omitted here; a challenge is marked if a work mentions it explicitly or strongly implies support for it.] ...

Darwinian Data Structure Selection
  • Citing Conference Paper
  • October 2018

... GI makes changes to source code and thus can be applied to a wide range of software types. GI has been used to improve many different properties of software, including runtime (Langdon et al. 2015; Petke et al. 2013), memory (Basios et al. 2017; Wu et al. 2015), and energy consumption (Bruce et al. 2015; Burles et al. 2015). ...

Optimising Darwinian Data Structures on Google Guava

Lecture Notes in Computer Science

... While the study showed that developers could benefit from using more specialized data structures for better performance, developers only rarely go beyond the general-purpose collection types. As such, several tools have been proposed to better guide developers in selecting their data structures in Java for better time and memory allocation [2,5], and better energy consumption [19]. This finding that developers only rarely tune their collections has a similar parallel to our set of findings. ...

Darwinian Data Structure Selection