Leandro L. Minku's research while affiliated with University of Birmingham and other places

Publications (158)

Article
Full-text available
The capacitated arc routing problem (CARP) is a challenging combinatorial optimisation problem abstracted from many real-world applications, such as waste collection, road gritting and mail delivery. However, few studies considered dynamic changes during the vehicles’ service, which can cause the original schedule infeasible or obsolete. The few ex...
Article
Time Delay Reservoir (TDR) is a hardware-friendly machine learning approach from two perspectives. First, it can prevent the connection overhead of neural networks with increasing neurons. Second, through its dynamic system representation, TDR can also be implemented in hardware by different systems. However, it performs poorly on tasks that involv...
Article
Design-time evaluation is essential to build the initial software architecture to be deployed. However, experts’ assumptions made at design-time are unlikely to remain true indefinitely in systems that are characterized by scale, hyperconnectivity, dynamism, and uncertainty in operations (e.g. IoT). Therefore, experts’ design-time decisions can be...
Article
Full-text available
When applying evolutionary algorithms to circuit design automation, circuit representation is the first consideration. There have been several studies applying different circuit representations. However, they still have some problems, such as lack of design ability, which means the diversity of evolved circuits was limited by the circuit representa...
Article
Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems cer- tain algorithms perform well. Most benchmarks are performance based, to test algorithm performance under a wide set of con- ditions. There is also resource- and behavior-based benchmarks to test the resource consumption and the behavior...
Article
Full-text available
In automotive digital development, 3D prototype creation is a team effort of designers and engineers, each contributing with ideas and technical evaluations through means of computer simulations. To support the team in the 3D design ideation and exploration task, we propose an interactive design system for assisted design explorations and faster pe...
Article
Full-text available
Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and co...
Preprint
Full-text available
Intelligent techniques are urged to achieve automatic allocation of the computing resource in Open Radio Access Network (O-RAN), to save computing resource, increase utilization rate of them and decrease the delay. However, the existing problem formulation to solve this resource allocation problem is unsuitable as it defines the capacity utility of...
Article
Throughout its development period, a software project experiences different phases, comprises modules with different complexities and is touched by many different developers. Hence, it is natural that problems such as Just-in-Time Software Defect Prediction (JIT-SDP) are affected by changes in the defect generating process (concept drifts), potenti...
Article
Just-In-Time Software Defect Prediction (JIT-SDP) uses machine learning to predict whether software changes are defect-inducing or clean. When adopting JIT-SDP, changes in the underlying defect generating process may significantly affect the predictive performance of JIT-SDP models over time. Therefore, being able to continuously track the predicti...
Article
Full-text available
Cross-Project (CP) Just-In-Time Software Defect Prediction (JIT-SDP) makes use of CP data to overcome the lack of data necessary to train well performing JIT-SDP classifiers at the beginning of software projects. However, such approaches have never been investigated in realistic online learning scenarios, where Within-Project (WP) software changes...
Article
Learning from data streams that emerge from nonstationary environments has many real-world applications and poses various challenges. A key characteristic of such a task is the varying nature of target functions and data distributions over time (concept drifts). Most existing work relies solely on labeled data to adapt to concept drifts in classifi...
Preprint
Full-text available
Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems certain algorithms perform well. Most benchmarks are performance-based, to test algorithm performance under a wide set of conditions. There are also resource- and behaviour-based benchmarks to test the resource consumption and the behaviour...
Preprint
Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems certain algorithms perform well. Most benchmarks are performance-based, to test algorithm performance under a wide set of conditions. There are also resource- and behaviour-based benchmarks to test the resource consumption and the behaviour...
Article
Real-world applications have been dealing with large amounts of data that arrive over time and generally present changes in their underlying joint probability distribution, i.e., concept drift. Concept drift can be subdivided into two types: virtual drift, which affects the unconditional probability distribution $p(\boldsymbol{x})$ , and real dri...
Article
Full-text available
Context: Evaluating software architectures in uncertain environments raises new challenges, which require continuous approaches. We define continuous evaluation as multiple evaluations of the software architecture that begins at the early stages of the development and is periodically and repeatedly performed throughout the lifetime of the software...
Article
Microservices have gained acceptance in software industries as an emerging architectural style for autonomic, scalable, and more reliable computing. Among the critical microservice architecture design decisions is when to adapt the granularity of a microservice architecture by merging/decomposing microservices. No existing work investigates the fol...
Preprint
The capacitated arc routing problem (CARP) is a challenging combinatorial optimisation problem abstracted from typical real-world applications, like waste collection and mail delivery. However, few studies considered dynamic changes during the vehicles' service, which can make the original schedule infeasible or obsolete. The few existing studies a...
Article
Full-text available
Class imbalance introduces additional challenges when learning classifiers from concept drifting data streams. Most existing work focuses on designing new algorithms for dealing with the global imbalance ratio and does not consider other data complexities. Independent research on static imbalanced data has highlighted the influential role of local...
Article
Architecture-based software performance optimisation can help to find potential performance problems and mitigate their negative effects at an early stage. To automate this optimisation process, rule-based and metaheuristic-based performance optimisation methods have been proposed. However, existing rule-based methods explore a limited search space...
Article
Full-text available
Surrogate-assisted evolutionary algorithms (SAEAs), which use efficient surrogate models or meta-models to approximate the fitness function in evolutionary algorithms (EAs), are effective and popular methods for solving computationally expensive optimization problems. During the past decades, a number of SAEAs have been proposed by combining differ...
Preprint
Real-world applications have been dealing with large amounts of data that arrive over time and generally present changes in their underlying joint probability distribution, i.e., concept drift. Concept drift can be subdivided into two types: virtual drift, which affects the unconditional probability distribution p(x), and real drift, which affects...
Article
The International Workshop on Realizing Arti cial Intelligence Synergies in Software Engineering (RAISE) aims to present the state of the art in the crossover between Software Engineering and Arti cial Intelligence. This workshop explored not only the appli- cation of AI techniques to SE problems but also the application of SE techniques to AI prob...
Article
Data stream applications usually suffer from multiple types of concept drift. However, most existing approaches are only able to handle a subset of types of drift well, hindering predictive performance. We propose to use diversity as a framework to handle multiple types of drift. The motivation is that a diverse ensemble can not only contain models...
Preprint
Full-text available
Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and co...
Chapter
The Capacitated Arc Routing Problem (CARP) is an abstraction for typical real world applications, like waste collection, winter gritting and mail delivery, to allow the development of efficient optimization algorithms. Most research work focuses on the static CARP, where all information in the problem remains unchanged over time. However, in the re...
Article
Objective: Acute exacerbations contribute significantly to the morbidity of asthma. Recent studies have shown that early detection and treatment of asthma exacerbations leads to improved outcomes. We aimed to develop a machine learning algorithm to detect severe asthma exacerbations using easily available daily monitoring data. Methods: We analysed...
Article
This issue's practitioners' Digest reports on papers from the 2018 International Conference on Mining Sof tware Repositories, the 2019 International Conference on Requirements Engineering, and the 2019 European Conference on Software Process Improvement. Feedback or suggestions are welcome. In addition, if you try or adopt any of the practices incl...
Conference Paper
Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software changes are defect-inducing or clean based on machine learning classifiers. Building such classifiers requires a sufficient amount of training data that is not available at the beginning of a software project. Cross-Project (CP) JIT-SDP can overcome this...
Article
Full-text available
This paper claims that a new field of empirical software engineering research and practice is emerging: data mining using/used-by optimizers for empirical studies, or DUO. For example, data miners can generate models that are explored by optimizers. Also, optimizers can advise how to best adjust the control parameters of a data miner. This combined...
Article
Full-text available
Software effort estimation is an online supervised learning problem, where new training projects may become available over time. In this scenario, the Cross-Company (CC) approach Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving their collection cost. However, Dycom requires CC projects to be split...
Article
This issue's "Practitioners' Digest" department reports on the 2019 International Conference on Software Engineering (ICSE). We focus on two emerging themes: security and energy issues for mobile apps. Feedback or suggestions are welcome. In addition, if you try or adopt any of the practices included in this article, please send me and the author(s...
Article
Learning in non-stationary environments is a challenging task which requires the updating of predictive models to deal with changes in the underlying probability distribution of the problem, i.e., dealing with concept drift. Most work in this area is concerned with updating the learning system so that it can quickly recover from concept drift, whil...
Article
Software effort estimation (SEE) usually suffers from inherent uncertainty arising from predictive model limitations and data noise. Relying on point estimation only may ignore the uncertain factors and lead project managers (PMs) to wrong decision making. Prediction intervals (PIs) with confidence levels (CLs) present a more reasonable representat...
Preprint
In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. As enough data representing the new concept must be collected for the new concept to be well learnt, the predictive performance of existing models usually takes some time to recover from concept drift. To speed up recovery from concept dr...
Chapter
The fields of transfer learning and learning in non-stationary environments are closely related. Both look into the problem of training and test data that come from different probability distributions. However, these two fields have evolved separately. Transfer learning enables knowledge to be transferred between different domains or tasks in order...
Preprint
Full-text available
This paper claims that a new field of empirical software engineering research and practice is emerging: data mining using/used-by optimizers for empirical studies, or DUO. For example, data miners can generate the models that are explored by optimizers.Also, optimizers can advise how to best adjust the control parameters of a data miner. This combi...
Conference Paper
Full-text available
Software effort estimation (SEE) usually suffers from data scarcity problem due to the expensive or long process of data collection. As a result, companies usually have limited projects for effort estimation, causing unsatisfactory prediction performance. Few studies have investigated strategies to generate additional SEE data to aid such learning....
Conference Paper
Full-text available
Recent research has shown significant success in the detection of social bots. While there are tools to distinguish automated bots from regular user accounts, information about their strategies, biases and influence on their target audience remains harder to obtain. To uncover such details, e.g., to understand the role of bots in political campaign...
Conference Paper
Several data stream applications involve recurring concepts, i.e., concept drifts that change the underlying distribution of the data to a distribution previously seen in the data stream. Examples include electricity price prediction and tweet topic classification. In such scenario, it is useful to maintain a pool of old models that could be recove...
Conference Paper
Full-text available
This paper introduces Data-Driven Search-based Software Engineering (DSE), which combines insights from Mining Software Repositories (MSR) and Search-based Software Engineering (SBSE). While MSR formulates software engineering problems as data mining problems, SBSE reformulate Software Engineering (SE) problems as optimization problems and use meta...
Article
This issue’s column reports on presentations at the 11th International Symposium on Empirical Software Engineering and Measurement, 13th International Conference on Predictive Models and Data Analytics in Software Engineering, and 21st International Systems and Software Product Line Conference.
Article
The prediction of defects in a target project based on data from external projects is called Cross-Project Defect Prediction (CPDP). Several methods have been proposed to improve the predictive performance of CPDP models. However, there is a lack of comparison among state-of-the-art methods. Moreover, previous work has shown that the most suitable...
Article
Full-text available
This paper introduces Data-Driven Search-based Software Engineering (DSE), which combines insights from Mining Software Repositories (MSR) and Search-based Software Engineering (SBSE). While MSR formulates software engineering problems as data mining problems, SBSE reformulates SE problems as optimization problems and use meta-heuristic algorithms...
Conference Paper
Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost o...
Article
Software project scheduling is the problem of allocating employees to tasks in a software project. Due to the large scale of current software projects, many studies have investigated the use of optimization algorithms to find good software project schedules. However, despite the importance of human factors to the success of software projects, exist...
Article
In many applications of information systems learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. F...
Article
Full-text available
Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from othe...
Article
With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is minority compared to other categories. It can cause learning bias towards the majority class a...
Conference Paper
Context The research literature on software development projects usually assumes that effort is a good proxy for cost. Practice, however, suggests that there are circumstances in which costs and effort should be distinguished. Objectives: We determine similarities and differences between size, effort, cost, duration, and number of defects of softwa...
Article
Full-text available
As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distributions, where concept drift may occur. It has recently received increased research attention; however, very little work addresses the combined problem wher...
Article
This issue's column reports on papers from the 24th International Requirements Engineering Conference, 38th International Conference on Software Engineering, and the 10th International Symposium on Empirical Software Engineering and Measurement. Topics include performance and security requirements, injecting human values into software engineering,...
Conference Paper
We use real options theory to evaluate the options of diversity in design by looking at the trade-offs between the cost and long-term value of different architectural strategies under uncertainty, given a set of scenarios of interest. As part of our approach, we extend one of the widely used architecture trade-offs analysis methods (Cost-Benefit An...
Article
Full-text available
Self-adaptive mechanisms for the identification of the most suitable variation operator in Evolutionary Algorithms rely almost exclusively on the measurement of the fitness of the offspring, which may not be sufficient to assess the optimality of an operator (e.g., in a landscape with an high degree of neutrality). This paper proposes a novel Adapt...
Conference Paper
Full-text available
Software project scheduling plays an important role in reducing the cost and duration of software projects. It is an NP-hard combinatorial optimization problem that has been addressed based on single and multi-objective algorithms. However, such algorithms have always used fixed genetic operators, and it is unclear which operators would be more app...
Conference Paper
Background: the terms Within-Company (WC) and Cross-Company (CC) in Software Effort Estimation (SEE) have the connotation that CC projects are considerably different from WC projects, and that WC projects are more similar to the projects being estimated. However, as WC projects can themselves be heterogeneous, this is not always the case. Therefore...
Chapter
As explained in Chapter 5, self-aware and self-expressive systems can be designed based on a number of patterns and primitives. In this chapter, we discuss issues to be considered when developing such systems, especially when going through phases 3 (selecting the best pattern) and 5 (determining primitives and alternatives), and possibly also phase...
Chapter
Chapter 5 has provided step-by-step guidelines on how to design selfaware and self-expressive systems, including several architectural patterns with different levels of self-awareness. Chapter 6 has explained important features in self-aware and self-expressive systems, including adaptivity, robustness, multiobjectivity and decentralisation. To all...
Article
Class evolution, the phenomenon of class emergence and disappearance, is an important research topic for data stream mining. All previous studies implicitly regard class evolution as a transient change, which is not true for many real-world problems. This paper concerns the scenario where classes emerge or disappear gradually. A class-based ensembl...
Article
Full-text available
This issue's column reports on papers from the 19th International Conference on Software Product Lines, the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, the 23rd IEEE International Requirements Engineering Conference, and the 9th European Conference on S...
Article
Full-text available
The field of data mining for software engineering has been growing over the last decade. This field is concerned with the use of data mining to provide useful insights into how to improve software engineering processes and software itself, supporting decision-making. For that, data produced by software engineering processes and products during and...