About
195
Publications
39,395
Reads
1,521
Citations
Introduction
Tome Eftimov currently works as a Researcher at the Computer Systems Department of the Jozef Stefan Institute. He did his postdoc at the Department of Biomedical Data Science and the Center for Population Health Sciences, Stanford University. Tome does research in Statistics, Stochastic Optimization Algorithms, Natural Language Processing and Machine Learning.
Education
October 2011 - June 2013
Faculty of computer science and engineering
Field of study
- Computer Science and Computer Engineering (Computer networks and E-technologies)
October 2007 - June 2011
Faculty of electrical engineering and information technologies
Field of study
- Electrical engineering in the field of informatics and computer science
Publications (195)
In this paper a novel approach for making a statistical comparison of meta-heuristic stochastic optimization algorithms over multiple single-objective problems is introduced, where a new ranking scheme is proposed to obtain data for multiple problems. The main contribution of this approach is that the ranking scheme is based on the whole distributi...
The accelerating growth of big data in the biomedical domain, with an endless amount of electronic health records and more than 30 million citations and abstracts in PubMed, introduces the need for automatic structuring of textual biomedical data. In this paper, we develop a method for detecting relations between food and disease entities from raw...
Background
Recently, food science has been garnering a lot of attention. There are many open research questions on food interactions, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and mac...
Performance complementarity of solvers available to tackle black-box optimization problems gives rise to the important task of algorithm selection (AS). Automated AS approaches can help replace tedious and labor-intensive manual selection, and have already shown promising performance in various optimization domains. Automated AS relies on machine l...
The application of machine learning (ML) models to the analysis of optimization algorithms requires the representation of optimization problems using numerical features. These features can be used as input for ML models that are trained to select or to configure a suitable algorithm for the problem at hand. Since in pure black-box optimization info...
In black-box optimization, it is essential to understand why an algorithm instance works on a set of problem instances while failing on others and to provide explanations of its behavior. We propose a methodology for formulating an algorithm instance footprint that consists of a set of problem instances that are easy to solve and a set of problem...
A key component of automated algorithm selection and configuration, which in most cases are performed using supervised machine learning (ML) methods, is a good-performing predictive model. The predictive model uses the feature representation of a set of problem instances as input data and predicts the algorithm performance achieved on them. Common m...
Leave-one-problem-out (LOPO) performance prediction requires machine learning (ML) models to extrapolate algorithms' performance from a set of training problems to a previously unseen problem. LOPO is a very challenging task even for state-of-the-art approaches. Models that work well in the easier leave-one-instance-out scenario often fail to gener...
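The leave-one-problem-out split described above can be illustrated with a minimal sketch. It assumes each instance is tagged with the identifier of the problem it belongs to; the function name `lopo_splits` and the problem identifiers `f1`, `f2`, `f3` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def lopo_splits(problem_ids):
    """Yield (held_out_problem, train_indices, test_indices), one split per problem.

    All instances of the held-out problem go to the test set, so the model
    never sees any instance of that problem during training.
    """
    by_problem = defaultdict(list)
    for idx, pid in enumerate(problem_ids):
        by_problem[pid].append(idx)

    for held_out, test_idx in by_problem.items():
        train_idx = [
            i
            for pid, idxs in by_problem.items()
            if pid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


# Hypothetical data: three problems with two instances each.
problem_ids = ["f1", "f1", "f2", "f2", "f3", "f3"]
splits = list(lopo_splits(problem_ids))
```

In the leave-one-instance-out scenario, by contrast, only a single instance would be held out, so other instances of the same problem would remain in the training set.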
Knowledge about the interactions between dietary and biomedical factors is scattered throughout uncountable research articles in an unstructured form (e.g., text, images, etc.) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist, however, they require...
Nowadays, it is crucial to keep up with the new biomedical knowledge presented in the scientific literature. To this end, Information Extraction pipelines can help to automatically extract meaningful relations from textual data, which then require additional checks by domain experts. In the last two decades, a lot of work has be...
Although recipe data are very easy to come by nowadays, it is really hard to find a complete recipe dataset - with a list of ingredients, nutrient values per ingredient, and per recipe, allergens, etc. Recipe datasets are usually collected from social media websites where users post and publish recipes. Usually written with little to no structure,...
Empirical data plays an important role in evolutionary computation research. To make better use of the available data, ontologies have been proposed in the literature to organize their storage in a structured way. However, the full potential of these formal methods to capture our domain knowledge has yet to be demonstrated. In this work, we evaluat...
Per-instance automated algorithm configuration and selection have been gaining significant momentum in evolutionary computation in recent years. Two crucial, sometimes implicit, ingredients for these automated machine learning (AutoML) methods are 1) feature-based representations of the problem instances and 2) performance prediction methods that take the...
In the last decades, a great amount of work has been done in predictive modeling of issues related to human and environmental health. Resolution of issues related to healthcare is made possible by the existence of several biomedical vocabularies and standards, which play a crucial role in understanding the health information, together with a large...
Multi-label classification (MLC) is an ML task of predictive modeling in which a data instance can simultaneously belong to multiple classes. MLC is increasingly gaining interest in different application domains such as text mining, computer vision, and bioinformatics. Several MLC algorithms have been proposed in the literature, resulting in a meta...
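As a small illustration of a data instance belonging to several classes at once, the sketch below encodes label sets as binary indicator vectors and scores a prediction with the Hamming loss, a standard multi-label metric. The helper names, the class list, and the example labels are hypothetical and are not taken from the paper.

```python
def to_indicator(label_sets, classes):
    """Encode each set of labels as a 0/1 vector with one position per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


def hamming_loss(y_true, y_pred):
    """Fraction of (instance, class) positions where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Hypothetical classes and labels, for illustration only.
classes = ["text mining", "computer vision", "bioinformatics"]
y_true = to_indicator([{"text mining", "bioinformatics"}, {"computer vision"}], classes)
y_pred = to_indicator([{"text mining"}, {"computer vision"}], classes)
loss = hamming_loss(y_true, y_pred)
```

Here the first instance carries two labels simultaneously, which a single-label classifier could not represent.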
Many optimization algorithm benchmarking platforms allow users to share their experimental data to promote reproducible and reusable research. However, different platforms use different data models and formats, which drastically complicates the identification of relevant datasets, their interpretation, and their interoperability. Therefore, a seman...
Providing comprehensive details on how and why a stochastic optimization algorithm behaves in a particular way, on a single problem instance or a set of problem instances, is a challenging task. For this purpose, we propose a methodology based on problem landscape features and explainable machine learning models, for automated algorithm performance...
Accurate and reliable forecasting is a crucial task in many different domains. The selection of a forecasting algorithm that is suitable for a specific time series can be a challenging task, since the algorithms’ performance depends on the time-series properties, as well as the properties of the forecasting algorithms. The methodology and analysis...
Algorithm selection wizards are effective and versatile tools that automatically select an optimization algorithm given high-level information about the problem and available computational resources, such as number and type of decision variables, maximal number of evaluations, possibility to parallelize evaluations, etc. State-of-the-art algorithm...
Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low resourced. Annotated corpora are very useful tools for researchers and experts of the domain in question, as well as for data scientists for analysis. In this paper, we present the annotation process of food consumption data (recipes) with se...
Non-communicable diseases are on the rise and are often related to food choices; nutrition affects infectious diseases too. Therefore, there is growing interest in research on public and personal health, as related to food, nutrition behaviour and well-being of consumers throughout the life cycle. These concepts and their relations are complex and...
Per-instance algorithm selection seeks to recommend, for a given problem instance and a given performance criterion, one or several suitable algorithms that are expected to perform well for the particular setting. The selection is classically done offline, using openly available information about the problem instance or features that are extracted...
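A minimal sketch of offline per-instance algorithm selection follows. It uses a 1-nearest-neighbour rule as a stand-in selector, which is an assumption made for illustration and not the method of the paper. The feature vectors, the algorithm names `A` and `B`, and the convention that lower performance values are better are likewise hypothetical.

```python
import math


def select_algorithm(features, train_features, train_perf):
    """Recommend the algorithm that did best on the nearest training instance.

    Assumes lower performance values are better (for example, a target
    precision reached); this convention is an illustrative assumption.
    """
    nearest = min(
        range(len(train_features)),
        key=lambda i: math.dist(features, train_features[i]),
    )
    perf = train_perf[nearest]
    return min(perf, key=perf.get)


# Hypothetical training data: two instances, two algorithms.
train_features = [[0.0, 0.0], [1.0, 1.0]]
train_perf = [{"A": 0.1, "B": 0.5}, {"A": 0.9, "B": 0.2}]

choice = select_algorithm([0.9, 0.8], train_features, train_perf)
```

In practice the selector would be a trained ML model and the features would be extracted from the problem instance, but the input-output contract is the same: features in, recommended algorithm out.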
Many factors significantly influence the outcomes of infectious diseases such as COVID-19. A significant focus needs to be put on dietary habits as environmental factors since it has been deemed that imbalanced diets contribute to chronic diseases. However, not enough effort has been made in order to assess these relations. So far, studies in the f...
In this chapter, we introduce the statistical background required to understand the deep statistical comparison methods that are presented in this book. We give an overview of the basic terms used in statistics, starting with descriptive statistics and a special focus on hypothesis testing. At the end, we provide guidelines for which statistical te...
This chapter presents an application of the Deep Statistical Comparison approach to single-objective optimization. It provides examples of how the Deep Statistical Comparison ranking scheme can be used for a performance assessment of single-objective stochastic optimization algorithms. Next, a practical Deep Statistical Comparison ranking scheme...
This chapter presents an application of the Deep Statistical Comparison approach in multi-objective optimization. It provides examples of how the Deep Statistical Comparison ranking scheme can be used for performance assessment of multi-objective stochastic optimization algorithms using single-quality-indicator data. Next, different ensembles of...
This chapter provides a short introduction to meta-heuristic stochastic optimization, so that the reader is acquainted with the statistical analysis of the optimization results. First, the optimization and its two main families in the form of combinatorial and numerical optimization are introduced. Next, the two classifications of optimization prob...
This chapter introduces statistical approaches that can be utilized for statistical comparisons of meta-heuristic stochastic optimization algorithms. First, the most commonly used approach for a statistical comparison is presented, followed by a recently published approach, known as the Deep Statistical Comparison. Both approaches are discussed usi...
This chapter provides an introduction to the theory of benchmarking, which is a crucial step when performing a comparison of optimization algorithms. The four main steps of benchmarking will be explained in more detail, starting from identifying the reasons for benchmarking, defining the optimization domain (problem and algorithm selection), defini...
In this chapter, a web-service-based e-learning tool called DSCTool is introduced, which allows users to make a statistical comparison of meta-heuristic optimization algorithms without having to worry about drawing incorrect conclusions. First, the general pipeline of the tool is presented, followed by a detailed explanation of all the web services. Next, two types...
Missing data is a common problem in a wide range of fields and can arise for different reasons: lack of analysis, mishandling of samples, measurement error, etc. The area of nutrition and food composition is no exception to the problem of missing values. Missing data in food composition databases (FCDB) significantly limits their usage. Co...
Fair algorithm evaluation is conditioned on the existence of high-quality benchmark datasets that are non-redundant and are representative of typical optimization scenarios. In this paper, we evaluate three heuristics for selecting diverse problem instances which should be involved in the comparison of optimization algorithms in order to ensure rob...
Alzheimer’s disease is still a field of research with lots of open questions. The complexity of the disease prevents early diagnosis before visible symptoms regarding the individual’s cognitive capabilities occur. This research presents an in-depth analysis of a huge data set encompassing medical, cognitive and lifestyle measurements from mor...
Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm...
Landscape-aware algorithm selection approaches have so far mostly been relying on landscape feature extraction as a preprocessing step, independent of the execution of optimization algorithms in the portfolio. This introduces a significant overhead in computational cost for many practical applications, as features are extracted and computed via sam...
Predicting the performance of an optimization algorithm on a new problem instance is crucial in order to select the most appropriate algorithm for solving that problem instance. For this purpose, recent studies learn a supervised machine learning (ML) model using a set of problem landscape features linked to the performance achieved by the optimiza...
In this paper, we propose a new pipeline for landscape analysis of time-series machine learning datasets that enables us to better understand a benchmarking problem landscape, allows us to select a diverse portfolio of benchmark datasets, and reduces the presence of performance assessment bias via bootstrapping evaluation. Combining a large mult...
The focus of the current paper is on a design of responsible governance of food consumer science e-infrastructure using the case study Determinants and Intake Data Platform (DI Data Platform). One of the key challenges for implementation of the DI Data Platform is how to develop responsible governance that observes the ethical and legal frameworks...
In optimization, algorithm selection, which is the selection of the most suitable algorithm for a specific problem, is of great importance, as algorithm performance is heavily dependent on the problem being solved. However, when using machine learning for algorithm selection, the performance of the algorithm selection model depends on the data used...
Choosing an optimal Deep Learning (DL) architecture and hyperparameters for a particular problem is still not a trivial task for researchers. The most common approach relies on popular architectures proven to work on specific problem domains, evaluated in the same experimental environment and setup. However, this limits the opportunity to choose or invent no...
Efficiently solving an unseen optimization problem depends on the appropriate selection of an optimization algorithm and its hyper-parameters. For this purpose, automated algorithm performance prediction should be performed, which in most commonly applied practices involves training a supervised ML algorithm using a set of problem landscape features....
In this paper, we present FoodChem, a new Relation Extraction (RE) model for identifying chemicals present in the composition of food entities, based on textual information provided in biomedical peer-reviewed scientific literature. The RE task is treated as a binary classification problem, aimed at identifying whether the contains relation exists...
Background
A better understanding of food-related behaviour and its determinants can be achieved through harmonisation and linking of the various data-sources and knowledge platforms.
Scope
We describe the key decision-making in the development of a prototype of the Determinants and Intake Platform (DI Platform), a data platform that aims to harmo...
Being both a poison and a cure for many lifestyle and non-communicable diseases, food is inscribing itself into the prime focus of precision medicine. The monitoring of a few groups of nutrients is crucial for some patients, and methods for easing their calculation are emerging. Our proposed machine learning pipeline deals with nutrient prediction bas...
Background
Being both a poison and a cure for many lifestyle and non-communicable diseases, food is inscribing itself into the prime focus of precision medicine; therefore, knowing what is in our food has become of the utmost importance. The monitoring of a few groups of nutrients has become crucial for some patients, and with that methods for easing their calc...
When designing a benchmark problem set, it is important to create a set of benchmark problems that are a good generalization of the set of all possible problems. One possible way of easing this difficult task is by using artificially generated problems. In this paper, one such single-objective continuous problem generation approach is analyzed and...
Many platforms for benchmarking optimization algorithms offer users the possibility of sharing their experimental data with the purpose of promoting reproducible and reusable research. However, different platforms use different data models and formats, which drastically inhibits identification of relevant data sets, their interpretation, and their...
Accurately predicting the performance of different optimization algorithms for previously unseen problem instances is crucial for high-performing algorithm selection and configuration techniques. In the context of numerical optimization, supervised regression approaches built on top of exploratory landscape analysis are becoming very popular. From...