Automated Machine Learning - Methods, Systems, Challenges
Abstract
This open access book presents the first comprehensive overview of general methods in Automated Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first series of international challenges of AutoML systems. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. However, many of the recent machine learning successes crucially rely on human experts, who manually select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters. To overcome this problem, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself. This book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work.
... By decoupling G from rigid procedural search or adaptation loops, and permitting its instantiation across multiple formal paradigms, the proposed method encompasses and generalizes many existing learning approaches. Standard machine learning (ML) (Bishop, 2006), self-supervised learning (SSL) (LeCun et al., 2021), meta-learning (Finn et al., 2017), neural architecture search (NAS) (Zoph & Le, 2017), hypernetworks (Ha et al., 2016), program synthesis (Gulwani et al., 2017), automated machine learning (AutoML) (Hutter et al., 2019), and neuro-symbolic AI (d'Avila Garcez et al., 2002) all emerge as special cases within this unifying framework, differing only in the structure of G, the semantics of z, or the optimization strategy. Recent explorations of modular hypernetworks (Raviv & Shlezinger, 2024) and evolutionary architecture generation (Zou et al., 2024) suggest promising instantiations of the Generator paradigm in practice. ...
... NEAT (Stanley & Miikkulainen, 2002), TPOT (Olson et al., 2016), AutoKeras (Jin et al., 2019), AutoML (Hutter et al., 2019), AutoGluon (Erickson et al., 2020), AutoML-Zero (Real et al., 2020): architecture and weights discovered via automated search. ...
... Finally, AutoML systems (Hutter et al., 2019) and meta-architecture frameworks increasingly automate portions of the model design pipeline. Yet most approaches still rely on iterative search, adaptation, or ensembling, rather than direct semantic generation. ...
The design of artificial intelligence systems has historically depended on resource-intensive pipelines of architecture search, parameter optimization, and manual tuning. We propose a fundamental shift: the Generator paradigm, wherein both a model's architecture A and parameters W (or, more generally, executable functions) are synthesized directly from compact semantic seeds z via a generator G, formalized as (A, W) = G(z). Unlike traditional approaches that separate architecture discovery and weight learning, our framework decouples the generator G from fixed procedural search and training loops, permitting G to be symbolic, neural, procedural, or hybrid. This abstraction generalizes and unifies existing paradigms, including standard machine learning (ML), self-supervised learning (SSL), meta-learning, neural architecture search (NAS), hypernetworks, program synthesis, automated machine learning (AutoML), and neuro-symbolic AI, as special cases within a broader generative formulation. By reframing model construction as semantic generation rather than incremental optimization, this approach bypasses persistent challenges such as compute-intensive search, brittle task adaptation, and rigid retraining requirements. This work lays a foundation for compact, efficient, and interpretable world model generation, and opens new paths toward scalable, adaptive, and semantically conditioned intelligence systems.
... A limited range of candidate values was specified for each hyperparameter, and the grid search evaluated the model's performance across all combinations. The optimal set of hyperparameters was then selected based on the configuration that achieved the highest performance metrics [31]. ...
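To make the grid-search procedure described in this excerpt concrete, the following sketch runs an exhaustive search over a small, assumed set of XGBoost hyperparameter values with scikit-learn; the candidate values and the toy dataset are illustrative, not the study's actual configuration.

```python
# Illustrative grid search over a limited set of candidate hyperparameter values
# (values and data are stand-ins, not the study's configuration).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # toy stand-in data

param_grid = {                      # a limited range of candidate values per hyperparameter
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [100, 300],
}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)                    # evaluates every combination of candidate values
print(search.best_params_, round(search.best_score_, 3))  # configuration with the best metric
```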
... 80% of the samples were designated as the training set, while 20% were used as the test set for holdout validation. Additionally, cross-validation was applied with the number of folds set to 5. In each iteration, we calculated the performance metrics of the model on the validation set, such as accuracy (the proportion of correctly predicted samples among all samples), precision (the proportion of predicted positive samples that are truly positive), recall (the proportion of actual positive samples that are correctly identified), and the F1 score (the harmonic mean of precision and recall) [31]. The model's performance was further assessed using a confusion matrix, the ROC curve, and the AUC. ...
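A minimal sketch of the evaluation protocol described above (80/20 holdout split, 5-fold cross-validation, and the listed metrics); the model and data are placeholders rather than the study's own.

```python
# Holdout split, 5-fold cross-validation, and the reported classification metrics
# (placeholder data and model, for illustration only).
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)  # 80/20 split

model = XGBClassifier(eval_metric="logloss")
cv = cross_validate(model, X_tr, y_tr, cv=5, scoring=["accuracy", "precision", "recall", "f1"])
print({k: v.mean().round(3) for k, v in cv.items() if k.startswith("test_")})  # per-metric CV means

model.fit(X_tr, y_tr)
y_pred = model.predict(X_te)
print(confusion_matrix(y_te, y_pred))                                # confusion matrix on the test set
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))  # ROC / AUC on the test set
```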
Deciphering Neolithic settlement environmental selection strategies is vital for understanding prehistoric human-environment relationships. This study employs a multi-classification XGBoost model and SHAP analysis to accurately classify 432 Neolithic archaeological sites in Zhejiang Province (AUC = 0.93), effectively distinguishing environmental selection patterns across different cultural phases. The model's feature importance ranking indicates that elevation, surface relief, slope, and water buffer zones are the main factors influencing settlement site selection, though their impact intensity and mechanisms vary significantly across different cultural phases. Early Neolithic settlements (11.0–7.0 ka BP) favored high-altitude, vegetated river valleys supporting hunting-gathering economies, while mid-Neolithic communities (7.0–4.3 ka BP) shifted to lowland alluvial plains promoting rice agriculture. Late Neolithic settlements (from 4.3 ka BP onward) expanded to higher elevations to mitigate flooding risks, coinciding with revived hunting-gathering practices. This study highlights the interplay between environmental and socio-economic factors in shaping settlement patterns and demonstrates the value of machine learning for archaeological research.
... Hyper-parameter optimization has been well studied by the computer science and machine learning research communities. Hutter et al. [17] comprehensively cover HPO methods. Bischl et al. [9] review several HPO methods, including: methods based on (pseudo-)random search and other stochastic sampling processes, Bayesian optimization methods relying on surrogate models, and evolutionary strategies based on biological concepts (e.g. ...
... Hyper-parameter optimization algorithms seek to identify an optimal set of hyper-parameters (λ) that optimizes an objective function f(λ) corresponding to a user-selected evaluation metric [9,17]. The choice of evaluation metric is problem dependent. ...
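For reference, the optimization problem described in this excerpt is usually written as below; this is the generic formulation found in the HPO literature, with K cross-validation folds, learner A_λ, and loss L used as placeholder notation rather than the paper's exact symbols.

```latex
% Generic HPO formulation (placeholder notation, not the paper's exact symbols)
\lambda^{*} \in \operatorname*{arg\,min}_{\lambda \in \Lambda} f(\lambda),
\qquad
f(\lambda) \approx \frac{1}{K} \sum_{k=1}^{K}
  \mathcal{L}\!\left( \mathcal{A}_{\lambda}\!\left(D_{\mathrm{train}}^{(k)}\right),\, D_{\mathrm{valid}}^{(k)} \right)
```

When the chosen evaluation metric is to be maximized (e.g., AUC), the arg min is simply applied to its negative.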
Background
Supervised machine learning is increasingly being used to estimate clinical predictive models. Several supervised machine learning models involve hyper-parameters, whose values must be judiciously specified to ensure adequate predictive performance.
Objective
To compare nine hyper-parameter optimization (HPO) methods for tuning the hyper-parameters of an extreme gradient boosting model, with application to predicting high-need, high-cost health care users.
Methods
Extreme gradient boosting models were estimated using a randomly sampled training dataset. Models were separately trained using nine different HPO methods: 1) random sampling, 2) simulated annealing, 3) quasi-Monte Carlo sampling, 4-5) two variations of Bayesian hyper-parameter optimization via tree-Parzen estimation, 6-7) two implementations of Bayesian hyper-parameter optimization via Gaussian processes, 8) Bayesian hyper-parameter optimization via random forests, and 9) the covariance matrix adaptation evolution strategy. For each HPO method, we estimated 100 extreme gradient boosting models at different hyper-parameter configurations and evaluated model performance using an AUC metric on a randomly sampled validation dataset. Using the best model identified by each HPO method, we evaluated generalization performance in terms of discrimination and calibration metrics on a randomly sampled held-out test dataset (internal validation) and a temporally independent dataset (external validation).
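As a hedged illustration of one of the nine tuning strategies (Bayesian optimization via tree-Parzen estimation), the sketch below uses Optuna's TPE sampler to evaluate 100 XGBoost configurations against a validation AUC; the search-space ranges and data are assumptions, since the abstract does not specify them.

```python
# Sketch of TPE-based Bayesian HPO with a validation-AUC objective
# (illustrative ranges and synthetic data, not the study's configuration).
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def objective(trial):
    params = {
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = XGBClassifier(eval_metric="logloss", **params).fit(X_tr, y_tr)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])   # validation AUC

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=100)    # 100 configurations, as in the study
print(study.best_params, round(study.best_value, 3))
```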
Results
The extreme gradient boosting model estimated using default hyper-parameter settings had reasonable discrimination (AUC=0.82) but was not well calibrated. Hyper-parameter tuning using any HPO algorithm/sampler improved model discrimination (AUC=0.84), resulted in models with near-perfect calibration, and consistently identified features predictive of high-need, high-cost health care users.
Conclusions
In our study, all HPO algorithms resulted in similar gains in model performance relative to baseline models. This finding likely relates to our study dataset having a large sample size, a relatively small number of features, and a strong signal-to-noise ratio, and it would likely apply to other datasets with similar characteristics.
... However, these approaches often require significant manual effort in model selection and hyperparameter tuning. Recent advancements in Automated Machine Learning (AutoML) have shown promise in automating these tasks, leading to more efficient and accurate models (Hutter et al., 2019). The goal is to improve existing neural network models by optimizing their hyperparameters, including architecture and training parameters, so as to obtain the best possible model that maximizes prediction accuracy (Yang and Shami, 2020). ...
... In general, there are two main categories of hyperparameters: those that influence the architecture, such as fully connected layer size, activation function, dropout rate, and the kernel size and number of filters in convolutional layers, and those that impact the training process, such as the initial learning rate, optimizer, batch size, number of epochs, and more. Therefore, hyperparameter optimization is a critical aspect of machine learning model development, as it involves selecting the best set of hyperparameters that govern the learning process of the model (Hutter et al., 2019; Nematzadeh et al., 2022). Metaheuristic algorithms offer an efficient and effective approach to hyperparameter optimization by exploring the search space in a heuristic manner to find optimal hyperparameter values (Bibaeva, 2018; Cutello et al., 2024b). ...
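A minimal illustration of the two hyperparameter categories described in this excerpt; the names and values below are examples only, not a prescription.

```python
# Architecture-related vs. training-related hyperparameters (example values only).
architecture_hparams = {
    "dense_units": 128,            # fully connected layer size
    "activation": "relu",          # activation function
    "dropout_rate": 0.3,
    "conv_filters": 32,            # number of filters in a convolutional layer
    "kernel_size": 3,              # convolution kernel size
}
training_hparams = {
    "initial_learning_rate": 1e-3,
    "optimizer": "adam",
    "batch_size": 64,
    "epochs": 50,
}
print(architecture_hparams, training_hparams)
```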
Decision-making is a crucial process for any organization, since it involves the selection of the most effective action from a variety of options. In this context, data plays an important role in driving decisions. Analyzing data allows us to extract patterns that enable better decision-making for achieving specific goals. However, to make the right decisions to control the behavior of a system, it is necessary to take into account different factors, which can be challenging. Indeed, in dynamic systems, numerous variables change over time, and understanding the future state of these systems can be crucial for controlling them. Predicting future states based on historical data is known as time series forecasting, which can be divided into univariate and multivariate forecasting, with the latter being particularly relevant due to its consideration of multiple variables. Deep Learning methods enhance decision-making by identifying patterns in complex datasets. As data complexity grows, techniques like Automated Machine Learning optimize model performance. The present study introduces a novel methodology that integrates multivariate time series forecasting into decision-making frameworks. We used Automated Machine Learning to develop a predictive model for forecasting future system states, aiding optimal decision-making. The study compares machine learning models based on performance metrics and computational cost across various domains, including weather monitoring, power consumption, hospital electricity monitoring, and exchange rates. We also analyzed the importance of the hyperparameters in identifying key factors affecting model performance. The obtained results show that the Neural Architecture Search method can improve state predictor design by reducing computational resources and enhancing performance.
... Hyperparameter optimization, a subarea of Automated Machine Learning [1], is critical for improving machine learning models. It focuses on finding the optimal configuration of hyperparameters, such as the learning rate or the batch size [2]. ...
... Therefore, the infinite-dimensional vector f_t in (17) is almost surely continuous. In other words, there exists a δ > 0 such that for all ζ_1, ζ_2 ∈ Y such that ∥ζ_1 ...
We introduce Hyperparameter Controller (HyperController), a computationally efficient algorithm for hyperparameter optimization during training of reinforcement learning neural networks. HyperController optimizes hyperparameters quickly while also maintaining improvement of the reinforcement learning neural network, resulting in faster training and deployment. It achieves this by modeling the hyperparameter optimization problem as an unknown Linear Gaussian Dynamical System, which is a system whose state changes linearly over time. It then learns an efficient representation of the hyperparameter objective function using the Kalman filter, which is the optimal one-step predictor for a Linear Gaussian Dynamical System. To demonstrate the performance of HyperController, it is applied as a hyperparameter optimizer during training of reinforcement learning neural networks on a variety of OpenAI Gymnasium environments. In four out of the five Gymnasium environments, HyperController achieves the highest median reward during evaluation compared to other algorithms. The results exhibit the potential of HyperController for efficient and stable training of reinforcement learning neural networks.
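The abstract does not give implementation details, but the core machinery it refers to, one-step prediction for a linear Gaussian dynamical system via the Kalman filter, can be sketched for the scalar case as follows; this is a generic illustration, not the HyperController algorithm itself.

```python
# Generic one-step Kalman predictor for a scalar linear Gaussian dynamical system;
# a simplified illustration of the machinery referenced above, not HyperController.
import numpy as np

def kalman_one_step(observations, a=1.0, c=1.0, q=0.1, r=1.0, x0=0.0, p0=1.0):
    """One-step-ahead predictions for x_{t+1} = a*x_t + w_t, y_t = c*x_t + v_t."""
    x, p = x0, p0
    preds = []
    for y in observations:
        # update the state estimate with the new observation y_t
        k = p * c / (c * p * c + r)      # Kalman gain
        x = x + k * (y - c * x)
        p = (1 - k * c) * p
        # predict the next latent state (one-step-ahead prediction)
        x, p = a * x, a * p * a + q
        preds.append(c * x)              # predicted next observation
    return np.array(preds)

noisy_objective = np.cumsum(np.random.default_rng(0).normal(size=50))  # toy objective trace
print(kalman_one_step(noisy_objective)[:5])
```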
... Many tasks, such as neural architecture search (Elsken, Metzen, and Hutter 2019) and hyperparameter optimization (Hutter, Kotthoff, and Vanschoren 2019; Golovin et al. 2017), can be abstracted as black-box optimization (BBO), which means that although we can evaluate f(x) for any x ∈ X, we have no access to any other information about f, such as the Hessian and gradients. Moreover, it is expensive to evaluate f(x) in most cases. ...
... Meta-learning (or learning-to-learn) is also an important research direction of AutoML (Hutter, Kotthoff, and Vanschoren 2019). Here, we mainly focus on a learnable optimization strategy for BBO, divided into two parts. ...
The core challenge of high-dimensional and expensive black-box optimization (BBO) is how to obtain better performance faster with little function evaluation cost. The essence of the problem is how to design an efficient optimization strategy tailored to the target task. This paper designs a powerful optimization framework to automatically learn optimization strategies from the target task or a cheap surrogate task without human intervention. Current methods struggle with this because they represent optimization strategies poorly. To address this, 1) drawing on the mechanism of genetic algorithms, we propose a deep neural network framework called B2Opt, which provides a stronger representation of optimization strategies based on survival of the fittest; 2) B2Opt can utilize the cheap surrogate functions of the target task to guide the design of efficient optimization strategies. Compared to the state-of-the-art BBO baselines, B2Opt achieves multiple orders of magnitude performance improvement with less function evaluation cost.
... Mixed-variable black-box optimization (MV-BBO) involves the simultaneous optimization of different types of variables, such as continuous, integer, and categorical variables, in black-box settings. MV-BBO problems often appear in various real-world applications, such as hyperparameter optimization in machine learning [13,15], hardware design [21,25], and development of new materials [17,28]. In these problems, there are often dependencies among different types of variables, requiring efficient methods to optimize them simultaneously. ...
... where χ²_ppf(q) is the q-quantile of the χ²-distribution with 1 degree of freedom. We note that if the quantity modified in (15) does not change before and after the modification, the corresponding entry of the update also does not change. Additionally, no changes are made to the remaining distribution parameters, namely, ...
This study focuses on mixed-variable black-box optimization (MV-BBO), addressing continuous, integer, and categorical variables. Many real-world MV-BBO problems involve dependencies among these different types of variables, requiring efficient methods to optimize them simultaneously. Recently, stochastic optimization methods leveraging the mechanism of the covariance matrix adaptation evolution strategy have shown promising results in mixed-integer or mixed-category optimization. However, such methods cannot handle the three types of variables simultaneously. In this study, we propose CatCMA with Margin (CatCMAwM), a stochastic optimization method for MV-BBO that jointly optimizes continuous, integer, and categorical variables. CatCMAwM is developed by incorporating a novel integer handling into CatCMA, a mixed-category black-box optimization method employing a joint distribution of multivariate Gaussian and categorical distributions. The proposed integer handling is carefully designed by reviewing existing integer handlings and following the design principles of CatCMA. Even when applied to mixed-integer problems, it stabilizes the marginal probability and improves the convergence performance of continuous variables. Numerical experiments show that CatCMAwM effectively handles the three types of variables, outperforming state-of-the-art Bayesian optimization methods and baselines that simply incorporate existing integer handlings into CatCMA.
... Starting from the breakthrough of AlexNet [4] at the ImageNet competition in 2012, various different architectures, including GoogLeNet [5], ResNet [6] and the Transformer [7], have introduced a plethora of ways to bundle and stack different layers of neural networks. In contrast to these manual engineering approaches, neural architecture search (NAS), a research area within automated machine learning (AutoML) [8], aims to automate the complex process of finding the best-performing architecture for a given task [9]. NAS has contributed to frontier models in image classification [10] and natural language processing [11]. ...
Neural Architecture Search (NAS) accelerates progress in deep learning through systematic refinement of model architectures. The downside is increasingly large energy consumption during the search process. Surrogate-based benchmarking mitigates the cost of full training by querying a pre-trained surrogate to obtain an estimate for the quality of the model. Specifically, energy-aware benchmarking aims to make it possible for NAS to favourably trade off model energy consumption against accuracy. Towards this end, we propose three design principles for such energy-aware benchmarks: (i) reliable power measurements, (ii) a wide range of GPU usage, and (iii) holistic cost reporting. We analyse EA-HAS-Bench based on these principles and find that the choice of GPU measurement API has a large impact on the quality of results. Using the Nvidia System Management Interface (SMI) on top of its underlying library influences the sampling rate during the initial data collection, returning faulty low-power estimations. This results in poor correlation with accurate measurements obtained from an external power meter. With this study, we bring to attention several key considerations when performing energy-aware surrogate-based benchmarking and derive first guidelines that can help design novel benchmarks. We show a narrow usage range of the four GPUs attached to our device, ranging from 146 W to 305 W in a single-GPU setting, and narrowing down even further when using all four GPUs. To improve holistic energy reporting, we propose calibration experiments over assumptions made in popular tools, such as Code Carbon, thus achieving reductions in the maximum inaccuracy from 10.3 % to 8.9 % without and to 6.6 % with prior estimation of the expected load on the device.
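As a rough illustration of the measurement concern raised in this abstract, GPU power can be sampled through NVML, the library underlying nvidia-smi; the sampling interval and device index below are arbitrary choices, and this is not the instrumentation used in the study.

```python
# Minimal sketch of sampling GPU power draw through NVML (the library behind nvidia-smi);
# illustrative only, not the measurement setup of the benchmark discussed above.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)       # first GPU on the machine
samples = []
for _ in range(100):                                # ~10 s at a 0.1 s sampling interval
    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # milliwatts -> watts
    time.sleep(0.1)
pynvml.nvmlShutdown()
print(f"mean power: {sum(samples) / len(samples):.1f} W")
```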
... Previously, AI optimization for segmentation tasks has focused on manually designing architectures and fine-tuning hyperparameters [38]. When automated, hyperparameter optimization often targeted a small set of parameters using optimization libraries [39,40]. ...
Background
Artificial intelligence (AI) methods have established themselves in cardiovascular magnetic resonance (CMR) as automated quantification tools for ventricular volumes, function, and myocardial tissue characterization. Quality assurance approaches focus on measuring and controlling AI-expert differences but there is a need for tools that better communicate reliability and agreement. This study introduces the Verity plot, a novel statistical visualization that communicates the reliability of quantitative parameters (QP) with clear agreement criteria and descriptive statistics.
Methods
Tolerance ranges for the acceptability of the bias and variance of AI-expert differences were derived from intra- and interreader evaluations. AI-expert agreement was defined by bias confidence and variance tolerance intervals being within bias and variance tolerance ranges. A reliability plot was designed to communicate this statistical test for agreement. Verity plots merge reliability plots with a density plot and a scatter plot to illustrate AI-expert differences. Their utility was compared against Correlation, Box, and Bland-Altman plots.
Results
Bias and variance tolerance ranges were established for volume, function, and myocardial tissue characterization QPs. Verity plots provided insights into statistical properties, outlier detection, and parametric test assumptions, outperforming Correlation, Box, and Bland-Altman plots. Additionally, they offered a framework for determining the acceptability of AI-expert bias and variance.
Conclusion
Verity plots offer markers for bias, variance, trends, and outliers, in addition to a decision on the acceptability of AI quantification. The plots were successfully applied to various AI methods in CMR and decisively communicated AI-expert agreement.
... AutoML [59] represents an emergent research domain initiated to automate aspects of machine learning across varied industrial sectors [59,60]. AutoML encompasses an array of techniques and tools designed to automate the processes involved in designing, training, and optimizing machine learning models. ...
This study presents a numerical investigation and predictive modeling framework to evaluate the influence of microscale frictional parameters on the mechanical behavior and failure mechanisms of cementitious composites. In the first phase, discrete element modeling (DEM) was employed to analyze the effects of bonded friction angle and non-bonded friction coefficient on the stress–strain response, failure evolution, and macro-scale properties. The results revealed a distinct transition from tensile to shear-dominated failure modes beyond a critical friction angle, accompanied by notable changes in compressive strength and deformation characteristics. Additionally, the role of the non-bonded friction coefficient in post-failure behavior was identified, emphasizing its influence on load redistribution. In the second phase, an AutoML-driven artificial neural network (ANN) was optimized via grid search, selecting an optimal four-layer model to predict macroparameters from microscale DEM inputs. The proposed ANN demonstrated high predictive accuracy, effectively capturing nonlinear dependencies while significantly reducing the need for additional numerical simulations. This integration of DEM and AI-based predictive modeling provides a computationally efficient, scalable solution for material characterization, enabling faster, data-driven insights into cementitious composite behavior without reliance on extensive simulation campaigns.
... This approach not only streamlines workflows but also fosters cost-effective solutions and innovative breakthroughs. Machine learning is transforming research across numerous fields by enabling faster insights and more precise predictions [28][29][30][31]. It is fascinating to witness how technological advancements are changing our strategies for tackling complex challenges and making informed decisions. ...
This study presents a novel approach that integrates experimental mechanics and machine learning (ML) to predict the dynamic compressive strength of plain and steel fibre reinforced concrete (SFRC) under high strain rates. It addresses key challenges of conventional Hopkinson bar experiments, including high costs, limited accessibility to specialized equipment, and difficulties in replicating extreme conditions. A comprehensive database of 157 experimental datasets was compiled to develop robust predictive models, including random forest, gradient boosting (GB), extreme gradient boosting, and categorical boosting. Among these, GB demonstrated the highest predictive accuracy, emphasizing the dominant influence of strain rate. A key contribution of this study is the development of a user-friendly graphical user interface, which transforms these ML models into a practical tool for researchers and civil engineers, enabling cost-effective and time-efficient estimation of SFRC's compressive strength under dynamic loading. This work highlights the transformative potential of ML-driven approaches in civil engineering, offering innovative solutions to long-standing experimental challenges.
... Hyperparameter tuning is the process of selecting the most suitable set of hyperparameters for an ML [46] or DL [47] model. Hyperparameter tuning helps achieve better performance of the model by aligning the model with the data distribution. ...
... Auto ML systems employ various optimization strategies to navigate the vast space of possible model configurations, leveraging techniques such as Bayesian optimization, genetic algorithms, and gradient-based approaches to efficiently identify promising solutions. Meta-learning, the concept of learning how to learn, forms another crucial foundation, as Auto ML systems analyze patterns across datasets and modeling tasks to transfer knowledge between problems, improving efficiency and performance [2]. ...
This comprehensive article analyzes the evolving role of Automated Machine Learning (Auto ML) in enterprise predictive systems, exploring its transformative impact on organizational analytics capabilities. The article investigates prominent Auto ML frameworks—including Auto-WEKA, IBM's Auto AI, and Microsoft's Neural Network Intelligence—evaluating their distinctive architectures, capabilities, and enterprise applications. By synthesizing implementation experiences across diverse industry contexts, we identify key benefits of enterprise Auto ML adoption, including substantial efficiency gains, democratization of advanced analytics, and measurable return on investment. However, successful implementation requires addressing significant challenges related to model interpretability, data quality dependencies, domain-specific customization requirements, and organizational change management. Looking forward, the convergence of Auto ML with complementary technologies such as explainable AI, edge computing, and federated learning promises to reshape enterprise predictive capabilities, while emerging regulatory frameworks necessitate thoughtful governance approaches. The article concludes with strategic recommendations for organizations seeking to leverage Auto ML as a cornerstone of their data-driven decision-making infrastructure, emphasizing the importance of balanced implementation approaches that combine technological innovation with appropriate human oversight and domain expertise.
... Traditionally, ML and DL workflows require extensive expertise and manual efforts, making them less accessible to researchers with limited modeling engineering backgrounds. To overcome this limitation, AutoML frameworks explore a range of algorithms and configurations, often employing techniques such as genetic algorithms and neural architecture search [13,14]. Specifically, in the context of EDM, AutoML can enhance predictive modeling by optimizing feature representations and adapting to the complexity of learning behaviors, thereby improving decision-making processes in adaptive learning environments [15,16]. ...
Advancements in modern technology have significantly increased the availability of educational data, presenting researchers with new challenges in extracting meaningful insights. Educational Data Mining offers analytical methods to support the prediction of student outcomes, development of intelligent tutoring systems, and curriculum optimization. Prior studies have highlighted the potential of semi-supervised approaches that incorporate feature selection to identify factors influencing academic success, particularly for improving model interpretability and predictive performance. Many feature selection methods tend to exclude variables that may not be individually powerful predictors but can collectively provide significant information, thereby constraining a model’s capabilities in learning environments. In contrast, Deep Learning (DL) models paired with Automated Machine Learning techniques can decrease the reliance on manual feature engineering, thereby enabling automatic fine-tuning of numerous model configurations. In this study, we propose a reproducible methodology that integrates DL with AutoML to evaluate student performance. We compared the proposed DL methodology to a semi-supervised approach originally introduced by Yu et al. under the same evaluation criteria. Our results indicate that DL-based models can provide a flexible, data-driven approach for examining student outcomes, in addition to preserving the importance of feature selection for interpretability. This proposal is available for replication and additional research.
... Commonly, a logarithmic transformation or a QuantileTransformer is used [23], but, to our knowledge, no study has systematically examined the impact of these transformations on the predictive performance. Notably, this has been neglected by the AutoML literature: Auto-sklearn [14,15] does not support performing target variable transformations and the book on Automated Machine Learning [24] does not discuss this problem. ...
The machine learning pipeline typically involves the iterative process of (1) collecting the data, (2) preparing the data, (3) learning a model, and (4) evaluating a model. Practitioners recognize the importance of the data preparation phase in terms of its impact on the ability to learn accurate models. In this regard, significant attention is often paid to manipulating the feature set (e.g., selection, transformations, dimensionality reduction). A point that is less well appreciated is that transformations on the target variable can also have a large impact on whether it is possible to learn a suitable model. These transformations may include accounting for subject-specific biases (e.g., in how someone uses a rating scale), contexts (e.g., population size effects), and general trends (e.g., inflation). However, this point has received a much more cursory treatment in the existing literature. The goal of this paper is three-fold. First, we aim to highlight the importance of this problem by showing when transforming the target variable has been useful in practice. Second, we will provide a set of generic "rules of thumb" that indicate situations when transforming the target variable may be needed. Third, we will discuss which transformations should be considered in a given situation.
... This approach allowed researchers to achieve state-of-the-art results without training models from scratch. Outsourcing all or part(s) of training became popular through services such as APIs, fine-tuning or post-training services, and AutoML tools [HKV19]. ...
The widespread adoption of AI in recent years has led to the emergence of AI supply chains: complex networks of AI actors contributing models, datasets, and more to the development of AI products and services. AI supply chains have many implications yet are poorly understood. In this work, we take a first step toward a formal study of AI supply chains and their implications, providing two illustrative case studies indicating that both AI development and regulation are complicated in the presence of supply chains. We begin by presenting a brief historical perspective on AI supply chains, discussing how their rise reflects a longstanding shift towards specialization and outsourcing that signals the healthy growth of the AI industry. We then model AI supply chains as directed graphs and demonstrate the power of this abstraction by connecting examples of AI issues to graph properties. Finally, we examine two case studies in detail, providing theoretical and empirical results in both. In the first, we show that information passing (specifically, of explanations) along the AI supply chains is imperfect, which can result in misunderstandings that have real-world implications. In the second, we show that upstream design choices (e.g., by base model providers) have downstream consequences (e.g., on AI products fine-tuned on the base model). Together, our findings motivate further study of AI supply chains and their increasingly salient social, economic, regulatory, and technical implications.
... Meta-learning [15] is one potential solution to this problem. Its extension, meta-reinforcement learning (MRL), can learn a meta-policy with strong generalization ability given a task distribution, allowing DRL-based algorithms to adapt more quickly to new tasks and achieve better results [16]. ...
In mobile edge computing (MEC), the limited resources of individual edge servers and the uneven workload can significantly impact the system performance and user quality of experience (QoE), necessitating greater consideration for edge-edge and edge-cloud collaboration. However, existing collaborative task offloading strategies still fall short in performance and lack adaptability to dynamic environments. To address these challenges, this paper proposes a meta-reinforcement learning-based two-stage task offloading algorithm, named MRL-TSO, for decentralized task offloading in a generic MEC framework that integrates edge-edge and edge-cloud cooperation. MRL-TSO is the first to leverage MRL to address collaborative task offloading for edge-edge and edge-cloud at the edge layer and innovatively divides the offloading decision into two stages: in the first stage, a trained decision model is used to make preliminary decisions based on local information; in the second stage, the initial decision is combined with the status of edge servers to filter out unavailable resources, thereby improving the reliability of the decision. Experimental results indicate that, compared to the state-of-the-art work, MRL-TSO increases the task success rate by at least 1.43% and reduces latency by 9.69%. Especially in the case of edge server failures, it improves the task success rate by at least 45.60% compared to the competitive algorithms, demonstrating better environmental adaptability and robustness.
... In this second test, the two hyperparameters common to all the algorithms used (except SA), the number of iterations and the number of individuals, were adjusted. This was accomplished using the HyperOpt Python package [65,66]. The results from Table 7 illustrate the tuned hyperparameters for each algorithm. These adjusted values were determined based on the exploration range of 100 to 500 epochs and 10 to 100 individuals, with the exception of SA, which is a single-solution algorithm. ...
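A hedged sketch of that tuning step with the HyperOpt package, searching the number of iterations (100 to 500) and the number of individuals (10 to 100); run_metaheuristic is a hypothetical stand-in for executing one configured MEALPY run and returning its revenue, not the authors' actual objective function.

```python
# Tuning epochs and population size with HyperOpt's TPE algorithm
# (run_metaheuristic is a hypothetical placeholder objective).
from hyperopt import fmin, tpe, hp, Trials

def run_metaheuristic(epoch, pop_size):
    # placeholder: stands in for running the MEALPY algorithm with these settings
    # and returning the achieved revenue for this configuration
    return -((epoch - 300) ** 2 + (pop_size - 60) ** 2)

space = {
    "epoch": hp.quniform("epoch", 100, 500, 1),       # explored range from the text
    "pop_size": hp.quniform("pop_size", 10, 100, 1),
}

best = fmin(
    fn=lambda s: -run_metaheuristic(int(s["epoch"]), int(s["pop_size"])),  # minimize negative revenue
    space=space,
    algo=tpe.suggest,
    max_evals=50,
    trials=Trials(),
)
print(best)
```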
The train timetabling problem in liberalized railway markets represents a challenge to the coordination between infrastructure managers and railway undertakings. Efficient scheduling is critical in maximizing infrastructure capacity and utilization while adhering as closely as possible to the requests of railway undertakings. These objectives ultimately contribute to maximizing the infrastructure manager's revenues. This paper sets out a modular simulation framework to reproduce the dynamics of deregulated railway systems. Ten metaheuristic algorithms using the MEALPY Python library are then evaluated in order to optimize train schedules in the liberalized Spanish railway market. The results show that the Genetic Algorithm outperforms others in revenue optimization, convergence speed, and schedule adherence. Alternatives, such as Particle Swarm Optimization and Ant Colony Optimization Continuous, show slower convergence and higher variability. The results emphasize the trade-off between scheduling more trains and adhering to requested times, providing insights into solving complex scheduling problems in deregulated railway systems.
... Selecting an appropriate model architecture and tuning hyperparameters, such as learning rate, layer depth, or number of trees, can be computationally expensive and time-consuming. Despite the advancement of automated machine learning (AutoML) techniques [165], [166], [167], efficiently identifying optimal configurations for deep tabular methods under practical constraints remains challenging and critical for achieving high predictive performance. Domain-Specific Constraints. ...
Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks (DNNs) recently demonstrating promising results through their capability of representation learning. In this survey, we systematically introduce the field of tabular representation learning, covering the background, challenges, and benchmarks, along with the pros and cons of using DNNs. We organize existing methods into three main categories according to their generalization capabilities: specialized, transferable, and general models. Specialized models focus on tasks where training and evaluation occur within the same data distribution. We introduce a hierarchical taxonomy for specialized models based on the key aspects of tabular data -- features, samples, and objectives -- and delve into detailed strategies for obtaining high-quality feature- and sample-level representations. Transferable models are pre-trained on one or more datasets and subsequently fine-tuned on downstream tasks, leveraging knowledge acquired from homogeneous or heterogeneous sources, or even cross-modalities such as vision and language. General models, also known as tabular foundation models, extend this concept further, allowing direct application to downstream tasks without fine-tuning. We group these general models based on the strategies used to adapt across heterogeneous datasets. Additionally, we explore ensemble methods, which integrate the strengths of multiple tabular models. Finally, we discuss representative extensions of tabular learning, including open-environment tabular machine learning, multimodal learning with tabular data, and tabular understanding. More information can be found in the following repository: https://github.com/LAMDA-Tabular/Tabular-Survey.
... Automation may leave unclear the decisions made by the system, hindering the replication of studies and compromising the transparency of results [13]. ...
The article explored how Artificial Intelligence (AI) and Data Science have revolutionized the creation and analysis of information by integrating advanced methodologies that overcome traditional barriers in the interpretation of complex data. It addressed fundamental concepts and current technical and ethical challenges, highlighting the automation of the analytics life cycle through AutoML, the implementation of explainable models, and the management of algorithmic bias. The research also examined the limitations of AI in processing unstructured data and its interaction with emerging technologies such as blockchain and quantum computing. The results underscored the importance of establishing regulations that guarantee a balance between technological innovation and the protection of human rights in a context of big data and automated decisions. It concludes by emphasizing that the impact of AI transcends the technical, consolidating it as an engine of interdisciplinary advancement that drives both the progress of human knowledge and sustainable practical applications, always under ethical and regulated approaches.
... One important field of study in machine learning is hyperparameter optimization, which is expressed as [40]: ...
Smart systems gain importance when both the value of early diagnosis and treatment for patients and the costs associated with the use and maintenance of medical devices are considered. In this direction, high-performance models are needed for more effective smart systems. This work aims to propose an effective hybrid prediction model whose hyperparameters are determined automatically, and to lay the groundwork for a clinical decision support system for the detection of diabetic retinopathy. Prediction models were designed with the Extreme Learning Machine (ELM) and Support Vector Machine algorithms, and parameter optimization was carried out by hybridizing these algorithms with a Genetic Algorithm. ELM, considered open to further development and effective in practical applications, was especially preferred. ELM achieved the highest accuracy, with a value of 74.49%. The parameters of this model are: 180 neurons, a tan-sigmoid activation function, and a threshold value of 0.427 used to determine the class label. According to different performance evaluation measures, the developed models are more successful than studies carried out with the same dataset in the literature.
... Automated machine learning (AutoML) is an end-to-end machine learning process that is accessible for implementing state-of-the-art machine learning approaches (Hutter et al. 2019; Kanti Karmaker et al. 2021). PyCaret is a representative AutoML library implemented in a Python environment that offers an end-to-end pipeline with low code and performs time-consuming procedures from data preprocessing to modeling functions (Ali 2020; Chauhan et al. 2020; Sarangpure et al. 2023). ...
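A low-code PyCaret sketch of the kind of end-to-end AutoML comparison described above; the built-in demo dataset and target column stand in for the study's landslide data, so this is illustrative only.

```python
# End-to-end PyCaret classification workflow: preprocessing, model comparison, holdout evaluation.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models, predict_model

df = get_data("juice")                       # built-in demo table standing in for the study's data
setup(data=df, target="Purchase", session_id=42)   # preprocessing + train/test split
best_model = compare_models(sort="AUC")      # trains and ranks candidate models by AUC
predict_model(best_model)                    # evaluates the best model on the held-out split
print(best_model)
```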
The extreme rainfall events associated with climate change trigger landslides. Approximately 60% of South Korea comprises mountainous terrain, with steep slopes rendering it particularly prone to landslides. Despite the implementation of early warning systems by the Korea Forest Service (KFS), landslide damage remains substantial, with approximately 2345 hectares affected over the past 5 years, resulting in severe human and economic costs. The current 24-h early warning system, based on Tier 3 administrative division (Town), faces challenges in accurately identifying high-susceptibility landslide areas. Thus, a daily landslide susceptibility model that integrates landslide-associated conditioning factors with meteorological, topographic, and environmental data was designed to assess landslide susceptibility with a spatial resolution of 100 m. Using AutoML, we identified Random Forest as the optimal model for predicting landslide susceptibility. Training the model with landslide data from 2016 to 2022 resulted in an accuracy of 0.93, AUC of 0.98, and F1 score of 0.98. A kappa value of 0.85 indicated the effective classification of past landslides using testing data. Location-based validation using 2023 occurrences revealed highly susceptible classifications for 88% of 43 landslides, while spatial scale-based hazard assessment using observed data indicated high hazard for 96% of 607 landslides in Tiers 3 and 4 (Township). Weather forecasting was also found to affect accuracy, with 76% accuracy for forecasts made at 5:00 PM and 41% for forecasts made at 8:00 AM. It was confirmed that further calibration of forecasting data can enhance the performance of the susceptibility model. The designed process thus enhances landslide prevention and preparedness on both local and regional scales, offering a crucial tool for mitigating the impact of landslides in South Korea.
... The best values were selected based on performance metrics such as accuracy and AUC to improve the models' effectiveness in classifying academic and psychological challenges. As suggested by Hutter et al. (2019), common hyperparameter values such as random_state=42 and max_iter=1000 are used to ensure stable results across models. Table 9 below shows the hyperparameters tested for each model. To strengthen the performance evaluation and ensure that hyperparameter choices were not overfitted to a single data split, a k-fold cross-validation approach (k=5) was employed during hyperparameter tuning. ...
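A small sketch consistent with the setup described in this excerpt (fixed random_state=42 and max_iter=1000, 5-fold cross-validation during tuning); the model choice and grid values are assumptions, since Table 9 is not reproduced here.

```python
# 5-fold cross-validated tuning with fixed random_state and max_iter
# (assumed model and grid values, for illustration only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=15, random_state=42)  # stand-in data
grid = {"C": [0.01, 0.1, 1.0, 10.0]}                                         # assumed candidates
search = GridSearchCV(
    LogisticRegression(random_state=42, max_iter=1000),
    grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),           # k=5 folds
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```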
Abstract
Background/purpose. University students in Jordan face numerous challenges that affect their lifestyle on campus and academic performance. The most common challenges fall into two important categories: psychological and academic factors. Psychological factors include anxiety levels and daily sleep duration, while academic factors include GPA and study hours; these phenomena may influence one another, and such interactions may heighten negative effects. Furthermore, there is no solid research on the topic that provides solutions to both dimensions in one study. This paper provides a novel analysis-based framework to help identify students who face these challenges at an early stage and to provide quality services and consultation.

Materials/methods. The framework was developed from a questionnaire built in consultation with psychological and academic experts to extract features related to the important factors. The questionnaire was distributed to 1020 students from several Jordanian universities. The collected data covered three major sections (demographic, academic, and psychological factors) and were evaluated using the SPSS statistical analysis tool to ensure validity and reliability. The framework then uses a Large Language Model (LLM) to categorize each student's challenges into academic difficulties, combined academic and psychological challenges, psychological distress, or normal. Finally, multiple classifiers are applied to obtain the status of the students.

Results. The results show that the features collected from the questionnaires work well with all classifiers, achieving high accuracy. The contributions of this study include analyzing both academic and psychological factors and exploring their correlation through a case study conducted in Jordan. In addition, using the LLM for categorization along with the classifiers enables early intervention for students who suffer from academic challenges, psychological challenges, or both.

Conclusion. These findings suggest that early interventions targeting both academic and psychological factors are critical for improving student wellbeing and academic success, providing valuable insights for university support services.
... The aim is to identify the presence of sinkholes based on key characteristics that influence their occurrence. Additionally, selecting hyperparameters is a critical step in managing the learning process during model construction (Hutter et al. 2019). One of the key tasks in machine learning is to adjust these parameters to optimize performance automatically. ...
Sinkhole hazard mapping using automated visual techniques is challenging because of the difficulty in distinguishing solution depressions from non-sinkhole depressions, such as streams, channels, or man-made circular structures in digital images. While past researchers have proposed semi-automated visual techniques for identifying solution depressions, these methods typically entail a manual visual processing step in which actual sinkhole formations are identified by hand in a given geologic formation to establish a basic reference map that is subsequently applied to other areas in the specified geologic formation. This two-step process is lengthy and undermines the purpose of automated mapping. Using surface reflectance data from multispectral satellite imagery allows for identifying carbonate composition lithological units in a digital image. This study proposes integrating multispectral remote sensing with geological analysis to uncover crucial spectral patterns linked to surface mineralogy and environmental conditions associated with sinkhole formations. This integration aims to effectively identify the presence of sinkhole formations while excluding non-sinkhole artifacts from the analysis in a genuinely automated workflow. A crucial aspect of this study involved integrating high-resolution data from Landsat 8 Operational Land Imager (OLI) and Sentinel-2 Multispectral Instrument (MSI) imagery to distinguish rock units in a predominantly karst terrain for identifying surface depressions. In addition, we incorporated attributes covering morphometric, geomorphic, and physical soil properties derived from LiDAR-based topographic depressions. Prior studies have utilized supervised learning methods within machine learning frameworks on datasets containing confirmed sinkholes and non-sinkholes to improve the accuracy of mapping predictions. We utilized three machine learning techniques (Linear Regression, Random Forest, and Gradient Boosting) on the features database to conduct a comparative analysis, aiming to assess the enhancement of the methodology's effectiveness compared to other studies. We aimed to improve the classification of crucial features and minimize the need for an additional manual visual inspection step to distinguish non-sinkhole formations from potential sinkhole boundaries identified. Among these methods, Random Forest proved to be the most appropriate for recognizing features that directly indicate sinkholes. This approach yielded an impressive area under the Receiver Operating Characteristic (ROC) curve of 92%, showcasing its effectiveness in mapping sinkholes.
... The hyperparameters can be of different types (real values, e.g., learning rate; binary values, e.g., whether to use early stopping or not; integers, e.g., number of neighbors; categories, e.g., type of optimization algorithm). For a given dataset D, the objective of the optimization problem is defined by the relation [31]. ...
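The four hyperparameter types listed in this excerpt can be expressed as a single search space, for example with Optuna as below; the names, ranges, and placeholder score are illustrative, not the paper's formulation.

```python
# Mixed-type hyperparameter search space: real, binary, integer, and categorical
# (illustrative names and ranges; the returned score is a placeholder).
import optuna

def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)      # real-valued
    early_stopping = trial.suggest_categorical("early_stopping", [True, False])     # binary
    n_neighbors = trial.suggest_int("n_neighbors", 1, 30)                           # integer
    optimizer = trial.suggest_categorical("optimizer", ["sgd", "adam", "rmsprop"])  # categorical
    # placeholder score standing in for training/validating a model with these settings
    return learning_rate * n_neighbors * (2 if early_stopping else 1)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```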
The subject of this research is the development of a classifier based on machine learning (ML) that is able to recognize defective and healthy ball bearings. For this purpose, vibration measurements were performed on the bearings, on a total of 196 samples. For each recorded vibration signal, a feature extraction was performed by digital processing in the time domain. The following ML algorithms were used to develop the classifier: K-nearest neighbor (KNN) and support vector machine (SVM) as well as improved versions of the aforementioned algorithms. Improved versions of the mentioned algorithms were obtained by optimizing their hyperparameters. The corresponding models of the KNN and SVM algorithms showed a high percentage of success in classification, 98.5 % and 99.5 %, respectively. By optimizing the hyperparameters, models with a maximum classification success of 100 % were achieved.
... Designing task-specific pipelines manually is time-consuming and error-prone (Lazebnik, Somech, and Weinberg 2022). Automated ML (AutoML) addresses this by automating ML processes to create optimal pipelines (Hutter, Kotthoff, and Vanschoren 2019; Gu et al. 2024). ...
The end-to-end automated design of machine learning (ML) pipelines significantly reduces the workload for data scientists and democratizes ML for non-experts. Evolutionary algorithm (EA)-based automated ML (AutoML) systems, a prominent category of AutoML, often face inefficiencies due to the costly fitness evaluation of candidate ML pipelines. Although surrogate models have been employed to approximate the true performance of pipelines more quickly, a key challenge remains in effectively bridging the semantic gap between the heterogeneous features of datasets and pipelines. To address this issue, we propose ADELA, a novel accompanying surrogate-based optimization strategy that accelerates EA-based AutoML while retaining the performance of the resulting pipelines. ADELA operates in two phases: Offline, leveraging a high-quality curated pipeline corpus to meta-learn an accompanying surrogate model; and Online, selecting the accompanying pipeline and using the learned model to predict the performance of evaluation pipelines instead of executing them. The accompanying mechanism effectively mitigates the semantic gap between datasets and pipelines, enabling ADELA to reduce computation times by an average of 73.66% while retaining 98.78% of the final pipeline performance, as demonstrated in extensive experimental evaluations.
... Similar to the conventional ML algorithms, FL exhibits sensitivity to empirical choices of hyperparameters (HPs), such as learning rate, and optimization steps (Kairouz et al. 2021). Hyperparameter Tuning (HPT) is a vital yet challenging component of the ML pipeline, which has been extensively studied in the context of centralized ML (Hutter, Kotthoff, and Vanschoren 2019). However, traditional HPT methods, such as Bayesian Optimization (Snoek, Larochelle, and Adams 2012), are not suitable for FL systems. ...
Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online "tuning-while-training" framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.
Convolutional neural networks have become essential in computer vision, especially for image classification. They depend heavily on hyperparameters, and there is no practical way to manually tune these numerous settings through trial and error. This has made automated methods, especially those based on metaheuristic algorithms, necessary for optimizing the hyperparameters and building good network architectures. Metaheuristic algorithms provide an easy way of determining the best hyperparameters by generating and testing various combinations using intuitive strategies and principles of solution-finding. This review provides a comprehensive discussion of convolutional neural networks, such as their layers, architectural designs, types, and ways of improvement, with a focus on optimization using metaheuristic algorithms. It highlights prominent algorithms and recent studies aimed at improving hyperparameter selection. By consolidating the results of current research and pointing to future directions, this review should be a helpful resource for researchers, serving as the basis for further research and innovation in automated hyperparameter optimization using metaheuristic approaches and contributing significantly to further development in this field. The study concludes that metaheuristic algorithms significantly enhance the performance of convolutional neural networks, offering a simple yet effective replacement for manual tuning and high future prospects for automated optimization breakthroughs.
This research explores the emerging field of automated feature engineering in machine learning and stresses how machine learning and AI algorithms can potentially revolutionize predictive modeling. Automation will ease feature generation, transformation, and selection, making the process more efficient, boosting model performance, and reducing human bias. Feature engineering has traditionally been time-consuming and skill-based. The research examines several AI-based approaches, such as AutoML tools, feature extraction using deep learning, and feature selection using reinforcement learning. It also addresses problems like overfitting, scalability, and the complexity of preprocessing data. Case studies and applications illustrate the rising importance of machine learning automation. The paper concludes with future directions and trends in automated feature engineering, offering informative perspectives for practitioners and researchers.
This position paper argues that optimization problem solving can transition from expert-dependent to evolutionary agentic workflows. Traditional optimization practices rely on human specialists for problem formulation, algorithm selection, and hyperparameter tuning, creating bottlenecks that impede industrial adoption of cutting-edge methods. We contend that an evolutionary agentic workflow, powered by foundation models and evolutionary search, can autonomously navigate the optimization space, comprising problem, formulation, algorithm, and hyperparameter spaces. Through case studies in cloud resource scheduling and ADMM parameter adaptation, we demonstrate how this approach can bridge the gap between academic innovation and industrial implementation. Our position challenges the status quo of human-centric optimization workflows and advocates for a more scalable, adaptive approach to solving real-world optimization problems.
In the era of big data, the synergy between machine learning (ML) and data modeling has emerged as a cornerstone for predictive analytics. This article explores the integration of machine learning techniques with traditional data modeling approaches to enhance decision-making across various domains. By leveraging the strengths of both methodologies, organizations can unlock deeper insights, improve accuracy, and drive innovation. This article discusses key concepts, challenges, and applications, providing a roadmap for researchers and practitioners to harness the full potential of these technologies.
Numerous algorithms have been developed for continuous single-objective optimization (SOO), with performance evaluations typically based on statistical analyses over benchmark problems. However, this approach has limitations, as results often fail to generalize to new problem instances, impeding our ability to build trustworthy optimization methods. A significant challenge is the limited understanding of algorithm behavior, which varies across optimization problems and leaves these algorithms treated as black boxes. Landscape analysis offers insights into the meta-features of optimization problems, which are crucial for interpreting algorithm behavior. With the rise of Explainable AI (xAI), there is a growing interest in applying xAI to evolutionary algorithms. This chapter reviews our research on explainable landscape analysis and introduces a novel experiment using xAI to identify key landscape features that characterize optimization problems.
Ensuring safety and safeguarding indoor properties require reliable fire detection methods. Traditional detection techniques that use smoke, heat, or fire sensors often fail due to false positives and slow response times. Existing deep learning-based object detectors fall short on accuracy in indoor settings and on real-time tracking, given the dynamic nature of fire and smoke. This study aimed to address these challenges in fire and smoke detection in indoor settings. It presents a hyperparameter-optimized YOLOv5 (HPO-YOLOv5) model optimized by a genetic algorithm. To cover all prospective scenarios, we created a novel dataset comprising indoor fire and smoke images. There are 5,000 images in the dataset, split into training, validation, and testing samples at a ratio of 80:10:10. The study also used the Grad-CAM technique to provide visual explanations for model predictions, ensuring interpretability and transparency. This research combined YOLOv5 with DeepSORT (which uses deep learning features to improve the tracking of objects over time) to provide real-time monitoring of fire progression, enabling notification of actual fire hazards. With a mean average precision (mAP@0.5) of 92.1%, the HPO-YOLOv5 model outperformed state-of-the-art models, including Faster R-CNN, YOLOv5, YOLOv7, and YOLOv8. The proposed model achieved a 2.4% improvement in mAP@0.5 over the original YOLOv5 baseline model. The research lays a foundation for future developments in fire hazard detection technology that is dependable and effective in indoor scenarios.
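To make the genetic-algorithm hyperparameter search above concrete, the following is a minimal sketch of a GA over a detector-style hyperparameter space. It is not the study's code: the search space, mutation scale, and the fitness callback (assumed to train a model and return validation mAP@0.5) are illustrative placeholders.

# Illustrative sketch, not the paper's implementation: a minimal genetic algorithm
# over a small detector-style hyperparameter space.
import random

SPACE = {                      # (low, high) bounds for each hyperparameter (assumed)
    "lr0":      (1e-4, 1e-1),  # initial learning rate
    "momentum": (0.80, 0.98),
    "mosaic":   (0.0, 1.0),    # mosaic augmentation probability
}

def sample():
    return {k: random.uniform(*b) for k, b in SPACE.items()}

def mutate(ind, rate=0.3):
    child = dict(ind)
    for k, (lo, hi) in SPACE.items():
        if random.random() < rate:
            child[k] = min(hi, max(lo, child[k] + random.gauss(0, 0.1 * (hi - lo))))
    return child

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def evolve(fitness, generations=10, pop_size=8, elite=2):
    # fitness(config) would train the detector and return validation mAP@0.5
    pop = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:elite]   # keep the best
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(pop_size - elite)]
    return max(pop, key=fitness)

# Toy fitness standing in for detector training (peaks near lr0=0.01, momentum=0.9):
toy = lambda hp: -((hp["lr0"] - 0.01) ** 2 + (hp["momentum"] - 0.9) ** 2 + (hp["mosaic"] - 0.5) ** 2)
print(evolve(toy))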
Artificial neural networks (ANNs) are essential machine learning models widely used in various fields and applications. These models rely on a vector of parameters, which must be computationally estimated. In this study, a fully connected multilayer perceptron ANN, a modern feedforward neural network with two input layers and two hidden layers (each containing 10 neurons), was developed to estimate the ground state binding energy of isotopes with odd mass numbers ranging from 17 to 339, covering 3414 nuclei. The ANN was applied to three models: the integrated nuclear model, the liquid drop model (LDM), and an empirical formula. The predicted ground state binding energies were evaluated using mean square error (MSE), correlation coefficient (R), and accuracy. To optimize the ANN's performance, parameters such as the number of hidden layers and learning rates were refined using the particle swarm optimization (PSO) algorithm. This optimization reduced the ANN error, achieving an MSE of 0.0099706 and a high accuracy of 99.736% for the LDM model. The correlation coefficient R demonstrated a strong association between the target and output values, confirming the accuracy and robustness of the models. The PSO algorithm's optimization further minimized errors and improved the results, validating the differences in binding energy between the three models and the ANN. This approach underscores the effectiveness of ANNs in modeling complex physical phenomena with high precision.
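As a complement to the abstract above, the sketch below shows a minimal particle swarm optimization loop in the standard inertia-weight form. The objective, the bounds, and the coefficient values are illustrative assumptions rather than the study's configuration; in the paper's setting the objective would wrap training the ANN and returning its validation MSE.

# Minimal PSO sketch; `objective` is a black-box function of a real-valued vector.
import numpy as np

def pso(objective, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T                       # bounds: list of (low, high) pairs
    x = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # velocity update
        x = np.clip(x + v, lo, hi)                              # position update
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)]
    return g, pbest_f.min()

# Toy usage: a quadratic stands in for the validation MSE of two hyperparameters.
best_x, best_f = pso(lambda x: float((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2),
                     bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_f)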
Linear accelerator–magnetic resonance (linac-MR) hybrid systems allow for real-time magnetic resonance imaging (MRI)-guided radiotherapy for more accurate dose delivery to the tumor and improved sparing of the adjacent healthy tissues. However, for real-time tumor detection, it is unfeasible for a human expert to manually contour (gold standard) the tumor at the fast imaging rate of a linac-MR. This study aims to develop a neural network-based tumor autocontouring algorithm with patient-specific hyperparameter optimization (HPO) and to validate its contouring accuracy using in vivo MR images of cancer patients. Two-dimensional (2D) intrafractional MR images were acquired at 4 frames/s using 3 tesla (T) MRI from 11 liver, 24 prostate, and 12 lung cancer patients. A U-Net architecture was applied for tumor autocontouring and was further enhanced by implementing HPO using the Covariance Matrix Adaptation Evolution Strategy. Six hyperparameters were optimized for each patient, for which intrafractional images and experts’ manual contours were input into the algorithm to find the optimal set of hyperparameters. For evaluation, Dice’s coefficient (DC), centroid displacement (CD), and Hausdorff distance (HD) were computed between the manual contours and autocontours. The performance of the algorithm was benchmarked against two standardized autosegmentation methods: non-optimized U-Net and nnU-Net. For the proposed algorithm, the mean (standard deviation) DC, CD, and HD of the 47 patients were 0.92 (0.04), 1.35 (1.03), and 3.63 (2.17) mm, respectively. Compared to the two benchmarking autosegmentation methods, the proposed algorithm achieved the best overall performance in terms of contouring accuracy and speed. This work presents the first tumor autocontouring algorithm applicable to the intrafractional MR images of liver and prostate cancer patients for real-time tumor-tracked radiotherapy. The proposed algorithm performs patient-specific HPO, enabling accurate tumor delineation comparable to that of experts.
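The patient-specific optimization above relies on the Covariance Matrix Adaptation Evolution Strategy. The sketch below shows the generic ask/tell loop of the open-source pycma package, as we recall its interface; the six-dimensional hyperparameter vector, its bounds, and the objective (which in practice would train and evaluate the U-Net and return 1 minus the Dice coefficient) are hypothetical placeholders, not the study's implementation.

import numpy as np
import cma  # open-source pycma package; interface as recalled, verify against your install

def one_minus_dice(hp_vector):
    # Placeholder objective: in practice this would decode the six hyperparameters,
    # train/evaluate the U-Net on the patient's images, and return 1 - Dice.
    # A simple quadratic stands in here so the loop runs end to end.
    return float(np.sum((hp_vector - 0.2) ** 2))

x0 = np.zeros(6)      # normalized starting point for the six hyperparameters (assumed)
sigma0 = 0.3          # initial step size in the normalized space
es = cma.CMAEvolutionStrategy(x0, sigma0, {"bounds": [-1, 1], "maxiter": 30})
while not es.stop():
    candidates = es.ask()                                         # sample a population
    es.tell(candidates, [one_minus_dice(c) for c in candidates])  # report fitness values
best_hp = es.result.xbest
print(best_hp)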
The increasing complexity and diversity of tasks in machine learning require the development of models capable of selecting the most suitable algorithm and its optimal configuration for specific problems. This paper explores the concept of meta-learning, where the goal is to train models that can autonomously identify and select the best algorithm, along with its hyperparameters, for a given task. By leveraging historical performance data from a variety of machine learning algorithms, the proposed approach focuses on building predictive models that can generalize across different tasks, leading to improved efficiency and performance in real-world applications. We investigate various methods, including supervised learning, reinforcement learning, and genetic algorithms, for training models that intelligently decide on the best algorithm and its configuration. The results demonstrate significant improvements in task-specific performance, showcasing the potential of adaptive machine learning systems in dynamic environments.
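In the same spirit as the abstract above, a minimal supervised formulation of algorithm selection can be sketched as a meta-model mapping dataset meta-features to the historically best algorithm. The meta-features, labels, and model choice below are illustrative assumptions, not the paper's setup.

# Hedged sketch: a meta-model that recommends an algorithm from dataset meta-features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical meta-dataset: rows = datasets, columns = meta-features
# (e.g., number of samples, number of features, class entropy); labels = best algorithm observed.
meta_X = np.array([[1000, 20, 0.90], [50000, 300, 0.40], [200, 5, 0.99], [8000, 40, 0.70]])
meta_y = np.array(["random_forest", "linear_svm", "knn", "gradient_boosting"])

meta_model = RandomForestClassifier(n_estimators=200, random_state=0)
meta_model.fit(meta_X, meta_y)

new_dataset = np.array([[7000, 35, 0.72]])
print(meta_model.predict(new_dataset))   # recommended algorithm for the new task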
Machine learning (ML) has become a popular tool for prediction in many fields due to its capability to automatically learn patterns from training datasets and predict targets accurately. The main objective of this manuscript is to evaluate the capability of these computational techniques in predicting thinning and forming force in a two-point incremental forming (TPIF) process. Different machine learning (ML) algorithms have been built using a dataset from the literature to select the most appropriate approach for predicting the output responses. The dataset includes four input parameters: tool nose diameter, step-down increment, sheet thickness, and wall angle. An optimal set of hyperparameters has been identified to improve the ML models' performance. Prediction accuracy is evaluated using commonly adopted error metrics for regression, such as the mean squared error (MSE), the mean absolute error (MAE), and the R². The ML algorithms evaluated in this work were Multilayer Perceptron, Lasso, Random Forest, Support Vector Regression, and Gaussian Processes. Traditional polynomial regression with various degrees was also used for comparison purposes. It is found that the SVR model with a linear kernel is the most appropriate regression model for the two predicted responses.
Techniques for automatically designing deep neural network architectures, such as reinforcement learning based approaches, have recently shown promising results. However, their success relies on vast computational resources (e.g., hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, previously validated networks can be reused for further exploration, saving a large amount of computational cost. We apply our method to explore the architecture space of plain convolutional neural networks (no skip-connections, branching, etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves a 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (e.g., inexperienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is the data mining step (the model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from Exploratory Factor Analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform that is designed to store and organize metadata about datasets, data mining algorithms, models, and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with 3 different performance measures of 4 classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different metadata from OpenML as the biggest source of data in this domain.
Algorithm selection methods can be sped up substantially by incorporating multi-objective measures that give preference to algorithms that are both promising and fast to evaluate. In this paper, we introduce such a measure, A3R, and incorporate it into two algorithm selection techniques: average ranking and active testing. Average ranking combines algorithm rankings observed on prior datasets to identify the best algorithms for a new dataset. The aim of the second method is to iteratively select algorithms to be tested on the new dataset, learning from each new evaluation to intelligently select the next best candidate. We show how both methods can be upgraded to incorporate a multi-objective measure A3R that combines accuracy and runtime. It is necessary to establish the correct balance between accuracy and runtime, as otherwise time will be wasted by conducting less informative tests. The correct balance can be set by an appropriate parameter setting within the function A3R that trades off accuracy and runtime. Our results demonstrate that the upgraded versions of average ranking and active testing lead to much better mean interval loss values than their accuracy-based counterparts.
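For intuition, the sketch below implements a multi-objective measure of the same general shape as A3R: relative accuracy divided by relative runtime raised to a small exponent 1/N, so that larger N discounts runtime differences more. The exact parameterization and naming in the paper may differ; this is a paraphrase for illustration only.

def a3r_like(acc_candidate, acc_reference, time_candidate, time_reference, n=64):
    # Relative accuracy rewarded, relative runtime only mildly penalized (exponent 1/n).
    accuracy_ratio = acc_candidate / acc_reference
    runtime_ratio = (time_candidate / time_reference) ** (1.0 / n)
    return accuracy_ratio / runtime_ratio

# A slightly less accurate but much faster algorithm can score higher than the reference:
print(a3r_like(0.84, 0.86, time_candidate=2.0, time_reference=200.0, n=64))   # > 1
print(a3r_like(0.86, 0.86, time_candidate=200.0, time_reference=200.0, n=64)) # = 1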
While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger `teacher' network as input and outputs a compressed `student' network derived from the `teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large `teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input `teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller `teacher' networks can be used to rapidly speed up training on larger `teacher' networks.
WEKA is a widely used, open-source machine learning platform. Due to its intuitive interface, it is particularly popular with novice users. However, such users often find it hard to identify the best approach for their particular dataset among the many available. We describe the new version of Auto-WEKA, a system designed to help such users by automatically searching through the joint space of WEKA’s learning algorithms and their respective hyperparameter settings to maximize performance, using a state-of-the-art Bayesian optimization method. Our new package is tightly integrated with WEKA, making it just as accessible to end users as any other learning algorithm.
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.
Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results below 4% and 19%, respectively. Our source code is available at https://github.com/loshchil/SGDR.
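The schedule described above can be written down compactly. The sketch below reproduces the usual cosine-annealing-with-warm-restarts form, with an initial cycle length T_0 that grows by a factor t_mult after each restart; the particular constants are illustrative defaults, not the paper's settings.

# Cosine-annealed learning rate with warm restarts (SGDR-style schedule).
import math

def sgdr_lr(epoch, eta_min=1e-5, eta_max=0.1, t_0=10, t_mult=2):
    t_i, start = t_0, 0
    while epoch >= start + t_i:          # advance to the current restart cycle
        start += t_i
        t_i *= t_mult
    t_cur = epoch - start
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

schedule = [round(sgdr_lr(e), 4) for e in range(30)]
print(schedule)   # decays over 10 epochs, restarts at full rate, then decays over 20 epochs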
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
A comprehensive introduction to Support Vector Machines and related kernel methods.
In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics.
Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.
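The core subroutine behind Hyperband is successive halving: evaluate many randomly sampled configurations on a small budget, keep the top fraction, and repeat with a larger budget. The sketch below shows only this inner loop (Hyperband itself runs several such brackets with different budget trade-offs); sample_config and evaluate are problem-specific callbacks assumed here for illustration.

# Compact successive-halving sketch (the inner loop of Hyperband), hedged illustration.
import math, random

def successive_halving(sample_config, evaluate, n=81, min_budget=1, eta=3):
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    for _ in range(int(math.log(n, eta))):
        scores = {id(c): evaluate(c, budget) for c in configs}   # higher score is better
        configs.sort(key=lambda c: scores[id(c)], reverse=True)
        configs = configs[: max(1, len(configs) // eta)]          # keep the top 1/eta
        budget *= eta                                             # grow the budget
    return configs[0]

# Toy usage: configurations are learning rates; the fake score prefers lr near 1e-2.
best = successive_halving(
    sample_config=lambda: {"lr": 10 ** random.uniform(-4, -1)},
    evaluate=lambda cfg, budget: -abs(math.log10(cfg["lr"]) + 2))
print(best)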
We introduce an automatic machine learning (AutoML) modeling architecture called Autostacker, which combines an innovative hierarchical stacking architecture and an Evolutionary Algorithm (EA) to perform efficient parameter search. Neither prior domain knowledge about the data nor feature preprocessing is needed. Using EA, Autostacker quickly evolves candidate pipelines with high predictive accuracy. These pipelines can be used as is or as a starting point for human experts to build on. Autostacker finds innovative combinations and structures of machine learning models, rather than selecting a single model and optimizing its hyperparameters. Compared with other AutoML systems on fifteen datasets, Autostacker achieves state-of-the-art or competitive performance in terms of both test accuracy and time cost.
Clinical prognostic models derived from large-scale healthcare data can inform critical diagnostic and therapeutic decisions. To enable off-the-shelf usage of machine learning (ML) in prognostic research, we developed AUTOPROGNOSIS: a system for automating the design of predictive modeling pipelines tailored for clinical prognosis. AUTOPROGNOSIS optimizes ensembles of pipeline configurations efficiently using a novel batched Bayesian optimization (BO) algorithm that learns a low-dimensional decomposition of the pipelines' high-dimensional hyperparameter space in concurrence with the BO procedure. This is achieved by modeling the pipelines' performances as a black-box function with a Gaussian process prior, and modeling the similarities between the pipelines' baseline algorithms via a sparse additive kernel with a Dirichlet prior. Meta-learning is used to warm-start BO with external data from similar patient cohorts by calibrating the priors using an algorithm that mimics the empirical Bayes method. The system automatically explains its predictions by presenting the clinicians with logical association rules that link patients' features to predicted risk strata. We demonstrate the utility of AUTOPROGNOSIS using 10 major patient cohorts representing various aspects of cardiovascular patient care.
Bayesian optimization has become a successful tool for optimizing the hyperparameters of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
How do people learn about complex functional structure? Taking inspiration from other areas of cognitive science, we propose that this is accomplished by harnessing compositionality: complex structure is decomposed into simpler building blocks. We formalize this idea within the framework of Bayesian regression using a grammar over Gaussian process kernels. We show that participants prefer compositional over non-compositional function extrapolations, that samples from the human prior over functions are best described by a compositional model, and that people perceive compositional functions as more predictable than their non-compositional but otherwise similar counterparts. We argue that the compositional nature of intuitive functions is consistent with broad principles of human cognition.
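A compositional kernel of the kind discussed above can be assembled directly from base kernels by addition and multiplication. The sketch below uses scikit-learn's Gaussian process kernels on synthetic data; the specific composition (periodic structure plus a smooth trend plus noise) is an illustrative choice, not the paper's grammar.

# Hedged sketch: composing Gaussian process kernels by addition, in the spirit of a kernel grammar.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.05 * X.ravel() ** 2 + 0.1 * np.random.RandomState(0).randn(50)

kernel = ExpSineSquared(length_scale=1.0, periodicity=6.0) \
         + RBF(length_scale=3.0) \
         + WhiteKernel(noise_level=0.01)          # periodic + smooth trend + observation noise
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
mean, std = gpr.predict(np.linspace(0, 12, 20)[:, None], return_std=True)
print(mean[:3], std[:3])                           # extrapolation with uncertainty estimates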
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
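The training principle is simple enough to state in a few lines: each batch is replaced by convex combinations of example pairs and their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The numpy sketch below illustrates only the batch construction; alpha = 0.2 is an illustrative value.

# Minimal mixup batch construction (illustrative sketch).
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng(0)):
    lam = rng.beta(alpha, alpha)                 # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))               # pair each example with a random partner
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = np.random.rand(4, 3, 32, 32).astype(np.float32)   # toy image batch
y = np.eye(10)[np.random.randint(0, 10, size=4)]       # one-hot labels
x_mix, y_mix = mixup_batch(x, y)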
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state-of-the-art accuracy on the ImageNet dataset.
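The function-preserving widening transformation can be illustrated on a two-layer network: new hidden units replicate randomly chosen existing ones, and the outgoing weights of each replicated unit are divided by its replication count, so the input-output mapping is unchanged before further training. The numpy sketch below follows this recipe and checks the invariance; it is a simplified illustration, not the paper's implementation.

# Hedged numpy sketch of a Net2WiderNet-style widening step.
import numpy as np

def widen(W1, b1, W2, new_width, rng=np.random.default_rng(0)):
    old_width = W1.shape[1]
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)        # how often each old unit is copied
    W1_new = W1[:, mapping]                                    # copy incoming weights and biases
    b1_new = b1[mapping]
    W2_new = W2[mapping, :] / counts[mapping][:, None]         # rescale outgoing weights
    return W1_new, b1_new, W2_new

# Sanity check on a tiny 2-layer ReLU network: the output is preserved exactly.
rng = np.random.default_rng(1)
W1, b1, W2 = rng.normal(size=(5, 8)), rng.normal(size=8), rng.normal(size=(8, 3))
x = rng.normal(size=(4, 5))
y_old = np.maximum(x @ W1 + b1, 0) @ W2
V1, c1, V2 = widen(W1, b1, W2, new_width=12)
y_new = np.maximum(x @ V1 + c1, 0) @ V2
print(np.allclose(y_old, y_new))                               # True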
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. The objective is to make these higher-level representations more abstract, with their individual features more invariant to most of the variations that are typically present in the training distribution, while collectively preserving as much as possible of the information in the input. Ideally, we would like these representations to disentangle the unknown factors of variation that underlie the training distribution. Such unsupervised learning of representations can be exploited usefully under the hypothesis that the input distribution P(x) is structurally related to some task of interest, say predicting P(y|x). This paper focuses on why unsupervised pre-training of representations can be useful, and how it can be exploited in the transfer learning scenario, where we care about predictions on examples that are not from the same distribution as the training distribution.
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
The original ImageNet dataset is a popular large-scale benchmark for training Deep Neural Networks. Since the cost of performing experiments (e.g., algorithm design, architecture search, and hyperparameter tuning) on the original dataset might be prohibitive, we propose to consider a downsampled version of ImageNet. In contrast to the CIFAR datasets and earlier downsampled versions of ImageNet, our proposed ImageNet32 (and its variants ImageNet64 and ImageNet16) contains exactly the same number of classes and images as ImageNet, with the only difference that the images are downsampled to 32×32 pixels per image (64×64 and 16×16 pixels for the variants, respectively). Experiments on these downsampled variants are dramatically faster than on the original ImageNet and the characteristics of the downsampled datasets with respect to optimal hyperparameters appear to remain similar. The proposed datasets and scripts to reproduce our results are available at https://image-net.org/download-images and https://github.com/PatrykChrabaszcz/Imagenet32_Scripts
A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. Typically, a dataset needs to be pre-processed before being mined. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives. As a consequence, non-experienced users become overwhelmed with pre-processing alternatives. In this paper, we show that the problem can be addressed by automating the pre-processing with the support of meta-learning. To this end, we analyzed a wide range of data pre-processing techniques and a set of classification algorithms. For each classification algorithm that we consider and a given dataset, we are able to automatically suggest the transformations that improve the quality of the results of the algorithm on the dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake.
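The stochastic affine combination described above can be sketched at the level of a single forward pass: the two parallel branches are mixed with a random convex weight per training example, and with a fixed weight of 0.5 at test time. The numpy sketch below shows only this forward mixing; the published method additionally draws an independent random weight for the backward pass, which a pure-numpy illustration does not capture.

# Hedged forward-pass sketch of shake-shake-style branch mixing.
import numpy as np

def shake_shake_forward(x, branch1, branch2, training, rng=np.random.default_rng(0)):
    if training:
        alpha = rng.random((x.shape[0],) + (1,) * (x.ndim - 1))  # one random weight per example
    else:
        alpha = 0.5                                               # deterministic average at test time
    return x + alpha * branch1(x) + (1 - alpha) * branch2(x)      # residual plus mixed branches

x = np.random.rand(8, 16)
b1 = lambda t: t @ np.random.rand(16, 16)   # stand-ins for the two residual branches
b2 = lambda t: t @ np.random.rand(16, 16)
out = shake_shake_forward(x, b1, b2, training=True)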
Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning—pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators—such as synthetic feature constructors—that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.
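For readers who want to try the tool, the snippet below shows illustrative usage along the lines of TPOT's scikit-learn-style interface; the constructor arguments and the export call follow our recollection of the library's documented API and should be checked against the installed version.

# Hedged usage sketch of TPOT (argument names assumed from the documented API).
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
tpot.fit(X_tr, y_tr)                      # evolves preprocessing + model pipelines
print(tpot.score(X_te, y_te))             # accuracy of the best pipeline found
tpot.export("best_pipeline.py")           # writes the winning pipeline as Python code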
All real-world classification problems require a carefully designed system to achieve the desired generalization performance. Developers need to select a useful feature subset and a classifier with suitable hyperparameters. Furthermore, a feature preprocessing method (e.g. scaling or pre-whitening) and a dimension reduction method (e.g. Principal Component Analysis (PCA), Autoencoders or other manifold learning algorithms) may improve the performance. The interplay of all these components is complex and a manual selection is time-consuming. This paper presents an automatic optimization framework that incorporates feature selection, several feature preprocessing methods, multiple feature transforms learned by manifold learning and multiple classifiers including all hyperparameters. The highly combinatorial optimization problem is solved with an evolutionary algorithm. Additionally, a multi-classifier based on the optimization trajectory is presented which improves the generalization. The evaluation on several datasets shows the effectiveness of the proposed framework.
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
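A minimal usage example of the system's scikit-learn-style wrapper is sketched below; the parameter values are illustrative, and the argument names follow our recollection of the xgboost Python package rather than anything stated in the paper.

# Hedged usage sketch of the xgboost scikit-learn wrapper.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)                                  # gradient-boosted trees on the training split
print(accuracy_score(y_te, model.predict(X_te)))       # held-out accuracy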
ChaLearn is organizing the Automatic Machine Learning (AutoML) contest for the IJCNN 2015, which challenges participants to solve classification and regression problems without any human intervention. Participants' code is automatically run on the contest servers to train and test learning machines. However, there is no obligation to submit code. Half of the prizes can be won by submitting prediction results only. Datasets of progressive difficulty are introduced throughout six rounds. (Participants can enter the competition in any round.) The rounds alternate phases in which learners are tested on datasets participants have not seen (AutoML), and phases in which participants have limited time to tweak their algorithms on those datasets to improve performance (Tweakathon). This challenge will push the state of the art in fully automatic machine learning on a wide range of real-world problems. The platform will remain available beyond the termination of the challenge.
As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.
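The random-embedding idea can be separated from the Bayesian optimization machinery: draw a random matrix A once, search over a low-dimensional box, and map each candidate back to the high-dimensional domain via A with clipping. In the sketch below, plain random search stands in for the Bayesian optimizer, and the toy objective, box bounds, and embedding dimension are illustrative assumptions.

# Hedged sketch of the random-embedding idea (random search stands in for BO).
import numpy as np

def rembo_style_search(objective, high_dim, low_dim=4, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(high_dim, low_dim))             # random embedding, drawn once
    best_y, best_f = None, np.inf
    for _ in range(iters):
        y = rng.uniform(-np.sqrt(low_dim), np.sqrt(low_dim), size=low_dim)
        x = np.clip(A @ y, -1.0, 1.0)                     # project back into the box [-1, 1]^D
        f = objective(x)
        if f < best_f:
            best_y, best_f = y, f
    return best_y, best_f

# Toy objective with low intrinsic dimensionality (only the first 3 coordinates matter):
obj = lambda x: float(np.sum((x[:3] - 0.5) ** 2))
print(rembo_style_search(obj, high_dim=1000))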
Model selection and hyperparameter optimization is crucial in applying machine learning to a novel dataset. Recently, a subcommunity of machine learning has focused on solving this problem with Sequential Model-based Bayesian Optimization (SMBO), demonstrating substantial successes in many applications. However, for computationally expensive algorithms the overhead of hyperparameter optimization can still be prohibitive. In this paper we mimic a strategy human domain experts use: speed up optimization by starting from promising configurations that performed well on similar datasets. The resulting initialization technique integrates naturally into the generic SMBO framework and can be trivially applied to any SMBO method. To validate our approach, we perform extensive experiments with two established SMBO frameworks (Spearmint and SMAC) with complementary strengths; optimizing two machine learning frameworks on 57 datasets. Our initialization procedure yields mild improvements for low-dimensional hyperparameter optimization and substantially improves the state of the art for the more complex combined algorithm selection and hyperparameter optimization problem.
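The initialization strategy is easy to express: describe each dataset by meta-features, find the previously seen datasets closest to the new one, and seed the optimizer with the configurations that performed best there. The sketch below uses a plain Euclidean nearest-neighbour lookup with placeholder meta-features and configurations; real implementations typically normalize meta-features and use more careful distance measures.

# Hedged sketch of meta-learning-based warm-starting for SMBO.
import numpy as np

def warmstart_configs(new_meta, past_meta, past_best_configs, k=3):
    # past_meta: (n_datasets, n_metafeatures); past_best_configs: best config per past dataset.
    dists = np.linalg.norm(past_meta - new_meta, axis=1)   # distance in meta-feature space
    nearest = np.argsort(dists)[:k]
    return [past_best_configs[i] for i in nearest]          # initial design for the optimizer

past_meta = np.array([[1e3, 20, 0.90], [5e4, 300, 0.40], [2e2, 5, 0.99], [8e3, 40, 0.70]])
past_best = [{"C": 1.0}, {"C": 100.0}, {"C": 0.1}, {"C": 10.0}]
print(warmstart_configs(np.array([7e3, 35, 0.72]), past_meta, past_best))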
We propose a novel and efficient algorithm for maximizing the observed log-likelihood of a multivariate normal data matrix with missing values. We show that our procedure, based on iteratively regressing the missing on the observed variables, generalizes the standard EM algorithm by alternating between different complete data spaces and performing the E-step incrementally. In this non-standard setup we prove numerical convergence to a stationary point of the observed log-likelihood. For high-dimensional data, where the number of variables may greatly exceed the sample size, we perform regularization using a Lasso-type penalty. This introduces sparsity in the regression coefficients used for imputation, permits fast computation, and warrants competitive performance in terms of estimating the missing entries. We show on simulated and real data that the new method often improves upon other modern imputation techniques such as k-nearest neighbors imputation, nuclear norm minimization, or a penalized likelihood approach with an ℓ1-penalty on the concentration matrix.
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
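For reference, the mechanism can be written in a few lines. The sketch below uses the common "inverted dropout" variant, which rescales the kept activations during training so that nothing needs to change at test time (the paper's original description instead scales the weights at test time); p = 0.5 is an illustrative rate.

# Minimal inverted-dropout sketch (illustrative, not the paper's implementation).
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    if not training or p == 0.0:
        return activations                     # test time: use the full, unthinned network
    mask = rng.random(activations.shape) >= p  # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)      # rescale to preserve the expected activation

h = np.random.rand(4, 8)
print(dropout(h, p=0.5, training=True))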
The success of hand-crafted machine learning systems in many applications raises the question of making machine learning algorithms more autonomous, i.e., reducing the requirement of expert input to a minimum. We discuss two strategies towards this goal: (1) automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and (2) the development of algorithms with reduced sets of hyperparameters. Since many research directions (e.g., deep learning) show a tendency towards increasingly complex algorithms with more and more hyperparameters, the demand for both of these strategies continuously increases. We review recent hyperparameter optimization methods and discuss data-driven approaches to avoid the introduction of hyperparameters using unsupervised learning. We end by discussing how these complementary strategies can work hand in hand, representing a very promising approach towards autonomous machine learning.
Traditional genetic programming only supports the use of arithmetic and logical operators on scalar features. The GTMOEP (Georgia Tech Multiple Objective Evolutionary Programming) framework builds upon this by also handling feature vectors, allowing the use of signal processing and machine learning functions as primitives, in addition to the more conventional operators. GTMOEP is a novel method for automated, data-driven algorithm creation, capable of outperforming human-derived solutions.
As an example, GTMOEP was applied to the problem of predicting how long an emergency responder can remain in a hazmat suit before the effects of heat stress cause the user to become unsafe. An existing third-party physics model was leveraged to predict core temperature from various situational parameters. However, a sustained high heart rate also means that a user is unsafe. To improve performance, GTMOEP was used to predict an expected pull time, computed from both thresholds during human trials.
GTMOEP produced solutions that dominate the predictions of the physics model alone in the multi-objective space, resulting in a safer algorithm for emergency responders to determine operating times in harsh environments. The program generated by GTMOEP will be deployed in a mobile application for their use.