Chapter

AutoCL: AutoML for Concept Learning

Chapter
Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use-case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate impacts of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.
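The gradient-smoothing recommendation lends itself to a short illustration. The sketch below averages input gradients over noise-perturbed copies of an input, in the spirit of SmoothGrad; it is a minimal PyTorch sketch under assumed names (`model`, `x`, and `target_class` are placeholders), not the framework evaluated in the paper.

```python
import torch

def smoothgrad(model, x, target_class, n_samples=25, sigma=0.1):
    """Average input gradients over Gaussian-perturbed copies of x.

    Minimal sketch: model is assumed to map a [1, ...] batch to class logits.
    """
    model.eval()
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy)[0, target_class].backward()  # gradient of the class logit
        grads += noisy.grad
    return grads / n_samples  # smoothed saliency map
```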
Conference Paper
Models computed using deep learning have been effectively applied to tackle various problems in many disciplines. Yet, the predictions of these models are often at most post-hoc and locally explainable. In contrast, class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic machine learning approaches are being successfully applied to learn class expressions, their application at large scale has been hindered by their impractical runtimes. Arguably, the reliance on myopic heuristic functions contributes to this limitation. We propose a novel neuro-symbolic class expression learning model, DRILL, to mitigate this limitation. By learning non-myopic heuristic functions with deep Q-learning, DRILL efficiently steers the standard search procedure in a quasi-ordered search space towards goal states. Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that DRILL converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirm that DRILL converges to goal states significantly faster (p-value < 1%) than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of DRILL, including pre-trained models, training and evaluation scripts.
Article
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods (for example, based on resampling error estimation for supervised machine learning) can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
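As a concrete illustration of the two simplest methods reviewed, the scikit-learn sketch below runs grid search and random search with the same budget on an SVM; the dataset, parameter ranges, and budget are illustrative choices, not recommendations from the article:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Grid search: exhaustively evaluates the 3 x 3 grid (9 configurations).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [1e-4, 1e-3, 1e-2]}, cv=5)
grid.fit(X, y)

# Random search: 9 configurations drawn from continuous log-uniform priors.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-5, 1e-1)},
    n_iter=9, cv=5, random_state=0,
)
rand.fit(X, y)
print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```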
Article
Within the research field of explainable artificial intelligence (XAI), a wide variety of terminologies, motivations, approaches, and evaluation criteria has emerged. With the number of XAI methods growing rapidly, researchers and practitioners need a taxonomy of methods: to grasp the breadth of the topic, to compare methods, and to select the right XAI method based on the traits required by a specific use-case context. Many taxonomies for XAI methods, of varying levels of detail and depth, can be found in the literature. While they often have a different focus, they also exhibit many points of overlap. This paper unifies these efforts and provides a complete taxonomy of XAI methods with respect to notions present in the current state of research. In a structured literature analysis and meta-study, we identified and reviewed more than 50 of the most cited and most current surveys on XAI methods, metrics, and method traits. After summarizing them in a survey of surveys, we merge the terminologies and concepts of the articles into a unified structured taxonomy. Single concepts therein are illustrated by more than 50 diverse example methods in total, which we categorize accordingly. The taxonomy may serve beginners, researchers, and practitioners alike as a reference and wide-ranging overview of XAI method traits and aspects. Hence, it provides foundations for targeted, use-case-oriented, and context-sensitive future research.
Article
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.
Article
Passengers significantly affect airport terminal energy consumption and indoor environmental quality. Accurate passenger forecasting provides important insights for airport terminals to optimize their operation and management. However, the COVID-19 pandemic has greatly increased the uncertainty in airport passenger traffic since 2020, and there are insufficient studies investigating which pandemic-related variables should be considered when forecasting airport passenger trends under the impact of COVID-19 outbreaks. In this study, the interrelationship between COVID-19 pandemic trends and passenger traffic at a major airport terminal in China was analyzed on a day-by-day basis. During COVID-19 outbreaks, three stages of passenger change were identified and characterized: the decline stage, the stabilization stage, and the recovery stage. A typical "sudden drop and slow recovery" pattern of passenger traffic was identified. A LightGBM model including pandemic variables was developed to forecast short-term daily passenger traffic at the airport terminal, and SHapley Additive exPlanations (SHAP) values were used to quantify the contribution of the input pandemic variables. Results indicated that including the pandemic variables reduced the model error by 27.7% compared to a baseline model. The cumulative numbers of COVID-19 cases in previous weeks were found to be stronger predictors of future passenger traffic than daily COVID-19 cases in the most recent week. In addition, the impact of pandemic control policies and passengers' travel behavior is discussed. Our empirical findings provide important implications for airport terminal operations in response to the ongoing COVID-19 pandemic.
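For context, a forecasting-plus-attribution pipeline of this general shape can be sketched with LightGBM and the shap package; the synthetic data and feature names below (e.g. `covid_cases_cum_2w`) are invented stand-ins for the study's variables, not its actual inputs.

```python
import lightgbm as lgb
import numpy as np
import shap

# Invented stand-ins: lagged passenger counts plus cumulative COVID-19 case
# counts, mimicking the kind of pandemic variables the study adds.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] - X[:, 2] + rng.normal(size=500)
feature_names = ["passengers_lag7", "passengers_lag14",
                 "covid_cases_cum_2w", "covid_cases_last_week"]

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X, y)

# SHAP values quantify each variable's contribution to the forecasts.
shap_values = shap.TreeExplainer(model).shap_values(X)
for name, imp in zip(feature_names, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {imp:.3f}")   # mean |SHAP| = global importance
```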
Article
Graph neural networks (GNNs) have been widely used in various graph analysis tasks. As graph characteristics vary significantly in real-world systems, the architecture parameters need to be tuned carefully for a given scenario to identify a suitable GNN. Neural architecture search (NAS) has shown its potential in discovering effective architectures for learning tasks in image and language modeling. However, existing NAS algorithms cannot be applied efficiently to the GNN search problem for two reasons. First, the large-step exploration in the traditional controller fails to learn the sensitive performance variations caused by slight architecture modifications in GNNs. Second, the search space is composed of heterogeneous GNNs, which prevents the direct adoption of parameter sharing among them to accelerate the search progress. To tackle these challenges, we propose an automated graph neural networks (AGNN) framework, which aims to find the optimal GNN architecture efficiently. Specifically, a reinforced conservative controller is designed to explore the architecture space with small steps. To accelerate the validation, a novel constrained parameter sharing strategy is presented to regularize the weight transferring among GNNs; it avoids training from scratch and saves computation time. Experimental results on benchmark datasets demonstrate that the architecture identified by AGNN achieves the best performance and search efficiency, compared with existing human-invented models and traditional search methods.
Article
Existing feature selection algorithms often cannot handle both supervised and unsupervised learning data, and the features they select can be hard to interpret. To address this, this paper proposed a hybrid feature selection model based on machine learning and knowledge graphs. Following this hybrid idea, the model uses supervised learning algorithms, unsupervised learning algorithms, and knowledge graph technology to model the problem from the perspective of both data features and text features. First, data-based feature weights are obtained through the machine learning model; then, text-based weights are obtained using knowledge graph technology, and the two weight sets are combined into a feature matrix with good explanatory properties that satisfies both the data and the text features. Finally, a case analysis shows that the proposed method is effective and interpretable.
Article
Feature selection is a knowledge discovery tool that provides an understanding of a problem through the analysis of its most relevant features. It aims at building better classifiers by listing significant features, which also helps in reducing computational load. High-throughput technologies and their recent advancements yield high-dimensional data, making feature selection practically mandatory for such datasets; this, in turn, calls into question the interpretability and stability of traditional feature selection algorithms. High correlation among features frequently produces multiple, equally optimal signatures, which makes traditional feature selection methods unstable and thereby reduces the confidence in the selected features. Stability is the robustness of the feature preferences a method produces to perturbations of the training samples; it indicates the reproducibility of the feature selection method. High stability of a feature selection algorithm is as important as high classification accuracy when evaluating feature selection performance. In this paper, we provide an overview of feature selection techniques and of the instability of feature selection algorithms, and we present some solutions that can handle the different sources of instability.
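One common way to quantify the stability discussed here is the average pairwise overlap of the subsets selected on perturbed data. A small sketch, assuming a univariate filter and Jaccard similarity as the stability index (one choice among several in the literature):

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

def selection_stability(X, y, k=10, n_runs=20, seed=0):
    """Mean pairwise Jaccard similarity of top-k subsets across bootstraps."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(y), size=len(y))       # bootstrap resample
        sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
        subsets.append(set(np.flatnonzero(sel.get_support())))
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))

X, y = make_classification(n_samples=200, n_features=100, n_informative=5,
                           random_state=0)
print(f"stability: {selection_stability(X, y):.2f}")   # 1.0 = fully stable
```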
Chapter
As data science becomes increasingly mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this chapter we present TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 21 of them, while experiencing minimal degradation in accuracy on 4 of the benchmarks—all without any domain knowledge or human input. As such, genetic programming-based AutoML systems show considerable promise in the AutoML domain.
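Invoking TPOT follows the usual scikit-learn conventions; a hedged sketch using the classic TPOT interface (budgets and dataset are illustrative, and later TPOT versions may differ from the v0.3 described above):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Genetic programming over pipelines: 5 generations of 20 candidates each.
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")   # write the winning pipeline as Python code
```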
Article
The availability of structured data has increased significantly over the past decade and several approaches to learn from structured data have been proposed. These logic-based, inductive learning methods are often conceptually similar, which would allow a comparison among them even if they stem from different research communities. However, so far no efforts have been made to define an environment for running learning tasks on a variety of tools, covering multiple knowledge representation languages. With SML-Bench, we propose a benchmarking framework to run inductive learning tools from the ILP and semantic web communities on a selection of learning problems. In this paper, we present the foundations of SML-Bench, discuss the systematic selection of benchmarking datasets and learning problems, and showcase an actual benchmark run on the currently supported tools.
Article
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.
Article
Feature filtering algorithms are commonly used for preprocessing high-dimensional datasets such as DNA microarrays. Feature selection ensemble learning is a developing field in which new, efficient, and robust feature selection algorithms are produced by combining existing ones. We propose a new algorithm called MeLiF, which builds an ensemble of ranking filters by learning a linear combination of their scores. We compare this algorithm on nine DNA-microarray datasets with basic feature filtering algorithms and ReliefF. MeLiF showed the best AUC score and competitive stability in comparison with the other methods.
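The underlying idea, scoring features with a learned linear combination of basic filters, can be sketched as follows. This is an illustrative simplification: two base filters and a coarse grid over the mixing weights stand in for MeLiF's filter set and its coordinate search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Base filters: each assigns a relevance score to every feature.
scores = np.vstack([f_classif(X, y)[0],
                    mutual_info_classif(X, y, random_state=0)])
scores /= scores.max(axis=1, keepdims=True)          # put filters on one scale

def auc_of_weights(w, k=10):
    """AUC of a classifier trained on the top-k features under weights w."""
    top = np.argsort(w @ scores)[-k:]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, top], y, cv=5, scoring="roc_auc").mean()

# A crude grid over mixing weights stands in for MeLiF's coordinate search.
best = max(((a, 1 - a) for a in np.linspace(0, 1, 11)),
           key=lambda w: auc_of_weights(np.array(w)))
print("best weights:", best)
```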
Article
Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability in terms of the dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features. The latter is due to the limitations of the employed goal functions leading to overestimation of the feature significance. To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM); both these methods use mutual information and the 'maximum of the minimum' criterion, which alleviates the problem of overestimation of the feature significance as demonstrated both theoretically and experimentally. The proposed methods are compared using eleven publicly available datasets with five competing methods. The results demonstrate that the JMIM method outperforms the other methods on most tested public datasets, reducing the relative average classification error by almost 6% in comparison to the next best performing method. The statistical significance of the results is confirmed by the ANOVA test. Moreover, this method produces the best trade-off between accuracy and stability.
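A greedy sketch of the JMIM rule on discretized features is shown below; it is an illustrative reimplementation of the 'maximum of the minimum' criterion, not the authors' code, and the histogram-style MI estimate is a simplification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import mutual_info_score
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
Xd = KBinsDiscretizer(n_bins=5, encode="ordinal",
                      strategy="quantile").fit_transform(X).astype(int)

def joint_mi(a, b, y):
    """I((A, B); Y), with the pair (A, B) encoded as one discrete variable."""
    return mutual_info_score(a * (b.max() + 1) + b, y)

def jmim(Xd, y, k=5):
    # Start with the single most informative feature.
    selected = [int(np.argmax([mutual_info_score(f, y) for f in Xd.T]))]
    while len(selected) < k:
        rest = [j for j in range(Xd.shape[1]) if j not in selected]
        # JMIM: pick the candidate whose *worst* joint MI with any
        # already-selected feature is largest ("maximum of the minimum").
        scores = [min(joint_mi(Xd[:, j], Xd[:, s], y) for s in selected)
                  for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

print(jmim(Xd, y))
```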
Article
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
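In scikit-learn terms, such a wrapper corresponds to `SequentialFeatureSelector`, which scores candidate subsets by cross-validating the very estimator to be deployed; a minimal sketch with a decision tree (a modern stand-in, not the paper's implementation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Forward wrapper search: greedily add the feature that most improves the
# tree's cross-validated accuracy, until 5 features are chosen.
sfs = SequentialFeatureSelector(DecisionTreeClassifier(random_state=0),
                                n_features_to_select=5, direction="forward",
                                cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))   # indices of the selected features
```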
Article
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Conference Paper
In this paper we focus on learning concept descriptions expressed in Description Logics. After stating the learning problem in this context, a FOIL-like algorithm is presented that can be applied to general DL languages, discussing related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation. Subsequently we present an experimental evaluation of the implementation of this algorithm performed on some real ontologies in order to empirically assess its performance.
Article
With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, is constrained by the lack of well-structured knowledge bases consisting of a sophisticated schema and instance data adhering to this schema. It is paramount that suitable automated methods for their acquisition, maintenance, and evolution will be developed. In this paper, we provide a learning algorithm based on refinement operators for the description logic ALCQ including support for concrete roles. We develop the algorithm from thorough theoretical foundations by identifying possible abstract property combinations which refinement operators for description logics can have. Using these investigations as a basis, we derive a practically useful complete and proper refinement operator. The operator is then cast into a learning algorithm and evaluated using our implementation DL-Learner. The results of the evaluation show that our approach is superior to other learning approaches on description logics, and is competitive with established ILP systems.
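To make the notion of a refinement operator concrete, here is a toy sketch that specializes class expressions over a tiny hand-made vocabulary. The string representation and the two refinement rules are illustrative only and far simpler than the complete and proper operator analyzed in the paper.

```python
# Toy downward refinement: specialize a class expression string by
# (a) conjoining an atomic concept, or (b) adding an existential restriction.
ATOMIC = ["Person", "Student", "Employee"]
ROLES = ["hasChild", "worksFor"]

def refine(expr):
    """Yield strictly more specific expressions (illustrative rules only)."""
    for a in ATOMIC:
        if a not in expr:
            yield f"({expr} AND {a})"
    for r in ROLES:
        yield f"({expr} AND EXISTS {r}.Thing)"

# One refinement step starting from the top concept:
for child in refine("Thing"):
    print(child)
```

A learner in this family interleaves such refinement steps with reasoner calls that check which individuals each candidate expression covers, keeping the candidates that best separate positive from negative examples.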
Article
In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for different OWL formats, reasoner interfaces, and learning problems. It is a cross-platform framework implemented in Java. The framework allows easy programmatic access and provides a command line interface, a graphical interface as well as a WSDL-based web service.
Article
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
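Its emphasis on API consistency means nearly every estimator is driven through the same fit/predict interface; a minimal sketch (any estimator could be substituted):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every estimator exposes fit/predict; pipelines chain them uniformly.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```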
Chapter
Class expression learning in description logics has long been regarded as an iterative search problem in an infinite conceptual space. Each iteration of the search process invokes a reasoner and a heuristic function. The reasoner finds the instances of the current expression, and the heuristic function computes the information gain and decides on the next step to be taken. As the size of the background knowledge base grows, search-based approaches for class expression learning become prohibitively slow. Current neural class expression synthesis (NCES) approaches investigate the use of neural networks for class expression learning in the attributive language with complement ($\mathcal{ALC}$). While they show significant improvements over search-based approaches in runtime and quality of the computed solutions, they rely on the availability of pretrained embeddings for the input knowledge base. Moreover, they are not applicable to ontologies in more expressive description logics. In this paper, we propose a novel NCES approach which extends the state of the art to the description logic $\mathcal{ALCHIQ(D)}$. Our extension, dubbed NCES2, comes with an improved training data generator and does not require pretrained embeddings for the input knowledge base as both the embedding model and the class expression synthesizer are trained jointly. Empirical results on benchmark datasets suggest that our approach inherits the scalability capability of current NCES instances with the additional advantage that it supports more complex learning problems. NCES2 achieves the highest performance overall when compared to search-based approaches and to its predecessor NCES. We provide our source code, datasets, and pretrained models at https://github.com/dice-group/NCES2.
Chapter
Many applications require explainable node classification in knowledge graphs. Towards this end, a popular "white-box" approach is class expression learning: Given sets of positive and negative nodes, class expressions in description logics are learned that separate positive from negative nodes. Most existing approaches are search-based approaches generating many candidate class expressions and selecting the best one. However, they often take a long time to find suitable class expressions. In this paper, we cast class expression learning as a translation problem and propose a new family of class expression learning approaches which we dub neural class expression synthesizers. Training examples are "translated" into class expressions in a fashion akin to machine translation. Consequently, our synthesizers are not subject to the runtime limitations of search-based approaches. We study three instances of this novel family of approaches based on LSTMs, GRUs, and set transformers, respectively. An evaluation of our approach on four benchmark datasets suggests that it can effectively synthesize high-quality class expressions with respect to the input examples in approximately one second on average. Moreover, a comparison to state-of-the-art approaches suggests that we achieve better F-measures on large datasets. For reproducibility purposes, we provide our implementation as well as pretrained models in our public GitHub repository at https://github.com/dice-group/NeuralClassExpressionSynthesis. Keywords: Neural network, Concept learning, Class expression learning, Learning from examples.
Article
Graph structured data has wide applicability in various domains such as physics, chemistry, biology, computer vision, and social networks, to name a few. Recently, graph neural networks (GNN) were shown to be successful in effectively representing graph structured data because of their good performance and generalization ability. However, explaining the effectiveness of GNN models is a challenging task because of the complex nonlinear transformations made over the iterations. In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method. GraphLIME is a generic GNN-model explanation framework that learns a nonlinear interpretable model locally in the subgraph of the node being explained. Through experiments on two real-world datasets, the explanations of GraphLIME are found to be markedly more descriptive than those of existing explanation methods.
Chapter
Concept learning approaches based on refinement operators explore partially ordered solution spaces to compute concepts, which are used as binary classification models for individuals. However, the number of concepts explored by these approaches can grow to the millions for complex learning problems. This often leads to impractical runtimes. We propose to alleviate this problem by predicting the length of target concepts before the exploration of the solution space. By these means, we can prune the search space during concept learning. To achieve this goal, we compare four neural architectures and evaluate them on four benchmarks. Our evaluation results suggest that recurrent neural network architectures perform best at concept length prediction with a macro F-measure ranging from 38% to 92%. We then extend the CELOE algorithm, which learns ALC concepts, with our concept length predictor. Our extension yields the algorithm CLIP. In our experiments, CLIP is at least 7.5× faster than other state-of-the-art concept learning algorithms for ALC—including CELOE—and achieves significant improvements in the F-measure of the concepts learned on 3 out of 4 datasets. For reproducibility, we provide our implementation in the public GitHub repository at https://github.com/dice-group/LearnALCLengths.
Article
As machine learning gains importance in business, scientific research, and everyday life, feature engineering, an indispensable part of machine learning tasks, has attracted growing attention from researchers. Within feature engineering, feature generation is an important component for producing new, effective information. Past research has sought to derive new features by operating on the available data alone, which imports no extra information. This paper therefore proposes a feature generation method based on knowledge graphs, exploiting the convenience of obtaining structured data from them, and introduces the basic way such a graph is applied. Using questionnaires and campus databases, a campus big-data mental-health assessment dataset is constructed; features are generated with the knowledge-graph-based method, the newly generated data are collated, and models are used for validation. The experimental results show that, after optimization, the recall rate on the dataset rises above 60% for all four ensemble learning models, which can provide preliminary guidance for judging the mental health status of students.
Article
The recent series of innovations in deep learning (DL) have shown enormous potential to impact individuals and society, both positively and negatively. DL models utilizing massive computing power and enormous datasets have significantly outperformed prior historical benchmarks on increasingly difficult, well-defined research tasks across technology domains such as computer vision, natural language processing, and human-computer interactions. However, DL's black-box nature and over-reliance on massive amounts of data condensed into labels and dense representations pose challenges for interpretability and explainability. Furthermore, DLs have not proven their ability to effectively utilize relevant domain knowledge critical to human understanding. This aspect was missing in early data-focused approaches and necessitated knowledge-infused learning (K-iL) to incorporate computational knowledge. This article demonstrates how knowledge, provided as a knowledge graph, is incorporated into DL using K-iL. Through examples from natural language processing applications in healthcare and education, we discuss the utility of K-iL towards interpretability and explainability.
Article
Micro-array technology generates high-dimensional data, and this high dimensionality hampers the learning capability of machine learning algorithms. Dimensionality can be reduced using feature selection (FS) techniques, an essential pre-processing step for high-dimensional data. In this work, a hybrid filter–wrapper approach is proposed for feature selection. The multi-attribute decision-making method Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) is used as a filter for informative feature extraction. Further, a Binary Jaya algorithm with a time-varying transfer function is proposed as a wrapper feature selector to find the optimal subset of features. The proposed approach is tested on 10 benchmark micro-array datasets and compared with state-of-the-art methods. Experimental results suggest that the proposed approach performs better in terms of classification accuracy and is 10 times faster than existing approaches.
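The TOPSIS filter stage ranks alternatives (here: features) by closeness to an ideal point across several criteria. A generic NumPy sketch under assumed criteria and weights; the Binary Jaya wrapper stage is omitted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

# Decision matrix: one row per feature, one column per criterion
# (both invented benefit criteria: F-statistic and mutual information).
criteria = np.column_stack([f_classif(X, y)[0],
                            mutual_info_classif(X, y, random_state=0)])
weights = np.array([0.5, 0.5])                       # illustrative weights

norm = criteria / np.linalg.norm(criteria, axis=0)   # vector normalization
v = norm * weights
ideal, anti = v.max(axis=0), v.min(axis=0)           # ideal / anti-ideal points
d_pos = np.linalg.norm(v - ideal, axis=1)
d_neg = np.linalg.norm(v - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)                  # TOPSIS closeness score

print("top features:", np.argsort(closeness)[::-1][:10])
```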
Article
Deep learning (DL) techniques have obtained remarkable achievements on various tasks, such as image recognition, object detection, and language modeling. However, building a high-quality DL system for a specific task highly relies on human expertise, hindering its wide application. Meanwhile, automated machine learning (AutoML) is a promising solution for building a DL system without human assistance and is being extensively studied. This paper presents a comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML. According to the DL pipeline, we introduce AutoML methods-covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS)-with a particular focus on NAS, as it is currently a hot sub-topic of AutoML. We summarize the representative NAS algorithms' performance on the CIFAR-10 and ImageNet datasets and further discuss the following subjects of NAS methods: one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS. Finally, we discuss some open problems related to the existing AutoML methods for future research.
Article
The Web of Data is one of the perspectives of the Semantic Web. In this context, concept learning services, supported by multirelational machine learning, have been integrated in various tools for knowledge engineers to carry out several tasks related to the construction, completion and maintenance of the knowledge bases: essentially they are used to elicit new candidate concept definitions (i.e. axioms regarding classes) to be incorporated in the knowledge bases possibly also as replacements for previous ones. Sundry reference approaches rely on a covering strategy to generalize input examples that can be regarded as a form of hill-climbing search that explores a huge discrete conceptual space. Methods adopting this strategy are known to be affected by an inherent problem of myopia. In particular, our DL-Foil has been shown to suffer from this problem as its algorithm is based on a stochastic yet informed exploration of the concept space, by means of a refinement operator, to generate partial descriptions iteratively. To tackle this problem and enhance the performance of our system we have introduced a series of extensions of the original DL-Foil algorithm, that have led to various releases of its spin-off DL-Focl. Essentially they aim at reducing the aforementioned problem through specific strategies grounded on either the integration of meta-heuristics, such as repeated hill-climbing and tabu search, or the employment of some form of lookahead. In this work, we present consolidated and extended releases of both DL-Foil and DL-Focl along various dimensions: better heuristics and stop conditions, more complex refinement operators with the possibility to perform the specialization adopting iterative deepening or lookahead strategies, improved versions of the algorithm based on the repeated hill-climbing strategy with new quality criteria and of the tabu search with a different policy for managing the local memory. All the implementations of these approaches have been extensively evaluated in three experimental sessions, involving various publicly available knowledge bases and fragments extracted from the Linked Data cloud, showing interesting results and indicating some lessons to be learned: our approaches outperformed a popular reference system from the DL-Learner framework on learning problems when the open-world semantics is explicitly considered. They also exhibited an analogous performance on a benchmark of datasets from contexts with an intended underlying closed-world semantics.
Conference Paper
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to explore the search space efficiently. Extensive experiments on real-world benchmark datasets have been done to demonstrate the superior performance of the developed framework over the state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The code and documentation are available at https://autokeras.com. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.
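Using the resulting Auto-Keras system is deliberately terse; a hedged sketch with the current high-level API (the structured-data task, trial budget, and epochs are placeholders, and the interface may differ from the version described in the paper):

```python
import autokeras as ak
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Architecture search over max_trials candidate networks.
clf = ak.StructuredDataClassifier(max_trials=5, overwrite=True)
clf.fit(X_train, y_train, epochs=10)
print(clf.evaluate(X_test, y_test))
```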
Conference Paper
The purpose of this study is to introduce new design criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) a define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) an easy-to-set-up, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiments conducted via an interactive interface. To prove our point, we introduce Optuna, an optimization software package that is the culmination of our effort to develop next-generation optimization software; Optuna is the first framework of its kind designed around the define-by-run principle. We present the design techniques that became necessary in developing software that meets the above criteria, and demonstrate the power of our new design through experimental results and real-world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
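The define-by-run principle means the search space is declared inside the objective function as it executes, rather than up front; a minimal Optuna example with an illustrative objective:

```python
import optuna

def objective(trial):
    # The search space is constructed dynamically, as the objective runs.
    x = trial.suggest_float("x", -10.0, 10.0)
    y = trial.suggest_float("y", -10.0, 10.0)
    return (x - 2) ** 2 + (y + 3) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```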
Article
Hyperparameters are important for machine learning algorithms since they directly control the behavior of training algorithms and have a significant effect on the performance of machine learning models. Several techniques have been developed and successfully applied in certain application domains, but this work demands professional knowledge and expert experience, and sometimes has to resort to brute-force search. Therefore, an efficient hyperparameter optimization algorithm that can optimize any given machine learning method would greatly improve the efficiency of machine learning. In this paper, we model the relationship between the performance of machine learning models and their hyperparameters with Gaussian processes. In this way, the hyperparameter tuning problem can be abstracted as an optimization problem, and Bayesian optimization is used to solve it. Bayesian optimization is based on the Bayesian theorem: it places a prior over the optimization function and gathers information from the previous samples to update the posterior of the optimization function, while a utility function selects the next sample point so as to maximize the optimization function. Several experiments were conducted on standard test datasets. The results show that the proposed method can find the best hyperparameters for widely used machine learning models, such as the random forest algorithm and neural networks, and even multi-grained cascade forest, under the consideration of time cost.
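The loop described above (prior over the objective, posterior updates from samples, a utility function choosing the next point) is what libraries such as scikit-optimize package up; a hedged sketch tuning two random-forest hyperparameters (ranges and budget are illustrative):

```python
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def objective(params):
    n_estimators, max_depth = params
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 max_depth=max_depth, random_state=0)
    return -cross_val_score(clf, X, y, cv=3).mean()  # minimize neg. accuracy

# A Gaussian-process surrogate plus an acquisition function pick each sample.
res = gp_minimize(objective, [Integer(10, 300), Integer(2, 20)],
                  n_calls=25, random_state=0)
print(res.x, -res.fun)
```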
Article
Although morphological features have been used in the diagnosis of a variety of neurological and psychiatric disorders, these features did not show significant discriminative value in identifying Autism Spectrum Disorder (ASD), possibly due to the omission of changes in structural similarities among cortical regions. In this study, structural images from 66 high-functioning adults with ASD and 66 matched typically-developing controls (TDC) were used to test the hypothesis of the abnormality of cortico-cortical relationship in ASD. Seven morphological features of each of the 360 brain regions were extracted and elastic network was used to quantify the similarities between each target region and all other regions. The similarities were used to construct multi-feature-based networks (MFN), which were then submitted to a support vector machine classifier to classify the individuals of the two groups. Results showed MFN significantly improved the accuracy of discriminating patients with ASD from TDCs (78.63%) compared to using morphological features only (< 65%). The combination of MFN with morphological features and other high-level MFN properties did not further enhance the classification performance. Our findings demonstrate that the variations in cortico-cortical similarities are important in the etiology of ASD and can be used as biomarkers in the diagnostic process.
Article
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.
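Hyperband's building block is successive halving: start many configurations on a small budget and repeatedly promote the best fraction to a larger one. A plain-Python sketch of a single bracket with an invented toy objective (the full algorithm runs several such brackets with different trade-offs):

```python
import math
import random

def successive_halving(sample_config, evaluate, n=27, min_budget=1, eta=3):
    """One bracket: start n random configs on a small budget, keep the best
    1/eta of them at each rung, and multiply the budget by eta."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(key=lambda t: t[0], reverse=True)       # higher is better
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

# Toy demo: configs are learning rates; budget mimics training iterations.
best = successive_halving(
    sample_config=lambda: 10 ** random.uniform(-4, 0),
    evaluate=lambda lr, b: -abs(math.log10(lr) + 2) + 0.01 * b,
    n=27)
print(best)
```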
Conference Paper
This paper proposes a mixed-initiative feature engineering approach using explicit knowledge captured in a knowledge graph complemented by a novel interactive visualization method. Using the explicitly captured relations and dependencies between concepts and their properties, feature engineering is enabled in a semi-automatic way. Furthermore, the results (and decisions) obtained throughout the process can be utilized for refining the features and the knowledge graph. Analytical requirements can then be conveniently captured for feature engineering -- enabling integrated semantics-driven data analysis and machine learning.
Article
In machine learning, one often encounters data sets where a general pattern is violated by a relatively small number of exceptions (for example, a rule that says that all birds can fly is violated by examples such as penguins). This complicates the concept learning process and may lead to the rejection of some simple and expressive rules that cover many cases. In this paper we present an approach to this problem in description logic learning by computing partial descriptions (which are not necessarily entirely complete) of both positive and negative examples and combining them. Our Symmetric Parallel Class Expression Learning approach enables the generation of general rules with exception patterns included. We demonstrate that this algorithm provides significantly better results (in terms of metrics such as accuracy, search space covered, and learning time) than standard approaches on some typical data sets. Further, the approach has the added benefit that it can be parallelised relatively simply, leading to much faster exploration of the search tree on modern computers.
Book
With the advent of the Semantic Web and Semantic Technologies, ontologies have become one of the most prominent paradigms for knowledge representation and reasoning. However, recent progress in the field faces a lack of well-structured ontologies with large amounts of instance data, due to the fact that engineering such ontologies requires a considerable investment of resources. Nowadays, knowledge bases often provide large volumes of data without sophisticated schemata. Hence, methods for automated schema acquisition and maintenance are sought. Schema acquisition is closely related to solving typical classification problems in machine learning, e.g. the detection of chemical compounds causing cancer. In this work, we investigate both the underlying machine learning techniques and their application to knowledge acquisition in the Semantic Web.
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
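The paper's central argument is easy to reproduce: with nine trials, a 3 x 3 grid tries only three distinct values of each hyperparameter, while nine random points try nine, so when only one hyperparameter matters random search probes it far more finely. A self-contained toy sketch:

```python
import itertools
import random

def loss(important, unimportant):
    # Only the first hyperparameter affects the outcome.
    return (important - 0.73) ** 2

# Grid: 9 trials, but only 3 distinct values along the important axis.
grid = [loss(a, b) for a, b in itertools.product([0.0, 0.5, 1.0], repeat=2)]

# Random: 9 trials, 9 distinct values along the important axis.
random.seed(0)
rand = [loss(random.random(), random.random()) for _ in range(9)]

print(f"grid best:   {min(grid):.4f}")
print(f"random best: {min(rand):.4f}")   # typically closer to the optimum
```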
Article
Monte Carlo is one of the most versatile and widely used numerical methods. Its convergence rate, $O(N^{-1/2})$, is independent of dimension, which shows Monte Carlo to be very robust but also slow. This article presents an introduction to Monte Carlo methods for integration problems, including convergence theory, sampling methods and variance reduction techniques. Accelerated convergence for Monte Carlo quadrature is attained using quasi-random (also called low-discrepancy) sequences, which are a deterministic alternative to random or pseudo-random sequences. The points in a quasi-random sequence are correlated to provide greater uniformity. The resulting quadrature method, called quasi-Monte Carlo, has a convergence rate of approximately $O((\log N)^k N^{-1})$. For quasi-Monte Carlo, both theoretical error estimates and practical limitations are presented. Although the emphasis in this article is on integration, Monte Carlo simulation of rarefied gas dynamics is also discussed. In the limit of small mean free path (that is, the fluid dynamic limit), Monte Carlo loses its effectiveness because the collisional distance is much less than the fluid dynamic length scale. Computational examples are presented throughout the text to illustrate the theory. A number of open problems are described.
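The two convergence rates can be compared numerically; the sketch below integrates a smooth function over the unit square with pseudo-random versus scrambled Sobol' points, using SciPy's `qmc` module as a modern stand-in for the article's examples:

```python
import numpy as np
from scipy.stats import qmc

def f(p):
    # Smooth integrand on [0,1]^2 with known integral (1 - cos 1)^2 ~ 0.2113.
    return np.sin(p[:, 0]) * np.sin(p[:, 1])

exact = (1 - np.cos(1.0)) ** 2
rng = np.random.default_rng(0)

for n in (2 ** 8, 2 ** 12, 2 ** 16):
    mc = f(rng.random((n, 2))).mean()                        # plain Monte Carlo
    sobol = qmc.Sobol(d=2, scramble=True, seed=0).random(n)  # quasi-random set
    qmc_est = f(sobol).mean()
    print(f"n={n:6d}  MC err={abs(mc - exact):.2e}  "
          f"QMC err={abs(qmc_est - exact):.2e}")
```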
Article
While the number of knowledge bases in the Semantic Web increases, the maintenance and creation of ontology schemata still remain a challenge. In particular creating class expressions constitutes one of the more demanding aspects of ontology engineering. In this article we describe how to adapt a semi-automatic method for learning OWL class expressions to the ontology engineering use case. Specifically, we describe how to extend an existing learning algorithm for the class learning problem. We perform rigorous performance optimization of the underlying algorithms for providing instant suggestions to the user. We also present two plugins, which use the algorithm, for the popular Protégé and OntoWiki ontology editors and provide a preliminary evaluation on real ontologies.
Article
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
Conference Paper
Multi-objective evolutionary algorithms which use non-dominated sorting and sharing have been mainly criticized for their (i) $O(MN^3)$ computational complexity (where $M$ is the number of objectives and $N$ is the population size), (ii) non-elitism approach, and (iii) the need for specifying a sharing parameter. In this paper, we suggest a non-dominated sorting based multi-objective evolutionary algorithm (we called it the Non-dominated Sorting GA-II or NSGA-II) which alleviates all the above three difficulties. Specifically, a fast non-dominated sorting approach with $O(MN^2)$ computational complexity is presented. Second, a selection operator is presented which creates a mating pool by combining the parent and child populations and selecting the best (with respect to fitness and spread) solutions. Simulation results on five difficult test problems show that the proposed NSGA-II is able to find a much better spread of solutions in all problems compared to PAES, another elitist multi-objective EA which pays special attention towards creating a diverse Pareto-optimal front. Because of NSGA-II's low computational requirements, elitist approach, and parameter-less sharing approach, NSGA-II should find increasing applications in the years to come.
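The fast non-dominated sorting step at the heart of NSGA-II can be written compactly; an illustrative Python re-implementation (all objectives minimized):

```python
def fast_non_dominated_sort(points):
    """Partition objective vectors into Pareto fronts (minimization assumed).

    Illustrative re-implementation of NSGA-II's sorting step.
    """
    dominates = lambda p, q: all(a <= b for a, b in zip(p, q)) and p != q
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # solutions that i dominates
    counts = [0] * n                        # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts, current = [], [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:          # all its dominators processed
                    nxt.append(j)
        current = nxt
    return fronts

# Front 0 is the Pareto-optimal set of these five objective vectors.
print(fast_non_dominated_sort([(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]))
```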