In preparation for things to come, we discuss a plain vanilla Python implementation of "the" greedy approximation algorithm for the set cover problem.
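The greedy heuristic referred to above repeatedly picks the subset that covers the most still-uncovered elements. The following is a minimal sketch of that idea, not the note's own code; the function name `greedy_set_cover` and the toy instance are assumptions made for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Return the indices of the subsets chosen by the greedy heuristic."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the subset that covers the most still-uncovered elements
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# hypothetical toy instance
universe = range(1, 11)
subsets = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {6, 7, 8, 9, 10},
           {1, 3, 5, 7, 9}, {2, 4, 6, 8, 10}]
cover = greedy_set_cover(universe, subsets)
print(cover)
```

Ties are broken in favor of the subset with the lowest index, which is a design choice of this sketch rather than part of the heuristic.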
We show that the fundamental tasks of sorting lists and building search trees or heaps can be modeled as quadratic unconstrained binary optimization problems (QUBOs). The idea is to understand these tasks as permutation problems and to devise QUBOs whose solutions represent appropriate permutation matrices. We discuss how to construct such QUBOs and how to solve them using Hopfield nets or (adiabatic) quantum computing. In short, we show that neurocomputing methods or quantum computers can solve problems usually associated with abstract data structures.
Cross-lingual Entity Typing (CLET) aims at improving the quality of entity type prediction by transferring semantic knowledge learned from rich-resourced languages to low-resourced languages. By utilizing multilingual transfer learning via the mixture-of-experts approach, our model dynamically captures the relationship between the target language and each source language and effectively generalizes to predict types of unseen entities in new languages. Extensive experiments on multilingual datasets show that our method significantly outperforms multiple baselines and can robustly handle negative transfer. We question the relationship between language similarity and the performance of CLET. A series of experiments refutes the common belief that more data is always better. We propose the Similarity Hypothesis for CLET as follows: the more similar the source and the target are, the better the performance will be; a large set of source languages with a high deviation of similarity may perform worse than one of its subsets whose members are more similar to the target than other sources.
This report documents the program and the outcomes of Dagstuhl Seminar 21362 "Structure and Learning", held from September 5 to 10, 2021. Structure and learning are among the most prominent topics in Artificial Intelligence (AI) today. Integrating symbolic and numeric inference was set as one of the next open AI problems at the Townhall meeting "A 20 Year Roadmap for AI" at AAAI 2019. In this Dagstuhl Seminar, we discussed related problems from an interdisciplinary perspective, in particular, Cognitive Science, Cognitive Psychology, Physics, Computational Humor, Linguistics, Machine Learning, and AI. This report overviews presentations and working groups during the seminar, and lists two open problems. Seminar: September 5-10, 2021, http://www.dagstuhl.de/21362
The scientific areas of artificial intelligence and machine learning are rapidly evolving and their scientific discoveries are drivers of scientific progress in areas ranging from physics or chemistry to life sciences and humanities. But machine learning is facing a reproducibility crisis that is clashing with the core principles of the scientific method: With the growing complexity of methods, it is becoming increasingly difficult to independently reproduce and verify published results and fairly compare methods. One possible remedy is maximal transparency with regard to the design and execution of experiments. For this purpose, best practices for handling machine learning experiments are summarized in this Coding Nugget. In addition, a convenient and simple library for tracking experimental results, meticulous-ml, is introduced in the final hands-on section.
We consider L2 support vector machines for binary classification. These are as robust as other kinds of SVMs but can be trained almost effortlessly. Indeed, having previously derived the corresponding dual training problem, we now show how to solve it using the Frank-Wolfe algorithm. In short, we show that it requires only a few lines of plain vanilla NumPy code to train an SVM.
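To illustrate how little code such a training procedure needs, here is a hedged sketch of Frank-Wolfe iterations over the probability simplex. It assumes, without confirmation from the abstract, that the dual has the form "minimize mu^T M mu subject to mu >= 0 and sum(mu) = 1" with M = (X X^T + 1 1^T) * (y y^T) + I / C; the function name `train_l2svm`, the parameter `C`, and the toy data are likewise assumptions.

```python
import numpy as np


def train_l2svm(X, y, C=1.0, n_iter=1000):
    """Frank-Wolfe for min_mu mu^T M mu over the probability simplex.

    X: (n, d) data matrix, y: (n,) labels in {-1, +1}.
    The form of M below is an assumption of this sketch.
    """
    n = len(y)
    M = (X @ X.T + 1.0) * np.outer(y, y) + np.eye(n) / C
    mu = np.full(n, 1.0 / n)            # start at the simplex barycenter
    for t in range(n_iter):
        grad = 2.0 * M @ mu             # gradient of the quadratic objective
        i = np.argmin(grad)             # simplex vertex minimizing the linearization
        step = 2.0 / (t + 2.0)          # standard diminishing step size
        mu *= 1.0 - step                # move towards vertex e_i
        mu[i] += step
    w = X.T @ (mu * y)                  # weight vector
    b = np.sum(mu * y)                  # bias
    return w, b


# two well-separated toy clusters
rng = np.random.default_rng(0)
X = np.vstack([0.3 * rng.normal(size=(20, 2)) + 3.0,
               0.3 * rng.normal(size=(20, 2)) - 3.0])
y = np.concatenate([np.ones(20), -np.ones(20)])
w, b = train_l2svm(X, y)
pred = np.sign(X @ w + b)
```

Because every iterate is a convex combination of simplex vertices, the constraints hold throughout without any projection step.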
Neural networks have the potential to be extremely powerful for computer-vision-related tasks, but can be computationally expensive. Classical methods, by comparison, tend to be relatively lightweight, albeit not as powerful. In this paper, we propose a method of combining parts from a classical system, called the Viola-Jones Object Detection Framework, with a modern ternary neural network to improve the efficiency of a convolutional neural net by replacing convolutional filters with a set of custom ones inspired by the framework. This reduces the number of operations needed for computing feature values with negligible effects on overall accuracy, allowing for a more optimized network.
In this note, we introduce some of the common terminology in digital image processing. We also have a very first look at how to work with digital images in Python and discuss how to read them from and write them to disk.
Deep neural networks such as Convolutional Neural Networks (CNNs) have been successfully applied to a wide variety of tasks, including time series forecasting. In this paper, we propose a novel approach for online deep CNN selection using saliency maps in the task of time series forecasting. We start with an arbitrary set of different CNN forecasters with various architectures. Then, we outline a gradient-based technique for generating saliency maps with a coherent design that allows the CNN forecasters to be specialized across different regions in the input time series using a performance-based ranking. In this framework, the selection of the adequate model is performed in an online fashion and the computation of saliency maps responsible for the model selection is achieved adaptively following drift detection in the time series. In addition, the saliency maps can be exploited to provide suitable explanations for the reason behind selecting a specific model at a certain time interval or instant. An extensive empirical study on various real-world datasets demonstrates that our method achieves excellent or on par results in comparison to the state-of-the-art approaches as well as several baselines.
We demonstrate that Hopfield networks can be used for hard vector quantization. To this end, we first formulate vector quantization as the problem of minimizing the mean discrepancy between kernel density estimates of two data distributions and then express it as a quadratic unconstrained binary optimization problem that can be solved by a Hopfield net. Our corresponding NumPy code is simple and consistently produces good results.
This note demonstrates that Hopfield nets can solve Sudoku puzzles. We discuss how to represent Sudokus in terms of binary vectors and how to express their rules and hints in terms of matrix-vector equations. This allows us to set up energy functions whose global minima encode the solution to a given puzzle. However, as these energy functions typically have numerous local minima, Hopfield nets with random selection or steepest descent updates rarely find the correct solution. We therefore consider stochastic Hopfield nets or Boltzmann machines whose neurons update according to a stochastic process called simulated annealing. Our corresponding NumPy code is comparatively simple and efficient and consistently yields good results.
We approach least squares optimization from the point of view of gradient flows. As a practical example, we consider a simple linear regression problem, set up the corresponding differential equation, and show how to solve it using SciPy.
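As a rough sketch of the gradient flow idea, one can integrate dw/dt = -X^T (X w - y), the negative gradient of the loss 0.5 * ||X w - y||^2, until it settles at the least squares solution. The toy data, the time horizon of 10, and the tolerances below are assumptions chosen for illustration; the note's own setup may differ.

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical toy data: noisy samples of the line y = 2 x + 1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])   # design matrix with bias column


def flow(t, w):
    # negative gradient of the loss 0.5 * ||X w - y||^2
    return -X.T @ (X @ w - y)


sol = solve_ivp(flow, (0.0, 10.0), np.zeros(2), rtol=1e-8, atol=1e-10)
w_flow = sol.y[:, -1]                            # state at the final time
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # reference solution
print(w_flow, w_lstsq)
```

The equilibrium of the flow satisfies the normal equations, which is why the final state can be compared against `np.linalg.lstsq`.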
We revisit Hopfield nets for bipartition clustering and tweak the underlying energy function such that it has a unique global minimum. In other words, we show how to remove ambiguity from the bipartition clustering problem. Our corresponding NumPy code is short and simple.
We show how max-sum diversification can be used to solve the k-clique problem, a well-known NP-complete problem. This reduction proves that max-sum diversification is NP-hard and provides a simple and practical method to find cliques of a given size using Hopfield networks.
We derive the dual problem of L2 support vector machine training. This involves setting up the Lagrangian of the primal problem and working with the Karush-Kuhn-Tucker conditions. As a payoff, we find that the dual poses a rather simple optimization problem that can be solved by the Frank-Wolfe algorithm.
We revisit Hopfield nets for bipartition clustering and show how to invoke the kernel trick to increase robustness and versatility. Our corresponding NumPy code is short and simple.
We show that Hopfield networks can cluster numerical data into two salient clusters. Our derivation of a corresponding energy function is based on properties of the specific problem of 2-means clustering. Our corresponding NumPy code is short and simple.
We demonstrate that Hopfield networks can tackle the max-sum diversification problem. To this end, we express max-sum diversification as a quadratic unconstrained binary optimization problem which can be cast as a Hopfield energy minimization problem. Since max-sum diversification is an NP-hard subset selection problem, we cannot guarantee that Hopfield nets will discover an optimal solution. Nevertheless, our simple NumPy implementation consistently produces good results.
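The chain "subset selection, then QUBO, then Hopfield energy" can be sketched as follows. The QUBO used here, minimizing -z^T D z + lam * (sum(z) - k)^2 over binary z, is one common way to encode "pick k mutually distant items" and is an assumption of this sketch rather than the note's formulation; the function name `hopfield_max_sum` and the penalty weight `lam` are hypothetical.

```python
import numpy as np


def hopfield_max_sum(D, k, lam, n_sweeps=50, seed=0):
    """Asynchronous Hopfield updates for an assumed max-sum diversification QUBO."""
    n = D.shape[0]
    # QUBO in z in {0,1}^n: z^T Q z + q^T z (constant dropped)
    Q = -D + lam * np.ones((n, n))
    q = -2.0 * lam * k * np.ones(n)
    # substitute z = (s + 1) / 2 to obtain E(s) = -0.5 s^T W s + theta^T s
    W = -0.5 * Q
    np.fill_diagonal(W, 0.0)            # diagonal terms are constant since s_i^2 = 1
    theta = 0.5 * (Q.sum(axis=1) + q)

    def energy(s):
        return -0.5 * s @ W @ s + theta @ s

    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=n)  # random bipolar start state
    energies = [energy(s)]
    for _ in range(n_sweeps):
        for i in rng.permutation(n):
            # setting s_i to the sign of its local field never raises the energy
            s[i] = 1.0 if W[i] @ s - theta[i] >= 0 else -1.0
        energies.append(energy(s))
    return np.flatnonzero(s > 0), np.array(energies)


# hypothetical data: pairwise distances between 20 random points in the plane
pts = np.random.default_rng(1).normal(size=(20, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
selected, energies = hopfield_max_sum(D, k=5, lam=2.0 * D.max())
```

The updates only guarantee a local minimum of the energy, so the selected subset is not guaranteed to be optimal or to contain exactly `k` items.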
Having previously considered sorting as a linear programming problem, we now cast it as a quadratic unconstrained binary optimization problem (QUBO). Deriving this formulation is a bit cumbersome but it allows for implementing neural networks or even quantum computing algorithms that sort. Here, however, we consider a simple greedy QUBO solver and implement it using NumPy.
Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.
Oja's rule for neural principal component learning has a continuous analog called the Oja flow. This is a gradient flow on the unit sphere whose equilibrium points indicate the principal eigenspace of the training data. We briefly discuss characteristics of this flow and show how to solve its differential equation using SciPy.
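A minimal sketch of such a computation, assuming the flow takes the commonly stated form dw/dt = C w - (w^T C w) w with C the sample covariance matrix: the toy data, the starting point, and the time horizon are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical zero-mean data with standard deviations 2, 1 and 0.5 per axis
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.5])
data -= data.mean(axis=0)
C = data.T @ data / len(data)           # sample covariance matrix


def oja_flow(t, w):
    # assumed form of the Oja flow: dw/dt = C w - (w^T C w) w
    Cw = C @ w
    return Cw - (w @ Cw) * w


w0 = np.ones(3) / np.sqrt(3.0)          # unit-length starting point
sol = solve_ivp(oja_flow, (0.0, 20.0), w0, rtol=1e-8, atol=1e-10)
w_final = sol.y[:, -1]

# reference: leading eigenvector of C (eigh sorts eigenvalues ascending)
v_lead = np.linalg.eigh(C)[1][:, -1]
print(abs(w_final @ v_lead))
```

The comparison uses the absolute inner product because an eigenvector is only determined up to its sign.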
Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many annotated resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and mask mechanism in pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on six public datasets and three industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases.
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
Linear programming is a surprisingly versatile tool. That is, many problems we would not usually think of in terms of a linear programming problem can actually be expressed as such. In this note, we show that sorting is such a problem and discuss how to solve linear programs for sorting using SciPy.
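One way to see sorting as a linear program, offered here as an assumption rather than as the note's own formulation, is to search the doubly stochastic matrices for a matrix P that maximizes the inner product of P x with the rank vector (1, ..., n). For distinct inputs, the optimum is the permutation matrix that sorts x in ascending order. The function name `lp_sort` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lp_sort(x):
    """Sort distinct numbers by solving an LP over doubly stochastic matrices."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.arange(1, n + 1, dtype=float)
    # maximize sum_ij ranks[i] * P[i, j] * x[j], i.e. minimize its negative;
    # P is flattened row by row, so entry (i, j) sits at index i * n + j
    c = -np.outer(ranks, x).ravel()
    A_rows = np.kron(np.eye(n), np.ones((1, n)))   # each row of P sums to 1
    A_cols = np.kron(np.ones((1, n)), np.eye(n))   # each column of P sums to 1
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.ones(2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
    P = np.rint(res.x.reshape(n, n))               # round off solver noise
    return P @ x


values = [3.0, -1.0, 2.5, 0.0, 7.0]
sorted_values = lp_sort(values)
print(sorted_values)
```

The number of LP variables grows quadratically with the length of the list, so this is a demonstration of modeling power and not a practical sorting routine.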
Artificial intelligence for autonomous driving must meet strict requirements on safety and robustness, which motivates the thorough validation of learned models. However, current validation approaches mostly require ground truth data and are thus both cost-intensive and limited in their applicability. We propose to overcome these limitations by a model agnostic validation using a-priori knowledge from street maps. In particular, we show how to validate semantic segmentation masks and demonstrate the potential of our approach using OpenStreetMap. We introduce validation metrics that indicate false positive or negative road segments. Besides the validation approach, we present a method to correct the vehicle's GPS position so that a more accurate localization can be used for the street-map based validation. Lastly, we present quantitative results on the Cityscapes dataset indicating that our validation approach can indeed uncover errors in semantic segmentation masks.
Fake information poses one of the major threats for society in the 21st century. Identifying misinformation has become a key challenge due to the amount of fake news that is published daily. Yet, no approach is established that addresses the dynamics and versatility of fake news editorials. Instead of classifying content, we propose an evidence retrieval approach to handle fake news. The learning task is formulated as an unsupervised machine learning problem. For validation purposes, we provide the user with a set of news articles from reliable news sources supporting the hypothesis of the news article in query and the final decision is left to the user. Technically we propose a two-step process: (i) Aggregation-step: With information extracted from the given text we query for similar content from reliable news sources. (ii) Refining-step: We narrow the supporting evidence down by measuring the semantic distance of the text with the collection from step (i). The distance is calculated based on Word2Vec and the Word Mover's Distance. In our experiments, only content that is below a certain distance threshold is considered as supporting evidence. We find that our approach is agnostic to concept drifts, i.e. the machine learning task is independent of the hypotheses in a text. This makes it highly adaptable in times where fake news is as diverse as classical news is. Our pipeline offers the possibility for further analysis in the future, such as investigating bias and differences in news reporting.
We revisit the problem of numerically solving the Schrödinger equation for a one-dimensional quantum harmonic oscillator. We reconsider our previous finite difference scheme and discuss how higher order finite differences can lead to more accurate solutions. In particular, we will consider a five point stencil to approximate second order derivatives and implement the approach using SciPy functions for sparse matrices.
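A compact sketch of the five point stencil approach, assuming units in which the Hamiltonian reads H = -0.5 d^2/dx^2 + 0.5 x^2: the grid range, the grid size, and the use of shift-invert mode in `eigsh` are choices of this sketch, not necessarily those of the note.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 600
x = np.linspace(-8.0, 8.0, n)          # grid wide enough for the low-lying states
dx = x[1] - x[0]

# five point stencil (-1, 16, -30, 16, -1) / (12 dx^2) for the second derivative
D2 = diags([-1.0, 16.0, -30.0, 16.0, -1.0], [-2, -1, 0, 1, 2],
           shape=(n, n)) / (12.0 * dx**2)

# Hamiltonian: kinetic term plus harmonic potential on the diagonal
H = (-0.5 * D2 + diags(0.5 * x**2, 0)).tocsc()

# shift-invert around 0 returns the eigenvalues closest to zero
energies = np.sort(eigsh(H, k=3, sigma=0.0, return_eigenvectors=False))
print(energies)
```

In these units the exact energy levels are n + 1/2, which gives a direct check on the accuracy of the discretization.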
Most quantum mechanical systems cannot be solved analytically and therefore require numerical solution strategies. In this note, we consider a simple such strategy and discretize the Schrödinger equation that governs the behavior of a one-dimensional quantum harmonic oscillator. This leads to an eigenvalue / eigenvector problem over finite matrices and vectors which we then implement and solve using standard NumPy functions.
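A minimal sketch of such a discretization, assuming units in which the Hamiltonian reads H = -0.5 d^2/dx^2 + 0.5 x^2 and a standard three point finite difference for the second derivative: grid range and size are assumptions of this sketch.

```python
import numpy as np

n = 500
x = np.linspace(-8.0, 8.0, n)          # grid wide enough for the low-lying states
dx = x[1] - x[0]

# -0.5 * (psi[i+1] - 2 psi[i] + psi[i-1]) / dx^2  +  0.5 * x[i]^2 * psi[i]
main = 1.0 / dx**2 + 0.5 * x**2        # diagonal entries
off = -0.5 / dx**2 * np.ones(n - 1)    # first off-diagonal entries
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
energies, states = np.linalg.eigh(H)
print(energies[:3])
```

The columns of `states` hold the discretized wave functions that belong to the entries of `energies`.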
Having previously discussed how SciPy allows us to solve linear programs, we can study further applications of linear programming. Here, we consider least absolute deviation regression and solve a simple parameter estimation problem deliberately chosen to expose potential pitfalls in using SciPy's optimization functions.
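Least absolute deviation regression becomes a linear program once each residual is bounded by an auxiliary variable t_i. The sketch below also guards against one well-known pitfall of `scipy.optimize.linprog`, namely that variables are non-negative by default; whether this is the pitfall the note has in mind is an assumption. The function name `lad_fit` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lad_fit(X, y):
    """Minimize sum_i |y_i - x_i^T w| via an LP in the variables (w, t)."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])        # minimize sum of t
    # X w - t <= y  and  -X w - t <= -y  together encode |y - X w| <= t
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    # linprog defaults to bounds (0, None); w must be allowed to go negative
    bounds = [(None, None)] * d + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]


# hypothetical data: points on the line y = 2 x - 1 with a single outlier
x = np.arange(11, dtype=float)
y = 2.0 * x - 1.0
y[5] += 20.0
X = np.column_stack([np.ones_like(x), x])
w = lad_fit(X, y)
print(w)
```

With the default bounds, the negative intercept of this example could not be represented, and the solver would return a different fit without raising an error.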
This note discusses how to solve linear programming problems with SciPy. As a practical use case, we consider the task of computing the Chebyshev center of a bounded convex polytope.
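For a polytope given as {x : A x <= b}, the Chebyshev center is the center of the largest inscribed ball. It can be found by maximizing the radius r subject to a_i^T x + r * ||a_i|| <= b_i for every row a_i of A. The following is a small sketch of this standard formulation; the function name `chebyshev_center` and the unit square example are assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def chebyshev_center(A, b):
    """Center and radius of the largest ball inside {x : A x <= b}."""
    d = A.shape[1]
    norms = np.linalg.norm(A, axis=1)
    c = np.zeros(d + 1)
    c[-1] = -1.0                                  # maximize r = minimize -r
    A_ub = np.column_stack([A, norms])            # a_i^T x + r ||a_i|| <= b_i
    bounds = [(None, None)] * d + [(0, None)]     # x is free, r is non-negative
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]


# unit square 0 <= x1, x2 <= 1 written as A x <= b
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
center, radius = chebyshev_center(A, b)
print(center, radius)
```

Note that the bounds are set explicitly, since `linprog` would otherwise restrict the center coordinates to non-negative values.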
The optimization of submodular functions constitutes a viable way to perform clustering. Strong approximation guarantees and feasible optimization w.r.t. streaming data make this clustering approach favorable. Technically, submodular functions map subsets of data to real values, which indicate how "representative" a specific subset is. Optimal sets might then be used to partition the data space and to infer clusters. Exemplar-based clustering is one of the possible submodular functions, but suffers from high computational complexity. However, for practical applications, the particular real-time or wall-clock run-time is decisive. In this work, we present a novel way to evaluate this particular function on GPUs, which keeps the necessities of optimizers in mind and reduces wall-clock run-time. To discuss our GPU algorithm, we investigated both the impact of different run-time critical problem properties, like data dimensionality and the number of data points in a subset, and the influence of required floating-point precision. In reproducible experiments, our GPU algorithm was able to achieve competitive speedups of up to 72x depending on whether multi-threaded computation on CPUs was used for comparison and the type of floating-point precision required. Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computations.
Companies have an increasing demand for enriching documents with metadata. In an applied setting, we present a three-part workflow for the combination of multi-label classification and semantic tagging using a collection of key-phrases. The workflow is illustrated on the basis of patent abstracts with the CPC scheme. The key-phrases are drawn from a training set collection of documents without manual interaction. The union of CPC labels and key-phrases provides a label set on which a multi-label classifier model is generated by supervised training. We show learning curves for both key-phrases and classification categories, and a semantic graph generated from cosine similarities. We conclude that, given sufficient training data, the number of label categories is highly scalable.
In the present work we study classifiers' decision boundaries via Brownian motion processes in ambient data space and associated probabilistic techniques. Intuitively, our ideas correspond to placing a heat source at the decision boundary and observing how effectively the sample points warm up. We are largely motivated by the search for a soft measure that sheds further light on the decision boundary's geometry. En route, we bridge aspects of potential theory and geometric analysis (Maz'ya 2011, Grigor'Yan and Saloff-Coste 2002) with active fields of ML research such as adversarial examples and generalization bounds. First, we focus on the geometric behavior of decision boundaries in the light of adversarial attack/defense mechanisms. Experimentally, we observe a certain capacitory trend over different adversarial defense strategies: decision boundaries locally become flatter as measured by isoperimetric inequalities (Ford et al 2019); however, our more sensitive heat-diffusion metrics extend this analysis and further reveal that some non-trivial geometry invisible to plain distance-based methods is still preserved. Intuitively, we provide evidence that the decision boundaries nevertheless retain many persistent "wiggly and fuzzy" regions on a finer scale. Second, we show how Brownian hitting probabilities translate to soft generalization bounds which are in turn connected to compression and noise stability (Arora et al 2018), and these bounds are significantly stronger if the decision boundary has controlled geometric features.
Spatio-temporal data sets such as satellite image series are of utmost importance for understanding global developments like climate change or urbanization. However, incompleteness of data can greatly impact usability and knowledge discovery. In fact, there are many cases where not a single data point in the set is fully observed. For filling gaps, we introduce a novel approach that utilizes Markov random fields (MRFs). We extend the probabilistic framework to also consider empirical prior information, which allows to train even on highly incomplete data. Moreover, we devise a way to make discrete MRFs predict continuous values via state superposition. Experiments on real-world remote sensing imagery suffering from cloud cover show that the proposed approach outperforms state-of-the-art gap filling techniques.
The Abstraction and Reasoning Corpus (ARC) comprising image-based logical reasoning tasks is intended to serve as a benchmark for measuring intelligence. Solving these tasks is very difficult for off-the-shelf ML methods due to their diversity and low amount of training data. We here present our approach, which solves tasks via grammatical evolution on a domain-specific language for image transformations. With this approach, we successfully participated in an online challenge, scoring among the top 4% out of 900 participants.
The last decade has witnessed a rapid growth of the field of exoplanet discovery and characterisation. However, several big challenges remain, many of which could be addressed using machine learning methodology. For instance, the most prolific method for detecting exoplanets and inferring several of their characteristics, transit photometry, is very sensitive to the presence of stellar spots. The current practice in the literature is to identify the effects of spots visually and correct for them manually or discard the affected data. This paper explores a first step towards fully automating the efficient and precise derivation of transit depths from transit light curves in the presence of stellar spots. The methods and results we present were obtained in the context of the 1st Machine Learning Challenge organized for the European Space Agency's upcoming Ariel mission. We first present the problem, the simulated Ariel-like data and outline the Challenge while identifying best practices for organizing similar challenges in the future. Finally, we present the solutions obtained by the top-5 winning teams, provide their code and discuss their implications. Successful solutions either construct highly non-linear (w.r.t. the raw data) models with minimal preprocessing (deep neural networks and ensemble methods) or amount to obtaining meaningful statistics from the light curves and constructing linear models on them, which yields comparably good predictive performance.
The communication between data-generating devices is partially responsible for a growing portion of the world's power consumption. Thus reducing communication is vital from both an economic and an ecological perspective. For machine learning, on-device learning avoids sending raw data, which can reduce communication substantially. Furthermore, not centralizing the data protects privacy-sensitive data. However, most learning algorithms require hardware with high computation power and thus high energy consumption. In contrast, ultra-low-power processors, like FPGAs or micro-controllers, allow for energy-efficient learning of local models. Combined with communication-efficient distributed learning strategies, this reduces the overall energy consumption and enables applications that were previously impossible due to limited energy on local devices. The major challenge is then that the low-power processors typically only have integer processing capabilities. This paper investigates an approach to communication-efficient on-device learning of integer exponential families that can be executed on low-power processors, is privacy-preserving, and effectively minimizes communication. The empirical evaluation shows that the approach can reach a model quality comparable to a centrally learned regular model with an order of magnitude less communication. Comparing the overall energy consumption, this reduces the required energy for solving the machine learning task by a significant amount.
Traditional neural networks represent everything as a vector, and are able to approximate a subset of logical reasoning to a certain degree. As basic logic relations are better represented by topological relations between regions, we propose a novel neural network that represents everything as a ball and is able to learn topological configuration as an Euler diagram. Hence the name Euler Neural-Network (ENN). The central vector of a ball can inherit the representation power of a traditional neural network. ENN distinguishes four spatial statuses between balls, namely, being disconnected, being partially overlapped, being part of, and being inverse part of. Within each status, ideal values are defined for efficient reasoning. A novel back-propagation algorithm with six Rectified Spatial Units (ReSU) can optimize an Euler diagram representing logical premises, from which a logical conclusion can be deduced. In contrast to traditional neural networks, ENN can precisely represent all 24 different structures of Syllogism. Two large datasets are created: one, extracted from WordNet-3.0, covers all types of Syllogism reasoning; the other extracts all family relations from DBpedia. Experimental results confirm the superior power of ENN in logical representation and reasoning. Datasets and source code are available upon request.
This chapter continues the discussion of the last experiment in Chap. 10.1007/978-3-030-56275-5_6: Under what condition can the precision for the Task of Membership-Validation reach 100%? We will create a new type of Geometric Connectionist Machine for the Triple Classification task in Knowledge Graph reasoning. Our key question is: How shall we spatialize labeled tree structures onto vector embeddings?
We revisit the kernel minimum enclosing ball problem and show that it can be solved using simple recurrent neural networks. Once solved, the interior of a ball can be characterized in terms of a function of a set of support vectors and local minima of this function can be thought of as prototypes of the data at hand. For Gaussian kernels, these minima can be naturally found via a mean shift procedure and thus via another recurrent neurocomputing process. Practical results demonstrate that prototypes found this way are descriptive, meaningful, and interpretable.
Data mining and Machine Learning research has led to a wide variety of training methods and algorithms for different types of models. Many of these methods solve or approximate NP-hard optimization problems at their core, using vastly different approaches, some algebraic, others heuristic. This paper demonstrates another way of solving these problems by reducing them to quadratic polynomial optimization problems on binary variables. This class of parametric optimization problems is well-researched and powerful, and offers a unifying framework for many relevant ML problems that can all be tackled with one efficient solver. Because of the friendly domain of binary values, such a solver lends itself particularly well to hardware acceleration, as we further demonstrate in this paper by evaluating our problem reductions using FPGAs.
Variational Quantum Eigensolvers (VQEs) have recently attracted considerable attention. Yet, in practice, they still suffer from the effort of estimating cost function gradients for large parameter sets or from resource-demanding reinforcement strategies. Here, we therefore consider recent advances in weight-agnostic learning and propose a strategy that addresses the trade-off between finding appropriate circuit architectures and parameter tuning. We investigate the use of NEAT-inspired algorithms which evaluate circuits via genetic competition and thus circumvent issues due to excessive numbers of parameters. Our methods are tested both via simulation and on real quantum hardware and are used to solve the transverse Ising Hamiltonian and the Sherrington-Kirkpatrick spin model.
Deep neural network models represent the state-of-the-art methodologies for natural language processing. Here we build on top of these methodologies to incorporate temporal information and model how review data changes with time. Specifically, we use the dynamic representations of recurrent point process models, which encode the history of how business or service reviews are received in time, to generate instantaneous language models with improved prediction capabilities. Simultaneously, our methodologies enhance the predictive power of our point process models by incorporating summarized review content representations. We provide recurrent network and temporal convolution solutions for modeling the review content. We deploy our methodologies in the context of recommender systems, effectively characterizing the change in preference and taste of users as time evolves. Source code is available at .
Password guessing approaches via deep learning have recently been investigated with significant breakthroughs in their ability to generate novel, realistic password candidates. In the present work we study a broad collection of deep learning and probabilistic models in the light of password guessing: attention-based deep neural networks, autoencoding mechanisms and generative adversarial networks. We provide novel generative deep-learning models in terms of variational autoencoders exhibiting state-of-the-art sampling performance, yielding additional latent-space features such as interpolations and targeted sampling. Lastly, we perform a thorough empirical analysis in a unified controlled framework over well-known datasets (RockYou, LinkedIn, Youku, Zomato, Pwnd). Our results not only identify the most promising schemes driven by deep neural networks, but also illustrate the strengths of each approach in terms of generation variability and sample uniqueness.
Recent progress in recommender system research has shown the importance of including temporal representations to improve interpretability and performance. Here, we incorporate temporal representations in continuous time via recurrent point process for a dynamical model of reviews. Our goal is to characterize how changes in perception, user interest and seasonal effects affect review text.
Despite the high availability of financial and legal documents, they are often not utilized by text processing or machine learning systems, even though the need for automated processing and extraction of useful patterns from these documents is increasing. This is partly due to the presence of sensitive entities in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we consider the task of anonymization in financial and legal documents using state-of-the-art natural language processing methods. To this end, we present a web-based application to anonymize financial documents and also a large scale evaluation of different deep learning techniques.
Artificial intelligence for autonomous driving must meet strict requirements on safety and robustness. We propose to validate machine learning models for self-driving vehicles not only with given ground truth labels, but also with additional a-priori knowledge. In particular, we suggest to validate the drivable area in semantic segmentation masks using given street map data. We present first results, which indicate that prediction errors can be uncovered by map-based validation.
Reinforcement learning (RL) has recently shown impressive performance in complex game AI and robotics tasks. To a large extent, this is thanks to the availability of simulated environments such as OpenAI Gym, Atari Learning Environment, or Malmo which allow agents to learn complex tasks through interaction with virtual environments. While RL is also increasingly applied to natural language processing (NLP), there are no simulated textual environments available for researchers to apply and consistently benchmark RL on NLP tasks. With the work reported here, we therefore release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks such as sequence tagging, multi-label classification, and question answering. We also present experimental results for 6 tasks using different RL algorithms which serve as baselines for further research. The toolkit is published at https://github.com/rajcscw/nlp-gym