Article

Multiclass classification utilising an estimated algorithmic probability prior


... The simplicity bias upper bound [31] gives a way to bound the probability P(x) of observing an output pattern x based essentially only on an estimate of the Kolmogorov complexity of x. Simplicity bias analysis has been applied in many contexts, including machine learning and deep neural networks [33,34,35,36,37,38,39]. Simplicity bias and algorithmic probability are closely related to Occam's razor [40], a fundamental basis of scientific reasoning and model selection: the principle that simpler theories/models should be preferred over more complex ones, provided they explain the data equally well. ...
... Simplicity bias and the related bound have been found to apply in many systems including RNA shapes [31,38], protein shapes [51,49], models of plant growth [31], ordinary differential equation solution profiles [31], finite state transducers [32], natural time series [43], and others. In these systems, the probability of different output shapes, upon a random sampling of inputs, was nontrivially bounded and thereby predicted by Eq. (2). ...
... Arguments inspired by algorithmic information theory (AIT) have been used to predict the occurrence of simplicity bias in many real-world input-output maps, where complex patterns have exponentially low probabilities, and high probability patterns are simple [31]. This phenomenon has been observed in a very wide range of systems, including RNA shapes [31,38], protein shapes [51,49], models of plant growth [31], ordinary differential equation solution profiles [31], finite state transducer [32], natural time series [43], deep neural networks [33], 1D dynamical systems [27], and others. In this work, we have numerically investigated the presence of simplicity bias in digitized trajectories of the random logistic map. ...
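For orientation, the simplicity bias upper bound these excerpts refer to (the "Eq. (2)" mentioned above) has the general form reported in the cited works,

P(x) \lesssim 2^{-a \tilde{K}(x) - b},

where K̃(x) is an estimated (approximate) Kolmogorov complexity of the output pattern x, typically obtained from a lossless compression measure, and a and b are map-dependent constants.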
Preprint
Full-text available
Simplicity bias is an intriguing phenomenon prevalent in various input-output maps, characterized by a preference for simpler, more regular, or symmetric outputs. Notably, these maps typically feature high-probability outputs with simple patterns, whereas complex patterns are exponentially less probable. This bias has been extensively examined and attributed to principles derived from algorithmic information theory and algorithmic probability. In a significant advancement, it has been demonstrated that the renowned logistic map x_{k+1} = μ x_k(1 − x_k), a staple in dynamical systems theory, and other one-dimensional maps exhibit simplicity bias when conceptualized as input-output systems. Building upon this foundational work, our research delves into the manifestations of simplicity bias within the random logistic map, specifically focusing on scenarios involving additive noise. This investigation is driven by the overarching goal of formulating a comprehensive theory for the prediction and analysis of time series. Our primary contributions are multifaceted. We discover that simplicity bias is observable in the random logistic map for specific ranges of μ and noise magnitudes. Additionally, we find that this bias persists even with the introduction of small measurement noise, though it diminishes as noise levels increase. Our studies also revisit the phenomenon of noise-induced chaos, particularly when μ = 3.83, revealing its characteristics through complexity-probability plots. Intriguingly, we employ the logistic map to underscore a paradoxical aspect of data analysis: more data adhering to a consistent trend can occasionally lead to reduced confidence in extrapolation predictions, challenging conventional wisdom. We propose that adopting a probability-complexity perspective in analyzing dynamical systems could significantly enrich statistical learning theories related to series prediction and analysis. This approach not only facilitates a deeper understanding of simplicity bias and its implications but also paves the way for novel methodologies in forecasting complex systems behavior, especially in scenarios dominated by uncertainty and stochasticity.
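As a rough, purely illustrative sketch of the kind of numerical experiment described in this abstract (not the authors' exact procedure), the snippet below samples the random logistic map with additive noise, digitises each trajectory into a binary up/down pattern, and uses zlib-compressed length as a crude stand-in for Kolmogorov complexity; the parameter values and trajectory length are arbitrary choices.

```python
import random
import zlib
from collections import Counter

def digitise(traj):
    """Binary up/down pattern: '1' where the trajectory increases, '0' otherwise."""
    return "".join("1" if b > a else "0" for a, b in zip(traj, traj[1:]))

def complexity(pattern):
    """Crude stand-in for Kolmogorov complexity: zlib-compressed length in bytes."""
    return len(zlib.compress(pattern.encode()))

def random_logistic(mu, noise, n_steps, x0):
    """Iterate x -> mu*x*(1-x) plus uniform additive noise, clipped to [0, 1]."""
    x, traj = x0, []
    for _ in range(n_steps):
        x = mu * x * (1.0 - x) + random.uniform(-noise, noise)
        x = min(max(x, 0.0), 1.0)
        traj.append(x)
    return traj

random.seed(0)
n_samples = 10_000
counts = Counter()
for _ in range(n_samples):
    traj = random_logistic(mu=3.83, noise=1e-3, n_steps=26, x0=random.random())
    counts[digitise(traj)] += 1

# Simplicity bias would show up as high-probability patterns having low complexity.
for pattern, c in counts.most_common(10):
    print(f"P={c / n_samples:.4f}  K~={complexity(pattern)}  {pattern}")
```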
... From a very different perspective, algorithmic probability estimates have also been made via deriving a weaker form of the coding theorem, applicable in real-world contexts [9], taking the form of an upper bound. This weaker-form upper bound was applied in a range of input-output maps to make a priori predictions of the probability of different shapes and patterns, such as the probability of different RNA shapes appearing on a random choice of genetic sequence, or the probability of differential equation solution profile shapes on a random choice of input parameters, and several other examples [7,9,18,19]. Surprisingly, it was found that probability estimates could be made directly from the complexities of the shapes themselves, without recourse to the details of the map or reference to how the shapes were generated. The authors of [9] termed this phenomenon of an inverse relation between complexity and probability simplicity bias (SB). ...
... Are there common patterns across different systems dictating which outputs will be LKLP? How can we best incorporate simplicity bias probability predictions into other probability estimation approaches, such as machine learning [19]? ...
Article
Full-text available
Developing new ways to estimate probabilities can be valuable for science, statistics, engineering, and other fields. By considering the information content of different output patterns, recent work invoking algorithmic information theory inspired arguments has shown that a priori probability predictions based on pattern complexities can be made in a broad class of input-output maps. These algorithmic probability predictions do not depend on a detailed knowledge of how output patterns were produced, or historical statistical data. Although quantitatively fairly accurate, a main weakness of these predictions is that they are given as an upper bound on the probability of a pattern, but many low complexity, low probability patterns occur, for which the upper bound has little predictive value. Here, we study this low complexity, low probability phenomenon by looking at example maps, namely a finite state transducer, natural time series data, RNA molecule structures, and polynomial curves. Some mechanisms causing low complexity, low probability behaviour are identified, and we argue this behaviour should be assumed as a default in real-world algorithmic probability studies. Additionally, we examine some applications of algorithmic probability and discuss some implications of low complexity, low probability patterns for several research areas including simplicity in physics and biology, a priori probability predictions, Solomonoff induction and Occam’s razor, machine learning, and password guessing.
... The upper bound implies that complex output patterns must have low probabilities, while high-probability outputs must be simple. Example systems in which simplicity bias has been observed include RNA structures [1,6], differential equation solutions [1], finite state transducers [2], time series patterns in natural data [7], and natural protein structures [8], among others. Which systems will and will not show simplicity bias has yet to be determined, but the phenomenon is expected to appear in a wide class of input-output maps, under fairly general conditions. ...
Article
Full-text available
Arguments inspired by algorithmic information theory predict an inverse relation between the probability and complexity of output patterns in a wide range of input–output maps. This phenomenon is known as simplicity bias. By viewing the parameters of dynamical systems as inputs, and the resulting (digitised) trajectories as outputs, we study simplicity bias in the logistic map, Gauss map, sine map, Bernoulli map, and tent map. We find that the logistic map, Gauss map, and sine map all exhibit simplicity bias upon sampling of map initial values and parameter values, but the Bernoulli map and tent map do not. The simplicity bias upper bound on the output pattern probability is used to make a priori predictions regarding the probability of output patterns. In some cases, the predictions are surprisingly accurate, given that almost no details of the underlying dynamical systems are assumed. More generally, we argue that studying probability–complexity relationships may be a useful tool when studying patterns in dynamical systems.
Article
Full-text available
Unravelling the structure of genotype–phenotype (GP) maps is an important problem in biology. Recently, arguments inspired by algorithmic information theory (AIT) and Kolmogorov complexity have been invoked to uncover simplicity bias in GP maps, an exponentially decaying upper bound in phenotype probability with increasing phenotype descriptional complexity. This means that phenotypes with many genotypes assigned via the GP map must be simple, while complex phenotypes must have few genotypes assigned. Here, we use similar arguments to bound the probability P(x → y) that phenotype x, upon random genetic mutation, transitions to phenotype y. The bound is P(x → y) ≲ 2^{−a K̃(y|x) − b}, where K̃(y|x) is the estimated conditional complexity of y given x, quantifying how much extra information is required to make y given access to x. This upper bound is related to the conditional form of algorithmic probability from AIT. We demonstrate the practical applicability of our derived bound by predicting phenotype transition probabilities (and other related quantities) in simulations of RNA and protein secondary structures. Our work contributes to a general mathematical understanding of GP maps and may facilitate the prediction of transition probabilities directly from examining phenotypes themselves, without utilizing detailed knowledge of the GP map.
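A minimal sketch of how the conditional term K̃(y|x) in such a bound can be approximated in practice, assuming a compression-based estimate K̃(y|x) ≈ C(xy) − C(x); the helper names, the zlib compressor, and the constants a and b are illustrative assumptions rather than the authors' estimator.

```python
import zlib

def clen_bits(s: str) -> int:
    """Compressed length in bits, used as a rough complexity estimate."""
    return 8 * len(zlib.compress(s.encode()))

def cond_complexity(y: str, x: str) -> int:
    """Estimate K~(y|x) as the extra compressed length needed for y once x is known."""
    return max(clen_bits(x + y) - clen_bits(x), 0)

def transition_bound(y: str, x: str, a: float = 1.0, b: float = 0.0) -> float:
    """Evaluate the upper bound P(x -> y) <~ 2^(-a*K~(y|x) - b); a and b are placeholders."""
    return 2.0 ** (-a * cond_complexity(y, x) - b)

# Toy dot-bracket 'phenotypes': compare the bound for a structure close to x
# against one that differs more from x.
x = "(((...)))(((...)))"
y_close = "(((...)))(((....)))"
y_far = ".((((((..))).)))..."
print(transition_bound(y_close, x), transition_bound(y_far, x))
```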
Article
Full-text available
To what extent can we forecast a time series without fitting to historical data? Can universal patterns of probability help in this task? Deep relations between pattern Kolmogorov complexity and pattern probability have recently been used to make a priori probability predictions in a variety of systems in physics, biology and engineering. Here we study simplicity bias (SB) — an exponential upper bound decay in pattern probability with increasing complexity — in discretised time series extracted from the World Bank Open Data collection. We predict upper bounds on the probability of discretised series patterns, without fitting to trends in the data. Thus we perform a kind of ‘forecasting without training data’, predicting time series shape patterns a priori, but not the actual numerical value of the series. Additionally we make predictions about which of two discretised series is more likely with accuracy of ∼80%, much higher than a 50% baseline rate, just by using the complexity of each series. These results point to a promising perspective on practical time series forecasting and integration with machine learning methods.
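A hedged sketch of the "which of two series is more likely" comparison described above, using an up/down discretisation and a zlib-based complexity proxy; the series and the proxy are illustrative assumptions, not the data or estimator used in the paper.

```python
import random
import zlib

def discretise(series):
    """Up/down binary pattern of a numeric series."""
    return "".join("1" if b > a else "0" for a, b in zip(series, series[1:]))

def k_proxy(pattern):
    """Compression-based complexity proxy (an assumption; the paper uses its own estimator)."""
    return len(zlib.compress(pattern.encode()))

def more_likely(series_a, series_b):
    """Label of the series whose discretised pattern is simpler, i.e. judged a priori more probable."""
    return "A" if k_proxy(discretise(series_a)) <= k_proxy(discretise(series_b)) else "B"

random.seed(0)
trend = [100 + 2 * t for t in range(100)]                  # steadily increasing indicator
noisy = [100 + random.gauss(0, 5) for _ in range(100)]     # irregular fluctuations

print(more_likely(trend, noisy))   # the smooth trend should be judged simpler ("A")
```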
Article
Full-text available
The framework of Solomonoff prediction assigns prior probability to hypotheses inversely proportional to their Kolmogorov complexity. There are two well-known problems. First, the Solomonoff prior is relative to a choice of Universal Turing machine. Second, the Solomonoff prior is not computable. However, there are responses to both problems. Different Solomonoff priors converge with more and more data. Further, there are computable approximations to the Solomonoff prior. I argue that there is a tension between these two responses. This is because computable approximations to Solomonoff prediction do not always converge.
Article
Full-text available
Significance: Why does evolution favor symmetric structures when they only represent a minute subset of all possible forms? Just as monkeys randomly typing into a computer language will preferentially produce outputs that can be generated by shorter algorithms, so the coding theorem from algorithmic information theory predicts that random mutations, when decoded by the process of development, preferentially produce phenotypes with shorter algorithmic descriptions. Since symmetric structures need less information to encode, they are much more likely to appear as potential variation. Combined with an arrival-of-the-frequent mechanism, this algorithmic bias predicts a much higher prevalence of low-complexity (high-symmetry) phenotypes than follows from natural selection alone and also explains patterns observed in protein complexes, RNA secondary structures, and a gene regulatory network.
Article
Full-text available
Genotype–phenotype maps link genetic changes to their fitness effect and are thus an essential component of evolutionary models. The map between RNA sequences and their secondary structures is a key example and has applications in functional RNA evolution. For this map, the structural effect of substitutions is well understood, but models usually assume a constant sequence length and do not consider insertions or deletions. Here, we expand the sequence–structure map to include single nucleotide insertions and deletions by using the RNAshapes concept. To quantify the structural effect of insertions and deletions, we generalize existing definitions for robustness and non-neutral mutation probabilities. We find striking similarities between substitutions, deletions and insertions: robustness to substitutions is correlated with robustness to insertions and, for most structures, to deletions. In addition, frequent structural changes after substitutions also tend to be common for insertions and deletions. This is consistent with the connection between energetically suboptimal folds and possible structural transitions. The similarities observed hold both for genotypic and phenotypic robustness and mutation probabilities, i.e. for individual sequences and for averages over sequences with the same structure. Our results could have implications for the rate of neutral and non-neutral evolution.
Article
Full-text available
Morphospaces –representations of phenotypic characteristics– are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavouring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to preferentially appear as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Upon random mutations, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies that non-coding RNA SS shapes appear in the RNAcentral database to frequencies obtained upon random sampling of sequences. We show that: a) Only the most frequent structures appear in nature; the vast majority of possible structures in the morphospace have not yet been explored. b) Remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far. c) Perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude in variation, by the likelihood that structures appear upon uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection, but rather strong phenotype bias in the RNA genotype-phenotype map, a type of developmental bias or “findability constraint”, which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to “find”.
Article
Full-text available
We show how complexity theory can be introduced in machine learning to help bring together apparently disparate areas of current research. We show that this model-driven approach may require less training data and can potentially be more generalizable as it shows greater resilience to random attacks. In an algorithmic space, the order of its elements is given by their algorithmic probability, which arises naturally from computable processes. We investigate the shape of a discrete algorithmic space when performing regression or classification using a loss function parametrized by algorithmic complexity, demonstrating that the property of differentiation is not required to achieve results similar to those obtained using differentiable programming approaches such as deep learning. In doing so we use examples which enable the two approaches to be compared (small, given the computational power required for estimations of algorithmic complexity). We find and report that 1) machine learning can successfully be performed on a non-smooth surface using algorithmic complexity; 2) solutions can be found using an algorithmic-probability classifier, establishing a bridge between a fundamentally discrete theory of computability and a fundamentally continuous mathematical theory of optimization methods; 3) a formulation of an algorithmically directed search technique in non-smooth manifolds can be defined and conducted; and 4) exploitation techniques and numerical methods for algorithmic search can be used to navigate these discrete non-differentiable spaces. These are applied to (a) the identification of generative rules from data observations; (b) solutions to image classification problems that are more resilient to pixel attacks than neural networks; (c) the identification of equation parameters from a small data set in the presence of noise in a continuous ODE system; and (d) the classification of Boolean NK networks by (1) network topology, (2) underlying Boolean function, and (3) number of incoming edges.
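The following is a generic compression-based sketch of the idea of classifying by (estimated) algorithmic probability, assigning a test string to the class under which it compresses best; it is meant only to illustrate the flavour of such classifiers and is not the algorithm of the cited paper or of the article listed at the top of this page.

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length, used as a crude algorithmic-complexity estimate."""
    return len(zlib.compress(data, 9))

def fit(training):
    """training: dict mapping class label -> list of example strings; returns one corpus per class."""
    return {label: " ".join(examples).encode() for label, examples in training.items()}

def predict(models, x: str) -> str:
    """Assign x to the class whose corpus it extends most cheaply, i.e. the class
    under which x is most compressible (a proxy for highest algorithmic probability)."""
    extra = {label: clen(corpus + x.encode()) - clen(corpus) for label, corpus in models.items()}
    return min(extra, key=extra.get)

training = {
    "binary":  ["010101010101", "001100110011", "111000111000"],
    "letters": ["abcabcabcabc", "aabbccaabbcc", "abababababab"],
}
models = fit(training)
print(predict(models, "010011001100"))   # expected to be labelled "binary"
print(predict(models, "abcabcaabbcc"))   # expected to be labelled "letters"
```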
Article
Full-text available
Turing machines (TMs) are the canonical model of computation in computer science and physics. We combine techniques from algorithmic information theory and stochastic thermodynamics to analyze the thermodynamic costs of TMs. We consider two different ways of realizing a given TM with a physical process. The first realization is designed to be thermodynamically reversible when fed with random input bits. The second realization is designed to generate less heat, up to an additive constant, than any realization that is computable (i.e., consistent with the physical Church-Turing thesis). We consider three different thermodynamic costs: The heat generated when the TM is run on each input (which we refer to as the “heat function”), the minimum heat generated when a TM is run with an input that results in some desired output (which we refer to as the “thermodynamic complexity” of the output, in analogy to the Kolmogorov complexity), and the expected heat on the input distribution that minimizes entropy production. For universal TMs, we show for both realizations that the thermodynamic complexity of any desired output is bounded by a constant (unlike the conventional Kolmogorov complexity), while the expected amount of generated heat is infinite. We also show that any computable realization faces a fundamental trade-off among heat generation, the Kolmogorov complexity of its heat function, and the Kolmogorov complexity of its input-output map. We demonstrate this trade-off by analyzing the thermodynamics of erasing a long string.
Article
Full-text available
According to our current conception of physics, any valid physical theory is supposed to describe the objective evolution of a unique external world. However, this condition is challenged by quantum theory, which suggests that physical systems should not always be understood as having objective properties which are simply revealed by measurement. Furthermore, as argued below, several other conceptual puzzles in the foundations of physics and related fields point to limitations of our current perspective and motivate the exploration of an alternative: to start with the first-person (the observer) rather than the third-person perspective (the world). In this work, I propose a rigorous approach of this kind on the basis of algorithmic information theory. It is based on a single postulate: that universal induction determines the chances of what any observer sees next. That is, instead of a world or physical laws, it is the local state of the observer alone that determines those probabilities. Surprisingly, despite its solipsistic foundation, I show that the resulting theory recovers many features of our established physical worldview: it predicts that it appears to observers as if there was an external world that evolves according to simple, computable, probabilistic laws. In contrast to the standard view, objective reality is not assumed on this approach but rather provably emerges as an asymptotic statistical phenomenon. The resulting theory dissolves puzzles like cosmology's Boltzmann brain problem, makes concrete predictions for thought experiments like the computer simulation of agents, and suggests novel phenomena such as "probabilistic zombies" governed by observer-dependent probabilistic chances. It also suggests that some basic phenomena of quantum theory (Bell inequality violation and no-signalling) might be understood as consequences of this framework.
Conference Paper
Full-text available
Action models of Dynamic Epistemic Logic (DEL) represent precisely how actions are perceived by agents. DEL has recently been used to define infinite multi-player games, and it was shown that they can be solved in some cases. However, the dynamics being defined by the classic DEL update product for individual actions, only turn-based games have been considered so far. In this work we define a concurrent DEL product, propose a mechanism to resolve conflicts between actions, and define concurrent DEL games. As in the turn-based case, the obtained concurrent infinite game arenas can be finitely represented when all actions are public, or all are propositional. Thus we identify cases where the strategic epistemic logic ATL*K can be model checked on such games.
Article
Full-text available
Some established and also novel techniques in the field of applications of algorithmic (Kolmogorov) complexity currently co-exist for the first time and are here reviewed, ranging from dominant ones such as statistical lossless compression to newer approaches that advance, complement and also pose new challenges and may exhibit their own limitations. Evidence suggesting that these different methods complement each other for different regimes is presented and despite their many challenges, some of these methods can be better motivated by and better grounded in the principles of algorithmic information theory. It will be explained how different approaches to algorithmic complexity can explore the relaxation of different necessary and sufficient conditions in their pursuit of numerical applicability, with some of these approaches entailing greater risks than others in exchange for greater relevance. We conclude with a discussion of possible directions that may or should be taken into consideration to advance the field and encourage methodological innovation, but more importantly, to contribute to scientific discovery. This paper also serves as a rebuttal of claims made in a previously published minireview by another author, and offers an alternative account.
Article
Full-text available
For a broad class of input-output maps, arguments based on the coding theorem from algorithmic information theory (AIT) predict that simple (low Kolmogorov complexity) outputs are exponentially more likely to occur upon uniform random sampling of inputs than complex outputs are. Here, we derive probability bounds that are based on the complexities of the inputs as well as the outputs, rather than just on the complexities of the outputs. The more that outputs deviate from the coding theorem bound, the lower the complexity of their inputs. Since the number of low complexity inputs is limited, this behaviour leads to an effective lower bound on the probability. Our new bounds are tested for an RNA sequence to structure map, a finite state transducer and a perceptron. The success of these new methods opens avenues for AIT to be more widely used.
Article
Full-text available
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes can, to a large extent, be viewed from a unified perspective.
Article
Full-text available
Entropy and free-energy estimation are key in thermodynamic characterization of simulated systems ranging from spin models through polymers, colloids, protein structure, and drug design. Current techniques suffer from being model specific, requiring abundant computation resources and simulation at conditions far from the studied realization. Here, we present a universal scheme to calculate entropy using lossless-compression algorithms and validate it on simulated systems of increasing complexity. Our results show accurate entropy values compared to benchmark calculations while being computationally effective. In molecular-dynamics simulations of protein folding, we exhibit unmatched detection capability of the folded states by measuring previously undetectable entropy fluctuations along the simulation timeline. Such entropy evaluation opens a new window onto the dynamics of complex systems and allows efficient free-energy calculations.
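A minimal sketch of the underlying idea, estimating entropy per symbol from the length of a losslessly compressed sample; zlib and the toy two-state sequences are assumptions for illustration, not the scheme validated in the paper.

```python
import random
import zlib

def entropy_bits_per_symbol(samples: str) -> float:
    """Entropy estimate (an upper bound, up to compressor overhead) from the compressed size."""
    return 8 * len(zlib.compress(samples.encode(), 9)) / len(samples)

random.seed(1)
# Two-state toy system: a strongly biased spin sequence vs. an unbiased one.
biased = "".join(random.choices("ud", weights=[0.9, 0.1], k=50_000))
unbiased = "".join(random.choices("ud", k=50_000))

print(entropy_bits_per_symbol(biased))    # noticeably below 1 bit per symbol
print(entropy_bits_per_symbol(unbiased))  # close to (slightly above) 1 bit per symbol
```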
Article
Full-text available
Complex behaviour emerges from interactions between objects produced by different generating mechanisms. Yet to decode their causal origin(s) from observations remains one of the most fundamental challenges in science. Here we introduce a universal, unsupervised and parameter-free model-oriented approach, based on the seminal concept and the first principles of algorithmic probability, to decompose an observation into its most likely algorithmic generative models. Our approach uses a perturbation-based causal calculus to infer model representations. We demonstrate its ability to deconvolve interacting mechanisms regardless of whether the resultant objects are bit strings, space–time evolution diagrams, images or networks. Although this is mostly a conceptual contribution and an algorithmic framework, we also provide numerical evidence evaluating the ability of our methods to extract models from data produced by discrete dynamical systems such as cellular automata and complex networks. We think that these separating techniques can contribute to tackling the challenge of causation, thus complementing statistically oriented approaches.
Article
Full-text available
Information-theoretic-based measures have been useful in quantifying network complexity. Here we briefly survey and contrast (algorithmic) information-theoretic methods which have been used to characterize graphs and networks. We illustrate the strengths and limitations of Shannon’s entropy, lossless compressibility and algorithmic complexity when used to identify aspects and properties of complex networks. We review the fragility of computable measures on the one hand and the invariant properties of algorithmic measures on the other, demonstrating how current approaches to algorithmic complexity are misguided and suffer from limitations similar to those of traditional statistical approaches such as Shannon entropy. Finally, we review some current definitions of algorithmic complexity which are used in analyzing labelled and unlabelled graphs. This analysis opens up several new opportunities to advance beyond traditional measures.
Article
Full-text available
Many systems in nature can be described using discrete input-output maps. Without knowing details about a map, there may seem to be no a priori reason to expect that a randomly chosen input would be more likely to generate one output over another. Here, by extending fundamental results from algorithmic information theory, we show instead that for many real-world maps, the a priori probability P(x) that randomly sampled inputs generate a particular output x decays exponentially with the approximate Kolmogorov complexity K̃(x) of that output. These input-output maps are biased towards simplicity. We derive an upper bound P(x) ≲ 2^{−a K̃(x) − b}, which is tight for most inputs. The constants a and b, as well as many properties of P(x), can be predicted with minimal knowledge of the map. We explore this strong bias towards simple outputs in systems ranging from the folding of RNA secondary structures to systems of coupled ordinary differential equations to a stochastic financial trading model.
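For concreteness, a sketch of a Lempel-Ziv (1976) phrase-counting complexity estimate of the kind commonly used for K̃(x) in this literature, together with an evaluation of the resulting upper bound; the conversion from phrase count to bits and the constants a and b are placeholder assumptions.

```python
import math

def lz76_phrases(s: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # grow the current phrase while it already occurs in the preceding text
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def k_estimate(x: str) -> float:
    """Rough complexity in bits: phrase count times a log-size pointer cost (an assumption)."""
    return lz76_phrases(x) * math.log2(max(len(x), 2))

def simplicity_bias_bound(x: str, a: float = 1.0, b: float = 0.0) -> float:
    """Evaluate the upper bound P(x) <~ 2^(-a*K~(x) - b); a and b are map-dependent placeholders."""
    return 2.0 ** (-a * k_estimate(x) - b)

for x in ["0000000000000000", "0101010101010101", "0001101001000101"]:
    print(x, lz76_phrases(x), f"{simplicity_bias_bound(x):.3g}")
```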
Article
Full-text available
While the equilibrium properties, states, and phase transitions of interacting systems are well described by statistical mechanics, the lack of suitable state parameters has hindered the understanding of non-equilibrium phenomena in diverse settings, from glasses to driven systems to biology. The length of a losslessly compressed data file is a direct measure of its information content: The more ordered the data is, the lower its information content and the shorter the length of its encoding can be made. Here, we describe how data compression enables the quantification of order in non-equilibrium and equilibrium many-body systems, both discrete and continuous, even when the underlying form of order is unknown. We consider absorbing state models on and off-lattice, as well as a system of active Brownian particles undergoing motility-induced phase separation. The technique reliably identifies non-equilibrium phase transitions, determines their character, quantitatively predicts certain critical exponents without prior knowledge of the order parameters, and reveals previously unknown ordering phenomena. This technique should provide a quantitative measure of organization in condensed matter and other systems exhibiting collective phase transitions in and out of equilibrium.
Article
Full-text available
One of the most remarkable features of the > 3.5 billion year history of life on Earth is the apparent trend of innovation and open-ended growth of complexity. Similar trends are apparent in artificial and technological systems. However, a general framework for understanding open-ended evolution as it might occur in biological or technological systems has not yet been achieved. Here, we cast the problem within the broader context of dynamical systems theory to uncover and characterize mechanisms for producing open-ended evolution (OEE). We present formal definitions of two hallmark features of OEE: unbounded evolution and innovation. We define unbounded evolution as patterns that are non-repeating within the expected Poincaré recurrence time of an equivalent isolated system, and innovation as trajectories not observed in isolated systems. As a case study, we test three new variants of cellular automata (CA) that implement time-dependent update rules against these two definitions. We find that each is capable of generating conditions for OEE, but they vary in their ability to do so. Our results demonstrate that state-dependent dynamics, widely regarded as a hallmark feature of life, statistically out-perform other candidate mechanisms. It is also the only mechanism to produce OEE in a scalable manner, consistent with notions of OEE as ongoing production of complexity. Our results thereby suggest a new framework for unifying the mechanisms for generating OEE with features distinctive to life and its artifacts, with wide applicability to both biological and artificial systems.
Article
Full-text available
With the advent of high-performance computing, Bayesian methods are becoming increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods can impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is a pressing question to which there currently exist positive and negative answers. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they could be generically brittle when applied to continuous systems (and their discretizations) with finite information on the data-generating distribution. If closeness is defined in terms of the total variation (TV) metric or the matching of a finite system of generalized moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusion. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements, which raises the possibility of a missing stability condition when using Bayesian inference in a continuous world under finite information.
Article
Full-text available
The prevalence of neutral mutations implies that biological systems typically have many more genotypes than phenotypes. But can the way that genotypes are distributed over phenotypes determine evolutionary outcomes? Answering such questions is difficult because the number of genotypes can be hyper-astronomically large. By solving the genotype-phenotype (GP) map for RNA secondary structure for systems up to length L=126 nucleotides (where the set of all possible RNA strands would weigh more than the mass of the visible universe) we show that the GP map strongly constrains the evolution of non-coding RNA (ncRNA). Remarkably, simple random sampling over genotypes accurately predicts the distribution of properties such as the mutational robustness or the number of stems per secondary structure found in naturally occurring ncRNA. Since we ignore natural selection, this close correspondence with the mapping suggests that structures allowing for functionality are easily discovered, despite the enormous size of the genetic spaces. The mapping is extremely biased: the majority of genotypes map to an exponentially small portion of the morphospace of all biophysically possible structures. Such strong constraints provide a non-adaptive explanation for the convergent evolution of structures such as the hammerhead ribozyme. ncRNA presents a particularly clear example of bias in the arrival of variation strongly shaping evolutionary outcomes.
Article
Full-text available
Motivation: Abstract shape analysis, first proposed in 2004, allows one to extract several relevant structures from the folding space of an RNA sequence, preferable to focusing on a single structure of minimal free energy. We report recent extensions to this approach. Results: We have rebuilt the original RNAshapes as a repository of components that allows us to integrate several established tools for RNA structure analysis: RNAshapes, RNAalishapes and pknotsRG, including its recent extension pKiss. As a spin-off, we obtain heretofore unavailable functionality: e.g. with pKiss, we can now perform abstract shape analysis for structures holding pseudoknots up to the complexity of kissing hairpin motifs. The new tool pAliKiss can predict kissing hairpin motifs from aligned sequences. Along with the integration, the functionality of the tools was also extended in manifold ways. Availability and implementation: As before, the tool is available on the Bielefeld Bioinformatics server at http://bibiserv.cebitec.uni-bielefeld.de/rnashapesstudio. Contact: bibi-help@cebitec.uni-bielefeld.de
Article
Full-text available
We show that numerical approximations of Kolmogorov complexity (K) of graphs and networks capture some group-theoretic and topological properties of empirical networks, ranging from metabolic to social networks, and of small synthetic networks that we have produced. That K and the size of the group of automorphisms of a graph are correlated opens up interesting connections to problems in computational geometry, and thus connects several measures and concepts from complexity science. We derive these results via two different Kolmogorov complexity approximation methods applied to the adjacency matrices of the graphs and networks. The methods used are the traditional lossless compression approach to Kolmogorov complexity, and a normalised version of a Block Decomposition Method (BDM) based on algorithmic probability theory.
Article
Full-text available
We can discover the effective similarity among pairs of finite objects and denoise a finite object using the Kolmogorov complexity of these objects. The drawback is that the Kolmogorov complexity is not computable. If we approximate it, using a good real-world compressor, then it turns out that on natural data the processes give adequate results in practice. The methodology is parameter-free, alignment-free and works on individual data. We illustrate both methods with examples.
Article
Full-text available
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
Article
Full-text available
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
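For readers unfamiliar with the library, a minimal multiclass classification example with scikit-learn (the sort of baseline a complexity-based prior might be compared against) could look like the following; the dataset and classifier choice are illustrative and not taken from any of the papers listed here.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in multiclass dataset (10 digit classes).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=2000)   # simple baseline multiclass classifier
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```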
Article
Full-text available
The Universal Intelligence Measure is a recently proposed formal definition of intelligence. It is mathematically specified, extremely general, and captures the essence of many informal definitions of intelligence. It is based on Hutter's Universal Artificial Intelligence theory, an extension of Ray Solomonoff's pioneering work on universal induction. Since the Universal Intelligence Measure is only asymptotically computable, building a practical intelligence test from it is not straightforward. This paper studies the practical issues involved in developing a real-world UIM-based performance metric. Based on our investigation, we develop a prototype implementation which we use to evaluate a number of different artificial agents.
Article
Full-text available
Understanding inductive reasoning is a problem that has engaged mankind for thousands of years. This problem is relevant to a wide range of fields and is integral to the philosophy of science. It has been tackled by many great minds ranging from philosophers to scientists to mathematicians, and more recently computer scientists. In this article we argue the case for Solomonoff Induction, a formal inductive framework which combines algorithmic information theory with the Bayesian framework. Although it achieves excellent theoretical results and is based on solid philosophical foundations, the requisite technical knowledge necessary for understanding this framework has caused it to remain largely unknown and unappreciated in the wider scientific community. The main contribution of this article is to convey Solomonoff induction and its related concepts in a generally accessible form with the aim of bridging this current technical gap. In the process we examine the major historical contributions that have led to the formulation of Solomonoff Induction as well as criticisms of Solomonoff and induction in general. In particular we examine how Solomonoff induction addresses many issues that have plagued other inductive systems, such as the black ravens paradox and the confirmation problem, and compare this approach with other recent approaches.
Article
Full-text available
Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.
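A minimal usage sketch of the ViennaRNA Python bindings shipped with the package, assuming the RNA module is installed; the example sequence is arbitrary.

```python
# Predict a minimum free energy secondary structure with the ViennaRNA Python bindings
# (assumes the `RNA` module distributed with the ViennaRNA Package is installed).
import RNA

seq = "GCGCUUCGCCGCGCGCC"          # arbitrary example sequence
structure, mfe = RNA.fold(seq)     # dot-bracket structure and its free energy
print(structure, f"{mfe:.2f} kcal/mol")
```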
Article
Full-text available
Using frequency distributions of daily closing price time series of several financial market indexes, we investigate whether the bias away from an equiprobable sequence distribution found in the data, predicted by algorithmic information theory, may account for some of the deviation of financial markets from log-normal, and if so for how much of said deviation and over what sequence lengths. We do so by comparing the distributions of binary sequences from actual time series of financial markets and series built up from purely algorithmic means. Our discussion is a starting point for a further investigation of the market as a rule-based system with an 'algorithmic' component, despite its apparent randomness, and the use of the theory of algorithmic probability with new tools that can be applied to the study of the market price phenomenon. The main discussion is cast in terms of assumptions common to areas of economics in agreement with an algorithmic view of the market.
Article
Full-text available
The function of a non-protein-coding RNA is often determined by its structure. Since experimental determination of RNA structure is time-consuming and expensive, its computational prediction is of great interest, and efficient solutions based on thermodynamic parameters are known. Frequently, however, the predicted minimum free energy structures are not the native ones, leading to the necessity of generating suboptimal solutions. While this can be accomplished by a number of programs, the user is often confronted with large outputs of similar structures, although he or she is interested in structures with more fundamental differences, or, in other words, with different abstract shapes. Here, we formalize the concept of abstract shapes and introduce their efficient computation. Each shape of an RNA molecule comprises a class of similar structures and has a representative structure of minimal free energy within the class. Shape analysis is implemented in the program RNAshapes. We applied RNAshapes to the prediction of optimal and suboptimal abstract shapes of several RNAs. For a given energy range, the number of shapes is considerably smaller than the number of structures, and in all cases, the native structures were among the top shape representatives. This demonstrates that the researcher can quickly focus on the structures of interest, without processing up to thousands of near-optimal solutions. We complement this study with a large-scale analysis of the growth behaviour of structure and shape spaces. RNAshapes is available for download and as an online version on the Bielefeld Bioinformatics Server.
Article
Full-text available
Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Curve) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. Conclusion: UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results than NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected.
PPMd used with UCD or NCD and UPGMA on sequence data is very close in performance to the alignment methods, although worse (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
Article
Full-text available
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: First, we determine a parameter-free, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal. However, the optimality comes at the price of using the noncomputable notion of Kolmogorov complexity. We propose axioms to capture the real-world setting, and show that the NCD approximates optimality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (ternary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics, we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.
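A minimal sketch of the normalized compression distance described here, with zlib as the real-world compressor and toy byte strings as the objects being compared; compressor choice and data are illustrative assumptions.

```python
import zlib

def clen(data: bytes) -> int:
    """Length of the zlib-compressed data, a practical stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance from single and concatenated compressed sizes."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

doc_a = b"the quick brown fox jumps over the lazy dog " * 20
doc_b = b"the quick brown fox leaps over the sleepy dog " * 20
doc_c = bytes(range(256)) * 4   # unrelated, hard-to-compress data

print(ncd(doc_a, doc_b))   # smaller value: similar documents
print(ncd(doc_a, doc_c))   # close to 1: dissimilar objects
```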
Article
Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms, performing a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structures they adopt. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”—given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms largely rely on finding the structure with minimum free energy. However, the predictions rely on effective simplified free energy models that may not identify the correct structure as the one with the lowest free energy. In light of this, new, data-driven approaches that not only consider free energy, but also use machine learning techniques to learn motifs are also being investigated and have recently been shown to outperform free energy–based algorithms on several experimental data sets. In this work, we introduce the new ExpertRNA algorithm that provides a modular framework that can easily incorporate an arbitrary number of rewards (free energy or nonparametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets, and we compare the performance of ExpertRNA against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions of nonpseudoknotted secondary structures than the structure prediction algorithm used, thus validating the promise of the approach. Summary of Contribution: ExpertRNA is a new algorithm inspired by a biological problem. It is applied to solve the problem of secondary structure prediction for RNA molecules given an input sequence. The computational contribution is given by the design of a multibranch, multiexpert rollout algorithm that enables the use of several state-of-the-art approaches as base heuristics and allows several experts to evaluate the partial candidate solutions generated, thus avoiding assumptions about the reward being optimized by an RNA molecule when folding. Our implementation allows for the effective use of parallel computational resources as well as control over the size of the rollout tree as the algorithm progresses. The problem of RNA secondary structure prediction is of primary importance within the biology field because the molecule structure is strongly related to its functionality. Whereas the contribution of the paper is in the algorithm, the importance of the application makes ExpertRNA a showcase of the relevance of computationally efficient algorithms in supporting scientific discovery.
Book
This monograph demonstrates a new approach to the classical mode decomposition problem through nonlinear regression models, which achieve near-machine precision in the recovery of the modes. The presentation includes a review of generalized additive models, additive kernels/Gaussian processes, generalized Tikhonov regularization, empirical mode decomposition, and Synchrosqueezing, which are all related to and generalizable under the proposed framework. Although kernel methods have strong theoretical foundations, they require the prior selection of a good kernel. While the usual approach to this kernel selection problem is hyperparameter tuning, the objective of this monograph is to present an alternative (programming) approach to the kernel selection problem while using mode decomposition as a prototypical pattern recognition problem. In this approach, kernels are programmed for the task at hand through the programming of interpretable regression networks in the context of additive Gaussian processes. It is suitable for engineers, computer scientists, mathematicians, and students in these fields working on kernel methods, pattern recognition, and mode decomposition problems.
Article
Learning can be seen as approximating an unknown function by interpolating the training data. Although Kriging offers a solution to this problem, it requires the prior specification of a kernel and it is not scalable to large datasets. We explore a numerical approximation approach to kernel selection/construction based on the simple premise that a kernel must be good if the number of interpolation points can be halved without significant loss in accuracy (measured using the intrinsic RKHS norm ∥·∥ associated with the kernel). We first test and motivate this idea on a simple problem of recovering the Green's function of an elliptic PDE (with inhomogeneous coefficients) from the sparse observation of one of its solutions. Next we consider the problem of learning non-parametric families of deep kernels of the form K_1(F_n(x), F_n(x')) with F_{n+1} = (I_d + ϵG_{n+1}) ◦ F_n and G_{n+1} ∈ span{K_1(F_n(x_i), ·)}. With the proposed approach, constructing the kernel becomes equivalent to integrating a stochastic, data-driven dynamical system, which allows for the training of very deep (bottomless) networks and the exploration of their properties. These networks learn by constructing flow maps in the kernel and input spaces via incremental data-dependent deformations/perturbations (appearing as the cooperative counterpart of adversarial examples) and, at profound depths, they (1) can achieve accurate classification from only one data point per class, (2) appear to learn archetypes of each class, and (3) expand distances between points that are in different classes while contracting distances between points in the same class. For kernels parameterized by the weights of Convolutional Neural Networks, minimizing the approximation errors incurred by halving random subsets of interpolation points appears to outperform training (the same CNN architecture) with relative entropy and dropout.
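The halving criterion above can be illustrated with a short numpy sketch (an illustration, not the authors' code; the Gaussian kernel, lengthscales, and toy data are our own assumptions): the relative RKHS-norm change 1 − ∥u_half∥²/∥u∥² is small exactly when discarding half of the interpolation points barely changes the interpolant.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between two point sets."""
    d2 = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * lengthscale**2))

def halving_loss(X, y, kernel, rng):
    """Relative RKHS-norm error incurred when a random half of the
    interpolation points is discarded; small values suggest a good kernel."""
    n = len(X)
    half = rng.choice(n, size=n // 2, replace=False)
    K_full = kernel(X, X) + 1e-8 * np.eye(n)                      # jitter for stability
    K_half = kernel(X[half], X[half]) + 1e-8 * np.eye(len(half))
    norm_full = y @ np.linalg.solve(K_full, y)                    # ||u||^2 in the RKHS
    norm_half = y[half] @ np.linalg.solve(K_half, y[half])        # ||u_half||^2
    return 1.0 - norm_half / norm_full

# Toy usage: compare two lengthscales on noisy samples of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = np.sin(3.0 * X[:, 0]) + 0.01 * rng.standard_normal(64)
for ls in (0.05, 0.5):
    losses = [halving_loss(X, y, lambda a, b: rbf_kernel(a, b, ls), rng) for _ in range(20)]
    print(f"lengthscale={ls}: mean halving loss ~ {np.mean(losses):.3f}")
```

Averaging the loss over several random halvings, as done here, is a simple way to reduce the variance of the criterion before comparing kernels.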
Article
Some preliminary work is presented on a very general new theory of inductive inference. The extrapolation of an ordered sequence of symbols is implemented by computing the a priori probabilities of various sequences of symbols. The a priori probability of a sequence is obtained by considering a universal Turing machine whose output is the sequence in question. An approximation to the a priori probability is given by the shortest input to the machine that will give the desired output. A more exact formulation is given, and it is made somewhat plausible that the extrapolation probabilities obtained will be largely independent of just which universal Turing machine was used, provided that the sequence to be extrapolated has an adequate amount of information in it. Some examples are worked out to show the application of the method to specific problems. Applications of the method to curve fitting and other continuous problems are discussed to some extent. Some alternative …
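The shortest-input quantity above is uncomputable in general; a common practical proxy (our assumption for this sketch, not part of the original formulation) is to upper-bound a string's complexity with an off-the-shelf compressor and turn that bound into an approximate a priori probability via P(x) ≈ 2^(−K(x)).

```python
import random
import zlib

def complexity_estimate(s: str) -> float:
    """Crude upper bound on the complexity of s: length in bits of the
    zlib-compressed string (a compressor proxy, not the true shortest
    program for a universal Turing machine)."""
    return 8 * len(zlib.compress(s.encode()))

def apriori_probability_estimate(s: str) -> float:
    """Algorithmic-probability-style weight P(x) ~ 2^(-K(x))."""
    return 2.0 ** (-complexity_estimate(s))

random.seed(0)
regular = "01" * 32                                        # simple, repetitive pattern
noisy = "".join(random.choice("01") for _ in range(64))    # irregular pattern

for name, s in [("regular", regular), ("noisy", noisy)]:
    print(name, complexity_estimate(s), apriori_probability_estimate(s))
```

The regular string compresses to fewer bits than the irregular one, so its estimated a priori probability is correspondingly larger, which is the qualitative behaviour the theory predicts.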
Article
A new alternative definition is given for the algorithmic quantity of information defined by A. N. Kolmogorov. The nongrowth of this quantity is proved for random and certain other processes.
Article
Computers may be thought of as engines for transforming free energy into waste heat and mathematical work. Existing electronic computers dissipate energy vastly in excess of the mean thermal energy kT, for purposes such as maintaining volatile storage devices in a bistable condition, synchronizing and standardizing signals, and maximizing switching speed. On the other hand, recent models due to Fredkin and Toffoli show that in principle a computer could compute at finite speed with zero energy dissipation and zero error. In these models, a simple assemblage of simple but idealized mechanical parts (e.g., hard spheres and flat plates) determines a ballistic trajectory isomorphic with the desired computation, a trajectory therefore not foreseen in detail by the builder of the computer. In a classical or semiclassical setting, ballistic models are unrealistic because they require the parts to be assembled with perfect precision and isolated from thermal noise, which would eventually randomize the trajectory and lead to errors. Possibly quantum effects could be exploited to prevent this undesired equipartition of the kinetic energy. Another family of models may be called Brownian computers, because they allow thermal noise to influence the trajectory so strongly that it becomes a random walk through the entire accessible (low-potential-energy) portion of the computer's configuration space. In these computers, a simple assemblage of simple parts determines a low-energy labyrinth isomorphic to the desired computation, through which the system executes its random walk, with a slight drift velocity due to a weak driving force in the direction of forward computation. In return for their greater realism, Brownian models are more dissipative than ballistic ones: the drift velocity is proportional to the driving force, and hence the energy dissipated approaches zero only in the limit of zero speed. In this regard Brownian models resemble the traditional apparatus of thermodynamic thought experiments, where reversibility is also typically only attainable in the limit of zero speed. The enzymatic apparatus of DNA replication, transcription, and translation appears to be nature's closest approach to a Brownian computer, dissipating 20–100 kT per step. Both the ballistic and Brownian computers require a change in programming style: computations must be rendered logically reversible, so that no machine state has more than one logical predecessor. In a ballistic computer, the merging of two trajectories clearly cannot be brought about by purely conservative forces; in a Brownian computer, any extensive amount of merging of computation paths would cause the Brownian computer to spend most of its time bogged down in extraneous predecessors of states on the intended path, unless an extra driving force of kT ln 2 were applied (and dissipated) at each merge point. The mathematical means of rendering a computation logically reversible (e.g., creation and annihilation of a history file) will be discussed. The old Maxwell's demon problem is discussed in the light of the relation between logical and thermodynamic reversibility: the essential irreversible step, which prevents the demon from breaking the second law, is not the making of a measurement (which in principle can be done reversibly) but rather the logically irreversible act of erasing the record of one measurement to make room for the next.
Converse to the rule that logically irreversible operations on data require an entropy increase elsewhere in the computer is the fact that a tape full of zeros, or one containing some computable pseudorandom sequence such as pi, has fuel value and can be made to do useful thermodynamic work as it randomizes itself. A tape containing an algorithmically random sequence lacks this ability.
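As a quick numerical illustration of the kT ln 2 cost mentioned above (a back-of-the-envelope calculation we add for context, not a figure from the paper):

```python
import math

# Landauer-style minimum dissipation for one logically irreversible merge
# (erasing one bit of information) at room temperature: E = k_B * T * ln 2.
k_B = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                   # room temperature, K
E = k_B * T * math.log(2)
print(f"kT ln 2 at {T:.0f} K is about {E:.2e} J per merged/erased bit")
```

At 300 K this evaluates to roughly 3e-21 J per bit, far below what conventional electronic switching dissipates.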
Article
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. I discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. I show that Solomonoff’s model possesses many desirable properties: strong total and future bounds, and weak instantaneous bounds, and, in contrast to most classical continuous prior densities, it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses, is reparametrization and regrouping invariant, and avoids the old-evidence and updating problem. It even performs well (actually better) in non-computable environments.
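The flavor of the universal model class and prior can be conveyed with a finite, computable caricature (an illustration only; the actual construction mixes over all semicomputable hypotheses): each hypothesis is weighted by 2^(−description length) and the weights are updated by Bayes' rule as symbols arrive.

```python
# Toy Bayesian mixture predictor with a complexity-penalized prior
# (a finite caricature of the universal mixture, for illustration only).

# Each hypothesis: (description_length_in_bits, probability_that_next_bit_is_1)
hypotheses = [
    (2, 0.5),    # "fair coin": very short description
    (5, 0.9),    # "mostly ones"
    (5, 0.1),    # "mostly zeros"
    (9, 0.99),   # "almost always one": longer description
]

weights = [2.0 ** (-length) for length, _ in hypotheses]   # prior ~ 2^-K

def predict_next(weights):
    """Mixture probability that the next symbol is 1."""
    total = sum(weights)
    return sum(w * p for w, (_, p) in zip(weights, hypotheses)) / total

def update(weights, bit):
    """Bayes update of each hypothesis weight after observing `bit`."""
    return [w * (p if bit == 1 else 1 - p) for w, (_, p) in zip(weights, hypotheses)]

sequence = [1, 1, 1, 0, 1, 1, 1, 1]
for bit in sequence:
    print(f"P(next=1) = {predict_next(weights):.3f}, then observe {bit}")
    weights = update(weights, bit)
```

As the ones accumulate, the mixture shifts weight toward the "mostly ones" hypothesis while the complexity penalty keeps it from jumping immediately to the most extreme explanation.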
Article
RNA folding is viewed here as a map assigning secondary structures to sequences. At fixed chain length the number of sequences far exceeds the number of structures. Frequencies of structures are highly non-uniform and follow a generalized form of Zipf's law: we find relatively few common structures and many rare ones. By using an algorithm for inverse folding, we show that sequences sharing the same structure are distributed randomly over sequence space. All common structures can be accessed from an arbitrary sequence by a number of mutations much smaller than the chain length. The sequence space is percolated by extensive neutral networks connecting nearest neighbours that fold into identical structures. Implications for evolutionary adaptation and for applied molecular evolution are evident: finding a particular structure by mutation and selection is much simpler than expected and, even if catalytic activity should turn out to be sparse among RNA structures, it can hardly be missed by evolutionary processes.
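The rank-frequency behaviour described above can be illustrated with a toy many-to-one map (a stand-in of our own; real RNA folding would require a dedicated package such as ViennaRNA): sample random inputs, record the output "structure" each one maps to, and inspect how quickly the frequencies fall off with rank.

```python
import random
from collections import Counter

random.seed(1)

def toy_fold(seq):
    """Toy many-to-one map standing in for sequence -> secondary structure:
    here, the rise/fall pattern of short sliding sums over the input."""
    sums = [sum(seq[i:i + 4]) for i in range(0, len(seq) - 3, 4)]
    return tuple(int(b > a) for a, b in zip(sums, sums[1:]))

n_samples = 100_000
counts = Counter(
    toy_fold([random.randint(0, 1) for _ in range(20)]) for _ in range(n_samples)
)

# Rank-frequency table: a heavy-tailed (Zipf-like) fall-off means a few
# "structures" absorb most of the probability mass.
for rank, (structure, count) in enumerate(counts.most_common(5), start=1):
    print(rank, structure, count / n_samples)
```

Even in this toy map the frequencies are strongly non-uniform, which is the qualitative pattern the abstract reports for real RNA structures.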
Article
Algorithmic randomness provides a rigorous, entropy-like measure of disorder of an individual, microscopic, definite state of a physical system. It is defined by the size (in binary digits) of the shortest message specifying the microstate uniquely up to the assumed resolution. Equivalently, algorithmic randomness can be expressed as the number of bits in the smallest program for a universal computer that can reproduce the state in question (for instance, by plotting it with the assumed accuracy). In contrast to the traditional definitions of entropy, algorithmic randomness can be used to measure disorder without any recourse to probabilities. Algorithmic randomness is typically very difficult to calculate exactly but relatively easy to estimate. In large systems, probabilistic ensemble definitions of entropy (e.g., coarse-grained entropy of Gibbs and Boltzmann's entropy H = ln W, as well as Shannon's information-theoretic entropy) provide accurate estimates of the algorithmic entropy of an individual system or its average value for an ensemble. One is thus able to rederive much of thermodynamics and statistical mechanics in a setting very different from the usual. Physical entropy, I suggest, is a sum of (i) the missing information measured by Shannon's formula and (ii) the algorithmic information content (algorithmic randomness) present in the available data about the system. This definition of entropy is essential in describing the operation of thermodynamic engines from the viewpoint of information gathering and using systems. These Maxwell demon-type entities are capable of acquiring and processing information and therefore can "decide", on the basis of the results of their measurements and computations, the best strategy for extracting energy from their surroundings. From their internal point of view the outcome of each measurement is definite.
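In symbols, the definition proposed in the abstract above can be written as follows (the notation is ours, added for clarity): given the available data d about the system, with H_d the missing information measured by Shannon's formula and K(d) the algorithmic information content of d,

```latex
% Physical entropy (illustrative notation): Shannon ignorance plus the
% algorithmic information content of the already-acquired data.
S_d = H_d + K(d)
```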
Article
If a problem in functional data analysis is low dimensional then the methodology for its solution can often be reduced to relatively conventional techniques in multivariate analysis. Hence, there is intrinsic interest in assessing the finite dimensionality of functional data. We show that this problem has several unique features. From some viewpoints the problem is trivial, in the sense that continuously distributed functional data which are exactly finite dimensional are immediately recognizable as such, if the sample size is sufficiently large. However, in practice, functional data are almost always observed with noise, for example, resulting from rounding or experimental error. Then the problem is almost insolubly difficult. In such cases a part of the average noise variance is confounded with the true signal and is not identifiable. However, it is possible to define the unconfounded part of the noise variance. This represents the best possible lower bound to all potential values of average noise variance and is estimable in low-noise settings. Moreover, bootstrap methods can be used to describe the reliability of estimates of unconfounded noise variance, under the assumption that the signal is finite dimensional. Motivated by these ideas, we suggest techniques for assessing the finiteness of dimensionality. In particular, we show how to construct a critical point such that, if the distribution of our functional data has fewer than q − 1 degrees of freedom, then we should be willing to assume that the average variance of the added noise is at least that critical point. If this level seems too high then we must conclude that the dimension is at least q − 1. We show that simpler, more conventional techniques, based on hypothesis testing, are …
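A minimal numerical illustration of the phenomenon described above (our own toy example, not the authors' procedure): for noisy, discretized functional data of true dimension two, the eigenvalues of the empirical covariance show two dominant signal components followed by a roughly flat noise floor, and part of that floor is inseparable from the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)

# Noisy functional data with true dimension 2: two fixed basis curves
# plus i.i.d. observation noise (a toy setup, not the paper's model).
n, sigma = 200, 0.05
scores = rng.standard_normal((n, 2)) * np.array([1.0, 0.4])
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X = scores @ basis + sigma * rng.standard_normal((n, len(t)))

# Eigenvalues of the empirical covariance: two dominant "signal" eigenvalues,
# then a roughly flat floor of order sigma^2 contributed by the added noise.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print("leading eigenvalues:", np.round(eigvals[:5], 4))
print("approximate noise floor (median of the tail):", np.round(np.median(eigvals[5:]), 5))
```

With larger noise the floor rises and begins to swallow the smaller signal eigenvalues, which is the confounding the abstract refers to.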
Article
A new approach to the problem of evaluating the complexity ("randomness") of finite sequences is presented. The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated. It is further related to the number of distinct substrings and the rate of their occurrence along the sequence. The derived properties of the proposed measure are discussed and motivated in conjunction with other well-established complexity criteria.
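A compact Python sketch of a complexity count in this spirit, based on an exhaustive production-process parsing (our implementation choices, including the binary examples, are illustrative):

```python
def lempel_ziv_complexity(s: str) -> int:
    """Count the phrases in an exhaustive Lempel-Ziv-style parsing of s:
    each phrase is grown while it can still be found as a substring of the
    material preceding its final character, and closed as soon as it cannot."""
    i, phrases = 0, 0
    n = len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

print(lempel_ziv_complexity("0101010101010101"))   # regular pattern: small count
print(lempel_ziv_complexity("0110100110010110"))   # less regular pattern: larger count
```

Counts of this kind are a common practical stand-in for Kolmogorov complexity when testing for the simplicity bias discussed elsewhere on this page.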
Article
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded extension of Kolmogorov complexity) …
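For reference, the time-bounded quantity mentioned at the end of the abstract is usually written as follows (standard definition, with U a universal machine, |p| the length of program p in bits, and t(p, x) the number of steps p needs to output x):

```latex
% Levin complexity: program length plus the logarithm of the running time.
Kt(x) = \min_{p} \{\, |p| + \log_2 t(p, x) \;:\; U(p) = x \,\}
```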