Conference Paper

Approximating discrete probability distributions with causal dependence trees


Abstract

Chow and Liu considered the problem of approximating discrete joint distributions with dependence tree distributions, where the goodness of the approximation was measured in terms of KL distance. They (i) demonstrated that the minimum-divergence approximation was the tree with the maximum sum of mutual informations, and (ii) specified a low-complexity minimum-weight spanning tree algorithm to find the optimal tree. In this paper, we consider the analogous problem of approximating the joint distribution on discrete random processes with causal, directed, dependence trees, where the approximation is again measured in terms of KL distance. We (i) demonstrate that the minimum-divergence approximation is the directed tree with the maximum sum of directed informations, and (ii) specify a low-complexity minimum-weight directed spanning tree, or arborescence, algorithm to find the optimal tree. We also present an example to demonstrate the algorithm.
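As a concrete sketch of the procedure described in the abstract (Python; networkx's arborescence routine stands in for the minimum-weight arborescence step, and directed_information is a placeholder for any estimator of I(X_j -> X_i); both are illustrative assumptions, not the paper's own code):

```python
import networkx as nx

def causal_dependence_tree(n, directed_information):
    """KL-optimal causal dependence tree: the directed spanning tree
    (arborescence) maximizing the sum of pairwise directed informations."""
    G = nx.DiGraph()
    for j in range(n):
        for i in range(n):
            if i != j:
                # edge j -> i weighted by the directed information I(X_j -> X_i)
                G.add_edge(j, i, weight=directed_information(j, i))
    # Edmonds'/Chu-Liu-style algorithm for the maximum-weight arborescence
    return nx.maximum_spanning_arborescence(G)
```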


... Only the dependency of a bit on one of its previous bits is taken into account during the calculation. Chow and Liu analysed how to optimize this estimation in the sense of Kullback-Leibler distance [4]. Therefore, Eq. (3) is also called the Chow-Liu representation. ...

... Therefore, Eq. (3) is also called the Chow-Liu representation. If Ĥ(X) is the estimated entropy with P̂(X), then with the chain rule of the joint entropy: [Table 2: the mean and standard deviation of the estimated entropy and the information rate for the training and testing set.] It is shown in [4] that the Kullback-Leibler distance between the real distribution of X and the second-order dependency tree depends on the following variables: ...
Article
Full-text available
Fuzzy commitment is an efficient template-protection algorithm that can improve security and safeguard the privacy of biometrics. Existing theoretical security analysis has proved that, although privacy leakage is unavoidable, perfect security from an information-theoretic point of view is possible when the bits extracted from biometric features are uniformly and independently distributed. Unfortunately, this strict condition is difficult to fulfill in practice. In many applications, the dependency of binary features is ignored, and security is thus suspected to be highly overestimated. This paper gives a comprehensive empirical analysis of the security and privacy of fuzzy commitment. The criteria representing requirements in practical applications are investigated and measured quantitatively in an existing protection system for 3D face recognition. The evaluation results show that a very significant reduction of security and enlargement of privacy leakage occur due to the dependency of biometric features. This work shows that in practice, one has to explicitly measure the security and privacy instead of trusting results obtained under non-realistic assumptions.
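The chain-rule computation mentioned in the Chow-Liu context quoted above can be made concrete. A minimal sketch, assuming integer-coded samples in an (N, n) NumPy array and a hypothetical parent list encoding the dependence tree:

```python
import numpy as np

def tree_entropy(samples, parent):
    """Entropy (bits) of a first-order dependence-tree model via the chain
    rule: H(X) = H(X_root) + sum_i H(X_i | X_parent(i))."""
    def H(cols):
        # empirical joint entropy of the selected columns
        _, counts = np.unique(samples[:, cols], axis=0, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    total = 0.0
    for i, pa in enumerate(parent):
        # H(X_i | X_pa) = H(X_i, X_pa) - H(X_pa)
        total += H([i]) if pa is None else H([i, pa]) - H([pa])
    return total
```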
... After the reintroduction of transfer entropy, it has found applications in various research fields. Such successful applications include not only engineering fields relevant to information theory but also various kinds of scientific research: detecting directed dependency in cellular automata [8], machine learning [9], chemical process [10], health monitoring [11], analysis of brain activity [12,13], stock markets [14,15], ecological monitoring programs [16], music analysis [17], and human-human/robot communication [18,19,20]. ...
... In analysis of such datasets with complex interactions, MTE is likely to be useful because it exclusively measures a pair of variables by cancelling out the effects from the other variables. This general applicability of MTE to multivariate systems covers a broader range of empirical and theoretical fields using PTEs [8,9,10,11,12,13,14,16,17,18,19,20]. ...
Article
Full-text available
One of the crucial steps in scientific studies is to specify dependent relationships among factors in a system of interest. Given little knowledge of a system, can we characterize the underlying dependent relationships through observation of its temporal behaviors? In multivariate systems, there are potentially many possible dependent structures confusable with each other, and it may cause false detection of illusory dependency between unrelated factors. The present study proposes a new information-theoretic measure with consideration to such potential multivariate relationships. The proposed measure, called multivariate transfer entropy, is an extension of transfer entropy, a measure of temporal predictability. In the simulations and empirical studies, we demonstrated that the proposed measure characterized the latent dependent relationships in unknown dynamical systems more accurately than its alternative measure.
... If the number of attributes q + p ≤ 25, we choose the PC algorithm [24] to learn the structured correlations among nodes V = {V_1, ..., V_{q+p}}. If q + p > 25, we choose the Chow-Liu algorithm [25]. In the final DAG, we add edges that point directly to the label attribute from other attributes. ...
Preprint
Full-text available
GAN-based tabular synthesis methods have made important progress in generating sophisticated synthetic data for privacy-preserving data publishing. However, existing methods do not consider explicit attribute correlations and property constraints on tabular data synthesis, which may lead to inaccurate data analysis results. In this paper, we propose a Controllable tabular data synthesis framework with explicit Correlations and property Constraints, namely C3-TGAN. It leverages Bayesian networks to learn explicit correlations among attributes and model them as control vectors. Such control vectors can guide C3-TGAN to generate synthetic data with complicated property constraints. By conducting comprehensive experiments on 14 publicly available benchmark datasets, we showcase C3-TGAN's remarkable performance advantage over state-of-the-art methods for synthesizing tabular data.
... The MST is a graph that links all nodes of a network with the minimum cost of links. The approximate MST algorithm provides an efficient solution for constructing the MST [9]. Through recent innovations in the approximate MST algorithm, the complexity of the algorithm has been drastically reduced, allowing for faster and more accurate network construction [10]. ...
... The steps of the MWST algorithm [41] are as follows: first, start from the node set Y = {X_i}. Then, find the node X_j from the set V \ Y that has the largest mutual information with any node y in the set Y, and connect y and X_j with an undirected edge. Repeat this operation until Y = V. ...
Article
Full-text available
Dynamic programming is difficult to apply to large-scale Bayesian network structure learning. In view of this, this article proposes a BN structure learning algorithm based on dynamic programming, which integrates improved MMPC (maximum-minimum parents and children) and MWST (maximum weight spanning tree). First, we use the maximum weight spanning tree to obtain the maximum number of parent nodes of the network node. Second, the MMPC algorithm is improved by the symmetric relationship to reduce false-positive nodes and obtain the set of candidate parent-child nodes. Finally, with the maximum number of parent nodes and the set of candidate parent nodes as constraints, we prune the parent graph of dynamic programming to reduce the number of scoring calculations and the complexity of the algorithm. Experiments have proved that when an appropriate significance level α is selected, the MMPCDP algorithm can greatly reduce the number of scoring calculations and running time while ensuring its accuracy.
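The MWST steps quoted in the context above amount to a Prim-style greedy rule. A minimal sketch, assuming a precomputed symmetric mutual-information matrix MI (a hypothetical input, not part of the cited algorithm's code):

```python
def mwst_edges(MI):
    """Grow Y from node 0; at each step add the outside node with the
    largest mutual information to any node already in Y."""
    n = len(MI)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        y, xj = max(((y, j) for y in in_tree
                     for j in range(n) if j not in in_tree),
                    key=lambda e: MI[e[0]][e[1]])
        edges.append((y, xj))  # undirected edge connecting y and X_j
        in_tree.add(xj)
    return edges
```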
... Achieving scalability and accuracy (using valid spectrum patterns and RSA constraints) at the same time is hard. Nevertheless, utilizing the link-independence assumption A1 in Section 2.5.1, and noting that a product-form approximation is also a valid probability distribution [21,62], the approximate probability of acceptance of a request on a route with h hops in an EON without SC may be given by the product of the individual probabilities of acceptance on the constituent links of each hop of an h-hop route r, as below. ...
Thesis
Full-text available
The unprecedented growth in Internet traffic has driven innovations in the provisioning of optical resources to meet bandwidth demands such that resource utilization and spectrum efficiency can be maximized. With the advent of the next generation flexible optical transponders and switches, the flexible-grid-based elastic optical network (EON) is foreseen as an alternative to the widely deployed fixed-grid-based wavelength division multiplexing networks. At the same time, the flexible resource provisioning also raises new challenges for EONs. One such challenge is spectrum fragmentation. As network traffic varies over time, spectrum gets fragmented due to the setting up and tearing down of non-uniform bandwidth requests over aligned (i.e., continuous) and adjacent (i.e., contiguous) spectrum slices, which leads to a non-optimal spectrum allocation, and generally results in higher blocking probability and lower spectrum utilization in EONs. To address this issue, the allocation and reallocation of optical resources are required to be modeled accurately, and managed efficiently and intelligently. The modeling of routing and spectrum allocation in EONs with the spectrum contiguity and spectrum continuity constraints is well investigated, but existing models do not consider the fragmentation issue resulting from these constraints and non-uniform bandwidth demands. This thesis addresses this issue and considers both constraints in computing exact blocking probabilities in EONs with and without spectrum conversion, and with spectrum reallocation (known as defragmentation), for the first time using the Markovian approach. As the exact network models are not scalable with respect to the network size and capacity, this thesis proposes load-independent and load-dependent approximate models to compute approximate blocking probabilities in EONs. Results show that the connection blocking due to fragmentation can be reduced by using a spectrum conversion or a defragmentation approach, but it cannot be eliminated in a mesh network topology. This thesis also deals with the important network resource provisioning task in EONs. To this end, it first presents algorithmic solutions to efficiently allocate and reallocate spectrum resources using the fragmentation factor along spectral, time, and spatial dimensions. Furthermore, this thesis highlights the role of machine learning techniques in alleviating issues in static provisioning of optical resources, and presents two use-cases: handling time-varying traffic in optical data center networks, and reducing energy consumption and allocating spectrum proportionately to traffic classes in fiber-wireless networks.
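The product-form approximation quoted in the context above is straightforward to state in code. A minimal sketch with made-up per-link acceptance probabilities:

```python
def route_acceptance(link_acceptance_probs):
    """Approximate an h-hop route's acceptance probability as the product
    of the per-link acceptance probabilities (link-independence assumption)."""
    prob = 1.0
    for p in link_acceptance_probs:
        prob *= p
    return prob

# Example: a 3-hop route with hypothetical per-link probabilities
# route_acceptance([0.95, 0.90, 0.97]) -> ~0.829
```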
... A search through candidate network structures seeks the one that maximizes a scoring function, defined to express how well a structure matches a provided set of data [19]. The scoring functions depend on several concepts, such as entropy and information [28,29], the minimum description length [30,31], or Bayesian approaches [32,33]. According to Bayes' rule, the posterior probability of a Bayesian network is: ...
Conference Paper
Full-text available
A Bayesian network represents a distribution of random variables and the conditional dependencies among them. In this paper, we present two methods for enhancing structure learning of Bayesian networks using search optimization. These methods use a hybrid approach between Bee optimization algorithms and Simulated Annealing. In the proposed method we use two search techniques. The first technique uses Bee optimization as a local search and Simulated Annealing as a global search (BS). The second technique uses Simulated Annealing as a local search and Bee algorithms as a global search (BSA2TR). In the proposed methods the BDe (Bayesian Dirichlet "e" for likelihood-equivalence) metric is used as the score function. The results show the efficiency of the proposed methods based on different criteria, using different networks (Alarm, Asia, Andes2, Hepar2, Hailfinder, etc.).
... where q(x_{1..N}) is the linearized version of the nonlinear subgraph q(X_{1..N}) = ∏_{(i,j)∈K} q(z_{ij} | X_i, X_j) and K = {(i, j)}_{i,j∈1..N} is the target subgraph topology. Our method is generic to the target subgraph's topology, ranging from the sparsest Chow-Liu tree [8] to the fully-dense clique. Different strategies to choose a topology have been discussed extensively in previous work [4], [22]. ...
Conference Paper
Full-text available
We present a fast nonlinear approximation method for marginalizing out nodes on pose graphs for long-term simultaneous localization, mapping, and navigation. Our approximation preserves the pose graph structure to leverage the rich literature of pose graphs and optimization schemes. By re-parameterizing from absolute- to relative-pose spaces, our method does not suffer from the choice of linearization points as in previous works. We then join our approximation process with a scaled version of the recently-demoted pose-composition approach. Our approach eschews the expenses of many state-of-the-art convex optimization schemes through our efficient and simple O(N^2) implementation for a given known topology of the approximate subgraph. We demonstrate its speed and near optimality in practice by comparing against state-of-the-art techniques on popular datasets.
... Mutation trees can be sampled as done for phylogenetic trees, either exhaustively or by Monte Carlo, and can be scored via standard information theory. Each such model is a well-known Chow-Liu tree, a generator of the joint distribution p(c_1, ..., c_w) if c_1, ..., c_w are the w groups for this patient; that is, the probability of observing the presence/absence of the corresponding alterations in a sample [50]. A Chow-Liu tree contains second-order terms p(y|x) for the product approximation of the joint distribution that we factorize. ...
Article
Full-text available
Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for the anticipation of cancer progression. Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but the robust identification of repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that allowed us to overcome the stochastic effects of cancer evolution and noise in data and identified hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Our method provides a means of classifying patients on the basis of how their tumor evolved, with implications for the anticipation of disease progression.
... TAN Bayes classifiers are not good candidates for bagging [45]. However, imposing a tree structure on rare data can be too strong and lead to overfitting. ...
Conference Paper
Full-text available
Class imbalance affects medical diagnosis, as the number of disease cases is often outnumbered. When it is severe, learning algorithms fail to retrieve the rarer classes and common assessment metrics become uninformative. In this work, class imbalance is approached using neuropsychological data, with the aim of differentiating Alzheimer’s Disease (AD) from Mild Cognitive Impairment (MCI) and predicting the conversion from MCI to AD. The effect of the imbalance on four learning algorithms is examined through the application of bagging, Bayes risk minimization and MetaCost. Plain decision trees were always outperformed, indicating susceptibility to the imbalance. The naïve Bayes classifier was robust but suffered a bias that was adjusted through risk minimization. This strategy outperformed all other combinations of classifiers and meta-learning/ensemble methods. The tree-augmented naïve Bayes classifier also benefited from an adjustment of the decision threshold. In the nearly balanced datasets, it was improved by bagging, suggesting that the tree structure was too strong for the attribute dependencies. Support vector machines were robust, as their plain version achieved good results and was never outperformed.
... This means that it is possible to recover the best tree that approximates the data. The Chow-Liu algorithm for learning trees is presented in [Chow and Liu, 1968]. After the skeleton is obtained, this algorithm selects a node as a root and directs the remaining edges according to this selection. ...
Article
Full-text available
Bayesian networks are useful tools for the representation of non-linear interactions among variables. Recently, they have been combined with evolutionary methods to form a new class of optimization algorithms: the Factorized Distribution Algorithms (FDAs). FDAs have been proved to be significantly better than their genetic ancestors in problems where strong interactions exist among the variables. They learn and sample distributions instead of using crossover and mutation operators. Most of the FDAs that have been designed learn general Bayesian networks. However, in this work we study an FDA that learns polytrees, which are singly connected directed graphs. Key words: Bayesian networks, evolutionary algorithms, FDAs. MSC: 90B15, 90C35.
... Mutual information has been used in pattern recognition and information retrieval for finding association between attributes [6,52]. A dependence tree consisting of pairs of most dependent attributes can be constructed by using mutual information as a measure of dependency between two attributes [8]. Mutual information and related dependence trees and generalized dependence graphs have been used in probabilistic networks and expert systems [9,40]. ...
Article
Full-text available
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
Conference Paper
We consider a specific graph learning task: reconstructing a symmetric matrix that represents an underlying graph using linear measurements. We study fundamental trade-offs between the number of measurements (sample complexity), the complexity of the graph class, and the probability of error by first deriving a necessary condition (fundamental limit) on the number of measurements. Then, by considering a two-stage recovery scheme, we give a sufficient condition for recovery. In the special cases of the uniform distribution on trees with n nodes and the Erdös-Rényi (n, p) class, the sample complexity derived from the fundamental trade-offs is tight up to multiplicative factors. In addition, we design and implement a polynomial-time (in n) algorithm based on the two-stage recovery scheme. Simulations for several canonical graph classes and IEEE power system test cases demonstrate the effectiveness of the proposed algorithm for accurate topology and parameter recovery.
Article
Full-text available
The last decade has seen an increase in the attention paid to the development of cost‐sensitive learning algorithms that aim to minimize misclassification costs while still maintaining accuracy. Most of this attention has been on cost‐sensitive decision tree learning, whereas relatively little attention has been paid to assess if it is possible to develop better cost‐sensitive classifiers based on Bayesian networks. Hence, this paper presents EBNO, an algorithm that utilizes Genetic algorithms to learn cost‐sensitive Bayesian networks, where genes are utilized to represent the links between the nodes in Bayesian networks and the expected cost is used as a fitness function. An empirical comparison of the new algorithm has been carried out with respect to (a) an algorithm that induces cost‐insensitive Bayesian networks to provide a base line, (b) ICET, a well‐known algorithm that uses Genetic algorithms to induce cost‐sensitive decision trees, (c) use of MetaCost to induce cost‐sensitive Bayesian networks via bagging, (d) use of AdaBoost to induce cost‐sensitive Bayesian networks, and (e) use of XGBoost, a gradient boosting algorithm, to induce cost‐sensitive decision trees. An empirical evaluation on 28 data sets reveals that EBNO performs well in comparison with the algorithms that produce single interpretable models and performs just as well as algorithms that use bagging and boosting methods.
Article
The increasing proliferation of Cloud Services (CSs) has made the reliable CS selection problem a major challenge. To tackle this problem, this article introduces a new trust model called Chain Augmented Naïve Bayes-based Trust Model (CAN-TM). This model leverages the correlation that may exist among QoS attributes to solve many issues in reliable CS selection challenge, such as predicting missing assessments and improving accuracy of trust computing. This is achieved by combining both the n-gram Markov model and the Naïve Bayes model. Experiments are conducted to validate that our proposed CAN-TM outperforms state-of-the-art approaches.
Article
We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality-constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and provides a high-quality solution with a guarantee on its suboptimality, even if the algorithm is terminated early. Using a variety of synthetic and real datasets, we demonstrate that our approach can solve problems where the dimension of the inverse covariance matrix is in the thousands. We also demonstrate that our approach produces significantly sparser solutions than Glasso and other popular learning procedures, and makes fewer false discoveries, while still maintaining state-of-the-art accuracy.
Article
For high‐dimensional data, it is a tedious task to determine anomalies such as outliers. We present a novel outlier detection method for high‐dimensional contingency tables. We use the class of decomposable graphical models to model the relationship among the variables of interest, which can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed form expression of the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow‐Liu trees.
Article
Full-text available
Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper, we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so‐called treewidth of the graph characterises this algorithmic complexity: low‐treewidth graphs can be processed efficiently. The first point that we illustrate is therefore the idea that for inference in graphical models, the number of variables is not the limiting factor, and it is worth checking the width of several tree decompositions of the graph before resorting to the approximate method. We show how algorithms providing an upper bound of the treewidth can be exploited to derive a ‘good' elimination order enabling to realise exact inference. The second point is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time consuming than Monte‐Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden variables restoration in coupled Hidden Markov Models.
Conference Paper
We present a compressed sensing based approach to multilabel classification that exploits the label structure present in many multilabel applications. The compressed sensing method exploits the sparsity in the label vector. The label vector is projected to a lower dimensional space by a random projection matrix. From the training data we learn how to predict the projected vector directly from the features of the samples. For a new test sample, we first predict the projected vector and then use compressed sensing recovery algorithm to estimate the sparse label vector. For many practical scenarios the label vector is not only sparse but the active labels represent a context or theme; hence have a structure. In this paper we propose to learn the label structure instead of considering the individual labels to be independent and identically distributed. We assume a Bayesian model for the labels and model the label structure as latent tree. We learn the label structure from the training data and use the learned structure during estimation of the label vector from predicted projections. Furthermore, we propose a new structure learning approach where we hash the labels into smaller number of buckets and learn the structure from these buckets. This significantly reduces the computational complexity without sacrificing accuracy. We present numerical results to demonstrate this approach and its benefit.
Chapter
Pattern mining is one of the most important aspects of data mining. By far the most popular and well-known approach is frequent pattern mining. That is, to discover patterns that occur in many transactions. This approach has many virtues including monotonicity, which allows efficient discovery of all frequent patterns. Nevertheless, in practice frequent pattern mining rarely gives good results—the number of discovered patterns is typically gargantuan and they are heavily redundant. Consequently, a lot of research effort has been invested toward improving the quality of the discovered patterns. In this chapter we will give an overview of the interestingness measures and other redundancy reduction techniques that have been proposed to this end. In particular, we first present classic techniques such as closed and non-derivable itemsets that are used to prune unnecessary itemsets. We then discuss techniques for ranking patterns on how expected their score is under a null hypothesis—considering patterns that deviate from this expectation to be interesting. These models can either be static, as well as dynamic; we can iteratively update this model as we discover new patterns. More generally, we also give a brief overview on pattern set mining techniques, where we measure quality over a set of patterns, instead of individually. This setup gives us freedom to explicitly punish redundancy which leads to a more to-the-point results.
Chapter
In this chapter, we review the Estimation of Distribution Algorithms proposed for the solution of combinatorial optimization problems and optimization in continuous domains. Different approaches for Estimation of Distribution Algorithms have been ordered by the complexity of the interrelations that they are able to express. These will be introduced using one unified notation.
Article
In this chapter a preliminary work on the use of Estimation of Distribution Algorithms (EDAs) for the induction of classification rules is presented. Each individual obtained by simulation of the probability distribution learnt in each EDA generation represents a disjunction of a finite number of simple rules. This problem has been modeled to allow representations with different complexities. Experimental results comparing three types of EDAs – UMDA, a dependency tree and EBNA – with two classical algorithms of rule induction – RIPPER and CN2 – are shown.
Article
Recently, directed information graphs have been proposed as concise graphical representations of the statistical dynamics among multiple random processes. A directed edge from one node to another indicates that the past of one random process statistically affects the future of another, given the past of all other processes. When the number of processes is large, computing those conditional dependence tests becomes difficult. Also, when the number of interactions becomes too large, the graph no longer facilitates visual extraction of relevant information for decision-making. This work considers approximating the true joint distribution on multiple random processes by another, whose directed information graph has at most one parent for any node. Under a Kullback-Leibler (KL) divergence minimization criterion, we show that the optimal approximate joint distribution can be obtained by maximizing a sum of directed informations. In particular, each directed information calculation only involves statistics among a pair of processes and can be efficiently estimated; given all pairwise directed informations, an efficient minimum-weight spanning directed tree algorithm can be used to find the best tree. We demonstrate the efficacy of this approach using simulated and experimental data. In both, the approximations preserve the relevant information for decision-making.
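As one illustration of how the pairwise quantities can be efficiently estimated: if both processes are finite-alphabet and well approximated as first-order Markov, the directed-information rate reduces to the conditional mutual information I(X_{t-1}; Y_t | Y_{t-1}), which a plug-in estimator computes from empirical counts. This Markov assumption is for illustration only; the paper's estimators are more general.

```python
from collections import Counter
from math import log2

def di_rate_order1(x, y):
    """Plug-in estimate of I(X_{t-1}; Y_t | Y_{t-1}) from two
    equal-length finite-alphabet sequences x and y."""
    n = len(x) - 1
    p_abc = Counter(zip(x[:-1], y[1:], y[:-1]))  # (x_{t-1}, y_t, y_{t-1})
    p_ac = Counter(zip(x[:-1], y[:-1]))
    p_bc = Counter(zip(y[1:], y[:-1]))
    p_c = Counter(y[:-1])
    # I(A;B|C) = sum p(a,b,c) log [ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
    return sum((k / n) * log2(k * p_c[c] / (p_ac[a, c] * p_bc[b, c]))
               for (a, b, c), k in p_abc.items())
```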
Article
In this chapter we present an approach for solving the Traveling Salesman Problem using Estimation of Distribution Algorithms (EDAs). This approach is based on using discrete and continuous EDAs to find the best possible solution. We also present a method in which domain knowledge (based on local search) is combined with EDAs to find better solutions. We show experimental results obtained on several standard examples for discrete and continuous EDAs, both alone and combined with a heuristic local search.
Article
Full-text available
In this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs) that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits the local structure converge faster than those of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require fewer parameters.
Article
Full-text available
Recently, Fredman and Tarjan invented a new, especially efficient form of heap (priority queue). Their data structure, the Fibonacci heap (or F-heap), supports arbitrary deletion in O(log n) amortized time and other heap operations in O(1) amortized time. In this paper we use F-heaps to obtain fast algorithms for finding minimum spanning trees in undirected and directed graphs. For an undirected graph containing n vertices and m edges, our minimum spanning tree algorithm runs in O(m log β(m, n)) time, improved from O(m β(m, n)) time, where β(m, n) = min{i | log^(i) n ≤ m/n}. Our minimum spanning tree algorithm for directed graphs runs in O(n log n + m) time, improved from O(n log n + m log log log_(m/n+2) n). Both algorithms can be extended to allow a degree constraint at one vertex.
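For intuition, the heap-driven structure of these MST algorithms can be sketched with Python's binary heap (heapq); this gives the simpler O(m log n) bound rather than the F-heap bounds of the paper:

```python
import heapq

def prim_mst(adj):
    """Prim's algorithm with a binary heap and lazy deletion.
    adj: {u: [(weight, v), ...]} for an undirected connected graph."""
    start = next(iter(adj))
    seen, heap, mst = {start}, [], []
    for w, v in adj[start]:
        heapq.heappush(heap, (w, start, v))
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in seen:
            continue  # stale entry; v was already connected more cheaply
        seen.add(v)
        mst.append((u, v, w))
        for w2, x in adj[v]:
            if x not in seen:
                heapq.heappush(heap, (w2, v, x))
    return mst
```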
Chapter
Full-text available
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
Article
Full-text available
New measures are proposed for mutual and causal dependence between two time series, based on information theoretical ideas. The measure of mutual dependence is shown to be the sum of the measure of unidirectional causal dependence from the first time series to the second, the measure of unidirectional causal dependence from the second to the first, and the measure of instantaneous causal dependence. The measures are applicable to any kind of time series: continuous, discrete, or categorical.
Article
Full-text available
This paper addresses the problem of learning Bayesian network structures from data by using an information-theoretic dependency analysis approach. Based on our three-phase construction mechanism, two efficient algorithms have been developed. One of our algorithms deals with a special case where the node ordering is given; this algorithm requires only O(N^2) conditional independence (CI) tests and is correct given that the underlying model is DAG-Faithful [Spirtes et al., 1996]. The other algorithm deals with the general case and requires O(N^4) CI tests. It is correct given that the underlying model is monotone DAG-Faithful (see Section 4.4). A system based on these algorithms has been developed and distributed through the Internet. The empirical results show that our approach is efficient and reliable.
Article
Written by one of the preeminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, economics, philosophy, cognitive science, and the health and social sciences. Judea Pearl presents and unifies the probabilistic, manipulative, counterfactual, and structural approaches to causation and devises simple mathematical tools for studying the relationships between causal connections and statistical associations. The book will open the way for including causal analysis in the standard curricula of statistics, artificial intelligence, business, epidemiology, social sciences, and economics. Students in these fields will find natural models, simple inferential procedures, and precise mathematical definitions of causal concepts that traditional texts have evaded or made unduly complicated. The first edition of Causality has led to a paradigmatic change in the way that causality is treated in statistics, philosophy, computer science, social science, and economics. Cited in more than 5,000 scientific publications, it continues to liberate scientists from the traditional molds of statistical thinking. In this revised edition, Judea Pearl elucidates thorny issues, answers readers’ questions, and offers a panoramic view of recent advances in this field of research. Causality will be of interests to students and professionals in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.
Article
Contents: Introduction to Graphs and Networks; Computer Representation and Solution; Tree Algorithms; Shortest-Path Algorithms; Minimum-Cost Flow Algorithms; Matching and Assignment Algorithms; The Postman and Related Arc Routing Problems; The Traveling Salesman and Related Vertex Routing Problems; Location Problems; Project Networks; NETSOLVE User's Manual.
Article
This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.
Article
We consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that for any single edge computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all the n (n - 1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward-backward technique and fast Moebius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes
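The fast Möbius transform mentioned here operates over the subset lattice. A minimal sketch of the forward (zeta, i.e., subset-sum) direction in O(n 2^n); the inverse Möbius transform is obtained by subtracting instead of adding:

```python
def fast_zeta(f, n):
    """Given f as a list of length 2**n indexed by subsets of {0,...,n-1},
    return g with g[S] = sum of f[T] over all T that are subsets of S."""
    g = list(f)
    for i in range(n):               # process one element at a time
        for S in range(1 << n):
            if S & (1 << i):         # if element i is in S, absorb S \ {i}
                g[S] += g[S ^ (1 << i)]
    return g
```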
Article
There is an increasing interest in studying control systems employing multiple sensors and actuators that are geographically distributed. Communication is an important component of these distributed and networked control systems. Hence, there is a need to understand the interactions between the control components and the communication components of the distributed system. In this paper, we formulate a control problem with a communication channel connecting the sensor to the controller. Our task involves designing the channel encoder and channel decoder along with the controller to achieve different control objectives. We provide upper and lower bounds on the channel rate required to achieve these different control objectives. In many cases, these bounds are tight. In doing so, we characterize the "information complexity" of different control objectives.
Chapter
Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: “The simplest explanation is best”) and to probability and statistics (error rates for optimal hypothesis testing and estimation). The relationship of information theory to other fields is discussed. Information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We describe these areas of intersection in detail.
Article
A distributed algorithm is presented for constructing minimum-weight directed spanning trees (arborescences), each with a distinct root node, in a strongly connected directed graph. A processor exists at each node. Given the weights and origins of the edges incoming to their nodes, the processors follow the algorithm and exchange messages with their neighbors until all arborescences are constructed. The amount of information exchanged and the time to completion are O(|N|^2).
Article
A generalization of information theory is presented with the aim of distinguishing the direction of information flow for mutually coupled statistical systems. The bidirectional communication theory refers to two systems. Two directed transinformations are defined which are a measure of the statistical coupling between the systems. Their sum equals Shannon's transinformation. An information flow diagram explains the relation between the directed transinformations and the entropies of the sources. An extension to a group of such systems has also been proposed. The theory is able to describe the informational relationships between living beings and other multivariate complex systems as encountered in economy. An application example referring to group behavior with monkeys is given.
Conference Paper
We consider channel coding with feedback for the general case where the feedback may be an arbitrary deterministic function of the output samples. Under the assumption that the channel states take values in a finite alphabet, we find an achievable rate and an upper bound on the capacity. We conclude by showing that when the channel is indecomposable, and has no intersymbol interference, its capacity is given by the limit of the maximum of the (normalized) directed information between the input X^N and the output Y^N, i.e., C = lim_{N→∞} (1/N) max I(X^N → Y^N), where the maximization is over the causal conditioning probability Q(x^N ∥ z^{N−1}) defined in this paper.
Article
We consider the capacity of discrete-time channels with feedback for the general case where the feedback is a time-invariant deterministic function of the output samples. Under the assumption that the channel states take values in a finite alphabet, we find a sequence of achievable rates and a sequence of upper bounds on the capacity. The achievable rates and the upper bounds are computable for any N, and the limits of the sequences exist. We show that when the probability of the initial state is positive for all the channel states, then the capacity is the limit of the achievable-rate sequence. We further show that when the channel is stationary, indecomposable, and has no intersymbol interference (ISI), its capacity is given by the limit of the maximum of the (normalized) directed information between the input X^N and the output Y^N, i.e., C = lim_{N→∞} (1/N) max I(X^N → Y^N), where the maximization is taken over the causal conditioning probability Q(x^N ∥ z^{N−1}) defined in this paper. The main idea for obtaining the results is to add causality into Gallager's results on finite-state channels. The capacity results are used to show that the source-channel separation theorem holds for time-invariant deterministic feedback, and that if the state of the channel is known both at the encoder and the decoder, then feedback does not increase capacity.
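For reference, the directed information and the causal-conditioning distribution appearing in these capacity expressions are, in their standard definitions,

$$I(X^N \to Y^N) = \sum_{i=1}^{N} I(X^i; Y_i \mid Y^{i-1}), \qquad Q(x^N \,\|\, z^{N-1}) = \prod_{i=1}^{N} Q(x_i \mid x^{i-1}, z^{i-1}).$$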
Article
In this paper, we introduce a general framework for treating channels with memory and feedback. First, we prove a general feedback channel coding theorem based on Massey's concept of directed information. Second, we present coding results for Markov channels. This requires determining appropriate sufficient statistics at the encoder and decoder. We give a recursive characterization of these sufficient statistics. Third, a dynamic programming framework for computing the capacity of Markov channels is presented. Fourth, it is shown that the average cost optimality equation (ACOE) can be viewed as an implicit single-letter characterization of the capacity. Fifth, scenarios with simple sufficient statistics are described. Sixth, error exponents for channels with feedback are presented.
Conference Paper
We study the problem of gambling in horse races with causal side information and show that Massey's directed information characterizes the increment in the maximum achievable capital growth rate due to the availability of side information. This result gives a natural interpretation of directed information I(Y^n → X^n) as the amount of information that Y^n causally provides about X^n. Extensions to stock market portfolio strategies and data compression with causal side information are also discussed.
Article
We investigate the role of directed information in portfolio theory, data compression, and statistics with causality constraints. In particular, we show that directed information is an upper bound on the increment in growth rates of optimal portfolios in a stock market due to causal side information. This upper bound is tight for gambling in a horse race, which is an extreme case of stock markets. Directed information also characterizes the value of causal side information in instantaneous compression and quantifies the benefit of causal inference in joint compression of two stochastic processes. In hypothesis testing, directed information evaluates the best error exponent for testing whether a random process Y causally influences another process X or not. These results lead to a natural interpretation of directed information I(Y^n → X^n) as the amount of information that a random sequence Y^n = (Y_1, Y_2, ..., Y_n) causally provides about another random sequence X^n = (X_1, X_2, ..., X_n). A new measure, directed lautum information, is also introduced and interpreted in portfolio theory, data compression, and hypothesis testing.
Conference Paper
In this paper, we consider a dynamical system whose state is an input to a memoryless channel. The state of the dynamical system is affected by its past, an exogenous input, and causal feedback from the channel's output. We consider maximizing the directed information between the input signal and the channel output, over all exogenous input distributions and/or dynamical system policies. We demonstrate that under certain conditions, reversibility of a Markov chain implies directed information is maximized. With this, we develop achievability theorems for channels with (infinite) memory as well as optimality conditions for sequential estimation of Markov processes through dynamical systems with causal feedback. We provide examples, which include the exponential server timing channel and the trapdoor channel.
Article
Learning a Bayesian network structure from data is a well-motivated but computationally hard task. We present an algorithm that computes the exact posterior probability of a subnetwork, e.g., a directed edge; a modified version of the algorithm finds one of the most probable network structures. This algorithm runs in time O(n 2^n + n^{k+1} C(m)), where n is the number of network variables, k is a constant maximum in-degree, and C(m) is the cost of computing a single local marginal conditional likelihood for m data instances. This is the first algorithm with less than super-exponential complexity with respect to n. Exact computation allows us to tackle complex cases where existing Monte Carlo methods and local search procedures potentially fail. We show that also in domains with a large number of variables, exact computation is feasible, given suitable a priori restrictions on the structures; combining exact and inexact methods is also possible. We demonstrate the applicability of the presented algorithm on four synthetic data sets with 17, 22, 37, and 100 variables.
Thesis
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data. In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Article
Advances in recording technologies have given neuroscience researchers access to large amounts of data, in particular, simultaneous, individual recordings of large groups of neurons in different parts of the brain. A variety of quantitative techniques have been utilized to analyze the spiking activities of the neurons to elucidate the functional connectivity of the recorded neurons. In the past, researchers have used correlative measures. More recently, to better capture the dynamic, complex relationships present in the data, neuroscientists have employed causal measures, most of which are variants of Granger causality, with limited success. This paper motivates the directed information, an information and control theoretic concept, as a modality-independent embodiment of Granger's original notion of causality. Key properties include: (a) it is nonzero if and only if one process causally influences another, and (b) its specific value can be interpreted as the strength of a causal relationship. We next describe how the causally conditioned directed information between two processes given knowledge of others provides a network version of causality: it is nonzero if and only if, in the presence of the present and past of other processes, one process causally influences another. This notion is shown to be able to differentiate between true direct causal influences, common inputs, and cascade effects among more than two processes. We next describe a procedure to estimate the directed information on neural spike trains using point process generalized linear models, maximum likelihood estimation and information-theoretic model order selection. We demonstrate that on a simulated network of neurons, it (a) correctly identifies all pairwise causal relationships and (b) correctly identifies network causal relationships. This procedure is then used to analyze ensemble spike train recordings in primary motor cortex of an awake monkey while performing target reaching tasks, uncovering causal relationships whose directionality is consistent with predictions made from the wave propagation of simultaneously recorded local field potentials.
Article
Thesis (doctoral)--Eidgenössische Technische Hochschule Zürich, 1998. Includes bibliographical references (p. 129-135).
Article
There occurs on some occasions a difficulty in deciding the direction of causality between two related variables and also whether or not feedback is occurring. Testable definitions of causality and feedback are proposed and illustrated by use of simple two-variable models. The important problem of apparent instantaneous causality is discussed and it is suggested that the problem often arises due to slowness in recording information or because a sufficiently wide class of possible causal variables has not been used. It can be shown that the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation. Measures of causal lag and causal strength can then be constructed. A generalization of this result with the partial cross spectrum is suggested. The object of this paper is to throw light on the relationships between certain classes of econometric models involving feedback and the functions arising in spectral analysis, particularly the cross spectrum and the partial cross spectrum. Causality and feedback are here defined in an explicit and testable fashion. It is shown that in the two-variable case the feedback mechanism can be broken down into two causal relations and that the cross spectrum can be considered as the sum of two cross spectra, each closely connected with one of the causations. The next three sections of the paper briefly introduce those aspects of spectral methods, model building, and causality which are required later. Section IV presents the results for the two-variable case and Section V generalizes these results for three variables.
Conference Paper
Understanding the role of communication in networked control systems is an important and challenging problem. In this paper we determine the communication rates required to achieve a variety of cooperative control objectives. Our results depend heavily on three information-theoretic ideas: the directed data processing inequality, Slepian-Wolf coding, and source coding with side-information at the receiver.
Conference Paper
Two conservation laws for mutual information in terms of directed informations between two synchronized sequences of random variables are derived, the first for the case of no conditioning and the second for the case of causal conditioning on a third synchronized sequence. As a byproduct of the derivation of the first conservation law, the directed information specifying the feedback flowing from the second sequence to the first sequence is identified; this leads to a simple proof that a previously known sufficient condition for equality of mutual and directed information is also a necessary condition.
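For concreteness, the first conservation law can be written out as follows (reconstructed here from the standard statement of the result, not quoted from the abstract). With 0Y^{N-1} = (0, Y_1, ..., Y_{N-1}) denoting the unit-delayed second sequence, the law reads

\[
I\big(X^N; Y^N\big) \;=\; I\big(X^N \to Y^N\big) \;+\; I\big(0Y^{N-1} \to X^N\big),
\]

where the second term is precisely the feedback flow identified above; the equality I(X^N; Y^N) = I(X^N → Y^N) therefore holds if and only if that feedback term vanishes.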
Article
In this work, we consider a source coding model with feed-forward. We analyze a system with a noiseless feed-forward link where the decoder has knowledge of all previous source samples while reconstructing the present sample. The rate-distortion function for an arbitrary source with feed-forward is derived in terms of directed information, a variant of mutual information. We further investigate the nature of the rate-distortion function with feed-forward for two common types of sources: discrete memoryless sources and Gaussian sources. We then characterize the error exponent for a general source with feed-forward. The results are then extended to feed-forward with an arbitrary delay larger than the block length.
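The resulting characterization takes the following form (a reconstruction from the published literature on source coding with feed-forward, so treat the details as my reading rather than a quotation): the directed information runs from the reconstruction to the source,

\[
R_{\mathrm{ff}}(D) \;=\; \lim_{n \to \infty} \, \min \, \frac{1}{n}\, I\big(\hat{X}^n \to X^n\big),
\]

with the minimum taken over joint distributions of (X^n, \hat{X}^n) satisfying the expected-distortion constraint D.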
Article
A method is presented to optimally approximate an n-dimensional discrete probability distribution by a product of second-order distributions, i.e., the distribution of first-order tree dependence. The problem is to find an optimum set of n - 1 first-order dependence relationships among the n variables. It is shown that the procedure derived in this paper yields an approximation with a minimum difference in information. It is further shown that when this procedure is applied to empirical observations from an unknown distribution of tree dependence, the procedure is the maximum-likelihood estimate of the distribution.
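Since this is the result the present paper generalizes, a compact sketch may help: estimate all pairwise mutual informations from data and take a maximum-weight spanning tree over them. The following is a minimal illustrative Python version (function names are mine; practical implementations add smoothing and faster MST routines):

import numpy as np
from itertools import combinations

def mutual_info(a, b):
    # Empirical mutual information (bits) between two discrete samples.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for u, v in zip(a, b):
        joint[u, v] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(1), joint.sum(0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz])).sum())

def chow_liu(data):
    # Edges of the maximum-MI spanning tree over the columns of `data`
    # (rows = samples): Chow and Liu's optimal tree approximation.
    n = data.shape[1]
    w = {(i, j): mutual_info(data[:, i], data[:, j])
         for i, j in combinations(range(n), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:  # Prim's algorithm, maximizing total MI
        i, j = max(((i, j) for (i, j) in w
                    if (i in in_tree) ^ (j in in_tree)), key=w.get)
        edges.append((i, j))
        in_tree |= {i, j}
    return edges

# Toy data: a chain X1 -> X2 -> X3; the recovered tree links 0-1 and 1-2.
rng = np.random.default_rng(2)
x1 = rng.integers(0, 2, 4000)
x2 = (x1 + rng.binomial(1, 0.1, 4000)) % 2
x3 = (x2 + rng.binomial(1, 0.1, 4000)) % 2
print(chow_liu(np.column_stack([x1, x2, x3])))  # e.g. [(0, 1), (1, 2)]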
Article
This paper addresses fundamental limitations of feedback using information-theoretic conservation laws and flux arguments. The paper has two parts. In the first part, we derive a conservation law dictating that causal feedback cannot reduce the differential entropy inserted in the loop by external sources. An interpretation of this result is that the total randomness induced by disturbances, as measured by differential entropy, cannot be reduced by causal feedback; it can only be re-allocated in time or in frequency (if well defined). Under asymptotic stationarity assumptions, this result has a spectral representation which constitutes an extension of Bode's inequality for arbitrary feedback. Our proofs make clear the role of causality, as well as how stability assumptions impact the final result. In the second part, we derive an inequality unveiling that the feedback loop must be able to convey information originating from two independent sources: 1) initial states of the physical plant; 2) exogenous disturbance signals. Using this principle, we construct a variety of information rate (information flux) inequalities. Furthermore, we derive a universal performance bound which is parameterized solely by the feedback capacity and the parameters of the plant. The latter is a new fundamental limitation, different from Bode's classical result, indicating that finite feedback capacity brings a new type of performance bound.
Article
In this paper, we show a general equivalence between feedback stabilization through an analog communication channel and a communication scheme based on feedback which is a generalization of that of Schalkwijk and Kailath. We also show that the achievable transmission rate of the scheme is given by Bode's sensitivity integral formula, which characterizes a fundamental limitation of causal feedback. Therefore, we can now use the many results and design tools from control theory to design feedback communication schemes providing desired communication rates, and to generate lower bounds on the channel feedback capacity. We consider single-user Gaussian channels with memory and memoryless multiuser broadcast, multiple access, and interference channels. In all cases, the results we obtain either achieve the feedback capacity when it is known, recover the best known rates, or provide new best achievable rates.
Article
A new approach for learning Bayesian belief networks from raw data is presented. The approach is based on Rissanen's Minimal Description Length (MDL) principle, which is particularly well suited for this task. Our approach does not require any prior assumptions about the distribution being learned. In particular, our method can learn unrestricted multiply-connected belief networks. Furthermore, unlike other approaches, our method allows us to trade off accuracy against complexity in the learned model. This is important since if the learned model is very complex (highly connected), it can be computationally intractable to use. In such a case it would be preferable to use a simpler model even if it is less accurate. MDL offers a principled method for making this tradeoff. We also show that our method generalizes previous approaches based on Kullback cross-entropy. Experiments have been conducted to demonstrate the feasibility of the approach. Appears in Proceedings of 2nd Pacific Rim Int...
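The tradeoff MDL makes can be seen in the generic two-part score (this is the standard form of the MDL network score, not necessarily the paper's exact encoding): for a network B with |B| free parameters and data D = (d_1, ..., d_N),

\[
\mathrm{MDL}(B \mid D) \;=\; \underbrace{\frac{\log N}{2}\,|B|}_{\text{model cost}} \;-\; \underbrace{\sum_{i=1}^{N} \log P_B(d_i)}_{\text{data fit}},
\]

so densely connected networks pay a larger model-cost term, which is exactly the accuracy-versus-complexity tradeoff described above.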
Article
In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over the network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has a much smoother posterior "landscape". We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
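The key computational step, summing over all networks consistent with an order, works because a decomposable score lets the sum factor per variable (reconstructed from the published "MCMC over orders" technique, with in-degree bounded by k):

\[
P(D \mid \prec) \;=\; \prod_{i=1}^{n} \; \sum_{\substack{U \subseteq \{X_j \,:\, X_j \prec X_i\} \\ |U| \le k}} \mathrm{score}(X_i, U ; D),
\]

so each variable's parent-set sum ranges over at most O(n^k) candidates rather than over whole structures.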
Article
It is shown that the "usual definition" of a discrete memoryless channel (DMC) in fact prohibits the use of feedback. The difficulty stems from the confusion of causality and statistical dependence. An adequate definition of a DMC is given, as well as a definition of using a channel without feedback. A definition, closely based on an old idea of Marko, is given for the directed information flowing from one sequence to another. This directed information is used to give a simple proof of the well-known fact that the use of feedback cannot increase the capacity of a DMC. It is shown that, when feedback is present, directed information is a more useful quantity than the traditional mutual information. Introduction: Information theory has enjoyed little success in dealing with systems that incorporate feedback. Perhaps it was for this reason that C.E. Shannon chose feedback as the subject of the first Shannon Lecture, which he delivered at the 1973 IEEE International Symposium on Informati...
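Massey's definition, on which the present paper's causal dependence trees build, is worth stating (reconstructed from its standard form rather than quoted): for sequences X^N and Y^N,

\[
I\big(X^N \to Y^N\big) \;=\; \sum_{n=1}^{N} I\big(X^n; Y_n \mid Y^{n-1}\big),
\]

which differs from the mutual information I(X^N; Y^N) = \sum_n I(X^N; Y_n \mid Y^{n-1}) only in that the n-th term conditions on the causal past X^n rather than the whole sequence X^N; the two coincide exactly when there is no feedback.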
J. Rissanen and M. Wax, "Measures of mutual and causal dependence between two time series (Corresp.)," IEEE Transactions on Information Theory, vol. 33, no. 4, pp. 598–601, 1987.
C. Granger, "Investigating causal relations by econometric models and cross-spectral methods," Econometrica, vol. 37, no. 3, pp. 424–438, 1969.
C. Chow and C. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 462–467, 1968.