Chapter

Elements of Information Theory

Authors: Thomas M. Cover and Joy A. Thomas

Abstract

Half-title page, series page, title page, copyright page, dedication, preface, acknowledgements, contents, list of figures, half-title page, index.

... The choice of orderings is restricted to the set of permutations whose disjoint cycles each consist of the elements of one cluster. RCC is optimal as it achieves the Shannon bound [Cover, 1999] in bit savings. The worst-case computational complexity of RCC is quasi-linear in the largest cluster size and requires no training or machine learning techniques. ...
... , created via concatenation, is uniquely decodable. It is known that $\mathbb{E}_{X \sim P_X}[C(X)] \ge H(P_X)$ for any uniquely decodable code, and any code with average length close to $H(P_X)$ must obey $C(x) \approx -\log P_X(x)$ [Shannon, 1948, Cover, 1999]. It is possible to construct C using entropy coders such as Asymmetric Numeral Systems [Duda, 2009], Arithmetic Coding [Witten et al., 1987], or Huffman Codes [Huffman, 1952]. ...
... Results are shown in Table 1. Scalar quantization [Cover, 1999, Lloyd, 1982] ... [2019] where the number of clusters is held fixed to approximately $\sqrt{n}$. Savings in bytes-per-element are shown in the second column, where $\log|\Pi| = \sqrt{n}\,\log((\sqrt{n}-1)!)$, and agree with Table 1. ...
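As a rough, self-contained illustration of the quoted savings expression $\log|\Pi| = \sqrt{n}\,\log((\sqrt{n}-1)!)$, the sketch below evaluates it numerically, assuming roughly $\sqrt{n}$ clusters of size $\sqrt{n}$ and log base 2, and reporting bytes per element; the function name and the reporting convention are illustrative, not taken from the paper.

```python
import math

def rcc_savings_bytes_per_element(n: int) -> float:
    """Evaluate log2|Pi| = sqrt(n) * log2((sqrt(n) - 1)!) for n elements split
    into ~sqrt(n) clusters, and report it as bytes saved per element."""
    k = math.isqrt(n)                            # ~sqrt(n) clusters
    log2_fact = math.lgamma(k) / math.log(2)     # log2((k-1)!) since lgamma(k) = ln((k-1)!)
    return k * log2_fact / (8 * n)               # bits of |Pi| spread over n elements, in bytes

for n in (10_000, 1_000_000):
    print(n, round(rcc_savings_bytes_per_element(n), 3))
```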
Preprint
We present an optimal method for encoding cluster assignments of arbitrary data sets. Our method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment information as cycles of the permutation defined by the order of encoded elements. RCC does not require any training and its worst-case complexity scales quasi-linearly with the size of the largest cluster. We characterize the achievable bit rates as a function of cluster sizes and number of elements, showing RCC consistently outperforms previous methods while requiring less compute and memory resources. Experiments show RCC can save up to 2 bytes per element when applied to vector databases, and removes the need for assigning integer ids to identify vectors, translating to savings of up to 70% in vector database systems for similarity search applications.
... The aforementioned connection attains a third dimension when one realizes that the game-theoretic interpretations of the fixed points can further be supplemented with an interesting information-theoretic connection: attainment of ESS in the course of the time-evolution of frequencies is manifested [17] as a decrease in an appropriately constructed Kullback-Leibler (KL) divergence (also called relative entropy) [18,19]. In the context of this paper, let us bring a rather obvious point to the fore: both the game-theoretic and the information-theoretic meanings are associated with the fixed points of the replicator dynamics, as ESS, by construction, corresponds to a single population state. ...
... This matches with Eq. (15). We can easily see that one trivial possibility is that the square-bracket terms in Eq. (19) are individually zero, which corresponds to NE. But another non-trivial possibility is that the overall sum is zero; whether such a solution exists depends on the exact structure of the payoff matrices. ...
... In this section, we adopt an information-theoretic perspective to guide us in interpreting periodic orbits. We find the relative entropy, also known as the Kullback-Leibler (KL) divergence [18,19], to be especially valuable for this purpose. It is defined as follows, ...
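For reference, the definition elided above is the standard one (written here with the usual convention $0 \log 0 = 0$):

$$
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x \in \mathcal{X}} P(x)\,\log\frac{P(x)}{Q(x)} \;\ge\; 0,
$$

with equality if and only if $P = Q$.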
Article
Full-text available
The concept of evolutionary stability and its relation to the fixed points of the replicator equation are important aspects of evolutionary game dynamics. In light of the fact that an oscillating population state and individuals (or players) of different roles are quite natural occurrences, we ask how the concept of evolutionary stability can be generalized so as to associate a game-theoretic meaning with the oscillatory behaviours of asymmetrically interacting players, i.e., when there are both intraspecific and interspecific interactions between two subpopulations in the population. We guide our scheme of generalization such that evolutionary stability is related to the dynamic stability of the corresponding periodic orbits of a time-discrete replicator dynamics. We name this generalization of the evolutionarily stable state the two-species heterogeneity stable orbit. Furthermore, we invoke the principle of decrease of relative entropy in order to associate the generalization of evolutionary stability with an information-theoretic meaning. This particular generalization is aptly termed the two-species information stable orbit.
... Given the importance of data processing and monotonic divergences, people have studied the mathematical structure that gives rise to monotonic divergences, e.g. [2][3][4][5][6][7][8] and references therein, as well as found various applications for them (see [9][10][11][12] for many such applications). In doing so, it has become of interest to relate the different divergences to each other in the sense of bounding one by another (see in particular [13] and references therein). ...
... In the above, $L_{f_\alpha} = 1$ similarly satisfies (9). The inequalities for KL divergence in Propositions 10 and 11 are equivalent to $L_{f_\alpha} = 4$ with α ∈ {1, 0} in Proposition 12, and may alternatively be viewed as direct corollaries of it. ...
... Then, $L_f = 8$ satisfies (9). ...
Preprint
The data processing inequality is central to information theory and motivates the study of monotonic divergences. However, it is not clear whether, operationally, we need to consider all such divergences. We establish a simple method for Pinsker inequalities as well as general bounds in terms of χ²-divergences for twice-differentiable f-divergences. These tools imply new relations for input-dependent contraction coefficients. We use these relations to show that for many f-divergences the rate of contraction of a time-homogeneous Markov chain is characterized by the input-dependent contraction coefficient of the χ²-divergence. This is efficient to compute and the fastest it could converge for a class of divergences. We show similar ideas hold for mixing times. Moreover, we extend these results to the Petz f-divergences in quantum information theory, albeit without any guarantee of efficient computation. These tools may have applications in other settings where iterative data processing is relevant.
... The maximum uncertainty for a variable with N possible states is $H_{\max} = \log(N)$ [36] and we can define the information, I, in a system simply as: ...
... where U is the uniform distribution over X [36]. ...
... Here, we will provide another demonstration for both cases, continuous and discrete, based on the chain rule for relative entropy, $D_{\mathrm{KL}}$ [36]. ...
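A minimal sketch of the two quantities referenced in these excerpts, under the assumption that the information I is the gap $\log N - H(X)$, i.e. the relative entropy from the uniform distribution U over the N states (function names and the base-2 convention are illustrative):

```python
import numpy as np

def entropy(p, base=2.0):
    """Shannon entropy H(P) = -sum_x P(x) log P(x), skipping zero-probability states."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)) / np.log(base))

def information(p, base=2.0):
    """I = H_max - H(P) = log N - H(P), which equals D_KL(P || U) for U uniform."""
    return float(np.log(len(p)) / np.log(base)) - entropy(p, base)

p = np.array([0.7, 0.2, 0.1])
print(entropy(p), information(p))   # the information is 0 iff p is uniform
```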
Article
Full-text available
Population genetics describes evolutionary processes, focusing on the variation within and between species and the forces shaping this diversity. Evolution reflects information accumulated in genomes, enhancing organisms’ adaptation to their environment. In this paper, I propose a model that begins with the distribution of mating based on mutual fitness and progresses to viable adult genotype distribution. At each stage, the changes result in different measures of information. The evolutionary dynamics at each stage of the model correspond to certain aspects of interest, such as the type of mating, the distribution of genotypes in regard to mating, and the distribution of genotypes and haplotypes in the next generation. Changes to these distributions are caused by variations in fitness and result in Jeffreys divergence values other than zero. As an example, a model of hybrid sterility is developed for a biallelic locus, comparing the information indices associated with each stage of the evolutionary process. In conclusion, the informational perspective seems to facilitate the connection between cause and effect and allows the development of statistical tests to perform hypothesis testing against zero-information null models (random mating, no selection, etc.). The informational perspective could contribute to clarifying, deepening, and expanding the mathematical foundations of evolutionary theory.
... by invoking (29) and the triangle inequality. Thus, equations (19)-(21), (24)-(26) and (31)-(33) guarantee that $P(x_1^n, x_2^n, w^n, y^n, z^n, f_1, f_2) \approx p_U(f_1)\,p_U(f_2)\,p(x_1^n, x_2^n, w^n, y^n, z^n)$, ...
... Since mutual information is a continuous function of the probability distribution, expressions (49)-(50) imply $I(Z^n; X_1^n, X_2^n, W^n, Y^n)\big|_{p(x_1^n, x_2^n, w^n, y^n, z^n \mid f_1^*, f_2^*)} \to 0$ (51) as $n \to \infty$, by utilizing the property that if random variables $\Theta$ and $\Theta'$ with the same support set $\Theta$ satisfy $\|p_\Theta - p_{\Theta'}\|_1 \le \epsilon \le 1/4$, then according to [33, Theorem 17.3.3], it follows that $|H(\Theta) - H(\Theta')| \le \zeta \log|\Theta|$, where $\zeta$ approaches zero as $\epsilon$ approaches zero. ...
... where (a) follows from the triangle inequality, (b) follows from [7, Lemma V.1], while (c) follows by the definition of a successful code, i.e., from (52). Consequently, it follows from [33, Theorem 17.3.3] that the mutual information $I(X_1^n, X_2^n, W^n, Y^n; Z^n)$ of interest can be bounded as ...
Preprint
Full-text available
A fundamental problem in decentralized networked systems is to coordinate actions of different agents so that they reach a state of agreement. In such applications, it is additionally desirable that the actions at various nodes may not be anticipated by malicious eavesdroppers. Motivated by this, we investigate the problem of secure multi-terminal strong coordination aided by a multiple-access wiretap channel. In this setup, independent and identically distributed copies of correlated sources are observed by two transmitters who encode the channel inputs to the MAC-WT. The legitimate receiver observing the channel output and side information correlated with the sources must produce approximately i.i.d. copies of an output variable jointly distributed with the sources. Furthermore, we demand that an external eavesdropper learns essentially nothing about the sources and the simulated output sequence by observing its own MAC-WT output. This setting is aided by the presence of independent pairwise shared randomness between each encoder and the legitimate decoder, which is unavailable to the eavesdropper. We derive an achievable rate region based on a combination of coordination coding and wiretap coding, along with an outer bound. The inner bound is shown to be tight and a complete characterization is derived for the special case when the sources are conditionally independent given the decoder side information and the legitimate channel is composed of deterministic links. Further, we also analyze a more general scenario with possible encoder cooperation, where one of the encoders can non-causally crib from the other encoder's input, for which an achievable rate region is proposed. We then explicitly compute the rate regions for an example both with and without cribbing between the encoders, and demonstrate that cribbing strictly improves upon the achievable rate region.
... which is always positive [19], where dθ denotes dMdW. Some terminology: q(θ) is called the R-density, which approximates the true posterior p(θ|s) in the variational scheme. ...
... The extra parameters $\mu_d$ and $w_d$ are the steady-state values of µ and w, respectively, without the driving terms ws and sµ. After substituting Equations (19) and (20) into Equation (15) and evaluating Equation (17), one can determine the neural representations of the momenta $p_\mu$ and $p_w$. The results are given as ...
... The neural masses $m_\mu$ and $m_w$ are a measure of inferential precision, defined to be the inverse noise strengths [Equation (18)]. The frictional coefficients denoted by $\gamma_\mu$ and $\gamma_w$ appear in the generative functions [Equations (19) and (20)], which we set to $\gamma_\mu = 10\gamma_w$ to account for the slower weight dynamics compared to the neuronal activity. Furthermore, the parameters $\mu_d$ and $w_d$ in the inhomogeneous vector [Equation (29)] represent the brain's prior belief about the postsynaptic and weight values before the presynaptic input arrives. ...
Article
Full-text available
The brain is a biological system comprising nerve cells and orchestrates its embodied agent’s perception, behavior, and learning in dynamic environments. The free-energy principle (FEP) advocated by Karl Friston explicates the local, recurrent, and self-supervised cognitive dynamics of the brain’s higher-order functions. In this study, we continue to refine the FEP through a physics-guided formulation; specifically, we apply our theory to synaptic learning by considering it an inference problem under the FEP and derive the governing equations, called Bayesian mechanics. Our study uncovers how the brain infers weight changes and postsynaptic activity, conditioned on the presynaptic input, by deploying generative models of the likelihood and prior belief. Consequently, we exemplify the synaptic efficacy in the brain with a simple model; in particular, we illustrate that the brain organizes an optimal trajectory in neural phase space during synaptic learning in continuous time, which variationally minimizes synaptic surprisal.
... The maximum uncertainty for a variable with N possible states is $H_{\max} = \log(N)$ [36] and we can define the information I of a system simply as ... It is also possible to measure the uncertainty of X conditioned on another ...
... where U is the uniform distribution over X [36]. ...
... In [23] it is proved for the discrete case that $J_{PSS}$ is, in turn, the sum of the sexual selection information within each sex, $J_{PSS} = J_{S1} + J_{S2}$, and in [25] the same is proven for the continuous case. Here we will give another demonstration for both cases, continuous and discrete, based on the chain rule for relative entropy $D_{\mathrm{KL}}$ [36]. ...
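The chain rule invoked here is the standard decomposition of relative entropy over a pair of variables; for reference:

$$
D_{\mathrm{KL}}\big(p(x,y)\,\|\,q(x,y)\big)
\;=\; D_{\mathrm{KL}}\big(p(x)\,\|\,q(x)\big)
\;+\; \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\big(p(y\mid x)\,\|\,q(y\mid x)\big) \right].
$$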
Preprint
Full-text available
Population genetics describes evolutionary processes, focusing on the variation within and between species and the forces shaping this diversity. Evolution reflects information accumulated in genomes, enhancing organisms' adaptation to their environment. In this paper, we propose a model that begins with the distribution of matings based on mutual fitness and progresses to viable adult genotype distribution. At each stage the changes result in different measures of information. The evolutionary dynamics of each stage of the model correspond to aspects such as the type of mating, the distributions of genotypes within matings, and the distribution of genotypes and haplotypes in the next generation. Changes in these distributions are caused by variation in fitness and result in Jeffreys divergence values other than zero. As an example, a model of hybrid sterility is developed at a biallelic locus, comparing the information indices associated with each stage of the evolutionary process. In conclusion, the informational perspective seems to facilitate the connection of causes and effects and allows the development of statistics to test against null models of zero information (random mating, no selection, etc.). The informational perspective could contribute to clarifying, deepening and expanding the mathematical foundations of evolutionary theory.
... The inherent limitation of filter-based methods lies in their inability to consider the interactions between features, which hampers their ability to address the issue of redundancy. Consequently, the effectiveness of widely used filter-based methods, such as Information Gain (IG) (Guyon and Elisseeff 2003), Pointwise Mutual Information (PMI) (Cover and Thomas 1991), Chi-square test (Forman 2003), etc., may significantly decrease. This decrease is primarily caused by the inclusion of a large number of redundant and irrelevant features, which adversely impact the discrimination ability of the classification algorithm. ...
... Instead, we employ a selection algorithm that takes into account the correlation between features with similar classification weights. This correlation, which captures the association between discrete variables, can be effectively measured using mutual information (Cover and Thomas 2006), which quantifies the amount of information shared between two variables and captures the dependency or association between them. ...
... The conditional probabilities of category $c_j$ in the presence or absence of term t are $P(c_j \mid t)$ and $P(c_j \mid \bar{t})$. • Mutual Information (MI) (Cover and Thomas 1991) is a statistical measure that quantifies the relationship between two random variables, X and Y. In text categorization, MI measures the association between a term t and a target category $c_j$ having probabilities P(t) and $P(c_j)$, respectively, computed as the point-wise MI as given in Eq. 7: ...
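As an illustrative sketch (not the paper's code) of the point-wise MI referenced above, with probabilities assumed to be estimated from document counts:

```python
import math

def pointwise_mi(p_t: float, p_c: float, p_tc: float, base: float = 2.0) -> float:
    """Point-wise mutual information PMI(t, c_j) = log[ P(t, c_j) / (P(t) P(c_j)) ]."""
    return math.log(p_tc / (p_t * p_c), base)

# Toy counts: 1000 documents, term t appears in 100, category c_j holds 200,
# and t and c_j co-occur in 50 documents.
print(pointwise_mi(p_t=0.10, p_c=0.20, p_tc=0.05))   # log2(0.05 / 0.02) ~ 1.32 bits
```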
Article
Full-text available
This paper introduces a new hybrid method to address the issue of redundant and irrelevant features selected by filter-based methods for text classification. The method utilizes an enhanced genetic algorithm called “Feature Correlation-based Genetic Algorithm” (FC-GA). Initially, a feature subset with the highest classification accuracy is selected by a filter-based method, which will then be used by the FC-GA to generate potential solutions by considering the correlation between features that have similar classification weights and avoiding useless random solutions. The encoding process involves assigning a value of 0 to features that provide a high degree of correlation with other features having almost the same classification information beyond a specified context, while features that are weakly correlated retain their initial code of 1. Through iterative optimization using crossover and mutation operators, the algorithm should remove redundant features that provide strong correlations and high redundancy, which could lead to improved classification performance at a lower computation cost. The aim of this study is to improve the efficiency of filter-based methods, incorporate feature correlation information into genetic algorithms, and utilize pre-optimized feature subsets to efficiently identify optimal solutions. To evaluate the effectiveness of the proposed method, SVM and NB classifiers are employed on six public datasets and compared to five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by reference filter-based methods are redundant. Eliminating those redundant features leads to a significant improvement in classification performance as measured by the micro-f1 measure.
... $A \perp B \mid C$ [38]. For the information flow from A to B given the intervention on C, we denote it as ...
... The reason is that the intervention on A is generally different from the intervention on B [15]. Note that statistical independence is symmetric, $A \perp B \iff B \perp A$, which means that the mutual information is symmetric [38]. In short, the information flow is different from the mutual information [16]. ...
... To understand the property given by Proposition 1, we only need to consider that, essentially, the information flow in (6) is the relative entropy between both sides of the equality in (5). Hence, when the distributions on both sides are equal, the relative entropy is zero [38]. Corollary 1 just follows directly from Proposition 1. ...
Preprint
Full-text available
Empirical risk minimization (ERM) is a celebrated induction principle for developing data-driven models. However, ERM has drawn both praise and criticism for its capability on domain generalization (DG). To this end, this paper attempts to study the success and failure of ERM at supervised DG classification tasks, both theoretically and empirically, with causal perspectives. In the theoretical aspect, we first explore different properties of a causal metric termed information flow, followed by a discussion of relationships between the information flow and the mutual information in the proposed causal graph. Next, we analyze the roles of the transformed causal feature and the transformed spurious feature on modeling performances. It reveals that the interaction between the spurious influencer and the transformed causal feature is the key factor determining the failure or success of ERM on DG. In the empirical study, we first simulate various DG settings based on the MNIST, Fashion MNIST, and CIFAR10 datasets. Next, we verify the developed theories by testing three different neural network configurations in designed experiments. In addition, experiments based on real-world datasets are conducted to further consolidate key points of the proposed theories. To extend application benefits of the theoretical discoveries, a new risk minimization framework with a novel feature intervention for regulating ERM is proposed. It achieves DG improvements over ERM on real-world datasets of image segmentation, image classification, and text classification.
... where f 0 is a known and concave utility function, x is a vector measuring mean-ergodic system performance, X is a convex set, and P 0 is a total power budget. Problem (3) is well-studied in the literature [1], [27]- [29], and a globally optimal solution is readily available via the well-known waterfilling algorithm [1], [29]. However, although ergodic-optimal policies perform optimally in expectation, they are prone to fluctuations beyond P 0 , e.g., power spikes, particularly over heavy-tailed fading channels. ...
Preprint
Full-text available
Modern wireless communication systems necessitate the development of cost-effective resource allocation strategies, while ensuring maximal system performance. While commonly realizable via efficient waterfilling schemes, ergodic-optimal policies often exhibit instantaneous resource constraint fluctuations as a result of fading variability, violating prescribed specifications possibly within unacceptable margins, inducing further operational challenges and/or costs. On the other extent, short-term-optimal policies -- commonly based on deterministic waterfilling-- while strictly maintaining operational specifications, are not only impractical and computationally demanding, but also suboptimal in a long-term sense. To address these challenges, we introduce a novel distributionally robust version of a classical point-to-point interference-free multi-terminal constrained stochastic resource allocation problem, by leveraging the Conditional Value-at-Risk (CVaR) as a coherent measure of power policy fluctuation risk. We derive closed-form dual-parameterized expressions for the CVaR-optimal resource policy, along with corresponding optimal CVaR quantile levels by capitalizing on (sampling) the underlying fading distribution. We subsequently develop two dual-domain schemes -- one model-based and one model-free -- to iteratively determine a globally-optimal resource policy. Our numerical simulations confirm the remarkable effectiveness of the proposed approach, also revealing an almost-constant character of the CVaR-optimal policy and at rather minimal ergodic rate optimality loss.
... where f 0 is a known and concave utility function, x is a vector measuring mean-ergodic system performance, X is a convex set, and P 0 is a total power budget. Problem (3) is well-studied in the literature [1], [27]- [29], and a globally optimal solution is readily available via the well-known waterfilling algorithm [1], [29]. However, although ergodic-optimal policies perform optimally in expectation, they are prone to fluctuations beyond P 0 , e.g., power spikes, particularly over heavy-tailed fading channels. ...
Conference Paper
Full-text available
Modern wireless communication systems necessitate the development of cost-effective resource allocation strategies, while ensuring maximal system performance. While commonly realizable via efficient waterfilling schemes, ergodic-optimal policies often exhibit instantaneous resource constraint fluctuations as a result of fading variability, violating prescribed specifications possibly within unacceptable margins, inducing further operational challenges and/or costs. On the other extent, short-term-optimal policies -- commonly based on deterministic waterfilling -- while strictly maintaining operational specifications, are not only impractical and computationally demanding, but also suboptimal in a long-term sense. To address these challenges, we introduce a novel distributionally robust version of a classical point-to-point interference-free multi-terminal constrained stochastic resource allocation problem, by leveraging the Conditional Value-at-Risk (CVaR) as a coherent measure of power policy fluctuation risk. We derive closed-form dual-parameterized expressions for the CVaR-optimal resource policy, along with corresponding optimal CVaR quantile levels by capitalizing on (sampling) the underlying fading distribution. We subsequently develop two dual-domain schemes -- one model-based and one model-free -- to iteratively determine a globally-optimal resource policy. Our numerical simulations confirm the remarkable effectiveness of the proposed approach, also revealing an almost-constant character of the CVaR-optimal policy and at rather minimal ergodic rate optimality loss.
... $C_n = \sum_{j=1}^{J} \gamma_j C_{n,j} = \sum_{j=1}^{J} \gamma_j \sigma_j^2 I_{MK}$ represents the covariance matrix of the noise vector n. Furthermore, by applying the concavity of differential entropy in terms of the underlying PDF [39], we have computed the following lower bound for h(n), which is then utilized in inequality (b). ...
... $MK$ [39], using the fact that $Hs \mid H$ and n are independent. The inequality (d) follows since the logarithmic function is strictly increasing, and we have $h_L(n) \le h(n)$, as stated in (19). ...
Article
Full-text available
In this paper, we present a comparative performance analysis of three prominent multicarrier schemes—orthogonal frequency division multiplexing (OFDM), generalized frequency division multiplexing (GFDM), and orthogonal time frequency space (OTFS) modulation—in the presence of doubly dispersive fading channels and impulsive noise. This noise, as a disturbance characterized by short-duration random bursts of high energy, can significantly degrade the quality of service across various wireless and wireline communications scenarios. We consider Bernoulli-normal and Middleton Class A distributions as two well-known impulsive noise models and examine five key performance indicators, including spectral efficiency (SE), overall complexity, peak-to-average power ratio (PAPR), bit error rate (BER), and achievable rate. The study of system complexity reveals the computational requirements of these schemes in terms of the number of complex multiplications essential for distinct transceiver blocks. In addition, to address the PAPR, we suggest a novel approach utilizing weighted-type fractional Fourier transform precoding. This method examines the impact of variations in the precoder order and explores the application of iterative algorithms for more optimal designing of the precoder. Throughout a comprehensive investigation, the impact of impulsive noise and users’ mobility on the BER and the achievable rate of the considered schemes is disclosed. Overall, our study indicates that while OFDM performs inadequately in terms of the SE and achievable rate, and GFDM has the poorest output concerning complexity, PAPR, and BER, the OTFS scheme distinguishes itself by effectively balancing these metrics and ensuring reliable communications in mobility scenarios.
... Example 6 (Gradient of the KL-divergence). For any p ∈ P > (Ω), the Kullback-Leibler divergence, in short KL-divergence (see [19] or § 3.3-4 in [9]), of q and p is the scalar field ...
... Example 7 (Entropy). The expression of the entropy [19] H ...
Article
Full-text available
A critical processing step for AI algorithms is mapping the raw data to a landscape where the similarity of two data points is conveniently defined. Frequently, when the data points are compositions of probability functions, the similarity is reduced to affine geometric concepts; the basic notion is that of the straight line connecting two data points, defined as a zero-acceleration line segment. This paper provides an axiomatic presentation of the probability simplex’s most commonly used affine geometries. One result is a coherent presentation of gradient flow in Aitchison’s compositional data, Amari’s information geometry, the Kantorovich distance, and the Lagrangian optimization of the probability simplex.
... by I(A,B) and called it mutual information of A and B. The rate of transmission in [1] or the mutual information I(X,Y) between two continuous random variables X and Y [8] is defined as ...
... is a combined X and Y probability measure (see [8]). We study the discrete probability distributions P and Q. ...
... A (binary) Huffman tree is a code tree constructed from a source by recursively merging two smallest-probability nodes until only one node with probability 1 remains. (For more details about Huffman codes, see Section 5.6 of [1].) The leaf nodes in the tree correspond to initial source probabilities. ...
... Huffman codes [3] were invented in 1952 and are used today in many practical data compression applications, such as for text, audio, image, and video coding, and are known to be optimal [1]. ...
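The recursive merging described above is easy to sketch; a minimal Python illustration (not taken from either cited work), which returns codeword lengths and, for a dyadic source, attains the entropy exactly:

```python
import heapq

def huffman_code_lengths(probs):
    """Binary Huffman construction: repeatedly merge the two smallest-probability
    nodes; a symbol's codeword length equals the number of merges it takes part in."""
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)                      # tie-breaker so the heap never compares dicts
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(lengths, avg_len)   # average length 1.75 bits = H(P) for this dyadic source
```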
Article
Full-text available
A property of prefix codes called strong monotonicity is introduced, and it is proven that for a given source, a prefix code is optimal if and only if it is complete and strongly monotone.
... The uncertainty analysis process is as shown in Figure 7. According to information theory [20], the entropy of discrete distributions is one of the methods for measuring uncertainty. The actual obtained flow pattern data are $P_s = (x, y)$ ...
... Under the same gas-liquid conditions, when the pipe pressure is higher, the liquid film region of the slug is compressed, the liquid slug gradually becomes longer, and the frequency of slug generation becomes larger. In order to ensure the accuracy of the model prediction, the slug ... According to information theory [20], the entropy of discrete distributions is one of the methods for measuring uncertainty. The actual obtained flow pattern data are $P_s = (x, y)$ and generate a finite dataset $C_s$ after data preprocessing, expressed as ...
Article
Full-text available
In order to improve the accuracy and efficiency of flow pattern recognition and to solve the problem of the real-time monitoring of flow patterns, which is difficult to achieve with traditional visual recognition methods, this study introduced a flow pattern recognition method based on a convolutional neural network (CNN), which can recognize the flow pattern under different pressure and flow conditions. Firstly, the complex gas–liquid distribution and its velocity field in the annulus were investigated using a computational fluid dynamics (CFD) simulation, and the gas–liquid distribution and velocity vectors in the annulus were obtained to clarify the complexity of the flow patterns in the annulus. Subsequently, a sequence model containing three convolutional layers and two fully connected layers was developed, which employed a CNN architecture, and the model was compiled using the Adam optimizer and the sparse classification cross entropy as a loss function. A total of 450 images of different flow patterns were utilized for training, and the trained model recognized slug and annular flows with probabilities of 0.93 and 0.99, respectively, confirming the high accuracy of the model in recognizing annulus flow patterns, and providing an effective method for flow pattern recognition.
... , and $p(k-1) \sim \mathcal{N}(\mathbf{0}_{nN}, \Sigma(k-1))$. KL divergence is used for its ability to quantify differences between probability distributions and its computational efficiency [67]. Additionally, $\tilde{e}_{st}$ denotes the pre-set stopping tolerance. ...
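For two multivariate Gaussians, the KL divergence referenced here has a closed form; a small illustrative sketch (generic notation, not the paper's implementation):

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """D_KL( N(mu0, S0) || N(mu1, S1) ) in nats, via the standard closed form."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    trace_term = np.trace(S1_inv @ S0)
    quad_term = float(diff @ S1_inv @ diff)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (trace_term + quad_term - k + logdet1 - logdet0)

S_prev = np.array([[1.0, 0.2], [0.2, 1.0]])
print(kl_gaussian(np.zeros(2), S_prev, np.zeros(2), np.eye(2)))
```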
Article
Full-text available
This paper concerns the optimality problem of distributed linear quadratic control in a linear stochastic multi‐agent system (MAS). The main challenge stems from MAS network topology that limits access to information from non‐neighbouring agents, imposing structural constraints on the control input space. A distributed control‐estimation synthesis is proposed which circumvents this issue by integrating distributed estimation for each agent into distributed control law. Based on the agents' state estimate information, the distributed control law allows each agent to interact with non‐neighbouring agents, thereby relaxing the structural constraint. Then, the primal optimal distributed control problem is recast to the joint distributed control‐estimation problem whose solution can be obtained through the iterative optimization procedure. The stability of the proposed method is verified and the practical effectiveness is supported by numerical simulations and real‐world experiments with multi‐quadrotor formation flight.
... Therefore, the task is a classical hypothesis testing problem of two normal distributions with known, identical variance. In this case, the asymptotic exponent of the average error is given by the Chernoff information [18, Chapter 11.9] of the distributions corresponding to the channel states 0 and s. The Chernoff information of two distributions $f_0(x)$ and $f_1(x)$ is ...
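For reference, the standard definition of the Chernoff information (cf. [18, Chapter 11.9]) is

$$
C(f_0, f_1) \;=\; -\min_{0 \le \lambda \le 1} \log \int f_0(x)^{\lambda}\, f_1(x)^{1-\lambda}\, dx ,
$$

and for two Gaussians with means $\mu_0$ and $\mu_1$ and common variance $\sigma^2$ the minimum is attained at $\lambda = 1/2$, giving $C = (\mu_1 - \mu_0)^2 / (8\sigma^2)$.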
Preprint
Full-text available
We study the problem of joint communication and detection of wiretapping on an optical fiber from a quantum perspective. Our system model describes a communication link that is capable of transmitting data under normal operating conditions and raising a warning at the transmitter side in case of eavesdropping. It contributes to a unified modelling approach, based on which established quantum communication paradigms like quantum key distribution can be compared to other approaches offering similar functionality.
... As a result, using a Pinsker-type inequality (e.g., [12]) and under the assumption that f(·) produces accurate predictions of Y for in-distribution inputs, the objective function in Equation 2 can be turned into a cross-entropy term, which is more amenable to optimization and analysis, as follows: ...
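The Pinsker-type bound referenced here is, in its classical form (with $D_{\mathrm{KL}}$ in nats and $\delta$ the total variation distance),

$$
D_{\mathrm{KL}}(P \,\|\, Q) \;\ge\; 2\,\delta(P, Q)^2,
\qquad
\delta(P, Q) \;=\; \tfrac{1}{2}\,\lVert P - Q \rVert_1 ,
$$

so a small relative-entropy term forces the two distributions to be close in total variation.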
Preprint
Full-text available
Graph Neural Networks (GNNs) leverage the structural properties of the graph to inform the architecture of the neural network, thus achieving improved accuracy in graph learning tasks. However, like many neural network models, GNNs face a significant challenge with interpretability. To mitigate this issue, recent works have developed post-hoc instance-level explanation methods that focus on identifying minimal and sufficient subgraphs which strongly influence GNN predictions. Approaches that build on the graph information bottleneck principle (GIB) to quantify minimality and sufficiency have received particular attention, and have been used in several state-of-the-art explanation mechanisms. This work identifies several fundamental issues in such quantifications, particularly a signaling issue in the sufficiency, and a redundancy issue in the minimality quantifications. These may lead to explanations that do not accurately reflect the rationale behind GNN decisions. To overcome these challenges, we propose a new objective function and explainer architecture, dubbed the SculptEdgeX. The SculptEdgeX framework assesses the sufficiency of an input subgraph by generating an in-distribution supergraph and evaluating its prediction accuracy when processed by the GNN. This involves an initial densification process that adds edges to the input graph, followed by a selective edge removal step — called edge sculpting — to produce an in-distribution supergraph. To ensure the in-distribution property, we pre-train a calibrator network that parametrizes the underlying distribution of a given graph, hence enabling us to compare the distribution parameters with those of the original input distribution. We validate our method through extensive experiments on both synthetic and real-world datasets, demonstrating the effectiveness of SculptEdgeX in producing informative explanations.
... 3. Given that ReduNet aims to flatten each category of data into its respective linear subspace, we use the condition number, an essential metric for testing whether a linear system is ill-conditioned, as an auxiliary criterion for stopping training. This helps save computational resources [Cover and Thomas, 2006]. Supposing Z contains two uncorrelated subsets, $Z_1$ and $Z_2$, the coding rate for all data $R(Z_1 \cup Z_2)$ is greater than the sum of the coding rates $R(Z_1)$ and $R(Z_2)$ ...
Preprint
ReduNet is a deep neural network model that leverages the principle of maximal coding rate reduction to transform original data samples into a low-dimensional, linear discriminative feature representation. Unlike traditional deep learning frameworks, ReduNet constructs its parameters explicitly layer by layer, with each layer's parameters derived based on the features transformed from the preceding layer. Rather than directly using labels, ReduNet uses the similarity between each category's spanned subspace and the data samples for feature updates at each layer. This may lead to features being updated in the wrong direction, impairing the correct construction of network parameters and reducing the network's convergence speed. To address this issue, based on the geometric interpretation of the network parameters, this paper presents ESS-ReduNet to enhance the separability of each category's subspace by dynamically controlling the expansion of the overall spanned space of the samples. Meanwhile, label knowledge is incorporated with Bayesian inference to encourage the decoupling of subspaces. Finally, stability, as assessed by the condition number, serves as an auxiliary criterion for halting training. Experiments on the ESR, HAR, Covertype, and Gas datasets demonstrate that ESS-ReduNet achieves more than 10x improvement in convergence compared to ReduNet. Notably, on the ESR dataset, the features transformed by ESS-ReduNet achieve a 47% improvement in SVM classification accuracy.
... First of all, the process of generating persistence images and integrating them with raw images requires significant computational resources, which can be a challenge in the context of real-time systems. Another major limitation is the Data Processing Inequality [26]. Regardless of the filtration or architecture we use, we will not be able to infer more information from the model than is already present in the data. ...
Preprint
Full-text available
Artificial Neural Networks (ANNs) require significant amounts of data and computational resources to achieve high effectiveness in performing the tasks for which they are trained. To reduce resource demands, various techniques, such as Neuron Pruning, are applied. Due to the complex structure of ANNs, interpreting the behavior of hidden layers and the features they recognize in the data is challenging. A lack of comprehensive understanding of which information is utilized during inference can lead to inefficient use of available data, thereby lowering the overall performance of the models. In this paper, we introduce a method for integrating Topological Data Analysis (TDA) with Convolutional Neural Networks (CNN) in the context of image recognition. This method significantly enhances the performance of neural networks by leveraging a broader range of information present in the data, enabling the model to make more informed and accurate predictions. Our approach, further referred to as Vector Stitching, involves combining raw image data with additional topological information derived through TDA methods. This approach enables the neural network to train on an enriched dataset, incorporating topological features that might otherwise remain unexploited or not captured by the network's inherent mechanisms. The results of our experiments highlight the potential of incorporating results of additional data analysis into the network's inference process, resulting in enhanced performance in pattern recognition tasks in digital images, particularly when using limited datasets. This work contributes to the development of methods for integrating TDA with deep learning and explores how concepts from Information Theory can explain the performance of such hybrid methods in practical implementation environments.
... where the maximum is taken over all possible input distributions p(x) [33]. Furthermore, we have ...
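As a concrete sketch of that maximization (a basic Blahut-Arimoto iteration for a discrete memoryless channel; this is a generic example, not code from the cited paper), with the binary symmetric channel as a sanity check:

```python
import numpy as np

def mutual_info_bits(p, W):
    """I(X; Y) in bits for input distribution p and channel matrix W[x, y] = p(y|x)."""
    q = p @ W                                   # induced output distribution
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(W > 0, np.log2(W / q), 0.0)
    return float(np.sum(p[:, None] * W * log_ratio))

def blahut_arimoto_capacity(W, iters=200):
    """Approximate C = max_{p(x)} I(X; Y) by the Blahut-Arimoto iteration."""
    nx, _ = W.shape
    p = np.full(nx, 1.0 / nx)                   # start from the uniform input
    for _ in range(iters):
        q = p @ W
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log(W / q), 0.0)
        d = np.sum(W * log_ratio, axis=1)       # D( W(.|x) || q ) in nats
        p = p * np.exp(d)
        p /= p.sum()
    return mutual_info_bits(p, W)

# Binary symmetric channel, crossover 0.1: capacity = 1 - H_b(0.1) ~ 0.531 bits.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto_capacity(W))
```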
Article
Full-text available
This paper introduces a class of symmetric ciphering systems with a finite secret key, which provides ideal secrecy, autonomy in key generation and distribution, and robustness against the probabilistic structure of messages (the Ideally Secret Autonomous Robust (ISAR) system). The ISAR system is based on wiretap polar codes constructed over an artificial wiretap channel with a maximum secrecy capacity of 0.5. The system autonomously maintains a minimum level of key equivocation by continuously refreshing secret keys without additional key generation and distribution infrastructure. Moreover, it can transform any stream ciphering system with a finite secret key of known length into an ISAR system without knowing and/or changing its algorithm. Therefore, this class of system strongly supports privacy, a critical requirement for contemporary security systems. The ISAR system’s reliance on wiretap polar coding for strong secrecy ensures resistance to passive known plaintext attacks. Furthermore, resistance to passive attacks on generated refreshing keys follows directly from ideal secrecy and autonomy. The results presented offer an efficient methodology for synthesizing this class of systems with predetermined security margins and a complexity of the order of n log n, where n is the block length of the applied polar code.
... The term selection method we have described can be understood as a form of feature selection that reasons globally about the data and tries to control for some effects that are not of interest (topic or document idiosyncrasies). We compared the approach to two classic, simple methods for feature selection: ranking based on pointwise mutual information (PMI) and weighted average PMI (WAPMI) (Schneider, 2005; Cover and Thomas, 2012). Selected features were used to classify the ideologies of held-out documents from our corpus. ...
... Notably, [8]- [12] introduced non-orthogonal multiple access schemes for eMBB and URLLC coexistence in the uplink. These schemes stem from the classical studies on the Gaussian multiple access channel (GMAC) with homogeneous and infinite blocklength, where the entire capacity region can be achieved by superposition coding and successive interference cancellation (SIC) with time-sharing [13] or rate-splitting with partial SIC [14], [15]. ...
Article
Full-text available
We consider the uplink multiple access of heterogeneous users, e.g., ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) users. Each user has its own reliability requirement and blocklength constraint, and users transmitting longer blocks suffer from heterogeneous interference. On top of that, the decoding of URLLC messages cannot leverage successive interference cancellation (SIC) owing to the stringent latency requirements. This can significantly degrade the spectral efficiency of all URLLC users when the interference is strong. To overcome this issue, we propose a new multiple access scheme employing discrete signaling and treating interference as noise (TIN) decoding, i.e., without SIC. Specifically, to handle heterogeneous interference while maintaining the single-user encoding and decoding complexities, each user uses a single channel code and maps its coded bits onto sub-blocks of symbols, where the underlying constellations can be different. We demonstrate theoretically and numerically that the proposed scheme employing quadrature amplitude modulations and TIN decoding can perform very close to the benchmark scheme based on Gaussian signaling with perfect SIC decoding. Interestingly, we show that the proposed scheme does not need to use all the transmit power budget, but also can sometimes even outperform the benchmark scheme.
... Adaptation and game theory: Koza [37]; Smith [38]; Mitchell [39]; Davis [40]; Myerson [41]; Challet and Zhang [42]; Gould [43]; Dawkins [44]; Axelrod [45]; Watson [46]; Gintis [47]. Information theory: Shannon [48]; Pierce [49]; Bennett [50]; Cover [51]; Badii [52]. Computational complexity: Garey [53]; Aaronson [54]; Moore [55]. Agent-based modeling: Schelling [56]; Ray [57]; Palmer [58]; Epstein [59]; Berry [60]; Macy [61]; Grimm [62]; Gilbert [63]; Page [64]. ...
Article
Full-text available
Complex system theory is a new and rapidly developing field. A complex system is composed of many simple individual components that interact nonlinearly, giving rise to global behaviors, often unexpected, unprecedented and unpredictable, that exhibit nontrivial emergence and self-organization. In this context, a sub-domain has emerged under the name "complex adaptive systems" (CAS). This current, coming from developments in evolutionary algorithms in the 1980s, focuses on complex systems in relation to an environment in which they dynamically learn and adapt. The goal of this work is to explore complex system theory through a comprehensive survey, which clearly shows the properties, the mechanisms, the applications, and the different scientific currents that are related to complex systems from 1940 until 2015. We present an effective approach for modeling and simulating complex systems. We undertake a fairly exhaustive and rigorous survey of applications of complex system theory in various domains of science.
... At equilibrium, a system's entropy is maximized, obscuring initial conditions. Nowadays, entropy is a fundamental concept across a wide range of disciplines: statistical mechanics [24], information theory [11], quantum computation [23], cryptography [29], biology [40], neuroscience [13], ecological modeling [28], financial market analysis and portfolio optimization [14,22,41], decision tree construction [2], machine learning [4], and others. In these contexts, entropy serves as a metric to quantify information, predictability, complexity, and other relevant characteristics. ...
Preprint
Full-text available
The paper extends the analysis of the entropies of the Poisson distribution with parameter λ. It demonstrates that the Tsallis and Sharma-Mittal entropies exhibit monotonic behavior with respect to λ, whereas two generalized forms of the Rényi entropy may exhibit "anomalous" (non-monotonic) behavior. Additionally, we examine the asymptotic behavior of the entropies as λ → ∞ and provide both lower and upper bounds for them.
... The square-root in the definition arises naturally from the bounds we are able to prove, and is dictated by the form of Pinsker's inequality [28], ensuring that the sum of the lengths of successive path fragments equals the length of the path. ...
Preprint
Despite having triggered devastating pandemics in the past, our ability to quantitatively assess the emergence potential of individual strains of animal influenza viruses remains limited. This study introduces Emergenet, a tool to infer a digital twin of sequence evolution to chart how new variants might emerge in the wild. Our predictions based on Emergenets built only using 220,151 Hemagglutinin (HA) sequences consistently outperform WHO seasonal vaccine recommendations for H1N1/H3N2 subtypes over two decades (average match-improvement: 3.73 AAs, 28.40%), and are at par with state-of-the-art approaches that use more detailed phenotypic annotations. Finally, our generative models are used to scalably calculate the current odds of emergence of animal strains not yet in human circulation, which strongly correlates with CDC's expert-assessed Influenza Risk Assessment Tool (IRAT) scores (Pearson's r = 0.721, p = 10^{-4}). A minimum five orders of magnitude speedup over CDC's assessment (seconds vs months) then enabled us to analyze 6,354 animal strains collected post-2020 to identify 35 strains with high emergence scores (> 7.7). The Emergenet framework opens the door to preemptive pandemic mitigation through targeted inoculation of animal hosts before the first human infection.
... Notably, [8]- [12] introduced non-orthogonal multiple access schemes for eMBB and URLLC coexistence in the uplink. These schemes stem from the classical studies on the Gaussian multiple access channel (GMAC) with homogeneous and infinite blocklength, where the entire capacity region can be achieved by superposition coding and successive interference cancellation (SIC) with time-sharing [13] or rate-splitting with partial SIC [14], [15]. ...
Preprint
Full-text available
We consider the uplink multiple access of heterogeneous users, e.g., ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) users. Each user has its own reliability requirement and blocklength constraint, and users transmitting longer blocks suffer from heterogeneous interference. On top of that, the decoding of URLLC messages cannot leverage successive interference cancellation (SIC) owing to the stringent latency requirements. This can significantly degrade the spectral efficiency of all URLLC users when the interference is strong. To overcome this issue, we propose a new multiple access scheme employing discrete signaling and treating interference as noise (TIN) decoding, i.e., without SIC. Specifically, to handle heterogeneous interference while maintaining the single-user encoding and decoding complexities, each user uses a single channel code and maps its coded bits onto sub-blocks of symbols, where the underlying constellations can be different. We demonstrate theoretically and numerically that the proposed scheme employing quadrature amplitude modulations and TIN decoding can perform very close to the benchmark scheme based on Gaussian signaling with perfect SIC decoding. Interestingly, we show that the proposed scheme does not need to use all the transmit power budget, but also can sometimes even outperform the benchmark scheme.
... To motivate an effective maximum-entropy criterion, we start with an observation that the following facts about distributions over the unit cube are mathematically equivalent [CT91]: ...
Preprint
A number of different architectures and loss functions have been applied to the problem of self-supervised learning (SSL), with the goal of developing embeddings that provide the best possible pre-training for as-yet-unknown, lightly supervised downstream tasks. One of these SSL criteria is to maximize the entropy of a set of embeddings in some compact space. But the goal of maximizing the embedding entropy often depends--whether explicitly or implicitly--upon high dimensional entropy estimates, which typically perform poorly in more than a few dimensions. In this paper, we motivate an effective entropy maximization criterion (E2MC), defined in terms of easy-to-estimate, low-dimensional constraints. We demonstrate that using it to continue training an already-trained SSL model for only a handful of epochs leads to a consistent and, in some cases, significant improvement in downstream performance. We perform careful ablation studies to show that the improved performance is due to the proposed add-on criterion. We also show that continued pre-training with alternative criteria does not lead to notable improvements, and in some cases, even degrades performance.
... Thus, as suggested in many previous works [45], [46], we approximate the above encoding, decoding, and channel transfer steps as a successive Markov chain $K_0 \to K_1 \to \cdots \to K_n$, where $K_0$ indicates the distribution of original images at the transmitter side, n is the total number of data processing procedures, $K_n$ indicates the distribution of ultimate edited images at the receiver side, and $K_s$ (0 < s < n) indicates the distribution of a series of intermediate representations during the above steps. For any two consecutive transitions $U \to V \to W$ from the whole Markov chain, we have the Data Processing Inequality (DPI) [47] as shown in Theorem 1. ...
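The DPI statement used here, $I(U; W) \le I(U; V)$ whenever $U \to V \to W$ form a Markov chain, can be checked numerically on a toy chain; a small sketch with made-up transition matrices (purely illustrative, unrelated to the paper's image pipeline):

```python
import numpy as np

def mutual_information_bits(joint):
    """I(A; B) in bits from a joint probability table joint[a, b]."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

pU = np.array([0.3, 0.7])
T_UV = np.array([[0.8, 0.2],      # p(v | u)
                 [0.1, 0.9]])
T_VW = np.array([[0.7, 0.3],      # p(w | v)
                 [0.2, 0.8]])

joint_UV = pU[:, None] * T_UV     # p(u, v)
joint_UW = joint_UV @ T_VW        # p(u, w) = sum_v p(u, v) p(w | v)

# Data processing inequality: I(U; W) <= I(U; V)
print(mutual_information_bits(joint_UV), mutual_information_bits(joint_UW))
```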
Preprint
Real-time computer vision (CV) plays a crucial role in various real-world applications, whose performance is highly dependent on communication networks. Nonetheless, the data-oriented characteristics of conventional communications often do not align with the special needs of real-time CV tasks. To alleviate this issue, the recently emerged semantic communications only transmit task-related semantic information and exhibit a promising landscape to address this problem. However, the communication challenges associated with Semantic Facial Editing, one of the most important real-time CV applications on social media, still remain largely unexplored. In this paper, we fill this gap by proposing Editable-DeepSC, a novel cross-modal semantic communication approach for facial editing. Firstly, we theoretically discuss different transmission schemes that separately handle communications and editings, and emphasize the necessity of Joint Editing-Channel Coding (JECC) via iterative attributes matching, which integrates editings into the communication chain to preserve more semantic mutual information. To compactly represent the high-dimensional data, we leverage inversion methods via pre-trained StyleGAN priors for semantic coding. To tackle the dynamic channel noise conditions, we propose SNR-aware channel coding via model fine-tuning. Extensive experiments indicate that Editable-DeepSC can achieve superior editings while significantly saving the transmission bandwidth, even under high-resolution and out-of-distribution (OOD) settings.
... The relationship between the fidelities of the two protocols can be further understood through the quantum data processing inequality (DPI), a key principle in quantum information theory. The DPI states that applying a completely positive trace-preserving (CPTP) map such as a local operation, lossy transformation, or quantum channel, to a quantum state cannot increase its fidelity with respect to a reference state [49][50][51][52]. In the alternative protocol, the process of tracing out certain modes can be regarded as a lossy operation that discards useful correlations between subsystems. ...
Preprint
Quantum communication facilitates the secure transmission of information and the distribution of entanglement, but the rates at which these tasks can be achieved are fundamentally constrained by the capacities of quantum channels. Although quantum repeaters typically enhance these rates by mitigating losses and noise, a simple entanglement swapping protocol via a central node is not effective against the Pauli dephasing channel due to the additional degradation introduced by Bell-state measurements. This highlights the importance of purifying distributed Bell states before performing entanglement swapping. In this work, we introduce an entanglement purification protocol assisted by two-way classical communication that not only purifies the states but also achieves the channel capacities. Our protocol uses an iterative process involving CNOT gates and Hadamard basis measurements, progressively increasing the fidelity of Bell states with each iteration. This process ensures that the resulting Bell pairs are perfect in the limit of many recursive iterations, making them ideal for use in quantum repeaters and for correcting dephasing errors in quantum computers. The explicit circuit we propose is versatile and applicable to any number of Bell pairs, offering a practical solution for mitigating decoherence in quantum networks and distributed quantum computing.
... To enhance the information contained in the working dataframe, as well as to normalize the data, reduce internal noise, and improve the coherence of the clustering, we added an additional step: a normalization of the data based on mutual information (Cover and Thomas 1991). This technique, which emerged from information theory, characterizes the amount of information that a variable x carries about a variable y, measuring the redundancy between variables and yielding a measure of the uncertainty of all our variables once this redundant information is removed. ...
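As a minimal sketch of the quantity involved (the standard identity, not the exact normalization pipeline of the cited study), the mutual information admits an entropy decomposition that makes its role as a redundancy measure explicit:

\[
I(X;Y) = H(X) + H(Y) - H(X,Y),
\]

i.e., I(X;Y) quantifies precisely the information shared (and hence redundant) between X and Y, which is the part the normalization aims to remove.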
Article
Full-text available
Background The demand for fresh strategies to analyze intricate multidimensional data in neuroscience is increasingly evident. One of the most complex events during our neurodevelopment is adolescence, when our nervous system undergoes constant changes, not only in neuroanatomical traits but also in neurophysiological components. One of the most impactful factors we deal with during this time is our environment, especially when encountering external factors such as social behaviors or substance consumption. Binge drinking (BD) has emerged as a widespread pattern of alcohol consumption in teenagers, not only affecting their future lifestyle but also changing their neurodevelopment. Recent studies have shifted their focus to finding predisposition factors that may lead adolescents into this kind of consumption pattern. Methods In this article, using unsupervised machine learning (UML) algorithms, we analyze the relationship between the electrophysiological activity of healthy teenagers and the levels of consumption they exhibited 2 years later. We used hierarchical agglomerative UML techniques based on Ward's minimum variance criterion to cluster relations between power spectrum, functional connectivity, and alcohol consumption, based on similarity in their correlations, in frequency bands from theta to gamma. Results We found that all frequency bands studied had a pattern of clusterization based on anatomical regions of interest related to neurodevelopment and to cognitive and behavioral aspects of addiction, highlighting the dorsolateral and medial prefrontal, the sensorimotor, the medial posterior, and the occipital cortices. All these patterns, of great cohesion and coherence, showed abnormal electrophysiological activity, representing a dysregulation in the development of core resting-state networks. The clusters found maintained not only plausibility in nature but also robustness, making this a great example of the usage of UML in the analysis of electrophysiological activity, a new perspective on analysis that, while complementing classical statistics, can clarify new characteristics of the variables of interest.
... An essential step to perform sensitivity analysis of the Bayesian posterior distribution with respect to its prior estimates of parameters obtained from the Box-Jenkins polynomial model has been implemented in this section. In accordance with [42][43][44], the Relative Entropy or Kullback-Leibler (K-L) Divergence between the posterior probability density function P(θ | Z) and the prior density p(θ) has been determined. Using Equations (20)-(23), the K-L Divergence has been further derived in Equations (25) and (26). ...
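For concreteness, the divergence used in this kind of prior-sensitivity analysis is typically of the form below (a standard expression; the specific Equations (20)-(26) of the cited paper are not reproduced here):

\[
D_{\mathrm{KL}}\big(P(\theta\mid Z)\,\|\,p(\theta)\big)
= \int P(\theta\mid Z)\,\log\frac{P(\theta\mid Z)}{p(\theta)}\,d\theta .
\]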
Article
Full-text available
In this paper, nonlinear system identification using a Bayesian network has been implemented to discover the open-loop lateral-directional aerodynamic model parameters of an agile aircraft using a grey box modelling structure. Our novel technique has been demonstrated on simulated flight data from an F-16 nonlinear simulation of its Flight Dynamic Model (FDM). A mathematical model has been obtained using time series analysis of a Box-Jenkins (BJ) model structure, and parameter refinement has been performed using Bayesian mechanics. The aircraft nonlinear Flight Dynamic Model is adequately excited with doublet inputs, as dictated by its natural frequency, in accordance with non-parametric modelling (Finite Impulse Response) estimates. Time histories of optimized doublet inputs in the form of aileron and rudder deflections, and outputs in the form of roll and yaw rates, are recorded. The dataset is pre-processed by implementing de-trending, smoothing, and filtering techniques. A blend of system identification time-domain grey box modelling structures, including Output Error (OE) and Box-Jenkins (BJ) models, is implemented stage-wise in multiple flight conditions under varied stochastic models. Furthermore, a reduced-order parsimonious model is obtained using the Akaike Information Criterion (AIC). Parameter error minimization is conducted using the Levenberg-Marquardt (L-M) Algorithm, and parameter refinement is performed using the Bayesian Algorithm due to its natural connection with grey box modelling. A comparative analysis of different nonlinear estimators is performed to obtain the best estimates for the lateral-directional aerodynamic model of a supersonic aircraft. Model quality assessment is conducted through statistical techniques, namely residual analysis, best fit percentage, fit percentage error, mean squared error, and model order. Results have shown promising one-step model predictions with an accuracy of 96.25%. Being a sequel to our previous research work postulating the longitudinal aerodynamic model of a supersonic aircraft, this work completes the overall aerodynamic model, further leading towards insight into its flight control laws and subsequent simulator design.
... Definition 5 (KL-Divergence [6]). Let P, Q be two distributions over the space Ω and suppose P is absolutely continuous with respect to Q. ...
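The formula itself is elided in the excerpt; the standard measure-theoretic form it refers to is

\[
D_{\mathrm{KL}}(P\,\|\,Q) = \mathbb{E}_{x\sim P}\!\left[\log\frac{dP}{dQ}(x)\right],
\]

where dP/dQ is the Radon-Nikodym derivative, which exists precisely because P is assumed absolutely continuous with respect to Q.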
Article
Full-text available
We analyze the generalization properties of batch reinforcement learning (batch RL) with value function approximation from an information-theoretic perspective. We derive generalization bounds for batch RL using (conditional) mutual information. In addition, we demonstrate how to establish a connection between certain structural assumptions on the value function space and conditional mutual information. As a by-product, we derive a high-probability generalization bound via conditional mutual information, which was left open and may be of independent interest.
... where H(·) is the entropy defined by Shannon and I(·) denotes the mutual information (MI) (Cover and Thomas, 2006). SU is zero for independent random variables and equal to one for deterministically dependent ones. ...
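The quantity SU referred to here is the symmetric uncertainty; the formula below is its usual definition and is assumed to match the one elided in the excerpt:

\[
\mathrm{SU}(X,Y) = \frac{2\,I(X;Y)}{H(X)+H(Y)},
\]

which is 0 when X and Y are independent and 1 when either variable deterministically determines the other.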
Conference Paper
The progressive degeneration of nerve cells causes neurodegenerative diseases. For instance, Alzheimer's and Parkinson's diseases progressively decrease the cognitive abilities and the motor skills of an individual. In the absence of a cure, we aim to slow down their impact by resorting to rehabilitative therapies and medicines. Thus, early diagnosis plays a key role in delaying the progression of these diseases. The analysis of handwriting dynamics for specific tasks is found to be an effective tool for providing early diagnosis of these diseases. Recently, the Diagnosis AlzheimeR WIth haNdwriting (DARWIN) dataset was introduced. It contains records of handwriting samples from 174 participants (diagnosed as having Alzheimer's or not), performing 25 specific handwriting tasks, including dictation, graphics, and copies. In this paper, we explore the use of the DARWIN dataset with dimensionality reduction, explainability, and classification techniques. We identify the most relevant and decisive handwriting features for predicting Alzheimer's. From the original set of 450 features belonging to different groups, we found small subsets of features showing that the time spent performing in-air movements is the most decisive type of feature for predicting Alzheimer's.
... We, therefore, first describe the relevant information theoretic measures that we use. Mutual information is defined as follows [28]: ...
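The definition that follows this sentence is cut off in the excerpt; the standard discrete form (as in Cover and Thomas) is

\[
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} .
\]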
Article
Full-text available
This work analyses the interdependent link creation of patent and shareholding links in interfirm networks, and how this dynamics affects the resilience of such networks in the face of cascading failures. Using the Orbis dataset, we construct very large co-patenting and shareholding networks, globally as well as for individual countries. In addition, we construct smaller overlap networks from those firm pairs which have both types of links between them, for nine years between 2008 and 2016. We use information theoretic measures, such as mutual information, active information storage, and transfer entropy, to characterise the topological similarities and shared topological information between the relevant co-patenting and shareholding networks. We then construct a cascading failure model, and use it to analyse the resilience of interdependent interfirm networks in terms of multiple failure characteristics. We find that there is a relatively high level of mutual information between co-patenting networks and the shareholding networks from later years, suggesting that the formation of shareholding links is influenced by the existence of patent links in previous years. We highlight that this phenomenon differs between countries. For interfirm networks from certain countries, such as Switzerland and the Netherlands, this influence is remarkably higher compared to other countries. We also show that this influence becomes most apparent after a delay of four years between the formation of co-patenting links and shareholding links. Analysing the resilience of shareholding networks against cascading failures, we show that in terms of both mean downtime and failure proportion of firms, certain countries, including Italy, Germany, India, Japan and the United States, have less resilient shareholding networks compared to other countries with significant economies. Based on our results, we postulate that an interfirm network model which considers multiple types of relationships together, uses information theoretic measures to establish information sharing and causality between them, and uses cascading failure simulation to understand the resilience of such networks under economic and financial stress, could be a useful multifaceted model for highlighting important features of economic systems around the world.
... We perform the following analysis based on the assumption of optimal channel coding [52] under the additive white Gaussian noise (AWGN) channel model. Note that the channel condition and coding method are chosen only to simplify the subsequent derivations; the general procedure and the consequent algorithm design remain applicable, regardless of which channel condition and coding method are used, as long as Theorem 1 holds. ...
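Under this optimal-coding assumption, the achievable rate over an AWGN channel is governed by the Shannon capacity; the standard forms are given below (whether the cited derivation uses the per-channel-use or the bandwidth-normalized version is an assumption here):

\[
C = \tfrac{1}{2}\log_2\!\left(1+\frac{S}{N}\right)\ \text{bits per channel use},
\qquad
C = W\log_2\!\left(1+\frac{S}{N}\right)\ \text{bits per second}.
\]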
Article
Full-text available
Recently, the growth of deep space exploration has attracted notable interest in the interplanetary network (IPN), which is the key infrastructure for communications across vast distances in the solar system. However, the unique characteristics of the IPN pose numerous unexplored challenges for interplanetary data transfers (IP-DTs), i.e., challenges that existing schemes developed for Earth-based networks are ill-equipped to handle. To address these challenges, we first propose a novel distributed algorithm that leverages Lyapunov optimization to jointly optimize the routing, scheduling and rate control of IP-DTs at each node. Specifically, our proposal adaptively optimizes the data-rate and bundle scheduling at each output port of a node, significantly improving the end-to-end (E2E) latency and delivery ratio of IP-DTs under a long-term energy constraint. Then, we further explore the heterogeneity of the IPN to introduce limited state information exchange among nodes, and devise mechanisms for generating and disseminating state messages to facilitate timely adjustments of routing and scheduling schemes in response to unexpected link disruptions and traffic surges. Simulations verify the advantages of our proposal over the state of the art.
... where I_physical stands for the physical properties of a system, and I_information represents the corresponding information-theoretic quantities [65,66]. ...
Article
Full-text available
This study aims to provide general axioms that should hold for any theory of quantum gravity. These axioms imply that spacetime is an emergent structure, which emerges from information. This information cannot occur in spacetime, as spacetime emerges from it, and hence it exists in an abstract mathematical Platonic realm. Thus, quantum gravity exists as a formal system in this Platonic realm. Gödel's theorems apply to such formal systems, and hence they apply to quantum gravity. This limits the existence of a complete and consistent theory of quantum gravity. However, we generalize the Lucas-Penrose argument and argue that a non-algorithmic understanding exists in this Platonic realm. This makes it possible to have a complete and consistent theory of quantum gravity.
... By the data-processing inequality applied to relative entropies (see [23] pp. 370-371), we have ...
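The inequality being invoked is the monotonicity of relative entropy under a common stochastic map (channel) W, which in its standard form reads

\[
D(PW\,\|\,QW) \le D(P\,\|\,Q),
\]

i.e., processing both distributions through the same channel cannot increase their relative entropy.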
Preprint
This work is concerned with robust distributed multi-view image transmission over a severe fading channel with imperfect channel state information (CSI), wherein the sources are slightly correlated. Since the signals are further distorted at the decoder, traditional distributed deep joint source-channel coding (DJSCC) suffers considerable performance degradation. To tackle this problem, we leverage the complementarity and consistency characteristics among the distributed, yet correlated sources, and propose an enhanced robust DJSCC, namely RDJSCC. In RDJSCC, we design a novel cross-view information extraction (CVIE) mechanism to capture more nuanced cross-view patterns and dependencies. In addition, a complementarity-consistency fusion (CCF) mechanism is utilized to fuse the complementarity and consistency from multi-view information in a symmetric and compact manner. Theoretical analysis and simulation results show that our proposed RDJSCC can effectively leverage the advantages of correlated sources even under severe fading conditions, leading to an improved reconstruction performance. The open source code of this work is available at: https://dongbiao26.github.io/rdjscc/.
... Using Lemma 4.3 on H_S we obtain a matching M over V(G) such that Obj_D(G+M) ≤ 7α · Obj_{D_S}(H_S) ≤ 14 · α · c_1 · f_CE(D_S). From the well-known decomposition (or grouping) property of entropy [4,9], which intuitively means that merging two probability values into one by adding them does not increase entropy, it follows that f_CE(D_S) ≤ f_CE(D), and consequently we have ...
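The grouping property referred to can be stated explicitly (standard form, written here for an arbitrary distribution (p_1, ..., p_m)): merging two outcomes discards the residual uncertainty about which of the two occurred, so entropy cannot increase:

\[
H(p_1+p_2, p_3, \dots, p_m)
= H(p_1, \dots, p_m) - (p_1+p_2)\,H\!\left(\frac{p_1}{p_1+p_2}, \frac{p_2}{p_1+p_2}\right)
\le H(p_1, \dots, p_m).
\]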
Preprint
Full-text available
Graph augmentation is a fundamental and well-studied problem that arises in network optimization. We consider a new variant of this model motivated by reconfigurable communication networks. In this variant, we consider a given physical network and the measured communication demands between the nodes. Our goal is to augment the given physical network with a matching, so that the shortest path lengths in the augmented network, weighted with the demands, are minimal. We prove that this problem is NP-hard, even if the physical network is a cycle. We then use results from demand-aware network design to provide a constant-factor approximation algorithm for adding a matching in the case where only a few nodes in the network cause almost all the communication. For general real-world communication patterns, we design and evaluate a series of heuristics that can deal with arbitrary graphs as the underlying network structure. Our algorithms are validated experimentally using real-world data center traces (e.g., from Facebook).
... Theoretically, the BKT transition is characterized by a rapid increase of the vortex number n_v [93]. Figure 1A plots its temperature derivative dn_v/dT for three different values of V at fixed J = 0.5 and = 0. A peak is clearly seen for each curve that defines the temperature scale . ...
Article
Identifying the key factors controlling the magnitude of Tc is of critical importance in the pursuit of high-temperature superconductivity. In cuprates, Tc reaches its maximal value in trilayer structure, leading to the belief that interlayer coupling may help promote the pairing. In contrast, for the recently discovered nickelate superconductors under high pressure, the maximum Tc is reduced from about 80 K in the bilayer La3Ni2O7 to 30 K in the trilayer La4Ni3O10. Motivated by this opposite trend, we propose an interlayer pairing scenario for the superconductivity of La4Ni3O10. Our theory reveals intrinsic frustration in the spin-singlet pairing that the inner layer tends to form with both of the two outer layers respectively, leading to strong superconducting fluctuations between layers. This explains the reduction of its maximum Tc compared to that of the bilayer La3Ni2O7. Our findings support a fundamental distinction between multilayer nickelate and cuprate superconductors, and ascribe it to their different (interlayer versus intralayer) pairing mechanisms. Furthermore, our theory predicts extended s±-wave gap structures in La4Ni3O10, with varying signs and possible nodes on different Fermi pockets. We also find an intrinsic Josephson coupling with potentially interesting consequences that may be examined in future experiments. Our work reveals the possibility of rich novel physics in multilayer superconductors with interlayer pairing.
Article
A prominent argument in the political correctness debate is that people feel pressure to publicly espouse sociopolitical views they do not privately hold, and that such misrepresentations might render public discourse less vibrant and informative. This paper formalizes the argument in terms of social image and evaluates it experimentally in the context of college campuses. The results show that (i) social image concerns drive a wedge between the sensitive sociopolitical attitudes that college students report in private and in public; (ii) public utterances are indeed less informative than private utterances; and (iii) information loss is exacerbated by (partial) audience naïveté. (JEL D72, D83, D91, I23, Z13)
Article
Full-text available
The El Niño–Southern Oscillation (ENSO) is a dominant mode of climate variability influencing temperature and precipitation in distant parts of the world. Traditionally, the ENSO influence is assessed considering its amplitude. Focusing on its quasi-oscillatory dynamics comprising multiple timescales, we analyze the causal influence of phases of ENSO oscillatory components on scales of precipitation variability in eastern China, using information-theoretic generalization of Granger causality. We uncover the causal influence of the ENSO quasi-biennial component on the precipitation variability on and around the annual scale, while the amplitude of the precipitation quasi-biennial component is influenced by the low-frequency ENSO components with periods of around 6 years. This cross-scale causal information flow is important mainly in the Yellow River basin (YWRB), while in the Yangtze River basin (YZRB) the causal effect of the ENSO amplitude is dominant. The presented results suggest that, in different regions, different aspects of ENSO dynamics should be employed for prediction of precipitation.
Article
Full-text available
In this work, a new method for denoising signals is developed that is based on variational mode decomposition (VMD) and a novel metric using detrended fluctuation analysis (DFA). The proposed method first decomposes the signal into band-limited intrinsic mode functions (BLIMFs) using VMD. Then, the developed DFA-based metric is employed to identify the `noisy' BLIMFs (based on their DFA-based scaling exponent and frequency content). The existing DFA-based methods use a single-slope threshold to detect noise, assuming all signals have the same noise pattern and ignoring their unique characteristics. In contrast, the proposed DFA-based metric sets adaptive thresholds for each mode based on their specific frequency and correlation properties, making it more effective for diverse signals and noise types. These predominantly noisy BLIMFs are then denoised using shrinkage techniques in the framework of the stationary wavelet transform (SWT). This step allows efficient denoising of the components, mainly the noisy BLIMFs identified by the adaptive threshold, without losing important signal details. Extensive computer simulations have been carried out for both synthetic and real electrocardiogram (ECG) signals. It is demonstrated that the proposed method outperforms state-of-the-art denoising methods with a comparable computational complexity.
Preprint
We consider a sequential decision-making setting where, at every round t, a market maker posts a bid price B_t and an ask price A_t to an incoming trader (the taker) with a private valuation for one unit of some asset. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs. If a trade happens at round t, then letting M_t be the market price (observed only at the end of round t), the maker's utility is M_t - B_t if the maker bought the asset, and A_t - M_t if they sold it. We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs under a variety of assumptions (adversarial, i.i.d., and their variants) on the sequence of market prices and valuations. Our upper bound analysis unveils an intriguing connection relating market making to first-price auctions and dynamic pricing. Our main technical contribution is a lower bound for the i.i.d. case with Lipschitz distributions and independence between prices and valuations. The difficulty in the analysis stems from the unique structure of the reward and feedback functions, allowing an algorithm to acquire information by graduating the "cost of exploration" in an arbitrary way.
Article
We investigate a problem of attention allocation and portfolio selection with information capacity constraint and return predictability in a multi-asset framework. In a two-phase formulation, the optimal attention strategy maximizes the combined expected alpha payoffs and expected beta payoffs of the portfolio. Return predictors taking extreme values incentivize the investor to learn about them and this leads to competition among information sources for attention. Moreover, the investor trades with varying skills including picking alphas and betting on beta, depending on the magnitude of the related predictors. Our multi-period analysis using reinforcement learning demonstrates time-horizon effects on attention and investment strategies.
Preprint
The dynamics and processes involved in particle-molecule scattering, including nuclear dynamics, are described and analyzed by different quantum information quantities along the different stages of the scattering. The main process studied and characterized with the information quantities is the interatomic coulombic electronic capture (ICEC), an inelastic process that can lead to dissociation of the target molecule. The analysis is focused on a one-dimensional transversely confined NeHe molecule model used to simulate the scattering between an electron e^- (particle) and a NeHe^+ ion (molecule). The time-independent Schrödinger equation (TISE) is solved using the Finite Element Method (FEM) with a self-developed Julia package FEMTISE to compute potential energy curves (PECs) and the parameters of the interactions between particles. The time-dependent Schrödinger equation (TDSE) is solved using the Multi-configuration time-dependent Hartree (MCTDH) algorithm. The time-dependent electronic and nuclear probability densities are calculated for different electron incoming energies, evidencing elastic and inelastic processes that can be correlated with changes in von Neumann entropy, conditional mutual information, and Shannon entropies. The expectation values of the positions of the particles, as well as their standard deviations, are analyzed along the whole dynamics and related to the entanglement during the collision and after the process is over, hence evidencing the dynamics of entanglement generation. It is shown that the correlations generated in the collision are partially retained only when the inelastic process is active.