Science topic

# Information Theory - Science topic

An interdisciplinary study dealing with the transmission of messages or signals, or the communication of information. Information theory does not directly deal with meaning or content, but with physical representations that have meaning or content. It overlaps considerably with communication theory and CYBERNETICS.

Questions related to Information Theory

In recent years there has been a tremendous surge in neuroimaging research, and in my experience the most exciting aspects lie in:

- exploring how neural systems process and integrate multiple inputs,
- elucidating how complex neuronal circuits can be understood through computational modelling of simplified models (both static and dynamical),
- elucidating the mechanisms of synaptic plasticity and of communication between brain regions (the neurodevelopmental and plastic brain models of cognitive and computational processing and brain connectivity),
- understanding the computational basis for the generation of complex neural activity in a given brain region. For example, can neurons with similar inputs and identical synaptic parameters but different weights in a given layer of the brain show qualitatively distinct neural firing rate patterns?

To understand these kinds of neuronal computations at an integrative level, a systems view is a powerful framework: it provides mechanistic insight while, as a complementary method, allowing the system to be modelled in a rigorous and detailed way, giving insight into its behaviour. What's your (qualified) opinion?

Dear all,

Why is forward selection search so popular and widely used in feature selection (FS) based on mutual information, such as in mRMR, JMI, CMIM, and JMIM? Why are other search approaches, such as beam search, not used? If there is a reason for that, kindly reply.

The general consensus from the brain literature and various neuroimaging studies suggests that brain states exhibit different entropy levels under different conditions. On the other hand, entropy is an increasing quantity in nature from the thermodynamic point of view, and biological systems appear to contradict this law for various reasons. This can also be thought of as the transformation of energy from one form to another. This situation makes me think about the possibility of the existence of distinct energy forms in the brain. Briefly, I would like to ask:

Could we find a representation for the different forms of energy rather than the classical power spectral approach? For example, useful energy, useless energy, reserved energy, and so on.

If you find my question ridiculous, please don't answer, I am just looking for some philosophical perspective on the nature of the brain.

Thanks in advance.

If there were quantum mechanical equivalents of individual neurons and of larger networks of neurons, and if quantum mechanisms of error correction worked at those levels, you could get something like consciousness. This is because information could (in principle) flow between neurons, which means you would have a mechanism for some sort of distributed computing inside the brain. What's your view?

An alternative (rather elaborate) discussion of the two can be found below. However, this particular idea just emerged once I started rethinking information in general.

The current **technological revolution**, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing: Big Data database technologies, cloud computing, machine learning, the Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.

In connection with the above, I would like to ask you:

Which **information technologies** of the current **technological revolution, Industry 4.0,** contribute the most to reducing the **asymmetry of information** between counterparties of **financial transactions**?

The above question concerns the **asymmetry of information** between **financial transaction partners**, such as between **borrowers** and **banks** granting loans, where before granting a loan the bank carries out an assessment of the **creditworthiness** of a potential borrower and of the **credit risk** level associated with a specific **credit transaction**, and, inter alia, between **financial institutions** and the clients of their financial services.

Please reply

Best wishes

How to obtain the currently necessary information from **Big Data database systems** for the needs of specific **scientific research** and for carrying out **economic, business and other analyses**?

Of course, the right data is important for **scientific research**. However, in the present era of digitalization of various categories of **information**, of creating various libraries and databases, and of constantly expanding large data sets stored in database systems, data warehouses and **Big Data database systems**, it is important to develop techniques and tools for filtering large **data** sets in those databases, so as to filter out of terabytes of data only the **information** that is currently needed for the purposes of scientific research in a given field of **knowledge**, of obtaining answers to a given research question, and of **business** needs, e.g. after connecting these databases to **Business Intelligence** analytical platforms. I described these issues in my scientific publications presented below.

Do you agree with my opinion on this matter?

In view of the above, I am asking you the following question:

*How to obtain the currently necessary information from **Big Data** database systems for the needs of specific **scientific research** and for carrying out **economic, business and other analyses**?*

Please reply

I invite you to the discussion

Thank you very much

Dear Colleagues and Friends from RG

*The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:*

I invite you to discussion and cooperation.

Best wishes

What are the important topics in the field of data analysis in **Big Data** database systems?

What kind of scientific research dominates in the field of data analysis in **Big Data** database systems?

Please reply. I invite you to the discussion

Dear Colleagues and Friends from RG

The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:

I invite you to discussion and cooperation.

Best wishes

I would like to have a deeper insight into Markov chains: their origin and their applications in information theory, machine learning and automata theory.
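As a starting point, the two quantities most often computed for a Markov chain in an information-theoretic context are its stationary distribution and its entropy rate. A minimal Python sketch, using a hypothetical two-state transition matrix (the numbers are illustrative):

```python
import math

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a Markov chain
    by repeatedly applying the row-stochastic transition matrix P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rate(P, pi):
    """Shannon entropy rate (bits per step) of a stationary Markov chain:
    H = -sum_i pi_i * sum_j P_ij * log2(P_ij)."""
    return -sum(pi[i] * sum(p * math.log2(p) for p in P[i] if p > 0)
                for i in range(len(P)))

# Hypothetical two-state chain
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
H = entropy_rate(P, pi)
```

The entropy rate is one of the standard bridges between Markov chains and information theory: it generalizes Shannon entropy from i.i.d. sources to sources with memory.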

The **future** of **marketing** development in **social media**

Marketing in social media is still a fast-developing area of **marketing** techniques used on the **Internet**. On the one hand, some of the largest **online technology companies** have built their business concept on **social media marketing** or are entering this field.

On the other hand, there are startups of technology companies acquiring data from the Internet and processing information in **Big Data database systems** for the purpose of providing information services to other entities as support for strategic and operational **management**, including planning advertising campaigns.

Therefore, the question arises:

**What tools for social media marketing will be developed in the future?**

Please answer, comment.

I invite you to the discussion

Hello dear colleagues,

It seems to me this could be an interesting thread for discussion:

I would like to center the discussion around the concept of entropy, addressing in particular the explanation-description-exemplification part of the concept.

That is: what do you think is a good, helpful explanation of the concept of entropy (at a technical level, of course)?

A way (or ways) of explaining it that settles the concept as clearly as possible; maybe first in a more general scenario, and next (if required) in a more specific one.

Kind regards!

An interesting thing is the algorithm according to which specific search results appear in the **Google search engine** for a given keyword.

Formulas of this type of **algorithms** can be constructed in various ways, so that different search results can be obtained for the same keyword.

Added to this is the issue of promoting **search results** for companies that have paid certain fees for a high level of positioning. Unfortunately, this is not an objective process of finding information available on the **Internet** but a formula based on commercial marketing. In this situation, a question arises about competitiveness, which is limited in this way.

In view of the above, I am asking you: Does **Google**'s search engine algorithm restrict competition in the availability of **information** on the Internet?

Please **answer**, comment. I invite you to the discussion.

What kind of scientific research dominates in the field of **Functionality and applications of smartphones**?

Please provide your suggestions for a **question**, problem or **research thesis** within the issues of **Functionality and applications of smartphones**.

Please reply.

I invite you to the discussion

Thank you very much

Best wishes

I have been pondering the relationship between these two important topics of our data-driven world for a while. I have bits and pieces, but I have been hoping to find a neat and systematic set of connections that would somehow (surprisingly) bind them and fill the empty spots I have carried in my mind for the last few years.

In the past, while I was dealing with a multi-class classification problem (not so long ago), I came to realize that multiple binary classifications are a viable way to address this problem through error correcting output coding (ECOC) - a well known coding technique in the literature whose construction requirements are a bit different from those of classical block or convolutional codes. I would like to remind you that grouping multiple classes into two superclasses (a.k.a. class binarization) can be addressed in various ways. You can group them totally randomly, which does not depend on the problem at hand, or based on a set of problem-dependent constraints derived from the training data. The way I like the most sits at the intersection of information theory and machine learning. To be more precise, class groupings can be chosen based on the resultant mutual information so as to maximise the class separation. In fact, the main objective of this method is to maximise class separation so that your binary classifiers are exposed to less noisy data and hopefully achieve better performance. On the other hand, the ECOC framework calls for coding theory and efficient encoder/decoder architectures that can be used to handle the classification problem efficiently. The nature of the problem is not something we usually come across in communication theory and classical coding applications, though. Binarization of classes implies different noise and defect structures being inserted into the so-called "channel model", which is not common in classical communication scenarios. In other words, the solution itself changes the nature of the problem at hand. Also, the way we choose the classifiers (margin-based, etc.) will affect the characterization of the noise that impacts the detection (classification) performance. I do not know if it is even well posed, but what is the capacity of such a channel? What is the best code structure that addresses these requirements?
Even more interestingly, can the recurrent issues of classification (such as overfitting) be solved with coding? Maybe we can maintain a trade-off between training and generalization errors with an appropriate coding strategy?
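To make the ECOC framing above concrete, here is a minimal sketch: a hypothetical (unoptimized) code matrix over four classes, with nearest-codeword decoding in Hamming distance standing in for the decoder. The binary classifier outputs are simulated as bit vectors:

```python
# Toy error-correcting output coding (ECOC) sketch.
# Each column of the code matrix defines one binary "superclass" split;
# each row is the codeword assigned to a class. The matrix is illustrative.
CODE = {
    0: [0, 0, 0, 1, 1],
    1: [0, 1, 1, 0, 0],
    2: [1, 0, 1, 0, 1],
    3: [1, 1, 0, 1, 0],
}

def hamming(a, b):
    """Number of positions where two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Assign the class whose codeword is nearest in Hamming distance;
    this is what tolerates errors from individual binary classifiers."""
    return min(CODE, key=lambda c: hamming(CODE[c], bits))

# A perfect read-out of class 2, and one with a single flipped bit:
clean = [1, 0, 1, 0, 1]
noisy = [1, 0, 1, 0, 0]   # one binary classifier erred
```

Because the minimum pairwise Hamming distance between the rows here is 3, a single erroneous binary classifier is still corrected, which is exactly the error-correcting behaviour the discussion relies on.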

Similar trends can be observed in the estimation theory realm. Parameter estimation, or likewise "regression" (including model fitting, linear programming, density estimation, etc.), can be thought of as the problem of finding the "best parameters" or "best fit", which are the ultimate targets to be reached. The errors due to the methods used, the collected data, etc. are problem-specific and usually dependent. For instance, density estimation is a hard problem in itself, and kernel density estimation is one approach of its kind for estimating probability density functions. Various kernels and data transformation techniques (such as Box-Cox) are used to normalize data and propose new estimation methods to meet today's performance requirements. To measure how well we do, or how different two distributions are, we again resort to information theory tools (such as the Kullback-Leibler (KL) divergence and the Jensen-Shannon divergence) and use the concepts and techniques therein (including entropy, etc.) from a machine learning perspective. Such an observation separates the typical problems posed in the communication theory arena from those of the machine learning arena, requiring a distinct and careful treatment.

Last but not least, I think there is a deep-rooted relationship between deep learning methods (and many machine learning methods per se) and the core concepts of information and coding theory. Since the hype around deep learning appeared, I have observed many studies applying deep learning methods (autoencoders, etc.) to decoding specific codes (polar, turbo, LDPC, etc.), claiming efficiency, robustness, etc., thanks to the parallel implementation and model-deficit nature of neural networks. However, I am wondering about the other way around. I wonder if, say, back-propagation can be replaced with more reasonable and efficient techniques already well known in the information theory world. Perhaps rate-distortion theory has something to say about the optimal number of layers we ought to use in deep neural networks. Belief propagation, turbo equalization, list decoding, and many other known algorithms and models may apply quite well to known machine learning problems and will perhaps promise better and more efficient results in some cases. I know a few folks have already begun searching for neural-network-based encoder and decoder designs for feedback channels. There are many open problems, in my opinion, about the explicit design of encoders and the use of the network without feedback. A few recent works have considered various application areas, such as molecular communications and coded computation, to which a deep learning background can be applied to achieve performance which otherwise cannot be reached using classical methods.

In the end, I just wanted to toss out a few short notes here to instigate further discussion and thought. This interface will attract more attention as we see the connections clearly and bring out new applications down the road...

I would like to know if there is an expression that shows the (maximum) channel capacity of a downlink multiuser MIMO channel when imperfect CSI is assumed.

Any references in this direction would be useful for me. Thanks!

Hi, how can we calculate the entropy of chaotic signals? Is there a simple method or formula for doing this?
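One commonly used estimator for chaotic time series is permutation entropy (Bandt-Pompe), which only requires counting ordinal patterns. A self-contained sketch, applied to a logistic-map trajectory (the map parameters and embedding dimension are illustrative choices):

```python
import math
from collections import Counter

def permutation_entropy(signal, m=3):
    """Normalized permutation entropy (Bandt-Pompe): the Shannon entropy
    of ordinal patterns of length m, divided by log2(m!) so the result
    lies in [0, 1]. 0 = fully predictable ordering, 1 = all patterns
    equally likely."""
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: signal[i + k]))
        for i in range(len(signal) - m + 1)
    )
    total = sum(patterns.values())
    return -sum((c / total) * math.log2(c / total)
                for c in patterns.values()) / math.log2(math.factorial(m))

# Chaotic logistic map at r = 4 versus a fully predictable ramp
x, chaos = 0.4, []
for _ in range(2000):
    x = 4.0 * x * (1.0 - x)
    chaos.append(x)
ramp = list(range(2000))

pe_chaos = permutation_entropy(chaos)
pe_ramp = permutation_entropy(ramp)   # only one ordinal pattern occurs
```

Other options include approximate/sample entropy and correlation-sum-based estimates of the Kolmogorov-Sinai entropy rate; permutation entropy is simply one of the easiest to implement robustly.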

Greetings,

I am working on my grad project, implementing an LDPC DVB-S2 decoder, and the best resources I found explaining LDPC decoding unfortunately follow the 5G standard. So, if I follow along with these resources discussing the 5G implementation, what should I look out for so as not to get confused between the two implementations?

Thanks in advance!

Is there an equation connecting the wave function and the entropy of the quantum system?

I have the following data set (attached) and I would like to calculate the mutual information and joint entropy between multiple columns (like for A,B,D,E or C,D,E,F,G etc.). I have gone through the R package entropy and other related packages, but as I am very new to information theory, I am having some problems computing it.

I am specifically looking for R code or online calculator options to calculate this.
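Since the attached data set is not reproduced here, the following sketch uses hypothetical stand-in columns (the question asks for R, but the same counting logic translates directly): joint entropy is computed from empirical row frequencies, and mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y).

```python
import math
from collections import Counter

def joint_entropy(*columns):
    """Shannon joint entropy (bits) of the empirical distribution
    of the rows formed by the given columns."""
    rows = list(zip(*columns))
    counts = Counter(rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), all from empirical frequencies."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)

# Hypothetical stand-in columns (placeholders for the attached data)
A = [0, 0, 1, 1, 0, 1]
B = [0, 0, 1, 1, 0, 1]   # identical to A, so I(A;B) = H(A)
C = [0, 1, 0, 1, 0, 1]

mi_ab = mutual_information(A, B)
mi_ac = mutual_information(A, C)
```

The R `entropy` package operates on the same kinds of empirical count tables; multi-column joint entropy is obtained here simply by passing more columns to `joint_entropy`.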

Can we affirm that whenever one has a good prediction algorithm, one can also get a correspondingly good compression algorithm for data one already has, and vice versa?

Please consider a set of pairs of probability measures (P, Q) with given means (m_P, m_Q) and variances (v_P, v_Q).

For the relative entropy (KL divergence) and the chi-square divergence, a pair of probability measures defined on a common two-element set (u_1, u_2) attains the lower bound.

For a general f-divergence, what is the condition on f such that a pair of probability measures defined on a common two-element set attains the lower bound?

Intuitively, I think that the divergence between localized probability measures should be smaller.

Thank you for your time.

Dear researchers,

Let's share our opinion about recent attractive topics on communication systems and the potential future directions.

Thanks.

By definition, the capacity of a communication channel is given by the maximum of the mutual information between the input (X) and output (Y) of the channel, where the maximization is with respect to the input distribution; that is, C = sup_{p_X(x)} MI(X;Y).

From my understanding (please correct me if I'm wrong), when we have a noisy channel, such that some of the input symbols may be confused in the output of the channel, we can draw a confusability graph of such a channel where nodes are symbols and two nodes are connected if and only if they could be confused in the output.

If we had to communicate **using messages made out of single symbols only**, then the largest number of messages that could be sent over such a channel would be α(G), the size of the largest independent set of vertices in the graph (in this case, the Shannon capacity of the graph equals the independence number α(G)).

Does this mean that, for such a channel, the maximum mutual information between the input and output of the channel (the channel capacity) is α(G), and that it is achieved by sending the symbols of the largest independent set?
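For the ordinary (non-zero-error) capacity in the first paragraph, the maximization over input distributions can be carried out numerically with the Blahut-Arimoto algorithm. A sketch, tested on a binary symmetric channel where the answer 1 - H2(p) is known in closed form (the crossover probability 0.1 is just an example):

```python
import math

def blahut_arimoto(W, iters=200):
    """Numerically maximize I(X;Y) over input distributions for a discrete
    memoryless channel with transition matrix W[x][y] (Blahut-Arimoto).
    Returns (capacity in bits, capacity-achieving input distribution)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp( KL divergence of row x from the output marginal q )
        c = [math.exp(sum(w * math.log(w / q[y])
                          for y, w in enumerate(W[x]) if w > 0))
             for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / z for x in range(nx)]
    # Evaluate I(X;Y) at the final input distribution, in bits
    q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    mi = sum(p[x] * w * math.log2(w / q[y])
             for x in range(nx) for y, w in enumerate(W[x]) if w > 0)
    return mi, p

# Binary symmetric channel with crossover 0.1; capacity = 1 - H2(0.1)
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
C, p_opt = blahut_arimoto(bsc)
```

Note that this ordinary Shannon capacity is a different quantity from the zero-error capacity involving α(G), which is what the confusability-graph construction measures.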

What, in your opinion, will be the applications of the technology for analyzing big information collections in **Big Data database systems** developed in the **future**?

In which areas of industry, science, research, information services, etc., in your opinion, will the **applications of technology** for the analysis of large collections of information in Big Data database systems be developed in the future?

Please reply

I invite you to the discussion

I described these issues in my publications below:

I invite you to discussion and cooperation.

Best wishes

The **development of IT and information technologies** increasingly affects economic processes taking place in various branches and sectors of contemporary developed and **developing economies**.

Information technology and advanced information processing are increasingly affecting **people's lives and business ventures**.

The current **technological revolution,** known as Industry 4.0, is determined by the development of the following technologies of advanced information processing: Big Data database technologies, cloud computing, machine learning, the Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.

In connection with the above, I would like to ask you:

**How to measure the value added** in the national economy resulting from the development of information and IT technologies?

Please reply

Best wishes

In information theory, the entropy of a variable is the amount of information contained in the variable. One way to understand the concept of the amount of information is to tie it to how difficult or easy it is to guess the value. The easier it is to guess the value of the variable, the less “surprise” in the variable and so the less information the variable has.

Rényi entropy of order q is defined, for q ≥ 0 and q ≠ 1, by the equation

S_q = (1/(1 - q)) log (Σ_i p_i^q)

As the order q increases, the entropy is non-increasing (it weakens).

Why are we concerned with higher orders? What is the physical significance of the order when calculating the entropy?
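A quick numerical check of the definition and of the monotonicity in q (the distribution is an arbitrary example):

```python
import math

def renyi_entropy(p, q):
    """Rényi entropy of order q, in bits; q >= 0, q != 1.
    The limit q -> 1 recovers the Shannon entropy."""
    return math.log2(sum(pi ** q for pi in p)) / (1.0 - q)

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
H0 = renyi_entropy(p, 0)          # Hartley (max) entropy: log2 of the support size
H2_ = renyi_entropy(p, 2)         # collision entropy
H_near1 = renyi_entropy(p, 1.000001)  # numerically approaches Shannon entropy
```

Taking q slightly above 1 numerically recovers the Shannon entropy, which is why order 1 is defined by that limit; q = 2 (collision entropy) and the q → ∞ limit (min-entropy) are the higher orders that appear most often, e.g. in cryptography, where worst-case rather than average-case uncertainty matters.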

Normalized Mutual Information (NMI) and B3 are used as extrinsic clustering evaluation metrics when each instance (sample) has only one label.

What are the equivalent metrics when each instance (sample) has multiple labels?

For example, in the first image we see [apple, orange, pears]; in the second image, [orange, lime, lemon]; in the third image, [apple]; and in the fourth image, [orange]. Then putting the first and fourth images in one cluster is good, while putting the third and fourth images in one cluster is bad.

Application: many popular datasets for object detection or image segmentation have multiple labels per image. If we use such data for classification (not detection and not segmentation), we have multiple labels for each image.

Note: my task is unsupervised clustering, not supervised classification. I know that for supervised classification we can use the top-5 or top-10 score, but I do not know what the equivalent would be in unsupervised clustering.

The information inside the volume of a black hole is proportional to its surface area.

However, what if information does not cross the horizon, but rather is constrained to stay on the horizon's surface, progressively increasing the black hole radius? What if the black hole is empty, and its force comes just from a spacetime distortion inside it? Reversing Einstein, what if the black hole's attraction is not caused by its mass, but just by the spacetime deformation inside it? This would explain the paradoxes of the holographic principle...

Thanks

Clues: material isn't doomed to be sucked into the hole. Only a small amount of it falls in, while some is ejected back out into space.

"Claims of experience" are autobiographical and semiotic. Yet they offer glimpses into the unique world of the individual. In a sense an empathic window into the "soul" of the experience. How does one sift out the exaggerations and elicit the "truth" when what is the truth is not even certain if the experience is reported for the first time?

Article Practice Based Research: A Guide

For compressive sensing (CS), we can use fewer (*M*) measurements to reconstruct an original *N*-dimensional signal, where M << N and the measurement matrix satisfies the restricted isometry property (RIP). Can we combine the concept of entropy from information theory with CS? Intuitively speaking, the data is successfully reconstructed, so information should not be lost before and after doing CS. Can we claim that the entropy of the compressed measurements is equal to the entropy of the original signal, given that entropy stands for the information contained? To make my problem easier to understand, I give an example below:

Suppose that in a data-gathering wireless sensor network, we deploy *N* machines in the area. To quantify the amount of information collected by each machine, we assume a Gaussian source field, where the collection of data gathered by all machines is assumed to follow a multivariate Gaussian distribution ~ N(\mu, \Sigma). The joint entropy of all the data is H(X). Now we use *M* measurements to reconstruct these data by CS. The joint entropy of these *M* data is H(Y). Can we say H(X) will equal H(Y)?

Thanks for your response.
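For the Gaussian case in the question, both differential entropies have closed forms, H(X) = (1/2) log2((2πe)^N det Σ) and H(Y) = (1/2) log2((2πe)^M det(ΦΣΦ^T)), so the comparison can be made numerically. A toy sketch with a hypothetical N = 2 covariance and a single (M = 1) measurement row:

```python
import math

def gaussian_entropy_1d(var):
    """Differential entropy (bits) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log2(2 * math.pi * math.e * var)

def gaussian_entropy_2d(cov):
    """Differential entropy (bits) of a 2-D Gaussian:
    0.5 * log2((2*pi*e)^2 * det(cov))."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det)

# Hypothetical source covariance (N = 2) and one measurement row (M = 1)
Sigma = [[1.0, 0.3],
         [0.3, 1.0]]
phi = [0.7, 0.7]
var_y = sum(phi[i] * Sigma[i][j] * phi[j]
            for i in range(2) for j in range(2))  # Phi Sigma Phi^T

H_X = gaussian_entropy_2d(Sigma)
H_Y = gaussian_entropy_1d(var_y)
```

One caveat this makes visible: H(X) and H(Y) live in different dimensions (N vs. M), so they are generally not equal; for strictly sparse signals the reconstruction guarantee comes from the RIP, not from entropy preservation of the Gaussian measurement values.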

In the question, Why is entropy a concept difficult to understand? (November 2019) Franklin Uriel Parás Hernández commences his reply as follows: "The first thing we have to understand is that there are many Entropies in nature."

His entire answer is worth reading.

It leads to this related question. I suspect the answer is yes, and that the common principle is degrees of freedom and dimensional capacity. Your views?

While studying information theory, we do not usually consider any directionality of the channel.

There will be no change if the receiver and the transmitter are interchanged (i.e. Lorentz reciprocity is obeyed).

However, suppose the channel is a non-reciprocal device such as an isolator or a Faraday rotator, rather than a simple transmission cable. What are the consequences for information theory?

What would be the consequences for Shannon entropy and for theorems such as Shannon's coding theorem, the Shannon-Hartley theorem, etc.? I have been googling terms like "non-reciprocal networks", but I have not been able to find anything. Any help will be appreciated.

We very frequently use the cross-entropy loss in neural networks. Cross-entropy originally came from information theory, from entropy and the KL divergence.

My question is: if I want to design a new objective function, does it always need to be consistent with information theory?

For example, in my objective function, I want to add a probability measure of something, say A, to the cross-entropy loss. A ranges from 0 to 1. So the objective function will look like this:

= A + (cross-entropy between actual and prediction)

= A + (-(actual)*log(prediction))

Say the above objective function works well for neural networks, but violates information theory in the sense that we are adding a probability value, A, to a loss value, the cross-entropy: (-(actual)*log(prediction)).

So my question is: even if it violates loss evaluation from the viewpoint of information theory, is it acceptable as an objective function for neural networks?
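Mechanically, nothing stops you from adding such a term. A minimal numeric sketch of the proposed objective (the names and numbers are made up for illustration):

```python
import math

def cross_entropy(actual, pred, eps=1e-12):
    """Standard cross-entropy between a one-hot target and a predicted
    distribution: -sum(actual * log(pred))."""
    return -sum(a * math.log(p + eps) for a, p in zip(actual, pred))

def custom_objective(A, actual, pred):
    """The modified loss from the question: a probability-valued term A
    (in [0, 1]) simply added to the cross-entropy."""
    assert 0.0 <= A <= 1.0
    return A + cross_entropy(actual, pred)

actual = [0.0, 1.0, 0.0]
pred = [0.1, 0.8, 0.1]
base = custom_objective(0.0, actual, pred)     # equals plain cross-entropy
shifted = custom_objective(0.5, actual, pred)  # adds an offset of 0.5
```

One design point worth noting: if A is constant with respect to the network parameters, it shifts the loss value but leaves the gradients, and hence training, unchanged; only if A depends on the parameters does it actually alter the optimization, acting as a regularizer-like term.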

When Carrier Aggregation and cross-carrier scheduling are applied in an LTE-Advanced system, a UE may support multiple Component Carriers (CCs), and control information on one CC can allocate radio resources on another CC. The search spaces of all CCs and the control information are transmitted only on a chosen CC. In this case, if the search spaces of different CCs are not properly defined, a high blocking probability of control information will be very harmful to system performance.

My question is: what is the cause of this blocking? Is it a deficiency of control channel elements to serve the scheduled UEs, or something else?

My guess is no, but I have no proof of this. Can any expert help?

For now, I assume either self-overlapping or high mutual overlapping of the UEs' search spaces as the likely cause of blocking.

Hi

I have published this paper recently

In that paper, we did an abstracted simulation to get an initial result. Now, I need to do a detailed simulation on a network simulator.

So, I need a network simulator that implements or supports MQTT-SN, or some implementation of MQTT-SN that would work in a network simulator.

Any hints please?

Goal of the theory:

Informational efficiency is a natural consequence of competition, relatively free entry, and low costs of information. If there is a signal, not incorporated in market prices, that future values will be high, competitive traders will buy on that signal. In doing so, they bid the price up until it fully reflects the information in the signal.

Free access to information should prevail on the Internet.

This is the main factor in the dynamic development of many websites, new internet services, the growth of users of social media portals and many other types of websites.

In my opinion, all information that can be publicly disseminated should be available on the Internet without restrictions, universally and free of charge.

Please, answer, comments.

I invite you to the discussion.

Best wishes

According to classical information theory, H(X) ≥ H(f(X)) for the entropy H(X) of some random variable X and a deterministic function f. Is it possible that an observer who doesn't know the function (which produces statistically random data) can take the output of the function and consider it random (full entropy)? Additionally, if one were to use entropy as a measure between two states, what would that 'measure' be between the statistically random output and the original pool of randomness?

Hello, for my research paper I need to select researchers who have written research papers/works/articles in journals about how they "see" a single person in Informatology or Information Science.

It is connected with my MA thesis, so answers from this question could help me with my choices. Appreciate every answer!

Hi Francis

Greetings from India

Do you use information theory in your work?

What is the framework you are using for integrating the two?

Thanks in advance

Safeer

Do we lose information when we project a manifold?

For example, do we lose information about the manifold, i.e. the Earth (globe), when we project it to a chart in a book (using stereographic, Mercator or any other method)?

Similarly, we should be losing information when we create a Bloch sphere for a 2-state system in quantum mechanics, which is also a space projected from a higher dimension, i.e. 4 dimensions.

Also, is there a way to quantify this information loss, if there is any?

Apparently, in some countries, banks of large collections of information on the achievements of human civilization, gathered on **digital data** carriers, are being founded, usually somewhere underground, in specially created bunkers capable of surviving **climatic disasters**.

These are properly secured **Big Data** database systems, data warehouses, underground information banks, digitally recorded.

The **underground bunkers** themselves can survive various climatic and other calamities for perhaps hundreds or thousands of years.

But how long will the large collections of **information** survive in these **Big Data** systems and data warehouses stored on digital media? Perhaps a better solution would be to write this data analogically on specially created discs?

Already in the 1970s, a certain amount of data concerning the achievements of human civilization was placed on the Pioneer 10 probe sent into space, which recently left the solar system and will be flying for the **nearest 10,000 years** with the information about human civilization toward the Alpha Centauri constellation.

At that time, the data sent into the Universe regarding the achievements of **human civilization** was recorded on gold discs.

Is there a better form of data storage at present, when this data should last for **thousands of years**?

Please reply

Best wishes

Given that:

1) Alice and Bob have access to a common source of randomness,

2) Bob's random values are displaced by some (nonlinear) function, i.e. B_rand = F(A_rand).

Are there protocols which allow the two to securely agree on the function (or its inverse) without revealing any information about it?

Black holes cause event horizons, depending on the mass compressed into a narrow space (a point?). From this analogy, could the quantity (mass?) of information in a narrow space lead to an "insight horizon", beyond which we cannot see from the outside, so that no 100-percent simulation of a real system filled with a lot of information can succeed?

The more factors we use to model a system, the closer we get to reality (e.g. ecosystems) - but this process is asymptotic (reality is asymptotically approximated with every additional correct factor). Interestingly, it also seems to us that an object red-shifts into infinity when it approaches a black hole (also asymptotically).

Can we learn anything from this analogy? And if so, what?

Suppose we deal with a dataset with different kinds of attributes (numeric and nominal) and a binary class. How can we find a unique number as the Shannon entropy of this dataset (as a representation of the Kolmogorov complexity of this dataset)?
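One common convention (an assumption, not the only possible choice) is to discretize the numeric attributes and then take the Shannon entropy of the empirical joint distribution over rows. A sketch with a hypothetical toy dataset:

```python
import math
from collections import Counter

def discretize(values, bins=4):
    """Equal-width binning for a numeric column."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]

def dataset_entropy(columns, numeric_flags, bins=4):
    """Shannon entropy (bits) of the empirical joint distribution of rows,
    after equal-width discretization of the numeric attributes."""
    prepared = [discretize(col, bins) if is_num else col
                for col, is_num in zip(columns, numeric_flags)]
    rows = list(zip(*prepared))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy dataset: one numeric attribute, one nominal, binary class
age = [23.0, 25.0, 47.0, 52.0]
color = ["red", "red", "blue", "blue"]
label = [0, 0, 1, 1]
H = dataset_entropy([age, color, label], [True, False, False])
```

The resulting number depends on the binning choice, so it should be read as a proxy for, not an estimate of, the Kolmogorov complexity of the dataset.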

Information is a cornerstone in the shaping of mathematical models of expectations, both rational and adaptive. The arrival of intelligent information is, in turn, the basis for building models of Economic Intelligence (EI) in its American, French and Japanese forms.

**Traditional media** in many countries are controlled by **large media corporations** and partly also by the **governments** of individual countries.

In contrast, on the Internet, in addition to typical news portals, **new media** are developing, including **social media portals** and independent internet forums, where citizens unconnected with **corporations** and large **media companies** have the opportunity to express themselves publicly and make their views public.

In this way, the Internet enables a potential increase in the level of objectivity and independence of the **media**.

Thanks to this, the dissemination of information, the exchange of views, and **debates** with citizens can be of a more social, civic and objective nature.

In view of the above, I am asking you to answer the following question: Does the development of the **Internet** in your country increase the level of objectivity and independence of the **media**?

Please answer, **comment**. I invite you to the **discussion**.

** Given:

a) Ground truth clusters for a data,

b) Clusters obtained using a clustering algorithm (e.g. DBSCAN) when applied to the data after processing it.

**Issue:**

**How to evaluate the performance of the clustering technique when applied on a specific data??**

NMI (Normalized Mutual Information) is a popular external measure to do so. But in cases like the one below, it gives misleading results:

E.g:

Ground_truth = [1,1,1,1,1] ;

DBSCAN_Clusters = [1,1,1,1,2];

nmi = normalized_mutual_info_score(Ground_truth, DBSCAN_Clusters);

*(Python code)*

**The value of the variable "nmi" is approximately equal to zero in this case.**

**Here, note that nmi = 0 in spite of the fact that DBSCAN (the clustering algorithm) has misclustered only one member, while the remaining four match the ground truth.** This is a typical case when the ground truth contains only one cluster.

**Questions:**

**1) Why does this happen?**

**2) Does it mean that clustering algorithm is performing bad?**

**3) Should I use other measures along with NMI ? If so which ones, and what are they for?**

Thanks.
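To see why this happens (question 1), here is a minimal pure-Python sketch of the arithmetic behind NMI, mirroring scikit-learn's default arithmetic-mean normalization (function names are mine): a single-cluster ground truth has zero entropy, mutual information is bounded above by that entropy, so NMI collapses to zero no matter how well the clustering did.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Empirical mutual information between two labelings, in bits."""
    n = len(u)
    pu, pv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((c / n) * log2((c / n) / ((pu[a] / n) * (pv[b] / n)))
               for (a, b), c in joint.items())

ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

h_true = entropy(ground_truth)  # 0.0: a one-cluster labeling carries no information
mi = mutual_information(ground_truth, dbscan_clusters)  # 0.0: MI <= min of the entropies
denom = (h_true + entropy(dbscan_clusters)) / 2  # arithmetic-mean normalizer
nmi = mi / denom if denom > 0 else 1.0
print(nmi)  # 0.0
```

So (question 2) it does not mean the algorithm performed badly; NMI is simply uninformative here. For question 3: the Adjusted Rand Index is also zero-adjusted in this degenerate case, but an unadjusted pair-counting measure such as the Fowlkes-Mallows index stays informative for single-cluster ground truths, so reporting several external measures together is commonly recommended.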

I would like to know how to calculate the entropy of a binary word (I can have words of different sizes, 8, 16, 32, 400 bits). I know about the Shannon Entropy, but it is related to a set, not to an individual.
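One common workaround (a sketch; the function name is mine): treat the 0/1 frequencies inside the single word as an empirical distribution and report its Shannon entropy in bits per symbol. Note this measures within-word symbol variability only; the per-individual notion of information is Kolmogorov complexity, which is uncomputable.

```python
from math import log2

def word_entropy(bits: str) -> float:
    """Empirical Shannon entropy (bits per symbol) of the 0/1
    frequencies within a single binary word of any length."""
    n = len(bits)
    p1 = bits.count("1") / n
    if p1 in (0.0, 1.0):
        return 0.0          # all-same word: no variability
    p0 = 1.0 - p1
    return -(p0 * log2(p0) + p1 * log2(p1))

print(word_entropy("10101010"))  # 1.0: ones and zeros equally frequent
print(word_entropy("11111111"))  # 0.0: constant word
```

A refinement with the same idea is to count k-bit blocks instead of single bits, which captures some ordering structure ("10101010" then no longer looks maximally random).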

Can we update the Turing test? It is about time. The Turing test, created in 1950, aims to differentiate humans from machines -- but with that test we no longer can. Bots can easily beat a human in chess, Go, image recognition, voice calls, or, it seems, any test. We can no longer use the Turing test; we are not exceptional.

The relevant aspect of "playing better chess" is that chess is a model of a conversation, a give and take. It is unsettling that people have difficulty accepting this; it is not just a matter of good performance in a conversation. Should a human find it "normal" that computers can frequently pass as a colleague, and not wonder about the intelligence of that colleague ... or smile? The Turing test has also become an intelligence test, and humans are using bots to beat humans, easily. This is another reason, in ethics, to deprecate this tool and look deeper.

See...

Preprint Consciousness: The 5th Dimension

Consciousness defies definition. We need to understand it, and a metric to measure it. Can trust provide both, even if in a limited fashion?

Preprint Consciousness: The 5th Dimension

Hello all, I need help please.

How can I create a function for the Rényi and Shannon formulas?

**RENYI = (1/(1-alpha)) .* log2(sum(q.^alpha))**

**SHANNON = -sum(q .* log2(q))**

thanks in advance
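Here is a minimal, self-contained Python sketch of both formulas (function names mine; the snippets in the question use MATLAB elementwise syntax, where the parentheses around `1-alpha` matter). Zero probabilities are skipped so that 0·log 0 counts as 0, and α = 1 falls back to Shannon, which is the Rényi limit:

```python
from math import log2

def shannon_entropy(q):
    """Shannon entropy H(q) = -sum q_i * log2(q_i), in bits."""
    return -sum(p * log2(p) for p in q if p > 0)

def renyi_entropy(q, alpha):
    """Renyi entropy H_a(q) = log2(sum q_i^a) / (1 - a); -> Shannon as a -> 1."""
    if alpha == 1.0:
        return shannon_entropy(q)
    return log2(sum(p ** alpha for p in q if p > 0)) / (1.0 - alpha)

q = [0.5, 0.25, 0.25]
print(shannon_entropy(q))     # 1.5 bits
print(renyi_entropy(q, 2.0))  # collision entropy: -log2(0.375) ≈ 1.415 bits
```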

Hi

I have a question regarding high-rate LDPC code constructions. My research field is not coding, but at some point in my research I need an explicit way of constructing high-rate LDPC codes with girth 6 or 8. I think high rate is not an important factor in coding theory itself, but in compressed sensing it is of great importance, since it underlies the main assumption of compressed sensing.

I would like to know whether Cyclic or Quasi-cyclic LDPC codes of girth 6 or 8 can provide high-rate or not? Any suggestion is appreciated!

Thanks

Mahsa
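On the quasi-cyclic side: the design rate of a QC-LDPC code expanded from an m_b × n_b base matrix is at least 1 − m_b/n_b, so high rate only requires a wide, short base matrix; the hard part is choosing the circulant shifts so that short cycles (girth below 6 or 8) are avoided. A minimal sketch of the expansion step, with arbitrary placeholder shifts (not a girth-optimized design):

```python
def expand_qc(base, z):
    """Expand a base matrix of circulant shifts (-1 = all-zero z x z block,
    s >= 0 = z x z identity cyclically shifted by s) into a binary
    quasi-cyclic LDPC parity-check matrix."""
    mb, nb = len(base), len(base[0])
    H = [[0] * (nb * z) for _ in range(mb * z)]
    for i, row in enumerate(base):
        for j, s in enumerate(row):
            if s < 0:
                continue
            for r in range(z):
                H[i * z + r][j * z + (r + s) % z] = 1
    return H

# A 2 x 8 base matrix gives design rate 1 - 2/8 = 0.75; widening the base
# matrix (more column blocks per row block) pushes the rate higher still.
base = [[0, 1, 2, 3, 4, 5, 6, 7],
        [0, 2, 4, 6, 8, 10, 12, 14]]
H = expand_qc(base, z=16)
print(len(H), len(H[0]))       # 32 128
print(1 - len(H) / len(H[0]))  # 0.75 design rate (exact if H has full row rank)
```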

I realize true "information" or a reduction in entropy/uncertainty must be something not already known, but I am asking this question in a way where I want to ignore that facet.

Similarly, is it possible for absolutely zero information to exist in any hypothetical situation, besides the one in which one already knows this information?

As this Wikipedia article (https://en.wikipedia.org/wiki/Probability_density_function#Link_between_discrete_and_continuous_distributions) describes, we can define probability density functions (pdfs) for discrete random variables using Dirac delta functions, which is called "generalized pdf".

That Wikipedia page has given a general equation for the (generalized) pdf of a discrete random variable which can take N different values among real numbers. The parameters of the distribution are {p_i} for i=1 to N.

I tried to calculate the entropy (differential entropy) of this pdf, and I obtained minus infinity by solving the integral (actually because delta(x)*log(delta(x)) terms appeared in the integral). However, this result does not seem intuitive to me, since the answer is independent of the parameters N and {p_i}. In general, we expect that the entropy gets higher as the distribution gets closer to uniform.

Have I made a mistake in solving the integral? Or the entropy is really independent of distribution parameters in this case?
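Your integral is right: the differential entropy of a purely discrete (delta-mixture) distribution really is −∞, for every N and {p_i}. A sketch of why, and of where the parameter dependence hides: smooth each delta into a box of width ε and let ε → 0:

```latex
p_\varepsilon(x) = \sum_{i=1}^{N} \frac{p_i}{\varepsilon}\,
  \mathbf{1}\!\left[\, |x - x_i| < \tfrac{\varepsilon}{2} \,\right]
\quad\Longrightarrow\quad
h(p_\varepsilon) = -\sum_{i=1}^{N} p_i \log_2\frac{p_i}{\varepsilon}
  = H(p_1,\dots,p_N) + \log_2\varepsilon
  \;\longrightarrow\; -\infty \quad (\varepsilon \to 0).
```

The divergent part, log₂ ε, is the same for every discrete distribution; the finite offset is exactly the discrete Shannon entropy H(p₁,…,p_N). So the distribution dependence you expected sits in that finite term, and "closer to uniform ⇒ higher entropy" still holds once the common −∞ is factored out.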

These 'entropies' depend upon a parameter which can be varied between two limits; in those limits they reduce to the Shannon-Gibbs and Hartley-Boltzmann entropies. If such entropies did exist, they could be derived from the maximum-entropy formalism, where the Lagrange multiplier would be identified with the parameter. Then, like all the other Lagrange multipliers, the parameter would have to be given a thermodynamic interpretation as an intensive variable, uniform and common to all systems, like the temperature and chemical potential. The Rényi and Havrda-Charvát entropies cannot be derived from the maximum-entropy formalism. Thus, there can be no entropy that is parameter dependent and whose parameter would differ from system to system.
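For reference, the parameter-dependent family in question is Rényi's, and the two limits mentioned above are:

```latex
H_\alpha(p) = \frac{1}{1-\alpha}\,\log_2\sum_{i=1}^{N} p_i^{\alpha},
\qquad
\lim_{\alpha\to 1} H_\alpha(p) = -\sum_{i=1}^{N} p_i \log_2 p_i
  \;\;(\text{Shannon--Gibbs}),
\qquad
\lim_{\alpha\to 0} H_\alpha(p) = \log_2\bigl|\{\, i : p_i > 0 \,\}\bigr|
  \;\;(\text{Hartley--Boltzmann}).
```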

Mark Srednicki has claimed to demonstrate the entropy ~ area law -- https://arxiv.org/pdf/hep-th/9303048.pdf

Does anyone know of an independent verification or another demonstration of this result?

Is there a proof of this law?

Here's how information theory has been helping us analyze real customer data sets across different domains:

a) One of the basic ideas of information theory is that the meaning and nature of the data itself does not matter in terms of how much information it contains. Shannon states in his famous paper "A Mathematical Theory of Communication" (1948) that "the semantic aspects of communication are irrelevant to the engineering problem". This enables us to build our analytical approach around informational measures (Shannon entropy and mutual information, for example) and have it be **domain and data agnostic**.

b) There has been interesting work on using the "information bottleneck" concept to open up the deep-neural-net black box.

Original paper here: https://arxiv.org/pdf/1703.00810...
I also recommend this very well written blog post.
https://blog.acolyer.org/2017/11...

Our technology uses a variant approach not only to "autonomously" diagnose our models but to improve their quality and efficiency and subject them to “noise-testing” using these “very generic” measures.

c) Using informational measures for analysis frees us from some of the assumptions that are made in conventional machine learning. We don't assume data to have properties such as independence or that some known probability distribution fits the data.
Here's an article describing some of the practical risks of those assumptions: https://www.edge.org/response-de...

Our experiments in predicting rare ("high-sigma" or "black swan") events with this approach have shown very impressive results.

**Conclusion**:

Information theory concepts can contribute immensely to machine learning in practice (we have quite a few case studies and success stories of customers benefiting from our platform), and I believe they will provide an even more significant foundation for predictive science as we run into harder problems in this space.

For example, suppose I have an observation network consisting of 10,000 gauges. It is easy to calculate the joint (Shannon) entropy of these 10,000 gauges, if these points are all present at the same time.

Now, suppose each single gauge within this network has a very low probability of presence (say p = 0.01), which means that at each time we have a network consisting of about 100 gauges, and the IDs of these 100 gauges change from time to time. If I know the probability of presence (p) of each of these 10,000 gauges, how would I calculate the (expected) joint entropy of this STOCHASTIC network?
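If (and only if) the gauges can be treated as statistically independent, the joint entropy of a realized network is the sum of the marginal entropies of the gauges that happen to be present, so the expectation reduces to the closed form Σᵢ pᵢ·Hᵢ. A Monte Carlo sketch under that independence assumption (function name and numbers are illustrative):

```python
import random

def expected_joint_entropy(p_presence, marginal_entropies, trials=2000, seed=0):
    """Monte Carlo estimate of E[joint entropy] for a stochastic network,
    ASSUMING gauges are independent, so each realized network's joint
    entropy is the sum of the marginal entropies of the present gauges."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(h for p, h in zip(p_presence, marginal_entropies)
                     if rng.random() < p)
    return total / trials

n_gauges = 1000
p = [0.01] * n_gauges   # each gauge present 1 % of the time
h = [2.0] * n_gauges    # say each gauge alone has 2 bits of entropy
est = expected_joint_entropy(p, h)
print(round(est, 2))    # close to sum(p_i * H_i) = 20.0 bits
```

With dependent gauges there is no such shortcut: each realized subset has its own joint entropy, and you would need the full joint distribution (or a model of it) inside the Monte Carlo loop.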

I want to find the secrecy rate of the main channel in the case of a Gaussian broadcast channel. E.g., I have a single transmitter and a receiver, and an eavesdropper wants to eavesdrop on the communication over a secondary channel. What will be the information rate such that the eavesdropper can be kept ignorant of the sent message?
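What you describe is the Gaussian wiretap channel. Its secrecy capacity is the difference of the two channel capacities, clipped at zero (Leung-Yan-Cheong and Hellman, 1978): any rate below C_main − C_eve is achievable while keeping the eavesdropper ignorant of the message. A minimal sketch (function name mine; SNRs are linear-scale, illustrative values):

```python
from math import log2

def gaussian_secrecy_capacity(snr_main, snr_eve):
    """Secrecy capacity of the degraded Gaussian wiretap channel,
    in bits per channel use: [C_main - C_eve]^+."""
    c_main = 0.5 * log2(1.0 + snr_main)  # main-channel capacity
    c_eve = 0.5 * log2(1.0 + snr_eve)    # eavesdropper-channel capacity
    return max(0.0, c_main - c_eve)

print(gaussian_secrecy_capacity(15.0, 3.0))  # 2.0 - 1.0 = 1.0 bit/channel use
print(gaussian_secrecy_capacity(3.0, 15.0)) # 0.0: eavesdropper's channel is better
```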

How, when, and at what threshold does a system considered to be "conscious" become aware that it is "conscious"? Is there a quantitative, or perhaps qualitative, property required for this seemingly enormous "transition"?

The successes of communications are more than evident, and information theory is one of the basic sources of that success. Nevertheless, it is frequently called non-constructive (some sources are attached below). If so, what prevents it from being constructive?

S. Haykin, Communication systems, vol. 1, p. 24,

R.L. Dobrushin, V.V. Prelov, https://www.encyclopediaofmath.org//index.php?title=Information_theory&oldid

K.H. Rosen,

https://www.google.pl/search?tbm=bks&hl=pl&q=Handbook+of+Discrete+and+Combinatorial+Mathematics, p.903.

S. Roman, Coding and Information Theory, https://books.google.pl/books?isbn=0387978127, p. 97.

The list can be continued.

PubPeer: May 29, 2017

Unregistered Submission:

(May 25th, 2017 2:46 am UTC)

In this review the authors attempted to estimate the information generated by neural signals used in different Brain Machine Interface (BMI) studies to compare performances. It seems that the authors have neglected critical assumptions of the estimation technique they used, a mistake that, if confirmed, completely invalidates the results of the main point of their article, compromising their conclusions.

Figure 1 legend states that the bits per trial from 26 BMI studies were estimated using Wolpaw’s information transfer rate method (ITR), an approximation of Shannon’s full mutual information channel theory, with the following expression:

Bits/trial = log2N + P log2P + (1-P) log2[(1-P)/(N-1)]

where N is the number of possible choices (the number of targets in a center-out task as used by the authors) and P is the probability that the desired choice will be selected (used as percent of correct trials by the authors). The estimated bits per trial and bits per second of the 26 studies are shown in Table 1 and represented as histograms in Figure 1C and 1D respectively.
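For concreteness, the Wolpaw expression quoted above can be written as a small function (a sketch in my own notation; the guards for P = 1 and P = 0 avoid 0·log 0 terms):

```python
from math import log2

def wolpaw_bits_per_trial(n, p):
    """Wolpaw's ITR approximation: bits per selection for N possible
    choices with per-trial accuracy P, errors assumed to be spread
    uniformly over the N-1 remaining choices."""
    if n < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need N >= 2 and 0 <= P <= 1")
    if p == 1.0:
        return log2(n)                         # perfect selection
    if p == 0.0:
        return log2(n) + log2(1.0 / (n - 1))   # only the error term survives
    return log2(n) + p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))

# An 8-target center-out task at 90 % correct trials:
print(wolpaw_bits_per_trial(8, 0.90))  # ≈ 2.25 bits per trial
```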

Wolpaw’s approximation used by the authors is valid only if several strict assumptions are true: i) BMI are memoryless and stable discrete transmission channels, ii) all the output commands are equally likely to be selected, iii) P is the same for all choices, and the error is equally distributed among all remaining choices (Wolpaw et al., 1998, Yuan et al, 2013; Thompson et al., 2014). The violation of the assumptions of Wolpaw’s approximation leads to incorrect ITR estimations (Yuan et al, 2013). Because BMI systems typically do not fulfill several of these assumptions, particularly those of uniform selection probability and uniform classification error distribution, researchers are encouraged to be careful in reporting ITR, especially when they are using ITR for comparisons between different BMI systems (Thompson et al. 2014). Yet, Tehovnik et al. 2013 failed in reporting whether the assumptions for Wolpaw’s approximation were true or not for the 26 studies they used. Such omission invalidates their estimations. Additionally, the inspection of the original studies reveals the authors failed at the fundamental aspect of understanding and interpreting the tasks used in some of them. This failure led to incorrect input values for their estimations in at least 2 studies.

The validity of the estimated bits/trial and bits/second presented in Figure 1 and Table 1 is crucial to the credibility of the main conclusions of the review. If these estimations are incorrect, as they seem to be, it would invalidate the main claim of the review, which is the low performance of BMI systems. It will also raise doubts on the remaining points argued by the authors, making their claims substantially weaker. Another review published by the same group (Tehovnik and Chen 2015), which used the estimations from the current one, would be also compromised in its conclusions. In summary, for this review to be considered, the authors must include the ways in which the analyzed BMI studies violate or not the ITR assumptions.

References

Tehovnik EJ, Woods LC, Slocum WM (2013) Transfer of information by BMI. Neuroscience 255:134–46.

Shannon C E and Weaver W (1964) The Mathematical Theory of Communication (Urbana, IL: University of Illinois Press).

Wolpaw JR, Ramoser H, McFarland DJ, Pfurtscheller G (1998) EEG-based communication: improved accuracy by response verification. IEEE Trans. Rehabil. Eng. 6:326–33.

Thompson DE, Quitadamo LR, Mainardi L, Laghari KU, Gao S, Kindermans PJ, Simeral JD, Fazel-Rezai R, Matteucci M, Falk TH, Bianchi L, Chestek CA, Huggins JE (2014) Performance measurement for brain-computer or brain-machine interfaces: a tutorial. J. Neural Eng. 11(3):035001.

Yuan P, Gao X, Allison B, Wang Y, Bin G, Gao S (2013) A study of the existing problems of estimating the information transfer rate in online brain–computer interfaces. J. Neural Eng. 10:026014.