Science topic

Information Theory - Science topic

An interdisciplinary study dealing with the transmission of messages or signals, or the communication of information. Information theory does not directly deal with meaning or content, but with physical representations that have meaning or content. It overlaps considerably with communication theory and CYBERNETICS.
Questions related to Information Theory
  • asked a question related to Information Theory
Question
4 answers
In recent years there has been a tremendous surge in neuroimaging research, and in my experience the most exciting aspects lie in:
  • exploring how neural systems are able to process and integrate multiple inputs,
  • elucidating how complex neuronal circuits can be understood through computational modelling of simplified models (both static and dynamical),
  • elucidating the mechanisms of synaptic plasticity and the mechanisms by which brain regions communicate (neurodevelopmental and plastic brain models of cognitive and computational processing and brain connectivity),
  • understanding the computational basis for the generation of complex neural activity in a given brain region. For example, can neurons with similar inputs and identical synaptic parameters but different weights in a given layer of the brain show qualitatively distinct firing-rate patterns?
To understand these kinds of neuronal computations at an integrative level, a systems view is a powerful framework: it provides mechanistic insight while, as a complementary method, also allowing the system to be modelled rigorously and in detail, giving insight into its behaviour. What's your (qualified) opinion?
Relevant answer
Answer
I think the most important future development in neuroscience will be an understanding of how the brain controls the focus of attention. This is crucial to an understanding of the operation of the brain and consciousness.
Richard
  • asked a question related to Information Theory
Question
3 answers
Dear all,
Why is forward selection search so popular and widely used in feature selection based on mutual information, such as MRMR, JMI, CMIM, and JMIM? Why are other search approaches, such as beam search, not used? If there is a reason for that, kindly reply to me.
Relevant answer
Answer
There are three main types of feature selection: filtering methods, wrapper methods, and embedded methods.
Filtering methods use criteria that are independent of the modelling process, such as mutual information, correlation, or the chi-square test, to score each feature (or a selection of features) against the target. Other filtering methods include variance thresholding and ANOVA.
Wrapper methods use error rates, training models on subsets of features iteratively to select the critical ones. Subsets can be chosen by sequential forward selection, sequential backward selection, bidirectional selection, or randomly. Because they select features and train models iteratively, they are more computationally expensive than filtering methods. There are also heuristic, non-exhaustive search approaches such as branch-and-bound. In some cases filtering methods are applied before wrapper methods.
Embedded methods include the use of decision trees or random forests to extract feature importances for deciding which features to select.
Overall, forward, backward and bidirectional methods are stepwise strategies for searching for crucial features. Beam search, by contrast, is more of a graph-based heuristic optimization method, similar to best-first search; it tends to be applied in neural network or tree optimization rather than directly as a feature selection method. A sketch of forward selection scored by mutual information is given below.
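To make the forward-selection idea concrete, here is a minimal Python sketch of greedy selection scored by mutual information, in the spirit of (but much simpler than) MRMR; it assumes scikit-learn is available and uses the iris data only as a stand-in feature matrix:

```python
# Greedy forward selection scored by mutual information (simplified MRMR-like sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_iris(return_X_y=True)

def forward_select(X, y, k):
    relevance = mutual_info_classif(X, y, random_state=0)   # MI(feature; class label)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        scores = []
        for j in remaining:
            # redundancy: mean MI between candidate j and the features already chosen
            red = np.mean([mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                           for s in selected]) if selected else 0.0
            scores.append(relevance[j] - red)                # relevance minus redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

print("selected feature indices:", forward_select(X, y, k=2))
```

Each step adds the feature with the best relevance-minus-redundancy score, which is the basic greedy pattern shared by MRMR-style filters; the exact scoring rules of JMI, CMIM, and JMIM differ in how redundancy and complementarity are combined.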
  • asked a question related to Information Theory
Question
3 answers
The general consensus about the brain and various neuroimaging studies suggest that brain states exhibit different entropy levels under different conditions. On the other hand, entropy is an increasing quantity in nature from the thermodynamic point of view, and biological systems appear to contradict this law for various reasons. This can also be thought of as the transformation of energy from one form to another. This situation makes me think about the possibility of the existence of distinct energy forms in the brain. Briefly, I would like to ask:
Could we find a representation for the different forms of energy rather than the classical power spectral approach? For example, useful energy, useless energy, reserved energy, and so on.
If you find my question ridiculous, please don't answer, I am just looking for some philosophical perspective on the nature of the brain.
Thanks in advance.
Relevant answer
Answer
Hi,
The mitochondrion in cells is a powerhouse of energy. There are some articles on the topics of your interest:
Jeffery KJ, Rovelli C. Transitions in Brain Evolution: Space, Time and Entropy. Trends Neurosci. 2020;43(7):467-474. doi:10.1016/j.tins.2020.04.008
Lynn CW, Cornblath EJ, Papadopoulos L, Bertolero MA, Bassett DS. Broken detailed balance and entropy production in the human brain. Proc Natl Acad Sci U S A. 2021;118(47):e2109889118. doi:10.1073/pnas.2109889118
Carhart-Harris RL. The entropic brain - revisited. Neuropharmacology. 2018;142:167-178. doi:10.1016/j.neuropharm.2018.03.010
Sen B, Chu SH, Parhi KK. Ranking Regions, Edges and Classifying Tasks in Functional Brain Graphs by Sub-Graph Entropy. Sci Rep. 2019;9(1):7628. Published 2019 May 20. doi:10.1038/s41598-019-44103-8
Tobore TO. On Energy Efficiency and the Brain's Resistance to Change: The Neurological Evolution of Dogmatism and Close-Mindedness. Psychol Rep. 2019;122(6):2406-2416. doi:10.1177/0033294118792670
Raichle ME, Gusnard DA. Appraising the brain's energy budget. Proc Natl Acad Sci U S A. 2002;99(16):10237-10239. doi:10.1073/pnas.172399499
Matafome P, Seiça R. The Role of Brain in Energy Balance. Adv Neurobiol. 2017;19:33-48. doi:10.1007/978-3-319-63260-5_2
Engl E, Attwell D. Non-signalling energy use in the brain. J Physiol. 2015;593(16):3417-3429. doi:10.1113/jphysiol.2014.282517
Kang J, Jeong SO, Pae C, Park HJ. Bayesian estimation of maximum entropy model for individualized energy landscape analysis of brain state dynamics. Hum Brain Mapp. 2021;42(11):3411-3428. doi:10.1002/hbm.25442
  • asked a question related to Information Theory
Question
2 answers
If there were quantum mechanical equivalents of individual neurons and of larger networks of neurons, and if quantum mechanisms of error correction worked at those levels, you could get something like consciousness. This is because information could (in principle) flow between neurons - that means you have a mechanism for some sort of distributed computing inside the brain. What's your view?
An alternate (rather elaborate) discussion about the two can be found below. However this particular idea just emerged once I started rethinking about information in general.
Relevant answer
Answer
Navjot Singh I think you are absolutely right to conclude that the key to understanding the operation of the brain is a better understanding of fundamental physics.
However, I don't think quantum theory will help. The Spacetime Wave theory indicates that there are two ways in which neurons can affect each other: one is direct neuron-network connection, and the other is the collective effect of electromagnetic wave action.
Richard
  • asked a question related to Information Theory
Question
4 answers
The current technological revolution, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
In connection with the above, I would like to ask you:
Which information technologies of the current Industry 4.0 technological revolution contribute the most to reducing the asymmetry of information between counterparties of financial transactions?
The above question concerns the asymmetry of information between financial transaction partners, for example between borrowers and banks granting loans (where, before granting a loan, the bank assesses the creditworthiness of the potential borrower and the credit risk associated with the specific credit transaction), and, inter alia, between financial institutions and the clients of their financial services.
Please reply
Best wishes
Relevant answer
Answer
Information asymmetry between the financial institution offering certain financial services and the client can be reduced through increased use of ICT and Industry 4.0 information technologies for remote, web-based service and for concluding transactions. In addition, customers can use social media portals where they share their experiences of using specific financial services.
Best wishes,
Dariusz Prokopowicz
  • asked a question related to Information Theory
Question
42 answers
How to obtain currently necessary information from Big Data database systems for the needs of specific scientific research and necessary to carry out economic, business and other analyzes?
Of course, the right data is important for scientific research. However, in the present era of digitisation of various categories of information and the creation of various libraries, databases, and constantly expanding data sets stored in database systems, data warehouses and Big Data systems, it is important to develop techniques and tools for filtering large data sets, so that out of terabytes of data only the information currently needed is extracted: for the purposes of scientific research in a given field of knowledge, for answering a given research question, and for business needs, e.g. after connecting these databases to Business Intelligence analytical platforms. I described these issues in my scientific publications presented below.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How to obtain currently necessary information from Big Data database systems for the needs of specific scientific research and necessary to carry out economic, business and other analyzes?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Respected Doctor
Big data has three characteristics as follows:
1-Volume
It is the volume of data extracted from a source, and it determines whether the data can be classified as big data; by the year 2020, cyberspace was expected to contain approximately 40,000 megabytes of data ready for analysis and information extraction.
2-Variety
It means the diversity of the extracted data, which helps users, whether researchers or analysts, to choose the data appropriate for their field of research. It includes structured data in databases and unstructured data (such as images, clips, audio recordings, videos, SMS, call logs, and GPS map data), which require time and effort to prepare in a form suitable for processing and analysis.
3-Velocity
It means the speed at which data are produced, extracted and delivered to meet the demand for them. Speed is a crucial element in making decisions based on these data; it is the time from the moment the data arrive to the moment a decision is made based on them.
There are many tools and techniques used to analyze big data, such as Hadoop, MapReduce and HPCC, with Hadoop being one of the most famous. Hadoop splits big data across several devices, distributes the processing to those devices to speed up the computation, and returns the result as a single package. Tools that deal with big data consist of three main parts:
1- Data mining tools
2- Data Analysis Tools
3- Tools for displaying results (Dashboard).
Its use also varies statistically according to the research objectives (improving education, effectiveness of decision-making, military benefit, economic development, health management ... etc.).
greetings
Senior lecturer
Nuha hamid taher
  • asked a question related to Information Theory
Question
36 answers
What are the important topics in the field: Data analysis in Big Data database systems?
What kind of scientific research dominates in the field of data analysis in Big Data database systems?
Please reply. I invite you to the discussion
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear B. Dr. Ravishankar,
Thank you for the answer. Yes, you have indicated a key aspect that determines many of the currently developed analytical applications of Big Data Analytics technology.
Thank you very much,
Best wishes,
Dariusz Prokopowicz
  • asked a question related to Information Theory
Question
3 answers
Question closed; an error was found.
Relevant answer
  • asked a question related to Information Theory
Question
4 answers
I would like to have a deeper insight into Markov chains, their origin, and their applications in information theory, machine learning and automata theory.
Relevant answer
Answer
Yes, whilst a Markov chain is a finite state machine, it is distinguished by its transitions being stochastic, i.e. random, and described by probabilities.
You can learn more about it here:
Kind Regards
Qamar Ul Islam
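As a toy illustration of the "finite states with stochastic transitions" point above, here is a small Python sketch; the two-state weather chain and its probabilities are invented for illustration:

```python
# A two-state Markov chain: named states and a row-stochastic transition matrix.
import numpy as np

states = ["sunny", "rainy"]
P = np.array([[0.9, 0.1],      # transition probabilities from "sunny"
              [0.5, 0.5]])     # transition probabilities from "rainy"

rng = np.random.default_rng(0)
s = 0                          # start in "sunny"
trajectory = [states[s]]
for _ in range(10):
    s = rng.choice(2, p=P[s])  # the next state depends only on the current one
    trajectory.append(states[s])
print(trajectory)

# Stationary distribution: left eigenvector of P associated with eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
print("stationary distribution:", pi / pi.sum())
```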
  • asked a question related to Information Theory
Question
1 answer
Question closed; an error was found.
Relevant answer
Answer
Good question
  • asked a question related to Information Theory
Question
16 answers
The future of marketing development in social media
Marketing in social media is still a rapidly developing area among the marketing techniques used on the Internet. On the one hand, some of the largest online technology companies have built their business concept on social media marketing or are entering this field.
On the other hand, there are startups of technology companies acquiring data from the Internet and processing information in Big Data database systems for the purpose of providing information services to other entities as support for strategic and operational management, including planning advertising campaigns.
Therefore, the question arises:
What tools for social media marketing will be developed in the future?
Please, answer, comments
I invite you to the discussion
Relevant answer
Answer
Nowadays, events are much more than mere gatherings. Instead, they are a place where you can promote your brand to spread your business ideas.
In many situations, you can meet like-minded people and form valuable relationships. However, you need to have a viable promotion plan, as well as a way for people to network at the event itself.
  • asked a question related to Information Theory
Question
104 answers
Hello Dear colleagues:
it seems to me this could be an interesting thread for discussion:
I would like to centre the discussion on the concept of entropy, addressing the explanation-description-exemplification side of the concept.
i.e. What do you think is a good, helpful explanation of the concept of entropy (at a technical level, of course)?
A manner (or manners) of explaining it that settles the concept as clearly as possible: maybe first in a more general scenario, and then (if required) in a more specific one...
Kind regards !
Relevant answer
Answer
Dear F. Hernandes
The Entropy (Greek - ἐντροπία-transformation, conversion, reformation, change) establishes the direct link between MICRO-scopic state (in other words orbital) of some (any) system and its MACRO-scopic state parameters (temperature, pressure, etc).
This is the Concept (from capital letter).
Its main feature: this is the ONLY entity in the natural sciences that shows the development trend of any self-sustained natural process. It is a state function, not a transition function. That is why entropy is independent of the transition route; it depends only on the initial state A and the final state B of the system under consideration. Entropy has many senses.
In the mathematical statistics, the entropy is the measure of uncertainty of the probability distribution.
In statistical physics, it presents the probability (the so-called *statistical sum*) of the existence of some (given) microscopic state (*statistical weight*) under the same macroscopic characteristics. This means that the system may have different amounts of information while the macroscopic parameters are the same.
In the information approach, it deals with the information capacity of the system. That is why the father of information theory, Claude Elwood Shannon, considered the words *entropy* and *information* to be synonyms. He defined entropy as the ratio of the lost information to the whole information volume.
In the quantum physics, this is the number of orbitals for the same (macro)-state parameters.
In the management theory, the entropy is the measure of uncertainty of the system behavior.
In the theory of the dynamic systems, it is the measure of the chaotic deviation of the transition routes.
In thermodynamics, the entropy presents the measure of the irreversible energy loss. In other words, it presents the system's efficiency (capacity for work). This provides the additivity property for two independent systems.
Gnoseologically, the entropy is the inter-disciplinary measure of the energy (information) devaluation (not the price, but rather the very devaluation).
This way, the entropy is many-sided Concept. This provides unusual features of entropy.
What is the dimension of entropy? The right answer depends on the approach. It is a dimensionless figure in the information approach (Shannon defined it as the ratio of two uniform values, therefore it is dimensionless by definition). On the contrary, in the thermodynamic approach it has a dimension (energy divided by temperature, J/K).
Is entropy a parameter (a fixed number) or a function? Once again, the proper answer depends on the approach (point of view). It is a number in mathematical statistics (the logarithm of the number of admissible (unprohibited) system states, the well-known sigma σ). At the same time, it is a function in quantum statistics. Etc., etc.
So, be very cautious when you are operating with entropy.
Best wishes,
Emeritus Professor V. Dimitrov vasili@tauex.tau.ac.il
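To make the statistical/information reading above concrete, here is a small Python sketch computing the Shannon entropy of a discrete distribution (dimensionless in the information convention); the example distributions are invented:

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """H = -sum p_i log(p_i); dimensionless in the information convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 log 0 is taken as 0
    return float(-(p * (np.log(p) / np.log(base))).sum())

p_uniform = [0.25, 0.25, 0.25, 0.25]
p_peaked  = [0.97, 0.01, 0.01, 0.01]
print(shannon_entropy(p_uniform))     # 2.0 bits: maximal uncertainty over 4 states
print(shannon_entropy(p_peaked))      # close to 0: almost no uncertainty
k_B = 1.380649e-23                    # J/K; the thermodynamic convention multiplies a log-count by k_B
```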
  • asked a question related to Information Theory
Question
7 answers
An interesting thing is the algorithm according to which specific search results appear in the Google search engine for a given search term.
Formulas of this type of algorithm can be constructed in various ways, so that different search results can be obtained for the same search term.
Added to this is the issue of promoting search results for companies that have paid certain fees for a high position in the search results. Unfortunately, this is not an objective process of finding information available on the Internet but a formula based on commercial marketing. In this situation, a question arises about competitiveness, which is limited in this way.
In view of the above, I am asking you: Does Google's search engine algorithm restrict competition in the availability of information on the Internet?
Please, answer, comments. I invite you to the discussion.
Relevant answer
Answer
As part of the technological development of web search engines that has taken place since the late 1990s, the importance of the business and/or marketing factor in search algorithms has been growing.
Greetings,
Dariusz Prokopowicz
  • asked a question related to Information Theory
Question
21 answers
What kind of scientific research dominates in the field of functionality and applications of smartphones?
Please, provide your suggestions for a question, problem or research thesis in the issues: Functionality and applications of smartphones.
Please reply.
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Privacy... Smartphones are becoming some of our most trusted computing devices. People use them to store highly sensitive information including email, passwords, financial accounts, and medical records... Huang, Y., Chapman, P., & Evans, D. (2011, August). Privacy-Preserving Applications on Smartphones. In HotSec.
  • asked a question related to Information Theory
Question
3 answers
I have been pondering the relationship between these two important topics of our data-driven world for a while. I have bits and pieces, but I have been looking forward to finding a neat and systematic set of connections that would somehow (surprisingly) bind them and fill the empty spots I have drawn in my mind over the last few years.
In the past, while I was dealing with a multi-class classification problem (not so long ago), I came to realize that multiple binary classifications are a viable option for addressing this problem through error-correcting output coding (ECOC), a well-known coding technique in the literature whose construction requirements are a bit different from those of classical block or convolutional codes. I would like to remind you that grouping multiple classes into two superclasses (a.k.a. class binarization) can be addressed in various ways. You can group them totally randomly, which does not depend on the problem at hand, or based on a set of problem-dependent constraints that can be derived from the training data. The way I like the most sits at the intersection of information theory and machine learning. To be more precise, class groupings can be done based on the resulting mutual information so as to maximise class separation. In fact, the main objective of this method is to maximise class separation so that your binary classifiers are exposed to less noisy data and hopefully achieve better performance. On the other hand, the ECOC framework calls for coding theory and efficient encoder/decoder architectures that can be used to handle the classification problem efficiently. The nature of the problem is not something we usually come across in communication theory and classical coding applications, though. Binarization of classes implies different noise and defect structures to be inserted into the so-called "channel model", which is not common in classical communication scenarios. In other words, the solution itself changes the nature of the problem at hand. Also, the way we choose the classifiers (such as margin-based, etc.) will affect the characterization of the noise that impacts the detection (classification) performance. I do not know if it is possible, but what is the capacity of such a channel? What is the best code structure that addresses these requirements? Even more interestingly, can the recurrent issues of classification (such as overfitting) be solved with coding? Maybe we can maintain a trade-off between training and generalization errors with an appropriate coding strategy?
Similar trends can be observed in the estimation theory realm. Parameter estimation, or likewise "regression" (including model fitting, linear programming, density estimation, etc.), can be thought of as the problem of finding the "best parameters" or the "best fit", which are the ultimate targets to be reached. The errors due to the methods used, the collected data, etc. are problem-specific and usually dependent. For instance, density estimation is a hard problem in itself, and kernel density estimation is one approach for estimating probability density functions. Various kernels and data transformation techniques (such as Box-Cox) are used to normalize data and to propose new estimation methods that meet today's performance requirements. To measure how well we do, or how different two distributions are, we again resort to information theory tools (such as the Kullback-Leibler (KL) divergence and the Jensen-Shannon divergence) and use the concepts/techniques therein (including entropy, etc.) from a machine learning perspective. Such an observation separates the typical problems posed in the communication theory arena from those in the machine learning arena, requiring a distinct and careful treatment.
Last but not least, I think there is a deep-rooted relationship between deep learning methods (and many machine learning methods per se) and the basic core concepts of information and coding theory. Since the hype around deep learning appeared, I have observed many studies applying deep learning methods (autoencoders, etc.) to decoding specific codes (polar, turbo, LDPC, etc.), claiming efficiency, robustness, etc. thanks to the parallel implementation and model-free nature of neural networks. However, I am wondering about the other way around. I wonder whether, say, back-propagation can be replaced with more reasonable and efficient techniques already well known in the information theory world. Perhaps rate-distortion theory has something to say about the optimal number of layers we ought to use in deep neural networks. Belief propagation, turbo equalization, list decoding, and many other known algorithms and models may apply quite well to known machine learning problems and will perhaps deliver better and more efficient results in some cases. I know a few folks have already begun searching for neural-network-based encoder and decoder designs for feedback channels. There are many open problems, in my opinion, concerning the explicit design of encoders and the use of the network without feedback. A few recent works have considered various areas of application, such as molecular communications and coded computation, to which a deep learning background can be applied and thereby achieve performance which otherwise cannot be reached using classical methods.
In the end, I just wanted to toss a few short notes here to instigate further discussion and thought. This interface will attract more attention as we see the connections clearly and bring out new applications down the road...
Relevant answer
Answer
I've been having similar random thoughts about the two topics. As a matter of fact, I'd like to think about learning in the more general sense, not limited to machines. But when I put keywords like 'coding theory', 'learning', etc. into Google, most results are just about applying some information-theoretic techniques in machine learning, while I'm looking for a deeper connection to help me understand learning better. And your post is seemingly the closest thing to what I want.
To briefly summarise my idea, I think we can treat learning as encoding, similar to the last point brought up in your post. I have to admit my ignorance, but I haven't found any works studying learning using the framework of coding theory, rather than just borrowing some convenient tools. You may have dug into the literature more since your post; please direct me to the right works/authors if you have found relevant materials.
I don't have a background in information theory, but I guess I know some naive basics of it. Many artificial neural networks can perform a denoising or pattern-completion task -- isn't that impossible from an information-theoretic point of view? Why can an output ever be the 'denoised' version of a noisier input? Of course this is a stupid question, but it led me to realise that learning/training is like encoding and testing/responding is like decoding. Then I had to accept that a learning system with all its training data forms an information pathway that has a long (even permanent) lifespan, which should be shorter than the rate of change of the regularities underlying the data. Specifically, learning is a process by which the system compresses the aggregated noise in the training data (coding types other than compression would be more fun, but I'm not discussing them here); it considers this as information and incorporates it into its learnable parameters (these things live longer than individual data), and as a successful outcome the system becomes capable of denoising a test sample, which is in some sense similar to decoding an encrypted message with the correct codebook. In other words, I can think of learning as a procedure by which the system minimises its lifetime entropy by data fitting. This idea is evidently hinted at by the common use of error minimisation, in terms of minimising log-likelihoods, in machine learning, but it was clearly spelt out in Smolensky's Harmonium, which is slightly different from Hinton's restricted Boltzmann machine in the goal of optimisation (involving entropy). Unfortunately I'm not experienced enough to explain the technical details.
From my perspective, I consider this research direction extremely important and relevant when it comes to continual learning. In a more classical, static data-fitting or machine learning scenario, in theory the learning system could be embracing all the training data at the same time. Minimising lifetime system entropy is then equivalent to reducing system uncertainty with respect to the training data at the exact moment it encounters the data. However, this is clearly an unrealistic assumption for humans and for many AI applications. A more realistic data stream is more dynamic, and at each moment the system can only partially observe the data. Evidently, if an artificial neural network tries only to optimise itself with respect to this imperfect information, it suffers from catastrophic forgetting. So people start to tweak the learning rules or the regularisers, etc., in order to fix the problem. I do similar things too, but I feel a lack of theoretical guidance, as I consider that there should be some information-theoretic quantification of the difficulty of continual learning tasks (there are some preliminary but arbitrary classifications now), at least for artificial tasks.
In summary, I believe an updated version of coding theory is needed for studying continual learning, because in this scenario the channel capacity of a learning system has to be affected by more than its instantaneous parameter (including structure) configuration, and additionally by an integral over time of these parameters.
  • asked a question related to Information Theory
Question
3 answers
I would like to know if there is an expression that shows the (maximum) channel capacity of a downlink multiuser MIMO channel when imperfect CSI is assumed.
Any references in this direction would be useful for me. Thanks!
Relevant answer
Answer
I can give you a conceptual answer and then you can build on it.
The ergodic channel capacity C is determined by the Shannon theorem:
C = B log2(1 + r/N),
where B is the bandwidth, r is the received signal power at the input of the receiver, and N is the noise power.
We can express r in terms of the channel gain h and the transmitted signal power S such that r = S/h. With imperfect estimation, the channel gain h is replaced by h + dh, where dh is the error in determining h, so the spectral efficiency becomes
C/B = log2(1 + S/((h + dh)N)) = log2(1 + (r/N)/(1 + dh/h)) ≈ log2(1 + (r/N)(1 - dh/h)).
So the effective signal-to-noise ratio is reduced, or equivalently the noise is increased, by the relative channel-estimation error dh/h.
One can apply this formula to each of the MIMO sub-channels and sum over all of them.
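A small numeric sketch of the effect described above, simply evaluating the Shannon formula with and without the relative estimation error (the bandwidth, SNR and error values are invented):

```python
import numpy as np

B = 1e6                                   # bandwidth in Hz (assumed)
snr = 10.0                                # r/N, linear scale (assumed)
for rel_err in [0.0, 0.05, 0.2]:          # |dh/h|: relative channel-estimation error
    C = B * np.log2(1 + snr * (1 - rel_err))
    print(f"relative error {rel_err:4.2f}: capacity {C / 1e6:.3f} Mbit/s")
```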
Best wishes
  • asked a question related to Information Theory
Question
3 answers
Hi, how can we calculate the entropy of chaotic signals? Is there a simple method or formula for doing this?
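One simple and commonly used approach is to estimate the Shannon entropy from a histogram of the signal's values; the Python sketch below assumes the logistic map as a stand-in chaotic signal, and the bin count is an arbitrary choice (more refined estimators, such as permutation or sample entropy, also exist):

```python
import numpy as np

# Generate a chaotic signal: logistic map x_{n+1} = r x_n (1 - x_n) with r = 4.
x = np.empty(10000)
x[0] = 0.123
for n in range(1, x.size):
    x[n] = 4.0 * x[n - 1] * (1.0 - x[n - 1])

# Histogram-based (binned) Shannon entropy estimate, in bits.
counts, _ = np.histogram(x, bins=64)
p = counts / counts.sum()
p = p[p > 0]
H = -(p * np.log2(p)).sum()
print(f"binned Shannon entropy of the signal: about {H:.2f} bits (depends on the bin count)")
```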
Relevant answer
  • asked a question related to Information Theory
Question
3 answers
Greetings,
I am working on my grad project, implementing an LDPC DVB-S2 decoder, and the best resources explaining LDPC decoding that I found, unfortunately, follow the 5G standard. So, if I follow along with these resources discussing the 5G implementation, what should I look out for so as not to get confused between the two implementations?
Thanks in advance!
Relevant answer
Answer
welcome,
Conceptually, the encoding and decoding techniques are the same for LDPC in the two applications. The difference may be in the code rate, which is k/n, where k is the message length and n is the code length. In addition, the block size may differ between the two standards.
You can adopt the method used for LTE provided that it satisfies your required performance parameters.
Maybe the major differences are the block size and the encoding and decoding times in the two standards, so the computing platforms required may have different ratings. You therefore have to take these differences into consideration from the very beginning. You can make this clear to some extent by simulation experiments.
Best wishes
  • asked a question related to Information Theory
Question
13 answers
Is there an equation connecting the wave function and the entropy of the quantum system?
Relevant answer
Answer
Quantum theory allows us to assign a finite value to entropy and calculate it as a function of Planck's constant. Constant entropy is included in the calculation and this allows us to quantify the predictions of quantum theory. The second law of thermodynamics establishes the existence of entropy as a function of the state of the thermodynamic system, that is, "the second law is the law of entropy." In an isolated system, the entropy either remains unchanged or increases (in nonequilibrium processes), reaching a maximum when thermodynamic equilibrium is established (the law of increasing entropy). Different formulations of the second law of thermodynamics found in the literature are specific consequences of the law of increasing entropy
  • asked a question related to Information Theory
Question
4 answers
I have the following data set (attached) and I would like to calculate the mutual information and joint entropy between multiple columns (like A, B, D, E or C, D, E, F, G, etc.). I have gone through the R package entropy and other related packages, but as I am very new to information theory, I am having some problems computing it.
I am specifically looking for R code or online calculator options to calculate this.
Relevant answer
Answer
Interesting question
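The question asks for R, which is not attempted here; as a language-agnostic illustration, here is a hedged Python sketch that discretises the columns and computes joint entropy and mutual information between groups of columns (the column names and data below are hypothetical stand-ins, since the attached data set is not reproduced):

```python
import numpy as np
import pandas as pd

def joint_entropy(df, cols, bins=4):
    """Discretise the chosen columns and compute their joint Shannon entropy (bits)."""
    binned = df[cols].apply(lambda c: pd.cut(c, bins=bins, labels=False))
    counts = binned.value_counts()              # frequency of each joint cell
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(df, cols_a, cols_b, bins=4):
    """I(A;B) = H(A) + H(B) - H(A,B) on the discretised columns."""
    return (joint_entropy(df, cols_a, bins) + joint_entropy(df, cols_b, bins)
            - joint_entropy(df, list(cols_a) + list(cols_b), bins))

# Hypothetical stand-in for the attached data set.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 5)), columns=list("ABCDE"))
df["B"] = df["A"] + 0.3 * rng.normal(size=500)   # make A and B dependent

print("H(A,B,D,E) =", joint_entropy(df, ["A", "B", "D", "E"]))
print("I(A;B)     =", mutual_information(df, ["A"], ["B"]))
```

Note that this plug-in (binned) estimator is biased for small samples; the R packages mentioned in the question offer bias-corrected estimators of the same quantities.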
  • asked a question related to Information Theory
Question
7 answers
Can we affirm that whenever one has a prediction algorithm, one can also get a correspondingly good compression algorithm for data one already has, and vice versa?
Relevant answer
Answer
There is some correlation between compression and prediction. Prediction is a tool of compression. Assume you have data with redundancy in it: you can predict the redundancy from the context of the signal and remove it by simply subtracting the predicted signal from the real signal.
The difference will be the compressed signal.
Prediction is a powerful concept for reducing the redundancy in signals and consequently compressing them.
Prediction is used intensively in video codecs and other signal codecs.
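A toy Python sketch of this idea: predicting each sample from the previous one turns a slowly varying signal into small residuals that compress much better, and the process is lossless (the signal is invented; zlib is used only to show the size difference):

```python
import numpy as np
import zlib

# Invented example: a slowly wandering 8-bit signal (random walk with small steps).
rng = np.random.default_rng(0)
signal = (128 + np.cumsum(rng.integers(-2, 3, size=5000))).astype(np.uint8)

# Predict each sample by the previous one; keep only the prediction residual.
prev = np.roll(signal, 1)
prev[0] = 0
residual = signal - prev                      # uint8 arithmetic wraps mod 256

print("compressed raw signal:", len(zlib.compress(signal.tobytes())), "bytes")
print("compressed residuals: ", len(zlib.compress(residual.tobytes())), "bytes")

# Lossless reconstruction: accumulate the residuals back (also mod 256).
reconstructed = np.cumsum(residual, dtype=np.uint8)
assert np.array_equal(reconstructed, signal)
```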
Best wishes
  • asked a question related to Information Theory
Question
4 answers
Please consider a set of pairs of probability measures (P, Q) with given means (m_P, m_Q) and variances (v_P, v_Q).
For the relative entropy (KL-divergence) and the chi-square divergence, a pair of probability measures defined on the common two-element set (u_1, u_2) attains the lower bound.
Regarding a general f-divergence, what is the condition on f such that a pair of probability measures defined on a common two-element set attains the lower bound?
Intuitively, I think that the divergence between localized probability measures seems to be smaller.
Thank you for taking your time.
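For reference, the standard definitions behind the question can be written as follows (stated only for convenience; f is convex with f(1) = 0):

```latex
D_f(P \,\|\, Q) \;=\; \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\mathrm{d}Q,
\qquad
f(t) = t\log t \;\Rightarrow\; \mathrm{KL}(P\,\|\,Q),
\qquad
f(t) = (t-1)^2 \;\Rightarrow\; \chi^2(P\,\|\,Q).
```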
Relevant answer
Answer
  • asked a question related to Information Theory
Question
92 answers
Dear researchers,
Let's share our opinion about recent attractive topics on communication systems and the potential future directions.
Thanks.
Relevant answer
Answer
FANET
  • asked a question related to Information Theory
Question
6 answers
By definition, the capacity of a communication channel is given by the maximum of the mutual information between the input (X) and output (Y) of the channel, where the maximization is with respect to the input distribution, that is C=sup_{p_X(x)}MI(X;Y).
From my understanding (please correct me if I'm wrong), when we have a noisy channel, such that some of the input symbols may be confused in the output of the channel, we can draw a confusability graph of such a channel where nodes are symbols and two nodes are connected if and only if they could be confused in the output.
If we had to communicate using messages made out of single symbols only, then the largest number of messages that could be sent over such a channel would be α(G), the size of the largest independent set of vertices in the graph (in this case Shannon capacity of the graph equals independence number of that graph α(G)).
Does this mean that for such a channel, the maximum mutual information of the input and output of the channel (channel capacity) is α(G), and it is achieved by sending the symbols of the largest independent set?
Relevant answer
Answer
Hello Amirhossein Nouranizadeh You have an interesting question to discuss, but first you probably need to think again about your introductory text.
" when we have a noisy channel, such that some of the input symbols may be confused in the output of the channel "
It's not exactly like this. First of all, any channel is noisy, otherwise it's not real, or you do not need to communicate because everything is known with no uncertainty.
There is no such thing as an error-free zone of the encoding alphabet and an error-free zone of the decoding alphabet. Otherwise it means that the job has not been done.
Take the case you want to transmit one bit b. It takes values 0 or 1.
Assume that the sender sends b and the receiver receives b' at the other end of the transmission channel.
The channel is noisy (otherwise it's not real...), so the probability that b' = b is less than 1: P(b' = b) < 1.
The probability of error is P(b' ≠ b) = 1 - P(b' = b) > 0.
Clear?
That's how life is.
Now instead of sending a single bit b, you send a vector v, and you receive a vector v' (in channel language they speak of "words", but a word is a vector, so it's the same).
Then the error probability is P(v' ≠ v) = 1 - P(v' = v).
You should not assume that there are immune vectors and contaminable vectors. If the channel coding is done properly, it spreads the risk evenly, usually.
There are other strategies. In the first Digital Mobile Communication Codec (GSM first generation) there was however a somewhat different structure:
- a hierarchy of parameters (say, projections of the vector v onto subspaces V1, V2, V3, etc.; for instance the first two characters of a word, then the following two, etc.)
- coding with a robustness hierarchy:
Protect V1 more than V2, protect V2 more than V3.
- decoding with a hierarchy of error protection
Then, after transmission of v, decoding gives at the receiver v', reconstructed from the projections v'1, v'2, v'3. Here v'1 has a lower probability of error than v'2, which has a lower probability of error than v'3.
With such a scheme, you give more protection resources to what is more dramatic to lose.
Imagine it's the remote control of a car: going forward (+) or backward (-) is more crucial information than the precision of the geometric angle of the movement, so you protect it more.
I hope that with the above you get a concrete sense of what is happening at a sender, on a channel, and at a receiver.
Does it help you?
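To connect back to the single-use (zero-error) picture in the original question, here is a brute-force Python sketch that computes the independence number α(G) of a small confusability graph; the pentagon C5 (each symbol confusable only with its two neighbours) is the classic example, where a single channel use allows α(C5) = 2 unambiguous symbols, while the Shannon capacity of C5 is known to be √5 messages per use:

```python
from itertools import combinations

# Confusability graph of a 5-symbol channel where symbol i can be confused with i±1 (mod 5): the cycle C5.
n = 5
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}

def is_independent(subset):
    """True if no two chosen symbols are adjacent, i.e. none can be confused with another."""
    return all(frozenset(pair) not in edges for pair in combinations(subset, 2))

alpha = max(len(s) for r in range(n + 1) for s in combinations(range(n), r)
            if is_independent(s))
print("independence number alpha(C5) =", alpha)   # 2: at most 2 symbols usable per single use
```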
  • asked a question related to Information Theory
Question
13 answers
In your opinion, how will applications of the technology for analysing large collections of information in Big Data database systems be developed in the future?
In which areas of industry, science, research, information services, etc., in your opinion, will applications of the technology for analysing large collections of information in Big Data database systems be developed in the future?
Please reply
I invite you to the discussion
I described these issues in my publications below:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Shafagat Mahmudova, Len Leonid Mizrah, Reema Ahmad, Shah Md. Safiul Hoque, Natesan Andiyappillai, Omar El Beggar, Tiroyamodimo Mmapadi Mogotlhwane, Thank you very much for participating in this discussion and providing inspiring and informative answers to the above question: What will Big Data be like in the future? Thank you very much for the interesting information and inspiration to continue deliberations on the above-mentioned issues. This discussion confirms the importance of the above-mentioned issues and the legitimacy of developing research on this subject. I also believe that the Big Data Analytics analytical and database technology is one of the most developing technologies included in Industry 4.0. What do you think about it?
Thank you very much and best regards,
Dariusz Prokopowicz
  • asked a question related to Information Theory
Question
6 answers
The development of IT and information technologies increasingly affects economic processes taking place in various branches and sectors of contemporary developed and developing economies.
Information technology and advanced information processing are increasingly affecting people's lives and business ventures.
The current technological revolution, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
In connection with the above, I would like to ask you:
How to measure the value added in the national economy resulting from the development of information and IT technologies?
Please reply
Best wishes
Relevant answer
Answer
Dear Tarandeep Anand, Reza Biria, Krishnan M S, Thank you very much for participating in this discussion and providing inspiring and informative answers to the above question: How to measure the value added in the national economy resulting from the development of information and IT technologies? Thank you very much for your inspiring, interesting and highly substantive answer.
Thank you very much and best regards,
Dariusz Prokopowicz
  • asked a question related to Information Theory
Question
6 answers
In information theory, the entropy of a variable is the amount of information contained in the variable. One way to understand the concept of the amount of information is to tie it to how difficult or easy it is to guess the value. The easier it is to guess the value of the variable, the less “surprise” in the variable and so the less information the variable has.
Rényi entropy of order q is defined, for q ≥ 0 and q ≠ 1, by the equation
S_q = (1/(1 - q)) log(Σ_i p_i^q).
As the order q increases, the entropy decreases.
Why are we concerned about higher orders? What is the physical significance of the order when calculating the entropy?
Relevant answer
Answer
You may look at other entropies; see the articles by Prof. Michèle Basseville on entropy (of probability measures).
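To see the order dependence numerically, here is a small Python sketch evaluating the Rényi entropy of a fixed (invented) distribution for several orders, with q = 1 taken as the Shannon limit:

```python
import numpy as np

def renyi_entropy(p, q, base=2.0):
    """S_q = (1/(1-q)) log sum_i p_i^q; the q -> 1 limit is the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -(p * np.log(p)).sum() / np.log(base)
    return np.log((p ** q).sum()) / ((1.0 - q) * np.log(base))

p = [0.5, 0.25, 0.125, 0.125]          # invented distribution
for q in [0.5, 1.0, 2.0, 5.0, 50.0]:
    print(f"q = {q:5.1f}:  S_q = {renyi_entropy(p, q):.4f} bits")
# S_q decreases as q grows; for large q it approaches -log2(max p_i) (the min-entropy), here 1 bit,
# because higher orders weight the most probable outcomes more heavily.
```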
  • asked a question related to Information Theory
Question
3 answers
Normalized Mutual Information (NMI) and B3 are used as extrinsic clustering evaluation metrics when each instance (sample) has only one label.
What are the equivalent metrics when each instance (sample) has multiple labels?
For example, in the first image we see [apple, orange, pears], in the second image we see [orange, lime, lemon], in the third image we see [apple], and in the fourth image we see [orange]. Then, putting the first and fourth images in one cluster is good, while putting the third and fourth images in one cluster is bad.
Application: many popular datasets for object detection or image segmentation have multiple labels per image. If we use these data for classification (not detection and not segmentation), we have multiple labels for each image.
Note: my task is unsupervised clustering, not supervised classification. I know that for supervised classification we can use the top-5 or top-10 score, but I do not know what the equivalent would be for unsupervised clustering.
Relevant answer
Answer
  • asked a question related to Information Theory
Question
14 answers
The information inside the volume of black hole is proportional to its surface.
However, what if information does not cross the horizon, but rather is constrained to stay on the horizon's surface, progressively increasing the black hole's radius? What if the black hole is empty, and its force comes just from a spacetime distortion inside it? Reversing Einstein, what if the black hole's attraction is caused not by its mass, but just by the spacetime deformation inside it? This would explain the paradoxes of the holographic principle...
Thanks
Clues: material isn’t doomed to be sucked into the hole. Only a small amount of it falls in, while some of it is ejected back out into space.
Relevant answer
Answer
"The Schwarzschild radius is a physical parameter that shows up in the Schwarzschild solution to Einstein's field equations, corresponding to the radius defining the event horizon of a Schwarzschild black hole".
Call it horizon, call it entropy as you wish, my question is the same: could it be that information from outside the black hole reaches just this external radius, accumulating on this surface, rather than entering INSIDE the radius? If inside the radius there is no information, this explains why the entropy of the black hole is proportional to the area of the horizon, and not to the volume.
  • asked a question related to Information Theory
Question
3 answers
"Claims of experience" are autobiographical and semiotic. Yet they offer glimpses into the unique world of the individual. In a sense an empathic window into the "soul" of the experience. How does one sift out the exaggerations and elicit the "truth" when what is the truth is not even certain if the experience is reported for the first time?
Relevant answer
Answer
Samy Azer highlights a key element in the design phase - the inclusion of discussion. This is critical to promote scrutiny and consensus around trustworthiness.
  • asked a question related to Information Theory
Question
3 answers
For compressive sensing (CS), we can use fewer measurements, M, to reconstruct an original N-dimensional signal, where M << N and the measurement matrix satisfies the restricted isometry property (RIP). Can we combine the concept of entropy in information theory with CS? Intuitively speaking, the data are successfully reconstructed, so information is not lost before and after doing CS. Can we claim that the entropy of the compressed measurements is equal to the entropy of the original signal, because entropy stands for the information contained?
To understand my problem more easily, I give an example below:
Suppose that in a data-gathering wireless sensor network we deploy N machines in the area. To quantify the amount of information collected by each machine, we assume a Gaussian source field, where the collection of data gathered by all machines is assumed to follow a multivariate Gaussian distribution ~N(\mu, \Sigma). The joint entropy of all data is H(X). Now we can use M measurements to reconstruct these data by CS. The joint entropy of these M data is H(Y). Can we say that H(X) will equal H(Y)?
Thanks for your response.
Relevant answer
Answer
I will approach the issue from another point of view. Assume N transmitters in an area. Not all N transmitters will be active at the same time. If we assume that M transmitters are active in a time interval T, then one needs at least M samples to detect the active transmitters. More samples will be redundant and will only help to confirm the active sources.
This is from a physical point of view.
In principle you do not need more than the M samples to predict the state of the N senders, as N - M of them are off. Compressive sensing is based on this fact.
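A hedged Python sketch of this "M measurements suffice to identify the few active senders" point, using orthogonal matching pursuit from scikit-learn as the recovery algorithm (the sizes, sparsity level and Gaussian measurement matrix are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, k = 200, 60, 5                      # senders, measurements, active senders (invented)

x = np.zeros(N)                           # k-sparse signal: only k senders are active
support = rng.choice(N, size=k, replace=False)
x[support] = rng.normal(size=k)

A = rng.normal(size=(M, N)) / np.sqrt(M)  # random Gaussian measurement matrix (RIP with high probability)
y = A @ x                                 # M << N compressed measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(A, y)
x_hat = omp.coef_

print("relative reconstruction error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```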
Best wishes
  • asked a question related to Information Theory
Question
9 answers
In the question, Why is entropy a concept difficult to understand? (November 2019) Franklin Uriel Parás Hernández commences his reply as follows: "The first thing we have to understand is that there are many Entropies in nature."
His entire answer is worth reading.
It leads to this related question. I suspect the answer is yes: the common principle is degrees of freedom and dimensional capacity. Your views?
Relevant answer
Answer
Of course, I have no satisfactory answer to all the questions posed above.
I attach two papers: one deals with the connection between phenomenological non-equilibrium thermodynamics and stochastic thermodynamics; the second claims that entropy is not uniquely defined, or is even replaceable by better-defined quantities. But up to now the question remains open
whether there is a common background to all these different definitions of "entropy".
  • asked a question related to Information Theory
Question
3 answers
While studying information theory, we do not consider any directionality of the channel.
There will be no change if the receiver and the transmitter are interchanged (i.e. Lorentz reciprocity is obeyed).
However, suppose the channel is a non-reciprocal device, such as an isolator or a Faraday rotator, rather than a simple transmission cable. What are the consequences for information theory?
What would be the consequences for Shannon entropy and for theorems such as the Shannon coding theorem, the Shannon-Hartley theorem, etc.? I have been googling terms like non-reciprocal networks, but I have not been able to find anything. Any help will be appreciated.
Relevant answer
Dear Chetan,
welcome,
You have touched on an important point, which is channel reciprocity.
The Shannon channel capacity does not require reciprocity. It deals with a communication medium that carries an information signal from a source to a destination, and it gives the limit on the transmission speed in bits per second, which is called the channel capacity. So it concerns the rate of data transmission in one direction. Such a transmission mode is called simplex.
There is also half duplex, which carries transmission in both directions but in different time slots. This is related to the physical ability of the channel.
So, reciprocity is an additional property of the channel, independent of the channel capacity, which is defined only in one direction.
There is also full duplex; in this case one uses two channels, one for the forward and one for the backward direction.
This is my opinion about the two properties: the channel capacity and reciprocity.
Best wishes
  • asked a question related to Information Theory
Question
4 answers
We very frequently use cross-entropy loss in neural networks. Cross-entropy originally came from information theory about entropy and KL divergence.
My question is that.. if I want to design a new objective function, does it always need to satisfy information theory?
For example, in my objective function, I want to add a probability measure of something, say A, to cross-entropy loss. A ranges from 0 to 1. So, the objective function will look like this:
= A + (cross-entropy between actual and prediction)
= A + (-(actual)*log(prediction))
Say the above objective function works well for neural networks but violates information theory, in the sense that we are adding a probability value, A, to a loss value, namely the cross-entropy: (-(actual)*log(prediction)).
So, my question is: even if it violates loss evaluation from the viewpoint of information theory, is it acceptable as an objective function for neural networks?
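Purely as an illustration of the objective proposed in the question (A and the example predictions are placeholders; nothing here requires an information-theoretic interpretation, only that the quantity is differentiable and decreases for better models), a minimal Python sketch:

```python
import numpy as np

def custom_objective(y_true, y_pred, A, eps=1e-12):
    """A + cross-entropy, as proposed in the question.
    y_true: one-hot targets; y_pred: predicted probabilities; A: extra probability-valued term in [0, 1]."""
    ce = -np.sum(y_true * np.log(y_pred + eps), axis=1).mean()   # standard cross-entropy
    return A + ce                                                # A simply shifts or penalises the loss

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(custom_objective(y_true, y_pred, A=0.3))
```

Note that if A does not depend on the model parameters it only shifts the loss and leaves the gradients unchanged; if it does depend on them, it acts as an additional regularisation-like term.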
Relevant answer
Answer
Dear Md Sazzad Hossain,
From my experience, the objective function should be defined taking into account the formulation of the problem to be solved and, if necessary, adapted to the mathematical conditions.
For more details and information about this subject, I suggest you see the links on this topic.
Best regards
  • asked a question related to Information Theory
Question
2 answers
When carrier aggregation and cross-carrier scheduling are applied in an LTE-Advanced system, a UE may support multiple Component Carriers (CCs), and control information on one CC can allocate radio resources on another CC. The search spaces of all CCs and the control information are transmitted only on a chosen CC. In this case, if the search spaces of the different CCs are not properly defined, a high blocking probability of control information will be very harmful to system performance.
My question is: what is the cause of this blocking? Is it a deficiency of control channel elements for the served, scheduled UEs, or something else?
My guess is that it is not, but I have no proof of this. Can any expert help?
For now, I assume either self-overlapping or high mutual overlapping of the UEs' search spaces as the likely cause of blocking.
Relevant answer
Answer
The main factors behind blocking users are the available CCEs and the scheduler design that fits users onto them. If the hash function gives indices which are already occupied, then the user cannot be scheduled and is therefore blocked. I hope this point makes sense as an explanation of the reason for blocking users. Kindly refer to this to understand it clearly.
  • asked a question related to Information Theory
Question
5 answers
Hi
I have published this paper recently
In that paper, we did an abstracted simulation to get an initial result. Now, I need to do a detailed simulation in a network simulator.
So, I need a network simulator that implements or supports MQTT-SN, or some implementation of MQTT-SN that would work in a network simulator.
Any hints please?
Relevant answer
Answer
Hello,
Any network simulator, e.g., Netsim, ns2 or any IoT simulator.
  • asked a question related to Information Theory
Question
5 answers
Goal of the theory:
Informational efficiency is a natural consequence of competition, relatively free entry, and low costs of information. If there is a signal, not incorporated in market prices, that future values will be high, competitive traders will buy on that signal. In doing so, they bid the price up until it fully reflects the information in the signal.
Relevant answer
Answer
Timira Shukla Thanks. But inbound is a method of attracting, engaging, and delighting people to grow a business that provides value and builds trust. As technology shifts, inbound guides an approach to doing business in a human and helpful way. Inbound is a better way to market, a better way to sell, and a better way to serve your customers. Because when good-for-the-customer means good-for-the-business, your company can grow better over the long term.
But what strategy is best suited for providing timely information?
  • asked a question related to Information Theory
Question
21 answers
Free access to information should prevail on the Internet.
This is the main factor in the dynamic development of many websites, new internet services, the growth of users of social media portals and many other types of websites.
In my opinion, all information that can be publicly disseminated should be available on the Internet without restrictions, universally and free of charge.
Please, answer, comments.
I invite you to the discussion.
Best wishes
Relevant answer
Answer
The possibility of publishing certain content (texts, banners, comments, etc.) on the Internet and of gathering information free of charge are the key determinants of the development of information services on the Internet. On the other hand, the largest online technology corporations receive revenues mainly from paid marketing services. The Internet environment is therefore a kind of mix of free and paid information and marketing services, which are developed simultaneously and in a mutually connected way by various Internet companies.
Best wishes
  • asked a question related to Information Theory
Question
4 answers
According to Shannon, in classical information theory H(X) ≥ H(f(X)) for the entropy H(X) of some random variable X and an unknown (deterministic) function f. Is it possible that an observer who does not know the function (which produces statistically random data) can take the output of the function and consider it random (entropy)? Additionally, if one were to use entropy as a measure between two states, what would that 'measure' be between the statistically random output and the original pool of randomness?
Relevant answer
Answer
@Nader Chmait
I imagine the approximate entropy notion may apply as an exercise of calculation. I am not sure the entropy and the approximate entropy reflect the same meaning.
The approximate entropy seems closer to the statistics, but Shannon entropy is closer to information content.
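For reference, the inequality the question appears to appeal to is the standard one for deterministic functions of a random variable; a sketch of it, in plain notation, is:
H(f(X)) <= H(X), with equality if and only if f is injective on the support of X.
This follows because Y = f(X) is fully determined by X, so H(Y|X) = 0 and H(X) = H(X, f(X)) = H(f(X)) + H(X|f(X)) >= H(f(X)). In other words, the output of a deterministic function can never carry more entropy than its input, although an observer who does not know f may assign it a higher apparent entropy than it truly has.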
  • asked a question related to Information Theory
Question
4 answers
Hello, for my research paper I need to select researchers who have written journal papers/articles about how they "see" a single person in Informatology or Information Science.
It is connected with my MA thesis, so answers to this question could help me with my choices. I appreciate every answer!
Relevant answer
Answer
Hi,
Maybe this paper could help:
Tang, R., & Solomon, P. (1998). Toward an understanding of the dynamics of relevance judgment: An analysis of one person's search behavior. Information Processing & Management, 34(2-3), 237-256.
  • asked a question related to Information Theory
Question
7 answers
Hi Francis
Greetings from India
Do you use information theory in your work?
What is the framework you are using for integrating the two?
Thanks in advance
Safeer
Relevant answer
Answer
Yes, I use it in my work.
  • asked a question related to Information Theory
Question
8 answers
Do we lose information when we project a manifold?
For example, do we lose information about a manifold such as the Earth (globe) when we project it onto a chart in a book (using, say, the stereographic, Mercator, or any other projection)?
Similarly, we should be losing information when we create a Bloch sphere for a two-state system in quantum mechanics, which is also a space projected from a higher dimension, i.e. 4 dimensions.
Also, is there a way to quantify this information loss, if there is any?
Relevant answer
Answer
When we project the Earth onto a chart in a book, if the transformation is a diffeomorphism we obtain a faithful (smaller) copy of the Earth; some details may not be clearly visible in the chart, but this doesn't mean they are lost.
  • asked a question related to Information Theory
Question
11 answers
Apparently, in some countries, banks holding large collections of information on the achievements of human civilization, gathered on digital data carriers, are being founded, usually somewhere underground, in specially created bunkers capable of surviving climatic disasters and other calamities.
These are properly secured Big Data database systems, data warehouses and underground information banks, recorded digitally.
The underground bunkers themselves may survive various climatic and other calamities for perhaps hundreds or thousands of years.
But how long will the large collections of information survive in these Big Data systems and data warehouses stored on digital media?
Perhaps a better solution would be to record this data analogically on specially created discs?
Already in the 1970s, a certain amount of data concerning the achievements of human civilization was placed on the Pioneer 10 probe sent into space, which recently left the solar system and will spend roughly the next 10,000 years carrying the information about human civilization towards the Alpha Centauri constellation.
At that time, the data about the achievements of human civilization sent into the Universe were recorded on gold discs.
Is there currently a better form of data storage, given that this data should last for thousands of years?
Please reply
Best wishes
Relevant answer
Answer
Theoretically, thousands of years, unless unexpected disasters occur...
  • asked a question related to Information Theory
Question
5 answers
Given that:
1) Alice and Bob have access to a common source of randomness,
2) Bob's random values are displaced by some (nonlinear) function, i.e. B_rand = F(A_rand).
Are there protocols which allow the two to securely agree on the function (or its inverse) without
revealing any information about it?
Relevant answer
Answer
Basically, there are three main steps for secure key generation based on physical-layer properties. These are: 1) randomness extraction, 2) reconciliation, and 3) privacy amplification.
We usually refer to key agreement as the reconciliation step, where error-correction techniques such as LDPC codes or the Cascade protocol can be used.
Because some information leaks during the reconciliation step, privacy amplification can then be applied, by means of functions such as a universal hash function.
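As a rough illustration of the privacy-amplification step mentioned above, here is a minimal sketch (not a production scheme): both parties apply the same publicly seeded Toeplitz universal hash to their reconciled key, compressing it so that the eavesdropper's partial knowledge is diluted. The key lengths, the seed handling, and the use of NumPy are illustrative assumptions.
import numpy as np

def toeplitz_hash(key_bits, seed_bits, out_len):
    """Compress key_bits (length n) to out_len bits with a Toeplitz matrix
    built from seed_bits (length n + out_len - 1); arithmetic is mod 2."""
    n = len(key_bits)
    assert len(seed_bits) == n + out_len - 1
    T = np.empty((out_len, n), dtype=np.uint8)
    for i in range(out_len):
        # T[i, j] = seed_bits[i - j + n - 1]: constant along diagonals (Toeplitz).
        T[i, :] = seed_bits[i:i + n][::-1]
    return (T.astype(int) @ np.asarray(key_bits, dtype=int)) % 2

rng = np.random.default_rng(42)
n, m = 256, 128                      # reconciled and final key lengths (illustrative)
reconciled_key = rng.integers(0, 2, n, dtype=np.uint8)
public_seed = rng.integers(0, 2, n + m - 1, dtype=np.uint8)  # may be sent in the clear

final_key = toeplitz_hash(reconciled_key, public_seed, m)
print(final_key[:16])
How much to compress (the choice of m) would in practice be set by the estimated information leakage during reconciliation.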
  • asked a question related to Information Theory
Question
15 answers
Black holes cause event horizons, depending on the mass compressed into a narrow space (point?). From this analogy, could the quantity (mass?) of information in a narrow space lead to an insight horizon, which is why we cannot see into it from the outside and therefore no 100 percent simulation of a real system filled with a lot of information can succeed?
The more factors we use to model a system, the closer we get to reality (e.g. ecosystems) - but this process is asymptotic (reality is asymptotically approximated with every additional correct factor). Interestingly, it also seems to us that an object red-shifts into infinity when it approaches a black hole (also asymptotically).
Can we learn anything from this analogy? And if so, what?
Relevant answer
Answer
AK: Experiments on Bell inequalities have been done only on highly local experimental setups (to my knowledge merely couple hundred kilometers).
As long as the time between the measurements is less than the light travel time between the locations, that's enough.
AK: As long as one may assume that universe unfolds some sort of "cosmic wide Schrödinger equation" where all waves interlace, one can not be certain there are no internal "hidden variables", as randomness might be apparent.
If the measurements were determined by hidden variables carrying a random decision made at the time of creation of the entanglement, the statistics would be different from those for a random decision made at the time of measurement. That was the remarkable originality of Aspect's experiments, and all the tests have favoured the latter, to the point where I think it has been accepted that hidden variables are comprehensively ruled out.
Whether that exclusively implies non-determinism is a more complex topic and perhaps more dependent on your preferred interpretation. I'm not fully convinced yet but it looks that way.
  • asked a question related to Information Theory
Question
16 answers
Suppose we deal with a dataset with different kinds of attributes (numeric and nominal) and a binary class. How can we find a single number as the Shannon entropy of this dataset (as an approximation of the Kolmogorov complexity of this dataset)?
Relevant answer
Answer
Can we use Shannon Entropy to measure 'consensus' of the various statements/questions in a Delphi study?
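On the original question, one common heuristic (a sketch, not the only possible answer) is to discretize the numeric attributes, treat every row as a symbol, and compute the empirical Shannon entropy of the resulting joint distribution. NumPy and pandas are assumed, and the column names and binning choice below are purely illustrative.
import numpy as np
import pandas as pd

def dataset_entropy(df, n_bins=10):
    """Empirical Shannon entropy (bits) of a mixed numeric/nominal dataset,
    obtained by binning numeric columns and counting distinct rows."""
    coded = df.copy()
    for col in coded.columns:
        if pd.api.types.is_numeric_dtype(coded[col]):
            coded[col] = pd.cut(coded[col], bins=n_bins, labels=False)  # discretize
        else:
            coded[col] = coded[col].astype("category").cat.codes        # encode nominal
    counts = coded.value_counts()          # frequency of each distinct (binned) row
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Illustrative toy data (column names are hypothetical)
toy = pd.DataFrame({
    "age": [23, 45, 31, 52, 23, 41],
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "label": [0, 1, 0, 1, 0, 1],
})
print("empirical entropy:", round(dataset_entropy(toy), 3), "bits")
Note that with few rows the estimate saturates near log2(number of rows), which is one reason this number is only a rough stand-in for the Kolmogorov complexity mentioned in the question.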
  • asked a question related to Information Theory
Question
2 answers
Information is a cornerstone in shaping mathematical models of expectations, both rational and adaptive. At the same time, the arrival of intelligent information is the basis for building models of Economic Intelligence (EI) in its American, French and Japanese forms.
Relevant answer
Thank you very much for your useful scientific efforts
  • asked a question related to Information Theory
Question
28 answers
Traditional media in many countries is controlled by large media corporations and partly also by governments of individual countries.
In contrast, on the Internet, in addition to typical news portals, new media are developing, including social media portals and independent internet forums, where citizens unconnected with corporations and large media companies have the opportunity to publicly express themselves and make their views public.
In this way, the Internet enables the potential increase of the level of objectivity and independence of the media.
Thanks to this, the issue of information dissemination and exchange of views, debates with citizens can be of a more social, civic and objective nature.
In view of the above, I am asking you to answer the following question: Does the development of the Internet in your country increase the level of objectivity and independence of the media?
Please, answer, comments. I invite you to the discussion.
Relevant answer
Answer
In the case of the Philippines, there exists state-sponsored perpetration of disinformation. The government uses social media influencers to sway the opinion of uninformed Internet users. Worse, they create troll accounts to manufacture support for the administration.
  • asked a question related to Information Theory
Question
6 answers
** Given:
a) Ground truth clusters for a data,
b) Clusters obtained using a clustering algorithm (eg: DBSCAN) when applied on the data after processing it .
** Issue:
How can we evaluate the performance of the clustering technique when applied to a specific dataset?
** NMI (Normalized Mutual Information) is a popular external measure for doing so. But in cases like the one below, it gives misleading results:
E.g:
Ground_truth = [1,1,1,1,1] ;
DBSCAN_Clusters = [1,1,1,1,2];
nmi = normalized_mutual_info_score(Ground_truth, DBSCAN_Clusters)  # Python code
** The value of the variable "nmi" is approximately equal to zero in this case.
** Here, note that nmi ≈ 0 in spite of the fact that DBSCAN (the clustering algorithm) has misassigned only one cluster member while the remaining four match the ground truth.
** This is a typical case when the ground truth contains only one cluster.
** Questions :
1) Why does this happen?
2) Does it mean that clustering algorithm is performing bad?
3) Should I use other measures along with NMI ? If so which ones, and what are they for?
Thanks.
Relevant answer
Answer
I think you can add other measures to be able to discuss the obtained result, such as the error rate and the F-measure; and if you are working with density-based clustering you can add internal evaluation metrics such as the CDbw index and the DBCV index.
Hoping this helps with your question.
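To illustrate why several measures should be reported together, here is a small sketch using the example labels from the question (scikit-learn is assumed; purity is just one convenient complementary measure, and the F-measure suggested above could be added in the same way):
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

def purity(labels_true, labels_pred):
    """Fraction of points assigned to the majority true class of their predicted cluster."""
    cm = contingency_matrix(labels_true, labels_pred)
    return cm.max(axis=0).sum() / cm.sum()

# NMI collapses to ~0 here because the ground truth has a single cluster
# (its entropy is zero), not because the clustering is poor overall.
print("NMI    :", normalized_mutual_info_score(ground_truth, dbscan_clusters))
print("ARI    :", adjusted_rand_score(ground_truth, dbscan_clusters))
print("Purity :", purity(ground_truth, dbscan_clusters))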
  • asked a question related to Information Theory
Question
6 answers
I would like to know how to calculate the entropy of a binary word (I can have words of different sizes: 8, 16, 32, 400 bits). I know about Shannon entropy, but it is defined for a distribution over a set of outcomes, not for an individual word.
Relevant answer
Answer
Dear Camps,
You can calculate letter-level mean Shannon entropy either independently of the sequence or depending on it. The sequence-independent mean entropy can be calculated as Sh = SUM[-(p_i)·log2(p_i)], where the probability p_i of each i-th letter is determined from the frequency of that letter in the text (genome, message, book, etc.). For sequence-dependent entropy or graph entropy (a sequence is a linear graph) you can use a Markov chain approach to calculate the probabilities. Together with Prof. Cristian R. Munteanu we have published on this and released the software S2SNet to do both kinds of calculations. Please send me an email if you are further interested in it. See some refs:
1. Munteanu CR, Magalhães AL, Uriarte E, González-Díaz H. Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices. J Theor Biol. 2009;257(2):303-311. doi: 10.1016/j.jtbi.2008.11.017. PMID: 19111559.
2. Munteanu CR, González-Díaz H, Borges F, de Magalhães AL. Natural/random protein classification models based on star network topological indices. J Theor Biol. 2008;254(4):775-783. doi: 10.1016/j.jtbi.2008.07.018. PMID: 18692072.
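As a concrete illustration of the sequence-independent calculation described above, here is a minimal sketch that treats a single binary word as an empirical distribution over its own symbols, which is one common way to attach an entropy-like number to an individual word; the block length is an illustrative parameter.
from collections import Counter
from math import log2

def empirical_entropy(word, block=1):
    """Mean Shannon entropy (bits per block) of a binary word, estimated from
    the empirical frequencies of its non-overlapping blocks of `block` bits."""
    blocks = [word[i:i + block] for i in range(0, len(word) - block + 1, block)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(empirical_entropy("10101010"))                    # ~1.0 bit/symbol: balanced 0s and 1s
print(empirical_entropy("11111111"))                    # 0.0: a constant word carries no surprise
print(empirical_entropy("1101001010110100", block=2))   # pair (di-gram) level estimate
An estimate like this says nothing about the word's algorithmic randomness; for that, compression-based or Markov/graph approaches such as those mentioned above are more appropriate.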
  • asked a question related to Information Theory
Question
13 answers
Can we update the Turing test? It is about time. The Turing test, created in 1950, aims to differentiate humans from robots -- but we cannot, using that test. Bots can easily beat a human in chess, Go, image recognition, voice calls, or, it seems, any test. We can no longer use the Turing test; we are not exceptional.
The relevant aspect of "playing better chess" is that chess is a model of a conversation, a give and take. It is unsettling that people have difficulty accepting it; it is not just good performance in a conversation. Should a human find it "normal" that computers can frequently pass as a colleague, and not wonder about the intelligence of that colleague ... or smile? The Turing test has also become an intelligence test, and humans are using bots to beat humans, easily. This is another reason, in ethics, to deprecate this tool and look deeper.
Relevant answer
Answer
The relevant aspect of "playing better chess" is that chess is a model of a conversation, a give and take. It is unsettling that people have difficulty accepting it; it is not just good performance in a conversation. Should a human find it "normal" that computers can frequently pass as a colleague, and not wonder about the intelligence of that colleague ... or smile? The Turing test has also become an intelligence test, and humans are using bots to beat humans, easily. This is another reason, in ethics, to deprecate this tool and look deeper.
  • asked a question related to Information Theory
Question
38 answers
Consciousness defies definition. We need to understand it, and we need a metric to measure it. Can trust provide both, even if in a limited fashion?
Relevant answer
Answer
That Polanyi theorized some matters in error does not invalidate all of his effort.
In addition, we now know that free market economies, if they ever exist, are not wholly self-adjusting with respect to externalities, since free market economies are not closed systems in reality. That does not imply a single-authority system, so perhaps Polanyi should not be labelled as a binary thinker in this case.
It strikes me that Polanyi must certainly have meant truth in a fashion not corresponding to the Ed Gerck notion of truth being accessible to an AI.
I don't believe that successful chess-playing programs qualify as AI, although they do demonstrate that intelligence is not required to play chess successfully. I don't know about the more recent demonstration of Go mastery, but I suspect that it does not require an AI either. I concede the achievement of successful heuristic approaches and the computing power available to apply them beyond the capacities of human opponents.
I love playing computer adventure games of the highly-animated, cinematic form. My favorites are termed third-person shooters because one can observe and operate a character without being trapped behind the character's eyes. I am currently working through "Shadow of the Tomb Raider," a great demonstration of the genre. That the operation of non-player characters and other entities that appear to exhibit agency is sometimes claimed to be evidence of AI is not much evidence for that claim, whatever its appeal in popular culture.
  • asked a question related to Information Theory
Question
3 answers
Hello all, I need help please.
How can I create a function for the Rényi and Shannon entropy formulas?
RENYI = (1/(1-alpha)) .* log2(sum(q.^alpha))
SHANNON = -sum(q .* log2(q))
thanks in advance 
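For what it is worth, here is a minimal sketch of both formulas as functions, written in Python rather than MATLAB and assuming q is a normalized probability vector; the Rényi entropy reduces to the Shannon entropy as alpha approaches 1, so that limit is handled explicitly.
import numpy as np

def shannon_entropy(q):
    """H(q) = -sum(q_i * log2(q_i)), ignoring zero-probability entries."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

def renyi_entropy(q, alpha):
    """H_alpha(q) = (1/(1-alpha)) * log2(sum(q_i ** alpha)), for alpha > 0.
    As alpha -> 1 this converges to the Shannon entropy, so we special-case it."""
    if np.isclose(alpha, 1.0):
        return shannon_entropy(q)
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return float(np.log2(np.sum(q ** alpha)) / (1.0 - alpha))

q = np.array([0.5, 0.25, 0.125, 0.125])
print(shannon_entropy(q))        # 1.75 bits
print(renyi_entropy(q, 2.0))     # collision (Rényi-2) entropy
print(renyi_entropy(q, 0.999))   # close to the Shannon value
If you prefer MATLAB, the same precedence caveat applies: the prefactor must be written 1/(1-alpha), not 1/1-alpha.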
  • asked a question related to Information Theory
Question
2 answers
Hi
I have a question regarding high-rate LDPC codes constructions. My research field is not coding but somewhere in my research I need to find an explicit way of constructing LDPC codes with high rate with girth 6 or 8. I think high-rate is not an important factor in coding but in compressed sensing it is of great importance since it provides the main assumption of compressed sensing.
I would like to know whether Cyclic or Quasi-cyclic LDPC codes of girth 6 or 8 can provide high-rate or not? Any suggestion is appreciated!
Thanks
Mahsa
Relevant answer
Answer
Thanks! It was helpful!
  • asked a question related to Information Theory
Question
24 answers
I realize true "information" or a reduction in entropy/uncertainty must be something not already known, but I am asking this question in a way where I want to ignore that facet.
Similarly, is it possible for absolutely zero information to exist in any hypothetical situation, besides the one where one already knows this information?
Relevant answer
Answer
Dear Nathan,
your answer notes that a reduction of Information to the Human Being makes no sense.
I understand what you mean. You are right, but if you read more of my publications you will find that the Human Being is the "first representative" of all living objects and subjects. The critical border is the range of abilities for mental activities (e.g. understanding, conscious recognition, speaking, acting creatively, reasoning and thinking about single cases). The criterion is given by biological evolution, which gave those abilities to living organisms in different ways. All of them are of course included in treating Information, but in a limited way, according to their state of mental evolution.
  • asked a question related to Information Theory
Question
4 answers
As this Wikipedia article (https://en.wikipedia.org/wiki/Probability_density_function#Link_between_discrete_and_continuous_distributions) describes, we can define probability density functions (pdfs) for discrete random variables using Dirac delta functions, which is called "generalized pdf".
That Wikipedia page has given a general equation for the (generalized) pdf of a discrete random variable which can take N different values among real numbers. The parameters of the distribution are {p_i} for i=1 to N.
I tried to calculate the entropy (differential entropy) of this pdf, and I obtained minus infinity by solving the integral (actually because delta(x)*log(delta(x)) terms appeared in the integral). However, this result does not seem intuitive to me, since the answer is independent of the parameters N and {p_i}. In general, we expect that the entropy gets higher as the distribution gets closer to uniform.
Have I made a mistake in solving the integral? Or the entropy is really independent of distribution parameters in this case?
Relevant answer
Answer
You are right Reginald, but this question is a part of a more general problem that I am facing in my research. In fact, I have to find the entropy of "empirical distributions" of "continuous random variables", which are represented by generalized functions (i.e. using delta). I can not simply replace these functions by discrete random variables.
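For what it is worth, the minus-infinity result in the original question is not a calculation error. A quick way to see it (a sketch, assuming well-separated atoms) is to smooth each delta into a Gaussian of width sigma and let sigma shrink:
f_sigma(x) = SUM_i p_i · N(x; x_i, sigma^2),
h(f_sigma) ≈ -SUM_i p_i·log(p_i) + log(sigma·sqrt(2·pi·e))  ->  -infinity as sigma -> 0.
The divergence comes entirely from the scale term log(sigma·sqrt(2·pi·e)), which is why the limit is independent of N and of the p_i, exactly as observed; the parameter-dependent part survives only as the finite discrete entropy sitting on top of the divergent offset.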
  • asked a question related to Information Theory
Question
11 answers
These 'entropies' depend upon a parameter, which can be varied between two limits. In those limits they reduce to the Shannon-Gibbs and Hartley-Boltzmann entropies. If such entropies did exist, they could be derived from the maximum-entropy formalism, where the Lagrange multiplier would be identified as the parameter. Then, like all the other Lagrange multipliers, the parameter would have to be given a thermodynamic interpretation as an intensive variable which would be uniform and common to all systems, like the temperature and chemical potential. The Rényi and Havrda-Charvát entropies cannot be derived from the maximum-entropy formalism. Thus, there can be no entropy that is parameter dependent and whose parameter would be different for different systems.
Relevant answer
Answer
What if we have several parameters? Then the situation can be described with fuzzy Shannon entropy ( see my paper in Journal of Physics & Astronomy, 2016- Approach with different entropies...)
  • asked a question related to Information Theory
Question
6 answers
Mark Srednicki has claimed to demonstrate the entropy ~ area law -- https://arxiv.org/pdf/hep-th/9303048.pdf
Does anyone know of an independent verification or another demonstration of this result?
Is there a proof of this law?
Relevant answer
Answer
An argument which depends on the assumption that every qubit of information, [1,0] or [0,1], can occupy one and only one 'box' on the horizon's area goes as follows. Since the sum of the boxes must equal the area, we have N = A, where N is the number of qubits. We calculate the number of ways in which we can arrange the qubits on the horizon as the sum of all possible combinations of qubit configurations, W(N) = Σ_{k=0..N} N!/[(N-k)!k!]. This sum equals 2^N, which suggests that we could simply put it this way: each qubit has two representations, so for N qubits there are 2^N ways to arrange the collection. Since, according to the Boltzmann principle, S = log[W], we have S = log[2^N] = N log 2, or S ∝ N = A.
  • asked a question related to Information Theory
Question
3 answers
Here's how information theory has been helping us analyze real customer data sets across different domains:
a) One of the basic ideas of information theory is that the meaning and nature of the data itself do not matter in terms of how much information it contains. Shannon states in his famous paper "A Mathematical Theory of Communication" (1948) that "the semantic aspects of communication are irrelevant to the engineering problem". This enables us to construct our analytical approach around informational measures (Shannon entropy and mutual information, for example) and have it be domain- and data-agnostic.
b) There has been interesting work on using the "information bottleneck" concept to open up the "deep neural net black box".
Original paper here: https://arxiv.org/pdf/1703.00810... I also recommend this very well written blog post. https://blog.acolyer.org/2017/11...
Our technology uses a variant approach not only to "autonomously" diagnose our models but to improve their quality and efficiency and subject them to “noise-testing” using these “very generic” measures.
c) Using informational measures for analysis frees us from some of the assumptions that are made in conventional machine learning. We don't assume the data to have properties such as independence, or that some known probability distribution fits the data. Here's an article describing some of the practical risks of those assumptions: https://www.edge.org/response-de...
Our experiments to predict rare events (high-sigma or "black swan") with this approach have shown very impressive results.
Conclusion:
Information theory concepts can contribute immensely to machine learning in practice (we have quite a few case studies and success stories of customers benefiting from our platform), and I believe they will provide an even more significant foundation for predictive science as we run into harder problems in this space.
Relevant answer
Answer
Interesting...
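As a small illustration of the domain-agnostic informational measures described in point (a) of the question, here is a sketch that scores features against a target purely from their empirical distributions (scikit-learn is assumed; the synthetic data and feature names are made up for illustration).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000
signal = rng.integers(0, 2, n)                       # hypothetical binary target
informative = signal + rng.normal(0, 0.3, n)         # feature correlated with the target
noise = rng.normal(0, 1.0, n)                        # feature carrying no information

X = np.column_stack([informative, noise])
mi = mutual_info_classif(X, signal, random_state=0)  # estimated mutual information (nats)

# The measure does not care what the columns "mean": it only reflects how much
# knowing each feature reduces uncertainty about the target.
for name, value in zip(["informative", "noise"], mi):
    print(name, "MI ~", round(value, 3))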
  • asked a question related to Information Theory
Question
9 answers
thanks
Relevant answer
Answer
Thank you very much.
Now I get it perfectly.
  • asked a question related to Information Theory
Question
3 answers
For example, suppose I have an observation network consisting of 10,000 gauges. It is easy to calculate the joint (Shannon) entropy of these 10,000 gauges, if these points are all present at the same time.
Now suppose each single gauge within this network has a very low probability of presence (say p = 0.01), which means that at each time we have a network consisting of about 100 gauges, and the IDs of these 100 gauges change from time to time. If I know the probability of presence (p) of each of these 10,000 gauges, how would I calculate the (expected) joint entropy of this STOCHASTIC network?
Relevant answer
Answer
Just a clarifying question for you. When you refer to "presence" are you referring to the existence of an observation at a location for a specific moment in time (e.g. the stage reading of a stream)? Or are you actually referring to the presence of some physical object (e.g. a species of fish) or a phenomena (e.g. a lightning strike) at a particular moment in time?
The distinction here being that in the case of the existence of an observation, one could infer that what was observed on some occasions could have been potentially observed on the other occasions, if only someone had bothered to make the observation (e.g. the stream had a stage, but it was unobserved/unrecorded). In the case of actual presence, one might infer that the species of fish was absent from the location that month, or that a lightning strike did not occur on that day.
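If "presence" turns out to mean a randomly active subset of gauges at each time step, one pragmatic way to approximate the expected joint entropy is Monte Carlo sampling over subsets, assuming you already have a routine that returns the joint entropy of any given subset of gauges. The sketch below keeps that routine as a placeholder (a toy independent-gauge entropy) purely for illustration.
import numpy as np

def joint_entropy(subset_ids):
    """Placeholder: plug in your own joint-entropy estimator for a gauge subset.
    Toy assumption here: gauges are independent and contribute 1 bit each."""
    return float(len(subset_ids))  # bits

def expected_joint_entropy(n_gauges=10000, p_presence=0.01, n_samples=500, seed=0):
    """Monte Carlo estimate of E[H(active subset)] when each gauge is
    independently present with probability p_presence."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        present = np.nonzero(rng.random(n_gauges) < p_presence)[0]
        estimates.append(joint_entropy(present))
    return np.mean(estimates), np.std(estimates) / np.sqrt(n_samples)

mean_h, stderr = expected_joint_entropy()
print("expected joint entropy ~", round(mean_h, 1), "+/-", round(stderr, 1), "bits")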
  • asked a question related to Information Theory
Question
2 answers
I want to find the secrecy rate of the main channel in the case of a Gaussian broadcast channel. E.g., I have a single transmitter and a receiver, where an eavesdropper wants to overhear the communication over a secondary channel. What information rate keeps the eavesdropper ignorant of the sent message?
Relevant answer
Answer
Thanks Rob, but I am looking for a channel with log-normal fading. In the referenced paper they consider Rayleigh fading. I am interested in finding the average secrecy capacity instead of the outage probability of the secrecy capacity.
thanks
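For reference, the baseline result this thread builds on is the secrecy capacity of the degraded Gaussian wiretap channel (Leung-Yan-Cheong and Hellman), sketched here in plain notation with gamma_M and gamma_E the instantaneous SNRs of the main and eavesdropper links:
C_s = [ (1/2) log2(1 + gamma_M) - (1/2) log2(1 + gamma_E) ]^+ , where [x]^+ = max(x, 0)
(per real channel use; the 1/2 is dropped for complex baseband channels). The average (ergodic) secrecy capacity asked about in the follow-up is then commonly taken as the expectation of C_s over the joint distribution of gamma_M and gamma_E, which for log-normal fading would typically have to be evaluated numerically.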
  • asked a question related to Information Theory
Question
1 answer
How, when, and at what threshold does the seemingly enormous "transition" occur whereby a system considered to be "Conscious" becomes aware that it is "Conscious"? And is there even a quantitative, or perhaps qualitative, property required for it?
Relevant answer
Answer
Any form of being Conscious, since it is irreducible, must merely be part of a response to a phenomenon; it is a subset of that response, applicable, I suggest, to a loop within that response. It is integrated within consciousness through the specific irreducibility of consciousness. In this way the phenomenon carries the material for self-consciousness.
  • asked a question related to Information Theory
Question
23 answers
The successes of communications are more than evident, and information theory is one of the basic sources of that success. Nevertheless, it is frequently called non-constructive (some sources are listed below). If so, what prevents it from being constructive?
S. Haykin, Communication systems, vol. 1,  p. 24,
K.H. Rosen,
S. Roman, Coding and Information Theory, https://books.google.pl/books?isbn=0387978127, p. 97. 
The list can be continued.
Relevant answer
Answer
To Mrs Murthy: Thank you. You have found a very good brief survey prepared by a known specialist in feedback communications.
Please note that all the included results are stated in a highly mathematical form. In my opinion, this makes them too difficult for engineers to understand and complicates implementation.
To Dr Prabhat Sharma:
Dear Prof. Sibi Raj B. Pillai and Dr Prabhat Sharma,
Thank you for the well-argued answer. Your point of view is close to mine (I expressed it above).
Finalizing comments on the first part of the discussion: theoretical results of Inf. Theory a) establish upper bounds on system performance (e.g. Shannon's formula for capacity), b) prove the existence of optimal systems attaining or approaching these bounds (as the coding theorems do), c) determine the general conditions under which this is possible (also the coding theorems), and d) can be the solution of an optimization task determining the structure and parameters of the optimal system, as well as the way to design it (as in automatics or estimation theory).
Each of these results is constructive, but one can feel the difference.
Analytical derivation of optimal codes is impossible. All codes, including the excellent turbo and LDPC codes, are unique inventions. But we cannot say whether these codes are the best "for ever", whether they are best in all applications, how we should transmit them, and so on. Closeness to the Shannon limits does not solve everything. Moreover, these limits are scenario- and BER-dependent, and we do not know how to take this into account when transmitting the codes. These are small echoes of the not-absolute "constructiveness" of Inf. Theory, which could be the subject of another discussion, but we all keep working.
I think we may agree that Inf. Theory is constructive and that its "non-constructiveness" concerns only the impossibility of deriving optimal codes analytically, which does not prevent new codes from solving their tasks well. In my opinion, the term under discussion appeared under the influence of specialists in systems optimization.
So, I propose to close my first question and to look at the second one.
In expanded form: does Inf. Theory have tools that are "constructive" in all senses, i.e. does it permit obtaining results that determine the way to design optimal ("perfect") communication systems (CS)?
Intensive research carried out in the 1960s-1970s (see also my comment two answers above) unambiguously showed that this is possible. The main results were published in IEEE Trans. on Inf. Theory and other central journals. If that research had been successfully completed, we would today have, in addition to CS with coding, a "completely constructive" branch of Inf. Theory and a set of CS with feedback operating well in WSNs and other low-cost, low-energy applications. But they did not appear.
Moreover, in the mid-1970s the intensity of this research fell abruptly to a few papers per year or fewer. What was the reason?
Thanks once more to all readers and colleagues who took and take part in this discussion.
  • asked a question related to Information Theory
Question
3 answers
PubPeer:                                                                                    May 29, 2017
Unregistered Submission:
(May 25th, 2017 2:46 am UTC)
In this review the authors attempted to estimate the information generated by neural signals used in different Brain Machine Interface (BMI) studies to compare performances. It seems that the authors have neglected critical assumptions of the estimation technique they used, a mistake that, if confirmed, completely invalidates the results of the main point of their article, compromising their conclusions.
Figure 1 legend states that the bits per trial from 26 BMI studies were estimated using Wolpaw’s information transfer rate method (ITR), an approximation of Shannon’s full mutual information channel theory, with the following expression:
Bits/trial = log2N + P log2P + (1-P) log2[(1-P)/(N-1)]
where N is the number of possible choices (the number of targets in a center-out task as used by the authors) and P is the probability that the desired choice will be selected (used as percent of correct trials by the authors). The estimated bits per trial and bits per second of the 26 studies are shown in Table 1 and represented as histograms in Figure 1C and 1D respectively.
Wolpaw’s approximation used by the authors is valid only if several strict assumptions are true: i) BMI are memoryless and stable discrete transmission channels, ii) all the output commands are equally likely to be selected, iii) P is the same for all choices, and the error is equally distributed among all remaining choices (Wolpaw et al., 1998, Yuan et al, 2013; Thompson et al., 2014). The violation of the assumptions of Wolpaw’s approximation leads to incorrect ITR estimations (Yuan et al, 2013). Because BMI systems typically do not fulfill several of these assumptions, particularly those of uniform selection probability and uniform classification error distribution, researchers are encouraged to be careful in reporting ITR, especially when they are using ITR for comparisons between different BMI systems (Thompson et al. 2014). Yet, Tehovnik et al. 2013 failed in reporting whether the assumptions for Wolpaw’s approximation were true or not for the 26 studies they used. Such omission invalidates their estimations. Additionally, the inspection of the original studies reveals the authors failed at the fundamental aspect of understanding and interpreting the tasks used in some of them. This failure led to incorrect input values for their estimations in at least 2 studies.
The validity of the estimated bits/trial and bits/second presented in Figure 1 and Table 1 is crucial to the credibility of the main conclusions of the review. If these estimations are incorrect, as they seem to be, it would invalidate the main claim of the review, which is the low performance of BMI systems. It will also raise doubts on the remaining points argued by the authors, making their claims substantially weaker. Another review published by the same group (Tehovnik and Chen 2015), which used the estimations from the current one, would be also compromised in its conclusions. In summary, for this review to be considered, the authors must include the ways in which the analyzed BMI studies violate or not the ITR assumptions.
References
Tehovnik EJ, Woods LC, Slocum WM (2013) Transfer of information by BMI. Neuroscience 255:134–46.
Shannon C E and Weaver W (1964) The Mathematical Theory of Communication (Urbana, IL: University of Illinois Press).
Wolpaw J R, Ramoser H, McFarland DJ, Pfurtscheller G (1998) EEG-based communication: improved accuracy by response verification IEEE Trans. Rehabil. Eng. 6:326–33.
Thompson DE, Quitadamo LR, Mainardi L, Laghari KU, Gao S, Kindermans PJ, Simeral JD, Fazel-Rezai R, Matteucci M, Falk TH, Bianchi L, Chestek CA, Huggins JE (2014) Performance measurement for brain-computer or brain-machine interfaces: a tutorial. J. Neural Eng. 11(3):035001.
Yuan P, Gao X, Allison B, Wang Y, Bin G, Gao S (2013) A study of the existing problems of estimating the information transfer rate in online brain–computer interfaces.  J. Neural Eng. 10:026014.
Relevant answer
Answer
Fitts’ Law and Brain-machine Interfaces according to Willett et al. (2017):
Reaching movements typically obey Fitts' law: MT = a + b log2(D/R), where MT is movement time, D is target distance, R is target radius, and a & b are parameters. Fitts' law describes two properties that would be ideal for brain-machine interfaces (BMIs): (1) movement time is insensitive to the absolute scale of the task, since the time depends on the ratio D/R, and (2) movements have a large dynamic range of accuracy, since movement time is logarithmically proportional to D/R. Movement times for BMI (based on motor cortex electrophysiological recordings from two tetraplegics performing a center-out task) were better described by the formula MT = a + bD + cR^(-2), since movement time increased as the target radius became smaller, independent of target distance. The mismatch between reaching movements and BMI-generated movements was determined to be due to the signal-independent noise of the decoder for BMI, which makes targets below a certain size very difficult to acquire in a timely manner. This would reduce the information transfer rate for BMI when using small targets.
For the complete article see: Willett FR, Murphy BA, Memberg WD, Blabe CH, Pandarinath C, et al. (2017)  Signal-independent noise in intracortical brain-computer interfaces causes movement time properties inconsistent with Fitts’ law.  J. Neural Eng. 14:026010.
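For readers who want to reproduce the bits-per-trial numbers being debated above, here is a minimal sketch of Wolpaw's approximation exactly as written in the question; whether its assumptions (uniform selection probability, uniform error distribution, stable memoryless channel) hold for a given BMI study is, of course, the whole point of the critique.
from math import log2

def wolpaw_bits_per_trial(n_choices, p_correct):
    """Wolpaw's ITR approximation (bits/trial):
    log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)).
    Valid only under the uniform-probability / uniform-error assumptions
    discussed above."""
    n, p = n_choices, p_correct
    if p >= 1.0:
        return log2(n)          # perfect accuracy gives exactly log2(N) bits
    if p <= 0.0:
        return float("nan")     # degenerate case, left undefined here
    return log2(n) + p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))

# Example: an 8-target center-out task with 90% correct selections.
print(round(wolpaw_bits_per_trial(8, 0.90), 2), "bits/trial")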
  • asked a question related to Information Theory