
Probability - Science topic

Probability is the study of chance processes or the relative frequency characterizing a chance process.
Questions related to Probability
  • asked a question related to Probability
Question
3 answers
Suppose A is a set measurable in the Caratheodory sense such that, for some n in the integers, A is a subset of R^n, and f: A -> R is a function.
After reading the preliminary definitions in section 1.2 of the attachment where, e.g., a pre-structure is a sequence of sets whose union equals A and each term of the sequence has a positive uniform probability measure; how do we answer the following question in section 2?
Does there exist a unique extension (or a method constructively defining a unique extension) of the expected value of f when that value is finite, using the uniform probability measure on sets measurable in the Caratheodory sense, such that we replace f, when its expected value is infinite or undefined, with f defined on a chosen pre-structure depending on A, where:
  1. The expected value of f on each term of the pre-structure is finite
  2. The pre-structure converges uniformly to A
  3. The pre-structure converges uniformly to A at a linear or superlinear rate relative to that of other non-equivalent pre-structures of A which satisfy 1. and 2.
  4. The generalized expected value of f on the pre-structure (an extension of def. 3 to answer the full question) satisfies 1., 2., and 3. and is unique & finite.
  5. A choice function is defined that chooses a pre-structure from A that satisfies 1., 2., 3., and 4. for the largest possible subset of R^A.
  6. If there is more than one choice function that satisfies 1., 2., 3., 4. and 5., we choose the choice function with the "simplest form", meaning for a general pre-structure of A (see def. 2), when each choice function is fully expanded, we take the choice function with the fewest variables/numbers (excluding those with quantifiers).
How do we answer this question?
(See sections 3.1 & 3.3 in the attachment for an idea of what an answer would look like)
Edit: Made changes to section 3.5 (b) since it was nearly impossible to read. Hopefully, the new version is much easier to process.
Relevant answer
Answer
Einstein was also determined to answer questions he found worth pursuing. So continue studying, reading and writing, and sharpen your mind by studying published refereed papers as well; then, after maybe quite some time, you will really know whether what you are doing is worthwhile, and then knock on the door of a professor... and YOU should tackle your questions... Also, very important: when you write a research paper, introduce your problem carefully, in such a way that your paper triggers the minds of those who are reading it, and do not write in a terse style at the beginning.
Good luck, and take your time, as did Einstein and Gödel or Hilbert!
  • asked a question related to Probability
Question
4 answers
Hello!
I have a few questions about a TDDFT calculation that I ran ( # td b3lyp/6-31g(d,p) scrf=(iefpcm,solvent=chloroform) guess=mix ). When I calculate the % probability of some of the excitation states I am getting >100%. What I remember from statistics is that we cannot actually have >100% probability, so I am trying to figure out why that occurs in my data.
I calculated the % probability as 2*(coefficient^2). I have included the oscillator strength information from one of my data sets below.
"Excitation energies and oscillator strengths:
Excited State 1: 2.047-A 0.5492 eV 2257.37 nm f=0.1453 <S**2>=0.797
339B -> 341B 0.20758 (8.62%)
340B -> 341B 0.97366 (189.6%)
This state for optimization and/or second-order correction.
Total Energy, E(TD-HF/TD-DFT) = -6185.76906590
Copying the excited state density for this state as the 1-particle RhoCI density.
Excited State 2: 2.048-A 0.6312 eV 1964.25 nm f=0.0730 <S**2>=0.798
339B -> 341B 0.97645 (190.69%)
340B -> 341B -0.20706 (8.57%)
Excited State 3: 2.037-A 0.7499 eV 1653.42 nm f=0.0000 <S**2>=0.787
331B -> 341B 0.98349 (193.45%)
SavETr: write IOETrn= 770 NScale= 10 NData= 16 NLR=1 NState= 3 LETran= 64.
Hyperfine terms turned off by default for NAtoms > 100."
The other three questions I have are:
  • what the ####-A means (in bold above) as I have some calculations with various numbers-A and others that have singlet-A. (ZnbisPEB file)
  • I obtained a negative wavelength; what should I do? I have already read a question on here about something similar, but the only suggestion was to remove the +, which I do not have in my initial gjf file. Should I solve for more states or completely eliminate (d,p)? (Trimer file)
  • In another calculation I obtained a negative oscillator strength which I know from some web searches is not theoretically possible and indicates that there is a lower energy state (is that correct?) - how would I fix that? I have included it below, the same basis set as above is used. (1ZnTrimer file)
"Excitation energies and oscillator strengths:
Excited State 1: 4.010-A -0.2239 eV -5538.35 nm f=-0.0004 <S**2>=3.771"
Any clarification would be super helpful. I have also included the out files for the three compounds I am talking about.
Thank you so much!
Relevant answer
Answer
2*(coefficient^2) is only correct for the closed-shell case; this is because the coefficients of orbital transitions of alpha and beta spins are exactly the same, so Gaussian only prints one of them, and thus the factor of 2 must be introduced. For open-shell cases like your example, alpha and beta transition information is printed separately, so it should be calculated directly as coefficient^2.
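For a quick sanity check, here is a minimal Python sketch of that rule (the coefficient is taken from the output quoted in the question; the function itself is just an illustration, not anything Gaussian provides):

def contribution_percent(coefficient, closed_shell):
    # Closed-shell output lists only the alpha coefficient, so the weight is
    # 2*c^2; open-shell output lists alpha and beta separately, so it is c^2.
    weight = 2 * coefficient**2 if closed_shell else coefficient**2
    return 100 * weight

# Open-shell example from the question: 340B -> 341B with c = 0.97366
print(contribution_percent(0.97366, closed_shell=False))  # about 94.8 %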
  • asked a question related to Probability
Question
6 answers
Could any expert try to examine our novel approach for multi-objective optimization?
The brand new approach, entitled "Probability-based multi-objective optimization for material selection", was published by Springer and is available at https://link.springer.com/book/9789811933509,
DOI: 10.1007/978-981-19-3351-6.
Relevant answer
Answer
  • asked a question related to Probability
Question
6 answers
P(y)=Integral( P(x|y)*P(y)*dx)
The function is above, if I didn't write it wrongly. P(x|y) is a conditional probability and it is known, but P(y) is not known. Thanks.
There may be an iterative solution, but is there any analytical solution?
Relevant answer
Answer
Thank you all for the answers. I know that this is the Bayes marginal distribution and I have rewritten the equation from Bayes' theorem, but here I want to find P(y) while assuming that I know P(x|y). In fact, I want to ask for the general name of, or solution method for, an integral equation in which the unknown also appears inside the integral. It does not have to be a probability equation.
  • asked a question related to Probability
Question
1 answer
Good day! The question is really complex since CRISPR repeats do not have an exact sequence. So the question is: what is the probability of generating, in a random sequence, 2 repeat units, each of 23-55 bp, having a short palindromic sequence within and a maximum mismatch of 20%, interspersed with a spacer sequence that is 0.6-2.5 times the repeat size and that doesn't match the left and right flanks of the whole sequence?
Relevant answer
Answer
First, I'd re-state the question to make sure that I understood it correctly. A nucleotide sequence of length l contains a palindrome with a unit of length k. The palindrome is not exact; there can be from kmin to k matches between units. The distance between palindrome units can be from smin to smax. The first and last sub-sequences of length k are not exact matches of any palindrome unit.
My solution. Let's omit the last condition for now. How do we search for a palindrome with a unit of length k? Take any subsequence of length k and search for a 'match'. Searching for a 'match' is equal to checking (l-k-smin) subsequences, because the unit itself occupies k nucleotides and a spacer can't be shorter than smin nucleotides. In each window the probability of a hit is (1/4)^(kmin), if every nucleotide has equal probability of occurrence. The probability of having 1 or more hits is then given by the binomial cdf with the number of attempts equal to l-k-smin, the probability of success equal to (0.25)^kmin and the number of successes equal to 1. For example, the GSL function gsl_cdf_binom_Q(l-k-smin, 0.25^kmin, 0) would give the answer. The last parameter is zero, because the function computes the probability of more than x successes, i.e. 1 and more in this case.
Now, let's include the last condition. It is important to define what 'does not match' means. I suppose that it means that we can't find the second palindrome unit at positions 1 and l-k. So, the number of windows that we check has to be decreased by 2. The final answer would be:
F(l-k-smin-2, 0.25^kmin, 0), where F is the binomial cdf.
For varying length the answer would be a weighted sum of those probabilities, with weights equal to the probability of observing a given length. So, if all lengths have equal probability, this is the mean.
I checked the answer on a synthetic set and it seems it is correct or close to being so.
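For readers without GSL, the same binomial calculation can be sketched in Python with SciPy; the parameter values below are placeholders, not real CRISPR numbers:

from scipy.stats import binom

l, k, smin, kmin = 1000, 30, 18, 24   # sequence length, unit length, min spacer, min matches
n_windows = l - k - smin - 2          # windows checked, excluding the two flanks
p_hit = 0.25 ** kmin                  # chance a window matches at kmin positions
print(binom.sf(0, n_windows, p_hit))  # P(1 or more hits)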
  • asked a question related to Probability
Question
1 answer
My lab was recently donated a 5500xl Genetic Analyzer from Applied Biosystems. However, the official reagents for this system are being discontinued come Dec 31, 2017, which is probably why it was given away for free.
So I am wondering if anyone can offer help getting this machine running on generic reagents, or any tips/hints/advice, or even an opinion on whether it's worth the effort.
Basically it would be nice to get it sequencing, but if that can't be done are there any salvageable parts (for instance I know it has a high def microscope and a precise positioning system) ?
Here is a link to the machine we have:
We have all the accessory machines that go with it.
Thanks.
Relevant answer
Answer
Are you still using your 5500xl Genetic Analyzer from Applied Biosystems?
  • asked a question related to Probability
Question
26 answers
Suppose that we have a two-component series system, what is the probability of failure of two components at the same time?
*** both components' failure times are continuous random variables,
*** Is it important whether they follow the same distribution, different ones, or the same distribution with different parameters?
Relevant answer
Answer
The practical significance of the need to know Pr{X=Y} may come from the need to estimate losses caused by a catastrophic coincidence of two failures in a two-part system. Usually this happens when there is a possible common cause of a simultaneous failure. In the case of a lightning strike this is a rather artificial model, standing in for what should properly be a three-part system. But there are real systems where the common causes are hidden. In such cases statistical analysis of models with a non-zero probability of the coincidence is very useful.
The simplest theoretical models of some practical value are introduced in works by [R.] Barlow and [F.] Proschan in the late 60's.
Example: if X = min{T1, T2} and Y = min{T2, T3} with independent exponential T's, then
Pr{X=Y} = lambda2 / (lambda1 + lambda2 + lambda3),
where the lambdas are the inverses of the mean values of the corresponding T's.
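A quick Monte Carlo check of this formula in Python (the rates lambda1..lambda3 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
lam = np.array([1.0, 2.0, 0.5])                  # rates lambda1, lambda2, lambda3
T = rng.exponential(scale=1 / lam, size=(1_000_000, 3))
X = np.minimum(T[:, 0], T[:, 1])
Y = np.minimum(T[:, 1], T[:, 2])
print((X == Y).mean())                           # simulated Pr{X=Y}
print(lam[1] / lam.sum())                        # lambda2 / sum of lambdas, about 0.571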
  • asked a question related to Probability
Question
11 answers
In more detail, which conditions must hold so that if X is a continuous random variable, then f(X) is also a continuous RV?
More specifically, if X and Y are two continuous RVs, is X-Y a continuous RV?
Best
Relevant answer
Answer
1. if X is a continuous random variable then Z=f(X) is also continuous RV if the function f() that maps X to Z is continuous. If the function f() is discrete then Z is discrete.
For example, if f() is, say, polynomial or exponential or any other continuous function then Z will be continuous.
If f() is, say, discrete such as sign X which is
1, if X>0,
0, if X=0
-1, if X<0, then
The RV Z is then discrete with two values: -1 and 1 (the probability of Z=0 is 0 for continuous X).
2. If X and Y are two continuous RVs with joint density f(x,y), then Z = X - Y is always continuous with density g(z) = integral(-inf, +inf)[f(x, x-z)dx].
If RVs X and Y are independent with the densities f1(x) and f2(y), then
g(z) = integral(-inf, inf)[f1(x)f2(x-z)dx] = integral(-inf, inf)[f1(y+z)f2(y)dy]
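As a numerical illustration of the last formula (a sketch using two independent standard normals, for which Z = X - Y should be N(0, 2)):

import numpy as np
from scipy.stats import norm

def g(z):
    # integrate f1(x) * f2(x - z) over x with a simple Riemann sum
    x = np.linspace(-10, 10, 4001)
    dx = x[1] - x[0]
    return np.sum(norm.pdf(x) * norm.pdf(x - z)) * dx

for z in (0.0, 1.0, 2.5):
    print(g(z), norm.pdf(z, scale=np.sqrt(2)))   # the two columns should match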
  • asked a question related to Probability
Question
31 answers
If for example the position of an electron in a one-dimensional box is measured at A (give and take the uncertainty), then the probability of detecting the particle at any position B at a classical distance from A becomes zero instantaneously.
In other words, the "probability information" appears to be communicated from A to B faster than light.
The underlying argument would be virtually the same as in EPR. The question might be generalized as follows: as the probability of detecting a particle within an arbitrarily small interval is not arbitrarily small, this means that quantum mechanics must be incomplete.
Yet another formulation: are the collapse of the wave function and quantum entanglement two manifestations of the same principle?
It should be relatively easy to devise a Bell-like theorem and experiment to verify "spooky action" in the collapse of the wave function across a given classical interval.
  • asked a question related to Probability
Question
3 answers
I want to draw a graph of predicted probabilities vs observed probabilities. For the predicted probabilities I use this R code (see below). Is this code OK or not?
Could anyone tell me how I can get the observed probabilities and draw a graph of predicted vs observed probability?
analysis10 <- glm(Response ~ Strain + Temp + Time + Conc.Log10
                  + Strain:Conc.Log10 + Temp:Time,
                  family = binomial(link = logit), data = df)
predicted_probs <- data.frame(probs = predict(analysis10, type = "response"))
I have attached that data file
Relevant answer
Answer
Plotting observed vs predicted is not sensible here.
You don't have observed probabilities; you have observed events. You might use "Temp", "Time", and "Conc.Log10" as factors (with 4 levels) and define 128 different "groups" (all combinations of all levels of all factors) and use the proportion of observed events within each of these 128 groups. But you have only 171 observations in total. There is no chance to get any reasonable proportions (you would need some tens or hundreds of observations per group for this to work reasonably well).
  • asked a question related to Probability
Question
1 answer
I want to see the distribution of an endogenous protein (hereafter A), and I followed the protocol from Axis-Shield (http://www.axis-shield-density-gradient-media.com/S21.pdf). In order to gain stronger signals, I tried a small ultracentrifuge tube (SW60, 4 ml).
In this protocol, Golgi is enriched in fractions #1~3 and ER in #9~12. But in my experiments the enrichment of ER (marker: Calnexin) usually fails (#3~12), while Golgi (marker: GM130) is good (#1~3).
Here are some questions:
1. The amount of protein loaded on the gradient: should it be considered? I mean, does the capacity of the gradient need to be thought out? Does it affect the fractionation efficiency?
2. Is it necessary to use a large tube (12 ml)? Previously, to gain stronger signals of A, I switched to a smaller tube (4 ml). I have searched many papers, and some of them use small tubes for fractionation. (Probably they use less protein for loading?)
Thanks a lot for answering
Relevant answer
Answer
1. The amount of protein loaded on the gradient: should it be considered? Does the capacity of the gradient need to be thought out? Does it affect the fractionation efficiency?
Indeed, it is possible to overload a gradient. For the larger format (12 ml), I typically use 1 mg input, while for the lower volume gradients (4-5 ml), 500 µg works perfectly fine.
2. Is it necessary to use a large tube (12 ml)? Previously, to gain stronger signals of A, I switched to a smaller tube (4 ml). I have searched many papers, and some of them use small tubes for fractionation. (Probably they use less protein for loading?)
Smaller tubes work just fine. When using smaller tubes you might want to decrease the volume per collected fraction. As for the amount to load, see the point above. One thing you might want to take into account is that if you load very low amounts of protein, you might not efficiently recover everything in a TCA precipitation (to concentrate the protein in each fraction and get rid of the sucrose).
  • asked a question related to Probability
Question
23 answers
Why can the Chi-square not be less than or equal to 1?
Relevant answer
Answer
Adrianna Kalinowska There is no special meaning of the value 1 for the chi-square... Since the chi-square distribution is continuous, the probability that a random variable following a chi-square law equals exactly 1 is 0. As a distance between two contingency tables, it's not clear why 1 should be given special consideration.
So, I don't really understand the context of your question. Could you please give more detail?
(By the way, I don't understand either the origin of the original question and this debate around the value "1" in the chi-square...)
  • asked a question related to Probability
Question
10 answers
I have camera-trap data from cameras deployed at several sites (islands). The data consist only of N (abundance; independent images) of each species at their respective islands. No occupancy modelling could be run as I do not have the habitat data. Is it possible to calculate occupancy and probability without the temporal data/repeated sampling (the week the species was detected)? Or would calculating the naive occupancy do? Furthermore, do occupancy and probability have a range of high and low?
Relevant answer
Answer
I believe so, if one has the right tools.
  • asked a question related to Probability
Question
5 answers
Probability and fuzzy membership values both range between 0 and 1, and both are explained in a similar manner. What is the crisp difference between these terms?
Relevant answer
Answer
P(S)=1, i.e. the probability of the set of all possible outcomes equals 1; this is not necessarily so for a fuzzy set, where it may exceed one.
If we take the example of the freehand circle, the probability is always equal to zero, while with each subsequent attempt the grade of membership approaches 1.
  • asked a question related to Probability
Question
39 answers
Hi everyone,
In engineering design, there are usually only a few data points or low order moments, so it is meaningful to fit a relatively accurate probability density function to guide engineering design. What are the methods of fitting probability density functions through small amounts of data or low order statistical moments?
Best regards
Tao Wang
Relevant answer
Answer
A good explanation has been given by Chao Dang.
Best regards
  • asked a question related to Probability
Question
8 answers
I will be more than happy if somebody can help me in this case. Does it have a specific function in R? Or should we utilize quantile-copula methods...? Or something else?
Relevant answer
Answer
Send me equations.
  • asked a question related to Probability
Question
10 answers
Imagine there is a surface, with points randomly spread all over it. We know the surface area S, and the number of points N, therefore we also know the point density "p".
If I blindly draw a square/rectangle (area A) over such surface, what is the probability it'll encompass at least one of those points?
P.s.: I need to solve this "puzzle" as part of a random-walk problem, where a "searcher" looks for targets in a 2D space. I'll use it to calculate the probability the searcher has of finding a target at each one of his steps.
Thank you!
Relevant answer
Answer
@Jochen Wilhelm, the solutions are not equivalent because
For Poisson: P(at least one point) = 1 - P(K=0) = 1 - e^(-N/S*A)
For Binomial: P(at least one point) = 1 - ( (S - A)/S )^N
The general formula for the Binomial case is the following:
P(the rectangle encompasses k points)=(N choose k) ( A/S )^k ( (S - A)/S )^(N - k)
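A small numerical comparison of the two expressions (the values of S, A and N are made up); they agree closely when N is large and A/S is small:

import math

S, A, N = 100.0, 2.0, 50           # surface area, rectangle area, number of points
p_poisson  = 1 - math.exp(-N * A / S)
p_binomial = 1 - ((S - A) / S) ** N
print(p_poisson, p_binomial)       # about 0.632 vs 0.636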
  • asked a question related to Probability
Question
5 answers
Dear colleagues.
In the following question I try to extend the concept of characters in group theory to a wider class of functions. A character on a group G is a group homomorphism $\phi:G \to S^1$.
For every character $\phi=X+iY$ on a group $G$, we have $Cov(X,Y)=0$.
This is a motivation to consider all $\phi=X+iY: G\to S^1$ with $Cov(X,Y)=0$.
Please see this post:
So this question can be a starting point to translate some concepts in geometric group theory or the theory of amenability of groups into the notation and terminology of statistics and probability theory.
Do you have any ideas, suggestions or comments?
Thank you
Relevant answer
Answer
George Stoica
Thank you very much for this very helpful and interesting answer, the information about non commutative probability theory.
  • asked a question related to Probability
Question
3 answers
For example, the Dirichlet and multinomial distributions are conjugate. We want to know the probability that variable A occurs with B. Therefore, we train two probability models with enough samples and computational power. The first model is based on the Dirichlet distribution, the second model on the multinomial distribution. When we infer the parameters of the Dirichlet-multinomial and multinomial distributions, will the accuracy of the models be different?
Relevant answer
Answer
Would you please further explain your question?
E.g.: Why would you compare models with a discrete and a continuous distribution?
I wonder if you may be confusing conjugating models with a Dirichlet distribution being a conjugate prior of multinomial likelihood?
  • asked a question related to Probability
Question
8 answers
I need help with queueing theory; can someone give an easy explanation of M/M/C?
What are the parameters of the M/M/C queuing model?
Relevant answer
Answer
This is a personal rewording of ideas expressed by A. S. Tanenbaum in his book Distributed Operating Systems, but applied to communications:
It can be proven ( http://en.wikipedia.org/wiki/M/M/1_model Kleinrock, 1974) that the mean time between issuing a request to send a message and getting it completely transmitted, T (queueing time + service), is related to lambda (arrival rate in packets/s) and mu (service rate in packets/s) by the formula T = 1/(mu - lambda).
Consider a communication link at 64 kbit/s processing packets with exponentially distributed lengths and an average packet size of 50 bytes; then the mean service time (1/mu) is 6.25 ms and this link should be able to handle up to 160 packets/s (maximum lambda). If it just gets 120 packets/s, then the mean Tx time will be 25 ms.
Suppose now that we have n communication lines at 64 kbit/s, each processing the same type of packets (average length 50 bytes, exponentially distributed) at an arrival rate of 120 packets/s. The mean Tx time is the same, 25 ms. Now consider what happens if we use a single link able to send packets at n·64 kbit/s. Instead of having n communication lines at 64 kbit/s we get a single communication line n times faster, with an input rate n·lambda and a service rate n·mu, so the mean response time gets divided by n as well.
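The arithmetic of this example can be reproduced with a few lines of Python (a sketch of the M/M/1 formula, using exactly the numbers above):

def mm1_response_time(mu, lam):
    # mean time in system (queueing + service) for an M/M/1 queue
    return 1.0 / (mu - lam)

mu  = 64_000 / (50 * 8)           # service rate: 160 packets/s on a 64 kbit/s link
lam = 120.0                       # arrival rate: 120 packets/s
print(mm1_response_time(mu, lam))            # 0.025 s = 25 ms

n = 4                                        # one link n times faster, n times the load
print(mm1_response_time(n * mu, n * lam))    # 25 ms / n = 6.25 ms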
This surprising result says that by replacing n small communications links by a big one that is n times faster, we can reduce the average response time n-fold ( http://en.wikipedia.org/wiki/Queueing_model#Multiple-servers_queue ).
This result is extremely general and applies to a large variety of systems. It is one of the main reasons that airlines prefer to fly a 300 seat 747 once every 5 hours to a 10 seat business jet every 10 minutes.
Dividing the communications capacity into small channels, each with few users statically assigned, is a poor match to a workload of randomly arriving requests. Much of the time, a few lines are busy, even overloaded, but most are idle. It is this wasted time that is eliminated in the high speed single link and the reason it gives better overall performance.
In fact, this queueing theory result is also one of the main arguments against having distributed systems at all and argues in favour of concentrating the computing power as much as possible.
However, mean response time is not everything. There are also arguments in favour of small channels and distributed systems, such as cost. In general, the cost of N single resources of cost C is N.C, but the cost of a single resource N times better is C^N, or it can be impossible to build it at any price. Reliability and fault tolerance are also factors to consider.
Moreover, it must be considered that for some users, a low variance in service time may be perceived as more important than the mean response time itself, specially for interactive applications. Consider for example web browsing through your own ADSL line, on which asking for the same page to be displayed always takes 500 ms (at least if served from the central office cache). Now consider web browsing on a shared high speed link on which asking for the next page takes 5 ms 95% of the time and 5 s one time in 20. Even though the mean here is twice as good as on the private ADSL line, the users may consider the performance intolerable. On the other hand, to the user running P2P file transfers, the high speed link may win hands down.
A possible compromise is to provide both options, providing each user with a small single amount of reserved capacity for interactive tasks such as web browsing and running all non-interactive transfers (e.g. P2P, mail, SFTP...) on the rest of shared bandwidth of a high speed link.
A related joke read on the Embedded Muse 223: Why does *my* queue at the supermarket usually move the slowest?
You compare yours to the ones on either side. The odds of yours being the fastest are 1 in 3 if you compare yours to the immediately adjacent queues, 1 in 5 if you look at two lines on either side, etc. If you want to feel better about it, join the queue all the way on the end and you'll have fewer others against which to measure.
  • asked a question related to Probability
Question
17 answers
Hi, everyone
In relation to statistical power analysis, the relationship between effect size and sample size has crucial aspects, and the sample-size decision often confuses me. Let me ask something about it! I've been working on rodents, and as far as I know, an a priori power analysis based on an effect size estimate is very useful in deciding on sample size. When it comes to experimental animal studies, animal refinement is a must for researchers, therefore it would be highly anticipated for researchers to reduce the number of animals in each group, just to a level which gives adequate precision for avoiding a type-2 error. If an effect size can be obtained from previous studies prior to yours, then it's much easier to estimate. However, most papers provide no useful information on either means and standard deviations or effect sizes, which makes it harder to make an estimate without a pilot study. So, in my case, when taking into account the effect size which I've calculated using previous similar studies, the sample size per group (4 groups, total sample size = 40) should be around 10 for statistical power (0.80). In this case, what do you suggest about the robustness of checking residuals or visual assessments using Q-Q plots or other approaches when the sample size is small (<10)?
Kind regards,
Relevant answer
Answer
I cannot agree with the practice of estimating sample size based on previous studies. There are a number of important reasons for this.
1. The most important reason is that the sample size should have adequate power to detect the smallest effect size that is clinically significant. It doesn't matter what previous researchers have reported. If there is a clinically significant effect, then the study should have the power to detect it.
For instance, previous research may have shown that mask wearing reduces risk of Covid transmission by 50%. Fine. But even a 20% reduction in transmission risk is of considerable public health importance, so your study should be capable of detecting this. A study powered to detect a 20% risk reduction is, of course, comfortably powered to detect anything bigger.
2. The second reason is that early studies can suffer from, well, early study syndrome.
a) They are done by people who really believe in the effect, and who are prepared to put in unusual efforts to make the study work, so the study may have unrealistic levels of input.
b) Early studies take place in a context where protocols are evolving, and so the methodological quality is often lower – we learn by our mistakes; I'm not blaming early researchers!
c) They are more likely to be published if they find something interesting (a significant effect size).
And might I add that if your research actually matters there is no excuse for 80% power. It's a lazy habit. It's not ethical to run research that has a baked-in 20% chance of failing to find an important effect. Participants give their time and work for nothing (and animals give their lives). We have an ethical duty not to waste these on research that has one chance in five of failing to find something useful if it really exists.
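As an illustration of this logic, here is a sketch of an a-priori calculation in Python; the smallest effect size of interest (Cohen's f = 0.4 for a 4-group one-way design) and the 90% power target are assumptions chosen only for the example:

from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
n_total = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.9, k_groups=4)
print(n_total)   # total N required across the 4 groups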
  • asked a question related to Probability
Question
6 answers
I suspect this is a well-worn topic in science education and psychology, but these are fields I don't know well. I'd like a source or two to support my sense that probability/statistics are hard for people to understand and correctly interpret because they defy "the way our minds work" (to put it crudely). Any suggestions?
Relevant answer
Answer
Hi Daniel. Thanks for that answer. I guess I meant neither, though I had to see your answer to know that. What I mean is also not the well-known problem of how bad humans are at assessing probabilities.
What I am looking for a source on is the tendency for people to interpret statistics (such as averages) not as abstractions that summarize the value and variation of data but, rather, as entities. It's as if the brain took a shortcut that reifies statistics into individual things. For example, people often misinterpret how evolution by natural selection works (a change in gene or trait frequency over several generations), seeing it as a change to an individual organism's body within its lifetime. In this common fallacy, antibiotic resistance would occur as an individual bacterium's response to antibiotics, whereas it actually emerges from the differential survival and increasing proportion of more resistant individuals in the bacterial population.
I suspect that statistics are liable to be misread in this way more generally. But I'm not sure how to find reliable information because I'm not quite sure what to call what I'm looking for.
  • asked a question related to Probability
Question
3 answers
I have a large set of sampled data. How can I plot a normal pdf from the available data set?
Relevant answer
Answer
ok sir, thank You for the kind responses.
  • asked a question related to Probability
Question
2 answers
In RStudio, there are many commands in the Gumbel package, and their arguments also differ.
I'm asking about the alpha parameter of the copula, which must be greater than 1. If this is the one used to plot the probability paper, how can I choose the value of alpha?
  • asked a question related to Probability
Question
17 answers
I am considering distributing N kinds of different parts among M different countries and I want to know the "most probable" pattern of distribution. My question is in fact ambiguous, because I am not very sure how I can distinguish types or patterns.
Let me give an example. If I were to distribute 3 kinds of parts to 3 countries, the set of all distribution is given by a set
{aaa, aab, aac, aba, abb, abc aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc}.
The number of elements is of course 3^3 = 27. I may distinguish three types of patterns:
(1) One country receives all parts:
aaa, bbb, ccc 3 cases
(2) One country receives 2 parts and another country receives 1 part:
aab, aac, aba, abb, aca, acc, baa, bab, bba, bbc, bcb, bcc, caa, cac, cbb, cbc, cca, ccb 18 cases
(3) Each country receives one part respectively:
abc, acb, bac, bca, cab, cba 6 cases
These types may correspond to a partition of the integer 3 with the condition that (a) the number of summands must not exceed 3 (in general M). In fact, 3 has three partitions:
3, 2+1, 1+1+1
In the above case of 3×3, the number of types was the number of partitions of 3 (which is often noted p(n)). But I have to consider the case when M is smaller than N.
If I am right, the number of "different types" of distributions is the number of partitions of N with the number of summands less than M+1. Let us denote it as
p*(N, M) = p( N | the number of summands must not exceed M. )
N.B. * is added in order to avoid confusion with p(N, M), which is the number of partitions with summands smaller than M+1.
Now, my question is the following:
Which type (a partition among p*(N, M)) has the greatest number of distributions?
Are there any results already known? If so, would you kindly teach me a paper or a book that explains the results and how to approach to the question?
A typical case that I want to know is N = 100, M = 10. In this simple case, is it most probable that each country receives 10 parts? But I am also interested in cases where M and N are small, for example when both are less than 10.
Relevant answer
Answer
Thank you, Luis Daniel Torres Gonzalez, for your contribution. My question does not ask for the probability distribution. It asks what the most probable "pattern" is when we distribute N items among M boxes. I have illustrated the meaning of "pattern" by examples, but it seems that was not sufficient. Please read Romeo Meštrović's comments above, posted in March 2019.
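For small N and M the question can be settled by brute-force enumeration; a Python sketch (not feasible for N = 100, M = 10, but useful as a check, and it reproduces the 3, 18, 6 split for N = M = 3):

from collections import Counter
from itertools import product

def pattern_counts(N, M):
    # count, for each partition of N into at most M parts, how many of the
    # M**N distributions realize that pattern
    counts = Counter()
    for assignment in product(range(M), repeat=N):       # country chosen for each part
        sizes = sorted(Counter(assignment).values(), reverse=True)
        counts[tuple(sizes)] += 1
    return counts

for pattern, count in sorted(pattern_counts(3, 3).items()):
    print(pattern, count)    # (1, 1, 1): 6, (2, 1): 18, (3,): 3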
  • asked a question related to Probability
Question
5 answers
Do we have a mathematical formula to compute the p-value of an observation from the Dirichlet distribution in the exact sense described at https://en.wikipedia.org/wiki/Exact_test?
Relevant answer
Answer
Yes, the p value is always < 0.05 100% of the time in an exact test. You use the probability distribution to determine the actual p value. P is the probability that the test statistic is equal to or less than the value actually observed based on your sample under H0
  • asked a question related to Probability
Question
4 answers
Sedimentology, Geology 
Relevant answer
Answer
Maybe it's too late, but the answer is that you need to create the chart so that one of the axes is on a probability scale. You can do it by calculating z-scores for your cumulative distribution function, using the NORM.S.INV function in Excel. Put the z-scores on one axis and the random variable (whatever it is) on the other. The random variable's axis should be on a logarithmic scale. That's how you obtain a log-probability chart.
In a log-probability chart, they say that a straight line is formed if the random variable is log-normally distributed, and the standard deviation is equal to the ratio of the values of the random variable at CDF = 84.1% and 50%.
My question is where the 84.1% comes from.
Also, I wrote an Excel VBA routine that uses a gradient algorithm and iteratively finds the mean and the standard deviation of a log-normally distributed sample once the random variable and corresponding cumulative probabilities are given. I need that 84.1% issue to be solved before I finalize my software and publish it.
Any help is appreciated.
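Regarding the 84.1%: it is simply the standard normal CDF at +1 (Phi(1) ≈ 0.8413), so on a log-probability plot the value at CDF = 84.1% divided by the median equals exp(sigma), where sigma is the standard deviation of the log. A short check in Python (mu and sigma are made-up log-normal parameters):

from math import exp, log
from scipy.stats import norm

print(norm.cdf(1.0))                      # 0.8413...

mu, sigma = 1.2, 0.7                      # hypothetical parameters of ln(X)
x84 = exp(mu + sigma * norm.ppf(0.841))   # value of X at CDF ~ 84.1%
x50 = exp(mu)                             # median of X
print(log(x84 / x50))                     # approximately sigma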
  • asked a question related to Probability
Question
4 answers
I have data which contain an excess of zero counts. The independent variables (predictors) are the number of trees, diameter at breast height and basal area, and the dependent variable is the number of recruits (with many zero counts).
So, I want to use a zero-inflated negative binomial model and a hurdle negative binomial model for the analysis. My problem is that I do not know the R code for these models.
Relevant answer
Answer
Dear Kaushik Bhattacharjee
thank you for the useful link, thanks a lot
best
  • asked a question related to Probability
Question
5 answers
How can I calculate and report degrees of freedom for a repeated measures ANOVA?
I have 48 observations (N = 48) and 2 factors with 3 (P) and 8 (LA) levels.
I calculate degrees of freedom as follows:
dF P = a-1= 2
df LA = b-1= 7
df LA*P =(a-1)(b-1)= 14
Error dF P = (a-1) (N-1) = 94
Error dF LA = (b-1) (N-1) = 329
Error dF P*LA = (a-1)(b-1)(N-1) = 658
My JASP analysis gave me these results:
Within Subjects Effects
Cases Sum of Squares df Mean Square F p η²
P 1.927 2 0.964 33.9 < .001 0.120
P*LA 8.450 14 0.604 21.2 < .001 0.528
Residuals 0.454 16 0.028
Can I write P : F(2,14)= 33.9
and P*LA: F(14, 658) =21.2 ???
Or is it P: F(2, 16)=33.9
P*LA: F(14, 16) =21.2 ???
Thanks to anyone who would like to answer
Relevant answer
Answer
Following
  • asked a question related to Probability
Question
4 answers
Please consider a set of pairs of probability measures (P, Q) with given means (m_P, m_Q) and variances (v_P, v_Q).
For the relative entropy (KL-divergence) and the chi-square divergence, a pair of probability measures defined on the common two-element set (u_1, u_2) attains the lower bound.
Regarding general f-divergence, what is the condition of f such that a pair of probability measures defined on the common two-element set attains the lower bound ?
Intuitively, I think that the divergence between localized probability measures seems to be smaller.
Thank you for taking your time.
Relevant answer
Answer
  • asked a question related to Probability
Question
91 answers
How to prove, or where to find, the integral inequality (3.3) involving Laplace transforms, as shown in the pictures here? Is the integral inequality (3.3) valid for some general function $h(t)$ which is increasing and non-negative on the right semi-axis? Is the integral inequality (3.3) a special case of some general inequality? I have proved that the special function $h(t)$ has some properties in Lemma 2.2, but I haven't proved the integral inequality (3.3) yet. I wish you would help me prove (3.3) for the special function $h(t)$ in Lemma 2.2 in the pictures.
Relevant answer
Answer
Dear Dr. Feng Qi ,
The further you go, the more you will be convinced of the correctness of my words. Maybe. I wrote these words because I travelled a similar path here in August and September of this year:
Certain personalities promised to destroy my proof - to show that my proof was wrong. Then they forgot it, apparently? I made a reminder 5 days ago. But no one reacted and only insignificant discussions continued.
My answer is this: waiting is always the best tactic and strategy. Remember the Russian mathematician G. Perelman. This person generally did not give a damn about the opinion of the professional community of mathematicians and was absolutely indifferent to any action that disturbed him. This is a real Hero!
Best Regards,
Sergey
  • asked a question related to Probability
Question
1 answer
Hello dear,
A great opportunity for statisticians and mathematicians around the World,
Join the Bernoulli Society and IMS for the first-ever Bernoulli-IMS One World Symposium 2020, August 24-28, 2020! The meeting will be virtual with many new experimental features. Participation at the symposium is free, registration is mandatory to get the passwords for the zoom sessions.
Good luck dear colleagues
Relevant answer
Answer
Thank you. I will be there to attend the meeting.
  • asked a question related to Probability
Question
3 answers
In Garman's inventory model, buy orders and sell orders are Poisson processes with order size = 1. The buying price and selling price are denoted by pb and ps, that is, the market maker gets pb when she sells a stock to the others, and spends ps to buy a stock from the others.
Garman then calculates the probability of the inventory of the market maker, say Q(k, t+dt) = probability of getting 1 dollar x Q(k-1, t) + probability of losing 1 dollar x Q(k+1, t) + probability of no buying or selling order x Q(k, t), where Q(k, t+dt) = probability of having k money at time t+dt.
In the above equation, I think Garman split the money received and lost by buying or selling a stock into many sub-Poisson processes; otherwise, getting 1 dollar or losing 1 dollar would be impossible, as the market maker receives pb dollars and loses ps dollars in each order, not 1 dollar. Is my statement correct? Thank you very much.
Relevant answer
Answer
Dear Chi,
Your statement may not be correct. pa is greater than pb; this can hold provided there is a liquidity crash and bankruptcy reaches the market size. In that case λa < λb, hence a split position may occur and such possibilities exist.
Earlier models of dealership markets are inventory-based models. Demsetz (1968) seems to be the first work emphasizing order imbalance. Trading has a time dimension and at each point in time there may be an imbalance in buy and sell orders. The presence of dealers allows submitted orders to be executed immediately. The bid-ask spread is therefore a price that public investors must pay in order to obtain immediacy in order execution. Unlike Demsetz (1968), where the focus is on the trading desires of individual traders, Garman (1976) switches the spotlight to market clearing mechanisms. Hence Garman's research is considered by many to be the first formal analysis of market microstructure. Two mechanisms were considered in Garman's research: a double auction and a monopolistic dealer exchange.
The model describes a monopolistic dealer who must first set an ask price and a bid price, then receive and execute orders from public traders; the bid and ask prices are chosen to maximize the dealer's profit at each point in time, subject to the constraint that bankruptcy or failure must not take place (which would occur if the dealer ran out of the traded security or cash). In Garman's model, the bid-ask spread exists so that the specialist will not be ruined with probability one. That is, market viability dictates the existence of a bid-ask spread.
Most importantly, Garman's inventory model deals with the problem of the dealer's inventory imbalance, which was first addressed by Garman (1976). In his model, a monopolistic dealer assigns ask (pa) and bid (pb) prices, and fills all orders. Each order size is assumed to be one unit. The dealer's goal is, as a minimum, to avoid bankruptcy.
The ruin probability can be understood from the following equations.
Let Qk(t) be the probability that the dealer has k units of cash at time t, and Rk(t) be the probability that the dealer has k units of stock at time t. For a sufficiently small time interval ∆t, we have from the assumed Poisson processes
∂Qk(t)/∂t = Qk−1(t)·λa(pa)·pa + Qk+1(t)·λb(pb)·pb − Qk(t)·[λa(pa)·pa + λb(pb)·pb].
Similarly, one can show that
lim (t→+∞) R0(t) ≈ [λa(pa)/λb(pb)]^Is(0) if λa(pa) < λb(pb), and 1 otherwise,
where Is(0) is the dealer's initial stock inventory.
By solving, Garman concludes that to avoid a sure failure, it must be that
pa·λa > pb·λb and λa < λb.
This implies a spread.
To guarantee that the dealer can make the market indefinitely, the dealer that starts with a fixed amount of cash and a fixed number of shares of the stock at time 0 must set the ask price higher than the bid price.
Garman's model exhibits a couple of important features: a monopolistic risk-neutral specialist; liquidity traders only; market orders following Poisson processes; the specialist can set prices only once and for all; and the specialist faces the 'ruin' problem.
Amihud and Mendelson (1980) extend Garman's study to allow price adjustments. Prices change according to inventory positions. Unlike in Garman (1976), the spread in Amihud and Mendelson simply reflects the monopoly power of the specialist. One can show that the spread tends to zero as competition enters and increases. Inventory plays the role of a 'buffer.' Stoll (1978) takes a different view of the bid-ask spread: market makers are sellers of insurance to liquidity traders, and the spread is a risk premium.
Ashish
  • asked a question related to Probability
Question
4 answers
I have a list of chromosomes, say A, B, C, and D. The respective fitness values are 1, 2, 3, and 4. The chromosomes with higher fitness values (C and D) are more likely to be selected as parents for the next generation. How can I assign probabilities in MATLAB such that C and D get a higher probability of parent selection?
Relevant answer
Nice Contribution Muhammad Karam Shehzad
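For what it's worth, a minimal fitness-proportionate (roulette-wheel) selection sketch, written in Python for illustration; MATLAB's randsample accepts a weight vector in the same spirit:

import random

chromosomes = ["A", "B", "C", "D"]
fitness     = [1, 2, 3, 4]

# selection probabilities proportional to fitness: 0.1, 0.2, 0.3, 0.4
probs = [f / sum(fitness) for f in fitness]
parents = random.choices(chromosomes, weights=fitness, k=2)
print(probs, parents)          # C and D are drawn most often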
  • asked a question related to Probability
Question
9 answers
Suppose we have statistics N(m1, m2), where m1 is the value of the first factor, m2 is the value of the second factor, N(m1, m2) is the number of observations corresponding to the values ​​of factors m1 and m2. In this case, the probability P(m1, m2) = N(m1, m2) /K, where K is the total number of observations. In real situations, detailed statistics N(m1, m2) is often unavailable, and only the normalized marginal values ​​S1(m1) and S2(m2) are known, where S1(m1) is the normalized total number of observations corresponding to the value m1 of the first factor and S2(m2) is the normalized total number of observations corresponding to the value m2 of the second factor. In this case P1(m1) = S1(m1)/K and P2(m2) = S2(m2)/K. It is clear that based on P1(m1) and P2(m2) it is impossible to calculate the exact value of P(m1, m2). But how to do this approximately with the best confidence? Thanks in advance for any advice.
Relevant answer
Answer
For your normalising constant or marginal distribution of vector case, you may use saddle point approximation, see e.g., http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.1487&rep=rep1&type=pdf
  • asked a question related to Probability
Question
6 answers
Dear all. The normal (or Gaussian) distribution is the one most used in statistics, natural science and engineering. Its importance is linked with the Central Limit Theorem. Are there any ideas on how to predict the number and parameters of those Gaussians? Or any efficient deterministic tool to decompose a Gaussian into a finite sum of Gaussian basis functions, with parameter estimation? Thank you in advance.
Relevant answer
Answer
For the Central Limit Theorem, the random variables are not necessarily Gaussian, but they have to be independent and identically distributed (in the classical CLT). They can come from any distribution. Moreover, the CLT is an approximation. Do you have prior knowledge that the resultant Gaussian variable is composed of Gaussian variables? In that case, using the convolution properties of normal random variables can give you an idea.
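If the goal is to decompose data into a finite sum of Gaussians with estimated parameters, one commonly used tool is a Gaussian mixture model fitted by EM; a sketch with scikit-learn on synthetic data (the two components below are made up), with the number of components selectable by BIC/AIC:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal( 1.5, 1.0, 800)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.weights_, gmm.means_.ravel(), np.sqrt(gmm.covariances_.ravel()))
print(gmm.bic(data))    # compare across n_components to pick the number of Gaussians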
  • asked a question related to Probability
Question
8 answers
Hi guys,
Can anyone describe PCE in simple words, please? How can we find a PC basis and what is an appropriate sparse PC basis?
Thanks in advance
Relevant answer
Answer
PCE is a truncated sum of terms used to estimate the response of a dynamic system when uncertainties are involved, specifically in its design or parameters. Imagine you have a dynamic system such as a mass and spring in which the mass follows a normal distribution. If you want to find the mean and standard deviation of your system's response, let's say the mass acceleration, when such uncertainty exists, you can build a polynomial chaos expansion for your system and get the mean and standard deviation from it.
To build your PCE you need a set of basis functions whose type depends on your random variables' distribution; here your mass is normally distributed, so based on the literature you should choose Hermite polynomials as your basis functions.
There are plenty of papers out there on this topic that explain this AWESOME tool in detail.
Good luck.
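A minimal one-dimensional sketch in Python, assuming a single standard normal input and probabilists' Hermite polynomials; the response function f below is a made-up stand-in for an expensive model:

import math
import numpy as np
from numpy.polynomial import hermite_e as He

def f(x):                                   # stand-in for the model response
    return np.exp(0.3 * x) + 0.1 * x**2

order = 6
nodes, weights = He.hermegauss(30)          # Gauss-Hermite rule, weight exp(-x^2/2)
weights = weights / np.sqrt(2 * np.pi)      # normalize to the N(0,1) density

# PCE coefficients c_k = E[f(X) He_k(X)] / k!  (orthogonality: E[He_k^2] = k!)
coeffs = [np.sum(weights * f(nodes) * He.hermeval(nodes, [0]*k + [1])) / math.factorial(k)
          for k in range(order + 1)]

mean = coeffs[0]
std = math.sqrt(sum(coeffs[k]**2 * math.factorial(k) for k in range(1, order + 1)))
print(mean, std)                            # compare with a crude Monte Carlo estimate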
  • asked a question related to Probability
Question
7 answers
Dear all. The Gaussian function is mostly used in statistics, physics and engineering. Some examples include:
1. Gaussian functions are the Green's function for the homogeneous and isotropic diffusion (heat) equation
2. The convolution of a function with a Gaussian is also known as a Weierstrass transform
3. A Gaussian function is the wave function of the ground state of the quantum harmonic oscillator
4. The atomic and molecular orbitals used in computational chemistry can be linear combinations of Gaussian functions, called Gaussian orbitals
5. Gaussian functions are also associated with the vacuum state in quantum field theory
6. Gaussian beams are used in optical systems, microwave systems and lasers
7. Gaussian functions are used to define some types of activation functions of artificial neural networks
8. Simple cell responses in primary visual cortex are modeled by a Gaussian multiplied by a sine wave (a Gabor function)
9. The Fourier transform of a Gaussian is a Gaussian
10. Easy and efficient approximation for signal analysis and fitting (Gaussian process, Gaussian mixture model, Kalman estimator, ...)
11. Describes the shape of UV-visible absorption bands
12. Used in time-frequency analysis (Gabor transform)
13. Central Limit Theorem (CLT): the sum of independent random variables tends toward a normal distribution
14. The Gaussian function serves well in molecular physics, where the number of particles is close to the Avogadro number NA = 6.02214076×10^23 mol^−1 (NA is defined as the number of particles that are contained in one mole)
15. ...
Why Gaussian is everywhere ?
Relevant answer
Answer
The Gaussian function serves well in molecular physics, where the number of particles is close to the Avogadro number NA = 6.02214076×10^23 mol^−1 (NA is defined as the number of particles that are contained in one mole).
  • asked a question related to Probability
Question
5 answers
I was working on the C++ implementation of Dirichlet Distribution for the past few days. Everything is smooth, but I am not able to deduce the CDF (Cumulative Distribution Function) of Dirichlet distribution. Unfortunately, neither Wikipedia nor WolframAlpha shows the CDF formula for Dirichlet Distribution.
Relevant answer
  • asked a question related to Probability
Question
3 answers
Forgetting the context for a second, the overall question is how to compare data that is expressed in a probability.
Scenario 1: Let us say there are two events A and B. The rules are:
- A (union) B = 1
- A (intersection) B = 0
- The probability of A or B depends on a dollar amount. Depending on the amount, the probability that either A or B happens changes. For example, at 20,000 the chance of A is 80%, so B is 20%.
Scenario 2: we have A, B, and C.
- A (union) B (union) C = 1
- A (intersection) B or C = 0
- Probability dependent on dollar amount. Same as above.
- A and B in scenarios 1 and 2 are the same, but their probabilities of happening change due to the introduction of C.
QUESTION: How can I compare the probability of the events in these two scenarios?
Possible solutions I was thinking of:
1) A is X times as likely to happen as B, then I could plot all events as a factor of B on the same graph to get a sense of how likely all events are compared to a common denominator (event B)
2) Could also get a "cumulative" probability of each event as area under the curve and express as a % or ratio. So if A occupies 80% of the area under the curve, then B should be 20%, so overall A is four times as likely, and similarly in scenario 2.
3) Maybe the way to compare is to take the complement of each event separately, and express as a percentage at each point and graph them.
Any help is greatly appreciated. Please refer to attached pic for some visual understanding of the question as well. I am making a lot of assumptions, which are not true (as concerned with the graphs etc), but theoretically, I am interested in knowing. Thank you!
  • asked a question related to Probability
Question
6 answers
How to find the distance distribution of a random point in a cluster from the origin? I have uniformly distributed cluster heads following the Poisson point process and users are deployed around the cluster head, uniformly, following the Poisson process. I want to compute the distance distribution between a random point in the cluster and the origin. I have attached the image as well, where 'd1', 'd2', and 'theta' are Random Variables. I want to find the distribution of 'r'.
Relevant answer
Answer
It gets a bit messy algebraically, but I recommend that you transform your three random variables (d_1, d_2, theta) into three new variables (r, r_c, phi_1). the variable "r" is computed using the Law of Cosines, r_c is just d_2 and phi_1 is the angle formed between r and d_2. Using the Law of Sines, we have
d_1 / sin(phi_1) = d_2 / sin(phi_2).
phi_2 is the angle between r and d_1.
Realizing that phi_2 = pi - theta - phi_1, we can eliminate phi_2 from the Law of Sines and after some messy manipulation, find phi_1 as a function of d_1, d_2 and theta.
You then construct the Jacobian matrix d(r, rc, phi_1)/d(d_1, d_2, theta). The transformed PDF is given by
p(r, rc, phi_1) = p(d_1, d_2, theta) * det(J)^{-1}
where det(J) is the determinant of the Jacobian J.
The PDF for r requires you to integrate out the rc and phi_1 variables:
p(r) = int( p(r, rc, phi_1) d.rc d.phi_1 )
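As an algebra-free sanity check, the distribution of r can also be estimated by Monte Carlo; a sketch (the distributions chosen for d1, d2 and theta below are placeholders, to be replaced by whatever the Poisson construction implies):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 200_000
d1 = rng.rayleigh(scale=1.0, size=n)        # placeholder distance distributions
d2 = rng.rayleigh(scale=0.3, size=n)
theta = rng.uniform(0, 2 * np.pi, size=n)

r = np.sqrt(d1**2 + d2**2 - 2 * d1 * d2 * np.cos(theta))   # Law of Cosines
plt.hist(r, bins=200, density=True)
plt.xlabel("r"); plt.ylabel("estimated p(r)")
plt.show()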
Hope this helps,
Cheers!
  • asked a question related to Probability
Question
3 answers
Hello, I have just become interested in IFRS 9 ECL models. I have three questions and I appreciate all answers.
1) Which models are best for PD, LGD & EAD calculation when I have scarce data (about 5-7 years of quarterly data)?
2) Can I calculate lifetime PD without macroeconomic variables and then add macro effects?
3) When I use the transition matrix approach, how do I estimate "stage 2" for earlier periods, when IFRS 9 was not in force and there was no classification by stages?
Relevant answer
Answer
Answering:
(2) PD without macro-variables will be just point-in-time PD. If you are interested in forward-looking PD, you have to add macro variables (easily using 1 above, for example).
(3) Bayesian methods again will help for rigorously adding expert criteria to stage calculations through prior information.
  • asked a question related to Probability
Question
5 answers
Let us consider two boxes of different energies that are separated by a barrier, and box 1 is filled with many electrons. The barrier allows the electrons to tunnel from box 1 to the other side with a 1% probability. In the event that an electron does tunnel through the barrier in a given time frame, does this affect the probability that a different electron tunnels through the barrier during the same time frame, or is this probability statistically independent? Therefore, in the extreme case, do we know if the probability of ALL electrons tunneling is equal to P(1 electron tunneling)^(number of electrons)? Let us assume that the change in energy due to electrons filling/leaving certain energy levels is negligible.
Relevant answer
Answer
There are essentially two ways of looking at this issue. One is Juan Weisz's way - totally correctly making use of wave functions and probability densities.
Another way is to look at the physical mechanism whereby quantum tunneling happens. In a nutshell, to tunnel through an energy barrier, a particle must borrow energy from a nearby virtual particle, and relinquish back that energy to the virtual particle within the alloted permissible time frame (as per Heisenberg's relationship linking virtual energy to permissible duration of virtual existence.)
To tunnel, the particle needs the presence of a suitable virtual particle nearby: by suitable is meant one capable of fleetingly lending out the exact right amount of energy needed for the tunneling event to occur. It is, by the way, the exact selfsame mechanism that allows any individual atom of a radioactive element to decay: there is an energy barrier that must be hurdled before decay can occur, and borrowing virtual energy from a suitable nearby virtual particle is what enables overcoming that barrier.
The amount of energy that must be borrowed depends on the particle or entity (atom, etc.: anything can tunnel) that must tunnel. If the requisite energy is high, then the occurrence of matching virtual particles is statistically rarer (which is why some half-lives are long, and some are short).
If the energy to be borrowed is low, then separate occurrences of tunneling remain statistically independent of anything that happens around - as is for instance the case with carbon-8, which needs so very little energy to decay: there are so many low-energy virtual particles popping up all the time within the quantum foam that no interference with the environment occurs. When however the requisite tunneling or decay energy is high, then suitable matching virtual particles are much rarer (think osmium-186). In that case an atom of, say, osmium-186 could conceivably snap up the rare suitable virtual particle and thus interfere with and prevent another nearby atom from having its chance at decay - until the next nearby suitable virtual particle pops up, and as such statistical independence would not hold.
For any particular particle or atom, the best is to do the calculation. You can extrapolate the statistics of the appearance of suitable-energy virtual particles from the measured values of half times, see e.g. the table at https://en.wikipedia.org/wiki/List_of_radioactive_nuclides_by_half-life
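For the numerical side of the original question: under the assumption of statistical independence (and ignoring level-filling effects, as the question stipulates), the joint probabilities follow directly from the binomial model. A minimal sketch, with a hypothetical electron count:

from scipy.stats import binom

p = 0.01   # single-electron tunnelling probability per time frame (from the question)
n = 100    # number of electrons in box 1 (hypothetical)

p_all = p ** n                       # P(all n electrons tunnel) under independence
p_at_least_one = 1 - (1 - p) ** n    # P(at least one tunnels)
expected_count = binom.mean(n, p)    # expected number of tunnelling electrons, n * p

print(p_all, p_at_least_one, expected_count)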
  • asked a question related to Probability
Question
4 answers
Model game as a discrete-time Markov chain
Relevant answer
Answer
A Markov chain is a mathematical system that experiences transitions from one state to another according to certain probabilistic rules. The defining characteristic of a Markov chain is that, no matter how the process arrived at its present state, the distribution of possible future states depends only on that present state. So yes, such games can be modelled as discrete-time Markov chains :)
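A minimal sketch of what "modelling a game as a discrete-time Markov chain" can look like in practice; the three states and the transition matrix are hypothetical placeholders for whatever game states you define.

import numpy as np

P = np.array([[0.6, 0.3, 0.1],    # hypothetical transition matrix: row i gives
              [0.2, 0.5, 0.3],    # the distribution of the next state given state i
              [0.1, 0.4, 0.5]])

rng = np.random.default_rng(0)
state, path = 0, [0]
for _ in range(10):                      # simulate 10 moves of the game
    state = rng.choice(3, p=P[state])    # the next state depends only on the current one
    path.append(int(state))
print(path)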
  • asked a question related to Probability
Question
2 answers
Dear statisticians and HTA experts,
I am a newbie working on my first model: a Markov model with probabilities varying according to time in state. After synthesising the clinical trials, I unfortunately have difficulties deriving transition probabilities from the pooled HR. How do I calculate the shape and scale parameters of a Weibull distribution from this pooled HR?
I really appreciate your help.
Kind regards
Cuc
Relevant answer
Answer
I hope that Andrew Paul McKenzie Pegman's answer works perfectly well for you.
In case it doesn't, please clarify some things:
- Probabilities varying with time in state, together with Weibull distributions, seem to suggest that you are not actually building a plain Markov model but are somehow including a simulation of time to event with a Weibull distribution. Is that correct? Can you give some more detail on what you are doing?
- It seems impossible to me to estimate two Weibull parameters from one hazard ratio. However, since you have pooled data, why don't you use more information from the data than just the HRs?
- Since you say that you are a newbie: Do you have somebody mentoring or supervising your work?
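If it helps, here is a sketch of the standard decision-modelling route, assuming (as the answer notes you must) that the baseline Weibull shape and scale are estimated from the baseline arm's data rather than from the HR, and that proportional hazards lets you multiply the scale by the pooled HR. The numbers are purely illustrative.

import numpy as np

def weibull_transition_prob(t, u, lam, gamma, hr=1.0):
    # Survival S(t) = exp(-lam * t**gamma); under proportional hazards the
    # treatment arm uses scale lam * hr.  The transition probability for the
    # cycle (t - u, t] is 1 - S(t) / S(t - u).
    lam_arm = lam * hr
    return 1.0 - np.exp(lam_arm * ((t - u) ** gamma - t ** gamma))

# Illustrative values: baseline scale 0.08, shape 1.3, 1-year cycles, pooled HR 0.7
for t in range(1, 6):
    print(t, round(weibull_transition_prob(t, 1.0, 0.08, 1.3, hr=0.7), 4))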
  • asked a question related to Probability
Question
3 answers
Hello,
I have a panel database for firms i (90 firms) across year t (from 2013 to 2019). Some firms witnessed a cross border investment in a year that is between 2013 and 2019. I created a dummy=1, when the firm has received the investment in the corresponding year.
I want to create three groups small (1), medium (2) and large (3) using the revenue generated by the company and then compute how likely one firm can go from group 1 to group 2 or 3, if received foreign investment.
I am not sure which statistical method I should use and how to arrange my data according to this method.
Relevant answer
Answer
You can model the probability with a probability distribution, such as a normal or Weibull distribution.
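One concrete, if basic, way to look at this is to tabulate empirical year-to-year transitions between the size groups separately for firm-years with and without the investment dummy; a multinomial or ordered logit with the dummy as a covariate would be the model-based refinement. The column names and toy data below are hypothetical.

import pandas as pd

# Hypothetical panel: one row per firm-year
df = pd.DataFrame({
    "firm":       [1, 1, 1, 2, 2, 2],
    "year":       [2013, 2014, 2015, 2013, 2014, 2015],
    "size_group": [1, 1, 2, 2, 2, 3],   # 1 = small, 2 = medium, 3 = large
    "invest":     [0, 1, 0, 0, 0, 1],   # foreign-investment dummy
})

df = df.sort_values(["firm", "year"])
df["next_group"] = df.groupby("firm")["size_group"].shift(-1)
moves = df.dropna(subset=["next_group"])

# Empirical transition matrices with and without investment in the current year
for flag, sub in moves.groupby("invest"):
    print("invest =", flag)
    print(pd.crosstab(sub["size_group"], sub["next_group"], normalize="index"))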
  • asked a question related to Probability
Question
5 answers
Who has the higher chance to get the jackpot, (i) the one who spins on one slot-fruit machine, (ii) the other who spins on thousands of slot-fruit machines? Is there a summation or a multiplication of probabilities?
Relevant answer
Answer
Conjunction probabilities, that is, probabilities involving the conjunction 'and', involve the operation of multiplication when the events are independent. Disjunction probabilities (involving 'or') involve the operation of addition when the events are mutually exclusive:
P(A and B) = P(A) x P(B)   (A and B independent)
P(A or B) = P(A) + P(B)    (A and B mutually exclusive; in general, P(A or B) = P(A) + P(B) - P(A and B))
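A small numeric illustration of why the slot-machine question is neither plain addition nor plain multiplication: across n independent machines, the chance of at least one jackpot is 1 - (1 - p)^n, which only approximates n * p while n * p is small. The jackpot probability used here is hypothetical.

p = 1e-6                                   # hypothetical per-spin jackpot probability
for n in (1, 1_000, 1_000_000):
    p_at_least_one = 1 - (1 - p) ** n      # not n * p once n * p is no longer small
    print(n, p_at_least_one)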
  • asked a question related to Probability
Question
4 answers
The problem is very well described and explained by a plenitude of sources. So, just a very brief reminder:
a guest is given the opportunity to select one of the three closed doors. There is a prize behind exactly one of the doors.
Once the guest has selected a door, the host opens another (not selected) door to reveal that it hides nothing.
Then the guest can either confirm the initial choice or select another (obviously still closed) door.
Which strategy leads to success with higher probability?
Long story short, if the guest's decision remains unchanged then the probability is 1/3, if the guest changes the selection then the probability is 2/3
The solution can be found elsewhere, along with the software simulators (among them one is mine https://github.com/tms320c/threedoorstrial)
The solution itself is fine no doubt, but let us change the trial a bit.
After the guest has made the first selection, the host removes the open box (let's use boxes instead of doors). Because everybody knows that the open box contains nothing, this action does not change the distribution of the probabilities (or does it? See the questions below).
Now we have two boxes, which are not equal: one contains the prize with the probability of 1/3, while the probability to find the prize in another box is 2/3.
The host says goodbye to the first guest and invites another one. The new guest is given the opportunity to select one of the two closed boxes.
Because the new guest does not know the history, the assumption is that the probability of getting the prize is 1/2, which, as we all (and the host, and the first guest) now know, is totally wrong.
If the host is a kind of generous person he can tell the story. After that, the new guest can get the prize with 2/3 probability. Is it a correct probability?
I propose to discuss the following questions:
if you are in a situation where you should choose one of two presumably equal boxes/packages/whatever, should you insist on the disclosure of their history? Perhaps they once took part in such a trial, long ago in a galaxy far, far away, and are in fact not equal.
for how long do the boxes remain in this non-equilibrium state? The time delay between the two trials (first and second guests) can be arbitrarily long.
does the situation lead to the conclusion that probability depends on someone's opinion, and that your decision always depends, explicitly, on someone else's decision, which could have been made in the deep past?
could the probability of holding the prize be an internal property of each box, one that can be modified mentally by humans?
Relevant answer
Answer
There is nothing "mystical" about this game. By opening the doors, Monty has communicated extra information to the contestant that has modified the probabilities of the contestant finding the prize. It is not unlike the Forward Algorithm, where noisy observations of system state changes and a-priori knowledge of state distribution are used to improve knowledge of the system state.
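For readers who prefer to check the 1/3 vs 2/3 claim themselves, here is a minimal simulation sketch of the standard game (independent of the simulator linked in the question):

import random

def play(switch):
    doors = [0, 0, 0]
    doors[random.randrange(3)] = 1                                     # prize behind one door
    pick = random.randrange(3)
    opened = next(d for d in range(3) if d != pick and doors[d] == 0)  # host opens an empty door
    if switch:
        pick = next(d for d in range(3) if d not in (pick, opened))
    return doors[pick]

n = 100_000
print("stay:  ", sum(play(False) for _ in range(n)) / n)   # about 1/3
print("switch:", sum(play(True) for _ in range(n)) / n)    # about 2/3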
  • asked a question related to Probability
Question
6 answers
I know that comparison of these terms may require more than one question but I would appreciate it if you could briefly define each and compare with relevant ones.
Relevant answer
Answer
Perfect explanation from Prof Alexander Kolker , and I may add that in some applications prediction and forecasting are interchangeably used.
  • asked a question related to Probability
Question
12 answers
Hello, I recently found in a test bank that those two can't be increased simultaneously. As we all know, Type I error is when you reject a true null hypothesis (you think there's a treatment effect when there is not), and Type II error is when you accept a false null hypothesis (you think there's no treatment effect when there is). But I'm having trouble understanding the logic of why these two can't be increased at the same time. Thank you.
Relevant answer
Answer
Just think of 100 people being charged with a crime. You do not know if they are guilty or not. Some of them are and some of them are not.
Type I error would mean you send someone to jail but he is not guilty, type II error means you release him in spite of him being guilty.
It is very easy to completely avoid type I error by sending everyone to jail, but of course you will end up with lots of innocent people behind bars. If you release everyone you completely avoid type II errors, but you have lots of criminals running around.
Those are the two extreme cases. But you can see how they are connected. The more people you send to jail, the more type I error goes up and type II error goes down and vice versa.
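The jail analogy can be made quantitative with a toy one-sided z-test: moving the rejection threshold pushes the two error rates in opposite directions, which is exactly why they cannot both be pushed the same way at a fixed sample size. The distributions and thresholds below are illustrative.

from scipy.stats import norm

# H0: mu = 0 versus H1: mu = 1, test statistic ~ Normal(mu, 1)
for c in (0.5, 1.0, 1.645, 2.0):          # rejection threshold: reject H0 if statistic > c
    alpha = 1 - norm.cdf(c, loc=0)        # type I error rate, P(reject | H0 true)
    beta = norm.cdf(c, loc=1)             # type II error rate, P(fail to reject | H1 true)
    print(f"c = {c:5.3f}  alpha = {alpha:.3f}  beta = {beta:.3f}")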
  • asked a question related to Probability
Question
1 answer
I am working on cyclostationary spectrum sensing in cognitive radio. Can anyone help me find the MATLAB code, or the main procedure, to compute Pd (probability of detection) and ROC curves?
Relevant answer
Answer
Did you get this working? Please help me out with doing the same.
  • asked a question related to Probability
Question
4 answers
On traditional statistics tasks, alongside the calculation of a probability value, we can usually also estimate a confidence interval for this probability value (with some pre-defined confidence level). For the SVM method, many expressions have been developed which allow us not only to recognize the class of a new object, but also to calculate the probability that this object belongs to that class. But is it possible to estimate a confidence interval for this probability value? Thanks in advance for your answer. Regards, Sergey.
Relevant answer
Answer
The confidence level is not found or calculated; it is set as the standard against which the probability of an event is compared. The confidence interval is the % pre-set as a standard; the probability value is the % observed, compared to that pre-set confidence interval.
If you find a new class of object, the pre-set confidence interval will serve as a filter: if a series of observations meets or exceeds that standard, it is considered a new class. Each member of the class has its own probability value (%), which is compared to the pre-set confidence interval (%).
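Not part of the answer above, but one generic way to attach an interval to an SVM's predicted class probability is the bootstrap: resample the training set, refit, and take percentiles of the predicted probability for the object of interest. A minimal scikit-learn sketch with synthetic data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.utils import resample

X, y = make_classification(n_samples=300, random_state=0)
x_new = X[:1]                                   # the "new object" (here just the first row)

probs = []
for b in range(200):                            # 200 bootstrap refits
    Xb, yb = resample(X, y, random_state=b)
    clf = SVC(probability=True, random_state=0).fit(Xb, yb)
    probs.append(clf.predict_proba(x_new)[0, 1])

print(np.percentile(probs, [2.5, 97.5]))        # rough 95% interval for P(class 1)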
  • asked a question related to Probability
Question
11 answers
Is there any probabilistic study on when/if Artificial Intelligence may take over humanity?
Relevant answer
Answer
There is a very good book, published in 2018, named "Hello World: How to be Human in the Age of the Machine". It might help.
  • asked a question related to Probability
Question
3 answers
A negative binomial random variable with parameters r and p can be thought of as the number of attempts before the rth success. It generalises the geometric random variable, which is the case r = 1. I am interested in calculating the expectation of the maximum of N negative binomial random variables that are independent and identically distributed (i.i.d.). The difficulty I am facing is that there is no simple closed-form formula for the cumulative distribution function (CDF) of a negative binomial random variable, so I cannot apply the multiplication rule to obtain the CDF of the maximum of N negative binomial random variables in closed form.
Relevant answer
Answer
See the paper "Maximum Statistics of N Random Variables Distributed by the Negative Binomial Distribution" by Peter J. Grabner and Helmut Prodinger.
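Even without an elementary closed form, the expectation of the maximum can be computed numerically from the identity E[M] = sum over k of P(M > k) = sum over k of (1 - F(k)^N), where F is the CDF of a single negative binomial variable. A short sketch (note that SciPy's nbinom counts failures before the r-th success; add r per draw if you count total attempts):

from scipy.stats import nbinom

def expected_max_nbinom(N, r, p, tol=1e-12):
    # E[max of N i.i.d. NegBin(r, p)] via E[M] = sum_{k>=0} P(M > k)
    total, k = 0.0, 0
    while True:
        tail = 1.0 - nbinom.cdf(k, r, p) ** N   # P(max > k)
        total += tail
        if tail < tol:
            return total
        k += 1

print(expected_max_nbinom(N=10, r=3, p=0.4))    # example values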
  • asked a question related to Probability
Question
2 answers
I am going to develop a queueing model in which riders and drivers arrive with inter-arrival time exponentially distributed.
All the riders and drivers arriving in the system will wait for some amount of time until being matched.
The matching process will pair one rider with one driver and takes place every Δt unit of time (i.e., Δt, 2Δt, 3Δt, ⋯). Whichever side outnumbers the other, its exceeding portion will remain in the queue for the next round of matching.
The service follows the first come first served principle, and how they are matched in particular is not in the scope of this problem and will not affect the queue modelling.
I tried to formulate it as a double-ended queue, in which the state indicates by how much one side exceeds the other in the system.
However, this formulation does not incorporate the factor Δt, so it is not in a batch-service fashion. I have no clue how to build this Δt (which acts somewhat like a buffer) into the model.
Relevant answer
Answer
Can you please explain the matching process in more detail? The enclosed picture is the standard random walk with two competing, independent, exponentially distributed waiting times: if type 1 wins, we go one step to the right; in the opposite case, we go one step to the left. No service is sketched, so as precise a description of the service as possible is needed. The main doubt is caused by the lack of an interpretation of negative positions: isn't that the difference between the numbers of arrived riders and drivers?
Also, regarding these words:
GQ: "The matching process will pair one rider with one driver and takes place every Δt unit of time (i.e., Δt, 2Δt, 3Δt, ⋯). Whichever side outnumbers the other, its exceeding portion will remain in the queue for the next round of matching. "
it is not explained what "outnumbers" means here. My English is too weak to understand the context. Can this be explained in simpler words, like this:
The state is characterized by the current values of two numbers: the riders and the drivers in the waiting room. At each matching instant k·Δt, both numbers are reduced by the minimum of the two (hence one of them becomes zero)....
Note that this is only a guess at what you meant!
Joachim
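While waiting for the clarification, here is a small simulation sketch of the system as I read the question: Poisson arrivals on both sides, and at every multiple of Δt the matcher removes min(#riders, #drivers) from both queues, with the excess carried over. The arrival rates and horizon are hypothetical; note that with unequal rates the faster side accumulates an ever-growing backlog.

import numpy as np

rng = np.random.default_rng(1)
lam_r, lam_d, dt, horizon = 2.0, 1.8, 1.0, 10_000.0   # hypothetical rates and Δt

riders = drivers = 0
t = 0.0
leftover = []
while t < horizon:
    riders += rng.poisson(lam_r * dt)      # arrivals during (t, t + dt]
    drivers += rng.poisson(lam_d * dt)
    matched = min(riders, drivers)         # batch matching at t + dt
    riders -= matched
    drivers -= matched
    leftover.append((riders, drivers))
    t += dt

print("mean unmatched riders: ", np.mean([r for r, d in leftover]))
print("mean unmatched drivers:", np.mean([d for r, d in leftover]))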
  • asked a question related to Probability
Question
6 answers
I want to sample from a target distribution ("law") whose equation I know; that is, I want to draw random samples from this law. Does a general method exist for this kind of sampling, as there is, for example, for sampling from the Gamma or Gaussian distributions?
Relevant answer
Answer
Maybe I can try that; I will look into it. I am actually able to add noise to a map with a physical simulation that can be found in the literature, but for another application I want to see what happens when I use a simple random number generator to add noise to a displacement map instead of the simulation. Thank you.
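On the general question: yes, two standard general-purpose methods are inverse-transform sampling (when the CDF can be inverted) and rejection sampling (when the density is only known up to a constant). A minimal sketch of both, with an exponential target for (a) and a toy unnormalised density for (b):

import numpy as np

rng = np.random.default_rng(0)

# (a) Inverse transform: X = F^{-1}(U) for U ~ Uniform(0, 1).
lam = 2.0
u = rng.random(10_000)
exp_samples = -np.log(1 - u) / lam          # exponential(rate = lam)

# (b) Rejection sampling for f(x) proportional to x**2 on [0, 1], uniform proposal,
#     envelope constant M >= sup f_unnorm = 1.
def f_unnorm(x):
    return x ** 2

M = 1.0
accepted = []
while len(accepted) < 10_000:
    x = rng.random()
    if rng.random() * M <= f_unnorm(x):
        accepted.append(x)

print(np.mean(exp_samples), np.mean(accepted))   # about 1/lam = 0.5 and 3/4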
  • asked a question related to Probability
Question
8 answers
  • Purpose of the post:
I'm struggling to understand the significance of the fat-tailed distribution especially in career choice. 80000hours career guide argues that the more accurate distribution for career choice is the long-tailed one.
I'm trying to understand how the implication would differ between a normal bell-curve and a long-tailed distribution. My request: are the implications I wrote in "Part 2: Significance of the fat-tail distribution" accurate? Please focus on points 1 and 2.
Other names: heavy-tailed distribution, long-tailed distribution, Pareto distribution
  • Part 1: Description of Fat-Tail Distribution phenomenon in World Problems and Career Success in 80000hours career guide:
The guide is available for free download at https://80000hours.org/book/
"the most effective actions achieve far more than average. These big differences in expected impact mean that it’s really important to focus on the best areas. Of course, making these comparisons is really hard, but if we don’t, we could easily end up working on something with comparatively little impact. This is why many of our readers have changed which problem they work on. "p.60
"Each change took serious effort, but if changing area can enable you to have many times as much impact, and be more successful, then it’s worth it." (p.61)
"the top 10% of the most prolific elite can be credited with around 50% of all contributions, whereas the bottom 50% of the least productive workers can claim only 15% of the total work" (p. 89)
Simonton, Dean K., "Age and outstanding achievement: What do we know after a century of research?", Psychological Bulletin 104(2) (1988): 251, as cited on p. 89 of the 80000hours guide
"Areas like research and advocacy are particularly extreme, but a major study still found that the best people in almost any field have significantly more output than the typical person." Hunter, J. E., Schmidt, F. L., Judiesch, M. K., (1990) “Individual Differences
in Output Variability as a Function of Job Complexity”, Journal of Applied as cited in P.90 of 80000hours guide
"Moreover, success in almost any field gives you influence that can be turned into positive impact by using your position to advocate for important problems (p.91).
This all also means you should probably avoid taking a “high impact” option that you don’t enjoy, and lacks the other ingredients of job satisfaction, like engaging work". (p.91)
"Finally, because the most successful people in a field achieve far more than the typical person, choose something where you have the potential to excel. Don’t do something you won’t enjoy in order to have more impact. " (p.95)
Figure: Assumed (NOT actual) bell curve for problems (p. 59)
Figure: Actual fat-tail distribution for problems (p. 60)
Figure: Actual fat-tail distribution of careers (p. 90)
  • Part 2: Significance of the fat-tail distribution especially for individuals (All points are comparing fat-tail to bell-curve) :
1. It seems to me that in both distributions we have a motivation to aim for the top (obviously the top would be better) but the nature of motivation changes.
The nature of the carrot and stick changes from "aim to be among the best" because:
(bell-curve carrot): if you do, great things will happen to you.
(bell-curve stick): if you don't, you will remain in the average/mean range.
to "aim to be among the best" because:
(fat-tail carrot): if you do, VERY VERY VERY great things will happen to you.
(fat-tail stick): if you don't, you will remain in the average (BELOW the mean).
2. The more variance there is in a statistical sample, the wiser it is to aim to move to the exploration direction in the exploration-exploitation (deliberation-action) spectrum (the relationship between variance and value of new information). In terms of career, the variance is great so the exploration investment should be great as well.
3. Median does not equal average. The average could be misleading and the median could be misleading
4. If all other factors are moderate or weak but you have good reason to think you can reach the top, go for it, because the average can be misleading (your rank is as important as your field).
5. Choosing a job at random is not advised because the median is lower than we think.
6. Even though falling below the mean is much more likely in this distribution, the probability of being a hyper-performer is also higher than under a bell curve, which makes hyper-performance a more realistic goal and hence one that is easier to be motivated by.
7. Looking at the outliers becomes more important because they become more influential.
8. The Pareto principle (80/20) can encourage generalism over specialization, because the effort you put into one field may have more impact if distributed over many fields; but at the same time it can encourage specialization over generalism, because the Pareto distribution makes you want to aim for the very top, where the impact is disproportionate.
9. Does the long-tailed distribution favor an "all-star" "super-star" or "SWAT" team approach of fewer more qualified people (quality over quantity)?
10. Prioritization becomes more important?
11. We often behave as if the bell-curve distribution is true while it's not. So the significance lies in adjusting our behaviour?
12. Why does personal fit become more important under the long-tailed distribution? Because the long-tailed distribution has higher variance. If success in a specific field had zero variance, individual differences wouldn't matter much, so person-environment fit would not be very important: changing the person would change nothing. In reality there is great variance, so individual differences matter a lot (how you would "react" to a field, not just how good the career is).
Relevant answer
Answer
I really don't like such long explanations.
You are not on the way to a mathematical proof!
If the observed frequency distribution and a theoretical density fit, you may have a solution.
Check your data with the Kolmogorov–Smirnov test.
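A minimal sketch of the Kolmogorov–Smirnov check suggested above, with hypothetical right-skewed data; note that estimating the normal parameters from the same sample strictly calls for the Lilliefors correction, so treat the p-value as indicative only.

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)     # hypothetical heavy-ish tailed data

stat, p = kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print(stat, p)                                          # small p: a normal does not fit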
  • asked a question related to Probability
Question
7 answers
Can numbers (the Look then Leap Rule OR the Gittins Index) be used to help a person decide when to stop looking for the most suitable career path and LEAP into it instead or is the career situation too complicated for that?
^^^^^^^^^^^^^^^^^
Details:
Mathematical answers to the question of optimal stopping in general (When you should stop looking and leap)?
Gittins Index , Feynman's restaurant problem (not discussed in details)
Look then Leap Rule (secretary problem, fiancé problem): (√n , n/e , 37%)
How do we apply this rule to career choice?
1- Potential ways of application:
A- n is Time .
Like what Michael Trick did (https://goo.gl/9hSJT1). Michael Trick is a CMU Operations Research professor who applied this to decide the best time for his marriage proposal, though he seems to think that this was a failed approach.
In our case, should we do it by age: 20-70 = 50 years, so about age 38 is where you stop looking, for example? Or should we multiply 37% by 80,000 hours to get a total of 29,600 hours of career "looking"?
B- n is the number of available options. Like the secretary problem.
If we have 100 viable job options, do we just look into the first 37? If we have 10, do we just look into the first 4? And what if we are still at a stage of our lives where we have thousands of career paths?
2- Why the situation is more complicated in the career choice situation:
A- You can want a career and pursue it and then fail at it.
B- You can mix career paths. If you take option C, it can help you later on with option G. For example, if I work as an Internet Research Specialist, that will help me later on if I decide to become a writer, so there is overlap between the options and a more dynamic relationship. Also, the option you choose in selection #1 will influence the likelihood of choosing other options in selection #2 (for example, if in 2018 I choose to work at an NGO, that will influence my options if I want to make a career transition in 2023, since it will limit my possibility of entering the corporate world in 2023).
C- You need to be making money so "looking" that does not generate money is seriously costly.
D- The choice is neither strictly sequential nor strictly simultaneous.
E- Looking and leaping alternate over a lifetime, unlike the example where you keep looking and then leap once.
Is there a practical way to measure how the probability of switching back and forth between our career options affects the optimal exploration percentage?
F- There is something between looking and leaping, which is testing the waters. Let me explain. "Looking" here doesn't just mean "thinking" or "self-reflection" without action. It could also mean trying out a field to see if you're suited for it. So we can divide looking into "experimentation looking" and "thinking looking". And what separates looking from leaping is commitment and being settled. There's a trial period.
How does this affect our job/career options example since we can theoretically "look" at all 100 viable job positions without having to formally reject the position? Or does this rule apply to scenarios where looking entails commitment?
G- * You can return to a career that you rejected in the past. Once you leap, you can look again.
"But if you have the option to go back, say by apologizing to the first applicant and begging them to come work with you, and you have a 50% chance of your apology being accepted, then the optimal explore percentage rises all the way to 61%." https://80000hours.org/podcast/episodes/brian-christian-algorithms-to-live-by/
*3- A Real-life Example:
Here are some of my major potential career paths:
1- Behavioural Change Communications Company 2- Soft-Skills Training Company, 3- Consulting Company, 4-Blogger 5- Internet Research Specialist 6- Academic 7- Writer (Malcolm Gladwell Style; Popularization of psychology) 8- NGOs
As you can see the options here overlap to a great degree. So with these options, should I just say "ok the root of 8 is about 3" so pick 3 of those and try them for a year each and then stick with whatever comes next and is better?!!
Relevant answer
Answer
Hey Kenneth Carling , I got this number from page 29 in their book (Always Be Stopping, Chapter 1). They quote research results from Seale & Rapoport (1997) who found that on average their subjects leapt at 31% when given the secretary problem - they say that most people leapt too soon. They also say that there are more studies ("about a dozen") with the same result, which makes it more credible in my view.
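The Seale & Rapoport finding is easy to reproduce in simulation: for the classic secretary setup, the success probability of "look at the first k, then leap at the first candidate better than all of them" peaks near k = n/e (about 37 out of 100), and leaping at 31% is measurably worse. A minimal sketch:

import numpy as np

rng = np.random.default_rng(0)

def success_rate(n, k, trials=20_000):
    wins = 0
    for _ in range(trials):
        ranks = rng.permutation(n)                       # 0 is the best candidate
        best_seen = ranks[:k].min() if k else n
        chosen = next((r for r in ranks[k:] if r < best_seen), ranks[-1])
        wins += chosen == 0
    return wins / trials

n = 100
for k in (10, 20, 31, 37, 50, 70):
    print(k, success_rate(n, k))                         # peaks near k = n/e, about 37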
  • asked a question related to Probability
Question
2 answers
The muscle response after RPE is interesting, but it would probably give a different result if we tried RPE with bone anchorage.
Relevant answer
Answer
It's interesting reading your team's papers, but something is not clear to me. Some of your papers mention that we can evaluate the muscular activity of a person by comparing it with the normal range. However, I cannot understand how you define the "normal range", even though I found and read your reference.
For example, in the paper "Neuromuscular evaluation of post-orthodontic stability: An experimental protocol" (2002), your research team said that "In all examinations of the adolescent patient, POC was higher than 86% and TC was lower than 10%. In the adult, POC was inside the normal range, while all TCs were higher than 10.5%.", and referred to the normal range provided in the paper "An electromyographic investigation of masticatory muscles symmetry in normo-occlusion subjects" (2000).
Does "normal range" mean "the mean value ± SD" or something else?
I tried to find out but could not understand. Please explain a bit more; it would help a lot.
Thank you so much.
  • asked a question related to Probability
Question
4 answers
I have applied the "gaussmix" function in the voicebox MATLAB toolbox to calculate a GMM. However, the code gives me an error when I run it with 512 GMM components.
No_of_Clusters = 512;
No_of_Iterations = 10;
[m_ubm1,v_ubm1,w_ubm1]=gaussmix(feature,[],No_of_Iterations,No_of_Clusters);
Error using *
Inner matrix dimensions must agree.
Error in gaussmix (line 256)
pk=px*wt; % pk(k,1) effective number of data points for each mixture (could be zero due to underflow)
I need 1024 or 2048 mixtures for Universal Background Model (UBM) construction. Could anyone give me MATLAB code to calculate a GMM with a large number of mixture components, such as 512 or 2048?
Relevant answer
Answer
Yes, MathWorks can help with this error; see the MATLAB documentation on Gaussian mixture models at www.mathworks.com.
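This is not the voicebox routine the question asks about, but as an alternative sketch, scikit-learn's GaussianMixture can fit UBM-sized models if you use diagonal covariances and supply far more frames than components; the feature matrix below is random placeholder data shaped like (n_frames, n_dims).

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.standard_normal((50_000, 39))        # placeholder for real MFCC frames

ubm = GaussianMixture(n_components=512, covariance_type="diag",
                      max_iter=10, reg_covar=1e-4, random_state=0)
ubm.fit(features)
print(ubm.weights_.shape, ubm.means_.shape, ubm.covariances_.shape)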
  • asked a question related to Probability
Question
10 answers
Dear scholars, the points in the graph (attached, plotted using R) are from a probability mass function; it decreases exponentially towards 0, but only up to a certain index (e.g. x = 53). Above that point, the values start to fluctuate. Does anyone have any idea why this happens? Any tips on how to fix this in R?
Thanks very much to everyone in advance.
Relevant answer
Answer
Thank you Ms. Mercedes Orús-Lacort
  • asked a question related to Probability
Question
4 answers
Hello.
I run REST2 simulations, which is one of the replica exchange methods. I would like to ask one question about the validity of REST2.
I am running REST2 simulations of a membrane protein system. I prepared 32 replicas, with temperatures assumed to range from 310 to 560 K.
Now, I have run about 40 ns of simulation, but a "tunnel" has not been observed yet; I mean, for example, that a replica starting at 310 K has not yet moved all the way up to 560 K. I thought one of the requirements for demonstrating the validity of REST2 was such a "tunnel". However, to my knowledge, most papers reporting work with REST2 do not even mention whether a complete "tunnel" occurred within the simulation time.
I would like to ask whether a "tunnel" is a necessary and general criterion for showing the validity of REST2 simulations. The reason I ask is that it seems to take much longer for a complete "tunnel" to happen, and I actually wonder whether a "tunnel" can be observed at all in REST2 for a big system like a membrane protein.
By the way, I confirmed that all acceptance ratios were about 30% and that the potential energy distributions of the replicas overlapped. So, apart from the "tunnel" issue, my REST2 simulations do not seem unreasonable in theory.
I would be happy if you gave me a reply.
Thank you in advance.
Relevant answer
Answer
Your question just led me to analyse and visualize the distribution of replicas over temperatures in REST2. See the images attached. It really looks like there are some kind of trading barriers. The images show the replica<->temperature distribution for the helical peptide (AAQAA)3 and the beta hairpin HP7. For comparison, I attached the distribution of our TIGER2hs for the same peptides.
  • asked a question related to Probability
Question
7 answers
Could somebody point me towards an efficient implementation of a (displaced) chi-square cumulative distribution?
Relevant answer
Answer
hope this link is useful
regards
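If "displaced" means the noncentral chi-square, SciPy already ships an efficient CDF (ncx2); if it simply means a chi-square shifted by a constant, the loc argument of chi2 does that. A two-line sketch covering both readings (the values are arbitrary):

from scipy.stats import chi2, ncx2

print(ncx2.cdf(5.0, df=3, nc=2.0))      # noncentral chi-square CDF
print(chi2.cdf(5.0, df=3, loc=1.0))     # chi-square shifted ("displaced") by 1.0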
  • asked a question related to Probability
Question
3 answers
I have two histograms. One has some sort of non-uniform distribution, and the other looks exponential or similar to some standard distribution. Visually, the second one is smoother and curve fitting is possible. I need to quantify these two histograms to conclude that the second one is smoother than the first and that it may follow some distribution (not necessarily normal).
What typical parameters can we compare between histograms? I have tested skewness (not significant), kurtosis, and the KS test (failed).
Thanks
Relevant answer
Overlay the normal curve on the histograms. The mean and SD are the typical features of a histogram.
  • asked a question related to Probability
Question
7 answers
Take N examples sampled from a multinomial distribution (p1, p2, ⋯, pm) over m outcomes, with Ni being the number of examples taking outcome i. Here I assume p1, p2, ⋯, pm are listed in descending order; if not, we can reorder them to make them so.
My objective is to make the most frequently occurring outcome, i.e., argmax_i{N_i}, as close to 1 as possible.
So my question is: what size of N should we take so that the index of the most frequently occurring outcome is less than some predetermined number m0 (1 <= m0 <= m)? Furthermore, how does N determine the most frequently occurring outcome?
The tool I think might be useful is the Bretagnolle–Huber–Carol inequality. Any clues or references are appreciated. Thanks.
Relevant answer
Answer
The most frequent outcome corresponds to the mode of the distribution.
It depends on p1, p2, ..., pm and on the Ni.
There may be one or more modes; accordingly, there may be more than one most frequent outcome.
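The dependence on N can at least be explored by Monte Carlo: estimate the probability that the empirical argmax equals the truly most probable outcome (index 1 in the question's numbering, index 0 below) for increasing N. The probability vector is hypothetical.

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.35, 0.30, 0.20, 0.15])          # hypothetical, already in descending order

for N in (10, 50, 200, 1000):
    counts = rng.multinomial(N, p, size=20_000)
    hit = (counts.argmax(axis=1) == 0).mean()   # empirical mode equals the true mode
    print(N, round(float(hit), 3))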
  • asked a question related to Probability
Question
26 answers
Design-based classical ratio estimation uses a ratio, R, which corresponds to a regression coefficient (slope) whose estimate implicitly assumes a regression weight of 1/x.  Thus, as can be seen in Särndal, CE, Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, page 254, the most efficient probability of selection design would be unequal probability sampling, where we would use probability proportional to the square root of x for sample selection. 
So why use simple random sampling for design-based classical ratio estimation?  Is this only explained by momentum from historical use?  For certain applications, might it, under some circumstances, be more robust in some way???  This does not appear to conform to a reasonable data or variance structure.
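For readers who want to see the alternative selection scheme mentioned here in code: a with-replacement sample drawn with probabilities proportional to the square root of x, combined with the Hansen–Hurwitz estimator of the total. The population below is synthetic and only illustrative.

import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 50.0, size=1_000)             # synthetic auxiliary sizes
y = 3.0 * x + rng.normal(0.0, 20.0, size=1_000)  # synthetic study variable, roughly ratio-like

p = np.sqrt(x) / np.sqrt(x).sum()                # single-draw probabilities proportional to sqrt(x)
n = 50
idx = rng.choice(x.size, size=n, replace=True, p=p)
t_hat = np.mean(y[idx] / p[idx])                 # Hansen-Hurwitz estimator of the total of y
print("estimate:", round(t_hat, 1), " true total:", round(y.sum(), 1))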
Relevant answer