Science topic

Likelihood Functions - Science topic

Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
Questions related to Likelihood Functions
  • asked a question related to Likelihood Functions
Question
3 answers
I am doing uncertainty analysis for a rainfall-runoff model ("hbv") and want to obtain the optimal parameter values, but my difficulty is the selection of the likelihood function (the code in RStudio and its interpretation).
Relevant answer
Answer
I would start with the normal likelihood function. See the attached to get started. Best wishes David Booth
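A minimal sketch of what such a normal error likelihood could look like in R, assuming a hypothetical simulate_hbv() wrapper that returns simulated runoff for a given parameter vector (the wrapper, start_values and observed_runoff are placeholders, not an existing package function):
# negative log-likelihood assuming i.i.d. normal errors between the observed
# and simulated runoff series; the last element of 'par' is the log of the error sd
neg_log_lik <- function(par, obs) {
  sim   <- simulate_hbv(par[-length(par)])   # hypothetical wrapper around the HBV model
  sigma <- exp(par[length(par)])             # keeps the error sd positive
  -sum(dnorm(obs, mean = sim, sd = sigma, log = TRUE))
}
# fit <- optim(start_values, neg_log_lik, obs = observed_runoff)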
  • asked a question related to Likelihood Functions
Question
7 answers
If I have a likelihood function with four parameters to maximise, where two parameters come from the distribution function under study and the other two come from certain assumptions, is there any method to find the best estimates, or to optimise the estimates, based on criteria such as unbiasedness, consistency, efficiency, etc.?
Relevant answer
Answer
Ahlam H. Tolba and Jochen Wilhelm thank you for showing your interest in the question. I am studying lifetime models; to be more specific, my interest is to estimate the parameters (2 parameters coming from the distribution, let us say A and B) of a competing risks model with two risks competing for the life of the individual/unit at the same time. In addition, the lifetime is assumed to be a fuzzy number (the remaining 2 parameters, let us say a and b, come from here). Now, the problem is that until now I have optimised the likelihood for parameters A and B keeping a and b fixed. But on changing the values of a and b arbitrarily, I have observed that the bias and mean squared error change, and the bias even changes its sign from negative to positive. So my question is: can we estimate a and b as well, to get unbiased estimates, or more consistent or more efficient estimates?
  • asked a question related to Likelihood Functions
Question
9 answers
Hi,
I'm a fish biologist and I'm interested in assessing the uncertainty around the L50, which is the (sex-specific) length (L) at which you expect 1 fish out of 2 (50%) to exhibit developed gonads and thus, participate in the next reproductive event.
Using a GLM with a binomial distribution family and a logit link, you can get the prediction from your model with the predict() function in R on the logit (link) scale, asking it to also generate the estimated SE (se.fit=TRUE), and then back-transform the result (i.e., fit) onto the response scale.
For the uncertainty (95%CI), one can estimate the commonly-used Wald CIs by multiplying the SE by ± 1.96 on the logit scale and then back-transform these values onto the response scale (see the Figure below). From the same logistic regression model, one can also estimate the CI on the response scale with the Delta method, using the "emdbook" package and its deltavar() function or the "MASS" package and its dose.p() function, still presuming that the sampling distribution of the linear predictor on the link scale is approximately normal, which does not always hold true.
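For concreteness, a minimal sketch of that Wald back-transformation (the fitted object 'mod' and the prediction frame 'newd' are hypothetical placeholders):
# 'mod' is a fitted glm(maturity ~ length, family = binomial); 'newd' holds the lengths of interest
pr  <- predict(mod, newdata = newd, type = "link", se.fit = TRUE)
fit <- plogis(pr$fit)                        # prediction back on the response (probability) scale
lwr <- plogis(pr$fit - 1.96 * pr$se.fit)     # lower Wald limit, back-transformed
upr <- plogis(pr$fit + 1.96 * pr$se.fit)     # upper Wald limit, back-transformed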
Regarding the profile likelihood function, which seems to better reflect the sometimes non-normal distribution of the variance on the logit scale when compared to the two previous methods (Brown et al. 2003), it unfortunately seems that no R package exists to estimate CIs of logistic regression model predictions according to this approach. You can, however, get the profile likelihood CI estimates for your Beta parameters with the confint() function or using the "ProfileLikelihood" package, but regarding a logistic regression prediction, it seems that one would need to write one's own R scripts, which we will likely end up doing.
Any suggestion would be welcome. Either regarding specifically the profile likelihood function (Venzon & Moolgavkar 1988) or any advice/idea on this topic.
Briefly, we are currently trying to find out which of these methods (and others: parametric and non-parametric bootstrapping, Bayesian credible intervals, Fieller analytical method) is/are the most optimal at assessing the uncertainty around the L50 for statistical/biological inferences, pushing a bit further the simulation study of Roa et al (1999).
Thanks, Merci, Obrigado
Julien
Brown, L. D., T. T. Cai, and A. DasGupta. 2003. Interval estimation in exponential families. Statistica Sinica 13:19-49.
Roa, R., B. Ernst, and F. Tapia. 1999. Estimation of size at sexual maturity: an evaluation of analytical and resampling procedures. Fishery Bulletin 97:570-580.
Venzon, D. J., and S. H. Moolgavkar. 1988. A method for computing profile-likelihood based confidence intervals. Applied Statistics 37:87-94
Relevant answer
Answer
Hi Salvatore,
Thanks for your suggestion.
My collaborator and I will be comparing in fact 7 different methods to assess the uncertainty (95% CI) in a logistic regression model prediction, i.e. the L50 in our case (i.e., the length at which a fish has a 50% chance of exhibiting mature gonads):
1- Delta method
2- Wald-based method
3- Profile likelihood (the one for which an R package would be great)
4- Non-parametric bootstrapping
5- Parametric bootstrapping (less common)
6- Fieller (1944) analytical method
7- Bayesian credible intervals
The profile likelihood function is of interest to us, as it better models the sometimes non-normal variance in the linear predictors on the link scale. Moreover, as pointed out by Royston (2007) regarding the advantage of the profile likelihood over the Wald and bootstrap approaches:
"Both examples indicate that using profi le likelihood improves on normal-based CIs. The main reason is that the likelihood-ratio statistic tends to approach its asymptotic distribution more rapidly than the equivalent Wald statistic [...] In principle, one could do so also by using the bootstrap. However, for CI calculations not assuming normality, the bootstrap can become computationally expensive. The bootstrap is usually assumed valid, even in small samples, but that may not be so; the bootstrap gives only asymptotically correct results."
And Zheng et al. (2012) concluded that the profile likelihood function should be preferred over the non-parametric bootstrapping approach, as the latter performed more poorly in controlling type 1 error.
Looking forward to seeing what our study will reveal, but we'll need to figure out how to adequately quantify the profile likelihood CI for this.
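One possible way to script this ourselves, sketched for a hypothetical glm(maturity ~ length, family = binomial) fit on data 'fish': for a candidate L50 value m, refit the model constrained to pass through p = 0.5 at length m (no intercept, predictor length − m) and keep the values of m whose deviance stays within the chi-square cutoff of the unconstrained fit, i.e. invert the likelihood-ratio test in the spirit of Venzon & Moolgavkar (1988):
full <- glm(maturity ~ length, family = binomial, data = fish)
L50  <- -coef(full)[1] / coef(full)[2]               # point estimate of L50

# deviance of the model constrained so that p = 0.5 exactly at length m
dev_at <- function(m) deviance(glm(maturity ~ 0 + I(length - m), family = binomial, data = fish))

cutoff <- deviance(full) + qchisq(0.95, 1)           # 95% profile-likelihood cutoff
# the bracketing widths below (+/- 20 length units) are arbitrary and data-dependent
lower <- uniroot(function(m) dev_at(m) - cutoff, c(L50 - 20, L50))$root
upper <- uniroot(function(m) dev_at(m) - cutoff, c(L50, L50 + 20))$root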
Thanks again for your suggestion.
Royston, P. 2007. Profile likelihood for estimation and confidence intervals. The Stata Journal 7:376-387.
Zheng, H., Z. Liao, S. Liu, W. Liang, X. Zhang, and C. Ou. 2012. Comparison of three methods in estimating confidence intervals and hypothesis testing of logistic regression coefficients. Journal of Mathematical Medicine 4:393-396.
  • asked a question related to Likelihood Functions
Question
19 answers
Hi everyone.
I have a question about finding a cost function for a problem. I will ask the question in a simplified form first, then I will ask the main question. I'll be grateful if you could possibly help me with finding the answer to any or both of the questions.
1- What methods are there for finding the optimal weights for a cost function?
2- Suppose you want to find the optimal weights for a problem in which you can't measure the output (e.g., death). In other words, you know the contributing factors to death but you don't know the weights, and you don't know the output because you can't really test or simulate death. How can we find the optimal (or sub-optimal) weights of that cost function?
I know it's a strange question, but it has so many applications if you think of it.
Best wishes
  • asked a question related to Likelihood Functions
Question
8 answers
I have already estimated the threshold that I will consider, created the pdf of the extreme values that I will take into account and currently trying to fit to this a Generalized Pareto Distribution. Thus, I need to find a way to estimate the values of the shape and scale parameter of the Generalized Pareto Distribution. 
Relevant answer
Answer
Mojtaba Mohammadian When using the mean excess plot, you look to choose the threshold at a section of the plot where there is stability (i.e. near horizontal). You choose the smallest value of the threshold for this region to reduce bias, but at the expense of the variance. Therefore you must have a compromise between these two opposing measures.
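If the remaining question is how to fit the shape and scale once the threshold is fixed, a minimal maximum-likelihood sketch in R (exceedances = data − threshold; the ξ → 0 exponential limit is deliberately not handled, and 'exceedances' is a placeholder for your own vector):
# negative log-likelihood of the Generalized Pareto Distribution for exceedances y > 0;
# par = c(log(scale), shape)
gpd_nll <- function(par, y) {
  sigma <- exp(par[1]); xi <- par[2]
  z <- 1 + xi * y / sigma
  if (any(z <= 0)) return(Inf)                 # outside the GPD support
  sum(log(sigma) + (1 / xi + 1) * log(z))
}
# fit <- optim(c(log(sd(exceedances)), 0.1), gpd_nll, y = exceedances)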
  • asked a question related to Likelihood Functions
Question
2 answers
Hi, I have questions about HLM analysis.
I got results saying 'Iterations stopped due to small change in likelihood function'
First of all, is this an error? I think it needs to keep iterating until it converges. How can I make it keep computing without this message? (The type of likelihood was restricted maximum likelihood, so I tried full maximum likelihood but got the same message.) Can I fix this by setting a higher '% change to stop iterating' in the iteration control settings?
Relevant answer
Answer
In general, convergence is when there is no meaningful change in the likelihood!
I cannot tell if this is a warning or it has been successful.
You may want to have a look at this:
This is the HLM 8 Manual ; I suggest you search it for the word "convergence"
  • asked a question related to Likelihood Functions
Question
4 answers
What is the difference between Maximum Likelihood Sequence Estimation and Maximum Likelihood Estimation? Which one is a better choice in case of channel non-linearities? And why and how oversampling helps in this?
Relevant answer
Answer
The Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a specific model. It selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data.
  • asked a question related to Likelihood Functions
Question
3 answers
I have estimated the parameters by Maximum Likelihood Estimation (MLE) and the Probability Weighted Moments (PWM) method. I wish to construct the L-moment ratio diagram to graphically demonstrate that the empirical (L-skewness, L-kurtosis) coordinates of my financial asset sample lie close to the GL distribution (say), but the picture is very clumsy in R. I want to customize it and make it neat, and hence I need the freedom to work in a spreadsheet. Besides, an Excel sheet is more intuitive. Could you kindly share it, sir? I shall be grateful to you. I am willing to cite this work in my references and put it in the acknowledgements section of my thesis, a copy of which I shall send you by next July. Please.
Relevant answer
Answer
Rudolph Ilaboyar I would like to know if I can also have access to your algorithm for L-moments; I am working on a frequency analysis and this tool would be very useful for me. I promise to use it correctly and cite the corresponding author. Thank you, Camila Gordon, my e-mail is camila_alejandra.nocua_gordon@ete.inrs.ca
  • asked a question related to Likelihood Functions
Question
13 answers
In general terms, Bayesian estimation provides better results than MLE. Is there any situation where Maximum Likelihood Estimation (MLE) gives better results than Bayesian estimation methods?
Relevant answer
Answer
I think that your answer may vary depending on what you consider as better results. In your case, I will assume that you are referring to better results in terms of smaller bias and mean squared error. As stated above, if you have poor knowledge and assume a prior that is very far from the true value, the MLE may return better results. In terms of bias, if you work hard you can remove the bias of the MLE using formal rules and you will get better results in terms of bias and MSE. But if you would like to look at it as point estimation, the MLE can be seen as the MAP when you assume a uniform prior distribution.
On the other hand, the question is much more profound in terms of treating your parameter as a random variable and including uncertainty in your inference. This kind of approach may assist you during the construction of the model, especially if you have a complex structure, for instance, hierarchical models (with many levels) are handled much easier under the Bayesian approach.
  • asked a question related to Likelihood Functions
Question
4 answers
Hansen, Gupta and Azzalini each give a density and distribution for the skew Student's t, and from the density and distribution of Hansen (1994) he introduces the conditional variance into the log-likelihood, even though in the pdf no substitution or assumption appears to be made.
Relevant answer
Answer
Hi,
I have one question:
do you know the cumulative distribution function of the skewed Student's t distribution?
  • asked a question related to Likelihood Functions
Question
3 answers
Dear friends: I have a myriad of small time-series data sets, which are normally distributed (example: daily electricity demand in a location by hours 1 through 24; you can clearly observe that during peak hours there is high demand for electricity, and it then dies out during evening and night hours). If I plot the demand curves under a normal distribution, I need a mathematical way to find the most alike shapes among the curves. My initial thought was to use maximum likelihood estimation (MLE) to find the most alike shapes, because we know the mean and standard deviation of each data set. Are there any other methods commonly used within the quant community to solve this problem mathematically? Please note I am not a big fan of using the least squares method (LSE) because it is prone to errors - I found data sets with a similar sum of squared errors whose shapes were NOT similar. I appreciate your advice.
Relevant answer
Answer
But LSE is just a special case of MLE...
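A quick numerical illustration of that point, using simulated data (all names made up): with i.i.d. normal errors, maximising the Gaussian likelihood returns the same coefficients as least squares.
set.seed(1)
x <- rnorm(50); y <- 2 + 3 * x + rnorm(50)

ls_fit <- coef(lm(y ~ x))                            # least squares

# maximum likelihood under i.i.d. normal errors (sd parameterised on the log scale)
nll    <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))
ml_fit <- optim(c(0, 0, 0), nll, control = list(reltol = 1e-12))$par[1:2]

round(ls_fit, 3); round(ml_fit, 3)                   # agree up to optimiser tolerance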
  • asked a question related to Likelihood Functions
Question
1 answer
I want to estimate the unknown parameters of a normal distribution by MLE for my data set of weights of newborn babies from the Nepal Demographic and Health Survey, and I want to construct a confidence interval for the parameters.
Relevant answer
Answer
If H is the Hessian of the log-likelihood evaluated at the maximum, then the diagonal elements of (-H)^(-1) are the estimated variances. Their square roots are the standard errors (SE), and the approximate 95% (Wald) confidence intervals are given by the MLE ± 2*SE. If the variance of the response is not known, the SEs are scaled so that the ratios of the MLEs and the scaled SEs are t-distributed. You would then get the confidence interval from the quantiles of the t-distribution.
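A minimal sketch of that recipe in R for a normal model, with simulated weights standing in for the survey data (all names and numbers are placeholders):
w <- rnorm(200, mean = 3.1, sd = 0.5)        # placeholder for the birth-weight data

# negative log-likelihood of N(mu, sd), with the sd parameterised on the log scale
nll <- function(p) -sum(dnorm(w, mean = p[1], sd = exp(p[2]), log = TRUE))

fit   <- optim(c(mean(w), log(sd(w))), nll, hessian = TRUE)
se    <- sqrt(diag(solve(fit$hessian)))      # fit$hessian is -H, since we minimised the negative log-likelihood
ci_mu <- fit$par[1] + c(-2, 2) * se[1]       # approximate 95% Wald CI for the mean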
  • asked a question related to Likelihood Functions
Question
2 answers
Can any molecular fellow who is good at applied math and programming help solve the following problem:
How can I build the likelihood function for the MKn model when there are more than 5 character states?
Many thx!
Regards
Z
Relevant answer
Answer
Great thanks.
  • asked a question related to Likelihood Functions
Question
2 answers
I am learning how to estimate parameters by MLE using MATLAB, but the part about custom likelihood functions is a little complicated for me. I have done some exercises but didn't succeed. So, I would like to know who can give me some examples of that, especially for estimating parameters using a custom likelihood function.
  • asked a question related to Likelihood Functions
Question
3 answers
Dear All,
We know that the MLEs maximize the likelihood function and they are restricted to be in the parameter space. What about weighted least squares (WLS) estimators? WLS estimators also minimize a quantity, but are WLS estimators restricted to be in the parameter space? Thanks
Relevant answer
Answer
@ Dear Mr. S.M.T.K. MirMostafaee,
From my own experience, I can state that LSM Estimators can be applied only for estimation of numerical values. These include
- parameters of some parametrized model (see, for instance, parameters identification of ARMA models in [1]);
- estimations of some physical values based on the redundant sensors readouts in the fault-tolerant measuring systems (see, for instance, [2]).
I think that this list is incomplete and could be augmented with additional examples. WLS estimators can improve these evaluations only by increasing or decreasing the contribution of each estimated value in the total performance index (total square error) using numerical weight matrices, but they cannot extend the class of problems to be solved.
Therefore, you can evaluate the essence of the estimation problem which you have to solve, and take a decision.
1. L. Ljung. SYSTEM IDENTIFICATION: Theory for the User. University of Linköping. Sweden. PTR Prentice Hall, Englewood Cliffs, New Jersey.
2. V.B. Larin, A.A. Tunik. Fault–Tolerant Strap-Down Inertial Navigation Systems with External Corrections. Appl. and Comp. Math. An Intern. Journ.- Vol.14, No.1, 2015.-pp. 23-36.
  • asked a question related to Likelihood Functions
Question
5 answers
In Bayesian inference, the likelihood function may make it impossible to obtain the posterior distribution in some cases. In such cases, is there any alternative approach (apart from MCMC-alternative methods) to the likelihood function for Bayesian inference?
Relevant answer
Answer
Dear brother Çağatay Çetinkaya, you can use approximate Bayesian methods like Integrated Nested Laplace Approximation, developed by Rue, H., Martino, S. and Chopin, N. (2009). An R package named "INLA" is available, or you will simply find a lot about it at http://www.r-inla.org
  • asked a question related to Likelihood Functions
Question
1 answer
I want to implement the conjugate gradient method to find the global minimum of the minus likelihood function in the space of dynamical trajectories. I was trying to compute the gradient function and the Hessian matrix using MATLAB, but I could not manage to implement it. Could someone help me with an idea or code (either Python or MATLAB) on how to proceed? Further reading: an article by Dmitry, L. and V. N. Smelyanskiy, Reconstruction of stochastic nonlinear dynamical models from trajectory measurements.
Thank you for your cooperation!!
  • asked a question related to Likelihood Functions
Question
5 answers
Hello,
I am working with an outcome variable that follows a count (Poisson) distribution.
I have 3 IVs that follow a normal distribution and 1 DV that follows a count distribution. Thus, I'd like to compute a Negative Binomial Regression.
Yet, instead of Maximum Likelihood Estimation, I would like to use a Bayesian inference approach to estimate my Negative Binomial Regression model.
But I really cannot manage (yet) to understand how to compute a Bayesian Negative Binomial Regression in R.
I would be really delighted and grateful if someone could provide any help in this regard,
Thank you!
Sincerely,
Nicolas
Relevant answer
Answer
First, note that the distribution of IVs does not matter in regression models.
The brms package in R provides Bayesian negative binomial regression. The command for a full model would be:
brm(DV ~ IV1 * IV2, family = "negbinomial", data = YourData)
You can extract and interpret the results in much the same way as Poisson regression, which I describe in chapter 7.4 of my book draft:
  • asked a question related to Likelihood Functions
Question
4 answers
I am trying to predict a more confident mean value for the coefficients of systems, determined by their number of failures and the linked consequences, by Bayesian analysis, though I am confused about how to calculate the likelihood. Most research papers address the number of failures in specific time intervals, but I am using the number of failures and its impact on reliability. What distribution or model can I use to determine the likelihood function? Any relevant paper or material, specifically from the railway industry if there is any, would be welcome.
Relevant answer
Answer
Dear Mary Nawaz, What is the progress in the solving of the problem you have aimed at. Go slowly, think deeply, work out perfectly. Success is then certain.
  • asked a question related to Likelihood Functions
Question
4 answers
The conditional variance is specified to follow some latent stochastic process in some empirical applications of volatility modelling. Such models are referred to as stochastic volatility (SV) models which were originally proposed by Taylor (1986). The main issue in univariate SV model estimations is that the likelihood function is hard to evaluate because, unlike the estimation of GARCH family models, the maximum-likelihood technique has to deal with more than one stochastic error process. Nevertheless, recently, several new estimation methods such as quasi-maximum likelihood, Gibbs sampling, Bayesian Markov chain Monte Carlo, simulated maximum likelihood have been introduced for univariate models.
I would like to know whether any of aforementioned estimation methods have been extended to multivariate stochastic volatility models? Could anyone recommend any code, package or software with regard to the estimation of multivariate stochastic volatility models?
Relevant answer
Answer
Following
  • asked a question related to Likelihood Functions
Question
5 answers
We are using the generalized stepping-stone sampling method to calculate MLE (marginal likelihood estimation) values. Then, after getting these MLE values from BEAST, how can we calculate the Bayes factor?
Relevant answer
Answer
You can calculate the value of logBF from the formula logBF = logPr(D|M1) – logPr(D|M2); a value in the range 3-5 indicates strong support for M1 as a better fit to the data.
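As a quick worked example with made-up numbers (not from any real BEAST run):
logML_M1 <- -5230.4                 # hypothetical stepping-stone marginal likelihood, model 1
logML_M2 <- -5234.1                 # hypothetical value for model 2
logBF    <- logML_M1 - logML_M2     # = 3.7, i.e. strong support for M1 on that scale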
  • asked a question related to Likelihood Functions
Question
2 answers
Many authors have pointed out the problems that arise from using coefficient alpha when assumptions are violated. Coefficient omega is perhaps the most popular alternative; however, as far as I know (I may be wrong since I haven't read McDonald's original work), this coefficient is based on CFA. Since CFA assumes large samples, how large should sample size be to compute coefficient omega?
Relevant answer
Answer
Large samples are required to estimate alpha or omega precisely. As the sample size increases, the CI for alpha or omega will narrow.
The linked paper shows you how to plan sample sizes for particular precision of estimation of omega or alpha. Note that the sample sizes are not dissimilar for the two.
  • asked a question related to Likelihood Functions
Question
2 answers
After perusing fuzzy sets literature I have yet to find a way of defuzzifying a mean's range (i.e. its lower and upper confidence levels) rather than the mean itself. Is there an output procedure that can handle this?
For example, suppose I have a mean of 1, and -- accounting for standard error and sample size -- I find that its lower confidence level (LCL) is 0.9 and its upper confidence level (UCL) is 1.2 at the 95% level.
Taking the LCL of 0.9 and UCL of 1.2, obviously 0.1 of the range falls within the membership function of < 1 and 0.2 of the range falls within the membership function of 1 < 2. Thus, 33.33% of the range is < 1 (at a 0.333 membership degree) and 66.66% of the range is 1 < 2 (at a 0.666 membership degree).
Is there a way of defuzzifying this range? Or must fuzzification only be applied to one unit of measure at a time (i.e. the mean itself)? Thank you for reading!
Relevant answer
Answer
  • asked a question related to Likelihood Functions
Question
2 answers
How can I find the VaR vector from a copula, say if I have found the parameters of a t copula...
Relevant answer
Answer
Dear Ye Liu, I am interested in looking for the VaR of a univariate r.v. which is a function of several or all of my margins.
  • asked a question related to Likelihood Functions
Question
7 answers
The first model has 2 predictor variables and I want to know if the predictive power of the model increases with the addition of the 3rd predictor variable?
Relevant answer
Answer
Akaike Information Criterion is one possibility.
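For instance, a minimal sketch of that comparison in R, written here for a binomial GLM with hypothetical variable names (the same pattern works for any likelihood-based model):
m2 <- glm(outcome ~ x1 + x2,      family = binomial, data = d)   # model with 2 predictors
m3 <- glm(outcome ~ x1 + x2 + x3, family = binomial, data = d)   # with the 3rd predictor added
AIC(m2, m3)                         # lower AIC suggests better expected predictive performance
anova(m2, m3, test = "Chisq")       # likelihood-ratio test for the added predictor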
  • asked a question related to Likelihood Functions
Question
6 answers
can anyone correct my syntax plzz for this equations
M=@SUM(speciality(i):@sum(priority,type_of_patient(j,k):Y(i,j,k))*(@exp(mu(i)*@sqrt(sigma))));
V=@SUM(speciality(i):@sum(priority,type_of_patient(j,k):Y(i,j,k))*(@exp(sigma(i))-1)*(@exp(2*mu(i)+sigma(i))));
Relevant answer
Answer
Dear Dina,
This error emerges since you define M as an attribute in the set "speciality". This set basically has 3 members:
speciality:d,var,mu,sigma,M,V,T;
So, you have to remove M, V, and T from the attributes:
speciality /1..3/:d,var,mu,sigma;
priority /1..4/;
Once you do that, the error is resolved. However, another error emerges because you wrote @sqrt(sigma) in ...))*(@exp(mu(i)*@sqrt(sigma))); without the index "i", although sigma is defined as an attribute with one index and three data values. It should be sigma(i).
Best Wishes
  • asked a question related to Likelihood Functions
Question
3 answers
This is the error 
the code is here 
MODEL:
!TRIAL FOR REACTIVE SCHEDULING;
SETS:
! indecies;
! d= surgical duration of speciality i;
! var= variance of surgical duration of speciality i;
! mu,sigma= lognormal distrbution parameters for speciality i;
! M= The sum of the expected surgical durations assigned;
! v= the variance of the sum of the expected surgical durations;
! T= time remaining in the theatre;
! t= the surgical durations;
speciality:d,var,mu,sigma,M,V,T;
priority;
type_of_patient;
allocation(speciality,priority,type_of_patient):X,Y,E;
! X: initially assigned patients
Y: newly assigned patients
E: avaliable patients;
cost_benefit(priority,type_of_patient):C;
ENDSETS
DATA:
speciality = 1 2 3;
priority = 1 2 3 4;
type_of_patient= elective, emergency;
X = 6;
E = 10;
C = 10 15
8 10
4 5
1 3;
mu = 2.78 , 3.38, 3.42;
sigma= 0.674, 0.561 ,0.779;
ENDDATA
! objective function;
min = @sum(allocation(i,j,k):C(j,k)*(X(i,j,k)-Y(i,j,k))+@abs(X(i,j,k)-Y(i,j,k))); ! to minimize the cost of the new schedule;
!the constriant about the number of the patients assigned cannot exceed the number avaliable;
Y(i,j,k)<= E(i,j,k);
calc:
! the calculation of d;
d(speciality(i)) = @EXP(mu(i)+ (sigma(i)/2));
!the cal of variance;
var((speciality(i))) = (@EXP(sigma(i))-1)*(@EXP(2*mu(i)+sigma(i)));
!the cal of The sum of the expected surgical durations assigned;
M=@SUM(speciality(i):@sum(priority,type_of_patient(j,k):Y(i,j,k))*(@exp(mu(i)*@sqrt(sigma))));
!the cal of The sum of the variance of the expected surgical durations assigned;
V=@SUM(speciality(i):@sum(priority,type_of_patient(j,k):Y(i,j,k))*(@exp(sigma(i))-1)*(@exp(2*mu(i)+sigma(i))));
! the cal of the amount of the time that is planned for the patients assigned to the theatre;
T =>(@SQR(M)/(@SQRT(V+ @SQR(M))))*(@EXP(@SQRT(@LOG((V+@SQR(M)/@SQR(M))))));
@FOR(allocation(i,j,k): Y(i,j,k)=> 0);
endcalc
END
Relevant answer
Answer
Actually, this solution didn't work and it showed me this error
  • asked a question related to Likelihood Functions
Question
3 answers
Consider a signal F1(x) with a variance var1, and a signal F2(x) = a*F1(x-b) with another variance var2.
So, what becomes the variance var3 of the signal F3(x) resulting from:
F3(x) = F1(x) + F2(x).
Relevant answer
Answer
Some important identities:
Var(aX-b) = a^2 Var(X)
Var(X1 + X2) = Var(X1) + Var(X2) + 2*Covar(X1,X2)
Covar(X,aX) = a*Var(X)
If we assume no lag autocorrelation exists in your signal then Covar(X,X(x-b))=0
Thence Var(F3) = Var(F1) + Var(F2) + 2*Covar(F1,F2)
                       = Var(F1) + a^2*Var(F1) +  0
                       = (1+a^2) * Var(F1)
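A quick numerical check of that identity under the no-autocorrelation assumption (the constants a, b and the sample size below are chosen arbitrarily):
set.seed(42)
a <- 0.7; b <- 5; n <- 1e5
F1 <- rnorm(n)                                  # white noise, so no autocorrelation at lag b
F2 <- a * c(rep(0, b), F1[1:(n - b)])           # a * F1 shifted by b samples
F3 <- F1 + F2
c(empirical = var(F3), formula = (1 + a^2) * var(F1))   # the two values agree closely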
  • asked a question related to Likelihood Functions
Question
6 answers
Suppose f(x) is a piecewise continuous function defined for [a,∞).
If an improper integral ∫[a,∞)f(x)dx converges, what can we say about limit(x→∞) f(x)?
Since we assume the improper integral converges, we can say there is some c such that
inf(x>c)f(x)<=0
sup(x>c)f(x)>=0
otherwise ∫[c,∞)f(x)dx>=inf(x>c)f(x)×(∞-c)=∞ or ∫[c,∞)f(x)dx<=sup(x>c)f(x)×(∞-c)=-∞, which contradicts the assumption.
So we can say
1. If limit(x→∞) f(x) converges, it converges to 0;
2. limit(x→∞) f(x) can't diverge.
What else can we say?
For example, if limit(x→∞) f(x) oscillates around 0 perpetually without converging, is it possible that the improper integral ∫[a,∞)f(x)dx converges?
And how about on others?
Best regards
Relevant answer
Answer
Perpetual oscillation is possible. Let us imagine a smooth nonnegative function f(x) defined on [0, ∞) that is unimodal on each integer interval [N, N+1] and whose integral there is just 1/N². Then the improper integral converges. This function is highly flexible.
If we take a sharper bell curve whose maximum on the interval [N, N+1] is N (keeping the integrals the same), the function itself is divergent even though the integral converges.
Making the function oscillate alternately around zero is also possible.
Fujimoto's second observation is not very exact. We should instead say that 
2'.    limit(x→∞) f(x) = ∞ is impossible.  
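A classical closed-form example of the oscillating case raised in the question is the Fresnel integral
\int_0^{\infty} \sin(x^2)\,dx = \sqrt{\pi/8},
which converges even though \sin(x^2) keeps oscillating between -1 and 1, so \lim_{x\to\infty}\sin(x^2) does not exist.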
  • asked a question related to Likelihood Functions
Question
2 answers
I would like to compute the omega coefficient for a bifactor model with residual variance.
May I know if adjustment/modification is needed for the formula?
By the way, I used the Watkins (2013)'s Omega software to compute the omega and omega (h).
Relevant answer
Answer
  • asked a question related to Likelihood Functions
Question
1 answer
I have seen many examples of ELM being examined as a moderator, especially when examining the impact on attitudes. However, I am wondering if ELM has ever been examined as a mediator to examine the impact on attitudes? Can you refer me to some examples?
Relevant answer
Answer
I am struggling with your use of the concept of Elaboration likelihood model. As far as I know, the ELM is a model describing how recipients of information (most often information aiming to change the attitude of the recipient) elaborate on this information - and consequently how effective this information is. This can be with low involvement on a peripheral route or with high involvement on a central route. Which route is taken depends on the level of involvement, and causes attitudinal change (if it works) that differs in stability/ longevity.
Are you talking about involvement of the information-recipient/information quality/sender characteristics/ etc.  as a mediator, instead of a moderator?
  • asked a question related to Likelihood Functions
Question
2 answers
Please help. Can a Tobit regression output have a negative pseudo R2 and a positive log likelihood while the probability chi-square is significant at the 1 percent level?
Relevant answer
Answer
Thanks  for the answer Genidy.
  • asked a question related to Likelihood Functions
Question
2 answers
I have a partial adjustment regression with dummy variables among the regressors. N=17 and T=18. Because of the dummies I can't use GMM methods. How can I solve the autocorrelation problem?
this is my model EIB = C(1)*EIB(-1) + C(2)*PB + C(3)*PG + C(4)*PN + C(5)*PK + C(6)*PL + C(7)*RD + C(8)*D1 + C(9)*D2 + C(10)*DS + C(11)
D1,D2 are industrial dummies and DS indicates time dummy.
Relevant answer
Answer
Dear Ehsan,
Greetings,
Thank you so much.
Good luck!
  • asked a question related to Likelihood Functions
Question
3 answers
Hello everyone,
Using exploratory methods to track changes in the syntactic preferences of constructions over time, I was wondering if anybody has ever conceived of time (e.g. decades) as a continuous variable in statistical analysis.
For instance, I have a corpus that covers the period between the 1830's and the 1920's (10 decades) and I would like to divide my dataset into, say, 5 clusters of decades.
Discrete time:
- 1830-1840
- 1850-1860
- 1870-1880
- ...
Continuous time:
- 1830-1850
- 1850-1870
- 1870-1890
- ...
What do you think? Knowing that this could be feasible only in exploratory analysis, not in predictive analysis (regression models).
Thanking you in advance!
Relevant answer
Answer
Thank you Ann Christina Foldenauer. It helps a lot, indeed! The thing is that I created clusters of decades right because the second outcome of the binary variable is underrepresented in my dataset (as it is often the case in linguistic studies) if the time variable is continuous. I've never used mixed models, only (multiple) linear and binary logistic models. Basically I created the clusters of decades in order to create Multiple Correspondence Analysis maps in which the configuration of the categories would not be affected by data sparseness.
  • asked a question related to Likelihood Functions
Question
3 answers
I have a mosaic plot (done by statisticians) that reports a likelihood ratio, and I want to know what this ratio actually represents. The mosaic plot summarizes the relationship between my protein of interest and tumor grade. As far as I know, the likelihood ratio in this case comes from a chi-square test and is not the same as the likelihood ratio used in diagnostic testing? I am just not sure what it represents and what the null hypothesis would be. Any feedback will be greatly appreciated.
Relevant answer
Answer
Just a correction to the slide: the information given for the likelihood ratio is chi-square = 9.931 and Prob > chi-square = 0.0070.
  • asked a question related to Likelihood Functions
Question
5 answers
What is the difference between joint distribution function and likelihood function?
Relevant answer
Answer
Let X be a random variable having probability density function f(., theta) and let X_1, X_2, ..., X_n be a random sample from f(.). Then the joint distribution of this sample is f(X_1, X_2, ..., X_n; theta). If you look at this function as a function of theta, i.e. f(theta; X_1, X_2, ..., X_n), then it is called the likelihood.
The likelihood function is defined as the joint density function of the observed data treated as a function of the parameter theta.
According to Lehmann, the likelihood function is a function of the parameter only, with the data held as a fixed constant. Note that the likelihood is not the probability of the parameter.
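A small R illustration of the two readings of the same formula, using made-up data and a known sd of 1: evaluated at the observed sample it is the joint density, while plotted against theta with the sample held fixed it is the likelihood curve.
obs <- c(4.9, 5.3, 4.7, 5.1, 5.0)                       # fixed observed sample (made up)
# joint density of the sample, read as a function of the parameter theta
lik <- function(theta) prod(dnorm(obs, mean = theta, sd = 1))
curve(Vectorize(lik)(x), from = 3, to = 7, xlab = "theta", ylab = "L(theta; obs)")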
  • asked a question related to Likelihood Functions
Question
7 answers
In our models, we compute the likelihood of the whole data generated in one day (by summing up the log-likelihoods of the single data instances). However, the data generated in one time interval may be more than in the next day. This makes the log-likelihood of the days with many data instances smaller than that of the days with fewer data instances. How can I compare the likelihood for different sizes of data?
Relevant answer
Answer
I don't think that there is a really sound way to compare likelihoods obtained from different amounts of data (likelihoods can be used to compare different models *given* some fixed set of data).
Since the log-likelihood of a data set is the sum of the log-probabilities of the values in the data set, I would assume that the average log-likelihoods of two samples with unequal sizes are comparable.
However, (log-)likelihoods on similar but different data (but the same model) can be quite different, especially for smaller samples (n<50). So the comparison of such (average log-)likelihoods between different data sets is associated with a huge uncertainty. (Note: this solution with the average won't work, for instance, for Cauchy-distributed data.)
To get an impression you may use the following R script that produces a plot showing how much the average log likelihoods scatter, depending on the sample size (from n=5, 6, 7, ... up to n=100; each sample size is analyzed 100 times with different random data). The data are random values from a standard normal distribution, and the likelihood is evaluated for the same distribution. You can see that (i) the results scatter relatively much, (ii) especially for small sample sizes, and (iii) they seem to converge for n -> Inf.
nn = rep(5:100, each=100)   # sample sizes 5..100, each analyzed 100 times with new random data
# average log-likelihood of n standard-normal draws, evaluated under the true N(0,1) density
aveLogLiks = sapply(nn, function(n) mean(dnorm(rnorm(n),0,1,log=TRUE)) )
plot(aveLogLiks~jitter(nn), pch=".")   # scatter of the average log-likelihood against n
  • asked a question related to Likelihood Functions
Question
5 answers
Fisher information [1] is a way of measuring the amount of information that an observable random variable X has about an unknown parameter u, upon which the likelihood function of u depends.
On the other hand we can characterize the sensitive dependence on initial conditions by measuring the distance between two close points on a statistical differential manifold.
Now the question is: "Is there any relationship between the amount of information and the distance of two close points?"
[1] Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society, A, 220, 309–368.
Relevant answer
Answer
Chaos, in my opinion, is a loss of memory of a signal to itself during a finite time. As nearby points in phase space get closer, this is an information gain; as the distance increases, this is an information loss. When the average information gain equals the average information loss, this is the state of chaos. This is why a chaotic attractor is limited in phase space. You can simply imagine the change in initial condition as a disturbance to a system having three equilibrium points satisfying Shilnikov conditions: one is unstable at the origin and amplifies the disturbance until it reaches the stable equilibrium, then it is damped again toward the origin. This can be characterized by the Lyapunov exponent.
  • asked a question related to Likelihood Functions
Question
3 answers
I am writing an article about Fagan's nomogram. I want to add an example for calculation of post-test probability using Fagan's nomogram. I need an online tool for accurate presentation of pre-test probability and +ve Likelihood ratio on the nomogram.
If that is not possible, what is the best solution? Should I seek help from a graphic designer?
Relevant answer
Answer
I found this one:
It works, but it doesn't look very nice. I would suggest you develop your own app using R and Shiny (http://shiny.rstudio.com/).
There is a `fagan.plot` function in the TeachingDemos package (http://www.inside-r.org/packages/cran/TeachingDemos/docs/fagan.plot) that could get you started with the app. 
Good luck!
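If an exact number is needed alongside the nomogram graphic, the underlying calculation is just Bayes' theorem in odds form (the numbers below are made up):
pre_prob  <- 0.30                          # hypothetical pre-test probability
LR_pos    <- 8                             # hypothetical positive likelihood ratio
pre_odds  <- pre_prob / (1 - pre_prob)
post_odds <- pre_odds * LR_pos
post_prob <- post_odds / (1 + post_odds)   # about 0.77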
  • asked a question related to Likelihood Functions
Question
4 answers
Model discrepancy in such cases?
Relevant answer
Answer
A likelihood function equal to a Dirac delta is a degenerate case, as the Dirac delta is not a function. It would mean that, after the observations, you are completely sure about the value of the parameter and all possible uncertainty is removed. In theory you can deal with such cases, but in practice they are impossible. If your question arises from a practical situation, the observation model should be reviewed.
For such likelihood functions, discrepancy analysis is also degenerate, as the final (posterior) distribution of the parameter/s is concentrated in a single value of the parameter/s.
  • asked a question related to Likelihood Functions
Question
1 answer
Control
Relevant answer
Answer
You can get a similar or appropriate answer by searching for the keyword on the Google Scholar page. Usually the first paper you get will be similar to your keyword.
From my experience, God willing, this approach will help you a lot. If you still have a problem, do not hesitate to let me know.
Kind regards, Dr ZOL BAHRI - Universiti Malaysia Perlis, MALAYSIA
  • asked a question related to Likelihood Functions
Question
6 answers
Does anyone know a citable paper in which the marginal likelihood of a normal distribution with unknown mean and variance is derived?
A short sketch of what the procedure should look like: the joint probability is given by P(X, mu, sigma2 | alpha, beta), where X is the data. Factorising gives P(X | mu, sigma2) x P(mu | sigma2) x P(sigma2). Integrating out mu and sigma2 should yield the marginal likelihood function.
I found several papers that work with the marginal likelihood for the linear regression model with a normal prior on the betas and an inverse gamma prior on sigma2 (see e.g. Fearnhead & Liu, 2007), or that derive the posterior distribution of the unknown parameters, but not the marginal likelihood.
I hope the question was understandable and anyone may help me.
Greetz,
Sven.
Relevant answer
Answer
Hey,
so as far as I can tell the derivation in Xuan agrees with my point - that you can't integrate over an inverse gamma prior on the variance unless the prior on sigma2 is linked to the prior on the mean (beta in this case). The horizontal arrow in Figure 2.1 of Xuan is crucial! It is this assumption that makes 2.22 still come out as Gaussian with variance proportional to sigma2, and not some horrible combination of parameters.
The fact that the priors are linked is very important, and often under-appreciated. To understand this, imagine a slider that allows you to change the value of sigma2 in the Xuan model. As you move this slider you change the variance of your likelihood AND the variance of your prior on beta. This is not the same as a model with independent priors on beta and sigma2, in which moving the slider would have no effect on the prior on beta.
Connecting the priors up in this way is exactly equivalent to assuming a joint normal-inverse-gamma prior on both parameters - in other words the results of Xuan and Greenberg that you found are identical (notice that Xuan 2.25 is the same as Greenberg's final answer, aside from a change of variables).
As for what you want to do, I am slightly confused. You want to assume a Gaussian likelihood and integrate over a normal-inverse-gamma prior on means and variances? If so then isn't the result in Greenberg already exactly what you want?
I'm interested to hear where this goes!
Bob
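For reference, the standard closed-form result under the conjugate (linked) normal-inverse-gamma prior with hyperparameters \mu_0, \kappa_0, \alpha_0, \beta_0 is
p(x_{1:n}) = (2\pi)^{-n/2} \sqrt{\frac{\kappa_0}{\kappa_n}} \, \frac{\Gamma(\alpha_n)}{\Gamma(\alpha_0)} \, \frac{\beta_0^{\alpha_0}}{\beta_n^{\alpha_n}},
with \kappa_n = \kappa_0 + n, \; \alpha_n = \alpha_0 + n/2, \; \beta_n = \beta_0 + \tfrac{1}{2}\sum_i (x_i - \bar{x})^2 + \frac{\kappa_0 n (\bar{x} - \mu_0)^2}{2(\kappa_0 + n)},
which is exactly the linked-prior case discussed above; with fully independent priors on the mean and the variance the integral has no comparable closed form.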
  • asked a question related to Likelihood Functions
Question
5 answers
Short history:
As I understand it, the normalized (profile) likelihood can be used to get the CI of an estimate. "Normalized" means that the function will have a unit integral.
This works beautifully for the binomial distribution (the normalized likelihood is a beta-density). For the expectation of the normal distribution the dispersion parameter needs to be considered -> profile likelihood, and the normalized profile likelihood is the t-density.
My problem:
Now I thought I should be able go a similar way with gamma-distributed data. The ready solution in R seems to be fitting a gamma-glm and use confint(). Here is a working example:
y = c(269, 299, 437, 658, 1326)
mod = glm(y~1, family=Gamma(link=identity))
(coef(mod)-mean(y) < 1E-4) # TRUE
confint(mod, level=0.9)
Ok, now my attempt to solve this problem step-by-step by hand:
The likelihood function, reparametrized to mu = scale*shape:
L = Vectorize(function(mu,s) prod(dgamma(y,shape=mu/s,scale=s)))
L.opt = function(p) 1/L(p[1],p[2])
optim(par=c(100,100), fn=L.opt, control=c(reltol=1E-20))
The MLE for mu is equal to the coefficient of the model. The next lines produce a contour plot of the likelihood function with a red line indicating the maximum depending on mu:
m = seq(200,1500,len=100)
s = seq( 50,1500,len=100)
Lmat = outer(m,s,L)
Rmat = Lmat/max(Lmat)
contour(m,s,Rmat,levels=seq(0.1,1,len=10))
s.max = function(mu) optimize(function(s) L(mu,s), lower=1, upper=2000, maximum=TRUE)$maximum
lines(m,Vectorize(s.max)(m),col=2,lty=3)
The next step is to get the profile likelihood (and look at a plot):
profileL = Vectorize(function(mu) L(mu,s.max(mu)))
plot(profileL, xlim=c(200,2000))
Then this profile likelihood is normalized to get a unit integral:
AUC = integrate(profileL,0,Inf,rel.tol=1E-20)$value
normProfileL = function(mu) profileL(mu)/AUC
(integrate(normProfileL,0,Inf)$value-1 < 1E-4) # TRUE
plot(normProfileL, xlim=c(200,2000))
Now obtain the 90% CI as the quantiles for which the integral of this normalized profile likelihood is 0.05 and 0.95:
uniroot(function(mu) integrate(normProfileL,0,mu)$value-0.05, lower=0, upper=5000)$root
uniroot(function(mu) integrate(normProfileL,0,mu)$value-0.95, lower=0, upper=5000)$root
This is 413.9 ... 1557.
confint(mod,level=0.9) # gives 364.8 ... 1076
This seems to be not just a rounding error.
Where am I wrong?
Relevant answer
Answer
Dear Dr.,
There is a relation between the normal distribution and the gamma distribution, so I suggest using a transformation to the normal distribution, then estimating the CI for the normal distribution and finally transforming the CI back to the gamma distribution.
Best Regards.
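One further thing worth checking in the question's derivation: confint() for a GLM inverts a likelihood-ratio-type statistic rather than integrating the normalized profile likelihood, so the two constructions need not agree. A rough sketch of the cutoff idea, reusing the profileL() function defined in the question (only illustrative, since confint() for a Gamma GLM additionally scales by the estimated dispersion):
logProfileL <- function(mu) log(profileL(mu))
mu.hat <- optimize(logProfileL, c(200, 2000), maximum = TRUE)$maximum
cutoff <- logProfileL(mu.hat) - qchisq(0.90, 1) / 2       # drop of half a chi-square quantile
# the bracketing intervals below are ad hoc for these data
lower <- uniroot(function(mu) logProfileL(mu) - cutoff, c(200, mu.hat))$root
upper <- uniroot(function(mu) logProfileL(mu) - cutoff, c(mu.hat, 3000))$root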
  • asked a question related to Likelihood Functions
Question
1 answer
Hi, I'm new to ML / phylogenetic software and have a question regarding the importance of ML values. Authors usually provide ML values, describe branches within trees as either strongly or weakly supported, and use ML values to justify their inferences. The following information is cut from the RAxML manual (section 2.1) regarding ML values: 'My personal opinion is that topological search (number of topologies analysed) is much more important than exact likelihood scores in order to obtain a good tree'. This suggests to me that bootstrapping is more important than ML values; does anyone have an opinion on this? Support values in the ML trees I have made are <50%, and I used 1000 bootstraps; however, the topologies and placing of taxa in my trees are not too dissimilar to those in the literature. Should I be concerned about ML support values <50%?
Relevant answer
Answer
I think you should be concerned.
  • asked a question related to Likelihood Functions
Question
3 answers
What external factors can cause power system components outage? What is their occurrence likelihood? And how can it be calculated?
Relevant answer
Answer
Dear Salah,
I hope I understood your question as you wanted it to be recognized.
Numerous factors can ruin power electronic components. First let us eliminate mishandling - a major factor during assembly.
External factors to be considered include:
Solar load - often seen in hot environments when devices are subject to solar radiation leading to unexpected thermal burden regarding ambient temperature.
Cosmic radiation - a statistical failure depending on geographic position
Short circuit due to damage to external wires - happens quite often
Degradation of heat sinks due to dust - very common
Hazardous gases like SO2 in industrial environments - depending on location
Humidity as in tropical environments
Vandalism, sabotage or mechanical impacts like fork lifters hitting a cabinet (no kidding)
Voltage sags with quickly returning voltage that damages rectifier diodes due to high surge currents -  very common in weak grids like India and China
Lightning Strikes - happens more often than expected
For sure the list is not complete, but before going on, please let me know if this is what you expected.
  • asked a question related to Likelihood Functions
Question
7 answers
I would be very appreciative if someone could help me. What I want to do is:
1. Compare two linear multiple regressions. I want to know what coping strategy best predicts burnout. The first model has eight coping strategies. The second has 7 coping strategies (I removed one coping strategy from first model based on the Z statistic).
2. From here I want to test for parsimony using a likelihood ratio test to compare likelihood values of each model based on a chi-square distribution.
I have no idea how to do the second step, which has been recommended by my research supervisor. Does anyone have any suggestions? I am using SPSS.
Thanks!
Julian
Relevant answer
Answer
Is it possible to use a variable reduction process like principal components or factor analysis (assuming there is a latent construct)? This way you can use the data that was collected.
Otherwise, the likelihood ratio test (LRT) or Wald test would work as well. I do not know how to do either of them in SPSS for regression, nor did I find an answer in the stats books I have. I can give you the actual formula to calculate the log-likelihood for regression, and then use the log-likelihood to calculate the Chi2 stat for the LRT. Not sure how much help that would be for you, though.
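For reference, a minimal sketch of the likelihood-ratio comparison in R rather than SPSS (the burnout and coping variable names are hypothetical placeholders):
full    <- lm(burnout ~ cope1 + cope2 + cope3 + cope4 + cope5 + cope6 + cope7 + cope8, data = d)
reduced <- update(full, . ~ . - cope8)                    # drop the strategy removed in model 2

LR <- as.numeric(2 * (logLik(full) - logLik(reduced)))    # likelihood-ratio statistic
pchisq(LR, df = 1, lower.tail = FALSE)                    # p-value on 1 df (one term removed)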
  • asked a question related to Likelihood Functions
Question
1 answer
I need the syntax or command for a 2-stage conditional maximum likelihood estimation using STATA.
Relevant answer
  • asked a question related to Likelihood Functions
Question
4 answers
I have done stepping stone analysis in MrBayes to estimate the marginal likelihood for two models. I have got the following results:
Each table shows the marginal likelihood values (ln) for the two runs and the mean for each model.
Model 1 (M1) =
Run Marginal likelihood (ln)
------------------------------
1 -22944.69
2 -22932.89
------------------------------
Mean: -22933.58
Model 2 (M2)
Run Marginal likelihood (ln)
------------------------------
1 -22936.41
2 -22940.16
------------------------------
Mean: -22937.08
Which model is better? Which one I should reject and why?
Relevant answer
Answer
Thank you very much Alexander
  • asked a question related to Likelihood Functions
Question
4 answers
Likelihood functions for incorporating tagging data into age-structured stock assessment models are well known, but I am struggling to find such functions for using tagging data in the context of a global production model (e.g. the Schaefer model).
Relevant answer
Answer
Hi Salar. The question is quite simple: when you have a time series of tagging data (mark-recaptures), how can you use it as an index of abundance (e.g. how to build a likelihood for it) when you are modelling the population dynamics using a global model? Hope it is clearer now. Cheers, Rodrigo
  • asked a question related to Likelihood Functions
Question
2 answers
This is still my old problem...
Obviously, the likelihood and the "sampling distribution" are related. The shape of the likelihood converges to the shape of a normal distribution with mean = xbar and variance = s²/n (the difference is just a scaling factor to make the integral equal unity).
Consider normal distributed data, the variance s² being a nuisance parameter. Ignoring the uncertainty of s², the likelihood again is similar to the normal sampling distribution. The confidence interval can be obtained from the likelihood as the central range covering 100(1-a)% of the area under the likelihood curve. This is explained for example in the attached link, p. 31, last paragraph.
The author shows that the limits of the intervals are finally given by sqrt(chi²(a))*se (equation 13) and states that sqrt(chi²(a)) is the same as the a/2-quantile of the normal distribution. Then he writes "The test uses the quantile of a normal distribution, rather than a Student t distribution, because we have assumed the variance is known.". Ok so far.
How is this done in the case when the variance is unknown? I suppose one would somehow come to sqrt(F)*se, so that sqrt(F) is the quantile of a t-distribution...
Other question: How is the likelihood (joint in mu and sigma²) related to the t-distribution? Is there a way to express this in terms of conditional and/or marginal likelihoods? This could possibly help me to understand the principle when there are nuisance parameters in general.
Relevant answer
Answer
Like in this lecture:
(slides 22-24)?
It's clear that the philosophical background of the likelihood and the t-distribution is different, but are the shapes of these functions different? I can't show this analytically, but numerically they seem to have identical shapes (is there an analytical proof of proportionality of the profile likelihood and the t?).
I also understood that the limits of the profile likelihood confidence intervals are obtained as the intersections of the profile likelihood function with 1/8 or 1/16 of its maximum. It is clear that these values do not correspond exactly to the 0.025 and 0.975-quantiles of the t. But (essentially the same question as before): if one would normalize the profile likelihood to get a unit integral and take this as a probability density, would the 0.025 and 0.975-quantiles then be identical to the limits of the 95% CI as given by the t-distribution?
  • asked a question related to Likelihood Functions
Question
46 answers
In many texts I find the statement that the likelihood is a normal-Gamma, e.g. here:
I never found anything explaining HOW this is derived. It is always nicely shown how the likelihood for a known variance is derived (-> normal with mean mu and variance sigma²/n).
Relevant answer
Answer
Jochen, first of all let us agree that what you are doing has nothing to do with likelihood per se. I presume that what you are asking is "why do Bayesians commonly use the gamma distribution for this purpose?". Well, one answer is that it is purely mathematical convenience. In fact a true subjective Bayesian would strongly reject any suggestion that such a distribution is likely to be appropriate. In fact for variance components in general there have been a huge number of possible proposals. So if you don't find it obvious it's because it isn't.
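For the algebra the question asks about, a short sketch: writing \tau = 1/\sigma^2 and using \sum_i (x_i-\mu)^2 = \sum_i (x_i-\bar{x})^2 + n(\mu-\bar{x})^2, the likelihood of an i.i.d. normal sample is
L(\mu, \tau) \propto \tau^{n/2} \exp\!\left(-\frac{\tau}{2}\Big[\sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\mu - \bar{x})^2\Big]\right),
which, read as a function of (\mu, \tau), has exactly the kernel of a normal-gamma density with location \bar{x}, precision scale n, shape (n+1)/2 and rate \tfrac{1}{2}\sum_i (x_i-\bar{x})^2. This shape equivalence is what makes the normal-gamma the mathematically convenient conjugate family mentioned above.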