Likelihood Functions - Science topic
Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
Questions related to Likelihood Functions
I am doing an uncertainty analysis of a rainfall-runoff model (HBV) and want to get the optimal parameter values, but the difficulty is selecting the likelihood function. (I would appreciate code for RStudio and an interpretation.)
If I have a likelihood function with four parameters to maximize, where two parameters come from the distribution function under study and the other two come from certain assumptions, is there any method to find the best estimates, or to optimize the estimates, based on criteria such as unbiasedness, consistency and efficiency?
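A minimal sketch of one common starting point, assuming the residuals (observed minus simulated discharge) are independent and Gaussian with constant variance, an assumption that often fails for streamflow, where errors are usually autocorrelated and heteroscedastic. It is written in Python; `gaussian_loglik` and `nash_sutcliffe` are hypothetical helper names, and in R the same log-likelihood is `sum(dnorm(obs, mean = sim, sd = sigma, log = TRUE))`. The discharge values are made up for illustration.

```python
import math


def gaussian_loglik(obs, sim, sigma=None):
    """Log-likelihood of observed flows given simulated flows under i.i.d.
    Gaussian errors. If sigma is None it is replaced by its MLE sqrt(SSE/n),
    giving the concentrated (profile) log-likelihood."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    n = len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    if sigma is None:
        # plugging in sigma^2 = SSE/n reduces the quadratic term to n/2
        return -0.5 * n * (math.log(2.0 * math.pi * sse / n) + 1.0)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency, an informal goodness-of-fit measure."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical observed and simulated discharges (illustration only)
obs = [1.2, 3.4, 2.8, 5.1, 4.0]
sim_a = [1.0, 3.1, 3.0, 4.8, 4.2]
sim_b = [2.0, 2.0, 2.0, 2.0, 2.0]
print(gaussian_loglik(obs, sim_a), gaussian_loglik(obs, sim_b))
print(nash_sutcliffe(obs, sim_a), nash_sutcliffe(obs, sim_b))
```

Maximizing this concentrated Gaussian log-likelihood is equivalent to minimizing the sum of squared errors, so it ranks parameter sets the same way as least squares or the Nash-Sutcliffe efficiency; an error model with autocorrelation or flow-dependent variance would change that ranking.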
I'm a fish biologist and I'm interested in assessing the uncertainty around the L50, which is the (sex-specific) length (L) at which you expect 1 fish out of 2 (50%) to exhibit developed gonads and thus, participate in the next reproductive event.
Using a GLM with a binomial family and a logit link, you can get predictions from the model on the logit (link) scale with the predict() function in R, also requesting the estimated standard errors (se.fit=TRUE), and then back-transform the fitted values (fit) to the response scale.
For the uncertainty (95% CI), one can estimate the commonly used Wald CIs by adding ± 1.96 × SE on the logit scale and then back-transforming these values to the response scale (see the Figure below). From the same logistic regression model, one can also estimate the CI on the response scale with the delta method, using the "emdbook" package and its deltavar() function or the "MASS" package and its dose.p() function, still presuming that the linear predictor on the link scale is approximately normally distributed, which does not always hold true.
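A minimal Python sketch of the two calculations just described, assuming you have already extracted the fitted value and its SE on the logit scale (from predict(..., se.fit=TRUE)) and the coefficient estimates and their covariance matrix (coef() and vcov() in R). The helper names are hypothetical and the numbers are illustrative only. For the model logit(p) = b0 + b1*L, L50 = -b0/b1, and its delta-method SE should correspond to what dose.p() reports for p = 0.5.

```python
import math


def inv_logit(eta):
    """Inverse logit: back-transformation from the link scale."""
    return 1.0 / (1.0 + math.exp(-eta))


def wald_ci_response(fit, se_fit, z=1.96):
    """Wald interval built on the logit scale, then back-transformed."""
    return inv_logit(fit - z * se_fit), inv_logit(fit), inv_logit(fit + z * se_fit)


def l50_delta(b0, b1, var_b0, var_b1, cov_b01, z=1.96):
    """L50 = -b0/b1 with a delta-method standard error and Wald-type CI."""
    l50 = -b0 / b1
    # gradient of g(b0, b1) = -b0/b1
    g0 = -1.0 / b1
    g1 = b0 / b1 ** 2
    var = g0 ** 2 * var_b0 + g1 ** 2 * var_b1 + 2.0 * g0 * g1 * cov_b01
    se = math.sqrt(var)
    return l50, se, (l50 - z * se, l50 + z * se)


# Illustrative (hypothetical) values
print(wald_ci_response(0.4, 0.25))
print(l50_delta(-10.0, 0.5, 4.0, 0.01, -0.19))
```

The delta-method interval for L50 is symmetric by construction, which is one reason profile-likelihood or Fieller intervals can differ from it when b1 is imprecisely estimated.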
The profile likelihood approach seems to better reflect the sometimes non-normal sampling distribution on the logit scale than the two previous methods (Brown et al. 2003), but unfortunately no R package seems to exist to estimate CIs of logistic regression model predictions with this approach. You can, however, get profile likelihood CI estimates for the beta parameters with the confint() function or the "ProfileLikelihood" package, but for a logistic regression prediction it seems one would need to write one's own R scripts, which we will likely end up doing.
Any suggestions would be welcome, either specifically regarding the profile likelihood approach (Venzon & Moolgavkar 1988) or any other advice or ideas on this topic.
Briefly, we are currently trying to find out which of these methods (and others: parametric and non-parametric bootstrapping, Bayesian credible intervals, Fieller's analytical method) is/are best at assessing the uncertainty around the L50 for statistical/biological inference, pushing the simulation study of Roa et al. (1999) a bit further.
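One way such a script could look, sketched in Python under stated assumptions: reparametrize the model as logit(p) = b1*(L - L50) so that L50 is itself a parameter, maximize the likelihood over b1 alone (Newton steps) for each fixed L50, and keep as the 95% interval all L50 values whose profile deviance 2*(l_max - l_p(L50)) stays below the chi-square(1) 0.95 quantile (about 3.84). This is a direct reparametrization-plus-bisection approach, not the Venzon & Moolgavkar (1988) algorithm itself; the grouped data are hypothetical, and the profile deviance is assumed to rise monotonically away from the MLE within a window of 10 length units on each side.

```python
import math
from statistics import NormalDist

# Hypothetical grouped maturity data (illustration only)
LENGTHS = [20, 22, 24, 26, 28, 30, 32, 34]  # length class
N_FISH = [10] * 8                            # fish examined per class
N_MATURE = [0, 1, 2, 4, 6, 8, 9, 10]         # fish with developed gonads

def sigmoid(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softplus(t):
    """log(1 + exp(t)) without overflow."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def loglik(b0, b1):
    """Binomial log-likelihood (coefficients omitted) for logit(p) = b0 + b1*L."""
    total = 0.0
    for x, n, y in zip(LENGTHS, N_FISH, N_MATURE):
        eta = b0 + b1 * x
        total += -y * softplus(-eta) - (n - y) * softplus(eta)
    return total

def fit_full(iters=100):
    """Unconstrained MLE of (b0, b1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, n, y in zip(LENGTHS, N_FISH, N_MATURE):
            p = sigmoid(b0 + b1 * x)
            r, w = y - n * p, n * p * (1.0 - p)
            s0, s1 = s0 + r, s1 + r * x
            i00, i01, i11 = i00 + w, i01 + w * x, i11 + w * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

def profile_loglik(theta, iters=200):
    """Max over b1 of the log-likelihood with L50 fixed at theta (b0 = -b1*theta)."""
    b1, ll = 0.0, loglik(0.0, 0.0)
    for _ in range(iters):
        score = info = 0.0
        for x, n, y in zip(LENGTHS, N_FISH, N_MATURE):
            u = x - theta
            p = sigmoid(b1 * u)
            score += u * (y - n * p)
            info += u * u * n * p * (1.0 - p)
        if info == 0.0:
            break
        step = score / info
        new_ll = loglik(-(b1 + step) * theta, b1 + step)
        while new_ll < ll and abs(step) > 1e-14:  # step-halving keeps Newton uphill
            step /= 2.0
            new_ll = loglik(-(b1 + step) * theta, b1 + step)
        b1, ll = b1 + step, new_ll
        if abs(step) < 1e-10:
            break
    return ll

def profile_ci_l50(level=0.95, half_width=10.0):
    b0, b1 = fit_full()
    l50_hat, ll_max = -b0 / b1, loglik(b0, b1)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2  # chi-square(1) quantile

    def excess(theta):
        return 2.0 * (ll_max - profile_loglik(theta)) - crit

    def bisect(inside, outside):  # assumes excess(inside) < 0 < excess(outside)
        for _ in range(60):
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return l50_hat, bisect(l50_hat, l50_hat - half_width), bisect(l50_hat, l50_hat + half_width)

print(profile_ci_l50())
```

The same construction carries over to R by replacing the hand-written Newton steps with a glm() fit of the reparametrized model (offset-free, no intercept, covariate L - theta) inside a root-finding call such as uniroot().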
Thanks, Merci, Obrigado
Brown, L. D., T. T. Cai, and A. DasGupta. 2003. Interval estimation in exponential families. Statistica Sinica 13:19-49.
Roa, R., B. Ernst, and F. Tapia. 1999. Estimation of size at sexual maturity: an evaluation of analytical and resampling procedures. Fishery Bulletin 97:570-580.
Venzon, D. J., and S. H. Moolgavkar. 1988. A method for computing profile-likelihood based confidence intervals. Applied Statistics 37:87-94.
I have a question about finding a cost function for a problem. I will ask the question in a simplified form first, then I will ask the main question. I'll be grateful if you could help me find the answer to either or both of the questions.
1- What methods are there for finding the optimal weights for a cost function?
2- Suppose you want to find the optimal weights for a problem in which you can't measure the output (e.g., death). In other words, you know the factors contributing to death, but you don't know the weights, and you don't know the output because you can't really test or simulate death. How can we find the optimal (or sub-optimal) weights of that cost function?
I know it's a strange question, but it has so many applications if you think of it.
I have already estimated the threshold that I will consider, created the pdf of the extreme values that I will take into account, and am currently trying to fit a Generalized Pareto Distribution to it. Thus, I need a way to estimate the shape and scale parameters of the Generalized Pareto Distribution.
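A minimal sketch of one simple option, the method of moments, assuming the data are the excesses over your threshold (y = x - u > 0) and that the shape parameter xi is below 1/2, so that the variance exists. Maximum likelihood is usually preferred for the final fit, and these moment estimates are mainly useful as starting values for a numerical optimizer applied to the negative log-likelihood below. Python; the function names are hypothetical and the simulated sample only checks the estimator.

```python
import math
import random


def gpd_method_of_moments(excesses):
    """Method-of-moments (xi, sigma) for GPD(xi, sigma) threshold excesses.
    Uses mean = sigma/(1-xi) and var = sigma^2/((1-xi)^2 (1-2xi)); needs xi < 1/2."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / var           # equals 1 - 2*xi in the population
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (1.0 + ratio)  # equals mean * (1 - xi)
    return xi, sigma


def gpd_negloglik(xi, sigma, excesses):
    """GPD negative log-likelihood (xi != 0), e.g. for a numerical optimizer."""
    if sigma <= 0:
        return math.inf
    total = len(excesses) * math.log(sigma)
    for y in excesses:
        z = 1.0 + xi * y / sigma
        if z <= 0:
            return math.inf  # outside the support
        total += (1.0 + 1.0 / xi) * math.log(z)
    return total


def gpd_sample(xi, sigma, n, rng):
    """Inverse-CDF draws from a GPD with xi != 0 (used here only for checking)."""
    return [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


rng = random.Random(1)
ys = gpd_sample(0.1, 2.0, 200_000, rng)
print(gpd_method_of_moments(ys))
```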
Hi, I have questions about HLM analysis.
I got results saying 'Iterations stopped due to small change in likelihood function'
First of all, is this an error? I think it needs to keep iterating until it converges. How can I make it keep computing without this message? (The type of likelihood was restricted maximum likelihood; I tried full maximum likelihood but got the same message.) Can I fix this by setting a higher '% change to stop iterating' in the iteration control settings?
I have estimated the parameters by Maximum Likelihood Estimation (MLE) and by Probability Weighted Moments (PWM). I wish to construct an L-moment ratio diagram to show graphically that the empirical (L-skewness, L-kurtosis) coordinates of my financial asset sample lie close to, say, the GL distribution, but the plot looks very cluttered in R. I want to customize it and make it neat, so I need the freedom to work in a spreadsheet; besides, an Excel sheet is more intuitive. Could you kindly share it? I would be grateful. I am willing to cite this work in my references and acknowledge you in my thesis, a copy of which I will send you next July.
In general terms, Bayesian estimation provides better results than MLE. Is there any situation where Maximum Likelihood Estimation (MLE) gives better results than Bayesian estimation?
Hansen, Gupta and Azzalini each give a density and distribution function for a skewed Student's t distribution. Starting from the density of Hansen (1994), the conditional variance is introduced into the log-likelihood, even though the pdf itself involves no such substitution or assumption.
Dear friends: I have a myriad of small time-series data sets that are normally distributed (for example, daily electricity demand at a location for hours 1 through 24: demand is clearly high during peak hours and then dies out in the evening and at night). If I plot the demand curves as normal distributions, I need a mathematical way to find the curves with the most similar shapes. My initial thought was to use Maximum Likelihood Estimation (MLE) to find the most similar shapes, because we know the mean and standard deviation of each data set. Are there other commonly used methods in the quant community to solve this problem mathematically? Please note that I am not a big fan of the least squares method (LSE) because it is prone to errors: I found data sets with similar sums of squared errors whose shapes were NOT similar. I appreciate your advice.
I want to estimate the unknown parameters of a normal distribution by MLE from my data set of newborn baby weights from the Nepal Demographic and Health Survey, and to construct confidence intervals for the parameters.
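A minimal sketch for computing the sample coordinates and a theoretical curve as plain numbers that can be pasted into a spreadsheet, assuming the unbiased probability-weighted-moment estimators of Hosking and assuming "GL" here means the generalized logistic (GLO) distribution, whose L-kurtosis curve is tau4 = (1 + 5 tau3^2)/6; for another distribution the curve formula must be replaced. Python; function names are hypothetical.

```python
import csv
import io
import math


def sample_l_moment_ratios(data):
    """Sample L-moments from unbiased probability-weighted moments b0..b3.
    Returns (l1, l2, t3, t4) with t3 = L-skewness and t4 = L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x, start=1):  # j = rank, 1 = smallest
        w = 1.0
        for r in range(4):
            if r > 0:
                w *= (j - r) / (n - r)  # builds (j-1)...(j-r) / ((n-1)...(n-r))
            b[r] += w * xj
    b = [br / n for br in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2


def glo_curve_csv(n_points=41):
    """CSV text of the GLO curve tau4 = (1 + 5 tau3^2) / 6 for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tau3", "tau4_GLO"])
    for i in range(n_points):
        t3 = -1.0 + 2.0 * i / (n_points - 1)
        writer.writerow([f"{t3:.3f}", f"{(1.0 + 5.0 * t3 * t3) / 6.0:.4f}"])
    return buf.getvalue()


# Check on exponential quantiles, whose population ratios are t3 = 1/3, t4 = 1/6
n_q = 2000
expo = [-math.log(1.0 - (i - 0.5) / n_q) for i in range(1, n_q + 1)]
print(sample_l_moment_ratios(expo))
print(glo_curve_csv()[:60])
```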
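A minimal sketch of how the conditional variance usually enters, assuming Hansen's (1994) standardized skewed t with tail parameter eta > 2 and skewness lambda in (-1, 1), which has mean 0 and variance 1 by construction (the constants a, b, c below are as we recall them from that paper, so please check them against the original). If eps_t = sqrt(h_t) * z_t with z_t following that density g, the change of variables gives f(eps_t | h_t) = g(eps_t / sqrt(h_t)) / sqrt(h_t), so each log-likelihood term is log g(z_t) - 0.5 log h_t and no further assumption about the pdf is needed. Function names are hypothetical, and h_t would come from your variance equation (e.g., GARCH).

```python
import math


def hansen_skewt_logpdf(z, eta, lam):
    """Log density of Hansen's (1994) standardized skewed t (mean 0, variance 1)."""
    if not (eta > 2.0 and -1.0 < lam < 1.0):
        raise ValueError("need eta > 2 and -1 < lam < 1")
    c = math.exp(math.lgamma((eta + 1) / 2) - math.lgamma(eta / 2)) / math.sqrt(math.pi * (eta - 2))
    a = 4.0 * lam * c * (eta - 2) / (eta - 1)
    b = math.sqrt(1.0 + 3.0 * lam * lam - a * a)
    side = 1.0 - lam if z < -a / b else 1.0 + lam
    q = (b * z + a) / side
    return math.log(b * c) - (eta + 1) / 2 * math.log1p(q * q / (eta - 2))


def skewt_loglik(eps, h, eta, lam):
    """Sum of log g(eps_t / sqrt(h_t)) - 0.5*log(h_t); the Jacobian term
    -0.5*log(h_t) is where the conditional variance enters."""
    total = 0.0
    for e, ht in zip(eps, h):
        total += hansen_skewt_logpdf(e / math.sqrt(ht), eta, lam) - 0.5 * math.log(ht)
    return total


def check_moments(eta, lam, lo=-60.0, hi=60.0, steps=120_000):
    """Trapezoid-rule integrals of g, z*g and z^2*g (should be about 1, 0, 1)."""
    dz = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps + 1):
        z = lo + i * dz
        wgt = 0.5 if i in (0, steps) else 1.0
        g = math.exp(hansen_skewt_logpdf(z, eta, lam))
        m0 += wgt * g
        m1 += wgt * z * g
        m2 += wgt * z * z * g
    return m0 * dz, m1 * dz, m2 * dz


print(check_moments(6.0, 0.3))
```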
I am learning how to estimate parameters by MLE in MATLAB, but the part with a custom likelihood function is a little complicated for me. I have done some exercises but didn't succeed, so I would appreciate it if someone could give me some examples, especially of estimating parameters with a custom likelihood function.
We know that MLEs maximize the likelihood function and are restricted to lie in the parameter space. What about weighted least squares (WLS) estimators? WLS estimators also minimize a quantity, but are they restricted to the parameter space? Thanks
In Bayesian inference, the likelihood function may be unavailable in some cases, so the posterior distribution cannot be obtained. In these cases, is there any alternative to the likelihood function for Bayesian inference (apart from alternatives to MCMC methods)?
I want to implement the conjugate gradient method to find the global minimum of the minus likelihood function in the space of dynamical trajectories. I tried to compute the gradient and the Hessian matrix in MATLAB, but I could not manage to implement it. Could someone help me with an idea or code (either Python or MATLAB) on how to proceed? Further reading: an article by Dmitry, L. and V. N. Smelyanskiy, "Reconstruction of stochastic nonlinear dynamical models from trajectory measurements".
Thank you for your cooperation!!
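A minimal sketch of nonlinear conjugate gradients (Polak-Ribiere+ with a backtracking Armijo line search and restarts), applied to a toy negative log-likelihood rather than the trajectory model of the cited paper: i.i.d. normal data with parameters (mu, log sigma). It needs only the gradient, not the Hessian. CG converges to a local minimum, so for a global minimum one would typically restart it from several initial points. Python, stdlib only; all names and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_minimize(f, grad, x0, tol=1e-9, max_iter=2000):
    """Nonlinear conjugate gradient (Polak-Ribiere+) with Armijo backtracking."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        gg = dot(g, g)
        if math.sqrt(gg) < tol:
            break
        if dot(g, d) >= 0:  # not a descent direction: restart with steepest descent
            d = [-gi for gi in g]
        fx, slope, t = f(x), dot(g, d), 1.0
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
            if t < 1e-16:
                break
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / gg)  # PR+
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


DATA = [4.2, 5.1, 5.8, 4.9, 6.0]  # hypothetical observations


def negloglik(theta):
    """Normal negative log-likelihood (constant dropped), theta = (mu, log sigma)."""
    mu, log_sigma = theta
    if log_sigma < -300:  # avoid overflow in exp(-2*log_sigma)
        return math.inf
    ss = sum((x - mu) ** 2 for x in DATA)
    return len(DATA) * log_sigma + 0.5 * ss * math.exp(-2.0 * log_sigma)


def negloglik_grad(theta):
    mu, log_sigma = theta
    inv_var = math.exp(-2.0 * log_sigma)
    ss = sum((x - mu) ** 2 for x in DATA)
    return [-sum(x - mu for x in DATA) * inv_var, len(DATA) - ss * inv_var]


print(cg_minimize(negloglik, negloglik_grad, [0.0, 0.0]))
```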
I am working with an outcome variable that follows a count (Poisson) distribution.
I have 3 IVs that follow a normal distribution and 1 DV that follows a count distribution, so I'd like to fit a negative binomial regression.
However, instead of maximum likelihood estimation, I would like to use a Bayesian inference approach to estimate my negative binomial regression model.
I have found this reference: https://cran.r-project.org/web/packages/NegBinBetaBinreg/NegBinBetaBinreg.pdf
But I really cannot manage (yet) to understand how to compute a Bayesian Negative Binomial Regression in R.
I would be really delighted and grateful if someone could provide any help in this regard.
I am trying to obtain a more reliable mean value for the coefficients of systems, which are determined by their number of failures and the linked consequences, using Bayesian analysis, but I am confused about how to calculate the likelihood. Most research papers address the number of failures in specific time intervals, but I am using the number of failures and their impact on reliability. What distribution or model can I use to determine the likelihood function? Any relevant papers or material, specifically from the railway industry if there are any, would be welcome.
The conditional variance is specified to follow some latent stochastic process in some empirical applications of volatility modelling. Such models are referred to as stochastic volatility (SV) models, which were originally proposed by Taylor (1986). The main issue in univariate SV model estimation is that the likelihood function is hard to evaluate because, unlike in the estimation of GARCH-family models, the maximum likelihood technique has to deal with more than one stochastic error process. Nevertheless, several new estimation methods, such as quasi-maximum likelihood, Gibbs sampling, Bayesian Markov chain Monte Carlo and simulated maximum likelihood, have recently been introduced for univariate models.
I would like to know whether any of the aforementioned estimation methods have been extended to multivariate stochastic volatility models. Could anyone recommend any code, package or software for estimating multivariate stochastic volatility models?
We are using the generalized stepping-stone sampling method to calculate MLE (marginal likelihood estimation) values. After getting these MLE values from BEAST, how can we calculate the Bayes factor?
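A minimal sketch of what such software does internally, assuming an NB2 parametrization (mean mu_i = exp(b0 + b1*x_i), size r, variance mu + mu^2/r), independent N(0, 10^2) priors on b0, b1 and log r, and a plain random-walk Metropolis sampler with one predictor instead of three. It is Python, the data are simulated and all names are hypothetical. In R, packages such as brms (family = negbinomial()) or rstanarm (stan_glm.nb()) fit this kind of model with far better samplers, so this is for understanding rather than production use.

```python
import math
import random


def nb_logpmf(y, mu, r):
    """NB2 log pmf with mean mu and size (dispersion) r."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def log_posterior(theta, xs, ys, prior_sd=10.0):
    b0, b1, log_r = theta
    r = math.exp(log_r)
    lp = -sum(t * t for t in theta) / (2.0 * prior_sd ** 2)  # N(0, prior_sd^2) priors
    for x, y in zip(xs, ys):
        lp += nb_logpmf(y, math.exp(b0 + b1 * x), r)
    return lp


def metropolis(xs, ys, n_iter=4000, burn_in=1000, step=0.05, seed=0):
    """Random-walk Metropolis; returns post-burn-in draws and acceptance rate."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    lp = log_posterior(theta, xs, ys)
    draws, accepted = [], 0
    for it in range(n_iter):
        prop = [t + rng.gauss(0.0, step) for t in theta]
        lp_prop = log_posterior(prop, xs, ys)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
            accepted += 1
        if it >= burn_in:
            draws.append(list(theta))
    return draws, accepted / n_iter


def simulate_nb(xs, b0, b1, r, rng):
    """Gamma-Poisson mixture draws (which are NB2(mu, r)); Knuth's Poisson sampler."""
    ys = []
    for x in xs:
        lam = rng.gammavariate(r, math.exp(b0 + b1 * x) / r)
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        ys.append(k)
    return ys


rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = simulate_nb(xs, 1.0, 0.5, 2.0, rng)
draws, acc = metropolis(xs, ys)
print("acceptance:", acc)
print("posterior means (b0, b1, log r):", [sum(c) / len(c) for c in zip(*draws)])
```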
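A minimal sketch, assuming the reported values are natural-log marginal likelihoods (as stepping-stone output usually is; check the BEAST log). The Bayes factor is the ratio of the two marginal likelihoods, so on the log scale it is a difference; the evidence categories below are those of Kass & Raftery (1995) on the 2 ln BF scale. Python; the function name and the example values are hypothetical.

```python
def compare_models(ln_ml_1, ln_ml_2):
    """2 ln BF_12 from natural-log marginal likelihoods, the favoured model,
    and the Kass & Raftery (1995) evidence category."""
    two_ln_bf = 2.0 * (ln_ml_1 - ln_ml_2)
    favoured = 1 if two_ln_bf >= 0 else 2
    strength = abs(two_ln_bf)
    if strength < 2:
        label = "not worth more than a bare mention"
    elif strength < 6:
        label = "positive"
    elif strength < 10:
        label = "strong"
    else:
        label = "very strong"
    return two_ln_bf, favoured, label


# Hypothetical log marginal likelihoods for two models
print(compare_models(-1000.0, -1004.0))
```

The Bayes factor itself is exp(ln ML1 - ln ML2); staying on the log scale avoids overflow when the difference is large.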
Many authors have pointed out the problems that arise from using coefficient alpha when its assumptions are violated. Coefficient omega is perhaps the most popular alternative; however, as far as I know (I may be wrong, since I haven't read McDonald's original work), this coefficient is based on CFA. Since CFA assumes large samples, how large should the sample size be to compute coefficient omega?
After perusing the fuzzy-set literature, I have yet to find a way of defuzzifying a mean's range (i.e., its lower and upper confidence limits) rather than the mean itself. Is there an output procedure that can handle this?
For example, suppose I have a mean of 1 and, accounting for the standard error and sample size, I find that its lower confidence limit (LCL) is 0.9 and its upper confidence limit (UCL) is 1.2 at a 95% confidence level.
Taking the LCL of 0.9 and the UCL of 1.2, 0.1 of the 0.3-wide range falls within the membership function for values below 1, and 0.2 falls within the membership function for values from 1 to 2. Thus, 33.3% of the range is below 1 (a membership degree of 0.333) and 66.7% of the range lies between 1 and 2 (a membership degree of 0.667).
Is there a way of defuzzifying this range? Or must fuzzification only be applied to one unit of measure at a time (i.e. the mean itself)? Thank you for reading!
The first model has 2 predictor variables. Does the predictive power of the model increase with the addition of a 3rd predictor variable?
Can anyone please correct the syntax of these equations?
This is the error:
The code is here:
! TRIAL FOR REACTIVE SCHEDULING;
! d = expected surgical duration of speciality i;
! var = variance of surgical duration of speciality i;
! mu, sigma = lognormal distribution parameters for speciality i;
! M = the sum of the expected surgical durations assigned;
! v = the variance of the sum of the expected surgical durations;
! T = time remaining in the theatre;
! t = the surgical durations;
! X: initially assigned patients;
! Y: newly assigned patients;
! E: available patients;
speciality = 1 2 3;
priority = 1 2 3 4;
type_of_patient = elective, emergency;
X = 6;
E = 10;
C = 10 15;
mu = 2.78, 3.38, 3.42;
sigma = 0.674, 0.561, 0.779;
! objective function: minimize the cost of the new schedule;
min = @SUM(allocation(i,j,k): C(j,k)*(X(i,j,k) - Y(i,j,k)) + @ABS(X(i,j,k) - Y(i,j,k)));
! constraint: the number of patients assigned cannot exceed the number available;
! the calculation of d (exp(mu + sigma/2) is the lognormal mean only if sigma holds the variance of the log-duration);
@FOR(speciality(i): d(i) = @EXP(mu(i) + sigma(i)/2));
! the calculation of the variance (same convention for sigma);
@FOR(speciality(i): var(i) = (@EXP(sigma(i)) - 1)*@EXP(2*mu(i) + sigma(i)));
! the calculation of the sum of the expected surgical durations assigned;
! the calculation of the sum of the variances of the expected surgical durations assigned;
! the calculation of the amount of time planned for the patients assigned to the theatre;
T >= (@SQR(M)/@SQRT(V + @SQR(M)))*@EXP(@SQRT(@LOG((V + @SQR(M))/@SQR(M))));
@FOR(allocation(i,j,k): Y(i,j,k) >= 0);
Consider a signal F1(x) with a variance var1, and a signal F2(x) = a*F1(x-b) with another variance var2.
What, then, is the variance var3 of the signal F3(x) given by
F3(x) = F1(x) + F2(x)?
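Expanding the variance of a sum gives var3 = var1 + var2 + 2a·C1(b), where C1(b) is the autocovariance of F1 at lag b; if F1 is stationary, so that shifting does not change its variance, then var2 = a²·var1 and var3 = var1·(1 + a² + 2a·ρ1(b)), with ρ1(b) the lag-b autocorrelation, which lies between (1 - |a|)²·var1 and (1 + |a|)²·var1. A small Python check, assuming a sampled signal and a circular shift (a periodic-signal assumption that makes the identity exact); the names and the test signal are hypothetical.

```python
import math


def variance(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)


def covariance(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)


def var_of_sum(f1, a, b):
    """Return (var3 computed directly, var1 + var2 + 2*a*C1(b))."""
    n = len(f1)
    shifted = [f1[(i - b) % n] for i in range(n)]  # F1(x - b), circular shift
    f2 = [a * s for s in shifted]
    f3 = [p + q for p, q in zip(f1, f2)]
    var1, var2 = variance(f1), variance(f2)
    c_b = covariance(f1, shifted)  # lag-b autocovariance of F1
    return variance(f3), var1 + var2 + 2.0 * a * c_b


signal = [math.sin(0.2 * i) + 0.5 * math.cos(0.05 * i * i) for i in range(500)]
print(var_of_sum(signal, 0.7, 13))
```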
Suppose f(x) is a piecewise continuous function defined on [a,∞).
If the improper integral ∫[a,∞)f(x)dx converges, what can we say about limit(x→∞) f(x)?
Since we assume the improper integral converges, we can say that for every c >= a,
inf(x>c) f(x) <= 0 <= sup(x>c) f(x);
otherwise ∫[c,∞)f(x)dx >= inf(x>c)f(x)×(∞-c) = ∞ or ∫[c,∞)f(x)dx <= sup(x>c)f(x)×(∞-c) = -∞, which contradicts the assumption.
So we can say:
1. if limit(x→∞) f(x) exists, it equals 0;
2. f(x) cannot diverge to +∞ or -∞ as x→∞.
What else can we say?
For example, if f(x) keeps oscillating around 0 without ever converging as x→∞, is it possible for the improper integral ∫[a,∞)f(x)dx to converge?
And what about other cases?
I would like to compute the omega coefficient for a bifactor model with residual variances.
Is any adjustment or modification of the formula needed?
By the way, I used Watkins's (2013) Omega software to compute omega and omega hierarchical (omega-h).
I have seen many examples of the ELM (elaboration likelihood model) being examined as a moderator, especially when examining the impact on attitudes. However, I am wondering whether the ELM has ever been examined as a mediator of the impact on attitudes. Can you refer me to some examples?
I have a partial adjustment regression with dummy variables among the regressors, with N = 17 and T = 18. Because of the dummies I can't use GMM methods. How can I solve the autocorrelation problem?
This is my model: EIB = C(1)*EIB(-1) + C(2)*PB + C(3)*PG + C(4)*PN + C(5)*PK + C(6)*PL + C(7)*RD + C(8)*D1 + C(9)*D2 + C(10)*DS + C(11)
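A minimal sketch of the usual model-based formulas, assuming a bifactor solution with standardized, orthogonal factors in which each item loads on the general factor and on exactly one group factor. The residual variances enter only the denominator: omega total = [(Σλ_g)² + Σ_k(Σλ_k)²] / [(Σλ_g)² + Σ_k(Σλ_k)² + Σθ] and omega hierarchical = (Σλ_g)² / (the same denominator), where θ are the residual (unique) variances. Some authors use the observed total-score variance (the sum of all item correlations) as the denominator instead, which can give slightly different values. Python; the loadings are hypothetical.

```python
def omega_bifactor(general, groups, residuals):
    """Return (omega_total, omega_hierarchical) for an orthogonal bifactor model.
    general: general-factor loadings, one per item
    groups: list of lists, group-factor loadings of the items in each group
    residuals: residual (unique) variances, one per item"""
    g = sum(general) ** 2
    s = sum(sum(grp) ** 2 for grp in groups)
    total = g + s + sum(residuals)  # model-implied variance of the total score
    return (g + s) / total, g / total


# Hypothetical standardized loadings for 6 items in 2 groups of 3
general = [0.6] * 6
groups = [[0.4] * 3, [0.3] * 3]
residuals = [1 - 0.6 ** 2 - 0.4 ** 2] * 3 + [1 - 0.6 ** 2 - 0.3 ** 2] * 3
print(omega_bifactor(general, groups, residuals))
```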
D1,D2 are industrial dummies and DS indicates time dummy.
Using exploratory methods to track changes in the syntactic preferences of constructions over time, I was wondering whether anybody has ever treated time (e.g., decades) as a continuous variable in statistical analysis.
For instance, I have a corpus that covers the period between the 1830s and the 1920s (10 decades), and I would like to divide my dataset into, say, 5 clusters of decades.
What do you think? I am aware that this may be feasible only in exploratory analysis, not in predictive analysis (regression models).
Thanking you in advance!
I have a mosaic plot (produced by statisticians) that reports a likelihood ratio, and I want to know what this ratio actually represents. The mosaic plot summarizes the relationship between my protein of interest and tumor grade. As far as I know, the likelihood ratio in this case is a chi-square test and not the same as the likelihood ratio used in diagnostic testing. Is that right? I am just not sure what it represents and what the null hypothesis would be. Any feedback will be greatly appreciated.
In our models, we compute the likelihood of all the data generated in one day (by summing the log likelihoods of the individual data instances). However, the amount of data generated on one day may be larger than on the next. As a result, days with many data instances have a lower log likelihood than days with fewer instances. How can I compare likelihoods for data sets of different sizes?
Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter u, upon which the likelihood function of u depends.
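In contingency-table output, a "likelihood ratio" chi-square is usually the G² statistic: twice the log of the ratio between the likelihood of the observed (saturated) table and the likelihood under independence, G² = 2 Σ O ln(O/E) with E = row total × column total / n. Under the null hypothesis that protein level and tumor grade are independent, it is approximately chi-square with (r-1)(c-1) degrees of freedom, just like Pearson's X². That this is the statistic your statisticians' software reported is an assumption. A minimal Python sketch with a hypothetical 2 × 2 table:

```python
import math


def likelihood_ratio_chi2(table):
    """G^2 and Pearson X^2 tests of independence for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    g2 = x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if obs > 0:  # 0 * ln(0) is taken as 0
                g2 += 2.0 * obs * math.log(obs / expected)
            x2 += (obs - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return g2, x2, df


# Hypothetical counts: rows = protein low/high, columns = grade 1/2
print(likelihood_ratio_chi2([[10, 20], [20, 10]]))
```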
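Because the total log-likelihood is a sum over instances, it grows in magnitude roughly in proportion to the number of instances, so one common normalization is the average log-likelihood per instance (the negative of the empirical cross-entropy). A minimal Python sketch, assuming for illustration a normal model with known parameters; the data and names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2.0 * sigma * sigma)


def day_scores(day_data, mu=0.0, sigma=1.0):
    """Return (total log-likelihood, average log-likelihood per instance)."""
    total = sum(normal_logpdf(x, mu, sigma) for x in day_data)
    return total, total / len(day_data)


pattern = [0.3, -1.1, 0.4, 0.8, -0.2]
small_day, large_day = pattern * 2, pattern * 20  # same values, 10x more data
print(day_scores(small_day), day_scores(large_day))
```

The totals differ by a factor of ten even though the data fit the model equally well, while the per-instance averages agree.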
On the other hand we can characterize the sensitive dependence on initial conditions by measuring the distance between two close points on a statistical differential manifold.
Now the question is: "Is there any relationship between the amount of information and the distance of two close points?"
Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society, A, 220.
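One standard link, stated here for a one-parameter family: for nearby parameter values θ and θ + δ, the Kullback-Leibler divergence satisfies KL ≈ ½·I(θ)·δ², and the Fisher information is the metric of the Fisher-Rao geometry, ds² = I(θ)·dθ², so more information means that close parameter values correspond to distributions that are further apart. A small numerical check for a Bernoulli(p) model, where I(p) = 1/(p(1-p)) and the Fisher-Rao distance is 2·|arcsin√q - arcsin√p|. Python; names hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def fisher_info_bernoulli(p):
    return 1.0 / (p * (1.0 - p))


def fisher_rao_distance_bernoulli(p, q):
    """Geodesic distance under the metric ds^2 = I(p) dp^2."""
    return 2.0 * abs(math.asin(math.sqrt(q)) - math.asin(math.sqrt(p)))


p, delta = 0.3, 1e-3
info = fisher_info_bernoulli(p)
print(kl_bernoulli(p, p + delta), 0.5 * info * delta ** 2)
print(fisher_rao_distance_bernoulli(p, p + delta), math.sqrt(info) * delta)
```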
I am writing an article about Fagan's nomogram and want to add an example of calculating a post-test probability with it. I need an online tool that accurately places the pre-test probability and the positive likelihood ratio (LR+) on the nomogram.
If that is not possible, what is the best solution? Should I seek help from a graphic designer?
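The nomogram is a graphical form of Bayes' theorem in odds form, post-test odds = pre-test odds × LR, which becomes a straight line once the scales are logarithmic. A minimal Python sketch for checking the value read off the nomogram; the function names and test characteristics are hypothetical.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form, which the Fagan nomogram draws as a line."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


lr_pos = positive_likelihood_ratio(0.90, 0.91)  # hypothetical test
print(lr_pos, post_test_probability(0.20, lr_pos))
```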
Does anyone know a citable paper in which the marginal likelihood of a normal distribution with unknown mean and variance is derived?
A short sketch of what the procedure should look like: the joint probability is given by P(X,mu,sigma2|alpha,beta), where X is the data. Rearranging gives P(X|mu,sigma2) x P(mu|sigma2) x P(sigma2). Integrating out mu and sigma2 should yield the marginal likelihood function.
I have found several papers that work with the marginal likelihood for the linear regression model with a normal prior on beta and an inverse gamma prior on sigma2 (see e.g. Fearnhead & Liu, 2007), or that derive the posterior distribution of the unknown parameters, but not the marginal likelihood.
I hope the question is understandable and that someone can help me.
As I understand it, the normalized (profile) likelihood can be used to get the CI of an estimate, where "normalized" means that the function is scaled to have unit integral.
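Under the conjugate normal-inverse-gamma prior (mu | sigma2 ~ N(mu0, sigma2/kappa0), sigma2 ~ Inverse-Gamma(alpha0, beta0)), the two integrals have the standard closed form p(X) = Γ(α_n)/Γ(α0) · β0^α0/β_n^α_n · (κ0/κ_n)^½ · (2π)^(-n/2), with κ_n = κ0 + n, α_n = α0 + n/2 and β_n = β0 + ½Σ(x_i - x̄)² + κ0·n·(x̄ - μ0)²/(2κ_n). A minimal Python sketch of that formula on the log scale; the hyperparameter names mu0 and kappa0 are our own additions to the alpha, beta of the question. For n = 1 it should reduce to a Student t density with 2·alpha0 degrees of freedom, and it should satisfy p(x1, x2) = p(x1)·p(x2 | x1), which gives two quick checks.

```python
import math


def log_marginal_likelihood(data, mu0, kappa0, alpha0, beta0):
    """ln p(X) for x_i ~ N(mu, sigma2) i.i.d., mu | sigma2 ~ N(mu0, sigma2/kappa0),
    sigma2 ~ Inverse-Gamma(shape alpha0, scale beta0)."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


def student_t_logpdf(x, nu, loc, scale2):
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1) / 2 * math.log1p((x - loc) ** 2 / (nu * scale2)))


# Single observation: marginal = t_{2 alpha0}(mu0, beta0 (kappa0 + 1) / (alpha0 kappa0))
mu0, kappa0, alpha0, beta0, x = 1.0, 2.0, 3.0, 4.0, 2.5
print(log_marginal_likelihood([x], mu0, kappa0, alpha0, beta0),
      student_t_logpdf(x, 2 * alpha0, mu0, beta0 * (kappa0 + 1) / (alpha0 * kappa0)))
```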
This works beautifully for the binomial distribution (the normalized likelihood is a beta-density). For the expectation of the normal distribution the dispersion parameter needs to be considered -> profile likelihood, and the normalized profile likelihood is the t-density.
Now I thought I should be able to go a similar way with gamma-distributed data. The ready-made solution in R seems to be fitting a gamma GLM and using confint(). Here is a working example:
y = c(269, 299, 437, 658, 1326)
mod = glm(y~1, family=Gamma(link=identity))
abs(coef(mod) - mean(y)) < 1E-4  # TRUE
Ok, now my attempt to solve this problem step-by-step by hand:
The likelihood function, reparametrized to mu = scale*shape:
L = Vectorize(function(mu,s) prod(dgamma(y,shape=mu/s,scale=s)))
L.opt = function(p) 1/L(p[1],p[2])  # p = c(mu, scale); minimizing 1/L maximizes L
optim(par=c(100,100), fn=L.opt, control=list(reltol=1E-20))
The MLE for mu is equal to the coefficient of the model. The next lines produce a contour plot of the likelihood function with a red line indicating the maximum depending on mu:
m = seq(200,1500,len=100)
s = seq( 50,1500,len=100)
Lmat = outer(m,s,L)
Rmat = Lmat/max(Lmat)
s.max = function(mu) optimize(function(s) L(mu,s), lower=1, upper=2000, maximum=TRUE)$maximum
contour(m, s, Rmat, xlab="mu", ylab="scale")
lines(m, sapply(m, s.max), col="red")
The next step is to get the profile likelihood (and look at a plot):
profileL = Vectorize(function(mu) L(mu,s.max(mu)))
curve(profileL, from=200, to=1500)
Then this profile likelihood is normalized to get a unit integral:
AUC = integrate(profileL,0,Inf,rel.tol=1E-20)$value
normProfileL = function(mu) profileL(mu)/AUC
abs(integrate(normProfileL,0,Inf)$value - 1) < 1E-4  # TRUE
Now obtain the 90% CI as the quantiles for which the integral of this normalized profile likelihood is 0.05 and 0.95:
uniroot(function(mu) integrate(normProfileL,0,mu)$value-0.05, lower=0, upper=5000)$root
uniroot(function(mu) integrate(normProfileL,0,mu)$value-0.95, lower=0, upper=5000)$root
This is 413.9 ... 1557.
confint(mod,level=0.9) # gives 364.8 ... 1076
This does not seem to be just a rounding error.
Where am I wrong?
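Part of the difference may be that the two numbers answer different questions: integrating the normalized profile likelihood treats it as a posterior density under a flat prior on mu, whereas profile-based confint() intervals are likelihood-ratio intervals, i.e., the set of mu with 2·[l_max - l_p(mu)] <= the chi-square(1) 0.90 quantile (about 2.71). The two coincide when the likelihood is exactly Gaussian in mu (e.g., a normal mean with known variance), but not in general. In addition, our understanding is that confint() on the glm profiles the deviance with the dispersion estimated separately (from Pearson residuals) rather than jointly with mu, so it need not match either calculation exactly. A minimal Python sketch of the likelihood-ratio interval from the full gamma likelihood (mean mu, shape profiled out), using the data from the question; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Y = [269, 299, 437, 658, 1326]  # data from the question


def gamma_loglik(mu, shape, y=Y):
    """Gamma log-likelihood parametrized by mean mu and shape k (scale = mu/k)."""
    scale = mu / shape
    return sum((shape - 1.0) * math.log(v) - v / scale - math.lgamma(shape)
               - shape * math.log(scale) for v in y)


def profile_loglik(mu, y=Y):
    """Maximize over the shape by golden-section search on log(shape)
    (the log-likelihood is concave in the shape for fixed mu)."""
    lo, hi = -10.0, 15.0
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = gamma_loglik(mu, math.exp(c), y), gamma_loglik(mu, math.exp(d), y)
    for _ in range(100):
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = gamma_loglik(mu, math.exp(c), y)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = gamma_loglik(mu, math.exp(d), y)
    return max(fc, fd)


def lr_interval(level=0.90, y=Y):
    mu_hat = sum(y) / len(y)  # the gamma MLE of the mean is the sample mean
    ll_max = profile_loglik(mu_hat, y)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2  # chi-square(1) quantile

    def excess(mu):
        return 2.0 * (ll_max - profile_loglik(mu, y)) - crit

    def bisect(inside, outside):  # assumes excess(inside) < 0 < excess(outside)
        for _ in range(60):
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return mu_hat, bisect(mu_hat, mu_hat / 20.0), bisect(mu_hat, 20.0 * mu_hat)


print(lr_interval())
```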
Hi, I'm new to ML (maximum likelihood) phylogenetic software and have a question about the importance of ML values. Authors usually provide ML values, describe branches within trees as either strongly or weakly supported, and use ML values to justify their inferences. The following is quoted from the RAxML manual (section 2.1) regarding ML values: 'My personal opinion is that topological search (number of topologies analysed) is much more important than exact likelihood scores in order to obtain a good tree'. This suggests to me that bootstrapping is more important than ML values; does anyone have an opinion on this? Support values in the ML trees I have made are <50% (I used 1000 bootstrap replicates); however, the topologies and the placement of taxa in my trees are not too dissimilar to those in the literature. Should I be concerned about ML support values <50%?
What external factors can cause outages of power system components? What is their likelihood of occurrence, and how can it be calculated?
I would be very appreciative if someone could help me. What I want to do is:
1. Compare two multiple linear regressions. I want to know which coping strategy best predicts burnout. The first model has eight coping strategies; the second has seven (I removed one coping strategy from the first model based on its Z statistic).
2. From there, test for parsimony using a likelihood ratio test that compares the likelihood values of the two models against a chi-square distribution.
I have no idea how to do the second step, which my research supervisor recommended. Does anyone have any suggestions? I am using SPSS.
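For ordinary least squares with normal errors, the likelihood-ratio statistic for two nested models can be computed from their residual sums of squares: LR = n·ln(RSS_reduced/RSS_full), compared with a chi-square distribution whose degrees of freedom equal the number of dropped predictors (1 here). The RSS values are the "Residual" sums of squares in each model's ANOVA table in SPSS. A minimal Python sketch with hypothetical helper names and toy data; for df = 1 the p-value is erfc(sqrt(LR/2)).

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]


def rss(x_rows, y):
    """Residual sum of squares of an OLS fit; each row of x_rows starts with a 1."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yi in zip(x_rows, y))


def lr_test_nested(x_full, x_reduced, y):
    n = len(y)
    lr = max(0.0, n * math.log(rss(x_reduced, y) / rss(x_full, y)))
    df = len(x_full[0]) - len(x_reduced[0])
    p_value = math.erfc(math.sqrt(lr / 2.0)) if df == 1 else None  # chi-square(1) tail
    return lr, df, p_value


# Toy data: does x2 add anything beyond x1?
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 13.2, 14.8]
full = [[1.0, a, b] for a, b in zip(x1, x2)]
reduced = [[1.0, a] for a in x1]
print(lr_test_nested(full, reduced, y))
```

When more than one predictor is dropped, the statistic is compared with a chi-square distribution with that many degrees of freedom instead.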
I have done stepping stone analysis in MrBayes to estimate the marginal likelihood for two models. I have got the following results:
Each table gives the marginal likelihood values for the two runs and the mean for each model.
Model 1 (M1) =
Run Marginal likelihood (ln)
Model 2 (M2)
Run Marginal likelihood (ln)
Which model is better? Which one I should reject and why?
Likelihood functions for incorporating tagging data into age-structured stock assessment models are well known, but I am struggling to find such functions for using tagging data in a global production model (e.g., the Schaefer model).
This is still my old problem...
Obviously, the likelihood and the "sampling distribution" are related. The shape of the likelihood converges to the shape of a normal distribution with mean = xbar and variance = s²/n (the difference is just a scaling factor that makes the integral equal to one).
Consider normally distributed data, with the variance s² as a nuisance parameter. Ignoring the uncertainty in s², the likelihood is again similar to the normal sampling distribution. The confidence interval can be obtained from the likelihood as the central range covering 100(1-a)% of the area under the likelihood curve. This is explained, for example, in the attached link, p. 31, last paragraph.
The author shows that the limits of the interval are finally given by sqrt(chi²(a))*se (equation 13) and states that sqrt(chi²(a)) is the same as the upper a/2-quantile of the normal distribution. He then writes "The test uses the quantile of a normal distribution, rather than a Student t distribution, because we have assumed the variance is known." OK so far.
How is this done in the case when the variance is unknown? I suppose one would somehow come to sqrt(F)*se, so that sqrt(F) is the quantile of a t-distribution...
Another question: how is the likelihood (joint in mu and sigma²) related to the t-distribution? Is there a way to express this in terms of conditional and/or marginal likelihoods? This could help me understand the principle when there are nuisance parameters in general.
In many texts I find the statement that the likelihood is a normal-Gamma, e.g. here:
I have never found anything explaining HOW this is derived. It is always nicely shown how the likelihood for a known variance is derived (-> normal with mean mu and variance sigma²/n).
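One way to see where the normal-gamma form and the t distribution come from, written in terms of the precision τ = 1/σ² (a standard calculation, sketched here rather than taken from the texts mentioned). The key step is the decomposition Σ(x_i - μ)² = S + n(μ - x̄)², with S = Σ(x_i - x̄)² = (n-1)s²:

```latex
\begin{aligned}
L(\mu,\tau) &= \prod_{i=1}^{n} \sqrt{\tfrac{\tau}{2\pi}}\; e^{-\tau (x_i-\mu)^2/2}
  = (2\pi)^{-n/2}\, \tau^{n/2} \exp\!\Big(-\tfrac{\tau}{2}\big[S + n(\mu-\bar x)^2\big]\Big) \\
 &\propto \underbrace{\tau^{1/2}\, e^{-n\tau(\mu-\bar x)^2/2}}_{\propto\ \mathcal N(\mu \,\mid\, \bar x,\ (n\tau)^{-1})}\;
   \underbrace{\tau^{(n+1)/2-1}\, e^{-\tau S/2}}_{\propto\ \mathrm{Gamma}(\tau \,\mid\, (n+1)/2,\ S/2)} ,\\[4pt]
\int_0^\infty L(\mu,\tau)\,\frac{d\tau}{\tau}
 &\propto \int_0^\infty \tau^{n/2-1} e^{-\tau A/2}\, d\tau = \Gamma(n/2)\,(A/2)^{-n/2},
 \qquad A = S + n(\mu-\bar x)^2 = S\Big[1 + \frac{t^2}{n-1}\Big],\quad t = \frac{\mu-\bar x}{s/\sqrt{n}},\\
 &\propto \Big(1 + \frac{t^2}{n-1}\Big)^{-n/2}.
\end{aligned}
```

So the "normal-gamma" statement describes the shape of the joint likelihood in (μ, τ): conditionally on τ it is normal in μ, and the remaining factor is a gamma kernel in τ (shape (n+1)/2, rate S/2). The familiar t density with n-1 degrees of freedom for μ appears when τ is integrated out with respect to dτ/τ, which is the same as using the reference prior p(μ, σ²) ∝ 1/σ²; integrating with plain dτ instead gives a t kernel with n+1 degrees of freedom (after rescaling), which is why the treatment of the nuisance parameter matters.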