# Maximum Likelihood - Science topic

Explore the latest questions and answers in Maximum Likelihood, and find Maximum Likelihood experts.
## Questions related to Maximum Likelihood
Question
I am currently replicating a study in which the dependent variable describes whether a household belongs to a certain category. Therefore, for each household the variable either takes the value 0 or the value 1 for each category. In the study that I am replicating the maximisation of the log-likelihood function yields one vector of regression coefficients, where each independent variable has got one regression coefficient. So there is one vector of regression coefficients for ALL households, independent of which category the households belong to. Now I am wondering how this is achieved, since (as I understand) a multinomial logistic regression for n categories yields n-1 regression coefficients per variable as there is always one reference category.
Question
Could you please explain specifically these two classification algorithms and the similarities and differences when they are applied in remote sensing image classification? Thanks a lot.
Bayesian analysis is based on the maximum likelihood equation but adds a multiplier in the numerator describing the prior probability, which makes the ML probability conditional on the prior. This is the subject of an entire course on statistical inference, so it is difficult to describe unless you have a background in probability and Bayes' theorem.
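The difference can be seen in a tiny numeric sketch (the likelihoods and priors below are made-up illustration values): ML picks the class with the highest likelihood p(x | class), while the Bayesian rule multiplies each likelihood by a prior p(class) and normalizes.

```python
import numpy as np

# Hypothetical two-class example: likelihoods p(x | class) and priors p(class)
likelihood = np.array([0.30, 0.20])   # p(x | class 0), p(x | class 1)
prior      = np.array([0.20, 0.80])   # prior probability of each class

ml_choice = int(np.argmax(likelihood))   # ML ignores the prior entirely
posterior = likelihood * prior           # Bayes: multiply by the prior...
posterior /= posterior.sum()             # ...and normalize

bayes_choice = int(np.argmax(posterior))
print(ml_choice, bayes_choice)  # here the prior flips the decision: 0 vs 1
```

With these numbers the ML decision is class 0, but a strong prior toward class 1 flips the Bayesian decision.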
Question
While finding maximum likelihood estimates, how does one obtain a minimum-bias estimate?
Interesting.
Question
Hello everyone,
As the title suggests, I am trying to figure out how to compute a reduced correlation matrix in R. I am running an Exploratory Factor Analysis using Maximum Likelihood as my extraction method, and am first creating a scree plot as one method to help me determine how many factors to extract. I read in Fabrigar and Wegener's (2012) Exploratory Factor Analysis, from their Understanding Statistics collection, that using a reduced correlation matrix when creating a scree plot for EFA is preferable compared to the unreduced correlation matrix. Any help is appreciated!
Thanks,
Alex
Question
I’m in need of materials on exponential regression model estimation using maximum likelihood estimation or other methods
You can try this easy to use program:
The help file explains how it works.
Question
I have constructed phylogenetic trees of a 16S rDNA sequence using the Neighbour-Joining, Maximum Likelihood, and Maximum Parsimony methods in MEGA, with bootstrap values. I want to combine all three phylogenetic trees into a single phylogenetic tree. Which program is suitable for this, and how can I construct it? The attached tree (image) is an example.
Thank You very much Patrice Showers Corneli
Question
Hello, I would like to create 100 parsimonious trees through the command line. I am looking for the best software to do this. I have 25,000 tips for the trees, so it would not be possible to use ML methods. The software does not have to run 100 of them in one step. I would be happy to use just one script that makes one maximum parsimony tree and run this script 100 times using a workflow management tool like Snakemake. Also, the software has to take a multiple sequence alignment as the input file. I wanted to use TNT, but I cannot use my MSA fa file with TNT. Thank you in advance.
Thank you for your answer. I am running 25,000 tips because I am trying to reconstruct the phylogeny of a virus. I need the starting tree for an additional analysis. The sequences range from 200 to 15,000 bp in length. I don't think a Bayesian method like MrBayes would work here, since that would be far too computationally intensive. All I need is an MP tree, which should be less intensive than any ML method.
Question
Hi everyone.
I have a question about finding a cost function for a problem. I will ask the question in a simplified form first, then I will ask the main question. I'll be grateful if you could possibly help me with finding the answer to any or both of the questions.
1- What methods are there for finding the optimal weights for a cost function?
2- Suppose you want to find the optimal weights for a problem that you can't measure the output (e.g., death). In other words, you know the contributing factors to death but you don't know the weights and you don't know the output because you can't really test or simulate death. How can we find the optimal (or sub-optimal) weights of that cost function?
I know it's a strange question, but it has so many applications if you think of it.
Best wishes
Question
Experiencing slight differences in the results of trees obtained from BEAST and Maximum Likelihood...
I would suggest deleting the insertions/deletions (indels) from the alignments. An analysis based on the trimmed alignment may then provide similar results.
Question
Hi there,
I'm an undergrad Psychology student, I was taught in my statistics class that it is recommended to use pairwise deletion when dealing with missing data to not reduce your sample size and statistical power. I have been doing reading on Little's Missing Completely at Random analysis (MCAR) as well as imputation techniques like multiple imputation, maximum likelihood and full information maximum likelihood to deal with missing data instead of deletion.
My question is if a researcher were to have a significant p-value on the MCAR test and then use pairwise deletion would this mean that the sampling is not random, but instead based on whether a participant responds to a question or not. If so, does this then eliminate the generalizability of the research?
Dr. Wilhelm is completely correct. I would not go quite as far as he did on hypothesis tests. The important thing here is that pairwise deletion makes it less likely that you are sampling from the population of interest, and if asked by a reviewer it is harder to argue that you are talking about the population of interest rather than some other population. This can defeat your argument for generalizability. Who suggested pairwise deletion? That, I believe, comes from the SPSS literature, which I would not suggest as a good source. Best wishes, David Booth
Question
Hi, I have questions about HLM analysis.
I got results saying 'Iterations stopped due to small change in likelihood function'
First of all, is this an error? I think it needs to keep iterating until it converges. How can I make it keep computing without this message? (The type of likelihood was restricted maximum likelihood, so I tried full maximum likelihood, but I got the same message.) Can I fix this by setting a higher '% change to stop iterating' in the iteration control setting?
In general, convergence is when there is no meaningful change in the likelihood!
I cannot tell if this is a warning or it has been successful.
You may want to have a look at this:
This is the HLM 8 Manual ; I suggest you search it for the word "convergence"
Question
I am working with a distribution in which the support of x depends on the scale parameter of the distribution. When I obtain the Fisher information of the MLE, it exists and gives a constant value. So, in order to find the asymptotic variance of the parameter, can I take the inverse of the Fisher information matrix even though the Cramér-Rao regularity conditions are violated, and will the asymptotic normality of the MLE still hold?
Please suggest how I can proceed to find a confidence interval for the parameter.
Interesting
Question
While learning about meta-analytic structural equation modeling using the TSSEM method, I found some differing opinions regarding missing data:
In Jak, S. (2015). Meta-Analytic Structural Equation Modelling. Springer International Publishing., The author indicated that 'Similar to the GLS approach, selection matrices are needed to indicate which study included which correlation coefficients. Note however, that in TSSEM, the selection matrices filter out missing variables as opposed to missing correlations in the GLS-approach, and is thus less flexible in handling missing correlation coefficients'.
While in Cheung, M. W.-L. (2021, January 22). Meta-Analytic Structural Equation Modeling. Oxford Research Encyclopedia of Business and Management. Retrieved 28 Jan. 2021, from https://oxfordre.com/business/view/10.1093/acrefore/9780190224851.001.0001/acrefore-9780190224851-e-225., the author indicated that 'Instead of using the GLS as in Becker’s approach, the TSSEM approach uses FIML estimation. FIML is unbiased and efficient in handling missing data (correlation coefficients in MASEM) ···'.
Based on what I described above, I feel confused about what kind of missing data can TSSEM handle, the missing variables? or the missing correlation coefficients? or both can be handled using different methods?
My understanding is therefore that the two authors described the handling of missing data in TSSEM from different angles: Dr. Jak emphasizes using selection matrices to filter out missing variables, while Dr. Cheung emphasizes using ML to handle missing correlation coefficients. But I am not sure whether my understanding is right, so I sincerely invite you to answer my question. Thank you!
Zhenwei Dai
2021.8.28
I don't have access to the 2021 article by Cheung. To my understanding, the quoted section in Jak (2015) was talking about fixed-effects TSSEM as proposed by Cheung and Chan in 2005. Fixed-effects TSSEM indeed could not handle missing correlations as in GLS *in the past*. See the 2018 paper by Jak and Cheung on how to solve this problem for fixed-effects TSSEM:
Maybe Cheung (2021) is talking about random-effects TSSEM in the quoted section. Random-effects TSSEM uses a different approach in stage 1. You can check Chapter 7, section 7.4, of Cheung (2015): Meta-Analysis: A Structural Equation Modeling Approach:
Hope this helps.
Question
Good day, I am interested in understanding a research paper that used the maximum likelihood of a generalized linear mixed model to predict the missing data from participants who did not complete an intervention in a randomized control trial.
I am not very familiar with statistical functions, and would like to understand this concept in simple terms to explain to my peers. Any information is appreciated.
Thank you very much Professor Eliana, I shall read into the article.
Question
Hello, I have non-normally distributed data and wonder whether I should use and report the standard maximum likelihood model fit indices such as CFI, TLI, etc., or whether I cannot use them due to non-normality. I'm also unsure whether I proceeded correctly in the SEM procedure regarding non-normality.
This is the order I proceeded:
Model: UTAUT2, n=120, Software: AMOS 23, SPSS 23
1. Exploration
1.1 Eye-Inspection of data in SPSS for outliers (straight-liners)
1.2 KMO & Bartletts test -> both suggest data fitness for FA
2. CFA:
2.1 Construct reliability and validity -> good after modification
2.2 Model Fit Indices (based on MLE) -> acceptable
2.3 normality test -> items are not normally distributed
2.4 Bollen-Stine Bootstrap -> p not significant, which means model should NOT be rejected
2.5 Bootstrap -> Bias-corrected percentile Method -> all items load significantly
3. Structural Model:
3.1 Bollen-Stine -> p not significant -> model is not rejected
3.2 Model Fit Indices (based on MLE) -> acceptable
3.3 Bootstrap -> Make assumptions
I'd be very thankful for any help and recommendations.
Hi! Thank you all for your suggestions. What do you think about using PLS-SEM in this case? Is PLS-SEM a better option than covariance-based SEM when most of the data are highly skewed?
Question
ML estimation is one of the methods for estimating autoregressive parameters in univariate and multivariate time series.
Question
Hello All,
Wooldridge's Introductory Econometrics (5th ed.) states that "Because maximum likelihood estimation is based on the distribution of y given x, the heteroskedasticity in Var(y|x) is automatically accounted for."
Does this hold also for bias-corrected or penalized maximum likelihood estimation under Firth logistic regression?
I may be misunderstanding your question, but there is no constant variance assumption with logistic regression, so you do not need to worry about heteroskedasticity. In fact, heteroskedasticity is almost guaranteed with logistic regression since the variance of a binomial random variable is a function of the probability of the event happening and the probability of the event not happening, which will usually differ between observations.
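The point about the variance being a function of the event probability can be checked numerically (the probabilities below are hypothetical fitted values):

```python
import numpy as np

# Variance of a Bernoulli outcome is p*(1-p): it necessarily varies with
# the fitted probability, so heteroskedasticity is built into logistic
# regression rather than being an assumption to verify.
p = np.array([0.1, 0.5, 0.9])   # hypothetical fitted probabilities
var = p * (1 - p)
print(var)   # largest at p = 0.5, smaller in the tails, never constant
```

The variance peaks at p = 0.5 and shrinks toward the extremes, so observations with different fitted probabilities cannot have equal variance.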
Question
my result is different from some papers', can you share your result?
thanks so much.
A quasi-likelihood incorporates an additional dispersion parameter into a true likelihood for greater flexibility in modeling the variability of the data (think overdispersed Poisson model). A profile likelihood re-expresses nuisance parameters in the full likelihood as a function of the parameters of interest. Under mild regularity conditions after profiling the quasi-likelihood you can differentiate its log transformation and set this equal to zero to solve for the root, which will be a function of the data. This function of the data will be your maximum quasi-likelihood estimate.
For inference you can rely on a Wald, Score, or quasi-likelihood ratio test. For the Wald and Score tests if the parameter space is bounded the performance can be improved by utilizing a link function g{MLE}, e.g. g{ } = log{ } if the parameter space is bounded below by 0. For the score and quasi-likelihood ratio tests the nuisance parameters should be profiled by estimating the nuisance parameters under the restricted null space.
Question
As far as I know, the simulated maximum log-likelihood should increase as the number R of random draws increases.
Hi, thank you. Could you please provide a reference where I can learn more about this? @arie ten cate
Question
I want to run the Pseudo-Poisson Maximum Likelihood (PPML) in a panel data framework, as my dependent variable has many zeroes. However, my challenge is that, from all the literature I have read on PPML, it seems to work only in gravity-model types of estimation. Is it possible to run a PPML using panel data for a non-gravity type of model? If so, what is the Stata command to use?
Question
I have run an ordered probit model for a latent factor using the gsem function in Stata 16, because I have Likert-scale (1-5) answers to my observable variables. The problem is that I cannot obtain fit statistics for the model. I have read that this is not possible for gsem in Stata, with some exceptions (e.g., latent class analysis). Is that the case? Would an alternative be to run a SEM with maximum likelihood? I guess that has been done in some studies using Likert-scale answers, and I could then get goodness-of-fit statistics easily. But I am hesitant, as it does not seem to be technically correct. Thank you for any suggestions!
Question
Hi all,
I'm looking for pieces of software that could compute Maximum Likelihood and Parsimony on mixed-ploidy SSR data (2n though 6n). I can devise sequences, but the crucial points are (1) handling more than 1n data; (2) ability to assess phylogeny among the mixed-ploidy dataset. Thank you for your suggestions!
-Marcin
Dear Marcin
Hope this helps
Pablo
Question
I tried to build trees with Bayesian inference and a maximum likelihood algorithm using the same sequences, but I got different results. I reconstructed 5 homologous sequences with no polymorphic sites. In maximum likelihood there are no branch lengths among the 5 samples, while in Bayesian inference I do get branch lengths among the 5 homologous sequence samples. Why does this happen? Thank you.
You really can't "average" Bayesian and Classical ideas. You pay your money and you make your choice, as they say. David Booth
Question
I was reading my class work which mentions about various algorithms for pharmacophore mapping and wanted to learn more about the maximum likelihood algorithm. What exactly are the advantages and limitations of using this algorithm? What is the best algorithm for pharmacophore mapping?
The maximum likelihood algorithm is a supervised classification method: the training samples are used to compute the distribution parameters of each class. Then, for every sample, the probability under each class is computed, and the sample is assigned to the class for which it gets the highest probability.
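A minimal 1-D sketch of that procedure, using made-up "band value" data and Gaussian class models (class names and numbers are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical training samples for two classes (1-D band values)
rng = np.random.default_rng(0)
train = {
    'water':  rng.normal(10.0, 1.0, 200),
    'forest': rng.normal(20.0, 2.0, 200),
}

# Step 1: estimate each class's distribution parameters from training data
params = {c: (s.mean(), s.std(ddof=1)) for c, s in train.items()}

# Step 2: assign a new value to the class with the highest likelihood
def classify(x):
    return max(params, key=lambda c: norm.pdf(x, *params[c]))

print(classify(11.0), classify(19.0))
```

In practice (e.g., multispectral imagery) each class is modeled with a multivariate normal, so a mean vector and covariance matrix are estimated per class, but the assignment rule is the same.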
Question
What is the difference between Maximum Likelihood Sequence Estimation and Maximum Likelihood Estimation? Which one is a better choice in case of channel non-linearities? And why and how oversampling helps in this?
The Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a specific model. It selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data.
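A small self-contained sketch of MLE with simulated data: for an exponential model the log-likelihood is n·log(λ) − λ·Σx, whose maximum has the closed form λ̂ = 1/mean(x); the numerical optimizer recovers the same value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated exponential data with true rate 0.5 (scale = 2.0)
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=5000)

def nll(lam):
    # negative log-likelihood of the exponential model (constants dropped)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(nll, bounds=(1e-6, 10), method='bounded')
print(res.x, 1 / x.mean())   # numerical MLE matches the closed form
```

The "agreement with the data" intuition shows up directly: the optimizer finds the parameter value under which the observed sample is most probable.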
Question
I am trying to use R to create scatterplots of my data. I am using the effect_plot function (partial.residuals = TRUE) so that I can control for other variables in my plot, but I cannot figure out a way to estimate my missing data using full information maximum likelihood (FIML). Is there a way to do this in R?
I suppose you use the 'jtools' R package. When you fit the model:
```r
data(movies)
fit <- lm(metascore ~ budget + us_gross + year, data = movies)
summ(fit)
```
You get the number of missing observation.
```
#> MODEL INFO:
#> Observations: 831 (10 missing obs. deleted)
```
see:
Question
Suppose we have a NOMA uplink with, let's say, 2 users, and the base station uses Joint Maximum Likelihood Detection (JMLD) instead of Successive Interference Cancellation (SIC). In this case the key performance metric is the BER (bit error rate).
How do we calculate the rate of the first user? With SIC, the user decoded first has its individual rate affected by interference from the user decoded afterwards. If JMLD is used instead of SIC, then we should be rid of the interference caused by the 2nd user to the one decoded first.
Is this conceptually right? What is the relation of BER to achievable rate? Is it possible that we achieve the desired rate but still have bit errors?
Rate = Bandwidth * log2(1 + SINR)
For clarity:
1) First user is the one being decoded first and has good channel gain and hence is strong user.
2) 2nd user is the one being decoded afterwards.
The capacity region of the uplink two-user channel is achieved by SIC. You can certainly apply joint detection if you want to, but it won't improve the rates since SIC is capacity-achieving. See Section 6.1 in "Fundamentals of Wireless Communications" by Tse et al. for details. All points in the capacity region can be achieved by varying the transmit powers and decoding order.
The capacity/rate is the bit/s that you can deliver with zero BER when using (asymptotically) long coding blocks. The BER is a relevant metric when using short blocks. Here is a video where I discuss that in further detail: https://youtu.be/4nRjsq_P4ZA
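The "SIC is capacity-achieving" claim can be checked with a small numeric sketch (received powers and noise level below are hypothetical): the two individual SIC rates sum exactly to the multiple-access sum capacity.

```python
import numpy as np

# Hypothetical received powers P*g for two users; noise power N0 = 1
P1g1, P2g2, N0 = 4.0, 1.0, 1.0

# SIC decoding order: user 1 first (sees user 2 as interference),
# then user 2 (interference-free after cancellation)
r1 = np.log2(1 + P1g1 / (P2g2 + N0))
r2 = np.log2(1 + P2g2 / N0)

# Multiple-access sum capacity of the two-user uplink
sum_capacity = np.log2(1 + (P1g1 + P2g2) / N0)

print(r1 + r2, sum_capacity)   # the two coincide: SIC achieves the sum capacity
```

Swapping the decoding order changes the split between r1 and r2 but leaves the sum unchanged, which is why joint detection cannot improve on the SIC rate region.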
Question
I want to construct a maximum likelihood tree from a gene sequence of an isolated species together with full-length sequences of the same gene in other species obtained from GenBank, but I am unable to decide which model I should choose, WAG or JTT, in RAxML.
Your suggestions will benefit me greatly
You could run a model test to see what substitution profile best suits your data. In my opinion, the easiest way to do that currently is using IQtree ( http://www.iqtree.org/doc/Substitution-Models ), as it returns understandable output when model testing. Their documentation also provides useful summaries of what particular models are suitable for.
Keep in mind that the quality and quantity of your data guides that process. You can then apply the suggested model in RaxML, and compare against IQTree inference.
Also, if I understand you correctly, you are inferring a tree from a single gene. Keep in mind that the resulting tree, if resolved, traces the evolution of that gene, which may or may not correlate with the species phylogeny.
Hope that helps :)
Question
Hi all,
I am running a mixed model and have noticed that my data violates the normality assumption. I am running this in SPSS and therefore the estimators available to me are maximum likelihood or restricted maximum likelihood. I was wondering if these estimates are robust for non-normal data? If so, is there a preference for one over the other? Currently my SPSS has defaulted to restricted maximum likelihood.
Thank you in advance for any help on this!
M
I too think that it will not make much of a difference which one is used in terms of formality.
The easiest way to think of these is how you calculate the simple descriptive variance: the numerator is the sum of the squared deviations around the mean, and the denominator is? If you use n, the sample size, this is maximum likelihood; if it is n-1, it is the restricted version. Fisher, by a geometric argument, had seen that one degree of freedom had been consumed by fitting the mean. That is, the variance is calculated taking account of the uncertainty in the mean. Bayesian analysis, which is sometimes known as full uncertainty analysis, takes account of the uncertainty in all the parameters of both the fixed and the random part of the model.
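The n versus n-1 distinction described above, on a tiny made-up sample:

```python
import numpy as np

# ML variance divides by n; the restricted (unbiased) variance divides by
# n-1, reflecting the degree of freedom consumed by estimating the mean.
x = np.array([2.0, 4.0, 6.0, 8.0])   # toy data, mean = 5, SS = 20
ml_var   = np.var(x, ddof=0)   # divide by n     -> 20/4 = 5.0
reml_var = np.var(x, ddof=1)   # divide by n - 1 -> 20/3 ≈ 6.67
print(ml_var, reml_var)
```

The gap shrinks as n grows, which is consistent with the earlier comment that the choice rarely matters much in practice.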
Question
I have a non-linear Wiener model (x_k = x_0 + θ_k·(t_k)^b + σ·B(t_k)). I have a sample of m RMS values collected over m intervals of time.
To obtain these parameters (θ_k, b, σ, B(t_k)), can I take partial derivatives of the PDF and solve for the values?
You can form the joint distribution (the likelihood) from the PDF, then differentiate its logarithm with respect to the parameters.
The parameter values that maximize the log-likelihood function are then the best estimates.
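As a sketch of that recipe on the simplest possible case (made-up count data, Poisson model): the log-likelihood is Σy·log(λ) − n·λ up to constants, its derivative Σy/λ − n vanishes at λ̂ = mean(y), and a grid search over the log-likelihood confirms this.

```python
import numpy as np

y = np.array([3, 1, 4, 2, 5])   # hypothetical observed counts

# Poisson log-likelihood over a grid of candidate rates (constants dropped)
lam = np.linspace(0.1, 10, 10000)
loglik = y.sum() * np.log(lam) - len(y) * lam

lam_hat = lam[np.argmax(loglik)]
print(lam_hat, y.mean())   # grid maximum sits at the sample mean, 3.0
```

For a model like the Wiener process above the algebra is messier, but the procedure is identical: write the joint log-likelihood, differentiate with respect to each parameter, and solve (analytically or numerically).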
Question
...or principal axis factoring? And which rotation method works best with maximum likelihood or principal axis factoring in order to be able to confirm the scale model in confirmatory factor analysis? (The number of participants is between 400 and 500, and the research area is social sciences/education.)
ML is the basis for CFA, so if you want to maximize consistency, you should use it for your EFA as well. To avoid the "over-fitting" problem (i.e., simply confirming your EFA results by applying both methods to the same data), you can use the EFA on a randomly selected subset of the full data set.
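The split-sample idea suggested above can be sketched as follows (the data matrix and dimensions are made up): run the EFA on one random half of the cases and reserve the other half for the CFA, so the CFA is not merely re-confirming patterns fitted to the same observations.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(450, 20))   # hypothetical 450 cases x 20 items

# Randomly split cases into an exploratory half and a confirmatory half
idx = rng.permutation(len(data))
half = len(data) // 2
efa_sample = data[idx[:half]]   # fit the EFA here
cfa_sample = data[idx[half:]]   # fit the CFA here, on untouched cases

print(efa_sample.shape, cfa_sample.shape)
```

The two halves are disjoint by construction, which is what protects against the over-fitting problem described above.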
Question
...or principal axis factoring? And which rotation method works best with maximum likelihood or principal axis factoring in order to be able to confirm the scale model in confirmatory factor analysis? (The number of participants is between 400 and 500, and the research area is social sciences/education.)
It is not necessary to use ML method.
Question
Hi everybody!
I'm performing EFA on a database of 400 observations containing 39 variables that I'm trying to group. I'm using maximum likelihood and applying a varimax rotation.
I have eliminated all the variables that have communalities < 0.4. I know this can be a bit "relaxed", but overall I don't have communalities that are very high (0.67 is the highest, and only for two variables). I have then dropped the variables with loadings < 0.4 and eliminated the variables that cross-load (usually with loadings just above .4).
After performing all these steps, I have 3 clearly defined factors with 19 variables in total (F1: 8 variables, F2: 7 variables, F3: 4 variables). Is it acceptable to drop that many variables?
Mauricio
Hello Mauricio,
The question really isn't a statistical one; the answer depends on how important the 20 variables you propose to discard are to the definition and identification of the construct(s) you seek to assess.
It might be the case that a lot of variables you initially gathered or chose were either inappropriate or of poor technical quality (e.g., unreliable indicators). In such instances, discarding such variables does no harm to your ability to identify the related latent variables. There are certainly many published studies in which the "final" set of indicators retained is far smaller than the original set mustered by the researcher(s).
As a second case, if you had 39 variables which were highly related to one another, you could likely discard variables and not lose appreciable power in identification of the underlying latent variable(s). This is sometimes done when people attempt to develop "short forms" of measures that retain fewer items than the original version. The high intercorrelations pattern doesn't appear to be the case for your data set, based on your query.
If there's no relevant theory to guide you here (and the decision to run an EFA suggests that this could be at least partially the case), It might be useful to confer with domain experts who could advise as to whether the remaining variables appear to still represent a viable set of manifest indicators for the latent variables of interest. As well, other studies may exist in which like factor/s have been identified. You could compare your resultant set of variables to those which others found useful as indicators.
Question
Hi, I want to get three years of hourly wind speed data at 50 meters height for specific locations in Turkey. Unfortunately, I couldn't find any good websites for this. Do you know any online sources for this?
Ahmet Emre Onay, you can click any location, select the data type (hourly, daily, etc.), select dates, then click process and download the result. This will give 10 m height data; use the power law or logarithmic law to convert to 50 m. Also, you can check the Global Wind Atlas.
Question
One of the seven variables in my path model is binary, with yes/no answers, and this variable is exogenous. What estimation method should I use in Mplus? Is maximum likelihood good to go with?
Thanks for clearing it out for me.
Hi Chee-Seng Tan,
MLR implements a ML estimation approach (e.g., EM) using all available data. Therefore, MLR doesn't:
1) delete incomplete cases (e.g., "listwise deletion"),
2) impute data (e.g., "multiple imputation").
Roberto
Question
We are trying to map land cover classes on a watershed. We have selected training sites (during a field campaign in early 2017) and extracted their spectral profiles based on a Landsat 8 image acquired at the time of field surveying.
In order to assess the land cover changes, we wanted to map the same cover classes at a previous year. Since our training sites might not be relevant, we wanted to perform supervised classification using endmembers spectra instead of ROIs. When importing those spectra inside ENVI's Endmember Collection toolbox, it appears that only Spectral Angle Mapper and Spectral Information Divergence classifiers could be used. Common algorithms such as Maximum Likelihood or Mahalanobis distance fail, returning the following error message :
Problem: the selected algorithm requires that the collected endmember spectra all contain an associated covariance. ENVI is unable to continue because some of the endmembers collected do not have their covariance.
Could anyone help here? Is our method actually relevant? How can we perform supervised classification using Maximum Likelihood/Mahalanobis classifiers on older satellite images?
Roland Yonaba Aline de Matos Valerio Roland questioned whether the method is relevant. It is not. Methods such as Maximum Likelihood or Mahalanobis distance (and, more generally, trained classifiers such as SVM or Random Forest) rely on a distribution of training data, and in particular on the covariance derived from the many pixels in a region of interest used for training. With the endmembers, you have only one example of each endmember spectrum. SAM and similar methods can work with endmembers because they simply calculate a distance in spectral space between each image pixel spectrum and the endmember spectrum.
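The "distance in spectral space" that SAM computes needs nothing beyond the two spectra themselves, which is why a single endmember suffices. A toy sketch (reflectance values are made up):

```python
import numpy as np

# Hypothetical 4-band reflectance spectra
endmember = np.array([0.10, 0.30, 0.50, 0.40])   # one reference spectrum
pixel     = np.array([0.12, 0.28, 0.52, 0.38])   # one image pixel

# Spectral angle: arccos of the normalized dot product (radians).
# Small angle = spectrally similar. No covariance matrix needed.
cos_angle = pixel @ endmember / (np.linalg.norm(pixel) * np.linalg.norm(endmember))
angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
print(angle)   # a small angle, since the two spectra are nearly parallel
```

Maximum Likelihood and Mahalanobis classifiers, by contrast, require a per-class covariance matrix, which cannot be estimated from a single spectrum; hence ENVI's error message above.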
Question
My problem is fitting an intensity decay with an exponential function via the maximum likelihood method for a Poisson distribution. Since I need to extract the lifetime of my intensity decay, I need to fit it with an exponential function, and since I know the data follow a Poisson distribution, I am using the MLM with that distribution to achieve my goal. I have seen similar problems, but referring to normal distributions, not to my specific case.
Here I show some simulated data (an exponential function) whose parameters I want to estimate using the MLM for a Poisson distribution. It literally does nothing.
Anyone can help? Many thanks.
```python
from scipy import stats
import numpy as np
from scipy.optimize import minimize
import pylab as py

xdata = np.linspace(0, 25, 1000)
ydata = 10*np.exp(-xdata/2.5) + 1

def negative_log_likelihood(params):
    A = params[0]
    B = params[1]
    C = params[2]
    yPred = A*np.exp(-xdata/B) + C
    # Calculate negative log likelihood
    LL = -np.sum(stats.poisson.logpmf(ydata, yPred))
    return LL

initParams = [5, 1, 0]
results = minimize(negative_log_likelihood, initParams, method='Nelder-Mead')
print(results.x)

estParms = results.x
yOut = yPred = estParms[0]*np.exp(-xdata/estParms[1]) + estParms[2]
py.clf()
py.semilogy(xdata, ydata, 'b-')
py.semilogy(xdata, yOut)
py.show()
```
Question
I am fitting a first-order CFA model with four factors and 23 variables, of which 11 are categorical and 22 are continuous. I do not know which estimator (ML, MLR, MLM, WLSMV?) I should use. Can somebody please help me?
Thank you very much in advance.
Hi
Use ML or MLR if you want a logit link function (including OR results) for the categorical variables. If no estimator is requested, you will get WLSMV with a probit link function.
Best,
Rolf
Question
I am running multilevel models in R (two-level and three level models) for my thesis. However, I have two problems:
1. Missing data
The literature advises using Multiple Imputation (MI) or Full-Information Maximum Likelihood (FIML). I do not know how to carry out these processes in R or Stata while taking multilevel modeling into account. I am looking for practical videos or articles that can help me run either of these processes, ideally covering everything from imputation through to the analysis stage.
2. Many variables
I have many explanatory variables, i.e., dummies (e.g., gender) as well as discrete and continuous variables. I am looking for a procedure to choose the variables for the regressions. I read an article saying that PCA only works for continuous variables, and the procedure should take multilevel modeling into account. Are there recommendations for practical videos or articles?
Lastly, which should be done first: sorting out the missing data, or choosing the variables (via PCA or another process)?
Thank you so much. I will check it
Question
Dear all,
I am running a path model with a dummy variable (Gender) as an independent variable (there are more independent variables). The dependent variables are continuous.
Maximum Likelihood estimation assumes multivariate normality. But this assumption is violated using an independent dummy variable (or dichotomous variable).
I am looking for studies which have investigated the bias of an independent dummy variable on the maximum likelihood estimation when I use only continuous dependent variables.
Does anybody know such studies or some guiding rules?
[Due to sample design I can not use maximum likelihood parameter estimates with standard errors and a chi-square test statistic (when applicable) that are robust to non-normality and non-independence of observations.]
Thank you very much in advance.
Kind regards
Rico
Obasanjo Bolarinwa I would not recommend categorizing an outcome variable as it will lose information and therefore statistical power.
Question
Hello
I'm working on a project that I want to detect forest and deforestation areas. I need some information about Maximum-likelihood or Support vector machine (SVM) for image classification.
Question
Hi,
The Just-Pope yield function: y = f(x; a) + h(z; P)e (mean effect + variance effect).
I would like to know how to estimate this model through an MLE procedure.
In Stata there is an inbuilt maximum likelihood estimator.
Question
In the statsmodels package there is a class for estimating AR(p) processes, but this class cannot handle exogenous inputs (an ARX model).
I used the ARMA class, which can estimate ARMAX(p,q) processes, and set q=0 (the number of MA coefficients) in order to estimate an ARX model.
I expected my ARX model to fit better than the AR model, but this is not the case. I suspect this is because ARMA uses maximum likelihood, which is an approximate solver. How can I fit an ARX model with exact least squares in Python?
Here is source code to estimate an ARX model in Python with the Gekko sysid function. You can then use the Gekko arx function to simulate or build a Model Predictive Controller such as https://apmonitor.com/do/index.php/Main/TCLabF
from gekko import GEKKO
import pandas as pd
import matplotlib.pyplot as plt

# load data and parse into columns
url = 'http://apmonitor.com/do/uploads/Main/tclab_dyn_data2.txt'
data = pd.read_csv(url)
t = data['Time']
u = data[['H1','H2']]
y = data[['T1','T2']]

# generate time-series model
m = GEKKO(remote=False)  # remote=True for MacOS

# system identification
na = 2  # output coefficients
nb = 2  # input coefficients
yp, p, K = m.sysid(t, u, y, na, nb, diaglevel=1)

plt.figure()
plt.subplot(2,1,1)
plt.plot(t, u)
plt.legend([r'$u_0$', r'$u_1$'])
plt.ylabel('MVs')
plt.subplot(2,1,2)
plt.plot(t, y)
plt.plot(t, yp)
plt.legend([r'$y_0$', r'$y_1$', r'$z_0$', r'$z_1$'])
plt.ylabel('CVs')
plt.xlabel('Time')
plt.savefig('sysid.png')
plt.show()
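If you want the exact least-squares ARX fit the question asks about, it can also be done directly with numpy: stack lagged outputs and inputs into a regression matrix and solve with `lstsq`. A minimal sketch (the system orders, coefficient values, and simulated data below are made up for illustration):

```python
import numpy as np

def fit_arx(y, u, na, nb):
    """Least-squares fit of y[k] = sum_i a_i*y[k-i] + sum_j b_j*u[k-j]."""
    n = max(na, nb)
    rows = []
    for k in range(n, len(y)):
        # regressor row: [y[k-1], ..., y[k-na], u[k-1], ..., u[k-nb]]
        rows.append(np.concatenate([y[k-na:k][::-1], u[k-nb:k][::-1]]))
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    return theta[:na], theta[na:]

# simulate a known (stable) ARX(2,1) system, then recover its coefficients
rng = np.random.default_rng(0)
a_true, b_true = np.array([0.6, -0.2]), np.array([0.5])
u = rng.standard_normal(500)
y = np.zeros(500)
for k in range(2, 500):
    y[k] = a_true @ y[k-2:k][::-1] + b_true @ u[k-1:k][::-1]

a_hat, b_hat = fit_arx(y, u, na=2, nb=1)
print(a_hat, b_hat)  # close to [0.6, -0.2] and [0.5]
```

On noiseless data the recovery is exact; with noise, least squares gives the usual equation-error ARX estimate.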
Question
Hi, my model results in a low critical ratio for most of the path coefficients.
Can this be due to the sample size? N = 157; I used maximum likelihood estimation.
The goodness-of-fit tests are all good.
Is there any way to solve this issue?
Hello Martin,
If you mean (as implied by Alizera's response) that the estimated path coefficients are not significantly different from zero, the most likely reasons would be:
1. There is little or no systematic relationship among the variables for the sample (and possibly the associated population) you are investigating.
2. There is a relationship, but your model is mis-specified (most likely by having omitted one or more important variables).
3. There is a relationship, but the statistical power of the procedure was inadequate to detect it/them (e.g., sample size was likely inadequate).
4. Some combination of the above.
Sorry if that's a bit generic, but your query really didn't furnish much detail about the specific model, variables proposed, and estimated relationships.
Question
I am currently analyzing my data for my thesis research, and an issue has come up that we do not know how to resolve. I have two time points, and I am conducting various path analyses in R with lavaan installed.
The issue I am having is with accounting for missing data. Since this is a longitudinal study, only 66% of participants completed both time points. I know that the default estimator in the lavaan package is maximum likelihood; however, by default it removes cases that did not complete both time points from my analyses (listwise deletion). Therefore, I resorted to using the full information maximum likelihood (FIML) estimator. However, when I use the option (missing = "ML"), my results come out strange. For instance, my R-squared values are abnormally high, and my parameter estimates become very different from those under the default estimator and are very large as well. We think something isn't right with my missing data specification.
Does anyone know what might be causing this issue? Is there a better way to account for missing data in my longitudinal design with the lavaan package in R?
Your help and advice is greatly appreciated, as I feel like I am hitting a wall right now. Thank you!
Hi Jessica,
in lavaan, FIML is requested through the missing argument (missing = "fiml", an alias of missing = "ml"), not through the estimator argument.
And you should do an attrition analysis: create a missingness indicator (0 = participant data available for both waves, 1 = missing at T2) and regress this indicator on all available data (especially your DVs) with a logistic regression. This can help address whether the data are MNAR. The missingness only has to be MAR for FIML.
Best,
Holger
Question
I have a small data set consisting of 16 sequences, 919bp each. I am trying to determine the phylogenetic relationship among the individuals. I am wondering if the neighbor-joining method, maximum likelihood or bayesian analysis works best for small data sets like mine. Thank you.
MrBayes will give you robust results; it's just a matter of finding the best run parameters. For ML, researchers tend to go for RAxML; maybe you could try it and see what happens with the bootstrap values in the output.
Cipres server has various online versions: http://www.phylo.org/
Question
The iterative proportional fitting procedure (IPFP) is an iterative algorithm for estimating the expected cell values M_ijk of a contingency table such that the marginal conditions are met.
For iterative computation of the M_ijk using MLE for the model, other algorithms exist; the most common are for 2D tables, while I am interested in the 3D case, assuming the model of independence (equivalent to the model in which all odds ratios equal one). I need references with examples showing the manual calculation, and how to do more advanced ones in R.
Use the table function; if you share the exact problem you faced, I can probably help you out.
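To illustrate the manual calculation the question asks about, here is a plain-numpy sketch of IPF for a 3-way table under the mutual-independence model (the 2×3×4 table of counts below is made-up example data). Each sweep rescales the fitted table to match one observed one-way margin at a time:

```python
import numpy as np

def ipf_independence_3d(table, n_sweeps=10):
    """Fit M_ijk under mutual independence by iterative proportional fitting."""
    M = np.ones_like(table, dtype=float)
    margins = [table.sum(axis=(1, 2)),   # n_i++
               table.sum(axis=(0, 2)),   # n_+j+
               table.sum(axis=(0, 1))]   # n_++k
    for _ in range(n_sweeps):
        for ax, target in enumerate(margins):
            # current fitted one-way margin along axis `ax`
            current = M.sum(axis=tuple(a for a in range(3) if a != ax))
            shape = [1, 1, 1]
            shape[ax] = -1
            # rescale so this margin matches the observed one
            M *= (target / current).reshape(shape)
    return M

obs = np.arange(1, 25, dtype=float).reshape(2, 3, 4)  # hypothetical counts
M = ipf_independence_3d(obs)
# fitted one-way margins now match the observed ones
print(M.sum(axis=(1, 2)), obs.sum(axis=(1, 2)))
```

For the independence model the scaling factors are separable, so one full sweep already reproduces all three one-way margins; the extra sweeps are harmless and matter only for models with higher-order margin constraints.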
Question
Hi Guys,
I would highly appreciate some help/input regarding Bollen–Stine bootstrapping in AMOS. Given that my data violated the multivariate normality assumption, I opted to run Bollen–Stine bootstrapping in AMOS (as robust ML estimators are not available in this software). However, when I compare the model fit indices for the bootstrapped model and the "normal" ML results, all the fit indices are identical.
I am not sure what I am doing wrong, but clearly there should be a difference between the two; otherwise, what is the purpose of it?
Really looking forward to some input.
Thanks,
Jacky
Use the ULS estimator.
Question
Hi, I have a problem with a confirmatory factor analysis. I have 40 items, of which 10 were strong, but the maximum likelihood analysis tells me I must retain 4 factors for these 10 items, and thus some items are asked to load on more than one factor. My question is: is this possible? Is it methodologically sound?
Hello Jesua,
So, for items like AP3 and AN1, were these cross-loadings implied by your theory, your instrument construction, or by you paying attention to suggested modification indices from your CFA software? As Cristian notes, choices need to be consistent with the underlying basis for your measure(s). In your case, this would apply to both the cross-loadings and the second-order factor solution.
So, to answer your question, is it possible? Absolutely. Is it mandatory to have these? Not at all. It may be better to modify (and retest) or discard items that do not conform to the theoretical framework than to re-imagine the framework to fit the behavior of the data. The latter approach leads to over-fitting, which tends not to generalize well to other samples and data sets.
So, think carefully about whether each change you make is consistent with the intended basis and behavior of the measure.
Question
I cannot access the web-based version; when I try it, a "this page cannot be displayed" error appears.
So I downloaded the command-line version. I tried it on Ubuntu (with Python installed), but when I run the script it says: /usr/bin/env: python -0 -t -W all: no such file or directory.
Can someone suggest me any idea to solve this?
Which Python version did you install? SpedeSTEM needs version 2.X.
Question
Dear experts, I badly need help solving a problem with Landsat image classification of a coastal region of Bangladesh. The problem concerns supervised classification of a coastal district. The land cover spectral reflectance of the pixels is quite complex: the pixel values of built-up area and barren land are very close, so training sample pixels selected for urban area and barren land often conflict in the output. There are many wrongly classified areas in the image, with barren land occupying built-up areas and vice versa. I have tried more and fewer training inputs for the urban area several times, but the results are not appropriate. Can you please suggest any way to solve the problem?
Hi Sarowar,
A good strategy is to run a parallel unsupervised classification and check out the spectral signatures of your training samples. This can inform you about the spectral differences/similarities between your expert classes. You can use this information to split your 'barren lands' and/or 'buildup' into multiple classes. Having more classes might improve the strength of the classification and you can always regroup the classes afterwards.
Regards,
Maarten
Question
What are the methods to solve the blind source separation problem?
This is related to entropy.
This is related to independent components analysis.
This is related to maximum likelihood estimation.
But how higher order moments can be used?
If you have already tried ICA, try non-negative matrix factorization (NMF).
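As a concrete illustration of the NMF suggestion, here is a minimal pure-numpy sketch using the classic Lee–Seung multiplicative updates for the Frobenius loss (the "mixtures" below are a synthetic nonnegative rank-2 matrix; a real BSS application would use observed mixture data):

```python
import numpy as np

def nmf(X, r, n_iter=1000, eps=1e-9, seed=0):
    """Factor nonnegative X into W @ H with W, H >= 0 (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative update for H
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
    return W, H

# synthetic rank-2 nonnegative "mixtures"
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(X, r=2)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(err)  # small relative reconstruction error
```

Note that, unlike ICA, plain NMF does not guarantee recovery of the original sources; nonnegativity and (often) sparsity are what make the factors interpretable.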
Question
I'm trying to create a map where I want to show a specific object. Is it possible to classify only the sample area I put on the map, without doing a maximum likelihood classification?
You can use object-oriented classification
Question
I used MEGA7 to determine the best substitution model and the SM with the lowest BIC value is the LG substitution model. However, I can't find LG when I want to make a maximum likelihood tree of my proteins. Can anyone help me?
I personally wish people would not use undefined terms in these questions. However this link may help you:
Good luck, D. Booth
Question
I used the maximum likelihood method to draw the tree. I don't know why the bootstrap values for the same bacterial species are low (1_29), as shown in the attachment (bootstrap consensus tree), and the numbers between the same species (original tree) are also low, as you can see in the attachment. I used the MUSCLE algorithm in MEGA to align my sequences for the 16S rRNA region. I also tried using the Gblocks program, but the bootstrap values are still the same. I appreciate your help.
Dear Patrice
thank you for your explanation, but I have some other questions:
1. Why does MEGA construct two phylogenetic trees, the original tree (attachment no. 1) and the bootstrap consensus tree (attachment no. 2)?
Which one can I rely on?
2. When I construct a tree using ML, should branches with <50% bootstrap support be considered incorrect?
If I understand your explanation correctly, these low bootstrap values reflect the high relatedness of these species (s1–s10), except s11, s12, and s13, which are slightly different, although s1–s13 were identified as the same species through BLAST in NCBI.
3. Is it normal that when I re-run the phylogenetic tree construction I sometimes get different bootstrap values, as you can see in attachments no. 3 and no. 4?
4. In attachment no. 2, the purple branches are identified as the same species. It was mentioned below the bootstrap consensus tree that branches with less than 50% bootstrap support are collapsed. I would be grateful if you could explain this point for me.
Thank you very much
Question
When calculating interferometric coherence, why can't you do so on a pixel-by-pixel basis? I know the equation for estimating coherence, γ = |⟨S1·S2*⟩| / √(⟨S1·S1*⟩·⟨S2·S2*⟩), where S1 and S2 are the two single-look complex images and ⟨·⟩ denotes averaging over the estimation window. And I know this calculation uses a maximum likelihood estimator, but why do you need to specify an estimation window, and why can't the estimation window size be 1?
Thank you.
You are absolutely right. Compared to optical remote sensing, where 'adjacency' is a higher-order effect, the signal impact of 'neighbors' is much greater here. A perfect coherence estimator would include the dipole distribution, local 3D geometry for ray tracing and radiosity estimation, and a Gaussian-shaped weighting window to reflect the mixing and superimposition of the underlying physical process. I have often thought of compiling a paper about all this in connection with a multi-stage, alternative phase unwrapping. Hope it helps. If not, you can send me an email: rogass@gfz-potsdam.de
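The window question can also be seen directly from the estimator itself: with no averaging (window size 1), the sample coherence magnitude is identically 1 at every pixel, so only spatial averaging makes the estimate informative. A small numpy sketch, with simulated uncorrelated complex noise standing in for the two SLCs (all values synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (64, 64)
s1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
s2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# "window size 1": no averaging -> |gamma| = |s1 s2*| / (|s1||s2|) = 1 everywhere
gamma_1 = np.abs(s1 * np.conj(s2)) / np.sqrt(np.abs(s1)**2 * np.abs(s2)**2)

def block_mean(a, b):
    """Average over non-overlapping b x b blocks (stand-in for a moving window)."""
    h, w = a.shape
    return a.reshape(h // b, b, w // b, b).mean(axis=(1, 3))

# 4x4 estimation window: uncorrelated scenes now show low coherence
num = np.abs(block_mean(s1 * np.conj(s2), 4))
den = np.sqrt(block_mean(np.abs(s1)**2, 4) * block_mean(np.abs(s2)**2, 4))
gamma_4 = num / den
print(gamma_1.mean(), gamma_4.mean())  # ~1.0 vs well below 1
```

The per-pixel estimate carries no information about decorrelation at all; the window supplies the independent looks the ML estimator needs, at the cost of spatial resolution and some bias for small windows.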
Question
I have a sample size of 215, with 10 of them disease-positive, and about 20 covariates that I want to examine. Can I do a univariate logistic regression for each, and include those with a p-value less than 0.1 in a Firth logistic regression?
How can I interpret the results in SPSS?
Conf. Interval Type Wald
Conf. Interval Level (%) 95
Estimation Method Firth penalized maximum likelihood
Output Dataset --NA--
Likelihood Ratio Test 38.0566
Degrees of Freedom 11
Significance 7.65335733629025e-05
Number of Complete Cases 176
Cases with Missing Data 39
Number of Iterations 26
Convergence Status Converged
Last Log Likelihood Change 6.3948846218409e-14
Maximum Last Beta Change 5.36667170504665e-06
What do the significance, the last log-likelihood change, and the maximum last beta change mean?
I agree with David Eugene Booth: serially testing each variable in a single-predictor LR, then only using those that meet a bivariate p standard in a subsequent multiple-predictor LR, is not advisable. This would, of course, be parallel to examining all of the zero-order PPM correlations between predictors and a criterion and including only those that met a bivariate p standard in a multiple regression equation. You are missing the joint predictive structure of the variables. Moreover, I am not quite sure how you know that the resultant LR will need to be a Firth LR. This would only be necessary if the LR maximum-likelihood iteration failed to converge due to complete or quasi-complete separation. The only way one would know this from the bivariate results is if some (or one) of the predictors presented that situation by themselves; perhaps that is the case, I can't tell from what I see.
I presume that disease "positive" and "negative" represent the two groups. As such the very low base rate of 10 for positive, presents a logical and statistical problem. I suggest that you try to prune, or combine, predictors in some way; 20 is problematic given this situation.
Once all of this is settled, I suggest careful consideration of what the criterion of accuracy will be. A long-standing standard of accuracy in the sister technique of discriminant analysis has been cross-validated classification accuracy: how well can your model correctly classify subjects (with cross-validation) into the respective groups (positive and negative)? My guess is that this is what is important to you. Indeed, using a matrix simplification due to Bartlett, introduced for discriminant analysis by Lachenbruch, the "U" method, or "leave-one-out" method as Huberty deemed it, allows us to get cross-validated classification estimates with the sample at hand. This is, for instance, implemented in discriminant analysis in SPSS. The same round-robin omission of a single subject and prediction of group membership from a model including N-1 subjects is also applicable to LR, but not implemented in that package (no matrix identities to help here). I do have such a program if you are interested.
Finally, I would suggest that you consider discriminant analysis as well as LR as candidate analysis methods; don't toss Fisher out just yet :-) You might be interested in how closely the cross-validated performances of these techniques track in a publication that we wrote a bit back. However, in practice, I have seen LR perform much poorer than discriminant analysis in similar situations with a low base-rate.
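The leave-one-out idea described above can be sketched in plain Python. The data here are simulated, and the logistic fit uses a simple Newton (IRLS) loop rather than any particular package, purely to make the round-robin scheme explicit:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton/IRLS; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (W[:, None] * X) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# simulated two-group data with overlapping predictor distributions
rng = np.random.default_rng(0)
n = 60
x = np.concatenate([rng.normal(0, 1, n // 2), rng.normal(2, 1, n // 2)])
y = np.repeat([0, 1], n // 2)
X = np.column_stack([np.ones(n), x])

# leave-one-out: refit on N-1 cases, predict the held-out case
hits = 0
for i in range(n):
    keep = np.arange(n) != i
    b = fit_logistic(X[keep], y[keep])
    hits += int((X[i] @ b > 0) == (y[i] == 1))
print(hits / n)  # cross-validated classification accuracy
```

With a rare outcome like the 10/215 case above, the same loop works, but accuracy alone is misleading; sensitivity and specificity per group are worth reporting separately.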
Question
I have a complex log-likelihood and I want to estimate its parameters. I have tried everything I could; the best diagnosis I got from the optim function is that I didn't use a good initial value. Kindly advise; this is slowing down my work.
I would suggest you try one of the methods provided by the "nloptr" package in R. One of them may work for your problem.
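Another common remedy when an optimizer is sensitive to the starting value is a multi-start strategy: run the optimizer from several initial points and keep the best result. A sketch of the idea in Python (using a simple normal log-likelihood as a stand-in for the questioner's model, and scipy's Nelder-Mead as the optimizer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=300)

def nll(theta):
    """Negative log-likelihood of N(mu, sigma); sigma parameterized on log scale."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + data.size * log_sigma

# multi-start: try several initial values, keep the lowest NLL found
starts = [(0.0, 0.0), (10.0, 2.0), (-5.0, -2.0), (3.0, 1.0)]
fits = [minimize(nll, x0, method="Nelder-Mead") for x0 in starts]
best = min(fits, key=lambda f: f.fun)
mu_hat, sigma_hat = best.x[0], np.exp(best.x[1])
print(mu_hat, sigma_hat)  # near the sample mean and SD
```

Reparameterizing constrained parameters (here, sigma on a log scale) often helps as much as the multi-start itself, since it keeps every iterate in the valid region.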
Question
Hello,
I am currently looking at the results of confirmatory factor analyses (CFA) that were conducted by another person. There are a few analysis and model choices that don’t seem quite right to me. I would greatly appreciate if anyone with enough experience with CFA could let me know what they think of the following points:
1) Is maximum-likelihood (ML) estimation ever an acceptable method to use in CFA if variables are ordinal (e.g., Likert scales) or nominal?
2) Is ML estimation ever an acceptable method to use in CFA if data are not normally distributed?
3) Is it acceptable to keep an item with a factor loading > 1 (heywood case) if the model’s fit indices and parameter estimates are otherwise acceptable?
Thank you vey much!
Hi Alexandra
1) If the indicators are ordinal and have >= 5 categories, it's acceptable. There are estimators such as DWLS or WLSMV that are explicitly designed for categorical indicators, but they rely on the (unreasonable) assumption that there is a normally distributed latent response variable underlying the indicators.
2) It depends on the extent and form of the deviation. In today's software it is easy to apply corrections (e.g., the Satorra-Bentler or Yuan-Bentler correction). Severe kurtosis, especially, is problematic. If I remember correctly, nonnormality inflates the chi-square and deflates the standard errors.
3) If the factor loading is standardized, then I would view this as a problem. Fit of a model is one thing, but plausibility is an important second thing. Unreasonable coefficients or Heywood cases raise doubt about whether the identification and estimation procedure really worked.
Best
Holger
Question
I can find some articles describing R code for drawing a nomogram after logistic regression or Cox regression. But if, in my logistic regression model, penalized maximum likelihood has to be used to resolve the failure of the maximum likelihood estimate to converge, is it still possible to draw a nomogram?
If the answer is yes, how to write R codes for this condition?
Joan
Question
I have estimated a tobit regression model with one dependent variable and 14 independent variables. The number of observations is 450 out of which 41 are left censored while all others are uncensored. I am using a primary survey data relating to farm households. The stata output provides a pseudo-R2 value of 3.75. Kindly let me know if it is acceptable.
The value of R-squared cannot be negative or greater than 1.
So your result of 3.75 does not seem sensible. I suspect your model suffers from a multicollinearity problem: the high number of independent variables (14) can produce a near-singular information matrix and a spurious regression.
My advice is to use stepwise regression.
Question
Let X1,...,Xn iid random variables with distribution F(p), where p is some parameter. Due to some reasons, p is not observable directly (in the sense that there is no way to confirm whether p is static or dynamic). The challenge is to estimate the quantile of p to a given level, without assuming a particular distribution of p.
It seems that the bootstrap is the only choice. So, under the assumption that X1,...,Xn are not time-specific, the bootstrap might not be a bad choice. However, the estimate depends heavily on the number of bootstrap draws, which results in an unreliable quantile estimate.
However, assuming that p can be estimated by maximum likelihood method as well as any conditions necessary to ensure a consistent ML estimate, then we know that the ML estimate follows a normal distribution.
So is it appropriate to use the quantile of the ML estimate (using normal distribution) to estimate the quantile of p?
@ Christopher Imanto ,
p is a parameter; it does not have a distribution.
The MLE of p is a random variable, and it has a distribution.
Note that the MLE of p is a function of the random variables
X1, X2, ..., Xn.
In the asymptotic distribution of sqrt(n)(p_MLE - p), p is not a variable; the MLE of p is the only variable.
Thus, p cannot have quantiles, but the MLE of p can.
From the asymptotic distribution, you will obtain the quantile of the MLE of p as a function of p and I(p).
Still, you cannot determine the value of the quantile of the MLE of p, since p is unknown.
The only approximation you can make is:
"Substitute the value of the MLE of p, obtained from the observations, in place of p in the formula for the quantile of the MLE of p."
Question
Additionally, can someone comment on the following:
I am generating 1D data using a squared exponential kernel. If I use the data to learn the hyperparameters with a maximum likelihood approach, then under what conditions will I recover the same hyperparameters I used to generate the data?
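One way to explore this question empirically: draw a function from a GP with a known lengthscale and inspect the log marginal likelihood as a function of the candidate lengthscale. With enough well-spread inputs and a well-conditioned covariance, the likelihood peaks near the generating value; with few points, or lengthscales far outside the range the inputs can resolve, it need not. A numpy sketch (all settings below are arbitrary choices for illustration):

```python
import numpy as np

def se_kernel(x, ell, var=1.0):
    """Squared exponential kernel matrix for 1D inputs x."""
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(x, y, ell, noise=1e-2):
    K = se_kernel(x, ell) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # -0.5 * log det K
            - 0.5 * len(x) * np.log(2 * np.pi))

# draw one function from a GP with lengthscale 1.0, then score candidates
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 80)
K_true = se_kernel(x, ell=1.0) + 1e-2 * np.eye(80)
y = np.linalg.cholesky(K_true) @ rng.standard_normal(80)

ells = [0.05, 0.3, 1.0, 3.0, 20.0]
lls = [log_marginal_likelihood(x, y, ell) for ell in ells]
print(dict(zip(ells, lls)))  # the generating lengthscale should score near the top
```

Informally, the conditions for recovery are identifiability (signal variance, lengthscale, and noise not trading off against each other given the input spacing and range) plus enough data for the likelihood to concentrate; a single short draw can easily favor a somewhat different lengthscale.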
Question
I am estimating a multiple regression model (with one 2-way interaction) using Maximum Likelihood estimation (MLE). Due to some substantial missingness on some important covariates (30-60% missing; n=19000), I estimated the multiple regression model using two missing data treatments (Listwise Deletion, Multiple Imputation). These methods, however, produced different results - example, interaction was significant when using multiple imputation, but not listwise deletion.
Is there a method/test to evaluate which approach (listwise deletion or multiple imputation) is more trustworthy? In papers I've read/reviewed, people often seem to find concordance between their model coefficients when using listwise deletion and multiple imputation.
Also, for those interested, these models were estimated in Mplus, and I implemented a multiple imputation based on bayesian analysis to generate imputed datasets followed by maximum likelihood estimation.
Thanks much,
Dan
Hello Dan,
Yes; different tactics for addressing missing data frequently yield inconsistent results.
Usually the first order of business in trying to select a suitable method is to determine whether the data appear to be missing completely at random (mcar), missing at random (mar), or to have systematic relationships with presence/absence of data points.
Having said that, I suspect the multiple imputation approach you used is likely to be more warmly received than a listwise deletion approach that costs you 60% or more of the data set.
Question
Is there any particular software for partitioning of sequences like the CO1 sequences were partitioned by each codon position, whereas the EF1α sequences were partitioned as introns and exons. Please help me out about this querry.
Tray to use matlab mfile
Question
For given samples X1,...,Xn, which follow a distribution F with parameters p1,...,pm, one may use the maximum likelihood method to estimate p1,...,pm. Technically, the method is just an optimization problem in an m-dimensional space.
In some cases it is irritating to obtain different estimated values of p1,...,pm than the value acquired when estimating p1 alone (assuming this parameter does not depend on p2,...,pm). E.g., the joint estimates (\mu, \sigma) may differ from \mu estimated alone. Since \mu denotes the expected value, which one is "correct": the joint \mu or the single \mu?
1) If the joint one is correct, then any single estimator should be doubted.
2) If the single one is correct, then the joint model has no value, and one should always choose the univariate model.
In some cases, the joint \mu does not make any sense: the interpretation of \mu as an expected value is somehow no longer valid in a joint model.
Maximum likelihood estimation works very well for multi-dimensional data. And you are right that single point estimators are quite useless if the multidimensional parameter space is not orthogonal; in that case the correlations among the parameters must be accommodated.
In other words, the parameter estimates that make up the multidimensional space are inaccurate unless the model accommodates any correlations among them.
Certainly, if the model is biased by a failure to accommodate interactions among the parameters, then the estimators will be wrong.
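The normal distribution illustrates the orthogonal case nicely: the joint MLE of (\mu, \sigma) yields exactly the same \mu as estimating \mu alone (the sample mean), so the joint and single estimates coincide when the parameters are information-orthogonal. A quick numeric check on simulated data (minimizing the negative log-likelihood over a grid of \mu at several fixed values of \sigma):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=500)

def nll(mu, sigma):
    """Negative log-likelihood of N(mu, sigma) for the sample x (constant dropped)."""
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + x.size * np.log(sigma)

mus = np.linspace(1.0, 3.0, 2001)
# whatever value sigma is fixed at, the minimizing mu is the sample mean
best_mu_at = {s: mus[np.argmin([nll(m, s) for m in mus])] for s in (0.5, 1.5, 4.0)}
print(best_mu_at, x.mean())
```

When the likelihood does not factor like this (correlated parameters, as in the answer above), the profile of one parameter does shift with the others, and that is exactly when joint and single estimates diverge.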
Question
I am currently developing a new flexible discrete distribution with two parameters. However, I am unable to obtain unique estimates using MLE or the method of moments: R returns multiple solutions with the same log-likelihood value, and the method of moments is also a dead end. Why does this happen? By the way, the distribution is a legitimate distribution, with probabilities summing to one.
1. Maximum likelihood estimates are guaranteed to be unique only if the likelihood function is log-concave in the parameters. This is not always the case; see e.g. "Examples of Nonunique Maximum Likelihood Estimators" by Dharmadhikari and Joag-Dev.
2. The method of moments will not work if your discrete distribution has infinite support and the sequence of partial sums for one of the moments (say the q-th moment), \sum_i p_i x_i^q, does not converge.
Question
Hi all,
It is easy to define a single outgroup when reconstructing a maximum-likelihood (ML) tree using IQ-TREE. Currently, I'd like to specify several outgroups in IQ-TREE, but I can't find an option for this. I also tried specifying a comma-separated list of outgroup taxon names after the parameter -o (-o outgroup1,outgroup2,outgroup3...), but it does not work. Does anyone know how to do that? Many thanks.
All the best,
Fangluan
It's a late answer, but since v1.6.2, IQ-TREE has supported a comma-separated list of outgroup taxa for the -o option.
Question
Intuitively, maximum likelihood inference on high-frequency data should be slow because of the large data set size. I was wondering if anyone has experience with slow inference; I could then develop optimization algorithms to speed up the inference.
I tried this with Yacine Ait-Sahalia's work on estimating diffusion models, using his code, which (unfortunately!) is pretty fast, even for large data sets. If anyone knows of a large, slow, high-frequency financial econometrics problem, do let me know.
For large samples, exact maximum likelihood can be approximated reasonably well by faster estimation methods. But I do not understand why you want slow methods. As far as I know, Ait-Sahalia's code is good. Why do you say "(Unfortunately!)"?
Question
Hi, can someone please explain: if the EM algorithm is initialized with k-means, does one need to compute the complete-data maximum likelihood estimate (MLE) or the maximum a posteriori (MAP) estimate?
In my understanding, setting priors leads the EM algorithm to maximize the MAP objective. So, by initializing with k-means, are we setting priors?
Thank you.
Thank you, Firdos, for sharing the article. I have read it, and yes, they calculated the MLE and not the MAP estimate.
But in other work (Model-Based Clustering and Visualization of Navigation Patterns on a Web Site by Igor V. Cadez et al.), when priors are set using a Dirichlet distribution, the MAP estimate is computed.
So, my concern is: why is MAP not computed in the case of k-means initialization?
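To make the distinction concrete: initializing EM from a k-means-style partition only supplies starting values; it does not introduce a prior, so the M-step below is a plain ML update. It would become MAP only if a prior term (e.g. Dirichlet pseudo-counts on the weights) were added to these updates. A minimal 1D two-component Gaussian mixture sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

# k-means-style initialization: split at the overall mean (just starting values)
labels = (x > x.mean()).astype(int)
mu = np.array([x[labels == k].mean() for k in (0, 1)])
var = np.array([x[labels == k].var() for k in (0, 1)])
pi = np.array([np.mean(labels == k) for k in (0, 1)])

for _ in range(50):
    # E-step: responsibilities under the current parameters
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: plain MLE updates (no prior term anywhere -> this is ML, not MAP)
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
print(np.sort(mu))  # close to the generating means (0 and 5)
```

A MAP variant would, for example, add Dirichlet pseudo-counts to nk when updating pi, or shrink var toward a prior scale; the initialization scheme would be unchanged either way.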
Question
I am new to RStudio, and I would like your help and guidance to build a phylogenetic tree with maximum likelihood. Thank you.
Thank you very much Vincent, a big hug.
Question