Science topic

# Computational Statistics - Science topic

Explore the latest questions and answers in Computational Statistics, and find Computational Statistics experts.
Questions related to Computational Statistics
Question
I have already estimated the threshold that I will consider, created the pdf of the extreme values that I will take into account and currently trying to fit to this a Generalized Pareto Distribution. Thus, I need to find a way to estimate the values of the shape and scale parameter of the Generalized Pareto Distribution.
Mojtaba Mohammadian When using the mean excess plot, you look to choose the threshold in a section of the plot that is stable (i.e. near horizontal). You choose the smallest threshold value in this region to reduce bias, but at the expense of increased variance, so you must strike a compromise between these two opposing measures.
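Once the threshold is fixed from the mean excess plot, the usual route to the shape and scale parameters is maximum likelihood on the exceedances. A minimal Python sketch using scipy; the simulated sample and the quantile-based threshold are stand-ins for illustration, not the asker's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated stand-in for the real series: a heavy-tailed sample
sample = rng.pareto(3.0, 20000) + 1.0

u = np.quantile(sample, 0.95)          # threshold chosen from the mean excess plot
exceedances = sample[sample > u] - u   # peaks over threshold

# Maximum-likelihood fit of the GPD to the exceedances, location fixed at 0
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)
print(shape, scale)
```

For a Pareto-tailed sample like this one, the fitted shape should come out near the reciprocal of the tail index.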
Question
I'm currently working on my master's thesis, in which I have a model with two IVs and 2 DVs. I proposed a hypothesis that the two IVs are substitutes for each other in improving the DVs, but I cannot figure out how to test this in SPSS. Maybe I'm thinking to 'difficult'. In my research, the IVs are contracting and relational governance, and thus they might be complementary in influencing my DVs or they might function as substitutes.
I hope anyone can help me, thanks in advance!
I think you can check the sign of the coefficients: a positive sign suggests a complementary effect, while a negative sign suggests a substitution effect.
Question
Hello Everyone,
I want to produce the following figure using R for my paper, but I don't know how to produce it without overlapping labels. Can anybody tell me how to do that with the base plot function "plot()"? Other packages can produce this figure, but I am specifically interested in the plot function in R.
Here is my R script:
plot(SO ~ TO, xlim = c(0.4, 0.9), ylim = c(0.1, 0.5), col = "green3", pch = 19, cex = 2, data = TOSO)
text(SO ~ TO, labels = X, data = TOSO, cex = 0.8, font = 1, pos = 4)
Thank you in advance.
Himanshu
I think the ggrepel library works really well with ggplot2 for avoiding overlaps and making nice annotations. It is also possible to find solutions using jittering.
Question
I explored different packages (such as mgcv, MCMCglmm, glmmADMB, etc.) but they all have limitations: either they don't allow for zero-inflated or zero-altered distributions, or they don't allow for temporal autocorrelation through functions such as corAR1() or corARMA().
A late addition to this question as there have been developments since then. mgcv can now handle this model!
1) The supported family of distributions has been expanded and zero-inflated Poisson can now be specified. [see ?family.mgcv ]
2) Smoothers can now be used to model the autocorrelation using a Gaussian Process [ + s(timevar, bs = "gp") ]. There is an example here:
(3) Random effects can be modelled using smooths as before. A recent paper goes over it in detail:
So the mgcv model would look roughly like:
gam(counts ~ ... + s(time, bs = "gp") + s(groups, bs = "re"), family = ziP(), data = mydataframe)
Hope that helps someone!
Question
I am working with the ENMeval package (Muscarella et al., 2014) in R to develop Species Distribution Models. This package develops SDMs using the raw Maxent output, but I need logistic or probability maps in order to conduct further analysis in ArcGIS.
Hi Guillerme,
Thanks for the advice. I used conventional MaxEnt with the desired settings to derive the predictive surfaces.
Question
While running multinomial logistic regression in SPSS, an error appears in the parameter estimates table. Some Wald statistic values are missing because of zero standard errors, and a message is displayed below the table: "Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing". Does anyone know how to resolve this error?
Madam, I am facing the same problem. Could you please help me resolve it?
Question
I have a set of nonlinear ordinary differential equations with some unknown parameters that I would like to determine from experimental data. Does anybody know of any good freely available software, or good reference books?
Use the Mathematica function NonlinearModelFit.
For example, for the data set:
data = {{0., 61.375}, {0.25, 366.5}, {0.5, 305.375}, {0.75, 211.375},
{1., 165.125}, {1.5, 128.25}, {2., 121.875}, {4., 84.375}}
a simple Mathematica program [Glucose Tolerance Test]:
model[al_?NumberQ, om2_?NumberQ, KK_?NumberQ, mu_?NumberQ, nu_?NumberQ] :=
  Module[{y, x},
    First[y /.
      NDSolve[{y''[x] + 2 al y'[x] + om2 y[x] == KK, y[0] == mu,
        y'[0] == nu}, y, {x, 0, 10}]]]
Clear[x, y]
nlm = NonlinearModelFit[data,
  model[al, om2, KK, mu, nu][x], {{al, 5}, om2, KK, {mu, 100}, {nu, 4000}}, x]
nlm[[1, 2]]
gives:
{al -> 4.97808, om2 -> 16.8225, KK -> 1650.52, mu -> 61.2647, nu -> 3811.59}
A parameter identification problem for a set of nonlinear ordinary differential equations with unknown parameters can be solved similarly.
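For readers who prefer an open-source route, the same idea (integrate the ODE numerically and fit its parameters by least squares) can be sketched in Python with scipy; the decay model, data, and starting values here are invented for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

# Synthetic "experimental" data from dy/dt = -k*y with k = 0.7, y(0) = 5
t_data = np.linspace(0.0, 5.0, 20)
y_data = 5.0 * np.exp(-0.7 * t_data)

def model(t, k, y0):
    # Integrate the ODE numerically and evaluate it at the requested times
    sol = solve_ivp(lambda s, y: -k * y, (t_data[0], t_data[-1]), [y0], t_eval=t)
    return sol.y[0]

popt, _ = curve_fit(model, t_data, y_data, p0=[1.0, 1.0])
print(popt)
```

With noise-free data the fit should recover k and y(0) to within the integrator's tolerance.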
Question
Xlstat Add-Ins have two different kinds of the trend analysis...(MK and Seasonal MK).. what's the difference in the way of calculation? and what type of data should be an entry? (monthly or seasonally)?
The method of trend detection in the seasonal MK and MK tests is the same. In the seasonal MK test, the Kendall score is calculated for each month (season) and then summed to find the seasonal Kendall score; the rest of the method is the same for both.
The seasonal Kendall test calculates a Kendall score for each of the m seasons and then adds them. If your data belong to only one season, it is as good as using the MK test itself.
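The computation described above (a Kendall S score per season, then summed) can be sketched in a few lines of Python; the monthly series below is invented for illustration:

```python
import numpy as np

def kendall_s(x):
    # Mann-Kendall S statistic: sum of sign(x[j] - x[i]) over all pairs i < j
    x = np.asarray(x)
    n = len(x)
    return sum(np.sign(x[j] - x[i]) for i in range(n) for j in range(i + 1, n))

def seasonal_kendall_s(x, n_seasons=12):
    # Seasonal MK: compute S within each season separately, then sum the scores
    x = np.asarray(x)
    return sum(kendall_s(x[s::n_seasons]) for s in range(n_seasons))

# Hypothetical monthly series: upward trend plus an annual cycle
t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)
print(seasonal_kendall_s(series))
```

Because each within-month subseries sees only the trend, the seasonal score is large and positive here even though the annual cycle muddies pairwise comparisons across months.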
Question
Rolle's theorem is applicable in R. Is it also applicable for a function f going from R^n to R^n?
Dear Sengupta,
I hope this will help.
Question
We used SPSS to conduct a mixed model linear analysis of our data. How do we report our findings in APA format? If you can direct us to a source that explains how to format our results, we would greatly appreciate it. Thank you.
The lack of a standard error depends on your software, and even then it only applies to the variance terms. The reason is that the variance cannot go negative, so its sampling distribution can often be expected to be skewed rather than asymptotically normal. So just explain this in your results table.
Question
We want to calculate the Root Mean Square Error (RMSE) from the model summary table of a Multilayer Perceptron (SPSS), in the format demonstrated in the Table.
An excel file is attached to calculate RMSEs and draw a diagram. You need to copy the relevant outputs of ANNs (SPSS) into the sheet. Please let me know if any errors are noticed. I hope you find this attachment helpful. Best.
Question
Hi,
I want to know how to rank the relative importance of predictors calculated by summing the Akaike weights of the different models where they are included.
I just found a comment on this in the following link, but it would be helpful to find a citation.
thanks
You can follow what Kittle et al. (2008) say about calculating the relative importance of predictors in an unbalanced model set, where they weight the cumulative wi for each predictor by dividing it by the total number of models in which that predictor is represented.
Verbatim, they say: "To determine the relative importance of variables and further reduce model ambiguity, we summed wi of all models containing a common factor for each analysis. The higher the combined weights for an explanatory variable, the more important it is for the analysis (Burnham and Anderson 2002). For this measure to be meaningful it is necessary to have the same number of models containing each variable (Burnham and Anderson 2002), so we divided the cumulative model weights for a particular variable by the number of models containing that variable to get an average variable weight (wi) per model"
DOI 10.1007/s00442-008-1051-9
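The procedure quoted above (sum the Akaike weights wi of all models containing a predictor) can be sketched as follows; the model set, AIC values, and predictor names are hypothetical:

```python
import numpy as np

# Hypothetical model set: the predictors each model contains and its AIC
models = [
    ({"rain", "temp"}, 100.2),
    ({"rain"}, 101.0),
    ({"temp"}, 104.5),
    (set(), 110.3),
]

aic = np.array([a for _, a in models])
delta = aic - aic.min()
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()  # Akaike weights

# Relative importance: sum the weights of all models containing each predictor
importance = {
    v: sum(w for (preds, _), w in zip(models, weights) if v in preds)
    for v in ("rain", "temp")
}
print(importance)
```

Per Kittle et al.'s caveat, divide each sum by the number of models containing that predictor if the model set is unbalanced.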
Question
Hi All,
I'm looking for assistance with the queries below, related to seasonality in VAR and inventory management/supply chain models.
I have a multivariate time series with 2 variables.
1) Could you let me know whether a VAR model will work with a multivariate time series with seasonality?
2) Does anything need to be done explicitly for VAR to handle the seasonal and random components of the constituent time series?
3) What other multivariate time series models are similar to the VAR model?
4) Are there any R packages that support inventory management other than SCPerf and InventoryModelPackage? I'm looking to implement inventory optimization using R. Any leads/sample code would be helpful.
Regards
Lal
ERP software can be useful in analysing the above models.
Question
I want to do a PERMANOVA using AIC as the selection criterion. I did the PERMANOVA with adonis from the vegan package, but I can't find how to use AIC:
> m <- adonis(dune ~ ., dune.env)
> extractAIC(m)
Error in UseMethod("extractAIC") :
no applicable method for 'extractAIC' applied to an object of class
Do you know how to do it? Or is there another way to do PERMANOVA in R (choosing the distance matrix) that allows using AIC?
I realise that this question is now three years old! But for all of you who are still googling the same question and happen to land here, a colleague of mine showed me this useful script which answers this question:
Cheers, Mark
Question
Hi,
I was wondering if there is an R-package (and functions therein) that implements Bayesian Phylogenetic Mixed Models (BPMM) or is the general R-package "MCMCglmm" for Bayesian Mixed Models currently the best option?
Bit late, but in case anyone else wants this, here's a paper that implements BPMMs with MCMCglmm.
Question
As you know, Linear Discriminant Analysis (LDA) is used for a dimension reduction as well as a classification of data.
When we use LDA as a classifier, the posterior probabilities for the classes are normally computed in the statistical library such as R. (For example, https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/predict.lda.html)
Then how are the posterior probabilities computed? How are they estimated in the projected low-dimensional space?
Yusuke,
you can find the formula in Pattern Recognition and Neural Networks by B. D. Ripley, page 49
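The usual construction behind predict.lda's posteriors is Bayes' rule with Gaussian class densities sharing a pooled covariance: p(k | x) is proportional to prior_k times N(x; mean_k, Sigma). A Python sketch comparing a manual computation with scikit-learn's LDA; the two-class simulated data are an assumption for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Manual posterior: p(k | x) proportional to prior_k * N(x; mean_k, pooled cov)
means = [X[y == k].mean(axis=0) for k in (0, 1)]
centered = np.vstack([X[y == k] - means[k] for k in (0, 1)])
pooled = centered.T @ centered / (len(X) - 2)   # pooled within-class covariance
priors = [0.5, 0.5]

def manual_posterior(x):
    lik = np.array([p * multivariate_normal.pdf(x, m, pooled)
                    for p, m in zip(priors, means)])
    return lik / lik.sum()

x_new = np.array([1.5, 1.5])
print(manual_posterior(x_new), lda.predict_proba([x_new])[0])
```

Working in the projected discriminant space gives the same posteriors, since the projection preserves the between-class information used in the densities.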
Question
Hello everyone :)
I am currently conducting a comprehensive meta-analysis on customer loyalty, with a huge amount of articles, that are using SEM to evaluate the strengths of the relationships between the different variables I am interested in (satisfaction, loyalty, trust…etc).
I saw that for most meta-analyses, the effect-size metric is r. But since all my articles of interest use SEM, I can only report the beta coefficients, t-values and p-values. Is it okay to use these kinds of metrics to conduct a meta-analysis?
I saw an article by Peterson (2005) explaining how to transform a beta into an r coefficient for articles where r is not available. This is a first start, but it does not give me a comprehensive method for conducting a meta-analysis only with SEM articles (what metrics should I code? what statistics should I compute? etc.).
My question is then: is it possible to conduct a meta-analysis with articles using SEM? If yes, do you have references explaining how to code the metrics and compute the statistics for the meta-analysis?
Thanks in advance for your help ! :)
Kathleen Desveaud
1. You can use a structural equation approach to meta-analysis, as suggested by Gordon. I recommend another book by Mike Cheung: https://www.amazon.de/Meta-Analysis-Structural-Equation-Modeling-Approach/dp/1119993431. For this, you would use the individual correlation matrices of the primary studies to generate a meta-analytical variance-covariance matrix and run SEM on that. This method is preferable to first meta-analysing the bilateral relation and using the matrix of those relations as input to SEM.
2. You can use the output of SEM as you would treat any regression method for inclusion in meta-analysis. Hence, you can use the regression coefficients (or its derivative: partial correlation) and run the meta-analysis on that. Check the first question here: http://www.erim.eur.nl/research-facilities/meta-essentials/frequently-asked-questions/ (disclaimer: I wrote the answer). This is based on the work of Ariel Aloe and others: DOIs: 10.1080/00221309.2013.853021 and 10.3102/1076998610396901
Good luck!
Robert
PS: I think Martin Bjørn Stausholm misunderstood your question as referring to standard errors. His answer is absolutely correct (I think) but doesn't relate to structural equation modelling (SEM)
Question
.
If anybody wants to do this in Python, you can use the following method (note: the snippet as originally posted referenced an undefined q; it should be the y-loadings of the model):
import numpy as np

def _calculate_vips(model):
    t = model.x_scores_
    w = model.x_weights_
    q = model.y_loadings_  # was missing in the original snippet
    p, h = w.shape
    vips = np.zeros((p,))
    s = np.diag(np.matmul(np.matmul(np.matmul(t.T, t), q.T), q)).reshape(h, -1)
    total_s = np.sum(s)
    for i in range(p):
        weight = np.array([(w[i, j] / np.linalg.norm(w[:, j]))**2 for j in range(h)])
        vips[i] = np.sqrt(p * (np.matmul(s.T, weight)) / total_s)
    return vips

Where model is a trained instance of a PLSRegression object.
Question
Using R code in the VineCopula package we can obtain the tree graph of a vine copula; can we draw the same tree in Matlab? Any help?
Question
STATA does not support moving block bootstrapping, where one can specify the length of the block, so I have to write the command myself.
I would be very thankful of anyone can help
Cheers
Not really.
The answer that I always get is that Stata does it using the bootstrap option, but effectively this is not a moving block bootstrap: Stata treats my dataset as a cross-section, which is inappropriate for my time series analysis. Have you tried R or Matlab?
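If it helps, a moving block bootstrap is short to write by hand. Here is a minimal Python sketch (the block length and toy series are assumptions for illustration) that could be translated to Stata or Matlab:

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng=None):
    # Draw overlapping blocks of fixed length and concatenate them,
    # trimming the result back to the original series length.
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [x[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

series = np.sin(np.arange(100) / 5) + np.random.default_rng(0).normal(0, 0.1, 100)
resample = moving_block_bootstrap(series, block_len=10, rng=np.random.default_rng(1))
print(len(resample))
```

Repeating the resampling B times and recomputing the statistic on each resample gives the bootstrap distribution while preserving short-range dependence within blocks.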
Question
In decision making applications based on neutrosophic logic, how to sort in best to worse order for following:
(T, F, I) : (1,0,0) (1,0,1), (1,1,0), (1,1,1), (0,0,0), (0,1,1), (0,1,0)
true (1,0,0) > weak true (1,1,0) > contradiction (1,0,1) > {ignorance (0,0,0), saturation (1,1,1)} > neutrality (0,1,0) > weak false (0,1,1) > false (0,0,1).
Note that ignorance (0,0,0) and saturation (1,1,1) are not comparable.
Question
Hi!
We are trying to estimate body mass (W) heritability and cross-sex genetic correlation using MCMCglmm. Our data matrix consists of three columns: ID, sex, and W. Body mass data is NOT normally distributed.
Following previous advice, we first separated weight data into two columns, WF and WM. WF listed weight data for female specimens and “NA” for males, and vice-versa in the WM column. We used the following prior and model combination:
prior1 <- list(R=list(V=diag(2)/2, nu=2), G=list(G1=list(V=diag(2)/2, nu=2)))
modelmulti <- MCMCglmm(cbind(WF,WM)~trait-1, random=~us(trait):animal, rcov=~us(trait):units, prior=prior1, pedigree=Ped, data=Data1, nitt=100000, burnin=10000, thin=10)
The resulting posterior means of the posterior distributions were suspiciously low (e.g. 0.00002). We calculated heritability values anyway, using the following:
herit1 <- modelmulti$VCV[,'traitWF:traitWF.animal'] / (modelmulti$VCV[,'traitWF:traitWF.animal'] + modelmulti$VCV[,'traitWF:traitWF.units'])
herit2 <- modelmulti$VCV[,'traitWM:traitWM.animal'] / (modelmulti$VCV[,'traitWM:traitWM.animal'] + modelmulti$VCV[,'traitWM:traitWM.units'])
corr.gen <- modelmulti$VCV[,'traitWF:traitWM.animal'] / sqrt(modelmulti$VCV[,'traitWF:traitWF.animal'] * modelmulti$VCV[,'traitWM:traitWM.animal'])
We get heritability estimates of about 50%, which is reasonable, but correlation estimates were extremely low, about 0.04%. Suspecting the model was wrong, we used the original dataset with all weight data in a single column and tried the following model:
prior2 <- list(R=list(V=1, nu=0.02), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
model <- MCMCglmm(W~sex, random=~us(sex):animal, rcov=~us(sex):units, prior=prior2, pedigree=Ped, data=Data1, nitt=100000, burnin=10000, thin=10)
The model runs, but it refuses to calculate "herit" values, with the error message "subscript out of bounds". We'd also add that in this case, the posterior density graph for sex2:sex.animal is not bell-shaped. What are we doing wrong? Are we even using the correct models?
Eva and Simona
See our published paper on the topic: Cross-sex genetic correlation does not extend to sexual size dimorphism in spiders
Question
Good morning, is there any way to check for multivariate outliers when the data are not composed only of continuous variables? My dataset includes categorical variables (with 2 and 3 levels) and continuous variables. I wonder whether there exists any modification of the Mahalanobis distance, or any other test. I know of at least one, developed by Leon & Carriere, for detecting outliers in data including nominal and ordinal variables (reference below), but I did not find any software implementing it.
Leon, A.R. & Carriere, K.C. 2005. A generalized Mahalanobis distance for mixed data. Journal of Multivariate Analysis, 92, 174-185.
I work mainly in R and MPlus, so it would be great if you could give a solution within this software. Thanks!
Hi Mario. Thanks for your response. I'm familiar with MVN, and the various estimators.
I'm working with ordinal data, five categories (those Likert-type scales we are so fond of in psychology). They are still sometimes treated as continuous data, and I just felt a little strange not doing the usual tests in that context. I think I've cleared up my confusion now.
Question
I wonder how to compute the statistical weight of a negative hydrogen ion (anion). I know that the statistical weight of the electron is ge=2 and of the proton is also gp=2, as they both have spin 1/2. The statistical weight of the hydrogen atom is gH=gp×ge=4. Assuming that the electrons in the negative ion are on different energy levels, I get gA=gp×ge×ge=8. Is that correct?
I agree completely. I understood that your question was regarding the multiplicity when the electrons are on different energy levels ("Assuming that the electrons in negative ion are on different energy levels I get ...").
Question
Cronbach's alpha for all the dimensions is satisfactorily above 0.7, and all factor loadings are satisfactory. However, the model fit indices are not satisfactory when CFA is run in AMOS, and the error message "the following covariance matrix is not positive definite" is displayed. Please help.
By eliminating the items having a very high level of correlation, the above-mentioned problem was solved.
Question
Hi, recently I was reading a paper from Will G Hopkins about magnitude-based inferences and downloaded his related spreadsheet. This concept is new to me and still leaves me with a lot of doubts. Indeed, I did not understand what he means by "value of effect statistic" and what he considers beneficial (+ive) or harmful (-ive). Can someone help me? Thanks in advance
Magnitude-based inference (MBI) is an interpretation that clinicians need in their evidence-based approach.
Due to clinical heterogeneity and small sample sizes, clinicians' decision-making about a treatment is based not only on statistical significance, but on an overall interpretation of effects, including the benefits and harms of a treatment and the confidence interval (range of uncertainty).
Question
Hi, your project sounds really interesting. For your interest, we have developed a highly novel methodology, referred to as HMM-GP (paper attached), for artificially generating synthetic daily streamflow sequences. HMM-GP, a suite of stochastic modelling techniques, integrates a highly competent Hidden Markov Model (HMM) with the Generalised Pareto distribution (GP). The application of the HMM retains the key statistical characteristics of the observed (input) streamflow records in the synthetic (output) streamflow series, but essentially re-orders the magnitude, spacing and frequency of streamflow sequences to simulate realistically possible multiple alternative (artificial) flow scenarios. These synthetic series could be utilised in a range of hydrological/hydraulic applications. Moreover, within the HMM-GP modelling framework, a Generalised Pareto distribution fitted to values over the 99th percentile allows highly accurate simulation of extreme flows/events. I would be very happy to hear from you if you have any comments/questions for me. Best wishes, Sandhya
No reply.
Question
In case you are having heavy-tailedness in the residual distributions, or you suspect endogeneity, apply GMM.
Please explain the problem in detail.
Question
Hello, I want to implement a function in Matlab to perform sampling without replacement with unequal weights.
- Without replacement: when sampling without replacement, each data point in the original dataset can appear at most once in the sample. The sample is therefore no larger than the original dataset.
- With unequal weights: when sampling with unequal weights, the probability of an observation from the original dataset appearing in the sample is proportional to the weight assigned to that observation, where the samples' weights should be changed at each iteration.
I have found this function in the link:
function I = randsample_noreplace(n, k, w)
I = randsample(n, k, true, w);
while 1
    [II, idx] = sort(I);
    Idup = [false, diff(II) == 0];
    if ~any(Idup)
        break
    else
        w(I) = 0;          %% Don't replace samples
        Idup(idx) = Idup;  %% find duplicates in original list
        I = [I(~Idup), (randsample(n, sum(Idup), true, w))];
    end
end
where a vector of probabilities will be generated using the random() function, for example:
n = 3; p = 0.5; M = 20; N = 1;
random('Binomial', n, p, [M, N])
Any suggestion would be appreciated.
I would say this method is fine if the sampling is simple; in that case your approach is correct.
Question
I am working with dichotomous data (8 items). To find out which model fits my data best, I used the WLSMV estimator in lavaan and specified two models: a 1-factor model and a 2-factor model. First I specified the 1-factor model (myModel) with WLSMV estimation and the ordered=c(...) argument in lavaan. This is the code:
fit <- cfa(myModel, data=data.deskr, ordered=c("T1_OD1", "T1_OR1", "V1_OD1", "V1_OR1", "T2_OD1", "T2_OR1", "V2_OD1", "V2_OR1"), group.equal = c("loadings"))
It worked well: I got an output for this model including fit indices etc. Then I specified a 2-factor model ('myModel2'/"fit2") in exactly the same way, and it also worked well. Then I wanted to compare the two models using:
anova(fit, fit2)
The error message is:
Error in lav_test_diff_af_h1(m1 = m1, m0 = m0) : lavaan ERROR: unconstrained parameter set is not the same in m0 and m1
Is there somebody who can help me? Thank you :)
Hi Verena, I think you can compare the models with the AIC and BIC values in the output; I've been reading about this before.
I do not have detailed information about the program you are using, so I cannot comment on it in more detail. Best wishes.
Question
Dear friends, hello. In my logistic regression classification task, some parameters are qualitative (e.g., race, city, country, etc.). Please answer my questions:
1. We have to code the values of these parameters as numbers 1, 2, 3, ... Is it important to assign these numerical values according to the similarity of the qualitative values, or is this unimportant from a classification point of view?
2. Is it possible to code missing values of some parameters for some objects as zero?
Thanks in advance for your answers.
Hi! My answers are below. Regards, Sergey.
1. If possible, you should use your knowledge about the parameter values. For example, if the parameter "weight" has 3 values (high, medium, low), you should use value 1 for low, 2 for medium and 3 for high.
2. The same coding should be used for missing values. For example, if the parameter "weight" has 3 values (high, low, don't know), you should use value 1 for low, 2 for "don't know" and 3 for high.
Question
I am making a machine-readable database of synaptic electrophysiology data from different sources. It is possible to find two experiments that have the same experimental condition, therefore data normalization is necessary. I am working on a list of possible covariates that influence the synaptic signals and possible suggestions for data normalization with respect to each covariate (please find the attached pdf document). Please let me know if you know a covariate that is missing from this list, as well as any proposals you may have.
Clément, I am not sure what you mean by technologist. Could you please elaborate on this?
Question
I have temperature time series data, in which the temperature fluctuates with time. I want to plot the pdf of the temperature without any predefined fit.
You can fit the data to likely distributions, and the one with the lowest AIC will be the best 'fit'. There are functions in R that will do this for you, e.g. 'fitdistr' :)
Question
We are doing a mini project about differential equation modeling: using yeast data for the logistic differential equation. We need to estimate parameters using an incomplete/missing data set. Two questions:
1. Besides least squares and MLE, what else is good for parameter estimation (quick & dirty)?
2. How do we deal with a data set with missing data points? (My students are using a neural network and gradient descent.)
Thank you!
My student used gradient descent with training; the result is also good.
Question
I am trying regression analysis in MATLAB and want to plot PPV vs scaled distance using a power function on a log-log scale. If there is a specific command for this, kindly provide it.
loglog(Y)
loglog(X1,Y1,...)
loglog(X1,Y1,LineSpec,...)
loglog(...,'PropertyName',PropertyValue,...)
loglog(ax,...)
h = loglog(...)
Question
I want to carry out research on classification with missing data, but I want to explore a new method. I need your suggestions, please.
Data imputation is always tricky business. Mean imputation can be problematic because you shed away some of the inherent variance in the data set by injecting more observations about the mean. Columbia has a good write-up on imputation. Many times a domain expert can be of great assistance when deciding how to, or whether you should, fill in missing data. Asking why the data are missing is very important: was this a sensor error, or was there no data for a date range, etc.? These questions can help guide you towards not imparting biases and noise into your data set. Hope this helps!
Cheers, Kyle
Question
Hello. Before applying multifactorial analyses, must we ensure that the data follow a Gaussian distribution? Please send me documents.
PCA is an exploratory technique; you can apply PCA to non-normal multivariate data.
Question
I want to fit nested linear-nonlinear Poisson models to the spike train of a neuron. How can I test the model performance by computing how much the log likelihood increases from a fixed mean firing rate model? i.e., what test can I use to see if the increase was significant? Is it possible to use the one-sided signed rank test? If yes, how?
Use the Bayes factor if you can compute it; otherwise use the BIC.
Question
My question is about Bayesian VARs (specifically, how to utilize them in EViews). To begin with, I would like to know whether Bayesian VARs are superior to conventional VARs, or when to use Bayesian VARs rather than conventional VARs. And suppose that I have to use Bayesian VARs in my research: in EViews, unlike unrestricted VARs, I have to specify priors and so on when employing Bayesian VARs. Is there any "cookbook" procedure (e.g. t-value greater than 2.0) for using Bayesian VARs in EViews? I would be delighted if you could provide me with non-technical lecture notes or spell out the general procedure. Many thanks in advance, Mizuki Tsuboi
Classical VARs are more straightforward to apply. Bayesian VARs are not: for them one must assume a prior distribution, which can be quite subjective. With data this can be updated to obtain the posterior distribution, from which the Bayes solution is obtained. However, if there is a vast amount of historical data, the Bayesian approach might be beneficial. The relative performance of the classical frequentist approach and the Bayesian approach depends on the scenario. I use EViews 7; there is a VAR modelling program, but there is no automatic facility for Bayesian VAR modelling.
I believe that this is no hindrance, because the researcher can improvise and customize the program to suit his/her specific needs if he/she is sufficiently knowledgeable.
Question
My dear colleagues, I am doing a simulation study of bivariate distributions (Pareto, exponential) and need any help with this work.
Read articles carefully and try to find the derivations; you'll find your problem. IA
Question
While the estimate and 95% confidence interval are available, it is unclear what the degrees of freedom would be. For example, with completely made-up data, one might want to compare the association between sleep disturbance and depression (e.g., OR=1.2 [1.1, 1.3], k=10) as well as sleep disturbance and anxiety (e.g., OR=1.25 [1.15, 1.35], k=15).
Hi Bruce, I am glad you find it useful. The mvmeta command in Stata is quite straightforward to use. There are also other commands (including mvmeta) in R. I attach the link of a review paper for multivariate meta-analysis with guidance on how to perform it in various software.
Question
I have a hierarchical model with prior parameters mu and tau. I should do a simulation study, in which I should suppose true values of mu and tau and use these true values to generate datasets. My question: how do I choose the true values of mu and tau? Is there a rule for choosing them? Thanks
I am talking about checking our model by a simulation study. In a simulation study I need the true values of the parameters that are used to generate the data sets. I need to know the rules for selecting these true parameter values.
Question
I have a 347x225 matrix: 347 samples (Facebook users) and 225 features (their profiles), and I used the PCA function for dimension reduction in Matlab:
x = load(dataset)
coeff = pca(x)
It generated a 225x98 matrix, but I don't understand what exactly it is generating and what to do next. Can anyone help me with understanding it?
My main goal is to reduce the dimension of my original matrix, and I don't understand the other calling forms:
coeff = pca(X,Name,Value)
[coeff,score,latent] = pca(___)
[coeff,score,latent,tsquared] = pca(___)
[coeff,score,latent,tsquared,explained,mu] = pca(___)
Your PCA has returned 98 PCs that summarise the variation contained in your original 225 variables ('coeff'). This variable shows you in what way your original variables are combined in the data reduction. The values of each of the PCs are obtained from 'score'; if you want to use your reduced data in subsequent analyses, it is these that you need. 'latent' shows how much variance each PC explains (PCs are ranked in descending order of contribution to variance). This will help you see which PCs are most useful for explaining your dataset: the lower the variance, the lower the contribution to the dataset. 'tsquared' is used for outlier analysis. 'explained' is the latent expressed in relative terms (% of overall variance). 'mu' is the mean of each original variable. As Modh has indicated, there are a lot of resources out there. The resource I used when I started was Multivariate Analysis from CAMO: http://www.camo.com/books/ ; it is oriented towards their software, but it is more textbook than manual, so it may be worth looking at.
Question
Assume the joint pdf of bivariate data is a known distribution, but not bivariate Gaussian. I have read that a copula might be used to measure the dependency between the two variables. But is there a way to remove the dependency between the two variables? Can we convert dependent bivariate variables to independent bivariate variables?
Relevant answer
But is there a way to remove the dependency between the two variables? No. Can we convert dependent bivariate variables to independent bivariate variables? No. Further comment: the advantage of using a copula is that it captures both linear and non-linear dependence between variables, unlike the standard Spearman's rank correlation statistic, which measures only linear dependence and applies to two variables at a time. Interestingly, it is possible to extract the multivariate dependence structure between more than two variables using a copula, which is not the case with the standard correlation measure.

Question
Hello, I'm working on panel data containing different banks in different years, and I'm trying to run a regression using all the data I have. For that I have to run a homogeneity test in order to see whether the same coefficients apply to all the banks. Trying to run the test in R using the pooltest function, I get the following error: "Error in FUN(X[[i]], ...) : insufficient number of observations". I went to look at the function code and found the error should only appear when nrow(X) <= ncol(X), which is not my case, since my data has 60 rows and 15 columns. What should I do?
Relevant answer
Generally when this happens you have misinterpreted the error message, which, of course, can be cryptic. However, your question is not clear. Where is the pooltest function located (i.e. what package, and what are you trying to do)? Now, a 60x15 data matrix is problematic for regression. Most people would say the absolute minimum number of observations per IV is 5; yours appears to be 4, assuming I understand what you mean by the data matrix. It is also not clear what you mean by "using all the data that I have". By homogeneity test, are you referring to a Chow test or something like it? If you clear up some of these things, it should increase your chances of getting useful help.
Best wishes, David

Question
I need to interpret the trends shown in charts.
Relevant answer
Dear Prof. Faued, what if there are no statistically significant ranks?

Question
As far as I have understood, in Weka there is only one imputation option, i.e., replace with mean or mode. If I want to experiment with other imputation methods, such as regression or machine learning techniques, which tool would you suggest I use?
Relevant answer
Thank you, Fitriyani, for your suggestion.

Question
Problem: given a directed graph G (as shown in the figure below), each vertex represents a sensor node (with a limited battery). Each link has a weight, representing the probability that the node will send the data packet along it. This weight is fixed for t seconds and then changes, as the batteries of the nodes are consumed over time. For example, as shown in the figure, a given node's outgoing links might have probabilities 0.3, 0.5 and 0.2; after a time t, this probability distribution will change (according to the energy level of the battery). Please see the link or the attached file.
Relevant answer
A time-inhomogeneous Markov chain whose transition matrix is a random variable described by a time-inhomogeneous Poisson process.

Question
For the same field, I have both yearly and quarterly data sets. The first set is detailed by product, so I can use panel techniques, and my results are globally satisfying. The second set is not detailed and looks quite suspicious, in particular concerning seasonality; no satisfying estimation can be found. How can I transform the yearly formulations into quarterly ones? My equations use an error correction framework, so the dynamics are essential.
Relevant answer
I would use the spline method of interpolation.

Question
Someone told me to use the bootstrap instead, but are they equivalent? Is the bootstrap a better estimator?
Relevant answer
Thank you very much, Alberto. Your answer helped me a lot.
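The bootstrap question above is terse about what the bootstrap would replace, but the mechanics are easy to show. Below is a minimal Python sketch (the data vector is made up for illustration) that estimates the standard error of the sample mean by resampling with replacement; for the mean, it should land close to the analytic s/sqrt(n).

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=42):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)]
    return statistics.stdev(replicates)

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
se = bootstrap_se(data)
# For the mean, the bootstrap SE should be close to the analytic s/sqrt(n)
analytic = statistics.stdev(data) / len(data) ** 0.5
```

The appeal of the bootstrap is that the same resampling loop works for statistics with no closed-form standard error (medians, ratios, model coefficients), which an analytic formula cannot offer.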
Question
How do I find a VaR vector from a copula? Say I have found the parameters of a t copula.
Relevant answer
Dear Ye Liu, I am interested in finding the VaR of a univariate random variable which is a function of several or all of my margins.

Question
Hello, I have a lot of historical data from the past 20 years. I've been told that Bayesian methods could improve the prediction of several interesting quantities, such as the sample size or the model parameter estimates. I would like to put all of these data to use (I have hundreds of variables for ~100,000 records). Do you know what good uses I could make of historical data from experiments using Bayesian methods? Thank you very much for your answers. Yacine HAJJI
Relevant answer
Yes, the Bayesian approach to statistical inference is based on the assumption that the parameter can have a probability distribution instead of a constant value. Knowledge of this prior probability distribution of the parameter may be used to update the distribution when data become available, to obtain its posterior distribution. This Bayesian approach is known to provide better estimates than the classical approach, in which the parameter is not treated as a random variable. Historical data may be taken advantage of in suggesting a prior distribution for the parameter in question.

Question
Suppose we have two algorithms (A and B) to solve a multi-objective problem, and each algorithm provides a set of solutions. Which statistical test is appropriate to compare these algorithms? Is the Wilcoxon test appropriate?
Relevant answer
Dear Hamid, to check the statistical significance of pairwise differences among solutions, the Wilcoxon test is fine, and the Chi-square test would also work. To perform these tests, the null hypothesis is that there is no significant difference between the two solutions at a given significance level (e.g., 5%). The p value and z value are used to assess the significance of differences between the solutions.
When the p value is less than the significance level (0.05) and the z value exceeds the critical values of z (−1.96 and +1.96), the null hypothesis is rejected, meaning that the performance of the algorithms is significantly different. Good luck with your study.

Question
I'm looking for an equation for the ARL of the EWMA chart that is easy to calculate manually.
Relevant answer
Did you check this research?

Question
Hi everyone, my data exhibit a bimodal distribution: both modes are beta distributions on different, non-overlapping intervals. My doubt is how to obtain the first moment (mean), second moment (variance), third moment and so on of the total distribution. Thank you in advance. Sam
Relevant answer
The general formula for the moments of a mixture, say a F + (1-a) G, of two distributions F and G follows from the definition of moments and general properties of integrals. Accordingly, the moment of order k of the mixture equals

m_k = \int_R x^k d( a F(x) + (1-a) G(x) ) = a \int_R x^k dF(x) + (1-a) \int_R x^k dG(x) = a m_k(F) + (1-a) m_k(G)

(the Stieltjes integral notation is applied). Obviously, the moments involved must exist. Note, however, that the formula is invalid for central moments unless the means (= moments of order 1) are equal. Probably no particular reference can be easily found; one can try Wikipedia under "mixture distribution". With respect to estimates of the proportion a, following Subrata's advice, for samples with separated subsamples (which is the case in question, due to the separability of the supports of the beta distributions), the MLE equals a* = n_1/n. This is due to the fact that n_1 has a binomial distribution with parameter a if the sample size equals n. Simple reference: Regards, Joachim

Question
This is with respect to a hybrid of ANN and logistic regression in a binary classification problem.
For example, in one of the papers I came across (Heather Mitchell), the authors state that "A hybrid model type is constructed by using the logistic regression model to calculate the probability of failure and then adding that value as an additional input variable into the ANN. This type of model is defined as a Plogit-ANN model". So, for n input variables, I'm trying to understand how the additional input n+1 to the ANN is treated by the activation function (sigmoid) and in the weighted summation of inputs. Do we treat this probability variable n+1 as an additional feature that has its own associated weight, or is it treated in a special way? Thank you for your assistance.
Relevant answer
It depends on why the author had to do that, but the performance of an ANN has less to do with the inputs. What improves learning, basically, are the architecture or topology of the ANN and the choice of learning rate. The scales of the inputs can have an effect, but that can be handled using feature scaling or normalisation.

Question
The issue is that the inverse of the covariance matrix in the Mahalanobis distance sometimes leads to extreme values (Inf or NaN, for example) when I try to calculate it. Is this something to be expected of a matrix inverse? If yes, how should one deal with these extreme values? Info: the covariance matrix is obtained from a set of feature vectors, comprising intensity, homogeneity and entropy in my case. If the answer could be demonstrated with an example, it would be very helpful.
Relevant answer
Very helpful answer, Prof. Ette Etuk. But I can't seem to find a good reason to compute the inverse of the matrix: for example, a matrix whose elements have very small values (i.e., small distances) tends to have large values in its inverse. I would like to know more about this inverse covariance matrix and why it is necessary in finding the Mahalanobis distance.

Question
Hi.
I am trying to decide on a suitable model, and for that I ask myself: what are the differences between the bivariate probit model (biprobit in STATA) and the bivariate ordered probit model (bioprobit in STATA)? Both can be used as seemingly unrelated regressions, but in the bioprobit case it says there have to be some valid exclusion restrictions. Is this really needed? And is there also a specification test in STATA for ordered bivariate models?
Relevant answer
Yes, but as I understand it, you really need instruments for that; a recursive structure alone is not sufficient. Unlike in the bivariate probit model, where the nonlinearity is enough for identification in a recursive structure, this is not the case for an ordered model.

Question
Hello researchers, I'm trying to fit a multivariate probit model for a project on household consumption that I'm working on. I was wondering if there is a way I can fit a Bayesian variation of the model. I will appreciate any help you can provide. Kind regards
Relevant answer
Thanks for your response, Fadhil. I think I should just go the frequentist way using Stata.

Question
I have 11 sleep outcomes, all binary, e.g. initial insomnia? yes/no. I want to see whether the 126 individuals I have cluster onto the different sleep variables. I have tried EFA before, and the structure proved inappropriate for this, so I am now trying cluster analysis, potentially with the Manhattan distance. I am not sure how to go about this in STATA and would appreciate help to see whether my variables are clustering and, from there, to work these into regressions. Thanks.
Relevant answer
I am not a Stata user, so I am not going to be helpful on that. However, I can propose a strategy for the analysis. 1. Map the patients using multiple correspondence analysis (MCA), i.e. an equivalent (roughly speaking) of principal component analysis for binary variables.
You will be able to represent the variables and the individuals on the same plane(s), and thus identify groups of individuals sharing the same covariate pattern. 2. Use the individuals' MCA scores to perform an HCA with a Euclidean distance. 3. Project the resulting partition(s) on the previous plane(s). This can be done in a few lines of R code: let me know if you're interested.

Question
I have quarterly data for some variables and annual data for other variables. Please, how do I go about the estimation? Which procedure will be best?
Relevant answer
Hi, there is yet another possibility: you can also transform annual data to quarterly frequency. You can do it in R, but there is also a possibility to do it in EViews. Best regards, KB

Question
For the variable capital gain there are 22,792 observations, and 90% of them contain data points with a value of 0. What is the best way to normalize the data before building an artificial neural network model?
Relevant answer
Matthew, your University of Iowa offers first-rate statistical consulting. I assume that you have more than one variable in your data set. We need to know the dependent variables, the independent variables, and your research question. Collect these things and take them to the appropriate UIowa statistical consultation center listed at the link below. The answers to your questions may not be simple. Best wishes, David

Question
Using nnstart in MatLab I have generated a MatLab function (shown below); how do I now use it to predict outcomes?
Relevant answer
The answer: output = "neural network"("data set"), i.e. call the generated function with your data set as its input.

Question
Hi all, I want to find the significant effect that will reduce the open circuit potential (OCP) of a metal-coating system. I have four factors with two levels and a limited time period for the experiment, measured at an interval. The samples fail if their OCP is above a critical potential value in the time frame of my experiment.
I read that Minitab could help in this analysis, and I would like to do an analysis similar to section 5 of the paper cited below. Is it possible in my case to do DOE life testing? I'm not good at statistics and I don't know how to get Minitab to do the analysis for me. Please help! Guo, H., & Mettas, A. (2012). Design of experiments and data analysis. Paper presented at the 58th Annual Reliability & Maintainability Symposium (RAMS 2012).
Relevant answer
You can use DOE to fit your data and establish functional relationships between your factors and the response. Since you have 4 factors with two levels, you can use either a full factorial design or a fractional factorial design. With a full factorial design the number of experiments will be 16, while with a fractional factorial design the number of experiments will be 8. Those designs are better than the Taguchi method because they can fit empirical models. You can do it in Minitab by selecting 'Stat' in the toolbar, then DOE, then factorial design, then create factorial design. After that, assign your factors and their levels.

Question
I see there is the lcmm (latent class mixed models) package for R. What issues did you encounter?
Relevant answer
Thanks!

Question
I am looking for an example (dataset and code) of fitting a spatial hurdle model (a zero-inflated model with a single source of zeros) using the INLA package for R. With this model, two equations are jointly fitted: (1) the probability of observing at least one event (the single source of zeros); (2) the (strictly > 0) count of events. I am aware of the R-INLA documentation available at http://www.r-inla.org/
Relevant answer
I asked the question on R-SIG-GEO and got two useful replies. 1. From Thierry Onkelinx: That allows for a hurdle model with an intercept-only model for the zeroes. Your initial question suggests that you want a more complex model for the zeroes. Op 17 feb. 2017 5:38 p.m.
schreef "lancelot" <renaud.lancelot@cirad.fr>: Dear Thierry, thank you for your reply. I wonder whether it would be possible to do something with zero-inflated models of type 0: see http://www.math.ntnu.no/inla/r-inla.org/doc/likelihood/zeroinflated.pdf, first equation on page 1. All the best, Renaud. Le 17/02/2017 à 15:29, Thierry Onkelinx a écrit: Dear Renaud, IMHO you can't. INLA currently only fits zero-inflated distributions (with a single zero-inflation parameter). A hurdle model would require fitting a zero-truncated distribution. Best regards, ir. Thierry Onkelinx, Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest, team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance, Kliniekstraat 25, 1070 Anderlecht, Belgium. 2. From Facundo Muñoz: Are you aware of Quiroz et al. 2015? I think it is what you need, or very close, and they provide data and code. There is also Muñoz et al. 2016 (disclaimer: that's me), which uses a joint model, Bernoulli for zeros and Gamma for >0, with a spatial-temporal effect via SPDE. The data and code are available as a companion R package. Hope it helps. ƒacu. Quiroz, Z. C., M. O. Prates, and H. Rue (2015, March). A Bayesian approach to estimate the biomass of anchovies off the coast of Perú. Biometrics 71(1), 208-217. http://dx.doi.org/10.1111/biom.12227. Muñoz, F., B. Marçais, J. Dufour, and A. Dowkiw (2016, December). Rising out of the ashes: additive genetic variation for crown and collar resistance to Hymenoscyphus fraxineus in Fraxinus excelsior. Phytopathology 106(12), 1535-1543.

Question
Hi everyone, I am giving a lecture next week on transforming non-normal data to normal. Transforming skewed data to normal is fairly easy using the Box-Cox transformation. However, I cannot find a transformation that helps with kurtotic data. I have seen some recommend the modulus transformation.
I tested it using a Monte Carlo simulation, and it failed to normalize symmetric but highly leptokurtic data (approximately 0 skew and 10 kurtosis). Does anyone know of a transformation that normalizes kurtotic data? Thanks, Cristian
Relevant answer
Hello Cristian, z scores can easily be transformed back to the raw data. The transformed data follow N(0,1).

Question
I get the number of components for my dataset using BIC, but I want to know whether the silhouette coefficient is the right option to validate my results. Thanks!
Relevant answer
Silhouette analysis is based on the distances between data points, so it is very friendly to distance-based clustering such as k-means. As a measure you can try it, but its performance on data with natural density structure might not be ideal. Meanwhile, if you want to know whether the number of components is a correct choice, you can try a variational Bayesian Gaussian mixture. This is a variation of the traditional GMM which can automatically output the best number of components.

Question
I have daily data from Jan/1/2008 to Jan/1/2012. I would like to create a dummy variable for the whole period after a specific date, that is, after March 2011; in addition, I would like to create another dummy variable for the period from March 2011 to June 2011. How do I do that using Stata 13? Thanks in advance.
Relevant answer
There may be a more efficient way to do this, but here is one solution (note: I am assuming your date variable is in Stata's date format). You can generate variables for month and year:
gen year=year(date_var)
gen month=month(date_var)
gen dummy=(year>=2011)
replace dummy=0 if dummy==1 & month<3 & year==2011
The same idea can be applied to the other period as well.

Question
Dear members of the RG community, I am trying to calculate the incremental variance explained by variables in a multivariate multiple linear regression model, but I don't have sum-of-squares parameters as in multiple linear regression.
I'd like something like:

library(car)
# Create variables and fit the model
set.seed(123)
N <- 100
X1 <- rnorm(N, 175, 7)
X2 <- rnorm(N, 30, 8)
X3 <- abs(rnorm(N, 60, 30))
Y1 <- 0.2*X1 - 0.3*X2 - 0.4*X3 + 10 + rnorm(N, 0, 10)
Y2 <- -0.3*X2 + 0.2*X3 + rnorm(N, 10)
Y <- cbind(Y1, Y2)
dfRegr <- data.frame(X1, X2, X3, Y1, Y2)
(fit <- lm(cbind(Y1, Y2) ~ X1 + X2 + X3, data=dfRegr))
# How do we get the proportion now?
af <- Anova(fit)
afss <- af$"test stat"
print(cbind(af, PctExp=afss/sum(afss)*100))
Obviously this doesn't work. Is there some kind of approach for this?
Thanks
Relevant answer
Stepwise regression would be a bad choice. See the attachment RF4.pdf above.
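For the incremental-variance question above, one transparent alternative to extracting sums of squares is to add predictors one at a time and record the gain in R-squared for each response separately. The NumPy sketch below mirrors the simulated data from the question for the first response Y1; it illustrates sequential (type-I-style) increments and is not a reproduction of the car::Anova decomposition.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(123)
N = 100
X1 = rng.normal(175, 7, N)
X2 = rng.normal(30, 8, N)
X3 = np.abs(rng.normal(60, 30, N))
Y1 = 0.2 * X1 - 0.3 * X2 - 0.4 * X3 + 10 + rng.normal(0, 10, N)

# Add predictors one at a time; each step's gain in R^2 is that
# variable's incremental (sequential) contribution for this response.
ones = np.ones(N)
steps = [np.column_stack([ones]),
         np.column_stack([ones, X1]),
         np.column_stack([ones, X1, X2]),
         np.column_stack([ones, X1, X2, X3])]
r2 = [r_squared(X, Y1) for X in steps]
increments = np.diff(r2)  # incremental R^2 for X1, X2, X3 in entry order
```

As with any sequential decomposition, the increments depend on the order in which predictors are entered; for an order-free summary one would average over orderings (e.g. a Shapley-style decomposition).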
Question
I want to find the approximately equal value of Z in the ACF (autocorrelation function). How do I find it?
If you want an approximation of two values you can use:
_approx(x, y)
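If the question is about the approximate significance value usually drawn on an ACF plot, the standard large-sample band for white noise is +/- z/sqrt(n), e.g. 1.96/sqrt(n) at the 95% level. Here is a self-contained Python sketch (white-noise data simulated for illustration) that computes the sample ACF and that band:

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelation function (biased estimator) up to max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    out = []
    for k in range(1, max_lag + 1):
        ck = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
        out.append(ck / c0)
    return out

# For white noise, sample autocorrelations should mostly fall inside the
# approximate 95% band +/- 1.96 / sqrt(n) commonly drawn on ACF plots.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
rho = acf(x, 10)
bound = 1.96 / math.sqrt(len(x))
```

Lags whose sample autocorrelation falls outside the band are the ones conventionally flagged as significantly different from zero.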
Question
Probability limits can be used if theta follows, say, a gamma or beta distribution, but I am not sure about using the probability-limit approach for some charts, such as EWMA or CUSUM. I would like to know whether a control limit of the form E(theta) +/- L*SD(theta) is always applicable when theta is not normally distributed.
An individuals control chart is not robust to non-normally distributed data, so you would need to transform theta or, at the very least, remove outliers.
See: Gest. Prod., São Carlos, v. 23, n. 1, p. 146-164, 2016, http://dx.doi.org/10.1590/0104-530X1445-14, "Method for determining the control limits of nonparametric charts for monitoring location and scale".
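To see why E(theta) +/- L*SD(theta) can fail for a skewed theta, a quick Monte Carlo comparison helps: with a hypothetical gamma-distributed statistic (parameters below are made up for illustration), the lower 3-sigma limit falls below zero, so it can never signal, while the upper limit is crossed far more often than the nominal 0.27%.

```python
import random

# Hypothetical skewed in-control distribution: gamma(shape=2, scale=1),
# so E(theta) = 2 and SD(theta) = sqrt(2).
rng = random.Random(1)
draws = sorted(rng.gammavariate(2.0, 1.0) for _ in range(200_000))

mean, sd = 2.0, 2.0 ** 0.5
lcl_sigma, ucl_sigma = mean - 3 * sd, mean + 3 * sd  # "3-sigma" limits

# Empirical probability limits matched to the normal 3-sigma tail areas
def quantile(sorted_xs, p):
    return sorted_xs[int(p * (len(sorted_xs) - 1))]

lcl_prob = quantile(draws, 0.00135)
ucl_prob = quantile(draws, 0.99865)

# Fraction of in-control points flagged by the 3-sigma limits; for a
# normal theta this would be about 0.0027
false_alarms = sum(x < lcl_sigma or x > ucl_sigma for x in draws) / len(draws)
```

The simulation shows the 3-sigma limits producing roughly five times the nominal false-alarm rate on this gamma example, which is the usual argument for probability limits (or a normalizing transformation) on skewed statistics.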
Question
I want to show the difference between two sets of Analytic Hierarchy Process (AHP) data with proper statistical methods. Which method will be the most compatible and appropriate: a two-sided hypothesis test of means, multinomial logit models, or some other method?
I believe the best recommendation is to read through Thomas L. Saaty's books on AHP. First, I think you should compare the consistency index for both matrices: if they differ statistically, you can definitely say that there is a significant difference between the matrices. There is some research on this problem, for example, "A statistical approach to consistency in AHP" by F.J. Dodd, H.A. Donegan and T.B.M. McMaster. And Mr. Ebert's recommendation to then use the Mantel test is absolutely appropriate too.
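Saaty's consistency check mentioned above is easy to compute directly: lambda_max of the pairwise matrix is obtained from its principal eigenvector, CI = (lambda_max - n)/(n - 1), and CR = CI/RI with Saaty's random index RI. A sketch with a made-up judgement matrix (not the questioner's data):

```python
# Consistency ratio of an AHP pairwise-comparison matrix.
# RI values are Saaty's random indices for matrix sizes 3..7.

def consistency_ratio(A):
    n = len(A)
    # Principal eigenvector by power iteration
    w = [1.0] * n
    for _ in range(100):
        w_new = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w_new)
        w = [v / s for v in w_new]
    # lambda_max as the average ratio (A w)_i / w_i
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}[n]
    return ci / ri

# A perfectly consistent matrix (each a_ij = w_i / w_j): CR should be ~0
A = [[1, 2, 4],
     [1/2, 1, 2],
     [1/4, 1/2, 1]]
cr = consistency_ratio(A)
```

Comparing the CR (or CI) of the two matrices is the first step suggested in the answer; the usual rule of thumb is that CR below 0.10 indicates acceptable consistency.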
Question
We know that merging two Poisson processes results in another Poisson process with a rate that is the sum of the two original rates. (https://www.probabilitycourse.com/chapter11/11_1_3_merging_and_splitting_poisson_processes.php)
What type of process do we get by merging two processes with lognormally distributed interarrival times? How are the parameters of this process related to the parameters of the original lognormal processes?
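A quick way to explore the second question is simulation. The superposition of two lognormal renewal processes is generally not itself a renewal process with lognormal interarrival times (the merged gaps are neither independent nor lognormal), but its long-run event rate is the sum of the component rates 1/E[X_i], where E[X_i] = exp(mu_i + sigma_i^2/2). A Python sketch with made-up parameters:

```python
import math
import random

# Superpose two renewal processes with lognormal interarrival times.
# Component rate = 1 / E[X] = exp(-(mu + sigma^2 / 2)).

def arrival_times(mu, sigma, horizon, rng):
    """Event times of a lognormal renewal process on [0, horizon]."""
    t, times = 0.0, []
    while True:
        t += rng.lognormvariate(mu, sigma)
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(7)
horizon = 50_000.0
a = arrival_times(0.0, 0.5, horizon, rng)  # rate = exp(-0.125)
b = arrival_times(0.5, 1.0, horizon, rng)  # rate = exp(-1.0)
merged = sorted(a + b)

empirical_rate = len(merged) / horizon
predicted_rate = math.exp(-(0.0 + 0.5 ** 2 / 2)) + math.exp(-(0.5 + 1.0 ** 2 / 2))
```

Unlike the Poisson case, there is no simple closed-form description of the merged process in terms of the original lognormal parameters; only asymptotic quantities such as the long-run rate combine additively.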