# Statistical Modeling - Science topic

Explore the latest questions and answers in Statistical Modeling, and find Statistical Modeling experts.
Questions related to Statistical Modeling
• asked a question related to Statistical Modeling
Question
I am working in the field of forest ecology using statistical modelling. Please write to me for any further clarification.
Thanks
I work with fish ecology.
• asked a question related to Statistical Modeling
Question
I am currently doing my undergraduate study, and I am using UTAUT2 to evaluate factors that can affect adoption of a new application we made. I have a sample size of 30, all obtained through purposive sampling. I want to use SEM via PLS, but it seems I need a bigger sample size. What other statistical models can I use?
UTAUT2 (Unified Theory of Acceptance and Use of Technology 2) is an extension of the original UTAUT model and incorporates additional constructs to explain user acceptance and adoption of technology. When analyzing UTAUT2 data, various statistical models can be employed depending on the research objectives and the nature of the data. Here are some commonly used statistical models for analyzing UTAUT2 data:
1. Multiple Regression Analysis: Multiple regression is a common statistical technique used to analyze UTAUT2 data. It allows you to assess the relationships between the predictor variables (UTAUT2 constructs) and the outcome variable (user acceptance and use of technology) while controlling for other variables.
2. Structural Equation Modeling (SEM): SEM is a powerful statistical technique that estimates and tests complex relationships between latent variables. UTAUT2 constructs can be treated as latent variables, and their relationships can be assessed using SEM. This approach provides information on the direct and indirect effects of the constructs on user acceptance and use of technology.
3. Partial Least Squares Structural Equation Modeling (PLS-SEM): PLS-SEM is an alternative approach to SEM that is often used when the sample size is relatively small or the data violate the assumptions of traditional SEM. PLS-SEM is suitable for analyzing UTAUT2 data when the focus is on predicting the outcome variable rather than estimating population parameters.
4. Bayesian Structural Equation Modeling (BSEM): BSEM is a statistical approach that incorporates prior information and uncertainty into the estimation of structural equation models. It allows researchers to specify and test more flexible models and provides a Bayesian framework for model estimation and comparison.
5. Hierarchical Linear Modeling (HLM): HLM is a statistical technique used to analyze nested data or data with a hierarchical structure. If your UTAUT2 data includes multiple levels (e.g., users nested within organizations), HLM can be employed to examine how the UTAUT2 constructs influence user acceptance and use of technology at different levels.
These are some of the statistical models commonly used to analyze UTAUT2 data. The choice of model depends on factors such as the research objectives, sample size, data characteristics, and the complexity of the relationships being investigated.
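As a toy illustration of option 1, multiple regression on construct-level scores can be sketched in Python (all data and coefficient values below are synthetic; with n = 30 the estimates would be much noisier):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical sample; with n = 30 estimates would be far less stable

# Synthetic construct scores standing in for UTAUT2 constructs
performance_expectancy = rng.normal(size=n)
effort_expectancy = rng.normal(size=n)
habit = rng.normal(size=n)

# Simulated behavioural-intention outcome with known coefficients
intention = (0.6 * performance_expectancy
             + 0.3 * effort_expectancy
             + 0.1 * habit
             + rng.normal(scale=0.5, size=n))

# Ordinary least squares: design matrix with an intercept column
X = np.column_stack([np.ones(n), performance_expectancy,
                     effort_expectancy, habit])
coefs, *_ = np.linalg.lstsq(X, intention, rcond=None)
print(coefs)  # intercept followed by the three slopes
```

With a real UTAUT2 dataset the construct scores would come from averaging the validated items, and the coefficients would be read as the relative influence of each construct.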
• asked a question related to Statistical Modeling
Question
I am referring to models like REM, SC, REST and CTDS.
The minimum number of detections required for applying statistical models to estimate the density from camera traps for unmarked individuals can vary depending on the specific statistical model and the study design. However, a general rule of thumb is that a minimum of 10-15 independent detections per individual is needed to reliably estimate density using camera traps.
This means that each individual should be detected at least 10-15 times in order to accurately estimate the population density. The number of independent detections is important because multiple detections of the same individual at the same camera trap location do not provide additional information for estimating density.
It's worth noting that this is just a general guideline, and the exact number of detections required will depend on the specific statistical model used and the study design. In some cases, more detections may be necessary to achieve reliable density estimates, particularly in situations where detection probability is low or highly variable.
• asked a question related to Statistical Modeling
Question
I can't find articles that describe the quality of multivariate statistical models according to RMSEC, RMSECV and RMSEP values. I only find descriptions of R2 and RPD, as in the Williams (2003) article.
This looks like the old PRESS analysis (from David Allen, who was at the University of Kentucky) in multivariate language. Why can't you use AIC? It's a bit more modern and, IMO, a better way to compare predictive models. Some references are given in the attached paper. Best wishes, David Booth
• asked a question related to Statistical Modeling
Question
In my time series dataset, I have 1 dependent variable and 5 independent variables, and I need to find the independent variable that affects the dependent variable the most (the independent variable that explains most of the variation in the dependent variable). All 5 variables are economic factors.
There are several statistical models that can be used to analyze the relationship between time series variables. Some common approaches include:
1. Linear regression: This is a statistical model that is commonly used to understand the relationship between two or more variables. In your case, you can use linear regression to model the relationship between your dependent variable and each of the independent variables, and then compare the coefficients to determine which independent variable has the strongest effect on the dependent variable.
2. Autoregressive Integrated Moving Average (ARIMA) model: This is a type of statistical model that is specifically designed for analyzing time series data. It involves using past values of the time series to model and forecast future values.
3. Vector Autoregression (VAR) model: This is a statistical model that is similar to ARIMA, but it allows for multiple time series variables to be analyzed simultaneously.
4. Granger causality test: This is a statistical test that can be used to determine whether one time series is useful in forecasting another time series. In other words, it can be used to determine whether one time series "Granger causes" another time series.
5. Transfer function model: This is a statistical model that is used to analyze the relationship between two or more time series variables by considering the transfer of information from one series to another.
It's worth noting that there is no one "best" model for analyzing the relationship between time series variables, and the appropriate model will depend on the specifics of your data and the research question you are trying to answer.
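As a sketch of option 1, standardized coefficients can be compared so that the magnitudes are on the same scale; a Python sketch on synthetic data (series and effect sizes are invented, and with real economic time series you would first check stationarity and serial correlation, which the other options address):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300  # hypothetical number of time periods

# Five synthetic "economic factor" series; the second is built to matter most
X = rng.normal(size=(T, 5))
y = 0.1 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.3, size=T)

# Standardize so coefficient magnitudes are comparable across predictors
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()

design = np.column_stack([np.ones(T), Xz])
beta, *_ = np.linalg.lstsq(design, yz, rcond=None)
strongest = int(np.argmax(np.abs(beta[1:])))  # index of the most influential factor
print("strongest predictor index:", strongest)
```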
• asked a question related to Statistical Modeling
Question
The LUE model considers climatic factors, and some models even consider the CO2 fertilization effect on radiation energy conversion, so I think the LUE model is not only a statistical model but is also based on biophysical or biochemical theory. But I have no idea which view is right.
I seek help from well-meaning scholars. Thanks a lot.
Dear Dr. Sun,
I am afraid you may have asked the wrong person, since my field is meteorology. Please check the authors' full names or the coauthors of the article you are interested in.
Sincerely,
Runhua Yang
• asked a question related to Statistical Modeling
Question
Dear scholars,
I want a statistical model to analyze my data on a rare disease (asymptomatic or submicroscopic malaria). I want a consultation from experts in the field.
I am convinced that logistic regression is not suitable for my study; however, there are dozens of published articles that have used it. I want to look at it in a different way.
Abdissa B.
Abdissa Biruksew Hordofa What is the hypothesis you want to test?
• asked a question related to Statistical Modeling
Question
Currently, data is available in forms of text, images, audio, video and other such forms.
We are able to use mathematical and statistical modeling to identify different patterns and trends in data, which can be exploited through machine learning, a subfield of A.I., to perform different decision-making tasks. The data can be visualized in a variety of forms for different purposes.
Data Science is currently the ultimate state of Computing. For generating data we have hardware, software, algorithms, programming, and communication channels.
But, what could be next beyond this mere data creation and manipulation in Computing?
I understand manipulation of data from the past. How will we manipulate data from the future? Further, the analysis of data seems constrained by developments in science and mathematics. David Booth
• asked a question related to Statistical Modeling
Question
I have a dataset of 140 students who have sat an exam. The exam had a total of 30 marks, split equally into an A and B question. The A question was numerical and was made up of several calculations. The B question was an essay response to a problem.
The exam has been marked twice. The first time using a mark scheme, each of the calculations had a maximum possible number of marks as did the essay. Each sub question was marked with reference to the mark scheme before all of the scores were totalled. The second time was based purely on academic judgement, the same marker considered the whole exam and made a judgement based on the overall performance.
I've calculated the correlation (.847, p = .00003), and I'm keen to explore this further, but I'm drawing a blank on which statistical models could be used. Any suggestions?
A basic answer is to start by doing some graphical plots to illustrate what is going on.
• asked a question related to Statistical Modeling
Question
The statistical model I used for the calculation of air-sea CO2 fluxes, gives me results of net air-sea CO2 flux in GtC yr-1.
I would like to convert GtC year-1 to mmol m-2 d-1
Dear Faty Patricia,
the information that you give is not enough to give an answer: If you want to convert a total flux (mass/time) into a flux per area (mass/time/area), you need to specify what the total area is that your total flux is integrated over. Also, you need to take care whether the total flux is in Gt CO2 / yr or in Gt carbon / yr, because 1 Gt of CO2 contains (12 / (12 + 2*16)) = 0.273 Gt carbon.
My suspicion is that the flux in Gigaton/year is for the total global ocean. Let's assume it is Gt carbon (not CO2) Then the conversion would be
FluxPerArea [mmol/m^2/day] =
TotalFlux [Gt/yr] * 10^18 [mg/Gt] * 1/12 [mmol/mg] * 1/365.25 [yr/day] / total area of the ocean [m^2]
I looked up the total area of the ocean on wikipedia, and it is around 362 * 10^6 km^2, which is 362 * 10^12 m^2.
So in the end the conversion would be
FluxPerArea [mmol/m^2/day] = TotalFlux [Gt/yr] * 10^18 / (12 * 365.25 * 362 * 10^12)
In case your flux is in Gt CO2, not Gt carbon, then you have to replace the 12 (molar weight of carbon) by 44 (molar weight of CO2).
I hope that helps.
Cheers, Christoph
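Christoph's conversion can be wrapped in a small function; a Python sketch (the ocean-area and molar-mass values are the ones assumed above):

```python
# Sketch of the conversion above, assuming the flux is in Gt carbon per year
# and is integrated over the whole ocean (area ~362e12 m^2).

MG_PER_GT = 1e18          # 1 Gt = 1e15 g = 1e18 mg
MOLAR_MASS_C = 12.0       # mg/mmol for carbon (use 44.0 for CO2)
DAYS_PER_YEAR = 365.25
OCEAN_AREA_M2 = 362e12    # approximate total ocean area

def gtc_per_year_to_mmol_m2_day(total_flux_gt, molar_mass=MOLAR_MASS_C,
                                area_m2=OCEAN_AREA_M2):
    """Convert a total flux in Gt/yr to a per-area flux in mmol m^-2 d^-1."""
    return total_flux_gt * MG_PER_GT / (molar_mass * DAYS_PER_YEAR * area_m2)

print(gtc_per_year_to_mmol_m2_day(2.0))  # ~1.26 mmol m^-2 d^-1 for 2 GtC/yr
```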
• asked a question related to Statistical Modeling
Question
Hello friend,
I have done my thesis using a simple lattice design (2 replications) for 2 years. I want to do a combined analysis of variance. How can I do it? Which statistical software? Which program?
Thanks
Hello, you can use R software.
• asked a question related to Statistical Modeling
Question
Having effect sizes from multiple time points after an intervention, what is the best way to take this into account? Which statistical model can I use?
Thanks to both for the answers!
• asked a question related to Statistical Modeling
Question
TAG is nothing but Thrust Area Group. TAG - Learning Analytics works on the following objectives:
• To provide helpful information to optimize or improve learning designs.
• To track and evaluate student use of learning materials and tools to identify potential issues or gaps, and to provide an objective evaluation of the materials and tools.
• To identify patterns in historical data and apply statistical models and algorithms to identify relationships between various student academic behavioural data sets in order to forecast trends.
• To explore the relationship between computational thinking and academic performance.
• Predictive analytics: understanding the future of students' learning ability.
• Sustainable quality education.
Anyone interested in joining this TAG can register here.
Yes
• asked a question related to Statistical Modeling
Question
I applied Ordinal Logistic Regression as my main statistical model, because my response variable is 7 Point-Likert Scale data.
After testing for goodness of fit using AIC, I got my best-fitting model, including 4 independent variables (3 explanatory and 1 factor variable).
However, one explanatory variable has a negative coefficient (an odds ratio of 0.44); all explanatory variables are also 7-point Likert-scale items.
My theoretical assumption is simple: the more frequently the activities captured by the explanatory variables occur, the higher the impact score on the response variable (mutual understanding).
That's why I am confused that one independent variable has a negative coefficient.
In this case, how should I interpret this IV?
Thank you very much,
Just like any other regression. If you want examples, simply Google your question. Best wishes, David Booth
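Concretely, an ordinal-logit coefficient lives on the log-odds scale, so a negative value maps to an odds ratio below 1; a small Python sketch with an illustrative coefficient chosen to match the 0.44 odds mentioned in the question:

```python
import math

# Hypothetical coefficient from an ordinal (proportional-odds) logit model
coef = -0.821                       # log-odds scale
odds_ratio = math.exp(coef)         # ~0.44, matching the odds in the question
print(round(odds_ratio, 2))

# Interpretation: holding the other predictors fixed, a one-unit increase in
# this Likert-scaled activity variable multiplies the odds of being in a
# *higher* mutual-understanding category by ~0.44, i.e. lowers them by ~56%.
pct_change = (odds_ratio - 1) * 100
print(round(pct_change, 1))
```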
• asked a question related to Statistical Modeling
Question
I previously conducted laboratory experiments on a photovoltaic panel under artificial soiling in order to obtain short-circuit current and open-circuit voltage data, which I then analyzed using statistical methods to derive a performance coefficient for this panel that expresses the percentage decrease in the power produced as accumulated dust increases. Are there any similar studies that relied on statistical analysis to measure this dust effect?
I hope I can find researchers interested in this line of research and that we can do joint work together!
Dear Dr Younis
Find attached:
1-(1) (PDF) Spatial Management for Solar and Wind Energy in Kuwait (researchgate.net)
2-(1) (PDF) Cost and effect of native vegetation change on aeolian sand, dust, microclimate and sustainable energy in Kuwait (researchgate.net)
regards
Ali Al-Dousari
• asked a question related to Statistical Modeling
Question
Hi
I'm working on research to develop a nonlinear model (e.g. exponential, polynomial, etc.) between a dependent variable (Y) and 30 independent variables (X1, X2, ..., X30).
As you know I need to choose the best variables that have most impacts on estimating (Y).
But the question is: can I use a Pearson correlation coefficient matrix to choose the best variables?
I know that the Pearson correlation coefficient measures the linear correlation between two variables, but I want to use the variables for nonlinear modeling, and I don't know another way to choose my best variables.
I used PCA (Principal Component Analysis) to reduce my variables, but acceptable results were not obtained.
I used HeuristicLab software to develop Genetic Programming - based regression model and R to develop Support Vector Regression model as well.
Thanks
Hello Amirhossein Haghighat. The type of univariable pre-screening of candidate predictors you are describing is a recipe for producing an overfitted model. See Frank Harrell's Author Checklist (link below), and look especially under the following headings:
• Use of stepwise variable selection
• Lack of insignificant variables in the final model
There are much better alternatives you could take a look at--e.g., LASSO (2nd link below). If you indicate what software you use, someone may be able to give more detailed advice or resources. HTH.
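To make the LASSO suggestion concrete, here is a bare-bones coordinate-descent sketch in Python on synthetic data with 30 candidate predictors (in practice you would use a tuned, packaged implementation such as glmnet in R or scikit-learn, not this toy code):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=200):
    """Toy coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ms = (X ** 2).sum(axis=0) / n   # mean square of each column
    for _ in range(n_sweeps):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, alpha) / col_ms[j]
    return b

rng = np.random.default_rng(2)
n, p = 150, 30                    # 30 candidate predictors, as in the question
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[[0, 3, 7]] = [1.5, -2.0, 1.0]   # only three predictors truly matter
y = X @ true_b + rng.normal(scale=0.5, size=n)

b_hat = lasso_cd(X, y, alpha=0.1)
selected = np.flatnonzero(np.abs(b_hat) > 0.05)
print(selected)   # ideally recovers indices 0, 3, 7
```

The L1 penalty shrinks irrelevant coefficients exactly to zero, so variable selection happens inside the fit rather than via univariable pre-screening.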
• asked a question related to Statistical Modeling
Question
Greetings to all those interested and eager to help.
In short: during 12 breeding seasons, my colleagues and I researched the nesting of one bird species on an area of about 11,000 hectares. We spent the first two years looking exclusively for territories/nests, and recorded a total of 34 different territories/nests. For the next ten years, we intended to monitor the reproductive parameters (laying dates, number of eggs, number of offspring, etc.) for all 34 territorial pairs found. However, due to the vast study area, harsh mountain relief, bad weather conditions and a general lack of time, we did not visit every territorial pair every year (visits were effectively random). So, in some years we followed the nesting parameters in only five territories, while in others we managed to monitor up to 20 territories and collect reproductive data. For each year, the collected data table contained: year, number of controlled nests, number of nests with incubation, number of successful pairs, number of fledglings, productivity and nesting success. We defined productivity as the number of fledged juveniles divided by the number of successful nesting attempts. Nesting success was defined as the number of fledged juveniles divided by the total number of nesting attempts during one calendar year. My question is whether it is possible, and in what way (statistical modelling, a simple formula, etc.), to express the population trend for the entire monitored population from these partially collected reproductive data. That is, is it possible to project a ten-year overall population trend based on annual productivity and nesting success data?
Sincerely yours,
This is exactly why statisticians suggest you know your overall question before you collect data. Good luck, David Booth
• asked a question related to Statistical Modeling
Question
How do you develop a suitable model for a given problem? Please give some real-world examples.
I agree with @Imra. Nevertheless, I'll throw in a couple of examples; see the attached. Best wishes, David Booth
• asked a question related to Statistical Modeling
Question
Hello,
I work with a knockout model infecting cells with cre-adenovirus to induce the deletion.
I am analyzing my qPCR results and I am not quite certain about the best statistical method to use, because I want to analyze the correlation between cre-adenoviral infection and animal genetic status (basically, I want to know how much of the gene expression change is caused by cre-adenoviral infection by itself and how much is a result of the mutation).
To evaluate that I have four groups:
- Wild-type cells not infected
- Wild-type cells cre-infected (n=3 for wild-type cells)
- Transgenic cells not infected
- Transgenic cells cre-infected (n=6 for transgenic cells)
I have tried different statistical models, but I believe I am still lacking understanding about choosing a statistical method.
PS: the reason for sample size differences was the unexpected result of gene expression being significantly affected in wild-type cells infected with cre-adenovirus - when I proved that wild-type cells were also affected, I did not have the opportunity to gather more samples.
Thank you
This is a typical 2x2 design: you have two experimental factors with 2 levels each:
• genotype (levels: wild-type and transgenic)
• infection (levels: yes and no (verum and mock))
There are several differences in gene expression that can be related to these factors. There can be:
• a genotype effect in infected animals
• a genotype effect in uninfected animals
• an infection effect in wild-type animals
• an infection effect in transgenic animals
Some of these might be interesting, others not so much. For instance, the infection effect in wild-type animals would be interesting to show that the infection actually worked.
The most relevant scientific question, however, is about the difference in infection effects between the two genotypes. This is addressed statistically by the interaction of the factors genotype and infection in a two-factorial model.
Practically, you would do a two-way ANOVA (using the dCt values) and look at the interaction term.
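A minimal Python sketch of that two-factorial analysis using regression coding, where the genotype x infection product term carries the interaction (all dCt values are invented, and the groups are balanced here for simplicity; the same coding handles the unequal n in the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Invented dCt values for the four groups of the 2x2 design
n = 6
wt_mock = rng.normal(5.0, 0.4, n)           # wild-type, not infected
wt_cre  = rng.normal(6.0, 0.4, n)           # wild-type, cre-infected
tg_mock = rng.normal(5.2, 0.4, n)           # transgenic, not infected
tg_cre  = rng.normal(8.5, 0.4, n)           # transgenic, cre-infected

y = np.concatenate([wt_mock, wt_cre, tg_mock, tg_cre])
genotype  = np.repeat([0, 0, 1, 1], n)      # 0 = wild-type, 1 = transgenic
infection = np.repeat([0, 1, 0, 1], n)      # 0 = mock, 1 = cre

# Regression coding: the genotype*infection product term IS the interaction
X = np.column_stack([np.ones(4 * n), genotype, infection, genotype * infection])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df_resid = 4 * n - 4

# t-test on the interaction coefficient
XtX_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(resid @ resid / df_resid * XtX_inv[3, 3])
t = beta[3] / se
p = 2 * stats.t.sf(abs(t), df_resid)
print(f"interaction estimate {beta[3]:.2f}, p = {p:.2g}")
```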
• asked a question related to Statistical Modeling
Question
Hi.
I have data in which the relationship between two parameters seems to fit a model that has two oblique asymptotes. Does anyone have an idea about what type of function I should use? Please find attached a screenshot of the data. I appreciate any help.
Thanks.
Oblique asymptote rule for rational functions - https://www.storyofmathematics.com/oblique-asymptote
• asked a question related to Statistical Modeling
Question
I want to read more about the ordered probit model, but my searches return only application articles. Are there any good recommendations of texts that explain more of the theory behind it?
Hi Bruno,
I recommend the book "Categorical Data Analysis" by Alan Agresti. We used this book during our master's (in statistics) program, and it builds up the theory from simple to complicated models for analysing categorical data.
Categorical Data Analysis, 3rd Edition | Wiley
His homepage, where he lists many more interesting books/papers/conference discussions around his work on categorical data analysis, is also very useful.
I hope this helps.
• asked a question related to Statistical Modeling
Question
Can any member of RG help me in applying statistical models for the explanation of the adsorption process, like the monolayer adsorption model, double-layer adsorption model and multi-layer adsorption model? If you have used any other important statistical model besides these, please share it with me. I read the article by Qun Li et al., published in Chemical Engineering Journal in Dec 2021, where these models are used, but I remain unable to apply these models in my adsorption articles.
The title of the article is:
"Effective adsorption of dyes on an activated carbon prepared from carboxymethyl cellulose: Experiments, characterization and advanced modelling"
See my 2021 publications.
I am interested in the calculations.
• asked a question related to Statistical Modeling
Question
I was wondering whether anyone had any suggestions as to what statistic to use (I use R Studio).
I have some muscle cell circumference measurements (response) and two categorical explanatory variables: diet (control/diet) and exercise (yes/no), and I need the option to run an interaction between these explanatory variables (as I had 4 groups) as well as to test them separately. A Bartlett's test indicates the data do not meet parametric assumptions, so does anyone have suggestions for a statistic/model I could use? I have looked into GLMs, but as far as I can see they don't work with interactions/non-binomial data.
Thanks in advance for any recommendations.
• asked a question related to Statistical Modeling
Question
I am trying to estimate the strength of the relationship between a set of independent categorical variables (coded as binary variables; 1=yes; 0=no) and a continuous dependent variable. Which statistical model would suit here?
I agree.
To model the probability of claims, for example, each combination of the X(i)'s maps to a value of a continuous Y. Users usually define a rule based on the probability transform of Y: p = 1/(1 + e^{-(b0 + b1*X1 + ... + bn*Xn)}); p <= y => Y = 1, if a binary outcome is to be modeled.
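For the case actually described in the question, a continuous dependent variable with 0/1 coded predictors, ordinary multiple regression already applies, and each coefficient is the mean shift associated with that "yes" category; a synthetic Python sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120

# Binary (1 = yes, 0 = no) predictors, as in the question
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
x3 = rng.integers(0, 2, n)

# Continuous outcome: known group-mean shifts plus noise
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.0 * x3 + rng.normal(scale=0.6, size=n)

# Ordinary least squares with the dummies entered directly
X = np.column_stack([np.ones(n), x1, x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # intercept, then the three group-mean shifts
```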
• asked a question related to Statistical Modeling
Question
My database has some variables, such as income and age, that are recorded in classes ($0 to $2,000, or 20 to 29 years old, for example). However, the texts I have read more often than not use those variables as numbers, not classes. As I see it, using numbers allows for a more comprehensible analysis with most methods. Should I do it?
In the example mentioned, what should I test to convert the $0 to $2,000 class to $1,000?
If not, is there any other conversion possible?
I agree with David Morgan that this is often done as he described. I will advise against doing that however. Categories are categories and not intervals. Intervals have things like means and standard deviations. Categories don't have such things. I have counseled often in this space about using ordinal variables as something else. I will counsel against doing a similar thing here. The information was collected as categories not as intervals. Many different intervals can represent one category. You have no reason to choose an interval with mean equal to the center value and perhaps an arbitrary dispersion measurement. The original data didn't have that and IMO you're creating data when you do such a thing. If categories could be made intervals we don't need both groupings. BTW just because someone else measured the variable in intervals is no reason for you to create new data. Best wishes to all, David Booth, PSTAT
• asked a question related to Statistical Modeling
Question
The work entails tracing erosion over the last 100 years. It tests the stability of the ecosystem as soil is deposited at various sites in the catchment. What statistical components do I need to look at, and what is the most suitable model I can use so that I can analyse my data well?
This work is still at a formative stage, hence yet to be done. The purpose of the question is to find out from other experts in this area whether and how statistics can be used to improve it.
That is a good question.
• asked a question related to Statistical Modeling
Question
I aim to allocate subjects to four different experimental groups by means of Permuted Block Randomization, in order to get equal group sizes.
This, according to Suresh (2011, J Hum Reprod Sci), can result in groups that are not comparable with respect to important covariates. In other words: there may be significant differences between treatments with respect to subject covariates, e.g. age, gender, education.
I want to achieve comparable groups with respect to these covariates. This is normally achieved with stratified randomization techniques, which itself seems to be a type of block randomization with blocks being not treatment groups but the covariate categories, e.g. low income and high income.
Is a combination of both approaches possible and practically feasible? If there are, e.g. 5 experimental groups and 3 covariates, each with 3 different categories, randomization that aims to achieve groups balanced wrt covariates and equal in size might be complicated.
Is it possible to perform Permuted Block Randomization to treatments for each "covariate-group", e.g. for low income, and high income groups separately, in order to achieve this goal?
Hi! You might want to check this free online randomization app. You can apply simple randomization, block randomization with random block sizes and also stratified randomization.
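The combination asked about, permuted blocks run separately within each covariate stratum, is straightforward to implement; a Python sketch with illustrative group labels and strata:

```python
import random

def permuted_block_assignments(n_subjects, groups, block_size=None):
    """Permuted-block randomization within one stratum."""
    if block_size is None:
        block_size = 2 * len(groups)
    assert block_size % len(groups) == 0
    per_group = block_size // len(groups)
    out = []
    while len(out) < n_subjects:
        # Each block contains every group equally often, in random order
        block = [g for g in groups for _ in range(per_group)]
        random.shuffle(block)
        out.extend(block)
    return out[:n_subjects]

# Stratified version: run the block scheme separately in every stratum
random.seed(42)
groups = ["A", "B", "C", "D"]
strata = {"low income": 24, "high income": 24}   # illustrative strata sizes
allocation = {s: permuted_block_assignments(n, groups) for s, n in strata.items()}
for stratum, assigned in allocation.items():
    print(stratum, {g: assigned.count(g) for g in groups})
```

With many strata and small cells, the number of subjects per stratum can fall below one block, which is exactly the practical complication the question anticipates; minimization methods are a common fallback in that case.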
• asked a question related to Statistical Modeling
Question
I need some research papers on this topic. Can somebody help me by pinning some of the documents, please?
• asked a question related to Statistical Modeling
Question
i) What kind of objective may be achieved by applying Markov chain analysis?
ii) How should the data be arranged?
First, you need to understand the transition probabilities.
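To make the transition-probability point concrete: the data are arranged as consecutive (from-state, to-state) pairs, from which the transition matrix is estimated; a small Python sketch with an invented state sequence:

```python
from collections import Counter

# An observed state sequence; the "arrangement of data" for a Markov chain
# is simply the consecutive (from, to) state pairs.
sequence = ["sunny", "sunny", "rain", "sunny", "rain", "rain", "sunny", "sunny"]

pair_counts = Counter(zip(sequence, sequence[1:]))
state_counts = Counter(sequence[:-1])

# Maximum-likelihood transition probabilities: count(a -> b) / count(a)
transition = {
    (a, b): pair_counts[(a, b)] / state_counts[a]
    for (a, b) in pair_counts
}
print(transition)
```

Typical objectives are then forecasting the next state's distribution and finding the long-run (stationary) distribution of the chain.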
• asked a question related to Statistical Modeling
Question
Dear Researchers,
We are aware that a shift in monsoon peak discharge may have an adverse impact on several water-based applications such as agriculture, dam operations, etc. I am interested to know how to quantify this based on modeling approaches. Thank you!
Sincerely,
Aman Srivastava
1. On the Suitability of GCM Runoff Fields for River Discharge Modeling: A Case Study Using Model Output from HadGEM2 and ECHAM5, February 2012
2. Development of a high resolution runoff routing model, calibration and application to assess runoff from the LMD GCM, September 2003
3. Climate change and its impacts on river discharge in two climate regions in China, November 2015
4. Modelling the potential impacts of climate change on hydrology and water resources in the Indrawati River Basin, Nepal, February 2016
• asked a question related to Statistical Modeling
Question
Dear all,
I want to see whether a biomarker level at baseline can be used to predict the prognosis after a treatment, as compared to a clinical parameter.
Which statistical model would be best to investigate this?
Try reading this amazing book by Wout.
• asked a question related to Statistical Modeling
Question
We have been conducting one continuous camera trap survey (2019 - present) with (n=40) camera traps set up across our study site. The objective is to determine prey availability in terms of demographic classes. However, since the majority of the prey species are not identifiable to the individual level, we are limited to unmarked-species methods. Furthermore, we are open to the idea of using models that require population closure but will have to violate that assumption, as we are purposefully comparing prey availability between breeding and non-breeding seasons.
Good question
• asked a question related to Statistical Modeling
Question
Many statistical tests require approximate normality. On the other hand, normality tests such as Kolmogorov-Smirnov and Shapiro-Wilk are sensitive to the smallest departure from a normal distribution and are generally not suitable for large sample sizes; they cannot show approximate normality (source: Applied Linear Statistical Models). In this case, a Q-Q plot can show approximate normality.
Based on what is written in the book "Applied Linear Statistical Models", only a severe departure from normality matters: in that case, parametric tests can no longer be used, but if a severe departure is not seen, parametric tests can be used.
What method do you know to detect approximate normality in addition to using a Q-Q plot?
Neda Ravankhah
Thank you.
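One numeric complement to the Q-Q plot is to inspect sample skewness and excess kurtosis, which describe the shape directly instead of testing an exact null hypothesis; a Python sketch (the rule of thumb of values within roughly +/-1 is a common convention, not from the book cited in the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=10, scale=2, size=5000)   # large sample, truly normal

# With n this large, Shapiro-Wilk/KS would flag even trivial departures,
# which is exactly the problem described in the question.
# Shape statistics instead quantify HOW non-normal the sample is:
skew = stats.skew(x)
exkurt = stats.kurtosis(x)   # Fisher definition: a normal sample gives ~0
print(round(skew, 2), round(exkurt, 2))
```

Values of skewness and excess kurtosis near zero, together with a straight Q-Q plot, support approximate normality even when formal tests reject.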
• asked a question related to Statistical Modeling
Question
in ASTM G102, corrosion rate is calculated using current density based on Faraday's law.
Question 1) The amplitude of the current measured by the ACM sensor has a very large range. Should I take the average of the measured values?
Question 2) In most papers, statistical models are used for ACM sensor, why is ASTM G102 not being used?
Hi, according to ASTM G102, the corrosion current can indeed be obtained from galvanic cells as well, which in my opinion does not necessarily reflect a "true" corrosion current of a single material, since it is (as they say) a galvanic current. So that should answer your question 2 :-)
Your question 1 refers to the amplitude. In my opinion this is due to the wetting/drying cycles, and therefore you have to average the current or evaluate it separately.
Hope that helps.
Andreas
• asked a question related to Statistical Modeling
Question
This is my research problem so far:
In this scientific paper I will conduct an empirical investigation whose objective is to discover whether the numbers of newly reported corona cases and deaths contributed to the huge spike in volatility in the S&P 500 during the pandemic phase of the corona outbreak. The paper will try to answer the following questions: "Is there any evidence of significant correlation between stock market volatility in the S&P 500 and the newly reported numbers of corona cases and deaths in the US?" and "If there is significant evidence, can the surge in volatility mostly be explained by the national number of daily reported cases, or was the mortality number the largest driver?"
So far I have constructed a time series object in RStudio containing the variables VIX, newly reported US corona cases, and deaths. I have also transformed my data into a stationary process and will later test some assumptions. I have a total of 82 observations for each variable, stretching from February 15 to June 15.
I do not have a lot of knowledge about all the different statistical models and which ones are logical to use in my case. My first thought was to fit a GARCH or OLS regression, although I am not sure whether this is a smart choice. Hence, I ask you for some advice.
Best regards, stressed out student!
The performance of a model can only be judged after applying it to real data and comparing with simpler alternative models.
Please let me recommend this framework to compare and evaluate alternative models:
My recommendation is to use a set of alternative models and to use the above chapter to implement basic procedures to compare alternatives and interpret the results.
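As a starting point for the OLS option mentioned in the question, here is a minimal sketch of a one-predictor least-squares fit on differenced (stationary) series; all numbers are invented, and a GARCH model would be needed to capture volatility clustering properly.

```python
# Sketch: plain OLS of (differenced) VIX on (differenced) new cases.
# The data below are made up for illustration only.

def ols(x, y):
    """Return (intercept, slope) of y = a + b*x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

d_cases = [100, -50, 200, 30, -10, 80]   # hypothetical first differences
d_vix   = [1.2, -0.4, 2.1, 0.3, -0.2, 0.9]
a, b = ols(d_cases, d_vix)
print(a, b)
```

With two candidate predictors (cases and deaths), fitting each separately and then jointly is one simple way to see which one drives the relationship.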
• asked a question related to Statistical Modeling
Question
I have 3 constructs. I have one dependent variable with 7 categories and 30 sub-categories, one independent variable with 5 categories and 20 sub-categories, and one mediator variable with six categories and no sub-categories. All these categories and sub-categories are assigned the value 0 or 1 (dichotomous). Besides that, I have 6 control variables: four are continuous, one is dichotomous, and one is on a scale. Which statistical model and which tests are appropriate for this type of data?
The RQs are: a) What is the role of reporting structure in firm value?
b) What is the role of stakeholder relationships in the relation between reporting structure and firm performance?
Just to include examples: IV = Performance. Category 1 = financial indicators; sub-category 1 = profit, sub-category 2 = leverage, sub-category 3 = liquidity.
Category 2 = non-financial indicators; sub-category 1 = environmental, sub-category 2 = economic, sub-category 3 = social.
DV = Governance. Category 1 = board structure; sub-category 1 = board size, sub-category 2 = board profile, sub-category 3 = board's experience.
Category 2 = accountability; sub-category 1 = no discrimination, sub-category 2 = fairness in reporting.
I am interested in doing the analysis up to the category level. I can change the data from dichotomous (binary) to nominal. I am thinking of using averages or percentages to analyse the relationships between variables and constructs. Can I use Partial Least Squares SEM or another parametric or non-parametric model?
First step: dimensional reduction of the IVs. This video is on using Multiple Correspondence Analysis to reduce the IVs into groups.
Second step: each of the groups obtained from the Multiple Correspondence Analysis can be coded.
Third step: take one DV at a time and run a multinomial logistic regression analysis with the covariates (stakeholders and the other variables).
Try this and see.
As the DVs are not correlated, they can be analyzed separately.
Refer to the video below
or
try it, and you may find a way.
Please keep us informed.
• asked a question related to Statistical Modeling
Question
Dear all,
I would like to know if it is possible to use SAOMs (Stochastic actor oriented models) to analyse weighted networks?
Léa DAUPAGNE
There seems to be a less used feature to handle ordinal values. It is discussed here: http://lists.r-forge.r-project.org/pipermail/rsiena-help/2013-July/000272.html
• asked a question related to Statistical Modeling
Question
I was looking for the best statistical model for my research, which concerns pregnant women. Also, what would be a suitable sampling technique?
Discrete data are categorical data. One-sample techniques include the runs test, sign test and binomial test. Two-sample techniques include the Wilcoxon signed-rank test and so on. Correlation can be assessed with Spearman's correlation. Association is measured by the chi-square or G-square statistic. And so on.
• asked a question related to Statistical Modeling
Question
I am working on a project for which I want to build a statistical model. Please guide me on the data and the different operations used in such a project.
Take a look at A W Warwick’s classic book ‘Ore sampling in mines’. I think it’s a free Google book. Written over 100 years ago well before computers when one worried about rattlesnakes and not pythons...
• asked a question related to Statistical Modeling
Question
We have very large data sets of populations across five years. We want to compare the proportion of people in different categories statistically, controlling for the differences in sample size. For example, if we have a total sample of 10,000 people in Year 1, and 17% are ages 18-24, how do we compare that 17% to the proportions of same age category in Year 2 with 15,000 total sample, Year 3 with 18,000 total sample etc.? I had assumed it would involve weighting but want to get expert opinions on approaches to this. Ultimately, the goal would be to see whether there are statistically significant differences in the proportions in each age category, controlling for differences in the total sample. Thank you in advance!
Hello Lia,
You could use the chi-square goodness of fit test. If Year 1 results were to be considered the baseline, and you'd like to know whether Year 2 shows a substantive departure from Year 1, then the Year 1 proportions for the respective category would be multiplied by the Year 2 N to obtain expected category frequencies. Across categories, the expected frequencies are compared to the observed frequencies for year 2. This is evaluated as a chi-square statistic with #categories - 1 df.
With sample sizes as large as you have, such a test will have considerable statistical power to detect differences.
Finally, if you consider that your yearly data sets truly are the full population, there is no need for any statistical or hypothesis tests at all. Any difference from year to year is a real difference (e.g., you have the population parameters).
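The expected-frequency calculation described above can be sketched as follows; the counts and proportions are hypothetical, chosen only to match the sizes in the question.

```python
# Sketch of the chi-square goodness-of-fit test: Year 1 category
# proportions define the expected counts for Year 2.

def chi_square_gof(observed, baseline_props):
    """Chi-square statistic comparing observed counts to baseline proportions."""
    n = sum(observed)
    expected = [p * n for p in baseline_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

year1_props = [0.17, 0.33, 0.30, 0.20]   # Year 1 age-category proportions
year2_obs   = [2250, 5100, 4350, 3300]   # Year 2 counts, N = 15000
stat = chi_square_gof(year2_obs, year1_props)
print(stat)  # compare to a chi-square distribution with (#categories - 1) = 3 df
```

With samples this large, even tiny proportion shifts will be "significant", so effect size matters more than the p-value.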
• asked a question related to Statistical Modeling
Question
Dear all,
I would like to know if it is currently possible to use temporal ERGMs (Exponential Random Graph Models) for analyzing weighted networks?
For now, it seems that the software packages available for analysing TERGMs (tergm or btergm) only handle binary networks.
Léa Daupagne
• asked a question related to Statistical Modeling
Question
I want to statistically model mode choice behavior (modes: bus, train, car, other). My independent variables are gender (male/female), car ownership (yes/no), age (continuous), household income (continuous), and travel time, travel distance and travel cost (continuous). In addition, service quality parameters (comfort, reliability, safety, etc.), which are ordinal data collected through a questionnaire survey, are to be included in the model. Users were asked to rate the service quality parameters on an ordinal scale of 1-5.
My dependent variable is nominal, and my independent variables include nominal, ordinal and continuous variables. In order to model mode choice, which statistical method should I choose: multinomial logistic regression or ordinary logistic regression?
TIA
Hello Md Musfiqur,
Given that your outcome is a nominal variable with 4 levels, it would appear that multinomial (logistic) regression is likely an appropriate approach.
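To illustrate what a fitted multinomial logit does with its estimates, here is a minimal sketch turning systematic utilities into choice probabilities via the softmax; the utility values for the four modes are invented for illustration.

```python
# Sketch: multinomial logit choice probabilities from systematic utilities.
# Utilities would come from fitted coefficients; these are made up.
import math

def mnl_probs(utilities):
    """Softmax over systematic utilities, one per mode."""
    exps = [math.exp(u) for u in utilities]
    s = sum(exps)
    return [e / s for e in exps]

# utilities for (bus, train, car, other) of one hypothetical traveller
probs = mnl_probs([0.2, 0.5, 1.1, -0.3])
print(probs)
```

The ordinal service-quality ratings can enter the utility functions either as numeric scores or as dummy-coded levels.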
• asked a question related to Statistical Modeling
Question
Hello everyone,
I am currently analyzing my dataset: I have body weight data in insects measured on 7 consecutive days, under six different treatments in triplicate. I would like to analyze the effect of treatment over time; which model should I use in SPSS?
Thank you so much in advance
I hope this isn't a curve ball but many time-based events can benefit from a graphical or variogram approach.
• asked a question related to Statistical Modeling
Question
Hello,
I am trying to set up a statistical model for a time-course experiment. I have a total of 16 timepoints* (7 before and 8 after treatment). I have 4 acclimation groups before treatment. At treatment, half the individuals from each group were treated with a protein inhibitor, and all individuals were subjected to a stress. Following treatment, I therefore have 8 groups (inhibitor+stress and stress alone, for each acclimation group). I have an unequal number of measurements from each group at each time due to mortality and low-quality data. This is not a repeated-measures design, as each measurement is from a unique individual that was sacrificed. My data are non-normal as well, possibly due to missing and low-quality data. I read that I can use the average of each group to make up for the missing data points.
I have had great trouble getting each timepoint integrated in my model. I have tried analyzing by averaging all "before" and all "after" timepoints for each group, but it would be great to get results at higher resolution (per point of the time course). I am using JMP but am open to trying another program.
Any help you can provide here or point in a direction would be greatly appreciated!
Thank you!
*I forgot to add that I am missing data for one timepoint, and some of the treatments do not have data for others.
What Jochen Wilhelm is describing is called an event study in Finance. You may wish to take a look at our last papers on the subject as an introduction to the subject. See Google Scholar references:
[PDF] Robust Methods in Event Studies: Empirical Evidence and Theoretical Implications. N. Sorokina, D. E. Booth, J. H. Thornton Jr. Journal of Data Science, 2013. "We apply methodology robust to outliers to an existing event study of the effect of US financial reform on the stock markets of the 10 largest world economies, and obtain results that differ from the original OLS results in important ways."
[PDF] Analyst Optimism in the Automotive Industry: A Post-Bailout Boost and Methodological Insights. N. Sorokina, Y. Tanai, D. Booth. Journal of Data Science, 2015. "This paper empirically investigates the impact of the government bailout on analysts' forecast optimism regarding firms in the automotive industry. We compare the results from M- and MM-robust methodologies to the results from OLS regression in an event study context."
Best wishes, David Booth
• asked a question related to Statistical Modeling
Question
I am currently building statistical models in healthcare, education and agriculture, and am looking for someone willing to share primary data sets in these thematic areas. Support and/or suggestions will be much appreciated.
I will support
• asked a question related to Statistical Modeling
Question
I have generated many models with different coefficients of determination (R²) and RMSE values. In this analysis I have seen that the model with the maximum R² does not necessarily have the minimum RMSE. It is very difficult for me to choose a model among them, so I would like your suggestion: which model should I choose? Should I go by R² or by RMSE?
Thank you.
P.S. Let me recommend this framework:
It was developed and presented for time series data, but can be easily adapted to the standard regression analysis setup.
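One point worth noting: on a single fixed dataset, R² = 1 − SSE/SST and RMSE = √(SSE/n) are both monotone in SSE, so ranking models by R² and by RMSE must agree; they can diverge only when the models are fit to different data, different subsets, or transformed responses. A quick sketch (data invented):

```python
# Sketch: R-squared and RMSE computed from the same residuals.
# On a fixed dataset both are monotone in SSE, so they rank models identically.
import math

def r2_and_rmse(y, yhat):
    n = len(y)
    ybar = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1 - sse / sst, math.sqrt(sse / n)

y    = [3.0, 5.0, 7.0, 9.0]
yhat = [2.8, 5.1, 7.2, 8.9]   # hypothetical fitted values
r2, rmse = r2_and_rmse(y, yhat)
print(r2, rmse)
```

So if your R² and RMSE rankings disagree, check whether the models were fit on differently transformed or differently sized datasets; in that case RMSE in the original response units is usually the safer criterion.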
• asked a question related to Statistical Modeling
Question
Does anyone have an example to share on a statistical model that incorporates temporal and spatial autocorrelation terms simultaneously?
Examples in ecology and hydrology research would be optimal. Thanks in advance.
You might want to check the following out - the first is an interesting article/tutorial that covers how to use R-INLA for spatial and spatiotemporal modelling in R.
The second is an extremely useful textbook you should definitely get a hold of. You will find what you need in chapters 6 and 7. Beneath it is a link to the data and code used in all chapters.
Most examples are epidemiological but highly applicable to your area of expertise.
Anwar.
• asked a question related to Statistical Modeling
Question
Hi,
For a research project I am trying to compare Google Trends search volume series for two online retailers. However, as you might know, Google Trends returns a normalized search index for a specific time period and does not reflect real search volumes. In my thinking, this creates an important problem when estimating statistical models. Do you also think that if a model includes more than one Google Trends variable, they should be weighted? Then again, the weighting itself could be a problem too.
Omer Zeybek, I guess this question is not about the capability of Google Trends to compare, but about the type of statistical test to be used. It can be either parametric or nonparametric. Best of luck, bud.
Here is a paper we published.
• asked a question related to Statistical Modeling
Question
I have behavioral data (feeding latency) which is the dependent variable. There are 4 populations from which the behavioral data is collected. So population becomes a random effect. I have various environmental parameters like dissolved oxygen, water velocity, temperature, fish diversity index, habitat complexity etc. as the independent variables (continuous). I want to see which of these variables or combination of variables will have significant effect on the behavior.
I agree with A. U. Usman's answer, but some other techniques, such as non-linear analysis, cluster analysis, factor analysis, etc., may also be useful here.
• asked a question related to Statistical Modeling
Question
Dear:
Within the statistical process there is a measure (Mahalanobis distance) that is used to support regression. When is a value on this scale considered good? And why do most Arab studies not rely on it or refer to it, while most foreign studies rely on it to support the results of linear and multiple regression? Could you clarify this? What is the best reliable cutoff, given that both high and low values appear in SPSS?
Mahalanobis distance is an effective multivariate distance metric that measures the distance between a point and a distribution. It is an extremely useful metric having, excellent applications in multivariate anomaly detection, classification on highly imbalanced datasets and one-class classification. This post explains the intuition and the math with practical examples on three machine learning use cases.
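As a minimal illustration of the definition D² = (x − μ)ᵀ S⁻¹ (x − μ), here is a two-dimensional sketch with the 2×2 matrix inverse written out by hand; the point, mean and covariance values are invented.

```python
# Sketch: Mahalanobis distance of a point from a 2-D distribution,
# D^2 = (x - mu)^T S^{-1} (x - mu). Values below are invented.
import math

def mahalanobis_2d(x, mu, cov):
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det
    d2 = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(d2)

print(mahalanobis_2d((2.0, 2.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```

For outlier screening in regression, the squared distance is often compared against a chi-square quantile with p degrees of freedom (p = number of variables), which is one common answer to the "what cutoff" question.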
• asked a question related to Statistical Modeling
Question
While running multinomial logistic regression in SPSS, an error appears in the parameter estimates table. For the Wald statistic, some item values are missing because of a zero standard error, and a message is displayed below the table: "Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing". Does anyone know how to resolve this error?
• asked a question related to Statistical Modeling
Question
Hi all,
My paper looks at the most significant determinants of U.K. economic growth. I will examine macroeconomic, technological, human capital, socio-geographical and governance variables against the dependent variable of GDP growth. Could anyone who has done research into growth theories suggest suitable statistical models that can be used in Stata? I am currently looking at Bayesian Model Averaging, but I am not sure that is the best model out there.
This seems similar to research I have done for my country and the EU in general.
You can run Bayesian Model Averaging on panel data using R and the BMS package: http://bms.zeugner.eu/ (where you can also find all the support materials). As far as I know, BMA is not available in Stata and its full functionality can only be run in R.
• asked a question related to Statistical Modeling
Question
Is statistical modeling based on the parameters affecting concrete characteristics a viable way to predict concrete properties from those parameters?
After conducting your experimental work in the laboratory,
convert your mix ratios to measurable quantities, e.g. kg/m3.
Run your model using the parameters, and check the level of significance of the parameters.
• asked a question related to Statistical Modeling
Question
Which error families are allowed in generalized least squares (GLS) models? Can I, for example, have a binomial GLM and define a covariance structure in it (which, I guess, makes it a GLS)? See the example below:
model <- glmmTMB(response ~ ar1(predictor + 0 | group), data = data, family = binomial(link = logit))
And also, should I call a GLM with a covariance structure a GLS?
Yes you can do something but I would need to know more about your DV to say exactly what. I will include an example of our recent work to give you some idea of what our group does. Best, D. Booth
• asked a question related to Statistical Modeling
Question
When I have to choose my identification variable in lvl 1 and lvl 2, I cannot choose it because it is grayed out. Do you know why, and how to fix it?
Just a guess but most software that I use won't take a name like lvl 2 but would take lvl2. Best wishes, David Booth
• asked a question related to Statistical Modeling
Question
Hi all,
I have run a three-factor mixed-design experiment with one between-subjects factor (biological sex: two levels) and two within-subjects factors (having four levels each). I have measured several continuous response variables, each of which I have already analysed with a standard ANOVA. I have also collected the values of a nominal (non-ordinal) categorical response variable. This response variable is non-binary (it takes one out of five possible values).
The question is: How to approach the statistical analysis of a three-factor mixed-design experiment with a non-binary non-ordinal categorical response variable?
In particular, I would like to be able to analyse main effects and interactions as in a standard ANOVA.
Any reference to a related R package would be more than welcome.
Francesco
Thank you Julio
• asked a question related to Statistical Modeling
Question
Accurate forecasts cannot always be obtained with the ARIMA model, which is generally used for forecasting.
It depends on the features of your dataset. If it is a time series, apply time series modelling and forecasting methods; if it is not, apply multiple regression analysis. For the modelling and analysis you can use any statistical package.
• asked a question related to Statistical Modeling
Question
In an exploratory study, if I want to state that certain components of counselling (7 items to assess), environmental modification (8 items) and therapeutic interventions (8 items) result in the practice of social case work, what analysis should I do?
NB: we have no items to assess the practice of social case work. Instead, we want to state that the practice of the other three components results in the practice of social case work.
If according to Morgan
• asked a question related to Statistical Modeling
Question
I have recently been studying proprietary voter files and data. While I know that voter files are for the most part public (paid or free), I am confused as to how companies match these data to other data.
For example, the voter files (public) never reveal who you voted for, your behavioral attributes, and so on. So how do companies that sell this "enhanced" data match every "John Smith" to other data? How can they say that they have a profile on every voter? Wouldn't that require huge data collection, or are there models that simply do that job for them?
Hi Melissa. I'm not sure I understand. Do you mean private opinion surveys? They just ask who the interviewee voted for, or who they intend to vote for. If I misunderstood your question and this has nothing to do with my answer, I'm really sorry.
• asked a question related to Statistical Modeling
Question
In the November issue of the Journal of Conflict Resolution (https://tinyurl.com/tyhrn2j), Adrian Lucardi and I debate with Bunce, Wolchik, Hale, Houle, Kayser, and Weyland about whether democracy protests diffuse. Across thousands of statistical models, we find that, in general, they did not between 1989 and 2000. BW and W suggest that they might in very unusual circumstances.
What are your thoughts? In what situations do you think they might spread? Why, despite all the protests occurring today in close succession of each other, is no one talking about protest diffusion?
This is a good question to call for an international conference, because the diffusion hypothesis seems to be widespread. Please look at my enclosed argument.
• asked a question related to Statistical Modeling
Question
Minitab software is used to fit a model to experimental data. I go to Stat > DOE > Response Surface and choose the regressors in uncoded form, along with their low and high values, then the responses. As far as I know, unusual observations and large-residual points must be excluded from the data and the RSM analysis performed again with the new data. However, right after I remove these points and run the RSM, the results show new unusual observations and large residuals, so I remove them again. The same thing keeps happening, which leads to an undesirable reduction in data. So, what should be done with unusual observations and large residuals?
IMO outliers should not be removed unless they represent KNOWN BLUNDERS. Otherwise study them for the information they contain. Example attached. Best, D. Booth
• asked a question related to Statistical Modeling
Question
If I have two parameters, A and B, is it always better to include A*A*B and A*B*B, or is A*B enough? Generally, what is the basis for the choice; does it depend on the number of factors, particularly in Minitab? In response surface methodology, for example, the software itself defines the terms, but in Regression > Fit Model it is possible to include additional terms.
Hmmm, first let me mention my favourite resources in this area: https://www.amazon.com/Analysis-Experiments-Enhanced-Abridged-Companion/dp/1119593409/ref=sr_1_1?keywords=Douglas+montgomery+design+of+experiments&qid=1574824324&s=books&sr=1-1 and https://www.amazon.com/Response-Surface-Methodology-Optimization-Experiments/dp/1118916018/ref=sr_1_1?keywords=Douglas+montgomery+Response+Surface&qid=1574824473&s=books&sr=1-1. I would strongly suggest looking at these. Now, you mention interaction. Interaction first appears in a factorial design structure. Following the Design of Experiments text above, the model for a 2x2 complete factorial design is well known to be y = a + bA + cB + dA*B + eps in regression form. Thus, with one replicate, df(error) = 0, and it would make no sense to consider an A*A*B term in this context; clearly we must replicate to get useful data. The basis of the choice is the theory of 2x2 factorial designs and what makes sense there. This reasoning applies to all factorial designs where interaction can be studied. BTW, software never defines the model; that is the role of the experimenter and the research question(s). I hope this and the Montgomery texts help clear this up. Best wishes, David Booth
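The effect estimates for a replicated 2x2 factorial of the kind described above can be sketched directly from the cell means; the responses below are invented for illustration.

```python
# Sketch: effect estimates for a replicated 2x2 factorial, matching the
# regression form y = a + bA + cB + dA*B (coded levels -1/+1).
# Two replicates per cell; all responses are made up.

def mean(xs):
    return sum(xs) / len(xs)

cells = {(-1, -1): [10.0, 11.0], (1, -1): [14.0, 15.0],
         (-1, 1): [12.0, 13.0], (1, 1): [20.0, 21.0]}
m = {k: mean(v) for k, v in cells.items()}

# classical effect contrasts from the four cell means
A_eff  = (m[1, -1] + m[1, 1] - m[-1, -1] - m[-1, 1]) / 2
B_eff  = (m[-1, 1] + m[1, 1] - m[-1, -1] - m[1, -1]) / 2
AB_eff = (m[-1, -1] + m[1, 1] - m[1, -1] - m[-1, 1]) / 2
print(A_eff, B_eff, AB_eff)
```

Replication is what leaves degrees of freedom for the error term, which is why single-replicate designs cannot support extra terms like A*A*B.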
• asked a question related to Statistical Modeling
Question
I am looking for statistical models (or results of relevant numerical simulations) to estimate the number of different types of contacts between particles in a powder mixture. The simplest case would be a mixture of particles of types A and B, both spherical and of the same uniform size.
The goal would be to calculate the number of A-A, B-B and A-B contacts per unit volume.
More complex cases would include size and shape distributions (which may or may not differ between the components) and mixtures with more than two components.
Any relevant references are highly appreciated! Thanks in advance.
Hi Jan!
I'm not sure there is an answer to this, since the problem as written depends on the exact packing state of the bed, which is presumably unknown at the particle level. I suggest you make the additional assumption that the bed is entirely random and that the contacts are quite soft (you need this because otherwise the number of contacts is extremely sensitive to tiny errors in particle position).
If you allow those simplifications then, assuming the bed is large enough, you can make some progress by assuming that each particle has a coordination number determining its number of nearest neighbours. Each of these then has an x% chance of being A and a (100-x)% chance of being B, depending on your concentration ratio, so you can construct a probability distribution for the numbers of A and B surrounding each particle. Then you can propagate this through the bed and see how/if it converges (you only need to do a 1D chain since the bed is random).
If you wanted to improve on this, you would need to look at probability distributions of coordination numbers as a function of PSD and other particle properties. Standard DEM simulations routinely produce such data, although it is probably not published in an accessible form.
Pretty much all bets are off for hard, non-spherical particles...
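Under the random-bed assumption sketched above, the expected contact-type fractions follow directly from the mixing fraction: with number fraction x of A particles, a random contact is A-A with probability x², A-B with 2x(1−x), and B-B with (1−x)². Multiplying by the total contact density (roughly particle number density times mean coordination number, divided by 2 since each contact is shared) then gives contacts per unit volume. A quick Monte Carlo check of the fractions:

```python
# Sketch: expected contact-type fractions in a random binary packing,
# assuming the two endpoints of a contact are independently A with prob x.
import random

def contact_fractions(x):
    """Analytic (AA, AB, BB) fractions for number fraction x of A."""
    return x * x, 2 * x * (1 - x), (1 - x) * (1 - x)

def monte_carlo(x, n=200_000, seed=1):
    """Sample n random contacts and count the endpoint-type combinations."""
    rng = random.Random(seed)
    counts = [0, 0, 0]  # AA, AB, BB
    for _ in range(n):
        a = rng.random() < x
        b = rng.random() < x
        counts[0 if a and b else 2 if not a and not b else 1] += 1
    return [c / n for c in counts]

print(contact_fractions(0.3))
print(monte_carlo(0.3))
```

Size segregation or cohesion would break the independence assumption, which is where the coordination-number distributions from DEM become necessary.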
• asked a question related to Statistical Modeling
Question
There are some works in the literature in which selectivity models are developed in Minitab by response surface methodology, based on data collected by other researchers, and optimization is performed to propose optimal conditions that maximize the selectivity for desirable products. I took the same data just to check how response surface methodology works, but I never obtain the same model that was obtained in these works. The coefficients in the model for the regressors and their interactions don't match, even in cases where the R-squared and adjusted R-squared values are the same, and even when I check some data points with my model and get the same responses as they did.

I don't know whether I am missing something, using the software incorrectly, or something else. Note that in Minitab I go to Stat > DOE > Response Surface and choose the regressors in uncoded form, along with their low and high values, then the responses. I think the problem might lie in the coded variables: are uncoded variables converted into coded ones in the same way everywhere, or might it be done differently? Or might the problem be in the way I interpret the results after I run the RSM? What do you think: is the problem related to this, or to other factors?
Dear Nurlan Amirov: your DOE is a response surface design, but you don't say which design it is. First you must determine the kind of design you are using (CCD, Box-Behnken, D-optimal, etc.); each design also has variants that you should know. With that information you can work out how the software codes the factors. For example, in the CC design the axial distance (alpha) indirectly influences the coding of the factors, since the coding can be done with respect to the alpha value or with respect to the low and high levels that you chose for your factors. From what you describe, you code with respect to alpha, but the software may code with respect to the low and high levels of the factors. Please check that aspect.
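For what it's worth, coding with respect to the low/high factor levels can be sketched as below (the levels are invented); coding with respect to the axial distance simply divides this value by alpha, so the two conventions give different coefficients for the same data.

```python
# Sketch: the two factor codings mentioned above, for a hypothetical
# factor with low/high levels 10 and 30.

def code_by_levels(x, low, high):
    """Coded value: -1 at `low`, +1 at `high`, 0 at the center."""
    center, half = (low + high) / 2, (high - low) / 2
    return (x - center) / half

def code_by_alpha(x, low, high, alpha):
    """Coding relative to axial distance: axial points land at +/-1."""
    return code_by_levels(x, low, high) / alpha

print(code_by_levels(10, 10, 30), code_by_levels(30, 10, 30))
print(code_by_alpha(30, 10, 30, alpha=1.414))
```

If two analyses of the same data use different codings, the coefficients will differ even though the fitted responses and R-squared agree, which matches the symptom described in the question.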
• asked a question related to Statistical Modeling
Question
I am interested in literature or research results connecting technology entrepreneurship (either as a general concept or as a discrete variable) with employment in high-technology firms (as a concept or as a discrete measurable variable). I am looking to statistically correlate the two variables, building a complete statistical model.
Good evening. I think it's linked to science fiction novels, which can create a kind of fantasy and attract the attention of the reader/viewer; I just think this is an effective way of employing technology.
• asked a question related to Statistical Modeling
Question
If you want to estimate a sample size for qualitative research in an unknown population, what is the suitable statistical model to use?
Rather than a specific recommendation for N, what I would like to see considered is the distinctly different goals of qualitative methods in respect to what are traditionally considered "quantitative" methods. What is typically intended by quantitative methods are inferential statistical methods (there are other quantitative methods, but that is beyond the scope of this argument). The key for me, and I welcome argument herein, is that inference is not a goal of qualitative research; instead deep penetration with the sample at hand is the objective. Thus, the notion of "saturation" that I and Roshan Panditharathna mentioned is often used in thinking about how many, but even of more importance WHO, is sampled in a qualitative study. Thus the use of inferential notions to guide in consideration of sample size is an inappropriate conflation of methods.
• asked a question related to Statistical Modeling
Question
Which statistical model is suitable for comparing two curves for validation purposes, for instance when comparing two force-displacement curves?
You can use design of experiments, for example by the Yates or Taguchi method.
• asked a question related to Statistical Modeling
Question
I plan to carry out survey research that will model a certain variable with other pre-specified variables. As this is not an intervention study, I am not obliged to register it in a database such as clinicaltrials.gov. However, I plan to do so in order to increase the transparency of my research. In particular, I would like all the tested variables to be prospectively revealed, to make the statistical modelling more valid.
Do you consider my plan correct? Would you give me any further tips on how to increase the transparency of my study?
Consider registering under the Open Science Framework (www.osf.io). Maybe in addition to the clinical trials site.
• asked a question related to Statistical Modeling
Question
Hello everybody,
I have a problem regarding an analysis of a model in SPSS. The model consists of two IV (X and A), one Mediator (M) and my dependent Variable (Y). M is only a mediator for X though, not for A.
I used the Macro of Hayes (Model 4) to calculate my Model. However, here I can only put in A as a covariate, thus also establishing a mediator effect between A and M. To solve this, I redefined the Matrix of A, leaving my syntax as " process y=Y/m=M/x=X/total=1/cov=A/effsize=1/model=4/cmatrix=0,1."
Here is the twist: PROCESS does not produce the total effect model ("NOTE: Total effect model and estimate generated only when all covariates are specified in all models of M and Y."), therefore giving me no total R-squared for my complete model.
Is there any way I can get this result while using SPSS?
• asked a question related to Statistical Modeling
Question
Our research group on mental healthcare research frequently faces the challenge of evaluating complex mental healthcare interventions. I am interested in sharing experiences and discussing methodological issues in such research projects. Maybe write a review on this subject?
Thanks to all colleagues who have responded. We are currently reviewing other options like hierarchical linear modelling.
• asked a question related to Statistical Modeling
Question
Dear Members,
I am studying customer behavior towards buying products from a specific company.
The company has three different products, Y1, Y2, Y3. These are agricultural inputs, coded as 0/1 (buying or not). The company is interested in increasing its sales through existing customers. Currently, the interest is in computing the probability that a customer A who is buying Y1 will also buy Y2 and/or Y3.
My thought is to estimate three logistic models, one for each product:
Y1=f(Y2,Y3, Xi) -----------------Xi are continuous variables.
y2=f(y1 y3 xi)
y3= f(y1 y2 xi)
Is this fine? If not, what do you suggest?
Can I test the correlation between these models, as in the case of SUR?
Can we estimate them like SUR?
Thank you for response.
I am not well versed in simulation. Could you please share a source?
• asked a question related to Statistical Modeling
Question
I am looking for the correct model available within SPSS. I have measurements of a metabolite (some of which are negative) and need to determine whether it is associated with systolic blood pressure. Then, if there is an association, does it hold up when adjusting for age, sex and BMI? Finally, I need the appropriate model to determine whether the metabolite level is associated with hypertensive cases (yes/no).
Both of the previous two responses are quite useful. The main thing I would add is that the term "hierarchical" refers to the order in which you enter more than one set of independent variables. As David Morse indicates, you begin by entering the control variables, and then conduct a further regression that tests the effect of the key variable(s) above and beyond the control variables.
• asked a question related to Statistical Modeling
Question
We usually write the support of theta for the wrapped exponential density as
[0, 2pi)
Is there any problem if I change the limit to (0, 2pi]?
I ask because the important characteristic that differentiates circular data from data measured on a linear scale is its wrap-around nature, with no maximum or minimum.
Yes, there is: "[" means 0 is in the interval, while "(" means 0 is not. Best, David Booth
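For reference, the wrapped exponential density with rate λ > 0, as usually stated in circular-statistics texts, is

```latex
f(\theta) \;=\; \frac{\lambda\, e^{-\lambda \theta}}{1 - e^{-2\pi\lambda}},
\qquad 0 \le \theta < 2\pi,\; \lambda > 0,
```

which integrates to 1 over [0, 2pi). Shifting the support to (0, 2pi] changes the density only at a single point (a set of measure zero), so probabilities are unaffected, but the conventional statement uses the half-open interval [0, 2pi) so that each angle has exactly one representative.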
• asked a question related to Statistical Modeling
Question
I administered a neuropsychological battery consisting of 6 tests that measure different aspects of executive function. Each test has between 3 and 15 subscores. There is no consensus on whether one test or subscore is better than another. There are a total of 64 subscores, all of which are non-normally distributed.
In this situation, is it appropriate to compare patients and controls using 64 univariate Mann-Whitney U comparisons with Bonferroni correction, or is there a more appropriate statistical model that I can use?
Hello Albert,
Sixty-four comparisons is a lot!
If you have the luxury of doing so (and in the absence of anyone else having explored this), I would recommend factor-analyzing the subscores to determine a more parsimonious set of factors/constructs that would account for the observed relationships among the subscores. Then run statistical comparisons between groups on the factor scores. If the factors are reasonably characterized as inter-related, then a multivariate-type analysis (see below for references to non-parametric multivariate tests) would make sense. If not, then a set of adjusted univariate tests could be conducted.
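If the 64 separate tests are retained, the Holm step-down procedure is uniformly less conservative than plain Bonferroni while still controlling the family-wise error rate. A minimal sketch (the p-values are invented):

```python
# Sketch: Holm step-down multiple-comparison correction.
# Sort p-values, compare the k-th smallest to alpha/(m - k), stop at
# the first failure. Plain Bonferroni would use alpha/m throughout.

def holm_reject(pvals, alpha=0.05):
    """Return a reject/keep flag per hypothesis under Holm's procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject

print(holm_reject([0.001, 0.04, 0.03], alpha=0.05))
```

With m = 64 tests, the smallest p-value is compared to alpha/64 exactly as under Bonferroni, but later comparisons use progressively larger thresholds, recovering some power.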