Regression - Science topic
Explore the latest questions and answers in Regression, and find Regression experts.
Questions related to Regression
I am using an L8 mixed-level orthogonal array: one categorical factor at 4 levels and four other factors at 2 levels. How do I develop a regression equation from this design?
Hello, I have used machine learning regression algorithms, including Random Forest, Decision Tree, and K-Nearest Neighbors (KNN), to estimate soil moisture parameters. What other suitable and practical algorithms do you recommend for evaluating and estimating soil moisture?
Thank you.
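Gradient boosting and kernel-based support vector regression are two commonly used additions to that list. A minimal scikit-learn sketch on synthetic data; the features below are stand-ins for whatever soil-moisture predictors you actually have:

```python
# Compare two additional regressors by cross-validated R^2 on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))      # e.g. reflectance bands, temperature, etc.
y = X[:, 0] * 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

models = {
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "SVR (RBF kernel)": SVR(kernel="rbf", C=10.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```

Comparing models by cross-validated error (rather than training fit) is the main point; XGBoost and Gaussian process regression are also popular for soil-moisture work.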
I tried to plot a B-H curve, but I am getting a negative intercept.
In three randomly selected primary care centers, we conducted a pilot intervention. We also have three control centers. The results are available for one month before the intervention, the first month of the intervention, and the second month of the intervention. However, I do not have individual-level data; I only have aggregated numbers. For example, in the first center, X individuals tested positive and Y tested negative in the month before the intervention. Also, all individuals were different, since each healthcare center included everyone who was screened during the study period.
My main question is: Which statistical test should I use? Can I use regressions, and specifically, which regression should I use?
Thank you
Hello everyone,
I have a data set with a panel structure (panel data): 78 individuals observed over 5 three-year periods, with 10 dependent variables and 1 independent variable. I applied logarithmic transformations to all of them due to differences in scale. I have found that the true model is the fixed-effects (FE) model and have tested for homoskedasticity and serial correlation; both tests indicate problems: heteroskedasticity and serial correlation.
However, I have read that serial correlation is not really an issue in panels with fewer than 20 time observations, so I would like to know whether it is safe to ignore it.
This leaves me with the problem of dealing with heteroskedasticity. I'm using R and cannot use Stata. So far I've pretty much used this presentation to run the regression: http://www.princeton.edu/~otorres/Panel101R.pdf
The image below shows the explanation of the covariance matrices:
So, stated all the above, I would like to know which covariance matrix is the best option. Should I only treat heteroskedasticity or serial correlation as well?
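For reference, the covariance estimator most often recommended for FE panels with both heteroskedasticity and serial correlation is one clustered by individual (Arellano-type); in R that is plm::vcovHC(model, method = "arellano", cluster = "group"). The same idea, sketched on simulated panel data with Python's statsmodels (dimensions chosen to match the 78 x 5 panel):

```python
# FE via entity dummies; SEs clustered by individual are robust to both
# heteroskedasticity and within-individual serial correlation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_id, n_t = 78, 5
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_id), n_t),
    "x": rng.normal(size=n_id * n_t),
})
df["y"] = 0.5 * df["x"] + rng.normal(size=len(df))   # true slope 0.5

fe = smf.ols("y ~ x + C(id)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["id"]})
print(fe.params["x"], fe.bse["x"])
```

Clustering handles both problems at once, so you do not have to decide whether the serial correlation is "ignorable".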

I learned that ANOVA and regression are both general linear models, but they have some differences. Shouldn't there likewise be a difference between MANOVA and multivariate regression (multiple dependent variables)? When I look on the internet, there are only instructions on how to conduct MANOVA in SPSS. Is that because MANOVA and multivariate regression are the same, or does SPSS just not provide multivariate regression?
Which software is best for conducting multivariate regression?
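On the estimation side, multivariate regression is just OLS with a matrix of outcomes, so any numerical environment can do it: each column of the coefficient matrix equals the separate single-DV regression, and MANOVA adds joint tests across the DVs on top of those identical estimates. A minimal numpy sketch with fake data:

```python
# Multivariate (multiple-DV) OLS: fit all DVs at once with a coefficient matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                           # 3 predictors
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y = X @ B_true + rng.normal(scale=0.1, size=(100, 2))   # 2 dependent variables

X1 = np.column_stack([np.ones(100), X])                 # add intercept column
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)          # one column per DV
print(B_hat.round(2))                                   # first row: intercepts
```

In R the same model is lm(cbind(y1, y2) ~ x1 + x2 + x3); in SPSS, GLM > Multivariate covers it.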
Hello to all dear friends
I have two questions:
1- Why use regression to estimate heritability between parents and offspring?
2- Why use correlation to estimate heritability between full and half-siblings?
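As a toy illustration of the parent-offspring case: with midparent values, the regression slope estimates h^2 directly (with a single parent it estimates h^2/2, so h^2 is twice the slope). A stylized simulation, assuming standardized phenotypes with a shared additive component:

```python
# Parent-offspring regression: slope = cov(midparent, offspring) / var(midparent),
# which under this stylized model equals the narrow-sense heritability h^2.
import numpy as np

rng = np.random.default_rng(0)
n, h2_true = 500, 0.6
additive = rng.normal(scale=np.sqrt(h2_true), size=n)     # shared additive values
midparent = additive + rng.normal(scale=np.sqrt(1 - h2_true), size=n)
offspring = additive + rng.normal(scale=np.sqrt(1 - h2_true), size=n)

slope = np.polyfit(midparent, offspring, 1)[0]
print(round(slope, 2))   # estimate of h^2 (true value 0.6)
```

Regression is used for parent-offspring because the relationship is asymmetric (parent predicts offspring); correlation is used for sibs because the pair members are exchangeable.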
In Brewer, K.R.W.(2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, Ken Brewer proved not only that heteroscedasticity is the norm for business populations when using regression, but he also showed the range of values possible for the coefficient of heteroscedasticity. I discussed this in "Essential Heteroscedasticity," https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, and further developed an explanation for the upper bound.
Then, in an article in the Pakistan Journal of Statistics (PJS), "When Would Heteroscedasticity in Regression Occur," https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR, I discussed why this might sometimes not seem to be the case, but argued that homoscedastic regression is artificial, as can be seen from my abstract for that article. That article was cited by other authors in another article, an extract of which was sent to me by ResearchGate, and it seemed to me to say, incorrectly, that I supported OLS regression. However, the abstract for that paper is available on ResearchGate, and it makes clear that the authors are pointing out problems with OLS regression.
Notice, from "Essential Heteroscedasticity" linked above, that a larger predicted value used as a size measure (simply x will do for a ratio model, since bx gives the same relative sizes) means a larger sigma for the residuals; hence the term "essential heteroscedasticity." This is important for finite population sampling.
So, weighted least squares (WLS) regression should generally be the case, not OLS regression. Thus OLS regression really is not "ordinary." The abstract for my PJS article supports this. (Generalized least squares (GLS) regression may even be needed, especially for time series applications.)
Hello dear academics/researchers
I want to use a test for robustness after conducting a KRLS test. While looking for alternative tests, I found lpoly and localp (an advanced version of lpoly). Local polynomial regression is nice for visualizing graphs, but I'm not sure it would be a valid test after conducting KRLS. I couldn't find any paper to help me, which is why I prefer to ask you, dear academics/researchers. Thank you in advance.
Hi
I have a paper where the reviewer suggested the Benjamini-Hochberg Correction.
I have the following hypotheses/tests:
Mean differences across three groups: 5 DVs
Correlations of an ADOS score with various fluency scores: 5 correlations
Mean differences for two groups: 1 DV
Regressions with moderating variables: 6 regressions
I found the original (1995) paper and it seems that instead of using all tests across the whole study, they are grouped into families. My questions are:
1. Do I use the whole hypothesis total or do I do it by hypothesis? That is, is my n-tests=17 or 5, 5, 1, and 6 and I do the correction 4 times?
2. When I am doing mean differences across three groups, and especially for the regressions with all the moderators, am I counting the hypotheses correctly? In particular for the regressions, each beta weight is being tested along with the interactions. With the covariate and moderators I have 6 significance tests under the 1 regression analysis for 5 regressions, and 4 significance tests under 1 regression for the last one. Do I count the regression analysis as 6 (original) + 30 (beta weights for each regression with 6 beta weights) + 4 (beta weights for the regression with 4 beta weights) = 40? Relatedly, do I count the post-hocs in the ANOVAs or the covariates in the ANCOVAs?
3. Also, if p values are identical (e.g., <.000 in SPSS) they get the same rank?
4. Preliminary analyses are excluded, yes? I checked to see if groups were equivalent on age, IQ, etc. I suppose I could be fancy and do a BHC for that too but....the point is they are considered separately and not part of the hypothesis being tested, correct?
Thank you
Amy C
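For reference, the BH step-up procedure itself is short enough to state in code; it is applied to one family of p-values at a time, and tied p-values necessarily share a decision (they sit at adjacent ranks, so either both fall at or below the step-up cutoff or neither does). A sketch:

```python
# Benjamini-Hochberg (1995) step-up procedure for one family of p-values.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array: True where the null is rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)      # q*i/m for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])            # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True               # reject all p-values up to rank k
    return reject

# e.g. one family of 6 p-values (say, the betas from one regression):
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.6]))
```

Note the step-up logic: a p-value can be rejected even if it is above its own threshold, as long as some larger-ranked p-value is below its threshold, which is why the procedure must be run on the whole family at once.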
I have applied regression and correlation analysis, but the analysis gives a negative result. I also analysed the data on the Likert scale, and the result is still negative.
I am using multiple regression, correlation analysis, and simple regression to compare variables for my dissertation. Any insights or resources?
Logistic regression can handle small datasets by using shrinkage methods such as penalized maximum likelihood or Lasso. These techniques reduce regression coefficients, improving model stability and preventing overfitting, which is common in small sample sizes (Steyerberg et al., 2000).
Steyerberg, E., Eijkemans, M., Harrell, F. and Habbema, J. (2000) ‘Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets’, Statistics in medicine, 19(8), pp. 1059-1079.
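A minimal sketch of the idea with scikit-learn's L1-penalized (lasso) logistic regression on simulated small-sample data; C is the inverse penalty strength, so smaller C shrinks coefficients harder:

```python
# Penalized logistic regression for a small sample with several candidate predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 60, 8                                    # small n, several candidate predictors
X = rng.normal(size=(n, p))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]           # only two real effects
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print(model.coef_.round(2))                     # many coefficients shrink to exactly 0
```

In practice C (or the ridge/Firth penalty, if you prefer those shrinkage variants) is chosen by cross-validation, e.g. with LogisticRegressionCV.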
There is a research model with one IV, one DV and one mediator. So, when checking the parametric assumptions in SPSS, what will be done to the mediator?
Is the following answer correct?
Path 1: IV to Mediator
For the path where the IV predicts the mediator, the mediator is treated as the outcome.
- Run a simple regression with the IV as the predictor and the mediator as the dependent variable. Check parametric assumptions for this regression:
- Linearity
- Normality of Residuals
- Homoscedasticity
- Independence of Errors
Path 2: Mediator to DV (and IV to DV)
In this step, the mediator serves as a predictor of the DV, alongside the IV.
- Run a regression with the IV and the mediator as predictors and the DV as the outcome. Check assumptions for this second regression model:
- Linearity
- Normality of Residuals
- Homoscedasticity
- Independence of Errors
- Multicollinearity
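The listed checks can be scripted; a sketch for Path 1 using statsmodels and scipy on simulated data (substitute your own IV and mediator columns for the hypothetical ones below):

```python
# Residual diagnostics for the Path 1 regression (IV -> Mediator) on fake data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({"iv": rng.normal(size=150)})
df["mediator"] = 0.6 * df["iv"] + rng.normal(size=150)

fit = smf.ols("mediator ~ iv", data=df).fit()   # Path 1: IV predicts the mediator
resid = fit.resid

print("Normality (Shapiro-Wilk p):", stats.shapiro(resid).pvalue)
print("Homoscedasticity (Breusch-Pagan p):", het_breuschpagan(resid, fit.model.exog)[1])
print("Independence (Durbin-Watson, ~2 is good):", durbin_watson(resid))
```

For Path 2, fit the two-predictor model and add a multicollinearity check (e.g. variance inflation factors), mirroring the list above.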
I came across this article in Nature Human Behaviour. I did not quite understand the analysis reported here. What do the two decimal points mean?
Heiserman, N., Simpson, B. Discrimination reduces work effort of those who are disadvantaged and those who are advantaged by it. Nat Hum Behav 7, 1890–1898 (2023). https://doi.org/10.1038/s41562-023-01703-9

Hi everyone, I intended to use the AMOS 24 software to run a regression design with a moderator variable, so I installed it. The installation went fine, but there is a problem when I select the Data Files option: the data from SPSS 27 should load into AMOS, but the variables are not listed. The file itself is visible to me, and when I select the View Data option, SPSS opens correctly. What is the problem?
To my knowledge, the total effect in mediation reflects the overall impact of X on Y, including the magnitude of the mediator (M) effects. A mediator is assumed to account for part or all of this impact. In mediation analysis, statistical software typically calculates the total effect as:
Total effect = Direct effect + Indirect effect.
When all the effects are positive (i.e., the direct effect of X on Y (c’), the effect of X on M (a), and the effect of M on Y (b)), the interpretation of the total effect is straightforward. However, when the effects have mixed or negative signs, interpreting the total effect can become confusing.
For instance, consider the following model:
X: Chronic Stress, M: Sleep Quality, Y: Depression Symptoms.
Theoretically, all paths (a, b, c’) are expected to be negative. In this case, the indirect effect (a*b) should be positive. Now, assume the indirect effect is 0.150, and the direct effect is -0.150. The total effect would then be zero. This implies the overall impact of chronic stress on depression symptoms is null, which seems illogical given the theoretical assumptions.
Let’s take another example with mixed signs:
X: Social Support, M: Self-Esteem, Y: Anxiety.
Here, the paths for a and c’ are theoretically positive, while b is negative. The indirect effect (a*b) should also be negative. If the indirect effect is -0.150 and the direct effect is 0.150, the total effect would again be zero, suggesting no overall impact of social support on anxiety.
This leads to several key questions:
1. Does a negative indirect effect indicate a reduction in the impact of X on Y, or does it merely represent the direction of the association (e.g., social support first improves self-esteem, which in turn reduces anxiety)? If the second case holds, should we consider the absolute value of the indirect effect when calculating the total effect? After all, regardless of the sign, the mediator still helps to explain the mechanism by which X affects Y.
2. If the indirect effect reflects a reduction or increase (based on the coefficient sign) in the impact of X on Y, and this change is explained by the mediator, then the indirect effect should be added to the direct effect regardless of its sign to accurately represent the overall impact of both X and M.
3. My main question is: Should I use the absolute values of all coefficients when calculating the total effect?
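On question 3: the conventional decomposition uses the signed sum, and replacing terms with absolute values would answer a different question (the total "strength" of the pathways rather than the net effect on Y). A total effect near zero with opposing significant paths is the textbook "inconsistent mediation" pattern, not an error. The arithmetic of the first example:

```python
# Signed decomposition for the chronic-stress example; all numbers are made up.
a, b = -0.5, -0.3        # stress -> sleep quality, sleep quality -> depression
indirect = a * b         # +0.15: two negative paths give a positive indirect effect
c_prime = -0.15          # hypothetical direct effect, opposite in sign
total = c_prime + indirect
print(indirect, total)   # total is ~0: the opposing paths genuinely cancel
```

A null total effect here is a substantive finding (the mechanisms offset), so the signed total should be reported alongside the separate direct and indirect effects rather than replaced by a sum of absolute values.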
There are many works and programs in which the closeness of measured and calculated response variables assesses regression quality. Is this enough?
Hi all,
I was hoping I could get some support with a difficulty in AMOS. Please note that I am a novice with the software. I have a good model fit, which I now wish to test with the common latent factor (CLF) approach. I am using AMOS version 27. I run the model without the CLF and copy the standardized regression weights into Excel. Then I create the CLF, constrain its paths to a common label (a), and fix the CLF regression weight to 1. When I run the model, the standardized regression weights are identical to those from the model without the CLF. I think this cannot be correct. I am adding a photo of the model in question.
Any ideas or help would be much appreciated!

Hello all,
I want to assess the effect of one categorical independent variable, along with a few confounders, on one continuous outcome variable. I fitted a multiple regression, and the adjusted R-squared is negative, about -0.002.
I added a few interactions and it only moved from -0.002 to -0.02, so I believe linear regression is not a good model here.
The outcome values are not all positive, so I cannot fit a gamma regression, nor are they between zero and one, so I cannot fit a beta regression. I am not sure what to do in this case.
There is also no multicollinearity.
Sample size = 200
One independent variable
Two confounders
Any input appreciated.
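For context, adjusted R-squared dips below zero whenever the raw R-squared is smaller than the degrees-of-freedom penalty, so a value of -0.002 mainly says the predictors explain essentially nothing about this outcome; it is not evidence that the linear model class itself is wrong, and adding interactions makes it worse by increasing the penalty. A quick check with the usual formula:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1); illustrative numbers only.
def adjusted_r2(r2, n, p):
    """n observations, p predictors (count dummies and interactions)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With n = 200 and a handful of predictors, a tiny raw R^2 turns negative:
print(adjusted_r2(0.018, 200, 5))   # about -0.007
```

So the more useful next step is usually to reconsider the predictors (is the categorical IV really related to the outcome?) rather than to switch to gamma or beta regression, which address the error distribution, not a lack of signal.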
Hi everyone,
I am trying to identify relevant predictors of an outcome in multi-level data. I am wondering what the best approach is here - how do you go about the selection process? By significance in the overall model, recalculate, select again, recalculate again etc.? Using a normal regression stepwise, forwards or backwards procedure? LASSO (do I need to model the multi-level structure if ICC is low)?
Thank you so much!
I have monthly data of DJIA and S&P 500 (BSE) from 2003-2024 and I wish to test if there are any structural breaks in the data and to identify the same.
You are probably collecting data from tools such as interview guides where every respondent is giving different answers/opinions
I am trying to analyse data from a survey examining which variables affect teachers' perceived barriers to incorporating technology into their classrooms. I have 5 predictor variables; however, my DV (perceived barriers) is nominal, and the answers are not mutually exclusive, as participants were allowed to select all options that applied to them. I was hoping to do a multinomial regression, but as the DV is not mutually exclusive, this is not possible. Is there a similar analysis that allows for a non-mutually-exclusive DV?
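One common workaround is "binary relevance": treat each barrier option as its own 0/1 outcome and fit one logistic regression per option. A scikit-learn sketch on made-up data:

```python
# One logistic regression per (non-exclusive) barrier option.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                   # the 5 predictor variables
# Y: one column per barrier, 1 if that teacher selected it (columns can overlap)
Y = (rng.uniform(size=(300, 4)) < 0.3).astype(int)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
for j, est in enumerate(clf.estimators_):
    print(f"barrier {j}: coefficients {est.coef_.round(2)}")
```

The same setup works with ordinary per-outcome logistic regressions in SPSS; multivariate probit models are the alternative if the correlations between barrier choices are themselves of interest.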
I would like to utilise the correct regression equation for conducting Objective Optimisations using MATLAB's Optimisation Tool.
When using Design-Expert, I'm presented with the regression equation in either Actual Factors or Coded Factors form. However, with the Actual Factors, I'm presented with multiple regression equations, since one of my input factors was categoric; its levels were Linear, Triangular, Hexagonal, and Gyroid. As a result, I'm unsure which regression equation from the Actual Factors output to use.
Otherwise, should I utilise the single regression equation which incorporates all of them? I feel like I'm answering my own question and I really should be using the Coded Factors for the regression equation, but I would like some confirmation.
I used one of the regression equations under "Actual Factors" where Linear is seen, but I fear that this did not incorporate all of the information from the experiment. So any advice would be most appreciated.
Most appreciated!


Why is direct inversion of the multivariate regression equation not preferred, and why are optimization techniques used instead?
I'm confused about having a very large R-squared, over 0.9, in my regression. Does it mean overfitting, multicollinearity, or spurious regression? How can I address it?
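A first diagnostic is to compare in-sample with out-of-sample fit: overfitting shows up as a large gap, whereas a spurious time-series regression shows up in strongly autocorrelated residuals (and multicollinearity in the variance inflation factors, not in R-squared itself). A sketch of the train/test comparison, where many irrelevant predictors inflate the in-sample R-squared:

```python
# High in-sample R^2 from noise alone: many predictors, unrelated outcome.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))    # many predictors inflate in-sample R^2...
y = rng.normal(size=100)          # ...even when nothing is real

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("train R^2:", model.score(X_tr, y_tr))   # high
print("test  R^2:", model.score(X_te, y_te))   # near or below zero
```

If the out-of-sample R-squared holds up, a value above 0.9 can simply be legitimate (common with trending aggregates); the spurious-regression worry applies specifically to nonstationary time series and is checked with unit-root and cointegration tests.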
Can I easily do Regression with indicator variable using SPSS? Or is there a website that calculates it online?
I utilized a Bayesian state space model to analyze the data. The 95% credible interval, which ranges from 0.0 to 1.2, prompts the question: is the regression coefficient significantly different from zero?
I am getting a positive correlation between two variables, but in multiple linear regression the coefficient relating them is negative. Is this normal?
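Yes, this can happen when predictors are correlated (suppression): the marginal correlation mixes in the effect of the other variable, while the regression coefficient is the partial effect holding the other variable fixed. A simulated demonstration:

```python
# Positive marginal correlation, negative partial coefficient, by construction.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)     # x1 and x2 are highly correlated
x2 = z + 0.1 * rng.normal(size=n)
y = 2.0 * x2 - 1.0 * x1 + 0.5 * rng.normal(size=n)

print("corr(x1, y):", np.corrcoef(x1, y)[0, 1])   # positive
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("partial coefficient of x1:", beta[1])      # near -1 (negative)
```

The marginal correlation of x1 with y is positive because x1 proxies for x2, whose effect is large and positive; holding x2 fixed, x1's own effect is negative.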
Hi my peers,
I have a study with panel data from 2009 to 2018 examining a causal relationship. I need to highlight the temporal effect in the regression results. How can I do that?
I tried two methods, but each produced somewhat different results.
The first was creating year dummies with the command tab year, gen(yr_), which produces ten dummy years.
The second method was using i.year, which produces only 9 dummies, omitting the first year, i.e., 2009. The regression results are largely unchanged, but fewer years are significant with the i.year specification.
So, please:
I don't know which of these is correct to follow, or whether there is another way to show the temporal effects in the study findings.
Thank you in advance for your help
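The two commands differ only in the base category: tab year, gen(yr_) creates a dummy for every year (one must then be dropped to avoid collinearity with the constant), while i.year automatically omits the first year as the reference. Which individual year dummies come out "significant" depends on that reference category, even though the overall fit is unchanged. The same contrast in a pandas sketch:

```python
# Full dummy set (tab year, gen) vs reference-omitted set (i.year).
import pandas as pd

df = pd.DataFrame({"year": list(range(2009, 2019)) * 2})  # toy 2009-2018 panel

full = pd.get_dummies(df["year"], prefix="yr")                  # 10 dummies
ref = pd.get_dummies(df["year"], prefix="yr", drop_first=True)  # 9 dummies, 2009 omitted
print(full.shape[1], ref.shape[1])   # 10 9
```

Each dummy coefficient is a contrast against the omitted year, so the i.year specification (or the full set minus one) is the standard way to report temporal effects; a joint test of all year dummies shows whether time matters overall, independent of the base choice.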
I have performed a panel data regression analysis, selected the FEM model, and used GLS to address autocorrelation and heteroscedasticity. However, my R-squared value is so high that it almost reaches 1. Can that be right? My lecturer doubts it.
I did a linear regression of X (independent variable) on M (mediator),
then I used survival regression to fit X to Y (dependent variable).
My questions:
a. How do I correctly do a mediation analysis from X to Y through M with survival regression?
b. If the mediate() function is applicable here, why are the results so weird? I.e., the ACME and ADE are very large and have negative values.
c. If the negative values are fine, how do I explain them? As far as I know, they might be explained as suppression effects.
I'm new to mediation analysis and I'm using mediate() from the mediation package in R. My results are very strange and I'm not sure if they are correct. I haven't found a very detailed mediation analysis with survival regression, so any discussion is very welcome, and I would appreciate any hints!
Here is the code:
# Mediator model: M ~ X + covariates
mediator_formula <- paste0("scale(", mediator_var, ") ~ ", iv_name, " + ",
                           paste(covariates, collapse = " + "))
model_mediator <- lm(as.formula(mediator_formula), data = data_with_residuals)
lm_sum <- summary(model_mediator)
# Outcome model: survival outcome ~ X + M + covariates
model_dv_formula <- paste0("Surv(time, status) ~ ", iv_name, " + scale(", mediator_var, ") + ",
                           paste(covariates, collapse = " + "))
model_dv <- survreg(as.formula(model_dv_formula), data = data_with_residuals)
surv_sum <- summary(model_dv)
# Mediation: the mediator name must match the term in both model formulas exactly.
# Note: the original mixed paste() (which inserts spaces, "scale( x )") with
# paste0() ("scale(x)"), so mediate() could not match the mediator term.
mediator_name <- paste0("scale(", mediator_var, ")")
mediation_results <- mediate(model_mediator, model_dv, treat = iv_name,
                             mediator = mediator_name, sims = 500)

Hello,
I have a question regarding the interpretation of the results from an experiment I conducted. Each participant answered 4 questions measuring motivation, satisfaction, help, and collaboration (my dependent variables) in 7 different scenarios (my independent variables). To analyze my results, I used three methods: a Wilcoxon signed rank test, a regression with standard errors clustered at the individual level - CRSE (to control for individual heterogeneity), and an ordinal regression (using GENLIN ) to account for the ordinal nature of the dependent variable.
The aim of this analysis was to verify if the significant results obtained with the Wilcoxon test were consistent across the other two methods. I conclude that significant results found with the Wilcoxon test, if they are also significant in the other two regressions, are robust.
Conversely, if an effect is significant in the Wilcoxon test and in the regression with clustered standard errors (CRSE), but not in the ordinal regression (GENLIN ordinal), I consider the effect not robust: the result is not consistent across the three tests, which suggests there is some indication of the effect, but a weak one.
I am wondering how to properly interpret this? What does it really mean?
For the majority of my results, they are robust, but I have some scenarios where significant effects on certain dependent variables are no longer significant in the ordinal regression, but are in the Wilcoxon test and the regression with clustered standard errors. I am wondering why this happens and how to explain it.
I am working with SPSS version 27. Could you help me better understand these results and their interpretation?
Thank you in advance for your help.
I'm working on a cancer model in mice. In my free-drug treatment group, tumor growth is slow compared to the control (saline) group. But in the nanoparticle-treated group, the tumor underwent apoptosis and disappeared completely while the study was ongoing. In this situation, how can the study be justified if, in the paper, we submit histology only from the free-drug-treated group and not the nanoparticle-treated group?
Should I re-plan the study for the nanoparticle-treated group alone and remove the tumor just before it disappears from the site of implantation, so that I can at least perform histology on apoptotic tissue?
In what situations would this approach be used? Please advise; I am having difficulty.
Can I run a mixed-effects logistic regression with only a single outcome variable that is a time-varying covariate? There is no confounder in the study.
Or should I have more than one independent variable?
Is there any article on mixed-effects logistic regression with time-varying covariates that you could send me?
I want to write up the statistics and need some references.
Thanks
Hello,
I conducted a questionnaire in which all participants answered four different questions to measure the level of satisfaction, motivation, collaboration, and help in seven different scenarios. The aim was to measure the dependent variables in the first scenario and see how they evolved through the different scenarios. The dependent variables were measured on a 7-point Likert scale (from strongly disagree to strongly agree). All participants answered my questionnaire, so it's a within-subject design.
I've created dummies for the scenarios:
scenarios 1, 2, 3, 4, 5, 6, and 7; and also for the gender variable (female, male) and for employment situation (employed, unemployed, student, retired, other, self-employed).
First, I performed a Wilcoxon test. Then, I performed an ordinal regression. But I don't understand how I'm supposed to interpret the values in the "Estimate" column of the "Parameter Estimates" table.
For example, the Wilcoxon tests showed that scenarios 2 and 3 had a significantly negative impact on satisfaction, whereas scenario 6 had a significantly positive impact. But here I see that for scenarios 2 and 3 the "Estimate" is positive, not negative? I don't understand.
When I run the ordinal regression in SPSS and select the dummies for scenario (except scenario 1, because I'm using it as the basis for comparison), I get this:
The table shows :
Scenario 2 = 0 Estimate: 1.037 p: <.001
Scenario 2 = 1 "This parameter is set to 0 because it is redundant"
I would rather have expected a negative value for scenario 2, as the Wilcoxon test and simple linear regression had shown me.
And when I don't put in the dummies, but just enter the scenario variable itself, I get this:
Scenario = 2 Estimate: -1.118 p: <.001
This seems more logical to me than the other Wilcoxon and regression tests I had done before. But the problem is that the table shows no value for scenario = 7:
scenario = 7 "This parameter is set to 0 because it is redundant"
Sorry, I'm a beginner and I'm learning to use SPSS.
If anyone can help me, that would be very kind.
In Amos, I am explaining a dependent construct with one exogenous construct and get a squared multiple correlation for the dependent of .48. I then add a second exogenous construct as an additional predictor and the squared multiple correlation for the dependent falls to .37. How is that possible? When I do a normal regression, the R2 goes up, as it should.
I have asked two local experts here that could not help. Any help here? Thanks in advance!
How do we evaluate the importance of individual features for a specific property using ML algorithms (say, gradient boosting regression, GBR), and how do we construct an optimal feature set for our problem?
image taken from: 10.1038/s41467-018-05761-w
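A sketch of the usual workflow with scikit-learn: fit a gradient-boosting regressor, read off the impurity-based importances, and cross-check with permutation importance (generally the more reliable of the two) before pruning to a top-k feature set. Data below are simulated, with only two truly relevant features:

```python
# Feature ranking with GBR: impurity importances plus permutation importance.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3 * X[:, 0] + 1 * X[:, 2] + 0.1 * rng.normal(size=300)  # only features 0 and 2 matter

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
print("impurity importances:", gbr.feature_importances_.round(3))

perm = permutation_importance(gbr, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(perm.importances_mean)[::-1]
print("features ranked by permutation importance:", ranking)
```

For the "optimal set" step, the ranking is typically combined with recursive feature elimination or a forward search, each candidate subset being scored by cross-validated error rather than training fit.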
Hello all,
I am running into a problem I have not encountered before with my mediation analyses. I am running a simple mediation X > M > Y in R.
Generally, I concur that the total effect does not have to be significant for there to be a mediation effect, and in the case I am describing this would be a logical occurrence, since the effects of paths a and b are both significant and are respectively -.142 and .140, thus resulting in a 'null effect' for the total effect.
However, my c path (X > Y) is not merely non-significant as I would expect; rather, the regression does not fit (see below):
(Residual standard error: 0.281 on 196 degrees of freedom
Multiple R-squared: 0.005521, Adjusted R-squared: 0.0004468
F-statistic: 1.088 on 1 and 196 DF, p-value: 0.2982).
Usually I would say you cannot interpret models that do not fit, and since this path is part of my model, I hesitate to interpret the mediation at all. However, the other paths do fit and are significant. Could the non-fitting also be a result of the paths cancelling one another?
Note: I am running bootstrapped results for the indirect effects, but the code does utilize the 'total effect' path, which does not fit on its own, therefore I am concerned.
Note 2: I am working with a clinical sample, so the sample size is not as large as I'd like: group 1: 119; group 2: 79 (N = 198).
Please let me know if additional information is needed and thank you in advance!
I am evaluating the impacts of major health financing policy changes in Georgia (the country). The database is household-level, and it is not panel data. The continuous outcome variable is out-of-pocket health spending (OOPs), and it exhibits a skewed distribution as well as seasonality. The residuals are positively autocorrelated. The regression also takes independent variables connected with household characteristics into account. My goal is to evaluate the impact of health policies on the financial wellbeing of the population in connection with health care utilization determinants. Should I aggregate the dataset or keep it as it is?
Here is the case: as I said, I am working on how macroeconomic variables affect REIT index returns. To understand how macroeconomic variables affect REITs, which tests or estimation methods should I use?
I know I can use OLS, but is there any other method to use? All my time series are stationary at I(0).
In the domain of clinical research, where the stakes are as high as the complexities of the data, a new statistical aid emerges: bayer: https://github.com/cccnrc/bayer
This R package is not just an advancement in analytics - it’s a revolution in how researchers can approach data, infer significance, and derive conclusions
What Makes `Bayer` Stand Out?
At its heart, bayer is about making Bayesian analysis robust yet accessible. Born from the powerful synergy with the wonderful brms::brm() function, it simplifies the complex, making the potent Bayesian methods a tool for every researcher’s arsenal.
Streamlined Workflow
bayer offers a seamless experience, from model specification to result interpretation, ensuring that researchers can focus on the science, not the syntax.
Rich Visual Insights
Understanding the impact of variables is no longer a trudge through tables. bayer brings you rich visualizations, like the one above, providing a clear and intuitive understanding of posterior distributions and trace plots.
Small Samples, Big Insights
Clinical trials, especially in rare diseases, often grapple with small sample sizes. `Bayer` rises to the challenge, effectively leveraging prior knowledge to bring out the significance that other methods miss.
Prior Knowledge as a Pillar
Every study builds on the shoulders of giants. `Bayer` respects this, allowing the integration of existing expertise and findings to refine models and enhance the precision of predictions.
From Zero to Bayesian Hero
The bayer package ensures that installation and application are as straightforward as possible. With just a few lines of R code, you’re on your way from data to decision:
# Installation
devtools::install_github("cccnrc/bayer")

# Example usage: Bayesian logistic regression
library(bayer)
model_logistic <- bayer_logistic(
  data = mtcars,
  outcome = 'am',
  covariates = c('mpg', 'cyl', 'vs', 'carb')
)
You then have plenty of functions to further analyze your model; take a look at bayer.
Analytics with An Edge
bayer isn’t just a tool; it’s your research partner. It opens the door to advanced analyses like IPTW, ensuring that the effects you measure are the effects that matter. With bayer, your insights are no longer just a hypothesis — they’re a narrative grounded in data and powered by Bayesian precision.
Join the Brigade
bayer is open-source and community-driven. Whether you’re contributing code, documentation, or discussions, your insights are invaluable. Together, we can push the boundaries of what’s possible in clinical research.
Try bayer Now
Embark on your journey to clearer, more accurate Bayesian analysis. Install `bayer`, explore its capabilities, and join a growing community dedicated to the advancement of clinical research.
bayer is more than a package — it’s a promise that every researcher can harness the full potential of their data.
Explore bayer today and transform your data into decisions that drive the future of clinical research: bayer - https://github.com/cccnrc/bayer

I have three variables (A, B, C) and do a multilevel SEM with R - Lavaan.
I do not understand why the following two models render different regression coefficients:
In the first one I use the already-aggregated latent variables from the data sheet directly; in the second one I define them within the model, but the underlying data are of course the same.
Could anybody please explain why that is and which model would be the right one to use?
1.) "
level: 1
A ~ B + C
level: 2
A ~ B + C
"
2.)"
level: 1
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
C =~ c1 + c2 + c3
A ~ B + C
level: 2
A =~ a1 + a2 + a3
B =~ b1 + b2 + b3 + b4
C =~ c1 + c2 + c3
A ~ B + C
"
thanks so much for any help!
Dear All,
I have imagery with a single fish within each image, along with a list of morphometric measurements of the fish (length, width, tail length, etc.). I would like to train a CNN model that predicts these measurements with only the images as input. Any ideas what kind of architecture is ideal for this task? I have read about multi-output learning, but I haven't found a practical implementation in Python.
Thank you for your time.
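A common architecture for this is a convolutional backbone with a single linear head producing one output per measurement, trained with a regression loss. A minimal sketch, assuming PyTorch, 3-channel input images, and 5 measurements; the layer sizes are arbitrary placeholders:

```python
# Multi-output CNN regression: shared conv backbone, one linear head for all targets.
import torch
import torch.nn as nn

class FishRegressor(nn.Module):
    def __init__(self, n_measurements: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 32) regardless of input size
        )
        self.head = nn.Linear(32, n_measurements)    # one output per measurement

    def forward(self, x):
        return self.head(self.backbone(x))

model = FishRegressor(n_measurements=5)
images = torch.randn(4, 3, 128, 128)                 # a batch of 4 fake RGB images
preds = model(images)
print(preds.shape)                                   # (4, 5): 5 measurements per image
loss = nn.MSELoss()(preds, torch.randn(4, 5))        # standard multi-output regression loss
```

In practice the homemade backbone is usually replaced by a pretrained network (e.g. a torchvision ResNet) with its classification layer swapped for the n-measurement linear head; standardizing the targets helps when the measurements have different scales.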
I have collected data at the community level using cluster sampling. The ICC shows >10% variability at the cluster level. However, I don't have relevant variables at the cluster level (all variables are at the household and individual levels).
Can I run a multilevel regression without having cluster-level variables?
Thanks!
My topic is a study of the energy status of construction materials, so I need all their relevant parameters: calculation, comparison, regression, correlation, etc.
If you have any ideas on this, please share.
Thank you all.
Hello everyone and thank you for reading my question.
I have a data set with around 2,000 data points. It has 5 inputs (4 well rates and, as the 5th, time) and 2 outputs (cumulative oil and cumulative water). See the attached image.
I want to build a proxy model to simulate the cumulative oil and water.
I have built 5 models (ANN, Extreme Gradient Boosting, Gradient Boosting, Random Forest, SVM) and used GridSearch to tune the hyperparameters, and the training results are good. Of course, I split the data into training, test, and validation sets.
I also have additional data that I did not include in any of the training, test, or validation sets, and when I use the models to predict the outputs for this data set, the results are bad (the models fail to predict).
I think the problem lies in the data itself, because the only input parameter that changes is time (days) while the others remain constant.
The problem is that I can't remove the well rates or merge them into a single variable, because once the proxy model is built I want to optimize the well rates to maximize cumulative oil and minimize cumulative water.
Is there a solution to this kind of issue?

Hi,
I am trying to evaluate the impact of gender quotas on women's political engagement. I am using World Values Survey data on different countries over the period 1981-2009. I wish to run a country and time fixed-effects regression of gender quotas on a variable while controlling for age. However, age in the survey is divided into categories. How can I recode it for my regression? Should I use binning to control for age, or should I use the mean values of the categories?
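If you take the midpoint route, recoding is just a mapping from each age band to its midpoint. A minimal sketch (the bands below are hypothetical examples, not the actual WVS categories):

```python
# Hypothetical age bands -> midpoints (replace with the survey's actual bands)
midpoints = {
    "15-24": 19.5,
    "25-34": 29.5,
    "35-44": 39.5,
    "45-54": 49.5,
    "55-64": 59.5,
}

ages = ["25-34", "45-54", "15-24"]        # example responses
age_numeric = [midpoints[a] for a in ages]
print(age_numeric)                        # [29.5, 49.5, 19.5]
```

One caveat: an open-ended top band (e.g. "65+") has no natural midpoint, which is a common reason analysts prefer keeping the categories as dummies (binning) instead.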
It is known that we can use regression analysis to control for confounding variables affecting the main outcome. But what if the entire sample shares a confounding variable affecting the main outcome — is regression analysis still applicable and reliable?
For example, suppose a study investigates the role of a certain intervention in cognitive impairment, and the entire included population is old-aged (more than 60 years old). Age is then a covariate across the whole sample, and it is well known that age is a significant independent risk factor for cognitive impairment.
My question is: will the regression here be of real value? Will it fully remove the effect of age and give us the clear effect of the intervention on cognitive impairment?
In using spectral indices to estimate corn yield, why is it that when I plug the farm-level average of the index into the equation generated from the regression, the predicted yield is close to the actual yield even though the coefficient of determination is weak?
# spectralindices
#predictedyield
#RS
I am doing land-use projection using the Dyna-CLUE model, but I am stuck with the error "Regression can not be calculated due to a large value in cell 0,478". I would appreciate any advice you can provide to solve this error.
I am conducting a meta-analysis and I want to use nonlinear polynomial regression and spline functions to model the dose-response relationship between the parameters of interest.
I would appreciate any help or suggestions.
Thank you very much.
My question looks at the influence of simulation on student attitudes. My professor would like me to do regression analysis, but he says to do two regressions. I have my pre-test and post-test data; the only other information I have is each student's college. My class materials seem to indicate that I can run a regression in SPSS with the post-test as my dependent variable and the pre-test as my independent variable. How would I do the other regression? Should I work in the colleges as another variable, and if so, do I enter them as a group or do I need to create a variable for each college?
I regress Y on X: direct effect (c).
M is the mediator: I regress M on X (a), and Y on M (b).
Total effect = c + a*b.
Now I introduce a moderator of the relationship between X and Y.
How do I calculate the total effect with both the moderator and the mediator present?
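One common answer, sketched under the assumption that the moderator W enters only the X→Y path (Y = c0 + c1·X + c2·W + c3·X·W + b·M): the direct effect becomes conditional, c(W) = c1 + c3·W, so the total effect is also conditional on W, TE(W) = c1 + c3·W + a·b. A minimal numeric illustration with made-up coefficients:

```python
def total_effect(w, c1, c3, a, b):
    """Conditional total effect of X on Y at moderator value w,
    assuming the moderator only moderates the direct X -> Y path."""
    direct = c1 + c3 * w   # conditional direct effect c(W)
    indirect = a * b       # unmoderated indirect effect via M
    return direct + indirect

# Made-up coefficients for illustration
c1, c3, a, b = 0.30, 0.15, 0.40, 0.50
te_at_0 = total_effect(0.0, c1, c3, a, b)  # 0.30 + 0.20 = 0.50
te_at_1 = total_effect(1.0, c1, c3, a, b)  # 0.45 + 0.20 = 0.65
```

If the moderator also enters the X→M or M→Y path, the indirect effect becomes conditional too (e.g. a(W)·b), so the total effect must be reported at chosen values of W rather than as a single number.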
I have daily sales data and stock availability for items in a supermarket chain. My goal is to estimate the sales quantity elasticity with respect to availability (represented as a percentage). With this model, I want to understand how a 1% change in availability impacts sales. Currently, single-variable regressions yield low R-squared values. Should I include lagged sales values in the regression to account for other endogenous factors influencing sales? This would isolate availability as the primary exogenous variable.
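A log-log specification with a lagged dependent variable is one standard way to do this: the coefficient on log availability is then the short-run elasticity. A minimal numpy sketch on synthetic data (the variable names and the true elasticity of 0.8 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
avail = rng.uniform(0.5, 1.0, n)   # availability share per day
log_sales = np.empty(n)
log_sales[0] = 0.0
for t in range(1, n):              # sales depend on their own lag + availability
    log_sales[t] = (0.5 * log_sales[t - 1]
                    + 0.8 * np.log(avail[t])
                    + 0.05 * rng.standard_normal())

# Regress log(sales_t) on log(sales_{t-1}) and log(availability_t)
X = np.column_stack([np.ones(n - 1), log_sales[:-1], np.log(avail[1:])])
beta, *_ = np.linalg.lstsq(X, log_sales[1:], rcond=None)
elasticity = beta[2]   # short-run availability elasticity, ~0.8 here
```

One caveat worth checking: with a lagged dependent variable, serially correlated errors bias OLS, so it is worth testing the residuals for autocorrelation before interpreting the elasticity.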
I am doing a study focusing on analyzing differences in fish assemblages due to temperature extremes. I calculated Shannon diversity, evenness, richness, and total abundance for each year sampled. The years are grouped into 2 temperature periods essentially as well, which is what I want to overall compare.
On viewing the results, there appears to be consistency across years and when comparing the two groupings. I have multivariate tests to follow this up for community composition, but when describing the univariate results, is there any statistical test I can follow up with to better show that there is no difference, rather than simply describing the numbers and their mean differences?
Dear all,
I am sharing the model below that illustrates the connection between attitudes, intentions, and behavior, moderated by prior knowledge and personal impact perceptions. I am seeking your input on the preferred testing approach, as I've come across information suggesting one may be more favorable than the other in specific scenarios.
Version 1 - Step-by-Step Testing
Step 1: Test the relationship between attitudes and intentions, moderated by prior knowledge and personal impact perceptions.
Step 2: Test the relationship between intentions and behavior, moderated by prior knowledge and personal impact perceptions.
Step 3: Examine the regression between intentions and behavior.
Version 2 - Structural Equation Modeling (SEM)
Conduct SEM with all variables considered together.
I appreciate your insights on which version might be more suitable and under what circumstances. Your help is invaluable!
Regards,
Ilia

Hello everyone, for my dissertation I have two predictor variables and one criterion variable. One of the predictor variables has 5 domains and no global score. In that case, can I use multiple regression, or do I have to perform stepwise linear regression separately for the 6 predictors (the 5 domains plus the other predictor), keeping in mind the assumption of multicollinearity?
Hi, I'm currently writing my psychology dissertation, investigating "how child-oriented perfectionism relates to behavioural intentions and attitudes towards children in a chaotic versus calm virtual reality environment".
I therefore have 3 predictor/independent variables: calm environment, chaotic environment, and child-oriented perfectionism.
My outcome/dependent variables are: behavioural intentions and attitudes towards children.
My hypotheses are:
- participants will have more negative behavioural intentions and attitudes towards children in the chaotic environment than in the calm environment.
- these differences will be magnified in participants high in child-oriented perfectionism compared to participants low in child-oriented perfectionism.
I used a questionnaire measuring child-oriented perfectionism, which yields a score. Participants watched the calm environment video and then answered the behavioural intentions and attitudes questionnaires in relation to the children shown in that video; they then watched the chaotic environment video and answered the same questionnaires in relation to the children in that video.
I am unsure whether to use multiple linear regression or a repeated-measures ANOVA with a continuous moderator (child-oriented perfectionism) to answer my research question and hypotheses. Please can someone help!
How can I interpret the two examples below from a mediation analysis? Please help me.
1) with negative indirect and total effect, positive direct effect
Healthy pattern (X)
Sodium Consumption (M)
Gastric Cancer (Y)
Total Effect: Negative (-0.29)
Indirect Effect: Negative (-0.44)
Direct Effect: Positive (0.14)
Mediation percentage: 100%
2) With total and direct negative effect, positive indirect effect
Healthy pattern (x)
Sugar consumption (m)
Gastric Cancer (Y)
Total Effect: Negative (-0.42)
Indirect Effect: Positive (0.03)
Direct Effect: Negative (-0.29)
Mediation percentage: 10.3%
I ran an OLS regression on panel data in EViews, and then 2SLS and GMM regressions.
I introduced all the independent variables of the OLS as instrumental variables.
I am getting exactly the same results under the three methods.
Is there any mistake in how I am running the models?
I am also attaching the results.
thanks in advance
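This outcome is expected rather than a mistake: when the instrument list is identical to the regressor list, the first-stage projection of X onto Z returns X itself, so 2SLS (and the corresponding exactly identified GMM estimator) collapses to OLS. A quick numpy check of the 2SLS algebra on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_normal(n)

# Ordinary least squares
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# 2SLS with "instruments" identical to the regressors
Z = X
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto the instrument space
beta_2sls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

identical = np.allclose(beta_ols, beta_2sls)   # True: 2SLS reduces to OLS
```

For 2SLS/GMM to differ from OLS, at least one instrument must be excluded from the structural equation (valid and correlated with the endogenous regressor); including only the regressors themselves gives no identifying variation beyond OLS.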
In his 1992 paper (Psychological Assessment, 1992, Vol. 4, No. 2, 145-155), Tellegen proposed a formula to calculate the uniform T score:
UT = B0 + B1·X + B2·X² + B3·X³
B0 is the intercept, X the raw score, and B1, B2, B3 the regression coefficients on X, X², and X³ respectively.
What is the intercept? How do you calculate the intercept (B0)?
How do you calculate the regression coefficients? Is the regression between the raw score and the percentile? Why three different regression coefficients?
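In outline (hedged — consult the paper for the exact target construction): for each raw score X a target uniform T score is derived from its percentile (via the inverse normal, T = 50 + 10·z), and the cubic is then fitted to those targets by ordinary least squares. B0 is simply the fitted intercept, and the three coefficients B1, B2, B3 weight X, X², and X³ so the cubic can bend to match the percentile-based targets. A numpy sketch with made-up raw-score/target pairs:

```python
import numpy as np

# Hypothetical raw scores and their percentile-derived target T scores
raw = np.array([5, 10, 15, 20, 25, 30, 35], dtype=float)
target_T = np.array([38.0, 44.0, 49.0, 53.0, 58.0, 64.0, 72.0])

# Fit UT = B0 + B1*X + B2*X^2 + B3*X^3 by least squares
# (np.polyfit returns coefficients highest-degree first)
B3, B2, B1, B0 = np.polyfit(raw, target_T, deg=3)

UT = B0 + B1 * raw + B2 * raw**2 + B3 * raw**3  # fitted uniform T scores
```

So there are three slope coefficients simply because the model is a cubic polynomial in the raw score, not because three separate regressions are run.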
Suppose I compute a least squares regression with the growth rate of y against the growth rate of x and a constant. How do I recover the elasticity of the level of y against the level of x from the estimated coefficient?
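Since a growth rate approximates a log-difference, the estimated slope is already the elasticity of the level of y with respect to the level of x (d log y / d log x): no transformation of the coefficient is needed, only the caveat that the identity is exact for log-differences and approximate for percentage changes. A numpy check with a known elasticity of 1.5 (made-up data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.cumprod(1 + rng.uniform(0.0, 0.05, 100))          # a growing level series
y = x ** 1.5 * np.exp(0.005 * rng.standard_normal(100))  # y = x^1.5 (with noise)

gx = np.diff(np.log(x))   # log-difference growth rates
gy = np.diff(np.log(y))

# Regress growth of y on a constant and growth of x
X = np.column_stack([np.ones(gx.size), gx])
beta, *_ = np.linalg.lstsq(X, gy, rcond=None)
slope = beta[1]   # ~1.5: directly the elasticity of the level of y w.r.t. x
```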
In most studies Tobit regression is used, but in my Tobit model the independent variable is not significant. Is fractional logistic regression also an appropriate technique to explore the determinants of efficiency?
I want to analyse body measurement data of Osmanabadi goats.
If I want to carry out innovative research based on Wasserstein regression, from what other perspectives can I pursue statistical innovation? Specifically: (1) combining it with a Bayesian framework, introducing a prior distribution and performing parameter estimation via Bayes' rule to obtain more reliable estimates; (2) introducing variable selection to automatically select the predictor distributions that have explanatory power for the response distribution, yielding a sparse interpretation.
Can the above be regarded as a highly innovative research direction?
Can one independent variable in a multilevel regression be derived from two other independent variables — for example type, token, and the type/token ratio?
Can we use inferential statistics like correlation and regression if we use a non-probability sampling technique like convenience or judgement sampling?
Phonetics - What is progressive and regressive assimilation and dissimilation in the Romance languages (especially Spanish), and how do you recognize it?
I am searching for an explanation, a good source recommendation where I could read more about this topic, and some examples of assimilation or dissimilation in the Romance languages, especially in Spanish. Thank you for helping me!
I searched on Google but couldn't understand it properly.
Hi,
I am unsure which is better for modelling outcomes: binomial regression or logistic regression. I am currently working on judicial decisions — outcomes in tax courts, where cases go either in favour of the assessee or the taxman. The factors influencing the judges, as reflected in the cases, are represented by presence (1) or absence (0): if a factor is not considered in the final judgment it takes 0, else 1. If the outcome is favourable to the assessee it is 1, else 0. Which would be the best approach to model the relationship between the outcome (dependent) and the independent factors (perhaps 5-6 variables)? I need some guidance on this. Could I also use a better model for forecasting, after performing a bootstrap run of, say, 1,000 simulations and then computing average outcomes and related statistics?
I've often seen the following steps used to test for moderating effects in a number of papers, and I don't quite understand the usefulness of Model 1 (which only tests the effect of the control variable on the dependent variable) and Model 4 (which only adds the cross-multiplier term of one of the moderating variables with the independent variable). These two models seem redundant.

Can we model the following by regression?
1- A first-degree equation
2- A 7th-degree polynomial equation
3- A non-linear differential equation
4- A system of two equations in two variables
5- A system of second-order differential equations
6- A system of nonlinear equations
And
7- as the next step, the non-linear differential equation.
If we can model all of these by regression and obtain the output correctly, or with good accuracy, then we can have an approximate model of the system using only data. This could be a starting point for control or troubleshooting of complex systems for which exact models are not available.
As a test I am starting with 1 and 2, but is it possible to achieve good accuracy with regression for the remaining cases?
In the case of a constant coefficient, where the VIF is greater than 10, what does that mean? Do all the variables in the model exhibit multicollinearity? How can multicollinearity be reduced? Multicollinearity could be reduced by removing variables with VIF >10. But I don't know what to do with the constant coefficient.
Thank you very much
Hello,
I am measuring the thermal stability of a small protein (131 aa) using circular dichroism, following the loss of its secondary structure. The data are normalized to lie between 0 and 1, where 0 is the folded protein and 1 is the completely unfolded protein. The CD of the fully unfolded state was obtained from a different experiment on the same batch and taken as reference. After plotting my data in GraphPad Prism 9, I fit a standard 4PL curve using non-linear regression, constraining the bottom value to 0 and the top value to 1 (see attached file). The Tm is reported as IC50 in this screenshot because this formula is often used for calculating IC50 and EC50. However, the fitted line does not seem able to represent my data correctly. I performed this experiment twice, and the replicates test shows that the model represents the data inadequately. Should I look for a different equation to model my data? Or am I making a mistake in performing this regression? Thank you for the help!
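One alternative worth trying before constraining the 4PL harder is a two-state Boltzmann sigmoid, the usual model for thermal unfolding curves normalized to [0, 1] (Prism has a comparable built-in Boltzmann sigmoidal equation). A hedged scipy sketch on synthetic data — the temperatures and the true Tm of 55 °C are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def boltzmann(T, Tm, slope):
    """Two-state unfolding curve normalized to [0, 1]:
    fraction unfolded as a function of temperature."""
    return 1.0 / (1.0 + np.exp((Tm - T) / slope))

# Synthetic normalized CD unfolding data (made up: true Tm = 55, slope = 2)
T = np.linspace(25, 85, 25)
frac_unfolded = (boltzmann(T, 55.0, 2.0)
                 + 0.02 * np.random.default_rng(3).standard_normal(T.size))

popt, pcov = curve_fit(boltzmann, T, frac_unfolded, p0=[50.0, 1.0])
Tm_fit, slope_fit = popt   # Tm_fit ~ 55
```

If the curve still misses the data, a common culprit is the fixed 0/1 plateaus: sloping folded/unfolded baselines in the raw CD signal make the normalized data deviate from a flat-plateau sigmoid, and fitting baselines explicitly can fix that.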

There are two ways we can go about testing the moderating effect of a variable (assuming the moderating variable is a dummy variable). One is to add an interaction term to the regression equation, Y=b0+b1*D+b2*M+b3*D*M+u, and test whether the coefficient of the interaction term is significant. An alternative approach is to equate the interaction-term model to a grouped regression, which has the advantage of directly showing the causal effects in the two groups. However, we still need to test the statistical significance of the estimated D*M coefficient by means of the interaction-term model. Such tests are always necessary, because between-group heterogeneity cannot be established by intuitive judgement alone.
One of the technical details is that if the group regression model includes control variables, the corresponding interaction term model must include all the interaction terms between the control variables and the moderator variables in order to ensure the equivalence of the two estimates.
If in equation Y=b0+b1*D+b2*M+b3*D*M+u I do not add the cross-multiplication terms of the moderator and control variables, but only the control variables alone, is the estimate of the coefficient on the interaction term still accurate at this point? At this point, can b1 still be interpreted as the average effect of D on Y when M = 0?
In other words, when I want to test the moderating effect of M in the causal effect of D on Y, should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+u or should I use Y=b0+b1*D+b2*M+b3*D*M+b4*C+b5*M*C+u?
Reference: 江艇.因果推断经验研究中的中介效应与调节效应[J].中国工业经济,2022(05):100-120.DOI:10.19581/j.cnki.ciejournal.2022.05.005.
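The equivalence claim can be checked directly by simulation: with the full set of M-interactions (including M×C), the interaction model reproduces the two group-specific regressions exactly. A numpy sketch on made-up data (D is the treatment, M a dummy moderator, C one control):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
D = rng.integers(0, 2, n).astype(float)
M = rng.integers(0, 2, n).astype(float)
C = rng.standard_normal(n)
Y = (1 + 0.5 * D + 0.3 * M + 0.4 * D * M
     + 0.2 * C - 0.3 * M * C + rng.standard_normal(n))

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full interaction model: D, M, D*M, C and M*C
X_full = np.column_stack([np.ones(n), D, M, D * M, C, M * C])
b = ols(X_full, Y)

# Grouped regressions: Y ~ D + C within each value of M
m0, m1 = (M == 0), (M == 1)
b_g0 = ols(np.column_stack([np.ones(m0.sum()), D[m0], C[m0]]), Y[m0])
b_g1 = ols(np.column_stack([np.ones(m1.sum()), D[m1], C[m1]]), Y[m1])

# b1 equals the D effect in the M=0 group; b1+b3 equals it in the M=1 group
match_g0 = np.isclose(b[1], b_g0[1])
match_g1 = np.isclose(b[1] + b[3], b_g1[1])
```

Dropping the M×C term breaks this identity: the grouped regressions implicitly let the control's coefficient differ across groups, so without M×C the interaction model is no longer their exact equivalent, and b1 is then only the M=0 effect under the extra assumption that C's coefficient is common to both groups.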
I am writing my bachelor thesis and I'm stuck with the Data Analysis and wonder if I am doing something wrong?
I have four independent variables and one dependent variable, all measured on a five point likert scale and thus ordinal data.
I cannot use an ordinary regression (my data are ordinal, not normally distributed and never will be — transformations could not change that — and also violate homoscedasticity), so I opted for ordinal logistic regression. Everything worked out, but the test of parallel lines in SPSS was significant, so the proportional-odds assumption is violated. I am now considering multinomial logistic regression as an alternative.
However, I could not find out how to test the following assumption in SPSS: a linear relationship between continuous variables and the logit transformation of the outcome variable. Does somebody know how to do this?
Plus, I have a more fundamental question about my data. To measure my variables, I asked respondents several questions. My dependent variable, for example, is Turnover Intention, for which I used 4 questions on a 5-point Likert scale, so I have 4 values per respondent. For the analysis I took the average, since I only want one Turnover Intention value per respondent (not four). However, the data no longer range over 1, 2, 3, 4, 5 as with the original Likert scale; taking the average produces decimals like 1.25 or 1.75. This leaves me with a large number of distinct values, and I wonder whether my approach makes sense. I was thinking of grouping them, since my analysis is biased by having so many categories due to the decimals.
Can somebody provide any guidance on this?
In my MS thesis I analysed my data using linear regression, but my supervisor also asked me to run stepwise linear regression.
I have retrieved a study that reports a logistic regression; the OR for the dichotomous outcome is 1.4 for the continuous variable ln(troponin). This means the odds increase by a factor of 1.4 for every 2.7-fold (e-fold) increase in troponin. Is there any way of calculating the OR for a 1-unit increase in the troponin variable?
I want to meta-analyze many logistic regressions, for which I need them to be in the same format (some use the variable ln(troponin) and others raw troponin; no individual patient data are available).
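With ln(troponin) as the predictor there is no single OR per 1-unit increase in troponin itself: the meaning of "+1 unit" depends on the starting level. What can be done is re-expressing the OR per k-fold increase, since OR(k-fold) = OR(e-fold)^ln(k); a common common-scale choice for meta-analysis is the OR per doubling:

```python
import math

or_per_efold = 1.4   # reported OR per 1-unit increase in ln(troponin)

# OR per k-fold increase in troponin = or_per_efold ** ln(k)
or_per_doubling = or_per_efold ** math.log(2)    # ~1.26
or_per_tenfold = or_per_efold ** math.log(10)    # ~2.17
```

Studies that modeled raw troponin would need the reverse transform at a chosen reference level, which is why harmonising everything to a log scale (e.g. per-doubling ORs) is usually the safer target.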
For instance, when using OLS, the objective of the study could be:
# to determine the effect of A on B
Could this kind of objective still hold when using threshold regression?
Hi folks!
Let's say I have two lists/vectors, "t_list" and "y_list", representing the relationship y(t). I have also numerically computed dy/dt and stored it in "dy_dt_list".
The problem is that "dy_dt_list" contains a lot of fluctuations, while I know from physical theory that it MONOTONICALLY DECREASES.
1) Is there a simple way in R or Python to carry out a spline regression that reproduces the numerical values of dy/dt(t) in "dy_dt_list" as well as it can, UNDER THE CONSTRAINT that the fit keeps decreasing? I want a monotonically decreasing (dy/dt)_spline as the output.
2) Is there a simple way in R or Python to carry out a spline regression that reproduces the numerical values of y(t) as well as it can, UNDER THE CONSTRAINT that (dy/dt)_spline keeps decreasing? I want y_spline as the output, given that the above constraint is fulfilled.
I'd like to avoid having to reinvent the wheel!
P.S.: I added an example to clarify things!
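For question 1, the constrained fit described is exactly isotonic/antitonic regression: `sklearn.isotonic.IsotonicRegression(increasing=False)` does it directly, and below is a dependency-free numpy version using the pool-adjacent-violators algorithm (PAVA). This gives a step function rather than a smooth spline; for the smooth version, fitting monotone (I-)splines with sign-constrained coefficients is the usual route, and this sketch only covers the step-function case:

```python
import numpy as np

def pava_increasing(y):
    """Pool-adjacent-violators: best non-decreasing fit to y (least squares)."""
    y = np.asarray(y, dtype=float)
    values, weights = [], []          # block means and block sizes
    for v in y:
        values.append(float(v))
        weights.append(1.0)
        # merge adjacent blocks while monotonicity is violated
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            m = (values[-2] * weights[-2] + values[-1] * weights[-1]) / w
            values[-2:] = [m]
            weights[-2:] = [w]
    return np.repeat(values, np.asarray(weights, dtype=int))

def pava_decreasing(y):
    """Best non-increasing (monotonically decreasing) fit to y."""
    return -pava_increasing(-np.asarray(y, dtype=float))

# Noisy dy/dt values that theory says should decrease (made-up example)
dy_dt = np.array([5.0, 4.2, 4.5, 3.1, 3.3, 2.0, 2.2, 1.0])
fit = pava_decreasing(dy_dt)
monotone = np.all(np.diff(fit) <= 1e-12)   # True: fitted series never increases
```

For question 2, one pragmatic approach is to first produce the monotone (dy/dt) fit as above and then integrate it numerically (e.g. cumulative trapezoid) to obtain a y_spline whose derivative is decreasing by construction.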
I have the OR from a logistic regression that used the independent variable as continuous. I also have the ORs of 2x2 tables that dichotomized the variable (high if >0.1, low if <0.1).
Is there any way I can merge them for a meta-analysis? I.e., can the OR of the regression (OR per 1-unit increase) be converted to an OR for high vs. low?
#QuestionForGroup
Good day,
I'm using a Lovibond photometer for water analysis, but I noticed in its handbook that the calibration function is as in the attached photo.
Is it the inverted equation of Beer's law, and why does it use polynomial regression?
Can you clarify the derivation and purpose of this equation?
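The handbook function is most likely an inverse calibration: instead of Beer's law A = ε·l·c solved for c = A/(ε·l), the instrument fits concentration directly as a polynomial in absorbance, c = a0 + a1·A + a2·A² + …, because real photometer/reagent systems deviate from strict linearity (stray light, chemical non-linearity) at higher absorbance. A numpy sketch of how such a calibration polynomial is built from standards (all values made up):

```python
import numpy as np

# Hypothetical calibration standards: known concentration vs measured absorbance
conc = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0])              # mg/L
absorbance = np.array([0.00, 0.21, 0.40, 0.74, 1.02, 1.24])  # slightly sub-linear

# Inverse calibration: fit concentration as a polynomial in absorbance
coeffs = np.polyfit(absorbance, conc, deg=2)  # highest-degree coefficient first

def predict_conc(A):
    """Convert a measured absorbance into a concentration estimate."""
    return np.polyval(coeffs, A)

c_hat = float(predict_conc(0.40))   # back-predicted concentration, near 1.0 mg/L
```

The quadratic (or cubic) term is what absorbs the departure from Beer's law; with perfectly linear data the fit would simply return a2 ≈ 0 and a1 ≈ 1/(ε·l).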
We know that bone is an active tissue with continuous remodelling (bone growth and resorption). Is atherosclerosis a static process once formed, or can it regress? If the conditions for lipid oxidation stopped, could an atheroma regress spontaneously?
Suppose one has 40 or 50 survey questions for an exploratory analysis of a phenomenon, several of which are intended to be dependent variables, but most independent. A MLR is conducted with e.g. 15 IVs to explain the DV, and maybe half turn out to be significant. Now suppose an interesting IV warrants further investigation, and you think you have collected enough data to at least partially explain what makes this IV so important to the primary DV. Perhaps another, secondary model is in order... i.e. you'd like to turn a significant IV from the primary model into the DV in a new model.
Is there a name for this regression or model approach? It is not exactly nested, hierarchical, or multilevel (I think). The idea, again, is simply to explore what variables explain the presence of IV.a in Model 1, by building Model 2 with IV.a as the DV, and employing additional IVs that were not included in Model 1 to explain this new DV.
I am imagining this as a sort of post-hoc follow up to Model 1, which might sound silly, but this is an exploratory social science study, so some flexibility is warranted, imo.
When I run a regression analysis, the Model Summary table shows a very weak R-square, e.g. 0.001 or 0.052, and the sig. value in the ANOVA table is greater than 0.05. How can I fix this?
I have a data set with six categorical variables, with responses on a scale of 1-5; the reliability test for the individual variables is very strong, but when the variables are combined the reliability test gives very low figures. What could be the problem? Also, what would be an appropriate regression for this analysis?
Suppose we have a study (analysis of factors affecting sustainable agriculture). To analyze its data, most previous studies have used techniques such as regression. To identify effective factors, is it possible to use exploratory factor analysis?
I am planning to assess the effect of different income diversification strategies on rural household welfare. Considering simultaneous causality between the livelihood strategies and the welfare indicators, the Two-Stage Least Squares (2SLS) method with instrumental variables will be applied to estimate the impact of the strategies on household welfare.
Please check the attached file also. I just need to know which regression was used in table 4 of this paper and which tool (SPSS, STATA, R, etc.) I need to use to analyse the data.
What does the unstandardized regression coefficient in simple linear regression mean?
In multiple linear regression, unstandardized regression coefficients tell how much change in Y is predicted per unit change in that independent variable (X), *when all other IVs are held constant*. But in simple linear regression we have only one independent variable, so how should I interpret the coefficient?
Hi,
How do I interpret a significant interaction effect between my moderator (Coh) and independent variable (Hos)? The literature states Hos and my dependent variable (PDm) has a negative relationship. The literature also states the moderator (Coh) has a positive relationship with the DV (PDm). My regression co-efficient for the interaction effect is negative. Does this mean Coh is exacerbating the negative effect (i.e., making it worse) or weakening the effect (i.e., making it better)?
I have attached the SPSS output and simple slopes graph.
Thank you!


Hello, I am trying to analyze factors that influence the adoption of technology, and while doing that I am facing issues with rbiprobit estimation. I have a seven-year (2015-2021) balanced panel containing 2,835 observations. The dependent variable y1 (Adopt2cat), the endogenous variable "BothTechKnowledge", and the instrumental variable "SKinfoAdoptNew" take the values 0 and 1. Although the regression works, I am unsure how to include panel effects in the model.
I am using the following code:
rbiprobit Adopt2cat ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn, endog(BothTechKnowledge = ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn SKinfoAdoptNew)
rbiprobit tmeffects, tmeff(ate)
rbiprobit margdec, dydx(*) effect(total) predict(p11)
If we do not add time variables (year dummy), can we say we have obtained pooled panel estimation? I kindly request you to please guide me through both panel and pool panel estimation procedures. I have attached the Data file for your kind consideration.
Thank you very much in advance.
Kind regards
Faruque
I have 4 groups in my study and I want to analyse the effect of treatment across the 4 groups at 20 time points. Which test should I choose?
I ran a principal component analysis on several variables to generate one component measuring medication compliance, but I need help understanding how to use the regression scores generated for that component.