Science topic

Multiple Linear Regression - Science topic

Explore the latest questions and answers in Multiple Linear Regression, and find Multiple Linear Regression experts.
Questions related to Multiple Linear Regression
  • asked a question related to Multiple Linear Regression
Question
6 answers
If I select the learning ecosystem as the independent variable, it encompasses the learning environment, teaching environment, technological environment, etc. Now, should I use simple linear regression or multiple linear regression to examine the impact of the learning ecosystem on the dependent variable, for example, students' achievement?
Relevant answer
Answer
In deciding between simple linear regression (SLR) and multiple linear regression (MLR), these are some of the technicalities or factors you need to consider:
*Simple Linear Regression (SLR):*
1. Single independent variable (predictor)
2. Linear relationship between independent and dependent variables
3. Few data points (small sample size)
4. Simple model interpretation
*Multiple Linear Regression (MLR):*
1. Multiple independent variables (predictors)
2. Complex relationships between independent and dependent variables
3. Larger sample size
4. More accurate predictions (if relevant variables are included)
*Additionally, ask yourself some of these questions:*
1. How many independent variables do you have?
- SLR: 1 variable
- MLR: 2 or more variables
2. What is the complexity of the relationship between variables?
- SLR: Simple, linear relationship
- MLR: More complex relationships, such as interactions between variables (non-linearities can be handled by adding transformed terms)
3. How many data points do you have?
- SLR: Small sample size (e.g., <100)
- MLR: Larger sample size (e.g., >100)
4. How accurate do you need the predictions to be?
- SLR: Reasonable accuracy for simple models
- MLR: Higher accuracy for complex models
5. Do you need to control for confounding variables?
- SLR: No
- MLR: Yes
*Also, consider this decision Flowchart:*
1. Do you have only 1 independent variable?
- Yes: SLR
- No: Proceed to 2
2. Do you have a simple, linear relationship?
- Yes: SLR
- No: MLR
3. Do you have a small sample size (<100)?
- Yes: SLR
- No: MLR
4. Do you need high accuracy or control for confounding variables?
- Yes: MLR
- No: SLR
*Further Considerations:*
1. Multicollinearity: If independent variables are highly correlated, MLR coefficient estimates become unstable and hard to interpret.
2. Overfitting: MLR can suffer from overfitting if too many variables are included.
3. Model interpretability: SLR is generally easier to interpret than MLR.
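As a purely didactic restatement, the decision flowchart above can be written as a small function (the thresholds, like n < 100, are the rough rules of thumb from the answer, not statistical laws):

```python
def choose_model(n_predictors: int,
                 simple_linear_relationship: bool,
                 n_obs: int,
                 need_accuracy_or_confounding_control: bool) -> str:
    """Mirror of the SLR-vs-MLR decision flowchart above."""
    if n_predictors == 1:
        return "SLR"  # step 1: a single predictor always means SLR
    if simple_linear_relationship:
        return "SLR"  # step 2: simple, linear relationship
    if n_obs < 100:
        return "SLR"  # step 3: small sample size
    # step 4: high accuracy or confounder control favours MLR
    return "MLR" if need_accuracy_or_confounding_control else "SLR"
```

In practice, the number of predictors alone usually settles the question: one predictor gives SLR, several give MLR.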
  • asked a question related to Multiple Linear Regression
Question
5 answers
I conducted a multiple linear regression using gretl for my exploratory research. The data came from published reports from 2013 to 2021, as these were the periods for which all the yearly data were recorded. The results show that two of my variables are significant at the 10% level and the R-squared is 0.664428, but the overall model fit is F(3, 5) = 3.299978 with p-value (F) = 0.115750. So, can I go forward with this result or not, and how can I justify it?
Relevant answer
Answer
If the purpose of your study is exploration, as you say, why do you perform hypothesis tests at all? How did you come to these hypotheses you tested in your exploratory study? Usually, you would take the data to generate (ideas about) hypotheses you may test in an independent set of data (ideally collected more specifically with the aim to test just that hypothesis).
  • asked a question related to Multiple Linear Regression
Question
7 answers
Do you:
1) use G*Power and choose an effect size based on your best guess / an effect size from the literature
2) follow Green's (1991) rules of thumb (e.g. N > 50 + 8m, or N > 104 + k)
3) apply Maxwell's (2000) rule of thumb
4) Any other methods?
Relevant answer
Answer
David L Morgan , if I had sufficiently good information, I wouldn't do an experiment. If I need to do an experiment, my knowledge is by definition insufficient. To plan the experiment, I need to make a guess on what I expect from the experiment. Of course will I use the best sources of information available, to make a sensible, educated guess. But I must make a guess. So I say: we have to rely on guesswork (to plan a study), but only rarely on pure guesswork.
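For option 2, Green's (1991) rules of thumb are simple enough to compute directly (here m is the number of predictors; the question's "k" denotes the same quantity):

```python
def green_n_overall(m: int) -> int:
    """Green (1991): minimum N for testing the overall model fit (R-squared)."""
    return 50 + 8 * m

def green_n_individual(m: int) -> int:
    """Green (1991): minimum N for testing individual predictors."""
    return 104 + m

# For a model with 5 predictors:
# overall fit needs N > 90, individual coefficients need N > 109
```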
  • asked a question related to Multiple Linear Regression
Question
6 answers
I generated six equations: 3 for soaked CBR and 3 for unsoaked CBR. How will I know the best two equations to choose, one for each CBR, from the equations generated? Thank you.
Relevant answer
Answer
If these are the only variables in your equation (i.e., there are no covariates), then it sounds like you could perform a 2 x 3 Analysis of Variance.
  • asked a question related to Multiple Linear Regression
Question
1 answer
Three soil samples were used for the study and six equations were generated from the model which comprises of three for unsoaked CBR and the other three for soaked CBR. How would I know the best model for the unsoaked and soaked CBR? Thank you Sirs and Mas.
Relevant answer
Answer
Simple linear regression analysis (SLRA) shows that the CBR value decreases with an increase in plasticity index and increases with an increase in maximum dry density. From SLRA, the coefficients of determination (R2) for plasticity index and maximum dry density are 0.72 and 0.78, respectively. The soaked and unsoaked CBR values range between approximately 3-9% and 14-38%, respectively. The soaked CBR value of the subgrade is usually preferred for designing flexible pavements, as it gives the CBR strength of the subgrade soil under the worst-case scenario of a pavement being submerged under water for a minimum period of 4 days during floods; in addition to the compaction process, specimen preparation usually involves soaking each specimen in water for 96 hours before the penetration test. The CBR value obtained at 2.5 mm penetration is normally higher than that at 5 mm penetration, and this is the value used.
  • asked a question related to Multiple Linear Regression
Question
4 answers
I ran a multiple linear regression in SPSS and it turned out that one of the variables was not statistically significant.
In the multiple regression equation, if X1 is not significant (p < 0.05), can this variable be entered into the equation?
Y = A + B1X1 + B2X2 + ...
Relevant answer
Answer
In my research there are hypotheses; to test them I used multiple linear regression, and afterwards I want to write a prediction equation for the dependent variable.
According to this equation: Y = A + B1X1 + B2X2 + ...
  • asked a question related to Multiple Linear Regression
Question
3 answers
I am working on predicting the labor force participation rate (dependent) by countries using multiple linear regression. I'm not sure if it is a count or continuous. The labor force participation rate is computed as (Labor Force ÷ Civilian Noninstitutional Population) x 100.
Relevant answer
Answer
Yes, it's generally okay to use a percentage as a dependent variable in multiple linear regression. However, there are some considerations to keep in mind:
  1. Distribution: Ensure that the percentage variable follows a roughly normal distribution. If it's heavily skewed or contains outliers, transformation techniques might be necessary.
  2. Interpretation: The interpretation of coefficients in the regression model may need adjustment. For example, a one-unit change in an independent variable may not correspond to a fixed change in the percentage. Instead, it might represent a percentage point change.
  3. Homoscedasticity: Check for homoscedasticity, which means that the variability of the residuals is constant across all levels of the independent variables. This can be done through residual plots.
  4. Model Fit: Assess the overall fit of the model using appropriate goodness-of-fit measures like R-squared, adjusted R-squared, and others.
  5. Assumption of Linearity: Ensure that the relationship between the dependent variable and the independent variables is linear. This can be assessed through visual inspection of scatterplots.
  6. Endogeneity: Be cautious of potential endogeneity issues, especially if there are omitted variables or reverse causality that might bias the results.
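If the bounded 0-100 scale does cause problems (skew near the extremes, non-constant residual variance), one common remedy is to regress the logit of the proportion rather than the raw percentage. A minimal stdlib sketch (the clamping constant eps is an arbitrary choice to keep the logit finite, and the data below are made up):

```python
import math

def logit_pct(p: float, eps: float = 0.5) -> float:
    """Logit of a percentage, clamped away from the 0/100 bounds."""
    p = min(max(p, eps), 100.0 - eps)
    return math.log(p / (100.0 - p))

def slr(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical labor force participation rates vs. one predictor
rates = [55.0, 60.0, 64.0, 67.0, 70.0]
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
slope, intercept = slr(predictor, [logit_pct(r) for r in rates])
```

Coefficients are then interpreted on the log-odds scale, so back-transform predictions before reporting them as percentages.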
  • asked a question related to Multiple Linear Regression
Question
3 answers
Hello !!!
Since a simple multiple linear regression takes into account neither the within-subject design nor the fact that my dependent variable is ordinal, I need to control for individual heterogeneity. For this, it's possible to run a regression model with clustered standard errors to account for the pattern within subjects.
Can anyone explain how to run a regression model with clustered standard errors on ibm spss?
Explanation of my survey:
112 participants indicated, on a 7-point Likert scale (from strongly disagree to strongly agree), their level of agreement with different statements after reading a scenario.
The statements measured my dependent variables: work motivation, bonus satisfaction, collaboration and help. There were a total of 7 different scenarios. I also control for gender, age, and status ( employed, self-employed, student, unemployed, retired, other).
My aim is to see how the different scenarios affect the dependent variables.
My thesis supervisor advised me to take into account clustered standard errors in my regression model, but I have no idea how to do this on spss. I can't find the right test and command to do this.
Could someone help me?
Thanks in advance,
Best regards
Relevant answer
Answer
Dear Onipe Adabenege Yahaya,
Thank you very much for your explanations!
However, I am working on IBM SPSS.
Do you know how I could do a regression within subject design with clustered standard error on IBM SPSS ?
Since my dependent variables are ordinal, I thought about doing an ordinal regression: Analyze > Generalized Linear Model, where I could specify the model type as 'ordinal logistic'. But does this test take the within-subject design into account?
I also thought about GEE in SPSS, but I don't know whether I can use it here.
My study :
112 participants each faced 7 different scenarios (the factors) and indicated their level of agreement on a Likert scale from strongly disagree to strongly agree, to measure their motivation, satisfaction, collaboration and help. I also controlled for age, gender, and professional situation.
Thank you in advance
  • asked a question related to Multiple Linear Regression
Question
2 answers
Hello everyone,
I am currently working on my thesis and due to the normality assumption being violated, I am performing bootstrapping for my two simple linear regressions and my multiple linear regression.
However, I do have difficulties reporting my findings (APA 7 style, but also what to report in general). I am using SPSS, and the model summary table containing R-squared (among others) is not bootstrapped, so can I report this in my thesis or should I leave it out?
Additionally, the bootstrapped coefficients table contains the confidence interval and a p-value but no degrees of freedom or t-value. What can I report here? So far, I only reported the BcA confidence intervals but I am sure there are other values I can discuss?
Does somebody know what can be reported in a bootstrapped linear regression or maybe knows of a paper that uses this method for inspiration?
Any help is greatly appreciated.
Thank you!!!
Relevant answer
Answer
You don't necessarily need normally distributed data for linear regression analysis. The key assumptions are actually about the residuals, which should follow a normal distribution. This requirement holds independently of the overall distribution of the data. It is also important to check for collinearity between variables, typically done using the variance inflation factor. Adequate Durbin-Watson values are also essential to ensure independence of residuals.
Bootstrapping is a robust method that can address various issues. Another way to assess model relationships is by comparing the root mean square error (RMSE) before and after resampling, although SPSS might not offer this analysis.
  • asked a question related to Multiple Linear Regression
Question
5 answers
I am having trouble differentiating between a random effects model and a linear mixed effects model. I am currently using this model https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.model.RandomEffects.html#
for my research. Can somebody tell me if this is a random effects model or a linear mixed effects model and what are the differences between the two models?
Relevant answer
Answer
The only difference of note is that the random effects model could be non-linear (e.g. in analysing binary or count data). The mixed effects model includes a 'fixed part' (that is, means) and a 'random part' (that is, variances and covariances).
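The 'random part' is what carries the group-level variance; a quick way to see how much of the outcome variance sits between groups is the one-way intraclass correlation. A minimal stdlib sketch using the standard ANOVA estimator (the toy data are made up):

```python
def icc1(groups):
    """One-way random-effects ICC from equal-sized groups (ANOVA estimator)."""
    g = len(groups)
    k = len(groups[0])  # observations per group (assumed equal)
    grand = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    # Between-groups and within-groups mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum(sum((v - m) ** 2 for v in grp)
              for grp, m in zip(groups, means)) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Three groups with very different means: most variance is between groups
icc = icc1([[5, 6, 5, 6], [1, 2, 1, 2], [9, 8, 9, 8]])
```

A large ICC is exactly the situation where a random-intercept (mixed) model, rather than ordinary regression, is warranted.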
  • asked a question related to Multiple Linear Regression
Question
2 answers
Two independent variables:
1. Attachment style: 1 to 5 Likert scale (strongly disagree to strongly agree)
2. Family functioning: 1 to 4 Likert scale (strongly agree to strongly disagree)
Mediating variable:
3. Childhood trauma: response options 1 to 5 (never true = 1, rarely true = 2, sometimes true = 3, often true = 4, very often true = 5)
Dependent variable:
4. Personality disorder traits: response options 0 to 3 (very false or often false = 0, sometimes or somewhat false = 1, sometimes or somewhat true = 2, very true or often true = 3)
Please suggest.
Thanks
Relevant answer
Answer
Of course, you can use multiple linear regression. The results will show which variables have a stronger effect on your dependent variable after controlling for childhood trauma. Considering mediation analysis and attachment-styles research, I suggest you consider whether you measured the two dimensions of the romantic-attachment scale or attachment styles as a classification, and whether your sample is clinical or healthy. In this case, if you had not preregistered your study, you can try to analyze the effect of attachment styles (or dimensions) on personality via childhood trauma after controlling for family functioning. You can use JASP, adding the attachment dimensions as independent variables, personality as the dependent variable, childhood trauma as the mediator, and family functioning as a background confounder. First run the model without the background confounder, then, in a second model, with it. But if you used romantic attachment styles, attachment styles could be the independent variable, family functioning the mediator, personality the dependent variable, and childhood trauma a moderator.
Best regards,
  • asked a question related to Multiple Linear Regression
Question
18 answers
I have performed a Box-Cox transformation of a response variable in multiple linear regression in SPSS. As I understand, in order to correctly interpret and present the data (B, t, CI for B and p), I need to back-transform the data by applying the formula for Box-Cox back-transformation to B, t, CI for B. But the formula specifies lambda and I can't find it anywhere in SPSS. Could you please tell me where I can find the lambda to apply this formula for back transformation?
Relevant answer
Answer
Thanks for the additional info, Svetlana Bondar. Just to remind everyone, earlier, you wrote:
"The residuals of the dependent variable in my case are not normally distributed – a bimodal distribution."
I don't remember ever running into this problem myself, but a bit of digging around suggests that one possible cause of this is failure to include an important dichotomous explanatory variable. Do any possibilities come to mind?
Another suggestion I've seen that might be helpful is to use quantile regression rather than OLS regression, and to perhaps model the 25th and 75th percentiles instead of (or in addition to) the median.
Finally, I found this article, which seems to be addressing the same problem. Maybe you'll find some good suggestions in it.
HTH.
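On the original question of where to find lambda: SPSS does not report it directly, but it is simply the value that maximizes the Box-Cox profile log-likelihood (scipy.stats.boxcox, for instance, returns it as its second output). A pure-Python grid-search sketch of that maximization:

```python
import math
import random

def boxcox_loglik(y, lam):
    """Profile log-likelihood of the Box-Cox transform at a given lambda."""
    n = len(y)
    if abs(lam) < 1e-12:
        z = [math.log(v) for v in y]          # lambda = 0 means log transform
    else:
        z = [(v ** lam - 1.0) / lam for v in y]
    mu = sum(z) / n
    var = sum((v - mu) ** 2 for v in z) / n
    return -n / 2.0 * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def boxcox_lambda(y, grid=None):
    """Grid-search estimate of the Box-Cox lambda (y must be positive)."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))

# Log-normal data should give a lambda near 0 (i.e. a log transform)
random.seed(1)
y = [math.exp(random.gauss(0.0, 1.0)) for _ in range(500)]
lam_hat = boxcox_lambda(y)
```

With lambda in hand, the usual back-transformation of predictions is (lambda * Y + 1)^(1/lambda), or exp(Y) when lambda = 0.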
  • asked a question related to Multiple Linear Regression
Question
3 answers
I am using bootstrapping technique for hypothesis testing as it has been done in Conditional Process Analysis by Hayes. But instead of a single-dimensional construct, I have a two-dimensional construct for my independent variable. Therefore, instead of bootstrapping a number, I need to bootstrap a pair of numbers, or we shall say a vector. I can do it mathematically, but I wonder if there is any previous research on bootstrapping a vector (instead of a single number)?
I would be grateful if you can refer me to existing papers on this topic or give me some keywords on this statistical question.
Thank you.
Relevant answer
Here are some specific examples of literature on bootstrapping vectors:
  • "Bootstrapping in vector autoregressive models" by Hsiao (1986)
  • "Bootstrapping the O(N) Vector Models" by Nakayama and Sugiura (2013)
  • "Bootstrapping In Vector Autoregressions: An Application To The Pork Sector" by Goodwin, Nelson, and Schnitkey (2005)
  • "Bootstrapping - Quick-R" by Efron and Tibshirani (2013)
  • "FAQ: Bootstrapping vectors" by StataCorp (2023)
I hope this helps!
  • asked a question related to Multiple Linear Regression
Question
11 answers
According to these results, Is regression analysis significant?
Analysis of Variance
- SS = 1904905
- DF = 8
- MS = 238113
- F (DFn, DFd)= F (8, 17) = 1.353
- P value P=0.2843
Goodness of Fit
- Degrees of Freedom = 17
- Multiple R = 0.6237
- R squared = 0.389
- Adjusted R squared = 0.1015
Relevant answer
Answer
The p value for the overall F statistic indicates that the overall multiple correlation R (and R squared) value is not significantly different from zero at the .05 level (p is greater than .05). The reason for the non-significant result may be a small sample (as indicated by your degrees of freedom) and/or the presence of too many predictor variables. This also explains the large discrepancy between R squared and adjusted R squared. You are lacking statistical power to "detect" an effect.
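The discrepancy the answer points to can be checked by hand: with residual df = 17 and 8 predictors, n = 17 + 8 + 1 = 26, and the adjusted R-squared follows from the usual formula (a quick sketch):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Reproduces the reported value: 26 observations, 8 predictors
adj = adjusted_r2(0.389, 26, 8)  # about 0.1015
```

Eight predictors against 26 observations leaves very little residual df, which is why the adjustment shrinks R-squared so drastically.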
  • asked a question related to Multiple Linear Regression
Question
3 answers
I want to ask you a question about data analysis in psychology. I have two independent variables, one is group (between subjects), one is age (continuous data), and the dependent variable is a 6-point Likert score. I intend to use regression for data analysis, with three questions:
1. Should the subject ID (the number of each subject) be included in the model as a random effect? If it is included, the model is a linear mixed model (LMM); if not, it is a multiple linear regression, right?
2. In the case of multiple linear regression, should I directly build the full model and examine the influence of each independent variable, or should I build the full model, compare it with the null model, and then analyze by stepwise elimination?
3. When I do my analysis, do I need to center both age (the continuous variable) and the rating, or only age?
Relevant answer
Answer
you are welcome
  • asked a question related to Multiple Linear Regression
Question
2 answers
What does the unstandardized regression coefficient in simple linear regression mean?
Whereas in multiple linear regression, unstandardized regression coefficients tell how much change in Y is predicted to occur per unit change in that independent variable (X), *when all other IVs are held constant*. But my question is: in simple linear regression we have only one independent variable, so how should I interpret it?
Relevant answer
Answer
The same, just that there is no "*when all other IVs are held constant*".
It's simply the (expected) change in Y per unit change of X. There are no other variables involved that would need to be "held constant".
  • asked a question related to Multiple Linear Regression
Question
3 answers
Although it is true that the number of independent variables influences multiple linear regression, or the time horizon, among other factors, what do you think are the statistical tests that must accompany a multiple linear regression to be considered rigorous? What statistical program do you consider to be the best and/or most suitable to process it? Finally, what additional considerations should be taken in methodological terms?
Relevant answer
Answer
Hello Pablo José Arana Barbier, twenty years ago I did generalised linear modelling in SPSS (sorry, I can't recall the version of the program for sure; possibly version 14). This way you can enter multiple factors and continuous variables into the formulae.
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hello!
I ran two models (simple linear regression and multiple linear regression) to test a hypothesis for my research project. The first model tested the IV against the DV. For the second model I added control variables (gender and tenure). Tenure is statistically significant. However, I do not know how to interpret this result. Can anyone please give some guidance on reporting it?
H: career development opportunities positively impact job satisfaction.
Results:
Model 1: Career Development Opportunities coefficient 0.575; t: 7.592; p-value <.001
Model 2: Career Development Opportunities coefficient 0.589; t: 7.785; p-value <.001
Tenure: coefficient 0.187; t: 2.103; p-value <.001; Male -0.411; t: -0.987; p-value 0.326
Thanks!
Relevant answer
Answer
Thank you David!!
  • asked a question related to Multiple Linear Regression
Question
5 answers
I am supposed to create a fake data set with 4 predictors that yields two strong significant relationships, 1 weak significant relationship, 1 non-significant relationship, and a significant interaction.
I have some materials but I am at a total loss at how to do this.
I have created an Excel workbook with the four variables, generated random numbers using the RANDBETWEEN function, and imported that data into JASP, but no matter how many times I run it, I can't get the results I need.
Does anyone have any suggestions?
Relevant answer
Answer
Presumably, strong significant relationship means the p-value is well below your chosen alpha level (likely 0.05), whereas weak significant relationship means it is just below alpha. Is that right?
Did your instructor give any guidance (or restrictions) on what the sample size is supposed to be for your model? I ask because of the relationship between n and p-values: The larger n gets, the smaller p becomes (all else being equal).
PS- Daniel Wright, Catherine Strutz could have stated explicitly that this question relates to a course assignment (assuming it does--and I think it does, judging by the wording), but I do not think she was deliberately attempting to conceal that fact. YMMV.
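Rather than hoping purely random numbers land where you need them, it is usually easier to build y from the predictors with the effect sizes you want, then tune coefficients, noise, and n until the p-values in JASP come out as required. A hedged sketch (the coefficients, n, seed, and the file name fake_data.csv are illustrative starting points, not guaranteed to hit the exact p-values on the first run):

```python
import csv
import random

random.seed(42)
n = 200
rows = [("x1", "x2", "x3", "x4", "y")]
for _ in range(n):
    x1, x2, x3, x4 = (random.gauss(0.0, 1.0) for _ in range(4))
    y = (0.8 * x1            # strong effect
         + 0.8 * x2          # strong effect
         + 0.25 * x3         # weak effect
         + 0.0 * x4          # null effect
         + 0.5 * x1 * x2     # interaction term
         + random.gauss(0.0, 1.0))  # noise; shrink to strengthen effects
    rows.append((x1, x2, x3, x4, y))

# Write a CSV that JASP can open directly
with open("fake_data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

Remember to include the x1*x2 product (or JASP's interaction term) in the model so the built-in interaction can be detected.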
  • asked a question related to Multiple Linear Regression
Question
11 answers
I was doing a regression on the relationship between time spent living in rural areas and depression. When I included rural-living time in the regression as a single overall variable, p was significant and the standardized coefficient was a large negative number.
However, when I divide rural-living time more finely into "residency duration from 0 to 18 years of age", "residency duration from 18 to 29 years of age", "residency duration from 29 to 45 years of age", etc., only the p-value for "residency duration from 0 to 18 years of age" is significant (p = 0.034), and its standardized coefficient is only a small negative number.
What I am confused about is how to draw conclusions. In this case, should I conclude: 1. the longer the time spent living in rural areas, the lighter the depression, and living in rural areas at ages 0-18 can significantly reduce depression?
Or: 2. only living in rural areas at ages 0-18 can reduce depression?
Similarly, if all variables are insignificant after the division, should my conclusion be:
1. the longer the time spent living in the countryside, the lighter the depression, or
2. time spent living in the countryside has nothing to do with depression?
Relevant answer
Answer
You say "...(for example, "residency duration from 0 to 18 years of age" means how long these 65-year-old individuals lived in rural areas during the ages of 0-18)." Sorry. You are correct that for multiple regression they would all be included. So if it was zero years, you count them in that sample. But there are probably fewer zeroes for the number of years lived by a 65-year-old in the 0-18 category, as I explained above, perhaps providing more useful data for that category. Zeroes may be problematic here. Regression involving counts may be done differently, and not in my expertise, but might be useful here.
By the way, are your 65-year-olds currently residents anywhere, or are you only looking at 65-year-old current rural residents (because that would make a difference)?
The only other comment that occurs to me is that I think you want to be careful not to make too much of your results. You might even want to separate your study into different parts with different models, perhaps.
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hello everyone!
I need your help with some work I am doing.
Some context first:
I am writing a dissertation for my master. The topic is about perceived trust in Smart Home technology. I launched a survey with a closed ended questions for demographic data, and likert scale that asks 8 Questions on a scale of 1 to 5. I gathered 159 responses in total.
The 8 questions in the Likert scale actually measure 4 different dependent variables: Q1/Q2 make up dependent variable 1, Q3/Q4 dependent variable 2, etc.
Since it's a Likert scale the data is not interval, so what I did is take the sum of Q1 and Q2 and divide it by 2, which gave me a mean. This mean is one of the 4 dependent variables. I did the same for the other 3.
The idea is to test each of these dependent variables and see if they can be predicted with the independent variables (and control variables) that I have (age, gender, educational attainment, household size and income).
For that, I read that a multiple linear regression would be enough. So I started reading about that method and saw that some assumptions needed to be met before I could use it. For normality, 3 of the 4 dependent variables were normally distributed, but the last one was not quite normally distributed. Secondly, testing the four variables for linearity suggested that none of them is linear.
Now I need to start the analysis part of my dissertation, but I have no clue which method I should use, since the assumptions of multiple linear regression are not met.
I know about non-parametric tests, but I can't find a non-parametric alternative to multiple linear regression.
If you need more info about the variables etc let me know, I will provide them!
Thanks for your help and time.
Relevant answer
Answer
Multiple linear regression (MLR) is a commonly used statistical technique for modeling the relationship between a dependent variable and several independent variables. However, MLR has certain assumptions that must be met for the results to be valid. When these assumptions are violated, the results of the analysis can be biased or misleading.
Here are some common assumptions of MLR:
  1. Linearity: The relationship between the independent and dependent variables should be linear.
  2. Independence: The observations should be independent of each other.
  3. Normality: The errors should be normally distributed.
  4. Homoscedasticity: The variance of the errors should be constant across all levels of the independent variables.
  5. No multicollinearity: The independent variables should not be highly correlated with each other.
If one or more of these assumptions are violated, there are several things you can do:
  1. Transform the variables: If the relationship between the independent and dependent variables is not linear, you can try transforming the variables. For example, you can take the log of the variables or square root them.
  2. Remove outliers: Outliers can have a large effect on the regression results. You can identify and remove outliers from the dataset.
  3. Use a different model: If the assumptions of MLR are not met, you can try a different model. For example, you can use a generalized linear or non-parametric regression model.
  4. Use robust regression methods: Robust regression methods are less sensitive to violations of the assumptions than ordinary least squares regression. You can use methods such as weighted least squares regression or robust regression.
  5. Use variable selection techniques: If there is multicollinearity between the independent variables, you can use variable selection techniques such as forward selection or backward elimination to choose the most important variables.
It is important to note that there is no one-size-fits-all solution for dealing with violated assumptions in MLR. The approach you take will depend on the violation's specific nature and your dataset's characteristics. It is important to carefully examine the results of your analysis and to use good judgment in interpreting them.
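Two of these checks are easy to script even without a stats package: independence of residuals via the Durbin-Watson statistic, and a crude multicollinearity screen via pairwise predictor correlations (a proper VIF requires regressing each predictor on the others, e.g. statsmodels' variance_inflation_factor). A stdlib sketch, with the 0.8 threshold as a common but arbitrary cutoff:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 means no first-order autocorrelation;
    values toward 0 suggest positive, toward 4 negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def collinear_pairs(columns, threshold=0.8):
    """Flag predictor pairs whose |r| exceeds the threshold."""
    names = sorted(columns)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(columns[a], columns[b])) > threshold]
```

Flagged pairs are candidates for dropping or combining before interpreting MLR coefficients.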
  • asked a question related to Multiple Linear Regression
Question
5 answers
Hi!
I'm performing a multiple linear regression with bootstrapping, but I'm also adding my independent variables in steps (hierarchical). So I have age and sex (demographics) in my first model, then add my independent variable for the second model. I now get results for the linear regression, as well as the bootstrap for coefficients. For my interpretation, do I still use the R^2 and F of the original linear regression, in addition to the B from the bootstrap? Do I also report on the confidence intervals, and if so, how? I've only had the basics of statistics during the course of my study so I'm really new to all of this; please feel free to correct any of my mistakes with this analysis!
Relevant answer
Answer
Bootstrapping is a resampling technique that can be used to estimate the variability of a statistical estimator, such as the coefficients of a multiple linear regression model. The basic idea of bootstrapping is to repeatedly draw samples from the original dataset with replacement, and then compute the statistic of interest (in this case, the coefficients of the regression model) for each sample. By doing this many times, we can obtain a distribution of the statistic, which can be used to estimate its standard error and construct confidence intervals.
To interpret the results of a bootstrapped multiple linear regression, you should start by examining the point estimates of the coefficients, as well as their standard errors and p-values. These can be used to determine the significance of each predictor variable in the model, and to make inferences about the relationship between the predictors and the outcome variable.
Next, you should examine the distribution of the coefficients obtained from the bootstrap samples. This can be visualized using a histogram or a boxplot. If the distribution is roughly symmetric and centered around the point estimate, this suggests that the estimate is stable and reliable. On the other hand, if the distribution is skewed or has long tails, this may indicate that the estimate is more variable and less trustworthy.
Finally, you can use the bootstrap distribution to construct confidence intervals for the coefficients. These intervals represent the range of values that the coefficient is likely to fall within, based on the data and the statistical model. A 95% confidence interval, for example, would include the true value of the coefficient with 95% probability, assuming that the model assumptions are correct. If the confidence interval does not include zero, this suggests that the corresponding predictor variable is significantly related to the outcome variable
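The resample-refit-collect loop described above is mechanical enough to sketch in stdlib Python for the single-predictor case (the toy data are made up; for the full hierarchical SPSS model the same loop applies to every coefficient):

```python
import random

def slr(x, y):
    """OLS slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope: resample cases, refit, collect."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slope, _ = slr([x[i] for i in idx], [y[i] for i in idx])
        slopes.append(slope)
    slopes.sort()
    lo = slopes[int(n_boot * alpha / 2)]
    hi = slopes[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Toy data with true slope 2 plus deterministic "noise"
x = [float(i) for i in range(20)]
y = [2.0 * v + 0.5 * (-1) ** i for i, v in enumerate(x)]
lo, hi = bootstrap_slope_ci(x, y)
```

SPSS's BCa intervals additionally correct the raw percentile endpoints for bias and skew, but the resampling logic is the same.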
  • asked a question related to Multiple Linear Regression
Question
5 answers
Salam,
If my data are not time-series data, is stationarity still a relevant concern when fitting a multiple linear regression model? And if so, what makes time-series data different in this respect?
Thank you
Relevant answer
Answer
It is not the fact that the data are time series which is important. What is important is the presence of autocorrelation. For example, if your data describe the composition of the soil along a road, sampled every 10 meters, you will also face autocorrelation even though the data are not a time series. It is known how to handle autocorrelation in regression, but it adds difficulties and typically requires larger samples.
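A common numeric check for the first-order autocorrelation discussed here, whether the index is time or position along a road, is the Durbin-Watson statistic, DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). A pure-Python sketch on simulated residuals (both series are made up; values near 2 indicate no first-order autocorrelation, values near 0 or 4 indicate strong positive or negative autocorrelation):

```python
# Durbin-Watson statistic on two simulated residual series.
import random

random.seed(0)

def durbin_watson(e):
    """DW = sum of squared successive differences / sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Independent residuals: DW should be close to 2.
iid = [random.gauss(0, 1) for _ in range(500)]
dw_iid = durbin_watson(iid)

# Positively autocorrelated residuals (AR(1), rho = 0.8): DW well below 2.
ar = [random.gauss(0, 1)]
for _ in range(499):
    ar.append(0.8 * ar[-1] + random.gauss(0, 0.6))
dw_ar = durbin_watson(ar)

print(f"independent residuals:    DW = {dw_iid:.2f}")
print(f"autocorrelated residuals: DW = {dw_ar:.2f}")
```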
  • asked a question related to Multiple Linear Regression
Question
8 answers
I'm doing a multiple linear regression where some of the independent variables are normally distributed whereas others aren't. The normal P-P plot of the regression looks appropriate, as the points lie along the line. I have 84 participants in total; is that enough to go ahead with linear regression without the assumption of normality being met?
Relevant answer
Answer
The normality assumption in linear regression concerns the residuals (error terms), not the independent or dependent variables.
  • asked a question related to Multiple Linear Regression
Question
3 answers
What is regression, and what are its types?
Relevant answer
Answer
Simple and multiple linear regression are statistical techniques used to model the relationship between a dependent variable (also called the response variable) and one or more independent variables (also called predictors or explanatory variables).
Simple linear regression involves modeling the relationship between the dependent variable and a single independent variable. The goal is to find a linear equation that best describes the relationship between the two variables. This equation can be used to make predictions about the value of the dependent variable for a given value of the independent variable. The equation for simple linear regression is:
Y = a + bX + e
where Y is the dependent variable, X is the independent variable, a is the intercept (the value of Y when X is equal to 0), b is the slope (the change in Y for a one-unit change in X), and e is the error term (the difference between the actual and predicted values of Y). The goal is to estimate the values of a and b that minimize the sum of the squared errors.
Multiple linear regression, on the other hand, involves modeling the relationship between the dependent variable and two or more independent variables. The goal is to find a linear equation that best describes the relationship between the dependent variable and all the independent variables together. The equation for multiple linear regression is:
Y = a + b1X1 + b2X2 + ... + bnXn + e
where Y is the dependent variable, X1, X2, ..., Xn are the independent variables, a is the intercept, b1, b2, ..., bn are the slopes, and e is the error term. The goal is to estimate the values of a, b1, b2, ..., bn that minimize the sum of the squared errors.
In both simple and multiple linear regression, the quality of the fit can be assessed using various measures, such as the R-squared value, which indicates the proportion of variance in the dependent variable that is explained by the independent variable(s), or the F-test, which compares the fit of the regression model to a null model (i.e., a model with no independent variables).
Linear regression can be used for various purposes, such as predicting future values of the dependent variable, identifying the most important predictors of the dependent variable, or testing hypotheses about the relationship between the dependent and independent variables. However, it is important to be aware of the assumptions underlying linear regression, such as the linearity, independence, normality, and homoscedasticity of the errors. Violations of these assumptions can lead to biased estimates and invalid conclusions.
  • asked a question related to Multiple Linear Regression
Question
6 answers
I encountered a problem while performing multiple linear regression and OLS single-factor regression: two factors have low coefficients in their single-factor regressions, but high coefficients when entered together in the multiple regression. After removing one of them, the coefficient of the other factor also decreased significantly. Yet these two factors passed the collinearity test and are not collinear. Why is there such a result? Are these two factors good at fitting the equation?
Relevant answer
Answer
How large is your sample size, and how large is the correlation between the two predictors?
  • asked a question related to Multiple Linear Regression
Question
4 answers
I have gathered data from 60 companies, across 5 years.
I want to do a Multiple Linear Regression Analysis with some variables, from 2017 to 2021.
1) Should I take the average of the 5 years for each variable for each company and run the regression with 60 observations, or
2) Should I treat the data points individually, meaning observations = n*5 = 300?
How do I lay out that information (columns and rows) on the Excel sheet (if option 2)?
Thanks a lot
Relevant answer
Answer
Do not average, but use a multi-level model with company-level random effects. Also, you should examine how time itself influences your outcome directly. In one of the R regression engines (lme4, rstanarm, brms), the model would look like this:
outcome ~ 1 + time + pred_1 + pred_2 + (1 + time + pred_1|Company)
In this model, all companies get their own slope over time. Pred_1 is also estimated at the population level and per individual company. Pred_2 is an example of a pure population-level effect (a fixed effect). That is useful for predictors that do not vary within one company, such as the country of residence.
To estimate multi-level models, you'll have to bring your data into long-format (one outcome measure per row), e.g.:
CompID | Pred_1 | Pred_2 | time | outcome
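A minimal Python sketch of bringing wide company-by-year data into that long format (the company names, years, and outcome values are hypothetical):

```python
# Wide format: one row per company, one column per year.
wide = [
    # company, outcome_2017, outcome_2018, outcome_2019
    ("Acme", 10.0, 11.5, 12.0),
    ("Beta",  8.0,  8.5,  9.5),
]
years = [2017, 2018, 2019]

# Long format: one outcome measurement per row, as multi-level
# model engines (lme4, rstanarm, brms) expect.
long_rows = []
for company, *outcomes in wide:
    for year, outcome in zip(years, outcomes):
        long_rows.append({"CompID": company, "time": year, "outcome": outcome})

for row in long_rows:
    print(row)
# 2 companies x 3 years -> 6 rows, one outcome per row.
```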
  • asked a question related to Multiple Linear Regression
Question
1 answer
Optimisation and prediction of the kWh/m³ ratio in a pumping station through mathematical modelling: multiple linear regression (MLR) and artificial neural networks (ANN).
  • asked a question related to Multiple Linear Regression
Question
4 answers
I generated a scatterplot for a multiple linear regression analysis using SigmaPlot and I want to display the regression equation, but I couldn't manage to do it. Can anybody suggest how to display the equation in SigmaPlot?
Relevant answer
Answer
Dear Fiseha, for a multiple linear regression, the usual way of representing it in a bivariate space is to use as axes:
X axis = expected value of Y from the fitted equation
Y axis = observed value of Y.
This representation will give rise (if the equation has a decent fit) to a cloud of points elongated along the main diagonal of the plot (Y = X), allowing you to spot by eye outliers, correctly fitted points, and so forth...
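As a small illustration, here is how the (expected, observed) pairs for such a plot would be built in Python; the fitted equation and the data rows are hypothetical:

```python
# Build (predicted, observed) pairs for the bivariate display of a
# multiple regression fit. Hypothetical fitted equation:
# y_hat = 1.0 + 2.0*x1 - 0.5*x2
rows = [
    # x1,  x2,  observed y
    (1.0, 2.0, 2.1),
    (2.0, 1.0, 4.6),
    (3.0, 3.0, 5.4),
    (4.0, 2.0, 8.2),
]

pairs = [(1.0 + 2.0 * x1 - 0.5 * x2, y) for x1, x2, y in rows]
for y_hat, y_obs in pairs:
    print(f"predicted = {y_hat:.2f}, observed = {y_obs:.2f}")
# Points near the diagonal y = x indicate a good fit;
# points far from it are candidate outliers.
```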
  • asked a question related to Multiple Linear Regression
Question
3 answers
I am conducting a meta-analysis on a research study and will need to build a model for my data to conduct a multiple linear regression analysis. Can anyone provide a sample model or step-by-step approach to building a model? Thanks.
Relevant answer
Answer
Here is a playlist of videos of how to conduct Meta-Analysis & Meta-Regression In R: https://www.youtube.com/playlist?list=PLrLWLaG7yx85X7ZjN4ySllDQ-hiB9Q4nm
  • asked a question related to Multiple Linear Regression
Question
5 answers
I have clinical parameters, and I need to find the relationship between the variables; any Ideas or suggestions on how to do the calculation?
Relevant answer
Answer
Dear Lina Naji
You may look at the following article.
Best of luck
  • asked a question related to Multiple Linear Regression
Question
6 answers
I have fitted a multiple linear regression model with three continuous independent variables and one independent categorical variable. How to visualize or display the fit?
Thanking you in advance
Relevant answer
Answer
It's obviously going to be difficult to try to display all those variables in a single plot.
A) A simple approach is to plot the observed values vs. the predicted values from the model.
B) You could plot the dependent variable vs. one continuous independent variable grouped by the categorical variable. Three plots like this may be informative (e.g. : https://rcompanion.org/rcompanion/images/e_04_04.jpg )
  • asked a question related to Multiple Linear Regression
Question
3 answers
The correlation coefficient between one of the independent variables and the dependent variable has a p-value greater than the alpha value of 0.05.
Relevant answer
Answer
Bivariate screening of candidate predictors for a multivariable regression model is considered a bad practice that tends to produce overfitted models. See Mike Babyak's 2004 article, for example:
And having non-significant regressions in a multivariable model is not problematic either. There is more likely a problem when all variables are statistically significant, in fact--unless n is quite large. See these two sections in the DataMethods.org author checklist, for example (link below):
  • Use of stepwise variable selection
  • Lack of insignificant variables in the final model
HTH.
  • asked a question related to Multiple Linear Regression
Question
20 answers
I am running a multiple linear regression between demographics and performance on a measure. However, despite all the other assumptions being met, linearity hasn't (example image attached). I have applied a log10 transformation to the data and that hasn't helped. Are there any other corrections that can be applied?
Relevant answer
Answer
Daniel Wright Unfortunately I can no longer see what you had written as Rgate will not let me see previous replies; so I am remembering/guessing.
By 'corrective' action I mean using a method that does not make the strong assumptions of the standard model. For example, if you see heteroscedasticity in a catch-all plot, I would model that heterogeneity and see if it makes a difference to the result; I think that is much better than some test for it. As Box (1953) notes of Bartlett's test, "To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" The GAM approach allows for non-linearity, displays the partial residuals so you can see what is going on (very helpful in the multiple-predictor case), and gives you the linear model if there is no strong evidence for a curve.
I think we are all getting more aware of the "Forking paths" in the practice of modelling. I am reminded of the phrase that you would not eat sausages if you saw them being made! The good thing about GAM is the inbuilt cross validation which is aimed at reducing overfitting. I have enjoyed reading Stuart Ritchie's Science Fictions and think the only way is to have a plan, being open about what you have done and use cross validation to guard against implicit p-hacking.
Back to the original question - it looks to be an underlying linear relation to me and I would stick with that.
  • asked a question related to Multiple Linear Regression
Question
7 answers
To be more precise, my dependent variable was the mental well-being of students. The first analysis was chi-square (mental well-being x demographic variable), hence I treated the DV as categorical. Then, in order to find the influence of my independent variables on mental well-being, I treated the DV as a continuous variable so that I could analyse it using multiple regression.
Is it appropriate and acceptable? and is there any previous study that did the same thing?
Need some advice from all of you here. Thank you so much
Relevant answer
Answer
If we do not use the precise statistical tools appropriate to this field and to the research variables, it will lead to statistical errors in the accuracy of the results; however, the variable could be used in two separate studies as a dependent variable.
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hi, I am new here but really hope someone can help.
I ran a hierarchical multiple linear regression with 3 predictors in the first block and 1 in the second block. In the first model, only one of my predictors was significant, but I only included it to control for it as I expected it would be highly correlated to the DV. In my second model, one of the predictors that was non-significant in the first model is now significant. Can anyone explain what that means and how I can discuss those results? It would be great if you could also point me towards books or papers that explain this.
Thanks,
Charlenne
Relevant answer
Answer
This could be due to a suppressor effect. A suppressor effect can occur when predictor variables are highly correlated with one another. In that case, one predictor can "suppress" variance in the other predictor and thereby enhance the predictive value of that other predictor. It is even possible that a predictor that is not significantly correlated with the DV turns out to be a "significant" predictor in a multiple regression due to suppression.
If you do a literature search for "suppression" or "suppressor effect" in regression, you should be able to find a number of relevant resources.
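The mechanism can be seen directly from the standardized-coefficient formula for two predictors. The correlations below are hypothetical, chosen to show classic suppression: predictor 2 has a zero bivariate correlation with the DV, yet receives a sizeable weight once it enters the model alongside a correlated predictor.

```python
# Classic suppression via standardized regression weights.
# For two standardized predictors:
#   beta1 = (r_y1 - r_y2 * r12) / (1 - r12^2), and symmetrically for beta2.
r_y1 = 0.50   # predictor 1 with the DV
r_y2 = 0.00   # predictor 2 with the DV: zero bivariate correlation!
r_12 = 0.70   # the two predictors are highly correlated

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

print(f"beta1 = {beta1:.3f} (larger than its bivariate r of {r_y1})")
print(f"beta2 = {beta2:.3f} (nonzero despite a bivariate r of {r_y2})")
```

Predictor 2 "cleans" irrelevant variance out of predictor 1, so predictor 1's weight (about 0.98) exceeds its bivariate correlation (0.50), and predictor 2 gets a substantial negative weight despite being uncorrelated with the DV.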
  • asked a question related to Multiple Linear Regression
Question
3 answers
Hello everyone,
I would like to do a multiple regression with 4 independent variables and 1 dependent variable. Also i have a dichotomous moderator "gender" which is split in female = 1 and male = 2.
How do i test the moderator with SPSS to see if it is linear?
I have already checked the assumptions of the multiple linear regression for the dependent variable and the independent variables using partial regression plots. But how can I check whether the relationship with the dichotomous moderator is linear?
Thanks in advance!
Relevant answer
Answer
Again, there is no way that a dichotomous variable could have a non-linear relationship with another variable. Therefore, no need or possibility to check for non-linearity.
You can see this by plotting the dichotomous variable against the DV in a scatter plot. The dichotomous variable has only two possible values on the x axis. The OLS regression line will go through the group means. You could not fit anything other than a straight line through the two means.
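This can be checked numerically: with a 0/1 predictor, the OLS intercept equals the mean of group 0 and the slope equals the difference between the two group means, which is exactly the "line through the group means" described above. A small Python sketch on made-up scores:

```python
# With a dichotomous predictor coded 0/1, OLS reproduces the group means:
# intercept = mean of group 0, slope = mean difference. Scores are made up.
from statistics import mean

group0 = [4.0, 5.0, 6.0, 5.5]   # coded x = 0
group1 = [7.0, 8.5, 8.0, 7.5]   # coded x = 1

x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print(f"intercept = {a:.3f} (group-0 mean = {mean(group0):.3f})")
print(f"slope     = {b:.3f} (mean difference = {mean(group1) - mean(group0):.3f})")
```

Since the line is fully determined by two means, there is simply no room for curvature, which is the point of the answer above.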
  • asked a question related to Multiple Linear Regression
Question
6 answers
My dataset is not very big (between 100-200). the residuals are not normally distributed. so:
1. Is there any other statistical method similar to multiple linear regression but suitable for this case?
2. If not, what can be the solution?
Thank you
Relevant answer
Answer
Look at the severity of the violation. It may be just negligible given the large sample size.
  • asked a question related to Multiple Linear Regression
Question
1 answer
When performing sensitivity analysis of the activated sludge model, multiple regression analysis of the different parameters (variables) with the model output is required. However, the model output is also a function of time, so I am confused how to implement a multiple linear regression of the model output with the parameters (variables) in MATLAB to find the linear regression coefficients.
Relevant answer
Answer
Add the function of time as another IV just like you would any other eg an interaction term. David Booth
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hello dear community,
I have a question regarding multiple linear regression (MLR), moderation analysis and median splits (dichotomization). For the context: I have a dichotomous independent variable OKO, a continuous moderator KS and the dependent variable IPL. The question I am asking myself is whether I should dichotomize the moderator variable KS. The reason I am asking is the following set of results.
Regressing IPL on OKO and OKO*KS (KS as continuous variable) yields the unstandardized regression coefficients:
OKO - 2.017
KS - 0.2189
OKO*KS 0.6475
=> meaning that KS dampens the negative relation between OKO and IPL.
However, if I include the dichotomous variant of KS (KS_M) into the regression instead of KS (continuous variable) I get the following unstandardized regression coefficients:
OKO 0.272
KS_M -1.312
OKO*KS_M 1.820
=> meaning that KS amplifies the positive relation between OKO And IPL.
Can someone explain to me why I get contrary results?
THANK YOU
Relevant answer
Answer
Thou shalt not dichotomize...
There is really no good reason to dichotomize a continuous moderator as this can introduce all kinds of problems. Simply use the continuous version and plot the interaction as suggested by David Booth.
  • asked a question related to Multiple Linear Regression
Question
7 answers
For my data analysis I am conducting a multiple linear regression model. Currently I am testing the following assumptions:
1. Normality of residual errors
2. Multicollinearity
3. Homoskedasticity
Since the normality of residual errors was violated, I transformed my dependent variable into a log variable. However, I am wondering whether I should now continue testing multicollinearity and homoskedasticity with this new log variable or still use the variable before data transformation? Moreover, when writing descriptives & correlation matrix should I include the variable before data transformation or the log?
Hopefully someone can help me! :)
Relevant answer
Answer
Julie Schipper -
For heteroscedasticity, even after a transformation - which I would only do if necessary, and not to address heteroscedasticity - there can still be heteroscedasticity. I recall an example among those that Penn State puts on the internet where a transformation had been done for purposes of addressing heteroscedasticity in a real estate problem, and they still had substantial heteroscedasticity remaining. You can have a kind of artificial, I say "nonessential" heteroscedasticity, which may be the result of a problem with your model, but the natural, I say "essential" heteroscedasticity that you should expect can be impaired, again, by a problem with your model. Data quality issues can also be involved, etc. See https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR.
With enough data, you can measure it well using the following:
Defaults are mentioned in one sheet of that Excel file, and it would be reasonable for the coefficient of heteroscedasticity as defined here to be from 0.5 to 1.0. See https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, as determined by Ken Brewer.
For more information, see the following:
If you are working with data which have already been transformed, then that changes things in ways perhaps both good and problematic. The problematic can also include problems of interpretation, as Jochen noted. I suggest only using transformations if you absolutely must, even if you are going to "back-transform" as Salvatore mentioned. I think that the simpler you can keep things, the better.
Best wishes - Jim
  • asked a question related to Multiple Linear Regression
Question
4 answers
My data analyst has used a concept called "pre-hypothesis"; it isn't equivalent to the hypothesis equation, but it is for checking Durbin Watson.
My problem is that I haven't found any source that has used the same concept (i.e., pre-hypothesis) for Multiple Linear Regression in general and Durbin Watson in particular.
I would be more than grateful if anyone could provide me with:
1. an explanation of "pre-hypothesis" in this concept, and if that equals "assumption"; and
2. a source in which "pre-hypothesis" is used and explained.
Relevant answer
Answer
Ask your analyst to explain it, because nobody else uses this terminology. Good luck on finding a better analyst.
Best wishes David Booth
  • asked a question related to Multiple Linear Regression
Question
3 answers
Hello, I'm a student researcher looking to apply multiple linear regression to my data set of all-continuous values. My question is mainly about a step-by-step idea of how to do this, as I believe I understand the general gist of it, but not quite the whole picture.
To start, it is my understanding you plot a scatterplot of your independent vs dependent data and your independent vs independent data. This helps determines if there's a linear relationship between the variables, and if there's any collinearity between your independent variables. In doing so, after this step I can eliminate the independent variables that either have a strong collinearity or no relationship with the dependent variable.
After the initial screen, I am under the impression I can run a step-wise(or forward/backward) multiple linear regression slowly adding in the variables that fits the BEST MODEL. Is that a correct way to look at it?
Relevant answer
Answer
Doing a meaningful analysis is, unfortunately, not that simple.
The problem is possible interactions between variables, which may either obscure non-linear relationships with the response variable or wrongly suggest a non-linear relationship if the impact of the other variables is not considered at the same time (i.e. when you look at the marginal relationships). There is no easy way out. You need to make some assumptions, and they should be reasonable - and justifying this is not done by means of statistics but by expert knowledge of the subject matter. A help - to a certain limited amount - can be residual diagnostics: if the residuals do not show any pattern and seem to be a sample from a normal distribution, then this is a slight hint that the model might be appropriate, or at least that it does not miss anything really important (like an important interaction or non-linear relationship).
Stepwise variable selection is a very bad approach to building a model. You should definitely avoid it. The selection of variables (and non-linear relationships, and interactions) should be based on theoretical considerations, expert knowledge of the subject matter and the purpose of the model. If you consider a variable important enough to include it in the model, then there should not be a statistical reason (like the lack of a statistically significant effect estimate given your data) to exclude it from the model.
If you have many variables, and you are just looking at whether they allow you to somehow predict the outcome, modelling using linear models (or the like) may actually not be the best choice. You may consider regression trees, random forests or deep learning approaches. It's evident that this needs quite a lot of data, but this is the price you have to pay if you have no theoretical concept but still want to create/build some tool that uses data to make good predictions.
And, finally, there is no BEST MODEL. There are many different possible models, all with their own pros and cons. Only if you have very, very specific demands on what the model should do (and if you are able to formulate this in a mathematical way) may you be able to identify a specific model that is best suited for exactly this purpose. And again, deciding on this best model is not (only) a matter of statistics you calculate for your observed data - it remains mainly a task requiring expert knowledge of the subject matter.
If you are expert in the subject matter, it may be very helpful to cooperate with a local statistician to identify a good solution.
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hi! I am trying to run the following multiple linear regression in R:
r ~ condition (1 = Control, 2 = Active Control, 3 = Treatment, 4 = Importance Treatment) + type (0 = false, 1 = true) + age (13, 14, 15, adult) + domain (1 = eco, 2 = health, 3 = society, 4 = culture)
r is the intention to share a certain headline for a certain participant (initially given on a 6-point scale, with the score then rescaled to lie between 0 and 1). Participants are randomly assigned to 4 conditions, and we want to test the respective effect of the 2 treatment conditions vs. the 2 controls on the intention to share false headlines (we predict they will reduce the sharing of fake news without impacting the sharing of real news). In all the conditions the main task consists in rating the intention to share 24 headlines presented successively, half false and half true (so we also want to know the effect of the "type" of the headline, true or false, and eventually the effect of its category: within the 2 sets of headlines, we always have 1/4 of headlines on an economic subject, 1/4 on health, 1/4 on "society", and 1/4 on culture, although this is less important). Finally, we are testing different age groups (13, 14, 15 years old, as well as an "adult" group pooling participants aged from 25 to 36), to know if the predicted effect varies with age etc.
The main hypothesis of our study are :
-that treatments will improve discernment, defined as the difference between the intentions (r > 0.5) to share true news and the intentions to share fake news (ie it will really improve the quality of sharing, and not merely causing general skepticism),
-that this effect will be higher for headlines perceived as the most inaccurate (there will be an analysis on headlines as well, based on pretest), as we suppose the effect of the treatment works by refocusing the attention of the participant on the accuracy criterion, hence being greater for headlines that were generally (we will calculate a mean perceived accuracy for each headline across participants) perceived as the least accurate ones, and that will be consequently the ones for which sharing intentions will drop the most thanks to the treatment
-We have no particular prediction concerning age, which is precisely the novelty of the study (the literature review concerning adolescents' ability to evaluate fake news and their sharing online leaves the door open to quite different scenarios)
-and no prediction for domain as well (we expect it won't play a role as the headlines have been chosen to be quite similar in tone, whatever the category)
Also, I need to cluster by participant (since there are repeated measures for each participants) and headline (multiple ratings for each headline).
I thought of using lm_robust but I don't know if we can put 2 clusters directly?
I also wonder what is the simplest way to check for potential effect of other secondary measures (like scores on a Cognitive Reflexion Task, gender, CSP appartenance etc): do I have to do regression testing all simple effects, interaction etc, or can I just add them to the global formula?
Thanks in advance!
Relevant answer
Answer
Thank you very much for all your answers! I have unfortunately no choice but to use R... so I am going to go for the lme4 method.
Thanks again to you all!
  • asked a question related to Multiple Linear Regression
Question
8 answers
Through a survey, I have measured the Big Five Personality traits and Task Performance.
Now I'm about to do the regression analysis, however, I'm not sure if I should do a simple linear or multiple linear regression.
In the case of openness when I run a simple linear regression (DV: task performance IV: openness) my results are:
Sign: .001
Unstandardized B= .494
so the results are positive & significant
But when I run multiple linear regression (DV: task performance, IV: openness, conscientiousness, agreeableness, extraversion, neuroticism) my results for example for openness are:
Sign: .311
Unstandardized B= .110
so the results are positive & not significant
Now, I'm confused about which one should I use and why.
Relevant answer
Answer
The multiple regression result shows you that there is partial redundancy between the independent variables (they are correlated and overlap in terms of "explaining" the DV). Openness is no longer a significant predictor once the other predictors are in the model because one or more of them accounts for (partially) the same portion of variance as does openness.
Another issue could be overfitting. With additional correlated predictors, standard errors will become larger, reducing statistical power. Therefore, with more predictors in the model, previously significant predictor variables may become insignificant.
The question is what you want to find out. If your goal is to determine the most relevant predictor variables then the multiple regression model would be preferred because it accounts for the redundancy of the predictors. Depending on your sample size, you may not have enough power though given the number of predictors.
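The partial-redundancy point can be illustrated with the standardized two-predictor formula beta1 = (r_y1 - r_y2*r12) / (1 - r12^2). The correlations below are hypothetical stand-ins for openness and one correlated trait:

```python
# How a predictor's weight shrinks once a correlated predictor
# enters the model. Correlations are hypothetical.
r_y1 = 0.45   # predictor 1 with the DV (bivariate)
r_y2 = 0.50   # predictor 2 with the DV
r_12 = 0.70   # the two predictors correlate with each other

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
print(f"bivariate effect of predictor 1: {r_y1}")
print(f"its weight with predictor 2 in the model: {beta1:.3f}")
```

The bivariate effect of 0.45 shrinks to roughly 0.20 once the overlapping predictor is controlled, which is the redundancy mechanism described above.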
  • asked a question related to Multiple Linear Regression
Question
5 answers
I want to know how age, obesity (0 for no and 1 for yes), smoking status (0 for no smoking and 1 for having smoking) affect the serum Vitamin D level using regression with SPSS.
The best-fit curve for age reported: Linear (R2=0.108), Quadratic (R2=0.109), Cubic (R2=0.106)
So I assumed that the E(VitaminD)= b0 + b1*Age +b2*Age^2
But if want to put this into a multiple regression analysis with age, obesity, and smoking status as independent variables, which one should I use? Age+obesity+smokingStatus or Age^2+obesity+smokingStatus?
Relevant answer
Answer
Ho Nguyen Tuong -
Are you sure you just want "yes" or "no" for obesity? How about a BMI value? Also, how much smoking? At any rate, you can compare model results for a given sample using a "graphical residual analysis." (You can research that online.) Also, you probably want to consider a "cross-validation," as you do not want to overfit a model to a particular sample when that may mean your model won't fit so well to other data which you wanted to cover. Trying more than one sample could help.
Best wishes - Jim
  • asked a question related to Multiple Linear Regression
Question
6 answers
I have a query regarding which type of regression analysis to use for my study. I have used a scale (dependent variable) that contains 9-items and each item is marked on a 5-point Likert scale. Scores of each item are summed and ranges from 9 to 45. Higher score indicates the respondent has more characteristics of that construct.
Similarly, there are two independent variables. One IV has 20 items, each marked on a 4-point Likert scale; scores range from 20 to 80. The second IV has 7 items, each marked on a 5-point Likert scale; scores range from 7 to 28.
The reviewer has suggested me to use non-parametric tests since my data is ordinal. However, previous studies have used Multiple Linear Regression using similar types of constructs.
Which type of regression analysis is appropriate in this case - ordinal regression or multiple linear regression? Any literature explaining this would be highly useful.
Relevant answer
Answer
I'd normalize the DV (that is, rescale it so that the value will be with 0...1) and use a beta-model or a quasi-binomial (regression) model.
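For the scale in this question, assuming its theoretical 9-45 range, the rescaling to 0...1 is a simple linear map, (score - min) / (max - min):

```python
# Rescale a 9-45 sum score onto the 0-1 interval required by a
# beta / quasi-binomial model. The raw scores are made up.
raw_scores = [9, 18, 27, 36, 45]
lo, hi = 9, 45   # theoretical minimum and maximum of the 9-item scale

scaled = [(s - lo) / (hi - lo) for s in raw_scores]
print(scaled)  # -> [0.0, 0.25, 0.5, 0.75, 1.0]
```

One caveat: beta models cannot handle values of exactly 0 or 1, so a small compression such as (y*(n-1) + 0.5)/n, with n the sample size, is commonly applied before fitting; a quasi-binomial model does not need this adjustment.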
  • asked a question related to Multiple Linear Regression
Question
3 answers
I have been working with a GAM model with numerous features(>10). Although I have tuned it to satisfaction in my business application, I was wondering what is the correct way to fine tune a GAM model. i.e. if there is any specific way to tune the regularizers and the number of splines; and if there is a way to say which model is accurate.
The question is actually coming from the point that on different level of tuning and regularization, we can reduce the variability of the effect of a specific variable i.e. reduce the number of ups and downs in the transformed variable and so on. So I don't understand at this point that what model represents the objective truth and which one doesn't; since other variables end up influencing the single transformed variables too.
Relevant answer
Answer
The cross-validation utilities in Python's scikit-learn work very well for tuning hyper-parameters.
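As a library-free illustration of what those cross-validation utilities do, here is a minimal k-fold sketch; the `fit` and `loss` callables are placeholders for your GAM fitting routine and validation loss, and you would rerun `cv_score` once per candidate penalty setting:

```python
def kfold_indices(n, k):
    """Split range(n) into k consecutive folds (shuffle first in practice)."""
    folds, start = [], 0
    base, extra = divmod(n, k)
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_score(data, fit, loss, k=5):
    """Average held-out loss over k folds; keep the hyper-parameter
    setting with the lowest score."""
    n, total = len(data), 0.0
    for fold in kfold_indices(n, k):
        held = set(fold)
        train = [data[i] for i in range(n) if i not in held]
        test = [data[i] for i in fold]
        model = fit(train)
        total += loss(model, test)
    return total / k
```

Comparing penalties by held-out loss, rather than by in-sample fit, is what guards against the "best fit to the peculiarities of the data" problem raised in the question.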
  • asked a question related to Multiple Linear Regression
Question
2 answers
Hi everyone, I am working with a disjunctive model for decision-making. But I'm a bit confused: how can I determine the value of Co (a constant set above the largest Xi to ensure Y will not be infinite) when doing multiple regression analysis in SPSS? Does it have any standard value?
Relevant answer
Answer
I'm confused about the model you are trying to use... Is there one value for Y and several values for X and for B? And then you need to determine the best value for Co?
  • asked a question related to Multiple Linear Regression
Question
9 answers
I seem to have a hard time with the statistics for linear regression. I have been scrolling on the Internet, but did not find an answer.
I am testing the assumptions for linear regression, one of which is homoscedasticity. My data, however, show a heteroscedastic pattern in the scatterplot. But how do I check whether this is correct, i.e., whether the data are actually heteroscedastic?
Could I still perform the linear regression even though my data seems to be heteroscedastic?
I have tried transforming my DV with ln, lg10 and sqrt but the heteroscedasticity remains visible.
Relevant answer
Answer
Ellen Kroon, an alternative would be to estimate/use Heteroskedasticity Robust Standard Errors in your significance testing. This can be done easily in Stata or R. See the link below for more: https://www.r-econometrics.com/methods/hcrobusterrors/
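For intuition, here is what HC0 ("White") robust standard errors compute in the simple one-predictor case; this is a sketch only (illustrative function names), and in practice you would use the Stata or R implementations linked above:

```python
import math

def ols_slope(x, y):
    """OLS intercept/slope and residuals for a one-predictor regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return b, resid, sxx, xbar

def hc0_se(x, y):
    """HC0 (White) robust SE of the slope:
    sqrt(sum((x_i - xbar)^2 * e_i^2)) / Sxx."""
    b, e, sxx, xbar = ols_slope(x, y)
    meat = sum(((xi - xbar) ** 2) * ei ** 2 for xi, ei in zip(x, e))
    return math.sqrt(meat) / sxx
```

Unlike the classical SE, each squared residual is weighted individually, so the estimate stays valid when the residual variance changes across x.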
  • asked a question related to Multiple Linear Regression
Question
4 answers
There are different regression models used in wind speed forecasting. I need some relevant papers, reading sources, and published writings on multiple linear regression, moving-average regression, and K-nearest-neighbour classification and regression methodology to develop a good understanding.
  • asked a question related to Multiple Linear Regression
Question
2 answers
I have 3 DVs and 5 predictor variables (technically, only 1 IV and the other 4 are controls).
I ran it on Stata and all seems to have worked (since mathematically, all predictor variables are treated the same way), but technically I only identify one of my predictor variables as the IV of interest. I would rather stay away from SEM. Thank you so much!
Relevant answer
Answer
Thank you very much and hope you are staying warm, Xingyu Zhou
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hello everyone!
Can we report the results of independent sample t-tests and multiple linear regression analysis in one table? If yes, what information should be put in that table? Thank you very much.
Relevant answer
Answer
Hello Ayyu,
Reporting conventions differ by outlet, so you may wish to consider what is usual and customary for your target.
If you're talking about summary statistics for the variables involved (means, SDs, correlations among variables), then yes; one table can suffice.
If you're talking about the outcomes of the two methods, then no, generally not. The exception would be if the table was restricted to: (a) test name; (b) p-value; and (c) indicator of whether result was statistically significant.
For MLR, you'd generally want to present: (a) regression coefficients (both unstandardized and standardized) for each independent variable; (b) test of significance for that variable's coefficient; and in a summary, the overall R-square and test of whether the overall regression was significant.
For t-test, a table really isn't needed, beyond that for summary statistics. In text, something like: "t(42) = 4.22, p < .001" can suffice.
Good luck with your work.
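As a check on figures reported in that style, the pooled-variance t statistic and its degrees of freedom can be recomputed by hand; a minimal sketch (illustrative function name):

```python
import math

def pooled_t(x, y):
    """Two-sample pooled-variance t statistic and its degrees of freedom,
    the quantities behind a report like 't(42) = 4.22'."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```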
  • asked a question related to Multiple Linear Regression
Question
11 answers
I want to study risk factors for a dependent variable, the percentage of lame cows at the farm level, using multiple linear regression in SPSS. Is that possible?
Relevant answer
Answer
Christer Thrane thank you for your responses. I appreciate your help
  • asked a question related to Multiple Linear Regression
Question
5 answers
Hello my friends. I have a set of independent variables measured on a Likert scale, and one dependent variable, also measured on a Likert scale. I have done the analysis and want to be sure I'm doing this right: how can I use age, gender, work experience, and education level as control variables to measure their effect on the relationship between the independent variables and the dependent variable? Please give me one example. Thanks.
Relevant answer
Answer
You may include all of your control variables - age, gender, work experience and education level - in one block and begin the regression analysis with them. Next, you include the independent variables that are supposed to have an effect on your dependent variable. I would recommend entering them one after another so that you can observe the change each new variable causes in the ones already included and in the explained variance of the dependent variable. Once you have entered all variables, you will have controlled for the initial four variables, presumably rendering their influence on the dependent variable insignificant.
I do not have an article at hand, but perhaps someone else can help you out.
Best
Marcel
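The blockwise strategy described above amounts to comparing explained variance before and after the substantive IVs are added. A rough, library-free sketch of that comparison (a toy normal-equations solver; in a real analysis you would use SPSS, R, or Stata):

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting
    (fine for the small systems that arise here)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_r2(X, y):
    """Fit y on the columns of X (an intercept is added) via the normal
    equations and return R-squared."""
    n = len(y)
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(Z[i][a] * Z[i][c] for i in range(n)) for c in range(k)]
           for a in range(k)]
    Xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    yhat = [sum(bj * zj for bj, zj in zip(beta, Z[i])) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

Calling `ols_r2` once with the control columns only and once with controls plus the substantive IVs gives the increment in explained variance (ΔR²) attributable to the IVs.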
  • asked a question related to Multiple Linear Regression
Question
4 answers
I'm interested in comparing multiple linear regression and artificial neural networks for predicting the production potential of animals from certain independent variables. However, I have obtained negative R-squared values for certain model architectures. Please explain the reason for negative prediction accuracy or R-squared values.
Relevant answer
Answer
If R² for your regression is negative, it means that your regression predicts worse than the simple mean-value predictor (i.e., when you simply predict y = mean(y) for every case).
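That definition can be made concrete: R² = 1 − SS_res/SS_tot, which drops below zero whenever the residual sum of squares exceeds the total sum of squares around the mean. A minimal sketch with made-up numbers:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; negative whenever the predictions are
    worse than always predicting the mean of y."""
    ybar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - ybar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0]
print(r_squared(y, [2.0, 2.0, 2.0]))  # 0.0: exactly the mean predictor
print(r_squared(y, [3.0, 1.0, 2.0]))  # -2.0: worse than the mean
```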
  • asked a question related to Multiple Linear Regression
Question
8 answers
A manpower deployment model built using multiple linear regression got a high p-value (above 0.05) for the final model. This research is basically an empirical study. The reason for the high p-value was a problem with the data we received for the independent variables (not highly accurate). The data sample is 240. My final question: is it reasonable to conclude this research with a final model with high p-values? We found several other reasons related to data errors (organizational data-entry problems), and we have good reasons to present the overall model. Is it okay to conclude in that way?
**The overall model has one dependent variable and two independent variables (the two independent variables are also highly correlated with each other).**
Relevant answer
Answer
For "graphical residual analyses," you could compare different models on the same scatterplot for the same sample. (Different models here would have one predictor, or the other, or both, with or without an intercept if that is in question, though subject matter knowledge could tell you if y is zero when the predictor or predictors are all zero.) For "cross-validation," you could try this on different subsamples, but results for other data not collected could still be different.
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hi,
I have 2 categorical and 1 continuous predictor (3 predictors in total), and 1 continuous dependent variable. The 2 categorical variables have 3 and 2 levels, respectively. I have only dummy coded the variable with 3 levels, and directly assigned 0 and 1 to the variable with only 2 levels (my understanding is that if a categorical variable has only 2 levels, dummy coding is not necessary?).
In this case, how do I do and interpret the assumption tests of multicollinearity, linearity and homoscedasticity for multiple linear regression in SPSS?
Thank you!
Relevant answer
Answer
Yufan Ye -
Have you looked at a "graphical residual analysis?" You can search on that term if you aren't familiar. It will help you study model fit, including heteroscedasticity. Also, a "cross-validation" may help you to avoid overfitting to the sample at hand to the point that you do not predict so well for the rest of that population or subpopulation which you wish to be modeling.
If this model is a good fit, I expect you will likely see heteroscedasticity. See https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR.
Plotting your data may be very helpful, and much more informative and practical than "assumption tests."
Cheers - Jim
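On the coding step raised in the question: dummy coding a 3-level variable produces two 0/1 columns with the reference category as the all-zeros row, and for a 2-level variable it collapses to the single 0/1 column the questioner already created. A minimal sketch with illustrative data (function name is not from any particular package):

```python
def dummy_code(values, reference):
    """0/1-code a categorical variable, dropping the reference level;
    the reference category becomes the all-zeros row."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return rows, levels

rows, cols = dummy_code(["a", "b", "c", "a"], reference="a")
# cols == ["b", "c"]; rows == [[0, 0], [1, 0], [0, 1], [0, 0]]
rows2, cols2 = dummy_code([0, 1, 0], reference=0)
# a 2-level variable collapses to one 0/1 column: [[0], [1], [0]]
```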
  • asked a question related to Multiple Linear Regression
Question
4 answers
  • When it comes to multiple linear regression: if we take time as the dependent variable, do the remaining independent variables also have to be measured in time? Here, MLR is used to build a model of task-completion time. Does every independent variable have to be in time units, or can we use, say, the number of people as an independent variable?
Relevant answer
Answer
The units in which the dependent and independent variables are measured do not have to be the same for multiple linear regression. For example, you could try to predict job success in salary dollars (DV) from IQ scores (IV1), number of years of education (IV2), and gender (IV3). In this example, all variables would have different units of measurement.
  • asked a question related to Multiple Linear Regression
Question
15 answers
R² for MLR is approximately 10-20% at various train:test ratios,
but for ANN it is 1-4%.
Why?
Thanks in advance
Relevant answer
Answer
What is the size of your data set? Is it large or small? MLR is a parametric model that makes certain assumptions. A neural network needs big data because it uses many parameters. That may be one reason for the difference. Please read more about parametric vs. non-parametric models.
  • asked a question related to Multiple Linear Regression
Question
3 answers
Hi everyone! I'm running multiple regression models on incomplete data using R, and I applied the MICE algorithm to deal with the missing data. I've been able to get the pooled coefficients (B, t-tests, p-values) with no effort using the available scripts, but I couldn't find a way to obtain "goodness of fit" measures (like adjusted R squared and F) for the pooled data (not individual imputations or original data, but pooled values). I got the same problem using multiple imputation in SPSS.
Thank you very much for your attention, any help will be greatly appreciated!
Relevant answer
Answer
I would suggest using the pool.r.squared() function from the mice package.
It should give your desired results if you are using the lm modelling function.
Best wishes,
Francesco
  • asked a question related to Multiple Linear Regression
Question
7 answers
Dear colleagues,
I want to raise this question to the community. I know that in the ANOVA test where we compare means, for example, in different groups we will have problems with multiple comparisons since we only know the F-test results but group-to-group difference is unknown. Therefore, we would choose multiple comparison correction methods, such as Tukey's, Scheffe's, or Bonferroni, to adjust the p-value and explore each pair difference and significance.
However, I am conducting a multiple linear regression analysis applying the stepwise (backward selection) method. That means I have a DV (QoL scores in eight different domains and two summary components; the scores are continuous; the Rand-36 or SF-36) and a group of potential associated IVs (factors; categorical and continuous types).
For the models that I get from the auto-selection process (stepwise and backward), I would like to ask: will there be problems with multiple comparisons? Why? And what would be the recommended solutions to this kind of multiple-comparisons problem in multiple linear regression model building? Thank you!
Relevant answer
Answer
There will be problems because you used an auto-selection process.
You can fit a given model to observed data, but you cannot use observed data to specify the model that should be fitted to it. Model specification should be driven by theory. Certainly, observed data may give you ideas about an improved model specification ("exploratory data analysis"), and such an improved model can be fitted to new observed data in the future.
Automated model selection is a kind of extreme exploratory analysis: you completely lose the possibility of interpreting any statistical significance of the data. This is in principle nothing bad; just don't interpret significance and everything is "fine". However, the automated process is clearly not recommended, because it will inevitably just find the best fit to all the peculiarities of your observed data set, disregarding any theory, which is never a very clever thing to do.
  • asked a question related to Multiple Linear Regression
Question
2 answers
I am looking at whether stress levels decrease from time point 1 to time point 2 when participants engage in recommended resources.
DV's - stress levels from time point 1 and time point 2
IV's - engaged in resource 1 and resource 2
Relevant answer
Answer
Yep, I would agree, and recommend using residualised change scores.
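A minimal sketch of residualised change scores, assuming a simple regression of the time-2 score on the time-1 score; the residuals then serve as the baseline-adjusted outcome (function name is illustrative):

```python
def residualised_change(t1, t2):
    """Regress time-2 scores on time-1 scores; the residuals serve as
    baseline-adjusted change scores."""
    n = len(t1)
    m1, m2 = sum(t1) / n, sum(t2) / n
    sxx = sum((x - m1) ** 2 for x in t1)
    b = sum((x - m1) * (y - m2) for x, y in zip(t1, t2)) / sxx
    a = m2 - b * m1
    return [y - (a + b * x) for x, y in zip(t1, t2)]
```

The resulting scores can then be regressed on the resource-engagement IVs.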
  • asked a question related to Multiple Linear Regression
Question
9 answers
Regarding either a linear or a machine-learning-based regression analysis, how should we perform the normality test for the model? Should we consider all the data, or just the training or testing dataset? I would be grateful if anyone could describe this in more detail!
Relevant answer
Answer
Non-normality will not matter much, outliers will.
  • asked a question related to Multiple Linear Regression
Question
3 answers
Hello,
I am using a multiple linear regression model while analyzing the UTAUT2 model. However, the Durbin-Watson value came out as 2.080. How do I check whether the survey data has any negative autocorrelation, given that the result is >2? The sample size is 209.
I checked the DW table from Savin and White (Durbin-Watson Statistic: 1 Per Cent Significance Points of dL and dU) and it lies in dU (k=10). Can anyone help explain my situation?
If negative autocorrelation exists, how can I address it without increasing the sample size?
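The Durbin-Watson statistic itself is easy to recompute from the residuals as a sanity check; values near 2 indicate no first-order autocorrelation, below 2 positive, and above 2 negative autocorrelation. A minimal sketch:

```python
def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). Values near 2 suggest no
    first-order autocorrelation; below 2 positive, above 2 negative."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: sign-flipping residuals
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: strongly positive
```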
  • asked a question related to Multiple Linear Regression
Question
4 answers
Hey all,
For my master's thesis I am conducting research to determine the influence of certain factors (ghost games, lack of fan interaction, esports) on fan loyalty. For my statistical analyses, I will first conduct confirmatory factor analysis to validate which items (e.g., purchased merchandise) belong to which latent factor (e.g., behavioral loyalty).
However, I am unsure about my next step. Can I use multiple linear regression with my latent variables to identify the relationship between the factors and loyalty? The data were collected through a survey of mainly 7-point Likert-scale questions. Can I use linear regression, or is ordinal regression a must with Likert-scale data?
Thanks in advance for answering!
Relevant answer
Answer
In statistics, it's crucial to test the assumptions that go along with an analysis and to determine the suitability of the data set for the various analyses. Using factor analysis, a large number of related variables is reduced to a smaller, more manageable number of factors before they are used in other analyses such as multiple regression or multivariate analysis of variance.
  • asked a question related to Multiple Linear Regression
Question
8 answers
My research is on the accuracy of ECG interpretation among doctors.
Dependent variable - ECG score (normally distributed)
Independent variables - sociodemographic characteristics (not normally distributed)
Which do I use for one-to-one correlation:
Pearson/Spearman, or simple linear regression?
What are the differences?
I was told they were the same,
but the results are different.
I want to proceed with multiple linear regression.
Relevant answer
Answer
All the fellows have contributed well. Whenever we analyze data, we ask ourselves whether we want to check whether variables are related or whether one is said to have an impact on the other. If the question is about measuring the strength of a relationship, we can use cross-tabulation (nominal or ordinal data), Spearman (ordinal data), or Pearson (continuous data).
But if the question is about one variable being the reason for the other, then we need to use regression.
  • asked a question related to Multiple Linear Regression
Question
2 answers
Hi there. Essentially, I have collected data for my study that includes the frequency of particular injuries, e.g., sprained ankle x 5. I also have the experience of each individual, e.g., they have played for 5 years. Because the years played vary, in order to standardise the data and make it comparable, I divided injury frequency by the total number of years each person has played, e.g., 5 injuries across 5 years gives a standardised frequency of 1.
My question is: when I look at the effect of experience on my standardised injury frequency, does the fact that I used experience as the standardising factor affect the published results? I only ask because my multiple linear regression results (there are other variables) show that experience has a significant negative effect, which logically shouldn't happen, as more exposure should increase the probability of sustaining an injury.
Thanks in advance
George
Relevant answer
Answer
In regards to experience, there are several ways to look at it. Many believe that an inverse correlation between skill level (which is closely aligned with experience) and injury exists--which, based on my research and many others, favors that point of view. However, on the other hand, since the most experienced athletes will be on the field or will play more than their inexperienced peers, the opportunity for injury exposure would be greater. Lastly, more experienced players may be more prone to attempt riskier maneuvers during play, increasing the potential for injury.
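The questioner's concern is also well founded on purely arithmetic grounds: dividing by years and then regressing on years can build in a negative relationship (ratio bias). A deterministic toy example with made-up numbers, where raw injuries rise with experience yet the per-year rate falls:

```python
# Made-up numbers: injuries rise with years played, plus a fixed baseline.
years = [1, 2, 4, 8]
injuries = [y + 2 for y in years]            # [3, 4, 6, 10] - increasing
rates = [i / y for i, y in zip(injuries, years)]
# rates == [3.0, 2.0, 1.5, 1.25]: the per-year rate *falls* with
# experience even though raw injuries rise, because the denominator
# (years) grows faster than the numerator.
```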
  • asked a question related to Multiple Linear Regression
Question
3 answers
As part of my research, I have to analyse 10 years of time-series financial data for four public limited companies using multiple linear regression. First I analysed each company separately. The adjusted R-squared value is above 95% and the VIFs are well within limits, but the Durbin-Watson statistic is 2.4, 2.6, 1.1, etc., which signifies either positive or negative autocorrelation. Then I tried the model on the combined data of all four companies. This gives a much lower adjusted R-squared (35%) and again positive autocorrelation (Durbin-Watson of 0.94). As I am doing a DuPont analysis, where the dependent variable is Return on Equity and the independent variables are Net Profit Margin, Total Asset Turnover, and Equity Multiplier, which are fixed, I cannot change the independent variables to reduce the effect of autocorrelation. Please suggest what I should do.
Relevant answer
Answer
Time-series analysis offers various models, such as autoregressive models, that can explain the behaviour of your data and on whose basis you can project the DV into the future.
  • asked a question related to Multiple Linear Regression
Question
7 answers
If the dependent variable is continuous, is it justifiable to use both categorical (region, gender, history of disease, etc.) and continuous (direct cost, indirect cost) independent variables in a multiple linear regression model in epidemiological studies?
Relevant answer
Answer
Hi,
Categorical variables used as IVs can be converted into dummy variables and studied.
  • asked a question related to Multiple Linear Regression
Question
6 answers
Hi everyone,
My results show for some models that the model itself is not significant, but some independent variables within the model are significant (including the constant) in SPSS.
I was wondering how I should interpret these results? What can I say/conclude and what not?
Thank you in advance,
Minke
Relevant answer
Bruce Weaver, thank you for this clarification. Greatly appreciated.
  • asked a question related to Multiple Linear Regression
Question
6 answers
Dear All,
I am working on data with cost of care as the DV. The data are genuinely skewed, reflecting the socioeconomic gap, and therefore the healthcare-financing gap, in the population of a developing country. Because of this skewness, my data violated the normality assumption and were therefore reported using the median and IQR. But I would like to analyze predictors of cost of care among these patients.
I need to know if I can go ahead and use MLR, or whether there are alternatives?
The sample size is 1,320 and I am thinking of applying the Central Limit Theorem.
Thanking you for your anticipated answers.
Dimeji
Relevant answer