Science topic
Multiple Linear Regression - Science topic
Explore the latest questions and answers in Multiple Linear Regression, and find Multiple Linear Regression experts.
Questions related to Multiple Linear Regression
If I select the learning ecosystem as the independent variable, it encompasses the learning environment, teaching environment, technological environment, etc. Should I use simple linear regression or multiple linear regression to examine the impact of the learning ecosystem on the dependent variable, for example, students' achievement?
I conducted a multiple linear regression using gretl for my exploratory research. The data came from published reports from 2013 to 2021, as this was the period for which all the yearly data were recorded. The results show that two of my variables are significant at 10% and the R-squared is 0.664428, but the overall model fit is F(3,5) = 3.299978 with p-value (F) = 0.115750. Can I go forward with this result, and how can I justify it?
Do you:
1) use G*Power and choose an effect size based on your best guess/ effect size from literature
2) follow Green's (1991) rules of thumb (e.g., N > 50 + 8m, or N > 104 + k)
3) apply Maxwell's (2000) rule of thumb
4) Any other methods?
I generated six equations: three for soaked CBR and three for unsoaked CBR. How will I know the best two equations to choose, one for each CBR, from the equations generated? Thank you.
Three soil samples were used for the study, and six equations were generated from the model, comprising three for unsoaked CBR and three for soaked CBR. How would I know the best model for the unsoaked and soaked CBR? Thank you, Sirs and Mas.
I ran multiple linear regression in SPSS and it turned out that one of the variables was not statistically significant.
In the multiple regression equation formula, if X1 is not significant (p < 0.05), can this variable be entered into the equation?
Y = A + B1X1 + B2X2 + ...
I am working on predicting the labor force participation rate (dependent variable) by country using multiple linear regression. I'm not sure whether it is a count or a continuous variable. The labor force participation rate is computed as (Labor Force ÷ Civilian Noninstitutional Population) × 100.
Hello !!!
Since standard multiple linear regression takes into account neither the within-subject structure nor the fact that my dependent variable is ordinal, I need to control for individual heterogeneity. For this, it's possible to run a regression model with clustered standard errors to account for the pattern within subjects.
Can anyone explain how to run a regression model with clustered standard errors in IBM SPSS?
Explanation of my survey:
112 participants indicated their level of agreement with different statements after reading a scenario, on a 7-point Likert scale (from strongly disagree to strongly agree).
The statements measured my dependent variables: work motivation, bonus satisfaction, collaboration, and help. There were a total of 7 different scenarios. I also control for gender, age, and status (employed, self-employed, student, unemployed, retired, other).
My aim is to see how the different scenarios affect the dependent variables.
My thesis supervisor advised me to take into account clustered standard errors in my regression model, but I have no idea how to do this on spss. I can't find the right test and command to do this.
Could someone help me?
Thanks in advance,
Best regards
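For comparison, here is a minimal sketch of what clustered standard errors look like in R, using the sandwich and lmtest packages; the model itself stays ordinary OLS, only the covariance matrix changes. The data frame `d` and the columns `motivation`, `scenario`, `gender`, `age`, `status`, and `id` are hypothetical placeholders for the design described above.

```r
# Minimal sketch: OLS with standard errors clustered by participant.
# All names below (d, motivation, scenario, gender, age, status, id)
# are hypothetical placeholders, not the asker's actual variables.
library(sandwich)
library(lmtest)

m <- lm(motivation ~ scenario + gender + age + status, data = d)

# Cluster-robust covariance, clustering on participant id
coeftest(m, vcov = vcovCL(m, cluster = ~ id))
```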
Hello everyone,
I am currently working on my thesis and due to the normality assumption being violated, I am performing bootstrapping for my two simple linear regressions and my multiple linear regression.
However, I am having difficulty reporting my findings (in APA 7 style, but also what to report in general). I am using SPSS, and the model summary table containing R-squared (among other statistics) is not bootstrapped, so can I report it in my thesis or should I leave it out?
Additionally, the bootstrapped coefficients table contains the confidence interval and a p-value, but no degrees of freedom or t-value. What can I report here? So far, I have only reported the BCa confidence intervals, but I am sure there are other values I can discuss?
Does somebody know what can be reported in a bootstrapped linear regression or maybe knows of a paper that uses this method for inspiration?
Any help is greatly appreciated.
Thank you!!!
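One hedged workaround, if SPSS will not bootstrap the model summary, is to bootstrap R-squared yourself. A minimal sketch with R's boot package, where the data frame `d` and the formula `y ~ x1 + x2` are placeholders:

```r
# Minimal sketch: bootstrap the coefficients and R^2 together,
# so a BCa interval for R^2 can be reported alongside the coefficient CIs.
# d, y, x1, x2 are placeholder names.
library(boot)

stat_fun <- function(data, idx) {
  fit <- lm(y ~ x1 + x2, data = data[idx, ])
  c(coef(fit), rsq = summary(fit)$r.squared)
}

set.seed(123)
b <- boot(d, stat_fun, R = 2000)

# BCa interval for R^2 (the last element of the statistic vector)
boot.ci(b, type = "bca", index = length(b$t0))
```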
I am having trouble differentiating between a random effects model and a linear mixed effects model. I am currently using this model https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.model.RandomEffects.html#
for my research. Can somebody tell me whether this is a random effects model or a linear mixed effects model, and explain the differences between the two?
I have two independent variables, a mediating variable, and a dependent variable:
1. Attachment style (independent variable): 1 to 5 Likert scale, strongly disagree to strongly agree.
2. Family functioning (independent variable): 1 to 4 Likert scale, strongly agree to strongly disagree.
3. Childhood trauma (mediating variable): response options 1 to 5 (never true = 1, rarely true = 2, sometimes true = 3, often true = 4, very often true = 5).
4. Personality disorder traits (dependent variable): response options 0 to 3 (very false or often false = 0, sometimes or somewhat false = 1, sometimes or somewhat true = 2, very true or often true = 3).
Please suggest an appropriate analysis.
Thanks
I have performed a Box-Cox transformation of a response variable in multiple linear regression in SPSS. As I understand, in order to correctly interpret and present the data (B, t, CI for B and p), I need to back-transform the data by applying the formula for Box-Cox back-transformation to B, t, CI for B. But the formula specifies lambda and I can't find it anywhere in SPSS. Could you please tell me where I can find the lambda to apply this formula for back transformation?
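SPSS does not report lambda in standard regression output, so one option is to estimate it separately. A minimal sketch in R with MASS::boxcox, where the formula and data frame `d` are placeholders for the actual model:

```r
# Minimal sketch: profile the Box-Cox log-likelihood over a grid of lambdas
# and pick the maximizer. Formula and data frame 'd' are placeholders.
library(MASS)

fit <- lm(y ~ x1 + x2, data = d)
bc  <- boxcox(fit, lambda = seq(-2, 2, 0.1))  # also draws the profile plot
lambda <- bc$x[which.max(bc$y)]               # lambda with the highest log-likelihood
lambda
```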
I am using the bootstrapping technique for hypothesis testing, as is done in Conditional Process Analysis by Hayes. But instead of a single-dimensional construct, I have a two-dimensional construct for my independent variable. Therefore, instead of bootstrapping a single number, I need to bootstrap a pair of numbers, or shall we say a vector. I can do it mathematically, but I wonder if there is any previous research on bootstrapping a vector (instead of a single number)?
I would be grateful if you can refer me to existing papers on this topic or give me some keywords on this statistical question.
Thank you.
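For what it's worth, the mechanics are routine in R: boot() accepts a statistic function that returns a vector, so a pair of estimates is resampled jointly. A minimal sketch under placeholder names (`d`, `y`, `x1`, `x2`):

```r
# Minimal sketch: bootstrapping a vector-valued statistic (two coefficients at once).
# d, y, x1, x2 are placeholder names.
library(boot)

pair_fun <- function(data, idx) {
  fit <- lm(y ~ x1 + x2, data = data[idx, ])
  coef(fit)[c("x1", "x2")]  # a 2-vector, resampled as a unit
}

set.seed(1)
b <- boot(d, pair_fun, R = 5000)
boot.ci(b, type = "perc", index = 1)  # interval for the first component
boot.ci(b, type = "perc", index = 2)  # and for the second
```

The pairs are preserved within each resample, so the replicate matrix b$t also supports joint summaries, such as an empirical covariance of the two estimates.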
According to these results, is the regression analysis significant?
Analysis of Variance
- SS = 1904905
- DF = 8
- MS = 238113
- F (DFn, DFd)= F (8, 17) = 1.353
- P value P=0.2843
Goodness of Fit
- Degrees of Freedom = 17
- Multiple R = 0.6237
- R squared = 0.389
- Adjusted R squared = 0.1015
I want to ask you a question about data analysis in psychology. I have two independent variables, one is group (between subjects), one is age (continuous data), and the dependent variable is a 6-point Likert score. I intend to use regression for data analysis, with three questions:
1. Should the ID of the subjects (the number of each subject) be included in the model as a random effect? If it is included, the model is a linear mixed model (LMM); if it is not, it is a multiple linear regression, right?
2. In the case of multiple linear regression, should I directly build the full model and examine the influence of each independent variable, or should I build the full model, compare it with the null model, and then analyze via stepwise elimination?
3. When I do my analysis, do I need to mean-center both age (the continuous variable) and the rating, or just age?
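A minimal sketch of the two candidate models in R, with age mean-centered; `d`, `rating`, `group`, `age`, and `id` are placeholder names for the design described above:

```r
# Minimal sketch contrasting the two options from question 1.
library(lme4)

d$age_c <- d$age - mean(d$age)  # mean-center age only (question 3)

# Multiple linear regression, no subject term:
m1 <- lm(rating ~ group * age_c, data = d)

# Linear mixed model with a random intercept per subject; this is only
# identified if each subject contributes more than one rating:
m2 <- lmer(rating ~ group * age_c + (1 | id), data = d)
```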
What does the unstandardized regression coefficient in simple linear regression mean?
In multiple linear regression, unstandardized regression coefficients tell how much change in Y is predicted to occur per unit change in that independent variable (X), *when all other IVs are held constant*. But my question is: in simple linear regression we have only one independent variable, so how should I interpret it?
Although it is true that the number of independent variables, the time horizon, and other factors influence a multiple linear regression, which statistical tests do you think must accompany a multiple linear regression for it to be considered rigorous? Which statistical program do you consider the best and/or most suitable for running it? Finally, what additional methodological considerations should be taken into account?
Hello!
I ran two models (simple linear regression and multiple linear regression) to test a hypothesis for my research project. The first model tested the IV against the DV. For the second model I added control variables (gender and tenure). Tenure is statistically significant. However, I do not know how to interpret this result. Can anyone please give some guidance on reporting it?
H: career development opportunities positively impact job satisfaction.
Results:
Model 1: Career Development Opportunities coefficient 0.575; t: 7.592; p-value <.001
Model 2: Career Development Opportunities coefficient 0.589; t: 7.785; p-value <.001
Tenure: coefficient 0.187; t: 2.103; p-value <.001. Male: coefficient -0.411; t: -0.987; p-value 0.326
Thanks!
I am supposed to create a fake data set with 4 predictors that yields two strong significant relationships, 1 weak significant relationship, 1 non-significant relationship, and a significant interaction.
I have some materials, but I am at a total loss as to how to do this.
I have created an Excel workbook with the four variables, generated random numbers using the RANDBETWEEN function, and imported that data into JASP, but no matter how many times I run it, I can't get the results I need.
Does anyone have any suggestions?
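The usual trick is to simulate y from a known regression equation rather than drawing all columns independently at random; independent random numbers carry no built-in relationships, so significance only appears by chance. A minimal sketch in R, with all names and coefficient values chosen purely for illustration:

```r
# Minimal sketch: build the effects into y, then recover them with lm().
set.seed(42)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)

# Two strong effects (x1, x2), one weak (x3), one null (x4),
# plus an x1:x2 interaction; tune coefficients and noise SD as needed.
y <- 0.8 * x1 + 0.7 * x2 + 0.2 * x3 + 0 * x4 + 0.5 * x1 * x2 + rnorm(n, sd = 1)

summary(lm(y ~ x1 * x2 + x3 + x4))
```

Raising n or the coefficients strengthens significance; raising the noise SD weakens it.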
I was doing a regression on the relationship between time living in a rural area and depression. When I included time living there as a single overall variable, p was significant and the standardized coefficient was a large negative number.
However, when I divided rural residency more finely into "residency duration from 0 to 18 years of age", "residency duration from 18 to 29 years of age", "residency duration from 29 to 45 years of age", etc., only the p-value for "residency duration from 0 to 18 years of age" was significant (p = 0.034), and its standardized coefficient was only a small negative number.
What I am confused about is how to draw conclusions. In this case, should I conclude: 1. The longer the time living in a rural area, the milder the depression, and living in a rural area at ages 0-18 can significantly reduce depression.
Or: 2. Only living in a rural area at ages 0-18 can reduce depression.
Similarly, if all variables are non-significant after the division, should my conclusion be:
1. The longer the time living in the countryside, the milder the depression.
Or: 2. Time spent living in the country has nothing to do with depression?
Hello everyone!
I need your help with some work I am doing.
Some context first:
I am writing a dissertation for my master's. The topic is perceived trust in Smart Home technology. I launched a survey with closed-ended questions for demographic data and a Likert scale of 8 questions scored from 1 to 5. I gathered 159 responses in total.
The 8 questions in the Likert scale actually form 4 different dependent variables: Q1/Q2 make dependent variable 1, Q3/Q4 dependent variable 2, etc.
Since it's a Likert scale, the data are not interval, so I took the sum of Q1 and Q2 and divided it by 2, which gave me a mean. This mean is one of the 4 dependent variables. I did the same for the other 3.
The idea is to test each of these dependent variables and see if they can be predicted from the independent and control variables that I have (age, gender, educational attainment, household size, and income).
I read that a multiple linear regression would be enough for that. So I started reading about the method and saw that some assumptions need to be met before using it. For normality, 3 of the 4 dependent variables were normally distributed, but the last one was not quite normally distributed. Secondly, testing the four variables for linearity suggested that none of them are linear.
Now I need to start the analysis part of my dissertation, but I have no clue which method I should use, since the assumptions of multiple linear regression are not met.
I know about non-parametric tests, but I can't find a non-parametric alternative to multiple linear regression.
If you need more info about the variables etc let me know, I will provide them!
Thanks for your help and time.
Hi!
I'm performing a multiple linear regression with bootstrapping, but I'm also adding my independent variables in steps (hierarchical). So I have age and sex (demographics) in my first model, then add my independent variable for the second model. I now get results for the linear regression, as well as the bootstrap for coefficients. For my interpretation, do I still use the R^2 and F of the original linear regression, in addition to the B from the bootstrap? Do I also report on the confidence intervals, and if so, how? I've only had the basics of statistics during the course of my study so I'm really new to all of this; please feel free to correct any of my mistakes with this analysis!
Salam,
If my data are not time series data, is stationarity a relevant concern for fitting a multiple linear regression model? And if so, what makes the difference with time series data?
Thank you
I'm doing a multiple linear regression where some of the independent variables are normally distributed and others aren't. The normal P-P plot of the regression seems appropriate, as the points fall along the line. I have 84 participants in total; is that enough to go ahead with linear regression without the normality assumption being met?
I encountered a problem while performing multiple linear regression and OLS single-factor regression. Two factors have low coefficients in single-factor regression, but in the multiple-factor regression the coefficients of these two factors are high. After removing one factor and rerunning the multiple-factor regression, the coefficient of the other factor also decreased significantly. Yet these two factors passed the collinearity test and are not collinear. Why is there such a result? Are these two factors good at fitting the equation?
I have gathered data from 60 companies, across 5 years.
I want to do a Multiple Linear Regression Analysis with some variables, from 2017 to 2021.
1) Should I take the average of the 5 years for each variable for each company and run the regression on 60 observations, or
2) Should I treat the data as individual points, meaning observations = n × 5 = 300?
If option 2, how do I lay out that information (columns and rows) on the Excel sheet?
Thanks a lot
Optimization and prediction of the kWh/m³ ratio in a pumping station through mathematical modeling: multiple linear regression (MLR) and artificial neural networks (ANN).
I generated a scatterplot for a multiple linear regression analysis using SigmaPlot and I want to display the regression equation, but I haven't been able to do it. Can anybody suggest how to display the equation in SigmaPlot?
I am conducting a meta-analysis on a research study and will need to build a model for my data to conduct a multiple linear regression analysis. Can anyone provide a sample model or step-by-step approach to building a model? Thanks.
I have clinical parameters, and I need to find the relationship between the variables; any ideas or suggestions on how to do the calculation?
I have fitted a multiple linear regression model with three continuous independent variables and one independent categorical variable. How to visualize or display the fit?
Thanking you in advance
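Two common displays, sketched in base R with placeholder names (`d`, `y`, `x1` to `x3`, `group`):

```r
# Minimal sketch: visualizing a fitted MLR with three continuous predictors
# and one categorical predictor. All names are placeholders.
fit <- lm(y ~ x1 + x2 + x3 + group, data = d)

# 1) Observed vs. fitted values, with the identity line for reference
plot(fitted(fit), d$y, xlab = "Fitted values", ylab = "Observed values")
abline(0, 1)

# 2) Partial (term) plots, one panel per predictor
termplot(fit, partial.resid = TRUE)
```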
The correlation coefficient between one of the independent variables and the dependent variable has a p-value greater than the alpha value of 0.05.
I am running a multiple linear regression between demographics and performance on a measure. However, while all the other assumptions are met, the linearity assumption is not (example image attached). I have applied a log10 transformation to the data and that hasn't helped. Are there any other corrections that can be applied?
To be more precise, my dependent variable was the mental well-being of students. The first analysis was chi-square (mental well-being × demographic variable), hence I treated the DV as categorical. Then, to examine the influence of the independent variables on mental well-being, I treated the DV as a continuous variable so that I could analyse it using multiple regression.
Is it appropriate and acceptable? And is there any previous study that did the same thing?
Need some advice from all of you here. Thank you so much
Hi, I am new here but really hope someone can help.
I ran a hierarchical multiple linear regression with 3 predictors in the first block and 1 in the second block. In the first model, only one of my predictors was significant, but I only included it to control for it as I expected it would be highly correlated to the DV. In my second model, one of the predictors that was non-significant in the first model is now significant. Can anyone explain what that means and how I can discuss those results? It would be great if you could also point me towards books or papers that explain this.
Thanks,
Charlenne
Hello everyone,
I would like to run a multiple regression with 4 independent variables and 1 dependent variable. I also have a dichotomous moderator, gender, coded female = 1 and male = 2.
How do I test the moderator in SPSS to see if it is linear?
I have already checked the assumptions of the multiple linear regression for the dependent variable and independent variables using partial regression plots, but how can I check whether the dichotomous moderator is linear?
Thanks in advance!
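One way to think about it: a predictor with only two values cannot be non-linear (any two points lie on a line), so the linearity check applies to the continuous predictors, and the moderation itself is tested through interaction terms. A minimal sketch in R with placeholder names (`d`, `y`, `x1` to `x4`):

```r
# Minimal sketch: moderation by a dichotomous variable via interaction terms.
# d, y, x1..x4 are placeholders; the gender coding follows the question (1/2).
d$gender <- factor(d$gender, levels = c(1, 2), labels = c("female", "male"))

m <- lm(y ~ (x1 + x2 + x3 + x4) * gender, data = d)
summary(m)  # each x:gender term tests whether that slope differs by gender
```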
My dataset is not very big (between 100 and 200 observations), and the residuals are not normally distributed. So:
1. Is there any other statistical method similar to multiple linear regression but suitable for this case?
2. If not, what can be the solution?
Thank you
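Two candidates that relax the normal-residuals assumption are robust (M-estimation) regression and median (quantile) regression. A minimal sketch in R, with the formula and data frame as placeholders:

```r
# Minimal sketch: alternatives to OLS when residuals are clearly non-normal.
library(MASS)     # rlm: robust regression via M-estimation
library(quantreg) # rq: quantile regression (tau = 0.5 models the median)

m_rob <- rlm(y ~ x1 + x2 + x3, data = d)
m_med <- rq(y ~ x1 + x2 + x3, tau = 0.5, data = d)

summary(m_rob)
summary(m_med)
```

With 100-200 observations, OLS point estimates remain unbiased without normality, so OLS with bootstrapped confidence intervals is another option.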
When performing sensitivity analysis of the activated sludge model, multiple regression analysis of the different parameters (variables) against the model output is required. However, the model output is also a function of time, so I am confused about how to implement a multiple linear regression of the model output on the parameters (variables) in MATLAB to find the linear regression coefficients.
Hello dear community,
I have a question regarding multiple linear regression (MLR), moderation analysis, and median splits (dichotomization). For context: I have a dichotomous independent variable OKO, a continuous moderator KS, and the dependent variable IPL. The question I am asking myself is whether I should dichotomize the moderator variable KS. The reason I am asking is the following set of results.
Regressing IPL on OKO and OKO*KS (KS as continuous variable) yields the unstandardized regression coefficients:
OKO: -2.017
KS: -0.2189
OKO*KS: 0.6475
=> meaning that KS dampens the negative relation between OKO and IPL.
However, if I include the dichotomous variant of KS (KS_M) into the regression instead of KS (continuous variable) I get the following unstandardized regression coefficients:
OKO: 0.272
KS_M: -1.312
OKO*KS_M: 1.820
=> meaning that KS amplifies the positive relation between OKO and IPL.
Can someone explain to me why I get contrary results?
THANK YOU
For my data analysis I am conducting a multiple linear regression model. Currently I am testing the following assumptions:
1. Normality of residual errors
2. Multicollinearity
3. Homoskedasticity
Since the normality of residual errors was violated, I transformed my dependent variable into a log variable. However, I am wondering whether I should now test multicollinearity and homoscedasticity with this new log variable or still use the variable before transformation. Moreover, when reporting descriptives and the correlation matrix, should I include the variable before transformation or the log?
Hopefully someone can help me! :)
My data analyst has used a concept called "pre-hypothesis"; it isn't equivalent to the hypothesis equation, but it is used for checking the Durbin-Watson statistic.
My problem is that I haven't found any source that uses the same concept (i.e., pre-hypothesis) for multiple linear regression in general or the Durbin-Watson statistic in particular.
I would be more than grateful if anyone could provide me with:
1. an explanation of "pre-hypothesis" in this concept, and if that equals "assumption"; and
2. a source in which "pre-hypothesis" is used and explained.
Hello, I'm a student researcher looking to apply a multiple linear regression to my data set of all continuous values. My question is mainly about a step-by-step idea of how to do this, as I believe I understand the general gist but not quite the whole picture.
To start, it is my understanding that you plot scatterplots of your independent vs. dependent data and your independent vs. independent data. This helps determine whether there is a linear relationship between the variables and whether there is any collinearity between your independent variables. After this step I can eliminate the independent variables that either have strong collinearity or no relationship with the dependent variable.
After the initial screen, I am under the impression I can run a stepwise (or forward/backward) multiple linear regression, gradually adding in variables to find the best model. Is that a correct way to look at it?
Hi! I am trying to run the following multiple linear regression in R:
r ~ condition (1 = Control, 2 = Active Control, 3 = Treatment, 4 = Importance Treatment) + type (0 = false, 1 = true) + age (13, 14, 15, adult) + domain (1 = eco, 2 = health, 3 = society, 4 = culture)
r is the intention to share a certain headline for a certain participant, initially given on a 6-point scale but transformed into a variable between 0 and 1. Participants are randomly assigned to 4 conditions, and we want to test the respective effect of the 2 treatment conditions vs. the 2 controls on the intention to share false headlines (we predict the treatments will reduce the sharing of fake news without impacting the sharing of real news). In all conditions the main task consists of assessing the intention to share 24 headlines presented successively, half false and half true (so we also want to know the effect of the "type" of headline, true or false, and possibly the effect of its category: within the 2 sets of headlines, we always have 1/4 of headlines on an economic subject, 1/4 on health, 1/4 on "society", and 1/4 on culture, although this is less important). Finally, we are testing different age groups (13, 14, and 15 years old, plus an "adult" group pooling participants aged 25 to 36), to know whether the predicted effect varies with age.
The main hypotheses of our study are:
- that treatments will improve discernment, defined as the difference between the intentions (r > 0.5) to share true news and the intentions to share fake news (i.e., they will genuinely improve the quality of sharing, not merely cause general skepticism),
- that this effect will be greater for headlines perceived as the most inaccurate (there will be an analysis at the headline level as well, based on a pretest). We suppose the treatment works by refocusing the participant's attention on the accuracy criterion, hence being greater for headlines that were generally perceived as the least accurate (we will compute a mean perceived accuracy for each headline across participants), which will consequently be the ones for which sharing intentions drop the most thanks to the treatment,
- we have no particular prediction concerning age, which is precisely the novelty of the study (the literature on adolescents' ability to evaluate fake news and their sharing online leaves the door open to quite different scenarios),
- and no prediction for domain either (we expect it won't play a role, as the headlines were chosen to be quite similar in tone, whatever the category).
Also, I need to cluster by participant (since there are repeated measures for each participant) and by headline (multiple ratings per headline).
I thought of using lm_robust, but I don't know if we can specify 2 clusters directly?
I also wonder what the simplest way is to check for potential effects of other secondary measures (like scores on a Cognitive Reflection Task, gender, socio-professional category, etc.): do I have to run regressions testing all simple effects, interactions, etc., or can I just add them to the global formula?
Thanks in advance!
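One hedged alternative to two-way clustered standard errors is a mixed model with crossed random intercepts for participant and headline, which maps directly onto the repeated-measures structure described. A minimal sketch with lme4, where the variable names are taken from the description above and may differ in the real data:

```r
# Minimal sketch: crossed random intercepts for participant and headline.
# r, condition, type, age, domain, participant, headline are assumed names.
library(lme4)

m <- lmer(r ~ condition * type + age + domain +
            (1 | participant) + (1 | headline), data = d)
summary(m)
```

If you prefer to stay with OLS, sandwich::vcovCL() accepts a two-way cluster formula such as `cluster = ~ participant + headline`; secondary measures (CRT score, gender, etc.) can simply be added to the formula as covariates, with interactions only where a hypothesis calls for them.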
Through a survey, I have measured the Big Five Personality traits and Task Performance.
Now I'm about to do the regression analysis, however, I'm not sure if I should do a simple linear or multiple linear regression.
In the case of openness, when I run a simple linear regression (DV: task performance; IV: openness), my results are:
Sign: .001
Unstandardized B= .494
so the results are positive & significant
But when I run a multiple linear regression (DV: task performance; IVs: openness, conscientiousness, agreeableness, extraversion, neuroticism), my results for openness, for example, are:
Sign: .311
Unstandardized B= .110
so the results are positive & not significant
Now, I'm confused about which one I should use and why.
I want to know how age, obesity (0 for no, 1 for yes), and smoking status (0 for non-smoker, 1 for smoker) affect the serum vitamin D level, using regression in SPSS.
The best-fit curve for age reported: Linear (R2=0.108), Quadratic (R2=0.109), Cubic (R2=0.106)
So I assumed that E(VitaminD) = b0 + b1*Age + b2*Age^2.
But if I want to put this into a multiple regression analysis with age, obesity, and smoking status as independent variables, which should I use: Age + obesity + smokingStatus, or Age^2 + obesity + smokingStatus?
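A common convention, sketched below, is to keep both the linear and the squared term rather than replacing Age with Age² alone (dropping the linear term forces the parabola's turning point to age 0). Names are placeholders:

```r
# Minimal sketch: quadratic age trend alongside the binary covariates.
# vitamin_d, age, obesity, smoking are placeholder column names.
m <- lm(vitamin_d ~ age + I(age^2) + obesity + smoking, data = d)
summary(m)
```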
I have a query regarding which type of regression analysis to use for my study. I have used a scale (dependent variable) that contains 9 items, each marked on a 5-point Likert scale. Item scores are summed, and totals range from 9 to 45. A higher score indicates the respondent has more characteristics of that construct.
Similarly, there are two independent variables. One IV has 20 items and each item is marked on a 4-point Likert scale; scores range from 20 to 80. The second IV has 7 items and each item is marked on a 5-point Likert scale; scores range from 7 to 28.
The reviewer has suggested that I use non-parametric tests since my data are ordinal. However, previous studies have used multiple linear regression with similar types of constructs.
Which type of regression analysis is appropriate in this case - ordinal regression or multiple linear regression? Any literature explaining this would be highly useful.
I have been working with a GAM model with numerous features (>10). Although I have tuned it to my satisfaction for my business application, I was wondering what the correct way to fine-tune a GAM model is, i.e., whether there is any specific way to tune the regularizers and the number of splines, and a way to say which model is accurate.
The question actually comes from the observation that at different levels of tuning and regularization, we can reduce the variability of the effect of a specific variable, i.e., reduce the number of ups and downs in the transformed variable, and so on. So I don't understand at this point which model represents the objective truth and which one doesn't, since the other variables end up influencing each single transformed variable too.
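A minimal sketch of one widely used tuning recipe in mgcv, under placeholder names; it does not settle which model is "true", but REML smoothness selection plus shrinkage penalties gives a principled default, and AIC or out-of-sample error can arbitrate between candidates:

```r
# Minimal sketch: REML-based smoothness selection with double-penalty shrinkage.
# d, y, x1..x3 are placeholders; add s() terms for the remaining features.
library(mgcv)

g <- gam(y ~ s(x1) + s(x2) + s(x3), data = d,
         method = "REML",  # estimates smoothing parameters by REML
         select = TRUE)    # extra penalty can shrink whole terms to zero

gam.check(g)  # diagnoses whether the basis dimension k is large enough
AIC(g)        # compare candidate specifications
```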
Hi everyone, I am working with a disjunctive model for decision-making. But I'm a bit confused about how to figure out the value of Co (Co is a constant set above the largest Xi to ensure Y will not be infinite) when doing multiple regression analysis in SPSS. Does it have any set value?
I seem to have a hard time with the statistics for linear regression. I have been scrolling on the Internet, but did not find an answer.
I am testing the assumptions for linear regression, one of which is homoscedasticity. My data, however, show a heteroscedastic pattern in the scatterplot. But how do I check whether this is correct, i.e., whether the data are actually heteroscedastic?
Could I still perform the linear regression even though my data seems to be heteroscedastic?
I have tried transforming my DV with ln, lg10 and sqrt but the heteroscedasticity remains visible.
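A visual impression can be backed by a formal test, and if heteroscedasticity persists, heteroscedasticity-consistent standard errors are a standard remedy that avoids transforming the DV. A minimal sketch in R with placeholder names:

```r
# Minimal sketch: test for heteroscedasticity, then use robust (HC3) inference.
library(lmtest)
library(sandwich)

m <- lm(y ~ x1 + x2, data = d)  # placeholder model
bptest(m)                       # Breusch-Pagan: small p suggests heteroscedasticity
coeftest(m, vcov = vcovHC(m, type = "HC3"))  # coefficients with HC3 robust SEs
```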
There are different regression models used in wind speed forecasting. I need some relevant papers, reading sources, and publications on multiple linear regression, moving average regression, and k-nearest neighbour classification and regression methodology to develop a good understanding.
I have 3 DVs and 5 predictor variables (technically, only 1 IV and the other 4 are controls).
I ran it on Stata and all seems to have worked (since mathematically, all predictor variables are treated the same way), but technically I only identify one of my predictor variables as the IV of interest. I would rather stay away from SEM. Thank you so much!
Hello everyone!
Can we report the results of independent sample t-tests and multiple linear regression analysis in one table? If yes, what information should be put in that table? Thank you very much.
I want to study risk factors for a dependent variable, which is the percentage of lame cows at the farm level, using a multiple linear regression in SPSS. Is that possible?
Hello my friends - I have a set of independent variables measured on Likert scales, and one dependent variable also measured on a Likert scale. I have run the analysis and I want to be sure that I'm doing this right: how can I use variables such as age, gender, work experience, and education level as control variables to measure their effect on the relationship between the independent variables and the dependent variable? Please give me an example. Thanks.
I'm interested in comparing multiple linear regression and artificial neural networks for predicting the production potential of animals using certain predictor variables. However, I have obtained negative R-squared values for certain model architectures. Please explain the reason for negative prediction accuracy or R-squared values.
A manpower deployment model built using the multiple linear regression method got a high p-value (above 0.05) for the final model. This research is basically an empirical study. The reason for the high p-value was a problem with the data we received for the independent variables (not highly accurate). The data sample is 240. My final question: is concluding this research with a final model with high p-values reasonable? We found several other reasons related to the data errors (organizational data entry problems), and we have good reasons to present the overall model. Is it okay to conclude in that way?
**The overall model has one dependent variable and two independent variables (the two independent variables also show a high correlation with each other).**
Hi,
I have 2 categorical predictors and 1 continuous predictor (3 predictors in total), and 1 continuous dependent variable. The 2 categorical variables have 3 and 2 levels, respectively. I have dummy coded the variable with 3 levels, but directly assigned 0 and 1 to the variable with only 2 levels (my understanding is that for a categorical variable with only 2 levels, separate dummy coding is not necessary?).
In this case, how do I do and interpret the assumption tests of multicollinearity, linearity and homoscedasticity for multiple linear regression in SPSS?
Thank you!
- When it comes to multiple linear regression: if the dependent variable is measured in time, must the independent variables also be measured in time? Here the MLR is used for modeling task completion time. So must the independent variables we use also be time-based, or can we use, say, the number of people as an independent variable?
R² for MLR is approximately 10-20% at various train:test ratios,
but for ANN it is 1-4%.
Why?
thanks in advance
Hi everyone! I'm running multiple regression models on incomplete data using R, and I applied the MICE algorithm to deal with the missing data. I've been able to get the pooled coefficients (B, t-tests, p-values) with no effort using the available scripts, but I couldn't find a way to obtain goodness-of-fit measures (like adjusted R-squared and F) for the pooled data (not for individual imputations or the original data, but pooled values). I got the same problem using multiple imputation in SPSS.
Thank you very much for your attention, any help will be greatly appreciated!
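In R this particular gap is covered by mice itself: pool.r.squared() pools R² (or adjusted R²) across imputations. A minimal sketch with placeholder names:

```r
# Minimal sketch: pooled fit measures after multiple imputation with mice.
# d, y, x1, x2 are placeholders.
library(mice)

imp  <- mice(d, m = 20, seed = 1)
fits <- with(imp, lm(y ~ x1 + x2))

pool(fits)                             # pooled coefficients
pool.r.squared(fits)                   # pooled R^2
pool.r.squared(fits, adjusted = TRUE)  # pooled adjusted R^2
```

For an overall model test in place of the usual F, mice::D1() compares the fitted model against a nested null model pooled over the same imputations.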
Dear colleagues,
I want to raise this question to the community. I know that in an ANOVA where we compare means across different groups, we will have problems with multiple comparisons, since we only know the F-test result but the group-to-group differences are unknown. Therefore, we would choose a multiple comparison correction method, such as Tukey's, Scheffé's, or Bonferroni, to adjust the p-value and explore each pairwise difference and its significance.
However, I am conducting a multiple linear regression analysis using the stepwise (backward selection) method. That means I have a DV (QoL scores in eight different domains and two summary components; the scores are continuous; the RAND-36 or SF-36) and a group of potential associated IVs (factors; categorical and continuous).
For the models that I get from the auto-selection process (stepwise and backward), I would like to ask: will there be multiple comparison problems? Why? And what would be the recommended solutions to this kind of multiple comparison problem in multiple linear regression model building? Thank you!
I am looking at whether stress levels reduce from time point 1 to time point 2 when engaging in recommended resources.
DVs - stress levels at time point 1 and time point 2
IVs - engagement with resource 1 and resource 2
Regarding either a linear or a machine-learning-based regression analysis, how should we perform the normality test for the model? Should we consider all data, or just the training or testing dataset? I would be grateful if anyone could describe this in more detail!
Hello,
I am using a multiple linear regression model while analyzing the UTAUT2 model. However, the Durbin-Watson value came out as 2.080. How do I check whether the survey data have any negative autocorrelation, as the result is above 2? The sample size is 209.
I checked the DW table from Savin and White (Durbin-Watson Statistic: 1 Per Cent Significance Points of dL and dU) and the value lies in the dU region (k = 10). Can anyone help explain my situation?
If negative autocorrelation exists, how do I address it without increasing the sample size?
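For reference, a minimal sketch of testing the direction explicitly in R (the model formula is a placeholder); with DW = 2.080 on n = 209 the departure from 2 is small, and the test makes that judgment formal:

```r
# Minimal sketch: Durbin-Watson test with an explicit one-sided alternative.
library(lmtest)

m <- lm(y ~ x1 + x2, data = d)   # placeholder model
dwtest(m, alternative = "less")  # H1: negative autocorrelation (DW above 2)
```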
Hey all,
For my master's thesis I am conducting research to determine the influence of certain factors (ghost games, lack of fan interaction, esports) on fan loyalty. For my statistical analyses, I will first conduct confirmatory factor analysis to validate which items (e.g., purchased merchandise) belong to which latent factor (e.g., behavioral loyalty).
However, I am unsure about my next step. Can I use multiple linear regression with my latent variables to identify the relationship between the factors and loyalty? The data were collected through a survey of mainly 7-point Likert scale questions. Can I use linear regression, or is ordinal regression a must with Likert scale data?
Thanks in advance for answering!
My research is on the accuracy of ECG interpretation among doctors.
Dependent variable: ECG score (normally distributed)
Independent variables: sociodemographic characteristics (not normally distributed)
Which do I use for a one-to-one correlation: Pearson/Spearman, or simple linear regression?
What is the difference? I was told they were the same, yet the results are different.
I then want to proceed with multiple linear regression.
Hi there. Essentially, I have collected data for my study that include the frequency of particular injuries, e.g., sprained ankle × 5. I also have each individual's experience, e.g., they have played for 5 years. Because the years played vary, to standardise the data and make it comparable I divided injury frequency by the total number of years each person has played, e.g., 5 injuries across 5 years gives a standardised frequency of 1.
My question is: when I look at the effect of experience on my standardised injury frequency, does the fact that I used experience as my standardising factor affect the published results? I ask because my multiple linear regression results (there are other variables) show that experience has a significant negative effect, which logically shouldn't happen, as more exposure should increase the probability of sustaining an injury.
Thanks in advance
George
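One hedged way around the built-in dependence is to model the raw injury counts and treat years played as an exposure offset, instead of dividing by years and then also regressing on years. A minimal sketch in R with placeholder names:

```r
# Minimal sketch: Poisson rate model with exposure as an offset.
# injuries = raw count; years_played = exposure; other_vars stands in for
# the remaining predictors. All names are placeholders.
m <- glm(injuries ~ years_played + other_vars,
         offset = log(years_played),
         family = poisson, data = d)
summary(m)
```

Here the coefficient on years_played asks whether the injury rate per year changes with experience, which is the question the standardised frequency was trying to answer.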
As part of my research, I have to analyse 10 years of time series financial data for four public limited companies using multiple linear regression. First I analysed each company separately. The adjusted R-squared is above 95% and the VIFs are well within limits, but the Durbin-Watson statistic is around 2.4, 2.6, or 1.1, etc., which signals either positive or negative autocorrelation. Then I tried the model on the combined data of all four companies. This yields a much lower adjusted R-squared (35%) and again positive autocorrelation (Durbin-Watson of 0.94). As I am doing DuPont analysis, where the dependent variable is Return on Equity and the independent variables are Net Profit Margin, Total Asset Turnover, and Equity Multiplier, which are fixed, I cannot change the independent variables to reduce the autocorrelation. Please suggest what I should do.
If the dependent variable is continuous, is it justifiable to use both categorical variables (region, gender, history of disease, etc.) and continuous variables (direct cost, indirect cost) as independent variables in a multiple linear regression model in epidemiological studies?
Hi everyone,
My results show for some models that the model itself is not significant, but some independent variables within the model are significant (including the constant) in SPSS.
I was wondering how I should interpret these results? What can I say/conclude and what not?
Thank you in advance,
Minke
Dear All,
I am working on data with cost of care as the DV. The data are genuinely skewed, reflecting the socioeconomic gap, and therefore the healthcare financing gap, in the population of a developing country. Because of this skewness, my data violated the normality assumption and were therefore reported using the median and IQR. But I would like to analyze predictors of cost of care among these patients.
I need to know if I can go ahead and use MLR, or whether there are alternatives.
The sample size is 1,320 and I am thinking of invoking the Central Limit Theorem.
Thanking you for your anticipated answers.
Dimeji