Science method
Regression Analysis - Science method
Procedures for finding the mathematical function which best describes the relationship between a dependent variable and one or more independent variables. In linear regression (see LINEAR MODELS) the relationship is constrained to be a straight line and LEAST-SQUARES ANALYSIS is used to determine the best fit. In logistic regression (see LOGISTIC MODELS) the dependent variable is qualitative rather than continuously variable and LIKELIHOOD FUNCTIONS are used to find the best relationship. In multiple regression, the dependent variable is considered to depend on more than a single independent variable.
Questions related to Regression Analysis
In Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, Ken Brewer proved not only that heteroscedasticity is the norm for business populations when using regression, but also showed the range of values possible for the coefficient of heteroscedasticity. I discussed this in "Essential Heteroscedasticity," https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, and further developed an explanation for the upper bound.
Then in an article in the Pakistan Journal of Statistics (PJS), "When Would Heteroscedasticity in Regression Occur," https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR, I discussed why this might sometimes not seem to be the case, but argued that homoscedastic regression is artificial, as can be seen from my abstract for that article. That article was cited by other authors in another article, an extract of which was sent to me by ResearchGate, and it seemed to me to say, incorrectly, that I supported OLS regression. However, the abstract for that paper is available on ResearchGate, and it makes clear that they are pointing out problems with OLS regression.
Notice, from "Essential Heteroscedasticity" linked above, that a larger predicted-value as a size measure, where simply x will do for a ratio model as bx still gives the same relative sizes, means a larger sigma for the residuals, and thus we have the term "essential heteroscedasticity." This is important for finite population sampling.
So weighted least squares (WLS) regression, not OLS regression, should generally be the default. Thus OLS regression really is not "ordinary." The abstract for my PJS article supports this. (Generalized least squares (GLS) regression may even be needed, especially for time series applications.)
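A minimal sketch of the WLS point in R, under the assumption of a ratio-type model where the residual sigma grows with a power of the size measure x (the exponent 0.75 and all data below are purely illustrative):

set.seed(1)
x <- runif(200, 1, 100)                      # size measure
y <- 2 * x + rnorm(200, sd = 0.5 * x^0.75)   # sigma grows with x: essential heteroscedasticity
ols <- lm(y ~ x)                             # assumes constant sigma
wls <- lm(y ~ x, weights = 1 / x^1.5)        # weights = 1/sigma^2 when sigma is proportional to x^0.75
summary(ols); summary(wls)

The WLS fit simply downweights the large units, whose residuals are inherently noisier.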
I would like to perform a regression analysis with 3 categorical predictors (each with two levels), one continuous predictor, and one criterion variable in SPSS. I want to be able to look at the simple effects, so I was planning to use PROCESS. However, there is only room for one X variable in the dialog box. How can I enter all 4 predictor variables into the model, get statistics for all interaction terms, and also obtain the simple slopes?
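A minimal sketch of a workaround, since PROCESS accepts only one focal X: the full factorial model can be fitted directly with ordinary regression software. In R, with illustrative names (a, b, c are the two-level factors, x the continuous predictor, y the criterion, dat the data frame):

fit <- lm(y ~ a * b * c * x, data = dat)   # all main effects and all interactions
summary(fit)                               # coefficients for every interaction term
# simple slopes of x within each a/b/c combination, via the emmeans package:
# emmeans::emtrends(fit, ~ a * b * c, var = "x")

In SPSS the same model can be built by computing the interaction products by hand and entering them in a single regression block.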
Let variable x be the compression index Cc and variable y the plasticity index IP. I have found the relationship in the form (x/y) versus x. Is the (x/y)-x relationship correct for regression analysis? How do you explain the importance of such a correlation in regression analysis? And why might authors use this kind of relationship instead of plain x versus y?
Dear Researchers:
When we do regression analysis in SPSS and want to measure a specific variable, some researchers take the average of the items under each measure while others sum the item values. Which one is more reliable? Which one produces better results?
Thanks in advance
Hypothetical:
I conduct a survey and ask the respondents to answer a yes/no question: "Do you consider yourself to be an alcoholic?" Their responses correlate with high/low averages on another measure, "Qualities of Alcoholism" (QoA; e.g. yes = high, no = low). Most of the responses are in the 'no' camp. My questions are:
- Should I include everyone in the sample (yeses and nos) if I'm comparing QoA to another measure (e.g. an 'Effects of Drinking' scale), or should I just section out the 'yes' cases?
- What would the effects be on regression analyses if I took either approach?
Confused with the regression analysis... I need more clarity on what kind of regression analysis should be used, and whether there is any database/software that can create a prognostic model, or is using R the only option?
Hi,
I am trying to do an Ordinal Logistic Regression (OLR) in R since this is the regression analysis that I need to use for my research.
I followed a tutorial video I found on YouTube to set up the two sample models. Here are the two models:
library(ordinal)  # provides clm()
modelnull <- clm(as.factor(PRODPRESENT) ~ 1,
                 data = Ordinaldf,
                 link = "logit")
model1 <- clm(as.factor(PRODPRESENT) ~ Age + as.factor(Gender) + `Civil Status`,
              data = Ordinaldf,
              link = "logit")  # a variable name containing a space needs backticks
Next, I followed what the instructor did for the anova step, and an error message appeared. It says: Error in UseMethod("anova") : no applicable method for 'anova' applied to an object of class "c('double', 'numeric')"
Is there something wrong in how the two sample models are set up, hence the error message? What needs to be done to fix the error?
Please help.
Thank you in advance.
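For what it's worth, that error usually means the object handed to anova() is not a clm fit at all, e.g. one of the model objects was overwritten by a number, or the second fit failed because of the space in Civil Status (see the backticks above). A minimal check and comparison, assuming both models fit cleanly:

library(ordinal)
class(modelnull); class(model1)   # both should be "clm"
anova(modelnull, model1)          # likelihood-ratio test between the nested models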
I want to conduct a multivariable linear regression analysis with 15 independent variables; however, when I was checking the assumptions, not a single IV correlated with the dependent variable. Can I still conduct a linear regression analysis, or should I do something else?
Are there new projects and studies considering recursive approaches in energy transition and resources that are being empirically tested with regression analysis and models?
I have a mixed-effects model with two random-effect variables. I want to rank the relative importance of the variables. The relaimpo package doesn't work for mixed-effects models. I am interested in the fixed-effect variables anyway, so would it be okay to take only the fixed variables and use relimp? Or should I use Akaike weights across candidate models that each omit one variable in turn?
Which one is more acceptable?
I want to perform an analysis using Poisson/negative binomial regression. There are 90 observations and about 20 variables (predictors). I read somewhere that there should be at least 10 observations per variable. So, to prevent overfitting, I have to remove some of them. What is the best way to do this? I tried Boruta feature selection and stepwise AIC, but I'm not sure about the results.
Thanks!
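A hedged alternative to stepwise selection: penalized regression shrinks or zeroes out weak predictors in one step and copes well with 90 observations and 20 candidates. A minimal sketch with the glmnet package, assuming a data frame dat with the count outcome y:

library(glmnet)
X <- model.matrix(y ~ ., data = dat)[, -1]                   # predictor matrix, intercept column dropped
cvfit <- cv.glmnet(X, dat$y, family = "poisson", alpha = 1)  # LASSO with cross-validated penalty
coef(cvfit, s = "lambda.1se")                                # zeroed coefficients are dropped from the model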
Which regression analysis should be applied if there are 7 nominal dependent variables (YES/NO categories) and 1 ordinal independent variable (3 categories: low, normal, high)?
In the journal, "THE TRANSITION FROM GEL SEPARATORY SERUM TUBES TO LITHIUM HEPARIN GEL TUBES IN THE CLINICAL LABORATORY" by Oğuzhan Zengi, how do the results of Bland-Altman plots and regression analysis contribute to our understanding of the comparability between serum tubes and LIH tubes for different clinical chemistry and immunoassay tests, and what implications do these findings have for clinical practice?
It is known that we can use regression analysis to limit the confounding variables affecting the main outcome. But what if the entire sample shares a confounding variable affecting the main outcome; will regression analysis still be applicable and reliable?
For example, a study was done to investigate the role of a certain intervention in cognitive impairment, and the entire population included was old-aged (more than 60 years old), which means that age is a risk factor (covariate) across the entire sample; and it is well known that age is a significant independent risk factor for cognitive impairment.
My question is: will the regression here be of real value? Will it completely remove the effect of age and give us the clear effect of the intervention on cognitive impairment?
I want to examine the relationship between school grades and self-esteem and was planning to do a linear regression analysis.
Here's my problem: I have three more variables, socioeconomic status, age, and sex. I wanted to treat those as moderator variables, but I'm not sure that's the best solution. Maybe a multiple regression analysis would be enough? Or should I control for those variables?
Also, if I go for a moderation analysis, how would I do it in SPSS? I can find a lot of videos about moderation analysis, but I can't seem to find cases with more than one moderator.
I've researched a lot already but can't seem to find an answer. Also, my statistics skills aren't the best, so maybe that's why.
I'd be really thankful for your input!
My question looks at the influence of simulation on student attitudes. My professor would like me to do regression analysis, but he says to do two regressions. I have my pre-test data and post-test data; the only other information I have is each student's college. What I found in my class materials seems to indicate that I can complete a regression using the post-test as my dependent variable and the pre-test as my independent variable in SPSS. How would I do another regression? Should I include the colleges as another independent variable, and if so, do I enter them as a group or do I need to create a variable for each college?
Dear all,
I am planning to conduct an experiment with 2 IVs (categorical, each with 2 categories) and 1 mediator (continuous, 7-point Likert scale) on an ordinal DV (6 categories). I understand that mediation analysis usually involves regression analysis to examine the indirect and direct effects of IV --> DV and mediator --> DV, and I will be able to use the PROCESS macro for SPSS by Hayes (2013) to estimate the moderated mediation model. However, since it is a between-subjects design, I am not sure if I can separate the IVs when conducting the regression analysis.
I would deeply appreciate it if anyone can recommend tests and models I can use for this study, or have any resources that I may look into to better find a suitable test. Thank you very much!
How can decision trees improve regression analysis?
Dear Colleagues
Which type of regression analysis is best for testing the effect of a treatment on single/multiple outcomes?
The dependent variable is continuous, such as thickness of the Achilles tendon (in mm).
The independent variable is categorical (treatment/no treatment).
Best regards
In some studies, regression analyses or mediating-variable tests are performed in addition to structural equation modeling. Is this necessary?
Hi all,
We are working on EFA/CFA analysis on a dataset.
We chose 'randomly select cases' in the SPSS 'Select Cases' tab. Our rationale is that we'll use the samples purely for factor analysis; we will not do any regression analysis, so simply splitting the sample at random seems appropriate.
We wonder if we should instead split the sample while controlling for a variable, such as gender. We are not sure what difference it would make. Is this necessary for a straightforward factor analysis?
Many thanks.
What are the key considerations, methodologies, and interpretive techniques for correctly applying and interpreting regression analysis in quantitative research, and how do they compare in terms of their accuracy, reliability, and suitability for different research contexts?
According to these results, is the regression analysis significant?
Analysis of Variance
- SS = 1904905
- DF = 8
- MS = 238113
- F (DFn, DFd) = F (8, 17) = 1.353
- P value = 0.2843
Goodness of Fit
- Degrees of Freedom = 17
- Multiple R = 0.6237
- R squared = 0.389
- Adjusted R squared = 0.1015
I got 1.41 and 1.7 for two independent variables.
Whenever I like an article in which regression analysis is used, I ask the authors if they can share some raw (!) data, because I'm writing a book and software about this topic, and I want to include very diverse real examples.
But, to my disappointment, practically nobody even reacts! Why?
Are people afraid that a new light on their data might disrupt their conclusions?
I thought openness was considered a virtue in the world of science?
But if I want to see articles that include data, I have to dig into very old ones!
What are your thoughts?
P.S.: I can still use simple datasets from physics to psychology, from chemistry to sociology, anything... (just 1 independent variable, preferably with information about the measurement imprecision). Of course I will quote you as the source. Thanks in advance!
I need to test the relationship between two different variables. One of the variables is scored 1-5, the other 1-7. Does having different scale ranges cause an error in the correlation or regression analysis results? Can you recommend a publication on the subject?
Is it possible to run a regression with both secondary and primary data in the same model? I mean, when the dependent variable is primary data sourced via questionnaire and the independent variable is secondary data gathered from published financial statements?
For example: the topic is capital budgeting moderators and shareholders' wealth (SHW). The capital budgeting moderators are proxied by inflation, management attitude to risk, economic conditions, and political instability, while SHW is proxied by market value, profitability, and retained earnings.
A dummy variable is a variable that takes specific numerical values for different attributes (full rank).
Basically I am looking at a dichotomous dependent variable and other variables which possibly predict it. The problem is that the odds ratio for each of these predictor variables changes depending on how many of them I have added into the binary logistic regression analysis. When I just look at one or two of them in the regression analysis, the odds ratios seem more accurate. Any advice would be much appreciated.
You can read my analysis about that.
Since I found a correlation between Timeliness and Semantic Accuracy (I'm studying the assessment of linked-data quality dimensions, trying to evaluate one quality dimension, in this case Timeliness, from another dimension, Semantic Accuracy), I presumed that regression analysis is the next step.
- The Semantic Accuracy formula I used is: msemTriple = |G ∧ S| / |G|
msemTriple measures the extent to which the triples in the repository G (the original LOD dataset) and in the gold standard S have the same values.
- The Timeliness formula I used is:
Timeliness(de) = 1 - max{1 - Currency(de)/Volatility(de), 0}
where:
Currency(de) = (1 - (lastModificationTime(de) - lastModificationTime(pe)) / (currentTime - startTime)) * Ratio
(the Ratio measures the extent to which the triples in the LOD dataset (in my case Wikidata) and in the gold standard (Wikipedia) have the same values),
and
Volatility(de) = (ExpiryTime(de) - InputTime(de)) / (ExpiryTime(pe) - InputTime(pe))
(de is the entity document of the datum in the linked-data dataset and pe is the corresponding entity document in the gold standard).
NB: I worked on Covid-19 statistics per country as a sample dataset, specifically numbers of cases, recoveries, and deaths.
This is my SPSS file: https://drive.google.com/file/d/1DqMqVv4JHPbo3-pAXmavuC91pMlImFlu/view?usp=drive_link
This is the output of my SPSS file: https://drive.google.com/file/d/1JxVf542Kq9KfxeWIqmm1deLfJv67HOUh/view?usp=drive_link
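If it helps at the regression stage, the Timeliness formula above reduces to simple arithmetic per entity once Currency and Volatility are computed; a minimal sketch in R (argument names follow the formulas above):

timeliness <- function(currency, volatility) {
  1 - pmax(1 - currency / volatility, 0)   # vectorised over entity documents
}
timeliness(currency = 0.8, volatility = 0.9)   # example: ~0.889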
I have performed hypothesis testing using a simple regression analysis model. What action must I take after testing the hypothesis?
I have a dataset with a spatial map with census tracts and nearest distances to nearby hospitals and cities. I need advice on how to process this data for regression analysis and generate maps in ArcGIS (to see correlation). If you have a guide, that would be great. Thanks.
After collecting my data, I decided to test my hypothesis using regression analysis. I also recognize that my data must meet the assumptions of the tool before I can use it. Therefore I would like to know: after testing two assumptions and finding that the data met them, can I just run the analysis, or must I test all the assumptions first?
In a psychology study of N = 149, I was testing for moderation using a three-step hierarchical regression analysis using SPSS. I had two independent variables, X1 and X2, an outcome variable, Y, and the moderator, M. Step 1 uses the variables X1, X2, the interaction X1X2, and 5 covariates. Step 2 adds M. Step 3 adds the interaction variables X1M and X2M.
In my collinearity statistics, VIF is under 10 for all variables in Steps 1 & 2 (a VIF of 6 is found for X2 and X1X2 in both steps). For Step 3, VIF is high for X1, X2, M, X1M, and X2M. When I look at the collinearity diagnostics box, the variance proportions are high for the constant, X1, M, and X1M. My understanding is that there is multicollinearity.
My question is: what does it mean when the constant shows high variance proportions? What would it mean if only one predictor variable and the constant were collinear?
This question is concerned with understanding the degree and direction of association between two variables, and is often addressed using correlation or regression analysis.
I am using SPSS version 28 and I want to know how to run the regression analysis. I have one dependent variable (BMC) and one independent variable (MVPA). I have two other variables, age and height, and I don't know how to adjust for them in SPSS. Are these independent variables too? There is no covariate box in the SPSS version I'm using.
Hello,
I'm working on a panel multiple regression, using R.
And I want to deal with outliers. Is there a predefined function to do that?
If yes, would you please give me an example of how to use it?
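There is no single built-in "remove outliers" function, but base R diagnostics go a long way; a minimal sketch using Cook's distance on a pooled fit (names illustrative; refit the panel model, e.g. with plm, on the cleaned data afterwards):

fit  <- lm(y ~ x1 + x2, data = dat)   # pooled fit used only for diagnostics
cd   <- cooks.distance(fit)
flag <- cd > 4 / nrow(dat)            # common rule-of-thumb cutoff
dat[flag, ]                           # inspect the flagged rows before dropping them
dat_clean <- dat[!flag, ]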
Of late, some journal editors insist on authors providing a justification for the order of entry of the predictor variables in hierarchical regression models.
a) Is there a particular way of ordering such variables in regression models?
b) Which are the statistical principles to guide the ordering of predictor variables in regression models?
c) Could someone suggest the literature or other bases for decisions regarding the ordering of predictor variables?
Greetings of peace!
My study is about the effect of servicescape on quality perception and behavioral intentions.
Independent variable: under servicescape there are 4 indicators:
Layout accessibility - 10 items
Ambience condition - 3 items
Facility aesthetics - 6 items
Facility cleanliness - 4 items
Quality perception serves as the mediator, with 3 items.
Dependent variable: behavioral intentions - 4 items.
All were measured using a Likert scale (N = 400).
I tried ordinal regression analysis, but I don't know how to combine the items, and the independent variable is ordinal. The Pearson value is <0.001 and the Deviance is 1.000.
I need to get the effect of the individual servicescape indicators on quality perception and behavioral intentions.
Thank you in advance
The variable physical environment effect is only a subset of the independent variable (environmental factors) in my research; there are social and cultural environment effects as well. They are measured in my questionnaire with five questions, and the responses are: never, rarely, often, and always. The dependent variable, student performance, was also measured in the same format as the environmental factors (i.e. with five questions and never, rarely, ... as the responses). I have coded them into SPSS with the measure set to Ordinal. I want to answer the research questions: 1. How does the physical environment affect student performance? 2. How does the social environment affect student performance? 3. To what extent does the cultural environment influence student performance? I've computed the composite score (mean) for the questions; can I use these scores in the ordinal regression analysis? Or is there another way to combine the questions into a single variable, for both the independent and dependent variables?
Hi,
I want to predict the traffic vehicle count at different junctions in a city. Right now I am modelling this as a regression problem, so I am scaling the traffic volume (i.e. count of vehicles) between 0 and 1 and using this scaled attribute for the regression analysis.
As part of the regression analysis I am using an LSTM with mean squared error (MSE) as the loss function. I convert the predicted and the actual output back to the original scale (using `inverse_transform`) and then calculate the RMSE.
But as a result of the regression I get a decimal output (for example 520.4789), whereas the actual count is an integer (for example 510).
Is there any way to predict the output as an integer?
(i.e. my model should predict 520, and I do not want to round off to the nearest integer)
If so, what loss function should I use?
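A hedged suggestion: if the target is a count, a Poisson view of the output step gives a principled integer prediction without naive rounding, because every quantile of a Poisson distribution is an integer. Sketch in R, with lambda standing in for the model's predicted mean (the LSTM itself is untouched; many frameworks also offer a Poisson loss for training):

lambda   <- c(520.4789, 3.2, 47.9)   # predicted means from the model (illustrative values)
int_pred <- qpois(0.5, lambda)       # integer median of a Poisson with that mean
int_pred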
If the value of the correlation is insignificant or negligible, should we run the regression analysis or not? Obviously it will be insignificant; is it necessary to mention this in the article?
Are correlation and regression analysis part of descriptive or inferential statistics?
Hello researchers,
I am facing a problem doing a regression analysis with three independent variables, one mediating variable, and one dependent variable. How can I do this in SPSS? Can anyone please help me?
When doing a regression analysis, the coefficients table in SPSS shows that my 3 main effects are significant. When I do a regression analysis for my 6 moderating effects, for which I created interaction terms, the coefficients table also shows these are significant. But when I include the 3 main effects and 6 moderating effects at the same time, none is significant. How should I interpret this? And how should I continue?
While examining some students on their final-year project defences, I discovered that a student had an adjusted R² greater than 99% in the regression analysis of her work. Could that be possible?
This question is concerned with determining whether two or more groups differ in some meaningful way on a particular variable or set of variables, and is often addressed using statistical tests such as t-tests, ANOVA, or regression analysis.
Can you give all the criteria to evaluate the forecasting performance of the regression estimators?
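As a starting checklist, RMSE, MAE, MAPE, and out-of-sample R² cover most routine needs, ideally computed on a holdout set rather than the estimation sample; a minimal sketch (actual and predicted are assumed numeric vectors of equal length):

forecast_metrics <- function(actual, predicted) {
  e <- actual - predicted
  c(RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = 100 * mean(abs(e / actual)),   # undefined when any actual value is 0
    R2   = 1 - sum(e^2) / sum((actual - mean(actual))^2))
}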
I am performing a cross-country regression analysis with a sample of 101 countries. Most of my variables are averages of annual data across a period of 7 years. Each of my primary variables has data available in every one of these 7 years. However, certain countries have data missing in certain years for variables used in my robustness checks.
How should I handle this missing data for each robustness variable? Here are a few ideas I have considered
A. Average data for each country, regardless of missing years
B. Exclude any country with any missing years from data for that respective variable
C. Exclude countries that are missing data up to a certain benchmark, perhaps removing countries that are missing more than 2 or 3 of the 7 years that are being averaged for that respective regressor
D. Only use robustness variables that have available data for every country in every year that is being averaged
Please offer the best solution and any other solutions that would be acceptable.
Dear fellows,
Maybe you have done interesting measurements to test some model?
I can always use such data to use as examples and tests for my regression analysis software, and it's a win-win, since I might give you a second opinion on your research.
It's important that I also get the imprecision (measurement error/confidence interval) on the independent and dependent variables. At the moment my software handles only one of each, but I'm planning to expand it to more independent variables.
Thanks in advance!
What are the assumptions of multinomial and linear regression analysis?
In finding the correlation and regression of a multivariable distribution, what is the significance of R and R²? What is the main relation between them?
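In short, R is the multiple correlation between the observed and fitted values of the dependent variable, and R² is its square, read as the proportion of variance explained; the identity is easy to verify numerically (built-in mtcars data used purely as an illustration):

fit <- lm(mpg ~ wt + hp, data = mtcars)
R <- cor(mtcars$mpg, fitted(fit))   # multiple correlation R
R^2                                 # proportion of variance explained
summary(fit)$r.squared              # same value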
Hello,
I am doing a multiple regression with 2 predictors. The predictors correlate moderately/strongly, r = 0.45. When the first predictor is put in a regression analysis on its own, it explains 16.8% of the variance of the dependent variable. The second predictor on its own explains 17.5% of the variance of the dependent variable. When both predictors are put into the regression analysis, the VIF = 1.26, so multicollinearity should not be a problem. The predictors together explain 23.4% of the variance of the dependent variable.
First of all, I would like to ask whether the change in explained variance from 16.8-17.5% to 23.4% is a big change; more specifically, whether the predictors together are better at predicting the dependent variable than either one alone. Also, as the predictors correlate but the VIF is okay, is it safe to say that they probably explain some of the same variance in the dependent variable, i.e. that each explains little unique variance?
I would like to create an interaction term of two variables for regression analysis.
I was wondering how to create it.
I was thinking of multiplying the scores of the two, but I would like to hear from other researchers. Thank you.
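Multiplying the two scores is indeed the standard construction, ideally after mean-centering each score to ease collinearity and interpretation; in R the * operator in a formula builds the product term and keeps both main effects (names illustrative):

dat$x1c <- as.numeric(scale(dat$x1, scale = FALSE))   # mean-center
dat$x2c <- as.numeric(scale(dat$x2, scale = FALSE))
fit <- lm(y ~ x1c * x2c, data = dat)                  # expands to x1c + x2c + x1c:x2c
summary(fit)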
Suppose I want to predict energy consumption for my building using regression analysis. What factors should I consider including in my model, and how can I determine their relative importance?
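Typical candidates are outdoor temperature (often via degree-days), floor area, occupancy, and equipment schedules. As one hedged way to compare relative importance, standardize everything so the coefficients are on a common scale (all variable names below are assumptions):

fit <- lm(scale(energy) ~ scale(temp) + scale(area) + scale(occupancy), data = dat)
summary(fit)   # standardized betas: larger |beta| means larger relative contribution
# packages such as relaimpo offer more formal decompositions of R^2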
I am using fixed-effects panel data with 100 observations (20 groups), 1 dependent and three independent variables. I would like to get a regression output from it. My question is: is it necessary to run any normality test and linearity test for panel data? And what difference would it make if I don't run these tests?
Situation: the moderating variable can explain up to 25 percent, while the remaining 75 percent is explained by other factors outside the model. What does this mean? Or would this mean the moderating variable did not significantly moderate the relationship between the IV and DV? Thank you to anyone who responds!
Propensity score matching (PSM) and endogenous switching regression (ESR) by full information maximum likelihood (FIML) are the most commonly applied models in impact evaluation when there are no baseline data. Sometimes the results from these two methods differ. In such cases, which one should be trusted more, given that both models have their own drawbacks?
I would like to know if I am wrong in doing this: I made quartiles out of my independent variable and from those I made dummy variables. When I do linear regression I have to record the betas with 95% CIs per quartile per model (I adjust model 1 for age and sex). Can I enter all the dummies into the model at the same time, or do I have to enter them separately (while also adjusting for age and sex)?
So far I entered all the dummies and adjusted for age and sex at the same time, but now I wonder whether SPSS fails to adjust for the second and third dummy variables. So I think I may need to redo my calculations and run my models with one dummy in each.
Thank you.
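For reassurance: entering all the quartile dummies together, with one quartile left out as the reference, is the correct approach; SPSS, like R, then adjusts each dummy for the others plus age and sex simultaneously. Sketched in R with assumed names:

dat$quart <- cut(dat$x, breaks = quantile(dat$x, probs = 0:4 / 4),
                 include.lowest = TRUE, labels = c("Q1", "Q2", "Q3", "Q4"))
fit <- lm(y ~ quart + age + sex, data = dat)   # Q1 is the reference category
confint(fit)                                   # 95% CIs for each quartile beta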
Hello everyone,
I am currently working on my thesis but I have encountered a problem and I am not sure how to solve it. I would like to measure the impact that ESG (Environmental, Social, Governance) has on financial performance (ROA, ROE) from 2016 to 2021. Some important details about my study:
- I would like to compare two samples of companies: a first group with ESG scores that is part of the DJ Sustainability Index (DJSI) and another group without ESG scores (not part of the DJSI).
- I intend to analyze companies that have been part of the DJSI between 2016 and 2021. However, some companies don't have an ESG score (the independent variable) for some years. Should I still collect data for my dependent variables for all the years? For example, if company X has ESG scores for 2016 and 2017 only, do I need ROA and ROE data for all the years or just for 2016 and 2017?
- Any other aspects I should consider?
Thanks!!
Hi,
I conducted a survey where all the question items corresponded to the components of each variable I wanted to measure. I consolidated these components following a literature review. I built the survey with 5-point Likert scale questions.
I'm measuring the impact of an independent variable on a dependent variable, and I am considering a linear regression analysis.
My question is: how can I combine all those components of the independent variable into one? I have read that averaging is a frequently used method if all components measure the same thing, but I am a bit worried that the components may not all have the same weight or influence in determining the outcome.
Would you have any recommendations on this topic? I'm happy to read any research articles
Thanks for your help
Hi,
I have a dependent variable that represents ridership on each street. The numbers are bucketed to the nearest 50, so the values are 50, 100, 150, and so forth.
My independent variable is also discrete: 1, 2, 3, 4, etc., representing street continuity.
Would it be appropriate to run a linear regression analysis to see whether there is a correlation between these two variables?
Note that I will run the analysis on multiple cities.
I'm working on the below topic for my master thesis.
“Investigating the stages in a customer’s buying journey and determining the factors influencing the decision to switch between a retailer’s online sales channels – marketplace and own website.”
Considering this, my plan was to apply logit regression analysis with the customer's channel choice (in this case "marketplace" vs. "retailer's own website") as the dependent variable and the interaction between the independent variables age and subjective norms (recommendations from peers, product reviews) for the three stages.
I'm struggling to ascertain whether the customer's channel choice of either marketplace or own website can be used as the dependent variable. I have not used a Likert scale, as this was a scenario-based survey; the respondents chose the channel they would use at every stage.
Could you please advise whether using this choice as a dependent variable makes sense, and whether logit regression is the right way to go?
Also, how does one calculate/analyze the relative importance of the predictor (independent) variables in a logit regression analysis?
Is there any explanation for a strong, adequate, or low value of it? Thank you.
I'm doing a regression analysis of the effect of housing type on resident depression. When I included all cases in a single model, housing type had a significant effect on depression (p = 0.000). But when I divided the sample into males and females and performed the regression analysis on the two groups separately, housing type had no significant effect on depression in either group (p = 0.1-0.2). I wonder how to explain this result.
I have 667 participants in my sample, and the outcome is continuous. I tested normality; the histogram is bell-shaped, but the test results show the data are not normally distributed.
1- What is the cause of the discrepancy between the chart and the test results?
2- Can I still perform linear regression analysis on this data?
I am testing hypotheses about relationships between CEA and innovation performance (IP). If I am testing the relationship of one construct, say management support, to IP, is it okay to use simple linear regression? Or should I test it in a multiple regression with all the constructs?
We want to analyze the relationship and impact between two variables in the education sector. The first variable is the independent variable (intellectual capital), measured on a sample of workers and leaders of size 150; the second is the dependent variable (quality of service provided), measured on a sample of students and parents of size 330.
Regression analysis is used for models that have covariables.
- Adjusted R² = 5.99%
- F value = 9.61
- p-value = 0.00
- Oral Com = 21.36 - 1.194 × (dissatisfaction with one's linguistic skills)
My questionnaire consists of 20 questions, five of which relate to the dependent variable. The problem is that those questions are not on a Likert scale; they use different scales with fixed answers, and one is a multiple-choice question.
For example, question 5 has options 1-4,
question 6 is dichotomous (1-2),
question 7 is multiple choice with 7 options,
question 8 has options 1-5 to pick one.
Can I build a composite index for the dependent variable by standardizing these variables using z-scores?
Can I use the standardized variables to perform correlation and regression analysis?
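Standardizing and averaging is a common, defensible way to build such an index when items sit on different fixed scales, with one caveat: a nominal multiple-choice item (like the 7-option question) cannot be meaningfully z-scored and is better entered as dummies or left out of the composite. A minimal sketch with assumed item names:

items <- dat[, c("q5", "q6", "q8")]          # ordinal/dichotomous items on different scales
z <- scale(items)                            # per-item z-scores
dat$dep_index <- rowMeans(z, na.rm = TRUE)   # composite dependent variable
cor(dat$dep_index, dat$predictor)            # correlation/regression then proceed as usual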