Question
Asked 4 November 2023

Using a "secondary" regression on a significant IV in a "primary" multiple linear regression?

Suppose one has 40 or 50 survey questions for an exploratory analysis of a phenomenon, several of which are intended to be dependent variables, but most of which are independent variables. An MLR is conducted with, e.g., 15 IVs to explain the DV, and maybe half turn out to be significant. Now suppose an interesting IV warrants further investigation, and you think you have collected enough data to at least partially explain what makes this IV so important to the primary DV. Perhaps another, secondary model is in order... i.e., you'd like to turn a significant IV from the primary model into the DV in a new model.
Is there a name for this regression or model approach? It is not exactly nested, hierarchical, or multilevel (I think). The idea, again, is simply to explore what variables explain the presence of IV.a in Model 1, by building Model 2 with IV.a as the DV, and employing additional IVs that were not included in Model 1 to explain this new DV.
I am imagining this as a sort of post-hoc follow-up to Model 1, which might sound silly, but this is an exploratory social science study, so some flexibility is warranted, imo.
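To make the setup concrete, here is a minimal sketch in Python with statsmodels of the two-model idea described above; the data file and column names (dv_primary, iv_a, iv_extra1, etc.) are hypothetical placeholders, not the actual survey variables, and only a handful of the ~15 IVs are shown.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey data; all column names below are placeholders.
df = pd.read_csv("survey.csv")

# Model 1: primary MLR with the survey IVs explaining the primary DV
# (only a few of the ~15 IVs are shown for brevity).
m1 = smf.ols("dv_primary ~ iv_a + iv_b + iv_c + iv_d", data=df).fit()
print(m1.summary())

# Model 2: the significant, interesting IV from Model 1 (iv_a) becomes
# the DV, explained by additional IVs that were NOT used in Model 1.
m2 = smf.ols("iv_a ~ iv_extra1 + iv_extra2 + iv_extra3", data=df).fit()
print(m2.summary())
```

Mechanically the two fits are independent, which matches the post-hoc, exploratory framing of the question; the sketch says nothing about how the resulting inferences should be combined.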

All Answers (4)

David C. Coker
West Liberty University
Exploratory models offer a lot of possibilities. Do you believe there are confounders, lurking variables, or moderators/mediators, and is the path IV --> DV ordered correctly? Did theory and precedent assist in picking IVs? Be careful you don't go on a fishing expedition until you get what you want: statistical significance must also be practical significance, more than noise or random error (common with large samples plus a large number of variables), and part of a logical model. If you use post hoc testing, you would also need to report it.
1 Recommendation
Daniel Wright
University of Nevada, Las Vegas
I am not sure that I understand, but if you have a set of DVs and a set of IVs and want to first reduce the dimensionality of each of these sets, there are a few approaches. You might start with https://rpubs.com/esobolewska/pcr-step-by-step and https://psycnet.apa.org/doiLanding?doi=10.1037%2Fmet0000503
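To make the dimensionality-reduction suggestion concrete, here is a minimal sketch of principal components regression (the approach the first link walks through step by step) using scikit-learn in Python; the data file, IV names, and number of components are hypothetical placeholders.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("survey.csv")                       # hypothetical data
iv_cols = ["iv_a", "iv_b", "iv_c", "iv_d", "iv_e"]   # placeholder IV set

# Standardize the IVs, keep the first few principal components, and
# regress the DV on those components (principal components regression).
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(df[iv_cols], df["dv_primary"])

print("Training R^2:", pcr.score(df[iv_cols], df["dv_primary"]))
print("Explained variance ratios:", pcr.named_steps["pca"].explained_variance_ratio_)
```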
David L Morgan
Portland State University
If you have coherent subsets of your variables (i.e., they all measure essentially the same thing), then you can create scales that are stronger measures than any of the variables taken alone.
I have consolidated a set of references on this approach here:
1 Recommendation
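As a rough sketch of the scale-building idea in Python (the item names and the 0.7 threshold mentioned in the comments are illustrative assumptions, not recommendations from the answer above), a coherent subset of items can be averaged into a scale after checking internal consistency:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items assumed to measure one construct."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

df = pd.read_csv("survey.csv")     # hypothetical data
items = ["q12", "q17", "q23"]      # placeholder names of a coherent item subset

alpha = cronbach_alpha(df[items])
print(f"Cronbach's alpha: {alpha:.2f}")

# If reliability looks acceptable (alpha >= 0.7 is a common rough rule of thumb),
# use the mean of the items as the scale score in place of the individual IVs.
df["scale_score"] = df[items].mean(axis=1)
```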
James R Knaub
Retired US Fed Govt/Home Research
A similar situation occurred when I worked at the US Energy Information Administration: a friend/colleague asked about, and then looked into, predicting some values of a predictor variable that were not observed for the past finite population used for predictor data, but were present in the period previous to that. This complicates variance estimation and makes variances larger, but predictions can be done this way. However, what we had were very simple models with predictors on the same variables from a previous period. You seem to have more variables involved, which could be a lot messier.
Because predictor data are not supposed to be random variables (only the Y-variables are), this makes the situation a bit odd. Even for errors-in-variables models, the variability in the predictors is supposed to be due to measurement error, which isn't quite the same thing as a random variable, in my opinion. So it is an odd situation, but OK, I'd say. However, with multiple regression, as you are looking at (actually it sounds like multivariate multiple regression), you already have to worry about how the predictors (IVs) work together, collinearity for example, so this could make the situation even harder to interpret.
Best wishes - Jim
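On the collinearity point raised above, a minimal diagnostic sketch in Python with statsmodels is the variance inflation factor; the data file and IV names are hypothetical placeholders.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("survey.csv")                 # hypothetical data
iv_cols = ["iv_a", "iv_b", "iv_c", "iv_d"]     # placeholder IV set

# Include a constant so each VIF is computed relative to a model with an intercept.
X = sm.add_constant(df[iv_cols])

# One VIF per column; values well above roughly 5-10 are a common (if crude)
# warning sign that an IV is largely explained by the other IVs.
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs.drop("const"))
```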
