OLS - Science method

Explore the latest questions and answers in OLS, and find OLS experts.
Questions related to OLS
  • asked a question related to OLS
Question
3 answers
What are the pre-estimation tests or cautions for dynamic panel data analysis? For example, in pooled OLS we run the unit root test, cointegration test, VIF test, etc.
Relevant answer
Answer
The key pre-estimation check is whether endogeneity is present in the data set.
  • asked a question related to OLS
Question
6 answers
The threshold least squares regression model of Hansen (2000) splits the series into two regimes endogenously, one above the threshold and one below it, and then regresses each regime separately by OLS. The method also involves bootstrap replication. In my case only 17 observations remain in the regime above the threshold. Does this create a loss-of-degrees-of-freedom issue in the data?
Relevant answer
Answer
It is not possible to answer that question without more information. First, you should say how many observations you have. A regression with 17 observations is questionable but it depends on the number of explanatory variables. Since these observations correspond to above the threshold, I fear that they are outliers, hence we are in the worst situation. Did you consider a transformation on the dependent variable? That would perhaps improve the situation provided the relationship with the other variables allows for it. I would say that each of the two regimes should have enough observations.
  • asked a question related to OLS
Question
1 answer
Dear researchers,
Selecting a function of the right form (linear, polynomial, exponential, power law ...) to fit a set of data usually requires the use of some knowledge and some trial-and-error experimentation.
In practice, I guess researchers:
- first select a function form and
- then use a chosen method (e.g., ordinary least squares, OLS) to estimate the parameters of that model, minimising a defined objective (e.g., minimising the RMSE)
The web contains numerous guidelines on how to estimate the parameters for a given objective. However, at first, from my understanding, the function form must be assumed.
My concern comes from a very concrete issue.
I have numerous inputs and one output. I would like to build a model to predict the output. I've checked numerous laws in the form:
ex:
test 1: F1 = a*X^2 + b*X + c*Y^2 + d*Y + e*(X*Y)^2 + f*(X*Y)
test 2: F2 = a*ln(X) + Y^b + c
..
For each test, I've used train/test subsets, the OLS method to find the parameters, and then RMSE computation... the very usual process, I guess.
Is there research work/tools to automatically generate the functions to evaluate?
I've been searching online for days so any help will be very much appreciated.
Regards,
Relevant answer
Answer
I have found a partial answer while studying the "symbolic regression" method.
With symbolic regression, the shape of the function used to fit the data is not assumed.
Regards
  • asked a question related to OLS
Question
9 answers
Long story short:
I use a long unbalanced panel data set.
All tests indicate that 'fixed effects' is more appropriate than 'random effects' or 'pooled OLS'.
No serial correlation.
BUT, heteroskedasticity is present, even with robust White standard errors.
Can someone suggest a way to either 'remove' or just 'deal' with heteroskedasticity in panel data model?
Relevant answer
Answer
To address heteroskedasticity in a panel data model:
  1. Use Robust Standard Errors: Apply heteroskedasticity-consistent standard errors (e.g., White’s robust standard errors).
  2. Transform Data: Consider log-transformations or other data transformations to stabilize variance.
  3. Apply Generalized Least Squares (GLS): Use GLS techniques that adjust for heteroskedasticity.
  4. Include Additional Variables: Add relevant covariates that might explain the variance in the residuals.
  5. Test and Adjust: Use tests like the Breusch-Pagan test to diagnose heteroskedasticity and adjust your model accordingly.
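The robust-standard-error idea in step 1 can be sketched in pure Python for a one-regressor model. The data below are simulated with an error spread that grows with x, so they are heteroskedastic by construction; everything here is illustrative.

```python
# White (HC0) robust standard errors for a one-regressor OLS fit.
import random

def ols_with_robust_se(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance assumes one common error variance s2
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = (s2 / sxx) ** 0.5
    # HC0: weight each squared residual by its own (x_i - xbar)^2
    se_robust = (sum((xi - xbar) ** 2 * e ** 2
                     for xi, e in zip(x, resid)) / sxx ** 2) ** 0.5
    return slope, se_classical, se_robust

random.seed(0)
x = [i / 10 for i in range(1, 101)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]  # variance grows with x
slope, se_cl, se_hc0 = ols_with_robust_se(x, y)
```

For real work, the same computation for the full multivariate case is available off the shelf, e.g. statsmodels' `fit(cov_type="HC0")` or Stata's `robust` option.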
  • asked a question related to OLS
Question
6 answers
I need to understand which model I should select from the attached file. Are these results valid for model selection? The LM test is insignificant, which argues against the RE model. If pooled OLS is adequate, should I still consider the FE model? And if FE is preferred according to the Hausman test, the Pesaran (cross-sectional dependence) and Wooldridge (serial correlation) tests are both significant.
How to move from there, please guide.
Relevant answer
Answer
Panel data analysis chooses between pooled OLS, RE, and FE based on assumptions about individual effects. Pooled OLS ignores individual differences. Random Effects (RE) assumes individual effects are uncorrelated with other variables. Fixed Effects (FE) accounts for all time-invariant individual differences. The choice depends on whether individual effects are correlated with other variables and if time-invariant variables are of interest.
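The fixed effects estimator described above amounts to a "within" transformation: demean y and x inside each unit, then run pooled OLS on the demeaned data. A sketch with toy, hypothetical numbers, where the unit intercepts are deliberately correlated with x:

```python
# Fixed effects via within-unit demeaning, compared with pooled OLS.
def ols_slope(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

panel = {
    "A": {"alpha": 0,  "x": [0, 1, 2]},
    "B": {"alpha": 10, "x": [10, 11, 12]},
    "C": {"alpha": 20, "x": [20, 21, 22]},
}
x_all, y_all, x_dm, y_dm = [], [], [], []
for unit in panel.values():
    xs = unit["x"]
    ys = [unit["alpha"] + 2 * xi for xi in xs]   # true common slope = 2
    x_all += xs
    y_all += ys
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    x_dm += [xi - xm for xi in xs]               # within-unit demeaning
    y_dm += [yi - ym for yi in ys]

pooled = ols_slope(x_all, y_all)   # biased: effects correlated with x
within = ols_slope(x_dm, y_dm)     # recovers the true slope of 2
```

Because the unit intercepts rise with x, pooled OLS overstates the slope, while the within estimator recovers it exactly; this is the correlated-effects case where FE is the right choice.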
  • asked a question related to OLS
Question
6 answers
I am running a panel data model and the Hausman test shows p-value = 0.0113, which means that the FE model is better than RE. What test is required to choose between the fixed effects model and pooled OLS? Please explain the test, the steps, and the Stata command. Thank you very much.
Relevant answer
Answer
The test you want is the F test of the joint significance of the unit effects, reported at the bottom of Stata's xtreg, fe output ("F test that all u_i = 0"): a significant F statistic rejects pooled OLS in favour of FE. More broadly, use FE when you need to control unobserved, time-invariant heterogeneity, and pooled OLS only when individual-specific effects can safely be ignored.
  • asked a question related to OLS
Question
2 answers
I am performing the Breusch-Pagan test and it shows an insignificant p-value. The Hausman test also shows an insignificant p-value.
I wanted to ask if OLS, Fixed-effect or Random-effect model will be suitable for my dataset, depending on the above test statistics.
Relevant answer
Answer
An insignificant Breusch-Pagan LM test means the random-effects variance component is not needed, so pooled OLS is preferred over random effects. An insignificant Hausman test means random effects would be preferred over fixed effects. Taken together, the two results point to the pooled OLS model.
  • asked a question related to OLS
Question
9 answers
Here is the case: as I said, I am working on how macroeconomic variables affect REIT index returns. Which tests or estimation methods should I use to understand this relationship?
I know I can use OLS, but is there any other method? All my time series are stationary at I(0).
Relevant answer
Answer
You can use econometric methods such as regression analysis, Vector Autoregression (VAR), or Granger causality tests to analyze how macroeconomic variables affect REIT index returns.
  • asked a question related to OLS
Question
2 answers
I run OLS regression on panel data in Eviews and then 2SLS and GMM regression.
I introduced all the independent variables of OLS as instrumental variables.
I am getting exactly the same results under the three methods.
Is there any mistake in how I ran the models?
I am also attaching the results.
thanks in advance
Relevant answer
Answer
OLS (ordinary least squares), 2SLS (two-stage least squares), and GMM (generalized method of moments) are all econometric estimators that rest on different assumptions, so in general they produce different results.
In your case, however, identical results are exactly what the setup implies: if the instrument set consists of the OLS regressors themselves, the first stage of 2SLS reproduces each regressor perfectly, so 2SLS is numerically identical to OLS, and the just-identified GMM estimator coincides with 2SLS. To obtain different estimates you need at least one instrument that is excluded from the structural equation.
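A numerical illustration (toy data, pure Python) of why using the OLS regressors themselves as "instruments" changes nothing: the first stage of 2SLS regresses x on the instrument z, and when z is x itself the fitted values are exactly x, so the second stage is plain OLS.

```python
# 2SLS for a single regressor, showing instrument = regressor -> OLS.
def ols_slope(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

def two_sls_slope(x, z, y):
    # First stage: project x on z; second stage: regress y on fitted x.
    g = ols_slope(z, x)
    xb, zb = sum(x) / len(x), sum(z) / len(z)
    xhat = [xb + g * (zi - zb) for zi in z]
    return ols_slope(xhat, y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
z = [1, 1, 2, 2, 3]              # a genuinely different instrument

same = two_sls_slope(x, x, y)    # instrument = regressor -> equals OLS
diff = two_sls_slope(x, z, y)    # excluded instrument -> differs
```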
  • asked a question related to OLS
Question
3 answers
For instance, when using OLS, the objective of the study could be
# to determine the effect of A on B
could this kind of objective hold when using threshold regression?
Relevant answer
Answer
Thank you
  • asked a question related to OLS
Question
5 answers
Dear community,
I spent the last few days reading many questions and responses, as well as several journal articles and research papers, on time series analysis and its individual steps. Unfortunately, the more I read, the more confused I became... now I hope that you can help me out.
I am writing my master thesis about FDI determinants in Ethiopia. I have selected FDI per GDP as dependent variable and several independent variables (exchange rate change, inflation rate, labor force participation rate, telephone subscriptions per 100 citizens, a governance index (ranking from 0 to 100), and exports + imports as ratio of GDP). I am also thinking about adding a dummy variable to reflect the effects of the Growth and Transformation Plan in 2010.
The next step is to analyse the data empirically in order to investigate the impact of the explanatory variables on the FDI inflow. Due to data availability I only cover a period of 20 years (1996-2016). I read several papers on this topic but somehow everyone performed/applied different steps/tests/models. Also the order of the test performed varies in the papers.
I am really confused by now with regards to the differences between OLS, ECM, ARDL, and VAR and what is the most appropriate method in my case.
In addition, authors performed (some didn't) different tests for unit root/stationary, for co-integration, for multicollinearity, for autocorrelation, for heteroskedasticity. Also in a different order.
Besides, I am confused about the lag selection process and the meaning/difference of AR(1) and I(1).
Moreover, many authors first transformed the variables with logs. I cannot do that, as I have negative observations (inflation rate).
Earlier I also read something about trend and difference stationary and depending on this different unit root test.
Like I said, I am just so confused by now that I don't even know how and where to start.
I am working mainly with SPSS but will perform the unit root tests in EViews, as SPSS does not have this function.
I really hope that you can help me by providing a little guideline on what I need to do and in which order. Thank you so much!
Relevant answer
Answer
Here are some steps you can follow to perform time series analysis for a period of 20 years:
  1. Collect the data and clean it.
  2. Prepare visualization with respect to time vs. key feature.
  3. Observe the stationarity of the series.
  4. Develop charts to understand its nature.
  5. Build the model – AR, MA, ARMA, and ARIMA.
  6. Evaluate the model using statistical measures such as RMSE, MAPE, etc.
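Step 3 above can be roughed out by estimating the AR(1) coefficient with OLS: a value near 1 suggests a unit root, in which case the series is differenced before modelling. A sketch on a simulated random walk (illustrative only):

```python
# AR(1) coefficient as a crude stationarity check, before and after
# first differencing a simulated random walk.
import random

def ar1(y):
    """OLS slope of (y_t - mean) on (y_{t-1} - mean)."""
    m = sum(y) / len(y)
    d = [v - m for v in y]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(v * v for v in d[:-1])
    return num / den

random.seed(42)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0, 1))   # unit-root process

diff = [b - a for a, b in zip(walk, walk[1:])]   # first difference

phi_level = ar1(walk)   # close to 1 -> nonstationary in levels
phi_diff = ar1(diff)    # close to 0 -> stationary after differencing
```

In practice one would use a formal unit root test (ADF, PP, or KPSS, available in EViews) rather than eyeballing the AR(1) estimate, but the logic is the same.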
  • asked a question related to OLS
Question
4 answers
Since OLS and fixed effects estimation differ, for a fixed effects panel data model estimated using a within regression, which assumptions (for example, no heteroskedasticity, linearity) do I need to test before running the regression?
I'm using the xtreg, fe and xtscc, fe commands in Stata.
Relevant answer
Answer
Before performing a fixed effects regression on panel data, several assumptions should be tested to ensure the validity of the results. These assumptions include:
  1. Time-invariant individual effects: One assumption of the fixed effects model is that the individual-specific effects are time-invariant; that is, the unobserved individual-specific factors affecting the dependent variable remain constant over time. Note that, unlike random effects, fixed effects does not require these effects to be uncorrelated with the regressors: allowing such correlation is precisely why FE is used, and the Hausman test compares FE and RE on exactly this point.
  2. No perfect multicollinearity: The independent variables in the regression should not exhibit perfect multicollinearity, which occurs when one or more independent variables are perfectly linearly dependent on others. Perfect multicollinearity can lead to unreliable coefficient estimates and inflated standard errors.
  3. No endogeneity: The assumption of exogeneity implies that the independent variables are not correlated with the error term. Endogeneity can arise when there are omitted variables, measurement errors, or simultaneity issues. Various tests, such as instrumental variable approaches or tests for correlation between the residuals and the independent variables, can be employed to check for endogeneity.
  4. Homoscedasticity: Homoscedasticity assumes that the error term has constant variance across all observations. Heteroscedasticity, where the variance of the error term varies systematically, can lead to inefficient coefficient estimates. Graphical methods, such as plotting residuals against predicted values or conducting formal tests like the White test, can be used to diagnose heteroscedasticity.
  5. No serial correlation: Serial correlation, also known as autocorrelation, assumes that the error terms are not correlated with each other over time. If there is serial correlation, it violates the assumption of independence of observations. Diagnostic tests like the Durbin-Watson test or plotting residuals against time can help identify serial correlation.
  6. Normality of errors: The assumption of normality assumes that the error term follows a normal distribution. Departures from normality can affect the reliability of hypothesis tests and confidence intervals. Graphical methods, such as histograms or Q-Q plots of residuals, can help assess normality.
  • asked a question related to OLS
Question
5 answers
Hello,
Need your help,
My research topic is the impact of dividend policy and profitability on share prices. I have taken 29 non-financial companies over a 10-year time span. I ran a simple OLS model in EViews 12 Student Version but faced the problem of positive autocorrelation. I also applied panel data analysis: pooled OLS, fixed effects, and random effects models. Is it required to fulfil the assumptions, i.e., normality of residuals, homoskedasticity of residuals, no autocorrelation, no multicollinearity? EViews doesn't allow diagnostic tests in panel data analysis. I don't have much background in econometrics; please guide me so I can complete my multiple linear regression analysis.
Thank you in Advance.
Relevant answer
Answer
It sounds as if your model is something like
share price_it = b0 + b1 dividend_policy_it + b2 profitability_it + u _it
for various definitions of b0. Before you do any econometrics I would suggest that you consider the following points
  1. Are you sure that your model is reasonably complete or are there other variables that determine share price? If there are other variables and they are correlated with the included explanatory variables then the estimates of your coefficients may be biased.
  2. Share prices are generally non-stationary.
  3. If you want to do some form of panel data analysis you are implying that b1 and b2 are constant across firms. Consider the different effects on a firm with a small share price (many shares) and one with a large price (few shares).
The idea is that before you do any econometrics you use your knowledge of finance, economics, and common sense to set out the model which describes how the system works. When you have this model you can then test it. If it fails the tests, either your data or your model is wrong.
The theory of share prices is very difficult. Very often random walk models outperform other models.
  • asked a question related to OLS
Question
3 answers
I am using a Durbin-Watson test as a method through which to test for autocorrelation in a time series. Said data forms the basis of an interrupted time series analysis. My question is whether the absence of detected autocorrelation in the DW test on a simple model (OLS) is sufficient to inform modelling thereafter (ie. should I undertake the DW test on other plausible model types or does the DW test on an OLS suffice)?
Relevant answer
Answer
Hi,
Just to mention (see Wikipedia): the Durbin–Watson statistic, while displayed by many regression programs, is not applicable in certain situations. For instance, when lagged dependent variables are included among the explanatory variables, it is inappropriate to use this test. Durbin's h-test, or likelihood-ratio tests that are valid in large samples, should be used instead.
Hamid
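For reference, the statistic under discussion is simple to compute directly: DW = Σ(e_t − e_{t−1})² / Σe_t², with values near 2 indicating no first-order autocorrelation, near 0 positive, and near 4 negative. A pure-Python sketch with contrived residual patterns:

```python
# Durbin-Watson statistic computed from a residual series.
def durbin_watson(resid):
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(e ** 2 for e in resid)
    return num / den

smooth = [1.0] * 10 + [-1.0] * 10    # strongly positively correlated
alternating = [1.0, -1.0] * 10       # strongly negatively correlated

dw_pos = durbin_watson(smooth)       # well below 2
dw_neg = durbin_watson(alternating)  # well above 2
```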
  • asked a question related to OLS
Question
1 answer
I am already familiar with the process of calculating hedge ratios with linear regression (OLS). I am already running 4 different regressions for calculating hedge ratios between emerging markets and different hedging assets like gold. This is done on in-sample data.
That would look something like this: EM=a+b*GOLD+e
I then construct a portfolio, test its standard deviation, and compare that with the non-hedged portfolio of only emerging market equities out-of-sample: R - b*GOLD
However, I want to compare these OLS hedge ratios to conditional hedge ratios from for instance a BEKK GARCH or a DCC GARCH.
I have already tried to work with R and I used the rugarch and rmgarch packages and created a model, modelspec and modelfit, but I do not know how to go from there.
Relevant answer
Answer
Calculating conditional hedge ratios using models like BEKK GARCH or DCC GARCH in R or EViews involves a slightly different approach compared to OLS regression. Here's a general outline of the steps you can follow:
  1. Data Preparation: Ensure that you have your data ready in the appropriate format. Make sure you have the returns or log returns of the relevant variables, such as the emerging markets (EM) and the hedging asset (e.g., GOLD). Ensure that the data is properly aligned and covers the desired time period.
  2. Model Specification: Specify the appropriate conditional volatility model. In R, DCC GARCH is handled by the rmgarch package: build the univariate specifications with ugarchspec from rugarch, then combine them with dccspec. BEKK GARCH is not covered by rugarch/rmgarch, so a dedicated package (e.g., mgarchBEKK) would be needed for that model.
  3. Model Estimation: Estimate the specified model, e.g., with dccfit for DCC GARCH, passing the aligned return series as inputs. The output is a fitted model object containing the estimated parameters and other relevant information.
  4. Extract Conditional Covariance: Extract the time series of conditional covariance matrices from the fitted object; in rmgarch this is done by applying the rcov function to the fitted DCC object.
  5. Calculate Conditional Hedge Ratios: Compute the conditional hedge ratios based on the extracted conditional covariance matrix. The conditional hedge ratio can be obtained by dividing the covariance between EM and the hedging asset by the variance of the hedging asset. In the case of BEKK GARCH, you can use the formula: conditional hedge ratio = (covariance EM-GOLD) / (variance GOLD). For DCC GARCH, you can use the formula: conditional hedge ratio = (conditional correlation EM-GOLD) * (conditional standard deviation EM) / (conditional standard deviation GOLD).
  6. Compare Results: Compare the conditional hedge ratios obtained from the GARCH models with the OLS hedge ratios you calculated previously. Assess the differences and evaluate the effectiveness of each approach in terms of risk reduction or portfolio performance.
It's important to note that the specific syntax and function names may vary slightly depending on the version of the packages you are using. Make sure to refer to the package documentation and examples for more detailed guidance.
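The hedge-ratio arithmetic in step 5 can be illustrated without the GARCH machinery, using a simpler RiskMetrics-style EWMA filter for the conditional moments. The toy returns and the 0.94 decay are assumptions made for the sketch, not part of the original question:

```python
# Conditional hedge ratio = conditional covariance / conditional
# variance, with EWMA-filtered moments (RiskMetrics-style, lambda=0.94).
LAM = 0.94

def ewma_hedge_ratios(r_em, r_hedge):
    cov = r_em[0] * r_hedge[0]     # initialise with first observation
    var = r_hedge[0] ** 2
    ratios = []
    for em, h in zip(r_em[1:], r_hedge[1:]):
        cov = LAM * cov + (1 - LAM) * em * h
        var = LAM * var + (1 - LAM) * h * h
        ratios.append(cov / var)
    return ratios

# Sanity check: if EM returns are exactly twice the hedge-asset
# returns, the conditional hedge ratio is 2 in every period.
gold = [0.01, -0.02, 0.015, 0.005, -0.01]
em = [2 * g for g in gold]
ratios = ewma_hedge_ratios(em, gold)
```

A DCC or BEKK GARCH model replaces the fixed decay with estimated dynamics, but the final division in step 5 is exactly this computation.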
  • asked a question related to OLS
Question
5 answers
I am using a VECM to test long-run relationships among variables affecting housing prices. In my OLS, I have seasonal variables, lag 1 of the dependent variable, and other economic fundamentals. The Johansen cointegration test and VECM didn't work when
I included the seasonal variables and lag 1 of the dependent variable. When I excluded these, Johansen and the VECM worked. Can I conclude that the error-correction term in the VECM corrects disequilibrium? Thanks.
Relevant answer
Answer
VECM stands for Vector Error Correction Model. It is a statistical model used to analyze the long-run equilibrium relationship between two or more cointegrated variables. The VECM is a special case of the VAR (Vector Autoregression) model that takes into account the cointegrating relations among the variables: it consists of a VAR of order p - 1 on the differences of the variables, plus an error-correction term derived from the estimated cointegrating relationship.
I hope this helps. Let me know if you have any other questions.
  • asked a question related to OLS
Question
2 answers
I conducted research on the effects of natural disasters on credit growth, using 2008-2022 panel data. All of my independent variables enter as lags, and I also include the lagged dependent variable as a regressor. When I compare the coefficient on the lagged dependent variable across GMM, OLS, and FE, the GMM estimate is always greater than both FE and OLS. Is there an explanation for this? Is there any treatment so that the GMM value lies between OLS and FE?
Relevant answer
Answer
Dickson Utonga Thank you for your explanation. It's really helpfull
  • asked a question related to OLS
Question
3 answers
I have data related to households with a number of variables with the dependent variable being household consumption. I need to specify an OLS regression to identify the treatment effect of interest but I do not have interest as a variable within the data provided. How would I go about creating this variable and introducing it as a shock to the data?
Relevant answer
Answer
Without data on a variable (whether dependent or independent), you cannot enter it into a regression model.
If the variable is nominal with two states (yes/no),
you can enter it into the model as a dummy variable taking the value 0 when the event does not happen and 1 when it does.
Note: the OLS model is not a research goal in itself. Choose the model based on the necessary pre-regression tests: normality, correlation, linearity, time-series stationarity, etc.
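The dummy-variable device has a convenient interpretation worth knowing: with a single 0/1 dummy as the only regressor, the OLS slope is exactly the difference between the two group means, and the intercept is the mean of the 0 group. A minimal sketch with made-up numbers:

```python
# OLS on a single 0/1 dummy: slope = difference in group means.
def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / \
        sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b   # intercept, slope

d = [0, 0, 0, 1, 1, 1]               # dummy: event absent / present
y = [4.0, 5.0, 6.0, 9.0, 10.0, 11.0]  # group means are 5 and 10
intercept, slope = ols(d, y)          # intercept = 5, slope = 5
```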
  • asked a question related to OLS
Question
9 answers
Hello!
I have a non-normally distributed variable (income), and although I tried to transform it to a normally distributed variable, the skewness and kurtosis values are still very high and there are lots of outliers. I can't delete the outliers, because they reflect the nature of the income variable, so I didn't delete a single one (by the way, N = 9918; I am not sure it is acceptable to delete 200 or 300 of them). I read that, after conducting OLS, if the residuals are normally distributed it is acceptable to use the OLS results, but I couldn't find a strong academic reference for this.
I wonder: when I have normally distributed residuals, can I use the OLS results even if the variable has outliers and high skewness and kurtosis values? If this is an acceptable way to conduct the analysis, can you suggest an academic resource I can cite to support it?
Thank you in advance.
Relevant answer
Answer
If you look up the assumptions for an OLS general linear model, you'll see that there is no assumption that the dependent variable is normally distributed.
Usually the assumption is written as the errors from the model being normally distributed. These are approximated by the residuals.
Pretty much any textbook on the design and analysis of experiments should be a reference for this. Slides in the following presentation are taken from Montgomery, Design and Analysis of Experiments. See slide 8, and then slides 19 - 22.
  • asked a question related to OLS
Question
4 answers
I am trying to run an OLS regression, with two continuous variables, both with negative and positive values. How should one deal with this? How do you plot a regression line in this case? Is it necessary to add a constant to counteract any negative values? Thank you.
Relevant answer
Answer
Negative values pose no statistical problem for OLS: the fitted line y = a + b*x is defined for all real values of the dependent and independent variables, and the intercept and slope are estimated regardless of sign. There is no need to add a constant to shift the data; simply plot the fitted line over the observed range of x as usual.
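A quick numerical check of this point, with made-up data that is negative on both axes: OLS recovers the true line exactly, with no shifting constant needed.

```python
# OLS on data with negative values on both axes.
def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / \
        sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b   # intercept, slope

x = [-3, -2, -1, 0, 1, 2, 3]
y = [2 * xi - 5 for xi in x]   # exact line y = 2x - 5, mostly negative
a, b = ols(x, y)               # recovers a = -5, b = 2
```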
  • asked a question related to OLS
Question
3 answers
Hi Everyone,
I am trying to use the SURE model in Nlogit, the software uses generalized least square regression for estimation. Is there any way or command to use OLS instead of GLS to be used in the SURE model?
Thanks
Relevant answer
Answer
Perhaps R is a better fit here: the systemfit package estimates SUR systems and also offers equation-by-equation OLS via method = "OLS". Note also that SUR (GLS) reduces to equation-by-equation OLS when every equation has identical regressors.
  • asked a question related to OLS
Question
7 answers
Hi, Everyone!
I am developing a model in which the dependent variable (y) is the number of days a company takes to complete an acquisition (or merger) of another company. Since it can take from 1 to 1000 or more days according to the records, I think the Poisson model would not be suitable, since the number of days is not necessarily small count data. Can I apply traditional OLS, or which model do you recommend?
Thank You
Relevant answer
Answer
Hi again
ARDL model works best with quarterly and annual data analysis, whereas ARCH models work best with high-frequency data such as hourly, daily, and monthly data analysis.
  • asked a question related to OLS
Question
3 answers
I tested a multiple linear regression with my Likert-scale data and it violated the normality assumption of OLS. I then tried ordinal logistic regression, but the p-values of the parallel-lines test and the goodness-of-fit (Pearson) test are less than 5%. What should I do?
Relevant answer
Answer
You need a good set of notes on ordinal logistic regression. I am attaching some from Frank Harrell's course. Best wishes, David Booth
  • asked a question related to OLS
Question
1 answer
Hello Everyone,
I am using SPSS with the PROCESS plugin by Hayes. I performed a (OLS) regression analysis (in SPSS) with dependent variable X and 3 dummy variables (I have 4 groups, one is the reference group). This gave me some coefficients for the dummies that made sense. But then, I continued with my analysis to a mediation analysis (using PROCESS), I used the same X and the same 3 dummies (same reference group), and the total effect model spat out different values for the coefficients (quite a bit larger). The same happened for another variable when PROCESS calculated an effect of X on M (the mediator), i.e. it returned different results from my "normal" regression analysis without PROCESS. From what I understand in Hayes 2018 book, the equations used to calculate these coefficients in mediation should be the same as my previously calculated ones. But they differ. Any ideas why?
Thank you,
Stefan
Relevant answer
Answer
Did you ever figure it out? I have the same question..
  • asked a question related to OLS
Question
3 answers
I have 5-point Likert data for 54 items. These items are broken into several scales, some adapted from validated measures. I have two waves of data with the same survey, with different samples drawn from the same populations. In each wave I have three groups and a supplemental sample (so really 4 groups). I would like to compare the responses between the two waves and between the groups. My inclination is to use an OLS regression with everything in the model (wave, group, demographics). However, I have some hesitation about treating Likert data as interval. Yet a categorical approach would require item-by-item analyses, which would be impractical given the number of items and the 5-level response options. I am looking forward to some takes on this: can I use a categorical approach and collapse items into scales, or is that a "having your cake and eating it too" kind of question?
Relevant answer
Answer
Hello Ana,
If there are multiple items per scale, and you're confident that previous work on these scales establishes their unidimensionality, then your options would be:
1. Generate factor scores for each case on each scale. You'd likely be better off starting with polychoric correlations among the individual constituent item scores than using ordinary Pearson correlations, however.
2. Use a polytomous item response theory model (e.g., graded response model) to generate scale scores for each person on each scale.
3. Emulate what many others do and simply sum, average, or take the median of responses for each person on each scale, and use these values as if they were interval-level. Of these, the median is likely the safest choice, as it is guaranteed to identify the "middle-most" score value across a set of individual items or responses.
Beyond that, you have cases nested within waves (and possibly within groups, depending on how you sampled), so the comparison design is singly, or at most doubly, nested and can be handled pretty well with regression models.
Good luck with your work.
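Option 3 above can be sketched with the standard library. The 5-point responses below are hypothetical, one row per person:

```python
# Median scale scores from per-person Likert item responses.
from statistics import median

responses = [
    [4, 5, 4, 3, 4],   # person 1
    [1, 2, 1, 2, 5],   # person 2 (one aberrant answer)
    [3, 3, 4, 3, 3],   # person 3
]
scale_scores = [median(r) for r in responses]   # [4, 2, 3]
```

Note how the median shrugs off person 2's single aberrant response of 5, which is the robustness argument for preferring it over the mean.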
  • asked a question related to OLS
Question
2 answers
I am working on my thesis and I have a few questions about which method I should use for the analysis of my data. My research is about inequality in Europe, and in general households in Europe that have access to broadband internet connections and what its effects are on their educational attainment. The IV here would be
  1. % of households that have access to a broadband connection
  2. GINI Index of the country (to measure inequality between the countries)
The dependent variable can be divided into 3 groups: % of population that completed primary education, % that completed secondary education, and % that completed tertiary education.
The data is available in the World Bank database and covers about 20 countries over a period of 15 years. After doing some research I figured out that the type of data I'm using is panel data. I have done some reading about it, but I can't figure out how to continue, because most of the tutorials use only one IV and one DV. What I have read suggests using OLS (my promotor also told me that OLS would be best suited) for my type of variables and data, and that I will need control variables like population or unemployment, but I don't get it.
I don't know if I'm being clear here. I basically want to know whether I need to do what I read (but then I have no clue how to work with 2 IVs and 1 DV), or something completely different.
If something isn't clear, let me know, and I'll try to explain better. Thank you very much.
Relevant answer
Answer
Thank you very much for the answer Christian Geiser . I checked a bit more, and I think I might have to rethink my DV, because it seems as what I want to achieve is multiple DV's and that would make things more complex than they need to be. My aim, and my research question actually is about the digital divide and what effects access to broadband internet has on the educational attainment. Maybe I should go for a single DV (that is continuous), like "% of people that completed college" instead. Would OLS Linear regression still be a good method? Thank you.
  • asked a question related to OLS
Question
4 answers
I assume that the volatility of the dependent variable for a population is explained by the volatility of the same variable for one subpopulation rather than another. Comparison of those subpopulations is the focus of my study. Thus, I proposed a model:
Y for population A = b0 + b1 * time + b2 * Y for subpopulation A1 + b3 * Y for subpopulation A2
My question is whether such a model is correct. If not, what kind of method should be applied to measure the impact of the subpopulations' variability on the volatility of the population as a whole?
Relevant answer
Answer
Dear All, I much appreciate your comments. Just to clarify: the study I conduct is about the consequences of Anglo-American dominance in studies on sustainable tourism. We decided to collect metadata on papers in the Journal of Sustainable Tourism published in the last 10 years, asking about expected and demanded changes, e.g., an increase of interest in methodological triangulation. The question is whether that increase is evidenced over time, and which subgroup of papers contributes most to that change: those led by Anglo-American authors, or others. I hope this explanation makes my concern clearer. Thanks in advance for your advice.
  • asked a question related to OLS
Question
3 answers
Good day scholars,
Please, I need your suggestions on single-equation estimators that can best handle the serial correlation problem in a single-equation regression setting. The single-equation model has been estimated using OLS, and we found the Durbin-Watson statistic to be 1.042838 and the R-squared to be 0.967900.
The data used for the study are time series with mixed orders of integration: four I(1) variables and one I(2) variable.
Thanks and God bless
Relevant answer
Answer
Hi, try to add one or two lags of your dependent variable. If you have a regression like this:
Y(t)= a+b*X(t)+d*Z(t)+e(t), to give an example, you could add Y(t-1), Y(t-2), etc, to your expression:
Y(t)= a+c1*Y(t-1)+c2*Y(t-2)+b*X(t)+d*Z(t)+e(t)
Another option: you could model the persistence in your errors by adding an ARMA structure to them. This is easily handled in Eviews or traditional econometrics softwares.
Another alternative idea: try modelling the first difference of your time series, i.e. change your dependent variable from Y(t) to Z(t)=Y(t)-Y(t-1). First differences typically display less dependence than the variable in levels itself.
Hope it helps!!!
  • asked a question related to OLS
Question
1 answer
I was constructing a multiple regression model inspired by two papers; neither paper tested stationarity. When I tested stationarity myself and took differences, all the variables were insignificant and the R-squared was very low (0.22), but when I ran the regression without differencing, the variables were significant and the R-squared was 0.78.
Relevant answer
Answer
If by "stationarity" you mean constant variance of the errors, then no, you shouldn't test it; it is an *assumption*. However, it is always a good idea to examine your residuals for curvature and heteroskedasticity. Finding either (or both) suggests you may need to re-express your response variable. But you don't need a hypothesis test for that.
  • asked a question related to OLS
Question
4 answers
I found different ways of estimating the long run and the short run when processing time series data through the ARDL-ECM method in Eviews 10.
Some use the OLS method to estimate the long run and short run; however, others directly use the ARDL output for the long run and the ECM for the short run.
In your opinion, which of these two methods should I use, and why?
Thank you
Relevant answer
Answer
I recommend first checking the integration order of your series: I(0), I(1), I(2). Usually a mix of I(0) and I(1) implies the use of ARDL. I also suggest looking at models with time-varying parameters. One can see this:
  • asked a question related to OLS
Question
3 answers
Hello,
I am attempting a two-part model on semi-continuous data (zero inflated).
As I understand, the first part is a binary logistic regression (or probit) model for the dichotomous event of having zero or positive values.
logit[P(Yi > 0)] = xβ1 ......... Equation (1)
Conditional on a positive / non-zero value, the second part (continuous) can be modelled using either a OLS regression (with or without log-normal transformation of outcome variable) or generalized linear models (GLM).
log(yi|yi > 0) = xβ2 + e where e is normally distributed .... Eq (2)
Combining the above two parts, the overall mean can be written as the PRODUCT of expectations from the first and second parts of the model (refer Eq 1 and 2), as follows:
E(y|x)=Pr(y> 0|x) ×E(y|y> 0, x) .... Eq (3).
I could find a number of papers that employed the 'twopm' command in the Stata package. However, I am using SPSS. I have conducted the two parts (binary and continuous) using the same set of predictors.
Please suggest how to multiply the results from both the parts using SPSS.
Thank you.
Relevant answer
Answer
You might want to look at the attached screenshot of a search. A package in R is available. If you aren't familiar with R, I suggest you download Jared Lander's "R for Everyone", available from the z-library. All of this is free. Best wishes, David Booth
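Since SPSS has no built-in equivalent of Stata's `twopm`, the combination in Eq. (3) can also be done by hand: multiply the predicted probability from the logistic part by the retransformed prediction from the log-scale OLS part. Below is a minimal sketch in Python with NumPy on synthetic data; the coefficients, sample size, and the use of Duan's smearing estimator for the log retransformation are my assumptions for illustration, not the asker's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Simulate a zero-inflated outcome: part 1 decides zero vs positive,
# part 2 gives a log-normal amount for the positives
p_pos = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))          # true P(y > 0 | x)
positive = rng.random(n) < p_pos
y = np.where(positive, np.exp(1.0 + 0.5 * x + 0.3 * rng.normal(size=n)), 0.0)

# Part 1: logistic regression for P(y > 0 | x), fitted by Newton-Raphson
b1 = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b1))
    W = p * (1 - p)
    b1 += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ ((y > 0) - p))

# Part 2: OLS of log(y) on x among the positives
Xp, ylog = X[y > 0], np.log(y[y > 0])
b2, *_ = np.linalg.lstsq(Xp, ylog, rcond=None)
resid = ylog - Xp @ b2

# Combine per Eq. (3): E[y|x] = P(y>0|x) * E[y | y>0, x];
# Duan's smearing factor handles the log-to-level retransformation
smear = np.mean(np.exp(resid))
p_hat = 1 / (1 + np.exp(-X @ b1))
e_y = p_hat * np.exp(X @ b2) * smear

print(round(np.mean(e_y), 2), round(np.mean(y), 2))
```

In SPSS the analogue is to save the predicted probabilities from the logistic part and the predicted log-scale values from the linear part, then compute the product (with the smearing factor) in a COMPUTE step.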
  • asked a question related to OLS
Question
3 answers
Dear Community,
I am facing a rather odd problem. I am running an OLS multivariate regression, and my independent variables change signs compared to their signs in the univariate regressions, although the VIF is below 2.0. Can somebody please help me and explain the matter to me? Many thanks.
Relevant answer
Answer
Do you mean multiple regression or multivariate regression?
This may be some form of suppression effect; one indicator may be that the sign for the predictors switches between bivariate and multiple(!) regression. Another indicator may be that both predictors have the same bivariate sign in the association with the dependent variable (so both correlate either positively or both negatively) but the predictors are negatively correlated with each other (which is odd if you think about it). Typically, suppression leads to an R² which is larger than the sum of r²yx1 and r²yx2, which is also odd. But please have a look at the seminal book by Cohen, Cohen, West, & Aiken, Applied Multiple Regression/Correlation Analysis, for further explanation.
  • asked a question related to OLS
Question
9 answers
Dear Community,
I was wondering whether it is possible to validate hypothesis testing based on an OLS regression with only 2 independent variables? Or would doing so decrease the credibility of the OLS regression results?
If we push it to the extreme, is it possible to validate hypothesis testing with a univariate regression analysis?
Relevant answer
Answer
Hello Yuko,
I'm not sure I understand exactly what you're asking.
You certainly can:
1. Use OLS regression to develop a model having two IVs and one DV (which is a univariate model: one DV).
2. Use OLS regression, with additional data, to determine how well a previously developed model having two IVs works to account for variance in the DV in the added/new data batch.
3. Use OLS regression to evaluate the relationship between scores on a single IV with those of a single DV (simple bivariate regression).
4. Use OLS regression to address questions of whether a categorical IV can help explain differences in scores on a single metric DV.
If none of these gets at the intention of your query, perhaps you could elaborate a bit so as to increase the likelihood of getting a more constructive response.
Good luck with your work.
  • asked a question related to OLS
Question
7 answers
Results from the OLS method illustrate a strong relationship between 3 explanatory variables and a dependent variable; however, with the GWR method (using the same explanatory variables), the run fails with Error 040038 (results cannot be computed because of severe model design problems).
What is the reason, and how can I solve it?
Thanks in advance.
Relevant answer
Answer
Roshan Bhandari: I am not sure if you have changed the data structure by doing it (depending on the data format). As I mentioned, applying these types of analyses in other software packages is also suggested.
  • asked a question related to OLS
Question
4 answers
I am doing my masters essay. I am trying to find the determinants of the environmental performance index using the Human Development Index as the income variable. I am a bit confused by my results. The EPI-HDI scatter plot showed a linear relationship, or perhaps that was my mistake and I did not look at it closely enough. I have already designed all my work around this, and I do not have much time on my hands either. Will my OLS regression be legitimate under the circumstances?
Relevant answer
Answer
I'm not very familiar with these augmented component-plus-residual plots, but I can't imagine that amount of non-linearity in the model makes the model inappropriate. Nothing is perfect.
  • asked a question related to OLS
Question
2 answers
Good evening,
I am currently working on my master thesis document and I do have a potentially basic question.
I try to understand the impact of having women on the acquirer's board of directors on the premium paid during an acquisition (percentage of the price paid above or under the real value of the target).
To do so, I tried to perform an OLS on 800 observations (R Studio), but I kept finding heteroskedasticity in my model (even after transforming the data).
I tried to perform a plm regression, but I keep finding heteroskedasticity according to either the residuals vs. fitted plot or the bptest. Additionally, the variable I am trying to analyse has a p-value of 0.8 rather than <0.05; it is thus not significant and not something I could use in my report.
Do you have any idea how I could manage to resolve the heteroskedasticity problem, or might this be something I will not be able to get rid of? Additionally, do you think I could write a master thesis with the studied variable being not significant, or should I keep searching?
Thank you in advance for your help
Relevant answer
Answer
PS -
Using a graphical residual analysis (you said you used) with and without a particular variable in the model, and a cross-validation in each case, should indicate the value (or lack of it) as part of your model.
  • asked a question related to OLS
Question
7 answers
Removed Content
Relevant answer
Answer
It depends on the estimated findings of your final model design. You have to check whether the interaction term LIQUIDITY x PRECRISIS_DUM could capture the potential effect or not.
  • asked a question related to OLS
Question
4 answers
I ran the IPS and Fisher unit root tests and found that all variables were stationary at level except the real exchange rate, which was stationary only at I(2).
Then I ran the Pedroni test and the null hypothesis was rejected, which means there is a long-term relationship between the variables.
Could I continue with OLS, FE or RE? If not, what should I do next?
Also, my data is an unbalanced panel with N=12 and T=20.
Relevant answer
Answer
Dear Tram.
Your variable that is stationary at I(2) could affect your results.
I would prefer that you continue with OLS and compare your results with the ARDL technique.
Regards
  • asked a question related to OLS
Question
5 answers
Hi all,
I hope anyone can help me and save my life!
I have been trying to gather as much information as possible to decide which analysis method I should use for my collected data, but I can’t make a decision.
In particular, I am struggling to decide whether I should treat my data as non-normal or normal, so that I can then decide which statistical method to adopt.
For my thesis, I am currently working on analyses with the collected primary data (n=110) based on a cross-sectional, non-probability and self-reported survey for a theory-testing purpose.
All constructs are reflective measures and multi-item scales (between 3 to 6 items) across 2 IVs, 2 DVs, 2 moderators. .
1 IV is a higher-order construct (3 dimensions), and another IV is unidimensional (5 items).
All of them are seven points Likert scale that is previously developed and validated in other papers
Positive sides:
After the preliminary analysis, reliability (Cronbach’s a >.7), correlation, and regression analyses (OLS) show a good sign and are mostly in line with the conceptual and theoretical models from the role model papers.
Negative sides:
· The results of EFA (to check the unidimensionality) and CFA analysis (to test the model) were horrible. The EFA results were messy as per image below. Many of them are cross-loaded or loaded on unmeaningful components.
· KMO is >.850, suggesting that a sample size is not an issue?
· Basically, many OLS assumptions are violated because of the leptokurtic distribution, heteroscedasticity, and non-random sampling.
· Normality tests (e.g. the Shapiro-Wilk test) say all of them are non-normally distributed. Besides, across all constructs, the skewness and kurtosis values are around -2 and +6.5, respectively. However, although I am aware of the threshold of +-2 for BOTH skewness and kurtosis, I could choose to follow Hair's (2010) guideline, which suggests +-2 for skewness and +-7 for kurtosis for data to be considered "normal".
· Lastly, Common method bias is checked through Harman's single factor test (34% of total variance)
Based on this information, I have questions.
(1) Which regression model should I use? (OLS, PLS and so on). Please provide me with some justifications of choice (e.g. Sample size).
(2) Then, which statistical tool do you recommend? ( I am currently using SPSS and AMOS, but I can use R, SmartPLS or MPlus if needed.
(3) Should I consider my data as normal or non-normal?
Thank you so much for your help in advance!!
Ted
Relevant answer
Answer
@Ted, I would recommend going with the PLS approach in SmartPLS, as it doesn't require the normality assumption and allows you to add latent variables as moderators and test the model as a whole. AMOS won't allow you to add latent variables as moderators; you would have to run a path analysis for moderation using composite variables and interaction variables (IV × moderator).
  • asked a question related to OLS
Question
1 answer
Hello everyone,
Currently I am doing research on the factors affecting exports using OLS, FEM or REM with N=12 and T=20. When I did a unit root test, it turned out that all independent variables are stationary except the dependent variable. Furthermore, the dependent variable was stationary at I(1).
Could I use the data at level? My dependent variable has some missing values, so is it acceptable to run the regression at I(1)? What should I do next?
I am really confused so please help me, thank you a lot.
  • asked a question related to OLS
Question
12 answers
Hi everyone,
recently, I have been working on a study where I examine the impact of American tariffs, customs and other import duties on European exports to the US. I have three variables (y = EU exports to the US, x1 = US tariffs, x2 = US customs and other duties). I use quarterly data from 1995Q1 until 2017Q1 (89 observations). My tutor has emphasized that I need to control for year and country fixed effects and maybe introduce dummies per year and country. I am quite clueless how to do that. Why is it necessary? What is the equation? How do I do that in Excel or EViews? I would appreciate step-by-step instructions so much!
Thank you in advance for any help or comments.
Relevant answer
Answer
Rasidah Mohd-Rashid I hope the answer is not late. I suppose you want to add year fixed effects to control for year-to-year variation; this is a perfectly normal situation for event studies. Ideally, to avoid having a lot of dummy variables and noise, there should be few effects to control for, so having few years of data is fine.
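For concreteness, "introduce dummies per year and country" means adding one indicator column per year and per country (dropping one level of each to avoid the dummy variable trap) and running OLS on the augmented design. A sketch in Python with NumPy on a synthetic balanced panel; the true slope of 0.5 and the additive year and country effects are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
years, countries = 10, 5
t = np.repeat(np.arange(years), countries)     # year index per observation
c = np.tile(np.arange(countries), years)       # country index per observation
x = rng.normal(size=years * countries)
# True model: slope 0.5 on x plus additive year and country effects
y = 0.5 * x + 0.3 * t + 0.8 * c + rng.normal(scale=0.2, size=x.size)

# LSDV: one dummy per year and per country, first level of each dropped
D_year = (t[:, None] == np.arange(1, years)).astype(float)
D_ctry = (c[:, None] == np.arange(1, countries)).astype(float)
X = np.column_stack([np.ones_like(x), x, D_year, D_ctry])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[1], 2))   # slope on x, with both fixed effects controlled
```

In EViews the same thing is done with `@expand(year, @dropfirst)` terms in the equation specification, or by choosing fixed effects in a panel workfile; in Excel you would build the dummy columns by hand and use LINEST.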
  • asked a question related to OLS
Question
4 answers
Why do the signs of coefficients change when moving from OLS regression to Fixed Effect Regression? my theory stands on the assumption that PS would be positive and significant, the same as in the OLS result, but the fixed effect changes that. What's the solution?
Relevant answer
Answer
The reason for this is obvious and is related to the removal of the unit-specific intercepts in the fixed-effects model. Based on your description, I think the OLS model is better.
  • asked a question related to OLS
Question
8 answers
Is it possible to use backward regression for all variables in an OLS model, but to leave the year dummies out of the backward selection, so that these dummies appear in the final model even if they are not significant?
Thank you.
Relevant answer
Answer
I too would be very wary if you are developing a model for understanding, but your question is a software issue - Minitab, for example, gives you this facility:
Displays the set of terms that the procedure will assess. Indicators (E or I) next to the term in the list signify how the procedure handles the term. The Method you choose determines the initial settings in this list. You can modify how the procedure handles the terms with the two buttons below. If you don't use these buttons, the procedure can add or remove the term from the model based on its p-value.
  • E = Include term in every model: Select a term and click this button to force the term into every model regardless of its p-value. Click the button again to remove this condition.
  • I = Include term in the initial model: Select a term and click this button to include the term in the initial model. The procedure can remove these terms if its p-value is too high. Click the button again to remove this condition.
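The same "force these terms in" idea is easy to script if your software lacks the facility. A rough sketch in Python with NumPy, using |t| < 2 as the removal rule; the data, the variable names, and the cutoff are illustrative assumptions, not a standard from any particular package.

```python
import numpy as np

def ols_tstats(y, X):
    """OLS coefficients and t-statistics; X must include a constant column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

def backward_select(y, X, names, forced):
    """Drop the weakest free term (smallest |t|) until all free |t| >= 2.
    Terms listed in `forced` (e.g. year dummies) are never dropped."""
    keep = list(range(X.shape[1]))
    while True:
        _, t = ols_tstats(y, X[:, keep])
        free = [(abs(t[i]), i) for i, j in enumerate(keep)
                if names[j] not in forced]
        if not free or min(free)[0] >= 2.0:
            return [names[j] for j in keep]
        keep.pop(min(free)[1])

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(4)])
names = ["const", "x1", "x2", "noise1", "noise2"]
y = 1 + 2 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=n)

# Force only the constant; then force one insignificant term to stay as well,
# the way year dummies would be kept regardless of their p-values
kept = backward_select(y, X, names, forced={"const"})
kept_forced = backward_select(y, X, names, forced={"const", "noise1"})
print(kept, kept_forced)
```

The second call shows the point of the question: `noise1` survives in the final model even though it is insignificant, exactly as forced-in year dummies would.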
  • asked a question related to OLS
Question
3 answers
I need data to perform OLS where the dependent variable will be the shadow economy in Greece and the independent variables corruption in Greece and the amount of e-transactions per person in the country, but I can't find the data. Can someone help me, please?
Relevant answer
Answer
Here is a document where they show a few variables regarding electronic transactions in Greece. Likewise, I add other documents that will surely be of help. I hope they serve you, best regards.
  • asked a question related to OLS
Question
4 answers
I am looking to do a power analysis with assumptions about dependent observations.
In the research design, each respondent is assigned 4 short pieces of text, each of which they are asked to rate (on a continuous scale). Each text belongs to either treatment A or treatment B, so each respondent reads and rates 2 texts from each treatment. Therefore, the 4 ratings that each respondent gives will be correlated within respondent.
In the following OLS regression, I would normally cluster the standard errors by respondent, but I am unsure how to implement this in a power analysis. I will assume either a small or medium treatment effect (e.g. Cohen's d of 0.2 or 0.4). Then, I am interested in knowing the power if I have a) 100, 200, 300 or 400 respondents and b) 2, 4 or 6 texts rated by each respondent.
I have tried to use the clusterPower package in R but I am not sure which function to use exactly and how.
I hope you can help. Thanks!
Relevant answer
Answer
Gpower is pretty user friendly.
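G*Power has no direct option for this clustered design, but a Monte-Carlo simulation gets you there and can mirror the planned analysis exactly. A sketch in Python with NumPy; the ICC of 0.3 and the paired-analysis shortcut (comparing respondent-level condition means, so the random intercept cancels) are my assumptions, not part of the original question.

```python
import numpy as np

def power_sim(n_resp, n_texts, d, icc=0.3, sims=500, seed=5):
    """Monte-Carlo power for a within-respondent A/B design.
    Each respondent rates n_texts/2 texts per condition; the analysis is a
    paired comparison of the respondent-level condition means."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        u = rng.normal(0, np.sqrt(icc), n_resp)        # respondent intercepts
        noise = np.sqrt(1 - icc)                       # within-respondent noise
        a = u[:, None] + rng.normal(0, noise, (n_resp, n_texts // 2))
        b = u[:, None] + d + rng.normal(0, noise, (n_resp, n_texts // 2))
        diff = b.mean(axis=1) - a.mean(axis=1)         # intercepts cancel out
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_resp))
        hits += abs(t) > 1.97                          # approx. t critical value
    return hits / sims

# e.g. 200 respondents, 4 texts each, effect d = 0.4 (Cohen's d scale)
print(power_sim(200, 4, 0.4))
```

Looping this over the planned grid (100/200/300/400 respondents x 2/4/6 texts x d = 0.2/0.4) gives the full power table; to match the cluster-robust OLS analysis exactly, you could replace the paired t-test inside the loop with that regression.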
  • asked a question related to OLS
Question
8 answers
Dear everyone,
I am in great distress and desperately need your advice. I have the cumulated (disaggregated) data of a survey of an industry (total export, total labour costs, etc.) of 380 firms. The original paper uses a two-stage least squares (TSLS) model in order to analyze several industries, with one independent variable having a relationship with the dependent variable, which according to the author was the limitation that ruled out the OLS method. However, I want to conduct a single-industry analysis and exclude the variable with that relationship, BUT instead analyze the model over 3 years. What is the best econometric model to use? Can I use an OLS regression over a period of 3 years? If yes, what tests are applicable then?
Thank you so much for your help, you are helping me out so much !!!!!!!
Relevant answer
Answer
Dear, conducting any standard model depends on an important factor, namely the number of observations included in the model. For example, if the number of observations is small, the Phillips-Perron test can be conducted to test stationarity, and if it is large, the augmented Dickey-Fuller test can be conducted. In light of the stationarity results, we can determine which model can be used. Julius Hogan
  • asked a question related to OLS
Question
4 answers
I have a panel dataset with T=15 and N=5 countries. I used the Hausman test to decide between the fixed and random effects models, and I got a p-value of 1, which implies the random effects model is preferred. Then I also used the Breusch-Pagan LM test to decide between the RE model and the pooled OLS model, which gives an LM statistic of 0 and a p-value of 1. This implies that pooled OLS is better. Could anyone tell me whether getting a chi-square test statistic of 0 and a p-value of 1 is fine, or whether it is a sign that something is not right with the data? Any suggestions/comments in this regard will be highly appreciated.
Relevant answer
Answer
I faced the same issue in an analysis with T=20 and N=15. What is the solution?
  • asked a question related to OLS
Question
13 answers
Hi,
This is my first ever research I am doing for my MSc so please bear with me.
N=422; looking at the histogram and the skewness coefficients, it can be confirmed that the data are skewed. Scatter plots tell me there are 3 weak-to-moderate linear relationships between the IVs and the DV and 1 curvilinear relationship, but they also seem to be homoscedastic. Kolmogorov-Smirnov values for all variables were significant, once again confirming a non-normal distribution.
I tried removing outliers but did not help the distribution. I transformed them in spss, following Pallant (2016) recommendation by using Reflect and Logarithm, Reflect and inverse as these shapes matched my histograms. The results are still far from normal.
Following SPSS survival manual (Pallant, 2016), is seems there are limited options to what I can do at this point. I can report the descriptive part of my analysis for the dissertation and what usually happens is people report correlations next and then they test their model with some form of regression. And if they test moderating effect of a variable they seem to do SEM.
Now, I chose Spearman's correlation and described the association between the variables (rather than correlation); then I learned that it can also be used for hypothesis testing? But it can only be used for hypotheses such as "H1: There is a relationship between this and that." Is that correct? If so, that is fine; three of my hypotheses are formulated this way. But what about the moderating effect? Is there a non-parametric way of testing it?
As an answer to my question regarding regression with data like this, my supervisor said OLS does not assume a normal distribution (referring to the Gauss-Markov theorem), so I could look into that, and that it is recommended not to rely on normality tests but on the diagnostic plots of residuals. Or stick to Spearman's, but then he says I need to change my approach accordingly.
So once I use Spearman's, can I in addition use OLS? And if so, is there a non-parametric way of testing for moderating effects of a variable on the other relationships? Sorry about the long message; I just wanted to give you enough context.
Relevant answer
Answer
Thank you Sal Mangiafico , thanks for confirming some of the things I was worried about. I did read that outliers are often not outliers but a representation of the sample, and should not be removed unless they are due to missing data or error, or they are really sitting out on their own.
  • asked a question related to OLS
Question
8 answers
Dear colleagues,
I am planning to investigate the panel data set containing three countries and 10 variables. The time frame is a bit short that concerns me (between 2011-2020 for each country). What should be the sample size in this case? Can I apply fixed effects, random effects, or pooled OLS?
Thank you for your responses beforehand.
Best
Ibrahim
Relevant answer
Answer
It seems a very small sample for applying microeconometric techniques. Having 27 observations and 10 covariates, at most, you will have 27 - 1 - 10 = 16 degrees of freedom. This is pretty low. If I had to decide whether to pursue a project based on that, I would try to avoid it.
It is really closer to multiple times series than panel data. Have a look at this link:
  • asked a question related to OLS
Question
8 answers
I ran an OLS regression on my data and found issues with autocorrelation due to non-stationarity of the data (time series data). I need to conduct a Generalized Least Squares (GLS) regression, as it remains efficient and gives valid standard errors when the errors are autocorrelated.
Relevant answer
Videos of Generalized Least Squares (GLS) regression in SPSS:
Check on YouTube; there you will find step-by-step instructions on how to do GLS in SPSS.
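If you are open to scripting it rather than pointing and clicking, the classic feasible-GLS fix for AR(1) autocorrelation (Cochrane-Orcutt) is short enough to write out. A sketch in Python with NumPy on synthetic data; the AR(1) coefficient of 0.7 and the simple two-variable model are illustrative assumptions, not the asker's data.

```python
import numpy as np

def cochrane_orcutt(y, x, iters=10):
    """Feasible GLS for AR(1) errors: estimate rho from OLS residuals,
    rerun OLS on quasi-differenced data, and iterate."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rho = 0.0
    for _ in range(iters):
        r = y - X @ beta
        rho = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
        ys = y[1:] - rho * y[:-1]            # quasi-differencing removes
        Xs = X[1:] - rho * X[:-1]            # the AR(1) error structure
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        # the constant column becomes (1 - rho), so beta[0] is still the
        # original intercept; no rescaling is needed
    return beta, rho

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + e

beta, rho = cochrane_orcutt(y, x)
print(round(beta[0], 2), round(beta[1], 2), round(rho, 2))
```

Note that if the autocorrelation comes from genuine non-stationarity (a unit root), first-differencing the series is the more appropriate remedy than GLS.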
  • asked a question related to OLS
Question
7 answers
I have multiple measurements of two variables in different settings; my hypothesis is that the relationship between variables _in general_ is described by a Kuznets curve (U-shape). I tried quadratic curve fitting with OLS and results seem to confirm my hypothesis (out of 25 sets of measurements only 1 relationship is not U-shaped). Any advice on a better test?
Relevant answer
Answer
Andrei P Kirilenko, have you seen this article (with Kuznets curve in the title)?
Wang, Y. C. (2013). Functional sensitivity of testing the environmental Kuznets curve hypothesis. Resource and Energy Economics, 35(4), 451-466.
Here are the "Highlights" from the journal website:
► The regression validity of conventional test for the EKC hypothesis is questioned. ► The test of the EKC hypothesis is sensitive to different non-linear functional forms. ► None of the functional forms supports the EKC hypothesis in an OECD sample. ► SO2 and CO2 EKCs are fundamentally spurious.
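Beyond eyeballing the quadratic fit, a sharper check (in the spirit of Lind and Mehlum's "With or Without U" test) is that the quadratic term is significant and the estimated turning point lies inside the observed range of x. A sketch in Python with NumPy on synthetic data; the data-generating coefficients are illustrative assumptions.

```python
import numpy as np

def u_shape_check(x, y):
    """Fit y = b0 + b1*x + b2*x^2 and report the quadratic term's t-statistic
    and whether the turning point -b1/(2*b2) falls inside the observed x range."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    turning = -beta[1] / (2 * beta[2])
    return beta[2] / se[2], x.min() < turning < x.max()

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, 400)
y = 1 - 1.5 * x + 0.8 * x ** 2 + rng.normal(scale=0.5, size=400)  # true U-shape

t_quad, interior = u_shape_check(x, y)
print(round(t_quad, 1), interior)
```

Requiring an interior turning point guards against the common false positive where a monotone but concave/convex relationship produces a significant quadratic term with the minimum outside the data range.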
  • asked a question related to OLS
Question
8 answers
Hello everyone
I have extensively read throughout the platform about the usage of different models to analyze likert scale and ordered dependent variables. I wanted to share my plans and see your opinions if it is the best model.
My context is the following: We asked how comfortable they would feel downloading an app with different characteristics (factors) from 1 to 11, 1 being "under no circumstance would I download such an app" and 11 being "I would download and use that app every day". There are three factors, with 2, 3 and 7 levels respectively (open-source, security and app provider). We deckerized with open source, as our previous research showed it wasn't significant, meaning that respondents were asked to evaluate a set of vignettes either from open-source or non-open-source. We used clustered sampling, and our sample data is representative of our target population (with 600 answers).
I have read in sociological methodology that, given that the Likert scale has 11 points (bringing a number of benefits) and is set in an experimental manner, you can use ANOVA, OLS and random intercepts models. However, I feel a bit uncomfortable using these, as some assumptions are broken. Thus, I decided to use an ordered logit regression (OLR), as for me the dependent variable (willingness to download) is ordered. The parallel lines assumption isn't violated and all variables are significant, which gives me confidence that I can use this model. However, I started wondering whether a multinomial logistic regression might be better.
I'm using R for the analysis, with the MASS package (specifically the polr function for the OLR, and brant and poTest for checking the parallel lines assumption). I have cross-checked that I get the same results with Stata, and they match.
In the article I plan on also including the ANOVA, OLS and random intercepts models to add robustness to the analysis. What's interesting is that, although some specific coefficients change from OLS to OLR, the conclusions are the same.
Thus: should I use the multinomial logistic regression or not? Any comments on what to report, or improvements?
Edit PS: My ANOVA shows that the independent variables don't interact. Should I still include the interaction in the OLR? Currently it is dep.var ~ x1 + x2. Would you suggest dep.var ~ x1 + x2 + x1:x2 as a better fit, even if the ANOVA says the interaction isn't significant? And if you think the OLR should include the interaction, how exactly do I determine whether it is significant?
Relevant answer
Answer
see also this paper
  • asked a question related to OLS
Question
4 answers
Do I use the OLS or the GMM estimator?
Relevant answer
Answer
OLS
  • asked a question related to OLS
Question
5 answers
I have a sample of 138 observations (cross-sectional data) and am running an OLS regression with 6 independent variables.
My adjusted R2 always comes out negative, even if I include only 1 independent variable in the model. All the beta coefficients as well as the regression models are insignificant. The value of R2 is close to zero.
My queries are:
(a) Is a negative adjusted R2 possible? If yes, how should I justify it in my study, and are there any references that can be quoted to support my results?
(b) Please suggest what I should do to improve my results. It is not possible to increase the sample size, and I have already checked my data for inconsistencies.
Relevant answer
Answer
@David, could you suggest any book or published research paper I can refer to? I couldn't find any authoritative source on Google that can be cited to support my results.
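On query (a): yes, a negative adjusted R² is entirely possible, because the adjustment penalizes the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), which goes negative whenever R² < k/(n - 1). A quick check in Python with the question's own n = 138 and k = 6 (the example R² values of 0.01 and 0.10 are just illustrative inputs):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With n = 138 and k = 6, any R^2 below k / (n - 1) = 6/137 (about 0.044)
# makes the adjusted value negative
print(round(adjusted_r2(0.01, 138, 6), 4))   # below the cutoff: negative
print(round(adjusted_r2(0.10, 138, 6), 4))   # above the cutoff: positive
```

So a near-zero R² with six predictors mechanically produces a negative adjusted R²; it simply signals that the model explains less than the complexity penalty, consistent with the insignificant coefficients reported.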
  • asked a question related to OLS
Question
4 answers
Dear Readers,
I am writing this short series of steps that I have formulated to help those who are in a state of confusion regarding the "Analysis" of their research. As a professional from the industry, I encountered several difficulties in understanding the research. I will post that discussion separately. Please bear in mind this information will apply to the "Chapter 4- Analysis" section of your research and the same principle will be applied at the Masters, MPhil, and PhD level. If any expert can help me in further refining the concepts, I would very much appreciate it. I do understand the information might be too much to process at first, so give it a read several times, I guarantee you will find most of the answers. If you are confused about the concept of research i will post another refresher for clarity- I write this because I have been a victim of the same confusion.
P.S. A word of warning: DO NOT confuse "Model" with "Method"; I made the same mistake. The model is the equation you formulate in Chapter 3 (Y = alpha + beta x X1 + error term); the method is what we (some of us) stupidly call models (OLS, fixed effects, random effects, etc.).
Steps for Analysis (What to do When confused):
1. Identify Data Type (Cross-Sectional, Time Series, Panel)
2. Run Descriptive Statistics
3. Check for Multicollinearity (tested by correlation among variables or VIF tests). (Books ridiculously insist on tests for heteroscedasticity and autocorrelation, but researchers have given them little importance and only used these items to defend their methodology; so if you use them, good, and if not, it's not the end of the world - BUT multicollinearity is a MUST.)
4. Identify if there is any Trend in Data Using Unit Root Tests- This will decide the next steps
a) If there is no Trend in Data- We can Use OLS- Next steps are as follows for point A.
i. Run OLS and check for heteroscedasticity. If heteroscedasticity is detected, OLS is not applicable, and we should move towards the application of Fixed Effects and Random Effects and subsequently decide which is better using the Hausman test. Researchers point out the possible issue of heterogeneity, which is sometimes used as an acid test for choosing between FE and RE. This usually states that if your data is homogeneous (i.e. you are working on a single industry), Fixed Effects is appropriate, because the only heterogeneity is within the sample (firm size, capital structures, etc., i.e. the "Beta" (the ungeared CAPM industry risk, NOT the beta we use in research) is similar). If the data is heterogeneous (say, an index that includes several companies from different industries), this contributes unidentified heterogeneity and the Random Effects model is more appropriate. However, you still have to check the "rho values" and correlations for the presence of endogeneity, which can still reject the choice of the RE/FE model.
b) If a trend is detected (which will usually be there for time series and panel data), then OLS regression is not applicable. (Do not get confused if you still see research using this method, because the method is just a choice.) In case of a trend, the following steps are to be followed:
a. Use the log of the variables or perform differencing (1st observation minus the 2nd, and so on) - 1st order, 2nd order, etc. - and identify the "level" at which the variables become stationary, i.e. the trend is removed. (Eventually, after some levels, the data automatically becomes stationary, so no need to get confused.)
If some variables are stationary at level (initially) while others become stationary at different orders (1st-order differencing, 2nd-order, ...), then a mixed approach is used. In this instance the following models are applicable: 1. ARDL 2. VAR 3. GMM/LSDV
GMM is applicable when: 1. the panel is dynamic; 2. the panel is short (the literature suggests N < 15 and T between 5 and 15); 3. the panel is unbalanced (not all observations are available for all time periods; for example, some companies close down during the research period). In this case the closest competitor to GMM is LSDV, but GMM is still preferred, further tested by 1. an underidentification test and 2. an overidentification test. Additional tests include the Kruskal-Wallis test, which can be used if your variables are challenged for being "too similar", say Return on Assets and Return on Equity; you can use this test to counter that criticism by showing that the variables are not "similar", even though both are derived from returns. The Ramsey test can also augment the findings.
The "old guard" in the field adamantly advocate 2SLS/3SLS models in such situations. The basic functioning of all of these models is the same: they take lags up to several stages. However, from research papers I have found that 2SLS/3SLS can offer a remedy instead of GMM, ARDL, or LSDV because of instrumental variables, but developing instrumental variables is another complex task and is often challenged. Researchers also believe that 2SLS/3SLS is applicable to survey data. Using these arguments and citing references, you can easily defend your choice.
My apologies for the rough language; I only speak from my personal experience, when people merely pointed out "something wrong" and never gave me the solution, until I dove into the literature to find the problem and the solution myself. I will post the story of my transition from a professional in the industry to a researcher and professional in the industry, and how I was able to complete a paragon piece of research in my industry, the first ever based on secondary data from telecommunications in my country. I never got recognition for it, but I wanted to give something back to the industry in the form of a contribution to the literature.
Relevant answer
Answer
Thank you Umair Khan Khan for the notes. It's beneficial to those in the dark, especially those not familiar with econometric approaches at the beginning of their research.
  • asked a question related to OLS
Question
14 answers
Hi,
I am using recall of words (first trial) as a dependent variable. It captures the number of words an individual remembered and ranges from 0 to 10 words.
Do I need to use an ordered model, since 10 is preferred over 0 and the values are all integers, or can I just run it with OLS?
Thank you!
Relevant answer
Answer
Martinson Ankrah Twumasi I would be interested to know why you would use a Poisson model when there is an upper limit on the score (10 out of 10) as well as a lower limit (0 out of 10). Moreover, the one-parameter Poisson model (the conditional variance assumed to be equal to the mean) is quite restrictive, and the Negative Binomial model is seen as more flexible; see https://www.cambridge.org/core/books/negative-binomial-regression/
  • asked a question related to OLS
Question
10 answers
Dear researchers
I am estimating a gravity equation. For the OLS everything goes normally, but when I add country and time fixed effects, the coefficient on Distance goes wrong. Can anyone explain how to make it negatively affect trade volume, as expected?
I have attached a screenshot of my equation and results.
Relevant answer
Answer
Dear Saba
I faced this issue for a long time but solved it successfully. Kindly let me know more about your sample size and variable descriptions.
Best Wishes!
Dr. Saqib
  • asked a question related to OLS
Question
3 answers
I want to create variable_3 by taking the difference of variable_1 and variable_2.
In order to run the OLS regression containing variable_3 correctly, what should I do for the dates on which variable_1 or variable_2 has no data?
Relevant answer
Answer
Would be great to have an example. Why is the difference needed?
  • asked a question related to OLS
Question
3 answers
This is my research problem so far:
In this paper I will conduct an empirical investigation whose objective is to discover whether the numbers of newly reported corona cases and deaths have contributed to the huge spike in volatility on the S&P 500 during the pandemic phase of the corona outbreak. The paper will try to answer the following questions: "Is there any evidence of significant correlation between stock market volatility on the S&P 500 and the newly reported numbers of corona cases and deaths in the US?" and "If there is significant evidence, can the surge in volatility mostly be explained by the national number of daily reported cases, or was the mortality number the larger driver?"
So far I have constructed a time series object in R Studio containing the variables VIX values and newly reported US corona cases and deaths. I have also converted my data into a stationary process and will later test some assumptions. I have a total of 82 observations for each variable, stretching from 15 February to 15 June.
I do not have a lot of knowledge of the different statistical models and which ones are logical to use in my case. My first thought was to fit a GARCH or OLS regression, although I am not sure whether this is a smart choice. Hence, I ask you for some advice.
Thank you in advance :)
Best regards, stressed out student!
Relevant answer
Answer
The performance of a model can only be judged after applying it to real data and comparing with simpler alternative models.
Please let me recommend this framework to compare and evaluate alternative models:
My recommendation is to use a set of alternative models and to use the above chapter to implement basic procedures to compare alternatives and interpret the results.
  • asked a question related to OLS
Question
5 answers
I am working on a panel of 36 countries over the period 1990-2018, and I want to compute marginal effects for the various regression models employed (OLS, RE, FE and system GMM) using STATA. Whenever I run the marginal effects in Stata with the commands "margins, dydx" or "margins, atmeans", the coefficients for the marginal effects are the same as the original regression coefficients. I learnt I must consider conduit variables.
Please, I need your help on how to compute marginal effects in Stata.
Relevant answer
Answer
After an estimation, the command mfx calculates marginal effects. A marginal effect of an independent variable x is the partial derivative, with respect to x, of the prediction function f specified in the mfx command's predict option. If no prediction function is specified, the default prediction for the preceding estimation command is used. This derivative is evaluated at the values of the independent variables specified in the at option of the mfx command; if no values are specified, it is evaluated at the default values, which are the means of the independent variables. If there were any offsets in the preceding estimation, the derivative is evaluated at the means of the offset variables.
The derivative is calculated numerically by mfx, i.e. it is approximated by a finite difference, e.g. f'(x) ≈ [f(x + h) − f(x)] / h, with an appropriately small h.
Further, you can follow the following link for a detailed discussion and the Stata commands for marginal effects...
  • asked a question related to OLS
Question
3 answers
I have a moderating variable W, and my dependent variable is Y. OLS results show that the interaction term involving W is positive and significant, but the main effect, which was positive without moderation, becomes negative. So my main relationship changes from positive to negative after adding the moderator. Can I report the results? What could be the possible justification?
Relevant answer
Answer
After all, you did not tamper with any variable. Thus, report your findings as they are. However, try to uncover the story in the data and the factors that are likely to contribute to such unexpected changes.
  • asked a question related to OLS
Question
4 answers
I am attempting to construct an error-correction model. I have 7 variables including the dependent variable. I already ran an ADF test for a unit root at both level and first difference, and I am looking at the results both with constant and with constant and trend. For one of the variables, the p-value at level is already smaller than 0.05 in the constant-and-trend case, but for all the other variables the null hypothesis is not rejected. Would that one variable cause any problem?
My second problem is that, looking at the first differences, one variable still has a p-value bigger than 0.05, and when I ran the test for cointegration the p-value came out as 0.7881, which is bigger than 0.05. Does this mean that I cannot run an OLS and then estimate an Error Correction Model?
Relevant answer
Answer
I would make several comments on your econometrics.
1. You have 33 observations and you have 8 estimated coefficients in your estimated error correction term. You will also be estimating some coefficients on first difference terms. Your dataset may be too short to support such an elaborate analysis.
2. I presume that the 9 in the lags column is the maximum lag from which you tested down. It would be better to give the actual lag used in the test. If your tests were based on 9 lags, I think that something has gone wrong.
3. In an ADF style unit root test the null hypothesis is that the time series being examined is non-stationary. The choice of
  • constant + trend
  • constant,
  • no constant
to be used in the test depends on the alternative hypothesis that you consider appropriate
If the alternative is that the series is stationary about a trend then you should include a constant and trend in your test. Consider log(gdp). Your alternative here would be that apart from a random disturbance log(gdp) might grow at a fairly constant rate. Therefore including both constant and trend would be appropriate.
Consider an interest rate. Clearly, if this is stationary it can not be expected to grow but should be stationary about some constant. Therefore in an ADF style test of an interest rate, I would include a constant only.
Such considerations should be part of your initial economic examination.
4. If the alternative in the test for levels is a constant plus trend then the alternative in the test for the first differences of the same variable is a constant. Taking first differences removes the trend. If the alternative in the test for levels is a constant then the alternative in the test for first differences is no constant and no trend.
5. It is almost a certainty that you should be working with the logs of your variables. Perhaps you are, but your notation casts some doubt on this. You may find that using the transformed variables improves your results. One usually finds that log(GDP Con) is I(1).
6. Looking at your equation, I would think that you have simultaneity problems. For example, GFCF (Gross Fixed Capital Formation?) is as much determined by FDI as FDI is by GFCF. Some of the other variables that you are using may have the same problem. I would think that log(GFCF) would also be I(1). One could make a good argument that GFCF causes FDI.
7. As suggested by Georgios Savvakis, ARDL is appropriate when you have I(0) and I(1) variables and no I(2) variables. However, ARDL also requires that there be only one cointegrating relationship and that certain exogeneity conditions hold. It is very likely that ARDL is not applicable in your case.
8. VECM methods can be used in the presence of I(0), I(1), and I(2) variables. This is not covered in many textbooks and may be missing from the software that you are using. A good reference to start with is Juselius (2007), The cointegrated VAR model, OUP. This process is difficult and that is why it is missing from many textbooks.
9. You do not need to include a trend in your Engle-Granger regression when an "explanatory" variable contains a drift. See Hayashi (2000), Econometrics, Princeton, for an explanation. (While this is an advanced econometrics textbook, the coverage of the Engle-Granger cointegration test is very good and does not require advanced mathematics.)
10.
  • asked a question related to OLS
Question
25 answers
Dear colleagues,
I ran several models in OLS and found these results (see the attached screenshot, please). My main concern is that some coefficients are extremely small, yet statistically significant. Is this a problem? Could it be because my dependent variables are index values ranging between -2.5 and +2.5, while some explanatory variables are measured in, e.g., thousand tons? Thank you beforehand.
Best
Ibrahim
Relevant answer
Answer
Dear Mr. Niftiyev, your dependent variable varies between -2.5 and +2.5. Hence, it is better to employ a Tobit, Probit or Logit approach if possible. The choice among these three approaches depends mostly on the distribution patterns of the variables.
  • asked a question related to OLS
Question
2 answers
I am analysing data using propensity score matching and path analysis. However, one of the assumptions in selecting matching variables for propensity score matching is that they should influence selection into treatment while not being affected by the treatment variable (endogeneity problem and unconfoundedness assumption) (Caliendo & Kopeinig, 2008).
1. Am I making an analytical mistake (violating the assumption) if I run both propensity score matching and normal regression (OLS using outcome variable as dependent variable or probit regression using treatment variable as dependent) in the same study? Because I am using the same matching variables as independent variables in the OLS and probit.
2. Is it analytically correct to run a path analysis in addition to propensity score matching using the same matching variables? I am running the path analysis to trace the path towards the outcome variables.
Relevant answer
Answer
Yeap! I agree. Propensity score matching is a preprocessing method. You do what you have to do at this stage to prepare for the outcome analysis stage. Do not conflate the two. You can use the matching variables in your outcome analysis; that is doubly robust estimation.
  • asked a question related to OLS
Question
42 answers
I need to do an OLS regression on a dataset with stock index returns. To control for the Monday and holiday effects I need to add dummy variables. Another study that ran the same regression described this as follows:
"Dt = {D1t, D2t, D3t, D4t} are dummy variables for Monday through Thursday, and Qt = {Q1t, Q2t, Q3t, Q4t, Q5t} are dummy variables for days for which the previous 1 through 5 days are non-weekend holidays."
Suppose there is no trading on Tuesday, May 1, because of a non-weekend holiday. How do I correctly use the dummies? Which dummy from Q1t through Q5t should be used for Wednesday, May 2, and are dummies also needed for Thursday, May 3, and Friday, May 4? Do I need only one dummy for Wednesday, or more than one dummy for the days after Tuesday, May 1?
I am new with this kind of regressions, so I do not know how to use this.
Can somebody maybe help me? Thanks in advance.
Relevant answer
Answer
A dummy variable should be used to control for any qualitative difference. If the stock market is closed for the weekend, a single dummy variable indicating that would be sufficient. If there are qualitative differences between different weekends, or between a holiday and a weekend, two dummy variables may be used.
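A hedged pandas sketch of building such dummies (dates and the single `Q1` lag are illustrative; `Q2`..`Q5` would be built the same way with larger offsets):

```python
# Illustrative construction of Monday-Thursday dummies D1..D4 and a
# day-after-holiday dummy Q1 on a small trading calendar.
import pandas as pd

dates = pd.bdate_range("2024-04-29", "2024-05-10")       # business days
holidays = pd.to_datetime(["2024-05-01"])                # mid-week holiday
df = pd.DataFrame(index=dates.difference(holidays))      # trading days only

# D1..D4: Monday through Thursday (Friday is the omitted base category)
for d, name in enumerate(["D1_mon", "D2_tue", "D3_wed", "D4_thu"]):
    df[name] = (df.index.dayofweek == d).astype(int)

# Q1: the previous calendar day was a non-weekend holiday. Here only
# Thursday, May 2 follows the Wednesday, May 1 holiday, so one row gets Q1 = 1.
df["Q1"] = df.index.isin(holidays + pd.Timedelta(days=1)).astype(int)
print(df)
```

So in the thread's example only the first trading day after the holiday gets the 1-day-lag dummy; Q2..Q5 mark days whose holiday lies 2..5 days back.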
  • asked a question related to OLS
Question
4 answers
Hi, dear researchers.
I am running a panel regression in STATA. My panel dataset covers 3 countries over 25 years, with 7 variables.
The variable I need is significant in the panel regression. However, when I run a simple OLS regression for each country separately, the required variable is significant for only two countries. What would you recommend in this case? Which model should I use?
Relevant answer
Answer
Is it feasible to use an ARIMAX model by including an indicator variable for each country in your final model?
  • asked a question related to OLS
Question
4 answers
I asked students a series of questions about the service quality of their university before and during the pandemic, on a 7-point Likert scale.
For instance:
"Instructors motivate the students - PRE COVID-19" (Strongly Disagree to Strongly Agree)
and then again straight after:
"Instructors motivate the students - DURING COVID-19" (Strongly Disagree to Strongly Agree)
So I have about 30 of these questions, and I want to see whether changes in scores across the two periods have impacted a range of dependent variables; for example, one of them is student engagement, proxied by how many lectures they attend per week.
"How many lectures a week do you attend (Pre COVID-19)?" 1/2/3/4/5/6/7/8/9+
"How many lectures a week do you attend (During COVID-19)?" 1/2/3/4/5/6/7/8/9+
I am not very knowledgeable in using Stata. What is the best regression model for this? Do I need a fixed effects model, and if so, how? Or can I just pool the data by creating a new variable, for instance the change in scores from pre- to during-COVID, and then run a basic OLS?
Relevant answer
Answer
follower
  • asked a question related to OLS
Question
17 answers
Dear Colleagues,
I estimated the OLS models and ran several tests on them; however, instability in the CUSUMSQ persists, as shown in the attached photo. What should I do in this case?
Best
Ibrahim
Relevant answer
Answer
I presume that your data is quarterly or monthly as otherwise, you have too few observations to make any reasonable inferences.
If you are trying to make causal inferences (e.g. you have an economic model that implies that x causes y and you wish to measure that effect), the CUSUMSQ is one test that indicates that your model is not stable: either the coefficients or the variance of the residuals is not stable. You have indicated that there is no heteroskedasticity, so it is possible that the model coefficients are the problem. The test itself only indicates that there is instability; it does not say what the instability is or what causes it. There are many possible causes of instability (omitted variables, functional form, heteroskedasticity, autocorrelation, varying coefficients, etc.). Your best procedure is to return to your economics and work out how your theory might lead to stability problems. Are there possible breaks in your data caused by policy changes, strikes, technological innovations, and the like, that might be covered with a dummy variable or a step dummy?
If you are doing forecasting (or projections) I would not be too concerned about specification tests. It is very unlikely that an unstable model will forecast well. You may achieve good forecasting results with a very simple model that need not be fully theory compliant.
  • asked a question related to OLS
Question
7 answers
Hello,
I am in the process of estimating panel data regressions where I regress stock returns on Fama-French factors + 1 dummy capturing ESG initiatives. In order to first test for which estimation technique is the most appropriate, I run the Hausman test in order to compare FE and RE. The results from this test are that we fail to reject the null that Cov(alpha_i , x_it) = 0, which suggests that random effects yields the most efficient estimates.
However, when I estimate the regression using RE i get a theta (also referred to lambda) of 0 indicating that RE is equivalent to OLS, since the cross-sectional mean will disappear from the model (multiplied by zero). I've also tested estimating with Pooled OLS, and it indeed gives exactly identical estimates and SEs. The problem is that when I run a poolability test (H0: beta_i = beta), it rejects the null that the data is fit for a pooled OLS.
I feel this is somewhat contradictory, as the RE model estimated the regression as a pooled OLS, but the F-test for poolability rejects the null.
Am I missing something here, or has anyone been in a similar situation? It should be stated that I do the econometric modelling using Matlab and the Panel Data Toolbox (Álvarez et al. 2017).
Relevant answer
Answer
With a time dimension of 1005 and a panel dimension of 18, I would think that the time dimension asymptotics would be more important than the panel dimension.
Looking at the economics of your model, I would think that the factor coefficients would differ from stock to stock or from portfolio to portfolio. If this is so then panel data is inappropriate as panel data methods force these coefficients to be equal across stocks.
Have you considered estimating your factor equation separately for each stock, giving you a stacked system of 18 equations? There may also be some contemporaneous correlation between the disturbances in your system. You may then need to use Seemingly Unrelated Regressions (AKA SUR). Within such a system you can also test for cross-equation restrictions and, if they are found to hold, estimate imposing these constraints.
  • asked a question related to OLS
Question
1 answer
I built a model of 4 regression equations (probit and OLS) in R Studio. I need to take the weights of observations into account. While it is simple to do so for each regression equation separately, I can't find how to do so for the SUR model.
Relevant answer
Answer
Ali Faissal Maarouf, SUR models can alternatively be estimated using structural equation modelling (SEM), in R with the lavaan package. The link below explains how to use survey weights based on this main package. https://statistics.ohlsen-web.de/sem-with-lavaan/
  • asked a question related to OLS
Question
3 answers
The following model (10):
Y_t = β_0 + β_1 Y_(t-1) + β_2 X_t + u_t