
# Applied Econometrics - Science topic

Explore the latest questions and answers in Applied Econometrics, and find Applied Econometrics experts.
Questions related to Applied Econometrics
Question
I would like to test whether the general relationship between the number of years of education and the wage is linear, exponential, etc. Or in other words, does going from 1 year to 2 years of education have the same impact on wages as going from 10 to 11. I want a general assessment for the world and not for a specific country.
I got standardized data from surveys on several countries and multiple times (since 2000). My idea is to build a multilevel mixed-effects model, with a fixed effect for the number of years of education and random effects for the country, the year of the survey and other covariates (age, sex, etc.). I’m not so used to this type of model: do you think it makes sense? Is this the most appropriate specification of the model for my needs?
https://d-nb.info/1212378199/34
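On the functional-form question, a simple first check (before the multilevel machinery) is to add a squared education term and inspect its sign and size. A hand-rolled sketch on simulated data — the data-generating process and all coefficients here are hypothetical, not taken from the surveys mentioned above:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS via the normal equations: beta = (X'X)^(-1) X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

random.seed(0)
# Hypothetical DGP: log wage is concave in years of education
educ = [random.uniform(0.0, 16.0) for _ in range(500)]
logwage = [1.0 + 0.12 * e - 0.003 * e * e + random.gauss(0.0, 0.1) for e in educ]

beta = ols([[1.0, e, e * e] for e in educ], logwage)
# A clearly negative squared term indicates diminishing returns:
# going from year 10 to 11 adds less than going from year 1 to 2.
```

A mixed model would then add country and survey-year random effects on top of this fixed-effect specification.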
Question
Hey there. I am trying to walk through this concept in my head but unfortunately do not have the name for the type of model I am attempting to create. Let me describe the data:
1) One dependent variable on an hourly interval
2) Multiple exogenous "shock" variables that have positive effects on the dependent variable.
3) The dependent variable has 0 effect on the exogenous shock variables.
The dependent variable can be modeled as a function of its own lags and the exogenous shock variables.
I would like to model the variable with an ARIMA model with exogenous variables that have an immediate impact at time T and lagged effects for a short period of time. (This is similar to a VAR's IRFs, except that the exogenous variables are independent of the dependent variable.)
The assumption is that without the exogenous shock variables, there is an underlying behavior of the data series. It is this underlying behavior that I would like to capture with the ARIMA. The exogenous shock variables are essentially subtracted from the series in order to predict what the series would look like without exogenous interference.
The problem:
I am worried that the ARIMA will use information from the "exogenous" shocks within the dependent series when estimating the AR and MA terms. That would mean positive bias in those terms. For example: if an exogenous shock is estimated to increase the dependent variable by 100 units, then this 100-unit increase should NOT affect the estimation of the AR or MA terms, since it is considered unrelated to the underlying behavior of the dependent variable.
I've attempted to write this out mathematically as well.
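For what it's worth, the standard remedy for this concern is to estimate the AR terms and the shock effects jointly (regression with exogenous regressors and ARMA dynamics, often called ARIMAX or a transfer-function model), rather than pre-subtracting the shocks. A minimal simulated sketch with an AR(1) and a binary shock — all numbers are hypothetical:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS via the normal equations: beta = (X'X)^(-1) X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

random.seed(1)
n = 2000
phi_true, beta_true = 0.5, 100.0
# Exogenous shock: occasionally on, independent of y's past
x = [1.0 if random.random() < 0.05 else 0.0 for _ in range(n)]
y = [0.0] * n
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + beta_true * x[t] + random.gauss(0.0, 1.0)

# Joint estimation: y_t on [1, y_{t-1}, x_t] recovers both the AR dynamics
# and the shock effect, so the shocks do not contaminate the AR term.
X = [[1.0, y[t - 1], x[t]] for t in range(1, n)]
c_hat, phi_hat, beta_hat = ols(X, y[1:])
```

Because the shock is included as a regressor, its 100-unit jumps are attributed to the exogenous term rather than to the AR coefficient, which is the behavior you are after.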
Question
I have a heterogeneous panel data model, N=6, T=21. What is the appropriate regression model? I applied the CD test; it shows the data have cross-sectional dependence.
I used second-generation unit root tests, and the results show that my data are stationary in levels.
Is it possible to use PMG? Would you please explain the appropriate regression model?
Question
Hello
I am estimating a quadratic ARDL model in EViews for the Kuznets curve.
My significance level is 10%, and I want to run the CUSUM and CUSUM of squares stability tests at the 10% significance level in EViews, but I can only obtain them at 5%.
If not in EViews, where and how can I run this test?
Thank you
Question
I want to find out how the number of COVID-19 cases, deaths and lockdowns affected each sector of the Sri Lankan economy. The Hausman test suggested the fixed effects model as the most suitable. I need to fit the fixed effects model for each sector separately, including the sector as a dummy variable in the model, but I get the error 'Near singular matrix'. Why is that? Is it impossible to fit a fixed effects model for each sector separately when the sector is included as a dummy variable?
In panel data, it is possible to fit fixed effect models separately for each dummy variable, but this approach is not recommended because it can lead to biased estimates and inefficient inference.
Fixed effect models in panel data are typically used to control for unobserved heterogeneity that is time-invariant. This is accomplished by including a set of dummy variables for each individual in the data. The inclusion of these individual-specific dummy variables absorbs the time-invariant heterogeneity, allowing for unbiased estimation of the remaining time-varying effects.
Fitting fixed effect models separately for each dummy variable would essentially be the same as estimating a separate model for each individual. This approach would fail to account for the time-invariant unobserved heterogeneity that is common to all individuals. As a result, the estimates would be biased and inefficient.
Therefore, it is recommended to include all the dummy variables in the model together as a set of fixed effects. This allows for a more efficient estimation and unbiased inference of the time-varying effects.
@Randimal Herath
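As a side note on the 'Near singular matrix' error itself: it typically appears because the intercept plus a full set of sector dummies are perfectly collinear (the dummy-variable trap), which makes X'X singular; dropping one dummy (the reference category) fixes it. A tiny sketch with hypothetical sector labels:

```python
def det(M):
    """Determinant by Laplace expansion; fine for tiny matrices."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += ((-1) ** j) * M[0][j] * det(minor)
    return total

def xtx(X):
    """Cross-product matrix X'X."""
    k = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]

# Two observations in each of three hypothetical sectors
sectors = [0, 0, 1, 1, 2, 2]
# Intercept + ALL three dummies: columns sum to the intercept -> singular
X_trap = [[1.0] + [1.0 if s == k else 0.0 for k in range(3)] for s in sectors]
# Intercept + two dummies (one reference category dropped): fine
X_ok = [[1.0] + [1.0 if s == k else 0.0 for k in range(1, 3)] for s in sectors]

d_trap = det(xtx(X_trap))  # zero: the matrix cannot be inverted
d_ok = det(xtx(X_ok))      # nonzero: estimable
```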
Question
Dear Scholars, I am measuring the effect of policy on firms' performance. I found a common (in both treatment and control groups) structural break 4 years before the policy intervention. I used the difference-in-difference model to find the impact of the policy. I am using 20 years of firm-level panel data. What are your suggestions?
It depends on how the shock influenced the treated and control units.
If you can argue that the shock influenced all units in the same way, then the inclusion of two-way fixed effects (particularly time fixed effects in this case) can help you rule out the effects of such a 'common' shock.
However, if there are reasons to think that the structural shock differently affected treated and control units, then this could result in potential biases in the estimation and identification of the average treatment effect.
There is a recent literature dealing with the correct identification of such effects. I would suggest you read the following paper:
"Visualization, Identification, and Estimation in the Linear Panel Event-Study Design" by Simon Freyaldenhoven, Christian Hansen, Jorge Pérez Pérez and Jesse M. Shapiro. You can find the paper here: http://www.nber.org/papers/w29170
I particularly suggest reading section 3.1, since the paper is accompanied by the Stata command "xtevent", which is particularly useful for such econometric analyses.
Question
I am currently trying to understand a possible dynamic panel model where the number of years of observation (t) is higher than the number of units of observation (n). The particular dataset contains 6 different cross-sectional regions observed over a span of 30 years.
It is possible to estimate a GMM model with the number of cross-sectional units (n) lower than the number of time periods (t). However, the standard dynamic panel GMM estimators (difference and system GMM) were designed for panels with many units and few periods. With n=6 and t=30, the instrument count quickly exceeds the number of units, which leads to instrument proliferation, overfitting of the endogenous regressors, and unreliable specification tests. With so few units and a long time dimension, panel time-series estimators such as the mean group or pooled mean group estimator are generally a better fit.
Question
I ran the ARDL approach in EViews 9, and it turns out that the independent variable has a small coefficient, which appears as zero in the long-run equation shown in the table, despite having a very small non-zero value under ordinary least squares. How can I display this low value clearly?
Look at scatterplots, which may help, but it seems that you don't have much of a relationship here. Best wishes, David Booth
Question
Using E-Views 9, I ran the ARDL test, resulting in an R-Squared value in the initial ARDL model output and an R-Squared value under the Bounds test. so, what is the difference between these two R squared values?
The difference is that the R-squared in the ARDL model concerns the fit of the overall model, not the bounds test alone. It tells you how much of the variability in Y (your dependent variable) is explained by the ARDL model given the X variables (your independent variables). The R-squared reported with the bounds test, on the other hand, refers specifically to the bounds test regression within the ARDL framework, not to the overall model. You need an ARDL model first to have a bounds test, but not a bounds test to have an ARDL model. I hope this helps.
Question
Hi! I would like to have an opinion on something, rather than a straight-out answer, so to speak. In time-series econometrics, it is common to present both the long-run coefficients from the cointegrating equation and the short-run coefficients from the error correction model. Since I have a lot of specifications, and since I'm really only interested in the long run, I only present the long-run coefficients from the cointegrating equation in a paper I'm writing. Would you say that is feasible? I'm using the Phillips-Ouliaris single-equation approach to cointegration.
Question
Hi, I am wondering if we can "manually generate" more instrumental variables in TSLS estimation by taking higher-order terms.
For instance, when Z is a valid instrumental variable for X, then Z^2 should also satisfy the conditions for relevance and exogeneity. Can I create more instruments this way?
My guess is that the answer depends on the true relationship between X and Z: when X is linear in Z, the additional quadratic term will not be significant in the first-stage regression and hence is useless. But when X has a quadratic relationship with Z, I think it is better to include the quadratic term in the first stage as well. However, I hardly ever see this practice in the literature. Why? Can you share some examples where quadratic or higher-order IV terms are included? Thanks
thanks to both of you!
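A simulated illustration of the idea (the data-generating process below is entirely hypothetical): when X is quadratic in Z, using Z and Z² as instruments in a hand-rolled 2SLS recovers the structural coefficient, while OLS is biased by the endogeneity:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

def slope(a, b_):
    """Simple-regression slope cov(a,b)/var(a), intercept handled by demeaning."""
    ma, mb = sum(a) / len(a), sum(b_) / len(b_)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b_)) / \
        sum((ai - ma) ** 2 for ai in a)

random.seed(3)
n = 5000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]  # structural error
# X is quadratic in Z and correlated with u (endogenous)
x = [zi + zi * zi + 0.8 * ui + 0.6 * random.gauss(0.0, 1.0)
     for zi, ui in zip(z, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]  # true coefficient: 2

b_ols = slope(x, y)  # biased upward by cov(x, u)

# 2SLS: first stage X on [1, Z, Z^2], second stage Y on fitted X
g0, g1, g2 = ols([[1.0, zi, zi * zi] for zi in z], x)
x_hat = [g0 + g1 * zi + g2 * zi * zi for zi in z]
b_2sls = slope(x_hat, y)  # close to the true value 2
```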
Question
Hi,
Could anyone suggest material (ppt, pdf) on applied econometrics with R (intro level)?
I have often been approached with questions such as this. In person, I would know the questioner's background in economics, statistics/econometrics, linear algebra, and mathematics, and I would ask why they wanted to proceed this way and what they hoped to achieve. The lists below are long, but I hope you find something useful in them.
I know of three modern introductions to econometrics that are accompanied by R companions.
1. Michael R Jonas has already mentioned Wooldridge (2019), Introductory Econometrics: A Modern Approach, 7th ed, South-Western College Publishing, and the R companion available at http://www.urfie.net.
2. Stock and Watson (2019), Introduction to Econometrics, Pearson, and https://www.econometrics-with-r.org/
3. Brooks (2019), Introductory Econometrics for Finance, CUP, and the R Guide for Introductory Econometrics for Finance, available in Kindle format from Amazon or in pdf from the publisher (the guide is free from both sources).
4. https://scpoecon.github.io/ScPoEconometrics/ is an introduction to econometrics with R taught to second-year undergraduates.
5. Kleiber and Zeileis (2008), Applied Econometrics with R, Springer
The first three books cover an undergraduate introduction to econometrics. I found item 5 very useful but you probably need to have a good knowledge of econometric theory to take full advantage of it.
There are also various graduate textbooks on time series analysis, panel data analysis, and various other aspects of econometrics.
Econometrics is often taught as a series of recipes. e.g. estimate the following expression using OLS, without an explanation of the importance of the underlying economic theory. Causality is often inferred from tests of significance without an understanding of how the statistical tests depend on the underlying economic theory. The ideas of tests of hypotheses and p-values are often not understood. The books below are attempts to explain what can be achieved by statistical/econometric analysis and how to avoid false conclusions.
1. Huntington-Klein (2022), The Effect: An Introduction to Research Design and Causality, CRC Press. R code is used throughout the book. An online version is available at https://theeffectbook.net/.
2. Edge (2019), Statistical Thinking from Scratch, Oxford, deals with the simple regression model and adds considerably to an understanding of the various tests and recipes used in econometrics, without recourse to advanced mathematics.
3. Cunningham (2021) Causal Inference: The Mixtape, Yale. has examples in R and Stata. This is a little more advanced than 1 and 2.
4. Hirschauer, Gruner et al. (2022), Fundamentals of Statistical Inference, Springer. This book should be compulsory reading for all econometricians and applied statisticians (and anyone doing statistical analysis). In particular, the coverage of hypothesis testing and p-values is essential reading. It is written in a very accessible form. (It does not contain any R examples.)
Question
Can the ECM term be lower than -1? How should a value lower than -1 be interpreted? Does it suggest that something is wrong with the model?
Any help?
The coefficient of the ECM is judged against the rule of thumb which says that the coefficient of the error correction term (ECT) should be negative, less than one in absolute value, and significant. That said, an estimate between -1 and -2 does not necessarily indicate misspecification: it implies that the adjustment overshoots and oscillates around the long-run path while still converging.
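To make the interpretation concrete: with an error-correction coefficient λ, the deviation from equilibrium evolves roughly as d_t = (1 + λ)·d_{t-1}, so -2 < λ < -1 gives oscillatory but convergent adjustment, while λ < -2 diverges. A tiny deterministic sketch:

```python
def path(lam, d0=1.0, steps=6):
    """Deviation from long-run equilibrium under d_t = (1 + lam) * d_{t-1}."""
    d = [d0]
    for _ in range(steps):
        d.append((1.0 + lam) * d[-1])
    return d

monotone = path(-0.5)  # factor  0.5: smooth monotonic convergence
oscill = path(-1.3)    # factor -0.3: sign flips each period, still converging
explode = path(-2.5)   # factor -1.5: oscillates and diverges
```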
Question
What is the difference between mediating and moderating variables in panel data regression?
I recommend that you use the free statistical program JAMOVI which allows the inclusion of multiple moderators and mediators.
Question
Hello, I am new to VARs and currently building an SVAR with the following variables to analyse monetary policy shocks and their effects on house prices: house prices, GDP growth, inflation, the mortgage rate (first-differenced), the central bank base rate (first-differenced) and the M4 growth rate. The aim of the VAR analysis is to generate impulse responses of house prices to monetary policy shocks and to understand the forecast error variance decomposition.
I'm planning on using the base rate and the M4 growth rate as policy instruments, for a time period spanning 1995 to 2015. Whilst all the other variables reject the null hypothesis of non-stationarity in an Augmented Dickey-Fuller test with 4 lags, the M4 growth rate fails to reject it.
Now, if I go ahead anyway and create a SVAR via recursive identification, the VAR is stable (eigenvalues within the unit circle), and the LM test states no autocorrelation at the 4 lag order.
Is the nonstationarity of the M4 growth rate an issue here? As I am not interested in parameter estimates, but rather just in impulse responses and the forecast error variance decomposition, there is no need to adjust the M4 growth rate. Is that correct?
Alternatively I could first difference the M4 growth rate to get a measure of M4 acceleration, but I'm not sure what intuitive value that would add as a policy instrument.
KB
I will answer this question by suggesting that you consider your basic model.
1. You have a strange mixture of nominal variables, real variables and growth rates.
2. Interest rates are the equivalent of log differences.
3. If the growth rate of M4 is non-stationary, this implies that the log of M4 is I(2). I find this hard to believe.
4. Are there seasonal effects in your data?
5. I would be very doubtful that ARDL is valid in your case.
6. If I understand correctly, you intend to identify your system by using a particular ordering of your variables. It will be difficult to justify any particular order.
7. I would be concerned that you might have trouble accounting for bubbles in house prices and subsequent falls. The 20 years that you are using were a very difficult time for house prices, and they were out of equilibrium for much of this period. Your project is very ambitious.
8. I would recommend that you talk to your supervisor and agree on an analysis that might be more feasible.
Question
I am using an ARDL model; however, I am having some difficulties interpreting the results. I found that there is cointegration in the long run. I have provided pictures below.
Mr a. D.
The ECT(-1) is the lagged error-correction term, i.e., the one-period-lagged residual from the long-run (cointegrating) equation.
Regards
Question
I am currently replicating a study in which the dependent variable describes whether a household belongs to a certain category. Therefore, for each household the variable either takes the value 0 or the value 1 for each category. In the study that I am replicating the maximisation of the log-likelihood function yields one vector of regression coefficients, where each independent variable has got one regression coefficient. So there is one vector of regression coefficients for ALL households, independent of which category the households belong to. Now I am wondering how this is achieved, since (as I understand) a multinomial logistic regression for n categories yields n-1 regression coefficients per variable as there is always one reference category.
Question
Hey! I need to improve an already existing panel data model by adding one variable for access to technology. Is this possible, and what is the best variable for measuring technology accessibility? If possible, I would also like to measure technological advancement. What should my variables be for this? What are the common practices so far? Thank you!
Technological progress is measured by the rate at which inputs improve, and the direction of technological progress is represented by the relative pace of these improvements.
Technological access can be proxied by adequate skills and effective use.
Question
Hi all, I'm doing my FYP on the determinants of healthcare expenditure in 2011-2020. Here are my variables: government financing, GDP, total population.
The first model is: healthcare expenditure_t = B0 + B1*gov financing_t + B2*gdp_t + B3*population_t + e_t
The second, causal-relationship model is: healthcare expenditure per capita_t = B0 + B1*gdp per capita_t + e_t
Is it possible to use a unit root test and then ARDL for the first model, and what test can I use for the second model?
You should read the Global Burden of Disease articles on health care financing. Their appendices fully disclose their variables (and they've spent a lot of time and effort determining the best ones), and they provide their (sophisticated) statistical code. The lead author on most of that work is Joseph Dieleman; also search for the Global Burden of Disease Health Financing Collaborator Network. A full bibliography of this work is at https://www.healthdata.org/about/joseph-dieleman if you click on the publications tab. The pubs are all open-access and most are on ResearchGate. :-)
Question
I use a conditional logit model with income, leisure time and interaction terms of the two variables with other variables (describing individual's characteristics) as independent variables.
After running the regression, I use the predict command to obtain probabilities for each individual and category. These probabilities are then multiplied with the median working hours of the respective categories to compute expected working hours.
The next step is to increase wage by 1%, which increases the variable income by 1% and thus also affects all interaction terms which include the variable income.
After running the modified regression, again I use the predict command and should obtain slightly different probabilities. My problem is now that the probabilities are exactly the same, so that there would be no change in expected working hours, which indicates that something went wrong.
On the attached images with extracts of the two regression outputs one can see that indeed the regression coefficients of the affected variables are very, very similar and that both the value of the R² and the values of the log likelihood iterations are exactly the same. To my mind these observations should explain why probabilities are indeed very similar, but I am wondering why they are exactly the same and what I did possibly wrong. I am replicating a paper where they did the same and where they were able to compute different expected working hours for the different scenarios.
Either something went wrong, or you effectively ran the same regression twice. Did you use the same version of the software as the original study?
Question
In the file that I attached below there is a line above the theta(1) coefficient and another one exactly below C(9). In addition, what is this number below C(9)? There is no description.
I asked that question of the person who coded that package, and he said the C(9) coefficient does not have any meaning here, so just ignore it. It appears because the package was written for an old version of EViews and has not been updated.
Question
Hello, I am facing a problem concerning the computation of regression coefficients (necessary information in attached image):
Three regression coefficients (alpha y, m and f) of the main regression (2) are generated through three separate regressions.
Now I was wondering which would be the appropriate way to compute the alphas and gammas.
In case I first run regression (2) and obtain the three regression coefficients alpha y, m and f, can I use these for the separate regressions as dependent variables in order to then run regressions (3) and obtain the gammas?
What strikes me about this approach is that the value of the dependent variables alpha y, m and f would always be the same for each observation.
In the paper they state that the alphas are vectors, but I don't properly understand how they could be vectors (maybe that's the issue after all?).
Or is there a way to directly merge the regressions / directly integrate the regressions (3) into regression (1)? Preferably in Stata.
I appreciate any help, thank you!
If I have understood the question well, you have a system of regression equations, not a single one; unless you replace the alpha coefficients in the main equation which yields a new one in terms of new exogenous variables (Z'n x yn, etc.). Then alpha coeffs will disappear and the single regression can be solved for the other coeffs, i.e. betas and gammas. Alphas can be calculated easily.
Question
Dear All,
I’m conducting an event study for the yearly inclusion and exclusion of some stocks (from different industry sectors) in an index.
I need to calculate the abnormal return per each stock upon inclusion or exclusion from the index.
I have some questions:
1- How do I decide on the length of the backward window to use as the "estimation window", and how do I justify it?
2- Stock return is calculated by:
(price today – price yesterday)/(price yesterday)
OR
LN(price today/price yesterday)?
I see both ways are used, although they give different results.
Can any of them be used to calculate CAR?
3- When calculating the Abnormal return as the difference between stock return and a Benchmark Return (market return), The market (benchmark) return should be the index itself (on which stock are included or excluded) ? Or the sector index related to the stock?
Hi there, I am using Eventus software and I am wondering how the software computes the Market index in order to calculate abnormal returns?
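On question 2, both definitions are in use; log returns are often preferred when cumulating because they add exactly across days. A quick sketch with hypothetical prices:

```python
import math

def simple_return(p1, p0):
    """Arithmetic return: (price today - price yesterday) / price yesterday."""
    return (p1 - p0) / p0

def log_return(p1, p0):
    """Continuously compounded (log) return: ln(price today / price yesterday)."""
    return math.log(p1 / p0)

# For small moves the two are close; the log return is always slightly smaller
r_simple = simple_return(101.0, 100.0)
r_log = log_return(101.0, 100.0)

# Log returns are time-additive: daily log returns sum to the period log return,
# which is why they are convenient for cumulative abnormal returns (CARs)
prices = [100.0, 101.0, 99.0, 103.0]
total_log = sum(log_return(prices[i + 1], prices[i])
                for i in range(len(prices) - 1))  # equals ln(103/100) exactly
```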
Question
I have a big dataset (n>5,000) on corporate indebtedness and want to test whether SECTOR and FAMILY-OWNED are significant in explaining it. The information is in percentages (total liabilities/total assets) but is NOT bounded: many companies have indebtedness above 100%. My hypotheses are that the SERVICES sector is more indebted than other sectors, and that FAMILY-OWNED companies are less indebted than other companies.
If the data were normally distributed and had equal variances, I'd perform a two-way ANOVA.
If the data were normally distributed but were heteroscedastic, I'd perform a two-way robust ANOVA (using the R package "WRS2")
As the data are neither normally distributed nor homoscedastic (according to the many tests I performed), and there is no such thing as a "two-way Kruskal-Wallis test", which is the best option?
1) perform a generalized least squares regression (therefore corrected for heteroscedasticity) to check for the effect of two factors in my dependent variable?
2) perform a non-parametric ANCOVA (with the R package "sm"? Or "fANCOVA"?)
What are the pros and cons of each alternative?
Often, the log-normal distribution is a sensible assumption for percentages that are not bounded at 100%. Are there any arguments against this in your case?
It makes a huge difference for interactions whether you analyze the data on the original scale or on the log scale. If it is sensible to assume relative effects (a given impulse changes the response depending on its "base value", so absolute changes differ when base values differ), interactions seen on the original scale are highly misleading in terms of functional interpretation (i.e., when you aim to understand the interplay of the factors).
Question
I have run an ARDL model on time-series cross-sectional data, but the output does not report the R-squared. What could be the reason(s)?
Thank you.
Maliha Abubakari
I thought the PMG estimator (a form of ARDL) was more appropriate.
Question
Hi colleagues,
I use Stata13 and I want to run panel ARDL on the impact of institutional quality on inequality for 20 SSA countries. I have never used the technique so I am reading up available articles that used it. But I need help with a Stata do-file because I still don't know what codes to apply, how to arrange my variables in the model, and what diagnostics to conduct.
Any help or suggestion will do....thanks in anticipation!!!
*Panel ARDL
*PMG
xtpmg d.output d.fcapinf d.bankcr d.pfolio d.equity, lr(l.output fcapinf bankcr pfolio equity) ec(ECT) replace pmg
xtpmg d.output d.fcapinf d.bankcr d.pfolio d.equity, lr(l.output fcapinf bankcr pfolio equity) ec(ECT) replace full pmg
*MG
xtpmg d.output d.fcapinf d.bankcr d.pfolio d.equity, lr(l.output fcapinf bankcr pfolio equity) ec(ECT) replace mg
xtpmg d.output d.fcapinf d.bankcr d.pfolio d.equity, lr(l.output fcapinf bankcr pfolio equity) ec(ECT) replace full mg
*Hausman Test to Know Which to Choose (Significant prob. will mean choosing MG and insignificant prob. will mean choosing PMG)
hausman mg pmg, sigmamore
Question
Dear research community,
I am currently working with Hofstede's dimensions; however, I do not use exactly his questionnaire. In order to calculate my index in accordance with his process, I am looking for the meaning of the constants in front of the mean scores.
For example: PDI = 35(m07 – m02) + 25(m20 – m23) ... What do 35 and 25 mean? How could I calculate them with regard to my research?
Thank you very much for your help!
Best wishes,
Katharina Franke
Question
Dear Researchers,
I'm working on research using the DEA (Data Envelopment Analysis) method to measure provincial energy efficiency. However, due to data constraints, provincial energy consumption data is not available. Can I assume that provincial energy consumption is proportional to provincial GDP, i.e.
(national energy consumption / national GDP) x province i GDP?
I agree with researcher Abdulrahman M. Ahmed . There is not necessarily a correlation between national energy consumption and national GDP. Steven Setiawan, you should try to have the information on energy consumption by province.
Question
Can I use Granger Causality test on a monetary variable only? or do I need non-monetary variables?
Also, do I need to run any test before the Granger test, such as a unit root test, or can I just use the raw data?
What free programs can I use to compute the data?
Yes, you can. You can also use the EViews or Stata software.
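On the second question: yes, a stationarity check should come first, since the standard Granger test assumes stationary series. The test itself just compares a restricted and an unrestricted lag regression with an F statistic; a hand-rolled one-lag sketch on simulated data (the data-generating process is hypothetical):

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

def ssr(X, yy):
    """Sum of squared residuals of the OLS fit."""
    b = ols(X, yy)
    return sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
               for row, yi in zip(X, yy))

random.seed(4)
n = 500
x = [0.0] * n
y = [0.0] * n
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + random.gauss(0.0, 1.0)
    y[t] = 0.3 * y[t - 1] + 0.4 * x[t - 1] + random.gauss(0.0, 1.0)  # x drives y

# Does x Granger-cause y?  Restricted: y_t on [1, y_{t-1}]; unrestricted adds x_{t-1}
Xr = [[1.0, y[t - 1]] for t in range(1, n)]
Xu = [[1.0, y[t - 1], x[t - 1]] for t in range(1, n)]
yy = y[1:]
ssr_r, ssr_u = ssr(Xr, yy), ssr(Xu, yy)
F = ((ssr_r - ssr_u) / 1) / (ssr_u / (len(yy) - 3))
# compare F to an F(1, n-3) critical value (roughly 3.86 at the 5% level)
```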
Question
I have heard some academics argue that the t-test can only be used for hypothesis testing, and that it is too weak a tool for analysing a specific objective in academic research. For example, is the t-test an appropriate analytical tool to determine the effect of credit on farm output?
Depending on your objective statement: if your objective is to compare variables that influence a particular problem, you can use the t-test to compare them and then give a justification.
Question
I am using "mvprobit" in STATA, however it is not clear to me how i can estimate marginal effect after this. Any help will be much appreciated.
Dear All,
I have worked on this and written the code below for estimating marginal effects after MVPROBIT. I hope this is useful.
*Example data
use http://www.stata-press.com/data/r7/school.dta, clear
*Step 1: Run the MV probit model
mvprobit (private = years logptax loginc) (vote = years logptax loginc)
*Step 2: Post-estimation command for predictions from multivariate probit models estimated by SML
mvppred pred_xb, xb
*Step 3: Generate coefficients for each binary category
gen Coef_years_private = -.0089447
gen Coef_logptax_private = -.1018381
gen Coef_loginc_private = .3787381
gen Coef_years_vote = -.0160871
gen Coef_logptax_vote = -1.260877
gen Coef_loginc_vote = .9744685
*Step 4: Calculate marginal effects using the coefficients and linear predictions (note: each equation uses its own linear prediction)
gen ME_years_private = normalden(pred_xb1)*Coef_years_private
gen ME_logptax_private = normalden(pred_xb1)*Coef_logptax_private
gen ME_loginc_private = normalden(pred_xb1)*Coef_loginc_private
gen ME_years_vote = normalden(pred_xb2)*Coef_years_vote
gen ME_logptax_vote = normalden(pred_xb2)*Coef_logptax_vote
gen ME_loginc_vote = normalden(pred_xb2)*Coef_loginc_vote
*Step 5: After estimating the ME for each observation, get the mean using the summarize or mean command.
Question
What is the most acceptable method to measure the impact of regulation/policy so far?
I know that Difference-in-Differences (DiD), Propensity Score Matching (PSM), and two-step system GMM (for dynamic models) are common methods. I would appreciate your opinion for a 20-year panel of firm-level data.
Recent developments:
(1) Wooldridge's two-way Mundlak regression, fixed effects, and diff-in-diff
(2) Synthetic control
(3) Cerulli, G. 2015. Econometric Evaluation of Socio-Economic Programs: Theory and Applications.
(4) Pesaran (2015), Time Series and Panel Data Econometrics
Question
Hello everyone,
I would like to analyze the effect of innovation in one industry over a time period of 10 years. The dependent variable is exports, and the independent variables are R&D and labour costs.
What is the best model to use? I am planning to use a log-linear model.
Thank you very much for your greatly needed help!
Before deciding on the econometric model, you should go through the stationarity test (ADF test). If the data are stationary, OLS Regression with a log-linear model would be fine. But, if not, you may go for VAR or ARDL. You should also check the robustness of the model by going through residual tests such as Autocorrelation LM Test.
Question
Dear colleagues,
I am planning to investigate a panel data set containing three countries and 10 variables. The time frame, which concerns me, is rather short (2011-2020 for each country). What should the sample size be in this case? Can I apply fixed effects, random effects, or pooled OLS?
Thank you for your responses beforehand.
Best
Ibrahim
It seems a very small sample for microeconometric techniques. Having 27 observations and 10 covariates, you will have at most 27 - 1 - 10 = 16 degrees of freedom. This is pretty low. If I had to decide whether to pursue a project based on that, I would try to avoid it.
It is really closer to multiple time series than panel data. Have a look at this link:
Question
Hi Everyone,
I am investigating the change in a dependent variable (Y) over time (years). I have plotted the dependent variable across time as a line graph, and it seems to be correlated with time (i.e., Y increases over time, but not in every year).
I was wondering whether there is a formal statistical test to determine if this relationship between the time variable and Y exists.
Any help would be greatly appreciated!
Just perform a regression of the variable on time, or a simple correlation.
Nevertheless, what we usually do is carry out a test for mean differences at two different points in time.
Question
Dear Research Community,
I would like to check for structural breaks in a polynomial regression that predicts expected excess returns from excess equity-to-bond market volatility. I have found some useful references, but none deals with polynomials. For instance:
- Andrews, D.W.K., 1993, Tests for Parameter Instability and Structural Change With Unknown Change Point. Econometrica 61, 821-856.
- Bai, J. and P. Perron, 1998, Estimating and Testing Linear Models With Multiple Structural Changes. Econometrica 66, 47-78.
- Bai, J. and P. Perron, 2003. Computation and Analysis of Multiple Structural Change Models. Journal of Applied Econometrics 18, 1-22.
- Bai, J. and P. Perron, 2004. Multiple Structural Change Models: A Simulation Analysis. In Econometric Essays, Eds. D. Corbae, S. Durlauf, and B.E. Hansen (Cambridge, U.K.: Cambridge University Press).
As my polynomial is of order 3, I am wondering whether structural breaks have to be checked for all three parameters (X, X^2 and X^3) in a time-varying fashion, or whether there is a more efficient way to handle this issue?
Thank you!
Faten
Byrne, Joseph P., and Roger Perman. Unit roots and structural breaks: a survey of the literature. University of Glasgow, Department of Economics, 2006.
Question
Hi,
I am looking to test for a unit root in a panel data series. In this regard, I would like to use the Hadri and Rao (2008) test with a structural break. Is there any way I can perform the test in Stata or similar statistical software?
thanks,
Sagnik
In Stata, there is the xtbunitroot command for break-point unit root tests.
Question
My research is to find out the determinants of FDI. I am doing bound test to see the long run relationship, cointegration test and other diagnostic tests.
Question
Dear Colleagues,
I ran an Error Correction Model, obtaining the results depicted below. The model comes from the literature, where Dutch disease effects were tested in the case of Russia. My dependent variable was the real effective exchange rate, while oil prices (OIL_Prices), terms of trade (TOT), public deficit (GOV), industrial productivity (PR) were independent variables. My main concern is that only the Error Correction Term, the dummy variable, and the intercept are statistically significant. Moreover, residuals are not normally distributed, while also the residuals are heteroscedasdic. There is no serial correlation issue according to the LM test. How can I improve my findings? Thank you beforehand.
Best
Ibrahim
I notice the following about your specification. (1) Your inclusion of a constant (and its subsequent significance) means you allow for (and find) a trend in the real exchange rate independent of any trends in the other variables. Is that economically reasonable? (2) I assume the CRISIS variable is a zero-one dummy for time periods with a "crisis" of some sort. Apparently it is not in the cointegration vector. Why not? If it were, then I'd expect to find CRISIS differences in the error correction equation. Instead, you have it in levels. Thus you specify that a temporary crisis has a permanent effect of the level of the real exchange rate independent of the other variables. Is that what you intend? (3) You do not include the lagged difference of the real exchange rate in the error correction equation. Why not? Normally it would be there.
Question
I estimated an autoregressive model in EViews. I got a parameter estimate for one additional variable which I had not included in the model; the variable is labelled 'SIGMASQ'.
What is that variable and how should I interpret it?
I am attaching the results of the autoregressive model.
SIGMASQ is the sigma squared (variance) of the residual distribution, which is taken as a proxy for the variance of the distribution of the dependent variable. This distribution is needed for the maximum likelihood method.
SIGMASQ is estimated in a second step, after the other parameters have been estimated.
By contrast, SE is the standard error of the regression, which is based on the average squared difference between the actual and fitted values of the dependent variable.
Question
Dear colleagues,
I applied the Granger Causality test in my paper and the reviewer wrote me the following: the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...
What can I respond in this case?
P.S. I had a small sample size and serious data limitation.
Best
Ibrahim
Ibrahim Niftiyev , probably the reviewer wants to see not only whether a variable affects or not the other (i.e. the results of the Granger causality tests), but also to which extent (the magnitude and temporality of the dynamic relationship, something you can obtain from the IRFs of a VAR model). If you want to apply a VAR but you have a small sample size/data limitations, you want to consider a Bayesian VAR. Bayesian VARs are very popular and Bayesian methods are valid in small samples.
Question
Hello, dear network. I need some help.
I'm working on research, using the Event Study Approach. I have a couple of doubts about the significance of the treatment variable leads and lag coefficients.
I'm not sure to be satisfying the pre-treatment Parallel Trends Assumption: all the lags are not statistically significant and are around the 0 line. Is that enough to accomplish the identification assumption?
Also, I'm not sure about the leads coefficient's significance and their interpretation. The table with the coefficients is attached.
Thank you so much for your help.
You may find this attached paper helpful.
Best wishes, David Booth
Question
Dear researchers,
I am working on formulating hydrological model when runoff(output variable) is available at monthly time-step while rainfall(input variable) is at daily time-step.
I firstly wanted to explore mathematical models and techniques that can be used here. I have found MIDAS regression method, which forms relationship between mixed frequency data variables (output at monthly time step and input at daily time step). But the problem is variables in hydrological models are at the same time step. So that technique will not work, because the MIDAS model will have relation between variables sampled at different frequency.
So can anyone suggest relevant literature, in which both output and input variables of model are related at high frequency (say daily) but the model is learning through low frequency (monthly) output data and high frequency (daily) input data.
Can you use the daily input data to forecast daily output, then cumulate to monthly? You will be smoothing the output forecasts, but the observed output is already smoothed. Because output data are only available monthly, daily variation in output is not observable.
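A sketch of that suggestion in Python (pandas): predict at the daily step, then aggregate the predictions to monthly totals for comparison with the observed monthly runoff. The linear rainfall-to-runoff mapping below is a toy stand-in, not a real hydrological model:

```python
# Aggregate daily predictions to monthly totals. The daily "model" here is a
# toy coefficient (0.4 * rainfall) purely to show the resampling mechanics.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
days = pd.date_range("2020-01-01", "2020-06-30", freq="D")
rainfall = pd.Series(rng.gamma(shape=2.0, scale=3.0, size=len(days)), index=days)

daily_runoff_pred = 0.4 * rainfall                    # stand-in daily model output
monthly_pred = daily_runoff_pred.resample("MS").sum() # cumulate to monthly totals
print(monthly_pred)
```

The monthly totals can then be scored against the observed monthly runoff, which is exactly the low-frequency target the question describes.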
Question
The pairwise Granger causality test can be done using EViews. Is doing only this test reliable enough to explain causality? And is it only a long-run causality test, or a test of both long-run and short-run causality?
As pointed out by the answers given above Granger-causality represents a particular type of causality related to precedence. Granger-causality testing can also be used as part of a strategy to determine whether variables are weakly, strongly and/or super exogenous - see the reference below:
Engle, R.F., Hendry, D.F. and Richard, J.F. (1983) Exogeneity. Econometrica, 51, 277-304. http://dx.doi.org/10.2307/1911990
If you have more than two variables you should consider multivariate Granger-causality testing using a VAR. You should also consider whether the variables are stationary or nonstationary. If the variables are stationary, you can apply Granger-causality testing in a stationary VAR. If the variables are nonstationary and not cointegrated, you can difference the variables and apply Granger-causality testing in a stationary VAR of the differenced variables. If the variables are nonstationary and cointegrated, you can apply Granger-causality testing in a VECM (which has short-run and long-run components), in which all the data are rendered stationary by either cointegration or differencing transformations. If there is no cointegration, the error-correction term can be excluded and testing is conducted by a Wald test in a stationary VAR. Alternatively, you can use the Toda and Yamamoto (1995) surplus-lag Granger-causality test, which uses the nonstationary data directly. See Dave Giles' blog for a discussion of this - the link is given below.
The Toda-Yamamoto method means you do not need to consider the notions of long-run and short-run Granger-causality, which is useful if you simply wish to know whether Y Granger-causes X. In a VECM, long-run Granger-causality is tested using the t-ratio on the error-correction term in each equation; short-run Granger-causality uses Wald tests that set the coefficients on all lagged differences of one variable to zero in an equation.
Question
My aim is to find out the significant relationship between FDI and its determinants. I am using bound test and error correction model.
Since your objective is not to test for causal link between the variables, I think you can perform the ARDL without the causality test. The Granger causality does not necessarily address the cause-and-effect relationship between the variables however, it helps to determine if one variable can predict or forecast another variable.
Question
Hello everyone. I am using the VECM model and I want to use variance decomposition, but as you know, variance decomposition is very sensitive to the ordering of the variables. I read in some papers that it is better to use generalized variance decomposition because it is invariant to the ordering of the variables. I am using Stata, R or EViews, and the problem is how to perform the generalized variance decomposition. If anyone knows, please help me.
Question
I am running an ARDL model on eviews and I need to know the following if anyone could help!
1. Is the optimal number of lags for annual data (30 observations) 1 or 2 OR should VAR be applied to know the optimal number of lags?
2. When we apply the VAR, the maximum number of lags applicable was 5, beyond 5 we got singular matrix error, but the problem is as we increase the number of lags, the optimal number of lags increase (when we choose 2 lags, we got 2 as the optimal, when we choose 5 lags, we got 5 as the optimal) so what should be done?
1. My first comment is that all cointegration studies must be based on the economic theory (and common sense) of the system you are examining. Your theory should suggest which variables are stationary, which are non-stationary, and which are cointegrated. Your ARDL, VECM, etc., analyses are then tests of the fit of the data to your theories. It is not appropriate to use these methodologies to search for theories that fit the data; such searches will give spurious results. Now suppose that you have outlined your theory in advance of touching your computer keyboard to do your econometrics.
2. You have only 30 annual observations. This is on the small side for an elaborate analysis such as this. It appears that you have one dependent variable and possibly 3 explanatory variables. If you have 5 lags you are estimating about 25 coefficients, which is not feasible with 30 observations.
3. If you wish to use the ARDL methodology you must be satisfied that (1) there is only one cointegrating relationship and (2) that the explanatory variables are (weakly) exogenous. Otherwise, a VECM approach is required and you may also not have enough data for a VECM analysis.
4. Is it possible that you would use a simpler approach? Could you use graphs or a simpler approach to illustrate your economic theory? These are questions that you alone can answer. Advanced econometrics is not a cure for inadequate data and a deficit of economic theory.
5. If this is an academic project, consult your supervisor. While what I have set out above is correct, it may not be what your supervisor expects at this stage in your studies.
Question
In one of my paper, I have applied Newey-West standard error model in panel data for robustness purpose. I want to differentiate this model from FMOLS and DOLS model. So, on what ground can we justify this model over FMOLS and DOLS model.
This estimator allows you to control for heteroskedasticity and for serial correlation of the AR form, not every type of serial correlation. Clustered standard errors are robust to heteroskedasticity and any form of serial correlation over time.
Question
My research is based on Foreign direct investment and its determinants. So, I need to see if there is any significant relationship between the variables by looking at the p values. Should i interpret all the variables including the lagged ones ?
When interpreting the estimated model, we extract all information based on the conventional levels of significance. The most important issue in your research is how your analysis relates to a real-world problem, that is, the economic justification behind the significant and non-significant results. Similarly, I suggest you estimate multiple different econometric models and then compare them based on economic theory and statistical analysis. This exercise will ultimately help you select a final, more efficient model.
Question
Dear colleagues,
I ran several models in OLS and found these results (see the attached screenshot please). My main concern is that some coefficients are extremely small, yet statistically significant. Is it a problem? Can it be that my dependent variables are index values that ranged between -2.5 and +2.5 while I have explanatory variables that have, i.e the level of measurement in Thousand tons? Thank you beforehand.
Best
Ibrahim
Dear Mr. Niftiyev, your dependent variable varies between -2.5 and +2.5. Hence, it is better to employ a Tobit, Probit or Logit approach if possible. The choice between these three approaches depends mostly on the distribution patterns of the variables.
Question
I have GDP and MVA data, and though the MVA is stationary, the GDP is non-stationary even after log transformation followed by detrending followed by differencing. I want to build a VAR/VECM model for ln(GDP) and ln(MVA), but this data has been haunting me for the past 3 days. I also tried both methods of differencing, i.e. linear-regression detrending and direct differencing, but nothing seems to work.
Also, ln(GDP) and ln(MVA) satisfy the cointegration test, and their trends are very similar. But for VAR/VECM I will need them to be I(1), which is not the case. Any suggestions on how to handle this data will be highly appreciated!
I have attached the snapshot of the data and also the data itself.
If the time series is short, you can convert the data into quarterly data; it will give better results.
Question
I would like to employ the within transformation in panel data analysis. Market value added is the dependent variable. Various value drivers (advertising expenses, number of patents, etc.) are explanatory variables. Is it appropriate to use standardized coefficients? Or maybe a logarithmic form of the regression is more suitable?
Question
Dear Colleagues,
I noticed that when I estimate an equation by least squares in EViews, under the Options tab there is a tick mark for degrees-of-freedom (d.f.) adjustment. What is its importance and role? When I estimate an equation without the d.f. adjustment, I get two statistically significant coefficients out of five explanatory variables; however, when I estimate with the d.f. adjustment, I do not get any significant results.
Thank you beforehand.
Are you attempting prediction, or are you trying to estimate some form of "causal" relationship? If you are estimating a "causal" model, your conclusions are conditional on the model estimated. Strictly speaking, it would be better to use the adjusted degrees of freedom, particularly with your small sample. In this case, a non-significant coefficient does not necessarily imply that the coefficient is truly zero. It is more likely that your sample is too small to establish a significant result; its p-value may not be very far from your significance level. If the estimate is of the sign and magnitude expected by theory, I would accept the value and report the p-value. Estimating 5 coefficients is a big demand on a sample of 23 observations.
If you are simply doing prediction or forecasting and are not attributing explanatory power to your coefficients, you might be better off with a simpler model, which might have better predictive ability.
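The arithmetic behind the option is simple enough to check by hand (a sketch; n = 23 and five regressors plus an intercept are taken from the question above):

```python
# The unadjusted residual variance divides the residual sum of squares by n,
# the adjusted one by n - k. The ratio of the resulting standard errors is
# sqrt(n / (n - k)), which with n = 23 and k = 6 is ~1.16 - enough to flip
# borderline significance, as observed in the question.
import math

n, k = 23, 6                       # 23 obs; intercept + 5 regressors
inflation = math.sqrt(n / (n - k))
print(f"se_adjusted / se_unadjusted = {inflation:.3f}")
```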
Question
I built a nested logit model. Level 1: 8 choices; level 2: 22 choices.
In type 4, I have only 1 choice in level 1 corresponding to one choice in level 2.
The dissimilarity parameters are equal to 1 in this case (not surprising).
Can I run the model normally when I have an IV parameter that is equal to one?
Can the results be interpreted normally, or what should I do in this case?
I tried the command "constraint 1=[type4_tau]_cons=1" but the model does not run.
What can I do?
Good question
Question
I am trying to run a regression of cobb douglas function:
The problem is that my dataset captures each firm at a single point in time,
so I have a dataset over the period 1988-2012.
Each firm appears only once!
(I cannot tell whether it is panel/time-series/cross-sectional data.)
I want to find the effect of labor, capital on value added.
I have information on intermediate input.
I use two methods: Olley & Pakes and Levinsohn & Petrin.
But Stata keeps telling me that there are no observations!
my command:
levpet lvalue, free(labour) proxy(intermediate_input) capital(capital) valueadded reps(250)
Why is the command not working and saying that there are no observations?
(Is this due to the fact that each firm appears only once in the data?)
(If yes, what are the possible corrections for simultaneity and selection bias in this data?)
Mina
I agree with Anton.
Question
Dear All,
I have Panel Data fits difference in difference
I regress (Bilateral Investment Treaties - BIT) on (Bilateral FDI). BIT is a dummy taking 1 if a BIT exists and zero otherwise, while Bilateral FDI is the amount of FDI between the two economies. Objective: examine whether BITs enhance bilateral FDI.
The issue is: each country has started its BIT with its pair country at a fixed time (different from the others), so there is NO fixed treatment time for the whole data.
I am willing to assume different time periods in a random way and run my diff-in-diff (for robustness):
Year 2004
Year 2006
Year 2008
My questions :
(1) Do you suggest this method is efficient?
(2) Any suggestion random selection of time?
Interested.
Question
I am interested to know about the difference between 1st and 2nd and 3rd generation panel data techniques.....
First-generation panel data analyses often assume cross-sectional independence, i.e. a shock to a variable in one country will not have any effect on the other countries' variables. However, as a result of globalization and other cross-nation interlinkages, it is apparent that a problem in country A can affect country B. Most of the conventional panel methods (fixed effects, random effects, pooled OLS, among others) fall into this category. In order to correct the bias in first-generation estimates, second-generation panel methods were invented; these appropriately incorporate cross-sectional dependence into the modelling. They include methods like CCEMG, CS-ARDL, CUP-FM and so on.
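The usual diagnostic for deciding between the two generations is the Pesaran (2004) CD test for cross-sectional dependence; a hand-rolled sketch in Python (numpy/scipy), with simulated residuals so both outcomes can be seen:

```python
# Pesaran (2004) CD statistic for cross-sectional dependence in panel
# residuals. Under H0 (independence) CD is asymptotically N(0, 1).
# Hand-rolled sketch; resid is a T x N array (T periods, N units).
import numpy as np
from scipy import stats

def pesaran_cd(resid):
    T, N = resid.shape
    corr = np.corrcoef(resid, rowvar=False)        # N x N pairwise correlations
    iu = np.triu_indices(N, k=1)                   # the N(N-1)/2 upper pairs
    cd = np.sqrt(2 * T / (N * (N - 1))) * corr[iu].sum()
    pvalue = 2 * (1 - stats.norm.cdf(abs(cd)))
    return cd, pvalue

rng = np.random.default_rng(7)
independent = rng.normal(size=(50, 10))            # no cross-sectional dependence
common = rng.normal(size=(50, 1))                  # shared factor across units
dependent = independent + 2 * common               # strong dependence

for name, r in [("independent", independent), ("dependent", dependent)]:
    cd, p = pesaran_cd(r)
    print(f"{name}: CD = {cd:.2f}, p = {p:.4f}")
```

If the test rejects, second-generation estimators (CCEMG, CS-ARDL, etc.) are the appropriate choice.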
Question
I am currently trying to estimate the effect of energy crises on food prices. Given the link between energy and food prices, I am inclined to reason that ECM will be best to estimate the relationship between food price and energy price (fuel price). Additionally I would like to include dummy variables in the model to estimate the effects of periods of energy crises on food prices. This I know is simple to do.
Where I am confused is how to model price volatility in the context of an ECM. I am only interested in the direction where the fuel price, as well as the structural dummies for energy crises, influences not just the determination of the food price, but its volatility as well.
Hello, I hope you are doing well. It seems to me that in this case you can use the NARDL model to check the asymmetric impact of price volatility besides the ECM. Moreover, to check the causal direction you can apply the Toda-Yamamoto test.
Good luck
Question
Can anyone help me to carry out mean group analysis and pooled mean group analysis. I have used Microfit and Eviews before. Appreciate if I can get some advice on how to use these panel data methods in Microfit, Eviews and STATA.
Does anyone know if you can run the MG estimator in Eviews? (Hausman test rejects null hypothesis therefore PMG is not efficient, and i need to use the MG estimator.)
Question
Hello,
I am estimating a bivariate probit model, where the errors of the two probit equations are correlated and therefore not independent. However, I suspect that one of the explanatory variables of both models may also cause endogeneity problems. My question is whether there is a perhaps two-stage procedure to correct this situation? Instrumental variables maybe? Could you suggest literature on this problem?
I should know best what you are estimating, but, in principle, age is one of the few exogenous variables. The endogeneity/exogeneity of a variable is something that can mainly be assessed rhetorically. At best, and I suppose this is what you are doing, you can compare IV with OLS through a Hausman test (some people call that an endogeneity test), but it is subject to the assumption that the IV is valid.
Since this relationship has not been estimated before, I would suggest performing the analysis without IVs. Then, if possible, at most, try to find an IV showing similar results to defend your analysis with a robustness check.
Best,
José-Ignacio
Question
Dear All,
I would like to perform event study analysis through website: https://www.eventstudytools.com/.
Unfortunately, they ask for the data to be uploaded in a format I don't understand; I don't know how to put the data in this form, and I cannot find a user manual or an email address to contact them.
Can anyone kindly advise how to use this service and explain it in a plain easy way?
Ahmed Samy
Question
Dear All,
I'm conducting an event study for a sample of 25 firms that each went through a certain yearly event (inclusion in an index).
(The 25 firms (events) are collected from the last 5 years.)
I'm using daily abnormal returns (AR), and I consolidated the daily returns across the 25 firms to get daily "Average Abnormal Returns" (AAR).
Estimation Window (before the event)= 119 days
Event Window = 30 days
1- I tested the significance of the daily AARs through a t-test and corresponding p-value. How can I calculate the statistical power for those daily p-values?
(significance level used = 0.05, 2-tailed)
2- I calculated the "Cumulative Average Abnormal Returns" (CAAR) for some period in the event window and performed a significance test on it via a t-test and corresponding p-value. How can I calculate the statistical power of this CAAR significance test?
(significance level used = 0.05, 2-tailed)
Thank you for your help and guidance.
Ahmed Samy
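One way to approach the power question, sketched in Python with statsmodels (no software was specified in the question): compute the power of a one-sample t-test given a standardized effect size d = AAR / sd(AR) and n = 25 firms. The effect size below is purely illustrative; you would plug in your observed values.

```python
# Post-hoc power of a two-sided one-sample t-test, as used for daily AARs.
# effect_size is hypothetical (AAR divided by the cross-firm sd of ARs).
from statsmodels.stats.power import TTestPower

effect_size = 0.6      # illustrative standardized effect, not from real data
n_firms = 25
power = TTestPower().power(effect_size=effect_size, nobs=n_firms,
                           alpha=0.05, alternative="two-sided")
print(f"power = {power:.3f}")
```

The same calculation applies to the CAAR test, with the effect size computed from the CAAR and its standard deviation over the cumulation window.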
Question
The original series is nonstationary as it has a clear increasing trend and its ACF plot gradually dampens. To make the series stationary, what optimum order of differencing (d) is needed?
Furthermore, if the ACF and PACF plots of the differenced series do not cut off after a definite value of lags but have peaks at certain intermittent lags. How to choose the optimum values of 'p' and 'q' in such a case?
You can use auto.arima function from 'forecast' package for R.
Alternatively, if you have many observations, you can try out-of-sample comparison of alternative models with different values of d.
To compare alternative models, you can use the instructions described here:
Question
I have seen that some researchers just compare the difference in R2 in two models: one in which the variables of interest are included and one in which they are excluded. However, in my case, I have that this difference is small (0.05). Is there any method by which I can be sure (or at least have some support for the argument that) this change is not just due to luck or noise?
A partial F-test will be useful here. After the 1st variable is in, you add the other variables ONE at a time. After the 2nd variable is added, you have your y-variable as a function of 2 variables, giving a model with 2 d.f. and a certain regression sum of squares (SS). From the 2-variable regression SS subtract the 1-variable regression SS; that change in SS has a cost of 1 d.f., so the extra-variable SS divided by 1 is the change in regression mean squares (regMS). Next, divide the 2-variable residual SS by the 2-variable residual d.f. to get the residual mean squares (resMS). Now divide the change in regMS by resMS to get the partial F-value, and look up tables for the probability of that partial F-value. If it is significant, keep the 2nd variable in and do the same for any further independent variable you may want to add to your model. (For reference, adjusted R-squared in percent = 100 * [1 - (residualSS/residualDF)/(totalSS/totalDF)].)
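The same partial F-test can be run in one line with statsmodels' anova_lm on two nested fits; a sketch with simulated data in which the added variable genuinely matters:

```python
# Partial F-test for adding x2, via ANOVA comparison of nested OLS models.
# Simulated data: x2 has a real effect (coefficient 0.8).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(9)
n = 100
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df.x1 + 0.8 * df.x2 + rng.normal(size=n)

small = smf.ols("y ~ x1", df).fit()
large = smf.ols("y ~ x1 + x2", df).fit()
table = anova_lm(small, large)          # partial F-test for the added variable
print(table)
print("partial F p-value:", table["Pr(>F)"].iloc[1])
```

A significant partial F says the R-squared gain is larger than chance would produce, which directly answers the "luck or noise" worry in the question.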
Question
To illustrate my point I present you an hypothetical case with the following equation:
wage = C + 0.5*education + 0.3*rural
Where the variable "education" measures the number of years of education a person has and rural area is a dummy variable that takes the value of 1 if the person lives in the rural area and 0 if she lives in the urban area.
In this situation (and assuming no other relevant factors affecting wage), my questions are:
1) Is the 0.5 coefficient on education reflecting the difference between (1) the mean marginal return of an extra year of education on the wage of an urban worker and (2) the mean marginal return of an extra year of education for a rural worker?
a) If my reasoning is wrong, what is the intuition behind the mechanism of "holding constant"?
2) Mathematically, how is that just adding the rural variable works on "holding constant" the effect of living in a rural area on the relationship between education and wage?
"Holding constant" means assuming that the other variables do not change, in order to allow an evaluation of the partial variation in the dependent variable due to variation in the single independent variable of interest.
Question
I am trying to learn the use of the augmented ARDL, but I did not find the command for the augmented ARDL in Stata. Can anyone please refer me to user-written code for the augmented ARDL? Is there any good paper that describes the difference between the ARDL bounds test and the augmented ARDL procedure? I would be happy if you could answer these questions.
In this Augmented ARDL, I find there are three test to get confirmation for the long run cointegration; e.g, overall F test, t test on lagged dependent variable, F test on lagged independent variable.
1. How to find/calculate t-statistics for the lagged dependent variable?
2. How to find/calculate F-statistics for the lagged independent variable?
Using STATA, I find that the bound test produces two test statistics: F statistics and t-statistics. But both of them are for examining overall test for cointegration. How could I find t-statistics for lagged dependent variable and F statistics for lagged independent variable?
Thank you.
There is not much difference between the two at the level of estimation. At the level of testing, however, the augmented ARDL takes things further. For the standard ARDL bounds test, the dependent variable is required to be I(1). The augmented ARDL overcomes this through a number of additional tests that sort things out properly in order to avoid the degenerate cases.
EViews does not directly provide the testing procedure for the augmented ARDL. However, there is an add-in called NARDL on the EViews page, written by yours truly. After estimating your model the way you normally would, use "Make Testable Form" to generate an OLS version of the results. All the tests you need can then be carried out directly.
Note that a new set of critical values has been suggested. You need to use those critical values and NOT the ones generated by EViews.
Hope that helps.
In view of this question, I will soon add an example on my page.
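For readers without EViews or Stata, the three checks can be sketched by hand on an OLS unrestricted ECM in Python (statsmodels). The data are simulated and cointegrated by construction, and the resulting statistics must still be compared against the augmented-ARDL critical values, not ordinary F/t tables:

```python
# Hand-built unrestricted ECM (one lag, for simplicity) with the three
# cointegration checks: (i) overall F on all lagged levels, (ii) t on the
# lagged dependent level, (iii) F on the lagged independent level.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 200
x = np.cumsum(rng.normal(size=n))       # I(1) regressor
y = 0.8 * x + rng.normal(size=n)        # cointegrated with x by construction

df = pd.DataFrame({"dy": np.diff(y), "ylag": y[:-1],
                   "xlag": x[:-1], "dx": np.diff(x)})
uecm = smf.ols("dy ~ ylag + xlag + dx", df).fit()

overall = uecm.f_test("ylag = 0, xlag = 0")     # (i) overall bounds F
overall_F = float(np.squeeze(overall.fvalue))
t_dep = uecm.tvalues["ylag"]                    # (ii) t on lagged dependent
indep = uecm.f_test("xlag = 0")                 # (iii) F on lagged independent
F_indep = float(np.squeeze(indep.fvalue))
print(f"overall F = {overall_F:.1f}, t(ylag) = {t_dep:.2f}, "
      f"F(xlag) = {F_indep:.1f}")
```

A strongly negative t on the lagged dependent level, together with significant overall and independent-level F statistics, is the pattern that rules out the degenerate cases described above.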
Question
The paper, on which I am working, is a multivariate study. I am planning to use this model as it has two advantages:
1. It tests the stability of the long-term relationship across quantiles and provides a more flexible econometric framework.
2. It can explain the possible asymmetry in the response on one variable to changes in another variable.
Because of these two reasons, I am preferring it above NARDL.
As I am not good in STATA coding, therefore, any help regarding coding this method is highly appreciated.
Stata is not appropriate for QARDL; you can use GAUSS, R or MATLAB. Give me your email and I will send you the GAUSS code.
Question
Dear All,
I’m conducting and event study for inclusion of companies in a certain index.
The event is the “inclusion event” for companies in this index for last 5 years.
For the events, we have yearly Announcement date (AD) for inclusions, and also effective Change Dates (CD) for the inclusion in the index.
Within the same year, I have aligned all companies together on the AD as day 0, and since they are companies from the same year, the CD will also align for all of them.
The problem comes when I try to aggregate companies from different years together: although I aligned them all to have the same AD, the CD differs from one year to another, so the CDs don't align for companies from different years.
How can I overcome this misalignment of CDs from different years, so that I am able to aggregate all the companies together?
Many Thanks.
Dear Prof. Raymond,
My aim is to see what happens to returns of stocks when they join a certain stock exchange index, do they generate abnormal returns?
I’m trying to study that for the last 3 years.
So the event is "joining the index", which happens with 2 dates: (1) the announcement date (AD), on which the stock exchange announces the news that those stocks will join the index, and (2) the change date (CD), which is the date the stocks are really included in the index; this CD is decided and announced by the stock exchange on the AD.
I have attached a similar work for your kind reference.
Question
Dears,
I'm conducting an event study on the effect of a news announcement at a certain date on stock returns.
Using the market model to estimate the expected stock return in the estimation window, we need to regress the returns of the stock under study on the returns of a market portfolio index.
1- How do we decide on the choice of this market portfolio index for the regression?
Is it just the main index of the market?
The index of the sector to which the stock under study belongs? Etc.?
2- Is it necessary that the stock under study be among the constituents of this market index?