# Econometric Applications - Science topic

A group for discussion and experience sharing on applied econometrics issues.
Questions related to Econometric Applications
Question
What is the difference between mediating and moderating variables in panel data regression?
I recommend that you use the free statistical program JAMOVI which allows the inclusion of multiple moderators and mediators.
Question
There are several independent variables and several dependent variables. I want to see how those independent variables affect the dependent variables. In other words, I want to analyze:
[y1, y2, y3] = [a1, a2, a3] + [b1, b2, b3]*x1 + [c1, c2, c3]*x2 + [e1, e2, e3]
The main problem is that y1, y2, y3 are correlated: an increase in y1 may lead to decreases in y2 and y3. In this situation, what multivariate multiple regression models can I use, and what are the assumptions of those models?
Hello Jialing,
The fact that DVs are correlated is often one argument for choosing a multivariate method to analyze the data rather than generating a univariate model for each individual DV.
If what you are saying is that, causally or temporally, Y1 influences Y2, then perhaps you'd be better off evaluating a path model which incorporates such proposed relationships, rather than simply allowing for nonzero correlations/covariances among the DV pairs in the multivariate regression.
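To make the multivariate-regression route concrete, here is a minimal numpy sketch on simulated data (all numbers are hypothetical). It shows that the coefficient estimates from multivariate OLS coincide with equation-by-equation OLS; the correlation among the DVs enters through the residual covariance matrix, which is what MANOVA-style tests or SUR/path models then exploit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])            # intercept + two regressors

# Errors correlated across equations: y1 shocks move opposite to y2, y3 shocks
Sigma = np.array([[1.0, -0.5, -0.3],
                  [-0.5, 1.0, 0.2],
                  [-0.3, 0.2, 1.0]])
E = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

B = np.array([[1.0, 2.0, 0.5],                       # a1, a2, a3 (intercepts)
              [0.5, -1.0, 0.3],                      # b1, b2, b3 (x1 slopes)
              [2.0, 0.1, -0.7]])                     # c1, c2, c3 (x2 slopes)
Y = X @ B + E                                        # columns are y1, y2, y3

# Multivariate OLS: identical coefficients to three separate OLS fits
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - X.shape[1])       # correlated DV residuals
```

If the Y1→Y2 channel is truly structural, a path/SEM model would replace the free residual covariance with explicit directed links between the DVs.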
Question
Hi all, I'm doing my FYP on the determinants of healthcare expenditure in 2011-2020. Here are my variables: government financing, GDP, total population.
The first model is: healthcare expenditure_t = B0 + B1*gov financing_t + B2*gdp_t + B3*population_t + e_t
The second, causal-relationship model is: healthcare expenditure per capita_t = B0 + B1*gdp per capita_t + e_t
Is it possible to use a unit root test and then ARDL for the first model, and what test can I use for the second model?
You should read the Global Burden of Disease articles on health care financing. Their appendices fully disclose their variables (they've spent a lot of time and effort determining the best ones), and they provide their (sophisticated) statistical code. The lead author on most of that work is Joseph Dieleman; the group byline is the Global Burden of Disease Health Financing Collaborator Network. A full bibliography of this work is at https://www.healthdata.org/about/joseph-dieleman if you click on the publications tab. The pubs are all open-access and most are on ResearchGate. :-)
Question
In the file that I attached below there is a line above the theta(1) coefficient and another one exactly below C(9). In addition, what is the number below C(9)? There is no description.
I asked that question to the person who coded that package, and he said the C(9) coefficient does not have any meaning here; just ignore it. It comes up because the package was written for an old version of Eviews and has not been updated.
Question
Suppose, one of the regressors is an index number and the index ranges from 1-15, where the higher value indicates greater flexibility and the lower value indicates greater rigidity. If the coefficient is negative, then what will be the interpretation of this coefficient? Does it mean moving to greater rigidity or greater flexibility is better?
An index number is a measure of the average (overall) change in a group of related variables.
You first need to be clear about what the coefficient on the index represents. With your coding, a higher index value means greater flexibility, so a negative coefficient implies that moving toward greater flexibility is associated with a lower value of the dependent variable — in other words, greater rigidity goes with a higher outcome, other things equal.
Question
Dear All,
I’m conducting an event study for the yearly inclusion and exclusion of some stocks (from different industry sectors) in an index.
I need to calculate the abnormal return for each stock upon its inclusion in or exclusion from the index.
I have some questions:
1- How do I decide on the length of the backward period to use for the “estimation window”, and how do I justify it?
2- Stock return is calculated by:
(price today – price yesterday)/(price yesterday)
OR
LN(price today/price yesterday)?
I see both ways are used, although they give different results.
Can any of them be used to calculate CAR?
3- When calculating the abnormal return as the difference between the stock return and a benchmark (market) return, should the benchmark be the index itself (to which stocks are added or from which they are excluded), or the sector index related to the stock?
Hi there, I am using Eventus software and I am wondering how the software computes the Market index in order to calculate abnormal returns?
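On question 2 above: simple and log returns differ slightly observation by observation, but both recover the same overall price change, and log returns are convenient for CARs because they sum exactly across days. A small numeric illustration (prices are made up):

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0])   # made-up daily closes

simple = prices[1:] / prices[:-1] - 1        # (P_t - P_{t-1}) / P_{t-1}
logret = np.log(prices[1:] / prices[:-1])    # ln(P_t / P_{t-1})

# Log returns add up exactly to the holding-period log return ...
total_log = logret.sum()
# ... while simple returns compound multiplicatively:
total_simple = np.prod(1 + simple) - 1

# Both describe the same overall price change: exp(total_log) - 1 == total_simple
```

Either convention can be used for CARs as long as benchmark returns are computed the same way; mixing the two within one study is what produces inconsistencies.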
Question
I collected 109 responses on 60 indicators to measure the status of urban sustainability as a pilot study. As far as I know, I cannot run EFA because each indicator requires at least 5 responses, but I do not know whether I can run PCA with this limited number of responses. Could you please advise on the applicability of PCA or any other possible analysis?
I would recommend you read about the difference between EFA and PCA first. Whether or not you should run an EFA has nothing to do with the number of response options on the indicators, five or otherwise. In general, EFA is preferable to PCA as it is considered the 'real' factor analysis. There are many threads on RG on this issue.
Best
Marcel
Question
I have heard some academics argue that the t-test can only be used for hypothesis testing — that it is too weak a tool to analyse a specific objective in academic research. For example, is the t-test an appropriate analytical tool to determine the effect of credit on farm output?
It depends on your objective statement: if your objective is to compare variables that influence a particular problem, you can use the t-test to compare them and then give a justification.
Question
Dear everyone,
I am in great distress and desperately need your advice. I have the cumulated (disaggregated) data of a survey of an industry (total exports, total labour costs, etc.) covering 380 firms. The original paper uses a two-stage least squares (TSLS) model in order to analyze several industries, because one independent variable has a relationship with the dependent variable — which, according to the author, was the limitation that ruled out OLS. However, I want to conduct a single-industry analysis and exclude the variable with that relationship, but instead analyze the model over 3 years. What is the best econometric model to use? Can I use an OLS regression over a period of 3 years? If yes, what tests are applicable then?
Thank you so much for your help, you are helping me out so much !!!!!!!
Dear, the choice of any standard model depends on an important factor, namely the number of observations included in the model. For example, if the number of observations is small, the Phillips-Perron test can be conducted to test for stationarity, and if the number of observations is large, the Dickey-Fuller test can be conducted; in light of the stationarity results, we can determine which model can be run, Julius Hogan.
Question
Hello everyone,
I would like to analyze the effect of innovation in one industry over a time period of 10 years. The dependent variable is exports and the independent variables are R&D and labour costs.
What is the best model to use? I am planning to do a log-linear model.
Thank you very much for your greatly needed help!
Before deciding on the econometric model, you should go through the stationarity test (ADF test). If the data are stationary, OLS Regression with a log-linear model would be fine. But, if not, you may go for VAR or ARDL. You should also check the robustness of the model by going through residual tests such as Autocorrelation LM Test.
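To illustrate the stationarity check suggested above, here is a bare-bones Dickey-Fuller regression in numpy on simulated series. This is only a sketch of the idea; in practice you would use a packaged ADF test (e.g. `adfuller` in statsmodels), which handles lag augmentation and p-values:

```python
import numpy as np

def df_tstat(y):
    """Bare-bones Dickey-Fuller regression: dy_t = a + rho*y_{t-1} + e_t.
    Returns the t-statistic on rho (compare with DF critical values,
    roughly -2.9 at the 5% level with a constant)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=500))        # unit root: fail to reject
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()  # stationary AR(1): reject

t_walk, t_ar1 = df_tstat(walk), df_tstat(ar1)
```

A strongly negative t-statistic (below the DF critical value) rejects the unit root, pointing toward OLS on levels; otherwise the ARDL/VAR route discussed above applies.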
Question
I am trying to measure the existence of a dynamic relationship between green innovation and the implementation of environmental regulation. According to the literature, green innovation has a dynamic impact, but I found that the lag term of green patent counts is not statistically significant. Is there any other test to confirm the dynamic relationship? Do you have any suggestions?
Regarding model specification, I think GMM is a good option here, given that N>T.
Question
Dear colleagues,
I am planning to investigate a panel data set containing three countries and 10 variables. What concerns me is that the time frame is a bit short (2011-2020 for each country). What should the sample size be in this case? Can I apply fixed effects, random effects, or pooled OLS?
Thank you for your responses beforehand.
Best
Ibrahim
It seems a very small sample for microeconometric techniques. Having 27 observations and 10 covariates, at most you will have 27 - 1 - 10 = 16 degrees of freedom. This is pretty low. If I had to decide whether to pursue a project based on that, I would try to avoid it.
It is really closer to multiple time series than panel data. Have a look at this link:
Question
Dear Colleagues,
I ran an Error Correction Model, obtaining the results depicted below. The model comes from the literature, where Dutch disease effects were tested in the case of Russia. My dependent variable was the real effective exchange rate, while oil prices (OIL_Prices), terms of trade (TOT), public deficit (GOV), and industrial productivity (PR) were independent variables. My main concern is that only the error correction term, the dummy variable, and the intercept are statistically significant. Moreover, the residuals are not normally distributed and are heteroscedastic. There is no serial correlation issue according to the LM test. How can I improve my findings? Thank you beforehand.
Best
Ibrahim
I notice the following about your specification. (1) Your inclusion of a constant (and its subsequent significance) means you allow for (and find) a trend in the real exchange rate independent of any trends in the other variables. Is that economically reasonable? (2) I assume the CRISIS variable is a zero-one dummy for time periods with a "crisis" of some sort. Apparently it is not in the cointegration vector. Why not? If it were, then I'd expect to find CRISIS differences in the error correction equation. Instead, you have it in levels. Thus you specify that a temporary crisis has a permanent effect on the level of the real exchange rate, independent of the other variables. Is that what you intend? (3) You do not include the lagged difference of the real exchange rate in the error correction equation. Why not? Normally it would be there.
Question
Dear colleagues,
I applied the Granger Causality test in my paper and the reviewer wrote me the following: the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...
How can I respond in this case?
P.S. I had a small sample size and serious data limitation.
Best
Ibrahim
Ibrahim Niftiyev, the reviewer probably wants to see not only whether one variable affects the other (i.e. the results of the Granger causality tests), but also to what extent — the magnitude and timing of the dynamic relationship, something you can obtain from the IRFs of a VAR model. If you want to apply a VAR but you have a small sample size/data limitations, you may want to consider a Bayesian VAR. Bayesian VARs are very popular and Bayesian methods are valid in small samples.
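For intuition, here is a minimal numpy sketch of a frequentist VAR(1) estimated equation-by-equation on simulated data, with naive impulse responses read off powers of the estimated coefficient matrix (no shock orthogonalization; a Bayesian VAR would add priors on the coefficients rather than change this mechanical core):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])                   # hypothetical stable VAR(1) dynamics
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A @ Y[t - 1] + rng.normal(size=2)

# Equation-by-equation OLS estimation of the VAR(1)
X = np.column_stack([np.ones(n - 1), Y[:-1]])
B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
A_hat = B[1:].T                              # drop the intercept row

# Naive impulse responses to a unit shock in variable 1, horizons 0..9
irf = [np.linalg.matrix_power(A_hat, h)[:, 0] for h in range(10)]
```

The IRF path is exactly the "magnitude and temporality" a reviewer typically wants to see alongside Granger tests.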
Question
In a regression, I'm using household income and household-specific expenditures as independent variables, both with a natural logarithmic transformation, and I want to control for household size. I find in the literature that in such cases the natural logarithm of this last variable is used, but I don't get the logic. If I'm not wrong, household size is the number of people living in the household, so I find the interpretation would be very strange: a 1% increase in the number of people leads to an x% change in Y?
Very nicely explained Renaud Di Francesco Thank you.
Question
Hello everyone. I am using a VECM and I want to use variance decomposition, but as you know, variance decomposition is very sensitive to the ordering of the variables. I read in some papers that it is better to use generalized variance decomposition because it is invariant to the ordering of the variables. I am using Stata, R, or Eviews, and the problem is how to perform generalized variance decomposition; if anyone knows, please help me.
Question
I am running an ARDL model on eviews and I need to know the following if anyone could help!
1. Is the optimal number of lags for annual data (30 observations) 1 or 2, or should a VAR be applied to determine the optimal number of lags?
2. When we apply the VAR, the maximum number of lags applicable was 5 (beyond 5 we got a singular-matrix error), but the problem is that as we increase the maximum number of lags, the "optimal" number of lags increases too (when we allow 2 lags, we get 2 as optimal; when we allow 5, we get 5 as optimal). So what should be done?
1. My first comment is that all cointegration studies must be based on the economic theory (and common sense) of the system that you are examining. Your theory should suggest which variables are stationary, which are non-stationary, and which are cointegrated. Your ARDL, VECM, etc. analyses are then tests of the fit of the data to your theories. It is not appropriate to use these methodologies to search for theories that fit the data; such searches will give spurious results. Now suppose that you have outlined your theory in advance of touching your computer keyboard to do your econometrics.
2. You have only 30 annual observations. This is on the small size for an elaborate analysis such as this. It appears that you have one dependent variable and possibly 3 explanatory variables. If you have 5 lags you are estimating about 25 coefficients which is not feasible with 30 observations.
3. If you wish to use the ARDL methodology you must be satisfied that (1) there is only one cointegrating relationship and (2) that the explanatory variables are (weakly) exogenous. Otherwise, a VECM approach is required and you may also not have enough data for a VECM analysis.
4. Is it possible that you would use a simpler approach? Could you use graphs or a simpler approach to illustrate your economic theory? These are questions that you alone can answer. Advanced econometrics is not a cure for inadequate data and a deficit of economic theory.
5. If this is an academic project, consult your supervisor. While what I have set out above is correct, it may not be what your supervisor expects at this stage in your studies.
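Point 2 above can be made concrete with a rough parameter count. Conventions differ slightly across packages; this sketch counts an intercept, p lags of the dependent variable, and q + 1 terms (current value plus q lags) for each regressor:

```python
# Rough parameter count for an ARDL(p, q) model with k regressors:
# intercept + p lags of the dependent variable + (q + 1) terms per regressor.
def ardl_param_count(p, q, k):
    return 1 + p + k * (q + 1)

n_obs = 30                              # annual data, as in the question
n_params = ardl_param_count(5, 5, 3)    # 5 lags everywhere, 3 regressors -> 24
dof = (n_obs - 5) - n_params            # 5 observations lost to lags -> 1 d.f. left
```

With effectively one degree of freedom left, a 5-lag specification is simply not estimable in any meaningful sense on 30 annual observations.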
Question
Dear colleagues,
I am looking for the package or command which performs time series decomposition in Stata. So far I have not found anything. An example can be found here: https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b at figure 5.
Dear Mousumi Bhattacharya , thank you for your help!!
Dear Kehinde Mary Bello and Olumide Olaoye, Mousumi Bhattacharya has helped me with unobserved-components models, which, as far as I understand, help to decompose a series into trend and cycle components. However, if you know any other options for positive and negative shock decomposition (apart from impulse-response functions), I will be happy to hear about them, especially references on their Stata implementation.
Question
I am doing a study on the relationship between religiosity and visits to religious healers. Before adding the interaction terms to the regression, the multilevel model showed religiosity as significant, but after I added the religiosity variable (i.e., frequency of prayers) as an interaction term with control variables such as education, gender, and urbanity, the original frequency-of-prayers variable turned insignificant.
- What am I doing wrong here?
- Is it because I added too many interactions with it?
- Should I use another religiosity indicator (e.g., frequency of reading the Bible) along with frequency of prayers as an interaction term with education, gender, and urbanity?
It would be really helpful if someone could tell me what to do in this situation.
Hello Md Rakib Hossain,
In an ordinary linear regression, if including an interaction (say, X*M) in a model that previously had only the individual variables (X, M) as predictors leaves the interaction significant in the expanded model while the individual predictor(s) are no longer significant, then to me this sounds like a classic moderation effect.
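There is also a more benign, purely mechanical explanation worth ruling out: once X*M is in the model, the coefficient on X is its slope where M = 0, which may lie far outside the data, so it can look "insignificant" without any substantive change. Mean-centering the variables before forming the interaction restores an interpretable main effect. A simulated sketch (all variables and coefficients here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500
prayer = rng.normal(5, 2, size=n)       # hypothetical religiosity measure
educ = rng.normal(12, 3, size=n)        # hypothetical years of education
y = 1 + 0.4 * prayer + 0.2 * educ + 0.05 * prayer * educ + rng.normal(size=n)

def ols(cols):
    X = np.column_stack([np.ones(n)] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Raw interaction: the prayer "main effect" is its slope where educ == 0,
# an extrapolation outside the observed data, so it can look small.
b_raw = ols([prayer, educ, prayer * educ])

# Mean-centred interaction: the prayer coefficient becomes its slope at
# AVERAGE education, usually the quantity of substantive interest.
pc, ec = prayer - prayer.mean(), educ - educ.mean()
b_centred = ols([pc, ec, pc * ec])
```

Here the centred prayer coefficient recovers 0.4 + 0.05·mean(educ), the slope at average education, while the raw one only reflects the (extrapolated) slope at zero education.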
Question
Dear Colleagues,
I estimated the OLS models and ran several diagnostic tests; however, instability in the CUSUMSQ persists, as shown in the attached photo. What should I do in this case?
Best
Ibrahim
I presume that your data are quarterly or monthly, as otherwise you have too few observations to make any reasonable inferences.
If you are trying to make causal inferences (e.g. you have an economic model that implies that x causes y and you wish to measure that effect), the CUSUMSQ is one test that indicates that your model is not stable: either the coefficients or the variance of the residuals is not stable. You have indicated that there is no heteroskedasticity, so it is possible that the model coefficients are the problem. The test itself only indicates that there is instability; it does not say what the instability is or what causes it. There are many possible causes of instability (omitted variables, functional form, heteroskedasticity, autocorrelation, varying coefficients, etc.). Your best procedure is to return to your economics and work out how your theory might lead to stability problems. Are there possible breaks in your data caused by policy changes, strikes, technological innovations, and the like, that might be captured with a dummy variable or a step dummy?
If you are doing forecasting (or projections), I would not be too concerned about specification tests, though it is very unlikely that an unstable model will forecast well. You may achieve good forecasting results with a very simple model that need not be fully theory-compliant.
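For reference, the CUSUM-of-squares path itself is straightforward to compute from recursive residuals (Brown-Durbin-Evans). Below is a numpy sketch on simulated data from a stable model; packaged implementations (e.g. the stability diagnostics in Eviews) additionally draw the significance bands around the diagonal:

```python
import numpy as np

def cusumsq(y, X):
    """CUSUM-of-squares path from recursive residuals (Brown-Durbin-Evans).
    Returns the cumulative share of squared recursive residuals; under a
    stable model the path stays close to the diagonal from 0 to 1."""
    n, k = X.shape
    w = []
    for t in range(k + 1, n + 1):
        Xt, yt = X[: t - 1], y[: t - 1]
        b = np.linalg.lstsq(Xt, yt, rcond=None)[0]        # fit on data up to t-1
        x_new = X[t - 1]
        f = 1 + x_new @ np.linalg.inv(Xt.T @ Xt) @ x_new  # forecast-variance factor
        w.append((y[t - 1] - x_new @ b) / np.sqrt(f))     # recursive residual
    w = np.array(w)
    return np.cumsum(w**2) / np.sum(w**2)

rng = np.random.default_rng(2)
n = 120
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(size=n)      # stable relationship
path = cusumsq(y, X)
```

A path that drifts outside the bands, as in the attached photo, is the signal of instability discussed above; the plot itself does not tell you whether a break, an omitted variable, or a wrong functional form is responsible.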
Question
Dear Colleagues,
I noticed that when I estimate an equation by least squares in Eviews, under the Options tab there is a tick mark for degrees-of-freedom (d.f.) adjustment. What are its importance and role? When I estimate the equation without the d.f. adjustment, I get two statistically significant coefficients out of five explanatory variables; however, when I estimate with the d.f. adjustment, I do not get any significant results.
Thank you beforehand.
Are you attempting prediction, or are you trying to estimate some form of "causal" relationship? If you are estimating a "causal" model, your conclusions are conditional on the model estimated. Strictly speaking, it would be better to use the adjusted degrees of freedom, particularly with your small sample. In this case, a non-significant coefficient does not necessarily imply that the coefficient is truly zero; it is more likely that your sample is too small to establish a significant result, and its p-value must not be very far from your significance level. If the estimate has the sign and magnitude expected by theory, I would accept the value and report the p-value. Estimating 5 coefficients is a big demand on a sample of 23 observations.
If you are simply doing prediction or forecasting and are not attributing explanatory power to your coefficients, you might be better off with a simpler model, which might have better predictive ability.
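The mechanics are easy to see: without the d.f. adjustment the residual variance is estimated as RSS/n, with it as RSS/(n − k), so every standard error grows by √(n/(n − k)) and every t-statistic shrinks by the same factor — a large effect when n = 23 and k = 6. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 23, 6                                   # 23 observations, intercept + 5 regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
y = X @ np.array([1.0, 0.5, 0.0, 0.3, 0.0, -0.4]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ b) ** 2)

s2_unadj = rss / n                             # "no d.f. adjustment"
s2_adj = rss / (n - k)                         # the usual unbiased estimator

# Standard errors scale with sqrt(s2), so they grow by this factor:
ratio = np.sqrt(s2_adj / s2_unadj)             # = sqrt(23/17), about 1.16
# t-statistics shrink by the same factor, so borderline coefficients can
# lose "significance" -- exactly the behaviour described in the question.
```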
Question
Dear Colleagues,
If I have 10 variables in my dataset (time series), of which 9 are explanatory and 1 is dependent, and I establish that all the variables are non-stationary, should I take the first difference of the dependent variable as well?
Best
Ibrahim
Econometric models estimated with non-stationary data can be profoundly invalid and misleading (Greene, 2002). A simple example: in a regression with one regressor, there are three variables that could be stationary or non-stationary, namely the dependent variable (Y), the regressor (X), and the disturbance term (u). A suitable econometric treatment of such a model depends critically on the pattern of stationarity and non-stationarity of these three variables (including the dependent variable). Since variables quite often fail to be I(0), it is important to understand the forces behind such non-stationarity, which mainly include structural breaks, deterministic trends, and stochastic trends. Differencing (including the explained variable, as in your case) is a common treatment for nonstationary models, and it is often the correct one (Granger & Newbold, 1974; Greene, 2002; Stock & Watson, 2011).
Granger, C. W. J., and Paul Newbold. 1974. Spurious Regressions in Econometrics. Journal of Econometrics, 2(2):111-120.
Greene, W. 2002. Time Series Models. (pp. 608-662) In Econometric Analysis, 5th edition. Prentice Hall, Upper Saddle River, NJ.
Stock, James H., and Mark Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Pearson Education/Addison Wesley.
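Granger and Newbold's spurious-regression point, and the effect of differencing both sides, can be reproduced in a few lines on two simulated, independent random walks:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
y = np.cumsum(rng.normal(size=n))     # two INDEPENDENT random walks
x = np.cumsum(rng.normal(size=n))

def r2(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_levels = r2(y, x)                   # often spuriously "large" despite independence
r2_diffs = r2(np.diff(y), np.diff(x))  # near zero, as it should be
```

Differencing both the dependent and the explanatory variable removes the stochastic trends that drive the spurious fit; the caveat, as the cointegration literature stresses, is that pure differencing discards any long-run (cointegrating) relationship if one exists.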
Question
Some work was done for the year 1975-76, but how can one get the latest tables for input-output analysis in Pakistan?
You can access IO tables for Pakistan for the period 2010-2017 from the following link:
Question
Dear colleagues,
I am applying PCA to political and institutional variables to create an index and use it in a regression analysis as a dependent variable. However, the variables which will form the main components have different measurements. For example, while control of corruption ranges between -2.5 (weaker) and 2.5 (stronger), freedom of the press ranges between 0 and 100, and a higher value indicates less press freedom. So, I am at a loss to understand whether this difference creates any difficulty for PCA in producing a valid index. In other words, is it a problem for PCA if one variable implies higher success as its values get higher, while the other variable shows higher success as its values get lower? What should I do in this case? Thank you beforehand.
Best
Ibrahim
Check if the bivariate scatterplots are linear or at least monotonically increasing or decreasing.
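Mixed scales and reversed directions are also easy to handle before running the PCA: flip any reverse-coded variable so that higher always means "better", then standardise (PCA on standardised data is PCA on the correlation matrix, so the 0-100 scale can no longer dominate the -2.5 to 2.5 scale). A numpy sketch with two simulated indicators driven by a common latent quality:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
quality = rng.normal(size=n)                                  # latent "institutional quality"
corr_ctrl = 2.5 * np.tanh(quality + 0.3 * rng.normal(size=n))  # -2.5..2.5, higher = better
press = np.clip(50 - 20 * quality + 5 * rng.normal(size=n), 0, 100)  # 0..100, higher = WORSE

X = np.column_stack([corr_ctrl, press])
X[:, 1] = 100 - X[:, 1]                      # 1) recode so higher always means better
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # 2) standardise away the scale difference

# PCA via SVD of the standardised data
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]                              # first principal component = candidate index
share = s[0] ** 2 / (s ** 2).sum()           # variance explained by PC1
```

Strictly speaking, only the standardisation is essential for validity (a sign flip merely flips the corresponding loading); recoding first simply makes all loadings point the same way, which keeps the index interpretable.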
Question
Dear Colleagues,
I ran a PCA (fixing two components to reduce 7 variables) and SPSS stored the score vectors of the two components, which as far as I can see can be treated as an index (especially Fac_1, which explains more of the variation in the original variables than Fac_2). My question is as follows: can I use this index as an independent variable in simple or multivariate regression? Thank you beforehand for your help and suggestions.
Best
Ibrahim
Thank you for the response and information shared on your PCA analysis – quite helpful.
Sincere apologies if I have used the term "threshold" loosely. What I mean by "threshold" is the minimum score/loading a variable should have under a particular component — that is, the minimum loading size you have set as the criterion for regarding a variable as significant and classifying it under a particular component. For instance, the results in your Pattern Matrix show the variable FR_CULT loading at 0.774 under Component 1 and 0.414 under Component 2, so if you have set your minimum score at 0.5, you classify FR_CULT under Component 1. Using the same 0.5 criterion, the variable FR_PROF, loading at 0.645 under Component 1 and 0.621 under Component 2, has a complex structure, since it loads highly on more than one component; such variables should be eliminated — unless a different analytical procedure is applied.
To reach a conclusion on your analysis, I have a few suggestions:
(1). Could you please first compute and share the correlations between the variables? Correlations between variables should be as low as possible to avoid the potential problem of multicollinearity (Stevens, 2002).
(2). Your scree plot shows that the number of components was extracted based on eigenvalues greater than 1 — thanks. But please advise how you determined your Maximum Iterations for Convergence: by default at 25, or did you input a pre-determined number? Such information helps in understanding the complete procedure you adopted to determine the number of components in your PCA. You might read Huang and Tseng (1992) on the procedure for determining the number of components in PCA — full reference below.
(3). If your data are time series, please assess and share the distribution of each of your variables.
(4). Sample size is one of the factors that influence the accuracy of PCA procedures (Velicer, Eaton & Fava, 2000). I am NOT implying that your sample size is small, but I am not sure whether a sample of 26 relative to the 7 variables on which you ran the PCA is good enough — see Velicer, Eaton & Fava (2000), full reference below.
(5). The variable-to-component ratio (v:c) is an important feature in evaluating PCA results. In your Pattern Matrix results, apart from the complex structure shown by FR_PROF, the distribution of variables in Component 2 relative to Component 1 raises some questions. The number of variables per component (counted as the number of variables correlated with each component in the population conditions) influences the accuracy of the results, with more variables per component producing more stable results. If we exclude FR_PROF (due to its complex structure) from your results, Component 1 will contain 5 variables while Component 2 will contain only one (FR_LI).
Huang, D-Y. and Tseng, S-T. (1992). A decision procedure for determining the number of components in principal component analysis. J. Statist. Plan. Inf., 30, 63–71.
Stevens, J. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
Velicer, F. W., Eaton, C. A., & Fava, J. L. (2000). Construct Explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R. D. Goffin, & E. Helmes (Eds.), Problems and solutions in human assessment (pp. 42 -71). Boston: Kluwer Academic Publishers.
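The threshold rule described above is mechanical and easy to script. In this sketch, only the FR_CULT and FR_PROF loadings are taken from the thread; VAR_3 and VAR_4 are made-up placeholders for illustration:

```python
import numpy as np

# Loadings for FR_CULT and FR_PROF are from the pattern matrix discussed above;
# VAR_3 and VAR_4 are made-up placeholders.
names = ["FR_CULT", "FR_PROF", "VAR_3", "VAR_4"]
loadings = np.array([
    [0.774, 0.414],
    [0.645, 0.621],
    [0.812, 0.120],
    [0.210, 0.703],
])

def classify(names, loadings, threshold=0.5):
    """Assign each variable to the component(s) where |loading| >= threshold."""
    out = {}
    for name, row in zip(names, loadings):
        hits = np.flatnonzero(np.abs(row) >= threshold)
        if len(hits) == 1:
            out[name] = f"Component {hits[0] + 1}"
        elif len(hits) > 1:
            out[name] = "cross-loading"      # complex structure -> candidate for removal
        else:
            out[name] = "no salient loading"
    return out

result = classify(names, loadings)
```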
Question
Dear colleagues,
Is it OK to include 2 or 3 dummy variables in a regression equation? Or should I rotate the dummy variables across different models? The thing is, I have never come across model examples with more than 2 dummy variables in economics so far. Do you know any serious shortcomings of using more than 1 dummy variable in the same equation? Thank you beforehand.
Best
Ibrahim
Dear all,
You can use as many as you wish, but be careful about multicollinearity — in particular, avoid the dummy-variable trap of including a full set of mutually exclusive dummies together with an intercept.
Best
Question
Hi,
I am using the ARDL bounds test for my research. It indicates that there is no long-run cointegration. How can I now interpret my results? Can I use Granger causality?
Thanks.
In this case, you can check for cointegration in the presence of a structural break by using the Gregory-Hansen cointegration test.
You can also perform the Toda-Yamamoto causality test, because it can be applied without checking cointegration and is applicable to a mixture of stationary and non-stationary series.
Question
Dear colleagues,
I am capable of estimating linear relationships between X and Y variables via OLS or 2SLS (in Eviews, for example); however, I need to learn how to estimate/model non-linear relationships as well. If you know any source that explains this in simple language for time series, your recommendations are most welcome. Thank you beforehand.
Best
Ibrahim
Dear Ibrahim,
I also recommend Greg. N Gregorios and Raven Pascalau @Hamid Muili, and searching Google for more materials on non-linear relationships.
Regards
Question
I can't find the exact command to run the model.
Hello,
1. Right-click and select Open as Equation.
2. Select Cointegrating Regression.
3. Then choose among FMOLS, DOLS, and CCR.
Question
Dear All,
I would like to perform event study analysis through the website https://www.eventstudytools.com/.
Unfortunately, they ask for data to be uploaded in a format I don't understand; I don't know how to put my data in this form, and I can't find a user manual or an email address to contact them.
Can anyone kindly advise how to use this service and explain it in a plain, easy way?
Ahmed Samy
Question
Dear All,
I'm conducting an event study for a sample of 25 firms that each went through a certain yearly event (inclusion in an index).
(The 25 firms (events) are collected from the last 5 years.)
I'm using daily abnormal returns (AR), averaged across the 25 firms day by day to get daily "average abnormal returns" (AAR).
Estimation window (before the event) = 119 days
Event window = 30 days
1- I tested the significance of the daily AARs with a t-test and corresponding p-value. How can I calculate the statistical power of those daily tests?
(significance level used = 0.05, 2-tailed)
2- I calculated the "cumulative average abnormal return" (CAAR) for a period in the event window and tested its significance with a t-test and corresponding p-value. How can I calculate the statistical power of this CAAR significance test?
(significance level used = 0.05, 2-tailed)
Thank you for your help and guidance.
Ahmed Samy
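For reference, the cross-sectional t-tests described above can be sketched as follows; the abnormal returns here are simulated placeholders. (A power calculation would additionally require an assumed true effect size and the noncentral t distribution, which is why power cannot be read off the p-values alone.)

```python
import numpy as np

rng = np.random.default_rng(5)
n_firms, n_days = 25, 30
AR = rng.normal(0.001, 0.02, size=(n_firms, n_days))  # placeholder daily abnormal returns

AAR = AR.mean(axis=0)                                 # average abnormal return per event day
se_AAR = AR.std(axis=0, ddof=1) / np.sqrt(n_firms)    # cross-sectional standard error
t_AAR = AAR / se_AAR                                  # one t-statistic per event day

# CAAR over (say) event days 0..9, with a cross-sectional t-test on per-firm CARs
CAR = AR[:, :10].sum(axis=1)
CAAR = CAR.mean()
t_CAAR = CAAR / (CAR.std(ddof=1) / np.sqrt(n_firms))
```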
Question
I have seen that some researchers just compare the difference in R² between two models: one in which the variables of interest are included and one in which they are excluded. However, in my case this difference is small (0.05). Is there any method by which I can be sure (or at least have some support for the argument) that this change is not just due to luck or noise?
A partial F-test will be useful here. After the 1st variable is in, you add the other variables ONE at a time. After the 2nd variable is added, you have your y-variable as a function of 2 variables, giving a model with 2 d.f. and a certain regression sum of squares (SS). From the 2-variable regression SS subtract the 1-variable regression SS; that change in SS costs 1 d.f., so the extra SS divided by 1 is the change in regression mean square (regMS). Next, divide the 2-variable residual SS by the 2-variable residual d.f. to get the residual mean square (resMS). Now divide the change in regMS by resMS to get the partial F-value, and look up the probability of that F-value in tables. If it is significant, keep the 2nd variable in, and do the same for any further independent variable you may want to add to your model. (For reference, adjusted R² = 1 − (residual SS/residual d.f.) / (total SS/total d.f.).)
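The procedure above can be sketched in a few lines of numpy on simulated data (the restricted model contains x1; the full model adds x2):

```python
import numpy as np

def rss(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(6)
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + 0.8 * x2 + rng.normal(size=n)       # simulated data

X1 = np.column_stack([np.ones(n), x1])               # restricted model
X2 = np.column_stack([np.ones(n), x1, x2])           # full model: adds x2

rss1, rss2 = rss(y, X1), rss(y, X2)
q = 1                                                # number of added regressors
df2 = n - X2.shape[1]                                # residual d.f. of full model
F = ((rss1 - rss2) / q) / (rss2 / df2)               # partial F statistic
# Compare F with the F(q, df2) critical value (about 3.97 at the 5% level
# here); a large F supports keeping x2 in the model.
```

Note that the drop in residual SS, not the raw change in R², is what the test evaluates, which directly answers whether a 0.05 change in R² is distinguishable from noise.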
Question
To illustrate my point, I present a hypothetical case with the following equation:
wage = C + 0.5*education + 0.3*rural_area
where the variable "education" measures the number of years of education a person has, and "rural_area" is a dummy variable that takes the value 1 if the person lives in a rural area and 0 if she lives in an urban area.
In this situation (and assuming no other relevant factors affecting wage), my questions are:
1) Is the 0.5 coefficient on education reflecting the difference between (1) the mean marginal return to an extra year of education on the wage of an urban worker and (2) the mean marginal return to an extra year of education for a rural worker?
a) If my reasoning is wrong, what is the intuition behind the mechanism of "holding constant"?
2) Mathematically, how does just adding the rural variable "hold constant" the effect of living in a rural area on the relationship between education and wage?
It means assuming the other variables do not change, in order to evaluate the partial variation in the dependent variable due to variation in the single independent variable of interest while the other variables are held fixed.
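A small simulation makes the "holding constant" mechanics concrete, and also answers question 1 above: without an education-by-rural interaction, the education slope is the same within both groups — it is not a difference between group-specific returns. (The data below are simulated, with coefficients chosen to match the hypothetical equation.)

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
educ = rng.integers(6, 18, size=n).astype(float)     # years of education
rural = rng.integers(0, 2, size=n).astype(float)     # 1 = rural, 0 = urban
wage = 3 + 0.5 * educ - 0.3 * rural + rng.normal(size=n)  # hypothetical DGP

X = np.column_stack([np.ones(n), educ, rural])
b, *_ = np.linalg.lstsq(X, wage, rcond=None)

# b[1] ~ 0.5: the return to one extra year of education *within* each area,
# i.e. comparing workers with the same rural/urban status -- NOT a difference
# between urban and rural returns (with no education*rural interaction, the
# education slope is identical in both groups by construction).
# b[2] ~ -0.3: the rural-urban wage gap at any given education level.
```

Algebraically, including the dummy lets each group have its own intercept, so the education coefficient is estimated from within-group variation only — that is what "holding rural area constant" amounts to.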
Question
Dear All,
Doing financial event studies in Excel is just a horrible process: arranging and chopping huge amounts of data, complicated manual calculations, etc.
Please advise what software is out there that can do financial event studies in a neater and more time-efficient way.
Thanks
Ahmed Samy
Stata, Eviews, and R can all handle this kind of time-series work well. You can learn how to implement an event study in these packages from the literature review of other similar studies.
Question
Dear All,
I’m conducting an event study for the inclusion of companies in a certain index.
The event is the “inclusion event” for companies in this index for last 5 years.
For the events, we have yearly Announcement date (AD) for inclusions, and also effective Change Dates (CD) for the inclusion in the index.
Within same year, I have aligned all companies together on (AD) as day 0, and since they are companies from same year, CD will also align for all of them.
The problem comes when I try to aggregate companies from different years together: although I aligned them all to have the same AD, the CD differs from one year to another, so CDs don't align for companies from different years.
How can I overcome this misalignment of CDs across years, so that I'm able to aggregate all the companies together?
Many Thanks.
Dear Prof. Raymond,
My aim is to see what happens to returns of stocks when they join a certain stock exchange index, do they generate abnormal returns?
I’m trying to study that for the last 3 years.
So the event is "joining the index", which involves 2 dates: (1) the announcement date (AD), on which the stock exchange announces the news that those stocks will join the index, and (2) the change date (CD), the date on which those stocks are actually included in the index; this CD is decided and announced by the stock exchange on the AD.
I have attached a similar work for your kind reference.
Question
Dears,
I'm conducting an event study for the effect of news announcement at certain date on stock return.
Using the market model to estimate the expected stock return in the "estimation window", we need to regress the returns of the stock under study on the returns of a market portfolio index.
1- How can we decide upon choosing this market portfolio index for regression ?
Is it just the main index of the market?
The index of the sector to which the stock under study belongs? etc.?
2- Is it necessary that stock under study be among the constituents of this market index?
Many thanks
You can consider using the country's stock market index as a proxy for market portfolio.
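For readers unfamiliar with the market model mentioned above, here is a minimal numpy sketch with synthetic data (the return series and event-window numbers are invented for illustration): estimate alpha and beta over the estimation window, then compute abnormal returns as actual minus expected in the event window.

```python
import numpy as np

# Market-model sketch: alpha/beta from an estimation window, then
# abnormal returns (AR) and cumulative AR in the event window.
rng = np.random.default_rng(1)
mkt = rng.normal(0.0005, 0.01, 250)                     # market index returns
stock = 0.0002 + 1.2 * mkt + rng.normal(0, 0.005, 250)  # synthetic stock returns

# OLS of stock returns on market returns (estimation window)
X = np.column_stack([np.ones(250), mkt])
alpha, beta = np.linalg.lstsq(X, stock, rcond=None)[0]

# Event window: abnormal return = actual - (alpha + beta * market)
mkt_event = np.array([0.002, -0.001, 0.004])            # hypothetical
stock_event = np.array([0.015, 0.001, 0.009])           # hypothetical
ar = stock_event - (alpha + beta * mkt_event)
car = ar.sum()                                          # cumulative abnormal return
print(round(beta, 2))  # near the true 1.2
```

Whichever index is chosen as the market proxy simply replaces `mkt` here; the stock does not need to be a constituent of that index for the regression to be computable, though proxy choice affects the estimated betas.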
Question
I am doing some research on the growth of joblessness in the manufacturing sector of Pakistan. In the literature I have studied, the seemingly unrelated regression technique (SUR) was used instead of the ordinary least square (OLS) method. Can anyone help me to identify the reasons why OLS is not used here? Or under what conditions SUR technique should be used instead of OLS?
Thanks to all of you for providing clear concept about SUR model.
Question
I'm working with life satisfaction as my dependent variable and some other independent variables that measure purchasing power (consumption, income and specific expenditures). To take into account the diminishing marginal returns of these last variables (following the literature), I transformed them into their natural logarithms. However, now I want to compare the size of the coefficients of the specific expenditures with those of consumption and income. Specifically, I would like some procedure that allows me to interpret the result like this: 1 unit of resources directed to a type of expenditure (say, culture) is more/less effective at improving life satisfaction than the same unit would be under the category of income. If I just do this without the natural logarithm (that is, expressed in dollars) the coefficients change in counterintuitive ways, so I would prefer to avoid this.
I was thinking about using beta coefficients, but I don't know if it makes sense to standardize an already logarithmic coefficient.
I am not sure, Santiago, that I follow what you said. Elasticities can be used and beta weights can be used. If I understand you, I would interpret an elasticity as follows: a 1% increase in the RHS variable changes the regressand by the estimated sign and coefficient of that variable, e.g. lnY = 2 − 0.5·ln(X). Here a 1% increase in X decreases Y by 0.5% on average.
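That elasticity reading of the log-log example above can be verified numerically (the intercept and slope are the hypothetical values from the answer):

```python
import math

# Elasticity check for the log-log example lnY = 2 - 0.5*lnX:
# a 1% increase in X should change Y by about -0.5%.
def y(x):
    return math.exp(2 - 0.5 * math.log(x))

pct_change = (y(1.01) - y(1.0)) / y(1.0) * 100
print(round(pct_change, 3))  # approximately -0.5
```

The match is only approximate because the elasticity is exact for infinitesimal changes; for a discrete 1% move the response is −0.496% here.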
Question
In a regression on a database with N = 1200, I have an independent dummy variable that measures whether the respondent is unemployed or employed. The variable has the following characteristics:
Unemployment = 0 - Frequency: 1196
Unemployment = 1 - Frequency: 4
The regression gives me a significant coefficient, but also a very counterintuitive one (specifically, that life satisfaction has a positive association with unemployment). I think, however, that it's wrong to draw a valid conclusion from just 4 cases with Unemployment = 1. I also have other dummy variables where the situation is even less clear. For example:
Dummy = 0 - Frequency: 1170
Dummy = 1 - Frequency: 30
Or even more:
Categorical option A = 0 - Frequency: 1150
Categorical option B = 1 - Frequency: 30
Categorical option C = 2 - Frequency: 12
Categorical option D = 3 - Frequency: 8
Can I draw valid conclusions from this? And, in more general terms, is there a minimum number of observations needed per response category of each independent variable for the conclusions that arise from it to be pertinent/correct? If so, how can I calculate this number?
I agree with Paul.
Question
In order to analyze if there is a mediation effect using Baron & Kenny's steps, is it necessary to include the control variables of my model, or is it enough to do the analysis just with the independent variable, the mediator variable and the dependent variable of my interest?
"I don't have the theory to include more control variables that may be important for this model." -- so this is a statement. You know the field and the arguments for why a variable might or might not have to be considered. You state that, to the best of your knowledge. So you can defend it and clearly point out the advantages and limitations of your model. If reviewers (and, later, readers) have different ideas, they are invited to discuss it.
Question
I have a dummy variable as the possible mediator of a relationship in my model. Reading Baron and Kenny's (1986) steps, I see that, in the second one, you have to test the relationship between the independent variable and the mediator, using the latter as the dependent variable. However, normally you wouldn't use OLS when you have a dummy as the dependent variable. Should I use a probit in this case?
Now I understand. Please look at the definition of dummy variable. What you mean seems to me to be a categorical variable. BTW Actually I am not. I just answer these questions because I enjoy the enlightened repartee. If you want to know my background you are welcome to click on my name where a number of my papers, affiliations, my degrees, etc are included. May the force be with you. D. B.
Question
In my investigation of the determinants of subjective well-being (life satisfaction) I have some variables that measure access to food and also other variables that measure affect (whether, in the last week, the interviewee felt sad/happy, for example). These variables don't show high simple Pearson correlations nor high VIFs. In experimenting with different models (including and excluding some variables), I see that access to food has a positive and significant coefficient, except in the models where the affective variables are included. Can I make the case that this is because the affective variables mediate the effect of access to food on life satisfaction? I also tried an interaction between access to food and the affective variables, but it is not significant.
Thank you Paul!
And yes, David. What can you say about the question if you re-read it changing the word variable with "coefficient of the variable"?
Question
Reading Wooldridge's book on introductory econometrics I observe that the F test allows us to see if, in a group, at least one of the coefficients is statistically significant. However, in my model I have that, individually, one of the variables of the group I want to test is already statistically significant (measured by the t-test). So, if that is the case I expect that, no matter with which variable I test for, if I include the one that is already individually significant, the F test will also be significant. Is there any useful usage I can make with the F test in this case?
Hello Santiago and Colleagues,
A t-test for one variable is identical to an F-test in a simple regression model (with one explanatory variable); in fact, in this case the square of the t-statistic is exactly equal to the F statistic (t² = F).
Once you include additional independent variables, the F-test is the one you rely on to report the results. If the F-test is significant, you report the regression results and check which explanatory variables are significant using t-tests. However, if the F-test is insignificant, you stop there and do not report the regression. You need to find another model that at minimum passes the F-test. This is a specification issue.
Kind regards,
George K Zestos
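The t² = F identity for the one-regressor case can be checked directly. A numpy sketch with simulated data (the data-generating numbers are arbitrary):

```python
import numpy as np

# Verify that t^2 = F in a simple one-regressor OLS model.
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(rss[0])
sigma2 = rss / (n - 2)
se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se_b1

# F for H0: slope = 0, i.e. (restricted RSS - RSS)/1 over RSS/(n-2),
# where the restricted model is intercept-only.
tss_restricted = float(((y - y.mean()) ** 2).sum())
f_stat = (tss_restricted - rss) / (rss / (n - 2))

print(round(t_stat**2 - f_stat, 8))  # 0.0 up to floating-point error
```

With more than one regressor the identity no longer holds for the overall F-test, which is exactly why the joint test carries extra information beyond the individual t-tests.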
Question
I have a well-endowed database with almost 29 0000 observations and I want to run an analysis with more than 50 variables. What problems can arise from this situation? Can the model be overfitted? If so, why?
Yes. Your model can be overfitted. You can think of overfitting in several ways, but let us take two different avenues. First, the number of relevant variables. Imagine that the truly correct model has only 30 of the total of 50 variables that you happen to have. Whatever method you use to identify the correct variables in your model can lead you to "false discoveries". This is closely related to type I error in statistical inference. You can be very strict with the type I error, but it will never be zero. So you are admitting the possibility of false discoveries. These false discoveries are more likely to occur the more variables and transformations of variables you try over the same sample. You mention that you have 50 variables... but what about using the squares of these variables, or their logs, or products of pairwise combinations of them? The more combinations you try over the same sample, the more likely you are to end up with false discoveries: variables that are not in the true model... but your statistical method is unable to detect that they are not.
Second, imagine that the true share of variance that can be explained by your model is 50%. So this is the best R² you could get if you could identify the correct model. Now, because you are trying to find the correct variables using the same sample over and over again, you end up with a sample R² of 75%. You again have an overfitting issue induced by the data-mining process.
Large N helps but a lot of the overfitting problem relies on the repeated use of the same sample to find a correct model.
This is a key topic....hope it helps!
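The "false discoveries" point is easy to see in a simulation. In the numpy sketch below (all numbers illustrative), the outcome is pure noise, unrelated to all 50 regressors, yet a handful of them will typically appear "significant" at the 5% level:

```python
import numpy as np

# False-discovery sketch: regress pure noise on 50 irrelevant regressors.
# On average about 5% of them (2-3 variables) will have |t| > 1.96.
rng = np.random.default_rng(3)
n, k = 1000, 50
X = rng.normal(size=(n, k))
y = rng.normal(size=n)                 # y is unrelated to every column of X

Xc = np.column_stack([np.ones(n), X])
beta, rss, *_ = np.linalg.lstsq(Xc, y, rcond=None)
sigma2 = float(rss[0]) / (n - k - 1)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))
t = beta / se
false_hits = int((np.abs(t[1:]) > 1.96).sum())   # "significant" noise variables
print(false_hits)
```

Each re-specification over the same sample (squares, logs, interactions) is another draw from this lottery, which is how spurious variables accumulate in a final model.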
Question
I have reported life satisfaction as my dependent variable and many independent variables of different kinds. One of them is the area in which the individual lives (urban/rural) and another is access to publicly provided water service. When the area variable is included in the model, the second variable is not significant. However, when it is excluded, the public-service variable gains enough significance for a 95% level of confidence. The two variables are moderately and negatively correlated (r = -0.45).
What possible explanations do you see for this phenomenon?
Mr. Valdivieso, I think multicollinearity is almost certainly the problem. Try the simple VIF test. Then try changing the way you measure the variables Area and Access. Good luck.
Question
I'm studying the determinants of subjective well-being in my country. I have reported satisfaction with life as my dependent variable and almost 40 independent variables. I ran multicollinearity tests and didn't find values bigger than 5 (in fact, just two variables had a VIF above 2). Also, my N = 22 000, so I don't expect to have an overfitted model. Actually, at the beginning, all was going well: the variables maintained their significance and coefficient values when I added or deleted variables to test the robustness of the model, and the adjusted R squared increased with the inclusion of more variables.
However, I finally included some variables that measure satisfaction with specific life domains (family, work, profession, community, etc.), and that is when the problem started: my adjusted R squared tripled, and the significance and even the signs of some variables changed dramatically, in some cases in a counterintuitive way. I also tested for multicollinearity and the correlation of these variables with the other regressors, and I didn't find this to be a problem.
The literature says it is very likely that there are endogeneity problems between satisfaction with life domains and satisfaction with life, since it is not so much the objective life conditions that affect life satisfaction as the propensity to self-report satisfaction. Can this be the cause of my problem? If so, how?
PD: I'm not trying to demonstrate causality.
Hi Santiago,
I believe that what you write is correct; there might be a problem of endogeneity. As you seem to be dealing with subjective indicators, there might be self-reporting biases involved. For instance, respondents who tend to be more optimistic would report a higher score for overall life satisfaction but also for the different dimensions that you look at. Further, those scores might be affected by the current mood of the respondent, etc.
I think this problem becomes worse if you include these self-reported variables as both predictors and outcomes in your model. Maybe addressing these two elements in separate models might be a solution (one model with overall life satisfaction and your 40 independent variables, and one model looking at the different dimensions of life satisfaction and their impact on overall satisfaction).
I had a quick glance at the literature. Pacheco & Lange address this question of endogeneity: https://www.emerald.com/insight/content/doi/10.1108/03068291011062489/full/html
Question
I'm investigating the determinants of subjective well-being in my country and I have a well-endowed database in which I found a lot of environmental, psychosocial and political variables (plus the common ones) that are theoretically related to my dependent variable (subjective well-being). In this context, do you see any trouble with including them all (almost 35, and that's after deleting the ones that measure the same concept) in one single model (using the adjusted R²)?
You have many independent variables: to check the multicollinearity among them, use variance inflation factors.
Question
Many researchers have applied R and Matlab software to estimate panel threshold regression models. Can we estimate panel threshold regression models in EViews?
Stata 14 is very good for estimating the panel threshold model.
Question
I am using time series for my research.
Hi, I have a related question on this topic. Currently I'm working on a time-series analysis. In the first ADF test on the raw data I didn't include any lags, and all the results were non-stationary. After this I ran the ADF test on the first difference and the series are stationary, but when I also ran the test on the first difference with lags chosen according to the varsoc test, they came out non-stationary. Which is the right procedure? Thanks
Question
I have estimated a VECM model. It reports standard error and t-statistic instead of the P-value. So how do I check if the coefficients are significant using the t-statistics?
*I'm using Eviews for my estimations
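With daily or otherwise large samples, VECM coefficient t-statistics can be judged against the standard normal: |t| > 1.96 indicates 5% significance, and a two-sided p-value can be recovered from the normal CDF. A stdlib-only sketch (the 2.50 value is just an example t-statistic):

```python
import math

# Two-sided p-value from a t-statistic using the normal approximation,
# reasonable for a VECM estimated on a large sample.
def p_value(t):
    phi = 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

print(round(p_value(1.96), 3))  # 0.05
print(round(p_value(2.50), 3))  # 0.012
```

For small samples you would instead compare against Student-t critical values with the residual degrees of freedom, which are slightly larger than 1.96.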
Question
Hi all, I'm trying to understand the Fama-MacBeth two-step regression. I have 10 portfolios and T = 5 years. In the first step I compute 10 time-series regressions, and if I have 2 factors I get 20 betas. How many regressions do I have to compute in the second step? Only 5? And which betas should I use? The average of the betas found in the first step?
We compute Fama-MacBeth betas in this paper which is available on line.
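As a complement, here is a minimal numpy sketch of the classic two-step procedure on synthetic data (all dimensions and numbers are illustrative, with monthly periods so T = 60): step 1 runs one time-series regression per portfolio to get the betas; step 2 runs one cross-sectional regression per period, giving T estimates of the factor premia, which are then averaged.

```python
import numpy as np

# Fama-MacBeth two-step sketch: N portfolios, T periods, K factors.
rng = np.random.default_rng(4)
N, T, K = 10, 60, 2
factors = rng.normal(0.005, 0.02, size=(T, K))
true_beta = rng.uniform(0.5, 1.5, size=(N, K))
returns = factors @ true_beta.T + rng.normal(0, 0.01, size=(T, N))

# Step 1: N time-series regressions (with intercept) -> N x K beta matrix
F = np.column_stack([np.ones(T), factors])
betas = np.linalg.lstsq(F, returns, rcond=None)[0][1:].T     # shape (N, K)

# Step 2: T cross-sectional regressions of period returns on the betas
B = np.column_stack([np.ones(N), betas])
lambdas = np.array([np.linalg.lstsq(B, returns[t], rcond=None)[0][1:]
                    for t in range(T)])                      # shape (T, K)

lam_mean = lambdas.mean(axis=0)                  # Fama-MacBeth risk premia
lam_se = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
print(lam_mean.shape)  # (2,)
```

So the second step uses the (constant, full-sample) first-step betas, not their average, and runs one regression per period: with 5 years of monthly data that is 60 cross-sectional regressions, not 5.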
Question
While conducting the ARDL bounds test I discovered that my F statistic is greater than the lower bound but less than the upper bound, which Pesaran, Shin and Smith (2001) conclude is inconclusive. What then is the implication? Can we conclude that there is a long-run relationship or not? If yes, what can be done statistically to decide between the null and alternative hypotheses?
I think you need to estimate the error-correction model and start from the sign and significance of the ECM(-1) term in your first answer.
Question
Hi, in an RCT, I have 3 different treatment groups and one control group. The size of the control group is around 1000 while the size of other groups are just above 300.
To test balance I used ANOVA and the Welch test, which show that several variables are unbalanced. Can I draw a smaller random sample (400) from the control group so that the sizes of all groups are roughly equal?
Actually, after doing that, only one variable is unbalanced. So would it cause any problems if I draw a smaller sample just for more balance? Thanks!
My apologies for not noticing sooner that this thread continued with a further question. Consider running a random-effects panel regression, with each person who is tracked over time as a panel, person-level demographic and other explanators, and dummy variables denoting the group/village each person was in (with no dummy for the comparison group). Run a Hausman test to assure that random effects makes sense. To look at the effects of the intervention, initially do not code demographic/geographic differences between the villages as explanators. But in a subsequent run, code those to probe whether differences observed (or the lack thereof) resulted from differences between the villages rather than the intervention. Often, the significance of the intervention only becomes clear when those village-level controls are added. I'm unsure if ANCOVA also allows a panel formulation.
Question
I have tried with EViews 7 and Stata 11 but failed to find the result. With EViews, I get a result only for "level shift" but no estimated result for "level shift with trend" or "regime shift". And in Stata I could not find the command.
Dear all, kindly click on the link below to watch video on how to perform the Gregory-Hansen test
Question
Hi,
When estimating the DCC-GARCH in Stata, pairwise quasi-correlations are given at the end of the output. What do they mean in practice? Are they the mean values of the dynamic correlations, or something else?
Much appreciated if anybody could clarify this.
Kind regards
Thushara
Dear SC Thushara,
Question
I use monthly data in my model with 166 observations. I did the unit root tests. I chose and estimated the appropriately lagged ARDL model. The diagnostic tests for autocorrelation, heteroskedasticity and the Ramsey RESET all pass (p-values greater than 0.05). But when I test normality, the Jarque-Bera p-value is 0.00003, so the residuals are not normal. I also took logarithms of the variables. Should I continue the analysis ignoring the normality test? Do you have any other advice?
Hi there. Normality is desirable but it is not a requirement for many aspects of time series analysis.
For instance, if your objective is to obtain the best linear forecast for a given variable, simple OLS estimation in a regression model, or an ARDL model, will give you a consistent estimator of the BLP (Best Linear Predictor) irrespective of any assumption about normality.
Basic asymptotic theory for stationary time series does not require normality to hold true. Most of Wald tests, t-tests in traditional time-series regressions will have the "correct" asymptotic distribution even if errors are not normal.
So, my general answer is that normality in error terms is not necessarily a requirement, especially when you have a decent number of observations.
In most financial applications normality is rejected. It is not the end of the world and I would not be extremely concerned about it. I am mostly interested in forecasting, so what is the cost of non-normality for your forecast? Confidence intervals will typically be different than in the Gaussian case, but the forecast will be just fine.
Hope it helps!
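For reference, the Jarque-Bera statistic the questioner mentions is simple to compute by hand. A numpy sketch on simulated data (sample sizes and seed are arbitrary):

```python
import numpy as np

# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is
# sample skewness and K is sample kurtosis. Under normality JB is
# asymptotically chi-squared with 2 df, so JB > 5.99 rejects at 5%.
def jarque_bera(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    s2 = (z ** 2).mean()
    skew = (z ** 3).mean() / s2 ** 1.5
    kurt = (z ** 4).mean() / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(5)
jb_norm = jarque_bera(rng.normal(size=2000))        # Gaussian data: small JB
jb_t = jarque_bera(rng.standard_t(3, size=2000))    # heavy tails: large JB
print(round(jb_norm, 2), round(jb_t, 2))
```

Note that with large samples JB rejects for even mild departures from normality, which is one more reason the rejection need not invalidate the ARDL estimates themselves.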
Question
I have 4 time-series variables, say x1, x2, x3, x4. I am mainly interested in finding a cointegrating relation between x1 and the rest of the variables. All variables are I(1). Johansen's procedure for testing for cointegration suggests 1 cointegrating relation among x1, x2, x3, x4. I estimated the VECM with one cointegrating vector. The alpha coefficient in the x1 equation came out positive and significant, which is my main interest along with the beta coefficients. The alpha coefficients on x2 and x4 are also positive, and the same coefficient in the x3 equation is negative. All coefficients are significant. I did the estimation in EViews, which also reports the significance levels of the coefficients in the beta vector; these are also significant and economically make sense. Can I expect to have a positive alpha for the x1 variable, which I want to show as having a long-run relationship with the other variables in the system, when they are all I(1) variables? If the alpha for x1 cannot be negative, what does that suggest?
Such a result is possible; it depends on what you are working on.
Question
Hi
I've estimated a DCC-GARCH(1,1) model using Stata. At the end of the Stata output, a correlation matrix is given, which is also called the quasi-correlation matrix. Is it the conditional correlation matrix or something different? If so, is it the average/mean value of the dynamic conditional correlations?
Much appreciated if anybody clarifies this.
(I've herewith attached the output)
Kind regards
Thushara
Hi. The answer is in the Stata documentation: "When Qt is stationary, the R matrix in (1) is a weighted average of the unconditional covariance matrix of the standardized residuals et, denoted by R, and the unconditional mean of Qt, denoted by Q. Because R is not equal to Q, as shown by Aielli (2009), R is neither the unconditional correlation matrix nor the unconditional mean of Qt. For this reason, the parameters in R are known as quasicorrelations; see Aielli (2009) and Engle (2009) for discussions". Type "DCC-GARCH STATA quasi correlation matrix" into your favorite search engine and you will find it on page 5. I used Google.
Question
I have 12 portfolios per year, one for each month. I did that for a number of years and calculated the average return for each year. Now I need to calculate the correlation between these returns and returns from another model, and also the correlation between these returns and benchmark volumes and standard deviations.
I tried to calculate the correlations with the Pearson method using percentages, using prices, using a mix, etc., and I keep getting different correlation coefficients, even opposite signs in some cases. What is the best way to calculate correlations in this case?
• I believe that you can use canonical correlation, which gives an optimal solution and an R² that describes the correlation between many variables or groups of variables.
• For more specific results you can use the SPSS package (or a spreadsheet's CORREL function).
Question
Hi,
Does EViews allow one to estimate ARIMA models assuming a Student's t distribution or a GED distribution? I have estimated some ARIMA models and found that some residuals are not normally distributed. I guess either the Student's t or the GED distribution would fit well.
Much appreciated if anyone could advise in this regards
Kind regards
Thushara
When asked questions such as this by colleagues, my initial reaction would be to ask for more details of the series. Did the user look at the background of the series to see if there are any reasons why there might be discontinuities (changes in the method of calculating the data, changes in government policy, or other interventions)? Was there a deterministic element in the series that should have been removed before ARIMA estimation? Perhaps some transformation of the data might help. Perhaps the heavy tails were caused by an ARCH/GARCH effect.
Note that estimates of an AR(p) process are robust with respect to non-normality.
There also appears to be some confusion in earlier answers to this question. To me white noise is a process that is identically distributed with zero mean, constant variance and is uncorrelated. Independent white noise is a process that is independent identically distributed with zero mean and constant variance. Normal white noise is a white noise that follows a normal distribution.
Some of the proposals in the first paragraph should cure your problem. If they do not, I would fear that the heavy tails would make your confidence intervals large, and you might have to search for an alternative forecasting methodology.
Question
Is there any difference between an exponential moving average (EMA) and an exponentially weighted moving average (EWMA)?
The exponential smoothing method gives greater weight to the more recent observations, while a simple moving average gives equal weight to all observations in its window.
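To make the contrast concrete, here is a stdlib-only sketch (the price series and smoothing constants are made up; seeding the EMA with the first observation is one common convention among several):

```python
# EMA vs. simple moving average. "EMA" and "EWMA" name the same
# recursion: weights on past observations decay geometrically, with the
# most recent observation getting weight alpha.
def ema(series, alpha):
    out = [series[0]]                     # seed with the first observation
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def sma(series, k):
    # Equal-weight average of the last k observations
    return [sum(series[i - k + 1:i + 1]) / k for i in range(k - 1, len(series))]

prices = [10, 11, 12, 11, 13, 14, 13, 15]
print([round(v, 2) for v in ema(prices, 0.5)])
print([round(v, 2) for v in sma(prices, 3)])
```

So EMA and EWMA are two names for the same estimator; the real contrast is with the equal-weight simple moving average.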
Question
Hi,
suppose we get the following 2sls model:
log Y = B1 − 0.8623·X2 + 0.05·X3
Where X2 is a dummy variable
Can you please interpret the coefficient of X2?
I am sorry to have missed the not-gender post.
I suspect that 2SLS with a dummy endogenous explanatory variable is applied while there is no way to tell the software that X2 is a dummy variable (as with logit or probit). In a way, the software assumes that X2 just happens to have only two values; the estimate is then computed as if X2 could take any value. (Admittedly this is not an answer to the original question.)
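On the interpretation itself: in a log-linear model, the exact percentage effect of switching a dummy from 0 to 1 is 100·(exp(b) − 1), not 100·b, and the difference matters for a coefficient this large. A stdlib check:

```python
import math

# Dummy coefficient in a log-dependent-variable model:
# the exact percentage change in Y when X2 goes from 0 to 1
# is 100*(exp(b2) - 1); 100*b2 is only a small-b approximation.
b2 = -0.8623
effect = 100.0 * (math.exp(b2) - 1.0)
print(round(effect, 1))  # -57.8
```

So the estimated coefficient says that, holding X3 constant, Y is about 57.8% lower when X2 = 1 than when X2 = 0, rather than the 86% that a naive reading of 100·b2 would suggest.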
Question
HI everyone,
During my latest piece of research, I have found some arguments about the need to adjust daily stock prices for the exchange rate vs. the US dollar (given the adoption of the S&P 500 as benchmark). Actually, I can't quite understand the reasoning behind this point, since daily returns are stated in percentage terms, unless one wants to demonstrate the confounding effect of exchange rates on daily returns.
Please could you provide me with your suggestions on this point, if possible with some relevant reference?
Thank you so much in advance.
Regards,
Nicola
Question
Having a discrete dependent variable affects its distribution, so is it appropriate to assume it is continuous, especially as there will be problems in the interpretation of the results?
Hello Heba,
The quick answer is yes, you can use linear regression for such a dependent variable. But, there are things to consider.
Think of a linear probability model with two outcomes (0 or 1). Although a major drawback of such a model is its predicted probabilities (i.e., greater than 1 or less than 0), it is much easier to deal with endogeneity issues when using a linear model. As such, if you are looking to make causal inferences (i.e., under the zero conditional mean assumption) but have endogeneity issues, a linear model is a bit easier to tackle. If you compare the marginal effects of logit or probit with the beta estimates of the linear model, they should be similar. This tells you that the effect of the variable is approximately the same in either modeling framework.
Ultimately, it comes down to what you are trying to do: predict, or make causal inference by dealing with endogeneity.
Best,
Jason
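The out-of-range-prediction drawback Jason mentions is easy to demonstrate. A numpy sketch with a deliberately simple made-up 0/1 outcome:

```python
import numpy as np

# Linear probability model sketch: OLS on a 0/1 outcome can produce
# fitted "probabilities" outside [0, 1] at extreme regressor values.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones(10), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
print(round(float(fitted[0]), 3), round(float(fitted[-1]), 3))  # below 0 and above 1
```

Logit or probit would keep every fitted probability inside (0, 1), which is the trade-off against the LPM's simpler coefficient interpretation and easier handling of endogeneity.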
Question
Hello. I am actually doing a study for the relationship of foreign trade with economic growth in Albania. My data is yearly, ranging from 1993 to 2016 (Export, Import, GDP). I am using log form (lnexport,lnimport,lnGDP) to conduct the study (lnGDP is the dependent variable). Unit Root test(ADF) indicates the series are stationary at first difference [I(1)].
Regarding lag length, 'Lag Length Criteria' chooses optimal lag 1 when the max lag is 1; lag 2 when the max lag is 2; lag 1 when the max lag is 3; and lag 4 when the max lag is 4 (SIC and AIC criteria). Judging from my intuition, the lag length should be 1, as the data is yearly.
When I estimate the differenced series as a VAR, the coefficients appear to be insignificant. When I do the Johansen cointegration test (data in levels) it shows 2 cointegrating equations in the long run. After that I check a VECM (in levels) and the coefficients still appear insignificant. I also ran a Granger causality test (differenced data), which shows no causality in any direction.
What can I do now? Does that mean the series need to have more data?
I would be very grateful for any suggestions regarding my study, which is actually very important to me, as I need it for my thesis. I am also attaching the results for a better understanding (*removed after edit).
EDIT: Thank you everyone for your suggestions. I tried to use quarterly data and it worked. With optimal lag 7 (according to AIC) there was one cointegrating vector and VECM was successful with the error correction term being negative and significant.  Also the export coefficient sign was positive and the import coefficient was negative (in the long run part of the equation) thus satisfying economic theories. I am attaching the result below.
An explanation without any theory will likely lead to misspecified equations. A good estimate for such an equation is not a data problem ("weakness of data"). Even if you had a mass of data and, by accident, got good statistical results, those results would not be acceptable.
Question
I am studying how the interaction of domestic credit growth and macroeconomic variables changed after a certain event in a country, using an econometric analysis; thus the data (quarterly) run from 2012:Q3 to 2017:Q1, corresponding to 19 quarters. I also took the changes in each variable to see the differences relative to the previous quarter.
Is it alright if I conduct causality, cointegration and multiple regression analyses for this data?
I would be pleased if you could suggest a proper analysis method for this limited data or any other relevant suggestions.
Thank you Feroze Sheikh,
Best wishes..
Question
I have 2 such variables in my VAR model that exhibit the above condition. Should I do something special in my VAR model?
Different methods give different break dates. If we should be concerned about these breaks then which method is reliable?
Because of the presence of breaks in the variables, the ZA stationarity test is the best fit. As far as the analysis is concerned, the best choice would be a Markov-switching VAR, which endogenously takes care of the break.
Question
Dear all,
I need assistance with interpreting the results from Z-A (1992) test on a variable with structural break point. Here are some results:
The variable is Gini index (measure of income inequality):
zandrews gini, break(trend) trim(0.01) lagmethod(BIC) graph
Results: //Zivot-Andrews unit root test for gini
//Allowing for break in trend
//Lag selection via BIC: lags of D.gini included = 0
//Minimum t-statistic -3.449 at 2006 (obs 27)
//Critical values: 1%: -4.93 5%: -4.42 10%: -4.11
The break point is at 2006 and, since the calculated t-statistic (-3.449) does not exceed the critical values in absolute terms, I suppose that the variable is non-stationary, so I took the 1st difference; here's the result:
zandrews d.gini, break(trend) trim(0.01) lagmethod(BIC) graph
Results: //Zivot-Andrews unit root test for D.gini
//Allowing for break in trend
//Lag selection via BIC: lags of D.D.gini included = 0
//Minimum t-statistic -6.478 at 2011 (obs 32)
//Critical values: 1%: -4.93 5%: -4.42 10%: -4.11
The t-stat is now -6.478, which exceeds the critical values in absolute terms, evidencing stationarity, but the break point has now shifted from 2006 to 2011. So, which break point is applicable for the analysis?
The years selected by the ZA procedure are the years at which the t-statistic is minimized.
In my short experience, I add dummy variables based on historical events related to the study's objective or theme. Sometimes they match the dates given by the ZA test.
Question
Does a variable have to be I(1) to be cointegrated in levels with another  I(1) variable, or can it be I(2)? I'm using the Phillips-Oularis method, and I'm only interested in the long-run coefficients, not in making an ECM.
Dear Lars,
Theoretically, cointegration under the Engle-Granger and Johansen approaches requires the same order of integration for all variables. With the ARDL approach, we can test for cointegration among variables with different orders of integration, but only mixtures of I(0) and I(1), not I(2).
Regards,
Question
I ran a model with two variables integrated of order I(1). The variables are a CDS spread and a bond yield, with daily data over a period of 3 years. I proceeded by running the Dickey-Fuller test, then estimated a VAR model and chose the lags using the Schwarz criterion. I then ran an LM test on the residuals, plus heteroscedasticity and normality tests. In each case the p-value is less than 0.05, meaning that the model is not optimal. When I run the Johansen cointegration test, it tells me there is at least one cointegrating relation. If the VAR model is not optimal, does this mean that the variables are not cointegrated?
If your diagnostic tests have suggested that the assumptions of the Johansen approach are not satisfied then, strictly speaking, your Johansen results have no meaning - you do not know whether cointegration exists. On a more pragmatic level, I believe that it is the zero autocorrelation assumption that is most important so try to remove the residual autocorrelation by extending your lag length.
Question
Hello, I am studying the effects of ICT diffusion on financial sector activity and efficiency. To do so, I am creating a GMM model using panel data across 205 countries over 24 years. The model is based on one by Asongu and Moulin (2016). The model and link to Asongu and Moulin's paper are attached.
I am using the xtabond2 command in Stata, writing my line as so:
xtabond2 fe lfe ict inf gdppcg trade inv linc hinc, twostep gmm(lfe inf gdppcg trade inv linc hinc, lag(1 24))
I am particularly confused with how to correctly reflect the (t-tau) lag subscripts on the select independent variables.
fe is my proxy dependent variable for financial development. (financial efficiency).
May I please have some help in specifying the correct Stata line to reflect the model?
Thanks!
Question
I have to perform the Zivot-Andrews unit root test in EViews. How do I decide which type of model to use (break in intercept, in trend/slope, or both)? Is it possible to use a formal test to decide that, or something else? Also, how do I decide what lag length to use? The data are quarterly.
A good idea is to start with the general model. Therefore, include both intercept and trend in your model. Based on the results, you can exclude whichever is statistically insignificant. Regarding lag length selection, EViews can automatically select the appropriate lag length for you based on the usual criteria (FPE, AIC, BIC).
Question
Hello Experts! I'm conducting a study on the effect of trade openness on industry competition. My panel dataset is composed of 3 dimensions: country (5), industry (19) and time (9 years). The purpose of my study is to capture a dynamic equilibrium relationship. For that I need to distinguish between the short-run impact of trade openness on industry price markups and the long-run equilibrium relationship between price markups and the competitive environment. It is important from an econometric point of view to control for the high degree of persistence of price markups: an Error Correction Model specification allows industries to trend towards different equilibria.
Now my question concerns the Stata code for an error correction model with three-dimensional panel data. I am also undecided between combining the country and industry information into a single panel unit and keeping two separate units; with two units I cannot run xtset because of repeated time values within panels.
Thank you for your attention and time in considering my issue!
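On the combining question: the usual workaround is exactly that, collapsing country × industry into a single panel unit so that each unit has unique time values (the Stata analogue would be `egen unit = group(country industry)` followed by `xtset unit year`). A pandas sketch of the bookkeeping, on toy data with the same 5 × 19 × 9 dimensions (all names illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# toy 3-dimensional panel: 5 countries x 19 industries x 9 years
idx = pd.MultiIndex.from_product(
    [range(5), range(19), range(2010, 2019)],
    names=["country", "industry", "year"],
)
df = pd.DataFrame({"markup": rng.normal(size=len(idx))}, index=idx).reset_index()

# collapse country x industry into a single panel unit; within each unit
# the year values are now unique, so the panel declaration succeeds
df["unit"] = df.groupby(["country", "industry"]).ngroup()
```

This keeps all 95 country-industry pairs as distinct cross-sectional units, which is what the ECM needs if each industry is allowed its own long-run equilibrium.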