# Applied Econometrics - Science topic

## Questions related to Applied Econometrics
Question
I am using an ARDL model; however, I am having some difficulties interpreting the results. I found that there is cointegration in the long run. I have provided pictures below.
Dear Mr. a D,
You can use Eviews Software.
Question
Hi all, I'm doing my FYP with the title "The determinants of healthcare expenditure, 2011-2020". Here are my variables: government financing, GDP, total population.
First model: healthcare expenditure_t = B0 + B1 gov financing_t + B2 gdp_t + B3 population_t + e_t
Second (causal relationship) model: healthcare expenditure per capita_t = B0 + B1 gdp per capita_t + e_t
Is it possible to use a unit root test and then ARDL for the first model, and which test can I use for the second model?
1. Does fyp mean "Final Year Project"?
2. With 10 years of annual data, you can try some econometrics. I would think that if you try to estimate your first equation you will find that no variable is significant. This does not mean that the coefficient is zero. The standard errors on your coefficients will be large because you do not have enough data. If you had 50 years of data the coefficient might be significant.
3. If you are trying to establish some kind of causality you must start with a fully specified economic model that contains causal links. This model must include all possible variables that impinge on the variable being caused. You need a dataset covering these variables. You then test the model and data for misspecification. If the model and data are consistent you then estimate the model and evaluate the causal links contained in the model. Your econometrics does not establish causality. Causality is conditional on your model.
4. My recommendation is that you consult with your supervisor and find out what is expected. With your data, the most you can do is make the economic arguments and draw up some tables and graphs. Read some of the material recommended by Ted R Miller
Question
I am currently replicating a study in which the dependent variable describes whether a household belongs to a certain category. Therefore, for each household the variable either takes the value 0 or the value 1 for each category. In the study that I am replicating the maximisation of the log-likelihood function yields one vector of regression coefficients, where each independent variable has got one regression coefficient. So there is one vector of regression coefficients for ALL households, independent of which category the households belong to. Now I am wondering how this is achieved, since (as I understand) a multinomial logistic regression for n categories yields n-1 regression coefficients per variable as there is always one reference category.
Question
Hey! I need to improve an already existing panel data model by adding one variable for access to technology. Is it possible, and what is the best variable to measure technology accessibility? If possible, I would also like to measure technological advancement. What should my variables be for this? What are the common practices so far? Thank you!
Technological progress is measured by the rate at which these inputs improve, and the direction of technological progress is represented by the relative pace of these improvements.
Technological access is typically measured by the adequacy of skills and the effectiveness with which the technology is used.
Question
I use a conditional logit model with income, leisure time and interaction terms of the two variables with other variables (describing individual's characteristics) as independent variables.
After running the regression, I use the predict command to obtain probabilities for each individual and category. These probabilities are then multiplied with the median working hours of the respective categories to compute expected working hours.
The next step is to increase wage by 1%, which increases the variable income by 1% and thus also affects all interaction terms which include the variable income.
After running the modified regression, again I use the predict command and should obtain slightly different probabilities. My problem is now that the probabilities are exactly the same, so that there would be no change in expected working hours, which indicates that something went wrong.
On the attached images with extracts of the two regression outputs one can see that indeed the regression coefficients of the affected variables are very, very similar and that both the value of the R² and the values of the log likelihood iterations are exactly the same. To my mind these observations should explain why probabilities are indeed very similar, but I am wondering why they are exactly the same and what I did possibly wrong. I am replicating a paper where they did the same and where they were able to compute different expected working hours for the different scenarios.
Either something went wrong, or you performed the same test twice. Did you use the same version of the software as the original study?
Question
In the file that I attached below there is a line above the theta(1) coefficient and another one exactly below C(9). In addition, what is the number below C(9)? There is no description.
I asked that question to the person who coded that package, and he said the C(9) coefficient does not have any meaning here; just ignore it. It appears because the package was written for an old version of Eviews and has not been updated.
Question
Hello, I am facing a problem concerning the computation of regression coefficients (necessary information in attached image):
Three regression coefficients (alpha y, m and f) of the main regression (2) are generated through three separate regressions.
Now I was wondering which would be the appropriate way to compute the alphas and gammas.
In case I first run regression (2) and obtain the three regression coefficients alpha y, m and f, can I use these for the separate regressions as dependent variables in order to then run regressions (3) and obtain the gammas?
What strikes me about this approach is that the value of the dependent variables alpha y, m and f would always be the same for each observation.
In the paper they state that the alphas are vectors, but I don't properly understand how they could be vectors (maybe that's the issue after all?).
Or is there a way to directly merge the regressions / directly integrate the regressions (3) into regression (1)? Preferably in Stata.
I appreciate any help, thank you!
If I have understood the question well, you have a system of regression equations, not a single one; unless you replace the alpha coefficients in the main equation, which yields a new equation in terms of new exogenous variables (Z'n x yn, etc.). Then the alpha coefficients disappear and the single regression can be solved for the other coefficients, i.e. the betas and gammas. The alphas can then be calculated easily.
Question
Dear All,
I’m conducting an event study for the yearly inclusion and exclusion of some stocks (from different industry sectors) in an index.
I need to calculate the abnormal return per each stock upon inclusion or exclusion from the index.
I have some questions:
1- How do I decide on the length of backward time to consider for the "Estimation Window", and how do I justify it?
2- Stock return is calculated by:
(price today – price yesterday)/(price yesterday)
OR
LN(price today/price yesterday)?
I see both ways are used, although they give different results.
Can any of them be used to calculate CAR?
3- When calculating the abnormal return as the difference between the stock return and a benchmark (market) return, should the benchmark return be the index itself (on which stocks are included or excluded), or the sector index related to the stock?
Hi there, I am using the Eventus software and I am wondering how the software computes the market index in order to calculate abnormal returns?
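On question 2 above: both return definitions are legitimate and both appear in the event-study literature; log returns have the convenience of being additive over time, which matters when cumulating to a CAR. A tiny numerical sketch on made-up prices:

```python
# Hedged sketch: simple vs. log returns from a toy price series; both are
# common in event studies, and log returns are additive over time.
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])
simple = prices[1:] / prices[:-1] - 1          # (P_t - P_{t-1}) / P_{t-1}
log_r = np.diff(np.log(prices))                # ln(P_t / P_{t-1})

# Log returns sum to the log of the total gross return over the window:
print(log_r.sum(), np.log(prices[-1] / prices[0]))
```

Simple returns, by contrast, compound multiplicatively, so summing them over a window is only an approximation; whichever definition is chosen should be used consistently for both the stock and the benchmark.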
Question
I have a big dataset (n>5,000) on corporate indebtedness and want to test whether SECTOR and FAMILY-OWNED are significant in explaining it. The information is in percentages (total liabilities/total assets) but is NOT bounded: many companies have an indebtedness above 100%. My hypotheses are that the SERVICES sector is more indebted than other sectors, and that FAMILY-OWNED companies are less indebted than other companies.
If the data were normally distributed and had equal variances, I'd perform a two-way ANOVA.
If the data were normally distributed but were heteroscedastic, I'd perform a two-way robust ANOVA (using the R package "WRS2")
As the data are neither normally distributed nor homoscedastic (according to the many tests I performed), and there is no such thing as a "two-way Kruskal-Wallis test", which is the best option?
1) perform a generalized least squares regression (therefore corrected for heteroscedasticity) to check for the effect of two factors in my dependent variable?
2) perform a non-parametric ANCOVA (with the R package "sm"? Or "fANCOVA"?)
What are the pros and cons of each alternative?
Often, the log-normal distribution is a sensible assumption for percentages that are not bounded at 100%. Are there any arguments against this in your case?
It makes a huge difference for interactions whether you analyze the data on the original scale or on the log scale. If it is sensible to assume relative effects (a given impulse changes the response depending on its "base value", so absolute changes differ when base values differ), interactions seen on the original scale are highly misleading in terms of the functional interpretation (i.e. when you aim to understand the interplay of the factors).
Question
I have run an ARDL model on time-series cross-sectional data, but the output does not report the R-squared. What could be the reason(s)?
Thank you.
Maliha Abubakari
I thought the PMG estimator (a form of ARDL) would be more appropriate.
Question
Dear research community,
I am currently working with Hofstede's dimensions; however, I do not use exactly his questionnaire. In order to calculate my index in accordance with his procedure, I am looking for the meaning of the constants in front of the mean scores.
For example: PDI = 35(m07 – m02) + 25(m20 – m23) ... What do 35 and 25 mean? How could I calculate them with regard to my research?
Thank you very much for your help!
Best wishes,
Katharina Franke
Question
Dear Researchers,
I'm working on research using the DEA (Data Envelopment Analysis) method to measure provincial energy efficiency. However, due to data constraints, provincial energy consumption data are not available. Can I assume that provincial energy consumption is proportional to provincial GDP?
(national energy consumption/national GDP x province i GDP)?
I agree with researcher Abdulrahman M. Ahmed. There is not necessarily a correlation between national energy consumption and national GDP. Steven Setiawan, you should try to obtain the information on energy consumption by province.
Question
Can I use the Granger causality test on a monetary variable only, or do I need non-monetary variables?
Also, do I need to run any test before Granger, such as a unit root test, or can I just use the raw data?
What free programs can I use to compute the data?
Yes, you can. You can also use Eviews or Stata software.
Question
I have heard some academics argue that the t-test can only be used for hypothesis testing, and that it is too weak a tool to analyse a specific objective when carrying out academic research. For example, is the t-test an appropriate analytical tool to determine the effect of credit on farm output?
It depends on your objective statement: if your objective is to compare variables that influence a particular problem, you can use the t-test to compare them and then give your justification.
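As a concrete sketch of the comparison being discussed: a Welch two-sample t-test on simulated farm-output data for credit and no-credit groups (illustrative numbers only). Note this compares group means; by itself it does not establish the causal effect of credit.

```python
# Hedged sketch: Welch two-sample t-test comparing mean farm output between
# a credit group and a no-credit group (simulated, illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
with_credit = rng.normal(loc=55, scale=10, size=40)     # hypothetical outputs
without_credit = rng.normal(loc=50, scale=10, size=40)

t, p = stats.ttest_ind(with_credit, without_credit, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```

A significant difference here would still be only descriptive; attributing it to credit requires a design (regression with controls, matching, etc.) that addresses selection into receiving credit.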
Question
What is the most acceptable method to measure the impact of regulation/policy so far?
I only know that Difference-in-Differences (DID), Propensity Score Matching (PSM), and two-step system GMM (for dynamics) are common methods. I am expecting your opinion for a 20-year panel of firm-level data.
Recent developments:
(1) Wooldridge two-way Mundlak regression and fixed effects and dif-in-dif
(2) synthetic control
(3) Cerulli, G. 2015. Econometric Evaluation of Socio-Economic Programs: Theory and Applications.
(4) Pesaran (2015) Time Series and Panel Data Econometrics
Question
Hello everyone,
I would like to analyze the effect of innovation in one industry over a time period of 10 years. The dependent variable is exports and the independent variables are R&D and labour costs.
What is the best model to use? I am planning to use a log-linear model.
Thank you very much for your greatly needed help!
Before deciding on the econometric model, you should go through the stationarity test (ADF test). If the data are stationary, OLS Regression with a log-linear model would be fine. But, if not, you may go for VAR or ARDL. You should also check the robustness of the model by going through residual tests such as Autocorrelation LM Test.
Question
Dear colleagues,
I am planning to investigate the panel data set containing three countries and 10 variables. The time frame is a bit short that concerns me (between 2011-2020 for each country). What should be the sample size in this case? Can I apply fixed effects, random effects, or pooled OLS?
Thank you for your responses beforehand.
Best
Ibrahim
It seems a very small sample for applying microeconometric techniques. Having 27 observations and 10 covariates, at most you will have 27 - 1 - 10 = 16 degrees of freedom. This is pretty low. If I had to decide whether to pursue a project based on that, I would try to avoid it.
It is really closer to multiple time series than panel data. Have a look at this link:
Question
Hi Everyone,
I am investigating the change of a dependent variable (Y) over time (Years). I have plotted the dependent variable across time as a line graph and it seems to be correlated with time (i.e. Y increases over time but not for all years).
I was wondering if there is a formal statistical test to determine if this relationship exists between the time variable and Y?
Any help would be greatly appreciated!
Just perform a regression of the variable on time, or a simple correlation.
Nevertheless, what we usually do is carry out a test for mean differences at two different points in time.
Question
Dear Research Community,
I would like to check for structural breaks in a polynomial regression that predicts expected excess return from excess equity-to-bond market volatility. I have found some useful references, but none deals with polynomials. For instance:
- Andrews, D.W.K., 1993, Tests for Parameter Instability and Structural Change With Unknown Change Point. Econometrica 61, 821-856.
- Bai, J. and P. Perron, 1998, Estimating and Testing Linear Models With Multiple Structural Changes. Econometrica 66, 47-78.
- Bai, J. and P. Perron, 2003. Computation and Analysis of Multiple Structural Change Models. Journal of Applied Econometrics 18, 1-22.
- Bai, J. and P. Perron, 2004. Multiple Structural Change Models: A Simulation Analysis. In Econometric Essays, Eds. D. Corbae, S. Durlauf, and B.E. Hansen (Cambridge, U.K.: Cambridge University Press).
As my polynomial is of order 3, I am wondering whether structural breaks have to be checked for the three orders' parameters (X, X2 and X3) in a time-varying way, or whether there is another efficient way to handle this issue?
Thank you!
Faten
Byrne, Joseph P., and Roger Perman. Unit roots and structural breaks: a survey of the literature. University of Glasgow, Department of Economics, 2006.
Question
Hi,
I am looking to test for a unit root in a panel data series. In this regard, I want to use the Hadri and Rao (2008) test with structural breaks. Is there any way I can perform the test in Stata or similar statistical software?
thanks,
Sagnik
In Stata there is the xtbunitroot command for breakpoint unit root tests.
Question
My research aims to find out the determinants of FDI. I am doing a bound test to examine the long-run relationship, a cointegration test, and other diagnostic tests.
Question
Dear Colleagues,
I ran an Error Correction Model, obtaining the results depicted below. The model comes from the literature, where Dutch disease effects were tested in the case of Russia. My dependent variable was the real effective exchange rate, while oil prices (OIL_Prices), terms of trade (TOT), public deficit (GOV), and industrial productivity (PR) were independent variables. My main concern is that only the error correction term, the dummy variable, and the intercept are statistically significant. Moreover, the residuals are not normally distributed and are heteroscedastic. There is no serial correlation issue according to the LM test. How can I improve my findings? Thank you beforehand.
Best
Ibrahim
I notice the following about your specification. (1) Your inclusion of a constant (and its subsequent significance) means you allow for (and find) a trend in the real exchange rate independent of any trends in the other variables. Is that economically reasonable? (2) I assume the CRISIS variable is a zero-one dummy for time periods with a "crisis" of some sort. Apparently it is not in the cointegration vector. Why not? If it were, then I'd expect to find CRISIS differences in the error correction equation. Instead, you have it in levels. Thus you specify that a temporary crisis has a permanent effect on the level of the real exchange rate independent of the other variables. Is that what you intend? (3) You do not include the lagged difference of the real exchange rate in the error correction equation. Why not? Normally it would be there.
Question
I estimated an autoregressive model in EViews. I got a parameter estimate for one additional variable which I had not included in the model. The variable is labelled 'SIGMASQ'.
What is that variable and how do I interpret it?
I am attaching the results of the autoregressive model.
SIGMASQ is the sigma squared of the distribution of residuals, which serves as a proxy for the variance of the distribution of the dependent variable. This distribution is needed for the maximum likelihood method.
SIGMASQ is estimated in a second step, after estimating the parameters of the regressors.
By contrast, SE is the standard error of the regression, which is based on the average of the differences between the actual and fitted values of the dependent variable.
Question
Dear colleagues,
I applied the Granger Causality test in my paper and the reviewer wrote me the following: the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...
What can I respond in this case?
P.S. I had a small sample size and serious data limitation.
Best
Ibrahim
Ibrahim Niftiyev, probably the reviewer wants to see not only whether one variable affects the other (i.e. the results of the Granger causality tests), but also to what extent (the magnitude and temporality of the dynamic relationship, something you can obtain from the IRFs of a VAR model). If you want to apply a VAR but you have a small sample size/data limitations, you may want to consider a Bayesian VAR. Bayesian VARs are very popular and Bayesian methods are valid in small samples.
Question
Hello, dear network. I need some help.
I'm working on research using the event study approach. I have a couple of doubts about the significance of the treatment variable's lead and lag coefficients.
I'm not sure I am satisfying the pre-treatment parallel trends assumption: all the lags are statistically insignificant and are around the 0 line. Is that enough to satisfy the identification assumption?
Also, I'm not sure about the lead coefficients' significance and their interpretation. The table with the coefficients is attached.
Thank you so much for your help.
You may find this attached paper helpful.
Best wishes, David Booth
Question
Dear researchers,
I am working on formulating a hydrological model where runoff (the output variable) is available at a monthly time-step while rainfall (the input variable) is at a daily time-step.
I first wanted to explore mathematical models and techniques that can be used here. I have found the MIDAS regression method, which relates mixed-frequency data variables (output at a monthly time-step and input at a daily time-step). But the problem is that the variables in hydrological models are at the same time-step, so that technique will not work, because the MIDAS model relates variables sampled at different frequencies.
So can anyone suggest relevant literature in which both the output and input variables of the model are related at high frequency (say daily), but the model learns from low-frequency (monthly) output data and high-frequency (daily) input data?
Can you use daily input data to forecast daily output data, then cumulate to monthly? You will be smoothing the output forecasts, but observed output is already smoothed. Because output data are only available monthly, daily variation in output is not observable.
Question
The pairwise Granger causality test can be done using EViews. Is doing only this test reliable enough to explain causality? And is it only a long-run causality test, or a test of both long-run and short-run causality?
As pointed out by the answers given above Granger-causality represents a particular type of causality related to precedence. Granger-causality testing can also be used as part of a strategy to determine whether variables are weakly, strongly and/or super exogenous - see the reference below:
Engle, R.F., Hendry, D.F. and Richard, J.F. (1983) Exogeneity. Econometrica, 51, 277-304. http://dx.doi.org/10.2307/1911990
If you have more than two variables you should consider multivariate Granger-causality style testing using a VAR. You should also consider whether variables are stationary or nonstationary. If variables are stationary you can apply Granger-causality testing in a stationary VAR. If the variables are nonstationary and not cointegrated you can difference the variables and apply Granger-causality testing in a stationary VAR of the differenced variables. If the variables are nonstationary and are cointegrated you can apply Granger-causality testing in a VECM (that has short-run and long-run components) that assumes all the data are stationary by either cointegration or differencing transformations. If there is no cointegration the error-correction term can be excluded and testing is conducted by a Wald test in a stationary VAR. Alternatively, you can use the Toda and Yamamoto (1995) surplus lag Granger-causality test that uses nonstationary data. See Dave Giles' blog for a discussion of this - the link is given below.
This method means you do not need to consider the notions of long-run and short-run Granger-causality. This is useful if you simply wish to know whether Y Granger-causes X, for example. In a VECM, long-run Granger-causality is tested using the t-ratio on the error-correction term in each equation. Short-run Granger-causality uses Wald tests to set the coefficients on all lagged differences of one variable to zero in an equation.
Question
Hello everyone. I am using the VECM model and I want to use variance decomposition, but as you know, variance decomposition is very sensitive to the ordering of the variables. I read in some papers that it is better to use generalized variance decomposition because it is invariant to the ordering of the variables. I am using Stata, R and Eviews, and the problem is how to perform generalized variance decomposition. If anyone knows, please help me.
Question
My aim is to find out whether there is a significant relationship between FDI and its determinants. I am using a bound test and an error correction model.
Endashaw Sisay Sirah, I found unidirectional causality from the exchange rate to FDI here using pairwise Granger causality, but it is only for the long run, not the short run.
Question
I am running an ARDL model in EViews and I need to know the following, if anyone could help!
1. Is the optimal number of lags for annual data (30 observations) 1 or 2 OR should VAR be applied to know the optimal number of lags?
2. When we apply the VAR, the maximum number of lags applicable was 5, beyond 5 we got singular matrix error, but the problem is as we increase the number of lags, the optimal number of lags increase (when we choose 2 lags, we got 2 as the optimal, when we choose 5 lags, we got 5 as the optimal) so what should be done?
1. My first comment is that all cointegrating studies must be based on the economic theory (and common sense) of the system that you are examining. Your theory should suggest which variables are stationary, which are non-stationary, and which are cointegrated. Your ARDL, VECM, etc, analyses are then tests of the fit of the data to your theories. It is not appropriate to use these methodologies to search for theories that fit the data. Such results will give spurious results. Now suppose that you have outlined your theory in advance of touching your computer keyboard to do your econometrics.
2. You have only 30 annual observations. This is on the small size for an elaborate analysis such as this. It appears that you have one dependent variable and possibly 3 explanatory variables. If you have 5 lags you are estimating about 25 coefficients which is not feasible with 30 observations.
3. If you wish to use the ARDL methodology you must be satisfied that (1) there is only one cointegrating relationship and (2) that the explanatory variables are (weakly) exogenous. Otherwise, a VECM approach is required and you may also not have enough data for a VECM analysis.
4. Is it possible that you would use a simpler approach? Could you use graphs or a simpler approach to illustrate your economic theory? These are questions that you alone can answer. Advanced econometrics is not a cure for inadequate data and a deficit of economic theory.
5. If this is an academic project, consult your supervisor. While what I have set out above is correct, it may not be what your supervisor expects at this stage in your studies.
Question
In one of my papers, I applied the Newey-West standard error model to panel data for robustness purposes. I want to differentiate this model from the FMOLS and DOLS models. On what grounds can we justify this model over FMOLS and DOLS?
This estimator allows you to control for heteroskedasticity and serial correlation of the AR form, not every type of serial correlation. Clustered standard errors are robust to heteroskedasticity and any form of serial correlation over time.
Question
My research is based on foreign direct investment and its determinants. So, I need to see whether there is any significant relationship between the variables by looking at the p-values. Should I interpret all the variables, including the lagged ones?
When interpreting the estimated model, we extract all the information, based on the conventional levels of significance. The most important issue in your research is how your analysis relates to a real-world problem, that is, the economic justification behind the significant and non-significant results. Similarly, I suggest you estimate several different econometric models and then compare them based on economic theory and statistical analysis. This exercise will ultimately help you select a final, more efficient model.
Question
Dear colleagues,
I ran several models in OLS and found these results (see the attached screenshot, please). My main concern is that some coefficients are extremely small, yet statistically significant. Is this a problem? Can it be because my dependent variables are index values that range between -2.5 and +2.5, while I have explanatory variables measured, for example, in thousand tons? Thank you beforehand.
Best
Ibrahim
Dear Mr. Niftiyev, your dependent variable varies between -2.5 and +2.5. Hence, it may be better to employ a Tobit, Probit or Logit approach if possible. The choice among these three approaches depends mostly on the distribution patterns of the variables.
Question
I have GDP and MVA data, and though the MVA is stationary, the GDP is non-stationary even after log transformation followed by detrending followed by differencing. I want to build a VAR/VEC model for ln(GDP) and ln(MVA), but this data has been haunting me for the past 3 days. I also tried both methods of differencing, i.e. linear regression detrending and direct differencing, but nothing seems to work.
Also, they (ln GDP and ln MVA) satisfy the cointegration test, and the trends are very similar. But for VAR/VEC I will need them to be I(1), which is not the case. Any suggestions on how to handle this data will be highly appreciated!
I have attached a snapshot of the data and also the data itself.
If the time series is short, you can convert the data into quarterly data; it will give better results.
Question
I would like to employ the within transformation in panel data analysis. Market value added is the dependent variable. Various value drivers (advertising expenses, number of patents, etc.) are the explanatory variables. Is it appropriate to use standardized coefficients? Maybe a logarithmic form of the regression is more suitable?
Question
Dear Colleagues,
I noticed that when I estimate an equation by least squares in Eviews, under the options tab there is a tick mark for a degrees of freedom (d.f.) adjustment. What is its importance and role? When I estimate an equation without the d.f. adjustment, I get two statistically significant coefficients out of five explanatory variables; however, when I estimate with the d.f. adjustment, I do not get any significant results.
Thank you beforehand.
Are you attempting prediction, or are you trying to estimate some form of "causal" relationship? If you are estimating a "causal" model, your conclusions are conditional on the model estimated. Strictly speaking, it would be better to use the adjusted degrees of freedom, particularly with your small sample. In this case, a non-significant coefficient does not necessarily imply that the coefficient is truly zero. It is more likely that your sample is too small to establish a significant result. Its p-value must not be very far from your significance level. If the estimate is of the sign and magnitude expected by theory, I would accept the value and report the p-value. Estimating 5 coefficients is a big demand from a sample of 23 observations.
If you are simply doing prediction or forecasting and are not attributing explanatory power to your coefficients you might be better with a simpler model which might have better predictive ability.
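The d.f. adjustment being discussed is simply whether the residual variance is computed as RSS/n or RSS/(n - k). With the thread's figures (23 observations, five regressors plus an intercept, so k = 6), the adjusted standard errors are larger by a factor of sqrt(23/17), which is enough to push borderline coefficients out of significance. A small numerical sketch:

```python
# Hedged sketch: the d.f. adjustment divides RSS by (n - k) instead of n,
# inflating standard errors in small samples (here n = 23, k = 6, as in
# the thread's example).
import numpy as np

rng = np.random.default_rng(8)
n, k = 23, 6                                  # 5 regressors + intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
rss = resid @ resid
xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))
se_unadj = np.sqrt(rss / n * xtx_inv_diag)        # no d.f. adjustment
se_adj = np.sqrt(rss / (n - k) * xtx_inv_diag)    # with d.f. adjustment
print("ratio adj/unadj:", se_adj[1] / se_unadj[1])  # sqrt(23/17), about 1.16
```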
Question
I built a nested logit model. Level 1: 8 choices; level 2: 22 choices.
In type 4, only one choice at level 1 corresponds to one choice at level 2.
The dissimilarity parameters are equal to 1 in this case (not surprising).
Can I run the model normally when I have an IV parameter that is equal to one?
Can the results be interpreted normally, or what should I do in this case?
I tried the command "constraint 1=[type4_tau]_cons=1" but the model does not run.
What can I do?
Good question
Question
I am trying to run a regression of a Cobb-Douglas function.
The problem is that my dataset captures each firm at a single point in time,
so although the data cover the period 1988-2012,
each firm appears only once!
(I cannot tell whether it is panel, time series, or cross-section data.)
I want to find the effect of labor and capital on value added.
I have information on intermediate inputs.
I use two methods: Olley & Pakes and Levinsohn-Petrin.
But Stata keeps telling me that there are no observations!
My command:
levpet lvalue, free(labour) proxy(intermediate_input) capital(capital) valueadded reps(250)
Why is the command not working and reporting that there are no observations?
(Is this due to the fact that each firm appears only once in the data?)
(If yes, what are the possible corrections for simultaneity and selection bias in these data?)
Mina
I agree with Anton.
Question
Dear All,
I have panel data that fits a difference-in-differences design.
I regress (Bilateral Investment Treaties, BIT) on (bilateral FDI). BIT is a dummy taking 1 if a BIT exists and zero otherwise, while bilateral FDI is the amount of FDI between the two economies. Objective: examine whether BITs enhance bilateral FDI.
The issue is that each country started its BIT with its partner country at its own fixed time (different from the others): there is no single treatment time for the whole dataset.
I am willing to assume different time periods in a random way and run my Diff in Diff (for robustness):
Year 2004
Year 2006
Year 2008
My questions :
(1) Do you think this method is efficient?
(2) Any suggestions on the random selection of time periods?
Interested
Question
I am interested to know the difference between 1st, 2nd and 3rd generation panel data techniques.
First-generation panel data analyses often assume cross-sectional independence, i.e., a shock to a variable in one country will not have any effect on other countries' variables. However, as a result of globalization and related cross-national interlinkages, it is apparent that a problem in country A can affect country B. Most conventional panel estimators, such as fixed effects, random effects, and pooled OLS, fall into this category. In order to correct the bias in the estimates of first-generation panel analysis, second-generation panel methods were developed. These methods appropriately incorporate cross-sectional dependence into the modelling and include estimators such as CCEMG, CS-ARDL, CUP-FM, and so on.
Question
I am currently trying to estimate the effect of energy crises on food prices. Given the link between energy and food prices, I am inclined to reason that ECM will be best to estimate the relationship between food price and energy price (fuel price). Additionally I would like to include dummy variables in the model to estimate the effects of periods of energy crises on food prices. This I know is simple to do.
Where I am confused is how to model price volatility in the context of an ECM. I am only interested in the direction in which fuel price, as well as the structural dummies for energy crises, influences not just the determination of food prices but their volatility as well.
Hello. I hope you are doing well. It seems to me that in this case you can use the NARDL model to check the asymmetric impact of price volatility alongside the ECM. Moreover, to check the causal direction, you can apply the Toda-Yamamoto test.
Good luck
Question
Can anyone help me to carry out mean group analysis and pooled mean group analysis. I have used Microfit and Eviews before. Appreciate if I can get some advice on how to use these panel data methods in Microfit, Eviews and STATA.
Does anyone know if you can run the MG estimator in Eviews? (The Hausman test rejects the null hypothesis, therefore PMG is not efficient, and I need to use the MG estimator.)
Question
Hello,
I am estimating a bivariate probit model, where the errors of the two probit equations are correlated and therefore not independent. However, I suspect that one of the explanatory variables of both models may also cause endogeneity problems. My question is whether there is a perhaps two-stage procedure to correct this situation? Instrumental variables maybe? Could you suggest literature on this problem?
You know best what you are estimating but, in principle, age is one of the few exogenous variables. The endogeneity/exogeneity of a variable is something that can mainly be assessed rhetorically. At best, and I suppose this is what you are doing, you can compare IV with OLS through a Hausman test (some people call that an endogeneity test), but it is subject to the assumption that the IV is valid.
Since this relationship has not been estimated before, I would suggest performing the analysis without IVs. Then, if possible, try to find an IV showing similar results to defend your analysis as a robustness check.
Best,
José-Ignacio
Question
Dear All,
I would like to perform event study analysis through website: https://www.eventstudytools.com/.
Unfortunately they ask for data to be uploaded in a format I don't understand; I don't know how to put the data in this form, and I can't find a user manual or an email address to contact them.
Can anyone kindly advise how to use this service and explain it in a plain easy way?
Ahmed Samy
Question
Dear All,
I'm conducting an event study for a sample of 25 firms that each went through a certain yearly event (inclusion in an index).
(The 25 events are collected from the last 5 years.)
I'm using daily abnormal returns (AR) on prices, and averaged the daily returns across the 25 firms to get daily "Average Abnormal Returns" (AAR).
Estimation Window (before the event)= 119 days
Event Window = 30 days
1- I tested the significance of the daily AAR through a t-test and the corresponding p-value. How can I calculate the statistical power for those daily p-values?
(significance level used = 0.05, 2-tailed)
2- I calculated "Cumulative Average Abnormal Returns" (CAAR) for some period in the event window and performed a significance test for it with a t-test and the corresponding p-value. How can I calculate the statistical power of this CAAR significance test?
(significance level used = 0.05, 2-tailed)
Thank you for your help and guidance.
Ahmed Samy
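In case it helps, once you have an effect estimate and its standard error, the power of a two-sided test can be approximated with a normal approximation (the exact calculation would use the noncentral t distribution). A minimal Python sketch with purely hypothetical numbers:

```python
from statistics import NormalDist

def approx_power(effect, se, alpha=0.05):
    """Normal-approximation power of a two-sided test for an effect of size
    `effect` estimated with standard error `se`. A rough stand-in for the
    exact noncentral-t computation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    lam = abs(effect) / se               # approximate noncentrality
    return nd.cdf(lam - z_crit) + nd.cdf(-lam - z_crit)

# Hypothetical numbers: an AAR of 0.4% with a standard error of 0.15%.
print(round(approx_power(0.004, 0.0015), 2))  # about 0.76
```

The same formula applies to the CAAR test, using the CAAR estimate and its standard error over the aggregation period.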
Question
The original series is nonstationary as it has a clear increasing trend and its ACF plot gradually dampens. To make the series stationary, what optimum order of differencing (d) is needed?
Furthermore, if the ACF and PACF plots of the differenced series do not cut off after a definite value of lags but have peaks at certain intermittent lags. How to choose the optimum values of 'p' and 'q' in such a case?
You can use auto.arima function from 'forecast' package for R.
Alternatively, if you have many observations, you can try out-of-sample comparison of alternative models with different values of d.
To compare alternative models, you can use the instructions described here:
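As a complement, a crude rule of thumb (not a substitute for the unit-root tests that auto.arima runs internally) is to difference until the sample variance stops falling, since over-differencing typically inflates the variance. A hypothetical Python sketch:

```python
import numpy as np

def choose_d_by_variance(y, d_max=2):
    """Heuristic: pick the differencing order d (0..d_max) whose differenced
    series has the smallest sample variance. Over-differencing typically
    inflates the variance, so the minimum is a rough guide for d."""
    variances = {d: float(np.var(np.diff(y, n=d))) for d in range(d_max + 1)}
    return min(variances, key=variances.get)

# Hypothetical example: a random walk with drift should need d = 1.
rng = np.random.default_rng(42)
y = np.cumsum(0.5 + rng.normal(size=400))
print(choose_d_by_variance(y))  # 1
```

For the p and q orders with intermittent ACF/PACF peaks, comparing candidate (p, d, q) models by AIC/BIC or out-of-sample error, as suggested above, is the safer route.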
Question
I have seen that some researchers just compare the difference in R² between two models: one in which the variables of interest are included and one in which they are excluded. However, in my case this difference is small (0.05). Is there any method by which I can be sure (or at least have some support for the argument) that this change is not just due to luck or noise?
A partial F-test will be useful here. After the 1st variable is in, you add the other variables one at a time. After the 2nd variable is added, you have your y-variable as a function of 2 variables, giving a model with 2 d.f. and a certain sum of squares (SS). From the 2-variable SS subtract the 1-variable SS; that change in SS has a cost of 1 d.f. So the extra-variable SS divided by 1 is the change in regression mean squares (regMS). Further, divide the 2-variable residual SS (RSS) by the 2-variable residual d.f. to get the residual mean squares (resMS). Now divide the change in regMS by resMS to get the partial F-value, and look up the probability of that partial F-value in tables. If it is significant, keep the 2nd variable in and do the same for any further independent variable you may want to add to your model. Adjusted R² = 100·[1 − {(resSS/resDF)/(totalSS/totalDF)}].
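The steps above can be sketched numerically. A minimal Python illustration with simulated (hypothetical) data, computing the partial F for adding a second regressor:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical simulated data: y depends on x1 and, more weakly, on x2.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])        # restricted model (x1 only)
X2 = np.column_stack([np.ones(n), x1, x2])    # model with the extra variable

rss1, rss2 = rss(X1, y), rss(X2, y)
df_res = n - X2.shape[1]                      # residual d.f. of the full model
partial_F = (rss1 - rss2) / (rss2 / df_res)   # change in SS (1 d.f.) / resMS
print(round(partial_F, 1))
```

Compare `partial_F` with the tabulated critical value (about 3.94 at the 5% level for 1 and 97 d.f. here); if it is larger, keep the added variable.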
Question
To illustrate my point I present you an hypothetical case with the following equation:
wage = C + 0.5*education + 0.3*rural_area
Where the variable "education" measures the number of years of education a person has and rural area is a dummy variable that takes the value of 1 if the person lives in the rural area and 0 if she lives in the urban area.
In this situation (and assuming no other relevant factors affecting wage), my questions are:
1) Is the 0.5 coefficient on education reflecting the difference between (1) the mean marginal return of an extra year of education on the wage of an urban worker and (2) the mean marginal return of an extra year of education for a rural worker?
a) If my reasoning is wrong, what would be the intuition of the mechanism of "holding constant"?
2) Mathematically, how is that just adding the rural variable works on "holding constant" the effect of living in a rural area on the relationship between education and wage?
It means assuming that the other variables do not change, in order to allow an evaluation of the partial variation in the dependent variable due to variation in one independent variable alone, while the other variables are held fixed.
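A small simulation can make the "holding constant" intuition concrete. In the hypothetical model below, the rural dummy shifts the intercept between the two groups, while the education coefficient is the common within-group slope (the same in both areas):

```python
import numpy as np

# Hypothetical data-generating process: wage = 2 + 0.5*education + 0.3*rural + noise
rng = np.random.default_rng(0)
n = 5000
education = rng.integers(6, 20, size=n).astype(float)   # years of schooling
rural = rng.integers(0, 2, size=n).astype(float)        # 1 = rural, 0 = urban
wage = 2.0 + 0.5 * education + 0.3 * rural + rng.normal(0, 1.0, size=n)

# Regression WITH the rural dummy: two parallel lines, one per area.
# The education coefficient is the slope holding rural status fixed.
X = np.column_stack([np.ones(n), education, rural])
b = np.linalg.lstsq(X, wage, rcond=None)[0]
print(np.round(b, 2))  # approximately [2, 0.5, 0.3]
```

So adding the dummy does not estimate two different education slopes; it removes the level difference between areas so the education coefficient is not contaminated by it.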
Question
I am trying to learn the use of the augmented ARDL, but I could not find a command for augmented ARDL in Stata. Can anyone please point me to user-written code for augmented ARDL? Is there any good paper that describes the difference between the ARDL bounds test and the augmented ARDL procedure? I would be happy if you could answer these questions.
In this Augmented ARDL, I find there are three test to get confirmation for the long run cointegration; e.g, overall F test, t test on lagged dependent variable, F test on lagged independent variable.
1. How to find/calculate t-statistics for the lagged dependent variable?
2. How to find/calculate F-statistics for the lagged independent variable?
Using STATA, I find that the bound test produces two test statistics: F statistics and t-statistics. But both of them are for examining overall test for cointegration. How could I find t-statistics for lagged dependent variable and F statistics for lagged independent variable?
Thank you.
There is not much difference between the two at the level of estimation. At the level of testing, however, the augmented ARDL takes the testing further. For the ARDL bounds test, the dependent variable is required to be I(1). The augmented ARDL overcomes this with a number of additional tests that sort things out properly in order to avoid the degenerate cases.
Now, Eviews does not directly provide the testing procedure for the augmented ARDL. However, there is an add-in called NARDL on the Eviews page, written by yours truly. After estimating your model just the way you normally would, use the Make Testable Form option to generate an OLS version of the results. All the tests you need to carry out can then be done directly.
Note that a new set of critical values has been suggested. You need to use those critical values and NOT the ones generated by Eviews.
Hope that helps.
In view of this question, I'll soon add an example to my page.
Question
The paper, on which I am working, is a multivariate study. I am planning to use this model as it has two advantages:
1. It tests the stability of the long-term relationship across quantiles and provides a more flexible econometric framework.
2. It can explain the possible asymmetry in the response on one variable to changes in another variable.
Because of these two reasons, I prefer it over the NARDL.
As I am not good at Stata coding, any help with coding this method is highly appreciated.
Stata is not appropriate for QARDL; you can use GAUSS, R or MATLAB. Give me your email and I will send you the GAUSS code.
Question
Dear All,
I'm conducting an event study for the inclusion of companies in a certain index.
The event is the “inclusion event” for companies in this index for last 5 years.
For the events, we have yearly Announcement date (AD) for inclusions, and also effective Change Dates (CD) for the inclusion in the index.
Within the same year, I have aligned all companies together on the AD as day 0, and since they are companies from the same year, the CD will also align for all of them.
The problem comes when I try to aggregate companies from different years: although I aligned them all to have the same AD, the CD differs from one year to another, so the CDs do not align for companies from different years.
How can I overcome this misalignment of CDs across years, so that I am able to aggregate all the companies together?
Many Thanks.
Dear Prof. Raymond,
My aim is to see what happens to returns of stocks when they join a certain stock exchange index, do they generate abnormal returns?
I’m trying to study that for the last 3 years.
So the event is "joining the index", which happens with 2 dates: (1) the announcement date (AD), on which the stock exchange announces the news that those stocks will join the index, and (2) the change date (CD), the date on which those stocks are actually included in the index; this CD is decided and announced by the stock exchange on the AD.
I have attached a similar work for your kind reference.
Question
Dears,
I'm conducting an event study for the effect of a news announcement at a certain date on stock returns.
Using the market model to estimate the expected stock return in the "estimation window", we need to regress the returns of the stock under study on the returns of a market portfolio index.
1- How can we decide upon the market portfolio index to use in this regression?
Is it just the main index of the market?
The index of the sector to which the stock under study belongs? etc.
2- Is it necessary that the stock under study be among the constituents of this market index?
Many thanks
You can consider using the country's stock market index as a proxy for market portfolio.
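Whichever index you pick, the market-model mechanics are the same: estimate alpha and beta over the estimation window, then compute abnormal returns in the event window as actual minus predicted returns. A hypothetical Python sketch with simulated data (the 1% daily abnormal drift is made up for illustration):

```python
import numpy as np

# Simulated (hypothetical) daily returns for a stock and a market index.
rng = np.random.default_rng(7)
T_est, T_evt = 119, 30
r_m = rng.normal(0.0005, 0.01, T_est + T_evt)                    # market index returns
r_i = 0.0002 + 1.2 * r_m + rng.normal(0, 0.005, T_est + T_evt)   # stock returns
r_i[T_est:] += 0.01          # inject a 1% abnormal drift in the event window

# Estimate the market model (alpha, beta) on the estimation window only.
X = np.column_stack([np.ones(T_est), r_m[:T_est]])
alpha, beta = np.linalg.lstsq(X, r_i[:T_est], rcond=None)[0]

# Abnormal returns in the event window: actual minus model-predicted.
ar = r_i[T_est:] - (alpha + beta * r_m[T_est:])
car = ar.sum()
print(round(car, 3))  # roughly 0.30 (= 30 days x 1%)
```

Note the regression uses only the estimation window, so the event itself does not contaminate the benchmark.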
Question
I am currently assisting on a research on cross border capital flows.
A common problem seems to be that both the acquisition of assets and valuation effects determine the cross border asset holdings as , for example, reported in the CPIS data. Hobza and Zeugner use the BoP statistics on portfolio investments to derive valuation effects on portfolio debt and equity (change in asset holdings minus acquisitions) (2014).
I am wondering if the valuation effect could also be estimated because I do not only want to distinguish between portfolio debt and equity but also between different types of instruments.
For instance, between different debt maturities.
Valuation effects depend on inflation, exchange rates and liquidity (on how tightly held the asset is). Different financial markets have different levels of liquidity.
Question
Dear community,
I am struggling with statistics for price comparison.
I would like to check if the mean market price of a given product A differs over two consecutive time periods, namely from December till January and from February till March.
My H_0 would be that means are equal and H_1 that from Feb till March the mean is lower.
For this I have all necessary data as time series, sampled at the same frequency.
I thought of using the paired t-test, yet price distribution is not normal (extremely low p-value of Shapiro-Wilk test).
I guess that the two random samples of two groups cannot be treated as independent, as my intuition is that price in February would depend on price in January.
Do you know any test that would fit here? Given the nature of the problem?
Two-sample t-Test.
Question
I would like to estimate changes in the Gini index over 15 years. Should I apply a log transformation to the independent variables? The raw data for some independent variables are not normally distributed.
Sincerely,
A.
If you want to build a parametric model, yes, you probably will need to transform some of your variables. You can even use a more general power transformation with optimal parameters so that the CLRM assumptions become plausible. I think that the Gini coefficient itself can be transformed in order to ensure better statistical properties of your dependent variable.
Question
Every generation seems to strive for a better future ignoring the immediate present betterment in real sense. Because time is essential to all activities and the results are only obtained at end irrespective of scale and magnitude or the span of it. Thus the present behaviour is in anticipation of a future outcome. In this way the society fails to address the present and gets trapped in this vicious cycle of future driven momentum and ignores the true future. By isolating the driving force itself, mankind should realize the present and secure the future by securing the present.
Is it not the economic problem of the society at this very juncture? Are we actually addressing sustainability?
Dear Sidheswar Patra.
Thanks for raising an interesting question. I think that it is true that the current society is sacrificing the future of incoming generations. The current world is more governed, say, by the gospel of war, money, greed, competition, lack of respect for the environmental problems and corruption than by the gospel of peace, ethical principles, empathy, cooperation, honesty, and concern with environmental problems. All of this compromises a sustainable development and the future of future generations. We all have the responsibility to make of this world a better place to live in.
Question
I'm working with life satisfaction as my dependent variable and some other independent variables that measure purchasing power (consumption, income and specific expenditures). To take into account the diminishing marginal returns of these last variables (following the literature), I transformed them into their natural logarithms. However, now I want to compare the size of the coefficients on specific expenditures with those on consumption and income. Specifically, I would like some procedure which allows me to interpret the result like this: 1 unit of resources directed to a type of expenditure (say culture) is more/less effective at improving life satisfaction than the effect that this same unit would have under the category of income. If I just do this without the natural logarithm (that is, expressed in dollars), the coefficients change in counterintuitive ways, so I would prefer to avoid this.
I was thinking about using beta coefficients, but I don't know if it makes sense to standardize an already logarithmic coefficient.
I am not sure, Santiago, that I follow what you said. Elasticities can be used, and beta weights can be used. If I understand you, I would interpret an elasticity as follows: a 1% increase in the RHS variable changes the regressand by the estimated sign and coefficient of the RHS, e.g. LnY = 2 − 0.5·Ln(X); here a 1% increase in X decreases Y by 0.5% on average.
Question
Hello everyone,
Please, could somebody point me in the right direction for analyzing salaries? I asked students about their salary expectations in 6 different situations, so I got 6 independent groups. I then asked about gender, age, employment status, course and GPA.
I have to 1) compare some groups and identify any side-effects and 2) select 1 group and find out the impact of gender, GPA, etc. on salary expectations. But my problem is that, although I have quite a lot of data, my distribution is strange. The more I read, the more confused I get.
For example, I have 2 groups N1=175, N2=202
How can I compare salary expectations in this groups? I read that “overlapping kernel density plots can be a powerful way to compare groups”. Ok I can overlap them but then what? I want to have also some quantitative result in salaries, not just a visualization.
• 1) - What do you think about a) Quantile regression or b) Kernel regression in this case?
Or maybe there is some other way and I don’t see it.
Thank you for reading my question!
#experimental_psychology
Everything discussed above cannot yield a meaningful answer because heteroskedasticity has been ignored. The means, whether tested by independent-sample or paired approaches, will lead to wrong inference because of the nonconstant variance.
Question
My observations are points along a transect, irregularly spaced.
I aim at finding the distance values that maximize the clustering of my observation attribute, in order to use it in the following LISA analysis (Local Moran I).
I iteratively run the Global Moran's I function with PySAL 2.0, recreating a different distance-based weight matrix (binary: 1 for neighbors, 0 for non-neighbors) with a search radius 0.5 m longer at every iteration.
At every iteration, I save z_sim,p_sim, I statistics, together with the distance at which these stats have been computed.
From this information, what strategy is best to find distances that potentially show underlying spatial processes that (pseudo-)significantly cluster my point data?
• Esri style: ArcMap Incremental Global Moran I tool identify peaks of z-values where p is significant as interesting distances
• Literature: I found many papers that simply choose the distance with the higher absolute significant value of I
CONSIDERATIONS
Because the number of observations in the neighborhood changes with the search radius, the weight matrix also changes, so the I values are not comparable.
Hi everyone,
after a little research, I finally came up with the answer I was looking for.
When using the Global Moran's I index (I) with incrementally increasing distance searches (thus changing the weight matrix at every iteration), only the z-values are independent of both the weight matrix and variations in variable intensity; thus, they are comparable across multiple analyses.
The I in Moran's I statistics is not comparable across analyses; i.e., if with a distance of 10 m I=0.3 and with a distance of 15 m I=0.6, we cannot say that at a distance of 15 m the clustering strength is double.
We could only say that in both cases there is a positive (sign of the I) spatial autocorrelation.
For the strengths, we use the z-values.
That is why ESRI plots distances on the x-axis and z-values on the y-axis, indicating significant peaks (p-value less than the specified significance level) as interesting distances.
For more information, it is clearly explained by Luc Anselin in his Global Autocorrelation class, given in 2016 at the University of Chicago.
Enjoy!
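To make the comparability point concrete, here is a minimal pure-NumPy sketch (hypothetical data, not PySAL itself) of Global Moran's I with a binary distance-band weight matrix and a permutation-based z-value, the quantity that can be compared across search radii:

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I: (n/S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def z_sim(y, W, n_perm=999, seed=0):
    """Permutation-based z-value of Moran's I (analogous to PySAL's z_sim)."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(y, W)
    sims = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    return (i_obs - sims.mean()) / sims.std()

# Hypothetical 1-D transect: points every 1 m with a smooth spatial trend.
coords = np.arange(30, dtype=float)
y = np.sin(coords / 5) + np.random.default_rng(1).normal(0, 0.2, 30)

d = np.abs(coords[:, None] - coords[None, :])
W = ((d > 0) & (d <= 3.0)).astype(float)   # binary band: neighbors within 3 m
print(morans_i(y, W) > 0, z_sim(y, W) > 2)  # strong positive autocorrelation
```

Rebuilding `W` with a larger band and recomputing `z_sim` reproduces the incremental-distance scan; the z-values across bands are the comparable series.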
Question
In order to analyze if there is a mediation effect using Baron & Kenny's steps, is it necessary to include the control variables of my model, or is it enough to do the analysis just with the independent variable, the mediator variable and the dependent variable of my interest?
"I don't have the theory to include more control variables that may me important for this model." -- so this is a statement. You know the field and the arguments why or why not a variable might have to be considered. You state that, at the best of your knowledge. So you can defend it and clearly point the advantages and limitations of your model. If reviewers (and, later, readers) have different ideas, they are invited to discuss it.
Question
I have a dummy variable as the possible mediator of a relationship in my model. Reading Baron and Kenny's (1986) steps, I see that in the second one you have to test the relationship between the independent variable and the mediator, using the latter as a dependent variable. However, normally you would not use OLS when you have a dummy as the dependent variable. Should I use a probit in this case?
Now I understand. Please look at the definition of dummy variable. What you mean seems to me to be a categorical variable. BTW Actually I am not. I just answer these questions because I enjoy the enlightened repartee. If you want to know my background you are welcome to click on my name where a number of my papers, affiliations, my degrees, etc are included. May the force be with you. D. B.
Question
In my investigation about the determinants of subjective well-being (life satisfaction), I have some variables that measure access to food and also other variables that measure affect (whether, in the last week, the interviewee felt sad/happy, for example). These variables don't show high simple Pearson correlations nor high VIF values. In experimenting with different models (including and excluding some variables), I see that access to food has a positive and significant coefficient, except in the models in which the affective variables are included. Can I make the case that this is because the affective variables are mediating the effect of access to food on life satisfaction? I also tried an interaction between access to food and the affective variables, but it is not significant.
Thank you Paul!
And yes, David. What can you say about the question if you re-read it changing the word variable with "coefficient of the variable"?
Question
Reading Wooldridge's book on introductory econometrics I observe that the F test allows us to see if, in a group, at least one of the coefficients is statistically significant. However, in my model I have that, individually, one of the variables of the group I want to test is already statistically significant (measured by the t-test). So, if that is the case I expect that, no matter with which variable I test for, if I include the one that is already individually significant, the F test will also be significant. Is there any useful usage I can make with the F test in this case?
Hello Santiago and Calleagues,
A t-test for one variable is identical to an F-test in a simple regression model (with one explanatory variable); as a matter of fact, in this case the square of the t-statistic is exactly equal to the F-statistic (t² = F).
Once you include additional independent variables, the F-test is the one you rely on to report the results. If you have a significant F-test, then you report the regression results and check which explanatory variables are significant using t-tests. However, if the F-test is insignificant, you stop there and do not report the regression. You need to find another model that at a minimum passes the F-test. This is a specification issue.
Kind regards,
George K Zestos
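The t² = F identity in the simple regression case can be verified numerically. A small Python sketch with simulated (hypothetical) data:

```python
import numpy as np

# Simulated one-regressor model: y = 1 + 0.8*x + noise.
rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, res_ss = np.linalg.lstsq(X, y, rcond=None)[0:2]
rss_val = float(res_ss[0])                      # residual sum of squares
tss = float(((y - y.mean()) ** 2).sum())        # total sum of squares

# t-statistic of the slope coefficient
s2 = rss_val / (n - 2)
se_slope = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
t_stat = beta[1] / se_slope

# F-statistic of the regression (1 and n-2 d.f.)
F_stat = ((tss - rss_val) / 1) / (rss_val / (n - 2))
print(np.isclose(t_stat ** 2, F_stat))  # True
```

The identity holds exactly (up to floating-point error), since the regression SS equals the squared slope times the centered sum of squares of x.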
Question
I have a rich database with almost 29 000 observations and I want to run an analysis with more than 50 variables. What problems can arise from this situation? Can the model be overfitted? If so, why?
Yes, your model can be overfitted. You can think of overfitting in several ways, but let us take two different avenues. First, the number of relevant variables. Imagine that the truly correct model has only 30 of the total of 50 variables that you happen to have. Whatever method you use to identify the correct variables in your model can lead you to "false discoveries". This is closely related to the type I error in statistical inference. You can be very strict with the type I error, but it will never be zero. So you are admitting the possibility of false discoveries. These false discoveries have more chances to occur the more variables, and transformations of variables, you try over the same sample. You mention that you have 50 variables... but what about using the squares of these variables, or their logs, or products of pairwise combinations of them? The more combinations you try over the same sample, the more likely it is that you end up with false discoveries: variables that are not in the model... but your statistical method is unable to detect that they are not.
Second, imagine that the true share of variance that can be explained in your model is 50%. So, this is the best R2 you can get if you could identify the correct model. Now, because you are trying to find the correct variables using the same sample over and over again, you end up with a sample R2 of 75%. You again have an overfitting issue induced by the data mining process.
Large N helps but a lot of the overfitting problem relies on the repeated use of the same sample to find a correct model.
This is a key topic....hope it helps!
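The false-discovery point is easy to demonstrate by simulation. In this hypothetical sketch, y is pure noise, unrelated to all 50 regressors, yet a few of them will typically appear "significant" at the 5% level (about 0.05 × 50 ≈ 2-3 on average):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 300, 50
X = rng.normal(size=(n, k))
y = rng.normal(size=n)          # y is unrelated to every column of X

# OLS with an intercept; t-statistics from s2 * (X'X)^-1.
Xc = np.column_stack([np.ones(n), X])
beta, res_ss = np.linalg.lstsq(Xc, y, rcond=None)[0:2]
s2 = float(res_ss[0]) / (n - k - 1)
cov = s2 * np.linalg.inv(Xc.T @ Xc)
t_stats = beta / np.sqrt(np.diag(cov))

# Count the spurious "discoveries" among the 50 irrelevant regressors.
n_false = int((np.abs(t_stats[1:]) > 1.96).sum())
print(n_false)  # around 2-3 expected on average at the 5% level
```

Repeating the search over transformations of the same 50 columns would only inflate this count further, which is the data-mining point made above.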
Question
I have reported life satisfaction as my dependent variable and many independent variables of different kinds. One of them is the area in which the individual lives (urban/rural), and another is access to publicly provided water service. When the area variable is included in the model, the second variable is not significant. However, when it is excluded, the public service variable gains enough significance for a 95% level of confidence. The two variables are moderately and negatively correlated (r = −0.45).
What possible explanations do you see for this phenomenom?
Mr. Valdivieso, I think multicollinearity is almost certainly the problem. Try the simple VIF test. Then try changing the way you measure the variables Area and Access. Good luck.
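In case it helps, VIFs are straightforward to compute by hand: regress each explanatory variable on the others and take 1/(1 − R²). A hypothetical Python sketch (note that with r = −0.45 the VIF is only about 1/(1 − 0.45²) ≈ 1.25, so VIF can look unremarkable even when two variables overlap substantially in meaning):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (no constant column):
    1 / (1 - R^2) from regressing that column on all the others."""
    n, k = X.shape
    out = []
    for j in range(k):
        yj = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(Z, yj, rcond=None)[0]
        resid = yj - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((yj - yj.mean()) @ (yj - yj.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical data: x2 is strongly (negatively) correlated with x1.
rng = np.random.default_rng(5)
x1 = rng.normal(size=500)
x2 = -0.9 * x1 + rng.normal(0, 0.3, 500)
x3 = rng.normal(size=500)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # high, high, near 1
```

A common rule of thumb flags VIFs above 5 or 10; here x1 and x2 are flagged while x3 is not.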
Question
I'm studying the determinants of subjective well-being in my country, and I have reported satisfaction with life as my dependent variable and almost 40 independent variables. I ran multicollinearity tests and didn't find values bigger than 5 (in fact, just two variables had a VIF above 2). Also, my N = 22 000, so I don't expect to have an overfitted model. Actually, at the beginning, all was going well: the variables maintained their significance and the values of their coefficients when I added or deleted some variables to test the robustness of the model, and the adjusted R² increased with the inclusion of more variables.
However, I finally included some variables that measure satisfaction with specific life domains (family, work, profession, community, etc.), and that is when the problem started: my adjusted R² tripled, and the significance and even the signs of some variables changed dramatically, in some cases in a counterintuitive way. I also tested multicollinearity and the correlation of these variables with the other regressors, and I didn't find this to be a problem.
The literature says that it is very likely that there are endogeneity problems between satisfaction with life domains and satisfaction with life, since it is not so much the objective life conditions that affect life satisfaction as the propensity to self-report satisfaction. Can this be the cause of my problem? If so, how?
PD: I'm not trying to demonstrate causality.
Hi Santiago,
I believe that what you write is correct; there might be a problem of endogeneity. As you seem to be dealing with subjective indicators, there might be self-reporting biases involved. For instance, respondents who tend to be more optimistic would report a higher score for overall life satisfaction but also for the different dimensions that you look at. Further, those scores might be affected by the current mood of the respondent, etc.
I think this problem becomes worse if you include these self-reported variables as both predictors and outcomes in your model. Maybe addressing these two elements in separate models might be a solution (one model with overall life satisfaction and your 40 independent variables, and one model looking at the different dimensions of life satisfaction and their impact on overall satisfaction).
I had a quick glance at the literature. Pacheco & Lange address this question of endogeneity: https://www.emerald.com/insight/content/doi/10.1108/03068291011062489/full/html
Question
Gini index is used as a measure of inequality in the income distribution of a nation. However, there might be cases where the income of a person is negative (debts, etc.). In such scenarios, how do we proceed to calculate the Gini index?
The Gini is surely not "location invariant". Just think of the following example:
10 people earning 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 respectively. The last person holds 10/55 = 18.2% of the wealth, and the first person holds 1/55 = 1.8%.
Now give everyone 90 units each.
The richest person now has 100/955 = 10.5% of the wealth, and the poorest person holds 91/955 = 9.5%. A completely different relative income distribution, resulting in a completely different Gini.
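The example can be checked in a few lines of Python, using a standard ordered-sample formula for the Gini (note that the poorest person's share after the transfer is 91/955 ≈ 9.5%):

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative income vector, via the
    standard formula on the ordered sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return float((2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum()))

incomes = np.arange(1, 11)           # incomes 1, 2, ..., 10
print(round(gini(incomes), 3))       # 0.3
print(round(gini(incomes + 90), 3))  # 0.017: the same lump sum to everyone
                                     # shrinks the Gini dramatically
```

This mean-dependence is exactly why negative incomes are problematic: shifting the distribution to make all incomes non-negative changes the Gini itself.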
Question
I have read that it's relatively unproblematic to include dummy variables (for structural breaks for instance) in ARDL cointegration. How is this done? I have included them simply as independent variables (Chow and Bai-Perron tests suppport the break dates). Is that ok? I use Stata, with the Stata module "ARDL".
Yes, you can but only as an exogenous variable (just like you treat seasonal dummies).
Question
Insurance economic model
Look at the research conducted by IPA and Y-RISE on innovations to improve agricultural insurance. There are several research projects referenced; I am highlighting just two: Mobarak and Rosenzweig, 2013; Ward and Makhija, 2018.
Question
Hi,
There are arguments for and against adjusting data for seasonality before estimating a VAR model (and then testing Granger causality). I have monthly tourist arrival data for three countries (for 18 years) and am interested in spill-over effects or causality among the arrivals. I would appreciate your views on the following.
1. Is seasonal adjustment compulsory before estimating a VAR?
2. If I take 12-month seasonally differenced data without adjusting for seasonality, will it be okay?
Kind regards
Thushara