Science topic

Panel Data - Science topic

Explore the latest questions and answers in Panel Data, and find Panel Data experts.
Questions related to Panel Data
  • asked a question related to Panel Data
Question
4 answers
I can't work out whether it is necessary to standardize the variables in my panel data to compare the coefficients properly, since the variables are measured on different scales (e.g., currencies, counts). If standardization is needed, can I apply ordinary z-score standardization? My dataset covers 3 sectors and 8 years.
Relevant answer
Answer
Yes, you can.
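A minimal sketch of ordinary z-score standardization in Python/pandas (the package choice and column names are assumptions; the question does not specify software). Z-scoring puts currencies and counts on a common scale so coefficients can be compared:

```python
import pandas as pd

# Hypothetical long-format panel: sector-year rows with variables on different scales.
df = pd.DataFrame({
    "sector":  ["A", "A", "B", "B"],
    "year":    [2020, 2021, 2020, 2021],
    "revenue": [100.0, 120.0, 5000.0, 5200.0],  # a currency variable
    "patents": [2.0, 3.0, 40.0, 45.0],          # a count variable
})

# Ordinary z-score standardization: subtract the mean, divide by the
# standard deviation (pandas uses the sample standard deviation, ddof=1).
cols = ["revenue", "patents"]
zdf = df.copy()
zdf[cols] = (df[cols] - df[cols].mean()) / df[cols].std()
```

Whether to standardize over the whole sample or within each sector (via `groupby`) depends on the comparison you want to make.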
  • asked a question related to Panel Data
Question
2 answers
I have used the xtcd command to check for cross-sectional dependence. Before that, I tested for heterogeneity with a slope homogeneity test; the results show the presence of heterogeneity among the variables.
Relevant answer
Answer
Thank you, Dr. Nurul Islam, for this interesting question. Since your CSD and slope homogeneity tests found both cross-sectional dependence and heterogeneity, you can apply the CS-ARDL approach for long-run estimation. You can also apply AMG and CCEMG as robustness checks to make sure your results are valid.
  • asked a question related to Panel Data
Question
3 answers
I am trying to conduct an event study (green bond announcements) on listed companies.
I have collected the stock prices and a market price index, but unfortunately I am stuck because I don't know how to calculate CAR using R. Can anyone help?
Thanks in advance
Relevant answer
Answer
You can use unexpected earnings to calculate the cumulative abnormal return (CAR) and get significant results. I have done similar studies using EViews, but I am unsure about R.
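The question asks for R; as a hedged illustration of the mechanics only, here is the standard market-model event-study calculation in Python/NumPy (window lengths and return series are invented toy values, and the stock is noiseless so the abnormal returns come out at zero). The same three steps translate directly to a few lines of `lm()` in R:

```python
import numpy as np

# Toy daily returns; noiseless, so the market model fits exactly (illustration only).
rng = np.random.default_rng(0)
market = rng.normal(0.0005, 0.01, 130)
stock = 0.0002 + 1.2 * market              # stock follows the market model exactly

est, evt = slice(0, 120), slice(120, 130)  # 120-day estimation, 10-day event window

# 1) Estimate the market model R_stock = alpha + beta * R_market
#    on the estimation window (polyfit returns slope first, then intercept).
beta, alpha = np.polyfit(market[est], stock[est], 1)

# 2) Abnormal returns: actual minus predicted returns in the event window.
ar = stock[evt] - (alpha + beta * market[evt])

# 3) Cumulative abnormal return: sum the abnormal returns over the window.
car = ar.cumsum()
```

With real data `ar` would not be zero; the CAR around the announcement date is what the event study then tests for significance.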
  • asked a question related to Panel Data
Question
4 answers
We know that we cannot use PMG, MG, or DFE if there is cross-sectional dependence, but a CS-NARDL model has not yet been developed. Can we therefore use panel NARDL when there is cross-sectional dependence?
Relevant answer
Answer
If a data set exhibits both slope heterogeneity and cross-sectional dependence, you can apply the CS-ARDL method, and also apply AMG and CCEMG for robustness.
  • asked a question related to Panel Data
Question
1 answer
I am currently trying to perform (moderated) mediation analyses with a long panel data set in Stata.
I am using a SEM model and I am trying to follow UCLA (http://www.ats.ucla.edu/stat/stata/faq/sem_mediation.htm and http://www.ats.ucla.edu/stat/stata/faq/modmed.htm). The issue I have is that these approaches are describing cross sectional analyses. Bollen & Brand (2008, http://www.escholarship.org/uc/item/3sr461nd) explain an approach to do this, but I struggle with applying their advice to Stata.
Does anyone have information, links, advice, papers, etc. on how to approach this challenge?
Relevant answer
Answer
Have you looked at the following paper?
Cole, D. A., & Maxwell, S. E. (2003). Testing Mediational Models With Longitudinal Data: Questions and Tips in the Use of Structural Equation Modeling. Journal of Abnormal Psychology, 112(4), 558–577. https://doi.org/10.1037/0021-843X.112.4.558
  • asked a question related to Panel Data
Question
1 answer
Hi! How do I fix both serial and cross-sectional autocorrelation in panel data at the same time? I run a fixed effects model with PCSE errors (which takes care of cross-sectional autocorrelation) and one lag of the dependent variable (not significant).
But my Durbin-Watson statistic is about 1.7, which is too low according to the Bhargava, Franzini, and Narendranathan tables (which are very tight, at about 1.9 for the lower bound). Likewise, the Wooldridge test for autocorrelation in panel data is significant.
I tried a dynamic panel model with two-step estimation and an AR term, but regardless of the number of lags I get significant results for the Pesaran CD test for cross-sectional dependence.
Bottom line: it seems like I can't address serial and cross-sectional autocorrelation at the same time. Or can I? How?
  • asked a question related to Panel Data
Question
2 answers
Can anyone help me decide which method is better for analysing the relationship between board gender diversity and firm performance?
I've run Fixed Effect, Random Effect, and GMM as shown in attached photos
Relevant answer
Answer
Dear Huda,
Hello! To choose between the Fixed Effects and Random Effects models, a Hausman test should be conducted. If the p-value of the Hausman test is above 5% (Prob>chi2 > 0.05), you should choose the Random Effects model; otherwise, choose Fixed Effects. Just in case, the commands for conducting the Hausman test are below. Good luck!
xtreg y x1 x2 x3, fe
estimates store Fixed
xtreg y x1 x2 x3, re
estimates store Random
hausman Fixed Random
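The statistic behind those commands can also be written down directly. As a sketch in Python/NumPy (toy numbers, not from any real regression): H = (b_FE - b_RE)' [V_FE - V_RE]^(-1) (b_FE - b_RE), compared against a chi-squared distribution with as many degrees of freedom as coefficients:

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic: H = d' (V_FE - V_RE)^{-1} d with d = b_FE - b_RE."""
    d = b_fe - b_re
    h = float(d @ np.linalg.inv(v_fe - v_re) @ d)
    return h, chi2.sf(h, df=len(d))   # statistic and p-value

# Toy inputs: two coefficients, V_FE - V_RE = identity, so H = 0.1^2 + 0.2^2 = 0.05.
h, p = hausman(np.array([0.5, 1.0]), np.array([0.4, 0.8]),
               np.eye(2) * 2.0, np.eye(2))
```

A large H (small p-value) rejects the random effects specification, which is the decision rule stated in the answer.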
  • asked a question related to Panel Data
Question
2 answers
Dear Sir/Mam,
My data is panel data. May I please know whether I can use the Stata "medsem" package for mediation analysis with panel data?
I am also using the GMM method of regression analysis for the panel data, so should I use the log form of the variables for the mediation analysis? Kindly explain. Thank you.
Relevant answer
Answer
Thank you Sir
  • asked a question related to Panel Data
Question
6 answers
How can I identify whether a panel data model should be static or dynamic? And what techniques are used to analyse static versus dynamic panel data? Thank you
Relevant answer
Answer
One way to determine whether you have a dynamic model is to add lagged dependent variables and test whether they are significant. If the lagged dependent variables are significant you should use a dynamic model and a dynamic panel estimator; if not, you can use a standard panel (fixed effects) estimator.
GMM estimators are preferred to the (LSDV) fixed effects estimator when you have panel data and a dynamic model (one that includes a lagged dependent variable as a regressor). The GMM dynamic panel estimators are appropriate for large N and small T. For a broader discussion and how to apply GMM methods in Stata, see: Roodman, D. (2009). How to do xtabond2: An introduction to "Difference" and "System" GMM in Stata. The Stata Journal, 9(1), 86-136.
However, if T is large, GMM estimators can become unreliable because the number of instruments becomes large and the instrumented variables can be overfitted, so they may not remove the endogenous components of the lagged dependent variable(s) as intended. When N is not large and T is moderate, you may wish to use a bias-corrected LSDV estimator to deal with dynamic panel bias, although these assume that all variables other than the lagged dependent variable are strictly exogenous. To apply a bias-corrected LSDV estimator to a potentially unbalanced dynamic panel in Stata, see: Bruno, G. (2005). Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals. The Stata Journal, 5(4), 473-500.
An example paper that uses Bruno's estimator: Goda, T., Stewart, C. and Torres García, A. (2020) 'Absolute Income Inequality and Rising House Prices', forthcoming in Socio-Economic Review. An example application of GMM dynamic panel estimators: Matousek, R., Nguyen, T. N., and Stewart, C. (2017) 'Note on a non-structural model using the disequilibrium approach: Evidence from Vietnamese banks', Research in International Business and Finance, 41, pp. 125-135.
  • asked a question related to Panel Data
Question
1 answer
Dear all
I have a balanced panel data set with i = 6 and t = 21, i.e., 126 observations in total. I have 1 dependent variable (y) and 6 independent variables (x1, x2, ...).
First, the unit root tests show:
y I(1)
x1 I(0)
x2 I(1)
x3 I(1)
x4 I(0)
x5 I(1)
x6 I(0)
If I want to run panel data regressions (Pooled, Fixed Effects and Random Effects), is this the correct form for entering the model in EViews:
d(y) c x1 d(x2) d(x3) x4 d(x5) x6
or
should I put all variables at the same difference level, adding "d" to all of them?
Please correct me if I am wrong; these are the steps I would like to follow for the statistical part of a panel data analysis:
1. Unit root tests
2. Panel regression?
3. ARDL
Relevant answer
Answer
I am not sure what you mean here, but perhaps the attached screenshot may help you. The book is available for download from Z-Library. Best wishes, David Booth
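On the question itself: the asker's first form (differencing only the I(1) series) is the usual choice for a pooled/FE regression. As a hedged sketch of the bookkeeping in Python/pandas (toy data; which variables are I(1) comes from the unit-root tests), note that differencing must be done within each cross-section:

```python
import pandas as pd

# Toy balanced panel (id x year). Which variables are I(1) is an input
# taken from the unit-root tests, not something this snippet decides.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "y":    [1.0, 2.0, 4.0, 2.0, 3.0, 5.0],   # I(1) -> difference
    "x1":   [0.5, 0.6, 0.4, 0.3, 0.2, 0.4],   # I(0) -> keep in levels
})

i1_vars = ["y"]                # variables found to be I(1)
out = df.copy()
# First-difference within each cross-section so we never difference across units.
out[i1_vars] = df.groupby("id")[i1_vars].diff()
out = out.dropna()             # the first year of each unit is lost to differencing
```

This mirrors the EViews specification `d(y) c x1 d(x2) ...`: only the I(1) series enter in differences, and one observation per unit is lost.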
  • asked a question related to Panel Data
Question
4 answers
I have panel data (T=10, N=26) where all variables are integrated I(1) with cross-section dependence. I applied Westerlund test and found no cointegration. So I proceeded with Pvar (Panel var) estimation. However, I want to confirm the robustness of my analysis by applying another estimation technique. Any advice?
Relevant answer
Answer
You can try AMG and CCEMG for robustness.
  • asked a question related to Panel Data
Question
6 answers
Dear Sir/Mam,
I have tested for autocorrelation in my panel data using the Wooldridge test (the xtserial command). The p-value is less than 0.05, showing that autocorrelation is present in the data.
May I please know how to deal with autocorrelation in panel data using Stata? Thank you
Relevant answer
Answer
There are basically two methods to reduce autocorrelation, of which the first one is most important:
  1. Improve model fit. Try to capture structure in the data in the model. See the vignette on model evaluation on how to evaluate the model fit: vignette("evaluation", package="itsadug").
  2. If no more predictors can be added, include an AR1 model. By including an AR1 model, the GAMM takes into account the structure in the residuals and reduces the confidence in the predictors accordingly.
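Point 2 can be made concrete. Below is a minimal sketch (synthetic data, not the asker's) of the idea behind an AR(1) correction: estimate rho from the lag-1 autocorrelation of the residuals, then quasi-difference so that the transformed errors are approximately white noise. In Stata, xtregar or cluster-robust standard errors are the usual ready-made routes.

```python
import numpy as np

# Simulate AR(1) residuals e_t = 0.7 * e_{t-1} + u_t (toy, one panel unit).
rng = np.random.default_rng(1)
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + rng.normal()

# Estimate rho as the lag-1 autocorrelation of the residuals ...
rho = np.corrcoef(e[1:], e[:-1])[0, 1]

# ... and quasi-difference: e*_t = e_t - rho * e_{t-1} is ~white noise,
# so its own lag-1 autocorrelation should be near zero.
e_star = e[1:] - rho * e[:-1]
rho_star = np.corrcoef(e_star[1:], e_star[:-1])[0, 1]
```

In a regression setting the same transformation is applied to y and all regressors (Cochrane-Orcutt / Prais-Winsten style), within each panel unit.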
  • asked a question related to Panel Data
Question
7 answers
Hey members, I'm running quantile regression with panel data using Stata, and I find there are two options:
1- Robust quantile regression for panel data: standard, using qregpd
2- Robust quantile regression for panel data with MCMC: using adaptive Markov chain Monte Carlo
Can anyone please explain the use of MCMC to me? How can I analyse the output of robust quantile regression for panel data with MCMC? Thanks
Relevant answer
Answer
Thank you Dr. Mohamed-Mourad
  • asked a question related to Panel Data
Question
2 answers
Good morning
I am struggling a bit, as I have a panel data set with a dependent variable that varies over time and individuals, and an independent variable that is constant over time but varies across individuals: Yit = b0 + b1 * Xi.
Now my question is whether it is correct to run causal fixed effects panel regressions here. If I am correct, using fixed effects would "eliminate" the variable, as it computes Yit = b0 + b1 * (Xi - mean of Xi). This would eliminate exactly what I would like to analyze.
Relevant answer
Answer
Could you perhaps look at the time-varying dependent variable in terms of a growth curve model and use the time-invariant independent variable as a predictor of the growth (intercept and slope) factors?
Growth curve and related latent change score models with covariates/predictors of intercept and slope change) can be estimated either as hierarchical linear (multilevel, random coefficient regression) models or, equivalently, as structural equation/confirmatory factor models.
Within the multilevel (random coefficient regression) framework, you could examine time as a Level 1 predictor (with random regression coefficients) and your time-invariant independent variable as a Level 2 predictor of the random coefficients (intercept and/or slope).
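The asker's worry is easy to demonstrate numerically. A minimal sketch (toy data) of the within transformation shows why a time-invariant regressor drops out of a fixed-effects model:

```python
import pandas as pd

# Toy panel: x is constant within each individual but varies across individuals.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "x":  [3.0, 3.0, 3.0, 7.0, 7.0, 7.0],
})

# The fixed-effects (within) transformation subtracts each individual's mean ...
x_within = df["x"] - df.groupby("id")["x"].transform("mean")
# ... so a time-invariant regressor becomes identically zero:
# there is no within variation left to identify its coefficient.
```

Because x has no within-individual variation, b1 is not identified under fixed effects, which is why a growth-curve/multilevel approach (or a random-effects-type estimator) is needed for time-invariant predictors.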
  • asked a question related to Panel Data
Question
1 answer
What unit root test techniques can be used for unbalanced panel data with cross-sectional dependence? CADF seems to be OK, but CIPS seems not applicable. Is there any other method?
Relevant answer
Answer
Hi Yuning Cao, you can look up this recent paper: P. Chen, Y. Karavias, and E. Tzavalis (2022), Panel unit-root tests with structural breaks.
I hope it will help
  • asked a question related to Panel Data
Question
4 answers
Dear all,
I have a panel dataset consisting of string names of countries. I want to generate specific ID for each country.
E.g.
Country ----------ID
A----------------------1
A----------------------1
B----------------------2
B----------------------2
Relevant answer
Answer
encode Country, gen(countryID)
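For reference, the same thing in Python/pandas (an aside, since the question is about Stata, where either `encode` or `egen countryID = group(Country)` works):

```python
import pandas as pd

df = pd.DataFrame({"Country": ["A", "A", "B", "B"]})

# factorize assigns consecutive integers in order of first appearance
# (0-based, so add 1 to match the 1, 1, 2, 2 pattern in the question).
df["ID"] = pd.factorize(df["Country"])[0] + 1
```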
  • asked a question related to Panel Data
Question
3 answers
I am working on a paper assessing the impact of a law enacted in 2018 using difference-in-differences regression on panel data from 2014 to 2022. I want to know if there is any advice on how to control for 2020 drop in FDI which could affect the outcome of the study
Relevant answer
Answer
Including covid cases or deaths by time and country (or even a pre/post covid dummy variable) and interacting with variables of interest would be a good start. You are correct that this is a difficult problem for anyone studying impacts of events that coincide with the pandemic. Good luck!
  • asked a question related to Panel Data
Question
6 answers
Dear Sir/Mam,
My data is panel data, secondary data. The results show that fixed effects model is appropriate. But breusch pagan/ Cook Weisberg heteroskedastic test shows that data is heteroskedastic. Should the data be homoskedastic to continue multiple linear regression. If yes kindly provide me solutions. Thank you
Relevant answer
Answer
Heteroskedasticity is not a big problem in panel data. However, using fixed effects cannot solve endogeneity among the variables. In this world, its hardly to find pure exogenous regressors that favour fixed effects model.
I would strongly suggest to use dynamic panel model (Ardl_PMG), or system GMM where lagged dependent variable is used as a regressor in the modelling approach.
Thanks
  • asked a question related to Panel Data
Question
4 answers
I need your suggestions
Relevant answer
Answer
Thank you dear ❤️ Mohammed
  • asked a question related to Panel Data
Question
1 answer
Hello researchers, greetings.
Actually I want to run a panel data model in Stata. My panel data consist of a monthly time variable with 6 cross-sectional units. When I import my data into Stata, the time variable comes in as a string. When I generate a monthly time variable, it gets extended many periods ahead. Can anyone help me solve this problem?
Relevant answer
Answer
Hi,
You will have to give an identification code to each panel. For instance, if your dataset consists of 5 countries (panels) with monthly data from January 2020 to December 2020, the panel code for country 1 from Jan to Dec would be 1, for country 2 it would be 2, and so on. This can be done manually in Excel, or with an Excel formula.
Two links attached for reference:
1. Tips to Building Panel Data in Stats- https://www.youtube.com/watch?v=4hEr0t4a4QM&t=194s
2. How to Reshape Wide to Longitudinal Data- https://www.youtube.com/watch?v=Htay1iz4S4Y&t=209s
Let me know if this helps.
Best,
Nikhat
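The same pitfall exists outside Stata. Here is a hedged Python/pandas sketch (the "2020m1" string format is an assumption about the data) of the key point: parse the string into a true monthly period type rather than generating a fresh sequence, so the time variable cannot run ahead of the data. In Stata the analogue is `gen t = monthly(timevar, "YM")` followed by `format t %tm` and `xtset panelvar t`.

```python
import pandas as pd

# Toy panel with a Stata-style string month column, e.g. "2020m1".
df = pd.DataFrame({
    "unit":  [1, 1, 2, 2],
    "month": ["2020m1", "2020m2", "2020m1", "2020m2"],
})

# Parse the strings into a proper monthly period: "2020m1" -> "2020-1" -> 2020-01.
df["t"] = pd.PeriodIndex(df["month"].str.replace("m", "-"), freq="M")
```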
  • asked a question related to Panel Data
Question
2 answers
Hello,
for my thesis I am looking at the relationship between the level of unionization (the trade-union membership rate of the firm's employees, TR) and firm performance (ROE).
I wanted to use a mediation model, but according to my supervisor this goes beyond the level expected for this thesis.
So now I simply want to regress ROE on TR and control variables.
I have a panel data set consisting of 237 firms over a five-year period.
As control variables I thought I would use firm size (log number of employees) and firm age. Furthermore, I thought it would be useful to control for industry effects, so I incorporated FTA groups, of which there are 37.
What would be the proper course of action?
Relevant answer
Answer
Hi Lucas,
As a first step, kindly create panel IDs, which are a prerequisite for panel data regression. Treat any missing data through interpolation/extrapolation to obtain a strongly balanced panel.
In the second step, you may want to test the stationarity of each data series. After determining the order of integration of each series, you can decide which panel data model to employ. As a general rule, Fixed and Random Effects models can be used when the series have the same order of integration; otherwise, Panel ARDL may have to be applied for a mix of I(0) and I(1) series. A benefit of ARDL is that it gives both long-run and short-run coefficients.
If you end up using the FEM/REM panel data models, use the Hausman test to choose between the two.
Let me know if this helps.
  • asked a question related to Panel Data
Question
5 answers
Is there any test like this in Stata?
Relevant answer
Answer
Wintoki et al. (2012) adapt the procedure for testing the relevance of instruments in 2SLS settings: they examine the F-statistic for the joint significance of the instruments on the instrumented variable after the first stage, i.e., the reduced-form equation in which the instrumented variable is regressed on all exogenous variables (x) plus the excluded instruments (z).
They carry out two separate procedures, one for the difference equation and one for the level equation, as they use the system GMM estimator. If you are using difference GMM, you can follow the procedure for the differenced equation. For each procedure, use the instruments for the corresponding equation; for example, when you do the test for the level equation, use the instruments that were used in the level equation.
For more details you can consult their paper:
Wintoki, M. B., Linck, J. S., & Netter, J. M. (2012). Endogeneity and the dynamics of internal corporate governance. Journal of financial economics, 105(3), 581-606.
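The test Wintoki et al. adapt is just a joint F-test on the excluded instruments in the first-stage regression. A self-contained sketch in Python/NumPy with toy data (in Stata you would run the first-stage regression and then `test z1 z2`):

```python
import numpy as np

def first_stage_F(y, exog, instruments):
    """F-statistic for joint significance of the excluded instruments:
    regress the instrumented variable y on exog alone (restricted) vs
    on exog + instruments (unrestricted) and compare the SSRs."""
    n = len(y)
    Xr = np.column_stack([np.ones(n), exog])    # restricted: exogenous vars only
    Xu = np.column_stack([Xr, instruments])     # unrestricted: + excluded instruments
    ssr = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    q = instruments.shape[1]                    # number of excluded instruments
    return ((ssr(Xr) - ssr(Xu)) / q) / (ssr(Xu) / (n - Xu.shape[1]))

# Toy data in which the instruments are strongly relevant.
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 1))
z = rng.normal(size=(200, 2))
y = x[:, 0] + 2 * z[:, 0] - z[:, 1] + rng.normal(size=200)
F = first_stage_F(y, x, z)
```

A common rule of thumb treats F above 10 as evidence that the instruments are not weak; with strongly relevant toy instruments the statistic here is far above that.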
  • asked a question related to Panel Data
Question
8 answers
Where can I find heterogeneity tests for a panel data model in EViews?
Relevant answer
Answer
I think the R software is better here, and there are many packages (e.g., lme4) for linear mixed-effects models. You can try it; I think it is very useful.
  • asked a question related to Panel Data
Question
8 answers
Dear RG Members,
Is it necessary to check secondary data for normality and linearity when performing panel data regression (such as a Fixed Effects or Random Effects model)?
And if not,
what references should be given? In many papers, the authors discuss neither the normality nor the linearity of the secondary data before fitting Fixed or Random Effects models.
Regards
Relevant answer
Answer
Hello Faizi,
It's not the observed scores which are the concern, it's the fitted model residuals which are presumed normal, homoscedastic, and independent. As well, the residuals can be helpful in judging whether departure from linearity is a concern.
Good luck with your work.
  • asked a question related to Panel Data
Question
5 answers
In a panel data set where N > T, to examine the long-run and short-run relationships when the variables are stationary at I(0) and I(1), can I use ARDL?
Relevant answer
Answer
In this case the GMM estimator is more appropriate to apply than the ARDL method, because the number of cross-sectional units ("N" in your case) is greater than the number of time periods ("T"). Therefore, I recommend that you apply GMM, not ARDL.
  • asked a question related to Panel Data
Question
2 answers
I have a Labor Force Survey from 2007-2016 (10 years). The survey is not panel data (respondents are selected from different strata and N differs across years): N = 12,581 | 12,823 | 9,561 | 6,167 | 9,269 | 8,865 | 12,319 | 12,544 | 12,731 | 9,137.
I am interested in the factors (X) that influence education-job mismatch (Y), estimated by running a Probit model and then the Mincer equation. My questions are:
1. Should I use pooled data analysis? If so, how do I merge the 10 years of data together?
2. Should I first analyse the 10 yearly surveys separately, to see whether my (assumed) model is reasonably stable over time?
3. There is a possibility that the same person completed the survey more than once, but I don't know how to match people directly or reconstruct the data as a panel, given the large number of respondents and years.
Thank you.
Relevant answer
Answer
Hello Amy,
If the Labor Survey is conducted with a "fresh" sample each year, rather than as a longitudinal study, then each year's data set may (and should) be treated as independent. If longitudinal, most long-term surveys with which I've worked do assign unique ID tags to each case and carry these forward, so that one may match responses across time.
Given the sample sizes, there's no apparent loss to treating each year's results as a stand-alone batch for purposes of investigating your model. If there are temporal events that could account for differences in variable values (e.g., pre-pandemic, post-pandemic; pre-recession of 2008, post-recession of 2008), this would argue against merging data sets across years.
You may certainly test whether model parameters are constant across years (for example, as a multi-group model in path/sem).
Good luck with your work.
  • asked a question related to Panel Data
Question
3 answers
I am conducting a panel data study wherein I am measuring the impact of one independent variable (continuous) on a dependent variable (continuous) in a fixed effects regression. In addition, four control variables are included which are also continuous. There are 12 firms and I have repeatedly measured them for 10 years. Thus, the total number of observations is 120. Can someone please point to relevant resources / software to estimate an adequate sample size / power for such a research design.
Thanks in advance.
Relevant answer
Answer
The attached Google search may be helpful to you. Best wishes David Booth
  • asked a question related to Panel Data
Question
2 answers
What is the difference between mediating and moderating variables in panel data regression?
Relevant answer
Answer
I recommend that you use the free statistical program JAMOVI which allows the inclusion of multiple moderators and mediators.
  • asked a question related to Panel Data
Question
8 answers
Hello colleagues,
when doing a meta-analysis with firm data, we encountered some panel studies having, e.g., 100 firms repeatedly providing data across, say, 10 years. I would naturally use N = 100 as the relevant sample size, but colleagues disagree and vote for N = 100 x 10.
Does anybody have an idea?
All the best
Holger
Relevant answer
Answer
Hello Holger,
Usual practice is to count independent cases. As panel data would not be independent, I'd vote for your choice of N = number of firms, rather than number of data points.
Good luck with your project.
  • asked a question related to Panel Data
Question
1 answer
It is well-known that Dumitrescu and Hurlin (2012) proposed a test of Granger causality in panel datasets. However, this test requires balanced panel data.
I wonder if there is a version of this test for unbalanced panel data?
Relevant answer
Answer
You can try xtgranger in Stata.
  • asked a question related to Panel Data
Question
1 answer
Relevant answer
Answer
sgmediation dv, iv(independent_variable) mv(mediator) cv(controls) -> the variable names go in the brackets. I am getting significant effects of the mediator and the IV (for both the direct and indirect effect).
This analysis is designed for cross-sectional data; in the fixed-effects case you can de-mean the data first and then use this test, which treats the data as pooled.
  • asked a question related to Panel Data
Question
3 answers
I know this is probably fairly straightforward, but I can't find a clear answer online and I'm new to Stata. I want to work out the marginal effects of an interaction term in a simple panel model with country and year fixed effects in Stata (Y = a + β1*X1it + β2*X2it + β3*X1it*X2it + error term). I want to determine the impact on Y when X1 is high and X2 is medium, when X1 is low and X2 is high, etc.
I've used: xtreg Y X1 X2 c.X1#c.X2
X1 and X2 are both continuous scores between 0 and 1.
I'm wondering if I can use the Stata command:
margins, at(x1=(0 0.5 1) x2=(0 0.5 1))
for marginal effects at scores of 0, 0.5 and 1, because it's giving me weird results, or if I should use
margins, dydx(x1) at(x2=(0 0.5 1))
but I'm a little unsure how to interpret the latter.
Any help would be greatly appreciated. I've read most of what has already been posted on this topic and found it not very helpful, but any hints are welcome. I'm very confused.
Thanks in advance
Relevant answer
Answer
I think the main problem with your estimation is that, in the original equation, you cannot get rid of the correlation between the first two explanatory variables and the third (combined) one. First, you should think about whether this functional form is really the best choice for the relations between the variables. If you conclude it is, then you should try to find transformations of the equation that weaken the dependency between the explanatory variables.
One possible way: for simplicity, rewrite your equation as Z = a + bX + cY + dXY (dropping the indices i and t). Taking first differences over time gives DZ = b*DX + c*DY + d*D(XY), where D denotes the difference from the preceding period (e.g., DZ = Z - Z[t-1]). Since D(XY) = XY - X[t-1]*Y[t-1] = X*DY + Y[t-1]*DX, we get DZ = b*DX + c*DY + d*X*DY + d*Y[t-1]*DX. Dividing through by DX*DY (one might also simply divide by X and Y) leads to:
DZ/(DX*DY) = b*(1/DY) + c*(1/DX) + d*(X/DX + Y[t-1]/DY).
Of course, some correlation remains between the transformed explanatory variables, but it will likely be considerably less than in the original equation. Another advantage is that the constant should now be near zero and insignificant (if not, something is wrong with the equation). Estimating in differences is, in general, a good test of whether the original equation is really linear.
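On the interpretation question raised in the thread: with Y = a + b1*X1 + b2*X2 + b3*X1*X2, the command `margins, dydx(x1) at(x2=(0 0.5 1))` reports the slope of Y in X1, which is b1 + b3*X2, evaluated at each listed value of X2. A tiny numeric sketch (the coefficient values are invented):

```python
import numpy as np

# Suppose the fitted model is Y = a + b1*X1 + b2*X2 + b3*X1*X2 (toy coefficients).
b1, b3 = 0.4, -0.2

# dY/dX1 = b1 + b3*X2, evaluated at X2 = 0, 0.5 and 1,
# which is exactly what margins, dydx(x1) at(x2=(0 0.5 1)) reports.
x2_vals = np.array([0.0, 0.5, 1.0])
dydx1 = b1 + b3 * x2_vals
```

So each row of the margins output is "the effect of a unit change in X1 on Y, holding X2 at this value"; here the effect shrinks as X2 rises because b3 is negative.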
  • asked a question related to Panel Data
Question
7 answers
I have a balanced panel data set with 156 observations from 2010 to 2021. After fitting a linear regression, the R-squared is 58 percent, which is good. However, diagnostic testing revealed that the linear regression violated 1) the normality of the error terms and 2) homoskedasticity (the variance of the error terms is non-constant). If you know how to fix these issues and why they persist, please explain what restrictions my data may be subject to.
Relevant answer
Answer
I make that 13 entities that you are repeatedly measuring; is that right (156 observations / 12 years, 2010-2021 inclusive)? It will be difficult to assess 'normality' with so few observations. But what really matters is outliers.
  • asked a question related to Panel Data
Question
6 answers
The data set concerns macroeconomic, company, and industry (Porter) analysis.
Relevant answer
Answer
Probably yes. There is a lot of literature on the topic. I am not an expert on the topic. Google search for the literature.
  • asked a question related to Panel Data
Question
3 answers
Hello, is it possible to use logistic regression on pooled panel data? The dependent variable is whether or not the respondent has diabetes. The independent variables are income, gender, and education. Should the individual income observations be adjusted to reflect increasing (average) income over time? Are there any specific considerations that should be addressed?
Thank you.
Jakub
Relevant answer
Answer
Why is this panel data and what do you mean by pooled? I am attaching what may be a similar study. Best, David Booth
  • asked a question related to Panel Data
Question
4 answers
Hello! Can repeated (multiple) cross-sectional data be treated as panel data?
Relevant answer
Answer
Dear Deepani Siriwardhana,
You can treat them as pooled panel data and carry out analysis.
  • asked a question related to Panel Data
Question
2 answers
I'm having trouble with my Stata code and am getting some weird results.
Relevant answer
Answer
Thank you for sharing these resources Ibrahiim. I have actually used both of these, but am still running into some challenges.
  • asked a question related to Panel Data
Question
2 answers
Hi, today I came across a strange problem. I found a panel data set online (for the period 1980-1987). I first estimated a model with no individual fixed effects, but with time fixed effects (year dummies). As expected, one of the dummy variables was dropped from the model (1980). I then used the fixed-effects estimator and observed something odd: now, in addition to the 1980 dummy, the 1987 dummy was also dropped, as was the education variable. Education was dropped because it is time-invariant, but I can't explain the removal of the 1987 dummy. I initially suspected multicollinearity, but then the 1987 dummy should have been dropped in the first regression too, right? Also, the VIF values do not indicate a multicollinearity problem. So what could be the reason? Could it have something to do with the -xtreg- command or the fixed-effects transformation in general?
These are my commands:
reg lnwage union educ exp i.year, robust
xtreg lnwage union educ exp i.year, fe robust
Relevant answer
Answer
Dear Lorenz
The best way to understand this problem is to program it yourself (in MATLAB, for instance) and observe what happens when you include all the created dummy variables in the regression. Brajesh's answer is near the core of the problem, but not quite there. The problem comes from including ALL those dummy variables as regressors.
As the picture shows, you are not including a column of ones for the intercept. You mentioned that, in the time-fixed-effects-only case, one of the dummy variables was removed "as expected". Do you know exactly why? And why was 1980 the variable removed? Whenever you include a full set of dummy variables, you cannot also include a column of ones, because you then get PERFECT COLLINEARITY and inv(X'X) does not exist (i.e., any results the routine returns are meaningless). The routine drops one of the dummies (here, 1980) precisely so that it can provide correct results without a column of ones. This implies the routine measures the degree of multicollinearity, identifies the specific variable generating the problem, and removes it.
Now consider the second case, with both unit and time fixed effects. You now have two sets of dummy variables and the problem gets worse: you again have PERFECT COLLINEARITY, because one set of dummies adds up to a column of ones with respect to the other set. In effect, the dropping routine runs twice, removing two variables.
I suggest you code this in MATLAB, treating the two sets of dummies as one "whole set" of dummy variables (excluding "union", "educ" and "exp"). Drop the FIRST dummy in the whole set, estimate, and save the results. Then drop the SECOND dummy instead, save the results, and so on until each dummy has been dropped in turn. Then compare the saved results in terms of the SSRs, the AICs, or any other econometric criterion in order to choose the best estimated model. You should NOT exclude any variable in the set {"union", "educ", "exp"}.
  • asked a question related to Panel Data
Question
4 answers
I have panel data with 21 countries and 8 years.
Relevant answer
Answer
Bin Peng Thank you for your response!
  • asked a question related to Panel Data
Question
4 answers
Hello,
I am trying to estimate a stochastic frontier model in the panel data set using stata13. In doing so, I prefer to use the TRE model. Even though I am aware that this model allows estimating the frontier model and the inefficiency determinants at a time(following a single step), I am not sure if I am following the correct procedure.
Could someone help me?
sfpanel ln y ln x1 lnx2 .....lnxn, model (tre) dist(hnormal) usigma(z1,z2,z3...zn) vsigma()
The assumption here is z1....zn are inefficiency determinants and also considering heteroscedasticity at the same time.
Thank you!
Relevant answer
Answer
Many thanks, Abebayehu Girma Geffersa. It was two years ago and, as you rightly said, I followed the procedure that I mentioned here, and it is the correct approach. Akbar Muhammad Soetopo, please follow a similar procedure to the one I wrote here as well. For a detailed answer to this, please read this paper: Belotti, F., Daidone, S., Ilardi, G., & Atella, V. (2013). Stochastic frontier analysis using Stata. The Stata Journal, 13(4), 719-758.
  • asked a question related to Panel Data
Question
3 answers
I'm trying to get the data of loan officers from microfinance(how many borrowers they approach, loan amount outstanding, the portfolio risk, the percentage of complete repayment, etc). Can anyone suggest to me the database to use data for the panel data model?
Thank you.
Relevant answer
Answer
Data about customers are sensitive and confidential, and I doubt that any bank would like to release data about customers' loans. In my country (Nigeria) there may be no possibility of getting online information to capture these data, because of fraudsters. I therefore have no idea how clearance and permission could be obtained from a microfinance bank to get customers' loan data for research.
  • asked a question related to Panel Data
Question
4 answers
Dear Researchers,
I want to run the NARDL model in Stata. Can someone explain to me the steps in order to run the NARDL model in Stata, especially with panel data? My dependent variable is RPPIs, my independent variable is GDP, and the control variables are inflation, interest rates, and credit. Please, someone explain to me the whole procedure. I am very thankful.
regards
Yuni
Relevant answer
  • asked a question related to Panel Data
Question
3 answers
I have T=5 N=26 panel data set for NUTS 2 Regions of Turkey. Can I use System GMM technique? Thanks in advance for your valuable comments.
Relevant answer
Answer
Of course, but you might run into problems with the IV count, as the number of cross-sections (N = 26) is quite small.
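As a rough illustration of why the IV count bites with N = 26, here is a back-of-the-envelope sketch (Python; the formulas are the textbook un-collapsed counts for a single endogenous lag, and the exact number your GMM routine reports depends on lag limits, collapsing, and added regressors):

```python
# Back-of-the-envelope GMM instrument counts (illustrative only; actual counts
# depend on estimator options such as lag limits or instrument collapsing).

def diff_gmm_instruments(T):
    """Un-collapsed difference-GMM instruments for one endogenous lag:
    levels dated t-2 and earlier for each differenced equation t = 3..T."""
    return sum(t - 2 for t in range(3, T + 1))

def system_gmm_instruments(T):
    """System GMM adds one lagged-difference instrument per levels equation."""
    return diff_gmm_instruments(T) + (T - 2)

T, N = 5, 26
print(diff_gmm_instruments(T), system_gmm_instruments(T))  # → 6 9
```

A common rule of thumb (Roodman) is to keep the total instrument count at or below N, so with N = 26 the count is quickly exhausted once year dummies and extra regressors are added; consider collapsing the instrument matrix or restricting lag depth.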
  • asked a question related to Panel Data
Question
7 answers
Hi all,
I conducted analysis on 3 countries using two methods:
· Time series (one model per country)
· Panel data (one model for the three countries, combining the time series and 3 cross-section units)
The results are as follows:
Time series:
Impact of independent variable X on dependent variable Y:
Country A: Positive and significant
Country B: Negative and significant
Country C: Negative and significant
Panel data:
Impact of independent variable X on dependent variable Y:
Country A, B, C: Positive and significant
How do I explain the different results between time series and panel data for Country B and C?
Why is the result different using the time series and panel data regression for those countries?
Note: I checked the data and they meet all requirements for both methods.
Appreciate your help.
Relevant answer
Answer
Your result aptly shows that country uniqueness matters, compared to the pooled panel analysis of the countries. For example, in a GDP-environmental quality relationship, with quality measured by CO2: for country A, higher economic output increases CO2 emissions, but it reduces them in both countries B and C, suggesting perhaps countries in a higher income grouping with greater resources to provoke a shift to non-pollutant energy sources and growth. The negative and significant coefficient of GDP, for example, in the pooled panel data suggests the overall state of these countries, which could mainly be developing countries with little or no pollution standards and regulations, and whose growth does not take into cognisance factors that prop up CO2 emissions in the overall economy. Hope this helps.
  • asked a question related to Panel Data
Question
4 answers
I have a SEM model (with 9 psychological and/or physical activity latent variables) with cross-sectional data in which, guided by theory, different predictor and mediator variables are related to each other to explain a final outcome variable. After verifying the good fit of the model (and after being published), I would like to replicate such a model on the same sample, but with observations for those variables already taken after 2 and after 5 years. My interest is in the quasi-causal relationships between variables (also in directionality), rather than in the stability/change of the constructs. Would it be appropriate to test an identical model in which only the predictor exogenous variables are included at T1, the mediator variables at T2 and the outcome variable at T3? I have found few articles with this approach. Or, is it preferable to use another model, such as an autoregressive cross-lagged (ACL) model despite the high number of latent variables? The overall sample is 600 participants, but only 300 have complete data for each time point, so perhaps this ACL model is too complex for this sample size (especially if I include indicator-specific factors, second-order autoregressive effects, etc.).
Thank you very very much in advance!!
Relevant answer
Answer
Hi there,
a completely different approach could be to run a TETRAD model. This allows you to set some restrictions where you can be sure about the direction (e.g., the autoregressive effects, as well as forbidding reverse effects from t_n to t_n-1) and freely explore the rest. The model will print a path diagram that shows you three things:
1) clearly supported causal effects (happens only rarely)
2) effects with one side (the arrowhead) clearly "dependent" but the other end ambiguous (it may be a cause or a consequence)
3) completely ambiguous relationships
TETRAD has existed since the 80s and is remarkably invisible to our field.
Eberhardt, F. (2009). Introduction to the epistemology of causation. Philosophy Compass, 4(6), 913-925.
Malinsky, D., & Danks, D. (2018). Causal discovery algorithms: A practical guide. Philosophy Compass, 13(1), 1-11. https://doi.org/10.1111/phc3.12470
A few final comments
1) 90% of confirmatory SEMs are much too complex and involve dozens and sometimes hundreds of testable implications. And because of that, the models never fit, which, in an almost funny manner, is then used as support for the model ("well, models never fit, so why should mine?"). I would always focus on ONE or TWO essential effects or chains of effects and then try to a) think hard about confounders and b) find potential instruments.
2) Yes, causation needs time to evolve, but most often the time lag is embedded in the measurement. Otherwise you would not have any cross-sectional correlations. That is, if the causal lag is similar to the lag embedded in the measure (e.g., "how satisfied are you with your job" will prompt an answer that is derived from memory) OR/AND the IV is stable, then cross-sectional data will generally allow you to identify causal effects. The key issue is and remains "causal identification", that is, removing confounding biases and potential reverse effects. The latter can be solved in a cross-lagged design, but not the former. That is, you have to think hard about confounding no matter what the temporal design is.
I had a long discussion in the following thread in case you differ in your opinion (which is fine for me):
Best,
Holger
  • asked a question related to Panel Data
Question
1 answer
What is the best dynamic panel model when T>N ?
Relevant answer
Answer
For the long T panel, a possible option is the panel data techniques that account for possible cross-section dependence. See for instance: https://www.sciencedirect.com/science/article/abs/pii/S0304407620301020.
Hope this helps. Thank you.
  • asked a question related to Panel Data
Question
4 answers
Dear ResearchGate community,
I am researching the impact of information technology capability on audit report lag before and during Covid-19.
My question is how to build a model to see the impact of information technology capability on audit report lag without testing the data separately (before Covid-19 vs. during Covid-19).
Note: the type of data is panel data
Thank you, and I'm looking forward to your feedback.
Relevant answer
Answer
  • asked a question related to Panel Data
Question
8 answers
I use panel data
I(0) and I(1)
I use EViews 9 Lite
Relevant answer
Answer
It cannot be done with EViews.
  • asked a question related to Panel Data
Question
2 answers
I am working on my thesis and I have a few questions about which method I should use for the analysis of my data. My research is about inequality in Europe and, in general, about households in Europe that have access to broadband internet connections and the effects of that access on their educational attainment. The IVs here would be:
  1. % of households that have access to a broadband connection
  2. GINI Index of the country (to measure inequality between the countries)
The dependent variable would be a variable that can be divided into 3 groups: % of the population that completed primary education, % that completed secondary education, and % that completed tertiary education.
The data are available in the World Bank database and cover about 20 countries over a period of 15 years. After doing some research I figured out that the type of data I'm using is panel data. I have done some reading about it, but I can't figure out how to continue, because most of the tutorials only use one IV and one DV. What I have read suggests using OLS (my promotor also told me that OLS would be best suited) for the type of variables and data I'm using, and that I will need control variables like "population" or "unemployment", but I don't get it.
I don't know if I'm being clear here; I basically want to know if I need to do what I read (but then I have no clue how to work with 2 IVs and 1 DV), or if it's something completely different.
If something isn't clear, let me know, and I'll try to explain better. Thank you very much.
Relevant answer
Answer
Thank you very much for the answer, Christian Geiser. I checked a bit more, and I think I might have to rethink my DV, because it seems that what I want to achieve requires multiple DVs, and that would make things more complex than they need to be. My aim, and my research question actually, is about the digital divide and what effect access to broadband internet has on educational attainment. Maybe I should go for a single DV (that is continuous), like "% of people that completed college", instead. Would OLS linear regression still be a good method? Thank you.
  • asked a question related to Panel Data
Question
3 answers
Hi all, I have a panel data set consisting of monthly returns, ESG scores and Fama-French factors. Monthly returns and ESG scores change across units and across time. However, the Fama-French factors do not change across units, only across time. My question is whether I can then use the Fama-French factors in a pooled OLS, RE or FE estimation? And can I then add time and firm fixed effects?
Thanks, Kind regards Karoline
Relevant answer
Answer
Dear Karoline.
You can run multiple regressions with various variations and choose the one with the best results.
It is research: try it and see what the results look like.
Regards
  • asked a question related to Panel Data
Question
5 answers
Could anybody explain whether, when performing the system GMM model, I have to perform a unit root test or not? I have data for 46 firms over 12 years, but when performing the Levin-Lin-Chu test in Stata, it shows the following: "Levin-Lin-Chu test requires strongly balanced data".
Relevant answer
Answer
Testing for unit roots is also of high importance when we have to deal with panel data. In the panel unit root test framework, two generations of tests have been developed. The first generation contains the Levin, Lin and Chu test (2002), the Im, Pesaran and Shin test (2003) and Fisher-type tests, whose main limitation is the assumption of cross-sectional independence across units. A second generation of tests, which relaxes the cross-sectional independence hypothesis, also exists. Within this second generation, two main approaches can be distinguished: the covariance restrictions approach, adopted by Chang (2002, 2004), and the factor structure approach, including contributions by Phillips and Sul (2003), Moon and Perron (2004), Choi (2002) and Pesaran (2003).
Most econometric packages, including Stata, contain the first-generation tests. However, some of them (e.g. the Levin-Lin-Chu test) require strongly balanced data (namely, no missing values). So you can proceed with the remaining tests without any problem.
Cheers!
Dimitris
  • asked a question related to Panel Data
Question
6 answers
Dear all,
Could you advise whether Granger Causality in panel data can be tested in R?
Thank you
Best,
Irina
Relevant answer
Answer
  • asked a question related to Panel Data
Question
6 answers
According to some, using country-level data in a study might introduce the "aggregate fallacy" associated with macro data, resulting in possible bias in the estimated findings. If this is true, then why have researchers published several research articles in prestigious journals in which they analyzed country-level data? What if we conducted the estimates using data from income groups? Simply put, if this is indeed a problem, what is the solution?
Relevant answer
Muhammad Ramzan, I believe this quote may express the meaning: the aggregation fallacy = aggregation bias, or better, the conclusion that what is true for the group must be true for the sub-group or individual. It's called aggregation bias because you're using aggregated data and extrapolating it inappropriately. It emerges, as one reason among others, from heterogeneity!
  • asked a question related to Panel Data
Question
4 answers
I am thinking of estimating a bivariate random-effects probit for panel data, and I wondered if there was a way to do this with Stata.
Relevant answer
Answer
Yes
  • asked a question related to Panel Data
Question
8 answers
time series analysis
Relevant answer
Answer
There are many methods for extracting the trend and seasonal fluctuations from economic, biological and meteorological series. Different methods sometimes give substantially different results. Trying every possible method on specific examples is a road to an unreachable infinity.
To prove that seawater is salty, you do not have to drink the whole sea. A constructive alternative to enumerating methods is their comparative analysis on the basis of an axiomatic approach. I am ready to present a published example of the results of such an approach.
  • asked a question related to Panel Data
Question
11 answers
My panel data comprise 5 cross-sections and 14 independent variables; the time-series dimension is 10 years. When I run the panel data model, pooled OLS and the FE model give results, but the random effects model shows the error: "RE estimation requires number of cross-sections > number of coefficients for between estimators for estimation of RE innovation variance". Can anyone help me get results for the random effects model?
Relevant answer
Answer
You can try a mixed model (random for periods and fixed for cross-sections, or vice versa). If that doesn't succeed, you should take random (for the period only) and none for the cross-section, or the reverse. You can do this with fixed effects also. After that you should check your choice with the Hausman or Chow test (the redundant fixed effects test in EViews).
  • asked a question related to Panel Data
Question
6 answers
I am interested in some recommendations regarding the use of bootstrapping with panel data.
I appreciate any suggestions on literature and also software that can be used to create the bootstrapping sample.
Relevant answer
Answer
Stata allows cluster bootstrapping.
clusterid is just the numeric id of firms/households/individuals/etc.
Clustered standard errors:
xtreg y x, fe vce(cluster clusterid)
Bootstrapped clustered standard errors:
xtreg y x, fe vce(bootstrap, seed(12345)) cluster(clusterid)
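For readers who want to see what the cluster bootstrap is doing under the hood, here is a language-agnostic sketch in plain Python (hypothetical data; `cluster_bootstrap` is an illustrative helper, and the statistic is a simple mean rather than the regression coefficient Stata would use): whole firms are resampled with replacement, keeping each firm's time series intact.

```python
import random
import statistics

# Pairs/cluster bootstrap sketch: resample entire clusters (firms) with
# replacement, recompute the statistic each time, and take the standard
# deviation of the draws as the bootstrap standard error.

def cluster_bootstrap(panel, stat, reps=200, seed=12345):
    rng = random.Random(seed)
    ids = list(panel.keys())
    draws = []
    for _ in range(reps):
        # Draw len(ids) clusters with replacement; keep each cluster whole.
        sample = [obs for _ in ids for obs in panel[rng.choice(ids)]]
        draws.append(stat(sample))
    return statistics.stdev(draws)

# Hypothetical panel: firm id -> that firm's time series of outcomes.
panel = {1: [2.0, 2.1, 1.9], 2: [3.0, 3.2, 2.8], 3: [1.0, 1.1, 0.9]}
se = cluster_bootstrap(panel, statistics.mean)
print(round(se, 3))
```

Resampling clusters rather than individual observations is what preserves the within-firm dependence that clustered standard errors are meant to respect.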
  • asked a question related to Panel Data
Question
1 answer
The 5 methods of estimating dynamic panel data models using "dynpanel" in R:
# Fit the dynamic panel data model using the Arellano-Bond (1991) instruments
reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 4)
summary(reg)
# Fit using an automatic selection of the appropriate IV matrix
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 0)
# summary(reg)
# Fit using the GMM estimator with the smallest set of instruments
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 1)
# summary(reg)
# Fit using a reduced form of the IV matrix from method 3
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 2)
# summary(reg)
# Fit using the IV matrix where the number of moments grows with kT
# (k: number of variables; T: time periods per group)
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 3)
# summary(reg)
Relevant answer
Answer
Brown Chitekwere,
May I ask if you find your answer to share it here, please?
In addition, when I ran the model, I received the error "rbind error: "names do not match previous names." Do you have any idea about it?
I appreciate your help.
Kind regards,
Mona
  • asked a question related to Panel Data
Question
6 answers
I've applied a 4-variable panel data VAR for 19 units for 15 years' data (variables were all normalized to get values between 0 and 1). After checking for cross sectional dependency, appropriate unit root tests were applied for stationarity and where necessary, variables were transformed to make them stationary.
I've used Stata (commands mentioned in a paper by M.R.M. Abrigo and I. Love, 2016) to conduct the panel VAR. The stability condition was satisfied, and the VAR shows a significant relation between the variables (as do the Granger tests). But when I plot the IRF graphs with Monte Carlo simulations, my confidence bands are explosive.
Without the MC simulation, the IRF graphs look good.
How can IRF graphs be inconsistent with my data that is stationary and results that are stable?
Is it something to do with using normalized values? Or a problem with the instrumental lags selected?
Thanks a lot in advance!
Relevant answer
  • asked a question related to Panel Data
Question
2 answers
Hi, community.
I have some questions regarding the ARDL model in the form of a UECM (dynamic panel data). I use this technique to estimate the EKC hypothesis for a panel of 16 countries. I wonder if I can use this method in the presence of random effects (for example, by using GLS). I also would like to know if I should implement specification tests (autocorrelation, heteroskedasticity and contemporaneous correlation) as for a fixed effects model. I really appreciate your advice.
Relevant answer
Answer
Thanks for your advice, @Kehinde Mary Bello.
  • asked a question related to Panel Data
Question
6 answers
Hi, community.
I have some problems regarding the stationarity of the variables. I have run unit root tests to check the stationarity of the variables, and they show that some of them are not stationary even at the first difference, but are stationary at the second difference; namely, the variables are I(2). May I ask, in this case, which panel econometric technique can be used to estimate short- and long-run effects when variables are integrated of order 2? (Clearly ARDL is not an option.)
I really appreciate your help.
Relevant answer
Answer
If your series/variables are continuous in terms of their measurement levels, try to transform your variables (e.g., a logarithmic transformation) to make their distributions symmetrical, and assess whether they still remain nonstationary.
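As a quick intuition for why I(2) series defeat ARDL, here is a toy deterministic analogue sketched in Python (a quadratic trend standing in for an I(2) process): one round of differencing leaves a trending series, and only the second difference is constant.

```python
# Toy illustration: a quadratic trend still trends after one difference,
# but its second difference is constant (deterministic stand-in for I(2)).

y = [t ** 2 for t in range(8)]              # 0, 1, 4, 9, 16, 25, 36, 49
d1 = [b - a for a, b in zip(y, y[1:])]      # first difference: still trending
d2 = [b - a for a, b in zip(d1, d1[1:])]    # second difference: constant

print(d1)  # → [1, 3, 5, 7, 9, 11, 13]
print(d2)  # → [2, 2, 2, 2, 2, 2]
```

This is why ARDL, which accommodates a mix of I(0) and I(1) variables only, is ruled out once any variable is I(2).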
  • asked a question related to Panel Data
Question
4 answers
Do we need to run the panel regression diagnostic tests for panel stochastic frontier analysis (xtfrontier or sfpanel)?
Could someone suggest the best read for this analysis, especially for the time-varying decay model and sfpanel with inefficiency functions explicitly mentioned?
Relevant answer
Answer
Regarding your problem, I can recommend the following article:
Battese, G.E., Coelli, T.J. Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India. J Prod Anal 3, 153–169 (1992). https://doi.org/10.1007/BF00158774.
  • asked a question related to Panel Data
Question
2 answers
I want to examine the impact of economic growth (GDP per capita) and population on CO2 emissions. I checked the unit roots and found that the variables are integrated of different orders, CO2 I(1), GDPc I(1) and Pop I(0), for which case the ARDL model is recommended, but I got stuck at the ARDL estimation. Using EViews 11, I got the error "Near singular matrix", which means there is multicollinearity. I checked the correlation between the variables and got 1 between GDPc and population.
Therefore, I am asking if there is a way to fix multicollinearity in panel data, or whether GDP per capita and population cannot be in one regression? Looking forward to your help.
Relevant answer
Answer
To solve the multicollinearity problem, substitute the variables with proxies, or merge the similar variables into one variable.
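A quick pre-estimation check that flags this situation is the pairwise correlation between regressors, sketched here in plain Python with hypothetical series (`pearson` is just an illustrative helper): a correlation of ~1 means the two variables carry no independent variation and cannot be separately identified.

```python
import math

# Pairwise correlation check: a correlation near 1 between two regressors
# (here GDP per capita constructed as an exact linear function of population)
# is what produces the "Near singular matrix" error.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pop = [10.0, 11.0, 12.0, 13.0, 14.0]   # hypothetical population series
gdpc = [2.0 + 0.5 * p for p in pop]    # exactly linear in pop
print(round(pearson(pop, gdpc), 6))    # → 1.0
```

Running this check before estimation tells you which pair of regressors to proxy or merge.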
  • asked a question related to Panel Data
Question
13 answers
Dear All,
I am analyzing the impact of adopting two technologies using panel data. My dependent variable is categorized into 0=non-adopters, 1=partial adopters, and 2=both technology adopters. Independent variables include respondents' demographic and socio-economic variables.
I am using six years of panel data (2016-2021). Can anyone kindly suggest which impact assessment analysis is suitable for this study?
Thank you very much.
Relevant answer
Answer
Hi Faruque,
I think you should use a multinomial endogenous switching regression model or a multivalued treatment effects model, including year dummies as regressors. After you calculate the ATTs, you can do some disaggregated analysis using the year dummies. By doing this, you can see how the treatment effects change over time.
  • asked a question related to Panel Data
Question
4 answers
Y(dependent) = I(0)
X(independent)= I(1)
If I take the first difference of X to make it stationary, is it possible to apply the Dumitrescu & Hurlin causality test? If the answer is yes, could you please share an article for this situation.
Thanks and regards,
Relevant answer
Answer
I guess I didn't express myself fully. For example, while the Konya causality test can analyze variables that are stationary at mixed levels, the variables for D&H need to be stationary at the same level. Accordingly, since I want to use D&H instead of the Konya causality test, if I make the non-stationary variable stationary by taking its first difference, will this pose a problem for the analysis?
  • asked a question related to Panel Data
Question
1 answer
Hi dear scholars! With best wishes to you all. I know a little about heterogeneity problems in a model, such as slope heterogeneity due to cross-section differences. Does any other kind of heterogeneity exist in panel data? To address slope heterogeneity, we split the units into subgroups based on a characteristic such as income level, democracy, etc., or use interaction terms even in homogeneous models, or use heterogeneous models such as DCCEMG, CS-ARDL, etc. Please let me know if there is another method. Thank you.
Relevant answer
Answer
There is no possible end to heterogeneity, since circumstances alter cases and the world has grown more heterogeneous, with several permutations arising from cultural-association intermixes.
  • asked a question related to Panel Data
Question
6 answers
I have panel data for various countries where the exchange rate is one of the variables. The exchange rates are denominated in local currencies only. My question is: do I need to convert the local-currency rates into dollar-denominated rates?
Relevant answer
Answer
If you are using the log of the exchange rate in your analysis it is not important which one you use as
log(US dollar price of one unit of domestic currency) = - Log (domestic currency price of one US dollar)
If you are considering comparing volume series (real GDP) at constant prices you must convert them at base year (or quarter) exchange rates. Otherwise, you will be contaminating your common currency volume data with changes due to changing exchange rates.
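The identity in the first point can be verified in two lines (a Python sketch with a hypothetical exchange rate): switching the quotation only flips the sign of the log series, so log-coefficients simply change sign.

```python
import math

# log(US dollar price of one unit of domestic currency)
#   = -log(domestic currency price of one US dollar)

rate = 1.85            # hypothetical: domestic currency per US dollar
inverse = 1.0 / rate   # US dollars per unit of domestic currency
print(math.isclose(math.log(inverse), -math.log(rate)))  # → True
```

So for a log specification, the choice of quotation affects only the sign (and interpretation) of the estimated coefficient, not its magnitude or significance.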
  • asked a question related to Panel Data
Question
7 answers
I have a panel dataset of 120 different countries measuring a variable over three periods. This variable indicates the percentage of 1000 respondents in each country that answered yes to the question. I am considering this representative of the probability that a respondent in each country will answer yes to that question. The dataset currently has a bimodal distribution across countries, with the results concentrated around 0.0-0.10 and 0.9-1.0. To transform this into a normal distribution, I am using a logit transformation employing the function
log[𝑝/(1−𝑝)]
where p is the probability of a respondent answering yes. However, in three of the countries, 100% of the respondents answered yes, resulting in a logit function that cannot be calculated of
log[1/(1−1)]
What should I do with these three countries in the sample? Is there a legitimate way to lower their values from 1.0 so that they can be used in the formula?
I will also be averaging the panel data over the three periods to create a cross-sectional data set. The three countries of concern have values of 1.0 in a single period, with their results being less than 1.0 in the other two periods, meaning their average probability would be less than 1. Would it be appropriate for me to average the probability values across the periods prior to employing the logit transformation? an example of these two options is formulated below.
Log{{[p1/(1-p1)+p2/(1-p2)+p3/(1-p3)]}/3}
Or
{log[p1/(1−p1)] + log[p2/(1−p2)] + log[p3/(1−p3)]}/3
Relevant answer
Answer
Don't transform the response. Use an appropriate statistical model (binomial or quasi-binomial) instead.
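The binomial model suggested above is the clean solution. If a transformation is nonetheless required, one common workaround, sketched here in Python (`empirical_logit` is an illustrative helper, not a recommendation over the model-based approach), is the empirical logit, which adds 1/(2n) to the numerator and denominator so that boundary proportions stay finite; n = 1000 respondents as stated in the question.

```python
import math

# Empirical-logit sketch: shifting p away from the 0/1 boundary by 1/(2n)
# keeps the transform finite for countries where every respondent said yes.

def empirical_logit(p, n=1000):
    return math.log((p + 1 / (2 * n)) / (1 - p + 1 / (2 * n)))

print(round(empirical_logit(1.0), 3))   # finite instead of +infinity
print(round(empirical_logit(0.5), 3))   # → 0.0
```

On the averaging question: because the logit is nonlinear, the two orderings are not equivalent; transforming each period first and then averaging (the second formula) keeps every term finite once this adjustment is applied.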
  • asked a question related to Panel Data
Question
4 answers
I am trying to recreate the example from data which was produced by William Greene and later discussed by Damodar Gujarati in his book "Basic Econometrics". I have panel data on 6 airlines from 1970-1984. The data analyze the costs of six airline firms for the period 1970-1984, for a total of 90 panel data observations. The variables are defined as: I = airline id; T = year id; Q = output, in revenue passenger miles, an index number; C = total cost, in $1,000; PF = fuel price; and LF = load factor, the average capacity utilization of the fleet. Cost is the dependent variable and the others are independent variables.
I want to analyze how the cost of individual airlines has been affected due to changes in other factors. Further, I want to know if the magnitude of change in independent variable is same on each airline or different. Finally, which factor contributes most to affecting the cost of individual airlines? I am using EViews to recreate the model.
I have generated two models: 1. with a dummy assigned to each airline; 2. a fixed-effects pooled model.
Relevant answer
Answer
You already have useful tips from the earlier respondents.
I have seen your output and it is clear you have the idea. I also assume you have conducted the pre-estimation tests and found that FE would be the suitable model.
You can estimate the two models separately or at once. The two objectives will still be achieved. The first objective needs coefficients on your independent variables and these will be the same across the airlines. To achieve the second objective, you will introduce dummies for (n-1) of the airlines. I assume that setting up the dummies will not be a challenge. The coefficients of dummies will be kind of the starting point for the costs of the respective airline before effects of independent variables set in. Formally, the dummies mean that your estimated model assumes 6 parallel regression fits, one for each airline, but with different Y-intercepts.
Hope this is useful.
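The (n-1)-dummy setup described above can be sketched as follows (plain Python with hypothetical airline ids; airline 1 serves as the base category):

```python
# Build (n-1) airline dummies: airline 1 is the base category, so each of the
# remaining airlines gets its own 0/1 column, i.e. its own intercept shift.

airline_ids = [1, 2, 3, 4, 5, 6, 1, 2]   # hypothetical panel rows
base = 1
levels = [a for a in sorted(set(airline_ids)) if a != base]
dummies = {a: [1 if row == a else 0 for row in airline_ids] for a in levels}

print(len(dummies))   # → 5 dummy columns for 6 airlines
print(dummies[2])     # → [0, 1, 0, 0, 0, 0, 0, 1]
```

Each dummy coefficient then shifts the intercept for its airline relative to airline 1, giving the six parallel regression fits mentioned above.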
  • asked a question related to Panel Data
Question
2 answers
I am using panel data and want to estimate the impact of regulation on firms' innovation through the DID and PSM-DID approaches. I am able to calculate DID, but not PSM-DID, for panel data. Can anyone share their experience?
  • asked a question related to Panel Data
Question
2 answers
In my panel data (10 time units and 50 cross-sections) I am using inflation as one of the control variables. When I run the Levin-Lin-Chu unit-root test, the p-value I get is 1. It may be because inflation differs across time units but is the same across cross-sections. How should I report this?
Relevant answer
Answer
Dear Sreekanth, a 1.000 p-value (for computational purposes) means that the null hypothesis of a unit root is not rejected. That is, panels contain unit roots. It might be helpful to look at possible "similarities" in the data variables sample you are using to, for instance, check for cross-sectional correlation or other statistical issues. From a macroeconomic perspective, inflation is a very particular variable in this kind of analysis. I use Stata for my work. This reference on unit root tests could be helpful.
Best.
  • asked a question related to Panel Data
Question
12 answers
Hi,
I have a data sample covering 12 years and 11 companies. Nine of the 11 companies belong to a single parent company, and their strategic investment decisions are not totally independent.
For an analysis of the determinants of investment decisions, can this be a problem?
Thanks in advance,
Sérgio
Relevant answer
Answer
It is easy to recommend extending the sample (why only for the number of firms, and not also for the number of years?), but it may be difficult (and expensive) to do so. Important (expensive) investments by individual companies do not take place every year, but maybe every 5-15 years (for buildings etc. even more), and this "cycle" need not be regular, as decisions are made some time in advance and the investment can take effect one or two years earlier or later (depending on economic or other circumstances, e.g. Corona). Investment goods can also be leased, in which case you find only the leasing rate, but no investment, in the book-keeping numbers. Therefore, it is very difficult to specify an investment function for individual firms. But such a careful specification is necessary before estimation. It is important, too, to have a careful look at the data to find relations/tendencies (e.g. with diagrams, simple correlations).
If you have already specified this equation (or model), you could try an estimation for the sample of the nine companies under the holding, and two time-series estimates for the others. But I am sceptical that you will get sensible results, even if you had many more companies and years.
  • asked a question related to Panel Data
Question
2 answers
Dear all,
I have run a SFA panel data model in Stata and would like to know how can I plot the stochastic frontier line?
Thanks in advance.
  • asked a question related to Panel Data
Question
7 answers
I have seen a couple of panel data studies where the authors fill in missing data for a variable with a similar variable from another source, for example, getting inequality data from the WDI and filling the missing values with data from the SWIID. Is there any justification for this, and how do we do it when the two estimates are calculated differently and on different scales?
Relevant answer
Answer
Dear Humid.
The use of interpolation is better than filling missing data from one source with another source. The units and other conditions surrounding the data collection are different.
However, you can make use of a proxy, that is, use other closely related variables to proxy the missing variable as a whole.
Regards