Questions related to Panel Data
I can't work out whether it is necessary to standardize variables in panel data to compare the coefficients properly, since all variables are measured on different scales (e.g., currencies, counts). If standardization is needed, can I apply ordinary standardization? I have 3 sectors and 8 years in my dataset.
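If standardization is applied, the usual choice is the z-score, computed variable by variable. A minimal sketch with made-up values (whether to standardize within sector or within year is a separate modeling choice):

```python
import numpy as np

# One variable measured in a currency unit (made-up values); counts or
# other scales are handled the same way, variable by variable
x = np.array([120.0, 95.0, 240.0, 180.0, 60.0])

# Ordinary z-score standardization: subtract the mean, divide by the
# sample standard deviation, giving mean 0 and SD 1
z = (x - x.mean()) / x.std(ddof=1)
print(z)
```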
I have used the xtcd command to check for cross-sectional dependence. Before that, I tested heterogeneity with a slope homogeneity test; the results show the presence of heterogeneity among the variables.
I am trying to conduct an event study (green bond announcement) on listed companies.
I have collected the stock prices and the market price index, but unfortunately I am stuck because I don't know how to calculate CAR using R. Can anyone help?
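The arithmetic behind a CAR is small enough to sketch. Below is a hedged Python sketch (the return numbers are made up) of the standard market-model event-study steps, which translates directly to R: fit alpha and beta over an estimation window, compute abnormal returns over the event window, and sum them.

```python
import numpy as np

# Hypothetical daily returns (made-up numbers); in practice these come
# from the collected stock prices and the market index
stock = np.array([0.010, -0.020, 0.015, 0.005, -0.010, 0.020, 0.000, 0.010])
market = np.array([0.008, -0.015, 0.010, 0.004, -0.008, 0.015, 0.001, 0.007])

est = slice(0, 5)    # estimation window (before the announcement)
event = slice(5, 8)  # event window (around the announcement)

# Market model: fit stock = alpha + beta * market by OLS on the
# estimation window (polyfit returns slope first, then intercept)
beta, alpha = np.polyfit(market[est], stock[est], 1)

# Abnormal returns in the event window, cumulated into the CAR
ar = stock[event] - (alpha + beta * market[event])
car = ar.sum()
print(car)
```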
Thanks in advance
We know that we cannot use PMG, MG, DFE, etc. if there is cross-sectional dependence, but a CS-NARDL model has not yet been developed. Can we therefore use panel NARDL if there is cross-sectional dependence?
I am currently trying to perform (moderated) mediation analyses with a long panel data set in Stata.
I am using a SEM model and I am trying to follow UCLA (http://www.ats.ucla.edu/stat/stata/faq/sem_mediation.htm and http://www.ats.ucla.edu/stat/stata/faq/modmed.htm). The issue I have is that these approaches are describing cross sectional analyses. Bollen & Brand (2008, http://www.escholarship.org/uc/item/3sr461nd) explain an approach to do this, but I struggle with applying their advice to Stata.
Does anyone have information, links, advice, papers, etc. on how to approach this challenge?
Hi! How do I fix both serial and cross-sectional autocorrelation in panel data at the same time? I ran a fixed-effects model with PCSE standard errors (which take care of cross-sectional correlation) and one lag of the dependent variable (not significant).
But my Durbin-Watson statistic is about 1.7, which is too low according to the Bhargava, Franzini, and Narendranathan tables (which are very tight, at about 1.9 for the lower bound). Likewise, the Wooldridge test for autocorrelation in panel data is significant.
I tried to apply a dynamic panel data model with two-step estimation and an AR model, but regardless of the number of lags I get significant results for the Pesaran CD test for cross-sectional dependence.
Bottom line: it seems I can't address serial and cross-sectional autocorrelation at the same time. Or can I? How?
My data is panel data. May I please know whether I can use the Stata "medsem" package for mediation analysis with panel data?
I am using the GMM method of regression analysis for the panel data. Should I use the log form of the variables for the mediation analysis? Kindly explain. Thank you.
How do I identify whether a panel data model is static or dynamic? And what techniques are used to analyse static versus dynamic panel data? Thank you.
I have a balanced panel data set with i = 6 and t = 21, i.e. 126 observations overall. I have 1 dependent variable (y) and 6 independent variables (x1, x2, ...).
First, I ran unit root tests; the results show that some variables are stationary only in first differences.
If I would like to run panel data regressions (Pooled, Fixed Effect, and Random Effect), is this the correct form for inputting the model in EViews:
d(y) c x1 d(x2) d(x3) x4 d(x5) x6
Should I put all variables at the same difference level, adding "d()" to all of them?
Please correct me if I am wrong; these are the steps I would like to follow for the statistical part of a panel data study:
1. Unit root tests
2. Panel regression?
I have panel data (T=10, N=26) where all variables are integrated I(1) with cross-sectional dependence. I applied the Westerlund test and found no cointegration, so I proceeded with panel VAR (PVAR) estimation. However, I want to confirm the robustness of my analysis by applying another estimation technique. Any advice?
I have tested for autocorrelation in my panel data using the Wooldridge test via the command xtserial. The p-value is less than 0.05, showing that autocorrelation is present in the data.
May I please know how to deal with autocorrelation in panel data using Stata? Thank you.
Hey members, I'm running quantile regression with panel data using Stata, and I find that there are two options:
1- Robust quantile regression for panel data: standard, using qregpd
2- Robust quantile regression for panel data with MCMC: using adaptive Markov chain Monte Carlo
Can anyone please explain the use of MCMC to me? How can I analyse the output of robust quantile regression for panel data with MCMC? Thanks.
I am struggling a bit, as I have a panel data set with a dependent variable that varies over time and individuals, and an independent variable that is constant over time but varies across individuals: Yit = b0 + b1 * Xi.
Now my question is whether it is correct to run the usual fixed-effects panel regressions. If I am correct, using fixed effects would "eliminate" the variable, since it estimates Yit = b0 + b1 * (Xi - mean of Xi). This would eliminate exactly what I would like to analyze.
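The worry above can be checked numerically: under the within (fixed-effects) transformation, a regressor that is constant over time demeans to exactly zero. A small sketch with made-up numbers:

```python
import numpy as np

# Two individuals observed for three periods; X is constant within
# each individual (made-up values)
ids = np.array([1, 1, 1, 2, 2, 2])
X = np.array([5.0, 5.0, 5.0, 8.0, 8.0, 8.0])

# Within (fixed-effects) transformation: subtract each individual's
# own mean; a time-invariant regressor demeans to exactly zero
X_within = X - np.array([X[ids == i].mean() for i in ids])
print(X_within)
```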
What unit root test techniques can be used for unbalanced panel data with cross-sectional dependence? CADF seems to be OK, but CIPS seems not applicable. Is there any other method?
I have a panel dataset containing country names as strings. I want to generate a specific numeric ID for each country.
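In Stata this is what `encode` does (e.g., `encode country, gen(country_id)`). An equivalent sketch in Python with hypothetical country names:

```python
import pandas as pd

# Hypothetical country names repeated across panel rows
df = pd.DataFrame({"country": ["Kenya", "Ghana", "Kenya", "Togo", "Ghana"]})

# factorize assigns one integer code per unique string, in order of
# first appearance; +1 makes the IDs start at 1
df["country_id"] = pd.factorize(df["country"])[0] + 1
print(df["country_id"].tolist())
```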
I am working on a paper assessing the impact of a law enacted in 2018, using difference-in-differences regression on panel data from 2014 to 2022. Is there any advice on how to control for the 2020 drop in FDI, which could affect the outcome of the study?
My data is panel data (secondary data). The results show that the fixed-effects model is appropriate, but the Breusch-Pagan/Cook-Weisberg test shows that the data are heteroskedastic. Must the data be homoskedastic to continue with multiple linear regression? If yes, kindly provide me with solutions. Thank you.
Hello researchers, greetings.
I want to run a panel data model in Stata; my panel data consist of a monthly time variable with 6 cross-sectional units. When I import my data into Stata, the time variable comes in as a string. When I generate a monthly time variable, it gets extended many periods ahead. Can anyone help me solve this problem?
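In Stata, a string month can usually be converted with `gen t = monthly(timevar, "YM")` followed by `format t %tm` (the mask depends on how the strings look). For illustration, a Python sketch of the same idea with hypothetical date strings:

```python
import pandas as pd

# Hypothetical monthly strings as they might arrive from a spreadsheet
s = pd.Series(["2015m1", "2015m2", "2015m3"])

# Convert "2015m1" -> "2015-1" and parse as monthly periods; a proper
# monthly index sorts and lags correctly instead of extending arbitrarily
t = pd.PeriodIndex(s.str.replace("m", "-", regex=False), freq="M")
print(list(t.astype(str)))
```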
For my thesis I am looking at the relationship between the level of unionization (the trade-union membership rate of the firm's employees, TR) and firm performance (ROE).
I wanted to use a mediation model, but according to my supervisor this goes beyond my level for this thesis.
So now I simply want to regress ROE on TR and the control variables.
I have a panel data set consisting of 237 firms over a five-year period.
For control variables I thought I would use the size of the firm (log number of employees) and the age of the firm. Furthermore, I thought it would be useful to control for industry effects, so I incorporated FTA groups, of which there are 37.
What would be the proper course of action?
Dear RG Members,
Is it necessary to check secondary data for normality and linearity when performing panel data regression (such as a Fixed Effect or Random Effect model)?
And if not?
Then what references can be given? In many papers, the authors discuss neither the normality nor the linearity of the secondary data before performing Fixed Effect or Random Effect estimation.
In a panel data set where N > T, to examine the long-run and short-run relationships when the variables are stationary at I(0) and I(1), can I use ARDL?
I have a labor force survey from 2007-2016 (10 years). The survey is not panel data (respondents are selected from different strata and N differs across years): N = 12,581 | 12,823 | 9,561 | 6,167 | 9,269 | 8,865 | 12,319 | 12,544 | 12,731 | 9,137.
I am interested in the factors (X) that influence the education-job mismatch (Y), estimated by running a probit model and then the Mincer equation. My questions are:
1. Should I use pooled data analysis? If so, how do I merge the 10 years of data together?
2. Should I analyse the 10 yearly surveys separately first, to see if my (assumed) model is reasonably stable over time?
3. There is a possibility that the same person completed several surveys, but I don't know how to match people directly or narrow the data down to a panel, given the large number of respondents and years.
I am conducting a panel data study in which I measure the impact of one continuous independent variable on a continuous dependent variable in a fixed-effects regression. In addition, four control variables are included, which are also continuous. There are 12 firms, each measured repeatedly for 10 years, so the total number of observations is 120. Can someone please point me to relevant resources or software to estimate an adequate sample size / power for such a research design?
Thanks in advance.
When doing a meta-analysis with firm data, we encountered some panel studies having, e.g., 100 firms repeatedly providing data across, say, 10 years. I would naturally use N = 100 as the relevant sample size, but colleagues disagree and vote for N = 100 x 10.
Does anybody have an idea?
All the best
It is well known that Dumitrescu and Hurlin (2012) proposed a test of Granger causality in panel datasets. However, this test requires balanced panel data.
I wonder if there is a test for unbalanced panel data?
I know this is probably fairly straightforward, but I can't find a clear answer online and I'm new to Stata. I want to work out the marginal effects of an interaction term in a simple panel model with country and year fixed effects in Stata (Y = a + β1X1it + β2X2it + β3X1it*X2it + error term). I want to determine the impact on Y when X1 is high and X2 is medium, when X1 is low and X2 is high, etc.
I've used xtreg Y X1 X2 c.X1#c.X2
X1 and X2 are both continuous scores between 0 and 1.
I'm wondering if I can use the STATA command:
margins, at(x1=(0 0.5 1) x2=(0 0.5 1))
for marginal effects at scores of 0, 0.5 and 1, because it's giving me weird results, or whether I should use
margins, dydx(x1) at(x2=(0 0.5 1))
but I'm a little unsure of how to interpret the latter.
Any help would be greatly appreciated. I've read most of the things already posted about this topic and found them not very helpful, but any hints are welcome. I'm very confused.
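For reference, with the model Y = a + β1*X1 + β2*X2 + β3*X1*X2, the marginal effect of X1 is dY/dX1 = β1 + β3*X2, which is what `margins, dydx(x1) at(x2=(0 0.5 1))` evaluates at each X2 value. A sketch with hypothetical coefficient values:

```python
# Hypothetical coefficient values standing in for xtreg estimates;
# b2 drops out of the derivative with respect to X1
b1, b3 = 0.40, -0.60

# dY/dX1 = b1 + b3 * X2: the interaction makes the effect of X1
# depend on the level of X2
for x2 in (0.0, 0.5, 1.0):
    print(f"dY/dX1 at X2={x2}: {b1 + b3 * x2:+.2f}")
```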
Thanks in advance
I have a balanced panel data set with 156 observations from the years 2010 to 2021. After testing goodness of fit with linear regression, the R-squared is 58 percent, which is good. After model fitting, however, testing revealed that the linear regression violated 1) normality of the error terms and 2) constant variance of the error terms (heteroskedasticity). If you know how to fix this issue, why it persists, and what restrictions my data were subject to, please let me know.
Hello, is it possible to use logistic regression on pooled panel data? The dependent variable is whether or not the respondent has diabetes. The independent variables are income, gender, and education. Should the individual income observations be adjusted to reflect increasing (average) income over time? Are there any other specific considerations that should be addressed?
Hi, today I came across a strange problem. I found a panel data set online (for the period 1980-1987). I first estimated a model with no individual fixed effects, but with time fixed effects (year dummies). As expected, one of the dummy variables (1980) was dropped from the model. I then used the fixed-effects estimator and observed something odd: now, in addition to the 1980 dummy, the 1987 dummy was also dropped, as was the education variable. Education was dropped because it is time-invariant, but I can't explain the removal of the 1987 dummy. I had initially suspected multicollinearity, but then the 1987 dummy should have been dropped in the first regression too, right? Also, the VIF values do not indicate a multicollinearity problem. So what could be the reason? Could it have something to do with the -xtreg- command or with the fixed-effects transformation in general?
These are my commands:
reg lnwage union educ exp i.year, robust
xtreg lnwage union educ exp i.year, fe robust
I am trying to estimate a stochastic frontier model on a panel data set using Stata 13. In doing so, I prefer to use the TRE model. Even though I am aware that this model allows estimating the frontier and the inefficiency determinants at the same time (in a single step), I am not sure if I am following the correct procedure.
Could someone help me?
sfpanel lny lnx1 lnx2 ... lnxn, model(tre) dist(hnormal) usigma(z1 z2 z3 ... zn) vsigma()
The assumption here is that z1 ... zn are the inefficiency determinants, while also accounting for heteroskedasticity at the same time.
I'm trying to get data on loan officers from microfinance institutions (how many borrowers they approach, loan amount outstanding, portfolio risk, the percentage of complete repayment, etc.). Can anyone suggest a database I could use for a panel data model?
I want to run the NARDL model in Stata. Can someone explain the steps to run the NARDL model in Stata, especially with panel data? My dependent variable is RPPIs, my independent variable is GDP, and the control variables are inflation, interest rates, and credit. Please, could someone explain the whole procedure? I am very thankful.
I conducted analysis on 3 countries using two methods:
· Time series (one model per country)
· Panel data (one model for the three countries, combining the time series across 3 cross-sectional units)
The results are as follows:
Time series results (impact of independent variable X on dependent variable Y):
Country A: Positive and significant
Country B: Negative and significant
Country C: Negative and significant
Panel data results (impact of independent variable X on dependent variable Y):
Country A, B, C: Positive and significant
How do I explain the different results between time series and panel data for Countries B and C?
Why is the result different using time series and panel data regression for those countries?
Note: I checked the data, and they meet all requirements for both methods.
Appreciate your help.
I have a SEM model (with 9 psychological and/or physical activity latent variables) with cross-sectional data in which, guided by theory, different predictor and mediator variables are related to each other to explain a final outcome variable. After verifying the good fit of the model (which has since been published), I would like to replicate it on the same sample, but with observations for those variables taken 2 and 5 years later. My interest is in the quasi-causal relationships between the variables (including their directionality), rather than in the stability or change of the constructs. Would it be appropriate to test an identical model in which the exogenous predictor variables are included at T1, the mediator variables at T2, and the outcome variable at T3? I have found few articles with this approach. Or is it preferable to use another model, such as an autoregressive cross-lagged (ACL) model, despite the high number of latent variables? The overall sample is 600 participants, but only 300 have complete data for every time point, so perhaps an ACL model is too complex for this sample size (especially if I include indicator-specific factors, second-order autoregressive effects, etc.).
Thank you very very much in advance!!
Dear ResearchGate community,
I am researching the impact of information technology capability on audit report lag before and during Covid-19.
My question is how to build a model to see the impact of information technology capability on audit report lag without testing the data separately (i.e., the mean before Covid-19 versus during Covid-19).
Note: the type of data is panel data
Thank you, and I'm looking forward to your feedback.
I am working on my thesis and I have a few questions about which method I should use to analyse my data. My research is about inequality in Europe, specifically households in Europe that have access to broadband internet connections and the effect this has on their educational attainment. The IVs here would be:
- % of households that have access to a broadband connection
- GINI Index of the country (to measure inequality between the countries)
The dependent variable can be divided into 3 groups: % of the population that completed primary education, % that completed secondary education, and % that completed tertiary education.
The data are available in the World Bank database and cover about 20 countries over a period of 15 years. After doing some research, I figured out that the type of data I'm using is panel data. I have done some reading about it, but I can't figure out how to continue, because most of the tutorials use only one IV and one DV. What I have read suggests using OLS (my promotor also told me OLS would be best suited) for the type of variables and data I'm using, and that I will need control variables like population or unemployment, but I don't get it.
I don't know if I'm being clear here; I basically want to know whether I need to do what I read (but then I have no clue how to work with 2 IVs and 1 DV), or whether it's something completely different.
If something isn't clear, let me know, and I'll try to explain better. Thank you very much.
Hi all, I have a panel data set consisting of monthly returns, ESG scores, and Fama-French factors. Monthly returns and ESG scores vary across units and across time. However, the Fama-French factors do not vary across units, only across time. My question is whether I can then use the Fama-French factors in a pooled OLS, RE, or FE estimation? And can I then add time and firm fixed effects?
Thanks, Kind regards Karoline
Could anybody explain whether, when performing the system GMM model, I have to perform a unit root test or not? I have data on 46 firms for 12 years, but when performing the Levin-Lin-Chu test in Stata, it shows the following: "Levin-Lin-Chu test requires strongly balanced data".
According to some, using country-level data for a study might introduce the "aggregate fallacy" of macro data, resulting in possible bias in the estimates. If this is true, then why have researchers published several articles in prestigious journals analyzing country-level data? What if we conducted the estimates using data for income groups instead? Simply put, if this is indeed a problem, what is the solution?
I am thinking of estimating a bivariate random-effects probit for panel data, and I wondered if there is a way to do this in Stata.
My panel data comprise 5 cross-sections and 14 independent variables; the time series dimension is 10 years. When I run the panel data model, pooled OLS and the FE model give results, but the random-effects model shows the error "RE estimation requires number of cross-sections > number of coefficients for between estimators for estimation of RE innovation variance". Can anyone help me get results for the random-effects model?
I am interested in recommendations regarding the use of bootstrapping with panel data.
I would appreciate any suggestions on literature, and also on software that can be used to create the bootstrap samples.
The 5 methods of estimating dynamic panel data models using the "dynpanel" package in R:
# Fit the dynamic panel data model using the Arellano Bond (1991) instruments
reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 4)
summary(reg)

# Fit the dynamic panel data model using an automatic selection of the appropriate IV matrix
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 0)
# summary(reg)

# Fit the dynamic panel data model using the GMM estimator with the smallest set of instruments
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 1)
# summary(reg)

# Fit the dynamic panel data model using a reduced form of IV from method 3
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 2)
# summary(reg)

# Fit the dynamic panel data model using the IV matrix where the number of moments grows with K*T
# (K: number of variables; T: time periods per group)
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state", "year"), 1, 3)
# summary(reg)
I've applied a 4-variable panel VAR to 19 units over 15 years of data (variables were all normalized to values between 0 and 1). After checking for cross-sectional dependence, appropriate unit root tests were applied for stationarity, and where necessary variables were transformed to make them stationary.
I've used Stata (the commands mentioned in the paper by M. R. M. Abrigo and I. Love, 2016) to conduct the panel VAR. The stability condition was satisfied, and the VAR shows significant relations between the variables (as do the Granger tests). But when I plot the IRF graphs with Monte Carlo simulations, my confidence bands are explosive.
Without the MC simulation, the IRF graphs look good.
How can the IRF graphs be inconsistent with data that are stationary and results that are stable?
Is it something to do with using normalized values? Or a problem with the instrumental lags selected?
Thanks a lot in advance!
I have some questions regarding the ARDL model in the form of a UECM (dynamic panel data). I use this technique to estimate the EKC hypothesis for a panel of 16 countries. I wonder if I can use this method in the presence of random effects (for example, by using GLS). I would also like to know if I should implement specification tests (autocorrelation, heteroskedasticity, and contemporaneous correlation) as with a fixed-effects model. I really appreciate your advice.
I also have some problems regarding the stationarity of the variables. I ran unit root tests to check stationarity, and they show that some variables are not stationary even at the first difference, but only at the second difference, i.e. the variables are I(2). May I ask which panel econometric technique can be used to estimate short- and long-run effects when variables are integrated of order 2? (Clearly ARDL is not an option.)
I really appreciate your help.
Do we need to run the panel regression diagnostic tests for panel stochastic frontier analysis (xtfrontier or sfpanel)?
Could someone suggest the best reading for this analysis, especially for the time-varying decay model and for sfpanel with inefficiency functions explicitly specified?
I want to examine the impact of economic growth (GDP per capita) and population on CO2 emissions. I checked the unit roots and found that the variables are integrated of different orders: CO2 is I(1), GDPc is I(1), and Pop is I(0), so the ARDL model is recommended for this case. But I got stuck in the ARDL estimation: using EViews 11, I got the error "Near singular matrix", which indicates multicollinearity. I checked the correlation between the variables and got 1 between GDPc and population.
Therefore, I am asking whether there is a way to fix multicollinearity in panel data, or whether GDP per capita and population simply cannot be in one regression? Looking forward to your help.
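A pairwise correlation of exactly 1 means one regressor is an exact linear function of the other, so the two cannot be separately identified; this can be checked before estimation. A minimal sketch with made-up series:

```python
import numpy as np

# Made-up series: pop is an exact linear function of gdpc, mimicking
# the corr = 1 reported between GDP per capita and population
gdpc = np.array([1.0, 2.0, 3.0, 4.0])
pop = 2.0 * gdpc + 5.0

# A correlation of 1 means the two regressors carry the same information
r = np.corrcoef(gdpc, pop)[0, 1]
print(round(r, 6))
```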
I am analyzing the impact of adopting two technologies using panel data. My dependent variable is categorized into 0=non-adopters, 1=partial adopters, and 2=both technology adopters. Independent variables include respondents' demographic and socio-economic variables.
I am using six years of panel data (2016-2021). Can anyone kindly suggest which impact assessment analysis is suitable for this study?
Thank you very much.
Y(dependent) = I(0)
If I take the first difference of X to make it stationary, is it possible to apply the Dumitrescu & Hurlin causality test? If the answer is yes, could you please share an article covering this situation.
Thanks and regards,
Hi dear scholars! With best wishes to you all. I know a little about heterogeneity problems in a model, such as slope heterogeneity due to differences across cross-sections. Is there any other kind of heterogeneity in panel data? To address slope heterogeneity, we can split the units into subgroups based on a characteristic such as income level, democracy, etc., use interaction terms even in homogeneous models, or use heterogeneous models such as DCCE-MG, CS-ARDL, etc. Please let me know if there is another method. Thank you.
I have panel data for various countries where the exchange rate is one of the variables. The exchange rate is denominated in local currency only. My question is: do I need to convert the local-currency exchange rates to dollar-denominated ones?
I have a panel dataset of 120 countries measuring a variable over three periods. The variable indicates the percentage of 1,000 respondents in each country who answered yes to a question; I consider this representative of the probability that a respondent in that country answers yes. The dataset currently has a bimodal distribution across countries, with the results concentrated around 0.0-0.1 and 0.9-1.0. To transform this toward a normal distribution, I am using the logit transformation ln(p / (1 - p)),
where p is the probability of a respondent answering yes. However, in three of the countries 100% of the respondents answered yes, and the logit cannot be calculated at p = 1, since the denominator 1 - p is zero.
What should I do with these three countries in the sample? Is there a legitimate way to lower their values from 1.0 so that they can be used in the formula?
I will also be averaging the panel data over the three periods to create a cross-sectional data set. The three countries of concern have values of 1.0 in a single period only, with results below 1.0 in the other two periods, so their average probability would be less than 1. Would it be appropriate to average the probability values across the periods before applying the logit transformation?
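One common fix for p = 1 (or p = 0) is a smoothing adjustment that pulls proportions slightly away from the boundary before taking the logit, using the number of respondents n. A hedged sketch (this (p*n + 0.5)/(n + 1) adjustment is one common convention, not the only one):

```python
import numpy as np

def smoothed_logit(p, n):
    """Pull p slightly away from 0 and 1 before applying ln(p/(1-p)).

    The (p*n + 0.5)/(n + 1) adjustment is one common convention for
    boundary proportions; n is the number of respondents per country.
    """
    p_adj = (p * n + 0.5) / (n + 1.0)
    return np.log(p_adj / (1.0 - p_adj))

# p = 1.0 with n = 1000 respondents is now finite instead of undefined
print(smoothed_logit(1.0, 1000))
```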
I am trying to recreate an example from data reproduced by William Greene and later discussed by Damodar Gujarati in his book "Basic Econometrics". I have panel data on 6 airlines from 1970-1984. The data cover the costs of six airline firms for the period 1970-1984, for a total of 90 panel observations. The variables are defined as: I = airline id; T = year id; Q = output, in revenue passenger miles (an index number); C = total cost, in $1,000; PF = fuel price; and LF = load factor, the average capacity utilization of the fleet. Cost is the dependent variable and the others are independent variables.
I want to analyze how the cost of individual airlines has been affected by changes in the other factors. Further, I want to know whether the magnitude of the effect of each independent variable is the same for each airline or different. Finally, which factor contributes most to the cost of individual airlines? I am using EViews to recreate the model.
I have estimated two models: 1. with a dummy assigned to each airline; 2. a fixed-effects pooled model.
I am using panel data and want to estimate the impact of regulation on firms' innovation through DID and PSM-DID approaches. I am able to calculate DID but not PSM-DID for panel data. Can anyone with experience please help?
In my panel data (10 time periods and 50 cross-sections), I am using inflation as one of the control variables. When I run the Levin-Lin-Chu unit-root test, the p-value I get is 1. This may be because inflation differs across time periods but is the same across cross-sections. How should I report this?
I have a data sample covering 12 years and 11 companies. Nine of the 11 companies belong to a single parent company, and their strategic investment decisions are not totally independent.
For an analysis of the determinants of investment decisions, can this be a problem?
Thanks in advance,
I have seen a couple of panel data studies where the authors fill in missing data for a variable with a similar variable from another source, for example, taking inequality data from the WDI and filling the missing values with data from the SWIID. Is there any justification for this? And how do we fill in the gaps when the two estimates are calculated differently and on different scales?