Econometric Modeling - Science topic
Questions related to Econometric Modeling
Hi all,
I am facing a challenge with transforming some variables for logarithmic analysis in my ARDL model. These variables contain negative values, which complicates the standard logarithmic transformation. I applied a log transformation with an offset: Ln(X) = sign(x)*ln(abs(x)+1). I would like to know:
- Whether this approach is appropriate for an ARDL model.
- Any better alternatives or additional steps for accuracy and interpretability.
Thanks for your help!
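A minimal sketch of the transform described above, using only the Python standard library. The function names and the sample values are hypothetical. The inverse hyperbolic sine (asinh) is printed alongside only as one commonly discussed alternative for variables with negative values, not as a recommendation for this particular model. Note that for small |x| the transform is close to linear, so coefficients cannot be read as elasticities over the whole range.

```python
import math

def signed_log1p(x: float) -> float:
    """sign(x) * ln(|x| + 1), the transform described in the question."""
    return math.copysign(math.log1p(abs(x)), x)

def inverse_signed_log1p(y: float) -> float:
    """Map a transformed value back to the original scale."""
    return math.copysign(math.expm1(abs(y)), y)

# Hypothetical values, including negatives and zero.
for x in [-250.0, -1.0, 0.0, 0.5, 1200.0]:
    y = signed_log1p(x)
    back = inverse_signed_log1p(y)
    print(f"x={x:9.2f}  signed_log={y:8.4f}  asinh={math.asinh(x):8.4f}  back={back:9.2f}")
```

Having the inverse available makes it easier to report fitted values or forecasts in the original units.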
The question is pretty straightforward, I think. I was searching for empirical research (econometric modelling, specifically) on segmented labor markets. I didn't find much, maybe because it is not a very recent theory. But what I have found, especially for Latin America, seems to reject the theory. Is there a definitive answer or consensus on this?
Also, what would be the ideal test for the presence of market duality or segmentation?
Good day scholars!
Suppose I have five logged time series variables with mixed orders of integration: two I(1), two I(2), and one I(0). Apart from a simultaneous equation system, which other econometric models can be used to model these variables?
Thanks for being there always
Dear Scholars,
Would you please help me find a model of sustainable climate risk management at the firm level?
To study the complementarities of education in the socioeconomic development of developing countries, how can we develop a model linking education and the environment?
I have a heterogeneous panel data model with N=6 and T=21. What is the appropriate regression model? I applied the CD test, and it shows that the data have cross-sectional dependence.
I used second-generation unit root tests, and the results show that my data are stationary at level.
Is it possible to use PMG? Would you please explain the appropriate regression model?
Dear Scholars, I am measuring the effect of policy on firms' performance. I found a common (in both treatment and control groups) structural break 4 years before the policy intervention. I used the difference-in-difference model to find the impact of the policy. I am using 20 years of firm-level panel data. What are your suggestions?
I am currently trying to understand which dynamic panel model is suitable when the number of years of observation (t) is higher than the number of units of observation (n). The dataset contains 6 cross-sectional regions observed over a span of 30 years.
Hi! I would like to have an opinion on something, rather than a straight-out answer, so to speak. In time-series econometrics, it is common to present both the long-term coefficients from the cointegrating equation and the short-term coefficients from the error correction model. Since I have a lot of specifications, and since I'm really only interested in the long term, I only present the long-term coefficients from a cointegrating equation in a paper I'm writing. Would you say that is feasible? I'm using the Phillips-Ouliaris single-equation approach to cointegration.
I am working on some macroeconomic forecasting on a 15-year data sample. I don't know which multivariate time series technique would be appropriate for the analysis since one variable is stationary at the second difference. Could someone please suggest which method would be better for my analysis, along with the supporting material?
Economics has treated econometrics as a universal solvent, a technique that can be applied to any economic question, which is sufficient and, therefore, makes other applied techniques redundant.
Peter Swann in his book indicates the place of econometrics and argues against this notion and even takes this as a severe error. He advises fellow economists that they learn to respect and assimilate what he calls vernacular knowledge of the economy. His top message to economists is what the great French composer, Paul Dukas, advised his pupil: “Listen to the birds, they are great masters.” If any fellow economist asks: “don’t most economists do this already?” Then the answer by Swann is clear: “… some economists do use vernacular knowledge some of the time to underpin what they do … incidentally to make a piece of high technique more approachable … outside this limited context, economists do not tend to take the vernacular seriously."
Any argument for or against it?
Hi all, I'm doing my final-year project (FYP) on the determinants of healthcare expenditure in 2011-2020. Here are my variables: government financing, GDP, total population.
The first model is: healthcare expenditure_t = B0 + B1*gov financing_t + B2*gdp_t + B3*population_t + e_t
The second (causal relationship) model is: healthcare expenditure per capita_t = B0 + B1*gdp per capita_t + e_t
Is it possible to use a unit root test and then ARDL for the first model, and what test can I use for the second model?
Thank you in advance to those who reply :)
In the context of ADAS features, what kind of econometric modeling would best represent them? Note that the ADAS parameters will be those that reflect the integrity and performance of the system, for example in-vehicle cooling, seat and glass settings, etc.
Thank you, in advance.
In the file that I attached below, there is a line above the theta(1) coefficient and another one directly below C(9). In addition, what is the number below C(9)? There is no description.
The models that are used to check the robustness of the main econometric model may not always provide 100% parallel outcomes. Does it mean that there are flaws in the main estimation outcomes?
Dear research community,
I am currently working with Hofstede's dimensions; however, I do not use his exact questionnaire. In order to calculate my index in accordance with his process, I am looking for the meaning of the constants in front of the mean scores.
For example: PDI = 35(m07 – m02) + 25(m20 – m23) ... What do 35 and 25 mean? How could I calculate them with regard to my research?
Thank you very much for your help!
Best wishes,
Katharina Franke
Although structural break points play a crucial role in time-series analysis, many past studies have ignored them. There are studies that have identified break points but have not incorporated them into the main econometric model. Given that the incorporation of break dummies can significantly influence the final model outcomes, what is the rationale for finding the breaks and not incorporating them into the main model?
I collected 109 responses for 60 indicators to measure the status of urban sustainability as a pilot study. As far as I know, I cannot run EFA because each indicator requires at least 5 responses, but I do not know whether I can run PCA with so few responses. Would you please advise on the applicability of PCA or any other possible analysis?
I'm working on Chinese OFDI in Africa and I really want to test my hypothesis via an econometric model based on a micro-level index. So if you have a resource or article link, please let me know so that I can learn how this is possible.
Hi economics researchers, please help: what type of econometric model is used to compare the income of improved crop variety adopters and non-adopters?
What is the most acceptable method to measure the impact of regulation/policy so far?
I only know that Difference-in-Differences (DID), Propensity Score Matching (PSM), and Two-Step System GMM (for dynamic models) are common methods. I would welcome your opinion for a 20-year firm-level panel.
Dear everyone,
I am in great distress and desperately need your advice. I have the cumulated (disaggregated) survey data of an industry (total export, total labour costs, etc.) for 380 firms. The original paper uses a two-stage least squares (TSLS) model in order to analyze several industries, with one independent variable having a relationship with the dependent variable, which, according to the author, was the reason not to use OLS. However, I want to conduct a single-industry analysis and exclude the variable with that relationship, but instead analyze the model over 3 years. What is the best econometric model to use? Can I use an OLS regression over a period of 3 years? If yes, which tests are applicable then?
Thank you so much for your help, you are helping me out so much !!!!!!!
Hello everyone,
I would like to analyze the effect of innovation in one industry over a time period of 10 years. The dependent variable is exports and the independent variables are R&D and labour costs.
What is the best model to use? I am planning to use a log-linear model.
Thank you very much for your greatly needed help!
Dear colleagues,
I am planning to investigate a panel data set containing three countries and 10 variables. The time frame is a bit short (2011-2020 for each country), which concerns me. What should the sample size be in this case? Can I apply fixed effects, random effects, or pooled OLS?
Thank you for your responses beforehand.
Best
Ibrahim
I have panel data covering 12 countries of the MENA region for the period 1996-2019. The dependent variable is tourism receipts and the independent variables are the World Bank governance indexes (political stability, government effectiveness, control of corruption, and voice and accountability). I also have a UNESCO sites dummy and a few economic indicators such as GDP and GDP per capita. Which models should I implement: fixed or random effects? What is the step-by-step process, and how should I start?
Below I tried pooled OLS, but I am not sure whether this is correct.
I have tried to estimate Bayesian spatial econometric models using R. However, I felt that the functions and packages are limited in R. Has anyone used R for Bayesian spatial econometrics? Would other software do a better job of estimating such models?
I am working on a project on the impact of ICT investment on the economic growth of the MENA region from 2007 to 2017. We will use an econometric model and set the ICT Development Index (IDI) as a proxy to measure ICT. I found data on IDI ranks for 2007, 2008, 2010, 2011, 2012, 2013, 2015, 2016, and 2017, but I can't find them for 2009 and 2014. Can someone tell me where I can find these two years?
thanks in advance.
Dear Colleagues,
I ran an Error Correction Model, obtaining the results depicted below. The model comes from the literature, where Dutch disease effects were tested in the case of Russia. My dependent variable was the real effective exchange rate, while oil prices (OIL_Prices), terms of trade (TOT), public deficit (GOV), and industrial productivity (PR) were the independent variables. My main concern is that only the Error Correction Term, the dummy variable, and the intercept are statistically significant. Moreover, the residuals are not normally distributed and are heteroscedastic. There is no serial correlation issue according to the LM test. How can I improve my findings? Thank you beforehand.
Best
Ibrahim
I have non-stationary time-series data for variables such as energy consumption, trade, oil prices, etc., and I want to study the impact of these variables on the growth in electricity generation from renewable sources (I have taken the natural logarithms of all the variables).
I performed a linear regression, which gave me spurious results (R-squared > 0.9).
After testing these time series for unit roots using the Augmented Dickey-Fuller test, all of them were found to be non-stationary, hence the spurious regression. However, the first differences of some of them, and the second differences of the others, were found to be stationary.
Now when I run the new linear regressions with each variable differenced according to its order of integration (in order to have a stationary model), the statistical results are not good (high p-values for some variables and a low R-squared of 0.25).
My question is: how should I proceed now? Should I change my variables?
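As a small illustration of the differencing step described above, here is a sketch in plain Python. The helper name and the series are hypothetical. It only shows the mechanics of taking first and second differences; it says nothing about which model is appropriate. A low R-squared in a regression on differenced data is common and is not by itself evidence of a poor model, since differencing removes the shared trend that inflated the R-squared in levels.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (each pass drops one observation)."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# Hypothetical logged series with a quadratic-looking trend.
log_series = [1.0, 1.5, 2.5, 4.0, 6.0, 8.5]
print("first difference: ", difference(log_series, 1))
print("second difference:", difference(log_series, 2))
```

Each variable would be differenced to its own order before estimation, so the usable sample shrinks by the largest order applied.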
Does anyone have the codes (written on RATS/MATLAB/any other platform) for Rolling Hinich Bicorrelation test and Rolling Hurst Exponent test? Would greatly appreciate if you could share them.
Hi! I have a model for panel data, and my teacher told me to estimate the model with different coefficients for one of the explanatory variables. She gave me an example:
lpop @expand(@crossid) linv(-1) lvab lcost (for different coefficients for intercept)
or
lnpop c lninv @expand(@crossid)*lnvab lncost (for different coefficients for this variable).
Can someone explain to me how to do that? I tried, but it didn't work.
Dear colleagues,
I applied the Granger Causality test in my paper and the reviewer wrote me the following: the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...
What can I respond in this case?
P.S. I had a small sample size and serious data limitation.
Best
Ibrahim
In my work, I use a multivariate GARCH model (DCC-GARCH). I am testing for autocorrelation in the variance model. The Ljung-Box tests (Q) for the standardized residuals and the squared standardized residuals give different results.
Should I rely on the Ljung-Box test on the standardized residuals or on the squared standardized residuals?
N=1500
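To make the two statistics concrete, here is a minimal standard-library sketch of the Ljung-Box Q statistic, Q = n(n+2) * sum over k of r_k^2 / (n-k), applied once to a series and once to its squares. The simulated ARCH(1)-type series (with hypothetical parameters 0.2 and 0.7) is only a stand-in; in practice the inputs would be the standardized residuals from the fitted DCC-GARCH model. As usually described, Q on the residuals checks for leftover serial correlation in the mean, while Q on the squared residuals checks for leftover ARCH effects, so the two can legitimately disagree.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))

# Hypothetical series: serially uncorrelated in levels, clustered in variance.
rng = random.Random(0)
eps, prev = [], 0.0
for _ in range(1500):
    sigma2 = 0.2 + 0.7 * prev ** 2
    prev = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
    eps.append(prev)

print("Q(10), residuals:        ", round(ljung_box_q(eps, 10), 2))
print("Q(10), squared residuals:", round(ljung_box_q([e * e for e in eps], 10), 2))
```

Each Q value is compared with a chi-square critical value whose degrees of freedom depend on the number of lags (and, for model residuals, on the number of fitted parameters).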
Hello everyone. I am using a VECM and I want to use variance decomposition, but as you know, variance decomposition is very sensitive to the ordering of the variables. I read in some papers that it is better to use generalized variance decomposition because it is invariant to the ordering of the variables. I am using Stata, R, or EViews, and my problem is how to perform generalized variance decomposition. If anyone knows, please help me.
I am running an ARDL model in EViews and I need to know the following, if anyone could help!
1. Is the optimal number of lags for annual data (30 observations) 1 or 2, or should a VAR be estimated to find the optimal number of lags?
2. When we estimated the VAR, the maximum number of lags applicable was 5; beyond 5 we got a singular matrix error. The problem is that as we increase the maximum number of lags, the optimal number of lags increases (when we choose 2 lags, we get 2 as the optimum; when we choose 5 lags, we get 5 as the optimum). What should be done?
It is obvious that economic publications are becoming so quantitative that they seem detached from reality and have lost economic sense. The advanced econometric models we use are sometimes difficult for other economists to comprehend. Is it not time to prioritize qualitative research so that we have a larger audience and our publications become more relevant?
Hi!
I would like to ask about the possibility of improving the MAPE values for the VAR-model. When the lag intervals change, the MAPE values improve. However, the question arises as to the rationale for such an approach to changing the lag intervals. Earlier in the analysis of the VAR-model, I explained the feasibility of the lag intervals following the test results of Lag Length Criteria and Lag Exclusion Tests. And now I'm not sure if from a scientific point of view I can explain the change in the lag intervals as a result of improving the MAPE values. Tell me, please, maybe someone knows the rationale for such an approach?
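For reference, a minimal sketch of the MAPE calculation in plain Python. The function name and the numbers are hypothetical. One way the comparison is often kept defensible is to compute MAPE on a hold-out period that was not used when the lag length was chosen, so that a lower MAPE is not simply the result of tuning the lags to the same data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual value is zero."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical hold-out observations and forecasts from two lag specifications.
actual = [100.0, 200.0, 400.0]
forecast_lag2 = [110.0, 180.0, 400.0]
forecast_lag4 = [104.0, 190.0, 410.0]
print("MAPE, 2 lags:", round(mape(actual, forecast_lag2), 3))
print("MAPE, 4 lags:", round(mape(actual, forecast_lag4), 3))
```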
Hello. I am working on a classroom study on the effect of environmental factors on German energy demand. I did not find an econometric model or related decomposition model. Can anyone help?
Dear Colleagues,
I noticed that when I estimate an equation by Least Squares in EViews, the Options tab has a check box for degrees-of-freedom (d.f.) adjustment. What is its importance and role? When I estimate an equation without the d.f. adjustment, I get two statistically significant coefficients out of five explanatory variables; however, when I estimate with the d.f. adjustment, I do not get any significant results.
Thank you beforehand.
Dear Colleagues,
If I have 10 variables in my dataset (time series), of which 9 are explanatory and 1 is dependent, and if I establish that all the variables are non-stationary, should I take the first difference of the dependent variable as well?
Best
Ibrahim
I am working on the Green Solow model as part of my university studies. I am trying to create an econometric model close to that of Brock and Taylor. However, I am desperately looking, so far without success, for a database that lists the abatement costs of reducing polluting emissions by country. Do you know of a database that you can recommend to me?
Thank you for your time and I wish you a pleasant day / evening.
God bless you,
Cedric
Dear colleagues,
I am applying PCA to political and institutional variables to create an index and use it in the regression analysis as a dependent variable. However, the variables that will form the main components are measured on different scales. For example, while control of corruption ranges between -2.5 (weaker) and 2.5 (stronger), freedom of the press ranges between 0 and 100, and a higher value indicates less press freedom. So I am at a loss to understand whether this difference makes it harder for PCA to produce a valid index. In other words, is it a problem for PCA if one variable implies greater success as its values get higher, while another variable implies greater success as its values get lower? What should I do in this case? Thank you beforehand.
Best
Ibrahim
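A minimal sketch of one common preprocessing step for the situation described above: put every variable on the same scale (z-scores) and flip the sign of variables where a higher value means a worse outcome, so that all inputs point in the same direction before PCA. The function names and the four data points are hypothetical. Strictly speaking, the direction of a variable only changes the sign of its loading, whereas the scale does affect the components unless PCA is run on standardized data (the correlation matrix).

```python
import statistics

def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def align_and_standardize(values, higher_is_better=True):
    """Z-score the variable; reverse its sign if higher raw values mean a worse outcome."""
    z = zscore(values)
    return z if higher_is_better else [-v for v in z]

# Hypothetical country scores.
control_of_corruption = [-1.2, -0.3, 0.4, 1.5]   # -2.5 (weaker) to 2.5 (stronger)
press_freedom_score = [80.0, 55.0, 40.0, 10.0]   # 0-100, higher = less free

cc = align_and_standardize(control_of_corruption, higher_is_better=True)
pf = align_and_standardize(press_freedom_score, higher_is_better=False)
for a, b in zip(cc, pf):
    print(f"{a:7.3f} {b:7.3f}")
```

After this step both columns increase with "better" institutions, which makes the sign of the resulting index easier to interpret.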
How can the socioeconomic impact of a dam be assessed with econometric models? If one wants to compare the livelihood of the same community before and after the dam's construction, which method is easy and applicable?
I need expert advice on the best econometric model to assess the technical efficiency of commercial banks in resource mobilization in Ethiopia.
Dear respectful community researchers,
I am interested to find out the impact of Institution, Geography, and Trade on Economic Development of a single country.
I do know that for multi-country analyses, researchers mostly use the Hausman and Taylor (1981) model to find the impact. Unfortunately, I have no idea whether any model exists that is suitable for a single country.
I am very much looking forward to hearing from you.
Thank you so much indeed
Best Regards,
Abdul Rahim
I am writing my dissertation about the effects of economic reforms on economic growth in the case of Egypt in 2016 and I need suggestions for what econometric models are usually used in testing such effects using indices like GDP, inflation and so on.
If we have to check the co-movements/spillover effects between a certain variable of Pakistan and 5 other countries, is it possible using a single GARCH model? If yes, what will be the variance equation?
Or do we have to study each country one by one against Pakistan, keeping all other countries in the variance equation?
Assume
basic model: A = Con + B + C + D + E + error (refer to the figure for a detailed elaboration).
Please give me some recommended econometric models for studying the adoption of agricultural technologies.
I am interested in a multi-hurdle econometric model to analyze multiple-hurdle data for my study. Currently, I am looking for model specification materials and the commands used for the analysis in Stata. I would really appreciate your help!
- I want to estimate a stochastic frontier model to find the technical efficiency and technological gap of agriculture in each geographical region.
- However, the outcome variable (agricultural income) is not continuous; it is an ordered categorical variable.
Example:
y = 1 if agricultural income is less than or equal to $100
y = 2 if agricultural income is more than $100 but not greater than $200
y = 3 if agricultural income is more than $200
- How can I estimate stochastic frontier models for discrete output variables with Stata or R? Which command could I use? Or is there another approach to deal with this type of outcome variable?
The explanation of the model is in page 290: https://link.springer.com/chapter/10.1007/978-3-030-23727-1_9
I am trying to see how increasing the minimum wage affects unemployment rates at the county level. I will compare a state with the federal minimum wage to a state with a higher minimum wage. What variables need to be controlled for? Is there a better model/regression/method?
Hello,
I'm doing research on the impact of the US-China trade war on FDI, but I'm not sure which model I should choose because the data cover only a short range (2018-2019). My advisor recommends using multiple regression with the trade war as a dummy, or a VAR model. Do you have any recommendation on which model I should use? Or if you can recommend a study on analysing shock effects on FDI, it would really help me a lot.
Thank you
I would like to review some papers that use econometric models to estimate the Total Factor Productivity (TFP) of any country. I have only found papers using the Solow residual technique. So, would you please suggest some papers that use econometric models to estimate TFP?
Thank you!
Dear All,
I would like to perform an event study analysis through the website https://www.eventstudytools.com/.
Unfortunately, they ask for data to be uploaded in a format I don't understand. I don't know how to put the data in this form, and I can't find a user manual or an email address to contact them.
Can anyone kindly advise how to use this service and explain it in a plain, easy way?
Thanks in advance.
Ahmed Samy
Dear All,
I'm conducting an event study for a sample of 25 firms that have each gone through a certain yearly event (inclusion in an index).
(The 25 firms (events) are collected from the last 5 years.)
I'm using daily abnormal returns (AR), and I averaged the daily abnormal returns across the 25 firms to get the daily "Average Abnormal Returns" (AAR).
Estimation Window (before the event)= 119 days
Event Window = 30 days
1- I tested the significance of the daily AAR through a t-test and the corresponding p-value. How can I calculate the statistical power for those daily p-values?
(significance level used = 0.05, two-tailed)
2- I calculated "Cumulative Average Abnormal Returns" (CAAR) for some period in the event window and performed a significance test for it with a t-test and the corresponding p-value. How can I calculate the statistical power of this CAAR significance test?
(significance level used = 0.05, two-tailed)
Thank you for your help and guidance.
Ahmed Samy
I am working on the economic history of Switzerland and I would like to know which determinants fostered industrialization during the 19th century.
I am working with time series. My dependent variable is the gross value added of Swiss industries and I have 5 explanatory variables (such as education, railways, tariffs, etc.). The time period studied runs from 1890 to 1913.
I first used a VAR model, but the reviewers are not convinced... They prefer panel data (which I don't have!), or they think that VAR is unusual...
So, do you have any idea about the macroeconometric model I should use to address my research question?
Thanks a lot !!
Hello researchers and experts,
I am trying to run a Spatial Autoregressive Model (SAR) and a Spatial Durbin Model (SDM). But when I try to see the impacts (direct, indirect, and total effects), R gives the following error: "Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, : length(listw$neighbours) == n is not TRUE".
Here is my code:
library(rgdal)       # readOGR()
library(spdep)       # dnearneigh(), nb2listw()
library(spatialreg)  # lagsarlm(), impacts() (in older spdep versions these lived in spdep)
# Read the shapefile and extract the point coordinates
SMod <- readOGR(dsn = "C:/Users/FRJM/Desktop/Data/Spatial 3079", layer = "Sp3079")
Coord <- coordinates(SMod)
# Distance-based neighbours (0-150 km, since longlat = TRUE) and row-standardized weights
SpD100 <- dnearneigh(Coord, d1 = 0, d2 = 150, longlat = TRUE)
SpD100listW <- nb2listw(SpD100, style = "W", zero.policy = TRUE)
Equ1 <- Volatility ~ A + B + C + D + E
reg.SAR1 <- lagsarlm(Equ1, data = SMod, listw = SpD100listW, tol.solve = 1.0e-20, zero.policy = TRUE)
impacts(reg.SAR1, listw = SpD100listW)
If I run summary(reg.SAR1), I get the summary result. So there is no error in running lagsarlm(Equ1, data = SMod, listw = SpD100listW, tol.solve = 1.0e-20, zero.policy = TRUE).
The only problem is running the "impacts" command.
Thanks in advance for your answer.
Hello, I am dealing with an econometric model that uses system GMM. Please answer the following questions:
1. In order to get better results, what tests should I perform before running the final regression? I mean, do I need to test for autocorrelation, heteroscedasticity, etc. for the different series of the panel data?
2. What is the appropriate test to determine the endogenous, exogenous, and predetermined variables in the panel data? How can these issues be resolved?
E.g., there are two latent variables, namely psychological and motivational factors of entrepreneurs. What could be a possible explanation of this covariance?
Hello fellow researchers,
I am currently in the process of writing my master's thesis and my research topic is homicide rates in the developing world. My research objective is to find a relationship between capital punishment and homicide rates, if any exists.
Literature on similar topics is available online; however, I am not very good at statistics and econometrics and would like some assistance in creating the econometric model for my research. I have gathered all the necessary data for my dependent and independent variables.
Basically, I have a panel dataset of 24 countries from 2001 to 2012 and it is strongly balanced (no gaps or missing values in any variable). The only point where I am stuck is how to specify my econometric model so that I can actually use the data for testing and regression analysis.
I have trade in goods (% of GDP) and trade in services (% of GDP) as explanatory variables. Does it make sense to take logs of them for my model, given that they are already in percentages? Also, will including both trade in goods (% of GDP) and trade in services (% of GDP) together in a single model cause a multicollinearity problem? Please guide.
Thanks
I have seen that some researchers just compare the difference in R2 between two models: one in which the variables of interest are included and one in which they are excluded. However, in my case, this difference is small (0.05). Is there any method by which I can be sure (or at least have some support for the argument) that this change is not just due to luck or noise?
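One standard way to test whether an R2 change between nested OLS models is more than noise is the F-test for the joint significance of the added variables, which can be written directly in terms of the two R2 values. Below is a minimal sketch. The function name is hypothetical, and apart from the 0.05 difference mentioned above, all numbers (sample size, number of regressors, the R2 levels) are made up for illustration.

```python
def f_stat_r2_change(r2_restricted, r2_full, n_obs, k_full, q):
    """F statistic for adding q regressors to a nested OLS model.

    k_full: number of regressors in the full model, excluding the intercept.
    Returns (F, numerator df, denominator df).
    """
    df_resid = n_obs - k_full - 1
    f = ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_resid)
    return f, q, df_resid

# Hypothetical case: R2 rises from 0.30 to 0.35 when 3 variables of interest are added.
f, df1, df2 = f_stat_r2_change(0.30, 0.35, n_obs=500, k_full=8, q=3)
print(f"F({df1}, {df2}) = {f:.2f}")
```

The resulting statistic is compared with the critical value of the F distribution with (q, n - k - 1) degrees of freedom; this form assumes both models are estimated on the same sample with the same dependent variable.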
To illustrate my point, I present a hypothetical case with the following equation:
wage = C + 0.5*education + 0.3*rural area
where the variable "education" measures the number of years of education a person has, and "rural area" is a dummy variable that takes the value 1 if the person lives in a rural area and 0 if she lives in an urban area.
In this situation (and assuming no other relevant factors affecting wage), my questions are:
1) Is the 0.5 coefficient of education reflecting the difference between (1) the mean marginal return of an extra year of education on the wage of an urban worker and (2) the mean marginal return of an extra year of education for a rural worker?
a) If my reasoning is wrong, what would be the intuition behind the mechanism of "holding constant"?
2) Mathematically, how is it that just adding the rural variable "holds constant" the effect of living in a rural area on the relationship between education and wage?
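The mechanics of "holding constant" can be illustrated with the partialling-out (Frisch-Waugh-Lovell) result: the coefficient on education in the regression with the rural dummy equals the slope obtained after removing the urban and rural group means from both education and wage. In other words, it is a single common slope estimated from variation within each group, not a difference between urban and rural returns. The sketch below uses hypothetical, noise-free data generated from the equation in the question with C = 2; the helper names are made up.

```python
from statistics import fmean

def within_group_demean(values, groups):
    """Subtract each observation's group mean (this residualizes on an intercept and the dummy)."""
    means = {g: fmean([v for v, gg in zip(values, groups) if gg == g]) for g in set(groups)}
    return [v - means[g] for v, g in zip(values, groups)]

def slope(x, y):
    """OLS slope of y on x when both are already demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical data: wage = 2 + 0.5*education + 0.3*rural, with no noise.
education = [8, 10, 12, 16, 6, 8, 10, 12]
rural = [0, 0, 0, 0, 1, 1, 1, 1]
wage = [2 + 0.5 * e + 0.3 * r for e, r in zip(education, rural)]

# "Holding rural constant": compare people only with others in the same area.
b_education = slope(within_group_demean(education, rural), within_group_demean(wage, rural))

# Ignoring rural: only the overall mean is removed, so the rural effect leaks into the slope.
one_group = [0] * len(education)
b_naive = slope(within_group_demean(education, one_group), within_group_demean(wage, one_group))

print("education coefficient, rural held constant:", round(b_education, 4))
print("education coefficient, rural omitted:      ", round(b_naive, 4))
```

Because rural workers in this made-up sample have less education on average, omitting the dummy changes the education slope, while the within-group version recovers the 0.5 used to generate the data. If the return to education were allowed to differ between areas, an education*rural interaction term would be needed.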
Dear All,
Doing financial event studies in Excel is a horrible process because of the arranging and chopping of huge data sets, complicated manual calculations, etc.
Please advise what software is available that can do financial event studies in a neater and more time-efficient way.
Thanks
Ahmed Samy
I have used high-frequency financial time series data to study futures market volatility and liquidity. What are the major limitations of econometric models like GARCH and TARCH, and how could they affect our results?
Dear All,
I’m conducting an event study on the inclusion of companies in a certain index.
The event is the “inclusion event” for companies in this index over the last 5 years.
For the events, we have a yearly Announcement Date (AD) for inclusions, and also an effective Change Date (CD) for the inclusion in the index.
Within the same year, I have aligned all companies on AD as day 0, and since they are companies from the same year, CD also aligns for all of them.
The problem comes when I try to aggregate companies from different years. Although I aligned them all to have the same AD, the CD differs from one year to another, so the CDs don’t align for companies from different years.
How can I overcome this misalignment of CD across years, so that I’m able to aggregate all the companies together?
Many Thanks.
I am running a regression in Stata.
As the dependent variable, I have the market share of smartphones (quarterly) for Apple and Samsung, and independent variables are Functional improvements and Design innovation (scored also quarterly).
My supervisor suggested that I include time fixed effects in order to account for the Christmas boost, and I do not really understand how to do it.
And the second question is, am I capturing the interaction effect correctly?
So far I did...
xtset idcompany qdate
reg marketshare design function
and for interaction effect I did
gen designfunction=design*function
reg marketshare design function designfunction
and I got really good p-values and R^2, but my coefficient for design*function is -0.09, and I am very curious how I should interpret it.
Does this all make sense? I am really new to Stata. I would really appreciate any help.
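Two small points from this question can be sketched numerically. With an interaction term, the effect of design on market share is no longer a single number: it equals b_design + b_interaction * function, so a negative interaction coefficient means the effect of design shrinks as the function score rises. Seasonal (for example, Christmas-quarter) effects are typically captured with quarter-of-year dummies, leaving one quarter out as the baseline. The sketch below is in Python rather than Stata; the -0.09 is the interaction coefficient reported above, while the design coefficient of 0.50, the function scores, and the helper names are hypothetical.

```python
def marginal_effect_design(b_design, b_interaction, function_score):
    """Effect of a one-unit change in design at a given level of function."""
    return b_design + b_interaction * function_score

def quarter_dummies(quarter, base=1):
    """0/1 indicators for quarter of the year (1-4), omitting the baseline quarter."""
    return {f"q{q}": int(quarter == q) for q in range(1, 5) if q != base}

b_design = 0.50        # hypothetical
b_interaction = -0.09  # coefficient on design*function reported in the question
for function_score in [0.0, 2.0, 5.0]:  # hypothetical scores
    effect = marginal_effect_design(b_design, b_interaction, function_score)
    print(f"function={function_score}: effect of design = {effect:.2f}")

print(quarter_dummies(4))  # the Christmas quarter switches on q4 only
```

In Stata the same specification is usually written with factor-variable notation, for example interacting the two continuous regressors with c.design##c.function and adding i.quarter for a quarter-of-year variable; treat that as a pointer to check against the Stata documentation rather than a tested command.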
Dears,
I'm conducting an event study for the effect of news announcement at certain date on stock return.
Using the market model to estimate the expected stock return in the "estimation window", we need to regress the returns of the stock under study on the returns of a market portfolio index.
1- How should we choose this market portfolio index for the regression?
Is it just the main index of the market?
Or the sector index to which the stock under study belongs, etc.?
2- Is it necessary for the stock under study to be among the constituents of this market index?
I would appreciate it if you could support your answers with research citations where possible.
Many thanks
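For readers unfamiliar with the market model step mentioned above, here is a minimal sketch: alpha and beta are estimated by OLS of the stock's returns on the index returns over the estimation window, and the abnormal return in the event window is the actual return minus alpha + beta * market return. The function names and all return figures are hypothetical, and the sketch does not address which index should be used.

```python
from statistics import fmean

def fit_market_model(stock_returns, market_returns):
    """OLS estimates (alpha, beta) of stock = alpha + beta * market + error."""
    mean_s, mean_m = fmean(stock_returns), fmean(market_returns)
    cov = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    return mean_s - beta * mean_m, beta

def abnormal_returns(stock_returns, market_returns, alpha, beta):
    """Actual return minus the return predicted by the market model."""
    return [s - (alpha + beta * m) for s, m in zip(stock_returns, market_returns)]

# Hypothetical estimation-window returns.
est_market = [0.010, -0.005, 0.007, 0.002, -0.012, 0.004]
est_stock = [0.013, -0.004, 0.006, 0.005, -0.015, 0.007]
alpha, beta = fit_market_model(est_stock, est_market)

# Hypothetical event-window returns.
evt_market = [0.003, -0.002, 0.006]
evt_stock = [0.020, 0.001, 0.004]
ar = abnormal_returns(evt_stock, evt_market, alpha, beta)
print(f"alpha={alpha:.5f}  beta={beta:.3f}")
print("abnormal returns:", [round(a, 4) for a in ar])
```

Re-running the same calculation with a broad market index and with a sector index is one simple way to see how sensitive the abnormal returns are to the benchmark choice.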
I'm working with life satisfaction as my dependent variable and some independent variables that measure purchasing power (consumption, income, and specific expenditures). To take into account the diminishing marginal returns of these variables (following the literature), I transformed them into their natural logarithms. However, I now want to compare the size of the coefficients of specific expenditures with those of consumption and income. Specifically, I would like a procedure that allows me to interpret the result like this: one unit of resources directed to a type of expenditure (say, culture) is more/less effective at improving life satisfaction than the same unit would be under the category of income. If I do this without the natural logarithm (that is, with the variables expressed in dollars), the coefficients change in counterintuitive ways, so I would prefer to avoid this.
I was thinking about using beta coefficients, but I don't know whether it makes sense to standardize a coefficient on a variable that is already in logarithms.
I am completing a dissertation project for my undergraduate economics degree and need assistance in finding an appropriate econometric model to test the effect of trade policy on GDP in the following East African countries: Kenya, Tanzania, Uganda, Burundi, and Rwanda. I was thinking that estimating a VAR model for each country over 10 years would be a possible method to tease out the effect of trade policy. If possible, could I model my work on the following paper: https://www.researchgate.net/deref/http%3A%2F%2Fwww.cluteinstitute.com%2Fojs%2Findex.php%2FJBER%2Farticle%2Fview%2F2801%2F2849
In a regression with a database of N=1200, I have an independent dummy variable that measures whether the respondent is unemployed or employed. The variable has the following distribution:
Unemployment = 0 - Frequency: 1196
Unemployment = 1 - Frequency: 4
The regression gives me a significant coefficient that is also very counterintuitive (specifically, that life satisfaction has a positive association with unemployment). I think, however, that it is wrong to draw a valid conclusion from just 4 cases with Unemployment = 1. I also have other dummy variables where the situation is even less clear. For example:
Dummy = 0 - Frequency: 1170
Dummy = 1 - Frequency: 30
Or even more:
Categorical option A = 0 - Frequency: 1150
Categorical option B = 1 - Frequency: 30
Categorical option C = 2 - Frequency: 12
Categorical option D = 3 - Frequency: 8
Can I obtain valid conclusions from this? And, in more general terms, is there a minimum number of observations needed per response category in each independent variable for the conclusions drawn from it to be pertinent/correct? If so, how can I calculate this number?
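One way to see why very small categories are fragile: in the simplest case (a regression on a single dummy, with equal error variance in both groups), the standard error of the dummy's coefficient is the residual standard deviation times sqrt(1/n0 + 1/n1), so it is driven almost entirely by the smaller group. The sketch below evaluates that factor for the splits listed above; the balanced 600/600 split is a hypothetical benchmark and the function name is made up. With additional regressors the formula is only an approximation, and it does not yield a universal minimum cell size.

```python
from math import sqrt

def dummy_se_factor(n0, n1):
    """SE of a difference in group means, in units of the residual standard deviation."""
    return sqrt(1.0 / n0 + 1.0 / n1)

for n0, n1 in [(1196, 4), (1170, 30), (600, 600)]:
    print(f"n0={n0:5d}  n1={n1:4d}  SE factor={dummy_se_factor(n0, n1):.3f}")
```

Beyond the wider standard error, an estimate based on 4 cases also relies heavily on those few observations being representative and on the equal-variance assumption, which is hard to check with so little data.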
In order to analyze if there is a mediation effect using Baron & Kenny's steps, is it necessary to include the control variables of my model, or is it enough to do the analysis just with the independent variable, the mediator variable and the dependent variable of my interest?