
Data Model - Science topic

Explore the latest questions and answers in Data Model, and find Data Model experts.
Questions related to Data Model
  • asked a question related to Data Model
Question
2 answers
I am currently researching control strategies for wind-excited tall buildings and am seeking the MATLAB data/models for the wind-excited 76-story building benchmark developed by Yang et al. at UC Irvine (1997/2000). The original links appear to be inactive. Could anyone help me locate these resources? Any assistance or direction would be greatly appreciated.
Relevant answer
Answer
Thanks for your help
  • asked a question related to Data Model
Question
4 answers
Integration of AI applications with cybersecurity to create a model or agent with faster and more reliable techniques for penetrating a system, so that we may understand its flaws and improve upon them.
Relevant answer
  • asked a question related to Data Model
Question
2 answers
Relevant answer
Answer
Definitely yes
  • asked a question related to Data Model
Question
2 answers
Hello researchers, greetings.
I want to run a panel data model in Stata. My panel data consist of a monthly time variable with 6 cross-sectional observations. When I import the data into Stata, the time variable is read as a string, and when I generate a monthly time variable, it extends many periods ahead. Can anyone help me solve this problem?
Relevant answer
Answer
You can use the code below:
generate time = td(30 jan 2012)+_n-1
br time
format time %td
br time
tsset time
Before using this code, copy the data set, excluding the time/date column, and paste it into Stata.
If you have any problems, call me on +2348060962048
  • asked a question related to Data Model
Question
2 answers
Generally life extension and anti-aging. A lower death rate cancels out a low birthrate. https://www.researchgate.net/publication/382049802_Correcting_Cell_Errors
Relevant answer
Answer
This can be achieved by improving the socioeconomic status of individuals and reducing economic hardship.
Reduce reliance on contraceptive use as a depopulation strategy.
From the Islamic point of view, encourage polygamy, where a man is entitled to four wives provided he can cater for their well-being.
  • asked a question related to Data Model
Question
3 answers
Can someone direct me to a working link to download the Century model for SOC?
The link on the Colorado State University site below doesn't seem to work.
Relevant answer
Answer
Hello. I also need this model. If you find it, please send it to me
  • asked a question related to Data Model
Question
2 answers
Hi,
I am working with a panel data model in which I examine the effects of demographic indicators associated with population ageing on the at-risk-of-poverty rate. In this model I found a negative, statistically significant regression coefficient for the regressor Proportion of seniors, which means that a higher proportion of seniors should be associated with a lower poverty-risk rate, and vice versa. Please, how could I explain this in my thesis? The model has also been tested for heteroskedasticity, autocorrelation, and multicollinearity, and all tests come out well.
Thank you!
Relevant answer
Answer
Hakan is right, but to make sure it is not a spurious correlation I would check the regression model over different time periods (years) in which, for example, the economic situation of seniors was different. Whether this kind of validation is feasible depends on how long your time series are.
  • asked a question related to Data Model
Question
1 answer
I am analyzing some time-series data. I wrote a script in R and used two methods from two different R packages to calculate the DW statistic and the respective p-values. Surprisingly, for the same value of the DW statistic they give me significantly different p-values. Why, and which one is more trustworthy (I assume the one calculated with durbinWatsonTest)? Part of my code is below:
dwtest(model)
durbinWatsonTest(model)
R output is the following:
data: model
DW = 1.8314, p-value = 0.1865
alternative hypothesis: true autocorrelation is greater than 0
lag Autocorrelation D-W Statistic p-value
1 0.07658155 1.831371 0.348
Furthermore, durbinWatsonTest from the car package seems to involve some randomness. For the same data (different from the data above) I executed the script from the terminal several times within a couple of seconds, and the output is as below:
lag Autocorrelation D-W Statistic p-value
1 0.1181864 1.7536 0.216
lag Autocorrelation D-W Statistic p-value
1 0.1181864 1.7536 0.204
lag Autocorrelation D-W Statistic p-value
1 0.1181864 1.7536 0.198
The p-value is different every time I execute the script.
Any ideas why? Which method gives correct p-values, dwtest or durbinWatsonTest?
Relevant answer
Answer
IYH Dear Igor Niezgodzki
IMHO the issue with the inconsistent p-values generated by durbinWatsonTest() may be that by default, durbinWatsonTest() generates a non-zero number of bootstrap replicates to obtain more precise p-values. When run repeatedly, the resulting p-values will differ slightly due to the random sampling involved in the bootstrapping procedure.
To address this, set the number of bootstrap replicates to zero using the reps=0 argument inside durbinWatsonTest(). Doing so will yield deterministic p-values, and they should match the p-values produced by dwtest().
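For what it's worth, here is a minimal R sketch of the comparison (a toy regression, assuming the lmtest and car packages are installed). Two things are worth keeping in mind: dwtest() defaults to the one-sided alternative "greater" while durbinWatsonTest() defaults to a two-sided alternative, which can account for roughly a factor-of-two difference in the reported p-values, and fixing the random seed before the bootstrap-based test makes its p-value reproducible between runs (an alternative to changing the reps argument mentioned above):
library(lmtest)   # dwtest(): analytic p-value, one-sided by default
library(car)      # durbinWatsonTest(): bootstrap p-value, two-sided by default
set.seed(1)                               # toy data, just to have a 'model' object
x <- rnorm(100)
y <- 2 * x + rnorm(100)
model <- lm(y ~ x)
dwtest(model)                             # deterministic p-value
dwtest(model, alternative = "two.sided")  # comparable to car's default alternative
set.seed(42)                              # fix the bootstrap seed so that...
durbinWatsonTest(model)                   # ...this p-value is identical on every run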
  • asked a question related to Data Model
Question
1 answer
The 2024 4th International Conference on Computer Technology and Media Convergence Design (CTMCD 2024) will be held in Kuala Lumpur, Malaysia, on February 23-25, 2024.
---Call For Papers---
The topics of interest for submission include, but are not limited to:
1. Digital design
· Animation design
· Digital media art
· Visual media design
· Digital design analysis
· Smart design
2. Computer Technology
· Artificial intelligence
· Virtual reality and human-computer interaction
· Computer animation
· Software engineering
· Computer modeling
· Data model and method
· Big data search and information retrieval technology
· Intelligent information fusion
All accepted papers will be published in the SPIE conference proceedings, which will be indexed by EI Compendex and Scopus.
Important Dates:
Full Paper Submission Date: February 06, 2024
Registration Deadline: February 13, 2024
Final Paper Submission Date: February 18, 2024
Conference Dates: February 23-25, 2024
For More Details please visit:
Relevant answer
Answer
Thanks for sharing. Wishing you every success in your work.
  • asked a question related to Data Model
Question
3 answers
The website of Colorado University has not been working for months and I don't know where to find this model!
Relevant answer
Answer
That link is not working.
  • asked a question related to Data Model
Question
3 answers
Hi
I intend to model a thermal response that we measured using ERA5 variables and, later, to use this model to predict the future response with CMIP6 variables.
My doubt is how to use precipitation correctly.
In the ERA5 hourly analysis, total precipitation comes in metres accumulated over 1 hour (so m/h, I suppose), while in CMIP6 it comes in kg/m2/s.
kg/m2/s is the same as mm/s, so I'm wondering whether converting ERA5 precipitation from metres/hour to mm/s and using that is valid.
Also, am I messing up units considering that the grids are not even the same in the two datasets?
Thank you!
Cheers, Luís Pereira.
Relevant answer
Answer
Hi Luis
To convert precipitation from g/m2/s, multiply by 3.6 to get precipitation in mm/hour. Since 1 g = 0.001 kg and 1 kg/m2 of liquid water corresponds to a 1 mm depth, kg/m2/s is numerically the same as mm/s, and multiplying by 3600 gives mm/hr. Likewise, if you know the accumulation period of the ERA5 precipitation in hours, you can convert its values in metres to the same unit.
See "On the continuity and distribution of water substance in atmospheric circulations" by E. Kessler, Atmospheric Research 38 (1995) 109-145.
  • asked a question related to Data Model
Question
6 answers
A software design is a plan or a blueprint for building a software program. It is a high-level representation of the structure, behavior, and functionality of the software that guides the coding process. A software design typically includes a number of different components, such as:
  1. Architecture: This describes the overall structure of the software, including how different components will interact with each other and how data will flow through the system.
  2. Data structures: This describes the way that data will be organized and stored within the software, including databases, data models, and other data-related components.
  3. Algorithms: This describes the specific methods and procedures that will be used to perform different tasks within the software, such as sorting data or searching for information.
  4. User interface: This describes how users will interact with the software, including the layout of the user interface, the types of controls and widgets that will be used, and other details related to the user experience.
  5. Functional requirements: This describes the specific features and functionality that the software will provide, including the different tasks that it will be able to perform and the types of data that it will be able to handle.
Relevant answer
Answer
If a software design is not a program (and it isn’t), then what is it?
It is the interface.
Regards,
Shafagat
  • asked a question related to Data Model
Question
8 answers
I am estimating female labour force participation rates using panel data for 7 countries, with data from 1991 to 2021. The literature I have reviewed suggests using GMM only when N is large and T is small. Could you please advise which advanced or dynamic panel model should be used in this case?
Relevant answer
Answer
For small N and large T panel data, you can use the Fixed Effects (FE) model or the First Difference (FD) model. The FE model is preferred when there is no serial correlation in the idiosyncratic errors. However, if there is serial correlation in the idiosyncratic errors, then the FD model may be more efficient.
There are also other advanced panel data models that can be used for small N and large T data such as the Generalized Method of Moments (GMM) and the Maximum Likelihood (ML) estimator. However, these models require more assumptions and may not be appropriate for small N and large T data.
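For reference, a minimal sketch of the two estimators in R with the plm package, using its built-in Produc data purely as a stand-in for your own country-level panel (this illustrates the syntax only, not a recommendation for your specific data):
library(plm)
data("Produc", package = "plm")   # placeholder panel data set
fe <- plm(log(gsp) ~ log(pcap) + log(emp) + unemp, data = Produc,
          index = c("state", "year"), model = "within")   # fixed effects (within)
fd <- plm(log(gsp) ~ log(pcap) + log(emp) + unemp, data = Produc,
          index = c("state", "year"), model = "fd")       # first differences
summary(fe)
summary(fd)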
  • asked a question related to Data Model
Question
4 answers
I have a panel data model with N=42 and T=11. I need the different Stata commands for the second-generation panel unit root tests.
Thanks
Relevant answer
Answer
1/ multipurt y x1 x2 , lags(0)
2/ For the first difference, if needed, you have to difference the variables before using multipurt.
However, you have N>T, and I think running unit root tests is not necessary.
Best
  • asked a question related to Data Model
Question
1 answer
Suppose I have a panel data survey that collects 100 patients' EQ-5D scores through an app. A patient can submit scores more than once as time passes. Time in this model is defined as the interval between the date the patient registered on the app and the date a score was submitted. Some patients fill in the survey several times, but others only once or twice; the latter could be viewed as dropouts if we want to study the trajectory of the EQ-5D scores. In this situation, how should IPW be used to weight the sample, given that early dropout may bias the model because patients who feel better may not want to keep recording their quality of life?
Relevant answer
Answer
Why are you thinking of weighting instead of using an indicator variable for censored data (observations you know have attrition)?
I would not know how to weight observations that report their perception only once or twice relative to more responsive ones. If you weight by the number of times they answered, you are assuming that dropping out is relatively random, I believe.
By applying a censored model you are using estimators that take into account that participants who drop out of the sample are not similar to those who remain.
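If you do go the IPW route from the original question, a minimal, assumption-laden R sketch is below. The data frame df and the variable names (completed, baseline_eq5d, age) are hypothetical: the idea is simply to model the probability of remaining in the study from baseline covariates and weight completers by the inverse of that probability:
# df: one row per patient; completed = 1 if the patient kept submitting scores
drop_model <- glm(completed ~ baseline_eq5d + age, data = df, family = binomial)
df$p_stay <- predict(drop_model, type = "response")   # estimated probability of staying
df$ipw    <- ifelse(df$completed == 1, 1 / df$p_stay, 0)
# the weights can then be supplied to the outcome model for the EQ-5D trajectories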
  • asked a question related to Data Model
Question
7 answers
I have a heterogeneous panel data model with N=6 and T=21. What is the appropriate regression model? I have applied the CD test, and it shows that the data have cross-sectional dependence.
I used second-generation unit root tests, and the results show that my data are stationary in levels.
Is it possible to use PMG? Would you please explain the appropriate regression model?
Relevant answer
  • asked a question related to Data Model
Question
13 answers
I have panel data comprising 5 cross-sections and 14 independent variables; the time-series dimension is 10 years. When I run the panel data model as pooled OLS or a FE model it gives results, but for the random effects model it shows the error that RE estimation requires the number of cross-sections to exceed the number of coefficients for the between estimator used to estimate the RE innovation variance. Can anyone help me obtain results for the random effects model?
Relevant answer
  • asked a question related to Data Model
Question
6 answers
I am seeking recommendations for potential research topics for my PhD in the field of AI in healthcare, with a particular focus on neuroscience. I am interested in exploring how artificial intelligence can be used to improve our understanding and treatment of neurological and neuropsychiatric disorders.
Could you kindly suggest some potential research topics that are currently of interest or relevance in this field? I am particularly interested in topics that leverage machine learning, deep learning, or other AI techniques to analyze neuroimaging data, model brain function, or develop diagnostic or therapeutic tools for neurological and neuropsychiatric disorders.
Relevant answer
Answer
There are several ongoing and emerging research areas in AI for healthcare systems. Some of the newest and next research focuses are:
Explainable AI (XAI) in Healthcare: Explainable AI aims to provide a clear understanding of how AI algorithms work and why they make certain decisions. In healthcare, XAI can help physicians and clinicians understand the reasoning behind AI-powered recommendations or diagnoses, and increase the trust and adoption of AI in clinical practice.
Federated Learning for Healthcare: Federated learning is a machine learning technique that enables multiple parties to collaboratively train a model without sharing their raw data. In healthcare, federated learning can enable the development of AI models on decentralized and diverse datasets while maintaining patient privacy and data security.
AI-assisted Medical Imaging Diagnosis: Medical imaging diagnosis is a time-consuming and error-prone process, and AI can assist radiologists and clinicians in accurately identifying and analyzing medical images. AI-powered medical imaging diagnosis can also help to reduce healthcare costs and improve patient outcomes.
AI-powered Drug Discovery and Development: AI can accelerate the drug discovery and development process by predicting drug efficacy, identifying potential side effects, and optimizing drug dosages. AI-powered drug discovery can also help to identify new therapeutic targets and treatments for diseases.
Personalized Medicine with AI: Personalized medicine aims to provide tailored healthcare interventions based on individual patient characteristics such as genetics, lifestyle, and environment. AI can help to identify personalized treatment options and predict treatment outcomes for patients, improving patient outcomes and reducing healthcare costs.
Overall, the application of AI in healthcare is still in its infancy, and there is a vast scope for research and innovation in this field.
  • asked a question related to Data Model
Question
3 answers
I have a panel data model; my sample includes 6 countries (I won't add more), t = 11 years, and 6 or 7 independent variables.
Can I use all 6 or 7 independent variables when I have only 6 countries (cross-sections)?
Relevant answer
Answer
Well, adding more independent variables has its penalty, mainly in the adjusted R-squared.
Also, being as parsimonious as possible is quite necessary for model building. You could run a pre-analysis to confirm the significance of the independent variables you want to add. However, it is advisable to keep the total number of variables to no more than 6.
  • asked a question related to Data Model
Question
5 answers
Is there any test like this in Stata?
Relevant answer
Answer
Wintoki et al. (2012) adapt the procedure for testing the relevance of the instruments in 2SLS settings; that is to say, they examined the F-statistic for the joint significance of the instruments on the instrumented variable after the first stage, which is the reduced-form equation in which we regress the instrumented variable on all exogenous variables (x) plus some excluded instruments (z).
They carried out two separate procedures, one for the difference equation and one for the level equation, as they use the system GMM estimator. If you are using difference GMM, then you can follow the procedure for the differenced equation. For each procedure you use the instruments for the corresponding equation; for example, when you do the test for the level equation, use the instruments that were used in the level equation.
For more details you can consult their paper:
Wintoki, M. B., Linck, J. S., & Netter, J. M. (2012). Endogeneity and the dynamics of internal corporate governance. Journal of financial economics, 105(3), 581-606.
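For readers who want to see the mechanics of that first-stage check, here is a minimal R sketch with hypothetical names (y2 is the instrumented variable, x1 an included exogenous regressor, z1 and z2 the excluded instruments, df the data): regress the instrumented variable on all exogenous variables plus the excluded instruments and test the joint significance of the instruments with an F-test:
library(lmtest)
first_stage <- lm(y2 ~ x1 + z1 + z2, data = df)  # reduced-form / first-stage equation
restricted  <- lm(y2 ~ x1, data = df)            # same equation without the instruments
waldtest(restricted, first_stage)                # F-test: joint significance of z1 and z2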
  • asked a question related to Data Model
Question
2 answers
Dear Scholars,
I have a stationary dependent variable and non-stationary independent variables. I employed a panel ARDL model, but I would also like to run a static panel data model. To control for country differences I decided to use a fixed effects model, but I could not find a proper answer about taking differences.
Should I take differences of all variables or just of the non-stationary ones?
Thank you very much for your help.
Relevant answer
Answer
Only take the difference of those variables that are non-stationary.
  • asked a question related to Data Model
Question
3 answers
I'm trying to get data on loan officers from microfinance institutions (how many borrowers they approach, loan amount outstanding, portfolio risk, the percentage of complete repayment, etc.). Can anyone suggest a database I could use to obtain data for a panel data model?
Thank you.
Relevant answer
Answer
Data about customers are sensitive and confidential, and I doubt that any bank would be willing to release data about customers' loans. In my country (Nigeria) it may not be possible to obtain such information online because of fraudsters. I therefore have no idea how clearance and permission could be obtained from a microfinance bank to use customers' loan data for research.
  • asked a question related to Data Model
Question
27 answers
One of my big problems is finding articles that could suggest new thoughts/research for my work. Part of the problem is the amount of extraneous material (dirt) that is available. For example, when I see an abstract that is long (more than about 300 words in English), I simply ignore it; my experience tells me it is usually unfounded, vague, or hand-waving. But there is a possibility that there is a grain of something in what I'm ignoring, and a possibility that I'm missing a paper that may be valuable. Then there are all those ad-hominem statements, to which I respond by simply ignoring those authors. I'd like to be more effective at finding new data/models while ignoring the dirt. How can I be more effective at distributing my research?
Relevant answer
Answer
Hi John, Emmanouil and all:
Thank you for the wonderful, applicable discussions. My experience with highly reputed, peer-reviewed first-tier journals with high impact factors is that they are quite selective towards known names.
Another interesting experience with many recent peer-reviewed papers is that there are progressive, upcoming journals to which highly knowledgeable reviewers have migrated from the top-tier, so-called must-citable journals. The scientific community will take a while to realize that such progressive journals carry genuinely useful papers and original articles, not only the top-tier journals.
Following what John and Emmanouil have stated about the peer-review process, an agenda is developing to establish a novel publication platform requiring an initial presentation, then an RG-type preprint, with subsequent publication of the paper or article, which is further peer-reviewed so that more scientists participate in debates and discussions of the addressed topics. Noteworthy rubrics may involve the high quality of original research rather than the mere quantity or number of papers. This shift in the publication paradigm will help to cure inconsistencies between theoretical and experimental outputs. Logically meaningful knowledge achieved in this way will make sense of what physics really is, especially with respect to nature!
We will have interesting debates and discussions as time progresses!
Thank you for the engaging communication.
Sincerely,
Rajan Iyer
ENGINEERINGINC INTERNATIONAL OPERATIONAL TEKNET EARTH GLOBAL
  • asked a question related to Data Model
Question
6 answers
I studied a process using design of experiments. First, I screened with a fractional factorial design; the results showed that 3 out of 5 factors are significant, and I also found significant curvature in the model. I therefore used an RSM design (Box-Behnken) to better understand the process using the 3 selected factors. The results showed that a linear model is the best fit to the data. I am confused by these results: why would the fractional factorial design show curvature while the response behaves linearly in the RSM?
Relevant answer
Answer
  • asked a question related to Data Model
Question
5 answers
I am working on the development of a PMMS model. To select the best-performing tools and models, several models need to be developed and validated. Can this be replaced by some optimization algorithms?
Relevant answer
Answer
1. Use the downhill simplex method to minimize a function.
2. Use the BFGS algorithm to minimize a function.
3. Use the nonlinear conjugate gradient technique to minimize a function.
4. Use the Newton-CG approach to minimize a function.
5. Use a modified Powell's approach to minimize a function.
  • asked a question related to Data Model
Question
1 answer
The 5 methods of estimating dynamic panel data models using 'dynpanel' in R
# Fit the dynamic panel data using the Arellano Bond (1991) instruments
reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state","year"), 1, 4)
summary(reg)
# Fit the dynamic panel data using an automatic selection of appropriate IV matrix
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state","year"), 1, 0)
# summary(reg)
# Fit the dynamic panel data using the GMM estimator with the smallest set of instruments
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state","year"), 1, 1)
# summary(reg)
# Fit the dynamic panel data using a reduced form of IV from method 3
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state","year"), 1, 2)
# summary(reg)
# Fit the dynamic panel data using the IV matrix where the number of moments grows with kT
# K: variables number and T: time per group
# reg <- dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, index = c("state","year"), 1, 3)
# summary(reg)
Relevant answer
Answer
Brown Chitekwere,
If you find your answer, may I ask you to share it here, please?
In addition, when I ran the model I received the error "rbind error: names do not match previous names." Do you have any idea about it?
I appreciate your help.
Kind regards,
Mona
  • asked a question related to Data Model
Question
8 answers
A common threshold for standardized coefficients in structural equation models is 0.1. But is this also valid for first difference models?
Relevant answer
Answer
Jochen Wilhelm I agree very much with your statement that "you should better do more research on the meaning of the variable you are actually analyzing." I think that this is generally desirable for many studies. I also agree that there is a tendency in the social sciences to overemphasize standardized coefficients and to not even report unstandardized coefficients. That is very unfortunate in my opinion, as I believe both are important and have their place.
That being said, there are fields (mine included: psychology) where we are dealing with variables that simply do not have an intuitive metric. Many variables are based on test or questionnaire sum or average scores. People use different tests/questionnaires with different metrics/scoring rules in different studies. What does it mean when, for example, subjective well-being is expected to change by 2.57 for every one-unit change in self-rated health and by 1.24 for every one-unit change in extraversion, when self-rated health is measured on a 0-10 scale and extraversion ranges between 20 and 50?
Standardized estimates can give us a better sense for the "strength" of influence/association in the presence of other predictors than unstandardized coefficients when variables have such arbitrary and not widely agreed upon metrics. The interpretation in standard deviation (SD) units is not completely useless in my opinion, especially since we operate a lot with SD units also in the context of other effect size measures such as Cohen's d. It allows us (often, not always) to see fairly quickly which variables are relatively more important as predictors of an outcome--we may not care so much about the absolute/concrete interpretation or magnitude of a standardized coefficient, but it does matter whether it is .1 or .6.
In addition, in the context of latent variable regression or path models (i.e., structural equation models), unstandardized paths between latent variables often have an even more arbitrary interpretation as there are different ways to identify/fix latent variable scales (e.g., by using a reference indicator or by standardizing the latent variable to a variance of 1). Regardless of the scaling of the latent variables, the standardized coefficients will generally be the same.
This does not mean that I recommend standardized coefficients over unstandardized coefficients. Variance dependence and (non-)comparability across studies/different populations are important issues/drawbacks of standardized coefficients. Unstandardized coefficients should always be reported as well, and they are very useful when variables have clear/intuitive/known metrics such as, for example, income in dollar, age, number of siblings (or pretty much any count), IQ scores, etc. Unstandardized coefficients are also preferable for making comparisons across groups/populations/studies that used the same variables. I would always report both unstandardized and standardized coefficients along with standard errors and, if possible, confidence intervals.
I believe there are many examples of regression or path models in psychology for which standardized coefficients were reported and that did advance our knowledge regarding which variables are more important than others in predicting an outcome.
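A small R illustration of the point about arbitrary metrics (toy data, no substantive claim intended): the unstandardized slope changes with the scale chosen for the predictor, while the standardized slope does not:
set.seed(1)
health_0_10  <- runif(200, 0, 10)                     # self-rated health on a 0-10 scale
wellbeing    <- 3 + 2.5 * health_0_10 + rnorm(200, sd = 5)
health_0_100 <- health_0_10 * 10                      # the same construct, rescaled
coef(lm(wellbeing ~ health_0_10))[2]                  # unstandardized slope, about 2.5
coef(lm(wellbeing ~ health_0_100))[2]                 # about 0.25: depends on the metric
coef(lm(scale(wellbeing) ~ scale(health_0_10)))[2]    # standardized slope...
coef(lm(scale(wellbeing) ~ scale(health_0_100)))[2]   # ...identical under both scalings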
  • asked a question related to Data Model
Question
5 answers
Dear blockchain researchers,
In the classical Nakamoto blockchain (BC) model, transactions (TXs) are packaged into blocks, and each block points specifically to its single previous block (I'm not going to go into technical details here). This is a linear data model, which justifies the name 'chain'. In DAG-based BCs, TXs may or may not be packaged into blocks, and each TX/block (say 'a') is allowed/enforced to point to more than one parent. Consequently, several child blocks/TXs (say 'b', 'c' and 'd') are likewise allowed/enforced to point to 'a' later on. This is obviously a network-like data model.
Searching previous works, all the DAG-based BCs I found adopt a many-to-many cardinality model of blocks/TXs, as described above. Some do propose that children must point to several parents for higher throughput and credibility. However, none of them specifically proposes a relaxed one-to-many parent-child dependency.
To clarify, I specifically mean that children are enforced to point to only one parent, while each parent is allowed to be pointed to by several children. This leads to a tree-like DAG instead of a complicated dense network. I need some references that discuss such data modelling. It would be very beneficial if a comparison were also made between the different types of DL data models (one-to-many vs. many-to-one vs. one-to-one vs. many-to-many).
Any help, explanation, or suggestions are most welcome!
Relevant answer
Answer
Devil's advocate.
Referencing another transaction/block in a DAG is basically providing a validation service. In other words, to have your transaction confirmed, you first validate another. The reason why you would want to have more than one parent is for the network to scale and to induce more trust, as you pointed out. As long as you give the network more (validate two transactions) than the network has given you (validate your single transaction), the model can be scaled.
On the other hand, I'm not sure that having one parent would result in a tree-like structure unless you have some constraints on which parents can be picked. That would be analogous to the tip selection problem?
  • asked a question related to Data Model
Question
10 answers
Can anyone suggest ensembling methods for the outputs of pre-trained models? Suppose there is a dataset containing cats and dogs, and three pre-trained models are applied, i.e., VGG16, VGG19, and ResNet50. How would you apply ensembling techniques such as bagging, boosting, or voting?
Relevant answer
  • asked a question related to Data Model
Question
4 answers
Extended/edited from an earlier question for clarity.
I have temporally high-resolution outputs of modelled climate data (model x, spanning 0-5000 ka, at a low spatial resolution of 0.5 degrees). Compared to other climate models, however, I have reason to believe it under-predicts precipitation/temperature changes at certain time intervals. Is there a way to calibrate this with better-quality records (i.e., those available via WorldClim/PaleoClim)?
For example, the response to the MIS 5e (120-130 ka BP) incursion of the African Summer Monsoon and Indian Summer Monsoon into the Saharan and Arabian deserts is very weak compared to the MIS 5e data from WorldClim/PaleoClim (and corroborated by palaeoclimatic data). Can I correct/calibrate model x with these more responsive models, and how should this be done?
Relevant answer
Answer
Dear Sam Nicholson, I'm afraid climate models do not calibrate any parameters. These models are developed by considering different physical processes in terms of their equations. The number of physical processes considered, and the way in which they are analytically described and numerically implemented, depend on different factors, including the spatio-temporal discretization of the climate model (i.e., grid dimension, time-step length) and the total temporal horizon to be simulated. The evolution of the state variables of a climate model depends on different model forcings, such as the incoming radiative flux, which affect the modelled system through the physical processes considered by the model (e.g., vapour condensation, evaporation, moisture recycling, etc.). Therefore, to simulate the expected paleoclimate trends, which are often revealed by different proxies, you should include in the model the corresponding climate forcings that are supposed to generate such paleoclimate trends.
  • asked a question related to Data Model
Question
7 answers
Dear all,
I wanted to evaluate the accuracy of a model using observational data. My problem is that the model's correlation with the observed data is really good (greater than 0.7), but the RMSE is very high too (greater than 100 mm in a month for monthly rainfall data). The model also has low bias.
How can I explain this case?
Thank you all
Relevant answer
Answer
The choice of a model should be based on underlying physical explanations first.
See the manual of my software for examples:
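On the question itself, a toy R illustration of why the two statistics can disagree: RMSE penalizes amplitude errors that correlation and mean bias ignore, so a prediction can be essentially unbiased and perfectly correlated with the observations and still have a large RMSE:
set.seed(1)
obs  <- rgamma(120, shape = 2, scale = 50)    # monthly-rainfall-like values (mm)
pred <- mean(obs) + 2 * (obs - mean(obs))     # correct mean, exaggerated variability
cor(obs, pred)                 # 1: perfect correlation
mean(pred - obs)               # 0: no mean bias
sqrt(mean((pred - obs)^2))     # large RMSE, driven entirely by the amplitude error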
  • asked a question related to Data Model
Question
11 answers
The aim of my study is to investigate the impact of Integrated Reporting disclosure practices on operational performance (ROA) and firm value (Tobin's Q). I have applied a panel data model for my analysis. In the descriptive statistics, the standard deviation of Tobin's Q is high, i.e., 4.964. One of the reviewers commented that the high standard deviation of Tobin's Q means that the variable is not normal, which may affect the results. However, I have read that normality is not required in panel data models. What justification should I give to satisfy the reviewer? Please also mention some references.
Relevant answer
Answer
If you use any of the models that employ regression, and particularly cointegration, then the dynamic stability of your variables is more important than anything else; this is true for time-series data. If you employ cross-sectional data, there are several econometric tests that can be performed to assure the soundness of the model, and the standard deviation is only one criterion, which may be covered by the various tests.
  • asked a question related to Data Model
Question
4 answers
I am trying to create a model in RStudio; however, I can't find a solution. The order of my procedure is:
- data
- Shapiro-Wilk test for normality (it says the data have a non-normal distribution)
- log transformation
- Shapiro-Wilk test for normality again (it says the data still have a non-normal distribution)
What can I do?
Relevant answer
Answer
Thank you for your answer
  • asked a question related to Data Model
Question
3 answers
Based on Hansen (1999), we can estimate a fixed-effect threshold panel data model. In my model, however, the Hausman test says it is random effects. What can I do?
Relevant answer
Answer
Hi, check also Fixed-effect panel threshold model using Stata, Qunyong Wang, The Stata Journal (2015) 15, Number 1, pp. 121–134
  • asked a question related to Data Model
Question
3 answers
Hi everybody,
I am trying to run the CESM-atm model, but I don't understand where the path for the data is; I am attaching an image of what the structure of the path should be.
By the way, I am running this model on my personal laptop, so I had to do the porting process beforehand; I don't think that would really be the problem here.
Could anyone explain to me what I must do to download the data for the model?
Thanks a lot!
Relevant answer
Answer
That is a good question.
  • asked a question related to Data Model
Question
2 answers
I noticed that, when using the gemtc package to fit a fixed-effect model with likelihood = "normal" and link = "identity" (mean difference), the burn-in iterations specified in mtc.run ("n.adapt") are not taken into account.
Example (with "parkinson" data):
model <- mtc.model(parkinson, likelihood='normal', link='identity', linearModel = 'fixed')
res <- mtc.run(model, n.adapt = 20000, n.iter = 75000)
summary(res)
#Results on the Mean Difference scale
#Iterations = 1:75000
#Thinning interval = 1
#Number of chains = 4
#Sample size per chain = 75000
If there is no specification for the linear model, a random-effects model is fitted by default. The random-effects model works, and other likelihood/link combinations work in both models.
Is there a way to use the package for mean differences with a fixed-effect model including burn-in iterations? Do you see any error in the way I used likelihood='normal' / link='identity'?
Relevant answer
Answer
Adnan Majeed, thank you for sharing. The PDF mentions that a fixed-effect model for a mean difference (likelihood = "normal" and link = "identity") should work, but I noticed that the burn-in iterations are not taken into consideration (see my attached PDF). Other likelihood/link combinations work well.
I also opened an issue on GitHub. I wonder whether there is another way I could proceed for a fixed-effect model on the mean-difference scale with this package (the random-effects model works well)?
  • asked a question related to Data Model
Question
7 answers
In the development of forecasting, prediction, or estimation models, we have recourse to information criteria so that the model is parsimonious. So why, and when, should one or the other of these information criteria be used?
Relevant answer
Answer
You need to be mindful of what any one IC is doing for you. They can look at 3 different contexts:
(a) you select a model structure now, fit the model to the data you have now and keep using those now-fitted parameters from now on.
(b) you select a model structure now and keep that structure, but will refit the model to an expanded dataset (reducing parameter-estimation variation but not bias).
(c) you select a model structure now and keep that structure, but will continually refit the model as expanded datasets are available (eliminating parameter-estimation variation but not bias).
  • asked a question related to Data Model
Question
5 answers
I have a panel data set of 11 countries and 40 years; the data consist of two groups, developing and developed countries. The chosen method will be applied to each group separately in order to compare the results of the two groups. Suggestions will be appreciated.
Relevant answer
Answer
Dynamic panel models like Arellano-Bond (xtabond, xtdpdsys) are used for large N and small T. For small N and large T, you can use FE, Zellner's seemingly unrelated regressions (SUR, Stata command sureg), or FGLS (xtgls).
  • asked a question related to Data Model
Question
5 answers
I am using transfer learning with pre-trained models in PyTorch for an image classification task.
When I modified the output layer of the pre-trained model (e.g., AlexNet) for our dataset and ran the code to inspect the modified architecture of AlexNet, the output is "None".
Relevant answer
I tried to replicate your code, and I don't get "None"; I just get an error when I try to run inference with the model (see image-1). In your forward you do:
def forward(self, xb):
    xb = self.features(xb)
    xb = self.avgpool(xb)
    xb = torch.flatten(xb, 1)
    xb = self.classifier(xb)
    return xb
but features, avgpool, and classifier are attributes of network, so you need to do:
def forward(self, xb):
    xb = self.network.features(xb)
    xb = self.network.avgpool(xb)
    xb = torch.flatten(xb, 1)
    xb = self.network.classifier(xb)
    return xb
Then, when I run the forward again, everything looks OK (see image-2).
If this does not work for you, could you share your .py? I need to check the functions to_device and evaluate, and the ImageClassificationBase class, to replicate the error and identify where it is.
  • asked a question related to Data Model
Question
9 answers
I have non-stationary time-series data for variables such as Energy Consumption, Trade, Oil Prices, etc and I want to study the impact of these variables on the growth in electricity generation from renewable sources (I have taken the natural logarithms for all the variables).
I performed a linear regression which gave me spurious results (r-squared >0.9)
After testing these time series for unit roots using the Augmented Dickey-Fuller test, all of them were found to be non-stationary, hence the spurious regression. However, the first differences of some of them, and the second differences of the others, were found to be stationary.
Now, when I estimate the new linear regressions with the proper order of integration for each variable (in order to have a stationary model), the statistical results are not good (high p-values for some variables and a low r-squared of 0.25).
My question is how I should proceed now. Should I change my variables?
Relevant answer
Please note that transforming variable(s) does NOT make the series stationary, but rather makes the distribution(s) symmetrical. Application of logarithmic transformation needs to be exercised with extreme caution regarding properties of the series, underlying theory and the implied logical/correct interpretation of the relationships between the dep variable and associated selected regressors.
Reverting to your question, the proposed solution would be to use the Autoregressive Distributed Lag (ARDL) model approach, which is suitable for datasets containing a mixture of variables with different orders of integration. Kindly read the manuscripts attached for your info.
All the best!
  • asked a question related to Data Model
Question
4 answers
EDIT: Based on the literature suggested in the answers, IT IS NOT POSSIBLE, because the methods all require at least some calibration data, which in my case are not available.
I am looking for a technique/function to estimate soil temperature from meteorological data only, for soils covered with crops.
In particular, I need to estimate soil temperature for a field with herbaceous crops at mid-latitudes (north Italy), but the models I found in literature are fitted for snow-covered and/or high-latitude soils.
I have daily values of air temperature (minimum, mean and maximum), precipitation, relative humidity (minimum, mean and maximum), solar radiation and wind speed.
Thank you very much
Relevant answer
Answer
  • asked a question related to Data Model
Question
6 answers
I am running an ARDL model in EViews and I need to know the following, if anyone could help:
1. Is the optimal number of lags for annual data (30 observations) 1 or 2, or should a VAR be applied to determine the optimal number of lags?
2. When we apply the VAR, the maximum number of lags applicable was 5; beyond 5 we got a singular matrix error. The problem is that as we increase the number of lags, the optimal number of lags increases (when we choose 2 lags, we get 2 as the optimum; when we choose 5 lags, we get 5 as the optimum), so what should be done?
Relevant answer
Answer
  1. My first comment is that all cointegrating studies must be based on the economic theory (and common sense) of the system that you are examining. Your theory should suggest which variables are stationary, which are non-stationary, and which are cointegrated. Your ARDL, VECM, etc, analyses are then tests of the fit of the data to your theories. It is not appropriate to use these methodologies to search for theories that fit the data. Such results will give spurious results. Now suppose that you have outlined your theory in advance of touching your computer keyboard to do your econometrics.
  2. You have only 30 annual observations. This is on the small size for an elaborate analysis such as this. It appears that you have one dependent variable and possibly 3 explanatory variables. If you have 5 lags you are estimating about 25 coefficients which is not feasible with 30 observations.
  3. If you wish to use the ARDL methodology you must be satisfied that (1) there is only one cointegrating relationship and (2) that the explanatory variables are (weakly) exogenous. Otherwise, a VECM approach is required and you may also not have enough data for a VECM analysis.
  4. Is it possible that you would use a simpler approach? Could you use graphs or a simpler approach to illustrate your economic theory? These are questions that you alone can answer. Advanced econometrics is not a cure for inadequate data and a deficit of economic theory.
  5. If this is an academic project, consult your supervisor. While what I have set out above is correct, it may not be what your supervisor expects at this stage in your studies.
  • asked a question related to Data Model
Question
8 answers
Hello,
My friend is seeking a collaborator in psychology-related statistics. Current projects include personality traits and their relations to other variables (e.g., age). You will be responsible for data analysis for potential publications. Preferably you should have some knowledge of statistics and be familiar with software used for analysis (e.g., MATLAB, R, SPSS). 10 hours a week are required. Leave your email address if interested.
Relevant answer
Answer
Psychological counselling data from patients can be analysed statistically - https://www.ncbi.nlm.nih.gov/books/NBK425420/
  • asked a question related to Data Model
Question
4 answers
I'm a community ecologist (for soil microbes), and I find hurdle models are really neat/efficient for modeling the abundance of taxa with many zeros and high degrees of patchiness (separate mechanisms governing likelihood of existing in an environment versus the abundance of the organism once it appears in the environment). However, I'm also very interested in the interaction between organisms, and I've been toying with models that include other taxa as covariates that help explain the abundance of a taxon of interest. But the abundance of these other taxa also behave in a way that might be best understood with a hurdle model. I'm wondering if there's a way of constructing a hurdle model with two gates - one that is defined by the taxon of interest (as in a classic hurdle model); and one that is defined by a covariate such that there is a model that predicts the behavior of taxon 1 given that taxon 2 is absent, and a model that predicts the behavior of taxon 1 given that taxon 2 is present. Thus there would be three models total:
Model 1: Taxon 1 = 0
Model 2: Taxon 1 > 0 ~ Environment, Given Taxon 2 = 0
Model 3: Taxon 1 > 0 ~ Environment, Given Taxon 2 > 0
Is there a statistical framework / method for doing this? If so, what is it called? / where can I find more information about it? Can it be implemented in R? Or is there another similar approach that I should be aware of?
To preempt a comment I expect to receive: I don't think co-occurrence models get at what I'm interested in. These predict the likelihood of taxon 1 existing in a site given the distribution of taxon 2. These models ask the question do taxon 1 and 2 co-occur more than expected given the environment? But I wish to ask a different question: given that taxon 1 does exist, does the presence of taxon 2 change the abundance of taxon 1, or change the relationship of taxon 1 to the environmental parameters?
Relevant answer
Answer
Thank you Remal Al-Gounmeein for sharing! I think it's interesting because I have somewhat the opposite problem that this paper addresses; many people in my field use simple correlation to relate the abundance of taxa to one another, but typically those covariances can be explained by an environmental gradient. So including covariates actually vastly decreases the number of "significant" relationships. But still it's a point well-taken because explaining that e.g. taxon1 and taxon2 don't likely interact directly even though they are positively or negatively correlated would in fact require presenting the results of both models. Thanks!
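For what it's worth, one concrete way to approximate the three-model structure from the question is sketched below in R with the pscl package and hypothetical column names (taxon1 counts, env an environmental covariate, taxon2 the abundance of the second taxon, df the data): letting the presence of taxon 2 interact with the environment in both the zero part and the count part of a single hurdle model allows the environmental response of taxon 1 to differ according to whether taxon 2 is present:
library(pscl)
df$taxon2_present <- as.integer(df$taxon2 > 0)
# count part | zero part, both with the environment x taxon-2-presence interaction
m <- hurdle(taxon1 ~ env * taxon2_present | env * taxon2_present,
            data = df, dist = "negbin")
summary(m)
This is not the only possible framework (finite mixtures or explicitly conditional two-part models are alternatives), but it keeps everything inside a standard hurdle fit.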
  • asked a question related to Data Model
Question
2 answers
Hello,
Does anyone have an idea how to analyse my panel data on exchange rates and the stock markets of six countries spread over ten years? My panel data set is actually long (T greater than N) and unbalanced. I am initially using pooled regression and fixed effects models and the Wald test. But while reading, I have come to notice that panel data models are applied according to the panel data structure, so I'm a bit confused. I would be glad to have more insight into which model best fits my data structure. Thanks in advance.
Relevant answer
Thanks Adnan Majeed
  • asked a question related to Data Model
Question
3 answers
I need your help and expertise on the J48 decision tree algorithm, to walk me through the data analysis and the interpretation of the data model.
How will the data be consolidated, processed, and analyzed, and how is the resulting model interpreted?
Relevant answer
Answer
Dear Jesie, you are talking about a supervised model. It is known as the C4.5 algorithm, developed by Quinlan. This kind of model involves training, pruning, and test stages. The training stage grows a tree: it iteratively splits the data on the best feature and continues until (a) there are no more data, (b) the volume of data in a node is not enough for further splitting, or (c) all the data in a node belong to the same target class.
After training the model, a pruning stage is performed to avoid overfitting. A cross-validation technique (k-folds = 10) should be used for better results. You could use the Weka software to analyse this algorithm. I hope this is useful for you. (https://www.cs.waikato.ac.nz/ml/weka/)
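If you prefer scripting to the Weka GUI, here is a minimal sketch using the RWeka interface to the same J48/C4.5 learner (this assumes Java and the RWeka package are installed, and uses the built-in iris data purely as a stand-in for your own dataset):
library(RWeka)
fit <- J48(Species ~ ., data = iris)           # train a C4.5/J48 decision tree
fit                                            # print the induced tree
evaluate_Weka_classifier(fit, numFolds = 10)   # 10-fold cross-validated performance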
  • asked a question related to Data Model
Question
3 answers
I first estimated a fixed effects model using xtreg y x, fe and found that all the variables are significant and the R-squared is 0.51.
So I thought that maybe I should use two-step system GMM to account for endogeneity. But since I only have 3 years, when I include the lagged variable as a predictor using xtabond2 y l.y x y*, gmm(l.y) iv(x y*, equation(level)) nodiffsargan twostep robust orthogonal small, the number of observations per group shrinks to two and I can't even run an AR(1) test or a Sargan test. The output also shows an insignificant lagged variable.
I am still new to dynamic panel data models. Do I need GMM with such a small sample size and so few observations? Should I use something else? If I only report the fixed effects results, would that be sufficient to be considered for publication?
I would love to hear your recommendations. Thank you very much.
Relevant answer
Answer
Why don't you try longitudinal models? The best intro is here:
Prof. Davidian developed these models. Feel free to ask questions.
Best, D. Booth
  • asked a question related to Data Model
Question
4 answers
I wish to investigate the effects of landscape parameters on avian community assemblages in an agricultural landscape. In order to conduct the modelling in ArcGIS, is it advisable to use BIOCLIM data in the Model Builder?
I'm not aiming at prediction; rather, I just want to see the effects of landscape parameters on bird assemblages.
Relevant answer
Answer
A new source of bioclimatic variables' uncertainty has been discovered that is related to the selection of the specific month/quarter (e.g. wettest quarter in case of bio8). Please refer to this paper:
  • asked a question related to Data Model
Question
3 answers
I have 21 JSON files containing more than 15 million rows in total, with approximately 10 features in each file. I need to first convert all the JSON files to CSV and then combine all the CSV files into one high-dimensional dataset. For now, if I load each individual JSON file as CSV, I only get Excel's maximum of 1,048,576 rows, which means I am losing the rest of the data. I know I can analyze them using a data model and Power Pivot in Excel; however, I need to load them into a CSV file first for dimensionality reduction.
Any idea or suggestion on loading this much data into CSV, Excel, or any other accepted format which I can later use in Python?
Relevant answer
Answer
The Python library pandas will be useful here: just load the CSV and use a sample for analysis. This link would be helpful:
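If the files are (or can be converted to) newline-delimited JSON, one alternative sketch in R streams each file in chunks and appends to a single CSV, so nothing ever has to fit within Excel's row limit; jsonlite and data.table are assumed, and "json_dir" and "combined.csv" are placeholder names:
library(jsonlite)
library(data.table)
files <- list.files("json_dir", pattern = "\\.json$", full.names = TRUE)  # the 21 inputs
for (f in files) {
  stream_in(file(f), pagesize = 100000, handler = function(chunk) {
    # assumes all files share the same columns; header written only once
    fwrite(chunk, "combined.csv", append = file.exists("combined.csv"))
  })
}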
  • asked a question related to Data Model
Question
2 answers
Hello!
I estimate the influence of some components of the Global Competitiveness Index on the index itself for 12 countries over a period of 11 years. So in my model I have N=12 and T=11, while the number of components is 32. I am facing a situation where the only model that provides acceptable test results for my data is the one-step dynamic panel. In my model I use log differences of the selected variables. Yes, it contains lagged dependent variables, but I am worried about whether the presence of lagged dependent variables and the acceptable test results are enough to justify the selection of a dynamic panel data model.
Relevant answer
Answer
Dear Murat,
I am more than pleased!
  • asked a question related to Data Model
Question
4 answers
What useful information can be extracted from a saved model file regarding the training data?
Also, from a security perspective: if someone has access to the saved model, what information can they gain?
  • asked a question related to Data Model
Question
1 answer
I have to estimate a panel data model (19 countries and 37 years) with the xtscc command (regression with Driscoll-Kraay standard errors). I want to know how to choose the optimal lag for this estimation. Thank you for any suggestions.
Relevant answer
Answer
Thank you!
  • asked a question related to Data Model
Question
7 answers
I am working on this corporate panel data model: LEVERAGE_it = PROFITABILITY_it + NON-DEBT-TAX-SHIELD_it + SIZE_it + LIQUIDITY_it + TOBIN_Q_it + TANGIBILITY_it + u_it. Where:
leverage = long term debt/total assets
profitability = cash flows/total asset
non debt tax shield = depreciation/total asset
size = log (sales)
liquidity = current assets/current liabilities
tobin_q = mkt capitalization/total assets
tangibility = tangible fixed asset/total assets
What can I say about the exogeneity condition? Can I assume that the covariance between the error term u_it and the regressors X_it is zero in expectation? Why? A lot of papers make this assumption but do not explain why.
Thanks in advance for your response.
Relevant answer
Answer
In general, to explore the heterogeneity in panel data, you first need to identify the group classifications of the data in line with the variables of interest.
  • asked a question related to Data Model
Question
7 answers
Can I use one Sentinel image for training my model and another one for testing? Since I have two or three images, I wanted to use one or two images for training and the rest for testing. I know 70/15/15 is the ideal proportion, but I don't know how to implement it with three images. Also, is it possible not to include 15 percent for validation, i.e., just 70/30?
Relevant answer
Answer
I suggest you visit the TensorFlow website, where you can find examples explaining how to divide your datasets and how to test your model using the test images one by one.
I hope that is clear for you, Nima Karimi.
  • asked a question related to Data Model
Question
1 answer
Dear all,
I am working with the BACON model to establish the chronology of a lake core; however, I have a question and would appreciate your help.
Is it necessary to add my 210Pb data to the model? If yes, how do I calculate the dating error?
Thanks,
Mingda
Relevant answer
Answer
Dear Glückler,
Thanks for your answer. Yes, regarding the calculation of the dating error for 210Pb, the new rplum package gives me a solution.
Regards,
Mingda
  • asked a question related to Data Model
Question
6 answers
Why do some researchers report, in their papers, the results from static panel data models (OLS, FE and RE) alongside the results from dynamic models (first-difference GMM and system GMM), even though they chose the GMM models as the best for their research problem?
Relevant answer
Answer
I am an applied economist, not an econometrician, so I hesitate to say that published papers in good journals are not adhering to good practice. However, having made this gesture towards humility, this is what I am about to argue. The key to determining the model specification that best conforms to the underlying process or mechanism that generates your data (the data generating mechanism) is diagnostic testing. In the case of distinguishing between a static or a dynamic model - hence, making a judgement about the underlying data generating mechanism - a simple procedure is to estimate a static panel FE model and then test the within-group residuals for serial correlation. (In Stata, this may be done by the user-written xtserial package, which implements a test devised by Jeffrey Wooldridge.) If the null of no serial correlation in the residuals is not rejected, then it is reasonable to conclude that your model does not suffer from unmodelled dynamics. In this case, estimate a static model. However, if the null is rejected, then your model has not captured the dynamics in the data. In this case, estimate a dynamic model. In effect, estimating a dynamic model displaces the dynamics in the data from the error term (where serial correlation violates the assumptions of the estimator) into the estimated part of the model (where the dynamics are explicitly modelled and thus yield useful information - e.g. enabling long-run and short-run effects to be distinguished). If this reasoning is correct, then it is not helpful (to say the least) to report both static and dynamic models: they cannot both be well specified. Indeed, it is my observation that researchers who report both do not undertake diagnostic tests. I suspect that if diagnostic testing were undertaken it would reveal the static models to be misspecified and thus - by their construction - yielding biased and inconsistent estimates. (In my experience, time-series data usually exhibit serial dependence, which needs to be taken into account in specifying econometric models.) To summarise: (i) test for serial correlation; and (ii) if you find it, explain to your readers why you specify and report only a dynamic model.
By the way, the GMM is a general approach to estimation. As such, it is often applied to dynamic models but is not restricted to dynamic modelling (e.g. the GMM approach can be used to estimate static models).
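For readers working in R rather than Stata, a rough analogue of xtserial is Wooldridge's test as implemented in plm::pwartest(); the data frame and variable names below are hypothetical placeholders.
library(plm)
pdat <- pdata.frame(mydata, index = c("country", "year"))   # mydata is a placeholder panel data set
pwartest(y ~ x1 + x2, data = pdat)   # H0: no first-order serial correlation in the FE residuals
# If H0 is rejected, unmodelled dynamics remain and a dynamic specification is warranted.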
  • asked a question related to Data Model
Question
11 answers
I have behavioral data (feeding latency) as the dependent variable. There are 4 populations from which the behavioral data were collected, so population becomes a random effect. I have various environmental parameters such as dissolved oxygen, water velocity, temperature, fish diversity index, habitat complexity, etc. as continuous independent variables. I want to see which of these variables, or which combination of variables, has a significant effect on the behavior.
Relevant answer
Answer
I agree with Abdullahi Ubale Usman's answer, but some other techniques such as non-linear analysis, cluster analysis, factor analysis, etc. may also be utilized in this regard.
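As a concrete illustration of the mixed-model route the question points to, here is a hedged sketch with lme4, treating population as a random intercept; all variable and data-frame names are placeholders for the actual dataset, and with only 4 populations the random-effect variance will be imprecisely estimated.
library(lme4)
m_full <- lmer(feeding_latency ~ dissolved_oxygen + water_velocity + temperature +
                 fish_diversity + habitat_complexity + (1 | population),
               data = behav_data, REML = FALSE)
m_null <- lmer(feeding_latency ~ 1 + (1 | population), data = behav_data, REML = FALSE)
anova(m_null, m_full)   # likelihood-ratio test for the environmental covariates as a set
# Candidate covariate combinations can then be compared with AIC().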
  • asked a question related to Data Model
Question
3 answers
Regarding interoperability of FEA tools:
1. Is the Dassault Abaqus input-file format widely supported by other FEA tools (such as Ansys, Nastran, etc.)? Or does every FEA tool have a specific input file format that cannot be handled/used by a different tool?
2. Are there any interoperability issues between different versions of Nastran provided by different vendors (for instance, MSC Nastran and NX Nastran, etc.)? Or can we use the model developed in one Nastran version (e.g. MSC Nastran) easily in a different Nastran version (e.g. NX Nastran)?
Thanking you.
Relevant answer
Answer
In recent versions such as ANSYS 19.3 and Altair HyperWorks, in most cases input file formats are compatible across solvers, and we can move from one solver to another without much hassle.
  • asked a question related to Data Model
Question
8 answers
Obviously, the world is watching covid-19 transmission very carefully. Elected officials and the press are discussing what "the models" predict. As far as I can tell, they are talking about the SIR model (Susceptible, Infected, Recovered). However, I can't tell whether they are using a spatial model and, if so, whether it is a point-pattern or areal model. This is critical because the disease has very obvious spatial autocorrelation and clustering in dense urban areas. However, there also appears to be a road network effect and a social network effect. For example, are they using a Bayesian maximum entropy SIR? A conditional autoregressive Bayesian spatio-temporal model? An agent-based model? A random walk?
I mean "they" generally. I'm sure different scholars are using different models, but right now I can find only one spatio-temporal model, and what those scholars did was fit two cross-sectional count data models (not spatial ones either) in two different time periods.
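For reference, the non-spatial SIR model the question starts from can be written down in a few lines; the spatial and network variants discussed above essentially index S, I and R over locations and let transmission depend on distance or network ties. The parameter values below are purely illustrative (a sketch with deSolve, not any agency's model).
library(deSolve)
sir <- function(t, state, pars) {
  with(as.list(c(state, pars)), {
    dS <- -beta * S * I / N
    dI <-  beta * S * I / N - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}
out <- ode(y = c(S = 9999, I = 1, R = 0), times = 0:180, func = sir,
           parms = c(beta = 0.3, gamma = 0.1, N = 10000))
head(out)   # daily S, I and R counts for the illustrative parameters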
  • asked a question related to Data Model
Question
4 answers
Dear researchers,
It has been several years now that OSM (OpenStreetMap) has been building up huge amounts of spatial data all around the world. As far as I know, some countries like Canada have reorganized their NTDB (National Topographic Data Base) data models to be harmonized with the OSM data layers and merged with them, and they also accept the ODbL (Open Database Licence).
I am wondering if it is possible to have a list of such countries' names.
Any help will be so appreciated
Thank you very much for your time.
With Regards
Ali Madad
Relevant answer
Answer
When it comes to national, government maps, you may not find such a list.
But RG is huge; maybe people will simply tell you whether their country is like Canada in this respect or not.
In Poland, unfortunately, it is not; we have a geodetic Topographic Objects Database map similar to OSM.
  • asked a question related to Data Model
Question
4 answers
Sir,
My research topic is the crowding-in and crowding-out effects of public investment on private investment in emerging Asian economies. I have panel data for 6 countries over 15 years (yearly data), with 1 IV (public investment), 1 DV (private investment) and 8 control variables. As my panel data is small, I need your suggestion on which panel data model in Stata is suitable for my data.
Relevant answer
Answer
You can run Fixed Effects, Random Effects, or Pooled OLS.
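Although the question asks about Stata, the same three estimators can be sketched in R with the plm package; the data frame and variable names below are hypothetical placeholders for the 6-country panel.
library(plm)
pdat <- pdata.frame(invest_data, index = c("country", "year"))   # invest_data is a placeholder
m_pool <- plm(private_inv ~ public_inv + ctrl1 + ctrl2, data = pdat, model = "pooling")
m_fe   <- plm(private_inv ~ public_inv + ctrl1 + ctrl2, data = pdat, model = "within")
m_re   <- plm(private_inv ~ public_inv + ctrl1 + ctrl2, data = pdat, model = "random")
phtest(m_fe, m_re)   # Hausman test to choose between fixed and random effects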
  • asked a question related to Data Model
Question
1 answer
What variable(s) can be used as instruments for public health and education expenditure in testing for endogeneity in a static panel data model that regresses public health/education expenditure on economic growth? I am using random effects estimators since this is the most appropriate traditional panel technique (the Hausman test suggested this).
There are many variables in the literature that have a positive correlation with public health expenditure, for instance. However, these variables also have strong correlation with real GDP per capita growth rate and therefore are unsuitable instruments.
Relevant answer
Answer
Variables such as population growth and foreign aid are inappropriate.
  • asked a question related to Data Model
Question
4 answers
Dear all,
The panel data model I am going to analyse has some stationary and some non-stationary variables, and the non-stationary variables are integrated of order one. What would be the best estimation method to apply? Please discuss.
Relevant answer
Answer
If the dependent variable is integrated of order one, I(1), you can use the ARDL bounds test; but if the dependent variable is stationary at level, I(0), you can use an augmented Autoregressive Distributed Lag (ARDL) bounds test.
  • asked a question related to Data Model
Question
6 answers
The conventional tests for system GMM are 1) testing for instrument validity and 2) testing for second-order serial autocorrelation.
Are there pre-estimation tests that may also be relevant, e.g. normality, heteroskedasticity, panel unit root tests, panel cointegration tests?
I ask because almost 90% of the academic papers I have reviewed seem to ignore these tests and stress mostly the two above.
Relevant answer
Answer
The GMM dynamic panel estimators are appropriate for large N and small T and generally pre-tests are not conducted. A large panel sample size is often used so that the Central Limit Theorem can be invoked for the asymptotic normality of coefficients even if the residuals are non-normal. Robust standard errors can be employed to deal with autocorrelation and heteroscedasticity. For a broader discussion and how to apply GMM methods in Stata see:
Roodman, D. (2009). How to do xtabond2: An introduction to "Difference" and "System" GMM in Stata. The Stata Journal, 9(1), 86-136. With very small T unit root testing would not typically be applied. If T is large GMM estimators can become unreliable because the number of instruments becomes large and the instrumented variables can be overfitted and so may not remove the endogenous components of the lagged dependent variable(s) as intended. When N is not large and T is moderate you may wish to use a bias corrected LSDV estimator to deal with dynamic panel bias, although these assume that all variables other than the lagged dependent variable are strictly exogenous. To apply a bias-corrected LSDV estimator to a potentially unbalanced dynamic panel in Stata see:
Bruno, G. (2005). Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals. The Stata Journal, 5(4), 473-500. With moderate T you may wish to apply panel unit root tests. Second generation tests that deal with cross-sectional dependence are recommended. An example paper that uses Bruno's estimator and panel unit root tests is given in:
(1) Goda, T., Stewart, C. and Torres García, A. (2019) ‘Absolute Income Inequality and Rising House Prices’, forthcoming in Socio-Economic Review. An example application of GMM dynamic panel estimators without unit root testing is: Matousek, R., and Nguyen, T. N., and Stewart, C. (2017) ‘Note on a non-structural model using the disequilibrium approach: Evidence from Vietnamese banks, Research in International Business and Finance, 41, pp. 125 – 135.
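For completeness, a hedged sketch of where the two standard post-estimation checks (instrument validity and second-order autocorrelation) appear in R's plm::pgmm output, using the built-in EmplUK data; the lag and option choices are illustrative only and broadly follow the package's documented Blundell-Bond example.
library(plm)
data("EmplUK", package = "plm")
gmm_sys <- pgmm(log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) + lag(log(capital), 0:1) |
                  lag(log(emp), 2:99) + lag(log(wage), 2:99) + lag(log(capital), 2:99),
                data = EmplUK, effect = "twoways", model = "onestep",
                transformation = "ld")        # "ld" requests system (Blundell-Bond) GMM
summary(gmm_sys, robust = TRUE)   # reports the Sargan test and the Arellano-Bond AR(1)/AR(2) tests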
  • asked a question related to Data Model
Question
7 answers
Hi, I'm testing a serial multiple mediation model with two mediators. I tried twice with different datasets, but both showed the same result: CMIN and df = 0.
First, I ran a CFA to check the validity of the model, and the model fit was acceptable. Second, when I ran the mediation test (I built the causal model using only unobserved variables), the Notes for Model showed:
Number of distinct sample moments: 15
Number of distinct parameters to be estimated: 15
Degrees of freedom(15-15):0
Result(Default model)
Minimum was achieved
Chi-square= .000
Degrees of freedom= 0
Probability level cannot be computed
Based on these results, would this model be acceptable for testing further hypotheses? Or would this model be meaningful to study? I also checked the literature by Byrne (2001).
The reference is: Byrne, B.M. 2001. Structural equation modeling with AMOS : basic concepts, applications, and programming. Mahwah, N.J. ;: Lawrence Erlbaum Associates.
It mentioned that "this kind of model is not scientifically interesting because it has no degrees of freedom and therefore can never be rejected" (Byrne, 2001). Could anyone give any comments or suggestions on this?
I think it might result from this particular type of causal relationship, or is it coincidence? Because the CFA test of this model gave:
CMIN=1170.358
df=399
CFI= 0.919
TLI=0.911
SRMR=0.045
RMSEA=0.066
which might provide evidence that the data and model could match well. So, what might be the actual reasons?
Thank you all for any comments in advance!
Thanks!
Relevant answer
Answer
Holger Steinmetz Thank you so much for your consistent help. I will go on reading and studying.
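For anyone following this thread, here is a hedged sketch (in R's lavaan rather than AMOS) of a two-mediator serial mediation model: when every structural path is freely estimated the model is just-identified (df = 0), which is expected and does not by itself invalidate the indirect-effect estimates. Variable and data names are placeholders.
library(lavaan)
model <- '
  M1 ~ a1 * X
  M2 ~ a2 * X + d21 * M1
  Y  ~ cp * X + b1 * M1 + b2 * M2
  serial_indirect := a1 * d21 * b2     # X -> M1 -> M2 -> Y
'
fit <- sem(model, data = mydata, se = "bootstrap", bootstrap = 2000)   # mydata is a placeholder
summary(fit, ci = TRUE)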
  • asked a question related to Data Model
Question
1 answer
Hi, I am trying to model the effect of human perception on wildfire ignition in the United States. I want to use Google Trends data to model society's perception of wildfire. Are there any similar studies that use Google Trends data to model people's perception?
Relevant answer
Answer
Hi,
You can use Google Trends to improve your SEO:
  1. Start Big & Narrow Down. ...
  2. Focus on the Context. ...
  3. Get Advanced Insights with Specific Search Options. ...
  4. Use it for location-based targeting. ...
  5. Predicting Trends. ...
  6. Research long-tail keywords for your content. ...
  7. Use Google Trends for Video SEO.
For more info: https://www.oberlo.in/blog/google-trends
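The list above is about SEO; for actually pulling the search-interest series into R for modelling, here is a hedged sketch with the gtrendsR package (the keyword, region and time window are illustrative only).
library(gtrendsR)
wf <- gtrends(keyword = "wildfire", geo = "US", time = "2010-01-01 2020-12-31")
head(wf$interest_over_time)   # relative search interest (0-100) over time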
  • asked a question related to Data Model
Question
2 answers
I have done uni-axial testing on a biological tissue up to 10% strain and have the data. Now I need to use the data to model the tissue in Abaqus. I believe the Fung anisotropic model suits the tissue best, but I could not find any clear reference textbooks/sources for modelling from test results.
Relevant answer
Answer
Seyed Shayan Sajjadinia Many thanks for your help!
  • asked a question related to Data Model
Question
9 answers
I was able to run an analysis in AMOS.
However, the fit of my data to the model is low, and given time constraints I doubt I would be able to double the amount of data I have (I only have 217 responses).
What could I do? At the moment I have:
- CMIN/DF: 18.5
- CFI: 0.46
- RMSEA: 0.285
I tried to go through the modification indices, but the only available covariance modification I could make, between two error terms, doesn't make sense and would only improve my model by 4.7.
Any suggestions?
Relevant answer
Answer
You can write to me here (message) or to my email: Alireza.shabani@hotmial.com
  • asked a question related to Data Model
Question
2 answers
I'm working with panel data on Foreign Direct Investment, using FDI flows as the endogenous variable and, among others, the FDI stock in the previous year as one of the explanatory variables. If we use the lagged endogenous variable as an explanatory variable we have a dynamic panel data model and should use a suitable estimator (say Arellano-Bond, for example). However, in my case I am not using the lagged endogenous variable (the flow, y(t-1) - y(t-2)) as a regressor, but the lagged stock of FDI (y(t-1)). Should this case be considered a dynamic model too? Should it be estimated using Arellano-Bond or similar to avoid inconsistency, and is there any specific alternative for this type of specification?
Relevant answer
Answer
Yes, this too is a dynamic model and applying Arellano-Bond (or the related GMM-type estimators) would be perfectly adequate.
  • asked a question related to Data Model
Question
5 answers
I would like to incorporate semi-structured surveys, satellite tracking, and eBird records into a single species distribution model, while being able to control for potential limitations and biases of each sampling approach.
See these papers for background / theory of this approach:
Relevant answer
Answer
Hi Evan,
As far as I know, the way to control the biases of tracking and survey datasets (whatever the method used to obtain presence data) is to make them spatially and temporally equivalent; that is, standardize your datasets so they are comparable across scales.
On the other hand, given the different nature of your datasets, you can use the R package "biomod2", which integrates different modelling approaches (GLM, Maxent, etc.) for the same datasets, separately or combined.
I strongly recommend you check this out; this modelling approach is helpful for controlling biases rather than relying solely on one method.
I hope I've been of help to you. However, I think other researchers with a strong background in ecological modelling could give you more useful advice.
Best regards
Jon
  • asked a question related to Data Model
Question
2 answers
Hello!
Does anybody know how to estimate variance components for GLM models in R?
It can easily be done for an ordinary linear model (e.g., using the VCA package), but I have not been able to find any solution for GLMs.
I would be grateful for any advice or links; R code is very much appreciated.
Here is an example of the data and model I have:
N <- 200
dummydata <- rbind(
  data.frame(
    incidence = sample(x = 0:5, size = N/2, replace = TRUE),
    size = 12,
    Pred1 = rep(c("X1", "X2", "X3", "X4"), each = 25),
    Pred2 = "T1"
  ),
  data.frame(
    incidence = sample(x = 6:10, size = N/2, replace = TRUE),
    size = 12,
    Pred1 = rep(c("X1", "X2", "X3", "X4"), each = 25),
    Pred2 = "T2"
  ))
mod <- glm(
  cbind(incidence, size - incidence) ~ Pred1 * Pred2,
  data = dummydata,
  family = binomial)
With best regards,
Igor
Thanks in advance!
Relevant answer
Answer
It differs according to which mixed GLM you are discussing; see the article below and the references it contains. For the Poisson case, see:
Article
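In the meantime, one hedged sketch of a common workaround: refit the binomial model as a GLMM with lme4 and read the variance components from VarCorr(), reusing the dummy data defined in the question above (here only Pred1 is treated as random, since Pred2 has just two levels).
library(lme4)
mod_mixed <- glmer(cbind(incidence, size - incidence) ~ Pred2 + (1 | Pred1),
                   data = dummydata, family = binomial)
print(VarCorr(mod_mixed), comp = "Variance")   # variance attributable to Pred1 on the logit scale
# On the logit scale the residual variance is conventionally taken as pi^2/3 when
# computing proportions of variance explained.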