Data Model - Science topic

Explore the latest questions and answers in Data Model, and find Data Model experts.
Questions related to Data Model
  • asked a question related to Data Model
Question
2 answers
Dear Scholars,
I have a stationary dependent variable and non-stationary independent variables. I employed a panel ARDL model, but I would also like to run a static panel data model. To control for country differences, I decided to use a fixed effects model, but I could not find a proper answer about taking differences.
Should I take differences for all variables or just for the non-stationary variables?
Thank you very much for your help.
Relevant answer
Answer
Only take the difference of those variables that are non-stationary.
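As an illustration of that advice, here is a minimal Python/pandas sketch on toy data (the column and variable names are hypothetical): only the variable treated as non-stationary is first-differenced within each country, while the stationary variable enters the fixed effects model in levels.
import pandas as pd

# Toy panel: two countries, five years; 'gdp' stands in for a non-stationary regressor,
# 'inflation' for a stationary one (both names are hypothetical).
df = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5,
    "year": list(range(2001, 2006)) * 2,
    "gdp": range(10),
    "inflation": [2, 3, 2, 4, 3, 5, 4, 6, 5, 7],
})
df = df.sort_values(["country", "year"])
for col in ["gdp"]:                                     # difference only the non-stationary variables
    df["d_" + col] = df.groupby("country")[col].diff()
print(df)                                               # 'inflation' stays in levels in the FE model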
  • asked a question related to Data Model
Question
2 answers
Can someone direct me to a working link to download the Century model for SOC?
The link on the Colorado State University site below doesn't seem to work.
Relevant answer
Answer
Thank you J.C !
Do you have an active link for downloading the model by any chance?
  • asked a question related to Data Model
Question
3 answers
I'm trying to get data on loan officers from microfinance institutions (how many borrowers they approach, loan amount outstanding, portfolio risk, the percentage of complete repayment, etc.). Can anyone suggest a database I can use to build a panel data model?
Thank you.
Relevant answer
Answer
Data about customers are sensitive and confidential, and I doubt any bank would want to release data about customers' loans. In my country (Nigeria) there may be no possibility of getting this information online because of fraudsters. I therefore have no idea how clearance and permission could be obtained from a microfinance bank to access customer loan data for research.
  • asked a question related to Data Model
Question
27 answers
One of my big problems is finding articles that could suggest new thoughts/research for my work. Part of the problem is the amount of extraneous material (dirt) that is available. For example, when I see an abstract that is long (more than about 300 words in English), I simply ignore it. My experience tells me it is usually unfounded, vague, or hand-waving. But there is a possibility there may be a grain of something that I'm ignoring. There is also the possibility I'm missing some paper that may be valuable. Then there are all those ad hominem statements, to which I respond by simply ignoring those authors. I'd like to be more effective at finding new data/models while ignoring the dirt. How can I be more effective at distributing my research?
Relevant answer
Answer
I agree there is not a consistent basis for judging a contribution. I think this is because there is a wide range of opinions - many of which are oriented not toward scientific advance but toward publishing with little contribution ("publish or perish"). From this comes my criterion for spending time on an article. I defend my criterion of abstract length by noting that a long abstract reflects a lack of clear thinking; the article can only be vague as well. Perhaps this is too harsh, but I wasted a lot of time before I followed the old APS criteria.
Your last point is why I asked the question of myself and on RG. Exactly correct, historical judgement is the only criterion, but then it is for the future - not the present.
The idea that the valid model will prevail depends heavily on it being recognized. Part of my quest is to gain others' views as contributions to my thinking/model and, perhaps, to modify my thinking.
  • asked a question related to Data Model
Question
6 answers
I studied a process using design of experiments. First, I screened with a fractional factorial design. The results showed that 3 out of 5 candidate factors are significant, and I also found significant curvature in the model. So I used an RSM design (Box-Behnken) with the 3 selected factors to better understand the process. The results showed that a linear model fits the data best. I am confused by these results. Why would the fractional factorial design show curvature while the response behaves linearly in the RSM analysis?
Relevant answer
Answer
  • asked a question related to Data Model
Question
11 answers
I have panel data comprising 5 cross-sections and 14 independent variables; the time-series dimension is 10 years. When I run the panel data model, pooled OLS and the FE model give results, but the random effects model shows the error: RE estimation requires number of cross-sections > number of coefficients for the between estimator for estimation of the RE innovation variance. Can anyone help me get results for the random effects model?
Relevant answer
Answer
You can try a mixed model (random for periods and fixed for cross-sections, or vice versa). If that doesn't succeed, you should use random effects for the period only and none for the cross-section, or the reverse; you can do this with fixed effects as well. After that, you should check your choice with the Hausman or Chow test (redundant fixed effects test in EViews).
  • asked a question related to Data Model
Question
5 answers
I am working on the development of a PMMS model. To select the best-performing tools and models, several models need to be developed and validated. Can this be replaced by some optimization algorithms?
Relevant answer
Answer
1. Use the downhill simplex method to minimize a function.
2. Use the BFGS algorithm to minimize a function.
3. Use the nonlinear conjugate gradient technique to minimize a function.
4. Use the Newton-CG approach to minimize a function.
5. Use a modified Powell's approach to minimize a function.
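The five methods listed above correspond to options of SciPy's scipy.optimize.minimize; the sketch below only illustrates that mapping (the Rosenbrock test function stands in for whatever objective the PMMS selection problem would actually use):
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])                              # arbitrary starting point
res_simplex = minimize(rosen, x0, method="Nelder-Mead")               # 1. downhill simplex
res_bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)          # 2. BFGS
res_cg = minimize(rosen, x0, method="CG", jac=rosen_der)              # 3. nonlinear conjugate gradient
res_ncg = minimize(rosen, x0, method="Newton-CG",
                   jac=rosen_der, hess=rosen_hess)                    # 4. Newton-CG (needs derivatives)
res_powell = minimize(rosen, x0, method="Powell")                     # 5. modified Powell
for r in (res_simplex, res_bfgs, res_cg, res_ncg, res_powell):
    print(r.fun, r.x)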
  • asked a question related to Data Model
Question
1 answer
The 5 methods of estimating dynamic panel data models using "dynpanel" in R
# Fit the dynamic panel data using the Arellano Bond (1991) instruments
reg<-dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,Produc,index=c("state","year"),1,4)
summary(reg)
# Fit the dynamic panel data using an automatic selection of appropriate IV matrix
#reg<-dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,Produc,index=c("state","year"),1,0)
#summary(reg)
# Fit the dynamic panel data using the GMM estimator with the smallest set of instruments
#reg<-dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,Produc,index=c("state","year"),1,1)
#summary(reg)
# Fit the dynamic panel data using a reduced form of IV from method 3
#reg<-dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,Produc,index=c("state","year"),1,2)
#summary(reg)
# Fit the dynamic panel data using the IV matrix where the number of moments grows with kT
# K: variables number and T: time per group
#reg<-dpd(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,Produc,index=c("state","year"),1,3)
#summary(reg)
Relevant answer
Answer
Brown Chitekwere,
May I ask that, if you found your answer, you share it here, please?
In addition, when I ran the model, I received the rbind error "names do not match previous names." Do you have any idea about it?
I appreciate your help.
Kind regards,
Mona
  • asked a question related to Data Model
Question
8 answers
A common threshold for standardized coefficients in structural equation models is 0.1. But is this also valid for first difference models?
Relevant answer
Answer
Jochen Wilhelm I agree very much with your statement that "you should better do more research on the meaning of the variable you are actually analyzing." I think that this is generally desirable for many studies. I also agree that there is a tendency in the social sciences to overemphasize standardized coefficients and to not even report unstandardized coefficients. That is very unfortunate in my opinion, as I believe both are important and have their place.
That being said, there are fields (mine included: psychology) where we are dealing with variables that simply do not have an intuitive metric. Many variables are based on test or questionnaire sum or average scores. People use different tests/questionnaires with different metrics/scoring rules in different studies. What does it mean when, for example, subjective well-being is expected to change by 2.57 for every one-unit change in self-rated health and by 1.24 for every one-unit change in extraversion, when self-rated health is measured on a 0 - 10 scale and extraversion ranges between 20 and 50?
Standardized estimates can give us a better sense for the "strength" of influence/association in the presence of other predictors than unstandardized coefficients when variables have such arbitrary and not widely agreed upon metrics. The interpretation in standard deviation (SD) units is not completely useless in my opinion, especially since we operate a lot with SD units also in the context of other effect size measures such as Cohen's d. It allows us (often, not always) to see fairly quickly which variables are relatively more important as predictors of an outcome--we may not care so much about the absolute/concrete interpretation or magnitude of a standardized coefficient, but it does matter whether it is .1 or .6.
In addition, in the context of latent variable regression or path models (i.e., structural equation models), unstandardized paths between latent variables often have an even more arbitrary interpretation as there are different ways to identify/fix latent variable scales (e.g., by using a reference indicator or by standardizing the latent variable to a variance of 1). Regardless of the scaling of the latent variables, the standardized coefficients will generally be the same.
This does not mean that I recommend standardized coefficients over unstandardized coefficients. Variance dependence and (non-)comparability across studies/different populations are important issues/drawbacks of standardized coefficients. Unstandardized coefficients should always be reported as well, and they are very useful when variables have clear/intuitive/known metrics such as, for example, income in dollar, age, number of siblings (or pretty much any count), IQ scores, etc. Unstandardized coefficients are also preferable for making comparisons across groups/populations/studies that used the same variables. I would always report both unstandardized and standardized coefficients along with standard errors and, if possible, confidence intervals.
I believe there are many examples of regression or path models in psychology for which standardized coefficients were reported and that did advance our knowledge regarding which variables are more important than others in predicting an outcome.
  • asked a question related to Data Model
Question
10 answers
Can anyone suggest ensembling methods for the outputs of pre-trained models? Suppose there is a dataset containing cats and dogs, and three pre-trained models are applied, i.e., VGG16, VGG19, and ResNet50. How would you apply ensembling techniques such as bagging, boosting, or voting?
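One common option for this setting is soft voting, i.e. averaging the predicted class probabilities of the three fine-tuned networks. A minimal PyTorch sketch (the fine-tuned models and the image batch in the usage comment are hypothetical):
import torch
import torch.nn.functional as F

def soft_vote(models_list, x):
    # Average the softmax probabilities of each model (soft voting), then pick the arg-max class.
    with torch.no_grad():
        probs = [F.softmax(m(x), dim=1) for m in models_list]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)

# Hypothetical usage with three fine-tuned backbones and a batch of cat/dog images:
# preds = soft_vote([vgg16_ft, vgg19_ft, resnet50_ft], image_batch)
Weighted averaging or majority voting on the hard labels are straightforward variations; bagging or boosting would instead require retraining the ensemble members on resampled or reweighted data.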
Relevant answer
  • asked a question related to Data Model
Question
4 answers
Extended/edited from an earlier question for clarity.
I have temporally high resolution outputs of modelled climate data (model x, spanning 0-5000 ka. Low spatial resolution 0.5 degrees). Compared to other climate models, however, I have reason to believe it is under-predicting precipitation/temperature changes at certain time intervals. Is there a way to calibrate this with better quality records (i.e., those available via WorldClim/PaleoClim)?
For example, the response to the MIS 5e (120-130 ka BP) incursion of the African Summer Monsoon and Indian Summer Monsoon into the Saharan and Arabian deserts is very weak compared to the MIS 5e data from WorldClim/PaleoClim (and corroborated by palaeoclimatic data). Can I correct/calibrate model x with these more responsive models, and how should this be done?
Relevant answer
Answer
Dear @Sam Nicholson, I'm afraid climate models do not calibrate parameters in that way. These models are developed by representing different physical processes through their governing equations. The number of physical processes considered, and the way they are analytically described and numerically implemented, depend on different factors, including the spatio-temporal discretization of the climate model (i.e. grid dimension, time step length) and the total temporal horizon to be simulated. The evolution of the state variables of climate models depends on different model forcings, such as the incoming radiative flux, affecting the modeled system through the physical processes considered by the model (e.g. vapor condensation, evaporation, moisture recycling, etc.). Therefore, to simulate the expected paleoclimate trends, which are often revealed by different proxies, you should include in the model the corresponding climate forcings that are supposed to generate such paleoclimate trends.
  • asked a question related to Data Model
Question
5 answers
Dear blockchain researchers,
In the classical Nakamoto Blockchain (BC) model, transactions (TXs) are packaged in blocks and each block points specifically to its single previous block (I'm not going into technical details here). This is a linear data model, which justifies the name 'chain'. In DAG-based BCs, TXs may or may not be packaged into blocks, and each TX/block (say 'a') is allowed/enforced to point to more than one parent. Consequently, several child blocks/TXs (say 'b', 'c' and 'd') are similarly allowed/enforced to later point to 'a'. This is a network-like data model.
Searching previous works, all the DAG-based BCs I found adopt a many-to-many cardinality model of blocks/TXs as described above. Some propose that children must point to several parents for higher throughput and credibility. However, none of them proposes a relaxed one-to-many parent-child dependency.
To clarify, I specifically mean that children are enforced to point to only one parent, while each parent is allowed to be pointed to by several children. This leads to a tree-like DAG instead of a complicated dense network. I need some references that discuss such data modelling. It would be very beneficial if a comparison were also made between different types of DL data models (one-to-many vs. many-to-one vs. one-to-one vs. many-to-many).
Any help, explanation, or suggestions are most welcome!
Relevant answer
Answer
There is no doubt that blockchain has been great so far. It brought us cryptocurrency and set the pace for a myriad of industry-disrupting innovations. If you know what blockchain is, you probably also know that it is a kind of distributed ledger.
  • asked a question related to Data Model
Question
7 answers
Dear all,
I wanted to evaluate the accuracy of a model using observation data. My problem is that the correlation of the model with the observed data is really good (greater than 0.7), but the RMSE is very high too (greater than 100 mm per month for monthly rainfall data). The model also has low bias.
How can I explain this case?
Thank you all
Relevant answer
Answer
The choice of a model should be based on underlying physical explanations first.
See the manual of my software for examples:
  • asked a question related to Data Model
Question
11 answers
The aim of my study is to investigate the impact of Integrated Reporting disclosure practices on operational performance (ROA) and firm value (Tobin's Q). I have applied a panel data model for my analysis. In the descriptive statistics the standard deviation of Tobin's Q is high, i.e. 4.964. One of the reviewers commented that the high standard deviation of Tobin's Q means that the variable is not normal, which may affect the results. However, I have read that normality is not required in panel data models. What justification should I give to satisfy the reviewer? Please also mention some references.
Relevant answer
Answer
If you use any of the models that employ regression, and particularly cointegration, then the dynamic stability of your variables is more important than anything else; this is true for time-series data. If you employ cross-section data, there are several econometric tests that can be performed to assure the soundness of the model, and the standard deviation is only one criterion that may be covered by the various tests.
  • asked a question related to Data Model
Question
4 answers
I am trying to create a model in RStudio; however, I can't find a solution. The order of my procedure is:
- data
- Shapiro-Wilk test for normality (It says; data has non-normal distribution)
- log transformation
- Shapiro-Wilk test for normality again (It says; data still have non-normal distribution)
What can I do?
Relevant answer
Answer
Thank you for your answer
  • asked a question related to Data Model
Question
3 answers
Based on Hansen (1999) we can estimate a fixed effects threshold panel data model. In my model the Hausman test points to random effects; what can I do?
Relevant answer
Answer
Hi, check also Fixed-effect panel threshold model using Stata, Qunyong Wang, The Stata Journal (2015) 15, Number 1, pp. 121–134
  • asked a question related to Data Model
Question
3 answers
Hi everybody,
I am trying to run the CESM-atm model, but I don't understand where the path for the data is; I am attaching an image of what the structure of the path must be.
By the way, I am running this model on my personal laptop, so I had to do the porting process before, so I don't think that is really the problem here.
Could anyone explain what I must do to download the data for the model?
Thanks a lot!
Relevant answer
Answer
That is a good question.
  • asked a question related to Data Model
Question
6 answers
There are many models which help us to explain certain processes or phenomena in some detail, or even to predict to some extent how, why, or what will happen to the organism if we do animal testing or try out a drug or a compound. There are many models, and this does not mean they are all used for the same reason. Some organisms are better suited than others. How many models are there, and why is a given model appropriate? I would like to start by adding a partial answer. I hope we can build a detailed answer to the question, one which fully explains the reasons or the main aspects of the use of organisms as models.
A related case may be the use of E. coli for several purposes. Some of them require the use of variants or strains, and some need the use of genetic engineering. Is this organism considered a model despite the transformations, or is it more related to the fact that it is widely used? How many strains are there, why would we use a particular strain, and is this related to the general use of model organisms? Are there more cases where we use organisms like this? Do we classify them all under the model organism umbrella? Is there anything else you would like to add to the question?
Relevant answer
interesting. I did not know. Keep going!
  • asked a question related to Data Model
Question
2 answers
I noticed that while using the gemtc package to perform a fixed effect model with likelihood = "normal" and link = "identity" (mean difference), the burn-in iterations specified in mtc.run ("n.adapt") are not taken into account.
Example (with "parkinson" data):
model <- mtc.model(parkinson, likelihood='normal', link='identity', linearModel = 'fixed')
res <- mtc.run(model, n.adapt = 20000, n.iter = 75000)
summary(res)
#Results on the Mean Difference scale
#Iterations = 1:75000
#Thinning interval = 1
#Number of chains = 4
#Sample size per chain = 75000
If no linear model is specified, a random effects model is fitted by default. The random effects model is working, and other likelihood/link combinations are working in both models.
Is there a way to use the package for mean differences with a fixed effect model that includes burn-in iterations? Do you see any error in the way I used likelihood='normal' / link='identity'?
Relevant answer
Answer
Adnan Majeed, thank you for sharing. In the PDF, it is mentioned that a fixed effect model for a mean difference (likelihood = "normal" and link = "identity") should work, but I noticed that the burn-in iterations are not taken into account (see my attached PDF). Other likelihood/link combinations work well.
I also opened an issue on GitHub. I wonder if there is another way I could proceed for a fixed effect model on the mean difference scale with this package (the random effects model works well)?
  • asked a question related to Data Model
Question
7 answers
In the development of forecasting, prediction or estimation models, we have recourse to information criteria so that the model is parsimonious. So, why and when should one or the other of these information criteria be used?
Relevant answer
Answer
You need to be mindful of what any one IC is doing for you. They can look at 3 different contexts:
(a) you select a model structure now, fit the model to the data you have now and keep using those now-fitted parameters from now on.
(b) you select a model structure now and keep that structure, but will refit the model to an expanded dataset (reducing parameter-estimation variation but not bias).
(c) you select a model structure now and keep that structure, but will continually refit the model as expanded datasets are available (eliminating parameter-estimation variation but not bias).
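As a small illustration of how such criteria are used in practice (not tied to the three contexts above), the sketch below compares the AIC/BIC of a parsimonious and an over-specified OLS model with statsmodels on simulated data:
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = 1.0 + x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=200)    # x[:, 2] is irrelevant by construction

small = sm.OLS(y, sm.add_constant(x[:, :2])).fit()          # parsimonious model
full = sm.OLS(y, sm.add_constant(x)).fit()                  # includes the irrelevant regressor
print("small:", small.aic, small.bic)
print("full: ", full.aic, full.bic)                         # both criteria should favour the small model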
  • asked a question related to Data Model
Question
4 answers
I have a panel data set of 11 countries and 40 years; the data consist of two groups, developing and developed countries. The chosen method will be applied to both groups separately in order to compare the results of the two groups. Suggestions will be appreciated.
Relevant answer
Answer
Dynamic panel models like Arellano-Bond (xtabond, xtdpdsys) are used for large N and small T. For small N and large T, you can use FE, Zellner's seemingly unrelated regressions (SUR) with the Stata command sureg, or FGLS (xtgls).
  • asked a question related to Data Model
Question
5 answers
I am using transfer learning with pre-trained models in PyTorch for an image classification task.
When I modified the output layer of the pre-trained model (e.g., AlexNet) for our dataset and ran the code to see the modified architecture, it printed "None".
Relevant answer
I tried to replicate your code, and I don't get "None"; I just get an error when I try to run inference with the model (see image-1). In your forward you do:
def forward(self, xb):
    xb = self.features(xb)
    xb = self.avgpool(xb)
    xb = torch.flatten(xb, 1)
    xb = self.classifier(xb)
    return xb
but features, avgpool and classifier are attributes of self.network, so you need to do:
def forward(self, xb):
    xb = self.network.features(xb)
    xb = self.network.avgpool(xb)
    xb = torch.flatten(xb, 1)
    xb = self.network.classifier(xb)
    return xb
When I run the forward pass again, everything looks OK (see Image-2).
If this does not work for you, could you share your .py? I need to check the functions to_device and evaluate, and the ImageClassificationBase class, to replicate the error and identify where it is.
  • asked a question related to Data Model
Question
10 answers
I have non-stationary time-series data for variables such as Energy Consumption, Trade, Oil Prices, etc., and I want to study the impact of these variables on the growth in electricity generation from renewable sources (I have taken the natural logarithms of all the variables).
I performed a linear regression, which gave me spurious results (r-squared > 0.9).
After testing these time series for unit roots using the Augmented Dickey-Fuller test, all of them were found to be non-stationary, hence the spurious regression. However, the first differences of some of them, and the second differences of the others, were found to be stationary.
Now, when I estimate the new linear regressions with the proper order of integration for each variable (in order to have a stationary model), the statistical results are not good (high p-values for some variables and a low r-squared of 0.25).
My question is: how should I proceed now? Should I change my variables?
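For reference, a rough Python sketch of the unit-root step described above, using statsmodels' ADF test to find the order of integration of a series (the decision rule and the toy random-walk series are only illustrative):
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def order_of_integration(s, alpha=0.05, max_d=2):
    # Difference until the ADF test rejects a unit root (a crude illustrative rule, not a full testing strategy)
    for d in range(max_d + 1):
        if adfuller(s.dropna(), autolag="AIC")[1] < alpha:   # index 1 is the p-value
            return d
        s = s.diff()
    return max_d

rng = np.random.default_rng(0)
walk = pd.Series(rng.normal(size=200).cumsum())              # toy I(1) series
print(order_of_integration(walk))                            # expected: 1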
Relevant answer
Please note that transforming variable(s) does NOT make the series stationary, but rather makes the distribution(s) symmetrical. Application of logarithmic transformation needs to be exercised with extreme caution regarding properties of the series, underlying theory and the implied logical/correct interpretation of the relationships between the dep variable and associated selected regressors.
Reverting to your question, the proposed solution would be to use the Autoregressive Distributed Lag (ARDL) model approach, which is suitable for datasets containing a mixture of variables with different orders of integration. Kindly read the manuscripts attached for your info.
All the best!
  • asked a question related to Data Model
Question
4 answers
EDIT: According to the literature suggested in the answers, IT IS NOT POSSIBLE, because at least some calibration data are required, which - in my case - are not available.
I am looking for a technique/function to estimate soil temperature from meteorological data only, for soils covered with crops.
In particular, I need to estimate soil temperature for a field with herbaceous crops at mid-latitudes (north Italy), but the models I found in literature are fitted for snow-covered and/or high-latitude soils.
I have daily values of air temperature (minimum, mean and maximum), precipitation, relative humidity (minimum, mean and maximum), solar radiation and wind speed.
Thank you very much
Relevant answer
Answer
  • asked a question related to Data Model
Question
6 answers
I am running an ARDL model in EViews and I need to know the following, if anyone could help:
1. Is the optimal number of lags for annual data (30 observations) 1 or 2, or should a VAR be applied to determine the optimal number of lags?
2. When we apply the VAR, the maximum number of lags applicable was 5; beyond 5 we got a singular matrix error. But the problem is that as we increase the number of lags, the optimal number of lags increases (when we allow 2 lags, we get 2 as the optimal; when we allow 5 lags, we get 5 as the optimal), so what should be done?
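For the lag-selection part of the question, here is a minimal Python/statsmodels sketch comparing information criteria across lag lengths with a VAR (toy data standing in for the 30 annual observations; EViews' own lag-length criteria table plays the same role):
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 3)), columns=["y", "x1", "x2"])   # toy stationary series
sel = VAR(df).select_order(maxlags=3)        # keep maxlags small with only 30 observations
print(sel.summary())                         # AIC/BIC/HQIC/FPE for each candidate lag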
Relevant answer
Answer
  1. My first comment is that all cointegrating studies must be based on the economic theory (and common sense) of the system that you are examining. Your theory should suggest which variables are stationary, which are non-stationary, and which are cointegrated. Your ARDL, VECM, etc, analyses are then tests of the fit of the data to your theories. It is not appropriate to use these methodologies to search for theories that fit the data. Such results will give spurious results. Now suppose that you have outlined your theory in advance of touching your computer keyboard to do your econometrics.
  2. You have only 30 annual observations. This is on the small size for an elaborate analysis such as this. It appears that you have one dependent variable and possibly 3 explanatory variables. If you have 5 lags you are estimating about 25 coefficients which is not feasible with 30 observations.
  3. If you wish to use the ARDL methodology you must be satisfied that (1) there is only one cointegrating relationship and (2) that the explanatory variables are (weakly) exogenous. Otherwise, a VECM approach is required and you may also not have enough data for a VECM analysis.
  4. Is it possible that you would use a simpler approach? Could you use graphs or a simpler approach to illustrate your economic theory? These are questions that you alone can answer. Advanced econometrics is not a cure for inadequate data and a deficit of economic theory.
  5. If this is an academic project, consult your supervisor. While what I have set out above is correct, it may not be what your supervisor expects at this stage in your studies.
  • asked a question related to Data Model
Question
8 answers
Hello,
My friend is seeking a collaborator for psychology-related statistics. Current projects include personality traits and their relations to other variables (e.g., age). You will be responsible for doing data analysis for potential publications. Preferably you should have some knowledge of statistics and be familiar with software used for analysis (e.g., MATLAB, R, SPSS). 10 hours a week is required. Leave your email address if interested.
Relevant answer
Answer
Psychological counselling data of patients can be analysed statistically - https://www.ncbi.nlm.nih.gov/books/NBK425420/
  • asked a question related to Data Model
Question
4 answers
I'm a community ecologist (for soil microbes), and I find hurdle models are really neat/efficient for modeling the abundance of taxa with many zeros and high degrees of patchiness (separate mechanisms governing likelihood of existing in an environment versus the abundance of the organism once it appears in the environment). However, I'm also very interested in the interaction between organisms, and I've been toying with models that include other taxa as covariates that help explain the abundance of a taxon of interest. But the abundance of these other taxa also behave in a way that might be best understood with a hurdle model. I'm wondering if there's a way of constructing a hurdle model with two gates - one that is defined by the taxon of interest (as in a classic hurdle model); and one that is defined by a covariate such that there is a model that predicts the behavior of taxon 1 given that taxon 2 is absent, and a model that predicts the behavior of taxon 1 given that taxon 2 is present. Thus there would be three models total:
Model 1: Taxon 1 = 0
Model 2: Taxon 1 > 0 ~ Environment, Given Taxon 2 = 0
Model 3: Taxon 1 > 0 ~ Environment, Given Taxon 2 > 0
Is there a statistical framework / method for doing this? If so, what is it called? / where can I find more information about it? Can it be implemented in R? Or is there another similar approach that I should be aware of?
To preempt a comment I expect to receive: I don't think co-occurrence models get at what I'm interested in. These predict the likelihood of taxon 1 existing in a site given the distribution of taxon 2. These models ask the question do taxon 1 and 2 co-occur more than expected given the environment? But I wish to ask a different question: given that taxon 1 does exist, does the presence of taxon 2 change the abundance of taxon 1, or change the relationship of taxon 1 to the environmental parameters?
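Not an established framework, but one rough way to piece such a "two-gate" hurdle together by hand, sketched here in Python/statsmodels on simulated data (a proper hurdle would use a zero-truncated count model for the positive part, which this approximation skips; all variable names are hypothetical):
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"temp": rng.normal(size=n), "moisture": rng.normal(size=n)})
df["taxon2"] = rng.poisson(1.0, size=n)                                        # the covariate taxon
df["taxon1"] = rng.poisson(np.exp(0.5 + 0.4 * df["temp"])) * rng.binomial(1, 0.6, size=n)

# Gate 1: presence/absence of taxon 1 (the usual hurdle zero part)
df["present1"] = (df["taxon1"] > 0).astype(int)
m_zero = smf.logit("present1 ~ temp + moisture", data=df).fit(disp=0)

# Positive part, split by the second gate (taxon 2 absent vs. present); NB family with fixed dispersion
pos = df[df["taxon1"] > 0]
m_pos_no2 = smf.glm("taxon1 ~ temp + moisture", data=pos[pos["taxon2"] == 0],
                    family=sm.families.NegativeBinomial()).fit()
m_pos_with2 = smf.glm("taxon1 ~ temp + moisture", data=pos[pos["taxon2"] > 0],
                      family=sm.families.NegativeBinomial()).fit()
print(m_zero.params, m_pos_no2.params, m_pos_with2.params, sep="\n")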
Relevant answer
Answer
Thank you Remal Al-Gounmeein for sharing! I think it's interesting because I have somewhat the opposite problem to the one this paper addresses; many people in my field use simple correlation to relate the abundances of taxa to one another, but typically those covariances can be explained by an environmental gradient. So including covariates actually vastly decreases the number of "significant" relationships. But still, it's a point well taken, because explaining that, e.g., taxon1 and taxon2 likely do not interact directly even though they are positively or negatively correlated would in fact require presenting the results of both models. Thanks!
  • asked a question related to Data Model
Question
2 answers
Hello,
Does anyone have an idea about how to analyse my panel data on exchange rates and stock markets of six countries spread over ten years? My panel data set is actually long (T greater than N) and is unbalanced. I'm initially using the pooled regression and fixed effects models and the Wald test. But while reading, I have come to notice that panel data models are applied according to the panel data structure. So I'm a bit confused. I would be glad to have more insight on which model best fits my data structure. Thanks in advance.
Relevant answer
Thanks Adnan Majeed
  • asked a question related to Data Model
Question
3 answers
I need your help and expertise on the J48 decision tree algorithm to walk me through the data analysis and the interpretation of the data model.
How will the data be consolidated? Processed? Analyzed? And how is the resulting model interpreted?
Relevant answer
Answer
Dear @Jesie, you are talking about a supervised model. It is known as the C4.5 algorithm, developed by Quinlan. This kind of model involves training, pruning, and test stages. Training grows the tree: it iteratively splits the data on the best feature and continues until a) there are no more data, b) the number of records in a node is below the minimum required for splitting, or c) all data in a node belong to the same target class.
After training the model, a pruning stage is performed to avoid overfitting. A cross-validation technique (k-folds = 10) should be used for better results. You could use the Weka software to analyse this algorithm. I hope this is useful for you. (https://www.cs.waikato.ac.nz/ml/weka/).
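scikit-learn does not ship J48/C4.5 itself (its DecisionTreeClassifier implements CART), but the train / prune / cross-validate workflow described above looks much the same; a hedged Python sketch on a built-in toy dataset:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                     # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(min_samples_split=5,    # minimum records required to split a node
                              ccp_alpha=0.01,         # cost-complexity pruning against overfitting
                              random_state=0)
scores = cross_val_score(tree, X_train, y_train, cv=10)   # 10-fold cross-validation
tree.fit(X_train, y_train)
print(scores.mean(), tree.score(X_test, y_test))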
  • asked a question related to Data Model
Question
5 answers
Is there any book that explains what kind of classical assumptions or diagnostic checks need to be tested on a panel data model?
Relevant answer
Answer
Econometric analysis of cross section and panel data by Jeffrey Wooldridge is a comprehensive reading on this topic (or even the most established textbook in this field).
  • asked a question related to Data Model
Question
3 answers
I first conducted a fixed effect model using xtreg y x, fe and I found that all the variables are significant and R-squared is .51.
So I thought that maybe I should use two-step system GMM to account for endogeneity. But since I only have 3 years, when I include the lagged variable as a predictor using xtabond2 y l.y x y*, gmm ( l.y) iv (x y*), equation (level)) nodiffsargan twostep robust orthogonal small, the number of observations per group shrinks to two and I can't even run an AR(1) test or Sargan test. Also, the output shows an insignificant lagged variable.
I am still new to dynamic panel data models. Do I need GMM with such a small sample size and small number of observations? Should I use something else? If I only report fixed effects results, would that be sufficient to be considered for publication?
I would love to hear your recommendation. Thank you very much,
Relevant answer
Answer
Why don't you try longitudinal models? The best intro is here:
Prof. Davidian developed these models. Feel free to ask questions.
Best, D. Booth
  • asked a question related to Data Model
Question
4 answers
I wish to investigate the effects of landscape parameters on avian community assemblages in an agricultural landscape. In order to conduct the modelling in ArcGIS, is it advisable to use BIOCLIM data in the Model Builder?
I'm not going for prediction; rather, I just want to see the effects of landscape parameters on bird assemblages.
Relevant answer
Answer
A new source of bioclimatic variables' uncertainty has been discovered that is related to the selection of the specific month/quarter (e.g. wettest quarter in case of bio8). Please refer to this paper:
  • asked a question related to Data Model
Question
3 answers
I have 21 JSON files containing more than 15 million rows, with approx. 10 features in each file. I need to first convert all the JSON files to CSV and combine all the CSV files into one to have a high-dimensional dataset. For now, if I load each individual JSON file as CSV, it gives me only the maximum limit of Excel, which is 1,048,576 rows, which means I am losing the rest of the data. I know I can analyze them using the data model and Power Pivot in Excel. However, I need to load them first into a CSV file for dimensionality reduction.
Any idea or suggestion on loading this much data into CSV, Excel, or any other accepted format which I can later use in Python?
Relevant answer
Answer
The Python library pandas will be useful here: just load the CSV and work with a sample for analysis. This link should be helpful.
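A hedged sketch of one way to do the conversion with pandas, appending each converted file to a single CSV so nothing is truncated at Excel's row limit (the file location and the JSON-lines orientation are assumptions about your data):
import glob
import pandas as pd

out_path = "combined.csv"
first = True
for path in sorted(glob.glob("data/*.json")):        # hypothetical location of the 21 files
    df = pd.read_json(path, lines=True)              # assumes JSON-lines; drop lines=True otherwise
    df.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
    first = False

# Later, read the combined CSV back in manageable chunks for dimensionality reduction:
# for chunk in pd.read_csv(out_path, chunksize=1_000_000):
#     ...  # e.g. feed each chunk to sklearn's IncrementalPCA.partial_fit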
  • asked a question related to Data Model
Question
2 answers
Hello!
I estimate the influence of some components of the Global Competitiveness Index on the index itself for 12 countries over a period of 11 years. So, in my model I have N = 12 and T = 11, while the number of components is equal to 32. I am facing a situation in which the only model that provides acceptable test results for my data is the one-step dynamic panel. In my model I use log differences of the selected variables. Yes, it contains lagged dependent variables, but I am worried whether the presence of lagged dependent variables and the acceptable test results are enough to justify the selection of a dynamic panel data model.
Relevant answer
Answer
Dear Murat,
I am more than pleased!
  • asked a question related to Data Model
Question
4 answers
What useful information can be extracted from a saved model file regarding the training data?
Also from a security perspective: if someone has access to the saved model, what information can they gain?
  • asked a question related to Data Model
Question
1 answer
I have to estimate a panel data model (19 countries and 37 years) with the xtscc command (regression with Driscoll-Kraay standard errors). I want to know how I can choose the optimal lag for this estimation. Thank you for any suggestions.
Relevant answer
Answer
Thank you!
  • asked a question related to Data Model
Question
7 answers
I am working on this corporate panel data model: LEVERAGE_it = PROFITABILITY_it + NON-DEBT-TAX-SHIELD_it + SIZE_it + LIQUIDITY_it + TOBIN_Q_it + TANGIBILITY_it + u_it. Where:
leverage = long term debt/total assets
profitability = cash flows/total asset
non debt tax shield = depreciation/total asset
size = log (sales)
liquidity = current assets/current liabilities
tobin_q = mkt capitalization/total assets
tangibility = tangible fixed asset/total assets
What can I say about the exogeneity condition? Can I assume that the expected value of the covariance between the error term u_it and X_i is zero? Why? A lot of papers make this assumption but do not explain why.
Thanks in advance for your response.
Relevant answer
Answer
In general, to explore the heterogeneity in a panel data, you need to first identify the group classifications of data in line with variables of interest.
  • asked a question related to Data Model
Question
7 answers
Can I use one Sentinel image for training my model and another one for testing? Since I have two or three images, I wanted to use one or two images for training and the rest for testing. However, I know 70/15/15 is the ideal proportion, but I don't know how to implement it for three images. Also, is it possible not to include 15 percent for validation, i.e. just 70/30?
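One common alternative to whole-scene splits is to pool the labelled patches from all scenes and split 70/15/15 at the patch level; note this ignores spatial autocorrelation between patches of the same scene, so keeping entire scenes apart (as you propose) is the stricter test. A PyTorch sketch with toy stand-in datasets:
import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split

# Toy stand-ins for patch datasets extracted from the three Sentinel scenes
scenes = [TensorDataset(torch.randn(100, 3, 64, 64), torch.randint(0, 2, (100,)))
          for _ in range(3)]
full = ConcatDataset(scenes)

n = len(full)
n_train, n_val = int(0.7 * n), int(0.15 * n)
train_set, val_set, test_set = random_split(full, [n_train, n_val, n - n_train - n_val],
                                            generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))
# A 70/30 split (dropping the validation set) is possible, but then hyperparameters
# cannot be tuned without peeking at the test data.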
Relevant answer
Answer
I suggest you visit the TensorFlow website, where you can find examples explaining how to divide your datasets and how to test your model using test images one by one.
I hope that is clear for you. @ Nima Karimi
  • asked a question related to Data Model
Question
1 answer
Dear all,
I am working on the BACON model to establish the chronology of a lake core. However, I have a question seeking help from you.
Is it necessary to add my 210Pb data to the model? If yes, how do I calculate the dating error?
Thanks,
Mingda
Relevant answer
Answer
Dear Glückler,
Thanks for your answer. Yes, for the calculation of the dating error for 210Pb, the new rplum package gives me a solution.
Regards,
Mingda
  • asked a question related to Data Model
Question
6 answers
Why do some researchers report in their papers the results from the static panel data models (OLS, FE and RE) alongside the results from the dynamic models (first-difference GMM and system GMM), even though they chose the GMM models as the best models for the research problem?
Relevant answer
Answer
I am an applied economist, not an econometrician, so I hesitate to say that published papers in good journals are not adhering to good practice. However, having made this gesture towards humility, this is what I am about to argue. The key to determining the model specification that best conforms to the underlying process or mechanism that generates your data (the data generating mechanism) is diagnostic testing. In the case of distinguishing between a static or a dynamic model - hence, making a judgement about the underlying data generating mechanism - a simple procedure is to estimate a static panel FE model and then test the within-group residuals for serial correlation. (In Stata, this may be done by the user-written xtserial package, which implements a test devised by Jeffrey Wooldridge.) If the null of no serial correlation in the residuals is not rejected, then it is reasonable to conclude that your model does not suffer from unmodelled dynamics. In this case, estimate a static model. However, if the null is rejected, then your model has not captured the dynamics in the data. In this case, estimate a dynamic model. In effect, estimating a dynamic model is to displace the dynamics in the data from the error term (where serial correlation violates the assumptions of the estimator) into the estimated part of the model (where the dynamics are explicitly modelled and thus yield useful information - e.g. enabling long-run and short-run effects to be distinguished).
If this reasoning is correct, then it is not helpful (to say the least) to report both static and dynamic models. They cannot both be well specified. Indeed, it is my observation that researchers who report both do not undertake diagnostic tests. I suspect that if diagnostic testing were to be undertaken then it would reveal the static models to be misspecified and thus - by their construction - yielding biased and inconsistent estimates. (In my experience, time-series data usually exhibit serial dependence, which needs to be taken into account in specifying econometric models.) To summarise: (i) test for serial correlation; and (ii) if you find it, explain to your readers why you specify and report only a dynamic model.
By the way, the GMM is a general approach to estimation. As such, it is often applied to dynamic models but is not restricted to dynamic modelling (e.g. the GMM approach can be used to estimate static models).
  • asked a question related to Data Model
Question
11 answers
I have behavioral data (feeding latency) as the dependent variable. There are 4 populations from which the behavioral data were collected, so population becomes a random effect. I have various environmental parameters such as dissolved oxygen, water velocity, temperature, fish diversity index, habitat complexity, etc. as continuous independent variables. I want to see which of these variables, or which combination of variables, has a significant effect on the behavior.
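One way to frame this is as a linear mixed model with a random intercept for population; a hedged Python/statsmodels sketch on simulated data (the column names are stand-ins for your variables; candidate covariate combinations can then be compared, e.g. by AIC):
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "population": rng.choice(["P1", "P2", "P3", "P4"], size=n),
    "dissolved_oxygen": rng.normal(8, 1, size=n),
    "velocity": rng.normal(0.5, 0.1, size=n),
    "temperature": rng.normal(22, 2, size=n),
})
df["latency"] = 10 - 0.5 * df["dissolved_oxygen"] + rng.normal(size=n)   # toy response

# Fixed effects for the environmental covariates, random intercept for population
model = smf.mixedlm("latency ~ dissolved_oxygen + velocity + temperature",
                    data=df, groups=df["population"]).fit()
print(model.summary())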
Relevant answer
Answer
I agree with A. U. Usman's answer, but some other techniques like non-linear analysis, cluster analysis, factor analysis, etc. may also be utilized in this regard.
  • asked a question related to Data Model
Question
3 answers
Regarding interoperability of FEA tools:
1. Is the Dassault Abaqus input-file format widely supported by other FEA tools (such as Ansys, Nastran, etc.)? Or does every FEA tool have a specific input-file format that cannot be handled/used by a different tool?
2. Are there any interoperability issues between different versions of Nastran provided by different vendors (for instance, MSC Nastran and NX Nastran)? Or can we use a model developed in one Nastran version (e.g. MSC Nastran) easily in a different Nastran version (e.g. NX Nastran)?
Thanking you.
Relevant answer
Answer
In recent versions such as ANSYS 19.3 and Altair HyperWorks, in most cases the input file formats are compatible across solvers, and we can move from one solver to another without much hassle.
  • asked a question related to Data Model
Question
8 answers
Obviously, the world is watching covid-19 transmission very carefully. Elected officials and the press are discussing what "the models" predict. As far as I can tell, they are talking about the SIR model (Susceptible, Infected, Recovered). However, I can't tell whether they are using a spatial model, and if so, whether the spatial model they are using is point pattern or areal. This is critical because the disease has very obvious spatial autocorrelation and clustering in dense urban areas. However, there appears to be a road network effect and a social network effect. For example, are they using a Bayesian maximum entropy SIR? Or a conditional autoregressive Bayesian spatio-temporal model? An agent-based model? A random walk?
I mean "they" generally. I'm sure different scholars are using different models, but right now I can find only one spatio-temporal model, and what those scholars did was two cross-sectional count data models (not spatial ones either) in two different time periods.
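For reference, the non-spatial SIR model the question starts from is only a few lines of Python (SciPy assumed); it captures none of the spatial, road-network or social-network structure being asked about:
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    S, I, R = y
    N = S + I + R
    return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]

t = np.linspace(0, 160, 161)
beta, gamma = 0.3, 0.1                                   # illustrative transmission and recovery rates
S, I, R = odeint(sir, [999_000, 1_000, 0], t, args=(beta, gamma)).T
print(I.max())                                           # peak infections in this toy, well-mixed run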
  • asked a question related to Data Model
Question
4 answers
Dear researchers,
For several years now, OSM (OpenStreetMap) has been developing huge amounts of spatial data all around the world. As far as I know, some countries like Canada have reorganized their NTDB (National Topographic Data Base) data models to be harmonized with OSM data layers and merged with them, and they also accept the ODbL (Open Database License).
I am wondering if it is possible to have a list of such countries' names.
Any help will be much appreciated.
Thank you very much for your time.
With Regards
Ali Madad
Relevant answer
Answer
When it comes to national, government maps, you may not find such a list.
But RG is huge; maybe people will simply tell you whether their country is like Canada or not.
In Poland, unfortunately not; we have a geodetic Topographic Objects Database map similar to OSM.
  • asked a question related to Data Model
Question
4 answers
Sir,
My research topic is the crowding-in and crowding-out effects of public investment on private investment in emerging Asian economies. I have panel data for 6 countries with 15 years of yearly data, 1 IV (public investment), 1 DV (private investment) and 8 control variables. As my panel data set is small, I need your suggestion on which panel data model in Stata is suitable for my data.
Relevant answer
Answer
You can run Fixed Effects, Random Effects, or Pooled OLS.
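The question asks about Stata (where the analogues would be regress, xtreg ..., fe and xtreg ..., re), but purely as an illustration of those three estimators, here is a sketch using Python's linearmodels package on toy data with hypothetical variable names; your 8 control variables would simply be added to the formula:
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, PooledOLS, RandomEffects

rng = np.random.default_rng(3)
countries = [c for c in "ABCDEF" for _ in range(15)]                 # 6 countries x 15 years
years = list(range(2005, 2020)) * 6
df = pd.DataFrame({"country": countries, "year": years, "public_inv": rng.normal(size=90)})
df["private_inv"] = 0.3 * df["public_inv"] + rng.normal(size=90)
df = df.set_index(["country", "year"])                               # entity-time index

pooled = PooledOLS.from_formula("private_inv ~ 1 + public_inv", data=df).fit()
fe = PanelOLS.from_formula("private_inv ~ 1 + public_inv + EntityEffects", data=df).fit()
re = RandomEffects.from_formula("private_inv ~ 1 + public_inv", data=df).fit()
print(pooled.params, fe.params, re.params, sep="\n")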
  • asked a question related to Data Model
Question
1 answer
What variable(s) can be used as instruments for public health and education expenditure in testing for endogeneity in a static panel data model that regresses public health/education expenditure on economic growth? I am using random effects estimators since this is the most appropriate traditional panel technique (the Hausman test suggested this).
There are many variables in the literature that have a positive correlation with public health expenditure, for instance. However, these variables also have strong correlation with real GDP per capita growth rate and therefore are unsuitable instruments.
Relevant answer
Answer
Variables such as population growth and foreign aid are inappropriate
  • asked a question related to Data Model
Question
4 answers
Dear all,
The panel data model I am going to analyse has some stationary and some non-stationary variables, and the non-stationary variables are integrated of order one. What would be the best estimation method to apply? Please discuss.
Relevant answer
Answer
If the dependent variable is stationary at I(1) you can use ARDL Bounds test, but if the dependent variable is stationary at I(0), you can use an augmented Autoregressive Distributed Lag (ARDL) bounds test.
  • asked a question related to Data Model
Question
6 answers
The conventional tests for the system GMM are 1) testing for instrument validity and 2) testing for second-order serial autocorrelation.
Are there pre-estimation tests that may be relevant, i.e. normality, heteroskedasticity, panel unit root tests, panel cointegration tests?
I ask because almost 90% of the academic papers reviewed seem to ignore these tests and stress mostly the two above.
Relevant answer
Answer
The GMM dynamic panel estimators are appropriate for large N and small T and generally pre-tests are not conducted. A large panel sample size is often used so that the Central Limit Theorem can be invoked for the asymptotic normality of coefficients even if the residuals are non-normal. Robust standard errors can be employed to deal with autocorrelation and heteroscedasticity. For a broader discussion and how to apply GMM methods in Stata see:
Roodman, D. (2009). How to do xtabond2: An introduction to "Difference" and "System" GMM in Stata. The Stata Journal, 9(1), 86-136. With very small T unit root testing would not typically be applied. If T is large GMM estimators can become unreliable because the number of instruments becomes large and the instrumented variables can be overfitted and so may not remove the endogenous components of the lagged dependent variable(s) as intended. When N is not large and T is moderate you may wish to use a bias corrected LSDV estimator to deal with dynamic panel bias, although these assume that all variables other than the lagged dependent variable are strictly exogenous. To apply a bias-corrected LSDV estimator to a potentially unbalanced dynamic panel in Stata see:
Bruno, G. (2005). Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals. The Stata Journal, 5(4), 473-500. With moderate T you may wish to apply panel unit root tests. Second generation tests that deal with cross-sectional dependence are recommended. An example paper that uses Bruno's estimator and panel unit root tests is given in:
(1) Goda, T., Stewart, C. and Torres García, A. (2019) ‘Absolute Income Inequality and Rising House Prices’, forthcoming in Socio-Economic Review. An example application of GMM dynamic panel estimators without unit root testing is: Matousek, R., and Nguyen, T. N., and Stewart, C. (2017) ‘Note on a non-structural model using the disequilibrium approach: Evidence from Vietnamese banks, Research in International Business and Finance, 41, pp. 125 – 135.
  • asked a question related to Data Model
Question
7 answers
Hi, I'm testing a serial multiple mediation model with two mediators. I tried twice with different data, but both attempts showed the same result that CMIN and df = 0.
First, I did a CFA to ensure the validity of the model, and the model fit was acceptable. Second, when I was doing the mediation test (I built the causal model using only unobserved variables), the Notes for Model information showed:
Number of distinct sample moments: 15
Number of distinct parameters to be estimated: 15
Degrees of freedom(15-15):0
Result(Default model)
Minimum was achieved
Chi-square= .000
Degrees of freedom= 0
Probability level cannot be computed
Based on these, will this model be acceptable to test further hypotheses? Or will this model be meaningful to study? And I checked the literature by Byrne(2001).
The reference is: Byrne, B.M. 2001. Structural equation modeling with AMOS : basic concepts, applications, and programming. Mahwah, N.J. ;: Lawrence Erlbaum Associates.
It mentioned that "this kind of model is not scientifically interesting because it has no degrees of freedom and therefore can never be rejected" (Byrne, 2001). Anyone could give any comments and suggestions on this?
I think it might result from this particular type of causal relationship, or be a coincidence? Because the CFA test of this model gives:
CMIN=1170.358
df=399
CFI= 0.919
TLI=0.911
SRMR=0.045
RMSEA=0.066
which might provide evidence that the data and model could match well. So, what might be the actual reasons?
Thank you all for any comments in advance!
Thanks!
Relevant answer
Answer
Holger Steinmetz Thank you so much for your consistent help. I will go on reading and studying.
  • asked a question related to Data Model
Question
1 answer
Hi, I am trying to model the effect of human perception on wildfire ignition in the United States. I want to use Google Trends data to model society's perception of wildfire. Are there any similar studies that use Google Trends data to model people's perception?
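As far as the data-pulling step goes, here is a hedged sketch assuming the third-party pytrends package (an unofficial Google Trends client) and its TrendReq / build_payload / interest_over_time interface; the keyword, timeframe and region are just placeholders:
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US")
pytrends.build_payload(["wildfire"], timeframe="2015-01-01 2020-12-31", geo="US")
trends = pytrends.interest_over_time()     # search-interest index (0-100) over the requested window
print(trends.head())
# The resulting index could then be joined to ignition records by date/region
# as a rough proxy covariate for public attention.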
Relevant answer
Answer
Hi,
You can use Google Trends to improve your SEO:
  1. Start Big & Narrow Down. ...
  2. Focus on the Context. ...
  3. Get Advanced Insights with Specific Search Options. ...
  4. Use it for location-based targeting. ...
  5. Predicting Trends. ...
  6. Research long-tail keywords for your content. ...
  7. Use Google Trends for Video SEO.
For more info: https://www.oberlo.in/blog/google-trends
  • asked a question related to Data Model
Question
16 answers
Five major differences between data lakes and data warehouses:
1. Data Lakes Retain All Data
During the development of a data warehouse, a considerable amount of time is spent analyzing data sources, understanding business processes and profiling data. The result is a highly structured data model designed for reporting. A large part of this process includes making decisions about what data to include and to not include in the warehouse. Generally, if data isn’t used to answer specific questions or in a defined report, it may be excluded from the warehouse. This is usually done to simplify the data model and also to conserve space on expensive disk storage that is used to make the data warehouse performant.
In contrast, the data lake retains ALL data. Not just data that is in use today but data that may be used and even data that may never be used just because it MIGHT be used someday. Data is also kept for all time so that we can go back in time to any point to do analysis.
This approach becomes possible because the hardware for a data lake usually differs greatly from that used for a data warehouse. Commodity, off-the-shelf servers combined with cheap storage makes scaling a data lake to terabytes and petabytes fairly economical.
2. Data Lakes Support All Data Types
Data warehouses generally consist of data extracted from transactional systems and consist of quantitative metrics and the attributes that describe them. Non-traditional data sources such as web server logs, sensor data, social network activity, text and images are largely ignored. New uses for these data types continue to be found but consuming and storing them can be expensive and difficult.
The data lake approach embraces these non-traditional data types. In the data lake, we keep all data regardless of source and structure. We keep it in its raw form and we only transform it when we’re ready to use it. This approach is known as “Schema on Read” vs. the “Schema on Write” approach used in the data warehouse.
3. Data Lakes Support All Users
In most organizations, 80% or more of users are “operational”. They want to get their reports, see their key performance metrics or slice the same set of data in a spreadsheet every day. The data warehouse is usually ideal for these users because it is well structured, easy to use and understand and it is purpose-built to answer their questions.
The next 10% or so do more analysis on the data. They use the data warehouse as a source but often go back to source systems to get data that is not included in the warehouse and sometimes bring in data from outside the organization. Their favorite tool is the spreadsheet and they create new reports that are often distributed throughout the organization. The data warehouse is their go-to source for data but they often go beyond its bounds.
Finally, the last few percent of users do deep analysis. They may create totally new data sources based on research. They mash up many different types of data and come up with entirely new questions to be answered. These users may use the data warehouse but often ignore it as they are usually charged with going beyond its capabilities. These users include the Data Scientists and they may use advanced analytic tools and capabilities like statistical analysis and predictive modeling.
The data lake approach supports all of these users equally well. The data scientists can go to the lake and work with the very large and varied data sets they need while other users make use of more structured views of the data provided for their use.
4. Data Lakes Adapt Easily to Changes
One of the chief complaints about data warehouses is how long it takes to change them. Considerable time is spent up front during development getting the warehouse’s structure right. A good warehouse design can adapt to change but because of the complexity of the data loading process and the work done to make analysis and reporting easy, these changes will necessarily consume some developer resources and take some time.
Many business questions can’t wait for the data warehouse team to adapt their system to answer them. The ever increasing need for faster answers is what has given rise to the concept of self-service business intelligence.
In the data lake on the other hand, since all data is stored in its raw form and is always accessible to someone who needs to use it, users are empowered to go beyond the structure of the warehouse to explore data in novel ways and answer their questions at their pace.
If the result of an exploration is shown to be useful and there is a desire to repeat it, then a more formal schema can be applied to it and automation and reusability can be developed to help extend the results to a broader audience. If it is determined that the result is not useful, it can be discarded and no changes to the data structures have been made and no development resources have been consumed.
5. Data Lakes Provide Faster Insights
This last difference is really the result of the other four. Because data lakes contain all data and data types, and because they let users access data before it has been transformed, cleansed and structured, they enable users to get to their results faster than the traditional data warehouse approach.
However, this early access to the data comes at a price. The work typically done by the data warehouse development team may not be done for some or all of the data sources required for an analysis. This leaves users in the driver's seat to explore and use the data as they see fit, but the first tier of business users described above may not want to do that work. They still just want their reports and KPIs.
Relevant answer
Answer
I've learnt from the responses so far. Thanks for the insightful contributions
  • asked a question related to Data Model
Question
2 answers
I have done uniaxial testing on a biological tissue up to 10% strain and have the data. Now I need to use the data to model the tissue in Abaqus. I believe the Fung anisotropic model suits the tissue best, but I could not find any clear reference textbooks/sources for modelling from test results.
Relevant answer
Answer
Seyed Shayan Sajjadinia Many thanks for your help!
  • asked a question related to Data Model
Question
9 answers
I was able to run an analysis on AMOS.
However, my data fit the model poorly, and given time constraints I doubt I would be able to double the number of responses I have (I only have 217).
What could I do? At the moment I have:
- CMIN/DF: 18.5
- CFI: 0.46
- RMSEA: 0.285
I tried to go through the modification indices, but the only available covariance modification I could make, between two error terms, doesn't make sense and would only improve my model by 4.7.
Any suggestions?
Relevant answer
Answer
You can write me here (message) or to my email: Alireza.shabani@hotmial.com
  • asked a question related to Data Model
Question
2 answers
I'm working with panel data on Foreign Direct Investment, using FDI flows as the endogenous variable and, among others, the previous year's FDI stock as one of the explanatory variables. If we used the lagged endogenous variable as a regressor we would have a dynamic panel data model and should use a suitable estimator (say, Arellano-Bond). However, in my case I am not using the lagged endogenous variable (the flow, y_{t-1} - y_{t-2}) but the lagged stock of FDI (y_{t-1}). Should this case be considered a dynamic model too? Should it be estimated using Arellano-Bond or similar to avoid inconsistency, and is there any specific alternative for this type of specification?
Relevant answer
Answer
Yes, this too is a dynamic model and applying Arellano-Bond (or the related GMM-type estimators) would be perfectly adequate.
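As a small, self-contained illustration (my own sketch, not part of the answer above; column names are hypothetical), the lagged-stock regressor can be built per country before handing the panel to a dynamic-panel/GMM routine:
import pandas as pd
# Hypothetical panel: one row per (country, year) with FDI flows and stocks
df = pd.DataFrame({
    "country":   ["A", "A", "A", "B", "B", "B"],
    "year":      [2001, 2002, 2003, 2001, 2002, 2003],
    "fdi_flow":  [10.0, 12.0, 9.0, 5.0, 7.0, 6.0],
    "fdi_stock": [100.0, 112.0, 121.0, 50.0, 57.0, 63.0],
})
df = df.sort_values(["country", "year"])
# Lagged stock y_{t-1} as an explanatory variable, computed within each country
df["fdi_stock_lag1"] = df.groupby("country")["fdi_stock"].shift(1)
# The first year of each country is lost to the lag; an Arellano-Bond-type
# estimator would then be applied to the resulting dataset.
print(df.dropna())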
  • asked a question related to Data Model
Question
5 answers
I would like to incorporate semi-structured surveys, satellite tracking, and eBird records into a single species distribution model, while being able to control for potential limitations and biases of each sampling approach.
See these papers for background / theory of this approach:
Relevant answer
Answer
Hi Evan,
As far as I know, the way to control biases in tracking and survey datasets (whatever method is used to obtain presence data) is to make them spatially and temporally comparable, i.e., standardize your datasets so they are comparable across scales.
On the other hand, given the different nature of your datasets, you can use the R package "biomod2", which integrates different modelling approaches (GLM, MaxEnt, etc.) for the same datasets, separately or combined.
I strongly recommend you check it out; this modelling approach helps control biases rather than relying on a single method.
I hope I've been of help to you. However, other researchers with a stronger background in ecological modelling may be able to give you more useful advice.
Best regards
Jon
  • asked a question related to Data Model
Question
2 answers
Hello!
Does anybody know how to estimate variance components for GLMs in R?
It can easily be done for an ordinary linear model (e.g., using the VCA package), but I am not able to find any solution for GLMs.
I would be grateful for any advice or links; R code is very much appreciated.
Here is an example of data and model I have:
N <- 200
dummydata <- rbind(
  data.frame(
    incidence = sample(x = 0:5, size = N/2, replace = TRUE),
    size = 12,
    Pred1 = rep(c("X1", "X2", "X3", "X4"), each = 25),
    Pred2 = "T1"
  ),
  data.frame(
    incidence = sample(x = 6:10, size = N/2, replace = TRUE),
    size = 12,
    Pred1 = rep(c("X1", "X2", "X3", "X4"), each = 25),
    Pred2 = "T2"
  )
)
mod <- glm(
  cbind(incidence, size - incidence) ~ Pred1 * Pred2,
  data = dummydata,
  family = binomial
)
With best regards,
Igor
Thanks in advance!
Relevant answer
Answer
It differs according to which mixed GLM you are discussing; see the linked reference and the references it contains.
For Poisson, see:
  • asked a question related to Data Model
Question
3 answers
If I have a unit root problem in both the dependent and the independent variables, which technique is recommended to get consistent estimates? (T is bigger than N.) Can I employ dynamic panel data models or first differences, and which of them is better? Also, if some independent variables are stationary, do I leave them in levels or use first differences?
Relevant answer
Answer
In your panel data N < 25 and T > 25, so I would suggest using pooled OLS, FE or RE (based on the Hausman test). You can also apply cointegrating regression models such as FMOLS and dynamic OLS. If some independent variables are stationary at level, you should use the first differences of those. (You don't need to upload data here.)
  • asked a question related to Data Model
Question
5 answers
Dear all,
I am trying to estimate a panel data model (fixed and random effects), and the Durbin-Watson statistic is 0.2, which is less than 2.
I took the first difference of all variables in the model, but the Durbin-Watson statistic is still less than 2: it is 0.38.
The variables are in log and percentage form; could this be causing the problem?
Can I solve this problem?
Can I remove the autocorrelation between the residuals?
Thanks a lot in advance.
Relevant answer
Answer
The Durbin-Watson test is not effective in panel data. For fixed effects you can see:
and for random effects you can see:
Good luck
  • asked a question related to Data Model
Question
3 answers
I have a firm-level panel dataset with N > 20,000 and T = 142. T is large because it is a monthly dataset. I need to employ a dynamic panel data model, but I am not able to use ARDL because it requires T > N, and I can't use GMM because T is large. Does anyone have any idea what model to use? I was trying to use the CCEM in Stata, but with no success either. Any advice would be appreciated.
Relevant answer
Answer
Hope this will help you. Please see the attached document.
  • asked a question related to Data Model
Question
2 answers
How does the fusion of three different data models, such as relational databases (RDB), semantic data and big databases, take place, and how is it useful in real-time analysis?
Relevant answer
Answer
Hazim Al Dilaimy What answers
  • asked a question related to Data Model
Question
4 answers
I trained my model with reasonable accuracy and loss, but when I predict with the model, it gives the same probability for two classes.
Relevant answer
Answer
@Noor
  • asked a question related to Data Model
Question
5 answers
Anyone looking for a discussion board and collaborator to mine data and model interactions in online discussions?
Relevant answer
Answer
You can now try out the discussion forum hosted in Google spreadsheet as well as downloading it into your own Google drive to host your own discussion forums. For more details, please visit http://bit.ly/forumtoolbox
If you'd like to collaborate with me to conduct research using modified versions of the discussion forum, please email me at ajeong@fsu.edu to discuss your ideas!
Sincerely,
Allan
  • asked a question related to Data Model
Question
5 answers
To build the data model in a blockchain architecture, how are data items represented and saved inside the system, and how is the data as a whole represented?
Relevant answer
Answer
Mohammed Abdullah Al-Hagery, if you go through this white paper, I think it will clear up all of your questions regarding IPFS.
Thanks.
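As a minimal sketch (my own illustration, not from the white paper referred to above): one common way to represent data items inside a blockchain is a block record that stores its transactions plus the hash of the previous block, so the whole data set is defined by the chain of hash links.
import hashlib
import json
from dataclasses import dataclass, field
from typing import List
@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: List[dict] = field(default_factory=list)
    def hash(self) -> str:
        # Hash a canonical JSON serialisation of the block's contents
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash, "tx": self.transactions},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()
genesis = Block(index=0, prev_hash="0" * 64)
block1 = Block(index=1, prev_hash=genesis.hash(),
               transactions=[{"from": "alice", "to": "bob", "amount": 5}])
print(block1.hash())
Any change to an earlier block changes its hash and breaks every later prev_hash link, which is what makes the stored data tamper-evident.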
  • asked a question related to Data Model
Question
1 answer
Hi, please help me to fit TGA data to the model. Is any coding required?
Good morning!!
I am Ranjeet Kumar Mishra from India. I am working on the thermochemical conversion of biomass and plastics into renewable fuels and chemicals. Presently, I am working on kinetic analysis of biomass using the Vyazovkin model. I have TGA data but am not able to fit it to the Vyazovkin model. Please help me fit the TGA data to the model. Is any coding required?
Relevant answer
Answer
You should be able to fit your data after putting it in the correct form and calculating the required terms. You could use the Origin program for ease of calculation. Please check the following article:
Cross Linked Sulphonated Poly(ether ether ketone) for the Development of Polymer Electrolyte Membrane Fuel Cell
  • asked a question related to Data Model
Question
3 answers
I am using the Arellano-Bond dynamic panel data GMM estimator with the 'xtdpd' command in Stata to determine the impact of health human capital on economic growth for 30 sub-Saharan African countries over a period of 20 years (1995-2014). In an alternative model I lagged the health variable by 10 years to account for the delayed impact of health on growth. Could the coefficient be interpreted as a long-term effect in this case?
Relevant answer
Answer
Read more articles in the area. Ten years is too much; reduce it. You can stop when the model fits best or passes the diagnostic tests.
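For context, a standard result for dynamic panels (my own hedged note, not part of the answer above): when the model contains a lagged dependent variable, the coefficient on the lagged health variable is only a short-run effect; the implied long-run effect divides it by one minus the coefficient on the lagged dependent variable.
\[
y_{it} = \rho\, y_{i,t-1} + \beta\, \text{health}_{i,t-10} + \eta_i + u_{it},
\qquad
\text{long-run effect of health} = \frac{\beta}{1 - \rho}
\]
Lagging the regressor by 10 years changes the timing of the impact, but it is the division by (1 - rho) that turns the estimated coefficient into a long-run effect.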
  • asked a question related to Data Model
Question
9 answers
I have a dataset that was passed through a StandardScaler before being passed to a classification model for training. The StandardScaler was also applied to the test data for model validation.
But the problem comes when I have to make real-world predictions with this model, since the data the predictions are to be made on won't be scaled or standardized.
How do I go about using such a model to make predictions on real-world data?
Relevant answer
Answer
Dear Surojit,
You can scale your input data and the output with a method you choose, or find out which method the software applies. That allows the inverse process, i.e., scaling the predictions back to real values. It is important to take the prediction errors into consideration: having them calculated (MSE, MAPE or others) on the scaled values does not tell you much, but after scaling back we can say that, e.g., the predicted price is ... $ with a mean absolute percentage error of ... $. So the scaling method must be known for scaling-back purposes, and then the prediction errors can be shown in real values too.
Best regards
Hubert
P.S. One of my papers shows that the scaling method can influence the predictions (and their errors) when ANNs are used.
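A minimal sketch of one common way to handle this in scikit-learn (my own example, assuming a generic classifier; names and data are made up): fit the scaler only on the training data, bundle it with the model in a Pipeline, and persist the whole pipeline so new raw data is scaled with the stored training statistics before prediction.
import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# The scaler's mean/std are learned from the training data only
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
# Persist scaler + classifier together
joblib.dump(model, "model.joblib")
# Later, in production: raw (unscaled) data goes straight into the pipeline,
# which applies the stored training-time scaling before predicting
loaded = joblib.load("model.joblib")
new_raw = np.random.RandomState(1).normal(size=(3, 10))
print(loaded.predict(new_raw))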
  • asked a question related to Data Model
Question
2 answers
I see that some manuscripts identify the break point in a panel by treating each cross-sectional unit as an individual series and then applying the general methods described by many researchers. But I think each cross-sectional unit has an impact on the other variables, so the break point should be identified by considering the variables in the panel jointly.
Relevant answer
Answer
Thanks for giving a fruitful reference for identifying the joint break points.
  • asked a question related to Data Model
Question
3 answers
I am having a problem with predictions from my model trained on ResNet50. I have 10 classes of Nepali digits (0-9). I trained the model for 100 epochs with around 40,000 samples. What is the issue with my model? Am I overfitting, or is the model I used simply too complex for 10 classes? I have also tried predicting on the training set, but the prediction accuracy is very poor, around 10%.
Relevant answer
Answer
I think you are using too few samples to train a ResNet. In particular, this network requires a lot of samples during training due to its high capacity. I had similar problems on CIFAR-10 using ResNet-50, and when I used data augmentation the network worked fine. Finally, I recommend decreasing the learning rate iteratively after some epochs, for instance after 80, 120, 160 and 180 epochs.
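A rough sketch of the two suggestions above in Keras (my own example, assuming a compiled tf.keras model named `model` and arrays `x_train`/`y_train`; batch size and augmentation ranges are placeholders):
import tensorflow as tf
# Light data augmentation for digit images (small shifts/rotations; avoid flips for digits)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
)
# Step-wise learning-rate decay after fixed epochs, as suggested above
def lr_schedule(epoch, lr):
    if epoch in (80, 120, 160, 180):
        return lr * 0.1
    return lr
callbacks = [tf.keras.callbacks.LearningRateScheduler(lr_schedule)]
# x_train: (N, H, W, C) images, y_train: labels for the 10 classes
# model.fit(datagen.flow(x_train, y_train, batch_size=64),
#           epochs=200, callbacks=callbacks)
It is also worth comparing training and validation accuracy: near-chance accuracy on the training set itself usually points to a data/label or preprocessing mismatch rather than to overfitting.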
  • asked a question related to Data Model
Question
6 answers
I am interested in looking at the effect of land and wealth inequality (alongside a number of other explanatory variables) on the process of agricultural expansion in Latin America over 1990-2010. Data on land and wealth distribution are however sparse and available only for one point in time. Although there is some evidence that both land and wealth concentration have not varied significantly over 1990-2010, I was wondering if there would be a better way than panel data to analyse the data.
Thanks to everybody for your help.
Relevant answer
Answer
Thank you, Moreno. I am actually using a dynamic panel now (Arellano-Bond system GMM). I think it addresses the data limitation problems.
  • asked a question related to Data Model
Question
1 answer
This is my final-year research topic, and I want to know how good the quality of information emerging through the OSM data model and Ordnance Survey data is for representing VGI.
Relevant answer
Answer
An array of studies investigated OSM data and assessed the geometric, attributive and temporal accuracy, and completeness of the mapped features. Besides intrinsic approaches, most of these studies compare OSM data to established commercial or official geographic data on road networks [29–31], buildings [32], and land use data [33–35]. Their results show, first, that OSM data is only slightly inferior to official/commercial data in terms of accuracy. Second, OSM data completeness increases at a rapid rate and is assumed to have reached or exceeded the level of completeness of commercial data in the meantime. Third, the completeness of OSM is positively correlated to population density and can be considered to be particularly suitable for the spatial analysis of urban areas.
29. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703. [CrossRef]
30. Girres, J.F.; Touya, G. Quality assessment of the French OpenStreetMap dataset. Trans. GIS 2010, 14, 435–459. [CrossRef]
31. Neis, P.; Zielstra, D.; Zipf, A. The street network evolution of crowdsourced maps: OpenStreetMap in Germany 2007–2011. Futur. Internet 2011, 4, 1–21. [CrossRef]
32. Hecht, R.; Kunze, C.; Hahmann, S. Measuring completeness of building footprints in OpenStreetMap over space and time. ISPRS Int. J. Geo-Inf. 2013, 2, 1066–1091. [CrossRef]
33. Arsanjani, J.J.; Mooney, P.; Zipf, A.; Schauss, A. Quality assessment of the contributed land use information from OpenStreetMap versus authoritative datasets. In OpenStreetMap in GIScience: Experiences, Research, and Applications; Arsanjani, J.J., Zipf, A., Mooney, P., Helbich, M., Eds.; Springer: Heidelberg, Germany; New York, NY, USA; Dordrecht, The Netherlands; London, UK, 2015; p. 324.
34. Arsanjani, J.J.; Vaz, E. An assessment of a collaborative mapping approach for exploring land use patterns for several European metropolises. Int. J. Appl. Earth Obs. Geoinf. 2015, 35, 329–337. [CrossRef]
35. Dorn, H.; Törnros, T.; Zipf, A. Geo-Information comparison with land use data in Southern Germany. Int. J. Geo-Inf. 2015, 4, 1657–1671. [CrossRef]
Hope that helps!
  • asked a question related to Data Model
Question
4 answers
May I know the minimum acceptable R-squared, or the typical range of R-squared (0.XX), for a biological wastewater treatment model?
I am evaluating the performance of a lab-scale bioreactor in treating phenol-containing wastewater. While building a model based on the performance I obtained, i.e., phenol removal efficiency versus the variables, I am not sure what the acceptable range of R-squared for the model is. Of course, the higher the R-squared, the better the model. However, each field tends to have its own accepted range of R-squared values, so I would like to know the proper range and minimum R-squared for a model in this biological wastewater treatment context. Thank you.
Relevant answer
Answer
Hello Chingyeh, R-squared alone cannot tell you whether a model is good. I recommend you look at the adjusted R-squared, which corrects for the number of variables in the model. Notice that if a variable does not contribute significantly to the model, the adjusted R-squared will decrease, whereas the ordinary R-squared will continue to increase as more predictors are incorporated. Very importantly, look at the lack-of-fit test result: if it is insignificant, then you have a good model. There is still a last step where you have to test your model: if your model is for prediction, then look at the predicted R-squared, which should not be too different from the adjusted R-squared. For a good range of adjusted R-squared, consider 0.8-0.99. Be careful with overfitted models.
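For reference, a small sketch (my own, not part of the answer above) showing how adjusted R-squared penalises extra predictors through the sample size n and the number of predictors p; the example numbers are made up:
def r_squared(y, y_hat):
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
def adjusted_r_squared(y, y_hat, p):
    # p = number of predictors (excluding the intercept), n = number of observations
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
# Example: observed vs fitted phenol removal efficiencies (illustrative values only)
y     = [0.90, 0.85, 0.95, 0.80, 0.88]
y_hat = [0.89, 0.86, 0.93, 0.82, 0.87]
print(r_squared(y, y_hat), adjusted_r_squared(y, y_hat, p=2))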
  • asked a question related to Data Model
Question
6 answers
Constraints: Data non-availability for other independent variables for older years.
Relevant answer
Answer
Let's separate the technical issue from the economics one, which I think is what you want to know. Adding more data, such as going to quarterly (GDP data are not usually available in monthly frequency) data only addresses the technical issue of estimation without addressing the economic question. The average post-WWII business cycle in the U.S. is 70 months, or a little less than 6 years. So, 7 years of data will not be adequate to capture an average U.S. business cycle. It will most certainly not be able to capture any growth pattern. In order to have meaningful interpretation of your results, you will need to have a longer time span of data, and not just more data.
  • asked a question related to Data Model
Question
7 answers
I am searching for a suitable regression method when the dependent variable is a fraction or proportion. I am using an unbalanced panel dataset.
Relevant answer
Answer
Yes, the right answer is that a fractional probit model can be used to analyze this kind of data.
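One common way to implement this is the Papke-Wooldridge fractional response approach: a GLM with a binomial family and probit link, estimated by quasi-maximum likelihood with robust (here cluster-robust) standard errors. The sketch below uses statsmodels with made-up column names and simulated data, so treat it as an assumption-laden illustration rather than a prescription; it does not model the panel structure beyond clustering.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Hypothetical unbalanced panel: `share` is a proportion in [0, 1]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "firm": np.repeat(np.arange(50), 4),
    "share": rng.uniform(0, 1, 200),
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
# Fractional probit: binomial family with probit link; the QMLE is consistent
# for fractional outcomes as long as the conditional mean is correctly specified.
# (Older statsmodels versions use sm.families.links.probit instead of Probit.)
model = smf.glm(
    "share ~ x1 + x2",
    data=df,
    family=sm.families.Binomial(link=sm.families.links.Probit()),
)
# Cluster-robust standard errors by firm to respect the panel structure
res = model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"]})
print(res.summary())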