Science topic

Regression - Science topic

Explore the latest questions and answers in Regression, and find Regression experts.
Questions related to Regression
  • asked a question related to Regression
Question
3 answers
Suppose we have a research project (an analysis of the factors affecting sustainable agriculture...). To analyze its data, most previous studies have used techniques such as regression. To identify the effective factors, is it possible to use the exploratory factor analysis technique?
Relevant answer
Answer
We use richness, biodiversity, Simpson indices and saturation curves.
We analyze ecological, spatial, social, family-economy and cultural variables.
  • asked a question related to Regression
Question
4 answers
Hi,
How do I interpret a significant interaction effect between my moderator (Coh) and independent variable (Hos)? The literature states that Hos and my dependent variable (PDm) have a negative relationship. The literature also states that the moderator (Coh) has a positive relationship with the DV (PDm). My regression coefficient for the interaction effect is negative. Does this mean Coh is exacerbating the negative effect (i.e., making it worse) or weakening the effect (i.e., making it better)?
I have attached the SPSS output and simple slopes graph.
Thank you!
Relevant answer
Answer
My take on it is that the predictor, Hos, has no significant main effect, whereas Coh has a positive main effect, and the interaction is significant and negative. The upshot is that, although the effect of Hos looks negative as per the literature, it is only significant at higher levels of Coh (which shows a very strong positive relationship to the criterion, PDm).
I would expect such an outcome with an underpowered analysis, but this seems unlikely with an N of 349.
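To make this concrete, here is a minimal simple-slopes sketch in R. It assumes a hypothetical data frame dat with the variable names from the question (Hos, Coh, PDm); nothing here comes from the actual SPSS output.
fit <- lm(PDm ~ Hos * Coh, data = dat)
b <- coef(fit)
# Slope of Hos at low (-1 SD), mean, and high (+1 SD) levels of Coh:
coh_levels <- mean(dat$Coh) + c(-1, 0, 1) * sd(dat$Coh)
slopes <- b["Hos"] + b["Hos:Coh"] * coh_levels
print(slopes)
# A negative interaction coefficient means the Hos slope becomes more
# negative as Coh increases, i.e., Coh strengthens (exacerbates) the
# negative Hos-PDm effect rather than weakening it.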
  • asked a question related to Regression
Question
2 answers
Hello, I am trying to analyze the factors that influence the adoption of technology, and while doing that I am facing issues with rbiprobit estimation. I have seven years (2015-2021) of balanced panel data containing 2,835 observations. The dependent variable y1 (Adopt2cat), the endogenous variable "BothTechKnowledge," and the instrumental variable "SKinfoAdoptNew" all take values 0 and 1. Although the regression works, I am unsure how to include panel effects in the model. I am using the following commands:
rbiprobit Adopt2cat ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn, endog(BothTechKnowledge = ACode EduC FarmExp HHCat LaborHH LandSizeDec LandTypeC landownership SoilWaterRetain SoilFertility CreditAvail OffFarmCode BothTechAware IrriMachineOwn SKinfoAdoptNew)
rbiprobit tmeffects, tmeff(ate)
rbiprobit margdec, dydx(*) effect(total) predict(p11)
If we do not add time variables (year dummies), can we say we have obtained a pooled panel estimation? I kindly request you to guide me through both the panel and pooled panel estimation procedures. I have attached the data file for your kind consideration. Thank you very much in advance. Kind regards, Faruque
Relevant answer
Answer
Thank you very much, Mr. Usman, for your kind reply. It would be a great help if you could kindly share the code.
  • asked a question related to Regression
Question
5 answers
I have 4 groups in my study and I want to analyse the effect of treatment in the 4 groups at 20 time points. Which test should I choose?
Relevant answer
Answer
If I understand your question correctly, I suggest using an RCBD; at the same time, you still have the chance to analyze the data with regression, for each of the 20 time points and/or all 80 collected together. Regards.
  • asked a question related to Regression
Question
7 answers
I did a principal component analysis on several variables to generate one component measuring compliance with medication, but I need help understanding how to use the regression scores generated for that component.
Relevant answer
Answer
Nicco Lopez Tan thanks so much
  • asked a question related to Regression
Question
2 answers
How can I ensure random sampling for customer surveys when the sampling frame is unavailable but I need to run a regression?
Relevant answer
Hi,
You can try this,
SD for the Average = SQRT[(SD1^2 + SD2^2 + SD3^2) / 3]
Thanks.
  • asked a question related to Regression
Question
1 answer
I had a few quick questions regarding the output generated by FEAT statistics. I'm currently working with resting-state data and attempting to perform nuisance regression of CSF, WM, Global Signal, and motion parameters (standard + extended), and also to scrub volumes that exceed a specific threshold of motion using FEAT Statistics.
To scrub specific volumes with excessive motion, I generated a confound.txt file that includes columns of 0s, each with a single 1 indicating the specific volume that needs to be scrubbed. I selected Standard + Extended Motion Parameters to apply the motion parameters generated during FEAT preprocessing. Additionally, I applied CSF, WM, and Global Signal nuisance regressors under full model setup by selecting Custom (1 entry per volume) and including three separate .txt files, each containing 1 column of average values per volume (for CSF, WM, or Global). Doing so generated the attached design.png and res4d image.
Is this the correct way to perform nuisance regression? If so, does the output res4d image look correct? It is very difficult to see the actual image relative to the background. Furthermore, is res4d the right image to use if my goal is to extract the time series of ROIs within this fully processed resting-state data?
Any help is very much appreciated!
Best,
Brandon
Relevant answer
Answer
Dear Brandon,
The reason your residual map is hard to visualize is that typically the data is demeaned when fitting the GLM. So one strategy to visualize your data as a brain could be to save the mean of the data in time before running the denoising and then adding it back to the residuals afterwards for visualization purposes.
Regarding your denoising strategy, it seems fine to me, even if there are some cons of scrubbing (see https://neurostars.org/t/despiking-vs-scrubbing/2157/9 ). What are regressors 3-6 in your design matrix? They look surprisingly regular to me.
Also, usually you might want to also apply a bandpass filter in this process, which you can do in FEAT.
I hope this helps!
  • asked a question related to Regression
Question
1 answer
I performed 2SLS.
With the robust option I found endogeneity, but I did not find it with the non-robust option.
Are the results of the robust version valid? I need your help.
Non Robust options
Tests of endogeneity
H0: Variables are exogenous
Durbin (score) chi2(1) = .242302 (p = 0.6225)
Wu-Hausman F(1,613) = .227544 (p = 0.6335)
. estat overid
Tests of overidentifying restrictions:
Sargan (score) chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial
Variable | R-sq. R-sq. R-sq. F(2,613) Prob > F
-------------+------------------------------------------------------------
TURN_1 | 0.1681 0.1152 0.0632 20.6714 0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
| Shea's Shea's
Variable | partial R-sq. adj. partial R-sq.
-------------+------------------------------------
TURN_1 | 0.0632 0.0036
--------------------------------------------------
Minimum eigenvalue statistic = 20.6714
Critical Values # of endogenous regressors: 1
H0: Instruments are weak # of excluded instruments: 2
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS size of nominal 5% Wald test | 19.93 11.59 8.75 7.25
LIML size of nominal 5% Wald test | 8.68 5.33 4.42 3.92
---------------------------------------------------------------------
Robust options
Tests of endogeneity
H0: Variables are exogenous
Robust score chi2(1) = 2.99494 (p = 0.0835)
Robust regression F(1,613) = 2.77036 (p = 0.0965)
. estat overid, forcenonrobust
Tests of overidentifying restrictions:
Sargan chi2(1) = .035671 (p = 0.8502)
Basmann chi2(1) = .033487 (p = 0.8548)
Score chi2(1) = .514465 (p = 0.4732)
. estat overid
Test of overidentifying restrictions:
Score chi2(1) = .514465 (p = 0.4732)
. estat firststage, all
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(2,613) Prob > F
-------------+------------------------------------------------------------
TURN_1 | 0.1681 0.1152 0.0632 13.7239 0.0000
--------------------------------------------------------------------------
Shea's partial R-squared
--------------------------------------------------
| Shea's Shea's
Variable | partial R-sq. adj. partial R-sq.
-------------+------------------------------------
TURN_1 | 0.0632 0.0036
--------------------------------------------------
Relevant answer
The difference between robust and non-robust versions in regression analysis typically refers to the handling of heteroskedasticity in the error terms.
Robust Version: Adjusts for heteroskedasticity in the error terms. If you find evidence of endogeneity using a robust version, it means that after accounting for non-constant variance in the residuals (heteroskedasticity), there's still a correlation between the independent variable and the error term.
Non-Robust Version: Assumes constant variance in the error terms (homoskedasticity). If you don't find evidence of endogeneity in the non-robust version, it means the initial test did not detect correlation between the independent variable and the error term under the assumption of homoskedasticity.
In your 2SLS regression, the robust test suggesting endogeneity means that, after accounting for potential heteroskedasticity, the suspected regressor may indeed be endogenous. The non-robust result can be misleading if there is heteroskedasticity in the model. It is generally safer to trust the robust version, especially if there is reason to suspect non-constant variance in the residuals.
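For readers who want to reproduce this comparison outside Stata, here is a hedged R sketch using the AER package; dat and the names y, x_endog, x_exog, z1, and z2 are placeholders, not the variables from the output above.
library(AER)       # ivreg() plus IV diagnostics
library(sandwich)  # heteroskedasticity-robust covariance estimators
fit <- ivreg(y ~ x_endog + x_exog | z1 + z2 + x_exog, data = dat)
# Non-robust Wu-Hausman, weak-instrument, and Sargan tests:
summary(fit, diagnostics = TRUE)
# The same diagnostics under a heteroskedasticity-robust covariance,
# analogous to Stata's robust options:
summary(fit, vcov. = vcovHC(fit, type = "HC1"), diagnostics = TRUE)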
  • asked a question related to Regression
Question
5 answers
When the results of correlation and regression are different, which one should I rely on more? For example, if the correlation of two variables is negative, but the direction is positive in regression or path analysis, how should I interpret the results?
Relevant answer
Answer
Could you give us some data please, so we can have a look what's going on?
  • asked a question related to Regression
Question
2 answers
I am doing land-use projection using the Dyna-CLUE model, but I am stuck with the error "Regression can not be calculated due to a large value in cell 0,478". I would appreciate any advice you can provide to solve this error.
Relevant answer
Answer
Hi Mellinia, were you able to solve the problem? If not, send me your email. Thank you
  • asked a question related to Regression
Question
1 answer
Global Project: Should we start developing the SIT-USE?
Software Immune Testing: Unified Software Engine (SIT-USE)
Toward Software Immune Testing Environment
Would you like to be part of the funding proposal for SIT-USE?
Would you like to participate in the development of the SIT-USE?
Would you like to support the development of HR SIT-USE?
Keywords: Funding Proposal or Funding, Participation, Support
If you answer yes to any of the questions, don't hesitate to get in touch with me at
info.aitg@aeehitg.com and write in the subject – The keyword(s)
Despite much progress and research in software technology, testing is still today's primary quality assurance technique. Currently, significant issues in software testing are:
1) Developing and testing software is necessary to meet the new economy market. In this new market, delivering the software on time is essential to capture the market. Software must be produced on time and be good enough to meet the customer's needs.
2) The existing software requirements keep changing as the project progresses, and in some projects, the rate of requirement changes can grow exponentially as the deadline approaches. This kind of rapid software change imposes significant constraints on testing because once a software program changes, the corresponding test cases/scripts may have to be updated. Furthermore, regression testing may have to be performed to ensure that those parts that are supposed to remain unchanged are indeed unchanged.
3) The number of test cases needed is enormous; however, the cost of developing test cases is extremely high.
4) Software development technologies, such as object-oriented techniques, design patterns (such as Decorator, Factory, Strategy), components (such as CORBA, Java's EJB and J2EE, and Microsoft's .NET), agents, application frameworks, client-server computing (such as socket programming, RMI, CORBA, Internet protocols), and software architecture (such as MVC, agent architecture, and N-tier architecture), progress rapidly, with design and programming moving toward dynamic, runtime behavior. Dynamic behavior makes software flexible but also makes it difficult to test. Objects can now send a message to another entity without knowing the type of object that will receive the message; the receiver may have just been downloaded from the Internet with no interface definition or implementation available. Numerous testing techniques have been proposed to test object-oriented software; however, testing technology is still far behind software development technology.
5) Conventional software testing is generally application-specific, rarely reusable, and is not extensible. Even within a software development organization, software development, and test artifacts are developed by different teams and are described in separate documents. These make test reuse difficult.
As a part of this research, we plan to work toward an automated and immune software testing environment that includes 1. Unified Component-Based Testing (U-CBT); 2. Unified Built-In Test (U-BIT); 3. Unified End-to-End (U-E2E) Testing; 4. Unified Agent-Based Testing (U-ABT); 5. Unified Automatic Test Case Generators (U-ATCG); and 6. Unified Smart Testing Framework (U-STF). The development of this environment is based on the software stability model (SSM), a knowledge map (KM): Unified Software Testing (KM-UST), and the notion of software agents. An agent is a computational entity evolving in an environment with autonomous behavior, capable of perceiving and acting on this environment and communicating with other agents.
You are invited to join Unified Software Engineering (USWE)
Relevant answer
Answer
It could help improve the detection and prevention of software vulnerabilities. A few factors to consider:
1. Research and feasibility: Conduct thorough research to understand existing approaches, tools, and techniques related to software immune testing. Evaluate the feasibility of developing a unified software engine and consider the potential challenges and limitations that may arise.
2. Market demand and competition: Assess the market demand for a software immune testing tool. Investigate if similar tools or solutions already exist and analyze their features and limitations. Consider whether there is a need for a unified software engine like SIT-USE and how it would differentiate itself from existing solutions.
3. Resources and expertise: Determine if you have the necessary resources, including skilled developers, researchers, and domain expertise, to undertake the development of SIT-USE. Developing a robust and effective software testing tool requires significant time, effort, and expertise in areas such as software security, testing methodologies, and programming.
4. Collaboration and partnerships: Consider collaborating with experts or organizations specializing in software security or immune system-inspired testing. Partnering with experts in the field can provide valuable insights, guidance, and potential support for the development process.
5. Sustainability and maintenance: Evaluate the long-term sustainability and maintenance of the software immune testing tool. Consider factors such as updates, bug fixes, support, and staying up-to-date with emerging security threats and technologies.
Good luck
  • asked a question related to Regression
Question
5 answers
The objective here is to determine factor sensitivities, or slope coefficients, in a multiple OLS regression model.
Relevant answer
Answer
In a multivariate regression model, where you have multiple predictor variables and a single outcome variable, you do not necessarily need to conduct separate independent simple regressions for each predictor variable before constructing the multivariate regression model. The multivariate regression model inherently takes into account the relationships between all the predictor variables and the outcome variable simultaneously.
In fact, conducting separate independent simple regressions for each predictor variable can lead to issues like omitted variable bias and incorrect interpretation of the individual effects. This is because simple regressions do not account for the potential correlations or interactions between predictor variables.
In a multivariate regression analysis, all predictor variables are included in the model at the same time, and the coefficients associated with each predictor are estimated while controlling for the presence of the other predictors. This approach allows you to examine the unique contributions of each predictor variable while considering the joint influence of all predictors on the outcome variable. Additionally, it can help to account for potential multicollinearity among the predictor variables.
So, instead of conducting separate independent simple regressions, you can directly use a multivariate regression model to analyze the relationships between multiple predictors and a single outcome variable. This approach provides a more comprehensive and accurate understanding of the relationships among variables.
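A small simulation sketch in R illustrates the omitted-variable point above; the numbers are arbitrary, with both true coefficients set to 1.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)   # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)    # true model uses both predictors
coef(lm(y ~ x1))        # simple regression: x1's estimate absorbs part of x2's effect
coef(lm(y ~ x1 + x2))   # multiple regression: both estimates close to the true value of 1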
  • asked a question related to Regression
Question
4 answers
I am doing a research project to study the determinants of capital structure. However, I've run into two issues.
After downloading data from Compustat, I noticed there are a lot of missing values in the data, and I wonder how I should deal with them. How is this usually done in the finance literature?
The other problem I came across is strange to me: one of my variables, interest expense, includes zero values and sometimes also negative values, which not only does not make sense but also poses issues in calculating the coverage ratio. What do you suggest I do in this case?
I highly appreciate your response.
Best
Saeed
Relevant answer
Answer
Missing data is a common problem in financial data, and there are several ways to handle it. One approach is to replace the missing value with a constant; this can work well when chosen in discussion with a domain expert for the data at hand. Another approach is to replace the missing value with the mean or median; this is a decent approach when the data set is small, but it does add bias. A third approach is to impute the missing value using information from other columns.
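As a hedged illustration of the last two approaches in R (dat and the column names int_exp, total_debt, and ebit are placeholders, not actual Compustat fields):
miss <- is.na(dat$int_exp)
# Median replacement (simple but biased); keep a missingness flag:
dat$int_exp_filled <- replace(dat$int_exp, miss, median(dat$int_exp, na.rm = TRUE))
dat$int_exp_was_na <- miss
# Model-based replacement using other columns (lm drops incomplete rows when fitting;
# prediction assumes total_debt and ebit are observed for the missing rows):
fit <- lm(int_exp ~ total_debt + ebit, data = dat)
dat$int_exp_filled[miss] <- predict(fit, newdata = dat[miss, ])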
  • asked a question related to Regression
Question
1 answer
Recently, I was contacted by a professor who wanted to utilize my PyCaret book in his research. Considering that I support scientific advancement in every way possible, I was happy to collaborate with that person. Furthermore, I have decided to freely provide my book to other researchers interested in utilizing it. Here are the topics covered in the book:
• Regression
• Classification
• Clustering
• Anomaly Detection
• Natural Language Processing
• Time Series Forecasting
• Developing Machine Learning Apps with Streamlit
If you want to acquire the book for research purposes, I encourage you to send some information about your project, so we can discuss this further. You can check the link below for more information, and leave a comment below if you have any questions!
Simplifying Machine Learning with PyCaret: https://leanpub.com/pycaretbook/
Relevant answer
Answer
There are several machine learning books that can provide you with the theoretical foundations, practical techniques and recent developments of machine learning.
While selecting, look out for:
- A book that covers a wide range of topics, from the basics to the advanced, and from the classical to the modern. This way, you get a comprehensive overview of the field and can explore different aspects of machine learning.
- A book that provides clear explanations, proofs, and relevant examples, to help you understand the concepts, methods, and results of machine learning and apply them to your own research problems.
  • asked a question related to Regression
Question
1 answer
In G*Power, which should I use for a hierarchical regression with continuous variables: the t test (Linear multiple regression: fixed model, single regression coefficient) or the F test (Linear multiple regression: fixed model, R² deviation from zero)? What is the difference?
Relevant answer
Answer
Hello Bonnie,
The answer depends on which step in the hierarchy you're trying to evaluate.
1. If your base model includes only the target variable(s) (as in an "unadjusted," "raw," or "crude" model), then use: Fixed model, R2 deviation from zero.
2. If your base model includes only "control" variables, then again, use Fixed model, R2 deviation from zero.
3. If your second-tier model is intended to look at the added explanatory power of one or more target variables, given the "control" or previously entered variables, then use Fixed model, R2 increase.
Good luck with your work.
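The R²-increase power analysis in case 3 can also be reproduced in R with the pwr package; the effect sizes below are illustrative assumptions, not values from the question.
library(pwr)
# One target predictor added on top of 4 controls; assume it uniquely
# explains 5% of variance and the full model explains 30%:
f2 <- 0.05 / (1 - 0.30)                              # Cohen's f^2 for the R^2 change
pwr.f2.test(u = 1, f2 = f2, sig.level = 0.05, power = 0.80)
# Solves for v (denominator df); required N is roughly v + u + 4 + 1.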
  • asked a question related to Regression
Question
3 answers
THANKS
Relevant answer
Answer
The criteria for choosing between seemingly unrelated regression (SUR) and the generalized method of moments (GMM) in EViews 13 depend on the research question you are trying to answer and the assumptions you are willing to make. The SUR method estimates the parameters of the system, accounting for heteroskedasticity and contemporaneous correlation in the errors across equations. The GMM method is a general method for estimating parameters in models where some of the assumptions made by maximum likelihood estimation are not met.
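Outside EViews, a two-equation SUR system can be sketched in R with the systemfit package (dat and the equation contents are placeholders):
library(systemfit)
eq1 <- y1 ~ x1 + x2
eq2 <- y2 ~ x1 + x3
fit <- systemfit(list(first = eq1, second = eq2), method = "SUR", data = dat)
summary(fit)   # estimates account for contemporaneous correlation of errors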
  • asked a question related to Regression
Question
10 answers
I am running an instrumental variable regression.
EViews provides two different models for instrumental variables, i.e., two-stage least squares and the generalized method of moments.
How do I choose between the two models?
Thanks in advance.
Relevant answer
Answer
The least squares method is, in my experience, more convenient than the method of moments.
In the method of moments, you first have to derive the theoretical moments up to order p (if the regression equation contains p parameters) in order to obtain p equations, and then solve those p equations after substituting in the values of the moments obtained from the sample.
Deriving the theoretical moments is more difficult than setting up the normal equations (required by the least squares method), since it depends on the nature of the probability distribution followed by the parent population of the sample.
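For reference, the contrast can be written compactly: least squares only requires solving the normal equations, whereas the method of moments requires deriving p theoretical moments of the assumed parent distribution and equating them to the sample moments. Here X is the design matrix, y the response vector, and g_k(θ) the k-th theoretical moment implied by the parameters θ.
\[
X^{\top}X\,\hat{\beta} = X^{\top}y \quad\Longrightarrow\quad \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y
\]
\[
\text{method of moments:}\qquad g_k(\hat{\theta}) = \frac{1}{n}\sum_{i=1}^{n} y_i^{k}, \qquad k = 1,\dots,p
\]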
  • asked a question related to Regression
Question
2 answers
I need help with how to run ridge regression in EViews. I have installed the add-in in EViews, but I am having problems running the regression. Could someone please help me with a step-by-step video (or even an explanation) of how to do this? I am facing a deadline.
I will sincerely appreciate a timely response.
Shalom.
Monday
Relevant answer
Answer
  • Open EViews and load your data.
  • Click on 'Quick' and select 'Estimate Equation'.
  • Select 'Ridge Regression' from the list of estimation methods (available once the add-in is installed).
  • Choose the dependent and independent variables.
  • Set the value of the ridge parameter.
  • Click OK to run the regression.
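If the EViews add-in keeps failing before the deadline, a fallback sketch in R with the glmnet package does the same thing (alpha = 0 selects the ridge penalty; dat and the variable names are placeholders):
library(glmnet)
X <- as.matrix(dat[, c("x1", "x2", "x3")])   # predictor matrix
y <- dat$y
fit <- glmnet(X, y, alpha = 0)      # ridge path over a grid of penalties
cv  <- cv.glmnet(X, y, alpha = 0)   # cross-validate the ridge parameter
coef(fit, s = cv$lambda.min)        # coefficients at the chosen penalty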
  • asked a question related to Regression
Question
6 answers
I am working on a SEM model using Mplus. The model includes 2 latent factors each with about 4 dichotomous indicators. The latent factors are regressed onto 5 exogenous predictors (also dichotomous). A dichotomous outcome is, in turn, regressed onto the 2 latent factors. I used WLSMV to estimate the model, which is recommended when the latent factor indicators are dichotomous.
The model fits well but my understanding is that Mplus uses probit regression for the DV and latent factors. And I am not very familiar with how to interpret probit results. So I do not know how to interpret the parameter estimates (the indicator coefficients for each latent factor; the exogenous coefficients for those variables after regressing the latent factor on them; and the coefficients for the DV regressed onto the latent risk factors).
Can anyone point me towards reference material that might walk me through how to interpret (and write-up) the results of this modeling?
Thanks for any help.
James
  • asked a question related to Regression
Question
2 answers
Regression Analysis
Relevant answer
Answer
What do you mean by "categorical with more than 2 categories (ordinal)"?
In any case, if a variable is categorical and has more than two categories, you should enter it as dummy (indicator) variables. Mahfooz Alam
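As a short illustration in R (placeholder data): a categorical variable with k > 2 levels enters the regression as k - 1 indicator (dummy) variables, which most software creates automatically from a factor.
dat$region <- factor(dat$region)            # e.g. levels "north", "south", "east"
fit <- lm(y ~ region + x1, data = dat)      # R builds the k - 1 dummies internally
head(model.matrix(~ region, data = dat))    # inspect the indicator columns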
  • asked a question related to Regression
Question
14 answers
Hello
I am searching for the Panel smooth transition regression Stata code.
Does anyone know of any available code for Stata?
Thank you
Relevant answer
Answer
You can use the "PSTR" package to implement the Panel Smooth Transition Regression model; note, though, that this package is written for R rather than Stata (I am not aware of an official Stata command for PSTR). The package offers tools for conducting model specification tests, estimating the PSTR model, and evaluating the results.
  • asked a question related to Regression
Question
3 answers
Hi everyone
I am using package "XTENDOTHRESDPD" to run a Dynamic panel threshold regression in Stata which is provided here: https://econpapers.repec.org/software/bocbocode/s458745.htm
However, I have the following issue which I could not solve.
To see whether the threshold effect is statistically significant, I am running "xtendothresdpdtest", function after the regression result and I am getting this Error:  "inferieurbt_result not found."
I would really appreciate it if you could guide me in case you have any experience with this function.
Relevant answer
Answer
You can run “xtendothresdpdtest” after using “XTENDOTHRESDPD” in Stata by typing the following command in the Stata command window:
xtendothresdpdtest
This command will test for the statistical significance of the threshold effect in your regression model. If you are getting an error message when running this command, it may be due to a problem with your data or your model specification. You may want to check your data and model specification to ensure that they are correct.
  • asked a question related to Regression
Question
7 answers
Using Stata or R, how can we extract intraclass correlation coefficients (ICCs) for multilevel Poisson and multilevel negative binomial regression?
Relevant answer
Answer
Thank you Mosharop Hossian for your details
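For anyone landing here with the same question, a hedged R sketch (placeholder variable names) is below: fit a random-intercept model with lme4, then extract the ICC with the performance package.
library(lme4)
library(performance)
m_pois <- glmer(count ~ 1 + (1 | cluster), data = dat, family = poisson)
icc(m_pois)    # ICC for the multilevel Poisson model
m_nb <- glmer.nb(count ~ 1 + (1 | cluster), data = dat)   # negative binomial analogue
icc(m_nb)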
  • asked a question related to Regression
Question
2 answers
Dear all,
I have a set of balanced panel data, i: 6, t: 21, which is 126 observations overall. I have 1 dependent variable (y) and 6 independent variables (x1, x2, ...).
First, I run unit root tests, which show:
y I(1)
x1 I(0)
x2 I(1)
x3 I(1)
x4 I(0)
x5 I(1)
x6 I(0)
If I would like to run panel data regressions (pooled, fixed effects and random effects), is this the correct form for entering the model in EViews:
d(y) c x1 d(x2) d(x3) x4 d(x5) x6
or shall I put all variables at the same difference level, adding "d" to all of them?
Please correct me if I am wrong; these are the steps by which I would like to conduct the statistical part of a panel data analysis:
1. Unit root tests
2. Panel regression
3. ARDL
Relevant answer
Answer
If the data is in different stationary levels, you can still write the model in Eviews by following these steps:
  1. Open the Eviews program.
  2. Load the data you want to use for your model.
  3. Click on the “Quick” menu and select “Estimate Equation”.
  4. In the Equation Specification window, select the variables you want to include in your model.
  5. Click on the “Options” button.
  6. In the Options window, select the appropriate option for handling non-stationary variables (e.g., first differences).
  7. Click on the “OK” button to close the Options window.
  8. Click on the “OK” button to run the model.
  • asked a question related to Regression
Question
3 answers
Hello everyone,
In order to compare two clinical methods, we usually use Passing & Bablok (PABA) regression. Most of the time, our samples are larger than n=50, but for the comparison I'm interested in today (method A vs method B), the samples are small (n = 10-15).
The PABA regression validates the equivalence between the two methods (method A vs method B). Indeed, the intercept CI crosses 0 and the slope CI crosses 1:
  • Intercept = -6, with confidence interval (CI) = [-56; 31]
  • Slope = 2, with confidence interval (CI) = [0.5; 4]
However, I have a few points of concern about these results because:
  • the Pearson coefficient is low (r = 0.63),
  • the CIs are very wide,
  • the coefficient of variation (CV) between the two methods is high (CV > 20%).
Do you know of any criteria or rules that I could add to the analysis of the PABA regression that would enable me to improve our validation method?
Thanks in advance for your help! :)
Relevant answer
Answer
Hello Morgane,
Be careful: the Passing-Bablok method gives fairly approximate confidence intervals, especially for small samples. Moreover, checking whether they contain 0 (for the intercept) and 1 (for the slope) does not prove the equivalence of the methods: if they do not contain those values, you can reasonably reject equivalence, but otherwise you cannot conclude anything; see the theory of equivalence tests.
In particular, if the intervals are very wide (which will happen with small, noisy data, or data not at all aligned on a straight line), you will almost never manage to reject equivalence, simply for lack of power. That is probably what is happening in your case.
What does a graphical representation show?
With a bit more context, we can guide you on methods for concordance studies...
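In R, the comparison itself can be run with the mcr package; a hedged sketch, where x holds the method A results and y the method B results for the same samples:
library(mcr)
pb <- mcreg(x, y, method.reg = "PaBa", method.ci = "bootstrap")  # bootstrap CIs suit small n
printSummary(pb)   # slope and intercept with confidence intervals
plot(pb)           # scatter plot with the fitted Passing-Bablok line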
  • asked a question related to Regression
Question
1 answer
I discovered that three independent variables have standardized multiple coefficients (SMC) equal to 1.02 on a dependent variable. Are there any approaches to be considered for dealing with such a high coefficient (and R²) in a regression?
Relevant answer
Answer
Hi,
Do you mean that each independent variable has the same value of 1.02, or that the overall multiple linear regression model has this beta value? In regression, check both the predicted R-squared value and the adjusted R-squared value, especially when there are many predictor variables.
Check your residual plots; without seeing the residual plots, it is hard to recommend a way to better fit your data to the model.
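In R, those residual checks are one line once the model is fitted (fit being your lm object):
par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, normal Q-Q, scale-location, leverage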
  • asked a question related to Regression
Question
3 answers
I have conducted some ordinal logistic regressions; however, some of my tests have not met the proportional odds assumption, so I need to run multinomial regressions. What would I have to do to the ordinal DV to use it in this model? I'm doing this in SPSS, by the way.
Relevant answer
Answer
Hello Hannah Belcher. How did you determine that the proportional odds (aka., parallel lines) assumption was too severely violated? And what is your sample size?
I ask those questions because the test of proportional odds is known to be too liberal (i.e., it rejects the null too easily), particularly as n increases. You can find some relevant discussion of this (and many other issues) in this nice tutorial:
HTH.
  • asked a question related to Regression
Question
6 answers
Using SPSS, I've studied the linear regression between two continuous variables (having 53 values each). I got a p-value of 0.000, which means no normal distribution; should I use another type of regression?
Relevant answer
Answer
Mohamed Amine Ferradji, you might clarify your question. You got a p-value of 0.000 for what test? And to what did you apply this test?
  • asked a question related to Regression
Question
8 answers
In carrying out panel data regression analysis, it is required that the Hausman specification test be carried out to choose between the fixed effects and random effects estimation approaches. Another line of reasoning holds that the Breusch-Pagan Lagrange multiplier (LM) test for panel data is also required to choose between random effects estimation and pooled estimation.
Which of the preliminary tests should come first? Are these tests the final determinants of which estimation approach to deploy?
Relevant answer
Answer
John C Frain Thanks a great deal, Professor. This is helpful.
However, aside from recourse to economic theory, other statisticians have suggested that economic theory should be combined with statistical analysis in determining the choice among fixed effects, random effects and pooled estimation.
The above formed part of the major reason I needed clarification on "What are the preliminary tests that will determine the choice among fixed effect regression, random effect regression and pooled effect regression?"
  • asked a question related to Regression
Question
1 answer
Hello community!
I am running a CFA for a within and between subjects design. The research involves the study of students on variables before and after taking an entrepreneurship course. The dependent variables are self-efficacy, with 5 subconstructs, and entrepreneurial intent. The independent variable is the course. Covariates are a continuous age variable, exposure (0, 1, 2, or 3), and experience (0, 1, or 2) (I used dummy variables for these in the ANCOVAs).
I don't have experience with repeated-measures CFA and want to make sure I'm doing it correctly. I have attached a picture of the CFA model I have tested. I correlated the error terms of the corresponding measured items. I set the regression weights to be equal at both time points. I also correlated the latent variables. This study also has multiple groups (female and male), but I believe that does not change anything in the factor structure (I just added separate data sets for those groups in Amos). Please let me know if this assumption is wrong.
  1. Does the model reflect an appropriate way to test whether the factor structure holds across time?
  2. Is it OK that I did not include the covariates or should I?
  3. The model where I constrain the regression weights to be equal at both time points has significantly lower model fit according to the chi-square difference test. The model fit is otherwise good for both models: TLI & CFI > .9 and RMSEA < .04. Can I argue that theoretically the model should hold, and since model fit is good for the constrained model, that it is OK to use it across time? I know the chi-square test is sensitive to sample size (n > 3,000), but does that matter for the chi-square difference test?
  4. Chi-square (constrained model) minus chi-square (less constrained model): 11814.262 - 11644.759 = 169.503, and df (2nd model) minus df (1st model): 2345 - 2288 = 57; p < .001.
I would appreciate your insight very much!
Thank you,
Heidi
Relevant answer
Answer
Hello Heidi,
Assuming your principal concern (research question) is whether the factor structure holds for this batch of respondents over the length of the course in question, then:
1. Your illustration would correspond to the restricted model, in which respective variable-factor (and second-order) loadings were constrained to be equal. In the unrestricted model, you would remove such constraints. As well, in the unrestricted model, you'd likely not start by assuming correlated error terms.
2. I don't see that inclusion of covariates would help...again, the driving force here is the specific research question you're trying to address.
3. Yes, N makes a difference (that's how one arrives at a value of 11,814). However, given N and your data, your results would appear to suggest that: (a) the constrained model (equal structures) is significantly different from the unrestricted model; (b) both models appear to fit well enough (to your implied criteria) that one could consider the difference to be statistically significant but not of such magnitude as to be of practical significance.
The fact of the matter is, sample to sample variance may often be the only real "culprit" for why one research team claims support for structure "A" for a measure while another claims support for structure "B."
4. See #3.
Males vs. females: Was this a research question? I couldn't tell.
Good luck with your work.
  • asked a question related to Regression
Question
1 answer
In addition to the Oaxaca-Blinder decomposition, is exogenous switching regression applicable for examining the gender gap in market participation for agricultural products?
Relevant answer
Answer
Both Oaxaca-Blinder decomposition and exogenous switching regression can be used to see the gender gap in market participation of agricultural product, but they have different assumptions and interpretations. Oaxaca-Blinder decomposition assumes that the treatment variable (e.g., gender) is exogenous and does not affect the outcome variable (e.g., market participation) through unobserved factors. It decomposes the mean difference in the outcome variable between the two groups into an explained component (due to differences in observable characteristics, such as education, land size, etc.) and an unexplained component (due to differences in coefficients or discrimination). Exogenous switching regression also assumes that the treatment variable is exogenous, but it allows for heterogeneity in the outcome variable across the two groups. It estimates two regression models for the outcome variable, one for each group, and a selection equation for the treatment variable. It can estimate the average treatment effect (ATE) and the average treatment effect on the treated (ATT), which measure the difference in the expected outcome between the two groups and between the treated group and their counterfactual outcome, respectively.
The choice between Oaxaca-Blinder decomposition and exogenous switching regression depends on the research question and the data availability. Oaxaca-Blinder decomposition is simpler to implement and interpret, but it requires a common set of predictors for both groups and a linear specification of the outcome variable. Exogenous switching regression is more flexible and can account for nonlinearities and interactions in the outcome variable, but it requires a set of exogenous variables that affect only the treatment choice and not the outcome variable. Both methods can provide useful insights into the sources and magnitude of the gender gap in market participation of agricultural product.
  • asked a question related to Regression
Question
1 answer
Hi,
I have a set of studies that looked at the association of sex w.r.t to multiple variables. The majority of the studies reported regression variables such as beta, b values, t-stats, and standard errors. Is it possible to run a meta-analysis using any of the above-mentioned variables? If so, which software would be more meaningful to perform a meta-analysis? I did a wee bit of research and found out that Metafor in R would be the better choice to perform these kinds of meta-analyses.
Any help would be highly appreciated!
Thanks!
Relevant answer
Answer
Hello Vital,
As sex is typically coded to be a dichotomous variable, you could use:
1. ordinary Pearson r;
2. Cohen's d (derived from r via the formula d = 2r / sqrt(1 - r^2));
3. for single IV regression models, the "beta" (standardized regression coefficient) = Pearson r;
Both r and d are common ES metrics in meta-analytic studies. If most of your sources are correlational in form, then I'd suggest sticking with r.
The problem with multiple regression models is, unless each model has exactly the same assortment and number of IVs, and the same DV, beta coefficients aren't meaningfully comparable for a given IV (for your aims, sex) across studies.
Good luck with your work.
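A minimal metafor sketch of point 1, assuming a data frame d with columns ri (correlation of sex with the outcome) and ni (sample size):
library(metafor)
es  <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = d)   # Fisher's z and variances
res <- rma(yi, vi, data = es)        # random-effects meta-analysis
predict(res, transf = transf.ztor)   # pooled estimate back-transformed to r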
  • asked a question related to Regression
Question
1 answer
Hi,
I have a set of studies that looked at the association of sex w.r.t to multiple variables. The majority of the studies reported regression variables such as beta, b values, t-stats, and standard errors. Is it possible to run a meta-analysis using any of the above-mentioned variables? If so, which software would be more meaningful to perform a meta-analysis? I did a wee bit of research and found out that Metafor in R would be the better choice to perform these kinds of meta-analyses.
Any help would be highly appreciated!
Thanks!
Relevant answer
Answer
Hi,
you may meta-analyze the bivariate correlation coefficients that are reported in most studies. If not, write an email to the author.
You may convert the relationship between an IV and the DV from a regression analysis into a semi-partial or partial correlation, but the problem is that these don't follow a defined sampling distribution. The reason is that the context (i.e., the set of other predictors and covariates) affects the regression coefficient.
Recently, a paper by Aloe proposed solving this with a meta-regression in which the sets of control variables are represented as dummies. I have not yet tested this idea, but I think studies will differ to such a large degree (with respect to the controls) that you'll end up with one dummy per study... Perhaps an extension could be to create a dummy for each covariate used and control for these...
Aloe, A. M. (2015). Inaccuracy of regression results in replacing bivariate correlations. Research Synthesis Methods, 6(1), 21-27.
Best,
Holger
  • asked a question related to Regression
Question
3 answers
Can we apply regression to a moderate correlation? Please recommend an easy-to-understand book for non-statistical readers.
Relevant answer
Answer
If you have enough observations, such as 40 or 50 or more, and have tested your data for correlation, then it is recommended to run a regression, for it will give you a better understanding of your data. You can check any standard statistics book to understand this better. Regards.
  • asked a question related to Regression
Question
3 answers
Hello everyone. The p-value of the path estimate (regression weight B = 0.198) from A to C is 0.014 in my model in the figure. After bootstrapping, the same coefficient from A to C (B = 0.198) has a p-value of 0.043 as a direct effect. What causes this difference in p-value? Many thanks for your comments.
Relevant answer
Answer
My guess is that the first p value is based on a regular theoretical/asymptotic standard error, whereas the second one is based on bootstrapping, which is a different methodology for finding a p value empirically based on resampling rather than asymptotic theory.
  • asked a question related to Regression
Question
9 answers
Hello spatial analysis experts
Hope you're all good.
I urgently need the commands and R code for performing spatial binomial regression in RStudio. If someone has already worked on it, please share the code from start to end.
Thanks and regards
Dr. Sami
Relevant answer
Answer
Hi Dr. Sami,
To perform spatial binomial regression in RStudio, you can start from the 'spdep' package, which provides functions for building spatial weights. One caveat: 'spdep' itself does not provide a spatial GLM fitting function, so the sketch below uses an autologistic-style model (a binomial GLM with a distance-based autocovariate from 'spdep'); dedicated spatial GLMs are available in packages such as 'spmodel'. Here's a step-by-step guide:
1. Install and load the 'spdep' package in RStudio:
install.packages("spdep")
library(spdep)
2. Load your spatial data into RStudio. Make sure your data include a binary dependent variable and spatial coordinates (e.g., longitude and latitude).
3. Create a spatial weights matrix with 'dnearneigh' and 'nb2listw'. Note that 'dnearneigh' needs both a lower and an upper distance bound:
nb <- dnearneigh(coordinates(data), d1 = 0, d2 = dmax)  # neighbours within distance dmax
w <- nb2listw(nb, style = "W")                          # spatial weights matrix
4. Fit the binomial model with a spatial autocovariate (an autologistic approximation):
ac <- autocov_dist(as.numeric(data$dependent), coordinates(data), nbs = dmax, type = "inverse")
model <- glm(dependent ~ independent + ac, data = data, family = binomial)
5. Extract and analyze the regression results. You can use the 'summary' function to obtain a summary of the model coefficients and their significance:
summary(model)
Remember to replace 'dependent' and 'independent' with the appropriate variable names in your dataset, and to choose 'dmax' (the distance threshold for identifying neighbouring observations) to suit your data.
I hope this helps you with performing spatial binomial regression in RStudio. If you encounter any issues or have further questions, feel free to ask.
  • asked a question related to Regression
Question
4 answers
Hi there!
I am currently running SPSS AMOS 24
But the SEM results don't show the p-values for the regression weights in the estimates table for my three main paths.
The estimates only show a value of 1 for each of these paths; S.E., C.R. and p-value are all empty.
(The rest of the variables are normal; only the three main ones are affected.)
How can I resolve this question?
Looking forward to kind assistance in this regard, wish everyone well :)
Relevant answer
Answer
It seems like you are encountering an issue with SPSS AMOS 24 where the P-values for regression weights are not being displayed for your three main paths. One common cause worth checking first: if those three paths are fixed to 1 (for example, as reference or scaling constraints), they are not freely estimated parameters, so AMOS leaves S.E., C.R., and P blank for them by design. Otherwise, the issue may stem from missing data or model specification problems, and you can try the following steps:
1. Check your data for missing values or outliers that may affect the estimation process. Make sure all variables used in the analysis have complete data.
2. Verify that your model is correctly specified, including the variable labels and measurement scales. Ensure that you have properly defined the paths and set up the model constraints.
3. Consider increasing the sample size if it is small, as this can improve the accuracy of parameter estimates and increase the likelihood of obtaining meaningful P-values.
4. If the issue persists, try updating your SPSS AMOS software to the latest version or consult the SPSS AMOS user community or support team for further assistance. They may be able to provide specific guidance based on the software version you are using.
  • asked a question related to Regression
Question
2 answers
My research topic is the ROLE OF TEACHERS' ENTREPRENEURIAL ORIENTATION IN DEVELOPING THE ENTREPRENEURIAL MIND-SET OF STUDENTS IN HEIs. The research constructs that I am using are ENTREPRENEURIAL ORIENTATION and ENTREPRENEURIAL MIND-SET; both are psychological and behavioral. The variables that I will be measuring are the INNOVATIVENESS, PRO-ACTIVENESS and RISK-TAKING ABILITY of teachers.
I will be checking the strength of the relationship between these constructs using regression, and I would like to use the THEORY OF PLANNED BEHAVIOUR by Ajzen in support of my research argument without TESTING or BUILDING the theory. I would seek expert advice on how this can be done and whether it is a practically acceptable practice. Thank you
Relevant answer
Answer
Grounded Theory develops theory from the data.
  • asked a question related to Regression
Question
4 answers
Since OLS and fixed effects estimation differ, for a fixed effects panel data model estimated using a fixed effects (within) regression, which assumptions (for example, no heteroskedasticity, linearity) do I need to test before I can run the regression?
I'm using the xtreg, fe and xtscc, fe commands in Stata.
Relevant answer
Answer
Before performing a fixed effects regression on panel data, several assumptions should be tested to ensure the validity of the results (an R sketch of the corresponding tests follows this list). These assumptions include:
  1. Time-invariant individual effects: One assumption of the fixed effects model is that the individual-specific effects are time-invariant. This means that the unobserved individual-specific factors affecting the dependent variable remain constant over time. This assumption can be tested by examining whether the individual effects are correlated with the time-varying regressors. If there is a correlation, it suggests that the assumption may be violated.
  2. No perfect multicollinearity: The independent variables in the regression should not exhibit perfect multicollinearity, which occurs when one or more independent variables are perfectly linearly dependent on others. Perfect multicollinearity can lead to unreliable coefficient estimates and inflated standard errors.
  3. No endogeneity: The assumption of exogeneity implies that the independent variables are not correlated with the error term. Endogeneity can arise when there are omitted variables, measurement errors, or simultaneity issues. Various tests, such as instrumental variable approaches or tests for correlation between the residuals and the independent variables, can be employed to check for endogeneity.
  4. Homoscedasticity: Homoscedasticity assumes that the error term has constant variance across all observations. Heteroscedasticity, where the variance of the error term varies systematically, can lead to inefficient coefficient estimates. Graphical methods, such as plotting residuals against predicted values or conducting formal tests like the White test, can be used to diagnose heteroscedasticity.
  5. No serial correlation: Serial correlation, also known as autocorrelation, assumes that the error terms are not correlated with each other over time. If there is serial correlation, it violates the assumption of independence of observations. Diagnostic tests like the Durbin-Watson test or plotting residuals against time can help identify serial correlation.
  6. Normality of errors: The assumption of normality assumes that the error term follows a normal distribution. Departures from normality can affect the reliability of hypothesis tests and confidence intervals. Graphical methods, such as histograms or Q-Q plots of residuals, can help assess normality.
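A hedged R analogue of these checks with the plm package (the counterpart of Stata's xtreg, fe; dat and the variable/index names are placeholders):
library(plm)
library(lmtest)
fe <- plm(y ~ x1 + x2, data = dat, index = c("id", "year"), model = "within")
re <- plm(y ~ x1 + x2, data = dat, index = c("id", "year"), model = "random")
phtest(fe, re)   # Hausman test: fixed vs random effects
pbgtest(fe)      # Wooldridge/Breusch-Godfrey test for serial correlation
pcdtest(fe)      # cross-sectional dependence
# Robust (Arellano) inference if heteroskedasticity or serial correlation is found:
coeftest(fe, vcov = vcovHC(fe, method = "arellano"))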
  • asked a question related to Regression
Question
3 answers
In what situations do we use these? Please tell me; I am having difficulty.
Relevant answer
Answer
ARDL stands for Autoregressive Distributed Lag model. It is a statistical model used to analyze time series data. ARDL error correction model regression is an extension of the ARDL model that includes an error correction term. This term helps to correct for any deviations from long-run equilibrium in the short run. ARDL long run form bound test is a test used to determine whether there is a long-run relationship between two or more variables in an ARDL model.
  • asked a question related to Regression
Question
6 answers
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers. 
Relevant answer
Answer
I would like to restart this question.
I have noted a few papers on cutoff or quasi-cutoff sampling other than the many I have written, but in general, I do not think those others have had much application. Further, it may be common to ignore the part of the finite population which is not covered, and to only consider the coverage, but I do not see that as satisfactory, so I would like to concentrate on those doing inference. I found one such paper by Guadarrama, Molina, and Tillé which I will mention later below.
Following is a tutorial i wrote on quasi-cutoff (multiple item survey) sampling with ratio modeling for inference, which can be highly useful for repeated official establishment surveys:
"Application of Efficient Sampling with Prediction for Skewed Data," JSM 2022: 
This is what I did for the US Energy Information Administration (EIA) where I led application of this methodology to various establishment surveys which still produce perhaps tens of thousands of aggregate inferences or more each year from monthly and/or weekly quasi-cutoff sample surveys. This also helped in data editing where data collected in the wrong units or provided to the EIA from the wrong files often showed early in the data processing. Various members of the energy data user community have eagerly consumed this information and analyzed it for many years. (You might find the addenda nonfiction short stories to be amusing.)
There is a section in the above paper on an article by Guadarrama, Molina, and Tillé(2020) in Survey Methodology, "Small area estimation methods under cut-off sampling," which might be of interest, where they found that regression modeling appears to perform better than calibration, looking at small domains, for cutoff sampling. Their article, which I recommend in general, is referenced and linked in my paper.
There are researchers looking into inference from nonprobability sampling cases which are not so well-behaved as what I did for the EIA, where multiple covariates may be needed for pseudo-weights, or for modeling, or both. (See Valliant, R.(2019)*.) But when many covariates are needed for modeling, I think the chances of a good result are greatly diminished. (For multiple regression, from an article I wrote, one might not see heteroscedasticity that should theoretically appear, which I attribute to the difficulty in forming a good predicted-y 'formula'. For psuedo-inclusion probabilities, if many covariates are needed, I suspect it may be hard to do this well either, but perhaps that may be more hopeful. However, in Brewer, K.R.W.(2013)**, he noted an early case where failure using what appears to be an early version of that helped convince people that probability sampling was a must.)
At any rate, there is research on inference from nonprobability sampling which would generally be far less accurate than what I led development for at the EIA.
So, the US Energy Information Administration makes a great deal of use of quasi-cutoff sampling with prediction, and I believe other agencies could make good use of this too, but in all my many years of experience and study/exploration, I have not seen much evidence of such applications elsewhere. If you do, please respond to this discussion.
Thank you - Jim Knaub
..........
*Valliant, R.(2019), "Comparing Alternatives for Estimation from Nonprobability Samples," Journal of Survey Statistics and Methodology, Volume 8, Issue 2, April 2020, Pages 231–263, preprint at 
**Brewer, K.R.W.(2013), "Three controversies in the history of survey sampling," Survey Methodology, Dec 2013 -  Ken Brewer - Waksberg Award article: 
  • asked a question related to Regression
Question
4 answers
I'm working on my PhD thesis and I'm stuck around expected analysis.
I'll briefly explain the context then write the question.
I'm studying moral judgment in the cross-context between Moral Foundations Theory and Dual Process theory.
Simplified: MFT states that moral judgments are almost always intuitive, while DPT states that better reasoners (those higher on cognitive capability measures) will make moral judgments through analytic processes.
I have another idea: people will make moral judgments intuitively only for their primary moral values (e.g., for conservatives those are the binding foundations: respecting authority, ingroup loyalty and purity), while for the values they aren't much concerned about, they'll have to use analytic processes to figure out what judgment to make.
To test this idea, I'm giving participants:
- a few moral vignettes to judge (one concerning progressive values and one concerning conservative values) on 1-7 scale (7 meaning completely morally wrong)
- moral foundations questionnaire (measuring 5 aspects of moral values)
- CTSQ (Comprehensive Thinking Styles Questionnaire), CRT and belief bias tasks (8 syllogisms)
My hypothesis is therefore that cognitive measures of intuition (such as intuition preference from CTSQ) will predict moral judgment only in the situations where it concerns primary moral values.
My study design is correlational. All participants are answering all of the questions and vignettes. So I'm not quite sure how to analyse the findings to test the hypothesis.
I was advised to do a regression analysis where the moral values (5 from MFQ) or the moral judgments from the two different vignettes would be predictors, and the intuition measure would be the dependent variable.
My concern is that this analysis is the wrong choice, because I'll have both progressives and conservatives in the sample, which means both groups of values should predict intuition if my assumption is correct.
I think I need to either split people into groups based on their MFQ scores and then do this analysis, or introduce some kind of multi-step analysis or control, but I don't know what the right approach would be.
If anyone has any ideas please help me out.
How would you test the given hypothesis with available variables?
Relevant answer
Answer
There are several statistical analysis techniques available, and the choice of method depends on various factors such as the type of data, research question, and the hypothesis being tested. Here is a step-by-step guide on how to approach hypothesis testing:
  1. Formulate your research question and null hypothesis: Start by clearly defining your research question and the hypothesis you want to test. The null hypothesis (H0) represents the default position, stating that there is no significant relationship or difference between variables.
  2. Select an appropriate statistical test: The choice of statistical test depends on the nature of your data and the research question. Here are a few common examples:
     - Student's t-test: used to compare means between two groups.
     - Analysis of Variance (ANOVA): used to compare means among more than two groups.
     - Chi-square test: used to analyze categorical data and test for independence or association between variables.
     - Correlation analysis: used to examine the relationship between two continuous variables.
     - Regression analysis: used to model the relationship between a dependent variable and one or more independent variables.
  3. Set your significance level and determine the test statistic: Specify your desired level of significance, often denoted as α (e.g., 0.05). This value represents the probability of rejecting the null hypothesis when it is true. Based on your selected test, identify the appropriate test statistic to calculate.
  4. Collect and analyze your data: Gather the necessary data for your analysis. Perform the chosen statistical test using statistical software or programming languages like R or Python. The specific steps for analysis depend on the chosen test and software you are using.
  5. Calculate the p-value: The p-value represents the probability of obtaining the observed results (or more extreme) if the null hypothesis is true. Compare the p-value to your significance level (α). If the p-value is less than α, you reject the null hypothesis and conclude that there is evidence for the alternative hypothesis (Ha). Otherwise, you fail to reject the null hypothesis.
  6. Interpret the results: Based on the outcome of your analysis, interpret the results in the context of your research question. Consider the effect size, confidence intervals, and any other relevant statistical measures.
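Applied to the question above, step 5 would most naturally take the form of a moderated (interaction) regression rather than separate group analyses. A minimal sketch in R - the variable names (intuition, mfq_binding, judgment_conservative) and file name are hypothetical placeholders, not the poster's actual columns:

    # Minimal sketch: does intuition predict moral judgment more strongly
    # for one's primary values? Tested via an interaction term.
    dat <- read.csv("moral_study.csv")  # hypothetical file

    # The key term is intuition:mfq_binding - the moderation of interest.
    fit <- lm(judgment_conservative ~ intuition * mfq_binding, data = dat)
    summary(fit)

    # Compare against the main-effects-only model with an F test.
    fit0 <- update(fit, . ~ . - intuition:mfq_binding)
    anova(fit0, fit)

A significant interaction would indicate that the intuition-judgment link depends on how strongly a participant endorses the relevant values, without having to split the sample into groups.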
  • asked a question related to Regression
Question
3 answers
This question is for beginner students only.
Relevant answer
Answer
You can check any book of statistics to understand and follow the answer and get acquainted with this topic. Regards.
  • asked a question related to Regression
Question
2 answers
My question is very straightforward. I have an ordinal variable (4 categories), and I am trying to regress it on different ordinal and nominal variables. I would like help on how to interpret the coefficients and the odds ratios.
On the picture, the independent variable has 4 categories
gender has 2 categories
age has 13 categories
education has 9 categories
politics has 2 categories
statecol has 3 categories
famincome has 6 categories
I want to ask as well if it is correct to write the "i." before some variables, and if not, when is it appropriate to use the "i."?
I attach my Stata output.
Thank you
Relevant answer
Answer
I am pretty late here, but yes - do put "i." before categorical variables. The "i." prefix tells Stata to treat the variable as a factor, so the model includes indicator (dummy) terms for each category instead of treating the numeric codes as continuous.
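If it helps to sanity-check the interpretation outside Stata, here is a hedged R sketch of an ordinal (proportional-odds) logistic regression; the file and variable names are hypothetical, and exponentiated coefficients are the odds ratios:

    # Minimal sketch: proportional-odds model with MASS::polr.
    library(MASS)

    d <- read.csv("survey.csv")                      # hypothetical file
    d$outcome <- factor(d$outcome, ordered = TRUE)   # outcome must be ordered
    # Categorical predictors should also be factors in the data frame.

    fit <- polr(outcome ~ gender + age + education + famincome,
                data = d, Hess = TRUE)
    summary(fit)

    # Odds ratios: multiplicative change in the odds of being in a
    # *higher* outcome category per one-step change in the predictor.
    exp(cbind(OR = coef(fit), confint(fit)))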
  • asked a question related to Regression
Question
1 answer
I'm fitting a model in R where response variable is an over-dispersed count data, and where I meet an issue with goodness-of-fit test. As it has been previously discussed in Cross Validated, common GOF test like chi-square test for deviance is inappropriate for negative binomial regression because it includes a dispersion parameter as well as the mean:
I'm confused and would appreciate any suggestions about the appropriate GOF test for NB regression (preferably intuitive).
Relevant answer
Answer
The likelihood ratio (LR) chi-square test is a standard goodness-of-fit comparison for negative binomial regression. It compares the fit of the negative binomial model to a simpler nested alternative, typically a Poisson regression with the same predictors.
To perform the Likelihood Ratio Chi-Square test for negative binomial regression in SPSS, you can follow these steps:
  1. Run the negative binomial regression model using the appropriate command in SPSS. Make sure to include all relevant predictors and specify the negative binomial distribution. This will generate the estimated coefficients and other model outputs.
  2. Obtain the log-likelihood value for the negative binomial regression model. This value can usually be found in the model output or summary table.
  3. Run a Poisson regression model with the same predictors. The Poisson model is the negative binomial model with the dispersion parameter fixed at zero, so it is the nested alternative. (Dropping the predictors instead would test the predictors, not the distribution.)
  4. Obtain the log-likelihood value for the alternative model.
  5. Calculate the likelihood ratio (LR) statistic: LR = -2 * (log-likelihood of the alternative model - log-likelihood of the negative binomial model).
  6. The LR statistic follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters estimated between the two models. For negative binomial versus Poisson with the same predictors, that difference is one (the dispersion parameter). Because the dispersion parameter is tested at the boundary of its parameter space, the resulting p-value is conservative, and it is common practice to halve it.
  7. Compare the calculated LR statistic to the critical chi-square value at the desired significance level. If the calculated LR statistic is greater than the critical value, it suggests that the negative binomial regression model provides a significantly better fit than the alternative model (Poisson regression).
It's important to note that the Likelihood Ratio Chi-Square test assesses the overall goodness-of-fit of the negative binomial regression model but does not provide information about specific predictors or individual model assumptions. Additional diagnostic tests and assessments should be performed to evaluate the model's performance thoroughly.
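Since the question was about R rather than SPSS, the whole procedure collapses to a few lines; a hedged sketch with hypothetical variables y, x1, x2 in a data frame d:

    # Minimal sketch: LR test of negative binomial vs. Poisson (same predictors).
    library(MASS)

    pois <- glm(y ~ x1 + x2, family = poisson, data = d)
    nb   <- glm.nb(y ~ x1 + x2, data = d)

    # The models differ by one parameter: the dispersion.
    lr <- as.numeric(2 * (logLik(nb) - logLik(pois)))

    # Halved p-value because the dispersion is tested at the boundary.
    pval <- pchisq(lr, df = 1, lower.tail = FALSE) / 2
    c(LR = lr, p = pval)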
  • asked a question related to Regression
Question
2 answers
Autocorrelation, heteroscedasticity, and the normality of residuals are considered important diagnostics for examining the credibility of a model. If autocorrelation is examined with the Durbin-Watson (DW) statistic, what range of DW values is sufficient for reliable results?
What measure should be used for heteroscedasticity in panel regression?
Relevant answer
Answer
Regression diagnostics are methods for determining whether a regression model that has been fit to data adequately represents the structure of the data.
  • asked a question related to Regression
Question
4 answers
Hi, I am currently writing my master thesis on early indicators of bank failure. For this I have calculated probability of default as my dependent variable and have around 15 different financial ratios as explanatory variables; I have also included two measures of interest sensitivity as possible independent variables.
My data consist of 125 companies over 20 years. I'm using Stata and need help with how I should format my data from Excel. I'm also unsure of which kind of regression is best suited for my data. I've tried reading Econometric Analysis of Cross Section and Panel Data by Wooldridge, but I'm feeling a bit lost.
Attached is the Excel file.
Thank You for the help!
Relevant answer
Answer
Panel data regression combines cross-sectional and time-series data: the same cross-sectional units are measured at different points in time. In other words, panel data are observations on the same individuals over a certain period of time.
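As a hedged illustration of what this looks like in practice (using R's plm package rather than Stata, and hypothetical column names for a long-format bank-year data frame d):

    # Minimal sketch: fixed- vs. random-effects panel regression with plm.
    library(plm)

    # Long format: one row per bank-year.
    pd <- pdata.frame(d, index = c("bank", "year"))

    fe <- plm(pd_default ~ ratio1 + ratio2 + ratio3, data = pd, model = "within")
    re <- plm(pd_default ~ ratio1 + ratio2 + ratio3, data = pd, model = "random")
    summary(fe)

    # Hausman test: a significant result favors fixed effects.
    phtest(fe, re)

In Stata the analogous workflow would be xtset followed by xtreg with the fe or re option.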
  • asked a question related to Regression
Question
18 answers
The Seemingly Unrelated Regression (SUR) Models
Relevant answer
Answer
Bruce Weaver - he also states that SPSS has a procedure for SUR with correlated errors. Given the specifics that Samuel Oluwaseun Adeyemo's post includes, either he is working with a system beyond what https://www.ibm.com/support/pages/seemingly-unrelated-regression refers to that we don't know about, or there is some other explanation for how he came up with this information. It is a shame that in his response above he did not say where he got it. Since he didn't answer on the other thread, the explanation that he is using a newer version seems unlikely.
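Setting the SPSS question aside, readers who simply need to fit a SUR system can do so in R with the systemfit package; a hedged sketch with hypothetical equations and data frame d:

    # Minimal sketch: seemingly unrelated regression with systemfit.
    library(systemfit)

    eq1 <- y1 ~ x1 + x2
    eq2 <- y2 ~ x1 + x3
    fit <- systemfit(list(first = eq1, second = eq2), method = "SUR", data = d)

    # Coefficients are estimated jointly, exploiting the correlation
    # between the two equations' error terms.
    summary(fit)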
  • asked a question related to Regression
Question
3 answers
Hello...
I have 40 samples... I want to randomly select a percentage of these 40 samples, build a regression equation from them, and test the remaining percentage of the samples with this regression equation... that is, some data for calibration (creating the regression equation) and some data for validation.... My questions are:
1- On what basis is the split percentage chosen for calibration and validation? For example, one could take a 50/50 split of the samples, or 70/30... which one is correct? Is there an article or book that covers the basics?
2- What should the sample selection criteria be for the regression equation and for validation?
Relevant answer
Answer
There is no need to split the data. You can use your whole sample for the construction of the regression model, and then use the whole sample again for internal validation with the bootstrap. These are the recommendations on this topic: Internal validation of predictive models: efficiency of some procedures for logistic regression analysis - PubMed (nih.gov). These recommendations also apply to multiple linear regression and Cox regression models.
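A hedged sketch of that bootstrap internal validation in R, using the rms package (variable names hypothetical; lrm for logistic models, ols for linear ones):

    # Minimal sketch: optimism-corrected performance via bootstrap (rms).
    library(rms)

    # x = TRUE, y = TRUE keeps the design matrix for resampling.
    fit <- lrm(outcome ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

    # Refit on 200 bootstrap samples; reports optimism-corrected indexes
    # (e.g., Dxy, calibration slope) - no train/test split required.
    validate(fit, method = "boot", B = 200)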
  • asked a question related to Regression
Question
2 answers
I have run the IPS unit root test on my regression variables. However, one of my variables is still non-stationary after first differencing, though it is stationary at the second difference. May I know which models are suitable for variables integrated of order 2?
Relevant answer
Answer
In panel data, if a variable is integrated of order 2 (also known as I(2)), it means that it needs to be differenced twice to become stationary. In other words, it has a quadratic trend that needs to be removed to achieve stationarity.
Dealing with I(2) variables in panel data can be challenging, but there are a few possible approaches that you could consider:
  1. Differencing the data twice: One approach to dealing with I(2) variables is to difference the data twice to make them stationary. However, this can lead to a loss of information and may not be appropriate if the I(2) variable is an important part of the research question.
  2. Including a quadratic time trend: Another approach is to include a quadratic time trend in the model to account for the non-stationarity. This involves adding a time-squared variable as a regressor in the model, which captures the quadratic trend in the I(2) variable. This approach can be effective if the quadratic trend is the only source of nonstationarity in the variable.
  3. Using a cointegration analysis: If the I(2) variable is related to other variables in the model, it may be appropriate to use cointegration analysis. Cointegration analysis involves testing for a long-term relationship between the I(2) variable and other stationary variables in the model. If cointegration is found, a vector error correction model (VECM) can be used to model the dynamic relationships between the variables.
  4. Using an I(2) model: Finally, if the I(2) variable is the main focus of the research question, it may be appropriate to use an I(2) model. This involves modeling the I(2) variable explicitly using methods such as the I(2) autoregressive distributed lag (ARDL) model. This approach can be effective if the I(2) variable is a key component of the research question and is not related to other variables in the model.
Overall, the choice of method will depend on the specific research question and the characteristics of the data. It may be useful to consult with a statistician or econometrician to determine the best approach for your particular study.
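As a small illustration of option 1, second-differencing within each panel unit in R (dplyr syntax; the column names id, year, y are hypothetical):

    # Minimal sketch: second-differencing an I(2) variable in panel data.
    library(dplyr)

    d2 <- d |>
      group_by(id) |>
      arrange(year, .by_group = TRUE) |>
      mutate(dy  = y - lag(y),        # first difference: removes I(1)
             d2y = dy - lag(dy)) |>   # second difference: removes I(2)
      ungroup()

    # d2y should then be re-tested with a panel unit-root test (e.g., IPS).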
  • asked a question related to Regression
Question
3 answers
Can I use a variable that was found to be correlated with the independent variable in the exploratory analysis to control for its effect in hierarchical regression, by entering it in the first step without any prior hypothesis related to it?
I think I learned to do that, but a reviewer pointed out that it is not right to use the variable as a control variable because I do not have any hypothesis related to it.
If using it as a control variable is possible, please point me to publications that used this method or references that can justify it.
Relevant answer
Answer
Using a variable as a control variable without any prior hypothesis related to it is generally not recommended. Control variables are typically included in a statistical model to account for potential confounding effects that could influence the relationship between the independent variable and the dependent variable.
When selecting control variables, it is important to have a theoretical basis or prior knowledge to support their inclusion in the model. This means that you should have a reason to believe that the control variable could have an impact on the dependent variable or on the relationship between the independent variable and the dependent variable.
If you include a variable as a control variable without any prior hypothesis related to it, you run the risk of introducing additional noise or bias into your analysis. This can lead to inaccurate or misleading results and can make it difficult to draw valid conclusions.
In summary, it is generally recommended to include control variables in a statistical model based on prior knowledge or theoretical reasoning, rather than including variables without a prior hypothesis related to them.
  • asked a question related to Regression
Question
4 answers
There are two independent variables (A & B) and two dependent variables (Y & Z). I want to see whether the impact of the independent variables varies across the dependent variables. I've used SEM, which shows the strength of each relationship: A significantly impacts Y and Z, while B significantly impacts Y but not Z. At the same time, the regression estimate of the impact of A on Z is smaller than the regression estimate of B on Z. The reviewer asks for a test of significant change, which I don't understand.
If the reviewer is asking whether there is a significant difference in the impacts, how do I test that? Kindly help, thanks
Relevant answer
Answer
You could formally test the equality of the regression slopes by specifying a model in which the slopes A --> Z and B --> Z are set equal. The fit of the model with equal slopes can then be compared to the fit of the original model with unconstrained (freely estimated) slopes. For example, you could conduct a chi-square difference test to compare the two models statistically. If the chi-square difference is significant, this means that the slopes differ significantly. However, the interpretation of this would most likely only make sense if A and B are in the same metric (same units of measurement).
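A hedged lavaan sketch of exactly this nested-model comparison, with hypothetical variable names in a data frame d:

    # Minimal sketch: chi-square difference test for equal slopes (lavaan).
    library(lavaan)

    m_free  <- ' Z ~ b1*A + b2*B '   # slopes estimated freely
    m_equal <- ' Z ~ b*A  + b*B  '   # slopes constrained to be equal

    fit_free  <- sem(m_free,  data = d)
    fit_equal <- sem(m_equal, data = d)

    # A significant chi-square difference => the two slopes differ.
    anova(fit_equal, fit_free)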
  • asked a question related to Regression
Question
4 answers
Is there a REFERENCE for selecting items of a construct based on the standardised regression weights, preferring to select the ones above 0.6, before doing confirmatory factor analysis and later SEM?
I have found a reference which said that if the weights were above .4 the item could be retained.
Relevant answer
Answer
I must say this sounds like a bad approach to me. I can't see how it could make sense to make a selection based on an arbitrary criterion/threshold about regression coefficients in multiple regression. CFA implies redundancy (high correlations) of the measures pertaining to a given factor. When you include multiple highly correlated indicators of the same factor in a multiple regression model, some will have lower regression weights simply because of the (desirable) redundancy (overlap) of the indicators.
As the name implies, CFA is a confirmatory type of analysis. The choice of measures for CFA and SEM should be a result of your theory about the construct(s), not prior empirical selection of variables based on regression or some other technique. So if there is a reference that recommends the above approach, I would be skeptical.
  • asked a question related to Regression
Question
7 answers
I'm doing a multiple linear regression where some of the independent variables are normally distributed whereas others aren't. The normal P-P plot of the regression residuals seems appropriate, as the points fall along the line. I have 84 participants in total; is that enough to go ahead with linear regression without the assumption of normality being met?
Relevant answer
Answer
The normality assumption in linear regression concerns the residuals (error terms), not the independent or dependent variables.
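A quick residual check in R, as a hedged sketch (fit stands for any lm object; variable names hypothetical):

    # Minimal sketch: assess normality of the residuals, not the predictors.
    fit <- lm(y ~ x1 + x2, data = d)

    qqnorm(resid(fit))        # points near the reference line => roughly normal
    qqline(resid(fit))
    shapiro.test(resid(fit))  # formal test; very sensitive in large samples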
  • asked a question related to Regression
Question
1 answer
Dear Colleagues,
QQ regression is perhaps one of the latest econometric estimation approaches. If you have expertise, could you please help me with useful information on how to perform QQ regression using R or Stata?
Relevant answer
Answer
Hello,
In my view, you can use the R package "quantreg".
You can use this command: rq(formula, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))
I hope the following website can help you! ↓
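Expanding that into a runnable example with quantreg's built-in engel data (note: if "QQ regression" refers to quantile-on-quantile regression, that method additionally conditions on quantiles of the regressor; rq() is the standard quantile-regression building block):

    # Minimal sketch: quantile regression at several quantiles.
    library(quantreg)

    data(engel)  # household food expenditure vs. income
    fit <- rq(foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95),
              data = engel)
    summary(fit)        # one coefficient vector per quantile

    plot(summary(fit))  # how the income effect varies across the quantiles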
  • asked a question related to Regression
Question
5 answers
I am currently engaged in a study that applies regression and ANOVA models to several latent variables, including entrepreneurial passion, risk attitude, and entrepreneurial self-efficacy.
In the context of this study, I am seeking a rigorous and straightforward method for determining the factor loadings and latent variable scores for each participant. I am particularly interested in going beyond the traditional methods of simply calculating average or sum scores for these latent variables.
I believe estimating these factors would be more precise using an approach similar to that employed in Covariance-Based Structural Equation Modeling (CB-SEM) models.
Could you provide guidance on how to implement this approach effectively? Would you recommend specific statistical techniques or software tools for this purpose?
How can I ensure the validity and reliability of the obtained factor loadings and latent variable scores? Any advice or resources you could share would be greatly appreciated.
Relevant answer
Answer
Thanks Giacomo!
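For later readers looking for the concrete mechanics: a fitted CFA in lavaan yields both the loadings and per-participant factor scores, as in this hedged sketch (the model syntax and indicator names are hypothetical):

    # Minimal sketch: CFA-based factor scores instead of sum scores.
    library(lavaan)

    model <- '
      passion  =~ p1 + p2 + p3
      efficacy =~ e1 + e2 + e3
    '
    fit <- cfa(model, data = d, std.lv = TRUE)

    inspect(fit, "std")$lambda    # standardized factor loadings
    scores <- lavPredict(fit)     # latent scores, one row per participant
    head(scores)

One caveat: carrying factor scores into a second-stage regression ignores their estimation error, so structural effects can be attenuated; full SEM estimates the structural paths in one step and avoids this.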
  • asked a question related to Regression
Question
3 answers
Suppose I have collected data on customer churn rates and various customer attributes. How can I use regression to predict the likelihood of customer churn based on these attributes?
Relevant answer
Answer
Hello Dr Kumar, what attributes do you have? Wealth (how do you define it?), shopping centre attributes (e.g., markets, upmarket shopping centres, average shopping centres), time of day (day/night shopping)? For instance, I know someone who would only shop for Xmas presents on Xmas eve. Are the attributes categorical or continuous? I would be interested to see the results. It is interesting why some women [even some teachers/health professionals] have excessive shopping disorders, accumulating vast quantities of clothes. Good luck. Bye.
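More directly to the question: churn is a binary outcome, so logistic regression is the standard tool. A hedged R sketch with hypothetical attribute and file names:

    # Minimal sketch: predicting churn probability with logistic regression.
    d <- read.csv("customers.csv")  # hypothetical file; churn coded 0/1

    fit <- glm(churn ~ tenure + monthly_spend + support_calls,
               family = binomial, data = d)
    summary(fit)

    exp(coef(fit))  # odds ratios: effect of each attribute on churn odds
    d$p_churn <- predict(fit, type = "response")  # predicted probabilities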
  • asked a question related to Regression
Question
9 answers
For context, the study I am running is a between-participants vignette experimental research design.
My variables include:
1 moderator variable: social dominance orientation (SDO)
1 IV: target (Muslim woman= 0, woman= 1) <-- these represent the vignette 'targets' and 2 experimental conditions which are dummy-coded on SPSS as written here)
1 DV: bystander helping intentions
I ran a moderation analysis with Hayes PROCESS macro plug-in on SPSS, using model 1.
As you can see in my moderation output (first image), I have a significant interaction effect. Am I correct in saying there is no direct interpretation of the b value for the interaction effect (hence we do simple slopes analyses)? So all it tells us is that SDO significantly moderates the relationship between the target and bystander helping intentions.
Moving onto the conditional effects output (second image) - I'm wondering which value tells us information about X (my dichotomous IV) in the interaction, and how a dichotomous variable should be interpreted?
So if there was a significant effect for high SDO per se...
How would the IV be interpreted?
" At high SDO levels, the vignette target ___ led to lesser bystander helping intentions; b = -.20,t (88) = -1.65, p = .04. "
(Note: even though my simple slope analyses showed no significant effect for high SDO, I want to be clear on how my IV should be interpreted as it is relevant for the discussion section of the lab report I am writing!)
Relevant answer
Answer
The significant t-test for the interaction term in your model shows that the slopes of the two lines differ significantly. But at the 3 values of X that are shown in your results (x=-.856, x=0, x=.856), fitted values on the two lines do not differ significantly.
I suspect your output is from Hayes' PROCESS macro, and that -.856 and .856 correspond to the mean ± one SD. Is that right?
Why does it matter if the fitted values on the two lines do not differ significantly at those particular values of X? Your main question is whether the slopes differ significantly, is it not?
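Outside PROCESS, one hedged way to obtain those conditional effects is to recenter the moderator so that the coefficient on the dummy IV becomes the simple effect at the chosen SDO level (variable names hypothetical):

    # Minimal sketch: simple effects of a dummy IV at +/- 1 SD of the moderator.
    d$sdo_hi <- d$sdo - (mean(d$sdo) + sd(d$sdo))    # zero now = high SDO
    fit_hi <- lm(helping ~ target * sdo_hi, data = d)
    summary(fit_hi)   # 'target' coefficient = effect of target at high SDO

    d$sdo_lo <- d$sdo - (mean(d$sdo) - sd(d$sdo))    # zero now = low SDO
    fit_lo <- lm(helping ~ target * sdo_lo, data = d)
    summary(fit_lo)   # effect of target at low SDO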
  • asked a question related to Regression
Question
4 answers
My sample is an environmental sample. Is there anybody who can help me with this?
Relevant answer
Answer
Hello Shahadat,
Have a look at this link, which walks you through the process using R:
Good luck with your work.
  • asked a question related to Regression
Question
4 answers
I am using SPSS to model count data with a Poisson distribution. My initial Poisson model and the default negative binomial model showed overdispersion and underdispersion, respectively. I am fitting a third model, a custom negative binomial with log link function, to estimate the dispersion more accurately. However, I get this message after running the model: "There are no valid cases for the log link function. Only the iteration history is displayed. Execution of this command stops". Is it okay for me to go with the better of the first two models, or is there a way around the problem with the third model (in which case, what should be done)?
Relevant answer
Answer
When running a custom negative binomial regression with an estimated dispersion parameter in SPSS, you may encounter iteration problems due to the complexity of the model. Here are some steps you can take to resolve these issues:
  1. Increase the maximum number of iterations: In SPSS, negative binomial models are fitted under Analyze > Generalized Linear Models. In that dialog, go to the Estimation tab and increase the maximum number of iterations allowed. This may help the model converge if it is having difficulty doing so.
  2. Change the estimation method: On the same Estimation tab, SPSS offers alternative parameter-estimation methods (e.g., Fisher scoring versus Newton-Raphson) and adjustable convergence criteria. If the default settings are not working well, try a different method or loosen the convergence criteria slightly.
  3. Check for multicollinearity: Multicollinearity can cause convergence problems in regression models. You can check for multicollinearity by examining the correlation matrix between the predictor variables. If there is high correlation between two or more variables, consider removing one of them from the model.
  4. Simplify the model: A complex model with many predictor variables can be difficult to fit. If the iteration problems persist, consider simplifying the model by removing some predictor variables or by using a simpler model specification.
  5. Consult a statistician: If you have tried the above steps and are still having difficulty with iteration problems, consider consulting a statistician who can help you with your model specification and implementation.
By following these steps, you can help to resolve iteration problems when running a custom negative binomial regression with an estimated dispersion parameter in SPSS.
  • asked a question related to Regression
Question
3 answers
Hi All,
I'm working on an artificial neural network model. I got the attached results, in which the regression value is 0.99072, which I think is good, but I am not sure why there is an accumulation of data around zero and one, as shown in the attached regression plot.
Any idea or explanation would be highly appreciated.
Relevant answer
Answer
Thank you very much Bipesh Subedi
  • asked a question related to Regression
Question
4 answers
In finding the correlation and regression of a multivariable distribution, what is the significance of R and R^2? What is the main relation between them?
Relevant answer
Answer
R represents the correlation coefficient between two variables in a multivariable distribution. It measures the strength and direction of the linear relationship between the two variables. R ranges from -1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a perfect positive correlation, and 0 indicates no correlation.
R^2, on the other hand, represents the coefficient of determination. It measures the proportion of variance in one variable that is predictable from the other variable(s) in a multivariable distribution. R^2 ranges from 0 to 1, where 0 indicates no variance in the dependent variable is explained by the independent variable(s), and 1 indicates that all the variance in the dependent variable is explained by the independent variable(s).
The main relationship between R and R^2 is that, in simple linear regression, R^2 is the square of the correlation coefficient between the two variables (so R is the square root of R^2, taking the sign of the slope). In multiple regression, R is the multiple correlation coefficient - the correlation between the observed dependent variable and the fitted values - and is therefore non-negative. In short, R^2 measures how well the regression fits the data (the proportion of variance explained), while R measures the strength (and, in the two-variable case, the direction) of the linear relationship.
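A two-line check of this relationship in R, on simulated data:

    # Minimal demo: in simple regression, R^2 is the squared correlation.
    set.seed(1)
    x <- rnorm(100); y <- 2 * x + rnorm(100)

    cor(x, y)^2                   # squared correlation coefficient
    summary(lm(y ~ x))$r.squared  # identical value from the regression fit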
  • asked a question related to Regression
Question
1 answer
I have data on sales and various marketing efforts, such as advertising spend, social media engagement, and promotional activities. How can I use regression to quantify the effectiveness of these marketing strategies on sales?
Relevant answer
Answer
It's as easy as can be in SPSS. First, enter your quantitative data into SPSS and identify your variables in Variable View. In Data View, compute composite variables from the measures of your marketing strategies, and a composite variable for sales.
Then click Analyze >> Regression >> Linear. Move your independent variables (the marketing strategies) into the independent variables box and your dependent variable (sales) into the dependent variable box, then click OK.
Your multiple regression output will show the relationship between your variables as well as the effect of each marketing strategy on the dependent variable - sales.
You can send me a direct message here for more clarity. I might have not been so detailed in this answer.
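For completeness, the same model fits in one line of R (column names hypothetical):

    # Minimal sketch: quantifying marketing effects on sales with OLS.
    fit <- lm(sales ~ ad_spend + social_engagement + promo_activity, data = d)

    summary(fit)  # each coefficient = expected change in sales per unit
                  # of that marketing input, holding the others constant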
  • asked a question related to Regression
Question
3 answers
What is the purpose of strategic regression, and what are the methods for using it?
Relevant answer
  • asked a question related to Regression
Question
2 answers
Hello everyone, I have only found the additive-interaction calculation method based on the logistic regression model on the internet. Can anyone provide the name of an R package or SAS code for calculating RERI, AP, SI and their 95% CIs based on a log-binomial regression model? Millions of thanks!
Relevant answer
Answer
Hi,
did you see this paper?
They refer to reference [9] to calculate CIs based on logreg for RERI, AP and S. Perhaps that could work for you?
This is reference [9] in the paper mentioned above:
Hosmer DW, Lemeshow S. Confidence interval estimation of interaction. Epidemiology. 1992;3:452–456.
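If no packaged routine turns up, the point estimates fall straight out of a log-binomial fit; a hedged sketch with hypothetical binary exposures x1 and x2 in a data frame d (CIs would still require the Hosmer-Lemeshow delta method from reference [9], or a bootstrap):

    # Minimal sketch: RERI, AP and S from a log-binomial model.
    fit <- glm(y ~ x1 * x2, family = binomial(link = "log"), data = d)
    b <- coef(fit)

    RR10 <- as.numeric(exp(b["x1"]))                          # x1 only
    RR01 <- as.numeric(exp(b["x2"]))                          # x2 only
    RR11 <- as.numeric(exp(b["x1"] + b["x2"] + b["x1:x2"]))   # both exposures

    RERI <- RR11 - RR10 - RR01 + 1           # relative excess risk due to interaction
    AP   <- RERI / RR11                      # attributable proportion
    S    <- (RR11 - 1) / (RR10 + RR01 - 2)   # synergy index
    c(RERI = RERI, AP = AP, S = S)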
  • asked a question related to Regression
Question
4 answers
Hi dear researchers
I am reading about regression analysis. Can anyone help by giving me a brief idea about when we use simple linear regression, and what the difference is between it and the correlation coefficient? Thanks in advance.
Relevant answer
Answer
A regression coefficient has the distinct advantage that it is an interpretable measure of effect. For example, you can use maternal height to predict baby weight and report the coefficient as the increase in expected baby weight associated with a one-centimetre increase in maternal height.
If you reported the correlation, there is no such real life interpretation.
  • asked a question related to Regression
Question
3 answers
Dear all,
We are establishing statistical equations to predict the lime requirement (LR) of tropical soils. I've found in the literature that, for prediction purposes, sometimes regression equations are used, while at other times models are employed. Can anyone give us the main difference between a statistical equation and a model? And which one is most suitable for prediction purposes? Thanks a lot.
Relevant answer
Answer
Semantically, "model" is a higher-level designation, and "mathematical equation" is one special form in which a model can be formulated.
A "statistical model" is essentially the definition of a random variable (RV) used to explain observed values (data). This way, the RV is a model of the data-generating process. Additionally, a statistical model may also include a mathematical equation describing (modelling!) the relationship between some features of the RV (typically the expectation of its distribution) and other (experimental) variables.
Markov models are a different possible approach, modelling the state of a system with an RV that changes through time (or across simulation cycles).
  • asked a question related to Regression
Question
3 answers
I have data related to households with a number of variables, the dependent variable being household consumption. I need to specify an OLS regression to identify the treatment effect of interest, but I do not have 'interest' as a variable within the data provided. How would I go about creating this variable and introducing it as a shock to the data?