
# Statistical Modeling - Science topic

Explore the latest questions and answers in Statistical Modeling, and find Statistical Modeling experts.

Questions related to Statistical Modeling

Hello, I am currently writing my bachelor's thesis and I am trying to investigate, based on BD2MS2 selectorate theory, whether the ratio of government expenditure to the winning coalition/selectorate value can provide an explanation as to why some leaders under economic sanctions are able to maintain office better than others. For this, I need a statistical model that will allow me to test the correlation between a dependent variable (has the leader remained in power regardless of the sanctions imposed against him) and an independent variable that varies over time (as I said before, gov't spending in relation to w/s for the duration of sanctions). I would also like to compare this correlation at least with the country's score on the polity scale and its gdp per capita, to see if using the selectorate theory provides any better results. Does anyone know what the best statistical model to do this would be? I must add that I am not the most versed in complicated statistical models but I am a fast learner, so any suggestion would help. Thank you so much!

Good day, everyone.
I am analyzing a moderated mediation model, and I need to examine the conditional indirect effects at various levels of the moderator.

**The PROCESS in SmartPLS 4 reports the conditional indirect effect, but only at the +1 and 0 values.**

I set my moderator in SmartPLS as an ordinal scale because I used a Likert scale for my survey.

**Can I get the indirect effect at the -1 value, and if so, where?**

Hello everyone,

I'm testing different inhibitors against Candida albicans. I'm measuring hourly OD600 to plot growth curves in order to compare delays in lag duration (or any other comparable parameter to measure growth inhibition). Which would be a good statistical model in GraphPad to calculate this? I've been looking at Logistic and Gompertz curves, but I haven't been able to get satisfactory analyses with these. I've also looked at AUC (area under curve), but not sure this is the best option. Thanks in advance for any suggestions!

Tatiana.
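
A minimal sketch of the AUC idea mentioned above, assuming hourly OD600 readings and the trapezoid rule. The readings and the names `trapezoid_auc`, `control` and `treated` are hypothetical and only illustrate the arithmetic; they are not data from this experiment.

```python
def trapezoid_auc(times, values):
    """Area under a curve by the trapezoid rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2
    return area

# Hypothetical hourly OD600 readings (not measured data).
hours = [0, 1, 2, 3, 4]
control = [0.05, 0.10, 0.30, 0.70, 1.00]
treated = [0.05, 0.06, 0.10, 0.30, 0.60]

auc_control = trapezoid_auc(hours, control)
auc_treated = trapezoid_auc(hours, treated)

# Relative growth: values below 1 mean less total growth than the control.
relative_growth = auc_treated / auc_control
print(round(auc_control, 3), round(auc_treated, 3), round(relative_growth, 3))
```

AUC collapses lag, rate and plateau into one number, so it will show that growth was reduced but not whether the cause was a longer lag or a slower rate.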

Several models have been used to estimate rice yields spatially, including empirical, semi-empirical, and process-based crop models. Empirical models, or correlative or statistical models, are typically used over larger spatial scales such as the country or regional scale.

Hello everyone,

I am currently undertaking a research project that aims to assess the effectiveness of an intervention program. However, I am encountering difficulties in locating suitable resources for my study.

Specifically, I am in search of papers and tutorials on **multivariate multigroup latent change modelling**. My research involves evaluating the impact of the intervention program in the absence of a control group, while also investigating the influence of pre-test scores on subsequent changes. Additionally, I am keen to explore how the scores differ across various demographic groups, such as age, gender, and knowledge level (all measured as categorical variables). Although I have come across several resources on univariate/bivariate latent change modelling with more than three time points, I have been unable to find papers that specifically address my requirements, namely studies focusing on two time points, multiple latent variables (n >= 3), and multiple indicators for each latent variable (n >= 2).

I would greatly appreciate your assistance and guidance in recommending any relevant papers, tutorials, or alternative resources that pertain to my research objectives.

Best,

V. P.

Hello, I am looking for introductory notes or literature on statistical modelling and/or spatial modelling of diseases. Who can recommend relevant literature? I am conversant with deterministic compartmental disease modelling using ODEs, and I feel I need to explore other methods of disease modelling.

Hi all,

I'm struggling in determining the most appropriate statistical model to use.

The experimental design is a single group pre-post design.

dependent variable=connectivity strength (functional MRI), dependent variable=anxiety total score, other time-invariant variables (age/sex).

In R, I built the following model: lmer(y ~ condition + age + sex + anxiety + (1|subj), data=dat, REML=TRUE), where y = connectivity strength, condition = pre/post and anxiety = anxiety total score.

Is the model appropriate?

Best regards

A.G.

How can probability theory and statistical modeling contribute to our understanding of phonological variation and probabilistic phonological processes?

I am interested in any good book with exercise sets for students for teaching a course in Statistical Modeling (linear models, regression, GLMs, etc.). The course will use calculus and linear algebra as well as applications using real data sets. Can anyone suggest a good book to be used as a text? The book should not be of the "... with R" type but should focus on concepts and understanding, with a reasonable level of mathematical rigor for 2nd/3rd year undergraduate students in Statistics/Mathematics.

I want to build a complete weather statistical model to predict the weather.

Where the model is able to predict all atmospheric layers.

And to be able to display the predicted results in the form of interactive weather maps.

I have understood the theoretical concepts of moderating and mediating variables. However, the research papers that have tested the roles of such variables use interaction terms for both moderation and mediation effects.

What is the difference between the statistical models for testing moderating and mediating roles?

I am working in the field of forest ecology by using statistical modelling. Please write me for any further clarification.

Thanks

I am currently doing my undergraduate study, and I am using UTAUT2 to evaluate factors that can affect adoption of a new application we made. I have a sample size of 30, all obtained using purposive sampling only. I want to use SEM by performing PLS, but it seems that I need a bigger sample size. What other statistical models can I use?

I am referring to models like REM, SC, REST and CTDS.

I can't find articles that describe the quality of multivariate statistical models according to RMSEC, RMSECV and RMSEP values. I only find descriptions based on R2 and RPD, as in the Williams 2003 article.

In my time series dataset, I have 1 dependent variable and 5 independent variables and I need to find the independent variable that affects the dependent variable the most (the independent variable that explains most variations in the dependent variable). Consider all 5 variables are economic factors.

*Dear scholars,* I want a **statistical model** to analyze my data on a **rare disease** (asymptomatic or submicroscopic malaria). I would like a consultation with experts in the field. **I am convinced that logistic regression is not suitable for my study; however, dozens of published articles have used it.** I want to look at it in a different way.

Hence, need your prompt responses.

Abdissa B.

Currently, data is available in forms of text, images, audio, video and other such forms.

We are able to use mathematical and statistical modeling to identify patterns and trends in data, which machine learning, a subfield of AI, can then use to perform different decision-making tasks. The data can be visualized in a variety of forms for different purposes.

Data Science is currently the ultimate state of Computing. For generating data we have hardware, software, algorithms, programming, and communication channels.

But, what could be next beyond this mere data creation and manipulation in Computing?

I have a dataset of 140 students who have sat an exam. The exam had a total of 30 marks, split equally into an A and B question. The A question was numerical and was made up of several calculations. The B question was an essay response to a problem.

The exam has been marked twice. The first time using a mark scheme, each of the calculations had a maximum possible number of marks as did the essay. Each sub question was marked with reference to the mark scheme before all of the scores were totalled. The second time was based purely on academic judgement, the same marker considered the whole exam and made a judgement based on the overall performance.

I've calculated the correlation (.847, p=.00003), and I'm keen to explore this further, but I'm drawing a blank on which statistical models could be used? Any suggestions?

The statistical model I used for the calculation of air-sea CO2 fluxes, gives me results of net air-sea CO2 flux in GtC yr-1.

I would like to convert GtC year-1 to mmol m-2 d-1
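
A minimal sketch of this conversion, assuming the GtC yr-1 value is an area-integrated flux spread uniformly over a surface area that you supply. The function name and the ocean area of roughly 3.6 × 10^14 m² in the example are assumptions, not values from the model above; substitute the area your flux actually applies to.

```python
MOLAR_MASS_C = 12.011   # grams of carbon per mole
GRAMS_PER_GT = 1e15     # 1 Gt = 1e9 tonnes = 1e15 g
DAYS_PER_YEAR = 365.0   # assumption: non-leap year

def gtc_per_year_to_mmol_m2_day(flux_gtc_yr, area_m2):
    """Convert an area-integrated flux (GtC/yr) to a mean flux density (mmol C m^-2 d^-1)."""
    mmol_per_year = flux_gtc_yr * GRAMS_PER_GT / MOLAR_MASS_C * 1000.0
    return mmol_per_year / area_m2 / DAYS_PER_YEAR

# Assumed surface area in m^2; replace with the area of your own domain.
OCEAN_AREA_M2 = 3.6e14

flux_density = gtc_per_year_to_mmol_m2_day(1.0, OCEAN_AREA_M2)
print(round(flux_density, 3))  # about 0.634 for 1 GtC/yr over the assumed area
```

The result scales linearly with the flux and inversely with the area, so the choice of area (global ocean, ice-free ocean, a regional basin) matters as much as the flux itself.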

Hello friend,

I have done my thesis in a simple lattice design (2 replications) for 2 years. I want to do a **combined** analysis of variance. How can I do it? Which statistical software? Which program?

Thanks

Having effect sizes from multiple time points after intervention, which is the best way to take it into account? Which statistic model can I use?

TAG is nothing but Thrust Area Group. The main objective of TAG - Learning Analytics is to work on the following objectives :

To provide helpful information to optimize or improve learning designs.

Track and evaluate student use of learning materials and tools to identify potential issues or gaps, and provide an objective evaluation of the materials and tools.

Identify patterns in historical data and apply statistical models and algorithms to identify relationships between various student academic behavioural data sets to forecast trends.

Exploring the relationship between computational thinking and academic performance.

Predictive Analytics: understanding the future of student learning ability

. Sustainable quality education

Anyone interested to join this TAG can register here.

I applied Ordinal Logistic Regression as my main statistical model, because my response variable is 7 Point-Likert Scale data.

After testing for goodness of fit using AIC, I got my best-fit model, which includes 4 independent variables (3 explanatory and 1 factor variable).

However, I encounter 1 negative coefficient value (0.44 odds) of 1 explanatory variable (all explanatory variables are also 7 point Likert-scale).

My theoretical assumption is simple: the more frequently the explanatory variables (engagement in activities) occur, the higher the score on the response variable (mutual understanding).

That's why I am confused when 1 independent variable has negative coefficient.

In this case, how should I interpret this IV?

Thank you very much,

I have previously conducted laboratory experiments on a photovoltaic panel under the influence of artificial soiling in order to be able to obtain the short circuit current and the open-circuit voltage data, which I analyzed later using statistical methods to draw a performance coefficient specific to this panel that expresses the percentage of the decrease in the power produced from the panel with the increase of accumulating dust. Are there any similar studies that relied on statistical analysis to measure this dust effect?

I hope I can find researchers interested in this line of research and that we can do joint work together!

**Article link:**

Hi

I'm working on a research for developing a nonlinear model (e.g. exponential, polynomial and...) between a dependent variable (Y) and 30 independent variables ( X1, X2, ... , X30).

As you know I need to choose the best variables that have most impacts on estimating (Y).

My question is: can I use the Pearson correlation coefficient matrix to choose the best variables?

I know that the Pearson correlation coefficient measures the linear correlation between two variables, but I want to use the variables for nonlinear modeling, and I don't know another way to choose my best variables.

I used PCA (Principal Component Analysis) to reduce my variables, but acceptable results were not obtained.

I used HeuristicLab software to develop Genetic Programming - based regression model and R to develop Support Vector Regression model as well.

Thanks

Greetings to all those interested and eager to help.

In short: During 12 breeding seasons, my colleagues and I researched the nesting of one bird species on an area of about 11,000 hectares. We spent the first two years looking exclusively for territories/nests. We recorded a total of 34 different territories/nests. For the next ten years, we had in mind to monitor the reproductive parameters (laying dates, number of eggs, number of offspring, etc.) for all 34 territorial pairs found. However, due to the vast study area, hard mountain relief, bad weather conditions and lack of time in general, we did not visit every territorial pair every year (we did it completely randomly). So, in some years, we followed the nesting parameters in only five territories, while in others, we managed to monitor up to 20 territories and collect reproductive data. For each year collected data table contained: year, number of controlled nests, number of nests with incubation, number of successful pairs, number of fledglings, productivity and nesting success. We defined productivity as the number of fledged juveniles divided by the number of successful nesting attempts. Nesting success was defined as the number of fledged juveniles divided by the total number of nesting attempts during one calendar year. My question is whether it is possible and in what way (statistical modelling, simple formula, etc.) to express the population trend for the entire monitored population with the help of partially collected reproductive data? So is it possible to project a ten-year overall population trend based on annual productivity and nesting performance data?

Thanks in advance for the comments, suggestions and literature.

Sincerely yours,

How do I develop a suitable model based on a given problem? Please give some real-world examples.

Hello,

I work with a knockout model infecting cells with cre-adenovirus to induce the deletion.

I am analyzing my qPCR results and I am not quite certain about the best statistical method to use, because I want to analyze the correlation of cre-adenoviral infection and animal genetic status (basically, I want to know how much of the gene expression is affected by cre-adenoviral infection by itself and how much is a result of the mutation).

To evaluate that I have four groups:

- Wild-type cells not infected

- Wild-type cells cre-infected (n=3 for wild-type cells)

- Transgenic cells not infected

- Transgenic cells cre-infected (n=6 for transgenic cells)

I have tried different statistical models, but I believe I am still lacking understanding about choosing a statistical method.

PS: the reason for sample size differences was the unexpected result of gene expression being significantly affected in wild-type cells infected with cre-adenovirus - when I proved that wild-type cells were also affected, I did not have the opportunity to gather more samples.

Thank you

Hi.

I have data in which the relationship between two parameters seems to fit a model that has two oblique asymptotes. Does anyone have any idea what type of function I should use? Please find attached a screenshot of the data. I appreciate any help.

Thanks.

I want to read more about the ordered probit model, but my searches are returning only application articles. Are there any good recommendations of texts that explain more of the theory behind it?

Can any member of RG help me in applying statistical models to explain the adsorption process, such as the monolayer adsorption model, the double-layer adsorption model and the multilayer adsorption model? If other important statistical models are used besides these, please share them with me. I read the article by Qun Li et al., published in Chemical Engineering Journal in Dec 2021, where these models are used, but I remain unable to apply them to my adsorption articles.

The title of the article is:

**"Effective adsorption of dyes on an activated carbon prepared from carboxymethyl cellulose: Experiments, characterization and advanced modelling"**

I was wondering whether anyone had any suggestions as to what statistic to use (I use R Studio).

I have some muscle cell circumference measurements (response) and two categorical explanatory variables, diet (Control/Diet) and exercise (Yes/No). I need the option to run an interaction between these explanatory variables (as I had 4 groups) as well as to test them separately. A Bartlett's test states that it is non-parametric data, so does anyone have any suggestions as to what statistic/model I could use? I have looked into GLMs but, as far as I can see, they don't work with interactions or non-binomial data.

Thanks in advance for any recommendations.

I am trying to estimate the strength of the relationship between a set of independent categorical variables (coded as binary variables; 1=yes; 0=no) and a continuous dependent variable. Which statistical model would suit here?

My database has some variables, such as income and age, that are described in classes ($0 to $2,000 or 20 to 29 years old, for example). However, the texts I have read more often than not use those variables as numbers, not classes. As I see it, using numbers allows for a more comprehensible analysis with most methods. Should I do it?

In the example mentioned, what should I test to convert $0 to $2,000 to $1,000?

If not, is there any other conversion possible?
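
A minimal sketch of the midpoint conversion described above, assuming every class has a finite lower and upper bound. The helper name `class_midpoint` and the second income class are hypothetical. Open-ended classes (for example "more than $10,000") have no midpoint and need a separate decision.

```python
def class_midpoint(lower, upper):
    """Midpoint of a closed class interval [lower, upper]."""
    return (lower + upper) / 2

# The first class comes from the question; the second is hypothetical.
income_classes = {
    "$0 to $2,000": (0, 2000),
    "$2,001 to $4,000": (2001, 4000),
}

income_midpoints = {
    label: class_midpoint(low, high)
    for label, (low, high) in income_classes.items()
}
print(income_midpoints)
```

Midpoint coding treats everyone in a class as identical, which discards within-class variation; keeping the variable as an ordered category is the alternative when that assumption is uncomfortable.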

The work entails tracing erosion in the last 100 years. It is testing the stability of the ecosystem as soil is deposited in various sites in the catchment. What statistical components do I need to look at and what is the most fit model that I can use so that I can analyse my data well?

This work is still at formative stages hence yet to be done. The purpose of the question is to find out from other experts in this area if and how statistics can be used to improve it.

I aim to allocate subjects to four different experimental groups by means of Permuted Block Randomization, in order to get equal group sizes.

This, according to Suresh (2011, J Hum Reprod Sci), can result in groups that are not comparable with respect to important covariates. In other words, there may be significant differences between treatments with respect to subject covariates, e.g. age, gender, education.

I want to achieve comparable groups with respect to these covariates. This is normally achieved with stratified randomization techniques, which seem to be a type of block randomization in which the blocks are not treatment groups but covariate categories, e.g. low income and high income.

Is a combination of both approaches possible and practically feasible? If there are, e.g. 5 experimental groups and 3 covariates, each with 3 different categories, randomization that aims to achieve groups balanced wrt covariates and equal in size might be complicated.

Is it possible to perform Permuted Block Randomization to treatments for each "covariate-group", e.g. for low income, and high income groups separately, in order to achieve this goal?
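
A minimal sketch of the combination asked about here: permuted blocks are generated separately inside each covariate stratum. It assumes the block size equals the number of treatments; the subject IDs, strata and function names are hypothetical.

```python
import random

def permuted_block_sequence(treatments, n_subjects, rng):
    """Build an assignment list from shuffled blocks; each block holds every treatment once."""
    sequence = []
    while len(sequence) < n_subjects:
        block = list(treatments)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

def stratified_block_randomization(subjects_by_stratum, treatments, seed=0):
    """Run permuted block randomization independently within each stratum."""
    rng = random.Random(seed)
    allocation = {}
    for stratum, subjects in subjects_by_stratum.items():
        sequence = permuted_block_sequence(treatments, len(subjects), rng)
        for subject, arm in zip(subjects, sequence):
            allocation[subject] = arm
    return allocation

# Hypothetical subjects: 8 per income stratum, 4 treatment arms.
strata = {
    "low income": [f"L{i}" for i in range(8)],
    "high income": [f"H{i}" for i in range(8)],
}
arms = ["A", "B", "C", "D"]
allocation = stratified_block_randomization(strata, arms, seed=42)
print(allocation)
```

With 3 covariates of 3 categories each there are 27 strata, and any stratum whose size is not a multiple of the block size ends on an incomplete block, so overall group sizes can drift apart when strata are small.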

Thanks in advance for answers and help.

I need some research papers on this topic. can somebody help me to pin some of the documents, please

i) What kind of objective may be achieved by applying Markov chain analysis?

ii) How should the data be arranged?

Dear Researchers,

We are aware that a shift in monsoon peak discharge may have an adverse impact on several water-based applications such as agriculture, dam operations, etc. E.g.

*I am interested to know how to quantify the same based on modeling approaches.*

**Thank you!**

Sincerely,

Aman Srivastava

Dear all,

I want to see whether a biomarker level at baseline can be used to predict the prognosis after a treatment alone as compared to a clinical parameter?

Which statistical model will be best to investigate it?

We have been conducting one continuous camera trap survey (2019 - present) with (n=40) camera traps set up across our study site. The objective is to determine prey availability in terms of demographic classes. However, since the majority of the prey species are not identifiable to individual, we are limited to unmarked species. Furthermore, we are open to the idea of using models that require population closure but will have to violate the assumption as we are purposefully comparing prey availability between breeding and non-breeding seasons.

Thank you in advance.

Many statistical tests require approximate normality (normal distribution should be seen approximately). On the other hand, normality tests such as Kolmogorov-Smirnov and Shapiro-Wilk are sensitive to the smallest departure from a normal distribution and are generally not suitable for large sample sizes. They can not show approximate normality (Source: Applied Linear Statistical Model). In this case, the Q-Q plot can show approximately normal.

Based on what is written in the book "Applied Linear Statistical Model", a severe departure from normality is only considered, in this case, parametric tests can no longer be used. But if severe departure is not seen, parametric tests can be used.

What methods do you know to detect **approximate normality** in addition to using a Q-Q plot?

In ASTM G102, the corrosion rate is calculated using current density based on Faraday's law.

Question 1) The amplitude of the current measured by the ACM sensor has a very large range. Should I take the average of the measured values?

Question 2) In most papers, statistical models are used for ACM sensor, why is ASTM G102 not being used?

This is my research problem so far:

In this scientific paper I will conduct an empirical investigation where the objective is to discover if the number of newly reported corona cases and deaths have been contributing towards the huge spike in volatility on the sp500 during the pandemic phase of the corona outbreak. This paper will try to answer the following questions: “Is there any evidence for significant correlation between stock market volatility on the SP500 and the newly reported number of corona cases and deaths in the US?”. “If there is significant evidence, can the surge in volatility mostly be explained by the national number of daily reported cases or was the mortality number the largest driver? “

So far I have constructed a time series object in RStudio containing the variables: VIX numbers, newly reported US corona cases and deaths. I have also converted my data into a stationary process and will later on test some assumptions. I have a total of 82 observations for each variable, stretching from 15. February to 15. June.

I do not have a lot of knowledge regarding all the different statistical models, or which ones are logical to use in my case. My first thought was to implement a GARCH or OLS regression, although I am not sure if this is a smart choice or not. Hence, I ask you for some advice.

Thank you in advance :)

Best regards, stressed out student!

I have 3 constructs. I have one dependent variable with 7 categories and 30 sub-categories. I have one independent variable with 5 categories and 20 subcategories and I also have one mediator variables with six categories and no sub-category. All these categories and sub-categories are assigned the value 0 or 1( dichotomous) Besides that I have 6 control variables, four are continuous one is dichotomous, one is scale..... Which is the appropriate statistical model and appropriate tests I can use for such type of data?

The RQs are: a) What is the role of reporting structure in firm's value

What is the role of stakeholders relationship in the relation of reporting structure and firm's performance?

Just to include examples IV = Performance Category1= Financial indicators Sub-category1= profit, sub-category 2= leverage, sub-category3= liquidity

Category2 = no-financial indicators sub-category1= environmental sub-category 2= economic sub-category 3= social.

DV=Governance Category 1= Board structure sub-category1= board size sub-category2= board profile sub-category3= board's experience

Category 2= Accountability Sub-category1= no discrimination sub-category2= fairness in reporting

I am interested to do the analysis up to category level. I can change the data from dichotomous(binary) to nominal. I am thinking to use averages or percentages to analyse the relationship between variables and constructs. Can I use Partial Least Square-SEM or any other parametric or non-parametric model?

Dear all,

I would like to know if it is possible to use SAOMs (Stochastic actor oriented models) to analyse weighted networks?

Thank you in advance,

Léa DAUPAGNE

I am looking for the best statistical model for my research, which concerns pregnant women. Also, what would be a suitable sampling technique?

I am working on a project, for that I want to make a statistical model. Please guide me for data and different operations used in this project.

We have very large data sets of populations across five years. We want to compare the proportion of people in different categories statistically, controlling for the differences in sample size. For example, if we have a total sample of 10,000 people in Year 1, and 17% are ages 18-24, how do we compare that 17% to the proportions of same age category in Year 2 with 15,000 total sample, Year 3 with 18,000 total sample etc.? I had assumed it would involve weighting but want to get expert opinions on approaches to this. Ultimately, the goal would be to see whether there are statistically significant differences in the proportions in each age category, controlling for differences in the total sample. Thank you in advance!
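
A minimal sketch of one pairwise comparison, assuming independent samples in each year. A proportion already accounts for its own sample size, and the standard error below carries the remaining dependence on n, so no extra weighting is applied here. The Year 1 counts follow the example above (17% of 10,000); the Year 2 count of 2,400 out of 15,000 is hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Year 1: 1,700 of 10,000 aged 18-24 (from the example above).
# Year 2: 2,400 of 15,000 (hypothetical count, for illustration only).
z, p_value = two_proportion_z_test(1700, 10000, 2400, 15000)
print(round(z, 2), round(p_value, 4))
```

For all five years and all age categories at once, a chi-square test of homogeneity on the year-by-category table of counts is the usual extension. With samples this large, even very small differences reach significance, so the size of the difference deserves as much attention as the p-value.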

Dear all,

I would like to know if it is currently possible to use temporal ERGMs (Exponential Random Graph Models) for analyzing weighted networks?

For now, it seems that software packages available to analyse TERGMs (tergm or btergm) only use binary networks.

Thanks in advance for your answer,

Léa Daupagne

I want to statistically model mode choice behavior (modes: bus, train, car, other modes). My independent variables are gender (male/female), car ownership (yes/no), age (continuous), household income (continuous), and travel time, travel distance and travel cost (continuous). Along with these, service quality parameters (comfort, reliability, safety, etc.), which are ordinal data collected from a questionnaire survey, are to be included in the model. Users were asked to rate the service quality parameters on an ordinal scale of 1-5.

My dependent variable is nominal and my independent variables include nominal, ordinal and continuous variables. In order to model mode choice, which statistical method should I choose: multinomial logistic regression or Ordinary Logistic Regression?

TIA

Hello everyone,

I am currently analyzing my dataset, I have obtained body weight data in insects for 7 days consecutively exposed to six different treatments in triplicates. I would like to analyze the effect of treatment over time, which model should I be using in SPSS?

Thank you so much in advance

Hello,

I am trying to setup a statistical model for a timecourse experiment. I have a total of 16 timepoints *7 before and 8 after treatment. I have 4 acclimation groups before treatment. At treatment, half the individuals from each group were treated with a protein inhibitor and all individuals are treated with a stress. Following treatment, I have 8 groups (inhibitor+stress, stress for each acclimation group). I have an unequal amount of measurements from each group at each time due to mortality and low quality data. This is not a repeated measures as each measurement is from a unique individual that was sacrificed. My data is non-normal as well possibly due to missing and low quality data. I read that I can use the average of each group to make up for the missing data points.

I have had great trouble trying to get each timepoint integrated in my model. I have tried analyzing by averaging all "before" and "after" timepoints for each group but it would be great to get results at higher resolution (point of the timecourse). I am using JMP but open to trying another program.

Any help you can provide here or point in a direction would be greatly appreciated!

Thank you!

*I forgot to add that I am missing data for a timepoint and some of the treatments do not have data for others.

I am currently building statistical models in Health care, education and Agriculture. and looking for someone who is willing to share primary data sets in these thematic areas. Support and or suggestions will be very much appreciated.

I have generated many models with different coefficients of determination (r²) and RMSE. In this analysis, I have seen that a model with the maximum r² does not necessarily have the minimum RMSE. It is very difficult for me to choose a model among them. I just want your suggestion. Which model should I choose? Should I go with r² or RMSE?

Thank you.
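
A minimal sketch of one way the two criteria can disagree, assuming r² is computed as the squared Pearson correlation between observed and predicted values. That version ignores systematic bias, while RMSE does not. When r² is instead computed as 1 - SSE/SST on the same observations, it ranks models in the same order as RMSE. All data and function names below are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / len(y_true))

def r2_from_sse(y_true, y_pred):
    """Coefficient of determination as 1 - SSE/SST."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - sse / sst

def squared_pearson(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

# Hypothetical observations and two hypothetical sets of predictions.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [1.5, 2.5, 3.5, 4.5, 5.5]  # perfectly correlated, but biased by +0.5
pred_b = [1.2, 1.8, 3.1, 4.1, 4.8]  # unbiased, with small scatter

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(squared_pearson(y, pred), 3),
          round(r2_from_sse(y, pred), 3), round(rmse(y, pred), 3))
```

Here model A has the higher squared correlation but the larger RMSE, so checking which definition of r² the software reports, and whether both metrics were computed on the same data, is a reasonable first step before choosing between them.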

Does anyone have an example to share on a statistical model that incorporates temporal and spatial autocorrelation terms simultaneously?

Examples in ecology and hydrology research would be optimal. Thanks in advance.

Hi,

For a research project, I am trying to compare Google Trends search volume series for two online retailers. However, as you might know, Google Trends returns a normalized search index value for a specific time period and does not reflect real search volumes. In my view, this creates an important problem when estimating statistical models. Do you also think that if a model includes more than one Google Trends variable, they should be weighted? Beyond that, the weighting itself could also be a problem.

I have behavioral data (feeding latency) which is the dependent variable. There are 4 populations from which the behavioral data is collected. So population becomes a random effect. I have various environmental parameters like dissolved oxygen, water velocity, temperature, fish diversity index, habitat complexity etc. as the independent variables (continuous). I want to see which of these variables or combination of variables will have significant effect on the behavior.

*Dear:*

*Within the statistical process, there is a model (Mahalanobis Distance) that is used to support the regression. When is the ratio good on this scale? And why do most Arab studies not rely on it or refer to it, while most foreign studies rely on it to support the results of linear and multiple regression? Is it possible to clarify it? What is the best reliable percentage, as high and low values appear in SPSS?*

While running multinomial logistic regression in SPSS, an error is displayed in the parameter estimates table. For the Wald statistic, some item values are missing because of a zero standard error, and a message is displayed below the table: "Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing". Does anyone know how to resolve this error?

Hi all,

My paper is looking at the most significant determinants of U.K. economic growth. I will look at macroeconomic, technological, human capital, socio-geographical and governance variables against the dependent variable of GDP growth. Could anyone who has done research into growth theories suggest suitable statistical models that can be used in Stata? I am currently looking at Bayesian Model Averaging but am not sure that is the best model out there.

Is statistical modeling based on the parameters that affect concrete's characteristics a viable way to predict concrete properties?

Which error families are allowed in generalized least squares (gls) models? Can I have, for example, a binomial glm and define a covariance structure (which, I guess, makes it a gls) in it (see below example)?

model <- glmmTMB(response ~ ar1(predictor + 0 | group), data = data, family = binomial(link = logit))

And also, should I call a glm with a covariance structure a gls?

When I have to choose my identification variable at level 1 and level 2, I cannot select it because it is grayed out. Do you know why, and how to fix it?

Hi all,

I have run a three-factor mixed-design experiment with one between-subjects factor (biological sex: two levels) and two within-subjects factors (having four levels each). I have measured several continuous response variables, each of which I have already analysed with a standard ANOVA. I have also collected the values of a nominal (non-ordinal) categorical response variable. This response variable is non-binary (it takes one out of five possible values).

The question is: How to approach the statistical analysis of a three-factor mixed-design experiment with a non-binary non-ordinal categorical response variable?

In particular, I would like to be able to analyse main effects and interactions as in a standard ANOVA.

Any reference to a related R package would be more than welcome.

Francesco

Accurate forecasts are not possible with the ARIMA model, even though it is generally used for forecasting.

In an exploratory study, if I want to state that certain components of counselling (7 items to assess), environmental modification (8 items) and therapeutic interventions (8 items) result in the practice of social case work, what analysis should I do?

NB: we have no items to assess practice of social work. Instead we want to state the practice of the other three components results in practice of social case work.

I have been recently studying proprietary voter files and data. While I know that voter files are for the most part public (paid or free), I am confused as to how companies match this data to other data.

For example, the (public) voter files never reveal who you voted for, your behavioral attributes, and so on. So how do companies that sell this "enhanced" data match every "John Smith" to other data? How can they say that they have a profile on every voter? Wouldn't that require huge data collection? Or are there models that simply do that job for them?
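One general technique for this kind of matching is record linkage. The sketch below shows a deterministic version, assuming both sources carry a name, a birth date and a ZIP code. The field names and records are hypothetical, and nothing here claims that any particular vendor works this way.

```python
import re

def link_key(record):
    """Build a normalized match key from name, birth date and 5-digit ZIP."""
    name = re.sub(r"[^a-z]", "", record["name"].lower())
    return (name, record["birth_date"], record["zip"][:5])

def link_records(voter_file, other_source):
    """Pair each voter with a record from another source when the key is unambiguous."""
    index = {}
    for rec in other_source:
        index.setdefault(link_key(rec), []).append(rec)
    matches = []
    for voter in voter_file:
        candidates = index.get(link_key(voter), [])
        # Accept only one-to-one hits; ambiguous or missing keys stay unmatched
        if len(candidates) == 1:
            matches.append((voter, candidates[0]))
    return matches

# Hypothetical records (illustration only)
voters = [
    {"name": "John Smith", "birth_date": "1970-01-02", "zip": "12345"},
    {"name": "John Smith", "birth_date": "1985-06-30", "zip": "12345-6789"},
]
consumer_data = [
    {"name": "JOHN SMITH", "birth_date": "1985-06-30", "zip": "12345", "segment": "x"},
]
linked = link_records(voters, consumer_data)
```

Here the two "John Smith" entries are told apart by birth date, so only the second voter is linked. Attributes that are not observed for a voter would still have to be estimated by a separate model rather than matched.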

In the November issue of the *Journal of Conflict Resolution* (**https://tinyurl.com/tyhrn2j**), Adrian Lucardi and I debate with Bunce, Wolchik, Hale, Houle, Kayser, and Weyland about whether democracy protests diffuse. We find, across thousands of statistical models, that in general they did not between 1989 and 2000. BW and W suggest that they might in very unusual circumstances. What are your thoughts? In what situations do you think they might spread? Why, despite all the protests occurring today in close succession, is no one talking about protest diffusion?

I am using Minitab to fit a model to experimental data. I go to Stat > DOE > Response Surface and choose the regressors in uncoded units, together with their lower and upper values and the responses. As far as I know, unusual observations and points with large residuals must be excluded from the data entirely, and RSM must then be performed again on the new data. However, right after I remove these points and rerun RSM, the results show new unusual observations and large residuals, and I remove them again. The same thing keeps happening, which leads to an undesirable reduction in the data. So, what should be done with unusual observations and large residuals?

If I have two parameters, A and B, is it always better to include A*A*B and A*B*B, or is A*B alone enough? In general, what is the basis for the choice, and does it depend on the number of factors, particularly in Minitab? In response surface methodology, for example, the software itself defines the terms, but in Regression > Fit Model it is possible to include additional terms.

I am looking for statistical models (or results of relevant numerical simulations) to estimate the number of different types of contacts between particles in a powder mixture. The simplest case would be a mixture of particles of type A and B, both spherical and of the same uniform size.

The goal would be to calculate the number of A-A, B-B and A-B contacts per unit volume.

More complex cases would include size and shape distributions (which may or may not differ between the components) and mixtures with more than two components.

Any relevant references are highly appreciated! Thanks in advance.
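For the simplest case described above (two components, equal-size spheres), a baseline can be sketched under an ideal random-mixing assumption: every contact joins two particles drawn independently according to their number fractions. The mean coordination number is an assumed input (the default of 6 is only a placeholder), and the function name is hypothetical. Real packings with segregation, size or shape differences would deviate from this.

```python
def contact_counts(number_density, frac_a, mean_coordination=6.0):
    """Estimate A-A, A-B and B-B contacts per unit volume for a random binary mixture.

    number_density    : particles per unit volume (both types together)
    frac_a            : number fraction of type A (equals volume fraction for equal sizes)
    mean_coordination : ASSUMED average number of contacts per particle
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie between 0 and 1")
    frac_b = 1.0 - frac_a
    # Each contact is shared by two particles, hence the division by 2
    total = number_density * mean_coordination / 2.0
    return {
        "AA": total * frac_a ** 2,
        "AB": total * 2.0 * frac_a * frac_b,
        "BB": total * frac_b ** 2,
    }

# Example: 1000 particles per unit volume, 25 % of them type A
counts = contact_counts(1000, 0.25)
```

The three counts always add up to the total number of contacts, so comparing measured or simulated counts with these values gives a simple check on how far a mixture is from random.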