# Quantitative Data Analysis - Science topic

Questions related to Quantitative Data Analysis

I am reviewing a paper for a statistics class. In it, the authors combine, for example, the Cohen Perceived Stress Scale (0-24) with the Epworth Sleepiness Scale (14-70) to give a final scale (14-94) in order to assess general perceived stress. I wanted to know if these types of scales can be combined in this way.

I suppose it is not a good idea, since they don't even have the same min/max values, meaning each point carries more weight on the first scale (with only 24 max points) than on the Epworth scale (70 max points). If combined, someone who scored 24/24 and 20/70 would receive the same total as someone who scored 0/24 and 44/70.
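The arithmetic in this question can be checked with a short sketch. It assumes the scale ranges exactly as quoted above (0-24 and 14-70); the helper names `rescale`, `raw_sum` and `rescaled_mean` are made up for illustration. It only shows how a raw sum and a range-rescaled composite treat the two example respondents differently. Whether the two constructs should be combined at all is a separate, conceptual question that code cannot answer.

```python
# Ranges as quoted in the question (assumed correct for this illustration).
STRESS_RANGE = (0, 24)
SLEEP_RANGE = (14, 70)


def rescale(score, lo, hi):
    """Map a raw score onto 0-1 given the scale's minimum and maximum."""
    return (score - lo) / (hi - lo)


def raw_sum(stress, sleep):
    """Composite used in the reviewed paper: a plain sum of raw scores."""
    return stress + sleep


def rescaled_mean(stress, sleep):
    """Alternative composite: average of the two scores after 0-1 rescaling."""
    return (rescale(stress, *STRESS_RANGE) + rescale(sleep, *SLEEP_RANGE)) / 2


person_a = (24, 20)  # maximum stress score, low sleepiness score
person_b = (0, 44)   # minimum stress score, mid-range sleepiness score

print(raw_sum(*person_a), raw_sum(*person_b))  # identical raw totals
print(round(rescaled_mean(*person_a), 3), round(rescaled_mean(*person_b), 3))
```

After rescaling, the two respondents no longer receive the same composite, which is the weighting concern raised in the question.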

Hello,

Thank you for reading my question; I will appreciate any help. I am doing my master's dissertation, in which I am investigating gender representation in textbooks. My method is content analysis, and I will collect data that include the genders of poets, scientists, authors, leaders and so on. I will also look at the appearance of each gender in various areas. The data will be gathered from texts and pictures, and after analysing it I will present it in charts and diagrams.

My question is: is this process considered qualitative, quantitative or mixed methods? And is the data qualitative or quantitative?

Thank you

Fatema

Hi Everyone

Can anyone provide references of a thesis or article where systematic literature review is used to identify gaps followed by a quantitative data analysis method to fill the gap?

I am currently attending a PhD program in innovative Urban Leadership, and I am interested in how church leadership has responded to the pandemic of the past two years. I want to learn how most leaders responded to the scenario created by the COVID-19 pandemic. What innovative leadership techniques have church pastors employed, and how were those principles and values of innovative leadership applied to address the dire situation of churchgoers? What models have been employed to address the holistic needs of church members? How did church leadership overcome their vulnerability in this dire situation, as they were at the forefront of fighting the pandemic? What have most institutions learned, and how can we capture those lessons and use them in the future when similar incidents happen?

I have six kinds of compounds, which I tested for antioxidant activity using the DPPH assay and for anticancer activity on five types of cell lines, so I have two groups of data:

1. Antioxidant activity data

2. Anticancer activity (5 types of cancer cell line)

Each data set consisted of 3 replications. Which correlation test is the most appropriate to determine whether there is a relationship between the two activities?

I am particularly interested in resources on, or the best ways to learn, the interpretation of datasets and numeric evidence and of correlations between variables. Thank you!

Dear researchers,

I have estimated internal consistency for a questionnaire with 50 items (five-point Likert scale) using Cronbach's alpha. Alpha was .98.

What is your interpretation?

What can the cause be from your point of view?

Thanks for your help.

I have 10 variables (1 DV, 8 IVs and 1 moderator) in my model and a sample size of 267 observations.

I am a newbie in AMOS and want to test the interaction between the variables of this model in AMOS. The measurement model estimates are quite good. The results of the Hayes PROCESS model in SPSS for moderation are also fine. However, when I run the Model Fit Measures plugin or calculate estimates for the interaction, the error "RMSEA is not defined" is shown. May I know the reason and the solution, please?

Note: The standardized values of IVs and moderator have been used for interaction.

Hi everyone,

I am having trouble measuring correlation in my dissertation study.

In two questions, I have asked my respondents "How many times have you been to ..." and "How familiar are you with the music at ...".

I am trying to establish whether those who have frequented a store most often are the same people who expressed high familiarity with the music. They are the same group of respondents. Will a one-sample t-test in SPSS be sufficient to test this relationship?

Hope to get all of your expertise on this!!

What programs can be used for structural modeling?

I have used AMOS previously, but the trial period has run out, so are there any other programs available that are free of charge?

Thanks

How can I validate a questionnaire for hospitals' senior managers?

Hello everyone

-I performed a systematic review for the strategic KPIs that are most used and important worldwide.

-Then, I developed a questionnaire in which I asked the senior managers at 15 hospitals to rate these items based on their importance and their performance at that hospital on a scale of 0-10 (Quantitative data).

-The sample size is 30 because the population is small (however, it is an important one to my research).

-How can I perform construct validation for the items (46 items), especially since EFA and CFA will not be suitable for such a small sample?

-These 45 items can be classified into 6 components based on literature (such as the financial, the managerial, the customer, etc..)

-Bootstrapping in validation was not recommended.

-I found a good article with a close idea but they only performed face and content validity:

Ravaghi H, Heidarpour P, Mohseni M, Rafiei S. Senior managers’ viewpoints toward challenges of implementing clinical governance: a national study in Iran. International Journal of Health Policy and Management 2013; 1: 295–299.

-Do you recommend using EFA for each component separately (each containing around 5-9 items), treating each as a separate scale and defining its sub-components? I tried this option and it gave good results and sample adequacy, but I am not sure if this is acceptable. If you can think of other options, I will be thankful if you can enlighten me.

Hi everyone,

I am trying to run statistical analysis on two datasets: two researchers have each independently measured a quantitative, continuous variable (i.e. wound closure rate) from the same population. What statistical tests would be most appropriate to test the **variance** between the two researchers' measurements, and to determine if it is okay to combine the two datasets?

What do you think of the following method:

1. Testing both datasets for normality

2. Testing for a significant difference between the **means** of the two datasets (Student's t-test for parametric data, Mann-Whitney U test for non-parametric data)

3. Testing for a significant difference between the **variance** of the two datasets (Kolmogorov-Smirnov test for parametric data, F-test for equality of 2 variances for non-parametric data)

4. If means and variance are not significantly different, the two datasets can be combined

Any comments and suggestions would be much appreciated!

GLM is an advanced quantitative data analysis technique, but I could not find enough materials for SPSS. Please suggest some good resources for teaching and learning GLM using SPSS.

Hi,

I am doing my dissertation on the effects of complaints on sonographers in obstetric ultrasound.

I am doing a survey as a mixed methods design, specifically a convergent design (questionnaire/data validation variant). I was advised to use descriptive statistics only for the quantitative data analysis. I cannot find any justification for this. Is this acceptable? Creswell seems to suggest I should be using inferential statistics as well.

I know it's standard for surveys to be used for quantitative data only, but I have done a lot of work on the justification for using one in a mixed methods study.

Also is thematic analysis standard in this type of study for the qualitative data analysis?

Many thanks

Gina

Hello everyone,

I am struggling with quantitative data analysis as I have not done it before. My questionnaire is about electric vehicles and I am looking at key factors that influence people's willingness to adopt electric vehicles. So, consumer behaviour in short.

I use SurveyMonkey to collect data, so it is easy to extract data into Excel: figures, percentages, key words etc. What I need your help with is:

1- Which method should I use to analyse? (best if it is the easiest one :)

2- Can I indeed just analyse by myself? This might be silly, but SurveyMonkey gives everything I need: percentages, key words etc. Can I, for instance, give a general overview of percentages, responses etc., and then start analysing and discussing question by question?

I am super lost and any help is appreciated! :)

Many thanks

Hello,

In relation to another recent, as yet unanswered, post (https://www.researchgate.net/post/How_to_calculate_comparable_effect_sizes_from_both_t-test_and_ANOVA_statistics), I am wondering how I can calculate the sample variance of an effect size, in order for me to then infer the confidence intervals.

So far, I have been calculating 'Cohen's d' effect sizes from studies' experiments, by using the t-value of two-sample t-tests and the sample size (n) for each group. I then convert the Cohen's d into an unbiased 'Hedges' g' effect size.

I understand that, normally, in order to calculate sample variance, I would also need to know the means of the two groups and their standard deviation. However, these are not reported in most studies when a t-test is calculated. Is there any way I can calculate the sample variance of my Hedges' g effect sizes without this information?

Many thanks,

Alex
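For the question above about the variance of Hedges' g, one widely used large-sample approximation needs only the t-value and the two group sizes. The sketch below assumes an independent two-sample t-test with a pooled standard deviation. The function name `hedges_g_from_t` and the example numbers are made up for illustration, so treat it as a starting point rather than a definitive meta-analytic procedure.

```python
import math


def hedges_g_from_t(t, n1, n2):
    """Hedges' g, its approximate variance and a 95% CI from t and group sizes.

    Assumes an independent two-sample t-test with pooled SD.
    """
    df = n1 + n2 - 2
    d = t * math.sqrt(1 / n1 + 1 / n2)           # Cohen's d recovered from t
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction factor
    g = j * d
    # Large-sample approximation to the sampling variance of d, then of g.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    var_g = j ** 2 * var_d
    se_g = math.sqrt(var_g)
    ci = (g - 1.96 * se_g, g + 1.96 * se_g)      # normal-approximation 95% CI
    return g, var_g, ci


# Hypothetical study: t = 2.5 with 10 animals per group.
g, var_g, ci = hedges_g_from_t(2.5, 10, 10)
print(round(g, 3), round(var_g, 3), tuple(round(x, 3) for x in ci))
```

The group means and standard deviations are not needed here because, under these assumptions, t and the group sizes already determine d.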

Dear all,

I am conducting my dissertation which has a quantitative data analysis process and, as I have never done it before, I need your help to understand the needed steps.

The basic model will require three separate regressions to confirm the relationship between constructs. The data has been collected with a Likert-style questionnaire where multiple questions measure each variable.

Before I start running the regressions, which steps should I implement?

I believe I would need to check reliability. Do I test Cronbach's alpha for each question or for the variables altogether?

Similarly for the regression, do I take the Likert average of each variable, or do I test each question separately?

Thank you for your help! If anyone would also be available for a video call it would be of amazing support.

Elena
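As a rough illustration for the reliability question above: Cronbach's alpha is defined over a set of items, so it is usually computed once per construct, across the questions that measure that construct, rather than for a single question. The sketch below uses only the Python standard library. The responses are invented and the helper names `cronbach_alpha` and `composite_scores` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one construct.

    items: list of item columns; each column holds one response per respondent.
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # per-respondent sum
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def composite_scores(items):
    """Per-respondent mean across the items of one construct."""
    k = len(items)
    return [sum(responses) / k for responses in zip(*items)]


# Invented 5-point Likert responses: 3 items, 5 respondents.
trust_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]

alpha = cronbach_alpha(trust_items)
scores = composite_scores(trust_items)
print(round(alpha, 3), [round(s, 2) for s in scores])
```

If alpha is acceptable for a construct, a common choice is to use the per-respondent composite as the variable in the regression, instead of entering each question separately.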

I have applied the Yoon-Nelson kinetic model to the experimental data obtained for CO₂ adsorption on solid adsorbents. For the two materials studied, the experimental values of adsorption capacity were found to be 5.9% and 2.3% higher than the empirical values. However, the R² values were 0.998 and 0.985 respectively. Now, a journal's reviewer is expressing disagreement with the values, pointing out that the experimental values are higher than the empirical values. I feel that the high R² values and the closeness of the experimental and empirical values are sufficient to show that the model fits well, as it is only empirical, not theoretical. Please comment on this.

Dear Scientists

During a study of a corrosion inhibitor's performance, while doing ***Tafel fitting***, a statistical value called ***"Chi Squared"*** appears, with a wide range of values from very low (0.0001) to tens. Could you please advise how I can decide, based on the ***"Chi Squared"*** value, whether my ***Tafel fit*** is correct or not? I am using a GAMRY potentiostat.

Thank you in advance

Aasem

1. In a research study I'm doing using SEM, there are six paths which are theoretically well established in the literature, and three paths (from the exogenous construct to three mediating constructs) for which the literature isn't strong. Should I still use PLS-SEM or not?

2. Using two different techniques for determining the sample size for PLS-SEM, I ended up with 146 and 147 cases, which are both larger than the population (140). This seems a little odd to me.

The two methods are the Hair et al. (2013) table of sample size recommendations and the gamma-exponential method by Kock and Hadaya (2016).

If PLS-SEM is the right approach, should I continue by studying the whole population (140)?

Hi, I'm writing a research paper for a theoretical experiment.

It's a repeated measures design. The estimated sample size is 385, consisting of 3 to 12-year-old participants who would be exposed to 4 different intervention methods and then complete a Wong-Baker faces pain scale (children under 8) or a Likert scale (children 8 and above).

What quantitative data analysis would be appropriate?

Hello everyone,

As a part of my mixed-methods thesis, I ran a small survey about neurologists' practice on breaking bad news. I'm relatively inexperienced with quantitative studies, so, although I have managed to derive all the relevant descriptive statistics, I was wondering if I could perform any additional analyses on such a small sample?

I have tried to consult Google and several books but got a bit confused. Do I have to test normality and then maybe perform non-parametric tests? As an example, I would like to look at whether their age, experience, perceived difficulty of breaking bad news etc. are associated with their attitudes and practice.

Thank you!

I have collected data using questionnaires from athletes at two time points. I have an independent variable (leadership) and a few dependent variables, but also some mediating or indirect variables such as trust etc. All of the variables were measured using the same questionnaire at two time points.

What data analysis method would be best to analyse the data? So far I have used the PROCESS macro in SPSS, which uses OLS regressions, but I am unsure if this is the best method.

I essentially want to see how the IV relates to/increases the dependent variables over time, and whether this change occurred directly from the IV and indirectly through the mediators.

Would these be appropriate research questions for the type of data I have and for the appropriate analysis technique?

Hi everyone!

In my thesis, I use partner gender (woman vs. man) as a between-subjects independent variable and priming figure (partner vs. acquaintance) as a within-subjects independent variable. The dependent variable is negative affect recovery score (range: -5 to +5).

So my study design is a 2x2 mixed design.

Also, I gave all participants the LGB Identity Scale and asked about their relationship duration prior to the experiment. I planned to use the LGB Identity Scale (1-7 Likert type) and Relationship Duration (in months) as control variables.

The problem is that the two control variables are continuous, and what they measure is completely different from what I measure in the experiment. Therefore, I cannot imagine how these covariates can remove any effect from the dependent variable.

Another question is which analysis to use.

Actually, I need to understand the logic of any analysis method in order to interpret these variables.

Thank you!

(*I use Tabachnick and Fidell's book, so you can also refer me to a page in that book.)

I would like to know how to interpret the result when the interaction term is significant while both of the main effects are not.

For instance, I would like to understand the direct effect of A and the moderating effect of B on dependent variable D. The results show that the direct effects of A and B are insignificant. However, the interaction term of A and B is significant.

1. In this case, should I interpret that there is no direct effect of either A or B, yet B moderates the effect of A?

2. Is it okay to plot an interaction chart while there are no direct effects available?

Thank you very much!
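One way to read the situation described above is through simple slopes. In a regression of the form D = b0 + b_A·A + b_B·B + b_AB·A·B, the coefficient b_A is the effect of A when B equals zero, so the effect of A at any other level of B is b_A + b_AB·B. The sketch below uses invented coefficients, and `simple_slope` is a hypothetical helper. It is meant only to show why a near-zero b_A can coexist with sizeable effects of A at low and high values of B.

```python
def simple_slope(b_a, b_ab, b_level):
    """Conditional effect of A on D at a given level of moderator B."""
    return b_a + b_ab * b_level


# Invented coefficients: A and B alone are near zero, the interaction is not.
coef = {"A": 0.02, "B": -0.01, "AxB": 0.40}

# Evaluate the effect of A at low, average and high B
# (assuming B is standardized, so -1 and +1 are one SD from the mean).
slopes = {level: simple_slope(coef["A"], coef["AxB"], level)
          for level in (-1.0, 0.0, 1.0)}

for level, slope in slopes.items():
    print(f"B = {level:+.1f}: effect of A = {slope:+.2f}")
```

Under this reading, plotting the interaction is still informative, because the plot shows exactly these conditional slopes.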

Hello,

I am currently undertaking a meta-analysis related to the efficacy of previous studies to harness pharmacological agents to ameliorate associative fear memories in rodents. The concept of effect sizes is fundamental to meta-analyses, but I am not so familiar with it. As a result, I have a (perhaps) elementary question related to the calculation of effect sizes:

I have decided that 'Hedges' g' will be the most appropriate metric of effect size for my analysis, given the relatively small sample sizes of my included studies. Often, authors will report an observation-of-interest (i.e. treatment vs. control effect) in the form of a (two-sample) t-test. I have found that the reported values of a t-test can easily be used to calculate a Hedges' g effect size - all good so far.

However, often an observation-of-interest is reported as an ANOVA (e.g. when the treatment group is compared with more than one control type), and other meta-analyses I have read seem to derive their effect sizes from either t-test or ANOVA statistics, depending on which is carried out. Again, I think I have found the correct equation to derive a Hedges' g effect size from the reported F-value of an ANOVA. However, I am struggling to understand how effect sizes can be comparable when they are derived from both t-test and ANOVA statistics. Surely ANOVA-derived effect sizes are less informative because you cannot tease out the individual contrasts-of-interest from an F value, as you obviously can for a two-sample t-test.

As a result, my initial instinct would be to request the raw data from the author (when ANOVAs have been reported), calculate a t-test for the specific contrast-of-interest (i.e. treatment vs. one particular control group) and calculate the Hedges' g effect size from the resulting t-test value.

Apologies for the long question, but I would hugely appreciate someone to enlighten me, and show me how to conflate t-tests and ANOVAs when generating effect sizes for a meta-analysis.

Many thanks!

Alex Nagle

So I'm going crazy trying to analyze three online newspapers. My paper has a very small scope, so I decided to study only 10 articles per newspaper. The reason I want to analyze so few is that I'm also going to take into consideration the comments under the articles, to figure out what kind of reactions they generate directly. The problems here concern reliability and validity, which I struggle to demonstrate. For example, can I just say that the selection is random? I have selected specific dates, from April to June 2014, and keywords such as "EP elections", "immigration", "economic crisis" and "austerity". Another problem is that I couldn't find any other paper anywhere that dealt with the same type of data. If anyone could provide some sources or help out in any way, it would be greatly appreciated. Thanks in advance

Hello everyone,

I am trying to statistically analyze whether data from 3 thermometers differ significantly. At the moment, because of COVID-19, several control points have been set up at the company I work for. We have been using infrared thermometers to check people and see whether they have a fever. However, we don't own a control thermometer with which we could easily calibrate our equipment. We thought that a statistical test would be helpful, but at this point we are lost.

Normally, we would compare our data to our control thermometer and that would be it. Our other thermometers are allowed a maximum difference of +-1°C when compared to their controls; we can't do that now.

What I have been doing is collecting 5 to 10 measurements from each thermometer, comparing them through an ANOVA test, and then assessing the results (when needed) by running Fisher's Least Significant Difference test.

I don't know if it is right to do so, because sometimes the data I collect do not seem to vary a lot (the mean difference is NEVER greater than +-1°C), and even so the test concludes that they differ significantly.

What would be right here? We don't want to work with the wrong kind of equipment or put away working thermometers without a solid reason; we just want to do what's best for our people.

Could you guys please help me?
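One possible reading of the thermometer result above is that ANOVA answers "are the means exactly equal?", whereas the stated requirement is "do the instruments agree within +-1°C?". With very consistent readings, tiny differences can be statistically significant and still sit well inside the tolerance. The sketch below uses invented readings and a hypothetical helper, `pairwise_mean_differences`. It simply compares pairwise mean differences with the tolerance and is not a substitute for a formal equivalence test or for calibration against a reference instrument.

```python
from itertools import combinations
from statistics import mean

TOLERANCE_C = 1.0  # allowed difference between instruments, from the question

# Invented readings of the same target, five per thermometer.
readings = {
    "T1": [36.4, 36.5, 36.4, 36.5, 36.4],
    "T2": [36.6, 36.7, 36.6, 36.7, 36.6],
    "T3": [36.5, 36.5, 36.6, 36.5, 36.6],
}


def pairwise_mean_differences(data):
    """Mean difference for every pair of instruments."""
    return {(a, b): mean(data[a]) - mean(data[b])
            for a, b in combinations(data, 2)}


diffs = pairwise_mean_differences(readings)
within_tolerance = {pair: abs(d) <= TOLERANCE_C for pair, d in diffs.items()}

for pair, d in diffs.items():
    print(pair, f"{d:+.2f} °C", "OK" if within_tolerance[pair] else "out of tolerance")
```

In this made-up data every pair differs by 0.2 °C or less, so all three instruments would pass the tolerance check regardless of what a significance test reports.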

I am currently conducting a systematic review which uses the Downs & Black Checklist for measuring Quality.

I am having trouble with Question 27: The power question as I am not sure how to go about doing my calculations even with the recommended guidelines.

Does anybody have any suggestions, or is anyone able to talk me through it? I am quite lost.

Also, are there any articles which state that this checklist remains valid even after Question 27 has been modified? Please advise.

Thanks

Does the indirect effect (in a simple mediation model) of your mediator only consider the IV, or also take into account the control variables in your model?

I am trying to check the robustness of the indirect effect in my simple mediation model (with 6 control variables), with the Sobel/Aroian/Goodman Test. Do I need to include my controls to check the indirect effect or not?*

*I have tried both. The indirect effect is significant as I include my controls, but insignificant as I exclude them. In my initial methodology, my indirect effect is shown to be insignificant (using bootstrapping method).
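For reference, the Sobel statistic mentioned above depends only on the two path coefficients and their standard errors. If a and b are taken from regressions that include the control variables, the controls enter the test through those estimates rather than as separate terms. The sketch below uses only the standard library. The numbers are invented and `sobel_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sobel_test(a, se_a, b, se_b):
    """Sobel z and two-sided p-value for the indirect effect a*b.

    a: effect of the IV on the mediator; b: effect of the mediator on the DV
    (controlling for the IV). Both should come from the same models,
    including any covariates, that are reported in the paper.
    """
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Invented estimates for illustration only.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 3), round(p, 4))
```

Because a, b and their standard errors change when covariates are added or dropped, the Sobel result can change too, which is consistent with the two outcomes described in the question.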

Hi,

I have analysed my data using multivariate multiple regression (8 IVs, 3 DVs), and significant composite results have been found.

Before reporting my findings, I want to discuss in my results chapter (briefly) how the composite variable is created.

I have done some reading, and in the sources I have found, authors simply state that a 'weighted linear composite' or a 'linear combination of DVs' is created by SPSS (the software I am using).

They do not explain *how* they are weighted, and as someone relatively new to multivariate statistics, I am still unclear.

Are the composite DVs simply a mean score of the three DVs I am using, or is a more sophisticated method used in SPSS?

If the latter is true, could anyone either a) explain what this method is, or b) signpost some useful (and accessible) readings which explain the method of creating composite variables?

Many thanks,

Edward Noon

I have finished an online course on NVivo, but before starting to analyze my data with it, I wanted to hear about my peers' experience with any other software for quantitative data analysis.

If the variances are not homogeneous in a one-way ANOVA and the group sample sizes are equal, which of the Dunnett's T3, Games-Howell or Dunnett's C tests is preferred to determine the source of the difference?

I wish to collect state-wise data on the annual average concentrations of the SO2 and NO2 indicators. The data available is city-wise. Can I take the average of all the cities within a state and use it for state-wise analysis?

Narratives, oral histories and videography add to and enhance quantitative data analysis for efforts like this.

Dear colleagues,

I've been working on some lipid extraction analysis with QDa HPLC.

I start with some biological samples that are spiked with different final concentrations of a lipid mix (consisting of a few known lipids), to see what minimal concentration (and recovery %) of the lipids can be detected. In parallel, I use a freshly prepared calibration curve (0;0.5;1;2;4;6;8;10;20).

The problem is that I am not sure of the way I analyze my data, because in some samples I see clear peaks (of up to the ^6-7 intensity), yet, in my excel calculations I get very low or non-existing concentrations, like the extraction was not successful at all. For ex., I have clear peaks and higher calculated concentrations for samples with 50ug/ml final conc; but for samples with 100ug/ml I only have clear peaks with higher intensity, yet the calculations and recovery % show very low/non-existent values.

The way I analyze my data is the following:

1) Use calibration curve (slope&intercept) to calculate the concentration of my samples;

2) I tried to normalize for the blanks (for ex. if I have some very low signal in some of the blank samples)

3) I tried to normalize for the dilution factor (if diluted 10 times, multiply the calculated concentration by 10).

I believe there is a problem with the calibration curve because a) R= 0.95-8 (approximately)

b) because of the slope or intercept, I get a lot of negative values in my sample conc. calculations.

I already checked the calculations multiple times and I am using a precise automatic pipette, so I really don't understand what the problem might be. Knowing how simple Cal.curve preparation and calculation is, I feel very frustrated that I cannot get proper values every time I do the analysis.

Do you maybe have any suggestions/comment/advice on a proper calculation, on how the slope&intercept affect my concentration and the calibration curve problem?

Thanks in advance,

Margarita
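The calculation steps listed above can be sketched in a few lines, which may help when checking a spreadsheet against an independent implementation. The sketch assumes a linear detector response, standards in the same units as the samples, and invented peak areas. The helper names `fit_line` and `back_calculate` are hypothetical. It also flags two situations hinted at in the question: a signal below the fitted intercept gives a negative concentration, and a measured value outside the calibrated range (0-20 here) is an extrapolation.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def back_calculate(signal, slope, intercept, dilution=1.0, cal_range=(0.0, 20.0)):
    """Return (concentration in the original sample, inside-calibration flag)."""
    measured = (signal - intercept) / slope  # concentration in the injected sample
    in_range = cal_range[0] <= measured <= cal_range[1]
    return measured * dilution, in_range


# Calibration levels from the question; the peak areas are invented.
standards = [0, 0.5, 1, 2, 4, 6, 8, 10, 20]
peak_areas = [100 + 50 * c for c in standards]

slope, intercept = fit_line(standards, peak_areas)

# A sample diluted 10x whose peak area falls inside the calibrated range.
conc, ok = back_calculate(600, slope, intercept, dilution=10)
# A weak signal below the intercept back-calculates to a negative value.
low, low_ok = back_calculate(80, slope, intercept)

print(round(conc, 2), ok, round(low, 2), low_ok)
```

If the zero standard is already part of the fit, subtracting a blank on top of the intercept would remove the background twice, which is one way negative concentrations can arise.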

In an exploratory study, if I want to state that certain components of counselling (7 items to assess), environmental modification (8 items) and therapeutic interventions (8 items) result in the practice of social case work, what analysis should I do?

NB: we have no items to assess the practice of social work. Instead we want to state that the practice of the other three components results in the practice of social case work.

If you have expertise in quantitative data analysis, please let me know! I have data from a questionnaire survey and need help in analysing those data! Interested persons can also contact me at miladhasan@yahoo.com for collaboration.

For example, in an exploratory study, I have assessed social factors, emotional factors, cultural factors and emotional intelligence. How can I arrive at a model by identifying the relevant factors influencing emotional intelligence?

I'm preparing for a management research and analysis exam and in it we have to devise a hypothetical research study to solve an organisational problem.

My independent variables in the example I'm completing are communication frequency, trust levels within team, and number of team members. My dependent variable is team performance.

If I wanted to suggest trust had a moderation effect on the relationship between communication and team performance, what steps would I have to go through/data analysis techniques would I use, and what kind of results would I expect to see if there was a moderation effect on the relationship? It doesn't need to be too in depth, and it might be worth noting I'm using a longitudinal research design with questionnaires as my data collection method.

Many thanks.

Can someone please advise me as to the best software to use to analyse quantitative data?

Could anyone share with me the steps to analyze ERP data using BrainVision Analyzer 2? My data was collected using a Brain Products V-Amp with 8 channels.

Thank you in advance.

I am new to using R and am stuck on the best code to analyze a split-plot experiment. Does anyone have sample code?

Hey everybody,

I'm going to be up front with this: I am not super confident when it comes to quantitative data analysis. The study I am working on uses a series of Likert-style questions to generate data (including a basic BFAS questionnaire), which I am using to highlight potential areas of interest before I conduct my primary analysis using qualitative data analysis. I have two focus groups (the first n = 31 and the second n = 13) with two equally large control groups randomly selected from the remaining sample that did not qualify for the focus groups.

Anyway, that's more than you all probably needed to know - my issue is that I'm not sure what all information I need to put into my quantitative report, especially since that data is just basically a discussion starter for my Qual analysis. I've run (r) on the appropriate values, and computed (d) between the focus and control groups. Is there anything else you all would suggest I do?

Thanks in advance for the help. We English studies types don't do much in the way of quantitative studies.

My data consists of observations rated on 5 different dimensions (5-point Likert scale). A sixth variable describes the outcome (binary 0-1) I am interested in. But rather than understanding the individual contribution of these dimensions to the outcome variable, I am interested in finding the optimal combination of dimensions resulting in the highest probability of the outcome being equal to 1.

Do you have any advice regarding a methodological approach for me?

Thank you very much in advance, your help is highly appreciated!

Kind regards,

Jessica Birkholz

Dear sirs/madams,

Hi.

I have a question the answer to which I couldn't find. I was wondering if you could help me.

I used random matching (matched pairs) based on one variable to assign 132 participants to two groups, i.e. 66 participants each. However, after deleting outliers and checking the assumptions, the groups became imbalanced: one group has 58 participants and the other has 53.

Running an independent samples t-test showed no significant difference between the means of the two groups, yet I was doubtful about the number of participants in each group. Should the numbers be made exactly the same by deleting the counterparts of the participants that were deleted?

After submitting this paper to a journal, I have received feedback that it is not possible to make a contribution to this literature with analysis at such a macro level, and that I should look at firm-level data or case studies to do the analysis. This can't be correct, can it?

My current research project is a sequential explanatory mixed methods project. I plan to give high school teachers in one school district a demographic questionnaire and the West Virginia 21st Century Teaching and Learning Survey [WVDE-CIS-28]. This survey is a 5-point Likert-type survey on the frequency of integration of 21st century skills. Teachers with the highest frequency will be chosen for the qualitative interviews (and used to define successful integration), and a thematic analysis will be used for the qualitative phase.

I am trying to determine what statistical analysis should be used to determine if a relationship exists between frequency of integration and the following:

a) years experience teaching

b) level of education

c) number of professional development hours completed- related to 21st century skills/learning

I have limited knowledge of quantitative data analysis. Is there anyone out there that can provide me with some insight?
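One commonly used option for relating an ordinal Likert-based score to variables such as years of experience or hours of professional development is Spearman's rank correlation. The following is a minimal sketch under that assumption; the function names and the numbers are hypothetical, and it is not a recommendation for this specific study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical data: mean integration frequency (1-5) and years of experience.
frequency = [2.1, 3.4, 3.4, 4.0, 4.6, 1.8]
years = [1, 5, 8, 12, 20, 3]
print(round(spearman_rho(frequency, years), 3))
```

Level of education, being an ordered category, can be coded as integers (for example 1 = bachelor's, 2 = master's) and passed to the same function.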

My study is about skilled returnees to their home country. There are two types of returnee: those who came home with a job offered by the government (fellows), and those who found a job on their own after returning (non-fellows). The choice to come back is rated on a scale of 1-10.

The question I need to answer for my project is:

Do government fellows and non-fellows differ significantly in their rated reasons for returning to their home country?

Should I use the Mann-Whitney test or the Kruskal-Wallis test?

Thanks
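With exactly two independent groups, the Mann-Whitney U test is the usual rank-based choice, while Kruskal-Wallis generalises it to three or more groups. A minimal sketch of the U statistic, using hypothetical ratings and a hypothetical function name:

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): each pair with a > b adds 1 to U_a, each tie adds 0.5."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    u_b = len(a) * len(b) - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b

# Hypothetical 1-10 ratings of one reason for returning.
fellows = [8, 9, 7, 10, 6]
non_fellows = [5, 7, 4, 6, 3, 5]
print(mann_whitney_u(fellows, non_fellows))
```

The smaller of the two values is normally compared against tables or converted to a p-value by statistical software. With many tied ratings on a 1-10 scale, a tie-corrected p-value is advisable.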

When I analyze NMR data for metabolomics combined with other omics, can I use relative quantification data rather than absolute quantification data for the multi-omics analysis?

I would like to cluster users into at least 3 and at most 7 segments based on different credit and repayment variables that I will collect from two different databases. In the attached file, I put an example of the table I will have for this. The user_id (A-E) represents my users, of which there are about 20,000 in reality. I listed different potential variables that would be part of the clustering. I assume there will be 5-15 variables (in the example, dunning state, no. of missing date, and so on).

My question is,

1) Is it possible to use k-means clustering to achieve such a segmentation? Or should I pursue other statistical methods?

2) Which software can do the job well (R or SPSS)?

Excuse any naivety on my part, as I am new to quantitative data analysis.
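To illustrate what k-means does with such a table, here is a minimal sketch in plain Python, assuming all clustering variables are numeric. The variables are standardised first so that those on larger scales do not dominate the distances. All names and values are hypothetical; in practice the built-in routines of R or SPSS would be used instead.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column of a list of numeric rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0 for constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen rows
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical rows: (missed payments, days overdue, credit amount)
users = [[0, 2, 500], [1, 5, 700], [6, 60, 3000], [7, 75, 2800], [3, 30, 1500]]
labels, _ = kmeans(standardize(users), k=3)
print(labels)
```

Running this for k = 3 to 7 and comparing the within-cluster distances is one way to choose the number of segments. Categorical variables such as a dunning state would need to be encoded numerically or handled by a method designed for mixed data.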

Usually a simple model such as linear regression requires a sample in which each observation is linked to a value of every parameter. For some reason, probably data security or financial, I only get the marginal distributions of the given parameters.

While I know this is not optimal and information (especially about interrelationships) is lost, I want to ask whether you think this is a hopeless case. I personally cannot think of any possible way to analyse such data (trivially, I am able to look at the marginal histograms or calculate kernel density estimates, but nothing more than that).

I would not go so far as to say that analysis with such data is technically impossible. Given only that an object's shadow is a circle from three different perspectives, one could still reliably estimate that the object is probably a ball. In the same manner, one might be able to analyse something (with some degree of information loss) when given only marginal distributions.

What is your opinion on this? And would you suggest a concrete statistical/stochastic method for such data?

When you sacrifice a mouse from a study group due to its large tumor size, how do you record the result? Since the average tumor volume of the group will then be smaller, you would see a dip in the tumor growth chart, but this does not reflect the real situation. Also, at the end of the study, the number of mice in the study group will be smaller due to sacrifice, so the SEM will be bigger. Do you record the change in study group size in the growth chart?

thanks,
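One way to make the changing group size visible is to report n next to the mean and SEM at every time point. The sketch below assumes a sacrificed mouse is recorded as a missing value from then on. The function name and the volumes are hypothetical, and the sketch does not by itself correct the downward bias in the mean described above.

```python
import math
from statistics import mean, stdev

def summarize_growth(volumes_by_day):
    """Return {day: (n, mean_volume, sem)} using only mice still on study."""
    summary = {}
    for day, volumes in volumes_by_day.items():
        on_study = [v for v in volumes if v is not None]  # None marks a sacrificed mouse
        n = len(on_study)
        avg = mean(on_study) if n else float("nan")
        # SEM = sample standard deviation / sqrt(n); undefined for fewer than 2 mice.
        sem = stdev(on_study) / math.sqrt(n) if n > 1 else float("nan")
        summary[day] = (n, avg, sem)
    return summary

# Hypothetical volumes in mm^3; the second mouse is sacrificed before day 14.
data = {
    0: [100, 120, 110],
    7: [300, 900, 350],
    14: [600, None, 700],
}
for day, (n, avg, sem) in summarize_growth(data).items():
    print(f"day {day}: n={n}, mean={avg:.1f}, SEM={sem:.1f}")
```

Printing n per time point on the chart or in its legend lets readers see that a dip in the mean coincides with the loss of an animal.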

Dear all,

I am currently working on my MA dissertation in Event Management, analysing the impact of Technology on UK music festivals' attendees' experience.

I would like to analyse data collected via a 25-question survey (5-point Likert scale, strongly agree to strongly disagree). My main objective is to test 8 hypotheses. My hypotheses are classified into 4 different groups (2 per group), each group being related to a different facet. I aim to find a positive relationship between the different groups/facets.

Example:

H1: Sound equipment has a positive influence on Music experience*

H2: Large screen has a positive influence on Music experience

H3: Social Media has a positive influence on Social experience**

H4: 4G has a positive influence on Social experience... and so on

*1st facet

**2nd facet

However I am not sure of the right method to follow... Which kind of data analysis test would you go for?

Thank you

I have 8 months of load consumption and PV generation data available from the site and want a prediction/forecasting analysis for the next 4 months. Kindly suggest any new/effective method for this.

Thanks

I applied a sequential explanatory design in which quantitative data analysis was followed by qualitative data collection through semi-structured interviews. The qualitative interviews were based on the themes/sections of the quantitative data. I am wondering what type of qualitative data analysis would be the best choice at this stage. The qualitative data will support the quantitative results. Thanks

I am familiar with coding and analyzing Likert-type questions in SPSS, but how can I code multiple-answer questions in SPSS?

For example:

Q: What are you wearing right now?

o Short-sleeves shirt dress

o Long sleeves shirt dress

o Straight trouser

o Sleeveless shirt

o Skirt

o Sleeveless vest

o Long sleeves pyjamas

o Calf length socks (length between knee and foot)

o Pant

o Boots

o Sandal

o Cap

o Scarf

o Abaya

A single respondent can check several options for this question.

1) How can this be coded in SPSS?

2) How should it be analyzed?
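A common way to code a check-all-that-apply question is to create one 0/1 variable per option (often called multiple dichotomy coding), then count how often each option was checked. The sketch below illustrates the idea in Python rather than SPSS; the option subset, responses, and names are hypothetical.

```python
# A shortened, hypothetical subset of the answer options.
OPTIONS = ["Skirt", "Boots", "Scarf", "Abaya"]

def to_indicators(checked, options=OPTIONS):
    """Turn one respondent's set of checked options into 0/1 variables."""
    return {opt: int(opt in checked) for opt in options}

# Each set holds the options one respondent ticked.
responses = [{"Skirt", "Boots"}, {"Abaya", "Scarf"}, {"Scarf"}]
rows = [to_indicators(r) for r in responses]

# Frequency of each option, and its share of respondents (percent of cases).
counts = {opt: sum(row[opt] for row in rows) for opt in OPTIONS}
percent_of_cases = {opt: 100 * counts[opt] / len(rows) for opt in OPTIONS}

for opt in OPTIONS:
    print(f"{opt}: {counts[opt]} ({percent_of_cases[opt]:.0f}% of respondents)")
```

In a data file this corresponds to one column per clothing item, with 1 for checked and 0 for not checked. The percentages can sum to more than 100 because each respondent may tick several options.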

I'm conducting a study investigating attraction to individuals in the Dark Triad. I am using a questionnaire with several Likert-scale rating sections about attraction, as well as some vignettes where the participant rates their attraction to the vignette person and fills out the Big Five Inventory (BFI-S).

I'm not sure which method of data analysis I should use for this study.

In my data set, there are 8 explanatory variables. I log-transformed 7 of them because doing so improves normality, while one explanatory variable remains untransformed because log transformation decreases its normality, as shown in the normal Q-Q plot. In this situation, can I run a regression model with one variable untransformed while all other variables are log-transformed? Note that the dependent variable was also log-transformed.

Which method should be considered for treating missing data in demographic information when the respondents have answered most of the questionnaire items?

**Why is it important to examine the assumption of linearity when using Regression?**
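As a small illustration of what can go wrong, the sketch below fits a straight line by ordinary least squares to data that are actually quadratic. The data and function name are hypothetical. The residuals are positive at both ends and negative in the middle, a systematic pattern showing that the line misrepresents the relationship.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx

# Hypothetical data with a curved (quadratic) relationship.
x = list(range(1, 11))
y = [v ** 2 for v in x]

slope, intercept = fit_line(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
for a, r in zip(x, residuals):
    print(f"x={a:2d}  residual={r:6.1f}")
```

When the linearity assumption holds, a plot of residuals against fitted values should show no such curve.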

I'm planning to perform a SEM in order to investigate intention to innovate. This dependent variable depends on attitude toward innovation (ATI) and entrepreneurial self-efficacy (ESE) - based on the scheme of the Theory of Planned Behaviour. Literature indicates that other variables play a role, such as gender and family exposure to entrepreneurship (FEE).

How should I perform the analysis in order to see how gender and FEE impact the scheme? The data come from a survey with 1200 answers.

At first I thought of a mediation model, but I believe the effect of gender on the dependent variable is not a causal relationship but a moderating effect. Nevertheless, I am struggling to implement it in Stata. Most of the literature discusses mediation and moderation at the same time, which is not my case.

Other suggestions of analysis would also be really helpful.

Best regards, Pedro

When I try to fit a model using AMOS, I get a message that the following variables are endogenous but have no residual (error) variable. My question is: is it compulsory to add a residual (error) term? Without adding the error residual, the results are good. Kindly respond to this.

The β-weights of the items in the factor pattern will be substantially reduced, I suppose, but will that be true for the item-factor correlations in the factor structure as well?

People think numbers are more accurate than words although the quality of data analysis matters.

Which one do you think is more convenient in applied research? What are the considerations?

Reflections are appreciated in advance,

I was working with PEAKS quantitative data analysis, but I am confused about what a Ratio vs Quality graph means. Further, the output of the quantitative analysis indicates Quality≥1 under the result filtration parameters, but I don't understand what this ≥1 value is telling me.

Thanks

I have annual data on real total remittances and real per capita income, primary education enrollment, domestic saving, age dependency ratio, education expenditure as percentage of GDP. All variables are annual time series observations from 1972-2012.

I would greatly appreciate your expertise in this regard.