Hypothesis Testing - Science topic
Explore the latest questions and answers in Hypothesis Testing, and find Hypothesis Testing experts.
Questions related to Hypothesis Testing
Hi all,
I am trying to find the association of demographic variables with customer engagement.
I need suggestions on two queries:
1. Is the alternative hypothesis statement correct?
2. Are the decision statements correct?
Alternative hypothesis statement
There is a significant association between gender and customer engagement.
Decision rule
1. The alpha value is taken as 0.05.
2. Reject the null hypothesis if the p-value is less than or equal to alpha.
3. Fail to reject the null hypothesis if the p-value is greater than alpha.
Please help.
Regards,
Uday Bhale
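For an association between gender and customer engagement categories, a chi-squared test of independence on the cross-tabulated counts is a common choice. A minimal sketch in Python (the counts are invented for illustration), applying the decision rule described in the question:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table: rows = gender, columns = engagement level (low/medium/high)
table = [[30, 45, 25],
         [40, 35, 25]]

chi2, p, dof, expected = chi2_contingency(table)
alpha = 0.05
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"chi2={chi2:.3f}, dof={dof}, p={p:.4f} -> {decision}")
```

The null hypothesis here is "no association between gender and engagement"; the alternative is the significant association stated above.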
I am researching potential success factors for the internationalization of small and medium enterprises (SMEs) into emerging markets.
The identified success factor is "cooperation with local firms." As there is no existing literature on this specific topic (SMEs + Cooperation + Emerging Markets), I plan to use qualitative content analysis on data collected from expert interviews.
The schematic structure of the research will be as follows:
- Theoretical basics (Internationalization, SMEs, Emerging Markets)
- Theoretical basics (Cooperation) -> Ending with a research question (for example: "How important is cooperation, especially for SMEs, in emerging markets and what are possible reasons?")
- Results of the interviews -> Ending with a hypothesis ("It is an advantage for SMEs to cooperate with local firms to internationalize in emerging markets.")
Is this a reasonable approach?
Hello everyone,
I am currently doing research on the impact of online reviews on consumer behavior. Unfortunately, statistics are not my strong point, and I have to test three hypotheses.
The hypotheses are as follows: H1: There is a connection between the level of reading online reviews and the formation of impulsive buying behavior in women.
H2: There is a relationship between the age of the respondents and susceptibility to the influence of online reviews when making a purchase decision.
H3: There is a relationship between respondents' income level and attitudes that online reviews strengthen the desire to buy.
Questions related to age, income level, and frequency of reading online reviews were recorded as ordered categories (e.g., 18-25 years, 26-35 years, ...; 1000-2000 EUR, 2001-3000 EUR, ...; every day, once a week, once a month, etc.), while the questions measuring attitudes and impulsive behavior used a Likert scale.
What statistical method should be used to test these hypotheses?
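Since both variables in each of these hypotheses are ordinal (ranked age/income brackets, reading frequency, Likert scores), a rank-based measure such as Spearman's correlation is one common option. A minimal sketch in Python, with invented ordinal codes:

```python
from scipy.stats import spearmanr

# Hypothetical ordinal codes: age bracket (1-5) and review-influence Likert score (1-5)
age_rank  = [1, 2, 2, 3, 4, 5, 3, 1, 4, 5, 2, 3]
influence = [5, 4, 5, 3, 2, 1, 3, 4, 2, 2, 4, 3]

rho, p = spearmanr(age_rank, influence)
print(f"Spearman rho={rho:.3f}, p={p:.4f}")
```

A significant rho would indicate a monotonic relationship between the ordered categories.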
I'm pretty new to statistics. I have a temporal data set of nitrate levels in water collected from 5 different locations. (In the long run I want to correlate it with water pH in the region.) Based on box plots, nitrate levels at locations A and C appear significantly higher than at the other spots in a given month. I would like to show, with a hypothesis test and p-values, that A and C contribute more to the total nitrate level of that month than the other spots do. Is there a hypothesis test or post hoc test for this? (The data is not normally distributed.) Thank you.
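For non-normal data across five locations, one common route is a Kruskal-Wallis test followed by pairwise Mann-Whitney tests with a multiplicity correction. A minimal sketch in Python with invented nitrate values:

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical monthly nitrate readings (mg/L) for five locations
sites = {
    "A": [8.1, 9.3, 7.8, 8.9, 9.0],
    "B": [4.2, 3.9, 4.5, 4.1, 3.8],
    "C": [7.9, 8.6, 9.1, 8.2, 8.8],
    "D": [4.0, 4.4, 3.7, 4.3, 4.1],
    "E": [4.5, 3.6, 4.2, 3.9, 4.4],
}

# Omnibus test across all five locations
h, p = kruskal(*sites.values())
print(f"Kruskal-Wallis H={h:.2f}, p={p:.4f}")

# Pairwise Mann-Whitney tests with a Bonferroni correction
pairs = list(combinations(sites, 2))
for a, b in pairs:
    u, p_pair = mannwhitneyu(sites[a], sites[b], alternative="two-sided")
    print(f"{a} vs {b}: adjusted p={min(1.0, p_pair * len(pairs)):.4f}")
```

Dedicated post hoc procedures such as Dunn's test serve the same purpose; the Bonferroni-adjusted pairwise tests above are just the simplest self-contained version.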
We are currently working on a research paper aiming to develop ink from organic waste. My research group mates and I are debating which statistical test to use for our study. We want to see the effect of different particle sizes on ink characteristics such as viscosity, pH, drying time, erasability, density, etc.
Most of the related literature we found did not use a specific test; the authors only graphed or tabulated the data and then described what they had obtained.
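If each particle size defines a group and each ink characteristic is measured on several batches, a one-way ANOVA per characteristic is one common starting point (assuming roughly normal residuals; otherwise Kruskal-Wallis). A minimal sketch in Python with invented viscosity readings:

```python
from scipy.stats import f_oneway

# Hypothetical viscosity readings (mPa*s) for three particle-size groups
fine   = [12.1, 11.8, 12.4, 12.0]
medium = [13.0, 13.4, 12.9, 13.2]
coarse = [14.1, 14.5, 13.9, 14.2]

f_stat, p = f_oneway(fine, medium, coarse)
print(f"F={f_stat:.2f}, p={p:.4f}")
```

The same test would be repeated for pH, drying time, and the other characteristics, each with its own groups of replicate measurements.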
Given time series with events, I want to test whether the events in two time series occur differently.
See for example the attached image. There are 12 events (orange) between 2000 and 2007 with different lengths. Let's pretend these are drought periods in countries, or any kind of event that can last for a certain time, where multiple events can overlap (my actual dataset consists of returns, and the events are certain patterns).
I can simulate data to generate events. For example, I can simulate a weather dataset in which an event starts whenever certain conditions are met and lasts for some time (e.g., no clouds = no rain). Thus, I can generate as many datasets (under H0) as I want.
I want to check whether there is any kind of "systematic occurrence", or whether the amount and distribution of events is the same in the simulated datasets and the actual dataset.
I could use a simple t-test and test the average number of events in the simulated datasets against the actual one. But this would not account for the problem that the number of events could be the same while the events are clustered (e.g., the twelve events in the actual dataset are always at the beginning).
Does anyone know of a similar problem with a solution, or any kind of test for this kind of problem?
My only idea is to split the tests:
(1) Test for the number of occurrences
(2) Somehow test if the structure is different
Thanks,
Nico
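The two sub-tests can be combined in a Monte Carlo fashion: compute both the event count and a clustering statistic (e.g., the variance of gaps between event starts) on the actual series, and compare each against its null distribution over the simulated series. A minimal sketch in pure Python; the threshold-based event definition, the toy simulator, and the clustering statistic are all illustrative assumptions standing in for the real ones:

```python
import random

def event_starts(series, threshold=1.5):
    """Indices where the series crosses above the threshold (toy event definition)."""
    return [i for i in range(1, len(series))
            if series[i] > threshold and series[i - 1] <= threshold]

def gap_variance(starts):
    """Variance of gaps between consecutive event starts: a simple clustering statistic."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if len(gaps) < 2:
        return 0.0
    mean = sum(gaps) / len(gaps)
    return sum((g - mean) ** 2 for g in gaps) / len(gaps)

random.seed(0)
simulate = lambda n=500: [random.gauss(0, 1) for _ in range(n)]  # stands in for the H0 simulator

actual = simulate()  # in practice: the observed series
obs_count = len(event_starts(actual))
obs_clust = gap_variance(event_starts(actual))

null_counts, null_clusts = [], []
for _ in range(999):
    starts = event_starts(simulate())
    null_counts.append(len(starts))
    null_clusts.append(gap_variance(starts))

# Empirical p-values with add-one correction
mean_count = sum(null_counts) / len(null_counts)
p_count = (1 + sum(abs(c - mean_count) >= abs(obs_count - mean_count)
                   for c in null_counts)) / 1000
p_clust = (1 + sum(v >= obs_clust for v in null_clusts)) / 1000
print(f"count p={p_count:.3f}, clustering p={p_clust:.3f}")
```

Any statistic sensitive to clustering (nearest-event spacing, Ripley-type counts) can be slotted into the same scheme, since the null distribution comes entirely from the simulator.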
Hi, I am a student doing research on the MBA level.
In my research, I have linear regressions, chi-squares, and t-tests due to the different types of variables. I have around 8 variables that are tested against each other. I have two main hypotheses, but also several sub-hypotheses that are tested in SPSS.
Since my study involves several analyses, are there any issues with having (too) many hypotheses, i.e., 16-20? Are there any arbitrary limits, or is there such a thing as "too many" in research?
Thanks
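There is no formal upper limit on the number of hypotheses, but with 16-20 tests the family-wise error rate grows quickly, so a multiplicity correction is often applied; Holm's step-down procedure is a simple, less conservative alternative to Bonferroni. A minimal sketch in pure Python with invented p-values:

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject/keep decision per p-value."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m = len(p_values)
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are kept
    return reject

# Hypothetical p-values from 6 of the sub-hypothesis tests
pvals = [0.001, 0.020, 0.030, 0.040, 0.200, 0.600]
print(holm(pvals))
```

With these invented p-values only the smallest survives, since the second must clear 0.05/5 = 0.01 and does not.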
I am studying the effects of social media on fad dieting in males and females. My hypothesis is: "There will be more social media influence on females than males regarding fad dieting." I'm recruiting participants from my university; however, I'm stumped trying to figure out a procedure to measure my hypothesis. Can anyone help me? Thank you in advance.
The scenario is as follows:
- Imagine a phenomenon was studied by others;
- They concluded that there is correlation between the increase in that phenomenon and the increase in its outcomes;
- After reading these studies, can I hypothesize that if the phenomenon prevails/increases, the outcomes will also prevail/increase?
- Will my hypothesis be valid, or is it incorrect to base future conclusions on past ones?
- P.S. Sorry not to reveal what I am hypothesizing because, once revealed, it will lose its magic :)
In hypothesis testing, do we report the confidence interval? For instance, when we say a 90% or 95% confidence interval, what does it mean?
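Roughly, a 95% confidence interval is built so that, under repeated sampling, 95% of such intervals would cover the true parameter; it is commonly reported alongside the test result. A minimal sketch of a t-based interval for a mean in Python (the data are invented):

```python
import math
from scipy.stats import t

data = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
se = sd / math.sqrt(n)

t_crit = t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
lo, hi = mean - t_crit * se, mean + t_crit * se
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

For a 90% interval the critical value would be `t.ppf(0.95, df=n - 1)` instead.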
Kindly, let me know which regression model I can use specifically for hypothesis testing.
In the study (EEG band power analysis), there were six sample points (six subjects) in the pre- and post-groups, but we couldn't get a significant difference between the two groups. I intend to apply a 1 s sliding window with no overlap to reduce variance; let's assume we then have 100 sample points per participant, i.e., 600 sample points in the pre-group and 600 in the post-group. I intend to run a paired t-test on the data obtained after applying the sliding window. The dilemma:
should I,
1. average the windows generated per participant to increase the SNR, returning the data to 6 sample points per group (we didn't get any significant difference with this method)?
2. keep the 600 sample points generated by the sliding window and run the test on those?
Hi
I have applied several conditional independence testing methods:
1. Fisher's exact test
2. Monte Carlo chi-squared
3. Chi-squared with Yates' correction
4. CMH
The number of distinct feature segments that reject independence (the null hypothesis) differs between methods. Which method is more reliable, and why?
(The data satisfy the prerequisites of all of these methods.)
In my dissertation I made a Multiple Linear Regression Model, and to test the variable significance I display the p-values with the hypothesis test as follows:
H0: There is a relation between the feature and the output
H1: There is no relation between the feature and the output
1º Is this hypothesis correct?
2º When I say "relation between the feature and the output", does it mean that this relation is linear?
And if the variables have a non-linear relation, for example an exponential one, will the variable not be significant?
Good day.
I am doing linear regression between a set of data and predictions made by two models, that I'll call A and B. Both models have the same number of parameters.
If I do a simple regression with excel, I get the following:
- Model A has R2 = 0.97.
- Model B has R2 = 0.29.
- The least-squares fit to model A has a slope m = 2.43.
- The slope for model B is m = 0.29.
From this simple analysis, I would conclude that model A is better than model B in capturing the trend of experimental outcomes. I even tested it on a set of unseen data and it performed better at predicting the trends.
Now, I was asked to confirm this by hypothesis testing, and here it gets tricky, probably due to my lack of experience. Because of the large slope of model A, the residual sum of squares for model A is huge, almost 5 times larger than that of model B. Since the number of data points and parameters is the same for both models, this suggests that model B is better than model A.
What am I doing wrong? I feel that I'm not formulating my problem correctly, but I'm honestly lost.
Also, I've seen that there are endless flavors of hypothesis testing, but the more I read the less I know where to start.
Is there a simple prescription to formulate and test my hypothesis?
Many thanks in advance!
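One thing worth checking in the setup above: regressing observations on predictions mixes up calibration (the slope) with correlation (R²), and the residual sum of squares of that regression is not a fair score when one model's predictions are on the wrong scale. Comparing the models directly on prediction error against the observations, e.g. via RMSE, separates the two issues. A minimal sketch in pure Python with invented observations and predictions:

```python
import math

def rmse(obs, pred):
    """Root-mean-square error of predictions against observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

obs    = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [0.5, 0.9, 1.3, 1.7, 2.1]  # tightly correlated with obs but badly scaled
pred_b = [1.5, 1.4, 3.2, 3.1, 4.8]  # noisier but close to the 1:1 line

print(f"RMSE A={rmse(obs, pred_a):.2f}, RMSE B={rmse(obs, pred_b):.2f}")
```

With invented numbers like these, the perfectly correlated but mis-scaled predictions can show the larger error, which is one reason slope and R² should be read together rather than through the residuals of a fitted line.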
Which countries use permanent income hypothesis for managing their oil wealth?
Hi
I have a huge dataset for which I'd like to assess the independence of two categorical variables (x,y) given a third categorical variable (z).
My assumption: I have to do the independence tests for each unique "z", and even if only one of these tests rejects the null hypothesis (independence), it is rejected for the whole dataset.
Results: I have done Chi-Sq, Chi with Yates correction, Monte Carlo and Fisher.
- Chi-squared is not a good method for my data due to the sparse contingency table.
- Yates and Monte Carlo show rejection of the null hypothesis.
- For Fisher, all the p-values are equal to 1.
1) I would like to know whether there is something I'm missing.
2) I have already discarded the "z"s that have DOF = 0. If I keep them, how could I interpret the independence?
3) Why does Fisher result in p-value = 1 all the time?
4) Any suggestion?
#### Apply Fisher's exact test
fish <- fisher.test(cont_table, workspace = 6e8, simulate.p.value = TRUE)
#### Apply chi-squared methods
chi_cor   <- chisq.test(cont_table, correct = TRUE)                     # Yates' correction
chi       <- chisq.test(cont_table, correct = FALSE)                    # plain chi-squared
chi_monte <- chisq.test(cont_table, simulate.p.value = TRUE, B = 3000)  # Monte Carlo p-value
Hello,
I am writing a research proposal in the field of Marketing Theory right now. The hypothesis must be developed based on the theory (of a paper by Ofek et al. (2011)), that multichannel retailers must increase in-store assistance levels to decrease product returns. Therefore, the authors propose that retailers with a high level of in-store assistance have lower returns than vice versa.
I want to test this relationship. My hypothesis is based on my belief (based on previous literature), that this relationship as described by the authors has changed. Therefore, I expect both groups (retailers with high vs. low in-store assistance levels) to have the same product return rates.
Now my question:
Is this hypothesis in a statistical context correct?
"The average product return rate of a B&C retailer with a high level of in-store assistance is similar to a B&C retailer with a low level of in-store assistance in the clothing market."
My problem is: if I conduct, for example, an ANOVA test and the results are significant, I would normally conclude with "there is a significant difference in the group means, therefore I can reject the null hypothesis."
In my case, with a significant test result, I would then need to *reject* my hypothesis. Is this allowed, or is it statistically incorrect?
Thank you so much for any help or feedback. I would appreciate any thoughts on this.
Kind regards,
Johanna
When we teach about family-wise error rates, we usually use straightforward examples such as:
I conduct an ANOVA and obtain a statistically significant result. Next, I conduct post hoc comparisons to determine which cells significantly differ. However, I need to adjust alpha since I'm doing multiple hypothesis tests.
...but are family-wise error rates an issue only in these narrow situations, namely when multiple hypothesis tests are done on the same data and all relate to a single underlying "question" or "inference"?
For instance, if a study contains 3 experiments, all slightly different in design, but all intended to address a single question or phenomenon - is a statistical adjustment required? Why or why not?
*My point in asking this question is that what exactly constitutes a "family" is far from clear (at least to me).
Is Bartlett's test alone enough for hypothesis testing, or does a chi-square test have to be run along with it in a dissertation for a Ph.D. study in the field of social science?
For example, in the model y = b1 + b2*x2 + b3*x3 + e, I want to test the linear combination b2 + b3 = 0 in R, with the model estimated by quantile regression. I checked Koenker's quantreg package; it has the command anova.rq, but it tests only nested models, similar to the usual anova command in base R. I want to test a linear combination of coefficients in a single multiple regression or time series model. For linear models, the 'AER' package has the command 'linearHypothesis', but it does not work for quantile regression.
I know Eviews can be used for this and I have used this in my paper Iqbal (2017)"Does gold hedge stock market, inflation and exchange rate risks: An econometric investigation", International Review of Economics and Finance.
Any help is appreciated.
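Given the coefficient estimates and their covariance matrix (for quantile regression, e.g., from a bootstrap covariance), a Wald test of b2 + b3 = 0 can also be assembled by hand: it only needs R*beta and R*V*R' for the restriction vector R. A minimal sketch of the algebra in Python, with an invented coefficient vector and covariance matrix:

```python
import math

# Hypothetical estimates (b1, b2, b3) and their covariance matrix
beta = [0.10, 0.45, -0.30]
cov  = [[0.020, 0.001, 0.000],
        [0.001, 0.015, 0.004],
        [0.000, 0.004, 0.018]]

# Restriction R*beta = 0 with R = [0, 1, 1] tests b2 + b3 = 0
R = [0, 1, 1]
est = sum(r * b for r, b in zip(R, beta))  # b2 + b3
var = sum(R[i] * R[j] * cov[i][j] for i in range(3) for j in range(3))  # R V R'
z = est / math.sqrt(var)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal approximation
print(f"b2+b3={est:.3f}, z={z:.2f}, p={p:.4f}")
```

The same algebra works in R once the covariance matrix of the quantile regression coefficients has been extracted.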
I have a large sample (a single sample with 1100 cases), and I want to test a hypothesis comparing the means of my sample in two groups (each group has 550 cases).
Some statisticians told me, "you can use the usual Student t-test because the data are normal, based on the Central Limit Theorem".
I'm confused: the Central Limit Theorem is about the distribution of sample means. For example, if we have data with 100,000 cases that is not normal, we can take 100 samples; the distribution of those 100 sample means would be approximately normal, and then I could use the t-test.
If my sample is large, can I use parametric statistics (or testing hypothesis test) with a non-normal distribution of the data?
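What the statisticians are invoking is that, for large n, the sampling distribution of the mean is approximately normal even when the raw data are not; with n = 550 per group this usually holds. A quick simulation sketch in pure Python, drawing from a clearly non-normal (exponential) population:

```python
import random
import statistics

random.seed(1)

# Draw many samples of size n from a skewed (exponential) population
n = 550
sample_means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
                for _ in range(2000)]

# The raw data have mean 1 and sd 1; the CLT predicts the sample means
# cluster around 1 with sd roughly 1/sqrt(n) ~ 0.043
print(statistics.fmean(sample_means), statistics.stdev(sample_means))
```

A histogram of `sample_means` would look bell-shaped even though every individual sample is strongly skewed, which is why the t-test on the group means is considered safe at this sample size.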
My understanding of conventional practice in this regard is that when there are more than two independent proportions being compared (e.g., comparing the proportion of people who contracted COVID-19 at a given period between the <18 year-old group, 18-64 year-old group and >64 year-old group), one of the groups being compared will serve as a reference group (which will automatically have an OR or RR = 1) through which the corresponding OR or RR of the remaining groups will be derived from. As far as I know, it seems that the generated OR or RR from the latter groups, through logistic regression or by-hand manual computation, will have a p-value whose threshold for significance testing is not adjusted with respect to the number of pairwise comparisons performed.
I understand that in the case of more than two independent means, we implement one-way ANOVA/Kruskal-Wallis technique first as omnibus/global hypothesis test which is followed by the appropriate post-hoc tests with the p-value thresholds adjusted if the former test finds something "statistically significant." I imagine that if the same stringency is applied to more than two independent proportions, we should be doing something like a Chi-square test of association (with the assumptions of the test being met) first as omnibus/global hypothesis test, followed by an appropriate post-hoc procedure (possibly Fisher exact tests with p-value threshold adjustment depending on the number of pairwise comparisons performed) if the former test elicits a "statistically significant" difference between the independent proportions.
I would like to ask some clarification (i.e., what concepts/matters I am getting wrong) on this. Thank you in advance.
The starting point for Interpretative Phenomenological Analysis (IPA) is induction. It aims at filtering out theoretical and conceptual assumptions in order to allow the lived experiences of research participants vis-a-vis a phenomenon to speak on their own terms. But, although IPA strongly emphasises an inductive process, many PhD students who choose IPA as their main methodological point of departure appear to make empirical and theoretical assumptions, as well as develop hypotheses, prior to embarking on their fieldwork (this is, at least, my experience). So, the question is: can IPA incorporate deductive processes, such as preconceived hypotheses?
Hello everyone. I really need your help on xtfrontier.
I'm using Stata and the article by Battese and Coelli (1995) to estimate the efficiency of firms using a two-stage procedure.
As in the article, there are some hypothesis tests for the parameters of the inefficiency frontier model. What commands should I run in Stata to carry out those hypothesis tests and to get the inefficiency scores? For example, how do I do the log-likelihood ratio test as in the paper?
I run as follows:
xtfrontier lnY lnK1 lnK2 lnL, tvd
predict TE, te
xtreg TE ...(some independent var)
Thanks in advance for your help!
In hypothesis testing, we see the null hypothesis used in many studies, while the use of alternative hypotheses is also widespread, particularly in the structural equation modeling approach.
It is very important for researchers to know when to use which hypothesis approach. Could you give your expert opinion on this?
Esteemed members of Research Gate
I have been wondering for many years, and even now, about hypothesis testing, which does not seem able to produce practical results. Just accepting some proposition or rejecting it would not be right. Ironically, the research we do here depends heavily on it. I am starting this discussion so that you can provide insights into the issue.
My point is this. When I was doing my graduate studies in statistics, I was instructed to set up the null hypothesis as a statement of "no significant difference". If the calculated value is less than the tabulated value, then accept the null hypothesis; if not, reject it. But in research reviews and some lectures, I have seen the opposite. If I ignore the null hypothesis statement and instead take a directional hypothesis, in which I assume there is a significant difference between the variables under study, as my primary hypothesis, would the same rule apply? I mean, if calculated < tabulated, do I have to accept the primary hypothesis, which assumes significance, and reject it if the opposite comes true? Is this right?
To bypass this issue, I have come across Bayesian methods, which are very clear, but I do not have a model to follow in teaching students.
So kindly clarify the correct way of teaching hypothesis testing, as it is still the predominant practice in psychological research.
If possible, refer me to some free resources for teaching research methodology to psychology teachers. Unless teachers of psychology are well trained in this issue, genuine research will not come forth.
I am using the Mann-Kendall test and Sen's slope to assess trends in monthly rainfall datasets for 64 years, e.g., Jan 1957, Jan 1958, ..., Jan 2020. Since the region is a semi-arid one, there are a lot of zero values (NOT missing values) in the time series. For example, the time series for rainfall in January has only 15 non-zero values out of 64 data points. My question is how this will affect the trend test (Mann-Kendall) and the trend slope (Theil-Sen).
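For reference, the Mann-Kendall S statistic is a sign count over all ordered pairs, so tied zero-zero pairs contribute nothing to S, while the many ties also call for the tie-corrected variance. A minimal sketch of both in pure Python, on a short toy January series with many zero months:

```python
def mann_kendall_s(x):
    """Mann-Kendall S: sum of signs over all ordered pairs (ties contribute 0)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    return s

def mk_variance(x):
    """Variance of S with the standard tie correction."""
    n = len(x)
    var = n * (n - 1) * (2 * n + 5)
    counts = {v: x.count(v) for v in set(x)}
    for t in counts.values():
        var -= t * (t - 1) * (2 * t + 5)
    return var / 18

# Toy January series with many zero (no-rain) months
rain = [0, 0, 3.2, 0, 0, 0, 5.1, 0, 0, 7.4, 0, 0, 9.0, 0, 0]
print("S =", mann_kendall_s(rain), "Var(S) =", mk_variance(rain))
```

With 11 of the 15 values tied at zero, the tie term removes a sizeable share of the uncorrected variance, which illustrates how heavily zeros shape both S and its significance.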
Dear Community,
I was wondering whether or not it is possible to validate hypothesis testing based on an OLS regression with only 2 independent variables. Or would such a thing decrease the credibility of the OLS regression results?
If we push it to the extreme, is it possible to validate hypothesis testing with a univariate regression analysis?
While working on a hypothesis, it was assumed that a marine microorganism called a dinoflagellate needs to be injected via microinjection into the xylem tissue of the plant once cultured in the nutritive medium.
I want to know whether it is possible to do so (practically). Are there any consequences for the plant, as well as for the survival of the microorganism?
I have heard some academics argue that the t-test can only be used for hypothesis testing; that it is too weak a tool to be used to analyse a specific objective when carrying out academic research. For example, is the t-test an appropriate analytical tool to determine the effect of credit on farm output?
I am confused: when testing the EKC, the studied variable of GDP and its squared values have very small coefficients (0.01 and 0.00), with positive and negative signs respectively, at the 1% level of significance. Is it okay to have these values, or is there something wrong with the data I am using in the study? Kindly help me out in this matter. Thanks.
I want to establish a relationship between two constructs, and both are second-order reflective-formative constructs. Please suggest how to test a hypothesis between such constructs using SmartPLS software. Kindly share a research paper on this kind of analysis if possible. Thanks in advance.
I have two independent groups of animals (n=4 in each group) observed under two different conditions (A and B) at the same time of day and over an equal time interval (12 hours). For each group, behavioral data were recorded and divided into 7 mutually exclusive categories of behaviour. As a result, we have a different number and frequency of behavioral categories between the two groups (totals: Group A - 795 behavioural acts, Group B - 867).
I have only visualized the total frequency of the categories in each group and the relative frequency as a percentage, but I do not know which statistical method to use with these data to determine the significance of the difference in the frequencies of each behavioral category between groups A and B.
Can you suggest one?
Hello, everyone
I want to discuss with you about Hypothesis Testing.
Briefly speaking,
About 350 thousand people have poor liver values (bad AST, ALT), so they were given a medical test for hepatitis C,
and only 38 people among them actually have the hepatitis C virus.
The other 850 thousand people did not get a medical test for hepatitis C because they have normal AST and ALT
(so we don't know whether or not they are hepatitis C patients).
In this case, we want to run a test between two groups:
group C: all people who have the hepatitis C virus
group D: all people who don't have the hepatitis C virus
We want to test whether there is a significant difference between the mean BMI of group C and the mean BMI of group D.
(We also want to test, similarly, whether there is a significant difference between the mean weights of group C and group D,
and so on.)
The two serious problems in this situation are:
we don't know, for some people, whether they belong to group C or group D,
and there is an extreme imbalance between the sizes of group C and group D (group C is very small).
In this case, I would like to discuss with you
what the best strategy or test is for this situation.
Thank all of you
Dear Researchers, I would like to discuss experiences with the statistical method of the Wilcoxon paired (signed-rank) test. In the core algorithm of the method, the values are transferred onto ranks. However, the test does not consider whether some value (before the transfer onto ranks) is, e.g., 5 times or 15 times greater than the other values, i.e., the scale of the values. Am I right that the Kolmogorov-Smirnov test for testing variances is appropriate as a necessary second, additional procedure for obtaining information on differences between situations in which such multiplied values (not ranks) appear in the data file?
I suppose that a sequence before the rank transformation such as 1 1 1 2 1 ... is transferred in the Wilcoxon paired test in the same way as, e.g., 1 1 1 20 1 ... However, is it right to use the K-S test to check whether there are two statistically significantly different situations in the sense of the influence of the values themselves on the paired comparisons?
If the Wilcoxon paired test finds that the medians of the ranked data are statistically significantly "the same", the influence of the concrete values is not included; is it therefore right to run the following additional test on differences between variances with the Kolmogorov-Smirnov test?
Thanks for possible discussions
With wishes of statistically significant summer days
Tomas Barot
Dpt. of Mathematics with Didactics
Uni. of Ostrava, Czechia
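The rank behaviour described above is easy to demonstrate: two sets of paired differences with the same rank ordering but very different magnitudes yield an identical Wilcoxon statistic. A small sketch in Python (the paired differences are invented):

```python
from scipy.stats import wilcoxon

# Two sets of paired differences with identical rank ordering but
# very different magnitudes (2 vs 20, 3 vs 30)
d_small = [1, -1, 2, 1, 3, -1, 2, 1]
d_large = [1, -1, 20, 1, 30, -1, 20, 1]

res_small = wilcoxon(d_small)
res_large = wilcoxon(d_large)
print(res_small.statistic, res_large.statistic)  # identical, since only ranks enter
```

This is exactly the point of the question: once the differences are ranked, a 20 and a 2 are indistinguishable, so any sensitivity to the magnitudes must come from a separate procedure.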
My analysis is about candidates who participated in 8 Ukrainian parliamentary elections more than once, no matter whether they were winning or losing. I am testing hypotheses about whether gender, place of living, occupation, electoral rules (majoritarian vs. party lists), and experience of victory affect the number of attempts to win elections. I need literature that analyzes one of these problems, or the phenomenon of multiple-time participation in elections, and perhaps a theoretical frame for this study.
I have samples of monthly totals of births for families in different locations and want to compare the seasonality of births as an indicator of distinctions in work patterns. The suggestion is that if sailors are away from their families at particular times of year, the pattern of births in their families will reflect these absences. The data are arranged by month and year, with the births in each month given as a percentage of that year's total, and with the totals for each month summed and calculated as a proportion of total births. Like this:
Parish 1
        Jan   Feb   Mar   Apr   May   etc   Total
1614      0     0     1     4     2           31
          0%    0%    3%   13%    6%         100%
1615    etc
Total    15    11    13     7     9          144
         19%    8%    9%    5%    6%

Parish 2
        Jan   Feb   Mar   Apr   May   etc   Total
1614      1     3     1     3     1           20
          5%   15%    5%   15%   15%         100%
1615    etc
Total    15    16    13    25    21          183
          8%    9%    7%   14%   11%         100%
To test the significance of the differences, I have been performing simple t-tests on each month. Is this the correct approach, or are there other ways to test differences in the pattern of seasonality? In my statistical work I've not gone much further than hypothesis testing and linear regression.
Any thoughts would be much appreciated.
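An alternative to monthly t-tests is to compare the two parishes' seasonal distributions as a whole with a chi-squared test of homogeneity on the table of monthly totals. A minimal sketch in Python using the five months of totals shown above (a full analysis would use all 12 columns):

```python
from scipy.stats import chi2_contingency

# Monthly birth totals (Jan-May) for the two parishes, from the tables above
parish1 = [15, 11, 13, 7, 9]
parish2 = [15, 16, 13, 25, 21]

chi2, p, dof, expected = chi2_contingency([parish1, parish2])
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
```

A small p-value would indicate that the monthly distribution of births differs between the parishes overall, rather than testing each month in isolation.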
I have two data sets to compare, which are responses to statements using 7-point Likert response anchors ("always untrue" to "always true", coded 1-7), so the data are non-normally distributed and ordinal. In the two groups (n=24 and n=34, responses collected a year apart), 5 cases appear in both data sets (responses at the different time points from the same people), but all other cases are from people who gave responses at only one of the two time points.
If I am looking to hypothesis-test for group differences, what is the best way to work with these groups? Split them into independent and non-independent cases? Analyse them together, ignoring the 5 cases where responses are from the same people? Previously I have analysed other data sets (two and three groups), where all cases were independent of each other both between and within groups, using Mann-Whitney and Kruskal-Wallis tests; but as I have a mix of independent and non-independent cases here, I am unsure how best to treat the data set and which hypothesis test to run.
We're working with the lovely garden eels: snake-like fishes that live in big colonies, attached to the sandy sea bottom. They feed on plankton and hide in their burrows whenever something big approaches. Here's a small video of them: https://www.youtube.com/watch?v=v2WEkd9qMlw
To test whether they're using social information in their evasive behaviour, we found an edge of the colony and, after staying put for 3 minutes to ensure they were not hiding at that point, one of us slowly approached until the first eel retracted. We marked that point as our zero. Then we marked the positions where the closest and farthest eels hid, and measured the distances between our zero and the closest (Ri) and farthest (R1) points.
Now, our null hypothesis is that if Ri and R1 are equal, the information (the evasive behaviour) is not spreading, and therefore there's no use of social information. Our H1, then, is that if information is spreading, R1 > Ri. As every pair of R1 and Ri was taken at the same time, with respect to the same point of reference (zero), and our data did not pass the Shapiro normality test, we're considering a paired Wilcoxon test. Is this appropriate? Our sample size is 68.
Thank you in advance.
If somebody frames a research hypothesis for his/her study, can he/she draw conclusions based on the frequency of responses only, or must a statistical hypothesis test be applied?
(My research is about the development of a fire-retardant (FR) intumescent coating by the addition of additives. My goal is to obtain a sample that is more thermally resistant than the control sample, without neglecting time (aim: a lower average temperature, or a better, negatively related "temperature vs. time" relationship). I performed a horizontal burner fire test on 8 different samples, recording the temperature once per minute for 1 hour for each sample. What statistical tool/method should I use on the gathered data (time vs. temperature) to conclude that sample A is better than the control sample? I am not quite sure whether to use correlation, and I am not knowledgeable enough about other tests. Thank you.
To show a mediation effect, how should we develop the hypotheses? Do we test three hypotheses, one for each of the paths a to b, b to c, and a to c?
Can someone clarify the number of hypotheses to be tested and how to develop these hypotheses? If the discussion is done using an example, it would be a great help.
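In the common regression-based setup with predictor X, mediator M, and outcome Y, path a comes from the regression of M on X, and paths b and c' from the regression of Y on X and M together; the indirect effect is a*b, typically tested by bootstrapping (omitted here). A minimal sketch of the algebra in Python with numpy and invented data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data with a built-in mediation structure: X -> M -> Y
n = 200
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(scale=0.5, size=n)
y = 0.5 * m + 0.1 * x + rng.normal(scale=0.5, size=n)

ones = np.ones(n)

# Path a: regress M on X
a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]

# Paths c' (direct) and b: regress Y on X and M together
coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
c_prime, b = coef[1], coef[2]

print(f"a={a:.2f}, b={b:.2f}, c'={c_prime:.2f}, indirect a*b={a * b:.2f}")
```

This suggests three hypotheses in practice: one for path a, one for path b, and one for the indirect effect a*b itself, with the latter usually assessed via a bootstrap confidence interval.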
I performed a specific experiment involving temperature vs. time. I ran 8 experiments, giving 8 different data sets (time vs. temperature). Can I use the Pearson correlation coefficient to represent the behavior of each graph, and then how can I compare the coefficients to see whether they are significantly different from one another?
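Two Pearson coefficients from independent experiments are commonly compared via Fisher's r-to-z transformation; a minimal sketch in pure Python (the r values and sample sizes are invented):

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of r1 = r2 via Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal approximation
    return z, p

# Hypothetical: temperature-time correlations from two experiments, 60 readings each
z, p = compare_correlations(-0.90, 60, -0.60, 60)
print(f"z={z:.2f}, p={p:.4f}")
```

For comparing all 8 coefficients at once, the pairwise z-tests would need a multiplicity correction across the 28 pairs.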
Hi everyone!
I analyzed 20 tissue samples of oral leukoplakia (OL - an oral potentially malignant disease) through untargeted metabolomics to compare the metabolic profile of those OL who had malignant transformation (5) and those who did not (15). I know that the small sample size is one important limitation of the study, but OL is a rare disease and I have to deal with it.
Well, when I use my complete dataset (around 4k compounds) to perform multivariate analyses such as PLS-DA, my model is overfitted, exhibiting a negative Q2. However, when I use as input the 72 compounds considered statistically significant by the univariate methods (hypothesis tests), my Q2 rises to 0.6. The improvement also occurs when I use this small dataset to build the heatmap, which clearly distinguishes the malignantly transformed from the non-transformed OL. Interestingly, most of the compounds ranked on the PLS-DA VIP list are the same whether I use my whole dataset or the 72 discriminant features as the input.
I recently presented my thesis to a metabolomics specialist and she told me that my analysis is curious and that she cannot tell me whether it is right or wrong.
Would anyone here help me with this question?
Thanks!
As we know, there are 2 types of errors in hypothesis testing: Type I and Type II.
When applied to real-life situations, which is more catastrophic?
Examples would be helpful.
Thanks.
Hello! I've published five randomized controlled trials revealing that a virtual reality job interview training tool increases the odds of employment in those using the virtual tool compared to a community control group (~OR=2.0 with two-tailed tests). The trials were in various groups with serious mental illness. I'm now conducting what was supposed to be a fully powered RCT, in which COVID-19 prematurely ended our recruitment after we had enrolled 68% of our anticipated sample.
Given we have 5 RCTs finding the same outcome, I proposed an a priori directional hypothesis that the virtual interview tool would again increase employment in the latest study. That said, is there a way to compute a directional/one-sided confidence interval for the Odds Ratio?
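On the computational side, a one-sided confidence bound for an odds ratio can be formed on the log scale: replace the two-sided z of 1.96 with the one-sided 1.645 and leave the upper limit unbounded. A sketch using the Woolf (log) method, assuming a 2x2 table of counts (function name and the counts in any example are mine):

```python
import math

def one_sided_or_ci(a, b, c, d, alpha=0.05):
    """Lower one-sided (1 - alpha) confidence bound for the odds ratio
    from a 2x2 table: a, b = employed / not employed in the intervention
    arm; c, d = employed / not employed in the control arm.
    Uses the Woolf (log) method; 1.645 is the z for a one-sided 95% bound."""
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = 1.6448536269514722                          # Phi^-1(0.95)
    lower = math.exp(math.log(or_hat) - z * se)
    return or_hat, lower  # one-sided interval is (lower, infinity)
```

If the lower bound exceeds 1, the one-sided test at alpha = 0.05 supports your a priori directional hypothesis that the tool increases employment.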
I have a large dataset (https://www.kaggle.com/teejmahal20/airline-passenger-satisfaction) regarding airline passenger satisfaction. I applied a decision tree on this dataset and I extracted the feature importance and it seems that the quality of the inflight wifi service is the best predictor for the final satisfaction level of the passengers. Please keep in mind that the target variable is binary in this dataset (satisfied or dissatisfied).
I would like to cross-check this result using "classical" statistics, i.e., hypothesis testing: is the level of satisfaction with the wifi service really a good indicator of whether the passenger will be satisfied overall? The final purpose of my research is to create an algorithm that can feed quality information into an airline's business decision-making process (is it worth investing in service X to improve passenger satisfaction? If the quality of the service improves by a%, then b% of the passengers become satisfied and are more likely to fly with the airline again).
I've identified propensity score matching (PSM) as a way to construct the control and test groups for my hypothesis, but I'm not sure how to apply it or whether it is really what I'm looking for.
Can anyone shed some light into this problem? Any help with respect to properly selecting a control group and a test group for this hypothesis testing will be greatly appreciated!
Many thanks!
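As a simple classical cross-check before reaching for PSM, a chi-square test of independence between the wifi rating and the binary satisfaction label is easy to run. A sketch with an invented contingency table (build the real one from your data, e.g. with pandas' crosstab):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts purely to illustrate the call:
# rows = wifi rating (low / medium / high), columns = (dissatisfied, satisfied).
table = np.array([
    [400, 100],  # low wifi rating
    [300, 300],  # medium
    [100, 500],  # high
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}")
```

A small p-value says the wifi rating and the satisfaction label are associated; it does not by itself establish the causal effect of improving wifi, which is where matching methods like PSM come in.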
My study is about the development of an intumescent coating through the addition of additives A and B. In my research I used samples with different ratios of the two, with the control at 0:0. I performed the horizontal fire test and recorded the temperature of the coated steel over time. I need to show whether there is a significant difference between the samples and compare the correlation coefficients of the samples (whether they are statistically significantly different). I tried a one-way ANOVA with a post-hoc test, but I think time affects the temperatures. Should I try a two-way ANOVA?
In order to verify the validity of a hypothesis, what is the minimum sample size? I only have 45 samples; would that be enough? Is it okay to use qualitative analysis methods to validate the hypothesis?
Hi there!!! I have done a meta-analysis with 6 different datasets to find out significantly differentially abundant bacteria across all datasets.
I have calculated the standardized mean difference (effect size) between the control and test groups for each bacterium in each dataset. So, for a single bacterium, I have 6 different effect sizes. Across these 6 effect sizes, I ran a random-effects model to find the overall effect size across populations for that bacterium, which gave me a p-value.
I did the same for all 200 bacteria. Since this is multiple hypothesis testing, I adjusted the p-values with an FDR correction. After adjustment, I get 8, 11, 14, and 21 differentially abundant bacteria at FDR cut-offs of 0.05, 0.1, 0.15, and 0.2 respectively. In this case,
- can I report the bacteria with FDR < 0.2 or < 0.15? Will it be acceptable in high-quality journals?
- Do the journals have any restrictions for high FDR values like 0.15 or 0.2?
Thanks,
Deep
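For reference, the Benjamini-Hochberg adjustment behind most FDR corrections is short enough to sketch directly. A minimal NumPy version (not a substitute for your meta-analysis package, but useful for sanity-checking the adjusted values):

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    q = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.minimum(q, 1.0)
    out = np.empty(m)
    out[order] = q
    return out
```

A bacterium with q = 0.2 sits in a list where roughly 20% of the reported hits are expected to be false positives; many journals accept that for exploratory screens if it is stated plainly, but confirmatory claims usually need q ≤ 0.05.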
The question has been answered- Closed Thread
Hello, for my undergraduate dissertation I have a model where the dependent variable is Behavioral Intention (BI), and there are many independent variables. I first ran a regression in SPSS with BI in the dependent box and the remaining variables (plus the control variables) in the independent box. Almost all of my hypotheses were supported, except two where the significance was over 0.05. I then re-ran the analysis testing the variables one by one instead of entering them all together (still including the control variables). I noticed that this way the standardized beta coefficients were higher and the significance was almost always 0.000 (i.e., stronger relationships, and all hypotheses supported). I know the first method (multiple linear regression) is probably the correct one, but why does this happen? Note: there are no issues of multicollinearity.
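One common reason for this pattern is that correlated predictors share explained variance: in a simple regression, each predictor also absorbs the effect of the other predictors it correlates with, inflating its coefficient, while the multiple regression splits that shared variance among them. A small simulation sketch (all numbers invented) illustrates the inflation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # x2 correlates with x1
y = x1 + x2 + rng.normal(size=n)          # both truly contribute 1.0

def ols_slopes(X, y):
    """OLS slope estimates (intercept dropped) via least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_simple = ols_slopes(x1[:, None], y)[0]               # x1 alone: ~1.8
b_multi = ols_slopes(np.column_stack([x1, x2]), y)[0]  # with x2: ~1.0
print(b_simple, b_multi)
```

Here x1's simple-regression slope is inflated toward 1 + 0.8 because it picks up x2's contribution through their correlation. Note this can happen even when formal multicollinearity diagnostics (VIFs) look acceptable.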
I was trying to determine whether there are differences in the frequencies of words (lemmas) in a given language corpus starting with the letter K versus starting with the letter M: some 50 000 words starting with K and 54 000 starting with M altogether. I first tried the chi-square test, but the comments below revealed that this was an error.
Hello research fellows,
I would like to understand how I can construct a statistical test of the claim that the relationship between A and B is zero. That is, my ALTERNATIVE HYPOTHESIS predicts a zero relationship between these two variables.
What I understand is: the solution cannot be an insignificant result from a normal t-test, because that test is built under the assumption that the null hypothesis is COR(A, B) = 0 while the alternative is COR(A, B) ≠ 0 (or > / < 0). So there I would have no coherence between my substantive hypothesis about the phenomenon and my statistical test.
Can anybody suggest literature for how to test for zero relationships, preferably in the realm of psychology?
Thank you very much!
Best
Rafael
Hi, I have two datasets of Au values that were analysed with different analytical methods; dataset A has N = 60 and dataset B has N = 252. Which hypothesis-testing method can I use to test whether the two datasets are significantly different from each other?
Has anyone ever used Bayesian modelling for hypothesis testing, instead of the p-values of classical hypothesis testing?
Hello Everyone,
I have one population with a sample size of 200, and I need to find the correlation between variables A and B. Also, which t-test should I choose for hypothesis testing?
Please advise which statistical test in SPSS would be suitable for the research questions below:
Is there a relationship between Variable A and Variable B?
Are there any differences in risk score by gender ( Male vs. Female)?
Thanks in advance.
Will appreciate your response.
Best Regards,
Meraj Farheen Ansari.
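For what it's worth, both research questions map onto standard calls outside SPSS as well: Pearson's correlation for the A-B relationship, and an independent-samples (Welch) t-test for risk score by gender. A sketch on simulated stand-in data (all variable names and numbers are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Invented stand-ins for the real columns (n = 200):
a = rng.normal(50, 10, 200)              # variable A
b = 0.5 * a + rng.normal(0, 10, 200)     # variable B, correlated with A
gender = rng.integers(0, 2, 200)         # 0 = male, 1 = female
risk = rng.normal(30, 5, 200)            # risk score

r, p_corr = stats.pearsonr(a, b)         # RQ1: relationship A-B
t, p_t = stats.ttest_ind(risk[gender == 0], risk[gender == 1],
                         equal_var=False)  # RQ2: risk by gender (Welch)
print(f"r={r:.2f} (p={p_corr:.3g}); t={t:.2f} (p={p_t:.3g})")
```

In SPSS the equivalents are Analyze > Correlate > Bivariate for RQ1 and Analyze > Compare Means > Independent-Samples T Test for RQ2.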
Hi Colleagues-
What do you think about the following methodological issues:
1. Is it methodologically wrong to formulate a quantitative research questionnaire based on inductive reasoning?
2. Is it mandatory to test a hypothesis and/or theory in quantitative research? If not, what are the justifications? (Please suggest an example of an exemption.)
Thanks in advance
Hello everyone, I am using SPSS and the IPA method to analyse my data. My questionnaire included 31 questions on two variables, "Importance" and "Usage" (participants answered on a scale of 1-5). My first hypothesis is
H0- There is no significant difference between the level of importance and the level of usage...
To answer it, I ran a paired t-test in SPSS for each question, BUT 19 of the 31 questions show that the mean difference is NOT statistically significant, while the remaining 12 show that it is. Hence I do not know how to proceed.
Do I accept or reject the null hypothesis, and why?
Thanks!
I have collected data on a variable, and I want to identify its underlying distribution and then draw samples from it.
I have tried fitting several distributions, and almost all of them fail the goodness-of-fit hypothesis tests. What should I do? How should I approach this question?
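As an illustration of the mechanics (and the main caveat), here is a sketch of fitting one candidate family and checking it with a Kolmogorov-Smirnov test. Two things to keep in mind: the standard KS p-value is biased when the parameters were estimated from the same data (the Lilliefors problem), and with a large sample the test will reject even tiny, practically irrelevant deviations, so a visual check (Q-Q plot) or information criteria can be more informative than the p-value alone. The gamma family below is just an invented stand-in:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.gamma(shape=2.0, scale=3.0, size=500)  # stand-in for your sample

# Fit a candidate family, then inspect the KS distance D. The absolute size
# of D (how far the empirical CDF sits from the fitted CDF) is often a more
# useful guide than the p-value when n is large.
params = stats.gamma.fit(data)
D, p = stats.kstest(data, "gamma", args=params)
print(f"KS D={D:.3f}, p={p:.3f}")
```

If no standard family fits acceptably, a kernel density estimate or resampling directly from the empirical distribution are both legitimate ways to generate new samples.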
Dear RG Researchers,
When we apply a null-hypothesis test of the relationship between two datasets, we compare the F-statistic with the critical value, and based on this comparison we either reject the null hypothesis (F greater than the critical value, meaning there is a relationship between the two datasets) or fail to reject it (F less than the critical value); equivalently, we reject when the p-value is below the significance level. Although we all apply this rule, I would like to know the scientific (statistical) reasons behind this comparison for rejecting or accepting.
Best wishes,
Sincerely.
I have an enormous dataset where each row contains a predicted value and a few characteristics (independent variables). I have fitted an ideal linear regression to this dataset. Now, I want to compare the set of independent variables of my ideal regression with the regression of each individual row of my dataset. I appreciate any help. Thank you!
I would be most grateful for advice on interesting clinical cases where interventions have been approved based on hypothesis test results from high-quality RCTs, but where it has subsequently been discovered that the hypothesis test results corresponded to false positives. I am particularly interested in cases where, despite the positive RCT finding, the scientific rationale behind the hypothesis was later discredited.
Many thanks in advance!
I am studying the characteristics of GitHub issues for a project. Based on some criteria, I have classified these issues into two separate groups. For example, if I have a total of 1000 issues for a project, 20 goes to the first group and the remaining 980 goes to the second group. Also, the two groups are highly unbalanced (e.g., 1 issue in the first category to 100 issues in the second category). For all of the issues, I have measured different characteristics, and the measured values for each feature do not follow a normal distribution.
Now, I want to do a null hypothesis testing for each of the measured characteristics to find out if the feature is different for the two groups and ideally how different are they. For example, feature X is significantly different in the two groups and it has higher values in the first group compared to the second group.
Can someone kindly suggest which methods I can use for this purpose?
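Given the non-normal, highly unbalanced groups, one conventional choice per characteristic is the Mann-Whitney U test, paired with Cliff's delta as a distribution-free effect size (delta falls out of U directly, so it answers "how different" as well as "whether different"). A minimal sketch (function name is mine):

```python
import numpy as np
from scipy import stats

def compare_groups(x, y):
    """Mann-Whitney U test plus Cliff's delta effect size.
    delta is in [-1, 1]; 0 means complete overlap, +1 means every
    value in x exceeds every value in y."""
    u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    delta = 2.0 * u / (len(x) * len(y)) - 1.0  # Cliff's delta from U
    return u, p, delta
```

Since you are testing many characteristics, remember to correct the resulting p-values for multiple comparisons (e.g. Benjamini-Hochberg).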
Given a study hypothesis, what are the up-to-date alternatives to null hypothesis significance testing for finding evidence for or against the truth of that hypothesis, and can you summarize how they work?
I want to test the impact of a process change on an operational metric, which is a continuous variable. I have two data sets, pre-test and post-test. Both data sets represent the entire population of events that occurred during the specified time periods, and both have a population size of > 5,000. I want to know if there was a positive or negative change following the intervention and whether that change is statistically significant.
My intuition is to apply a two-tailed z-test, however this particular metric is reported using its 90th percentile rather than its mean. A z-test for proportion doesn't seem to fit either. Essentially, I want to know if a change in the 90th percentile was statistically significant.
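Since the quantity of interest is a 90th percentile rather than a mean or a proportion, one standard option is a bootstrap confidence interval for the change in that percentile: resample each period, recompute the percentile difference, and read off the interval. A sketch (function name and defaults are mine):

```python
import numpy as np

def bootstrap_p90_diff(pre, post, n_boot=10_000, seed=0):
    """Percentile-bootstrap 95% CI for the change in the 90th percentile.
    If the interval excludes 0, the change is significant at alpha = 0.05."""
    rng = np.random.default_rng(seed)
    pre, post = np.asarray(pre), np.asarray(post)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        re_pre = rng.choice(pre, size=pre.size, replace=True)
        re_post = rng.choice(post, size=post.size, replace=True)
        diffs[i] = np.percentile(re_post, 90) - np.percentile(re_pre, 90)
    point = np.percentile(post, 90) - np.percentile(pre, 90)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return point, (lo, hi)
```

One caution: if the two periods truly are complete populations of events rather than samples from an ongoing process, any difference is simply a fact; the inferential question only makes sense if you view the periods as samples from the underlying process before and after the change.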
I have a dataset with an unusual configuration and was hoping for guidance on choosing a method to test for changes in mean.
For context, this is operational data tracking vehicle arrivals at 10 physical sites, and measures relative site volume using a ratio of daily arrivals in relation to the site's capacity. Variation among the 10 ratios is calculated for each day (coefficient of variation).
A program was started that changes the conditions under which vehicles determine which site they drive to (goal is to reduce variation in above described ratios). There have been three different changes in conditions, yet the time lengths they were implemented for were all different:
Baseline - 30 days (cannot be extended)
Phase 1 - 78 days
Phase 2 - 116 days
Phase 3 - 87 days
I'm being asked to determine whether there were significant changes in the mean variation during each phase compared to Baseline. Since I'm testing the same group (the entire vehicle/site system) under three different conditions, I believe three separate paired t-tests would be appropriate. However, I know the sample sizes must be identical for a paired test, and generating a proportional random sample would still give me different sample sizes (obviously). My question is whether it's acceptable to sample a constant number of days from each phase (e.g. 15 days of ratio variations), or whether there is a more appropriate test to use.
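One alternative worth considering: a paired test needs a natural pairing between observations, and a baseline day has no natural partner in a phase day. Treating the daily values as independent samples, an unpaired test such as Welch's t-test tolerates unequal sample sizes (and unequal variances) directly, so no subsampling is needed. A sketch on invented daily coefficient-of-variation values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Invented daily coefficients of variation for each period:
baseline = rng.normal(0.40, 0.05, 30)  # 30 baseline days
phase1 = rng.normal(0.32, 0.05, 78)    # 78 phase-1 days

# Welch's t-test: unpaired, tolerates unequal n and unequal variances,
# so phases need not be cut down to a common number of days.
t, p = stats.ttest_ind(baseline, phase1, equal_var=False)
print(f"t={t:.2f}, p={p:.2g}")
```

Because you will run one comparison per phase against Baseline, consider a multiple-comparison correction (e.g. Bonferroni over the three tests); also be aware that consecutive daily values may be autocorrelated, which inflates significance.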
Hello all! I am new to applying statistical analyses to research problems. I need some help regarding choosing the correct statistical test to analyze my experiments:
I am trying to measure the concentration of a metabolite that cells secrete in response to particular compounds I treat them with, and see if the compounds affect metabolite secretion.
1) I am trying to see how the mean concentration of the metabolite differs between the four groups : control, treatment 1 alone, treatment 2 alone and treatment 1 and 2 in combination. (treatment 1 refers to when I treat my cells with compound 1, treatment 2 refers to when I treat cells with compound 2.)
2) I am trying to see whether the mean metabolite concentration differs between control and treatment 1 at four different time points: 12, 24, 36, and 48 hours post-treatment. At the same time, I am also comparing the mean metabolite concentration under treatment 1 across those four time points (12 vs. 24 vs. 36 vs. 48 hours post-treatment).
Any help is appreciated!
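For question 1, a common first pass is a one-way ANOVA across the four conditions followed by a post-hoc test (e.g. Tukey HSD) to see which pairs differ; a 2x2 factorial (two-way) ANOVA would additionally estimate the treatment 1 x treatment 2 interaction. A sketch on invented measurements (means, SDs, and replicate counts are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Invented metabolite concentrations, 6 replicate wells per condition:
control = rng.normal(10, 1, 6)
t1 = rng.normal(12, 1, 6)
t2 = rng.normal(13, 1, 6)
t1_t2 = rng.normal(16, 1, 6)

# One-way ANOVA across the four conditions; a significant F says at least
# one group mean differs, after which a post-hoc test localizes the pairs.
F, p = stats.f_oneway(control, t1, t2, t1_t2)
print(f"F={F:.1f}, p={p:.2g}")
```

For question 2, where the same design is measured at four time points, a two-way (treatment x time) ANOVA is the usual frame, with repeated-measures or mixed-model variants if the same wells are followed over time.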
Hello, I am looking for advice on choosing a direction for hypothesis/statistical testing in a multivariate analysis. I am interested in determining whether there is a relationship between cyanobacteria bloom frequency and wildfires. My dependent continuous variable is the number of blooms detected, and I have both categorical (NLCD class) and quantitative (fire frequency, wind speed, temperature, etc.) independent variables. I am now planning a PCA to reduce the dimensionality of all these variables, which may lead me to a multiple regression.
I was wondering whether there is another hypothesis test I am missing for multivariate data whose independent variables may be related to one another (e.g., fire frequency and area burned may be related). Someone suggested I look at a PERMANOVA, but I am not sure whether that is the other route I could take.
I am a novice in statistics, so I apologize if I said something incorrect and would appreciate any suggestions/advice.
I have two methods for doing Monte Carlo simulations. With both of them I have run several simulations and obtained the mean and variance of their results. I would like to determine whether the two methods are equivalent. How can I compare them, given that hypothesis tests like Student's t-test can only show that there is no strong evidence against the null hypothesis?
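A standard frame for exactly this problem is equivalence testing, e.g. two one-sided tests (TOST): you choose a margin of practical equivalence and test whether the difference in means lies inside it, so "equivalent" becomes a positive finding rather than a failure to reject. A normal-approximation sketch (function name and margin are yours to choose; the approximation is fine for the large run counts typical of Monte Carlo):

```python
import math

def tost_two_means(m1, s1, n1, m2, s2, n2, margin):
    """Two one-sided tests (TOST) for equivalence of two means.
    margin is the largest |mean difference| considered practically
    negligible. Returns the TOST p-value: p < alpha means the methods
    are equivalent within +/- margin (normal approximation)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    d = m1 - m2
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p_low = 1.0 - phi((d + margin) / se)  # H0: d <= -margin
    p_high = phi((d - margin) / se)       # H0: d >= +margin
    return max(p_low, p_high)
```

Since you also have variances, the same TOST idea can be applied to the variance ratio (on the log scale, or via bootstrap) to check that the methods agree in spread as well as in location.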