# Should I use an independent two-sample or a paired-samples t-test to test for a mean difference between age- and sex-matched samples with a 1:2 ratio?

I have a case group and a control group that are age- and sex-matched at a ratio of 1 case to 2 controls. I'm wondering which test is suitable for testing their mean difference. For matched groups, some suggest a paired t-test and some use an independent t-test. But I find it difficult to use a paired t-test because the matching is 1:2 rather than 1:1.

I find it difficult to treat the data as matched when the ratio is not 1:1. Any ideas? Thanks.


## Popular Answers

Deleted

Carl Schwarz.

Dudley Gentles · University of Auckland

## All Answers (44)

Tania Limongi · King Abdullah University of Science and Technology

Raul Castro

Mikhail Atepalikhin · Gymnasium, Novy Urengoy, YaNAD, Russia

Jesse Reynolds · Yale University

Naceur M'Hamdi · National Institute of Agronomy, Tunis

I understand from your case that you have one treatment to compare with a control, so the appropriate test is Dunnett's. The t-test is used if you compare two treatments.

Huanying Qin · Baylor Scott & White Health

Alan Bugbee · istation

The question about paired vs independent t-tests is not about sample size. The real concern is whether or not they are independent of one another. The fact that one group is twice as large as the other is not a concern unless it is a paired-sample experiment. If you really want to do a paired comparison, see if you can match the groups on a common element (e.g., demographics, grades, etc.).

Perhaps I've missed the point, but I do not see your concern about sample size and type of statistical analysis to perform.

Alan Bugbee · istation

Mark Krushat · Tampere University of Technology

Rip Stauffer

Mahmoud El-Daly · The University of Calgary

Dudley Gentles · University of Auckland

Deleted

Carl Schwarz.

Jason Leung · The Chinese University of Hong Kong

An example from SPSS for the paired t-test:

In a study on high blood pressure, all patients are measured at the beginning of the study, given a treatment, and measured again. Thus, each subject has two measures, often called before and after measures. An alternative design for which this test is used is a matched-pairs or case-control study, in which each record in the data file contains the response for the patient and also for his or her matched control subject. In a blood pressure study, patients and controls might be matched by age (a 75-year-old patient with a 75-year-old control group member).

See also this reference: http://www.spssvideotutor.com/paired-samples-t-test/

Thanks, Carl, for suggesting the generalized randomized block design!

James Schmeidler · Icahn School of Medicine at Mount Sinai

Deleted

Y = Set(R) Type Set(R)*Type

where Set(R) is the blocking variable (the set) and is a random effect; Type is the Case or Control.
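In conventional mixed-model notation, the model statement above might be written as follows. This is only a sketch: the symbols ($\mu$, $S_i$, $\tau_j$, and so on), the indexing, and the normality assumptions on the random terms are assumptions added here, not part of the original answer.

$$
Y_{ijk} = \mu + S_i + \tau_j + (S\tau)_{ij} + \varepsilon_{ijk},
\qquad
S_i \sim N(0, \sigma_S^2), \quad
(S\tau)_{ij} \sim N(0, \sigma_{S\tau}^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma^2)
$$

Here $i$ indexes the matched set, $j \in \{\text{case}, \text{control}\}$ is the Type, and $k$ indexes people within a set and type ($k = 1$ for the case, $k = 1, 2$ for the two controls). The fixed effect $\tau_j$ carries the case-control difference of interest.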

Because you have replicated "treatments" within each set (the multiple controls), you are able to "test" for a block-treatment interaction, which normally is not possible in RCBs. See Addelman, Sidney (Oct. 1969). "The Generalized Randomized Block Design". The American Statistician 23 (4): 35–36, for more details. You should NOT pool the Set(R)*Type and the Residual error, as they measure different things. See Gates, Charles E. (Nov. 1995). "What Really Is Experimental Error in Block Designs?". The American Statistician 49 (4): 362–363.

Note that saying that the design is a "two-way anova" isn't really enough. That is like saying that I have a car without specifying the make or model. This is a single-factor GRCB design. There really is only 1 factor (the case or control). The experimental units are people. Blocks are NOT factors! Blocks restrict the randomization and so have a different type of effect than factor levels. Within each block, you need to assume that the patients selected for case or control are a simple random sample from the population of people within that block. Without blocking, you select a random sample of people from the entire population of people.

Now it is true that the arithmetic may look the same for a "two-factor anova" and the single factor GRCB, but that doesn't make it a two-factor design.

Muhammed Abdullah Abdulrahman · Sudan Atomic Energy Commission

1. The data are classified as male and female.

2. Both males and females are normally distributed.

3. The variances of the two groups are equal.

4. The two groups (males and females) are independent.

The t-test for paired samples is absolutely not relevant to your data. Please take care when you check the assumptions, and note that you have alternative tests (non-parametric tests). I wish you all the best.

Jambulingam Subramani · Pondicherry University

Herman Ader · Independent Researcher

Another problem that may make it necessary to use techniques other than an F-test (analysis of variance) or t-test, even when the dependent variable is continuous, is possible non-normality of the dependent variable.

In such cases one may revert to nonparametric tests or to bootstrapping the parametric test statistic.

Agresti, A. (2002), Categorical data analysis, Second edition. Hoboken, NJ: Wiley.

Yar Yot · Loei Rajabhat University

Ramesh Patil · Ashwini Medical College Hospital

Leili Piri

The independent two-sample test is better than the other methods.

Richard A Rode · AbbVie

Neither the two-sample t-test nor the paired t-test is appropriate in this case. As others have indicated, a model-based approach adjusting for age and sex is most appropriate. For example, use ANOVA for continuous measures and logistic regression for binary outcomes.

Best wishes,

Rick

Nana Celestin · Foundation of Applied Statistics and Data Management (FASTDAM)

It is clear that only an independent-samples test can apply, but you have to screen your data for normality to decide between a parametric and a non-parametric test. Screen your data for normality using the Kolmogorov-Smirnov or Shapiro-Wilk test. You can also run case summary statistics for the case and control groups separately and compare the mean and the median for each subset. If the mean is close to the median, the subset is plausibly normally or approximately normally distributed. Reinforce your assumption with skewness and kurtosis.

Regards.
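The per-group summaries described in this answer (mean, median, skewness, kurtosis) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function name `shape_summary` and the data are hypothetical, and the skewness and kurtosis are the simple moment-based versions, which may differ slightly from the bias-adjusted values SPSS reports.

```python
from statistics import fmean, median

def shape_summary(values):
    """Return mean, median, moment skewness and excess kurtosis of one group."""
    n = len(values)
    m = fmean(values)
    # Central moments of order 2, 3 and 4 (population versions, divisor n).
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return {
        "mean": m,
        "median": median(values),
        "skewness": m3 / m2 ** 1.5,             # 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }

# Hypothetical measurements, summarised separately for each group.
case_values = [5.1, 6.0, 5.8, 6.4, 5.5]
control_values = [4.8, 5.0, 5.5, 5.9, 5.6, 5.2, 6.0, 6.2, 5.1, 5.3]

for label, values in (("case", case_values), ("control", control_values)):
    print(label, shape_summary(values))
```

Note that a later answer in this thread argues that the normality assumption concerns the model residuals rather than these marginal summaries.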

David Lester Morris · King's College London

Bhargavarama Sarma Bharathula · University of Hyderabad

James Schmeidler · Icahn School of Medicine at Mount Sinai

Despite the terms "Randomized" and "Design", a "Generalized Randomized Block Design" does not refer to a randomized design for a study, but instead to a statistical analysis of blocking that may be employed for either a randomized or a non-randomized design. Thus the results of the analysis of this study, using a Generalized Randomized Block Design, must not be misinterpreted as having any benefit of randomization.

Shijing si · Renmin University of China

used to deal with data in which a change happened to every subject. By the way, I guess ANOVA may also suit your case. Anyway, you can give it a try.

Zahra Hooshyari · Allameh Tabatabai University

You can use the paired t-test only if you want to compare two groups in one of these situations: 1. completely matched groups; 2. studies of twins; 3. studies of couples; 4. the same group measured on different variables or in different positions; 5. the same group assessed by different people.

Yar Yot · Loei Rajabhat University

Puchong Praekhaow · King Mongkut's University of Technology Thonburi

Avinash Kadam · Rasayani Biologics Pvt. Ltd

As the data are matched, there is no need to adjust the scores.

Deleted

Suggestions about testing for normality are also off the mark. The actual assumption is that the residuals (the differences between the observed and the expected responses) are normally distributed. Testing whether the marginal distribution of the variables is normally distributed is again wrong. For example, consider the independent two-sample t-test. You DO NOT TEST if the pooled data over both samples are normally distributed -- that is simply wrong, as the marginal distribution is a mixture of the two distributions for each group. In this case with matching, the marginal distribution of the response is a horrible mixture of distributions across all of the "blocks".

So in this case, if the response variable is continuous, use a variant of the GRCB as outlined earlier. If the residuals show evidence of non-normality (and this is non-trivial to really examine, because the matching automatically constrains the observed residuals of the 3 case/control observations in a block to sum to zero), you could go to a non-parametric version of the GRCB. Of course, as noted above, because this is an observational study and presumably no randomization took place to assign patients to case or control, you must be careful about interpreting the results.
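The sum-to-zero constraint on the residuals can be seen in a small numerical sketch. The data below are hypothetical, and the fit used is a simple additive one (set mean + type mean - grand mean) rather than the full GRCB model with its Set-by-Type term; it is only meant to show why the three residuals in each block are not free to vary independently.

```python
from statistics import fmean

# Hypothetical matched sets: one case value and two control values per set.
matched_sets = [
    (5.1, (4.8, 5.0)),
    (6.0, (5.5, 5.9)),
    (5.8, (5.6, 5.2)),
    (6.4, (6.0, 6.2)),
    (5.5, (5.1, 5.3)),
]

def additive_residuals(sets):
    """Residuals from the additive fit: set mean + type mean - grand mean."""
    case_mean = fmean(case for case, _ in sets)
    control_mean = fmean(y for _, controls in sets for y in controls)
    grand_mean = fmean(v for case, controls in sets for v in (case, *controls))
    residuals = []
    for case, controls in sets:
        set_mean = fmean((case, *controls))
        fitted_case = set_mean + case_mean - grand_mean
        fitted_control = set_mean + control_mean - grand_mean
        # One residual for the case, then one per control in this set.
        residuals.append([case - fitted_case] + [y - fitted_control for y in controls])
    return residuals

residuals = additive_residuals(matched_sets)
for r in residuals:
    # Each block's three residuals sum to zero up to floating-point error.
    print([round(v, 3) for v in r], "sum:", round(sum(r), 10))
```

Because of this constraint, a normality check on the residuals has fewer effectively free values than the raw count of observations suggests.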

If the response variable is categorical, then a log-linear or (conditional) logistic regression is appropriate. Similar concerns apply to interpreting the results, because of the lack of real randomization within the "blocks".

Jason Leung · The Chinese University of Hong Kong

Bekele Belayihun · Haramaya University

There are two groups of methods: 1. comparison of groups (t-test and ANOVA); 2. correlation and regression.

t-test (paired and unpaired)

UNPAIRED: the data are independent; that is, there is one nominal-level variable with two mutually exclusive groups as the independent variable, e.g. comparing mean birth weight between males and females.

By independence of the data we mean that the data value of each study subject arises independently, uninfluenced by and uncorrelated with the data values of the other subjects. The distribution of the dependent (numerical) variable is assumed to be normal. The test also assumes that the variances of the dependent variable in the two groups are equal; this assumption is called the requirement of homogeneity of variance.
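As a sketch of how the equal-variance assumption enters the unpaired test, the pooled two-sample t statistic can be computed as below. The function name `pooled_t` and the birth-weight values are hypothetical; a p-value would come from a t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
from statistics import fmean, variance

def pooled_t(group_a, group_b):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (fmean(group_a) - fmean(group_b)) / se, df

# Hypothetical birth weights (kg) for two independent groups.
males = [3.4, 3.6, 3.1, 3.8, 3.5]
females = [3.2, 3.0, 3.3, 3.4, 3.1]

t_stat, df = pooled_t(males, females)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The group sizes do not need to be equal for this statistic, which is why a 1:2 ratio by itself is not an obstacle to the unpaired test; the question is whether the two groups are independent.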

PAIRED: data are said to be paired if the study subjects from one population can be matched or paired with particular subjects in the second population. Paired data arise naturally from studies of twins and of paired objects such as the eyes or ears of the same individual.

The advantage of paired data is that smaller sample sizes are needed, because the similarity within pairs decreases variability. Paired data also arise in studies where observations are taken before and after an intervention on the same study subject.

Paired data can be analysed by taking the difference between the members of each pair and testing whether the differences are significantly different from zero, assuming the differences are normally distributed, e.g. comparing the mean blood pressure of diabetic patients before and after some intervention or treatment: one group measured twice rather than two independent groups.

But your outcome variable is categorical (case or control), so it is not possible to use the independent two-sample or paired-samples t-test (it is the wrong method of analysis).

When the dependent (outcome, response) variable is categorical, the best methods of analysis for your study or data are:

1. Chi-squared test of independence (to check the association between a categorical dependent variable and a continuous or categorical independent variable).

2. Matched-pairs test (the data in cross-sectional and unmatched case-control studies are independent samples, whereas before-and-after, crossover and matched case-control studies give paired data): each cell of the table represents a number of pairs with a given combination of values, and the cells total the number of pairs.

3. For paired samples, McNemar's test is the proper method of analysis to test the hypothesis.

4. Conditional logistic regression (bivariate and multivariable analysis) to identify factors that affect the dependent variable (case or control) while controlling for other independent variables and confounding. The crude and adjusted ORs are computed together with the corresponding confidence intervals. A good fit is measured by the Hosmer-Lemeshow test.
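For item 3, McNemar's test uses only the discordant pairs of a 1:1 paired table. The sketch below is illustrative: the function name `mcnemar` and the counts are hypothetical, no continuity correction is applied, and the test as written covers 1:1 pairs with a binary response, not the 1:2 sets in the question.

```python
import math

def mcnemar(b, c):
    """McNemar chi-squared statistic and p-value from discordant pair counts.

    b: pairs where only the first member has the characteristic
    c: pairs where only the second member has the characteristic
    """
    if b + c == 0:
        raise ValueError("no discordant pairs, so the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts of discordant pairs.
stat, p_value = mcnemar(15, 5)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")
```

The concordant pairs do not enter the statistic at all, which is the sense in which the analysis conditions on the pairing.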

How to select the appropriate statistical analysis: selection criteria for statistical tests

First, you have to define the level of measurement of each variable to be included in the analysis.

Second, to select the correct statistical analysis, you have to clarify what you want to find out.

Third, sample size calculation or power analysis is directly related to the statistical test that is chosen.

The selection of a statistical test is based on the purpose of the test, the experimental design, and the type of variable (generally, measurement or rank).

Warren L May · University of Mississippi Medical Center

Basilea Watson · National Institute for Research in Tuberculosis

http://www.bmj.com/content/309/6962/1128.full

Vic Siskind · Queensland University of Technology

$$E(x_r) = \mu + \theta_r, \qquad E(y_r) = \eta + \theta_r, \qquad r = 1, 2, \ldots, n.$$

On $H_0$, $\mu = \eta$. Also $\theta_r \neq \theta_s$ for $r \neq s$, and $\operatorname{var}(x) = \operatorname{var}(y) = \sigma^2$. So

$$E \sum_{r} (x_r - \bar{x})^2 = (n-1)\sigma^2 + \sum_{r} (\theta_r - \Theta)^2,$$

where $\Theta$ is the mean of the $\theta_r$. Similarly for the sample estimate of $\operatorname{var}(y)$. If the $\theta$'s vary substantially (and if they do not, the matching is unnecessary), the denominator of $t$ is much larger than it should be.

So your variable of analysis should be $d_r = x_r - (y_{r1} + y_{r2})/2$, because $E(d_r) = \mu - \eta$, which is $0$ on $H_0$, and

$$E \sum_{r} (d_r - \bar{d})^2 = (n-1)\operatorname{var}(d) = \tfrac{3}{2}(n-1)\sigma^2,$$

which is free of the $\theta_r$, as it should be.
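The analysis suggested in this answer, a one-sample t-test on d = x - (y1 + y2)/2 computed within each matched set, can be sketched as follows. The function name `matched_1_to_2_t` and the data are hypothetical, and only the t statistic and its degrees of freedom are computed; the p-value would come from a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import fmean, stdev

def matched_1_to_2_t(cases, controls):
    """One-sample t statistic on d = case - mean of its two matched controls.

    cases: one value per matched set
    controls: one (y1, y2) pair per matched set, in the same order
    """
    if len(cases) != len(controls):
        raise ValueError("each case needs exactly one pair of controls")
    d = [x - (y1 + y2) / 2 for x, (y1, y2) in zip(cases, controls)]
    n = len(d)
    mean_d = fmean(d)
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return mean_d / se, n - 1, mean_d

# Hypothetical data for five matched sets.
cases = [5.1, 6.0, 5.8, 6.4, 5.5]
controls = [(4.8, 5.0), (5.5, 5.9), (5.6, 5.2), (6.0, 6.2), (5.1, 5.3)]

t_stat, df, mean_d = matched_1_to_2_t(cases, controls)
print(f"mean d = {mean_d:.3f}, t = {t_stat:.3f} on {df} degrees of freedom")
```

Averaging the two controls first removes the set effect from each difference, so the between-set variation does not inflate the denominator of t.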

Altaf Khan · National Guard Health Affairs

- In a 1:1 matched study, the matched set consists of one case and one control from each stratum; this is the most commonly used design.

- In a 1:m matched study, the matched set consists of one case and m controls (in your case m is 2); in general m lies between 2 and 5.

- In an m:n matched study, the matched set consists of n cases and m controls, with both m and n ranging from 1 to 5.

For your 1:2 matched case-control study, you need to use the proportional hazards regression model. For reference, see Breslow and Day (1980) and Collett (1991). I have borrowed these ideas from 'Categorical Data Analysis Using the SAS System' by Maura E. Stokes, Charles S. Davis and Gary G. Koch.

Mikhail Nikulin · Université Bordeaux

Jagdish Khubchandani · Ball State University

Fabio Montanaro · Latis Srl, Italy, Genova

Stephen Senn · LIH Luxembourg Institute of Health