Page 1

Sample size determinations in original research

protocols for randomised clinical trials submitted to

UK research ethics committees: review

OPEN ACCESS

Timothy Clark research scientist, Ursula Berger senior statistician, Ulrich Mansmann director, and

chair of biostatistics and bioinformatics

Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Faculty of Medicine, Ludwig-Maximilians University, Munich,

Germany

Abstract

Objectives To assess the completeness of reporting of sample size

determinations in unpublished research protocols and to develop

guidance for research ethics committees and for statisticians advising

these committees.

Design Review of original research protocols.

Study selection Unpublished research protocols for phase IIb, III, and

IV randomised clinical trials of investigational medicinal products

submitted to research ethics committees in the United Kingdom during

1 January to 31 December 2009.

Main outcome measures Completeness of reporting of the sample size

determination, including the justification of design assumptions, and

disagreement between reported and recalculated sample size.

Results 446 study protocols were reviewed. Of these, 190 (43%) justified

the treatment effect and 213 (48%) justified the population variability or

survival experience. Only 55 (12%) discussed the clinical importance of

the treatment effect sought. Few protocols provided a reasoned

explanation as to why the design assumptions were plausible for the

planned study. Sensitivity analyses investigating how the sample size

changed under different design assumptions were lacking; six (1%)

protocols included a re-estimation of the sample size in the study design.

Overall, 188 (42%) protocols reported all of the information to accurately

recalculate the sample size; the assumed withdrawal or dropout rate

was not given in 177 (40%) studies. Only 134 of the 446 (30%) sample

size calculations could be accurately reproduced. Study size tended to

be over-estimated rather than under-estimated. Studies with

non-commercial sponsors justified the design assumptions used in the

calculation more often than studies with commercial sponsors but less

often reported all the components needed to reproduce the sample size

calculation. Sample sizes for studies with non-commercial sponsors

were less often reproduced.

Conclusions Most research protocols did not contain sufficient

information to allow the sample size to be reproduced or the plausibility

of the design assumptions to be assessed. Greater transparency in the

reporting of the determination of the sample size and more focus on

study design during the ethical review process would allow deficiencies

to be resolved early, before the trial begins. Guidance for research ethics

committees and statisticians advising these committees is needed.

Introduction

The determination of sample size is central to the design of

randomised controlled trials.1To have scientific validity a

clinical study must be appropriately designed to meet clearly

defined objectives.2 3Clinical trials should provide precise

estimates of treatment effects, thus allowing healthcare

professionals to make informed decisions based on sound

evidence.3Equally, trials should not be too large, as these may

expose some patients to unnecessary risks. An extensive

literature on sample size calculations in clinical research now

existsforawidevarietyofdatatypesandstatisticaltests.4-9The

International Conference on Harmonisation of technical

requirementsforregistrationofpharmaceuticalsforhumanuse,

topic E9, sets down the requirements for sample size reporting

in research protocols for studies supporting the registration of

drugs for use in humans.1Although these standards primarily

concerncommercialsponsors,theprinciples(box1)havebroad

application to all clinical trials. The consolidated standards of

reportingtrialsstatementprovidessimilarguidanceforpublished

randomised trials.10 11

Surprisingly few evaluations have been made of the quality of

the sample size determination in randomised controlled trials.

Those that have been performed are mainly based on published

data owing to the difficulty in obtaining access to unpublished

Correspondence to: U Mansmann mansmann@ibe.med.uni-muenchen.de

Extra material supplied by the author (see http://www.bmj.com/content/346/bmj.f1135?tab=related#webextra)

Filter criteria used to identify studies in research ethics database

Additional figures and tables

No commercial reuse: See rights and reprints http://www.bmj.com/permissionsSubscribe: http://www.bmj.com/subscribe

BMJ 2013;346:f1135 doi: 10.1136/bmj.f1135 (Published 21 March 2013)Page 1 of 10

Research

RESEARCH

Page 2

Box 1: Core components of the sample size determination (International Conference on Harmonisation topic E9)1

Trial objective

For example, superiority, non-inferiority, equivalence

Study design

For example, parallel group, crossover, factorial

Primary variables

Clinically most relevant endpoints from the patients’ perspective

Statistical test procedure

For example, t test for continuous variables, χ2test for binary variables

Allocation ratio

Ratio of number of participants in each treatment arm

Treatment difference sought

Minimal effect that has clinical relevance or the anticipated effect of the new treatment, where this is larger

Other design assumptions

For example, variance, response rates, and event rates, used in the calculation

Type I error

Probability of erroneously rejecting the null hypothesis

Type II error

Probability of erroneously failing to reject the alternative hypothesis:

Expected rate of treatment withdrawals

Expected proportion of subjects with no post-randomisation information

Key considerations

Justification of treatment difference sought and other design assumptions

Investigation of how sample size changes under different assumptions (sensitivity analyses)

Other components depending on study design

Accrual and total study duration used to estimate the number of patients required in event driven studies

Adjustments for multiple testing—for example, multiple endpoints, multiple checks during interim monitoring

research protocols.12 13These reviews have several limitations.

Firstly, the reporting of the sample size determination in the

study publication is less detailed than in the research protocol.

Secondly, they are affected by publication bias. Thirdly, one

study showed that there are often discrepancies between the

researchprotocolandthepublication.14Consequently,definitive

conclusions about the quality of sample size determinations

should be based on a review of original research protocols.

Studies that are too large or too small have been branded as

unethical.2 15 16The view that underpowered studies are in

themselves unethical has been challenged by some researchers,

who argue that this is too simplistic.17-19We believe that a study

mustbejudgedonwhetheritisappropriatelydesignedtoanswer

the research question posed, and the validity of the sample size

calculation is germane to this assessment. This is not merely a

matter of whether the sample size can be recalculated, since the

calculation can be correct mathematically but still be of poor

qualityiftheassumptionsusedhavenotbeensuitablyresearched

and qualified.20Greater transparency in the reporting of the

sample size determination and more focus on study design

during the ethical review process would allow deficiencies to

be resolved early, before the trial begins; once the trial starts it

istoolate.Weassessedthequalityofsamplesizedeterminations

reported in research protocols with the aim of developing

guidance for research ethics committees.

Methods

Wesearchedtheresearchethicsdatabase,awebbaseddatabase

applicationformanagingtheadministrationoftheethicalreview

process in the United Kingdom, using filter criteria (see

supplementary file) to identify all validated applications for

randomised (phase IIb, III, and IV) clinical trials of

investigational medicinal products submitted to the National

Research Ethics Service for ethical review during 2009. We

designed these criteria to create a large database of recently

submitted protocols (2009 was the last complete year before

the project started in 2010) for randomised controlled trials.

Creation of the protocol database

Three researchers extracted the characteristics of the studies

(table 1⇓) from the research ethics database according to

prespecifiedrulesandenteredthedataintotheprotocoldatabase.

Two reviewers independently assessed each research protocol.

The researchers met regularly to discuss and agree on the final

data to be entered into the database.

To verify the data sources we checked that the information in

the research ethics database was consistent with the research

protocol on file at the research ethics committee office.

Data analysis

ThedatabasewasanalysedusingSPSSVersion19.Wedescribe

the results using frequency tables with percentages, cross

tabulations, relative risks with 95% confidence interval, and

box and Bland-Altman plots.

No commercial reuse: See rights and reprints http://www.bmj.com/permissionsSubscribe: http://www.bmj.com/subscribe

BMJ 2013;346:f1135 doi: 10.1136/bmj.f1135 (Published 21 March 2013) Page 2 of 10

RESEARCH

Page 3

Assessing the sample size determination

We assessed sample size determination based on three factors:

the reporting of how the sample size was determined, the

reporting of and justification for the design assumptions, and

recalculation of the original sample size determination.

Reporting of sample size determination

Wereviewedeachprotocoltodeterminethepresenceorabsence

of the core sample size components. Reporting of additional

informationsuchasadjustmentformultipletesting(forexample,

multipleendpoints,multiplechecksduringinterimmonitoring)

required for the sample size calculation was also documented.

We did not assess the appropriateness of the proposed methods

of analysis.

Reporting and justifying design assumptions

Each design assumption was categorised (box 2). We also

documented the reporting of sensitivity analyses and

consideration of an adaptive design. We did not independently

assess the appropriateness of the design assumptions.

Recalculation of original sample size

determination

The three researchers who created the protocol database

recalculated the original sample size according to prespecified

rules.Twoindependentreviewerscarriedouteachrecalculation;

the researchers met regularly to discuss and agree the final data

tobeenteredintothedatabase.Anyoutstandingquestionswere

referred to a fourth reviewer for resolution (n=65).

If the sample size determination stated that a specific statistical

software had been used (for example, nQuery Advisor, East,

PASS) or referenced a specific publication, then we used the

same software or published methodology to recalculate the

samplesize.Iftheprotocolstatedthatthesamplesizewasbased

on a more complex method of analysis, such as analysis of

covariance then we used PASS 11 or nQuery 6.01. Otherwise

weusedstandardformulasfornormal,binary,orsurvivaldata.4-6

Missing information was imputed in four ways. Firstly, if the

withdrawal rate was not specified we recalculated the sample

size using the variables given. This recalculated sample size

wasthencomparedwiththesamplesizereportedintheprotocol.

Secondly, if the type I error or type II error (power of a trial is

1−probability of a type II error) was not specified, we

recalculated the sample size using a two sided 5% type I error

or a 20% type II error. Thirdly, when adjustments for multiple

testing were not reported we assumed no adjustment had been

applied.Finally,ifthesamplesizewasbasedonamorecomplex

method of analysis, but insufficient information was reported

to allow the sample size to be recalculated for the planned

method, we used standard formulas to recalculate the sample

size.

We defined two populations for analysis: protocols where

missing information was imputed and protocols that reported

all core components and any additional information such as

adjustments for multiple testing required to accurately

recalculate the sample size (complete reporting).

Iftheratioofthenumberofevaluablepatientsoreventsreported

in the protocol to that calculated fell within the range 0.95 to

1.05, then we reproduced the sample size, since a difference of

5% or less either way represented an inconsequential reduction

or increase in power (approximately 2% for normal, binary, or

survival data).

Results

A total of 929 research protocols were identified by the initial

search. Of these, 446 met the inclusion criteria (see

supplementary figure 5). Table 2⇓ lists the main characteristics

of the 446 research protocols (also see supplementary table 5).

The most common therapeutic areas were oncology (94; 21%)

and endocrinology (49; 11%). Most studies were sponsored by

industry(314;70%),wereinphaseIII(251;56%),hadaparallel

group design (319; 72%), and had superiority of the test over

control medicinal product as the primary objective (375; 84%).

Six (1%) protocols included sample size re-estimation in the

study design.

Reporting of sample size components

The individual core components of the sample size were

generally reported in the 446 protocols, with the exception of

withdrawals (269; 60%, fig 1⇓) (also see supplementary table

6). Of the 446 protocols, 240 (54%) reported all the core

components; withdrawal rate was the only element missing in

143 out of 206 (69%) protocols that did not report all core

components.

Whenweconsideredprotocolsthatreportedallcorecomponents

and additional information such as adjustments for multiple

testing to accurately recalculate the sample size (complete

reporting) then the number reduced to 188 protocols (42%).

Reporting design assumptions

Lessthanhalfofthe446protocols(190;43%)reportedthedata

on which the treatment difference (or margin) was based. Of

the 190 protocols that did report the basis of the treatment

difference, 92 (48%) cited previous studies with the product or

aproductinthesameclassand38(20%)citedaliteraturesearch

(fig 2⇓ and supplementary table 7). In only four (2%) protocols

wastheestimatedtreatmentdifferencebasedonameta-analysis.

Reporting the basis for the treatment difference was lowest in

studies on oncology (28/94; 30%) and cardiovascular disease

(12/36; 33%) and highest in those on pain and anaesthesia

(16/27; 59%) (see supplementary table 8).

Overall, 55 out of 446 (12%) protocols reported both the basis

of the treatment effect and its clinical importance, 135 (30%)

protocols reported the basis only, and 256 (57%) reported

neither. Limited information on the nature of the data

underpinning the treatment effect was usually given, and just

13 (3%) protocols gave a reasoned explanation why the value

chosen was plausible for the planned study.

The same pattern was observed with population variability or

survival, with less than half (213/446; 48%) of the protocols

reporting the basis of the variable used in the calculation (fig 2

andsupplementarytable9).Previousstudies,aliteraturesearch,

or both, were again most commonly cited. The variability or

survival estimate was based on a meta-analysis in only two of

the213(1%)protocols.Again,limitedinformationwasusually

given, and just 17 (4%) protocols explained the plausibility of

the value chosen.

Only 11 out of the 446 (3%) protocols reported analyses

investigating the sensitivity of the sample size to deviations

from the assumptions used in the calculation.

Reporting of strategies to control type I (false

positive) and type II (false negative) error

Adjustmentsformultiplecomparisons(81/144;56%)orinterim

analyses (56/95; 59%) were reported in just over half of the

researchprotocolswiththesedesignfeatures(seesupplementary

No commercial reuse: See rights and reprints http://www.bmj.com/permissionsSubscribe: http://www.bmj.com/subscribe

BMJ 2013;346:f1135 doi: 10.1136/bmj.f1135 (Published 21 March 2013)Page 3 of 10

RESEARCH

Page 4

Box 2: Categorisation of design assumptions

Variable with no justification

Variable and data on which assumption based (the “basis”)

Details of the data underpinning the variable—for example, previous studies with the new drug or products in the same therapeutic

class, physician survey, meta-analysis, literature search given

Treatment difference and data on which assumption based and discussion of its clinical importance

Required more than a simple statement that the “treatment difference was clinically important”. Reference to a specific study or studies

in which the clinically relevant difference has been determined, or a detailed clinical discussion of why the investigators considered the

difference sought to be meaningful

Variable and data on which assumption based and a reasoned explanation for choice

Required a discussion of the data underpinning the variable and an explanation why the value used in the sample size calculation was

plausible for the planned study

table 10). The potential for increasing the type II error was not

considered in any study with multiple comparisons. If all

co-primaryvariablesmustbesignificanttodeclaresuccessthen

the type II error rate can be inflated, resulting in reduction in

the overall study power.1 22

Recalculation of the original sample size

determination

If all protocols were considered using the rules for imputing

missing information then 262 of out 446 (59%) sample size

determinations could be reproduced, with 51 (11%)

under-estimated and 103 (23%) over-estimated. Thirty (7%) of

the original sample size calculations could not be recalculated

(see supplementarytable 11). Figure 3⇓ shows a box plot of the

relativedifferencesbetweenthereportedandrecalculatedsample

sizes.

A total of 134 of the 188 (71%) sample size calculations from

protocols with complete reporting could be reproduced, with

20 (11%) under-estimated and 34 (18%) over-estimated,

respectively. The reproducibility of the sample size increased

withmorecomprehensivereporting,primarilywithdrawalrates

and adjustments for multiple testing. None the less, both

analyses showed a tendency for over-estimation, and in total

only 134 of the 446 (30%) original sample size calculations

could be accurately reproduced.

Supplementary figure 6 shows a Bland-Altman plot comparing

reported and calculated sample sizes.

Commercialversusnon-commercialsponsors

The reporting of the core components of the sample size

determination did not differ noticeably between studies with

commercial and non-commercial sponsors (fig 4⇓ and

supplementarytable12).Studieswithnon-commercialsponsors

weremorelikelythanthosewithcommercialsponsorstoreport

the basis for design assumptions (relative risk 1.69, 95%

confidence interval 1.38 to 2.08 for treatment difference and

1.29,1.07to1.56forvarianceandsurvival).Conversely,studies

with non-commercial sponsors were less likely than those with

commercial sponsors to report adjustments for multiple

comparisons (0.26, 0.13 to 0.50) and interim analyses (0.54,

0.31 to 0.93) and provide complete reporting (0.60, 0.45 to

0.81);thesamplesizecalculationfromprotocolsofstudieswith

non-commercial sponsors was also less likely to be reproduced

(0.72, 0.59 to 0.88).

Discussion

Our review suggests that the reporting of the sample size

determination in the research protocol often lacks essential

information.TreatmentdifferenceandtypeIerrorwereusually

given,butwithdrawalratesandadjustmentsformultipletesting

wereoftenmissing.Only188of446(42%)protocolscontained

sufficientinformationtoaccuratelyrecalculatethesamplesize.

Morethanhalfoftheresearchprotocolsprovidednojustification

for the assumptions used in the sample size calculation. When

a justification was given, it generally lacked detail. Sensitivity

analyses,whichcanhelpinvestigatorsunderstandthereliability

of the variables used in the sample size calculation and whether

samplesizere-estimationshouldbeincludedinthestudydesign,

were rarely reported.23-25

Imputing missing information resulted in 262 out of 446 (59%)

reproducedsamplesizes.Thisincreasedto134outof188(71%)

whenonlycompletereportswereconsidered.Overall,only134

of the 446 (30%) sample size calculations could be accurately

reproduced. Study size tended to be over-estimated rather than

under-estimated.

Ourresearch,thefirstextensivereviewofunpublishedresearch

protocols, raises several problems with the statistical planning

of randomised controlled trials, in particular the limited

consideration afforded to the choice of design assumptions.

Sample size determinations are highly sensitive to changes in

design assumptions, which behoves sponsors to be rigorous

when estimating these variables.26Moreover, if the degree of

uncertainty is high then design assumptions should be checked

during the course of the trial.26

Limitations of this review

We only reviewed the research protocol submitted to the

research ethics committee and had no access to any other

documents. Moreover, our review was completely independent

of the ethical review process. The protocols were submitted in

2009 to the UK National Research Ethics Service and reflect

clinical research practice at that time. None the less, the sample

is relatively recent and many sponsors planned to include sites

both within and outside the United Kingdom, so we believe our

findings can be generalised to other countries and regions for

commercialstudieswhereglobalregulatoryrequirementsexist.

For non-commercial studies, the quality of reporting depends

on the investigators experience. We did not verify the

appropriateness of the design assumptions used in the sample

size determination in this research project.

Implications of the findings

In many instances the validity of the sample size determination

andbyextensionthescientificvalidityofthestudy—oneofthe

main aspects of the ethical review process—could not be

judged.2The available evidence suggests that key sample size

assumptionsarenotdeterminedinarigorousmanner.Thismay

explain why large differences have been observed between

designassumptionsandobserveddata.13 27Furthermore,sample

No commercial reuse: See rights and reprints http://www.bmj.com/permissionsSubscribe: http://www.bmj.com/subscribe

BMJ 2013;346:f1135 doi: 10.1136/bmj.f1135 (Published 21 March 2013) Page 4 of 10

RESEARCH

Page 5

sizes tended to be over-estimated, which is a concern given the

challengesofrecruitingtorandomisedcontrolledtrials.28Finally,

methodologies to check assumptions and re-estimate sample

size during the study are often not applied, despite the fact that

these methods are encouraged by regulatory authorities.29 30

Investigators should be rigorous in the determination of design

assumptions.Thereisno“onesizefitsall”approach.Sufficient

information should be reported to allow the sample size to be

reproduced and show that there is solid reasoning behind the

assumptions used in the calculation (box 3).

We would also ask the suppliers of software used to calculate

sample size to consider including the withdrawal and dropout

rate in the package to ensure that this is taken into account and

reported in the research protocol.

A poorly designed trial cannot be saved once it is completed.

Greater transparency in the reporting of sample size

determinations in research protocols would facilitate the early

detection of deficiencies in the study design. Moreover, better

justification of the design assumptions in the research protocol

would facilitate the overall ethical review process.31

Despite calls for a different approach to sample size

determination,webelievethatthereisnosubstituteforspending

time designing the study and giving due consideration to the

risks and how these can be tackled.13 32

Wherever the responsibility for scientific and statistical review

lies,webelieveclearguidanceonthesamplesizedetermination

should be provided and followed. Individuals with appropriate

statistical expertise should also play a central role in the ethical

review of research protocols.33Improving the review process

toplacemorefocusonstudydesignwastheaimoftheNational

Research Ethics Service at the start of this project and we

propose to use the results of our research to develop guidance,

working with the ethics service and others interested in this

area.

We thank the National Research Ethics Service for its support; IBE

research assistants Mathias Heibeck and Linda Hayanga for their

contribution to this project; and Michael Campbell, Sir Iain Chalmers,

Douglas Altman, Gary Collins, and Hugh Davies for their comments on

this work.

Contributors: UM had full access to all of the study data and takes

responsibility for the integrity of the data and the accuracy of the data

analyses. He is guarantor. TC and UM conceived and designed the

study. TC, Mathias Heibeck, and Linda Hayanga collected the data. TC

and UB carried out the statistical analyses. TC, Mathias Heibeck, Linda

Hayanga, and UM recalculated the sample sizes. TC, UM, and UB

interpreted the data. TC drafted the manuscript. UM and UB critically

reviewed the manuscript. The views expressed in the paper are those

of the authors and do not necessarily reflect the views of the National

Research Ethics Service, which became a part of the Health Research

Authority in December 2011.

Funding: This study received no funding.

Competing interests: All authors have completed the ICMJE uniform

disclosure form at www.icmje.org/coi_disclosure.pdf (available on

request from the corresponding author) and declare that: TC received

support for travel from the National Research Ethics Service and has

worked as a consultant for the clinical research organisation ICON in

the previous three years; they have no other relationships or activities

that could appear to have influenced the submitted work.

Ethical approval: Not required.

Data sharing: No additional data available.

1ICH harmonised tripartite guideline. Statistical principles for clinical trials E9. Current Step

4 version dated 5 February 1998.

Emanuel EJ, Wendler D, Grady C. What makes clinical research ethical? JAMA

2000;283:2701-11.

Victor N. Prüfung derwissenschaftlichen Qualität und biometriespezifischerAnforderungen

durch die Ethikkommissionen? Medizinrecht 1999;9:408-12.

Julious SA. Sample sizes for clinical trials. Chapman and Hall, 2009.

Chow SC, Shao J, Wang H. Sample size calculations in clinical research. Chapman and

Hall/CRC, 2003.

Machin D, Campbell M, Tan SB, Tan SH. Sample size tables for clinical studies, 3rd ed.

Wiley-Blackwell, 2008.

Julious SA. Tutorial in biostatistics: sample sizes for clinical trials with normal data. Stat

Med 2004;23:1921-86.

Julious SA, Campbell MJ, Altman DG. Estimating sample sizes for continuous, binary

and ordinal outcomes in paired comparisons: practical hints. J Biopharm Stat

1999;9:241-51.

Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered

categorical, and continuous outcomes in two group comparisons. BMJ 1995;311:1145-8.

Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised

CONSORT statement for reporting randomized trials: explanation and elaboration. Ann

Intern Med 2001;134:663-94.

Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations

for improving the quality of reports of parallel group randomised trials. Lancet

2001;357:1191-4.

Chan AW, Upshur R, Singh JA, Ghersi D, Chapuis F, Altman DG. Research protocols:

waiving confidentiality for the greater good. BMJ 2006;332:1086-9.

Charles P, Giraudeau B, Dechartres A, Baron G, Ravaud P. Reporting of sample size

calculation in randomised controlled trials: review. BMJ 2009;338:b1732.

Chan AW, Hrobjartsson A, Jorgensen KJ, Gotzsche PE, Altman DG. Discrepancies in

sample size calculations and data analyses reported in randomised trials: comparison of

publications with protocols. BMJ 2008;337:a2299.

Altman DG. Statistics and ethics in medical research III: how large a sample? BMJ

1980;281:1336-8.

Halpern SD, Karlawish JHT, Berlin JA. The continuing unethical conduct of underpowered

clinical trials. JAMA 2002;288:358-62.

Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and

mystical. Lancet 2005;365:1348-53.

Bacchetti P, Wolf LE, Segal MR, McCulloch CE. Ethics and sample size. Am J Epidemiol

2005;161:105-10.

Chalmers I. Cardiotocography v Doppler auscultation: all unbiased comparative studies

should be published. BMJ 2002;324:483-5.

Scharf HP, Mansmann U, Streitberger K, Witte S, Krämer J, Maier C, et al. Acupuncture

and knee osteoarthritis: a three-armed randomized trial. Ann Intern Med 2006;145:12-20.

Wood L, Egger M, Gluud LL, Schulz KF, Juni P, Altman DG, et al. Empirical evidence of

bias in treatment effect estimates in controlled trials with different interventions and

outcomes: meta-epidemiological study. BMJ 2008;336:601-5.

Julious SA, McIntyre NE. Sample sizes for trials involving multiple correlated must-win

comparisons. Pharm Stat 2012;11:177-85.

Senn SJ. Statistical issues in drug development. Wiley, 2007:198-9.

Bacchetti P. Current sample size conventions: flaws, harms, and alternatives. BMC Med

2010;8:17-23.

Kieser M, Wassmer G. On the use of the upper confidence limit for the variance from a

pilot sample for sample size determination. Biom J 1996;38:941-9.

Julious SA. Designing clinical trials with uncertain estimates of variability. J Pharm Stat

2004;3:261-8.

Djulbegovic B, Kumar A, Magazin A, Schroen A, Soares H, Hozo I, et al. Optimism bias

leads to inconclusive results—an empirical study. J Clin Epidemiol 2011;64:583-93.

McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, et. al. What

influences recruitment to randomised controlled trials? A review of trials funded by two

UK funding agencies. Trials 2006;7:9.

FDA draft guidance for industry, adaptive design clinical trials for drugs and biologics.

FDA, Feb 2010.

Kieser M, Friede T. Simple procedures for blinded sample size adjustment that do not

affect the type I error rate. Stat Med 2003;22:3571-81.

Savulescu J, Chalmers I, Blunt J. Are research ethics committees behaving unethically?

Some suggestions for improving performance and accountability. BMJ 1996;313:1390-3.

Norman G, Monteiro S, Salama S. Sample size calculations: should the emperor’s clothes

be off the peg or made to measure? BMJ 2012;345:e5278.

Williamson P, Hutton JL, Bliss J, Blunt J, Campbell MJ, Nicholson R. Statistical review

by research ethics committees. J R Stat Soc Ser A 2000;163:5-13.

Accepted: 31 January 2013

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

Cite this as: BMJ 2013;346:f1135

This is an open-access article distributed under the terms of the Creative Commons

Attribution Non-commercial License, which permits use, distribution, and reproduction in

any medium, provided the original work is properly cited, the use is non commercial and

is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-

nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.

No commercial reuse: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe

BMJ 2013;346:f1135 doi: 10.1136/bmj.f1135 (Published 21 March 2013) Page 5 of 10

RESEARCH