Keith E Muller

West Virginia University, Morgantown, WV, USA

Publications (39) · 60.17 Total impact

  • Article: Confidence regions for repeated measures ANOVA power curves based on estimated covariance.
    ABSTRACT: Background: Using covariance or mean estimates from previous data introduces randomness into each power value in a power curve. Creating confidence intervals about the power estimates improves study planning by allowing scientists to account for the uncertainty in the power estimates. Driving examples arise in many imaging applications. Methods: We use both analytical and Monte Carlo simulation methods. Our analytical derivations apply to power for tests with the univariate approach to repeated measures (UNIREP). Approximate confidence intervals and regions for power based on an estimated covariance matrix and fixed means are described. Extensive simulations are used to examine the properties of the approximations. Results: Closed-form expressions are given for approximate power and confidence intervals and regions. Monte Carlo simulations support the accuracy of the approximations for practical ranges of sample size, rank of the design matrix, error degrees of freedom, and the amount of deviation from sphericity. The new methods provide accurate coverage probabilities for all four UNIREP tests, even for small sample sizes. Accuracy is higher for higher power values than for lower power values, making the methods especially useful in practical research conditions. The new techniques allow the plotting of power confidence regions around an estimated power curve, an approach that has been well received by researchers. Free software makes the new methods readily available. Conclusions: The new techniques allow a convenient way to account for the uncertainty of using an estimated covariance matrix in choosing a sample size for a repeated measures ANOVA design. Medical imaging and many other types of healthcare research often use repeated measures ANOVA.
    BMC Medical Research Methodology 04/2013; 13(1):57. · 2.67 Impact Factor
  • Article: Marginal Vitamin B-6 Deficiency Decreases Plasma (n-3) and (n-6) PUFA Concentrations in Healthy Men and Women.
    ABSTRACT: Previous animal studies showed that severe vitamin B-6 deficiency altered fatty acid profiles of tissue lipids, often with an increase of linoleic acid and a decrease of arachidonic acid. However, little is known about the extent to which vitamin B-6 deficiency affects human fatty acid profiles. The aim of this study was to determine the effects of marginal vitamin B-6 deficiency on fatty acid profiles in plasma, erythrocytes, and peripheral blood mononuclear cells (PBMC) of healthy adults fed a 28-d, low-vitamin B-6 diet. Healthy participants (n = 23) received a 2-d, controlled, vitamin B-6-adequate diet followed by a 28-d, vitamin B-6-restricted diet to induce a marginal deficiency. Plasma HDL and LDL cholesterol concentrations, FFA concentrations, and erythrocyte and PBMC membrane fatty acid compositions did not significantly change from baseline after the 28-d restriction. Plasma total arachidonic acid, EPA, and DHA concentrations decreased from (mean ± SD) 548 ± 96 to 490 ± 94 μmol/L, 37 ± 13 to 32 ± 13 μmol/L, and 121 ± 28 to 109 ± 28 μmol/L [positive false discovery rate (pFDR) adjusted P < 0.05], respectively. The total (n-6):(n-3) PUFA ratio in plasma exhibited a minor increase from 15.4 ± 2.8 to 16.6 ± 3.1 (pFDR adjusted P < 0.05). These data indicate that short-term vitamin B-6 restriction decreases plasma (n-3) and (n-6) PUFA concentrations and tends to increase the plasma (n-6):(n-3) PUFA ratio. Such changes in blood lipids may be associated with the elevated risk of cardiovascular disease in vitamin B-6 insufficiency.
    Journal of Nutrition 09/2012; 142(10):1791-7. · 3.92 Impact Factor
  • Article: Adaptive trial designs: a review of barriers and opportunities.
    ABSTRACT: Adaptive designs allow planned modifications based on data accumulating within a study. The promise of greater flexibility and efficiency stimulates increasing interest in adaptive designs from clinical, academic, and regulatory parties. When adaptive designs are used properly, efficiencies can include a smaller sample size, a more efficient treatment development process, and an increased chance of correctly answering the clinical question of interest. However, improper adaptations can lead to biased studies. A broad definition of adaptive designs allows for countless variations, which creates confusion as to the statistical validity and practical feasibility of many designs. Determining properties of a particular adaptive design requires careful consideration of the scientific context and statistical assumptions. We first review several adaptive designs that garner the most current interest. We focus on the design principles and research issues that lead to particular designs being appealing or unappealing in particular applications. We separately discuss exploratory and confirmatory stage designs in order to account for the differences in regulatory concerns. We include adaptive seamless designs, which combine stages in a unified approach. We also highlight a number of applied areas, such as comparative effectiveness research, that would benefit from the use of adaptive designs. Finally, we describe a number of current barriers and provide initial suggestions for overcoming them in order to promote wider use of appropriate adaptive designs. Given the breadth of the coverage all mathematical and most implementation details are omitted for the sake of brevity. However, the interested reader will find that we provide current references to focused reviews and original theoretical sources which lead to details of the current state of the art in theory and practice.
    Trials 08/2012; 13(1):145. · 2.02 Impact Factor
  • Article: Global hypothesis testing for high-dimensional repeated measures outcomes.
    ABSTRACT: High-throughput technology in metabolomics, genomics, and proteomics gives rise to high dimension, low sample size data when the number of metabolites, genes, or proteins exceeds the sample size. For a limited class of designs, the classic 'univariate approach' for Gaussian repeated measures can provide a reasonable global hypothesis test. We derive new tests that not only accurately allow more variables than subjects, but also give valid analyses for data with complex between-subject and within-subject designs. Our derivations capitalize on the dual of the error covariance matrix, which is nonsingular when the number of variables exceeds the sample size, to ensure correct statistical inference and enhance computational efficiency. Simulation studies demonstrate that the new tests accurately control Type I error rate and have reasonable power even with a handful of subjects and a thousand outcome variables. We apply the new methods to the study of metabolic consequences of vitamin B6 deficiency. Free software implementing the new methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects.
    Statistics in Medicine 12/2011; 31(8):724-42. · 1.88 Impact Factor
  • Article: Avoiding bias in mixed model inference for fixed effects.
    Matthew J Gurka, Lloyd J Edwards, Keith E Muller
    ABSTRACT: Analysis of a large longitudinal study of children motivated our work. The results illustrate how accurate inference for fixed effects in a general linear mixed model depends on the covariance model selected for the data. Simulation studies have revealed biased inference for the fixed effects with an underspecified covariance structure, at least in small samples. One underspecification common for longitudinal data assumes a simple random intercept and conditional independence of the within-subject errors (i.e., compound symmetry). We prove that the underspecification creates bias in both small and large samples, indicating that recruiting more participants will not alleviate inflation of the Type I error rate associated with fixed effect inference. Enumerations and simulations help quantify the bias and evaluate strategies for avoiding it. When practical, backwards selection of the covariance model, starting with an unstructured pattern, provides the best protection. Tutorial papers can guide the reader in minimizing the chances of falling into the often spurious software trap of nonconvergence. In some cases, the logic of the study design and the scientific context may support a structured pattern, such as an autoregressive structure. The sandwich estimator provides a valid alternative in sufficiently large samples. Authors reporting mixed-model analyses should note possible biases in fixed effects inference because of the following: (i) the covariance model selection process; (ii) the specific covariance model chosen; or (iii) the test approximation.
    Statistics in Medicine 07/2011; 30(22):2696-707. · 1.88 Impact Factor
  • Article: Combining an Internal Pilot with an Interim Analysis for Single Degree of Freedom Tests.
    ABSTRACT: An internal pilot with interim analysis (IPIA) design combines interim power analysis (an internal pilot) with interim data analysis (two stage group sequential). We provide IPIA methods for single df hypotheses within the Gaussian general linear model, including one and two group t tests. The design allows early stopping for efficacy and futility while also re-estimating sample size based on an interim variance estimate. Study planning in small samples requires the exact and computable forms reported here. The formulation gives fast and accurate calculations of power, type I error rate, and expected sample size.
    Communications in Statistics - Theory and Methods 12/2010; 39(20):3717-3738. · 0.27 Impact Factor
  • Article: Kronecker product linear exponent AR(1) correlation structures and separability tests for multivariate repeated measures
    ABSTRACT: Longitudinal imaging studies have moved to the forefront of medical research due to their ability to characterize spatio-temporal features of biological structures across the lifespan. Credible models of the correlations in longitudinal imaging require two or more pattern components. Valid inference requires enough flexibility of the correlation model to allow reasonable fidelity to the true pattern. On the other hand, the existence of computable estimates demands a parsimonious parameterization of the correlation structure. For many one-dimensional spatial or temporal arrays, the linear exponent autoregressive (LEAR) correlation structure meets these two opposing goals in one model. The LEAR structure is a flexible two-parameter correlation model that applies in situations in which the within-subject correlation decreases exponentially in time or space. It allows for an attenuation or acceleration of the exponential decay rate imposed by the commonly used continuous-time AR(1) structure. Here we propose the Kronecker product LEAR correlation structure for multivariate repeated measures data in which the correlation between measurements for a given subject is induced by two factors. We also provide a scientifically informed approach to assessing the adequacy of a Kronecker product LEAR model and a general unstructured Kronecker product model. The approach provides useful guidance for high dimension, low sample size data that preclude using standard likelihood based tests. Longitudinal medical imaging data of caudate morphology in schizophrenia illustrates the appeal of the Kronecker product LEAR correlation structure.
    10/2010
  • Article: A linear exponent AR(1) family of correlation structures.
    ABSTRACT: In repeated measures settings, modeling the correlation pattern of the data can be immensely important for proper analyses. Accurate inference requires proper choice of the correlation model. Optimal efficiency of the estimation procedure demands a parsimonious parameterization of the correlation structure, with sufficient sensitivity to detect the range of correlation patterns that may occur. Many repeated measures settings have within-subject correlation decreasing exponentially in time or space. Among the variety of correlation patterns available for this context, the continuous-time first-order autoregressive correlation structure, denoted AR(1), sees the most utilization. Despite its wide use, the AR(1) structure often poorly gauges within-subject correlations that decay at a slower or faster rate than required by the AR(1) model. To address this deficiency we propose a two-parameter generalization of the continuous-time AR(1) model, termed the linear exponent autoregressive (LEAR) correlation structure, which accommodates much slower and much faster decay patterns. Special cases of the LEAR family include the AR(1), compound symmetry, and first-order moving average correlation structures. Excellent analytic, numerical, and statistical properties help make the LEAR structure a valuable addition to the suite of parsimonious correlation models for repeated measures data. Both medical imaging data concerning neonate neurological development and longitudinal data concerning diet and hypertension [DASH (Dietary Approaches to Stop Hypertension) study] exemplify the utility of the LEAR correlation structure.
    Statistics in Medicine 07/2010; 29(17):1825-38. · 1.88 Impact Factor
  • Article: Using scientifically and statistically sufficient statistics in comparing image segmentations
    Yueh-Yun Chi, Keith E Muller
    ABSTRACT: Automatic computer segmentation in three dimensions creates opportunity to reduce the cost of three-dimensional treatment planning of radiotherapy for cancer treatment. Comparisons between human and computer accuracy in segmenting kidneys in CT scans generate distance values far larger in number than the number of CT scans. Such high dimension, low sample size (HDLSS) data present a grand challenge to statisticians: how do we find good estimates and make credible inference? We recommend discovering and using scientifically and statistically sufficient statistics as an additional strategy for overcoming the curse of dimensionality. First, we reduced the three-dimensional array of distances for each image comparison to a histogram to be modeled individually. Second, we used non-parametric kernel density estimation to explore distributional patterns and assess multi-modality. Third, a systematic exploratory search for parametric distributions and truncated variations led to choosing a Gaussian form as approximating the distribution of a cube root transformation of distance. Fourth, representing each histogram by an individually estimated distribution eliminated the HDLSS problem by reducing on average 26,000 distances per histogram to just 2 parameter estimates. In the fifth and final step we used classical statistical methods to demonstrate that the two human observers disagreed significantly less with each other than with the computer segmentation. Nevertheless, the size of all disagreements was clinically unimportant relative to the size of a kidney. The hierarchical modeling approach to object-oriented data created response variables deemed sufficient by both the scientists and statisticians. We believe the same strategy provides a useful addition to the imaging toolkit and will succeed with many other high throughput technologies in genetics, metabolomics and chemical analysis.
    Statistics and Its Interface 01/2010; 3:91-101.
  • Article: Real longitudinal data analysis for real people: building a good enough mixed model.
    ABSTRACT: Mixed effects models have become very popular, especially for the analysis of longitudinal data. One challenge is how to build a good enough mixed effects model. In this paper, we suggest a systematic strategy for addressing this challenge and introduce easily implemented practical advice to build mixed effects models. A general discussion of the scientific strategies motivates the recommended five-step procedure for model fitting. The need to model both the mean structure (the fixed effects) and the covariance structure (the random effects and residual error) creates the fundamental flexibility and complexity. Some very practical recommendations help to conquer the complexity. Centering, scaling, and full-rank coding of all the predictor variables radically improve the chances of convergence, computing speed, and numerical accuracy. Applying computational and assumption diagnostics from univariate linear models to mixed model data greatly helps to detect and solve the related computational problems. Applying computational and assumption diagnostics from the univariate linear models to the mixed model data can radically improve the chances of convergence, computing speed, and numerical accuracy. The approach helps to fit more general covariance models, a crucial step in selecting a credible covariance model needed for defensible inference. A detailed demonstration of the recommended strategy is based on data from a published study of a randomized trial of a multicomponent intervention to prevent young adolescents' alcohol use. The discussion highlights a need for additional covariance and inference tools for mixed models. The discussion also highlights the need for improving how scientists and statisticians teach and review the process of finding a good enough mixed model.
    Statistics in Medicine 12/2009; 29(4):504-20. · 1.88 Impact Factor
  • Article: POWERLIB: SAS/IML Software for Computing Power in Multivariate Linear Models
    ABSTRACT: The POWERLIB SAS/IML software provides convenient power calculations for a wide range of multivariate linear models with Gaussian errors. The software includes the Box, Geisser-Greenhouse, Huynh-Feldt, and uncorrected tests in the "univariate" approach to repeated measures (UNIREP), the Hotelling-Lawley Trace, Pillai-Bartlett Trace, and Wilks Lambda tests in the "multivariate" approach (MULTIREP), as well as a limited but useful range of mixed models. The familiar univariate linear model with Gaussian errors is an important special case. For estimated covariance, the software provides confidence limits for the resulting estimated power. All power and confidence limit values can be output to a SAS dataset, which can be used to easily produce plots and tables for manuscripts.
    Journal of Statistical Software 01/2009
  • Article: An R² statistic for fixed effects in the linear mixed model.
    ABSTRACT: Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R² statistic in the linear univariate model naturally creates great interest in extending it to the linear mixed model. We define and describe how to compute a model R² statistic for the linear mixed model by using only a single model. The proposed R² statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R² statistic arises as a 1-1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R² statistic leads immediately to a natural definition of a partial R² statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R², a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
    Statistics in Medicine 10/2008; 27(29):6137-57. · 1.88 Impact Factor
  • Article: Limitations of high dimension, low sample size principal components for Gaussian data
    ABSTRACT: Medical images and genetic assays typically generate High Dimension, Low Sample Size (HDLSS) data, namely more variables than independent sampling units. Scientists often use Principal Components Analysis (PCA) of sample covariance matrices to work around the limitations of HDLSS. We provide analytic results and Monte Carlo simulations for Gaussian data which strongly discourage the practice. All but a negligible fraction of population variation must occur in a set of dominant components far fewer in number than the number of independent sampling units. The results demonstrate why statisticians must assess the empirical performance of any analysis method in the HDLSS setting. Expressing HDLSS data in terms of underlying canonical forms helps develop analytic and sample properties.
    03/2008
  • Article: GLUMIP 2.0: SAS/IML Software for Planning Internal Pilots
    ABSTRACT: Internal pilot designs involve conducting interim power analysis (without interim data analysis) to modify the final sample size. Recently developed techniques have been described to avoid the type I error rate inflation inherent to unadjusted hypothesis tests, while still providing the advantages of an internal pilot design. We present GLUMIP 2.0, the latest version of our free SAS/IML software for planning internal pilot studies in the general linear univariate model (GLUM) framework. The new analytic forms incorporated into the updated software solve many problems inherent to current internal pilot techniques for linear models with Gaussian errors. Hence, the GLUMIP 2.0 software makes it easy to perform exact power analysis for internal pilots under the GLUM framework with independent Gaussian errors and fixed predictors.
    Journal of Statistical Software 01/2008
  • Article: Effects of home access and availability of alcohol on young adolescents' alcohol use.
    ABSTRACT: The purpose of the present study was to examine the effects of parental provision of alcohol and home alcohol accessibility on the trajectories of young adolescent alcohol use and intentions. Data were part of a longitudinal study of alcohol use among multi-ethnic urban young adolescents who were assigned randomly to the control group of a prevention trial. Data were collected from a cohort of youth, and their parents, who attended public schools in Chicago, Illinois (2002-2005). The sample comprised the 1388 students, and their parents, who had been assigned randomly to the control group and were present and completed surveys at baseline, in the beginning of 6th grade (age 12). The sample was primarily low-income, and African American and Hispanic. Students completed self-report questionnaires when in the 6th, 7th and 8th grades (age 12-14 years; response rates 91-96%). Parents of the 6th grade students also completed questionnaires (70% response rate). Student report, at age 12, of parental provision of alcohol and home alcohol availability, and parental report of providing alcohol to their child and the accessibility of alcohol in the home, were associated with significant increases in the trajectories of young adolescent alcohol use and intentions from ages 12-14 years. Student report of receiving alcohol from their parent or taking it from home during their last drinking occasion were the most robust predictors of increases in alcohol use and intentions over time. Results indicate that it is risky for parents to allow children to drink during early adolescence. When these findings are considered together with the risks associated with early onset of alcohol use, it is clear that parents can play an important role in prevention.
    Addiction 11/2007; 102(10):1597-608. · 4.31 Impact Factor
  • Article: Internal pilots for a class of linear mixed models with Gaussian and compound symmetric data
    ABSTRACT: An internal pilot design uses interim sample size analysis, without interim data analysis, to adjust the final number of observations. The approach helps to choose a sample size sufficiently large (to achieve the statistical power desired), but not too large (which would waste money and time). We report on recent research in cerebral vascular tortuosity (curvature in three dimensions) which would benefit greatly from internal pilots due to uncertainty in the parameters of the covariance matrix used for study planning. Unfortunately, observations correlated across the four regions of the brain and small sample sizes preclude using existing methods. However, as in a wide range of medical imaging studies, tortuosity data have no missing or mistimed data, a factorial within-subject design, the same between-subject design for all responses, and a Gaussian distribution with compound symmetry. For such restricted models, we extend exact, small sample univariate methods for internal pilots to linear mixed models with any between-subject design (not just two groups). Planning a new tortuosity study illustrates how the new methods help to avoid sample sizes that are too small or too large while still controlling the type I error rate.
    Statistics in Medicine 09/2007; 26(22):4083-4099. · 1.88 Impact Factor
  • Article: Statistical tests with accurate size and power for balanced linear mixed models.
    ABSTRACT: The convenience of linear mixed models for Gaussian data has led to their widespread use. Unfortunately, standard mixed model tests often have greatly inflated test size in small samples. Many applications with correlated outcomes in medical imaging and other fields have simple properties which do not require the generality of a mixed model. Alternately, stating the special cases as a general linear multivariate model allows analysing them with either the univariate or multivariate approach to repeated measures (UNIREP, MULTIREP). Even in small samples, an appropriate UNIREP or MULTIREP test always controls test size and has a good power approximation, in sharp contrast to mixed model tests. Hence, mixed model tests should never be used when one of the UNIREP tests (uncorrected, Huynh-Feldt, Geisser-Greenhouse, Box conservative) or MULTIREP tests (Wilks, Hotelling-Lawley, Roy's, Pillai-Bartlett) apply. Convenient methods give exact power for the uncorrected and Box conservative tests. Simulations demonstrate that new power approximations for all four UNIREP tests eliminate most inaccuracy in existing methods. In turn, free software implements the approximations to give a better choice of sample size. Two repeated measures power analyses illustrate the methods. The examples highlight the advantages of examining the entire response surface of power as a function of sample size, mean differences, and variability.
    Statistics in Medicine 08/2007; 26(19):3639-60. · 1.88 Impact Factor
  • Article: Practical Methods for Bounding Type I Error Rate with an Internal Pilot Design
    ABSTRACT: New analytic forms for distributions at the heart of internal pilot theory solve many problems inherent to current techniques for linear models with Gaussian errors. Internal pilot designs use a fraction of the data to re-estimate the error variance and modify the final sample size. Too small or too large a sample size caused by an incorrect planning variance can be avoided. However, the usual hypothesis test may need adjustment to control the Type I error rate. A bounding test achieves control of Type I error rate while providing most of the advantages of the unadjusted test. Unfortunately, the presence of both a doubly truncated and an untruncated chi-square random variable complicates the theory and computations. An expression for the density of the sum of the two chi-squares gives a simple form for the test statistic density. Examples illustrate that the new results make the bounding test practical by providing very stable, convergent, and much more accurate computations. Furthermore, the new computational methods are effectively never slower and usually much faster. All results apply to any univariate linear model with fixed predictors and Gaussian errors, with the t-test a special case.
    Communications in Statistics - Theory and Methods 08/2007; 36(11):2143-2157.
  • Article: On the expected values of sequences of functions
    Deborah H. Glueck, Keith E. Muller
    ABSTRACT: We prove new extensions to lemmas about combinations of convergent sequences of distribution functions and absolutely continuous bounded functions. New lemma one, a generalized Helly theorem, allows computing the limit of the expected value of a sequence of functions with respect to a sequence of measures. Previously published results allow either the function or the measure to be a sequence, but not both. Lemma two allows computing the expected value of an absolutely continuous monotone function by integrating the probabilities of the inverse function values. Previous results were restricted to the identity function. Lemma three gives a computationally and analytically convenient form for the limit of the expected value of a sequence of functions of a sequence of random variables. This is a new result that follows directly from the first two lemmas. Although the lemmas resemble standard results and seem obviously true, we have found only similar looking and related but quite distinct results in the literature. We provide examples which highlight the value of the new results.
    Communications in Statistics - Theory and Methods 02/2007; 30(2):363-369 (2001).
  • Article: Comparison of calcification specificity in digital mammography using soft-copy display versus screen-film mammography.
    ABSTRACT: The purpose of this study was to compare specificity in the interpretation of calcifications in soft-copy reviewing of digital mammograms versus hard-copy reviewing of screen-film mammograms. A total of 130 consecutive cases with calcifications (44 malignant and 86 benign) that had been evaluated with needle or surgical biopsy were collected. Both screen-film mammography and soft-copy digital mammography were obtained in the same patients under existing research protocols using Fischer Imaging's SenoScan (n = 71), Lorad's digital mammography system (n = 35), and GE Healthcare's Senographe 2000D (n = 24). Eight trained radiologists scored all lesions, cropped or masked to display just the region of interest, both on screen-film and soft-copy digital mammography with a month between reviews to reduce the effects of learning and memory. A 5-point malignancy scale was used, with 1 as definitely not, 2 as probably not, 3 as possibly, 4 as probably, and 5 as definitely. Reviewers were randomly assigned condition order, and images within each condition were randomly ordered. Repeated measures analysis of variance was used to test for differences between conditions in specificity computed via nonparametric receiver operating characteristic (ROC) study separately for each reviewer and condition. Across all reviewers, the mean specificity for 1 or 2 versus 3, 4, or 5 was 0.803 for screen-film mammography (range, 0.413-0.938; SD ± 0.166) and 0.833 for soft-copy image (range, 0.375-0.951; SD ± 0.187). Although not statistically significant (Student's t test p values from 0.19 to 0.99 across all cut points), numeric values of specificity were consistently higher for soft-copy versus screen-film mammography. No statistical significance in specificity was seen using all possible cut points in the 5-point scale, although the primary analysis used the cutpoint for differentiation between benign and malignant cases as 1 or 2 versus 3, 4, or 5. No statistically significant difference was shown in specificity achievable using soft-copy digital versus screen-film mammography in this study.
    American Journal of Roentgenology 08/2006; 187(1):47-50. · 2.78 Impact Factor