Margaret Sullivan Pepe's research while affiliated with Fred Hutch Cancer Center and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (234)
Background:
Previous studies of second opinions in the diagnosis of melanocytic skin lesions have examined blinded second opinions, which do not reflect usual clinical practice. The current study, conducted in the USA, investigated both blinded and nonblinded second opinions for their impact on diagnostic accuracy.
Methods:
In total, 100 melanoc...
Importance
Diagnostic variation among pathologists interpreting cutaneous melanocytic lesions could lead to suboptimal care.
Objective
To estimate the potential association of second-opinion strategies in the histopathologic diagnosis of cutaneous melanocytic lesions with diagnostic accuracy and 1-year population-level costs in the US.
Design, Se...
Biomarkers abound in many areas of clinical research, and often investigators are interested in combining them for diagnosis, prognosis, or screening. In many applications, the true positive rate (TPR) for a biomarker combination at a prespecified, clinically acceptable false positive rate (FPR) is the most relevant measure of predictive capacity....
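To make the quantity in this abstract concrete, the following is a minimal sketch of an empirical TPR at a prespecified FPR for a single score. It is not the authors' combination method; the function name `tpr_at_fpr` and the tie-handling rule (a score strictly above the threshold counts as positive) are assumptions made for illustration.

```python
def tpr_at_fpr(case_scores, control_scores, fpr):
    """Empirical TPR at a threshold whose empirical FPR does not exceed `fpr`.

    Hypothetical helper for illustration: a subject is called positive
    when the score is strictly greater than the threshold.
    """
    controls = sorted(control_scores, reverse=True)
    # Number of controls that may fall above the threshold.
    k = int(fpr * len(controls))
    # Threshold is the (k+1)-th largest control score; if every control
    # may be positive, use minus infinity so that all subjects are positive.
    threshold = controls[k] if k < len(controls) else float("-inf")
    tpr = sum(s > threshold for s in case_scores) / len(case_scores)
    return tpr, threshold


# Toy data (made up): ten controls, four cases, target FPR of 20%.
controls = list(range(1, 11))
cases = [5, 9, 10, 12]
print(tpr_at_fpr(cases, controls, 0.2))
```

Fixing the threshold from the control distribution first, and only then reading off the TPR among cases, mirrors the idea of evaluating a marker at a clinically acceptable false positive rate.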
The cancer early-detection biomarker field was, compared with the therapeutic arena, in its infancy when the Early Detection Research Network (EDRN) was initiated in 2000. The EDRN has played a crucial role in changing the culture and the ways people conduct biomarker studies. The EDRN proposed biomarker developmental guidelines and biomarker piv...
Referral strategies based on risk scores and medical tests are commonly proposed. Direct assessment of their clinical utility requires implementing the strategy and is not possible in early phases of biomarker research. Prior to late‐phase studies, net benefit measures can be used to assess the potential clinical impact of a proposed strategy. Vali...
Importance
Histopathologic criteria have limited diagnostic reliability for a range of cutaneous melanocytic lesions.
Objective
To evaluate the association of second-opinion strategies by general pathologists and dermatopathologists with the overall reliability of diagnosis of difficult melanocytic lesions.
Design, Setting, and Participants
This...
Biomarkers abound in many areas of clinical research, and often investigators are interested in combining them for diagnosis, prognosis, or screening. In many applications, the true positive rate for a biomarker combination at a prespecified, clinically acceptable false positive rate is the most relevant measure of predictive capacity. We propose a...
Background:
Biomarker candidates are often ranked using P-values. Standard P-value calculations use normal or logit-normal approximations, which may not be correct for small P-values and small sample sizes common in discovery research.
Methods:
We compared exact P-values, correct by definition, with logit-normal approximations in a simulated stu...
Importance
The recently updated American Joint Committee on Cancer (AJCC) classification of cancer staging, the AJCC Cancer Staging Manual, 8th edition (AJCC 8), includes revisions to definitions of T1a vs T1b or greater. The Melanoma Pathology Study database affords a comparison of pathologists’ concordance and reproducibility in the microstaging...
Background:
Diagnostic interpretations of melanocytic skin lesions vary widely among pathologists, yet the underlying reasons remain unclear.
Objective:
Identify pathologist characteristics associated with rates of accuracy and reproducibility.
Methods:
Pathologists independently interpreted the same set of biopsies of melanocytic lesions on t...
Purpose:
To estimate the potential near-term population impact of alternative second opinion breast biopsy pathology interpretation strategies.
Methods:
Decision analysis examining 12-month outcomes of breast biopsy for nine breast pathology interpretation strategies in the U.S. health system. Diagnoses of 115 practicing pathologists in the Brea...
Consider a gene expression array study comparing two groups of subjects where the goal is to explore a large number of genes in order to select for further investigation a subset that appear to be differently expressed. There has been much statistical research into the development of formal methods for designating genes as differentially expressed....
There are clear clinical and public health needs to improve the early detection of breast cancer in order to save lives. Breast cancer is the leading cause of cancer death in women worldwide and is the second leading cause of cancer mortality in women in the United States. Although mammography is widely used to screen for breast cancer, it suffers...
Objective
To quantify the accuracy and reproducibility of pathologists’ diagnoses of melanocytic skin lesions.
Design
Observer accuracy and reproducibility study.
Setting
10 US states.
Participants
Skin biopsy cases (n=240), grouped into sets of 36 or 48. Pathologists from 10 US states were randomized to independently interpret the same set on t...
Appendix: Supplementary materials
Importance:
Compared with white American (WA) women, African American (AA) women have a 2-fold higher incidence of breast cancers that are negative for estrogen receptor, progesterone receptor, and ERBB2 (triple-negative breast cancer [TNBC]). Triple-negative breast cancer, compared with non-TNBC, likely arises from different pathogenetic pathways...
Background
Surgeons may receive a different diagnosis when a breast biopsy is interpreted by a second pathologist. The extent to which diagnostic agreement by the same pathologist varies at two time points is unknown.
Methods
Pathologists from eight U.S. states independently interpreted 60 breast specimens, one glass slide per case, on two occasion...
Objective To evaluate the potential effect of second opinions on improving the accuracy of diagnostic interpretation of breast histopathology.
Design Simulation study.
Setting 12 different strategies for acquiring independent second opinions.
Participants Interpretations of 240 breast biopsy specimens by 115 pathologists, one slide for each case, c...
Background:
The effect of physician diagnostic variability on accuracy at a population level depends on the prevalence of diagnoses.
Objective:
To estimate how diagnostic variability affects accuracy from the perspective of a U.S. woman aged 50 to 59 years having a breast biopsy.
Design:
Applied probability using Bayes' theorem.
Setting:
B-P...
Background:
Many cancer biomarker research studies seek to develop markers that can accurately detect or predict future onset of disease. To design and evaluate these studies, one must specify the levels of accuracy sought. However, justified target levels are rarely available.
Methods:
We describe a way to calculate target levels of sensitivity...
Incomplete reporting has been identified as a major source of avoidable waste in biomedical research. Essential information is often not provided in study reports, impeding the identification, critical appraisal, and replication of studies. To improve the quality of reporting of diagnostic accuracy studies, the Standards for Reporting of Diagnostic...
Developing biomarkers that can predict whether patients are likely to benefit from an intervention is a pressing objective in many areas of medicine. Recent guidance documents have recommended that the accuracy of predictive biomarkers, ie, sensitivity, specificity, and positive and negative predictive values, should be assessed. We clarify the mea...
In Reply We emphasized in our article that evaluating the overall diagnostic system was not our objective. We studied diagnostic variation at the level of the individual pathologist reviewing a routinely stained slide because this is the starting point of every microscopic diagnosis. We documented very high variation for breast atypia and ductal ca...
Biomarkers that predict the efficacy of treatment can potentially improve clinical outcomes and decrease medical costs by allowing treatment to be provided only to those most likely to benefit. We consider the design of a randomized clinical trial in which one objective is to evaluate a treatment selection marker. The marker may be measured prospec...
Background: Triple negative breast cancers (TNBC) comprise 15-20% of all breast cancers and frequently present as interval cancers with high proliferative rates and increased risk of mortality. There is a clinical need for biomarkers for the early detection of TNBC to complement radiologic imaging. No plasma biomarkers for TNBC currently exist. The...
Biomarker discovery research has yielded few biomarkers that validate for clinical use. A contributing factor may be poor study designs.
The goal in discovery research is to identify a subset of potentially useful markers from a large set of candidates assayed on case and control samples. We recommend the PRoBE design for selecting samples. We prop...
A breast pathology diagnosis provides the basis for clinical treatment and management decisions; however, its accuracy is inadequately understood.
To quantify the magnitude of diagnostic disagreement among pathologists compared with a consensus panel reference diagnosis and to evaluate associated patient and pathologist characteristics.
Study of pa...
Background: Many circulating biomarkers have been reported for the diagnosis of breast cancer, but few, if any, have undergone rigorous credentialing using prospective cohorts and blinded evaluation. Methods: The NCI Early Detection Network (EDRN) has created a prospective, multicenter collection of plasma and serum samples from 832 subjects design...
Estrogen receptor (ER)-positive/progesterone receptor (PR)-positive invasive ductal carcinoma accounts for approximately 45% of all invasive breast cancers (BC) diagnosed in the United States each year. While mammography screening and adjuvant hormonal therapy have played key roles in reducing breast cancer mortality, an important challenge remains...
The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large...
Context:
Little is known about the frequency of discordant diagnoses identified during research.
Objective:
To describe diagnostic discordance identified during research and apply a newly designed research framework for investigating discordance.
Design:
Breast biopsy cases (N = 407) from registries in Vermont and New Hampshire were independen...
Abstract Despite the heightened interest in developing biomarkers predicting treatment response that are used to optimize patient treatment decisions, there has been relatively little development of statistical methodology to evaluate these markers. There is currently no unified statistical framework for marker evaluation. This paper proposes a sui...
The Net Reclassification Index (NRI) and its P value are used to make conclusions about improvements in prediction performance gained by adding a set of biomarkers to an existing risk prediction model. Although proposed only 5 years ago, the NRI has gained enormous traction in the risk prediction literature. Concerns have recently been raised about...
Net reclassification indices have recently become popular statistics for measuring the prediction increment of new biomarkers. We review the various types of net reclassification indices and their correct interpretations. We evaluate the advantages and disadvantages of quantifying the prediction increment with these indices. For predefined risk cat...
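As background for the abstracts on net reclassification indices, here is a minimal sketch of the commonly used category-based definition: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. The function name `categorical_nri` and the integer coding of risk categories are assumptions for illustration. The sketch shows only how the statistic is computed and does not address the interpretive concerns these papers raise.

```python
def categorical_nri(old_cat, new_cat, outcome):
    """Category-based NRI from old and new risk categories.

    Hypothetical helper: categories are integers ordered from low to
    high risk, and outcome is 1 for events and 0 for non-events.
    Returns (event NRI, non-event NRI, total NRI).
    """
    ev_up = ev_down = ev_n = 0
    ne_up = ne_down = ne_n = 0
    for old, new, d in zip(old_cat, new_cat, outcome):
        if d == 1:
            ev_n += 1
            ev_up += new > old      # event moved to a higher category
            ev_down += new < old    # event moved to a lower category
        else:
            ne_n += 1
            ne_up += new > old      # non-event moved to a higher category
            ne_down += new < old    # non-event moved to a lower category
    event_nri = (ev_up - ev_down) / ev_n
    nonevent_nri = (ne_down - ne_up) / ne_n
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Toy data (made up): four events followed by four non-events.
old = [0, 0, 1, 1, 1, 1, 0, 0]
new = [1, 1, 0, 1, 0, 0, 0, 1]
d = [1, 1, 1, 1, 0, 0, 0, 0]
print(categorical_nri(old, new, d))
```

Reporting the event and non-event components separately, rather than only their sum, keeps the two kinds of reclassification visible.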
Two different approaches to analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for probability of disease given marker values, while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes bo...
Two different approaches to analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for probability of disease given marker values while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes bot...
There is growing interest in markers that can be used to identify which patients are most likely to benefit from a treatment. For example, the Gail breast cancer risk prediction model may be useful for identifying a subset of older women for whom the benefit of tamoxifen for breast cancer prevention is likely to outweigh the harm. Two general class...
Two-phase study methods, in which more detailed or more expensive exposure information is only collected on a sample of individuals with events and a small proportion of other individuals, are expected to play a critical role in biomarker validation research. One major limitation of standard two-phase designs is that they are most conveniently empl...
Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null...
Rationale and objectives:
Studies evaluating a new diagnostic imaging test may select control subjects without disease who are similar to case subjects with disease in regard to factors potentially related to the imaging result. Selecting one or more controls that are matched to each case on factors such as age, comorbidities, or study site improv...
Purpose:
Optimal triage of patients at risk for critical illness requires accurate risk prediction, yet few data on the performance criteria required of a potential biomarker to be clinically useful exist.
Materials and methods:
We studied an adult cohort of nonarrest, nontrauma emergency medical services encounters transported to a hospital fr...
Background
Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a st...
When an existing risk prediction model is not sufficiently predictive, additional variables are sought for inclusion in the model. This paper addresses study designs to evaluate the improvement in prediction performance that is gained by adding a new predictor to a risk prediction model. We consider studies that measure the new predictor in a case-...
When an existing standard marker does not have sufficient classification accuracy on its own, new markers are sought with the goal of yielding a combination with better performance. The primary criterion for selecting new markers is that they have good performance on their own and preferably be uncorrelated with the standard. Most often linear comb...
This chapter covers material presented in a short course at the 2011 International Conference on Risk Assessment and Evaluation of Predictions. Methods for evaluating the performance of markers to predict risk of a current or future clinical outcome are reviewed. Specifically, we discuss criteria for evaluating a risk model including: calibration,...
Background:
The mission of the National Cancer Institute's Early Detection Research Network (EDRN) is to identify and validate cancer biomarkers for clinical use. Since its inception, EDRN investigators have learned a great deal about the process of validating biomarkers for clinical use. Translational research requires a broad spectrum of researc...
In this issue of the Journal, Pencina et al. (Am J Epidemiol. 2012;176(6):492-494) examine the operating characteristics of measures of incremental value. Their goal is to provide benchmarks for the measures that can help identify the most promising markers among multiple candidates. They consider a setting in which new predictors are condition...
Selecting controls that match cases on risk factors for the outcome is a pervasive practice in biomarker research studies. Such matching, however, biases estimates of biomarker prediction performance. The magnitudes of these biases are unknown.
We examined the prediction performance of biomarkers and improvements in prediction gained by adding biom...
Many biomarkers identified in marker discovery are shown to have inadequate performance in validation studies. This motivates the use of group sequential designs that allow early termination for futility. However, an option for early termination will lead to biased estimates for studies that reach full enrollment. We propose conditional estimators...
Epidemiologic methods are well established for investigating the association of a predictor of interest and disease status in the presence of covariates also associated with disease. There is less consensus on how to handle covariates when the goal is to evaluate the increment in prediction performance gained by a new marker when a set of predictor...
For comparing the performance of a baseline risk prediction model with one that includes an additional predictor, a risk reclassification analysis strategy has been proposed. The first step is to cross-classify risks calculated according to the 2 models for all study subjects. Summary measures including the percentage of reclassification and the pe...
Treatment selection markers, sometimes called predictive markers, are factors that help clinicians select therapies that maximize good outcomes and minimize adverse outcomes for patients. Existing statistical methods for evaluating a treatment selection marker include assessing its prognostic value, evaluating treatment effects in patients with a r...
The diagnostic likelihood ratio function, DLR, is a statistical measure used to evaluate risk prediction markers. The goal of this paper is to develop new methods to estimate the DLR function. Furthermore, we show how risk prediction markers can be compared using rank-invariant DLR functions. Various estimators are proposed that accommodate cohort...
The predictiveness curve is a graphical tool that characterizes the population distribution of Risk(Y)=P(D=1|Y), where D denotes a binary outcome such as occurrence of an event within a specified time period and Y denotes predictors. A wider distribution of Risk(Y) indicates better performance of a risk model in the sense that making treatment reco...
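The idea of a population distribution of Risk(Y) can be sketched as an empirical quantile function: for a proportion v, R(v) is the risk value at or below which a fraction v of the population falls. This is a minimal illustration, assuming the risks have already been estimated by some model; the function name `predictiveness_curve` and the order-statistic quantile rule are assumptions, not the estimators developed in the paper.

```python
import math


def predictiveness_curve(risks, proportions):
    """Empirical quantiles R(v) of a list of risks for each v in (0, 1].

    Hypothetical helper: R(v) is taken to be the smallest observed risk
    whose empirical cumulative proportion is at least v.
    """
    ordered = sorted(risks)
    n = len(ordered)
    curve = []
    for v in proportions:
        # Index of the smallest order statistic with cumulative proportion >= v.
        idx = max(0, math.ceil(v * n) - 1)
        curve.append(ordered[idx])
    return curve


# Toy data (made up): estimated risks for four subjects.
risks = [0.40, 0.05, 0.20, 0.10]
print(predictiveness_curve(risks, [0.25, 0.5, 1.0]))
```

A model that assigns everyone the same risk yields a flat curve, while a curve that rises steeply from values near 0 to values near 1 corresponds to the wider risk distribution described above.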
To assess the value of a continuous marker in predicting the risk of a disease, a graphical tool called the predictiveness curve has been proposed. It characterizes the marker's predictiveness, or capacity to risk stratify the population by displaying the distribution of risk endowed by the marker. Methods for making inference about the curve and f...
Statistical evaluation of medical imaging tests used for diagnostic and prognostic purposes often employs receiver operating characteristic (ROC) curves. Two methods for ROC analysis are popular. The ordinal regression method is the standard approach used when evaluating tests with ordinal values. The direct ROC modeling method is a more recently d...
Advances in biotechnology have raised expectations that biomarkers, including genetic profiles, will yield information to accurately predict outcomes for individuals. However, results to date have been disappointing. In addition, statistical methods to quantify the predictive information in markers have not been standardized.
We discuss statistical...
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified...
The predictive capacity of a marker in a population can be described using the population distribution of risk (Huang et al. 2007; Pepe et al. 2008a; Stern 2008). Virtually all standard statistical summaries of predictability and discrimination can be derived from it (Gail and Pfeiffer 2005). The goal of this paper is to develop methods for making...
Recent scientific and technological innovations have produced an abundance of potential markers that are being investigated for their use in disease screening and diagnosis. In evaluating these markers, it is often necessary to account for covariates associated with the marker of interest. Covariates may include subject characteristics, expertise o...
The predictiveness curve shows the population distribution of risk endowed by a marker or risk prediction model. It provides a means for assessing the model's capacity for stratifying the population according to risk. Methods for making inference about the predictiveness curve have been developed using cross-sectional or cohort data. Here we consid...
The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Nonparametric, semiparametric, and parametric estimators are calculated. Comparisons between cur...
Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized by using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We...
Development of a disease screening biomarker involves several phases. In phase 2 its sensitivity and specificity are compared with established thresholds for minimally acceptable performance. Since we anticipate that most candidate markers will not prove to be useful and availability of specimens and funding is limited, early termination of a study...
Biomarkers that can be used in combination with established screening tests to reduce false positive rates are in considerable demand. In this article, we present methods for evaluating the diagnostic performance of combination tests that require positivity on a biomarker test in addition to a standard screening test. These methods rely on relative...
Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We descr...
The recent epidemiologic and clinical literature is filled with studies evaluating statistical models for predicting disease or some other adverse event. Risk stratification tables are a new way to evaluate the benefit of adding a new risk marker to a risk prediction model that includes an established set of markers. This approach involves cross-ta...
Research methods for biomarker evaluation lag behind those for evaluating therapeutic treatments. Although a phased approach to development of biomarkers exists and guidelines are available for reporting study results, a coherent and comprehensive set of guidelines for study design has not been delineated. We describe a nested case–control study de...
Consider a set of baseline predictors X to predict a binary outcome D and let Y be a novel marker or predictor. This paper is concerned with evaluating the performance of the augmented risk model P(D = 1|Y,X) compared with the baseline model P(D = 1|X). The diagnostic likelihood ratio, DLR(X)(y), quantifies the change in risk obtained with knowledg...
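For orientation, the relationship this abstract alludes to can be written out as a short worked identity. It assumes the usual covariate-specific definition of the diagnostic likelihood ratio and uses the abstract's symbols D, X, and Y; the subscript notation for DLR is a typesetting choice here, and the paper's own development may differ in detail.

```latex
% Assumed (standard) definition of the covariate-specific DLR
\mathrm{DLR}_X(y) = \frac{P(Y = y \mid D = 1, X)}{P(Y = y \mid D = 0, X)}

% Bayes' theorem then links the baseline and augmented risk models on the odds scale
\frac{P(D = 1 \mid Y = y, X)}{P(D = 0 \mid Y = y, X)}
  = \mathrm{DLR}_X(y) \times \frac{P(D = 1 \mid X)}{P(D = 0 \mid X)}
```

Read this way, a DLR of 1 at a marker value y leaves the baseline risk unchanged, while values above or below 1 raise or lower it.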
The classification accuracy of a continuous marker is typically evaluated with the receiver operating characteristic (ROC) curve. In this paper, we study an alternative conceptual framework, the “percentile value.” In this framework, the controls only provide a reference distribution to standardize the marker. The analysis proceeds by analyzing the...
The concept of covariate adjustment is well established in therapeutic and etiologic studies. However, it has received little attention in the growing area of medical research devoted to the development of markers for disease diagnosis, screening, or prognosis, where classification accuracy, rather than association, is of primary interest. In this...
Receiver operating characteristic (ROC) curves play a central role in the evaluation of biomarkers and tests for disease diagnosis. Predictors for event time outcomes can also be evaluated with ROC curves, but the time lag between marker measurement and event time must be acknowledged. We discuss different definitions of time-dependent ROC curves i...
In case-control studies evaluating the classification accuracy of a marker, controls are often matched to cases with respect to factors associated with the marker and disease status. In contrast with matching in epidemiologic etiology studies, matching in the classification setting has not been rigorously studied. In this article, we consider the i...
There are two popular statistical approaches to biomarker evaluation. One models the risk of disease (or disease outcome) with, for example, logistic regression. A marker is considered useful if it has a strong effect on risk. The second evaluates classification performance by use of measures such as sensitivity, specificity, predictive values, and...
In a prospective cohort study, information on clinical parameters, tests and molecular markers is often collected. Such information is useful to predict patient prognosis and to select patients for targeted therapy. We propose a new graphical approach, the positive predictive value (PPV) curve, to quantify the predictive accuracy of prognostic mark...
Consider a continuous marker for predicting a binary outcome. For example, the serum concentration of prostate specific antigen may be used to calculate the risk of finding prostate cancer in a biopsy. In this article, we argue that the predictive capacity of a marker has to do with the population distribution of risk given the marker and suggest a...
Latent class analysis is used to assess diagnostic test accuracy when a gold standard assessment of disease is not available but results of multiple imperfect tests are. We consider the simplest setting, where 3 tests are observed and conditional independence (CI) is assumed. Closed-form expressions for maximum likelihood parameter estimates are de...
Diagnostic tests, medical tests, screening tests, biomarkers, and prediction rules are all types of classifiers. This chapter introduces methods for classifier development and evaluation. We first introduce measures of classification performance including sensitivity, specificity, and receiver operating characteristic (ROC) curves. We then review s...
The case–control design is frequently used to study the discriminatory accuracy of a screening or diagnostic biomarker. Yet, the appropriate ratio in which to sample cases and controls has never been determined. It is common for researchers to sample equal numbers of cases and controls, a strategy that can be optimal for studies of association. How...
Modern technologies promise to provide new ways of diagnosing disease, detecting subclinical disease, predicting prognosis, selecting patient specific treatment, identifying subjects at risk for disease, and so forth. Advances in genomics, proteomics and imaging modalities in particular hold great potential for assisting with classification/predict...
The statistical literature on assessing the accuracy of risk factors or disease markers as diagnostic tests deals almost exclusively with settings where the test, Y, is measured concurrently with disease status D. In practice, however, disease status may vary over time and there is often a time lag between when the marker is measured and the occurr...
No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this article,...
Non-parametric procedures such as the Wilcoxon rank-sum test, or equivalently the Mann-Whitney test, are often used to analyse data from clinical trials. These procedures enable testing for treatment effect, but traditionally do not account for covariates. We adapt recently developed methods for receiver operating characteristic (ROC) curve regress...
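The link between rank-based tests and ROC analysis mentioned here rests on a well-known identity: the Mann-Whitney U statistic divided by the number of pairs estimates the probability that a randomly chosen observation from one group exceeds one from the other, which is the area under the ROC curve. The following is a minimal unadjusted sketch; the function name `mann_whitney_auc` is hypothetical, and the covariate adjustment that the paper develops is not shown.

```python
def mann_whitney_auc(treated, control):
    """Estimate P(treated > control) + 0.5 * P(tie) over all pairs.

    Hypothetical helper: equals the Mann-Whitney U statistic divided
    by len(treated) * len(control), i.e. the empirical AUC.
    """
    wins = 0.0
    for t in treated:
        for c in control:
            if t > c:
                wins += 1.0   # treated observation ranks higher
            elif t == c:
                wins += 0.5   # ties count as half
    return wins / (len(treated) * len(control))


# Toy data (made up): outcomes in two trial arms.
treated = [3, 5, 7]
control = [1, 3, 6]
print(mann_whitney_auc(treated, control))
```

A value of 0.5 corresponds to no separation between the arms, and values toward 1 indicate that treated outcomes tend to exceed control outcomes.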