George Duncan

Carnegie Mellon University | CMU · Heinz College, School of Public Policy & Management

PhD

About

102
Publications
13,847
Reads
2,781
Citations
Additional affiliations
July 1974 - May 2016
Carnegie Mellon University
Position
  • Professor of Statistics, Emeritus

Publications

Publications (102)
Article
Collaborative filtering algorithms learn from the ratings of a group of users on a set of items to find personalized recommendations for each user. Traditionally they have been designed to work with one-dimensional ratings. With interest growing in recommendations based on multiple aspects of items, we present an algorithm for using multicomponent...
Article
Thus far, our focus has been on achieving confidentiality protection through SDL procedures, which as we know mask the data that are to be disseminated. This is the restricted data approach. As we noted in Chapters 1 and 2, a different but complementary approach is for the DSO to instead control the process of accessing and analyzing the data. Re...
Article
Before disseminating a data product for public use, a DSO needs to assess the risk of a data snooper compromising confidentiality. In its original form as the source data, a data product typically has unacceptably high disclosure risk. The data product must therefore be transformed to lower the disclosure risk to an acceptable level. We present a v...
Article
Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways from providing a sound basis for acting upon the problems. We illust...
Chapter
Confirming statistical confidentiality as vital to the stewardship of personal data, the United Nations set out Principle 6 of its Fundamental Principles of Official Statistics: Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used excl...
Article
Full-text available
Evidence suggests that the medication lists of patients are often incomplete and could negatively affect patient outcomes. In this article, the authors propose the application of collaborative filtering methods to the medication reconciliation task. Given a current medication list for a patient, the authors employ collaborative filtering approaches...
Article
Full-text available
The increasing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. This paper considers the problem of protecting an unpublished, sensitive table by suppressing cells in related, published tables....
Book
Why statistical confidentiality?- Concepts of statistical disclosure limitation.- Assessment of disclosure risk.- Protecting tabular data.- Providing and protecting microdata.- Disclosure risk and data utility.- Restrictions on data access.- Thoughts on the future.
Article
A microdata file is a compilation of data records. Each record contains values of attributes about a single unit—say a person with the attributes of their height, attitude toward minimum wage laws, cell-phone usage, and diastolic blood pressure. Microdata are special. Expanding Access to Research Data: Reconciling Risks and Opportunities (National...
Article
As we have repeatedly argued, DSOs fulfill their stewardship responsibilities by resolving the tension between ensuring confidentiality and providing access (Duncan et al., 1993; Kooiman et al., 1999; Marsh et al., 1991). Data stewardship, therefore, requires disseminating data products that both (1) protect confidentiality—so get disclosure risk R...
Article
The SDL literature has its own terminology. Understanding this terminology and, more importantly, the concepts underlying the terminology is essential to learning how statistical confidentiality can be best employed. In this chapter we look at the structure of disclosure risk, its assessment, and its limitation. Complicating our task is that many t...
Article
Familiar to all of us, a statistical table displays aggregate information that is classified according to categories. Even in this age of electronic dissemination, tables remain important data products. In the past, DSOs published these tables in paper form as large statistical abstracts. Today, many DSOs provide users with an online capability for...
Article
The future will surely bring challenges to statistical confidentiality. Some challenges will be familiar, much like the ones described in Chapter 1. But as the lead quotation suggests, we must prepare for exponential change in our responsibilities, the technology we employ, and the problems we face. Specifically, we must prepare for dramatic change...
Article
The increasing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. This paper considers the problem of protecting an unpublished, sensitive table by suppressing cells in related, published tables. A...
Article
Guideline based clinical decision support systems provide patient-specific medical guidance to physicians, often at the point-of-care. A large body of research shows that these systems have the potential to reduce practice variation and human error. However, there is also evidence suggesting that these systems may introduce unintended risk into the...
Article
In a multilevel relational (MLR) database, users are not allowed to access data classified at a level higher than their own security classification. However, it may be possible for a low‐level user to infer high‐level data. This article provides methods to detect and eliminate such inference channels. A graph‐based representation of the database sc...
Article
Protecting confidentiality is essential to the functioning of systems for collecting and disseminating data on individuals and enterprises that are necessary for evidence-based public policy formulation. Deidentification of records, defined as removing obvious identifiers such as name and address, is not sufficient to protect confidentiality. Micro...
Article
Government agencies collect and disseminate data that bear on the most important issues of public interest. Advances in information technology, particularly the Internet, have created a globalized technology society and multiplied the tension between demands for ever more comprehensive databases and demands for the shelter of privacy. In reconcilin...
Article
Full-text available
A physician's prescribing decisions depend on knowledge of the patient's medication list. This knowledge is often incomplete, and errors or omissions could result in adverse outcomes. To address this problem, the Joint Commission recommends medication reconciliation for creating a more accurate list of a patient's medications. In this paper, we develo...
Article
Full-text available
New technologies are being developed to protect the privacy of individuals in today's information society.
Article
Various entities (e.g., parents, employers) that provide users (e.g., children, employees) access to web content wish to limit the content accessed through those computers. Available filtering methods are crude in that they too often block “acceptable” content while failing to block “unacceptable” content. This paper presents a general and flexible...
Conference Paper
Full-text available
Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming on-line sources, such as Newswire and Blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, ha...
Article
Full-text available
The dependency structure among the component ratings is discovered and incorporated into a mixture model. The parameters of the model were estimated using the Expectation Maximization algorithm. The algorithm was evaluated using data collected from Yahoo Movies. It was found that using multiple components leads to improved recommendations over us...
Article
Full-text available
When disseminating data on individuals, information organizations must balance the interests of data users in better access and the interests of data providers in confidentiality. Cyclic perturbation is a new method for protecting sensitive data in categorical tables. In the disseminated data product, the true table values are altered in a way that...
Article
Full-text available
Managers of database security must ensure that data access does not compromise the confidentiality afforded data providers, whether individuals or establishments. Recognizing that deidentification of data is generally inadequate to protect confidentiality against attack by a data snooper, managers of information organizations (IOs)—such as statisti...
Article
Full-text available
Government agencies collect and disseminate data that bear on the most important issues of public interest. Advances in information technology, particularly the Internet, have multiplied the tension between demands for ever more comprehensive databases and demands for the shelter of privacy. In mediating between these two conflicting demands, agenci...
Article
Under confidentiality, identifiable data provided for statistical purposes is protected from unauthorized disclosure. Organizations acting as brokers between respondents and data users seek to disseminate useful data products while keeping low the risk of confidentiality disclosure. Recognizing that deidentification of each data record is generally...
Article
Introduction Even in the age of electronic dissemination of statistical data, tables are central data products of statistical agencies. For prominent examples, see the American FactFinder (http://factfinder.census.gov/servlet/BasicFactsServlet) from the U.S. Bureau of Census, the Office of National Statistics (http://www.statistics.gov.uk/) in the...
Article
Full-text available
Organizations that use time series forecasting on a regular basis generally forecast many variables, such as demand for many products or services. Within the population of variables forecasted by an organization, we can expect that there will be groups of analogous time series that follow similar, time-based patterns. The co-variation of a...
Article
Full-text available
Information organizations (IOs) must provide data products that are both useful and have low risk of confidentiality disclosure. Recognizing that deidentification of data is generally inadequate to protect their confidentiality against attack by a data snooper, concerned IOs can apply disclosure limitation techniques to the original data. Desirably...
Conference Paper
Full-text available
Statistical agencies seek to disseminate useful data while keeping low the risk of statistical confidentiality disclosure. Recognizing that deidentification of data is generally inadequate to protect its confidentiality against attack by a data snooper, agencies restrict the data they release for general use. Typically, these restricted data proced...
Article
Full-text available
Disclosure limitation methods transform statistical databases to protect confidentiality, a practical concern of statistical agencies. A statistical database responds to queries with aggregate statistics. The database administrator should maximize legitimate data access while keeping the risk of disclosure below an acceptable level. Legitimate user...
Article
Full-text available
As databases grow more prevalent and comprehensive, database administrators seek to limit disclosure of confidential information while still providing access to data. Practical databases accommodate users with heterogeneous needs for access. Each class of data user is accorded access to only certain views. Other views are considered confidential, a...
Article
The need to justify one's decisions is a signal characteristic of decision making in a managerial environment. Even chief executives must communicate reasons for their actions. Yet, despite a significant amount of laboratory research on the effects of accountability on decision making, few studies have attempted to assess what affects accountabilit...
Article
Full-text available
Government agencies collect and disseminate data that bear on the most important issues of public interest. Advances in information technology, particularly the Internet, have multiplied the tension between demands for ever more comprehensive databases and demands for the shelter of privacy. In mediating between these two conflicting demands, agenci...
Article
Full-text available
Preserving privacy appears to conflict with providing information. Statistical information can, however, be provided while preserving a specified level of confidentiality protection. The general approach is to provide disclosure-limited data that maximizes its statistical utility subject to confidentiality constraints. Disclosure limitation based o...
Conference Paper
Multi-dimensional tables of sensitive information are often summarized and made public by means of lower-dimensional projections, which are intended to prevent any disclosure of confidential data. Multiple projections of the same underlying table are linked over common attributes, however, so there is concern about the possibility of recovering sen...
Conference Paper
The increasing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. General logical and numerical methods exist to determine, prior to release, if disclosure can occur, either directly or through infe...
Article
The U.S. Census Bureau, health data providers, and credit bureaus are information organizations (IOs). They collect, store, and process large sets of sensitive data on individuals, households, and organizations. Storage, processing, and dissemination technologies that IOs employ have grown in capability, sophistication, and cost-effectiveness. Thes...
Article
Not just about communication and access, the Net also poses concerns over choosing the criteria that define ethical research.
Article
Full-text available
The increasing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. General logical and numerical methods exist to determine, prior to release, if disclosure can occur, either directly or through in...
Conference Paper
Full-text available
As computer databases have become more prevalent and comprehensive, they have prompted concerns about the confidentiality of sensitive information. Database administrators require effective policies to guard against disclosure of confidential information while at the same time providing reasonable access to legitimate users. A general method is ava...
Article
Time series forecasting for related units is common practice. Examples include sales of a chain of fast-food restaurants in a metropolitan area, precipitation in neighboring sections of farm land, and, as explored in this chapter, tax revenues for the school districts of a county. Our premise is that there is information in the cross-sectional data...
Article
Future developments in methodology have the potential to improve management research and better couple it to management practice. These developments are on six fronts: (1) computer technology, (2) data capture and experimentation, (3) privacy, confidentiality, and data access, (4) causation, (5) modeling and simulation, and (6) Bayesian statistics....
Article
One important implementation of Bayesian forecasting is the Multi-State Kalman Filter (MSKF) method. It is particularly suited for short and irregular time series data. In certain applications, time series data are available on numerous parallel observational units which, while not having cause-and-effect relationships between them, are subject to...
Article
A mediator aids disputants in resolving their differences. A theoretical framework for mediator mechanisms extends the subjective expected utility perspective and permits formal examination of mediator attitudes toward the equity and efficiency of agreements. Two forms of impartiality in promoting gains to disputants are examined: active (jointly m...
Conference Paper
Full-text available
Disclosure control methods in statistical databases often rely on modifying responses to queries while approximately maintaining values of aggregate statistics. Response modification schemes suggested in the literature have adopted one of two extreme measures; the responses for repeated queries are either independent or they are totally dependent....
Article
Full-text available
This article presents a scenario for the future of research access to federally collected microdata. Many researchers find access to government databases increasingly desirable. The databases themselves are more comprehensive, of better quality and, with improved database management techniques, better structured. Advances in computer communications...
Conference Paper
Full-text available
A probabilistic framework can be used to assess the risk of disclosure of confidential information in statistical databases that use disclosure control mechanisms. The authors show how the method may be used to assess the strengths and weaknesses of two existing disclosure control mechanisms: the query set size restriction control and random sample...
Article
A probabilistic framework can be employed to assess the risk of disclosure of confidential information in statistical databases that use disclosure control mechanisms. We illustrate how the method may be used to assess the strengths and weaknesses of two existing disclosure control mechanisms - query set size restriction control and random sample q...
Article
Recognizing fertile ground and preparing ground not yet ready is an essential skill of an effective intervenor. This sequence of diagnosis and action is studied in a theoretical framework in which mediators examine and alter four classes of disputants' perceptions. These classes are (1) the available set of actions, (2) the class of possible conseq...
Article
This simulation consists of a series of exercises in which participants select moves that minimize their penalties in a delivery system (TARTAN) game. It emphasizes multiple‐actor decision making, which shows how negotiation can lead to cooperative solutions with material benefit. It makes use of calculus, and optimization techniques of dynamic pro...
Article
Full-text available
Statistical agencies that provide microdata for public use strive to keep the risk of disclosure of confidential information negligible. Assessing the magnitude of the risk of disclosure is not easy, however. Whether a data user or intruder attempts to obtain confidential information from a public-use file depends on the perceived costs of identify...
Article
Providing researchers, especially those in the social sciences, with access to publicly collected microdata furthers research while advancing public policy goals in a democratic society. However, while technological improvements have eased remote access to these databases and enabled computer using researchers to perform sophisticated statistical a...
Article
Mandate, as a noun, is the charge that authorizes and legitimizes an intervenor's actions. The intervenor may act at the bidding of the disputants or of third party stakeholders. Mandate provides a functional taxonomy of intervenors, from go-betweens to conciliators to mediators to arbitrators to dictators. Mandate affords a perspective for analyzin...
Article
Two results on the unimodality of the Dirichlet-multinomial distribution are proved, and a further result is also proved on the identifiability of mixtures of multinomial distributions. These properties are used in developing a method for eliciting a Dirichlet prior distribution. The elicitation method is based on the mode, and region around the mod...
Article
Partial specification of a prior distribution can be appealing to an analyst, but there is no conventional way to update a partial prior. In this paper, we show how a framework for Bayesian updating with data can be based on the Dirichlet(a) process. Within this framework, partial information predictors generalize standard minimax predictors and ha...
Article
Full-text available
Statistical agencies use a variety of disclosure control policies with ad hoc justification in disseminating data. The issues involved are clarified here by showing that several of these policies are special cases of a general disclosure-limiting (DL) approach based on predictive distributions and uncertainty functions. A user's information posture...
Article
An interactive computer scheme is described for eliciting from an analyst a beta prior distribution on the parameter π of a binomial distribution. Information on the analyst's beta-binomial predictive distribution is obtained through a questioning and feedback algorithm based on modes.
Article
International system theorists usually hypothesize great flexibility of alliance partner choice among the major powers in a multipolar system. To test for the existence of such flexibility, three statistically testable hypotheses of alliance partner choice in a multipolar system are derived. Log-linear model procedures are developed for testing hyp...
Article
Experiments on frequency-dependent fitness often consist of forming pairwise mixtures of distinguishable types at several frequency combinations. These mixtures are allowed to undergo competition, after which the performance of each type is enumerated. A statistical method for analyzing such experiments is described in this article. This method, su...
Article
Stochastic models are constructed to illuminate the dynamic incidence of international warfare during the period from 1816 to 1965. It is argued that the probabilistic structure of this incidence is revealed most clearly through an analysis based on dyads of nations, thereby disassembling multilateral wars such as World War II. The conceptual focus...
Article
A jack knife procedure for constructing confidence regions in nonlinear regression is examined using Monte Carlo simulation. The jack knife promises to be asymptotically double-edged, being both independent of linearizing approximations to the regression surface and insensitive to specification of the error distribution. For moderate sample sizes t...
Article
A multiple-answer multiple-choice test item has a certain number of alternatives, any number of which might be keyed. The examinee is also allowed to mark any number of alternatives. This increased flexibility over the one keyed alternative case is useful in practice but raises questions about appropriate scoring rules. In this article a certain cla...
Article
An observer is to make inference statements about a quantity p, called a propensity and bounded between 0 and 1, based on the observation that p does or does not exceed a constant c. The propensity p may have an interpretation as a proportion, as a long-run relative frequency, or as a personal probability held by some subject. Applications in medicine,...
Article
This paper develops Bayesian statistical estimation procedures for the finite state Markov renewal process. The general case is treated where uncertainty exists about both the waiting time distributions and the transition probabilities. This work extends the Bayesian results of Martin, who only considers the Markov chain case, and Brock, who assume...
Article
A reasonable charge structure for sequential questioning schemes which do not partition conditional state spaces, i.e., a charge structure for a lattice questionnaire, is proposed. The minimum average charge then is shown to characterize the finite state Shannon entropy.
Article
A charging scheme based on the resolution of questions strikes a new direction from the approach of Claude Picard. The relationship between questionnaire theory and noiseless coding theory is explored. Graph-theoretic methods are used to obtain results valid for codes in which words are constructed from arbitrary mixtures of alphabets, as well as a...
Article
This article develops multiple-choice test scoring rules, concentrating on Bayes rules and their frequency theory analogs, empirical Bayes rules. Conditions are given for empirical Bayes estimates to lie in the probability simplex. The misinformation model is considered in detail. It is shown that ranking by raw scores is equivalent to ranking by B...
Article
Monte-Carlo simulation is used to compare the small-sample performance of the usual normal theory procedures for inference about correlation coefficients with that of two asymptotically robust procedures, one of which is based on a grouping of the observations and the other on the jackknife technique. The sampled distributions comprise the normal a...
Article
Monte Carlo simulation is used to compare the small sample performance of the usual normal theory procedures for inference about correlation coefficients with that of two asymptotically robust procedures, one of which is based on a grouping of the observations and the other on the jackknife technique. The sampled distributions comprise the normal a...
Article
Models suitable for statistical inference in Markov chains are considered featuring various forms of stochastic entry, including Poisson, renewable binomial pool, uncertain pool size, negative binomial, and input-output reservoir. A parallel inactive states model is suggested for exit. Maximum likelihood estimates are obtained and likelihood ratio...
Article
Full-text available
Collaborative filtering algorithms learn from the ratings of a group of users on a set of items to find recommendations for each user. Traditionally they have been designed to work with one dimensional ratings. With interest growing in recommending based on multiple aspects of items (Adomavicius and Kwon 2007, Adomavicius and Tuzhilin 2005) we pre...
