Peter-Paul de Wolf

Centraal Bureau voor de Statistiek | CBS · Methodology

PhD mathematics (statistics)

Publications: 41
Reads: 2,871
Citations: 724
Citations since 2017: 261 (5 research items)
Introduction
Peter-Paul de Wolf currently works as a senior methodologist in the Methodology Department at Statistics Netherlands (CBS). He does research in the field of statistical disclosure control, record linkage and quality control. One of his current projects is 'Confidentiality on the fly'.

Publications (41)
Chapter
Full-text available
We investigate an attack on a machine learning classifier that predicts the propensity of a person or household to move (i.e., relocate) in the next two years. The attack assumes that the classifier has been made publicly available and that the attacker has access to information about a certain number of target individuals. That attacker might al...
Chapter
The spatial distribution of a variable, such as the energy consumption per company, is usually plotted by colouring regions of the study area according to an underlying table which is already protected from disclosing sensitive information. The result is often heavily influenced by the shape and size of the regions. In this paper, we are interested...
Article
Full-text available
A commonly known problem in population size estimation using registers is that registers do not necessarily cover the whole population. This may be because they intend to cover only part of the population (e.g., students), because of administrative delay, or because part of the target population is not registered by default (e.g., illegal persons). One of t...
Article
Full-text available
The size of a partly observed population is often estimated with the capture-recapture (for two sources) or multiple-recapture (for multiple sources) estimation method. An important assumption of these models is that records in different sources can be identified such that it is known whether these records belong to the same unit or not, i.e. r...
Conference Paper
Cartographic maps have many practical uses and can be an attractive alternative for disseminating detailed frequency tables. However, a detailed map may disclose private data of individual units of a population. We will describe some smoothing algorithms to display spatial distribution patterns. In certain situations, the disclosure risk of a spati...
Article
At National Statistical Institutes, increased re-use of data and increased use of different data sources within statistical production lines lead to increased complexity. Moreover, different production lines become interconnected, resulting in networks or chains. With increasing complexity, process management can be used to support tactical and ope...
Technical Report
Full-text available
In this manuscript we consider an important topic in official statistics, namely estimating the number of usual residents. For the Netherlands we investigate the undercoverage of the Population Register. First, the Population Register is linked to an Employment Register and a Crime Suspects Register. Then, we use three-list capture-recapture metho...
Conference Paper
The (n,k)-dominance rule and the p%-rule are both well-known and often-used sensitivity measures for determining which cells are unsafe to publish in tabular output. The p%-rule has some theoretical advantages over the dominance rule, hence it is generally advised to use that rule instead of the latter one. In this paper we investigate the re...
Book
A reference to answer all your statistical confidentiality questions. This handbook provides technical guidance on statistical disclosure control and on how to approach the problem of balancing the need to provide users with statistical outputs and the need to protect the confidentiality of respondents. Statistical disclosure control is combined w...
Technical Report
Full-text available
In this paper, we describe the controversy that arose between Statistics Netherlands and the Ministry of Economic Affairs in 2009 after the Dutch government announced a tax relief measure for businesses, which deteriorated the quality of tax data used by Statistics Netherlands for producing short-term statistics.
Conference Paper
In Council Regulation no. 2701/98 of the European Committee, a framework is given on an extensive set of tables concerning economic statistics. Some of these tables are linked to each other. Until recently, there existed no practical solution to a consistent protection of that set of tables, save for a rather naive one. In this paper we will show t...
Article
The software package τ-ARGUS offers a very efficient algorithm for secondary cell suppression known as either HiTaS or the modular approach. The method is well suited for the protection of up to 3-dimensional hierarchical tables. In practice, statistical agencies release multiple tabulations based on the same dataset. Usually these tables are linke...
Conference Paper
Full-text available
The software package τ-ARGUS offers a very efficient algorithm for secondary cell suppression known as either HiTaS or the Modular approach. The method is well suited for the protection of up to 3-dimensional hierarchical tables. In practice, statistical agencies release multiple tabulations based on the same dataset. Usually these tables are linke...
Conference Paper
PRAM (Post Randomization Method) is a disclosure control method for microdata, introduced in 1997. Unfortunately, PRAM has not yet been applied extensively by statistical agencies in protecting their microdata. This is partly due to the fact that little knowledge is available on the effect of PRAM on disclosure control as well as on the loss of inf...
Conference Paper
In this paper we discuss a pilot project concerning a remote access facility at Statistics Netherlands. We describe some aspects of the technical implementation as well as the functional implementation. Moreover, we will discuss some tentative first experiences of external users.
Conference Paper
A heuristic for disclosure control in hierarchical tables was introduced in [5]. In that heuristic, the complete set of all possible subtables is being protected in a sequential way. In this article, we will show that it is possible to reduce the set of subtables, to a set that contains subtables with the same dimension as the complete hierarchical...
Article
A large part of the theory of extreme value index estimation is developed for positive extreme value indices. The best-known estimator of a positive extreme value index is probably the Hill estimator. This estimator belongs to the category of moment estimators, but can also be interpreted as a quasi-maximum likelihood estimator. It has been general...
Article
Full-text available
PRAM is a probabilistic, perturbative method for disclosure protection of categorical variables in microdata files. If PRAM is to be applied, several issues should be carefully considered. The microdata file will usually contain a specific structure, e.g., a hierarchical structure when all members of a household are present in the data file.
Conference Paper
Full-text available
This paper describes a heuristic approach to find suppression patterns in tables that exhibit a hierarchical structure in at least one of the explanatory variables. The hierarchical structure implies that there exist (many) sub-totals, i.e., that (many) sub-tables can be constructed. These sub-tables should be protected in such a way that they cann...
Article
The Post RAndomisation Method (PRAM) is a perturbative method for disclosure protection of categorical variables. Applying PRAM means that for each record in a microdata file the score on a number of variables is changed according to a specified probability mechanism. This article considers the effect of PRAM on both the safety of the data and the...
Technical Report
The application of the Perpetual Inventory Method (PIM) requires estimates and assumptions on three parameters: service life, discard pattern and depreciation method. In this paper these parameters are discussed and choices are made in order to present an applicable approach. Service lives are an important parameter in the Perpetual Inventory Metho...
Conference Paper
PRAM is a probabilistic, perturbative method for disclosure protection of categorical variables in microdata files. If PRAM is to be applied, several issues should be carefully considered. The microdata file will usually contain a specific structure, e.g., a hierarchical structure when all members of a household are present in the data file. To wha...
Article
Full-text available
This paper describes the Post Randomisation Method (PRAM) for disclosure protection of microdata. Applying PRAM means that, for each record in the data file, the score on a number of variables is changed according to a specified probability mechanism. Since this probability mechanism is known, the characteristics of the latent true data can unbiasedl...
Technical Report
Full-text available
PRAM is a probabilistic, perturbative method for disclosure protection of categorical variables in microdata files. If PRAM is applied, several issues should be carefully considered. Some of these issues are considered in this paper. The paper was a contribution to the Conference on Statistical Data Protection '98, March 25-27 1998, Lisbon, Portuga...
Technical Report
Full-text available
The Post Randomisation Method (PRAM) is a perturbative method for disclosure protection of categorical variables. Applying PRAM means that for each record in a microdata file the score on a number of variables is changed according to a specified probability mechanism.
Article
Full-text available
Statistical Disclosure Control is part of the usual process necessary for the dissemination of microdata. Usually, direct identifiers are removed and variables that can be used to re-identify certain records are recoded or suppressed. For certain surveys these methods are inadequate to produce safe microdatasets with enough detail, to satisfy both...
Article
Full-text available
Until now, Statistics Netherlands has used questionnaires to produce short-term statistics on turnover. VAT data were used only as an auxiliary variable. To reduce respondent burden, we are trying to use VAT turnover data instead of questionnaires, for small and medium-sized companies. This is not straightforward, because it can be difficult to lin...
Article
Full-text available
Have you heard of the story about Argus in Greek mythology? He was a giant with one hundred eyes. Hera, Zeus's wife, assigned him to guard Zeus's mistress Io. Eager to get Io back, Zeus sent Hermes to kill the giant. Argus was then transformed into a peacock. ARGUS is also the name for a practical tool to guard data. It has been developed by the Co...


Projects (2)
Project
Eurostat co-funded project to create sustainable user support and maintenance of Statistical Disclosure Control software tools. See:
http://ec.europa.eu/eurostat/cros/content/user-support-and-maintenance-sdc-tools_en
https://joinup.ec.europa.eu/software/sdctools/home
https://github.com/sdcTools/UserSupport/wiki