
Abel Dasylva
Statistics Canada | STATCAN · International Cooperation and Corporate Statistical Methods Division
Doctor of Statistics
33 publications · 1,405 reads · 163 citations
Introduction
I am a methodologist with a general background in survey methodology and biostatistics. I am currently developing state-of-the-art methods for linking massive data sets, assessing linkage errors and analyzing linked data while accounting for linkage errors.
Publications (33)
An excerpt: "...IPP techniques can help National Statistical Organizations (NSOs) to (re)use private and administrative data for statistical purposes while preserving data confidentiality and individual privacy rights."
Available at
https://statswiki.unece.org/display/hlgbas/Input+Privacy-Preservation+for+Official+Statistics+Project+outcome?previe...
Presentation at the 2022 HLG-MOS workshop
https://statswiki.unece.org/display/hlgbas/Input+Privacy-Preservation+Project+Seminar+2022
Duplicate records are records from the same unit in a given data source, regardless of whether they are identical. Their identification is required when the source is used to produce official statistics, such as a sampling frame or a census. To date, many Bayesian models have been described to perform this task in an automated manner. Yet, they inv...
In official statistics, record linkage is used to find records from the same entity in many data sources, often without a unique identifier. Consequently, linkage errors arise that are commonly measured by the recall and the precision; two finite population parameters that require the identification of all the record pairs, where the records are fr...
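As a minimal sketch of the two error measures named in this abstract, the snippet below computes recall and precision from a set of linked pairs and a set of truly matched pairs. The function name and the toy identifiers are hypothetical, and the sketch assumes the full set of matched pairs is known, which is rarely the case in practice and is not the estimation method of the paper.

```python
def linkage_recall_precision(links, matches):
    """Return (recall, precision) for a linkage.

    links   -- pairs (id_a, id_b) that the linkage declared to be links
    matches -- pairs (id_a, id_b) that truly belong to the same entity
               (assumed fully known here, for illustration only)
    """
    links, matches = set(links), set(matches)
    true_links = links & matches  # linked pairs that are truly matched
    recall = len(true_links) / len(matches) if matches else float("nan")
    precision = len(true_links) / len(links) if links else float("nan")
    return recall, precision


# Toy example: three declared links, three true matches, two in common.
links = {(1, "a"), (2, "b"), (3, "d")}
matches = {(1, "a"), (2, "b"), (4, "c")}
recall, precision = linkage_recall_precision(links, matches)
```

Here one matched pair is missed (lowering recall) and one link is false (lowering precision).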
The probabilistic method of record linkage aims at making optimal linkage decisions for a given set of features. Yet it does not prescribe how to select these features. Another challenge is the estimation of the decision parameters due to the common lack of training data. This presentation addresses both issues with a model and a recursive parti...
When linking massive data sets, blocking is used to select a manageable subset of record pairs at the expense of losing a few matched pairs. This loss is an important component of the overall linkage error, because blocking decisions are made early on in the linkage process, with no way to revise them in subsequent steps. Yet, measuring this contri...
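The trade-off described in this abstract can be illustrated with a small sketch: blocking keeps only the record pairs that agree on a blocking key, and any matched pair that disagrees on the key is lost for good. The function names, record layout, and blocking key below are hypothetical, and the true matches are assumed known purely for illustration; this is not the measurement method of the paper.

```python
from collections import defaultdict


def block_pairs(file_a, file_b, key):
    """Return the candidate pairs whose records share the same blocking key."""
    index = defaultdict(list)
    for b_id, rec in file_b.items():
        index[key(rec)].append(b_id)
    return {
        (a_id, b_id)
        for a_id, rec in file_a.items()
        for b_id in index.get(key(rec), [])
    }


def blocking_loss(candidate_pairs, matches):
    """Share of truly matched pairs discarded by blocking."""
    matches = set(matches)
    return len(matches - set(candidate_pairs)) / len(matches)


# Toy files: the second match disagrees on year of birth (a recording error).
file_a = {1: {"name": "ann", "yob": 1980}, 2: {"name": "bob", "yob": 1975}}
file_b = {"x": {"name": "ann", "yob": 1980}, "y": {"name": "bob", "yob": 1976}}
pairs = block_pairs(file_a, file_b, key=lambda rec: rec["yob"])
loss = blocking_loss(pairs, matches={(1, "x"), (2, "y")})
```

With year of birth as the key, only one of the four possible pairs is compared, and the matched pair with the recording error never reaches the later linkage steps.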
Non-probability samples are being increasingly explored by National Statistical Offices as a complement to probability samples. We consider the scenario where the variable of interest and auxiliary variables are observed in both a probability and non-probability sample. Our objective is to use data from the non-probability sample to improve the eff...
In the context of its "admin-first" paradigm, Statistics Canada is prioritizing the use of non-survey sources to produce official statistics. This paradigm critically relies on non-survey sources that may have a nearly perfect coverage of some target populations, including administrative files or big data sources. Yet, this coverage must be measure...
The accurate and cost-effective estimation of linkage errors remains a major challenge for the automated production and use of linked data. However, this exercise is worthwhile only if the linked data are fit for use. A new model is proposed to estimate the errors without clerical reviews, training data or conditional independence assumptions, under...
This presentation describes a new model for the estimation of linkage errors without clerical reviews and without assumptions of conditional independence.
In theory, the probabilistic linkage method provides two distinct advantages over non-probabilistic methods, namely minimal rates of linkage error and accurate measures of these rates for data users. However, implementations can fall short of these expectations either because the conditional independence assumption is made, or because a model wi...
A new estimating equation methodology is proposed for the primary analysis of linked data, i.e. an analysis by someone with unfettered access to the related microdata and project information. It is described when the data come from the linkage of two registers with an exhaustive coverage of the same population, or from the linkage of two overl...
This article looks at the estimation of an association parameter between two variables in a finite population, when the variables are separately recorded in two population registers that are also imperfectly linked. The main problem is the occurrence of linkage errors that include bad links and missing links. A methodology is proposed when clerical...
We propose an optimal estimating equation for logistic regression with linked data while accounting for false positives. It builds on a previous solution but estimates the regression coefficients with a smaller variance, in large samples.
Background:
This study summarizes the linkage of the Canadian Community Health Survey (CCHS) and the Canadian Mortality Database (CMDB), which was performed to examine relationships between social determinants, health behaviours and mortality in the household population.
Data and methods:
The 2000/2001-to-2011 Canadian Community Health Surveys w...
Probabilistic linkage is susceptible to linkage errors such as missed links and false links. In many cases, these errors may be reliably measured through clerical reviews, i.e. the visual inspection of a sample of record pairs to determine if they are matched. A framework is described to effectively carry out such clerical reviews. It is based on a...
This paper presents new constructions of multistage wave-mixing networks with arbitrary b×b space-switching elements, where b ≥ 2. In these networks, for a size of F fiber links and W wavelengths per link, converter requirements are O(F log_b W) or O(FW/b) for rearrangeable nodes, and O(F log_b W log_b(FW)) or O(FW log_b...
There is current interest in differentiated service architectures where packets with different priorities can share the same queue. In the case of congestion, packets marked with higher drop probability are preferentially dropped in order to make buffer room for packets marked with lower drop probability. Active queue management (AQM) based on rand...
Multistage cross connects with wave-mixing conversion have two essential characteristics. First, individual converters are simultaneously shared by a significant number of channels. Second, individual channels may be converted through one or more cascaded wave-mixing conversions. The combination of both design principles contributes to the degradat...
We describe a general construction for space-wavelength log_2(FW, m, p) networks capable of wave-mixing conversion, including previous designs proposed in the literature, where F is the number of fibers and W is the number of wavelengths per fiber. In these networks, a lightpath is converted through at most {log_2 W + min(m, log_2 W)} cascaded conversio...
The health statuses of the control plane and the data plane of a GMPLS-controlled optical network are independent in the physically separated control network implementation. In most control plane designs, besides the topology information, the entities of the routing protocol only record the number of available wavelengths on each link. However, the st...
We describe what we believe to be new designs for all-optical cross connects, capable of wavelength conversion. They are based on two-dimensional, space–wavelength, Benes or Cantor topologies, and they exploit cascaded wave-mixing bulk frequency conversion. In these cross connects many channels at distinct frequencies can be simultaneously frequenc...
The LDP (label distribution protocol) is used in the control plane to control an optical network. The data plane and the control plane of an optical network could be physically separate, so a failure in the control plane does not necessarily imply a data plane failure or that user communications have to be interrupted. The standard LDP, however, d...
We consider single-hop wavelength-division multiplexed networks in which the transmitters take a nonzero amount of time, called tuning latency, to tune from one wavelength to another. For such networks, we show that, under certain conditions on the traffic matrix, there exist polynomial-time algorithms that produce the optimal schedule. Further, th...
We consider the problem of obtaining non-trivial lower bounds on the lost revenue under any routing and admission control scheme in a multi-class loss network. First, we use the following simple idea to bound the performance of any coordinate-convex admission policy on a single link: the blocking probability of any call class is lower bounded b...
We consider the problem of routing permanent virtual circuits over general-topology networks under some shortest-path rule. We derive bounds on the competitiveness ratio of any online algorithm as a function of the cost function used to describe the congestion on each network element, link or node.
We propose competitive policies for admission control and routing in general topology networks, with bandwidth and buffer resources. The goal of these solutions is to maximize the network revenue, when the traffic demand is not known ahead of time, and over-allocation of network resources is allowed with some probability. On top of appropriate reso...
We consider single-hop wavelength-division multiplexed (WDM) networks in which the transmitters take a nonzero amount of time, called tuning latency, to tune from one wavelength to another. For such networks, we show that, under certain conditions on the traffic matrix, there exist polynomial-time algorithms that produce the optimal schedule. Furth...