Bernhard Meindl's research while affiliated with IST Austria and other places

Publications (36)

Article
Full-text available
The interactive, web-based point-and-click application presented in this article allows anonymizing data without any knowledge of a programming language. Anonymization is relevant in data mining, but creating safe, anonymized data is by no means a trivial task. Both methodological issues and know-how from subject matter specialists should be taken...
Article
Full-text available
The production of synthetic datasets has been proposed as a statistical disclosure control solution to generate public use files out of protected data, and as a tool to create “augmented datasets” to serve as input for micro-simulation models. Synthetic data have become an important instrument for ex-ante assessments of policy impact. The performan...
Article
Poverty reporting in Austria and the European Union is based on results from EU-SILC, a household survey conducted annually. When classical direct estimation from the survey data is applied, the situation of particularly exposed population groups and the regional distribution of the risk of poverty can often only be...
Article
Full-text available
The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly over the last years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units....
Article
Full-text available
Visual analysis of data is important to understand the main characteristics, main trends and relationships in data sets, and it can be used to assess the data quality. Using the R package sparkTable, statistical tables holding quantitative information can be enhanced by including spark-type graphs such as sparklines and sparkbars. These kinds of grap...
Article
Full-text available
The popularity of R is increasing in national statistical offices, and not only for simulation tasks. Nowadays R is also used in the production process. Many new features for various tasks in official statistics have been developed over the last years, and these features are freely available in the form of add-on packages. In this contribution we firs...
Conference Paper
Full-text available
Visual analysis of data is important to understand the main characteristics, main trends and relationships in data sets, and it can be used to assess the data quality. Using the R package sparkTable, statistical tables holding quantitative information can be enhanced by including spark-type graphs such as sparklines and sparkbars. These kinds of g...
Conference Paper
Traffic management systems use traffic data from various data collection sources for different purposes, such as traffic control decisions. To ensure data reliability and plausibility, abnormal or faulty data need to be identified. The QUATRA system was therefore established to provide services for the quality management of traffic dat...
Article
Full-text available
In this contribution, software tools that can be used to solve (mixed integer) linear optimization problems are described and compared. These kinds of problems occur, for instance, when solving the secondary cell suppression problem (CSP). An overview of existing comparisons of both open-source and commercial solvers is given. Moreover, the performanc...
Article
Full-text available
In this contribution, software tools that can be used to solve (mixed integer) linear optimization problems are described and compared. These kinds of problems occur, for instance, when solving the secondary cell suppression problem (CSP), for which we tested the tools. Especially for the CSP, fast and efficient tools are needed. While experience ga...
Article
Statistical Disclosure Control, Data Utility, Disclosure Risk
Conference Paper
The aim of a project initiated by the International Household Survey Network (IHSN, www.ihsn.org) is to integrate the C++ code they developed into the R package sdcMicro. The methods for microdata perturbation in the R package sdcMicro are now all based on computationally fast C++ code. The paper describes how this integration was done and describes t...
Conference Paper
Full-text available
The growing demand for information in our society and the large supply of statistical information also require that this information be used and applied correctly; the role of "statistics in education and training" is therefore of growing importance. Modern media have long since found their way into the world of statistics; for example, the...
Article
Full-text available
The growing demand for information in our society and the large supply of statistical information also require that this information be used and applied correctly; the role of "statistics in education and training" is therefore of growing importance. Modern media have long since found their way into the world of statistics; for example, the...
Conference Paper
Full-text available
In early 2009, the TGUI system was presented in the Austrian Journal of Statistics as an effective tool for staff training. A fundamental redesign now makes it possible to make the TGUI system accessible to other instructors as well. The talk presents TGUITeaching, a prototype that was developed so that the first-time use of the TGUI...
Chapter
The aim is to show how statistical disclosure methods can be applied to data using the R packages sdcMicro and sdcTable. The chapter shows the reader how popular methods in microdata protection and tabular protection can be applied within these packages to real-world data. sdcMicro supports an exploratory approach for the anonymizat...
Conference Paper
The aim of this study is to evaluate the risk of re-identification related to distance-based disclosure risk measures for numerical variables. First, we give an overview of different, previously proposed disclosure risk measures. Unfortunately, none of these measures accounts for outliers. We assume that outliers must be protected more than observations nea...
Conference Paper
The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the methods used in this paper have been implemented in the R package sdcMicro, which is available for free on the Comprehensive R Archive Network (http://cran.r-project.org). The other methods used can be easily ap...
Article
The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the 21 methods used in this paper have been implemented in the R package sdcMicro, which is available for free on the Comprehensive R Archive Network (http://cran.r-project.org). The rest of the methods used can easi...
Article
The estimation of Austrian unemployment rates is based on data from the labour force survey (LFS). It is possible to calculate direct, design-based estimates with fixed precision for population subgroups for which the sample size is known due to the sampling design. Sometimes we are interested in estimating unemployment rates for population subgroups...
Article
In this contribution we give an overview of recent developments in the R package sdcTable. sdcTable is free and open source software that is available on the Comprehensive R Archive Network (http://cran.r-project.org). It provides methods to solve the secondary cell suppression problem for multidimensional and hierarchical tables.
Article
In this contribution we will present our work on the implementation of two important algorithms for statistical disclosure control within the free and open source statistical computing system R (1). We present a first version of the open source R package sdcTable. The paper is organised in the following way. We start by discussing the need for...

Citations

... Another simple encoding employs grids to encode, for instance, a sequence of states in colored cells in one or multiple rows ( 21 / 140 ). However, there also exist approaches demonstrating that word-sized graphics are not limited to these simple diagram types, for instance, graphics that encode spatial trajectories or densities [55], [56], [57], stacked quantities that form streams [58], small representations of boxplots to display statistical distributions [59], [60], [61], or glyphs that encode multivariate properties [9], [40]. Even parallel coordinates can be represented [42], and networks in simplified node-link representations [55], [62] or adjacency matrices [55]. ...
... For example, [28] and [29] use imputation processes to decompose the multidimensional joint distribution into conditional univariate distributions. [30] and [31] use parametric models in combination with conditional re-sampling to synthesize hierarchical relationships. ...
... However, at the prediction step, Gaussian noise with mean = 1.2% and SD = 3.9% (minimum = 0.0% and maximum = 130.9%) was randomly added to the simulated C1 and C2 sampling times, using the sdcMicro R package [18]. The aim was to introduce uncertainty on input data so as to observe the algorithm prediction performance in more realistic conditions. In addition, we kept the interindividual variability of the PK parameters described in the initial study (eta values), as well as that brought by the most important covariate, the ideal body weight. ...
... Especially in smaller studies, budget and personnel constraints are limiting parameters. Basic statistics and descriptive plots enable researchers to identify quality problems through visualization of main characteristics as well as relationships and, therefore, facilitate monitoring and reporting quality of research data [6]. ...
... There are open-source software libraries available such as COIN-OR, GLPK and lp_solve. A comparison of the performance (speed and instances) of optimization engines was undertaken by [50]. Gearhart et al. [51] evaluate open-source linear programming solvers. ...
... In the following, the income quintile share ratio and the at-risk-of-poverty rate are defined as examples. Further definitions of the Laeken indicators, as used in the AMELI project, can be found in Graf et al. (2011a). ...
... Various anonymization software tools have been made available in the past. One of the most feature-rich is sdcMicro [1,2], an R package for data anonymization optimized for large datasets. For users comfortable with using R, this package provides a tool for the application of a comprehensive suite of methods commonly used and described in literature on disclosure control. ...
... Users can choose from a total of eight non-cluster-based and four cluster-based methods. The grouping can also be nonrobust (for example, the default algorithm Maximum Distance to Average Vector (MDAV) [3]) or robust, such as the Robust Maximum Distance (RMD) algorithm [19]. ...
... Examples of such outliers might be enterprises with very high values for turnover or persons with extremely high income. Also, multivariate outliers exist (see Templ and Meindl 2008a). Unfortunately, intruders may want to disclose a large enterprise or an enterprise with specific characteristics. ...
... Additionally, even Lagrangian Relaxation (LR) based algorithms, often employed to exploit the master problem's separability into several sub-problems, suffer from a slow overall convergence that might negatively affect the optimization process (Bragin et al. 2019). Hence, although the mathematical programming related literature reports the execution times for specific algorithms and problem instances (Meindl and Templ 2013), it is difficult to establish the required time given only a MILP problem's generic formulation (IBM 2018). For instance, S. Lehmann et al. have investigated an RPO problem related to wind farm planning with multiple cable types (Lehmann 2017), while J. C. S. N. Pinheiro et al. have investigated an RPO problem related to parallel machine scheduling (Pinheiro et al. 2020), in which each machine has a certain amount of resources to process a job. ...