Bernhard Meindl's research while affiliated with IST Austria and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (36)
The interactive, web-based point-and-click application presented in this article allows users to anonymize data without any knowledge of a programming language. Anonymization in data mining, but creating safe, anonymized data is by no means a trivial task. Both the methodological issues and know-how from subject matter specialists should be taken...
The production of synthetic datasets has been proposed as a statistical disclosure control solution to generate public use files out of protected data, and as a tool to create “augmented datasets” to serve as input for micro-simulation models. Synthetic data have become an important instrument for ex-ante assessments of policy impact. The performan...
Poverty reporting in Austria and the European Union is based on results from EU-SILC, an annual household survey. When classical direct estimation from the survey data is applied, the situation of particularly exposed population groups and the regional distribution of the risk of poverty can often only be...
The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly over the last years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units....
Visual analysis of data is important for understanding the main characteristics, main trends and relationships in data sets, and it can be used to assess data quality. Using the R package sparkTable, statistical tables holding quantitative information can be enhanced by including spark-type graphs such as sparklines and sparkbars. These kinds of grap...
The popularity of R is increasing in national statistical offices, and not only for simulation tasks. Nowadays R is also used in the production process. Many new features for various tasks in official statistics have been developed over the last years, and these features are freely available in the form of add-on packages. In this contribution we firs...
Visual analysis of data is important to understand the main characteristics, main trends and relationships in data sets and it can be used to assess the data quality. Using the R package sparkTable, statistical tables holding quantitative information can be enhanced by including spark-type graphs such as sparklines and sparkbars. These kinds of g...
Traffic management systems use traffic data from various data collection sources for different purposes, such as traffic control decisions. To ensure data reliability and plausibility, abnormal or faulty data need to be identified. Therefore, the QUATRA system was established to provide services for the quality management of traffic dat...
In this contribution, software tools that can be used to solve (mixed integer) linear optimization problems are described and compared. These kinds of problems occur, for instance, when solving the secondary cell suppression problem (CSP). An overview of existing comparisons of both open-source and commercial solvers is given. Moreover, the performanc...
In this contribution, software tools that can be used to solve (mixed integer) linear optimization problems are described and compared. These kinds of problems occur, for instance, when solving the secondary cell suppression problem (CSP), for which we tested the tools. Especially for the CSP, fast and efficient tools are needed. While experience ga...
Statistical Disclosure Control, Data Utility, Disclosure Risk
The aim of a project initiated by the International Household Survey Network (IHSN, www.ihsn.org) is to integrate the C++ code they developed into the R package sdcMicro. The methods for microdata perturbation in the R package sdcMicro are now all based on computationally fast C++ code. The paper describes how this integration was done and describes t...
Statistical Disclosure Control, Data Utility, Disclosure Risk
The growing demand for information in our society and the large supply of statistical information also require that this information be used and applied correctly; the role of "Statistics in Education and Training" is therefore of growing importance. Modern media have long since found their way into the world of statistics; for example, the...
At the beginning of 2009, the TGUI system was presented in the Austrian Journal Of Statistics as an effective instrument for staff training. A fundamental redesign now makes it possible to make the TGUI system available to other instructors as well. The talk presents TGUITeaching, a prototype that was developed so that the first-time use of the TGUI...
The aim is to show how statistical disclosure methods can be applied to data using the R packages sdcMicro and sdcTable. The reader of this chapter will learn how popular methods in microdata protection and tabular protection can be applied within these packages to real-world data. sdcMicro supports an exploratory approach for the anonymizat...
The aim of this study is to evaluate the risk of re-identification related to distance-based disclosure risk measures for numerical variables. First, we give an overview of different, previously proposed disclosure risk measures. Unfortunately, none of these measures accounts for outliers. We assume that outliers must be protected more than observations nea...
The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the methods used in this paper have been implemented in the R package sdcMicro, which is available for free on the Comprehensive R Archive Network (http://cran.r-project.org). The other methods used can be easily ap...
The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the 21 methods used in this paper have been implemented in the R package sdcMicro, which is available for free on the Comprehensive R Archive Network (http://cran.r-project.org). The rest of the methods used can easi...
The estimation of Austrian unemployment rates is based on data from the labour force survey (LFS). It is possible to calculate direct, design-based estimates with fixed precision for population subgroups for which the sample size is known due to the sampling design. Sometimes we are interested in estimating unemployment rates for population subgroups...
In this contribution we give an overview of recent developments in the R package sdcTable. sdcTable is free and open source software that is available on the Comprehensive R Archive Network (http://cran.r-project.org). It provides methods to solve the secondary cell suppression problem for multidimensional and hierarchical tables.
In this contribution we will present our work on the implementation of two important algorithms for statistical disclosure control within the free and open source statistical computing system R (1). We present a first version of the open source R package sdcTable. The paper is organised in the following way. We start by discussing the need for...
Citations
... Another simple encoding employs grids to encode, for instance, a sequence of states in colored cells in one or multiple rows ( 21 / 140 ). However, there also exist approaches demonstrating that word-sized graphics are not limited to these simple diagram types, for instance, graphics that encode spatial trajectories or densities [55], [56], [57], stacked quantities that form streams [58], small representations of boxplots to display statistical distributions [59], [60], [61], or glyphs that encode multivariate properties [9], [40]. Even parallel coordinates can be represented [42], and networks in simplified node-link representations [55], [62] or adjacency matrices [55]. ...
Reference: Word-Sized Graphics for Scientific Texts
... For example, [28] and [29] use imputation processes to decompose the multidimensional joint distribution into conditional univariate distributions. [30] and [31] use parametric models in combination with conditional re-sampling to synthesize hierarchical relationships. ...
... However, at the prediction step, Gaussian noise with mean = 1.2% and SD = 3.9% (minimum = 0.0% and maximum = 130.9%) was randomly added to the simulated C1 and C2 sampling times, using the sdcMicro R package. 18 The aim was to introduce uncertainty on input data so as to observe the algorithm prediction performance in more realistic conditions. In addition, we kept the interindividual variability of the PK parameters described in the initial study (eta values), as well as that brought by the most important covariate, the ideal body weight. ...
... Especially in smaller studies, budget and personnel constraints are limiting parameters. Basic statistics and descriptive plots enable researchers to identify quality problems through visualization of main characteristics as well as relationships and, therefore, facilitate monitoring and reporting quality of research data [6]. ...
... There are open-source software libraries available such as COIN-OR, GLPK and lp_solve. A comparison of the performance (speed and instances) of optimization engines was undertaken by [50]. Gearhart et al. [51] evaluate open-source linear programming solvers. ...
... In the following, the income quintile share ratio and the at-risk-of-poverty rate are defined by way of example. Further definitions of the Laeken indicators, as used in the AMELI project, can be found in Graf et al. (2011a). ...
... Various anonymization software tools have been made available in the past. One of the most feature-rich is sdcMicro [1,2], an R package for data anonymization optimized for large datasets. For users comfortable with using R, this package provides a tool for the application of a comprehensive suite of methods commonly used and described in literature on disclosure control. ...
... Users can choose from a total of eight non-cluster-based and four cluster-based methods. The grouping can also be nonrobust (for example, the default algorithm Maximum Distance to Average Vector (MDAV) [3]) or robust, such as the Robust Maximum Distance (RMD) algorithm [19]. ...
... Examples of such outliers might be enterprises with very high values for turnover or persons with extremely high income. Also, multivariate outliers exist (see Templ and Meindl 2008a). Unfortunately, intruders may want to disclose a large enterprise or an enterprise with specific characteristics. ...
... Additionally, even Lagrangian Relaxation (LR) based algorithms, often employed to exploit the master problem's separability into several sub-problems, suffer from slow overall convergence that might negatively affect the optimization process (Bragin et al. 2019). Hence, although the mathematical programming related literature reports the execution times for specific algorithms and problem instances (Meindl and Templ 2013), it is difficult to establish the required time given only a MILP problem's generic formulation (IBM 2018). For instance, S. Lehmann et al. have investigated an RPO problem related to wind farm planning with multiple cable types (Lehmann 2017), while J. C. S. N. Pinheiro et al. have investigated an RPO problem related to parallel machine scheduling (Pinheiro et al. 2020), in which each machine has a certain amount of resources to process a job. ...