Rudolf Mayer, SBA Research
111 publications, 40,455 reads, 1,373 citations
Publications (111)
Large Generative AI (GAI) models have the unparalleled ability to generate text, images, audio, and other forms of media that are increasingly indistinguishable from human-generated content. As these models often train on publicly available data, including copyrighted materials, art and other creative works, they inadvertently risk violating copyri...
Federated learning provides the solution when multiple parties want to collaboratively train a machine learning model without directly sharing sensitive data. In Federated Learning, each party trains a machine learning model locally on its private data and sends only the models' weights or updates (gradients) to an aggregator, which averages locall...
Deep Learning has enabled significant progress towards more accurate predictions and is increasingly integrated into our everyday lives in real-world applications; this is especially true for Convolutional Neural Networks (CNNs) in the field of image analysis. Nevertheless, it has been shown that Deep Learning is vulnerable to well-crafted, sm...
Since 72% of rare diseases are genetic in origin and mostly paediatric, genetic newborn screening (gNBS) represents a diagnostic “window of opportunity”. Therefore, many gNBS initiatives have started in different European countries. Screen4Care is a research project that resulted from a joint effort between the European Union Commission and the European Feder...
Following the reverse genetics strategy developed in the 1980s to pioneer the identification of disease genes, genome(s) sequencing has opened the era of genomics medicine. The human genome project has led to an innumerable series of applications of omics sciences on global health, from which rare diseases (RDs) have greatly benefited. This has pro...
Background:
Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distri...
The commercial use of machine learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes intellectual property protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks, and defenses available to protect th...
Machine-Learning-as-a-Service (MLaaS) has become a widespread paradigm, making even the most complex Machine Learning models available for clients via e.g. a pay-per-query principle. This allows users to avoid time-consuming processes of data collection, hyperparameter tuning, and model training. However, by giving their customers access to the (pr...
The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks and defenses available to protect the...
Background:
Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distribu...
Data is increasingly collected on practically every area of human life, from health care to financial or work aspects, and from many different sources. As the amount of data gathered grows, efforts to leverage it have intensified. Many organizations are interested in analysing or sharing the data they collect, as it may be used to provide critical...
Anonymisation is a strategy often employed when sharing and exchanging data that contains personal and sensitive information, to avoid possible record identification or inference. Besides the actual attributes contained within a dataset, certain other aspects might also reveal information on the data subjects. One example of this is the structure w...
Machine Learning-as-a-Service (MLaaS) has become a widespread paradigm, making even the most complex machine learning models available for clients via e.g. a pay-per-query principle. This allows users to avoid time-consuming processes of data collection, hyperparameter tuning, and model training. However, by giving their customers access to the (pr...
Small and medium-sized organisations face challenges in acquiring, storing and analysing personal data, particularly sensitive data (e.g., data of medical nature), due to data protection regulations, such as the GDPR in the EU, which stipulates high standards in data protection. Consequently, these organisations often refrain from collecting data c...
Anomaly detection is an important task to identify rare events such as fraud, intrusions, or medical diseases. However, it often needs to be applied to personal or otherwise sensitive data, e.g. business data. This gives rise to concerns regarding the protection of the sensitive data, especially if it is to be analysed by third parties, e.g. in col...
The microbial communities of the human body are subject to extensive research efforts. The individual variations in the human microbiome reveal information about our diet, exercise habits and general well-being, and are useful for investigations on the prediction and therapy of diseases. On the other hand, these variations allow for microbiome-base...
Machine learning-based systems are increasingly used in critical applications such as medical diagnosis, automotive vehicles, or biometric authentication. Because of their importance, they can become the target of various attacks. In a data poisoning attack, the attacker carefully manipulates some input data, e.g. by superimposing a pattern, e.g. t...
Digitalization of knowledge work in communication-intensive domains such as intellectual property protection poses great challenges but also opportunities to improve today’s working environments. The legal domain is strongly characterized by knowledge work, whereby, despite a common legal framework, creativity of individual experts is decisive. Thi...
Machine Learning (ML) and Artificial Intelligence (AI) have shown promising results in many areas and are driven by the increasing amount of available data. However, this data is often distributed across different institutions and cannot be shared due to privacy concerns. Privacy-preserving methods, such as Federated Learning (FL), allow for traini...
Federated Learning decreases privacy risks when training Machine Learning (ML) models on distributed data, as it removes the need for sharing and centralizing sensitive data. However, this learning paradigm can also influence the effectiveness of the obtained prediction models. In this paper, we specifically study Neural Networks, as a powerful and...
Fingerprinting is a method of embedding a traceable mark into digital data, to verify the owner and identify the recipient a certain copy of a data set has been released to. This is crucial when releasing data to third parties, especially if it involves a fee, or if the data is of a sensitive nature, due to which further sharing and leaks should be d...
k-anonymity is an approach for enabling privacy-preserving data publishing of personal, sensitive data. As a result of this anonymisation process, the utility of the sanitised data is generally lower than that of the original data. Quantifying this utility loss is therefore important to estimate the usefulness of the resulting datasets. In this paper, w...
Artificial intelligence (AI) has been successfully applied in numerous scientific domains including biomedicine and healthcare. Here, it has led to several breakthroughs ranging from clinical decision support systems and image analysis to whole genome sequencing. However, training an AI model on sensitive data also raises concerns about the privacy of...
With ever increasing capacity for collecting, storing, and processing data, there is also a high demand for intelligent knowledge discovery and data analysis methods. While there have been impressive advances in machine learning and similar domains in recent years, this also gives rise to concerns regarding the protection of personal and otherwi...
With the recent advances and increasing activities in data mining and analysis, the protection of the privacy of individuals is crucial. Several approaches address this concern, from techniques like data anonymisation to secure, non-disclosive computation, all of which have their specific strengths and weaknesses, depending on the specific requirem...
Fingerprinting of data is a method to embed a traceable marker into the data to identify which specific recipient a certain copy of the data set has been released to. This is crucial for releasing data sets to third parties, especially if the release involves a fee, or if the data contains sensitive information due to which further sharing and pote...
Machine learning, and deep learning in particular, has seen tremendous advances and surpassed human-level performance on a number of tasks. Currently, machine learning is increasingly integrated in many applications and thereby becomes part of everyday life, automating decisions based on predictions. In certain domains, such as medical diagnosi...
Research databases are an important building block in eScience and computational science investigations. For enabling reproducible research, an approach is needed which supports the identification and citation of the exact data (sub)sets utilized in experiments. While this itself is a challenge, in many cases the data stored in databases is sensiti...
Purpose: This paper aims to address the issue of long-term stability of services and systems depending on service-oriented architecture, which has become a popular architecture in systems development and is often implemented using Web services. However, the dependency, especially on externally provided services, can impact the reliability of a syste...
In privacy sensitive eScience domains the disclosure of data is often not allowed or advised if it contains sensitive data about the individual. Applying data protection methods opposes the interests of repeatability and reproducibility, as the data which serves as input and output for processing steps in experiments needs to be altered in order to pres...
Workflows have become a popular means for implementing experiments in computational sciences. They are beneficial over other forms of implementation, as they require a formalisation of the experiment process, they provide a standard set of functions to be used, and provide an abstraction of the underlying system. Thus, they facilitate understandabi...
IT-supported business processes and computationally intensive science (called e-science) have become increasingly ubiquitous in the last decades. Along with this trend comes the need to make at least the most important of these processes available for the long term, to allow later analysis of their execution, or even a re-execution. As such, the pr...
High dependence on web services and service-oriented architecture affects not only business solutions, but also scientific research. Web services may be delivered by third parties, and thus are candidates for outsourcing. However, they represent a source of risks, which can jeopardise the robustness of processes. Hence, there is a need for actions...
Many business and scientific processes make extensive use of service-oriented architectures, using distributed services. These are often provided by third parties and are thus not under direct control of process owners. In this paper we discuss the issues of ensuring continuous and faithful execution of processes in distributed environments, focusi...
The re-usability and repeatability of e-Science experiments is widely understood as a requirement of validating and reusing previous work in data-intensive domains. Experiments are, however, often complex chains of processing, involving a number of data sources, computing infrastructure, software tools, or external and third-party services, renderi...
An enterprise architecture provides views on heterogeneous domains, such as business processes, people, business rules, application components, and technological infrastructure. These views are defined according to specific concerns and need to be expressed with an adequate description language. This entails integrating the description languages as...
Enterprise architecture facilitates the alignment between different domains, such as business, applications and information technology. These domains must be described with description languages that best address the concerns of its stakeholders. However, current model-based enterprise architecture techniques are unable to integrate multiple descri...
Earlier work identified the potential for reuse and reproducibility when applying workflow systems - repurposable components and their assemblies - to audio analysis and Music Information Retrieval. In this paper we extend this approach with the introduction of Research Objects to capture semantic context, gathered when workflows are applied within...
In the domain of eScience, investigations are increasingly collaborative. Most scientific and engineering domains benefit from building on top of the outputs of other research, by sharing information to reason over and data to incorporate in the modelling task at hand. This raises the need to provide means for preserving and sharing entire eScience...
Digital preservation research has seen an increased focus on objects that are non-deterministic but depend on external events like user input or data from external sources. Among those is the preservation of scientific processes, aiming at reuse of research outputs. Ensuring that the preserved object is equivalent to the original is a key concer...
Enterprise architecture aligns business and information technology through the management of different elements and domains. An architecture description encompasses a wide and heterogeneous spectrum of areas, such as business processes, metrics, application components, people and technological infrastructure. Views express the elements and relation...
Enterprise architecture supports the analysis, design and engineering of business-oriented systems through multiple views. Each view expresses the elements and relationships of a system from the perspective of specific system concerns relevant to one or more of its stakeholders. As a result, each view needs to be expressed in the architecture descript...
A goal of enterprise architecture is to align the business with the underlying support systems. An enterprise architecture description encompasses a heterogeneous spectrum of domains, such as business processes, application components, metrics, people and technological infrastructure. Architectural views express the domain elements and their relat...
The Million Song Dataset (MSD), a collection of one million music pieces, enables a new era of research on Music Information Retrieval methods for large-scale applications. It comes as a collection of meta-data such as the song names, artists and albums, together with a set of features extracted with The Echo Nest services, such as loudness,...
Digital Preservation has so far focused mainly on digital objects that are static in their nature, such as text and multimedia documents. However, there is an increasing demand to extend the applications towards dynamic objects and whole processes, such as scientific workflows in the domain of E-Science. This calls for a revision and extension of c...
In experimental sciences, under which we may likely subsume most research areas in MIR, repeatability is one of the key cornerstones of validating research and measuring progress. Yet, due to the complexity of typical MIR experiments, ensuring the capability of re-running any experiment, achieving exactly identical outputs is challenging at best...
Digital audio has become a ubiquitously available medium, and for many consumers, it is the major distribution and storage form of music, accounting for a growing share of record sales. However, handling the ever-growing size of both private and commercial collections becomes increasingly difficult. Users are often overwhelmed by the seemingly cou...