Technical Report

Study on the technical evaluation of decentralization-based de-identification procedures for personal data in the automotive sector

Authors:
  • Continental Automotive Technologies GmbH

Abstract

The aim of this study is the technical evaluation of de-identification methods based on decentralization, in particular methods of distributed and federated learning, for personal data in concrete use cases in the mobility domain. The General Data Protection Regulation (GDPR) has significantly increased the incentive for, and the effort required of, companies to process personal data in compliance with the law. This includes the creation, distribution, storage, and deletion of personal data. Non-compliance with the GDPR and other legislation now poses a significant financial risk to companies that work with personal data. With a substantial increase in computing power on the users' side, distributed and federated learning techniques provide a promising path for the de-identification of personal data. Such methods and techniques enable organizations to store and process sensitive user data locally. To do so, a sub-model of the main model that processes the data is stored in the local environment of the users. Since only the necessary updates are transmitted between the sub-model and the main model, this approach yields two advantages. First, there is no central database, which makes it immensely difficult for potential attackers to obtain large amounts of data. Second, only fragments of the locally stored data are transferred to the main model. In the first work package of this report, suitable use cases for this study are identified through a scientific literature review. The following use cases are identified and analyzed with regard to data, benefits, model, and sensitive data: traffic flow prediction, energy demand prediction, eco-routing, autonomous driving, vehicular object detection, and parking space estimation.
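The update-sharing idea sketched in the abstract can be made concrete in a few lines. The following toy federated round (hypothetical names; a one-parameter linear model) illustrates, under these simplifying assumptions, how only weight deltas leave a client while the raw (x, y) pairs stay local:

```python
def local_update(weights, data, lr=0.1):
    """One round of local training on a simple linear model y = w*x.

    Only the resulting weight delta leaves the device, never `data`.
    """
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x  # d/dw of the squared error
        w -= lr * grad
    return w - weights  # transmit the update, not the raw data

def federated_round(global_w, client_datasets):
    """Server averages the clients' updates into the main model."""
    updates = [local_update(global_w, d) for d in client_datasets]
    return global_w + sum(updates) / len(updates)

# Two clients hold samples of y = 2x locally; the server never sees them.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
# w now approximates the true slope 2.0
```

The central party only ever observes averaged deltas, which is the property the abstract attributes to distributed and federated learning.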


... It has also been shown that if users are aware that a tool is supposed to protect their privacy, they become biased and tend to be more concerned about the tool's potential privacy issues than they are about non-privacy tools [4,5]. Further problems of integrating PETs into existing services are that, on the one hand, it is hard to decide which of the many PETs is the best choice [43,62] and, on the other hand, it is hardly possible to ask users about their preferences, since in most cases users do not notice the PET's main achievement of protecting their privacy, but rather side effects such as increased latency or more complex processes. ...
Chapter
Full-text available
This chapter provides information about acceptance factors of privacy-enhancing technologies (PETs) based on our research on why users use Tor and JonDonym, respectively. For that purpose, we surveyed 124 Tor users (Harborth and Pape 2020) and 142 JonDonym users (Harborth and Pape 2020) and did a quantitative evaluation (PLS-SEM) of different user acceptance factors. We investigated trust in the PET and perceived anonymity (Harborth et al. 2021; Harborth et al. 2020; Harborth and Pape 2018), privacy concerns, and risk and trust beliefs (Harborth and Pape 2019) based on Internet Users' Information Privacy Concerns (IUIPC) and privacy literacy (Harborth and Pape 2020). The result was that trust in the PET seems to be the major driver. Furthermore, we investigated the users' willingness to pay or donate for/to the service (Harborth et al. 2019). In this case, risk propensity and the frequency of perceived improper invasions of users' privacy were relevant factors besides trust in the PET. While these results were new in terms of the application of acceptance factors to PETs, none of the identified factors was surprising. To identify new factors and learn about differences in users' perceptions between the two PETs, we also did a qualitative analysis of the questions of whether users have any concerns about using the PET, when they would be willing to pay or donate, which features they would like to have, and why they would (or would not) recommend the PET (Harborth et al. 2021; Harborth et al. 2020). To also investigate the perspective of companies, we additionally interviewed 12 experts and managers dealing with privacy and PETs in their daily business and identified incentives for and hindrances to implementing PETs from a business perspective (Harborth et al. 2018).
... Regarding ML services, the aim of PPML is to allow the training of such models while keeping at the same time the data of the input parties (data subject) private [2]. While it is already a challenge to identify the best suitable PPML technique from a technical point of view [29,39], it is even harder to assess which technique has the best end-users acceptance. It is widely recognised that knowledge [11,37] and privacy literacy [23] influence the users' privacy concerns, and thus the acceptance of the service. ...
Conference Paper
Users are confronted with a variety of different machine learning applications in many domains. To make this possible, especially for applications relying on sensitive data, companies and developers are implementing Privacy-Preserving Machine Learning (PPML) techniques, which is already a challenge in itself. This study provides the first step toward answering the question of how to include users' preferences for a PPML technique in the privacy-by-design process when developing a new application. The goal is to support developers and AI service providers in choosing a PPML technique that best reflects the users' preferences. Based on discussions with privacy and PPML experts, we derived a framework that maps the characteristics of PPML to user acceptance criteria.
... The authors are grateful to the Forschungsvereinigung Automobiltechnik e.V. (FAT e.V.), which funded this research. We are particularly grateful to FAT's working group "AK 31 Elektronik und Software", which not only initiated this research but also provided input and feedback on the underlying reports [19,20] in various meetings. ...
... In contrast, our paper aims at the technical realization of GDPR requirements. In addition to the system model, our work reflects on implementation considerations by selecting suitable privacy-preserving technologies (PPT) that are required for a technical realization in the vehicle, which can be challenging in itself [19,28]. The concept of the Privacy Manager (cf. ...
Conference Paper
Cars are rapidly getting connected with their environment, enabling all kinds of mobility services based on the data from various sensors in the car. Data privacy is in many cases only ensured by legislation, i.e., the European General Data Protection Regulation (GDPR), but not technically enforced. Therefore, we present a system model for enforcing purpose limitation based on data tagging and attribute-based encryption. By encrypting sensitive data in a way that only services for a certain purpose can decrypt it, we ensure access control based on the purpose of a service. In this paper, we present and discuss our system model with the aim of improving the technical enforcement of GDPR principles. CCS CONCEPTS: • Security and privacy → Human and societal aspects of security and privacy; Privacy protections; Usability in security and privacy; • Computer systems organization → Special purpose systems.
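The purpose-gated decryption described in this abstract can be illustrated with a deliberately simplified sketch. The per-purpose key derivation and the XOR stream below are hypothetical stand-ins for a real attribute-based encryption scheme (names such as `purpose_key` are invented for illustration, and XOR is of course not secure):

```python
import hashlib

def purpose_key(master_secret: bytes, purpose: str) -> bytes:
    # Toy stand-in for attribute-based encryption: each purpose tag
    # gets its own symmetric key; a real system would use an ABE scheme.
    return hashlib.sha256(master_secret + purpose.encode()).digest()

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # Illustrative XOR stream (NOT secure) to show purpose-gated access.
    stream = (key * (len(data) // len(key) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, stream))

master = b"vehicle-master-secret"
reading = b"GPS 52.4227,10.7865"  # sensitive sensor datum, tagged by purpose
tagged = {"purpose": "navigation",
          "payload": xor_crypt(purpose_key(master, "navigation"), reading)}

# A service authorized for "navigation" can decrypt the payload ...
nav = xor_crypt(purpose_key(master, "navigation"), tagged["payload"])
# ... while an "advertising" service recovers only garbage.
ads = xor_crypt(purpose_key(master, "advertising"), tagged["payload"])
```

The point of the sketch is the access-control structure, not the cryptography: data is tagged with a purpose, and only a key bound to that purpose yields the plaintext.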
... Inferences can be seen as "new" data, created through the combination of (personal) data of different types and sources. Inferences can also be targeted at de-identified data [49,73] by combining the existing data set with another set to re-identify users. The ethical issue is how these inferences should be treated, taking all circumstances into consideration: the different entities involved (creator, data subjects), the type of data, as well as its purpose and processing. ...
Book
Full-text available
This book presents the main scientific results from the GUARD project. It aims at filling the current technological gap between software management paradigms and cybersecurity models, the latter still lacking orchestration and agility to effectively address the dynamicity of the former. This book provides a comprehensive review of the main concepts, architectures, algorithms, and non-technical aspects developed during three years of investigation.
Chapter
Full-text available
Blockchain can be successfully utilised in diverse areas, including the financial sector and Information and Communication Technology environments such as computational clouds (CC). While cloud computing optimises the use of resources, it does not (yet) provide an effective solution for the secure hosting, scheduling, and execution of large computing and data applications, or for the prevention of external attacks. This chapter briefly reviews recent blockchain-inspired task scheduling and information processing methods in computational clouds. We pay special attention to security, intrusion detection, and unauthorised manipulation of tasks and information in such systems. As an example, we present the implementation of a new blockchain-based scheduler in the computational cloud. We defined a new Proof of Schedule consensus algorithm, which works with the Stackelberg game, regulates the checking and adding of new blocks to the blockchain, and determines how to validate schedules stored in transactions. The proposed model assumes competition between different schedule providers. The winner of such a competition takes the client's requirements into account faster and prepares an optimal schedule to meet them. The presented scheduler extends the possibilities of using different scheduling modules by the end-users. By delegating the preparation of the schedules, providers can get benefits for that alone, without executing customer tasks.
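The schedule competition described above can be sketched in a deliberately simplified form. All helper names below are hypothetical, and the real Proof of Schedule also involves the Stackelberg game and transaction validation; this sketch only shows the core idea of providers competing on schedule quality before a block is chained:

```python
import hashlib
import json

def makespan(schedule):
    """Finish time of the most-loaded machine in a (task_len, machine) list."""
    loads = {}
    for task_len, machine in schedule:
        loads[machine] = loads.get(machine, 0) + task_len
    return max(loads.values())

def pick_winner(candidates):
    """Toy competition rule: the provider with the shortest makespan wins."""
    return min(candidates, key=lambda c: makespan(c["schedule"]))

def append_block(chain, block):
    """Hash-link the winning schedule onto the chain."""
    block["prev"] = chain[-1]["hash"] if chain else "0" * 64
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)
    return chain

candidates = [
    {"provider": "A", "schedule": [(4, 0), (4, 0), (2, 1)]},  # makespan 8
    {"provider": "B", "schedule": [(4, 0), (2, 1), (4, 1)]},  # makespan 6
]
chain = append_block([], pick_winner(candidates))
```

Provider B's schedule finishes earlier, so its block is the one validated and appended.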
Chapter
Full-text available
Intrusion Detection Systems (IDSs) monitor all kinds of IT infrastructures to automatically detect malicious activities related to cyber attacks. Unfortunately, anomaly-based IDSs in particular are known to produce large numbers of alerts, including false positives, that often become overwhelming for manual analysis. However, due to a fast-changing threat landscape, quickly evolving attack techniques, and an ever-growing number of vulnerabilities, novel anomaly detection systems that enable the detection of unknown attacks are indispensable. Therefore, to reduce the number of alerts that have to be reviewed by security analysts, aggregation methods have been developed for filtering, grouping, and correlating alerts. Yet, existing techniques either rely on manually defined attack scenarios or require specific alert formats, such as IDMEF, which includes IP addresses. This makes the application of existing aggregation methods infeasible for alerts from host-based or anomaly-based IDSs, which frequently lack such network-related data. In this chapter, we present a domain-independent alert aggregation technique that enables automatic attack pattern mining and the generation of actionable CTI. The chapter describes the concept of the proposed alert aggregation process as well as a dashboard that enables visualization and filtering of the results. Finally, the chapter demonstrates all features in the course of an application example.
Chapter
Full-text available
Enabling cybersecurity and protecting personal data are crucial challenges in the development and provision of digital service chains. Data and information are the key ingredients in the creation process of new digital services and products. While legal and technical problems are frequently discussed in academia, ethical issues of digital service chains and the commercialization of data are seldom investigated. Thus, based on outcomes of the Horizon2020 PANELFIT project, this work discusses current ethical issues related to cybersecurity. Utilizing expert workshops and encounters as well as a scientific literature review, ethical issues are mapped onto individual steps of digital service chains. Not surprisingly, the results demonstrate that ethical challenges cannot be resolved in a general way, but need to be discussed individually and with respect to the ethical principles that are violated in the specific step of the service chain. Nevertheless, our results support practitioners by providing and discussing a list of ethical challenges to enable legally compliant as well as ethically acceptable solutions in the future.
Chapter
Full-text available
Modern computing paradigms (i.e., cloud, edge, Internet of Things) and ubiquitous connectivity have brought the notion of pervasive computing to an unforeseeable level, which boosts service-oriented architectures and microservices patterns to create digital services with data-centric models. However, the resulting agility in service creation and management has not been followed by a similar evolution in cybersecurity patterns, which still largely rest on more conventional device- and infrastructure-centric models. In this Chapter, we describe the implementation of the GUARD Platform, which represents the core element of a modern cybersecurity framework for building detection and analytics services for complex digital service chains. We briefly review the logical components and how they address scientific and technological challenges behind the limitations of existing cybersecurity tools. We also provide validation and performance analysis that show the feasibility and efficiency of our implementation.
Chapter
Full-text available
In research, the role of ethics grows more and more every year. One might be surprised, but even in the field of technology there is a necessity for experts to understand and implement ethical principles. Ethics itself can be understood as a code or moral way by which a person lives and works. But within information technology and cybersecurity research, there is a chance that even the most technically appropriate solution does not go in line with the corresponding ethical principles. Experts need to implement fundamental ethical principles in their technical products in order not to cause harm or have any negative effect on their users. The vast majority of the challenges reflected in this chapter are discussed within the EU-funded project GUARD, namely: what are the proper actions that need to be taken to ensure ethical compliance? Challenges such as ensuring the privacy of users, reporting and handling incidental findings, testing the technological product, and mitigating biases could have different negative effects on humans if not dealt with properly. The current chapter explores the questions posed above alongside a description of a methodology resulting from the combined efforts of experts in both cybersecurity and ethics.
Chapter
Full-text available
As cars and other transportation devices become increasingly interconnected, mobility takes on a new meaning, offering new opportunities. The integration of new communications technologies in modern vehicles has generated an enormous variety of data from various communications sources. Hence, there is a demand for intelligent transportation systems that can provide safe and reliable transportation while keeping environmental factors such as pollution, CO2 emissions, and energy consumption in check. This chapter provides an overview of Intelligent Transportation Systems (ITS) models. Briefly, it discusses the most important features of these systems and their challenges, mostly related to security in data and information processing. Fast anomaly detection and the prevention of external attacks may help solve the problems of traffic congestion and road safety and prevent accidents. The chapter contains a description of the realistic Smart Transportation System developed by the Wobcom company and implemented in Wolfsburg (Germany). That system is also used for the practical validation of the security service components of the platform created in the GUARD project.
Chapter
Full-text available
Detection of unknown attacks is challenging due to the lack of exemplary attack vectors. However, previously unknown attacks pose a significant danger to systems because of the lack of tools to protect against them, especially in the fast-evolving Internet of Things (IoT) technology. The most widely used approach for detecting malicious behaviour of a monitored system is anomaly detection. The malicious behaviour might result from an attack (known or unknown) or from an accidental breakdown. We present a Net Anomaly Detector (NAD) system that uses one-class-classification machine learning techniques to detect anomalies in network traffic. The highly modular architecture allows the system to be expanded with adapters for various types of networks. We propose and discuss multiple approaches for increasing detection quality and easing component deployment in unknown networks: emulation of known attacks, exhaustive feature extraction, hyperparameter tuning, detection threshold adaptation, and ensemble model strategies. Furthermore, we present both centralized and decentralized deployment schemes and preliminary results of experiments on TCP/IP network traffic conducted on the CIC-IDS2017 dataset.
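The one-class idea — train only on normal traffic and flag deviations — can be sketched minimally. The centroid-and-threshold detector below is a hypothetical stand-in for the one-class classifiers used in systems like NAD, not their actual implementation:

```python
import math

class CentroidDetector:
    """Minimal one-class detector: learn from normal traffic features
    only, then flag anything too far from their centroid (a toy
    stand-in for e.g. a one-class SVM trained on benign flows)."""

    def fit(self, normal, quantile=0.95):
        dim = len(normal[0])
        self.center = [sum(v[i] for v in normal) / len(normal)
                       for i in range(dim)]
        # Threshold = the chosen quantile of training distances.
        dists = sorted(self._dist(v) for v in normal)
        self.threshold = dists[int(quantile * (len(dists) - 1))]
        return self

    def _dist(self, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, self.center)))

    def is_anomaly(self, v):
        return self._dist(v) > self.threshold

# Features could be e.g. (packet rate, mean packet size) of a flow.
normal_flows = [(10 + i % 3, 500 + 10 * (i % 5)) for i in range(100)]
det = CentroidDetector().fit(normal_flows)
```

A flow resembling the training data falls inside the threshold, while a flood-like outlier such as `(500, 60000)` is flagged; no attack examples were needed for training, which is the appeal of the one-class approach for unknown attacks.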
Chapter
Full-text available
The distributed denial of service (DDoS) attack is an attempt to disrupt the availability of a targeted server, service, or network. The attack is achieved by corrupting or overwhelming the target's communications with a flood of malicious network traffic. In the current era of mass connectivity, DDoS attacks emerge as one of the biggest threats, steadily causing greater collateral damage and negatively impacting the Internet infrastructure as a whole. DDoS attacks come in a variety of types and schemes; they continue to evolve, steadily becoming more sophisticated and larger in scale. A close investigation of attack vectors and a refinement of current security measures are required to efficiently mitigate new DDoS threats. The solution described in this article concerns a less explored variation of signature-based techniques for DDoS mitigation. The approach exploits one of the traits of modern DDoS attacks: the utilization of packet generation algorithms (PGAs) in the attack execution. The proposed method performs fast, protocol-level detection of DDoS network packets and can easily be employed to provide effective, supplementary protection against DDoS attacks.
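A minimal sketch of the PGA-trait idea, under the purely illustrative assumption that an attack tool fills header fields with a linear congruential generator; the detector then checks whether consecutive observed values follow the known recurrence (all names here are hypothetical, not the article's actual method):

```python
def lcg_next(x, a=1103515245, c=12345, m=2**31):
    """Hypothetical packet-generation algorithm (PGA): an LCG that a
    flood tool might use to fill header fields such as sequence numbers."""
    return (a * x + c) % m

def matches_pga(values, min_hits=3):
    """Protocol-level signature check: do enough consecutive header
    values follow the known generator's recurrence?"""
    hits = sum(1 for v, w in zip(values, values[1:]) if lcg_next(v) == w)
    return hits >= min_hits

# Flood traffic produced by the generator ...
seq = [42]
for _ in range(5):
    seq.append(lcg_next(seq[-1]))

# ... versus benign, unrelated header values.
benign = [7, 1000, 31337, 4, 9999, 123]
```

Because the check is a pure header-field computation, it can run per packet at line rate, which is what makes PGA fingerprinting attractive as a supplementary defence.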
Chapter
Full-text available
For many years, signature-based intrusion detection has been applied to discover known malware and attack vectors. However, with the advent of malware toolboxes, obfuscation techniques, and the rapid discovery of new vulnerabilities, novel approaches for intrusion detection are required. System behavior analysis is a cornerstone of recognizing adversarial actions on endpoints in computer networks that are not known in advance. Logs are incrementally produced textual data that reflect events and their impact on technical systems. Their efficient analysis is key for operational cyber security. We investigate approaches beyond applying simple regular expressions, and provide insights into novel machine learning mechanisms for parsing and analyzing log data for online anomaly detection. The AMiner is an open-source implementation of a pipeline that implements many machine learning algorithms feasible for deeper analysis of system behavior, recognizing deviations from learned models and thus spotting a wide variety of even unknown attacks.
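The parsing-plus-anomaly-detection step can be sketched as follows. Masking variable tokens into templates and flagging unseen templates is a much-simplified stand-in for the learned parsers in pipelines like the AMiner (function names here are hypothetical):

```python
import re

def template(line):
    """Reduce a log line to a template by masking variable parts
    (IPs, hex values, numbers)."""
    line = re.sub(r"\b\d+(\.\d+){3}\b", "<IP>", line)
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\b\d+\b", "<NUM>", line)
    return line

class LogAnomalyDetector:
    """Flag log lines whose template was never seen during training."""

    def fit(self, lines):
        self.known = {template(l) for l in lines}
        return self

    def is_anomaly(self, line):
        return template(line) not in self.known

train = [
    "sshd: accepted password for user from 10.0.0.5 port 2222",
    "sshd: accepted password for user from 10.0.0.9 port 4711",
    "kernel: eth0 link up at 1000 Mbps",
]
det = LogAnomalyDetector().fit(train)
```

A new login from an unseen IP and port still matches a known template and passes, whereas a structurally new event (e.g. a failed root login) produces an unseen template and is flagged, without any attack signatures being defined in advance.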
Chapter
Full-text available
With the progressive implementation of digital services over virtualized infrastructures and smart devices, the inspection of network traffic becomes more challenging than ever, because of the difficulty of running legacy cybersecurity tools in novel cloud models and computing paradigms. The main issues concern (i) the portability of the service across heterogeneous public and private infrastructures, which usually lack hardware and software acceleration for efficient packet processing, and (ii) the difficulty of integrating monolithic appliances into modular and agile containerized environments. In this chapter, we investigate the usage of the extended Berkeley Packet Filter (eBPF) for effective and efficient packet inspection in virtualized environments. Our preliminary implementation demonstrates that we can achieve the same performance as well-known packet inspection tools, but with far less resource consumption. This motivates further research work to extend the capability of our framework and to integrate it in Kubernetes.
... The authors are grateful to the Forschungsvereinigung Automobiltechnik e.V. (FAT e.V.), which funded this research. We are particularly grateful to FAT's working group "AK 31 Elektronik und Software", which not only initiated this research but also provided input and feedback on the underlying report [32] in various meetings. ...
Conference Paper
Vehicles are becoming interconnected and autonomous while collecting, sharing, and processing large amounts of personal and private data. When developing a service that relies on such data, ensuring privacy-preserving data sharing and processing is one of the main challenges. Often, several entities are involved in these steps, and the interested parties are manifold. To ensure data privacy, a variety of different de-identification techniques exist, all of which exhibit unique peculiarities to be considered. In this paper, we show, using the example of a location-based weather prediction service for an energy grid operator, how the different de-identification techniques can be evaluated. With this, we aim to provide a better understanding of state-of-the-art de-identification techniques and the pitfalls to consider during implementation. Finally, we find that the optimal technique for a specific service depends highly on the scenario specifications and requirements.
Chapter
Full-text available
One way to reduce privacy risks for consumers when using the internet is to inform them better about the privacy practices they will encounter. Tailored privacy information provision could outperform the current practice where information system providers do not much more than posting unwieldy privacy notices. Paradoxically, this would require additional collection of data about consumers’ privacy preferences—which constitute themselves sensitive information so that sharing them may expose consumers to additional privacy risks. This chapter presents insights on how this paradoxical interplay can be outmaneuvered. We discuss different approaches for privacy preference elicitation, the data required, and how to best protect the sensitive data inevitably to be shared with technical privacy-preserving mechanisms. The key takeaway of this chapter is that we should put more thought into what we are building and using our systems for to allow for privacy through human-centered design instead of static, predefined solutions which do not meet consumer needs.
Chapter
Full-text available
Mobile computing devices have become ubiquitous; however, they are prone to observation and reconstruction attacks. In particular, shoulder surfing, where an adversary observes another user’s interaction without prior consent, remains a significant unresolved problem. In the past, researchers have primarily focused their research on making authentication more robust against shoulder surfing—with less emphasis on understanding the attacker or their behavior. Nonetheless, understanding these attacks is crucial for protecting smartphone users’ privacy. This chapter aims to bring more attention to research that promotes a deeper understanding of shoulder surfing attacks. While shoulder surfing attacks are difficult to study under natural conditions, researchers have proposed different approaches to overcome this challenge. We compare and discuss these approaches and extract lessons learned. Furthermore, we discuss different mitigation strategies of shoulder surfing attacks and cover algorithmic detection of attacks and proposed threat models as well. Finally, we conclude with an outlook of potential next steps for shoulder surfing research.
Chapter
Full-text available
Users should always play a central role in the development of (software) solutions. The human-centered design (HCD) process in the ISO 9241-210 standard proposes a procedure for systematically involving users. However, due to its abstraction level, the HCD process provides little guidance for how it should be implemented in practice. In this chapter, we propose three concrete practical methods that enable the reader to develop usable security and privacy (USP) solutions using the HCD process. This chapter equips the reader with the procedural knowledge and recommendations to: (1) derive mental models with regard to security and privacy, (2) analyze USP needs and privacy-related requirements, and (3) collect user characteristics on privacy and structure them by user group profiles and into privacy personas. Together, these approaches help to design measures for a user-friendly implementation of security and privacy measures based on a firm understanding of the key stakeholders.
Chapter
Full-text available
A variety of methods and techniques are used in usable privacy and security (UPS) to study users’ experiences and behaviors. When applying empirical methods, researchers in UPS face specific challenges, for instance, to represent risk to research participants. This chapter provides an overview of the empirical research methods used in UPS and highlights associated opportunities and challenges. This chapter also draws attention to important ethical considerations in UPS research with human participants and highlights possible biases in study design.
Article
Full-text available
The license plate is an essential characteristic for identifying vehicles in traffic management, and thus license plate recognition is important for the Internet of Vehicles. Since 5G is now widely deployed, mobile devices are utilized to assist traffic management, which is a significant part of Industry 4.0. However, there have always been privacy risks due to the centralized training of models. Also, the trained model cannot be directly deployed on a mobile device due to its large number of parameters. In this paper, we propose a federated learning-based license plate recognition framework (FedLPR) to solve these problems. We design a detection and recognition model to be applied on the mobile device. In terms of user privacy, individual users' data is harnessed on their mobile devices instead of the server to train models based on federated learning. Extensive experiments demonstrate that FedLPR has high accuracy and acceptable communication cost while preserving user privacy.
Article
Full-text available
This study introduces a software-based traffic congestion monitoring system. Transportation systems control the traffic between cities all over the world, yet congestion occurs not only in cities but also on highways and elsewhere, and current systems perform poorly in areas without monitoring infrastructure. In order to overcome the limitations of the current traffic system in obtaining road data and to expand its visual range, the system uses remote sensing data as the data source for judging congestion. Since some remote sensing data must be kept confidential, effectively protecting this data during deep learning training is a problem to be solved. In contrast to general deep learning training methods, this study provides a federated learning method to identify vehicle targets in remote sensing images, solving the data privacy problem during training on remote sensing data. The experiment takes remote sensing image data sets of Los Angeles and Washington roads as training samples; training achieves an accuracy of about 85%, and the estimated processing time per image can be as low as 0.047 s. In the final experimental results, the system can automatically identify vehicle targets in remote sensing images to detect congestion.
Article
Full-text available
Existing traffic flow forecasting approaches based on deep learning achieve excellent results on the large datasets gathered by governments and organizations. However, these datasets may contain a lot of users' private data, which challenges current prediction approaches as user privacy has become a growing public concern in recent years. How to develop accurate traffic prediction while preserving privacy is therefore a significant problem, and there is a trade-off between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning and propose a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction. FedGRU differs from current centralized learning methods: it updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. Within this mechanism, we adopt a Federated Averaging algorithm to reduce the communication overhead of model parameter transmission. Furthermore, we design a Joint Announcement Protocol to improve the scalability of FedGRU, and we propose an ensemble clustering-based scheme for traffic flow prediction that groups the organizations into clusters before applying the FedGRU algorithm. Extensive case studies on a real-world dataset demonstrate that, under the privacy preservation constraint, FedGRU's predictions are merely 0.76 km/h worse than the state-of-the-art in terms of mean absolute error, confirming that the proposed model delivers accurate traffic predictions without compromising data privacy.
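The Federated Averaging idea underlying FedGRU can be illustrated with a minimal sketch: each client runs gradient steps locally and only model weights, never raw data, reach the server, which averages them weighted by local sample counts. This is a toy NumPy version with a linear model standing in for the paper's GRU; all names and dimensions here are illustrative, not from the paper.

```python
import numpy as np

def client_update(weights, X, y, lr=0.1, epochs=5):
    """One client: local gradient steps on a linear model; raw (X, y) never leaves."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=10):
    """Server: collect client weights and average them, weighted by sample count."""
    for _ in range(rounds):
        updates = [(client_update(global_w, X, y), len(y)) for X, y in clients]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w

# Three clients, each holding a private shard generated from the same true model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = [(X, X @ true_w) for X in (rng.normal(size=(50, 2)) for _ in range(3))]

w = federated_averaging(np.zeros(2), clients, rounds=20)
print(np.round(w, 2))  # converges toward [ 2. -1.]
```

The weighting by `len(y)` mirrors FedAvg's proportional aggregation; a secure aggregation mechanism, as in FedGRU, would additionally hide the individual client updates from the server.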
Article
Full-text available
Road vehicle operations are continuously monitored through physical parameters (temperature, air flow, rotation rate); such measurements are retrieved by electronic sensors and communicated, over the internal vehicle communication protocol, towards the Main Control Unit for further processing. In this paper we present our selection of parameters for monitoring key vehicle operations and briefly describe the sensors employed for the retrieval of these parameter values. The values are retrieved through the OBD-II diagnostics protocol and are related to the vehicle operation and to the fuel consumption. As proof of concept, focused experimentation has taken place through a 5 km trip with low and heavy traffic. Values retrieved from the OBD-II scanner are presented and discussed. In terms of evaluation, the raw values as well as the calculated measurements related to fuel consumption are compared with manufacturer standards, and user driving behaviour has been identified as the key factor influencing the fuel consumption for a given model.
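The OBD-II parameters mentioned above are returned as raw bytes that must be decoded with the scaling formulas of the SAE J1979 standard (Mode 01 PIDs). The sketch below decodes a few common PIDs and derives a rough fuel-rate estimate from mass air flow; the stoichiometric fuel-rate conversion is a common approximation, not taken from this paper, and the fuel density value is an assumption for gasoline.

```python
def decode_obd2(pid, data):
    """Decode a few standard SAE J1979 Mode 01 PIDs from raw response bytes."""
    a = data[0]
    b = data[1] if len(data) > 1 else 0
    if pid == 0x05:                 # engine coolant temperature [deg C]
        return a - 40
    if pid == 0x0C:                 # engine RPM [1/min]
        return (256 * a + b) / 4
    if pid == 0x0D:                 # vehicle speed [km/h]
        return a
    if pid == 0x10:                 # mass air flow (MAF) [g/s]
        return (256 * a + b) / 100
    raise ValueError(f"PID 0x{pid:02X} not handled in this sketch")

def fuel_rate_l_per_h(maf_gps, afr=14.7, fuel_density_gpl=745.0):
    """Rough gasoline fuel-rate estimate from MAF, assuming a stoichiometric
    air-fuel ratio of 14.7 and an assumed fuel density of 745 g/l."""
    return maf_gps / afr / fuel_density_gpl * 3600

rpm = decode_obd2(0x0C, bytes([0x1A, 0xF8]))  # 0x1AF8 = 6904 -> 1726.0 rpm
```

A real scanner session would wrap these formulas around an ELM327-style serial dialogue; the decoding itself is independent of the transport.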
Conference Paper
Full-text available
The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. With ESA, the privacy of monitored users' data is guaranteed by its processing in a three-step pipeline. First, the data is encoded to control scope, granularity, and randomness. Second, the encoded data is collected in batches subject to a randomized threshold, and blindly shuffled, to break linkability and to ensure that individual data items get "lost in the crowd" of the batch. Third, the anonymous, shuffled data is analyzed by a specific analysis engine that further prevents statistical inference attacks on analysis results. ESA extends existing best-practice methods for sensitive-data analytics, by using cryptography and statistical techniques to make explicit how data is elided and reduced in precision, how only common-enough, anonymous data is analyzed, and how this is done for only specific, permitted purposes. As a result, ESA remains compatible with the established workflows of traditional database analysis. Strong privacy guarantees, including differential privacy, can be established at each processing step to defend against malice or compromise at one or more of those steps. Prochlo develops new techniques to harden those steps, including the Stash Shuffle, a novel scalable and efficient oblivious-shuffling algorithm based on Intel's SGX, and new applications of cryptographic secret sharing and blinding. We describe ESA and Prochlo, as well as experiments that validate their ability to balance utility and privacy.
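The three-step ESA pipeline can be sketched in miniature: encode coarsens each report, the shuffler permutes the batch and drops values too rare to hide "in the crowd", and only aggregates leave the analyzer. This toy version (bucket size, threshold, and the speed-reading scenario are illustrative assumptions; Prochlo's actual shuffler is an SGX-based oblivious shuffle with cryptographic protections) shows the data flow only.

```python
import random
from collections import Counter

def encode(value, bucket=10):
    """Encode: coarsen each report so a single value loses precision."""
    return (value // bucket) * bucket

def shuffle_and_threshold(reports, t=3):
    """Shuffle: permute reports to break linkability to arrival order,
    then drop values reported by fewer than t users."""
    shuffled = random.sample(reports, len(reports))
    counts = Counter(shuffled)
    return [r for r in shuffled if counts[r] >= t]

def analyze(reports):
    """Analyze: release only aggregate statistics over the anonymous batch."""
    return Counter(reports)

raw = [23, 27, 25, 61, 44, 41, 48, 22]      # e.g. per-user speed readings
encoded = [encode(v) for v in raw]          # [20, 20, 20, 60, 40, 40, 40, 20]
survivors = shuffle_and_threshold(encoded)  # 60 reported only once -> dropped
print(analyze(survivors))                   # Counter({20: 4, 40: 3})
```

The thresholding step is what makes rare, hence identifying, values disappear before analysis; ESA layers randomized thresholds and differential privacy on top of this basic shape.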
Article
Intelligent transportation systems, especially Autonomous Vehicles (AVs), are emerging as a paradigm with the potential to change modern society. With this, however, comes a strong need to ensure the security and privacy of such systems. AV ecosystems depend on machine learning algorithms to autonomously control their operations. Given the amount of personal information AVs collect, coupled with the distributed nature of such ecosystems, there is a movement to employ federated learning algorithms to develop secure decision-making models. Although federated learning is a viable candidate for data privacy, it is vulnerable to adversarial attacks, particularly data poisoning attacks, in which malicious vectors are injected in the training phase. Additionally, hyperparameters play an important role in establishing an efficient federated learning model that can be resilient against adversarial attacks. In this paper, to address these challenges, we propose a novel Optimized Quantum-based Federated Learning (OQFL) framework to automatically adjust the hyperparameters of federated learning under various adversarial attacks in AV settings. This work is innovative in two ways: first, a quantum-behaved particle swarm optimization technique is used to update the hyperparameters of the learning rate and the local and global epochs. Second, the proposed technique is utilized within a cyber defense framework to defend against adversarial attacks. The performance of the proposed framework was evaluated using two benchmark datasets, MNIST and Fashion-MNIST, whose images stand in for those captured by the smart cameras of AVs. The framework is shown to be more resilient against various adversarial attacks than peer techniques.
Conference Paper
Vehicles are becoming interconnected and autonomous while collecting, sharing and processing large amounts of personal and private data. When developing a service that relies on such data, ensuring privacy-preserving data sharing and processing is one of the main challenges. Often several entities are involved in these steps, and the interested parties are manifold. To ensure data privacy, a variety of de-identification techniques exist, each with unique peculiarities to be considered. In this paper, we show, using the example of a location-based weather prediction service for an energy grid operator, how the different de-identification techniques can be evaluated. With this, we aim to provide a better understanding of state-of-the-art de-identification techniques and the pitfalls to consider during implementation. Finally, we find that the optimal technique for a specific service depends highly on the scenario's specifications and requirements.
Article
As a distributed learning approach, federated learning trains a shared learning model over distributed datasets while preserving the training data privacy. We extend the application of federated learning to parking management and introduce FedParking in which Parking Lot Operators (PLOs) collaborate to train a long short-term memory model for parking space estimation without exchanging the raw data. Furthermore, we investigate the management of Parked Vehicle assisted Edge Computing (PVEC) by FedParking. In PVEC, different PLOs recruit PVs as edge computing nodes for offloading services through an incentive mechanism, which is designed according to the computation demand and parking capacity constraints derived from FedParking. We formulate the interactions among the PLOs and vehicles as a multi-lead multi-follower Stackelberg game. Considering the dynamic arrivals of the vehicles and time-varying parking capacity constraints, we present a multi-agent deep reinforcement learning approach to gradually reach the Stackelberg equilibrium in a distributed yet privacy-preserving manner. Finally, numerical results are provided to demonstrate the effectiveness and efficiency of our scheme.
Chapter
Exchanging model updates is a widely used method in modern federated learning systems. For a long time, people believed that gradients are safe to share, i.e., that gradients are less informative than the training data. However, information is hidden in the gradients; moreover, it is even possible to reconstruct the private training data from the publicly shared gradients. This chapter discusses techniques that reveal the information hidden in gradients and validates their effectiveness on common deep learning tasks. It is important to raise awareness that shared gradients are not inherently safe. Several possible defense strategies are also discussed to prevent such privacy leakage.
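That gradients can reveal training data already holds analytically for a single linear layer: with one training sample, the weight gradient is the outer product of the output-error vector and the input, so the input can be read off by dividing a weight-gradient row by the matching bias gradient. A minimal NumPy sketch (toy dimensions; this analytic shortcut is a simple special case, not the iterative optimization attacks the chapter covers):

```python
import numpy as np

rng = np.random.default_rng(1)

# A client holds one private sample x and computes gradients of a linear
# layer y = Wx + b under some loss; only the gradients are shared.
x_private = rng.normal(size=4)
delta = rng.normal(size=3)           # dLoss/dy, for whatever loss is used

grad_W = np.outer(delta, x_private)  # dLoss/dW = delta x^T
grad_b = delta                       # dLoss/db = delta

# The server (or an eavesdropper on the updates) reconstructs x from
# the gradients alone, without ever seeing the training data:
i = np.argmax(np.abs(grad_b))        # any neuron with non-zero gradient
x_reconstructed = grad_W[i] / grad_b[i]

print(np.allclose(x_reconstructed, x_private))  # True
```

For deeper networks and batched updates the reconstruction is no longer closed-form, which is where gradient-matching attacks such as those discussed in the chapter come in.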
Article
Federated learning, as a promising machine learning approach, has emerged to leverage distributed personalized datasets from a number of nodes, for example, mobile devices, to improve performance while simultaneously providing privacy preservation for mobile users. In federated learning, training data is widely distributed and maintained on the mobile devices acting as workers. In each iteration, a central aggregator updates a global model by collecting local updates from the mobile devices, which train on their local data. However, unreliable data may be uploaded by the mobile devices (i.e., workers), leading to fraud in federated learning tasks. Workers may perform unreliable updates intentionally, for example, through data poisoning attacks, or unintentionally, for example, due to low-quality data caused by energy constraints or high-speed mobility. Therefore, identifying trusted and reliable workers for federated learning tasks becomes critical. In this article, the concept of reputation is introduced as a metric, and based on it a reliable worker selection scheme is proposed for federated learning tasks. A consortium blockchain is leveraged as a decentralized approach for achieving efficient reputation management of the workers without repudiation and tampering. Numerical analysis demonstrates that the proposed approach improves the reliability of federated learning tasks in mobile networks.
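The abstract does not spell out the reputation metric itself; one common choice for such schemes, sketched here purely for illustration, is a beta-reputation score that tracks how often a worker's updates were judged reliable, with new workers starting at a neutral 0.5. The class names, threshold, and selection rule below are assumptions, not the paper's design, and the blockchain layer is omitted.

```python
class WorkerReputation:
    """Beta-reputation score over observed reliable vs. unreliable updates."""
    def __init__(self):
        self.good = 0
        self.bad = 0

    def record(self, reliable):
        """Log the outcome of one verified model update."""
        if reliable:
            self.good += 1
        else:
            self.bad += 1

    @property
    def score(self):
        # Expected value of Beta(good + 1, bad + 1); 0.5 with no history.
        return (self.good + 1) / (self.good + self.bad + 2)

def select_workers(workers, threshold=0.7):
    """Admit only workers whose reputation exceeds the threshold."""
    return [wid for wid, rep in workers.items() if rep.score > threshold]

workers = {w: WorkerReputation() for w in ("a", "b", "c")}
for _ in range(8):
    workers["a"].record(True)    # consistently reliable -> score 0.9
    workers["b"].record(False)   # consistently unreliable -> score 0.1
workers["c"].record(True)        # too little history yet -> score ~0.67

print(select_workers(workers))   # ['a']
```

Recording these outcomes on a consortium blockchain, as the article proposes, makes the good/bad tallies tamper-evident and non-repudiable across aggregators.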
evaluation of de-identification procedures for personal data in the automotive sector
evaluation of de-identification procedures for personal data in the automotive sector. Universitätsbibliothek Johann Christian Senckenberg, May 2021. doi: http://dx.doi.org/10.21248/gups.63413. URL http://publikationen.ub.uni-frankfurt.de/frontdoor/
Contra: Defending against poisoning attacks in federated learning
  • Sana Awan
  • Bo Luo
  • Fengjun Li
Sana Awan, Bo Luo, and Fengjun Li. Contra: Defending against poisoning attacks in federated learning. In European Symposium on Research in Computer Security, pages 455-475. Springer, 2021.
  • Úlfar Erlingsson
  • Vitaly Feldman
  • Ilya Mironov
  • Ananth Raghunathan
Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, and Abhradeep Thakurta. Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation. arXiv preprint arXiv:2001.03618, 2020.
Mitigating sybils in federated learning poisoning
  • Clement Fung
  • Chris J. M. Yoon
  • Ivan Beschastnikh
Clement Fung, Chris JM Yoon, and Ivan Beschastnikh. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866, 2018.
Evaluating gradient inversion attacks and defenses in federated learning
  • Yangsibo Huang
  • Samyak Gupta
  • Zhao Song
  • Kai Li
  • Sanjeev Arora
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora. Evaluating gradient inversion attacks and defenses in federated learning. Advances in Neural Information Processing Systems, 34, 2021b.
Learning to detect malicious clients for robust federated learning
  • Suyi Li
  • Yong Cheng
  • Wei Wang
  • Yang Liu
  • Tianjian Chen
Suyi Li, Yong Cheng, Wei Wang, Yang Liu, and Tianjian Chen. Learning to detect malicious clients for robust federated learning. arXiv preprint arXiv:2002.00211, 2020.
  • Priyanka Mary Mammen
Priyanka Mary Mammen. Federated learning: Opportunities and challenges. arXiv preprint arXiv:2101.05428, 2021.
Assessing the impact of driving behavior on instantaneous fuel consumption
  • Javier E Meseguer
  • Carlos T Calafate
  • Juan Carlos Cano
  • Pietro Manzoni
Javier E Meseguer, Carlos T Calafate, Juan Carlos Cano, and Pietro Manzoni. Assessing the impact of driving behavior on instantaneous fuel consumption. In 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC), pages 443-448. IEEE, 2015.
  • G. S. Oh
  • David J. LeBlanc
  • Huei Peng
G. S. Oh, David J. LeBlanc, and Huei Peng. Vehicle energy dataset (VED), a large-scale dataset for vehicle energy consumption research. arXiv preprint arXiv:1905.02081, 2019.
  • Ziteng Sun
  • Peter Kairouz
  • Ananda Theertha Suresh
  • H Brendan Mcmahan
Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H Brendan McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019.
Data poisoning attacks against federated learning systems
  • Vale Tolpegin
  • Stacey Truex
  • Mehmet Emre Gursoy
  • Ling Liu
Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. Data poisoning attacks against federated learning systems. In European Symposium on Research in Computer Security, pages 480-501. Springer, 2020.
Mitigating backdoor attacks in federated learning
  • Chen Wu
  • Xian Yang
  • Sencun Zhu
  • Prasenjit Mitra
Chen Wu, Xian Yang, Sencun Zhu, and Prasenjit Mitra. Mitigating backdoor attacks in federated learning. arXiv preprint arXiv:2011.01767, 2020.