Eric Bodden’s research while affiliated with Fraunhofer Institute for Mechatronic Systems Design and other places


Publications (245)


Fig. 3: SliceViz interactive dashboard for Roidsec [48].
Fig. 4: SliceViz customization options for users.
Fig. 5: Visualized Jimple slice in Roidsec [48], blue edge: data edge, green edge: control and data edge.
Fig. 6: Java view for the Roidsec Jimple slice (cf. Figure 5).
Visualizing Privacy-Relevant Data Flows in Android Applications
  • Preprint

March 2025

·

10 Reads

Mugdha Khedkar

·

·

Santhosh Mohan

·

Eric Bodden

Android applications collecting data from users must protect it according to the current legal frameworks. Such data protection has become even more important since in 2018 the European Union rolled out the General Data Protection Regulation (GDPR). Since app developers are not legal experts, they find it difficult to integrate privacy-aware practices into source code development. Despite these legal obligations, developers have limited tool support to reason about data protection throughout their app development process. This paper explores the use of static program slicing and software visualization to analyze privacy-relevant data flows in Android apps. We introduce SliceViz, a web tool that analyzes an Android app by slicing all privacy-relevant data sources detected in the source code on the back-end. It then helps developers by visualizing these privacy-relevant program slices. We conducted a user study with 12 participants demonstrating that SliceViz effectively aids developers in identifying privacy-relevant properties in Android apps. Our findings indicate that program slicing can be employed to identify and reason about privacy-relevant data flows in Android applications. With further usability improvements, developers can be better equipped to handle privacy-sensitive information.


Software Security Analysis in 2030 and Beyond: A Research Roadmap

December 2024

·

15 Reads

·

1 Citation

ACM Transactions on Software Engineering and Methodology

Marcel Böhme

·

Eric Bodden

·

Tevfik Bultan

·

[...]

·

As our lives, our businesses, and indeed our world economy become increasingly reliant on the secure operation of many interconnected software systems, the software engineering research community is faced with unprecedented research challenges, but also with exciting new opportunities. In this roadmap paper, we outline our vision of Software Security Analysis for the systems of the future. Given the recent advances in generative AI, we need new methods to assess and maximize the security of code co-written by machines. As our systems become increasingly heterogeneous, we need practical approaches that work even if some functions are automatically generated, e.g., by deep neural networks. As software systems depend evermore on the software supply chain, we need tools that scale to an entire ecosystem. What kind of vulnerabilities exist in future systems and how do we detect them? When all the shallow bugs are found, how do we discover vulnerabilities hidden deeply in the system? Assuming we cannot find all security flaws, how can we nevertheless protect our system? To answer these questions, we start our roadmap with a survey of recent advances in software security, then discuss open challenges and opportunities, and conclude with a long-term perspective for the field.


A Study of Privacy-Related Data Collected by Android Apps

November 2024

·

71 Reads

Many Android apps collect data from users, and the European Union's General Data Protection Regulation (GDPR) mandates clear disclosures of such data collection. However, apps often use third-party code, complicating accurate disclosures. This paper investigates how accurately current Android apps fulfill these requirements. In this work, we present a multi-layered definition of privacy-related data to correctly report data collection in Android apps. We further create a dataset of privacy-sensitive data classes that may be used as input by an Android app. This dataset takes into account data collected both through the user interface and system APIs. Based on this, we implement a semi-automated prototype that detects and labels privacy-related data collected by a given Android app. We manually examine the data safety sections of 70 Android apps to observe how data collection is reported, identifying instances of over- and under-reporting. We compare our prototype’s results with the data safety sections of 20 apps, revealing reporting discrepancies. Using the results from two Messaging and Social Media apps (Signal and Instagram), we discuss how app developers under-report and over-report data collection, respectively, and identify inaccurately reported data categories. A broader study of 7,500 Android apps reveals that apps most frequently collect data that can partially identify users. Although system APIs consistently collect large amounts of privacy-related data, user interfaces exhibit more diverse data collection patterns. A more focused study on various domains of apps reveals that the largest fraction of apps collecting personal data belongs to the domain of Messaging and Social Media. Our findings show that location is collected frequently by apps, especially by those in the E-commerce and Shopping domain. However, it is often under-reported in app data safety sections. Our results highlight the need for greater consistency in privacy-aware app development and reporting practices.






An Empirical Study of Large Language Models for Type and Call Graph Analysis

October 2024

·

20 Reads

Large Language Models (LLMs) are increasingly being explored for their potential in software engineering, particularly in static analysis tasks. In this study, we investigate the potential of current LLMs to enhance call-graph analysis and type inference for Python and JavaScript programs. We empirically evaluated 24 LLMs, including OpenAI's GPT series and open-source models like LLaMA and Mistral, using existing and newly developed benchmarks. Specifically, we enhanced TypeEvalPy, a micro-benchmarking framework for type inference in Python, with auto-generation capabilities, expanding its scope from 860 to 77,268 type annotations for Python. Additionally, we introduced SWARM-CG and SWARM-JS, comprehensive benchmarking suites for evaluating call-graph construction tools across multiple programming languages. Our findings reveal a contrasting performance of LLMs in static analysis tasks. For call-graph generation in Python, traditional static analysis tools like PyCG significantly outperform LLMs. In JavaScript, the static tool TAJS underperforms due to its inability to handle modern language features, while LLMs, despite showing potential with models like mistral-large-it-2407-123b and GPT-4o, struggle with completeness and soundness in both languages for call-graph analysis. Conversely, LLMs demonstrate a clear advantage in type inference for Python, surpassing traditional tools like HeaderGen and hybrid approaches such as HiTyper. These results suggest that while LLMs hold promise in type inference, their limitations in call-graph analysis highlight the need for further research. Our study provides a foundation for integrating LLMs into static analysis workflows, offering insights into their strengths and current limitations.


Software Security Analysis in 2030 and Beyond: A Research Roadmap

September 2024

·

225 Reads

As our lives, our businesses, and indeed our world economy become increasingly reliant on the secure operation of many interconnected software systems, the software engineering research community is faced with unprecedented research challenges, but also with exciting new opportunities. In this roadmap paper, we outline our vision of Software Security Analysis for the software systems of the future. Given the recent advances in generative AI, we need new methods to evaluate and maximize the security of code co-written by machines. As our software systems become increasingly heterogeneous, we need practical approaches that work even if some functions are automatically generated, e.g., by deep neural networks. As software systems depend evermore on the software supply chain, we need tools that scale to an entire ecosystem. What kind of vulnerabilities exist in future systems and how do we detect them? When all the shallow bugs are found, how do we discover vulnerabilities hidden deeply in the system? Assuming we cannot find all security flaws, how can we nevertheless protect our system? To answer these questions, we start our research roadmap with a survey of recent advances in software security, then discuss open challenges and opportunities, and conclude with a long-term perspective for the field.



Citations (60)


... The software engineering community has developed many approaches and techniques to analyze the properties (incl. correctness) of a program [2]. However, in the future, we might simply prompt an LLM to solve a problem. ...

Reference:

Empirical Computation
Software Security Analysis in 2030 and Beyond: A Research Roadmap
  • Citing Article
  • December 2024

ACM Transactions on Software Engineering and Methodology

... Before being able to create a store listing for an Android app, developers must complete this form. This manual process often results in inaccuracies in reporting [14], [15], providing users with a false sense of privacy. To bridge the gap between reported and actual handling of privacy-relevant data, app developers need technical support. ...

Do Android App Developers Accurately Report Collection of Privacy-Related Data?
  • Citing Conference Paper
  • October 2024

... Completeness and Soundness. In this study, we use the terms completeness and soundness as they have been pre-established in callgraph research (Salis et al., 2021; Venkatesh et al., 2024a). The terms completeness and soundness are closely related to the precision and recall metrics. ...

Static analysis driven enhancements for comprehension in machine learning notebooks

Empirical Software Engineering

... Furthermore, aligning with the literature (Allamanis et al., 2020; Mir et al., 2022; Peng et al., 2022; Venkatesh et al., 2024b, 2023a), for type-inference evaluation, we use exact matches as the metric as well. ...

The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks
  • Citing Conference Paper
  • June 2024

... Yet, an important difference to taint analysis is that for analyzing privacy-relevant data one needs to explore all paths originating at the sources of personal data, and needs to understand the processing activities along those paths. Without a predefined list of sinks, the analysis requires thorough examination of all code handling personal data [19]. ...

Toward an Android Static Analysis Approach for Data Protection
  • Citing Conference Paper
  • June 2024

... Methods2Test [155] CRUXEval [156] CRQBench [157] CriticBench [158] CodeScope [159] Merge Conflict Repair ConflictBench [160] Type Inference TypeEvalPy [161] TypeEvalPy AutoGen [161] Automatic Code Quality Review ...

TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools
  • Citing Conference Paper
  • May 2024

... Several analysis frameworks for imperative or object-oriented languages perform a flow-insensitive points-to analysis in the style of Andersen [3] or Steensgaard [65] as the basis of more advanced analyses of the program. In this way, the Java analysis frameworks Soot [36] and FlowDroid [6] rely on pointer analyses provided by Spark [38], while SootUp [33] relies on QiLin [27,29]. Cai and Zhang [11] propose a call graph construction combining flow-insensitive points-to analysis with a flow-sensitive refinement. ...

SootUp: A Redesign of the Soot Static Analysis Framework

Lecture Notes in Computer Science

... CODEC CRYPTO APIs were included since the insecure use of codec/crypto libraries can lead to vulnerabilities [19]. After the categorization, we have 3 categories (themes) and 15 subcategories, as shown in Table III. ...

Securing Your Crypto-API Usage Through Tool Support - A Usability Study
  • Citing Conference Paper
  • October 2023

... Croft et al. (2021) concluded that although learning-based approaches had better precision, both learning-based and SAST tools approaches should be used independently. Piskachev et al. (2023) did a user study of SAST tools in resolving security vulnerabilities and provided a list of recommendations for software security professionals and practitioners. Scandariato et al. (2013) studied users' experiences of nine participants using a SAST tool and an automated tool for penetration testing on two blogging applications. ...

Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study

Empirical Software Engineering

... However, none of the existing studies have created a benchmark for TEE partitioning issues. By referencing the construction methodology of the popular API-misuse dataset CryptoAPI-Bench [1,2,38], we designed the benchmark PartitioningE-Bench for evaluating the abilities of DITING in bad partitioning detection. We will describe its creation steps below. ...

Runtime Verification of Crypto APIs: An Empirical Study
  • Citing Article
  • October 2023

IEEE Transactions on Software Engineering