
Thorsten HolzRuhr-Universität Bochum | RUB · Horst Görtz Institute of IT-Security (HGI)
Thorsten Holz
Professor
About
269
Publications
131,425
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
14,103
Citations
Introduction
Skills and Expertise
Publications
Publications (269)
Measurement studies are essential for research and industry alike to understand the Web's inner workings better and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment's careful design and setup are complex, and many factors might affect the results. However, while seve...
Smart contracts are increasingly being used to manage large numbers of high-value cryptocurrency accounts. There is a strong demand for automated, efficient, and comprehensive methods to detect security vulnerabilities in a given contract. While the literature features a plethora of analysis methods for smart contracts, the existing proposals do no...
The number of papers submitted to academic conferences is steadily rising in many scientific disciplines. To handle this growth, systems for automatic paper-reviewer assignments are increasingly used during the reviewing process. These systems use statistical topic models to characterize the content of submissions and automate the assignment to rev...
We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact...
Recently, large language models for code generation have achieved breakthroughs in several programming language tasks. Their advances in competition-level programming problems have made them an emerging pillar in AI-assisted pair programming. Tools such as GitHub Copilot are already part of the daily programming workflow and are used by more than a...
Modern websites frequently use and embed third-party services to facilitate web development, connect to social media, or for monetization. This often introduces privacy issues as the inclusion of third-party services on a website can allow the third party to collect personal data about the website's visitors. While the prevalence and mechanisms of...
Fuzzing is a key method to discover vulnerabilities in programs. Despite considerable progress in this area in the past years, measuring and comparing the effectiveness of fuzzers is still an open research question. In software testing, the gold standard for evaluating test quality is mutation analysis, assessing the ability of a test to detect syn...
Diffusion models (DMs) have recently emerged as a promising method in image synthesis. They have surpassed generative adversarial networks (GANs) in both diversity and quality, and have achieved impressive results in text-to-image and image-to-image modeling. However, to date, only little attention has been paid to the detection of DM-generated ima...
Web measurement studies can shed light on not yet fully understood phenomena and thus are essential for analyzing how the modern Web works. This often requires building new and adjusting existing crawling setups, which has led to a wide variety of analysis tools for different (but related) aspects. If these efforts are not sufficiently documented,...
Modern websites frequently use and embed third-party services to facilitate web development, connect to social media, or for monetization. This often introduces privacy issues as the inclusion of third-party services on a website can allow the third party to collect personal data about the website's visitors. While the prevalence and mechanisms of...
Memory safety in complex applications implemented in unsafe programming languages such as C/C++ is still an unresolved problem in practice. Many different types of defenses have been proposed in the past to mitigate this problem. The most promising next step is a tighter integration of the hardware and software level: modern mitigation techniques a...
Voice assistants like Amazon’s Alexa, Google’s Assistant, Tencent’s Xiaowei, or Apple’s Siri, have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word like “Alexa,” “Jiǔsì’èr líng,” or “Hey Siri...
Coverage-guided fuzz testing ("fuzzing") has become mainstream and we have observed lots of progress in this research area recently. However, it is still challenging to efficiently test network services with existing coverage-guided fuzzing methods. In this paper, we introduce the design and implementation of Nyx-Net, a novel snapshot-based fuzzing...
In the arms race between binary exploitation techniques and mitigation schemes, code-reuse attacks have been proven indispensable. Typically, one of the initial hurdles is that an attacker cannot execute their own code due to countermeasures such as data execution prevention (DEP, Open image in new window ). While this technique is powerful, the ta...
Nowadays, email is still the most popular communication channel of the Internet. It is based on Simple Mail Transfer Protocol (SMTP), which lacks basic security properties such as confidentiality and authenticity despite its ever-growing importance. This results in spam and frequent phishing attacks, often with spoofed sender email addresses to app...
Phishing is in practice one of the most common attack vectors threatening digital assets. An attacker sends a legitimate-looking e-mail to a victim to lure her on a website with the goal of tricking the victim into revealing credentials. A phishing e-mail can use both technical (e.g., a forged link) and psychological vectors (e.g., an authoritarian...
Attackers use various techniques to lure victims to malicious domains. A typical approach is to generate domains which look similar to well-known ones so that a confused victim is tricked into visiting the domain. An important attack technique in practice is the impersonation of domains in the lower DNS hierarchy as subdomains of otherwise unsuspic...
Mobile networks are a crucial part of our digital lives and require adequate security measures. The 4G and 5G network standards are complex and challenging to implement, which led to several implementation issues being discovered over the last years. Conse- quently, we aim to strengthen automation in testing and increase test coverage to spot issue...
In mobile networks, IMSI-Catchers identify and track users simply by requesting all users’ permanent identities (IMSI) in range. The 5G standard attempts to fix this issue by encrypting the perma- nent identifier (now SUPI) and transmitting the SUCI. Since the encrypted SUCI is re-generated with an ephemeral key for each use, an attacker can no lon...
Software obfuscation is a crucial technology to protect intellectual property. Despite its importance, commercial and academic state-of-the-art obfuscation approaches are vulnerable to a plethora of automated deobfuscation attacks, such as symbolic execution, taint analysis, or program synthesis. While several enhanced techniques were proposed to t...
This work evaluates the reproducibility of the paper "CNN-generated images are surprisingly easy to spot... for now" by Wang et al. published at CVPR 2020. The paper addresses the challenge of detecting CNN-generated imagery, which has reached the potential to even fool humans. The authors propose two methods which help an image classifier to gener...
Adversarial examples seem to be inevitable. These specifically crafted inputs allow attackers to arbitrarily manipulate machine learning systems. Even worse, they often seem harmless to human observers. In our digital society, this poses a significant threat. For example, Automatic Speech Recognition (ASR) systems, which serve as hands-free interfa...
In the past few years, we observed a wide adoption of practical systems that use Automatic Speech Recognition (ASR) systems to improve human-machine interaction. Modern ASR systems are based on neural networks and prior research demonstrated that these systems are susceptible to adversarial examples, i.e., malicious audio inputs that lead to miscla...
Advanced Persistent Threats (APTs) are one of the main challenges in modern computer security.
They are planned and performed by well-funded, highly-trained and often state-based actors.
The first step of such an attack is the reconnaissance of the target.
In this phase, the adversary tries to gather as much intelligence on the victim as possible t...
Voice assistants like Amazon's Alexa, Google's Assistant, or Apple's Siri, have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word like ''Alexa'' or ''Hey Siri,'' before uploading the audio str...
Polymorphism and inheritance make C++ suitable for writing complex software, but significantly increase the attack surface because the implementation relies on virtual function tables (vtables). These vtables contain function pointers that attackers can potentially hijack and in practice, vtable hijacking is one of the most important attack vector...
Memory disclosure attacks play an important role in the exploitation of memory corruption vulnerabilities. By analyzing recent research, we observe that bypasses of defensive solutions that enforce control-flow integrity or attempt to detect return-oriented programming require memory disclosure attacks as a fundamental first step. However, research...
Microcode is an abstraction layer used by modern x86 processors that interprets user-visible CISC instructions to hardware-internal RISC instructions. The capability to update x86 microcode enables a vendor to modify CPU behavior in-field, and thus patch erroneous microarchitectural processes or even implement new features. Most prominently, the re...
Memory corruption vulnerabilities are still a severe threat for software systems. To thwart the exploitation of such vulnerabilities, many different kinds of defenses have been proposed in the past. Most prominently, Control-Flow Integrity (CFI) has received a lot of attention recently. Several proposals were published that apply coarse-grained pol...
The wide-spread adoption of system defenses such as the randomization of code, stack, and heap raises the bar for code-reuse attacks. Thus, attackers utilize a scripting engine in target programs like a web browser to prepare the code-reuse chain, e.g., relocate gadget addresses or perform a just-in-time gadget search. However, many types of progra...
More than two decades after the first stack smashing attacks, memory corruption vulnerabilities utilizing stack anomalies are still prevalent and play an important role in practice. Among such vulnerabilities, uninitialized variables play an exceptional role due to their unpleasant property of unpredictability: as compilers are tailored to operate...
The art of finding software vulnerabilities has been covered extensively in the literature and there is a huge body of work on this topic. In contrast, the intentional insertion of exploitable, security-critical bugs has received little (public) attention yet. Wanting more bugs seems to be counterproductive at first sight, but the comprehensive eva...
Memory corruption vulnerabilities have been around for decades and rank among the most prevalent vulnerabilities in embedded systems. Yet this constrained environment poses unique design and implementation challenges that significantly complicate the adoption of common hardening techniques. Combined with the irregular and involved nature of embedde...
Just-in-time return-oriented programming (JIT-ROP) is a powerful memory corruption attack that bypasses various forms of code randomization. Execute-only memory (XOM) can potentially prevent these attacks, but requires source code. In contrast, destructive code reads (DCR) provide a trade-off between security and legacy compatibility. The common be...
The European General Data Protection Regulation (GDPR), which went into effect in May 2018, brought new rules for the processing of personal data that affect many business models, including online advertising. The regulation's definition of personal data applies to every company that collects data from European Internet users. This includes trackin...
Vulnerabilities in private networks are difficult to detect for attackers outside of the network. While there are known methods for port scanning internal hosts that work by luring unwitting internal users to an external web page that hosts malicious JavaScript code, no such method for detailed and precise service identification is known. The reaso...
Deep neural networks can generate images that are astonishingly realistic, so much so that it is often hard for humans to distinguish them from actual photos. These achievements have been largely made possible by Generative Adversarial Networks (GANs). While these deep fake images have been thoroughly investigated in the image domain-a classical ap...
DNS rebinding is an attack technique know for more than 20 years, which is experiencing a revival caused by the ever-increasing networking of Internet of Things (IoT) devices. Thus, the potential attack surface is growing rapidly, and this paper shows that DNS rebinding attacks on many smart home devices are still successful. Nevertheless, various...
In the modern Web, service providers often rely heavily on third parties to run their services. For example, they make use of ad networks to finance their services, externally hosted libraries to develop features quickly, and analytics providers to gain insights into visitor behavior. For security and privacy, website owners need to be aware of the...
Polymorphism and inheritance make C++ suitable for writing complex software, but significantly increase the attack surface because the implementation relies on virtual function tables (vtables). These vtables contain function pointers that attackers can potentially hijack and in practice, vtable hijacking is one of the most important attack vector...
After the adoption of the General Data Protection Regulation (GDPR) in May 2018, more than 60 % of popular websites in Europe were found to display a cookie consent notice. This has quickly led to users becoming fatigued with privacy notifications and contributed to the rise of both browser extensions that block these banners and demands for a solu...
Zusammenfassung
Im Fahrwasser der EU-Datenschutzgrundverordnung (DSGVO) entstehen derzeit in mehreren Ländern neue Datenschutzgesetze, die sich teilweise an Konzepten der DSGVO orientieren. Parallel rücken die Datenverarbeitungspraktiken großer Technologiefirmen in den Fokus von Datenschutzbehörden und Medien, was die Frage aufwirft, inwieweit die...
The European General Data Protection Regulation (GDPR)
went into effect in May 2018. As part of this regulation, the right to access was extended, it grants a user the right to request access to all personal data collected by a company about this user. In this paper, we present the results of an empirical study on data exfiltration attacks that are...
Microcode is an abstraction layer on top of the physical components of a CPU and present in most general-purpose CPUs today. In addition to facilitate complex and vast instruction sets, it also provides an update mechanism that allows CPUs to be patched in-place without requiring any special hardware. While it is well-known that CPUs are regularly...
DNS rebinding is an attack technique know for more than 20 years, which is experiencing a revival caused by the ever-increasing networking of Internet of Things (IoT) devices. Thus, the potential attack surface is growing rapidly, and this paper shows that DNS rebinding attacks on many smart home devices are still succ...
Online tracking has mostly been studied by passively measuring the presence of tracking services on websites (i) without knowing what data these services collect, (ii) the reasons for which specific purposes it is collected, (iii) or if the used practices are disclosed in privacy policies. The European General Data Protection Regulation (GDPR) came...
Ad personalization has been criticized in the past for invading privacy, lack of transparency, and improper controls offered to users. Recently, companies started to provide web portals and other means for users to access data collected about them. In this paper, we study these new transparency tools from multiple perspectives using a mixed-methods...
More than two decades after the first stack smashing attacks, memory corruption vulnerabilities utilizing stack anomalies are still prevalent and play an important role in practice. Among such vulnerabilities, uninitialized variables play an exceptional role due to their unpleasant property of unpredictability: as compilers are tailored to operate...
Software complexity has increased over the years. One common way to tackle this complexity during development is to encapsulate features into a shared library. This allows developers to reuse already implemented features instead of reimplementing them over and over again. However, not all features provided by a shared library are actually used by a...
The European General Data Protection Regulation (GDPR) went into effect in May 2018. As part of this regulation, the right to access was extended, it grants a user the right to request access to all personal data collected by a company about this user. In this paper, we present the results of an empirical study on data exfiltration attacks that are...
Since the adoption of the General Data Protection Regulation (GDPR) in May 2018 more than 60 % of popular websites in Europe display cookie consent notices to their visitors. This has quickly led to users becoming fatigued with privacy notifications and contributed to the rise of both browser extensions that block these banners and demands for a so...
A general defense strategy in computer security is to increase the cost of successful attacks in both computational resources as well as human time. In the area of binary security, this is commonly done by using obfuscation methods to hinder reverse engineering and the search for software vulnerabilities. However, recent trends in automated bug fin...
Automatic speech recognition (ASR) systems are possible to fool via targeted adversarial examples. These can induce the ASR to produce arbitrary transcriptions in response to any type of audio signal, be it speech, environmental sounds, or music. However, in general, those adversarial examples did not work in a real-world setup, where the examples...
Online tracking has mostly been studied by passively measuring the presence of tracking services on websites (i) without knowing what data these services collect, (ii) the reasons for which specific purposes it is collected, (iii) or if the used practices are disclosed in privacy policies. The European General Data Protection Regulation (GDPR) came...
Software complexity has increased over the years. One common way to tackle this complexity during development is to encapsulate features into a shared library. This allows developers to reuse already implemented features instead of reimplementing them over and over again. However, not all features provided by a shared library are actually used by a...
Advertisements are the fuel that runs many online services such as websites or mobile apps, but also adversaries started to abuse ads for financial gains.
Nowadays, online advertising companies track users all over the web in order to create successful online ads campaigns specifically tailored for a target audience.
A popular phenomenon on the Int...
The Domain Name System (DNS) is a fundamental backbone service of the Internet. In practice, this infrastructure often shows flaws, which indicate that measuring the DNS is important to understand potential (security) issues. Several works deal with the DNS and present such problems, mitigations, and attack vectors. A so far overlooked issue is the...
Long Term Evolution (LTE) provides the communication infrastructure for both professional and private use cases and has become an integral part of our everyday life. Even though LTE/4G overcomes many security issues of previous standards, recent work demonstrates several attack vectors on the physical and network layers of the LTE stack. We do, how...
Long Term Evolution (LTE) is the de-facto standard for mobile communication. It provides effective security features but leaves room for misunderstandings in its configuration and implementation. In particular, providers face difficulties when maintaining network configurations.
In this paper, we analyze the security configuration of commercial LT...