Chapter

Extended Abstract: Toward Systematically Exploring Antivirus Engines

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

While different works tested antiviruses (AVs) resilience to obfuscation techniques, no work studied AVs looking at the big picture, that is including their modern components (e.g., emulators, heuristics). As a matter of fact, it is still unclear how AVs work internally. In this paper, we investigate the current state of AVs proposing a methodology to explore AVs capabilities in a black-box fashion. First, we craft samples that trigger specific components in an AV engine, and then we leverage their detection outcome and label as a side channel to infer how such components work. To do this, we developed a framework, crAVe, to automatically test and explore the capabilities of generic AV engines. Finally, we tested and explored commercial AVs and obtained interesting insights on how they leverage their internal components.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Deneysel süreçte, masum program olarak açık kaynak kodlu Notepad++ metin editörü [38] kullanılmışken, kötücül kod bloğu ise Metasploit [39] adlı yine açık kaynak kodlu zafiyet analiz ve test çatısı kullanılarak üretilmiştir. Elde edilen kötücül yazılımın bilişim pazarında yaygın olarak kullanılan antivirüs yazılımlarının kaçı tarafından başarıyla tespit edilebildiği sonucuna ise herkesin kullanımına açık kötücül yazılım analiz platformu Virustotal aracılığıyla ulaşılmıştır [40,41]. Birden fazla antivirüs yazılımının eş zamanlı aynı dosya üzerinde tarama yapabilmesine imkân veren bu platform, tarama neticesinde elden edilen bulguları bir rapor hâlinde kullanıcıya sunma yeteneğine sahiptir. ...
... Virustotal platformunda olduğu gibi birçok antivirüs yazılımının hangi konfigürasyonda ve ne kadar hassas düzeyde tarama yaptığı tam olarak bilinemediğinden, kötücül yazılım üzerinde yapılacak farklı her yeni modifikasyonun, tarama sonuçlarında ne şekilde değişimlere neden olacağı öngörülememektedir [41]. Bu sebeple her yeni modifikasyonda kötücül yazılımın Virustotal aracılığıyla tekrardan taratılabiliyor olması, siber saldırganlara bir sonraki modifikasyon adımının belirlenmesi ve test edilmesinde yol gösterici olmaktadır. ...
... These landscapes started to be presented by the first academic work tackling the problem (e.g., a review of AVs using signatures [4], or ML detectors [193]). The major drawback of the academic literature is that most works adopt black-box analyses procedures [156], exploiting the fact that still few solutions employ anti-black-box technique [70]. Whereas this approach provides interesting information, such as about the AV's energy consumption [152], they do not reveal the AV company's project decisions. ...
Article
AntiViruses (AVs) are the main defense line against attacks for most users and much research has been done about them, especially proposing new detection procedures that work in academic prototypes. However, as most current and commercial AVs are closed-source solutions, in practice, little is known about their real internals: information such as what is a typical AV database size, the detection methods effectively used in each operation mode, and how often on average the AVs are updated are still unknown. This prevents research work from meeting the industrial practices more thoroughly. To fill this gap, in this work, we systematize the knowledge about AVs. To do so, we first surveyed the literature and identified existing knowledge gaps in AV internals’ working. Further, we bridged these gaps by analyzing popular (Windows, Linux, and Android) AV solutions to check their operations in practice. Our methodology encompassed multiple techniques, from tracing to fuzzing. We detail current AV’s architecture, including their multiple components, such as browser extensions and injected libraries, regarding their implementation, monitoring features, and self-protection capabilities. We discovered, for instance, a great disparity in the set of API functions hooked by the distinct AV’s libraries, which might have a significant impact in the viability of academically-proposed detection models (e.g., machine learning-based ones).
... Adversarial malware examples are an immediate threat as they are evasive and malicious executables that can take advantage of many commercial antivirus software's persistent vulnerability to obfuscation and mutation [61]. This differentiates practical adversarial malware examples from an adversarial feature vector. ...
Preprint
Machine learning based solutions have been very helpful in solving problems that deal with immense amounts of data, such as malware detection and classification. However, deep neural networks have been found to be vulnerable to adversarial examples, or inputs that have been purposefully perturbed to result in an incorrect label. Researchers have shown that this vulnerability can be exploited to create evasive malware samples. However, many proposed attacks do not generate an executable and instead generate a feature vector. To fully understand the impact of adversarial examples on malware detection, we review practical attacks against malware classifiers that generate executable adversarial malware examples. We also discuss current challenges in this area of research, as well as suggestions for improvement and future research directions.
... Evaluating AV solutions is a hard task because most of their internal working mechanisms are closed source solutions and with limited configuration possibilities. Given this limitation, overall AV evaluations are re-quired to develop specially-crafted samples to trigger individual AV components [38]. Therefore, most evaluation reports focus on specific factors affecting AV working, such as detection regression, when a sample stops being detected after some time [15]. ...
Full-text available
Article
Security evaluation is an essential task to identify the level of protection accomplished in running systems or to aid in choosing better solutions for each specific scenario. Although antiviruses (AVs) are one of the main defensive solutions for most end-users and corporations, AV’s evaluations are conducted by few organizations and often limited to compare detection rates. Moreover, other important factors of AVs’ operating mode (e.g., response time and detection regression) are usually underestimated. Ignoring such factors create an “understanding gap” on the effectiveness of AVs in actual scenarios, which we aim to bridge by presenting a broader characterization of current AVs’ modes of operation. In our characterization, we consider distinct file types, operating systems, datasets, and time frames. To do so, we daily collected samples from two distinct, representative malware sources and submitted them to the VirusTotal (VT) service for 30 consecutive days. In total, we considered 28,875 unique malware samples. For each day, we retrieved the submitted samples’ detection rates and assigned labels, resulting in more than 1M distinct VT submissions overall. Our experimental results show that: (i) phishing contexts are a challenge for all AVs, turning malicious Web pages detectors less effective than malicious files detectors; (ii) generic procedures are insufficient to ensure broad detection coverage, incurring in lower detection rates for particular datasets (e.g., country-specific) than for those with world-wide collected samples; (iii) detection rates are unstable since all AVs presented detection regression effects after scans in different time frames using the same dataset and (iv) AVs’ long response times in delivering new signatures/heuristics create a significant attack opportunity window within the first 30 days after we first identified a malicious binary. To address the effects of our findings, we propose six new metrics to evaluate the multiple aspects that impact the effectiveness of AVs. With them, we hope to assess corporate (and domestic) users to better evaluate the solutions that fit their needs more adequately.
... Current anti-malware scanners are not effective against these evading techniques [14][15][16][17][18]. In [13], four mobile security applications were tested on more than 1200 Android malware apps. ...
... In addition, the authors in [9] compare the design of 30 top AV solutions focusing on their detection and prevention capabilities. In a similar approach, Quarta et al [10] leverage VirusTotal detections to perform a black-box analysis of its AV engines solely based on input samples and the outcome from each AV. Furthermore, the authors in [11] raise concerns regarding malware developers using multi-scanner tools in their loops and propose a methodological approach to detect them from VirusTotal submissions. ...
Article
Multi-scanner Antivirus (AV) systems are often used for detecting Android malware since the same piece of software can be checked against multiple different AV engines. However, in many cases the same software application is flagged as malware by few AV engines, and often the signatures provided contradict each other, showing a clear lack of consensus between different AV engines. This work analyzes more than 80 thousand Android applications flagged as malware by at least one AV engine, with a total of almost 260 thousand malware signatures. In the analysis, we identify 41 different malware families, we study their relationships and the relationships between the AV engines involved in such detections, showing that most malware cases belong to either Adware abuse or really dangerous Harmful applications, but some others are unspecified (or Unknown). With the help of Machine Learning and Graph Community Algorithms, we can further combine the different AV detections to classify such Unknown apps into either Adware or Harmful risks, reaching F1-score above 0.84.
Article
Anti-malware agents typically communicate with their remote services to share information about suspicious files. These remote services use their up-to-date information and global context (view) to help classify the files and instruct their agents to take a predetermined action (e.g., delete or quarantine). In this study, we provide a security analysis of a specific form of communication between anti-malware agents and their services, which takes place entirely over the insecure DNS protocol. These services, which we denote DNS anti-malware list (DNSAML) services, affect the classification of files scanned by anti-malware agents, therefore potentially putting their consumers at risk due to known integrity and confidentiality flaws of the DNS protocol. By analyzing a large-scale DNS traffic dataset made available to the authors by a well-known CDN provider, we identify anti-malware solutions that seem to make use of DNSAML services. We found that these solutions, deployed on almost three million machines worldwide, exchange hundreds of millions of DNS requests daily. These requests carry sensitive file scan information, often - as we demonstrate - without any additional safeguards to compensate for the insecurities of the DNS protocol. As a result, these anti-malware solutions that use DNSAML are vulnerable to DNS attacks. For instance, an attacker capable of tampering with DNS queries gains the ability to alter the classification of scanned files, without a presence on the scanning machine. We showcase three attacks applicable to at least three anti-malware solutions that could result in the disclosure of sensitive information and improper behavior by the anti-malware agent, such as ignoring detected threats. Finally, we propose and review a set of countermeasures for anti-malware solution providers to prevent attacks stemming from the use of DNSAML services.
Article
Malware is one of the prevalent security threats. Sandboxes and, more generally, instrumented environments play a crucial role in dynamically analyzing malware samples, providing key threat intelligence results and critical information to update detection mechanisms. In this paper, we study the evasive behaviors employed by malware authors to hide the malicious activity of samples and hinder security analysis. First, we collect and systematize 92 evasive techniques leveraged by Windows malware to detect and thwart instrumented environments (e.g., debuggers and virtual machines). Then, we implement a framework for evasion analysis and analyze 45,375 malware samples observed in the wild between 2010 and 2019; we compare this analysis against popular, legitimate Windows programs to study the intrinsic characteristics of such evasive behaviors. Based on the results of our experiments, we present statistics about the adoption of evasive techniques and their evolution over time. We show that over the past 10 years, the prevalence of evasive malware samples had a slight increase (12%). Moreover, the employed techniques shifted significantly over time. We also identify techniques that are specific to malware, as opposed to being employed by both malicious and legitimate software. Finally, we study how the security community reacts to the deployment of new evasive techniques. Overall, our results empirically address open research questions and provide insights and directions for future research.
Conference Paper
The ever-increasing number of malware samples demands for automated tools that aid the analysts in the reverse engineering of complex malicious binaries. Frequently, malware communicates over an encrypted channel with external network resources under the control of malicious actors, such as Command and Control servers that control the botnet of infected machines. Hence, a key aspect in malware analysis is uncovering and understanding the semantics of network communications. In this paper we present SysTaint, a semi-automated tool that runs malware samples in a controlled environment and analyzes their execution to support the analyst in identifying the functions involved in the communication and the exchanged data. Our evaluation on four banking Trojan samples from different families shows that SysTaint is able to handle and inspect encrypted network communications, obtaining useful information on the data being sent and received, on how each sample processes this data, and on the inner portions of code that deal with the data processing.
Full-text available
Conference Paper
To fight the ever-increasing proliferation of novel mal-ware, antivirus (AV) vendors have turned to emulation-based automated dynamic malware analysis. Malware authors have responded by creating malware that attempts to evade detection by behaving benignly while running in an emulator. Malware may detect emulation by looking for emulator "fingerprints" such as unique environmental values, timing inconsistencies, or bugs in CPU emulation. Due to their immense complexity and the expert knowledge required to effectively analyze them, reverse-engineering AV emulators to discover fingerprints is an extremely challenging task. As an alternative, researchers have demonstrated fingerprinting attacks using simple black-box testing, but these techniques are slow, inefficient, and generally awkward to use. We propose a novel black-box technique to efficiently extract emulator fingerprints without reverse-engineering. To demonstrate our technique, we implemented an easy-to-use tool and API called AVLeak. We present an evaluation of AVLeak against several current consumer AVs and show emulator fingerprints derived from our experimentation. We also propose a classification of fingerprints as they apply to consumer AV emu-lators. Finally, we discuss the defensive implications of our work, and future directions of research in emulator evasion and exploitation.
Full-text available
Article
The authors of mobile-malware have started to leverage program protection techniques to circumvent anti-viruses, or simply hinder reverse engineering. In response to the diffusion of anti-virus applications, several researches have proposed a plethora of analyses and approaches to highlight their limitations when malware authors employ program-protection techniques. An important contribution of this work is a systematization of the state of the art of anti-virus apps, comparing the existing approaches and providing a detailed analysis of their pros and cons. As a result of our systematization, we notice the lack of openness and reproducibility that, in our opinion, are crucial for any analysis methodology. Following this observation, the second contribution of this work is an open, reproducible, rigorous methodology to assess the effectiveness of mobile anti-virus tools against code-transformation attacks. Our unified workflow, released in the form of an open-source prototype, comprises a comprehensive set of obfuscation operators. It is intended to be used by anti-virus developers and vendors to test the resilience of their products against a large dataset of malware samples and obfuscations, and to obtain insights on how to improve their products with respect to particular classes of code-transformation attacks.
Full-text available
Article
Malwares are big threat to digital world and evolving with high complexity. It can penetrate networks, steal confidential information from computers, bring down servers and can cripple infrastructures etc. To combat the threat/attacks from the malwares, anti- malwares have been developed. The existing anti-malwares are mostly based on the assumption that the malware structure does not changes appreciably. But the recent advancement in second generation malwares can create variants and hence posed a challenge to anti-malwares developers. To combat the threat/attacks from the second generation malwares with low false alarm we present our survey on malwares and its detection techniques.
Full-text available
Conference Paper
Remote attackers use network reconnaissance techniques, such as port scanning, to gain information about a victim machine and then use this information to launch an attack. Current network reconnaissance techniques, that are typically below the application layer, are limited in the sense that they can only give basic information, such as what services a victim is running. Furthermore, modern remote exploits typically come from a server and attack a client that has connected to it, rather than the attacker connecting directly to the victim. In this paper, we raise this question and answer it: Can the attacker go beyond the traditional techniques of network reconnaissance and gain high-level, detailed information? We investigate remote timing channel attacks against ClamAV antivirus and show that it is possible, with high accuracy, for the remote attacker to check how up-to-date the victim's antivirus signature database is. Because the strings the attacker uses to do this are benign (i.e., they do not trigger the antivirus) and the attack can be accomplished through many different APIs, the attacker has a large amount of flexibility in hiding the attack.
Full-text available
Article
Camouflage of malware is a serious challenge for antivirus experts and code analysts. Malware use various techniques to camouflage them to not be easily visible and make their lifetime as longer as possible. Although, camouflage approaches cannot fully stop the analyzing and fighting against the malware, but it make the process of analyzing and detection prolonged, so the malware can get more time to widely spread. It is very important for antivirus technologies to improve their products by shortening the detection procedure, not only at the first time facing with a new threat, but also in the future detections. In this paper, we intend to review the concept of camouflage in malware and its evolution from non-stealth days to modern metamorphism. Moreover, we explore obfuscation techniques exploited by metamorphism, the most recent method in malware camouflage.
Full-text available
Article
This paper presents a general overview on evolution of concealment methods in computer viruses and defensive techniques employed by anti-virus products. In order to stay far from the anti-virus scanners, computer viruses gradually improve their codes to make them invisible. On the other hand, anti-virus technologies continually follow the virus tricks and methodologies to overcome their threats. In this process, anti-virus experts design and develop new methodologies to make them stronger, more and more, every day. The purpose of this paper is to review these methodologies and outline their strengths and weaknesses to encourage those are interested in more investigation on these areas.
Full-text available
Article
Malware includes several protections to complicate their analysis: the longer it takes to analyze a new malware sample, the longer the sample survives and the larger number of systems it compromises. Nowadays, new mal-ware samples are analyzed dynamically using virtual en-vironments (e.g., emulators, virtual machines, or debug-gers). Therefore, malware incorporate a variety of tests to detect whether they are executed through such envi-ronments and obfuscate their behavior if they suspect their execution is being monitored. Several simple tests, we indistinctly call red-pills, have already been proposed in literature to detect whether the execution of a program is performed in a real or in a virtual environment. In this paper we propose an automatic and systematic technique to generate red-pills, specific for detecting if a program is executed through a CPU emulator. Using this tech-nique we generated thousands of new red-pills, involving hundreds of different opcodes, for two publicly available emulators, which are widely used for analyzing malware.
Full-text available
Conference Paper
Many threats that plague todaypsilas networks (e.g., phishing, botnets, denial of service attacks) are enabled by a complex ecosystem of attack programs commonly called malware. To combat these threats, defenders of these networks have turned to the collection, analysis, and reverse engineering of malware as mechanisms to understand these programs, generate signatures, and facilitate cleanup of infected hosts. Recently however, new malware instances have emerged with the capability to check and often thwart these defensive activities - essentially leaving defenders blind to their activities. To combat this emerging threat, we have undertaken a robust analysis of current malware and developed a detailed taxonomy of malware defender fingerprinting methods. We demonstrate the utility of this taxonomy by using it to characterize the prevalence of these avoidance methods, to generate a novel fingerprinting method that can assist malware propagation, and to create an effective new technique to protect production systems.
Conference Paper
Although anti-virus software has significantly evolved over the last decade, classic signature matching based on byte patterns is still a prevalent concept for identifying security threats. Anti-virus signatures are a simple and fast detection mechanism that can complement more sophisticated analysis strategies. However, if signatures are not designed with care, they can turn from a defensive mechanism into an instrument of attack. In this paper, we present a novel method for automatically deriving signatures from anti-virus software and discuss how the extracted signatures can be used to attack sensible data with the aid of the virus scanner itself. To this end, we study the practicability of our approach using four commercial products and exemplary demonstrate anti-virus assisted attacks in three different scenarios.
Conference Paper
Labeling a malicious executable as a variant of a known family is important for security applications such as triage, lineage, and for building reference datasets in turn used for evaluating malware clustering and training malware classification approaches. Oftentimes, such labeling is based on labels output by antivirus engines. While AV labels are well-known to be inconsistent, there is often no other information available for labeling, thus security analysts keep relying on them. However, current approaches for extracting family information from AV labels are manual and inaccurate. In this work, we describe AVclass, an automatic labeling tool that given the AV labels for a, potentially massive, number of samples outputs the most likely family names for each sample. AVclass implements novel automatic techniques to address 3 key challenges: normalization, removal of generic tokens, and alias detection. We have evaluated AVclass on 10 datasets comprising 8.9 M samples, larger than any dataset used by malware clustering and classification works. AVclass leverages labels from any AV engine, e.g., all 99 AV engines seen in VirusTotal, the largest engine set in the literature. AVclass’s clustering achieves F1 measures up to 93.9 on labeled datasets and clusters are labeled with fine-grained family names commonly used by the AV vendors. We release AVclass to the community.
Article
Despite the widespread use of antivirus software, malware remains pervasive. A new study compares the effectiveness of six commercial AV products.
Escaping the avast sandbox using a single IOCTL
  • K Economou
Attacks on more virtual machine emulators
  • P Ferrie
Bypassing sandboxes for fun
  • P Jung
Comodo: integer overflow leading to heap overflow in win32 emulation
  • T Ormandy
Symantec/norton antivirus aspack remote heap/pool memory corruption vulnerability cve
  • T Ormandy
Breaking the sandbox
  • S Singh
Uncloaking Advanced Malware: How to Spot and Stop an Evasion
  • M Cova
Testing Malware Detectors
  • M Christodorescu
  • S Jha
  • Mihai Christodorescu
Detecting malware and sandbox evasion techniques
  • D Keragala
Eset nod32 heap overflow unpacking epoc installation files
  • T Ormandy
Sophail: a critical analysis of sophos antivirus
  • T Ormandy
Comodo antivirus: emulator stack buffer overflow handling psubusb packed subtract unsigned with saturation
  • T Ormandy