Conference Paper

A Study of Malcode-Bearing Documents.

DOI: 10.1007/978-3-540-73614-1_14 Conference: Detection of Intrusions and Malware, and Vulnerability Assessment, 4th International Conference, DIMVA 2007, Lucerne, Switzerland, July 12-13, 2007, Proceedings
Source: DBLP

ABSTRACT By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party applications that may harbor exploitable vulnerabilities otherwise unreachable by network-level service attacks. Such attacks can be very selective and dicult to detect compared to the typical network worm threat, owing to the complex- ity of these applications and data formats, as well as the multitude of document-exchange vectors. As a case study, this paper focuses on Mi- crosoft Word documents as malcode carriers. We investigate the pos- sibility of detecting embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical doc- ument content, and run-time dynamic tests on diverse platforms. The experiments demonstrate these approaches can not only detect known malware, but also most zero-day attacks. We identify several problems with both approaches, representing both challenges in addressing the problem and opportunities for future research.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The widespread adoption of the PDF format for document exchange has given rise to the use of PDF files as a prime vector for malware propagation. As vulnerabilities in the major PDF viewers keep surfacing, effective detection of malicious PDF documents remains an important issue. In this paper we present MDScan, a standalone malicious doc-ument scanner that combines static document analysis and dynamic code execution to detect previously unknown PDF threats. Our evaluation shows that MDScan can detect a broad range of malicious PDF documents, even when they have been extensively obfuscated.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Run-time behavior of processes – running on an end-host – is being actively used to dynamically detect malware. Most of these detection schemes build model of run-time behavior of a process on the basis of its data flow and/or sequence of system calls. These novel techniques have shown promising results but an efficient and effective technique must meet the following performance metrics: (1) high detection accuracy, (2) low false alarm rate, (3) small detection time, and (4) the technique should be resilient to run-time evasion attempts. To meet these challenges, a novel concept of genetic footprint is proposed, by mining the information in the kernel process control blocks (PCB) of a process, that can be used to detect malicious processes at run time. The genetic footprint consists of selected parameters – maintained inside the PCB of a kernel for each running process – that define the semantics and behavior of an executing process. A systematic forensic study of the execution traces of benign and malware processes is performed to identify discriminatory parameters of a PCB (task_struct is PCB in case of Linux OS). As a result, 16 out of 118 task structure parameters are short listed using the time series analysis. A statistical analysis is done to corroborate the features of the genetic footprint and to select suitable machine learning classifiers to detect malware. The scheme has been evaluated on a dataset that consists of 105 benign processes and 114 recently collected malware processes for Linux. The results of experiments show that the presented scheme achieves a detection accuracy of 96% with 0% false alarm rate in less than 100 ms of the start of a malicious activity. Last but not least, the presented technique utilizes partial knowledge that is available at a given time while the process is still executing; as a result, the kernel of OS can devise mitigation strategies. It is also shown that the presented technique is robust to well known run-time evasion attempts.
    Information Sciences 09/2011; · 3.64 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Owed to their versatile functionality and widespread adoption, PDF documents have become a popular avenue for user exploitation ranging from large-scale phishing attacks to targeted attacks. In this paper, we present a framework for robust detection of malicious documents through machine learning. Our approach is based on features extracted from document metadata and structure. Using real-world datasets, we demonstrate the the adequacy of these document properties for malware detection and the durability of these features across new malware variants. Our analysis shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware. Indeed, using multiple datasets containing an aggregate of over 5,000 unique malicious documents and over 100,000 benign ones, our classification rates remain well above 99% while maintaining low false positives of 0.2% or less for different classification parameters and experimental scenarios. Moreover, the classifier has the ability to detect documents crafted for targeted attacks and separate them from broadly distributed malicious PDF documents. Remarkably, we also discovered that by artificially reducing the influence of the top features in the classifier, we can still achieve a high rate of detection in an adversarial setting where the attacker is aware of both the top features utilized in the classifier and our normality model. Thus, the classifier is resilient against mimicry attacks even with knowledge of the document features, classification method, and training set.
    Proceedings of the 28th Annual Computer Security Applications Conference; 12/2012

Full-text (2 Sources)

Available from
May 22, 2014