Anti-virus Engine Analysis using Deep Web Malware Data

Igor Mishkovski¹, Miroslav Mirchev¹ and Milos Jovanovik¹
¹Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, R. Macedonia
Abstract
AntiVirus products and tools are essential in every business deployment connected to the Internet.
Nowadays, as the number and diversity of malware on the Web increase, more AntiVirus Tools (AVTs)
are also becoming available to protect users and companies from malware. However, the number of
known unique malware samples is growing by around 12% per quarter, according to the Intel Security
Group’s McAfee Labs Threat Report: August 2015, and some AntiVirus companies use the same or
significantly similar AntiVirus engines. Together, these two facts leave us to some extent
vulnerable to the existing security threats.
In this work, using graph analysis and visualization methods, we will empirically infer the
similarity between detection engines and any groupings or overlaps among them. We will also infer
which AVTs differ from the others and have a greater advantage in detecting malware.
Using the AVT responses to our malware file set, we will optimize the combination of AVTs in
order to obtain the maximum detection rate (i.e. coverage). We believe that this approach can
be used by companies that want to implement a multi-scanning approach on their email gateways.
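
The optimization method is not specified here. As a minimal sketch, assuming each AVT is represented by the set of files it flagged, a greedy maximum-coverage heuristic can illustrate the idea of combining AVTs for coverage. The function name `greedy_avt_selection`, the AVT labels and the file identifiers are all hypothetical, and the greedy heuristic is a stand-in rather than the procedure used in this work.

```python
from typing import Dict, List, Set, Tuple


def greedy_avt_selection(
    detections: Dict[str, Set[str]], k: int
) -> Tuple[List[str], Set[str]]:
    """Pick up to k AVTs, each time adding the one that detects the most
    files not yet covered by the AVTs chosen so far."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(detections)
    for _ in range(k):
        if not remaining:
            break
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - covered))
        if not remaining[best] - covered:
            break  # no remaining AVT adds any new detection
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy data (hypothetical): which files each AVT labeled as malware.
detections = {
    "AVT_A": {"f1", "f2", "f3"},
    "AVT_B": {"f1", "f2"},
    "AVT_C": {"f4"},
}
chosen, covered = greedy_avt_selection(detections, k=2)
coverage = len(covered) / 4  # fraction of the 4 toy files detected
```

In this toy case the second pick is `AVT_C` rather than `AVT_B`, because `AVT_B` detects nothing that `AVT_A` misses. This is the reason a multi-scanning setup gains little from combining engines that behave alike.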
Finally, another novelty of this work is that we relate the source of the malware, i.e. the domain
name where the malware was found, to the AVTs. In this way, we will show the detection rate of AVTs
across the domains on which potential malware resides. The results will imply that certain AVTs have
stronger detection capabilities on specific domains, whereas others have a detection rate spread
across multiple domains. All of the analysis will be done on a malware file set provided by F-Secure
and on the AVT responses to this file set, obtained using the VirusTotal API.
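
A per-domain detection rate can be sketched as follows, assuming each file is associated with a single source domain and each AVT with the set of files it flagged. The function name, the `*.example` domains and the data are hypothetical, and nothing here reflects the actual VirusTotal response format.

```python
from collections import defaultdict
from typing import Dict, Set


def detection_rate_by_domain(
    file_domain: Dict[str, str], detections: Dict[str, Set[str]]
) -> Dict[str, Dict[str, float]]:
    """For every AVT, return the fraction of files from each domain
    that the AVT labeled as malware."""
    files_per_domain: Dict[str, Set[str]] = defaultdict(set)
    for file_id, domain in file_domain.items():
        files_per_domain[domain].add(file_id)

    rates: Dict[str, Dict[str, float]] = {}
    for avt, detected in detections.items():
        rates[avt] = {
            domain: len(detected & files) / len(files)
            for domain, files in files_per_domain.items()
        }
    return rates


# Toy data (hypothetical): source domain of each file and AVT detections.
file_domain = {
    "f1": "a.example",
    "f2": "a.example",
    "f3": "b.example",
    "f4": "b.example",
}
detections = {
    "AVT_A": {"f1", "f2", "f3"},
    "AVT_B": {"f3", "f4"},
}
rates = detection_rate_by_domain(file_domain, detections)
```

Here `AVT_B` detects everything from `b.example` and nothing from `a.example`, while `AVT_A` has detections on both domains. These are the two kinds of per-domain profile the paragraph above distinguishes.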
Based on the dataset, we measure the similarity between different AVTs in order to see whether there
are clusters or communities that share a similar “reaction” to certain malware files. Thus, we
construct the similarity network $G^1 = (V, E, W^1)$ in order to characterize the similarity between
different AVTs based on the files that they both labeled as malware. The node set $V$
consists of the AVTs reported by VirusTotal, and the undirected edge set $E$ contains the
links between the AVTs that have labeled at least one common file as malicious, with the edge weight
$w^1_{ij}$ defined as the Jaccard score of the sets of malware files detected by the two AVTs $i$
and $j$. Thus, here we define the similarity between $V_i$ and $V_j$ as the co-occurrence strength.
Let $F_i$ and $F_j$ denote the sets of files labeled as malware by $V_i$ and $V_j$; then we can
define the Jaccard similarity measure as a co-occurrence strength as follows.
$$\mathrm{sim}(V_i, V_j) = \frac{|F_i \cap F_j|}{|F_i \cup F_j|} = w^1_{ij} = w^1_{ji} \qquad (1)$$
where $|F|$ indicates the size of the set $F$. The value of $w^1_{ij}$ is between 0 and 1 (“0”
indicates no co-occurrence relationship between two AVTs and “1” indicates full co-occurrence).
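
The construction of the weighted edge set can be sketched in a few lines, assuming each AVT is given as the set of files it labeled as malware. The function names, AVT labels and file identifiers are hypothetical. Edges are kept only for pairs with at least one common file, matching the definition of the edge set.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard score |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(
    detections: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Return the weighted undirected edges, keyed by an ordered AVT pair.
    A pair gets an edge only if the two AVTs share at least one file."""
    edges: Dict[Tuple[str, str], float] = {}
    for vi, vj in combinations(sorted(detections), 2):
        weight = jaccard(detections[vi], detections[vj])
        if weight > 0:
            edges[(vi, vj)] = weight
    return edges


# Toy data (hypothetical): files labeled as malware by each AVT.
detections = {
    "AVT_A": {"f1", "f2", "f3"},
    "AVT_B": {"f2", "f3", "f4"},
    "AVT_C": {"f5"},
}
edges = similarity_network(detections)
```

For `AVT_A` and `AVT_B` the intersection has 2 files and the union has 4, so the weight is 2/4 = 0.5. `AVT_C` shares no file with the others and therefore has no edges.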
The results show high similarity between certain AVTs in their malware detection. Some of the
AVT groups that show high similarity are i) BitDefender, F-Secure, Emsisoft, MicroWorld-eScan and
Ad-Aware; ii) Arcabit, eTrust-InoculateIT, UNA and T3. These results clearly show that there might
exist groupings, in the sense of structural communities and/or clusters, between different AVTs. This
kind of clustering or grouping might be a consequence of the fact that different AVTs are specialized
for certain types of malware (Trojans, adware, exploits, rootkits, etc.) or for malware written for a
given platform (such as Win32, OSX, Android, etc.), or simply of the fact that some companies
use engines from other AV companies (for example, F-Secure and BitDefender, or AVWare and VIPRE).
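
How the groups are extracted from the network is not described here. As one crude illustration, assuming the weighted edges are available as a dictionary, edges below a similarity threshold can be dropped and the connected components of what remains treated as candidate groups. The function name, the threshold value and the weights are hypothetical, and this is much simpler than a proper community detection method.

```python
from typing import Dict, List, Set, Tuple


def groups_above_threshold(
    edges: Dict[Tuple[str, str], float], threshold: float
) -> List[List[str]]:
    """Keep edges with weight >= threshold and return the connected
    components of the resulting graph as sorted lists of AVT names."""
    adjacency: Dict[str, Set[str]] = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy weights (hypothetical, not measured values).
edges = {
    ("AVT_1", "AVT_2"): 0.9,
    ("AVT_2", "AVT_3"): 0.8,
    ("AVT_3", "AVT_4"): 0.1,
    ("AVT_4", "AVT_5"): 0.95,
}
groups = groups_above_threshold(edges, threshold=0.7)
```

With a threshold of 0.7 the weak 0.1 link is cut, leaving two groups: `AVT_1` to `AVT_3`, and `AVT_4` with `AVT_5`. The outcome depends strongly on the threshold, so the choice of value would need to be justified against the data.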
Acknowledgments
The work in this paper was partially financed by the Faculty of Computer Science and Engineering,
Ss. Cyril and Methodius University in Skopje, as part of the project “AVADEEP: Anti-virus analysis
using Deep Web malware files”.