Conference Paper

Comparing Machine Learning Techniques for Malware Detection

Authors:

... We based our initial choice of classifiers on an analysis of existing comparative studies. In [40], the authors conclude that a random forest (RF) and a feed-forward neural network achieve the best true-positive rate (TPR) and false-positive rate (FPR) values, with the latter model having the fastest detection time. These two types of models are also identified as the best performers for detecting malicious Windows executables in [41]. ...
Article
This paper analyzes the efficiency of various machine learning models (artificial neural networks, random forest, decision tree, AdaBoost and XGBoost) against the evolution of VBA-based (Visual Basic for Applications) malware over a long period of time (1995–2021). The file set used in our research is comprehensive: approximately 1.9 million files (944,595 malicious, the rest benign), which allowed us to gain insights into the resilience of various machine learning models against the diversity and evolution of file features that reflect obfuscation techniques in VBA-based malware. In studying the detection of VBA-based malware, we focus on characteristics of both the classifiers, namely proactivity (short-term detection efficiency against future malware) and endurance (long-term detection robustness), and the detection-relevant file features, namely feature perishability (the dynamics of feature relevance). As a prerequisite of the study, we also describe in some detail the various obfuscation techniques used by the malware under investigation during the last decade.
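Proactivity and endurance, as defined above, both depend on testing a classifier only on files that appeared after its training data. A minimal Python sketch of such a chronological split follows; the `Sample` record, its `year` field and the helper names are hypothetical, and the paper's actual training and testing windows may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One labelled file; `year` (first-seen year) is a hypothetical field."""
    year: int
    features: tuple
    malicious: bool


def time_split(samples, train_end, test_years):
    """Split samples chronologically.

    Training uses every sample first seen in or before `train_end`;
    testing uses only samples from the requested later years, so the
    classifier is never evaluated on files older than its training cut-off.
    """
    train = [s for s in samples if s.year <= train_end]
    test = [s for s in samples if s.year in test_years]
    return train, test


def proactivity_split(samples, train_end):
    # Short-term view (assumed): test on the year right after training.
    return time_split(samples, train_end, {train_end + 1})


def endurance_splits(samples, train_end, last_year):
    # Long-term view (assumed): one test set per later year, so the decay
    # of detection performance can be tracked year by year.
    return {
        year: time_split(samples, train_end, {year})[1]
        for year in range(train_end + 1, last_year + 1)
    }


if __name__ == "__main__":
    data = [
        Sample(2015, (1, 0), True),
        Sample(2016, (0, 1), False),
        Sample(2017, (1, 1), True),
        Sample(2019, (1, 0), True),
    ]
    train, test = proactivity_split(data, 2016)
    print(len(train), len(test))  # 2 1
    per_year = endurance_splits(data, 2016, 2019)
    print({y: len(t) for y, t in per_year.items()})  # {2017: 1, 2018: 0, 2019: 1}
```

Any classifier can then be trained on `train` and scored on each test set; keeping the split separate from the model makes it easy to compare neural networks, random forests and boosted trees under identical time windows.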
... Owing to the prevalence of big data, ML algorithms are trained on wide and diverse data sets. They can make decisions even on unseen data, using algorithmic models that extract features from large amounts of data [8,9]. Machine learning algorithms are broadly classified as supervised, unsupervised or reinforcement learning. ...
Article
Today there is a tremendous flow of data into many information systems within fractions of a second. At the same time, vulnerabilities in the digital infrastructure have become a serious threat to information security. The presence of malware in sensitive data may incur huge financial losses or even cause life-threatening events. This paper discusses the performance of different ensemble classification algorithms in detecting malware present in the data. Two benchmark malware datasets are used for evaluation. Ensemble algorithms including Bagging, Random Forest, Gradient Boosting, AdaBoost, Stacking, XGBoost and LightGBM are compared on several evaluation metrics, namely accuracy, precision (positive and negative), recall (sensitivity and specificity), F1-score, Jaccard score and Hamming loss. The XGBoost ensemble achieved 99% accuracy in identifying malware, with a negligible Hamming loss of 0.014 and 0.013 on the two datasets.
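All of the metrics listed above can be derived from the four confusion-matrix counts of a binary malware/benign classifier. The sketch below is a plain-Python illustration, not the authors' evaluation code: `binary_metrics` is a hypothetical helper, and "negative precision" is interpreted here as negative predictive value. It also shows that, for single-label binary data, Hamming loss is simply 1 minus accuracy, so an accuracy of roughly 99% corresponds to a Hamming loss of roughly 0.01.

```python
def binary_metrics(y_true, y_pred):
    """Detection metrics for binary labels (1 = malware, 0 = benign)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)      # positive predictive value
    recall = ratio(tp, tp + fn)         # sensitivity, i.e. TPR
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "negative_precision": ratio(tn, tn + fn),  # negative predictive value
        "recall": recall,
        "specificity": ratio(tn, tn + fp),         # recall of the benign class
        "fpr": ratio(fp, fp + tn),                 # 1 - specificity
        "f1": ratio(2 * precision * recall, precision + recall),
        "jaccard": ratio(tp, tp + fp + fn),
        # For single-label binary data: fraction of wrongly labelled files.
        "hamming_loss": (fp + fn) / len(pairs),
    }


if __name__ == "__main__":
    m = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
    print(m["accuracy"], m["jaccard"], m["hamming_loss"])  # 0.75 0.6 0.25
```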
... 4. Machine learning techniques: Behavioral analysis can rely on modern methods of big data or machine learning to identify and counter new attacks faster. Several algorithms [43] are used for: ...
Article
The threat landscape is continuously evolving, and attackers are improving their tactics and techniques. From worms and viruses, first introduced in 1982, to the advanced, targeted and persistent attacks that have emerged in recent years, many verdicts demonstrate that no architecture is invulnerable. Nowadays, malware and cyberthreats are penetrating many platforms, their growth is exponential, and corporate and politically driven outbreaks have surfaced worldwide. A continuous back-and-forth between vulnerabilities and controls drives the evolution of the information age. Moreover, intelligent technologies are dual-use, and a new class of smart cyberthreats has arisen. This paper presents the state of the art in computer virology and explores how we leveraged blockchain technology to create a new form of malware, offering a new aspect to the cyber-vector.
Chapter
Malware attacks are growing year after year because of the increasing number of Android and IoT devices alongside traditional computing devices. Malware analysis is necessary to protect all these devices and, with them, the interests of organizations and individuals. There are different approaches to malware analysis, such as static, dynamic and heuristic analysis. As technology advances, malware authors also use advanced attack techniques such as obfuscation and packing, which cannot be detected by signature-based static approaches. To overcome these problems, malware behavior must be analyzed using dynamic approaches. Nowadays, malware authors use even more advanced evasion techniques, in which the malware suspends its malicious behavior after detecting a virtual environment. Evasion techniques therefore pose a new challenge to malware analysis, because even dynamic approaches sometimes fail to detect and analyze the malware.
Chapter
This paper presents a study on the detection performance of MSOffice-embedded malware; the detection models were trained and tested using a very large database of malicious and benign MSOffice documents (1.8 million files) collected over a long period of time (1995–2021). The time-wise comprehensive database allowed us to shed light on perishability (evolution of feature relevance) and on the detection performance of anti-malware classifiers. For the latter, we look into proactivity (short-term detection efficiency against future malware) and endurance (long-term detection robustness); aspects of the co-evolution of malware and security products are also discussed. Across the various training and testing time widths available in the database, our experiments indicate that, on average, neural networks reach higher levels of accuracy in MSOffice-embedded malware detection, while Random Forest achieves lower false-positive rates.
Keywords: MSOffice-embedded (VBA) malware detection; Machine learning; Feature mining; Spear-phishing; Neural networks; Random forest; Classifier proactivity; Classifier endurance
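Perishability, as used above, refers to how the relevance of a file feature changes over time. One simple way to observe it, assumed here rather than taken from the paper, is to compare year by year how often a binary feature occurs in malicious versus benign documents: a gap that shrinks toward zero marks a feature that is losing discriminative value. The function name and the `(year, feature_present, malicious)` record layout are hypothetical.

```python
from collections import defaultdict


def yearly_feature_relevance(records):
    """Track how discriminative one binary feature is, year by year.

    `records` is an iterable of (year, feature_present, malicious) tuples.
    Relevance is an assumed proxy: the absolute gap between the feature's
    prevalence in malicious files and in benign files for that year.
    """
    # counts[year] = [malicious_with, malicious_total, benign_with, benign_total]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for year, present, malicious in records:
        c = counts[year]
        if malicious:
            c[0] += int(present)
            c[1] += 1
        else:
            c[2] += int(present)
            c[3] += 1

    relevance = {}
    for year in sorted(counts):
        mal_with, mal_total, ben_with, ben_total = counts[year]
        if mal_total and ben_total:  # both classes are needed for a comparison
            relevance[year] = abs(mal_with / mal_total - ben_with / ben_total)
    return relevance


if __name__ == "__main__":
    records = [
        (2015, True, True), (2015, True, True),
        (2015, False, False), (2015, False, False),
        (2020, True, True), (2020, False, True),
        (2020, True, False), (2020, False, False),
    ]
    print(yearly_feature_relevance(records))  # {2015: 1.0, 2020: 0.0}
```

In this toy data the feature separates the classes perfectly in 2015 but carries no signal in 2020, which is the kind of decay a perishability analysis is meant to surface.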
Article
There has been an increasing trend of malware releases, which raises the alarm for security professionals worldwide. It is often challenging to stay on top of the different types of malware and their detection techniques, which is essential, particularly for researchers and the security community. Analysing malware to gain insight into what it intends to do on the victim's system is one of the crucial steps towards malware detection. Malware analysis can be performed through static analysis, code analysis, dynamic analysis, memory analysis and hybrid analysis techniques. The next step after malware analysis is the design of the detection model, using the patterns extracted from the malware during analysis. Machine learning and deep learning methods have drawn the attention of researchers owing to their ability to implement sophisticated malware detection models that can deal with known and unknown malicious activities. Therefore, this survey presents a comprehensive study and analysis of current malware and detection techniques using the snowball approach. It covers malware analysis testbeds, dynamic malware analysis and memory analysis, the taxonomy of malware behaviour analysis tools, dataset repositories, feature selection, and machine learning and deep learning techniques. Moreover, comparisons of behaviour-based malware detection techniques have been grouped by categories of machine learning and deep learning techniques. This study also looks at various performance evaluation metrics, current research challenges in this area and possible future directions of research.