About
90
Publications
60,941
Reads
1,698
Citations
Introduction
My research interests include network security analysis, botnet detection, anomaly detection, behavior analysis, machine learning, network visualization, penetration testing, intruder detection and keystroke dynamics. My strong background in network security helped me realize that real network data is paramount for any research.
My latest work on public botnet detection datasets is available at https://www.stratosphereips.org/
Additional affiliations
January 2015 - present
Stratosphere IPS Project
Position
- Researcher
Description
- https://stratosphereips.org
April 2009 - December 2014
Publications (90)
Botnets are the technological backbone supporting a myriad of attacks, including identity theft, organizational spying, DoS, spam, government-sponsored attacks and spying on political dissidents, among others. The research community works hard to create algorithms that detect botnet network traffic. These algorithms have been partially successful,...
The results of botnet detection methods are usually presented without any comparison. Although it is generally accepted that more comparisons with third-party methods may help to improve the area, few papers actually do so. Among the factors that prevent a comparison are the difficulty of sharing a dataset, the lack of a good dataset, the absence of a...
Through the analysis of a long-term botnet capture, we identified and modeled the behaviors of its C&C channels. They were found and characterized by periodicity analyses and statistical representations. The relationships found between the behaviors of the UDP, TCP and HTTP C&C channels allowed us to unify them in a general model of the botnet beha...
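The periodicity analysis mentioned above can be illustrated with a deliberately simple measure. This sketch is an assumption for illustration, not the model from the paper: it scores a channel by the coefficient of variation (standard deviation divided by mean) of the gaps between its connections, so regular C&C polling scores close to 0. The timestamps are invented.

```python
from statistics import mean, pstdev
from typing import List, Optional


def inter_arrival_times(timestamps: List[float]) -> List[float]:
    """Gaps in seconds between consecutive connections, after sorting."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def periodicity_score(timestamps: List[float]) -> Optional[float]:
    """Coefficient of variation of the inter-arrival times.

    Close to 0 means very regular (periodic) traffic; larger values mean
    irregular timing. Returns None when there are fewer than three
    connections or when all connections share one timestamp.
    """
    gaps = inter_arrival_times(timestamps)
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


# Invented timestamps (seconds): a bot polling its C&C roughly every 60 s,
# and a person browsing the web at irregular moments.
beacon = [0, 60, 121, 180, 241, 300]
browsing = [0, 5, 70, 72, 400, 900]
print(f"beacon:   {periodicity_score(beacon):.3f}")
print(f"browsing: {periodicity_score(browsing):.3f}")
```

A real detector would also have to cope with jitter, missed connections and several overlapping periods; the sketch only shows the basic intuition.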
Botnets’ diversity and dynamism challenge detection and classification algorithms that depend heavily on static or protocol-dependent features. Several methods showing promising results have been proposed using behavior-based approaches. The authors conducted an analysis of botnets’ and bots’ most inherent characteristics such as synchronism and network lo...
Botnets are an important security problem on the Internet. They continuously evolve their structure, protocols and attacks. This survey analyzes and compares the most important efforts carried out in the network-based detection area. It accomplishes four tasks: first, the comparison of previous surveys and the proposal of four new dimensions to analyze th...
Large Language Models (LLMs) have shown remarkable potential across various domains, including cybersecurity. Using commercial cloud-based LLMs may be undesirable due to privacy concerns, costs, and network connectivity constraints. In this paper, we present Hackphyr, a locally fine-tuned LLM to be used as a red-team agent within network security e...
Ransomware-as-a-service (RaaS) is increasing the scale and complexity of ransomware attacks. Understanding the internal operations behind RaaS has been a challenge due to the illegality of such activities. The recent chat leak of the Conti RaaS operator, one of the most infamous ransomware operators on the international scene, offers a key opportun...
Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection toolchain. However, machine learning models are susceptible to adversarial attacks, requiring the testing of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evasi...
As machine learning becomes more widely used, the need to study its implications in security and privacy becomes more urgent. Although the body of work in privacy has been steadily growing over the past few years, research on the privacy aspects of machine learning has received less focus than the security aspects. Our contribution in this research...
Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection tool-chain. However, machine learning models are susceptible to adversarial attacks, requiring the testing of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evas...
Honeypots are essential tools in cybersecurity. However, most of them (even the high-interaction ones) lack the required realism to engage and fool human attackers. This limitation makes them easily discernible, hindering their effectiveness. This work introduces a novel method to create dynamic and realistic software honeypots based on Large Langu...
Ransomware-as-a-service (RaaS) is increasing the scale and complexity of ransomware attacks. Understanding the internal operations behind RaaS has been a challenge due to the illegality of such activities. The recent chat leak of the Conti RaaS operator, one of the most infamous ransomware operators on the international scene, offers a key opportu...
Ransomware-as-a-service (RaaS) is increasing the scale and complexity of ransomware attacks. Understanding the internal operations behind RaaS has been a challenge due to the illegality of such activities. The recent chat leak of the Conti RaaS operator, one of the most infamous ransomware operators on the international scene, offers a key opportun...
Large Language Models (LLMs) have gained widespread popularity across diverse domains involving text generation, summarization, and various natural language processing tasks. Despite their inherent limitations, LLM-based designs have shown promising capabilities in planning and navigating open-world scenarios. This paper introduces a novel applicat...
Honeypots have been a long-established form of passive defense in a wide variety of systems. They are often used for their reliability and low false positive rate. However, the deployment of honeypots in Active Directory (AD) systems is still limited. Intrusion detection in AD systems is a difficult task due to the complexity of the system and it...
Through an inductive thematic analysis of semi-structured interviews with experts, this study corroborates key findings on contextual and organisational dynamics behind profit-driven cybercrime. The findings pinpoint three contextual factors influencing individuals to participate in profit-driven cybercrime: lack of legal economic opportunities, la...
Most network security datasets do not have comprehensive label assignment criteria, hindering the evaluation of the datasets, the training of models, the results obtained, the comparison with other methods, and the evaluation in real-life scenarios. There is no labeling ontology nor tools to help assign the labels, resulting in most analyzed datase...
Honeypots are a well-known and widely used technology in the cybersecurity community, where it is assumed that placing honeypots in different geographical locations provides better visibility and increases effectiveness. However, how geolocation affects the usefulness of honeypots is not well-studied, especially for threat intelligence as early war...
The ongoing rise in cyberattacks and the lack of skilled professionals in the cybersecurity domain to combat these attacks show the need for automated tools capable of detecting an attack with good performance. Attackers disguise their actions and launch attacks that consist of multiple actions, which are difficult to detect. Therefore, improving d...
The cybercrime industry is characterised by work specialisation to the point that it has become a volume industry with various “as-a-service” offerings. One well-established “as-a-service” business model is black-market pay-per-install (PPI) services, which outsource the spread of malicious programmes to affiliates. Such a business model represents...
DNS over HTTPS (DoH) is one of the standards to protect the security and privacy of users. The choice of DoH provider has controversial consequences, from monopolisation of surveillance to lost visibility by network administrators and security providers. More importantly, it is a novel security business. Software products and organisations depend o...
Many activities related to cybercrime operations do not require much secrecy, such as developing websites or translating texts. This research provides indications that many users of a popular public internet marketing forum have connections to cybercrime. It does so by investigating the involvement in cybercrime of a population of users interested...
Model stealing attacks have been successfully used in many machine learning domains, but there is little understanding of how these attacks work in the malware detection domain. Malware detection and, in general, security domains have very strong requirements of low false positive rates (FPR). However, these requirements are not the primary focus o...
Many activities related to cybercrime operations do not require much secrecy, such as developing websites or translating texts. This research provides indications that many users of a popular public internet marketing forum have connections to cybercrime. It does so by investigating the involvement in cybercrime of a population of users interested...
Deception technologies, and honeypots in particular, have been used for decades to understand how cyber attacks and attackers work. A myriad of factors impact the effectiveness of a honeypot. However, very little is known about the impact of the geographical location of honeypots on the amount and type of attacks. Hornet 40 is the first dataset design...
Active Directory (AD) is a crucial element of large organizations, given its central role in managing access to resources. Since AD is used by all users in the organization, it is hard to detect attackers. We propose to generate and place fake users (honeyusers) in AD structures to help detect attacks. However, not every honeyuser will attract attack...
Several encryption proposals for DNS have been presented since 2016, but their adoption has not yet been comprehensively studied. This research measured the current adoption of DoH (DNS over HTTPS), DoT (DNS over TLS), and DoQ (DNS over QUIC) for five months at the beginning of 2021 by three different organizations with global coverage. By comparing the...
This report presents the current state of security in IPv6 for IoT devices. In this research, conducted from May 2020 to July 2020, we explored the global growth of IPv6 and compared it with the real growth of IPv6 in a medium-sized network. If IPv6 is already being used, are attackers already attacking using this protocol? To answer this question we...
As machine learning becomes more widely used, the need to study its implications in security and privacy becomes more urgent. Research on the security aspects of machine learning, such as adversarial attacks, has received a lot of focus and publicity, but privacy related attacks have received less attention from the research community. Although the...
The Domain Name System (DNS) is a trusted protocol made for name resolution, but in recent years some approaches have been developed to use it for data transfer. DNS tunneling is a method in which data is encoded inside DNS queries, allowing information to be exchanged through the DNS. This characteristic is attractive to hackers who exploit the DNS tunneling method to...
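To make the encoding idea concrete, here is a minimal sketch of how a client could pack arbitrary bytes into DNS query names and how the server for that domain could rebuild them. The domain `t.example.com`, the leading sequence-number label and the 120-character chunk size are assumptions for illustration; real tunneling tools use their own framing, and this sketch only builds names without sending any DNS traffic.

```python
import base64
from typing import Dict, Iterable, List

MAX_LABEL = 63   # RFC 1035: one DNS label holds at most 63 octets
MAX_NAME = 253   # longest domain name in dotted text form


def _to_label_text(data: bytes) -> str:
    # Base32 (A-Z, 2-7) survives DNS case-insensitivity; drop the '=' padding,
    # which is not a valid hostname character.
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def _from_label_text(text: str) -> bytes:
    text = text.upper()
    return base64.b32decode(text + "=" * (-len(text) % 8))


def encode_queries(data: bytes, domain: str, chars_per_query: int = 120) -> List[str]:
    """Turn `data` into query names of the form <seq>.<label>[.<label>].<domain>."""
    encoded = _to_label_text(data)
    names = []
    for seq, start in enumerate(range(0, len(encoded), chars_per_query)):
        chunk = encoded[start:start + chars_per_query]
        labels = [chunk[i:i + MAX_LABEL] for i in range(0, len(chunk), MAX_LABEL)]
        name = ".".join([str(seq), *labels, domain])
        if len(name) > MAX_NAME:
            raise ValueError("name too long; lower chars_per_query")
        names.append(name)
    return names


def decode_queries(names: Iterable[str], domain: str) -> bytes:
    """Reassemble the payload from the query names seen for `domain`."""
    suffix = "." + domain
    parts: Dict[int, str] = {}
    for name in names:
        if not name.endswith(suffix):
            continue  # not part of the tunnel
        seq, *labels = name[: -len(suffix)].split(".")
        parts[int(seq)] = "".join(labels)
    return _from_label_text("".join(parts[i] for i in sorted(parts)))


queries = encode_queries(b"confidential report, page 1", "t.example.com")
for q in queries:
    print(q)
print(decode_queries(queries, "t.example.com"))
```

Base32 is used instead of Base64 because DNS treats names case-insensitively; the sequence label lets the receiver reorder queries that arrive out of order.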
The Domain Name System is a central part of the Internet's regular operation. Such importance has made it a common target of different malicious behaviors, such as the use of Domain Generation Algorithms (DGA) to command and control a group of infected computers, or tunneling techniques to bypass system administrator restrictions. A common detect...
A Domain Generation Algorithm (DGA) is an algorithm that generates domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain through which to reach its Command and Control (C&C) communication server. Given the simplicity of the generation process and the speed at which the domains are generated, a fast and accurate detection meth...
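A toy generator shows what "deterministic but seemingly random" means in practice. This sketch is hypothetical (the seed, the hashing scheme and the `toy_dga` name are invented here, not taken from any malware family or from the paper): it hashes a secret seed together with the date, so an infected host and its operator can derive the same list of domains without communicating.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> List[str]:
    """Toy DGA: derive `count` domains from a secret seed and the date.

    The same (seed, day) always gives the same list, yet the names look
    random and change every day.
    """
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (SHA-256 digest size)")
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map each digest byte to a letter a-z (slightly biased; fine for a toy).
        name = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(name + tld)
    return domains


print(toy_dga("hypothetical-seed", date(2018, 3, 1)))
```

Because the list is reproducible, the operator only needs to register one of the day's names in advance, while defenders see a stream of random-looking lookups.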
The rapid growth of new technologies brings with it a growth in malicious applications. These applications use the resources of infected devices to carry out illicit activities, send mass email (spam) or mine cryptocurrencies. Mining requires large computing capacities. The...
A PDF file containing links to online resources for the article.
Source code and web services included.
A Domain Generation Algorithm (DGA) is an algorithm that generates domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain through which to reach the Command and Control (C&C) communication channel. Given the simplicity and speed associated with the domain generation process, machine learning detection methods emerged as su...
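Machine learning DGA detectors often start from features of the domain string itself. The sketch below computes a few common lexical features (label length, character entropy, digit and vowel ratios); it is a generic illustration and an assumption on our part, not the feature set or model used in this paper.

```python
import math
from collections import Counter
from typing import Dict


def label_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of a string."""
    if not label:
        return 0.0
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def lexical_features(domain: str) -> Dict[str, float]:
    """A few simple features of the leftmost label of `domain`."""
    label = domain.lower().split(".")[0]
    n = max(len(label), 1)
    return {
        "length": float(len(label)),
        "entropy": label_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }


for d in ["google.com", "xkqjzvbwpmtr.com"]:
    print(d, lexical_features(d))
```

Some DGAs concatenate dictionary words, so a single entropy threshold is not a reliable detector on its own; features like these are usually fed to a trained classifier instead.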
Poster related to the paper "Bringing a GAN to a Knife-fight: Adapting Malware Communication to Avoid Detection".
Generative Adversarial Networks (GANs) have been successfully used in a large number of domains. This paper proposes the use of GANs for generating network traffic in order to mimic other types of traffic. In particular, our method modifies the network behavior of a real malware sample in order to mimic the traffic of a legitimate application, and therefo...
During the last couple of years there has been a significant surge in the use of HTTPS by malware. The reason for this increase is not yet completely understood, but it is hypothesized that it was forced by organizations allowing only web traffic to the Internet. Using HTTPS makes malware behavior similar to normal connections. Therefore, there has...
During the last couple of years there has been a significant surge in the use of HTTPS by malware. The reason for this increase is not yet completely understood, but it is hypothesized that it was forced by organizations allowing only web traffic to the Internet. Using HTTPS makes malware behavior similar to normal connections. Therefore, there has...
During the last couple of years there has been a significant surge in the use of HTTPS by malware. The reason for this increase is not yet completely understood, but it is hypothesized that it was forced by organizations allowing only web traffic to the Internet. Using HTTPS makes malware behavior similar to normal connections. Therefore, there has...
The Python implementation of the LSTM network used for the experiments described in the paper. We have made our best effort to make the code easy to read. However, if you test it and find problems or errors, feel free to tell us about them.
A normal computer infected with malware is difficult to detect. Several approaches in recent years have analyzed the behavior of malware and obtained good results. The malware traffic may be detected, but it is very common to misclassify normal traffic as malicious and generate false positives. This is especially the case when the me...
A normal computer infected with malware is difficult to detect. Several approaches in recent years have analyzed the behavior of malware and obtained good results. The malware traffic may be detected, but it is very common to misclassify normal traffic as malicious and generate false positives. This is especially the case when the me...
Some botnets use special algorithms to generate the domain names they need to connect to their command and control servers. These are referred to as Domain Generation Algorithms. Domain Generation Algorithms generate domain names and try to resolve their IP addresses. If the domain has an IP address, it is used to connect to that command and control...
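The generate-and-resolve loop described above can be sketched as follows. The resolver is a plain dictionary standing in for DNS (an assumption so the example runs offline), and the domain names are made up. The counter of failed lookups hints at why this behavior is visible on the network: a bot may query many unregistered names before one resolves.

```python
from typing import Callable, Iterable, Optional, Tuple

Resolver = Callable[[str], Optional[str]]


def find_cc_server(candidates: Iterable[str],
                   resolve: Resolver) -> Tuple[Optional[str], Optional[str], int]:
    """Try the generated domains in order, as a DGA-based bot would.

    Returns (domain, ip, failed_lookups) for the first domain that resolves,
    or (None, None, failed_lookups) if none of them do.
    """
    failed = 0
    for domain in candidates:
        ip = resolve(domain)
        if ip is not None:
            return domain, ip, failed
        failed += 1  # in real traffic this is typically an NXDOMAIN answer
    return None, None, failed


# Hypothetical resolver: only one generated domain was registered by the botmaster.
registered = {"qzkfhw.com": "192.0.2.10"}
generated = ["abxpto.com", "mmrtqa.net", "qzkfhw.com", "lpowex.org"]
print(find_cc_server(generated, registered.get))
```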
A botnet can be conceived of as a group of compromised computers that can be controlled remotely to execute coordinated attacks or commit fraudulent acts. The fact that botnets continuously evolve means that traditional detection approaches are always one step behind. Recently, the behavioral analysis of network traffic has arisen as a way to...
If small botnets are difficult to detect, small Linux botnets that stay under the radar are even harder. This talk describes how we detected a novel Linux botnet in a large organization by analyzing the network connection patterns with our behavioral detection system. The botnet exploits web servers and uses obfuscated Python scripts to receive c...
The network patterns of targeted attacks are very different from those of common malware because the attackers' goals are different. Therefore, it is difficult to detect targeted attacks by looking for DNS anomalies, DGA traffic or HTTP patterns. However, our analysis of targeted attacks reveals novel patterns in their network communication. These pa...
Current malware traffic detection solutions work mostly by using static fingerprints, white and black lists and crowd-sourced threat intelligence analytics. These methods are useful for detecting known malware in real time, but are insufficient for detecting unknown malicious trends and attacks. Our proposed complementary solution is to analyze the inh...
Current malware traffic detection solutions work mostly by using static fingerprints, white and black lists and crowd-sourced threat intelligence analytics. These methods are useful for detecting known malware in real time, but are insufficient for detecting unknown malicious trends and attacks. Our proposed complementary solution is to analyse...
The results achieved so far in detecting bots and botnets in the network may be improved if the analysis is done on the traffic of a single bot. Detecting a bot and detecting a botnet require different methods. While the botnet may be detected by correlating the behavior of several bots in a large amount of traffic, the behavior of one bot can be detected by a...
Current antivirus, IDS and IPS systems still mostly use fingerprint knowledge, predefined rules, static features or blacklists to detect and stop malware in the network. While still useful and fast, these technologies do not have the power to recognize behaviors in the network and therefore cannot give network administrators a semant...
Current antivirus, IDS and IPS systems still mostly use fingerprint knowledge, predefined rules, static features or blacklists to detect and stop malware in the network. While still useful and fast, these technologies do not have the power to recognize behaviors in the network and therefore cannot give network administrators a semant...
This is the bidirectional Argus NetFlow file containing the traffic of Capture 1 described in the paper. It has Botnet, Normal and Background labels. For more information see https://147.32.83.216/publicDatasets/CTU-Malware-Capture-Botnet-42/
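A quick way to get an overview of a labeled flow file like this one is to count flows per label class. The sketch assumes the flows have been exported as comma-separated text with a header row containing a `Label` column, and that each label contains one of the words Botnet, Normal or Background; the column layout and sample rows below are invented for the example, so check the actual file before relying on it.

```python
import csv
import io
from collections import Counter
from typing import TextIO

CLASSES = ("Botnet", "Normal", "Background")


def label_class(label: str) -> str:
    """Collapse a detailed label into Botnet, Normal, Background or Other."""
    for cls in CLASSES:
        if cls in label:
            return cls
    return "Other"


def count_label_classes(f: TextIO) -> Counter:
    """Count flows per label class in a CSV file with a 'Label' column."""
    return Counter(label_class(row["Label"]) for row in csv.DictReader(f))


# Invented sample rows; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "StartTime,Proto,SrcAddr,DstAddr,Dport,TotBytes,Label\n"
    "2011/01/01 00:00:01,udp,192.0.2.5,198.51.100.7,53,214,flow=From-Botnet-UDP-DNS\n"
    "2011/01/01 00:00:02,tcp,192.0.2.8,203.0.113.9,443,5120,flow=To-Normal-TCP-HTTPS\n"
    "2011/01/01 00:00:03,udp,192.0.2.9,198.51.100.1,123,90,flow=Background-UDP-Established\n"
    "2011/01/01 00:00:04,udp,192.0.2.9,198.51.100.2,123,90,flow=Background-UDP-Established\n"
)
print(count_label_classes(sample))
```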
Botnets are an important security problem on the Internet. They continuously evolve their structure, protocols and attacks. This survey analyzes and compares the most important efforts carried out in a network-based detection area. It accomplishes four tasks: first, the comparison of previous surveys and the proposal of four new dimensions to analy...
Cell phones have become so personal that detecting them on the street amounts to detecting their owners. By combining the information from the phone with its GPS position, it is possible to record and analyze the behavioral patterns of people in the street. Bluetooth devices are ubiquitous, but until recently, there were no tools to perform Bluetooth w...
Bluetooth devices are ubiquitous. However, until recently, there were no tools to perform Bluetooth wardriving. Considering that each cell phone usually identifies one person and that the position of these devices can be stored, it is possible to extract and visualize people's behavior. Most people are not aware that their Bluetooth device allows to...
This is a pcap dataset with the network traffic of one real infected computer. See the paper for more details.
This is a pcap dataset with the network traffic of one real infected computer. See the paper for more details.
This is a pcap dataset with the network traffic of one real infected computer. See the paper for more details.
Behavior-based botnet detection techniques have been in use for some years. These techniques make it possible to analyze how a botnet works and to study what kind of traffic it generates in order to detect it. In this work, the network behavior of botnets was studied by computing a set of values for each...
This paper is the result of a PhD course on microcontroller programming. It is intended to show how a low-cost autonomous robot controlled with the open Arduino platform was designed, built and programmed. Having no prior experience in electronics, we had to learn almost every electronic principle and overcome a lot of mechanical probl...
This work raises some questions and describes some experiences with the formal structure of teaching and with non-formal teaching-learning methods. It recounts an educational experience with innovative components in a formal university environment, where motivation, commitment and co-creation had, together with technol...
Keystroke dynamics is a set of computer techniques that has been used successfully for many years in authentication mechanisms and masquerader detection. Classification algorithms have reportedly performed well, but there is room for improvement. As obtaining real intruder keystrokes is a very difficult task, it has been a common practice to use...
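Keystroke dynamics systems commonly describe typing rhythm with two timings: dwell time (how long a key is held down) and flight time (the gap between releasing one key and pressing the next). The sketch below extracts both and compares a new sample to a stored template with a plain mean absolute distance; the timings are invented, and this simple distance is only an illustrative baseline, not one of the classifiers evaluated in the paper.

```python
from typing import List, Sequence, Tuple

# One keystroke: (key, press_time_ms, release_time_ms)
KeyEvent = Tuple[str, float, float]


def dwell_times(events: Sequence[KeyEvent]) -> List[float]:
    """How long each key is held down."""
    return [release - press for _, press, release in events]


def flight_times(events: Sequence[KeyEvent]) -> List[float]:
    """Release of one key to press of the next (negative if keys overlap)."""
    return [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]


def timing_vector(events: Sequence[KeyEvent]) -> List[float]:
    """Concatenate dwell and flight times into one feature vector."""
    return dwell_times(events) + flight_times(events)


def mean_abs_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Average absolute difference between two equal-length timing vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Invented timings for typing "pas": a stored template and a new attempt.
template = timing_vector([("p", 0, 80), ("a", 150, 230), ("s", 300, 370)])
attempt = timing_vector([("p", 0, 95), ("a", 170, 240), ("s", 330, 395)])
print(template)
print(mean_abs_distance(template, attempt))
```

A real system would accept or reject the attempt by comparing this distance to a threshold tuned on the legitimate user's own samples.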