Alejandro Mosquera

Broadcom Corporation · Software

About

39 Publications
17,793 Reads
195 Citations
Introduction
Online safety expert and cyber-security researcher with a Natural Language Processing (NLP) background. Currently developing systems and methods for automatic threat hunting: malicious network traffic analysis, malware analysis, unknown threat categorization, messaging abuse filters, APT detection and attack chain inference based on machine learning.
Additional affiliations
September 2013 - January 2020
Symantec Corporation
Position
  • Principal Research Engineer
September 2011 - December 2013
University of Alicante
Position
  • PhD candidate

Publications (39)
Conference Paper
Full-text available
Modern malware typically makes use of a domain generation algorithm (DGA) to avoid command and control domains or IPs being seized or sinkholed. This means that an infected system may attempt to access many domains while trying to contact the command and control server. Therefore, the automatic detection of DGA domains is an important task, both f...
Preprint
Full-text available
This paper describes the author's participation in the 3rd edition of the Machine Learning Security Evasion Competition (MLSEC-2021) sponsored by CUJO AI, VM-Ray, MRG-Effitas, Nvidia and Microsoft. As in the previous year, the goal was not only developing measures against adversarial attacks on a pre-defined set of malware samples but also finding w...
Conference Paper
Full-text available
Social media publications can inadvertently reveal a wide range of potentially sensitive information, such as gender, age, ethnicity or political ideology, even when it is not intentionally disclosed by their authors. For this reason, social media users concerned about the fact that Natural Language Processing techniques can infer some of these trait...
Conference Paper
Full-text available
This paper describes the application of AutoNLP techniques to the detection of patronizing and condescending language (PCL) in a binary classification scenario. The proposed approach combines meta-learning, in order to identify the best-performing combination of deep learning architectures, with the synthesis of adversarial training examples; thus...
Conference Paper
Full-text available
This paper describes the winning approach in the first automated German text complexity assessment shared task, held as part of KONVENS 2022. To solve this difficult problem, the evaluated system relies on an ensemble of regression models that successfully combines traditional feature engineering and pre-trained resources. Moreover, the use of adver...
Conference Paper
Full-text available
This paper presents the system submitted to the DETOXIS 2021 challenge for detecting toxicity in Spanish social media texts. The chosen approach relies on an ensemble of different neural network architectures that include thread and topic features as side information. For sub-task 1, we also applied machine translation in order to reuse linguisti...
Article
The data made available by Web 2.0 applications such as social networks, on-line chats or blogs have given access to multiple sources of information. Due to this dramatic increase in available information, the perception of quality and credibility plays an important role in social media, making it necessary to discard low-quality and uninterestin...
Conference Paper
Full-text available
This paper revisits feature engineering approaches for predicting the complexity level of English words in a particular context using regression techniques. Our best submission to the Lexical Complexity Prediction (LCP) shared task was ranked 3rd out of 48 systems for sub-task 1 and achieved Pearson correlation coefficients of 0.779 and 0.809 for s...
Conference Paper
Full-text available
This paper describes a method and system for detecting offensive language in social media using anti-adversarial features. Our submission to the SemEval-2020 Task 12 challenge was generated by a stacked ensemble of neural networks fine-tuned on the OLID dataset and additional external sources. For Task-A (English), text normalis...
Preprint
Full-text available
Modern malware typically makes use of a domain generation algorithm (DGA) to avoid command and control domains or IPs being seized or sinkholed. This means that an infected system may attempt to access many domains while trying to contact the command and control server. Therefore, the automatic detection of DGA domains is an important task, both f...
Article
Full-text available
The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have drawn a lot of attention recently when dealing with informal content. However, not all the texts present the same level of informality and may not r...
Article
The mobile threat landscape has undergone rapid growth as smartphones have increased in popularity. The first generation of mobile threats saw attackers relying on various scams delivered through SMS. As the technology progressed and Web browsers, e-mail clients, and custom applications became standard on smartphones, attackers started exploiting n...
Conference Paper
Spam has been infesting our emails and Web experience for decades, distributing phishing scams, adult/dating scams, rogue security software, ransomware, money laundering and banking scams... the list goes on. Fortunately, in the last few years, user awareness has increased and email spam filters have become more effective, catching over 99% of spam...
Article
Full-text available
Short text messages in social media and instant messaging have become a popular communication channel in recent years. This rising popularity has caused an increase in messaging threats such as spam, phishing or malware, among others. Processing these short text message threats could pose additional challenges suc...
Conference Paper
Full-text available
User-generated content has become a recurrent resource for NLP tools and applications; hence, many efforts have recently been made to handle the noise present in short social media texts. Normalisation techniques have proven useful for identifying and replacing lexical variants in some of the most informal genres, such as microb...
Conference Paper
Full-text available
This paper describes our participation in the profiling (polarity classification) task of the RepLab 2013 workshop. This task focuses on determining whether a given text from Twitter contains a positive or a negative statement related to the reputation of a given entity. We cover three different approaches, one unsupervised and two supervised....
Article
This project focuses on processing Spanish text in order to reduce the language barriers that hinder reading comprehension for hearing-impaired people, and even for people learning a new language. This paper describes the methodology used to address the different problems related to the proposed objective, as well as the working hypothesis and partial...
Conference Paper
Full-text available
In sustainable management, economic, social, environmental and technical criteria should be properly considered. In an analytical assessment process involving multiple criteria and alternatives, the decision process is complicated. Moreover, the criteria may be of two types: objective and subjective. The objective criteria...
Conference Paper
Full-text available
The Leaf Area Index (LAI; IAF in Spanish) is the main variable for modelling many plant physiology processes, such as evapotranspiration, photosynthetic capacity or carbon sequestration. Calculating this index requires an estimate of the total leaf surface area, so it is necessary to know...
Conference Paper
Full-text available
Solving linear-programming-based optimization problems with the Gauss-Jordan method has proven feasible and advisable for forest management problems. However, when applied to large-scale problems, such as harvest scheduling in large forests divided into many stands, the func...
Conference Paper
Full-text available
In this paper, we describe the development and performance of the supervised system UMCC_DLSI-(SA). This system uses corpora where phrases are annotated as Positive, Negative, Objective, and Neutral to build new sentiment resources involving word dictionaries with their associated polarity. As a result, new sentiment inventories are obtained and...
Article
Full-text available
Solving linear-programming-based optimization problems with the Gauss-Jordan method has proven feasible and advisable for forest management problems. However, when applied to large-scale problems, such as harvest scheduling in large forests divided into many stands, the func...
Article
Full-text available
Its lexical richness and easy access to large volumes of information make the Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. This paper describes the participation in the Text Normalisat...
Article
Full-text available
A basic task in opinion mining deals with determining the overall polarity orientation of a document about some topic. This has several applications such as detecting consumer opinions in on-line product reviews or increasing the effectiveness of social media marketing campaigns. However, the informal features of Web 2.0 texts can affect the perfor...
Conference Paper
Full-text available
Its lexical richness and easy access to large volumes of information make the Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. This study therefore proposes the normalisation of non-normati...
Conference Paper
Full-text available
The language used in Web 2.0 applications such as blogging platforms, real-time chats, social networks or collaborative encyclopaedias shows remarkable differences from traditional texts. The presence of informal features such as emoticons, spelling errors or Internet-specific slang can lower the performance of Natural Language Process...
Conference Paper
Full-text available
The Web 2.0, through its different platforms, such as blogs, social networks, microblogs or forums, allows users to freely write content on the Internet in order to provide, share and use information. However, the non-standard features of the language used in Web 2.0 publications can make social media content less accessible than traditiona...
Conference Paper
The data made available by Web 2.0 applications such as social networks, on-line chats or blogs have given access to multiple sources of information. Due to this dramatic increase in available information, the perception of quality and credibility plays an important role in social media, making it necessary to discard low-quality and uninterestin...
Conference Paper
Full-text available
The study of the language used in Web 2.0 applications such as social networks, blogging platforms or on-line chats is a very interesting topic and can be used to test linguistic or social theories. However, the existence of language deviations such as typos, emoticons, abuse of acronyms and domain-specific slang makes any linguistic analysis challe...
Conference Paper
User-generated content (UGC) has transformed the way that information is handled on-line. In this paradigm shift, users create, share and consume textual information that is likely to present informal features such as poor formatting, misspellings, phonetic transliterations, slang or lexical variants (Ritter et al., 2010). These texts found in soc...
Article
The analysis of Web 2.0 texts is a relevant research topic today. However, many problems arise when current tools are applied to this kind of text. To be able to measure these difficulties, we first need to know the different registers or degrees of informality that we can find...
Conference Paper
Full-text available
Social media publications are a popular and valuable source of information for Natural Language Processing applications. The linguistic analysis of these texts is a challenging task as a consequence of their informal nature. This paper explores the characterization of informality levels in Web 2.0 texts using unsupervised machine learning technique...
Conference Paper
Full-text available
The study of text informality can provide us with valuable information for different NLP tasks. In the particular case of social media texts, their special characteristics, like the presence of emoticons, slang or colloquial words, can be used to obtain additional information about their informality level. This paper demonstrates that the discov...
Article
The analysis of Web 2.0 texts is a relevant research topic today. However, many problems arise when current tools are applied to this kind of text. To be able to measure these difficulties, we first need to know the different registers or degrees of informality that we can find...
Article
Full-text available
The analysis of Web 2.0 texts is a relevant research topic nowadays. However, many problems arise when using state-of-the-art tools on this kind of text. To be able to measure these difficulties, we first need to identify the different registers or informality levels that we can find. Therefore, in this paper we will attempt to characterize th...
