Gabriela Ramirez-de-la-Rosa

Metropolitan Autonomous University | UAM · Departamento de Tecnologías de la Información

M.Sc. in Computer Science

About

40
Publications
5,477
Reads
118
Citations
Introduction
I am a researcher working on Natural Language Processing. Currently, I am working on methods for plagiarism detection in source code; author profiling, particularly on social media; and short text classification. I am also interested in problems such as automatic summarization, personality detection, and authorship attribution.
Additional affiliations
August 2013 - present
Metropolitan Autonomous University
Position
  • Professor (Associate)
January 2011 - May 2013
University of Alabama at Birmingham
Position
  • Research Assistant
Education
January 2011 - May 2013
University of Alabama at Birmingham
Field of study
  • Computer and Information Science
August 2008 - December 2010
October 2001 - July 2006

Publications (40)
Conference Paper
Full-text available
This paper describes our participation in the shared evaluation campaign of MexA3T 2020. Our main goal was to evaluate a Supervised Autoencoder (SAE) learning algorithm in text classification tasks. For our experiments, we used three different sets of features as inputs, namely classic word n-grams, char n-grams, and Spanish BERT encodings. Our res...
Article
Full-text available
Engaged customers are a very important part of current social media marketing. Public figures and brands have to be very careful about what they post online. That is why accurate strategies for anticipating the impact of a post written for an online audience are critical to any public brand. Therefore, in this paper, we propose a method to...
Preprint
Full-text available
Engaged customers are a very important part of current social media marketing. Public figures and brands have to be very careful about what to post online. That is why accurate strategies for anticipating the impact of a post written for an online audience are critical to any public brand. Therefore, in this paper, we propose a method to p...
Chapter
This project investigates whether critical reception of YouTube content can be promoted in children between nine and eleven years old, focusing on analysis, evaluation, and taking a stand on videos of risky challenges. This is an interdisciplinary project that used a mixed, mostly quantitative methodology, which consists of four phases, two of t...
Article
Available at: https://www.rcs.cic.ipn.mx/2018_147_7/Identificando%20signos%20de%20anorexia%20y%20depresion%20en%20usuarios%20de%20redes%20sociales.pdf
Article
Resources such as labeled corpora are necessary to train automatic models within the natural language processing (NLP) field. Historically, a large number of resources covering a broad range of problems have been available mostly in English. One such problem is known as Personality Identification, where, based on a psychological model (e.g. The Big F...
Article
Full-text available
Automatic detection of source code plagiarism is an important research field for both the commercial software industry and the research community. Existing methods of plagiarism detection primarily involve exhaustive pairwise document comparison, which does not scale well for large software collections. To achieve scalability, we approach th...
Conference Paper
Nowadays, Twitter is a rich source of online reviews, ratings, recommendations, and other forms of opinion expression. This scenario has created a compelling demand for innovative mechanisms to store, search, organize, and analyze all this data automatically. Unfortunately, enough labeled data is seldom available in Twitte...
Conference Paper
The vast amount of electronic documents available on the Internet demands automatic tools that help people find, organize, and easily access all this information. Although current text classification methods have alleviated some of the above problems, such strategies depend on having a large and reliable set of labeled data. In order...
Presentation
Full-text available
Nowadays, Twitter is a rich source of online reviews, ratings, recommendations, and other forms of opinion expression. This scenario has created a compelling demand for innovative mechanisms to store, search, organize, and analyze all this data automatically. Unfortunately, enough labeled data is seldom available in Twitte...
Conference Paper
Full-text available
A conversational agent, also known as a chatbot, is a machine conversational system that interacts with human users via natural language. Traditionally, chatbot technology is built on a set of “manually” elaborated conversational rules. However, given the availability of large and real examples of human interactions on the web, automatica...
Article
Full-text available
Twitter is a source of information for many natural language processing tasks, particularly for author profiling, that is, the task of determining an author's demographic characteristics, such as gender, age, and personality, from the author's text. Traditionally, this problem is solved by means of an appr...
Conference Paper
Psychologists have long theorized about the effects of birth order on intellectual development and verbal abilities. Several studies within the field of psychology have tried to prove such theories; however, no concrete evidence has been found yet. Therefore, in this paper we present an empirical analysis on the pertinence of traditional Author Prof...
Conference Paper
Source code plagiarism can be identified by analyzing similarities across several diverse aspects of a pair of source code documents. In this paper we present three types of similarity features that account for three aspects of source code documents: i) lexical, ii) structural, and iii) stylistic. From the lexical view, we used a character 3-g...
Article
Full-text available
Source code plagiarism can be identified by analyzing several diverse views of a pair of source code documents. In this paper we present three representations from lexical and structural views of a given source code document. We attempt to show that different representations provide diverse information that can be useful to identify plagiarism. In particular, we...
Conference Paper
Classifying malware into correct families is an important task for anti-virus vendors. Currently, only some of them will recognize a particular malware. Even when they do, they either classify them into different families or use a generic family name, which does not provide much information. Our method for malware family identification is based on...
Conference Paper
Online communities are filled with comments from loyal readers and first-time viewers, who are constantly creating and sharing information at an unprecedented level, resulting in millions of messages containing opinions, ideas, needs, and beliefs of Internet users. Therefore, businesses are very interested in finding influential users and en...
Conference Paper
Full-text available
This paper describes the participation of the Language and Reasoning Group of UAM at RepLab 2014 Author Profiling evaluation lab. This task involves author categorization and author ranking subtasks. Our method for author categorization uses a supervised approach based on the idea that we can use the information on Twitter's user profile, then by m...
Conference Paper
Full-text available
We present a set of new measures designed to reveal latent information of language use in children at the lexico-syntactic level. We used these metrics to analyze linguistic patterns in spontaneous narratives from children developing typically and children identified as having a language impairment. We observed significant differences in the z-scor...
Article
Full-text available
This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspe...
Article
During the last decades the Web has become the greatest repository of digital information. In order to organize all this information, several text categorization methods have been developed, achieving accurate results in most cases and in very different domains. Due to the recent use of the Internet as a communication medium, short texts such as news, t...
Conference Paper
Full-text available
This paper presents results on a preliminary study using syntactic information to predict language dominance in Spanish-English bilingual children. Our approach uses a bag of syntactic grammar rules taken from narratives in English and Spanish. We then measure prediction accuracy of categorizing children into Spanish-dominant, English-dominant, and...
Conference Paper
Full-text available
Crosslingual text classification consists of exploiting labeled documents in a source language to classify documents in a different target language. In addition to the evident translation problem, this task also faces some difficulties caused by the cultural discrepancies manifested in both languages by means of different topic distributions. Such...
Conference Paper
Full-text available
Current text classification methods are mostly based on a supervised approach, which requires a large number of examples to build accurate models. Unfortunately, in several tasks training sets are extremely small and their generation is very expensive. In order to tackle this problem, in this paper we propose a new text classification method that tak...
Article
Full-text available
Hierarchical document clustering produces a tree of clusters that is often stored as a plain text file or as a graphic. These representations are not easily understood by non-expert users and need to be processed before applications can use them. This paper proposes to represent document hierarchies by means of markup languages in order to improv...
