
Barbara Poblete
- PhD
- Professor (Assistant) at University of Chile
About
69
Publications
34,987
Reads
5,345
Citations
Publications (69)
Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a...
Online e-commerce product reviews can be highly influential in a customer's decision-making processes. Reviews often describe personal experiences with a product and provide candid opinions about a product's pros and cons. In some cases, reviewers choose to share information about themselves, just as they might do in social platforms. These descrip...
Chile experienced a series of important protests between October and December 2019. This social unrest, as it was called, was fueled by social inequity and radically affected the nation's status quo. A large portion of the population demanded a new Constitution and changes to the current government, whereas another part of the population rejected t...
The Entity Linking (EL) task involves linking mentions of entities in a text with their identifier in a Knowledge Base (KB) such as Wikipedia, BabelNet, DBpedia, Freebase, Wikidata, YAGO, etc. Numerous techniques have been proposed to address this task down through the years. However, not all works adopt the same convention regarding the entities t...
The Modified Mercalli intensity scale (Mercalli scale for short) is a qualitative measure used to express the perceived intensity of an earthquake in terms of damages. Accurate intensity reports are vital to estimate the type of emergency response required for a particular earthquake. In addition, Mercalli scale reports are needed to estimate the p...
Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among communities of users. Although users tend to find good quality answers in cQA sites, there is evidence that they also engage in a significant volume of QA in other types of social sites, such as microblog platform...
The sheer amount of newsworthy information published by users in social media platforms makes it necessary to have efficient and effective methods to filter and organize content. In this scenario, off-the-shelf methods fail to process large amounts of data, which is usually approached by adding more computational resources. Simple data aggregations...
The Mercalli scale of quake damages is based on perceived effects and it has a strong dependence on observers. Recently, we proposed a method for ground shaking intensity estimation based on lexical features extracted from tweets, showing good performance in terms of mean absolute error (MAE). One of the flaws of that method is the detection of the...
The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with a corresponding unambiguous entry in a Knowledge Base. The evaluation of EL systems relies on the comparison of their results against gold standards. A common format used to represent gold standard datasets is the NLP Interchange Format (NIF), which us...
Entity Linking (EL) associates the entities mentioned in a given input text with their corresponding knowledge-base (KB) entries. A recent EL trend is towards multilingual approaches. However, one may ask: are multilingual EL approaches necessary with recent advancements in machine translation? Could we not simply focus on supporting one language i...
Timely detection and accurate description of extreme events, such as natural disasters and other crisis situations, is crucial for emergency management and mitigation. However, this task can be challenging as it may rely on reports from human observers appointed to specific geographical areas, or on expensive and sophisticated infrastructure. In th...
Since its invention, the Web has evolved into the largest multimedia repository that has ever existed. This evolution is a direct result of the explosion of user-generated content, explained by the wide adoption of social network platforms. The vast amount of multimedia content requires effective management and retrieval techniques. Nevertheless, W...
Some decades have passed since the concept of "named entity" was used for the first time. Since then, new lines of research have emerged in this environment, such as linking the (named) entity mentions in a text collection with their corresponding knowledge-base entries. However, this introduces problems with respect to a consensus on the definitio...
In general, existing methods for automatically detecting emergency situations using Twitter rely on features based on domain-specific keywords found in messages. These keyword-based methods usually require training on domain-specific labeled data, using multiple languages, and for different types of events (e.g., earthquakes, floods, wildfir...
Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among users. Although users tend to find good quality answers in cQA sites, they also engage in a significant volume of QA interactions in other platforms, such as microblog networking sites. This in part is explained b...
The immense growth of the social Web, which has made a large amount of user data easily and publicly available, has opened a whole new spectrum for research in social behavioral sciences. However, as the volume of social media content increases at a very fast rate, it becomes extremely difficult to systematically obtain high-level information from...
The Entity Linking (EL) task is concerned with linking entity mentions in a text collection with their corresponding knowledge-base entries. The majority of approaches have focused on EL over English text collections. However, some approaches propose language-independent or multilingual approaches to perform EL over texts in many languages. In this...
We propose an algorithm and system that detects earthquakes worldwide in real time based on reports of social media users, or "citizen-sensors." Earthquake detections are based on user postings in any language and from any region. This approach is unsupervised, adapting automatically to changes in the input data stream, and only requires a general...
On-line social networks publish information on a high volume of real-world events almost instantly, becoming a primary source for breaking news. Some of these real-world events can end up having a very strong impact on on-line social networks. The effect of such events can be analyzed from several perspectives, one of them being the intensity and c...
On-line social networks publish information about an enormous volume of real-world events almost instantly, becoming a primary source for breaking news. Many of the events reported in social media can have a high impact on society, such as important political decisions, natural disasters and terrorist actions, but might go unnoticed in their early s...
Community Question and Answering (Q&A) sites provide special features for asking questions and receiving answers from users on the Web. Nevertheless, Web users do not restrict themselves to posting their questions exclusively in these platforms. With the massification of on-line social networks (OSN), such as Twitter, users are increasingly sharing...
Online Social Networks (OSN) have changed the way information is produced and consumed. Organizing and retrieving unstructured data extracted from these platforms is not an easy task. Galean is a visual and interactive tool that aims to help journalists and historians, among others, analyze news events discussed on Twitter. In this tool, news event...
Nowadays, social media services are being used extensively as news sources and for spreading information on real-world events. Several studies have focused on detecting those events and locating them geographically. However, in order to study real-world events, for example, finding relationships between locations or detecting high impact events bas...
In one embodiment, a search query is received. Information identifying a bookmark representing the search query is automatically stored in association with a set of bookmarks. Search results corresponding to the search query are automatically obtained and provided, where the search results identify one or more documents. When one of the documents i...
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twi...
A specification of a target web site is received. A number of field web sites related to the target web site are identified. Data values are acquired for a set of metrics for the target and each field web site. These data values are processed to evaluate a standing of the target web site relative to the field web sites, while maintaining anonymity...
Methods and apparatus are described for classifying documents using a document representation model based on implicit user feedback obtained from search engine queries. The model may be used to achieve better results in non-supervised tasks such as clustering and labeling through the incorporation of usage data obtained from the search engine queri...
We present a novel methodology for creating multimedia summaries of real-world events through social media information. Summaries are generated using selected multimedia data disseminated through Twitter. The proposed summarization technique takes into account social indicators of relevance, which are used to select a set of representative multimed...
Purpose: Twitter is a popular microblogging service which has proven, in recent years, its potential for propagating news and information about developing events. The purpose of this paper is to focus on the analysis of information credibility on Twitter. The purpose of our research is to establish if an automatic discovery process of relevant and...
Online social networks are known to be demographically biased. Currently there are questions about how representative they are of the physical population, and how population biases impact user-generated content. In this paper we focus on centralism, a problem affecting Chile. Assuming that local differences exist in a country, in term...
Twitter sentiment analysis, or the task of automatically retrieving opinions from tweets, has received increasing interest from the web mining community. This is due to its importance in a wide range of fields such as business and politics. People express sentiments about specific topics or entities with different strengths and intensities, where...
Usually time series are controlled by generative processes which display changes over time. On many occasions, two or more generative processes may switch forcing the abrupt replacement of a fitted time series model by another one. We claim that the incorporation of past data can be useful in the presence of concept shift. We believe that history t...
On-line social networks have become a massive communication and information channel for users world-wide. In particular, the microblogging platform Twitter is characterized by short-text message exchanges at extremely high rates. In this type of scenario, the detection of emerging topics in text streams becomes an important research area, essentia...
Toolbar navigation logs provide rich data for enhancing information discovery on the Web. The value of this data resides in its scope, which goes beyond that of traditional query-mining data sources, such as search-engine logs. In this paper we present a methodology for extracting relevant association rules for queries, based on historic user navig...
In this work we conduct an empirical study of opinion time series created from Twitter data regarding the 2008 U.S. elections. The focus of our proposal is to establish whether a time series is appropriate or not for generating a reliable predictive model. We analyze time series obtained from Twitter messages related to the 2008 U.S. elections usin...
Web sites are grouped by generating feature space representations of documents, and aggregating the feature space representations into web site vectors. A document vector may be generated for each document of a plurality of documents associated with a set of web sites according to a query-based feature space model. The query-based feature space mod...
Annotating or tagging multimedia objects is an important task for enhancing multimedia information retrieval processes. In the context of the Web, automatic tagging deals with many issues, such as loosely tagged images and huge collections of images with no textual data at all. Recently, graph representations have been shown useful for modeling rel...
Social media services have spread throughout the world in just a few years. They have become not only a new source of information, but also new mechanisms for societies world-wide to organize themselves and communicate. Therefore, social media has a very strong impact in many aspects: at a personal level, in business, and in politics, among many ot...
We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. In this paper we focus on automatic methods for assessing the...
The fast increase in the ease of access to computing, coupled with the rapid growth of social media, has provided the space and motivated people all over the world to publicly share many kinds of information, from general interest topics such as elections and fashion to private topics such as the user's mood. The widespread use of microblogging se...
We explore an effective approach for modeling and classifying Web sites in the World Wide Web. The aim of this work is to classify Web sites using features which are independent of size, structure and vocabulary. We establish Web site similarity based on search engine query hits, which convey document relevance and utility in direct relation to use...
We explore the application of a graph representation to model similarity relationships that exist among images found on the Web. The resulting similarity-induced graph allows us to model in a unified way different types of content-based similarities, as well as semantic relationships. Content-based similarities include different image descriptors,...
In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social phenomena, such as the dissemination of false...
We introduce the concern of confidentiality protection of business information for the publication of search engine query logs and derived data. We study business confidentiality, as the protection of nonpublic data from institutions, such as companies and people in the public eye. In particular, we relate this concern to the involuntary exposure o...
Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to register the search nodes capable of producing the top-N results for frequent queries. In this paper we show that it is possible to use the location cache as a training dataset...
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink and click graphs. The hyperlink graph expresses link structure among Web pages, while the click graph is a bipartite graph of queries and documents denoting users' searching be...
In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve better results in non-supervised tasks, such as clustering and labeling, through the incorporation of usage data obtained from search engine queries. This type of model allo...
In this paper we study privacy preservation for the publication of search engine query logs. In particular, we introduce a new privacy concern, which is that of website privacy (or business privacy). We define the possible adversaries that could be interested in disclosing website information and the vulnerabilities found in the query log, from w...
We present a model for mining user queries found within the access logs of a website and for relating this information to the website's overall usage, structure and content. The aim of this model is to discover, in a simple way, valuable information to improve the quality of the website, allowing the website to become more intuitive and adequate fo...
In this article we compare two clustering algorithms, one fuzzy and one non-fuzzy, for grouping the documents of a Web site. In most cases the 'categorical' k-Means method, along with other non-fuzzy techniques, is used as the de facto method for clustering documents on the Web. For this reason, in this study we...
In this paper we present a large-scale study on the evolution of the Web structure of the Chilean domain (.cl) from 2000 to 2004, focusing on the Web site transitions in the structure. This is the most detailed study of its kind and covers the longest time span. Our results show that there are many stable Web sites, but also a majority of chaotic c...
We present a novel model for validating and improving the content and structure organization of a website. This model studies the website as a graph and evaluates its interconnectivity in relation to the similarity of its documents. The aim of this model is to provide a simple way for improving the overall structure, contents and interconnectivit...
In this article we present a model for mining queries in Web sites. This model relates the information provided by queries found in a site to the site's usage, content and structure. The main goal of our model is to discover, in a simple way, valuable information on how to improve the structure and content of the site, allowing the site to become m...
We present the evolution of the structure of the Chilean Web between 2000 and 2002. Our results show that although the Web grows as expected, a significant part of it also disappears. In addition, some components are much more stable than others. We also compare the expected life cycle of a Web site in the structure with the actual data.
Abstract: In this work we present a model for mining queries in Web sites. This model relates the information provided by the queries found within a site to the site's usage, content and structure data. The main objective of our model is to discover, in a simple way, valuable information about...