Daniel Gayo-Avello

University of Oviedo | UNIOVI · Department of Information Technology

PhD Computer Science

About

Publications: 89
Reads: 25,014
Citations: 1,975
Introduction
My main area of interest is Web IR but I'm currently focused on social media research. I have published in international conferences, journals and magazines, such as Communications of the ACM, IEEE Internet Computing, or IEEE Multimedia. In 2013 I acted as guest co-editor for a special issue of Internet Research on the predictive power of social media. I have recently contributed a chapter on Political Opinion to "Twitter: A Digital Socioscope", published by Cambridge University Press.
Additional affiliations
November 2000 - present
University of Oviedo
Position
  • Professor (Associate)

Publications (89)
Article
Full-text available
There is an increasing number of projects based on Knowledge Graphs and SPARQL endpoints. These SPARQL endpoints are later queried by end users or used to feed many different kinds of applications. Shape languages, such as ShEx and SHACL, have emerged to guide the evolution of these graphs and to validate their expected topology. However, authori...
Article
Full-text available
The amount, size, complexity, and importance of Knowledge Graphs (KGs) have increased during the last decade. Many different communities have chosen to publish their datasets using Linked Data principles, which favors the integration of this information with many other sources published using the same principles and technologies. Such a scenario re...
Book
Retooling Politics, by Andreas Jungherr (Cambridge Core, Computing and Society)
Chapter
The proliferation of large databases with potentially repeated entities across the World Wide Web has driven a general interest in methods for detecting duplicate entries. Because the data are heterogeneous, generalist approaches may perform poorly in scenarios with distinguishing features. In this paper, we analyze the part...
Article
Full-text available
In this paper, the authors describe Musical Entities Reconciliation Architecture (MERA), an architecture designed to link music-related databases adapting the reconciliation techniques to each particular case. MERA includes mechanisms to manage third party sources to improve the results and it makes use of semantic technologies, storing and organiz...
Article
After losing the presidential elections in Iran in 2009, candidate Mir-Hossein Mousavi and his supporters claimed electoral fraud and confronted the regime forces in bloody clashes. In January 2011, president Zine El Abidine Ben Ali (in office since 1987) fled Tunisia after massive demonstrations spread through the country. After that, a number of...
Article
Twitter is among the commonest sources of data employed in social media research mainly because of its convenient APIs to collect tweets. However, most researchers do not have access to the expensive Firehose and Twitter Historical Archive, and they must rely on data collected with free APIs whose representativeness has been questioned. In 2010 the...
Article
Search engine logs store detailed information on Web users' interactions. Thus, as more and more people use search engines on a daily basis, important trails of users' common knowledge are being recorded in those files. Previous research has shown that it is possible to extract concept taxonomies from full text documents, while other scholars have pr...
Article
Full-text available
The confluence of social media with political action is a complex field raising important questions. Is social media a realm for democratic deliberation? Can we ascertain public opinion from social media outlets? How are people using social media for political participation? Can social media boost democracy in authoritarian regimes? Here, the autho...
Article
Despite being a fairly recent phenomenon, microblogging has attracted a large number of researchers and practitioners who consider microposts a suitable source of data to ascertain public opinion. Among the reasons for that interest, we may find the fact that one single platform (i.e., Twitter) is the default choice for users; the ease with which o...
Article
Full-text available
In light of the growing importance of web-based data in the social and behavioral sciences, WEBDATANET was established in 2011 as a COST Action (IS 1004) to create a multidisciplinary network of web-based data collection experts: (web) survey methodologists, psychologists, sociologists, linguists, economists, Internet scientists, media and public o...
Article
Full-text available
Purpose – Social media provide an impressive amount of data about users and their interactions, thereby offering computer and social scientists, economists, and statisticians – among others – new opportunities for research. Arguably, one of the most interesting lines of work is that of predicting future events and developments from social media dat...
Conference Paper
Full-text available
Time series are usually controlled by generative processes which display changes over time. On many occasions, two or more generative processes may switch, forcing the abrupt replacement of a fitted time series model by another one. We claim that the incorporation of past data can be useful in the presence of concept shift. We believe that history t...
Article
Full-text available
Our group in the Department of Informatics at the University of Oviedo has participated, for the first time, in two tasks at CLEF: monolingual (Russian) and bilingual (Spanish-to-English) information retrieval. Our main goal was to test the application to IR of a modified version of the n-gram vector space model (codenamed blindLight). This new app...
Article
Full-text available
Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such studies, electoral prediction is maybe the most attractive, and a growing body of literature exists on this topic. This research problem isn't only interesting, but is also extremely difficult. However, most aut...
Conference Paper
Full-text available
In this work we conduct an empirical study of opinion time series created from Twitter data regarding the 2008 U.S. elections. The focus of our proposal is to establish whether a time series is appropriate or not for generating a reliable predictive model. We analyze time series obtained from Twitter messages related to the 2008 U.S. elections usin...
Article
Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward and the prevailing view is overly optimistic. This is problematic because while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state...
Article
Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such studies, electoral prediction is maybe the most attractive, and there is currently a growing body of literature on the topic. This is not only an interesting research problem but, above all, it i...
Article
Content published in microblogging systems like Twitter can be data-mined to take the pulse of society, and a number of studies have praised the value of relatively simple approaches to sampling, opinion mining, and sentiment analysis. Today Twitter is a source of information on such events, updated by millions of users worldwide reacting to events...
Conference Paper
Full-text available
Using social media for political discourse is increasingly becoming common practice, especially around election time. Arguably, one of the most interesting aspects of this trend is the possibility of "pulsing" the public's opinion in near real-time and, thus, it has attracted the interest of many researchers as well as news organizations. Recently,...
Article
Full-text available
Online Social Networks (OSNs) are a cutting edge topic. Almost everybody (users, marketers, brands, companies, and researchers) is approaching OSNs to better understand them and take advantage of their benefits. Perhaps one of the key concepts underlying OSNs is that of influence, which is highly related, although not entirely identical, to those of p...
Article
Full-text available
Online Social Networks (OSNs) are used by millions of users worldwide. Academically speaking, there is little doubt about the usefulness of demographic studies conducted on OSNs and, hence, methods to label unknown users from small labeled samples are very useful. However, from the general public's point of view, this can be a serious privacy concern...
Article
Full-text available
One of the most important issues in Information Retrieval is inferring the intents underlying users' queries. Thus, any tool to enrich or better contextualize queries can prove extremely valuable. Entity extraction, provided it is done fast, can be one such tool. Such techniques usually rely on a prior training phase involving large dataset...
Article
Micro-blogging services such as Twitter allow anyone to publish anything, anytime. Needless to say, much of the available content can be dismissed as babble or spam. However, given the number and diversity of users, some valuable pieces of information should arise from the stream of tweets. Thus, such services can develop into valuable sources of...
Article
Full-text available
Search engines are nowadays one of the most important entry points for Internet users and a central tool to solve most of their information needs. Still, there exists a substantial number of user searches which obtain unsatisfactory results. Needless to say, several lines of research aim to increase the relevance of the results users retrieve. In...
Article
Characterizing users' intent and behaviour while using an information retrieval tool (e.g. a search engine) is a key question in web research, as it holds the keys to knowing how users interact, what they expect and how we can provide them with information in the most beneficial way. Previous research has focused on identifying the average charact...
Article
Search engine logs provide a highly detailed insight into users' interactions. Hence, they are both extremely useful and sensitive. The datasets publicly available to scholars are, unfortunately, too few, too dated and too small. There are few because search engine companies are reluctant to release such data; they are dated because they were collect...
Conference Paper
Full-text available
Search engine logs store detailed information on Web users' interactions. Thus, as more and more people use search engines on a daily basis, important trails of users' common knowledge are being recorded in those files. Previous research has shown that it is possible to extract concept taxonomies from full text documents, while other scholars have pr...
Article
Full-text available
User interactions with search engines reveal three main underlying intents, namely navigational, informational, and transactional. By providing more accurate results depending on such query intents, the performance of search engines can be greatly improved. Therefore, query classification has been an active research topic in recent years. However...
Conference Paper
Many technologies related to software components have a mostly descriptive purpose. It seems desirable to promote automatic specification and validation strategies, developing techniques that, from the descriptions, are able to detect defects in a static manner (before execution time). Technology transfer is also a main issue to encourage the adoption of...
Conference Paper
Full-text available
Our group in the Department of Informatics at the University of Oviedo has participated, for the first time, in two tasks at CLEF: monolingual (Russian) and bilingual (Spanish-to-English) information retrieval. Our main goal was to test the application to IR of a modified version of the n-gram vector space model (codenamed blindLight). This new app...
Article
Full-text available
Chinese text segmentation is a well-known and difficult problem. On the one hand, there is no simple notion of "word" in the Chinese language, which makes it really hard to implement rule-based systems to segment written texts; thus, lexicons and statistical information are usually employed to achieve such a task. On the other hand, any piece of Chinese text usua...
Article
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents; they can barely do anything else. Nowadays there are many initiatives trying to change this situation; many of them are related to fields like the Semantic Web [1] or Web Intelligence. In this paper we describe the...
Conference Paper
Full-text available
Keywords are a simple way of describing a document, giving the reader some clues about its contents. However, sometimes they only assign the text to a topic, in which case a summary is more useful. Keywords and abstracts are common in scientific and technical literature but most of the documents available (e.g., web pages) lack such help, so automatic ke...
Conference Paper
Full-text available
Word fragments or n-grams have been widely used to perform different Natural Language Processing tasks such as information retrieval (1) (2), document categorization (3), automatic summarization (4) or, even, genetic classification of languages (5). All these techniques share some common aspects such as: (1) documents are mapped to a vector space w...
Conference Paper
Full-text available
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents; they can barely do anything else. Nowadays there are many initiatives trying to change this situation; many of them are related to fields like the Semantic Web (1) or Web Intelligence. In this paper we describe the...
Conference Paper
Full-text available
Algorithms and Programming Languages is a core subject in the BS degree in mathematics at the authors' university. Some of the students are very interested in computer programming but most of them find the subject quite hard. This situation is particularly pronounced for theoretical aspects and, in fact, many students view these areas as...
Conference Paper
Full-text available
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents; they can barely do anything else. Nowadays there are many initiatives trying to change this situation; many of them are related to fields like the Semantic Web or Web Intelligence. This paper describes a new propos...
Article
The Web is a colossal document repository that is nowadays processed by humans only. Machines' role is limited to transmission and layout processing; they can do little else with the contents of documents. Therefore, information retrieval on the current Web is a difficult task in which many results have little relevance. The Semantic Web...
Conference Paper
Full-text available
The Web is a colossal document repository that is nowadays processed by humans only. The machines' role is just to transmit and display the contents; they can barely do anything else. The Semantic Web tries to change this status so that software agents can manipulate the semantic contents of the Web. There are some technologies proposed for t...
Article
The Web is a colossal repository of documents that is processed mostly by human beings. The role of machines is limited to transmitting and displaying the contents, and they can do hardly anything else with them. The Semantic Web (Berners-Lee et al, 2001) aims to change this situation so that the semantic content...
Conference Paper
Full-text available
The Web is a colossal document repository that is nowadays processed by humans only. Machines' role is limited to transmission and layout processing; they can do little else with the contents of documents. Therefore, information retrieval on the current Web is a difficult task in which many results have little relevance. The Semantic We...
Article
After years of experience in computer graphics projects, the GIworks workgroup at the University of Oviedo came to the conclusion that some multipurpose graphics tools were an absolute necessity for rapid development in computer graphics. Such tools should provide basic facilities and should be easily adaptable to the kind of application under developm...
Article
Full-text available
People's use of the Web as an information resource is growing, especially for browsing last-minute news. This is pushing the Web towards popularity levels similar to those of other mass media such as TV or the press. In spite of this, the Web is different, since the resources required to build a site are much simpler. This paper will show the most essential parts in any dynamic...
Article
Full-text available
Abstract: One of the pillars on which the European Higher Education Area is built is the ECTS credit. However, despite being a concept used continually when discussing European convergence, there are many doubts surrounding it, some of which this article presents and aims to clarify. 1. History...
Article
Full-text available
Using social media for political discourse is increasingly becoming common practice, especially around election time. Arguably, one of the most interesting aspects of this trend is the possibility of "pulsing" the public opinion in near real-time and, thus, it has attracted the interest of many researchers as well as news organizations. Recently, i...
