Alberto Pérez García-Plaza

  • Research and Development Engineer
  • Lecturer at National University of Distance Education

About

  • Publications: 24
  • Reads: 4,530
  • Citations: 231
Current institution
National University of Distance Education
Current position
  • Lecturer


Publications (24)
Conference Paper
Full-text available
In this work we deal with the problem of web page clustering from the point of view of document representation. Fuzzy rule-based systems have been successfully used to represent web documents by means of heuristic combinations of criteria. In these systems, rules were established based on the way humans read documents and have been analyzed in pre...
Article
Full-text available
Finding and visualizing semantic relations among tags within a tag cloud enhances user experience, particularly regarding access to and retrieval of web pages on social tagging systems. Several approaches have been proposed to visualize tag relations in these systems. However, results of previous research rely on qualitative evaluation methods, and...
Article
We present a methodology for learning a taxonomy from a set of text documents that each describe one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three different feature extraction approaches with varying degrees of language indepe...
Article
Full-text available
This paper presents a new approach to disambiguate company names in the Twitter social network. We have focused on lightening the process of comparing company profiles with tweets in order to obtain a competitive real-time system. With this aim, we only use the home page of each company as the information source to create a unique profile. On th...
Article
Full-text available
Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. Goal: To produce a reusable toolset supporting the m...
Preprint
The selection of a suitable document representation approach plays a crucial role in the performance of a document clustering task. Being able to pick out representative words within a document can lead to substantial improvements in document clustering. In the case of web documents, the HTML markup that defines the layout of the content provides a...
Article
Full-text available
The selection of a suitable document representation approach plays a crucial role in the performance of a document clustering task. Being able to pick out representative words within a document can lead to substantial improvements in document clustering. In the case of web documents, the HTML markup that defines the layout of the content provides a...
Data
Wikipedia Animal Dataset is a dataset created during December 2010 and January 2011 with data retrieved from Wikipedia. It is available for research purposes. Statistics: This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal the article was collected in English, Finnish and Spanish, fulfilli...
Article
Full-text available
In our daily lives, organizing resources like books or webpages into a set of categories to ease future access is a common task. Because these collections are usually large, organizing them manually requires considerable effort and expense. As an approach to effectively produce an automated classification of resources, we consider the immense amo...
Article
Full-text available
Recently, users of the social bookmarking service Delicious have been able to stack web pages in addition to tagging them. Stacking enables users to group web pages around specific themes with the aim of recommending them to others. However, users still stack a small subset of what they tag, and thus many web pages remain unstacked. This paper presents ear...
Article
Full-text available
Tag clouds have become an appealing way of navigating through Web pages on social tagging systems. Recent research has focused on finding relations among tags to improve visualization and access to Web documents from tag clouds. Reorganizing tag clouds according to tag relatedness has been suggested as an effective solution to ease navigation. Most...
Thesis
Full-text available
Keeping information organized is important for making information access easier. Although the information we need is sometimes available on the Web, this information is only useful if we have the ability to find it. With this aim, automatic techniques for grouping documents are increasingly used. In this thesis we are interested...
Conference Paper
This paper presents a new approach to disambiguate company names in the Twitter social network. We have focused on lightening the process of comparing company profiles with tweets in order to obtain a competitive real-time system. With this aim, we only use the home page of each company as the information source to create a unique profile. On th...
Conference Paper
Full-text available
Document representation is an essential step in web page clustering. Web pages are usually written in HTML, offering useful information to select the most important features to represent them. In this paper we investigate the use of nonlinear combinations of criteria by means of a fuzzy system to find those important features. We start our research...
Data
DeliciousT140 is a dataset created during June 2008 with data retrieved from the social bookmarking site Delicious and the Web. It is available for research purposes.
Data
SocialBM0311 is a large-scale social tagging/bookmarking dataset collected from Delicious.com. It contains the complete bookmarking activity for almost 2 million users from the launch of the social bookmarking website in 2003 to the end of March 2011. The dataset contains: 339,897,227 bookmarks, 118,520,382 unique URLs, 14,723,731 unique tags, and...
Conference Paper
Full-text available
This paper presents a methodology for learning taxonomic relations from a set of documents that each explain one of the concepts. Three different feature extraction approaches with varying degrees of language independence are compared in this study. The first feature extraction scheme is a language-independent approach based on statistical keyphrase...
Conference Paper
Full-text available
Social tagging systems are becoming an interesting way to retrieve web information from previously annotated data. These sites present a tag cloud made up of the most popular tags, where neither tag grouping nor their corresponding content is considered. We present a methodology to obtain and visualize a cloud of related tags based on the use of se...
Conference Paper
Full-text available
This article introduces and evaluates a fuzzy-logic-based representation for HTML document clustering using Self-Organizing Maps. This representation is built on heuristic combinations of criteria by means of a fuzzy rule system and based on the HTML markup. We evaluate the model using different feature vector sizes. Experimental results show an i...
Article
Full-text available
Abstract: Social bookmarking systems have become an interesting resource for retrieving data previously annotated by users. Websites of this kind show a tag cloud made up of the most frequently used tags as a navigation option. These tag clouds, however, take into account neither the similarity between tags nor their content. In this work, SOM...
