
Alberto Pérez García-Plaza
- Research and Development Engineer
- Lecturer at National University of Distance Education
Publications: 24
Reads: 4,530
Citations: 231
Publications (24)
In this work we deal with the problem of web page clustering from the point of view of document representation. Fuzzy rule-based systems have been successfully used to represent web documents by means of heuristic combinations of criteria. In these systems, rules were established based on the way humans read documents and have been analyzed in pre...
Finding and visualizing semantic relations among tags within a tag cloud enhances user experience, particularly regarding access to and retrieval of web pages on social tagging systems. Several approaches have been proposed to visualize tag relations in these systems. However, results of previous research rely on qualitative evaluation methods, and...
We present a methodology for learning a taxonomy from a set of text documents that each describe one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three different feature extraction approaches with varying degrees of language indepe...
This paper presents a new approach to disambiguating company names on the Twitter social network. We have focused on lightening the process of comparing company profiles with tweets in order to obtain a competitive real-time system. To this end, we use only the home page of each company as the information source to create a unique profile. On th...
Background
After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge.
Goal
To produce a reusable toolset supporting the m...
The selection of a suitable document representation approach plays a crucial role in the performance of a document clustering task. Being able to pick out representative words within a document can lead to substantial improvements in document clustering. In the case of web documents, the HTML markup that defines the layout of the content provides a...
Wikipedia Animal Dataset is a dataset created during December 2010 and January 2011 with data retrieved from Wikipedia. It is available for research purposes.
Statistics
-----------
This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal, the article was collected in English, Finnish and Spanish, fulfilli...
In our daily lives, organizing resources like books or webpages into a set of categories to ease future access is a common task. These collections are usually so large that organizing them manually requires vast effort and expense. As an approach to effectively produce an automated classification of resources, we consider the immense amo...
Recently, users of the social bookmarking service Delicious have been able to stack web pages in addition to tagging them. Stacking enables users to group web pages around specific themes with the aim of recommending them to others. However, users still stack only a small subset of what they tag, and thus many web pages remain unstacked. This paper presents ear...
Tag clouds have become an appealing way of navigating through Web pages on social tagging systems. Recent research has focused on finding relations among tags to improve visualization and access to Web documents from tag clouds. Reorganizing tag clouds according to tag relatedness has been suggested as an effective solution to ease navigation. Most...
Keeping information organized is important for making it easier to access. Although the information we need is sometimes available on the Web, this information is only useful if we have the ability to find it. To this end, automatic techniques for grouping documents are increasingly used. In this thesis we are interested...
Document representation is an essential step in web page clustering. Web pages are usually written in HTML, offering useful information to select the most important features to represent them. In this paper we investigate the use of nonlinear combinations of criteria by means of a fuzzy system to find those important features. We start our research...
DeliciousT140 is a dataset created during June 2008 with data retrieved from the social bookmarking site Delicious and the Web. It is available for research purposes.
SocialBM0311 is a large-scale social tagging/bookmarking dataset collected from Delicious.com. It contains the complete bookmarking activity for almost 2 million users from the launch of the social bookmarking website in 2003 to the end of March 2011. The dataset contains: 339,897,227 bookmarks, 118,520,382 unique URLs, 14,723,731 unique tags, and...
This paper presents a methodology for learning taxonomic relations from a set of documents that each explain one of the concepts. Three different feature extraction approaches with varying degrees of language independence are compared in this study. The first feature extraction scheme is a language-independent approach based on statistical keyphrase...
Social tagging systems are becoming an interesting way to retrieve web information from previously annotated data. These sites present a tag cloud made up of the most popular tags, where neither tag grouping nor their corresponding content is considered. We present a methodology to obtain and visualize a cloud of related tags based on the use of se...
This article introduces and evaluates a fuzzy logic based representation for HTML document clustering using Self-Organizing Maps. This representation is built on heuristic combinations of criteria by means of a fuzzy rules system and based on the HTML markup. We evaluate the model using different feature vector sizes. Experimental results show an i...
Abstract: Social bookmarking systems have become an interesting resource for retrieving data previously annotated by users. Websites of this kind show a tag cloud made up of the most used tags as a navigation option. These tag clouds, however, take into account neither the similarity among tags nor their content. In this work SOM...