Rafael Corchuelo's research while affiliated with Universidad de Sevilla and other places

Publications (158)

Article
Business rules govern how well‐managed companies perform every day. They are expected to be written in natural language because they are devised by business people. That makes it difficult to translate them automatically into executable rules that can be integrated into typical information systems. Our industrial and academic research concludes tha...
Article
Many people who have to make informed decisions in today’s always-on culture use information extractors to feed their systems with information that comes from human-friendly documents. Unfortunately, many proposals to validate information extractors have deficiencies that make it difficult to perform homogeneous comparisons, confirm or refute perfo...
Article
Question answering aims at computing the answer to a question given a context with facts. Many proposals focus on questions whose answer is explicit in the context; lately, there has been an increasing interest in questions whose answer is not explicit and requires multi-hop inference to be computed. Our analysis of the literature reveals that ther...
Article
Full-text available
The Web provides many data that are encoded using HTML tables. This facilitates rendering them, but obfuscates their structure and makes it difficult for automated business processes to leverage them. This has motivated many authors to work on proposals to extract them as automatically as possible. In this article, we present a new unsupervised pro...
Article
Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer knowledge from them. This article presents the first proposal that explores using a meta-heuristic to address the problem of multi-...
Article
This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a clustering technique and some custom heuristics that help extract information in a totally unsupervised manner. Our experimental analysis was perform...
Article
HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the relationships between their cells is not trivial due to the many different layouts, encodings, and formats available. In this article, we introduce Melva, which is an unsupervised domain-agnostic proposal to extract data from HTML tabl...
Article
Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics to clean the tables, then performs functional analysis, and finally applies some post-processing heuristics to produce the out...
Article
Full-text available
This article addresses major information systems integration problems, approaches, technologies, and tools within the context of Model-Driven Software Engineering. The Guaraná integration platform is introduced as an innovative platform amongst state-of-the-art technologies available for enterprises to design and implement integration solutions. In...
Article
Integrating RDF datasets has become a relevant problem for both researchers and practitioners. In the literature, there are many genetic proposals that learn rules that make it possible to link the resources that refer to the same real-world entities, which is paramount to integrating the datasets. Unfortunately, they are context-unaware because they focus on...
Article
RDFa, JSON‐LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software agents understand them. Unfortunately, there are many HTML files that do not have any metadata tags, which has motivated many authors to work on proposals to synthesize them. However, these proposals have some problems: the authors either provide an o...
Book
Both established and emergent businesses rely heavily on data, chiefly those that wish to become game changers. The current biggest source of data is the Web, where there is a large amount of sparse data. To realise this vision, it is required that the resources in different data sources that ref...
Article
Data engineering seeks to support artificial intelligence processes that extract knowledge from raw data. Many such data are rendered in natural language from which entity-relation extractors extract facts and opinion miners extract opinions; the goal of condition mining is to mine the conditions that have an influence on them. In this article, a n...
Article
In this article, we report on our experience regarding devising, implementing, and deploying a scheduler for multi-source fusion in the context of SCADA systems (Supervisory Control and Data Acquisition). They are challenging because they commonly rely on low-end boards with very limited computing, memory, and storage capabilities, but have to run...
Article
A condition is a constraint that determines when a consequent holds. Mining them in text is paramount to understanding many sentences properly. In the literature, there are a few pattern-based proposals that fall short regarding recall because it is not easy to characterise unusual ways to express conditions with hand-crafted patterns; there is one ma...
Article
Aspect‐based sentiment analysis systems are a kind of text‐mining system that specializes in summarizing the sentiment that a collection of reviews conveys regarding some aspects of an item. There are many cases in which users write their reviews using conditional sentences; in such cases, mining the conditions so that they can be analyzed is very i...
Article
Tables are a common means to display data in human-friendly formats. Many authors have worked on proposals to extract those data back since this has many interesting applications. In this article, we summarise and compare many of the proposals to extract data from tables that are encoded using HTML and have been published between 2000 and 2018. We...
Preprint
The Web provides many data in user-friendly tabular formats that are encoded using HTML. Information extractors are intended to extract those data as datasets that can feed business applications. There exist many proposals to implement them, which has motivated several previous surveys. Unfortunately, they are outdated and we do not think that it s...
Conference Paper
A condition is a constraint that determines when something holds. Mining them is paramount to understanding many sentences properly. There are a few pattern-based approaches that fall short because the patterns must be handcrafted and it is not easy to characterise unusual ways to express conditions; there is one machine-learning approach that requ...
Conference Paper
Business systems that are fed with data from the Web of Data require transparent interoperability. The Linked Data principles establish that different resources that represent the same real-world entities must be linked for such purpose. Link rules are paramount to transparent interoperability since they produce the links between resources. State-o...
Conference Paper
In the Web of Data, real-world entities are represented by means of resources, for instance the southern Spanish city “Seville” that is represented by means of the resource that is available at http://es.dbpedia.org/page/Sevilla in the DBpedia dataset. Link rules are intended to link resources that are different, but represent the same real-world e...
Chapter
Text mining aims to produce valuable information from natural language text. Conditions cannot be neglected because neglecting them may easily lead to misinterpretations. There are naive proposals to mine conditions that rely on user-defined patterns, which fall short; there is only one machine-learning proposal, but it requires providing specific-purpose d...
Conference Paper
Feeding decision support systems with Web information typically requires sifting through an unwieldy amount of information that is available in human-friendly formats only. Our focus is on a proposal to extract information from semi-structured documents in a structured format, with an emphasis on it being scalable and open. By semi-structu...
Article
Full-text available
The research regarding Web information extraction focuses on learning rules to extract some selected information from Web documents. Many proposals are ad hoc and cannot benefit from the advances in machine learning; furthermore, they are likely to fade away as the Web evolves, and their intrinsic assumptions are not satisfied. Some authors have ex...
Article
Full-text available
The research on Enterprise Systems Integration focuses on proposals to support business processes by re-using existing systems. Wrappers help re-use web applications that provide a user interface only. They emulate a human user who interacts with them and extracts the information of interest in a structured format. In this article, we present TANGO...
Article
Full-text available
Web page classification refers to the problem of automatically assigning a web page to one or more classes after analysing its features. Automated web page classifiers have many applications, and many researchers have proposed techniques and tools to perform web page classification. Unfortunately, the existing tools have a number of drawbacks that...
Article
Full-text available
Companies typically rely on applications purchased from third parties or developed at home to support their business activities. It is not uncommon that these applications were not designed taking integration into account. Enterprise Application Integration provides methodologies and tools to design and implement integration solutions. Camel, Sprin...
Article
Full-text available
Consulting companies that specialise in Enterprise Application Integration commonly require adapting existing frameworks to specific domains. Currently, there are many such frameworks available, most of which provide a materialisation of the well-known catalogue of patterns that was devised by Hohpe and Woolf. The decision regarding which framework...
Article
Full-text available
Information extractors are used to transform the user-friendly information in a web document into structured information that can be used to feed a knowledge-based system. Researchers are interested in ranking them to find out which one performs the best. Unfortunately, many rankings in the literature are deficient. There are a number of formal met...
Article
Nowadays, the Web of Data is in its earliest stages; it is currently organised into a variety of linked knowledge bases that have been developed independently by different organisations. RDF is one of the most popular languages to represent data in this context, which motivates the need to perform complex integration tasks amongst RDF knowledge bas...
Chapter
Full-text available
It is not difficult to find an enterprise which has a software ecosystem composed of applications that were built using different technologies, data models, operating systems, and most often were not designed to exchange data and share functionalities. Enterprise Application Integration provides methodologies and tools to design and implement integ...
Conference Paper
Full-text available
Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. Unfortunately, the information it provides is available in semi-structured human-friendly formats, which makes it difficult to b...
Conference Paper
Some software agents need information that is provided by some web sites, which is difficult if they lack a query API. Information extractors are intended to extract the information of interest automatically and offer it in a structured format. Unfortunately, most of them rely on ad-hoc techniques, which make them fade away as the Web evolves. In t...
Conference Paper
Software agents are increasingly used to search for experts, recommend resources, assess opinions, and perform other similar tasks in the context of social networks, which requires accurate information that describes the features of the members of the network. Unfortunately, many member profiles are incomplete, which has motivated many authors to w...
Chapter
Many software agents require information that is available in web documents. Unfortunately, the existing proposals to learn extraction rules are tightly coupled with the learning component and do not result in resilient rules. We present a novel approach that leverages neural networks and has proven to be very resilient.
Chapter
Full-text available
Throughout the years, companies have been using software systems to support their business activities. It is common that the software ecosystem in a company is composed of both in-house and off-the-shelf applications. Frequently, companies may need to reuse these applications to support new business processes or optimise the current ones. Howev...
Chapter
Over the years, enterprises have been accumulating a variety of applications in their software ecosystem to support their business processes. As a result, a software ecosystem is a heterogeneous set of IT assets (data and functionality) of the enterprise. The Enterprise Application Integration (EAI) discipline aims to provide language and tools to...
Chapter
Enterprise Application Integration aims to provide methodologies and tools to integrate the many heterogeneous applications of typical companies' software ecosystems. The reuse of these applications within the ecosystem contributes to reducing software development costs and deployment time. Studies have shown that the cost of integration is usual...
Article
The Web is a huge and still growing information repository that has attracted the attention of many companies. Many such companies rely on information extractors to integrate information that is buried into semi-structured web documents into automatic business processes. Many information extractors build on extraction rules, which can be handcrafte...
Article
Web data extractors are used to extract data from web documents in order to feed automated processes. In this article, we propose a technique that works on two or more web documents generated by the same server-side template and learns a regular expression that models it and can later be used to extract data from similar documents. The technique bu...
Article
The Web is evolving into a Web of Data in which RDF data are becoming pervasive, and it is organised into datasets that share a common purpose but have been developed in isolation. This motivates the need to devise complex integration tasks, which are usually performed using schema mappings; generating them automatically is appealing to relieve use...
Article
The World Wide Web is an immense information resource. Web information extraction is the task that transforms human-friendly Web information into structured information that can be consumed by automated business processes. In this article, we propose an unsupervised information extractor that works on two or more web documents generated by the same...
Article
A semantic-web ontology, simply known as ontology, comprises a data model and data that should comply with it. Due to their distributed nature, there exists a large number of heterogeneous ontologies, and a strong need for exchanging data amongst them, i.e., populating a target ontology using data that come from one or more source ontologies. Data e...
Article
Full-text available
Unsupervised web page classification refers to the problem of clustering the pages in a web site so that each cluster includes a set of web pages that can be classified using a unique class. The existing proposals to perform web page classification do not fulfill a number of requirements that would make them suitable for enterprise web information...
Article
Full-text available
The goal of data exchange is to populate the data model of a target application using data that come from one or more source applications. It is common to address data exchange building on correspondences that are transformed into executable mappings. The problem that we address in this article is how to generate executable mappings in the context...
Conference Paper
We propose a technique that takes two or more web pages generated by the same server-side template and tries to learn a regular expression that represents it and helps extract relevant information from similar pages. Our experimental results on real-world web sites demonstrate that our technique outperforms others in terms of both effectiveness and...
Conference Paper
Deep Web sites expose data from a database, whose conceptual model remains hidden. Having access to that model is mandatory to perform several tasks, such as integrating different web sites; extracting information from the web in an unsupervised manner; or creating ontologies. In this paper, we propose a technique to discover the conceptual model behind a web...
Conference Paper
The Web of Data, which comprises web sources that provide their data in RDF, is gaining popularity day after day. Ontological models over RDF data are shared and developed with the consensus of one or more communities. In this context, there usually exists more than one ontological model to understand RDF data; therefore, there might be a gap betwee...
Article
Full-text available
The increasing popularity of the Web of Data is motivating the need to integrate semantic-web ontologies. Data exchange is one integration approach that aims to populate a target ontology using data that come from one or more source ontologies. Currently, there exist a variety of systems that are suitable to perform data exchange among these ontolo...
Conference Paper
Full-text available
The literature provides a variety of techniques to build the information extractors on which some data integration systems rely. Information extraction techniques are usually based on extraction rules that require maintenance and adaptation if web sources change. We present our preliminary steps towards an unsupervised information extraction techni...
Conference Paper
The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and is surrounded by irrelevant information. Researchers are working on information extractors that allow transforming this information into structured data for its later integration into automated processes. Devising a new...
Conference Paper
Full-text available
Virtual Integration systems require a crawling tool able to navigate and reach relevant pages in the Deep Web in an efficient way. Existing proposals in the crawling area fulfill some of these requirements, but most of them need to download pages in order to classify them as relevant or not. We propose a crawler supported by a web page classifier t...
Conference Paper
Full-text available
Most web page classifiers use features from the page content, which means that it has to be downloaded to be classified. We propose a technique to cluster web pages by means of their URL exclusively. In contrast to other proposals, we analyze features that are outside the page; hence, we do not need to download a page to classify it. Also, it is no...
Article
Full-text available
In the Semantic Web, there are a variety of ontologies, which motivates the need for integrating them. Integration tasks rely on the use of relationships amongst the integrated ontologies, known as mappings. The literature reports on a number of techniques to automatically generate such mappings; unfortunately, the results are not suitable to perf...
Conference Paper
Full-text available
Typical companies rely on their software ecosystems to support and optimise their business processes. There are a few proposals to help software engineers devise enterprise application integration solutions. Some companies need to adapt these proposals to particular contexts. Unfortunately, our analysis reveals that they are not so easy to maintain...
Article
Full-text available
Enterprise Application Integration (EAI) solutions comprise a set of specific-purpose processes that implement exogenous message workflows. The goal is to keep a number of applications' data in synchrony or to develop new functionality on top of them. Such solutions are prone to errors because they are highly distributed and usually involve appli...
Article
The provision of services is often regulated by means of agreements that must be negotiated beforehand. Automating such negotiations is appealing insofar as it overcomes one of the most often cited shortcomings of human negotiation: slowness. Our analysis of the requirements of automated negotiation systems in open environments suggests that some o...
Conference Paper
The literature provides many techniques to infer rules that can be used to configure web information extractors. Unfortunately, these techniques have been developed independently, which makes it very difficult to compare the results: there is not even a collection of datasets on which these techniques can be assessed. Furthermore, there is not a comm...
Article
Extracting information from web documents has become a research area in which new proposals sprout up year after year. This has motivated several researchers to work on surveys that attempt to provide an overall picture of the many existing proposals. Unfortunately, none of these surveys provides a complete picture, because they do not take region...
Conference Paper
Full-text available
Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process is online, so whilst the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory to improve the crawler efficie...
Conference Paper
Full-text available
The Cloud is evolving as a cost-effective solution to run services that support a variety of business processes. It is not surprising then that Orchestration as a Service (OaaS) is gaining importance as a means to integrate the many services a typical company runs or outsources in the Cloud. OaaS requires very efficient orchestration engines: the...
Conference Paper
Many authors are researching information extraction techniques to transform the semi-structured information in typical web pages into structured information. When a researcher devises a new technique, he or she has to validate it, which requires implementing it, experimenting, gathering precision and recall results, comparing it to others, and d...
Conference Paper
Full-text available
Data translation is an integration task that aims at populating a target model with data of a source model, which is usually performed by means of mappings. To reduce costs, there are some techniques to automatically generate executable mappings in a given query language, which are executed using a query engine to perform the data translation task....
Conference Paper
Full-text available
Data translation is an integration task that aims at populating a target model with data of a source model by means of mappings. Generating them automatically is appealing insofar as it may reduce integration costs. Matching techniques automatically generate uninterpreted mappings, a.k.a. correspondences, that must be interpreted to perform the data t...
Conference Paper
Full-text available
Data translation, also known as data exchange, is an integration task that aims at populating a target model using data from a source model. This task is gaining importance in the context of semantic-web ontologies due to the increasing interest in graph databases and semantic-web agents. Currently, there are a variety of semantic-web technologies...
Article
Companies rely on a variety of software applications to carry out their business activities. A recurrent challenge is how to make them interoperate with each other; the integration is usually handcrafted, which is a tedious task that increases integration costs. Enterprise Service Buses rank amongst the most popular solutions to reduce these costs, and they...
Conference Paper
Full-text available
Virtual Integration systems require a crawling tool able to navigate and reach relevant pages in the Web in an efficient way. Existing proposals in the crawling area are aware of the efficiency problem, but still most of them need to download pages in order to classify them as relevant or not. In this paper, we present a conceptual framework for de...
Conference Paper
Full-text available
The actual value of the Deep Web comes from integrating the data its applications provide. Such applications offer human-oriented search forms as their entry points, and there exist a number of tools that are used to fill them in and retrieve the resulting pages programmatically. Solutions that rely on these tools are usually costly, which motivate...
Conference Paper
Full-text available
Enterprise Application Integration (EAI) is a field of Software Engineering. Its focus is on helping software engineers integrate existing applications at a sensible cost, so that they can easily implement and evolve business processes. EAI solutions are distributed in nature, which makes them inherently prone to failures. In this paper, we repo...
Conference Paper
Full-text available
The Web is the largest information repository. The information it contains is usually available in human-friendly formats. Companies are interested in using this information. The problem is that they need it in structured formats so that they can use it in automated business processes. In the literature, there are many proposals to infer informatio...
Conference Paper
Full-text available
Enterprise Application Integration (EAI) is a field of Software Engineering. Its focus is on helping software engineers integrate existing applications at a sensible cost, so that they can support new business processes or optimise existing ones. EAI solutions are distributed in nature, which makes them inherently prone to failures. In this paper,...
Conference Paper
Full-text available
The Semantic Web comprises a large number of distributed and heterogeneous ontologies, which have been developed by different communities, and there exists a need to integrate them. Mediators are pieces of software that help to perform this integration, which have been widely studied in the context of nested relational models. Unfortunately, mediat...
Article
Full-text available
Enterprise Application Integration (EAI) solutions cope with two kinds of problems within software ecosystems, namely: keeping a number of applications' data in synchrony or creating new functionality on top of them. ESBs provide the technology required to implement a variety of EAI solutions at sensible costs, but they are still far from negligibl...
Article
Full-text available
Crawlers for Virtual Integration (VI) processes must be efficient, given that the VI process is online, which means that while the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory in order to improve the crawler efficiency. Most crawlers need to downl...
Article
Full-text available
In recent years, many authors have paid attention to web information extractors. They usually build on an algorithm that interprets extraction rules that are inferred from examples. Several rule learning techniques are based on transducers, but none of them proposes a generic transducer model for web information extraction. In this paper, we propos...
Conference Paper
Full-text available
Separation of concerns has been presented as a promising tool to tackle the design of complex systems in which cross-cutting properties that do not fit into the scope of a class must be satisfied. In this paper, we show that interaction amongst a number of objects can also be described separately from functionality by means of the CAL language, and...
Conference Paper
Full-text available
Enterprise Application Integration (EAI) solutions are usually based on message workflows, which make it possible for two or more applications to cooperate to provide a new service or to keep their data synchronised. The literature provides results on two different execution models:...
Conference Paper
Full-text available
Enterprise application integrations involve the participation of several existing applications with which the integration solution exchanges data over LANs and the Internet. In these scenarios, operations might occasionally produce exceptional results at runtime due to impairments introduced by the electronic infrastructure such as node crashes,...
Article
Full-text available
Abstract: Web pages contain different types of links, which play different roles. Traditional crawlers navigate web sites by following every link they find, a behaviour that is not efficient in certain contexts such as Virtual Integration, since in each web page the number of relevant links is small in co...
Book
Full-text available
PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an international yearly forum to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and pr...