Francesco Guerra

  • PhD in Information Engineering
  • Professor (Associate) at University of Modena and Reggio Emilia

About

128
Publications
15,842
Reads
1,283
Citations
Current institution
University of Modena and Reggio Emilia
Current position
  • Professor (Associate)
Additional affiliations
November 2005 - October 2015
University of Modena and Reggio Emilia
Position
  • Research Assistant

Publications (128)
Article
Full-text available
State-of-the-art Entity Matching approaches rely on transformer architectures, such as BERT, for generating highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proven effective, but act as black boxes for the use...
Article
Full-text available
In the past decade, many approaches have been suggested to execute ML workloads on a DBMS. However, most of them have looked at in-DBMS ML from a training perspective, whereas ML inference has been largely overlooked. We think that this is an important gap to fill for two main reasons: (1) in the near future, every application will be infused with...
Article
This paper showcases Time2Feat, an end-to-end machine learning system for Multivariate Time Series (MTS) clustering. The system relies on interpretable inter-signal and intra-signal features extracted from the time series. Then, a dimensionality reduction technique is applied to select a subset of features that retain most of the information, thus...
Article
Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervis...
Article
Full-text available
Clustering multivariate time series is a critical task in many real-world applications involving multiple signals and sensors. Existing systems aim to maximize effectiveness, efficiency and scalability, but fail to guarantee the interpretability of the results. This hinders their application in critical real scenarios where human comprehension of a...
Article
Full-text available
Predictive Maintenance (PdM) is the newest strategy for maintenance management in industrial contexts. It aims to predict the occurrence of a failure to minimize unexpected downtimes and maximize the useful life of components. In data-driven approaches, PdM makes use of Machine Learning (ML) algorithms to extract relevant features from signals, ide...
Article
Full-text available
State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as BERT, for generating highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proven effective, but act as black boxes for...
Conference Paper
Full-text available
Evaluation of the quality of data integration processes is usually performed via onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all the tuples infeasible and frequent updates, i.e., changes in the sources and/or new sources, require repeating the evaluatio...
Article
With the advent of Big Data, it is impossible for a human user to properly inspect and understand data at a glance. In this paper, we introduce the problem of generating data descriptions: a set of compact, readable and insightful formulas of boolean predicates that represents a set of data records. Unfortunately, finding the best description for a...
Conference Paper
Many data analysis and knowledge mining tasks require a basic understanding of the content of a dataset prior to any data access. In this demo, we showcase how data descriptions (sets of compact, readable and insightful formulas of boolean predicates) can be used to guide users in understanding datasets. Finding the best description for a datase...
Conference Paper
A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach adopted for its resolution is a trial and error method, where the rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human...
Chapter
Wikipedia Infoboxes are semi-structured data structures organized in an attribute-value fashion. Policies establish for each type of entity represented in Wikipedia the attribute names that the Infobox should contain in the form of a template. However, these requirements change over time and often users choose not to strictly obey them. As a result...
Article
In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integ...
Chapter
Full-text available
The Database Group (DBGroup, www.dbgroup.unimore.it) and Information System Group (ISGroup, www.isgroup.unimore.it) research activities have been mainly devoted to the Data Integration Research Area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariv...
Chapter
DBpedia and Wikidata are two online projects focused on offering structured data from Wikipedia in order to ease its exploitation on the Linked Data Web. In this paper, a comparison of these two widely-used structured data sources is presented. This comparison considers the most relevant data quality dimensions in the state of the art of the scient...
Chapter
As more and more data becomes available on the Web, as its complexity increases and as the Web’s user base shifts towards a more general non-technical population, keyword searching is becoming a valuable alternative to traditional SQL queries, mainly due to its simplicity and the lower effort/expertise it requires. Existing approaches suffer from a...
Chapter
Nowadays, citizens require high-quality information from public institutions in order to guarantee their transparency. Institutional websites of governmental and public bodies must publish and keep updated a large amount of information stored in thousands of web pages in order to satisfy the demands of their users. Due to the amount of inform...
Conference Paper
We reproduce recent research results combining semantic and information retrieval methods. Additionally, we expand the existing state of the art by combining the semantic representations with IR methods from the probabilistic relevance framework. We demonstrate a significant increase in performance, as measured by standard evaluation metrics.
Article
Full-text available
A fundamental service for the exploitation of the modern large data sources that are available online is the ability to identify the topics of the data that they contain. Unfortunately, the heterogeneity and lack of centralized control make it difficult to identify the topics directly from the actual values used in the sources. We present an appro...
Chapter
Full-text available
Structured data sources promise to be the next driver of a significant socio-economic impact for both people and companies. Nevertheless, accessing them through formal languages, such as SQL or SPARQL, can become cumbersome and frustrating for end-users. To overcome this issue, keyword search in databases is becoming the technology of choice, even...
Chapter
In document search, documents are typically seen as a flat list of keywords. To deal with the syntactic interoperability, i.e., the use of different keywords to refer to the same real world entity, entity linkage has been used to replace keywords in the text with a unique identifier of the entity to which they are referring. Yet, the flat list of e...
Conference Paper
Identifying items similar to the ones provided as input to a search system is a challenging task. The main issues concern not only the management of large collections of data, but also the profiling of the users, who usually have different opinions, tastes and expertise. In this paper we propose a preliminary investigation about the improvements i...
Article
Full-text available
Algorithms and techniques for searching in collections of data address a challenging task, since they have to bridge the gap between the ways in which users express their interests, through natural language expressions or keywords, and the ways in which data is represented and indexed. When the collections of data include images, the task becomes h...
Article
Over the last decade, keyword search over relational data has attracted considerable attention. A possible approach to face this issue is to transform keyword queries into one or more SQL queries to be executed by the relational DBMS. Finding these queries is a challenging task since the information they represent may be modeled across different ta...
Conference Paper
This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of...
Conference Paper
Predicting the next page a user wants to see in a large website has gained importance over the last decade because the Web has become the main communication medium between a wide set of entities and users. This is true in particular for institutional government and public organization websites, where for transparency reasons a lot of i...
Conference Paper
Part-whole relations are ubiquitous in our world, yet they do not get “first-class” treatment in the data management systems most commonly used today. One aspect of part-whole relations that is particularly important is that of attribute transitivity. Some attributes of a whole are also attributes of its parts, and vice versa. We propose an extens...
Conference Paper
Word Sense Disambiguation (WSD) usually relies on data structures built upon the words to be disambiguated. This is a time-consuming process that requires a huge computational effort. In this paper, we propose an approach to automatically build a generic sense inventory (called iSC) to be used as a reference for disambiguation. The sense inventory...
Article
In this paper, we present a preliminary approach for automatically discovering the topics of a structured data source with respect to a reference ontology. Our technique relies on a signature, i.e., a weighted graph that summarizes the content of a source. Graph-based approaches have already been used in the literature for similar purposes. In thes...
Chapter
In this paper, we overview the main research approaches developed in the area of Keyword Search over Relational Databases. In particular, we model the process for solving keyword queries in three phases: the management of the user’s input, the search algorithms, the results returned to the user. For each phase we analyze the main problems, the solu...
Conference Paper
Full-text available
This position paper discusses the need for considering keyword search over relational databases in the light of broader systems, where keyword search is just one of the components and which are aimed at better supporting users in their search tasks. These more complex systems call for appropriate evaluation methodologies which go beyond what is typ...
Article
Full-text available
We showcase QUEST (QUEry generator for STructured sources), a search engine for relational databases that combines semantic and machine learning techniques for transforming keyword queries into meaningful SQL queries. The search engine relies on two approaches: the forward, providing mappings of keywords into database terms (names of tables and att...
Article
Systems translating keyword queries into SQL queries over relational databases are usually referred to in the literature as schema-based approaches. These techniques exploit the information contained in the database schema to build SQL queries that express the intended meaning of the user query. Besides, typically, they perform a preliminary step t...
Article
Enterprises, governments, and government agencies have started to publish their data on the Internet, especially in the form of open structured data sources. The real exploitation of these free, large open data sources is more and more becoming a crucial activity for obtaining information and knowledge (i.e. competitive elements) in several busines...
Conference Paper
Following recent trends in Data Warehousing, companies realized that there is a great potential in combining their information repositories to obtain a broader view of the economical market. Unfortunately, even though Data Warehouse (DW) integration has been defined from a theoretical point of view, until now no complete, widely used methodology ha...
Article
In this paper, we present an ontology-based information extraction and retrieval system and its application in the soccer domain. In general, we deal with three issues in semantic search, namely, usability, scalability and retrieval performance. We propose ...
Chapter
The simplicity of keyword queries has made them particularly attractive to the technically unskilled user base, and they are becoming the de facto standard for querying on the web. Unfortunately, alongside their simplicity came loose semantics. Researchers have, for a long time, studied ways to understand the keyword query semantics and retrie...
Book
The Web has become the world's largest database, with search being the main tool that allows organizations and individuals to exploit its huge amount of information. Search on the Web has been traditionally based on textual and structural similarities, ignoring to a large degree the semantic dimension, i.e., understanding the meaning of the query an...
Book
This book constitutes the thoroughly refereed post-workshop proceedings of the 7th International Workshop on Agents and Peer-to-Peer Computing, AP2PC 2008, held in Estoril, Portugal, in May 2008 and the 8th International Workshop on Agents and Peer-to-Peer Computing, AP2PC 2009, held in Budapest, Hungary, May 2009, co-located with the International...
Conference Paper
Full-text available
We propose the demonstration of KEYRY, a tool for translating keyword queries over structured data sources into queries in the native language of the data source. KEYRY does not assume any prior knowledge of the source contents. This allows it to be used in situations where traditional keyword search techniques over structured data that require suc...
Conference Paper
Full-text available
We present a novel method for translating keyword queries over relational databases into SQL queries with the same intended semantic meaning. In contrast to the majority of the existing keyword-based techniques, our approach does not require any a-priori knowledge of the data instance. It follows a probabilistic approach based on a Hidden Markov Mo...
Conference Paper
Full-text available
Hidden Markov Models (HMMs) are today employed in a variety of applications, ranging from speech recognition to bioinformatics. In this paper, we present the List Viterbi training algorithm, a version of the Expectation-Maximization (EM) algorithm based on the List Viterbi algorithm instead of the commonly used forward-backward algorithm. We develo...
Article
Data warehouse architectures rely on extraction, transformation and loading (ETL) processes for the creation of an updated, consistent and materialized view of a set of data sources. In this paper, we support these processes by proposing a tool that: (1) allows the semi-automatic definition of inter-attribute semantic mappings, by identifying the p...
Conference Paper
In this paper we describe Keymantic, a framework for translating keyword queries into SQL queries by assuming that the only available information is the source metadata, i.e., schema and some external auxiliary information. Such a framework finds application when only intensional knowledge about the data source is available like in Data Integration...
Conference Paper
Full-text available
Keyword queries offer a convenient alternative to traditional SQL in querying relational databases with large, often unknown, schemas and instances. The challenge in answering such queries is to discover their intended semantics, construct the SQL queries that describe them and use them to retrieve the respective tuples. Existing approaches typica...
Article
From a user perspective, data and services provide a complementary view of an information source: data provide detailed information about specific needs, while services execute processes involving data and returning an informative result as well. For this reason, users need to perform aggregated searches to identify not only relevant data, but also...
Article
Full-text available
We introduce KEYRY, a tool for translating keyword queries over structured data sources into queries formulated in their native query language. Since it is not based on analysis of the data source contents, KEYRY finds application in scenarios where sources hold complex and huge schemas, apt to frequent changes, such as sources belonging to the lin...
Chapter
Given the many data integration approaches, a complete and exhaustive comparison of all the research activities is not possible. In this chapter we will present an overview of the most relevant research activities and ideas in the field investigated in the last 20 years. We will also introduce the MOMIS system, a framework to perform information ex...
Article
Information overload occurs when the information available exceeds the user's ability to process it. To manage information overload, a user is required to discriminate among useful, redundant, incorrect, and meaningless information. From a computer science perspective, this means we must provide users with a combination of techniques and tools for...
Article
Full-text available
We propose the demonstration of Keymantic, a system for keyword-based searching in relational databases that does not require a-priori knowledge of instances held in a database. It finds numerous applications in situations where traditional keyword-based searching techniques are inapplicable due to the unavailability of the database contents for th...
Conference Paper
The increasing availability of data and eServices on the Web allows users to search for relevant information and to perform operations through eServices. Current technologies do not support users in the execution of such activities as a unique task; thus users have first to find interesting information, and then, as a separate activity, to find and...
Conference Paper
In this paper we propose a system supporting the semi-automatic definition of inter-attribute mappings and transformation functions used as an ETL tool in a data warehouse project. The tool supports both schema level analysis, exploited for the mapping definitions amongst the data sources and the data warehouse, and instance level operations, explo...
Conference Paper
Traditional techniques for query formulation need the knowledge of the database contents, i.e. which data are stored in the data source and how they are represented. In this paper, we discuss the development of a keyword-based search engine for structured data sources. The idea is to couple the ease of use and flexibility of keyword-based search wi...
Article
Extraction, Transformation and Loading (ETL) processes are crucial for data warehouse consistency and are typically based on constraints and requirements expressed in natural language in the form of comments and documentation. This task is poorly supported by automatic software applications, thus making these activities a huge amount of work for data w...
Conference Paper
Integration of heterogeneous information in the context of the Internet is becoming a key activity to enable a more organized and semantically meaningful access to several kinds of information in the form of data sources, multimedia documents and web services. In NeP4B (Networked Peers for Business), a project funded by the Italian Ministry of Universi...
Conference Paper
Full-text available
Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying a populated ontology representing data and multimedia sources.
Conference Paper
Full-text available
In: DIST - Data Integration through Semantic Technology Workshop at the 3rd Asian Semantic Web Conference (Bangkok, 12 February 2009). Proceedings, article n. 3. DIST, 2008.
Article
The aim of the second edition of the workshop on Semantic Web Architectures for Enterprises (SWAE) is to evaluate how and how much the Semantic Web vision has met its promises with respect to business and market needs. On the basis of our research experience within the basic research Italian project NeP4B (http://www.dbgroup.unimo.it/nep4b/it/index...
Article
A new kind of metadata offers a synthesized view of an attribute's values for a user to exploit when creating or refining a search query in data-integration systems. The extraction technique that obtains these values is automatic and independent of an attribute domain but parameterized with various metrics for similarity measures. The authors descr...
Article
Full-text available
In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the hi...
Article
Full-text available
Integration of heterogeneous information in the context of the Internet is becoming a key activity to enable a more organized and semantically meaningful access to data sources. As the Internet can be viewed as a data-sharing network where sites are data sources, the challenge is twofold. Firstly, sources present information according to their particular view...
Conference Paper
Full-text available
Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. In this paper we propose a techniq...
Conference Paper
Full-text available
Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. In this paper we propose a tech...
Conference Paper
Full-text available
In this paper we present RELEVANTNews, a web feed reader that automatically groups news related to the same topic published in different newspapers on different days. The tool is based on RELEVANT, a previously developed tool, which computes the "relevant values", i.e. a subset of the values of a string attribute. Clustering the titles of the n...
Article
We propose an approach for describing a unified view of data and services in a peer-to-peer environment. The research areas of data and services are usually represented with different models and queried by different tools with different requirements. Our approach aims at providing the user with a "complete" knowledge (in terms of data and services)...
Article
Full-text available
The increasing globalization and flexibility required by companies has generated new issues in the last decade related to the managing of large scale projects and to the cooperation of enterprises within geographically distributed networks. ICT support systems are required to help enterprises share information, guarantee data-consistency and establ...
Conference Paper
Full-text available
Research on data integration has provided a set of rich and well understood schema mediation languages and systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal with data instances. Such meta-data are necessary for querying the classes resulting from an integration process: the end user typically does...
Conference Paper
Full-text available
The Internet has opened access to an overwhelming amount of data, requiring the development of new applications to automatically recognize, process and manage information available in web sites or web-based applications. The standard Semantic Web architecture exploits ontologies to give a shared (and known) meaning to each web source element. In t...
Article
Full-text available
The tourism industry is a good candidate for taking up Semantic Web technology. In fact, there are many portals and web sites belonging to the tourism domain which promote tourist products (places to visit, food to eat, museums, …) and tourist services (hotels, events, …), published by several operators (tourist promoter associations, public agenci...
Conference Paper
Full-text available
We propose a novel approach for defining and querying a super-peer within a schema-based super-peer network organized into a two-level architecture: the lower level, called the peer level (which contains a mediator node), and the upper level, called the super-peer level (which integrates mediator peers with similar content). We focus on a single super-peer...
Article
Full-text available
Nowadays, data can be represented and stored by using different formats ranging from unstructured data, typical of file systems, to semistructured data, typical of Web sources, to highly structured data, typical of relational database systems. Therefore, the necessity arises to define new models and approaches for uniformly handling data sources...
Conference Paper
Full-text available
The increasing globalization and flexibility required of companies have generated, in the last decade, new issues related to the management of large scale projects within geographically distributed networks and to the cooperation of enterprises. ICT support systems are required to allow enterprises to share information, guarantee data-consiste...
Article
Full-text available
The widespread diffusion of the World Wide Web among medium/small companies yields a huge amount of information to make business available online. Nevertheless, the heterogeneity of that information forces even trading partners involved in the same business process to face daily interoperability issues. The challenge is the integration of distribut...
Article
Full-text available
A marketplace is the place where the demands and offers of buyers and sellers participating in a business transaction may meet. Therefore, electronic marketplaces are virtual communities in which buyers may receive proposals from several suppliers and make the best choice. In the electronic commerce world, the comparison between different products...
Article
Full-text available
The Mediator EnvirOnment for Multiple Information Sources (MOMIS) aims at constructing synthesized, integrated descriptions of the information coming from multiple heterogeneous sources, in order to provide the user with a global virtual view of the sources independent from their location and the level of heterogeneity of their data. Such a globa...
Article
In the last few years, efforts have been made towards bridging the gap between agent technology and de facto standard technologies, aiming at introducing multi-agent systems in industrial applications. This paper presents experience gained by using one such proposal, Agent UML. Agent UML is a graphical modelling language based on UML. The prac...
Article
Full-text available
The Mediator Environment for Multiple Information Sources (Momis) supports semiautomatic building, annotation, and extension of domain ontologies.
Conference Paper
Data integration, in the context of the web, faces new problems, due in particular to the heterogeneity of sources, to the fragmentation of the information and to the absence of a unique way to structure and view information. In such areas, the traditional paradigms, on which database foundations are based (i.e. client server architecture, few sour...
Conference Paper
This paper provides, firstly, a general description of the research project SEWASIE and, secondly, a proposal for an architectural evolution of the SEWASIE system in the direction of the peer-to-peer paradigm. The SEWASIE project aims to design and implement an advanced search engine enabling intelligent access to heterogeneous data sources on th...
