Marco A. Casanova
Pontifical Catholic University of Rio de Janeiro · Department of Informatics (INF)

Ph.D.

About

466 Publications
93,050 Reads
4,048 Citations
Introduction
Marco A. Casanova is a Full Professor at the Department of Informatics of PUC-Rio. He obtained a Ph.D. in Applied Mathematics from Harvard University in 1979. His research interests center on database conceptual modeling and the construction of database management systems. He has written 7 books, 48 journal articles, and over 200 conference papers, and has advised 15 Ph.D. theses and 53 M.Sc. dissertations. In July 2012, he received the Scientific Merit Award from the Brazilian Computer Society.
Additional affiliations
January 1992 - December 1996: IBM, Manager
January 1997 - December 1999: IBM, Manager
January 1982 - December 1989: IBM, Researcher

Publications (466)
Conference Paper
The Text-to-SQL task involves generating SQL queries based on a given relational database and a Natural Language (NL) question. Although Large Language Models (LLMs) show good performance on well-known benchmarks, they are evaluated on databases with simpler schemas. This dissertation first evaluates their effectiveness on a complex and openly avai...
Conference Paper
An Enterprise Knowledge Graph (EKG) is a robust foundation for knowledge management, data integration, and advanced analytics across organizations. It achieves this by offering a semantic view that semantically integrates various data sources within an organization’s data lake. This paper introduces a novel data design pattern (DDP) aimed at constr...
Conference Paper
Oil and gas industry applications often require querying data of various types and integrating the query results. Data range from structured tables stored in databases to documents and images organized in digital libraries. The users typically have technical training but are not necessarily versed in Information Technology, meaning the data process...
Chapter
Full-text available
This paper proposes the use of narrative patterns as an effective guide to preserve thematic consistency in the composition of stories using Large Language Models (LLMs). Our approach drew inspiration from a well-accepted, thorough, and overarching classification of folklore types and the deservedly famous Monomyth characterization of heroic quest...
Preprint
Full-text available
A method for generating narratives by analyzing single images or image sequences is presented, inspired by the time immemorial tradition of Narrative Art. The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories, which are illustrated by a Stable Diffusion XL model. The method is sup...
Chapter
Full-text available
This paper first presents DANKE, a data and knowledge management platform that allows users to submit keyword queries to a centralized database. DANKE uses a knowledge graph to provide a semantic view of the centralized database in a vocabulary familiar to the users. The paper then describes DANKE-U, a specialized module that enables DANKE to handl...
Article
Full-text available
This article presents a novel and highly interactive process to generate natural language narratives based on our ongoing work on semiotic relations, providing four criteria for composing new narratives from existing stories. The wide applicability of this semiotic reconstruction process is suggested by a reputed literary scholar’s deconstructive c...
Article
Full-text available
The field of Personal Knowledge Management (PKM) has seen a surge in popularity in recent years. Interestingly, Natural Language Processing (NLP) and Large Language Models are also becoming mainstream, but PKM has not seen much integration with NLP. With this motivation, this article first introduces a methodology to automatically interconnect isol...
Conference Paper
Text-to-SQL refers to the task defined as “given a relational database D and a natural language sentence S that describes a question on D, generate an SQL query Q over D that expresses S”. Numerous tools have addressed this task with relative success over well-known benchmarks. Recently, several LLM-based text-to-SQL tools, that is, text-to-SQL too...
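As a concrete illustration of the task as stated above, the sketch below assembles a text-to-SQL prompt for a hypothetical single-table schema. The schema, the question, and the placeholder call_llm function are invented for the example and are not taken from the paper; call_llm stands in for whatever LLM client a tool would actually use.

```python
# Illustrative sketch of the text-to-SQL task: "given a database D and a
# natural language question S, produce an SQL query Q over D".
# The schema and question are invented; call_llm is a placeholder.

SCHEMA = """
CREATE TABLE wells (well_id INT PRIMARY KEY, name TEXT, basin TEXT, depth_m REAL);
"""

def build_prompt(schema: str, question: str) -> str:
    """Assemble a prompt that asks the model to translate S into SQL over D."""
    return (
        "Given the database schema below, write a single SQL query that "
        "answers the question. Return only SQL.\n\n"
        f"Schema:\n{schema}\nQuestion: {question}\nSQL:"
    )

def call_llm(prompt: str) -> str:
    # Placeholder: plug in the LLM of your choice here.
    return "SELECT name FROM wells WHERE basin = 'Campos' ORDER BY depth_m DESC LIMIT 1;"

if __name__ == "__main__":
    question = "Which well in the Campos basin is the deepest?"
    print(call_llm(build_prompt(SCHEMA, question)))
```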
Conference Paper
The leaderboards of familiar benchmarks indicate that the best text-to-SQL tools are based on Large Language Models (LLMs). However, when applied to real-world databases, the performance of LLM-based text-to-SQL tools is significantly lower than that reported for these benchmarks. A closer analysis reveals that one of the problems lies in that the r...
Poster
This poster paper proposes a family of Natural Language (NL) interfaces for databases (NLIDBs) that use ChatGPT and LangChain features to compile NL sentences expressing database questions into SQL queries or to extract keywords from NL sentences, which are passed to a database keyword search tool. The use of ChatGPT reduces dealing with NL questio...
Conference Paper
Full-text available
In this paper we introduce a novel, highly interactive process to generate natural language narratives on the basis of our ongoing work on semiotic relations. To the two basic components of interactive systems, namely, a software tool and a user interface, we add a third component: AI agents, understood as an upgraded rendition of software agents. Ou...
Conference Paper
This paper addresses the access control problem in the context of database keyword search, when a user defines a query by a list of keywords, and not by SQL (or SPARQL) code. It describes the solutions implemented in DANKE, a database keyword search platform currently used in several industrial applications. DANKE offers two alternatives for managi...
Preprint
Full-text available
Assuming that the term 'metaverse' could be understood as a computer-based implementation of multiverse applications, we started to look in the present work for a logic that would be powerful enough to handle the situations arising both in the real and in the fictional underlying application domains. Realizing that first-order logic fails to accoun...
Conference Paper
Full-text available
Recently, the topic of Personal Knowledge Management (PKM) has seen a surge in popularity. This is illustrated by the accelerated growth of apps such as Notion, Obsidian, and Roam Research, as well as the appearance of books like “How to Take Smart Notes” and “Building a Second Brain.” However, the area of PKM has not seen much integration with Nat...
Article
Full-text available
A knowledge base, expressed using the Resource Description Framework (RDF), can be viewed as a graph whose nodes represent entities and whose edges denote relationships. The entity relatedness problem refers to the problem of discovering and understanding how two entities are related, directly or indirectly, that is, how they are connected by paths...
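To make the entity relatedness problem concrete, the following sketch enumerates short relationship paths between two entities in a toy RDF-style graph using a plain breadth-first search. The triples are invented; approaches such as the one in the article add similarity measures and ranking on top of the raw path search.

```python
# Illustrative sketch of the entity relatedness problem: enumerate short
# paths connecting two entities in a small RDF-style graph.
from collections import deque

TRIPLES = [
    ("Alice", "worksAt", "PUC-Rio"),
    ("PUC-Rio", "locatedIn", "Rio de Janeiro"),
    ("Bob", "bornIn", "Rio de Janeiro"),
    ("Alice", "knows", "Carol"),
]

def paths_between(triples, source, target, max_hops=3):
    """Breadth-first search over an undirected view of the triples,
    returning relationship paths of at most max_hops edges."""
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))  # traverse edges in both directions
    results, queue = [], deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            results.append(path)
            continue
        if len(path) < max_hops:
            for pred, neigh in adj.get(node, []):
                if neigh != source and neigh not in {n for _, n in path}:
                    queue.append((neigh, path + [(pred, neigh)]))
    return results

print(paths_between(TRIPLES, "Alice", "Bob"))
```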
Chapter
Full-text available
In this paper we propose a new plot composition method based on situation calculus and Petri net models, which are applied, in a complementary fashion, to a narrative open to user co-authorship. The method starts with the specification of situation calculus schemas, which allow a planning algorithm to check if the specification covers the desired c...
Article
Full-text available
The entity relatedness problem refers to the question of exploring a knowledge base, represented as an RDF graph, to discover and understand how two entities are connected. This article addresses this problem by combining distributed RDF path search and ranking strategies in a framework called DCoEPinKB, which helps reduce the overall execution tim...
Article
Purpose: Enterprise knowledge graphs (EKG) in resource description framework (RDF) consolidate and semantically integrate heterogeneous data sources into a comprehensive dataspace. However, to make an external relational data source accessible through an EKG, an RDF view of the underlying relational database, called an RDB2RDF view, must be created....
Preprint
Full-text available
The situation calculus logic model is convenient for modelling the actions that can occur in an information system application. The interplay of pre-conditions and post-conditions determines a semantically justified partial order of the defined actions and serves to enforce integrity constraints. This form of specification allows the use of plan-ge...
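As a rough illustration of how preconditions and effects induce an order on the defined actions and enable plan generation, the sketch below runs a naive forward search over three invented actions in a STRIPS-like style. It is not the situation calculus formalism used in the preprint; the action names and goal are hypothetical.

```python
# Sketch of the precondition/effect interplay: an action applies when its
# preconditions hold, and its effects update the state.  Actions are invented.

ACTIONS = {
    "register_customer": {"pre": set(), "add": {"customer_registered"}, "del": set()},
    "place_order": {"pre": {"customer_registered"}, "add": {"order_placed"}, "del": set()},
    "ship_order": {"pre": {"order_placed"}, "add": {"order_shipped"}, "del": {"order_placed"}},
}

def plan(state, goal, depth=5):
    """Naive depth-limited forward search for a sequence of actions reaching the goal."""
    if goal <= state:
        return []
    if depth == 0:
        return None
    for name, a in ACTIONS.items():
        if a["pre"] <= state:
            new_state = (state - a["del"]) | a["add"]
            if new_state == state:
                continue  # skip actions that do not change the state
            rest = plan(new_state, goal, depth - 1)
            if rest is not None:
                return [name] + rest
    return None

print(plan(set(), {"order_shipped"}))
# ['register_customer', 'place_order', 'ship_order']
```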
Article
Full-text available
Keyword search systems provide users with a friendly alternative to access Resource Description Framework (RDF) datasets. Evaluating such systems requires adequate benchmarks, consisting of RDF datasets, keyword queries, and correct answers. However, available benchmarks often have small sets of queries and incomplete sets of answers, mainly becaus...
Article
Keyword search is typically associated with information retrieval systems. However, recently, keyword search has been expanded to relational databases and RDF datasets, as an attractive alternative to traditional database access. This paper introduces DANKE, a platform for keyword search over databases, and discusses how third-party applications ca...
Article
The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer or to refine and repeat the query until the answer reaches a manageable size. Without proper treatment, consuming the answer may indeed become a tedious task. This article then...
Article
Full-text available
A Natural Language Interface to Database (NLIDB) refers to a database interface that translates a question asked in natural language into a structured query. Aggregation questions express aggregation functions, such as count, sum, average, minimum and maximum, and optionally a group by clause and a having clause. NLIDBs deliver good results for sta...
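The kind of aggregation question the article targets can be made concrete with a small example. The snippet below pairs a natural language aggregation question with the SQL an NLIDB would have to produce; the table, data, and question are invented, and sqlite3 is used only so the example runs end to end.

```python
# Illustrative pairing of an aggregation question with the SQL an NLIDB
# would need to produce.  Table, columns, and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollments (student TEXT, course TEXT, grade REAL);
INSERT INTO enrollments VALUES
  ('Ana', 'Databases', 9.0), ('Ana', 'Logic', 8.0),
  ('Bia', 'Databases', 7.5), ('Caio', 'Databases', 6.0);
""")

# NL question: "Which courses have more than two enrolled students,
#               and what is their average grade?"
SQL = """
SELECT course, COUNT(*) AS students, AVG(grade) AS avg_grade
FROM enrollments
GROUP BY course            -- grouping clause
HAVING COUNT(*) > 2        -- condition on the aggregated groups
"""
print(conn.execute(SQL).fetchall())  # [('Databases', 3, 7.5)]
```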
Conference Paper
The entity relatedness problem refers to the question of exploring a knowledge base, represented as an RDF graph, to discover and understand how two entities are connected. This question can be addressed by implementing a path search strategy, which combines an entity similarity measure, with an expansion limit, to reduce the path search space and...
Conference Paper
Full-text available
A knowledge base, expressed using the Resource Description Framework (RDF), can be viewed as a graph whose nodes represent entities and whose edges denote relationships. The entity relatedness problem refers to the problem of discovering and understanding how two entities are related, directly or indirectly, that is, how they are connected by paths...
Article
This article introduces an algorithm to automatically translate a user-specified keyword-based query K to a SPARQL query Q so that the answers Q returns are also answers for K. The algorithm does not rely on an RDF schema, but it synthesizes SPARQL queries by exploring the similarity between the property domains and ranges, and the class instance s...
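For illustration only, the sketch below shows the input/output relationship the article describes, compiling a keyword query into a SPARQL query by a naive literal-matching strategy. This is not the article's synthesis algorithm, which instead explores property domains and ranges and class instance sets; the function and query shape are hypothetical.

```python
# Very naive illustration of compiling a keyword query into a SPARQL query:
# each keyword is matched against literals by a case-insensitive filter.

def keywords_to_sparql(keywords, limit=10):
    """Return a SPARQL query whose answers mention every keyword."""
    filters = " && ".join(
        f'CONTAINS(LCASE(STR(?o{i})), "{kw.lower()}")' for i, kw in enumerate(keywords)
    )
    patterns = " ".join(f"?s ?p{i} ?o{i} ." for i in range(len(keywords)))
    return f"SELECT DISTINCT ?s WHERE {{ {patterns} FILTER({filters}) }} LIMIT {limit}"

print(keywords_to_sparql(["casanova", "databases"]))
```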
Chapter
Surveys are pervasive in the modern world, with uses ranging from customer satisfaction measurement to the tracking of global economic trends. Data collection is at the core of survey processes and is usually computer-aided. The development of data collection software involves the codification of questionnaires, which vary from simple...
Chapter
Full-text available
This chapter first defines a set of operations that create new ontologies, including their constraints, out of other ontologies. The projection, union, and deprecation operations help define new ontologies by reusing fragments of other ontologies, the intersection operation constructs the constraints that hold in two ontologies, and the difference...
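A rough way to picture these operations is to treat an ontology as a plain set of constraints, as in the sketch below. The constraint strings are invented, and this naive set view ignores the logical entailment that the chapter's definitions take into account; it only conveys the intuition behind union, intersection, and difference.

```python
# Sketch of set-style operations over ontologies, treating an ontology as a
# set of constraints (here, plain strings).  Constraint names are invented,
# and logical entailment between constraints is ignored.

PERSON_ONT = {"Person subClassOf Agent", "Student subClassOf Person", "hasAdvisor domain Student"}
STAFF_ONT  = {"Person subClassOf Agent", "Professor subClassOf Person", "hasAdvisor range Professor"}

union = PERSON_ONT | STAFF_ONT            # constraints of either ontology
intersection = PERSON_ONT & STAFF_ONT     # constraints that hold in both
difference = PERSON_ONT - STAFF_ONT       # constraints of the first not in the second

print(sorted(intersection))               # ['Person subClassOf Agent']
```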
Conference Paper
Full-text available
Natural Language Interface to Databases (NLIDB) systems usually do not deal with aggregations, which can be of two types: aggregation functions (such as count, sum, average, minimum, and maximum) and grouping functions (GROUP BY). This paper addresses the creation of a generic module, to be used in NLIDB systems, that allows such systems to perform...
Conference Paper
Keyword search is typically associated with information retrieval systems. However, recently, keyword search has been expanded to relational databases and RDF datasets, as an attractive alternative to traditional database access. With this motivation, this paper first introduces a platform for data and knowledge retrieval, called DANKE, concentrati...
Conference Paper
This paper proposes a process that modifies the presentation of a query answer to improve the quality of the user’s experience. The process is particularly useful when the answer is long and repetitive. The process reorganizes the original query answer by applying heuristics to summarize the results and to select template questions that create a us...
Preprint
Full-text available
Cloud computing is a general term that involves delivering hosted services over the Internet. With the accelerated growth of the volume of data used by applications, many organizations have moved their data into cloud servers to provide scalable, reliable and highly available services. A particularly challenging issue that arises in the context of...
Article
Stop-and-move semantic trajectories are segmented trajectories where the stops and moves are semantically enriched with additional data. A query language for semantic trajectory datasets has to include selectors for stops or moves based on their enrichments and sequence expressions that define how to match the results of selectors with the sequence...
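The flavor of such sequence expressions can be illustrated with a toy matcher: the sketch below checks whether a stop-and-move trajectory contains a given pattern of stops and moves. The trajectory, the enrichments, and the pattern notation are all invented for the example; the article's query language is considerably richer.

```python
# Sketch of matching a simple sequence pattern against a stop-and-move
# semantic trajectory.  Segments and enrichments are invented.

trajectory = [
    ("stop", {"place": "home"}),
    ("move", {"mode": "bus"}),
    ("stop", {"place": "university"}),
    ("move", {"mode": "walk"}),
    ("stop", {"place": "restaurant"}),
]

def matches(trajectory, pattern):
    """True if the trajectory contains the pattern as a contiguous subsequence.
    Each pattern element is (kind, predicate over the segment's enrichment)."""
    n, m = len(trajectory), len(pattern)
    for start in range(n - m + 1):
        window = trajectory[start:start + m]
        if all(kind == seg_kind and pred(data)
               for (kind, pred), (seg_kind, data) in zip(pattern, window)):
            return True
    return False

# "a stop at the university followed by a move on foot"
pattern = [
    ("stop", lambda d: d.get("place") == "university"),
    ("move", lambda d: d.get("mode") == "walk"),
]
print(matches(trajectory, pattern))  # True
```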
Chapter
Full-text available
The proliferation of shared multimedia narratives on the Internet is due to three main factors: increasing number of narrative producers, availability of narrative-sharing services, and increasing popularization of mobile devices that allow recording, editing, and sharing narratives. These factors characterize the emergence of an environment we cal...
Article
Full-text available
This article presents an in-depth analysis and comparison of two computer science degree offerings. The analysis is based on the student transcripts collected from the academic systems of both institutions over circa one decade. The article starts with a description of the degrees and global statistics of the student population considered. Th...
Article
Full-text available
Cloud computing is a general term that involves delivering hosted services over the Internet. With the accelerated growth of the volume of data used by applications, many organizations have moved their data into cloud servers to provide scalable, reliable and highly available services. A particularly challenging issue that arises in the context of...
Chapter
Full-text available
This extended abstract first introduces the problem of keyword search over RDF datasets. Then, it expands the discussion to cover the question of serendipitous search as a strategy to diversify answers. Finally, it briefly presents the entity relatedness problem, which refers to the problem of exploring an RDF dataset to discover and understand how...
Chapter
Full-text available
A key contributor to the success of keyword search systems is a ranking mechanism that considers the importance of the retrieved documents. The notion of importance in graphs is typically computed using centrality measures that highly depend on the degree of the nodes, such as PageRank. However, in RDF graphs, the notion of importance is not necess...
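The degree-driven baseline mentioned above can be reproduced in a few lines: the sketch below computes PageRank over the entity-to-entity edges of a toy RDF-style graph with networkx. The triples are invented, and the chapter's point is precisely that such degree-dependent scores need not reflect importance in RDF graphs.

```python
# Sketch of a degree-driven centrality baseline: PageRank over the
# entity-to-entity edges of a small RDF-style graph.  Triples are invented.
import networkx as nx

TRIPLES = [
    ("dbr:Brazil", "dbo:capital", "dbr:Brasilia"),
    ("dbr:Rio_de_Janeiro", "dbo:country", "dbr:Brazil"),
    ("dbr:PUC-Rio", "dbo:city", "dbr:Rio_de_Janeiro"),
    ("dbr:PUC-Rio", "dbo:country", "dbr:Brazil"),
]

G = nx.DiGraph()
for s, _, o in TRIPLES:          # keep only the graph structure, ignore predicates
    G.add_edge(s, o)

scores = nx.pagerank(G, alpha=0.85)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:22s} {score:.3f}")
```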
Conference Paper
Full-text available
For several applications, an integrated view of linked data, denoted linked data mashup, is a critical requirement. Nonetheless, the quality of linked data mashups highly depends on the quality of the data sources. In this sense, it is essential to analyze data source quality and to make this information explicit to consumers of such data. This pap...
Chapter
Full-text available
The world-wide drive for academic excellence is placing new requirements on educational data analysis, triggering the need to find less-trivial educational patterns in non-identically distributed data with noise, missing values and non-constant relations. Biclustering, the discovery of a subset of objects (whether students, teachers, researchers, c...
Conference Paper
Full-text available
Identifying and monitoring students who are likely to dropout is a vital issue for universities. Early detection allows institutions to intervene, addressing problems and retaining students. Prior research into the early detection of at-risk students has opted for the use of predictive models, but a comprehensive assessment of the suitability of di...
Preprint
Full-text available
This article presents a novel approach to estimate semantic entity similarity using entity features available as Linked Data. The key idea is to exploit ranked lists of features, extracted from Linked Data sources, as a representation of the entities to be compared. The similarity between two entities is then estimated by comparing their ranked lis...
Preprint
Full-text available
In the last decade, RDF emerged as a new kind of standardized data model, and a sizable body of knowledge from fields such as Information Retrieval was adapted to RDF graphs. One common task in graph databases is to define an importance score for nodes based on centrality measures, such as PageRank and HITS. The majority of the strategies highly de...
Preprint
Full-text available
This paper argues that certain ontology design problems are profitably addressed by treating ontologies as theories and by defining a set of operations that create new ontologies, including their constraints, out of other ontologies. The paper first shows how to use the operations in the context of ontology reuse, how to take advantage of the opera...
Chapter
Full-text available
Currently available datasets still have a large unexplored potential for interlinking. Ranking techniques contribute to this task by scoring datasets according to the likelihood of finding entities related to those of a target dataset. Ranked datasets can be either manually selected for standalone linking discovery tasks or automatically inspected...
Article
Full-text available
This article defines, implements, and evaluates techniques to automatically compare and recommend conferences. The techniques for comparing conferences use familiar similarity measures and a new measure based on co-authorship communities, called co-authorship network community similarity index. The experiments reported in the article indicate that...
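One of the "familiar similarity measures" such techniques build on can be illustrated by Jaccard similarity over author sets, as sketched below with invented author lists. The article's co-authorship network community similarity index is a more elaborate, community-based measure, not reproduced here.

```python
# Sketch of a familiar similarity measure for comparing conferences:
# Jaccard similarity over their author sets.  Author lists are invented.

def jaccard(a, b):
    """|A intersection B| / |A union B| for two sets of authors."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a or b else 0.0

conf_a_authors = {"Silva", "Souza", "Oliveira", "Pereira"}
conf_b_authors = {"Silva", "Souza", "Lima"}
print(f"Author overlap: {jaccard(conf_a_authors, conf_b_authors):.2f}")  # 0.40
```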
Conference Paper
Full-text available
This paper describes an algorithm to perform keyword search over federated RDF datasets. The algorithm compiles keyword-based queries into federated SPARQL queries, without user intervention, under the assumption that the RDF datasets and the federation have a schema. The compilation process is explained in detail, including how to synthesize exter...
Conference Paper
This Demo presents a framework for the live synchronization of an RDF view defined on top of a relational database. In the proposed framework, rules are responsible for computing and publishing the changeset required for the RDB-RDF view to stay synchronized with the relational database. The computed changesets are then used for the incremental maint...
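The changeset idea can be illustrated independently of the demo's rule formalism: the sketch below maps a row of a hypothetical relational table to view triples and computes which triples to delete and insert after an UPDATE. The mapping, table, and identifiers are invented for the example.

```python
# Illustration of live RDB-RDF view maintenance: a row-level change in the
# relational source is translated into a changeset of triples to delete and
# insert on the RDF view.  The mapping below is a made-up example.

def row_to_triples(row):
    """Map one row of a hypothetical 'person' table to RDF-view triples."""
    subj = f"<http://example.org/person/{row['id']}>"
    return [
        (subj, "<http://xmlns.com/foaf/0.1/name>", f'"{row["name"]}"'),
        (subj, "<http://xmlns.com/foaf/0.1/mbox>", f'"{row["email"]}"'),
    ]

def changeset(old_row, new_row):
    """Triples to remove and to add after an UPDATE on the source table."""
    old, new = set(row_to_triples(old_row)), set(row_to_triples(new_row))
    return {"delete": old - new, "insert": new - old}

before = {"id": 7, "name": "Ana", "email": "ana@old.example"}
after = {"id": 7, "name": "Ana", "email": "ana@new.example"}
print(changeset(before, after))
```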
Conference Paper
Collecting huge volumes of trajectories opens up new opportunities to capture time-varying and uncertain travel costs to traverse segments on a network. This kind of analysis is typically conducted offline, by means of data mining on historical data. However, there is a need to deal with the incremental nature of spatio-temporal data and...
Conference Paper
Full-text available
A knowledge base stores descriptions of entities and their relationships, often in the form of a very large RDF graph, such as DBpedia or Wikidata. The entity relatedness problem refers to the question of computing the relationship paths that better capture the connectivity between a given entity pair. This paper describes a dataset created to supp...
Conference Paper
Full-text available
This paper first argues that ontology design may benefit from treating ontologies as theories and from the definition of a set of operations that map ontologies into ontologies, especially their constraints. The paper then defines the class of ontologies used and proposes four operations to manipulate them. It proceeds to discuss how the operation...