Marcos Didonet Del Fabro

Atomic Energy and Alternative Energies Commission | CEA · Laboratory of Research on Software-intensive Technologies (LIST)

PhD in Computer Science
Research engineer

About

95
Publications
32,915
Reads
1,428
Citations
Additional affiliations
October 2010 - present
Federal University of Paraná
Position
  • Professor (Associate)
Description
  • Object Oriented Programming, RDBMS, Model Driven Engineering, Algorithms and Data Structures
March 2022 - present
Atomic Energy and Alternative Energies Commission
Position
  • Research engineer
October 2008 - August 2010
IBM
Position
  • Researcher
Education
October 2004 - September 2007
Nantes Université
Field of study
  • Computer Science - Data Management
October 2002 - June 2003
Nice Sophia Antipolis University
Field of study
  • Computer Science - Network and Distributed Systems
March 1996 - February 2000
Federal University of Santa Maria
Field of study
  • Computer Science

Publications

Article
Full-text available
Question Answering (QA) systems provide accurate answers to questions; however, they lack the ability to consolidate data from multiple sources, making it difficult to manage complex questions that could be answered with additional data retrieved and integrated on the fly. This integration is inherent to Situational Data Integration (SDI) approache...
Conference Paper
Model-driven engineering provides several advantages compared to a direct manual implementation of a system. In reverse-engineering applications, an existing code base needs to be imported into the modeling language. However, there is an abstraction gap between the programming language (C++) and the modeling language, in our case UML. This gap imp...
Article
Full-text available
Data-centric (Dc) approaches are being used for data processing in several application domains, such as distributed systems, natural language processing, and others. There are different data processing frameworks that ease the task of parallel and distributed data processing. However, there are few research approaches studying how to execute mode...
Article
NoSQL databases have emerged as an alternative to relational databases, which do not meet the requirements of all current scenarios. Large applications that handle a variety of data formats often use several types of databases, and the need to migrate data between them is common. There are several approaches that perform this type of conversion. However,...
Chapter
Full-text available
Document stores are frequently used as representation format in many applications. It is often necessary to transform a set of data stored in a relational database (RDB) into a document store. However, it is difficult to evaluate which target document structure is the most appropriate for each scenario. In this article, we present a tool, called QB...
Chapter
Full-text available
Document stores are frequently used as representation format in many applications. It is often necessary to transform a set of data stored in a relational database (RDB) into a document store. There are several approaches that execute such translation. However, it is difficult to evaluate which target document structure is the most appropriate. In...
Article
Brazilian public universities play a key role as promoters of the country's economic, technological, and social development. Several teaching and research institutions, with their laboratories and groups, have contributed strongly not only to the training of high-quality human resources, but also to scientific and technol...
Article
Brazilian public universities play a key role as promoters of the country's economic, technological, and social development. Several teaching and research institutions, with their laboratories and groups, have contributed strongly not only to the training of high-quality human resources, but also to scientific and technol...
Article
Phonological disorders are characterized by substitutions, insertions, and/or deletions of sounds during the process of language acquisition, which are known as Phonological Processes (PPs). In the speech therapy domain, an early identification of PPs allows the diagnosis and treatment of various pathologies and may improve clinical tasks. However, t...
Presentation
Full-text available
This presentation describes our approach for querying large integrated Open Data sources, called BIOD (Blended Integrated Open Data: https://biod.c3sl.ufpr.br/index_en.html). It focuses on analytic queries, composed of dimensions, metrics and filters. It publishes a RESTful API for querying the data sources. The API middleware, called BlenDb, imple...
Conference Paper
Full-text available
The large availability of tabular Open Data sources with hundreds of attributes and relations makes the query development a difficult task, where analytic queries are common. When writing such queries, often called SPJG (Select-Project-Join-GroupBy), it is necessary to understand a data model and to write JOIN operations. The most common approach i...
Conference Paper
Full-text available
Models and metamodels created using model-based approaches have strict conformance relations. However, there has been an increase of semi-structured or schema-free data formats, such as document-oriented representations, which are often persisted as JSON documents. Despite not having an explicit schema/metamodel, these documents could be categori...
Article
Full-text available
ABSTRACT: This article presents the Simulador de Custo-Aluno Qualidade (SimCAQ), a simulator of the cost per student of quality education, as a budget planning instrument, linking this discussion to the cost of financing education under quality conditions in Brazil. It is an online computational system that generates information on the following questions: what is the Custo-Aluno Qualidade (...
Conference Paper
Full-text available
The availability of large Open Data sources creates opportunities for data analytics on different domains. But in order to be effectively used, the data needs to be correctly extracted, formatted and integrated, which is an especially challenging task on Open Data sources, since there is usually less rigour in standardizing subsequent data releases....
Article
Full-text available
Information extraction systems and techniques have been largely used to deal with the increasing amount of unstructured data available nowadays. Time is among the different kinds of information that may be extracted from such unstructured data sources, including text documents. However, the inability to correctly identify and extract temporal infor...
Conference Paper
Full-text available
The automatic text processing of natural language, with the use of probabilistic models and neural networks, allows the analysis and classification of large volumes of text, leading the professionals and institutions of the legal area to work more efficiently. However, Natural Language Processing for Portuguese lacks textual resources to support t...
Technical Report
Full-text available
While several public institutions provide their data openly, the effort required to access, integrate and query this data is too high, reducing the number of possible dataset users. The Blended Integrated Open Data (BIOD) project aims to ease the access to public Open Data. It integrates and makes available more than 300Gb of data, contai...
Preprint
Full-text available
While several public institutions provide their data openly, the effort required to access, integrate and query this data is too high, reducing the number of possible dataset users. The Blended Integrated Open Data (BIOD) project aims to ease the access to public Open Data. It integrates and makes available more than 300Gb of data, contai...
Article
Full-text available
Ontologies are formal specifications of conceptualizations. Designing them requires understanding the concepts involved in the domain to be mapped. One well-known method to produce ontologies is to extract their concepts from relational databases. We conducted a practical study over a real-world scenario on applying existing rules and we identified o...
Conference Paper
Full-text available
Data-centric (Dc) and declarative languages are being used for data processing in several application domains, such as distributed systems, natural language processing, and others. In Model Transformations (MT), recognized declarative and hybrid languages have been used to develop model transformation rules. Considering the execution semantics, a l...
Conference Paper
Full-text available
The emergence of applications dealing with structured, semi-structured, and non-structured data created demands on new data storage systems. The relational model, widely used to store data from diverse applications, does not meet all the imposed scenarios. In response, NoSQL databases have emerged as an option. As a consequence, new approaches for...
Conference Paper
Full-text available
We present HOTMapper, a tool that maps tables of Open Data with historical information into unified data sources. The tool couples data exchange and integration techniques implemented into two main components: 1) a CLI script with commands to create and update tables, and to insert and update the data, using 2) a simple mapping definition file, int...
Article
Full-text available
Ontologies are formal specifications of conceptualizations. Designing them requires understanding the concepts involved in the domain to be mapped. One well-known method to produce ontologies is to extract their concepts from relational databases. We conducted a practical study over a real-world scenario on applying existing rules and we identified o...
Conference Paper
Full-text available
Open Educational Resources (OER) are educational resources openly available to be used by educators and students and are an important tool to support education. A considerable effort has been made to build repositories that allow the sharing and reuse of these OERs. However, many of these repositories offer unsatisfactory search engines, resulting...
Conference Paper
Full-text available
Well-defined dictionaries of tagged entities are used in many tasks to identify entities where the scope is limited and there is no need to use machine learning. One common solution is to encode the input dictionary into Trie trees to find matches on an input text. However, the size of the dictionary and the presence of spelling errors on the input...
Conference Paper
Full-text available
In the context of the wide availability of open data, it is important to provide interactive solutions where a data transformation workflow can be easily deployed and developed. This article presents Metamorfose, a framework for data transformation based on the large-scale data processing engine Apache Spark. Through the presented framework, the user can...
Conference Paper
Full-text available
The large availability of open government data raises enormous opportunities for open big data analytics. However, providing an end-to-end framework able to handle tasks from data extraction and processing to a web interface involves many challenges. One critical factor is the existence of many players with different knowledge, who need to interact...
Conference Paper
Full-text available
Data analytics and scientific computing are two modern applications that in recent years have substantially changed their computation and communication needs, requiring additional processing capability and bandwidth to be able to keep pace with current demands. These applications are commonly processed within data centers, exchanging enorm...
Article
Full-text available
Model generation operations are important artifacts in MDE applications. These approaches can be used for model verification, model finding, and others. In many scenarios, model transformations can as well be represented by a model generation operation. This often comes with the advantage of being bidirectional and supporting increments. However, m...
Conference Paper
Full-text available
The Unified Modeling Language (UML) has been widely adopted for modeling different sorts of applications. Despite having several kinds of diagrams, they were not designed for verifying the execution of real-time embedded systems with time and energy constraints. There are UML profiles that capture this information, but it is necessary to rely...
Conference Paper
Full-text available
Open Educational Resources (OER) are important digital assets used for teaching and learning. There exist different repositories, but searching for such items is often a difficult task. On one hand, most of the solutions implement engines with syntactic search based on term frequency metrics, or using only the item's metadata. On the other ha...
Conference Paper
Full-text available
The JSON format is being applied in a variety of applications: it is established as the de-facto standard for representing document stores; it is widely used to achieve interoperability and as the exchange format in RESTful web APIs. For these reasons, it is necessary to provide interoperability between JSON and other NoSQL formats. There are severa...
Conference Paper
Full-text available
The Brazilian government is maintaining several digital inclusion projects, providing computers and Internet connection to developing regions around the country. However, these projects can only succeed if they are constantly assessed; namely, the projects infrastructure deployment must be closely monitored and evaluated. In this paper, we introduc...
Conference Paper
Full-text available
Scientific applications demand huge computational power connected through fast networks. They are developed using parallel kernel methods, usually implemented with the Message Passing Interface (MPI), presenting well-behaved communication patterns across computing nodes. The current network technologies do not allow defining traffic forwarding poli...
Conference Paper
Scientific applications (SciApps) are broadly used in all science domains. For more accurate results, they have been increasingly demanding computational power and extremely agile networks. These applications are usually implemented using numerical methods presenting well-behaved patterns to exchange data across its computing nodes. This paper prese...
Conference Paper
Full-text available
In this paper we focus on the aggregate query model implemented over NoSQL document-stores for read-mostly databases. We discuss that the aggregate query model can be a good fit for read-mostly databases if the following design requirements are met: on-line time range queries, aggregates with predefined filters, frequent schema evolution and no ad...
Chapter
Full-text available
The Federal University of Paraná (UFPR) hosts the cross-disciplinary research group C3SL, which has been investigating open-source solutions for digital inclusion, covering different topics over the last fifteen years. It has been acting as a partner of different Brazilian public institutions and governments, backing them up for strategic choic...
Article
Full-text available
Network devices have always been considered as configurable black boxes until the emergence of Software-Defined Networking (SDN). SDN enables the networks to be programmed according to the user requirements; furthermore, it allows the network to be easily modified to suit transient demands. However, how do we program the network? SDN-compliant swit...
Article
Full-text available
ADL is a formal language to express archetypes, independent of standards or domain. However, its specification is not precise enough in relation to the specialization and semantics of archetypes, presenting difficulties in implementation and a few available tools. Archetypes may be implemented using other languages such as XML or OWL, increasing int...
Conference Paper
Full-text available
We present two approaches to time expression identification, as entered into SemEval-2015 Task 6, Clinical TempEval. The first is a comprehensive rule-based approach that favoured recall, and which achieved the best recall for time expression identification in Clinical TempEval. The second is an SVM-based system built using readily available compo...
Conference Paper
Full-text available
We present two approaches to time expression identification, as entered into SemEval-2015 Task 6, Clinical TempEval. The first is a comprehensive rule-based approach that favoured recall, and which achieved the best recall for time expression identification in Clinical TempEval. The second is an SVM-based system built using readily available compon...
Conference Paper
Full-text available
Annotating the semantics of time in language is important. THYME (Styler et al., 2014) is a recent temporal annotation standard for clinical texts. This paper examines temporal expressions in the first major corpus released under this standard. It investigates where the standard has proven difficult to apply, and gives a series of recom...
Conference Paper
Full-text available
Nowadays, Big Data applications exchange huge amounts of data, highly demanding network guarantees for bandwidth and low latency. However, network equipment did not provide a standard interface to dynamically control the resources. Software-Defined Network (SDN) has emerged to support network programmability, but it provides a programming model d...
Patent
Full-text available
An aspect of the invention includes transforming a source model to a target model. A source model is received and a transformation specification that includes a set of rules is accessed. Each rule includes a pattern description and a production component. The pattern description includes a pattern in the source model and the production component in...
Conference Paper
Full-text available
Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficientl...
Conference Paper
Full-text available
Model transformation is a central activity in Model Driven Engineering (MDE) as it specifies how models are consumed to generate other models or code. Complex scenarios typically involve the execution of several transformations that, due to variability of solutions to develop software projects, need to be tailored to attempt different implementatio...
Article
The availability of Distributed Database Management Systems (DDBMS) is related to the probability of being up and running at a given point in time and to the management of failures. One well-known and widely used mechanism to ensure availability is replication, which includes performance impact on maintaining data replicas across the DDBMS's machin...
Conference Paper
Full-text available
The design of the NoSQL schema has a direct impact on the scalability of web applications. Especially for developers with little experience in NoSQL stores, the risks inherent in poor schema design can be incalculable. Worse yet, the issues will only manifest once the application has been deployed, and the growing user base causes highly concurrent...
Conference Paper
Full-text available
In model-driven engineering, model transformations are considered a key element to generate and maintain consistency between related models. Rule-based approaches have become a mature technology and are widely used in different application domains. However, in various scenarios, these solutions still suffer from a number of limitations that stem fr...
Conference Paper
Full-text available
Model transformations are widely used by Model-Driven Engineering (MDE) platforms to apply different kinds of operations over models, such as model translation, evolution or composition. However, existing solutions are not designed to handle very large models (VLMs), thus facing scalability issues. Coupling MDE with cloud-based platforms may help...
Article
Full-text available
Teaching web development in Computer Science undergraduate courses is a difficult task. Often, there is a gap between the students' experiences and the reality in the industry. As a consequence, the students are not always well-prepared once they get the degree. This gap is due to several reasons, such as the complexity of the assignments, the work...
Conference Paper
The main goal of this workshop is to bring together two different communities: the Model-Driven Engineering (MDE) community and the logic programming community, to explore how each community can benefit from the techniques of the other. Are both communities friends or foes?
Conference Paper
Full-text available
Finding the right configuration is often a challenging task since one needs to deal with many dependencies between plug-ins and most existing configuration engines are not flexible enough to work in different scenarios. In this paper we propose an MDE-based approach to solve configuration problems, considering them as constraint satisfaction pro...
Conference Paper
Full-text available
Most of us have experienced configuration issues when installing new software applications. Finding the right configuration is often a challenging task since we need to deal with many dependencies between plug-ins, components, libraries, packages, etc; sometimes even regarding specific versions of the involved artefacts. Right now, most configurati...