Aline Senart’s research while affiliated with SAP Research and other places


Publications (11)


Figures: Fig. 1. Processing pipeline for objective dataset quality assessment; Fig. 2. Average Error % per quality indicator for LOD group; Table 1: Objective Linked Data Quality Framework
Towards an Objective Assessment Framework for Linked Data Quality: Enriching Dataset Profiles With Quality Indicators
  • Chapter
  • Full-text available

January 2018 · 104 Reads · 4 Citations · Aline Senart

Ensuring data quality in Linked Open Data is a complex process, as such data consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps data owners, on the one hand, to rate the quality of their datasets and, on the other hand, data consumers to choose their data sources from a ranked set. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.
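
The scoring-and-ranking idea lends itself to a compact illustration. Below is a minimal Python sketch, assuming a handful of invented indicators (completeness, licensing, provenance) computed from a plain metadata dictionary; the paper's actual framework defines many more indicators and a richer profile model.

# Minimal sketch of indicator-based quality scoring. Indicator names,
# field sets and the unweighted average are illustrative assumptions,
# not the paper's actual framework.
from dataclasses import dataclass, field

@dataclass
class DatasetProfile:
    name: str
    metadata: dict = field(default_factory=dict)

# Hypothetical objective indicators: each returns a score in [0, 1]
# computed from metadata alone, independent of any use case.
def completeness(profile: DatasetProfile) -> float:
    expected = {"title", "description", "license", "author", "links"}
    present = {k for k in expected if profile.metadata.get(k)}
    return len(present) / len(expected)

def licensing(profile: DatasetProfile) -> float:
    return 1.0 if profile.metadata.get("license") else 0.0

def provenance(profile: DatasetProfile) -> float:
    keys = {"author", "maintainer", "source"}
    return sum(bool(profile.metadata.get(k)) for k in keys) / len(keys)

INDICATORS = {"completeness": completeness,
              "licensing": licensing,
              "provenance": provenance}

def quality_score(profile: DatasetProfile) -> dict:
    """Score one dataset on every indicator plus an unweighted average."""
    scores = {name: fn(profile) for name, fn in INDICATORS.items()}
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores

def rank(profiles: list) -> list:
    """Rank datasets so consumers can pick sources from a ranked set."""
    return sorted(((p.name, quality_score(p)["overall"]) for p in profiles),
                  key=lambda t: t[1], reverse=True)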


Figures: Fig. 1. Processing pipeline for objective dataset quality assessment; Table 1: Objective Linked Data Quality Framework; Fig. 2. Average Error % per quality indicator for LOD group
Towards An Objective Assessment Framework for Linked Data Quality: Enriching Dataset Profiles with Quality Indicators

July 2016 · 411 Reads · 18 Citations

Ensuring data quality in Linked Open Data is a complex process, as such data consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps data owners, on the one hand, to rate the quality of their datasets and, on the other hand, data consumers to choose their data sources from a ranked set. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.


Table 2: Harmonized Dataset Models Mappings
HDL - Towards a Harmonized Dataset Model for Open Data Portals

June 2015 · 335 Reads · 22 Citations

The Open Data movement triggered an unprecedented amount of data published in a wide range of domains. Governments and corporations around the world are encouraged to publish, share, use and integrate Open Data. There are many areas where one can see the added value of Open Data, from transparency and self-empowerment to improving efficiency, effectiveness and decision making. This growing amount of data requires rich metadata in order to reach its full potential. This metadata enables dataset discovery, understanding, integration and maintenance. Data portals, which are considered to be datasets' access points, offer metadata represented in different and heterogeneous models. In this paper, we first conduct a unique and comprehensive survey of seven metadata models: CKAN, DKAN, Public Open Data, Socrata, VoID, DCAT and Schema.org. Next, we propose a Harmonized Dataset modeL (HDL) based on this survey. We describe use cases that show the benefits of providing rich metadata to enable dataset discovery, search and spam detection.
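
To make the harmonization idea concrete, here is a minimal Python sketch. The field mappings below are invented for illustration and are not the paper's actual Table 2; HDL covers seven models and many more fields.

# Illustrative sketch of harmonizing metadata from different portal
# models into one dataset model, in the spirit of HDL.
CKAN_TO_HDL = {"title": "title", "notes": "description",
               "license_id": "license", "author": "publisher"}
DCAT_TO_HDL = {"dct:title": "title", "dct:description": "description",
               "dct:license": "license", "dct:publisher": "publisher"}

MODEL_MAPPINGS = {"ckan": CKAN_TO_HDL, "dcat": DCAT_TO_HDL}

def to_hdl(record: dict, source_model: str) -> dict:
    """Translate one portal record into the harmonized representation."""
    mapping = MODEL_MAPPINGS[source_model]
    return {hdl_key: record[src_key]
            for src_key, hdl_key in mapping.items() if src_key in record}

# Example: the same logical dataset described by two portal models maps
# to the same harmonized fields, enabling cross-portal search.
ckan_record = {"title": "Air Quality", "notes": "Hourly readings",
               "license_id": "cc-by"}
dcat_record = {"dct:title": "Air Quality",
               "dct:description": "Hourly readings"}
assert to_hdl(ckan_record, "ckan")["title"] == to_hdl(dcat_record, "dcat")["title"]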


Figures: Figure 1: Processing pipeline for validating and generating dataset profiles
Roomba: An Extensible Framework to Validate and Build Dataset Profiles

June 2015 · 117 Reads · 16 Citations

Lecture Notes in Computer Science

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. In order to benefit from this mine of data, one needs access to descriptive information about each dataset (or metadata). This information can be used to delay data entropy, enhance dataset discovery, exploration and reuse, as well as to help data portal administrators detect and eliminate spam. However, such metadata information is currently limited to a few data portals, where it is usually provided manually, thus being often incomplete and inconsistent in terms of quality. To address these issues, we propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques in order to check the validity of the metadata provided and to generate descriptive and statistical information for a particular dataset or for an entire data portal.
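
The extract/validate/correct/profile loop the abstract describes can be sketched compactly. The required fields and the default value below are assumptions for illustration; Roomba's actual checks cover many more metadata sections.

# Compressed sketch of a validate -> correct -> profile pipeline.
REQUIRED_FIELDS = ["title", "description", "license", "resources"]

def validate(metadata: dict) -> list:
    """Return the list of required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]

def correct(metadata: dict) -> dict:
    """Apply simple automatic corrections, e.g. filling a default license."""
    fixed = dict(metadata)
    if not fixed.get("license"):
        fixed["license"] = "unspecified"  # placeholder, flagged for review
    return fixed

def profile(metadata: dict) -> dict:
    """Generate a small descriptive/statistical profile for one dataset."""
    errors = validate(metadata)
    return {"name": metadata.get("title", "<unnamed>"),
            "missing_fields": errors,
            "error_rate": len(errors) / len(REQUIRED_FIELDS),
            "resource_count": len(metadata.get("resources", []))}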


Figures: Figure 1: Error % by section; Figure 2: Error % by information type
What’s up LOD Cloud? Observing the State of Linked Open Data Cloud Metadata

June 2015 · 855 Reads · 11 Citations

Lecture Notes in Computer Science

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. In order to benefit from this mine of data, one needs to access descriptive information about each dataset (or metadata). However, the heterogeneous nature of data sources reflects directly on the data quality, as these sources often contain inconsistent as well as misinterpreted and incomplete metadata information. Considering the significant variation in size, the languages used and the freshness of the data, one realizes that finding useful datasets without prior knowledge is increasingly complicated. We have developed Roomba, a tool that enables users to validate, correct and generate dataset metadata. In this paper, we present the results of running this tool on parts of the LOD cloud accessible via the datahub.io API. The results demonstrate that the general state of the datasets needs more attention, as most of them suffer from poor-quality metadata and lack some informative metrics that are needed to facilitate dataset search. We also show that the automatic corrections done by Roomba increase the overall quality of the dataset metadata, and we highlight the need for manual efforts to correct some important missing information.
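
For readers who want to reproduce a small slice of this experiment, the sketch below pulls one dataset's metadata through a CKAN-style action API, which datahub.io exposed at the time of the paper; the exact URL and the section/field groupings are assumptions, and the error metric is a toy version of the paper's "Error % by section". It needs the requests package.

# Fetch one dataset's metadata from a CKAN action API and compute a
# toy per-section error percentage. Endpoint is an assumption based on
# the CKAN instance the paper used; the portal has since changed.
import requests

CKAN_API = "https://old.datahub.io/api/3/action/package_show"

def fetch_metadata(dataset_id: str) -> dict:
    resp = requests.get(CKAN_API, params={"id": dataset_id}, timeout=30)
    resp.raise_for_status()
    body = resp.json()
    if not body.get("success"):
        raise RuntimeError("CKAN API error for " + dataset_id)
    return body["result"]

# Hypothetical grouping of CKAN fields into metadata sections.
SECTIONS = {"general": ["title", "notes", "license_id"],
            "access": ["url", "resources"],
            "ownership": ["author", "maintainer"]}

def error_percent(meta: dict) -> dict:
    """Percentage of missing/empty fields per metadata section."""
    return {section: 100 * sum(not meta.get(f) for f in fields) / len(fields)
            for section, fields in SECTIONS.items()}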


Figures: Fig. 1. Processing pipeline for validating and generating dataset profiles
Roomba: An extensible framework to validate and build dataset profiles

January 2015 · 180 Reads · 7 Citations

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. In order to benefit from this mine of data, one needs access to descriptive information about each dataset (or metadata). This information can be used to delay data entropy, enhance dataset discovery, exploration and reuse, as well as to help data portal administrators detect and eliminate spam. However, such metadata information is currently limited to a few data portals, where it is usually provided manually, thus being often incomplete and inconsistent in terms of quality. To address these issues, we propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques in order to check the validity of the metadata provided and to generate descriptive and statistical information for a particular dataset or for an entire data portal.


SNARC - An Approach for Aggregating and Recommending Contextualized Social Content

May 2013 · 93 Reads · 1 Citation

Lecture Notes in Computer Science

The Internet has created a paradigm shift in how we consume and disseminate information. Data nowadays is spread over heterogeneous silos of archived and live data. People willingly share data on social media by posting news, views, presentations, pictures and videos. SNARC is a service that uses semantic web technology and combines services available on the web to aggregate social news. SNARC brings the user live and archived information that is directly related to their active page. The key advantage is instantaneous access to complementary information without the need to dig for it. Information appears when it is relevant, enabling the user to focus on what is really important.


Data Quality Principles in the Semantic Web

May 2013 · 136 Reads · 9 Citations

The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data are fit for their intended use in operations, decision-making and planning. However, with the rise of the Semantic Web, new data quality issues appear and require deeper consideration. In this paper, we propose to extend the data quality principles to the context of the Semantic Web. Based on our extensive industrial experience in data integration, we identify five main classes suited for data quality in the Semantic Web. For each class, we list the principles that are involved at all stages of the data management process. Following these principles will provide a sound basis for better decision-making within organizations and will maximize long-term data integration and interoperability.


remix: A Semantic Mashup Application

September 2012 · 9 Reads · 1 Citation

Lecture Notes in Computer Science

With today’s public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources, however, exhibit heterogeneous data formats and terminologies and may require helping the user merge data coming from heterogeneous sources. remix is a Business Intelligence (BI) solution that offers business users a productive environment to easily create the highly formatted reports they would ultimately like to see. Via rich visual context-aware interactions, users can quickly combine shared data sets and reuse report parts. This enhanced collaboration, combined with the provision of a multi-source semantic layer, gives users the power to make more effective and informed decisions on virtually any relevant data source or BI resource wherever they are.


RUBIX, A Framework for Improving Data Integration with Linked Data

May 2012 · 62 Reads · 3 Citations · Eldad Louw · Aline Senart · [...] · David Trastour

With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources, however, exhibit heterogeneous data formats and terminologies and may contain noisy data. In this paper, we present RUBIX, a novel framework that enables business users to semi-automatically perform data integration on potentially noisy tabular data. This framework offers an extension to Google Refine with novel schema matching algorithms leveraging Freebase rich types. First experiments show that using Linked Data to map cell values with instances and column headers with types significantly improves the quality of the matching results and therefore should lead to more informed decisions.
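
The core matching idea, mapping cell values to typed entities and labeling a column by the majority type of its values, can be sketched in a few lines. The tiny lookup table below stands in for Freebase (since discontinued), and the function names are hypothetical, not RUBIX's actual API; the real algorithms over rich types are more elaborate.

# Toy type-based schema matching: infer a type per column from its cell
# values, then pair columns across tables whose inferred types agree.
from collections import Counter

# Hypothetical entity -> type knowledge base standing in for Freebase.
ENTITY_TYPES = {"Paris": "location.city", "Berlin": "location.city",
                "SAP": "business.company", "IBM": "business.company"}

def infer_column_type(cells: list):
    """Vote over the types of recognized cell values."""
    types = [ENTITY_TYPES[c] for c in cells if c in ENTITY_TYPES]
    return Counter(types).most_common(1)[0][0] if types else None

def match_columns(table_a: dict, table_b: dict) -> list:
    """Pair up columns from two tables whose inferred types agree."""
    types_b = {h: infer_column_type(v) for h, v in table_b.items()}
    return [(ha, hb) for ha, va in table_a.items()
            for hb, tb in types_b.items()
            if tb is not None and infer_column_type(va) == tb]

# Example: columns named differently but holding the same kind of values.
a = {"City": ["Paris", "Berlin"], "Firm": ["SAP"]}
b = {"Location": ["Berlin", "Paris"], "Company": ["IBM"]}
print(match_columns(a, b))  # [('City', 'Location'), ('Firm', 'Company')]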


Citations (10)


... The following list contains particular criteria and their interconnection to items of the 5-star Linked Open Data ranking scheme - web accessibility (*), open licence (*), machine-readable data format (**), non-proprietary data format (***), RDF (****) and links to external data (*****). The following five properties (Table 2) of data resources are not declared in one formal document, but they are mentioned in various articles or papers such as [18][19][20][21][22]. These properties can help Linked Data processing or indicate data quality. ...

Reference:

LINKED GEO-DATA RESOURCES
Towards an Objective Assessment Framework for Linked Data Quality: Enriching Dataset Profiles With Quality Indicators

... Usability quality refers to the usability of a GeoKG. Drawing from evaluation indicators used in KGs, linked data, geographic data, and the Big Data, usability quality encompasses four aspects: availability, security, interlinking, and licensing (Assaf, Senart, and Troncy 2016; Dror, Dalyot, and Doytsher 2015; Olama et al. 2014; Sherif et al. 2023). Availability refers to the degree of accessibility during the usage process. ...

Towards An Objective Assessment Framework for Linked Data Quality: Enriching Dataset Profiles with Quality Indicators

... By using the zip code entities as the transit node, we can ask questions across graphs like which zip code areas had more than 10 electric vehicle charging stations in 2021 but were affected by multiple wildfires in the past. Third, since KWG also contains various co-reference resolution links to other knowledge graphs, this practice essentially makes our EVKG an important component in the Linked Open Data Cloud (Assaf et al., 2015). ...

What’s up LOD Cloud?: Observing the State of Linked Open Data Cloud Metadata

Lecture Notes in Computer Science

... It can improve the usability of datasets in applications and services. (Assaf et al., 2015; Inkinen et al., 2019; Sinif & Bounabat, 2018) Multi-language Support: The availability of multi-language support increases the inclusiveness of diverse stakeholders in the open data ecosystem. The European open data portal also supports multi-language support in metadata provision. ...

HDL -Towards a Harmonized Dataset Model for Open Data Portals

... The results of the research can be used by the DBpedia community (publisher) to eliminate the errors in its further editions. Data quality assessment tools such as ABSTAT [36], Loupe [33], DistQualityAssessment [42], Roomba [2] focus on understanding statistical information which include number of triples, and implicit vocabulary information. The information derived from these tools help the user get insight into the dataset that includes detecting outliers in the vocabulary usage, most frequent patterns in linked data, and thus interpreting data quality. ...

Roomba: An Extensible Framework to Validate and Build Dataset Profiles

Lecture Notes in Computer Science

... The objective of the project is to promote the development of the data web by providing data sets that are available under open licenses, converting them to RDF by the principles of linked data, and publishing them on the web (Bizer et al. 2009). Therefore, in Assaf et al. (2015) the authors propose a scalable automatic approach for extracting, validating, correcting, and generating descriptive linked dataset profiles. The system includes indexing the dataset, recommending domain vocabularies for the metadata, identifying the knowledge domain to which the dataset belongs, and generating the profile of the dataset composed of descriptive and structural metadata. ...

Roomba: An extensible framework to validate and build dataset profiles

... In the latter case, the studies mainly include proposals for and frameworks of visualization techniques [11]. Additionally, research presents applications for combining and presenting data, e.g., aggregating social media news [12]. ...

SNARC - An Approach for Aggregating and Recommending Contextualized Social Content

Lecture Notes in Computer Science

... Utilizing a data interoperability framework in the development of an FMIS application is not trivial; it is complicated and deals with many problems [15]. The four main potential problems related to the use of open data through a data interoperability framework are (1) heterogeneities of the schema, (2) granularity of data, (3) mismatch of entity naming and data units, and (4) inconsistency of data [16]. ...

RUBIX, A Framework for Improving Data Integration with Linked Data

... Furthermore, W3C provides a framework for data quality description, and multiple approaches for assessing semantic data quality are present in the literature. Among them, worth mentioning are [1], who extends data quality to every stage of the creation process, and [5], who provides a definition for 44 measures on the basis of the ISO definitions, tested over large RDF datasets. Although many dimensions can be evaluated with an automatic or semi-automatic approach, some of them require human validation or the aid of domain experts, especially for provenance and contextual information criteria related to the specific use case [8,4]. ...

Data Quality Principles in the Semantic Web

... This work belongs to the general domain of table information extraction covering a wide range of topics such as table structure understanding [26] that aims to uncover structural relations underlying table layout in complex tables; relational table identification that aims to separate tables containing relational data from noisy ones used for, e.g., page formatting, and then subsequently identifying table schema [5, 4, 1]; table schema matching and data integration that aims to merge tables describing similar data [2, 25, 3, 13]; and semantic Table Interpretation, which is the focus of this work. It also belongs to the domain of (semi-)structured Information Extraction, where an extensive amount of literature is marginally related. ...

Improving Schema Matching with Linked Data