
Pieter Colpaert
Ghent University | UGhent · Internet and Data Lab
74 Publications · 11,189 Reads · 1,317 Citations
Publications (74)
To unlock the full potential of Linked Data sources, we need flexible ways to query them. Public SPARQL endpoints aim to fulfill that need, but their availability is notoriously problematic. We therefore introduce Linked Data Fragments, a publishing method that allows efficient offloading of query execution from servers to clients through a lightwe...
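As an illustration of the lightweight interface this line of work proposes, the sketch below requests a single Triple Pattern Fragment over plain HTTP. It assumes a TPF-style endpoint that accepts subject, predicate and object query parameters; the endpoint and resource URIs are examples only, not prescribed by the paper.

```python
# Minimal client-side request for one Triple Pattern Fragment (TPF).
# The endpoint URL is an example; any TPF-compliant server should behave similarly.
import requests

FRAGMENTS_ENDPOINT = "https://fragments.dbpedia.org/2016-04/en"  # example endpoint

def fetch_fragment(subject="", predicate="", object_=""):
    """Request one page of triples matching the given triple pattern."""
    response = requests.get(
        FRAGMENTS_ENDPOINT,
        params={"subject": subject, "predicate": predicate, "object": object_},
        headers={"Accept": "text/turtle"},
        timeout=30,
    )
    response.raise_for_status()
    return response.text  # Turtle payload: matching triples plus hypermedia controls

if __name__ == "__main__":
    # All rdf:type triples for one example subject.
    turtle = fetch_fragment(
        subject="http://dbpedia.org/resource/Ghent",
        predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    )
    print(turtle[:500])
```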
Publishing transport data on the Web for consumption by others poses several challenges for data publishers. In addition to planned schedules, access to live schedule updates (e.g. delays or cancellations) and historical data is fundamental to enable reliable applications and to support machine learning use cases. However, publishing such dynamic da...
ANPR cameras allow the automatic detection of vehicle license plates and are increasingly used for law enforcement. However, the statistical data generated by ANPR cameras is also a potential source of urban insights. In order for this data to reach its full potential for policy-making, we research how this data can be shared in digital twins, with re...
The Solid vision aims to make data independent of applications through technical specifications, which detail how to publish and consume permissioned data across multiple autonomous locations called "pods". The current document-centric interpretation of Solid, wherein a pod is a single hierarchy of Linked Data documents, cannot fully realize this...
Cultural heritage institutions maintain digital artefacts of their collections using Collection Management Software (CMS). In order to attract new audiences, these data should be interoperable with and reusable within other Web APIs. In this article, we explain how we applied Flemish Linked Data Standards (OSLO) to make the data within the Axiell C...
The European Union Agency for Railways is a European authority tasked with the provision of a legal and technical framework to support harmonized and safe cross-border railway operations throughout the EU. So far, the agency has relied on traditional application-centric approaches to support the data exchange among multiple actors interacting within...
Smart cities need (sensor) data for better decision-making. However, while there are vast amounts of data available about and from cities, an intermediary is needed that connects and interprets (sensor) data on a Web-scale. Today, governments in Europe are struggling to publish open data in a sustainable, predictable and cost-effective way. Our res...
Public transit operators often publish their open data in a data dump, but developers with limited computational resources may not have the means to process all this data efficiently. In our prior work we have shown that geospatially partitioning an operator’s network can improve query times for client-side route planning applications by a factor o...
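The partitioning method itself is not detailed in this snippet of the abstract; as a purely illustrative sketch, stops could be grouped into fixed geographic grid cells so that each cell becomes one fragment. The cell size and the (stop_id, lat, lon) tuples below are assumptions for demonstration, not the approach used in the paper.

```python
# Illustrative geospatial partitioning of transit stops into fixed grid cells.
from collections import defaultdict

CELL_SIZE = 0.1  # degrees of latitude/longitude per cell; an assumed value

def cell_key(lat, lon, cell_size=CELL_SIZE):
    """Map a coordinate to the grid cell that contains it."""
    return (int(lat // cell_size), int(lon // cell_size))

def partition_stops(stops):
    """Group stops per grid cell so each cell can become one data fragment."""
    partitions = defaultdict(list)
    for stop_id, lat, lon in stops:
        partitions[cell_key(lat, lon)].append(stop_id)
    return partitions

if __name__ == "__main__":
    stops = [("gent-sint-pieters", 51.0361, 3.7109),
             ("brussel-zuid", 50.8354, 4.3365)]
    for cell, stop_ids in partition_stops(stops).items():
        print(cell, stop_ids)
```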
Text fields that need to look up specific entities in a dataset can be equipped with autocompletion functionality. When a dataset becomes too large to be embedded in the page, setting up a full-text search API is not the only alternative. Alternate API designs that balance different trade-offs such as archivability, cacheability and privacy may no...
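As one hedged illustration of such an alternate design, a client can download a cacheable fragment of labels once and perform prefix filtering locally, instead of calling a full-text search API on every keystroke. The fragment URL and the JSON layout (a flat list of labels) are hypothetical.

```python
# Client-side prefix autocompletion over a cacheable fragment of labels.
import bisect
import requests

LABELS_URL = "https://example.org/dataset/labels.json"  # placeholder URL

def load_labels():
    """Fetch the label fragment once; the response can be cached or archived as-is."""
    labels = requests.get(LABELS_URL, timeout=30).json()
    return sorted(label.lower() for label in labels)

def autocomplete(sorted_labels, prefix, limit=10):
    """Binary-search the sorted labels for entries starting with the prefix."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_labels, prefix)
    results = []
    for label in sorted_labels[start:]:
        if not label.startswith(prefix):
            break
        results.append(label)
        if len(results) == limit:
            break
    return results
```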
Fostering interoperability, Public Sector Bodies (PSBs) maintain datasets that should become queryable as an integrated Knowledge Graph (KG). While some PSBs allow querying part of the KG on their servers, others favor publishing data dumps, allowing the querying to happen on third-party servers. As the budget of a PSB to publish their dataset on...
Cycling as a means of urban transportation is positively correlated with cleaner, healthier and happier cities. By providing more infrastructure, such as secure parking facilities, cities aim to attract more cyclists. However, authoritative information about parking facilities is heavily decentralized and heterogeneous, which makes secure parking...
Route planning is key in application domains such as delivery services, tourism advice and ride sharing. Today’s route planning as a service solutions do not cover all requirements of each use case, forcing application developers to build their own self-hosted route planners. This quickly becomes expensive to develop and maintain, especially when i...
While there is an abundance of services and applications that find the most efficient route between two places, people are not always interested in efficiency; sometimes we just want a pleasant route. Such routes are subjective though, and may depend on contextual factors that route planners are oblivious to. One possible solution is to automatically lea...
At the end of 2019, Chinese authorities alerted the World Health Organization (WHO) of the outbreak of a new strain of the coronavirus, called SARS-CoV-2, which struck humanity with an unprecedented disaster a few months later. In response to this pandemic, a publicly available dataset was released on Kaggle, which contained information on over 63,000...
Open data and its effects on society are always woven into infrastructural legacies, social relations, and the political economy. This raises questions about how our understanding and engagement with open data shifts when we focus on its situated use.
To shed light on these questions, Situating Open Data provides several empirical accounts of ope...
One of the promises of the smart city concept is using real-time data to enhance policy making. In practice, such promises can turn out to be either very limited in what is actually possible or quickly trigger dystopian scenarios of tracking and monitoring. Today, many cities around the world already measure forms of urban bustle,...
Public transit operators often publish their open data as a single data dump, but developers with limited computational resources may not be able to process all this data. Existing work has already focused on fragmenting the data by departure time, so that data consumers can be more selective in the data they process. However, each fragment still c...
Web-based information services transformed how we interact with public transport. Discovering alternatives to reach destinations and obtaining live updates about them is necessary to optimize journeys and improve the quality of travellers’ experience. However, keeping travellers updated with opportune information is demanding. Traditional Web APIs...
There are two mechanisms for publishing live changing resources on the Web: a client can pull the latest state of a resource, or the server pushes updates to the client. In the state of the art, it is clear that pushing delivers a lower latency compared to pulling; however, this has not been tested for an Open Data usage scenario where 15k clients...
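To make the two mechanisms concrete, the sketch below contrasts pulling by periodic polling with receiving pushes over Server-Sent Events. The URLs, payload format and polling interval are placeholders, not the setup evaluated in the paper.

```python
# Pull versus push for a live changing resource.
# Both URLs are placeholders; the push variant is assumed to be exposed as
# Server-Sent Events (SSE) carrying one update per "data:" line.
import time
import requests

RESOURCE_URL = "https://example.org/live/parking-availability"       # placeholder
EVENTS_URL = "https://example.org/live/parking-availability/events"  # placeholder

def pull(interval_seconds=10):
    """Client pulls: re-fetch the resource at a fixed interval."""
    while True:
        response = requests.get(RESOURCE_URL, timeout=30)
        print("pulled:", response.text[:80])
        time.sleep(interval_seconds)

def push():
    """Server pushes: read updates from an SSE stream as they arrive."""
    with requests.get(EVENTS_URL, stream=True, timeout=None) as response:
        for line in response.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                print("pushed:", line[len("data:"):].strip()[:80])
```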
Route planning providers manually integrate different geo-spatial datasets before offering a Web service to developers, thus creating a closed world view. In contrast, combining open datasets at runtime can provide more information for user-specific route planning needs. For example, an extra dataset of bike sharing availabilities may provide more...
Maintaining an Open Dataset comes at an extra recurring cost when it is published in a dedicated Web interface. As there is not often a direct financial return from publishing a dataset publicly, these extra costs need to be minimized. Therefore we want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In t...
For better traffic flow and better policy decisions, the city of Antwerp is connecting traffic lights to the Internet. The live "time to green" only tells part of the story: the historical values also need to be preserved and made accessible to everyone. We propose (i) an ontology for describing the topology of an intersection a...
When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and in...
The road to publishing public streaming data on the Web is paved with trade-offs that determine its viability. The cost of unrestricted query answering on top of data streams may not be affordable for all data publishers. Therefore, public streams need to be funded in a sustainable fashion to remain online. In this paper we present an overview of...
For smart decision making, user agents need live and historic access to open data from sensors installed in the public domain. In contrast to a closed environment, for Open Data and federated query processing algorithms, the data publisher cannot anticipate specific questions in advance, nor can it deal with a bad cost-efficiency of the server i...
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed, even though it has a strong...
Time series – such as public transport time schedules and their actual departure times – may deliver insights about the public transport network to third parties. Today, however, public transport data is published in a way that makes analytical processing too expensive. In previous work, the Linked Connections (LC) framework was introduced as a cos...
Using Linked Data-based approaches, public transport companies are able to share their timetables and their updates in an affordable way, while allowing user agents to perform multimodal route planning algorithms. Providing timetable updates, usually published as data streams, means that data is constantly being modified, and if there is a large anal...
While some public transit data publishers only provide a data dump – which few reusers can afford to integrate within their applications – others provide a use-case-limiting origin-destination route planning API. The Linked Connections framework instead introduces a hypermedia API, over which the extendable base route planning algorithm "Conne...
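Linked Connections builds on the Connection Scan approach as its extendable base algorithm; as a minimal, hedged illustration, an earliest-arrival scan over an in-memory list of connections could look as follows. The connection tuples and stop names are assumptions, and a real client would fetch connection pages over the hypermedia API rather than hold them all in memory.

```python
# Minimal earliest-arrival scan over a list of connections sorted by departure time.
import math

def earliest_arrival(connections, source, target, departure_time):
    """connections: iterable of (dep_stop, arr_stop, dep_time, arr_time),
    sorted by dep_time. Returns the earliest reachable arrival time at target."""
    arrival = {source: departure_time}
    for dep_stop, arr_stop, dep_time, arr_time in connections:
        # A connection is reachable if we can be at its departure stop in time,
        # and useful if it improves the arrival time at its arrival stop.
        if dep_time >= arrival.get(dep_stop, math.inf) and \
           arr_time < arrival.get(arr_stop, math.inf):
            arrival[arr_stop] = arr_time
    return arrival.get(target, math.inf)

if __name__ == "__main__":
    connections = [
        ("A", "B", 480, 500),  # times in minutes after midnight
        ("B", "C", 505, 530),
        ("A", "C", 490, 545),
    ]
    print(earliest_arrival(connections, "A", "C", 480))  # prints 530
```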
The European Data Portal shows a growing number of governmental organisations opening up transport data. As end users need traffic or transit updates on their day-to-day travels, route planners need access to this government data to make intelligent decisions. Developers, however, will not integrate a dataset when the cost of adoption is too high....
On dense railway networks, such as in Belgium, train travelers are frequently confronted with overly occupied trains, especially during peak hours. Crowdedness on trains leads to a deterioration in the quality of service and has a negative impact on the well-being of the passenger. In order to stimulate travelers to consider less crowded trains, th...
Calculating a public transit route involves taking into account user preferences: e.g., one might prefer trams over buses, one might prefer a slight detour to pass by their favorite coffee bar, or one might only be interested in wheelchair-accessible journeys. Traditional route planning interfaces do not expose enough features for these kinds of ques...
Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset; such parts can, for example, be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospa...
Base registries are trusted, authentic information sources controlled by a public administration or by an organization appointed by the government. Maintaining a base registry comes with extra maintenance costs to create the dataset and keep it up to date. In this paper, we study the possibility of entangling the maintenance of base registries at...
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed—even though it has a strong...
The Public Sector Information directive has made Open Data the default within European Public Sector Bodies. End-user multimodal planners need access to government data to make intelligent route planning decisions. We studied both the needs of the market and the vision of the department of Mobility and Public Works in Flanders by interviewing 6 mar...
Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. To allow a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give an overview of a client-side RD...
Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints already accept highly expressive queries, so extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over dynamic L...
Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost. Extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous...
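One low-cost pattern that fits this approach is for the client to re-fetch a time-annotated fragment only when its caching lifetime expires, keeping server requests simple and cacheable. The sketch below polls a single resource and honours its Cache-Control max-age header; the URL and the fallback interval are assumptions for illustration.

```python
# Re-fetch a dynamic fragment only when its cache lifetime expires, so the
# server keeps answering simple, cacheable requests.
import re
import time
import requests

FRAGMENT_URL = "https://example.org/fragments/latest"  # placeholder URL
FALLBACK_SECONDS = 10  # assumed polling interval when no max-age is given

def max_age(response, fallback=FALLBACK_SECONDS):
    """Extract max-age from the Cache-Control header, if present."""
    cache_control = response.headers.get("Cache-Control", "")
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else fallback

def poll_forever():
    while True:
        response = requests.get(FRAGMENT_URL, timeout=30)
        print("fetched", len(response.content), "bytes")
        time.sleep(max_age(response))
```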
In the field of smart cities, researchers need an indication of how people move in and between cities. Yet, getting statistics of travel flows within public transit systems has proven to be troublesome. In order to get an indication of public transit travel flows in Belgium, we analyzed the query logs of the iRail API, a highly expressive route pla...
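As a purely illustrative sketch of such log analysis (the log format shown is a hypothetical stand-in for the actual iRail API logs), origin-destination pairs can be counted as follows.

```python
# Count origin-destination pairs in route planning query logs.
# The log format (one "from,to" pair per line) is a hypothetical stand-in.
from collections import Counter

def od_counts(log_lines):
    counts = Counter()
    for line in log_lines:
        origin, _, destination = line.strip().partition(",")
        if origin and destination:
            counts[(origin, destination)] += 1
    return counts

if __name__ == "__main__":
    sample = ["Gent-Sint-Pieters,Brussel-Zuid",
              "Gent-Sint-Pieters,Brussel-Zuid",
              "Antwerpen-Centraal,Gent-Sint-Pieters"]
    for (origin, destination), n in od_counts(sample).most_common():
        print(origin, "->", destination, n)
```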
Billions of Linked Data triples exist in thousands of RDF knowledge graphs on the Web, but few of those graphs can be queried live from Web applications. Only a limited number of knowledge graphs are available in a queryable interface, and existing interfaces can be expensive to host at high availability. To mitigate this shortage of live queryable...
Ever since public transit agencies found their way to the Web, they have informed travelers using route planning software made available on their websites. These travelers also need to be informed about other modes of transport, for which they have to consult other websites, or for which they have to ask the transit agency's server maintainer to imple...
Data Catalog Vocabulary (DCAT) is a W3C specification to describe datasets published on the Web. However, these catalogs are not easily discoverable based on a user's needs. In this paper, we introduce the Node.js module "dcat-merger", which allows a user agent to download and semantically merge different DCAT feeds from the Web into one DCAT feed,...
For publishers of Linked Open Data, providing queryable access to their dataset is costly. Those that offer a public SPARQL endpoint often have to sacrifice high availability; others merely provide non-queryable means of access such as data dumps. We have developed a client-side query execution approach for which servers only need to provide a ligh...
A proposed technique quantifies the semantic interoperability of open government datasets with three metrics calculated using a set of statements that indicate for each pair of identifiers in the system whether or not they represent the same concept.
As the Web of Data is growing at an ever increasing speed, the lack of reliable query solutions for live public data becomes apparent. SPARQL implementations have matured and deliver impressive performance for public SPARQL endpoints, but poor availability—especially under high loads—prevents their use in real-world applications. We propose to tack...
To inform citizens when they can use government services, governments publish the services' opening hours on their website. If opening hours were published in a machine-interpretable manner, software agents would be able to answer queries about when it is possible to contact a certain service. We introduce an ontology for describing opening h...
Intermodal route planners need to be provided with a lot of data from various sources: geographical data, speed limits, road blocks, time schedules, real-time vehicle locations, etc. These datasets need to be interoperable worldwide. Today, a lot of data integration needs to be done before this data can be reused. Route planning becomes a data...
If we want broad adoption of Linked Data, the barrier to conforming to the Linked Data principles needs to be as low as possible. One of the Linked Data principles is that URIs should be dereferenceable. This demonstrator shows how to set up The DataTank and configure a Linked Data repository, such as a Turtle file or a SPARQL endpoint, in it. Differen...
Despite the significant number of existing tools, incorporating data into the Linked Open Data cloud remains complicated, discouraging data owners from publishing their data as Linked Data. Unlocking the semantics of published data, even if they are not provided by the data owners, can help overcome the barriers posed by the low availabili...
Many organisations publish their data through a Web API. This stimulates use by Web applications, enabling reuse and enrichment. Recently, resource-oriented APIs have been increasing in popularity because of their scalability. However, for organisations subject to data archiving, creating such an API raises certain issues. Often, datasets are stored in d...
This article investigates bottom-up socio-technical innovations with and by citizen developers in an Urban Living Lab, which is considered a platform for grassroots service creation in a city. Specifically, the Living Lab framework is discussed as an instrumental platform within a Smart City, facilitating the governance of bottom-up innovation ‘by’...
Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalisation exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion. This paper introduces the RML mapping language, a g...
When building reliable data-driven applications for local governments to interact with public servants or citizens, data publishers and consumers have to be sure that the applied data structure and schema definition are accurate and lead to reusable data. To understand the characteristics of reusable local government data, we motivate how the proce...
Two movements are currently influencing the owners of public datasets to open up what is inside their organization: the Web API movement and the Open Data movement. The first advocates open Web services that provide information for a specific use case. The second advocates publishing raw data to the Web so that it can get used, reused and re...
Not all data published on Open Government Data Portals reaches the same level. In this talk, a distinction is made between non-machine-readable data (1st level), data about which only the serialization format is known (2nd level), and data about which both the serialization format and the model are known (3rd level). At each of these three levels,...
Although reaching the fifth star of the Open Data deployment scheme demands that the data be represented in RDF and linked, a generic and standard mapping procedure to deploy raw data as RDF has not been established so far. Only the R2RML mapping language was standardized, but its applicability is limited to mappings from relational databases to RDF. We pr...
Read/Write infrastructures are often predicted to be the next big challenge for Linked Data. In the domains of Open Data and cultural heritage, this is already an urgent need. They require the exchange of partial graphs, personalised views on data, and trust. A strong versioning model supported by provenance is therefore crucial. However,...