Linked Music Data from Global Music Charts
Milos Jovanovik
Faculty of Computer Science and Engineering
Skopje, Macedonia
milos.jovanovik@finki.ukim.mk

Matej Petrov
Faculty of Computer Science and Engineering
Skopje, Macedonia
petrov_matej@yahoo.com

Bojan Najdenov
Faculty of Computer Science and Engineering
Skopje, Macedonia
bojan.najdenov@finki.ukim.mk

Dimitar Trajanov
Faculty of Computer Science and Engineering
Skopje, Macedonia
dimitar.trajanov@finki.ukim.mk
ABSTRACT
Accessing data on the Web in order to obtain useful information
has been a challenge in the past decade. The technologies of the
Semantic Web have enabled the creation of the Linked Data
Cloud, as a concrete materialization of the idea to transform the
Web from a web of documents into a web of data. The Linked
Data concept has introduced new ways of publishing, interlinking
and using data from various distributed data sources, over the
existing Web infrastructure. On the other hand, music represents a
big part of the everyday life for many people in the world, and
therefore, understandably, the Web contains loads of data from
the music domain. Given the fact that Linked Data enables new,
advanced use-case scenarios, the music domain and its users can
also benefit from this new data concept. Besides being provided
with additional information about their favorite artists and songs,
the users can also potentially get an overview of the dynamics of
the global music playlists and charts, from the aspects of artists,
countries, genres, etc. In this paper, we describe the process of
transforming one- and two-star music playlists and charts data
from various global radio stations, into five-star Linked Data, in
order to demonstrate these benefits. We also present the design of
our Playlist Ontology necessary for our data model. We then
demonstrate – via SPARQL queries and a web application – some
of the new use-case scenarios for the users over the published
linked dataset, which are otherwise not available over the isolated
datasets on the Web.
Categories and Subject Descriptors
H.3.5 [Information storage and retrieval]: On-line information
Services – Data sharing, Web-based services; H.2.4 [Database
Management]: Systems – Distributed databases.
General Terms
Algorithms, Design, Experimentation.
Keywords
Music, Linked Data, Open Data, Playlist Ontology.
1. INTRODUCTION
Technology has always been a major tool in improving the quality
of life for people. By lowering the barrier for publishing and
accessing documents, the Web has been the innovation which
changed the way we communicate, as well as the way we gather
and share knowledge. However, the original design of the Web
has been intended for human consumption only, so in order to
obtain and analyze larger amounts of data, intelligent software
tools are needed. The technologies of the Semantic Web represent
a set of standards which can be applied over the documents on the
Web, or any other data, in order to enable interlinking of the
different data sources into a web of data. This lowers the data
access barrier even further for simpler software tools to be able to
obtain, understand, process and use it [1][2][3].
The concept of Linked Data represents a concrete materialization
of the Semantic Web vision. It is a set of best practices which can
be used for publishing, interlinking and querying data from
different and distributed data sources, over the existing
infrastructure of the Web. As part of the Linked Data endeavor,
the Linking Open Data (LOD) Cloud¹ has been created. It consists
of a large number of interlinked datasets, from different domains,
which have been published on the Web. With this, the LOD
Cloud represents a rich network of data, which can be accessed
using the technologies of the Semantic Web.
The Linked Data concept and the LOD Cloud allow the creation
of use-case scenarios for both users and their applications which
have not been available before, in isolated datasets. Therefore,
they can be used in new, innovative applications in various
domains, which would leverage the value of the data, and create
new business value in the industries [4][5].
¹ http://lod-cloud.net/
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from
Permissions@acm.org.
SEM '14, September 04 - 05 2014, Leipzig, AA, Germany
Copyright is held by the owner/author(s). Publication rights licensed to
ACM.
ACM 978-1-4503-2927-9/14/09$15.00.
http://dx.doi.org/10.1145/2660517.2660536
For the purpose of measuring data quality on the Web, Tim
Berners-Lee has proposed a 5-star rating system². According to
the rating system, any information published online gets at least
one star. Data published in machine-readable, structured formats
get two stars, and data published in non-proprietary structured
formats (CSV, XML, etc.) get three stars. Four stars are given to
data which use Semantic Web standards (RDF, OWL, SPARQL,
etc.) for structure and access, and five stars are reserved for data
which additionally link to other people’s data, for providing
context.
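The cumulative nature of the rating can be illustrated with a minimal sketch. The function name and its boolean parameters are hypothetical and only restate the five levels described above; each star is assumed to require all the previous ones.

```python
def star_rating(on_web, structured, non_proprietary, semantic_web, linked):
    """Return the number of stars (0-5) for a dataset.

    The criteria are checked in order, and counting stops at the
    first criterion that is not met.
    """
    stars = 0
    for criterion in (on_web, structured, non_proprietary, semantic_web, linked):
        if not criterion:
            break
        stars += 1
    return stars


# A playlist published only as an HTML table: online, but not structured data.
html_playlist = star_rating(True, False, False, False, False)

# The same playlist as RDF, with links to entities from other datasets.
linked_playlist = star_rating(True, True, True, True, True)
```

Under these assumptions, the HTML-only playlist receives one star and the interlinked RDF version receives five.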
Figure 1. The media part of the LOD Cloud³.
In order to create new, advanced use-case scenarios for the music
audience, we need to apply the principles of Linked Data over
existing music related data on the Web. One particular domain
where these new data publishing and access principles can help is
the domain of music data from global playlists and charts; these
data represent the music taste of the general public, and provide a
snapshot of which artist, song or genre was globally popular in a
particular moment in time. Having these data as Linked Data can
provide the general users with more information about their
favorite artist, songs, genres, but also allow them to get an
overview of the dynamics of the global music playlists and charts,
from various aspects. This can be achieved through applications
which access the Linked Data available on the Web.
In this paper, we present a sustainable system and its methodology
for obtaining and transforming one-star and two-star music related
data from the websites of various global radio stations, into five-
star Linked Data, interlinked with music domain datasets from the
LOD Cloud (Figure 1). We also provide example use-case
scenarios over the created dataset, via SPARQL queries and a
web application, which demonstrate the advantage of interlinked
over isolated datasets.
The paper is organized as follows: in Section 2 we discuss related
work and present music related datasets which are part of the
LOD Cloud. In Section 3 we describe the design of the process of
gathering and staging data, and their transformation and
interlinking using the Linked Data practices. Here, we also
discuss our data model, the reuse of existing ontologies and our
Playlist Ontology, which we designed specifically for this
purpose. In Section 4 we present and demonstrate example use-
cases which arise from the interlinked datasets. In Section 5, we
present our web application built on top of the dataset, and in
Section 6 we give a conclusion to the presented work.
² http://5stardata.info/
³ Taken from the Linking Open Data cloud diagram, by Richard
Cyganiak and Anja Jentzsch. http://lod-cloud.net/
2. RELATED WORK
As we already mentioned, music related datasets are already part
of the LOD Cloud (Figure 1). They have been created by various
projects, and we will take a closer look at them.
DBTune⁴ is a project aimed at providing access to music related
data, published following the Linked Data principles. It provides
access to more than 14 billion RDF triples from various datasets,
such as MySpace, Jamendo, Last.fm, MusicBrainz, etc. The
datasets which are part of the DBTune project are represented as
blue circles in Figure 2. Both Figure 1 and Figure 2 show the
connections the DBTune datasets have with other datasets from
the LOD Cloud.
Figure 2. DBTune Datasets, depicted as blue circles.
MusicBrainz⁵ represents an open source repository of music
information, which is community-maintained. The data stored in
the MusicBrainz database is very diverse, spanning from data
about artists and their releases and albums, to publishers,
composers, etc. Although it is publicly available and free to use,
MusicBrainz does not serve its data in a Linked Data manner
directly. Regardless, since it provides unique identifiers for artists,
albums and tracks, it is already widely used as a source for music-
related URIs in the LOD Cloud.
LinkedBrainz⁶ is a project which is intended to publish the
MusicBrainz database as Linked Data. As a result of the project,
the MusicBrainz data is exposed in RDF using mappings of
concepts from its database into concepts of the Music Ontology⁷
and other appropriate ontologies. The project also provides
dereferenceable URIs for the entities and a public SPARQL
endpoint for querying the MusicBrainz data.
⁴ http://dbtune.org/
⁵ http://musicbrainz.org/
⁶ http://linkedbrainz.org/
⁷ http://musicontology.com/
Another notable project is the music recommendation system,
based on social networking and user contribution [6]. The goal of
this system is to provide means to use interlinked data from the
LOD Cloud and combine them with social and user data, in order
to provide data-rich recommendations.
3. GENERATING LINKED DATA FROM
GLOBAL RADIO STATIONS
Although we have already worked on generating Linked Data in
other domains [7][8][9][10][11], we had no previous experience
with music data. After we did our research on available music
data to support our research idea, we decided to use the public
data from the official music playlists and charts from various
global radio stations, which are published on their websites. Since
the generator of these playlists is the listener, i.e. the Web user,
we believe that providing him/her with additional use-case
scenarios for information retrieval while browsing his/her favorite
artists, songs and releases, can be a potential source for
application development.
In order to provide these scenarios, we created a system which can
obtain, transform, publish, interlink and update the playlist data. It
consists of several parts which constitute one automated
workflow; the workflow can then be scheduled, in order to update
the data on a regular basis.
The data sources we use are the playlists of radio stations from
the BBC website⁸ (Radio 1, Radio 1Xtra, Radio 2, Radio 6 Music,
Asian Network and Radio Scotland) and the official charts from BBC
Radio 1 (the Official UK Top 40 Singles Chart, Dance Singles,
Indie Singles, Rock Singles, etc.). We chose these sources based on
the amount and type of data they contain. The information about
these playlists, though represented in a non-uniform fashion across
the radio station websites, generally contains the
name of the song, its current position in the playlist and the name
of the artist performing it.
The playlists are available only as HTML tables on the radio
station websites. Therefore, we use a custom crawler to obtain and
clean the playlist data, and to store it locally in XML format. After
that, we transform the data from XML to RDF/XML format, load
it into an RDF graph and link it with datasets from the
LOD Cloud.
⁸ http://www.bbc.co.uk/radio/
The automated workflow of generating Linked Data from the
playlists goes as follows (Figure 3):
1. Data gathering and staging
   a. We use a custom web crawler to crawl and gather the
      HTML pages of the playlists and charts of interest.
   b. A parser is used for cleaning and filtering the data from
      the HTML pages, before storing them as XML files with
      a uniform structure.
   c. An XSL transformation is applied over the filtered XML
      content, in order to generate RDF/XML files with
      annotated content.
   d. The RDF/XML files are then loaded in a Virtuoso
      instance, into an RDF graph.
2. Transformation to Linked Data
   a. SPARQL-based merge procedures are run over the data
      from the RDF graph, in order to create links to existing
      entities in the LOD Cloud, i.e. generate Linked Data.
Each of these steps is depicted in Figure 3 and is described in
more detail below.
3.1 Data Gathering and Staging
The process of data gathering is done with a custom web crawler,
which stores the HTML pages locally. Since the HTML structure
of these pages varies significantly, we use a parser to extract the
necessary data (playlist name, radio station name, list of songs in
the playlist with their corresponding positions, etc.)
from each of them. With this, we get cleaned HTML files, which
we store locally as XML files.
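The parsing step can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the parser used in the system: the HTML layout (one table row per song, with position, title and artist cells), the class and function names, and the XML element names are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ChartTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            # Rows without <td> cells (e.g. header rows) are skipped.
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def playlist_to_xml(html, playlist_name, station_name):
    """Convert one playlist page into a uniform XML structure (assumed layout)."""
    parser = ChartTableParser()
    parser.feed(html)
    root = ET.Element("playlist", name=playlist_name, station=station_name)
    for position, title, artist in parser.rows:
        entry = ET.SubElement(root, "entry", position=position)
        ET.SubElement(entry, "title").text = title
        ET.SubElement(entry, "artist").text = artist
    return root


sample_html = """
<table>
  <tr><th>Pos</th><th>Title</th><th>Artist</th></tr>
  <tr><td>10</td><td>Do I Wanna Know?</td><td>Arctic Monkeys</td></tr>
  <tr><td>18</td><td>R U Mine?</td><td>Arctic Monkeys</td></tr>
</table>
"""
playlist_xml = playlist_to_xml(sample_html, "Indie Singles", "BBC Radio 1")
xml_text = ET.tostring(playlist_xml, encoding="unicode")
```

Since the HTML structure differs between stations, a real parser would need one extraction rule per source; only the output structure would stay uniform.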
The stored XML files are then put through an XSL transformation
process, which outputs RDF data in RDF/XML format. Even
though RDF/XML has fallen out of favor in the Linked Data and
Semantic Web communities because of its verbose syntax, it is
quite convenient for use with XSL transformations, since it can be
generated directly. The XSL transformation uses the schema
described in Section 3.1.1 for transforming the XML elements and
attributes into RDF triples in RDF/XML format.
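To show the kind of output this step produces, the sketch below builds an RDF/XML fragment for a single playlist entry. It does not use XSL: the Python standard library has no XSLT processor, so the fragment is constructed directly with ElementTree instead. The property and class names come from the Playlist Ontology described in Section 3.1.1, while the function name and the identifiers `entry-1` and `song-1` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PO = "http://purl.org/net/po#"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
DATA = "http://purl.org/net/lmd/data#"

# Use readable prefixes in the serialized output.
for prefix, uri in (("rdf", RDF), ("po", PO), ("dc", DC), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def entry_to_rdfxml(entry_id, song_id, title, artist, position, week, year):
    """Serialize one playlist entry and its song as RDF/XML (illustrative only)."""
    root = ET.Element(f"{{{RDF}}}RDF")

    entry = ET.SubElement(root, f"{{{PO}}}PlaylistEntry",
                          {f"{{{RDF}}}about": DATA + entry_id})
    ET.SubElement(entry, f"{{{PO}}}position").text = str(position)
    ET.SubElement(entry, f"{{{PO}}}week").text = str(week)
    ET.SubElement(entry, f"{{{PO}}}year").text = str(year)
    ET.SubElement(entry, f"{{{PO}}}playlistEntrySong",
                  {f"{{{RDF}}}resource": DATA + song_id})

    song = ET.SubElement(root, f"{{{PO}}}Song",
                         {f"{{{RDF}}}about": DATA + song_id})
    ET.SubElement(song, f"{{{DC}}}title").text = title
    ET.SubElement(song, f"{{{FOAF}}}name").text = artist

    return ET.tostring(root, encoding="unicode")


rdfxml = entry_to_rdfxml("entry-1", "song-1", "Do I Wanna Know?",
                         "Arctic Monkeys", 10, 31, 2014)
```

An XSL stylesheet would express the same mapping declaratively, with one template per XML element.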
Figure 3. The workflow of obtaining, transforming, publishing and interlinking the data.
The RDF/XML files are loaded into a Virtuoso Universal Server⁹
instance, into a single RDF graph. Each time the automated
workflow is run, it adds data into the same RDF graph, i.e.
it updates the dataset. This RDF graph¹⁰ has been published and is
available via a persistent URI. Its content is dereferenceable via
HTTP content negotiation, as well.
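Content negotiation means that a client states the serialization it prefers in the HTTP Accept header. A minimal sketch of such a request is given below; it only constructs the request and does not send it. The helper name is hypothetical, and which media types the server actually supports is an assumption.

```python
from urllib.request import Request

# Persistent URI of the published RDF graph, as given in the paper.
GRAPH_URI = "http://purl.org/net/lmd/data"


def negotiated_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(uri, headers={"Accept": media_type})


turtle_request = negotiated_request(GRAPH_URI)
rdfxml_request = negotiated_request(GRAPH_URI, "application/rdf+xml")

# Sending the request would look like this:
#   from urllib.request import urlopen
#   with urlopen(rdfxml_request) as response:
#       body = response.read()
```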
3.1.1 Playlist Ontology
In order to transform the playlist data from HTML to RDF/XML,
using XSL transformation, we need an ontology. As we
previously mentioned, LinkedBrainz is a project which publishes
the MusicBrainz data in RDF format, by using mappings of
concepts from the database to concepts defined in the Music
Ontology. The Music Ontology is used as a vocabulary for
describing a wide range of music related information. It provides
classes and concepts such as artists, albums, tracks and properties
such as biography, duration, instrument and many others [12].
Figure 4. Diagram of the Playlist Ontology.
However, since the entities described in our dataset are playlist
entries which have a different schema from the entities from
MusicBrainz, we are unable to use the classes and properties from
the Music Ontology for annotation purposes. Therefore, we
created our own ontology, the Playlist Ontology¹¹. It is comprised
of classes and properties which are necessary for describing the
data from our playlist dataset. In order to support the interlinking
of the data from our dataset with data from the LOD Cloud, we
also needed object properties in the ontology which would serve
as links between entities from the different datasets.
⁹ http://virtuoso.openlinksw.com/
¹⁰ http://purl.org/net/lmd/data
¹¹ http://purl.org/net/po#
The Playlist Ontology has three classes (Figure 4). The
po:PlaylistEntry class is used for representing an entry from a
playlist. This entry is not simply a song, but rather a song which
holds a specific position in a specific playlist, at a specific time.
The po:Playlist class is used for representing a playlist from a
radio station, and the po:Song class is used for representing a
song.
Table 1. Object Properties of the Playlist Ontology.
Property | Description
hasPlaylistEntry | Used for linking a po:Playlist instance with po:PlaylistEntry instances, for entries which are part of the playlist. An inverse property of po:partOfPlaylist.
partOfPlaylist | Used for linking a po:PlaylistEntry instance with a po:Playlist instance. An inverse property of po:hasPlaylistEntry.
playlistEntrySong | Used for linking a po:PlaylistEntry instance with a po:Song instance. An inverse property of po:featuredInPlaylistEntry.
featuredInPlaylistEntry | Used for linking a po:Song instance with a po:PlaylistEntry instance. An inverse property of po:playlistEntrySong.
artistInfo | Used for linking a po:Song instance with an mo:MusicArtist instance from the LOD Cloud.
songInfo | Used for linking a po:Song instance with an mo:Track instance from the LOD Cloud.
Table 2. Datatype Properties of the Playlist Ontology.
Property | Description
position | Position of the entry in the playlist, for the specific week and year.
week | The week of the occurrence of the entry in the playlist.
year | The year of the occurrence of the entry in the playlist.
photoURL | A URL to a photo for the entry.
playlistName | The name of the playlist.
stationName | The name of the radio station.
These classes are interconnected with four object properties
(Table 1, Figure 4); po:hasPlaylistEntry and po:playlistEntrySong
are the main properties of the model, and po:partOfPlaylist and
po:featuredInPlaylistEntry are their inverse properties,
respectively. Even though having inverse properties generally
introduces redundancy, i.e. writing more triples for the same
information, we defined them for better SPARQL query
performance for some of the use-cases. The po:Song class also
uses two other object properties for connecting with LOD
instances annotated with the Music Ontology (Table 1, Figure 4).
The ontology also contains six datatype properties (Table 2,
Figure 4).
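The redundancy introduced by the inverse properties can be made concrete with a short sketch. It takes triples that use only the two main properties and adds the corresponding inverse triples, so that queries can start from either end of a relation. The function name and the sample identifiers are hypothetical, and the paper does not state how the inverse triples are actually generated.

```python
PO = "http://purl.org/net/po#"
DATA = "http://purl.org/net/lmd/data#"

# Main property -> inverse property, as listed in Table 1.
INVERSE_OF = {
    PO + "hasPlaylistEntry": PO + "partOfPlaylist",
    PO + "playlistEntrySong": PO + "featuredInPlaylistEntry",
}


def with_inverses(triples):
    """Return the input triples plus one inverse triple per main property."""
    result = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result


triples = {
    (DATA + "playlist-1", PO + "hasPlaylistEntry", DATA + "entry-1"),
    (DATA + "entry-1", PO + "playlistEntrySong", DATA + "song-1"),
}
expanded = with_inverses(triples)
```

Here the two input triples become four, which is the storage cost paid for being able to navigate from a song to its entries and from an entry to its playlist directly.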
In addition to these properties, we also used the foaf:name
property from the FOAF Ontology¹², in order to define the name
of the artist who performs the po:Song instance, and the dc:title
property of the DCMI ontology¹³, in order to define the title of a
po:Song instance (Table 3, Figure 4).
Table 3. Other Datatype Properties used in our Model.
Property | Description
foaf:name | Used for the artist name of a po:Song instance.
dc:title | Used for the song title of a po:Song instance.
The Playlist Ontology has been published following the best
practices¹⁴, i.e. with a persistent URI, and is dereferenceable via
HTTP content negotiation.
3.2 Transformation to Linked Data
After the RDF graph is created, we need to transform its data into
Linked Data. In order to do this, we need to link the data from our
graph to data from other datasets in the LOD Cloud. The external
dataset we chose for interconnecting was MusicBrainz, or more
specifically LinkedBrainz, since it contains the same data in RDF
format and is accessible via a SPARQL endpoint.
In order to accomplish this, we use the two object properties from
our ontology, po:songInfo and po:artistInfo. We use the
po:songInfo property to connect a po:Song instance to the
mo:Track instance described on LinkedBrainz. To do this, we
search for an mo:Track instance on LinkedBrainz which has the
same song title as our po:Song instance and is performed by the
same artist, and add it as an object in an RDF triple which
connects the po:Song instance with it, via the po:songInfo
property (Figure 4).
In a similar manner, we use the po:artistInfo property to connect a
po:Song instance to an mo:MusicArtist instance from
LinkedBrainz, where the matching is done by the name of the
artist performing the po:Song and the mo:MusicArtist name.
This logic was implemented in merge procedures via SPARQL
queries, which are triggered from a script after the RDF graph is
created or updated (Figure 3).
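The paper does not list the merge queries themselves, so the following is only a sketch of what a SPARQL 1.1 Update for the po:songInfo links might look like, wrapped in a Python helper that fills in the graph and endpoint. Several points are assumptions: the function name, the use of foaf:maker to connect a track to its artist on LinkedBrainz, and the exact literal match on title and artist name (real data would likely need case and language-tag normalization).

```python
def build_song_link_update(graph_uri, endpoint_uri):
    """Build a SPARQL Update that links unlinked po:Song instances to mo:Track."""
    return f"""
PREFIX po: <http://purl.org/net/po#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

INSERT {{ GRAPH <{graph_uri}> {{ ?song po:songInfo ?track . }} }}
WHERE {{
  GRAPH <{graph_uri}> {{
    ?song a po:Song ;
          dc:title ?title ;
          foaf:name ?artistName .
    # Skip songs which have already been linked in a previous run.
    FILTER NOT EXISTS {{ ?song po:songInfo ?existing }}
  }}
  SERVICE <{endpoint_uri}> {{
    # Shared variables join on identical title and artist name literals.
    ?track a mo:Track ;
           dc:title ?title ;
           foaf:maker ?artist .
    ?artist foaf:name ?artistName .
  }}
}}
""".strip()


song_link_update = build_song_link_update(
    "http://purl.org/net/lmd/data#",
    "http://linkedbrainz.org/sparql",
)
```

A corresponding update for po:artistInfo would match only on the artist name and insert a link to the mo:MusicArtist instance.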
These po:songInfo and po:artistInfo relations represent a gateway
into more details about the song and artist in question, and enable
a large number of new use-case scenarios. After establishing these
links between our playlist dataset and the LinkedBrainz dataset,
we are able to access more data and retrieve more information
about the song and the artist not only from this dataset, but also
from all other LOD Cloud datasets which are interconnected with
it (Figure 1). This allows us to potentially traverse the entire LOD
Cloud, by starting from our dataset and playlist entries, which
adds to the number of potential uses of the playlist dataset.
¹² http://xmlns.com/foaf/spec/
¹³ http://purl.org/dc/elements/1.1/
¹⁴ http://www.w3.org/TR/ld-bp/
4. USE CASES
As we already pointed out, our goal is to demonstrate that the
transformation of playlist data into Linked Data can provide new
use-case scenarios for the domain users and their applications.
The technologies of the Semantic Web allow data retrieval over a
distributed environment, via SPARQL federation. We will use
this feature, which allows execution of SPARQL queries over
distributed SPARQL endpoints.
Since the playlist dataset is published as an RDF graph on a
public Virtuoso instance, accessible and dereferenceable via a
persistent URI, and is linked with data from the LOD Cloud, the
next step is to explore these additional use-case scenarios which
arise from the interlinking, and demonstrate how they can be used
in applications developed over the dataset.
4.1 Using Data from the Playlist Dataset
The first question which appears is what kind of information can
be retrieved by using only our dataset. It contains consolidated
playlist data from different websites, which is enough to enable
new use-cases. One such scenario would be finding the songs
from a specific artist, along with their titles, their positions in
different playlists, and the names of the playlists and radio
stations they appear in, at a specific time. In order to get this
information, we could use the following SPARQL query:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX po: <http://purl.org/net/po#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?playlistName ?stationName ?songName ?position
FROM <http://purl.org/net/lmd/data#>
WHERE {
  ?song foaf:name "Arctic Monkeys" ;
        dc:title ?songName ;
        po:featuredInPlaylistEntry ?entry .
  ?entry po:week "31" ;
         po:year "2014" ;
         po:position ?position ;
         po:partOfPlaylist ?playlist .
  ?playlist po:playlistName ?playlistName ;
            po:stationName ?stationName .
}
This query finds all the po:Song entities from our dataset which
have ‘Arctic Monkeys’ as an artist name, and retrieves data
connected to the instance.
The use of the po:featuredInPlaylistEntry and po:partOfPlaylist
properties in this use-case allows for better query performance,
compared to the case if we only had the po:playlistEntrySong and
po:hasPlaylistEntry properties in the ontology. The partial result
of the query executed over our playlist dataset, edited for brevity,
is shown in Table 4.
Table 4. Partial results from the SPARQL query.
Playlist | Station | Song | Position
Indie Singles | BBC Radio 1 | Do I Wanna Know? | 10
Indie Singles | BBC Radio 1 | R U Mine? | 18
Indie Singles | BBC Radio 1 | Why’d you only … | 22
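An application consuming such results would typically request them in the standard SPARQL JSON results format and flatten the bindings into rows. The sketch below shows this on a hand-written sample document which mirrors the first row of Table 4; the function name is hypothetical and no request is made to the endpoint.

```python
import json


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding.
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL 1.1 Query Results JSON Format.
sample_results = json.dumps({
    "head": {"vars": ["playlistName", "stationName", "songName", "position"]},
    "results": {"bindings": [{
        "playlistName": {"type": "literal", "value": "Indie Singles"},
        "stationName": {"type": "literal", "value": "BBC Radio 1"},
        "songName": {"type": "literal", "value": "Do I Wanna Know?"},
        "position": {"type": "literal", "value": "10"},
    }]},
})
rows = bindings_to_rows(sample_results)
```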
4.2 Using Data from LinkedBrainz and LOD
4.2.1 Using the po:songInfo property
The po:songInfo property enables us to step out of our playlist
dataset and obtain additional data from the LinkedBrainz dataset
about the song. For instance, if we want to find out the album
(release) for a song from a playlist entry, along with the date it
was published, we can use the following SPARQL query:
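PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX po: <http://purl.org/net/po#>
PREFIX pd: <http://purl.org/net/lmd/data#>
# The prefixes below are predefined in Virtuoso; they are declared
# here so that the query is also valid on other SPARQL processors.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
SELECT DISTINCT ?artist (STR(?songTitle) AS ?title)
       (STR(?releaseTitle) AS ?album) ?releaseDate ?releasePlace
WHERE {
  GRAPH <http://purl.org/net/lmd/data#> {
    pd:JYChLBn-1-3 po:playlistEntrySong ?song .
    ?song po:songInfo ?mbs ;
          foaf:name ?artist .
  }
  SERVICE <http://linkedbrainz.org/sparql> {
    ?mbs dc:title ?songTitle .
    ?record mo:track ?mbs .
    ?release mo:record ?record ;
             dc:title ?releaseTitle .
    ?releaseEvent mo:release ?release ;
                  dc:date ?releaseDate ;
                  event:place ?place .
    ?place rdfs:label ?releasePlace .
  }
}
ORDER BY ?releaseDate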
This query starts executing over the local playlist RDF graph,
looking for the po:Song instance from the selected playlist entry
pd:JYChLBn-1-3, which represents the occurrence of the ‘Give
Life Back to Music’ song by the artist ‘Daft Punk’ in one of the
playlists. The detected po:Song instance is already linked with a
LinkedBrainz song entity, and this entity (its ID) is then sent as a
variable in a subquery for execution at the LinkedBrainz
SPARQL endpoint, via SPARQL federation. As a result of the
federated call, we obtain the necessary data about the song in
question. Since the result set is large, only a part of it is shown in
Table 5.
As we see from Table 5, this query can be used by an application
for providing a user with more information about the song in
question and the album (release) it is part of, by using data not
present in our dataset.
Table 5. Partial results from the SPARQL query.
Song | Album | Date | Place
Give Life Back to Music | Random Access Memories | 2013-05-17 | United States
Give Life Back to Music | Random Access Memories | 2013-05-17 | Germany
Give Life Back to Music | Random Access Memories | 2013-05-17 | Netherlands
Give Life Back to Music | Random Access Memories | 2013-05-20 | United Kingdom
4.2.2 Using the po:artistInfo property
Another possible use-case scenario would be to get additional
information about the artist of a song featured as an entry in one
of the playlists the user is interested in. For instance, a common
scenario in an application would be to provide a picture, a
description and a website URL for the artist, which can be done
with the following SPARQL query:
Figure 5. Playlist and artist details in the web application.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX po: <http://purl.org/net/po#>
PREFIX pd: <http://purl.org/net/lmd/data#>
SELECT DISTINCT ?thumbnail ?abstract ?website
WHERE {
  GRAPH <http://purl.org/net/lmd/data#> {
    pd:ghQTOqj-1-4 po:playlistEntrySong ?song .
    ?song po:artistInfo ?artist .
  }
  SERVICE <http://linkedbrainz.org/sparql> {
    ?artist owl:sameAs ?dbArtist .
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?dbArtist dbpedia:thumbnail ?thumbnail ;
              foaf:homepage ?website ;
              dbpedia:abstract ?abstract .
    FILTER langMatches(lang(?abstract), "EN")
  }
}
This query starts in the local RDF graph and then continues to the
LinkedBrainz and DBpedia datasets to retrieve the information
needed for the use-case. The result of the example query is shown
in Table 6.
None of the retrieved data is present in our dataset; it comes
from other, distributed data repositories. An application can use
it, for example, on an artist screen which gives the user general
information about the performer of the song of interest.
Table 6. Results from the SPARQL query.
Thumbnail | Abstract | Website
http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/Katy_Perry_UNICEF_2012.jpg/200px-Katy_Perry_UNICEF_2012.jpg | "Katheryn Elizabeth Hudson (born October 25, 1984), known by her stage name Katy Perry, is an American recording artist, songwriter, and actress..." | http://www.katyperry.com/
It is important to note that these example queries can be sent as a
query string from an application, i.e. the SPARQL endpoint can
be used as a REST service. The HTTP GET calls generally have
the following format:
http://linkeddata.finki.ukim.mk/sparql?query=SPARQLQUERY&format=FORMAT
Here, SPARQLQUERY represents the URL encoded SPARQL
query, and FORMAT represents the format of the response, such
as HTML, XML, JSON, CSV, RDF/XML, N3, Turtle, JSON-LD,
etc. The SPARQL endpoint also allows the use of an Accept
header for the preferred output format.
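The call format above can be sketched in Python with the standard library only. The helper name build_query_url, the sample query and the MIME type passed as FORMAT are assumptions made for illustration; which format values the endpoint actually accepts should be checked against the endpoint itself. The request object is only constructed here, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://linkeddata.finki.ukim.mk/sparql"


def build_query_url(sparql_query, response_format):
    """Return the GET URL with a URL-encoded query and a format parameter."""
    params = urlencode({"query": sparql_query, "format": response_format})
    return f"{ENDPOINT}?{params}"


# Illustrative query; any SPARQL query string is encoded the same way.
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

# Assumed format value; the endpoint may expect a different identifier.
url = build_query_url(query, "application/sparql-results+json")
print(url)

# Alternative: state the preferred output format in an Accept header.
request = Request(
    f"{ENDPOINT}?{urlencode({'query': query})}",
    headers={"Accept": "application/sparql-results+json"},
)
print(request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would execute the call; that step is omitted so that the sketch does not depend on the endpoint being reachable.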
Other useful use-case scenarios can be achieved with these
interconnected datasets, as well. We could, for instance, collect
the social media profile addresses of the artists of interest, find out
which label released their most recent album, or make an
analytical query and find the artist or label with most songs
present on the radio playlists, etc.
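The analytical scenario mentioned last can be illustrated with a small client-side sketch, assuming the application has already retrieved (artist, song) pairs from the playlist entries. The artist and song names below are placeholders rather than data from our dataset, and the same aggregation could instead be expressed inside the SPARQL query itself.

```python
from collections import Counter

# Placeholder (artist, song) pairs, as they might be collected from the
# entries of several playlists. The same song can appear more than once.
entries = [
    ("Artist A", "Song 1"),
    ("Artist A", "Song 2"),
    ("Artist B", "Song 3"),
    ("Artist A", "Song 2"),  # the same song on another playlist
]


def artists_by_distinct_songs(pairs):
    """Rank artists by the number of distinct songs they have on the playlists."""
    distinct_pairs = set(pairs)  # count each (artist, song) pair once
    counts = Counter(artist for artist, _song in distinct_pairs)
    return counts.most_common()


ranking = artists_by_distinct_songs(entries)
top_artist, song_count = ranking[0]
print(top_artist, song_count)
```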
Figure 6. World map view of the artists from a selected playlist / chart, for a specific week and year.
These use-case scenarios are meant to be used by developers in
various applications from the music and entertainment domain, in
order to provide the users with interesting information from the
LOD Cloud. These applications have the opportunity to be richer
in information than those which use isolated data sources. This
will eventually contribute to a better user experience.
5. WEB APPLICATION
To demonstrate the feasibility of the use-cases, we developed a
web application. It uses the playlist dataset from our Virtuoso
instance and provides end users with basic information about the
artists and songs from the available playlists and charts
(Figure 5). It also gives them a more analytical insight: a global
overview of the countries of origin of the artists present on a
given playlist or chart, together with an analysis of the weekly
dynamics in them (Figure 6).
The web application uses our SPARQL endpoint to query for data
from both our dataset and the LOD Cloud. One basic use-case is to
provide the user with additional information about an artist of
interest. It combines the list of radio stations, their playlists
and the playlist entries for the current week from the local
dataset with further artist details from the LOD Cloud: a photo, a
short bio and the geo-location of the artist's place of birth or
origin (Figure 5).
For more analytical users, the web application offers a global
overview of the places the artists from a selected playlist come
from (Figure 6). By selecting different playlists, the user can
gain insight into the differences between radio stations and the
varying presence of countries and artists in them. Additionally,
by changing the week for a selected playlist, the user can see how
this presence changes from week to week. This use-case also uses
data from the LOD Cloud: for each artist in question it retrieves
the place of origin or birth, and then the geo-location data of
that place.
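The aggregation behind such a map view can be sketched as follows, assuming the artist-to-country mapping for each week has already been retrieved from the LOD Cloud. The function names and the sample data are hypothetical and do not describe the implementation of our web application.

```python
from collections import Counter


def country_counts(artist_countries):
    """Count how many artists on a playlist come from each country."""
    return Counter(artist_countries.values())


def weekly_change(previous, current):
    """Return the per-country difference in artist counts between two weeks."""
    countries = set(previous) | set(current)
    # Counter returns 0 for countries that are missing in one of the weeks.
    return {c: current[c] - previous[c] for c in sorted(countries)}


# Placeholder data: artist -> country of origin for two consecutive weeks.
week_1 = {"Artist A": "United States", "Artist B": "Germany", "Artist C": "United States"}
week_2 = {"Artist A": "United States", "Artist D": "Netherlands"}

change = weekly_change(country_counts(week_1), country_counts(week_2))
print(change)
```

A positive value means that a country gained artists on the playlist from one week to the next, a negative value that it lost some.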
The scenarios from the web application are in direct support of the
idea we initially had: to show that the application of Linked Data
principles in the music domain can prove beneficial for the end-
users from the domain, by providing more advanced and broader
use-cases.
6. CONCLUSION
The concept of Linked Data offers a significant advantage in the
representation and retrieval of structured data from distributed
parts of the Web. A large number of communities, companies and
other interested stakeholders are taking part in the initiative and
are contributing to the expansion of the LOD Cloud [3].
In this paper we described the design of a system which uses an
automated workflow to transform music related data from the
websites of global radio stations into five-star Linked Data. We
developed and published our Playlist Ontology. We also
presented and demonstrated novel use-case scenarios, enabled by
the interlinked datasets, as a basis for further development of
applications and services. As a proof of concept, we developed
our own web application which aims to present the benefit of
these new use-cases to the end-users.
As we know from [4] and [5], this type of data can help both the
business sector and developers, by creating new business value
with unique use-cases for applications and services, and the
general public as the end user of those applications and services.
Our goal in this paper was to demonstrate that the Linked Data
principles offer a range of new use-case scenarios in the music
domain which were previously either unavailable or very hard to
implement. These use-cases, along with the public dataset itself,
can serve as a basis for further application development by the
community and by companies, and can hopefully introduce new
business value in the industry.
7. ACKNOWLEDGMENTS
The work in this paper was partially financed by the Faculty of
Computer Science and Engineering, at the Ss. Cyril and
Methodius University in Skopje, as part of the research project
“Semantic Sky 2.0: Enterprise Knowledge Management”.
8. REFERENCES
[1] C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee, "Linked data on the web," 17th International conference on World Wide Web, ACM, 2008, pp. 1265-1266.
[2] C. Bizer, T. Heath, and T. Berners-Lee, "Linked Data - the story so far," International Journal on Semantic Web and Information Systems 5, no. 3, 2009, pp. 1-22.
[3] T. Heath, and C. Bizer, "Linked Data: Evolving the Web into a Global Data Space," Synthesis lectures on the Semantic Web: Theory and Technology 1.1, 2011, pp. 1-136.
[4] T. Berners-Lee, N. Shadbolt, "There's gold to be mined from all our data," The Times, 2012.
[5] V. Kundra, "Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect," Joan Shorenstein Center on the Press, Politics and Public Policy, 2012.
[6] A. Passant, and Y. Raimond, "Combining Social Music and Semantic Web for Music-Related Recommender Systems," Social Data on the Web Workshop, 2008.
[7] M. Jovanovik, B. Najdenov, D. Trajanov, "Linked Open Drug Data from the Health Insurance Fund of Macedonia," 10th Conference for Informatics and Information Technology (CIIT), 2013.
[8] E. Misheva, B. Najdenov, M. Jovanovik, D. Trajanov, "Open Public Transport Data in Macedonia," 11th Conference for Informatics and Information Technology (CIIT), 2014.
[9] B. Najdenov, H. Pejchinovski, K. Cieva, M. Jovanovik, D. Trajanov, "Open Financial Data from the Macedonian Stock Exchange," ICT Innovations 2014, Advances in Intelligent Systems and Computing, 2014, (in press).
[10] B. Najdenov, M. Jovanovik, D. Trajanov, "VEO: an Ontology for CO2 Emissions from Vehicles," ICT Innovations 2014, (in press).
[11] M. Jovanovik, B. Najdenov, Gj. Strezoski, D. Trajanov, "Linked Open Data for Medical Institutions and Drug Availability Lists in Macedonia," 3rd International Workshop on Ontologies in Advanced Information Systems, OAIS 2014. Advances in Intelligent Systems and Computing, 2014, (in press).
[12] Y. Raimond, S. A. Abdallah, M. B. Sandler, and F. Giasson, "The Music Ontology," ISMIR, 2007, pp. 417-422.