ABSTRACT
The Linked Data approach to data publishing allows users and their
data-driven applications to cover broader use cases that encompass
various data sources, whether publicly available on the Web or held
in private repositories. The use of W3C standards in publishing such
data enables uniform access across platforms.
Transport information is more important to citizens and society
than ever, and access to the right information at the right time can
improve the quality of everyday life for many people.
In this paper, we describe our approach to building a system for
automated Linked Data generation in the transportation domain. We
use the data from the Swedish Transport Administration (STA) as a
case study. For the purpose of RDF annotation, we developed the
Transport Administration Ontology (TAO). The resulting five-star
data enables advanced use case scenarios over the original STA data,
which we also demonstrate with our web application.
Keywords: Automated Systems, Linked Data, Open Data, Transport
Administration Ontology, Swedish Transport Administration.
1. INTRODUCTION
We live in a world where data is omnipresent and matters greatly to
us as individuals, and also as members of organizations and
societies [1]. The quantity of data is growing faster than ever, and
this growth is expected to continue [2]. As the volume of data
increases, companies and public institutions promote transparency in
their line of work by opening up their data and sharing it with the
public in various formats over the Web. The large amount of data
available on the Web motivates research into new data management,
storage and access techniques for distributed datasets over the
existing Web infrastructure [3].
1 Bojan Najdenov, Lund University, School of Economics and
Management, Lund, Sweden (email:
bojan.najdenov.947@student.lu.se)
Goran Petkovski, Faculty of Computer Science and Engineering,
Skopje, Macedonia (email:
petkovski.goran.1@students.finki.ukim.mk)
Milos Jovanovik, Faculty of Computer Science and Engineering,
Skopje, Macedonia (email: milos.jovanovik@finki.ukim.mk)
Riste Stojanov, Faculty of Computer Science and Engineering,
Skopje, Macedonia (email: riste.stojanov@finki.ukim.mk)
Dimitar Trajanov, Faculty of Computer Science and Engineering,
Skopje, Macedonia (email: dimitar.trajanov@finki.ukim.mk)
The main goal of the Web has always been to present information
in a way that is understandable to humans, even though machines
also use the Web to communicate. The Semantic Web (Web 3.0), on
the other hand, can be thought of as a web of data that is annotated
with Semantic Web standards such as RDF and OWL and is
interlinked based on its meaning [4][5]. In addition, special
emphasis is placed on machine readability and interoperability,
which enables a wide range of use case scenarios for both humans
and machines [6][7].
Given the importance of the quality and availability of
transportation information, in this paper we introduce a system
for automated Linked Data generation and publishing, and use the
Swedish Transport Administration (STA) as a case study. To
annotate the transport data, we created the Transport
Administration Ontology (TAO) and used it to semantically
annotate the STA’s data, thus raising its quality to 5-Star
Linked Open Data2. Furthermore, we created a web application
that demonstrates advanced use case scenarios over the STA’s
data complemented with data from the LOD Cloud.
Our paper is organized as follows: in Section 2 we review related
projects on Linked Data in the transportation domain; in Section 3
we describe the automated system for gathering and transforming the
data and linking it to the LOD Cloud; in Section 4 we present
example use case scenarios that can be performed over the linked
datasets; finally, in the last section we conclude our work.
2. RELATED WORK
To position our work within its problem area, we review several
existing projects concerning Linked Open Data and transportation.
The UK is one of the pioneers in the adoption and use of Linked
Data concepts. It publishes public data from multiple governmental
agencies on multiple topics of interest, and transport is one of
them. Using the Semantic Web standards, the UK publishes data about
railway stations, airports, buses and traffic conditions, which can
be accessed on its website3.
Google Transit Feed Specification (GTFS)4 is a format for
publishing public transport data connected with geographical
locations. It is a standard proposed by Google in order to help
public transport agencies publish their data and integrate it with
Google Maps, while also giving developers the means to build
applications over the data and promoting interoperability. There
are efforts to create a GTFS ontology5 and also to extend the GTFS
ontology [8] to cover data that has since been introduced as part
of the GTFS standard. The authors who extended the GTFS ontology
used it to annotate transit data from the public transport agency
in the city of Skopje, while also providing a SPARQL endpoint as a
means of querying the data [8].
2 http://5stardata.info
3 http://transport.data.gov.uk/
4 https://developers.google.com/transit/gtfs/
Automated Linked Data Generation from the
Transport Administration Domain
Bojan Najdenov, Goran Petkovski, Milos Jovanovik, Riste Stojanov and Dimitar Trajanov
23rd Telecommunications forum TELFOR 2015
Serbia, Belgrade, November 24-26, 2015.
978-1-5090-0055-5/15/$31.00 ©2015 IEEE
3. CREATING LINKED DATA FROM THE SWEDISH
TRANSPORT ADMINISTRATION
The Swedish Transport Administration (STA), Trafikverket6, is a
Swedish governmental agency responsible for long-term planning
and development of the Swedish national road network. It collects
an extensive amount of traffic data 24 hours a day, all year round,
and provides information to the general public either through its
website or through web services, which are available to software
developers and researchers upon request. The information it
publishes includes road and traffic conditions in major cities and
on roads and highways within Sweden.
Currently, the data from STA accessible through its SOAP services
is structured in the Datex II format7, which is XML-based and
regarded as a standard by many governmental institutions within the
EU. Consequently, the data from STA has 3-star quality according to
the 5-star rating scheme for open data. Even though data published
in this way can be used by different applications, the possible use
cases from a user perspective are limited compared with those that
would be available if the data were published in 5-star quality
using the Linked Open Data standards.
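The star levels referred to above can be made concrete with a small Python sketch. It paraphrases the cumulative criteria of the 5-star scheme (footnote 2) and takes the paper's 3-star assessment of the STA data at face value; the function and parameter names are hypothetical and not part of the described system.

```python
def star_rating(on_web_open_licence, machine_readable, non_proprietary,
                uses_w3c_standards, linked_to_other_data):
    """Return the number of stars (0-5); each level requires all earlier ones."""
    criteria = [on_web_open_licence, machine_readable, non_proprietary,
                uses_w3c_standards, linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break  # the scheme is cumulative, so stop at the first unmet level
        stars += 1
    return stars

# XML from the STA services: structured and non-proprietary, but not RDF.
sta_xml = star_rating(True, True, True, False, False)
# The same data after annotation as RDF.
rdf_graph = star_rating(True, True, True, True, False)
# The RDF graph after interlinking with the LOD Cloud.
linked_graph = star_rating(True, True, True, True, True)
print(sta_xml, rdf_graph, linked_graph)
```

The three calls mirror the stages discussed in this paper: the source XML, the RDF graph, and the interlinked dataset.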
Therefore, we see an opportunity in transforming the public
transport data from STA into 5-star quality and interlinking it with
entities from the LOD Cloud. The automated workflow of the system
consists of two main phases, explained in the following
sub-sections, which are scheduled to run on an hourly basis in
order to provide the most recent transport data. Briefly, the main
steps of the automated process are:
1. Automated data gathering
a. A script is scheduled to run and access the STA’s
services to obtain the XML data from the datasets of
interest.
b. The XML data is parsed and transformed into
RDF/XML format using ontologies for semantic
annotation.
c. The RDF graph is loaded into an Apache Jena Fuseki
Server instance and is updated whenever new information
is added.
2. Transformation into 5-Star Linked Data
a. We run SPARQL-based procedures to locate
instances from our local RDF graph, locate
corresponding entities in the LOD Cloud and create
the necessary links.
5 http://lov.okfn.org/dataset/lov/vocabs/gtfs/
6 http://www.trafikverket.se/
7 http://www.datex2.eu/
A. Automated Data Gathering
The automated data gathering is performed by a Windows batch
script, which requests the data, gathers it and stores it locally
as files in XML format. Once the XML files are stored locally, the
next step in the process is their transformation into RDF/XML
format (4-star data, which uses W3C open standards). The
transformation is done by parsing each XML file and annotating the
content of each element with RDF, after which every RDF/XML file
is loaded into an instance of Apache Jena Fuseki Server8. Every
time the automated process runs, it updates the data contained in
our RDF graph. The RDF graph is available via a persistent URI and
can be queried through a SPARQL endpoint.
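The parsing and annotation step described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the system: the XML element names (`situationRecord`, `creationTime`, `lengthAffected`), the `id` attribute and the instance namespace `http://example.org/sta/` are assumptions, and only the TAO namespace and the property names `situationRecordTime` and `lengthAffected` appear in this paper. For brevity it emits N-Triples rather than RDF/XML.

```python
import xml.etree.ElementTree as ET

TAO = "http://sta.linkeddata.finki.ukim.mk/ontology/tao#"  # TAO namespace (footnote 9)
BASE = "http://example.org/sta/"  # hypothetical namespace for instances
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, heavily simplified stand-in for a Datex II payload.
SAMPLE_XML = """
<situations>
  <situationRecord id="acc-1">
    <creationTime>2015-06-10T19:35:28</creationTime>
    <lengthAffected>1516</lengthAffected>
  </situationRecord>
</situations>
"""

def xml_to_ntriples(xml_text):
    """Turn each <situationRecord> element into N-Triples lines using TAO properties."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.iter("situationRecord"):
        subject = f"<{BASE}{rec.get('id')}>"
        time = rec.findtext("creationTime")
        length = rec.findtext("lengthAffected")
        if time is not None:
            lines.append(
                f'{subject} <{TAO}situationRecordTime> "{time}"^^<{XSD}dateTime> .')
        if length is not None:
            lines.append(
                f'{subject} <{TAO}lengthAffected> "{length}"^^<{XSD}integer> .')
    return lines

triples = xml_to_ntriples(SAMPLE_XML)
for triple in triples:
    print(triple)
```

Each record yields one triple per annotated element; the resulting lines could then be loaded into a triple store such as the Fuseki instance mentioned above.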
B. Transport Administration Ontology
The main requirement for transforming the STA data into RDF and
later into Linked Data is an ontology for the semantic annotation
of the data. Since no common ontology exists that could be used to
annotate the whole STA data, we created the Transport
Administration Ontology (TAO) and reused properties from other
ontologies in the process of semantic annotation, following the
Semantic Web standards.
TAO consists of classes and properties which describe every
entity, with all its attributes, from the transport datasets we
worked with: Road Condition, Road Work, Rest Area, Ferry Service and
Accident Service. In this section we describe the TAO ontology and
go through the properties which we reused from other ontologies.
The object properties of the TAO ontology are shown in Table 1.
The ontology has one object property and its corresponding inverse
property, which connect Situation_Record instances with the
Location where they occurred.
Table 1. Object Properties of the TAO Ontology
Property | Description
has_Location | Information where the Situation instance was recorded. Inverse of location_Of_Situation.
location_Of_Situation | Used for determining situations that occurred on a specified location.
Table 2 on the other hand illustrates the datatype properties that
the TAO ontology introduces.
Table 2. Datatype Properties of the TAO Ontology
Description
TimeStamp when the Situation_Record instance was created.
Current status of the Situation_Record instance.
Traffic speed limitation where the Accident occurred.
Total length of the affected carriageway, measured in metres.
Number of lanes which are restricted due to the Accident event.
The TAO Ontology has been published with a persistent URI9,
and is dereferenceable via HTTP content negotiation, as the best
practices suggest.
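Content negotiation means that a client states, in the HTTP Accept header, which representation of the ontology it wants. The following Python sketch only prepares such requests and does not send them; which media types the server actually offers is an assumption, and the helper name is hypothetical.

```python
from urllib.request import Request

# TAO namespace from footnote 9, without the trailing '#', because
# URI fragments are not sent to the server.
TAO_URI = "http://sta.linkeddata.finki.ukim.mk/ontology/tao"

def negotiated_request(uri, media_type):
    """Prepare (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})

# A machine client would ask for RDF, a browser for HTML (assumed media types).
rdf_request = negotiated_request(TAO_URI, "application/rdf+xml")
html_request = negotiated_request(TAO_URI, "text/html")
print(rdf_request.get_method(), rdf_request.full_url,
      rdf_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual dereferencing.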
8 http://jena.apache.org/documentation/fuseki2/
9 http://sta.linkeddata.finki.ukim.mk/ontology/tao#
C. Transformation into 5-Star Linked Data
After the generation of the RDF graph, the next step in the process
is the transformation of the data into 5-star Linked Data.
Interlinking refers to the establishment of links between the data
instances of our local dataset and other datasets available in the
LOD Cloud. To connect with DBpedia, we use the skos:related
property from the SKOS namespace10 to link the city instances
found in our dataset with the related city resources described in
DBpedia.
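One way such links could be generated is sketched below in Python. This is an assumption-laden illustration rather than the procedure used in the system: the local city namespace `http://example.org/sta/city/` is hypothetical, and deriving a DBpedia URI from the city name is a naive heuristic that a real procedure would verify against DBpedia before creating the link.

```python
from urllib.parse import quote

SKOS_RELATED = "http://www.w3.org/2004/02/skos/core#related"
DBPEDIA = "http://dbpedia.org/resource/"
LOCAL_CITY = "http://example.org/sta/city/"  # hypothetical local namespace

def dbpedia_uri(city_name):
    """Naive candidate URI: DBpedia resource names use underscores for spaces."""
    return DBPEDIA + quote(city_name.replace(" ", "_"))

def link_update(city_names):
    """Build a SPARQL Update request that adds one skos:related link per city."""
    triples = []
    for name in city_names:
        local = LOCAL_CITY + quote(name.replace(" ", "_"))
        triples.append(f"  <{local}> <{SKOS_RELATED}> <{dbpedia_uri(name)}> .")
    return "INSERT DATA {\n" + "\n".join(triples) + "\n}"

update = link_update(["Lund", "Kungsbacka"])
print(update)
```

The resulting string is a SPARQL 1.1 Update request that could be sent to the update endpoint of the triple store.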
4. USE CASES
One of the main advantages of Linked Data is the ability to access
distributed data available at different locations on the Web while
starting from a single source. This ability enables a large variety
of use case scenarios involving transport data.
The Semantic Web technologies allow information retrieval from
distributed datasets through SPARQL federation.
In this section we analyze two types of scenarios: we first query
the local dataset only, and afterwards we demonstrate how
information from DBpedia can be queried starting from our local
dataset, providing useful information to potential users.
A. Using Road Accidents Data
In our work we address several different datasets published by
STA as described previously, which we can use to demonstrate
the information that could be obtained using the isolated dataset
only.
Table 3. Results from the query on the local dataset.
Time Stamp | Point | Road Number | Length Affected
2015-06-10T19:35:28 | 12.0203247, 57.48617 | Road 158 | 1516
2015-06-10T18:59:24 | 13.6646309, 55.75424 | Road 1106 | 2060
10 http://www.w3.org/TR/skos-reference/
One such scenario is to find all road accidents that occurred on
some road, along with details about each instance. For this we can
use the following SPARQL query:
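PREFIX tao: <http://sta.linkeddata.finki.ukim.mk/ontology/tao#>
# The geo: and place: prefixes are not spelled out in the paper;
# declare them to match the namespaces used in the dataset.
# CONCAT and STR (SPARQL 1.1 built-ins) replace the original
# fn:concat(... + ...) expression, which relied on non-standard
# string addition.
SELECT ?Time (CONCAT(STR(?Longitude), ", ", STR(?Latitude)) AS ?Point)
       ?RoadNumber ?LengthAffected
WHERE {
  ?Accident tao:situationRecordTime ?Time ;
            tao:lengthAffected ?LengthAffected ;
            tao:has_Location ?Location .
  ?Location place:Road ?RoadNumber ;
            geo:longitude ?Longitude ;
            geo:latitude ?Latitude .
}
ORDER BY DESC(?Time)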
As an answer to our query, we get the time stamp of each road
accident, the geographical point where the accident happened, the
road it happened on and the length of the affected carriageway
measured in metres. The results of the query executed over our
Road Accident dataset are shown in Table 3.
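An application consuming these results would typically receive them in the standard SPARQL JSON results format. The Python sketch below shows how such a response can be flattened into rows; the sample document is built by hand from the two rows of Table 3 and is not an actual response from the endpoint, and the helper name is hypothetical.

```python
import json

# Hand-built sample in the SPARQL 1.1 JSON results layout, using Table 3 values.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["Time", "Point", "RoadNumber", "LengthAffected"]},
    "results": {"bindings": [
        {"Time": {"type": "literal", "value": "2015-06-10T19:35:28"},
         "Point": {"type": "literal", "value": "12.0203247, 57.48617"},
         "RoadNumber": {"type": "literal", "value": "Road 158"},
         "LengthAffected": {"type": "literal", "value": "1516"}},
        {"Time": {"type": "literal", "value": "2015-06-10T18:59:24"},
         "Point": {"type": "literal", "value": "13.6646309, 55.75424"},
         "RoadNumber": {"type": "literal", "value": "Road 1106"},
         "LengthAffected": {"type": "literal", "value": "2060"}},
    ]},
})

def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]

rows = rows_from_sparql_json(SAMPLE_RESPONSE)
# Example use: the accident with the longest affected carriageway.
longest = max(rows, key=lambda row: int(row["LengthAffected"]))
print(longest["RoadNumber"], longest["LengthAffected"])
```

Flattening the bindings this way keeps the application code independent of the result format details.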
B. Using Additional Data from DBpedia
In this section we present a use case that is made possible only by
the link we made with the LOD cloud through the skos:related
property. In the query shown below, we are connecting to
DBpedia as part of the LOD Cloud, in order to retrieve more
information about the city which is closest to a rest area in
Sweden.
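PREFIX tao: <http://sta.linkeddata.finki.ukim.mk/ontology/tao#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT DISTINCT ?cityName ?RestAreaName ?abstract ?thumbnail
WHERE {
  # Local graph: cities and their linked DBpedia resources.
  ?city tao:cityName ?cityName ;
        skos:related ?dbpediaResource .
  # Remote part, evaluated by the DBpedia endpoint.
  SERVICE <http://dbpedia.org/sparql> {
    ?NearestCity dbpedia-owl:nearestCity ?dbpediaResource ;
                 dbpedia-owl:abstract ?abstract ;
                 dbpedia-owl:thumbnail ?thumbnail ;
                 dbpprop:name ?RestAreaName .
    FILTER langMatches(lang(?abstract), "en")
  }
}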
Figure 1. Diagram of the Transport Administration Ontology.
This query first executes over the local RDF graph, looking for
cities via the property tao:cityName. Each detected city instance
is linked with its corresponding DBpedia city resource, which
contains multiple facts related to the city itself.
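Conceptually, the federated query joins the bindings found in the local graph with the bindings returned by the remote SERVICE block on their shared variable, ?dbpediaResource. The Python sketch below illustrates that join with hypothetical sample bindings; it is a simplification, since a SPARQL engine is free to evaluate the federation differently.

```python
# Hypothetical bindings produced by the local part of the query.
local_rows = [
    {"cityName": "Lund", "dbpediaResource": "http://dbpedia.org/resource/Lund"},
    {"cityName": "Ystad", "dbpediaResource": "http://dbpedia.org/resource/Ystad"},
]
# Hypothetical bindings returned by the remote SERVICE block.
remote_rows = [
    {"dbpediaResource": "http://dbpedia.org/resource/Lund",
     "RestAreaName": "Example Rest Area", "abstract": "Example abstract."},
]

def join_on(left, right, key):
    """Hash join: merge every pair of rows that agree on the shared variable."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

joined = join_on(local_rows, remote_rows, "dbpediaResource")
for row in joined:
    print(row["cityName"], "->", row["RestAreaName"])
```

Only cities whose DBpedia resource appears in the remote result survive the join, which is why the skos:related links are a prerequisite for this use case.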
All of these example use case scenarios can be implemented in
any application, since the SPARQL endpoint11 we created can be
used as a REST service. The GET calls should have the following
format:
http://sta.linkeddata.finki.ukim.mk/sparql?query=SPARQLQUERY&format=FORMAT
Here, SPARQLQUERY represents the SPARQL query which is
to be executed, and FORMAT represents the format of the
response, such as HTML, XML, JSON, CSV, RDF/XML, N3,
Turtle, JSON-LD, etc.
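Because a SPARQL query contains spaces, braces and question marks, it has to be URL-encoded before it is placed in the GET call. A minimal Python sketch of building such a URL is given below; the value "json" for the format parameter is an assumption, since the exact tokens accepted by the endpoint are not listed here, and the helper name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://sta.linkeddata.finki.ukim.mk/sparql"  # footnote 11

def build_query_url(sparql, fmt="json"):
    """Return the GET URL with the query and format safely URL-encoded."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": fmt})

query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(query)
print(url)

# Decoding the URL again recovers the original query text unchanged.
params = parse_qs(urlsplit(url).query)
print(params["query"][0] == query)
```

The URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.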
This gives developers access to additional information that was
previously unavailable from the STA dataset alone.
5. WEB APPLICATION
The main use case scenarios can be seen in the web application12
that we built for demonstration purposes. It uses data from the
Apache Jena Fuseki instance and provides the user with the most
recent transport data from the STA.
Figure 2. Details about city retrieved from DBpedia.
The web application uses our SPARQL endpoint to query data from
the local dataset and from the LOD Cloud. One simple use case is
showing information about a city (abstract, population, geolocation
data, etc.) retrieved from an external data source, DBpedia. This
is done by first choosing a corresponding Rest Area instance and
then selecting the city name. The information is then shown below,
so the user can see all the information about the city in one
window (Figure 2).
11 http://sta.linkeddata.finki.ukim.mk/sparql
12 http://sta.linkeddata.finki.ukim.mk/
6. CONCLUSION
The Semantic Web and the Linked Data concept are considered to
be the next generation of the Web: a web of data which is
structured and interlinked by its meaning. By publishing the data
in 5-star quality and contributing to the LOD Cloud, we hope to
have broadened the possibilities for its use and to motivate other
organizations to publish their data in this way.
In this paper we described a system we developed that
automatically gathers, transforms and publishes 5-star Linked
Open Data from the transport domain, while also making it
accessible through a SPARQL endpoint. The initial 3-star data
from the STA is transformed into Linked Open Data using the
TAO ontology. In addition, we demonstrated advanced use case
scenarios in which information from interlinked datasets is used,
thus enriching the user experience. Finally, we developed a web
application as a proof of concept which demonstrates the new
scenarios.
The main idea behind this paper was to present the development
of an automated system which brings forward new possibilities and
new scenarios that arise from publishing data as Linked Open Data.
From the point of view of isolated datasets, these advanced
scenarios were previously unavailable. In the future, we hope to
extend the ontology and develop it further so that it follows the
standards and models proposed by the INSPIRE Directive. That way,
new opportunities would arise for semantic annotation of published
data which follows those standards. Furthermore, we would like to
extend the possibilities by interlinking the STA data with more
datasets, and also to motivate organizations to publish data in
this way, thus increasing the value of the services they deliver.
REFERENCES
[1] T. H. Davenport, “Competing on analytics”, Harvard
Business Review, 2006, 84(1), p. 98.
[2] H. Chen, R. H. Chiang and V. C. Storey, “Business
Intelligence and Analytics: From Big Data to Big
Impact”, MIS Quarterly, 36(4), 2012, pp. 1165-1188.
[3] T. Berners-Lee and N. Shadbolt, “There’s gold to be mined
from all our data”, 2012.
[4] T. Berners-Lee, J. Hendler and O. Lassila, “The semantic
web”, Scientific American, 2001, 284(5), pp. 28-37.
[5] N. Shadbolt, W. Hall and T. Berners-Lee, “The semantic web
revisited”, Intelligent Systems, IEEE, 21(3), 2006, pp. 96-101.
[6] T. Berners-Lee, “Semantic web road map”, 1998.
[7] C. Bizer, T. Heath, K. Idehen and T. Berners-Lee, “Linked
data on the web”, 17th International conference on World
Wide Web, ACM, 2008, pp. 1265-1266.
[8] E. Misheva, B. Najdenov, M. Jovanovik and D. Trajanov,
“Open Public Transport Data in Macedonia”, 11th
Conference for Informatics and Information Technology
(CIIT), 2014.