
NOVA: a Knowledge Base for the Node-RED IoT Ecosystem

Arne Bröring¹, Victor Charpenay², Darko Anicic¹, and Sebastien Püech¹
¹ Siemens AG, Corporate Technology, Munich, Germany
² Friedrich-Alexander-Universität, Erlangen/Nürnberg, Germany
Abstract. Node-RED comprises a large ecosystem of nodes for IoT devices and services, which makes it a powerful tool for IoT application development. To facilitate the use of this heterogeneous ecosystem in industrial settings, we present the NOde-red library eVAluation (NOVA) approach for gathering the relevant metadata in a knowledge base, together with first analyses of the data.
1 A Knowledge Base for Node-RED
Today, Node-RED’s catalog comprises over 3,000 nodes and flows. A node implementation can be seen as an adapter for a device or service that makes its functionality available in the Node-RED ecosystem. Using the Node-RED graphical user interface, nodes can then be combined into so-called flows. A flow, which can also comprise sub-flows, represents an IoT application [1]. Node-RED’s catalog contains nodes for IoT platforms and devices (e.g., Xively or Raspberry Pi), Web services (e.g., Twitter), smart home products (e.g., Philips Hue or Amazon Alexa), industrial automation (e.g., Siemens S7, OPC UA, or Modbus), and analytics and machine learning (e.g., IBM Watson).
The ease of using the Node-RED UI to mash up IoT applications, in combination with the rich ecosystem, makes Node-RED valuable for industrial enterprises and their customers. However, descriptions of nodes and flows are currently not accessible as structured metadata. This hinders discovery and does not allow filtering based on quality rating or licensing information. More importantly, metadata containing explicit links from flows to their contained nodes does not exist at all. All these shortcomings hamper the usage of Node-RED nodes and flows in industrial settings.
In order to address these challenges, we have developed the NOde-red library eVAluation (NOVA) approach¹, which automatically collects large amounts of metadata to feed a knowledge base for further analyses.
At its centre, NOVA comprises a Web crawler that harvests available metadata about nodes and flows and stores it in a triple store. This is how the NOVA knowledge base is created. The process works similarly to DBpedia’s extraction manager [2]. Figure 1 shows our RDF model, which is based on schema.org [3]. While typical crawler approaches for the IoT (e.g., [4]) focus on harvesting sensor data from public IoT platforms, we address here the collection of metadata for IoT device adapters and application workflows.
Fig. 1. RDF model of the NOVA knowledge base.
¹ http://nova.iot-app.siemens.cloud
The entry point for our crawler is the Node-RED catalog², from which a descriptive Web page for each node package can be reached. For example, the page of the node package that connects Particle devices³ lists the contained nodes as well as general metadata (e.g., version, rating, keywords, or maintainers). This metadata is stored in the class NodePackage. The crawler follows links to the listed Web resources, (1) the NPM repository and (2) the GitHub project, and harvests further metadata from these pages.
Using NPM, the crawler downloads the software package and installs it locally. Thereby, all dependent libraries of the node package are downloaded, and their metadata is stored using the SoftwareLibrary class. This includes information on license and version, and indicates whether native code is included. This information is relevant for the choice of execution environment. Using GitHub’s REST API, the crawler gathers all available metadata regarding the source code (e.g., quality rating and number of commits) and stores this information in the CodeRepository class. Additionally, the source code itself is downloaded by the crawler and an automated analysis is triggered using PMD⁴. This static source code analyzer detects unreachable code, broken best practices, and error-prone code. The number of each kind of incident is stored in the CodeQualityCharacteristics class.
² https://flows.nodered.org
³ https://flows.nodered.org/node/node-red-contrib-particle
⁴ https://pmd.github.io/pmd-6.12.0/pmd_rules_ecmascript.html
Finally, all flows are accessed via the Node-RED catalog and their metadata is stored in the Flow class. Each flow references multiple nodes that make up its structure. A node that is part of a flow specifies, via its wires property, the other nodes to which its outputs are connected.
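To illustrate the wires property, the following is a minimal sketch that reads a flow in Node-RED's exported JSON format and lists its connections. The four-node flow and the helper name flow_edges are hypothetical and only serve as an example; this is not the crawler's actual code.

```python
import json

# Hypothetical four-node flow in Node-RED's exported JSON format.
# Each node carries an "id", a "type", and a "wires" list with one
# list of target node ids per output port.
FLOW_JSON = """
[
  {"id": "n1", "type": "inject",   "wires": [["n2"]]},
  {"id": "n2", "type": "function", "wires": [["n3"], ["n4"]]},
  {"id": "n3", "type": "debug",    "wires": []},
  {"id": "n4", "type": "debug",    "wires": []}
]
"""


def flow_edges(flow):
    """Return (source_id, output_port, target_id) for every wire in a flow."""
    edges = []
    for node in flow:
        # Nodes without outputs may have an empty or missing "wires" list.
        for port, targets in enumerate(node.get("wires", [])):
            for target in targets:
                edges.append((node["id"], port, target))
    return edges


flow = json.loads(FLOW_JSON)
edges = flow_edges(flow)
# Only the type name is available per node, not the providing package.
node_types = sorted({node["type"] for node in flow})
```

Note that each entry identifies a node only by its type name; the flow itself does not say which node package provides that type.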
2 Data Analyses
The crawling described above runs for several hours and currently produces around 650,000 triples. Using SPARQL, we analyzed the resulting knowledge base. As a starting point, we counted node packages (1,593) and their (directly and transitively) dependent libraries (4,482). In order to make these usable in commercial settings, license clearing is required. With the installation of the NPM package (Section 1), we gathered the license-related data. In post-processing, we took the SPDX license list as a reference⁵ and cleaned the license names by ignoring case, spaces and ‘-’ characters. Thereby, 105 out of 128 licenses overall match an SPDX license. The most used license is MIT (41,528), followed by ISC (7,897) and Apache 2.0 (4,140). The long tail of licenses (113) has fewer than 10 occurrences each. Having the license metadata available enables us to automatically check the license compatibility for a specific node package.
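The license clean-up described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' post-processing code: SPDX_IDS is a small hand-picked subset of SPDX identifiers, and the raw license names are invented examples.

```python
import re

# Assumption: a small subset of SPDX identifiers for illustration;
# the full reference list is published at spdx.org/licenses.
SPDX_IDS = ["MIT", "ISC", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"]


def normalize(name):
    """Ignore case, whitespace, and '-' characters in a license name."""
    return re.sub(r"[\s\-]", "", name).lower()


# Lookup table from normalized key to the canonical SPDX identifier.
SPDX_BY_KEY = {normalize(spdx_id): spdx_id for spdx_id in SPDX_IDS}


def match_spdx(raw_name):
    """Return the matching SPDX identifier, or None if nothing matches."""
    return SPDX_BY_KEY.get(normalize(raw_name))


# Invented examples of license strings as they might appear in package metadata.
raw_names = ["mit", "Apache 2.0", "apache-2.0", "BSD 3 Clause", "Custom License"]
matches = {name: match_spdx(name) for name in raw_names}
```

Names that normalize to the same key collapse onto one SPDX identifier, while names without a counterpart stay unmatched and would need manual review.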
Next, we address a known issue with the Node-RED specification of flows: when a flow refers to a node, the reference is made not by node package identifier but simply by the name of the node type. Consequently, it can be difficult for the user to find the correct node package that is required to run a desired flow. The issue is all the more pressing because the name of the node type can be ambiguous. In the 984 flows overall, we found 20,311 node references to the 5,010 nodes that are available in the 1,593 node packages. Of these node references, 8,112 are ambiguous, coming from 90 different node packages. For instance, in a flow⁶ implementing smart metering on a Raspberry Pi, the listed "server" node is ambiguous, as it is used in 2 node packages. In order to reuse this flow, a user would have to manually determine the correct node package.
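The ambiguity can be made concrete with a small sketch that indexes node types by the packages providing them and then classifies the references of a flow. All package names and node types below are hypothetical, and the code only approximates the analysis that the paper performs with SPARQL.

```python
from collections import defaultdict

# Hypothetical (package, node type) pairs; real values would come
# from the knowledge base.
PACKAGE_NODES = [
    ("pkg-http-a", "server"),
    ("pkg-http-b", "server"),
    ("pkg-http-a", "request"),
    ("pkg-meter", "meter"),
]


def packages_by_type(package_nodes):
    """Index every node type by the set of packages that provide it."""
    index = defaultdict(set)
    for package, node_type in package_nodes:
        index[node_type].add(package)
    return index


def classify_references(flow_node_types, index):
    """Split a flow's node references by how many packages offer the type."""
    result = {"unambiguous": [], "ambiguous": [], "unknown": []}
    for node_type in flow_node_types:
        count = len(index.get(node_type, ()))
        if count == 1:
            result["unambiguous"].append(node_type)
        elif count > 1:
            result["ambiguous"].append(node_type)
        else:
            result["unknown"].append(node_type)
    return result


index = packages_by_type(PACKAGE_NODES)
classified = classify_references(["server", "meter", "legacy"], index)
```

A reference is ambiguous exactly when more than one package offers the referenced type, which is the situation of the "server" node in the example flow.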
CONSTRUCT {
  ?flowNode schema:sameAs ?node .
} WHERE {
  ?node a model:Node ;
        model:nodeType ?type .
  FILTER NOT EXISTS {
    ?other a model:Node ;
           model:nodeType ?type .
    FILTER (?node != ?other)
  }
  ?flow a model:Flow ;
        model:usesNode ?flowNode .
  ?flowNode model:nodeType ?type .
}
Listing 1.1. Unambiguous refs.
CONSTRUCT {
  ?ambiguous schema:sameAs ?other .
} WHERE {
  ?flow a model:Flow ;
        model:usesNode ?ambiguous, ?unambiguous .
  ?ambiguous model:nodeType ?typeAmbiguous .
  FILTER NOT EXISTS {
    ?ambiguous schema:sameAs ?node .
  }
  ?unambiguous schema:sameAs ?node .
  ?node model:nodeHostedOn ?repo .
  ?other model:nodeType ?typeAmbiguous ;
         model:nodeHostedOn ?repo .
}
Listing 1.2. Refs. from the same repository
⁵ https://spdx.org/licenses/
⁶ https://flows.nodered.org/flow/ecd2b4f8af1b218df41258adb019184e
We address this issue by disambiguating nodes in flows. Listing 1.1 links node packages to node references in flows if they are unambiguous (i.e., if flows refer to a unique node type value). The query in Listing 1.2 builds upon it and attempts to resolve ambiguity by looking at ambiguous and unambiguous nodes from the same repository (or node package) used in the same flow. The rationale is that nodes of the same package tend to be used together. In this way, 311 of the 984 flows could be disambiguated. In the dataset, 120 nodes with a valid repository URL are ambiguous; 75 of them are used in at least one flow.
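The two-step heuristic behind Listings 1.1 and 1.2 can also be sketched procedurally. The following is an approximation under stated assumptions, not a translation of the queries: INDEX and its package names are hypothetical, and, unlike Listing 1.2, the sketch accepts a match only if exactly one candidate package is already used in the flow.

```python
# Hypothetical index from node type to the packages that provide it.
INDEX = {
    "server": {"pkg-a", "pkg-b"},
    "request": {"pkg-a"},
    "parser": {"pkg-c", "pkg-d"},
}


def resolve_flow(flow_types, index):
    """Map each node type used in one flow to a package, or None."""
    resolved = {}

    # Step 1 (cf. Listing 1.1): a type offered by exactly one package
    # is unambiguous and can be linked directly.
    for node_type in flow_types:
        candidates = index.get(node_type, set())
        if len(candidates) == 1:
            resolved[node_type] = next(iter(candidates))

    # Step 2 (cf. Listing 1.2): prefer a candidate package that an
    # unambiguous node of the same flow already comes from.
    used_packages = set(resolved.values())
    for node_type in flow_types:
        if node_type in resolved:
            continue
        shared = index.get(node_type, set()) & used_packages
        # Assumption of this sketch: require a single shared candidate.
        resolved[node_type] = next(iter(shared)) if len(shared) == 1 else None

    return resolved


resolution = resolve_flow(["server", "request", "parser"], INDEX)
```

Here "server" is resolved to the package of the unambiguous "request" node, whereas "parser" stays unresolved because none of its candidate packages appears elsewhere in the flow.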
3 Conclusions & Future Work
We present the NOVA approach and showcase, for example, automated license clearing, which can be extended similarly to [5] by checking composite license characteristics for compatibility with the planned usage of a node. Second, we addressed the issue of missing links from flows to their contained nodes, which is aggravated by the fact that the names of listed nodes can be ambiguous. Using SPARQL, we show a two-step querying process that addresses this issue and resolves the ambiguity for many nodes.
The possibilities for future use of and research on the created knowledge base are broad. Automated quality checks or indexes can be developed based on different input parameters from the NOVA model. These could indicate to users whether a component can be utilized or not. Discovery could be improved too, by semantically annotating nodes and flows with categories, or by transforming their keywords into links to well-defined terms. This way, links to a more general knowledge base, such as Wikidata [6], could be built up.
Acknowledgement
This work has been supported through the project SEMIoTICS funded by the
European Union H2020 programme under grant agreement No. 780315.
References
1. N. K. Giang, M. Blackstock, R. Lea, V. C. Leung, Developing IoT applications in the fog: A distributed dataflow approach, in: 2015 5th International Conference on the Internet of Things (IOT), IEEE, 2015, pp. 155–162.
2. J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer, et al., DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6 (2) (2015) 167–195.
3. R. V. Guha, D. Brickley, S. Macbeth, Schema.org: Evolution of structured data on the web, Communications of the ACM 59 (2) (2016) 44–51.
4. A. Shemshadi, Q. Z. Sheng, Y. Qin, ThingSeek: A crawler and search engine for the Internet of Things, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2016, pp. 1149–1152.
5. S. Villata, F. Gandon, Licenses compatibility and composition in the web of data, in: Third International Workshop on Consuming Linked Data (COLD2012), 2012.
6. D. Vrandečić, M. Krötzsch, Wikidata: A free collaborative knowledge base, Communications of the ACM 57 (2014) 78–85.