All content in this area was uploaded by Arne Bröring on May 06, 2019
NOVA: a Knowledge Base for the Node-RED IoT
Ecosystem
Arne Bröring¹, Victor Charpenay², Darko Anicic¹, and Sebastien Püech¹
¹ Siemens AG, Corporate Technology, Munich, Germany
² Friedrich-Alexander-Universität, Erlangen/Nürnberg, Germany
Abstract. Node-RED comprises a large ecosystem of nodes for IoT devices and
services, which makes it a powerful tool for IoT application development. To
facilitate the use of this heterogeneous ecosystem in industrial settings, we
present the NOde-red library eVAluation (NOVA) approach for gathering the
relevant metadata in a knowledge base, together with first analyses of the
data.
1 A Knowledge Base for Node-RED
Today, Node-RED’s catalog comprises over 3,000 nodes and flows. A node
implementation can be seen as an adapter for a device or service that makes its
functionality available in the Node-RED ecosystem. Using the Node-RED graphical
user interface, nodes can then be combined into so-called flows. A flow, which
can also comprise sub-flows, represents an IoT application [1]. Node-RED’s
catalog contains nodes for IoT platforms and devices (e.g., Xively or Raspberry
Pi), Web services (e.g., Twitter), smart home products (e.g., Philips Hue or
Amazon Alexa), industrial automation (e.g., Siemens S7, OPC UA, or Modbus), and
analytics and machine learning (e.g., IBM Watson).
The ease of using the Node-RED UI to mash up IoT applications, in combination
with the rich ecosystem, makes Node-RED valuable for industrial enterprises and
their customers. However, descriptions of nodes and flows are currently not
accessible as structured metadata. This hinders discovery and does not allow
filtering based on quality rating or licensing information. More importantly,
metadata containing explicit links from flows to the nodes they contain does not
exist at all. All these shortcomings hamper the use of Node-RED nodes and flows
in industrial settings.
To address these challenges, we have developed the NOde-red library
eVAluation (NOVA) approach¹, which automatically collects large amounts of
metadata to feed a knowledge base for further analyses.
¹ http://nova.iot-app.siemens.cloud
At its centre, NOVA comprises a Web crawler that harvests the available metadata
about nodes and flows and stores it in a triple store. This is how the NOVA
knowledge base is created. The process works similarly to DBpedia’s extraction
manager [2]. Figure 1 shows our RDF model, which is based on schema.org [3].
While typical crawler approaches for the IoT (e.g., [4]) focus on harvesting
sensor data from public IoT platforms, we address here the collection of
metadata for IoT device adapters and application workflows.
Fig. 1. RDF model of the NOVA knowledge base.
The entry point for our crawler is the Node-RED catalog², which links to a
descriptive Web page for each node package. For example, the page of the node
package that connects Particle devices³ lists the contained nodes as well as
general metadata (e.g., version, rating, keywords, or maintainers). This
metadata is stored in the class NodePackage. The crawler follows links to the
listed Web resources, (1) the NPM repository and (2) the GitHub project, and
harvests further metadata from these pages.
² https://flows.nodered.org
³ https://flows.nodered.org/node/node-red-contrib-particle
⁴ https://pmd.github.io/pmd-6.12.0/pmd_rules_ecmascript.html
Using NPM, the crawler downloads the software package and installs it locally.
In doing so, all libraries the node package depends on are downloaded, and their
metadata is stored using the SoftwareLibrary class. This includes information on
license and version, and a hint as to whether native code is included, which is
relevant for the choice of execution environment. Using GitHub’s REST API, the
crawler gathers all available metadata regarding the source code (e.g., quality
rating and number of commits) and stores this information in the CodeRepository
class. Additionally, the crawler downloads the source code itself and triggers
an automated analysis using PMD⁴. This static source code analyzer detects
unreachable code, violations of best practices, and error-prone code. The number
of incidents of each kind is stored in the CodeQualityCharacteristics class.
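To make the dependency-harvesting step more concrete, the following Python sketch scans a locally installed node_modules directory and records name, version, and license for each package. It is not the NOVA crawler's code. The function name scan_dependencies, the record layout, and the demo packages are hypothetical. Treating the presence of a binding.gyp file as the hint for native code is an assumption of this sketch, since the text does not state how the hint is derived.

```python
import json
import tempfile
from pathlib import Path


def scan_dependencies(node_modules):
    """Collect name, version, license, and a native-code hint per installed package."""
    records = []
    # Only top-level packages are visited; scoped packages (@scope/name) are skipped.
    for manifest in sorted(Path(node_modules).glob("*/package.json")):
        meta = json.loads(manifest.read_text(encoding="utf-8"))
        records.append({
            "name": meta.get("name", manifest.parent.name),
            "version": meta.get("version"),
            "license": meta.get("license"),
            # Assumption: a binding.gyp file signals a native (node-gyp) addon.
            "native": (manifest.parent / "binding.gyp").exists(),
        })
    return records


# Demo on a throwaway directory with two made-up packages.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, license_id, native in [("alpha", "MIT", False), ("beta", "ISC", True)]:
        pkg = root / name
        pkg.mkdir()
        (pkg / "package.json").write_text(
            json.dumps({"name": name, "version": "1.0.0", "license": license_id}),
            encoding="utf-8",
        )
        if native:
            (pkg / "binding.gyp").write_text("{}", encoding="utf-8")
    records = scan_dependencies(root)
```

Each record could then be mapped to an instance of the SoftwareLibrary class; that mapping is left out here because the property names of the RDF model are not given in the text.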
Finally, all flows are accessed via the Node-RED catalog and their metadata
is stored in the Flow class. Each flow references multiple nodes that make up its
structure. A node that is part of a flow specifies, via the wires property,
to which other nodes its outputs are connected.
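As an illustration of the wires property, the following Python sketch reads a flow in Node-RED's exported JSON format (a list of node objects with id, type, and wires) and derives the connections between nodes and the node types the flow refers to. The sample flow and the helper names flow_edges and referenced_types are hypothetical and not taken from the NOVA crawler.

```python
import json

# Hypothetical three-node flow in Node-RED's export format (illustrative only).
SAMPLE_FLOW = """
[
  {"id": "n1", "type": "inject", "wires": [["n2"]]},
  {"id": "n2", "type": "function", "wires": [["n3"], ["n3"]]},
  {"id": "n3", "type": "debug", "wires": []}
]
"""


def flow_edges(flow_json):
    """Return (source_id, output_port, target_id) for every wire in the flow."""
    edges = []
    for node in json.loads(flow_json):
        # "wires" holds one list of target node ids per output port.
        for port, targets in enumerate(node.get("wires", [])):
            for target in targets:
                edges.append((node["id"], port, target))
    return edges


def referenced_types(flow_json):
    """Return the sorted node type names a flow refers to."""
    return sorted({node["type"] for node in json.loads(flow_json) if "type" in node})


edges = flow_edges(SAMPLE_FLOW)
types = referenced_types(SAMPLE_FLOW)
```

Note that the flow only carries type names such as "function", not the identifier of the package that provides the node.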
2 Data Analyses
The crawling described above runs for several hours and currently produces
around 650,000 triples. Using SPARQL, we analyzed the resulting knowledge base.
As a starting point, we counted node packages (1,593) and the libraries they
depend on directly or transitively (4,482). To make these usable in commercial
settings, license clearing is required. With the installation of the NPM package
(Section 1), we gathered the license-related data. In post-processing, we took
the SPDX license list as a reference⁵ and cleaned the license names by ignoring
case, spaces, and ‘-’ characters. As a result, 105 of the 128 licenses overall
match an SPDX license. The most used license is MIT (41,528), followed by ISC
(7,897) and Apache 2.0 (4,140). The long tail of licenses (113) has fewer than
10 occurrences each. Having the license metadata available enables us to
automatically check the license compatibility for a specific node package.
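The license name cleaning described above can be sketched in Python as follows. The normalization (ignoring case, spaces, and '-' characters) follows the text. The SPDX subset, the raw license strings, and the function names are illustrative assumptions, and the real SPDX list is much longer.

```python
def normalize(name):
    """Lower-case a license name and drop spaces and '-' characters."""
    return name.lower().replace(" ", "").replace("-", "")


# Small, hand-picked subset of SPDX identifiers (assumption for this sketch).
SPDX_SUBSET = ["MIT", "ISC", "Apache-2.0", "BSD-3-Clause"]
SPDX_INDEX = {normalize(spdx_id): spdx_id for spdx_id in SPDX_SUBSET}


def match_spdx(raw_name):
    """Return the matching SPDX identifier, or None if there is no match."""
    return SPDX_INDEX.get(normalize(raw_name))


# Hypothetical raw license strings as they might appear in package metadata.
raw = ["mit", "Apache 2.0", "apache-2.0", "BSD 3 Clause", "Custom License"]
matches = {name: match_spdx(name) for name in raw}
```

Names that map to None would correspond to the licenses that do not match any SPDX entry and need manual review.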
Next, we address a known issue with the Node-RED specification of flows: when a
flow refers to a node, the reference is made not by node package identifier but
simply by the name of the node type. Consequently, it can be difficult for the
user to find the correct node package required to run a desired flow. The issue
is all the more pressing because the name of the node type can be ambiguous. In
the 984 flows overall, we found 20,311 node references to the 5,010 nodes that
are available in the 1,593 node packages. Of these node references, 8,112 are
ambiguous; they come from 90 different node packages. For instance, in a flow⁶
implementing smart metering on a Raspberry Pi, the listed "server" node is
ambiguous, as it appears in 2 node packages. In order to reuse this flow, a user
would have to manually determine the correct node package.
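The ambiguity of node type names can be illustrated with a small Python sketch that indexes node types by the packages providing them and then classifies the types a flow refers to. The package names, node types, and function names are made up for illustration and do not come from the NOVA dataset.

```python
from collections import defaultdict

# Hypothetical (package, node type) pairs; names are illustrative only.
PACKAGE_NODES = [
    ("pkg-a", "server"),
    ("pkg-b", "server"),
    ("pkg-a", "client"),
    ("pkg-c", "sensor"),
]


def packages_by_type(pairs):
    """Index the set of packages that provide each node type."""
    index = defaultdict(set)
    for package, node_type in pairs:
        index[node_type].add(package)
    return index


def classify(flow_types, index):
    """Split a flow's node types into resolved, ambiguous, and unknown."""
    resolved, ambiguous, unknown = {}, {}, []
    for node_type in flow_types:
        candidates = index.get(node_type, set())
        if len(candidates) == 1:
            resolved[node_type] = next(iter(candidates))
        elif candidates:
            ambiguous[node_type] = sorted(candidates)
        else:
            unknown.append(node_type)
    return resolved, ambiguous, unknown


index = packages_by_type(PACKAGE_NODES)
resolved, ambiguous, unknown = classify(["server", "sensor", "inject"], index)
```

In this toy data, "server" is provided by two packages, so a flow that uses it does not determine which package has to be installed.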
CONSTRUCT {
  ?flowNode schema:sameAs ?node .
} WHERE {
  ?node a model:Node ;
        model:nodeType ?type .
  FILTER NOT EXISTS {
    ?other a model:Node ;
           model:nodeType ?type .
    FILTER (?node != ?other)
  }
  ?flow a model:Flow ;
        model:usesNode ?flowNode .
  ?flowNode model:nodeType ?type .
}
Listing 1.1. Unambiguous refs.
CONSTRUCT {
  ?ambiguous schema:sameAs ?other .
} WHERE {
  ?flow a model:Flow ;
        model:usesNode ?ambiguous, ?unambiguous .
  ?ambiguous model:nodeType ?typeAmbiguous .
  FILTER NOT EXISTS {
    ?ambiguous schema:sameAs ?node .
  }
  ?unambiguous schema:sameAs ?node .
  ?node model:nodeHostedOn ?repo .
  ?other model:nodeType ?typeAmbiguous ;
         model:nodeHostedOn ?repo .
}
Listing 1.2. Refs. from the same repository
⁵ https://spdx.org/licenses/
⁶ https://flows.nodered.org/flow/ecd2b4f8af1b218df41258adb019184e
We address this issue by disambiguating nodes in flows. Listing 1.1 links nodes
of node packages to node references in flows if they are unambiguous (i.e., if
flows refer to a unique node type value). The query in Listing 1.2 builds upon
it and attempts to resolve ambiguity by looking at ambiguous and unambiguous
nodes from the same repository (or node package) used in the same flow. The
rationale is that nodes of the same package tend to be used together. In this
way, 311 of the 984 flows could be disambiguated. In the dataset, 120 nodes with
a valid repository URL are ambiguous; 75 of them are used in at least one flow.
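The two-step idea behind Listings 1.1 and 1.2 can also be sketched procedurally in Python. This is a simplified analogue on plain dictionaries, not a translation of the SPARQL queries. The catalog contents, the node identifiers, and the function name disambiguate are hypothetical.

```python
# Hypothetical catalog: node id -> (node type, repository). Illustrative only.
CATALOG = {
    "a/server": ("server", "repo-a"),
    "b/server": ("server", "repo-b"),
    "a/meter": ("meter", "repo-a"),
}


def disambiguate(flow_types, catalog):
    """Map each node type used in a flow to its candidate node ids."""
    by_type = {}
    for node_id, (node_type, _repo) in catalog.items():
        by_type.setdefault(node_type, []).append(node_id)

    # Step 1 (cf. Listing 1.1): a type offered by exactly one node is unambiguous.
    links = {t: by_type[t] for t in flow_types if len(by_type.get(t, [])) == 1}

    # Repositories of the nodes resolved in step 1 for this flow.
    repos = {catalog[ids[0]][1] for ids in links.values()}

    # Step 2 (cf. Listing 1.2): keep candidates hosted in one of those repositories.
    for t in flow_types:
        if t not in links and t in by_type:
            same_repo = [n for n in by_type[t] if catalog[n][1] in repos]
            if same_repo:
                links[t] = same_repo
    return links


links = disambiguate(["server", "meter"], CATALOG)
```

Here the unambiguous "meter" node anchors the flow to repo-a, which selects "a/server" for the ambiguous "server" reference. A flow that uses only "server" would stay unresolved.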
3 Conclusions & Future Work
We presented the NOVA approach and showcased, for example, automated license
clearing, which can be extended similarly to [5] by checking composite license
characteristics for compatibility with the planned usage of a node. Second, we
addressed the issue of missing links from flows to the nodes they contain, which
is aggravated by the fact that the names of listed nodes can be ambiguous. Using
SPARQL, we showed a two-step querying process that addresses this issue and
resolves the ambiguity for many nodes.
The possibilities for future use of and research on the created knowledge base
are broad. Automated quality checks or indexes can be developed based on
different input parameters from the NOVA model. These would indicate to users
whether a component can be utilized or not. Discovery could be improved too, by
semantically annotating nodes and flows with categories or by transforming their
keywords into links to well-defined terms. In this way, links to a more general
knowledge base, such as Wikidata [6], could be built up.
Acknowledgement
This work has been supported through the project SEMIoTICS funded by the
European Union H2020 programme under grant agreement No. 780315.
References
1. N. K. Giang, M. Blackstock, R. Lea, V. C. Leung, Developing IoT applications in
the fog: A distributed dataflow approach, in: 2015 5th International Conference on
the Internet of Things (IOT), IEEE, 2015, pp. 155–162.
2. J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes,
S. Hellmann, M. Morsey, P. Van Kleef, S. Auer, et al., DBpedia – a large-scale,
multilingual knowledge base extracted from Wikipedia, Semantic Web 6 (2) (2015)
167–195.
3. R. V. Guha, D. Brickley, S. Macbeth, Schema.org: Evolution of structured data on
the web, Communications of the ACM 59 (2) (2016) 44–51.
4. A. Shemshadi, Q. Z. Sheng, Y. Qin, ThingSeek: A crawler and search engine for the
Internet of Things, in: Proceedings of the 39th International ACM SIGIR Conference
on Research and Development in Information Retrieval, ACM, 2016, pp. 1149–1152.
5. S. Villata, F. Gandon, Licenses compatibility and composition in the web of data,
in: Third International Workshop on Consuming Linked Data (COLD2012), 2012.
6. D. Vrandečić, M. Krötzsch, Wikidata: A free collaborative knowledge base,
Communications of the ACM 57 (2014) 78–85.