PROCEEDINGS OF ECOS 2021 - THE 34TH INTERNATIONAL CONFERENCE ON
EFFICIENCY, COST, OPTIMIZATION, SIMULATION AND ENVIRONMENTAL IMPACT OF ENERGY SYSTEMS
JUNE 27-JULY 2, 2021, TAORMINA, ITALY
Automatic digital twin data model generation of building
energy systems from piping and instrumentation
diagrams
Florian Stinner^a, Martin Wiecek^a, Marc Baranski^a, Alexander Kümpel^a, Dirk Müller^a
^a RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate,
Aachen, Germany, fstinner@eonerc.rwth-aachen.de
Abstract:
Buildings directly and indirectly emit a large share of current CO2 emissions. There is a high potential for CO2 savings
through modern control methods in building automation systems (BAS) like model predictive control (MPC). For a proper
control, MPC needs mathematical models to predict the future behavior of the controlled system. For this purpose, digital
twins of the building can be used. However, with current methods, setting up a digital twin of an existing building is usually
labor-intensive. Connecting the different components of the technical system into an overall digital twin of the building is
especially time-consuming. Piping and instrumentation diagrams (P&ID) can provide the needed information, but this information
must first be extracted and provided in a standardized format for further processing.
In this work, we present an approach to recognize symbols and connections of P&ID from buildings in a completely
automated way. There are various standards for graphical representation of symbols in P&ID of building energy systems.
Therefore, we use different data sources and standards to generate a holistic training data set. We apply algorithms for
symbol recognition, line recognition and derivation of connections to the data sets. Furthermore, the result is exported to a
format that provides semantics of building energy systems.
The symbol recognition, line recognition and connection recognition show good results with an average precision of 93.7%.
These results can be used in further processes like control generation, (distributed) model predictive control or fault detection.
Nevertheless, the approach needs further research.
Keywords:
Building automation systems, Digital twin, Piping and instrumentation diagrams (P&ID), Building energy system,
Topology detection, Brick Schema.
1. Introduction
Climate change is the ultimate challenge for economics [1]. Buildings have a special importance in the fight against
climate change. If indirect emissions are included, buildings account for 28% of global [2] and 36% of European CO2
emissions [3]. In Europe, the current renovation rate of 1% is too low to achieve the required reduction in greenhouse
gases [4], as 3% would be required [3]. The current renovation rate results in an average building lifespan between 40 and
120 years [5]. In 2050, 75% of the buildings in OECD countries that existed in 2010 will still be in use [5]. Therefore,
methods for minimally invasive improvement of energy systems in existing buildings must be developed.
In particular, non-residential buildings are equipped with a comprehensive building automation system (BAS). However,
their control systems are incorrectly set for 90% of the total floor area [6]. About 20-30% of total energy consumption in existing
non-residential buildings with BAS can be avoided by improved control alone [5]. These novel control methods can include
control generation [7], (distributed) model predictive control ((D)MPC) [8] or fault detection and diagnostics (FDD) [9]
including anomaly detection [10].
Digital planning processes could remedy the deficiencies in BAS. Nevertheless, incorrectly planned controls of existing
buildings are difficult to optimize. A digital twin can connect physical objects of an energy system with its virtual
objects [11] so that there is a seamless transmission of data between them [12]. With the help of a precise digital twin of
the building energy system, building automation can be optimized [13].
Machine learning applications can support the generation of a digital twin. In order to use automatic control systems like
energy-cyber-physical systems [14], system knowledge of building energy systems (BES) and their technical building
equipment (TBE) must be acquired. If this knowledge is not digitally processable in existing buildings, it must be gained
in a personnel-intensive way.
Piping and instrumentation diagrams (P&ID) provide information on how the parts of an energy system are linked, such
as connecting a boiler to a distribution system. They are often not available in a computer-interpretable form. Digitizing
P&ID is a cost-effective alternative to labor-intensive manual processing and increases the understanding of the
system [15]. P&ID are often available as a file in PDF format or as a printed or drawn plan. These plans provide valuable
information for the optimization of building operation and the creation of a digital twin. The costs of optimizing existing
BES can be reduced by automated extraction of information contained in the plans.
In order to control the increasingly complex control systems in a stable and optimized way, they must be permanently
improved by new systems, solutions and products [16]. Here, a digital twin can assist operation [17]. A format that
includes domain knowledge in building services engineering, especially for analysis of BAS, is crucial to ensure that this
format can continue to be used during its lifespan.
For the analysis of building operation, the Brick Schema was developed [18]. It contains most of the necessary connections
of energy systems and descriptions of their sensors and actuators and is expected to become part of BACnet, which is one
of the most widely used protocols for BAS [19].
The sensors, actuators and controllers of a building generate data streams. These data streams also have a data stream
identifier. It is usually constructed by a label structure previously defined by the building owner or building automation
company. Since the label structures differ greatly in practice [20, 21], they must be automatically processable. For this
purpose the BUDO Schema was developed as a universal data stream identifier [21]. It can be used for existing and new
buildings.
In our approach, the P&ID is transferred to Brick Schema for further processing and to the BUDO Schema for conversion
to a label structure. Building energy performance simulation (BEPS) models can then be generated from BUDO Schema
and Brick Schema [22]. The reduction of personnel expenses in BEPS generation is crucial to ensure that BEPS can be
used on a scalable basis [23] and in model predictive control [8]. The labels according to BUDO Schema are integrable
into the BAS. BUDO Schema offers a possibility to name and identify data streams directly from BEPS.
In this paper, we examine whether piping and instrumentation diagrams are a valid source for extracting the topology
of existing building energy systems of non-residential buildings and their technical building equipment for modern
applications in BAS such as model predictive control, building energy performance simulation (BEPS) or fault detection and
diagnostics.
Our contributions are as follows:
• Creation of a data set for detection of topology of various standards in plans of building energy systems
• Algorithm for export of P&ID of building energy systems into a digital twin model for
  ◦ model predictive control
  ◦ fault detection and diagnostics
  ◦ automated building energy performance simulation model generation
  ◦ integration into building automation system
• Time tracking of all needed process steps
First, we describe the data set. The training data set is generated from three different sources. The test data set
consists of cutouts from five P&ID that come from different vendors and follow different standards. Based on this, we present the
methodology used for symbol detection, line detection and line crossing detection. Afterwards, we present the results generated
using our method and our data set. Finally, we discuss the limitations of our approach and compare our results with
approaches from industrial applications. We close with an outlook on the applications for which our approach is suitable.
2. Related work
There are few commercial solutions for digitization of inventory plans. These are mainly limited to visualization, but do
not include the data transfer for automation systems [24]. For the history of image recognition and processing of P&ID,
we refer to [25].
There are only a few existing automatic approaches for analyzing P&ID of BES. These used computer-aided design (CAD)
models or Building Information Models (BIM) as input. [26] developed a symbol recognition for BES P&ID based on
Faster R-CNN [27]. However, this approach lacks the recognition of connections between identified symbols and their
export.
Building Information Model (BIM) is a digital model and process that is used to coordinate the various trades, for example
building automation. Such a digital model could be a good data source of the building energy systems of existing buildings.
However, its dissemination is limited [28]. For example, a survey conducted by the German Chamber of Architects in
2017 indicated that only 12% of surveyed architects use BIM at all, and only 47% of these use BIM in all projects [29].
Thus, it can be assumed that only a very small part of the existing buildings was modeled with BIM rather than with CAD.
Additionally, the CAD data of buildings are often not transferred to the building operator. Therefore, alternative ways for
information extraction have to be found.
In buildings, various standards are used for the creation of P&ID as BAS partly controls energy supply of industrial
plants [30–42]. The symbols and designations changed over time and are not used consistently in practice. This is partly
due to the harmonization of standards. This makes a universal approach to the digitization of existing P&ID of BES into
a computer-interpretable form difficult.
The approaches for the automatic processing of P&ID shown so far concentrate on the automatic analysis of industrial
plants [15, 43–53]. Partly, the same technical systems are considered in the cross-sectional technology as in buildings.
However, these systems are often connected differently in industry than in buildings. The budget for the analysis of
systems also differs significantly here. As far as the authors are aware, there is no common approach for exporting P&ID
from BES into a computer-interpretable form.
There are different approaches for the detection of connections of symbols in technical systems. [26,43,52] only recognize
symbols of P&ID. We have no access to the algorithms used in [15, 49, 54] or they remain unclear. [44] additionally use
real data stream data to support topology recognition. [55] use human-machine interface (HMI) of technical systems.
However, details of used algorithms remain unclear.
Geometric matching is used in [48] for symbol detection, which presupposes that the symbols for an equipment type
in different plans are very similar. The square scan algorithm [56] is used there for line detection. However, no results
are reported and only one plan was involved. [45] apply a heuristic algorithm to improve the object and connection recognition
of complex P&ID while [51] use an independent process for connection detection based on pixel counting. [44] used
rule-based connection detection and thus an approach that is dependent on the creation of plans and standards it contains.
Probabilistic Hough transformation [57] is used in [50] for connection recognition. They use fully convolutional networks
(FCN) [58] with neural networks for symbol recognition. However, it is not exported into a reusable format. Additionally,
they achieved only 65.2% accuracy in detecting connections in the plan.
None of the examples shown in previously mentioned approaches has transferred a P&ID of a BES into a digital twin
model using object recognition and topology extraction.
3. Data set
Since the standards for objects in P&ID vary in BES, we use different data sources for raw data to generate a comprehensive
data set for training and testing. Raw data is defined in this work as graphical documents that contain information
needed for the detection tasks but cannot directly be used for creating the data set of TBE symbols or for connection
derivation. Instead, preprocessing is required. We use in this work:
• Google Crawler image web search data [59]
• data from standards that standardize TBE symbols
• data from P&ID
All accessible examples of P&ID contain textual descriptions of the individual sensors and actuators, e.g. according to
DIN EN 81346 [60]. Such an extensive description of the sensors could not be found in any of the BES examples we considered.
In P&ID of BES, textual annotations are often based on natural language or non-standard abbreviations and are individually
integrated into the plan. For these reasons, the text entries in the plans are not considered in the prepared
data set or in the algorithm.
The data sets we use for object detection cover four TBE symbols: pump, valve, heat exchanger and flap. They can
be found most frequently in P&ID of BES and have a high influence on the control. Figure 1 shows a summary of the
training and test data set.
Figure 1: Histogram of the training and test data set showing number of images and objects by class (pump, valve,
HX=heat exchanger, flap, total, raw data sorted by source, connection detection)
The sum of images for all TBE symbols is not equal to the total number of images, as one image can contain different TBE
symbols. The training data set contains 154 images in total, of which 84 images were extracted using Google Crawler [59]
with the following keywords:
• {”pump”, ”valve”, ”heat exchanger”, ”flap”} each combined with:
• {”symbol”, ”symbol iso”, ”standard symbol”, ”19227”, ”81346”, ”60417”}
Thirteen standards are used [30–42], from which 25 cutouts were generated. Further, 45 images are plan cutouts from five P&ID
from different buildings and design companies. These P&ID include complex energy systems such as district energy
systems.
As data source for the test data we use plan cutouts. The test image data set consists of 45 images from different technical
plans. The test to training data ratio is approximately 1:3. As the test data is generated with the focus on including images
that can be used for connection detection, the test data contains more valve objects than the training data.
Unfortunately, there is no publicly available data set for P&ID and for BES in particular. For creating the data set that was
used for our symbol recognition training process, the following preprocessing steps were done:
• Google Crawler data sorting
• P&ID cutout generation
• standards cutout generation
• training data labeling
The time required for these steps can be found in Figure 6.
The requirements for a test data set for evaluating the connection detection differ from those for object detection. It is necessary
that connections between objects are shown. Therefore, only the subset of the test data that complies with
this requirement was chosen for the connection test data set. It consists of 26 images and 566 classification elements.
4. Method
In [61], a toolchain is presented that shows how raw data of building automation systems and the digitization of P&ID support new
control approaches like control generation, (distributed) model predictive control or fault detection in building automation
systems. In this paper, we present the digitization of P&ID of BES into a computer-interpretable form.
In our algorithm, first, the objects within P&ID are recognized with object detection on the basis of their symbols.
Subsequently, the lines of P&ID and crossings of these are identified. From all this information, the next step is to determine
whether a connection exists between the objects. At the end, the connections are exported into a topology model for
digital twin generation.
4.1. Implementation of plan topology detection algorithms
For the detection of the plan elements and topology a step-wise approach was used. The overall structure of the approach
can be seen in Figure 2.
A Python-based object-oriented structure was established in our algorithm, which holds the information gathered from
our algorithms. The input plan image is saved. Further, for all three core objects of the plan (TBE, technical lines and
line crossings) three classes TechEquipment, TechLine and lineCrossing are established. Furthermore, a two-dimensional
connectionMatrix describes the connections between all TBE.
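As an illustration, the object-oriented structure described above could look like the following minimal Python sketch. Only the class names TechEquipment, TechLine and lineCrossing and the two-dimensional connection matrix come from the text. The container class Plan, all field names, the file name and the dummy bounding boxes are assumptions made for this example and not the implementation used in this work.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in plan pixels


@dataclass
class TechEquipment:
    name: str  # e.g. "Pump-4" (naming pattern as in Figure 4)
    kind: str  # "pump", "valve", "heat exchanger" or "flap"
    box: Box   # bounding box from symbol detection (assumed field)


@dataclass
class TechLine:
    start: Point
    end: Point


@dataclass
class LineCrossing:  # named lineCrossing in the text
    position: Point
    directions: int  # number of line branches leaving the crossing


@dataclass
class Plan:  # hypothetical container for one input plan image
    image_path: str
    equipment: List[TechEquipment] = field(default_factory=list)
    lines: List[TechLine] = field(default_factory=list)
    crossings: List[LineCrossing] = field(default_factory=list)

    def connection_matrix(self, pairs: Iterable[Tuple[str, str]]) -> List[List[int]]:
        """Build a symmetric 0/1 matrix from pairs of connected equipment names."""
        index = {item.name: i for i, item in enumerate(self.equipment)}
        matrix = [[0] * len(index) for _ in index]
        for first, second in pairs:
            i, j = index[first], index[second]
            matrix[i][j] = matrix[j][i] = 1
        return matrix


plan = Plan("plan.png")  # hypothetical file name
for name, kind in [("Valve-1", "valve"), ("Valve-2", "valve"),
                   ("Flap-3", "flap"), ("Pump-4", "pump")]:
    plan.equipment.append(TechEquipment(name, kind, (0, 0, 1, 1)))  # dummy boxes
matrix = plan.connection_matrix([("Pump-4", "Valve-1"), ("Pump-4", "Valve-2")])
```

The matrix is kept symmetric in this sketch because a detected connection carries no flow direction.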
Figure 2: Schema showing the step-wise detection procedure (decomposition, symbol detection, line detection, line
crossing detection, connection derivation and export function)
Faster R-CNN [27] is used for symbol recognition. However, Faster R-CNN has difficulty detecting small objects [62],
which is why it cannot directly recognize all the symbols on a P&ID. We therefore automatically decompose the P&ID into
individual cutouts that are processed by Faster R-CNN and then reassembled. The TBE symbols, the polylines and the
locations of line crossings are extracted. The cutout information is then composed back to the whole-plan level. A connection
derivation algorithm determines TBE connections. Last, the information about TBE objects and the connections between
them is saved in a data interface.
4.1.1. Decomposition and composition of technical plan
Extracting whole plans from a PDF file as an image file would result in large image sizes, as a minimum dimension for the
symbols must be maintained. Our object detection classifier could not handle these large image sizes and the high number
of symbols to be detected. Therefore, we developed an approach to decompose the whole plan into cutouts. The cutout
allocation is saved, so that after extracting the core elements from the cutouts, the information can be composed again to
form the whole plan.
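The decomposition and composition step can be sketched as follows. This is a minimal illustration under assumptions: the tile size of 600 pixels, the overlap of 100 pixels and both function names are hypothetical and are not taken from this work. Each cutout window stores its offset in the plan, so that a bounding box detected inside a cutout can be shifted back to plan coordinates.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def cutout_windows(width: int, height: int, tile: int = 600, overlap: int = 100) -> List[Box]:
    """Return overlapping cutout windows that together cover the whole plan."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last window ends exactly at the border
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def to_plan_coordinates(window: Box, local_box: Box) -> Box:
    """Shift a box detected inside a cutout back to whole-plan coordinates."""
    offset_x, offset_y = window[0], window[1]
    x0, y0, x1, y1 = local_box
    return (x0 + offset_x, y0 + offset_y, x1 + offset_x, y1 + offset_y)


windows = cutout_windows(1000, 700)
plan_box = to_plan_coordinates(windows[-1], (10, 20, 50, 60))
```

Because the windows overlap, a symbol near a window border can be detected twice. Merging such duplicates is not covered by this sketch.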
4.1.2. Object detection convolutional neural network approach
Since the symbols in BES are similar, but not exactly the same, a template matching approach, which searches for
an exact symbol, is not appropriate. A first test with template matching confirmed this assumption (F1 score <70%). Symbols
can also be rotated or skewed. Therefore, BES require a general approach that also recognizes symbols that are not exact
copies of the symbols in its training data. Faster R-CNN [27] provides this approach. Therefore, we developed a function based
on it to detect the TBE symbols.
To use the function, a CNN-based object classifier was first generated by training Faster R-CNN on the training data set previously described.
The labeled training images contain the position and class information for all objects. The training was done using a
TensorFlow script [63]. Around every five minutes, the training routine saves a checkpoint that contains the
current state of the trained classifier. The total time duration for the training can be found in Figure 6. After training is
done, the checkpoint with the highest number of steps is used to generate the inference graph. The inference graph contains
the trained object detection classifier.
4.1.3. Line detection algorithm
All technical equipment is connected via polylines in P&ID of BES. Detecting the lines is therefore vital to derive the
connection information. First, the image is converted to a binary image, which is used to identify edges inside the image.
Next, the lines are identified using the Hough transform [64].
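To illustrate the voting principle of the Hough transform, the following pure-Python sketch binarizes a small grayscale image and accumulates votes in (rho, theta) space. It is a simplified stand-in and not the implementation used in this work: the edge detection step is skipped, the threshold of 128, the angular resolution of one degree and the function names are assumptions, and a practical implementation would typically rely on an image processing library.

```python
import math
from typing import Dict, List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Dark pixels (drawn lines) become 1, the light background becomes 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def hough_lines(binary: List[List[int]], n_theta: int = 180,
                min_votes: int = 5) -> List[Tuple[int, float, int]]:
    """Return (rho, theta in degrees, votes) for all cells with at least min_votes."""
    trig = [(math.cos(math.pi * k / n_theta), math.sin(math.pi * k / n_theta))
            for k in range(n_theta)]
    votes: Dict[Tuple[int, int], int] = {}
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if not value:
                continue
            # Each foreground pixel votes for every line that passes through it.
            for k, (cos_t, sin_t) in enumerate(trig):
                rho = round(x * cos_t + y * sin_t)
                votes[(rho, k)] = votes.get((rho, k), 0) + 1
    found = [(rho, 180.0 * k / n_theta, count)
             for (rho, k), count in votes.items() if count >= min_votes]
    return sorted(found, key=lambda item: -item[2])


# A 10 x 10 white image with one horizontal black line in row 4.
gray = [[255] * 10 for _ in range(10)]
gray[4] = [0] * 10
lines = hough_lines(binarize(gray), min_votes=10)
```

On such a coarse grid, neighboring angles collect the same number of votes as the true line, so a real pipeline would additionally suppress near-duplicate candidates.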
4.1.4. Line crossing algorithm
Line crossings are locations inside the plan image, where straight lines intersect with other straight lines. They represent
a change of direction of a connection line. For the plans we used, we found that a connection exists if two or three lines
leave the line crossing and no connection exists if four lines leave the line crossing. Figure 3 shows examples for two,
three and four directional crossings and the derived connection assumption.
Every detected line is compared to all other lines for intersection identification. Consequently, the algorithm mainly
does one task, which is finding the intersection of two lines. We used the approach described in [65] where a line-line
intersection can be found using determinants.
Figure 3: Typical line crossing types in piping and instrumental diagrams in building energy systems (two directional
crossing, three directional crossing and four directional crossing)
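The line-line intersection via determinants and the crossing rule described above can be sketched as follows. The determinant formula is the standard one for two lines given by two points each. The helper functions, the pixel tolerance of 2.0 and the way branches are counted are assumptions made for this illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def line_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Point]:
    """Intersect the infinite lines p1-p2 and p3-p4; None if they are parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    d12 = x1 * y2 - y1 * x2  # determinant of the first point pair
    d34 = x3 * y4 - y3 * x4  # determinant of the second point pair
    return ((d12 * (x3 - x4) - (x1 - x2) * d34) / denom,
            (d12 * (y3 - y4) - (y1 - y2) * d34) / denom)


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def count_directions(point: Point, segments: List[Segment], tol: float = 2.0) -> int:
    """Count the line branches that leave a crossing point."""
    directions = 0
    for a, b in segments:
        if distance_to_segment(point, a, b) > tol:
            continue  # this segment does not touch the crossing
        directions += sum(1 for end in (a, b)
                          if math.hypot(end[0] - point[0], end[1] - point[1]) > tol)
    return directions


def is_connection(directions: int) -> bool:
    """Rule from the text: two or three branches connect, four branches do not."""
    return directions in (2, 3)


crossing = line_intersection((0, 0), (4, 4), (0, 4), (4, 0))
corner = count_directions((5, 0), [((0, 0), (5, 0)), ((5, 0), (5, 5))])
tee = count_directions((5, 0), [((0, 0), (10, 0)), ((5, 0), (5, 5))])
cross = count_directions((5, 0), [((0, 0), (10, 0)), ((5, -5), (5, 5))])
```

Since the determinant formula treats the lines as infinite, a check that the point lies on both segments is needed before a crossing is accepted.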
4.1.5. Connection derivation algorithm
As basic information, the connection derivation algorithm uses the core objects described in the previous sections and joins
them in order to derive the topological connections. The procedure can be seen in Figure 4, shown as an example for finding
the connecting TBE symbols for ”Pump-4”. First, the lines which cross that symbol are identified. This is done using the
Liang-Barsky algorithm [66]. Then, the neighboring elements for this symbol are detected. This is done by searching
for the closest element that is on the same line as the symbol. In Figure 4 this is ”LineCrossing-1” for the left neighbor.
In this case, the TBE neighbors depend on the directional information of the line crossing, according to Figure 3. As this
is a three-directional crossing, all symbols linked via this crossing count as connections. So the neighboring TBE symbols
for ”Pump-4” can be determined as ”Valve-1” and ”Valve-2”. The procedure is repeated for all TBE symbols on the test
image.
Figure 4: Connection derivation procedure with detected symbols (Valve-1, Valve-2, Pump-4, Flap-3), detected lines
(Line-1, Line-2) and line crossings (LineCrossing-1)
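The test whether a detected line crosses the bounding box of a symbol can be sketched with the Liang-Barsky clipping scheme. This is a minimal illustration. The function name, the box layout (x_min, y_min, x_max, y_max) and the example coordinates are assumptions and not taken from this work.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def segment_crosses_box(p0: Point, p1: Point, box: Box) -> bool:
    """Liang-Barsky test: True if the segment p0-p1 has a part inside the box."""
    x0, y0 = p0
    x1, y1 = p1
    x_min, y_min, x_max, y_max = box
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0  # parameter range of the segment still inside
    # One (p, q) pair per box edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - x_min), (dx, x_max - x0),
                 (-dy, y0 - y_min), (dy, y_max - y0)):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside of it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters through this edge
        else:
            t_exit = min(t_exit, t)    # segment leaves through this edge
        if t_enter > t_exit:
            return False
    return True


pump_box = (2.0, 2.0, 4.0, 4.0)  # hypothetical bounding box of a pump symbol
through = segment_crosses_box((0.0, 3.0), (10.0, 3.0), pump_box)
above = segment_crosses_box((0.0, 5.0), (10.0, 5.0), pump_box)
short = segment_crosses_box((0.0, 0.0), (1.0, 1.0), pump_box)
```

Applying this test to every detected line yields the lines on which the neighbors of a symbol are then searched.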
4.2. Data interface
Industry Foundation Classes version 4 (IFC4) is the current open standard for BIM. It has only a limited
descriptive capability for sensors and actuators and their relations [67]. This descriptive capability is used even less frequently
in real-life projects [67].
None of the standards used for models by [68] and [15] includes specialized domain knowledge from BAS and BES.
For these reasons, we use the domain specific ontology-based Brick Schema [18] and label-based BUDO Schema [21] to
create a digital twin of BES systems as BES currently lack pervasive standards for automation [69].
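One possible way to serialize a connection matrix in Brick terms is sketched below as Turtle text. Several points are assumptions made for this illustration and are not confirmed by this work: the mapping of the class flap to the Brick class Damper, the use of the Brick relationship feeds for a connection whose flow direction is unknown (the direction written here is arbitrary), the example namespace and the function name.

```python
from typing import List

BRICK = "https://brickschema.org/schema/Brick#"

# Assumed mapping from the four detected classes to Brick classes.
BRICK_CLASS = {
    "pump": "Pump",
    "valve": "Valve",
    "heat exchanger": "Heat_Exchanger",
    "flap": "Damper",
}


def to_turtle(names: List[str], kinds: List[str], matrix: List[List[int]],
              base: str = "urn:example:plan#") -> str:
    """Serialize equipment and connections as Turtle text."""
    out = [f"@prefix brick: <{BRICK}> .", f"@prefix ex: <{base}> .", ""]
    for name, kind in zip(names, kinds):
        out.append(f"ex:{name} a brick:{BRICK_CLASS[kind]} .")
    for i, source in enumerate(names):
        for j in range(i + 1, len(names)):
            if matrix[i][j]:
                # Direction is arbitrary: the matrix itself is undirected.
                out.append(f"ex:{source} brick:feeds ex:{names[j]} .")
    return "\n".join(out)


names = ["Valve-1", "Valve-2", "Flap-3", "Pump-4"]
kinds = ["valve", "valve", "flap", "pump"]
matrix = [[0, 0, 0, 1],
          [0, 0, 0, 1],
          [0, 0, 0, 0],
          [1, 1, 0, 0]]
turtle = to_turtle(names, kinds, matrix)
```

A flow direction would have to come from additional information, for example flow arrows in the plan, before such triples could be used for control applications.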
4.3. Used computer hardware and software
For all computer vision tasks, an HP Pavilion x360 convertible laptop with 8 GB RAM, an Intel Core i5-8250U CPU at
1.8 GHz and an Nvidia GeForce MX130 graphics processing unit (GPU) was used. The compute capability of this GPU
is 5.0, which makes it possible to use the Nvidia Compute Unified Device Architecture (CUDA) toolkit, which enables
the implementation of parallel computing applications. For this work, CUDA toolkit 10.0 was used for the training of the CNN.
All programming tasks in this approach were done with the programming language Python.
5. Results
To validate the proposed detection algorithms, binary classification is used. Important performance assessment parameters
derived from binary classification are recall, precision, F1 score, accuracy and average precision (AP) [70].
5.1. Object detection performance
Figure 5 shows the precision-recall curves for each of the four object classes. The previously defined object detection test
data set is used. The target state for the detection is the ground truth which is defined for every test image. The curves
are generated using an Intersection over Union (IoU)-threshold [71] between the ground truth and the detection bounding
boxes of 0.5, which led to the highest precision and recall values.
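For reference, the IoU criterion can be sketched as follows. The box layout (x_min, y_min, x_max, y_max), the function names and the example boxes are assumptions for this illustration. Only the threshold of 0.5 is taken from the text.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def matches_ground_truth(detection: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct if its IoU reaches the threshold."""
    return iou(detection, ground_truth) >= threshold


ground_truth = (0.0, 0.0, 10.0, 10.0)
good = matches_ground_truth((0.0, 0.0, 10.0, 8.0), ground_truth)   # IoU = 80 / 100
poor = matches_ground_truth((5.0, 0.0, 15.0, 10.0), ground_truth)  # IoU = 50 / 150
```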
All curves show the typical course of the graph, high precision values for low recall values and a decreasing precision for
higher recall. For all curves, the precision stays above 90% until a recall of around 80%. For all classes, this leads to AP
values of around 80% or higher. The highest AP is reached for the class valve with a value of 97.5%. Even at a recall
of 1, the precision is at around 90%. Consequently, all valve symbols labeled as ground truths were detected, while only
10% of the predicted detections were false positives.
Figure 6 shows the time duration for different procedures of the whole process from data preparation to actual object
detection. It should be noted that these values are not universally valid. They show the time we needed for these tasks and
can serve as a reference to evaluate the time and effort required for the different procedures.
The time for generating the training image set from the raw data sources is given by the first three entries, for which only times per image
are given. Times per object do not make sense in this case, as the work effort is independent of the number of objects
drawn on the image. The preprocessing took around 120 minutes in total. The time per image ranges from around 0.05
up to 2 minutes. The labeling took 154 minutes, which is comparable to the preprocessing. The time per object indicates
the time needed for labeling one object, which is about 20-25 seconds. The training time was 120 minutes with the data set
of 154 images and 296 objects. The times for the described procedures add up to a total time for preparing a
usable CNN object detection algorithm of around 400 minutes. The actual detection process for all images
of the test image set takes 5 minutes.
Figure 5: Precision-recall-curve for classes flap, heat exchanger, pump and valve, using IoU-threshold 0.5
5.2. Connection detection performance
The connection detection performance is evaluated using the defined connection test data set with the numbers shown in
Figure 1 and the ground truth being connection matrices for all test images. To see the performance of the connection
algorithm separately without taking into account errors from the object detection, the ground truths for the TBE symbols
are used for the evaluation. Figure 7 shows the evaluation metrics for the total connection test image data set. 39
elements were identified as false negatives, i.e. they were not detected although a connection exists. 116 elements were identified
correctly as connections (true positives), which leads to a recall of 75%. The precision is 91%, as only 11 elements were detected as connections
although no connection exists (false positives).
400 connection candidates were predicted correctly as true negatives, i.e. correctly identified as not being connections. As the number
of true negatives is higher than the number of true positives, the accuracy is higher than the recall, because the accuracy takes
both TP and TN into account. It is slightly above 90%.
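The reported values can be reproduced from the four counts given above with the standard definitions of the binary classification metrics. The function name and the dictionary keys in this short sketch are chosen for illustration.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary classification metrics from the confusion matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Counts from the connection test data set: 116 + 11 + 39 + 400 = 566 elements.
metrics = binary_metrics(tp=116, fp=11, fn=39, tn=400)
```

With these counts, the recall is 116/155, the precision is 116/127 and the accuracy is 516/566, which round to the reported 75%, 91% and 91%.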
6. Discussion
6.1. Object detection
Building energy systems (BES), unlike industrial systems, do not have an established standard for the construction of
P&ID in practice. Despite the inhomogeneous data set, the results of the object recognition are very good
with 93% average precision (AP) over all classes. The results of connection detection are highly dependent on the
preceding object detection. However, the object detection results depend on the object class. Heavily skewed symbols, like the heat
exchanger, achieve good results with up to 78% AP. However, they do not reach the results of symbols with constant
proportions like valves with 97.5% AP. The heat exchanger is a decisive element for the topology detection of
building energy systems. Here, two hydraulically decoupled systems are connected with each other. In addition to the
skew, the symbols were also marked in different colors and their patterns were different. Other symbols were also different
in shape, but could be generalized well by the Faster R-CNN algorithm.
Especially for approaches of other domains, the comparison to our results is difficult. Since there is no open data set for
building energy systems, a comparison always depends on the data set used and the object types used. If these conditions
are disregarded, the results of object detection are comparable to the results of [26] and [49]. [49] used template matching.
Our results of <70% F1 score for template matching show that a universal approach using artificial intelligence is
needed. [52] had problems recognizing small symbols in the P&ID with Faster R-CNN. We solved this problem by
automating the decomposition of the P&ID into smaller pieces.
Erroneous symbol detections (false positives) occurred mainly for symbols that were not included in the considered
object types.
Figure 6: Total time duration, time duration per image and time duration per object for different tasks in object detection
6.2. Connection detection
With an accuracy of 91% and a precision of 91%, many connections could be detected correctly. We found that the
algorithm detects two-directional connections very well, but does not achieve this accuracy for connections over
three-directional and four-directional crossings.
Considering all adversities of a comparison, we were able to exceed the results of [50] (65% accuracy) and performed
slightly worse compared to [49] (92% accuracy).
6.3. Overall evaluation
It is remarkable that despite the large variety of standards for P&ID in buildings, our approach produces good results. The
actual usefulness in applications depends on the use case.
For the automated transformation of a digital twin in control applications, such as MPC, the recognition of the inputs and
outputs of a system is important [72]. The definition of the system boundaries is crucial here. If for MPC the system
boundaries are defined to match the plan boundaries, our approach can support MPC. However, if the system boundaries
are set differently, a different approach to topology detection must be used.
For transformed Petri nets [7], the detected topology can provide a basis for the creation of control code. Here, the correct
identification of the connections and the object types is important. The object types could be determined very well. The
connections still need manual work for correct use in Petri nets.
For fault detection and diagnosis, the evaluation of our algorithm depends on the methods used. The generation of time
series of individual fault models can be supported by our algorithm. In this case, connections that are set individually
incorrectly are not decisive. If the automatically generated digital twin is used in a parallel operation for fault detection,
the model exported from our approach cannot be used directly. Adjustments must be made here.
The approach in [22] for generating models of building energy systems can be used directly. Depending on the use of the model,
it is not decisive whether all connections can be correctly identified. For example, in the approach of generic data sets
shown in [22], the exact connection of the systems is not decisive for the use of the model. However, if the model is used
to predict the exact energy consumption (BEPS), an inaccurate model is not suitable [23].
For the integration into a building automation system, e.g. as a simulated virtual sensor, a very good model is necessary,
depending on the integration. For this purpose, our approach is only of limited use. Here, the exported digital twin model
has to be adapted. However, the translation into a labeling approach for building automation systems supports its use in
building automation systems.
Figure 7: Connection detection evaluation metrics (recall, precision, F1 score, accuracy, specificity, negative predictive
value)
7. Conclusion
7.1. Overall performance
We created a data set for the detection of topologies in building energy systems with piping and instrumentation diagrams
as data source. We tracked the time needed for the steps to prepare our data set and implement the algorithms. The
combination of object recognition, line detection, line crossing detection and connection detection algorithms showed good results.
The algorithm recognized all object categories and connections well. An extension of the test data set by further object
types could improve the object recognition algorithm. This would result in fewer false detections of object types not taken
into account. The integration of skewed symbols in the test data set could also improve the results, especially for the
object type heat exchanger.
We showed that plans of technical building equipment are a good source for understanding a building energy
system. Whether the exported digital twins can be used directly depends on the application.
7.2. Further use of the developed digital twins
The approach presented here provides a basis for advanced algorithms that optimize energy use. We discussed to what
extent our approach supports model predictive control, generation of control code, fault detection and diagnosis,
automatically generated simulation models and integration into building automation systems, depending on the application.
We will implement and review these approaches in our tool chain in future work.
Acknowledgments
We gratefully acknowledge the financial support provided by the BMWi (Federal Ministry for Economic Affairs and
Energy), promotional reference 03SBE006A.
References
[1] Nordhaus W. Climate Change: The Ultimate Challenge for Economics. American Economic Review.
2019;109(6):1991–2014.
[2] International Energy Agency. Tracking Buildings;. Available from: https://www.iea.org/reports/
tracking-buildings.
[3] European Commission. Directive (EU) 2018/844 of the European Parliament and of the Council of 30 May 2018
amending Directive 2010/31/EU on the energy performance of buildings and Directive 2012/27/EU on energy
efficiency;. Available from: http://data.europa.eu/eli/dir/2018/844/oj.
[4] D’Agostino D, Zangheri P, Castellazzi L. Towards Nearly Zero Energy Buildings in Europe: A Focus on Retrofit in
Non-Residential Buildings. Energies. 2017;10(1):117.
[5] International Energy Agency. Transition to Sustainable Buildings. OECD; 2013.
[6] Waide P, Ure J, Karagianni N, Smith G, Bordass B. WAIDE STRATEGIC EFFICIENCY, editor. The scope for
energy and CO2 savings in the EU through the use of building automation technology: Final Report;.
Available from: http://neu.eubac.org/fileadmin/eu.bac/BACS_studies_and_reports/2014.
06.13_Waide_ECI_-_Energy_and_CO2_savings_BAT.pdf.
[7] Schumacher F, Fay A. Formal representation of GRAFCET to automatically generate control code. Control
Engineering Practice. 2014;33:84–93.
[8] Prívara S, Cigler J, Váňa Z, Oldewurtel F, Sagerschnig C, Žáčeková E. Building modeling as a crucial part for
building predictive control. Energy and Buildings. 2013;56:8–22.
[9] Kim W, Katipamula S. A review of fault detection and diagnostics methods for building systems. Science and
Technology for the Built Environment. 2018;24(1):3–21.
[10] Capozzoli A, Piscitelli MS, Brandi S, Grassi D, Chicco G. Automated load pattern learning and anomaly detection
for enhancing energy management in smart buildings. Energy. 2018;157:336–352.
[11] Tao F, Cheng J, Qi Q, Zhang M, Zhang H, Sui F. Digital twin-driven product design, manufacturing and service with
big data. The International Journal of Advanced Manufacturing Technology. 2018;94(9-12):3563–3576.
[12] El Saddik A. Digital Twins: The Convergence of Multimedia Technologies. IEEE MultiMedia. 2018;25(2):87–92.
[13] Fuentes DED, Becker U, Diekhake P, Gunther M, Scholz A, Schmidt PP, et al. Evaluation and simulation of building
automation systems based on their AutomationML description. In: 2016 IEEE 21st International Conference on
Emerging Technologies and Factory Automation (ETFA). [Piscataway, New Jersey]: IEEE; 2016. p. 1–6.
[14] Schmidt M, Moreno MV, Schülke A, Macek K, Mařík K, Pastor AG. Optimizing legacy building operation: The
evolution into data-driven predictive cyber-physical systems. Energy and Buildings. 2017;148:257–279.
[15] Arroyo E, Hoernicke M, Rodríguez P, Fay A. Automatic derivation of qualitative plant simulation models from
legacy piping and instrumentation diagrams. Computers & Chemical Engineering. 2016;92:112–132.
[16] Isaksson AJ, Harjunkoski I, Sand G. The impact of digitalization on the future of control and operations. Computers
& Chemical Engineering. 2018;114:122–129.
[17] Lydon GP, Caranovic S, Hischier I, Schlueter A. Coupled simulation of thermally active building systems to support
a digital twin. Energy and Buildings. 2019;202:109298.
[18] Balaji B, Bhattacharya A, Fierro G, Gao J, Gluck J, Hong D, et al. Brick : Metadata schema for portable smart
building applications. Applied Energy. 2018;226:1273–1292.
[19] Haynes A. ASHRAE’s BACnet Committee, Project Haystack and Brick Schema Collaborating to Provide
Unified Data Semantic Modeling Solution. ATLANTA, BERKELEY, CA and RICHMOND, VA; 28.02.2018.
Available from: https://www.ashrae.org/about/news/2018/ashrae-s-bacnet-committee-
project-haystack-and-brick-schema-collaborating-to-provide-unified-data-
semantic-modeling-solution.
[20] Gao J, Bergés M. A large-scale evaluation of automated metadata inference approaches on sensors from air handling
units. Advanced Engineering Informatics. 2018;37:14–30.
[21] Stinner F, Kornas A, Baranski M, Müller D. Structuring building monitoring and automation system data. In:
REHVA, editor. The REHVA European HVAC Journal - August 2018. REHVA Journal; 2018. p. 10–15. Available
from: https://www.rehva.eu/fileadmin/user_upload/10-15_RJ1804_WEB.pdf.
[22] Stinner F, Yang Y, Schreiber T, Bode G, Baranski M, Müller D. Generating Generic Data Sets for Machine Learning
Applications in Building Services Using Standardized Time Series Data. In: Al-Hussein M, editor. Proceedings
of the 36th International Symposium on Automation and Robotics in Construction (ISARC). Proceedings of the
International Symposium on Automation and Robotics in Construction (IAARC). International Association for
Automation and Robotics in Construction (IAARC); 2019.
[23] Egan J, Finn D, Deogene Soares PH, Rocha Baumann VA, Aghamolaei R, Beagon P, et al. Definition of a useful
minimal-set of accurately-specified input data for Building Energy Performance Simulation. Energy and Buildings.
2018;165:172–183.
[24] Arroyo E, Fay A, Hoernicke M, Rodriguez P. Digitalisierung grafischer Engineering-Dokumente mit Hilfe optischer
Erkennung und semantischer Analyse als Grundlage für die Modernisierung bestehender Anlagen. In: Automation
2015; 2015.
[25] Moreno-García CF, Elyan E, Jayne C. New trends on digitisation of complex engineering drawings. Neural
Computing and Applications. 2018.
[26] Yuxi Zhang. CNN-based Symbol Recognition and Detection in Piping Drawings [Master thesis]. Purdue University.
West Lafayette, USA; 2019. Available from: https://hammer.figshare.com/articles/CNN-based_
Symbol_Recognition_and_Detection_in_Piping_Drawings/8301080.
[27] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
Networks. In: Computing research repository. vol. abs/1506.01497;. Available from: http://arxiv.org/pdf/
1506.01497v3.
[28] Charef R, Emmitt S, Alaka H, Fouchal F. Building Information Modelling adoption in the European Union: An
overview. Journal of Building Engineering. 2019;25:100777.
[29] Reiß & Rommerich. Bericht zum Thema Building Information Modeling (BIM): Bundesweite Befragung der
Mitglieder der Architektenkammern der Länder;. Available from: https://www.bak.de/w/files/bak/07-
daten-und-fakten/architektenbefragungen/berufspolitik2017/2017_bak_bim_
berichtsband_alle-befragten.pdf.
[30] Deutsches Institut für Normung. DIN EN 806-1 - Specifications for installations inside buildings conveying water
for human consumption – Part 1: General; 12/2001.
[31] Deutsches Institut für Normung. DIN EN 12792:2003 - Ventilation for buildings - Symbols, terminology and
graphical symbols; 2004-01.
[32] Deutsches Institut für Normung. DIN 19227-2:1991 - Control technology; graphical symbols and identifying letters
for process control engineering; representation of details; 1991.
[33] Deutsches Institut für Normung. DIN EN 60617-2:1997 - Graphical symbols for diagrams - Part 2: Symbol elements,
qualifying symbols and other symbols having general application; 1997.
[34] Deutsches Institut für Normung. DIN EN 60617-3 - Graphical symbols for diagrams - Part 3: Conductors and
connecting devices (IEC 60617-3:1996); German version EN 60617-3:1996; 1997.
[35] Deutsches Institut für Normung. DIN EN 60617-11 - Graphical symbols for diagrams - Part 11: Architectural and
topographical installation plans and diagrams (IEC 60617-11:1996); German version EN 60617-11:1996; 1997.
[36] Deutsches Institut für Normung. DIN EN ISO 10628-1:2014 - Diagrams for the chemical and petrochemical industry
- Part 1: Specification of diagrams; 04/2015.
[37] Deutsches Institut für Normung. DIN EN ISO 10628-2 - Diagrams for the chemical and petrochemical industry -
Part 2: Graphical symbols (ISO 10628-2:2012); German version EN ISO 10628-2:2012; 04/2013.
[38] Deutsches Institut für Normung. DIN 6779-12 - Structuring principles for technical products and technical product
documentation - Part 12: Buildings and building technology; 04/2011.
[39] Deutsches Institut für Normung. DIN 6779-13 - Structuring principles for technical products and technical product
documentation - Part 13: Chemical plants; 01/2018.
[40] Deutsches Institut für Normung. DIN 28000-4 - Chemical apparatus - Documentation in the life cycle of process
plants - Part 4: Graphical symbols of valves, pipes and actuators; 07/2014.
[41] Deutsches Institut für Normung. DIN 28000-5 - Chemical apparatus - Documentation in the life cycle of process
plants - Part 5: Graphical symbols of apparatus and machines; 04/2015.
[42] Deutsches Institut für Normung. DIN ISO/TS 81346-10:2015 - Industrial systems, installations and equipment
and industrial products - Structuring principles and reference designation - Part 10: Power plants
(ISO/TS 81346-10:2015); 10/2015.
[43] Gellaboina MK, Venkoparao VG. Graphic Symbol Recognition Using Auto Associative Neural Network Model. In:
Chanda B, editor. Proceedings. Los Alamitos, Calif.: IEEE Computer Society Press; 2009. p. 297–301.
[44] Gutermuth G, Hoernicke M. Automatic generation of plant topologies by analysing operations data. In:
Automation IICoET, Factory, editors. 2017 22nd IEEE International Conference on Emerging Technologies and Factory
Automation. [Piscataway, NJ]: IEEE; 2017. p. 1–8.
[45] Moreno-García CF, Elyan E, Jayne C. Heuristics-Based Detection to Improve Text/Graphics Segmentation in
Complex Engineering Drawings. In: Boracchi G, Iliadis L, Jayne C, Likas A, editors. Engineering applications of neural
networks. vol. 744 of Communications in Computer and Information Science. [Place of publication not identified]:
Springer International Publishing; 2017. p. 87–98.
[46] Martinez GS, Sierla S, Karhela T, Vyatkin V. Automatic Generation of a Simulation-Based Digital Twin of an
Industrial Process Plant. In: IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society.
Piscataway, NJ: IEEE; 2018. p. 3084–3089.
[47] Tan WC, Chen IM, Pantazis D, Pan SJ. Transfer Learning with PipNet: For Automated Visual Analysis of Piping
Design. In: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE). [S.l.]: IEEE;
8/20/2018 - 8/24/2018. p. 1296–1301.
[48] Koltun G, Maurer F, Knoll A, Trunzer E, Vogel-Heuser B. Information Retrieval from Redlined Circuit Diagrams
and its Model-Based Representation for Automated Engineering. In: IECON 2018 - 44th Annual Conference of the
IEEE Industrial Electronics Society. Piscataway, NJ: IEEE; 2018. p. 3114–3119.
[49] Kang SO, Lee EB, Baek HK. A Digitization and Conversion Tool for Imaged Drawings to Intelligent Piping and
Instrumentation Diagrams (P&ID). Energies. 2019;12(13):2593.
[50] Rahul R, Paliwal S, Sharma M, Vig L. Automatic Information Extraction from Piping and Instrumentation Diagrams.
In: de Maria E, Fred A, Gamboa H, editors. Proceedings. Bioinformatics. [S. l.]: SCITEPRESS = Science and
Technology Publications; 2019. p. 163–172.
[51] Yu, Cha, Lee, Kim, Mun. Features Recognition from Piping and Instrumentation Diagrams in Image Format Using
a Deep Learning Network. Energies. 2019;12(23):4425.
[52] Nurminen JK, Rainio K, Numminen JP, Syrjänen T, Paganus N, Honkoila K. Object Detection in Design Diagrams
with Machine Learning. In: Burduk R, Kurzyński M, Woźniak M, editors. Progress in computer recognition systems.
vol. 977 of Advances in Intelligent Systems and Computing. Cham, Switzerland: Springer; 2020. p. 27–36.
[53] Rica E, Moreno-García CF, Álvarez S, Serratosa F. Reducing human effort in engineering drawing validation.
Computers in Industry. 2020;117:103198.
[54] Bigvand PG, Fay A. A workflow support system for the process and automation engineering of production plants.
In: 2017 IEEE International Conference on Industrial Technology (ICIT). Piscataway, NJ: IEEE; 2017. p. 1118–1123.
[55] Hoernicke M, Fay A, Barth M. Virtual plants for brown-field projects. In: 2015 IEEE 20th Conference on Emerging
Technologies & Factory Automation (ETFA). Piscataway, NJ: IEEE; 2015. p. 1–8.
[56] El-Harby AA. New Square Scan Algorithm. In: GVIP Journal. vol. 9; 2005.
[57] Kiryati N, Eldar Y, Bruckstein AM. A probabilistic Hough transform. Pattern Recognition. 1991;24(4):303–316.
[58] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Computer Vision and
Pattern Recognition (CVPR), 2015 IEEE Conference on. [Piscataway, New Jersey]: IEEE; 2015. p. 3431–3440.
[59] Vasa H. Google images download; 2019. Available from: https://github.com/hardikvasa/google-
images-download.
[60] Deutsches Institut für Normung. DIN EN 81346-1: Industrial systems, installations and equipment and industrial
products - Structuring principles and reference designations - Part 1: Basic rules (IEC 81346-1:2009); German
version EN 81346-1:2009; 2009.
[61] Bode G, Stinner F, Baranski M, Brümmendorf E, Cai X, Kümpel A, et al. From plans to programs: A holistic
toolchain for building data applications. Journal of Physics: Conference Series. 2019;1343:012117. Available from:
https://iopscience.iop.org/article/10.1088/1742-6596/1343/1/012117/pdf.
[62] Eggert C, Brehm S, Winschel A, Zecha D, Lienhart R. A closer look: Small object detection in faster R-CNN. In:
2017 IEEE International Conference on Multimedia and Expo (ICME). Piscataway, NJ: IEEE; 2017. p. 421–426.
[63] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al.. TensorFlow: Large-Scale Machine Learning on
Heterogeneous Systems; 2015. Available from: https://www.tensorflow.org/.
[64] Hough PVC. Method and means for recognizing complex patterns; 1962-12-18. Available from: https://www.osti.
gov/biblio/4746348.
[65] Weisstein EW. Line-Line Intersection; 2007. Available from: https://mathworld.wolfram.com/Line-
LineIntersection.html.
[66] Barsky BA, Liang Y, Slater M. University of California at Berkeley, editor. Some Improvements to a Parametric
Line Clipping Algorithm. USA;.
[67] Lange H, Johansen A, Kjærgaard MB. Evaluation of the opportunities and limitations of using IFC models as source
of building metadata. In: Ramachandran GS, Batra N, editors. BuildSys’18. New York, New York: The Association
for Computing Machinery; 2018. p. 21–24.
[68] Koltun G, Basirati MR, Subhan Hammeed M, Bohm M, Krcmar H, Vogel-Heuser B. Reverse Engineering on
changed Functional Specification Documents for Model-Based Requirements Engineering. In: 2019 IEEE
International Conference on Industrial Cyber Physical Systems (ICPS). IEEE; 5/6/2019 - 5/9/2019. p. 687–692.
[69] Krutwig M, Kölmel B, Tantau A, Starosta K. Standards for Cyber-Physical Energy Systems—Two Case Studies
from Sensor Technology. Applied Sciences. 2019;9(3):435.
[70] Nanopoulos A, Alcock R, Manolopoulos Y. Feature-based Classification of Time-series Data. In: Mastorakis N,
Nikolopoulos SD, editors. Information processing and technology. Commack, NY, USA: Nova Science Publishers,
Inc; 2001. p. 49–61. Available from: http://dl.acm.org/citation.cfm?id=766914.766918.
[71] Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized Intersection over Union: A Metric
and A Loss for Bounding Box Regression;. Available from: http://arxiv.org/pdf/1902.09630v2.
[72] Christofides PD, Scattolini R, Muñoz de la Peña D, Liu J. Distributed model predictive control: A tutorial review
and future research directions. Computers & Chemical Engineering. 2013;51:21–41.