
Automatic digital twin data model generation of building energy systems from piping and instrumentation diagrams

  • aedifion GmbH

Abstract

Buildings directly and indirectly emit a large share of current CO2 emissions. There is a high potential for CO2 savings through modern control methods in building automation systems (BAS) like model predictive control (MPC). For a proper control, MPC needs mathematical models to predict the future behavior of the controlled system. For this purpose, digital twins of the building can be used. However, with current methods in existing buildings, a digital twin set up is usually labor-intensive. Especially connecting the different components of the technical system to an overall digital twin of the building is time-consuming. Piping and instrument diagrams (P&ID) can provide the needed information, but it is necessary to extract the information and provide it in a standardized format to process it further. In this work, we present an approach to recognize symbols and connections of P&ID from buildings in a completely automated way. There are various standards for graphical representation of symbols in P&ID of building energy systems. Therefore, we use different data sources and standards to generate a holistic training data set. We apply algorithms for symbol recognition, line recognition and derivation of connections to the data sets. Furthermore, the result is exported to a format that provides semantics of building energy systems. The symbol recognition, line recognition and connection recognition show good results with an average precision of 93.7%, which can be used in further processes like control generation, (distributed) model predictive control or fault detection. Nevertheless, the approach needs further research.
Florian Stinner, Martin Wiecek, Marc Baranski, Alexander Kümpel, Dirk M¨
RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate,
Aachen, Germany
Keywords: Building automation systems, Digital twin, Piping and instrument diagrams (P&ID), Building energy system,
Topology detection, Brick Schema.
1. Introduction
Climate change is the ultimate challenge for economics [1]. Buildings have a special importance in the fight against
climate change. If indirect emissions are included, buildings account for 28% of global [2] and 36% of European CO2
emissions [3]. In Europe, the current renovation rate of 1% is too low to achieve the required reduction in greenhouse
gases [4], as 3% would be required [3]. The current renovation rate results in an average building lifespan of between 40
and 120 years [5]. In 2050, 75% of the buildings in OECD countries that existed in 2010 will still be in use [5]. Therefore,
methods for minimally invasive improvement of energy systems in existing buildings must be developed.
In particular, non-residential buildings are equipped with a comprehensive building automation system (BAS). However,
the control system is incorrectly configured for 90% of the total floor area [6]. About 20-30% of the total energy
consumption in existing non-residential buildings with BAS can be avoided through improved control alone [5]. Such novel
control methods can include control generation [7], (distributed) model predictive control ((D)MPC) [8] or fault detection
and diagnostics (FDD) [9], including anomaly detection [10].
Digital planning processes could remedy the deficiencies in BAS. Nevertheless, incorrectly planned controls of existing
buildings are difficult to optimize. A digital twin can connect physical objects of an energy system with its virtual
objects [11] so that there is a seamless transmission of data between them [12]. With the help of a precise digital twin of
the building energy system, building automation can be optimized [13].
Machine learning applications can support the generation of a digital twin. In order to use automatic control systems such
as energy-cyber-physical systems [14], system knowledge of building energy systems (BES) and their technical building
equipment (TBE) must be acquired. If this knowledge is not available in a digitally processable form for existing buildings,
it must be gathered in a personnel-intensive way.
Piping and instrumentation diagrams (P&ID) provide information on how the parts of an energy system are linked, such
as the connection of a boiler to a distribution system. They are often not available in a computer-interpretable form.
Digitizing P&ID is a cost-effective alternative to labor-intensive manual processing and increases the understanding of
the system [15]. P&ID are often available as a file in PDF format or as a printed or drawn plan. These plans provide
valuable information for the optimization of building operation and the creation of a digital twin. The costs of optimizing
existing BES can be reduced by automated extraction of the information contained in the plans.
In order to control the increasingly complex control systems in a stable and optimized way, they must be permanently
improved by new systems, solutions and products [16]. Here, a digital twin can assist operation [17]. A format that
includes domain knowledge in building services engineering, especially for analysis of BAS, is crucial to ensure that this
format can continue to be used during its lifespan.
For the analysis of building operation, the Brick Schema was developed [18]. It contains most of the necessary connections
of energy systems and descriptions of their sensors and actuators, and it is expected to become part of BACnet, which is
one of the most widely used protocols for BAS [19].
The sensors, actuators and controller of a building generate data streams. These data streams also have a data stream
identifier. It is usually constructed by a label structure previously defined by the building owner or building automation
company. Since the label structures differ greatly in practice [20, 21], they must be automatically processable. For this
purpose the BUDO Schema was developed as a universal data stream identifier [21]. It can be used for existing and new
In our approach, the P&ID is transferred to Brick Schema for further processing and to the BUDO Schema for conversion
to a label structure. Building energy performance simulation (BEPS) models can then be generated from BUDO Schema
and Brick Schema [22]. The reduction of personnel expenses in BEPS generation is crucial to ensure that BEPS can be
used on a scalable basis [23] and in model predictive control [8]. The labels according to BUDO Schema are integrable
into the BAS. BUDO Schema offers a possibility to name and identify data streams directly from BEPS.
In this paper, we answer whether piping and instrumentation diagrams are a valid source for the extraction of topology
of existing building energy systems of non-residential buildings and their technical building equipment for modern appli-
cations in BAS such as model predictive control, building energy performance simulation (BEPS) or fault detection and
Our contributions are as follows:
• Creation of a data set for detection of topology of various standards in plans of building energy systems
• Algorithm for export of P&ID of building energy systems into a digital twin model for
    • model predictive control
    • fault detection and diagnostics
    • automated building energy performance simulation model generation
    • integration into building automation system
• Time tracking of all needed process steps
First, we describe the used data set. The training data set is generated from three different sources. The test data set
consists of cutouts from five different P&ID from different vendors and used standards. Based on this, we present the
methodology used for symbol detection, line detection and cross detection. Afterwards, we present the results generated
using our method and our data set. Finally, we discuss the limitations of our approach, but also compare them with
approaches from industrial applications. We give an outlook for which applications our approach is suitable.
2. Related work
There are few commercial solutions for digitization of inventory plans. These are mainly limited to visualization, but do
not include the data transfer for automation systems [24]. For the history of image recognition and processing of P&ID,
we refer to [25].
There are only a few existing automatic approaches for analyzing P&ID of BES. These used computer-aided design (CAD)
models or Building Information Modeling (BIM) models as input. [26] developed a symbol recognition for BES P&ID based
on Faster R-CNN [27]. However, this approach lacks the recognition of connections between identified symbols and their
Building Information Modeling (BIM) provides a digital model and process used to coordinate the various trades, for
example building automation. Such a digital model could be a good data source for the building energy systems of existing
buildings. However, its dissemination is limited [28]. For example, a survey conducted by the German Chamber of Architects
in 2017 indicated that only 12% of surveyed architects use BIM at all, and only 47% of these use BIM in all projects [29].
Thus, it can be assumed that only a very small share of existing buildings was modeled with BIM rather than with CAD.
Additionally, the CAD data of buildings are often not transferred to the building operator. Therefore, alternative ways of
extracting information have to be found.
In buildings, various standards are used for the creation of P&ID as BAS partly controls energy supply of industrial
plants [30–42]. The symbols and designations changed over time and are not used consistently in practice. This is partly
due to the harmonization of standards. This makes a universal approach to the digitization of existing P&ID of BES into
a computer-interpretable form difficult.
The approaches for the automatic processing of P&ID shown so far concentrate on the automatic analysis of industrial
plants [15, 43–53]. Partly, the same technical systems are considered in the cross-sectional technology as in buildings.
However, these systems are often connected differently in industry than in buildings. The budget for the analysis of
systems also differs significantly here. As far as the authors are aware, there is no common approach for exporting P&ID
from BES into a computer-interpretable form.
There are different approaches for the detection of connections of symbols in technical systems. [26,43,52] only recognize
symbols of P&ID. We have no access to the algorithms used in [15, 49, 54] or they remain unclear. [44] additionally use
real data stream data to support topology recognition. [55] use human-machine interface (HMI) of technical systems.
However, details of used algorithms remain unclear.
Geometric matching is used in [48] for symbol detection, which presupposes that the symbols for an equipment type are
very similar across different plans; the square scan algorithm [56] is used for line detection. However, no results are
reported and only one plan was involved. [45] apply a heuristic algorithm to improve the object and connection recognition
of complex P&ID, while [51] use an independent process for connection detection based on pixel counting. [44] used
rule-based connection detection, an approach that depends on how the plans were created and on the standards they follow.
The probabilistic Hough transform [57] is used in [50] for connection recognition. They use fully convolutional networks
(FCN) [58] for symbol recognition. However, the result is not exported into a reusable format. Additionally, they achieved
only 65.2% accuracy in detecting connections in the plan.
None of the examples shown in previously mentioned approaches has transferred a P&ID of a BES into a digital twin
model using object recognition and topology extraction.
3. Data set
Since the standards for objects in P&ID vary in BES, we use different data sources for raw data to generate a comprehensive
data set for training and testing. Raw data is defined in this work as graphical documents that contain information
needed for the detection tasks but cannot be used directly for creating the data set of TBE symbols or for connection
derivation. Instead, preprocessing is required. In this work, we use:
• image web search data collected with Google Crawler [59]
• data from standards that standardize TBE symbols
• data from P&ID
All accessible examples of P&ID contain textual descriptions of the individual sensors and actuators, e.g. according to
DIN EN 81346 [60]. Such an extensive, standardized description of the sensors could not be found in any of the BES
examples we considered. In BES, textual annotations in P&ID are often based on natural language or non-standard
abbreviations and are integrated individually into each plan. For these reasons, the text entries in the plans are not
considered in the prepared data set or in the algorithm used.
The data sets we use for object detection cover four TBE symbols: pump, valve, heat exchanger and flap. They are found
most frequently in P&ID of BES and have a high influence on the control. Figure 1 shows a summary of the training and
test data set.
Figure 1: Histogram of the training and test data set showing number of images and objects by class (pump, valve,
HX=heat exchanger, flap, total, raw data sorted by source, connection detection)
The sum of images for all TBE symbols is not equal to the total number of images, as one image can contain different TBE
symbols. The training data set contains 154 images in total, of which 84 images were extracted using Google Crawler [59]
with the following keywords:
• {"pump", "valve", "heat exchanger", "flap"}, each combined with:
• {"symbol", "symbol iso", "standard symbol", "19227", "81346", "60417"}
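The keyword combination above can be sketched as a Cartesian product of the two sets. This is only an illustration: the text does not state how the two terms were joined into a query, so the single-space join and the helper name `build_queries` are assumptions.

```python
from itertools import product

# Keyword sets as listed in the text.
EQUIPMENT_TERMS = ["pump", "valve", "heat exchanger", "flap"]
QUALIFIER_TERMS = ["symbol", "symbol iso", "standard symbol",
                   "19227", "81346", "60417"]


def build_queries(equipment, qualifiers):
    """Return one search string per (equipment, qualifier) pair.

    Assumption: the two terms are joined by a single space.
    """
    return [f"{e} {q}" for e, q in product(equipment, qualifiers)]


queries = build_queries(EQUIPMENT_TERMS, QUALIFIER_TERMS)
print(len(queries))  # 24
print(queries[0])    # pump symbol
```

With four equipment terms and six qualifiers, this yields 24 distinct search strings.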
Thirteen standards are used [30–42], and 25 cutouts were generated from them. Further, 45 images are from plan cutouts of five P&ID
from different buildings and design companies. These P&ID include complex energy systems such as district energy
As data source for the test data we use plan cutouts. The test image data set consists of 45 images from different technical
plans. The test to training data ratio is approximately 1:3. As the test data is generated with the focus on including images
that can be used for the connection detection, more valve objects exist in the test than training data.
Unfortunately, there is no publicly available data set for P&ID and for BES in particular. For creating the data set that was
used for our symbol recognition training process, the following preprocessing steps were done:
• Google Crawler data sorting
• P&ID cutout generation
• norm cutout generation
• training data labeling
The time duration for these steps can be found in Figure 6.
The requirements for a test data set for evaluating the connection detection differ from those for the object detection. It
is necessary that connections between objects are shown. Therefore, only the subset of images that complies with this
requirement was chosen for the connection test data set. The resulting connection data set consists of 26 images and 566
classification elements.
4. Method
In [61], a toolchain is presented that shows how raw data of building automation systems and the digitization of P&ID
support new control approaches like control generation, (distributed) model predictive control or fault detection in building
automation systems. In this paper, we present the digitization of P&ID of BES into a computer-interpretable form.
In our algorithm, first, the objects within P&ID are recognized with object detection on the basis of their symbols. Sub-
sequently, the lines of P&ID and crossings of these are identified. From all this information, the next step is to determine
whether a connection exists between the objects. At the end, the connections are exported into a topology model for
digital twin generation.
4.1. Implementation of plan topology detection algorithms
For the detection of the plan elements and topology a step-wise approach was used. The overall structure of the approach
can be seen in Figure 2.
A Python-based object-oriented structure was established in our algorithm, which holds the information gathered from
our algorithms. The input plan image is saved. Further, for all three core objects of the plan (TBE, technical lines and
line crossings) three classes TechEquipment, TechLine and lineCrossing are established. Furthermore, a two-dimensional
connectionMatrix describes the connections between all TBE.
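A minimal sketch of such a structure is given below. Only the names TechEquipment, TechLine, lineCrossing and connectionMatrix come from the text; every field, the helper functions and the symmetric 0/1 matrix encoding are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TechEquipment:
    label: str                       # e.g. "Pump-4"
    kind: str                        # "pump", "valve", "heat exchanger" or "flap"
    bbox: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in plan pixels


@dataclass
class TechLine:
    start: Tuple[int, int]
    end: Tuple[int, int]


@dataclass
class lineCrossing:  # class name kept as written in the text
    position: Tuple[int, int]
    directions: int  # number of line segments leaving the crossing


def empty_connection_matrix(n: int) -> List[List[int]]:
    """n x n matrix of zeros; entry [i][j] == 1 means TBE i and j are connected."""
    return [[0] * n for _ in range(n)]


def connect(matrix: List[List[int]], i: int, j: int) -> None:
    """Mark an undirected connection (assumption: the matrix is symmetric)."""
    matrix[i][j] = 1
    matrix[j][i] = 1


equipment = [
    TechEquipment("Valve-1", "valve", (10, 10, 30, 30)),
    TechEquipment("Flap-3", "flap", (200, 10, 220, 30)),
    TechEquipment("Pump-4", "pump", (100, 50, 130, 80)),
]
connectionMatrix = empty_connection_matrix(len(equipment))
connect(connectionMatrix, 0, 2)  # Valve-1 <-> Pump-4
```

Indexing the matrix by the position of each TBE in the equipment list keeps the connection information independent of the symbol labels.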
Figure 2: Schema showing the step-wise detection procedure (decomposition, symbol detection, line detection, line
crossing detection, connection derivation and export function)
Faster R-CNN [27] is used for symbol recognition. However, Faster R-CNN has problems detecting smaller objects [62],
which is why it is unable to recognize all the symbols on a P&ID directly. We therefore automatically decompose the P&ID
into individual cutouts, which are processed by Faster R-CNN and then reassembled. The TBE symbols, the polylines and
the locations of line crossings are extracted. The information from the plan cutouts is then composed at the level of the
whole plan. A connection derivation algorithm determines the TBE connections. Last, the information about TBE objects
and the connections between them is saved in a data interface.
4.1.1. Decomposition and composition of technical plan
Extracting whole plans from a PDF file as an image file would result in very large images, as a minimum dimension for the
symbols must be maintained. Our object detection classifier could not handle such large images or the high number
of symbols to be detected. Therefore, we developed an approach to decompose the whole plan into cutouts. The cutout
allocation is saved, so that after extracting the core elements from the cutouts, the information can be composed again to
form the whole plan.
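The decomposition and composition idea can be sketched as follows. The tile size, the overlap between neighboring cutouts and both function names are assumptions for illustration; the text only states that the cutout allocation is stored so that results can be mapped back to the whole plan.

```python
def tile_origins(width, height, tile, overlap):
    """Top-left corners of square cutouts that cover a width x height plan.

    Assumption: 0 <= overlap < tile, so that the step size is positive.
    """
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    # Add a final row/column if the regular grid misses the plan border.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]


def to_plan_coords(box, origin):
    """Shift a cutout-local box (x_min, y_min, x_max, y_max) to plan coordinates."""
    x0, y0 = origin
    x_min, y_min, x_max, y_max = box
    return (x_min + x0, y_min + y0, x_max + x0, y_max + y0)


origins = tile_origins(width=1000, height=600, tile=400, overlap=100)
print(len(origins))                                        # 6
print(to_plan_coords((10, 20, 50, 60), origin=(300, 200)))  # (310, 220, 350, 260)
```

An overlap between cutouts is one way to avoid cutting a symbol in half at a tile border; whether the original implementation does this is not stated.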
4.1.2. Object detection convolutional neural network approach
Since the symbols in BES are similar but not exactly the same, a template matching approach, which searches for an exact
symbol, is not appropriate. A first test with template matching confirmed this assumption (F1 score <70%). Symbols
can also be rotated or skewed. Therefore, BES require a general approach that also recognizes symbols that are not exact
copies of its training data. Faster R-CNN [27] provides this approach, so we developed a function based on it to detect
the TBE symbols.
To use the function, a CNN-based object classifier was first generated by training Faster R-CNN on the training data set
described previously. The labeled training images contain the position and class information for all objects. The training
was done using a TensorFlow script [63]. Around every five minutes, the training routine saves a checkpoint that contains
the current state of the trained classifier. The total training time can be found in Figure 6. After training is done, the
checkpoint with the highest number of steps is used to generate the inference graph. The inference graph contains
the trained object detection classifier.
4.1.3. Line detection algorithm
All technical equipment is connected via polylines in P&ID of BES. Detecting the lines is therefore vital to derive the
connection information. First, the image is converted to a binary image, which is used to identify edges inside the image.
Next, the lines are identified using the Hough transform [64].
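The voting principle behind the Hough transform can be illustrated with a small pure-Python sketch. It is not the implementation used in this work: the input is assumed to be a list of edge-pixel coordinates, and the 1-degree angle resolution, the rounding of rho to whole pixels and the function name are illustrative choices.

```python
import math
from collections import Counter


def hough_peak(points):
    """Return ((theta_deg, rho), votes) for the strongest line through the points.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it; theta is sampled in 1-degree steps.
    """
    votes = Counter()
    for x, y in points:
        for theta_deg in range(180):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    return votes.most_common(1)[0]


# Edge pixels of a horizontal line at y = 3.
horizontal = [(x, 3) for x in range(100)]
print(hough_peak(horizontal))  # ((90, 3), 100)
```

A peak at theta = 90 degrees and rho = 3 corresponds to the horizontal line y = 3; in a real plan, every accumulator cell above a vote threshold would be reported as a line candidate.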
4.1.4. Line crossing algorithm
Line Crossings are locations inside the plan image, where straight lines intersect with other straight lines. They represent
a change of direction of a connection line. For the plans we used, we found that a connection exists if two or three lines
leave the line crossing and no connection exists if four lines leave the line crossing. Figure 3 shows examples for two,
three and four directional crossings and the derived connection assumption.
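The crossing rule described above can be written as a small function. The function name and the error raised for other direction counts are assumptions; the text only covers two, three and four outgoing lines.

```python
def crossing_connects(directions):
    """Rule observed for the plans used here: 2 or 3 outgoing lines form a
    connection, 4 outgoing lines (two pipes passing over each other) do not."""
    if directions in (2, 3):
        return True
    if directions == 4:
        return False
    # Assumption: any other count is treated as invalid input.
    raise ValueError(f"unsupported number of directions: {directions}")


print(crossing_connects(3))  # True
print(crossing_connects(4))  # False
```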
Every detected line is compared to all other lines for intersection identification. Consequently, the algorithm mainly
does one task, which is finding the intersection of two lines. We used the approach described in [65] where a line-line
intersection can be found using determinants.
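As an illustration, the determinant form of the line-line intersection can be sketched as follows. The function names are hypothetical, and the sketch intersects the infinite lines through the given points; a check that the point lies on both detected segments would be an additional step.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def line_intersection(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4.

    Returns None if the lines are parallel or coincident.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = det2(x1 - x2, y1 - y2, x3 - x4, y3 - y4)
    if denom == 0:
        return None
    d12 = det2(x1, y1, x2, y2)
    d34 = det2(x3, y3, x4, y4)
    px = det2(d12, x1 - x2, d34, x3 - x4) / denom
    py = det2(d12, y1 - y2, d34, y3 - y4) / denom
    return (px, py)


# A horizontal and a vertical pipe segment.
print(line_intersection((0, 2), (10, 2), (4, 0), (4, 10)))  # (4.0, 2.0)
```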
Figure 3: Typical line crossing types in piping and instrumental diagrams in building energy systems (two directional
crossing, three directional crossing and four directional crossing)
4.1.5. Connection derivation algorithm
Figure 4: Connection derivation procedure with detected symbols (Valve-1, Valve-2, Pump-4, Flap-3), detected lines
(Line-1, Line-2) and line crossings (LineCrossing-1)
As basic information, the connection derivation algorithm uses the core objects described in the previous sections and
joins them in order to derive the topological connections. The procedure can be seen in Figure 4, shown by way of example
for finding the TBE symbols connected to "Pump-4". First, the lines that cross that symbol are identified. This is done
using the Liang-Barsky algorithm [66]. Then, the neighboring elements of this symbol are detected by searching for the
closest element that is on the same line as the symbol. In Figure 4, this is "LineCrossing-1" for the left neighbor.
In that case, the TBE neighbors depend on the directional information of the line crossing, according to Figure 3. As this
is a three-directional crossing, all symbols linked via this crossing count as connections. So the neighboring TBE symbols
for "Pump-4" can be determined as "Valve-1" and "Valve-2". The procedure will be done for all TBE symbols on the test
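The test of whether a detected line crosses a symbol can be sketched with Liang-Barsky clipping of a segment against the symbol's bounding box. The function name, the box layout (x_min, y_min, x_max, y_max) and the example coordinates are assumptions for illustration.

```python
def liang_barsky(p0, p1, box):
    """Clip segment p0-p1 against an axis-aligned box.

    Returns the part of the segment inside the box as a pair of points,
    or None if the segment does not touch the box.
    """
    x0, y0 = p0
    x1, y1 = p1
    x_min, y_min, x_max, y_max = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    # One (p, q) pair per box edge: left, right, lower and upper bound.
    for p, q in ((-dx, x0 - x_min), (dx, x_max - x0),
                 (-dy, y0 - y_min), (dy, y_max - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside the box
            continue
        t = q / p
        if p < 0:            # segment enters through this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:                # segment leaves through this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))


pump_box = (8, 2, 12, 8)  # hypothetical bounding box of a pump symbol
print(liang_barsky((0, 5), (20, 5), pump_box))   # ((8.0, 5.0), (12.0, 5.0))
print(liang_barsky((0, 0), (20, 0), pump_box))   # None
```

A line counts as crossing the symbol whenever the function returns a clipped segment rather than None.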
4.2. Data interface
Industry Foundation Classes in version 4 (IFC4) is the current open standard for BIM. It has only a limited
descriptive capability for sensors and actuators and their relations [67]. This capability is used even less frequently
in real-life projects [67].
None of the standards used for models by [68] and [15] includes specialized domain knowledge from BAS and BES.
For these reasons, we use the domain specific ontology-based Brick Schema [18] and label-based BUDO Schema [21] to
create a digital twin of BES systems as BES currently lack pervasive standards for automation [69].
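How detected objects and connections might be written out as Brick-style triples can be sketched as below. The export format of this work is not specified in detail, so everything here is an assumption: the mapping of the four symbol classes to Brick classes (in particular flap to Damper), the use of brick:feeds for a connection, the direction of each connection and the example namespace urn:example:bes#.

```python
# Assumed mapping from detected symbol class to a Brick class.
BRICK_CLASS = {
    "pump": "brick:Pump",
    "valve": "brick:Valve",
    "heat exchanger": "brick:Heat_Exchanger",
    "flap": "brick:Damper",
}


def to_turtle(equipment, connections):
    """Serialize (label, kind) pairs and (source, target) pairs as Turtle text."""
    lines = [
        "@prefix brick: <https://brickschema.org/schema/Brick#> .",
        "@prefix bes: <urn:example:bes#> .",  # hypothetical building namespace
        "",
    ]
    for label, kind in equipment:
        lines.append(f"bes:{label} a {BRICK_CLASS[kind]} .")
    for source, target in connections:
        # Assumption: a detected connection is expressed as a directed "feeds".
        lines.append(f"bes:{source} brick:feeds bes:{target} .")
    return "\n".join(lines)


turtle = to_turtle(
    equipment=[("Valve-1", "valve"), ("Pump-4", "pump")],
    connections=[("Valve-1", "Pump-4")],
)
print(turtle)
```

The detected topology itself is undirected, so a flow direction would have to come from additional knowledge before such directed relationships can be asserted.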
4.3. Used computer hardware and software
For all computer vision tasks, an HP Pavilion x360 convertible laptop with 8 GB RAM, an Intel Core i5-8250U CPU at
1.8 GHz and an Nvidia GeForce MX130 graphics processing unit (GPU) was used. The compute capability of this GPU
is 5.0, which makes it possible to use the Nvidia Compute Unified Device Architecture (CUDA) toolkit, which enables
the implementation of parallel computing applications. For this work, CUDA toolkit 10.0 was used for the training of the CNN.
All programming tasks in this approach were done with the programming language Python.
5. Results
To validate the proposed detection algorithms, binary classification is used. Important performance assessment parameters
derived from binary classification are recall, precision, F1 score, accuracy and average precision (AP) [70].
5.1. Object detection performance
Figure 5 shows the precision-recall curves for each of the four object classes. The previously defined object detection test
data set is used. The target state for the detection is the ground truth which is defined for every test image. The curves
are generated using an Intersection over Union (IoU)-threshold [71] between the ground truth and the detection bounding
boxes of 0.5, which led to the highest precision and recall values.
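For reference, the IoU criterion can be sketched as follows. The box layout (x_min, y_min, x_max, y_max) and the function names are assumptions; only the threshold of 0.5 comes from the text.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.5):
    """A detection counts as correct if its IoU with the ground truth reaches the threshold."""
    return iou(detection, ground_truth) >= threshold


truth = (0, 0, 10, 10)
print(is_match((1, 0, 11, 10), truth))  # True  (IoU = 90 / 110)
print(is_match((5, 0, 15, 10), truth))  # False (IoU = 50 / 150)
```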
All curves show the typical course of the graph, high precision values for low recall values and a decreasing precision for
higher recall. For all curves, the precision stays above 90% until a recall of around 80%. For all classes, this leads to AP
values of around 80% or higher. The highest AP is reached for the class valve with a value of 97.5%. Even at a recall
of 1, the precision is at around 90%. Consequently, all valve symbols labeled as ground truths were detected, while only
10% of the predicted detections were false positives.
Figure 6 shows the time duration for different procedures of the whole process from data preparation to actual object
detection. It shall be noted that these values are not universally valid. They show the time we needed for these tasks and
can serve as a reference to evaluate the temporal and effort cost of the different procedures.
The first three entries cover generating the training image set from the raw data sources; for these, only times per image
are given. Times per object are not meaningful in this case, as the work effort is independent of the number of objects
drawn on the image. The preprocessing takes around 120 minutes in total. The time per image ranges from around 0.05
up to 2 minutes. The labeling takes 154 minutes, which is comparable to the preprocessing. The time per object indicates
the time needed for labeling one object, which is about 20-25 seconds. The training time is 120 minutes with the data set
of 154 images and 296 objects. The times for the described procedures add up to a total of around 400 minutes for
preparing a usable CNN object detection algorithm. The actual detection process for all images of the test image set
takes 5 minutes.
Figure 5: Precision-recall-curve for classes flap, heat exchanger, pump and valve, using IoU-threshold 0.5
5.2. Connection detection performance
The connection detection performance is evaluated using the defined connection test data set with the numbers shown in
Figure 1 and the ground truth being connection matrices for all test images. To see the performance of the connection
algorithm separately without taking into account errors from the object detection, the ground truths for the TBE symbols
are used for the evaluation. Figure 7 shows the evaluation metrics for the total connection test image data set. 39
elements were identified as false negatives, so were not detected although a connection exists. 116 elements were identified
correctly, which leads to a recall of 75%. The precision is 91%, 11 elements were misdetected as being connections
although no connection exists. The number of false positives is low.
400 connection candidates were predicted correctly as true negative, so correctly not being connections. As the number
of true negatives is higher than the number of true positives, the accuracy is higher than the recall, as the accuracy takes
both, TP and TN into account. It is slightly above 90%.
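The reported values can be reproduced from the four counts given above (116 true positives, 39 false negatives, 11 false positives and 400 true negatives, 566 elements in total). The function name is illustrative; the formulas are the standard binary classification definitions.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),  # negative predictive value
    }


metrics = binary_metrics(tp=116, fn=39, fp=11, tn=400)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# recall: 0.748, precision: 0.913, accuracy: 0.912, f1: 0.823, ...
```

The large number of true negatives dominates the accuracy, which is why it lies well above the recall.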
6. Discussion
6.1. Object detection
Building energy systems (BES), unlike industrial systems, do not have an established standard for the construction of
P&ID in practice. Despite the inhomogeneous data set, the results of the object recognition are very good,
with 93% average precision (AP) over all classes. The results of connection detection are highly dependent on the
preceding object detection. However, the object detection results depend on the object class. Heavily skewed symbols,
like the heat exchanger, achieve good results with up to 78% AP. However, they do not reach the results of symbols with
constant proportions, like valves with 97.5% AP. The heat exchanger is a decisive element for the topology detection of
building energy systems. Here, two hydraulically decoupled systems are connected with each other. In addition to the
skew, the symbols were also marked in different colors and their patterns were different. Other symbols were also different
in shape, but could be generalized well by the Faster R-CNN algorithm.
Especially for approaches of other domains, the comparison to our results is difficult. Since there is no open data set for
building energy systems, a comparison always depends on the data set used and the object types used. If these conditions
are disregarded, the results of object detection are comparable to the results of [26] and [49]. [49] used template matching.
Our results of <70% F1 score for template matching show that a universal approach using artificial intelligence is
needed. [52] had problems recognizing small symbols in the P&ID with Faster R-CNN. We solved this problem by
automating the decomposition of the P&ID in smaller pieces.
Erroneous symbol detections (false positives) were generated especially for symbols that were not included in the consid-
ered object types.
Figure 6: Total time duration, time duration per image and time duration per object for different tasks in object detection
6.2. Connection detection
With an accuracy of 91% and a precision of 91%, many connections could be detected correctly. We found that the
algorithm detects two-directional connections very well, but does not achieve this accuracy for connections over
three-directional and four-directional crossings.
Considering all adversities of a comparison, we were able to exceed the results of [50] (65% accuracy) and performed
slightly worse compared to [49] (92% accuracy).
6.3. Overall evaluation
It is remarkable that despite the large variety of standards for P&ID in buildings, our approach produces good results. The
actual usefulness in applications depends on the use case.
For the automated transformation of a digital twin in control applications, such as MPC, the recognition of the inputs and
outputs of a system is important [72]. The definition of the system boundaries is crucial here. If for MPC the system
boundaries are defined to match the plan boundaries, our approach can support MPC. However, if the system boundaries
are set differently, a different approach to topology detection must be used.
For transformed Petri nets [7], the detected topology can provide a basis for the creation of control code. Here, the correct
identification of the connections and the object types is important. The object types could be determined very well. The
connections still need manual work for correct use in Petri nets.
For fault detection and diagnosis, the suitability of our algorithm depends on the methods used. Our algorithm can support
the generation of time series of individual fault models. In this case, isolated incorrectly detected connections are not
decisive. If the automatically generated digital twin is used in parallel operation for fault detection, the model exported
from our approach cannot be used directly and must be adjusted first.
The approach in [22] for generating building energy systems can be used directly. Depending on the use of the model,
it is not decisive whether all connections are identified correctly. For example, in the approach of generic data sets
shown in [22], the exact connection of the systems is not decisive. However, if the model is used for building energy
performance simulation (BEPS) to predict the exact energy consumption, an inaccurate model is not suitable [23].
For the integration into a building automation system, e.g. as a simulated virtual sensor, a very accurate model may be
necessary, depending on the type of integration. For this purpose, our approach is only of limited use, and the exported
digital twin model has to be adapted. However, the translation into a labeling approach for building automation systems
supports its use there.
Figure 7: Connection detection evaluation metrics (recall, precision, F1 score, accuracy, specificity, negative predictive
7. Conclusion
7.1. Overall performance
We created a data set for the detection of topologies in building energy systems with piping and instrumentation diagrams
as the data source. We tracked the time needed to prepare our data set and to implement the algorithms. The
combination of object recognition, line detection, line crossing and connection detection algorithms showed good results.
The algorithm recognized all object categories and connections well. An extension of the test data set by further object
types could improve the object recognition algorithm. This would result in fewer false detections of object types not taken
into account. The integration of skewed symbols in the test data set could also improve the results, especially for the
object type heat exchanger.
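As an illustration of the line crossing step named above, the following sketch tests whether two detected line segments intersect and returns the crossing point. It uses the standard parametric line-line intersection formula, which also appears in the reference list [65]. Whether the paper's implementation follows exactly this form is an assumption, and the function name and tolerance are hypothetical.

```python
def segment_intersection(p1, p2, p3, p4, eps=1e-9):
    """Return the intersection point of segments p1-p2 and p3-p4, or None.

    Each point is an (x, y) tuple. Parallel and collinear segments are
    treated as non-intersecting in this simplified sketch.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel or collinear lines
    # t and u are the positions of the crossing along each segment.
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None  # the infinite lines cross outside the segments


# A horizontal pipe crossed by a vertical pipe (illustrative pixel coordinates).
crossing_point = segment_intersection((0, 0), (10, 0), (5, -5), (5, 5))
```

A crossing found this way still has to be classified as a junction or a mere overlap of two unconnected pipes, which is where the three-directional and four-directional cases discussed in Section 6.2 become difficult.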
We showed that plans of technical building equipment are a good source for the system understanding of a building energy
system. The direct use of the exported digital twins depends on the application.
7.2. Further use of the developed digital twins
The approach presented here provides a basis for advanced algorithms to optimize energy use. We discussed whether our
approach supports model predictive control, generation of control code, fault detection and diagnosis, automatically
generated simulation models and integration into building automation systems, depending on the application. We will
implement and review these approaches in our tool chain in future work.
Acknowledgements
We gratefully acknowledge the financial support provided by the BMWi (Federal Ministry for Economic Affairs and
Energy), promotional reference 03SBE006A.
References
[1] Nordhaus W. Climate Change: The Ultimate Challenge for Economics. American Economic Review.
[2] International Energy Agency. Tracking Buildings;. Available from:
[3] European Commission. Directive (EU) 2018/844 of the European Parliament and of the Council of 30 May 2018
amending Directive 2010/31/EU on the energy performance of buildings and Directive 2012/27/EU on energy
efficiency;. Available from:
[4] D’Agostino D, Zangheri P, Castellazzi L. Towards Nearly Zero Energy Buildings in Europe: A Focus on Retrofit in
Non-Residential Buildings. Energies. 2017;10(1):117.
[5] International Energy Agency. Transition to Sustainable Buildings. OECD; 2013.
[6] Waide P, Ure J, Karagianni N, Smith G, Bordass B. WAIDE STRATEGIC EFFICIENCY, editor. The scope for
energy and CO2 savings in the EU through the use of building automation technology: Final Report;. Available
from:
[7] Schumacher F, Fay A. Formal representation of GRAFCET to automatically generate control code. Control
Engineering Practice. 2014;33:84–93.
[8] Prívara S, Cigler J, Váňa Z, Oldewurtel F, Sagerschnig C, Žáčeková E. Building modeling as a crucial part for
building predictive control. Energy and Buildings. 2013;56:8–22.
[9] Kim W, Katipamula S. A review of fault detection and diagnostics methods for building systems. Science and
Technology for the Built Environment. 2018;24(1):3–21.
[10] Capozzoli A, Piscitelli MS, Brandi S, Grassi D, Chicco G. Automated load pattern learning and anomaly detection
for enhancing energy management in smart buildings. Energy. 2018;157:336–352.
[11] Tao F, Cheng J, Qi Q, Zhang M, Zhang H, Sui F. Digital twin-driven product design, manufacturing and service with
big data. The International Journal of Advanced Manufacturing Technology. 2018;94(9-12):3563–3576.
[12] El Saddik A. Digital Twins: The Convergence of Multimedia Technologies. IEEE MultiMedia. 2018;25(2):87–92.
[13] Fuentes DED, Becker U, Diekhake P, Gunther M, Scholz A, Schmidt PP, et al. Evaluation and simulation of building
automation systems based on their AutomationML description. In: 2016 IEEE 21st International Conference on
Emerging Technologies and Factory Automation (ETFA). [Piscataway, New Jersey]: IEEE; 2016. p. 1–6.
[14] Schmidt M, Moreno MV, Schülke A, Macek K, Mařík K, Pastor AG. Optimizing legacy building operation: The
evolution into data-driven predictive cyber-physical systems. Energy and Buildings. 2017;148:257–279.
[15] Arroyo E, Hoernicke M, Rodríguez P, Fay A. Automatic derivation of qualitative plant simulation models from
legacy piping and instrumentation diagrams. Computers & Chemical Engineering. 2016;92:112–132.
[16] Isaksson AJ, Harjunkoski I, Sand G. The impact of digitalization on the future of control and operations. Computers
& Chemical Engineering. 2018;114:122–129.
[17] Lydon GP, Caranovic S, Hischier I, Schlueter A. Coupled simulation of thermally active building systems to support
a digital twin. Energy and Buildings. 2019;202:109298.
[18] Balaji B, Bhattacharya A, Fierro G, Gao J, Gluck J, Hong D, et al. Brick: Metadata schema for portable smart
building applications. Applied Energy. 2018;226:1273–1292.
[19] Haynes A. ASHRAE’s BACnet Committee, Project Haystack and Brick Schema Collaborating to Provide
Unified Data Semantic Modeling Solution. ATLANTA, BERKLEY, CA and RICHMOND, VA; 28.02.2018.
Available from:
project-haystack-and-brick- schema-collaborating-to-provide-unified-data-
[20] Gao J, Bergés M. A large-scale evaluation of automated metadata inference approaches on sensors from air handling
units. Advanced Engineering Informatics. 2018;37:14–30.
[21] Stinner F, Kornas A, Baranski M, Müller D. Structuring building monitoring and automation system data. In:
REHVA, editor. The REHVA European HVAC Journal - August 2018. REHVA Journal; 2018. p. 10–15. Available
[22] Stinner F, Yang Y, Schreiber T, Bode G, Baranski M, Müller D. Generating Generic Data Sets for Machine Learning
Applications in Building Services Using Standardized Time Series Data. In: Al-Hussein M, editor. Proceedings
of the 36th International Symposium on Automation and Robotics in Construction (ISARC). Proceedings of the
International Symposium on Automation and Robotics in Construction (IAARC). International Association for
Automation and Robotics in Construction (IAARC); 2019.
[23] Egan J, Finn D, Deogene Soares PH, Rocha Baumann VA, Aghamolaei R, Beagon P, et al. Definition of a useful
minimal-set of accurately-specified input data for Building Energy Performance Simulation. Energy and Buildings.
[24] Arroyo E, Fay A, Hoernicke M, Rodriguez P. Digitalisierung grafischer Engineering-Dokumente mit Hilfe optischer
Erkennung und semantischer Analyse als Grundlage für die Modernisierung bestehender Anlagen. In: Automation
2015; 2015.
[25] Moreno-García CF, Elyan E, Jayne C. New trends on digitisation of complex engineering drawings. Neural
Computing and Applications. 2018.
[26] Yuxi Zhang. CNN-based Symbol Recognition and Detection in Piping Drawings [Master thesis]. Purdue University.
West Lafayette, USA; 2019. Available from:
[27] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
Networks. In: Computing research repository. vol. abs/1506.01497;. Available from:
[28] Charef R, Emmitt S, Alaka H, Fouchal F. Building Information Modelling adoption in the European Union: An
overview. Journal of Building Engineering. 2019;25:100777.
[29] Reiß & Rommerich. Bericht zum Thema Building Information Modeling (BIM): Bundesweite Befragung der
Mitglieder der Architektenkammern der Länder;. Available from:
[30] Deutsches Institut für Normung. DIN EN 806-1 - Specifications for installations inside buildings conveying water
for human consumption – Part 1: General; 12/2001.
[31] Deutsches Institut für Normung. DIN EN 12792:2003 - Ventilation for buildings - Symbols, terminology and
graphical symbols; 2004-01.
[32] Deutsches Institut für Normung. DIN 19227-2:1991 - Control technology; graphical symbols and identifying letters
for process control engineering; representation of details; 1991.
[33] Deutsches Institut für Normung. DIN EN 60617-2:1997 - Graphical symbols for diagrams - Part 2: Symbol elements,
qualifying symbols and other symbols having general application; 1997.
[34] Deutsches Institut für Normung. DIN EN 60617-3 - Graphical symbols for diagrams - Part 3: Conductors and
connecting devices (IEC 60617-3:1996); German version EN 60617-3:1996; 1997.
[35] Deutsches Institut für Normung. DIN EN 60617-11 - Graphical symbols for diagrams - Part 11: Architectural and
topographical installation plans and diagrams (IEC 60617-11:1996); German version EN 60617-11:1996; 1997.
[36] Deutsches Institut für Normung. DIN EN ISO 10628-1:2014 - Diagrams for the chemical and petrochemical industry
- Part 1: Specification of diagrams; 04/2015.
[37] Deutsches Institut für Normung. DIN EN ISO 10628-2 - Diagrams for the chemical and petrochemical industry -
Part 2: Graphical symbols (ISO 10628-2:2012); German version EN ISO 10628-2:2012; 04/2013.
[38] Deutsches Institut für Normung. DIN 6779-12 - Structuring principles for technical products and technical product
documentation - Part 12: Buildings and building technology; 04/2011.
[39] Deutsches Institut für Normung. DIN 6779-13 - Structuring principles for technical products and technical product
documentation - Part 13: Chemical plants; 01/2018.
[40] Deutsches Institut für Normung. DIN 28000-4 - Chemical apparatus - Documentation in the life cycle of process
plants - Part 4: Graphical symbols of valves, pipes and actuators; 07/2014.
[41] Deutsches Institut für Normung. DIN 28000-5 - Chemical apparatus - Documentation in the life cycle of process
plants - Part 5: Graphical symbols of apparatus and machines; 04/2015.
[42] Deutsches Institut für Normung. DIN ISO/TS 81346-10:2015 - Industrial systems, installations and equipment
and industrial products - Structuring principles and reference designation - Part 10: Power plants (ISO/TS
81346-10:2015); 10/2015.
[43] Gellaboina MK, Venkoparao VG. Graphic Symbol Recognition Using Auto Associative Neural Network Model. In:
Chanda B, editor. Proceedings. Los Alamitos, Calif.: IEEE Computer Society Press; 2009. p. 297–301.
[44] Gutermuth G, Hoernicke M. Automatic generation of plant topologies by analysing operations data. In: Automation
IICoET, Factory, editors. 2017 22nd IEEE International Conference on Emerging Technologies and Factory
Automation. [Piscataway, NJ]: IEEE; 2017. p. 1–8.
[45] Moreno-García CF, Elyan E, Jayne C. Heuristics-Based Detection to Improve Text/Graphics Segmentation in
Complex Engineering Drawings. In: Boracchi G, Iliadis L, Jayne C, Likas A, editors. Engineering applications of neural
networks. vol. 744 of Communications in Computer and Information Science. [Place of publication not identified]:
Springer International Publishing; 2017. p. 87–98.
[46] Martinez GS, Sierla S, Karhela T, Vyatkin V. Automatic Generation of a Simulation-Based Digital Twin of an
Industrial Process Plant. In: IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society.
Piscataway, NJ: IEEE; 2018. p. 3084–3089.
[47] Tan WC, Chen IM, Pantazis D, Pan SJ. Transfer Learning with PipNet: For Automated Visual Analysis of Piping
Design. In: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE). [S.l.]: IEEE;
8/20/2018 - 8/24/2018. p. 1296–1301.
[48] Koltun G, Maurer F, Knoll A, Trunzer E, Vogel-Heuser B. Information Retrieval from Redlined Circuit Diagrams
and its Model-Based Representation for Automated Engineering. In: IECON 2018 - 44th Annual Conference of the
IEEE Industrial Electronics Society. Piscataway, NJ: IEEE; 2018. p. 3114–3119.
[49] Kang SO, Lee EB, Baek HK. A Digitization and Conversion Tool for Imaged Drawings to Intelligent Piping and
Instrumentation Diagrams (P&ID). Energies. 2019;12(13):2593.
[50] Rahul R, Paliwal S, Sharma M, Vig L. Automatic Information Extraction from Piping and Instrumentation Diagrams.
In: de Maria E, Fred A, Gamboa H, editors. Proceedings. Bioinformatics. [S. l.]: SCITEPRESS = Science and
Technology Publications; 2019. p. 163–172.
[51] Yu, Cha, Lee, Kim, Mun. Features Recognition from Piping and Instrumentation Diagrams in Image Format Using
a Deep Learning Network. Energies. 2019;12(23):4425.
[52] Nurminen JK, Rainio K, Numminen JP, Syrjänen T, Paganus N, Honkoila K. Object Detection in Design Diagrams
with Machine Learning. In: Burduk R, Kurzyński M, Woźniak M, editors. Progress in computer recognition systems.
vol. 977 of Advances in Intelligent Systems and Computing. Cham, Switzerland: Springer; 2020. p. 27–36.
[53] Rica E, Moreno-García CF, Álvarez S, Serratosa F. Reducing human effort in engineering drawing validation.
Computers in Industry. 2020;117:103198.
[54] Bigvand PG, Fay A. A workflow support system for the process and automation engineering of production plants.
In: 2017 IEEE International Conference on Industrial Technology (ICIT). Piscataway, NJ: IEEE; 2017. p. 1118–1123.
[55] Hoernicke M, Fay A, Barth M. Virtual plants for brown-field projects. In: 2015 IEEE 20th Conference on Emerging
Technologies & Factory Automation (ETFA). Piscataway, NJ: IEEE; 2015. p. 1–8.
[56] El-Harby AA. New Square Scan Algorithm. In: GVIP Journal. vol. 9; 2005.
[57] Kiryati N, Eldar Y, Bruckstein AM. A probabilistic Hough transform. Pattern Recognition. 1991;24(4):303–316.
[58] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Computer Vision and
Pattern Recognition (CVPR), 2015 IEEE Conference on. [Piscataway, New Jersey]: [IEEE]; 2015. p. 3431–3440.
[59] Vasa H. Google images download; 2019. Available from:
[60] Deutsches Institut für Normung. DIN EN 81346-1: Industrial systems, installations and equipment and industrial
products - Structuring principles and reference designations - Part 1: Basic rules (IEC 81346-1:2009); German
version EN 81346-1:2009; 2009.
[61] Bode G, Stinner F, Baranski M, Brümmendorf E, Cai X, Kümpel A, et al. From plans to programs: A holistic
toolchain for building data applications. Journal of Physics: Conference Series. 2019;1343:012117. Available from:
[62] Eggert C, Brehm S, Winschel A, Zecha D, Lienhart R. A closer look: Small object detection in faster R-CNN. In:
2017 IEEE International Conference on Multimedia and Expo (ICME). Piscataway, NJ: IEEE; 2017. p. 421–426.
[63] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-Scale Machine Learning on
Heterogeneous Systems; 2015. Available from:
[64] ; Method and means for recognizing complex patterns. ; 1962-12-18. Available from: https://www.osti.
[65] Weisstein EW. Line-Line Intersection; 2007. Available from:
[66] Barsky BA, Liang Y, Slater M. University of California at Berkeley, editor. Some Improvements to a Parametric
Line Clipping Algorithm. USA;.
[67] Lange H, Johansen A, Kjærgaard MB. Evaluation of the opportunities and limitations of using IFC models as source
of building metadata. In: Ramachandran GS, Batra N, editors. BuildSys’18. New York, New York: The Association
for Computing Machinery; 2018. p. 21–24.
[68] Koltun G, Basirati MR, Subhan Hammeed M, Bohm M, Krcmar H, Vogel-Heuser B. Reverse Engineering on
changed Functional Specification Documents for Model-Based Requirements Engineering. In: 2019 IEEE International
Conference on Industrial Cyber Physical Systems (ICPS). IEEE; 5/6/2019 - 5/9/2019. p. 687–692.
[69] Krutwig M, Kölmel B, Tantau A, Starosta K. Standards for Cyber-Physical Energy Systems—Two Case Studies
from Sensor Technology. Applied Sciences. 2019;9(3):435.
[70] Nanopoulos A, Alcock R, Manolopoulos Y. Feature-based Classification of Time-series Data. In: Mastorakis N,
Nikolopoulos SD, editors. Information processing and technology. Commack, NY, USA: Nova Science Publishers,
Inc; 2001. p. 49–61. Available from:
[71] Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized Intersection over Union: A Metric
and A Loss for Bounding Box Regression;. Available from:
[72] Christofides PD, Scattolini R, Muñoz de la Peña D, Liu J. Distributed model predictive control: A tutorial review
and future research directions. Computers & Chemical Engineering. 2013;51:21–41.