Article

Extraction of line objects from piping and instrumentation diagrams using an improved continuous line detection algorithm

Abstract

Digitizing image-format piping and instrumentation diagrams (P&IDs) consists of a step that detects the information objects that constitute P&IDs and identifies connection relationships between the detected objects, and a step that creates digital P&IDs. This paper presents a P&ID line object extraction method that uses an improved continuous line detection algorithm to extract the information objects that constitute P&IDs. The improved algorithm reduces the time spent on line extraction by using edge detection based on a differential filter. It detects continuous lines in the vertical, horizontal, and diagonal directions. Additionally, it processes diagonal continuous lines after image differentiation in order to handle short continuous lines, which are a major cause of misdetection when detecting diagonal continuous lines. The P&ID line object extraction method that incorporates this algorithm consists of three steps. First, the preprocessing step removes the diagram’s outline borders and heading areas. Second, the detection step detects continuous lines and then detects the special signs that are needed to distinguish different types of lines. Third, the postprocessing step uses the detected line signs to identify the detected continuous lines that must be converted to other line types and changes their types accordingly; it then merges the lines with the flow arrow detection information. To verify the proposed method, an image-format P&ID line extraction system prototype was implemented, and line extraction tests were conducted. Across nine test P&IDs, the overall average precision and recall were 95.26 % and 91.25 %, respectively, demonstrating good line extraction performance.
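The full text is not available here, so the authors' exact algorithm is unknown. As a rough illustration of the idea of finding continuous lines with a differential filter, the following minimal sketch applies a first-order difference to each row of a binary image: a value of +1 marks where a run of line pixels starts and -1 marks where it ends. The function names, the list-of-lists image format, and the `min_length` parameter are assumptions made for this example, and diagonal detection is omitted.

```python
def horizontal_runs(image, min_length):
    """Return (row, col_start, col_end) for each horizontal run of 1-pixels
    that is at least min_length pixels long (col_end is inclusive)."""
    segments = []
    for r, row in enumerate(image):
        # Pad with background so runs touching the border are closed.
        padded = [0] + list(row) + [0]
        # First-order difference: +1 at a run start, -1 just past a run end.
        diff = [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]
        starts = [i for i, d in enumerate(diff) if d == 1]
        ends = [i for i, d in enumerate(diff) if d == -1]
        for s, e in zip(starts, ends):
            if e - s >= min_length:
                segments.append((r, s, e - 1))
    return segments


def vertical_runs(image, min_length):
    """Return (col, row_start, row_end) by running the same scan on the transpose."""
    transposed = [list(col) for col in zip(*image)]
    return horizontal_runs(transposed, min_length)


# Hypothetical 4x7 binary image: 1 = line pixel, 0 = background.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
]
print(horizontal_runs(image, 4))  # runs of at least 4 pixels along rows
print(vertical_runs(image, 3))    # runs of at least 3 pixels along columns
```

Short runs, such as the two-pixel run in the last row, fall below `min_length` and are discarded, which is one simple way to suppress the short segments that the abstract names as a cause of misdetection.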

... To implement these approaches, besides identifying specific objects in the EDs, it is necessary to extract lines from the drawings. Moon et al. (2023) noted that identifying lines could help establish the connection relationships between the detected symbols; in their work, they digitalized the lines of P&IDs with an average precision of 95.26 % and an average recall of 91.25 %. To improve the interpretation of P&IDs, Han et al. used a rule-based continuous line classification, and Ekanayake et al. (2023) considered a CNN with the RandomizedSearchCV method. ...
Article
Engineering drawings of railway interlocking systems are often legacy documents, since the railway networks were built several years ago. Most of these drawings remain archived on handwritten sheets and need to be digitalized so that updates and safety checks can continue. This digitalization task is challenging because it requires major manual labor, and standard machine learning methods may not perform satisfactorily because drawings can be noisy and have poor sharpness. Considering these challenges, this paper proposes a hybrid method that combines machine learning models, clustering techniques, computer vision, and rule-based methods. A fine-tuned deep learning model is applied to identify symbols, labels, specifiers, and electrical connections. The lines representing electrical connections are determined using a combination of probabilistic Hough transform and clustering techniques. The identified letters are joined to create the labels by applying rule-based methods, and electrical connections are attached to symbols in a graph structure. A readable output is created for a drawing interface using the edges from the graph structure and the position of the detected objects. The method proposed in this paper can support the digitization of other engineering drawings, helping to address the challenge of digitizing engineering schemes.
Preprint
Engineering drawings of railway interlocking systems are often legacy documents, since the railway networks were built several years ago. Most of these drawings remain archived on handwritten sheets and need to be digitalized so that updates and safety checks can continue. This digitalization task is challenging because it requires major manual labor, and standard machine learning methods may not perform satisfactorily because drawings can be noisy and have poor sharpness. Considering these challenges, this paper proposes a hybrid method that combines machine learning models, clustering techniques, computer vision, and rule-based methods. A fine-tuned deep learning model is applied to identify symbols, letters, numbers, and specified objects. The lines representing electrical connections are determined using a combination of probabilistic Hough transform and clustering techniques. The identified letters are joined to create the labels by applying rule-based methods, and electrical connections are attached to symbols in a graph structure. A readable output is created for a drawing interface using the edges from the graph structure and the position of the detected objects. The method proposed in this paper can be applied to other engineering drawings and is a generalizable solution to the challenge of digitizing engineering schemes.
... Luo & Liu (2003) used a case-based recognition method to specify symbols from relationships such as points and lines. Additionally, Moon et al. (2023) used an image-based line recognition algorithm to find connecting lines in P&IDs. Conversely, if the drawing is not created accurately and the creator makes a mistake when drawing some objects, there is a disadvantage in that the corresponding part cannot be found. ...
Article
In this study, we developed a method to simplify the analysis of complex Piping and Instrumentation Diagrams (P&IDs) on ships. By converting P&IDs into a graph format, we extracted lines and symbols from the original DXF files, enabling easier identification of connections between ship systems. Using the graph, we can intuitively understand complex P&IDs and easily apply them to research such as pipe routing optimization. This approach enhances the understanding of ship systems and has potential applications in recommending similar systems within existing ships, streamlining the design and analysis process.
... Piping and Instrumentation Diagrams (P&IDs) are another interesting field for automatic digitization. For example, recent works [33,34] present, respectively, a method that uses an improved continuous line detection algorithm to extract information from P&IDs, including identifying connection relationships and creating digital P&IDs, and an end-to-end digitization method based on Deep Neural Networks (DNNs) that converts P&IDs into digital form through object recognition, topology reconstruction, and diagram generation. A review of papers on hand-drawn chemical structure reconstruction can be found in [35]. ...
Article
Pedigree charts remain essential in oncological genetic counseling for identifying individuals with an increased risk of developing hereditary tumors. However, this valuable data source often remains confined to paper files, going unused. We propose a computer-aided detection/diagnosis system, based on machine learning and deep learning techniques, capable of the following: (1) assisting genetic oncologists in digitizing paper-based pedigree charts, and in generating new digital ones, and (2) automatically predicting the genetic predisposition risk directly from these digital pedigree charts. To the best of our knowledge, there are no similar studies in the current literature, and consequently, no utilization of software based on artificial intelligence on pedigree charts has been made public yet. By incorporating medical images and other data from omics sciences, there is also a fertile ground for training additional artificial intelligence systems, broadening the software predictive capabilities. We plan to bridge the gap between scientific advancements and practical implementation by modernizing and enhancing existing oncological genetic counseling services. This would mark the pioneering development of an AI-based application designed to enhance various aspects of genetic counseling, leading to improved patient care and advancements in the field of oncogenetics.
... Then, straight line joining is performed with center of gravity consistency and endpoint distance constraints. Moon [21] proposed a P&ID line object extraction method whose algorithm detects the edges of straight lines with differential filters. The abovementioned image gradient-based straight line detection methods have good detection results, but the detection speed is slow. ...
Article
Aiming at the problems of the poor recognition effect and low recognition rate of the existing methods in the process of belt deviation detection, this paper proposes a real-time belt deviation detection method. Firstly, ResNet18 combined with the attention mechanism module is used as a feature extraction network to enhance the features in the belt edge region and suppress the features in other regions. Then, the extracted features are used to predict the approximate locations of the belt edges using a classifier based on the contextual information on the fully connected layer. Next, the improved gradient equation is used as a structural loss in the model training stage to make the model prediction value closer to the target value. Then, the authors of this paper use the least squares method to fit the set of detected belt edge line points to obtain the accurate belt edge straight line. Finally, the deviation threshold is set according to the requirements of the safety production code, and the fitting results are compared with the threshold to achieve the belt deviation detection. Comparisons are made with four other methods: ultrafast structure-aware deep lane detection, end-to-end wireframe parsing, LSD, and the Hough transform. The results show that the proposed method is the fastest at 41 frames/s; the accuracy is improved by 0.4%, 13.9%, 45.9%, and 78.8% compared to the other four methods; and the F1-score index is improved by 0.3%, 10.2%, 32.6%, and 72%, respectively, which meets the requirements of practical engineering applications. The proposed method can be used for intelligent monitoring and control in coal mines, logistics and transport industries, and other scenarios requiring belt transport.
Article
Digital transformation is omnipresent in our daily lives and its impact is noticeable through new technologies, like smart devices, AI-Chatbots or the changing work environment. This digitalization also takes place in product development, with the integration of many technologies, such as Industry 4.0, digital twins or data-driven methods, to improve the quality of new products and to save time and costs during the development process. Therefore, the use of data-driven methods reusing existing data has great potential. However, data from product design are very diverse and strongly depend on the respective development phase. One of the first few product representations are sketches and drawings, which represent the product in a simplified and condensed way. But, to reuse the data, the existing sketches must be found with an automated approach, allowing the contained information to be utilized. One approach to solve this problem is presented in this paper, with the detection of principle sketches in the early phase of the development process. The aim is to recognize the symbols in these sketches automatically with object detection models. Therefore, existing approaches were analyzed and a new procedure developed, which uses synthetic training data generation. In the next step, a total of six different data generation types were analyzed and tested using six different one- and two-stage detection models. The entire procedure was then evaluated on two unknown test datasets, one focusing on different gearbox variants and a second dataset derived from CAD assemblies. In the last sections the findings are discussed and a procedure with high detection accuracy is determined.
Article
Gait analysis has been studied for a long time and applied to fields such as security, sport, and medicine. In particular, clinical gait analysis has played a significant role in improving the quality of healthcare. With the growth of machine learning technology in recent years, deep learning-based approaches to gait analysis have become popular. However, a large number of samples are required for training models when using deep learning, where the amount of available gait-related data may be limited for several reasons. This paper discusses certain techniques that can be applied to enable the use of deep learning for gait analysis in case of limited availability of data. Recent studies on the clinical applications of deep learning for gait analysis are also reviewed, and the compatibility between these applications and sensing modalities is determined. This article also provides a broad overview of publicly available gait databases for different sensing modalities.
Article
As part of research on technology for automatic conversion of image-format piping and instrumentation diagram (P&ID) into digital P&ID, the present study proposes a method for recognizing various types of lines and flow arrows in image-format P&ID. The proposed method consists of three steps. In the first step of preprocessing, the outer border and title box in the diagram are removed. In the second step of detection, continuous lines are detected, and then line signs and flow arrows indicating the flow direction are detected. In the third step of post-processing, using the results of line sign detection, continuous lines that require changing of the line type are determined, and the line types are adjusted accordingly. Then, the recognized lines are merged with flow arrows. For verification of the proposed method, a prototype system was used to conduct an experiment of line recognition. For the nine test P&IDs, the average precision and recall were 96.14% and 89.59%, respectively, showing high recognition performance.
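The precision and recall figures quoted above follow the standard definitions: precision is the fraction of detected lines that are correct, and recall is the fraction of true lines that were detected. The sketch below shows that arithmetic. The per-diagram counts are hypothetical placeholders, and averaging per diagram is an assumption, since the abstract does not say how the average over the nine P&IDs was formed.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts; 0.0 when undefined."""
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def average_over_diagrams(per_diagram_counts):
    """Average precision and recall over diagrams (assumed averaging scheme)."""
    scores = [precision_recall(*counts) for counts in per_diagram_counts]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)


# Hypothetical (TP, FP, FN) counts for two diagrams, not taken from the paper.
counts = [(90, 10, 30), (50, 50, 50)]
print(precision_recall(*counts[0]))
print(average_over_diagrams(counts))
```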
Article
Deep convolutional networks have obtained remarkable achievements on various visual tasks due to their strong ability to learn a variety of features. A well-trained deep convolutional network can be compressed to 20%–40% of its original size by removing filters that make little contribution, as many overlapping features are generated by redundant filters. Model compression can reduce the number of unnecessary filters but does not take advantage of redundant filters since the training phase is not affected. Modern networks with residual, dense connections and inception blocks are considered to be able to mitigate the overlap in convolutional filters, but do not necessarily overcome the issue. To do so, we propose a new training strategy, weight asynchronous update, which helps to significantly increase the diversity of filters and enhance the representation ability of the network. The proposed method can be widely applied to different convolutional networks without changing the network topology. Our experiments show that the stochastic subset of filters updated in different iterations can significantly reduce filter overlap in convolutional networks. Extensive experiments show that our method yields noteworthy improvements in neural network performance.
Article
Historical map classification has become an important application in today’s scenario of ever-changing land boundaries. Historical map changes include changes in the boundaries of cities and states, vegetation regions, water bodies, and so forth. Change detection in these regions is mainly carried out via satellite images. Hence, extensive knowledge of satellite image processing is necessary for historical map classification applications. This paper presents an exhaustive analysis of the merits and demerits of many satellite image processing methods. Though several computational methods are available, different methods perform differently across the various satellite image processing applications, and a wrong selection of methods will lead to inferior results for a specific application. This work highlights the methods and the suitable satellite imaging methods associated with these applications. Several comparative analyses are also performed to show the suitability of several methods. This work will help support the selection of innovative solutions for the different problems associated with satellite image processing applications.
Article
Piping and instrument diagrams (P&IDs) are a key component of the process industry; they contain information about the plant, including the instruments, lines, valves, and control logic. However, the complexity of these diagrams makes it difficult to extract the information automatically. In this study, we implement an object-detection method to recognize graphical symbols in P&IDs. The framework consists of three parts—region proposal, data annotation, and classification. Sequential image processing is applied as the region proposal step for P&IDs. After getting the proposed regions, the unsupervised learning methods, k-means, and deep adaptive clustering are implemented to decompose the detected dummy symbols and assign negative classes for them. By training a convolutional network, it becomes possible to classify the proposed regions and extract the symbolic information. The results indicate that the proposed framework delivers a superior symbol-recognition performance through dummy detection.
Article
This paper describes the navigation of an automated Pioneer P3-DX wheeled robot between obstacles using a feedforward neural network (FNN) tuned by the particle swarm optimization (PSO) algorithm. The PSO algorithm minimizes the mean square error between the actual and predicted values of the FNN. In this work, two separate DC motors and 16 ultrasonic sensors are used to set the differential-drive steering angle and to measure the distance to obstacles, respectively. Both the untuned FNN and the PSO-tuned FNN receive obstacle distances from the ultrasonic sensors as inputs and output the steering angle of the differential drive of the automated Pioneer P3-DX wheeled robot. We have compared the results of the untuned FNN and the PSO-tuned FNN, and found that the PSO-tuned FNN gives a better trajectory and travels a shorter distance to reach the target. Virtual Robot Experimentation Platform software has been used to produce the real-time simulation results. A comparative study between the untuned FNN and the PSO-tuned FNN verifies the effectiveness of the PSO-tuned FNN for automated Pioneer P3-DX wheeled robot navigation. We have also compared the winning PSO-tuned FNN to the previously developed PSO-optimized Fuzzy Logic Controller navigational technique to show the authenticity and real-time implementation of the PSO-tuned FNN.
Article
A piping and instrumentation diagram (P&ID) is a key drawing widely used in the energy industry. In a digital P&ID, all included objects are classified and made amenable to computerized data management. However, despite being widespread, a large number of P&IDs in the image format still in use throughout the process (plant design, procurement, construction, and commissioning) are hampered by difficulties associated with contractual relationships and software systems. In this study, we propose a method that uses deep learning techniques to recognize and extract important information from the objects in the image-format P&IDs. We define the training data structure required for developing a deep learning model for the P&ID recognition. The proposed method consists of preprocessing and recognition stages. In the preprocessing stage, diagram alignment, outer border removal, and title box removal are performed. In the recognition stage, symbols, characters, lines, and tables are detected. The objects for recognition are symbols, characters, lines, and tables in P&ID drawings. A new deep learning model for symbol detection is defined using AlexNet. We also employ the connectionist text proposal network (CTPN) for character detection, and traditional image processing techniques for P&ID line and table detection. In the experiments where two test P&IDs were recognized according to the proposed method, recognition accuracies for symbol, characters, and lines were found to be 91.6%, 83.1%, and 90.6% on average, respectively.
Chapter
Accurate, automated lesion detection in Computed Tomography (CT) is an important yet challenging task due to the large variation of lesion types, sizes, locations and appearances. Recent work on CT lesion detection employs two-stage region proposal based methods trained with centroid or bounding-box annotations. We propose a highly accurate and efficient one-stage lesion detector, by re-designing a RetinaNet to meet the particular challenges in medical imaging. Specifically, we optimize the anchor configurations using a differential evolution search algorithm. For training, we leverage the response evaluation criteria in solid tumors (RECIST) annotation which are measured in clinical routine. We incorporate dense masks from weak RECIST labels, obtained automatically using GrabCut, into the training objective, which in combination with other advancements yields new state-of-the-art performance. We evaluate our method on the public DeepLesion benchmark, consisting of 32,735 lesions across the body. Our one-stage detector achieves a sensitivity of 90.77% at 4 false positives per image, significantly outperforming the best reported methods by over 5%.
Article
Feature pyramids are widely exploited by both the state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and the two-stage object detectors (e.g., Mask RCNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multiscale, pyramidal architecture of the backbones, which were originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e. multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector we call M2Det by integrating it into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MSCOCO benchmark, M2Det achieves AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy and AP of 44.2 with a multi-scale inference strategy, which are the new state-of-the-art results among one-stage detectors. The code will be made available on https://github.com/qijiezhao/M2Det.
Article
In the Fourth Industrial Revolution, artificial intelligence technology and big data science are emerging rapidly. To apply these information technologies to the engineering industries, it is essential to digitize the data that are currently archived in image or hard-copy format. For previously created design drawings, the consistency between the design products is reduced in the digitization process, and the accuracy and reliability of estimates of the equipment and materials based on the digitized drawings are remarkably low. In this paper, we propose a method and system for automatically recognizing and extracting design information from imaged piping and instrumentation diagram (P&ID) drawings and automatically generating digitized drawings based on the extracted data, using digital image processing techniques such as template matching and the sliding window method. First, the symbols are recognized by template matching, extracted from the imaged P&ID drawing, and registered automatically in the database. Then, lines and text are recognized and extracted from the imaged P&ID drawing using the sliding window method and aspect ratio calculation, respectively. The extracted symbols for equipment and lines are associated with the attributes of the closest text and are stored in the database in a neutral format. This data is mapped to the predefined intelligent P&ID information and transformed to commercial P&ID tool formats with the associated information stored. As illustrated through the validation case studies, in the intelligent digitized drawings generated by the above automatic conversion system, the consistency of the design product is maintained, and the problems that engineering companies experience with the traditional manual P&ID input method, such as time consumption, missing items, and misspellings, are solved through the final fine-tune validation process.
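Template matching with a sliding window, as named in the abstract above, can be illustrated with a minimal sketch: the template is placed at every valid position in the image and scored, and the best-scoring position is taken as the match. The sum of absolute differences (SAD) score, the function names, and the tiny binary arrays are assumptions for this example; the abstract does not state which similarity measure the authors used.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the image window
    whose top-left corner is at (top, left)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def best_match(image, template):
    """Slide the template over every valid window and return
    (score, top, left) for the window with the lowest SAD."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return min(
        (sad(image, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )


# Hypothetical 4x5 image containing one copy of a 2x2 symbol.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [1, 0],
    [0, 1],
]
print(best_match(image, template))  # (score, top, left) of the best window
```

A score of zero means the window matches the template exactly. On real scans a nonzero acceptance threshold would be needed, and its value would have to be tuned to the drawings.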
Conference Paper
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3 % mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9 % mAP, outperforming a comparable state of the art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
Article
This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search which combines the strength of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our selective search results in a small set of data-driven, class-independent, high quality locations, yielding 99 % recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The selective search software is made publicly available (Software: http://disi.unitn.it/~uijlings/SelectiveSearch.html).
Conference Paper
The present work investigates the qualitative and quantitative effects of the convolution of a Gaussian function with an image. Besides the evaluation of the commonly called "Gaussian-blur" in the filtering of images, this work also investigates a methodology of segmentation using Gaussian blurring. Noise is inherent to the physical process of acquisition. Therefore, to know the effects of a filtering technique it is fundamental to choose the right technique to filter the image properly, since the segmentation process could be very expensive and time-consuming. An automated method for segmentation that saves time and human labor is always desirable. To evaluate the filtering characteristics, we chose a Quality Index in order to analyze in a quantitative way the effects of the convolution. Results show that the Gaussian Blur technique is to be used in images with high noise and with a Gaussian function of small variance whereas larger variance Gaussian function is more relevant in segmentation of images.
Article
This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.
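The "simple approximate implementation" mentioned in the abstract above, marking edges at maxima in the gradient magnitude of a Gaussian-smoothed image, can be shown in one dimension. This minimal sketch covers only that single step, not the complete detector. The five-tap binomial kernel used as a stand-in for a Gaussian, the forward-difference gradient, the threshold value, and the function names are all assumptions for this example.

```python
def smooth(signal, kernel):
    """Convolve a 1-D signal with a symmetric kernel, clamping at the borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + j - half, 0), last)] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def edge_positions(signal, kernel, threshold):
    """Return indices i where the gradient magnitude between samples i and i + 1
    of the smoothed signal is a local maximum above the threshold."""
    smoothed = smooth(signal, kernel)
    # Forward difference: gradient[i] is the change from sample i to i + 1.
    gradient = [abs(b - a) for a, b in zip(smoothed, smoothed[1:])]
    return [
        i
        for i in range(1, len(gradient) - 1)
        if gradient[i] > threshold
        and gradient[i] > gradient[i - 1]
        and gradient[i] >= gradient[i + 1]
    ]


# Binomial approximation to a Gaussian kernel (weights sum to 1).
kernel = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
# Ideal step edge between samples 7 and 8.
signal = [0.0] * 8 + [1.0] * 8
print(edge_positions(signal, kernel, 0.1))
```

Smoothing spreads the step over several samples, so the raw gradient is nonzero at five positions. Keeping only the local maximum gives a single response to the single edge, which is the behavior the abstract's third criterion asks for.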
Article
The state of the art in machine vision inspection and a critical overview of real-world applications are presented in this paper. Two independent ways to classify applications are proposed, one according to the inspected features of the industrial product or process and the other according to the inspection-independent characteristics of the inspected product or process. The most contemporary software and hardware tools for developing industrial vision systems are reviewed. Finally, in light of recent advances in image sensors, software and hardware technology, important issues and directions for designing and developing industrial vision systems are identified and discussed.
Article
As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. This is evidenced by the emergence of face recognition conferences such as AFGR [1] and AVBPA [2], and systematic empirical evaluations of face recognition techniques, including the FERET [3, 4, 5, 6] and XM2VTS [7] protocols. There are at least two reasons for this trend; the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. This paper provides an up-to-date critical survey of still- and video-based face recognition research.
Article
Piping and instrumentation diagrams (P&IDs) are commonly used in the process industry as a transfer medium for the fundamental design of a plant and for detailed design, purchasing, procurement, construction, and commissioning decisions. The present study proposes a method for symbol and text recognition for P&ID images using deep-learning technology. Our proposed method consists of P&ID image pre-processing, symbol and text recognition, and the storage of the recognition results. We consider the recognition of symbols of different sizes and shape complexities in high-density P&ID images in a manner that is applicable to the process industry. We also standardize the training dataset structure and symbol taxonomy to optimize the developed deep neural network. A training dataset is created based on diagrams provided by a local Korean company. After training the model with this dataset, a recognition test produced relatively good results, with a precision and recall of 0.9718 and 0.9827 for symbols and 0.9386 and 0.9175 for text, respectively.
Chapter
Digitization of scanned Piping and Instrumentation diagrams (P&ID), widely used in manufacturing or mechanical industries such as oil and gas over several decades, has become a critical bottleneck in dynamic inventory management and creation of smart P&IDs that are compatible with the latest CAD tools. Historically, P&ID sheets have been manually generated at the design stage, before being scanned and stored as PDFs. Current digitization initiatives involve manual processing and are consequently very time-consuming, labour-intensive and error-prone. Thanks to advances in image processing, machine learning and deep learning techniques, there is an emerging body of work on P&ID digitization. However, existing solutions face several challenges owing to the variation in the scale, size and noise in the P&IDs, the sheer complexity and crowdedness within the drawings, the domain knowledge required to interpret the drawings and the very minute visual differences among symbols. This motivates our current solution called Digitize-PID, which comprises an end-to-end pipeline for detection of core components from P&IDs like pipes, symbols and textual information, followed by their association with each other and, eventually, the validation and correction of output data based on inherent domain knowledge. A novel and efficient kernel-based line detection method and a two-step method for detection of complex symbols based on a fine-grained deep recognition technique are presented in the paper. In addition, we have created an annotated synthetic dataset, Dataset-P&ID, of 500 P&IDs by incorporating different types of noise and complex symbols, which is made available for public use (currently there exists no public P&ID dataset). We evaluate our proposed method on this synthetic dataset and a real-world anonymized private dataset of 12 P&ID sheets. Results show that Digitize-PID outperforms the existing state-of-the-art for P&ID digitization.
Article
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
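The reshaping of the cross entropy loss described in this abstract can be made concrete with a small numerical sketch. The abstract does not state the formula; the code below assumes the commonly given binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), and the `gamma` and `alpha` defaults are assumptions chosen for illustration rather than values taken from the text above.

```python
import math

def cross_entropy(p, y):
    """Binary cross entropy for predicted foreground probability p and label y."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Assumed focal loss form: cross entropy scaled by alpha_t * (1 - p_t)**gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified example (p_t = 0.9) versus a hard example (p_t = 0.1).
for p in (0.9, 0.1):
    ce = cross_entropy(p, 1)
    fl = focal_loss(p, 1)
    print(f"p_t={p}: cross entropy={ce:.4f}, focal loss={fl:.6f}")
```

With `gamma` set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross entropy; larger `gamma` values shrink the contribution of well-classified examples much more than that of hard ones, which is the down-weighting effect the abstract describes.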
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
Article
With the development of science and technology, a variety of office automation systems (OAS) have been widely applied in many settings. Moreover, digital image processing technology has made great progress. The emergence of a series of excellent algorithms, represented by the AdaBoost face detection algorithm, extends the application space of digital image processing in daily work and study. Besides, the computing capability of existing personal computers enables them to run these algorithms smoothly, which further contributes to the technological maturity of office automation systems based on digital image processing. To keep up with the pace of information technology, this study takes as its research topic high definition (HD) technology for paper archives in OAS, which is related to digital image processing. Automatic high definition demonstration of paper archives can reduce the burden on staff. This paper solves the problems of correcting slanted document images, automatically extracting identification photos, and enhancing the color of seals, and verifies the feasibility of the scheme.
Article
We present YOLO, a unified pipeline for object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is also extremely fast; YOLO processes images in real-time at 45 frames per second, hundreds to thousands of times faster than existing detection systems. Our system uses global image context to detect and localize objects, making it less prone to background errors than top detection systems like R-CNN. By itself, YOLO detects objects at unprecedented speeds with moderate accuracy. When combined with state-of-the-art detectors, YOLO boosts performance by 2-3% points mAP.
Article
This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
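One intuition behind this abstract's claim, that only a few hyper-parameters really matter, can be sketched with a toy comparison. The example below is an illustration under stated assumptions, not the experimental setup of the cited paper: the two-dimensional unit-square search space, the trial budget of nine, the seed, and all function names are hypothetical.

```python
import random

def grid_trials(n_per_axis):
    """All combinations of n_per_axis evenly spaced values on two axes."""
    axis = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return [(a, b) for a in axis for b in axis]

def random_trials(n, seed=0):
    """n points drawn uniformly at random from the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def distinct_values(trials, dim):
    """Number of different values tried along one hyper-parameter."""
    return len({trial[dim] for trial in trials})

grid = grid_trials(3)       # 9 trials in total
rand = random_trials(9)     # same budget of 9 trials
print("grid, distinct values of parameter 0:", distinct_values(grid, 0))
print("random, distinct values of parameter 0:", distinct_values(rand, 0))
```

With the same budget, the grid probes only three distinct values of each hyper-parameter, while random sampling probes a different value in every trial. If only one of the two hyper-parameters affects validation performance, the random trials therefore explore it more finely.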
Article
Many public and private Korean organizations are involved during the lifecycle of a domestic nuclear power plant. Korea Plant Engineering Co. (KOPEC) participates in the design stage, Korea Hydraulic and Nuclear Power (KHNP) operates and manages all nuclear power plants in Korea, Doosan Heavy Industry and Construction Co. manufactures the main equipment, and a construction company constructs the plant. Even though each organization has its own digital data management system and obtains a certain level of automation, data sharing among organizations is poor. KHNP obtains drawings and technical specifications from KOPEC in the form of paper. This results in manual re-work of definitions, and errors can potentially occur in the process. In order to establish an information bridge between design and operation and maintenance (O&M) phases, a generic product model (GPM), a data model from Hitachi, is extended for constructing a neutral data warehouse and the Korean Nuclear Power Plant Information Sharing System (KNPISS) is implemented.
Article
We present a computational recognition approach to convert network-like, image-based engineering diagrams into engineering models with which computations of interest, such as CAD modeling, simulation, information retrieval and semantic-aware editing, are enabled. The proposed approach is designed to work on diagrams produced using computer-aided drawing tools or hand sketches, and does not rely on temporal information for recognition. Our approach leverages a Convolutional Neural Network (CNN) as a trainable engineering symbol recognizer. The CNN is capable of learning the visual features of the defined symbol categories from a few user-supplied prototypical diagrams and a set of synthetically generated training samples. When deployed, the trained CNN is applied either to the entire input diagram using a multi-scale sliding window or, where applicable, to each isolated pixel cluster obtained through Connected Component Analysis (CCA). Then the connectivity between the detected symbols is analyzed to obtain an attributed graph representing the engineering model conveyed by the diagram. We evaluate the performance of the approach with benchmark datasets and demonstrate its utility in different application scenarios, including the construction and simulation of control system or mechanical vibratory system models from hand-sketched or camera-captured images, content-based image retrieval for resonant circuits and semantic-aware image editing for floor plans.
Article
Hough has proposed an interesting and computationally efficient procedure for detecting lines in pictures. This paper points out that the use of angle-radius rather than slope-intercept parameters simplifies the computation further. It also shows how the method can be used for more general curve fitting, and gives alternative interpretations that explain the source of its efficiency.
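The angle-radius parameterization mentioned in this abstract represents a line by its normal form, rho = x cos(theta) + y sin(theta), so that vertical lines need no special handling. The following is a minimal voting sketch under stated assumptions: the one-degree angle step, the rounding of rho to integer bins, and the function name `hough_peak` are hypothetical choices for illustration, not details given in the abstract.

```python
import math
from collections import Counter

def hough_peak(points, theta_step_deg=1):
    """Vote in (theta, rho) space and return the strongest line.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it. Returns (theta in degrees, rho bin, vote count).
    """
    votes = Counter()
    for theta_deg in range(0, 180, theta_step_deg):
        theta = math.radians(theta_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)  # quantize rho to integer bins
            votes[(theta_deg, rho)] += 1
    (theta_deg, rho), count = votes.most_common(1)[0]
    return theta_deg, rho, count

# 100 collinear points on the horizontal line y = 5.
horizontal = [(x, 5) for x in range(100)]
print(hough_peak(horizontal))

# 100 collinear points on the vertical line x = 7, which a
# slope-intercept parameterization cannot represent with a finite slope.
vertical = [(7, y) for y in range(100)]
print(hough_peak(vertical))
```

The horizontal line collects all of its votes at theta = 90 degrees and rho = 5, and the vertical line at theta = 0 degrees and rho = 7, since the normal to each line points along the y-axis and x-axis respectively.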
Article
As modular production becomes increasingly widespread in globalized manufacturing industries, many components constituting a final product are being developed and produced by collaborating part suppliers who have the ability to design their own parts by themselves without aid from the original equipment manufacturer (OEM). In this collaborative product development, the important aspect to expedite engineering changes is that engineering change information should be represented precisely in a designer-friendly form and shared among participating companies in an effective manner. However, all part suppliers and an OEM typically do not use the same computer-aided design (CAD) system and they are reluctant to share their CAD data with other cooperating companies owing to the policy of protecting corporate intellectual property. These circumstances make it difficult for collaborating companies to conduct engineering changes, since a part supplier who is responsible for one part of a product needs other CAD part model data designed by other companies for the engineering changes in the typical CAD assembly modeling of a product. In this article, a neutral reference model that consists of a neutral skeleton model and an external reference model is proposed as a new medium for the sharing and propagation of engineering change information among collaborating companies.
Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection
  • X Li
  • W Wang
  • L Wu
  • S Chen
  • X Hu
  • J Li
  • J Tang
  • J Yang