Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Current computer vision methods for symbol detection in piping and instrumentation diagrams (P&IDs) are limited by the manual annotation effort they require. This paper introduces a versatile two-stage symbol detection pipeline that optimizes efficiency by (1) labeling only data samples with minimal cumulative informational redundancy, (2) restricting annotation to the minimal effective training dataset size, and (3) expanding the training dataset using pseudo-labels. Stage-1 performs generic symbol detection, while Stage-2 differentiates between symbol classes through metric learning. To enhance robustness and generalizability, the model is trained on a diverse dataset collected from both industry sources and web scraping. The method achieves a Top-1 accuracy of 85.39% and a Top-5 accuracy of 95.19% on a test dataset containing 102 symbol classes. These results suggest the potential for a shift from resource-intensive supervised learning approaches to a more efficient semi-supervised paradigm.
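As a rough illustration of the Stage-2 idea, symbol crops found by the generic Stage-1 detector can be embedded and ranked against per-class prototype embeddings; the Top-1/Top-5 figures above correspond to where the true class lands in such a ranking. The sketch below is a minimal, hypothetical version that uses random vectors in place of a trained metric-learning encoder:

```python
import numpy as np

def topk_classify(query_emb, prototypes, k=5):
    """Rank symbol classes by cosine similarity between a detected
    symbol's embedding and per-class prototype embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argsort(p @ q)[::-1][:k]     # indices of the top-k classes

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(102, 64))    # stand-in for learned prototypes
query = prototypes[7] + 0.01 * rng.normal(size=64)  # noisy view of class 7
ranked = topk_classify(query, prototypes)
print(ranked[0] == 7)  # Top-1 hit: the true class ranks first
```

Top-1/Top-5 accuracy then simply counts how often the true class appears at rank 1 or within the first 5 ranks.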


... [13] Techniques used in One-Shot Learning are often similar to those in Few-Shot Learning but impose even stricter constraints on the number of available examples. A recently developed method by Gupta et al. [14] describes how symbols in Piping and Instrumentation Diagrams (P&IDs) can be recognized using FSL. In this method, first, all symbols in the diagrams are detected, and, second, the symbols are classified using a Siamese network. ...
... However, the fundamental process is expected to remain functional for most legend formats. In our evaluation of the FSL approach as demonstrated by the recent study of Gupta et al. [14], we found that while the method shows promise in the context of P&IDs, it does not generalize to the domain of site plans. One of the primary challenges is that, unlike P&IDs, where symbols typically exist in isolation without overlapping with other elements like pipelines, site plans present a more complex environment. ...
Conference Paper
Full-text available
Site plans play a crucial role in construction projects, providing detailed layouts of structures and infrastructure components. Extracting specific information, such as sewage system details including shafts and pipeline routes, from these plans is essential for accurate cost estimation. However, it remains a labor-intensive task, especially for pixel-based drawings lacking machine-interpretable vector geometries and machine-readable text elements. In this study, we conceptualize an automated method to streamline this process, leveraging object detection and optical character recognition (OCR) techniques. The proposed approach involves three main steps: (1) locating the legend region in which the shaft symbol is specified using a specialized OCR method, (2) identifying the relevant shaft symbol from the plan's legend, and (3) detecting shaft locations within site plans using state-of-the-art object detection algorithms. Developing this method aims to significantly increase efficiency in construction cost estimation by automating the tedious task of extracting and analyzing site plan data. The preliminary results included in this study demonstrate candidate techniques for symbol processing in site plans.
... Over the last few years, various deep learning methods for symbol digitization have been proposed (Elyan, Jamieson, and Ali-Gombe 2020; Faltin, Gann, and König 2023; Faltin, Schönfelder, and König 2022; Gupta, Wei, and Czerniawski 2024; Jakubik et al. 2022; Lai 2020, 2021). Most were based on object detection models, which predict the location and class of objects within an image. ...
... However, to obtain optimum accuracy, it is typically better to source the training data from the same distribution as the test data. In another example, Gupta, Wei, and Czerniawski (2024) presented a symbol detection method in which all symbols were detected as one class using a YOLO-based model. A Siamese Network was then used to differentiate between classes. ...
Article
Full-text available
Recently, there has been significant interest in digitizing engineering drawings due to their complexity and practical benefits. Symbol digitization, a critical aspect in this field, is challenging as utilizing Deep Learning-based methods to recognize symbols of interest requires a large number of training instances for each class of symbols. Acquiring and annotating sufficient diagrams is difficult due to concerns about confidentiality and availability. The conventional manual annotation process is time-consuming, costly, and prone to human error. Additionally, obtaining an adequate number of samples for rare classes proves to be exceptionally challenging. This paper introduces a few-shot framework to address these challenges. Several experiments with fewer than ten, and sometimes just one, training instance per class using complex engineering drawings from industry sources were carried out. The results suggest that our method not only significantly improves symbol detection performance compared to other state-of-the-art methods but also decreases the necessary number of training instances.
... In this paper, we applied Convolutional Neural Networks (CNNs) for glare prediction from participants' faces. CNNs have been effective for image and video classification problems (Gupta et al., 2024), object detection and tracking (Gupta et al., 2022), and segmentation (Wei et al., 2023). A detailed review of various CNN architectures, their use cases, and their variants has been conducted by Alzubaidi et al. (2021). ...
Conference Paper
Full-text available
Any building designed for human occupancy needs to be visually comfortable. Glare from daylight is one of the main causes of visual discomfort. Glare perception is evaluated by empirical glare models using either photometric measurements or lighting simulations. This study explores an alternative solution that implements deep learning methods to develop glare prediction models from video recordings of human faces exposed to different levels of sunlight indoors. We trained and evaluated 12 widely used Convolutional Neural Network (CNN) architectures on a dataset of 78 facial videos of 21 human participants experiencing glare in a daylit office-like setup. Results indicate that the best-performing CNN achieves an accuracy of (1) 87% in predicting glare for repeated participants under unseen lighting conditions of different intensity, and (2) 67% for new participants' faces under previously seen lighting conditions. We propose future research directions to improve predictions from such models.
... Since the considerable size of the construction drawings (up to 11,000 pixels in side length) exceeds the processing capabilities of Keypoint R-CNN, they are divided into smaller, equally sized patches. This approach has been used successfully in various studies, including those by Gupta et al. [71], Faltin et al. [56], and Elyan et al. [72]. Segmenting the drawings into smaller patches also better suits the symbols' dimensions, typically around 120 pixels wide and 90 pixels tall. ...
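The patching step described in this snippet can be sketched as a simple tiling loop; the tile size, stride, and overlap below are illustrative assumptions, not the values used in the cited studies:

```python
import numpy as np

def split_into_patches(image, patch=512, stride=448):
    """Tile a large drawing into fixed-size patches; the overlap of
    (patch - stride) pixels keeps symbols near tile borders intact.
    (Production code would also add a final row/column of tiles flush
    with the right and bottom edges to cover any remainder.)"""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tiles.append(((y, x), image[y:y + patch, x:x + patch]))
    return tiles

drawing = np.zeros((2000, 2000), dtype=np.uint8)  # stand-in for a scanned drawing
tiles = split_into_patches(drawing)
print(len(tiles), tiles[0][1].shape)  # 16 tiles of 512x512
```

Detections found in each tile are then mapped back to full-drawing coordinates by adding the tile's (y, x) offset.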
Article
Full-text available
Efficient maintenance planning and streamlined inspection for bridges are essential to prevent catastrophic structural failures. Digital Bridge Management Systems (BMS) have the potential to streamline these tasks. However, their effectiveness relies heavily on the availability of accurate digital bridge models, which are currently challenging and costly to create, limiting the widespread adoption of BMS. This study addresses this issue by proposing a computer vision-based process for generating bridge superstructure models from pixel-based construction drawings. We introduce an automatic pipeline that utilizes a deep learning-based symbol pose estimation approach based on Keypoint R-CNN to organize drawing views spatially, implementing parts of the proposed process. By extending the keypoint-based detection approach to simultaneously process multiple object classes with a variable number of keypoints, a single instance of Keypoint R-CNN can be trained for all identified symbols. We conducted an empirical analysis to determine evaluation parameters for the symbol pose estimation approach to evaluate the method's performance and improve the trained model's comparability. Our findings demonstrate promising steps towards efficient bridge modeling, ultimately facilitating maintenance planning and management.
Article
In structural design, accurately extracting information from floor plan drawings of buildings is essential for building 3D models and facilitating design automation. However, deep learning models often face challenges due to their dependence on large labeled datasets, which are labor- and time-intensive to generate. Moreover, floor plan drawings often present challenges such as overlapping elements and similar geometric shapes. This study introduces a semi-supervised wall segmentation approach (SWS), specifically designed to perform effectively with limited labeled data. SWS combines a deep semantic feature extraction framework with a hierarchical vision transformer and multi-scale feature aggregation to refine feature maps and maintain the spatial precision necessary for pixel-wise segmentation. SWS incorporates consistency regularization to encourage consistent predictions across weak and strong augmentations of the same image. The proposed method improves intersection over union (IoU) by more than 4%.
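Consistency regularization, as described above, penalizes disagreement between predictions for two augmented views of the same unlabeled image. The toy sketch below uses NumPy arrays in place of real network outputs and mean squared error as one common choice of consistency measure (the specific loss used by SWS is not specified here):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_weak, logits_strong):
    """Mean squared disagreement between per-pixel class distributions
    predicted for two augmented views of the same unlabeled image
    (in practice gradients are stopped on the weak branch)."""
    return float(np.mean((softmax(logits_weak) - softmax(logits_strong)) ** 2))

rng = np.random.default_rng(1)
logits = rng.normal(size=(8, 8, 2))          # toy 8x8 logit map, 2 classes
loss_same = consistency_loss(logits, logits) # identical predictions -> 0
loss_diff = consistency_loss(logits, logits + rng.normal(size=(8, 8, 2)))
print(loss_same, loss_diff > loss_same)
```

The total training objective then adds this unsupervised term, weighted by a coefficient, to the supervised loss on the labeled pixels.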
Article
Full-text available
Digital transformation is omnipresent in our daily lives, and its impact is noticeable through new technologies such as smart devices, AI chatbots, and the changing work environment. This digitalization also takes place in product development, with the integration of many technologies, such as Industry 4.0, digital twins, and data-driven methods, to improve the quality of new products and to save time and costs during the development process. The use of data-driven methods that reuse existing data therefore has great potential. However, data from product design are very diverse and depend strongly on the respective development phase. Among the first product representations are sketches and drawings, which represent the product in a simplified and condensed way. To reuse these data, the existing sketches must be found with an automated approach that allows the contained information to be utilized. One approach to this problem is presented in this paper: the detection of principle sketches in the early phase of the development process. The aim is to recognize the symbols in these sketches automatically with object detection models. Existing approaches were analyzed and a new procedure developed, which uses synthetic training data generation. Next, a total of six different data generation types were analyzed and tested using six different one- and two-stage detection models. The entire procedure was then evaluated on two unknown test datasets, one focusing on different gearbox variants and a second derived from CAD assemblies. The final sections discuss the findings and identify a procedure with high detection accuracy.
Conference Paper
Full-text available
For successfully training neural networks, developers often require large and carefully labelled datasets. However, gathering such high-quality data is often time-consuming and prohibitively expensive. Thus, synthetic data are used for developing AI (Artificial Intelligence) /ML (Machine Learning) models because their generation is comparatively faster and inexpensive. The paper presents a proof-of-concept for generating a synthetic labelled dataset for P&ID diagrams. This is accomplished by employing a data-augmentation approach of random cropping. The framework also facilitates the creation of a complete and automatically labelled dataset which can be used directly as an input to the deep learning models. We also investigate the importance of context in an image that is, the impact of relative resolution of a symbol and the background image. We have tested our algorithm for the symbol of a valve as a proof-of-concept and obtained encouraging results.
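The synthetic-labeling idea sketched in this abstract, pasting a symbol onto a background crop and emitting the bounding box automatically, can be illustrated as below. The sizes, the blank background, and the dark valve glyph are illustrative stand-ins for real P&ID crops and symbol templates:

```python
import numpy as np

def synth_sample(background, symbol, rng):
    """Paste a dark symbol glyph at a random position on a background
    crop and return the image together with its auto-generated
    bounding-box label (x1, y1, x2, y2)."""
    bh, bw = background.shape
    sh, sw = symbol.shape
    y = int(rng.integers(0, bh - sh + 1))
    x = int(rng.integers(0, bw - sw + 1))
    img = background.copy()
    img[y:y + sh, x:x + sw] = np.minimum(img[y:y + sh, x:x + sw], symbol)
    return img, (x, y, x + sw, y + sh)

rng = np.random.default_rng(2)
background = np.full((256, 256), 255, dtype=np.uint8)  # blank drawing crop
valve = np.zeros((24, 24), dtype=np.uint8)             # stand-in valve glyph
image, box = synth_sample(background, valve, rng)
print(box, image[box[1]:box[3], box[0]:box[2]].min())  # labeled region is dark
```

Varying the relative size of the symbol against the background crop, as the paper's context experiment does, only requires rescaling one of the two inputs before pasting.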
Article
Full-text available
This study proposes an end-to-end digitization method for converting piping and instrumentation diagrams (P&IDs) in image format to digital P&IDs. Automating this process is an important concern in the process plant industry because image P&IDs are presently converted into digital P&IDs manually. The proposed method comprises object recognition within the P&ID images, topology reconstruction of the recognized objects, and digital P&ID generation. A dataset comprising 75,031 symbol, 10,073 text, and 90,054 line annotations was constructed to train the deep neural networks used for recognizing symbols, text, and lines. Topology reconstruction and digital P&ID generation were developed based on traditional rule-based approaches. Five test P&IDs were digitized in the experiments. The experimental results for recognizing symbols, text, and lines showed good precision and recall performance, with averages of 96.65%/96.40%, 90.65%/92.16%, and 95.25%/87.91%, respectively. The topology reconstruction results showed an average precision of 99.56% and recall of 96.07%. Digitization was completed in less than three and a half hours per diagram (8488.2 s on average) for the five test P&IDs.
Article
Full-text available
Machine learning has become the state-of-the-art technique for many tasks, including computer vision, natural language processing, and speech processing. However, the unique challenges posed by machine learning suggest that incorporating user knowledge into the system can be beneficial; the purpose of integrating human domain knowledge is also to promote the automation of machine learning. Human-in-the-loop is an area we see as increasingly important in future research, because the knowledge learned by machine learning cannot yet match human domain knowledge. Human-in-the-loop aims to train an accurate prediction model at minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, aided by machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing work on human-in-the-loop from a data perspective and classify it into three categories with a progressive relationship: (1) work on improving model performance through data processing, (2) work on improving model performance through interventional model training, and (3) the design of system-independent human-in-the-loop frameworks. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we provide a brief classification and discussion of applications in natural language processing, computer vision, and other domains. We also present open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop research and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
Article
Full-text available
At present, the use of computational intelligence has become an essential need for the heavy engineering industries. Digitization in these sectors can be achieved by scanning hard-copy documents. Older documents, once digitized, are often of low fidelity, so the accuracy and reliability of estimates for components such as equipment and materials after digitization are remarkably low. Because P&IDs (piping and instrumentation diagrams) come in various shapes and sizes with varying levels of quality, along with myriad smaller challenges such as low image resolution, high intra-project diagram variation, and the lack of standardization in the engineering sector for diagram representation, digitizing P&IDs remains a challenging problem. In this study, an end-to-end pipeline is proposed for automatically digitizing engineering diagrams, involving automatic recognition, classification, and extraction of diagram components from images and scans of engineering drawings such as P&IDs, and automatic generation of digitized drawings from the obtained data. This is done using image processing algorithms such as template matching, Canny edge detection, and the sliding window method. Lines are extracted from the P&ID using Canny edge detection and a sliding window approach, and text is recognized using an aspect ratio calculation. Finally, all extracted P&ID components are associated with the closest text elements and mapped to each other. With such a pipeline, the resulting diagrams are consistently of high quality; smaller problems such as misspellings and valuable time churn are solved or minimized to a large extent, paving the way for the application of big data technologies such as machine learning analytics on these diagrams, resulting in further operational efficiencies.
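Template matching, one of the techniques named above, can be sketched as normalized cross-correlation between a symbol template and every window of the drawing. The brute-force loop below shows the computation explicitly; production code would use an optimized routine such as OpenCV's cv2.matchTemplate instead:

```python
import numpy as np

def match_template(image, template):
    """Brute-force normalized cross-correlation: score the zero-mean
    template against every window of the image. The highest-scoring
    position is the best match (score 1.0 for an exact copy)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
            scores[y, x] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(40, 40)).astype(float)
template = image[10:18, 22:30].copy()   # plant a known symbol patch
scores = match_template(image, template)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(peak)  # the exact-copy location (10, 22) scores highest
```

In a real pipeline, thresholding the score map and applying non-maximum suppression yields one detection per symbol occurrence.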
Article
Full-text available
Piping and instrument diagrams (P&IDs) are a key component of the process industry; they contain information about the plant, including its instruments, lines, valves, and control logic. However, the complexity of these diagrams makes it difficult to extract the information automatically. In this study, we implement an object-detection method to recognize graphical symbols in P&IDs. The framework consists of three parts: region proposal, data annotation, and classification. Sequential image processing is applied as the region proposal step for P&IDs. After obtaining the proposed regions, unsupervised learning methods (k-means and deep adaptive clustering) are implemented to decompose the detected dummy symbols and assign negative classes to them. By training a convolutional network, it becomes possible to classify the proposed regions and extract the symbolic information. The results indicate that the proposed framework delivers superior symbol-recognition performance through dummy detection.
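The k-means step used above to decompose dummy (non-symbol) proposals can be sketched with a minimal NumPy implementation; the two well-separated blobs here are a toy stand-in for region-proposal feature vectors:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: used here to group region proposals so that
    recurring 'dummy' (non-symbol) patterns form their own clusters,
    which can then be assigned negative classes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.1, (50, 2)),   # one recurring pattern
               rng.normal(5, 0.1, (50, 2))])  # a second recurring pattern
labels, _ = kmeans(X, 2)
print(len(set(labels[:50].tolist())), len(set(labels[50:].tolist())))
```

Deep adaptive clustering plays the same role in the paper but learns the feature space jointly with the cluster assignments.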
Preprint
Full-text available
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
Article
Full-text available
A piping and instrumentation diagram (P&ID) is a key drawing widely used in the energy industry. In a digital P&ID, all included objects are classified and made amenable to computerized data management. However, a large number of P&IDs are still used in image format throughout the process (plant design, procurement, construction, and commissioning), owing to difficulties associated with contractual relationships and software systems. In this study, we propose a method that uses deep learning techniques to recognize and extract important information from the objects in image-format P&IDs. We define the training data structure required for developing a deep learning model for P&ID recognition. The proposed method consists of preprocessing and recognition stages. In the preprocessing stage, diagram alignment, outer border removal, and title box removal are performed. In the recognition stage, symbols, characters, lines, and tables are detected. A new deep learning model for symbol detection is defined based on AlexNet. We also employ the connectionist text proposal network (CTPN) for character detection, and traditional image processing techniques for line and table detection. In experiments where two test P&IDs were recognized according to the proposed method, recognition accuracies for symbols, characters, and lines were found to be 91.6%, 83.1%, and 90.6% on average, respectively.
Conference Paper
Full-text available
Piping and instrumentation diagrams (P&IDs) are essential design documents that are continuously modified and managed from the design phase to the O&M phase. For ease of data transfer, P&IDs are generally converted into PDFs, which are hard to modify. Therefore, engineering companies that need to manage P&IDs must manually re-convert their P&ID images into P&IDs in CAD formats. To reduce the inefficiency of this re-conversion, the various symbols and texts in P&ID images should be automatically detected beforehand. As a first step toward automatic P&ID conversion, in this study we propose methods for detecting symbols and texts in P&ID images using geometrical and deep learning-based approaches.
Article
Full-text available
Overfitting is a fundamental issue in supervised machine learning that prevents models from generalizing well to both observed training data and unseen test data. It arises from the presence of noise, the limited size of the training set, and the complexity of classifiers. This paper discusses overfitting from the perspectives of its causes and solutions. To reduce the effects of overfitting, various strategies are proposed to address these causes: (1) an "early-stopping" strategy prevents overfitting by halting training before performance stops improving; (2) a "network-reduction" strategy excludes noise in the training set; (3) a "data-expansion" strategy allows complicated models to fine-tune their hyper-parameter sets with a large amount of data; and (4) a "regularization" strategy guarantees model performance to a great extent when dealing with real-world issues, through feature selection and by distinguishing more useful from less useful features.
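The "early-stopping" strategy (1) can be sketched as a small bookkeeping class that halts training once validation loss has not improved for a fixed number of evaluations; the patience value below is an illustrative choice:

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for
    `patience` consecutive evaluations."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0  # new best: reset counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience       # True -> stop training

stopper = EarlyStopping(patience=3)
# validation loss improves, then plateaus: stop after 3 flat evaluations
losses = [1.0, 0.8, 0.7, 0.7, 0.71, 0.7]
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
print(stopped_at)  # -> 5
```

In practice the model weights from the best epoch (here epoch 2) are restored when training stops.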
Preprint
Full-text available
One of the most common modes of representing engineering schematics is the piping and instrumentation diagram (P&ID), which describes the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&IDs have been manually generated, scanned, and stored as image files. These files need to be digitized for purposes of inventory management and updating, and for easy reference to the different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&IDs. Real-world P&IDs come in several different resolutions and often contain noisy textual information. Extracting instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further concern. For these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&IDs. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets, and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We evaluated the proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and obtained promising results.
Conference Paper
Full-text available
Technical drawings are commonly used across different industries such as oil and gas, construction, mechanical engineering, and other engineering domains. In recent years, the digitization of these drawings has become increasingly important. In this paper, we present a semi-automatic, heuristics-based approach to detect and localise symbols within these drawings. This includes generating a labeled dataset from real-world engineering drawings and investigating the classification performance of three different state-of-the-art supervised machine learning algorithms. To improve the classification accuracy, the dataset was pre-processed using unsupervised learning algorithms to identify hidden patterns within classes. Testing and evaluating the proposed methods on a dataset of symbols representing one standard of drawings, namely Process and Instrumentation (P&ID) diagrams, showed very competitive results.
Article
Full-text available
Engineering drawings are commonly used across different industries such as oil and gas, mechanical engineering, and others. Digitising these drawings is becoming increasingly important, mainly because legacy drawings and documents may provide a rich source of information for industries. Analysing these drawings often requires applying a set of digital image processing methods to detect and classify symbols and other components. Despite recent significant advances in image processing, and in particular in deep neural networks, automatic analysis and processing of these engineering drawings is still far from complete. This paper presents a general framework for complex engineering drawing digitisation. A thorough and critical review of relevant literature, methods, and algorithms in machine learning and machine vision is presented. A real-life industrial scenario on how to contextualise the digitised information from a specific type of these drawings, namely piping and instrumentation diagrams, is discussed in detail. A discussion of how new trends in machine vision, such as deep learning, could be applied to this domain is presented, with conclusions and suggestions for future research directions.
Article
Full-text available
Convolutional neural networks have significantly boosted the performance of face recognition in recent years due to their high capacity for learning discriminative features. To enhance the discriminative power of the Softmax loss, multiplicative angular margin and additive cosine margin methods incorporate an angular margin and a cosine margin into the loss function, respectively. In this paper, we propose a novel supervision signal, the additive angular margin (ArcFace), which has a better geometrical interpretation than supervision signals proposed so far. Specifically, the proposed ArcFace, cos(θ + m), directly maximises the decision boundary in angular (arc) space based on the L2-normalised weights and features. Compared to the multiplicative angular margin cos(mθ) and the additive cosine margin cos θ − m, ArcFace obtains more discriminative deep features. We also emphasise the importance of network settings and data refinement in deep face recognition. Extensive experiments on several relevant face recognition benchmarks (LFW, CFP, and AgeDB) prove the effectiveness of the proposed ArcFace. Most importantly, we obtain state-of-the-art performance in the MegaFace Challenge in a totally reproducible way. We make data, models, and training/test code publicly available at https://github.com/deepinsight/insightface.
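The ArcFace logit computation can be sketched directly from the formula s·cos(θ + m): normalize features and class weights, add the margin to the target-class angle only, then rescale. The scale s = 64 and margin m = 0.5 below are values commonly used with ArcFace, but are assumptions here, and the random weight matrix is a toy stand-in:

```python
import numpy as np

def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """ArcFace: L2-normalise features and class weights, add the
    angular margin m to the target-class angle only, then rescale:
    s * cos(theta + m) for the true class, s * cos(theta) otherwise."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * np.cos(theta + margin)

rng = np.random.default_rng(5)
W = rng.normal(size=(10, 8))        # toy weight matrix: 10 classes, 8-dim
labels = np.arange(4)
feats = W[labels].copy()            # features perfectly aligned with true class
logits = arcface_logits(feats, W, labels)
# aligned feature: theta = 0, so the target logit is s*cos(m) < s
print(logits[0, 0] < 64.0)  # prints True
```

These margined logits are then fed into a standard softmax cross-entropy loss, which is what forces the extra angular gap between classes.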
Article
Full-text available
We propose a deep learning-based solution for the problem of feature learning in one-class classification. The proposed method operates on top of a Convolutional Neural Network (CNN) of choice and produces descriptive features while maintaining a low intra-class variance in the feature space for the given class. For this purpose two loss functions, compactness loss and descriptiveness loss are proposed along with a parallel CNN architecture. A template matching-based framework is introduced to facilitate the testing process. Extensive experiments on publicly available anomaly detection, novelty detection and mobile active authentication datasets show that the proposed Deep One-Class (DOC) classification method achieves significant improvements over the state-of-the-art.
Conference Paper
Full-text available
In order to effectively detect faults and maintain heavy machines, a standard practice in several organizations is to conduct regular manual inspections. The procedure for conducting such inspections requires marking of the damaged components on a standardized inspection sheet which is then camera scanned. These sheets are marked for different faults in corresponding machine zones using hand-drawn arrows and text. As a result, the reading environment is highly unstructured and requires a domain expert while extracting the manually marked information. In this paper, we propose a novel pipeline to build an information extraction system for such machine inspection sheets, utilizing state-of-the-art deep learning and computer vision techniques. The pipeline proceeds in the following stages: (1) localization of different zones of the machine, arrows and text using a combination of template matching, deep learning and connected components, and (2) mapping the machine zone to the corresponding arrow head and the text segment to the arrow tail, followed by pairing them to get the correct damage code for each zone. Experiments were performed on a dataset collected from an anonymous real world manufacturing unit. Results demonstrate the efficacy of the proposed approach and we also report the accuracy of each step in the pipeline.
Article
Full-text available
We investigate coresets: succinct, small summaries of large data sets such that solutions found on the summary are provably competitive with solutions found on the full data set. We provide an overview of the state of the art in coreset construction for machine learning. In Section 2, we present both the intuition behind and a theoretically sound framework for constructing coresets for general problems, and apply it to k-means clustering. In Section 3, we summarize existing coreset construction algorithms for a variety of machine learning problems, such as maximum likelihood estimation of mixture models, Bayesian non-parametric models, principal component analysis, regression, and general empirical risk minimization.
Article
Full-text available
Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of coresets, called lightweight coresets, that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-means clustering, Bregman clustering, and maximum likelihood estimation of Gaussian mixture models. The algorithm is substantially faster than existing constructions, embarrassingly parallel, and the resulting coresets are smaller. In an extensive experimental evaluation, we demonstrate that the proposed method outperforms existing coreset constructions.
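The lightweight-coreset construction can be sketched in a few lines: sample points with probability q(x) = 1/(2n) + d(x, μ)² / (2·Σ d²), where μ is the data mean, and weight each sampled point by 1/(m·q(x)) so the weighted summary stays unbiased. The data below is synthetic:

```python
import numpy as np

def lightweight_coreset(X, m, seed=0):
    """Lightweight-coreset sampling: mix a uniform term with a term
    proportional to squared distance from the mean, draw m points by
    that distribution, and weight each by 1/(m*q(x))."""
    rng = np.random.default_rng(seed)
    n = len(X)
    dist2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * dist2 / dist2.sum()
    idx = rng.choice(n, size=m, p=q)
    return X[idx], 1.0 / (m * q[idx])

rng = np.random.default_rng(6)
X = rng.normal(size=(10000, 2))           # full data set
C, w = lightweight_coreset(X, 200)        # 50x smaller weighted summary
print(C.shape, abs(w.sum() / len(X) - 1.0) < 1.0)  # total weight ~ n
```

A weighted k-means run on (C, w) then approximates the clustering of the full 10,000 points at a fraction of the cost.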
Article
Full-text available
We propose a simple and efficient method of semi-supervised learning for deep neural networks. Basically, the proposed network is trained in a supervised fashion with labeled and unlabeled data simultaneously. For unlabeled data, pseudo-labels, obtained by simply picking the class with the maximum predicted probability, are used as if they were true labels. This is in effect equivalent to entropy regularization: it favors a low-density separation between classes, a commonly assumed prior for semi-supervised learning. Combined with a denoising auto-encoder and dropout, this simple method outperforms conventional methods for semi-supervised learning with very small labeled data on the MNIST handwritten digit dataset.
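The pseudo-labeling loop can be sketched with a toy problem. The nearest-centroid "model" below is a hypothetical stand-in for the deep network in the paper; only the train / pseudo-label / retrain cycle is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task: two classes, 4 labeled points, 100 unlabeled points.
X_lab = np.array([-2.0, -1.5, 1.5, 2.0])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])

def centroids(X, y):
    """Stand-in 'model': one centroid per class."""
    return np.array([X[y == k].mean() for k in (0, 1)])

# 1) Train on labeled data only.
c = centroids(X_lab, y_lab)
# 2) Pseudo-label each unlabeled point with the model's most confident class.
pseudo = np.argmin(np.abs(X_unl[:, None] - c[None, :]), axis=1)
# 3) Retrain on labeled and pseudo-labeled data together.
c = centroids(np.concatenate([X_lab, X_unl]), np.concatenate([y_lab, pseudo]))
print(c)  # centroids move toward the true class means (-2 and 2)
```

In the paper, step 3 is a joint loss over true and pseudo-labels with a schedule on the unlabeled term rather than a full retrain.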
Article
Full-text available
This paper describes the development of an algorithm for verification of signatures written on a touch-sensitive pad. The signature verification algorithm is based on an artificial neural network. The novel network presented here, called a “Siamese” time delay neural network, consists of two identical networks joined at their output. During training the network learns to measure the similarity between pairs of signatures. When used for verification, only one half of the Siamese network is evaluated. The output of this half network is the feature vector for the input signature. Verification consists of comparing this feature vector with a stored feature vector for the signer. Signatures closer than a chosen threshold to this stored representation are accepted, all other signatures are rejected as forgeries. System performance is illustrated with experiments performed in the laboratory.
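The verification scheme, two identical networks sharing weights plus a distance threshold, can be sketched as follows. The single linear-tanh embedding, the threshold value, and the random "signature" vectors are illustrative assumptions, not the paper's time-delay architecture.

```python
import numpy as np

def embed(x, W):
    """Shared embedding: both inputs pass through the SAME weights W."""
    return np.tanh(W @ x)

def verify(x1, x2, W, threshold=0.5):
    """Accept if the two embeddings are closer than a chosen threshold."""
    return np.linalg.norm(embed(x1, W) - embed(x2, W)) < threshold

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                 # one weight matrix, used by both twins
ref = rng.normal(size=8)                    # stored reference (enrolled signature)
genuine = ref + 0.01 * rng.normal(size=8)   # near-identical repeat of the reference
forgery = rng.normal(size=8)                # unrelated input

print(bool(verify(ref, genuine, W)))  # -> True: genuine signature accepted
```

As in the paper, only one half of the network needs to run at verification time: the reference embedding can be precomputed and stored.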
Article
Full-text available
Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
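Feature transfer as studied here, copying the general lower layers and fine-tuning only the task-specific upper ones, can be sketched as follows. The two-layer parameter dictionary and the dummy gradients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network pretrained on a "base" task.
base = {"layer1": rng.normal(size=(8, 4)), "layer2": rng.normal(size=(4, 2))}

# Transfer: copy the general lower layer; re-initialize the task-specific top.
target = {"layer1": base["layer1"].copy(), "layer2": rng.normal(size=(4, 2))}
frozen = {"layer1"}   # layers excluded from updates while fine-tuning

def sgd_step(params, grads, lr=0.1):
    """Apply a gradient step, skipping frozen (transferred) layers."""
    for name in params:
        if name not in frozen:
            params[name] -= lr * grads[name]
    return params

grads = {k: np.ones_like(v) for k, v in target.items()}
target = sgd_step(target, grads)
print(bool(np.allclose(target["layer1"], base["layer1"])))  # -> True (frozen)
```

The paper's finding that fine-tuning transferred layers (i.e. emptying `frozen`) recovers or improves generalization corresponds to simply letting the copied weights receive updates too.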
Article
Full-text available
Landscape characteristics such as small patch size and land-cover heterogeneity have been hypothesized to increase the likelihood of misclassifying pixels during thematic image classification. However, there has been a lack of empirical evidence to support these hypotheses. This study utilizes data gathered as part of the accuracy assessment of the 1992 National Land Cover Data (NLCD) set to identify and quantify the impacts of land-cover heterogeneity and patch size on classification accuracy. Logistic regression is employed to assess the impacts of these variables, as well as the impact of land-cover class information. The results reveal that accuracy decreases as land-cover heterogeneity increases and as patch size decreases. These landscape variables remain significant factors in explaining classification accuracy even when adjusted for their confounding association with land-cover class information.
Article
Full-text available
This paper describes a proof-of-concept computer system capable of interpreting graphical component connectivity in Process and Instrumentation Drawings (P&IDs). The input drawing is a CAD file in DXF format (AutoCAD Release 12, Advanced Tools Manual, Autodesk Inc., 1993, Chapter 6). Drawing features are considered to be pipes or symbolic entities with geometrically constrained attachment ports. A low-level image processing front-end is currently being developed to support the interpretation of drawings in vector format from scanned paper P&IDs. A hierarchical data structure is used to enable recognition of components in collinear and circuit complexes.
Preprint
Overfitting and generalization are important concepts in Machine Learning, as only models that generalize are useful for general applications. Yet some students have trouble learning these important concepts through lectures and exercises. In this paper we describe common examples of students misunderstanding overfitting and provide recommendations for possible solutions. We cover student misconceptions about overfitting, about solutions to overfitting, and implementation mistakes that are commonly confused with overfitting issues. We expect that our paper can contribute to improving student understanding of, and lectures about, this important topic.
Conference Paper
Piping and Instrumentation Diagrams are detailed diagrams that show the piping and process equipment together with the instrumentation and control devices used in the process industry. They are updated during the plant life cycle to depict the latest changes and modifications made in the plant. These diagrams are detailed and usually stored as PDFs; they are difficult to modify, and inferring details from them requires a deep understanding of plant processes and engineering. To reduce the difficulty of finding and retrieving information from the diagrams, this paper explores automating the detection of the embedded symbols and text. The paper compares various Deep Learning models on P&I Diagrams by using object detection for symbols with the Transfer Learning technique. Initially, the Connectionist Text Proposal Network algorithm is used for text detection in the images; then, for the geometrically shaped objects, the OpenCV library is used. The models trained by Transfer Learning are exported, and the results are compared by performing symbol detection. This paper is an overall study of various deep learning methods implemented on P&I Diagrams.
Article
Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to the massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
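The additive angular margin can be sketched directly from its definition: the target-class cosine cos(theta) is replaced by cos(theta + m) before scaling. The batch of embeddings, the three class centres, and the default margin/scale values below are illustrative.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels=None, margin=0.5, scale=64.0):
    """Additive angular margin: replace the target-class cosine cos(theta)
    with cos(theta + m) before scaling by s, as in the ArcFace loss."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(e @ w, -1.0, 1.0)
    if labels is not None:
        rows = np.arange(len(labels))
        cos[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + margin)
    return scale * cos

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))    # batch of 4 embeddings (illustrative)
W = rng.normal(size=(8, 3))      # 3 hypothetical class centres as columns
labels = np.array([0, 1, 2, 0])
plain = arcface_logits(emb, W)
margined = arcface_logits(emb, W, labels=labels)
off_target = np.ones_like(plain, dtype=bool)
off_target[np.arange(4), labels] = False
print(bool(np.allclose(plain[off_target], margined[off_target])))  # -> True
```

Only the ground-truth logits are penalized by the margin; all other logits are untouched, which is what pushes same-class embeddings into tighter angular clusters. The sub-center variant would keep K columns per class and take the maximum cosine over them.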
Article
Piping and instrumentation diagrams (P&IDs) are commonly used in the process industry as a transfer medium for the fundamental design of a plant and for detailed design, purchasing, procurement, construction, and commissioning decisions. The present study proposes a method for symbol and text recognition for P&ID images using deep-learning technology. Our proposed method consists of P&ID image pre-processing, symbol and text recognition, and the storage of the recognition results. We consider the recognition of symbols of different sizes and shape complexities in high-density P&ID images in a manner that is applicable to the process industry. We also standardize the training dataset structure and symbol taxonomy to optimize the developed deep neural network. A training dataset is created based on diagrams provided by a local Korean company. After training the model with this dataset, a recognition test produced relatively good results, with a precision and recall of 0.9718 and 0.9827 for symbols and 0.9386 and 0.9175 for text, respectively.
Chapter
Digitization of scanned Piping and Instrumentation diagrams (P&ID), widely used in manufacturing and mechanical industries such as oil and gas over several decades, has become a critical bottleneck in dynamic inventory management and in the creation of smart P&IDs that are compatible with the latest CAD tools. Historically, P&ID sheets have been manually generated at the design stage, before being scanned and stored as PDFs. Current digitization initiatives involve manual processing and are consequently very time consuming, labour intensive and error-prone. Thanks to advances in image processing, machine and deep learning techniques, there is an emerging body of work on P&ID digitization. However, existing solutions face several challenges owing to the variation in scale, size and noise in the P&IDs, the sheer complexity and crowdedness within the drawings, the domain knowledge required to interpret the drawings, and the very minute visual differences among symbols. This motivates our current solution, Digitize-PID, which comprises an end-to-end pipeline for detection of core components from P&IDs like pipes, symbols and textual information, followed by their association with each other and, eventually, the validation and correction of output data based on inherent domain knowledge. A novel and efficient kernel-based line detection and a two-step method for detection of complex symbols based on a fine-grained deep recognition technique are presented in the paper. In addition, we have created an annotated synthetic dataset, Dataset-P&ID, of 500 P&IDs by incorporating different types of noise and complex symbols, which is made available for public use (currently there exists no public P&ID dataset). We evaluate our proposed method on this synthetic dataset and a real-world anonymized private dataset of 12 P&ID sheets. Results show that Digitize-PID outperforms the existing state-of-the-art for P&ID digitization.
Chapter
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most authoritative academic competitions in the field of Computer Vision (CV) in recent years. However, applying ILSVRC's annual champion models directly to fine-grained visual categorization (FGVC) tasks does not achieve good performance. In FGVC tasks, the small inter-class variations and the large intra-class variations make classification a challenging problem. Our attention object location module (AOLM) can predict the position of the object, and our attention part proposal module (APPM) can propose informative part regions, without the need for bounding-box or part annotations. The obtained object images not only contain almost the entire structure of the object but also more details, the part images have many different scales and more fine-grained features, and the raw images contain the complete object. These three kinds of training images are supervised by our multi-branch network. Therefore, our multi-branch and multi-scale learning network (MMAL-Net) has good classification ability and robustness for images of different scales. Our approach can be trained end-to-end, while providing short inference time. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results on the CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets. Our code will be available at https://github.com/ZF1044404254/MMAL-Net.
Article
Engineering drawings are commonly used in different industries such as Oil and Gas, construction, and other types of engineering. Digitising these drawings is becoming increasingly important. This is mainly due to the need to improve business practices such as inventory, asset management, risk analysis, and other types of applications. However, processing and analysing these drawings is a challenging task. A typical diagram often contains a large number of different types of symbols belonging to various classes and with very little variation among them. Another key challenge is the class-imbalance problem, where some types of symbols largely dominate the data while others are hardly represented in the dataset. In this paper, we propose methods to handle these two challenges. First, we propose an advanced bounding-box detection method for localising and recognising symbols in engineering diagrams. Our method is end-to-end with no user interaction. Thorough experiments on a large collection of diagrams from an industrial partner proved that our methods accurately recognise more than 94% of the symbols. Secondly, we present a method based on a Deep Generative Adversarial Neural Network for handling class imbalance. The proposed GAN model proved to be capable of learning from a small number of training examples. Experiment results showed that the proposed method greatly improved the classification of symbols in engineering drawings.
Chapter
Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons). The physical meaning of content-addressable memory is described by an appropriate phase space flow of the state of a system. A model of such a system is given, based on aspects of neurobiology but readily adapted to integrated circuits. The collective properties of this model produce a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size. The algorithm for the time evolution of the state of the system is based on asynchronous parallel processing. Additional emergent collective properties include some capacity for generalization, familiarity recognition, categorization, error correction, and time sequence retention. The collective properties are only weakly sensitive to details of the modeling or the failure of individual devices.
Conference Paper
Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-means clustering as well as soft and hard Bregman clustering. The algorithm is substantially faster than existing constructions, embarrassingly parallel, and the resulting coresets are smaller. We further show that the proposed approach naturally generalizes to statistical k-means clustering and that, compared to existing results, it can be used to compute smaller summaries for empirical risk minimization. In extensive experiments, we demonstrate that the proposed algorithm outperforms existing data summarization strategies in practice.
Article
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
Conference Paper
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. The objective is to make these higher- level representations more abstract, with their individual features more invariant to most of the variations that are typically present in the training distribution, while collectively preserving as much as possible of the information in the input. Ideally, we would like these representations to disentangle the unknown factors of variation that underlie the training distribution. Such unsupervised learning of representations can be exploited usefully under the hypothesis that the input distribution P(x) is structurally related to some task of interest, say predicting P(y|x). This paper focusses on why unsupervised pre-training of representations can be useful, and how it can be exploited in the transfer learning scenario, where we care about predictions on examples that are not from the same distribution as the training distribution
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
Conference Paper
This paper proposes a novel framework for automated recognition of components in a Piping and Instrumentation Diagram (P&ID) of raster form. Contour is used as the main clue for visual recognition, through the use of Local Binary Patterns (LBP) as descriptor and the concept of Spatial Pyramid Matching (SPM). Comparison of two image patches is done by calculating the l1 distance between the two corresponding LBP-based descriptors. The framework requires at least one example image per type of component to be recognised; the corresponding LBP and SPM based descriptor is determined and stored. A linear sliding window approach is used to detect a small set of top candidates from the pool of all sub-images in the original image. Verification against the entire library of symbols is performed on each candidate selected from the previous stage, using nearest-neighbour classification. The method has demonstrated state-of-the-art performance on a new challenging dataset created with advice from a group of experienced engineers in the marine and offshore industry.
Article
Crowdsourcing refers to solving large problems by involving human workers that solve component sub-problems or tasks. In data crowdsourcing, the problem involves data acquisition, management, and analysis. In this paper, we provide an overview of data crowdsourcing, giving examples of problems that the authors have tackled, and presenting the key design steps involved in implementing a crowdsourced solution. We also discuss some of the open challenges that remain to be solved.
Article
People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy. People can also use learned concepts in richer ways than conventional algorithms—for action, imagination, and explanation. We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches. We also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior.
Conference Paper
Integrating legacy plant and process information into engineering, control, and enterprise systems may significantly increase the efficiency of managerial and technical operations in industrial facilities. The first step towards the pursued data integration is the extraction of relevant information from existing engineering documents, many of which are stored in vector-graphics-compatible formats such as PDF. Accordingly, this paper is aimed at proposing a novel methodology for the automatic extraction of structural and connectivity information from vector-graphics-coded engineering documents. A case study of a piping and instrumentation diagram (P&ID) demonstrates the reliable performance of the approach for the recognition of symbols, annotations, and underlying connectivity.
Conference Paper
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data. These, however, are mostly implicitly learned as part of a classification task. In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons. A similar model was defined by Wang et al. (2014), tailor-made for learning a ranking for image information retrieval. Here we demonstrate, using various datasets, that our model learns a better representation than its immediate competitor, the Siamese network. We also discuss possible future usage as a framework for unsupervised learning.
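The distance-comparison objective a triplet network optimizes can be sketched as a hinge on the gap between the anchor-positive and anchor-negative distances; the 2-D points and the margin value below are illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the distance gap: positives must sit closer than negatives
    by at least `margin` in the learned embedding space."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor
p = np.array([0.1, 0.0])   # same class, nearby
n = np.array([3.0, 0.0])   # different class, far away
print(triplet_loss(a, p, n))  # -> 0.0 (margin already satisfied)
print(triplet_loss(a, n, p))  # positive loss: the swapped triplet violates it
```

In the full model, the three inputs pass through one shared embedding network before the distances are taken, so minimizing this loss shapes the embedding itself.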
Article
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
Article
The focus of this paper is on how to select a small sample of examples for labeling that can help us to evaluate many different classification models unknown at the time of sampling. We are particularly interested in studying the sampling strategies for problems in which the prevalence of the two classes is highly biased toward one of the classes. The evaluation measures of interest we want to estimate as accurately as possible are those obtained from the contingency table. We provide a careful theoretical analysis on sensitivity, specificity, and precision and show how sampling strategies should be adapted to the rate of skewness in data in order to effectively compute the three aforementioned evaluation measures.
Article
Principal Components Analysis (PCA) as a method of multivariate statistics was created before the Second World War. However, the wider application of this method only occurred in the 1960s, during the "Quantitative Revolution" in the Natural and Social Sciences. The main reason for this time-lag was the huge difficulty posed by calculations involving this method. Only with the advent and development of computers did the almost unlimited application of multivariate statistical methods, including principal components, become possible. At the same time, requirements arose for precise numerical methods concerning, among other things, the calculation of eigenvalues and eigenvectors, because the application of principal components to technical problems required absolute accuracy. On the other hand, numerous applications in the Social Sciences gave rise to a significant increase in the ability to interpret these nonobservable variables, which is just what the principal components are. In the application of principal components, the problem is to do not only with their formal properties but, above all, with their empirical origins. The authors considered these two tendencies during the creation of the program for principal components. This program, entitled PCA, accompanies this paper.
It analyzes, consecutively, variance-covariance and correlation matrices, and performs the following functions:
- determination of the eigenvalues and eigenvectors of these matrices,
- testing of the principal components,
- calculation of the coefficients of determination between selected components and the initial variables, and testing of these coefficients,
- determination of the share of variation of all the initial variables in the variation of particular components,
- construction of a dendrite for the initial set of variables,
- construction of a dendrite for a selected pattern of the principal components,
- the scatter of the objects studied in a selected coordinate system.
Thus, the PCA program performs many more functions, especially in testing and graphics, than the PCA programs in conventional statistical packages. Included in this paper are a theoretical description of principal components, the basic rules for their interpretation, and also statistical testing.
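The core computation, eigendecomposition of the covariance (or correlation) matrix followed by projection onto the leading eigenvectors, can be sketched as follows; the synthetic correlated data is illustrative.

```python
import numpy as np

def pca(X, k):
    """Principal components via eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
    order = np.argsort(vals)[::-1][:k]                     # k largest first
    return Xc @ vecs[:, order], vals[order] / vals.sum()

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])  # correlated 2-D data
scores, explained = pca(X, 1)
print(scores.shape)  # -> (200, 1); explained[0] is close to 1 for this data
```

The eigenvalue ratios returned as `explained` correspond to the "share of variation" the program reports for each component.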
Article
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
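One of the simplest remedies surveyed for severe class-distribution skew, random oversampling of the minority class, can be sketched as follows; the toy dataset with a 5:1 skew is illustrative.

```python
import random

def oversample(X, y, seed=0):
    """Random oversampling: replicate minority-class samples (with replacement)
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        picks = samples + [rng.choice(samples) for _ in range(target - len(samples))]
        X_out.extend(picks)
        y_out.extend([label] * target)
    return X_out, y_out

X = [[0], [1], [2], [3], [4], [10]]
y = [0, 0, 0, 0, 0, 1]           # severe 5:1 class skew
Xb, yb = oversample(X, y)
print(yb.count(0), yb.count(1))  # -> 5 5
```

The review discusses why such naive replication can overfit and why informed synthetic sampling (e.g. SMOTE) and cost-sensitive methods are often preferred.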
Article
The Hough transform is a method for detecting curves by exploiting the duality between points on a curve and parameters of that curve. The initial work showed how to detect both analytic curves [1, 2] and non-analytic curves [3], but these methods were restricted to binary edge images. This work was generalized to the detection of some analytic curves in grey level images, specifically lines [4], circles [5] and parabolas [6]. The line detection case is the best known of these and has been ingeniously exploited in several applications [7, 8, 9]. We show how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space. Such a mapping can be exploited to detect instances of that particular shape in an image. Furthermore, variations in the shape such as rotations, scale changes or figure-ground reversals correspond to straightforward transformations of this mapping. However, the most remarkable property is that such mappings can be composed to build mappings for complex shapes from the mappings of simpler component shapes. This makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.
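The R-table mechanism of the generalized Hough transform can be sketched for pure translation. The L-shaped template below is illustrative, and gradient-angle indexing of the R-table is omitted for brevity.

```python
import numpy as np

def build_r_table(boundary, reference):
    """R-table: offsets from each boundary point to the reference point.
    (Gradient-angle indexing is omitted in this translation-only sketch.)"""
    return [tuple(np.subtract(reference, p)) for p in boundary]

def ght_accumulate(points, r_table, shape):
    """Each image point votes for every reference location the R-table allows."""
    acc = np.zeros(shape, dtype=int)
    for px, py in points:
        for dx, dy in r_table:
            x, y = px + dx, py + dy
            if 0 <= x < shape[0] and 0 <= y < shape[1]:
                acc[x, y] += 1
    return acc

# A small L-shaped template, then the same shape translated by (5, 5).
template = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
r_table = build_r_table(template, reference=(0, 0))
image_points = [(x + 5, y + 5) for x, y in template]
acc = ght_accumulate(image_points, r_table, (20, 20))
peak = tuple(int(i) for i in np.unravel_index(acc.argmax(), acc.shape))
print(peak)  # -> (5, 5): the accumulator peak recovers the translation
```

Rotation and scale are handled in the full method by extending the accumulator with extra parameter dimensions, and composite shapes by composing the R-tables of their parts.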