Preprint

# Artificial Intelligence Assisted Infrastructure Assessment Using Mixed Reality Systems

Authors:
• Connected Wise

## Abstract and Figures

Conventional methods for visual assessment of civil infrastructure have certain limitations, such as subjectivity of the collected data, long inspection times, and high labor cost. Although some newer technologies currently in practice, e.g. robotic techniques, can collect objective, quantified data, the inspector's own expertise remains critical in many instances because these technologies are not designed to work interactively with a human inspector. This study aims to create a smart, human-centered method that offers significant contributions to infrastructure inspection, maintenance, management practice, and safety for bridge owners. With a smart Mixed Reality framework integrated into a wearable holographic headset, a bridge inspector can automatically analyze a defect, such as a crack seen on an element, and display its dimensions in real time along with the condition state. Such systems can potentially decrease the time and cost of infrastructure inspections by accelerating essential tasks of the inspector such as defect measurement, condition assessment, and data transfer to management systems. Human-centered artificial intelligence will help the inspector collect more quantified and objective data while incorporating the inspector's professional judgment. This study explains in detail the described system and the related methodology of implementing attention-guided, semi-supervised deep learning in mixed reality technology that interacts with the human inspector during assessment. Thereby, the inspector and the AI collaborate and communicate for improved visual inspection.
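As a concrete illustration of the inspector-facing step the abstract describes, the sketch below maps a measured crack width to a condition state that a headset overlay could display. All threshold values, state labels, and function names are hypothetical placeholders for illustration, not values from this study or from AASHTO guidance.

```python
# Hypothetical sketch: convert a measured crack width into a condition state
# for a mixed-reality overlay. Thresholds and labels are illustrative only.
def condition_state(crack_width_mm: float) -> int:
    """Map a measured crack width (mm) to a 1-4 condition state (illustrative)."""
    if crack_width_mm < 0.3:
        return 1  # good: hairline cracking
    if crack_width_mm < 1.0:
        return 2  # fair: narrow cracks
    if crack_width_mm < 3.0:
        return 3  # poor: wide cracks
    return 4      # severe: defect warrants review

def overlay_text(crack_width_mm: float) -> str:
    """Text a headset overlay might render next to a detected crack."""
    return (f"crack width: {crack_width_mm:.2f} mm | "
            f"condition state: {condition_state(crack_width_mm)}")
```

In the described system, the measured width would come from the deep-learning defect analysis and the string would be rendered next to the element the inspector is looking at.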

## References
Article
Full-text available
Convolutional neural networks (CNNs) have shown remarkable results over the last several years for a wide range of computer vision tasks. A new architecture recently introduced by Sabour et al., referred to as capsule networks with dynamic routing, has shown great initial results for digit recognition and small image classification. The success of capsule networks lies in their ability to preserve more information about the input by replacing max-pooling layers with convolutional strides and dynamic routing, allowing for preservation of part-whole relationships in the data. This preservation of the input is demonstrated by reconstructing the input from the output capsule vectors. Our work expands the use of capsule networks to the task of object segmentation for the first time in the literature. We extend the idea of convolutional capsules with locally-connected routing and propose the concept of deconvolutional capsules. Further, we extend the masked reconstruction to reconstruct the positive input class. The proposed convolutional-deconvolutional capsule network, called SegCaps, shows strong results for the task of object segmentation with a substantial decrease in parameter space. As an example application, we applied the proposed SegCaps to segment pathological lungs from low-dose CT scans and compared its accuracy and efficiency with other U-Net-based architectures. SegCaps is able to handle large image sizes (512 x 512) as opposed to baseline capsules (typically less than 32 x 32). The proposed SegCaps reduced the number of parameters of the U-Net architecture by 95.4% while still providing better segmentation accuracy.
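The dynamic routing step that capsule networks rely on can be sketched in a few lines of NumPy. This is a simplified illustration of routing-by-agreement as described by Sabour et al., not SegCaps' locally-connected variant; the array shapes and names are chosen for clarity, not taken from either paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule squashing nonlinearity: keeps direction, maps norm into [0, 1)."""
    sq = np.sum(s**2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_in, num_out, dim) predictions from lower-level capsules.
    Returns (num_out, dim) output capsule vectors.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per output
        v = squash(s)                                         # (num_out, dim)
        b = b + np.einsum('iod,od->io', u_hat, v)             # agreement update
    return v
```

Because the squashing function bounds each output vector's norm below 1, the norm can be read as the probability that the entity the capsule represents is present.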
New technology that intelligently combines the physical and digital worlds looks set to revolutionise the way civil engineers monitor infrastructure, both during and after construction. Ioannis Brilakis of the University of Cambridge reports.
Article
Full-text available
This study develops a detector that automatically finds cracks in photographs of concrete structures using a convolutional neural network, a kind of deep learning. First, photographs of concrete were collected as learning data. Second, pictures of cracked parts, chalk-letter parts, joint parts, surface parts, and other parts were extracted from these photographs to form the dataset. Third, a classifier that assigns pictures to these five classes was trained on the dataset using the convolutional neural network. Finally, the automatic detector was built from this classifier.
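The five-class classification step can be sketched as a minimal CNN forward pass in plain NumPy, assuming trained weights are supplied. This is an illustrative toy (single-channel patches, one convolution layer), not the network from the study; all names and shapes are assumptions.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D convolution of a single-channel image with one kernel."""
    H, W = x.shape; kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    H, W = x.shape
    return x[:H//s*s, :W//s*s].reshape(H//s, s, W//s, s).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(patch, kernels, W_fc, b_fc):
    """Conv -> ReLU -> pool -> fully-connected -> softmax over 5 classes:
    crack, chalk letter, joint, surface, other (illustrative labels)."""
    feats = np.concatenate(
        [max_pool(relu(conv2d(patch, k))).ravel() for k in kernels])
    return softmax(W_fc @ feats + b_fc)
```

With trained weights, `classify` returns a probability distribution over the five surface classes; the class with the highest probability is the detector's prediction for the patch.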
Article
Full-text available
Deep learning is a subfield of machine learning that aims to learn a hierarchy of features from input data. Researchers have intensively investigated deep learning algorithms for solving challenging problems in many areas such as image classification, speech recognition, signal processing, and natural language processing. In this study, we not only review typical deep learning algorithms in computer vision and signal processing but also provide detailed information on how to apply deep learning to specific areas such as road crack detection, fault diagnosis, and human activity detection. In addition, this study discusses the challenges of designing and training deep neural networks.
Conference Paper
Full-text available
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For $$300 \times 300$$ input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X, and for $$512 \times 512$$ input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
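The per-location default-box generation SSD performs can be sketched as follows. The scales and aspect ratios are illustrative values, and the layout follows the paper's scheme of one box per aspect ratio at the layer's scale plus one extra square box at an intermediate scale.

```python
import numpy as np

def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default boxes (cx, cy, w, h) in [0, 1] image coords
    for one square feature map of side `fmap_size`."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centers sit at the middle of each feature-map cell.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            # One box per aspect ratio at this layer's scale.
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
            # Extra square box at the geometric mean of adjacent scales.
            s_prime = np.sqrt(scale * next_scale)
            boxes.append((cx, cy, s_prime, s_prime))
    return np.array(boxes)
```

The detection head then predicts, for every one of these boxes, class scores plus four offsets that deform the default box toward the object it covers.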
Article
Full-text available
There is a need for rapid and objective assessment of concrete bridge decks for maintenance decision making. Infrared thermography (IRT) has great potential to identify deck delaminations more objectively than routine visual inspections or chain drag tests. In addition, with appropriate IRT cameras attached to vehicles, reliable data can be collected rapidly and analyzed effectively. This research compares three infrared cameras with different specifications at different times and speeds of data collection, and explores several factors affecting the utilization of IRT for subsurface damage detection in concrete structures, specifically when IRT is used for high-speed bridge deck inspection at normal driving speeds. The results show that IRT can detect delaminations up to 2.54 cm below the concrete surface at any time period. Nighttime is observed to be the most suitable time frame, yielding fewer false detections because direct sunlight otherwise adds "noise" to the IRT results. This study also revealed two camera specifications important for high-speed inspection by IRT: shorter integration time and higher pixel resolution.
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the fully convolutional network (FCN) architecture and its variants. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. The design of SegNet was primarily motivated by road scene understanding applications. Hence, it is efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than competing architectures and can be trained end-to-end using stochastic gradient descent. We also benchmark the performance of SegNet on Pascal VOC12 salient object segmentation and the recent SUN RGB-D indoor scene understanding challenge. We show that SegNet provides competitive performance although it is significantly smaller than other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/
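SegNet's index-based upsampling can be illustrated with a small NumPy sketch: the encoder's max-pooling records where each maximum came from, and the decoder scatters the pooled values back to those positions to form a sparse map (which trainable filters would then densify). This is a toy single-channel version for intuition, not the Caffe implementation.

```python
import numpy as np

def max_pool_with_indices(x, s=2):
    """s x s max pooling that also records the flat index of each maximum,
    mimicking the pooling indices SegNet's encoder passes to its decoder."""
    H, W = x.shape
    pooled = np.zeros((H // s, W // s))
    indices = np.zeros((H // s, W // s), dtype=np.int64)
    for i in range(H // s):
        for j in range(W // s):
            window = x[i*s:(i+1)*s, j*s:(j+1)*s]
            k = int(np.argmax(window))
            pooled[i, j] = window.flat[k]
            indices[i, j] = (i*s + k // s) * W + (j*s + k % s)
    return pooled, indices

def max_unpool(pooled, indices, shape):
    """SegNet-style non-learned upsampling: scatter each pooled value back to
    the position its max came from; all other entries stay zero (sparse map)."""
    out = np.zeros(shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out
```

Because the decoder only scatters values, no upsampling weights are learned; the dense output is produced by the trainable convolutions applied afterward.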
Article
Full-text available
Template warping is a popular technique in vision-based 3D motion tracking and 3D pose estimation due to its flexibility of being applicable to monocular video sequences. However, the method suffers from two major limitations that hamper its successful use in practice. First, it requires the camera to be calibrated prior to applying the method. Second, it may fail to provide good results if the inter-frame displacements are too large. To overcome the first problem, we propose to estimate the unknown focal length of the camera from several initial frames by an iterative optimization process. To alleviate the second problem, we propose a tracking method based on combining complementary information provided by dense optical flow and tracked SIFT features. While optical flow is good for small displacements and provides accurate local information, tracked SIFT features are better at handling larger displacements or global transformations. To combine these two complementary sources of information, we introduce a "forgetting factor" to bootstrap the 3D pose estimates provided by SIFT features, and refine the final results using optical flow. Experiments are performed on three public databases, i.e. the BIWI head pose dataset, the BU dataset, and the McGill faces dataset. Results illustrate that the proposed solution provides more accurate results than baseline methods that rely solely on either template warping or SIFT features. In addition, the approach can be applied in a larger variety of scenarios, due to circumventing the need for camera calibration, providing a more flexible solution to the problem than existing methods.
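The "forgetting factor" idea — balancing drift-free but sparse SIFT pose estimates against locally accurate optical-flow increments — can be illustrated with a toy blending rule. The function below is a simplification for intuition only; the authors' actual formulation may differ, and all names and the blending form are assumptions.

```python
import numpy as np

def fuse_pose(prev_pose, flow_delta, sift_pose, forgetting=0.3):
    """Toy illustration of a forgetting-factor pose update.

    Integrate the incremental optical-flow motion (accurate locally, but
    drifts over time), then pull the estimate toward the absolute SIFT-based
    pose so the drift cannot accumulate indefinitely.
    """
    flow_track = np.asarray(prev_pose, dtype=float) + np.asarray(flow_delta, dtype=float)
    return (1.0 - forgetting) * flow_track + forgetting * np.asarray(sift_pose, dtype=float)
```

With `forgetting=0` the tracker trusts flow integration alone; with `forgetting=1` it snaps to the SIFT estimate each frame; intermediate values trade smoothness against drift correction.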
Article
Full-text available
Detection of cracks on bridge decks is a vital task for maintaining the structural health and reliability of concrete bridges. Robotic imaging can be used to obtain bridge surface image sets for automated on-site analysis. We present a novel automated crack detection algorithm, the STRUM (spatially tuned robust multifeature) classifier, and demonstrate results on real bridge data using a state-of-the-art robotic bridge scanning system. By using machine learning classification, we eliminate the need for manually tuning threshold parameters. The algorithm uses robust curve fitting to spatially localize potential crack regions even in the presence of noise. Multiple visual features that are spatially tuned to these regions are computed. Feature computation includes examining the scale-space of the local feature in order to represent the information and the unknown salient scale of the crack. The classification results are obtained with real bridge data from hundreds of crack regions over two bridges. This comprehensive analysis shows a peak STRUM classifier performance of 95%, compared with 69% accuracy from a more typical image-based approach. In order to create a composite global view of a large bridge span, an image sequence from the robot is aligned computationally to create a continuous mosaic. A crack density map for the bridge mosaic provides a computational description as well as a global view of the spatial patterns of bridge deck cracking. The bridges surveyed for data collection and testing include the Long-Term Bridge Performance (LTBP) program's pilot project bridges at Haymarket, VA, USA, and Sacramento, CA, USA.
Article
Full-text available
To ensure the safety and the serviceability of civil infrastructure it is essential to visually inspect and assess its physical and functional condition. This review paper presents the current state of practice of assessing the visual condition of vertical and horizontal civil infrastructure; in particular of reinforced concrete bridges, precast concrete tunnels, underground concrete pipes, and asphalt pavements. Since the rate of creation and deployment of computer vision methods for civil engineering applications has been exponentially increasing, the main part of the paper presents a comprehensive synthesis of the state of the art in computer vision based defect detection and condition assessment related to concrete and asphalt civil infrastructure. Finally, the current achievements and limitations of existing methods as well as open research challenges are outlined to assist both the civil engineering and the computer science research community in setting an agenda for future research.
Article
Full-text available
Background: Many context-aware techniques have been proposed to deliver cyber-information, such as project specifications or drawings, to on-site users by intelligently interpreting their environment. However, these techniques primarily rely on RF-based location tracking technologies (e.g., GPS or WLAN), which typically do not provide sufficient precision on congested construction sites or require additional hardware and custom mobile devices. Method: This paper presents a new vision-based mobile augmented reality system that allows field personnel to query and access 3D cyber-information on-site using photographs taken from standard mobile devices. The system does not require any location tracking modules, external hardware attachments, or optical fiducial markers for localizing a user's position. Rather, the user's location and orientation are derived purely by comparing images from the user's mobile device to a 3D point cloud model generated from a set of pre-collected site photographs. Results: The experimental results show that 1) the underlying 3D reconstruction module of the system generates complete 3D point cloud models of the target scene and is up to 35 times faster than other state-of-the-art Structure-from-Motion (SfM) algorithms, and 2) localization takes at most a few seconds on an actual construction site. Conclusion: The localization speed and empirical accuracy of the system provide the ability to use it on real-world construction sites. Using an actual construction case study, the perceived benefits and limitations of the proposed method for on-site context-aware applications are discussed in detail.
Article
Full-text available
There is consensus on the importance of objectively and reliably assessing the condition and load capacity of aged bridges. Although each bridge may be considered as a unique structure, the behavior of many bridge types may be governed by only a few mechanisms and related parameters, especially if a population is constructed from standard designs. By identifying these parameters, and their variation within the population, it is possible to extend findings such as load rating obtained from a statistical sample to the entire population. Bridge type-specific strategies for load rating and condition assessment in conjunction with statistical sampling may therefore offer significant advantages for inspecting and load rating bridges sharing common materials, similar geometry and detailing, and the same critical behavior mechanisms. In this paper, the writers present their recent work on load rating of the reinforced concrete T-beam bridge population in Pennsylvania to objectively re-qualify them based on field-calibrated finite element models.
Article
Full-text available
Mixed Reality (MR) visual displays, a particular subset of Virtual Reality (VR) related technologies, involve the merging of real and virtual worlds somewhere along the 'virtuality continuum' which connects completely real environments to completely virtual ones. Augmented Reality (AR), probably the best known of these, refers to all cases in which the display of an otherwise real environment is augmented by means of virtual (computer graphic) objects. The converse case on the virtuality continuum is therefore Augmented Virtuality (AV). Six classes of hybrid MR display environments are identified. However quite different groupings are possible and this demonstrates the need for an efficient taxonomy, or classification framework, according to which essential differences can be identified. An approximately three-dimensional taxonomy is proposed comprising the following dimensions: extent of world knowledge, reproduction fidelity, and extent of presence metaphor.
Article
Full-text available
In this paper, we obtain sufficient conditions for the uniform convergence of trigonometric series with monotone (in the extended sense) coefficients.
Book
As virtual reality expands from the imaginary worlds of science fiction and pervades every corner of everyday life, it is becoming increasingly important for students and professionals alike to understand the diverse aspects of this technology. This book aims to provide a comprehensive guide to the theoretical and practical elements of virtual reality, from the mathematical and technological foundations of virtual worlds to the human factors and the applications that enrich our lives: in the fields of medicine, entertainment, education and others. After providing a brief introduction to the topic, the book describes the kinematic and dynamic mathematical models of virtual worlds. It explores the many ways a computer can track and interpret human movement, then progresses through the modalities that make up a virtual world: visual, acoustic and haptic. It explores the interaction between the actual and virtual environments, as well as design principles of the latter. The book closes with an examination of different applications, focusing on augmented reality as a special case. Though the content is primarily VR-related, it is also relevant for many other fields.
Article
Augmented reality (AR) allows to seamlessly insert virtual objects in an image sequence. In order to accomplish this goal, it is important that synthetic elements are rendered and aligned in the scene in an accurate and visually acceptable way. The solution of this problem can be related to a pose estimation or, equivalently, a camera localization process. This paper aims at presenting a brief but almost self-contented introduction to the most important approaches dedicated to vision-based camera localization along with a survey of several extension proposed in the recent years. For most of the presented approaches, we also provide links to code of short examples. This should allow readers to easily bridge the gap between theoretical aspects and practical implementations.
Article
In Civil Infrastructure System (CIS) applications, the requirement of blending synthetic and physical objects distinguishes Augmented Reality (AR) from other visualization technologies in three aspects: (1) it reinforces the connections between people and objects, and promotes engineers’ appreciation about their working context; (2) it allows engineers to perform field tasks with the awareness of both the physical and synthetic environment; and (3) it offsets the significant cost of 3D Model Engineering by including the real world background. This paper reviews critical problems in AR and investigates technical approaches to address the fundamental challenges that prevent the technology from being usefully deployed in CIS applications, such as the alignment of virtual objects with the real environment continuously across time and space; blending of virtual entities with their real background faithfully to create a sustained illusion of co-existence; and the integration of these methods to a scalable and extensible computing AR framework that is openly accessible to the teaching and research community. The research findings have been evaluated in several challenging CIS applications where the potential of having a significant economic and social impact is high. Examples of validation test beds implemented include an AR visual excavator-utility collision avoidance system that enables workers to “see” buried utilities hidden under the ground surface, thus helping prevent accidental utility strikes; an AR post-disaster reconnaissance framework that enables building inspectors to rapidly evaluate and quantify structural damage sustained by buildings in seismic events such as earthquakes or blasts; and a tabletop collaborative AR visualization framework that allows multiple users to observe and interact with visual simulations of engineering processes.
Article
Cracking can invite sudden failures of concrete structures. The objective of this research is to develop an integrated model based on digital image processing for the numerical representation of defects. The integrated model consists of crack quantification, change detection, neural network, and 3D visualization models to visualize defects in a way that mimics on-site visual inspection. The crack quantification model evaluates crack lengths based on the perimeter of the skeleton of a crack, which accounts for the crack's tortuosity. The change detection model is based on the Fourier Transform of digital images, eliminating the need for the image registration required by traditional approaches. The integrated model for crack length and change detection is also supported by neural networks to predict crack depth and by 3D visualization of crack patterns with crack density as a key attribute.
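A skeleton-based length measure that respects tortuosity can be sketched as follows. This is a simplified stand-in for the paper's perimeter-based quantification, assuming a binary skeleton mask has already been extracted from the crack image.

```python
import numpy as np

def skeleton_length(skel):
    """Estimate crack length from a binary skeleton mask by summing distances
    between 8-connected neighbor pairs: 1 for orthogonal steps, sqrt(2) for
    diagonal steps, so a tortuous crack is not undercounted the way a simple
    bounding-box measurement would. Each adjacency is counted once by only
    looking at 'forward' neighbors."""
    ys, xs = np.nonzero(skel)
    pts = set(zip(ys.tolist(), xs.tolist()))
    length = 0.0
    for (y, x) in pts:
        for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):  # forward neighbors only
            if (y + dy, x + dx) in pts:
                length += np.hypot(dy, dx)
    return length
```

A straight 10-pixel segment yields a length of 9 pixel units, while a diagonal crack of the same pixel count yields a proportionally longer value, reflecting its true path length.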
Article
This paper discusses the feasibility of using augmented reality (AR) to evaluate earthquake-induced building damage. In the proposed approach, previously stored building information is superimposed onto a real structure in AR. Structural damage can then be quantified by measuring and interpreting key differences between the real and augmented views of the facility. Proof-of-concept experiments were performed in conjunction with large-scale cyclic shear wall tests. In these, CAD images of the walls were superimposed onto the wall specimens. Then, as the wall specimens were deformed under applied loading, the horizontal drifts between the walls and the augmented images were computed using two different techniques and compared with actual wall drifts. The obtained results highlight the potential of using AR for rapid damage detection and indicate that the accuracy of structural displacements measured using AR is a direct function of the accuracy with which augmented images can be registered with the real world. The limitations of the technology, considerations for field implementation, and the potential for other related applications of AR are also discussed.
Article
Current inspection standards require an inspector to travel to a target structure site and visually assess the structure's condition. This approach is labor-intensive yet highly qualitative. A less time-consuming and inexpensive alternative to current monitoring methods is a robotic system that could inspect structures more frequently and perform autonomous damage detection. In this paper, a vision-based crack detection methodology is introduced. The proposed approach processes 2D digital images (image processing) while considering the geometry of the scene (computer vision). The crack segmentation parameters are adjusted automatically based on depth parameters, with depth perception obtained through 3D scene reconstruction. The system extracts the whole crack from its background, whereas regular edge-based approaches segment only the crack edges. This characteristic is appropriate for the development of a crack thickness quantification system. Experimental tests have been carried out to evaluate the performance of the proposed system.
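The depth-adjusted quantification idea can be illustrated with the pinhole-camera relation between pixel measurements and physical size. Both functions below are illustrative sketches under that model; `adjust_kernel` and all parameter names are hypothetical, not the paper's actual procedure.

```python
def crack_width_mm(width_px, depth_mm, focal_px):
    """Pinhole-camera conversion from a crack width measured in pixels to a
    physical width, using scene depth recovered by 3D reconstruction:
    size_mm = size_px * depth_mm / focal_px."""
    return width_px * depth_mm / focal_px

def adjust_kernel(base_kernel_px, depth_mm, ref_depth_mm):
    """Scale a segmentation kernel inversely with depth so a crack of fixed
    physical width spans a comparable number of pixels at any distance.
    (Illustrative; an odd size >= 3 is enforced, as is typical for
    morphological kernels.)"""
    k = max(3, round(base_kernel_px * ref_depth_mm / depth_mm))
    return k if k % 2 == 1 else k + 1
```

This is the geometric reason segmentation parameters must change with depth: the same 1 mm crack occupies half as many pixels when the camera is twice as far away.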
Article
This paper describes research that investigated the application of the global positioning system and 3 degree-of-freedom (3-DOF) angular tracking to address the registration problem during interactive visualization of construction graphics in outdoor augmented reality (AR) environments. The global position and the three-dimensional (3D) orientation of a user's viewpoint are tracked, and this information is reconciled with the known global position and orientation of superimposed computer-aided design (CAD) objects. Based on this computation, the relative translation and axial rotations between the user's viewpoint and the CAD objects are continually calculated. The relative geometric transformations are then applied to the CAD objects inside a virtual viewing frustum that coincides with the real-world space in the user's view. The result is an augmented outdoor environment where superimposed graphical objects stay fixed to their real-world locations as the user navigates. The algorithms are implemented in a software tool called UM-AR-GPS-ROVER that is capable of interactively placing static and dynamic 3D models at any location in outdoor augmented space. The concept and prototype are demonstrated with an example in which scheduled construction activities for the erection of a structural steel frame are graphically simulated in outdoor AR.
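The per-frame registration computation — expressing a CAD object's known global position in the user's tracked viewpoint frame — can be sketched as below, simplified to a heading-only (yaw) rotation rather than full 3-DOF orientation. The frame and sign conventions are illustrative assumptions, not those of UM-AR-GPS-ROVER.

```python
import numpy as np

def relative_position(user_xyz, user_heading_deg, cad_xyz):
    """Express a CAD object's global position in the viewer's local frame,
    given the viewer's tracked global position and heading. The world-frame
    offset is rotated by -heading about the vertical axis so the object can
    be placed inside the viewing frustum before rendering."""
    t = np.asarray(cad_xyz, dtype=float) - np.asarray(user_xyz, dtype=float)
    h = np.radians(user_heading_deg)
    rot = np.array([[ np.cos(h), np.sin(h), 0.0],
                    [-np.sin(h), np.cos(h), 0.0],
                    [ 0.0,       0.0,       1.0]])
    return rot @ t
```

Repeating this computation every frame as the tracked position and heading change is what keeps the superimposed graphics fixed to their real-world locations.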
Conference Paper
Mixed reality systems seek to smoothly link the physical and data processing (digital) environments. Although mixed reality systems are becoming more prevalent, we still do not have a clear understanding of this interaction paradigm. Addressing this problem, this article introduces a new interaction model called Mixed Interaction model. It adopts a unified point of view on mixed reality systems by considering the interaction modalities and forms of multimodality that are involved for defining mixed environments. This article presents the model and its foundations. We then study its unifying and descriptive power by comparing it with existing classification schemes. We finally focus on the generative and evaluative power of the Mixed Interaction model by applying it to design and compare alternative interaction techniques in the context of RAZZLE, a mobile mixed reality game for which the goal of the mobile player is to collect digital jigsaw pieces localized in space.
Article
We live in a physical world whose properties we have come to know well through long familiarity. We sense an involvement with this physical world which gives us the ability to predict its properties well. For example, we can predict where objects will fall, how well-known shapes look from other angles, and how much force is required to push objects against friction. We lack corresponding familiarity with the forces on charged particles, forces in non-uniform fields, the effects of nonprojective geometric transformations, and high-inertia, low friction motion. A display connected to a digital computer gives us a chance to gain familiarity with concepts not realizable in the physical world. It is a looking glass into a mathematical wonderland. Computer displays today cover a variety of capabilities. Some have only the fundamental ability to plot dots. Displays being sold now generally have built in line-drawing capability. An ability to draw simple curves would be useful. Some available displays are able to plot very short line segments in arbitrary directions, to form characters or more complex curves. Each of these abilities has a history and a known utility.
Brown, M. C., J. P. Gomez, M. L. Hammer, and J. M. Hooks. Long-Term Bridge Performance High Priority Bridge Performance Issues. McLean, VA, 2014.
Azuma, R., R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. Recent Advances in Augmented Reality. IEEE Computer Graphics and Applications, Vol. 21, No. 6, 2001, pp. 34-47.
Microsoft. The Leader in Mixed Reality Technology | HoloLens. Microsoft.
Zaurin, R., T. Khuc, and F. N. Catbas. Hybrid Sensor-Camera Monitoring for Damage Detection: Case Study of a Real Bridge. Journal of Bridge Engineering, Vol. 21, No. 6, 2015, pp. 1-27. https://doi.org/10.1061/(ASCE)BE.1943.
AASHTO. Guide Manual for Bridge Element Inspection. Bridge Element Inspection Manual, 2011, p. 172.
FHWA. National Bridge Inspection Standards Regulations (NBIS). Federal Register, Vol. 69, No. 239, 2004, pp. 15-35.
FDOT. Florida DOT Bridge Inspection Field Guide. 2016.
Hill, M. Overview of Human-Computer Collaboration. Vol. 8, No. June, 1995, pp. 67-81.
Parisi, T. Programming 3D Applications with HTML5 and WebGL. 2014.