Article

Abstract

Today’s deep learning architectures, if trained with a proper dataset, can be used for object detection in marine search and rescue operations. In this paper, a dataset for maritime search and rescue purposes is proposed. It contains aerial-drone videos with 40,000 hand-annotated persons and objects floating in the water, many of them small, which makes them difficult to detect. The second contribution is our proposed object detection method: an ensemble of deep convolutional neural networks, orchestrated by a fusion module with nonlinearly optimized voting weights. The method achieves over 82% average precision on the new aerial-drone floating objects dataset and outperforms each of the state-of-the-art deep neural networks it combines, such as YOLOv3, -v4, Faster R-CNN, RetinaNet, and SSD300. The dataset is publicly available on the Internet.
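The fusion step can be illustrated with a minimal sketch, assuming a greedy IoU-based grouping of detections and fixed per-model voting weights; the function names, the grouping rule, and the score heuristic below are illustrative and do not reproduce the authors' fusion module, whose weights are tuned by nonlinear optimization on validation data.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_detections(per_model_boxes, per_model_scores, weights, iou_thr=0.5):
    """Greedy weighted voting over detections from several models.

    per_model_boxes:  list with one (N_i, 4) array per model.
    per_model_scores: list with one (N_i,) confidence array per model.
    weights:          per-model voting weights (tuned on validation data).
    """
    # Flatten all detections, applying each model's voting weight.
    dets = []
    for m, (boxes, scores) in enumerate(zip(per_model_boxes, per_model_scores)):
        for box, s in zip(boxes, scores):
            dets.append((np.asarray(box, dtype=float), float(s) * weights[m]))
    dets.sort(key=lambda d: d[1], reverse=True)

    fused, used = [], [False] * len(dets)
    for i, (box_i, s_i) in enumerate(dets):
        if used[i]:
            continue
        used[i] = True
        cluster = [(box_i, s_i)]
        for j in range(i + 1, len(dets)):
            if not used[j] and iou(box_i, dets[j][0]) >= iou_thr:
                used[j] = True
                cluster.append(dets[j])
        boxes = np.stack([b for b, _ in cluster])
        ws = np.array([s for _, s in cluster])
        # Score-weighted box average; fused score normalized by model count.
        fused.append((boxes.T @ ws / ws.sum(), ws.sum() / len(per_model_boxes)))
    return fused
```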

... This makes SaR operations much cheaper and safer by reducing the time and number of rescuers needed in emergencies. In recent years, SaR missions based on drones, which allow for quick aerial views of huge regions with potentially difficult-to-reach terrain, have been developed in many countries [1][2][3][4]. While object detection as a crucial SaR mission step has advanced somewhat, it is still far from meeting the technical requirements for ground application, which may warrant further research. ...
... • Heridal dataset [5], based on a collection of images captured by unmanned helicopters, including 1650 high-resolution images (4000 × 3000) containing persons in non-urban areas, such as mountains, forests, oceans, and deserts. • AFO dataset [4], which contains 3647 images with close to 40,000 labeled floating objects (human, wind/sup-board, boat, buoy, sailboat, and kayak). ...
... In these datasets [4,5,16,33], we pay particular attention to samples of persons, removing the influence of other categories. Table 1 presents a comparison of the original and optimized datasets. ...
Article
Full-text available
Detecting sparse, small, lost persons that occupy only a few pixels in high-resolution aerial images was, is, and remains an important and difficult mission, in which a vital role is played by accurate monitoring and intelligent co-rescuing for the search and rescue (SaR) system. However, many problems have not been effectively solved in existing remote-vision-based SaR systems, such as the shortage of person samples in SaR scenarios and the low tolerance of small objects to bounding-box perturbations. To address these issues, a copy-paste mechanism (ISCP) with semi-supervised object detection (SSOD) via instance segmentation and the maximum mean discrepancy (MMD) distance is proposed, which can provide highly robust, multi-task, and efficient aerial-based person detection for the prototype SaR system. Specifically, numerous pseudo-labels are obtained by accurately segmenting the instances of synthetic ISCP samples to obtain their boundaries. The SSOD trainer then uses soft weights to balance the prediction entropy of the loss function between the ground truth and unreliable labels. Moreover, a novel evaluation metric, MMD, is proposed for anchor-based detectors to elegantly compute the IoU of the bounding boxes. Extensive experiments and ablation studies on Heridal and optimized public datasets demonstrate that our approach is effective and achieves state-of-the-art person detection performance in aerial images.
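The copy-paste idea can be sketched in a few lines of NumPy; this is not the authors' ISCP pipeline (which segments instances and manages pseudo-label reliability), just the core paste operation with an illustrative random placement:

```python
import numpy as np

def copy_paste(target_img, instance_img, instance_mask, rng=None):
    """Paste one segmented instance into a target image.

    instance_img:  (h, w, 3) crop containing the object.
    instance_mask: (h, w) boolean mask of the object inside the crop.
    Returns the augmented image and a pseudo-label box (x1, y1, x2, y2).
    """
    rng = rng or np.random.default_rng()
    H, W = target_img.shape[:2]
    h, w = instance_mask.shape
    y0 = int(rng.integers(0, H - h + 1))   # random top-left corner
    x0 = int(rng.integers(0, W - w + 1))

    out = target_img.copy()
    region = out[y0:y0 + h, x0:x0 + w]
    region[instance_mask] = instance_img[instance_mask]  # masked blend

    # Tight box around the pasted pixels, in target-image coordinates.
    ys, xs = np.nonzero(instance_mask)
    box = (x0 + xs.min(), y0 + ys.min(), x0 + xs.max() + 1, y0 + ys.max() + 1)
    return out, box
```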
... • Due to the lack of available datasets that exhibit a combination of adverse effects, we generate a new dataset, namely LowVis-AFO (abbreviation for Low-Visibility Aerial Floating Objects dataset). We use AFO [15] as our ground truth dataset and synthesize dark hazy images. The data generation process has been elaborated in Section 4.1. ...
... Due to the lack of available datasets that meet our requirements, we generate a new one using the AFO dataset [15]. The dataset generation process has been elaborated below, and the final images have been shown in Figure 5. ...
... Figure 5. Visual illustration of a few sample images from our dataset. Columns 1 and 3 show original images taken from the AFO dataset [15], whereas Columns 2 and 4 show their corresponding images generated as explained in Section 4.1, simulating low-visibility conditions. ...
Preprint
Full-text available
Learning to recover clear images from images having a combination of degrading factors is a challenging task. That being said, autonomous surveillance in low visibility conditions caused by high pollution/smoke, poor air quality index, low light, atmospheric scattering, and haze during a blizzard becomes even more important to prevent accidents. It is thus crucial to form a solution that can result in a high-quality image and is efficient enough to be deployed for everyday use. However, the lack of proper datasets available to tackle this task limits the performance of the previous methods proposed. To this end, we generate the LowVis-AFO dataset, containing 3647 paired dark-hazy and clear images. We also introduce a lightweight deep learning model called Low-Visibility Restoration Network (LVRNet). It outperforms previous image restoration methods with low latency, achieving a PSNR value of 25.744 and an SSIM of 0.905, making our approach scalable and ready for practical use. The code and data can be found at https://github.com/Achleshwar/LVRNet.
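The excerpts above do not spell out the degradation pipeline step by step; a common way to synthesize such paired dark-hazy images is the atmospheric scattering model followed by gamma darkening, sketched below under that assumption (all parameter values, and the pseudo-depth fallback, are illustrative):

```python
import numpy as np

def degrade(clear, depth=None, beta=1.2, airlight=0.8, gamma=2.2, rng=None):
    """Synthesize a dark, hazy image from a clear float image in [0, 1].

    Haze follows the standard atmospheric scattering model
        I = J * t + A * (1 - t),  with transmission t = exp(-beta * d),
    and low light is approximated by gamma compression (gamma > 1).
    """
    rng = rng or np.random.default_rng()
    if depth is None:
        # Without a depth map, fall back to a random constant pseudo-depth.
        depth = rng.uniform(0.3, 1.0) * np.ones(clear.shape[:2])
    t = np.exp(-beta * depth)[..., None]      # per-pixel transmission
    hazy = clear * t + airlight * (1.0 - t)   # scattering + airlight
    dark = np.power(hazy, gamma)              # darkening
    return np.clip(dark, 0.0, 1.0)
```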
... Liu et al. (2021) constructed a new maritime target dataset (MRSP-13) and proposed a cross-layer, multi-task CNN model for maritime target detection, which is capable of concurrently addressing ship target detection, classification, and segmentation tasks. Gasienica-Jozkowy et al. (2021) publicly released a maritime rescue dataset and proposed an object detection method based on deep convolutional neural networks, which achieved an average precision of 82% on the dataset. Ai et al. (2021) established a maritime SAR environment model using field data of the marine environment and electronic charts, and proposed a reinforcement learning-based autonomous coverage path planning model for SAR missions, planning the shortest search path and prioritizing high probability areas. ...
... To demonstrate the effectiveness of our proposed method, we conducted experiments on the sea dataset using six classic object detection algorithms, namely YOLOv3 (Redmon and Farhadi, 2018), YOLOv4 (Bochkovskiy et al., 2020), SSD (Liu et al., 2016), Faster-RCNN, RetinaNet (Lin et al., 2017), and Ensemble 3 from reference (Gasienica-Jozkowy et al., 2021). Results are shown in Table 5. Faster-RCNN is a classic two-stage detection method that generates candidate boxes through region proposal networks and then performs object classification and bounding box regression. ...
Article
Full-text available
Introduction The issue of low detection rates and high false negative rates in maritime search and rescue operations has been a critical problem in current target detection algorithms. This is mainly due to the complex maritime environment and the small size of most targets. These challenges affect the algorithms' robustness and generalization. Methods We proposed YOLOv7-CSAW, an improved maritime search and rescue target detection algorithm based on YOLOv7. We used the K-means++ algorithm to determine the optimal sizes of the prior anchor boxes, ensuring an accurate match with actual objects. The C2f module was incorporated for a lightweight model capable of obtaining richer gradient flow information. The model's perception of small target features was increased with the parameter-free simple attention module (SimAM). We further upgraded the feature fusion network to an adaptive feature fusion network (ASFF) to address the lack of high-level semantic features in small targets. Lastly, we implemented the wise intersection over union (WIoU) loss function to tackle large positioning errors and missed detections. Results Our algorithm was extensively tested on a maritime search and rescue dataset with YOLOv7 as the baseline model. We observed a significant improvement in detection performance compared to traditional deep learning algorithms, with a mean average precision (mAP) improvement of 10.73% over the baseline model. Discussion YOLOv7-CSAW significantly enhances the accuracy and robustness of small target detection in complex scenes. This algorithm effectively addresses the common issues experienced in maritime search and rescue operations, specifically improving detection rates and reducing false negatives, proving to be a superior alternative to current target detection algorithms.
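A minimal sketch of K-means++-based anchor sizing follows, using scikit-learn's k-means++ initialization on (width, height) pairs; the anchor count and the synthetic boxes are illustrative, and YOLO-style pipelines often substitute a 1 − IoU distance for the Euclidean one used here:

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_sizes(boxes_wh, n_anchors=9, seed=0):
    """Cluster ground-truth (width, height) pairs into anchor box sizes.

    boxes_wh: (N, 2) array of box widths and heights in pixels.
    Returns the anchors sorted by area, one (w, h) row per anchor.
    """
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10,
                random_state=seed).fit(boxes_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]

# Mostly small maritime objects yield correspondingly small leading anchors.
wh = np.abs(np.random.default_rng(0).normal([30, 25], [20, 15], (500, 2))) + 4
print(anchor_sizes(wh).round(1))
```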
... Helicopters and search and recovery drones were increasingly outfitted with cameras, reflectors, coastal automated recognition system transceivers, tracing and navigational devices, or even mobile phone detectors [10]. Substantial effort has been made to strengthen communication networks and search path planning [19,30]. ...
... In Eqs. (6)–(10), σ represents the logistic sigmoid function, while i, f, o, and z are the input, forget, output, and cell activation vectors, which have the same size as the hidden vector L. W_Li denotes the hidden-input gate matrix, whereas W_Ao denotes the input-output gate matrix. Because the weight matrix from cell to gate vectors is diagonal, element m in each gate vector only receives input from element m of the cell vector. ...
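For reference, the textbook peephole-LSTM gate equations that this passage paraphrases read as follows; the (6)–(10) numbering mirrors the excerpt, and the exact weight naming in the source may differ:

```latex
% Standard peephole LSTM; \odot is the elementwise product, \sigma the logistic sigmoid.
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + w_{ci} \odot c_{t-1} + b_i\right) && (6)\\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + w_{cf} \odot c_{t-1} + b_f\right) && (7)\\
z_t &= \tanh\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right) && (8)\\
c_t &= f_t \odot c_{t-1} + i_t \odot z_t && (9)\\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + w_{co} \odot c_t + b_o\right), \qquad
h_t = o_t \odot \tanh(c_t) && (10)
\end{aligned}
```

The diagonal cell-to-gate matrices correspond to the elementwise peephole products w ⊙ c, which is exactly why element m of each gate sees only element m of the cell vector.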
Article
Full-text available
Identifying objects in drone footage is a challenging problem and a little-researched subject that has recently received increased attention in the scientific community. When using a drone, the shooting height and angle change because the camera's position is not fixed during the shot, in addition to varying weather and lighting conditions. The goal of this research is to offer a hybrid deep learning model for object detection that can aid in search and rescue operations. This technique has three working stages: pre-processing, feature extraction, and object recognition. First, the videos are turned into image frames, and an improved region-based convolutional neural network (RCNN) is used to detect objects in those frames. Improved LGBP (Local Gabor Binary Pattern) features are then extracted from those images, together with conventional ResNet and SIFT (scale-invariant feature transform) features. In the object recognition phase, the proposed African Vulture Updated Honey Badger Optimization (AVUHBO) is applied to optimize the weight parameters of hybrid neural networks such as Bi-GRU and LSTM for object recognition based on the retrieved features. This optimization boosts the classifier's performance to produce better results. In contrast to other approaches, the proposed AVUHBO achieves higher accuracy ratings of 0.91, 0.9, 0.91, and 0.9, while SHO only manages lower accuracy ratings of 0.84, 0.85, 0.87, and 0.88, proving that the proposed AVUHBO can provide accurate object detection. The findings of the proposed object detection methodology are then compared to those of other existing techniques.
... Taken from Huang et al. [23]. ... 4.1 (Top left) Shows the size and aspect ratios of the original-SSD-inspired anchor boxes used in most of the TFMG's sample SSD configurations. A dot shows an aspect ratio used by one or usually more anchor boxes. ...
... Besides further manual labelling on our part, there may be other shark researchers willing to share, or existing data sets online that could be combined with ours. One such example is the recent aerial data set created by Gasienica-Józkowy et al. [16]. Additionally, a transfer learning data set consisting of data more similar to ours than COCO may help. ...
Article
Recent years have seen several projects across the globe using drones to detect sharks, including several high profile projects around alerting beach authorities to keep people safe. However, so far many of these attempts have used cloud-based machine learning solutions for the detection component, which complicates setup and limits their use geographically to areas with internet connection. An on-device (or on-controller) shark detector would offer greater freedom for researchers searching for and tracking sharks in the field, but such a detector would need to operate under reduced resource constraints. To this end we look at SSD MobileNet, a popular object detection architecture that targets edge devices by sacrificing some accuracy. We look at the results of SSD MobileNet in detecting sharks from a data set of aerial images created by a collaboration between Cal Poly and CSU Long Beach’s Shark Lab. We conclude that SSD MobileNet does suffer from some accuracy issues with smaller objects in particular, and we note the importance of customized anchor box configuration.
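The anchor-configuration point can be made concrete with a sketch of original-SSD-style anchor generation, following the scale formula from the SSD paper (the layer count, scale range, and aspect ratios are the common defaults, not necessarily this article's configuration):

```python
import numpy as np

def ssd_anchor_shapes(n_layers=6, s_min=0.2, s_max=0.9,
                      ratios=(1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)):
    """Per-layer anchor (w, h) shapes as fractions of the image side.

    Scales follow SSD: s_k = s_min + (s_max - s_min) * k / (m - 1).
    Only the first layers produce small anchors, so uniformly small
    aerial targets often need a customized, smaller scale range.
    """
    shapes = []
    for k in range(n_layers):
        s_k = s_min + (s_max - s_min) * k / (n_layers - 1)
        shapes.append([(s_k * np.sqrt(r), s_k / np.sqrt(r)) for r in ratios])
    return shapes

for k, layer in enumerate(ssd_anchor_shapes()):
    print(f"layer {k}: smallest anchor width ~{min(w for w, h in layer):.2f}")
```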
... There are many other valuable remote sensing datasets. For example, there are some high-quality datasets for object detection [16][17][18][19][20], semantic segmentation [21][22][23][24][25], and instance segmentation [26][27][28][29][30]. These datasets contribute a lot to the overall remote sensing field. ...
Article
Full-text available
The prediction of a tropical cyclone’s trajectory is crucial for ensuring marine safety and promoting economic growth. Previous approaches to this task have been broadly categorized as either numerical or statistical methods, with the former being computationally expensive. Among the latter, multilayer perceptron (MLP)-based methods have been found to be simple but lacking in time series capabilities, while recurrent neural network (RNN)-based methods excel at processing time series data but do not integrate external information. Recent works have attempted to enhance prediction performance by simultaneously utilizing both time series and meteorological field data through feature fusion. However, these approaches have relatively simplistic methods for data fusion and do not fully explore the correlations between different modalities. To address these limitations, we propose a systematic solution called TC-TrajGRU for predicting tropical cyclone tracks. Our approach improves upon existing methods in two main ways. Firstly, we introduce a Spatial Alignment Feature Fusion (SAFF) module to address feature misalignment issues in different dimensions. Secondly, our Track-to-Velocity (T2V) module leverages time series differences to integrate external information. Our experiments demonstrate that our approach yields highly accurate predictions comparable to the official optimal forecast for a 12 h period.
... As shown in Section 4.3, YOLOv5m and FPN showed reasonable recognition performance on large object classes such as worker and hardhat, while recording lower performance on small object classes such as harness, strap, and hook. This phenomenon was expected before the experiments, as previous studies [44][45][46] identified the challenges of recognizing small objects in far-field monitoring settings. The challenges include a lack of visual features due to a small number of pixels, the color similarity between foreground and background, and irregular shapes of target objects. ...
Chapter
Context-aware safety monitoring based on computer vision has received relatively little attention, although it is critical for recognizing the working context of workers and performing precise safety assessment with respect to Personal Protective Equipment (PPE) compliance checks. To address this knowledge gap, this study proposes vision-based monitoring approaches for context-aware PPE compliance checks using a modularized analysis pipeline composed of object detection, semantic segmentation, and depth estimation. The efficacy of two different approaches under this methodology was examined using YUD-COSAv2 data collected from actual construction sites. In experiments, the proposed method was able to distinguish between workers at heights and on the ground, applying different PPE compliance rules for evaluating workers’ safety. The depth estimation model achieved an Average Precision of 78.50%, while the segmentation model achieved an Average Precision of 86.22%.
... In the UAVDT dataset, 75.0%, 15.0%, and 10.0% of the UAV images were used for training, testing, and validation, respectively. AFO [69]: This dataset, intended for deep learning, was extracted from over 50 UAV videos with resolutions ranging from 1280 × 720 to 3840 × 2160; it includes 3647 images with 39,991 manually labeled persons and objects floating on the water, many of which are small-scale objects that are difficult to detect. In this dataset, 76.6%, 14.1%, and 9.3% of the images were used for training, testing, and validation, respectively. ...
Article
Full-text available
The extensive application of unmanned aerial vehicle (UAV) technology has increased academic interest in object detection algorithms for UAV images. Nevertheless, these algorithms present issues such as low accuracy, inadequate stability, and insufficient pre-training model utilization. Therefore, a high-quality object detection method based on a performance-improved object detection baseline and pretraining algorithm is proposed. To fully extract global and local feature information, a hybrid backbone based on the combination of convolutional neural network (CNN) and vision transformer (ViT) is constructed using an excellent object detection method as the baseline network for feature extraction. This backbone is then combined with a more stable and generalizable optimizer to obtain high-quality object detection results. Because the domain gap between natural and UAV aerial photography scenes hinders the application of mainstream pre-training models to downstream UAV image object detection tasks, this study applies the masked image modeling (MIM) method to aerospace remote sensing datasets with a lower volume than mainstream natural scene datasets to produce a pre-training model for the proposed method and further improve UAV image object detection accuracy. Experimental results for two UAV imagery datasets show that the proposed method achieves better object detection performance compared to state-of-the-art (SOTA) methods with fewer pre-training datasets and parameters.
... Artificial intelligence and machine learning (ML) techniques [10,11] such as DL provide fresh opportunities to discover biomarkers for the diagnosis of ASD, taking into account factors like age and gender that affect ASD, to shorten the diagnostic process of ASD, to avoid subjective opinions of different doctors, and possibly reach a definitive diagnosis [12][13][14]. DL techniques have found extensive application in recent years in medical and neurological fields such as seizure detection [15], seizure prediction [16][17][18], epilepsy diagnosis and classification [19,20], autism [21][22][23], optimization of neuroprosthetic vision [24], post-stroke rehabilitation with motor imagery [25], sentiment analysis [26], emotion recognition [27,28], patient-specific quality assurance [29], classification of the intracranial electrocorticogram [30], brain-computer interface (BCI) for discriminating hand motion planning [31], and many other fields such as mobile robots [32], drone-based water rescue and surveillance [33], and structural health monitoring [34][35][36]. ...
Preprint
Full-text available
The fact that a rapid and definitive diagnosis of autism cannot be made today, and that autism cannot be treated, provides an impetus to look into novel technological solutions. To contribute to the resolution of this problem through multiple classifications considering age and gender factors, in this study two quadruple and one octal classifications were performed using a deep learning (DL) approach. Gender was considered in one of the quadruple classifications and age groups in the other. In the octal classification, classes were created considering both gender and age groups. In addition to the diagnosis of ASD (Autism Spectrum Disorders), another goal of this study is to determine the contribution of gender and age factors to the diagnosis of ASD by making multiple classifications based on age and gender for the first time. Brain structural MRI (sMRI) scans of participants with ASD and TD (Typical Development) were pre-processed in a system originally designed for this purpose. Using the Canny Edge Detection (CED) algorithm, the sMRI image data were cropped in the data pre-processing stage, and the dataset was enlarged fivefold with data augmentation (DA) techniques. The most suitable convolutional neural network (CNN) models were developed using the grid search optimization (GSO) algorithm. The proposed DL prediction system was tested with the five-fold cross-validation technique. The accuracy rates acquired for all three CNN models designed for the system were compared with successful pre-trained CNN models through transfer learning (TL). As a result, it was revealed that age and gender factors were effective in the diagnosis of ASD with the system developed for ASD multiple classifications, and higher accuracy rates were achieved than with pre-trained models.
... Topographic, climatic, and anthropic factors could be identified using deep learning, specifically convolutional neural networks (CNNs). These networks have the ability to recognize and classify images through specialized hidden layers with a hierarchy of extraction from simple to more complex patterns [18][19][20][21]. ...
Article
Full-text available
The destructive power of a landslide can seriously affect human beings and infrastructures. The prediction of this phenomenon is of great interest; however, it is a complex task in which traditional methods have limitations. In recent years, Artificial Intelligence has emerged as a successful alternative in the geological field. Most of the related works use classical machine learning algorithms to correlate the variables of the phenomenon and its occurrence. This requires large quantitative landslide datasets, collected and labeled manually, which is costly in terms of time and effort. In this work, we create an image dataset using an official landslide inventory, which we verified and updated based on journalistic information and interpretation of satellite images of the study area. The images cover the landslide crowns and the actual triggering values of the conditioning factors at the detail level (5 × 5 pixels). Our approach focuses on the specific location where the landslide starts and its proximity, unlike other works that consider the entire landslide area as the occurrence of the phenomenon. These images correspond to geological, geomorphological, hydrological and anthropological variables, which are stacked in a similar way to the channels of a conventional image to feed and train a convolutional neural network. Therefore, we improve the quality of the data and the representation of the phenomenon to obtain a more robust, reliable and accurate prediction model. The results indicate an average accuracy of 97.48%, which allows the generation of a landslide susceptibility map on the Aloag-Santo Domingo highway in Ecuador. This tool is useful for risk prevention and management in this area where small, medium and large landslides occur frequently.
... The dataset used in this experiment is the aerial-drone floating objects (AFO) dataset proposed by [30] specifically for the detection of floating objects; it contains six object categories: human, surfboard, boat, buoy, sailboat, and kayak. The dataset contains 3647 images, mostly large-size UAV images such as 3840 × 2160, with more than 60,000 annotated objects. ...
Preprint
Full-text available
Maritime search and rescue is a crucial component of the national emergency response system, and it currently relies mainly on Unmanned Aerial Vehicles (UAVs) to detect objects. Most traditional object detection methods focus on boosting detection accuracy while neglecting the detection speed of the heavy model. However, it is also essential to improve the detection speed to provide timely maritime search and rescue. To address these issues, we propose a lightweight object detector named Shuffle-GhostNet-based detector (SG-Det). First, we construct a lightweight backbone, named Shuffle-GhostNet, which enhances the information flow between channel groups by redesigning the correlation group convolution and introducing the channel shuffle operation. Second, we propose an improved feature pyramid model, namely BiFPN-tiny, which has a lighter structure while being capable of reinforcing small object features. Furthermore, we incorporate the atrous spatial pyramid pooling (ASPP) module into the network, which employs atrous convolution with different sampling rates to obtain multi-scale information. Finally, we generate three sets of bounding boxes at different scales – large, medium, and small – to detect objects of different sizes. Compared with other lightweight detectors, SG-Det achieves better tradeoffs across performance metrics and enables real-time detection with an accuracy rate of over 90% for maritime objects, which shows that it can better meet the actual requirements of maritime search and rescue.
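The channel shuffle operation referenced above has a well-known tensor formulation (as in ShuffleNet); a PyTorch sketch, not SG-Det's exact module:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that subsequent grouped
    convolutions can mix information between the channel groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # (n, g, c/g, h, w) -> swap group and channel axes -> flatten back.
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0] -- the two groups interleaved
```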
... It also implies discovering goals compatible with the functions assigned to the robot independently of the domain it finds itself in, as well as managing all the knowledge that is acquired in the process. This knowledge includes motivational knowledge, perceptual classification knowledge, modeling knowledge, or skill-related knowledge, among others. Unlike more standard AI and machine learning approaches [7][8][9][10] that usually concentrate on one skill and are generally based on off-line learning, this knowledge must be learnt through on-line interaction in continuous, uncertain, and dynamic domains. These types of systems must also contemplate the capability of transferring knowledge [11] in an orderly and efficient manner. Consequently, an internal management structure in the form of some type of cognitive architecture must be established to provide for: a) open-ended learning, in this case providing components concerned with motivational and knowledge modelling aspects, and b) lifelong learning, creating components devoted to contextual storage and knowledge transfer and reuse aspects. ...
Article
Full-text available
Achieving Lifelong Open-ended Learning Autonomy (LOLA) is a key challenge in the field of robotics to advance to a new level of intelligent response. Robots should be capable of discovering goals and learn skills in specific domains that permit achieving the general objectives the designer establishes for them. In addition, robots should reuse previously learnt knowledge in different domains to facilitate learning and adaptation in new ones. To this end, cognitive architectures have arisen which encompass different components to support LOLA. A key feature of these architectures is to implement a proper balance between deliberative and reactive processes that allows for efficient real time operation and knowledge acquisition, but this is still an open issue. First, objectives must be defined in a domain-independent representation that allows for the autonomous determination of domain-dependent goals. Second, as no explicit reward function is available, a method to determine expected utility must also be developed. Finally, policy learning may happen in an internal deliberative scale (dreaming), so it is necessary to provide an efficient way to infer relevant and reliable data for dreaming to be meaningful. The first two aspects have already been addressed in the realm of the e-MDB cognitive architecture. For the third one, this work proposes Perceptual Classes (P-nodes) as a metacognitive structure that permits generating relevant “dreamt” data points that allow creating “imagined” trajectories for deliberative policy learning in a very efficient way. The proposed structure has been tested by means of an experiment with a real robot in LOLA settings, where it has been shown how policy dreaming is possible in such a challenging realm.
... The collected images have been captured either from satellites or by drones. We make use of available published datasets [17], [18], [20]- [22], [28]- [30]. The collected images are all processed. ...
Conference Paper
Full-text available
Marine object detection and tracking is an important application for several disciplines, such as sea surface monitoring, marine area management, ship collision avoidance, and search and rescue missions. Top-view scenes based on aerial or satellite imaging allow capturing objects from new angles of view or at locations that are not covered by capture nodes fixed at the port side or mounted on moving boats. Moreover, artificial intelligence techniques based on deep learning provide robust solutions for classification and detection. Convolutional neural network (CNN) architectures are being used to detect multiple objects in images and videos. The achieved performance proves the relevance of CNNs in circumventing existing computer vision challenges. In this paper, we investigate the state-of-the-art CNN-based technique You Only Look Once (YOLO) to detect marine objects in top-view images showing sea ships and humans. The available YOLO models are trained using our collected dataset. The evaluation of the trained models illustrates the effectiveness of YOLO in detecting the targeted classes (humans and sea ships) with high precision (90%). The deployment of the trained model on embedded edge devices achieves a high inference performance of over 80 frames per second.
... It allows the user to see the infrared spectrum, which is invisible to the naked eye. Hence, it is widely used not only during daylight but especially at night or in difficult weather conditions [2][3]. In this section, an overview of the influential works related to the processing of thermal images and to analysis and detection in the infrared spectrum is presented and discussed. ...
... The approach, which is based mostly on the unique aerial-drone flying object collection, achieves roughly 82 percent accuracy and outperforms all state-of-the-art deep learning algorithms, including YOLOv4, Faster R-CNN, SSD300, YOLOv3, and RetinaNet. The database is open to the public over the Internet [16]. ...
Article
Full-text available
During the recent decade, emerging technologies and dramatic new uses for drones were devised and accomplished, including rescue operations, monitoring, vehicle tracking, forest fire monitoring, and environmental monitoring, among others. Wildfires are one of the most significant environmental threats to wild areas and forest management. Traditional firefighting methods, which rely on ground operation inspections, have major limits and may threaten firefighters’ lives. As a result, remote sensing techniques, particularly UAV-based remotely sensed techniques, are currently among the most sought-after wildfire-fighting approaches. Current improvements in drone technology have resulted in significant breakthroughs that allow drones to perform a wide range of more sophisticated jobs. Rescue operations and forest monitoring, for example, demand extensive camera surveillance, making the drone a perfect tool for executing intricate responsibilities. Meanwhile, the growing adoption of deep learning techniques in computer vision offers an interesting perspective on the project’s objective: they were used to identify forest fires in their beginning stages, before they get out of control. This research describes a methodology for recognizing the presence of humans in a forest setting utilizing a deep learning framework and a human object detection method. The goal of identifying human presence in forestry areas is to prevent illicit forestry operations, such as illegal access to forbidden areas and illegal logging. In recent years, there has been a lot of interest in automated wildfire identification utilizing UAV-based visual information and various deep learning techniques. This study focused on detecting wildfires at their beginning stages in forest and wilderness areas, utilizing deep learning-based computer vision algorithms to control and then mitigate massive damage to human life and forest management.
... Topographic, climatic, and anthropic factors could be identified using deep learning, specifically convolutional neural networks (CNNs). These networks have the ability to recognize and classify images through specialized hidden layers with a hierarchy of extraction from simple to more complex patterns [18][19][20][21]. ...
Chapter
Full-text available
Predicting landslides is a task of vital importance to prevent disasters, avoid human damage and reduce economic losses. Several research works have determined the suitability of Machine Learning techniques to address this problem. In the present study, we leverage a neural network model for landslide prediction developed in our previous work, in order to identify the specific areas where landslides are most likely to occur. We have created a dataset that collects an inventory of landslides and geological, geomorphological and meteorological conditioning factors of a region susceptible to this type of events. Among these variables, precipitation is widely recognized as a trigger of the phenomenon. In contrast to related works, we considered precipitation in a cumulative form with different time windows. The application of our model produces probability values which can be represented as multi-temporal landslide susceptibility maps. The distribution of the values in the different susceptibility classes is performed by means of equal intervals, quantile, and Jenks methods, whose comparison allowed us to select the most appropriate map for each cumulative precipitation. In this way, the areas of maximum risk are identified, as well as the specific locations with the highest probability of landslides. These products are valuable tools for risk management and prevention. Keywords: Landslide, Machine Learning, Susceptibility Map, Support Vector Machine, Random Forest, Multi-layer Perceptron
... We also tested MCGR for another aerial dataset by Gąsienica-Józkowy et al. called the Aerial dataset of Floating Objects (AFO), where the dataset contains tiny objects with six classes (Gąsienica-Józkowy et al., 2021). We used the pre-trained MCGR weights of the 5-class network and performed transfer learning to train on the AFO dataset for an additional 50 epochs. ...
Article
For the past two decades, there have been significant efforts to develop methods for object detection in Remote Sensing (RS) images. In most cases, the datasets for small object detection in remote sensing images are inadequate. Many researchers used scene classification datasets for object detection, which has its limitations; for example, the large-sized objects outnumber the small objects in object categories. Thus, they lack diversity, which further affects the detection performance of small object detectors in RS images. This paper reviews current datasets and deep learning-based object detection methods for remote sensing images. We also propose a large-scale, publicly available benchmark Remote Sensing Super-resolution Object Detection (RSSOD) dataset. The RSSOD dataset consists of 1,759 hand-annotated images with 22,091 instances of very high-resolution (VHR) images with a spatial resolution of ∼0.05 m. There are five classes with varying frequencies of labels per class; the images are annotated in You Only Look Once (YOLO) and Common Objects in Context (COCO) formats. The image patches are extracted from satellite images, including real image distortions such as tangential scale distortion and skew distortion. The proposed RSSOD dataset will help researchers benchmark state-of-the-art object detection methods across various classes, especially for small objects, using image super-resolution. We also propose a novel Multi-class Cyclic super-resolution Generative adversarial network with Residual feature aggregation (MCGR) and an auxiliary YOLOv5 detector to benchmark image super-resolution-based object detection and compare with the existing state-of-the-art methods based on image super-resolution (SR). The proposed MCGR achieved state-of-the-art performance for image SR with an improvement of 1.2 dB in peak signal-to-noise ratio (PSNR) compared to the current state-of-the-art non-local sparse network (NLSN). MCGR achieved the best object detection mean average precisions (mAPs) of 0.758, 0.881, 0.841, and 0.983 for the five-class, four-class, two-class, and single-class settings, respectively, surpassing the performance of the state-of-the-art object detectors YOLOv5, EfficientDet, Faster RCNN, SSD, and RetinaNet.
... In the past, the research community in artificial intelligence (AI) has actively engaged with the health and care sector and produced relevant neural, fuzzy and evolutionary classifiers for early diagnosis and prognosis [7][8][9][10][11]. Given the recent exceptional results obtained with Deep Learning (DL) for image processing, see e.g., [12][13][14][15], the most logical choice for designing early detection systems based on X-ray images is to employ this AI paradigm. In terms of implementation, the most successful DL architectures seem to be those based on Convolutional Neural Networks (CNNs), which have already been adopted in recent studies on COVID-19 detection [4][16][17][18][19]. ...
Article
Full-text available
This article proposes a framework that automatically designs classifiers for the early detection of COVID-19 from chest X-ray images. To do this, our approach repeatedly makes use of a heuristic for optimisation to efficiently find the best combination of the hyperparameters of a convolutional deep learning model. The framework starts by optimising a basic convolutional neural network, which represents the starting point for the evolution process. Subsequently, at most two additional convolutional layers are added, at a time, to the previous convolutional structure as a result of a further optimisation phase. Each performed phase maximises the accuracy of the system, thus requiring training and assessment of the new model, which gets gradually deeper, with relevant COVID-19 chest X-ray images. This iterative process ends when no improvement, in terms of accuracy, is recorded. Hence, the proposed method evolves the most performant network with the minimum number of convolutional layers. In this light, we simultaneously achieve high accuracy while minimising the presence of redundant layers to guarantee a fast but reliable model. Our results show that the proposed implementation of such a framework achieves accuracy up to 99.11%, thus being particularly suitable for the early detection of COVID-19.
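The evolution loop reduces to a simple greedy deepening procedure; `build_model` and `evaluate` below are hypothetical placeholders for the paper's hyperparameter-optimised training phase and its accuracy assessment:

```python
def evolve_architecture(build_model, evaluate, max_extra_blocks=10):
    """Greedily deepen a CNN while validation accuracy keeps improving.

    build_model(n) -> model trained with n additional convolutional blocks
    evaluate(model) -> validation accuracy in [0, 1]
    Both callables stand in for the per-phase optimisation of the paper.
    """
    n_blocks = 0
    best_model = build_model(n_blocks)
    best_acc = evaluate(best_model)
    while n_blocks < max_extra_blocks:
        candidate = build_model(n_blocks + 1)  # adds at most two conv layers
        acc = evaluate(candidate)
        if acc <= best_acc:                    # no improvement: stop evolving
            break
        best_model, best_acc, n_blocks = candidate, acc, n_blocks + 1
    return best_model, best_acc
```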
Article
Learning to recover clear images from images having a combination of degrading factors is a challenging task. That being said, autonomous surveillance in low-visibility conditions caused by high pollution/smoke, a poor air quality index, low light, atmospheric scattering, or haze during a blizzard becomes even more important to prevent accidents. It is thus crucial to form a solution that can not only produce a high-quality image but is also efficient enough to be deployed for everyday use. However, the lack of proper datasets available to tackle this task limits the performance of previously proposed methods. To this end, we generate the LowVis-AFO dataset, containing 3647 paired dark-hazy and clear images. We also introduce a new lightweight deep learning model called Low-Visibility Restoration Network (LVRNet). It outperforms previous image restoration methods with low latency, achieving a PSNR value of 25.744 and an SSIM of 0.905, hence making our approach scalable and ready for practical use.
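For context, the quoted PSNR figure follows the standard definition, which can be computed as below (this is the usual formula, not code from the paper):

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB for float images in [0, max_val]."""
    err = np.asarray(reference, float) - np.asarray(restored, float)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# A PSNR of 25.744 dB corresponds to an RMSE of roughly
# 10 ** (-25.744 / 20) ~= 0.052 on a [0, 1] intensity scale.
```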
Article
The accuracy and reliability requirements in aerospace manufacturing processes are some of the most demanding in industry. One of the first steps is detection and precise measurement using artificial vision models to accurately process the part. However, these systems require complex adjustments and do not work correctly in uncontrolled scenarios, but require manual supervision, which reduces the autonomy of automated machinery. To solve these problems, this paper proposes a convolutional neural network for the detection and measurement of drills and other fixation elements in an uncontrolled industrial manufacturing environment. In addition, a fine-tuning algorithm is applied to the results obtained from the network, and a new metric is defined to evaluate the quality of detection. The efficiency and robustness of the proposed method were verified in a real production environment, with 99.7% precision, 97.6% recall and an overall quality factor of 96.0%. The reduction in operator intervention went from 13.3% to 0.6%. The presented work will allow the competitiveness of aircraft component manufacturing processes to increase, and working environments will be safer and more efficient.
Article
A Generative Adversarial Network (GAN) can learn the relationship between two image domains and achieve unpaired image-to-image translation. One of the breakthroughs was Cycle-consistent Generative Adversarial Networks (CycleGAN), which is a popular method to transfer the content representations from the source domain to the target domain. Existing studies have gradually improved the performance of CycleGAN models by modifying the network structure or loss function of CycleGAN. However, these methods tend to suffer from training instability and the generators lack the ability to acquire the most discriminating features between the source and target domains, thus making the generated images of low fidelity and few texture details. To overcome these issues, this paper proposes a new method that combines Evolutionary Algorithms (EAs) and Attention Mechanisms to train GANs. Specifically, from an initial CycleGAN, binary vectors indicating the activation of the weights of the generators are progressively improved upon by means of an EA. At the end of this process, the best-performing configurations of generators can be retained for image generation. In addition, to address the issues of low fidelity and lack of texture details on generated images, we make use of the channel attention mechanism. The latter component allows the candidate generators to learn important features of real images and thus generate images with higher quality. The experiments demonstrate qualitatively and quantitatively that the proposed method, namely, Attention evolutionary GAN (AevoGAN) alleviates the training instability problems of CycleGAN training. In the test results, the proposed method can generate higher quality images and obtain better results than the CycleGAN training methods present in the literature, in terms of Inception Score (IS), Fréchet Inception Distance (FID) and Kernel Inception Distance (KID).
Article
Video feeds from traffic cameras can be useful for many purposes, the most critical of which are related to monitoring road safety. Vehicle trajectory is a key element in dangerous behavior and traffic accidents. In this respect, it is crucial to detect those anomalous vehicle trajectories, that is, trajectories that depart from usual paths. In this work, a model is proposed to automatically address that by using video sequences from traffic cameras. The proposal detects vehicles frame by frame, tracks their trajectories across frames, estimates velocity vectors, and compares them to velocity vectors from other spatially adjacent trajectories. From the comparison of velocity vectors, trajectories that are very different (anomalous) from neighboring trajectories can be detected. In practical terms, this strategy can detect vehicles in wrong-way trajectories. Some components of the model are off-the-shelf, such as the detection provided by recent deep learning approaches; however, several different options are considered and analyzed for vehicle tracking. The performance of the system has been tested with a wide range of real and synthetic traffic videos.
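The velocity-comparison strategy admits a compact sketch: each track's velocity is compared with the mean velocity of spatially adjacent tracks, and a strong angular disagreement (e.g. a wrong-way vehicle) is flagged. The radius, threshold, and names are illustrative; the detection and tracking components are off-the-shelf and not shown.

```python
import numpy as np

def flag_wrong_way(positions, velocities, radius=50.0, cos_thr=-0.5):
    """Flag tracks whose velocity opposes the local traffic flow.

    positions:  (N, 2) current positions of N tracked vehicles.
    velocities: (N, 2) velocity vectors estimated from their trajectories.
    Returns a boolean array, True where a trajectory is anomalous.
    """
    anomalous = np.zeros(len(positions), dtype=bool)
    for i in range(len(positions)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        nearby = (d > 0) & (d < radius)        # spatially adjacent tracks
        if not nearby.any():
            continue
        ref = velocities[nearby].mean(axis=0)  # local flow direction
        denom = np.linalg.norm(ref) * np.linalg.norm(velocities[i])
        if denom > 1e-9 and (ref @ velocities[i]) / denom < cos_thr:
            anomalous[i] = True                # moving against the flow
    return anomalous
```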
Article
Structural health monitoring (SHM) plays an increasingly vital role in guaranteeing the operation success of marine and offshore (MO) structures while reducing their risks of structural failure. In this study, a real‐time MO‐SHM system based on the highly controllable underwater robots is developed, which is functionalized with three modules, that is, underwater monitoring robots, vision‐based image processing and analyzing, and time‐dependent damage assessing and early warning. The robotic module is actuated by the hybrid driving method that combines the combustion‐based actuators (ejection) and propeller thrusters (propulsion) for transient actuation ability and well reliability in complex underwater environments. The image processing and analyzing module is conducted based on the You Only Look Once (YOLO)‐Underwater model that is expanded by the transfer learning and attention mechanisms and the underwater preconditioning and warning correction added for the specific underwater concrete identification. The damage assessing and early warning module is achieved by the time‐dependent hybrid analytic hierarchy process method that is integrated with the ordered weighted averaging and entropy weight method for more objective assessment and feedback. The reported MO‐SHM system provides design guidance for the next‐generation multifunctional underwater devices that combine analyzing and early warning for underwater concrete structures.
Article
Cracks are common defects in slab tracks, which can grow and expand over time, leading to a deterioration of the mechanical properties of slab tracks and a shortened service life. Therefore, it is essential to accurately detect and repair cracks before they impact services. This study developed a systematic pixel-level crack segmentation–quantification method suited for nighttime detection of slab tracks. To be specific, slab track crack network II, a pixel-level segmentation network that aggregates multi-scale information, was proposed to extract the morphology of slab track cracks, and their widths were then calculated by an alternative quantification method proposed in the paper. The model performs best when the initial learning rate is 0.0001, with intersection over union (IoU) values of 84.94% and 83.84% observed on the training set and validation set, respectively. On the test set, the IoU value is 81.07%, higher than that derived from similar segmentation algorithms, indicating higher robustness and better generalization of the network architecture. In addition, the average errors in predicting crack widths resulting from the proposed method are 0.13 and 0.12 mm, compared to the results measured by a vernier caliper and a 3D scanner, respectively. The proposed pixel-level segmentation–quantification system provides a new method and theoretical support for slab track maintenance and repair.
Article
Scheduling is a frequently studied combinatorial optimisation problem that often needs to be solved under dynamic conditions and to optimise multiple criteria. The most commonly used methods for solving dynamic problems are dispatching rules (DRs), simple constructive heuristics that build the schedule incrementally. Since it is difficult to design DRs manually, they are often created automatically using genetic programming. Although such rules work well, their performance is still limited, and various methods, especially ensemble learning, are used to improve them. So far, ensembles have only been used in the context of single-objective scheduling problems. This study aims to investigate the possibility of constructing ensembles of DRs for solving multi-objective (MO) scheduling problems. To this end, an existing ensemble construction method called SEC is adapted by extending it with non-dominated sorting to construct Pareto fronts of ensembles for a given MO problem. In addition, the algorithms NSGA-II and NSGA-III were adapted to construct ensembles and compared with the SEC method to demonstrate their effectiveness. All methods were evaluated on four MO problems with different numbers of criteria to be optimised. The results show that ensembles of DRs achieve better Pareto fronts compared to individual DRs. Moreover, the results show that SEC achieves equally good or even slightly better results than NSGA-II and NSGA-III when constructing ensembles, while being simpler and slightly less computationally expensive. This shows the potential of using ensembles to increase the performance of individual DRs for MO problems.
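The non-dominated sorting grafted onto SEC is the standard Pareto-front decomposition; a minimal implementation (all criteria minimized, with an illustrative two-criteria example) might look like this:

```python
def non_dominated_sort(objs):
    """Split solutions into successive Pareto fronts (minimization).

    objs: list of tuples, one tuple of objective values per solution.
    Returns a list of fronts, each front a list of solution indices.
    """
    dominates = lambda a, b: all(x <= y for x, y in zip(a, b)) and a != b
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

# Two criteria, e.g. (total tardiness, makespan) of candidate ensembles:
print(non_dominated_sort([(1, 5), (2, 2), (3, 4), (5, 1), (4, 3)]))
# [[0, 1, 3], [2, 4]]
```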
Article
The dilemma between stability and plasticity is crucial in machine learning, especially when non-stationary input distributions are considered. This issue can be addressed by continual learning in order to alleviate catastrophic forgetting. This strategy has been previously proposed for supervised and reinforcement learning models. However, little attention has been devoted to unsupervised learning. This work presents a dynamic learning rate framework for unsupervised neural networks that can handle non-stationary distributions. In order for the model to adapt to the input as it changes its characteristics, a varying learning rate that does not merely depend on the training step but on the reconstruction error has been proposed. In the experiments, different configurations for classical competitive neural networks, self-organizing maps and growing neural gas with either per-neuron or per-network dynamic learning rate have been tested. Experimental results on document clustering tasks demonstrate the suitability of the proposal for real-world problems.
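The core proposal, a learning rate driven by the current reconstruction error rather than by the training step, can be sketched for a classical competitive network; the tanh shaping and all constants are illustrative choices, not the paper's exact rule:

```python
import numpy as np

def train_competitive(data, n_units=10, eta_max=0.5, epochs=20, seed=0):
    """Competitive learning with an error-dependent learning rate: the
    winning unit moves farther when it reconstructs its input poorly."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), n_units, replace=False)].copy()
    scale = np.mean(np.linalg.norm(data - data.mean(0), axis=1)) + 1e-9
    for _ in range(epochs):
        for x in rng.permutation(data):
            j = np.argmin(np.linalg.norm(w - x, axis=1))  # best matching unit
            err = np.linalg.norm(x - w[j])                # reconstruction error
            eta = eta_max * np.tanh(err / scale)          # error-driven rate
            w[j] += eta * (x - w[j])
    return w
```

A non-stationary stream then raises the error, and with it the plasticity, exactly where the input distribution has drifted.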
Article
Regular detection of defects in drainage pipelines is crucial. However, some problems associated with pipeline defect detection, such as data scarcity and the difficulty of counting defects, need to be addressed. Therefore, a Transformer-optimized generation, detection, and counting method for drainage-pipeline defects was established in this paper. First, a generation network called Trans-GAN-Cla was developed for data augmentation. A classification network was trained to improve the quality of the generated images. Second, a detection and tracking model called Trans-Det-Tra was developed to track and count the number of defects. Third, the feature extraction capability of the proposed method was improved by leveraging Transformers. Compared with some well-known convolutional neural network-based methods, the proposed network achieved the best classification and detection accuracies of 87.2% and 87.57%, respectively. Furthermore, the F1 scores were 87.7% and 91.9%. Finally, two onsite videos were analyzed, and the numbers of misalignments and obstacles were accurately counted. The results indicate that the established Transformer-optimized method can generate high-quality images and realize high-accuracy detection and counting of drainage pipeline defects.
Article
Full-text available
In the last years, the number of machine learning algorithms and their parameters has increased significantly. On the one hand, this increases the chances of finding better models. On the other hand, it increases the complexity of the task of training a model, as the search space expands significantly. As the size of datasets also grows, traditional approaches based on extensive search start to become prohibitively expensive in terms of computational resources and time, especially in data streaming scenarios. This paper describes an approach based on meta-learning that tackles two main challenges. The first is to predict key performance indicators of machine learning models. The second is to recommend the best algorithm/configuration for training a model for a given machine learning problem. When compared to a state-of-the-art method (AutoML), the proposed approach is up to 130x faster and only 4% worse in terms of average model quality. Hence, it is especially suited for scenarios in which models need to be updated regularly, such as in streaming scenarios with big data, in which some accuracy can be traded for a much shorter model training time.
Conference Paper
Full-text available
Boats and ships have always been used throughout history as one of the main types of transportation. In recent years, due to the fast evolution of deep learning techniques and the online datasets available, convolutional neural networks (CNNs) have been widely used for ship and boat detection applications, such as surveillance of marine resources, helping in maritime rescue, and monitoring illegal marine activities, among others. In this paper, we present a robust and efficient CNN based on the state-of-the-art YOLO model to perform detection of boats and other water vehicles. The training dataset was built considering boats of different sizes, located on the coast and at sea, and captured by drones and satellites. We also applied data augmentation techniques such as flipping, cropping and changing brightness to increase the number of samples and improve the model's robustness. A case study is presented considering multiple Unmanned Aerial Vehicles (UAVs) detecting boats in a Coral Reefs Environmental Protection Area (APARC), where human activity is limited. We evaluated the developed system on a testing dataset with images of the case study, achieving a recognition rate of 87.2% and a mean average precision of 97.23%.
Article
Identifying photovoltaic (PV) parameters accurately and reliably can be conducive to the effective use of solar energy. The grey wolf optimizer (GWO) that was proposed recently is an effective nature-inspired method and has become an effective way to solve PV parameter identification. However, determining PV parameters is typically regarded as a multimodal optimization, which is a challenging optimization problem; thus, the original GWO still has the problem of insufficient accuracy and reliability when identifying PV parameters. In this study, an enhanced grey wolf optimizer with fusion strategies (EGWOFS) is proposed to overcome these shortcomings. First, a modified multiple learning backtracking search algorithm (MMLBSA) is designed to ameliorate the global exploration potential of the original GWO. Second, a dynamic spiral updating position strategy (DSUPS) is constructed to promote the performance of local exploitation. Finally, the proposed EGWOFS is verified by two groups of test data, which include three types of PV test models and experimental data extracted from the manufacturer’s data sheet. Experiments show that the overall performance of the proposed EGWOFS achieves competitive or better results in terms of accuracy and reliability for most test models.
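For context, the core position update of the original GWO, which EGWOFS builds on, is shown below; the fusion strategies (MMLBSA, DSUPS) are not reproduced, and all constants follow the original algorithm:

```python
import numpy as np

def gwo_minimize(f, bounds, n_wolves=20, iters=100, seed=0):
    """Original grey wolf optimizer core loop (not the EGWOFS variant)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    X = rng.uniform(lo, hi, (n_wolves, len(lo)))
    for t in range(iters):
        fit = np.apply_along_axis(f, 1, X)
        alpha, beta, delta = X[np.argsort(fit)[:3]]  # three best wolves
        a = 2.0 - 2.0 * t / iters                    # linearly decreasing
        for i in range(n_wolves):
            parts = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(len(lo)), rng.random(len(lo))
                A, C = 2 * a * r1 - a, 2 * r2
                parts.append(leader - A * np.abs(C * leader - X[i]))
            X[i] = np.clip(np.mean(parts, axis=0), lo, hi)
    return X[np.argmin(np.apply_along_axis(f, 1, X))]

# e.g. identifying two parameters of a quadratic model:
print(gwo_minimize(lambda v: np.sum(v ** 2), ([-5, -5], [5, 5])).round(3))
```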
Preprint
Full-text available
Earth observation, aiming at monitoring the state of planet Earth using remote sensing data, is critical for improving our daily lives and living environment. With an increasing number of satellites in orbit, more and more datasets with diverse sensors and research domains are published to facilitate the research of the remote sensing community. In this paper, for the first time, we present a comprehensive review of more than 400 publicly published datasets, including applications like land use/cover, change/disaster monitoring, scene understanding, agriculture, climate change and weather forecasting. We systematically analyze these Earth observation datasets from five aspects, including the volume, bibliometric analysis, research domains and the correlation between datasets. Based on the dataset attributes, we propose to measure, rank and select datasets to build a new benchmark for model evaluation. Furthermore, a new platform for Earth observation, termed EarthNets, is released towards a fair and consistent evaluation of deep learning methods on remote sensing data. EarthNets supports standard dataset libraries and cutting-edge deep learning models to bridge the gap between the remote sensing and machine learning communities. Based on the EarthNets platform, extensive deep learning methods are evaluated on the new benchmark. The insightful results are beneficial to future research. The platform and dataset collections are publicly available at https://earthnets.nicepage.io.
Article
Full-text available
Training a model using batch learning requires uniform data storage in a repository. This approach is intrusive, as users have to expose their privacy and exchange sensitive data by sending them to central entities to be preprocessed. Unlike this centralized approach, training of intelligent models via the federated learning (FEDL) mechanism can be carried out using decentralized data. This process ensures that privacy and the protection of sensitive information can be managed by a user or an organization, employing a single universal model for all users. This model should apply average aggregation methods to the set of cooperative training data. This raises serious concerns about the effectiveness of this universal approach and, therefore, about the validity of FEDL architectures in general, as it flattens the unique needs of individual users without considering the local events to be managed. This paper proposes an innovative hybrid explainable semi-personalized federated learning model that utilizes Shapley Values and Lipschitz Constant techniques in order to create personalized intelligent models. It is based on the needs and events that each individual user is required to address in a federated format. Explanations are the set of characteristics of the interpretable system that, for a given instance, contributed to a conclusion and describe the model's behavior at both local and global levels. Retraining is suggested only for those features for which the degree of change is considered important enough for the evolution of the model's functionality.
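The "average aggregation" the authors question is, in its plainest form, federated averaging, where the server combines client parameters weighted by local sample counts; a minimal sketch (the structure of the parameter lists is an assumption for illustration):

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """Federated averaging: weight each client's parameter arrays by data share.

    client_params: list of clients, each a list of per-layer NumPy arrays.
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    shares = [n / total for n in client_sizes]
    n_layers = len(client_params[0])
    return [
        sum(share * params[k] for share, params in zip(shares, client_params))
        for k in range(n_layers)
    ]

# Two clients with a single 2x2 layer; the larger client dominates the average.
w = fed_avg([[np.ones((2, 2))], [np.zeros((2, 2))]], client_sizes=[300, 100])
print(w[0])  # 0.75 everywhere
```

The semi-personalized scheme proposed here departs from this uniform averaging by retraining only those features whose attributed importance has changed significantly.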
Article
Automated machine learning (AutoML) supports ML engineers and data scientists by automating single tasks like model selection and hyperparameter optimization, or by automatically generating entire ML pipelines. This article presents a survey of 20 state-of-the-art AutoML solutions, both open source and commercial. There is a wide range of functionalities, targeted user groups, support for ML libraries, and degrees of maturity. Depending on the AutoML solution, a user may be locked into one specific ML library technology or one product ecosystem. Additionally, the user might require some expertise in data science and programming for using the AutoML solution. We propose a concept called OMA-ML (Ontology-based Meta AutoML) that combines the features of existing AutoML solutions by integrating them (Meta AutoML). OMA-ML can incorporate any AutoML solution, allowing various user groups to generate ML pipelines with the ML library of choice. An ontology is the information backbone of OMA-ML. OMA-ML is being implemented as an open source solution, with 7 third-party AutoML solutions currently integrated.
Article
Although previous research laid the foundation for vision‐based monitoring systems using convolutional neural networks (CNNs), too little attention has been paid to the challenges associated with data imbalance and varying object sizes in far‐field monitoring. To fill the knowledge gap, this paper investigates various loss functions to design a customized loss function to address the challenges. Scaffold installation operations recorded by camcorders were selected as the subject of analysis in a far‐field surveillance setting. It was confirmed that the data imbalance between the workers, hardhats, harnesses, straps, and hooks caused poor performances especially for small size objects. This problem was mitigated by employing a region‐based loss and Focal loss terms in the loss function of segmentation models. The findings illustrate the importance of the loss function design in improving performance of CNN models for far‐field construction site monitoring.
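Focal loss, one of the terms adopted in the customized loss described above, down-weights well-classified examples so that rare, small objects contribute more to the gradient; a minimal binary sketch (parameter defaults follow common usage, not this paper):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probabilities of the positive class; y: labels in {0, 1}.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float((-(alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))).mean())
```

With gamma = 0 and alpha = 0.5 this reduces (up to a constant) to ordinary cross-entropy; increasing gamma progressively suppresses the easy-example terms.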
Article
The automation of medical image diagnosis is currently a challenging task. The use of Computer Aided Diagnosis (CAD) systems can be a powerful tool for clinicians, especially in situations when hospitals are overflowed. These tools are usually based on artificial intelligence (AI), a field that has recently been revolutionized by deep learning approaches. These alternatives usually obtain high performance from complex solutions, leading to a high computational cost and the need for large databases. In this work, we propose a classification framework based on sparse coding. Images are first partitioned into different tiles, and a dictionary is built after applying PCA to these tiles. The original signals are then transformed into a linear combination of the elements of the dictionary. They are subsequently reconstructed by iteratively deactivating the elements associated with each component. Classification is finally performed employing the resulting reconstruction errors as features. Performance is evaluated in a real context, distinguishing between four different pathologies: control versus bacterial pneumonia versus viral pneumonia versus COVID-19. Our system differentiates between pneumonia patients and controls with an accuracy of 97.74%, whereas in the 4-class context the accuracy is 86.73%. The excellent results and the pioneering use of sparse coding in this scenario evidence that our proposal can assist clinicians when their workload is high.
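A condensed sketch of the tile/PCA dictionary idea is given below; the paper's exact deactivation schedule is more involved, so this only illustrates how reconstruction errors can serve as classification features (all names are illustrative):

```python
import numpy as np

def pca_dictionary(tiles, n_atoms):
    """Build an orthonormal dictionary from image tiles (one tile per row) via PCA."""
    centered = tiles - tiles.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_atoms]                           # each row is one dictionary atom

def reconstruction_errors(signal, dictionary):
    """Error after deactivating each atom in turn; used as the feature vector."""
    codes = dictionary @ signal                   # coefficients of the signal
    errors = []
    for k in range(len(dictionary)):
        mask = np.ones(len(dictionary), dtype=bool)
        mask[k] = False                           # deactivate one dictionary element
        approx = dictionary[mask].T @ codes[mask]
        errors.append(np.linalg.norm(signal - approx))
    return np.array(errors)
```

Any standard classifier can then be trained on the resulting error vectors.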
Article
Full-text available
Mobile robots such as unmanned aerial vehicles (drones) can be used for surveillance, monitoring and data collection in buildings, infrastructure and environments. The importance of accurate and multifaceted monitoring is well known to identify problems early and prevent them escalating. This motivates the need for flexible, autonomous and powerful decision-making mobile robots. These systems need to be able to learn through fusing data from multiple sources. Until very recently, they have been task specific. In this paper, we describe a generic navigation algorithm that uses data from sensors on-board the drone to guide the drone to the site of the problem. In hazardous and safety-critical situations, locating problems accurately and rapidly is vital. We use the proximal policy optimisation deep reinforcement learning algorithm coupled with incremental curriculum learning and long short-term memory neural networks to implement our generic and adaptable navigation algorithm. We evaluate different configurations against a heuristic technique to demonstrate its accuracy and efficiency. Finally, we consider how safety of the drone could be assured by assessing how safely the drone would perform using our navigation algorithm in real-world scenarios.
Article
Full-text available
The detection of objects in very high-resolution (VHR) remote sensing images has become increasingly popular with the enhancement of remote sensing technologies. High-resolution images from aircraft or satellites contain highly detailed and mixed backgrounds that decrease the success of object detection in remote sensing images. In this study, a model that performs weighted ensemble object detection using optimized coefficients is proposed. This model uses the outputs of three different object detection models trained on the same dataset. The model's structure takes two or more object detection methods as its input and provides an output with an optimized coefficient-weighted ensemble. The Northwestern Polytechnical University Very High Resolution 10 (NWPU-VHR-10) and Remote Sensing Object Detection (RSOD) datasets were used to measure the object detection success of the proposed model. Our experiments reveal that the proposed model improved the Mean Average Precision (mAP) performance by 0.78%–16.5% compared to stand-alone models and presents better mean average precision than other state-of-the-art methods (3.55% higher on the NWPU-VHR-10 dataset and 1.49% higher when using the RSOD dataset).
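The central idea, combining per-model confidences with optimized voting weights, can be sketched in a few lines; the weights below are fixed for illustration, whereas the paper optimizes them, and matching candidate boxes across models (e.g., by IoU) is assumed to happen beforehand.

```python
import numpy as np

def fuse_scores(scores, weights):
    """Combine per-model confidences for one matched box with voting weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # normalize the voting weights
    return float(np.dot(weights, scores))

# Three detectors vote on the same candidate box; the fused score decides
# whether the ensemble accepts the detection.
print(fuse_scores([0.55, 0.80, 0.40], weights=[0.2, 0.5, 0.3]))  # -> 0.63
```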
Article
Full-text available
Gaining access to labeled reference data is one of the great challenges in supervised machine-learning endeavors. This is especially true for an automated analysis of remote sensing images on a global scale, which enables us to address global challenges, such as urbanization and climate change, using state-of-the-art machine-learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark data set, So2Sat LCZ42, which consists of local-climate-zone (LCZ) labels of approximately half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe.
Article
Full-text available
Data streaming classification has become an essential task in many fields where real-time decisions have to be made based on incoming information. Neural networks are a particularly suitable technique for the streaming scenario due to their incremental learning nature. However, the high computation cost of deep architectures limits their applicability to high-velocity streams, hence they have not yet been fully explored in the literature. Therefore, in this work, we aim to evaluate the effectiveness of complex deep neural networks for supervised classification in the streaming context. We propose an asynchronous deep learning framework in which training and testing are performed simultaneously in two different processes. The data stream entering the system is fed into both processes in order to concurrently provide quick predictions and update the deep learning model. This separation reduces processing time while obtaining high classification accuracy. Several time-series datasets from the UCR repository have been simulated as streams to evaluate our proposal, which has been compared to other methods such as Hoeffding trees, drift detectors, and ensemble models. The statistical analysis carried out verifies the improvement in performance achieved with our dual-pipeline deep learning framework, which is also competitive in terms of computation time.
Article
Full-text available
Crack information provides important evidence of structural degradation and safety in civil structures. Existing inspection methods are inefficient and difficult to rapidly deploy. A real‐time crack inspection method is proposed in this study to address this difficulty. Within this method, a wall‐climbing unmanned aerial system (UAS) is developed to acquire detailed crack images without distortion, then a wireless data transmission method is applied to fulfill real‐time detection requirements, allowing smartphones to receive real‐time video taken from the UAS. Next, an image data set including 1,330 crack images taken by the wall‐climbing UAS is established and used for training a deep‐learning model. For increasing detection speed, state‐of‐the‐art convolutional neural networks (CNNs) are compared and employed to train the crack detector; the selected model is transplanted into an android application so that the detection of cracks can be undertaken on a smartphone in real time. Following this, images with cracks are separated and crack width is calculated using an image processing method. The proposed method is then applied to a building where crack information is acquired and calculated accurately with high efficiency, thus verifying the practicability of the proposed method and system.
Article
Full-text available
Maritime search and rescue (SAR) decisions are the most important part of maritime SAR operations. In the process of making maritime SAR decisions, a key factor affecting efficiency and success rate is how to quickly respond to accidents and develop an emergency response plan. At present, maritime SAR emergency response plans are still mostly obtained through a combination of drift prediction models and SAR experience. There is a lack of SAR resource scheduling and SAR task assignment. The primary purpose of this paper is to explore the possibility of using an intelligent decision-making algorithm to formulate maritime SAR emergency response plans so as to produce results more scientifically. First, the relevant research areas and research data are briefly introduced, and the main mathematical models involved in the optimal search theory are expounded. Next, key technologies involved in the process of maritime SAR emergency response plan generation, including the determination of search area, the scheduling of SAR resources, the allocation of search tasks, and the planning of search routes, are analyzed in detail. Two optimization model algorithms, namely the SAR resource scheduling model based on genetic simulated annealing algorithm (GSAA) and the regional task allocation algorithm based on space-time characteristics, are proposed as approaches to solving the problem of resource scheduling and task allocation. Finally, the effectiveness and optimization of the proposed algorithms are verified by analyzing the emergency response of a real case which occurred in the Bohai Sea and comparing the different schemes. Through the algorithm proposed in this paper, the efficiency of maritime SAR operations can be effectively improved and the loss of life and property can be reduced.
Article
Full-text available
Object detection is one of the most important and challenging branches of computer vision and has been widely applied in people's lives, for purposes such as monitoring security and autonomous driving, with the aim of locating instances of semantic objects of a certain class. With the rapid development of deep learning algorithms for detection tasks, the performance of object detectors has greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards, and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.
Article
Full-text available
Unmanned aerial vehicles (UAVs) play a primary role in a plethora of technical and scientific fields owing to their wide range of applications. In particular, the provision of emergency services during the occurrence of a crisis event is a vital application domain where such aerial robots can contribute, sending out valuable assistance to both distressed humans and rescue teams. Bearing in mind that time constraints constitute a crucial parameter in search and rescue (SAR) missions, the punctual and precise detection of humans in peril is of paramount importance. The paper in hand deals with real-time human detection onboard a fully autonomous rescue UAV. Using deep learning techniques, the implemented embedded system was capable of detecting open water swimmers. This allowed the UAV to provide assistance accurately in a fully unsupervised manner, thus enhancing first responder operational capabilities. The novelty of the proposed system is the combination of global navigation satellite system (GNSS) techniques and computer vision algorithms for both precise human detection and rescue apparatus release. Details about hardware configuration as well as the system’s performance evaluation are fully discussed.
Article
Full-text available
In recent years, demand has been increasing for target detection and tracking from aerial imagery via drones using onboard powered sensors and devices. We propose a very effective method for this application based on a deep learning framework. A state-of-the-art embedded hardware system empowers small flying robots to carry out the real-time onboard computation necessary for object tracking. Two types of embedded modules were developed: one was designed using a Jetson TX or AGX Xavier, and the other was based on an Intel Neural Compute Stick. These are suitable for real-time onboard computing power on small flying drones with limited space. A comparative analysis of current state-of-the-art deep learning-based multi-object detection algorithms was carried out utilizing the designated GPU-based embedded computing modules to obtain detailed metric data about frame rates, as well as the computation power. We also introduce an effective target tracking approach for moving objects. The algorithm for tracking moving objects is based on the extension of simple online and real-time tracking. It was developed by integrating a deep learning-based association metric approach with simple online and real-time tracking (Deep SORT), which uses a hypothesis tracking methodology with Kalman filtering and a deep learning-based association metric. In addition, a guidance system that tracks the target position using a GPU-based algorithm is introduced. Finally, we demonstrate the effectiveness of the proposed algorithms by real-time experiments with a small multi-rotor drone.
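The association step at the core of SORT-style trackers can be sketched as an IoU cost matrix solved with the Hungarian algorithm; Deep SORT additionally blends in a deep appearance metric and Kalman-predicted track states, both omitted in this minimal sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, min_iou=0.3):
    """Match predicted track boxes to new detections by maximizing total IoU."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)      # Hungarian assignment
    # Keep only pairs whose overlap clears the gating threshold.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - min_iou]
```

Unmatched detections would spawn new tracks and unmatched tracks would age out, which a full tracker handles around this core.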
Article
Full-text available
Machine learning has played an essential role in the past decades, in lockstep with the main advances in computer technology. Given the massive amount of data generated daily, there is a need for even faster and more effective machine learning algorithms that can provide updated models for real-time applications and on-demand tools. This paper presents FEMa, a Finite Element Machine classifier for supervised learning problems, where each training sample is the center of a basis function, and the whole training set is modeled as a probabilistic manifold for classification purposes. FEMa has its theoretical basis in the Finite Element Method, which is widely used for numerical analysis in engineering problems. It is shown that FEMa is parameterless and has quadratic complexity for both the training and classification phases when basis functions are used that satisfy certain properties. The proposed classifier yields very competitive results when compared with some state-of-the-art supervised pattern recognition techniques.
Article
Full-text available
The Sentinel-2 satellite mission, developed by the European Space Agency (ESA) for the Copernicus program of the European Union, provides repetitive multi-spectral observations of all Earth land surfaces at a high resolution. The Level 2A product is a basic product requested by many Sentinel-2 users: it provides surface reflectance after atmospheric correction, with a cloud and cloud shadow mask. The cloud/shadow mask is a key element to enable an automatic processing of Sentinel-2 data, and therefore, its performances must be accurately validated. To validate the Sentinel-2 operational Level 2A cloud mask, a software program named Active Learning Cloud Detection (ALCD) was developed, to produce reference cloud masks. Active learning methods allow reducing the number of necessary training samples by iteratively selecting them where the confidence of the classifier is low in the previous iterations. The ALCD method was designed to minimize human operator time thanks to a manually-supervised active learning method. The trained classifier uses a combination of spectral and multi-temporal information as input features and produces fully-classified images. The ALCD method was validated using visual criteria, consistency checks, and compared to another manually-generated cloud masks, with an overall accuracy above 98%. ALCD was used to create 32 reference cloud masks, on 10 different sites, with different seasons and cloud cover types. These masks were used to validate the cloud and shadow masks produced by three Sentinel-2 Level 2A processors: MAJA, used by the French Space Agency (CNES) to deliver Level 2A products, Sen2Cor, used by the European Space Agency (ESA), and FMask, used by the United States Geological Survey (USGS). The results show that MAJA and FMask perform similarly, with an overall accuracy around 90% (91% for MAJA, 90% for FMask), while Sen2Cor’s overall accuracy is 84%. The reference cloud masks, as well as the ALCD software used to generate them are made available to the Sentinel-2 user community.
Article
Full-text available
Recording workers’ activities is an important, but burdensome, management task for site supervisors. The last decade has seen a growing trend toward vision‐based activity recognition. However, recognizing workers’ activities in far‐field surveillance videos is understudied. This study proposes a hierarchical statistical method for recognizing workers’ activities in far‐field surveillance videos. The method consists of two steps. First, a deep action recognition method was used to recognize workers’ actions, and a new fusion strategy was proposed to consider the characteristics of far‐field surveillance videos. The deep action recognition method with the new fusion strategy has achieved the comparable performance (0.84 average accuracy) on the far‐field surveillance data set in contrast to the original method on the public data sets. Second, a Bayesian nonparametric hidden semi‐Markov model was innovatively used to model and infer workers’ activities based on action sequences. The latent states of the fitted Bayesian model captured workers’ activities in terms of state duration distributions and state transition distributions, which are indispensable for understanding workers’ time allocation. It has been preliminarily illustrated that the activity information learned by the Bayesian model possesses the potential to implement objective work sampling, personal physical fatigue monitoring, trade‐level health risk assessment, and process‐based quality control. Also, the limitations of this study are discussed.
Article
Full-text available
We introduce a new large-scale dataset for the advancement of object detection techniques and overhead object detection research. This satellite imagery dataset enables research progress pertaining to four key computer vision frontiers. We utilize a novel process for geospatial category detection and bounding box annotation with three stages of quality control. Our data is collected from WorldView-3 satellites at 0.3m ground sample distance, providing higher resolution imagery than most public satellite imagery datasets. We compare xView to other object detection datasets in both natural and overhead imagery domains and then provide a baseline analysis using the Single Shot MultiBox Detector. xView is one of the largest and most diverse publicly available object-detection datasets to date, with over 1 million objects across 60 classes in over 1,400 km^2 of imagery.
Article
Full-text available
In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible, provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.
Conference Paper
Full-text available
The ensemble structure is a computational intelligence supervised strategy consisting of a pool of multiple operators that compete among each other for being selected, and an adaptation mechanism that tends to reward the most successful operators. In this paper we extend the idea of the ensemble to multiple local search logics. In a memetic fashion, the search structure of an ensemble framework cooperatively/competitively optimizes the problem jointly with a pool of diverse local search algorithms. In this way, the algorithm progressively adapts to a given problem and selects those search logics that appear to be the most appropriate to quickly detect high quality solutions. The resulting algorithm, namely Ensemble of Parameters and Strategies Differential Evolution empowered by Local Search (EPSDE-LS), is evaluated on multiple testbeds and dimensionality values. Numerical results show that the proposed EPSDE-LS robustly displays a very good performance in comparison with some of the state-of-the-art algorithms.
Conference Paper
Full-text available
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X, and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
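The default boxes SSD tiles over each feature map can be generated with a few lines; a simplified sketch for a single feature map (the full model varies the scale per map and adds an extra box for aspect ratio 1):

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Centers on an fmap_size x fmap_size grid, one box per aspect ratio.

    Returns boxes in normalized (cx, cy, w, h) format.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w, h = scale * np.sqrt(ar), scale / np.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return np.array(boxes)

print(default_boxes(3, scale=0.2).shape)  # (27, 4): 3*3 locations * 3 ratios
```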
Article
Full-text available
The paper presents a hybrid ensemble of diverse classifiers for logo and trademark symbol recognition. The proposed ensemble is composed of four types of member classifiers. The first one compares the color distribution of the logo patterns and is responsible for sifting out images of different color distribution. The second classifier is based on the structural tensor recognition of local phase histograms. A proposed modification in this module consists of tensor computation in the space of the morphological scale-space. Thanks to this, more discriminative histograms describing global shapes are obtained. Next in the chain is a novel member classifier that joins the Hausdorff distance with the correspondence measure of the log-polar patches computed around the corner points. This sparse classifier allows reliable comparison of even highly deformed patterns. The last member classifier relies on the statistical affine moment invariants, which describe global shapes. However, a real advantage is obtained by joining the aforementioned base classifiers into a hybrid ensemble of classifiers, as proposed in this paper. Thanks to this, a more accurate response and generalizing properties are obtained at reasonable computational requirements. Experimental results show good recognition accuracy even for highly deformed logo patterns, as well as fair generalization properties which support human search and logo assessment tasks.
Article
Full-text available
In recent years the Probabilistic Neural Network (PNN) has been used in a large number of applications due to its simplicity and efficiency. PNN assigns the test data to the class with maximum likelihood compared with other classes. The likelihood of the test data given each training data point is computed in the pattern layer through kernel density estimation using a simple Bayesian rule. The kernel is usually a standard probability distribution function such as a Gaussian function. A spread parameter is used as a global parameter which determines the width of the kernel. The Bayesian rule in the pattern layer estimates the conditional probability of each class given an input vector without considering any probable local densities or heterogeneity in the training data. In this paper, an enhanced and generalized PNN (EPNN) is presented using local decision circles (LDCs) to overcome the aforementioned shortcoming and improve its robustness to noise in the data. Local decision circles enable EPNN to incorporate local information and non-homogeneity existing in the training population. The circle has a radius which limits the contribution of the local decision. In the conventional PNN the spread parameter can be optimized for maximum classification accuracy. In the proposed EPNN two parameters, the spread parameter and the radius of the local decision circles, are optimized to maximize the performance of the model. Accuracy and robustness of EPNN are compared with PNN using three different benchmark classification problems (iris data, diabetes data, and breast cancer data) and five different ratios of training data to testing data: 90:10, 80:20, 70:30, 60:40, and 50:50. EPNN provided the most accurate results consistently for all ratios. Robustness of PNN and EPNN is investigated using different values of signal-to-noise ratio (SNR). The accuracy of EPNN is consistently higher than that of PNN at different levels of SNR and for all ratios of training data to testing data.
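The pattern-layer computation described above reduces, for a Gaussian kernel, to a per-class kernel density estimate; a minimal PNN sketch follows (the local decision circles of EPNN would add a radius-based gating step, not shown):

```python
import numpy as np

def pnn_predict(x, train_X, train_y, spread=1.0):
    """Classify x as the class with the highest average Gaussian kernel density."""
    densities = {}
    for label in np.unique(train_y):
        diffs = train_X[train_y == label] - x
        # Gaussian kernel evaluated at every training sample of this class.
        k = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * spread ** 2))
        densities[label] = k.mean()               # class-conditional likelihood
    return max(densities, key=densities.get)
```

The spread parameter plays exactly the role described in the abstract: small values make the decision surface follow individual samples, large values smooth it out.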
Article
Full-text available
This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search, which combines the strength of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The selective search software is made publicly available (Software: http://disi.unitn.it/~uijlings/SelectiveSearch.html).
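For experimentation, an implementation of selective search ships with OpenCV's contrib modules; the snippet below assumes the opencv-contrib-python package is installed and an arbitrary test image exists on disk.

```python
import cv2

image = cv2.imread("scene.jpg")                   # any BGR test image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()                  # 'Quality' mode is slower but better
rects = ss.process()                              # proposals as (x, y, w, h)
print(f"{len(rects)} candidate object locations")
```

Typically only the first few hundred to few thousand proposals are kept before being passed to a recognition model.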
Conference Paper
This article presents an intelligent system using deep learning algorithms and the transfer learning approach to detect oil palm units in multispectral photographs taken with unmanned aerial vehicles. Two main contributions come from this piece of research. First, a dataset for oil palm unit detection is carefully produced and made available online. Although tailored to the palm detection problem, it has general validity and can be used for any classification application. Second, we designed and evaluated a state-of-the-art detection system, which uses a convolutional neural network to extract meaningful features, and a classifier trained with the images from the proposed dataset. Results show outstanding effectiveness with an accuracy peak of 99.5% and a precision of 99.8%. Using different validation images taken from different altitudes, the model reached an accuracy of 97.5% and a precision of 98.3%. Hence, the proposed approach is highly applicable in the field of precision agriculture.
Article
The design of automated video surveillance systems often involves the detection of agents which exhibit anomalous or dangerous behavior in the scene under analysis. Models aimed to enhance the video pattern recognition abilities of the system are commonly integrated in order to increase its performance. Deep learning neural networks are found among the most popular models employed for this purpose. Nevertheless, the large computational demands of deep networks mean that exhaustive scans of the full video frame make the system perform rather poorly in terms of execution speed when implemented on low cost devices, due to the excessive computational load generated by the examination of multiple image windows. This work presents a video surveillance system aimed to detect moving objects with abnormal behavior for a panoramic 360° surveillance camera. The block of the video frame to be analyzed is determined on the basis of a probabilistic mixture distribution comprising two mixture components. The first component is a uniform distribution, which is in charge of a blind window selection, while the second component is a mixture of kernel distributions. The kernel distributions generate windows within the video frame in the vicinity of the areas where anomalies were previously found. This contributes to obtaining candidate windows for analysis which are close to the most relevant regions of the video frame, according to the past recorded activity. A Raspberry Pi microcontroller-based board is employed to implement the system. This enables the design and implementation of a low-cost system which is nevertheless capable of performing the video analysis with a high video frame processing rate.
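The two-component window sampler described here can be sketched directly: with some probability draw a blind uniform window, otherwise perturb a previously flagged anomaly location with a Gaussian kernel (the mixing probability and bandwidth below are illustrative, not the paper's values):

```python
import numpy as np

def sample_window(frame_w, frame_h, win, past_anomalies, p_uniform=0.3,
                  bandwidth=40.0, rng=np.random.default_rng()):
    """Pick the top-left corner of the next window to analyze."""
    if not past_anomalies or rng.random() < p_uniform:
        # Blind component: uniform over all valid window positions.
        return (rng.integers(0, frame_w - win), rng.integers(0, frame_h - win))
    # Kernel component: perturb a randomly chosen previous anomaly location.
    cx, cy = past_anomalies[rng.integers(0, len(past_anomalies))]
    x = int(np.clip(rng.normal(cx, bandwidth), 0, frame_w - win))
    y = int(np.clip(rng.normal(cy, bandwidth), 0, frame_h - win))
    return (x, y)
```

Each analyzed window that triggers a detection would be appended to past_anomalies, concentrating future samples around active regions.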
Article
This paper aims at providing researchers and engineering professionals with a practical and comprehensive deep-learning-based solution for detecting construction vehicles, from the first step of solution development to the last step of solution deployment. This paper places particular focus on the often-ignored last step of deployment. Our first phase of solution development involved data preparation, model selection, model training, and model validation. Given the necessarily small-scale nature of construction vehicle image datasets, we propose as the detection model an improved version of the single shot detector MobileNet, which is suitable for embedded devices. Our study's second phase comprised model optimization, application-specific embedded system selection, economic analysis, and field implementation. Several embedded devices were proposed and compared. Results, including a mean average precision consistently above 90%, confirm the superior real-time performance of our proposed solutions. Finally, the practical field implementation of our proposed solutions was investigated. This study validates the practicality of deep-learning-based object detection solutions for construction scenarios. Moreover, the detailed information provided by the current study can be employed for several purposes such as safety monitoring, productivity assessments, and managerial decision making.
Article
Crack assessment of bridge piers using unmanned aerial vehicles (UAVs) eliminates unsafe factors of manual inspection and provides a potential way for the maintenance of transportation infrastructures. However, the implementation of UAV‐based crack assessment for real bridge piers is hindered by several key issues, including the following: (a) both perspective distortion and the geometry distortion by nonflat structural surfaces usually appear on crack images taken by the UAV system from the pier surface; however, these two kinds of distortions are difficult to correct at the same time; and (b) the crack image taken by a close‐range inspection flight UAV system is partially imaged, containing only a small part of the entire surface of the pier, and thereby hinders crack localization. In this paper, a new image‐based crack assessment methodology for bridge piers using UAV and three‐dimensional (3D) scene reconstruction is proposed. First, the data acquisition of UAV‐based crack assessment is discussed, and the UAV flight path and photography strategy for bridge pier assessment are proposed. Second, image‐based crack detection and 3D reconstruction are conducted to obtain crack width feature pair sequences and 3D surface models, respectively. Third, a new method of projecting cracks onto a meshed 3D surface triangular model is proposed, which can correct both the perspective distortion and geometry distortion by nonflat structural surfaces, and realize the crack localization. Field test investigations of crack assessment of a real bridge pier using a UAV are carried out for illustration, validation, and error analysis of the proposed methodology.
Chapter
Object detection is a fundamental and important problem in computer vision. Although impressive results have been achieved on large/medium sized objects in large-scale detection benchmarks (e.g. the COCO dataset), the performance on small objects is far from satisfactory. The reason is that small objects lack sufficient detailed appearance information, which can distinguish them from the background or similar objects. To deal with the small object detection problem, we propose an end-to-end multi-task generative adversarial network (MTGAN). In the MTGAN, the generator is a super-resolution network, which can up-sample small blurred images into fine-scale ones and recover detailed information for more accurate detection. The discriminator is a multi-task network, which describes each super-resolved image patch with a real/fake score, object category scores, and bounding box regression offsets. Furthermore, to make the generator recover more details for easier detection, the classification and regression losses in the discriminator are back-propagated into the generator during training. Extensive experiments on the challenging COCO dataset demonstrate the effectiveness of the proposed method in restoring a clear super-resolved image from a blurred small one, and show that the detection performance, especially for small sized objects, improves over state-of-the-art methods.
Conference Paper
This paper presents a deep-learning based framework for addressing the problem of accurate cloud detection in remote sensing images. This framework benefits from a Fully Convolutional Neural Network (FCN), which is capable of pixel-level labeling of cloud regions in a Landsat 8 image. Also, a gradient-based identification approach is proposed to identify and exclude regions of snow/ice in the ground truths of the training set. We show that using the hybrid of the two methods (threshold-based and deep-learning) improves the performance of the cloud identification process without the need to manually correct automatically generated ground truths. On average, the Jaccard index and recall measure are improved by 4.36% and 3.62%, respectively.
Article
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
Article
Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most of the published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is using deep convolutional neural networks (CNNs) and random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/nonseizure. By training this network, the required features are optimized, while fitting a nonlinear classifier on the features. After training the network with EEG recordings of 26 neonates, five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and seizure detection rate of 77% using a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similar to a previously developed heuristic method.
Article
Differential evolution (DE) is one of the most popular and efficient evolutionary algorithms for numerical optimization, and it has achieved much success in a series of academic benchmark competitions as well as real applications. Recently, ensemble methods have received increasing attention in designing high-quality DE algorithms. However, previous efforts were mainly devoted to the low-level ensemble of mutation strategies of DE. This study investigates the high-level ensemble of multiple existing efficient DE variants. A multi-population based framework (MPF) is proposed to realize the ensemble of multiple DE variants to derive a new algorithm named EDEV for short. EDEV consists of three highly popular and efficient DE variants, namely JADE (adaptive differential evolution with optional external archive), CoDE (differential evolution with composite trial vector generation strategies and control parameters) and EPSDE (differential evolution algorithm with ensemble of parameters and mutation strategies). The whole population of EDEV is partitioned into four subpopulations, including three indicator subpopulations with smaller size and one reward subpopulation with much larger size. Each constituent DE variant in EDEV owns an indicator subpopulation. After every predefined number of generations, the most efficient constituent DE variant is determined and the reward subpopulation is assigned to that best performing DE variant as an extra reward. In this manner, the most efficient DE variant is expected to obtain the most computational resources during the optimization process. In addition, the population partition operator is triggered at every generation, which results in timely information sharing and tight cooperation among the component DE variants. Extensive experiments and comparisons have been done based on the CEC2005 and CEC2014 benchmark suites, which show that the overall performance of EDEV is superior to several state-of-the-art peer DE variants. The success of EDEV reveals that, through an appropriate ensemble framework, different DE variants of different merits can support one another to cooperatively solve optimization problems.
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
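The anchors an RPN scores at every feature-map position form a fixed multi-scale, multi-ratio grid; a sketch of the standard 9-anchor layout (16-pixel stride, as in the VGG-16 configuration) follows.

```python
import numpy as np

def anchor_grid(fmap_h, fmap_w, stride=16, scales=(128, 256, 512),
                ratios=(0.5, 1.0, 2.0)):
    """All anchors as (x1, y1, x2, y2) in image pixels, 9 per feature-map cell."""
    anchors = []
    for i in range(fmap_h):
        for j in range(fmap_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = j * stride + stride / 2, i * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return np.array(anchors)

print(anchor_grid(2, 2).shape)  # (36, 4): 2*2 cells * 9 anchors each
```

The RPN then regresses offsets from each anchor to the nearest ground-truth box and scores its objectness.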
Article
The keys for the development of an effective classification algorithm are: 1) discovering feature spaces with large margins between clusters and close proximity of the classmates and 2) discovering the smallest number of features needed to perform accurate classification. In this paper, a new supervised classification algorithm, called neural dynamic classification (NDC), is presented with the goals of: 1) discovering the most effective feature spaces and 2) finding the optimum number of features required for accurate classification using the patented robust neural dynamic optimization model of Adeli and Park. The new classification algorithm is compared with the probabilistic neural network (PNN), enhanced PNN (EPNN), and support vector machine using two sets of classification problems. The first set consists of five standard benchmark problems. The second set is a large benchmark problem, the Modified National Institute of Standards and Technology (MNIST) database of handwritten digits. In general, NDC yields the most accurate classification results, followed by EPNN. A beauty of the new algorithm is the smoothness of the convergence curves, which is an indication of the robustness and good performance of the algorithm. The main aim is to maximize the prediction accuracy.
Conference Paper
Humans navigate crowded spaces such as a university campus by following common sense rules based on social etiquette. In this paper, we argue that in order to enable the design of new target tracking or trajectory forecasting methods that can take full advantage of these rules, we need to have access to better data in the first place. To that end, we contribute a new large-scale dataset that collects videos of various types of targets (not just pedestrians, but also bikers, skateboarders, cars, buses, golf carts) that navigate in a real world outdoor environment such as a university campus. Moreover, we introduce a new characterization that describes the “social sensitivity” at which two targets interact. We use this characterization to define “navigation styles” and improve both forecasting models and state-of-the-art multi-target tracking, whereby the learnt forecasting models help the data association step.
Article
A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed it is demonstrated that the new method converges faster and with more certainty than many other acclaimed global optimization methods. The new method requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.
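The method sketched in this abstract is classic differential evolution; its widely used DE/rand/1/bin variant fits in a handful of lines (the population size and control parameters F, CR below are common defaults, not prescriptions):

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, iters=200,
                           rng=np.random.default_rng()):
    """Minimize f over a box; bounds is a sequence of (low, high) per dimension."""
    bounds = np.asarray(bounds, dtype=float)
    dim = len(bounds)
    pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # rand/1 mutation (a stricter variant also excludes index i).
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), bounds[:, 0], bounds[:, 1])
            # Binomial crossover with at least one mutated coordinate.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:                 # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmin(fit)], float(fit.min())

# Example: minimize the sphere function in 5 dimensions.
best_x, best_f = differential_evolution(lambda x: float(np.sum(x ** 2)),
                                        [(-5, 5)] * 5)
```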
Article
One-class classification is one of the novel and very promising topics in contemporary machine learning. In recent years ensemble approaches have gained significant attention due to their increasing robustness to unknown outliers and their reduction of the complexity of the learning process. In our previous works, we proposed a highly efficient one-class classifier ensemble, based on input data clustering and training weighted one-class classifiers on the clustered subsets. However, the main drawback of this approach lay in the difficult and time-consuming selection of the number of competence areas, which indirectly affects the number of members in the ensemble. In this paper, we investigate ten different methodologies for an automatic determination of the optimal number of competence areas for the proposed ensemble. They have roots in model selection for clustering, but can also be effectively applied to the classification task. In order to select the most useful technique, we investigate their performance in a number of one-class and multi-class problems. Numerous experimental results, backed up with statistical testing, allow us to propose an efficient and fully automatic method for tuning the one-class clustering-based ensembles.
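One representative of the clustering-rooted model-selection techniques examined in such settings is the silhouette criterion: cluster the data for each candidate number of competence areas and keep the best-scoring candidate. A sketch with scikit-learn (assumed available); the paper itself compares ten such methodologies, of which this is only one plausible example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_n_areas(X, k_min=2, k_max=10, seed=0):
    """Return the number of clusters (competence areas) with the best silhouette."""
    best_k, best_score = k_min, -1.0
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)       # cohesion vs. separation in [-1, 1]
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

One weighted one-class classifier would then be trained per selected cluster.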
Article
This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.