Article

A Brain–Computer Interface (BCI) for the Detection of Mine-Like Objects in Sidescan Sonar Imagery

... In recent years, an increasing number of research efforts have been dedicated to the development of BCI systems [5,6], with applications extending from wheelchair operation [7], prosthetic control [8], and neurological rehabilitation [9] for physically challenged patients to a wider range of practical scenarios, such as virtual reality games [10], military detection [11] and operator fatigue detection [12,13]. Depending on the specific activity patterns of the brain, the EEG signals applied to BCI development mainly include: slow cortical potential (SCP) [14], P300 evoked potential [15,16], steady-state visual evoked potential (SSVEP) [17,18], and event-related desynchronization (ERD) and synchronization (ERS) [19,20]. ...
... Firstly, the joint correlation matrix between X and Y should be calculated as C = [C11 C12; C21 C22], where C11 and C22 are the autocorrelation blocks of X and Y, and C12 = C21ᵀ is their cross-correlation block. ...
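As a rough illustration of the MSI computation sketched in the snippet above, the following NumPy code builds the joint correlation matrix, normalizes it block-wise, and turns the eigenvalue spectrum into an entropy-based synchronization index. This is a minimal rendition of the standard MSI formulation, not the cited authors' code; the function name and the toy signals are invented for illustration.

```python
import numpy as np

def msi_index(X, Y):
    """Multivariate synchronization index between an EEG block X and a
    reference block Y (both shaped channels x samples)."""
    n1, n2 = X.shape[0], Y.shape[0]
    Z = np.vstack([X, Y])
    C = np.corrcoef(Z)                      # joint correlation matrix
    # Block-diagonal normalization U = D^{-1/2} C D^{-1/2}, where D keeps
    # only the autocorrelation blocks C11 and C22.
    D = np.zeros_like(C)
    D[:n1, :n1] = C[:n1, :n1]
    D[n1:, n1:] = C[n1:, n1:]
    vals, vecs = np.linalg.eigh(D)
    D_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    U = D_inv_sqrt @ C @ D_inv_sqrt
    lam = np.linalg.eigvalsh(U)
    lam = np.clip(lam / lam.sum(), 1e-12, None)   # normalized eigenvalues
    P = n1 + n2
    # Synchronization index: 1 + sum(lam * log(lam)) / log(P), in [0, 1]
    return 1.0 + np.sum(lam * np.log(lam)) / np.log(P)

# Toy check: a channel phase-locked to an 8 Hz sin/cos reference scores
# higher than white noise against the same reference.
rng = np.random.default_rng(0)
t = np.arange(500) / 250.0
Y = np.vstack([np.sin(2*np.pi*8*t), np.cos(2*np.pi*8*t)])
X_lock = np.vstack([np.sin(2*np.pi*8*t + 0.3) + 0.1*rng.standard_normal(500)])
X_noise = rng.standard_normal((1, 500))
print(msi_index(X_lock, Y) > msi_index(X_noise, Y))  # True
```

The index approaches 0 when the two blocks are independent (near-uniform eigenvalues) and grows toward 1 as the eigenvalue spectrum concentrates.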
Article
Full-text available
In recent years, the multivariate synchronization index (MSI) algorithm, a novel frequency detection method, has attracted increasing attention in the study of brain-computer interfaces (BCIs) based on the steady-state visual evoked potential (SSVEP). However, the MSI algorithm struggles to fully exploit the SSVEP-related harmonic components in the electroencephalogram (EEG), which limits its application in BCI systems. In this paper, we propose a novel filter bank-driven MSI algorithm (FBMSI) to overcome this limitation and further improve the accuracy of SSVEP recognition. We evaluate the efficacy of the FBMSI method by developing a 6-command SSVEP-NAO robot system with extensive experimental analyses. An offline experimental study is first performed with EEG collected from nine subjects to investigate the effects of varying parameters on model performance. Offline results show that the proposed method achieves a stable improvement. We further conduct an online experiment with six subjects to assess the efficacy of the developed FBMSI algorithm in a real-time BCI application. The online experimental results show that the FBMSI algorithm yields a promising average accuracy of 83.56% using a data length of only one second, 12.26% higher than the standard MSI algorithm. These extensive experimental results confirm the effectiveness of the FBMSI algorithm in SSVEP recognition and demonstrate its potential for the development of improved BCI systems.
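The filter-bank idea described in the abstract can be sketched as follows: decompose the signal into harmonic sub-bands and combine per-band scores with decreasing weights. This sketch uses a crude FFT-mask bandpass and band power as a stand-in for the per-band MSI value; the weights w(n) = n^-a + b follow the common FBCCA convention and are an assumption, as the paper's exact filter design and weights are not given here.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude FFT-mask bandpass filter (illustration only)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(x.size, 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=x.size)

def filter_bank_score(x, fs, f0, n_harm=3, a=1.25, b=0.25):
    """Weighted sum of per-harmonic sub-band scores for candidate
    frequency f0. Band power stands in for the per-band MSI value."""
    score = 0.0
    for n in range(1, n_harm + 1):
        xb = bandpass_fft(x, fs, n * f0 - 1.0, n * f0 + 1.0)
        score += (n ** -a + b) * np.mean(xb ** 2)
    return score

# An SSVEP-like signal at 10 Hz (plus its 2nd harmonic) scores far higher
# at the true frequency than at a wrong candidate frequency.
fs = 250
t = np.arange(4 * fs) / fs
x = (np.sin(2*np.pi*10*t) + 0.5*np.sin(2*np.pi*20*t)
     + 0.3*np.random.default_rng(0).standard_normal(t.size))
print(filter_bank_score(x, fs, 10) > filter_bank_score(x, fs, 13))  # True
```

Exploiting the harmonics is exactly what gives filter-bank variants their edge over a single-band analysis at short data lengths.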
... RSVP-based BCI systems can deal with problems that computer vision finds difficult to solve. Christopher B. et al. developed an RSVP-based BCI system for finding mine-like objects in sonar imagery, which takes advantage of the accuracy and rapidity of human vision [13]. Recently, various RSVP-based BCI applications have been developed, such as spellers [14], [15], image retrieval [12], image classification [16], [17], anomaly detection [13], and anti-deception [18]. The RSVP-based speller is gaze-independent and less likely to cause visual fatigue, making it an effective communication channel for patients, especially those with severe oculomotor impairments [19]. ...
Article
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an efficient information detection technology that detects event-related brain responses evoked by target visual stimuli. However, a time-consuming calibration procedure is needed before a new user can use such a system, so it is important to reduce calibration effort for BCI applications. In this paper, we propose a multi-source conditional adversarial domain adaptation with correlation metric learning (mCADA-C) framework that utilizes data from other subjects to reduce the amount of data required from a new subject for training the model. The model uses adversarial training to enable a CNN-based feature extraction network to extract common features from different domains. A correlation metric learning (CML) loss is proposed to constrain the correlation of features based on class and domain, maximizing intra-class similarity and minimizing inter-class similarity. Also, a multi-source framework with a source-selection strategy is adopted to integrate the results of multiple domain adaptations. We constructed an RSVP-based dataset in which 11 subjects each performed three RSVP experiments on three different days. The experimental results demonstrate that our proposed method achieves 87.72% cross-subject balanced accuracy under one-block calibration, indicating that it can realize higher performance with less calibration effort.
... Sonar is a technology that utilizes underwater acoustic reflection to acquire underwater images, which can provide high-resolution images of the seafloor even in harsh underwater environments, and is therefore widely used in marine fields such as underwater torpedo detection [1], shipwreck rescue [2], marine geologic surveys, and submarine pipeline tracking [3]. When detecting objects on the seafloor, there are three types of areas that are typically recognized in sonar images: highlights, shadows, and seafloor backgrounds [4]. ...
Article
Full-text available
Sonar image segmentation is an important task in the field of underwater detection, and accurate segmentation of targets and shadows is the key to subsequent image processing. However, due to the influence of various marine environments, the formation of sonar images is often accompanied by high scattering noise and intensity inhomogeneity. To address the difficulties these factors pose for sonar image segmentation, this paper proposes a multi-spatial information constrained fuzzy c-means clustering algorithm (MSCFCM). Firstly, we incorporate local spatial information into MSCFCM through morphological reconstruction (MR) and construct the distance metric between the current pixel and its neighboring pixels by combining the mean-value information of the image, removing a large amount of background noise; secondly, we use the difference in normalized variance between the processed image and the original image as a weight to constrain the influence of the distance terms, and embed it adaptively into the fuzzy clustering algorithm; finally, the membership information of neighboring pixels is used as prior information for the current pixel via the Kullback–Leibler (KL) divergence, so that the assignment of membership degrees in each iteration can be optimized to further improve segmentation performance. We test our method on sonar images and medical images, and the experimental results demonstrate that the algorithm exhibits strong segmentation performance and noise immunity.
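For orientation, the baseline objective that MSCFCM builds on is plain fuzzy c-means. The sketch below clusters pixel intensities into shadow/seafloor/highlight classes; the paper's spatial (MR-based) and KL-divergence terms are omitted, and the quantile initialization and toy data are choices made here for illustration.

```python
import numpy as np

def fcm(data, c=3, m=2.0, iters=60):
    """Baseline fuzzy c-means on 1-D pixel intensities.
    data: flat array of intensities; m: fuzzifier (> 1)."""
    centers = np.quantile(data, np.linspace(0.1, 0.9, c))  # spread-out init
    for _ in range(iters):
        d = np.abs(centers[:, None] - data[None, :]) + 1e-9  # |v_i - x_k|
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=0)                  # memberships sum to 1 per pixel
        Um = U ** m
        centers = (Um @ data) / Um.sum(axis=1)   # fuzzy-weighted means
    return centers, U

# Toy "sonar" intensities: shadow ~0.1, seafloor ~0.5, highlight ~0.9.
rng = np.random.default_rng(1)
pix = np.concatenate([0.1 + 0.03*rng.standard_normal(300),
                      0.5 + 0.03*rng.standard_normal(300),
                      0.9 + 0.03*rng.standard_normal(300)])
centers, U = fcm(pix)
print(np.round(np.sort(centers), 2))   # close to [0.1, 0.5, 0.9]
```

The soft memberships U are what the KL-divergence prior in MSCFCM regularizes across neighboring pixels.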
... Sonar sensors are adept at capturing geometric structure information and can offer insights into underwater scenes even in lowvisibility conditions. Two main types of sonars are commonly used in sonar-based UOD: side-scan sonar (SSS) [32], [48], [49] and multi-beam forward-looking sonar (FLS) [36], [50], [51]. SSS provides long-range, high-resolution data, allowing for detection across vast survey areas (hundreds of meters long). ...
Preprint
Full-text available
Underwater object detection (UOD), aiming to identify and localise objects in underwater images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in underwater scenes. In recent years, artificial intelligence (AI) based methods, especially deep learning methods, have shown promising performance in UOD. To further facilitate future advancements, we comprehensively study AI-based UOD. In this survey, we first categorise existing algorithms into traditional machine learning-based methods and deep learning-based methods, and summarise them by considering learning strategy, experimental dataset, utilised features or frameworks, and learning stage. Next, we discuss the potential challenges and suggest possible solutions and new directions. We also perform both quantitative and qualitative evaluations of mainstream algorithms across multiple benchmark datasets by considering the diverse and biased experimental setups. Finally, we introduce two off-the-shelf detection analysis tools, Diagnosis and TIDE, which examine the effects of object characteristics and various types of errors on detectors. These tools help identify the strengths and weaknesses of detectors, providing insights for further improvement. The source codes, trained models, utilised datasets, detection results, and detection analysis tools are publicly available at \url{https://github.com/LongChenCV/UODReview}, and will be regularly updated.
... Microgrids inherit many characteristics of smart grids, such as the self-healing ability of the grid [1], real-time monitoring of distributed generator sets, and high permeability, which can effectively improve the quality and stability of power grids. However, in order to realize the safe and reliable operation of microgrids [2], it is necessary to carry out fast and accurate fault detection, identification and protection mechanisms for relevant microgrids, so as to shorten the line inspection time in microgrids, reduce the power outage time of the power grid, reduce the economic loss of the power grid, and improve the overall reliability of the microgrid [3]. Therefore, identifying power flow faults in microgrids is of great significance for maintaining and operating microgrids. ...
Article
To improve the safety of microgrid operation, this paper uses a computer vision search system to study the identification of power flow faults in microgrids. Firstly, a power flow fault detection method based on a message transmission network is proposed: the mapping from microgrid faults to each node is constructed, and the power flow fault at each node in the microgrid is extracted. Then, a long short-term memory network is transferred to identify power flow faults in microgrids, and the influence of the computer vision search system on power flow fault identification is comprehensively analyzed. The simulation results show that the proposed method can effectively improve the accuracy of power flow fault identification in the microgrid, thereby improving the safety of microgrid operation.
... In comparison to optical sensors, sonar instruments have advantages in imaging conditions, distance, and range [7] and are commonly used for underwater imaging detection. Commonly utilized sonar devices include forward-looking sonar [8], multibeam sonar [9], side-scan sonar [10], synthetic aperture sonar [11], and others. These devices are typically installed on autonomous or non-autonomous underwater vehicles that continuously transmit and receive sonar signals while traveling and convert the signals into images for tasks such as target detection, identification, and segmentation [12]. ...
Article
To overcome the challenges of limited samples, difficult acquisition, under-representation, and labeling in utilizing sonar images and deep learning for target detection, recognition, and segmentation tasks across full-class underwater targets, we propose the Seg2Sonar network based on SPADE. This network generates images from segmentation maps, thus eliminating the need for sample annotation. Additionally, we incorporate the Skip-Layer channel-wise Excitation (SLE) module into the SPADE network to enhance feature extraction with minimal training samples. To improve the realism of generated images, we introduce the Focal Frequency Loss (FFL) module, and propose the Elasticity Loss (EL) strategy to improve the random combination capability of the network, considering the low resolution and severe distortion characteristic of sonar images. Furthermore, we propose a weight adjustment (WA) strategy that tackles the challenge of low and unbalanced feature representation with few samples by taking the unbalanced distribution of features into account using prior information. These four improvements enable efficient sample augmentation of sonar images with limited samples. Building upon the improved Seg2Sonar network, we propose an underwater full-class target augmentation strategy. Based on the imaging characteristics of sonar images, we classify underwater full-class targets into four categories: texture level, group level, shape level, and intensity level. We provide corresponding augmentation strategies by leveraging similar features among sonar target images or adding external radar/optical features to supplement the diversity of features. Our experimental results demonstrate the efficacy of our proposed method in achieving sample augmentation of underwater full-class targets with minimal samples (fewer than 10) or even zero samples. The approach achieves about 90% accuracy in detection, recognition, and segmentation for all types of targets through deep learning methods. Our findings provide a promising solution for efficient sample augmentation of underwater full-class targets with limited samples.
... Side-scan sonar, which was developed in the 1960s [16][17][18], is designed and used for a variety of survey work. It is widely used in both military and civilian fields, primarily for hydroacoustic mapping of the seafloor in order to locate shipwrecks, lost airplanes, mines and mine-like objects [19], all types of debris lying on the bottom, and for detecting cables and pipelines [20,21] or marine archaeological sites [22]. In turn, the across-track resolution (range resolution) mainly depends on the pulse length of the acoustic beam [50]. ...
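The pulse-length dependence mentioned above is a one-line calculation: two echoes along the across-track direction can be separated when they are more than c·τ/2 apart in range. The sound speed and pulse length below are illustrative values, not figures from the cited survey.

```python
# Across-track (range) resolution of a side-scan sonar is set by the
# transmitted pulse length: resolution = c * tau / 2,
# where c is the sound speed in water and tau the pulse length.
c = 1500.0          # m/s, nominal sound speed in seawater
tau = 0.1e-3        # s, a 0.1 ms pulse (illustrative value)
range_resolution = c * tau / 2
print(range_resolution)   # 0.075 m, i.e. 7.5 cm
```

Shorter pulses (or pulse-compressed chirps with the same bandwidth) therefore directly sharpen the across-track detail of the sonar image.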
Article
Full-text available
Side-scan sonar is designed and used for a variety of survey work, in both military and civilian fields. These systems provide acoustic imageries that play a significant role in a variety of marine and inland applications. For this reason, it is extremely important that the recorded sonar image is characterized by high resolution, detail and sharpness. This article mainly demonstrates the impact of side-scan sonar resolution on imaging quality. The article also presents the importance of the acoustic shadow in the process of analyzing sonar data and identifying underwater objects. The real measurements were carried out using two independent survey systems: a hull-mounted sonar and a towed side-scan sonar. Six different shipwrecks lying in the Baltic Sea were selected as the objects of research. The results presented in the article also constitute evidence of how sonar technology has changed over time. The survey findings show that by maintaining the appropriate operational conditions and meeting several requirements, it is possible to obtain photographic-quality sonar images, which may be crucial in the process of data interpretation and shipwreck identification.
... We then can infer whether this picture contains a target or not. The existing applications of RSVP-based BCI include surveillance [19,20], face recognition [21,22], medical image analysis [23], and RSVP spellers [24][25][26]. Due to its relatively high detection speed compared with manual operation, especially for detecting targets from multiple huge images with high resolution, RSVP-based BCI is considered to be a potential approach for enhancing the ability and improving the efficiency of operators [27][28][29][30][31]. ...
Article
Full-text available
Although target detection based on electroencephalogram (EEG) signals has been extensively investigated recently, EEG-based target detection under weak hidden conditions remains a problem. In this paper, we proposed a rapid serial visual presentation (RSVP) paradigm for target detection corresponding to five levels of weak hidden conditions defined quantitatively in the RGB color space. Eighteen subjects participated in the experiment, and the neural signatures, including P300 amplitude and latency, were investigated. Detection performance was evaluated under five levels of weak hidden conditions using linear discriminant analysis and support vector machine classifiers on different channel sets. The experimental results showed that, compared with the benchmark condition, (1) the P300 amplitude significantly decreased (8.92 ± 1.24 μV versus 7.84 ± 1.40 μV, p = 0.021) and latency was significantly prolonged (582.39 ± 25.02 ms versus 643.83 ± 26.16 ms, p = 0.028) only under the weakest hidden condition, and (2) the detection accuracy decreased by less than 2% (75.04 ± 3.24% versus 73.35 ± 3.15%, p = 0.029) with a more than 90% reduction in channel number (62 channels versus 6 channels), determined using the proposed channel selection method under the weakest hidden condition. Our study can provide new insights into target detection under weak hidden conditions based on EEG signals with a rapid serial visual presentation paradigm. In addition, it may expand the application of brain–computer interfaces in EEG-based target detection areas.
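The linear discriminant analysis used for detection in this abstract amounts to Fisher's two-class discriminant. Below is a minimal sketch with synthetic P300-like features; the feature dimensions, shift pattern, and shrinkage term are invented for illustration and are not the study's actual pipeline.

```python
import numpy as np

def fisher_lda(X, y):
    """Two-class Fisher discriminant: weight vector and threshold.
    X: trials x features; y: 0 (non-target) / 1 (target)."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)  # within-class scatter
    Sw += 1e-6 * np.eye(Sw.shape[0])                # tiny ridge for stability
    w = np.linalg.solve(Sw, m1 - m0)
    thr = w @ (m0 + m1) / 2                         # midpoint threshold
    return w, thr

# Toy data: target trials carry an extra positive deflection in a few
# feature dimensions, mimicking a P300 amplitude difference.
rng = np.random.default_rng(1)
nontarget = rng.standard_normal((200, 8))
target = rng.standard_normal((200, 8)) + np.array([0, 0, 0, 1.5, 1.5, 1, 0, 0])
X = np.vstack([nontarget, target])
y = np.r_[np.zeros(200), np.ones(200)]
w, thr = fisher_lda(X, y)
acc = np.mean((X @ w > thr) == y)
print(acc)   # well above chance on this separable toy set
```

In a shrinking channel set, as in the study's 62-vs-6 channel comparison, the within-class scatter matrix gets smaller and easier to estimate, which is one reason aggressive channel selection costs so little accuracy.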
... Sonar image object detection remains one of the most difficult tasks in marine engineering due to noise, contrast and brightness limitations. Sonar image-based object detection, which aims to locate and identify semantic objects, is a prerequisite for a range of downstream underwater vision tasks, such as sea mine detection [1], pipeline detection [2] and archeology [3]. ...
Article
Full-text available
As a special detection task, sonar image object detection has been suffering from two main problems: widespread noise and the lack of high-frequency information. In this paper, we propose two independent modules to solve these two problems. For the widespread noise, we propose the foreground semantic enhancement module. Different from simple feature fusion, this module creatively associates the semantic map with features from each feature level, thus increasing the foreground–background distance and highlighting the object information. To solve the problem of insufficient high-frequency information, we propose the foreground edge enhancement module. This module inventively combines RNN networks to enhance edges using spatial semantic information from different directions, improving the feature representation of foreground objects. Based on the above two modules, we design a novel detection architecture, the foreground enhancement network (FEN), which enhances the features of a single point to make classification more powerful and localization more accurate. Through extensive experimental validation, our FEN network achieves high performance improvements when combined with different detectors, reaching its highest improvement of 10% mAP when combined with a single-stage detector (FCOS).
... EEG has become the most widely used neuroimaging technique for brain-computer interfaces (BCI). Some of these extended uses of EEG include military operations such as controlling weapons or drones [4][5][6][7][8], educational classroom applications such as monitoring students' attention and other mental states or helping them engage with material [9][10][11][12][13], cognitive enhancement such as increasing cognitive load or focus [12,14,15], and consumer-based games such as computer games or physical toys controlled via brain waves [2,[15][16][17][18][19]. ...
Article
Full-text available
In the last decade there has been significant growth in the interest in and application of EEG (electroencephalography) outside the laboratory, as well as in medical and clinical settings, for more ecological and mobile applications. For now, however, such applications have mainly included military, educational, cognitive-enhancement, and consumer-based games. Given the monetary and ecological advantages, consumer-grade EEG devices such as the Emotiv EPOC have emerged; however, consumer-grade devices make certain compromises in data quality in order to become affordable and easy to use. The goal of this study was to investigate the reliability and accuracy of the EPOC as compared to a research-grade device, Brainvision. To this end, we collected data from participants using both devices during three distinct cognitive tasks designed to elicit changes in arousal, valence, and cognitive load: namely, Affective Norms for English Words, the International Affective Picture System, and the n-Back task. Our design and analytical strategies followed an idiographic person-level approach (electrode-wise analysis of vincentized repeated measures). We aimed to assess how well the Emotiv could differentiate between mental states using an Event-Related Band Power approach and EEG features such as amplitude and power, as compared to Brainvision. The Emotiv device was able to differentiate mental states during these tasks to some degree; however, it was generally poorer than Brainvision, with smaller effect sizes. The Emotiv may be used with reasonable reliability and accuracy in ecological settings and in some clinical contexts (for example, for training professionals); however, Brainvision or other equivalent research-grade devices are still recommended for laboratory or medical applications.
... Image retrieval is a typical application of RSVP-based BCIs. In addition, various BCI applications have been developed, such as speller [11][12][13], image classification [14,15], anomaly detection [16], and anti-deception [17]. ...
Article
Objective: Rapid serial visual presentation (RSVP)-based brain-computer interface (BCI) is an efficient information detection technology that detects event-related potentials (ERPs) evoked by target visual stimuli. The BCI system requires a time-consuming calibration process to build a reliable decoding model for a new user, so zero-calibration has become an important topic in BCI research. Approach: In this paper, we construct an RSVP dataset that includes 31 subjects, and propose a zero-calibration method based on metric-based meta-learning: the ERP Prototypical Matching Net (EPMN). EPMN learns a metric space where the distance between EEG features and ERP prototypes belonging to the same category is smaller than that between different categories. Here, we employ prototype learning to learn a common representation from the ERP templates of different subjects as ERP prototypes. A metric-learning loss function is also proposed to maximize the distance between different classes of EEG and ERP prototypes and minimize the distance between the same classes in the metric space. Main results: The experimental results showed that EPMN achieved a balanced accuracy of 86.34% and outperformed comparable methods. Significance: EPMN can realize zero-calibration for an RSVP-based BCI system.
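Stripped of the learned metric space, the prototype idea reduces to: average per-class ERP templates, then classify a new epoch by its similarity to each prototype. The sketch below uses class-mean waveforms and correlation as the similarity; this is a drastic simplification of EPMN (which learns the embedding with a network), and the toy P300-like bump is invented for illustration.

```python
import numpy as np

def build_prototypes(epochs, labels):
    """ERP 'prototypes' as per-class mean waveforms."""
    return {c: epochs[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(epoch, protos):
    """Assign the class whose prototype correlates best with the epoch."""
    corr = {c: np.corrcoef(epoch, p)[0, 1] for c, p in protos.items()}
    return max(corr, key=corr.get)

# Toy epochs: class 1 carries a P300-like Gaussian bump, class 0 is noise.
rng = np.random.default_rng(2)
t = np.arange(100)
bump = np.exp(-0.5 * ((t - 60) / 8.0) ** 2)
targets = bump + 0.5 * rng.standard_normal((50, 100))
nontargets = 0.5 * rng.standard_normal((50, 100))
epochs = np.vstack([nontargets, targets])
labels = np.r_[np.zeros(50), np.ones(50)].astype(int)
protos = build_prototypes(epochs, labels)
pred = classify(bump + 0.5 * rng.standard_normal(100), protos)
print(pred)   # 1
```

EPMN's contribution is precisely to replace the raw-waveform space here with a learned metric space in which such nearest-prototype matching transfers across subjects.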
... Apart from model-based approaches, local feature descriptors without prior knowledge have also been deployed for mine classification. Among them, the most popular are: the Haar-like feature [55], the combination of Haar features and learned features from a human operator's brain electroencephalogram (EEG) [56], and Haar-like and local binary pattern (LBP) features [4]. The extracted features are usually analysed using machine learning techniques, such as boosting [55] and support vector machines (SVMs) [57]. ...
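A Haar-like feature, as referenced above, is just a signed sum of rectangular regions, computed in constant time from an integral image. The sketch below shows a two-rectangle feature of the kind that responds to the highlight/shadow contrast of a mine-like object; the specific window placement and values are illustrative, not taken from the cited detectors.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: rectangle sums in O(1) after O(N) precompute."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image (half-open)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect(ii, r, c, h, w):
    """Two-rectangle Haar-like feature: left half minus right half —
    a highlight-vs-shadow contrast cue."""
    return (rect_sum(ii, r, c, r + h, c + w // 2)
            - rect_sum(ii, r, c + w // 2, r + h, c + w))

# A bright patch next to a dark one gives a strong positive response.
img = np.zeros((10, 10))
img[2:8, 1:5] = 1.0      # bright "highlight" beside a dark "shadow"
ii = integral_image(img)
feat = haar_two_rect(ii, 2, 1, 6, 8)
print(feat)   # 24.0
```

Boosting (as in [55]) then selects and weights thousands of such features at varying positions and scales.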
Article
Full-text available
Underwater mines pose extreme danger for ships and submarines. Therefore, navies around the world use mine countermeasure (MCM) units to protect against them. One of the measures used by MCM units is mine hunting, which requires searching for all the mines in a suspicious area. It is generally divided into four stages: detection, classification, identification and disposal. The detection and classification steps are usually performed using a sonar mounted on a ship’s hull or on an underwater vehicle. After retrieving the sonar data, military personnel scan the seabed images to detect targets and classify them as mine-like objects (MLOs) or benign objects. To reduce the technical operator’s workload and decrease post-mission analysis time, computer-aided detection (CAD), computer-aided classification (CAC) and automated target recognition (ATR) algorithms have been introduced. This paper reviews mine detection and classification techniques used in the aforementioned systems. The author considered current and previous generation methods starting with classical image processing, and then machine learning followed by deep learning. This review can facilitate future research to introduce improved mine detection and classification algorithms.
... Sidescan sonar (SSS), which can provide high-resolution images of the seabed, is one of the most common sensors for various underwater applications, such as topography measurement [1], searching for sunken vessels and submerged settlements [2], underwater mine detection [3], fish stock detection, cable or pipeline detection [4][5][6], and offshore oil prospecting [7]. Accurate and efficient segmentation of SSS images is essential for underwater object detection. ...
Article
Full-text available
For high-resolution side scan sonar images, accurate and fast segmentation is crucial for underwater target detection and recognition. However, due to sonar's low signal-to-noise ratio (SNR) and complex environmental noise, existing methods with high accuracy and good robustness are mostly iterative methods with high complexity and poor real-time performance. For this purpose, a region growing based segmentation using the likelihood ratio testing method (RGLT) is proposed. This method obtains seed points in the highlight and shadow regions by likelihood ratio testing based on the statistical probability distribution and then grows them according to a similarity criterion. The growth avoids processing the seabed reverberation regions, which account for the largest proportion of sonar images, thus greatly reducing segmentation time and improving segmentation accuracy. In addition, a pre-processing filtering method called standard deviation filtering (STDF) is proposed to improve the SNR and remove speckle noise. Experiments were conducted on three sonar databases, which showed that RGLT significantly improves quantitative metrics such as accuracy and speed, as well as the visual quality of the segmentation. The average accuracy and running time of the proposed segmentation method for 100 × 400 images are 95.90% and 0.44 s, respectively.
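The growth step described in the abstract can be sketched as a 4-connected flood fill that accepts neighbours close to the running region mean. This is a simplification: RGLT selects its seeds by a likelihood-ratio test on the intensity statistics, whereas the seed, tolerance, and toy image below are chosen here for illustration.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=0.15):
    """Grow a 4-connected region from `seed`, accepting neighbours whose
    intensity is within `tol` of the running region mean."""
    H, W = img.shape
    mask = np.zeros((H, W), dtype=bool)
    mask[seed] = True
    total, count = float(img[seed]), 1
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and not mask[nr, nc]:
                if abs(img[nr, nc] - total / count) <= tol:
                    mask[nr, nc] = True
                    total += img[nr, nc]
                    count += 1
                    q.append((nr, nc))
    return mask

# A bright 4x4 "highlight" on a dark background is recovered exactly.
img = np.zeros((10, 10))
img[3:7, 3:7] = 1.0
mask = region_grow(img, (4, 4))
print(mask.sum())   # 16
```

Because growth starts only from highlight/shadow seeds, the dominant seabed-reverberation pixels are never visited, which is where RGLT's speed advantage comes from.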
... BCI systems based on the electroencephalogram (EEG) have been extensively explored due to their easy operation, cost-effectiveness, and zero risk [2]. As one of the most significant branches of EEG-based BCI, event-related potential (ERP) analysis based on the rapid serial visual presentation (RSVP) paradigm has received increasing attention in recent years, with applications ranging from face recognition [3] and medical image diagnosis [4] to target surveillance [5]. However, due to the low signal-to-noise ratio, large inter-subject variabilities, and imbalanced ERP datasets, the generalization of EEG-based BCI systems is still limited. ...
Article
Full-text available
Due to the low signal-to-noise ratio, limited training samples, and inter-subject variabilities in electroencephalogram (EEG) signals, developing a subject-independent brain-computer interface (BCI) system used for new users without any calibration is still challenging. In this letter, we propose a novel Multi-Attention Convolutional Recurrent mOdel (MACRO) for EEG-based event-related potential (ERP) detection in the subject-independent scenario. Specifically, the convolutional recurrent network is designed to capture the spatial-temporal features, while the multi-attention mechanism is integrated to focus on the most discriminative channels and temporal periods of EEG signals. Comprehensive experiments conducted on a benchmark dataset for RSVP-based BCIs show that our method achieves the best performance compared with the five state-of-the-art baseline methods. This result indicates that our method is able to extract the underlying subject-invariant EEG features and generalize to unseen subjects. Finally, the ablation studies verify the effectiveness of the designed multi-attention mechanism in MACRO for EEG-based ERP detection.
... Second, the SVM classifier was trained using samples from all the subjects, whereas previous studies constructed individual classifiers for each subject (Wang and Jung, 2011; Barngrover et al., 2016), so that each subject had their own classification performance. Such subject-specific classifiers are hard to apply to other subjects because of individual differences. ...
Article
Full-text available
Face processing is a spatiotemporal dynamic process involving widely distributed and closely connected brain regions. Although previous studies have examined the topological differences in brain networks between face and non-face processing, the time-varying patterns at different processing stages have not been fully characterized. In this study, dynamic brain networks were used to explore the mechanism of face processing in the human brain. We constructed a set of brain networks based on consecutive short EEG segments recorded during face and non-face (ketch) processing respectively, and analyzed the topological characteristics of these brain networks by graph theory. We found that the topological differences of the backbone of the original brain networks (the minimum spanning tree, MST) between face and ketch processing changed dynamically. Specifically, during face processing, the MST was more line-like over the alpha band in the 0–100 ms time window after stimulus onset, and more star-like over the theta and alpha bands in the 100–200 and 200–300 ms time windows. The results indicated that the brain network was more efficient for information transfer and exchange during face processing compared with non-face processing. In the MST, the nodes with significant differences in betweenness centrality and degree were mainly located in the left frontal area and ventral visual pathway, which are involved in the face-related regions. In addition, the special MST patterns can discriminate between face and ketch processing with an accuracy of 93.39%. Our results suggest that the special MST structures of dynamic brain networks reflect the potential mechanism of face processing in the human brain.
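The MST backbone referenced in this abstract is computed over a weighted functional network; Prim's algorithm is one standard way to extract it. In the sketch below the weights, the 1 − |correlation| convention, and the 4-node toy graph are assumptions for illustration, not the study's data.

```python
import numpy as np

def mst_prim(W):
    """Minimum spanning tree of a dense weighted graph (Prim's algorithm).
    For functional brain networks a common choice is W = 1 - |corr|, so
    strongly synchronized electrode pairs become short edges."""
    n = W.shape[0]
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    dist = W[0].astype(float).copy()       # cheapest link into the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(visited, np.inf, dist)))
        visited[j] = True
        edges.append((int(parent[j]), j))
        closer = (~visited) & (W[j] < dist)
        dist[closer] = W[j][closer]
        parent[closer] = j
    return edges

# A hub node (0) tightly coupled to all others yields a star-like MST,
# the topology the study associates with the 100-300 ms face windows.
W = np.array([[0.0, 0.1, 0.1, 0.1],
              [0.1, 0.0, 0.9, 0.8],
              [0.1, 0.9, 0.0, 0.7],
              [0.1, 0.8, 0.7, 0.0]])
edges = mst_prim(W)
deg = np.bincount(np.ravel(edges), minlength=4)
print(edges, deg.max())   # [(0, 1), (0, 2), (0, 3)] 3
```

The degree and betweenness statistics the study compares are then read directly off these n−1 tree edges: a line-like MST has maximum degree 2, a star-like MST has one node of degree n−1.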
... According to the research of bionics [6], the biological vision system divides an object into several subsystems and realizes the identification through the synthesis of local information. In acoustic image sequences, local features are different from the image patterns of the nearest neighbour [7,8]. ...
Article
Full-text available
This paper proposes underwater target identification with local features and a feature tracking algorithm for acoustic image sequences. Feature detectors and descriptors are key to feature tracking. Their performance in underwater scenes is evaluated by varying multitarget parameters. A comprehensive quantitative investigation into the performance of feature tracking is thereby presented. Experimental results confirm that the proposed algorithm can accurately track potential targets and determine whether they are static targets, dynamic targets, or false alarms according to the tracking trajectories and statistical data.
... The RSVP paradigm uses a picture sequence presented at high speed, containing a small number of target pictures, to induce specific event-related potential (ERP) components, for example the P300 component, a positive potential that appears with a 300∼500 ms delay after the onset of the target stimulus. RSVP-based BCIs have been used for many applications, for example, spellers [3][4], image retrieval [5][6], anomaly detection [7], and anti-deception [8]. ...
Conference Paper
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an efficient information detection technology that detects event-related brain responses evoked by target visual stimuli. However, a time-consuming calibration procedure is needed before a new user can use this system, so it is important to reduce calibration efforts for BCI applications. In this paper, we collect an RSVP-based electroencephalogram (EEG) dataset, which includes 11 subjects. The experimental task is image retrieval. We also propose a multi-source transfer learning framework that utilizes data from other subjects to reduce the amount of data required from the new subject for training the model. A source-selection strategy is first adopted to avoid negative transfer. Then, we propose a transfer learning network based on domain adversarial training. The convolutional neural network (CNN)-based network is designed to extract common features of EEG data from different subjects, while the discriminator tries to distinguish features from different subjects. In addition, a classifier is added for learning semantic information, and conditional information and a gradient penalty are added to enable stable training of the adversarial network and improve performance. The experimental results demonstrate that our proposed method outperforms a series of state-of-the-art and baseline approaches.
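The abstract mentions a source-selection strategy to avoid negative transfer but does not specify it. One plausible sketch, purely an assumption here, is to rank source subjects by a distribution distance (maximum mean discrepancy, MMD) between their EEG features and the target subject's, and keep only the closest sources:

```python
import numpy as np

rng = np.random.default_rng(2)

def mmd_rbf(x, y, gamma=0.5):
    """Biased MMD^2 estimate with an RBF kernel (illustrative)."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Hypothetical per-subject EEG feature matrices (trials x features).
target = rng.standard_normal((50, 8))
sources = {
    "s1": rng.standard_normal((50, 8)) + 0.1,  # close to target
    "s2": rng.standard_normal((50, 8)) + 2.0,  # far from target
    "s3": rng.standard_normal((50, 8)) + 0.2,
}

# Keep the sources whose feature distribution is nearest to the target;
# distant subjects are dropped to avoid negative transfer.
scores = {name: mmd_rbf(x, target) for name, x in sources.items()}
selected = sorted(scores, key=scores.get)[:2]
```

The selected subjects would then feed the domain-adversarial network; subject names and feature dimensions above are synthetic.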
... In [19], Sawas and Petillot applied the Haar-like features and a cascade of boosted classifiers, which were first introduced by Viola and Jones [31]. In [21], Barngrover et al. also utilized the Haar-like feature classifier to generate image patches (around regions of interest), which are then processed by subjects using the rapid serial visual presentation paradigm. Other feature-based methods used the geometric visual descriptors, such as scale-invariant feature transform (SIFT) [32], [33], [18] and local binary pattern (LBP) [34], [20]. ...
Article
Full-text available
With the advances in sonar imaging technology, sonar imagery has increasingly been used for oceanographic studies in civilian and military applications. High-resolution imaging sonars can be mounted on various survey platforms, typically autonomous underwater vehicles, which provide enhanced speed and improved data quality with long-range support. This paper addresses the automatic detection of mine-like objects using sonar images. The proposed Gabor-based detector is designed as a feature pyramid network with a small number of trainable weights. Our approach combines both semantically weak and strong features to handle mine-like objects at multiple scales effectively. For feature extraction, we introduce a parameterized Gabor layer which improves the generalization capability and computational efficiency. The steerable Gabor filtering modules are embedded within the cascaded layers to enhance the scale and orientation decomposition of images. The entire deep Gabor neural network is trained in an end-to-end manner from input sonar images with annotated mine-like objects. An extensive experimental evaluation on a real sonar dataset shows that the proposed method achieves competitive performance compared to the existing approaches.
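A parameterized Gabor layer is built from kernels like the following. This numpy sketch shows only the classical Gabor construction underlying such a layer; the sizes and parameters are illustrative, and in the paper the parameters are trainable rather than fixed.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, psi=0.0, gamma=0.5):
    """Real part of a 2-D Gabor filter at orientation theta (radians):
    a Gaussian envelope modulating a cosine carrier along theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A small steerable bank: 4 orientations, in the spirit of the scale and
# orientation decomposition described in the abstract.
bank = [gabor_kernel(9, t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a sonar image with each kernel in `bank` yields the orientation-decomposed responses that the cascaded layers would then process.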
... Cho et al. [18] tried to improve the recognition accuracy by using multi-angle view mine simulation and template matching. Away from model-based approaches, local feature descriptors without prior knowledge, such as the Haar-like feature [19], the Haar-like and local binary pattern (LBP) features [3], the combination of Haar features and learned features from a human operator's brain electroencephalogram (EEG) [20] have also been proposed for mine recognition. The extracted features are usually combined with some state-of-the-art machine learning approaches, such as boosting [19] and support vector machines (SVMs) [21]. ...
Article
Full-text available
Sidescan sonars are increasingly used in underwater search and rescue for drowning victims, wrecks and airplanes. Automatic object classification or detection methods can greatly assist in long searches, where sonar operators may become exhausted and miss objects. However, most existing underwater object detection methods for sidescan sonar images are aimed at detecting mine-like objects, ignoring the classification of civilian objects, mainly due to a lack of datasets. In this study, we therefore focus on the multi-class classification of drowning victims, wrecks, airplanes, mines and seafloor in sonar images. First, through long-term accumulation, we built a real sidescan sonar image dataset named SeabedObjects-KLSG, which currently contains 385 wreck, 36 drowning victim, 62 airplane, 129 mine and 578 seafloor images. Second, because the real dataset is imbalanced, we propose a semisynthetic data generation method for producing sonar images of airplanes and drowning victims, which uses optical images as input and combines image segmentation with intensity distribution simulation of different regions. Finally, we demonstrate that by transferring a pre-trained deep convolutional neural network (CNN), e.g. VGG19, and fine-tuning it using 70% of the real dataset and the semisynthetic data for training, the overall accuracy on the remaining 30% of the real dataset can be improved to 97.76%, the highest among all the methods. Our work indicates that the combination of semisynthetic data generation and deep transfer learning is an effective way to improve the accuracy of underwater object classification.
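The semisynthetic generation idea, rendering sonar-style intensities region by region from a segmented optical image, can be sketched as below. The Rayleigh speckle model and the per-region scales are assumptions of this sketch, not the paper's exact intensity simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

def semisynthetic_sonar(mask, scales=(5.0, 20.0, 10.0)):
    """Render a sonar-style image from a segmentation mask.

    mask: integer image with 0 = shadow, 1 = object body, 2 = highlight
    (e.g. derived by segmenting an optical photo of an airplane).
    Each region's intensity is drawn from a Rayleigh distribution, a
    common speckle model, with a region-specific scale.
    """
    out = np.zeros(mask.shape)
    for label, scale in enumerate(scales):
        region = mask == label
        out[region] = rng.rayleigh(scale, size=region.sum())
    return out

# Toy mask: a shadow stripe, mid-intensity body, and a bright highlight.
mask = np.ones((32, 32), dtype=int)
mask[:, :8] = 0
mask[12:20, 12:20] = 2

img = semisynthetic_sonar(mask)
```

Images produced this way would be mixed with real data when fine-tuning the pre-trained CNN.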
... Motion of the side scan sonar [1] may blur the sonar image of underwater objects. If the movement speed is too fast, the collected sonar image will be too blurred for the required information to be extracted. ...
Article
Full-text available
In order to recover blurred sonar images collected by a side scan sonar in motion, we propose a solution based on conditional adversarial networks that deblurs sonar images with unknown motion blur kernels. First, we use improved conditional adversarial networks to recover the sonar image and improve the loss function, so that the quality of the generated images improves while training stability is enhanced. Then we propose a method for generating blurred sonar images; the blurred images it produces are closer to real blurred sonar images. Finally, we built our own sonar image set and trained the network with the two-timescale update rule. The final results show that images restored by this method have higher clarity.
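Training pairs for such a deblurring network are typically built by convolving sharp images with synthetic motion blur kernels. A minimal sketch of that data-generation step follows; the kernel construction and the Rayleigh speckle stand-in for a sonar frame are assumptions of this example, not the paper's generation method.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(4)

def motion_blur_kernel(length, angle):
    """Normalized linear motion blur kernel of given length and angle."""
    k = np.zeros((length, length))
    c = (length - 1) / 2
    # Rasterize a line through the kernel center along the motion direction.
    for t in np.linspace(-c, c, length * 4):
        r = int(round(c + t * np.sin(angle)))
        col = int(round(c + t * np.cos(angle)))
        k[r, col] = 1.0
    return k / k.sum()

# Blur a synthetic sonar frame; (sharp, blurred) pairs like this are what
# a conditional adversarial deblurring model would be trained on.
sharp = rng.rayleigh(1.0, size=(64, 64))
kernel = motion_blur_kernel(7, angle=0.0)
blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
```

Varying `length` and `angle` per sample approximates the unknown blur kernels mentioned in the abstract.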
... Side scan sonar (SSS), among the most common sensors used in ocean survey, can provide images of the seafloor and underwater target. Target detection based on SSS image has a great variety of applications in marine archaeological surveying [1], oceanic mapping [2], and underwater detection [3][4][5], in which the main task is SSS image segmentation. ...
Article
Full-text available
This paper presents a novel and practical convolutional neural network architecture for semantic segmentation of side scan sonar (SSS) images. As a widely used sensor for marine survey, SSS provides high-resolution images of the seafloor and underwater targets. However, because of the large number of background pixels in SSS images, class imbalance remains an issue. Moreover, SSS images contain undesirable speckle noise and intensity inhomogeneity. We define and detail a network and training strategy that tackles these three important issues for SSS image segmentation. Our proposed method performs image-to-image prediction by leveraging fully convolutional neural networks and deeply-supervised nets. The architecture consists of an encoder network to capture context, a corresponding decoder network to restore full input-size resolution feature maps from low-resolution ones for pixel-wise classification, and a single-stream deep neural network with multiple side-outputs to optimize edge segmentation. We measured the prediction time of our network on our dataset, implemented on an NVIDIA Jetson AGX Xavier, and compared it to other similar semantic segmentation networks. The experimental results show that the presented method for SSS image segmentation brings obvious advantages and is applicable to real-time processing tasks.
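One common remedy for the class imbalance the abstract raises, though not necessarily the paper's exact training strategy, is median-frequency balancing of the per-class loss weights:

```python
import numpy as np

def median_frequency_weights(label_map, n_classes):
    """Per-class loss weights via median-frequency balancing.

    Rare classes (e.g. object-highlight and object-shadow pixels in an
    SSS image) get weights above 1, while the dominant seafloor
    background is down-weighted below 1.
    """
    freq = np.array([(label_map == c).mean() for c in range(n_classes)])
    return np.median(freq) / freq

# Toy label map: 0 = background, 1 = highlight, 2 = shadow.
labels = np.zeros((64, 64), dtype=int)
labels[10:20, 10:20] = 1   # 100 highlight pixels
labels[20:26, 10:20] = 2   # 60 shadow pixels

w = median_frequency_weights(labels, 3)
```

Multiplying each pixel's cross-entropy term by `w[class]` keeps the abundant seafloor class from dominating the gradient.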
... Recent works have tended to merge the detection and classification of objects in images into a unified stage. Barngrover et al. [12] used a brain-computer interface that combines computer vision with human vision, in which a Haar-like feature [13] classifier is trained on a large data set to detect objects. Sadjadi et al. [14] proposed a subspace-based detector. ...
Article
We offer a new unsupervised, statistically based algorithm for the detection of underwater objects in synthetic aperture sonar (SAS) imagery, chosen for its high-resolution output and because its resolution is independent of range. In contrast to other methods that do not utilize the statistical model of the shadow region, our algorithm combines highlight detection and shadow detection using a weighted likelihood ratio test, while exploiting the expected spatial distribution of potential objects. We detect highlights by a higher-order-statistics representation of the image, followed by a segmentation process to form a region-of-interest (ROI). Then, taking into account the sonar elevation and scan angle, for each ROI we use a support vector machine (SVM) over the statistical features of the pixels within the ROI to detect shadow-related pixels and background pixels. Our algorithm has the benefit of being robust as a result of setting its main parameters in situ. Moreover, we do not require knowledge about the target's shape or size, making our algorithm suitable for all sonar detection applications and sonar types. To test detection performance, using our own autonomous underwater vehicle, we collected 270 sonar images, which we also share with the community. Compared to the results of benchmark schemes, our detection algorithm shows a trade-off between the probability of detection and the false alarm rate (FAR) that is close to the Kullback-Leibler (KL) divergence lower bound.
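A higher-order-statistics representation of the kind the abstract describes can be sketched as a blockwise kurtosis map: blocks containing a few bright highlight pixels become heavy-tailed and stand out. Kurtosis is one specific higher-order statistic chosen for this sketch; the block size and speckle model are illustrative.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(5)

def kurtosis_map(img, win=8):
    """Local excess kurtosis per non-overlapping block.

    Blocks whose intensity distribution is heavy-tailed (a few bright
    outliers over speckle) score high and become ROI candidates.
    """
    h, w = img.shape
    out = np.zeros((h // win, w // win))
    for i in range(h // win):
        for j in range(w // win):
            block = img[i * win:(i + 1) * win, j * win:(j + 1) * win]
            out[i, j] = kurtosis(block, axis=None)
    return out

# Rayleigh speckle background with two bright target-like returns
# inside the block at index (2, 2).
img = rng.rayleigh(1.0, size=(64, 64))
img[18, 18] += 20.0
img[19, 20] += 20.0

kmap = kurtosis_map(img)
```

Thresholding `kmap` and segmenting the winning blocks would form the ROIs passed to the shadow/background SVM stage.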
... superimposed with small target airplane images, which could vary in location and angle within an elliptical focal area. Correspondingly, in (Barngrover et al., 2016), the prime goal was to correctly identify sonar images of mine-like objects on the sea bed. Accordingly, a three-stage BCI system was developed whereby the initial stages entail computer vision procedures e.g. ...
Article
Full-text available
Rapid serial visual presentation (RSVP) combined with the detection of event-related brain responses facilitates the selection of relevant information contained in a stream of images presented rapidly to a human. Event-related potentials (ERPs), measured non-invasively with electroencephalography (EEG), can be associated with infrequent target stimuli (images) in groups of images, potentially providing an interface for human-machine symbiosis, where humans can interact and interface with a computer without moving, and which may offer faster image sorting than scenarios where humans are expected to physically react when a target image is detected. Certain features of the human visual system impact on the success of the RSVP paradigm: pre-attentive processing supports the identification of target information ~100 ms following information presentation. This paper presents a comprehensive review and evaluation of research in the broad field of RSVP-based brain-computer interfaces (BCIs). Applications that use RSVP-based BCIs are classified based on the operation mode, whilst protocol design considerations are critiqued. Guidelines for using the RSVP-based BCI paradigms are defined and discussed, with a view to further standardization of methods and experimental evidence gathering to support the use of RSVP-based BCIs in practice.
Article
Objective. While brain–computer interface (BCI) based on rapid serial visual presentation (RSVP) is widely used in target detection, patterns of event-related potential (ERP), as well as the performance on detecting inconspicuous targets, remain unknown. Moreover, participant-screening methods to exclude 'BCI-blind' users are still lacking. Approach. An RSVP paradigm was designed with targets of varied concealment, size, and location. ERPs (e.g. P300 and N2pc) and target detection accuracy were compared among these conditions. The relationship between participants' attention scores and target detection accuracy was also analyzed to test attention level as a criterion for participant screening. Main results. Statistical analysis showed that target concealment and size significantly influenced the ERP. In particular, ERPs for inconspicuous targets, such as concealed and small targets, exhibited lower amplitudes and longer latencies. Consistently, detection accuracy in the inconspicuous condition was significantly lower than in the conspicuous condition. In addition, a significant association was found between attention scores and target detection accuracy for camouflaged targets. Significance. The study was the first to address ERP features across the dimensions of concealment, size, and location. The conclusions provide insight into the relationship between ERP decoding and target properties. In addition, the association between attention scores and detection accuracy implies a promising method for screening well-behaved participants for camouflaged target detection.
Article
The rapid serial visual presentation (RSVP) paradigm, which is based on the electroencephalogram (EEG) technology, is an effective approach for object detection. It aims to detect the event-related potentials (ERP) components evoked by target images for rapid identification. However, the object detection performance within this paradigm is affected by the visual disparity between adjacent images in a sequence. Currently, there is no objective metric to quantify this visual difference. Consequently, a reliable image sorting method is required to ensure the generation of a smooth sequence for effective presentation. In this paper, we propose a novel semantic image sorting method for sorting RSVP sequences, which aims at generating sequences that are perceptually smoother in terms of the human visual experience. We conducted a comparative analysis between our method and two existing methods for generating RSVP sequences using both qualitative and quantitative assessments. A qualitative evaluation revealed that the sequences generated by our method were smoother in subjective vision and were more effective in evoking stronger ERP components than those generated by the other two methods. Quantitatively, our method generated semantically smoother sequences than the other two methods. Furthermore, we employed four advanced approaches to classify single-trial EEG signals evoked by each of the three methods. The classification results of the EEG signals evoked by our method were superior to those of the other two methods. In summary, the results indicate that the proposed method can significantly enhance the object detection performance in RSVP-based sequences.
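The goal of ordering images so that adjacent items look similar can be sketched with a greedy nearest-neighbour chain over image embeddings. This is a deliberately simple stand-in, not the paper's semantic sorting method; the embeddings below are synthetic, as if produced by a pretrained CNN.

```python
import numpy as np

rng = np.random.default_rng(6)

def greedy_smooth_order(features):
    """Order items so adjacent feature vectors stay similar.

    Start from the first item, then repeatedly append the unused item
    closest to the last one, producing a sequence with small visual
    jumps between neighbours.
    """
    n = len(features)
    unused = set(range(1, n))
    order = [0]
    while unused:
        last = features[order[-1]]
        nxt = min(unused, key=lambda i: np.linalg.norm(features[i] - last))
        order.append(nxt)
        unused.remove(nxt)
    return order

# Hypothetical image embeddings for a 12-image RSVP sequence.
emb = rng.standard_normal((12, 16))
order = greedy_smooth_order(emb)
```

Presenting the images in `order` rather than at random reduces abrupt visual disparity between adjacent frames, which is the effect the paper quantifies with EEG.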
Article
Objective. Many subject-dependent methods were proposed for electroencephalogram (EEG) classification in rapid serial visual presentation (RSVP) task, which required a large amount of data from new subject and were time-consuming to calibrate system. Cross-subject classification can realize calibration reduction or zero calibration. However, cross-subject classification in RSVP task is still a challenge. Approach. This study proposed a multi-source domain adaptation based tempo-spatial convolution (MDA-TSC) network for cross-subject RSVP classification. The proposed network consisted of three modules. First, the common feature extraction with multi-scale tempo-spatial convolution was constructed to extract domain-invariant features across all subjects, which could improve generalization of the network. Second, the multi-branch domain-specific feature extraction and alignment was conducted to extract and align domain-specific feature distributions of source and target domains in pairs, which could consider feature distribution differences among source domains. Third, the domain-specific classifier was exploited to optimize the network through loss functions and obtain prediction for the target domain. Main results. The proposed network was evaluated on the benchmark RSVP dataset, and the cross-subject classification results showed that the proposed MDA-TSC network outperformed the reference methods. Moreover, the effectiveness of the MDA-TSC network was verified through both ablation studies and visualization. Significance. The proposed network could effectively improve cross-subject classification performance in RSVP task, and was helpful to reduce system calibration time.
Article
Disposal of industrial and hazardous waste in the ocean was a pervasive global practice in the 20th century. Uncertainty in the quantity, location, and contents of dumped materials underscores ongoing risks to marine ecosystems and human health. This study presents an analysis of a wide-area side-scan sonar survey conducted with autonomous underwater vehicles (AUVs) at a dump site in the San Pedro Basin, California. Previous camera surveys located 60 barrels and other debris. Sediment analysis in the region showed varying concentrations of the insecticidal chemical dichlorodiphenyltrichloroethane (DDT), of which an estimated 350-700 t were discarded in the San Pedro Basin between 1947 and 1961. A lack of primary historical documents specifying DDT acid waste disposal methods has contributed to the ambiguity surrounding whether dumping occurred via bulk discharge or containerized units. Barrels and debris observed during previous surveys were used for ground truth classification algorithms based on size and acoustic intensity characteristics. Image and signal processing techniques identified over 74,000 debris targets within the survey region. Statistical, spectral, and machine learning methods characterize seabed variability and classify bottom-type. These analytical techniques combined with AUV capabilities provide a framework for efficient mapping and characterization of uncharted deep-water disposal sites.
Article
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) facilitates the high-throughput detection of rare target images by detecting evoked event-related potentials (ERPs). At present, the decoding accuracy of the RSVP-based BCI system limits its practical applications. This study introduces eye movements (gaze and pupil information), referred to as the EYE modality, as another useful source of information to combine with EEG-based BCI, forming a novel system to detect target images in RSVP tasks. We performed an RSVP experiment, recorded EEG signals and eye movements simultaneously during a target detection task, and constructed a multi-modal dataset including 20 subjects. We also propose a cross-modal guiding and fusion network to fully utilize the EEG and EYE modalities and fuse them for better RSVP decoding performance. In this network, a two-branch backbone extracts features from the two modalities. A Cross-Modal Feature Guiding (CMFG) module guides EYE modality features to complement the EEG modality for better feature extraction. A Multi-scale Multi-modal Reweighting (MMR) module enhances the multi-modal features by exploring intra- and inter-modal interactions. Finally, a Dual Activation Fusion (DAF) module modulates the enhanced multi-modal features for effective fusion. Our proposed network achieved a balanced accuracy of 88.00% (±2.29) on the collected dataset. Ablation studies and visualizations revealed the effectiveness of the proposed modules. This work demonstrates the value of introducing the EYE modality in RSVP tasks, and the proposed network is a promising method for RSVP decoding that further improves the performance of RSVP-based target detection systems.
Chapter
Brain-Computer Interface (BCI) is a communication system that transmits information between the brain and the outside world without relying on peripheral nerves and muscles. The Rapid Serial Visual Presentation (RSVP)-based BCI system is an efficient and robust information retrieval method based on human vision. However, the current RSVP-BCI system requires a time-consuming calibration procedure for each new subject, which greatly restricts the use of the BCI system. In this study, we propose a zero-training method based on a convolutional neural network and a graph attention network with adaptive graph learning. First, a single-layer convolutional neural network is used to extract EEG features. Then, the extracted features from similar samples are adaptively connected to construct a graph. A graph attention network is employed to classify the target sample by decoding the connection relationships of adjacent samples in the graph. Our proposed method achieves 86.76% mean balanced accuracy (BA) on a self-collected dataset containing 31 subjects, performing better than the comparison methods. This indicates our method can realize zero calibration for an RSVP-based BCI system. Keywords: Brain-Computer Interface (BCI), Adaptive graph learning, Graph attention network, Zero-training, RSVP
Chapter
For underwater acoustic target recognition, achieving good classification accuracy from radiated acoustic signals is a challenging task. Due to the complex and changeable underwater environment, when two target classes differ little in some sensitive characteristics, a classifier trained on a single feature cannot output the correct classification. In addition, the complex background noise of the target also degrades the quality of the feature data. Here, we present a feature fusion strategy to identify underwater acoustic targets with a one-dimensional Convolutional Neural Network. The method consists of three steps. First, since phase spectrum information is usually ignored, a Long Short-Term Memory (LSTM) network is adopted to extract phase features and frequency features of acoustic signals recorded in a real marine environment. Second, to leverage the frequency-based and phase-based features in a single model, we introduce a feature fusion method to fuse the different features. Finally, the fused features are used as input data to train and validate the model. The results show the superiority of our algorithm over single-feature data, meeting the requirements of intelligent underwater acoustic target recognition to a certain extent.
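The idea of combining magnitude and phase spectra into one feature vector can be sketched with plain concatenation. This is a deliberately minimal stand-in for the paper's learned LSTM-based fusion; the function name, bin count, and the toy tonal signal are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(10)

def freq_phase_features(signal, n_bins=32):
    """Build a fused feature vector from magnitude and phase spectra.

    The two feature groups are normalized separately and concatenated,
    so neither dominates the other purely by scale.
    """
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)[:n_bins]
    phase = np.angle(spec)[:n_bins]
    mag = mag / (np.linalg.norm(mag) + 1e-12)  # scale-invariant magnitude
    phase = phase / np.pi                      # phase mapped into [-1, 1]
    return np.concatenate([mag, phase])

# Toy "radiated noise": a tonal component buried in broadband noise.
t = np.arange(1024)
signal = np.sin(2 * np.pi * 0.02 * t) + 0.5 * rng.standard_normal(1024)
feat = freq_phase_features(signal)
```

Vectors like `feat` would be the input to the 1-D CNN classifier described in the abstract.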
Article
Full-text available
To overcome the shortcomings of traditional manual detection of underwater targets in side-scan sonar (SSS) images, a real-time automatic target recognition (ATR) method is proposed in this paper. The method consists of image preprocessing, sampling, ATR by integration of the transformer module and YOLOv5s (TR–YOLOv5s), and target localization. Considering the target-sparse and feature-barren characteristics of SSS images, a novel TR–YOLOv5s network and a down-sampling principle are put forward, and an attention mechanism is introduced to meet the accuracy and efficiency requirements of underwater target recognition. Experiments verified that the proposed method achieved 85.6% mean average precision (mAP) and an 87.8% macro-F2 score, brought 12.5% and 10.6% gains compared with a YOLOv5s network trained from scratch, and had a real-time recognition speed of about 0.068 s per image.
Article
This work demonstrates that automated mine countermeasure (MCM) tasks are greatly facilitated by first characterizing the seafloor environment in which the sensors operate, as a first step within a comprehensive strategy for exploiting information from available sensors, multiple detector types, measured features, and target classifiers, depending on the specific seabed characteristics present in the high-frequency synthetic aperture sonar (SAS) imagery used to perform MCM tasks. This approach is able to adapt as environmental characteristics change and can recognize novel seabed types. Classifiers are then adaptively retrained through active learning in these unfamiliar seabed types, resulting in improved mitigation of challenging environmental clutter as it is encountered. Further, a segmentation-constrained network algorithm is introduced to enable enhanced generalization when recognizing mine-like objects from environments underrepresented in the training data. Additionally, a fusion approach is presented that combines multiple detectors, feature types spanning both measured expert features and deep learning, and an ensemble of classifiers for the particular seabed mixture proportions measured around each detected target. The environmentally adaptive approach is demonstrated to provide the best overall performance for automated mine-like object recognition.
Article
Full-text available
The classification of low signal-to-noise ratio (SNR) underwater acoustic signals in complex acoustic environments, with increasingly small target radiation noise, is a hot research topic. This paper proposes a new signal-processing method, the low-SNR underwater acoustic signal classification method (LSUASC), based on intrinsic mode features preserved through dimensionality reduction. Using the LSUASC method, the underwater acoustic signal is first transformed with the Hilbert-Huang Transform (HHT) and the intrinsic mode is extracted. The intrinsic mode is then transformed into corresponding Mel-frequency cepstrum coefficients (MFCCs) to form a multidimensional feature vector of the low-SNR acoustic signal. Next, a semi-supervised fuzzy rough Laplacian Eigenmap (SSFRLE) method is proposed to perform manifold dimension reduction (local sparse and discrete features of underwater acoustic signals can be maintained in the reduction process), and principal component analysis (PCA) is adopted within the reduction to define the reduced dimension adaptively. Finally, Fuzzy C-Means (FCM), which can classify data with weak features, is adopted to cluster the signal features after dimensionality reduction. The experimental results presented here show that the LSUASC method is able to classify low-SNR underwater acoustic signals with high accuracy.
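The abstract's idea of letting PCA define the reduced dimension adaptively can be sketched as follows: keep the smallest number of principal components explaining a chosen fraction of total variance. The data here are synthetic, with a known low-dimensional structure; the threshold value is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

def pca_adaptive(x, var_kept=0.95):
    """PCA that picks the output dimension from explained variance.

    Keeps the smallest number of principal components whose cumulative
    explained variance reaches `var_kept`.
    """
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    var = s**2 / (s**2).sum()
    k = int(np.searchsorted(np.cumsum(var), var_kept) + 1)
    return xc @ vt[:k].T, k

# Feature vectors where only 3 latent directions carry most variance,
# mimicking MFCC-style features with redundant dimensions.
latent = rng.standard_normal((200, 3))
mixing = rng.standard_normal((3, 20))
features = latent @ mixing + 0.01 * rng.standard_normal((200, 20))

reduced, k = pca_adaptive(features)
```

The reduced vectors would then be handed to a weak-feature clusterer such as Fuzzy C-Means, per the abstract's pipeline.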
Article
Objective: Brain-computer interface (BCI) aims at providing a new way of communication between the human brain and external devices. One of the major tasks associated with the BCI system is to improve the classification performance of motor imagery (MI) signals. Electroencephalogram (EEG) signals are widely used for MI BCI systems. Raw EEG signals are usually non-stationary time series with weak class properties, degrading classification performance. Approach: Nonnegative matrix factorization (NMF) has been successfully applied to pattern extraction, providing meaningful data representations. However, NMF is unsupervised and cannot make use of label information. Based on the label information of MI EEG data, we propose a novel method, called double constrained nonnegative matrix factorization (DCNMF), to improve the classification performance of NMF on MI BCI. The proposed method constructs a pair of label matrices as constraints on the NMF procedure, so that EEGs with the same class label have similar representations in the low-dimensional space, while EEGs with different class labels have representations as dissimilar as possible. Accordingly, the extracted features acquire a clear class property, which benefits the classification of MI EEG. Main results: This study is conducted on the BCI competition III datasets (I and IVa). The proposed method achieved high average accuracy across both datasets (79.00% for dataset I, 77.78% for dataset IVa), outperforming existing studies in the literature by about 10%. Significance: Our study provides a novel solution for MI BCI analysis from the perspective of label constraints; it enables semi-supervised feature learning and significantly improves classification performance.
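The unconstrained NMF core that DCNMF builds on can be sketched with the classical Lee–Seung multiplicative updates. The paper's contribution, the pair of label-matrix constraints, is omitted here; this shows only the standard factorization on synthetic nonnegative data.

```python
import numpy as np

rng = np.random.default_rng(8)

def nmf(v, rank, iters=200, eps=1e-9):
    """Plain NMF via multiplicative updates (Frobenius loss).

    Factorizes nonnegative v (n x m) as w @ h with w >= 0, h >= 0.
    """
    n, m = v.shape
    w = rng.random((n, rank)) + 0.1
    h = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Synthetic stand-in for a nonnegative EEG feature matrix.
v = rng.random((30, 20))
w, h = nmf(v, rank=5)
err = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
```

DCNMF would add penalty terms to these updates so same-class columns of `h` are pulled together and different-class columns pushed apart.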
Article
Sonar is one of the most important tools for underwater object detection and submarine topography reconstruction. Classifying sonar images automatically and accurately is essential for the navigation and path planning of autonomous underwater vehicles (AUVs). However, because of the intensity inhomogeneity and speckle noise in sonar images, it is difficult to obtain segmentation results of high accuracy. To address these issues, in this paper we advocate a segmentation method incorporating simple linear iterative clustering (SLIC) and an adaptive intensity constraint into a Markov random field (MRF), to segment sonar images with intensity inhomogeneity into object highlight, object shadow and background areas. The main procedures of the proposed work are as follows: first, SLIC separates sonar images into homogeneous superpixels; second, the homogeneous patches, with a novel intensity constraint strategy, are utilized to optimize the segmentation result of the MRF at each iteration. Experimental results reveal that the proposed method performs well and fast on real sonar images with intensity inhomogeneity problems.
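The MRF refinement step can be sketched with iterated conditional modes (ICM), a standard optimizer for such models, though the paper's superpixel-constrained iteration is more elaborate. Each pixel label balances an intensity data term against agreement with its 4-neighbourhood; the class means, `beta`, and the toy two-region image are assumptions of this sketch.

```python
import numpy as np

def icm_smooth(labels, intensities, means, beta=1.5, iters=5):
    """MRF-style label refinement via iterated conditional modes.

    Each pixel is re-labelled to minimize (intensity - class mean)^2
    minus beta times the number of agreeing 4-neighbours, which smooths
    away isolated speckle-induced label errors.
    """
    lab = labels.copy()
    h, w = lab.shape
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                costs = []
                for c, mu in enumerate(means):
                    data = (intensities[i, j] - mu) ** 2
                    agree = 0
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w and lab[ni, nj] == c:
                            agree += 1
                    costs.append(data - beta * agree)
                lab[i, j] = int(np.argmin(costs))
    return lab

rng = np.random.default_rng(9)
# Noisy two-region image: left dark (shadow-like), right bright (highlight-like).
truth = np.zeros((16, 16), dtype=int)
truth[:, 8:] = 1
img = np.where(truth == 1, 3.0, 0.0) + rng.standard_normal((16, 16))
noisy_init = (img > 1.5).astype(int)   # thresholded, speckled initial labels
clean = icm_smooth(noisy_init, img, means=(0.0, 3.0))
```

In the paper's method, the SLIC superpixels and the adaptive intensity constraint replace the fixed class means used here.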
Article
This article presents an automatic real-time object detection method using sidescan sonar (SSS) and an onboard graphics processing unit (GPU). The detection method is based on a modified convolutional neural network (CNN), which is referred to as self-cascaded CNN (SC-CNN). The SC-CNN model segments SSS images into object-highlight, object-shadow, and seafloor areas, and it is robust to speckle noise and intensity inhomogeneity. Compared with typical CNN, SC-CNN utilizes crop layers which enable the network to use local and global features simultaneously without adding convolution parameters. Moreover, to take the local dependencies of class labels into consideration, the results of SC-CNN are postprocessed using Markov random field. Furthermore, the sea trial for real-time object detection via the presented method was implemented on our autonomous underwater vehicle (AUV) named SAILFISH via its GPU module at Jiaozhou Bay Bridge, Qingdao, China. The results show that the presented method for SSS image segmentation has obvious advantages when compared with the typical CNN and unsupervised segmentation methods, and is applicable in real-time object detection task.
Conference Paper
This paper introduces a new unsupervised, statistically based algorithm for the detection of underwater objects in sonar imagery. Highlights are detected by a higher-order-statistics representation of the image, followed by a segmentation process that forms a region of interest (ROI). Our algorithm sets its main parameters in situ and avoids the need for parameter calibration. Moreover, we require no knowledge of the target's shape or size, making the algorithm robust across sonar detection applications. Results obtained from a real sonar system show a good trade-off between probability of detection and false alarm rate (FAR).
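A minimal sketch of the idea above — a higher-order-statistics (here, local kurtosis) map thresholded at a level set from the image itself rather than by external calibration. The window size, the mean-plus-k-sigma threshold, and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def kurtosis_map(img, win=5):
    """Local kurtosis over a sliding window: a simple
    higher-order-statistics representation; bright, impulsive
    highlights stand out against speckle."""
    h, w = img.shape
    pad = win // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            patch = p[y:y + win, x:x + win].ravel()
            m = patch.mean()
            s2 = patch.var() + 1e-12            # guard against flat patches
            out[y, x] = ((patch - m) ** 4).mean() / s2 ** 2
    return out

def detect_rois(img, win=5, k=2.0):
    """Threshold the kurtosis map at mean + k*std, estimated in situ
    from the image itself."""
    km = kurtosis_map(img, win)
    thr = km.mean() + k * km.std()
    return km > thr
```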
Conference Paper
Face recognition plays an important role in our daily lives. However, computer face recognition performance degrades dramatically in the presence of variations in illumination, head pose, and occlusion. In contrast, the human brain can recognize target faces over a much wider range of conditions. In this paper, we investigate target face detection through electroencephalography (EEG). We address the problem of single-trial target-face detection in a rapid serial visual presentation (RSVP) paradigm. Whereas most previous approaches used support vector machines (SVMs), we use a convolutional neural network (CNN) to classify EEG signals recorded while subjects view target and non-target face stimuli. The CNN outperforms the SVM algorithm commonly used for event-related-potential (ERP) detection. We also compare the difference in performance when using animal stimuli. The proposed system could potentially be used in a rapid face recognition system.
Article
Creating human-informative signal processing systems for the underwater acoustic environment that do not generate operator cognitive saturation and overload is a major challenge. To alleviate cognitive operator overload, we present a visual analytics methodology in which multiple beam-formed sonar returns are mapped to an optimized 2-D visual representation, which preserves the relevant data structure. This representation alerts the operator as to which beams are likely to contain anomalous information by modeling a latent distribution of information for each beam. Sonar operators therefore focus their attention only on the surprising events. In addition to the principled visualization of high-dimensional uncertain data, the system quantifies anomalous information using a Fisher Information measure. Central to this process is the novel use of both signal and noise observation modeling to characterize the sensor information. A demonstration of detecting exceptionally low signal-to-noise ratio targets embedded in real-world 33-beam passive sonar data is presented.
Article
A new algorithm called the Mondrian detector has been developed for object detection in high-frequency synthetic aperture sonar (SAS) imagery. If a second (low) frequency-band image is available, the algorithm can seamlessly exploit the additional information via an auxiliary prescreener test. This flexible single-band and multiband functionality fills an important capability gap. The algorithm's overall prescreener component limits the number of potential alarms. The main module of the method then searches for areas that pass a subset of pixel-intensity tests. A new set of reliable classification features has also been developed in the process. The overall framework has been kept uncomplicated intentionally in order to facilitate performance estimation, to avoid requiring dedicated training data, and to permit delayed real-time detection at sea on an autonomous underwater vehicle. The promise of the new algorithm is demonstrated on six substantial data sets of real SAS imagery collected at various geographical sites that collectively exhibit a wide range of diverse seafloor characteristics. The results show that, as with Mondrian's art, simplicity can be powerful.
Conference Paper
Detection of mines on the seafloor is most accurately performed by a human operator, yet it is a difficult task for machine vision methods. In addition, mine detection calls for highly accurate detection because of the high-risk nature of the problem. Advancements in the capabilities of sonar imaging and autonomous underwater vehicles have led to research using machine learning techniques and well-known computer vision features (Barngrover et al., IEEE J. Ocean Eng. (2015), [1]). Classifiers based on Haar-like features have shown good potential in extracting complex spatial and temporal patterns from noisy multidirectional series of sonar imagery; however, this approach depends on specific sonar illumination methods and does not account for variation in lighting or soil type between training and test images. In this paper, we report preliminary methods and results from applying a non-linear classification method, convolutional neural networks (CNNs), to mine detection in noisy sonar imagery. The advantage of this method is that it can learn more abstract and complex features in the input space, leading to lower false-positive and higher true-positive rates. CNNs routinely outperform other methods in similar machine vision tasks (Deng and Yu, Found. Trends Signal Process. 7, 197–387 (2013), [2]). We used a simple CNN architecture trained to distinguish mine-like objects from background clutter with up to 99% accuracy.
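To make the CNN machinery concrete, the following is a minimal numpy sketch of the basic building block such networks stack — valid cross-correlation, ReLU, and max-pooling. This is not the paper's architecture (which is unspecified here); all names and shapes are illustrative:

```python
import numpy as np

def conv2d(x, kernel):
    """'Valid' 2-D cross-correlation (the operation CNN conv layers
    actually compute): slide the kernel over the image and take
    elementwise-product sums."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * kernel).sum()
    return out

def relu(x):
    """Elementwise rectified linear non-linearity."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max-pooling: downsample by taking the maximum
    of each size x size block."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

A trained network learns the kernel values by backpropagation; stacking several conv/ReLU/pool stages is what lets a CNN learn the "more abstract and complex features" the abstract refers to.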
Article
Full-text available
The detection of mine-like objects (MLOs) in sidescan sonar (SSS) imagery continues to be a challenging task. In practice, subject matter experts tediously analyze images searching for MLOs. In the literature, there are many attempts at automated target recognition (ATR) to detect the MLOs. This paper focuses on the classifiers that use computer vision and machine learning approaches. These techniques require large amounts of data, which is often prohibitive. For this reason, the use of synthetic and semisynthetic data sets for training and testing is commonplace. This paper shows how a simple semisynthetic data creation scheme can be used to pretest these data-hungry training algorithms to determine what features are of value. The paper provides real-world testing and training data sets in addition to the semisynthetic training and testing data sets. The paper considers the Haar-like and local binary pattern (LBP) features with boosting, showing improvements in performance with real classifiers over semisynthetic classifiers and improvements in performance as semisynthetic data set size increases.
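One of the two feature families named above, the local binary pattern, is simple enough to sketch directly. The 8-neighbour variant below is the basic LBP operator (the function name and bit ordering are illustrative choices, not taken from the paper):

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour local binary pattern: each interior pixel gets
    an 8-bit code, one bit per neighbour whose value is >= the centre.
    Histograms of these codes serve as texture features."""
    c = img[1:-1, 1:-1]
    # neighbours in clockwise order, starting at the top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((n >= c).astype(np.uint8) << bit)
    return code
```

Because the code depends only on the ordering of neighbour intensities relative to the centre, LBP is invariant to monotonic intensity changes, which is one reason it is attractive for unevenly insonified sonar imagery.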
Conference Paper
Full-text available
Given a high dimensional dataset, one would like to be able to represent this data using fewer parameters while preserving relevant signal information. If we assume the original data actually exists on a lower dimensional manifold embedded in a high dimensional feature space, then recently popularized approaches based in graph-theory and differential geometry allow us to learn the underlying manifold that generates the data. One such technique, called Diffusion Maps, is said to preserve the local proximity between data points by first constructing a representation for the underlying manifold. This work examines target specific classification problems using Diffusion Maps to embed inverse imaged synthetic aperture sonar signal data for automatic target recognition. The data set contains six target types. Results demonstrate that the diffusion features capture suitable discriminating information from the raw signals and acoustic color to improve target specific recognition with a lower false alarm rate. However, fusion performance is degraded.
Article
Full-text available
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
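The reweighting-and-voting procedure described above can be sketched as discrete AdaBoost with threshold stumps on 1-D data. This is the classical algorithm the paper analyzes, not the paper's own multinomial variants; the toy data and function names are illustrative:

```python
import numpy as np

def stump_predict(x, thr, sign):
    """Decision stump: predicts sign for x > thr, -sign otherwise."""
    return sign * np.where(x > thr, 1.0, -1.0)

def adaboost(x, y, n_rounds=10):
    """Discrete AdaBoost: after each round, upweight the examples the
    chosen stump got wrong, then take a weighted vote of all stumps."""
    n = len(x)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for thr in x:                        # candidate thresholds: data points
            for sign in (1.0, -1.0):
                err = w[stump_predict(x, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)        # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)        # stump vote weight
        pred = stump_predict(x, thr, sign)
        w *= np.exp(-alpha * y * pred)               # reweight the sample
        w /= w.sum()
        ensemble.append((alpha, thr, sign))
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return np.sign(score)
```

In the paper's view, the additive score accumulated by `predict` approximates half the log-odds of class membership on the logistic scale.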
Chapter
Full-text available
We have developed EEG-based BCI systems which couple human vision and computer vision for speeding the search of large images and image/video databases. We term these types of BCI systems "cortically-coupled computer vision" (C3Vision). C3Vision exploits (1) the ability of the human visual system to get the "gist" of a scene with brief (tens to hundreds of ms) and rapid serial (10 Hz) image presentations and (2) our ability to decode from the EEG whether, based on the gist, the scene is relevant, informative and/or grabs the user's attention. In this chapter we describe two system architectures for C3Vision that we have developed. The systems are designed to leverage the relative advantages, in both speed and recognition capabilities, of human and computer, with brain signals serving as the medium of communication of the user's intentions and cognitive state.
Conference Paper
Full-text available
A scanning window type pedestrian detector is presented that uses both appearance and motion information to find walking people in surveillance video. We extend the work of Viola, Jones and Snow (2005) to use many more frames as input to the detector thus allowing a much more detailed analysis of motion. The resulting detector is about an order of magnitude more accurate than the detector of Viola, Jones and Snow. It is also computationally efficient, processing frames at the rate of 5 Hz on a 3 GHz Pentium processor.
Conference Paper
Full-text available
This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image", which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [6]. The third contribution is a method for combining classifiers in a "cascade", which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems [18, 13, 16, 12, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
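The integral-image trick above reduces any rectangle sum to four table lookups, which is what makes Haar-like features cheap. A minimal sketch (function names are illustrative; the zero-padded first row/column is a common implementation convention):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column:
    ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] from four lookups in the table."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left half minus right half of a
    h x w window."""
    return rect_sum(ii, y, x, h, w // 2) - rect_sum(ii, y, x + w // 2, h, w // 2)
```

Because `rect_sum` is O(1) regardless of rectangle size, a detector can evaluate thousands of such features per window at every scale without rescaling the image.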
Article
Full-text available
We report the design and performance of a brain-computer interface (BCI) system for real-time single-trial binary classification of viewed images based on participant-specific dynamic brain response signatures in high-density (128-channel) electroencephalographic (EEG) data acquired during a rapid serial visual presentation (RSVP) task. Image clips were selected from a broad area image and presented in rapid succession (12/s) in 4.1-s bursts. Participants indicated by subsequent button press whether or not each burst of images included a target airplane feature. Image clip creation and search path selection were designed to maximize user comfort and maintain user awareness of spatial context. Independent component analysis (ICA) was used to extract a set of independent source time-courses and their minimally-redundant low-dimensional informative features in the time and time-frequency amplitude domains from 128-channel EEG data recorded during clip burst presentations in a training session. The naive Bayes fusion of two Fisher discriminant classifiers, computed from the 100 most discriminative time and time-frequency features, respectively, was used to estimate the likelihood that each clip contained a target feature. This estimator was applied online in a subsequent test session. Across eight training/test session pairs from seven participants, median area under the receiver operator characteristic curve, by tenfold cross validation, was 0.97 for within-session and 0.87 for between-session estimates, and was nearly as high (0.83) for targets presented in bursts that participants mistakenly reported to include no target features.
Article
Full-text available
In this paper, we describe a simple set of "recipes" for the analysis of high spatial density EEG. We focus on a linear integration of multiple channels for extracting individual components without making any spatial or anatomical modeling assumptions, instead requiring particular statistical properties such as maximum difference, maximum power, or statistical independence. We demonstrate how corresponding algorithms, for example, linear discriminant analysis, principal component analysis and independent component analysis, can be used to remove eye-motion artifacts, extract strong evoked responses, and decompose temporally overlapping components. The general approach is shown to be consistent with the underlying physics of EEG, which specifies a linear mixing model of the underlying neural and non-neural current sources.
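One of the linear "recipes" named above, discriminant analysis, amounts to a single weight vector over channels. The following is a standard Fisher linear discriminant sketch, not the paper's exact procedure; the function name and the small ridge term added for numerical stability are assumptions:

```python
import numpy as np

def fisher_lda(X0, X1):
    """Fisher linear discriminant for two classes of multichannel
    observations (rows = trials, columns = channels): the weight vector
    w ~ Sw^{-1} (m1 - m0) linearly integrates the channels without any
    anatomical modeling assumptions."""
    m0, m1 = X0.mean(0), X1.mean(0)
    # pooled within-class scatter matrix
    Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
    Sw += 1e-6 * np.eye(Sw.shape[0])       # ridge for numerical stability
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting each trial onto `w` yields the one-dimensional "component" whose class means are maximally separated relative to within-class variance, consistent with the linear mixing model of EEG the review describes.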
Article
Full-text available
Detecting people in images is key for several important application domains in computer vision. This paper presents an in-depth experimental study on pedestrian classification; multiple feature-classifier combinations are examined with respect to their ROC performance and efficiency. We investigate global versus local and adaptive versus nonadaptive features, as exemplified by PCA coefficients, Haar wavelets, and local receptive fields (LRFs). In terms of classifiers, we consider the popular Support Vector Machines (SVMs), feed-forward neural networks, and k-nearest neighbor classifier. Experiments are performed on a large data set consisting of 4,000 pedestrian and more than 25,000 nonpedestrian (labeled) images captured in outdoor urban environments. Statistically meaningful results are obtained by analyzing performance variances caused by varying training and test sets. Furthermore, we investigate how classification performance and training sample size are correlated. Sample size is adjusted by increasing the number of manually labeled training data or by employing automatic bootstrapping or cascade techniques. Our experiments show that the novel combination of SVMs with LRF features performs best. A boosted cascade of Haar wavelets can, however, reach quite competitive results, at a fraction of computational cost. The data set used in this paper is made public, establishing a benchmark for this important problem.
Article
Full-text available
We describe a real-time electroencephalography (EEG)-based brain-computer interface system for triaging imagery presented using rapid serial visual presentation. A target image in a sequence of nontarget distractor images elicits in the EEG a stereotypical spatiotemporal response, which can be detected. A pattern classifier uses this response to reprioritize the image sequence, placing detected targets in the front of an image stack. We use single-trial analysis based on linear discrimination to recover spatial components that reflect differences in EEG activity evoked by target versus nontarget images. We find an optimal set of spatial weights for 59 EEG sensors within a sliding 50-ms time window. Using this simple classifier allows us to process EEG in real time. The detection accuracy across five subjects is on average 92%, i.e., in a sequence of 2500 images, resorting images based on detector output results in 92% of target images being moved from a random position in the sequence to one of the first 250 images (first 10% of the sequence). The approach leverages the highly robust and invariant object recognition capabilities of the human visual system, using single-trial EEG analysis to efficiently detect neural signatures correlated with the recognition event.
Article
Full-text available
This paper is concerned with hierarchical Markov random field (MRF) models and their application to sonar image segmentation. We present an original hierarchical segmentation procedure devoted to images given by a high-resolution sonar. The sonar image is segmented into two kinds of regions: shadow (corresponding to a lack of acoustic reverberation behind each object lying on the seabed) and sea-bottom reverberation. The proposed unsupervised scheme takes into account the variety of the laws in the distribution mixture of a sonar image, and it estimates both the parameters of the noise distributions and the parameters of the Markovian prior. For the estimation step, we use an iterative technique which combines a maximum likelihood approach (for noise model parameters) with a least-squares method (for the MRF-based prior). In order to model more precisely the local and global characteristics of image content at different scales, we introduce a hierarchical model involving a pyramidal label field. It combines coarse-to-fine causal interactions with a spatial neighborhood structure. This new method of segmentation, called the scale causal multigrid (SCM) algorithm, has been successfully applied to real sonar images and seems to be well suited to the segmentation of very noisy images. The experiments reported in this paper demonstrate that the discussed method performs better than other hierarchical schemes for sonar image segmentation.
Article
Full-text available
This review summarizes linear spatiotemporal signal analysis methods that derive their power from careful consideration of spatial and temporal features of skull surface potentials. BCIs offer tremendous potential for improving the quality of life for those with severe neurological disabilities. At the same time, it is now possible to use noninvasive systems to improve performance for time-demanding tasks. Signal processing and machine learning are playing a fundamental role in enabling applications of BCI and in many respects, advances in signal processing and computation have helped to lead the way to real utility of noninvasive BCI.
Chapter
Studies in scene perception have shown that observers recognize a real-world scene at a single glance. During this expeditious process of seeing, the visual system forms a spatial representation of the outside world that is rich enough to grasp the meaning of the scene, recognizing a few objects and other salient information in the image, to facilitate object detection and the deployment of attention. This representation refers to the gist of a scene, which includes all levels of processing, from low-level features (e.g., color, spatial frequencies) to intermediate image properties (e.g., surface, volume) and high-level information (e.g., objects, activation of semantic knowledge). Therefore, gist can be studied at both perceptual and conceptual levels.
Conference Paper
Automatic detection of underwater objects is a critical task for a variety of underwater applications. Rapid detection approaches are needed to tackle the large amount of data produced by state-of-the-art sensors such as Synthetic Aperture Sonar. Accurate detection approaches are also required to reduce the number of false alarms and enable on-the-fly adaptation of missions in Autonomous Underwater Vehicles. In this paper we propose a new method for object detection in Synthetic Aperture Sonar imagery, capable of processing images extremely rapidly, based on the Viola and Jones cascade of boosted classifiers. Our approach provides confidence-rated predictions rather than the {-1,1} output of the traditional cascade. This not only attaches a confidence level to each prediction but also reduces the false alarm rate significantly. We also introduce a novel cascade structure capable of obtaining low false alarm rates while achieving high detection accuracy. Results obtained on a real Synthetic Aperture Sonar dataset covering a variety of challenging terrains are presented to show the discriminative power of this approach.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
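For a concrete picture of the decision-surface idea, here is a sketch of the linear special case trained with the Pegasos-style primal sub-gradient method. It omits the kernel mapping and bias term the paper describes; function names, data, and hyperparameters are illustrative:

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, n_iter=2000, seed=0):
    """Stochastic sub-gradient training of a linear SVM: minimise
    (lam/2)||w||^2 + mean hinge loss.  Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)                  # decaying step size
        if y[i] * (X[i] @ w) < 1:              # margin violated: hinge active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                  # only the regulariser acts
            w = (1 - eta * lam) * w
    return w

def svm_predict(w, X):
    """Sign of the signed distance to the separating hyperplane."""
    return np.sign(X @ w)
```

In the full support-vector network, the linear surface is built in a feature space reached through a non-linear (e.g. polynomial) kernel rather than directly on the inputs.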
Article
The problem of classifying targets in sonar images from multiple views is modeled as a partially observable Markov decision process (POMDP). This model allows one to adaptively determine which additional views of an object would be most beneficial in reducing the classification uncertainty. Acquiring these additional views is made possible by employing an autonomous underwater vehicle (AUV) equipped with a side-looking imaging sonar. The components of the multiview target classification POMDP are specified. The observation model for a target is specified by the degree of similarity between the image under consideration and a number of precomputed templates. The POMDP is validated using real synthetic aperture sonar (SAS) data gathered during experiments at sea carried out by the NATO Undersea Research Centre, and results show that the accuracy of the proposed method outperforms an approach using a number of predetermined view aspects. The approach provides an elegant way to fully exploit multiview information and AUV maneuverability in a methodical manner.
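The core of the adaptive-view idea above is a Bayes update of the class belief after each view, plus a lookahead over candidate views. The sketch below uses a greedy one-step expected-entropy criterion, which is a simplification of a full POMDP policy; all names and the toy observation models are assumptions:

```python
import numpy as np

def belief_update(belief, likelihood):
    """Bayes update of the class belief after one sonar observation:
    posterior ~ prior * P(obs | class)."""
    post = belief * likelihood
    return post / post.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def best_next_view(belief, view_models):
    """Greedy one-step lookahead: pick the view whose expected posterior
    entropy is lowest.  view_models[v][o, c] = P(obs o | class c, view v)."""
    best_v, best_h = None, np.inf
    for v, L in view_models.items():
        p_obs = L @ belief                     # marginal P(o) under the belief
        h = sum(p_obs[o] * entropy(belief_update(belief, L[o]))
                for o in range(L.shape[0]) if p_obs[o] > 0)
        if h < best_h:
            best_v, best_h = v, h
    return best_v
```

A view whose template-similarity observations discriminate the classes well yields low expected posterior entropy, so the vehicle would be steered to acquire it next.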
Article
An advanced capability for automated detection and classification of sea mines in sonar imagery has been developed. The advanced mine detection and classification (AMDAC) algorithm consists of an improved detection density algorithm, a classification feature extractor that uses a stepwise feature selection strategy, a k-nearest neighbor attractor-based neural network (KNN) classifier, and an optimal discriminatory filter classifier. The detection stage uses a nonlinear matched filter to identify mine-size regions in the sonar image that closely match a mine's signature. For each detected mine-like region, the feature extractor calculates a large set of candidate classification features. A stepwise feature selection process then determines the subset of features that optimizes probability of detection and probability of classification for each of the classifiers while minimizing false alarms.
Conference Paper
Recently, Viola et al. [2001] introduced a rapid object detection scheme based on a boosted cascade of simple feature classifiers. In this paper we introduce a novel set of rotated Haar-like features. These novel features significantly enrich the simple features of Viola et al. and can also be calculated efficiently. With these new rotated features, our sample face detector achieves on average a 10% lower false alarm rate at a given hit rate. We also present a novel post-optimization procedure for a given boosted cascade, improving the false alarm rate on average by a further 12.5%.
Conference Paper
Operational requirements for naval applications have shifted over the last decade towards the fast, reliable detection and avoidance or elimination of underwater threats (e.g. mines and IEDs (improvised explosive devices)). For these purposes, the ability to reliably separate mines or IEDs from rocks or bottom features is essential; this separation can be much more difficult for IEDs than for traditional cylindrical or spherical mines. Furthermore, automatic target recognition (ATR) approaches are gaining more and more importance for autonomous UUVs: since no operator is in the loop, these systems are compromised by even a limited number of missed detections or a significant number of false targets. In this context, the ability to automatically detect and classify objects depends directly on the true resolution of the acoustic imaging system. All this points towards the need for a high-resolution sensor for reliable object detection, classification and identification. Starting with some examples, this paper presents theoretical considerations about the required resolution for the detection, classification and identification of objects in side scan sonar images; clues for the required resolution can be derived directly from the Johnson criterion for electro-optical systems. Secondly, an image processing software package for automatic object detection and classification, currently under development at FWG with the assistance of FU-Berlin and FGAN-FOM, is presented. This part focuses on an overview of the system and recently developed and tested algorithms. Before the different detection algorithms are applied, the side scan sonar images are preprocessed, including normalization, height estimation plus slant-range correction, and geo-referencing; different normalization algorithms can be used. Currently, six different screening algorithms for detecting regions of interest (ROIs) containing objects of interest are implemented. These screening algorithms are based on statistical features within a sliding window, a highlight/shadow analysis after threshold segmentation, a normalized 1D cross-correlation with a template, a modified maximally stable extremal regions (MSER) approach, a k-means segmentation, and a higher-order-statistics-based segmentation. Afterwards, false detections of ROIs without objects of interest are eliminated by applying a single snake algorithm to the entire highlight and shadow area, coupled snake algorithms to the highlight and shadow areas separately, a 2D cross-correlation with reference images of MLOs, and an iterative segmentation, all combined with robust and fast classifiers. The final processing step is a classifier (a probabilistic neural network, PNN). A simple data fusion strategy was also tested, based on the output of the different screening and false-positive-reduction algorithms. Finally, consequences for image processing with improved sensor resolution are discussed. All algorithms were tested using a data set representing roughly 25 km² of the sea floor. This data set was collected in part by the SeaOtter MK I AUV from Atlas Elektronik and gathered in the Baltic Sea and the Mediterranean Sea. Different side scan sonar systems were used.
Article
In many remote-sensing classification problems, the number of targets (e.g., mines) present is very small compared with the number of clutter objects. Traditional classification approaches usually ignore this class imbalance, causing performance to suffer accordingly. In contrast, the recently developed infinitely imbalanced logistic regression (IILR) algorithm explicitly addresses class imbalance in its formulation. We describe this algorithm and give the details necessary to employ it for remote-sensing data sets that are characterized by class imbalance. The method is applied to the problem of mine classification on three real measured data sets. Specifically, classification performance using the IILR algorithm is shown to exceed that of a standard logistic regression approach on two land-mine data sets collected with a ground-penetrating radar and on one underwater-mine data set collected with a sidescan sonar.
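As a simple illustration of counteracting class imbalance, the sketch below up-weights the rare positive class in an ordinary gradient-descent logistic regression. Note this is a common baseline, NOT the IILR formulation the paper develops; the weight value, learning rate, and function name are all assumptions:

```python
import numpy as np

def weighted_logreg(X, y, w_pos=10.0, lr=0.1, n_iter=500):
    """Class-weighted logistic regression trained by gradient descent.
    Positive examples (y == 1) receive sample weight w_pos so that the
    few targets are not swamped by the many clutter examples."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])            # append a bias column
    beta = np.zeros(d + 1)
    sw = np.where(y == 1, w_pos, 1.0)               # per-sample weights
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))      # predicted probabilities
        grad = Xb.T @ (sw * (p - y)) / n            # weighted log-loss gradient
        beta -= lr * grad
    return beta
```

With an unweighted fit on such data, the decision boundary drifts toward labeling everything as clutter; the weight `w_pos` is the knob that pulls it back, whereas IILR handles the limit of extreme imbalance analytically.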
Conference Paper
Human-aided computing proposes using information measured directly from the human brain in order to perform useful tasks. In this paper, we extend this idea by fusing computer vision-based processing and processing done by the human brain in order to build more effective object categorization systems. Specifically, we use an electroencephalograph (EEG) device to measure the subconscious cognitive processing that occurs in the brain as users see images, even when they are not trying to explicitly classify them. We present a novel framework that combines a discriminative visual category recognition system based on the Pyramid Match Kernel (PMK) with information derived from EEG measurements as users view images. We propose a fast convex kernel alignment algorithm to effectively combine the two sources of information. Our approach is validated with experiments using real-world data, where we show significant gains in classification accuracy. We analyze the properties of this information fusion method by examining the relative contributions of the two modalities, the errors arising from each source, and the stability of the combination in repeated experiments.
Article
Mine detection and classification using high-resolution sidescan sonar is a critical technology for mine countermeasures (MCM). As opposed to the majority of techniques, which require large training data sets, this paper presents unsupervised models for both the detection and the shadow-extraction phases of an automated classification system. The detection phase is carried out using an unsupervised Markov random field (MRF) model where the required model parameters are estimated from the original image. Using a priori spatial information on the physical size and geometric signature of mines in sidescan sonar, a detection-oriented MRF model is developed which directly segments the image into regions of shadow, seabottom reverberation, and object highlight. After detection, features are extracted so that the object can be classified. A novel co-operating statistical snake (CSS) model is presented which extracts the highlight and shadow of the object. The CSS model again utilizes available a priori information on the spatial relationship between the highlight and shadow, allowing accurate segmentation of the object's shadow to be achieved.
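The three-way MRF segmentation described above can be illustrated with the classic iterated-conditional-modes (ICM) update: each pixel takes the label minimising an intensity data term plus a Potts smoothness penalty over its 4-neighbours. This is a generic ICM sketch under assumed Gaussian-like class means, not the paper's parameter-estimation scheme:

```python
import numpy as np

def icm_segment(img, means, beta=1.0, n_iter=5):
    """Iterated conditional modes for a 3-class Potts MRF.  The label of
    each pixel minimises (intensity - class mean)^2 plus beta times the
    number of disagreeing 4-neighbours.  The three classes stand in for
    shadow, seabottom reverberation, and object highlight."""
    labels = np.abs(img[..., None] - np.asarray(means)).argmin(-1)
    h, w = img.shape
    for _ in range(n_iter):
        for y in range(h):
            for x in range(w):
                costs = []
                for k, m in enumerate(means):
                    data = (img[y, x] - m) ** 2
                    smooth = 0
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            smooth += labels[ny, nx] != k
                    costs.append(data + beta * smooth)
                labels[y, x] = int(np.argmin(costs))
    return labels
```

The smoothness weight `beta` is what lets the prior overrule isolated noisy pixels, which is why MRF models cope well with speckled sonar imagery.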
P. Sajda, E. Pohlmeyer, J. Wang, L. C. Parra, C. Christoforou, J. Dmochowski, B. Hanna, C. Bahlmann, M. K. Singh, and S.-F. Chang, "In a blink of an eye and a switch of a transistor: Cortically coupled computer vision," Proc. IEEE, vol. 98, no. 3, pp. 462–478, Mar. 2010.
N. Bigdely-Shamlo, A. Vankov, R. R. Ramirez, and S. Makeig, "Brain activity-based image classification from rapid serial visual presentation," IEEE Trans. Neural Syst. Rehab. Eng., vol. 16, no. 5, pp. 432–441, Oct. 2008.