Chapter

Seismic signal augmentation to improve generalization of deep neural networks


Abstract

Deep learning has emerged as an effective approach for seismic data processing in general, and for earthquake monitoring in particular. The ability of deep learning models to generalize beyond the training and validation data is important for comprehensive earthquake monitoring, and it depends on the availability of a sufficiently large and complete training dataset. This requirement can be challenging to meet because of the significant effort and time required for data collection and labeling. Data augmentation provides an efficient and effective way to increase the number and diversity of training samples and thereby improve generalization to unseen data. In this paper, we present augmentation methods appropriate for seismic waveforms and demonstrate their ability to reduce bias and increase performance. These augmentation methods can be applied to a wide range of deep learning applications designed for seismic data.
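As a rough illustration of the kind of waveform-level augmentations the abstract refers to, the sketch below applies a random time shift and a random amplitude scaling to a three-component training window; the function name, parameter ranges, and NumPy implementation are illustrative assumptions rather than the authors' code.

    import numpy as np

    def augment_window(wave, label, max_shift=100, scale_range=(0.5, 2.0)):
        """Randomly shift a waveform window and its label, then rescale the amplitude.

        wave  : np.ndarray of shape (3, n_samples), three-component waveform
        label : np.ndarray of shape (n_classes, n_samples), e.g. P/S pick probabilities
        """
        # Circular shift by a random number of samples (zero-padding is a common alternative).
        shift = np.random.randint(-max_shift, max_shift + 1)
        wave = np.roll(wave, shift, axis=-1)
        label = np.roll(label, shift, axis=-1)

        # Random amplitude scaling, so the network does not rely on absolute amplitude.
        wave = wave * np.random.uniform(*scale_range)
        return wave, label

In practice such transforms are usually applied on the fly inside the data loader, so that every epoch sees a slightly different version of each training example.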


... Additionally, we applied data augmentation by randomly shifting the 10 s window signals by 0 to 1 s to either the right or left, to reduce overfitting in the avalanche class and for better generalization (Zhu et al., 2020). Similarly, in the spectral autoencoder, we used an expected proportion of 0.5 avalanches, a learning rate of 1e-4, and a batch size of 128, and found 5 training epochs to be optimal. ...
... Also, implementing specialized data augmentation techniques to increase the variety and number of the avalanche recordings, e.g. seismic data augmentation techniques (Zhu et al., 2020) or generative models (Wang et al., 2021), might help to make the classifiers more robust. ...
Preprint
Full-text available
Monitoring snow avalanche activity is essential for operational avalanche forecasting and the successful implementation of mitigation measures to ensure safety in mountain regions. To facilitate and automate the monitoring process, avalanche detection systems equipped with seismic sensors can provide a cost-effective solution. Still, automatically differentiating avalanche signals from other sources in seismic data remains challenging, mainly due to the complexity of seismic signals generated by avalanches, the complex signal transmission through the ground, the relatively rare occurrence of avalanches, and the presence of multiple sources in the continuous seismic data. One approach to automate avalanche detection is by applying machine learning methods. So far, research in this area has mainly focused on extracting standard domain-specific signal attributes in the time and frequency domains as input features for statistical models. In this study, we propose a novel application of deep learning autoencoder models for the automatic and unsupervised extraction of features from seismic recordings. These new features are then fed into classifiers for discriminating snow avalanches. To this end, we trained three Random forest classifiers based on different feature extraction approaches. The first set of 32 features was automatically extracted from the time-series signals by an autoencoder consisting of convolutional layers and a recurrent long short-term memory unit. The second autoencoder applies a series of fully connected layers to extract 16 features from the spectrum of the signals. As a benchmark, a third random forest was trained with typical waveform, spectral and spectrogram attributes used to discriminate seismic events. We extracted all these features from 10-second windows of the seismograms recorded with an array of five seismometers installed in an avalanche test site located above Davos, Switzerland. The database used to train and test the models contained 84 avalanches and 828 noise (unrelated to avalanches) events recorded during the winter seasons of 2020–2021 and 2021–2022. Finally, we assessed the performance of each classifier, compared the results, and proposed different aggregation methods to improve the predictive performance of the developed seismic detection algorithms. The classifiers achieved an avalanche f1-score of 0.61 (seismic attributes), 0.49 (temporal autoencoder) and 0.60 (spectral autoencoder) and avalanche recall of 0.68, 0.71 and 0.71, respectively. Overall, the macro f1-score ranged from 0.70 (temporal autoencoder) to 0.78 (seismic attributes). After applying a post-processing step to event-based predictions, the avalanche recall of the three models significantly increased, reaching values between 0.82 and 0.91. The developed approach could be potentially used as an operational, near-real-time avalanche detection system. Yet, the relatively high number of false alarms still needs further implementation of the current automated seismic classification algorithms to be used as unique methods to detect avalanches effectively.
... The squeezing ratio is randomly sampled from 1, 2, …, 8 with equal probability (i.e., 12.5% for all ratios). We then shift waveforms to avoid the case of the denoising algorithm memorizing the stationary P-wave arrival time (Zhu et al., 2020). We take the theoretical P arrival time as the original zero and then shift waveforms using a uniform probability between ±75 s. ...
... of the data is enhanced with each additional training step (epoch), which reduces the possibility of overfitting the training data (Zhu et al., 2020). ...
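The squeezing and shifting operations mentioned in the first of the two excerpts above could be sketched as follows; the decimation-based "squeeze", the zero-padding scheme, and the parameter names are assumptions made for illustration, not the cited authors' implementation.

    import numpy as np

    def squeeze_and_shift(wave, fs=20.0, max_ratio=8, max_shift_s=75.0):
        """Randomly compress a waveform in time by an integer ratio, then shift it.

        wave : np.ndarray of shape (n_samples,), with assumed sampling rate fs in Hz
        """
        n = wave.shape[-1]

        # Squeeze: keep every r-th sample (ratio drawn uniformly from 1..max_ratio),
        # then zero-pad back to the original length.
        r = np.random.randint(1, max_ratio + 1)
        squeezed = wave[::r]
        out = np.zeros(n, dtype=wave.dtype)
        out[:squeezed.size] = squeezed

        # Shift: move the trace by up to +/- max_shift_s seconds so the P arrival
        # does not always sit at the same sample.
        shift = int(np.random.uniform(-max_shift_s, max_shift_s) * fs)
        return np.roll(out, shift)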
Article
Full-text available
Elevated seismic noise for moderate-size earthquakes recorded at teleseismic distances has limited our ability to see their complexity. We develop a machine-learning-based algorithm to separate noise and earthquake signals that overlap in frequency. The multi-task encoder-decoder model is built around a kernel pre-trained on local (e.g., short distances) earthquake data (Yin et al., 2022, https://doi.org/10.1093/gji/ggac290) and is modified by continued learning with high-quality teleseismic data. We denoise teleseismic P waves of deep Mw5.0+ earthquakes and use the clean P waves to estimate source characteristics with reduced uncertainties of these understudied earthquakes. We find a scaling of moment and duration to be M0 ≃ τ⁴, and a resulting strong scaling of stress drop and radiated energy with magnitude (Δσ ≃ M0^0.21 and ER ≃ M0^1.24). The median radiation efficiency is 5%, a low value compared to crustal earthquakes. Overall, we show that deep earthquakes have weak rupture directivity and few subevents, suggesting a simple model of a circular crack with radial rupture propagation is appropriate. When accounting for their respective scaling with earthquake size, we find no systematic depth variations of duration, stress drop, or radiated energy within the 100–700 km depth range. Our study supports the findings of Poli and Prieto (2016, https://doi.org/10.1002/2016jb013521) with a doubled amount of earthquakes investigated and with earthquakes of lower magnitudes.
... These factors collectively give rise to a scenario in which models trained on one set of seismic data encounter difficulties when adapting to new, previously unseen seismic data. In response to the challenges related to generalizability, researchers are actively investigating pioneering strategies, such as transfer learning techniques [24], [25] and data augmentation methods [26]. Nevertheless, it remains a significant and ongoing endeavor to bridge the gap between the training and real-world deployment of neural networks in the complex and ever-evolving field of geophysics. ...
Article
Full-text available
Precise first-arrival picking holds pivotal importance in the realms of seismic data processing and microseismic monitoring. Recently, data-driven approaches have shown remarkable performance. However, this approach relies on high-quality labeled datasets and involves a time-consuming and labor-intensive labeling process. Additionally, data-driven picking methods often suffer from generalisation problems in the face of varying noise characteristics and geological environments. To tackle the challenges head-on, this study introduces a novel training algorithm grounded in meta-learning. In contrast to traditional training methods, this innovative approach distinguishes itself by reducing the costs associated with dataset creation and requiring only a modest number of high-quality labeled samples to achieve superior performance. Furthermore, the proposed method can be seamlessly implemented with different types of deep neural networks. Our extensive experimentation on two field datasets encompassing distinct geological zones demonstrates the method’s effectiveness in alleviating the dependence on high-quality training samples, enhancing first-arrival picking accuracy, and bolstering the model’s robustness against strong noise interference.
... Collecting sufficient labeled data to overcome these issues in NDE applications presents several challenges due to unknown properties inside the structure, a lack of realistic training cases due to rare conditions, as well as challenges in obtaining real samples for experiments. Furthermore, in high-dimensional data space, the inadequate training samples may span in a small subspace and provide weak constraints on the possible decision boundaries learned by the network, which can lead to poor performance [30][31][32]. Therefore, it is essential to ...
Article
Full-text available
Bulk wave acoustic time-of-flight (ToF) measurements in pipes and closed containers can be hindered by guided waves with similar arrival times propagating in the container wall, especially when a low excitation frequency is used to mitigate sound attenuation from the material. Convolutional neural networks (CNNs) have emerged as a new paradigm for obtaining accurate ToF in non-destructive evaluation (NDE) and have been demonstrated for such complicated conditions. However, the generalizability of ToF-CNNs has not been investigated. In this work, we analyze the generalizability of the ToF-CNN for broader applications, given limited training data. We first investigate the CNN performance with respect to training dataset size and different training data and test data parameters (container dimensions and material properties). Furthermore, we perform a series of tests to understand the distribution of data parameters that need to be incorporated in training for enhanced model generalizability. This is investigated by training the model on a set of small- and large-container datasets regardless of the test data. We observe that the quantity of data partitioned for training must be of a good representation of the entire sets and sufficient to span through the input space. The result of the network also shows that the learning model with the training data on small containers delivers a sufficiently stable result on different feature interactions compared to the learning model with the training data on large containers. To check the robustness of the model, we tested the trained model to predict the ToF of different sound speed mediums, which shows excellent accuracy. Furthermore, to mimic real experimental scenarios, data are augmented by adding noise. We envision that the proposed approach will extend the applications of CNNs for ToF prediction in a broader range.
... It is worth emphasizing that we did not apply additional filtering to the test data to maintain the originality of the data. During the model training process, we employed various data augmentation techniques such as random cropping, time shifting, and amplitude scaling [48]. These techniques contributed to improving the convergence speed of the model, reducing the risk of overfitting, and improving the generalization of the trained model. ...
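A minimal sketch of the random-cropping step mentioned above (time shifting and amplitude scaling are analogous to the earlier example) might look like the following; the array layout and window length are assumed for illustration only.

    import numpy as np

    def random_crop(trace, label, out_len):
        """Cut a fixed-length training window from a random position of a longer trace.

        trace : np.ndarray of shape (n_channels, n_samples)
        label : np.ndarray of shape (n_classes, n_samples), aligned with trace
        """
        n = trace.shape[-1]
        if n < out_len:
            raise ValueError("trace shorter than the requested window")
        start = np.random.randint(0, n - out_len + 1)
        return trace[..., start:start + out_len], label[..., start:start + out_len]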
Article
Full-text available
Seismograms, the fundamental seismic records, have revolutionized earthquake research and monitoring. Recent advancements in deep learning have further enhanced seismic signal processing, leading to even more precise and effective earthquake monitoring capabilities. This paper introduces a foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake monitoring tasks. SeisT combines multiple modules tailored to different tasks and exhibits impressive out-of-distribution generalization performance, outperforming or matching state-of-the-art models in tasks like earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, back-azimuth estimation, and epicentral distance estimation. The performance scores on the tasks are 0.96, 0.96, 0.68, 0.95, 0.86, 0.55, and 0.81, respectively. The most significant improvements, in comparison to existing models, are observed in phase-P picking, phase-S picking, and magnitude estimation, with gains of 1.7%, 9.5%, and 8.0%, respectively. Our study, through rigorous experiments and evaluations, suggests that SeisT has the potential to contribute to the advancement of seismic signal processing and earthquake research.
... Without the autoencoder-based denoising process, stochastic methods can be used to inject structured noise into data to modulate outcomes (e.g., Zhu et al., 2020;Cao et al., 2022). However, because geochemical data are generally multivariate, its uncertainty is also multivariate. ...
Article
Full-text available
Regional geochemical surveys generate large amounts of data that can be used for a number of purposes such as to guide mineral exploration. Modern surveys are typically designed to permit quantification of data uncertainty through data quality metrics by using quality assurance and quality control (QA/QC) methods. However, these metrics, such as data accuracy and precision, are obtained through the data generation phase. Consequently, it is unclear how residual uncertainty in geochemical data can be minimized (denoised). This is a limitation to propagating uncertainty through downstream activities, particularly through complex models, which can result from the usage of artificial intelligence-based methods. This study aims to develop a deep learning-based method to examine and quantify uncertainty contained in geochemical survey data. Specifically, we demonstrate that: (1) autoencoders can reduce or modulate geochemical data uncertainty; (2) a reduction in uncertainty is observable in the spatial domain as a decrease of the nugget; and (3) a clear data reconstruction regime of the autoencoder can be identified that is strongly associated with data denoising, as opposed to the removal of useful events in data, such as meaningful geochemical anomalies. Our method to post-hoc denoising of geochemical data using deep learning is simple, clear and consistent, with the amount of denoising guided by highly interpretable metrics and existing frameworks of scientific data quality. Consequently, variably denoised data, as well as the original data, could be fed into a single downstream workflow (e.g., mapping, general data analysis or mineral prospectivity mapping), and the differences in the outcome can be subsequently quantified to propagate data uncertainty.
... It was applied at three stages: training, model-picking, and inference. Adding data augmentation at the training stage could be regarded as expanding the training space and giving more features for building a generalized model, reducing overfitting, and then improving the validation performance [53, 54]. ...
Article
Full-text available
The Motor Disorder Society’s Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) is designed to assess bradykinesia, the cardinal symptoms of Parkinson’s disease (PD). However, it cannot capture the all-day variability of bradykinesia outside the clinical environment. Here, we introduce FastEval Parkinsonism ( https://fastevalp.cmdm.tw/ ), a deep learning-driven video-based system, providing users to capture keypoints, estimate the severity, and summarize in a report. Leveraging 840 finger-tapping videos from 186 individuals (103 patients with Parkinson’s disease (PD), 24 participants with atypical parkinsonism (APD), 12 elderly with mild parkinsonism signs (MPS), and 47 healthy controls (HCs)), we employ a dilated convolution neural network with two data augmentation techniques. Our model achieves acceptable accuracies (AAC) of 88.0% and 81.5%. The frequency-intensity (FI) value of thumb-index finger distance was indicated as a pivotal hand parameter to quantify the performance. Our model also shows the usability for multi-angle videos, tested in an external database enrolling over 300 PD patients.
... We augmented the data from MLAAPDE in four different ways following methods tested in Zhu et al. (2020). We first implemented a random shift in the location of the phase pick so that it was no longer in the exact center of each waveform. ...
Article
The foundation of earthquake monitoring is the ability to rapidly detect, locate, and estimate the size of seismic sources. Earthquake magnitudes are particularly difficult to rapidly characterize because magnitude types are only applicable to specific magnitude ranges, and location errors propagate to substantial magnitude errors. We developed a method for rapid estimation of single-station earthquake magnitudes using raw three-component P waveforms observed at local to teleseismic distances, independent of prior size or location information. We used the MagNet regression model architecture (Mousavi and Beroza, 2020b), which combines convolutional and recurrent neural networks. We trained our model using ∼2.4 million P-phase arrivals labeled by the authoritative magnitude assigned by the U.S. Geological Survey. We tested input data parameters (e.g., window length) that could affect the performance of our model in near-real-time monitoring applications. At the longest waveform window length of 114 s, our model (Artificial Intelligence Magnitude [AIMag]) is accurate (median estimated magnitude within ±0.5 magnitude units from catalog magnitude) between M 2.3 and 7.6. However, magnitudes above M ∼7 are more underestimated as true magnitude increases. As the windows are shortened down to 1 s, the point at which higher magnitudes begin to be underestimated moves toward lower magnitudes, and the degree of underestimation increases. The over and underestimation of magnitudes for the smallest and largest earthquakes, respectively, are potentially related to the limited number of events in these ranges within the training data, as well as magnitude saturation effects related to not capturing the full source time function of large earthquakes. Importantly, AIMag can determine earthquake magnitudes with individual stations’ waveforms without instrument response correction or knowledge of an earthquake’s source-station distance. This work may enable monitoring agencies to more rapidly recognize large, potentially tsunamigenic global earthquakes from few stations, allowing for faster event processing and reporting. This is critical for timely warnings for seismic-related hazards.
... Given the existence of several widely used deep-learning-based phase pickers [18][19][20], we directly reuse the pre-trained PhaseNet [19] model to omit retraining a deep-learning phase picker for conventional seismic data, which is not the focus of this work. Despite PhaseNet being trained on three-component seismic waveforms, it can also be applied to single-component waveforms because channel dropout (i.e., randomly zeroing out one or two channels) is added as data augmentation [70]. ...
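Channel dropout of the kind described above is straightforward to implement; the sketch below randomly zeroes one or two of the three components, with the dropout probability and the choice of components being illustrative assumptions.

    import numpy as np

    def channel_dropout(wave, p=0.5):
        """Randomly zero out one or two channels of a (3, n_samples) waveform."""
        out = wave.copy()
        if np.random.rand() < p:
            n_drop = np.random.randint(1, 3)             # drop 1 or 2 channels
            idx = np.random.choice(3, size=n_drop, replace=False)
            out[idx, :] = 0.0
        return out

Training with occasionally missing components is what allows a three-component picker to remain usable on single-component records, as the excerpt notes.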
Article
Full-text available
Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. However, its distinct characteristics, such as unknown ground coupling and high noise level, pose challenges to signal processing. Existing machine learning models optimized for conventional seismic data struggle with DAS data due to its ultra-dense spatial sampling and limited manual labels. We introduce a semi-supervised learning approach to address the phase-picking task of DAS data. We use the pre-trained PhaseNet model to generate noisy labels of P/S arrivals in DAS data and apply the Gaussian mixture model phase association (GaMMA) method to refine these noisy labels and build training datasets. We develop PhaseNet-DAS, a deep learning model designed to process 2D spatio-temporal DAS data to achieve accurate phase picking and efficient earthquake detection. Our study demonstrates a method to develop deep learning models for DAS data, unlocking the potential of integrating DAS in enhancing earthquake monitoring.
... One solution for addressing these constraints is data augmentation. Augmentation entails the introduction of variations into the data or the synthesis of additional data, thereby augmenting its diversity and ultimately improving the performance and generalization capabilities of the model [7]. While various augmentation methodologies have been proposed for vision and language models, it is not appropriate to apply them directly to vision-language models. ...
Preprint
Full-text available
In recent research, there has been a growing focus on advancing multimodal models beyond unimodal counterparts to ensure robustness in real-world scenarios. Achieving effectiveness amid various types of noise requires resilience to distribution shifts, often addressed through data augmentation techniques. However, the widely used vision-language augmentation, MixGen, experiences a notable performance decline under real-world conditions with perturbed data. Quantitative and qualitative analyses, employing ground-truth ranking and Grad-CAM in a retrieval task, reveal that this decline is attributed to models trained with MixGen-augmented data relying on spurious correlations. In response, we propose RobustMixGen, a novel data augmentation method considering both image and text content to address this challenge. To enhance modality-specific considerations, we introduce a method for categorizing object and background classes in advance. Employing CutMixUp for image synthesis and Conjunction Concat for text synthesis, this technique aims to mitigate spurious correlations. The effectiveness of RobustMixGen is demonstrated in a retrieval task, exhibiting a 0.21% improvement in Recall@K Mean compared to existing models. Additionally, under perturbed data in a distribution shift scenario, it showcases robustness with a 17.11% improvement in image perturbations and a 2.77% enhancement in text perturbations based on MMI, establishing itself as a more robust data augmentation technique.
... Interestingly, we find that the eqt-scedc performs even better than the eqt-instance in terms of pick rates, but it also has a larger variance in pick errors. We postulate that the reason behind the high picking rate of the eqt-scedc is that the scedc dataset includes much more low-SNR data for training, though we can only guess since DL models tend to be trained on augmented data [50]. This suggests that the involvement of more well-labeled low-SNR data may improve the model performance. ...
Article
Full-text available
The detection and picking of seismic waves is the first step toward earthquake catalog building, earthquake monitoring, and seismic hazard management. Recent advances in deep learning have leveraged the amount of labeled seismic data to improve the capability of detecting and picking earthquake signals. While these deep learning methods have shown great promise, their success remains hindered by low generalizability and poor performance in low signal-to-noise ratios (SNRs) data. Here, we propose a new processing workflow that integrates pretrained deep learning models, multi-frequency band predictions, and ensemble estimations to enhance the generalization of these algorithms. We test the performance of the ensemble model using three benchmark datasets, one of which is within-domain and has been used for training the deep learning models, the other two being cross-domain test datasets. We explore the performance given data and model characteristics. We also compare an ensemble approach with a transfer-learning approach and discuss the benefits and drawbacks of these two approaches when deploying on continuous data. Our experiments demonstrate that ensemble learning can drastically improve generalization ability and hence alleviate the need for transfer learning in the case where no labeled datasets exist.
... This work shows how seismic data present variations that complicate generalizing DL models and uses a feature extraction method that obtains a representation of all seismic data in a common space. [15] presents a method to improve the generalization of DL models to detect earthquake signals in seismic data by augmenting the training data. This work shows the importance of the quality of the representativeness of the training data, as well as how it is possible to obtain a better generalization without modifying the architecture of the DL model. ...
... The lack of labeled training data remains a big challenge in applying DNNs in the geoscience field because the subsurface ground truth is typically unknown and manual labeling is highly subjective and labor-intensive. One method to address this challenge is to generate synthetic training datasets with labels by using various numerical geological and geophysical forward modeling methods (6, 43, 52, 62–67). ...
Article
Full-text available
One of the key objectives in geophysics is to characterize the subsurface through the process of analyzing and interpreting geophysical field data that are typically acquired at the surface. Data-driven deep learning methods have enormous potential for accelerating and simplifying the process but also face many challenges, including poor generalizability, weak interpretability, and physical inconsistency. We present three strategies for imposing domain knowledge constraints on deep neural networks (DNNs) to help address these challenges. The first strategy is to integrate constraints into data by generating synthetic training datasets through geological and geophysical forward modeling and properly encoding prior knowledge as part of the input fed into the DNNs. The second strategy is to design nontrainable custom layers of physical operators and preconditioners in the DNN architecture to modify or shape feature maps calculated within the network to make them consistent with the prior knowledge. The final strategy is to implement prior geological information and geophysical laws as regularization terms in loss functions for training the DNNs. We discuss the implementation of these strategies in detail and demonstrate their effectiveness by applying them to geophysical data processing, imaging, interpretation, and subsurface model building.
... For instance, such approaches have been widely used for image data [34], with recent applications also in seismology [35, 36]. The general idea is that rather than collecting (or, in the present application, numerically simulating) more data, data synthesis approaches leverage the structure of the already available data to generate additional samples of representative training data. The following paragraphs present a first-ever approach for augmentation of the training data specifically tailored for seismic collapse risk assessment. ...
Article
Full-text available
Majority of the past research on application of machine learning (ML) in earthquake engineering focused on contrasting the predictive performance of different ML algorithms. In contrast, the emphasis of this paper is on the use of data to boost the predictive performance of surrogates. To that end, a novel data engineering methodology for seismic collapse risk assessment is proposed. This method, termed the automated collapse data constructor (ACDC), stems from combined understanding of the ground motion characteristics and the collapse process. In addition, the data‐driven collapse classifier (D2C2) methodology is proposed which enables conversion of the collapse data from a regression format to a classification format. The D2C2 methodology can be used with any classification tool, and it allows estimation of seismic collapse capacities in a way analogous to the incremental dynamic analysis. The proposed methodologies are tested in a case study using decision trees (XGBoost) and neural network classifiers with an extensive dataset of collapse responses of a 4‐story and an 8‐story steel moment resisting frames. The results suggest that the ACDC methodology allows for dramatic improvement of the predictive performance of data‐driven tools while at the same time significantly reducing data requirements. Specifically, the proposed method can reduce the number of ground motions required for collapse risk assessment from at least forty, as traditionally used, to less than twenty motions. Moreover, interpretation of feature importance conforms with the engineering understanding while revealing a novel, period‐dependent measure of ground motion duration. All data and code developed in this research are made openly available.
... Another benefit of this approach is the removal of the requirement for the inclusion of realistic noise. A number of studies in deep learning have circumvented this by the inclusion of previously recorded noise (e.g., Mousavi et al., 2019; Zhu et al., 2020; Wang H. et al., 2021). However, this requires a substantial amount of pre-recorded data and is restricted to the geometry/field where the data were collected. ...
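Stacking previously recorded noise onto clean (for example, synthetic) waveforms, as referred to above, can be sketched as follows; the target signal-to-noise range and the scaling convention are assumptions for illustration.

    import numpy as np

    def add_recorded_noise(signal, noise, snr_db_range=(-5.0, 20.0)):
        """Superimpose a real noise window on a clean signal at a random SNR (in dB)."""
        snr_db = np.random.uniform(*snr_db_range)
        p_signal = np.mean(signal ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12            # avoid division by zero
        scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
        return signal + scale * noise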
Article
Full-text available
Noise is ever present in seismic data and arises from numerous sources and is continually evolving, both spatially and temporally. The use of supervised deep learning procedures for denoising of seismic datasets often results in poor performance: this is due to the lack of noise-free field data to act as training targets and the large difference in characteristics between synthetic and field datasets. Self-supervised, blind-spot networks typically overcome these limitation by training directly on the raw, noisy data. However, such networks often rely on a random noise assumption, and their denoising capabilities quickly decrease in the presence of even minimally-correlated noise. Extending from blind-spots to blind-masks has been shown to efficiently suppress coherent noise along a specific direction, but it cannot adapt to the ever-changing properties of noise. To preempt the network’s ability to predict the signal and reduce its opportunity to learn the noise properties, we propose an initial, supervised training of the network on a frugally-generated synthetic dataset prior to fine-tuning in a self-supervised manner on the field dataset of interest. Considering the change in peak signal-to-noise ratio, as well as the volume of noise reduced and signal leakage observed, using a semi-synthetic example we illustrate the clear benefit in initialising the self-supervised network with the weights from a supervised base-training. This is further supported by a test on a field dataset where the fine-tuned network strikes the best balance between signal preservation and noise reduction. Finally, the use of the unrealistic, frugally-generated synthetic dataset for the supervised base-training includes a number of benefits: minimal prior geological knowledge is required, substantially reduced computational cost for the dataset generation, and a reduced requirement of re-training the network should recording conditions change, to name a few. Such benefits result in a robust denoising procedure suited for long term, passive seismic monitoring.
... Depending on the configuration of optics for digital imaging, factors such as sharpness, distortion, and resolution can affect the image quality output, which can be resolved by several image processing techniques (Bayer et al., 2001). The former factors are not necessarily bottlenecks for the development of ML models, but instead, improve the generalisation and increase the dimension of training samples to unseen samples, to build more robust models (Zhu et al., 2020). State-of-the-art sensors are introduced which can help to fulfil the demand for the image capture of various microalgae under different scenarios, before image pre-processing. ...
Article
The identification of microalgae species is an important tool in scientific research and commercial application to prevent harmful algae blooms (HABs) and recognizing potential microalgae strains for the bioaccumulation of valuable bioactive ingredients. The aim of this study is to incorporate rapid, high-accuracy, reliable, low-cost, simple, and state-of-the-art identification methods. Thus, increasing the possibility for the development of potential recognition applications, that could identify toxic-producing and valuable microalgae strains. Recently, deep learning (DL) has brought the study of microalgae species identification to a much higher depth of efficiency and accuracy. In doing so, this review paper emphasizes the significance of microalgae identification, and various forms of machine learning algorithms for image classification, followed by image pre-processing techniques, feature extraction, and selection for further classification accuracy. Future prospects over the challenges and improvements of potential DL classification model development, application in microalgae recognition, and image capturing technologies are discussed accordingly.
... In addition, when applying the method to continuous waveform data, as was the case with the test data above, there were many cases where small amplitudes were missed when multiple waveforms of widely different amplitudes were included in the time window. Zhu et al. (2020) showed that by having multiple seismic waveforms in the training data, the detection performance improves when many seismic waveforms occur within a short period. Thus, it may be necessary to insert multiple seismic waveforms into the training data to improve the detection performance of active swarm earthquakes, where many earthquakes occur in a short period at a volcano. ...
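Inserting a second event into a training window, as suggested above, is essentially a superposition of two labeled examples; a minimal sketch under those assumptions (random offset, simple addition of the waveforms, element-wise maximum of the label functions) is shown below and is not the cited authors' implementation.

    import numpy as np

    def superimpose_events(wave1, label1, wave2, label2, min_offset, max_offset):
        """Add a second, randomly shifted earthquake waveform into an existing window.

        wave*  : np.ndarray of shape (n_channels, n_samples)
        label* : np.ndarray of shape (n_classes, n_samples), e.g. P/S arrival PDFs
        """
        offset = np.random.randint(min_offset, max_offset + 1)
        shifted_wave = np.roll(wave2, offset, axis=-1)
        shifted_label = np.roll(label2, offset, axis=-1)

        # Superimpose the waveforms; combine the labels so both events stay marked.
        wave = wave1 + np.random.uniform(0.5, 2.0) * shifted_wave
        label = np.maximum(label1, shifted_label)
        return wave, label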
Preprint
Full-text available
In volcanic regions, active earthquake swarms often occur associated with volcanic activity, and their rapid detection and measurement are crucial for volcano disaster prevention. Currently, however, these processes are ultimately left to human judgment and require much time and money, making detailed verification in real time impossible. To overcome this issue, we attempted to apply machine learning, which has been studied in many seismological fields in recent years. Several models have already been trained using a large amount of training data (mainly crustal earthquakes). Although there are some cases where these models can be applied without any problems, regional dependence on the learned models has also been reported. Since this study targets earthquakes in a volcanic region, existing learned models may be difficult to apply. Therefore, in this study, we created the above publicly available trained model (model0), a model trained with approximately 220,000 seismic waveform data recorded at Hakone volcano from 1999 to 2020 with initialized weights (model1) using the same architecture, and a model fine-tuned with the aforementioned Hakone data using the weight of model0 as initial values (model2), and evaluated their performance. As a result, the detection rates of model1 and 2 were much higher than model0. However, small amplitudes are often missed when multiple seismic waves are in a time window to determine the phase arrival. Therefore, we created training data with two waveforms in the one-time window, retrained the model using the data, and successfully detected waveforms that would have been missed previously. In addition, it was found that more events were detected by setting the threshold to a low value for detection, increasing the number of detections, and filtering by phase association and hypocenter location.
... The neural network architecture, neural network type, and size of a training set are not the only defining factors for good generalization. The hyperparameter tuning (e.g., Soto & Bernd 2020), training procedure, and data augmentation (e.g., Zhu et al. 2020) also play important roles. Transfer learning approaches are a viable option in cases with low generalization. ...
Article
Machine learning (ML) is a collection of methods used to develop understanding and predictive capability by learning relationships embedded in data. ML methods are becoming the dominant approaches for many tasks in seismology. ML and data mining techniques can significantly improve our capability for seismic data processing. In this review we provide a comprehensive overview of ML applications in earthquake seismology, discuss progress and challenges, and offer suggestions for future work. ▪ Conceptual, algorithmic, and computational advances have enabled rapid progress in the development of machine learning approaches to earthquake seismology. ▪ The impact of that progress is most clearly evident in earthquake monitoring and is leading to a new generation of much more comprehensive earthquake catalogs. ▪ Application of unsupervised approaches for exploratory analysis of these high-dimensional catalogs may reveal new understanding of seismicity. ▪ Machine learning methods are proving to be effective across a broad range of other seismological tasks, but systematic benchmarking through open source frameworks and benchmark data sets are important to ensure continuing progress. Expected final online publication date for the Annual Review of Earth and Planetary Sciences, Volume 51 is May 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
... Another benefit of this approach is the removal of the requirement for the inclusion of realistic noise. A number of studies in deep learning have circumvented this by the inclusion of previously recorded noise (e.g., [34,35,29]). However, this requires a substantial amount of pre-recorded data and is restricted to the geometry/field where the data were collected. ...
Preprint
Full-text available
Noise in seismic data arises from numerous sources and is continually evolving. The use of supervised deep learning procedures for denoising of seismic datasets often results in poor performance: this is due to the lack of noise-free field data to act as training targets and the large difference in characteristics between synthetic and field datasets. Self-supervised, blind-spot networks typically overcome these limitation by training directly on the raw, noisy data. However, such networks often rely on a random noise assumption, and their denoising capabilities quickly decrease in the presence of even minimally-correlated noise. Extending from blind-spots to blind-masks can efficiently suppress coherent noise along a specific direction, but it cannot adapt to the ever-changing properties of noise. To preempt the network's ability to predict the signal and reduce its opportunity to learn the noise properties, we propose an initial, supervised training of the network on a frugally-generated synthetic dataset prior to fine-tuning in a self-supervised manner on the field dataset of interest. Considering the change in peak signal-to-noise ratio, as well as the volume of noise reduced and signal leakage observed, we illustrate the clear benefit in initialising the self-supervised network with the weights from a supervised base-training. This is further supported by a test on a field dataset where the fine-tuned network strikes the best balance between signal preservation and noise reduction. Finally, the use of the unrealistic, frugally-generated synthetic dataset for the supervised base-training includes a number of benefits: minimal prior geological knowledge is required, substantially reduced computational cost for the dataset generation, and a reduced requirement of re-training the network should recording conditions change, to name a few.
... In Table II, even when trained without EEWA, the MMWA model can trigger many P arrivals, which works as anticipated since the movement of "marching" would produce a time-clipped earthquake waveform that only contains the P phase. Performance of the MWA model, in which the training strategy is similar to [33], [54], could support the above argument. It can hardly pick P arrivals using time-clipped earthquake waveforms without marching waveforms in training data. ...
Article
Full-text available
In this paper, we show that the Real-time Earthquake Detection and Phase-picking with multi-task Attention Network (RED-PAN) can carry out earthquake detection and seismic phase picking on real-time and continuous data with appropriate data augmentation. Goal-oriented data augmentations materialize the capability of RED-PAN. Mosaic Waveform Augmentation (MWA) synthesizes data conditioned by superimposed earthquake waveforms, Marching Mosaic Waveform Augmentation (MMWA) extends MWA to allow the dynamic input of seismograms, and Earthquake Early Warning Augmentation (EEWA) enables to identify P arrivals using the early part of P wave waveforms. For stable P and S arrival probability distribution functions (PDFs) of continuous recordings, we use the median values of phase predictions at each time point until the model scans through, which we term the Seismogram-Tracking Median Filter (STMF). For the real-time P arrival detection, we use a threshold (0.3) on the real-time P arrival PDF as the trigger criterion. We examined our proposed strategy in different application scenarios. For the dataset of the fixed-length samples, our RED-PAN( 60s) model performs similarly to EQTransformer on the STanford EArthquake Dataset (STEAD) and outperforms on the Taiwan dataset. For continuous data examination of the 2019 Ridgecrest earthquake sequence, the number of earthquake waveforms detected by our RED-PAN(60s) model is 2.7 times the number of EQTransformer under the same receptive field (60-second-long seismogram). In the application of earthquake early warning, our RED-PAN(60s) model only requires the P-wave waveform about 0.13 seconds long from the P-alert and 0.09 seconds long from the Taiwan Strong Motion Instrumentation Program (TSMIP) network. The source code is available at https://github.com/tso1257771/RED-PAN.
... We can perform rotation, flipping, adding random noise, or smoothing the training images. Nalepa et al. (2019); Hu et al. (2019); Zhu et al. (2020); Zhao et al. (2021); Zhang et al. (2021) show that data augmentation can improve the generalization of a neural network. However, some augmentation methods can provide negative impacts on the model. ...
... Since the number and sequence of stations might vary with different samples, we store the node information of each sample in the dictionary data structure. The edge information is not stored in the data set, but constructed during the training, validation, and testing process, because edges could change when performing data augmentation (e.g., reordering, resampling stations; W. Zhu et al., 2020). In addition, the edgeless structure allows for the exploration of different methods and threshold values to establish edges in further research. ...
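Because the snippet above notes that the station order and the station subset can change under augmentation, the graph edges have to be rebuilt for each sample; a small sketch of subsampling and reordering stations and then reconnecting each station to its k nearest neighbors is given below, with the k-nearest-neighbor rule and the distance inputs being assumptions for illustration.

    import numpy as np

    def resample_stations_and_edges(waves, coords, n_keep, k=3):
        """Randomly subsample and reorder stations, then rebuild k-nearest-neighbor edges.

        waves  : np.ndarray of shape (n_stations, n_channels, n_samples)
        coords : np.ndarray of shape (n_stations, 2), e.g. longitude/latitude
        """
        n_sta = waves.shape[0]
        keep = np.random.choice(n_sta, size=min(n_keep, n_sta), replace=False)
        waves, coords = waves[keep], coords[keep]

        # Rebuild directed edges: each station connects to its k nearest neighbors.
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        edges = [(i, j) for i in range(len(keep)) for j in np.argsort(d[i])[:k]]
        return waves, coords, edges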
Article
Full-text available
In this study, we build a multi‐station phase‐picking model named EdgePhase by integrating an Edge Convolutional module with a state‐of‐the‐art single‐station phase‐picking model, EQTransformer. The Edge Convolutional module, a variant of Graph Neural Network, exchanges information relevant to seismic phases between neighboring stations. In EdgePhase, seismograms are first encoded into the latent representations, then converted into enhanced representations by the Edge Convolutional module, and finally decoded into the P‐ and S‐phase probabilities. Compared to the standard EQTransformer, EdgePhase increases the precision (fraction of phase identifications that are real) and recall (fraction of phase arrivals that are identified) rate by 5% on our training and test data sets of Southern California earthquakes. To evaluate its performance in regions of different tectonic settings, we applied EdgePhase to detect the early aftershocks following the 2020 M7.0 Samos, Greece earthquake. Compared to a local earthquake catalog, EdgePhase produced 190% additional detections with an event distribution more conformative to a planar fault interface, suggesting higher fidelity in event locations. This case study indicates that EdgePhase provides a strong regional generalization capability in real‐world applications.
Preprint
The San Andreas Fault system, known for its frequent seismic activity, provides an extensive dataset for earthquake studies. The region's well-instrumented seismic networks have been crucial in advancing research on earthquake statistics, physics, and subsurface Earth structures. In recent years, earthquake data from California has become increasingly valuable for deep learning applications, such as Generalized Phase Detection (GPD) for phase detection and polarity determination, and PhaseNet for phase arrival-time picking. The continuous accumulation of data, particularly those manually labeled by human analysts, serves as an essential resource for advancing both regional and global deep learning models. To support the continued development of machine learning and data mining studies, we have compiled a unified California Earthquake Event Dataset (CEED) that integrates seismic records from the Northern California Earthquake Data Center (NCEDC) and the Southern California Earthquake Data Center (SCEDC). The dataset includes both automatically and manually determined parameters such as earthquake origin time, source location, P/S phase arrivals, first-motion polarities, and ground motion intensity measurements. The dataset is organized in an event-based format by year, spanning from 2000 to 2024, facilitating cross-referencing with event catalogs and enabling continuous updates in future years. This comprehensive open-access dataset is designed to support diverse applications including developing deep learning models, creating enhanced catalog products, and research into earthquake processes, fault zone structures, and seismic risks.
Article
Full-text available
As Japan is one of the most seismically active countries, seismic data from various institutions are shared in real time and made accessible via the Web to promote research. The Japan Meteorological Agency (JMA), in collaboration with the Ministry of Education, Culture, Sports, Science, and Technology, processes these data to compile a 'unified earthquake catalog' for use in the development of strategies for disaster prevention and public safety. Based on manual arrival-time measurements provided by the JMA, we retrained PhaseNet, a deep-learning phase picker (known as a neural phase picker) that has gained prominence in recent years, to promote the development of high-quality seismic catalogs in Japan. We utilized the PhaseNet architecture for our model and trained it using 6.1 million three-component seismic waveforms collected in 2014–2021. The performance of the original PhaseNet model, trained with data from California, was suboptimal when applied to routine Japanese data, particularly ocean-bottom seismometer records. Retraining the model with the JMA unified catalog and corresponding waveforms significantly enhanced its performance in picking the arrival times of regular and low-frequency earthquakes. Compared with the original PhaseNet, the dependency of the model on the type of seismic station was reduced by retraining and its performance for waveforms was improved even from stations not included in the training data set. The model performance varied with earthquake magnitude, highlighting the reliance on extensive data for small events in the training set. Compared with the conventional procedure, the model identified numerous events, particularly smaller ones with undetermined magnitudes when integrated into the routine automatic processing of the JMA. Furthermore, leveraging approximately ten times more training data than the California data set, we developed and trained PhaseNetWC, doubling the number of filter channels in each convolutional layer in comparison with those of the original PhaseNet. This modified phase picker surpassed the performance of its predecessor. The dissemination of these models is anticipated to enhance the analysis of routine observational data sets in Japan.
Article
Full-text available
Distributed acoustic sensing (DAS) presents challenges and opportunities for seismological research and data management. This study explores wavefield reconstruction using deep learning methods for data compression and wavefield separation. We test various architectures to treat DAS data as two‐dimensional arrays, such as the implicit neural representation (INR) models and the SHallow REcurrent Decoder (SHRED) model. The INR models present better data compression ability but do not generalize over space and time, a major practical limitation. On the other hand, SHRED generalizes over space and time for a single optical fiber with data from 20% decimated channels for the reconstruction. Despite good performance in reconstructing long‐wavelength features, the shallow recurrent decoder does not reconstruct transient earthquake wavefields at shorter wavelengths, limiting its usability for seismic data transmission. Nevertheless, we leverage wavefield reconstruction of ocean waves to separate them from the seismic wavefield and improve seismological use cases for earthquake detection and Earth imaging. In summary, as a lightweight deep learning model, SHRED is well suited for wavefield separation and lossy compression of the DAS data.
Article
Extensive research has been conducted in the domain of seismic noise to enhance the quality of seismic signals. However, despite these efforts, a notable gap exists in the literature concerning the physical properties of seismic noise with rigorous quantitative assessment methodologies for its characterization. Therefore, we suggest our data-driven generative model PPSD GAN, unconditional WGAN-GP framework which is trained with the PPSD loss. We define a metric PPSD score for evaluation by leveraging the information contained in the PPSD histogram. We used two distinct datasets sampled from noisy and quiet areas in our study. Compared with previous approaches, PPSD GAN achieved 9.6-24.3% higher PPSD scores compared to the existing models in both regions. The waveform generated by PPSD GAN is visually similar to the actual waveform. Also, the experimental result shows that our model succeeded in learning the regional characteristics.
Preprint
Full-text available
In Japan, one of the most seismically active nations, seismic data from various institutions are shared in real-time and made accessible via the web, which facilitates research by numerous scholars. The Japan Meteorological Agency (JMA), in collaboration with the Ministry of Education, Culture, Sports, Science, and Technology (MEXT), processes these data to compile a “unified earthquake catalog.” This catalog is crucial for developing disaster prevention strategies and significantly enhances societal safety. To facilitate the efficient development of high-quality seismic catalogs from this data, we retrained a deep-learning phase picker, known as a neural phase picker, which has gained prominence in recent years. This retraining was based on manual arrival time measurements provided by the JMA. We utilized the PhaseNet architecture for our model and trained it using 6.1 million three-component seismic waveforms collected between 2014 and 2021. When the original PhaseNet model, which was trained with data from California, was applied to routine Japanese data, its performance was suboptimal, especially with ocean-bottom seismometer records. However, retraining the model with the JMA-unified catalogs and corresponding waveforms significantly enhanced its performance in reading arrival times of natural and low-frequency earthquakes. The retrained model reduced its dependency on the types of stations that were monitored when applying the original PhaseNet and displayed improved performance for waveforms even from stations not included in the training dataset. The performance of the model varied with earthquake magnitude, which highlights the reliance on extensive data on small events in the training set. When integrated into the automatic processing used in the routine operation of the JMA, the model identified a larger number of events, especially smaller ones with undetermined magnitudes, compared to the events recorded by the conventional procedure. Furthermore, leveraging approximately ten times more training data than the California study, we developed and trained PhaseNetWC, which is a model with greater expressiveness than the original PhaseNet. This new model surpassed the performance of its predecessor. The publication and dissemination of these models are anticipated to enhance the analysis of routine observational datasets in Japan.
Article
The utilization of sequence signals in real-world mobile communications plays a crucial role in the design and optimization of communication methods. Through our own performance evaluation, we have confirmed that conventional augmentation techniques, mainly designed for image or photo data, are unsuitable for sequence signal applications due to inherent differences in data characteristics. To address this practical limitation, Multi-Shape Augmentation (MuShAug) employs Sequence Signal-To-Image (SSI) to represent sequence signals in image format, enabling the extraction of diverse signal features. To evaluate the practical applicability of our proposed method, we conduct experiments using real-world sequence signals collected from operational Fifth Generation (5G) mobile communication systems. In experimental trials, MuShAug consistently demonstrates robust generalization performance, achieving high levels of classification accuracy. Furthermore, through the incorporation of Random Phase Transformation (RPT), our method achieves further enhanced performance within advanced data augmentation techniques.
Article
Shortage of labeled seismic field data poses a significant challenge for deep-learning related applications in seismology. One approach to mitigate this issue is to use synthetic waveforms as a complement to field data. However, traditional physics-driven methods for synthesizing data are computationally expensive and may fail to capture features key for understanding the subsurface as in real seismic waveforms. In this study, we develop a deep-learning-based generative model, PhaseGen, for synthesizing realistic seismic waveforms dictated by provided P- and S-wave arrival labels. Contrary to previous generative models which require a large amount of data for training, the proposed model can be trained with only 100 seismic events recorded by a single seismic station. The fidelity, diversity and alignment for waveforms synthesized by PhaseGen with diverse P- and S-wave arrival labels are quantitatively evaluated. Also, PhaseGen is used to augment a labelled seismic dataset used for training a deep neural network for the phase picking task, and it is found that the model training using augmented datasets improves the picking performance. It is expected that PhaseGen can offer a valuable alternative for rapid seismic waveform synthesis and provide a promising solution for the lack of labeled seismic data.
Article
Full-text available
Ground motion parameters are crucial characteristics in earthquake warning and earthquake engineering practice. However, existing methods for estimating them are time-consuming and labor-intensive. In this study, a multi-task approach (GMP-MT) based on a hard parameter-sharing strategy and single-station data is proposed to improve overall estimation accuracy by jointly optimizing the estimation of peak ground acceleration (PGA) and peak ground velocity (PGV). In addition, to address data imbalance, this study reweights the mean squared error loss according to the data distribution. The developed network structure extracts seismic features across multiple dimensions as well as spatial–temporal correlations from high-dimensional seismic data. The designed model is trained and tested on the global three-component seismic waveform data recorded in the STanford EArthquake Dataset. Experimental results show that the correlation coefficients of PGA and PGV are above 90%, and the average errors are less than 0.19. The model is stable and largely insensitive to epicentral distance, hypocentral depth, and signal-to-noise ratio. Furthermore, the superiority of the model in terms of learning and fitting is demonstrated by comparison with several state-of-the-art models in the existing literature.
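As a rough illustration of the loss-reweighting idea described above, the sketch below scales a mean squared error by the inverse frequency of each target value in the training distribution; the binning scheme, normalization, and function names are assumptions for illustration, not the GMP-MT implementation.

```python
# Minimal sketch of a distribution-weighted MSE (illustrative; not the GMP-MT code).
import numpy as np

def weighted_mse(y_pred, y_true, bin_edges, bin_counts, eps=1e-6):
    """Weight each sample inversely to the frequency of its target value,
    so that rare (e.g. large ground-motion) values contribute more to the loss."""
    idx = np.clip(np.digitize(y_true, bin_edges) - 1, 0, len(bin_counts) - 1)
    weights = 1.0 / (bin_counts[idx] + eps)
    weights /= weights.mean()  # keep the overall loss scale comparable to plain MSE
    return float(np.mean(weights * (y_pred - y_true) ** 2))

# Example usage: bin_edges and bin_counts come from a histogram of the training targets.
targets = np.random.lognormal(mean=-2.0, sigma=1.0, size=10000)
bin_counts, bin_edges = np.histogram(targets, bins=20)
loss = weighted_mse(targets * 1.1, targets, bin_edges, bin_counts)
```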
Article
Full-text available
Given the recent developments in machine-learning technology, its application has rapidly progressed in various fields of earthquake seismology, achieving great success. Here, we review the recent advances, focusing on catalog development, seismicity analysis, ground-motion prediction, and crustal deformation analysis. First, we explore studies on the development of earthquake catalogs, including their elemental processes such as event detection/classification, arrival time picking, similar waveform searching, focal mechanism analysis, and paleoseismic record analysis. We then introduce studies related to earthquake risk evaluation and seismicity analysis. Additionally, we review studies on ground-motion prediction, which are categorized into four groups depending on whether the output is ground-motion intensity or ground-motion time series and the input is features (individual measurable properties) or time series. We discuss the effect of imbalanced ground-motion data on machine-learning models and the approaches taken to address the problem. Finally, we summarize the analysis of geodetic data related to crustal deformation, focusing on clustering analysis and detection of geodetic signals caused by seismic/aseismic phenomena.
Article
The conventional magnetotelluric inversion method is sensitive to the initial model, which leads to an unstable inversion process and a tendency to become trapped in local optima. In contrast, deep learning relies on its powerful non-linear fitting capability and can construct complex non-linear mappings directly from observation data (input) to model (output); in recent years it has received extensive attention from researchers. Due to the difficulty of creating a sufficiently large dataset and performing extensive neural network training, most current deep-learning-based magnetotelluric inversion methods for geophysical exploration remain limited to one-dimensional (1D) or two-dimensional (2D) scenarios. To the best of our knowledge, no deep learning-based three-dimensional (3D) magnetotelluric inversion has yet been reported in the literature. In this work, we propose a 3D magnetotelluric inversion method based on deep learning. By designing a neural network architecture for 3D structures (MT3D-Net), we achieve an end-to-end mapping from network input to output. To alleviate the network's excessive dependence on the training set, we introduce a joint weighted loss function combining data-driven and physics-driven terms, allowing the network to honor the physical constraints of magnetotelluric data during training and thus guide the update of network parameters more reasonably. Numerical experiments show that this method combines the advantages of traditional and data-driven inversions, significantly improving the stability and accuracy of magnetotelluric inversion. The proposed method has been successfully applied to synthetic models and measured field data and has good application prospects.
Article
Full-text available
Plain Language Summary Myanmar is a highly seismically active region, yet fault geometry and activity remain poorly understood because of limited modern seismological investigations. Here, we designed a set of machine-learning algorithms to detect small earthquakes and determine their locations precisely. The seismic data are recorded by a temporary seismic network deployed in central Myanmar. We obtained twice as many earthquakes as previous research that used the standard procedure. Our improved earthquake data set reveals changes in seismic activity along the Kabaw Fault through changes in earthquake locations, depths, and magnitude-frequency relations. The Kabaw Fault is an important boundary fault in the subduction system of the Indo-Burma Range. This subtle change was not previously observed and indicates a significant alteration in deformation style along the strike of the subduction zone. Moreover, our improved data set indicates that the Sagaing Fault, the most active fault in Myanmar, is prone to generating large earthquakes in the future. This implication warns nearby populated cities, such as Mandalay, of a significant megaquake threat.
Article
Seismic signal classification has many real-time applications related to monitoring and collecting information for investigations, public safety, and prevention of security breaches. We cross-amalgamated the seismic signals with acoustic data augmentation and feature extraction techniques, keeping the beneficial effects of each domain. In this context, we distinguished the human walk from that of an animal by analyzing the seismic response. This work presents a robust automated surveillance system for classifying seismic events in noisy environments, trained to exploit the collected geo-signals, namely the physical security dataset (PSD). An ensemble machine learning-based integrated physical security paradigm (EML-PSP) framework is proposed for automatically classifying humans and animals from seismic signals through a cross-domain ultra-fused feature extraction (UFFE) module that uses numerous speech-related feature extraction approaches. To help the model learn effectively, we introduce a hybrid augmentation module (HAM) that synthesizes realistic seismic signals based on multiple acoustic augmentation schemes. The ensemble features with enhanced discrimination power have been used to train ensemble algorithms such as the light gradient-boosted machine (LGBM), random forest, and adaptive boosting models. An exhaustive comparison of the proposed solution with other state-of-the-art methods has been carried out. Using the UFFE-based features, the LGBM ensemble outperformed the other classifiers with an F1-score of 0.9961 ± 0.0031. The Matthews correlation coefficient and accuracy were 0.9841 ± 0.0127 and 99.4111 ± 0.0047 percent, respectively. The results on the geo-sensor PSD illustrate that the EML-PSP framework has good prospects for physical security and surveillance.
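The hybrid idea of borrowing audio-style transformations for one-dimensional seismic traces can be sketched roughly as below; the specific transformations (noise injection, time shift, gain) and their parameter ranges are assumptions for illustration, not the authors' HAM module.

```python
# Rough sketch of audio-style augmentations applied to a 1-D seismic trace (illustrative only).
import numpy as np

def add_noise(trace, snr_db=20.0):
    """Add white noise at a target signal-to-noise ratio (in dB)."""
    noise = np.random.standard_normal(trace.shape)
    scale = np.std(trace) / (np.std(noise) * 10 ** (snr_db / 20.0))
    return trace + scale * noise

def random_time_shift(trace, max_shift=200):
    """Circularly shift the trace by a random number of samples."""
    return np.roll(trace, np.random.randint(-max_shift, max_shift + 1))

def random_gain(trace, low=0.5, high=2.0):
    """Rescale the amplitude by a random factor."""
    return trace * np.random.uniform(low, high)

def augment(trace):
    """Apply the three augmentations in sequence to create a new training sample."""
    return random_gain(random_time_shift(add_noise(trace)))
```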
Article
Full-text available
Deep-learning (DL) pickers have demonstrated superior performance in seismic phase picking compared to traditional pickers. DL pickers are extremely effective in processing large amounts of seismic data. Nevertheless, they encounter challenges when handling seismograms from different tectonic environments or source types, and even a slight change in the input waveform can considerably affect their consistency. Here, we fine-tuned a self-trained deep neural network picker using a small amount of local seismic data (26,875 three-component seismograms) recorded by regional seismic networks in South Korea. The self-trained model was developed using publicly available waveform datasets comprising over two million three-component seismograms. The results revealed that the Korean fine-tuned phase picker (KFpicker) effectively enhanced picking quality, even when applied to data that were not used during the fine-tuning process. Compared to the pre-trained model, this improvement was consistently observed regardless of variations in the positions of seismic phases in the input waveform. Furthermore, when KFpicker predicted phases for overlapping input windows and used the median probability as a threshold for phase detection, a considerable decrease was observed in the number of false picks. These findings indicate that fine-tuning a deep neural network using a small amount of local data can improve earthquake detection in the region of interest, while careful data augmentation can enhance the robustness of DL pickers against variations in the input window. Applying KFpicker to the 2016 Gyeongju earthquake sequence yielded approximately twice as many earthquakes as previous studies. Consequently, detailed and instantaneous statistical parameters of seismicity can be evaluated, making it possible to assess seismic hazard during an earthquake sequence.
Article
Full-text available
Data-driven approaches to identify geophysical signals have proven beneficial in high dimensional environments where model-driven methods fall short. GNSS offers a source of unsaturated ground motion observations that are the data currency of ground motion forecasting and rapid seismic hazard assessment and alerting. However, these GNSS-sourced signals are superposed onto hardware-, location- and time-dependent noise signatures influenced by the Earth’s atmosphere, low-cost or spaceborne oscillators, and complex radio frequency environments. Eschewing heuristic or physics based models for a data-driven approach in this context is a step forward in autonomous signal discrimination. However, the performance of a data-driven approach depends upon substantial representative samples with accurate classifications, and more complex algorithm architectures for deeper scientific insights compound this need. The existing catalogs of high-rate (≥1Hz) GNSS ground motions are relatively limited. In this work, we model and evaluate the probabilistic noise of GNSS velocity measurements over a hemispheric network. We generate stochastic noise time series to augment transferred low-noise strong motion signals from within 70 kilometers of strong events (≥ MW 5.0) from an existing inertial catalog. We leverage known signal and noise information to assess feature extraction strategies and quantify augmentation benefits. We find a classifier model trained on this expanded pseudo-synthetic catalog improves generalization compared to a model trained solely on a real-GNSS velocity catalog, and offers a framework for future enhanced data driven approaches.
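A very simplified version of the augmentation strategy above, superposing modeled stochastic noise onto transferred low-noise strong-motion velocity records, might look like the following sketch; the flat-plus-random-walk noise model and its amplitudes are placeholders, not the study's fitted hemispheric GNSS noise model.

```python
# Simplified sketch: augment clean velocity waveforms with synthetic GNSS-like noise.
import numpy as np

def synthetic_gnss_noise(n_samples, dt=1.0, white_std=0.005, walk_std=0.001):
    """White noise plus a random-walk component, a crude stand-in for GNSS velocity noise."""
    white = np.random.normal(0.0, white_std, n_samples)
    walk = np.cumsum(np.random.normal(0.0, walk_std * np.sqrt(dt), n_samples))
    return white + walk

def augment_with_noise(clean_velocity, n_copies=5):
    """Create several pseudo-GNSS training samples from one clean strong-motion record."""
    n = len(clean_velocity)
    return [clean_velocity + synthetic_gnss_noise(n) for _ in range(n_copies)]
```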
Article
Traditional seismic phase pickers perform poorly during periods of elevated seismicity due to inherent weakness when detecting overlapping earthquake waveforms. This weakness results in incomplete seismic catalogs, particularly deficient in earthquakes that are close in space and time. Supervised deep-learning (DL) pickers allow for improved detection performance and better handle the overlapping waveforms. Here, we present a DL phase-picking procedure specifically trained on Yellowstone seismicity and designed to fit within the University of Utah Seismograph Stations (UUSS) real-time system. We modify and combine existing DL models to label the seismic phases in continuous data and produce better phase arrival times. We use transfer learning to achieve consistency with UUSS analysts while maintaining robust models. To improve the performance during periods of enhanced seismicity, we develop a data augmentation strategy to synthesize waveforms with two nearly coincident P arrivals. We also incorporate a model uncertainty quantification method, Multiple Stochastic Weight Averaging-Gaussian (MultiSWAG), for arrival-time estimates and compare it to dropout—a more standard approach. We use an efficient, model-agnostic method of empirically calibrating the uncertainties to produce meaningful 90% credible intervals. The credible intervals are used downstream in association, location, and quality assessment. For an in-depth evaluation of our automated method, we apply it to continuous data recorded from 25 March to 3 April 2014, on 20 three-component stations and 14 vertical-component stations. This 10-day period contains an Mw 4.8 event, the largest earthquake in the Yellowstone region since 1980. A seismic analyst manually examined more than 1000 located events, including ∼855 previously unidentified, and concluded that only two were incorrect. Finally, we present an analyst-created, high-resolution arrival-time data set, including 651 new arrival times, for one hour of data from station WY.YNR for robust evaluation of missed detections before association. Our method identified 60% of the analyst P picks and 81% of the S picks.
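The overlapping-arrival augmentation described above can be illustrated with a toy sketch that superposes a second, time-shifted and rescaled event onto an existing labeled window and shifts its pick accordingly; the offsets and amplitude ranges are arbitrary choices, not the UUSS training configuration.

```python
# Toy sketch of synthesizing two nearly coincident P arrivals from two labeled windows.
import numpy as np

def overlap_events(wave_a, pick_a, wave_b, pick_b, min_gap=50, max_gap=400):
    """Superpose event B onto event A with a small random offset between their P picks.

    wave_*: (n_samples,) arrays; pick_*: P-arrival sample indices.
    Returns the mixed waveform and both pick indices in the new trace.
    """
    gap = np.random.randint(min_gap, max_gap + 1)   # desired spacing between picks, in samples
    shift = (pick_a + gap) - pick_b                  # move B's pick to pick_a + gap
    # np.roll wraps samples around the window edges; acceptable for a sketch when
    # the picks are well away from the ends of the window.
    shifted_b = np.roll(wave_b, shift)
    amp = np.random.uniform(0.2, 1.0)                # random relative amplitude of event B
    mixed = wave_a + amp * shifted_b
    return mixed, pick_a, pick_a + gap
```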
Article
Full-text available
Exploration seismology uses reflected and refracted seismic waves, emitted from a controlled (active) source into the ground, and recorded by an array of seismic sensors (receivers) to image the subsurface geologic structures. These seismic images are the main resources for energy and resource exploration and scientific investigation of the crust and upper mantle. We survey recent advances in applications of machine-learning methods, more specifically deep neural networks (DNNs), in exploration seismology. We provide a technically oriented review of DNN applications for seismic data acquisition; data preprocessing tasks such as interpolation/extrapolation, denoising, first-break picking, velocity picking, and seismic migration; data processing tasks such as geologic and structural interpretations; and data modeling tasks such as the inference of subsurface structures and lithologic and petrophysical properties. DNNs have entered almost every sector of exploration seismology. They have outperformed many traditional algorithms for the automation of seismic data acquisition, data preprocessing, data processing, interpretations, and data modeling tasks. However, despite the impressive performances of DNN-based approaches, the out-of-distribution generalization and interpretability of these models remain challenging. To overcome these challenges, incorporating domain knowledge into the DNNs is a promising path and a focus of current deep-learning research in seismology.
Article
The Machine Learning Asset Aggregation of the Preliminary Determination of Epicenters (MLAAPDE) dataset is a labeled waveform archive designed to enable rapid development of machine learning (ML) models used in seismic monitoring operations. MLAAPDE consists of more than 5.1 million recordings of 120 s long three-component broadband waveform data (raw counts) for P, Pn, Pg, S, Sn, and Sg arrivals. The labeled catalog is collected from the U.S. Geological Survey National Earthquake Information Center’s (NEIC) Preliminary Determination of Epicenters bulletin, which includes local to teleseismic observations for earthquakes ∼M 2.5 and larger. Each arrival in the labeled dataset has been manually reviewed by NEIC staff. An accompanying Python module enables users to develop customized training datasets with different time-series lengths, distance ranges, sampling rates, and/or phase lists. MLAAPDE is distinct from other publicly available datasets in containing local (14%), regional (36%), and teleseismic (50%) observations, where local, regional, and teleseismic distances are defined as 0°–3°, 3°–30°, and 30°+, respectively. A recent version of the dataset is publicly available (see Data and Resources), and user-specific versions can be generated locally with the accompanying software. MLAAPDE is an NEIC-supported, curated, and periodically updated dataset that can contribute to seismological ML research and development.
Article
Full-text available
The curation of seismic datasets is the cornerstone of seismological research and the starting point of machine-learning applications in seismology. We present a 21-year-long AI-ready dataset of diverse seismic event parameters, instrumentation metadata, and waveforms, as curated by the Pacific Northwest Seismic Network and ourselves. The dataset contains about 190,000 three-component (3C) waveform traces from more than 65,000 earthquake and explosion events, and about 9,200 waveforms from 5,600 exotic events. The magnitude of the events ranges from 0 to 6.4, the largest being the 20 December 2022 M 6.4 Ferndale earthquake. We include waveforms from high-gain (EH, BH, and HH channels) and strong-motion (EN channels) seismometers, resampled to 100 Hz. We describe the earthquake catalog and the temporal evolution of the data attributes (e.g., event magnitude type, channel type, waveform polarity, signal-to-noise ratio, and phase picks) as the network's earthquake monitoring system evolved through time. We propose this AI-ready dataset as a new open-source benchmark dataset.
Article
Deep learning has been applied to microseismic event detection over the past few years. However, it is still challenging to detect microseismic events from records with low signal-to-noise ratios (SNRs). To achieve high detection accuracy in low-SNR scenarios, we propose an end-to-end network that jointly performs denoising and classification tasks (JointNet), and apply it to fiber-optic distributed acoustic sensing (DAS) microseismic data. JointNet consists of 2D convolution layers that are suitable for extracting features (such as moveout and amplitude) of the dense DAS data. Moreover, JointNet uses a joint loss, rather than any intermediate loss, to simultaneously update the coupled denoising and classification modules. With the above advantages, JointNet is capable of simultaneously attenuating noise and preserving fine details of the events, thereby improving the accuracy of event detection. We generate synthetic events and collect real background noise from a hydraulic fracturing project, and then expand them using data augmentation methods to yield sufficient training datasets. We train and validate JointNet using training datasets of different SNRs and compare it with the conventional classification networks VGG (Visual Geometry Group) and DVGG (Deep VGG). The results demonstrate the effectiveness of JointNet: it consistently outperforms VGG and DVGG in all SNR scenarios and has a superior capability to detect events, especially in low-SNR scenarios. Finally, we apply JointNet to detect microseismic events in real DAS data acquired during hydraulic fracturing. JointNet successfully detects all manually identified events and performs better than VGG and DVGG.
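The coupling of the two tasks through a single joint loss, rather than an intermediate loss after the denoiser, can be written schematically as below; the choice of MSE and cross-entropy terms and the weights are assumptions for illustration, not JointNet's published loss.

```python
# Schematic joint loss for coupled denoising and event classification (illustrative terms and weights).
import torch
import torch.nn.functional as F

def joint_loss(denoised, clean_target, class_logits, class_labels,
               w_denoise=1.0, w_classify=1.0):
    """Single loss that updates the denoising and classification modules together."""
    loss_denoise = F.mse_loss(denoised, clean_target)             # waveform reconstruction term
    loss_classify = F.cross_entropy(class_logits, class_labels)   # event / noise classification term
    return w_denoise * loss_denoise + w_classify * loss_classify
```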
Preprint
Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. The recorded seismic signals by DAS have several distinct characteristics, such as unknown coupling effects, strong anthropogenic noise, and ultra-dense spatial sampling. These aspects differ from conventional seismic data recorded by seismic networks, making it challenging to utilize DAS at present for seismic monitoring. New data analysis algorithms are needed to extract useful information from DAS data. Previous studies on conventional seismic data demonstrated that deep learning models could achieve performance close to human analysts in picking seismic phases. However, phase picking on DAS data is still a difficult problem due to the lack of manual labels. Further, the differences in mathematical structure between these two data formats, i.e., ultra-dense DAS arrays and sparse seismic networks, make model fine-tuning or transfer learning difficult to implement on DAS data. In this work, we design a new approach using semi-supervised learning to solve the phase-picking task on DAS arrays. We use a pre-trained PhaseNet model as a teacher network to generate noisy labels of P and S arrivals on DAS data and apply the Gaussian mixture model phase association (GaMMA) method to refine these noisy labels to build training datasets. We develop a new deep learning model, PhaseNet-DAS, to process the 2D spatial-temporal data of DAS arrays and train the model on DAS data. The new deep learning model achieves high picking accuracy and good earthquake detection performance. We then apply the model to process continuous data and build earthquake catalogs directly from DAS recording. Our approach using semi-supervised learning provides a way to build effective deep learning models for DAS, which have the potential to improve earthquake monitoring using large-scale fiber networks.
Article
We propose MiShape, a millimeter-wave (mmWave) wireless signal based imaging system that generates high-resolution human silhouettes and predicts 3D locations of body joints. The system can capture human motions in real-time under low light and low-visibility conditions. Unlike existing vision-based motion capture systems, MiShape is privacy non-invasive and can generalize to a wide range of motion tracking applications at-home. To overcome the challenges with low-resolution, specularity, and aliasing in images from Commercial-Off-The-Shelf (COTS) mmWave systems, MiShape designs deep learning models based on conditional Generative Adversarial Networks and incorporates the rules of human biomechanics. We have customized MiShape for gait monitoring, but the model is well adaptive to any tracking applications with limited fine-tuning samples. We experimentally evaluate MiShape with real data collected from a COTS mmWave system for 10 volunteers, with diverse ages, gender, height, and somatotype, performing different poses. Our experimental results demonstrate that MiShape delivers high-resolution silhouettes and accurate body poses on par with an existing vision-based system, and unlocks the potential of mmWave systems, such as 5G home wireless routers, for privacy-noninvasive healthcare applications.
Article
Seismic waves from earthquakes and other sources are used to infer the structure and properties of Earth’s interior. The availability of large-scale seismic datasets and the suitability of deep-learning techniques for seismic data processing have pushed deep learning to the forefront of fundamental, long-standing research investigations in seismology. However, some aspects of applying deep learning to seismology are likely to prove instructive for the geosciences, and perhaps other research areas more broadly. Deep learning is a powerful approach, but there are subtleties and nuances in its application. We present a systematic overview of trends, challenges, and opportunities in applications of deep-learning methods in seismology.
Article
Full-text available
Seismograms contain multiple sources of seismic waves, from distinct transient signals such as earthquakes to continuous ambient seismic vibrations such as microseism. Ambient vibrations contaminate the earthquake signals, while the earthquake signals pollute the ambient noise’s statistical properties necessary for ambient-noise seismology analysis. Separating ambient noise from earthquake signals would thus benefit multiple seismological analyses. This work develops a multi-task encoder-decoder network named WaveDecompNet to separate transient signals from ambient signals directly in the time domain for 3-component seismograms. We choose the active-volcanic Big Island in Hawai’i as a natural laboratory given its richness in transients (tectonic and volcanic earthquakes) and diffuse ambient noise (strong microseism). The approach takes a noisy 3-component seismogram as input and independently predicts the 3-component earthquake and noise waveforms. The model is trained on earthquake and noise waveforms from the STandford EArthquake Dataset (STEAD) and on the local noise of seismic station IU.POHA. We estimate the network’s performance by using the Explained Variance (EV) metric on both earthquake and noise waveforms. We explore different neural network designs for WaveDecompNet and find that the model with Long-Short-Term-Memory (LSTM) performs best over other structures. Overall, we find that WaveDecompNet provides satisfactory performance down to a Signal-to-Noise-Ratio (SNR) of 0.1. The potential of the method is 1) to improve broadband SNR of transient (earthquake) waveforms and 2) to improve local ambient noise to monitor the Earth’s structure using ambient noise signals. To test this, we apply a Short-Time-Average to a Long-Time-Average (STA/LTA) filter and improve the number of detected events. We also measure single-station cross-correlation functions of the recovered ambient noise and establish their improved coherence through time and over different frequency bands. We conclude that WaveDecompNet is a promising tool for a broad range of seismological research.
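For reference, the STA/LTA filter mentioned above compares a short-term average of signal energy with a long-term average and declares a trigger when their ratio exceeds a threshold; a basic version (with arbitrary window lengths) is sketched below.

```python
# Basic STA/LTA characteristic function (window lengths and threshold are arbitrary examples).
import numpy as np

def sta_lta(trace, n_sta=50, n_lta=1000, eps=1e-12):
    """Ratio of short-term to long-term running averages of the squared trace."""
    energy = trace.astype(float) ** 2
    csum = np.cumsum(energy)
    sta = (csum[n_sta:] - csum[:-n_sta]) / n_sta
    lta = (csum[n_lta:] - csum[:-n_lta]) / n_lta
    # Align the two running averages on the same ending sample before taking the ratio.
    ratio = sta[n_lta - n_sta:] / (lta + eps)
    return ratio  # triggers are typically declared where the ratio exceeds a threshold (e.g. 4-5)
```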
Article
We propose a method to generate seismic images with corresponding fault labels for augmenting training data in automatic fault detection. Our method is based on two generative adversarial networks: one for creating fault systems and the other for generating 2D seismic images conditioned on the faults. Our method can capture the characteristics of field seismic data during inference to generate samples that have properties of both field seismic data and synthetic training data. We then use the newly generated seismic images with corresponding fault labels to train a convolutional neural network for fault picking. We test the proposed approach on a 3D field dataset from the Gulf of Mexico. We use different areas in the field dataset as input to generate new training data for corresponding fault-picking models. The results show that the generated training data from our method help improve the fault-picking models in the targeted areas.
Article
Full-text available
Earthquake monitoring by seismic networks typically involves a workflow consisting of phase detection/picking, association, and location tasks. In recent years, the accuracy of these individual stages has been improved through the use of machine learning techniques. In this study, we introduce a new end‐to‐end approach that improves overall earthquake detection accuracy by jointly optimizing each stage of the detection pipeline. We propose a neural network architecture for the task of multi‐station processing of seismic waveforms recorded over a seismic network. This end‐to‐end architecture consists of three sub‐networks: a backbone network that extracts features from raw waveforms, a phase picking sub‐network that picks P‐ and S‐wave arrivals based on these features, and an event detection sub‐network that aggregates the features from multiple stations to associate and detect earthquakes across a seismic network. We use these sub‐networks together with a shift‐and‐stack module based on back‐projection that introduces kinematic constraints on arrival times, allowing the neural network model to generalize to different velocity models and to variable station geometry in seismic networks. We evaluate our proposed method on the STanford EArthquake Dataset (STEAD) and on the 2019 Ridgecrest, CA earthquake sequence. The results demonstrate that our end‐to‐end approach can effectively pick P‐ and S‐wave arrivals and achieve earthquake detection accuracy rivaling that of other state‐of‐the‐art approaches. Because our approach preserves information across tasks in the detection pipeline, it has the potential to outperform approaches that do not.
Article
Full-text available
We present a deep-learning method for single-station earthquake location, which we approach as a regression problem using two separate Bayesian neural networks. We use a multitask temporal convolutional neural network to learn epicentral distance and P travel time from 1-min seismograms. The network estimates epicentral distance and P travel time with mean errors of 0.23 km and 0.03 s and standard deviations of 5.42 km and 0.66 s, respectively, along with their epistemic and aleatory uncertainties. We design a separate multi-input network using standard convolutional layers to estimate the back-azimuth angle and its epistemic uncertainty. This network estimates the direction from which seismic waves arrive at the station with a mean error of 1°. Using this information, we estimate the epicenter, origin time, and depth along with their confidence intervals. We use a global data set of earthquake signals recorded within 1° (~112 km) of the event to build the model and demonstrate its performance. Our model can predict epicenter, origin time, and depth with mean errors of 7.3 km, 0.4 s, and 6.7 km, respectively, at different locations around the world. Our approach can be used for fast earthquake source characterization with a limited number of observations and also for estimating the location of earthquakes that are sparsely recorded, either because they are small or because stations are widely separated.
Article
Full-text available
The accurate and automated determination of small earthquake (ML < 3.0) locations is still a challenging endeavor due to low signal-to-noise ratio in data. However, such information is critical for monitoring seismic activity and assessing potential hazards. In particular, earthquakes caused by industrial injection have become a public concern, and regulators need a solid capability for estimating small earthquakes that may trigger the action requirements for operators to follow in real time. In this study, we develop a fully convolutional network and locate earthquakes induced during oil and gas operations in Oklahoma with data from 30 network stations. The network is trained by 1,013 cataloged events (ML ≥ 3.0) as base data along with augmented data accounting for smaller events (3.0 > ML ≥ 0.5), and the output is a 3D volume of the event location probability in the Earth. The prediction results suggest that the mean epicenter errors of the testing events (ML ≥ 1.5) vary from 3.7 to 6.4 km, meeting the need of the traffic light system in Oklahoma, but smaller events (ML = 1.0, 0.5) show errors larger than 11 km. Synthetic tests suggest that the accuracy of ground truth from catalog affects the prediction results. Correct ground truth leads to a mean epicenter error of 2.0 km in predictions, but adding a mean location error of 6.3 km to ground truth causes a mean epicenter error of 4.9 km. The automated system is able to distinguish certain interfered events or events out of the monitoring zone based on the output probability estimate. It requires approximately one hundredth of a second to locate an event without the need for any velocity model or human interference.
Article
Full-text available
In this study, we present a fast and reliable method for end‐to‐end estimation of earthquake magnitude from raw waveforms recorded at single stations. We design a regressor (MagNet) composed of convolutional and recurrent neural networks that is not sensitive to the data normalization, hence waveform amplitude information can be utilized during the training. The network can learn distance‐dependent and site‐dependent functions directly from the training data. Our model can predict local magnitudes with an average error close to zero and standard deviation of ~0.2 based on single‐station waveforms without instrument response correction. We test the network for both local and duration magnitude scales and show a station‐based learning can be an effective approach for improving the performance. The proposed approach has a variety of potential applications from routine earthquake monitoring to early warning systems.
Article
Full-text available
Seismology is a data-rich and data-driven science. The application of machine learning for gaining new insights from seismic data is a rapidly evolving sub-field of seismology. The availability of a large amount of seismic data and computational resources, together with the development of advanced techniques, can foster more robust models and algorithms to process and analyze seismic signals. Known examples, or labeled data sets, are the essential requisite for building supervised models. Seismology has labeled data, but the reliability of those labels is highly variable, and the lack of high-quality labeled data sets to serve as ground truth, as well as the lack of standard benchmarks, are obstacles to more rapid progress. In this paper we present a high-quality, large-scale, and global data set of local earthquake and non-earthquake signals recorded by seismic instruments. The data set in its current state contains two categories: (1) local earthquake waveforms (recorded at “local” distances, within 350 km of earthquakes) and (2) seismic noise waveforms that are free of earthquake signals. Together these data comprise ∼1.2 million time series, or more than 19,000 hours of seismic signal recordings. Constructing such a large-scale database with reliable labels is a challenging task. Here, we present the properties of the data set, describe the data collection, quality control procedures, and processing steps we undertook to ensure accurate labeling, and discuss potential applications. We hope that the scale and accuracy of STEAD present new and unparalleled opportunities to researchers in the seismological community and beyond.
Article
Full-text available
Generative adversarial networks have gained a lot of attention in the computer vision community due to their capability of data generation without explicitly modelling the probability density function. The adversarial loss brought by the discriminator provides a clever way of incorporating unlabeled samples into training and imposing higher order consistency. This has proven to be useful in many cases, such as domain adaptation, data augmentation, and image-to-image translation. These properties have attracted researchers in the medical imaging community, and we have seen rapid adoption in many traditional and novel applications, such as image reconstruction, segmentation, detection, classification, and cross-modality synthesis. Based on our observations, this trend will continue and we therefore conducted a review of recent advances in medical imaging using the adversarial training scheme with the hope of benefiting researchers interested in this technique.
Article
Full-text available
Frequency filtering is widely used in routine processing of seismic data to improve the signal-to-noise ratio (SNR) of recorded signals and by doing so to improve subsequent analyses. In this paper, we develop a new denoising/decomposition method, DeepDenoiser, based on a deep neural network. This network is able to simultaneously learn a sparse representation of data in the time-frequency domain and a non-linear function that maps this representation into masks that decompose input data into a signal of interest and noise (defined as any non-seismic signal). We show that DeepDenoiser achieves impressive denoising of seismic signals even when the signal and noise share a common frequency band. Because the noise statistics are automatically learned from data and require no assumptions, our method properly handles white noise, a variety of colored noise, and non-earthquake signals. DeepDenoiser can significantly improve the SNR with minimal changes in the waveform shape of interest, even in the presence of high noise levels. We demonstrate the effect of our method on improving earthquake detection. There are clear applications of DeepDenoiser to seismic imaging, micro-seismic monitoring, and preprocessing of ambient noise data. We also note that the potential applications of our approach are not limited to these applications or even to earthquake data and that our approach can be adapted to diverse signals and applications in other settings.
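The masking idea behind the decomposition described above, predicting a time-frequency mask and applying it to the short-time Fourier transform of the noisy record, can be sketched as follows; here `predict_mask` is a placeholder standing in for the trained network and is not part of DeepDenoiser's published code.

```python
# Sketch of mask-based denoising in the time-frequency domain.
import numpy as np
from scipy.signal import stft, istft

def denoise(trace, fs, predict_mask, nperseg=256):
    """Apply a predicted [0, 1] time-frequency mask to recover the signal of interest."""
    f, t, spec = stft(trace, fs=fs, nperseg=nperseg)
    mask = predict_mask(np.abs(spec))        # same shape as spec, values in [0, 1]
    _, clean = istft(spec * mask, fs=fs, nperseg=nperseg)
    return clean[: len(trace)]
```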
Article
Full-text available
Earthquake signal detection is at the core of observational seismology. A good detection algorithm should be sensitive to small and weak events with a variety of waveform shapes, robust to background noise and non-earthquake signals, and efficient for processing large data volumes. Here, we introduce the Cnn-Rnn Earthquake Detector (CRED), a detector based on deep neural networks. CRED uses a combination of convolutional layers and bi-directional long short-term memory units in a residual structure. It learns the time-frequency characteristics of the dominant phases in an earthquake signal from three-component data recorded on individual stations. We train the network using 500,000 seismograms (250k associated with tectonic earthquakes and 250k identified as noise) recorded in Northern California. The robustness of the trained model with respect to the noise level and non-earthquake signals is shown by applying it to a set of semi-synthetic signals. We also apply the model to one month of continuous data recorded in central Arkansas to demonstrate its efficiency, generalization, and sensitivity. Our model is able to detect more than 800 microearthquakes as small as −1.3 ML induced during hydraulic fracturing, far from the training region. We compare the performance of the model with the STA/LTA, template matching, and FAST algorithms. Our results indicate an efficient and reliable performance of CRED. This framework holds great promise for lowering the detection threshold while minimizing false positive detection rates.
Article
Full-text available
Plain Language Summary Deep learning is currently attracting immense research interest in seismology due to its powerful ability to deal with huge seismic data collections. In this study we developed a deep‐learning model (PickNet) that can rapidly pick a great number of first P and S wave arrival times precisely from local earthquake seismograms. The picking accuracy of the arrival times provided by our PickNet model is close to that by human experts. The data are good enough to be used directly to determine high‐resolution 3‐D velocity models of the Earth. Our PickNet model can deal with seismic waveforms provided by data centers of different earthquake networks. Furthermore, our PickNet model is also a potential tool for automatically picking later seismic phases accurately. A large number of high‐quality seismic arrival times can be used to illuminate the Earth structure clearly. Hence, this study may greatly contribute to improve our knowledge of the Earth's interior.
Article
Full-text available
The increasing volume of seismic data from long-term continuous monitoring motivates the development of algorithms based on convolutional neural networks (CNNs) for faster and more reliable phase detection and picking. However, many less-studied regions lack the significant amount of labeled events needed for traditional CNN approaches. In this paper, we present a CNN-based Phase-Identification Classifier (CPIC) designed for phase detection and picking on small- to medium-sized training datasets. When trained on 30,146 labeled phases and applied to one month of continuous recordings during the aftershock sequence of the 2008 Mw 7.9 Wenchuan Earthquake in Sichuan, China, CPIC detects 97.5% of the manually picked phases in the standard catalog and predicts their arrival times with a fivefold improvement over the ObsPy AR picker. In addition, unlike other CNN-based approaches that require millions of training samples, when the offline training set size of CPIC is reduced to only a few thousand training samples, the accuracy stays above 95%. The online implementation of CPIC takes less than 12 hours to pick arrivals in 31 days of recordings on 14 stations. In addition to the catalog phases manually picked by analysts, CPIC finds more phases for existing events and new events missed in the catalog. Among those additional detections, some are confirmed by a matched-filter method while others require further investigation. Finally, when tested on a small dataset from a different region (Oklahoma, US), CPIC achieves 97% accuracy after fine-tuning only the fully connected layer of the model. This result suggests that the CPIC developed in this study can be used to identify and pick P/S arrivals in other regions with no or minimal labeled phases.
Article
Full-text available
In this letter, we use deep neural networks for unsupervised clustering of seismic data. We perform the clustering in a feature space that is simultaneously optimized with the clustering assignment, resulting in learned feature representations that are effective for a specific clustering task. To demonstrate the application of this method in seismic signal processing, we design two different neural networks consisting primarily of full convolutional and pooling layers and apply them to: 1) discriminate waveforms recorded at different hypocentral distances and 2) discriminate waveforms with different first-motion polarities. Our method results in precisions that are comparable to those recently achieved by supervised methods, but without the need for labeled data, manual feature engineering, and large training sets. The applications we present here can be used in standard single-site earthquake early warning systems to reduce the false alerts on an individual station level. However, the presented technique is general and suitable for a variety of applications including quality control of the labeling and classification results of other supervised methods.
Article
Full-text available
We developed a hybrid algorithm using both convolutional and recurrent neural networks (CNNs and RNNs, respectively) to pick phases from archived continuous waveforms in two steps. First, an eight-layer CNN is trained to detect earthquake events from 30-second-long three-component seismograms. The event seismograms are then sent to a two-layer bidirectional RNN to pick P- and S-arrival times. The data for training, validation, and testing of the networks are obtained from the continuous waveforms of 16 stations recording the aftershock sequence of the 2008 Wenchuan earthquake. The augmented training set has 135,966 P–S-wave arrival-time pairs. The CNN achieved 94% and 98% hit rates for event and noise segments in the test set, respectively. The RNN picking errors for P and S waves are −0.03 ± 0.48 s and 0.03 ± 0.56 s (mean error ± standard deviation), respectively.
Article
Full-text available
The availability of abundant digital seismic records and successful application of deep learning in pattern recognition and classification problems enable us to achieve a reliable earthquake detection framework. To overcome the limitations and challenges of conventional methods, which are mainly due to an incomplete set of template waveforms and low signal‐to‐noise ratio, we design a generalized model to improve discrimination between earthquake and noise recordings using a deep convolutional network (ConvNet). Exclusively based on a dataset of over 4900 earthquakes recorded over a period of 3 yrs in western Canada, a multilayer ConvNet is trained to learn general characteristics of background noise and earthquake signals in the time–frequency domain. In the next step, we train a secondary network using the wavelet transform of the major seismic arrivals to separate P from S waves and estimate their approximate arrival times. The results of validation experiments demonstrate promising performance and achieve an average accuracy of nearly 99% for both networks. To investigate the applicability of our algorithm, we apply the trained model on an independent dataset recently recorded in northeastern British Columbia (NE BC). It is found that deep‐learning‐based methods are superior to traditional techniques in detecting a higher number of seismic events at significantly less computational cost.
Article
Full-text available
Seismic phase association is a fundamental task in seismology that pertains to linking together phase detections on different sensors that originate from a common earthquake. It is widely employed to detect earthquakes on permanent and temporary seismic networks and underlies most seismicity catalogs produced around the world. This task can be challenging because the number of sources is unknown, events frequently overlap in time, or can occur simultaneously in different parts of a network. We present PhaseLink, a framework based on recent advances in deep learning for grid‐free earthquake phase association. Our approach learns to link phases together that share a common origin and is trained entirely on millions of synthetic sequences of P and S wave arrival times generated using a 1‐D velocity model. Our approach is simple to implement for any tectonic regime, suitable for real‐time processing, and can naturally incorporate errors in arrival time picks. Rather than tuning a set of ad hoc hyperparameters to improve performance, PhaseLink can be improved by simply adding examples of problematic cases to the training data set. We demonstrate the state‐of‐the‐art performance of PhaseLink on a challenging sequence from southern California and synthesized sequences from Japan designed to test the point at which the method fails. For the examined data sets, PhaseLink can precisely associate phases to events that occur only ∼12 s apart in origin time. This approach is expected to improve the resolution of seismicity catalogs, add stability to real‐time seismic monitoring, and streamline automated processing of large seismic data sets.
Conference Paper
Full-text available
These days deep learning is the fastest-growing area within machine learning (ML) and deep neural networks (DNNs). Among the many DNN structures, convolutional neural networks (CNNs) are currently the main tool for image analysis and classification. Despite great achievements and perspectives, deep neural networks and their accompanying learning algorithms still face relevant challenges. In this paper, we focus on one of the most frequently mentioned problems in machine learning: the lack of a sufficient amount of training data or an uneven class balance within datasets. One way of dealing with this problem is so-called data augmentation. We compare and analyze multiple methods of data augmentation for image classification, starting from classical image transformations such as rotating, cropping, zooming, and histogram-based methods, and ending with Style Transfer and Generative Adversarial Networks, along with representative examples. Next, we present our own data augmentation method based on image style transfer. The method generates new images of high perceptual quality that combine the content of a base image with the appearance of other images. The newly created images can be used to pre-train a given neural network to improve the efficiency of the training process. The proposed method is validated on three medical case studies that use image classification to provide a diagnosis: skin melanoma diagnosis, histopathological image analysis, and breast magnetic resonance imaging (MRI) scan analysis. In such problems, data deficiency is one of the most relevant issues. Finally, we discuss the advantages and disadvantages of the analyzed methods.
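For context, the classical image transformations that the paper takes as its starting point (flips, crops, and rotations) can be implemented in a few lines; the sketch below uses plain NumPy and is only a generic illustration, not the authors' pipeline.

```python
# Generic classical image augmentations (illustrative, NumPy only; img is an (H, W, C) array).
import numpy as np

def random_flip(img):
    """Horizontally flip the image with probability 0.5."""
    return img[:, ::-1, :] if np.random.rand() < 0.5 else img

def random_crop(img, crop_h, crop_w):
    """Take a random crop of size (crop_h, crop_w)."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop_h + 1)
    left = np.random.randint(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def random_rotation90(img):
    """Rotate by a random multiple of 90 degrees."""
    return np.rot90(img, k=np.random.randint(0, 4))
```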
Article
Full-text available
Performance of earthquake early warning (EEW) systems suffers from false alerts caused by local impulsive noise from natural or anthropogenic sources. To mitigate this problem, we train a Generative Adversarial Network (GAN) to learn the characteristics of first‐arrival earthquake P waves, using 300,000 waveforms recorded in southern California and Japan. We apply the GAN critic as an automatic feature extractor and train a Random Forest classifier with about 700,000 earthquake and noise waveforms. We show that the discriminator can recognize 99.2% of the earthquake P waves and 98.4% of the noise signals. This state‐of‐the‐art performance is expected to reduce significantly the number of false triggers from local impulsive noise. Our study demonstrates that GANs can discover a compact and effective representation of seismic waves, which has the potential for wide applications in seismology.
Article
Full-text available
Determining earthquake hypocenters and focal mechanisms requires precisely measured P-wave arrival times and first-motion polarities. Automated algorithms for estimating these quantities have been less accurate than estimates by human experts, which is problematic for processing large data volumes. Here, we train convolutional neural networks to measure both quantities, which learn directly from seismograms without the need for feature extraction. The networks are trained on 18.2 million manually picked seismograms for the southern California region. Through cross-validation on 1.2 million independent seismograms, the differences between the automated and manual picks have a standard deviation of 0.023 seconds. The polarities determined by the classifier have a precision of 95% when compared with analyst-determined polarities. We show that the classifier picks more polarities overall than the analysts, without sacrificing quality, resulting in almost double the number of focal mechanisms. The remarkable precision of the trained networks indicates that they can perform as well, or better, than expert seismologists.
Article
Full-text available
Deep learning methods, and in particular convolutional neural networks (CNNs), have led to an enormous breakthrough in a wide range of computer vision tasks, primarily by using large-scale annotated datasets. However, obtaining such datasets in the medical domain remains a challenge. In this paper, we present methods for generating synthetic medical images using recently presented deep learning Generative Adversarial Networks (GANs). Furthermore, we show that generated medical images can be used for synthetic data augmentation, and improve the performance of CNN for medical image classification. Our novel method is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases and 65 hemangiomas). We first exploit GAN architectures for synthesizing high quality liver lesion ROIs. Then we present a novel scheme for liver lesion classification using CNN. Finally, we train the CNN using classic data augmentation and our synthetic data augmentation and compare performance. In addition, we explore the quality of our synthesized examples using visualization and expert assessment. The classification performance using only classic data augmentation yielded 78.6% sensitivity and 88.4% specificity. By adding the synthetic data augmentation the results increased to 85.7% sensitivity and 92.4% specificity. We believe that this approach to synthetic data augmentation can generalize to other medical classification applications and thus support radiologists' efforts to improve diagnosis.
Article
Full-text available
While convolutional neural networks (CNNs) have been successfully applied to many challenging classification applications, they typically require large datasets for training. When the availability of labeled data is limited, data augmentation is a critical preprocessing step for CNNs. However, data augmentation for wearable sensor data has not been deeply investigated yet. In this paper, various data augmentation methods for wearable sensor data are proposed. The proposed methods and CNNs are applied to the problem of classifying the motor state of Parkinson's Disease (PD) patients, which is challenging due to small dataset size, noisy labels, and large within-class variability. Appropriate augmentation improves the classification performance from 76.7% to 92.0%.
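Typical time-series augmentations of the kind proposed for wearable sensor data, such as jittering, scaling, and segment permutation, can be sketched as below; the parameter choices are illustrative rather than the paper's settings.

```python
# Common time-series augmentations for sensor data (illustrative parameter choices).
import numpy as np

def jitter(x, sigma=0.05):
    """Add Gaussian noise to each sample."""
    return x + np.random.normal(0.0, sigma, x.shape)

def scale(x, sigma=0.1):
    """Multiply the whole series by a random factor near 1."""
    return x * np.random.normal(1.0, sigma)

def permute_segments(x, n_segments=4):
    """Split the series into segments and shuffle their order."""
    segments = np.array_split(x, n_segments)
    order = np.random.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])
```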
Article
Full-text available
The recent evolution of induced seismicity in Central United States calls for exhaustive catalogs to improve seismic hazard assessment. Over the last decades, the volume of seismic data has increased exponentially, creating a need for efficient algorithms to reliably detect and locate earthquakes. Today's most elaborate methods scan through the plethora of continuous seismic records, searching for repeating seismic signals. In this work, we leverage the recent advances in artificial intelligence and present ConvNetQuake, a highly scalable convolutional neural network for earthquake detection and location from a single waveform. We apply our technique to study the induced seismicity in Oklahoma (USA). We detect 20 times more earthquakes than previously cataloged by the Oklahoma Geological Survey. Our algorithm is orders of magnitude faster than established methods.
Article
Full-text available
Seismic waveforms are clipped when the amplitude exceeds the upper limit of the dynamic range of the seismometer. Clipped waveforms are typically assumed not to be useful and are seldom used in waveform-based research. Here, we assume the clipped components of the waveform share the same frequency content as the un-clipped components. We leverage this similarity to convert clipped waveforms to true waveforms by iteratively reconstructing the frequency spectrum using the projection onto convex sets method. Using artificially clipped data we find that, statistically, the restoration error is approximately 1% and 5% when clipped at 70% and 40% of peak amplitude, respectively. We verify our method using real data recorded at co-located seismometers that have different gain controls, one set to record large amplitudes on scale and the other set to record low amplitudes on scale. Using our restoration method we recover 87 out of 93 clipped broadband records from the 2013 Mw 6.6 Lushan earthquake. Estimating that we can recover about 20 clipped waveforms for each M 5.0+ earthquake, the ~1,500 M 5.0+ events that occur each year could yield ~30,000 restored waveforms per year, which would greatly enhance usable waveform data archives. These restored waveform data would also improve azimuthal station coverage and the spatial footprint of networks.
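A stripped-down version of the iterative spectrum-based restoration described above alternates between constraining the frequency content and re-imposing the observed samples; the crude sparsity projection and iteration count below are simplifications for illustration, not the authors' exact projection-onto-convex-sets procedure.

```python
# Simplified iterative restoration of clipped samples (illustrative, not the published scheme).
import numpy as np

def declip(clipped, clip_level, n_iter=100, keep_fraction=0.1):
    """Reconstruct clipped samples by alternating a sparsity constraint on the spectrum
    with consistency constraints on the observed (unclipped) samples."""
    clipped = np.asarray(clipped, dtype=float)
    is_clipped = np.abs(clipped) >= clip_level
    x = clipped.copy()
    for _ in range(n_iter):
        spec = np.fft.rfft(x)
        # Keep only the largest-amplitude frequency components (crude sparsity projection).
        threshold = np.quantile(np.abs(spec), 1.0 - keep_fraction)
        spec[np.abs(spec) < threshold] = 0.0
        x = np.fft.irfft(spec, n=len(clipped))
        # Re-impose the data constraints: unclipped samples are known exactly,
        # and clipped samples must be at least as large as the clip level.
        x[~is_clipped] = clipped[~is_clipped]
        over = is_clipped & (np.abs(x) < clip_level)
        x[over] = np.sign(clipped[over]) * clip_level
    return x
```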
Article
Full-text available
Training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and a great deal of expertise to ensure proper convergence. A promising alternative is to fine-tune a CNN that has been pre-trained using, for instance, a large set of labeled natural images. However, the substantial differences between natural and medical images may advise against such knowledge transfer. In this paper, we seek to answer the following central question in the context of medical image analysis: Can the use of pre-trained deep CNNs with sufficient fine-tuning eliminate the need for training a deep CNN from scratch? To address this question, we considered four distinct medical imaging applications in three specialties (radiology, cardiology, and gastroenterology) involving classification, detection, and segmentation from three different imaging modalities, and investigated how the performance of deep CNNs trained from scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner. Our experiments consistently demonstrated that 1) the use of a pre-trained CNN with adequate fine-tuning outperformed or, in the worst case, performed as well as a CNN trained from scratch; 2) fine-tuned CNNs were more robust to the size of training sets than CNNs trained from scratch; 3) neither shallow tuning nor deep tuning was the optimal choice for a particular application; and 4) our layer-wise fine-tuning scheme could offer a practical way to reach the best performance for the application at hand based on the amount of available data.
Article
Full-text available
Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with limited amounts of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
Article
Correctly determining the association of seismic phases across a network is crucial for developing accurate earthquake catalogs. Nearly all established methods use travel-time information as the main criterion for determining associations, and when earthquake rates are high and many false arrivals are present, standard techniques may fail to resolve the associations accurately. As an alternative approach, in this work we apply convolutional neural networks (CNNs) to the association problem; we train CNNs to read pairs of earthquake waveform arrivals from two stations and predict the binary classification of whether the two waveforms come from a common source or from different sources. Applying the method to a large training dataset of previously cataloged earthquakes in Chile, we obtain >80% true positive prediction rates for high-frequency data (>2 Hz) and stations separated by more than 100 km. As a secondary benefit, the output of the neural network can also be used to infer the predicted phase types of arrivals. The method is ideally applied in conjunction with standard travel-time-based association routines and can be adapted for arbitrary network geometries and applications, as long as sufficient training data are available.
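A minimal PyTorch sketch of this kind of pair classifier (window length, channel count, and layer sizes are assumptions; the authors' architecture will differ):

```python
import torch
import torch.nn as nn

class PairAssociationCNN(nn.Module):
    """Two single-station waveform windows go in, a logit for
    'same source' comes out (illustrative sketch)."""
    def __init__(self, n_channels=3, n_samples=400):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(2 * n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (n_samples // 16), 64), nn.ReLU(),
            nn.Linear(64, 1),   # logit for "common source"
        )

    def forward(self, wave_a, wave_b):
        # wave_a, wave_b: (batch, n_channels, n_samples) windows from two stations
        x = torch.cat([wave_a, wave_b], dim=1)
        return self.head(self.encoder(x))

# model = PairAssociationCNN()
# logit = model(torch.randn(8, 3, 400), torch.randn(8, 3, 400))
```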
Article
To optimally monitor earthquake‐generating processes, seismologists have sought to lower detection sensitivities ever since instrumental seismic networks were started about a century ago. Recently, it has become possible to search continuous waveform archives for replicas of previously recorded events (i.e., template matching), which has led to at least an order of magnitude increase in the number of detected earthquakes and greatly sharpened our view of geological structures. Earthquake catalogs produced in this fashion, however, are heavily biased in that they are completely blind to events for which no templates are available, such as in previously quiet regions or for very large‐magnitude events. Here, we show that with deep learning, we can overcome such biases without sacrificing detection sensitivity. We trained a convolutional neural network (ConvNet) on the vast hand‐labeled data archives of the Southern California Seismic Network to detect seismic body‐wave phases. We show that the ConvNet is extremely sensitive and robust in detecting phases even when masked by high background noise and when the ConvNet is applied to new data that are not represented in the training set (in particular, very large‐magnitude events). This generalized phase detection framework will significantly improve earthquake monitoring and catalogs, which form the underlying basis for a wide range of basic and applied seismological research.
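In the same spirit, a compact sketch of a window-based phase classifier that could be slid along continuous three-component data; the 400-sample window and layer sizes are illustrative assumptions, not the trained ConvNet described above:

```python
import torch
import torch.nn as nn

# Three-class (P / S / noise) classifier for fixed-length three-component windows.
phase_net = nn.Sequential(
    nn.Conv1d(3, 32, kernel_size=21, padding=10), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=15, padding=7), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(64, 128, kernel_size=11, padding=5), nn.ReLU(), nn.MaxPool1d(2),
    nn.Flatten(),
    nn.Linear(128 * 50, 3),   # 400-sample window -> 50 samples after three poolings
)

# Continuous data can be scanned by extracting overlapping windows, e.g.:
# continuous = torch.randn(3, 360000)                        # 1 hour at 100 Hz
# windows = continuous.unfold(-1, 400, 40).permute(1, 0, 2)  # (n_windows, 3, 400)
# scores = phase_net(windows)
```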
Article
Automatic event detection from time-series signals has broad applications. Traditional detection methods identify events primarily through similarity and correlation in the data; such methods can be inefficient and yield low accuracy. In recent years, machine learning techniques have revolutionized many science and engineering domains. In particular, the performance of object detection in 2-D image data has improved significantly thanks to deep neural networks. In this paper, we develop a deep-learning-based detection method, called "DeepDetect," to detect events in seismic signals. We find that directly adapting ideas from 2-D object detection to our problem faces two challenges: the duration of earthquake events varies significantly, and the generated proposals are temporally correlated. To address these challenges, we propose a novel cascaded region-based convolutional neural network that captures earthquake events of different sizes while incorporating contextual information to enrich the features of each proposal. To achieve better generalization performance, we use densely connected blocks as the backbone of our network. Because some positive events are not correctly annotated, we further formulate the detection problem as a learning-from-noise problem. To verify the performance, we use seismic data generated at the Pennsylvania State University Rock and Sediment Mechanics Laboratory, with labels acquired with the help of experts. We show that our techniques yield high accuracy. Our deep-learning-based detection methods can therefore be powerful tools for identifying events in time-series data across various applications.
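As one small building block, a densely connected 1-D convolutional block of the kind referred to above might look like the following sketch (growth rate, layer count, and names are assumptions, not the DeepDetect architecture):

```python
import torch
import torch.nn as nn

class DenseBlock1d(nn.Module):
    """Each layer receives the concatenation of all previous feature maps."""
    def __init__(self, in_channels, growth_rate=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.Conv1d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

# block = DenseBlock1d(in_channels=8)
# out = block(torch.randn(2, 8, 1000))   # -> (2, 8 + 4 * 16, 1000)
```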
Article
Conventional arrival-picking algorithms cannot avoid manual modification of their parameters when simultaneously identifying multiple events under different signal-to-noise ratios (SNRs). Therefore, to automatically obtain the arrivals of multiple events with high precision under different SNRs, this study proposes an algorithm that picks the arrivals of microseismic or acoustic emission events using deep recurrent neural networks. Arrival identification is performed in two steps: a training phase and a testing phase. The training process is modeled by deep recurrent neural networks with a Long Short-Term Memory (LSTM) architecture. During the testing phase, the learned weights are used to identify arrivals in the microseismic/acoustic emission datasets. The datasets were obtained from acoustic emission rock-physics experiments. To obtain datasets at different SNRs, random noise was added to the raw experimental data. The results show that the proposed method attains a hit rate above 80 per cent at an SNR of 0 dB and approximately 70 per cent at an SNR of -5 dB, with an absolute error within 10 sampling points. These results indicate that the proposed method has high picking precision and robustness. © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.
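The SNR-controlled noise injection mentioned above can be sketched in a few lines of NumPy; the white-Gaussian-noise model and the function name are our assumptions:

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has approximately the requested
    signal-to-noise ratio in dB (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# degraded = add_noise_at_snr(ae_trace, snr_db=0)   # e.g. test picking at 0 dB
```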
Conference Paper
While convolutional neural networks (CNNs) have been successfully applied to many challenging classification applications, they typically require large datasets for training. When the availability of labeled data is limited, data augmentation is a critical preprocessing step for CNNs. However, data augmentation for wearable sensor data has not been deeply investigated yet. In this paper, various data augmentation methods for wearable sensor data are proposed. The proposed methods and CNNs are applied to the classification of the motor state of Parkinson’s Disease patients, which is challenging due to small dataset size, noisy labels, and large intra-class variability. Appropriate augmentation improves the classification performance from 77.54% to 86.88%.
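For orientation, a few of the simple window-level augmentations typically applied to 1-D sensor data; the specific operations, parameter defaults, and names below are illustrative, not the exact set proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.05):
    """Add small Gaussian noise to every sample of a (channels, samples) window."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    """Multiply each channel by a random factor to mimic gain variability."""
    return x * rng.normal(1.0, sigma, size=(x.shape[0], 1))

def random_shift(x, max_shift=20):
    """Circularly shift the window in time by a random number of samples."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1), axis=-1)

# window = rng.standard_normal((3, 500))
# augmented = random_shift(scale(jitter(window)))
```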
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.