Fig 2 - uploaded by Rui Zhou
Source publication
Outlier or anomaly detection is one of the major challenges in big data analytics since unusual but insightful patterns are often hidden in massive data sets such as sensing data and social networks. Sampling techniques have been a focus for outlier detection to address scalability on big data. The recent study has shown uniform random sampling wit...
Contexts in source publication
Context 1
... The outlier detection methods are listed in Table 2, with sampling types U, D+1, D-1 and P denoting Uniform, Density biased (υ = +1), Density biased (υ = −1) and Piecewise density biased sampling, respectively. The parameters for the P sampling type are ζ_B = 0.7, υ_l = −1 and υ_r = 1. The execution time results are reported in Fig. 2, and the performance results in Fig. 3. As Fig. 3 shows, the detectors with density biased sampling outperform those with uniform sampling in most cases. For traditional density biased sampling, the first three data sets favor υ < 0, while the last one gets very good results with υ > 0. However, the performance of piecewise density ...
Context 2
... the first three data sets favor υ < 0, while the last one gets very good results with υ > 0. However, the performance of piecewise density biased sampling is mostly comparable to, or even better than, that of the cases where υ < 0 or υ > 0. The ensemble iterative methods mostly outperform the others. However, these performance gains come at the cost of execution time, as shown in Fig. ...
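To make the role of the exponent υ in the contexts above more concrete, here is a minimal Python sketch of density biased sampling. It is an illustration under stated assumptions, not the method of the source publication: it assumes that local density is approximated by counting points per grid cell and that each point is weighted by that count raised to the power υ. The function name `density_biased_sample`, the `cell_width` parameter and the weighted-key selection scheme are all hypothetical choices made for this example. Piecewise density biased sampling (the parameters ζ_B, υ_l and υ_r) is not covered here.

```python
import random
from collections import Counter


def density_biased_sample(points, sample_size, v, cell_width=1.0, seed=None):
    """Draw sample_size points without replacement, weighting each point
    by (number of points in its grid cell) ** v.

    v < 0 favors points in sparse regions, v > 0 favors points in dense
    regions, and v = 0 reduces to uniform random sampling.
    """
    rng = random.Random(seed)

    # Assumption: density is approximated by a simple grid-cell count.
    cells = [tuple(int(x // cell_width) for x in p) for p in points]
    counts = Counter(cells)
    weights = [counts[c] ** v for c in cells]

    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each point gets the key u ** (1 / w); the largest keys are kept.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [points[i] for _, i in keyed[:sample_size]]


# Example data: one dense cluster plus a few isolated points.
dense = [(i * 0.001, 0.0) for i in range(90)]
sparse = [(10.0 + 5 * i, 10.0 + 5 * i) for i in range(10)]
data = dense + sparse

sample_sparse_biased = density_biased_sample(data, 10, v=-1, seed=0)
sample_dense_biased = density_biased_sample(data, 10, v=1, seed=0)
```

With υ = −1 the 90 clustered points together carry the same total weight as a single isolated point, so the sample tends to be dominated by isolated points. With υ = +1 the opposite holds. Which sign works better depends on the data set, as the contexts above report.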
Similar publications
In this paper, we propose a novel self-supervised representation learning by taking advantage of a neighborhood-relational encoding (NRE) among the training data. Conventional unsupervised learning methods only focused on training deep networks to understand the primitive characteristics of the visual data, mainly to be able to reconstruct the data...
Citations
This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods.
Various cyber attacks often occur in logistics network of the Industry 4.0, which poses a threat to internet security. Intrusion detection can intelligently detect anomalous activities and secure the internet with the help of anomaly detection algorithms. Different from static data, intrusion detection data is a dynamic data form and has the following characteristics. First, it is multi-aspect. Second, it contains point anomalies and group anomalies. Third, there are correlations between different attributes. Nevertheless, these properties pose a challenge on existing anomaly detection approaches. Thus, a novel anomaly detection approach MDS_AD is proposed in this paper to deal with the challenges. It combines Locality-Sensitive Hashing (LSH), isolation forest and PCA techniques. MDS_AD has the following properties: (a) the introduced LSH can operate on multi-aspect data. (b) MDS_AD can effectively catch group anomalies from the experimental results. (c) the PCA is utilized to reduce dimensionality for correlations between different attributes. (d) MDS_AD is a streaming approach, which can perform model update and process data in constant memory and time. To confirm the performance of MDS_AD, multiple experiments are designed and implemented on UNSW-NB15 dataset. Experimental results show that MDS_AD outperforms state-of-the-art baselines.