Conference Paper

ν-Anomica: A Fast Support Vector Based Novelty Detection Technique

Nikunj C. Oza, NASA Ames Research Center
DOI: 10.1109/ICDM.2009.42 Conference: ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009
Source: DBLP


In this paper we propose ν-Anomica, a novel anomaly detection technique that can be trained on very large data sets with a much lower running time than the benchmark one-class Support Vector Machine (SVM) algorithm. The idea behind ν-Anomica is to train the machine so that it closely approximates the exact decision hyperplane using fewer training points, without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class SVMs while reducing both training time and test time by a factor of 5 to 20.
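For reference, the benchmark that ν-Anomica is compared against is the standard one-class ν-SVM formulation (Schölkopf et al.). With n training points x_i, a feature map φ, and kernel K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩, its primal problem, dual problem, and decision function can be written as follows. The notation is ours, not taken from the paper, and the paper's own approximation scheme is not reproduced here.

```latex
\begin{aligned}
&\text{Primal:} && \min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad \text{s.t.}\quad \langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,n \\
&\text{Dual:} && \min_{\alpha}\ \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le \frac{1}{\nu n},\ \ \sum_{i=1}^{n}\alpha_i = 1 \\
&\text{Decision:} && f(x) = \operatorname{sgn}\!\Big(\sum_{i=1}^{n}\alpha_i K(x_i, x) - \rho\Big)
\end{aligned}
```

Training works with the n × n kernel matrix, and scoring a new point costs one kernel evaluation per support vector (each i with α_i > 0). This is why approximating the decision function with fewer training points, as described above, can reduce both training and test time.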



Available from: Nikunj C Oza
  • Source
    • "In the context of outlier detection from aviation data, several papers that use support vector machines have been published. Das et al. [12] presented a technique for speeding up one-class support vector machines (SVMs) using a sampling strategy. The authors showed that the proposed technique is 15 times faster than the traditional one-class SVM while maintaining accuracy. "
    ABSTRACT: The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1 Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
    Article · Oct 2013 · Journal of Aerospace Information Systems
  • Source
    • "In the context of outlier detection from aviation data, several papers have been recently published. Das et al. [17] present a technique for speeding up 1-class SVM using a sampling strategy. The authors show that the proposed technique is 15 times faster than the traditional 1-class SVM while maintaining accuracy. "
    ABSTRACT: There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of data sets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only because of the massive volume of data but also because these data sets are physically stored at different geographical locations, with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available data sets: (i) the NASA MODIS satellite images and (ii) a simulated aviation data set generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS). © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 393–406, 2011
    Article · Aug 2011 · Statistical Analysis and Data Mining
  • ABSTRACT: In this paper we propose an innovative learning algorithm, a variation of the one-class ν Support Vector Machine (SVM) learning algorithm, that produces sparser solutions with much lower computational complexity. The proposed technique returns an approximate solution, nearly as good as the one obtained by the classical approach, by minimizing the original risk function together with a regularization term. We introduce a bi-criterion optimization that helps guide the search toward the optimal set in less time than the classical approach. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machine algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept the same. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν SVMs while reducing both training time and test time several-fold.
    Conference Paper · Jan 2011
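The summaries above describe speeding up one-class SVMs by training on fewer points and by producing sparser solutions. The following is a minimal, stdlib-only Python sketch of why that helps, assuming a Gaussian (RBF) kernel. It solves the standard one-class ν-SVM dual with a simple SMO-style pairwise solver (not LIBSVM, and not the bi-criterion optimization mentioned above) and compares it with a naive random-subsampling baseline. The subsampling step and all names (`train_one_class`, `decision`, `GAMMA`, `NU`) are hypothetical illustrations; this is not the ν-Anomica point-selection procedure, which is not reproduced here.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def train_one_class(points, nu, gamma, tol=1e-6, max_iter=100_000):
    """Solve the one-class nu-SVM dual with a simple SMO-style pairwise solver:
        min_a 0.5 * a^T K a   s.t.  0 <= a_i <= 1/(nu*n),  sum_i a_i = 1
    Assumes 0 < nu < 1. Returns (support_vectors, rho), where support_vectors
    is a list of (point, alpha) pairs with alpha > 0."""
    n = len(points)
    cap = 1.0 / (nu * n)
    K = [[rbf(p, q, gamma) for q in points] for p in points]
    # Feasible start: fill the first points up to the cap until the weights sum to 1.
    alpha, remaining = [0.0] * n, 1.0
    for i in range(n):
        if remaining <= 0.0:
            break
        alpha[i] = min(cap, remaining)
        remaining -= alpha[i]
    grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # K @ alpha
    for _ in range(max_iter):
        # Maximal violating pair: i may still grow, j may still shrink.
        i = min((k for k in range(n) if alpha[k] < cap - 1e-12), key=grad.__getitem__)
        j = max((k for k in range(n) if alpha[k] > 1e-12), key=grad.__getitem__)
        if grad[j] - grad[i] < tol:
            break  # KKT conditions hold to within tol
        eta = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)
        # Move mass t from j to i, clipped so both weights stay inside [0, cap].
        t = min((grad[j] - grad[i]) / eta, cap - alpha[i], alpha[j])
        alpha[i] += t
        alpha[j] -= t
        for k in range(n):
            grad[k] += t * (K[k][i] - K[k][j])
    # rho: average of (K alpha)_k over free support vectors (0 < alpha_k < cap).
    free = [grad[k] for k in range(n) if 1e-12 < alpha[k] < cap - 1e-12]
    if free:
        rho = sum(free) / len(free)
    else:  # no free SVs: midpoint of the bounds implied by the KKT conditions
        lb = max(grad[k] for k in range(n) if alpha[k] >= cap - 1e-12)
        ub = min(grad[k] for k in range(n) if alpha[k] <= 1e-12)
        rho = 0.5 * (lb + ub)
    return [(points[k], alpha[k]) for k in range(n) if alpha[k] > 1e-12], rho


def decision(svs, rho, x, gamma):
    """f(x) = sum_i alpha_i K(sv_i, x) - rho; f(x) < 0 flags x as novel.
    Costs one kernel evaluation per support vector."""
    return sum(a * rbf(sv, x, gamma) for sv, a in svs) - rho


random.seed(0)
GAMMA, NU = 0.5, 0.1  # hypothetical settings chosen for illustration
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
full_svs, full_rho = train_one_class(data, NU, GAMMA)
# Hypothetical baseline (not nu-Anomica): train on a random 30% subsample.
sample = random.sample(data, 30)
sub_svs, sub_rho = train_one_class(sample, NU, GAMMA)

print("support vectors (full / subsample):", len(full_svs), len(sub_svs))
for x in [(0.0, 0.0), (6.0, 6.0)]:
    print(x, decision(full_svs, full_rho, x, GAMMA), decision(sub_svs, sub_rho, x, GAMMA))
```

Comparing `len(full_svs)` with `len(sub_svs)` shows how a smaller training set bounds the number of support vectors, and therefore the per-point scoring cost. A random subsample is only a stand-in here; how closely it approximates the full decision boundary depends on the data, which is the gap that a more careful point-selection scheme aims to close.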