Conference Paper

ν-Anomica: A Fast Support Vector Based Novelty Detection Technique

Nikunj C. Oza, NASA Ames Research Center
DOI: 10.1109/ICDM.2009.42 Conference: ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009
Source: DBLP

ABSTRACT: In this paper we propose ν-Anomica, a novel anomaly detection technique that can be trained on huge data sets with much reduced running time compared to the benchmark one-class Support Vector Machines algorithm. In ν-Anomica, the idea is to train the machine such that it can provide a close approximation to the exact decision plane using fewer training points and without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by 5 to 20 times.
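
The core idea, approximating the full one-class SVM decision boundary from a reduced training set, can be illustrated with a short sketch. This is not the authors' implementation: it uses scikit-learn's OneClassSVM and plain random subsampling as a stand-in for the paper's sampling strategy, and the data sizes and nu value are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): approximate a one-class SVM novelty
# detector by training on a small random subsample, in the spirit of
# nu-Anomica's reduced-training-set idea. Random subsampling stands in for the
# paper's actual sampling strategy; sizes and nu are illustrative.
import time
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "normal" training data plus a test set with injected anomalies.
X_train = rng.normal(size=(10_000, 10))
X_test = np.vstack([rng.normal(0.0, 1.0, size=(1_000, 10)),
                    rng.normal(6.0, 1.0, size=(50, 10))])  # far-off anomalies

def fit_and_time(X):
    t0 = time.perf_counter()
    model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
    return model, time.perf_counter() - t0

full_model, t_full = fit_and_time(X_train)                          # benchmark
subset = X_train[rng.choice(len(X_train), size=1_000, replace=False)]
fast_model, t_fast = fit_and_time(subset)                           # reduced set

# Predictions: +1 = normal, -1 = novelty. Compare agreement and training time.
agreement = np.mean(full_model.predict(X_test) == fast_model.predict(X_test))
print(f"training speedup: {t_full / t_fast:.1f}x, decision agreement: {agreement:.3f}")
```

On data like this, the subsampled model typically agrees with the full model on the vast majority of test points while training much faster, which mirrors the trade-off described in the abstract.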

Cited in:
    • "In the context of outlier detection from aviation data, several papers that use support vector machines have been published. Das et al. [12] presented a technique for speeding up one-class support vector machines (SVMs) using a sampling strategy. The authors showed that the proposed technique is 15 times faster than the traditional one-class SVM while maintaining accuracy. "
    ABSTRACT: The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1 Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
    10/2013; 10(10):467-475. DOI:10.2514/1.I010080
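
The abstract above mentions a multivariate time-series search for signatures of discovered anomalies. As a rough illustration only, the sketch below uses a brute-force sliding-window Euclidean search, which is a generic stand-in rather than the paper's algorithm; the window length, distance measure, and synthetic data are assumptions.

```python
# Rough illustration, not the paper's search algorithm: brute-force
# sliding-window matching of a multivariate anomaly signature against a
# longer recording, using Euclidean distance over flattened windows.
import numpy as np

rng = np.random.default_rng(2)

series = rng.normal(size=(20_000, 8))        # long multivariate recording
signature = rng.normal(size=(50, 8))         # signature of a known anomaly
series[12_000:12_050] = signature + 0.01 * rng.normal(size=(50, 8))  # plant it

w, d = signature.shape
windows = np.lib.stride_tricks.sliding_window_view(series, (w, d)).reshape(-1, w * d)
dists = np.linalg.norm(windows - signature.ravel(), axis=1)

print("best matching offset:", int(np.argmin(dists)))   # expected near 12000
```
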
    • "In the context of outlier detection from aviation data, several papers have been recently published. Das et al. [17] present a technique for speeding up 1-class SVM using a sampling strategy. The authors show that the proposed technique is 15 times faster than the traditional 1-class SVM while maintaining accuracy. "
    ABSTRACT: There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in situ sensors and different climate models. Similarly, a huge amount of flight operational data is downloaded for different commercial airlines. These different types of data sets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only because of the massive volume of data but also because these data sets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available data sets: (i) the NASA MODIS satellite images and (ii) a simulated aviation data set generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS). © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 393–406, 2011
    Statistical Analysis and Data Mining 08/2011; 4(4):393-406. DOI:10.1002/sam.10125
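
The entry above describes centralizing only a small sample from data held at different sites. The sketch below is a simplified illustration under assumptions not taken from the paper: the same records are vertically partitioned (each site holds a subset of the features), a small set of record indices is assembled centrally, and a one-class SVM fit on that sample is used to score small query batches. The partition sizes, sample size, and model choice are all illustrative.

```python
# Hedged sketch, not the published algorithm: sites hold disjoint feature
# subsets of the same records; only a small sample of records is assembled
# centrally, and the model fit on that sample scores query batches on demand.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

n_records = 20_000
# Three sites, each holding a disjoint subset of the features for all records.
site_features = [rng.normal(size=(n_records, d)) for d in (4, 3, 5)]

# Coordinator samples a small set of record indices and assembles full rows
# by concatenating each site's columns for those indices only.
sample_idx = rng.choice(n_records, size=500, replace=False)
sample = np.hstack([s[sample_idx] for s in site_features])

global_model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(sample)

# Score a small batch of query records assembled the same way (a full
# distributed algorithm would avoid ever centralizing all records).
query_idx = rng.choice(n_records, size=100, replace=False)
query = np.hstack([s[query_idx] for s in site_features])
print("suspected outliers in batch:", int(np.sum(global_model.predict(query) == -1)))
```
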