Conference Paper

Erratic server behavior detection using machine learning on basic monitoring metrics

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
This paper provides a detailed overview of the Unified Monitoring Architecture (UMA) that aims at merging the monitoring of the CERN IT data centres and the WLCG monitoring using common and widely-adopted open source technologies such as Flume, Elasticsearch, Hadoop, Spark, Kibana, Grafana and Zeppelin. It provides insights and details on the lessons learned, explaining the work performed in order to monitor the CERN IT data centres and the WLCG computing activities such as the job processing, data access and transfers, and the status of sites and services.
Article
Full-text available
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Article
Full-text available
Ganesha aims to diagnose faults transparently in MapReduce systems, by analyzing OS-level metrics alone. Ganesha’s approach is based on peer-symmetry under fault-free conditions, and can diagnose faults that manifest asymmetrically at nodes within a MapReduce system. While our training is performed on smaller Hadoop clusters and for specific workloads, our approach allows us to diagnose faults in larger Hadoop clusters and for unencountered workloads. We also candidly highlight faults that escape Ganesha’s black-box diagnosis.
Article
This survey paper describes a focused literature survey of machine learning (ML) and data mining (DM) methods for cyber analytics in support of intrusion detection. Short tutorial descriptions of each ML/DM method are provided. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in ML/DM approaches, some well-known cyber data sets used in ML/DM are described. The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/DM for cyber security is presented, and some recommendations on when to use a given method are provided.
Article
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced a rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.
Article
Monitor anything, not just system parameters.
Article
Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. Copyright © 2010 John Wiley & Sons, Inc.For further resources related to this article, please visit the WIREs website.
Conference Paper
In this paper, we present a stream-based mining algorithm for online anomaly prediction. Many real-world applications such as data stream analysis requires continuous cluster operation. Unfortunately, today's large-scale cluster systems are still vulnerable to various software and hardware problems. System administrators are often overwhelmed by the tasks of correcting various system anomalies such as processing bottlenecks (i.e., full stream buffers), resource hot spots, and service level objective (SLO) violations. Our anomaly prediction scheme raises early alerts for impending system anomalies and suggests possible anomaly causes. Specifically, we employ Bayesian classification methods to capture different anomaly symptoms and infer anomaly causes. Markov models are introduced to capture the changing patterns of different measurement metrics. More importantly, our scheme combines Markov models and Bayesian classification methods to predict when a system anomaly will appear in the foreseeable future and what are the possible anomaly causes. To the best of our knowledge, our work provides the first stream-based mining algorithm for predicting system anomalies. We have implemented our approach within the IBM System S distributed stream processing cluster, and conducted case study experiments using fully implemented distributed data analysis applications processing real application workloads. Our experiments show that our approach efficiently predicts and diagnoses several bottleneck anomalies with high accuracy while imposing low overhead to the cluster system.
Article
Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years.
Information quantity
  • M Hilbert
M. Hilbert, Information quantity, in Encyclopedia of Big Data, (Cham), pp. 1-4, Springer International Publishing (2017), DOI.