Book

Event Stream Processing with BeepBeep 3: Log Crunching and Analysis Made Easy

Authors:

Abstract

Event logs and event streams can be found in software systems of very diverse kinds. For instance, workflow management systems and ERP platforms produce event logs in common XML-based formats. Financial transaction systems also keep a log of their operations in a standardized and documented format, as do web servers such as Apache and Microsoft IIS. Network monitors receive streams of packets whose various headers and fields can be analyzed. Recently, even the world of video games has seen an increasing trend towards logging players’ real-time activities. Analyzing the wealth of information contained in these logs can serve multiple purposes. Business process logs can be used to reconstruct a workflow based on a sample of its possible executions; financial database logs can be audited for compliance with regulations; suspicious or malicious activity can be detected by studying patterns in network or server logs. However, the available tools to process logs or streams of events are often large systems that are hard to set up, and even simple examples seem needlessly complicated. In this book, you will learn about BeepBeep, a versatile Java library intended to make the processing of event streams both fun and simple. Through more than a hundred simple, illustrated code examples, you will see how event processing tasks can be carried out in just a few lines of code, and, what is more, in code that you actually understand. From generating plots to computing statistics and evaluating temporal logic specifications, BeepBeep can prove a handy addition to a developer’s toolbox.
... Note that in Chapter 5, specifically in Section 5.2, we effectively put our conceptual ideas from this chapter into practice by extending a well-known event stream processing engine called BeepBeep [93]. We conduct experiments encompassing various scenarios and subsequently discuss the results. ...
... We describe a construction that lifts a loss-tolerant "multi-monitor" from a classical monitor. In Section 5.3 of Chapter 5, we present a concrete implementation of our pipeline and different categories of proxies as extensions of the BeepBeep event stream processing library [93] and provide a comparison and discussion of the obtained results. ...
... give users the freedom to choose the formal notation for each component of the pipeline. Nevertheless, a software implementation of each framework has been developed as a Java library that extends the BeepBeep event stream processing engine [93], and several experiments were conducted to test each framework. This chapter presents an overview of BeepBeep, the experiments conducted, and a detailed discussion of the obtained results. ...
Thesis
This thesis contributes to the fields of Runtime Verification (RV) and Runtime Enforcement (RE) by addressing key challenges in monitoring and enforcing software behavior. In RV, where traditional monitors assume complete visibility of the event trace, the thesis introduces a novel framework designed to handle uncertainty when only partial trace information is available. Central to this framework is a stateful access control proxy that transforms events into sets of possible events, termed "multi-traces". This extension of classical Mealy machines makes monitors more resilient to data degradation and access limitations, as validated through extensive experiments across diverse scenarios. In RE, the thesis proposes a modular enforcement monitor model that separates tasks such as altering program execution, enforcing policy compliance, and selecting replacement sequences into distinct modules. This modular approach simplifies monitor design and implementation, making the enforcement of security policies at runtime more flexible and adaptable. An implementation using the BeepBeep event stream processor demonstrates the frameworks' effectiveness in dynamically selecting and enforcing actions, thereby streamlining operational processes and improving overall system robustness and performance efficiency.
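To make the idea of a multi-trace concrete, here is a minimal Java sketch, assuming a monitor given as a deterministic transition table over single-character events. The class name, the SINK state and the example property are all hypothetical, and this is not the thesis's proxy construction. Each position of the multi-trace is a set of events that may have occurred; the monitor follows every possible event from every reachable state and reports "true" if all runs are accepting, "false" if none is, and "?" otherwise.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a deterministic monitor lifted to "multi-traces",
 * where each position holds the set of events that may have occurred.
 * Not the thesis's construction; states and events are illustrative.
 */
public class MultiTraceMonitor {
    private static final int SINK = -1; // non-accepting trap state for undefined transitions
    private final Map<Integer, Map<Character, Integer>> delta;
    private final Set<Integer> accepting;
    private Set<Integer> current = new HashSet<>();

    public MultiTraceMonitor(Map<Integer, Map<Character, Integer>> delta,
                             Set<Integer> accepting, int initial) {
        this.delta = delta;
        this.accepting = accepting;
        current.add(initial);
    }

    /** Consumes one multi-event: follows every possible event from every reachable state. */
    public void step(Set<Character> possibleEvents) {
        if (possibleEvents.isEmpty()) throw new IllegalArgumentException("empty multi-event");
        Set<Integer> next = new HashSet<>();
        for (int q : current) {
            for (char e : possibleEvents) {
                Integer target = delta.getOrDefault(q, Map.of()).get(e);
                next.add(target == null ? SINK : target); // missing transition: go to the trap state
            }
        }
        current = next;
    }

    /** "true" if all possible runs are accepting, "false" if none is, "?" otherwise. */
    public String verdict() {
        boolean any = false, all = true;
        for (int q : current) {
            if (accepting.contains(q)) any = true; else all = false;
        }
        return all ? "true" : (any ? "?" : "false");
    }
}
```

With the illustrative property "a resource must be opened (o) before it is read (r)", where c closes it again, the multi-trace {o, c} {r} yields "?": the read is legal if the first event was o, and a violation if it was c.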
... In Apache Spark Streaming, operations are also modeled as a directed acyclic graph of operators [78]; each operator breaks an input stream into micro-batches that are processed in a single cycle. The BeepBeep event stream processing library [45] operates in a similar fashion, by providing a set of ready-made Processor objects (different from the Kinesis class of the same name) that can be connected to achieve complex calculations on streams; this tool will be the focus of the next section. In all these systems, one of many programming languages must be used as the "glue" to create and connect boxes: Java in all cases, Python and Scala for Spark, and Groovy for BeepBeep. ...
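The idea of connecting ready-made processors can be illustrated with a small, pull-based Java sketch. It does not reproduce BeepBeep's actual API: the Stage interface and the source, apply, cumulate and drain helpers are hypothetical names chosen for this example. A downstream stage pulls events from its upstream stage on demand, so composing stages amounts to nesting calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical pull-based pipeline of connectable stages (illustrative, not BeepBeep's API). */
public class PullPipeline {
    /** A stage from which output events are pulled one at a time. */
    public interface Stage<T> {
        boolean hasNext();
        T pull();
    }

    /** Source stage that replays a fixed list of events. */
    public static <T> Stage<T> source(List<T> events) {
        Iterator<T> it = events.iterator();
        return new Stage<T>() {
            public boolean hasNext() { return it.hasNext(); }
            public T pull() { return it.next(); }
        };
    }

    /** Stateless stage: applies a function to each upstream event. */
    public static <A, B> Stage<B> apply(Stage<A> upstream, Function<A, B> f) {
        return new Stage<B>() {
            public boolean hasNext() { return upstream.hasNext(); }
            public B pull() { return f.apply(upstream.pull()); }
        };
    }

    /** Stateful stage: emits the running accumulation of upstream events. */
    public static <T> Stage<T> cumulate(Stage<T> upstream, T start, BinaryOperator<T> op) {
        return new Stage<T>() {
            private T acc = start;
            public boolean hasNext() { return upstream.hasNext(); }
            public T pull() { acc = op.apply(acc, upstream.pull()); return acc; }
        };
    }

    /** Pulls every remaining event from a stage into a list. */
    public static <T> List<T> drain(Stage<T> stage) {
        List<T> out = new ArrayList<>();
        while (stage.hasNext()) out.add(stage.pull());
        return out;
    }
}
```

Connecting a source of 1, 2, 3, 4 to a doubling stage and then to a running sum yields 2, 6, 12, 20. Keeping stateless stages (apply) separate from stateful ones (cumulate) is what lets small pieces be reused in different chains.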
... We shall now turn our attention to one specific event stream processing system, briefly mentioned earlier, called BeepBeep [45]. It has been under active development for close to a decade and has been used in a variety of case studies, such as the detection of security policy violations in Java programs [20], the tracking of lifecycle properties on smart containers [19], and the observation of trend deviations in event logs [65]. ...
Article
Stream processing is a programming paradigm that is growing in popularity due to the presence of an increasing number of academic and commercial platforms. However, there exist few tools and methodologies to properly test a program that manipulates streams; in particular, model-based testing techniques need to be adapted to the particularities of stream processing. The paper suggests that pre/post models on stream programs be specified in the form of runtime monitors, themselves implemented as stream programs. It also describes how test inputs satisfying a given precondition can be generated automatically, through a mechanism that inverts the operation of a stream processing pipeline. A proof-of-concept library implements these concepts for the specific case of the BeepBeep event stream processing library, and is evaluated experimentally. The approach can successfully find satisfying input cases for pre-conditions involving non-trivial constructs such as sliding windows, aggregations and filtering.
... BeepBeep's basic processors (adapted from [28]). ...
Article
The integrity of sensor datasets used in smart home applications is crucial for tasks like activity recognition and automation. We identify common validity issues such as event ordering errors, lifecycle inconsistencies, and data corruption, which are often overlooked but can significantly affect the reliability of analyses. We present a toolbox based on the BeepBeep stream processing library that enables efficient verification of 19 sanity checks on data streams. Our analysis of 15 publicly available smart home datasets collected by five different research teams reveals that most of them violate key assumptions about sensor behavior, emphasizing the need for pre-validation.
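Two of the issue categories named above, event ordering errors and lifecycle inconsistencies, can be expressed as simple passes over a stream. The following Java sketch is hypothetical and is not taken from the toolbox: it assumes events arrive as parallel arrays of timestamps, sensor identifiers and ON/OFF states, and it treats a sensor reporting the same state twice in a row as a lifecycle inconsistency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of two sanity checks on a sensor event stream (not the toolbox's code). */
public class SanityChecks {
    /** Ordering check: positions i where the timestamp goes backwards. */
    public static List<Integer> outOfOrder(long[] timestamps) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 1; i < timestamps.length; i++) {
            if (timestamps[i] < timestamps[i - 1]) bad.add(i);
        }
        return bad;
    }

    /** Lifecycle check: positions where a sensor repeats its previous state (e.g. ON after ON). */
    public static List<Integer> repeatedStates(String[] sensors, String[] states) {
        if (sensors.length != states.length) throw new IllegalArgumentException("length mismatch");
        Map<String, String> last = new HashMap<>(); // last state seen for each sensor
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < sensors.length; i++) {
            String previous = last.put(sensors[i], states[i]);
            if (states[i].equals(previous)) bad.add(i);
        }
        return bad;
    }
}
```

For timestamps 1, 3, 2, 4, 4 only position 2 is reported, since equal consecutive timestamps are accepted; whether ties should also be flagged is a choice left open here.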
... BeepBeep 3 [5] is mainly a data stream query engine. It provides processors and functions that define recurrent operations on event logs. ...
Preprint
Anomaly-based intrusion detection systems are essential defenses against cybersecurity threats because they can identify anomalies in current activities. However, these systems have difficulty providing entity processing independence through a programming language. In addition, the detection process is degraded by the complexity of scheduling the training and detection processes, which are required to keep the anomaly detection system continuously updated. This paper shows how to use the algebraic state-transition diagram (ASTD) language to develop flexible anomaly detection systems. It provides a model for detecting point anomalies using Kernel Density Estimation, an unsupervised non-parametric technique, to estimate the probability density of event occurrence. The proposed model performs both training and detection continuously. The ASTD language streamlines the modeling of detection systems thanks to its process algebraic operators, which help overcome these challenges. By delegating the combination of anomaly-based detection processes to the ASTD language, the effort and complexity of developing detection models are reduced. Finally, through a qualitative evaluation, this study demonstrates that the algebraic operators of the ASTD specification language overcome these challenges.
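Kernel Density Estimation estimates the density at a point x from n training observations x_1, ..., x_n as f(x) = (1/(n h)) * sum_i K((x - x_i)/h), where K is a kernel function and h a bandwidth. The Java sketch below assumes a one-dimensional feature, a Gaussian kernel and a fixed, user-chosen threshold below which a point is reported as an anomaly. It only illustrates the density estimate and leaves out the ASTD specification and the continuous retraining described in the paper.

```java
/** Hypothetical sketch: univariate Gaussian KDE used to flag point anomalies (not the paper's ASTD model). */
public class KdeDetector {
    private final double[] sample;  // training observations x_1..x_n
    private final double bandwidth; // smoothing parameter h (assumed, not tuned)

    public KdeDetector(double[] sample, double bandwidth) {
        if (sample.length == 0 || bandwidth <= 0) throw new IllegalArgumentException();
        this.sample = sample.clone();
        this.bandwidth = bandwidth;
    }

    /** Estimated density f(x) = (1 / (n h)) * sum_i K((x - x_i) / h), K = standard normal pdf. */
    public double density(double x) {
        double sum = 0.0;
        for (double xi : sample) {
            double u = (x - xi) / bandwidth;
            sum += Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
        }
        return sum / (sample.length * bandwidth);
    }

    /** A point is anomalous when its estimated density falls below the threshold. */
    public boolean isAnomaly(double x, double threshold) {
        return density(x) < threshold;
    }
}
```

Trained on 10, 11, 12, 10, 11 with h = 1, the estimated density at 11 is about 0.305, while at 50 it is practically zero, so only the latter is flagged with a threshold of 0.01.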
Conference Paper
Compliance checking is the operation that consists of assessing whether every execution trace of a business process satisfies a given correctness condition. The paper introduces the notion of hyperquery, which is a calculation that involves multiple traces from a log at the same time. A particular case of hyperquery is a hypercompliance condition, which is a correctness requirement that involves the whole log instead of individual process instances. A formalization of hyperqueries is presented, along with a number of elementary operations to express hyperqueries on arbitrary logs. An implementation of these concepts in an event stream processing engine allows users to concretely evaluate hyperqueries in real time.
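As an illustration of why a hypercompliance condition needs the whole log, consider the hypothetical requirement that no actor approves more than one case: every individual trace can be compliant while the log as a whole is not. The Java sketch below assumes a log represented as a map from case identifiers to lists of "activity:actor" strings; both the representation and the condition are illustrative and are not taken from the paper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical hypercompliance condition: no actor approves more than one case.
 * It cannot be decided by inspecting one trace alone, only the whole log.
 */
public class HyperCompliance {
    /** log maps a case id to its trace; each event is written "activity:actor". */
    public static boolean approversAreDistinct(Map<String, List<String>> log) {
        Map<String, String> caseOfApprover = new HashMap<>();
        for (Map.Entry<String, List<String>> trace : log.entrySet()) {
            for (String event : trace.getValue()) {
                String[] parts = event.split(":", 2);
                if (parts.length < 2 || !parts[0].equals("approve")) continue;
                // Remember the first case each approver appears in; a second case is a violation.
                String earlier = caseOfApprover.putIfAbsent(parts[1], trace.getKey());
                if (earlier != null && !earlier.equals(trace.getKey())) return false;
            }
        }
        return true;
    }
}
```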
Chapter
Integrating security in the development and operation of information systems is the cornerstone of SecDevOps. From an operational perspective, one of the key activities for achieving such an integration is the detection of incidents (such as intrusions), especially in an automated manner. However, one of the stumbling blocks of an automated approach to intrusion detection is the management of the large volume of information typically produced by this type of solution. Existing works on the topic have concentrated on the reduction of volume by increasing the precision of the detection approach, thus lowering the rate of false alarms. However, another less explored possibility is to reduce the volume of evidence gathered for each alarm raised. This chapter explores the concept of intrusion detection from the angle of complex event processing. It provides a formalization of the notion of pattern matching in a sequence of events produced by an arbitrary system, by framing the task as a runtime monitoring problem. It then focuses on the topic of incident reporting and proposes a technique to automatically extract relevant elements of a stream that explain the occurrence of an intrusion. These relevant elements generally amount to a small fraction of all the data ingested for an alarm to be triggered and thus help reduce the volume of evidence that needs to be examined by manual means. The approach is experimentally evaluated on a proof-of-concept implementation of these principles.
Chapter
Runtime enforcement is an effective method to ensure the compliance of programs with user-defined security policies. In this paper we show how the event stream processor BeepBeep can be used to monitor the security properties of Java programs. The proposed approach relies on AspectJ to generate a trace capturing the program’s runtime behavior. This trace is then processed by BeepBeep, a complex event processing tool that allows complex data-driven policies to be stated and verified with ease. Depending on the result returned by BeepBeep, AspectJ can then be used to halt the execution or take other corrective action. The proposed method offers multiple advantages, notably flexibility in devising and stating expressive user-defined security policies.
Conference Paper
We present an extension to the BeepBeep 3 event stream engine that allows the use of multiple threads during the evaluation of a query. Compared to the single-threaded version of BeepBeep, allocating just a few threads to specific portions of a query improves throughput.
Chapter
This tutorial presents an overview of the field referred to as runtime verification. Runtime verification is the study of algorithms, data structures, and tools focused on analyzing executions of systems. The performed analysis aims at improving confidence in system behavior, either by improving program understanding or by checking conformance to specifications or algorithms. This chapter focuses specifically on checking execution traces against requirements formalized in terms of monitors. It first shows on examples how such monitors can be written using aspect-oriented programming, exemplified by AspectJ. Subsequently, four monitoring systems are illustrated on the same examples. The systems cover such formalisms as regular expressions, temporal logics, state machines, and rule-based programming, as well as the distinction between external and internal DSLs.
Article
Many problems in Computer Science can be framed as the computation of queries over sequences, or "streams", of data units called events. The field of Complex Event Processing (CEP) relates to the techniques and tools developed to efficiently process these queries. However, most CEP systems developed so far have concentrated on relatively narrow types of queries, which consist of sliding windows, aggregation functions, and simple sequential patterns computed over events that have a fixed tuple structure. Many of them boast high throughput but, in return, are difficult to set up and cumbersome to extend with user-defined elements. This paper describes a variety of use cases taken from real-world scenarios that present features seldom considered in classical CEP problems. It also provides a broad review of current solutions that includes tools and techniques going beyond typical surveys on CEP. From a critical analysis of these solutions, design principles for a new type of event stream processing system are set out. The paper proposes a simple, generic and extensible framework for the processing of event streams of diverse types; it describes in detail a stream processing engine, called BeepBeep, that implements these principles. BeepBeep's modular architecture, which borrows concepts from many other systems, is complemented with an extensible query language, called eSQL. The end result is an open, versatile, and reasonably efficient query engine that can be used in situations that go beyond the capabilities of existing systems.
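A sliding window with an aggregation function, the kind of query described above as typical of classical CEP systems, can be sketched in a few lines of Java. This hypothetical example keeps the last width numeric events and returns their average; returning NaN until the window is full is an assumption made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: average over a sliding window of the last `width` numeric events. */
public class SlidingAverage {
    private final int width;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0; // running sum of the events currently in the window

    public SlidingAverage(int width) {
        if (width <= 0) throw new IllegalArgumentException("width must be positive");
        this.width = width;
    }

    /** Pushes one event; returns the window average, or NaN until the window is full. */
    public double push(double x) {
        window.addLast(x);
        sum += x;
        if (window.size() > width) sum -= window.removeFirst(); // evict the oldest event
        return window.size() == width ? sum / width : Double.NaN;
    }
}
```

The running sum is updated incrementally, giving constant work per event instead of re-summing the window; over very long streams of floating-point values this can accumulate rounding error, so periodically recomputing the sum is a reasonable safeguard.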
Conference Paper
This paper describes the design and implementation of an SQL-like language for performing complex queries on event streams. The Event Stream Query Language (eSQL) aims at providing a simple, intuitive and fully non-procedural syntax, while still preserving backwards compatibility with traditional SQL. More importantly, eSQL's core syntax is designed to be extended by user-defined grammatical constructs. These new constructs can form domain-specific sub-languages, with eSQL being used as the “glue” to form very expressive queries. These concepts have been implemented in BeepBeep 3, an open source event stream query engine.
Article
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Conference Paper
We explore the use of the tool BeepBeep, a monitor for the temporal logic LTL-FO+, in interpreting assembly traces, focusing on security-related applications. LTL-FO+ is an extension of LTL that includes first-order quantification. We show that LTL-FO+ is a sufficiently expressive formalism to state a number of interesting program behaviors, and demonstrate experimentally that BeepBeep can verify these properties on assembly traces in tractable time.
Conference Paper
Chapter
In this paper and its accompanying tutorial, we discuss the topic of runtime verification for linear-time temporal logic specifications. We recall the idea of runtime verification, give ideas about specification languages for runtime verification and develop a solid theory for linear-time temporal logic. Concepts like monitors, impartiality, and anticipation are explained based on this logic.
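The three-valued idea can be sketched for two simple formulas, F p ("eventually p") and G p ("always p"). Following the usual LTL3 reading, a finite prefix gets TRUE if every infinite extension satisfies the formula, FALSE if none does, and INCONCLUSIVE otherwise. The Java sketch below is a hypothetical illustration, not the monitor construction from the chapter; each monitor consumes one boolean per event, stating whether p holds.

```java
/**
 * Hypothetical sketch of three-valued (LTL3-style) monitors for two simple formulas,
 * "F p" (eventually p) and "G p" (always p).
 */
public class Ltl3Monitors {
    public enum Verdict { TRUE, FALSE, INCONCLUSIVE }

    /** Monitor for F p: TRUE as soon as p holds once, INCONCLUSIVE before that. */
    public static class Eventually {
        private boolean seen = false;
        public Verdict step(boolean p) {
            seen = seen || p; // sticky flag: the verdict never reverts
            return seen ? Verdict.TRUE : Verdict.INCONCLUSIVE;
        }
    }

    /** Monitor for G p: FALSE as soon as p fails once, INCONCLUSIVE before that. */
    public static class Always {
        private boolean violated = false;
        public Verdict step(boolean p) {
            violated = violated || !p; // sticky flag: the verdict never reverts
            return violated ? Verdict.FALSE : Verdict.INCONCLUSIVE;
        }
    }
}
```

Once a verdict is TRUE or FALSE it never changes, which is what impartiality requires. Reporting a definite verdict as soon as all extensions agree, as the F p monitor does at the first p, is what anticipation refers to; G p, by contrast, can never be declared TRUE on a finite prefix, since a later event may still violate it.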