Interactive Visualization of High-Velocity Event Streams

Uwe Jugel
SAP Research Dresden
Chemnitzer Str. 48
01187 Dresden
uwe.jugel@sap.com
Volker Markl
Technische Universität Berlin
Straße des 17. Juni 135
10623 Berlin
volker.markl@tu-berlin.de
ABSTRACT
Today, complex event processing systems enable real-time
analysis of high-velocity event streams. Considering their ef-
ficiency for high-speed data analytics, they provide a promis-
ing basis for real-time visualization. However, a CEP system
has to deal with several streaming-specific problems when
being used for modern, web-based visualizations. Such visu-
alizations do not only consume streaming data in real-time,
but should also provide advanced, interactive exploration
options, run on mobile devices, and scale efficiently for mass
user applications.
In this paper, I define three core challenges for CEP sys-
tems regarding interactive, real-time visualization for future
web applications. Within my PhD work, I want to meet
those challenges by investigating (1) Interactivity Operators,
solving problems with long running queries, (2) backend-
powered Visualization Operators, relieving mobile devices of
rendering duties, and (3) Multi-User Visualization Pipelines
that avoid redundant data processing when serving visual-
izations to thousands of event stream consumers.
1. MOTIVATION
Complex Event Processing (CEP) [21] is an established tech-
nology for processing high-velocity data in real-time, for sce-
narios where sub-second latency matters significantly. The
most common applications are electronic trading systems.
Other scenarios, where real-time processing of events is cru-
cial, are military surveillance, manufacturing automation,
pipeline monitoring, fraud detection, and tolling [10].
CEP systems can process data rapidly and provide the
resulting Complex Events to higher level applications. In
many cases, a human being will be informed by the system
on certain complex events, and often the system continu-
ously pushes high-frequency data to many connected clients.
All data may be further processed and will eventually be
visualized on a client device, for example, as a simple bitmap
or, now more often, as an interactive chart in a visual
analytics application.
Volker Markl is Uwe’s PhD supervisor.
Figure 1: CEP for web-scale, real-time visualization
[Diagram: events enter the CEP System, pass through inner CEP nodes n1..ni to a CEP end-node n, and leave via CEP outbound channels s1..sj to the Web Servers ws1..wsp (Stream Brokers), which serve the Web Clients c1..cq.]
Good business intelligence tools follow the
Visual Analytics Mantra by Keim et al. [8, p. 83]:
Analyze First, Show the Important, Zoom, Filter
and Analyze Further, Details-on-Demand.
The mantra matches well with the tasks of a CEP engine:
Analyze and Filter. But current CEP systems do not play
well with Zoom, Analyze Further, and Details-on-Demand,
i.e., the need for interactivity. With a growing number of
users and devices, and a growing number of use cases, new
requirements are put on the CEP system and the entire
stream processing and visualization pipeline to better sup-
port interactive, real-time visualizations.
Therefore, in my PhD thesis, I will define, develop, and
evaluate a processing pipeline that meets these requirements
and allows for web-scale processing and distribution of event
streams to visualize them on any modern web client, such
as mobile phones and tablets. Figure 1 depicts a high level
view for the envisioned system, leveraging the power of a dis-
tributed CEP System that can spread workload on-demand
to many CEP nodes, and combine it with efficient Web
Server technology, working as event stream broker and vi-
sualization pipeline for the connected web clients. I want to
analyze and optimize the event processing inside this whole
system, especially considering the requirements and poten-
tial constraints for continuous rendering of event streams in
web-browsers on mobile devices.
The remainder of this paper is structured as follows. First,
I introduce the basic optimization concepts of CEP in Sec-
tion 2. Then, in Section 3, I describe the current challenges
of interactive event stream visualization to establish three
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. This article was presented at:
The VLDB 12 PhD Workshop, August 27th - 31st 2012, Istanbul, Turkey.
Copyright 2012..
core problems that I want to investigate in my PhD work.
In Section 4, I describe my basic concepts to solve these
problems, following up with Section 5, proposing two archi-
tectures for implementing these concepts. After an overview
of related work in Section 6, I conclude the thesis discussion
and provide an outlook on my future work in Section 7.
2. COMPLEX EVENT PROCESSING
CEP is a common approach to handle streaming data sce-
narios, where thousands to millions of events per second
[5] are continuously analyzed to deduce further actions and
make decisions. A CEP engine performs the necessary data
mining operations using Standing Queries on the stream.
CEP engines are heavily optimized for this purpose and can
often deal with complex data mining tasks while still guar-
anteeing low latencies.
Many CEP engines can spread processing work across a
(virtual) server infrastructure to handle the high work load
that is common for such systems. To use any resources ef-
ficiently, CEP systems analyze, optimize, and split up user
queries into different sub tasks. In particular, all queries are
transformed to a network of stateful or stateless CEP Oper-
ators, i.e., an acyclic graph of operators based on a subset
of the Relational Algebra for databases in combination with
Temporal Operators [7].
First, the CEP system applies a Query Optimization [15],
analyzing the operator graph, e.g., to detect overlap in oper-
ator sub-graphs. Thereafter, the system estimates the costs
for each operator. Costs are computed in different mod-
els, usually incorporating usage of CPU, RAM, and network
bandwidth. The overall optimization goal is to maximize
throughput and minimize infrastructure cost. The costs are
used to define an optimal Operator Placement, i.e., a suitable
configuration for executing the operators in parallel threads,
or different instances of the CEP engine.
3. CHALLENGES
When visualizing event streams, several conceptual and
technological challenges arise that are not well covered by
current CEP systems and event stream visualization pipe-
lines. The following sections cover these challenges in detail.
3.1 Adding Interactivity to CEP Systems
Users do not only want instant notifications on predefined
events, resulting from long running queries, but also interac-
tively drill down into the streamed data and instantly receive
the requested information. This often introduces a com-
pletely new query to the system. To instantly answer this
query, the system needs to store a certain amount of data,
based on the exploration options provided to the user. In
the worst case, this data comprises all data for the full time
span. Storing a lot of data can improve interactivity, but
contradicts common CEP setups, where millions of events
per second stream into the system. Such vast amounts of
data do not easily fit into common databases.
Instant Queries. Unfortunately, CEP engines do not yet
support serving instant data in response to new queries,
since many queries are long running queries, gathering data
over certain time windows, and the engine may also re-
move unqueried data from the processing pipeline to op-
timize throughput and consumption of system resources. If
a new query requests data that is not inside the time win-
dows of running operators, new operators will be added to
the pipeline, and the user will not get valid results until the
requested time window is filled up again.
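The Instant Query problem can be illustrated with a minimal sketch of a windowed operator. The class name `SlidingWindowAvg`, its parameters, and the choice of an average are assumptions made for illustration only; they are not taken from any particular CEP engine.

```python
from collections import deque


class SlidingWindowAvg:
    """Average over the last `span` seconds of (timestamp, value) events."""

    def __init__(self, span: float, created_at: float):
        self.span = span
        self.created_at = created_at  # time the operator joined the pipeline
        self.events = deque()

    def push(self, ts: float, value: float) -> None:
        self.events.append((ts, value))
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] <= ts - self.span:
            self.events.popleft()

    def result(self, now: float):
        # A newly added operator has seen only part of the requested window,
        # so it cannot report a valid result until one full span has elapsed.
        if now - self.created_at < self.span or not self.events:
            return None
        return sum(v for _, v in self.events) / len(self.events)


op = SlidingWindowAvg(span=60.0, created_at=0.0)
op.push(10.0, 5.0)
early = op.result(10.0)   # window not yet filled
op.push(60.0, 7.0)
late = op.result(60.0)    # one full span has elapsed
```

In this sketch a client that issues the query at time 0 receives no valid answer for a full minute, which is the delay that the Interactivity Operators are meant to avoid.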
To solve this problem, and eventually to facilitate the high
level of interactivity needed for modern real-time visualiza-
tions, I want to define a custom set of Interactivity Operators
and optimize their processing and scheduling, as described
in Section 4.1.
3.2 Streaming Events to Any Device
Today, feature-rich analytics applications are expected to
be available on mobile devices. Mobile market places pro-
vide an increasing number of such business applications,
e.g., RoamBI (roambi.com) and StockTouch (stocktouch.com).
When developing such applications, additional requirements
have to be considered, regarding device capabilities and un-
derlying technologies.
Device Capabilities. Mobile devices have limited CPU
power and memory, and often suffer from limited network
connectivity. Therefore, application developers should be
very conservative regarding use of these resources to achieve
reasonable performance and still save battery power.
Portable Front-Ends. Developing an application for each
mobile platform is expensive. The best level of technolog-
ical reuse is expected from browser-based approaches, i.e.,
using HTML5 [18]. However, browser-based development
imposes additional constraints on achievable performance,
data transfer techniques, and security. For instance, all con-
nections have to be established via HTTP, all communica-
tion has to respect cross-domain policies, and continuous
visualizations have to respect how the browser renders web
pages, e.g., in active and inactive tabs.
The whole event processing system has to handle these re-
quirements efficiently, i.e., it must be able to serve the cor-
rect, real-time streaming data to many users, using many
web browsers, running on many kinds of devices, request-
ing many views on the same data, without unnecessarily
exhausting the device’s resources. Current CEP systems do
not cope with all of these tasks and leave them to be solved
by higher level custom built applications, even though – due
to their optimized stream processing engines – they hold the
potential to solve them very efficiently.
In my PhD work I want to shift specific visualization-
related tasks from the devices to the backend, using a cus-
tom, optimized set of Visualization Operators, potentially
running inside the CEP engine, as described in Section 4.2.
3.3 Serving Visualizations to Many Users
Today, CEP is also leveraged for mass user scenarios. For
example, one of SAP’s customers estimates 22,000 simultaneous
users with 5 queries per user running on a CEP
system. Similar numbers can be expected from other sce-
narios, such as massively multi-player online games [17] and
financial applications [5]. In the future, with the growing
popularity and performance of HTML5, I expect the front-
ends of such applications to be mainly browser-based so-
lutions. The more consumers connect to such a web-scale
system the more often users will consume similar or even
exactly the same data. Any two processing nodes that com-
pute the same value twice for two consumers are wasting
resources. A web-scale event distribution system has to
efficiently manage such processing pipelines, for example when
converting the incoming complex events to JSON messages.
For static content, such as archived images and videos on
the web, huge efforts are undertaken, e.g., by Akamai Technologies
(akamai.com), to serve data in time and with the lowest
server-side resource consumption. Nevertheless, with the
web shifting towards a more real-time, event-driven system,
an efficient processing and distribution of dynamic, real-time
event streams has to be considered.
Therefore, I want to investigate the underlying data queries
and visualization operators, and develop a multi-user pro-
cessing model to facilitate reuse of processing pipelines in
the backend, as described in Section 4.3.
3.4 Hypothesis and Research Questions
The described gaps and real-time visualization challenges,
regarding interactivity of CEP systems, device limitations,
and multi-user management, converge in three research ques-
tions, forming the foundation of my thesis:
Problem 1. Interactive Complex Event Processing: How
to define, place, and schedule operators in a distributed CEP
system for processing numerous changing queries and pro-
viding instant responses?
Problem 2. Efficient Visualization Processing: How to bet-
ter/further process events in the CEP engine, to facilitate a
more efficient rendering of continuous visualizations, espe-
cially on devices with limited capabilities?
Problem 3. Efficient Multi-User Visualization Processing:
How to efficiently process events to create visualizations for
a large number of connected visualization front-ends?
4. SOLUTION CONCEPTS
Based on these problems, my goal is to develop a CEP
system that can efficiently serve data for real-time visual-
izations. Therefore, I plan to use a distributed CEP engine
as core processing unit, and extend it to efficiently process
special Interactivity and Visualization Operators, facilitat-
ing instant interaction with real-time visualizations. The
system must handle high-velocity data, high query load, and
a high number of simultaneous users. The following sections
describe solution approaches for each of the three problems.
4.1 Interactivity Operators
To solve the Instant Query problem, and to provide better
interactivity, I want real-time visualizations to report their
configuration, e.g., the supported range of values, zoom lev-
els, etc., to the CEP system, such that a set of Interactivity
Operators can be derived and scheduled to the CEP pipeline.
One of the visualizations for our current project comprises
a scatter plot and several line charts for presenting the cur-
rent activity of the stock market (see Figure 2). In this ex-
ample we stream real-time stock prices and quote data to a
(mobile) HTML5 application for visualization and optional
drill down. The scatter plot shows the number of quotes in
a certain market sector (prime) and allows the user to drill
down into the appropriate sub sectors (sub prime). The line
charts show stock price trends for the top k companies.
The scatter plot presents aggregations of the data (P and
B) and allows de-aggregation on-demand (S and A). One of the
resulting data management problems is to decide where to
process the aggregates. The sub prime aggregation is computed
by the backend and the clients compute the primes
[Figure 2 data and operators:]
Quotes, prime, sub prime: $R(q, p, s)$
Sum of quotes in sub market: $S(s, p, q_s) = \pi_{s,p}\,{}_{s}G_{\mathrm{Sum}(q)}(R)$
Sum of quotes in market: $P(p, q_p) = \pi_{p}\,{}_{p}G_{\mathrm{Sum}(q_s)}(S)$
Avg. quotes in sub market: $A(s, p, q_s) = \pi_{s,p}\,{}_{s}G_{\mathrm{Avg}(q)}(R)$
Avg. quotes in market: $B(p, q_p) = \pi_{p}\,{}_{p}G_{\mathrm{Avg}(q_s)}(A)$
Stock price and market cap.: $M(v, c)$
Price, mcap. for top k stocks: $T(v, c) = \sigma_{\phi_k}(\tau_c(M))$
$\tau_c$ sorts stocks by market cap. $c$, and $\phi$ limits the result to $k$ tuples.
Figure 2: Real-time stock market data visualization
with related data and data operators.
based on the sub prime values. Backend and front-end pro-
cess these operations for every tick. The client-side aggre-
gation has to be done for each client, redundantly yielding
the same results. When considering the system as a whole,
such inefficiencies could be eliminated by doing the aggre-
gate in the backend and using a smarter visualization that
receives only the data it currently displays (either primes
or sub primes). This could reduce the overall workload and
save network traffic.
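As a rough sketch of the backend-side alternative, the relations of Figure 2 can be computed once per tick and only the currently displayed level sent to a client. The helper names `group`, `aggregate`, and `payload`, the tuple layout, and the level strings are assumptions for illustration, not part of the described system.

```python
from collections import defaultdict


def group(rows, agg):
    """Group (key, value) pairs by key and reduce each group with `agg`."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def avg(values):
    return sum(values) / len(values)


def aggregate(R):
    """R is a list of (q, p, s) tuples: quotes, prime sector, sub prime sector."""
    S = group((((s, p), q) for q, p, s in R), sum)          # sum per sub market
    P = group(((p, qs) for (s, p), qs in S.items()), sum)   # sum per market
    A = group((((s, p), q) for q, p, s in R), avg)          # avg per sub market
    B = group(((p, qs) for (s, p), qs in A.items()), avg)   # avg per market
    return S, P, A, B


def payload(R, level):
    """Return only the aggregate that the client currently displays."""
    S, P, _, _ = aggregate(R)
    return P if level == "prime" else S


R = [(10.0, "tech", "software"), (20.0, "tech", "software"),
     (5.0, "tech", "hardware"), (7.0, "energy", "oil")]
S, P, A, B = aggregate(R)
```

With this split, every connected client receives ready-made primes or sub primes instead of recomputing the same primes from the sub prime values.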
The current visualization only receives the currently ag-
gregated data for the last timestamp. If the user wants to
see past quotes or stock values, the client-side application
can only resort to data it already received from the backend.
However, a good analytics application would provide a
more fine-grained and generic interaction model. Therefore
the visualization system has to support two main features.
First, it has to intelligently manage several data stores,
i.e., history caches, for each interaction option, so that the
clients can request historical data on demand.
Second, the system needs to schedule several visualization-
related queries, even if no client currently receives the re-
sults. Running these Shadow Queries will allow for instant
responses for any supported user interaction, i.e., any sup-
ported query. For example, the user may want to request
stock prices for non-top-k companies.
In a nutshell: to solve thesis Problem 1, I need to define
rules and strategies for an interactive, real-time visualization
system to manage (1) where and how much (historical) data
to store and (2) where to process operators on this data.
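The combination of a history cache and a Shadow Query might look roughly like the following sketch. The class `ShadowQuery`, the bounded `deque` as cache, and the event dictionaries are hypothetical choices; how much history to keep and where to keep it is exactly the open question stated above.

```python
from collections import deque


class ShadowQuery:
    """Keeps evaluating a query and caching results, even with no subscribers."""

    def __init__(self, predicate, capacity: int):
        self.predicate = predicate
        self.history = deque(maxlen=capacity)  # bounded history cache
        self.subscribers = []

    def on_event(self, event: dict) -> None:
        if not self.predicate(event):
            return
        self.history.append(event)         # always recorded
        for deliver in self.subscribers:   # pushed only if someone listens
            deliver(event)

    def subscribe(self, deliver) -> list:
        """Attach a client and return the cached history for an instant response."""
        self.subscribers.append(deliver)
        return list(self.history)


query = ShadowQuery(lambda e: e["symbol"] == "XYZ", capacity=2)
for price in (1, 2, 3):
    query.on_event({"symbol": "XYZ", "price": price})
query.on_event({"symbol": "ABC", "price": 9})

received = []
history = query.subscribe(received.append)
query.on_event({"symbol": "XYZ", "price": 4})
```

A client that starts looking at a non-top-k stock receives the cached history immediately and live events from then on; the cost is that the query runs while nobody is subscribed.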
4.2 Visualization Operators
For supporting efficient rendering, the envisioned system
will provide Visualization Operators running in the backend,
either as part of the application level processing pipeline
or as operators in the CEP engine, taking over tasks from
any connected client. These clients can be relieved from
several visualization pre-processing steps, and will only have
to conduct the final rendering.
For example, in our stock market application, in addition
to the value aggregation, each client also continuously com-
putes the scatter plot geometry for primes and sub primes,
and several line chart geometries for each of the top k stocks.
Such geometry computations can make up a significant part
of the client-side continuous processing and are therefore
good candidates for being outsourced to the backend.
The scalar transformation of quote data to scatter plot
coordinates is an example of a Visualization Operator.
Although such a scalar transformation of set-based data is
a very cheap operation that may well run on the
client, more expensive operators are those that need to
iterate (recursively) over a whole data set to derive a
"stable" geometry. Most layout computations belong to this
category, e.g., treemaps [16], circle packing [20], and graphs.
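To make the cost difference concrete, the following sketch implements a slice-and-dice treemap, a deliberately simple layout variant chosen here for brevity; the text does not say which treemap algorithm is used. The node format with `name`, `value`, and `children` keys is an assumption.

```python
def weight(node: dict) -> float:
    """Total weight of a node: its own value, or the sum over its children."""
    children = node.get("children")
    if children:
        return sum(weight(c) for c in children)
    return node["value"]


def treemap(node, x, y, w, h, horizontal=True, out=None):
    """Slice-and-dice layout: returns (name, x, y, w, h) for every leaf."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = weight(node)
    offset = 0.0
    for child in children:
        share = weight(child) / total
        # Alternate the split direction on every level of the hierarchy.
        if horizontal:
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


tree = {"name": "root", "children": [
    {"name": "a", "value": 2},
    {"name": "b", "children": [{"name": "c", "value": 1},
                               {"name": "d", "value": 1}]},
]}
rects = treemap(tree, 0, 0, 100, 100)
```

Unlike a per-point scale, every rectangle depends on the weights of all sibling subtrees, so a change to a single value forces a pass over the whole hierarchy.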
To get a better view on the continuous rendering per-
formance of such geometric transformations, I implemented
several real-time visualization test cases. For each visualiza-
tion I separated the layout and geometry computation, i.e.,
the Visualization Operator, from the DOM updating and
drawing part of the client-side processing. The visualiza-
tions are implemented using D3 [2] (mbostock.github.com/d3),
which is reasonably fast and compatible with mobile devices.
Figure 3 depicts the results for three different tests:
(1) generation of an svg:path for a Line Chart, (2) Treemap
layout, and (3) Circle Pack layout.
Figure 3: Geometry/layout computation time
[Bar chart: Layout Time in percent for the Line Chart, Treemap, and Circle Pack tests, measured in Chrome, Firefox, Firefox (Android), and Safari (iPad).]
The results show that layout computation performance
varies a lot with the use case and must not be neglected
when building real-time visualizations. Especially for the
line chart, I observed a very high workload of up to 78%
of the overall client-side processing, which is caused by the
numerous string concatenations in the browser’s JavaScript
VM (i.e., a lot of memcpy operations) that are required for
creating the d attribute of the svg:path.
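A backend Visualization Operator for this case could, under some assumptions, look like the sketch below. The function `line_path`, its parameters, and the one-decimal rounding are illustrative choices; only the `M` and `L` path commands are standard SVG path syntax.

```python
def line_path(points, width, height, x_range, y_range):
    """Build the `d` attribute of an svg:path for a polyline through `points`."""
    (x0, x1), (y0, y1) = x_range, y_range
    commands = []
    for i, (x, y) in enumerate(points):
        px = (x - x0) / (x1 - x0) * width
        py = height - (y - y0) / (y1 - y0) * height  # SVG y axis points down
        commands.append(f"{'M' if i == 0 else 'L'}{px:.1f},{py:.1f}")
    # Join once instead of concatenating strings point by point.
    return "".join(commands)


d = line_path([(0, 0), (5, 5), (10, 10)], width=100, height=50,
              x_range=(0, 10), y_range=(0, 10))
```

The resulting string can be attached to the outgoing event, so that the client only has to assign it to the path element.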
This demonstrates that, depending on the rendering tech-
nology, there exist very expensive visualization operators
that may heavily benefit from backend processing. There-
fore, one goal of my PhD thesis is to identify and classify
a common set of such geometric operators and implement
them as backend Visualization Operators that can enrich
event-streams with geometry data.
The implementation requires splitting up geometric trans-
formations into separate steps, as done for the performance
tests. In general, I need to analyze the actual continuous
data flow processing for different visualizations, i.e., how
data sets are transformed to the common geometric primi-
tives. The expected outcome is a clear separation of which
parts of the transformation can be expressed as classical relational
database operators and which parts require additional
processing, e.g., using custom, stateful operators.
My research will facilitate using backend and front-end in
an optimal way to speed up the entire real-time visualization
pipeline, thereby effectively solving Problem 2 of my thesis.
4.3 Multi-User Visualization Processing
In Section 1, going back to Figure 1, I already introduced
the basic model for event stream distribution on top of a
CEP System, using a distributed CEP Engine and scalable
stream-processing Web Servers for processing visualizations
and pushing events to a large number of Web Clients con-
nected to the system.
Figure 1 also shows in detail that multiple users may re-
ceive data from a single data source. For example, the clients
c1 to c6 receive a single event stream s1 through web servers
ws1 and ws2. In this case the system is continuously pushing
the same data to these clients and has to process it
accordingly. This processing includes all running Interac-
tivity Operators, but also the Visualization Operators and
additional tasks, such as the deserialization of binary event
data and the generation of web-friendly JSON data.
In a multi-user system many users may require different
visualization geometries, e.g., for different screen sizes. How-
ever, I expect them to use a common core geometry that
could be processed by the backend, leaving the client to
conduct only a less costly (e.g., scalar) transformation oper-
ation. By analyzing common visual transformations I plan
to categorize such visualization operators, and estimate their
reuse potential in multi-user scenarios.
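One possible reading of a "common core geometry" is a resolution-independent geometry in the unit square, as in the following sketch. The function names `core_geometry` and `to_screen` and the unit-square convention are assumptions for illustration.

```python
def core_geometry(points, x_range, y_range):
    """Backend step, run once per stream: map data into the unit square."""
    (x0, x1), (y0, y1) = x_range, y_range
    return [((x - x0) / (x1 - x0), (y - y0) / (y1 - y0)) for x, y in points]


def to_screen(unit_points, width, height):
    """Client step, run per device: a cheap scalar transformation to pixels."""
    return [(u * width, (1 - v) * height) for u, v in unit_points]


core = core_geometry([(0, 10), (50, 20)], x_range=(0, 100), y_range=(10, 30))
phone = to_screen(core, 320, 480)
desktop = to_screen(core, 1024, 768)
```

Here the data-dependent part is computed once and shared, while each screen size adds only two multiplications per point on the client.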
This multi-user visualization processing is closely related
to the multi-query optimization problem [15]. Therefore,
to achieve an optimal processing, I consider reusing, e.g.,
the CEP engine’s query subsumption detection for finding
“similar visualizations”. Leveraging these subsumptions will
reduce the number of processing pipelines per visualization
and facilitate reuse of running operators, thereby effectively
solving Problem 3.
5. TOWARDS INTERACTIVE CEP
In this section, I briefly describe the project context for my
PhD work, followed by Sections 5.1 and 5.2, where I present
two complementing architectures for solving the three chal-
lenges of real-time event stream visualization, as described
in Sections 3 and 4.
In our current working project, we are dealing with high-
frequency, financial, energy, and manufacturing data, pushed
to different web-browsers, running on many kinds of devices,
such as iPad, iPhone, and Desktop PCs. We have set up a
distributed CEP system, and combined it with a stream-
based web application stack to conduct our research.
All data is pushed to the clients solely via WebSockets
[19], since we primarily see modern web applications as our
future front-ends. The final rendering is done on the devices
themselves to provide best interactivity, i.e., the backend
will not generate full-sized bitmap images.
5.1 Web-Server-centric Architecture
The first version of an architecture for real-time event
stream visualization is depicted in Figure 4. In the Stream
Broker, the system implements a simple channel-reuse model
to avoid redundant data processing. Every time a new query
is issued by a client, the query is either sent to the CEP
system’s Query Manager, resulting in a new channel for that
query, or, if the query matches with an existing query, the
client can be mapped to an existing channel. In the latter
case the client instantly receives the correct data, while this
may be delayed for completely new user queries, because of
the Instant Query problem (see Section 3.1).
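The channel-reuse model can be sketched as a map from queries to channels. The class `StreamBroker`, the `submit_query` callback standing in for the Query Manager, and the whitespace-only query matching are hypothetical simplifications; a real system would more likely match queries semantically.

```python
class StreamBroker:
    def __init__(self, submit_query):
        self.submit_query = submit_query  # hypothetical hook into the Query Manager
        self.channels = {}                # normalized query -> set of client ids

    @staticmethod
    def normalize(query: str) -> str:
        # Naive textual matching: only whitespace differences are ignored.
        return " ".join(query.split())

    def subscribe(self, client_id: str, query: str) -> bool:
        """Map the client to a channel; return True if a channel was reused."""
        key = self.normalize(query)
        reused = key in self.channels
        if not reused:
            self.submit_query(key)        # new channel, may be delayed
            self.channels[key] = set()
        self.channels[key].add(client_id)
        return reused


submitted = []
broker = StreamBroker(submitted.append)
first = broker.subscribe("c1", "SELECT avg(price)  FROM quotes")
second = broker.subscribe("c2", "SELECT avg(price) FROM quotes")
```

Only the first subscription reaches the CEP system; the second client is attached to the running channel and can receive data immediately.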
Figure 4: Web-server-centric architecture
[Diagram: a Distributed CEP System (Streaming Engine, Query Manager, query graph) sends events to a Web Server (Stream Broker, Layout Processor, Message Serializer), which serves a Web Browser (Renderer, Charting Library, Layout Processor). The legend distinguishes data flow, query flow, and system control flow.]
For mass client support, many instances of web servers can
connect to the CEP system, and issue queries of their con-
nected clients. Optimally, the different stream brokers on
the different web servers would exchange information about
the client connections, to optimize channel reuse across the
cluster, e.g., when the same query is sent over two differ-
ent web servers. Currently, the web servers will execute two
separate pipelines, for adding layout information and serial-
izing the event stream. This results in redundant work for
the Message Serializers and the Layout Processors.
In the future, the web servers should be enhanced to act
as really distributed streaming web servers, avoiding any
redundant processing, e.g., by running the layout processing
and serialization in a separate component, managed by the
stream broker, and accessible by different web servers.
Our system will allow many users to issue new, arbitrary
queries at any time. Therefore, optimizing the visualization
processing pipeline is crucial, since the number of running
operators in the system not only depends on the requested
data, but may grow significantly when considering the work
of the Layout Processor. For example, with the current sys-
tem, every backend-computed geometry has to be processed
separately if the screen parameters do not match exactly. In
this case the system schedules a visualization operator for
every screen size, even if they run on the same underlying
event stream.
In this architecture, visualizations are processed in a separate
component that cannot directly benefit from the optimizations
that a CEP engine could provide for this kind
of event processing. Considering the CEP engine as native
visualization processor can solve this problem.
5.2 CEP-centric Architecture
I expect that Visualization Operators can be applied to
many kinds of event streams, making them good candidates
for running inside a CEP engine, where they can be op-
timized together with the other operators in the pipeline.
Leveraging this factor, the web-server-centric architecture
can be advanced towards a CEP-centric architecture, de-
picted in Figure 5, where all visualization-specific operators
can be processed directly in the CEP engine. This makes the
Web Server a pure Stream Broker for distributing the event
streams and managing HTTP sessions. In addition to visualization
processing, event streams have to be serialized,
e.g., to JSON or XML, to be consumable by the clients.
Such Serialization Operators are equally well suited to be
processed by the CEP engine, as they usually are stateless
operations, just as many Visualization Operators.
Figure 5: CEP-centric architecture
[Diagram: a Distributed CEP System (CEP Engine, Query Manager, Query Graph with Interactivity Operator I, Visualization Operator V, Serialization Operator S, and a DB) sends events to a Web Server acting as Stream Broker, which serves a Web Browser (Backend Adapter, Charting Library, Renderer).]
To support interactivity and ensure real-time responsiveness,
the CEP engine’s Query Manager must be able to handle
the second class of stateful Interactivity Operators, as
described in Section 4.1, storing history efficiently and using
an optimized scheduling model for inactive operators.
They must not be removed from the global Query Graph,
since their data may be requested by the user at any time.
Since the two classes of operators, Interactivity and Visu-
alization Operators, are derived from visualization require-
ments of higher level applications, I expect them to exist
very close to the leaves of the operator network, i.e., at the
end of the Query Graph, reflecting the behavior of the ex-
ternal processing pipeline in the web-server-centric system.
I want to consider this fact in my PhD work, investigating
how local optimizations for these operators can be leveraged
and how local and global query optimizations can be per-
formed together. Running Interactivity and Visualization
Operators in a CEP engine in an optimal way will allow the
CEP system to better react to changing user queries.
6. RELATED WORK
In the context of real-time visualization on top of a CEP
system many different approaches and concepts can be in-
vestigated. There are already products and techniques for
implementing “real-time” visualizations and there is also sig-
nificant research on operators, continuous queries, and how
to optimize their processing in CEP engines.
Real-Time Visualization Software. Event stream vi-
sualizations for web applications are typically developed as
custom solutions, e.g., using Java Applets or Adobe Flash
[1]; legacy technologies, not compatible with the open web.
Another approach is to use dedicated visualization software,
such as the solutions by Panopticon (panopticon.com), or
specialized toolkits for time series data, such as Graphite
(graphite.wikidot.com), which mainly supports bitmap-based
rendering. So far, my investigation did not reveal a suit-
able real-time visualization toolkit that supports interactiv-
ity, runs on mobile web-browsers, and scales well with a
growing number of clients.
Using Databases with CEP. Databases are used for
pseudo-real-time, interactive applications. They can be com-
bined with a streaming system, e.g., by constantly writing
events from the CEP system to the database. Any visu-
alizations work directly on top of the database, allowing
reuse of existing database-centric analytics tools. This com-
bination of databases and CEP has reached the market in
form of products, such as StreamBase LiveView (stream-
base.com/products/liveview). This model can be further en-
hanced using in-memory database technology [13] with state-
of-the-art CEP engines, such as the Sybase ESP [12]. How-
ever, these solutions have some drawbacks, as they have
to continuously query the database instead of using push-based
techniques. In addition, I also did not find the notion
of Visualization Operators in the context of databases,
even though I expect they could be implemented on top of
database technology, such as materialized views.
Visualization Operators. Regarding potential opera-
tors and functions for visualization systems, Chi and Riedl
[3] provide an in-depth analysis, separating them into view
and value operators. The work is built on previous oper-
ator research, such as a data flow model for scientific vi-
sualization by Schroeder et al. [14] or a visual exploration
pipeline for databases by Lee et al. [11]. These works show
that the visualization pipeline can be split up into several
steps using different operators at different stages, which fits
well with the concept of outsourcing operators for visualiza-
tion processing. The operator classification may also help
with preparing visualization operators for direct execution
inside a CEP engine. What these approaches are missing
is the notion of interactivity. Interactivity relates to the
Instant Query problem, which I also plan to compare with
the problems solved by Operator Contracts, as proposed by
Childs et al. [4]. They introduce a distributed visualization
pipeline with several operators sharing up-stream contract
channels, which, according to the authors, “allow all possi-
ble optimizations to be applied to each pipeline execution”.
I consider these works on stream processing operators very
relevant to my PhD work, and will evaluate them as an
implementation model for my own operators.
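The idea of splitting the visualization pipeline into operators at different stages can be sketched as follows. The sketch only loosely follows the value/view distinction of [3]: `window` is assumed to act as a value operator (it changes the data), `to_pixels` as a view operator (it changes how the data is drawn). All names are hypothetical and not part of any cited framework.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (timestamp, value) or (x, y) in pixels
Operator = Callable[[List[Point]], List[Point]]


def window(start: float, end: float) -> Operator:
    """Value operator: keep only points with start <= timestamp < end."""
    def apply(points: List[Point]) -> List[Point]:
        return [p for p in points if start <= p[0] < end]
    return apply


def to_pixels(t0: float, t1: float, v0: float, v1: float,
              width: int, height: int) -> Operator:
    """View operator: map data coordinates to pixel coordinates.

    The y axis is flipped because pixel rows grow downwards.
    """
    def apply(points: List[Point]) -> List[Point]:
        return [
            (round((t - t0) / (t1 - t0) * (width - 1)),
             round((1 - (v - v0) / (v1 - v0)) * (height - 1)))
            for t, v in points
        ]
    return apply


def pipeline(*operators: Operator) -> Operator:
    """Chain operators so that each stage consumes the previous output."""
    def run(points: List[Point]) -> List[Point]:
        for operator in operators:
            points = operator(points)
        return points
    return run


if __name__ == "__main__":
    data = [(0.0, 10.0), (5.0, 20.0), (10.0, 30.0), (15.0, 40.0)]
    chart = pipeline(window(0.0, 11.0),
                     to_pixels(0.0, 10.0, 10.0, 30.0, 101, 51))
    print(chart(data))
```

Because each stage is an independent function, a stage such as `to_pixels` could in principle run in the backend rather than on the client, which is the kind of outsourcing discussed above.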
Optimal Query Processing. As described in Section 2,
visualization operators can be optimized, whether they run as
special CEP operators or in the CEP-outbound processing
pipeline, using techniques such as Query Optimization [6, 15]
and efficient Operator Placement across the different processing
nodes of the system [9]. In my PhD work, I want to use prior art in
this research field for the optimal processing of my Visual-
ization and Interactivity Operators.
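The basic intuition behind multi-query optimization, sharing work that several queries have in common, can be shown with a deliberately simplified sketch. It is not the algorithm of [6] or [15]: common subexpressions are detected here only by comparing a hypothetical string key, and `SharedPlan`, `register`, and `process` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


class SharedPlan:
    """Evaluates each distinct filter once per event, for all consumers."""

    def __init__(self) -> None:
        self.predicates: Dict[str, Predicate] = {}
        self.sinks: Dict[str, List[List[Event]]] = {}
        self.evaluations = 0  # number of predicate calls performed

    def register(self, key: str, predicate: Predicate) -> List[Event]:
        """Attach a new consumer; reuse the filter if `key` is already known."""
        self.predicates.setdefault(key, predicate)
        sink: List[Event] = []
        self.sinks.setdefault(key, []).append(sink)
        return sink

    def process(self, event: Event) -> None:
        for key, predicate in self.predicates.items():
            self.evaluations += 1
            if predicate(event):
                for sink in self.sinks[key]:  # fan out the shared result
                    sink.append(event)


def demo() -> Tuple[int, int, int, int]:
    plan = SharedPlan()
    a = plan.register("symbol=SAP", lambda e: e["symbol"] == "SAP")
    b = plan.register("symbol=SAP", lambda e: e["symbol"] == "SAP")
    c = plan.register("symbol=IBM", lambda e: e["symbol"] == "IBM")
    for symbol in ["SAP", "IBM", "SAP", "ORCL"]:
        plan.process({"symbol": symbol})
    return plan.evaluations, len(a), len(b), len(c)


if __name__ == "__main__":
    print(demo())
```

Three consumers and four events would require twelve predicate evaluations if every query ran on its own. Since two consumers share the same filter, the shared plan performs eight.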
7. CONCLUSION
This paper illustrated my vision for real-time event stream
visualization. When combining CEP with real-time, interactive
visualizations distributed to a large number of mobile,
browser-based applications, the streaming system has
to solve some major challenges, such as efficient pipeline
management, respecting device capabilities, and providing
interactive exploration options on the streaming data. From
these challenges, I derived three core problems, to be solved
in my PhD work, and presented my solution concepts.
To motivate the proposed concepts, the paper includes examples
based on real-time visualizations of a financial application
and first results on the continuous rendering performance
of web browsers (on mobile devices). In addition,
I presented two prospective architectures that I plan
to use and evaluate for implementing my solutions.
In the future, I first plan to evaluate multi-query optimiza-
tion techniques to facilitate reuse of visualization processing
pipelines. Thereafter, I will focus on related approaches
for CEP and visualization operators and how they can be
used to implement my Interactivity and Visualization Oper-
ators. The goal is to run such operators efficiently inside a
CEP engine or within a higher level visualization processing
pipeline. Eventually, my work will enable the development
of highly interactive, mobile analytics tools, working on real-
time, streaming data.
8. REFERENCES
[1] Adobe Systems Inc. The NASDAQ Stock Market, Inc.
2009. Online, 06/2012, (tinyurl.com/nasdaq-study).
[2] M. Bostock, V. Ogievetsky, and J. Heer. D3
Data-Driven Documents. Visualization & Comp.
Graphics, IEEE Trans., 17(12):2301–2309, 2011.
[3] E. Chi and J. Riedl. An operator interaction
framework for visualization systems. Symposium on
Information Visualization, IEEE, pages 63–70, 1998.
[4] H. Childs, E. Brugger, K. Bonnell, J. Meredith,
M. Miller, B. Whitlock, and N. Max. A contract based
system for large data visualization. Visualization,
IEEE, pages 190–198, 2005.
[5] J. P. Corrigan. OPRA updated traffic projections for
2012 and 2013. Technical report, OPRA, 2011. Online,
06/2012, (tinyurl.com/opra-prj).
[6] C. Jin and J. Carbonell. Predicate indexing for
incremental multi-query optimization. Foundations of
Intelligent Systems, LNCS, 4994:339–350, 2008.
[7] E. Kalyvianaki, W. Wiesemann, Q. H. Vu, D. Kuhn,
and P. Pietzuch. SQPR: Stream query planning with
reuse. Proc. ICDE, IEEE, pages 840–851, 2011.
[8] D. Keim, F. Mansmann, J. Schneidewind, J. Thomas,
and H. Ziegler. Visual analytics: scope and challenges.
Visual Data Mining, LNCS, 4404:76–90, 2008.
[9] G. Lakshmanan, Y. Li, and R. Strom. Placement
strategies for internet-scale data stream systems.
Internet Computing, IEEE, 12(6):50–60, 2008.
[10] N. Leavitt. Complex-event processing poised for
growth. Computer, IEEE, 42(4):17–20, 2009.
[11] J. Lee and G. Grinstein. An architecture for retaining
and analyzing visual explorations of databases.
Visualization, IEEE, pages 101–108, 1995.
[12] N. McGovern. Introduction to complex event
processing in capital markets. Technical report,
Sybase, Inc., 2009.
[13] H. Plattner and A. Zeier. In-Memory Data
Management: An Inflection Point For Enterprise
Applications. Springer, 2011.
[14] W. Schroeder and B. Lorensen. Visualization Toolkit:
An Object-Oriented Approach to 3-D Graphics.
Prentice Hall PTR, 1996.
[15] T. Sellis. Multiple-query optimization. Transactions
on Database Systems, 13(1):23–52, 1988.
[16] B. Shneiderman. Treemaps for space-constrained
visualization of hierarchies. ACM Transactions on
Graphics, 11:92–99, 1998.
[17] Streambase Inc. StreamBase for MMOs, MMOGs, &
Social Worlds. Online, 06/2012, (tinyurl.com/sbmmo).
[18] W3C Consortium. HTML5 Working Draft. Online,
06/2012, (dev.w3.org/html5/spec/spec.html).
[19] W3C Consortium. The WebSocket API. Online,
06/2012, (dev.w3.org/html5/websockets).
[20] W. Wang, H. Wang, G. Dai, and H. Wang.
Visualization of large hierarchical data by circle
packing. In Proc. SIGCHI, pages 517–520. ACM, 2006.
[21] E. Wu, Y. Diao, and S. Rizvi. High-performance
complex event processing over streams. In Proc.
SIGMOD, pages 407–418. ACM, 2006.