Article
Cloud-native Observability: The Many-faceted Benefits of
Structured and Unified Logging - A Case Study
Nane Kratzke
Lübeck University of Applied Sciences; nane.kratzke@th-luebeck.de
Correspondence: nane.kratzke@th-luebeck.de
Abstract: Background: Cloud-native software systems often have a much more decentralized structure and many independently deployable and (horizontally) scalable components, making it more complicated to create a shared and consolidated picture of the overall decentralized system state. Today, observability is often understood as a triad of collecting and processing metrics, distributed tracing data, and logging. The result is often a complex observability system composed of three stovepipes whose data is difficult to correlate. Objective: This study analyzes whether these three historically emerged observability stovepipes of logs, metrics and distributed traces could be handled in a more integrated manner and with a more straightforward instrumentation approach. Method: This study applied an action research methodology used mainly in industry-academia collaboration and common in software engineering. The research design utilized iterative action research cycles, including one long-term use case. Results: This study presents a unified logging library for Python and a unified logging architecture that uses the structured logging approach. The evaluation shows that several thousand events per minute are easily processable. Conclusion: The results indicate that a unification of the current observability triad is possible without the necessity to develop utterly new toolchains.

Keywords: cloud-native; observability; cloud computing; logging; structured logging; logs; metrics; traces; distributed tracing; log aggregation; log forwarding; log consolidation
1. Introduction

A "crypto winter" basically means that the prices of so-called cryptocurrencies such as Bitcoin, Ethereum, Solana, etc. fell sharply on the crypto exchanges and then stayed low. The signs were all around in 2022: the failure of the TerraUSD crypto project in May 2022 sent an icy blast through the market, then the cryptocurrency lending platform Celsius Network halted withdrawals, prompting a sell-off that pushed Bitcoin to a 17-month low.

This study logged such a "crypto winter" on Twitter more by accident than by intention. Twitter was simply selected as an appropriate use case to evaluate a unified logging solution for cloud-native systems, and the decision was made to log Tweets containing stock symbols like $USD or $EUR. It turned out that most symbols used on Twitter are not related to currencies like $USD (US Dollar) or stocks like $AAPL (Apple) but to cryptocurrencies like $BTC (Bitcoin) or $ETH (Ethereum). The Twitter community therefore seems to be quite cryptocurrency-savvy. So, although some data of this 2022 crypto winter will be presented in this paper, the paper focuses more on the methodical part and addresses how such and further data could be collected more systematically in distributed cloud-native applications. The paper will at least show that even complex observability of distributed systems can be achieved simply by logging events to stdout.

Observability measures how well a system's internal state can be inferred from knowledge of its external outputs. The concept of observability was initially introduced by the Hungarian-American engineer Rudolf E. Kálmán for linear dynamical systems [1,2]. However, observability also applies to information systems and is of particular interest for fine-grained and distributed cloud-native systems that come with their very own set of observability challenges.
Traditionally, the responsibility for observability is (was?) with operations (Ops). With the emergence of DevOps, we can observe a shift of Ops responsibilities to developers. So, observability is evolving more and more into a Dev responsibility. Observability should ideally already be considered during the application design phase and not be regarded as some "add-on" feature for later expansion stages of an application. The current discussion about observability began well before the advent of cloud-native technologies like Kubernetes. A widely cited blog post by Cory Watson from 2013 shows how engineers at Twitter looked for ways to monitor their systems as the company moved from a monolithic to a distributed architecture [3–5]. One of the ways Twitter did this was by developing a command-line tool that engineers could use to create their own dashboards to keep track of the charts they were creating. While CI/CD tools and container technologies often bridge Dev and Ops in one direction, observability solutions close the loop in the opposite direction, from Ops to Dev [4]. Observability is thus the basis for data-driven software development (see Fig. 1 and [6]). As developments around cloud(-native) computing progressed, more and more engineers began to "live in their dashboards." They learned that it is not enough to collect and monitor data points but that it is necessary to address this problem more systematically.
Figure 1. Observability can be seen as a feedback channel from Ops to Dev (adapted from [4] and [6]).
2. Problem description

Today, observability is often understood as a triad. Observability of distributed information systems is typically achieved through the collection and processing of metrics (quantitative data, primarily as time series), distributed tracing data (execution durations of complex system transactions that flow through the services of a distributed system), and logging (qualitative data of discrete system events, often associated with timestamps but encoded as unstructured strings). Consequently, three stacks of observability solutions have emerged, and the following somehow summarizes the current state of the art.

• Metrics: Here, quantitative data is often collected in time series, e.g., how many requests a system is currently processing. The metrics technology stack is often characterized by tools such as Prometheus and Grafana.
• Distributed tracing involves following the path of transactions along the components of a distributed system. The tracing technology stack is characterized by tools such as Zipkin or Jaeger, and the technologies are used to identify and optimize particularly slow or error-prone substeps of distributed transaction processing.

• Logging is probably as old as software development itself, and many developers, because of the ubiquity of logs, are unaware that logging should be seen as part of holistic observability. Logs are usually stored in so-called log files. Primarily qualitative events are logged (e.g., user XYZ logs in/out). An event is usually appended to a log file as a line of text. Often the implicit and historically justifiable assumption prevails among developers that these log files are read and evaluated primarily by administrators (thus humans). However, that is hardly the case anymore. It is becoming increasingly common for the contents of these log files to be forwarded to a central database through "log forwarders" so that they can be evaluated and analyzed centrally. The technology stack is often characterized by tools such as Fluentd, FileBeat, or LogStash for log forwarding, databases such as ElasticSearch, Cassandra, or simply S3, and user interfaces such as Kibana.
Figure 2. An application is quickly surrounded by a complex observability system when metrics,
tracing and logs are captured with different observability stacks.
Incidentally, all three observability pillars have in common that the software to be developed must somehow be instrumented. This instrumentation is normally done using programming language-specific libraries. Developers often regard distributed tracing instrumentation in particular as time-consuming. Also, which metric types (counter, gauge, histogram, summary, and more) are to be used in metric observability solutions such as Prometheus often depends on Ops experience and is not always immediately apparent to developers. Certain observability hopes fail simply because of wrongly chosen metric types. Only system metrics such as CPU, memory, and storage utilization can be easily captured in a black-box manner (i.e., without instrumentation in the code). However, these data are often only of limited use for the functional assessment of systems. For example, CPU utilization provides little information about whether conversion rates in an online store are developing in the desired direction.
So, current observability solutions are often based on these three stovepipes for logs, metrics, and traces. The result is an application surrounded by a complex observability system whose isolated datasets can be difficult to correlate. Fig. 2 focuses on the application (i.e., the object to be monitored) and triggers the question of whether it is justified to use three complex subsystems and three types of instrumentation, which always means three times the instrumentation and data analysis effort for isolated data silos.
The tool combination of ElasticSearch, LogStash, and Kibana is often used for logging and has even been given a catchy acronym: the ELK stack [3]. The ELK stack can also be used to collect metrics and, using the APM plugin, for distributed tracing. So, at least for the ELK stack, the three stovepipes are not clearly separable or disjoint. The separateness is more historically "suggested" than technologically given. Nevertheless, this tripartite division into metrics, tracing and logging is very formative for the industry, as shown, for example, by the OpenTelemetry project [7]. OpenTelemetry is currently in the incubation stage at the Cloud Native Computing Foundation and provides a collection of standardized tools, APIs, and SDKs to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to analyze the performance and behaviour of software systems. OpenTelemetry thus standardizes observability but hardly aims to overcome the columnar separation into metrics, tracing, and logging.
In past and current industrial action research [4,6,8–14], I came across various cloud-native applications and corresponding engineering methodologies like the 12-factor app (see Sec. 4.1) and learned that the discussion around observability is increasingly moving beyond these three stovepipes and taking a more nuanced and integrated view. There is a growing awareness of integrating and unifying these three pillars, and more emphasis is being placed on analytics.

The research question arises whether these three historically emerged observability stovepipes of logs, metrics and distributed traces could be handled in a more integrated manner and with a more straightforward instrumentation approach. The results of this action research study show that this unification potential could be surprisingly easy to realize. This paper presents the methodology in Sec. 3 and its results in Sec. 4 (including a logging prototype in Sec. 4.4 and its evaluation results in Sec. 4.5 as the main contribution of this paper to the field). The results are discussed in Sec. 5. Furthermore, the study presents related work in Sec. 6 and concludes its findings as well as promising future research directions in Sec. 7.
3. Methodology

This study followed the action research methodology as a proven and well-established research methodology model for industry-academia collaboration in the software engineering context to analyze the research question mentioned above. Following the recommendations of Petersen et al. [15], a research design was defined that applied iterative action research cycles (see Fig. 3):

1. Diagnosis (Diagnosing according to [15])
2. Prototyping (Action planning, design and taking according to [15])
3. Evaluation, including a possibly required redesign (Evaluation according to [15])
4. Transfer of learning outcomes to further use cases (Specifying learning according to [15])
Figure 3. Action research methodology of this study
With each of the following use cases, insights were transferred from the previous use case into a structured logging prototype (see Fig. 3). The following use cases have been studied and evaluated.

• Use Case 1: Observation of qualitative events occurring in an existing solution (online code editor; https://codepad.th-luebeck.dev; this use case was inspired by our research [11])
• Use Case 2: Observation of distributed events along distributed services (distributed tracing in an existing solution of an online code editor, see UC1)
• Use Case 3: Observation of quantitative data generated by a technical infrastructure (Kubernetes platform; this use case was inspired by our research [14])
• Use Case 4: Observation of a massive online event stream to gain experience with high-volume event streams (we used Twitter as a data source and tracked worldwide occurrences of stock symbols; this use case was inspired by our research [16,17])
4. Results

The analysis of cloud-native methodologies like the 12-factor app [18] has shown that to build observability, one should take a more nuanced and integrated view that integrates and unifies these three pillars of metrics, traces, and logs to enable more agile and convenient analytics in the feedback information flow of DevOps cycles (see Fig. 1). Two aspects that gained momentum in cloud-native computing are of interest:

• Recommendations on how to handle log forwarding and log consolidation in cloud-native applications
• Recommendations to apply structured logging

Because both aspects deeply guided the implementation of the logging prototype, they will be explained in more detail to provide the reader with the necessary context.
4.1. Twelve-factor apps

The 12-factor app is a method [18] for building software-as-a-service applications that pays special attention to the dynamics of organic growth of an application over time, the dynamics of collaboration between developers working together on a codebase, and avoiding the cost of software erosion. At its core, 12 rules (factors) should be followed to develop well-operable and evolutionarily developable distributed applications. This methodology harmonizes very well with microservice architecture approaches [3] and cloud-native operating environments like Kubernetes [19], which is why the 12-factor methodology is becoming increasingly popular. Incidentally, the 12-factor methodology does not contain any factor explicitly referring to observability, certainly not in the triad of metrics, tracing and logging. However, factor XI recommends how to handle logging:
Logs are the stream of aggregated events sorted by time and summarized from the output streams of all running processes and supporting services. Logs are typically a text format with one event per line.

[...]

A twelve-factor app never cares about routing or storing its output stream. It should not attempt to write to or manage log files. Instead, each running process writes its stream of events to stdout. [...] On staging or production deploys, the streams of all processes are captured by the runtime environment, combined with all other streams of the app, and routed to one or more destinations for viewing or long-term archiving. These archiving destinations are neither visible nor configurable to the app - they are managed entirely from the runtime environment.
4.2. From logging to structured logging

The logging instrumentation is quite simple for developers and works mainly in a programming language-specific way, but basically according to the following principle, illustrated here in Python. A logging library must often be imported, defining so-called log levels such as DEBUG, INFO, WARNING, ERROR, FATAL, and others. While the application is running, a log level is usually set via an environment variable, e.g. INFO. All log calls at or above this level are then written to a log file.
import logging
logging.basicConfig(filename="example.log", level=logging.DEBUG)
logging.debug("Performing user check")
user = "Nane Kratzke"
logging.info(f"User {user} tries to log in.")
logging.warning(f"User {user} not found")
logging.error(f"User {user} has been banned.")
For example, line 5 would create the following entry in a log file:

INFO 2022-01-27 16:17:58 - User Nane Kratzke tries to log in
In a 12-factor app, this logging would be configured so that events are written directly to stdout (the console). The runtime environment (e.g., Kubernetes with a FileBeat service installed) then routes the log data to the appropriate database, taking work away from the developer that they would otherwise have to invest in log processing. This type of logging is well supported across many programming languages and can be consolidated excellently with the ELK stack (or other observability stacks).
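As a minimal illustrative sketch (using only the Python standard library; the handler and message are not prescribed by the 12-factor methodology), such a configuration could look as follows:

import logging
import sys

# Write all log events to stdout instead of a log file. Routing, storage,
# and archiving are then the job of the runtime environment (e.g., a log
# forwarder such as FileBeat), not of the application itself.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

logging.info("User Nane Kratzke tries to log in.")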
Logging (unlike distributed tracing and metrics collection) is often not even perceived as (complex) instrumentation by developers. Often it is done on their own initiative. However, one can systematize this instrumentation somewhat and extend it to so-called "structured logging". Again, the principle is straightforward. One simply does not log lines of text like

INFO 2022-01-27 16:17:58 - User Nane Kratzke tries to log in

but instead logs the same information in a structured form, e.g. using JSON:
1{ " lo g ␣ l e ve l " : " i nf o " , " t im e s ta m p " : " 20 2 2 - 01 - 2 7 ␣ 1 6: 1 7 :5 8 " , " e ve n t " : " Lo g ␣ i n " , 214
" us e r ": " N an e ␣ K ra t zk e " , " re s ul t " : " su c ce s s "} 215
In both cases, the text is written to the console. In the second case, however, a structured text-based data format is used that is easier to evaluate. In the case of a typical logging statement like "User Max Mustermann tries to log in", the text must first be analyzed to determine the user. This text parsing is costly on a large scale and can also be very computationally intensive and complex if there is plenty of log data in a variety of formats (which is the common case in the real world).
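The following minimal sketch (with illustrative message formats, using only the Python standard library) contrasts recovering the user from an unstructured log line via pattern matching with reading the same field from a structured record:

import json
import re

unstructured = "INFO 2022-01-27 16:17:58 - User Nane Kratzke tries to log in"
structured = '{"event": "Log in", "user": "Nane Kratzke", "result": "success"}'

# Unstructured: the user has to be recovered by parsing the message text,
# which breaks silently as soon as the wording of the message changes.
match = re.search(r"User (?P<user>.+) tries to log in", unstructured)
user_from_text = match.group("user") if match else None

# Structured: the same information is a plain field access after JSON parsing.
user_from_json = json.loads(structured)["user"]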
However, in the case of structured logging, this information can be easily extracted from the JSON data field "user". In particular, more complex evaluations become much easier with structured logging as a result. However, the instrumentation does not become significantly more complex, especially since there are logging libraries for structured logging. In the logging prototype log12 of this study, the logging looks like this:
import log12
[...]
log12.error("Log in", user=user, result="Not found", reason="Banned")
The resulting log files are still readable for administrators and developers (even if a bit more unwieldy) but much better processable and analyzable by databases such as ElasticSearch. Quantitative metrics can also be recorded in this way; structured logging can thus be used for the recording of quantitative metrics as well.
import log12
[...]
log12.info("Open requests", requests=len(requests))

{"event": "Open requests", "requests": 42}
What is more, this structured logging approach can also be used to create tracings. In distributed tracing systems, a trace ID is created for each transaction that passes through a distributed system. The individual steps are so-called spans. These are also assigned an ID (span ID). The span ID is then linked to the trace ID, and the runtime is measured and logged. In this way, the time course of distributed transactions can be tracked along the components involved, and, for example, the duration of individual processing steps can be determined.
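Expressed as structured log records, such a trace could look roughly like the following two events; the field names (trace_id, span_id, parent_id, duration_ms) are illustrative and not prescribed by any particular library:

{"event": "Checkout", "trace_id": "7f3a", "span_id": "01", "parent_id": null, "started": "2022-01-27 16:17:58.102", "completed": "2022-01-27 16:17:58.431", "duration_ms": 329}
{"event": "Payment service call", "trace_id": "7f3a", "span_id": "02", "parent_id": "01", "started": "2022-01-27 16:17:58.140", "completed": "2022-01-27 16:17:58.390", "duration_ms": 250}

Both records can be correlated in the analytical database via the shared trace ID, and the runtime of each step follows directly from its timestamps.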
4.3. Resulting and simplified logging architecture

So, the two principles are to print logs simply to stdout and to log in a structured, text-based data format. If both are applied consistently, the resulting observability system complexity reduces from Fig. 2 to Fig. 4, because all system components can emit log, metric, and trace information in the same style, which can be routed seamlessly by a log forwarder provided by the operating platform (already existing technology) to a central analytical database.
Figure 4. An observability system consistently based on structured logging with significantly reduced
complexity.
4.4. Study outcome: Unified instrumentation via a structured logging library (prototype)

This paper briefly explains below the way to capture events, metrics, and traces using the logging prototype that emerged. The prototype library log12 was developed in Python 3 but could be implemented analogously in other programming languages.
log12 automatically creates additional key-value attributes for each event, such as a unique identifier (that is used to relate child events to parent events and even remote events in distributed tracing scenarios) and start and completion timestamps that can be used to measure the runtime of events (known from distributed tracing libraries but not common for logging libraries). It is explained

• how to create a log stream,
• how an event in a log stream is created and logged,
• how a child event can be created and assigned to a parent event (to trace and record runtimes of more complex and dependent chains of events within the same process),
• and how to make use of the distributed tracing features to trace events that pass through a chain of services in a distributed service-of-services system.
The following lines of code create a log stream with the name "logstream" that is logged to stdout.

Listing 1: Creating an event log stream in log12

import log12
log = log12.logging("logstream",
    general="value", tag="foo", service_mark="test"
)
Each event and child events of this stream are assigned a set of key-value pairs:

• general="value"
• tag="foo"
• service_mark="test"

These log-stream-specific key-value pairs can be used to define selection criteria in analytical databases like ElasticSearch to filter events of a specific service only. The following lines of code demonstrate how to create a parent event and child events.
Listing 2: Event logging in log12 using blocks as structure

# Log events using the with clause
with log.event("Test", hello="World") as event:
    event.update(test="something")
    # adds event-specific key-value pairs to the event

    with event.child("Subevent 1 of Test") as ev:
        ev.update(foo="bar")
        ev.error("Catastrophe")
        # Explicit call of log (here on error level)

    with event.child("Subevent 2 of Test") as ev:
        ev.update(bar="foo")
        # Implicit call of ev.info("Success") (at block end)

    with event.child("Subevent 3 of Test") as ev:
        ev.update(bar="foo")
        # Implicit call of ev.info("Success") (at block end)
Furthermore, it is possible to log events in the event stream without the block style. That might be necessary for programming languages that do not support closing resources (here a log stream) at the end of a block. In this case, programmers are responsible for closing events using the .info(), .warn(), .error() log levels.

Listing 3: Event logging in log12 without blocks

# To log events without with-blocks is possible as well.
ev = log.event("Another test", foo="bar")
ev.update(bar="foo")
child = ev.child("Subevent of Another test", foo="bar")
ev.info("Finished")
# <= However, then you are responsible to log events explicitly.
# If parent events are logged, all subsequent child events
# are assumed to have closed successfully as well.
Using this type of logging to forward events along HTTP-based requests is also possible. This usage of HTTP headers is the usual method in distributed tracing. Two main capabilities are required for this [20]. First, it must be possible to extract header information received by an HTTP service process. Secondly, it must be possible to inject the tracing information into follow-up upstream HTTP requests (in particular, the trace ID and span ID of the process initiating the request).

Listing 4 shows how log12 supports this with an extract attribute at event creation and an inject method of the event that extracts relevant key-value pairs from the event so that they can be passed as header information along an HTTP request.
Listing 4: Extraction and injection of tracing headers in log12

import log12
import requests             # To generate HTTP requests
from flask import request   # To demonstrate header extraction

with log.event("Distributed tracing", extract=request.headers) as ev:

    # Here is how to pass tracing information along remote calls
    with ev.child("Task 1") as event:
        response = requests.get(
            "https://qr.mylab.th-luebeck.dev/route?url=https://google.com",
            headers=event.inject()
        )
        event.update(length=len(response.text), status=response.status_code)
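For completeness, a hypothetical sketch of a receiving side is given below: a second, Flask-based service that continues the trace started in Listing 4. The service and endpoint names are invented for illustration; only the extract attribute and the inject() method shown above are assumed from log12.

# Hypothetical receiving service continuing the trace from Listing 4.
from flask import Flask, request
import log12
import requests

app = Flask(__name__)
log = log12.logging("qr-service")    # illustrative service name

@app.route("/route")
def route():
    # extract=... links this event to the trace information found in the
    # incoming HTTP headers (sent by the caller via event.inject()).
    with log.event("Route request", extract=request.headers) as ev:
        url = request.args.get("url", "")
        with ev.child("Fetch target") as fetch:
            response = requests.get(url, headers=fetch.inject())
            fetch.update(status=response.status_code)
        return {"url": url, "status": response.status_code}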
4.5. Evaluation of the logging prototype in the defined use cases

Use Cases 1 and 2: Codepad is an online coding tool to quickly share short code snippets in online and offline teaching scenarios. It was introduced during the Corona pandemic shutdowns to share short code snippets mainly in online educational settings with 1st or 2nd semester computer science students. Meanwhile, the tool is used in presence lectures and labs as well. The reader is welcome to try out the tool at https://codepad.th-luebeck.dev. This study used the Codepad tool in steps 1, 2, 3, and 4 of its action research methodology as an instrumentation use case (see Fig. 3) to evaluate the instrumentation of qualitative system events according to Sec. 4.4. Fig. 5 shows the Web-UI on the left and the resulting dashboard on the right. In a transfer step (steps 12, 13, 14, and 15 of the action research methodology, see Fig. 3), the same product was used to evaluate distributed tracing instrumentation (not covered in detail by this report).

Figure 5. Use Cases 1 and 2: Codepad is an online coding tool to quickly share short code snippets in online and offline teaching scenarios. On the left the Web-UI; on the right the Kibana dashboard used for observability in this study. Codepad was used as an instrumentation object of investigation.
Use Case 3 (steps 5, 6, 7, and 8 of the research methodology; Fig. 3) observed an institute's infrastructure, the so-called myLab infrastructure. myLab (https://mylab.th-luebeck.dev) is a virtual laboratory that can be used by students and faculty staff to develop and host web applications. This use case was chosen to demonstrate that it is possible to collect primarily metrics-based data over a long term using the same approach as in Use Case 1. A pod mainly tracked the resource consumption of the various workloads deployed by more than 70 student web projects of different university courses. To observe this resource consumption, the pod simply ran periodically

• kubectl top nodes
• kubectl top pods --all-namespaces

against the cluster. This observation pod parsed the output of both shell commands and printed the parsed results in the structured logging approach presented in Sec. 4.4 (a minimal sketch of such a parse-and-log loop is given after Fig. 6). Fig. 6 shows the resulting Kibana dashboard for demonstration purposes.
Figure 6. Use Case 3: The dashboard of the Kubernetes infrastructure under observation (myLab)
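The following sketch indicates how such an observation loop could be implemented. The exact code of the observation pod is not part of this paper, so the column handling, field names, unit conversions, and the observation interval are illustrative assumptions.

import subprocess
import time
import log12

log = log12.logging("mylab observer", service_mark="mylab")

while True:
    # "kubectl top pods --all-namespaces" prints one line per pod:
    # NAMESPACE  NAME  CPU(cores)  MEMORY(bytes)
    out = subprocess.run(
        ["kubectl", "top", "pods", "--all-namespaces"],
        capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines()[1:]:            # skip the header line
        namespace, name, cpu, memory = line.split()[:4]
        with log.event("pod resources", namespace=namespace, pod=name) as ev:
            ev.update(cpu_millicores=int(cpu.rstrip("m")),
                      memory_mib=int(memory.rstrip("Mi")))
    time.sleep(60)                                # one observation per minute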
Use Case 4 (steps 9, 10, and 11 of the research methodology; Fig. 3) left our own ecosystem and observed the public Twitter event stream as a representative of a high-volume and long-term observation of an external system, i.e., a system that was intentionally not under the direct administrative control of the study investigators. Use Case 4 was designed as a two-phase study: The first screening phase was designed to gain experience in logging high-volume event streams and to provide the necessary features and performance optimizations for the structured logging library prototype. The screening phase was designed to screen the complete and representative Twitter traffic as a kind of "ground truth". We were interested in the distribution of languages and stock symbols in relation to the general Twitter "background noise". This screening phase lasted from 20/01/2022 to 02/02/2022 and identified the most used stock symbols. A long-term recording was then done as a second evaluation phase and was used to track and record the most frequently used stock symbols identified in the screening phase. This evaluation phase lasted from February 2022 until mid-August 2022. In this evaluation phase, just one infrastructure downtime occurred due to a shutdown of electricity at the author's institute. However, this downtime was not due to or related to the presented unified logging stack (see Fig. 9).
Figure 7. Recorded events per day (screening and evaluation phase of Use Case 4), distinguishing all events from symbol events; the evaluation phase panel marks the LUNA crash and an infrastructure downtime.
The recording was done using the following source code (Listing 5), compiled into a Docker container that was executed on the Kubernetes cluster already logged in Use Cases 1, 2, and 3. FileBeat was used as the log forwarding component to a background ElasticSearch database. The resulting event log has been analyzed and visualized using Kibana. Kibana was also used to collect the data in the form of CSV files for the screening and the evaluation phase. Figs. 7, 8, and 9 have been compiled from that data. This setting followed exactly the unified and simplified logging architecture presented in Fig. 4.
Listing 5: The logging program used to record Twitter stock symbols from the public Twitter Stream API

import log12, tweepy, os

KEY = os.environ.get("CONSUMER_KEY")
SECRET = os.environ.get("CONSUMER_SECRET")
TOKEN = os.environ.get("ACCESS_TOKEN")
TOKEN_SECRET = os.environ.get("ACCESS_TOKEN_SECRET")

LANGUAGES = [l.strip() for l in os.environ.get("LANGUAGES", "").split(",")]
TRACK = [t.strip() for t in os.environ.get("TRACKS").split(",")]

log = log12.logging("twitter stream")

class Twista(tweepy.Stream):

    def on_status(self, status):
        with log.event("tweet", tweet_id=status.id_str,
                       user_id=status.user.id_str, lang=status.lang
        ) as event:
            kind = "status"
            kind = "reply" if status._json['in_reply_to_status_id'] else kind
            kind = "retweet" if 'retweeted_status' in status._json else kind
            kind = "quote" if 'quoted_status' in status._json else kind
            event.update(lang=status.lang, kind=kind, message=status.text)

            with event.child('user') as usr:
                name = status.user.name if status.user.name else "unknown"
                usr.update(lang=status.lang, id=status.user.id_str,
                           name=name,
                           screen_name=f"@{status.user.screen_name}",
                           message=status.text,
                           kind=kind
                )

            for tag in status.entities['hashtags']:
                with event.child('hashtag') as hashtag:
                    hashtag.update(lang=status.lang,
                                   tag=f"#{tag['text'].lower()}",
                                   message=status.text,
                                   kind=kind
                    )

            for sym in status.entities['symbols']:
                with event.child('symbol') as symbol:
                    symbol.update(lang=status.lang,
                                  symbol=f"${sym['text'].upper()}",
                                  message=status.text,
                                  kind=kind
                    )
                    symbol.update(screen_name=f"@{status.user.screen_name}")

            for user_mention in status.entities['user_mentions']:
                with event.child('mention') as mention:
                    mention.update(lang=status.lang,
                                   screen_name=f"@{user_mention['screen_name']}",
                                   message=status.text,
                                   kind=kind
                    )

record = Twista(KEY, SECRET, TOKEN, TOKEN_SECRET)
if LANGUAGES:
    record.filter(track=TRACK, languages=LANGUAGES)
else:
    record.filter(track=TRACK)
According to Fig. 7, just every 100th observed event in the screening phase was a stock symbol. That is simply the "ground truth" on Twitter: if one observes the public Twitter stream without any filter, that is what one gets. So, the second evaluation phase recorded a very specific "filter bubble" of the Twitter stream. The reader should be aware that the data presented in the following is clearly biased and not a representative Twitter event stream; it is a stock-market-focused subset or, to be even more precise, a cryptocurrency-focused subset, because almost all stock symbols on Twitter are related to cryptocurrencies.

It is possible to visualize the resulting effects using the recorded data. Fig. 8 shows the difference in language distributions between the screening phase (unfiltered ground truth) and the evaluation phase (activated symbol filter). While in the screening phase English (en), Spanish (es), Portuguese (pt), and Turkish (tr) are responsible for more than 3/4 of all traffic, in the evaluation phase almost all recorded Tweets are in English. So, on Twitter, the most stock-symbol-related language is clearly English.
Figure 8. Observed languages by ISO code (screening and evaluation phase of Use Case 4).
Although the cryptocurrency logging was used mainly as a use case for technical evaluation purposes of the logging library prototype, some interesting insights could be gained. For example, although Bitcoin (BTC) is likely the most prominent cryptocurrency, it is by far not the most frequently used stock symbol on Twitter. The most prominent stock symbols on Twitter are:

• ETH: Ethereum cryptocurrency
• SOL: Solana cryptocurrency
• BTC: Bitcoin cryptocurrency
• LUNA: Terra Luna cryptocurrency (replaced by a new version after the crash in May 2022)
• BNB: Binance Coin cryptocurrency
What is more, we can see interesting details in the trends (see Fig. 9).

• The ETH usage on Twitter seems to decrease throughout the observed period.
• The SOL usage is, by contrast, increasing, although we observed a sharp decline in July.
• The LUNA usage has a clear peak that correlates with the LUNA cryptocurrency crash in mid-May 2022 (this crash was heavily reflected in the investor media).

The Twitter usage was not correlated with the exchange rates on cryptocurrency markets. However, changes in usage patterns of stock market symbols might be of interest to cryptocurrency investors as indicators to observe. As this study shows, these changes can be easily tracked using structured logging approaches. Of course, this can be transferred to other social media streaming or general event streaming use cases like IoT (Internet of Things) as well.
5. Discussion

This style of unified and structured observability was successfully evaluated on several use cases that made use of a FileBeat/ElasticSearch-based observability stack. However, other observability stacks that can forward and parse structured text in a JSON format will likely show the same results. The evaluation included a long-term test over more than six months for a high-volume evaluation use case.

• On the one hand, it could be proven that such a type of logging can easily be used to perform classic metrics collection. For this purpose, black-box metrics such as CPU, memory, and storage for the infrastructure (nodes) but also the "payload" (pods) were successfully collected and evaluated in several Kubernetes clusters (see Fig. 6).
• Second, a high-volume use case was investigated and analyzed in depth. Here, all English-language tweets on the public Twitter stream were logged. About 1 million events per hour were logged over a week and forwarded to an ElasticSearch database using the log forwarder FileBeat. Most systems will generate far fewer events (see Figure 7).
Figure 9. Recorded symbols per day ($ETH, $SOL, $BTC, $LUNA, $BNB) during the evaluation phase of Use Case 4; the LUNA crash and an infrastructure downtime are marked.
• In addition, the prototype logging library log12 is meanwhile used in several internal systems, including web-based development environments, QR code services, and e-learning systems, to record access frequencies to learning content and to study learning behaviours of students.
5.1. Lessons learned

All use cases have shown that structured logging is easy to instrument and harmonizes well with existing observability stacks (esp. Kubernetes, FileBeat, ElasticSearch, Kibana). However, some aspects should be considered:

1. It is essential to apply structured logging, because this can be used to log events, metrics, and traces in the same style.
2. Very often, only error-prone situations are logged. However, if you want to act in the sense of DevOps-compliant observability, you should also log normal - completely regular - behaviour. DevOps engineers can gain many insights from how normal users use systems in standard situations. So, the log level should be set to INFO, and not WARNING, ERROR, or above.
3. Cloud-native system components should rely on the log forwarding and log aggregation of the runtime environment. Never implement this on your own. You will duplicate logic and end up with complex and possibly incompatible log aggregation systems.
4. To simplify analysis for engineers, one should push key-value pairs of parent events down to child events. This logging approach simplifies analysis in centralized log analysis solutions - it simply reduces the need to derive event contexts that might be difficult to deduce in JSON document stores. However, this comes at the cost of more extensive log storage.
5. Do not collect aggregated metrics data. The aggregation (mean, median, percentiles, standard deviation, sum, count, and more) can be done much more conveniently in the analytical database. The instrumentation should focus on recording metrics data in a point-in-time style (see the sketch below). According to our developer experience, developers are glad to be allowed to log only such simple metrics, especially when there is not much background knowledge in statistics.
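As a minimal sketch of point 5 (the event and field names as well as the sample observations are illustrative), the idea is to emit one structured event per observation and to leave all aggregation to the analytical database:

import log12

log = log12.logging("webshop")

# Illustrative observations; in a real service these would come from the
# request handling code.
observations = [("/checkout", 182, 200), ("/search", 35, 200), ("/checkout", 740, 500)]

# Preferred: one structured event per observation (point-in-time style).
for path, duration_ms, status in observations:
    with log.event("Request handled", path=path) as ev:
        ev.update(duration_ms=duration_ms, status=status)

# Avoid: pre-aggregating in the instrumentation code (e.g., logging only a mean
# duration per minute). Means, percentiles, counts, etc. can be computed far more
# flexibly later in the analytical database.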
5.2. Threats to validity and limitations of the study design

Action research is prone to drawing incorrect or non-generalizable conclusions. Logically, the significance is consistently highest within the considered use cases. In order to draw generalizable conclusions, this study defined its use cases in such a way that intentionally different classes of telemetry data (logs, metrics, traces) were considered. It should be noted that the study design primarily considered logs and metrics but traces only marginally. Traces were not wholly neglected, however, but were analyzed less intensively.

The long-term acquisition was performed with a high-volume use case to cover certain stress test aspects. However, the reader must be aware that the screening phase generated significantly higher data volumes in Use Case 4 than the evaluation phase. Therefore, to use stress test data from this study, one should look at the event volume of the screening phase of Use Case 4. Here, about ten thousand events per minute were logged for more than a week, giving an impression of the performance of the proposed approach. The study data shows that the saturation limit should be far beyond these ten thousand events per minute. However, the study design did not push the system to its event recording saturation limits.

What is more, this study should not be used to derive any cryptocurrency-related conclusions, although some aspects from Use Case 4 could be of interest for generating cryptocurrency trading indicators. No detailed analysis of correlations between stock prices and usage frequencies of stock symbols on Twitter has been done.
6. Related work

There are relatively few studies dealing with observability as a main object of investigation in an academic understanding; the field currently receives rather little academic attention. However, an interesting and recent overview is provided by the survey of Usman et al. [21]. This survey provides a list of microservice-focused managed and unified observability services (Dynatrace, Datadog, New Relic, Sumo Logic, Solar Winds, Honeycomb). The research prototype presented in this study heads in the same direction but tries to pursue the problem primarily on the instrumenting side using a more lightweight and unified approach. Addressing the client side of the problem is obviously harder to exploit economically, which is why the industry might prefer to address the problem on the managed service side.

Of logs, metrics and distributed traces, distributed tracing is still considered in the most detail. In particular, the papers around Dapper [20] should be mentioned here, which had a significant impact on this field. A black-box approach without instrumentation needs for distributed tracing is presented by [22]. This study, however, has seen tracing as only one of three aspects of observability and therefore follows a broader approach. A more recent review of current challenges and approaches of distributed tracing is presented by Bento et al. [23].
6.1. Existing instrumenting libraries and observability solutions

Although the academic coverage of the observability field leaves room for improvement, in practice there is an extensive set of existing solutions, especially for time series analysis and instrumentation. A complete listing is beyond the scope of this paper. However, from the disproportion between the number of academic papers and the number of existing solutions, one quickly recognizes the practical relevance of the topic. Table 1 contains a list of existing database products often used for telemetry data consolidation to give the reader an overview, without claiming completeness. This study used ElasticSearch as an analytical database.
Table 1. Often seen databases for telemetry data consolidation. Products used in this study are
marked bold ⊗. Without claiming completeness.
Product Organization License often seen scope
APM Elastic Apache 2.0 Tracing (add-on to ElasticSearch database)
ElasticSearch ⊗Elastic Apache/Elastic License 2.0 Logs, Tracing, (rarely Metrics)
InfluxDB Influxdata MIT Metrics
Jaeger Linux Foundation Apache 2.0 Tracing
OpenSearch Amazon Web Services Apache 2.0 Logs, Tracing, (rarely Metrics); fork from ElasticSearch
Prometheus Linux Foundation Apache 2.0 Metrics
Zipkin OpenZipkin Apache 2.0 Tracing
Table 2 lists several frequently used forwarding solutions that developers can use to forward data from the point of capture to the databases listed in Table 1. In the context of this study, FileBeat was used as the log forwarding solution. It could be shown that this solution is also capable of forwarding traces and metrics if applied in a structured logging setting.
Table 2. Often seen forwarding solutions for log consolidation. Products used in this study are
marked bold ⊗. Without claiming completeness.
Product Organization License
Fluentd FluentD Project Apache 2.0
Flume Apache Apache 2.0
LogStash Elastic Apache 2.0
FileBeat ⊗Elastic Apache/Elastic License 2.0
Rsyslog Adiscon GPL
syslog-ng One Identity GPL
An undoubtedly incomplete overview of instrumentation libraries for different products and languages is given in Table 3, presumably because each programming language comes with its own form of logging in the shape of specific libraries. Avoiding this language binding is hardly possible in the instrumentation context unless one pursues "esoteric approaches" like [22]. The logging library prototype is strongly influenced by the Python standard logging library but also by structlog for structured logging, without actually using these libraries.
Table 3. Often seen instrumenting libraries. Products that inspired the research prototype are marked
bold ⊗. Without claiming completeness.
Product Use Case Organization License Remark
APM Agents ⊗Tracing Elastic BSD 3
Jaeger Clients Tracing Linux Foundation Apache 2.0
log Logging Go Standard Library BSD 3 Logging for Go
log4j Logging Apache Apache 2.0 Logging for Java
logging ⊗Logging Python Standard Library GPL compatible Logging for Python
Micrometer Metrics Pivotal Apache 2.0
OpenTracing Tracing OpenTracing Apache 2.0
prometheus Metrics Linux Foundation Apache 2.0
Splunk APM Tracing Splunk Apache 2.0
structlog ⊗Logging Hynek Schlawack Apache 2.0, MIT structured logging for Python
winston Logging Charlie Robbins MIT Logging for node.js
6.2. Standards

There are hardly any observability standards. However, a noteworthy standardization approach is the OpenTelemetry Specification [7] of the Cloud Native Computing Foundation [24], which tries to standardize the way of instrumentation. This approach corresponds to the core idea that this study also follows. Nevertheless, the standard is still divided into Logs [25], Metrics [26] and Traces [27], which means that the conceptual triad of observability is not questioned. On the other hand, approaches like the OpenTelemetry Operator [28] for Kubernetes make it possible to inject auto-instrumentation libraries for Java, Node.js and Python into Kubernetes-operated applications, which is a feature that is currently not addressed by the present study. However, so-called service meshes also use auto-instrumentation. A developing standard here is the so-called Service Mesh Interface (SMI) [29].
7. Conclusions and Future Research Directions

Cloud-native software systems often have a much more decentralized structure and many independently deployable and (horizontally) scalable components, making it more complicated to create a shared and consolidated picture of the overall decentralized system state. Today, observability is often understood as a triad of collecting and processing metrics, distributed tracing data, and logging. But why, except for historical reasons?

This study presents a unified logging library for Python [30] and a unified logging architecture (see Fig. 4) that uses a structured logging approach. The evaluation of four use cases shows that several thousand events per minute are easily processable and can be used to handle logs, traces, and metrics in the same way. At the very least, this study was able, with a straightforward approach, to log the worldwide Twitter event stream of stock market symbols over a period of six months without any noteworthy problems. As a side effect, some interesting aspects of how cryptocurrencies are reflected on Twitter could be derived. This might be of minor relevance for this study but shows the overall potential of a unified and structured-logging-based observability approach.

The presented approach relies on an easy-to-use programming language-specific logging library that follows the structured logging approach. The long-term observation results of more than six months indicate that a unification of the current observability
triad of logs, metrics, and traces is possible without the necessity to develop utterly new toolchains. The trick is to

• use structured logging and
• apply log forwarding to a central analytical database
• in a systematic, infrastructure- or platform-provided way.
Further research should therefore concentrate on the instrumenting layer and less on the log forwarding and consolidation layer. If we instrument logs, traces, and metrics in the same style using the same log forwarding, we automatically generate correlatable data in a single data source of truth, and we simplify analysis.

So, the observability road ahead may have several paths. On the one hand, we should standardize logging libraries in a structured style like log12 in this study or the OpenTelemetry project in the "wild". Logging libraries should be implemented comparably in different programming languages and shall generate the same structured logging data. So, we have to standardize the logging SDKs and the data format. Both should be designed to cover logs, metrics, and distributed traces in a structured format. To simplify instrumentation further, we should additionally think about auto-instrumentation approaches, for instance as proposed by the OpenTelemetry Kubernetes Operator [28], several service meshes like Istio [31], and corresponding standards like SMI [29].
Funding: This research received no external funding.

Data Availability Statement: The resulting research prototype of the developed structured logging library log12 can be accessed here [30]. However, the reader should be aware that this is prototype software in progress.

Conflicts of Interest: The author declares no conflict of interest.
References

1. Kalman, R. On the general theory of control systems. IFAC Proceedings Volumes 1960, 1, 491–502. 1st International IFAC Congress on Automatic and Remote Control, Moscow, USSR, 1960. https://doi.org/10.1016/S1474-6670(17)70094-8.
2. Kalman, R.E. Mathematical Description of Linear Dynamical Systems. Journal of the Society for Industrial and Applied Mathematics Series A Control 1963, 1, 152–192. https://doi.org/10.1137/0301010.
3. Newman, S. Building Microservices, 1st ed.; O'Reilly Media, Inc., 2015.
4. Kim, G.; Humble, J.; Debois, P.; Willis, J.; Forsgren, N. The DevOps handbook: How to create world-class agility, reliability, & security in technology organizations; IT Revolution, 2016.
5. Davis, C. Cloud Native Patterns: Designing change-tolerant software; Simon and Schuster, 2019.
6. Kratzke, N. Cloud-native Computing: Software Engineering von Diensten und Applikationen für die Cloud; Carl Hanser Verlag GmbH Co. KG, 2021.
7. The OpenTelemetry Authors. The OpenTelemetry Specification, 2021.
8. Kratzke, N.; Peinl, R. ClouNS - a Cloud-Native Application Reference Model for Enterprise Architects. In Proceedings of the 2016 IEEE 20th International Enterprise Distributed Object Computing Workshop (EDOCW), 2016, pp. 1–10. https://doi.org/10.1109/EDOCW.2016.7584353.
9. Kratzke, N.; Quint, P.C. Understanding Cloud-native Applications after 10 Years of Cloud Computing - A Systematic Mapping Study. Journal of Systems and Software 2017, 126, 1–16. https://doi.org/10.1016/j.jss.2017.01.001.
10. Kratzke, N. A Brief History of Cloud Application Architectures. Applied Sciences 2018, 8. https://doi.org/10.3390/app8081368.
11. Kratzke, N. How programming students trick and what JEdUnit can do against it. In Computer Supported Education; Lane, H.C.; Zvacek, S.; Uhomoibhi, J., Eds.; Springer International Publishing, 2020; pp. 1–25. CSEDU 2019 - Revised Selected Best Papers (CCIS). https://doi.org/10.1007/978-3-030-58459-7_1.
12. Kratzke, N. Einfachere Observability durch strukturiertes Logging. Informatik Aktuell 2022.
13. Kratzke, N.; Siegfried, R. Towards Cloud-native Simulations - Lessons learned from the front-line of cloud computing. Journal of Defense Modeling and Simulation 2020. https://doi.org/10.1177/1548512919895327.
14. Truyen, E.; Kratzke, N.; Van Landuyt, D.; Lagaisse, B.; Joosen, W. Managing Feature Compatibility in Kubernetes: Vendor Comparison and Analysis. IEEE Access 2020, 8, 228420–228439. https://doi.org/10.1109/ACCESS.2020.3045768.
15. Petersen, K.; Gencel, C.; Asghari, N.; Baca, D.; Betz, S. Action Research as a Model for Industry-Academia Collaboration in the Software Engineering Context. In Proceedings of the 2014 International Workshop on Long-Term Industrial Collaboration on Software Engineering; Association for Computing Machinery: New York, NY, USA, 2014; WISE '14, pp. 55–62. https://doi.org/10.1145/2647648.2647656.
16. Kratzke, N. The #BTW17 Twitter Dataset - Recorded Tweets of the Federal Election Campaigns of 2017 for the 19th German Bundestag. Data 2017, 2. https://doi.org/10.3390/data2040034.
17. Kratzke, N. Monthly Samples of German Tweets, 2022. https://doi.org/10.5281/zenodo.2783954.
18. Wiggins, A. The Twelve-Factor App, 2017. https://12factor.net.
19. The Kubernetes Authors. Kubernetes, 2014. https://kubernetes.io.
20. Sigelman, B.H.; Barroso, L.A.; Burrows, M.; Stephenson, P.; Plakal, M.; Beaver, D.; Jaspan, S.; Shanbhag, C. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. Technical report, Google, Inc., 2010.
21. Usman, M.; Ferlin, S.; Brunstrom, A.; Taheri, J. A Survey on Observability of Distributed Edge & Container-based Microservices. IEEE Access 2022, pp. 1–1. https://doi.org/10.1109/ACCESS.2022.3193102.
22. Chow, M.; Meisner, D.; Flinn, J.; Peek, D.; Wenisch, T.F. The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14); USENIX Association: Broomfield, CO, 2014; pp. 217–231.
23. Bento, A.; Correia, J.; Filipe, R.; Araujo, F.; Cardoso, J. Automated Analysis of Distributed Tracing: Challenges and Research Directions. Journal of Grid Computing 2021, 19, 9. https://doi.org/10.1007/s10723-021-09551-5.
24. Linux Foundation. Cloud-native Computing Foundation, 2015. https://cncf.io.
25. The OpenTelemetry Authors. The OpenTelemetry Specification - Logs Data Model, 2021. https://opentelemetry.io/docs/reference/specification/logs/data-model/.
26. The OpenTelemetry Authors. The OpenTelemetry Specification - Metrics SDK, 2021. https://opentelemetry.io/docs/reference/specification/metrics/sdk/.
27. The OpenTelemetry Authors. The OpenTelemetry Specification - Tracing SDK, 2021. https://opentelemetry.io/docs/reference/specification/trace/sdk/.
28. The OpenTelemetry Authors. The OpenTelemetry Operator, 2021. https://github.com/open-telemetry/opentelemetry-operator.
29. Service Mesh Interface Authors. SMI: A standard interface for service meshes on Kubernetes, 2022. https://smi-spec.io.
30. Kratzke, N. log12 - a single and self-contained structured logging library, 2022. https://github.com/nkratzke/log12.
31. Istio Authors. The Istio service mesh, 2017. https://istio.io/.