Using Static Analysis for Knowledge Extraction from Industrial User Interfaces

Bernhard Dorninger, Josef Pichler, and Albin Kern
Software Competence Center Hagenberg, Austria
{firstname.lastname}@scch.at
Engel Austria GmbH
albin.kern@engel.at
Abstract—Graphical User Interfaces (GUI) play an essential
role in operating industrial facilities and machines. Depending on
the range and variability of a manufacturer’s product portfolio
a huge library of GUI software may exist. This poses quite
a challenge when it comes to testing or re-engineering. Static
analysis helps to unveil valuable, inherent knowledge and prepare
it for further analysis and processing. In our case at ENGEL
Austria GmbH, we extract the internal structure of the GUI
screens, their variants and the control system context they are
used in, i.e. which PLC variables they access. In another step, we
analyze the usage pattern of method calls to certain UI widgets.
In this paper we show our approach to gaining this information
based on static analysis of existing GUI source code for injection
molding machines.
I. INTRODUCTION
Most industrial processes demand flexible and reliable user
interfaces, which can be operated in an intuitive and safe way
by operators. While hardware-based human-machine interfaces
with a mixture of knobs, buttons and signal lamps formerly
dominated the landscape, the fast and constant technological
development since the 1980s has seen GUIs taking their
fair share in industrial applications. Nowadays, GUI software
for e.g. machine control and visualization is a constituent
part of complex machines and plant facilities. When offering
a large and complex product range, machine manufacturers
might have a vast library of GUI software programs and
components. Quite frequently, such libraries grow and evolve
over the years, whilst in contrast the (domain) knowledge
embodied by the software becomes less explicit. There are
several reasons for this. A main cause is that, under the
pressure of daily routine, documentation of software is often
neglected and tends to be inaccurate [1]. In addition, key
experts may leave the organization, thus making the software
itself the only reliable source of knowledge. Typically, needs
expressed by the market bring this deficit to mind again. This
is the point where static analysis may be utilized to regain
buried knowledge.
The research reported in this paper has been supported by the Austrian
Ministry for Transport, Innovation and Technology, the Federal Ministry of
Science, Research and Economy, and the Province of Upper Austria in the
frame of the COMET center SCCH.
There have been some efforts regarding the use of static
analysis of GUIs: An approach quite similar to ours is described
by Staiger [2], who analyzes C/C++ source code of
interactive applications. His approach utilizes Static Single
Assignment analysis to detect user interface widget creation
and references.
Silva et al. [3] use static analysis and graph algorithms
in their tool GUISurfer to extract behavioral models out of
interactive Java applications.
Another usage for static analysis of GUI code is the need
to gather information for automated testing. An example for
such an effort is given by Arlt et al. [4], who construct event
flow graphs from static analysis of the byte code (instead of
source code) of an application, which are then used for testing
the GUI.
The remainder of this paper is structured as follows: Section
II briefly describes the motivation and initial situation, as
well as the requirements and goals of our project. Section
III outlines our approach, where we pick selected aspects for
a more detailed view. Section IV deals with some significant
challenges met during the analysis and their handling. In the
subsequent conclusion, we sum up the current state of our
work and the contents of this paper.
II. PROBLEM CONTEXT AND MOTIVATION
ENGEL Austria is a large injection molding machine manufacturer
that offers a broad portfolio of molding machines
tailorable to specific customer needs. For this purpose, each
machine can be equipped and configured with numerous
different options. Since ENGEL not only builds the machines,
but also develops both machine control and GUI software, the
high product variability of course is reflected in that software
as well. The GUI framework on which ENGEL's machine visualization
software is based (a proprietary framework atop
Java AWT) is nearing the end of its lifetime, which has triggered
efforts for designing the next machine generation's GUI
software. However, the existing software contains knowledge
which is vital for re-engineering the GUI framework, justifying
the application of static analysis methods. The most interesting
aspects concern the relationship of the GUI to the underlying
machine control system (PLC). Specifically, it is the PLC
variables being accessed by the UI components, be it for
displaying values or for changing the behavior or appearance
of a GUI component. But there are also other aspects, such
as the internal structure of GUI components and the usage of
attributes, which are of interest. We will go into more detail
in the following sections, starting with a brief look at the most
important characteristics of ENGEL’s GUI components.
A. Initial Situation and GUI Characteristics
A major share of the regarded GUI library consists of highly
modularized and semi-standardized components, which are the
building blocks for GUI applications of the various molding
machines. These applications are usually made up of a number
of screen masks representing machine features or operator
tasks. For instance, the feature Ejector may consist of 1 to n
masks, depending on type, size and customization of the machine.
Nearly all screens are built from GUI composites called
stripes (see Figure 1: stripes are surrounded by the dashed
lines). Stripes may cover different aspects of a machine feature
and are in turn also built from smaller components (“base
widgets”), ranging from simple atomic widgets provided by
the basic GUI framework, e.g. a button, to advanced ones, e.g.
an input panel with a descriptive label, a value field and a unit
label. The ENGEL GUI library contains 2000+ stripes using
50+ base widgets.
Figure 1: Sample of a GUI screen mask
These stripes are the subject of our analysis. They share some
important characteristics:
Homogeneous Language and User Interface Technology: All
analyzed stripes are implemented in the same programming
language (Java). The base framework used is a commercial
third-party product built on top of Java AWT. No source code
is available for this base framework.
Uniform Structure and Initialization: Each stripe is derived
from the same base implementation. Abstract base implementations
are complemented by concrete subclasses. Creation
and initialization of a stripe follow a more or less consistent
pattern.
Variability at runtime: As there are no declarative configuration
possibilities, options and variants within stripes are
represented by conditional statements in the program code.
Almost all of these conditions refer to the presence or the
value of PLC variables.
B. Requirements and Goals
The overall goal is to analyze the stripe code base and
provide extracted knowledge as a base for re-engineering the
GUI library. In particular, the most important requirements are:
Extract GUI containment tree: Extract the content tree of
user interface components as it would be present after the
stripe’s initialization procedure. Each containment has to be
resolved. Base widgets (unparseable widgets, system widgets
or ENGEL widgets from certain packages) need not be
decomposed further.
Preserve attributes set during initialization: Apart from
usual GUI attributes relevant for presentation (e.g. color,
fonts, texts, images) and behavior, references to PLC variables
deserve special attention. Concerning PLC-related attributes,
it is not only the variable names, but also the pattern
of the attribute usage which is of particular interest.
Preserve conditional variants: Most stripes contain hardcoded
variants and options, which are implemented by conditional
statements. These conditions control the construction
of the content tree and thus the presentation of a stripe,
depending on the machine the GUI application is deployed
on. A special form of variants is implemented by loops.
Depending on the value of a PLC variable, the loop adds
a varying number of child components to a stripe.
III. APPROACH
In a preceding manual analysis (“Widget Candidate Analysis”),
we have identified the relevant base widgets used by the
stripes and the relevant methods of these widgets being called
during initialization. In addition, we examined the stripe source
code regarding structural peculiarities which may originate in
e.g. the programmer’s coding habits.
In a first step, a stripe source code file is parsed, which
in our tool is done with the help of the Eclipse JDT. If the
processed stripe’s type is derived from another stripe, the base
types have to be parsed, too.
The ensuing abstract syntax trees (AST) act as input for
the analysis step, which accomplishes the extraction of the
required information to instances of a simple meta model
(henceforth referred to as “uiModel”). The purpose of this
model is to decouple content analysis from post processing
steps. In our case, the only post processing step at the moment
is the discovery of widget usage patterns, which produces an
XML report. However, more processing steps are possible,
ranging from the generation of test code for the existing stripes
to generation of code skeletons as a base for re-engineering.
In the remaining sections of our paper, we will concentrate
on the analysis aspect, which may be roughly
divided into the sub-tasks call graph determination, content
analysis and model assembly. There will also be a short look
at the detection of widget usage patterns.
A. Determining the Call Graph
In the first place, the inheritance tree of the currently
processed stripe is examined and the call graph of the stripe’s
creation is determined. The goals are to index the method
declarations (and their bodies) needed for interpretation, to ensure
that the method declarations are processed in the correct order,
and to decide whether a method declaration encountered during AST
visitation may be ignored altogether. Call graph determination
(CGD) typically involves the whole inheritance tree of a
stripe, which usually features two hierarchy levels (rarely three
or more). The creation path usually starts with the default
constructor of a concrete stripe declaration. In case there is
no default, but another constructor with arguments, the latter
is chosen as starting point. CGD solely focuses on methods
declared in the inheritance tree. Of course, dynamic dispatch
is taken into account as well: For an abstract method along
the call path the most significant concrete overriding method
is picked. Methods not being declared in the scope of classes
in the inheritance tree (e.g. attribute setters) of the examined
UI stripe are ignored at this point.
The result is a data structure containing the sequence of
method invocations mapped to the method declaration nodes
from the ASTs.
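The call graph determination described above can be illustrated with a minimal sketch. It is not the tool's implementation: the class TypeDecl is a hypothetical, heavily simplified stand-in for a parsed type declaration (method bodies are reduced to lists of invoked method names, abstract methods simply have no body entry), and the entry point is passed by name instead of being derived from the constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for a parsed type declaration. */
final class TypeDecl {
    final String name;
    final TypeDecl superType; // null at the root of the inheritance tree
    // method name -> names of the methods invoked in its body (abstract methods have no entry)
    final Map<String, List<String>> bodies = new HashMap<>();

    TypeDecl(String name, TypeDecl superType) {
        this.name = name;
        this.superType = superType;
    }

    TypeDecl method(String methodName, String... invoked) {
        bodies.put(methodName, Arrays.asList(invoked));
        return this;
    }
}

final class CallGraphSketch {

    /** Walks up from the concrete type and returns the first type that declares a body. */
    static TypeDecl resolve(TypeDecl concrete, String methodName) {
        for (TypeDecl t = concrete; t != null; t = t.superType) {
            if (t.bodies.containsKey(methodName)) {
                return t;
            }
        }
        return null; // not declared in the inheritance tree, e.g. an attribute setter
    }

    /** Returns the declarations reached from the entry method, in invocation order. */
    static List<String> determine(TypeDecl concrete, String entry) {
        List<String> order = new ArrayList<>();
        visit(concrete, entry, order, new HashSet<String>());
        return order;
    }

    private static void visit(TypeDecl concrete, String methodName,
                              List<String> order, Set<String> active) {
        TypeDecl owner = resolve(concrete, methodName);
        if (owner == null) {
            return; // ignored at this point
        }
        String key = owner.name + "." + methodName;
        if (!active.add(key)) {
            return; // guard against recursive calls
        }
        order.add(key);
        for (String callee : owner.bodies.get(methodName)) {
            visit(concrete, callee, order, active);
        }
        active.remove(key);
    }

    public static void main(String[] args) {
        TypeDecl base = new TypeDecl("BaseStripe", null)
                .method("init", "createContent", "setLayout");
        TypeDecl ejector = new TypeDecl("EjectorStripe", base)
                .method("createContent", "createSettingsPanel", "setText")
                .method("createSettingsPanel");
        System.out.println(determine(ejector, "init"));
    }
}
```

In this sketch the abstract createContent of the base type is resolved to the override in the concrete stripe, while setLayout and setText, which are not declared in the hierarchy, are skipped.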
B. Content Analysis
In this step, the stripe's content tree is examined with the
help of (partial) interprocedural analysis, where the AST is
traversed with a visitor by considering the order of invocations
in the generated call graph. The basic and simplified flow for
that task is shown in Algorithm 1. Method declaration nodes
corresponding to method invocations along the call path will
be visited and analyzed further. If the declaration node for
an invocation is unavailable (e.g. in case of binary libraries),
the invocation has to be handled by a suitable method handler,
which needs to be implemented with the effects of the handled
method in mind. For instance, a handler for the setText
method of a GUI component has to add an attribute text to
that component’s uiModel element. Which methods actually
need such handlers has been identified in the Widget Candidate
Analysis mentioned at the beginning of Section III.
The selection of a method handler works via matching of a
method invocation’s signature and caller type. The matching
of the method name need not be exact: a method handler
implementation may provide a regular expression for matching
an actual invocation. We also allow different parameter numbers
or types to be specified to support overloaded methods.
The caller’s type may be restricted to an exact match or the
match may be allowed for base classes and interfaces as well.
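A minimal sketch of such a handler lookup is given below. It only illustrates the matching rules named in the text (regular expression on the method name, several accepted parameter counts, exact or subtype match on the caller type). The classes Handler, Registry, Widget and InputPanel are hypothetical stand-ins, and matching on parameter types is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class MethodHandlerSketch {

    // Hypothetical stand-ins for widget types of the GUI library.
    static class Widget {}
    static class InputPanel extends Widget {}

    /** Describes which invocations a handler is responsible for. */
    static final class Handler {
        final String id;
        final Pattern namePattern;       // regular expression for the method name
        final List<Integer> paramCounts; // accepted numbers of arguments (overloads)
        final Class<?> callerType;       // type the method is invoked on
        final boolean allowSubtypes;     // false: exact match of the caller type only

        Handler(String id, String nameRegex, List<Integer> paramCounts,
                Class<?> callerType, boolean allowSubtypes) {
            this.id = id;
            this.namePattern = Pattern.compile(nameRegex);
            this.paramCounts = paramCounts;
            this.callerType = callerType;
            this.allowSubtypes = allowSubtypes;
        }

        boolean matches(String methodName, int argCount, Class<?> actualCaller) {
            if (!namePattern.matcher(methodName).matches()) {
                return false;
            }
            if (!paramCounts.contains(argCount)) {
                return false;
            }
            return allowSubtypes
                    ? callerType.isAssignableFrom(actualCaller)
                    : callerType.equals(actualCaller);
        }
    }

    /** Returns the first registered handler that matches, or null if there is none. */
    static final class Registry {
        private final List<Handler> handlers = new ArrayList<>();

        void register(Handler handler) {
            handlers.add(handler);
        }

        Handler find(String methodName, int argCount, Class<?> caller) {
            for (Handler h : handlers) {
                if (h.matches(methodName, argCount, caller)) {
                    return h;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new Handler("text", "set(Label)?Text",
                Arrays.asList(1), Widget.class, true));
        Handler h = registry.find("setLabelText", 1, InputPanel.class);
        System.out.println(h == null ? "no handler" : h.id);
    }
}
```

Returning the first match keeps the sketch short; a registry with overlapping patterns would need an explicit precedence rule.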
Algorithm 1. Processing the call graph
Data: the call graph cGraph referencing the declarations decl of the contained methods called
Data: the method handler registry mHRegistry
Data: the analysis context scope
Input: the AST node nd to analyze
switch type of nd do
    case "Method Declaration"
        if nd ∈ cGraph then
            if nd is root of cGraph or nd is marked for analysis then
                visit body of nd;
                push result to scope;
            end
        end
    case "Method Invocation"
        if declaration decl of nd ∈ cGraph then
            mark decl for analysis;
            visit decl;
            unmark decl;
        else
            mhdl ← find method handler in mHRegistry for nd;
            if mhdl is found then
                invoke mhdl on nd;
                push result to scope;
            end
        end
    ...
endsw
When analyzing method declarations on the initialization
call path, every addition of GUI components to the currently
analyzed stripe has to be recorded. Thus, our analysis
needs to be flow-sensitive and has to memorize modifications
to variables referring to GUI components. Furthermore, other
variables have to be stored and kept up to date: Numbers may
be used for adding components in a loop, Strings may be used
for dynamic construction of PLC variable names. Keeping
track of the data flow involves two facets:
Gathering variable declarations: Variables used for building
and initializing the stripe may be declared on type level
(member vars) and on block level (local vars). Thus, for each
type in the hierarchy and for each block statement a variable
scope is created, which acts as context for the call analysis.
Scopes are organized in a stack, with each scope entry being
linked to its parent scope. If during AST traversal a variable
declaration is encountered, an empty variable container is
created in the currently open scope. When analysis leaves a
block statement, the corresponding scope is discarded. This
keeps our analysis context-sensitive, since a method
may be called more than once.
Processing assignments: When declared variables are assigned
a value, the correct variable container in the scope has
to be updated appropriately. In case of GUI components, the
expression of an assignment is usually an instance creation
statement or an invocation of a factory method. Handling
of these statements will result in creation of the respective
uiModel element. Subsequently, the corresponding variable
container in the currently open scope or, if appropriate, in
one of its ancestor scopes is updated, depending on the type
of access (unqualified vs. this or super) and variable visibility.
Treatment of other variable types (Numbers, Strings, ...) is
implemented similarly, but may involve expression resolving.
For instance, hardcoded concatenations of string constants for
building PLC variable names will be resolved.
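The two facets can be illustrated with a minimal sketch of a scope stack with parent links, plus a simple resolver for string concatenations. All names are hypothetical, and the sketch deliberately ignores qualified access (this, super) and visibility. Operands that cannot be resolved statically are kept as symbolic placeholders of the form ${name}, which is an assumption of this sketch and not the notation used by the tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class ScopeSketch {

    /** One variable scope (type level or block level), linked to its parent. */
    static final class Scope {
        final Scope parent;
        // variable name -> current value; a null value means "declared, not yet assigned"
        final Map<String, Object> vars = new HashMap<>();

        Scope(Scope parent) {
            this.parent = parent;
        }
    }

    private final Deque<Scope> stack = new ArrayDeque<>();

    /** Opens a new scope whose parent is the currently open scope (if any). */
    void enter() {
        stack.push(new Scope(stack.peek()));
    }

    /** Discards the currently open scope, e.g. when leaving a block statement. */
    void leave() {
        stack.pop();
    }

    /** Creates an empty variable container in the currently open scope. */
    void declare(String name) {
        stack.peek().vars.put(name, null);
    }

    /** Updates the nearest enclosing declaration; returns false if there is none. */
    boolean assign(String name, Object value) {
        for (Scope s = stack.peek(); s != null; s = s.parent) {
            if (s.vars.containsKey(name)) {
                s.vars.put(name, value);
                return true;
            }
        }
        return false;
    }

    Object lookup(String name) {
        for (Scope s = stack.peek(); s != null; s = s.parent) {
            if (s.vars.containsKey(name)) {
                return s.vars.get(name);
            }
        }
        return null;
    }

    /** Resolves a concatenation of quoted literals and variable names. */
    String resolveConcat(String... operands) {
        StringBuilder sb = new StringBuilder();
        for (String op : operands) {
            if (op.length() >= 2 && op.startsWith("\"") && op.endsWith("\"")) {
                sb.append(op, 1, op.length() - 1); // string literal without its quotes
            } else {
                Object value = lookup(op);
                sb.append(value != null ? value : "${" + op + "}"); // unresolved stays symbolic
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ScopeSketch scopes = new ScopeSketch();
        scopes.enter();                     // type-level scope
        scopes.declare("prefix");
        scopes.assign("prefix", "Ejector1");
        scopes.enter();                     // block-level scope
        System.out.println(scopes.resolveConcat("prefix", "\".vNumEjectors\""));
        scopes.leave();
    }
}
```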
C. Model Assembly
Model assembly deals with the addition of scoped UI
components and their attributes to the uiModel. Assembly is
more or less intertwined with analysis, as it is also done during
traversal of the AST.
protected void initStripe(String machineUnitID) {
    EPanel partRemovalPanel = new EPanel();
    // input panel var
    InputPanel ip = new InputPanel();
    partRemovalPanel.add(ip);
    ip.setBackgroundColor(CTouchConstants.BGCOLOR);
    ....
    // conditional input panel var via factory method in a loop
    if (HMIVarService.checkVariable(machineUnitID + ".vEjectNumSet")) {
        for (int i = 0; i < HMIVarService.getVariable(machineUnitID + ".vNumEjectors"); i++) {
            ip = createSettingsPanel();
            ip.setLabelText(HMITextService.getText(......));
            partRemovalPanel.add(ip);
        }
    }
    ....
    // factory method as argument expression
    partRemovalPanel.add(createPosComboPanel());
    ....
    this.add(partRemovalPanel);
}
Listing 1: component init sample snippet
Component Additions: A GUI component is appended to
the uiModel whenever the respective variable is referred to
in a call to one of another container component’s add(...)
methods. This may happen any time during the initialization
call path. Components may also be added anonymously
without any variable use. Listing 1 shows these addition
variants.
Attribute Additions: Attribute setter invocations are not
subjected to intraprocedural analysis but are rather treated
as terminal and processed by method handlers. Processing
attribute additions may also require evaluation of complex
method arguments. Arguments may refer not only to literals
and local constants, but also to scoped variables, or can be
follow-up method invocations (both internal and external).
In addition, some attributes require recovery of their “inner”
value. A method handler for color settings, for instance, must
extract the RGB value, since a color’s name alone is not sufficient.
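The effect of component and attribute additions on the model can be pictured with a minimal sketch of a containment tree. The paper does not specify the structure of the uiModel, so the class Element, its fields and the attribute value used here are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UiModelSketch {

    /** Hypothetical model element: a widget type, its attributes and its children. */
    static final class Element {
        final String type;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Element> children = new ArrayList<>();

        Element(String type) {
            this.type = type;
        }

        /** Models the effect of container.add(child) found on the call path. */
        Element add(Element child) {
            children.add(child);
            return this;
        }

        /** Models the effect of an attribute setter processed by a method handler. */
        Element set(String attribute, String value) {
            attributes.put(attribute, value);
            return this;
        }

        /** Renders the containment tree, one element per line, indented by depth. */
        String dump() {
            StringBuilder sb = new StringBuilder();
            dump(sb, 0);
            return sb.toString();
        }

        private void dump(StringBuilder sb, int depth) {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(type).append(attributes).append('\n');
            for (Element child : children) {
                child.dump(sb, depth + 1);
            }
        }
    }

    public static void main(String[] args) {
        Element panel = new Element("EPanel");
        // addition via a variable, with an attribute whose "inner" value was recovered
        Element ip = new Element("InputPanel").set("backgroundColor", "#C0C0C0");
        panel.add(ip);
        // anonymous addition, e.g. the result of a factory method used as argument
        panel.add(new Element("PosComboPanel"));
        System.out.print(panel.dump());
    }
}
```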
D. Widget Usage Pattern Detection
This is currently the only post processing task. The focus
lies on attributes referring to PLC variables being set on
base widgets. Some of these widgets have been designed for
universal use and feature up to 8 different PLC variable setters,
each serving a different purpose. Depending on which of
these are used, the widget may change its behavior and/or
its appearance. With $n$ being the number of distinct attributes,
there may be up to $2^n - 1$ possible usage patterns. The uiModel
is scanned and the usage of interesting attributes is recorded
per UI object taking the conditions under which the attributes
have been added into account. The results are grouped by UI
component types and then merged to a report, supporting the
goal of finding out which attribute combinations are still in use
and which ones may be omitted from further considerations.
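The grouping step can be sketched as follows. This is a simplified illustration, not the tool's report generator: it counts attribute combinations per widget type and leaves out the conditions under which attributes were added. The attribute names are hypothetical, and the upper bound is computed under the assumption that a usage pattern is a non-empty subset of the n PLC-related attributes, i.e. $2^n - 1$ patterns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class UsagePatternSketch {

    /** One UI object with the PLC-related attributes set on it (names are hypothetical). */
    static final class WidgetUse {
        final String widgetType;
        final Set<String> plcAttributes;

        WidgetUse(String widgetType, String... attributes) {
            this.widgetType = widgetType;
            this.plcAttributes = new TreeSet<>(Arrays.asList(attributes)); // sorted, order-independent
        }
    }

    /** Counts, per widget type, how often each combination of PLC attributes occurs. */
    static Map<String, Map<String, Integer>> report(List<WidgetUse> uses) {
        Map<String, Map<String, Integer>> byType = new TreeMap<>();
        for (WidgetUse use : uses) {
            if (use.plcAttributes.isEmpty()) {
                continue; // no PLC-related attribute, nothing to report
            }
            String pattern = String.join("+", use.plcAttributes);
            Map<String, Integer> counts = byType.get(use.widgetType);
            if (counts == null) {
                counts = new TreeMap<>();
                byType.put(use.widgetType, counts);
            }
            Integer old = counts.get(pattern);
            counts.put(pattern, old == null ? 1 : old + 1);
        }
        return byType;
    }

    /** Number of non-empty subsets of n attributes. */
    static long maxPatterns(int n) {
        return (1L << n) - 1;
    }

    public static void main(String[] args) {
        List<WidgetUse> uses = Arrays.asList(
                new WidgetUse("InputPanel", "valueVar", "visibleVar"),
                new WidgetUse("InputPanel", "visibleVar", "valueVar"),
                new WidgetUse("InputPanel", "valueVar"),
                new WidgetUse("Button"));
        System.out.println(report(uses));
        System.out.println(maxPatterns(8));
    }
}
```

Combinations that never show up in such a report are candidates for being dropped in the re-engineered widgets.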
IV. CHALLENGES
In this section, we will discuss some challenging issues we
came across while implementing the analysis. See Listing 1
for samples.
A. Unresolvable Expressions
Stripe source code contains a lot of expressions, which are
impossible to resolve with static analysis. This is especially
the case for references to the underlying PLC (e.g. querying
PLC variable existence or values), which play a major role in
handling the variability of a stripe or loop statement handling
(see below).
B. Handling of Conditions
Conditions handle the variability within existing stripe code.
In almost all cases they cannot and even should not be (fully)
resolved, since the stripe’s internal variability shall also be
reflected in the uiModel and the subsequently generated target
artifacts. Thus, only minor reduction is done for condition
expressions (e.g. string concatenations). However, the chaining and
nesting of conditions in the stripe code must be reflected
correctly in the analysis result even over method and type
boundaries. This is achieved with the help of a condition stack
allowing the correct assembly of complex conditions.
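A minimal sketch of such a condition stack is shown below. It assumes that condition expressions are kept as unresolved strings and that nested conditions are combined by conjunction. The class name, the output format and the PLC variable names in the example are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class ConditionStackSketch {

    private final Deque<String> conditions = new ArrayDeque<>();
    private final List<String> model = new ArrayList<>();

    /** Called when the traversal enters the then-branch of an if statement. */
    void enterIf(String expression) {
        conditions.push(expression);
    }

    /** Called when the traversal switches to the else-branch of the innermost if. */
    void enterElse() {
        String condition = conditions.pop();
        conditions.push("!(" + condition + ")");
    }

    /** Called when the traversal leaves the innermost if statement. */
    void leave() {
        conditions.pop();
    }

    /** Conjunction of all open conditions, outermost first; "true" if none is open. */
    String current() {
        if (conditions.isEmpty()) {
            return "true";
        }
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = conditions.descendingIterator(); // oldest entry = outermost condition
        while (it.hasNext()) {
            if (sb.length() > 0) {
                sb.append(" && ");
            }
            sb.append('(').append(it.next()).append(')');
        }
        return sb.toString();
    }

    /** Records a component together with the condition under which it is added. */
    void addComponent(String name) {
        model.add(name + " if " + current());
    }

    List<String> model() {
        return model;
    }

    public static void main(String[] args) {
        ConditionStackSketch analysis = new ConditionStackSketch();
        analysis.addComponent("ip");
        analysis.enterIf("exists(unit.vEjectNumSet)");
        analysis.enterIf("unit.vMode == 2");
        analysis.addComponent("settingsPanel");
        analysis.enterElse();
        analysis.addComponent("fallbackPanel");
        analysis.leave();
        analysis.leave();
        for (String entry : analysis.model()) {
            System.out.println(entry);
        }
    }
}
```

Because the stack belongs to the analysis as a whole rather than to a single method, conditions opened in a calling method still apply to components added in a called method.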
C. Handling of Loop Statements
In GUI stripes, components are also added within loops.
FOR loops dominate by far over WHILE (10%) and DO
(only one) and most loop expressions tend to be quite simple.
As with conditions, a lot of loop statements also contain unresolvable
(termination) expressions referencing PLC variables.
In this case, only one element is added to the uiModel and
the loop range is treated as a kind of special condition.
V. CONCLUSION
In this paper, we have provided a brief insight into our
application of static analysis in a large GUI re-engineering
project. We have successfully utilized static analysis methods
to unearth structural and contextual information buried
in existing GUI source code, which is subsequently used
as informational input in re-engineering tasks. The analysis
has been implemented in a functional prototype, which is
capable of analyzing single stripe classes as well as doing
a bulk analysis of packages or projects. The results so far
have been promising as currently approximately 85% of the
existing stripe source files can be analyzed thoroughly. Some
open issues include processing of certain structural patterns,
such as complex loop bodies (e.g. endless loops with break
statements in case of an exception) or the analysis of factory
methods with inner conditions and/or multiple return paths.
As re-engineering progresses, supporting code generation for
the new base technology may also become an option.
REFERENCES
[1] D. L. Parnas, “Software Aging,” in ICSE ’94: Proceedings of the 16th
International Conference on Software Engineering. Los Alamitos, CA,
USA: IEEE Computer Society Press, 1994, pp. 279–287. [Online].
Available: http://dx.doi.org/10.1016/0164-1212(87)90025-2
[2] S. Staiger, “Static Analysis of Programs with Graphical User Interface,” in
11th European Conference on Software Maintenance and Reengineering
(CSMR ’07), March 2007, pp. 252–264.
[3] J. C. Silva, J. C. Campos, and J. Saraiva, “GUI Inspection from Source
Code Analysis,” ECEASST, vol. 33, 2010.
[4] S. Arlt, A. Podelski, C. Bertolini, M. Schaf, I. Banerjee, and A. M.
Memon, “Lightweight Static Analysis for GUI Testing,” in IEEE 23rd
International Symposium on Software Reliability Engineering (ISSRE),
2012. IEEE, 2012, pp. 301–310.