Fast Subtype Checking in the HotSpot JVM
ABSTRACT

We present the fast subtype checking implemented in Sun's HotSpot JVM. Subtype checks occur when a program wishes to know if class S implements class T, where S and T are not both known at compile-time. Large Java programs will make millions or even billions of such checks, hence a fast check is essential. In actual benchmark runs our technique performs complete subtype checks in 3 instructions (only 1 memory reference) essentially all the time. In rare instances it reverts to a slower array scan. Memory usage is moderate (6 words per class) and can be traded off for time. Class loading does not require recomputing any data structures associated with subtype checking.
1 Introduction
Java [5] is a strongly typed language with single
inheritance and multiple subtypes via the
interface mechanism. During the execution of a
normal Java program it is common to query an
object to see if it is a subtype of a particular
supertype. Such checks arise both from user-
written language constructs (instanceof, check-
cast) and automatically during object-array
stores [6] leading to the infamous array-store-
check problem. The most common form of
subtype check can be performed thousands of
times per second in normal programs. Larger,
longer running programs can see millions or
even billions of such checks. Because the
checks are so common it pays to have a very fast
subtype check.
A subtype S can be used in all contexts where its supertype T can be used. Objects of type S can be stored into variables of type T. Fields in T objects also exist in S objects, so references to fields in a T object apply to S objects as well. The standard Java class hierarchy uses the subtype relation in the obvious way. However, arrays also exist in the hierarchy: all arrays are subtypes of Object. Since multidimensional arrays are implemented as arrays-of-arrays, Object[][] is a subtype of Object[]. Because of interfaces, the subtype relation is not a simple tree.
A Java array variable is polymorphic, in that the
variable can point to various kinds of arrays.
This leads to the infamous "store check"
requirement, where every reference stored into
an array must be checked against the element
type of the array. This is inconvenient, because
unlike instanceof and checkcast, the supertype
used in aastores may vary at runtime.
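The store check can be seen from plain Java, where an Object[] variable may legally refer to an Integer[]. The sketch below is ours (class and method names are illustrative, not HotSpot's):

```java
// Demonstrates why aastore needs a runtime subtype check: array types are
// covariant, so a store that typechecks statically can still fail at runtime.
public class StoreCheckDemo {
    // Returns true iff storing value into array[0] fails the runtime
    // store check (i.e. throws ArrayStoreException).
    public static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;       // statically legal: array is an Object[]
            return false;
        } catch (ArrayStoreException e) {
            return true;            // value is not a subtype of the element type
        }
    }

    public static void main(String[] args) {
        Object[] a = new Integer[1];                 // legal: Integer[] <: Object[]
        System.out.println(storeFails(a, 42));       // false: Integer fits
        System.out.println(storeFails(a, "string")); // true: String fails the check
    }
}
```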
HotSpot's subtype check uses a variant of
Cohen's display [3], which is similar to the
Pascal display but with types instead of
execution frames. Our display is fixed size, so
we can skip the range check. We include array
types in the display. We also use a 1-element
cache for interfaces (and arrays of interfaces).
Finally, we organize these structures so a simple
load, compare & branch sequence can report
success without needing to distinguish between
interface supertypes, array supertypes or other
supertypes. For regular classes, failures are
reported without more work. For interfaces, if
the first test fails a small amount of extra work
is required. Empirically this failure is extremely
rare.
This paper presents HotSpot's approach to
answering subtype checks quickly. Section 2
covers some background on HotSpot. Section 3
defines some terms and introduces the data
structures used to solve the problem. Section 4
describes our solution in more detail. Section 5
covers related work. Section 6 demonstrates its
performance. Section 7 has our conclusions.
Cliff Click and John Rose
Cliff.Click@Sun.com, John.Rose@Sun.com
Sun Microsystems
4140 Network Circle
Santa Clara, CA 95054
2 HotSpot Background
HotSpot is Sun's Java Virtual Machine
implementation. It is a mixed mode system,
interpreting Java bytecodes until “hotspots” are
recognized. It then invokes an optimizing
compiler to generate native code to implement
the bytecode semantics. This optimizer is fairly
aggressive, generating high quality code on three
different CPU platforms. The optimizer is
discussed more fully elsewhere [7].
HotSpot's previous subtype checking mechanism
was a 2-element positive software cache. When
asking if S was a subtype of T, we would check
to see if T was in S's cache. If not, we would
upcall into the VM, determine the answer and
update the cache. Failed tests always required a
VM upcall. The SpecJVM98 benchmark
required about 4 million upcalls to handle the
negative tests, painful but still acceptable.
However, the old subtype check performed very
badly on SpecJBB. SpecJBB rotates some
subtypes through 3 positive tests. Since the
cache held only 2 elements, we had frequent
cache misses, VM upcalls and software cache
updates. On large (64-way) servers, the
software cache updates required hardware cache
traffic between CPUs, leading to cache ping-
ponging. This turned out to be a very large
limit to scalability. When we replaced the
subtype check we saw more than a factor of 2
increase in our high end SpecJBB scores.
HotSpot's optimizer has a strongly typed
intermediate representation: best known object
types are propagated throughout the optimizer.
These types are available to optimize calls (e.g.,
converting virtual dispatch into static dispatch or
allowing inlining) and to optimize subtype
checks [2][11].
In our system, subtype checks are inserted into
the compiler's IR in their most general form.
The optimizer is responsible for special-casing
the check when more specific information is
available. Only the special case of a known
supertype is inserted into the IR already partially
optimized to help lower compile times;
removing this special case will not change the
generated code.
HotSpot performs the common form of subtype
checks, comparing an unknown subtype against
a known supertype, with a short load, compare
& branch idiom. This idiom covers nearly all of
the checks in a run of SpecJVM98 and
SpecJBB2000. A check against an unknown
supertype requires a 2nd load. In a very few
cases a linear array scan is required. Even this
scan is fairly fast, taking about 1-2 clocks per
array member (the array generally contains only
a list of implemented interfaces so is usually
quite short).
The algorithm is simple and easy to implement.
The code is entirely inline with no upcalls into the VM.1 Class loading does not require recomputing any complex data structures.
This implementation bests prior efforts in a
number of ways. Dynamically, essentially all
checks require a single load, compare & branch.
Checks for array supertypes and array stores are
handled with the same general mechanism.
Class loading and unloading does not require
modifying any data structures (other than the
loaded class). Interface loading does require
checking existing classes, and some small work
in conforming classes. In all cases the checking
code is small enough to be completely inlined in
the compiled code; we never make a call into
native VM code.
3 Definitions
A klass is the concrete HotSpot VM
representation of any non-null reference type
(Java class, interface, or array type).
We will use klass T as the canonical supertype
and klass S as the canonical subtype. The goal is
to determine whether S is a subtype of T or not.
1 A klass T is a primary type iff T is a proper
class2, or an array of a primary type, or an
array of primitive values. Interfaces and
arrays of interfaces are excluded.
1 The creation of VM calls is a significant compile-time
cost in HotSpot. We want to avoid compiling the VM
call even if the call is dynamically very rare.
2 The class must also be loaded. Unloaded classes exist
in the system as placeholders for a variety of reasons
(e.g. exception handlers). They must be loaded before
doing a type check against them.
Note that T is a primary type independently of
any subtypes it may have. Under this definition,
even a final class can be called a primary type.
The definition is recursive such that deeply
nested arrays whose inner element type is a
primary type are also primary types.
Example: Object, Object[], Object[][],
int[], int[][], String, and String[] are
primary types.
We say "T is a primary supertype of S" iff T is a
primary type and T is a supertype of S.
2 The direct primary supertype T of klass S is
the minimal primary supertype of S. By
minimal we mean that any other primary
supertype of S is also a supertype of T.
Except for Object, the direct primary supertype
T exists and is unique. The set of all primary
types forms a tree under the subtype relation.
For standard classes this definition mirrors the Java language class hierarchy. For an array of primitive values or an array of Object the direct primary supertype is Object. For any other array type the direct primary supertype is an array of the element type's direct primary supertype. For example, the element type of Integer[][] is Integer[], whose direct primary supertype is Number[], so the direct primary supertype of Integer[][] is Number[][].
Example: The direct primary supertype of Integer is Number. The tree for Integer[][] is:
Object
  Object[]
    Object[][]
      Number[][]
        Integer[][]
Note that Integer[] is not a supertype of Integer[][]: an Integer[][] holds Integer[] elements, not Integer elements, so the chain climbs by element supertype (through Number[][]) rather than by dropping a dimension.
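These primary supertype relationships can be cross-checked against the JDK's own reflective subtype test, Class.isAssignableFrom, which asks exactly "is the argument a subtype of this class?":

```java
// Cross-checks the primary supertype chain of Integer[][] using reflection.
public class PrimaryChainDemo {
    public static void main(String[] args) {
        Class<?> s = Integer[][].class;
        // Every klass on the chain is a supertype of Integer[][]:
        System.out.println(Object.class.isAssignableFrom(s));     // true
        System.out.println(Object[].class.isAssignableFrom(s));   // true
        System.out.println(Object[][].class.isAssignableFrom(s)); // true
        System.out.println(Number[][].class.isAssignableFrom(s)); // true
        // Integer[] is not a supertype: its elements are Integer, not Integer[].
        System.out.println(Integer[].class.isAssignableFrom(s));  // false
    }
}
```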
3 A display is an array holding S's primary supertypes ordered by the tree relation [3][4]. Object is in array element 0, S's direct primary supertype is the second-to-last element of the array, and S is the last element.
The code S.display[D] refers to the D'th ele-
ment of S's display. If the display is stored di-
rectly in klass S, then this can be implemented as
a single array load off of S.
4 The depth of any klass T is zero for Object
and otherwise is one plus the depth of T's
direct primary supertype.
Depth is tree depth within the primary type
relation. At the time a class is loaded it's direct
primary type is known. From this it's depth is
also known. In our implementation the depth
cached in a klass field. Thus S.depth report's
S's depth in one memory reference.
The depth of Object is 0, the depth of String
and Number is 1, and the depth of Integer is
2. The depth of Integer[][] is 4. Every
interface inherits directly from Object, so the
depth of Comparable is 1. A klass S's display
is of length depth+1.
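A toy model of these two fields, following the definitions above (the Klass type and its fields here are illustrative Java, not HotSpot's C++ code):

```java
// Models the depth field and the display: the display is the primary
// supertype chain with Object at index 0 and the klass itself at index depth.
public class DisplayModel {
    public static final class Klass {
        public final Klass[] display;
        public final int depth;

        public Klass(Klass directPrimarySuper) {
            this.depth = (directPrimarySuper == null) ? 0 : directPrimarySuper.depth + 1;
            this.display = new Klass[depth + 1];   // display length is depth+1
            if (directPrimarySuper != null) {
                // Inherit the parent's chain, then append this klass.
                System.arraycopy(directPrimarySuper.display, 0, display, 0, depth);
            }
            display[depth] = this;
        }
    }

    public static void main(String[] args) {
        Klass object  = new Klass(null);    // depth 0, like Object
        Klass number  = new Klass(object);  // depth 1, like Number
        Klass integer = new Klass(number);  // depth 2, like Integer
        System.out.println(integer.depth);                 // 2
        System.out.println(integer.display[1] == number);  // true
    }
}
```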
5 A klass T is a secondary type iff T is an interface or an array of a secondary type. Every type is either a primary type or a secondary type but not both.
As before, T is a secondary type without
reference to any subtypes it may have. We say
"T is a secondary supertype of S" iff T is a
secondary type and T is a supertype of S.
Example: Cloneable, Cloneable[], and
Cloneable[][] are secondary types.
The secondary supertypes of Integer[][] are (in no particular order):
Cloneable        Serializable
Cloneable[]      Serializable[]
Comparable[][]   Serializable[][]
We define for every klass S a secondary
supertype list which is an unordered list of S's
secondary supertypes3. In order to encourage
sharing of these lists, we exclude S from S's own
list. In code we shorten the name to
S.s_s_array. We will use the display and
the secondary supertype list to build a complete
subtype check algorithm.
3 HotSpot implements it as a standard Java Object array
and keeps it in the heap. When the last klass that
refers to the array is unloaded, the array will be GC'd
as a normal object.
4 The Subtype Check
Older implementations of HotSpot4 (and several
other JVMs) required an upcall into the VM to
handle some subtype checking cases. We
decided we wanted all tests to be handled inline,
including negative results. We decided not to
implement one of the algorithms that always
reports an answer in constant time, because
these algorithms are complex, require extensive
(and expensive) recomputation when classes are
loaded, and because their constant factors aren't
always the best [12].
Conceptually, we handle the cases of a primary
supertype and a secondary supertype separately.
Our algorithm starts out with the same code for
both. We'll discuss the primary supertype check
first, then the secondary check, then we'll show
how to combine them.
The Primary Supertype Check
If T is a primary supertype of S, its position on
S's display is determined by T's depth, without
reference to S. For example, if T's depth is 0
(i.e., T is Object), then T will be the first
element in S's display. Or, if T's depth is 1 (e.g.,
T is java.lang.Throwable), T will be the second
in S's display. If T is not a supertype of S, then
T's depth might be greater than the size of S's
display. We can make a subtype check by
range-checking T's depth, then seeing if T is in
S's display:
S.is_subtype_of(T) :=
  return (T.depth <= S.depth)
    ? (T == S.display[T.depth])
    : false;
This technique does not map perfectly to a few
machine instructions. It includes an undesirable
range check. The layout of HotSpot klasses does
not allow us to place the variable-length display
at a fixed offset in the klass layout (we could
place the display elsewhere, but this will require
another layer of indirection and cost another
dependent memory operation).
4 The HotSpot JVM shipping in JDK 1.3.1 or earlier used a 2-element positive cache. If the cache failed, the VM was called to find the correct answer and update the cache.

We can adjust the above algorithm to ensure that checking an unknown type S against a known primary type T can be done with one memory access, by making the display an embedded array within the klass layout and giving it a fixed size [8]. We refer to the fixed size as display.length. Then we can enjoy a one-load implementation of is_subtype_of:
S.is_subtype_of(T) :=
  return T == S.display[T.depth];
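Padding every display to a fixed length makes the one-load check concrete. A sketch in Java, reusing the toy Klass model from the definitions (the fixed length 8 anticipates the value chosen later in the paper):

```java
// A fixed-size display removes the range check: every klass gets
// DISPLAY_LENGTH slots, and unused (deeper) slots simply stay null.
public class FixedDisplay {
    public static final int DISPLAY_LENGTH = 8;

    public static final class Klass {
        public final Klass[] display = new Klass[DISPLAY_LENGTH];
        public final int depth;

        public Klass(Klass directPrimarySuper) {
            this.depth = (directPrimarySuper == null) ? 0 : directPrimarySuper.depth + 1;
            if (directPrimarySuper != null) {
                System.arraycopy(directPrimarySuper.display, 0, display, 0, depth);
            }
            display[depth] = this;
        }
    }

    // One load, one compare, one branch; no range check needed because
    // every restricted primary type has depth < DISPLAY_LENGTH.
    public static boolean isSubtypeOf(Klass s, Klass t) {
        return s.display[t.depth] == t;
    }

    public static void main(String[] args) {
        Klass object = new Klass(null), number = new Klass(object), integer = new Klass(number);
        System.out.println(isSubtypeOf(integer, number));  // true
        System.out.println(isSubtypeOf(integer, integer)); // true: self is on its display
        System.out.println(isSubtypeOf(number, integer));  // false: slot 2 is null
    }
}
```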
In effect, we remove the need for range checks
and floating arrays by putting a limit on the
depth of all klasses. This is easily done, by
adjusting the definitions of primary and
secondary types:
1 A klass is a restricted primary type iff it is a proper class, or an array of a primary type, or an array of primitive values, and also that klass's depth is less than display.length.
2 A klass is a restricted secondary type iff it is
an interface, or an array of a restricted
secondary type, or any klass of depth >=
display.length.
Again these two definitions are mutually exclusive, and together they cover all klasses.
By using these restricted definitions, we can
constrain the size of the display. The type
checks against restricted primary types will go
faster because of the removal of the indirection
and the range check.
Overflows in the display (because you are too
deep in the type hierarchy) go into the secondary
supertype list. Type checks against very deep
classes will be slower.
Checking secondary types
In order to quickly check secondary types, we
could introduce a global numbering on them [1]
[3][12]. It is easy to envision a type check based
on a global type numbering which indexes a
two-dimensional bit table.
However, this is difficult to do well in the
presence of dynamic loading and unloading of
types. We use a simpler technique which
performs a linear search on the secondary
supertypes list. It is slower, but still fairly fast
because we can do a short linear search inline
without visiting the VM. Also, the typical length of a secondary supertype list is very short, and linear scans are very friendly to modern chip architectures, typically running 1-2 clocks per list element. The scan operation is implemented
with hand-written assembly instructions. Since
it is uncommon, we optimize it for space instead
of speed.
A 1-element cache avoids the need for the linear
search in most cases. A secondary supertype
check, with a one-element supertype cache,
looks like this:
S.is_subtype_of(T) := {
  if (S.cache == T) return true;
  if (S == T) return true;
  if (S.scan_s_s_array(T)) {
    S.cache = T;
    return true;
  }
  return false;
}
Combining the Checks
The last problem to solve is quickly determining
whether a given test klass is a primary or
secondary type. This might be done e.g., by
means of a bit in the klass header. We do it by
introducing a new field check_offset, which
holds the bytewise offset of a klass field relative
to the base of the klass layout. Thus
S[check_offset] refers to some klass stored
in a field in klass S.
For a restricted primary type T, check_offset
is the offset of display[T.depth] and
S[T.check_offset] is the same as
S.display[T.depth].
For a restricted secondary type T, the
check_offset is the offset of cache, and
S[T.check_offset] is the same as S.cache.
The combined (unoptimized) test is:
S.is_subtype_of(T) := {
  int off = T.check_offset;
  if (T == S[off]) return true;
  if (off != &cache) return false;
  if (S == T) return true;
  if (S.scan_s_s_array(T)) {
    S.cache = T;
    return true;
  }
  return false;
}
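The combined test can be modelled in Java by flattening a klass's layout into a single array, so that check_offset becomes an index: slots 0..7 are the display and slot 8 is the cache. This is a sketch under that layout assumption, not HotSpot's actual memory layout:

```java
import java.util.List;

// Models the combined check: T.checkOffset points either into S's display
// (when T is a restricted primary type) or at S's 1-element secondary cache.
public class CombinedCheck {
    public static final int DISPLAY_LENGTH = 8;
    public static final int CACHE_SLOT = DISPLAY_LENGTH;  // "offset" of the cache field

    public static final class Klass {
        public final Klass[] slots = new Klass[DISPLAY_LENGTH + 1]; // display + cache
        public final int depth;
        public final int checkOffset;
        public final List<Klass> ssArray;  // secondary supertypes, this klass excluded

        public Klass(Klass primarySuper, boolean secondary, List<Klass> ssArray) {
            this.depth = (primarySuper == null) ? 0 : primarySuper.depth + 1;
            if (primarySuper != null) {
                System.arraycopy(primarySuper.slots, 0, slots, 0, depth);
            }
            slots[depth] = this;
            this.checkOffset = secondary ? CACHE_SLOT : depth;
            this.ssArray = ssArray;
        }
    }

    public static boolean isSubtypeOf(Klass s, Klass t) {
        int off = t.checkOffset;
        if (s.slots[off] == t) return true;     // display hit or cache hit
        if (off != CACHE_SLOT) return false;    // t is primary: definite miss
        if (s == t) return true;                // self check (s not on its own list)
        if (s.ssArray.contains(t)) {            // linear scan of s_s_array
            s.slots[CACHE_SLOT] = t;            // warm the cache
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Klass object = new Klass(null, false, List.of());
        Klass iface  = new Klass(object, true, List.of());        // an interface
        Klass leaf   = new Klass(object, false, List.of(iface));  // implements iface
        System.out.println(isSubtypeOf(leaf, object));       // true: display hit
        System.out.println(isSubtypeOf(object, leaf));       // false: primary miss
        System.out.println(isSubtypeOf(leaf, iface));        // true: scan, then cached
        System.out.println(leaf.slots[CACHE_SLOT] == iface); // true: cache is warm
    }
}
```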
The first test is either a check of the display (if T
is a restricted primary type) or a check of the
cache (for other T). If it passes, we have a Yes
result and are done. If it fails, we still have to
figure out if T is a restricted primary type or not.
The second test determines what T is. If T is a
restricted primary type, its check_offset is in
the display area, and we have a definite No
answer. Otherwise T is a restricted secondary
type and we missed in the cache. At this point
we have to do a self-check and then a linear scan
of S's secondary supertype list. Empirically this
happens extremely rarely.
The combined check has some obvious
simplifications for the common case.
Optimizing the common case
This last check, in its most general form, is the
code shape used in HotSpot. The code is
exposed to the optimizer and the optimizer will
special-case it via constant folding.
The most common case is where T is a known
type, e.g. instanceof and checkcast bytecodes.
Since T is a constant, T.check_offset and
off are constants and S[off] amounts to
loading a field at a small constant offset from
klass S. Also &cache is always a compile-time
constant. If off is also constant then the
expression (off != &cache) constant folds
away. For restricted primary types the offset is in the display, not the cache, so the test is true and the check always returns false. The optimized
code shape for known restricted primary type T
is:
S.is_subtype_of(T) :=
  return S[#T.check_offset] == #T;
The # denotes expressions which are compile-
time constants. This whole expression compiles
to a simple load, compare & branch. For known
restricted secondary types T, the optimized code
shape is:
S.is_subtype_of(T) := {
  if (S.cache == #T) return true;
  if (S == #T) return true;
  if (S.scan_s_s_array(#T)) {
    S.cache = #T;
    return true;
  }
  return false;
}
If the cache and check_offset fields are
placed near the zero-th element of the display
array then the entire check will only use 1 or 2
memory cache lines.
Array Stores
A Java array variable is polymorphic, in that the
variable can point to various kinds of arrays.
This leads to the infamous "store check"
requirement, where every reference stored into
an array must be checked against the element
type of the array. This is inconvenient, because
the element type may vary at runtime.
The Jalapeño [1] paper notes that nearly all array
variables are monomorphic, in the sense that
they hold (at runtime) references to objects of
exactly the statically determined array type and
not a subtype thereof.
This means that when an element is stored to an
array the array itself may be verified (with a
simple load, compare & branch) to be of exactly
the declared type. Verifying the array happens
only once for all stores into the array. Since the
Java verifier verifies array stores against the
declared type, further checking can be short-circuited.
Thus, most array store checks can optimistically
verify an element type in one memory reference.
In those rare cases where the optimistic
technique fails because an array variable is
polymorphic, HotSpot recompiles the code
without the check. The array element type must
be loaded from the array klass before a standard
subtype check begins.
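The optimistic test can be sketched from plain Java, where getClass() stands in for the single klass load and the static array type is what the JIT bakes into the store site (the method name is ours):

```java
// If the array is exactly its statically declared type, the bytecode
// verifier has already proven the stored value fits the element type,
// so the per-store subtype check can be skipped.
public class StoreShortCircuit {
    public static boolean needsFullStoreCheck(Object[] array, Class<?> staticArrayType) {
        return array.getClass() != staticArrayType;  // one load, one compare
    }

    public static void main(String[] args) {
        String[] mono = new String[1];  // exactly its declared type: skip store checks
        Object[] poly = new Integer[1]; // polymorphic use: full check still needed
        System.out.println(needsFullStoreCheck(mono, String[].class)); // false
        System.out.println(needsFullStoreCheck(poly, Object[].class)); // true
    }
}
```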
Checking Non-Constant Supertypes
Most checks compare a non-constant reference
against a statically determined type, so that the
check_offset value is a compile time
constant. This is always true of checkcast and instanceof operations, for instance.
Like the failed array short-circuit test, the
reflective method Class.isInstance can produce
checks against non-constant types. The
compiler implements it as an intrinsic. Unless
the class operand is constant, the intrinsic code
will have to use the most general form of the
check. This includes a load of the
check_offset field and a 2nd test to see if this
is a primary supertype or secondary supertype.
Dynamically, this 2nd test fails extremely rarely.
When it does the checking code must do a linear
scan of the secondary supertype array.
Minor Optimizations
The most common and most optimized form of
the check involves a load, compare & branch.
The compare is against a compile-time constant,
which is the klass T. Intel IA32 CPUs can do the load and compare vs. a 32-bit constant in one instruction. Sun Sparc chips require a separate
load and are limited to 13-bit immediates. Thus
Sparc chips require an extra 'sethi' instruction
to make the full constant. Typically, there are
plenty of issue slots and registers available to
form the constant, so this isn't often a problem.
However, this small slowdown can be removed
by numbering all classes with a small integer
constant. The small constant is used in lieu of
the klass pointer in the display, the cache,
secondary supertype lists and in the generated
code. As long as there are less than 8K classes
loaded, Sparc will only need 3 instructions.
With this optimization and an 8-element display,
our check requires 8*2 bytes per class, plus the
cache and offset fields, plus a pointer to the
(shared) secondary supertype list, or a little more than 6 words per class.
For the sake of the instanceof operator, we may
desire to speed negative results on secondary
types by including a negative cache in klass S
which contains the last secondary supertype that
failed to be a supertype of S.
Arrays always have the same two-element list of
secondary types. Secondary supertype arrays
(s_s_array) are immutable and hence may be
shared between klasses with the same supertype
lists. We make sharing more likely among
subclasses by refusing to put the klass itself on
its own supertype lists, even though that requires
an extra compare-and-branch in the shape of the
type-checking code, to cover the case of T==S.
5 Related Work
Cohen [3] presents a display-based technique for
constant-time dynamic checking. It has variable-sized displays, requiring a range check, and
doesn't handle Java interfaces. Padding the
arrays to avoid the range check was done by
Pfister et al. [8].
display.length        1           2           3           4
primary       1,464,914   8,839,736   9,668,198  10,016,074
!primary              0     377,148     654,151   1,188,958
cache         8,640,060   1,508,841     828,912     804,298
self            587,122     581,350     577,967     287,054
!cache1         557,221     539,204     451,748     157,502
!cache2         432,263     309,926     194,151       4,907
scan            460,346     222,515      77,365      45,016
!scan           362,626     125,832      52,060         743
Total        12,504,552  12,504,552  12,504,552  12,504,552
Table 1: Interpreted type checks in a short run of SpecJVM98
The work of Vitek, Horspool and Krall [12]
focuses on low memory footprint and a fast
check for all cases, including the fairly rare
interface checks. This comes at the expense of a
complex algorithm used during class loading.
All the equivalents of our display need to be re-
computed whenever a class (or interface) is
loaded or unloaded. They show that
recomputing their data structures takes about 10 msec per class, which for a program that loads 2000 classes comes to about 20 seconds. They
propose several solutions involving lazy
recomputation; most seem to involve an extra
test per check to verify the data is valid (and an
upcall into the VM if it is not).
Alpern, Cocchi and Grove [1] published the
check used in the Jalapeño VM. Common cases
are also fast. However, it is somewhat more
complex than our solution and involves VM
upcalls in some cases. If the array type short-
circuit test fails, they devolve into a series of
tests to break out various cases. We also
implement the short-circuit test, but if it fails our
solution handles array types like other types.
Jalapeño uses a separate mechanism for
interfaces, involving a 3-valued bit array (the
“trit” array). The first time a class is queried
against a particular interface, a VM call is
required and its tri-valued entry moves from
“Maybe” to either “Yes” or “No”. They require
at least 2 tests always (one to tell the “Maybe”
and another to tell “Yes/No”). After our cache
is warmed up, we only require the 1 check to
report a “Yes”. Empirically, “No” answers
against interface checks are exceedingly rare. In
this case HotSpot does require more code than
IBM's solution.
6 Performance
The following numbers were derived by
instrumenting the VM and running in -Xint
mode with all type checks redirected to the
runtime. (I.e., the interpreter's subtype check
logic was disabled.) We tested a short run of
SpecJVM98 [9]. The exact run was:
java -Xint SpecApplication _200_check
_202_jess _213_javac _228_jack
Inside the interpreter we annotated our checking
algorithm to gather statistics. In addition to the
presented algorithm, we also modelled a couple
of negative caches. A negative cache holds the
klass of the last interface query to fail. It allows
us to avoid scanning the secondary supertype
array when we are going to fail anyway.
The negative caches were experimental. These
numbers indicate that they are completely
unused by SpecJVM98. Hypothetical
applications which do if/then/else logic on
interface types might want them.
Table 1 shows the results of this short SpecJVM98 run for different display.length sizes. The 'primary' and '!primary' rows show primary type checks that passed or failed. The 'cache' row shows secondary types that hit in the cache. The 'self' row reports self-type checks that passed. The 'scan' and '!scan' rows report the results of scanning the secondary supertype lists. The '!cache1' and '!cache2' rows report hits in a one- or two-element negative cache.
As the size of the display increases, the number of times we must rely on the cache and scan drops off. The optimal display length for SpecJVM98 is 5. To be robust, we set ours to 8, which is greater than needed for all the classes in rt.jar.
These numbers show that the 1-element cache is
important: It hits on 6.5% of all queries, or
99.9% of all secondary queries. Basically it
makes the cost of searching the secondary array
disappear completely. Given that even in the
best case a secondary array search requires two
loads in addition to the first two loads, this
means that the cache saves at least 7% of the
total cost of type checking. The cache also adds
a margin of robustness to the performance of the
VM, which is important since we do not know
all the applications that will be performance
sensitive.
The 'self' case is not important to performance,
but allows for footprint reduction, by enabling
sharing of more secondary arrays. The idea is
that a class might be its own secondary type, but
it can share the secondary types array with its
own superclass. It might be possible to get rid
of the 'self' case with a small cost to footprint.
This sharing is key for arrays, all of which have
the same Cloneable and Serializable interface
classes.
Note that with display.length 1 the only primary
supertype is Object itself. Therefore, the
11.7% primary hits in this case are exactly all
tests against Object itself. Since the javac
compiler never emits instanceof or checkcast
operations against that type, it follows that
11.7% of all type checking operations, in this
experiment, were due to aastores to Object
arrays.
Compiled Check Performance
We then implemented this new check in our
compiler and tested with it. The compiler is
inherently less deterministic than the interpreter.
Due to vagaries of OS scheduling, methods will
have executed different amounts at the time they
are compiled, leading to different inlining
patterns and different qualities of optimization.
On the whole these effects average out, so we
ran our benchmarks three times and averaged the
results.
The optimizer performs a fairly deep analysis
and uses this information to simplify and
remove type checks. Hence, it's difficult to
determine if a piece of code is part of a type
check or not. The optimizer constant-folds
checks, commons up portions of checks and uses
CHA (amongst other analyses) [2][11] to
determine when checks can be removed.
To better track what happens to the type checks,
we tagged branches in the compiler's IR. When
parsing bytecodes and building the IR, we know
exactly which branches are part of which type
checks. We tag the branch and allow the
optimizer to remove tests as best it can. During
code emission, we emit extra code to increment
counters for each tagged branch.
Unlike the interpreter, the compiler ends up
emitting a variety of code shapes for different
optimized tests. We tagged each code shape
differently and report them on separate lines.
We tested a full run of SpecJVM98 and a short
4-warehouse run5 of SpecJBB2000 [10]. Higher
warehouse numbers can be approximated by
scaling by the count of warehouses.
The 'array short-circuit' line shows the number
of times we executed the test designed to short-
circuit the need for an array store check. This
test never failed; had it failed we would need
further checks on the stored value.
The 'known primary' lines show the number of
times we tested against a known primary
supertype and whether that test passed or failed.
The 'known secondary' line shows the number of
times we checked the 1-element cache and got a
hit. This cache never failed in compiled code.
Hence the 'self' check and 'scan' code was never
5 We used a short run of SpecJBB for two reasons: a big run requires time on a large, expensive server, and
after compiling some 250 methods, no new code is
compiled. The same old code is executed repeatedly,
increasing the absolute value of the counts but not their
relative values.
executed. The interpreter must have made a first
cache miss and warmed up the cache.
The 'unknown hit' line represents a compiled
general check, similar to what the interpreter
always does. We always passed the test, but we
didn't further check (by generating more code)
whether it was a primary or secondary type that
passed. The 'exception' lines are compiled
forms of exception handler checks, where
frequently thrown exceptions are being
dispatched to their appropriate handler. We
dropped the negative caches.
                        SpecJVM98      SpecJBB
array short-circuit    23,281,902   81,862,300
!array short-circuit            0            0
known primary         401,874,527  460,540,064
known !primary          4,532,923   26,898,749
known secondary         3,509,940            0
known !secondary                0            0
self                            0            0
scan                            0            0
!scan                           0            0
exception                 258,868            0
!exception                271,699            0
unknown hit                 2,922            0
Table 2: Compiled type checks
The numbers in Table 2 above show several
things. First, positive primary supertype checks outweigh all other tests by about 20 to 1. A
full run of SpecJVM98 makes over 400 million
primary supertype checks; a 100-warehouse run
of SpecJBB will make billions of checks.
Hence, it is these tests which need to be most
optimized; tests against secondary types are less
important. Next, the array short-circuit and the
1-element cache never fail in compiled code (the
cache fails in the interpreter and gets warmed up
there). There's a handful of other checks made,
but their frequency is very low.
6 Conclusion
We have described a type-checking mechanism
which usually produces an answer in one
memory reference for instanceof and checkcast,
one or two memory references for array store
checks, and two memory references for
Class.isInstance. For some large benchmarks it
always produces an answer using only a load,
compare & branch.
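The mechanism summarized above can be sketched in plain Java (a simplified model, not HotSpot's internal C++ and assembly; the field names `depth`, `display`, `secondaryCache`, and `secondaries` are illustrative, and interfaces stand in for all secondary supertypes):

```java
// Simplified model of the check: primary supertypes (superclasses) live in a
// Cohen-style display indexed by depth; secondary supertypes (interfaces)
// are checked via a 1-element cache, a self check, then a linear scan.
final class Klass {
    final boolean isInterface;  // interfaces act as "secondary" supertypes here
    final int depth;            // depth along the single-inheritance class chain
    final Klass[] display;      // this class plus all primary supertypes, by depth
    final Klass[] secondaries;  // transitively implemented interfaces
    Klass secondaryCache;       // 1-element cache of the last secondary hit

    Klass(boolean isInterface, Klass superKlass, Klass... secondaries) {
        this.isInterface = isInterface;
        this.depth = (superKlass == null) ? 0 : superKlass.depth + 1;
        this.display = new Klass[depth + 1];
        if (superKlass != null)
            System.arraycopy(superKlass.display, 0, this.display, 0, depth);
        this.display[depth] = this;
        this.secondaries = secondaries;
    }

    // "Is S a subtype of T?"
    static boolean isSubtypeOf(Klass s, Klass t) {
        if (!t.isInterface) {
            // Primary check: when T is known, t.depth is a compile-time
            // constant, so this is a single load, compare & branch.
            return t.depth < s.display.length && s.display[t.depth] == t;
        }
        if (s.secondaryCache == t) return true;  // 1-element cache hit
        if (s == t) return true;                 // self check
        for (Klass k : s.secondaries) {          // slow linear scan
            if (k == t) { s.secondaryCache = t; return true; }
        }
        return false;
    }
}
```

A test against a known superclass touches only `s.display[t.depth]`, which corresponds to the single load, compare & branch reported above; the linear scan is reached only when the cache and self checks miss.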
The memory cost for the display, cache, offset
field and secondary supertype lists is about 24
bytes per klass. The code is short and simple,
easy to compile and optimize. No VM upcalls
are required. No complex data structures need
recomputing when classes are loaded. The
check performs dramatically better than
HotSpot's prior check, eliminating a memory
bottleneck on large SpecJBB runs and increasing
SpecJVM98 scores by 1-2%.
References
[1] B. Alpern, A. Cocchi, and D. Grove.
Dynamic Type Checking in Jalapeño. In
the 2001 USENIX Java™ Virtual Machine
Symposium.
[2] D. F. Bacon and P. F. Sweeney. Fast static
analysis of C++ virtual function calls. In
Conference on Object Oriented
Programming Systems, Languages &
Applications (OOPSLA'96).
[3] N.H. Cohen. Type-extension tests can be
performed in constant time. ACM
Transactions on Programming Languages
and Systems, 13(4):626-629, 1991.
[4] E.W. Dijkstra. Recursive Programming.
Numerische Mathematik, 2:312-318, 1960.
[5] James Gosling, Bill Joy, and Guy Steele.
The Java Language Specification. Addison
Wesley, 1996.
[6] Tim Lindholm and Frank Yellin. The Java
Virtual Machine Specification, Second
Edition. Addison Wesley, 1998.
[7] M. Paleczny, C. Click, C. Vick, The Java
HotSpot Server Compiler. In the 2001
USENIX Java™ Virtual Machine
Symposium.
[8] B.H.C. Pfister and J. Templ. Oberon
technical notes. Research Report 156,
Eidgenössische Technische Hochschule
Zürich, Departement Informatik, March
1991.
[9] The Standard Performance Evaluation
Corporation. SPEC JVM98 Benchmarks.
http://www.spec.org/osg/jvm98, 1998.
[10] The Standard Performance Evaluation
Corporation. SPEC JBB2000 Benchmarks.
http://www.spec.org/osg/jbb2000, 2000.
[11] V. Sundaresan, L. Hendren, C.
Razafimahefa, R. Vallée-Rai, P. Lam,
E. Gagnon, and C. Godin. Practical virtual
method call resolution for Java. In
Conference on Object Oriented
Programming, Systems, Languages &
Applications (OOPSLA'2000).
[12] J. Vitek, A. Krall, and R. N. Horspool.
Efficient Type Inclusion Tests. In
Conference on Object Oriented
Programming Systems, Languages &
Applications (OOPSLA'97).