Applying a Scalable CORBA Event Service to Large-scale Distributed Interactive
Simulations
Carlos O'Ryan, Douglas C. Schmidt, and Mayur Deshpande
{coryan,schmidt,deshpanm}@uci.edu
Department of Electrical and Computer Engineering
University of California, Irvine
Irvine, CA 92697

J. Russell Noseworthy
jrn@objectsciences.com
Object Sciences Corp.
Alexandria, VA 22312
Abstract
Next-generation distributed interactive simulations have strin-
gent quality of service (QoS) requirements for throughput,
latency, and scalability, as well as requirements for a flexi-
ble communication infrastructure to reduce software lifecycle
costs. The CORBA Event Service provides a flexible model
for asynchronous communication among distributed and col-
located objects. However, the standard CORBA Event Service
specification lacks important features and QoS optimizations
required by distributed interactive simulation systems.
This paper makes five contributions to the design, imple-
mentation and performance measurement of distributed inter-
active simulation systems. First, it describes how the CORBA
Event Service can be implemented to support key QoS fea-
tures. Second, it illustrates how to extend the CORBA Event
Service so that it is better suited for distributed interactive sim-
ulations. Third, it describes how to develop efficient event
dispatching and scheduling mechanisms that can sustain high
throughput. Fourth, it describes how to use multicast protocols to reduce network traffic transparently and to improve system scalability. Finally, it illustrates how an Event Service framework can be strategized to support configurations that facilitate high throughput, predictable bounded latency, or some combination of each.
Keywords: Scalable CORBA event systems, object-
oriented communication frameworks.
1 Introduction
Overview of distributed interactive simulations: Interac-
tive simulations are useful tools for training personnel to op-
erate equipment or to experience situations that are too expen-
sive, impractical, or dangerous to execute in the real world.
The advent of high-speed LANs and WANs has enabled the
development of distributed interactive simulations, where par-
ticipants are dispersed geographically. For example, military
Figure 1: Architecture of a Distributed Interactive Simulation
Application
units stationed around the world can participate in joint train-
ing exercises, with human-in-the-loop airplane and tank sim-
ulators. Internet gaming is another form of distributed in-
teractive simulation. In both examples, heterogeneous LAN-
based computer systems can be interconnected by high-speed
WANs, as depicted in Figure 1.
The QoS requirements on the software that supports distributed interactive simulations are quite demanding. They combine aspects of distributed real-time computing with the need for low-latency, high-throughput, multi-sender/multi-receiver communication over a wide range of autonomous and
interconnected networks. Meeting these challenges requires
new software infrastructures, such as those described in this
paper.
Distributed interactive simulation systems, such as DIS [1],
have been based, historically, on Publish/Subscribe pat-
terns [2]. Participants in the simulation declare the data that
they supply and consume. To exchange this data, distributed
interactive simulation systems require an efficient and scalable
communication infrastructure.
Typically, each participant in these event-driven systems consumes and supplies only a subset of the possible events in the system. By nature, however, these systems can vary dynamically, e.g., consumers and suppliers can join and leave at arbitrary times. Likewise, the set of events published or subscribed to can also vary during the lifetime of the simulation.
It is not uncommon for large-scale simulations, such as synthetic theater of war training (STOW) activities, to be composed of hundreds or thousands of suppliers and consumers
that generate enormous quantities of events in real-time. Thus,
simulation communication infrastructures must scale up to
handle large event volumes, while simultaneously conserving
network resources by minimizing the number of duplicated
events sent to separate consumers. In addition, the system must avoid wasteful computation. For instance, it should avoid sending events to consumers that are not interested, and it should quickly reject such events if they are received anyway. Moreover,
communication infrastructures must be flexible to cope with
different simulation styles that require different optimization
points, such as reduced latency, improved throughput, low net-
work utilization, and reliable or best-effort delivery.
Towards a middleware-based solution: Given sufficient time and effort, it is possible to satisfy the specific requirements of distributed interactive simulation applications by developing these systems entirely from scratch. In practice,
however, the environment in which these systems are devel-
oped places increasingly stringent constraints on time and ef-
fort expended on developing software. Moreover, the increasing scarcity of qualified software professionals exacerbates the risk of companies failing to complete mission-critical projects.
Therefore, unless the scope of software development required
for each project can be constrained substantially, it is rarely
realistic to develop complex simulation systems from scratch.
For these reasons, it is necessary that distributed interactive
simulation systems be assembled largely from re-usable mid-
dleware. Middleware is software that resides between appli-
cations and the underlying operating systems, protocol stacks,
and hardware in complex real-time systems to enable or sim-
plify how these components are connected [3]. When mid-
dleware is commonly available for acquisition or purchase, it
becomes commercial-off-the-shelf(COTS).
Employing COTS middleware shields software developers
from low-level, tedious, and error-prone details, such as socket-level programming [4]. Moreover, it provides a consistent
set of higher level abstractions [5, 6] for developing adap-
tive systems. In addition, it amortizes software lifecycle costs
by leveraging previous design and development expertise and
reifying key design patterns [7] into reusable frameworks and
components.
COTS middleware has achieved substantial success in cer-
tain domains, such as avionics mission computing [8] and
business applications. There is a widespread belief in the
distributed interactive simulation community, however, that
the efficiency, scalability, and predictability of COTS middle-
ware, such as CORBA [9], is not suitable for next-generation
large-scale simulation applications. Thus, if it can be demon-
strated that the overhead of COTS middleware implementa-
tions can be removed, the resulting benefits make it a very
compelling choice as the communication infrastructure for
large-scale simulation systems.
Our previous research has examined many dimensions of
high-performance and real-time CORBA ORB endsystem de-
sign, including static [10] and dynamic [5] scheduling, event
processing [8], I/O subsystem [11] and pluggable proto-
col [12] integration, synchronous [13] and asynchronous [14]
ORB Core architectures, systematic benchmarking of multi-
ple ORBs [15], patterns for ORB extensibility [7] and ORB
performance [16]. This paper extends our previous work [8]
on real-time extensions to the CORBA Event Service to show
how this service can support the QoS requirements of large-
scale distributed interactive simulations by using IP multicast
to federate multiple Event Channels and conserve network re-
sources. In addition, we describe the design of a flexible Event
Service framework that allows developers to select implemen-
tation strategies that are most appropriate for their application
domain.
The remainder of this paper is organized as follows: Sec-
tion 2 discusses the optimizations and extensions we added
to the standard CORBA Event Service to support large-scale
distributed interactive simulation applications; Section 3 describes how this new version of the CORBA Event Service has been used in the implementation of the High Level Architecture (HLA) Run-Time Infrastructure (RTI); Section 4 presents the results of several benchmarks performed on our implementation under different workloads and identifies the primary sources of overhead; and Section 5 presents concluding remarks.
Figure 2: OMG Reference Model Architecture (the Object Request Broker mediating among Object Services, Common Facilities, Domain Interfaces, and Application Interfaces)
2 Overview of TAO’s Real-time Event
Service
CORBA is a distributed object computing middleware spec-
ification [17] being standardized by the Object Management
Group (OMG). CORBA is designed to support the develop-
ment of flexible and reusable service components and dis-
tributed applications by (1) separating interfaces from (po-
tentially remote) object implementations and (2) automating
many common network programming tasks, such as object
registration, location, and activation; request demultiplexing;
framing and error-handling; parameter marshaling and demar-
shaling; and operation dispatching. Figure 2 illustrates the pri-
mary components in the OMG Reference Model architecture.
The ACE ORB (TAO) [10] is a freely available, open-source
implementation of CORBA.
Many distributed applications exchange asynchronous re-
quests using event-based execution models [18, 19, 20]. To
support these common use-cases, the OMG defined a CORBA
Event Service component in the CORBA Object Services
(COS) layer, as shown in Figure 2. The COS specifica-
tion [21] presents architectural models and interfaces that fac-
tor out common object services, such as persistence [22], se-
curity [23], transactions [24], fault tolerance [25], and concur-
rency [26].
2.1 Overcoming Limitations with the CORBA
Event Service
The standard COS Event Service Specification lacks sev-
eral important features required by large-scale distributed in-
teractive simulations. Chief among these missing features
are centralized event filtering, efficient and predictable event
dispatching, efficient use of network and computational resources, periodic event processing, and event correlation, i.e., the ability to wait for a conjunction of events before dispatching them to consumer(s).
To resolve these limitations, we have developed a Real-time Event Service (RT Event Service) as part of the TAO
project [10]. TAO’s RT Event Service extends the COS Event
Service specification to satisfy the quality of service (QoS)
needs of real-time applications in many domains, such as
distributed interactive simulations, avionics, telecommunica-
tions, and process control.
The following discussion summarizes the features missing
in the COS Event Service and outlines how TAO’s Real-time
Event Service supports them.
2.1.1 Support for Centralized Event Filtering
In a large-scale distributed interactive simulation, not all con-
sumers are interested in all events generated by all suppli-
ers. Although it is possible to let each application perform
its own filtering, this solution wastes network and computing
resources. Ideally, therefore, the Event Service should send an
event to a particular consumer only if the consumer has explicitly subscribed for it. Care must be taken, however, to ensure that the subscription process used to support filtering does not itself cause undue burden on distributed system resources.
It is possible to implement filtering using standard COS
event channels [21]. For instance, channels can be chained to
create an event filtering graph that consumers use to receive a
subset of the total events in the system. However, filter graphs
defined using standard COS Event Channels increase the num-
ber of hops a message must travel between suppliers and con-
sumers. This increased traversal overhead may be unaccept-
able for applications with low latency requirements. Likewise,
it hampers system scalability because additional processing is
required to dispatch each event.
To alleviate these scalability problems, therefore, TAO’s RT
Event Service provides filtering and correlation mechanisms
that allow consumers to specify logical OR and AND event de-
pendencies. When the designated conditions are met, the event channel dispatches the events that satisfy each consumer's dependencies.
2.1.2 Efficient and Predictable Event Dispatching
To improve scalability and take advantage of advanced hardware, it may be desirable to have multiple threads sending events to their final consumers. TAO's RT Event Service can be configured with an application-provided strategy to assign the threads that dispatch events. The standard distribution also includes a number of selectable strategies to cover the more common cases. The same component can be used
to achieve greater predictability and enforce scheduling deci-
sions at run-time, since an event can be assigned to a thread
with the appropriate OS priority.
2.1.3 Efficient use of Network and Computational Re-
sources
TAO's RT Event Service can be configured to minimize network traffic by using multicast protocols to avoid duplicate network messages and by building federations of Event Services that share filtering information to minimize or eliminate the transmission of unwanted events to remote entities.
2.2 TAO’s RT Event Service Architecture
TAO's RT Event Service is implemented using the Mediator pattern [27]. The heart of the RT Event Service is the Event Channel, shown in Figure 3. The features of TAO's Event Channel are defined by an Event Channel IDL interface and implemented by a C++ class of the same name. This class also plays a mediator role by serving as a "location broker" so the rest of the Event Channel components can find each other.
When a ProxyPushConsumer receives an event from an
application, it iterates over the set of ProxyPushSuppliers
that represent the potential consumers interested in that event
(Section 2.3.2 describes how this set is determined). Each ProxyPushSupplier checks to see if the event is relevant for its consumer. This check is performed by the filter hierarchy described in Section 2.3.1. If a consumer is interested in the event, a Dispatching Strategy selects the thread that will dispatch the event to the consumer. Section 2.3.6 discusses various tradeoffs to consider when selecting the dispatching thread strategy.
For real-time applications that require periodic event processing, the Event Service can contain an optional Timer Module. Section 2.3.14 outlines several strategies for generating
timer events. Each strategy possesses different predictabil-
ity and performance characteristics and different resource re-
quirements.
2.3 Design and Implementation Challenges
Section 2.2 outlined the core components of the CORBA
Event Service that are defined by IDL interfaces. Below, we
describe how we have systematically applied key design patterns, such as Builder, Command, Composite, and Strategy from the GoF book [27], Strategized Locking [28], and Service Configurator [29], to tackle the design and implementation challenges posed by TAO's RT Event Service. Since these patterns are applicable to related systems, we document how we applied and composed these patterns in TAO to achieve our performance and scalability goals.
Figure 3: RT Event Service Architecture (Supplier Admin, Consumer Admin, Dispatching Module, and Timer Module; events flow from Suppliers through the channel to Consumers)
2.3.1 Implementing an Extensible and Efficient Filtering
Framework
Context: TAO's real-time Event Service provides several filtering primitives, e.g., a consumer can accept only events of a given type or from some particular source [8]. Not all applications require all filtering mechanisms provided by the RT Event Service, however. For example, many distributed interactive simulations do not require correlation. Likewise, some Event Service applications do not require filtering. Moreover, consumers often compose several filtering criteria into their subscription list, e.g., they request to receive any event from a given list or a single notification when all the events in a list are received.
Problem: The Event Channel should support the addition of new filtering primitives flexibly and efficiently. For instance, an Event Channel should allow new filtering primitives, such as receiving a single notification when a set of events is received in a particular order or accepting any event whose type matches a designated bitmask.
Solution: use the Composite pattern [27]: This pattern allows clients to treat individual objects and compositions of objects uniformly. Usually, the composition forms a tree structure, which in our case is a filter composition tree. New filtering primitives can be implemented as leaves in the composition tree. These primitives provide applications with substantial expressive power, e.g., they can create complex filter hierarchies using disjunction and conjunction composites.
To control the creation of the concrete filters, we use the Builder pattern [27], which separates the construction of a complex object from its representation. In our case, we build the filter hierarchy from the subscription IDL structures. Ultimately, we plan to use the complete Trader Constraint Language [30] for filtering. One important consequence of using the Builder pattern is that this change will not affect the overall architecture of TAO's Event Service framework.
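The composite approach can be sketched as follows. This is a minimal, self-contained illustration; the class and member names are hypothetical and much simpler than TAO's actual filter classes:

```cpp
#include <memory>
#include <vector>

// Hypothetical event: just a type tag and a source id.
struct Event { int type; int source; };

// Filter interface: leaves and composites are treated uniformly.
struct Filter {
    virtual ~Filter() = default;
    virtual bool matches(const Event& e) const = 0;
};
using FilterPtr = std::shared_ptr<Filter>;

// Leaf primitive: accept events of one type.
struct TypeFilter : Filter {
    int type;
    explicit TypeFilter(int t) : type(t) {}
    bool matches(const Event& e) const override { return e.type == type; }
};

// Disjunction composite: any child may match (logical OR).
struct AnyOf : Filter {
    std::vector<FilterPtr> children;
    bool matches(const Event& e) const override {
        for (const auto& c : children)
            if (c->matches(e)) return true;
        return false;
    }
};

// A new primitive, such as the bitmask filter mentioned above,
// slots in as just another leaf class.
struct MaskFilter : Filter {
    int mask;
    explicit MaskFilter(int m) : mask(m) {}
    bool matches(const Event& e) const override { return (e.type & mask) != 0; }
};

// Build a small subscription: events of type 1 OR type 2.
inline bool demo_match(int type) {
    auto any = std::make_shared<AnyOf>();
    any->children.push_back(std::make_shared<TypeFilter>(1));
    any->children.push_back(std::make_shared<TypeFilter>(2));
    return any->matches(Event{type, 0});
}
```

A conjunction (AND) composite would follow the same shape as AnyOf, and a Builder would assemble such trees from the subscription IDL structures.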
2.3.2 Improving Consumer Filtering Scalability
Context: In some distributed interactive simulation applica-
tions, only a small percentage of consumers are interested in
a particular event. Therefore, an Event Channel implementation that queries each ProxyPushSupplier to check whether an event is of interest to any of its clients will scale very poorly as the number of events increases.
Problem: Reduce the time required to dispatch an event by
reducing the set of consumers tested.
Solution: pre-compute the set of consumers for each ProxyPushConsumer object: We can use the types of events generated by a supplier to determine which consumers may be interested in the events generated by that supplier. Likewise, we use the filter hierarchy in each ProxyPushSupplier to estimate whether the corresponding consumer is willing to receive at least one of the events published by the supplier. If the consumer is not interested in any of the events, we can remove it from the set, thereby improving overall system performance.
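The pre-computation can be sketched as an overlap test between each supplier's publication types and each consumer's subscriptions. The names here are hypothetical, and TAO derives this information from the filter hierarchy rather than from plain sets:

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical model: suppliers declare the event types they publish,
// and consumers declare the types they subscribe to.
struct Consumer { int id; std::set<int> subscribed; };

// Pre-compute, for one supplier, the consumers that could possibly be
// interested, so dispatching never has to test the rest.
std::vector<int> interested_consumers(const std::set<int>& published,
                                      const std::vector<Consumer>& all) {
    std::vector<int> out;
    for (const auto& c : all) {
        // Keep the consumer only if the publication and subscription
        // sets overlap in at least one event type.
        bool overlap = std::any_of(published.begin(), published.end(),
            [&](int t) { return c.subscribed.count(t) != 0; });
        if (overlap) out.push_back(c.id);
    }
    return out;
}
```

The resulting per-supplier set is computed when connections or subscriptions change, not on every event, so the per-event cost is a scan over a (usually much smaller) pruned set.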
2.3.3 Reducing Memory Footprint
Context: In other distributed interactive simulation applica-
tions, a large percentage of consumers may be interested in the
events generated by each supplier. In such cases, it is counter-
productive to use the pre-computation optimization described
above. Instead, it may be more efficient to use a single global
consumer set, which reduces the memory footprint and mini-
mizes the time required to update the consumer set.
Problem: The application developer should be able to control the algorithm used to build consumer sets.
Solution: use the Strategy pattern [27]: In this pattern, a family of algorithms is represented by classes that share a common base class. Clients access these algorithms via the base class, thereby enabling them to select different algorithms without requiring changes to the client. We use this pattern in TAO's RT Event Service framework to encapsulate the exact algorithm used to control the number of consumer sets, as well as how these sets are updated. Note that the variations mentioned thus far are not exhaustive, e.g., we can store a separate consumer set for each event type. The framework implemented in TAO's RT Event Service can support this use case, as well.
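A minimal sketch of this use of the Strategy pattern, with two hypothetical policies for organizing consumer sets (the real strategies also encapsulate how the sets are updated):

```cpp
#include <cstddef>
#include <string>

// Strategy interface: how consumer sets are organized.
struct ConsumerSetStrategy {
    virtual ~ConsumerSetStrategy() = default;
    virtual std::string name() const = 0;
    // How many distinct sets this policy maintains for n suppliers.
    virtual std::size_t set_count(std::size_t n_suppliers) const = 0;
};

// One pre-computed set per supplier: faster dispatch, more memory.
struct PerSupplierSets : ConsumerSetStrategy {
    std::string name() const override { return "per-supplier"; }
    std::size_t set_count(std::size_t n) const override { return n; }
};

// A single global set: smaller footprint, cheaper updates.
struct GlobalSet : ConsumerSetStrategy {
    std::string name() const override { return "global"; }
    std::size_t set_count(std::size_t) const override { return 1; }
};

// Client code depends only on the base class, so the policy can be
// swapped without touching the dispatching logic.
std::size_t sets_needed(const ConsumerSetStrategy& s, std::size_t n) {
    return s.set_count(n);
}
```

A per-event-type policy would be a third subclass with its own set_count, again with no changes to the client code.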
2.3.4 Supporting Re-entrant Calls while Dispatching Events
Context: To dispatch an event to multiple consumers, an
Event Channel must iterate over its set of ProxyPushSupplier
objects. In some concurrency models, such as the single-
threaded or reactive dispatching strategies described in Sec-
tion 2.3.6, the same thread that iterates over a consumer set
executes the upcall to consumers. Consumers are then allowed
to push new events, add or remove consumers and suppliers,
as well as call back into the Event Channel and its internal
components.
Problem: The Event Channel should support re-entrant calls during event dispatching, regardless of the concurrency model being used. However, many iterator implementations become invalidated when their data structure is modified [31]. Thus, the ProxyPushSupplier set cannot be changed while a thread is iterating over it. Simply locking the set is inappropriate because the application will either deadlock if the upcall changes the set or will invalidate iterators if recursive locks are used. Another inappropriate alternative is to copy the ProxyPushSupplier set before starting the iteration. Although this strategy works for small sets, it performs poorly for large-scale distributed interactive simulation applications.
Solution: apply lazy evaluation to delay certain operations: TAO's Event Channel tracks the number of threads iterating over each set of ProxyPushSupplier objects. Before performing changes that would invalidate other iterators, it checks to ensure that no concurrent iterations are in progress. If iterations are in progress, the operation is stored as a Command object [27]. When no threads are iterating over the set, all delayed command operations are executed sequentially.

To avoid starving a delayed operation indefinitely, a limit can be placed on the number of iterations started after a pending modification occurs. After this limit is reached, all new threads must wait on a lock until the modification completes.
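The lazy-evaluation scheme can be sketched as follows. This is a single-threaded illustration with hypothetical names; TAO additionally synchronizes the iteration count across threads and bounds the deferral, as described above:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A set of proxies that defers mutations while iterations are active,
// instead of invalidating iterators or deadlocking on a lock.
class ProxySet {
    std::vector<int> proxies_;                    // stand-in for proxy objects
    std::vector<std::function<void()>> pending_;  // delayed Command objects
    std::size_t iterating_ = 0;                   // active iteration count
public:
    void add(int p) {
        if (iterating_ > 0)          // mutating now would invalidate iterators
            pending_.push_back([this, p] { proxies_.push_back(p); });
        else
            proxies_.push_back(p);
    }
    template <typename F>
    void for_each(F f) {
        ++iterating_;
        for (int p : proxies_) f(p);  // upcalls may re-enter and call add()
        if (--iterating_ == 0) {      // last iterator done: replay commands
            for (auto& cmd : pending_) cmd();
            pending_.clear();
        }
    }
    std::size_t size() const { return proxies_.size(); }
};
```

A re-entrant add() issued from inside for_each() is queued as a command and applied after the iteration finishes, so the upcall never invalidates the loop in progress.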
2.3.5 Reducing Synchronization Overhead
Context: Excessive synchronization overhead can be a sig-
nificant bottleneck when it occurs in the critical path of a con-
current system.
Problem: The lazy evaluation solution described in Section 2.3.4 is functionally correct. However, it increases synchronization overhead along the critical path of the event filtering and dispatching algorithms. In particular, Event Channels may choose to decouple (1) threads that iterate over the ProxyPushSupplier sets from (2) threads that perform consumer upcalls. This decoupling (1) yields more predictable behavior in hard real-time systems, (2) allows Event Channels to re-order events, e.g., to perform dynamic scheduling [5], and (3) isolates event suppliers from the execution time of consumer upcalls (although this design may increase context-switching overhead, many applications can tolerate it if Event Channels already use separate threads to perform upcalls).
Solution: use the Strategy pattern [27]: TAO's Event Channel uses this pattern to strategize the dispatching algorithm and minimize overhead in applications that do not require complex concurrency and re-entrancy support. For complex use-cases, TAO's Event Channel uses a special lock that updates the state in the set to indicate that a thread is performing an iteration. When this lock is released, any operations delayed while the lock was held are executed.
2.3.6 Selecting the Thread to Dispatch an Event
Context: Once an Event Channel has determined that a par-
ticular event should be dispatched to a consumer, it must de-
cide which thread will perform the dispatching. As shown
in Figure 4, there are several alternatives. Using the same thread that received the event is efficient, e.g., it reduces context switching, synchronization, and data copying overhead [16].
However, this design may expose the Event Channel to mis-
behaving consumers. Moreover, to avoid priority inversions
in real-time systems, events must be dispatched by a thread
at the appropriate priority. Likewise, highly-scalable systems
Figure 4: Dispatching Strategies Supported in TAO's Event Channel (reactive dispatching, priority queues with priority timers, and a thread pool; queue-based strategies enqueue the consumer/event pair, dequeue it in a dispatching thread, and then invoke consumer->push(event))
may want to use a pool of threads to dispatch events, thereby
leveraging advanced hardware and overlapping I/O and com-
putation.
Problem: An Event Channel must provide a flexible infras-
tructure to determine which thread dispatches an event to a
particular consumer.
Solution: use the Strategy pattern [27]: This pattern can be applied to encapsulate the algorithm used to choose the dispatching thread. The selected dispatching strategy is responsible for performing any data copies that may be necessary to pass the event to a separate thread. The current implementation of TAO's Event Channel exploits several optimizations, such as reference counting, in the TAO ORB to reduce those data copies. In applications with stringent real-time requirements, the dispatching strategy collaborates with TAO's Scheduling Service [10, 5] to determine the appropriate queue (and thread) to process the event. When the same thread is used for reception and dispatching, the strategy collaborates with the ProxyPushSupplier to minimize locking overhead, as described in Section 2.3.5.
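A sketch of a dispatching Strategy interface with two simplified variants. The names are hypothetical, and the queued variant is drained manually here, whereas a real implementation would hand the queue to worker threads running at appropriate priorities:

```cpp
#include <functional>
#include <queue>

using Upcall = std::function<void()>;

// Strategy interface: decide which thread runs the consumer upcall.
struct DispatchStrategy {
    virtual ~DispatchStrategy() = default;
    virtual void dispatch(Upcall u) = 0;
};

// Reactive: run the upcall in the calling (supplier) thread. No context
// switch and no copy, but a slow consumer blocks the supplier.
struct ReactiveDispatch : DispatchStrategy {
    void dispatch(Upcall u) override { u(); }
};

// Queued: hand the upcall off for later execution by another thread,
// isolating suppliers from consumer execution time.
struct QueuedDispatch : DispatchStrategy {
    std::queue<Upcall> q;
    void dispatch(Upcall u) override { q.push(std::move(u)); }
    void drain() {                    // stands in for the worker thread
        while (!q.empty()) { q.front()(); q.pop(); }
    }
};
```

A priority-queue variant would follow the same interface, selecting the queue (and thus the thread priority) per event.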
2.3.7 Configuring Event Channel Strategies Consistently
Context: To adapt to various use-cases, TAO’s Event Chan-
nel provides myriad strategies that can be configured by ap-
plication developers. Often, the choice of one strategy affects other strategies. For example, if the Event Channel's dispatching strategy always uses a separate thread to process events, there is no risk of re-entrant calls from consumers modifying the ProxyPushSupplier sets. Thus, a simpler strategy can be used to manipulate those sets.
Problem: Selecting a suitable combination of strategies can
impose an undue burden on the developer and yield inefficient
or semantically incompatible strategy configurations. Ideally, developers should be able to select from a set of configurations whose strategies have been pre-approved to achieve certain goals, such as minimizing latency, avoiding priority inversion, or improving system scalability.
Solution: use the Abstract Factory pattern [27] to control the creation of all the objects in the Event Channel: In this pattern, a single interface creates families of related or dependent objects. We use it to provide a single point at which to select all the Event Channel strategies and to avoid incompatible choices. Concrete implementations of this Abstract Factory ensure that strategies and components are semantically compatible and collaborate correctly.
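A minimal Abstract Factory sketch, with hypothetical names and products reduced to tags, showing how each concrete factory yields a semantically consistent family:

```cpp
#include <string>

// Products, simplified to tags for the sketch.
struct Dispatching { std::string kind; };
struct SetHandling { std::string kind; };

// Abstract Factory: one object creates a consistent *family* of
// strategies, so callers cannot mix incompatible ones.
struct EventChannelFactory {
    virtual ~EventChannelFactory() = default;
    virtual Dispatching make_dispatching() const = 0;
    virtual SetHandling make_set_handling() const = 0;
};

// Queued dispatching runs upcalls in separate threads, so no
// re-entrant call can reach the proxy sets: plain sets suffice.
struct ThroughputFactory : EventChannelFactory {
    Dispatching make_dispatching() const override { return {"queued"}; }
    SetHandling make_set_handling() const override { return {"plain"}; }
};

// Reactive dispatching makes upcalls in the iterating thread, so the
// sets must tolerate re-entrant modification.
struct LowLatencyFactory : EventChannelFactory {
    Dispatching make_dispatching() const override { return {"reactive"}; }
    SetHandling make_set_handling() const override { return {"reentrant-safe"}; }
};
```

Application code asks one factory for every strategy it needs, so the compatibility rules live in one place instead of being scattered across configuration sites.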
2.3.8 Supporting Rapid Testing and Run-time Changes
in the Configuration
Context: Some applications may be used in multiple envi-
ronments, with different Event Channel strategies configured
for each environment. During application development and
testing, it may be necessary to evaluate multiple configura-
tions to ensure that the application works in all of them or to
identify the most efficient/scalable configurations.
Problem: If the Event Channel is statically configured, it is hard to evaluate various combinations without time-consuming recompilation and relinking.
Solution: use the Service Configurator pattern [29]: This pattern allows applications to configure service implementations dynamically or statically. We use ACE's implementation of this pattern to dynamically load Abstract Factories that create various Event Channel configurations. Our implementation includes a default Abstract Factory that uses the scripting features of the ACE Service Configurator framework. By using this default, developers or end-users can modify the Event Channel configuration at initialization time simply by changing entries in a configuration file.
2.3.9 Exploiting Locality in Supplier/Consumer pairs
Figure 5: A Centralized Configuration of the TAO RT Event Service (suppliers and consumers on several hosts all communicating with a single, centralized Event Service over CORBA/IIOP)

Context: TAO's Event Channels can be accessed transparently across distribution boundaries, since they are based on CORBA. Many applications want to be shielded from distri-
bution aspects, while simultaneously achieving high perfor-
mance.
Problem: There are use-cases where distribution trans-
parency may not yield the most effective configuration. For
example, Figure 5 illustrates a scenario where most or all con-
sumers for common events reside in the same process, host, or
network with the supplier. Thus, sending an event to a remote
Event Channel, only to have it sent back to the same process
immediately, wastes network resources and increases latency
unnecessarily. Likewise, there may be multiple remote con-
sumers expecting the same event. Ideally, bandwidth should
be conserved in this case by sending a single message across
the network to all those remote consumers.
Solution: federate Event Channels: Figure 6 illustrates the use of Event Channel Gateways to federate Event Channels. Each Gateway is a CORBA object that connects to the local Event Channel as a supplier and to the remote Event Channel as a consumer. To reduce network traffic, the Gateway subscribes only to events that are of interest to at least one local consumer. Suppliers and consumers connect directly to their local Event Channel. This design reduces average latency for all the consumers in the system because consumers and suppliers exhibit locality of reference, i.e., most consumers for any event are in the same domain as the supplier generating the event. Moreover, if multiple remote consumers are interested in the same event, only one message is sent to the remote Gateway, thereby minimizing network utilization.
A straightforward and portable way to implement a Gateway is to use IIOP to receive a single event from a remote Event Channel and propagate it, through the local Event Channel, to multiple consumers. This design conserves network resources and increases latency only for the uncommon, remote case; local consumers receive events directly through the local Event Channel.

Figure 6: A Federated Event Channel Configuration (each network runs its own Event Service, and Gateways interconnect these Event Services over CORBA/IIOP)
2.3.10 Updating the Gateway Subscriptions and Publica-
tions
Context: In a dynamic environment, subscriptions change
constantly.
Problem: To use network resources efficiently, the Event Channel Gateways described in Section 2.3.9 must avoid subscribing to all events in a remote Event Channel. Otherwise, the locality-of-reference benefits of Event Channel federation are lost.
Solution use the Observer Pattern [27]: In this pattern,
whenever one object changes state, all its dependents are no-
tified. When this pattern is applied to TAO’s RT Event Ser-
vice framework, we propagate the changes in the subscription
and publication lists to all interested consumers and suppliers.
The implementation of observables, i.e., the Event Channels, can be strategized. Thus, if applications know a priori that
there will be no observers at run-time, they can configure the
Event Channel to disable this feature, thereby eliminating the
overhead required to update the (empty) observer list.
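A minimal sketch of this strategized observable, with invented names rather than TAO's actual Observer interfaces: the subscription list notifies attached observers on every change, and observer support can be disabled entirely at configuration time.

```python
class SubscriptionList:
    """Observable list of event-type subscriptions.

    Observer support is 'strategized': pass observable=False to disable
    notifications entirely and avoid the bookkeeping overhead."""
    def __init__(self, observable=True):
        self._types = set()
        self._observers = [] if observable else None

    def attach(self, observer):
        if self._observers is None:
            raise RuntimeError("observer support disabled by configuration")
        self._observers.append(observer)

    def subscribe(self, event_type):
        self._types.add(event_type)
        if self._observers:
            for obs in self._observers:
                obs(set(self._types))   # push the updated subscription set

# A gateway observes the channel's subscriptions and mirrors them remotely.
mirrored = []
subs = SubscriptionList()
subs.attach(lambda types: mirrored.append(sorted(types)))
subs.subscribe("tank-position")
subs.subscribe("radar-track")
print(mirrored)  # [['tank-position'], ['radar-track', 'tank-position']]
```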
2.3.11 Further Improving Network Utilization
Context: In distributed interactive simulations, it is common that an event will be dispatched to multiple hosts in the same network.

Problem: Network bandwidth is often a scarce resource for large-scale simulations, particularly when they are run over WANs. As the number of nodes increases, therefore, sending the same event multiple times across a network does not scale.
Solution use a multicast or broadcast protocol: TAO's Event Channel can be configured to use UDP to multicast events. As with the Gateways described in Section 2.3.10, a special consumer can subscribe to all the events generated by local suppliers, as shown in Figure 7. This consumer uses multicast to send events to selected channels in the network. On each receiver, a designated supplier re-publishes all events that are of interest to local consumers. This supplier receives remote multicast traffic, converts it into an event, and forwards the event to its local consumers via an Event Channel. For both consumers and suppliers, the Observer interface described in Section 2.3.10 is used to modify the subscriptions and publications of multicast Gateways dynamically.
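The division of labor between the multicast-sending consumer and the receiving supplier can be sketched as follows (the names and the injected transport are illustrative assumptions, not TAO's API; a real system would use a UDP multicast socket instead of a list):

```python
class MulticastGateway:
    """Special consumer that forwards locally generated events to a multicast
    group, so each event crosses the network once regardless of how many
    remote hosts subscribe. `send` is an injected transport (a UDP multicast
    socket in a real system; a plain callable here for illustration)."""
    def __init__(self, group, send):
        self.group, self.send = group, send

    def push(self, event):
        self.send(self.group, event)   # one datagram per event

class MulticastSupplier:
    """Receiver-side supplier: turns incoming datagrams back into events
    and republishes them to the local consumers."""
    def __init__(self):
        self.local_consumers = []

    def on_datagram(self, payload):
        for consumer in self.local_consumers:
            consumer(payload)

wire = []                                  # stands in for the network
gw = MulticastGateway("224.0.1.1:9999", lambda g, e: wire.append((g, e)))
rx = MulticastSupplier()
rx.local_consumers.append(lambda e: print("delivered:", e))
gw.push("position-update")                 # sender side
for _group, event in wire:                 # receiver side drains the "network"
    rx.on_datagram(event)
```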
2.3.12 Exploiting Hardware- and Kernel-level Filtering
Context: If different types of events can be partitioned onto
different multicast groups, consumer hosts only receive a sub-
set of the multicast traffic. In large-scale distributed interac-
tive simulations it may be necessary to disseminate events over several multicast groups. This design avoids unnecessary interrupts and processing by network interfaces and OS kernels when receiving multicast packets containing unwanted information.
Problem: The Event Channel must select the multicast
group used for each type of event in a globally consistent way.
However, the mapping between events and multicast groups
may be different for each application. Applications can use
different mechanisms to achieve that goal. For instance, some
use pre-established mappings between their event types and
the multicast groups, whereas others use a centralized service
Figure 7: Using Multicast in Federated Event Channels
to maintain this mapping. Moreover, applications that require
highly scalable fault tolerance may choose to distribute the
mapping service across the network. An Event Channel must be able to satisfy all these scenarios, without imposing any one strategy.
Solution use a user-supplied callback object: Application developers can implement an address server, which is a CORBA object that Event Channel Gateways query to redirect events to the appropriate multicast group. On the receiver, the Gateways consult this service to decide which multicast groups to subscribe to, based upon the current set of subscriptions in the local Event Channel. Advanced operating systems and network adapters can use this information to process only multicast traffic that is relevant to them.
To avoid single points of failure and to improve scalability,
application developers can replicate address servers across the
network. If developers use a static mapping between events
and multicast groups, there is no need to communicate state
between address services. Conversely, if mappings change dy-
namically, applications must implement mechanisms to prop-
agate these changes to all address servers. One solution is to
use the Event Service itself to propagate this information.
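A sketch of such an address server with a static mapping (the names and table contents are invented for illustration, not part of TAO's API):

```python
class AddressServer:
    """Maps event types to multicast groups. Here the mapping is a static
    table; a real deployment could replicate this object across the network
    or back it with a dynamic service, as the text describes."""
    def __init__(self, table):
        self._table = table

    def group_for(self, event_type):
        # Sender-side query: where should this event be multicast?
        return self._table[event_type]

    def groups_for_subscriptions(self, event_types):
        # Receiver-side query: which groups must this host join?
        return {self._table[t] for t in event_types}

server = AddressServer({
    "tank-position": "224.0.1.1",
    "radar-track":   "224.0.1.2",
    "heartbeat":     "224.0.1.1",   # several event types may share a group
})
print(server.group_for("radar-track"))                       # 224.0.1.2
print(sorted(server.groups_for_subscriptions(
    ["tank-position", "heartbeat"])))                        # ['224.0.1.1']
```

Because the receiver joins only the groups returned by the second query, hardware- and kernel-level filtering discards traffic on all other groups without interrupting the host.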
2.3.13 Breaking Event Cycles in Event Channel Federa-
tions
Context: In a complex distributed interactive simulation, the same event could be important for both local and remote consumers. For instance, a local supplier can generate tank position events; if both a local and a remote consumer are interested in the event, the Gateways could continuously send the event between two federated Event Channels.
Problem: Consumers for a particular event can be present
in multiple channels in the federation. In this case, Gateways
will propagate events between the peers of the federation in-
definitely due to cycles in the event flow graph. One approach
would be to add addressing information to each event and en-
hance the routing logic in each Event Channel. However, this
design would complicate the Gateway architecture for simpler use-cases and require additional communication among the peers.
Solution use a time-to-live (TTL) counter: This counter is stored in each event and decremented each time the event passes through a Gateway. If the TTL counter reaches zero, the event is deallocated and not forwarded. Usually Event Channel federations are fully connected, i.e., all Event Channels have a Gateway to each of their peers. Thus, setting the TTL counter to 1 eliminates all cycles, because no event traverses more than one Gateway link. In more complex distributed configurations, however, the TTL can be set to a higher number, though events may loop before being discarded. To further improve performance, the TAO Event Channel code has been optimized to reduce data copying: only the event header requires a copy to change the TTL counter; the payload, which usually contains most of the data, is not touched.
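The TTL mechanism can be demonstrated with a small sketch (invented names; the federation is modeled as a ring of gateways to force a cycle). Note that only the header is copied to decrement the TTL, while the payload is shared, mirroring the copy-avoidance optimization just described.

```python
def forward(event, gateways):
    """Propagate an event through a cyclic federation of gateways until its
    TTL expires; returns how many gateway links the event traversed."""
    deliveries = 0
    queue = [(event, 0)]
    while queue:
        (header, payload), at = queue.pop(0)
        if header["ttl"] <= 0:
            continue                        # discard: TTL exhausted
        # Copy only the header to decrement the TTL; the payload is shared.
        new_header = dict(header, ttl=header["ttl"] - 1)
        nxt = (at + 1) % len(gateways)      # cycle in the event flow graph
        deliveries += 1
        queue.append(((new_header, payload), nxt))
    return deliveries

payload = b"tank position"
# TTL=1 in a fully connected federation: the event crosses one gateway link
# and the cycle is broken.
print(forward(({"ttl": 1}, payload), gateways=["A", "B"]))   # 1
print(forward(({"ttl": 3}, payload), gateways=["A", "B"]))   # 3
```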
2.3.14 Providing Predictable and Efficient Periodic Events

Context: Real-time applications require an Event Channel to generate events at particular times in the future. For instance, applications can use these events to detect missed deadlines in non-critical processing or to support hardware that requires watchdog timers to identify faulty equipment. In addition, some applications require periodic events to initiate periodic tasks and to detect that the periodic tasks complete before their deadlines.
Hard real-time applications may assign different priorities to their timer events. To avoid priority inversions, therefore, events should be generated and dispatched by threads at the appropriate priorities. Soft real-time or best-effort applications often impose no such strict requirements on timer priorities and thus are better served by simpler strategies that conserve memory and CPU resources. Other applications require no timers at all, and obviously single-threaded applications cannot use this technique to generate periodic events.
Problem: Implement predictable periodic events for hard real-time applications without undue overhead for applications with lower predictability requirements.
Solution use the Strategy Pattern [27]: This pattern can be used to dynamically select the mechanisms used to generate timeout events. In TAO's Event Channel, the ConsumerFilterBuilder creates special filter objects that adapt the timer module used to generate timeouts to consumers that expect the IDL structures used to represent events.
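A sketch of the strategized timer design (the strategy classes below are invented for illustration and greatly simplify TAO's actual mechanism, which dispatches from prioritized threads): the channel depends only on the strategy interface, so a null strategy eliminates all timer overhead for applications that need none.

```python
class NullTimerStrategy:
    """For applications that need no timers: zero bookkeeping."""
    def schedule(self, interval, callback):
        raise RuntimeError("timers disabled by configuration")

class ListTimerStrategy:
    """A simple best-effort strategy: record periodic timers and fire them
    on demand. A hard real-time strategy would instead dispatch from
    prioritized threads; the channel only depends on `schedule`."""
    def __init__(self):
        self._timers = []

    def schedule(self, interval, callback):
        self._timers.append((interval, callback))

    def tick(self, now):
        for interval, callback in self._timers:
            if now % interval == 0:
                callback(now)       # generate the timeout event

class EventChannel:
    def __init__(self, timer_strategy):
        self.timers = timer_strategy    # strategy chosen at configuration time

fired = []
channel = EventChannel(ListTimerStrategy())
channel.timers.schedule(10, lambda now: fired.append(now))
for t in (10, 15, 20):
    channel.timers.tick(t)
print(fired)  # [10, 20]
```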
3 Applying TAO’s Real-time Event
Service to DMSO’s HLA RTI-NG
The U.S. Defense Modeling and Simulation Office (DMSO)
developed a standard for distributed simulation middleware
known as the High Level Architecture (HLA). Every imple-
mentation of this standard is referred to as a Run-time Infras-
tructure (RTI). Early RTIs demonstrated the viability of the specification, so DMSO commissioned the development of a next-generation RTI, the RTI-NG.
The architecture of the HLA RTI-NG is centered around TAO's Real-time Event Service. Fundamentally, the HLA specifies publish/subscribe distributed middleware, so this is an excellent foundation. Those aspects of the HLA not well-suited to publish/subscribe (mostly aspects involving the control of data dissemination) use the RMI provided by TAO's implementation of CORBA. These RMI aspects of the RTI-NG will not be considered further in the remainder of this discussion.
In HLA parlance, a group of participants cooperating in a distributed simulation is a federation.3 A federate is the term describing a participant in a federation. A given process that belongs to a federation may contain one or more federates. Every federate belongs to only a single federation; however, the federates in a multi-federate process need not all belong to the same federation, as shown in Figure 8. Federates not in the
3. Strictly speaking, federation execution is the name given to the group of participants while they are actually running, to draw a distinction between a running simulation and a simulation that is not running (e.g., being planned).
Figure 8: Federations are composed of disjoint sets of federates. A single process may have more than one federate, possibly joined to different federations.
same federation cannot communicate using the HLA.4
Since there is no need to support communication between federates in different federations, every RTI-NG process contains a separate instance of the data dissemination mechanisms for each of the federations with which the process must communicate. This data dissemination mechanism primarily consists of three instances of TAO's Real-time Event Channel, as well as several gateways as described in Section 2.3.9 and a single multicast gateway as described in Section 2.3.11.
The HLA specifies four combinations of data transport
and ordering: reliable/receive-ordered, reliable/time-stamp-
ordered, best-effort/receive-ordered, and best-effort/time-
stamp-ordered. The multiple Event Channels and gateways are used to efficiently support these four combinations of transport
and ordering. Two of the Event Channels as well as the gate-
ways are used to support the reliable transport (both receive-
and time-stamp-ordered). The third Event Channel and the
multicast gateway are used to support the best-effort transport.
For the reliable transport, one Event Channel is used to handle the outbound (reliable) data. The other Event Channel and the gateways are used to handle the inbound data. Consider any two processes that contain federates in the same federation, process A and process B. The outbound Event Channel in A is connected to a gateway in process B that supplies events to process B's inbound Event Channel. This configuration is
4. It is possible to bridge communication between HLA federates. However, such bridging is not standardized by the HLA specification.
Figure 9: Outbound reliable Event Channels connect to remote gateways which, in turn, connect to co-located inbound reliable Event Channels. Outbound multicast data is transmitted directly to remote multicast gateways which, in turn, supply data to co-located inbound multicast Event Channels.
depicted in Figure 9.
As was discussed in Section 2.3.10, Event Channel sub-
scriptions and publications can change quite frequently. Using
separate Event Channels for the inbound and outbound data allows publications (subscriptions) to change concurrently with the
receipt (transmission) of events.
In order to support the reliable and time-stamp-ordered data delivery required by the HLA, the RTI-NG uses a distributed snapshot algorithm that requires every reliable event that is sent and received to be accounted for. To accomplish this, TAO's ProxyPushSupplier was extended to count every reliable event that is transmitted to a remote gateway. Likewise, TAO's Event Channel gateway was extended to count every reliable event that is received. This information is then provided to the distributed snapshot algorithm.
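The counting scheme can be sketched as follows (invented names; a toy stand-in for the extended ProxyPushSupplier and gateway counters): a snapshot algorithm can declare the federation quiescent once every reliable event sent has been accounted for on the receive side.

```python
class CountingGatewayLink:
    """Counts reliable events sent to and received from peer gateways,
    mimicking the extended ProxyPushSupplier/gateway counters that feed
    the RTI-NG's distributed snapshot algorithm."""
    def __init__(self):
        self.sent = 0
        self.received = 0

    def transmit(self, peer, event):
        self.sent += 1
        peer.receive(event)

    def receive(self, event):
        self.received += 1

def quiescent(links):
    """Snapshot check: every reliable event sent in the federation has
    been accounted for on the receive side."""
    return sum(l.sent for l in links) == sum(l.received for l in links)

a, b = CountingGatewayLink(), CountingGatewayLink()
a.transmit(b, "update-1")
a.transmit(b, "update-2")
b.transmit(a, "ack")
print(quiescent([a, b]))  # True: 3 sent, 3 received across the federation
```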
For simplicity, best-effort data is transmitted via IP multicast directly, i.e., without the use of TAO's Event Channel. However, best-effort data is received using TAO's multicast Event Channel gateway. Each multicast gateway passes the data it receives to an inbound multicast Event Channel. Once again, this approach minimizes the impact of publication and subscription changes on not only the best-effort data, but the reliable data as well.
In addition to the four combinations of data transport and ordering, the HLA specifies that the programmers developing a federation have the ability to logically segment the data exchanged amongst federates to reduce the amount of unwanted network traffic. Ideally, the implementation of the user-defined segmentation would be perfect and each subscriber would receive exactly the data in which it is interested. In practice, the implementation of the user-defined segmentation is (almost) never perfect and unwanted data must be filtered on the receive side.
The RTI-NG has various static mappings between these user-defined segmentations and TAO suppliers. Each supplier supplies only one type of event. On the send side, the RTI-NG maps a given user-defined segmentation onto a set of TAO suppliers. In the case of reliable transport, these suppliers push events onto the reliable outbound Event Channel. The outbound reliable Event Channel uses the consumer filtering scalability improvements described in Section 2.3.2.
In the case of best-effort transport, there is but one specialized supplier. This supplier collaborates with an address server (as described in Section 2.3.11) to transmit events to the multicast group dictated by the statically determined mapping.
On the receive side, the RTI-NG instantiates one consumer for every federate in the process. The inbound Event Channels (both reliable and multicast) deliver their events to these consumers. Thus the inbound Event Channels serve to dispatch an event to a potentially large number of co-located consumers.
Finally, it should be noted that events exchanged by the RTI-NG do not carry CORBA Any values, as specified in the OMG Event Service Specification, but instead carry CORBA octet sequences. This is an important optimization that was made to eliminate the potentially tremendous overhead involved in sending and receiving CORBA Any values.
4 Empirical Performance Evaluation
We evaluate our design decisions above using a series of ex-
periments described below.
Unless otherwise noted, all the experiments were performed using two identically configured nodes. Each was a Pentium III @ 866 MHz with 512 MB of RAM, running Timesys Linux/RT v.2.2.14. The tests are based on TAO version 1.1.19, compiled with gcc-2.95.2. Both nodes were connected via a 100BaseT Ethernet hub; other than the test in question, the network had no significant traffic, nor was any other processing taking place on these hosts.
In our first experiment we establish the baseline latency for the Event Service: a supplier and a consumer object are placed on the same host, and the time to deliver an event through a remote Event Channel is measured (see Figure 10). Placing both the consumer and the supplier on the same host obviates the need for a distributed clock and achieves higher precision in the latency measurements.

Figure 10: Experimental setup to measure the Event Service Latency

Figure 11: Measuring Latency in a Simple Event Service Configuration

The frequency distribution for this experiment is shown in Figure 11 and summarized in the following table:
Average Minimum Maximum Jitter (all in usecs)
780 762 5490 34.53
A better representation of these results is obtained using the ogive function for the measured latency, that is, the percentage of the samples that are less than or equal to a given latency value, shown in Figure 12. Using this technique we can easily determine that the large majority of the samples are no more than one standard deviation above the average.
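The ogive used here is simply the empirical cumulative distribution of the latency samples. A sketch of how such a curve is computed from raw measurements (the sample values below are invented for illustration, not the paper's data):

```python
def ogive(samples, thresholds):
    """Percentage of samples less than or equal to each threshold."""
    ordered = sorted(samples)
    return [100.0 * sum(1 for s in ordered if s <= t) / len(ordered)
            for t in thresholds]

# Hypothetical latency samples in microseconds.
latencies = [762, 770, 775, 780, 781, 785, 790, 800, 950, 5490]
print(ogive(latencies, [780, 800, 1000]))  # [40.0, 80.0, 90.0]
```

Reading a percentage off this curve answers questions such as "what fraction of events were delivered within one standard deviation of the average latency".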
In the second set of experiments we use the federated Event Service architecture described in Section 2.3.9. In this experiment we measure the roundtrip latency observed when an event is sent and returned via the Event Service federation, as shown in Figure 13.

Figure 12: Latency Experiments Represented as an Ogive Function

Figure 13: Measuring Latency in a Federated Event Service Configuration
To put these results in perspective, we also performed latency experiments over (1) the ORB using callback calls, (2) the ORB using twoway calls, and (3) raw TCP/IP. In these experiments we obtained the following results:
Transport Average Minimum Maximum Jitter (all in usecs)
TCP/IP 475 473 2223 17.32
ORB 513 511 2970 29.79
Event Service 780 762 5490 34.53
Federation 722 710 9372 34.64
Figure 14 compares these results. Ideally, the Event Service should add as little overhead as possible over the underlying network, as well as avoid any extra jitter.
12
400
500 600
700 800 900
1000
Latency (usecs)
95
96
97
98
99
100
101
102
Percentage of Samples
TCP/IP
ORB
Event Service
Event Service Federation
Figure14: Compared LatencyResults for DifferentTransports
5 Concluding Remarks
6 Acknowledgments
A OS and ORB Performance Analysis
Evaluating the performance of a complex software system is a challenging task. The problem becomes even more challenging if the software system is built over some combination of middleware and a real-time OS. To help our readers interpret the results in the previous sections, we have measured the performance of TAO over different operating systems, while keeping the hardware platform constant. Such experiments isolate the overhead due to the ORB in an OS- and hardware-independent way.
A.1 Overview of the Hardware and Software
Configuration
A.1.1 Hardware Overview
All of the tests in this section were run on an Intel Pentium III @ 866 MHz; the system had 512 MB of RAM and a 256 KB cache.
cache. We focused primarily on a single CPU hardware con-
figuration to factor out differences in network interface driver
support and to isolate the effects of OS design and implemen-
tation on the end-to-end performance of ORB middleware and
applications.
A.1.2 Operating Systems Overview
We ran the ORB/OS benchmarks described in this paper on three real-time operating systems, VxWorks 5.4, QNX-RTP 6.0, and Linux/RT-2.2.14, and on two general-purpose operating systems with real-time scheduling classes, Windows NT 4.0 Workstation with SP5 and Debian GNU/Linux (kernel version 2.2.14). On all platforms, we used the TCP/IP stack supplied
with the operating system. A brief overview of each OS fol-
lows:
VxWorks VxWorks is a real-time OS that supports multithreading and interrupt handling. By default, the VxWorks thread scheduler uses a priority-based first-in first-out (FIFO) preemptive scheduling algorithm, though it can be configured to support round-robin scheduling. In addition, VxWorks provides semaphores that implement a priority inheritance protocol.
QNX QNX is a real-time, multi-threaded, micro-kernel OS that is POSIX-compatible. Unique to QNX is the use of message passing as a fundamental means of interprocess communication (IPC). QNX supports preemptive,
priority based FIFO and round robin scheduling. QNX
also implements the priority inheritance protocol for mu-
texes.
Linux Linux is a general-purpose, preemptive, multi-
threaded implementation of SVR4 UNIX, BSD UNIX,
and POSIX. It supports POSIX real-time process and
thread scheduling. The thread implementation utilizes
processes created by a special clone version of fork. This
design simplifies the Linux kernel, though it limits scala-
bility because kernel process resources are used for each
application thread.
Linux/RT Timesys Linux/RT adds a Resource Kernel (RK) to the core Linux kernel. This is intended to add real-time capabilities to Linux, such as fixed-priority scheduling with priority inheritance and higher-resolution timers. Linux/RT is also binary-compatible with Linux; thus, it is easy to run Linux and Linux/RT on the same hardware. Booting with the Linux/RT kernel starts the Linux/RT OS. The rest of the system (file systems, C libraries, compiler, command-line tools, etc.) is just a vanilla Linux distribution.
Windows NT Microsoft Windows NT is a general-purpose,
preemptive, multi-threading OS designed to provide fast
interactive response. Windows NT uses a round-robin
scheduling algorithm that attempts to share the CPU
fairly among all ready threads of the same priority.
Windows NT defines a high-priority thread class called
REALTIME_PRIORITY_CLASS. Threads in this class
are scheduled before most other threads, which are usu-
ally in the NORMAL_PRIORITY_CLASS. Windows NT
is not designed as a deterministic real-time OS, however.
13
In particular, its internal queuing is performed in FIFO
order and priority inheritance is not supported for mu-
texes or semaphores. Moreover, there is no way to pre-
vent hardware interrupts and OS interrupt handlers from
preempting application threads.
A.1.3 Benchmarking Metric Description
The remainder of this section describes the results of the fol-
lowing benchmarking metrics we developed to evaluate the
performance and predictability of VxWorks, QNX, Windows
NT, Linux/RT, and Linux running TAO:
ORB/OS operation throughput This test provides an indi-
cation of the maximum operation throughput that appli-
cations can expect. It measures end-to-end two-way re-
sponse when the client sends a request immediately after
receiving the response to the previous request. This test
and its results are presented in Section A.2.
ORB/OS Latency and Jitter This test is a measure of how
high priority tasks may be affected both in terms of la-
tency of operations and jitter when the number of lower-
priority tasks is increased. In the ideal case higher prior-
ity tasks should not be affected at all while the latency of
lower priority tasks should increase as their numbers in-
crease. For real-time systems it is also imperative that the jitter of high-priority tasks remain as constant as possible
when the number of lower priority tasks is varied. This
test and its results are presented in Section A.3.
Context switch overhead These tests measure general OS
context switch overhead. High context switch overhead
can significantly degrade application responsiveness and
determinism. These tests and their results are presented
in Section A.4.
Memory Footprint This test is a measure of the size of the
static ACE and TAO libraries on the various Operating
Systems. As already mentioned, ACE and TAO were
compiled statically on all the platforms. The results are
presented in Section A.5.
A.2 Measuring ORB/OS Operation Through-
put
A.2.1 Terminology synopsis
Operation throughput is the maximum rate at which opera-
tions can be performed. We measure the throughput of both
two-way (request/response) and one-way (request without re-
sponse) operationsfrom client to server. This test indicates the
overhead imposed by the ORB and OS on each operation.
A.2.2 Overview of the operation throughput metric
Our throughput test, called IDL_Cubit, uses a single-
threaded client that issues an IDL operation at the fastest pos-
sible rate. The server performs the operation, which is to cube
each parameter in the request. For two-way operations, the
client thread waits for the response and checks that it is cor-
rect. Interprocess communication is performed via the network loopback interface because the client and server processes run on the same machine.
The time required for cubing the argument on the server is
small but non-zero. The client performs the same operation
and compares it with the two-way operation result. The cub-
ing operation itself is not intended to be a representativework-
load. However, many real-time and embedded applications do
rely on a large volume of small messages that each requires
a small amount of processing. Therefore, the IDL_Cubit
benchmarkis useful for evaluatingORB/OS overheadby mea-
suring operation throughput.
We measure throughput for one-way and two-way operations using a variety of IDL data types, including void, short, long, and sequence types. The one-way operation measurement eliminates the server reply overhead. The void data type instructs the server not to perform any processing other than that necessary to prepare and send the response, i.e., it does not cube its input parameters. The sequence data type exercises TAO's marshaling/de-marshaling engine.
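The structure of the benchmark can be sketched as follows (using a local function as a stand-in for the CORBA invocation, so the resulting number illustrates only the measurement technique, not ORB/OS overhead):

```python
import time

def cube_each(values):
    """The IDL_Cubit workload: cube each parameter of the request."""
    return [v ** 3 for v in values]

def measure_throughput(call, iterations=10000):
    """Issue two-way 'calls' back to back, each immediately after the
    previous response, and report calls per second. In the real benchmark
    `call` is a CORBA invocation over the loopback interface."""
    start = time.perf_counter()
    for _ in range(iterations):
        reply = call([2, 3, 4])
        assert reply == [8, 27, 64]     # client verifies the response
    elapsed = time.perf_counter() - start
    return iterations / elapsed

print(f"{measure_throughput(cube_each):.0f} calls/sec")
```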
A.2.3 Results of the operation throughput measurements
The throughput measurements are shown in Figure 15. Linux (along with Linux/RT) exhibits the best operation throughput for all the data types tested, with around 10,000 operations/sec for simple data types. The one-way throughput on Linux, however, was significantly higher, almost double that of all the other platforms. We do not have an explanation for this, though we note that Linux threads are heavier weight than those on other systems.
QNX and VxWorks results QNX and VxWorks offer consistently good performance for simple types such as void, short, and long, with QNX's throughput slightly higher than that of VxWorks.
Windows NT results Windows NT performed the worst of all the OSes compared, except on the one-way test, where it was better than QNX and VxWorks but still worse than Linux and Linux/RT.
A.2.4 Result synopsis
Operation throughput provides a measure of the overhead im-
posed by the ORB/OS. The IDL_Cubit test directly measures throughput for a variety of operation types and data types. Our measurements show that end-to-end performance depends dramatically on the data type and the operating system.

Figure 15: Operation Throughput
A.3 Measuring ORB/OS Latency and Jitter
A.3.1 Terminology Synopsis
ORB end-to-end latency is defined as the average amount of
delay seen by a client thread from the time it sends the request
to the time it completely receives the response from a server
thread. Jitter is the variance of the latency for a series of re-
quests. Increased latency directly impairs the ability to meet
deadlines, whereas jitter makes real-time scheduling more dif-
ficult.
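The statistics reported in the tables of this paper can be computed from a series of round-trip samples as sketched below (sample values are invented; jitter is taken here as the standard deviation of the latency samples):

```python
from statistics import mean, stdev

def latency_stats(samples_usec):
    """Summarize round-trip latency samples the way the tables in this
    paper do: average, minimum, maximum, and jitter (standard deviation)."""
    return {
        "average": mean(samples_usec),
        "minimum": min(samples_usec),
        "maximum": max(samples_usec),
        "jitter":  stdev(samples_usec),
    }

# Hypothetical high-priority client samples (microseconds).
stats = latency_stats([500, 510, 505, 520, 515])
print(stats["average"], stats["minimum"], stats["maximum"])  # 510 500 520
```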
A.3.2 Overview of Latency and Jitter Metrics
We computed the latency and jitter incurred by various clients
and servers using the configurations shown in Figure 16.
Server Configuration As shown in Figure 16, our test bed server consists of one servant with the highest real-time priority and several servants with lower thread priorities, each with a different real-time priority. Each thread processes requests that are sent to its servant by client threads in the other process. Each client thread communicates with a servant thread that has an identical priority, i.e., a client with a given thread priority communicates with a servant that has the same thread priority.
Figure 16: ORB Endsystem Latency and Jitter Test Configuration
Client Configuration Figure 16 shows how the benchmarking test uses multiple clients. The highest-priority client runs at the default OS real-time priority and invokes CORBA two-way calls at a fixed rate. The remaining clients have different, lower OS thread priorities and invoke operations at 10 Hz, i.e., they invoke 10 CORBA two-way calls per second.
All client threads have matching priorities with their corre-
sponding servant thread. In each call, the client sends a value
of type CORBA::Octet to the servant. The servant cubes the
number and returns it to the client, which checks that the re-
turned value is correct.
When the test program creates the client threads, these
threads block on a barrier lock so that no client begins until
the others are created and ready to run. When all client threads
are ready to begin sending requests, the main thread unblocks
them. These threads execute in an order determined by the
real-time thread dispatcher.
Each low-priority client thread invokes CORBA two-
way requests at its prescribed rate. The high-priority client
thread makes CORBA requests as long as there are low-
priority clients issuing requests. Thus, high-priority client op-
erations run for the duration of the test.
In an ideal ORB endsystem, the latency for the low-priority
clients should rise gradually as the number of low-priority
client threads increases. This behavior is expected because
the low-priority clients compete for OS and network resources
as the load increases. However, the high-priority client should
remain constant or show a minor increase in latency. In gen-
eral, a significant amount of jitter complicates the computation of realistic worst-case execution times, which makes it hard to create a feasible real-time schedule.

Figure 17: TAO's Latency for High Priority Clients

Figure 18: TAO's Jitter for High Priority Clients
A.3.3 Results of Latency and Jitter Metrics
The average two-way response time incurred by the high-
priority clients is shown in Figure 17. The jitter results are
shown in Figure 18.
QNX results QNX had the best results for high-priority client jitter (Figure 18). It remained essentially constant as the number of low-priority clients increased. Its low-priority client latency was, however, only third best. High-priority latency was quite consistent, showing no major change as the number of low-priority clients increased. However, latency was higher than on both Linux/RT and Linux. Overall, QNX produced consistent results, i.e., low-priority client jitter and latency rose nearly linearly with the number of low-priority clients.5
Linux/RT (Timesys 1.0) results Linux/RT had better results than QNX for high-priority client jitter with few client threads (up to 5). However, the jitter with 10 clients was much greater than that of QNX. With a larger number of clients (15 and above), its jitter rose almost linearly with the number of low-priority clients. Linux/RT had lower high-priority latency than QNX, but higher than vanilla Linux. As the number of clients increased (15 and above), however, Linux/RT had the lowest high-priority latency. Low-priority client jitter and latency rose almost linearly up to 15 client threads, though there was a marked increase in low-priority jitter from 15 to 20 client threads.
Linux (2.2.14) results Linux did surprisingly well on high-priority latency for a small number of low-priority client threads. It had the lowest latency among all the operating systems tested for up to 10 clients, better than Linux/RT and QNX. With an increase in the number of clients (above 15), however, it fell behind Linux/RT, though it was still better than Win-NT. Thus, Linux performs well with a low number of threads. For low-priority clients, Linux incurred high client jitter and latency when the number of threads was high (above 15). However, its latency and jitter were lower than those of Linux/RT for up to 25 low-priority clients.
Windows NT results The high-priority client latency for
Win-NT increased linearly with the number of client threads
and was the worst of the OS platforms tested. For
high-priority jitter, Win-NT was also the worst; the pattern
was erratic and unpredictable. An interesting observation is
that both jitter (for low- and high-priority clients alike)
and low-priority latency fell between 15 and 20 client
threads. Low-priority jitter was better than that of QNX (up
to 10 clients), but became the worst of all the platforms as
the number of client threads increased. Overall, Win-NT
produced poor results in all cases. In addition, the
unpredictability of the results indicates that
Footnote 5: Unfortunately, QNX is pre-configured to support a very small
number of socket handles (32), so the results in this paper are limited to
only 9 simultaneous clients. We plan to rebuild the QNX release shortly and
re-run the tests with a larger number of client threads.
Win-NT could be unsuitable for applications requiring
predictable QoS guarantees.
A.3.4 Result synopsis
In general, low latency and jitter are necessary for real-time
operating systems to bound application execution times.
General-purpose operating systems show erratic behavior,
particularly under higher load. For example, Win-NT and Linux
exhibit higher latency for high-priority clients, and Win-NT
had almost three times the jitter of Linux/RT (at 25
clients). In contrast, real-time operating systems are more
predictable: high-priority jitter for QNX was almost
constant, and for Linux/RT it rose only slightly with an
increase in load.
A.4 Measuring ORB/OS Context Switching
Overhead
A.4.1 Terminology synopsis
A context switch involves the suspension of one thread and
immediate resumption of another thread. The time between
suspension and resumption is the context-switching overhead.
Context switching overhead indicates the efficiency of the OS
thread dispatcher. From the point of view of applications
and ORB middleware, context switch time overhead should
be minimized because it directly reduces the effective use of
CPU resources.
There are two types of context switching, voluntary and in-
voluntary, which are defined as follows:
Voluntary context switch This occurs when a thread volun-
tarily yields the processor before its time slice completes.
Voluntary context switching commonly occurs when a
thread blocks awaiting a resource to become available.
Involuntary context switch This occurs when a higher-priority
thread becomes runnable or the current thread's time quantum
expires.
A.4.2 Overview of context switching overhead metrics
We measured OS context switching overhead using three metrics.
The first context switching metric is the Suspend-Resume
test. It measures two different times:
1. The time to resume a blocked high-priority thread, which
does nothing other than block again immediately when
it is resumed. A low-priority thread resumes the high-priority
thread, so the elapsed time includes two context
switches, one thread suspend, and one thread resume.
2. The time to suspend and resume a low-priority thread that
does nothing, which involves no context switching. This time
is subtracted from the one described above, and the result is
divided by two to yield the context switch time.
POSIX threads do not support a suspend/resume thread in-
terface. Therefore, the Suspend-Resume test is not applicable
to OS platforms, such as QNX and Linux, that only support
POSIX threads.
The second context switching metric is the Yield test. It
runs two threads at the same priority. Each thread iteratively
calls the operating system's yield function to immediately give up the CPU.
The third context switching metric is the Synchronized
Suspend-Resume test. This test contains two threads, one
higher priority than the other. The test measures two different
times:
1. The high-priority thread blocks on a mutex held by the
low-priority thread. Just prior to releasing the mutex,
the low-priority thread reads the high-resolution clock
(tick counter). Immediately after acquiring the mutex, the
high-priority thread also reads the high-resolution clock.
The time between the two clock reads includes a mutex
release, a context switch, and a mutex acquire.
The lower-priority thread uses a semaphore to suspend
each iteration of the high-priority thread. This prevents
the high-priority thread from simply acquiring and releasing
the mutex ad infinitum. The timed portions of the test
do not include semaphore operation overhead.
2. The time to acquire and release a mutex in a single thread,
without context switching, is measured. This time is subtracted
from the one described above to yield the context
switch time.
We used multiple context switch metrics because not all OS
platforms support each approach. Moreover, some operating
systems show anomalous results with certain metrics. For ex-
ample, Win-NT performs very poorly on the Suspend-Resume
and Yield tests.
Below, we describe the results from tests that measure the
OS context-switching overhead.
A.4.3 Results of OS context switch overhead metrics
The results in Table 1 show that QNX and VxWorks are the
top performers, though only one test is common to both: the
Synchronized Suspend-Resume test. In that test, QNX was only
slightly worse than VxWorks, but its jitter was considerably
lower, making it more predictable. On the Yield test, QNX was
the best. The other real-time OS, Linux/RT, was around three
times slower than QNX or VxWorks on the Synchronized
Suspend-Resume test. It was also
Operating     Suspend/Resume     Yield              Synch
System        Test               Test               Test
VxWorks       0.586 (0.025)      N/A                0.821 (0.019)
QNX           N/A                0.470 (0.007)      0.861 (0.003)
Linux/RT      N/A                0.645 (0.007)      2.88 (0.015)
Linux         N/A                0.548 (0.006)      2.559 (0.011)
Windows-NT    989.559 (0.836)    932.815 (0.562)    1.667 (0.009)

Table 1: Context switching latency in microseconds; jitter is
shown in parentheses.
slower than QNX and vanilla Linux on the Yield test.
Linux/RT showed a consistent trend of higher context switch
times than Linux, which could be due to the addition of more
preemption points in the kernel; its jitter was also higher
than vanilla Linux's. Win-NT had anomalous results on the
Suspend-Resume and Yield tests, being thousands of times
slower than any other OS. This could be due to a bug in the
scheduler or because the scheduler takes excessive time
computing which thread to run next. Win-NT did, however,
perform better than Linux or Linux/RT on the Synchronized
Suspend-Resume test.
The results above demonstrate that it is difficult to measure
context switching overhead reliably. Therefore, the multiple
measures of context switch time are useful.
A.5 Measuring ORB Footprint
A.5.1 Overview of footprint metric
We measured the sizes of the ACE (libACE) and TAO (libTAO)
libraries using the size command. Since the libraries were
compiled as static libraries, these figures measure the
memory overhead that applications would incur if they
included all the features of ACE and TAO.
A.5.2 Results of footprint metrics
Figure 19 shows the memory footprint measured on each of
the platforms in kilobytes.
These results show that Win-NT has the smallest memory
footprint, followed by VxWorks; QNX has the largest. This
suggests that the NT compiler is very good at producing
compact code. For VxWorks, the optimization level was only
-O, while for QNX, Linux, and Linux/RT it was -O3. The
aggressive inlining that occurs at the higher optimization
level may account for the larger memory footprints.
[Figure 19: Memory footprint in kilobytes of the ACE and TAO
libraries on each platform (Linux/RT, Linux, QNX, and Win-NT).]