ROSE 2003 International Workshop on Robotic Sensing
Örebro University, Örebro, Sweden, 5-6 June 2003
The Sense-Think-Act Paradigm Revisited
Mel Siegel
Robotics Institute, Carnegie Mellon University, Pittsburgh PA 15213 USA, phone: +1 412 268 8742
e-mail: mws@cmu.edu
Abstract - Approximately 25 years have passed since the "sense-act-think" paradigm was advanced as the operational definition of a robot, and as a broad roadmap for robotics research. With the appearance of mobile robots that do real work in the real world, "communicate" has de facto been added to the list of functionalities that are essential features of robots. In keeping with the theme of the ROSE-2003 Workshop, this paper attempts to articulate and justify a set of intellectual and engineering challenges for 21st century sensing and perception for robotics. It especially argues for examining the current and re-examining the future roles for teleoperation, both as a practical route around the improbability that machine intelligence will equal human intelligence in the foreseeable future, and because it is apparent that sensor improvement is driven in large part by incremental advances in sensor-display design-and-test loops that are in turn driven by human factors governing perception.
Keywords - robot, sense, think, act, communicate, display, teleoperation, human perception
1. INTRODUCTION
It is approximately 25 years since the "sense-act-think" paradigm [1] was advanced as an operational definition of a robot and as a broad roadmap for robotics research. This approximate anniversary, and the ROSE-2003 Workshop theme of "Sensing and Perception in 21st Century Robotics", suggest that now is a good time to revisit and revise the original paradigm and the robot's corresponding modules for perception, cognition, and manipulation/mobility. One obviously appropriate revision is revealed in the light of steady progress in mobile robots, which have replaced pick-and-place industrial robots as the public's image of intelligent autonomous machines: clearly we must append "communicate" to sense-think-act, since without communication the capacity for mobility is, if not useless, then at least nearly so.
Another requirement for excellent communication capability, incompletely foreseen 25 or so years ago, arises from the fact that whereas teleoperation was then seen as a regrettable - or even an embarrassing - intermediate step on the path toward autonomy via artificial intelligence, it is now recognized that high-fidelity intelligent teleoperation [2] is of equivalent importance, and achieving it is as much of an intellectual and engineering challenge as is autonomy, for it requires a much greater understanding of human perception and action. From this perspective, the paper will attempt to articulate and justify a set of intellectual and engineering challenges for 21st century sensing and perception for robotics, placing these in the context of contemplated grand missions and corresponding challenges to the inseparable areas of robot thinking, acting, and communicating.
The organization is as follows: Section 2 introduces and discusses issues relating to sensors and displays, Section 3 discusses a new perspective on telepresence and teleoperation, Section 4 discusses issues relating to scaling theory in robotics, and Section 5 presents the conclusions.
2. SENSORS AND DISPLAYS
Audio and image capture (sensing) and reproduction (display) technologies became enormously better and cheaper during the 20th century. Exactly how that happened may teach a lesson for how effectively to drive the development of 21st century robotic sensing technologies and modalities. It seems thus: make a better display, and a corresponding incremental improvement of the sensor follows to take advantage of the improved display; make a better sensor, and a corresponding incremental improvement of the display follows to take advantage of the improved sensor.
This cycle of sensor-and-display improvement was obviously in action in the "tele-" era, i.e., the period in which the telegraph, telephone, and television appeared on the human scene. There was no point in having a telegraph key, microphone, or camera that was much better - in temporal, spatial or spectral resolution, dynamic range, etc. - than the corresponding annunciator, loudspeaker, or display screen. Conversely, there was no point in having an annunciator, loudspeaker or display screen that was much better than the corresponding sensor. Nevertheless, it was useful for the sensor or the display to become incrementally a little better, as each improvement drove corresponding incremental improvement in the complementary device. The human element in this incremental development cycle is critical: we understand what it is important to improve only when we are close enough to recognize it.
For "virtual reality", which is really "artificial reality", we need sensors and displays that are no worse than the corresponding human senses, but there is no point to having either a sensor or a display whose technical specifications are better than the technical specifications of the corresponding human sense. In this context more is not better, it is simply wasted.

However new sensors increasingly empower us with "extended reality": they "hear" lower and higher frequencies than our ears hear, they "see" infrared and ultraviolet colors to which our eyes are blind, they "feel" textures and vibrations that our skin cannot perceive, and they give us modalities, e.g., radar, that have no counterpart in the human sensing apparatus. How do we couple these super-human sensory capacities to the human brain?
One approach in principle is to leave the perception and cognition to the computer: let the machine understand what these extended-modality sensors tell about the structure and dynamics of the world, and let the machine translate its understanding into the structural and dynamical parameters we actually want to discover via the sensor data. In practice this hypothetical approach is neither desirable nor possible. It is not desirable because something is inevitably lost in the translation, e.g., by forcing a translation from the sensor's natural modality into the conceptual constraints of a human sensing modality. The breadth of potential knowledge about the world is necessarily narrowed by separating the inference from the data that generated the inference. It is not possible because the machine is, after all, programmed by people who can hardly be expected to appreciate the richness of the information provided by a super-human perceptual capability without themselves being able to experience the perception. It might be argued that learning machines, e.g., neural networks, could overcome this impossibility, but it seems improbable that significant learning could occur without a source of ground-truth, i.e., human knowledge, experience, and context.
The realistic approach is rather to translate or scale the super-human perceptions into the human domain: subtract 40 kHz from the bat's ultrasonic chirp to shift it into the human's audible range of 20 - 20,000 Hz [3], nonlinearly compress and shift the 1-10 µm imagery from an extended-range infrared camera into a false-color map in the human's visible spectrum between about 0.38 and 0.78 µm [4], translate the radar range map into a gray-scale image with white representing some minimum distance and black representing some maximum distance [5], etc.
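As an illustration (not from the original paper), the short sketch below shows how two of these translations might be implemented in software, assuming a sampled ultrasonic signal and a radar range map held in NumPy arrays; the 40 kHz shift, the range limits, and all array and function names are illustrative assumptions. The frequency shift mirrors what heterodyne bat detectors do in hardware.

    import numpy as np

    def heterodyne_shift(signal, fs, shift_hz=40_000.0):
        """Shift a real ultrasonic signal down in frequency by mixing it with a
        local oscillator, so that e.g. a chirp near 45 kHz lands in the audible
        band.  A low-pass filter after mixing (omitted here) would normally
        remove the unwanted sum-frequency image."""
        t = np.arange(len(signal)) / fs
        return signal * np.cos(2 * np.pi * shift_hz * t)

    def range_to_grayscale(range_map, r_min, r_max):
        """Map a radar range image to 8-bit gray levels, white (255) at the
        minimum distance and black (0) at the maximum, as described above."""
        clipped = np.clip(range_map, r_min, r_max)
        frac = (clipped - r_min) / (r_max - r_min)   # 0 at r_min, 1 at r_max
        return np.uint8(np.round(255 * (1.0 - frac)))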
Of course there is a price to pay for compression, i.e., the corresponding lowering of resolution. This is usually overcome by acquiring enough preliminary information about the broad domain to isolate smaller sub-domains of interest, then expanding the most interesting sub-domain or sub-domains to fill the perceptual domain available to the human viewer. This is usually an extremely worthwhile tradeoff, but of course it is done at the risk of overlooking entirely a sub-domain that might contain invaluable information.
To recapitulate and wrap up, advances in sensing happen in the context of advances in suitable ways to display the sensed parameter. Usually there is a cycle of sequential development of sensors and displays, progress in each driving progress in the other. In human communication, entertainment, and "person-in-the-loop" control systems, it is probably fair to conclude that display development naturally drives sensor development, not the reverse. This is particularly illustrated by the development of television, where it was useful to improve cameras only to the extent that correspondingly high resolution, high speed, high dynamic range, etc., displays were available. In contrast, computer automated control systems do not in principle require physical displays, so sensor development can in principle proceed without any corresponding display development. However it seems clear that even if the sensing goal is restricted to providing the computer with input about the task or the environment, as a practical matter sensor development - done by people, of course - cannot precede development of displays that are good enough to guide the human developers.
As for its role in the predictable future of robotics, the author's feeling is that it will be particularly worthwhile to employ a sensor-display development cycle in the context of developing practical tactile and haptic sensors [6]. While there is no lack of proposals and prototypes of these sensors, there are few if any real-world examples of their application. It seems more than just plausible that this situation can be attributed to the practical absence of useful tactile displays, hampering both the application and the development of the sensor technology.
Since remote manipulation - either autonomous or teleoperated - is the most likely context for application of haptic and tactile sensing and display technology, this discussion leads naturally into the topic of the next section, the re-emergence of teleoperation.
3. TELEPRESENCE AND TELEOPERATION
Teleoperation in the context of robotics research and applications has long been viewed as an unfortunate - even an embarrassing - intermediate stepping-stone on the path toward robot autonomy. Nevertheless, the most successful and socially-valuable of robotic applications - outside the realm of fixed-base pick-and-place, welding, painting, etc., industrial robots - are probably all teleoperated machines, even when that essential fact is downplayed in the publicity. Where communication time is not an issue and where precise motion and manipulation are essential, e.g., in bomb examination and disposal, teleoperation is essentially via a one-to-one linkage, albeit via a wireless or fiber-optic channel [7]. Where communication time is a serious issue and where gross motion and manipulation are adequate, a much higher level of abstraction has been demonstrated, but what we might call "motivation", as well as a substantial measure of execution strategy, still must be supplied by the human operator, e.g., in NASA's Mars exploration missions [8].

Telepresence, the sensing or perception dual of teleoperation's action or mobility/manipulation, seems generally a less disreputable term, but as discussed in Section 2, the two in reality go hand-in-hand. NASA's websites, for example, discuss "telerobotics" and "telepresence" technologies enthusiastically, but seem consciously to avoid any mention of the term "teleoperation". Working definitions of these three related terms are given in [9], but in fact they are practically self-explanatory.
Two things seem to be needed in the field: (1) open and honest acknowledgement that the level of artificial intelligence that will be required to obtain robotic task performance that approximates the level of even a minimum-wage human worker is still so far off in the future that teleoperation is the only practical way to apply existing off-the-shelf robotic hardware to real-world applications, and (2) we therefore need to put serious front-door effort and resources into developing "high fidelity" [2] telepresence and teleoperation.
This perspective actually reduces Section 2 and Section 3 to aspects of the same problem at different depths: telepresence is mediated by information that moves from sensing apparatus to display apparatus, and teleoperation is mediated by information that moves from interactive display apparatus to actuators being observed by the sensing apparatus. Inasmuch as high fidelity teleoperation is pretty much a reality, i.e., existing excellent servo systems can precisely control the pose and stiffness of multi-jointed remote machines with short time constants and negligible overshoot, the subsequent discussion will focus on high fidelity telepresence.
In order of importance the essential components of high fidelity telepresence are excellent remote vision, excellent remote haptics, i.e., force/torque/touch, and excellent audio.
Obtaining excellent audio is largely a matter of acknowledging that it is important and thus providing an appropriate quality and quantity of hardware. While at first glance audio may seem negligibly important compared to vision and haptics, experience shows that for many tasks - particularly one-of-a-kind and first-time-done tasks - sound provides essential support for both mobility and manipulation. In particular, in the absence of specific sensors having been provided for every conceivable mechanical strain mode, the sound accompanying vehicle or manipulator strain may be the only indication of danger and the only guide to getting out of danger.
The requirement to provide excellent haptic information, especially for manipulation, is more obvious. Sensing forces and torques is straightforward: excellent sensors exist, it is only necessary to provide them, and to provide the complementary forcers and torquers as feedback at the operator's interface, i.e., display. Touch is much harder, inasmuch as we have neither adequate sensors nor adequate displays, but its importance is perhaps somewhat overrated: gloves can be an initial annoyance, but with practice all but the most delicate tasks can be accomplished while wearing them. For the perhaps small but important set of specific tasks that do suffer irrecoverably for lack of touch sensing and display, development of adequate if not excellent systems ought to follow the cyclic development path described in detail in Section 2.
That leaves us with the vision problem. It is a big problem in large part because we have deluded ourselves into believing that television is vision. Far from it: NTSC-quality TV was ingeniously designed in the 1930s to exploit multiple bugs in the human eye-brain system to achieve excellent apparent quality with much less transmitted and displayed information than would be required absent the bugs [10]. Fortunately substantially higher quality is around the corner in the form of HDTV and the possibilities it enables to achieve high quality stereoscopic video. In fact, it can be routinely achieved even before HDTV becomes routine by employing high-resolution high-refresh-rate (120 Hz) computer monitors for the displays. The high refresh rate allows time-domain alternation of the left- and right-eye perspectives without perceptible flicker and without loss of pixel count - a problem with spatial multiplexing approaches.
With present computer monitor and near-future HDTV technologies for infrastructure, high fidelity stereoscopic video will be available in flavors palatable to almost all operators of remote robots. The few who are "stereo adepts", able comfortably to fuse stereo pairs more-or-less arbitrarily located within their field of view, will be happy with the "basic virtual reality approach", wherein the display system geometry mimics perfectly the interocular separation, convergence, and other geometrical and optical parameters of the human eye pair. Others, less adept, will benefit from what the author has called "just enough reality" or "kinder gentler stereo", in contrast to the virtual reality approach [11]. This alternative approach recognizes that "virtual reality sickness" is due primarily to conflicts between the "geometrically correct" cues received and processed by the primary visual modality and other cues that are missing due to the imperfection or absence of complementary modalities. The classic example is the conflict between convergence and accommodation (focus) when left and right eyes are both focused on the display screen surface, but stereo disparity dictates that they be converged to a plane in front of or behind the physical screen. The "just enough reality" approach is simple and effective: reduce the interocular separation, hence the on-screen disparity between left- and right-eye images, to the minimum that is sufficient to stimulate stereoscopic perception, i.e., an adequate understanding of "frontness" and "backness" in the scene. This minimum is remarkably small: for an arms-length working distance and for most stereo-normal individuals, an interocular separation of 2-4 mm is adequate, compared to the typical 60-65 mm physical interocular separation in adults.
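To make the size of that reduction concrete, the following sketch (an illustrative calculation, not from the paper) uses simple pinhole stereo geometry with convergence at the screen plane to compare the disparities produced by a full 63 mm baseline and a 3 mm "microstereopsis" baseline for an arm's-length task; the working distances and all names below are illustrative assumptions. Because disparity is linear in the baseline, the reduced separation shrinks the convergence-accommodation conflict by the same factor of roughly twenty, which is the comfort-versus-geometric-fidelity tradeoff described above.

    import math

    def screen_parallax_mm(baseline_mm, screen_dist_m, point_dist_m):
        """On-screen parallax (mm) of a point at 'point_dist_m' when two viewpoints
        separated by 'baseline_mm' are converged on a screen at 'screen_dist_m'
        (simple pinhole geometry; positive means the point lies behind the screen)."""
        return baseline_mm * (point_dist_m - screen_dist_m) / point_dist_m

    def angular_disparity_arcmin(baseline_mm, screen_dist_m, point_dist_m):
        """Disparity relative to the screen plane, in minutes of arc:
        approximately baseline * (1/screen_dist - 1/point_dist)."""
        b = baseline_mm / 1000.0
        rad = b * (1.0 / screen_dist_m - 1.0 / point_dist_m)
        return math.degrees(rad) * 60.0

    if __name__ == "__main__":
        screen, near, far = 0.5, 0.4, 0.7      # metres; illustrative arm's-length task
        for b in (63.0, 3.0):                  # full interocular separation vs. microstereopsis
            print(f"baseline {b:4.1f} mm: "
                  f"near point {angular_disparity_arcmin(b, screen, near):+7.1f} arcmin, "
                  f"far point {angular_disparity_arcmin(b, screen, far):+7.1f} arcmin, "
                  f"on-screen parallax (far) {screen_parallax_mm(b, screen, far):+6.2f} mm")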
This sort of reduction of interocular separation has been employed in various stereo-microscopy applications, wherein the interocular separation is reduced below its normal value in proportion to the reduced distance between object and objective lens in a microscope relative to, e.g., the "arms length" working distance of a normal bench task. In contrast, "just enough reality" uses a reduced rather than a scaled value of interocular separation to achieve comfort, i.e., the absence of virtual reality sickness, at the expense of precise geometrical reality. True scaling, such as is employed in geometrically correct stereo-microscopy, is a specific aspect of general scaling principles that are discussed in the following section, particularly as they apply to issues that are anticipated in the near future as we increasingly progress toward mini-, micro-, nano-, and molecular-scale robots.
4. SCALING THEORY IN ROBOTICS
The final topic in this looking-toward-the-future revisiting of the sense-think-act paradigm relates to fundamental and engineering issues that arise as we move in the direction of robots that are much larger or much smaller than "everyday scale". Actually much larger is not much of a problem, inasmuch as when roboticists build large structures they either are mechanical engineers or they employ mechanical engineers, and mechanical engineers are invariably well schooled in how to design large structures that resist scale-related disasters, e.g., collapse under their own weight. On the other hand, we have hardly any experience, and consequently hardly any intuition, concerning the very small sorts of robotic devices being contemplated for applications as varied in nature and scale as, for example, exploring and treating the ailments of the human body from the inside-out, and monitoring the earth's atmosphere via some 10^10 robotic sensor nodes distributed at a density of one per cubic kilometer throughout the atmosphere over the surface of the earth to an altitude of 20 km.
Two somewhat counterintuitive generalities emerge [12]: (1) big is weak, small is strong, i.e., it is large structures that collapse under their own weight, large animals that break their legs when they stumble, etc., whereas small structures and animals are practically unaffected by gravity, and (2) horses "eat like birds" and birds "eat like horses", i.e., a large animal or machine stores relatively larger quantities of energy and dissipates relatively smaller quantities of energy than a small animal or machine. The critical consequence of (2) is that the smaller the robot the smaller its range and its operating time between refuelings. Applying simple scaling models to contemplated small robots, the conclusion is quickly reached that robots at currently contemplated small scales will not be able to store enough energy to complete any sensible mission. They will rather have to forage for fuel in the environment, just as microorganisms have to forage for their nutrients in the "soup" in which they must live. Thus we understand why microorganisms live in liquid environments vs. on dry surfaces or in the atmosphere, and we also understand why long distance transportation is most economically provided by a small number of very large vessels vs. a large number of very small vessels.
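A minimal numerical sketch of generality (2), not taken from the paper: if stored energy is assumed to scale with volume (L^3) while dissipated power is assumed to scale roughly with surface or cross-sectional area (L^2), then endurance scales roughly as L, and range falls accordingly as a robot is miniaturized. The exponents and reference numbers below are illustrative assumptions, not measurements.

    def endurance_hours(length_m, ref_length_m=1.0, ref_endurance_h=10.0,
                        energy_exp=3.0, power_exp=2.0):
        """Endurance under simple geometric scaling: stored energy ~ L**energy_exp,
        dissipated power ~ L**power_exp, so endurance ~ L**(energy_exp - power_exp).
        The reference point (a 1 m robot lasting 10 h) and both exponents are
        illustrative assumptions for the purpose of this sketch."""
        scale = length_m / ref_length_m
        return ref_endurance_h * scale ** (energy_exp - power_exp)

    if __name__ == "__main__":
        for L in (1.0, 0.1, 0.01, 0.001):   # 1 m down to 1 mm characteristic size
            print(f"L = {L*1000:7.1f} mm  ->  endurance ~ {endurance_hours(L):8.3f} h")

Under these assumptions a millimetre-scale robot retains only about a thousandth of the endurance of its metre-scale counterpart, which is the quantitative sense in which it must forage for energy rather than carry it.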
Quantitative scaling considerations, as outlined in [12], lead to a common set of questions that have to be answered in the design of sensing, thinking, acting, and communicating modules and in the (by definition) robotic systems composed of these modules. The first four relate primarily to the context for contemplated robotic systems, whereas the second four relate explicitly to the quantitative scaling relationships.
(i) What limits on precision, rate, range, etc., are imposed by the fundamental properties of materials and characteristics of nature?
(ii) Which of these have their origin in fundamental sources of noise, i.e., thermodynamics, quantization, and the uncertainty principle?
(iii) Which are practically unavoidable sources of technical noise, e.g., the local cosmological environment, the weather, etc.?
(iv) Which are avoidable - in space or time - sources of technical noise, e.g., man-made electromagnetic pollution, traffic-induced ground vibration, etc.?
(v) How are the above fundamental limitations approached as the "robots" become smaller, the number of robots in a team or a swarm becomes larger, etc.?
(vi) Which have hard limits - implying that they can be reached with finite resources - vs. which are approached asymptotically - implying a law-of-diminishing-returns on increased expenditure of resources?
(vii) What are the limitations related to engineering and economic realities?
(viii) How might these be overcome by "thinking out of the box", e.g., schemes for "foraging" or "scavenging" energy vs. packaging and carrying it?
It is important to note that the relationship between a robot's scale and the value assumed by an environmental parameter that influences robot performance can make change-of-scale in a particular direction either advantageous or disadvantageous, depending on the performance aspect that is most crucial in a given application.
5. CONCLUSIONS
In keeping with the Robotic Sensing Workshop (ROSE-2003) theme of "Sensing and Perception in 21st Century Robotics", the paper revisits in a future-oriented spirit the "sense-think-act" paradigm that has been central to the practical architecture and philosophical foundations of robotics since its emergence as an embryonic academic discipline around 1980. The first step is to append "-communicate" to "sense-think-act", in recognition that whereas fixed-base robots have evolved into economically valuable industrial workhorses, the excitement of the field is clearly with mobile robots, the more mobile the better.
In Section 2 the paper examines sensor development from a high level perspective: the process seems to work best when sensors and the corresponding displays are developed in a cycle whereby an improvement in either places pressure on the other to improve correspondingly so as to take advantage of the first.
In Section 3 the model of Section 2 is expanded to encompass issues relating to telepresence and teleoperation. It is argued that these are not disreputable interim stages that must be passed through on the way to robot autonomy, but that there are rather excellent reasons to devote effort and resources to optimizing this aspect of man-machine interaction.
Section 4 summarizes qualitatively some recent quantitative studies of scaling relations in robotics, particularly relating to small and very small robots intended to carry out sensing missions. It is specifically noted that small robots cannot in principle carry as much fuel relative to their energy dissipation as can large robots, consequently at some limit of small size they will have to extract energy from the environment.
REFERENCES
[1] See, for example, http://www.ifi.unizh.ch/groups/ailab/teaching/semi2000/Classical-AI.pdf
[2] The concept of "high fidelity teleoperation" was articulated by Gregg Podnar, personal communication.
[3] For a variety of more precisely explained specific alternatives see http://www.users.globalnet.co.uk/~courtman/timeexpan.htm and http://www.batsound.com/usguidet.html
[4] Illustrated for vegetation mapping at http://www.fireimaging.com/imaging/veg/b/
[5] Illustrated with a false-color mapping of altitude on Mars at http://svs.gsfc.nasa.gov/vis/a000000/a000700/a000766/index.html
[6] M. Siegel, "Tactile Display Development: The Driving Force for Tactile Sensor Development", Workshop on Haptic Audio Visual Environments and Applications (HAVE-2002), 2002 November 17-18, Ottawa ONT CA.
[7] See http://www.remotec-andros.com, follow links to products > controllers.
[8] See http://ranier.hq.nasa.gov/telerobotics_page/internetrobots.html, follow links to NASA Telerobotics Program Resources > JPL Robotics.
[9] See http://vered.rose.utoronto.ca/projects/telerobotics.html for practical definitions of teleoperation, telerobotics, and telepresence, and a collection of good references and links.
[10] See http://www.ntsc-tv.com/ntsc-index-02.htm
[11] M. Siegel and S. Nagata, "Just Enough Reality: Comfortable 3D Viewing via Microstereopsis", IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, pp. 387-396, 2000.
[12] M. Siegel, "Scaling Issues on Robot-Based Sensing Missions", IEEE Instrumentation and Measurement Technology Conference (IMTC-2003), 2003 May 20-22, Vail CO USA.