THE ARCHITECTURE OF A VOICE INTERACTIVE EVALUATION SYSTEM AND ITS APPLICATION TO RADIO FREQUENCY SELECTION IN A MILITARY AIRCRAFT

Captain Claude Y. Laporte
Assistant Professor
Collège militaire royal de Saint-Jean
Saint-Jean-sur-Richelieu, Canada

and

Georges-Émile April
Associate Professor
École Polytechnique de Montréal
Montréal, Canada

ABSTRACT

This paper describes the architecture of a voice interactive evaluation system designed around the STD bus. The evaluation system is composed of a microcomputer module, a speech synthesizer module, a speech recognition module and an application or interface module. Under the control of a host computer, the voice interactive system is used in a laboratory experiment that simulates the frequency selection of the VHF and UHF radios of a military aircraft. The results show that tracking a target while performing radio frequency selection with a voice interactive system is more precise than with the manual frequency selection method.

Keywords: microprocessor, voice command, speech synthesis, ergonomics, instrumentation.

INTRODUCTION

Flying a high-performance aircraft is more demanding than ever. In the past, high-performance aircraft were manned by two crewmen: a pilot and an assistant responsible for the navigation and communication functions. Today, the pilot is left alone to perform all these tasks. In addition to the fact that modern aircraft are complex to fly, the pilot is threatened by high-performance detection systems (radars, satellites) and sophisticated anti-aircraft weapons. We propose a way to help the pilot by allowing him to command some of the instruments of his aircraft using a voice interactive system. The prototype system controls the frequency selection of the UHF and VHF radios of an aircraft.
[Figure 1 block labels: memory and processor blocks (RAM, EPROM, serial port); speech synthesis (PCM, LPC, CVSD, phoneme)]
An evaluation system has three desirable qualities: it should be flexible, easy to use and inexpensive. Ideally, it should also be small and easily transportable. The proposed speech evaluation system is designed around a bus widely used within industry: the STD bus. This bus was selected because it can accommodate most 8-bit microprocessors, its boards have small physical dimensions and a wide diversity of functional boards is available.

THE STD BUS

Briefly, the STD bus can accommodate most 8-bit microprocessors and even the 8-bit versions of the more powerful 16-bit microprocessors, like the 68008 and the 8088. The STD bus is presently being redefined to handle 16-bit CPUs. CMOS boards are now available and, with the advent of 74HCXXX chips, the temperature range will be extended from 0 to +55 Celsius to -40 to +85 Celsius.

The 56-pin bus is divided into five groups: 8 pins for the data bus, 16 pins for the address bus, 22 pins for the control bus, 6 pins for the logic power bus and 4 pins for the auxiliary power bus. The power bus can accommodate both digital and analog power distributions. All digital signals on the buses are TTL compatible.

Physically, the boards are small: 16.5 cm by 11.4 cm (6.5" by 4.5"). In addition to the wide diversity of CPU boards, there is a wide variety of support modules: RAM, EPROM, EEPROM, bubble memory, analog-to-digital (A/D) and digital-to-analog (D/A) converters, digital input/output ports, serial communication interfaces, etc. The modules can be purchased separately or as a development system with floppy or hard disks.
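
The pin allocation quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and function names are ours, and the pin counts are taken directly from the text.

```python
# Pin groups of the STD bus as listed in the text (names are illustrative).
STD_BUS_PIN_GROUPS = {
    "data bus": 8,
    "address bus": 16,
    "control bus": 22,
    "logic power bus": 6,
    "auxiliary power bus": 4,
}


def total_pins(groups):
    """Sum the pins of all groups; the text states a 56-pin bus."""
    return sum(groups.values())


def addressable_bytes(address_pins):
    """Number of distinct addresses reachable with the given address lines."""
    return 2 ** address_pins


if __name__ == "__main__":
    print("total pins:", total_pins(STD_BUS_PIN_GROUPS))
    print("address space:", addressable_bytes(STD_BUS_PIN_GROUPS["address bus"]), "bytes")
```

The five groups add up to the 56 pins of the connector, and the 16 address lines give the 64-Kbyte address space typical of 8-bit microprocessors.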
[Figure 1 block labels: speech recognition; application module; address bus, data bus, control bus, power bus]
Figure 1: Architecture of the voice interactive system

DESCRIPTION OF THE VOICE INTERACTIVE SYSTEM

The voice interactive system is composed of the following modules: a microcomputer board, speech synthesis boards, a speech recognition board and an application board. An additional memory board can be added if necessary. Figure 1 illustrates the architecture of the voice interactive evaluation system.

The microcomputer module

The microcomputer module uses an 8-bit 6809 CPU developed by Motorola. The 6809 was selected because of its powerful instruction set and also because of previous in-house experience with Motorola's CPUs and interface circuits. The board holds a 24-pin socket that can accommodate a 2-Kbyte scratchpad RAM; another 24-pin socket is available for a 2-Kbyte EPROM. This EPROM can be used to store the application program. The serial interface can be used to communicate with a host computer, a terminal or a printer. During the development phase, a host computer can be used to communicate with the STD bus modules; during field application, a terminal can be used to monitor the experiment.

The speech synthesizer modules

In order to evaluate different speech synthesis techniques, four modules were designed and assembled. The first module, illustrated in figure 2, uses the Pulse Code Modulation (PCM) technique. It is designed in such a way that speech can be digitized and synthesized at three different sampling frequencies: 4, 6 and 8 kHz. The module is built around an analog-to-digital (A/D) converter, used to digitize the speech signal, and a digital-to-analog (D/A) converter, used to reconstruct the speech signal. A pair of low-pass filters (MC145414) is used to filter undesirable noise.

Figure 2: PCM Module

The second module uses a Continuously Variable Slope Delta (CVSD) modulation circuit, the HC-55564, manufactured by Harris Semiconductor. The module can encode speech at different bit rates since it is equipped with an on-board sampling clock circuit. It is also equipped with a pair of low-pass filters identical to those of the PCM module.

The third speech module, illustrated in figure 3, uses the TMS5220 Linear Predictive Coding (LPC) speech processor manufactured by Texas Instruments. The module can work in two operating modes. In the first mode, the TMS5220 synthesizes speech by reading the coefficients stored in a 128-Kbit ROM designed by Texas Instruments, the TMS6100 Voice Synthesis Memory. The TMS5220 speech processor may directly access up to 16 TMS6100 circuits with no external hardware required. Such an arrangement would provide about 30 minutes of speech. The second operating mode allows the host processor to fetch the vocabulary to be synthesized from its own memory. The inconvenience of the second approach is that the host CPU must service the speech processor by performing a series of reads and stores of the LPC coefficients at a rate of approximately 1200 bits per second. Conversely, in the first approach, the host processor is interrupted only when the speech processor has completed the synthesis of a word or a frame.

Figure 3: LPC Module
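
The figure of about 30 minutes of speech can be reproduced with a short calculation. The sketch below assumes that the rate of approximately 1200 bits per second quoted for the host-fed mode also applies to speech read from the ROMs, and that one Kbit equals 1024 bits; both are our assumptions, as are the function and parameter names.

```python
def speech_minutes(rom_count, rom_kbits, bit_rate_bps, bits_per_kbit=1024):
    """Duration of stored speech, in minutes, for a bank of identical ROMs.

    bit_rate_bps is the average LPC data rate (assumed, see lead-in).
    """
    total_bits = rom_count * rom_kbits * bits_per_kbit
    return total_bits / bit_rate_bps / 60.0


if __name__ == "__main__":
    # 16 TMS6100 circuits of 128 Kbits each, read at about 1200 bits per second.
    minutes = speech_minutes(rom_count=16, rom_kbits=128, bit_rate_bps=1200)
    print(f"about {minutes:.0f} minutes of speech")
```

Under these assumptions the result is slightly over 29 minutes, which is consistent with the value given in the text.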

Finally, the last speech module uses the CMOS SSI 263 phonetic speech synthesizer circuit manufactured by Silicon Systems. It can be operated in two modes. In the first mode, at a low bit rate, the speech quality is comparable to the speech produced by the Votrax SC-01A chip. In this operating mode the host processor transmits only the codes of the individual phonemes to be synthesized, which explains the low bit rate of approximately 70 bits per second. In the second operating mode, it is possible to dynamically program the contents of the registers of the 263. The circuit has a set of five 8-bit registers that allow software control of speech rate, pitch movement rate, amplitude, articulation index, vocal tract filter response, and phoneme selection and duration. In this mode, the bit rate can be approximately 400 bits per second. Another feature of the 263, which the SC-01 did not have, is an additional set of phonemes that allows the 263 to synthesize speech not only in English but also in French and German.
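
To compare the load that each technique places on the host memory, the bit rates quoted above can be converted into bytes of data per minute of speech. The sketch below uses only the three rates stated in the text; the labels and the function name are ours.

```python
# Approximate bit rates quoted in the text, in bits per second.
BIT_RATES_BPS = {
    "SSI 263, phoneme codes only": 70,
    "SSI 263, register programming": 400,
    "TMS5220, LPC coefficients": 1200,
}


def bytes_per_minute(bit_rate_bps):
    """Bytes of speech data consumed in one minute at the given bit rate."""
    return bit_rate_bps * 60 // 8


if __name__ == "__main__":
    for label, rate in BIT_RATES_BPS.items():
        print(f"{label}: {bytes_per_minute(rate)} bytes per minute")
```

With these rates, one minute of phoneme-coded speech needs a few hundred bytes, whereas one minute of LPC speech needs several Kbytes.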

The speech modules have two common features. First, each speech synthesizer circuit is interfaced with the host CPU (6809) through an identical circuit: a Peripheral Interface Adapter (PIA) chip. Second, all speech modules have two outputs: a low-level output that can drive an amplifier and a high-level output that can deliver, using an LM-386 chip, one watt to an 8-ohm speaker.

Figure 4: Speech recognition block diagram

The speech recognition module

The speech recognition module is built around the VRC100-2 Voice Recognition Chip set manufactured by Interstate Electronics Corporation. The chip set consists of a 16-channel audio spectrum analyzer, the ASA-16, and a memory circuit, an EPROM, containing the processing algorithms. Figure 4 illustrates the block diagram of the speech recognition module. The circuit has been assembled on two STD bus boards (see figure 5): one board for the analog circuitry and one board for the digital circuitry.

As shown in figure 4, the analog signal coming from a microphone is first amplified by a programmable amplifier; a speech equalizer then compensates for the roll-off. The signal is then fed to a 16-band audio spectrum analyzer, the ASA-16 developed by Interstate. An 8-bit analog-to-digital converter digitizes the signal coming from the ASA-16 analyzer. All these operations are performed under the control of an 8-bit microprocessor, the 6803, operating at a frequency of 2 MHz.

Figure 5: Speech recognition module

Once the signal has been stored in RAM, the 6803 performs the operations related to the training or the recognition of words. All the algorithms that perform these operations are stored in a 4-Kbyte EPROM. The firmware is composed of a set of 16 commands. A self-test command checks the RAM circuits and the audio spectrum analyzer. If this test is successful, it implies that most of the digital circuits are operational (CPU, address decoders, EPROM, ...). The train command creates a template of 67 bytes per word of vocabulary. The recognize command puts the system in an active mode where each spoken utterance is given an identification number and a hit score. Finally, a set of utility commands allows the designer to modify parameters such as the input amplifier gain and the reject threshold. The templates generated during a training session can also be saved and retrieved using the upload/download commands.
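
The role of the hit score and of the reject threshold can be illustrated as follows. This is a minimal sketch and not the firmware of the chip set: the matching algorithm itself is not described here, so the hit scores are taken as given, and we assume that a higher score means a better match. The function name and the score scale are hypothetical.

```python
def classify(hit_scores, reject_threshold):
    """Pick the best-matching vocabulary word, or reject the utterance.

    hit_scores maps a word identification number to its hit score
    (assumption: higher is better). Returns (word_id, score), or None
    when no score reaches the reject threshold.
    """
    if not hit_scores:
        return None
    word_id = max(hit_scores, key=hit_scores.get)
    score = hit_scores[word_id]
    if score < reject_threshold:
        return None  # utterance rejected: no template is close enough
    return word_id, score


if __name__ == "__main__":
    print(classify({1: 40, 2: 85, 3: 60}, reject_threshold=70))
    print(classify({1: 40, 2: 55}, reject_threshold=70))
```

Raising the threshold reduces false recognitions at the cost of more rejected utterances, which is presumably why it is left adjustable by the designer.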

All the communications between the speech recognition module and the STD bus microprocessor (6809) are done through a parallel interface circuit developed by Motorola: the 6821 Peripheral Interface Adapter (PIA). The communication protocol is simple: each time one processor wants to talk to the other processor, an interrupt is generated. As an example, when the host processor (6809) wants the speech recognition module to perform a self-test, it interrupts the 6803 processor through the PIA, then sends the appropriate data to initiate a self-test command. Once the 6803 has completed the self-test sequence, it interrupts the 6809 and reports the status of the speech circuitry.
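
The self-test exchange described above can be modelled in software. The following sketch is a simulation only: the command and status codes are hypothetical, the class names are ours, and an interrupt is represented by a direct call to the handler of the receiving processor.

```python
SELF_TEST = 0x01  # hypothetical command code
STATUS_OK = 0x00  # hypothetical status code


class PiaPort:
    """One direction of the PIA link: writing a byte interrupts the receiver."""

    def __init__(self):
        self.handler = None  # interrupt routine of the receiving processor

    def send(self, byte):
        self.handler(byte)


class Recognizer:
    """Stands in for the 6803 of the speech recognition module."""

    def __init__(self, to_host):
        self.to_host = to_host

    def on_interrupt(self, command):
        if command == SELF_TEST:
            # A real module would test the RAM and the spectrum analyzer here.
            self.to_host.send(STATUS_OK)


class Host:
    """Stands in for the 6809 of the microcomputer module."""

    def __init__(self):
        self.last_status = None

    def on_interrupt(self, status):
        self.last_status = status


def run_self_test():
    to_recognizer, to_host = PiaPort(), PiaPort()
    host = Host()
    recognizer = Recognizer(to_host)
    to_recognizer.handler = recognizer.on_interrupt
    to_host.handler = host.on_interrupt
    to_recognizer.send(SELF_TEST)  # the host interrupts the recognizer
    return host.last_status        # set when the recognizer interrupts back
```

Calling `run_self_test()` returns the status reported by the simulated recognition module. The same two-way pattern would apply to the train and recognize commands.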

For our purposes, we have used only two RAM circuits to store the reference templates. This means that we can accommodate 50 words of vocabulary. This was felt to be quite acceptable for our field evaluations.
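
The memory needed for the reference templates follows from the template size given earlier (67 bytes per word). In the sketch below, the size of each RAM circuit on the recognition board is an assumption: the text does not state it, and the 2-Kbyte value is borrowed from the scratchpad RAM of the microcomputer board. All names are ours.

```python
TEMPLATE_BYTES = 67            # from the text: one template per vocabulary word
VOCABULARY_WORDS = 50          # from the text
ASSUMED_RAM_BYTES = 2 * 2048   # assumption: two 2-Kbyte RAM circuits


def template_storage(words, template_bytes=TEMPLATE_BYTES):
    """Bytes needed to hold one template per vocabulary word."""
    return words * template_bytes


def max_words(ram_bytes, template_bytes=TEMPLATE_BYTES):
    """Upper bound on vocabulary size if all of the RAM held templates."""
    return ram_bytes // template_bytes


if __name__ == "__main__":
    print("bytes for 50 words:", template_storage(VOCABULARY_WORDS))
    print("upper bound with assumed RAM:", max_words(ASSUMED_RAM_BYTES))
```

Fifty templates occupy 3350 bytes. If the assumed RAM size is correct, the remaining space would be available to the firmware as working memory, but this is a guess and not a documented fact.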

The application module

The speech evaluation system is presently used in a project to control radio frequency selection aboard a military jet aircraft. Ideally, essential controls and displays are intelligently positioned. However, space limitations often make this impossible to achieve. This is particularly true in old aircraft like the CT-133, where additional avionics systems have been added to meet new requirements.

In the CT-133 jet aircraft, radios are controlled through remote controllers. These controllers are located in the cockpit, while the radio receivers and transmitters are located in the nose of the aircraft. One constraint of the project was that we were not allowed to make any modification to either the radios or the remote controllers. We had to tap the communication lines between the controller and the radio in order to get the data.

The frequency selection of the VHF radio set is done through a set of 2-out-of-5 encoders/decoders; there is one encoder/decoder per decade. The data lines are normally floating at +20 volts. When a frequency is selected, 2 out of 5 lines are grounded in the remote controller. This data is detected at the input of the radio. We have used 74C914 level translators to interface the 0/+20 volt data lines from the controllers to the TTL levels of the microcomputer interface chip. Once the data are TTL compatible, a parallel interface chip, the MC6821 from Motorola, is used to read all the data lines coming from the remote controller.
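
Decoding one decade of the VHF selection can be sketched as follows. The assignment of line pairs to digits used by the actual radio is not given in this paper, so the table below is purely hypothetical (the ten pairs are simply numbered in lexicographic order); only the principle, two grounded lines out of five per digit, comes from the text.

```python
from itertools import combinations

# Hypothetical code table: the ten possible pairs of grounded lines are
# numbered 0 to 9 in lexicographic order. The real radio may differ.
CODE_TABLE = {pair: digit for digit, pair in enumerate(combinations(range(5), 2))}


def decode_decade(lines):
    """Decode one decade from five TTL levels (0 = grounded, 1 = floating).

    Raises ValueError if the pattern is not a valid 2-out-of-5 code,
    which also gives a simple error check on the tapped lines.
    """
    if len(lines) != 5:
        raise ValueError("a decade uses exactly five data lines")
    grounded = tuple(i for i, level in enumerate(lines) if level == 0)
    if len(grounded) != 2:
        raise ValueError("exactly two lines must be grounded")
    return CODE_TABLE[grounded]


if __name__ == "__main__":
    print(decode_decade([0, 0, 1, 1, 1]))
    print(decode_decade([1, 1, 1, 0, 0]))
```

A pattern with fewer or more than two grounded lines is rejected, so a transient read during a change of frequency would not be mistaken for a valid digit.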

The interface for the UHF radio set is different. In this case, data from the remote controller is transmitted serially to the radio, using 9614 differential line drivers. The remote controller continuously transmits pulse trains of 32 bits together with a clock signal. At the input of the radio, the data bits are clocked into a register and decoded. To acquire the data, both the data lines and the clock lines have to be read. Figure 6 illustrates the interface circuit for the data lines only; an identical circuit is needed for the clock signal. The signals coming from the controller are fed to 9620 differential line receivers. The TTL signals from the 9620 are then read by a parallel interface chip. The program that reads the data is interrupt driven: on every active clock transition the CPU is interrupted, and the logic value of the data line is read and stored.

Figure 6: Interface circuit for the UHF radio
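
The interrupt-driven acquisition of the UHF data can be modelled as a shift register filled one bit per clock transition. The sketch below is a simulation under assumptions that the text does not settle: bits are taken most significant bit first, and the capture is assumed to start on a frame boundary (frame synchronization is not described here). The class and method names are ours.

```python
FRAME_BITS = 32  # from the text: pulse trains of 32 bits


class FrameCapture:
    """Collects serial data bits into 32-bit frames, one bit per clock edge."""

    def __init__(self):
        self._shift = 0    # bits received so far in the current frame
        self._count = 0    # number of bits received in the current frame
        self.frames = []   # completed frames, as integers

    def on_clock_edge(self, data_bit):
        """Interrupt routine: called on every active clock transition."""
        self._shift = ((self._shift << 1) | (data_bit & 1)) & 0xFFFFFFFF
        self._count += 1
        if self._count == FRAME_BITS:
            self.frames.append(self._shift)
            self._shift = 0
            self._count = 0


if __name__ == "__main__":
    capture = FrameCapture()
    word = 0xA5A5A5A5
    for position in range(FRAME_BITS - 1, -1, -1):  # most significant bit first
        capture.on_clock_edge((word >> position) & 1)
    print([hex(frame) for frame in capture.frames])
```

Because the controller transmits continuously, a real program would also have to locate the start of a frame before decoding it.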

Description of the test bench

The test bench was assembled to partially reproduce the aircraft environment. It is divided into two parts: first, the voice interactive evaluation system; second, a host computer. Figure 7 illustrates the block diagram of the set-up. On the right is the voice interactive system with its VHF and UHF remote controllers connected to the application module. Also shown are a headset and a microphone hooked to the speech synthesizer and the speech recognition modules. This system is linked to the host computer through a serial interface port on the microcomputer module.

Figure 7: Block diagram of the test bench

On the left side of figure 7 is the host computer; it is an 8-bit microcomputer manufactured by Southwest Technical Products Corporation. It is composed of a processor module, main and secondary memories, a system monitor, a serial interface port to control a graphic screen, and an interface module to measure the pilot's stress level and the position of the stick. Finally, a high-resolution graphic screen (1024x1024) is used to display a moving target and the position of the tracking joystick.

Description of the experiment

Before describing the experiment itself, we have to describe the layout of the instruments of the aircraft selected for this project. On the left side of the pilot is a lever used to control the speed of his engine. He controls the movements of the airplane with his two feet on the rudder pedals and with his right hand on the stick. In this aircraft, the UHF and VHF remote controllers are located on the right side of the cockpit. When the pilot wants to change radio frequency, he has to perform a sequence of movements. First, he has to move his left hand from the throttle to the stick; then he moves his right hand from the stick to one of the remote controllers, selects the proper radio frequency and comes back to his normal flying position.

For a first experiment, volunteers were asked to perform a series of ten frequency selections. The host computer, using the speech synthesizer of the voice interactive system, instructed the subjects to manually select a frequency. At the same time, the subject was tracking a moving target with a joystick. The host computer updated the position of the target ten times per second. It also measured the position of the joystick and the value of the relative stress level of the subject at the same rate. During a second experiment, the subjects were instructed to perform a series of ten frequency selections using the voice interactive system.
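
The measurements taken at ten samples per second can be summarized by an average and a standard deviation, the two statistics reported later in Table 1. The sketch below shows one possible way of doing so. The definition of the tracking error is not given in this paper, so we assume the absolute difference between target and joystick positions along one axis, and we use the population standard deviation; both choices, and all names, are assumptions.

```python
from statistics import mean, pstdev

SAMPLE_RATE_HZ = 10  # from the text: positions are updated ten times per second


def tracking_errors(target, joystick):
    """Absolute difference between target and joystick, sample by sample."""
    return [abs(t - j) for t, j in zip(target, joystick)]


def summarize(errors):
    """Return (average, standard deviation) of a list of error samples."""
    return mean(errors), pstdev(errors)


if __name__ == "__main__":
    # Four made-up samples, i.e. 0.4 second of data at 10 Hz.
    target = [0, 10, 20, 30]
    joystick = [0, 8, 23, 30]
    average, deviation = summarize(tracking_errors(target, joystick))
    print(f"average = {average:.2f}, standard deviation = {deviation:.2f}")
```

The sample values above are invented for the illustration and are not experimental data.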

Presentation of the Results

Figure 8 shows the measurements performed during a 45-second time frame of the first experiment. The continuous line illustrates the tracking error; the dotted line illustrates the relative stress level of the subject. One can note that the tracking error increases as soon as the subject performs a frequency selection task. The duration of the task is indicated by a series of "-" above the time axis. The relative stress level also increases when the subject is asked to perform a new frequency selection. As soon as the frequency selection is completed, the tracking error and the relative stress level decrease.

Figure 9 shows the same measurements performed during a voice interactive session. We can notice that the tracking error is lower than in the first experiment and that it does not increase when the subject is asked to do a frequency selection. We can also note an increase in the relative stress level of the subject during a voice command operation. This may be explained by the fact that this was the first time the subjects were exposed to a voice interactive system. Table 1 shows the results of the measurements performed on the three subjects.

[Plots of tracking error and relative stress level versus time (sec)]
Figure 8: Results of the measurements during manual frequency selection
Figure 9: Results of the measurements during a voice interactive session

                                                Manual       Voice
                                                Selection    Command
Subject 1
  Tracking error:         average                  48           23
                          standard deviation       69           21
  Relative stress level:  average                 617          639
                          standard deviation       67           40
Subject 2
  Tracking error:         average                  43           19
                          standard deviation       43            9
  Relative stress level:  average                 529          510
                          standard deviation       23           25
Subject 3
  Tracking error:         average                  21           15
                          standard deviation       16            5
  Relative stress level:  average                 755          776
                          standard deviation       53           52

Table 1: Table of results

CONCLUSIONS

The voice interactive system provides the experimenter with a flexible and portable tool to evaluate potential voice applications. It has been used in a project to evaluate the feasibility of voice command for radio frequency selection. Results show that tracking a target while performing radio frequency selection with a voice interactive system is more precise than with the manual frequency selection method, while the stress level is not significantly changed.
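
The improvement in tracking precision can be quantified from the average tracking errors of Table 1. The sketch below computes the relative reduction for each subject; the input values are those of the table, in the order in which the subjects appear, and the names are ours.

```python
# Average tracking error per subject from Table 1: (manual selection, voice command).
TRACKING_ERROR_AVERAGE = [(48, 23), (43, 19), (21, 15)]


def percent_reduction(manual, voice):
    """Relative reduction of the tracking error, in percent of the manual value."""
    return 100.0 * (manual - voice) / manual


if __name__ == "__main__":
    for subject, (manual, voice) in enumerate(TRACKING_ERROR_AVERAGE, start=1):
        reduction = percent_reduction(manual, voice)
        print(f"subject {subject}: {reduction:.0f} % lower with voice command")
```

For every subject the average tracking error is lower with voice command, by roughly 50 % for two subjects and by roughly 30 % for the third. With only three subjects, these figures should be read as indicative.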