Planning and Acting under Uncertainty: A New Model for Spoken Dialogue Systems
ABSTRACT Uncertainty plays a central role in spoken dialogue systems. Stochastic
models such as the Markov decision process (MDP) have been used to model the
dialogue manager, but the partially observable system state and user intention
hinder a natural representation of the dialogue state. An MDP-based system
degrades quickly when uncertainty about a user's intention increases. We propose
a novel dialogue model based on the partially observable Markov decision process
(POMDP). We use hidden system states and user intentions as the state set,
parser results and low-level information as the observation set, and domain
actions and dialogue repair actions as the action set. Here the low-level
information is extracted from different input modalities, including speech,
keyboard, mouse, etc., using Bayesian networks. Because of the limitations of
exact algorithms, we focus on heuristic approximation algorithms and their
applicability to POMDPs for dialogue management. We also propose two methods for
grid point selection in grid-based approximation algorithms.

Bo Zhang*, Qingsheng Cai
Department of Computer Science & Technology
University of Science & Technology of China
Hefei 230027, P.R. China

Jianfeng Mao*
Department of Automation
Tsinghua University
Beijing 100084, P.R. China

Baining Guo
Microsoft Research, China
3F Sigma Center, 49 Zhichun Rd.
Beijing 100080, P.R. China

* This work was performed while these authors were visiting Microsoft Research, China.

1 INTRODUCTION
Uncertainty plays a central role in spoken dialogue systems. The system may be uncertain about a user’s intention behind a recognized utterance and also the effect of its own utterance. Although participants may tolerate a small degree of uncertainty, an excessive amount in a given context can lead to misunderstanding with different costs (Paek and Horvitz, 1999).
A dialogue manager can be formulated as a Markov decision process (MDP), where the dialogue state represents the knowledge of the system (Levin et al., 1998 & 2000; Singh et al., 2000). The MDP-based system can handle uncertainty about the effect of its own utterance, but fails to handle the uncertainty about the user’s intention when it deviates from the recognized utterance in a complex environment. The reason is that the knowledge of the MDP-based system can match the user’s intention only in an ideal environment.
A dialogue system should be able to carry on a conversation without the luxury of perfect speech recognition, language understanding, or precise user models (Paek & Horvitz, 1999). To handle the uncertainty emerging from the deviation between the dialogue state and the system observation, we must redefine the dialogue state and find a bridge to the system observation. The partially observable Markov decision process (POMDP) framework, a model of an agent planning and acting under uncertainty, provides a systematic method of doing just that (Kaelbling et al., 1998).
Dialogue management is essentially a problem of planning and acting under uncertainty. In the POMDP framework, we define the dialogue state by a set of state variables directly representing the user’s intentions and hidden system states. The observations come from different input modalities, including speech, keyboard, mouse, etc. The observation probability function serves as the bridge from states to observations.
Compared with the POMDP-based model of Roy et al. (2000), our model adds hidden system states in addition to user intentions, which lets it make use of the abstract observations from multi-modality input. The construction of the state transition and observation probability functions is also simplified by using two-stage temporal Bayesian networks (2TBNs) (Boutilier et al., 1999). Unlike their augmented MDP approximation, we use heuristic approximation methods, which are robust and effective.
Since the number of multi-modality observations is large, using them directly will make the POMDP computationally intractable. We propose an observation extraction method using Bayesian networks. A Bayesian network can combine observations from various information sources and extract abstract observations to support user barge-in and turn-taking. It reduces the number of observations without ignoring important information.
The remainder of this paper is organized as follows. In the next section, we briefly introduce the POMDP model and algorithms. In Section 3, we present the dialogue manager in the form of a POMDP. The observation extraction model using Bayesian networks is described in Section 4. Section 5 contains our experiments and discussions. Section 6 is devoted to the conclusion and future work.
2 POMDP AND ALGORITHMS
The planning problem can be defined as follows: given a complete and correct model of the world dynamics and a reward structure, find an optimal way to behave (Kaelbling et al., 1998). Many planning problems can be modeled as MDPs and analyzed using the techniques of decision theory (Boutilier et al., 1999). An MDP is a model of an agent interacting synchronously with a world (Kaelbling et al., 1998). It can be specified as a tuple <S, A, T, R>, where
• S is a finite set of states of the world;
• A is a finite set of actions;
• T:S×A→Π(S) is the state-transition function, giving for each world state and agent action a probability distribution over world states (we write T(s, a, s’) for the probability of ending in state s’, given that the agent starts in state s and takes the action a); and
• R:S×A→ℝ is the reward function, giving the expected immediate reward gained by the agent for taking each action in each state (we write R(s, a) for the expected reward for taking action a in state s).
A POMDP is an MDP in which the agent is unable to observe the current state. Instead, it makes an observation based on the action and resulting state. A POMDP can be specified by extending the MDP as a tuple <S, A, T, R, Ω, O>, where
• S, A, T, and R define an MDP;
• Ω is a finite set of observations the agent can experience in its world; and
• O:S×A→Π(Ω) is the observation function, which gives, for each action and resulting state, a probability distribution over possible observations (we write O(s’, a, o) for the probability of making observation o given that the agent took action a and landed in state s’).
In a POMDP, an agent can use a belief state to represent its knowledge of which state it may be in. A belief state b:S→[0,1] is a probability distribution over S. An agent uses the belief update function τ:B×Ω×A→B to update its belief state. Here B is the infinite set of all the belief states, and τ is defined as:
\[ \tau(b, a, o)(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)}. \]
The agent is expected to gain the immediate reward
\[ \rho(b, a) = \sum_{s \in S} b(s)\, R(s, a) \]
for taking action a in belief state b.
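
As an illustration of the belief update τ and the expected reward ρ defined above, the following minimal Python sketch applies them to a toy model. The nested-list layout T[s][a][s2], O[s2][a][o], R[s][a], the function names, and the two-state numbers are assumptions made for this example; they are not part of the dialogue model.

```python
def belief_update(b, a, o, T, O):
    """Return (tau(b, a, o), Pr(o | a, b)) for a belief b given as a list."""
    n = len(b)
    # Unnormalized posterior: O(s', a, o) * sum_s T(s, a, s') * b(s)
    unnorm = [
        O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_o = sum(unnorm)  # normalizing constant Pr(o | a, b)
    if pr_o == 0.0:
        raise ValueError("observation is impossible under this belief and action")
    return [u / pr_o for u in unnorm], pr_o


def expected_reward(b, a, R):
    """rho(b, a) = sum_s b(s) * R(s, a)."""
    return sum(b[s] * R[s][a] for s in range(len(b)))


# Hypothetical toy model: 2 states, 1 action, 2 observations.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]  # T[s][a][s2]: the state never changes
O = [[[0.8, 0.2]], [[0.3, 0.7]]]  # O[s2][a][o]: noisy observation of the state
R = [[1.0], [-1.0]]               # R[s][a]

b0 = [0.5, 0.5]
b1, pr_o = belief_update(b0, a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, observing o = 0 shifts the belief toward state 0, because that observation is more likely there; the normalizer Pr(o | a, b) is returned as well since value iteration weights each observation branch by it.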
A POMDP can be converted to an equivalent belief state MDP and solved by value iteration (Bellman, 1957), considering only the piecewise linear and convex (PWLC) representations of value function estimates (Sondik, 1971). Using a vector set Γi to represent a PWLC function Vi:
\[ V_i(b) = \max_{\alpha_i \in \Gamma_i} \sum_{s \in S} b(s)\, \alpha_i(s), \]
value iteration becomes:
\[ V_{i+1}(b) = \max_{a \in A} \Big[ \rho(b, a) + \gamma \sum_{o \in \Omega} \max_{\alpha_i \in \Gamma_i} \sum_{s \in S} \sum_{s' \in S} T(s, a, s')\, O(s', a, o)\, b(s)\, \alpha_i(s') \Big] \]
or
\[ \alpha_{i+1}^{W}(s) = R(s, a) + \gamma \sum_{o \in \Omega} \sum_{s' \in S} T(s, a, s')\, O(s', a, o)\, \alpha_i^{j_o}(s') \]
to iterate in the form of the vector set directly. Here
\[ W = \big(a, \{o_1, \alpha_i^{j_1}\}, \{o_2, \alpha_i^{j_2}\}, \ldots, \{o_{|\Omega|}, \alpha_i^{j_{|\Omega|}}\}\big) \]
represents a combination of an action a and a permutation of αi vectors of size |Ω|. In each step of the iteration, all the dominated vectors (Cassandra, 1998) are removed.
Many exact algorithms exist for finding the optimal solution of a POMDP (Cassandra, 1998). The incremental pruning algorithm (Cassandra et al., 1997) is a more recent and efficient one, but it still suffers from the exponential growth of the number of vectors used to represent the optimal value function.
Some heuristic methods approximate the optimal solution by considering only a partial vector set. We are interested in four algorithms (Hauskrecht, 2000):
• MDP approximation is the simplest method; it assumes full observation of the current state. Only one vector is needed to represent the value function:
\[ \alpha_{i+1}(s) = \max_{a \in A} \Big[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, \alpha_i(s') \Big]. \]
• QMDP approximation, based on the same full observation assumption, uses Q-functions as the value function for each state-action pair. So each action corresponds to one vector:
\[ \alpha_{i+1}^{a}(s) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a' \in A} \alpha_i^{a'}(s'). \]
• The Fast Informed Bound (FIB) method differs from the MDP and QMDP approximations in that the agent cannot know the current state of the world. Here the assumption is the full observation of future states. So we can select the best vector for every observation and every current state separately:
\[ \alpha_{i+1}^{a}(s) = R(s, a) + \gamma \sum_{o \in \Omega} \max_{a' \in A} \sum_{s' \in S} T(s, a, s')\, O(s', a, o)\, \alpha_i^{a'}(s'). \]
With exact algorithms, we seek the best vector for every observation and the combination of all states.
• Grid-based approximation with linear function updates considers only the value functions of some belief states. For every belief state b and action a,¹ we update the vector set using:
\[ \alpha_{i+1}^{b,a}(s) = R(s, a) + \gamma \sum_{o \in \Omega} \sum_{s' \in S} T(s, a, s')\, O(s', a, o)\, \alpha_i^{\iota(b,a,o)}(s'), \]
where
\[ \iota(b, a, o) = \arg\max_{j} \sum_{s' \in S} \sum_{s \in S} T(s, a, s')\, O(s', a, o)\, b(s)\, \alpha_i^{j}(s'). \]
An incremental approach (Hauskrecht, 2000) was proposed since the grid-based method is not guaranteed to converge. The idea is to keep the original vectors in the updated vector set.
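
To make the difference between the QMDP and FIB updates listed above concrete, here is a minimal Python sketch of one backup step of each. The array layout (alpha[a][s], T[s][a][s2], O[s2][a][o], R[s][a]), the function names, and the toy numbers are assumptions for illustration only.

```python
def qmdp_step(alpha, T, R, gamma):
    """One QMDP backup; alpha[a][s] holds one vector per action."""
    n_a, n_s = len(alpha), len(alpha[0])
    # Best next-state value, assuming the next state will be fully observed.
    best_next = [max(alpha[a2][s2] for a2 in range(n_a)) for s2 in range(n_s)]
    return [
        [R[s][a] + gamma * sum(T[s][a][s2] * best_next[s2] for s2 in range(n_s))
         for s in range(n_s)]
        for a in range(n_a)
    ]


def fib_step(alpha, T, O, R, gamma):
    """One Fast Informed Bound backup; the max is taken per observation."""
    n_a, n_s, n_o = len(alpha), len(alpha[0]), len(O[0][0])
    new_alpha = []
    for a in range(n_a):
        row = []
        for s in range(n_s):
            total = 0.0
            for o in range(n_o):
                total += max(
                    sum(T[s][a][s2] * O[s2][a][o] * alpha[a2][s2] for s2 in range(n_s))
                    for a2 in range(n_a)
                )
            row.append(R[s][a] + gamma * total)
        new_alpha.append(row)
    return new_alpha


# Hypothetical toy model: 2 states, 2 actions, 2 observations.
T = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.3, 0.7], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 1.0]]
alpha0 = [[R[s][a] for s in range(2)] for a in range(2)]

qmdp = qmdp_step(alpha0, T, R, gamma=0.9)
fib = fib_step(alpha0, T, O, R, gamma=0.9)
```

Because the observation probabilities sum to one, taking the maximum per observation can never exceed taking it per next state, so for the same input vectors the FIB values are at most the QMDP values; in this toy model the gap shows up for the action that randomizes the state.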
Some exact methods also use a collection of linear functions for a set of belief states to represent the PWLC value function. But the exact set of belief states is difficult to identify in advance. The grid-based method uses an easy-to-compute but incomplete set of belief states to approximate the optimal solution.
We consider four strategies for selecting the grid of belief state points. The first two strategies are relatively simple. The first is the fixed-grid strategy, which chooses the extreme points of the belief state space. The second is the random-grid strategy, which chooses a random grid at each iteration step.
We propose two further strategies based on the belief state points generated in simulation. The first chooses the grid points randomly from the simulation points at each iteration step; we call it the random-s-grid strategy. The second, the cluster-s-grid strategy, first clusters the simulation points; a typical point from each cluster is chosen as the grid point. Since the belief state space is different from other multi-dimensional spaces, we also consider the entropy of the belief state in clustering:
\[ \mathrm{Dist}(b_1, b_2) = \mathrm{Entropy}(b_1) \cdot \mathrm{Entropy}(b_2) \cdot \mathrm{EDist}(b_1, b_2). \]
Here EDist(b1, b2) represents the Euclidean distance between b1 and b2.
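
The fixed-grid and random-s-grid strategies described above can be sketched in a few lines of Python. The function names, the list-of-lists belief representation, and the sample simulation points are hypothetical; the cluster-s-grid strategy is not shown because it additionally needs a clustering procedure.

```python
import random


def fixed_grid(n_states):
    """Extreme points of the belief simplex: one unit vector per state."""
    return [[1.0 if i == s else 0.0 for i in range(n_states)]
            for s in range(n_states)]


def random_s_grid(simulation_points, k, rng):
    """Draw up to k grid points at random from beliefs visited in simulation."""
    return rng.sample(simulation_points, min(k, len(simulation_points)))


# Hypothetical belief points collected while simulating dialogues.
simulation_points = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.5, 0.0]]

corners = fixed_grid(3)
sampled = random_s_grid(simulation_points, k=2, rng=random.Random(0))
```

Calling random_s_grid again at every iteration step, with a fresh draw, corresponds to re-selecting the grid at each step as the strategy prescribes.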
¹ In (Hauskrecht, 2000), only one value function is used for each belief state. However, we use Q-functions for every belief state and action pair.

3 POMDP FOR DIALOGUE MANAGER
Dialogue management is essentially a planning problem: the task of the dialogue manager is planning an optimal policy and acting under uncertainty. The dialogue manager, a high-level component of our spoken dialogue system, is modeled in this section using the POMDP. When we get a (near-)optimal solution of a POMDP in the form of a value function, which is represented using a vector set, we can derive the (near-)optimal policy π:B→A from this solution. The policy will select the action that maximizes the expected reward.
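
A minimal sketch of how a policy can be read off a vector set, assuming each vector is stored together with the action it was generated for. The action names and vector values below are hypothetical and chosen only to show the mechanism.

```python
def best_vector(b, vectors):
    """Return (V(b), action) for the action-tagged vector maximizing b . alpha."""
    best_value, best_action = None, None
    for action, alpha in vectors:
        value = sum(p * v for p, v in zip(b, alpha))
        if best_value is None or value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


# Hypothetical two-state vector set: answering pays off only in state 0,
# while asking the user to repeat has a small, state-independent value.
vectors = [
    ("answer", [10.0, -10.0]),
    ("ask_to_repeat", [1.0, 1.0]),
]

confident = best_vector([0.9, 0.1], vectors)
unsure = best_vector([0.5, 0.5], vectors)
```

With a confident belief the domain action wins; with an evenly split belief the repair action has the higher expected value, which is the behavior a POMDP-based dialogue manager is meant to exhibit.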
A simple example is used to explain the model. It also serves as the example in our experiment. It is derived from the tour guide system of the Forbidden City, the first application of the E-Partner project at Microsoft Research, China. Maggie the tour guide chooses her action according to the user’s request. If she is not clear about the user’s request, she can ask the user for more information using different strategies. To simplify the discussion, we only consider two kinds of requests: to visit a place, or to ask for a property of a place. Two places used in the example are a hall and a gate. The properties of these two places are their height and size.
In the following sub-sections, we propose our model for the dialogue manager as the six elements in the POMDP tuple, and compare it with the model in (Roy et al., 2000).
3.1 STATE
Generally, a dialogue manager must have the ability to clarify the dialogue state. It updates its state upon receiving different information from the user or the environment. Because the user intention and the hidden system state are inaccessible, many dialogue managers (like the MDP-based model in (Levin et al., 2000)) use the system’s knowledge as the dialogue state. Usually the knowledge is gained directly from different observations including user utterances, results of database queries, etc. These dialogue managers work well in the ideal environment where recognized user utterances closely reflect the user’s intention. But when uncertainty increases, i.e., the environment becomes noisier or the user’s task becomes more complex, the performance may degrade quickly.
In the POMDP framework, a dialogue manager can deal with the uncertainty of the exact dialogue state. So we can employ the user’s intentions and other hidden system states as our dialogue states directly. The dialogue manager can update its belief state using observations extracted from the user’s utterances and from other information. This makes it as easy to construct the reward function as in the MDP-based system. Even more importantly, the dialogue manager is more robust in handling the uncertainty emerging from the deviation between the user’s utterances and intentions.
We use a factored representation of our dialogue state space (Boutilier et al., 1999). In our example, dialogue states, which are also POMDP states, consist of two independent parts: user’s intentions and hidden system states. We use three state variables to represent the user’s intention. They are the request type (visit or ask), the place (gate or hall), and the property (height or size). The hidden system states include normal, silent, error (noisy), error (silent), and overheard. Altogether there are 40 states, among which 10 state pairs are equivalent pairs, since the property variable is useless when the value of the type is equal to “visit”.
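
The state counts above can be checked by enumerating the factored state space. This is a small sketch; the tuple encoding and variable names are assumptions for illustration, while the variable values are those listed in the text.

```python
from itertools import product

REQUEST_TYPES = ("visit", "ask")
PLACES = ("gate", "hall")
PROPERTIES = ("height", "size")
SYSTEM_STATES = ("normal", "silent", "error (noisy)", "error (silent)", "overheard")

# Every combination of the three intention variables and the hidden system state.
states = list(product(REQUEST_TYPES, PLACES, PROPERTIES, SYSTEM_STATES))

# For a "visit" request the property is irrelevant, so the two property values
# yield a pair of equivalent states for each place and each system state.
equivalent_pairs = [
    (("visit", place, PROPERTIES[0], system), ("visit", place, PROPERTIES[1], system))
    for place, system in product(PLACES, SYSTEM_STATES)
]

print(len(states), len(equivalent_pairs))
```

The enumeration gives 2 × 2 × 2 × 5 = 40 states and 2 × 5 = 10 equivalent pairs, matching the figures in the text.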
3.2 ACTION
We divide actions in our dialogue system into two classes: actions for satisfying the user’s request, and actions for gathering more information from the user to clarify the user’s intention. Actions belonging to the first class are domain actions and are usually simple; actions in the second class are known as repair actions. Selecting appropriate repair actions is very important to the success of a spoken dialogue system working in a complex environment.
In our example, we define two actions in the first class: answering the user’s question and changing the place at the user’s request. Repair actions include asking the user to repeat the statement, asking for the user’s intention (type, place or property), declaring the user’s intention, ignoring the user, and troubleshooting (executed when the dialogue manager believes the speech recognizer does not work properly, e.g., when the microphone fails to work). The total number of actions is 18.
3.3 STATE TRANSITION FUNCTION
We make two assumptions about the state transition function. First, we assume that the user’s intention does not change until the request is processed; the repair actions do not change the user’s intention. Second, we assume that only the troubleshooting action is related to the hidden system state; other actions do not affect the transitions among the hidden system states. These two assumptions greatly simplify the design of the state transition function.
Since we use a factored representation of the state space, we can use a two-stage temporal Bayesian network (2TBN) (Boutilier et al., 1999) to specify the state transition function for every action. Some actions of the same type can share the same 2TBN.
In our example, we have only three simple 2TBNs for the 18 actions. To demonstrate the ability to handle a large amount of uncertainty, we design the hidden state transition function by intentionally increasing the probability of falling into abnormal states. This model is used in the simulation, and our system turns out to be very robust. In real world applications, the state transition function must reflect the properties of the system and the environment, such as the hardware and software of the speech recognizer.
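
The two assumptions above mean that the joint transition probability factors into an intention part and a hidden-system-state part. The following Python sketch shows that factorization on a reduced state space; the dictionary layout, the function name, the two-value variables, and all probabilities are hypothetical and are not the 2TBNs used in the paper.

```python
def joint_transition(intent_model, system_model):
    """Multiply two independent factors into a joint transition table.

    intent_model[u][u2] and system_model[h][h2] are conditional probabilities;
    the result maps (u, h) -> {(u2, h2): probability}.
    """
    return {
        (u, h): {
            (u2, h2): pu * ph
            for u2, pu in intent_model[u].items()
            for h2, ph in system_model[h].items()
        }
        for u in intent_model
        for h in system_model
    }


INTENTS = ("visit gate", "visit hall")

# Repair actions leave the user's intention unchanged (first assumption).
keep_intent = {u: {u2: 1.0 if u2 == u else 0.0 for u2 in INTENTS} for u in INTENTS}
# Hypothetical drift of the hidden system state under non-troubleshooting actions.
drift = {"normal": {"normal": 0.7, "error": 0.3},
         "error": {"normal": 0.1, "error": 0.9}}
# Hypothetical effect of troubleshooting: it usually restores the normal state.
fix = {"normal": {"normal": 0.9, "error": 0.1},
       "error": {"normal": 0.8, "error": 0.2}}

T_repair = joint_transition(keep_intent, drift)
T_troubleshoot = joint_transition(keep_intent, fix)
```

Only the small per-variable tables have to be specified by hand; actions that share the same tables share the same joint table, which is how a few networks can serve many actions.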
3.4 OBSERVATION
Observations come from the recognized user’s utterances and other low-level information contained in the speech recognition result, parser result, keyboard and mouse input, etc. Since the structures of these observations differ from each other, to combine them in the POMDP framework we must extract some abstract observations from the various information sources. Simply ignoring useful information such as the confidence of the speech recognition result is not wise.
Like the Quartet architecture (Paek & Horvitz, 2000), we use a channel level and a signal level as the lower levels of the spoken dialogue system. Bayesian networks are used to infer the status of each level from the low-level information. In the channel level, the system can be in “Channel” or “No channel” status; in the signal level, it can be in “Signal” or “No signal” status. So we have four possible observations. When the system is in “Channel” and “Signal” status, we divide this observation into more detailed observations, which come from the parser result of the recognized user’s utterance.
In our example, 22 observations come directly from the user’s utterances, including affirmative answers, negative answers, and (incomplete) user requests, which may be any meaningful combination of the type, place and/or property of the request. Together with the three remaining channel and signal status combinations, the POMDP model includes 25 observations.
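
One way to arrive at the observation counts above is to enumerate partial requests. The sketch below assumes that a combination is "meaningful" when it is non-empty and does not attach a property to a "visit" request; this rule is an assumption made for the example (it happens to reproduce the counts in the text), not a definition given in the paper.

```python
from itertools import product

TYPES = ("visit", "ask", None)        # None means the slot was not mentioned
PLACES = ("gate", "hall", None)
PROPERTIES = ("height", "size", None)


def meaningful(request_type, place, prop):
    """Assumed rule: non-empty, and no property on a 'visit' request."""
    if request_type is None and place is None and prop is None:
        return False
    if request_type == "visit" and prop is not None:
        return False
    return True


requests = [c for c in product(TYPES, PLACES, PROPERTIES) if meaningful(*c)]
utterance_observations = ["affirmative", "negative"] + requests

# The "Channel" and "Signal" status is refined into the utterance observations;
# the other three status combinations remain as observations of their own.
status_observations = [
    ("Channel", "No signal"),
    ("No channel", "Signal"),
    ("No channel", "No signal"),
]

total = len(utterance_observations) + len(status_observations)
print(len(requests), len(utterance_observations), total)
```

Under this assumed rule there are 20 partial requests, hence 22 utterance observations and 25 observations in total.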
3.5 OBSERVATION PROBABILITY FUNCTION
The most complex part of the POMDP model for the spoken dialogue system is the observation probability function. The same action may lead to different observations even in the same state. One reason is that the speech recognizer is far from perfect. To make things worse, different users, or even the same user at different times, tend to provide different answers to the same question. So the construction of the observation probability function requires deep insight into the speech recognizer and a good user model.
We also use 2TBNs to construct the observation probability function. Eleven 2TBNs are used in our example, among which six are for the actions that declare the user’s intention and three are for the actions that ask for the user’s intention. 2TBNs in the same group are very similar. All of them are handcrafted and depend heavily on the experience of the developer.
3.6 REWARD FUNCTION
The reward function is relatively simple. We need only specify the rewards of a particular action executed in a particular state, e.g., a positive reward when the answer matches the user’s request, or a negative reward (cost) if a mismatch occurs. Repair actions are also associated with negative rewards.
In our example, 11 different rewards are specified. These rewards belong to two classes: 1) for repair actions: asking the user to repeat, asking for the user’s intention, declaring the user’s intention, ignoring (right/wrong), and troubleshooting (right/wrong); and 2) for domain actions: wrong type, right type without a right place or property, right type with a right place or property, and totally right.

Figure 1: Bayesian Network for User Barge-in Detection
Figure 2: Bayesian Network for Turn-taking

3.7 COMPARISON WITH ANOTHER MODEL
In this section, we compare our model with the model used in (Roy et al., 2000). In addition to the user intentions they used to construct the POMDP state space, we also consider hidden system states, which are useful in complex environments since they make use of observations from low-level information. Our factored representation differs from their flat state space. It simplifies the construction of the state transition function and observation probability function by using 2TBNs. It is also much easier to adjust the parameters.
The definition of the observations is also different. Besides utterances that can reflect the user’s (partial) intention, we also consider other observations inferred from the low-level information of the speech recognizer, robust parser and other input modalities. This can improve the robustness of the system and make it easier to include more input modalities, such as visual input from a video camera.
Roy et al. use an augmented MDP to approximate the original POMDP. They replace the belief state with a pair consisting of the most likely state and the entropy of the belief state. This approach can be applied to only some POMDPs. In the POMDP for our example, some states are equivalent, so the entropy cannot fully reflect the degree of uncertainty of the current belief state. We use several approximation algorithms to solve the POMDP. Among them, the grid-based algorithm turns out to be effective and adaptive.
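
The point about equivalent states and entropy can be seen in a small numerical example. In this Python sketch the belief is spread over the two equivalent "visit gate" states; the dictionary encoding and the merge_equivalent helper are hypothetical and serve only to show the effect, they are not part of either model.

```python
import math


def entropy(belief):
    """Shannon entropy in bits of a belief given as {state: probability}."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0.0)


def merge_equivalent(belief):
    """Sum belief mass over states that differ only in the unused property."""
    merged = {}
    for (request_type, place, prop), p in belief.items():
        key = (request_type, place, None) if request_type == "visit" else (request_type, place, prop)
        merged[key] = merged.get(key, 0.0) + p
    return merged


# The system is sure the user wants to visit the gate; only the irrelevant
# property variable is undetermined.
belief = {("visit", "gate", "height"): 0.5, ("visit", "gate", "size"): 0.5}

raw_entropy = entropy(belief)
merged_entropy = entropy(merge_equivalent(belief))
```

The raw belief has an entropy of one bit even though there is no uncertainty about what the user wants, so a summary based on the most likely state and the entropy would overstate the uncertainty in this case.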
4 BAYESIAN NETWORKS FOR OBSERVATION EXTRACTION
Low-level observations extracted from different input modalities are very important for handling the uncertainty in a spoken dialogue system. We use Bayesian networks to extract these low-level observations in the channel level and signal level (Paek & Horvitz, 2000).
The existence of a channel for communication reflects the channel level status, and the existence of a signal reflects the signal level status. The status of the channel level is primarily inferred from the user’s focus, and the status of the signal level is related to the confidence of the speech recognition result, the parser result, etc.
We want to know the status of these two levels at two critical time points. The first is for user barge-in detection and the second is for general turn-taking between the system and the user.
Since we have only limited input modalities (speech, keyboard and mouse), the Bayesian network for the channel level is quite simple. It can be extended when we want to add more input modalities, such as a video camera to detect eye gaze. Our current focus is the more complex signal level.
To support user barge-in in a noisy environment, we must detect the user’s voice before we get the recognition result. Upon receiving the sound start event, the confidences of the following three hypotheses are checked, from which the status of the signal level can be inferred (Figure 1). In the Microsoft Speech SDK that we use, the confidence consists of two parts: ActualConfidence (AC) is a binary number and SREngineConfidence (EC) is a real number.
Upon receiving the recognition result, we check the confidence of its elements, its parser score, etc. A slightly different Bayesian network (Figure 2) is used to infer the status of the signal and channel levels for turn-taking.
One advantage of the Bayesian network is that the inferred status is a probability distribution instead of an exact state. We can adjust the threshold to tune the system. Another advantage is that our system is easy to extend, e.g., we need only add a node to the Bayesian network for the channel level and change some probability distributions if we want to add eye gaze information.
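
The idea of inferring a status distribution and then thresholding it can be sketched with the simplest kind of Bayesian network, in which each piece of evidence depends only on the signal status. This is not the network of Figure 1 or Figure 2; the structure, the evidence names, the probabilities, and the threshold are all assumptions made for illustration.

```python
def signal_posterior(prior_signal, likelihoods, evidence):
    """Posterior probability of the 'Signal' status given binary evidence.

    likelihoods[name] = (P(evidence true | Signal), P(evidence true | No signal)).
    Evidence variables are assumed conditionally independent given the status.
    """
    p_signal, p_no_signal = prior_signal, 1.0 - prior_signal
    for name, observed in evidence.items():
        given_signal, given_no_signal = likelihoods[name]
        p_signal *= given_signal if observed else 1.0 - given_signal
        p_no_signal *= given_no_signal if observed else 1.0 - given_no_signal
    return p_signal / (p_signal + p_no_signal)


# Hypothetical conditional probabilities for one evidence variable.
likelihoods = {"high_confidence": (0.8, 0.2)}

posterior = signal_posterior(0.5, likelihoods, {"high_confidence": True})

THRESHOLD = 0.7  # tunable decision threshold (assumed value)
status = "Signal" if posterior >= THRESHOLD else "No signal"
```

Adding a further information source then amounts to adding one entry to the likelihood table, and raising or lowering the threshold trades missed user speech against false barge-in detections.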
5 EXPERIMENTS AND DISCUSSIONS
Our experiments include two parts: the real world experiment of observation extraction using a Bayesian network in the signal level, and the simulated experiment of POMDP-based dialogue management.