The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management
ABSTRACT This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems. It briefly summarises the basic mathematics and explains why exact optimisation is intractable. It then describes in some detail a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems. A prototype HIS system for the tourist information domain is evaluated and compared with a baseline MDP system using both user simulations and a live user trial. The results give strong support to the central contention that the POMDP-based framework is both a tractable and powerful approach to building more robust spoken dialogue systems.
Steve Young*, Milica Gašić, Simon Keizer, François Mairesse, Jost Schatzmann,
Blaise Thomson, Kai Yu
Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK
Received 31 October 2008; received in revised form 29 January 2009; accepted 2 April 2009
© 2009 Elsevier Ltd. All rights reserved.
Keywords: Statistical dialogue systems; POMDP; Hidden Information State model
1. Introduction
Spoken dialogue systems allow a human user to interact with a machine using voice as the primary communication medium. The structure of a conventional spoken dialogue system (SDS) is shown in Fig. 1a. It contains three major components: speech understanding, speech generation and dialogue management. The speech understanding component typically consists of a speech recogniser and semantic decoder, and its function is to map user utterances into some abstract representation of the user's intended speech act $a_u$. The speech generation component consists of a natural language generator and a speech synthesiser, and it performs the inverse operation of mapping the machine's response $a_m$ back into speech.
The core of the dialogue manager is a data structure which represents the system's view of the world in the form of a machine state $s_m$. This machine state typically encodes an estimate of three distinct sources of information: the user's input act $\tilde{a}_u$, an estimate of the intended user goal $\tilde{s}_u$,¹ and some record of the dialogue history $\tilde{s}_d$.² Most conventional dialogue managers rely on hand-crafted deterministic rules for interpreting each (noisy) user dialogue act $\tilde{a}_u$ and updating the state. Based on each new state estimate, a dialogue policy is used to select an appropriate machine response in the form of a dialogue act $a_m$. This dialogue cycle continues until either the user's goal is satisfied or the dialogue fails.

* Corresponding author. Tel.: +44 (0) 1223 332654; fax: +44 (0) 1223 332662. E-mail address: sjy@eng.cam.ac.uk (S. Young).
0885-2308/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.csl.2009.04.001
Please cite this article in press as: Young, S. et al., The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech and Language (2009), doi:10.1016/j.csl.2009.04.001
The designers of such systems have to deal with a number of problems. Since the user's state $s_u$ is unknown and the decoded inputs $\tilde{a}_u$ are prone to errors,³ there is a significant chance that $\tilde{s}_m$ will be incorrect. Hence, the dialogue manager must include quite complex error recovery procedures. Recognition confidence scores can reduce the incidence of misunderstandings, but these require thresholds to be set which are themselves notoriously difficult to optimise. Modern recognisers can produce alternative recognition hypotheses, but it is not clear in practice how these can be used effectively. Finally, decisions taken by the dialogue manager do not necessarily have an immediate effect; hence dialogue optimisation requires forward planning, and this is extremely difficult in a deterministic framework.
As has been argued previously, taking a statistical approach to spoken dialogue system design provides the opportunity for solving many of the above problems in a flexible and principled way (Young, 2002). Early attempts at using a statistical approach modelled the dialogue system as a Markov Decision Process (MDP) (Levin et al., 1998, 2000; Young, 2000). MDPs provide a good statistical framework since they allow forward planning and hence dialogue policy optimisation through reinforcement learning (Sutton and Barto, 1998). However, MDPs assume that the entire state is observable. Hence, they cannot account for either the uncertainty in the user state $\tilde{s}_u$ and dialogue history $\tilde{s}_d$, or the uncertainty in the decoded user's dialogue act $\tilde{a}_u$.
Fig. 1b shows an alternative model for the dialogue management component in which the uncertainty in the user's dialogue act and the uncertainty in the machine state are shown explicitly. In this new model, the state estimator maintains a distribution across all states rather than a point-estimate of the most likely state. The dialogue manager therefore tracks all possible dialogue paths rather than just the most likely path. The ensuing dialogue decision is then based on the distribution over all dialogue states rather than just a specific state. This allows competing hypotheses to be considered in determining the machine's next move and simplifies error recovery since the dialogue manager can simply shift its attention to an alternative hypothesis rather than trying to repair the existing one.
If the decoded user input act is regarded as an observation, then the dialogue model shown in Fig. 1b is a Partially Observable MDP (POMDP) (Kaelbling et al., 1998). The distribution over dialogue states is called the belief state $b$ and dialogue policies are based on $b$ rather than the estimated state. The key advantage of the POMDP formalism is that it provides a complete and principled framework for modelling the inherent uncertainty in a spoken dialogue system. Thus, it naturally accommodates the implicit uncertainty in the estimate of the user's goal and the explicit uncertainty in the N-best list of decoded user acts. Associated with each dialogue state and machine action is a reward. The choice of reward function is a dialogue design issue, but it will typically provide positive rewards for satisfying the user's goal, and negative rewards for failure and wasting time. As with regular MDPs, dialogue optimisation is equivalent to finding a decision policy which maximises the total reward.

[Figure 1 not reproduced: block diagrams of (a) a conventional and (b) a probabilistic dialogue manager, each linking the user, speech understanding, a state or belief estimator, the dialogue policy and speech generation.]
Fig. 1. Structure of a spoken dialogue system: $a_u$ and $a_m$ denote user and machine dialogue acts, $s_u$ is the user goal and $s_d$ is the dialogue history. The tilde indicates an estimate. Part (a) shows a conventional dialogue manager which maintains a single state estimate; (b) shows a dialogue manager which maintains a distribution over all states and accepts an N-best list of alternative user inputs.

¹ Examples of user goals are "finding flight information between London and New York", "finding a Chinese restaurant near the centre of town", "ordering three Pepperoni pizzas", etc.
² Since both $a_u$ and $s_u$ are noisy, the record of dialogue history is also noisy, hence the tilde on $s_d$.
³ Word error rate (WER) is typically in the 10–30% range.
The use of POMDPs for any practical system is, however, far from straightforward. Firstly, in common with MDPs, dialogue states are complex and hence the full state space of a practical SDS would be intractably large. Secondly, since a belief distribution $b$ over a discrete state $s$ of cardinality $n + 1$ lies in a real-valued $n$-dimensional simplex, a POMDP is equivalent to an MDP with a continuous state space $b \in \mathbb{R}^n$. Thus, a POMDP policy is a mapping from regions in $n$-dimensional belief space to actions. Not surprisingly these are extremely difficult to construct and whilst exact solution algorithms do exist, they do not scale to problems with more than a few states/actions.
There are two broad approaches to achieving a practical and tractable implementation of a POMDP-based dialogue system. Firstly, the state can be factored into a number of simple discrete components. It then becomes feasible to represent probability distributions over each individual factor. The most obvious examples of these are so-called slot filling applications where the complete dialogue state is reduced to the state of a small number of slots that need to be filled (Williams and Young, 2007a,b). For more complex applications, the assumption of independence between slots can be relaxed somewhat by using dynamic Bayesian Networks (Thomson et al., 2008a,b). Provided that each slot or network node has only a few dependencies, tractable systems can be built and belief estimates maintained with acceptable accuracy using approximate inference (Bishop, 2006).
A second approach to approximating a POMDP-based dialogue system is to retain a full and rich state representation but only maintain probability estimates over the most likely states. Conceptually, this approach can be viewed as maintaining a set of dialogue managers executing in parallel where each dialogue manager follows a distinct path. At each dialogue turn, the probability of each dialogue manager representing the true state of the dialogue is computed and the system response is then based on the probability distribution across all dialogue managers. This viewpoint is interesting because it provides a migration path for current dialogue system architectures to evolve into POMDP-based architectures (Henderson and Lemon, 2008).
This paper describes a specific implementation of the second approach called the Hidden Information State (HIS) model. The HIS system uses a full state representation in which similar states are grouped into partitions and a single belief is maintained for each partition. The system typically maintains a distribution over up to several hundred partitions corresponding to many thousands of dialogue states. The HIS system has been described in outline in a number of conference papers (Young, 2006; Young et al., 2007; Gašić et al., 2008). The aim of this paper is to provide a single coherent and more detailed description of how the HIS system works, and an assessment of its performance characteristics. This paper is structured as follows. Section 2 reviews the theory of POMDP-based dialogue management in general and then gives the specific theory underlying the HIS system. Section 3 explains how the various probability models in the HIS system are implemented and Section 4 deals with policy representation and optimisation. Section 5 describes how HIS systems are trained using a user simulator and Section 6 presents experimental results. This paper ends in Section 7 with our conclusions.
2. POMDPs for dialogue management
2.1. POMDP basics
Formally, a POMDP is defined as a tuple $\{S_m, A_m, T, R, O, Z, \lambda, b_0\}$ where $S_m$ is a set of machine states; $A_m$ is a set of actions that the machine may take; $T$ defines a transition probability $P(s'_m \mid s_m, a_m)$; $R$ defines the expected immediate reward $r(s_m, a_m)$; $O$ is a set of observations; $Z$ defines an observation probability $P(o' \mid s'_m, a_m)$; $\lambda$ is a geometric discount factor $0 \le \lambda \le 1$; and $b_0$ is an initial belief state.⁴

A POMDP operates as follows. At each time step, the machine is in some unobserved state $s_m \in S_m$. Since $s_m$ is not known exactly, a distribution over states is maintained called a belief state such that the probability of being in state $s_m$ given belief state $b$ is $b(s_m)$.⁵ Based on the current belief state $b$, the machine selects an action $a_m \in A_m$, receives a reward $r(s_m, a_m)$, and transitions to a new (unobserved) state $s'_m$, where $s'_m$ depends only on $s_m$ and $a_m$. The machine then receives an observation $o' \in O$ which is dependent on $s'_m$ and $a_m$. Finally, the belief distribution $b$ is updated based on $o'$ and $a_m$ as follows

$$
\begin{aligned}
b'(s'_m) &= P(s'_m \mid o', a_m, b) = \frac{P(o' \mid s'_m, a_m, b)\, P(s'_m \mid a_m, b)}{P(o' \mid a_m, b)} \\
&= \frac{P(o' \mid s'_m, a_m) \sum_{s_m \in S_m} P(s'_m \mid a_m, b, s_m)\, P(s_m \mid a_m, b)}{P(o' \mid a_m, b)} \\
&= k \cdot P(o' \mid s'_m, a_m) \sum_{s_m \in S_m} P(s'_m \mid a_m, s_m)\, b(s_m), \qquad (1)
\end{aligned}
$$

where $k = 1/P(o' \mid a_m, b)$ is a normalisation constant (Kaelbling et al., 1998). Maintaining this belief state as the dialogue evolves is called belief monitoring.

At each time step $t$, the machine receives a reward $r(b_t, a_{m,t})$ based on the current belief state $b_t$ and the selected action $a_{m,t}$. The cumulative, infinite horizon, discounted reward is called the return and it is given by

$$ R = \sum_{t=0}^{\infty} \lambda^t\, r(b_t, a_{m,t}) = \sum_{t=0}^{\infty} \lambda^t \sum_{s_m \in S_m} b_t(s_m)\, r(s_m, a_{m,t}). \qquad (2) $$

Each action $a_{m,t}$ is determined by a policy $\pi(b_t)$ and building a POMDP system involves finding the policy $\pi^*$ which maximises the return. Unlike the case of MDPs, the policy is a function of a continuous multi-dimensional variable and hence its representation is not straightforward. However, it can be shown that for finite horizon problems the value function of the optimal policy is piecewise linear and convex in belief space (Sondik, 1971). Hence, it can be represented by a set of policy vectors where each vector $v_i$ is associated with an action $a(i) \in A_m$ and $v_i(s)$ equals the expected value of taking action $a(i)$ in state $s$. Given a complete set of policy vectors, the optimal value function and corresponding policy is

$$ V^{\pi^*}(b) = \max_i \{ v_i \cdot b \} \qquad (3) $$

and

$$ \pi^*(b) = a\left( \arg\max_i \{ v_i \cdot b \} \right). \qquad (4) $$
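The belief monitoring step of Eq. (1) can be sketched in a few lines of Python. This is only a minimal illustration for a small, fully enumerated state space; the function name `belief_update`, the toy transition and observation models and all of the numbers are assumptions made for the example and are not taken from the HIS system.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Belief = Dict[State, float]


def belief_update(
    b: Belief,
    a_m: str,
    o_next: str,
    trans: Callable[[State, State, str], float],  # P(s'_m | s_m, a_m)
    obs: Callable[[str, State, str], float],      # P(o' | s'_m, a_m)
) -> Belief:
    """Return b' with b'(s') = k * P(o'|s',a_m) * sum_s P(s'|s,a_m) * b(s)."""
    unnormalised = {}
    for s_next in b:  # assumes b has an entry for every machine state
        predicted = sum(trans(s_next, s, a_m) * p for s, p in b.items())
        unnormalised[s_next] = obs(o_next, s_next, a_m) * predicted
    evidence = sum(unnormalised.values())  # this is P(o' | a_m, b) = 1 / k
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / evidence for s, v in unnormalised.items()}


# Hypothetical two-state example: the state is the food type the user wants.
def toy_trans(s_next: State, s: State, a_m: str) -> float:
    return 0.9 if s_next == s else 0.1  # the state rarely changes


def toy_obs(o_next: str, s_next: State, a_m: str) -> float:
    return 0.8 if o_next == s_next else 0.2  # noisy recognition


b0 = {"italian": 0.5, "indian": 0.5}
b1 = belief_update(b0, "ask_food", "italian", toy_trans, toy_obs)
print(b1)
```

Starting from a uniform belief, a single noisy observation of "italian" shifts most of the mass onto that state without discarding the alternative, which is the behaviour that distinguishes belief monitoring from a point estimate.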
This representation is illustrated in Fig. 2a for the case of $|S_m| = 2$ and a value function requiring just three distinct linear segments. The value function itself is the upper heavy line. In this case, $b$ is a 2-D vector such that $b_1 = 1 - b_2$, hence it can be denoted by a single point on the horizontal axis. The linear segments divide belief space into three regions and the optimal action to take in each region is the action associated with the uppermost vector in that region. So for example, if $b < x$ in Fig. 2a, then action $a(1)$ would be chosen, if $x < b < y$ then action $a(2)$ would be chosen, and so on.
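As a rough illustration of Eqs. (3) and (4), the sketch below evaluates a set of policy vectors at a belief point and returns the action attached to the uppermost one. The three vectors are invented for the two-state case and are not read off Fig. 2a; `value_and_action` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

PolicyVector = Tuple[str, Sequence[float]]  # (action label, value per state)


def value_and_action(b: Sequence[float], vectors: List[PolicyVector]) -> Tuple[float, str]:
    """Return (max_i v_i . b, action of the maximising vector)."""
    best_value, best_action = float("-inf"), ""
    for action, v in vectors:
        value = sum(v_s * b_s for v_s, b_s in zip(v, b))
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


# Made-up policy vectors for |S_m| = 2; each induces one linear segment.
VECTORS: List[PolicyVector] = [
    ("a(1)", [10.0, 0.0]),
    ("a(2)", [6.0, 6.0]),
    ("a(3)", [0.0, 10.0]),
]

for belief in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    print(belief, value_and_action(belief, VECTORS))
```

With these particular vectors the belief axis splits into three regions, one per action, which mirrors the qualitative picture described for Fig. 2a.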
The optimal exact value function can be found by working backwards from the terminal state in a process called value iteration. At each iteration $t$, policy vectors are generated for all possible action/observation pairs and their corresponding values are computed in terms of the policy vectors at step $t - 1$. As $t$ increases, the estimated value function converges to the optimal value function from which the optimal policy can be derived. Many spurious policy vectors are generated during this process, and these can be pruned to limit the combinatorial explosion in the total number of vectors (Kaelbling et al., 1998; Littman, 1994). Unfortunately, this pruning is itself computationally expensive and in practice, exact optimisation is not tractable. However, approximate solutions can still provide useful policies. The simplest approach is to discretise belief space and then use standard MDP optimisation methods (Sutton and Barto, 1998). Since belief space is potentially very large, grid points are concentrated on those regions which are likely to be visited (Brafman, 1997; Bonet, 2002). This is illustrated in Fig. 2b. Each belief point represents the value function at that point and it will have associated with it the corresponding optimal action to take. When an action is required for an arbitrary belief point $b$, the nearest belief point is found and its action is used. However, this can lead to errors and hence the distribution of grid points in belief space is very important. For example, in Fig. 2b, if $b = z$ then $a(3)$ would be selected although from Fig. 2a it can be seen that the optimal action was actually $a(2)$.

⁴ Here and elsewhere, primes are used to denote the state of a variable at time $t + 1$ given that the unprimed version is at time $t$.
⁵ In other words, a belief state $b$ is a vector whose component values give the probabilities of being in each machine state.
Grid-based methods are often criticised because they do not scale well to large state spaces and hence methods which support interpolation between points are often preferred (Pineau et al., 2003). However, the HIS model described below avoids the scaling problem by mapping the full belief space into a much reduced summary space where grid-based approximations appear to work reasonably well.
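The nearest-point lookup used by a grid-based policy, and the kind of error it can introduce, can be sketched as follows. The grid points, policy vectors and the test belief `z` are all invented for illustration, and Euclidean distance is an assumed choice of metric rather than one stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def grid_action(b: Sequence[float], grid: List[Tuple[Sequence[float], str]]) -> str:
    """Return the action stored at the grid point nearest to belief b."""
    _, action = min(grid, key=lambda entry: euclidean(entry[0], b))
    return action


def exact_action(b: Sequence[float], vectors: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the action of the policy vector with the largest value at b."""
    return max(vectors, key=lambda av: sum(v * x for v, x in zip(av[1], b)))[0]


# A sparse, made-up grid with no point inside the region where a(2) is optimal.
GRID = [([0.9, 0.1], "a(1)"), ([0.2, 0.8], "a(3)")]
VECTORS = [("a(1)", [10.0, 0.0]), ("a(2)", [6.0, 6.0]), ("a(3)", [0.0, 10.0])]

z = [0.45, 0.55]
print(grid_action(z, GRID), exact_action(z, VECTORS))
```

Here the grid lookup returns a(3) while the exact value function prefers a(2), which is the same kind of mismatch as the $b = z$ example, and shows why the placement of grid points matters.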
2.2. The SDS-POMDP
As discussed in the introduction, when using a POMDP to model a spoken dialogue system, it is natural to factor the machine state into three components $s_m = \langle s_u, a_u, s_d \rangle$ (Williams and Poupart, 2005).⁶ The belief state $b$ is then a distribution over these three components.
⁶ Note that alternative POMDP formulations can also be used for SDS (e.g. Roy et al., 2000; Zhang et al., 2001).

[Figure 2 not reproduced: for a two-state problem, panel (a) plots three policy vectors $v_1$, $v_2$, $v_3$ with associated actions $a(1)$, $a(2)$, $a(3)$ and region boundaries $x$ and $y$ on the belief axis; panel (b) shows grid points with their actions and the test point $z$.]
Fig. 2. POMDP value function representation: (a) shows an exact representation and (b) shows a grid-based representation.

The transition function for an SDS-POMDP follows directly by substituting the factored state into the regular POMDP transition function and making some reasonable independence assumptions, i.e.

$$ P(s'_m \mid s_m, a_m) = P(s'_u, a'_u, s'_d \mid s_u, a_u, s_d, a_m) \approx P(s'_u \mid s_u, a_m)\, P(a'_u \mid s'_u, a_m)\, P(s'_d \mid s'_u, a'_u, s_d, a_m). \qquad (5) $$

This is the transition model. Making similar reasonable independence assumptions regarding the observation function gives

$$ P(o' \mid s'_m, a_m) = P(o' \mid s'_u, a'_u, s'_d, a_m) \approx P(o' \mid a'_u). \qquad (6) $$

This is the observation model.

The above factoring simplifies the belief update equation since substituting (5) and (6) into (1) gives

$$ b'(s'_u, a'_u, s'_d) = k \cdot \underbrace{P(o' \mid a'_u)}_{\text{observation model}}\; \underbrace{P(a'_u \mid s'_u, a_m)}_{\text{user action model}} \sum_{s_u} \underbrace{P(s'_u \mid s_u, a_m)}_{\text{user goal model}} \sum_{s_d} \underbrace{P(s'_d \mid s'_u, a'_u, s_d, a_m)}_{\text{dialogue history model}}\, b(s_u, s_d). \qquad (7) $$

As shown by the labelling in (7), the probability distribution for $a'_u$ is called the user action model. It allows the observation probability that is conditioned on $a'_u$ to be scaled by the probability that the user would speak $a'_u$ given the goal $s'_u$ and the last system prompt $a_m$. The observation $o$ is typically an N-best list of hypothesised user acts, each with an associated probability, i.e.

$$ o = [\langle \tilde{a}_u^1, p_1 \rangle, \langle \tilde{a}_u^2, p_2 \rangle, \ldots, \langle \tilde{a}_u^N, p_N \rangle] \qquad (8) $$

such that $p_n = P(\tilde{a}_u^n \mid o)$ for $n = 1 \ldots N$. Thus, the combination of the observation model with the user action model allows a posterior reranking of an N-best recognition output to be made based on the dialogue system's current beliefs. The user goal model determines the probability of the user goal switching from $s_u$ to $s'_u$ following the system prompt $a_m$. Finally, the dialogue history model enables information relating to the dialogue history to be maintained such as a grounding state. Fig. 3 shows the SDS-POMDP in the form of a Bayesian network. The form of the statistical models used in the HIS system to represent the components of (7) is described in detail in Section 3.
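The factored update of Eq. (7) can be written out directly for a toy problem. The sketch below is only indicative: the three component models are passed in as plain functions with invented probabilities, and the N-best scores $p_n$ are used in place of $P(o' \mid a'_u)$, which assumes the two are proportional.

```python
from typing import Callable, Dict, Iterable, Tuple

Hyp = Tuple[str, str, str]  # (s'_u, a'_u, s'_d)


def sds_belief_update(
    b: Dict[Tuple[str, str], float],   # b(s_u, s_d)
    a_m: str,
    nbest: Dict[str, float],           # a'_u -> score standing in for P(o'|a'_u)
    goals: Iterable[str],
    histories: Iterable[str],
    p_goal: Callable[[str, str, str], float],           # P(s'_u | s_u, a_m)
    p_act: Callable[[str, str, str], float],            # P(a'_u | s'_u, a_m)
    p_hist: Callable[[str, str, str, str, str], float], # P(s'_d | s'_u, a'_u, s_d, a_m)
) -> Dict[Hyp, float]:
    goals, histories = list(goals), list(histories)
    unnorm: Dict[Hyp, float] = {}
    for s_u2 in goals:
        for a_u2, p_obs in nbest.items():
            for s_d2 in histories:
                # Double sum over s_u and s_d in Eq. (7).
                carried = sum(
                    p_goal(s_u2, s_u, a_m) * p_hist(s_d2, s_u2, a_u2, s_d, a_m) * mass
                    for (s_u, s_d), mass in b.items()
                )
                unnorm[(s_u2, a_u2, s_d2)] = p_obs * p_act(a_u2, s_u2, a_m) * carried
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("all hypotheses have zero probability")
    return {h: v / total for h, v in unnorm.items()}


# Hypothetical models: goals never change, one history value, noisy user acts.
b0 = {("Italian", "init"): 0.5, ("Indian", "init"): 0.5}
nbest = {"inform(food=Italian)": 0.6, "inform(food=Indian)": 0.3}
b1 = sds_belief_update(
    b0, "request(food)", nbest, ["Italian", "Indian"], ["told"],
    p_goal=lambda s2, s, a: 1.0 if s2 == s else 0.0,
    p_act=lambda a_u, s_u, a: 0.9 if a_u == f"inform(food={s_u})" else 0.1,
    p_hist=lambda d2, s_u, a_u, d, a: 1.0 if d2 == "told" else 0.0,
)
p_italian = sum(v for (s_u, _, _), v in b1.items() if s_u == "Italian")
print(round(p_italian, 4))
```

Summing the result over user acts and histories gives the marginal belief in each goal, so the reranking effect of the user action model on the N-best list can be inspected directly.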
2.3. The HIS POMDP
The key idea underlying the HIS model is that at any point in a dialogue, most of the possible user goal states have identical beliefs simply because there has been no evidence offered by the user to distinguish them. For example, if the system believes that the user might have asked for a Chinese restaurant then it must record food=Chinese as a possible goal state, but there is no point in maintaining individual goal states for all of the other possible food types such as food=Italian, food=French, etc. which have not been mentioned since they are all equally likely. Significant computation can therefore be saved by grouping these states into equivalence classes. The HIS model therefore assumes that at any time $t$, the space of all user goals $S_u$ can be divided into a number of equivalence classes $p \in P$ where the members of each class are tied together and are indistinguishable. These equivalence classes are called partitions. Initially, all states $s_u \in S_u$ are in a single partition $p_0$. As the dialogue progresses, this root partition is repeatedly split into smaller partitions. This splitting is binary, i.e. $p \to \{p', p - p'\}$ with probability $P(p' \mid p)$.⁷
Since multiple splits can occur at each time step, this binary split assumption places no restriction on the possible refinement of partitions from one turn to the next. Given that user goal space is partitioned in this way, beliefs can be computed based on partitions of $S_u$ rather than on the individual states of $S_u$. Initially the belief state is just $b_0(p_0) = 1$. Whenever a user state partition $p$ is split, its belief mass is reallocated as

$$ b(p') = P(p' \mid p)\, b(p) \quad \text{and} \quad b(p - p') = (1 - P(p' \mid p))\, b(p). \qquad (9) $$

Note that this splitting of belief mass is simply a reallocation of existing mass; it is not a belief update but rather a belief refinement.

A further simplification made in the HIS model is to assume that user goals change rarely and when they do, they are relatively easy to detect. Hence, instead of assuming that the user goal can change every turn, the HIS model assumes that user goal changes will be explicitly identified by the dialogue policy decision process and signalled by explicit system responses (this is explained further in Section 4.3). Normal turn-by-turn belief updating is therefore based on the assumption that

$$ P(s'_u \mid s_u) = \delta(s'_u, s_u), \qquad (10) $$

where $\delta(s'_u, s_u) = 1$ if $s'_u = s_u$ and 0 otherwise.
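The reallocation of belief mass in Eq. (9) amounts to a few lines of bookkeeping. The following sketch assumes partitions can be identified by simple string labels and that the split prior $P(p' \mid p)$ is supplied by the caller; the labels and priors are invented for the example.

```python
from typing import Dict


def split_partition(
    beliefs: Dict[str, float],
    parent: str,
    child: str,
    remainder: str,
    p_child_given_parent: float,
) -> Dict[str, float]:
    """Split `parent` into `child` (p') and `remainder` (p - p') as in Eq. (9)."""
    if not 0.0 <= p_child_given_parent <= 1.0:
        raise ValueError("P(p'|p) must lie in [0, 1]")
    mass = beliefs[parent]
    refined = {k: v for k, v in beliefs.items() if k != parent}
    refined[child] = p_child_given_parent * mass
    refined[remainder] = (1.0 - p_child_given_parent) * mass
    return refined


# Start with the root partition p_0 holding all of the belief mass.
b = {"food=*": 1.0}
b = split_partition(b, "food=*", "food=Chinese", "food!=Chinese", 0.2)
b = split_partition(b, "food!=Chinese", "food=Italian", "food=other", 0.25)
print(b)
```

Because each split only redistributes the parent's mass, the total belief stays at one however many splits are applied in a turn, which is why the text describes this as refinement rather than an update.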
[Figure 3 not reproduced: a Bayesian network over the variables $s_u$, $s_d$, $a_u$, $a_m$, the reward $r$ and the observed N-best user acts $\tilde{a}_u^n$.]
Fig. 3. SDS-POMDP dialogue framework as a Bayesian network. Solid arrows denote conditional dependencies, open circles denote hidden variables and shaded circles denote observations. The machine action $a_m$ is a function of the belief distribution $b(s_u, a_u, s_d)$.

⁷ The notation $p - p'$ should be read as denoting the partition containing all states in $p$ except for those in $p'$.
Substituting (9) and (10) into (7) gives the belief update equation for the HIS model

$$ b'(p', a'_u, s'_d) = k \cdot \underbrace{P(o' \mid a'_u)}_{\text{observation model}}\; \underbrace{P(a'_u \mid p', a_m)}_{\text{user action model}} \sum_{s_d} \underbrace{P(s'_d \mid p', a'_u, s_d, a_m)}_{\text{dialogue history model}}\; \underbrace{P(p' \mid p)\, b(p, s_d)}_{\text{belief refinement}}, \qquad (11) $$

where $p$ is the parent of $p'$ and where $s_d$ is the dialogue history shared by all states in partition $p$.
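Eq. (11) can be sketched in the same style, working over partitions instead of individual goals. In this illustration the belief refinement term $P(p' \mid p)\, b(p, s_d)$ is assumed to have been computed already and is passed in as `refined`; the user action and dialogue history models are hypothetical stand-ins with made-up probabilities.

```python
from typing import Callable, Dict, Iterable, Tuple


def his_belief_update(
    refined: Dict[Tuple[str, str], float],  # (p', s_d) -> P(p'|p) * b(p, s_d)
    a_m: str,
    nbest: Dict[str, float],                # a'_u -> score standing in for P(o'|a'_u)
    histories: Iterable[str],
    p_act: Callable[[str, str, str], float],             # P(a'_u | p', a_m)
    p_hist: Callable[[str, str, str, str, str], float],  # P(s'_d | p', a'_u, s_d, a_m)
) -> Dict[Tuple[str, str, str], float]:
    histories = list(histories)
    partitions = {part for part, _ in refined}
    unnorm = {}
    for part in partitions:
        for a_u, p_obs in nbest.items():
            for s_d2 in histories:
                # Sum over previous histories s_d; there is no sum over goals.
                hist_sum = sum(
                    p_hist(s_d2, part, a_u, s_d, a_m) * mass
                    for (q, s_d), mass in refined.items() if q == part
                )
                unnorm[(part, a_u, s_d2)] = p_obs * p_act(a_u, part, a_m) * hist_sum
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("all hypotheses have zero probability")
    return {h: v / total for h, v in unnorm.items()}


# Invented example: two partitions after one split, two N-best hypotheses.
ACT_PARTITION = {"inform(food=Chinese)": "food=Chinese", "inform(food=Thai)": "food=other"}
refined = {("food=Chinese", "init"): 0.2, ("food=other", "init"): 0.8}
nbest = {"inform(food=Chinese)": 0.7, "inform(food=Thai)": 0.2}
b_new = his_belief_update(
    refined, "request(food)", nbest, ["informed"],
    p_act=lambda a_u, part, a: 0.9 if ACT_PARTITION[a_u] == part else 0.1,
    p_hist=lambda d2, part, a_u, d, a: 1.0 if d2 == "informed" else 0.0,
)
best = max(b_new, key=b_new.get)
print(best, round(b_new[best], 4))
```

Compared with Eq. (7), the sum over user goals has disappeared because of assumption (10), so the cost of an update grows with the number of partitions rather than the number of goal states.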
3. Implementation of the HIS probability models
The HIS system represents a complete dialogue state as a triple $\langle p, a_u, s_d \rangle$ representing a partition of equivalent user goal states, the last user dialogue act and a common dialogue history. This triple and its associated belief represent a hypothesis regarding the true (hidden) state of the dialogue system and the HIS system maintains a list of the $M$ most likely hypotheses as an approximation to the full distribution. Each complete dialogue cycle consists of splitting any necessary partitions and updating their beliefs according to Eq. (11). In order to understand this process in more detail, it is first necessary to explain how goal states, partitions and dialogue acts are represented in the HIS system. The implementation of the three core probability models is then described. The section ends with a summary of how belief monitoring in the HIS system works in practice.
3.1. Dialogue acts
Dialogue acts in the HIS system take the form actt(a1 = v1, a2 = v2, ...) where actt denotes the type of dialogue act and the arguments are act items consisting of attribute–value pairs.⁸ Attributes refer to nodes in the user goal state tree described below and values are the atomic values that can be assigned to those nodes. In some cases, the value can be omitted, for example, where the intention is to query the value of an attribute. The same dialogue act representation is used for both user inputs and the dialogue manager outputs. The most common acts are listed in Table 1 and a simple dialogue illustrating their use is shown in Table 2. A full description of the dialogue act set used by the HIS system is given in Young (2007).
Table 1
The principal dialogue acts used by the HIS System. The Sys and Usr columns indicate which are valid acts for the system outputs and user inputs, respectively (✓ = valid, × = not valid).

Act                           Sys  Usr  Description
hello(a = x, b = y, ...)      ✓    ✓    Open a dialogue and give info a = x, b = y, ...
inform(a = x, b = y, ...)     ✓    ✓    Give information a = x, b = y, ...
request(a, b = x, ...)        ✓    ✓    Request value for a given b = x, ...
reqalts(a = x, ...)           ×    ✓    Request alternative with a = x, ...
confirm(a = x, b = y, ...)    ✓    ✓    Explicitly confirm a = x, b = y, ...
confreq(a = x, ..., d)        ✓    ×    Implicitly confirm a = x, ... and request value of d
select(a = x, a = y)          ✓    ×    Select either a = x or a = y
affirm(a = x, b = y, ...)     ✓    ✓    Affirm and give further info a = x, b = y, ...
negate(a = x)                 ×    ✓    Negate and give corrected value a = x
deny(a = x)                   ×    ✓    Deny that a = x
bye()                         ✓    ✓    Close a dialogue

⁸ Attributes are referred to as slots in some dialogue systems.

Note that in the HIS system every utterance translates into a single dialogue act. When the speech understanding system is uncertain, the input to the dialogue manager is typically a list of alternative dialogue acts. For example, the utterance "I want an Italian place near the cinema" spoken in a noisy background might yield

inform(type=restaurant,food=Italian, near=cinema) {0.6}
inform(type=restaurant,food=Indian, near=cinema) {0.3}
inform(type=bar,near=cinema) {0.1}

where the number in braces is the probability of each dialogue act hypothesis. This list corresponds to the form of observation defined in (8).
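To make the dialogue act notation concrete, the sketch below parses the textual form shown above into a small data structure together with its optional probability. The `DialogueAct` class, the `parse_act` function and the regular expression are illustrative assumptions rather than part of the HIS implementation, and the parser does not handle commas inside quoted values.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DialogueAct:
    act_type: str
    items: Tuple[Tuple[str, Optional[str]], ...]  # (attribute, value or None)


# act type, argument list, optional probability in braces
_ACT_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*(?:\{([0-9.]+)\})?\s*$")


def parse_act(text: str) -> Tuple[DialogueAct, Optional[float]]:
    match = _ACT_RE.match(text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act_type, args, prob = match.groups()
    items = []
    for arg in (a.strip() for a in args.split(",")):
        if not arg:
            continue  # empty argument list, as in bye()
        if "=" in arg:
            attr, value = (part.strip() for part in arg.split("=", 1))
            items.append((attr, value))
        else:
            items.append((arg, None))  # value omitted, as in request(phone)
    return DialogueAct(act_type, tuple(items)), (float(prob) if prob else None)


nbest_text = [
    "inform(type=restaurant,food=Italian, near=cinema) {0.6}",
    "inform(type=restaurant,food=Indian, near=cinema) {0.3}",
    "inform(type=bar,near=cinema) {0.1}",
]
nbest = [parse_act(line) for line in nbest_text]
for act, p in nbest:
    print(p, act.act_type, dict(act.items))
```

A list of (act, probability) pairs of this kind is one possible in-memory form of the observation in (8).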
3.2. User goal state partitioning
The HIS system is designed primarily for information retrieval tasks. However, compared to simple slot filling systems, it supports a much richer set of user goal representations based on tree-like structures built from classes, subtypes, and atomic values where a class represents a collection of related values and a subtype denotes a specific variant of a class. The space of all possible user goals is described by a set of simple ontological rules which define the various subtypes of each class and the atomic values which can be assigned to terminal classes.
The way this works is best illustrated by example. Table 3 shows a fragment of the rule set used by a simple application in the Tourist Information domain.⁹ Rules of the form $x \to y(p, q, r, \ldots)$ define how a class $x$ can be expanded as a subtype $y$ with class members $\langle p, q, r, \ldots \rangle$. Rules of the form $x = (a \mid b \mid c \mid \ldots)$ define which atomic values $\langle a, b, c, \ldots \rangle$ can be assigned to the terminal class $x$. As an example, the tree structure shown in Fig. 4 would represent the goal of a user who was looking for a cheap Chinese restaurant in the centre of town.
For a given set of rules, there are many possible trees that can be derived and as noted previously, it is not
practical to instantiate all of them and maintain a belief for each. Furthermore, without evidence from the
user, the majority of possible goals would all have the same very low probability.
To avoid this, it may be observed that a partially instantiated tree can be used to efficiently represent the set
of all possible user goals derivable from that tree. A partially instantiated tree therefore represents a partition
of user goal space. Splitting a partition then equates to selecting one of the uninstantiated leaf nodes and split-
ting it so that one copy is left unchanged and the other copy is expanded according to one or more ontology
Table 2. An example dialogue and its representation at the dialogue act level.

| Speaker | Utterance | Dialogue act |
|---|---|---|
| U: | Hi, I am looking for somewhere to eat. | hello(task=find,type=restaurant) |
| S: | You are looking for a restaurant. What type of food do you like? | confreq(type=restaurant,food) |
| U: | I'd like an Italian somewhere near the museum. | inform(food=Italian,near=museum) |
| S: | Roma is a nice Italian restaurant near the museum. | inform(name="Roma",type=restaurant,food=Italian,near=museum) |
| U: | Is it reasonably priced? | confirm(pricerange=moderate) |
| S: | Yes, Roma is in the moderate price range. | affirm(name="Roma",pricerange=moderate) |
| U: | What is the phone number? | request(phone) |
| S: | The number of Roma is 385456. | inform(name="Roma",phone="385456") |
| U: | Ok, thank you goodbye. | bye() |
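Table 3. Example ontology rules for a simple tourist information domain.

| Class | Operator | Expansion or atomic values | Prior |
|---|---|---|---|
| task | → | find(entity) | 0.4 |
| entity | → | venue(name,type,area) | 0.8 |
| type | → | bar(drinks,music) | 0.4 |
| type | → | restaurant(food,pricerange) | 0.3 |
| area | = | (central\|east\|west\|...) | |
| food | = | (Italian\|Chinese\|...) | |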
⁹ It should be noted that apart from the database itself, there is no other application-dependent data or code in a HIS dialogue manager.
rules. Furthermore, if a prior probability mass is associated with each tree node, then splitting a node carrying
mass $P$ using a rule with prior probability $p$ simply requires that the mass of the remaining uninstantiated node
is reduced to $P - p$ and the split-off node has mass $p$. In addition, the belief associated with the
original partition is divided between the two new partitions in the same proportion. Provided that no rule
is used to split a node more than once, this mechanism ensures that all partitions are unique, the sum of
the prior probabilities over all partitions is always unity and splitting does not change the total belief over
all partitions.
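The splitting arithmetic can be illustrated with a minimal Python sketch. The function names `split_node` and `fig5_beliefs` are hypothetical and not part of the HIS system, and the type node is assumed to start with mass 1.0, which is what the beliefs shown in Fig. 5 imply.

```python
def split_node(node_mass, belief, rule_prior):
    """Split one uninstantiated node.

    Returns ((remaining_mass, remaining_belief), (child_mass, child_belief)).
    """
    if not 0.0 < rule_prior <= node_mass:
        raise ValueError("rule prior must be positive and at most the node mass")
    share = rule_prior / node_mass  # fraction of the belief that moves to the child
    remaining = (node_mass - rule_prior, belief * (1.0 - share))
    child = (rule_prior, belief * share)
    return remaining, child


def fig5_beliefs():
    """Replay the three splits of Fig. 5 and return the four partition beliefs."""
    # Root partition task() has mass 1.0 and belief 1.0; split off find (prior 0.4).
    (_, b_task), (_, b_find) = split_node(1.0, 1.0, 0.4)
    # Inside the find partition, the type node (assumed mass 1.0) is split for restaurant (0.3).
    (type_mass, b_type), (_, b_restaurant) = split_node(1.0, b_find, 0.3)
    # The remaining type node (mass 0.7) is split again for bar (0.4).
    (_, b_type), (_, b_bar) = split_node(type_mass, b_type, 0.4)
    return {"task()": b_task, "restaurant": b_restaurant, "bar": b_bar, "type=?": b_type}
```

Because each split only redistributes the belief of the partition being split, the beliefs returned by `fig5_beliefs()` still sum to one.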
For the case of non-terminal nodes, the partition split probability $P(p' \mid p)$ is specified as a prior in the node
expansion rules (indicated by a $\rightarrow$). This prior can be estimated by counting occurrences of each class type in a
training corpus. However, for the case where an atomic value $a$ is assigned to a terminal node $x$, using a simple
prior for $P(p' \mid p) = P(a \mid x)$ would severely under-estimate the probability since in practice it will be heavily
conditioned by the values of the other terminal nodes in the goal tree. Hence, in this case, $P(p' \mid p)$ is estimated as
$n_e(x, a, s_u)/n_e(x, s_u)$ where the numerator is the number of database entities consistent with the current goal
hypothesis $s_u$ when $x = a$ and the denominator is the number of database entities consistent with $s_u$ when $x$
is unspecified.
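As a hedged illustration of the database-count estimate, the sketch below computes the ratio of entity counts over a toy database. The representation of entities and goals as dictionaries, and the names `count_entities` and `terminal_split_prior`, are assumptions made for this example only.

```python
def count_entities(database, constraints):
    """Number of database entities consistent with every attribute-value constraint."""
    return sum(
        all(entity.get(attr) == value for attr, value in constraints.items())
        for entity in database
    )


def terminal_split_prior(database, goal, slot, value):
    """Estimate n_e(x, a, s_u) / n_e(x, s_u) for terminal node x=slot and atomic value a=value.

    `goal` holds the attributes already instantiated in the goal hypothesis,
    with `slot` left unspecified.
    """
    denominator = count_entities(database, goal)
    if denominator == 0:
        return 0.0  # no entity matches the goal at all
    numerator = count_entities(database, {**goal, slot: value})
    return numerator / denominator


# Toy database (illustrative values only).
TOY_DB = [
    {"type": "restaurant", "food": "Chinese", "area": "central"},
    {"type": "restaurant", "food": "Chinese", "area": "central"},
    {"type": "restaurant", "food": "Italian", "area": "central"},
    {"type": "restaurant", "food": "Italian", "area": "east"},
    {"type": "restaurant", "food": "Italian", "area": "west"},
    {"type": "bar", "area": "central"},
]
```

With this toy data, conditioning on `{"type": "restaurant", "area": "central"}` gives a prior of 2/3 for `food=Chinese`, whereas ignoring the rest of the goal gives 1/3, which shows how strongly the other terminal nodes can condition the estimate.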
Thus, in the HIS model, partitions of user goal space are represented by a forest of trees where each tree
represents a single partition. At the start of a dialogue, there is just one partition represented by a single root
node with belief mass unity. Each incoming user act is matched against each partition in turn and if there is a
match, nothing needs to be done. However, if there is no match, the ontology rules are consulted and the sys-
tem attempts to create a match by expanding the tree. This expansion will result in partitions being split and
their belief mass redistributed accordingly. This is illustrated in Fig. 5. Following the initial system prompt, the
user requests something but due to poor speech recognition the understanding component generates two pos-
sible hypotheses. These firstly cause the task node to be split to create a find task. Since there is no match
for the type=restaurant item, an entity node is created with subtype venue and then its type node is
split to create a subtype restaurant. The second user act hypothesis is then matched against the set of par-
titions, and the type node is split again to create the bar subtype. The way in which the prior probabilities
are applied is indicated by the numbers on the top of the nodes.
The figure also illustrates the way that the belief is redistributed by the splitting process. It is important to
stress that this belief refinement is quite distinct from belief monitoring. The probability of each hypothesised
user act is irrelevant to the splitting process. The whole process is designed to be conceptually equivalent to a
system where all possible trees are fully expanded from the outset and belief monitoring is applied to all pos-
sible partitions.
[Figure 4: tree diagram, not reproduced. The tree corresponds to
find(venue(type=restaurant(food=chinese, price=cheap), name=?, area=central, addr=?, phone=?)).
Key: class, subtype, atomic.]
Fig. 4. Tree structure representing the user goal: "find a cheap Chinese restaurant in the centre of town".
3.3. The observation model
The observation model probabilities $P(o \mid a_u)$ are derived directly from the N-best list of hypotheses
generated by the speech understanding component by assuming that
$$P(o \mid a_u = \tilde{a}_u^i) = k_o \, p_i, \qquad (12)$$
where $\tilde{a}_u^i$ and $p_i$ are defined in (8) i.e. the posterior probability of the N-best list element corresponding to $a_u$.
In practice, the constant $k_o$ can be ignored since it is subsumed by the constant $k$ in the belief update Eq. (11).
In practice, the quality of the N-best list of user acts is crucial to obtaining robust performance (Thomson
et al., 2008c). In the current system, the speech recogniser generates a word lattice which is then converted to a
confusion network (Evermann and Woodland, 2000). A ranked list of word level hypotheses is then generated
from this confusion network in which the probability of each hypothesis is given by the product of each
constituent word posterior including any null arcs. Each word level hypothesis is then parsed to produce a user
dialogue act and any resulting duplicates are merged by summing their probabilities. Currently, the speech
understanding component is deterministic and does not modify the word level probabilities.¹⁰
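The construction of the N-best list of user acts can be sketched as follows. This is a minimal illustration, assuming each word-level hypothesis arrives as a pair of words and per-word posteriors; the `parse` argument stands in for the semantic parser and the function names are hypothetical.

```python
from collections import defaultdict
from math import prod


def hypothesis_probability(word_posteriors):
    """Probability of a word-level hypothesis: product of its word posteriors."""
    return prod(word_posteriors)


def merge_nbest(word_hypotheses, parse):
    """Parse each word-level hypothesis and merge duplicate dialogue acts.

    word_hypotheses: iterable of (words, posteriors) pairs.
    parse: callable mapping words to a hashable dialogue act (assumed deterministic).
    Returns a list of (dialogue_act, probability) sorted by decreasing probability.
    """
    merged = defaultdict(float)
    for words, posteriors in word_hypotheses:
        merged[parse(words)] += hypothesis_probability(posteriors)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Since the parser is deterministic, two word sequences that differ only in words irrelevant to the parse collapse into one dialogue act whose probability is the sum of the two word-level probabilities.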
3.4. The user action model
The HIS user action model is a hybrid model, consisting of a dialogue act type bigram model and an item
matching model
$$P(a'_u \mid p', a_m) = \underbrace{P(T(a'_u) \mid T(a_m))}_{\text{bigram model}} \cdot \underbrace{P(M(a'_u) \mid p', a_m)}_{\text{item matching model}}, \qquad (13)$$
[Figure 5: partition-splitting diagram, not reproduced. Turn 0 holds 1 partition: task(), b=1.0. In Turn 1 the
system asks "How may I help you?", the user replies "I want to find a <mumble>." and the understanding component
outputs inform(task=find, type=restaurant) and inform(task=find, type=bar). The result is 4 partitions:
task(), b=0.6; find(venue(restaurant(food=?, ...), name=?, ...)), b=0.12;
find(venue(bar(drinks=?, ...), name=?, ...)), b=0.16; find(venue(type=?, name=?, ...)), b=0.12.]
Fig. 5. Illustration of partition splitting.
¹⁰ The Phoenix decoder is currently used for semantic parsing (Ward, 1991).
where $T(\cdot)$ denotes the type of the dialogue act and $M(\cdot)$ denotes whether or not the dialogue act matches the
given user goal partition $p'$ and the last system act $a_m$.
The bigram model reflects the dialogue phenomenon of adjacency pairs (Schegloff and Sacks, 1973). For
example, a question is typically followed by an answer, an apology by an apology-downplayer, and a confir-
mation (‘‘You want Chinese food?”) is typically followed by an affirmation (‘‘Yes please.”), or negation (‘‘No,
I want Indian.”). The bigram model is trained from data using maximum likelihood with Witten–Bell
smoothing.
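A small sketch of such a bigram model is given below. It assumes the interpolated form of Witten–Bell smoothing with an unsmoothed unigram distribution over user act types as the lower-order model; the paper does not state which variant is used, and the class name `WittenBellBigram` is hypothetical.

```python
from collections import Counter, defaultdict


class WittenBellBigram:
    """P(user act type | last system act type) with interpolated Witten-Bell smoothing."""

    def __init__(self, pairs):
        # pairs: iterable of (system_act_type, user_act_type) observed in training data
        self.bigram = defaultdict(Counter)
        self.unigram = Counter()
        for system_type, user_type in pairs:
            self.bigram[system_type][user_type] += 1
            self.unigram[user_type] += 1
        self.total = sum(self.unigram.values())
        if self.total == 0:
            raise ValueError("training data must not be empty")

    def prob(self, user_type, system_type):
        p_unigram = self.unigram[user_type] / self.total
        followers = self.bigram.get(system_type)
        if not followers:
            return p_unigram  # unseen system act type: fall back to the unigram model
        n = sum(followers.values())  # number of user acts seen after this system act type
        t = len(followers)           # number of distinct user act types seen after it
        return (followers[user_type] + t * p_unigram) / (n + t)
```

Under this form, a user act type never seen after a given system act type still receives some probability, in proportion to how often it occurs overall, while the probabilities over all observed user act types continue to sum to one.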
The item matching model is deterministic, assigning either a full match probability or a no match probability,
depending on the outcome of matching the user act with the given user goal partition. For example, the user is
not likely to ask about Indian food when the user goal actually indicates that he wants a Chinese restaurant.
Therefore, the item arguments of an inform act should match the partition. On the other hand, a negation is
not likely if the content of the last system act matches the partition. The matching probabilities themselves are
optimised empirically.
The design of the matching model is formalised in terms of dialogue act preconditions (Cohen and Perrault,
1979). The preconditions of an action specify the conditions that the assumed dialogue state has to satisfy in
order for an agent to perform that action. For example, a user wanting to find a Chinese restaurant is
motivated to perform the action inform(type=restaurant, food=Chinese) (assuming cooperativity in
the sense that the system will try to satisfy the user's goal once it has been informed about it). Each
precondition is defined in terms of an agent (typically the user U), a propositional attitude (typically WANTS), and a
propositional argument (typically an attribute–value pair). For example, inform(food=Chinese) has
the precondition 'U WANTS (food=Chinese)', whereas a negate() after confirm(food=Indian) has
the precondition 'U not WANTS (food=Indian)'. Table 4 presents some examples of HIS system user dialogue
acts and their preconditions in terms of the propositional attitudes of the user, and what matching operations
against the given user goal partition are required for these preconditions to be satisfied. Note that since the
preconditions do not depend on specific attribute names or their values, the item match model specification
is domain-independent.
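The deterministic item matching step can be sketched as below for a few of the user acts in Table 4. This is an illustration under simplifying assumptions: a partition is reduced to a flat dictionary of instantiated attribute values, dialogue acts are (type, items) pairs, and the probabilities 0.9 and 0.1 are placeholders, since the paper only states that the matching probabilities are optimised empirically.

```python
def items_match(items, partition):
    """True if every attribute-value item agrees with the partition."""
    return all(partition.get(attr) == value for attr, value in items.items())


def item_match_prob(user_act, partition, last_system_act=None,
                    p_match=0.9, p_no_match=0.1):
    """Return the full-match or no-match probability for a user act.

    user_act and last_system_act are (act_type, items) pairs, e.g.
    ("inform", {"food": "Chinese"}). p_match and p_no_match are hypothetical values.
    """
    act_type, items = user_act
    matched = items_match(items, partition)
    if act_type in ("affirm", "negate") and last_system_act is not None:
        system_type, system_items = last_system_act
        if system_type == "confirm":
            confirmed = items_match(system_items, partition)
            # affirm requires the confirmed items to match; negate requires that they do not
            matched = matched and (confirmed if act_type == "affirm" else not confirmed)
    return p_match if matched else p_no_match
```

For a partition representing a Chinese restaurant, `inform(food=Indian)` receives the no-match probability, while `negate()` following `confirm(food=Indian)` receives the full-match probability. Because the function never inspects attribute names, the same code applies to any domain.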
3.5. The dialogue history model
Each user goal partition represents one possible interpretation of the goal that is in the user’s mind and
which is motivating the current query. As the dialogue progresses, the attributes and values which comprise
Table 4. Preconditions and item match conditions required by the item match model for some typical user acts.
Where relevant, the last system act $a_m$ is shown as [sys: act]. The relation KNOWS_VAL means "knows the value of";
BEL means "believes"; and not(a = x) means that the item (a = x) may not match the partition.

| User act | Preconditions | Items to match |
|---|---|---|
| inform(a=x, b=y) | U WANTS a = x, b = y | a = x, b = y |
| request(a,b=x) | U WANTS b = x; U WANTS U KNOWS_VAL a | b = x; a |
| reqalts(a = x) | U WANTS a = x | a = x |
| confirm(a = x) | U WANTS U KNOWS_IF a = x | a = x |
| affirm() | [sys: confirm(a = x)] U WANTS a = x | a = x |
| affirm(b = y) | [sys: confirm(a = x)] U WANTS a = x; U WANTS b = y | a = x; b = y |
| negate() | [sys: confirm(a = x)] not(U WANTS a = x) | not(a = x) |
| negate(b = y) | [sys: confirm(a = x)] U BEL S BEL U WANTS a = x; not(U WANTS a = x); U WANTS b = y | not(a = x); b = y |
the goal will be mentioned by both the user and the system in various contexts. The purpose of the dialogue
history model is to track the status of these attributes and values using a grounding-model (Traum, 1999).
Each terminal node in the associated partition is assigned a grounding state as shown in Table 5.
These states are updated according to a simple set of transition rules as the dialogue progresses. For exam-
ple, if the user requests ‘‘a bar in the centre of town”, the grounding state of the nodes representing
area=central will be UInf. If the system then queries that the desired area is indeed ‘‘central” and the user
confirms it, then the grounding state will be updated to Grnd.
It is important to emphasise that the grounding states of nodes in user goal trees are not deterministic. Any
node may have multiple possible states depending on the possible dialogue histories that led to the current
state. The combination of a specific user goal partition, last user act and specific dialogue history constitutes
a hypothesis in the HIS system. Fig. 6 illustrates the way that hypotheses are formed in more
detail. In this example, the system had previously output inform(music=Jazz) and the user's response
was either request(food) or deny(music=Jazz). Previously there was a single dialogue history
hypothesised for the given fragment of partition $p'$ with both nodes in the Init state. After completing the turn, there
are two distinct dialogue history states corresponding to the two different interpretations of the user input.
The actual probability $P(s'_d \mid p', a'_u, s_d, a_m)$ returned by the dialogue history model is deterministic. If after
updating the history from $s_d$ to $s'_d$, a resulting hypothesis $\langle p', a'_u, s'_d \rangle$ is inconsistent, for example the user has
denied a goal in $p'$, then $P(s'_d \mid p', a'_u, s_d, a_m) \approx 0$, otherwise $P(s'_d \mid p', a'_u, s_d, a_m) \approx 1$.
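The history update for the example of Fig. 6 can be sketched as follows. The transition rules coded here cover only the cases mentioned in the text and in Fig. 6 and are assumptions rather than the system's full rule set; the two probabilities are treated as exactly 0 and 1 for simplicity, and the function names are hypothetical.

```python
def update_history(history, system_act, user_act):
    """Return the new node -> grounding state mapping after one exchange.

    history: dict mapping node name to a state from Table 5.
    system_act, user_act: (act_type, node_names) pairs.
    """
    system_type, system_nodes = system_act
    user_type, user_nodes = user_act
    new_history = dict(history)
    # The system act first marks the nodes it mentions.
    for node in system_nodes:
        if system_type == "inform":
            new_history[node] = "SInf"
        elif system_type == "confirm":
            new_history[node] = "SQry"
    # The user act then marks the nodes it mentions.
    user_state = {"inform": "UInf", "request": "UReq", "deny": "Deny"}.get(user_type)
    if user_state is not None:
        for node in user_nodes:
            new_history[node] = user_state
    # Items supplied by the system and not challenged, or queried and affirmed, are grounded.
    for node in system_nodes:
        state = new_history.get(node)
        if state == "SInf" or (state == "SQry" and user_type == "affirm"):
            new_history[node] = "Grnd"
    return new_history


def history_probability(new_history, partition_nodes):
    """0.0 if the user has denied any node of the partition, otherwise 1.0."""
    denied = any(new_history.get(node) == "Deny" for node in partition_nodes)
    return 0.0 if denied else 1.0
```

Starting from both nodes in the Init state, the interpretation request(food) leads to food in UReq and music in Grnd, whereas deny(music=Jazz) leaves food in Init and moves music to Deny, so the second hypothesis is inconsistent with a partition containing music=Jazz.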
3.6. Summary of belief updating in the HIS system
Before moving to the topic of policy representation and optimisation, it may be helpful to summarise the
process of belief updating in the HIS system. Referring to Fig. 7, the inputs to the system consist of an obser-
vation from the user and the previous system act. The observation from the user typically consists of an N-best
list of user acts, each tagged with their relative probability ((6) and Section 3.1). The user goal is represented by
a set of branching tree structures which represent partitions of the user goal space and they are grown down-
wards controlled by application-dependent ontology rules (Section 3.2). Initially, there is a single tree node
Table 5. User goal node grounding states.

| State | Description |
|---|---|
| Init | Initial state |
| UReq | Item requested by user with expectation of an immediate answer |
| UInf | Item supplied by user during formation of a query |
| SInf | Item supplied by system |
| SQry | Item queried for confirmation by system |
| Deny | Item denied |
| Grnd | Item grounded |
[Figure 6: hypothesis-formation diagram, not reproduced. A fragment of partition p contains a restaurant type node
with nodes food (Italian) and music (Jazz). Following [S: inform(music=Jazz)], the interpretation
U: request(food) yields history state 1 with food: Init->UReq and music: Init->SInf->Grnd, while the interpretation
U: deny(music=Jazz) yields history state 2 with food: Init and music: Init->SInf->Deny.]
Fig. 6. Example hypothesis formation where the same partition has differing grounding states depending on the
interpretation of the previous user act.
representing a single partition with belief unity. As the trees are grown, the partitions are repeatedly split
allowing the belief assignment to be refined.
The tree growing process is driven entirely by the dialogue acts exchanged between the system and the user.
Every turn, the previous system act and each input user act is matched against every partition in the branching
tree structure. If a match can be found then it is recorded. Otherwise the ontology rules are scanned to see if
the tree representing that partition can be extended to enable the act to match. Once the matching and par-
tition splitting is complete, all the partitions are rescanned and for each partition, all hypothesised user acts
which have items matching that partition are attached to it.
Each combination of a user goal partition and an input user act $\langle p, a_u \rangle$ forms a partial hypothesis and the
observation probability and user act model probability can be calculated as in (12) and (13).
The grounding status of each tree node is recorded in the dialogue history state $s_d$. Since the grounding
status of a tree node can be uncertain, any $\langle p, a_u \rangle$ pair can have multiple grounding states attached to it. However,
unlike the user act component of the state which is memoryless, the dialogue history evolves as the dialogue
progresses. Thus, at the beginning of each dialogue cycle, the various dialogue state instances are stored
directly with the partitions. Once the input user acts have been attached to the partitions, the dialogue history
states are updated to represent the new information in the dialogue acts. At this point, the dialogue history
state probabilities (Section 3.5) are computed. At the end of the turn, identical dialogue history states attached
to the same partition are merged ready for the next cycle.
Every distinct triple $\langle p, a_u, s_d \rangle$ remaining at the end of the above process represents a single dialogue
hypothesis $h_k$. The belief in each $h_k$ is computed using (11) and the complete set of values $b(h_k)$ represents the current
estimate of the POMDP belief state. This belief state is input to the POMDP policy which determines the next
system output. The way that policies are represented and the implementation of the decision process is
described next.
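The final normalisation step can be sketched as below. Eq. (11) lies outside this section, so the sketch assumes that it amounts to a normalised product of the observation probability, the user act model probability, the dialogue history probability and the belief inherited from the parent partition after splitting; the dictionary keys and the function name are hypothetical.

```python
def update_beliefs(hypotheses):
    """Compute normalised beliefs b(h_k) for a list of hypotheses.

    Each hypothesis is a dict with keys:
      'obs'      observation probability of its user act, as in (12)
      'user_act' user action model probability, as in (13)
      'history'  dialogue history probability (Section 3.5)
      'prior'    belief carried over from the partition after splitting
    """
    scores = [
        h["obs"] * h["user_act"] * h["history"] * h["prior"]
        for h in hypotheses
    ]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("every hypothesis has zero score")
    # Dividing by the total plays the role of the normalising constant k.
    return [score / total for score in scores]
```

Any constant factor shared by all hypotheses cancels in the division, which is why a constant such as $k_o$ in (12) has no effect on the result.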
4. Policy representation and optimisation
As mentioned in Section 2.1, the HIS system represents policies by a set of grid points in summary belief
space and an associated set of summary actions. Beliefs in master space are mapped first into summary space
and then mapped into a summary action via a dialogue policy. The resulting summary action is then mapped
back into master space and output to the user. This mapping is necessary because accurate belief monitoring
requires that the full propositional content of user goals and dialogue acts be maintained, whereas policy
[Figure 7: block diagram, not reproduced. Recoverable labels: Observation (N-best list of user acts) from user;
previous system act from system; Ontology Rules; partitions with attached user acts and dialogue history states;
hypotheses; Belief State; POMDP Policy; Strategic Action; Action Refinement (heuristic); Specific Action;
Application Database.]
Fig. 7. Overview of the HIS system operation.
optimisation requires a more compact space which can be covered by a reasonable number of grid points. This
section explains this mapping process in more detail and describes the policy optimisation algorithm.
4.1. Summary space and dialogue policies
In the current HIS system, each summary belief point is a vector consisting of the probabilities of the top
two hypotheses in master space; two discrete status variables, h-status and p-status, summarising the state of
the top hypothesis and its associated partition (see Tables 6 and 7); and the type of the last user act.
The set of possible machine dialogue acts is also compressed in summary space. This is achieved by remov-
ing all act items leaving only a reduced set of dialogue act types. When mapping back into master space, the
necessary items (i.e. attribute–value pairs) are inferred by inspecting the most likely dialogue hypotheses. The
full list of summary actions is given in Table 8.
Given the above, a dialogue policy can be represented as a fixed set of belief points in summary space (i.e. a
grid) along with the action to take at each point. In order to use such a policy, a distance metric in belief space
is required to find the closest grid point to a given arbitrary belief state. In the prototype HIS system, this
distance metric is¹¹
Table 6. Hypothesis status values used in summary belief space.

| H-status | Meaning |
|---|---|
| initial | initial state of a hypothesis |
| supported | at least one grounded node in associated partition |
| offered | entity consistent with this hypothesis has been offered to user |
| accepted | offered entity has been accepted |
| rejected | at least one node in associated partition is denied |
| notfound | no solution to user goal defined by partition is possible |
Table 7. Partition status values used in summary belief space.

| P-status | Meaning |
|---|---|
| initial | initial state of a partition |
| generic | partition is consistent with at least one node instantiated |
| hugegroup | set of database entities matching partition is under-specified |
| smallgroup | set of database entities matching partition is fully specified |
| unique | partition is consistent with a single unique matching entity |
| unknown | no entities in database are consistent with this partition |
Table 8. List of summary acts.

| Summary act | Meaning |
|---|---|
| greet | greet the user |
| request | request information |
| confreq | implicitly confirm and request further information |
| confirm | explicitly confirm some information |
| offer | offer an entity to the user |
| inform | provide further information |
| split | ask user to distinguish between two options |
| findalt | find an alternative solution to user's goal |
| querymore | ask user if more information is required |
| bye | say goodbye |
¹¹ But see Section 6.
$$|\hat{b}_i - \hat{b}_j| = \sum_{k=1}^{2} \alpha_k \cdot \sqrt{(\hat{b}_i(k) - \hat{b}_j(k))^2} + \sum_{k=3}^{5} \alpha_k \cdot (1 - \delta(\hat{b}_i(k), \hat{b}_j(k))), \qquad (14)$$
where the $\alpha$'s are weights, the index $k$ ranges over the 2 continuous and 3 discrete components of $\hat{b}$ and $\delta(x, y)$
is 1 iff $x = y$ and 0 otherwise.
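A grid-based policy lookup using the distance in (14) might be sketched as follows. The tuple layout of a summary point (two probabilities followed by h-status, p-status and last user act type), the unit weights and the function names are assumptions for illustration.

```python
def summary_distance(b_i, b_j, weights):
    """Distance (14) between two summary belief points.

    Each point is a 5-tuple: (top belief, second belief, h-status, p-status, last user act type).
    weights is a sequence of five non-negative numbers (the alphas).
    """
    # Continuous components: sqrt((x - y)^2) is the absolute difference.
    continuous = sum(weights[k] * abs(b_i[k] - b_j[k]) for k in range(2))
    # Discrete components: cost alpha_k whenever the two values differ.
    discrete = sum(weights[k] * (0.0 if b_i[k] == b_j[k] else 1.0) for k in range(2, 5))
    return continuous + discrete


def nearest_grid_point(belief, policy, weights):
    """Return the (grid_point, summary_action) entry of the policy closest to belief."""
    return min(policy, key=lambda entry: summary_distance(belief, entry[0], weights))
```

With unit weights, a mismatch in any single discrete component costs as much as the largest possible difference in one of the probabilities, so the choice of weights determines how strongly the status variables dominate the lookup.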
4.2. Master–summary space mapping
The process of mapping between master and summary space is illustrated in more detail in Fig. 8. On the
left of this figure is master space consisting of a set of dialogue hypotheses where, as discussed previously, each
hypothesis consists of a user goal partition $p$, a dialogue history state $s_d$ and the last user act $a_u$. Note also that
each hypothesis has associated with it a notional set of database entries consisting of all entities in the database
which are consistent with the hypothesis's partition. On the right of Fig. 8 is summary space represented by a
single vector or belief point.
The policy is shown as an irregular grid of these belief points and the figure shows how a system response is
generated by mapping the current belief state $b$ into a summary belief state $\hat{b}$, then finding the nearest stored
point in the policy $\hat{b}_i$ which in turn yields a summary action $\hat{a}_m^i$. This is then mapped back into master space by
a heuristic which assumes that the selected summary action refers to the top hypothesis and therefore
[Figure 8: mapping diagram, not reproduced. Master space (left) holds the ranked hypotheses, each a triple of
partition, dialogue history state and last user act. Summary space (right) holds the vector
(b(1), b(2), h-status, p-status, last uact). The policy maps the nearest grid point to a summary action,
e.g. confirm, and items are heuristically added to form the machine action in master space,
e.g. confirm(area=central).]
Fig. 8. Master–summary state mapping.