Kid Space: Interactive Learning in a Smart Environment
Glen J. Anderson
glen.j.anderson@intel.com
Selvakumar Panneer
selvakumar.panneer@intel.com
Meng Shi
meng.shi@intel.com
Carl S. Marshall
carl.s.marshall@intel.com
Ankur Agrawal
ankur1.agrawal@intel.com
Rebecca Chierichetti
rebecca.chierichetti@intel.com
Giuseppe Raffa
giuseppe.raffa@intel.com
John Sherry
john.sherry@intel.com
Anticipatory Computing Lab
Intel Labs
Hillsboro, Oregon
United States
Daria Loi
daria.a.loi@intel.com
Lenitra Megail Durham
lenitra.m.durham@intel.com
ABSTRACT
Kid Space is a smart space for children, enabled by an innovative,
centralized projection device that senses multimodal interactivity
and intelligently projects augmented reality (AR) content across
surfaces. Kid Space uses a visible agent to guide learning through
play. Two preliminary studies evaluated Kid Space with children
5 to 8 years old. Study 1 showed that children engaged
enthusiastically with the projected character during a math
exercise and during physically active games. A parent
questionnaire showed that parents valued Kid Space for learning
and physical activity. Study 2 found that children engaged with a
projected agent at a closer distance than with a television. Parents
showed a preference for a projected AR agent over an agent on a
television or a standard projection. Parents also showed a
preference for an agent that demonstrated awareness of children’s
physicality in the space.
CCS CONCEPTS
• Ubiquitous and mobile computing → Ubiquitous and mobile
computing systems and tools
KEYWORDS
User interface design, ambient computing, smart spaces,
intelligent agents, embodiment, presence, projected computing,
augmented reality.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
GIFT'18, October 16, 2018, Boulder, CO, USA
©2018 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-6077-7/18/10…$15.00 https://doi.org/10.1145/3279981.3279986

1 INTRODUCTION
This paper reports progress on an ongoing project we call Kid
Space. It is an interactive learning environment enabled by
intelligent projection and sensing technologies. Following a
design thinking process [23], we are iterating the technical
components as we test experience designs with simulated versions
of the future product. The Related Work section positions Kid
Space within previous interactive systems and multimodality
research. The System section describes the functional space
mapping and intelligent projection technology. The User
Evaluation section details two studies in which children interacted
with a simulated smart space. The space included an animated character that showed awareness of the space and interacted with children as though it could see them, hear them, and understand their actions. The Discussion section presents a vision of Kid Space based on what we learned.
1.1 Contributions of This Paper
Kid Space is a perspective for conceptualizing new smart spaces,
and it is a prototype currently under development. This paper
makes the following contributions:
1. A description of the Projected Compute Multi-Axis
Gimbal (PCMAG), a centralized apparatus to provide
smart space functionality
2. Findings from exploratory user testing on interaction
approaches in projected augmented reality (AR) spaces
for children, particularly what aspects children and
parents found engaging and useful
3. A vision of Kid Space that utilizes PCMAG and
additional technologies
2 RELATED WORK
Research has shown that technology can enable many approaches
to benefit childhood learning [28]. For example, systems that
encourage creative building can be inherently instructive [26], and
interactive story making can improve literacy [3,20]. However, a sense of presence, agency, and satisfaction are not simply a function of greater interactivity in a space [20], and increased presence does not necessarily increase a sense of engagement. Factors such as immersion also play a role. Much research remains before systems can not only understand children's activities within a space but also provide the appropriate learning intervention [19].
Kid Space is a smart space for children that uses a visible agent to
guide learning through play, enabled by a centralized projection
device that senses multimodal interactivity and intelligently
projects AR content across surfaces. Environments such as Kid
Space offer the potential for rich displays of content, multimodal
input/output, vigorous physical activity, shared experiences, and
motivational feedback based on monitoring of activities [1]. In
this section, we scaffold the paper by briefly discussing the theoretical and pedagogical concepts that grounded Kid Space and by surveying a sample of relevant work and literature.
At a high level, Kid Space is grounded in the notion of Multi-
Pedagogical Frameworks [18] – these are scaffolds that integrate
notions of Multiliteracy [21], Multimodality [12,15], and
Multisensoriality [4].
Multiliteracy [21], an approach to literacy theory and pedagogy
developed in the mid-1990s by the New London Group, argued
for the creation of new forms of literacy to address a shift in how
people communicate because of new technologies and a linguistic
shift within different cultures because of increased transnational
migration. The multiliteracies approach involves four key aspects:
Situated Practice (i.e., ground learning in one’s own life
experiences); Critical Framing (i.e., support one in questioning
common sense assumptions); Overt Instruction (i.e., help one
understand expressive forms and grammars’ components); and
Transformed Practice (i.e., engage one in situated practices based
in new understandings of literacy practices). By guiding learning
through play, Kid Space’s visible agent, with its ability to
dynamically interact with the physical space, offers great
opportunities for situated as well as transformed practice.
Additionally, the tangible, multimodal, and interactive nature of the experience affords multiple opportunities to engage children in critical framing and overt instruction.
Multimodality [12,15], a theory of communication and social
semiotics, describes communication practices in terms of diverse
modes (e.g., textual, aural, linguistic, spatial, visual) used to
compose a message or, in the case of this paper, to create an
artifact. At a high level, multimodality concerns the deliberate use of multiple modes and the understanding that such deliberate use creates meaning, shaping how an idea or concept may be
received. Kid Space engages children through multiple modalities:
its visible agent can communicate with them (linguistic, aural
modes) as well as move around and interact with physical objects
(visual, spatial modes). Additionally, the technology enables the
use of surfaces as spaces on which to display shapes, text, and illustrations (textual, visual modes) and to physically explore, touch, inhabit, and cluster around (visual, spatial modes).
Multisensorial spaces are complex environments "made up of sensory contrasts and overlappings that are phenomenologically distinct"; such spaces should go beyond "being simply rich in stimuli," instead focusing on providing "different sensory values so that each individual can tune into his or her own personal reception characteristics" [4]. By
interacting with physical objects and furniture and by engaging
children with spaces that mix the real and the projected, the
visible agent and all projected elements deeply enrich the overall
sensory experience.
A focus on Multi-Pedagogical Frameworks [18] helped us keep track of the types of experiences we wished to develop. We were deliberate in shaping those experiences, from the value-proposition level down to specific use cases and interaction modes.
Smart spaces have a rich and creative history. KidsRoom
demonstrated a visually augmented space in a simulated bedroom
[1]. The setup required multiple rear projection systems and
several computers to process machine vision. The system guided
children through a narrative and tracked their movement in
reaction to the story, which enabled the system, for example, to
repeat an instruction for each child to dance on a rug. Cassell [3]
described several studies with functional prototypes that enabled
tangible interactions with computing systems in order to develop
literacy in children. Sam the Castlemate featured an intelligent agent named Sam, projected next to a dollhouse. Sam supported
development of literacy by encouraging use of language as a tool
and use of language in contexts that are new to a user. Children
constructed their own stories, using real toys that the system
detected through RFID, following Sam’s guidance. RoomAlive
was a system composed of multiple projector-plus-depth-camera
units, which projected around a room, creating an immersive AR
experience [13,33]. It tracked several aspects of users, including
location, body pose, air gestures, and aim with an infrared gun.
The tracking enabled applications to respond to a user’s
movements, for example with in-game characters hiding,
attacking, and responding to gestures such as karate chops.
Much work has focused on understanding user intentions through
user position, pose, speech, and gaze [11,29]. The literature on
multimodal robot-human interaction is particularly instructive for
smart spaces with interactive characters. For example, Giuliani et al. showed that a robotic bartender with machine vision and voice interaction capabilities could have positive social interactions with people [8]. With another robotic bartender,
Foster et al. [6] found that a trained classifier outperformed a rule-
based classifier, using multimodal sensor inputs to determine user
intentions. Fujimura et al. demonstrated that projected agents
could improve problem-solving when spatial information was
required [7].
Research on the use of AR for teaching indicates that visual AR is best suited to concepts that require visual understanding and visualization [25,26]. These concepts may
include learning spatial structures and learning language with
visual associations. Use of AR can increase motivation, retention,
and task performance if interactions are natural and if curriculum
developers attend to good design.
As sensing technologies and the algorithms that make sense of the
data improve, smart environments can more successfully interpret
user intentions. Likewise, as spatial mapping and projection
technologies improve, systems can better understand
environments and evolve user interaction with smart spaces.
There would be value in providing a Kid Space system that has
the following characteristics:
• Requires fewer components and an easier installation than previous examples
• Augments a space, allowing users to retain focus on reality, including the real objects in the space
• Provides projected output across an entire room
• Dynamically responds to people and objects in the environment
• Is compelling for children to learn through play
3 SYSTEM
In this section, we describe the hardware components that make up the Projected Compute Multi-Axis Gimbal (PCMAG), the software needed to make the agent appear, and the challenges in bringing all of these components together.
3.1 Hardware
Our initial prototype consists of a compute device, a projector, a LiDAR sensor, a fisheye camera, and steerable motors. Figure 1 shows the components that make up the system.
Figure 1: Components of the System
The steerable unit contains the following:
• A focus-free laser projector (UO Smart Beam Laser Beam Pro C200)
• Steerable motors for controlling pan/tilt of the unit (Dynamixel MX-28T Robot Actuator)
• A motor controller to activate the pan/tilt motors (ArbotiX-M Robocontroller)
• A LiDAR sensor for precise distance measurement from the projector to the projected surface (Garmin LIDAR-Lite 3 Laser Rangefinder)
The non-steerable unit includes a compute device (Intel i7 NUC
Skull Canyon) and a fisheye camera (Kodak PixPro 4k) to track
the location of the people in the confined space. See Figure 2.
Beamatron [34] demonstrated an earlier steerable projection system, including a depth camera for interaction. PCMAG substitutes a laser projector and adds LiDAR for room mapping.
Figure 2: Projection Multi-axis Unit
3.2 System Software
Our Kid Space software system goals were to map the physical world to a virtual space, provide fine-grained control of the projection multi-axis unit (guided by the animation of the virtual agent in the virtual world), and provide interaction with the projected content. Figure 3 illustrates the overall process of projecting a virtual agent into a real-world environment.
Figure 3: Process of capturing the environment, mapping to the virtual world, and displaying back into the real world
First, leveraging the LiDAR sensor in the steerable unit, we steer
the pan/tilt motors to capture the point cloud of the indoor space,
allowing us to recreate the 3D virtual environment in Unity 3D.
On every pan/tilt movement, we trigger the LiDAR to obtain the distance to the object it hits within the environment. We generate a 3D point for every possible pan/tilt pose using the pan/tilt angles and the distance value obtained from the LiDAR, as sketched below.
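As a concrete illustration, the following Python/NumPy sketch converts each pan/tilt pose plus a LiDAR range reading into a 3D point. The axis conventions, motor ranges, and the lidar_read callback are our assumptions for illustration, not the actual PCMAG firmware interface.

    import numpy as np

    def pan_tilt_to_point(pan_deg, tilt_deg, distance_m):
        # Assumes the unit sits at the origin, pan rotates about the
        # vertical (y) axis, and tilt is elevation above the horizontal.
        pan, tilt = np.radians(pan_deg), np.radians(tilt_deg)
        return distance_m * np.array([
            np.cos(tilt) * np.sin(pan),   # x
            np.sin(tilt),                 # y (up)
            np.cos(tilt) * np.cos(pan),   # z
        ])

    def scan_room(lidar_read, pan_range=(-90, 90), tilt_range=(-45, 45), step=1.0):
        # Sweep the motors and accumulate a point cloud; lidar_read(pan, tilt)
        # stands in for triggering the rangefinder at each pose.
        return np.array([
            pan_tilt_to_point(pan, tilt, lidar_read(pan, tilt))
            for pan in np.arange(pan_range[0], pan_range[1] + step, step)
            for tilt in np.arange(tilt_range[0], tilt_range[1] + step, step)
        ])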
Second, the system calculates face normals from adjacent 3D points and unifies normal values within a threshold to create a series of horizontal and vertical planes. We use these horizontal and vertical planes to generate appropriate 3D planes in Unity 3D. Then, we identify the upper portions of the horizontal 3D planes, which represent the furniture surfaces in the room that the virtual agent can navigate (Figure 4).
Figure 4: Mapping 3D point cloud data to Unity 3D
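A minimal sketch of the normal-based classification described above, assuming the scan points are arranged in a 2D grid indexed by tilt and pan and that world up is +y; the 10-degree threshold is illustrative, not the system's tuned value.

    import numpy as np

    UP = np.array([0.0, 1.0, 0.0])

    def face_normal(p0, p1, p2):
        # Unit normal of the face spanned by three adjacent scan points.
        n = np.cross(p1 - p0, p2 - p0)
        return n / np.linalg.norm(n)

    def classify_faces(grid, angle_thresh_deg=10.0):
        # Faces whose normals are near-parallel to UP are horizontal
        # (candidate furniture tops the agent can stand on); faces whose
        # normals are near-perpendicular to UP are vertical (candidate walls).
        cos_t = np.cos(np.radians(angle_thresh_deg))
        sin_t = np.sin(np.radians(angle_thresh_deg))
        horizontal, vertical = [], []
        for i in range(grid.shape[0] - 1):
            for j in range(grid.shape[1] - 1):
                n = face_normal(grid[i, j], grid[i + 1, j], grid[i, j + 1])
                a = abs(np.dot(n, UP))
                if a > cos_t:
                    horizontal.append((i, j))
                elif a < sin_t:
                    vertical.append((i, j))
        return horizontal, vertical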
Third, to achieve smooth coordination between the hardware and software, we created a virtual projection multi-axis unit in Unity 3D (Figure 5), which mirrors the orientation of the physical unit.
The virtual projection in Unity 3D uses the ‘projective texture’ 3D
graphics feature to project a texture into the 3D environment.
When the virtual agent moves in the virtual world, a new look-at
vector is sent to the projection multi-axis unit’s software to
convert the vector into pan/tilt angles, which then direct the
physical unit to move and project the content.
Figure 5: Projective texture approach used to project the virtual character into the real world
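The look-at-vector-to-pan/tilt conversion is simply the inverse of the scan mapping. A sketch under the same assumed axis conventions as above:

    import numpy as np

    def look_at_to_pan_tilt(unit_pos, target):
        # Aim the physical projector at the virtual agent's world position.
        v = np.asarray(target, float) - np.asarray(unit_pos, float)
        pan = np.degrees(np.arctan2(v[0], v[2]))                   # about vertical axis
        tilt = np.degrees(np.arctan2(v[1], np.hypot(v[0], v[2])))  # elevation
        return pan, tilt

    # Example: unit mounted at (0, 2.5, 0) m, agent resting at (1.2, 0.9, 3.0) m.
    pan, tilt = look_at_to_pan_tilt((0, 2.5, 0), (1.2, 0.9, 3.0))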
Since the physical projector, attached to a steerable unit, can cause image distortion, we need to handle keystone correction to make the virtual agent look undistorted and to maintain its size at all possible pan/tilt angles. To do this, we rely
on projection mapping in Unity 3D. Once the system finds the
appropriate location to show the virtual agent in the room, we
conduct the following steps (see Figure 6):
• A 3D orthographic camera (1) in Unity 3D captures the virtual agent. The capture is shown in (a).
• Based on the location of the physical projector in the room, the system calculates the projective texture location (2) on the directly opposite side and projects the orthographic camera output from the previous step. The orthographic camera captures the projected content on the wall. The output on the wall is shown in (b). The orthographic camera (1) captures the frame with the projected content, as shown in (c).
• The physical projector is moved to the precise virtual camera pan/tilt location, and the output of the orthographic camera is sent to the physical projector (3) in the real world to project the content on the wall, as shown in (d).
Figure 6: Aligning the virtual camera to keystone-correct the projected image
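Our implementation performs this correction with projective texturing inside Unity 3D; for readers more familiar with 2D pipelines, the equivalent pre-warp can be expressed as a homography, as in the OpenCV sketch below. The corner coordinates are made-up calibration values, not measured ones.

    import numpy as np
    import cv2

    # Corners of the rendered frame, and where those corners land on the wall
    # when viewed head-on (i.e., the observed keystone distortion).
    render_corners = np.float32([[0, 0], [1280, 0], [1280, 720], [0, 720]])
    wall_corners = np.float32([[42, 12], [1238, 58], [1206, 704], [70, 676]])

    H = cv2.getPerspectiveTransform(wall_corners, render_corners)  # cancels keystone
    frame = np.zeros((720, 1280, 3), np.uint8)  # placeholder rendered agent frame
    prewarped = cv2.warpPerspective(frame, H, (1280, 720))
    # Projecting `prewarped` makes the image land undistorted on the wall.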
For interaction with the virtual agent, we tracked the user via a live feed from the fisheye camera (see Figure 7). Initially, we tracked the bounding box of the user in the fisheye image and then moved the virtual agent to the location closest to that user. Later, we used deep learning techniques to track the user's pose from the fisheye camera and detect a pointing gesture indicating where to move the agent.
Figure 7: Projecting the virtual character in reaction to a user's pointing gesture detected in the fisheye camera
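A sketch of the early bounding-box pipeline; detect_people stands in for the person detector, and we assume each candidate surface carries a precomputed location in fisheye-image pixels. Both are our assumptions for illustration, not the system's actual API.

    import numpy as np

    def pick_agent_surface(frame, detect_people, surfaces):
        # Detect people in the fisheye frame, take the largest box as the
        # nearest user, and return the navigable surface closest to that user.
        boxes = detect_people(frame)  # [(x, y, w, h), ...]
        if not boxes:
            return None
        x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
        user_px = np.array([x + w / 2.0, y + h / 2.0])
        return min(surfaces,
                   key=lambda s: np.linalg.norm(np.asarray(s["pixel"]) - user_px))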
Through iterating the hardware design, we realized we needed
three key system enhancements. First, we needed sub-degree
accuracy for the steerable motors. Prior iterations made dynamic
moves by the virtual agent seem discontinuous. We replaced the
motors with smart servos that allowed fast fluid control of the
virtual agent (e.g., agent jumps from one surface to another in a
smooth and controllable motion). Second, we needed consistency of the virtual agent's size in physical dimensions. We mapped the 3D virtual camera in Unity 3D to the steerable projector and allowed the projector to auto-steer based on the orientation of the 3D virtual camera. This maintained the projected content's aspect ratio and keystone correction irrespective of the projector's distance and orientation from the projected surface.
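Because the projected image grows linearly with throw distance, size consistency reduces to scaling the rendered agent by the inverse of the LiDAR-measured distance. A one-line sketch, with an illustrative calibration distance that is our assumption:

    def agent_scale(throw_distance_m, calibrated_distance_m=2.0):
        # Render at calibrated_distance / distance so on-wall size stays constant.
        return calibrated_distance_m / throw_distance_m

    assert agent_scale(4.0) == 0.5  # at twice the distance, render at half scale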
Third, we needed automatic virtual mapping of the space. After initially using a static mapping approach, with the space directly specified in Unity 3D, we realized we needed a way to automatically map a room. We added a light detection and ranging (LiDAR) device to measure the precise dimensions of the physical space.
Between the hardware and software components of the projection multi-axis unit, we have achieved a small-form-factor device that can dynamically map a room, augment the space with a virtual agent, and utilize our software framework to precisely control how the agent interacts. See Figure 8.
Figure 8: Features enabled by the Kid Space HW/SW
prototype
4 USER EVALUATION
4.1 A Kid Space Experience
Consistent with the principles of iterative design [5,30], we
designed user experiences that could take advantage of the
PCMAG system capabilities and in turn discovered new
requirements to drive system development. The laser projection
from PCMAG provides a focused image across the room;
however, maintaining the size of the visible agent at greater
distances requires more of the available projection pixels. This
combination of advantage and limitation works better for adding AR elements to a space than for creating full-wall, immersive projections.
We tested these experiences using a “Wizard of Oz” methodology
[9]. In a control room, human operators (the wizards) controlled a
projected AR agent, enabling it to show an apparent awareness of
the children in a participant room, including speaking
interactively with the children.
Our focus in the evaluation was to explore children’s interactions
in a smart space that has apparent awareness of people, objects,
and spaces. Oscar, a friendly, squeaky-voiced teddy bear, embodied the agency of the system and the wizards controlling it. Oscar was a "master of ceremonies," guiding the children through math exercises (see Figure 9) and physical activities. Oscar would appear to rest on top of horizontal surfaces, and the trajectory of his motion from place to place (e.g., jumps from a desk to a bookshelf) followed an intuitive sense of physics (see Figure 10). Oscar had no facial expressions or mouth movements, but he did make broad gestures and posture changes.
Figure 9: Oscar leading children in a math tutorial
Figure 10: Oscar jumps from desk to bookshelf
4.2 Study 1
Study 1 was loosely structured and exploratory, allowing us to gather qualitative data on children's responses to the interactive character and parent ratings of the system after the parents observed the session. The wizards in the control room spoke freely and naturally with the children through Oscar, ad-libbing many interactions as opportunities arose in the sessions. They did not rigorously follow a script but led the children through the planned activities described below.
Procedure
The sessions took place in a room where children interacted with
Oscar as he moved around on one wall, sometimes jumping
between two small pieces of furniture (see Figure 10). The parents
watched passively during the session and then responded to
questionnaires at the end of each session.
The study was composed of five sessions. In each session, 2 parents from different families each brought 1 or 2 children; each session thus had 2 parents and 2-4 children, for a total of 16 children and 10 parents. Children were 5-8 years old.
A facilitator guided children through multiple activities:
• Meet the agent (Oscar the Bear)
• Color on paper an outfit for Oscar
• Oscar wears the new outfits (which the wizards scanned and applied to Oscar)
• Do math with Oscar (e.g., count tangible objects and respond to fraction questions)
• Open-ended physical games with Oscar (e.g., "red light/green light," a game in which a child must remain still after Oscar says "red light")
Results
Qualitative indicators from the first study showed that children
engaged with Oscar in many ways:
• Carried on conversations with Oscar
• Answered math challenges
• Responded to Oscar's suggestions to participate in energetic dancing and jumping
• Emoted in response to Oscar's "wearing" outfits that the children colored themselves
• Touched Oscar's projection, pretending to feed him, to tickle him, or to experiment with the projection's reflectance on their hands or on paper
Near the end of the session, parents responded to a survey in
which they rated the value of various uses for Kid Space. Figure
11 shows the mean ratings for how parents valued possible uses
by their children. While the small sample size limits the external
validity of the survey, these parents consistently indicated they
saw high value in the system for learning and physical play, and
they valued the use of the children’s artwork in the experience.
Figure 11: Parent ratings for system use
Qualitative interviews were consistent with the survey: parents saw Kid Space as a less screen-oriented experience than tablets and indicated that they might not count its use against their children's screen-time allowance.
4.3 Study 2
Study 2 was more tightly structured than Study 1 and focused on
evaluating two aspects of Kid Space: the AR presentation style
and the agent’s awareness of the physicality of the children. The
character spoke interactively with the children, as operated by the
wizards in the control room, but in a more restricted format as
described below.
Procedure
The sessions took place in a room where children interacted with
Oscar as he moved around on one wall, sometimes jumping
between two small pieces of furniture (see Figure 10).
The study included 8 sessions, each with 2 parents, usually from
different families, and 2 children, for a total of 16 children and 16
parents. Children were 5-8 years old. The parents watched
passively during the session and then responded to questionnaires
at the end of each session.
Each child participant completed two activities. The two activities
were not counterbalanced, but the conditions within each were
counterbalanced. The first was a math activity, completed across three presentation conditions: a TV, a standard "floating" projection, and an AR projection (see Figure 12). Oscar, as
controlled by the wizards, asked children progressively more
difficult math questions as a flock of birds flew out to show a
visualization of the answer. The questions started with simple addition (e.g., 2 birds are here, 8 more join in; how many altogether?) and became more difficult when children seemed to answer easily (e.g., if 1 little chick needs 2 birds to take care of it, and there are 2 chicks now, how many birds do we need to take care of these 2 chicks?).
Figure 12: Display conditions for Study 2
The second activity was a drawing activity in which the agent
conversed with the children under two conditions. In one
condition, Oscar showed awareness of the location of each child
and awareness of the specifics of the drawing. In the second
condition, Oscar moved in the general vicinity of the children and asked general questions about the drawings, showing awareness neither of specific aspects of the drawings nor of the children's locations. Conditions were counterbalanced across
all sessions. Figure 13 shows an example snapshot from the
drawing activity.
Figure 13: Snapshot from the drawing activity
Results. Based on qualitative observations in Study 1, we noted that one indicator of engagement was body positioning, specifically how closely children approached the animated agent. For instance, in Study 1 children crowded closely around the animated agent, showing it pictures or offering it an "ice cream cone" made from clay. Therefore, in Study 2, we analyzed the distance of children from Oscar. Mean distances were 1.7, 1.2, and 1.0 meters for TV, floating, and AR, respectively. The distance used in the analysis was the most common distance of interaction for a given child (i.e., the distance at which the child spent the most time interacting with Oscar), ±0.3 m. A repeated measures
Analysis of Variance was performed on the distances for the
independent variable of presentation condition of TV, floating,
and AR (each participant was exposed to all three,
counterbalanced as described above). Results were statistically
significant (p<.05), and a post hoc Student T-test (paired for
repeated measures) showed that children interacted with Oscar
closer under the two projection conditions than under the TV
condition. Since the drawing task required children to sit near
Oscar, a proximity analysis was not done for that activity.
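The shape of this analysis can be expressed as follows; the per-child distances are not published, so load_distances is a hypothetical loader and the code shows only the structure of the tests:

    import pandas as pd
    from scipy.stats import ttest_rel
    from statsmodels.stats.anova import AnovaRM

    tv, floating, ar = load_distances()  # hypothetical: 16 modal distances each

    df = pd.DataFrame({
        "child": list(range(len(tv))) * 3,
        "condition": ["TV"] * len(tv) + ["floating"] * len(tv) + ["AR"] * len(tv),
        "distance_m": tv + floating + ar,
    })
    # Repeated measures ANOVA over the within-subject presentation condition.
    print(AnovaRM(df, depvar="distance_m", subject="child", within=["condition"]).fit())

    # Post hoc paired t-tests, as reported above.
    print(ttest_rel(tv, floating), ttest_rel(tv, ar))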
After the completion of tasks, the facilitator asked both children
and parents to rank preference for the display conditions and for
the Oscar awareness conditions. Figures 14 and 15 show the
frequency of a first-choice ranking for each condition.
Figure 14: Preference frequencies for presentation conditions
Figure 15: Preference frequencies for awareness conditions
Friedman’s ANOVA is a non-parametric test designed for
repeated measures of ordinal data with at least three levels, thus
appropriate for analysis of the ranking data. The data analyzed
were the individual rankings from the children and the parents. A
Friedman ANOVA showed no significant result for children’s
rankings of presentation conditions. A Friedman ANOVA showed
a significant result for parent ranking of presentation conditions
(p=.005). Wilcoxon post hoc tests showed the parents preferred
the AR condition to TV and floating. When asked about the
reason for their preference, children cited very specific influences,
such as something Oscar happened to say during a trial that was
unrelated to the experimental condition. The reasons for the
parents’ preferences included that children remained more
mentally present in their environment with the AR projection
compared to a television or floating projection.
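The ranking analysis has the following shape; the per-parent rankings are not published, so the values below are placeholders that merely illustrate the test structure:

    from scipy.stats import friedmanchisquare, wilcoxon

    # One rank (1 = first choice) per parent per condition; placeholder data.
    rank_tv       = [3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3]
    rank_floating = [2, 2, 3, 2, 2, 3, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2]
    rank_ar       = [1] * 16

    stat, p = friedmanchisquare(rank_tv, rank_floating, rank_ar)
    if p < .05:
        # Post hoc pairwise Wilcoxon signed-rank tests, as in the paper.
        print(wilcoxon(rank_ar, rank_tv))
        print(wilcoxon(rank_ar, rank_floating))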
Parents tended to prefer the Aware Oscar to Unaware. They noted
that the Aware Oscar seemed more polite and more responsive to
their children. Most children did not seem to notice that Oscar was acting differently across conditions. These rankings did not meet the assumptions of Friedman's ANOVA or chi-square, so they were not tested statistically.
5 DISCUSSION
5.1 Implications from Evaluation
Children readily engaged with Oscar across multiple input
modalities and were motivated to attend to learning activities.
They asked Oscar questions and attempted physical interaction
with Oscar in a variety of surprising ways, including pretending to
feed Oscar a Play-Doh breakfast. Other work has shown that
children will converse with cartoons and videos of people in
similar ways [10] and that a projected AR character has potential
for “learning through teaching” [16]. Possibly Oscar’s awareness
of physical space overcame some of the limitations of virtual
character presence previously identified [17].
Children interacted with Oscar more closely when projected on
the wall versus presented on the television. The appearance of a
virtual door allowed Oscar to enter the space. The gradual
opening of the door revealed a virtual light source apparently
coming from behind it. That simple reveal inspired considerable
curiosity among participants: Oscar, it seemed, had a room hidden
behind this virtual door. Several of the children in Study 1
wondered aloud about what Oscar had in his room. One explicitly
asked Oscar to leave his door open “so we can see your room.”
While it is possible that children felt more presence with a projected character than with a television character, further study would be needed to provide evidence. The effect may also have stemmed from the traditional parental admonition not to sit too close to the television and not to touch the screen. Additionally, other age groups could show different results than the current study.
Parents appreciated the AR approach, expressing a clear
preference for such a mode of engagement over traditional screen
time. Part of the value they saw was the use of children’s artistic
creations in the activities. Games that emphasize creation and
learning with some connection to the real world often resonate
with parents [24]. Parents showed a tendency to prefer the version
of Oscar that showed awareness of the physicality of the children
and what they were drawing, while children did not seem to notice
a difference. Since parents are decision makers for technology in
the home, their opinions are important, but future research should
include educational professionals.
5.2 A Vision for Kid Space
To meet the promise of these wizard-enabled user evaluation
studies, we must deliver interactive content through PCMAG and
go beyond that to provide several other capabilities, all in concert
with strong user experience design. Returning to a pedagogy of
Multiliteracies, the “pedagogical acts” or “knowledge processes”
of experiencing, conceptualizing, analyzing, and applying [3] may
be nicely supported by a technical implementation that recognizes
people in spaces and their interaction with both real and virtual
objects.
Interactive Projection. The current prototype offers a number of
potential advantages in the implementation of Kid Space. By
unifying the sensing and projection capabilities in a single device,
we can potentially offer an experience that is easier to deploy and
initiate. We anticipate a number of ongoing technology research
challenges associated with the prototype, inspired by the user
research, including optimizing projection onto irregular surfaces
and objects, overcoming distortions, and improving the realism by
altering lighting and color properties of the virtual content to
match the ambient conditions in the room.
Intent and Dialog. The Wizard of Oz approach enabled Oscar to
interact verbally with the children in a fashion that would be
advanced for an automated system. Recent research on the
application of deep reinforcement and supervised learning to
verbal interactions is promising [9,32,35], enabling more realistic
conversations on particular topics. The importance of multimodal
dialog and timing between peers is also emerging [37].
Multimodal Interaction and Person Tracking. Detection of air
gestures, touch gestures, pose, and facial expressions will enable a
better understanding of attention and motivation in learning
activities. If a learning system can track attention and
engagement, it can return to methods that have been successful in
teaching an individual [11,31]. For example, one child may be
more motivated with drawing activities to learn math while
another thrives on verbal quizzing.
Audio features of the experience are also important, most obviously the character's voice. The initial Kid Space implementation used a human voice altered to a higher octave. Sound effects for actions are important to maintain a natural feel. Thematic music for games and characters adds to the appeal and allows anticipation of events such as Oscar's entrance. As pointed out for KidsRoom, apparent audio direction and source are important, especially when sound directs attention [1].
The capability to attribute inputs to a particular person is very
powerful in a system such as Kid Space. In Study 1, the
experience involved taking turns, and the wizard operators
controlling the experience informally tracked the relevant
knowledge level of individual participants, thus allowing play
across multiple children of different levels. Oscar managed
challenge levels, using the same content for younger children to
count and older children to answer questions about fractions. The
informal tracking was not limited to a verbal modality: gestures
such as raising hands and face orientation also held cues about a
child’s knowledge.
The potential wealth of data in a smart space is promising for
understanding, but rich data sets also carry risks and expectations
for privacy [22]. Parents in Study 1 questioned where data from children might be stored and who would have access to it (in an actual product).
While the public has begun to allow devices that track voice in
their homes, identification of children involves additional
considerations. Designers of a smart space should consider what
identification data a learning experience actually needs. For
example, a system may differentiate users from each other and
track their learning goals without full names and without saving
images of people.
Tangibles. One of the exercises in Study 1 involved children's
use of paper and crayons to create custom outfits for Oscar.
Children were eager for Oscar to try on and display their outfits
(an activity made possible through optical scanning, done in the
control room during a break). The younger children
enthusiastically counted real beads to “feed” virtual birds. Such
direct engagement has been shown to improve learning [3,5,14].
The Kid Space awareness should eventually extend to objects in
the environment. For example, in a game in which a child stacks
blocks to help Oscar jump from one surface to the other, the
system would detect the block placement, and Oscar would
successfully make the leap after the child provides the correct
answer. Like other interaction modalities, tangibility is not
necessarily advantageous under all circumstances, but it can add
richness, enjoyment, and better retention of information [36].
5.3 Future User Research
While the studies were encouraging for the potential of such a
concept, we did not evaluate beyond the initial experience with
the simulated system. We were not, for example, able to
determine whether children would remain enthusiastic over time
or across a wider range of activities. Children from different
localities, with different customs, and perhaps with special needs,
merit further study. Finally, as each technology for implementing Kid Space matures, further evaluation of functional prototypes with target users is essential.
While this paper focuses on Kid Space for education, many other application types could use it, for example:
• A health and fitness suite to guide children and encourage them to be active
• Creation-themed applications that allow children to bring toys and sculptures into a mixed reality environment
• A new interaction model for characters from movies and television
REFERENCES
[1] Aaron F. Bobick, Stephen S. Intille, James W. Davis, Freedom Baird, Claudio S.
Pinhanez, Lee W. Campbell, Yuri A. Ivanov, Arjan Schütte, and Andrew Wilson.
1999. The KidsRoom: A perceptually-based interactive and immersive story
environment. Presence: Teleoperators and Virtual Environments. 8(4), 369-393.
DOI: https://doi.org/10.1162/105474699566297
[2] William Buxton and Richard Sniderman. 1980. Iteration in the design of the
human-computer interface. In Proceedings of the 13th Annual Meeting of the Human
Factors Association of Canada. 72-81.
[3] Justine Cassell. 2004. Towards a Model of Technology and Literacy
Development: Story Listening Systems. Journal of Applied Developmental
Psychology. 25(1), 75-105. DOI: https://doi.org/10.1016/j.appdev.2003.11.003
[4] Giulio Ceppi and Michele Zini. 1998. Children, spaces & relations – Metaproject
for an environment for young children. Reggio Children S.r.l. & Domus Academy
Research Center, Reggio Emilia, Italy
[5] Steven Dow, Manish Mehta, Ellie Harmon, Blair MacIntyre, and Michael
Mateas. 2007. Presence and engagement in an interactive drama. In Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems. 1475–1484. DOI:
https://doi.org/10.1145/1240624.1240847
[6] Mary Ellen Foster, Andre Gaschler, and Manuel Giuliani. 2013. How can I help you? Comparing engagement classification strategies for a robot bartender. In Proceedings of the 15th ACM International Conference on Multimodal Interaction (ICMI 2013), Sydney, Australia. DOI: https://doi.org/10.1145/2522848.2522879
[7] Ryota Fujimura, Kazuhiro Nakadai, Michita Imai, and Ren Ohmura. 2010.
PROT: An embodied agent for intelligible and user-friendly human-robot interaction.
2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. 3860-
3867. DOI: https://doi.org/10.1109/IROS.2010.5649116
[8] Manuel Giuliani, Ron Petrick, Mary Ellen Foster, Andre Gaschler, Amy Isard,
Maria Pateraki, and Markos Sigalas. 2013. Comparing task-based and socially
intelligent behaviour in a robot bartender. In Proceedings of the 15th International
Conference on Multimodal Interfaces (ICMI 2013), Sydney, Australia. 263-270.
DOI: https://doi.org/10.1145/2522848.2522869
[9] John D. Gould, John Conti, and Todd Hovanyecz. 1983. Composing letters with
a simulated listening typewriter. Communications of the ACM. 26(4), 295-308, DOI:
https://doi.org/10.1145/2163.358100
[10] Jennifer Hyde, Sara Kiesler, Jessica K. Hodgins, and Elizabeth J. Carter. 2014.
Conversing with children: cartoon and video people elicit similar conversational
behaviors. Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems. Toronto, Ontario, Canada. 1787-1796. DOI:
https://doi.org/10.1145/2556288.2557280
[11] Alejandro Jaimes and Nicu Sebe. 2007. Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding. 108(1-2), 116-134. DOI:
https://doi.org/10.1016/j.cviu.2006.10.019
[12] Carey Jewitt and Gunther Kress (Eds.). 2003. Multimodal literacy. New York:
Peter Lang.
[13] Brett Jones, Rajinder Sodhi, Michael Murdock, Ravish Mehra, Hrvoje Benko,
Andrew Wilson, Eyal Ofek, Blair MacIntyre, Nikunj Raghuvanshi, and Lior Shapira.
2014. RoomAlive: Magical Experiences Enabled by Scalable, Adaptive Projector-
Camera Units. In Proceedings of the 27th annual ACM symposium on User interface
software and technology (UIST '14). 637-644. DOI:
https://doi.org/10.1145/2642918.2647383
[14] Laura M. Justice and Paige C. Pullen. 2003. Promising interventions for
promoting emergent literacy skills. Topics in early childhood special education.
23(3), 99-113. DOI: https://doi.org/10.1177/02711214030230030101
[15] Gunther Kress and Theo Van Leeuwen. 2001. Multimodal discourse: the modes
and media of contemporary communication. London: Arnold; New York: Oxford
University Press.
[16] Douglas B. Lenat and Paula J. Durlach. 2014. Reinforcing Math Knowledge by
Immersing Students in a Simulated Learning-By-Teaching Experience. International
Journal of Artificial Intelligence in Education. 24(3), 216-250. DOI:
https://doi.org/10.1007/s40593-014-0016-x
[17] Jamy Li. 2015. The benefit of being physically present: A survey of
experimental works comparing copresent robots, telepresent robots and virtual
agents. International Journal of Human-Computer Studies. 77(C), 23-37. DOI:
https://doi.org/10.1016/j.ijhcs.2015.01.001
[18] Daria Loi. 2007. TUIs as mediating tools within adaptive educational
environments. In Enhancing Learning Through Human Computer Interaction.
Elspeth McKay (Ed.). Idea Group, Hershey, PA. 178-191. DOI:
https://doi.org/10.4018/978-1-59904-328-9.ch010
[19] Vicente Nacher, Fernando Garcia-Sanjuan, Javier Jaen. 2016. Interactive
technologies for preschool game-based instruction: experiences and future
challenges. Entertainment Computing. 17, 19-29. DOI:
https://doi.org/10.1016/j.entcom.2016.07.001
[20] Marija Nakevska, Anika van der Sanden, Mathias Funk, Jun Hu, and Matthias
Rauterberg. 2017. Interactive storytelling in a mixed reality environment: the effects
of interactivity on user experiences. Entertainment Computing. 21. 97-104. DOI:
https://doi.org/10.1016/j.entcom.2017.01.001
[21] The New London Group. 1996. A pedagogy of multiliteracies: Designing social
futures. Harvard educational review. 66(1), 60-93. DOI:
https://doi.org/10.17763/haer.66.1.17370n67v22j160u
[22] Heather Patterson. 2013. Contextual Expectations of Privacy in Self-Generated
Health Information Flows. TPRC 41: The 41st Research Conference on
Communication, Information and Internet Policy. DOI:
http://dx.doi.org/10.2139/ssrn.2242144
[23] Hasso Plattner, Christoph Meinel, and Ulrich Weinberg. 2009. Design Thinking. Mi-Verlag, München.
[24] Ben Popper. 2014. Why parents are raising their kids on Minecraft. Retrieved
September 18, 2017 from http://www.theverge.com/2014/9/15/6152085/why-
parents-love-minecraft
[25] Iulian Radu. 2014. Augmented reality in education: a meta-review and cross-
media analysis. Personal and Ubiquitous Computing. 18(6), 1533–1543. DOI:
https://doi.org/10.1007/s00779-013-0747-y
[26] Iulian Radu, Betsy McCarthy, and Yvonne Kao. 2016. Discovering educational
augmented reality math applications by prototyping with elementary-school teachers.
In Virtual Reality (VR), 2016 IEEE. 271-272. DOI:
https://doi.org/10.1109/VR.2016.7504758
[27] Mitchel Resnick. 1998. Technologies for Lifelong Kindergarten. Educational
Technology Research and Development. 46(4), 43-55. DOI:
https://doi.org/10.1007/BF02299672
[28] Kimiko Ryokai, Cati Vaucelle, and Justine Cassell. 2003. Virtual peers as
partners in storytelling and literacy learning. Journal of Computer Assisted Learning.
19(2), 195-208. DOI: http://dx.doi.org/10.1046/j.0266-4909.2003.00020.x
[29] Sidney K. D'Mello and Jacqueline Kory. 2015. A Review and Meta-Analysis of
Multimodal Affect Detection Systems, ACM Computing Surveys (CSUR). 47(3),
Article 43, 1-36. DOI: https://doi.org/10.1145/2682899
[30] Larry Tesler. 1983. Enlisting user help in software design. ACM SIGCHI
Bulletin. 14(3), 5-9. DOI: https://doi.org/10.1145/1044774.1534167
[31] Mariët Theune, Sander Faas, Dirk Heylen, and Anton Nijholt. 2003. The virtual
storyteller: story creation by intelligent agents. In Proceedings of the Technologies
for Interactive Digital Storytelling and Entertainment (TIDSE) Conference. 204-215.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.5222
[32] Jason D. Williams and Geoffrey Zweig. 2016. End-to-end LSTM-based dialog
control optimized with supervised and reinforcement learning. ArXiv e-prints.
arXiv:1606.01269
[33] Andrew D. Wilson and Hrvoje Benko. 2017. Holograms without Headsets:
Projected Augmented Reality with the RoomAlive Toolkit. In Proceedings of the
2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems
(CHI EA '17). 425-428. DOI: https://doi.org/10.1145/3027063.3050433
[34] Andrew Wilson, Hrvoje Benko, Shahram Izadi, and Otmar Hilliges. 2012.
Steerable augmented reality with the beamatron. In Proceedings of the 25th annual
ACM symposium on User Interface software and technology (UIST ’12). 413-422.
DOI: https://doi.org/10.1145/2380116.2380169
[35] Tiancheng Zhao and Maxine Eskenazi. 2016. Towards end-to-end learning for
dialog state tracking and management using deep reinforcement learning. In the
Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL). ArXiv e-prints.
arXiv:1606.02560
[36] Oren Zuckerman and Ayelet Gal-Oz. 2013. To TUI or not to TUI: Evaluating
performance and preference in tangible vs. graphical user interfaces. International
Journal of Human-Computer Studies. 71(7-8), 803-820. DOI:
https://doi.org/10.1016/j.ijhcs.2013.04.003