Draft finally published in: Rupitz, J., Ebner, M., Ebner, M. (2022). Development of an Amazon Alexa App for a University Online Search. In: Zaphiris, P., Ioannou, A. (eds) Learning and Collaboration Technologies. Designing the Learner and Teacher Experience. HCII 2022. Lecture Notes in Computer Science, vol 13328. Springer, Cham. https://doi.org/10.1007/978-3-031-05657-4_10
Development of an Amazon Alexa app for a university
online search
Jakob Rupitz and Markus Ebner1[0000-0002-5445-1590] and Martin Ebner1[0000-0001-5789-5296]
1 Graz University of Technology, Graz, Austria
martin.ebner@tugraz.at
Abstract. Today, our homes are becoming smarter and smarter, and we have started to interact with them through Intelligent Personal Assistants such as Amazon Alexa. In this paper we present and review an Alexa skill developed for a university online search for resources such as rooms, courses, and persons. The goal is to give users an easy way to ask for information such as phone numbers, e-mail addresses, or room details, and to have the Alexa skill provide this information in an easily understandable way. We describe how suitable search queries are formulated from spoken Alexa commands and how the results are presented to the user accordingly. Further obstacles, such as presenting search results within the limited context of a voice interface and prioritizing individual search results, are also discussed.
Keywords: Amazon Alexa, Smart Home, voice-based search, voice user interface, Amazon Echo
1 Introduction
Today, more and more technical devices are available in our homes, and they are becoming smarter and smarter. A smart home helps its residents increase their standard of living and safety with the aid of additional technical "smart" equipment. Examples include remote-controlled lights and shutters, a door lock that can be locked and unlocked with a smartphone, a heating system connected to and controlled over the Internet, a voice-controlled music system, and so on.
A possible way to interact with a smart home by voice is the Amazon Alexa platform and its Echo devices. They offer users a central point of interaction with the system. For example, Echo devices allow users to use voice commands to turn electronic devices in the home on and off (e.g. lights), change the temperature, or play the next song. Furthermore, Alexa supports learning [1], e.g. learning geographic facts [2], and it can be used in mass education [3] as well.
The skill developed and presented in this paper lines up with these skills and is intended to simplify students' daily lives: a voice query can be used to find out where a lecture hall is located or which professor is giving the next lecture. This allows students to obtain such information on the side while doing other work.
To develop the skill, we started with categorical queries that a student might make. Rooms and courses were the first categories to be supported, as the corresponding terms are part of everyday language, are covered by the language model, and are therefore easily understood by Amazon Alexa. Later we added the possibility to search for people, which confronted us with some problems, among them a large error rate when recognizing the name of the person to be searched for and the prioritization of the search results.
Beginning with a short introduction to personal assistants, the paper focuses in the Background chapter on Amazon Alexa Skills and Voice User Interfaces. The following chapter, State of the Art, examines existing Alexa skills, focusing on how they communicate with their users and on lessons learned for the development of the university search skill. Before discussing the implementation, an overview of the technology used to develop the skill is given. The Implementation chapter deals with the implementation of the Amazon Alexa skill for the search at Graz University of Technology. Finally, the authors discuss the development and give an outlook on future features to be developed for the skill.
2 Background
This chapter introduces the history and background of assistants and looks at existing Amazon Alexa Skills in terms of their functions, procedures, and conversation models, as well as at Voice User Interfaces.
2.1 Personal Digital Assistants
Personal digital assistants (PDAs) are not a new invention. Already in 1984, the first digital assistant came onto the market [4] to simplify daily tasks for users. PDAs were used as calendars, address books, or e-mail devices, with steadily increasing popularity. Cell phones and smartphones eventually made PDAs obsolete, as more modern cell phones came with more functions. With the ability to surf the Web and use various apps, smartphones became more and more popular, and PDAs were hardly used anymore.
Intelligent Personal Assistants. Today's Intelligent Personal Assistants (IPAs) can be
compared to earlier PDAs. As Hauswald et al. describe it, "An IPA is an application
that uses inputs such as the user's voice, vision (images), and contextual information to
provide assistance by answering questions in natural language, making recommenda-
tions, and performing actions." [5]. An IPA is a more modern PDA, with additional
features such as voice recognition, touch screen, camera and so on. In any case, an IPA
is normally connected to the Internet and attempts to answer the user's questions with
information acquired from the web. Modern IPA devices can further expand their range
of functions through machine learning and Artificial Intelligence (AI).
Artificial Intelligence. Artificial Intelligence (AI) refers to a large field of computer
science that is characterized by "intelligent behavior". The German Research Center for
Artificial Intelligence defines the term as follows: "Artificial Intelligence is the property
of an IT system to exhibit 'human-like' intelligent behavior." [6]
According to Russell and Norvig, research on AI can be organized along two dimensions: human vs. rational and thought vs. behavior [7]. Combining the two dimensions yields four different approaches, each with its own research areas. Furthermore, subareas of AI in computer science include Natural Language Processing, Knowledge Representation, Automated Reasoning, and Machine Learning.
2.2 Amazon Alexa Platform
Amazon was founded in 1994 by Jeff Bezos in a garage on the US West Coast [8]. Originally, only books were sold, but due to the simplicity of the purchasing process the company was a success. Meanwhile, the company operates one of the largest online stores worldwide. To attract even more customers, Amazon is constantly researching new concepts. In addition to Amazon Web Services (AWS), which provides infrastructure in the form of IT resources and servers, and the Amazon Prime offering with its music and movie streaming services, the company also has other products in stock. One of them, created by the staff of the Amazon Development Center, is called Alexa.
Alexa is a voice assistant that runs on a wide variety of devices and is designed to
simplify the lives of its users. The voice assistant first appeared on the market in 2015,
in the form of Amazon Echo devices, and is completely cloud-based [8]. This means
everything a user says is processed through AWS to give them the best possible result
to their question. Through cloud connectivity, Alexa is constantly learning new features
and has become continuously more useful. Alexa now speaks nine languages and can
be found on many electronic devices around the world [9].
Functionality. Alexa falls into the category of IPAs [10]. This refers to a speech-enabled system that can understand a user's voice and other important contextual information in order to answer questions in spoken form. Many new IPAs have emerged in recent years. Examples of the best known include Google Now from Alphabet, Siri from Apple, Cortana from Microsoft, and Bixby from Samsung [10]. They are able to understand spoken words, process them, and interact with the user. At today's level, these assistants are largely used to perform small tasks that are intended to make the user's life easier. For example, users can utilize voice commands to select the music they want to listen to, adjust the volume, control smart home components such as lights, or set a timer. Alexa is built in such a way that developers can program apps for it; Amazon calls them skills [11]. In this way, Amazon wants to constantly expand the functionality of its voice assistant and give other companies the possibility to develop their own skills.
Structure of the Alexa platform. When a user asks the voice assistant something, a special procedure runs in the background. The device running Alexa records what is said and interprets it using speech recognition and machine learning. Depending on what was said, Alexa builds a conversation to learn even more information from the user. Once it has gathered the necessary facts, Alexa sends a query to a service that can provide the appropriate results. The results are then processed by the skill and presented to the user in the form of a voice output.
Structure of a request. Amazon's Alexa consists of two modules. On the one hand,
there is the user interface, which is in contact with the user. This is a so-called Voice
User Interface (VUI). The VUI differs from a Graphical User Interface (GUI): There is
only acoustic communication between the user and the device. Thus, it can be thought
of as the front end of Amazon Alexa.
On the other hand, there is the processing on the server with the help of lambda functions, the communication between the API and the frontend, and the preparation of the data for the voice user interface; in other words, the backend of Amazon Alexa.
The frontend, the VUI, is there to interpret the requests from the user and to pass
them on to the backend. To do so, it must know which function the user wants to call
and which data the user provides for this function. For this purpose, there are the so-
called intents and utterances.
An intent specifies the function in the backend, and the utterances specify the actual phrases that the user could say to activate this intent. This means an Amazon Alexa skill consists of multiple intents, which in turn consist of multiple utterances. A slot can also be configured for the utterances. A slot works similarly to a variable and is used to detect parameters in the utterance and then forward them to the lambda function in the backend [12].
If several slots are needed for a complete intent, and the user does not provide them
in the sentence, slot filling can be used. Depending on which slots are missing, Alexa
asks for the respective ones. The user can then fill in these slots with further utterances.
Furthermore, it is possible to have Alexa ask for a confirmation at the end of the intent
to check whether the conversation was processed correctly or not.
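As a minimal illustration (not the actual interaction model of the skill presented later; the intent name, utterances, and slot are assumed examples), an intent with utterances and a slot could be declared roughly as follows, shown here as a Python dictionary mirroring the JSON edited in the Developer Console:

# Sketch of one intent in an interaction model, as a Python dict mirroring
# the JSON of the Developer Console. Names and utterances are examples only.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "example skill",
            "intents": [{
                "name": "StationIntent",
                # The slot acts like a variable: it captures part of the
                # spoken utterance and is forwarded to the backend.
                "slots": [{"name": "city", "type": "AMAZON.DE_City"}],
                # Utterances: the actual phrases a user might say.
                "samples": [
                    "station",
                    "search for the station in {city}",
                    "where is the station in {city}"
                ]
            }]
        }
    }
}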
2.3 Voice-User-Interface
The GUI and the VUI have the same goal: to obtain information from the user and then do something with this information. There are, however, some important differences to consider, mainly concerning how user-friendly and practical the system is to use.
With a GUI, users usually use physical input devices such as a mouse, keyboard, or touch screen to click on something or enter text into a field. The user decides what to click, what to type, and what data is sent. This type of human-machine interaction has been very successful over the last few decades. In addition, the GUI helps to correct user errors by directly correcting words through spelling programs or displaying suggestions to the user. The interaction between human and machine is quite fast, as the graphical feedback is usually seen immediately after the input.
The VUI, on the other side, is still relatively new. It has been around only for several years, but since more powerful smartphones reached the mainstream market, this technology has become popular among smartphone users [13]. With Apple's Siri (introduced in 2011) and Google's Google Now (introduced in 2012), the general population started benefitting from VUIs.
These VUIs, which mostly find their way to the customer in the form of digital assistants, are very easy and convenient to use. You no longer have to type; you can simply say out loud what you want and do not even have to look at the screen, because the digital assistant usually also answers by voice.
This means that it is now possible to interact with computers without taking your
hands away from the current activity. One problem, however, is still the accuracy of
speech recognition, so-called Natural Language Processing (NLP). Many linguistic ex-
pressions depend strongly on the surrounding context of the statement for their content.
Words that are homonyms denote several concepts.
For example, the word "ball" has different meanings: on the one hand the dance event, and on the other hand the object used in soccer. Such words, which sound the same, are very difficult to interpret correctly because there is usually no additional context [14]. In written text, capitalization and lower case can additionally draw attention to the meaning of a word; in spoken language, however, this cue does not exist.
For a VUI, the recognition and correct interpretation of the language of the user is
therefore very difficult, since there is no other input besides the spoken one, and thus
there is no context that could be helpful in interpreting the phrase [14]. Another problem
is the privacy of the user. For Amazon Alexa to be activated when someone says the
keyword "Alexa", her microphones must be continuously active and picking up ambient
sounds. Once the cue is said, the conversation is recorded and stored in Amazon Web
Services.
3 State of the Art: Examples for Amazon Alexa Skills with
search function
This chapter will examine existing Alexa Skills in terms of their structure and function-
ality. The focus will be on how the communication with their users is structured and how
user-friendly the chosen strategy is.
3.1 Deutsche Bahn
The Amazon Alexa Skill from Deutsche Bahn was developed for searching train connections.
It is designed to make people's everyday lives easier by announcing information about
the next departure times and picking out special connections for users. For example,
Amazon Alexa answers the question: "Alexa, ask Deutsche Bahn for the next connec-
tion to sports." [15] with "Your direct connection from Frankfurt (Main) Hauptbahnhof
to Frankfurt (Main) Süd: departure on time at 15:42 with S6 from platform 101. Arrival
expected at 15:53." [15] thus giving the user an exact answer without ever having to
touch his/her smartphone or computer.
If one starts the skill with "Alexa, start Deutsche Bahn!", one is first greeted with: "Welcome to Deutsche Bahn! What do you want to search for: Connections or Departures?". Here you can already see the first design decision of the developers.
Connections and Departures are two different categories, and the Deutsche Bahn Skill asks for this category first. This is done because different intents are used for the two categories, and thus a large part of the utterances that the user could otherwise say can be ruled out.
If one decides on a category, Amazon Alexa asks for the slots that still must be filled
in so that the intent is complete and can make the API call. In this case, they are the
origin and destination stations as well as the date and time of departure. The origin and
destination stations are of slot type AMAZON.DE_City, the date and time are of slot
type AMAZON.DATE and AMAZON.TIME, and thus all slots have a built-in slot
type. This means that an NLP model has been trained on the cities, date and time and
thus the chance of correct speech recognition is relatively high.
The lambda function in the backend now sends a request to the Deutsche Bahn
server. The result is then played back audibly on the Echo device and if the Echo device
also has a display, the result is also shown graphically on the screen.
3.2 Spotify
Spotify is a music streaming service that has made over 70 million songs, audio books
and podcasts available to 400 million active users since its founding in 2006 [16].
Spotify is available as a program on computers, as an app on Android and iPhone, and
also on various voice assistants, including Amazon Alexa. To activate Spotify on Am-
azon Alexa, the Spotify account must first be linked to the Amazon account. After that,
Spotify is saved as a music service on Amazon Alexa and the standard commands, such
as play, pause, stop, louder, quieter are all forwarded to the Spotify skill. To play a
song, you can activate the Spotify Skill by saying "Alexa, start Spotify!". A song title
can also be added directly.
The recognition of song titles by the Spotify Skill works well. However, since there
are a lot of songs on Spotify, the artist of the song usually must be named as well,
otherwise Amazon Alexa plays the first song with the requested song title. We also
noticed that certain artists with non-standard names, such as BLOKHE4D, are not rec-
ognized. Even with song title and artist name together it doesn't work very well in this
case. However, with 70 million different songs available on Spotify, this is a relatively
good performance. Most songs that are relatively high in the charts are still recognized
very well.
3.3 Wikinator
Wikinator is an Amazon Alexa Skill, which was developed to find knowledge on the
website wikipedia.org and to render it to the user [17]. For the Wikipedia encyclopedia,
there are already a lot of skills, but all of them only find generic terms or do not return
the correct article. Wikinator takes a different approach. The skill searches for terms
and reads the title of the first article. If this is the article the user was looking for, the
user can simply repeat the title and the skill reads the article to the user. If the article is
not the user's desired search result, the user can simply say, "Next article!" or "Previous
article!" [17] and the skill will read the title of the next search result. This allows the
user to navigate through the various search results.
In addition to this search function, the user can also subsequently search within a Wikipedia article that has been found. This works with the help of keywords: the skill searches the article for the user's keyword and returns relevant information.
3.4 Outcome
The analysis of the three Amazon Alexa Skills reveals a clear picture. Predefined terms or pre-trained speech models are very important for speech recognition and for achieving good usability of the skill. It is also beneficial to leave as little as possible to speech recognition; that means one should try to derive as much information as possible from the context. A categorical search supports this by minimizing the risk of incorrect input for a single function query. In addition, queries with very similar meanings can lead to conflicts between the individual utterances of the intents, and the user may get a wrong result.
4 Amazon Alexa - Technology overview
This chapter presents Amazon software used for developing the Alexa skill for the uni-
versity online search.
4.1 Amazon Alexa Developer console
The Amazon Alexa Developer Console is the main entry point for all Amazon Alexa developers. Since Amazon Alexa is an online service, a skill must be stored somewhere and be accessible to everyone. To make this as easy as possible for developers, Amazon offers a hosting service. An Amazon Alexa Skill is therefore always stored on Amazon's servers, from its first initialization, during its development, and right up to its publication. The development process also takes place online. For this purpose, Amazon offers software developers an Integrated Development Environment (IDE), which can also be found in the Amazon Developer Console. It is also possible to write the entire code offline in an editor, but this complicates testing because the code must be uploaded to the Amazon servers for each test.
Amazon offers three different methods for hosting the backend of an Alexa Skill. In the first case, Amazon provides its servers and gives the developer a Node.js template as a starting file; the skill is then hosted entirely on Amazon's servers, both frontend and backend. The second method is similar, but Python is used instead of Node.js, so only the programming language differs. The third option is to host the backend oneself on infrastructure provided by the developer's company. This makes sense if the skill generates or requires very large amounts of data, for example for music or video streaming applications.
The Developer Console consists of several modules (see Fig. 1). The Dashboard is the entry point. Here, all current projects are shown, and it is possible to edit monetization, view current invoices or payments, set the data rates of the hosting, and configure general options for the developer account. Clicking on an ongoing project takes you to the skill-specific page and into the Build module. Here, the basic settings of the Amazon Alexa skill are defined, such as the invocation name, the intents and utterances, the slots and slot types, multimodal responses, and interfaces; if a separate endpoint is used, this can also be defined here.
[place Fig. 1 here]
Fig. 1. A Screenshot of the test environment in the Developer Console. The various modules
are displayed at the top.
Code Module. This page functions as an online IDE. It works like a code editor such
as Visual Studio or Notepad++. Here, you can edit the individual files of the lambda
function and define what the backend should do. A request from the frontend arrives
here, is processed, and a suitable response is created. This is also where the access to
the S3 storage (Amazon's cloud storage for files) and the CloudWatch logs (the log files
of the test environment) are located.
Test module. In this module, the developer can test the programmed skill. Here,
you can enter phrases via microphone or keyboard to test the frontend or the backend.
This module acts like an Alexa simulator, and you get exactly the same output that a
user would get later.
Distribution module. The distribution module contains the settings for publishing
the skill. This means that you can enter the name, a description, examples and other
information about the skill.
The Certification and Analytics modules are there to fix bugs and validate the skill, and to track the number of users after publication and get general statistics.
4.2 Alexa Presentation Language
Amazon Alexa is a digital voice assistant. Over time, however, the developers at Am-
azon have built small screens into the Echo devices. This has the advantage of not only
being able to give the user auditory feedback, but also to display the search results
graphically. Graphical feedback is much better for user experience than purely auditory
feedback, because the user sees all the results at once and does not have to listen to
them one after the other from Alexa.
The Alexa Presentation Language (APL) is used to display content on these screens. In the backend, while formulating the result, a JSON file with the answer is passed to the frontend. In this file, the screen layout is defined by a document, and the basic framework for the data is provided. In the backend, the developer can save the search results in a list and add them to the JSON file as datasources. This basic framework is then populated with content. The JSON code is passed with the response to the frontend, which then renders the content on the screen.
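As a rough sketch of this mechanism (the document layout, the field names, and the example entries are illustrative assumptions, not the exact layout used by our skill), an APL document and its datasources could be attached to a response with the ASK SDK for Python roughly as follows:

# Sketch: attaching an APL document and datasources to a response.
# Layout, field names, and example values are illustrative only.
from ask_sdk_model.interfaces.alexa.presentation.apl import RenderDocumentDirective

apl_document = {
    "type": "APL",
    "version": "1.8",
    "mainTemplate": {
        "parameters": ["payload"],
        # A single text component bound to the datasources; a real layout
        # would use a list template for the search results.
        "items": [{"type": "Text", "text": "${payload.results.title}"}]
    }
}

datasources = {
    "results": {
        "title": "Search results",
        "items": [
            {"primaryText": "Lecture hall i13", "secondaryText": "Inffeldgasse 16b"}
        ]
    }
}

# Inside an intent handler, the directive would be added to the response:
# handler_input.response_builder.add_directive(
#     RenderDocumentDirective(token="searchResults",
#                             document=apl_document,
#                             datasources=datasources))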
5 Implementation of the Alexa Skill
This chapter deals with the implementation of the Amazon Alexa skill for the search at Graz University of Technology, in particular the frontend, the backend, the individual functions, and the queries.
5.1 Skill invocation
Skill invocations are used to activate an Amazon Alexa skill, for example: "Alexa, start TU Graz!". Here the name of the skill is defined; it should be short, concise, and easy to understand.
In our case, we use t. u. graz. The dots after the letters ("t" and "u") tell Amazon Alexa that the letters are pronounced individually and not read as one word ("T. U. Graz" and not "tu Graz"). Upper and lower case makes no difference here, as Amazon Alexa automatically converts all sentences, words, and letters to lower case. This simplifies the later parsing of the information.
Furthermore, there is the possibility to create Skill Launch Phrases, which are used to activate the skill without using the invocation name, for example: "Alexa, what is the weather for tomorrow?". This launch phrase directly activates the intent of a weather skill and queries the weather for the next day. However, these launch phrases can only be created once the Amazon Alexa Skill has been released, so we leave them out here.
5.2 Interaction Model
The interaction model defines the conversation between the user and Amazon Alexa.
In this chapter the intents are specified, the utterances are defined, and the slots as well
as slot types are determined.
Intents. Intents define the various functions of the skill. Depending on which intent the user activates, the corresponding function is called in the backend. There are five intents that are already included in most interaction models by default: AMAZON.CancelIntent, AMAZON.HelpIntent, AMAZON.StopIntent, AMAZON.NavigateHomeIntent, and AMAZON.FallbackIntent. They describe functionalities that every Amazon Alexa Skill should have: the user should be able to cancel the skill, ask for help, stop it, and navigate home to the welcome message. These four intents call corresponding functions in the backend ("intent handlers"). These functions are also included by default and should be customized by the developer to fit the design of the rest of the skill [11]. If the user says something that cannot be assigned to an intent, the fallback intent is triggered. It returns an error message that describes the further procedure to the user.
This could be something like: "I'm sorry, I didn't understand that. Can you repeat that?".
In addition to these required intents, there are the custom intents that determine the
functionality of the skill.
Here we introduced the intents "room", "person", and "course". These are the three categories for which the TU Graz search should work.
Room intent. To search for rooms at Graz University of Technology, there is the room intent. Different utterances are required to trigger this intent. The elementary utterance is simply "room". This allows the user to select one of the categories after being asked whether he or she wants to search for a room, person, or course, and thus trigger the intent. Further utterances would be something like "where is the {request}" or "I am looking for the lecture hall {request}", where {request} is always the slot to be filled with the search query. One advantage is that utterances like "room" or "auditorium" can also be said on their own, since the slot fulfillment setting causes the {request} slot to be filled in any case before the intent sends the query to the backend. So even if an utterance like "room" is said and the slot is not yet filled, or if the word is not understood, Amazon Alexa will prompt the user by voice to fill the slot. After the user says "room", Amazon Alexa asks "What room are you looking for?" so that the user names a room.
In addition to Slot Fulfillment, there is also the setting Slot Confirmation or Intent
Confirmation. This setting is used to ask the user once again whether what Amazon
Alexa has understood is correct and reflects the user's intention. Amazon Alexa asks
("Are you sure you want to search for {request}?") and the user must answer "Yes" or
"No".
Person intent. The principle of the intents and utterances works here very similarly to the other intents; only the utterances are different, and the settings vary a bit. In this
intent we also have a categorical utterance ("Person") which triggers the intent. Another
utterance with slot would be: "give me information about {request}". The filling of the
slots happens here again with Slot-Fulfillment and Slot-Confirmation.
Course intent. The last intent of our skill is responsible for searching for courses. The categorical utterance is again the word "course", and an utterance with a slot is the sentence "Search for course {request}". The filling of the slots happens here again with slot fulfillment and slot confirmation.
Problems with Intents. One problem that mainly affects the person intent is recognizing the user's speech. Amazon Alexa is good at recognizing colloquial words and phrases, but very poor at recognizing proper names or last names. The language
model, which is adjusted by the settings in the interaction model, is not trained for the
different last names, and thus does not know how or what it should recognize when the
user searches for a person. This affects searching for people as well as searching in
general. It is very difficult with a voice user interface to develop a language model that
can know all the words in a language and all the technical terms and interpret them
correctly in context. Amazon Alexa Skills are relatively good at having simple conver-
sations where the manner of the conversation is predictable.
However, Amazon offers a partial solution here with what it calls slot validation. Amazon allows developers to configure certain slots so that they only recognize values that have been defined in advance. For example, for a slot that is supposed to recognize food items for a shopping list, the food items could be entered as words beforehand. This way, Amazon Alexa knows which words to expect from the user, can match the user's spoken phrase against this internal list, and can find the right word. Basically, it works like a dictionary.
One limitation, however, is that this list of predefined words can only be 1000 entries long. This is probably sufficient for a grocery shopping list, but a problem for a people search with several thousand different names. Here one would have to filter out and enter the thousand most probable persons to work around this problem.
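A minimal sketch of such a predefined value list, written as a Python dictionary mirroring the JSON of a custom slot type (the type name, the values, and the synonym are illustrative examples):

# Sketch of a custom slot type with predefined values, as a Python dict
# mirroring the interaction-model JSON. Names and values are examples only.
person_slot_type = {
    "name": "PERSON_NAMES",
    "values": [
        {"name": {"value": "Martin Ebner"}},
        {"name": {"value": "Markus Ebner", "synonyms": ["ebner markus"]}}
        # ... limited to roughly 1000 entries, as noted above
    ]
}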
5.3 Multimodal Responses
Certain Echo devices, such as the Amazon Echo Show 8 or the Amazon Echo Show 15, not only have a microphone and speaker, but also a built-in display. This allows developers to provide visual feedback to the user in addition to auditory feedback.
[place Fig. 2 here]
Fig. 2. Screenshot of the development environment for visual multimodal responses.
To accomplish this, we use the Alexa Presentation Language (APL). With the help
of APL, we can create a graphical user interface that can provide the user with even
more information, in addition to the VUI. For this purpose, Amazon provides a tool in
the Developer Console that we can use to create such a multimodal response. This tool
is shown in Fig. 2.
Using drag-and-drop, individual modular elements can be dragged onto the screen
and customized. This mainly uses placeholders, which are later replaced by the actual
values in the backend.
[place Fig. 3 here]
Fig. 3. Screenshot of a multimodal response: search results displayed as a list.
We use the display to show a welcome message and a first tip on how to use our
Amazon Alexa Skill. In addition, the search results of rooms and courses are displayed
in a list. Extra information, such as the room number or the address, is additionally displayed in the respective entry in the list. An example of this list can be seen in Fig. 3.
5.4 Lambda-Function
The lambda function processes the requests of the frontend. Each intent in the
frontend has a corresponding intent handler in the backend. Thus, if e.g. the room intent
is called, it passes the information to the room handler, which then processes this infor-
mation. The structure of the lambda endpoint is explained in Fig. 4. In the file
lambda_funtion.py the intent handlers are registered. The handlers.py file contains all
handlers.
[place Fig. 4 here]
Fig. 4. Representation of the directory of the lambda endpoint.
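The following is a minimal sketch of how such a handler could look with the ASK SDK for Python; the handler, the slot access, and in particular the search_rooms helper are illustrative assumptions, not the actual implementation of our skill:

# Minimal sketch of a room intent handler with the ASK SDK for Python.
# The search_rooms helper is a placeholder for the call to the university
# online search; names and wording are illustrative only.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name, get_slot_value


def search_rooms(query):
    # Placeholder: forward the query to the university search service.
    return [{"name": query, "address": "unknown"}]


class RoomIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        # Matches requests that the frontend routed to the "room" intent.
        return is_intent_name("room")(handler_input)

    def handle(self, handler_input):
        # Read the {request} slot filled via slot fulfillment in the frontend.
        query = get_slot_value(handler_input=handler_input, slot_name="request")
        results = search_rooms(query)
        speech = "I found {} results for {}.".format(len(results), query)
        return handler_input.response_builder.speak(speech).response


# In lambda_function.py the handlers are registered with the skill builder.
sb = SkillBuilder()
sb.add_request_handler(RoomIntentHandler())
lambda_handler = sb.lambda_handler()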
6 Discussion
Since its release in 2015, Amazon Alexa has become a very large platform with thou-
sands of skills. Most of the skills are small games, puzzles, news, tools, music services
or applications that represent companies. Most of them have a very limited repertoire
of intents, which makes them quite simple and very easy to use.
One problem with a voice user interface is the small amount of information exchanged compared to the time the user must spend on it. In the time it takes Amazon Alexa to read just two results aloud, the user could probably have read through an entire list of results on a screen. This major disadvantage of VUIs makes them inferior to GUIs in this respect, and they will therefore never have the same presence in the modern world as GUIs.
The developed Amazon Alexa Skill described in this paper is able to recognize three
different search categories and give the user an answer to a simple search query. This
answer consists of a spoken phrase and a list of search results displayed on a screen.
Combining a voice user interface with a graphical user interface, as is already the
case with certain Amazon Echo devices or Google Nest Hub, improves the user expe-
rience many times over, as the advantages of both systems add up and the disadvantages
are compensated for by the respective other system.
Extensions for our skill in this case would mean even better prioritization of search
results. The functionality of the search can be extended by navigating to the lecture hall,
writing an email to the person you are looking for, registering for a course, or integrat-
ing online services of the TU Graz.
Going further, this skill could be extended to other platforms like Google's "Now"
or Microsoft's "Cortana", and even more functionalities could be added. These could
include directions to the exact address of the lecture hall, or an appointment calendar
that checks the students' online system to find out when their next lecture is scheduled
to begin and informs the students accordingly.
References
1. Weiss, M., Ebner, M. & Ebner, M. (2021). Speech-based Learning with Amazon Alexa. In
T. Bastiaens (Ed.), Proceedings of EdMedia + Innovate Learning (pp. 156-163). United
States: Association for the Advancement of Computing in Education (AACE). Retrieved
July 27, 2021 from https://www.learntechlib.org/primary/p/219651/.
2. Bilic, L., Ebner, Markus, Ebner, Martin (2020) A Voice-Enabled Game Based Learning
Application using Amazon's Echo with Alexa Voice Service: A Game Regarding Geo-
graphic Facts About Austria and Europe, International Journal of Interactive Mobile Tech-
nologies (iJIM), 14 (3), pp. 226 - 232
3. Schoegler, P., Ebner, M. & Ebner, M. (2020). The Use of Alexa for Mass Education. In
Proceedings of EdMedia + Innovate Learning (pp. 721-730). Online, The Netherlands:
Association for the Advancement of Computing in Education (AACE).
4. Viken, A. (2009). The history of personal digital assistants 19802000. Agile Mobility, 10.
5. Johann Hauswald u. a. „Sirius: An Open End-to-End Voice and Vision Personal Assistant
and Its Implications for Future Warehouse Scale Computers“. In: ACM SIGPLAN Notices
50 (Mai 2015), S. 223238. DOI: 10.1145/2775054.2694347.
6. Bitkom e.V und Deutsches Forschungszentrum für Künstliche Intelligenz. „Künstliche In-
telligenz“. URL: https://www. dfki.de/fileadmin/user\%5Fupload/im-
port/9744\%5F171012- KI- Gipfelpapier-online.pdf, last accessed 2021/10/30.
7. Stuart J. Russell und Peter Norvig. Artificial Intelligence: A modern approach. 4. Aufl.
PEARSON, 2020.
8. Amazon. Unsere Geschichte: Was aus einer Garagen-Idee werden kann? Feb. 2020. URL:
https://www.aboutamazon.de/%C3%BCber-amazon/unsere-geschichte-was-aus-einer-gar-
agen-idee-werden-kann, last accessed 2021/10/28.
9. Theresa Strohbach. Amazon Alexa: 28 unterschiedliche Stimmen in 9 Sprachen verfügbar.
Aug. 2018. URL: https://www.amazon-watchblog. de/technik/1475-amazon-alexa-9-spra-
chen-28-stimmen.html, last accessed 2021/11/15.
10. Nil Canbek und Mehmet Mutlu. „On the track of Artificial Intelligence: Learning with
Intelligent Personal Assistants“. In: International Journal of Human Sciences 13 (Jan.
2016), S. 592. DOI: 10.14687/ijhs.v13i1.3549
11. Amazon. Amazon Alexa Voice AI | Alexa Developer Official Site. 2021. URL: https://de-
veloper.amazon.com/en-US/docs/alexa/ask-overviews/what-is-the-alexa-skills-kit.html, last
accessed 2021/11/15.
12. Peter Haase u. a. „Alexa, Ask Wikidata! Voice Interaction with Knowledge Graphs using
Amazon Alexa“. In: SEMWEB. 2017.
13. Archna Oberoi. The Rise of Voice user interface (VUI). March 2020. URL: https://in-
sights.daffodilsw.com/blog/the-rise-of-voice-user-interface-vui, last accessed 2021/10/28.
14. Nathan Hunt. Voice input: the interface problem - UX Collective. Aug. 2018. URL:
https://uxdesign.cc/voice-input-the-interface-problem-1700be45ec18, last accessed
2021/11/15.
15. DB Redaktion. Deutsche Bahn Skill auf Amazon Alexa: Reiseauskunft per Sprachsteue-
rung jetzt noch flexibler. Mai 2021. URL: https://inside.bahn.de/amazon-alexa/ , last ac-
cessed 2021/11/02.
16. Marinus Martin und Friedrich Kühne. Spotify: Test | Kosten | Unterstützte Geräte im
Überblick | NETZWELT. Apr. 2021. URL: https://www.netzwelt.de/spotify/testber-
icht.html, last accessed 2021/11/02.
17. Alexander Martin. Wikinator: Amazon.de. URL: https://www.amazon. de/Alexander-Mar-
tin-Wikinator/dp/B07LFFHYLM, last accessed 2021/11/03.