Dynamic Action Selection Using Image
Schema-based Reasoning for Robots
Maria M. Hedblom¹, Mihai Pomarlan², Robert Porzel³, Rainer Malaka³ and Michael Beetz¹
1Institute of Artificial Intelligence, University of Bremen, Am Fallturm 1, 28359 Bremen, Germany
2Applied Linguistics Department, University of Bremen, Uni-Boulevard 13, 28359 Bremen, Germany
3Digital Media Lab, University of Bremen, Bibliothekstr. 5, 28359 Bremen, Germany
Abstract
Dealing with robotic actions in uncertain environments has been demonstrated to be hard. Many classic
planning approaches to robotic action make the closed world assumption, rendering them inefficient for
everyday household activities, as they function without generalizability to other contexts or the ability
to deal with unexpected changes. In contrast, humans robustly execute underspecified instructions in
unfamiliar environments. In this paper, we initiate our research program where we propose the use of
functional relations in the form of image-schematic micro-theories, formally represented in ISL$^{FOL}$, to
enrich action descriptors with semantic components. It builds on the body of work in embodied cognition
showing that human conceptualization of action sequences is founded on abstract patterns learned from
physical experiences in the form of spatiotemporal relationships between objects, agents and environments.
These theories are used to inform action selection mechanisms for behavioral robotics written in EL++
and we argue how these micro-patterns can be applied in a more general way to deal with underspecified
action commands and commonsense problem-solving.
Keywords
Cognitive robotics, image schemas, reasoning, uncertain environments, action descriptors
1. Introduction
Robot agents are starting to accomplish human-scale everyday manipulation tasks such as setting a
table, cleaning up, and preparing (very) simple meals. Most knowledge representation approaches
to reasoning about actions conceptualize the repertoire of robots as a state transition system, with
actions as atomic transitions between states [1]. Representatives of this research approach are PDDL [2], situation [3] and event calculus representations [4, 5] and their variations.
This abstraction is critical from the robot agent perspective because the main reasoning task of
a robot agent is to infer how it has to move its body in order to accomplish an underdetermined
task such as “put the oat milk on the table” without causing unwanted side effects. However,
the use of abstract action models makes robots inflexible, as the same task can be implemented
through different body motions: the robot can close a door with its hand but, analogically, it can
also use its elbow if the hand is occupied by carrying an object. Further, the same motion can
implement different tasks: a pushing motion can close a door but also position an object more
accurately. The lack of such reasoning capabilities not only limits the manipulation tasks that robots can perform, but also the accuracy with which they perform them.
There are approaches to include more detailed representations, such as the combination of task
and motion planning. However, motion planning typically tries to compute collision-free paths
rather than motions that achieve certain effects [6]. Approaches to axiomatizing manipulation actions have been attempted for some commonsense reasoning problems, e.g., egg cracking [7].
However, these often result in long and complicated axiomatizations that are difficult to ground
into the action execution systems of robots.
In this paper, we propose to equip knowledge representation and reasoning systems for robotic
agents with a generalizable layer of understanding of the conditions in the environment. This
allows the robot to reason about action execution in terms of motion types when encountering
unexpected changes in a given situation. For any physical state of affairs in the world, this layer of
understanding can be described using a set of functional relationships between objects, agents and
environments, called image schemas. While this work is at an early stage, we will argue that the contribution of our proposition is manifold:
(i) Reasoning about functional relationships: It allows robots and artificial agents to reason about the functional relationships between objects and other entities situated in complex environments in a cognitively plausible way.
(ii) Reasoning about alternatives to a plan: This semantic grounding of the environment offers problem-solving capabilities in that it provides means to expand the reasoning outside the specified action plan. This enables corrective actions such as avoidance of blockages.
(iii) Increase adaptability through analogy: It also offers generalizability in that information relevant to one situation can be analogically transferred to another situation.
(iv) Improve natural language understanding: Building on research founded in cognitive linguistics, it enables a large body of work to be leveraged towards increasing natural language understanding and for robots to follow human instructions and commands more accurately.
2. Action Semantics with Image Schemas and Affordances
Adult humans are creatures of habit. Repeated experiences with similar situations, i.e. particular
states of the world, shape us into experts on a variety of situations possible in our environment.
We can use generalized patterns of information from our previous experiences to reason about
outcomes of uncertain situations. These expectations extend to action instructions as well. Asking
someone to set the table or get the milk will likely result in a satisfactory outcome, regardless of
whether that person has been in that particular kitchen or not. Humans have extensive understand-
ing of the affordances of a kitchen, which tools are needed for eating, and an appropriate set-up
of a dining table given different contexts, such as type of meal or number of participants. In com-
parison, robots, which are relatively proficient in well-defined tasks in constrained environments
[8, ?], struggle when confronted with vague and underspecified instructions.
To aid this research agenda, we propose to utilize the underlying patterns of expectations
found in humans, as suggested by the theory of embodied cognition [9]. It proposes that our conceptualization and understanding of the environment comes from perceiving and interacting with it. Such information is formed into generalized patterns, image schemas. These encompass the spatiotemporal relationships between objects, agents and environments.¹
One way this could be computationally realized is to characterize the image schemas in relation
to the affordances they require [10]. Concrete examples are, for instance, how a glass affords CONTAINMENT of liquids and a plate offers SUPPORT for food, but also abstract concepts and more dynamic transformations are included in this, e.g. how 'space to grow' is ENABLEMENT and SCALING. Turning affordance theory into a computationally applicable theory for commonsense reasoning about events and actions has been approached in both theoretical and applied computer science [11, 12]. Image schemas have also been proposed to provide patterns to organize hybrid
reasoning involving qualitative and quantitative descriptions of scenes [13].
While affordances can be described by the interplay of respective dispositions of objects and
agents [14], image schemas offer another layer of abstraction describing common manifestations of affordance-based configurations that, furthermore, constitute the meaning of many linguistic constructions [15]. Consequently, it has been argued that the meanings of linguistic units can be
traced back into the generalized patterns of the sensorimotor experiences seen as image schemas,
as well as force-dynamic schemas [16].
The image schemas capture relationships such as SOURCE_PATH_GOAL (SPG), depicting movement of an object between two points; LINK, the force-dynamic relationship that connects objects with one another; and VERTICALITY, vertical movement and relative position and symmetry on the vertical axis. Another strength of using image schemas for formal research on underspecified environments and instructions is that they have been shown to come in different levels of specificity, e.g. the difference between a tight and loose containment, or how SOURCE_PATH_GOAL ranges from simple object movement to increasingly specific notions with source and goal locations [17]. For instance, a milk carton is a prime example of a tight container which functions as a required component of the transportation (SPG) of any liquid. Image schemas have been argued to manifest as graph hierarchies of increasingly complex image-schematic relationships, and have been formalized accordingly [17].
The embodied grounding of image schemas makes them a prime subject to be learned through
statistical methods and deep learning. For example, in the work by [?], subsymbolic information from robotic object manipulations is collected and transformed into symbolic Narrative-Enabled Episodic Memories (NEEMs) featuring a semantics based on the SOMA ontology [18]. These
are collected into an episodic memory knowledge base and are used to learn general knowledge
about particular situations – such as that milk is usually in the fridge, or that cups can be found in
cupboards. A robot that has a NEEM about a fridge containing such perishable objects as cartons
of oat milk, can use this information when tasked to “put the oat milk on the table.” Within
these NEEMs, the image-schematic relationships exist as background knowledge evoked by the
possible actions. In the next section, we provide an overview of the formal system that we employ
for representing the abstract image schemas before moving onto a working example.
¹ Some cognitive scientists might oppose this rather narrow view of image schemas. However, we argue that this is a good starting point for modeling and simulating computational intelligence, as this definition is more formalizable than the abstract, multi-modal notions that may be more accurate from the cognitive perspective.
3. Action and Event Analysis using ISL$^{FOL}$
It has been suggested that the conceptualization of particular events and actions can be described
using image schema profiles [19]. While these profiles tend to be conceptually unstructured and describe simple groups of the image-schematic relationships that represent the way we think about a particular concept or event, recent research in knowledge representation has brought forth an approach to employ structured combinations of image schemas to describe in conceptual detail what functional relationships take place in certain activities [20, 21]. After introducing ISL$^{FOL}$,
we use the household action of ‘fetching milk’ to demonstrate how such structured combinations
can look for our action selection approach.
3.1. The Image Schema Logic, ISL$^{FOL}$
ISL$^{FOL}$, the image schema logic, is an expressive multi-modal logic intended to capture the basic spatiotemporal interactions present in image-schematic events [17]. In short, it combines the Region Connection Calculus (RCC) [22], Ligozat's Cardinal Directions (CD) [23], and the Qualitative Trajectory Calculus (QTC) [24], with 3D Euclidean space assumed for the spatial domain,
and Linear Temporal Logic over the reals (RTL). This combination of calculi allows the formal
modeling of spatial relationships between objects and regions in RCC and their relative movement
using a reduced version of QTC with the following syntax:
$O_1$ moves towards $O_2$'s position: $O_1 \rightsquigarrow O_2$
$O_1$ moves away from $O_2$'s position: $O_1 \leftsquigarrow O_2$
$O_1$ is at rest with respect to $O_2$'s position: $O_1 \mid O_2$
The temporal dimension is based on linear temporal logic (RTL) over the reals [25] with future and past operators. The syntax of this logic is defined by the grammar

$\phi ::= p \mid \top \mid \neg\phi \mid \phi \land \phi \mid \phi\,\mathcal{U}\,\phi \mid \phi\,\mathcal{S}\,\phi$

where $\phi\,\mathcal{U}\,\psi$ reads as "$\phi$ holds, until $\psi$" and $\phi\,\mathcal{S}\,\psi$ reads as "$\phi$ holds, since $\psi$". As is standard in temporal logic, we can define additional temporal operators based on these, for instance: $\mathbf{F}\phi$ (at some time in the future, $\phi$) is defined by $\top\,\mathcal{U}\,\phi$; and $\mathbf{G}\phi$ (at all times in the future, $\phi$) is defined as $\neg\mathbf{F}\neg\phi$.
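To make these operators concrete, here is a minimal Python sketch (our illustration, not part of ISL$^{FOL}$ or its tooling) that evaluates until, F and G over a finite trace of states, each state being the set of atoms true at that instant; the logic itself is interpreted over the reals, so a finite trace only approximates the semantics:

```python
# Minimal sketch: LTL-style operators over a finite trace (illustration only).
# A trace is a list of states; each state is the set of atoms true at that step.

def holds(trace, i, phi):
    """phi is ('atom', p) | ('not', f) | ('and', f, g) | ('until', f, g)."""
    op = phi[0]
    if op == 'atom':
        return phi[1] in trace[i]
    if op == 'not':
        return not holds(trace, i, phi[1])
    if op == 'and':
        return holds(trace, i, phi[1]) and holds(trace, i, phi[2])
    if op == 'until':  # some future step satisfies g, with f holding until then
        return any(holds(trace, k, phi[2]) and
                   all(holds(trace, j, phi[1]) for j in range(i, k))
                   for k in range(i, len(trace)))
    raise ValueError(op)

def F(phi):  # eventually: top U phi
    return ('until', ('not', ('atom', '_false_')), phi)

def G(phi):  # always: not F(not phi)
    return ('not', F(('not', phi)))

# The milk stays contained until it is poured out:
trace = [{'contained'}, {'contained'}, {'poured'}]
print(holds(trace, 0, ('until', ('atom', 'contained'), ('atom', 'poured'))))  # True
print(holds(trace, 0, G(('atom', 'contained'))))  # False: violated at the last step
```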
In ISL$^{FOL}$, the temporal structures, often disregarded in formal image schema modeling due to the increase in complexity, constitute the primary model-theoretical object, e.g., a linear order representing the passage of time, in which complex propositions that employ a secondary
semantics are included. The atoms are topological assertions about regions in space using RCC,
the relative movement of objects with respect to each other using QTC, and relative orientation,
using CD. We refer the reader to [17] for more details on this language.
ISL$^{FOL}$ axioms are based on a concept language in First Order Logic (FOL), making it an
expressive tool to represent different situations and concepts. The idea is that each image schema
(e.g. LINK(x, y) and SOURCE_PATH_GOAL(x, p, s, g)) is modelled using the logic's semantics
and is represented using FOL.
3.2. The Underlying Logic of Image Schemas
While ISL$^{FOL}$ is predominantly a modeling language, the image schemas are defined by internal logical rules that can be used to reason with. For instance, the CONTAINMENT relationship propagates motion, in that an object that is contained within another object will move if the container moves (consider how the milk will remain in the carton when the carton moves), see:
$\forall a, b : \mathit{Object},\ s, g : \mathit{Region},\ p : \mathit{Path}:$
$(\text{CONTAINED\_IN}(a, b) \land \text{SPG}(b, p, s, g)) \rightarrow \text{SPG}(a, p, s, g)$
Following the same reasoning, a LINKed relationship ensures that what happens to one of the objects is transferred also to the linked object (a robot holding a carton will ensure that if the robot moves, then the carton moves as well). Likewise, a SUPPORTed object will move if the SUPPORTing object is transferred to another location (consider a robot carrying the milk carton on a tray).
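As a minimal sketch of how such propagation rules might be operationalized (our own illustration; the relation names and fact format are hypothetical, not part of ISL$^{FOL}$), motion can be propagated along CONTAINMENT, LINK and SUPPORT edges until a fixed point is reached:

```python
# Sketch: propagating motion along image-schematic relations.
# Relation names and the fact format are hypothetical illustrations.

def moving_objects(moved, relations):
    """moved: set of objects known to move.
    relations: set of (rel, a, b) with rel in
    {'contained_in', 'linked_to', 'supported_by'};
    in each case, if b moves, a moves too."""
    moving = set(moved)
    changed = True
    while changed:  # fixed point: keep propagating until nothing new moves
        changed = False
        for rel, a, b in relations:
            if b in moving and a not in moving:
                moving.add(a)
                changed = True
    return moving

facts = {('contained_in', 'milk', 'carton'),
         ('linked_to', 'carton', 'gripper'),
         ('supported_by', 'crumbs', 'tray')}
# The robot moves its gripper: the carton (LINK) and the milk (CONTAINMENT)
# move with it, while the tray and the crumbs stay put.
print(moving_objects({'gripper'}, facts))  # {'gripper', 'carton', 'milk'}
```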
These kinds of built-in rules for our expectations of the environment offer the possibility to
make predictions for the outcomes of particular actions. These rules also offer the possibility
for dealing with unexpected problems that can arise. For instance, if the robot’s movement is
BLOCKed by another entity getting in the way, the robot would be able to reason about whether that object is in movement and thus wait for it to pass, or, if it is still, circumvent the BLOCKAGE. Likewise, a
robot would be able to reason about how a container needs to be opened for something to be able
to exit the container. The next section will tackle this and related reasoning challenges.
3.3. Fetching and Placing Actions
One of the most common things we ask other people to do is to fetch different things for us. From
a household robot’s perspective, the instruction Get the oat milk can be described as a particular
instance of a Fetch-And-Place action descriptor. In addition to the call to perform the action, the
instruction includes additional important – albeit implicit – forms of knowledge. First, the noun
oat milk evokes information about what it is, e.g. a perishable good, where it is usually stored
and so on. The SOMA ontology of everyday activities [18] provides a foundational framework as well as additional dedicated modules for the household domain, expressing that OatMilk is_a PerishableSubstance that is stored_in CoolingDevices such as Refrigerators.
In image-schematic terms, all liquid substances (LS) are further specified as requiring a container
(C) for transport:
$\forall LS, C : \mathit{Object}\ (\mathit{Move}(LS) \leftrightarrow \text{CONTAINED\_IN}(LS, C) \land \mathit{Move}(C))$
Second, the verb get requires the understanding of transporting something by evoking a SOURCE_PATH_GOAL schema, with the locations of the source and the goal being an integral part of the transportation expressed. The ABox for the oat milk reveals that $pos_{source}(\mathit{OatMilk}) : \mathit{Fridge}$, and the robot needs to understand that the source of the instruction (the person speaking) reveals the goal of the transportation – namely, close to the speaker: $pos_{goal}(\mathit{OatMilk}) : \mathit{Speaker}$.
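To illustrate how these implicit parameters might be explicated, the following sketch (ours; the fact store and all names such as stored_in and actsOn are hypothetical) instantiates a SOURCE_PATH_GOAL descriptor for the fetching instruction from ABox-style facts:

```python
# Sketch: instantiating a SOURCE_PATH_GOAL descriptor for "get the oat milk".
# The ABox-style fact store and all names are hypothetical illustrations.

abox = {
    ('OatMilk', 'stored_in', 'Fridge'),
    ('Instruction1', 'utteredBy', 'Speaker'),
    ('Instruction1', 'actsOn', 'OatMilk'),
}

def spg_for_fetch(abox, instruction):
    """Source: the object's stored location; goal: near the speaker."""
    obj = next(o for s, p, o in abox if s == instruction and p == 'actsOn')
    speaker = next(o for s, p, o in abox if s == instruction and p == 'utteredBy')
    source = next(o for s, p, o in abox if s == obj and p == 'stored_in')
    return {'trajector': obj, 'source': source, 'goal': f'near({speaker})'}

print(spg_for_fetch(abox, 'Instruction1'))
# {'trajector': 'OatMilk', 'source': 'Fridge', 'goal': 'near(Speaker)'}
```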
If the robot has an episodic memory concerning oat milk, in addition to the ontological
semantics contained in a NEEM, it has information that milk is a perishable liquid stored in
a container that is placed on a shelf in a fridge. Image-schematically this represents nested
CONTAINMENT: the object in question is inside one container (carton) which in turn is inside
another container (fridge) for different purposes. This is crucial information for knowing how
to treat the object. While a robot could theoretically move the entire fridge to the person asking
for oat milk, this is an inefficient way of solving the problem. Likewise, it is not very smart
to take the oat milk out of the carton before attempting to move it. One reason is that the purposes of the two CONTAINMENT relations are fundamentally different: the liquid needs tight CONTAINMENT for transportation, whereas the fridge's loose CONTAINMENT has nothing to do with transportation but instead serves static storage and preservation.
The second image-schematic component is the movement of the oat milk from the fridge to
the person asking for it. This represents the construction of SOURCE_PATH_GOAL capturing
different levels of specificity of the conceptualizations of movement. The classical linguistic
interpretation is that a trajector (object or agent) moves along a path from a particular SOURCE to
a determined GOAL. In this case, the robotic agent needs to be able to deduce that the SOURCE is
the initial location inside the fridge and that the GOAL is to reach the near vicinity of the person
asking for the milk. This may sound like a trivial problem, but it includes not only the SPG schema but also the CONTAINMENT schema, as the GOING_OUT schema is part of CONTAINMENT's dynamic relationships and can, in isolation from the whole event, also be described as a combination of the image schemas SPG and CONTAINMENT.
The third image-schematic relationship we cover in our working example is SUPPORT. Obviously, all objects are supported by the ground, but for a robot to be able to manipulate objects masterfully, it cannot neglect the naïve rules and structures of placing things on top of other things. With the oat milk, this offers important information to be transferred from the source to the goal state. At the source inside the fridge, the oat milk is vertically² SUPPORTed on a shelf. This could be seen as a required property of the oat milk carton throughout the action and in the goal state, as it has an opening at the top.
In ISL$^{FOL}$, most³ image schemas and their hierarchical graphs can be formally represented in the form of ontological patterns. Any formalization using ISL$^{FOL}$ would use their placeholder names to access the full axiomatization. For instance, the LINK image schema is formalized using

$EC(x, y) \land \mathit{force}(x, y) \land \mathit{force}(y, x)$,

describing how for two objects, x, y, to be linked,
describing how for two objects, x,y, to be linked,
they are externally connected (asserted by the RCC8 operator EC) and there is a force from
each respective object towards the other. The action event can be described as six different
image-schematic states, depicted and verbalized in Figure 1.
The purpose of understanding image-schematic relationships in scenes is not only to identify the logical axiomatizations thereof in ISL$^{FOL}$, but also to describe the underlying patterns for understanding the meaning of events and actions. By giving an artificial agent access to this semantic layer, it becomes possible to reason about this particular scenario, as well as to transpose this knowledge to other, similar situations with different objects and contexts. Additionally, this type of reasoning becomes vital to recover from errors and mishaps that might occur [26].
² Assuming that the carton is elongated on the vertical axis.
³ The logic is not able to elegantly handle transformational relationships involved in image schemas such as SPIRALING or SCALING, as they are described less in terms of other objects and more in terms of their own previous states. For this, the addition of mathematical functions could be a way forward.
Scene 1: Initial state. There is a LINK between the Handle and the FridgeDoor. The FridgeDoor is in CONTACT with the Fridge. The Milk is CONTAINED_INSIDE the Carton, the Carton is VERTICAL and SUPPORTed by the Shelf, which is CONTAINED_INSIDE the Fridge.
Scene 2: Opening the door. To open the door, the FridgeDoor needs to leave its state of being in CONTACT with the Fridge. This is done by SOURCE_PATH, where the Handle moves along a CircledPath away from the Fridge.
Scene 3: Lift the milk carton. Through a VERTICALly oriented SOURCE_PATH_GOAL, the SUPPORT from the Shelf on the Carton is removed; since the Milk is CONTAINED_INSIDE the Carton, it MOVEs as well.
Scene 4: Take the Milk out. The Carton goes OUT of the Fridge through its opening, and is no longer CONTAINED_INSIDE it.
Scene 5: Milk goes to its destination. Since the Carton MOVEs towards the Table, the Milk MOVEs to the Table.
Scene 6: Goal state of the Milk resting on the Table. The Carton's movement is BLOCKed when it comes into CONTACT with the Table, turning the force from the movement into the reversed force as SUPPORT from the Table.
Figure 1: Image-schematic scene breakdown of taking the milk out of the fridge.
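Read as data, Figure 1 is a sequence of qualitative states. The following sketch (ours, with hypothetical predicate spellings) represents each scene as a set of ground image-schematic atoms, so that event boundaries fall exactly where the state set changes:

```python
# Sketch: the Figure 1 scenes as sets of image-schematic atoms
# (hypothetical spellings); event cuts fall where the state set changes.
scenes = [
    {'LINK(Handle,FridgeDoor)', 'CONTACT(FridgeDoor,Fridge)',
     'CONTAINED_INSIDE(Milk,Carton)', 'SUPPORT(Shelf,Carton)',
     'CONTAINED_INSIDE(Carton,Fridge)'},                            # Scene 1
    {'LINK(Handle,FridgeDoor)', 'CONTAINED_INSIDE(Milk,Carton)',
     'SUPPORT(Shelf,Carton)', 'CONTAINED_INSIDE(Carton,Fridge)'},   # Scene 2
    {'CONTAINED_INSIDE(Milk,Carton)',
     'CONTAINED_INSIDE(Carton,Fridge)'},                            # Scene 3
    {'CONTAINED_INSIDE(Milk,Carton)'},                              # Scene 4
    {'CONTAINED_INSIDE(Milk,Carton)',
     'SPG(Carton,Path,Fridge,Table)'},                              # Scene 5
    {'CONTAINED_INSIDE(Milk,Carton)', 'SUPPORT(Table,Carton)'},     # Scene 6
]

for i in range(1, len(scenes)):
    gained, lost = scenes[i] - scenes[i - 1], scenes[i - 1] - scenes[i]
    if gained or lost:  # a conceptual cut between consecutive scenes
        print(f'cut before scene {i + 1}: '
              f'added {sorted(gained)}, removed {sorted(lost)}')
```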
4. The Contributions in Practice
In this section, we will illustrate how ISL$^{FOL}$ can contribute to robotics to address the issues listed in the introduction. We will, for each item in turn, explain the underlying problems, provide an example of reasoning in ISL$^{FOL}$ to address such problems, and then show how the results inform the construction of axioms in a simpler formalism, EL++ [27, 28], that can be used in a quick perception-action loop of a robot. The overall approach then is to use ISL$^{FOL}$ "off-line", in a robot's idle moments, to either imagine new possible situations or analyze past experience, and to encode knowledge thus obtained into simple rules that can be employed effectively for reflexive, fluid actions.
4.1. Reasoning about Functional Relations
One thing that we have repeatedly stressed in this paper is how the image schemas offer a
cognitively plausible method for artificial agents to reason about their surroundings. A simple
example is how humans, birds, and other animals understand that if you want to eat a nut you
have to crack the shell before you can take it out. Likewise, humans are quite experienced with
the understanding that if you want to take the milk out of the fridge, it is not possible unless you
first open the fridge door.
This level of commonsense reasoning is intuitive in biological intelligence, but it has been
demonstrated to be time-consuming to axiomatize the full scenarios (as with the egg cracking
problem) and inefficient to rely purely on statistical methods. This is where the image schema
logic could play a vital role.
To give an example of how a CONTAINMENT inference can work in ISL$^{FOL}$, consider the following axiom about solid objects (for simplicity, we omit conditions stating that the objects involved are not parts of one another)⁴:

$\forall O_1, O_2 : \mathit{SolidObject}:\ O_1 \neq O_2 \rightarrow \neg PO(O_1, O_2)$
ISL$^{FOL}$ allows us to define a few regions of interest around an object: its interior (which we will take here as a primitive predicate), its exterior, and openings. Let then a closed container be an object with an interior but no openings, and an enclosed object be one contained inside a closed container.

$\mathit{exterior}(E, O) := \mathit{interior}(I, O) \land E = \mathit{Com}(O \cup I)$
$\mathit{opening}(op, O) := \mathit{interior}(I, O) \land \mathit{exterior}(E, O) \land PP(op, E) \land EC(op, I)$
$\mathit{closed}(O) := \neg\exists op\ \mathit{opening}(op, O)$
$\mathit{enclosed}(O) := \exists C\ \mathit{closed}(C) \land \text{CONTAINED\_IN}(O, C)$
Reasoning inside the RCC8 fragment of ISL$^{FOL}$ then establishes that, for solid objects, to touch an enclosed object without also being enclosed in the same container is impossible:

$\forall R, O, C : \mathit{SolidObject}:\ \mathit{Enclosed}(O, C) \land \neg\mathit{Enclosed}(R, C) \rightarrow \neg EC(R, O)$
In upcoming work, it is our intention to use EL++ ontologies to encode image-schematic knowledge for action selection. The previous result from ISL$^{FOL}$ may be approximated in EL++ as:

$\mathit{Enclosed} \equiv \exists\mathit{isContainedIn}.\mathit{Obj}$
$\mathit{Free} \sqcap \mathit{Enclosed} \sqsubseteq \bot$
$\mathit{EC} \sqcap (\exists\mathit{hasParticipant}.\mathit{Enclosed}) \sqcap (\exists\mathit{hasParticipant}.\mathit{Free}) \sqsubseteq \bot$
This means that a relationship in which one participant is a free object and another is enclosed cannot
be of type EC. This has consequences for action selection because, e.g., an EC relation between a
gripper and an item to grasp is necessary. If such a relation is impossible, then reaching for an
item should be delayed. Further axioms could pinpoint possible alternative actions, such as to
manipulate handles on the container to open it.
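A minimal sketch of how a robot's action filter might apply this rule (ours; the class and relation names are hypothetical, and a production system would query a DL reasoner over the EL++ ontology rather than hand-roll the check):

```python
# Sketch: gating a grasp action on the "no contact with enclosed objects" rule.
# Class and relation names are hypothetical; a real system would query a
# DL reasoner over the EL++ ontology instead of this hand-rolled check.

def is_enclosed(obj, contained_in, closed):
    """Simplification: an object is enclosed if some (transitive)
    container of it is closed."""
    c = contained_in.get(obj)
    while c is not None:
        if c in closed:
            return True
        c = contained_in.get(c)
    return False

def grasp_feasible(robot_part, target, contained_in, closed):
    # EC(gripper, target) is needed for grasping; it is impossible if the
    # target is enclosed while the gripper is not enclosed with it.
    return not (is_enclosed(target, contained_in, closed)
                and not is_enclosed(robot_part, contained_in, closed))

contained_in = {'Milk': 'Carton', 'Carton': 'Fridge'}
closed = {'Fridge'}
print(grasp_feasible('Gripper', 'Carton', contained_in, closed))  # False: open first
closed = set()  # after manipulating the handle, the fridge is no longer closed
print(grasp_feasible('Gripper', 'Carton', contained_in, closed))  # True
```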
⁴ As defined in RCC8: PO - Partial Overlap; EC - Externally Connected; PP - Proper Part; Com - set-theoretical complement in $\mathbb{R}^3$.
4.2. Reasoning about Alternatives to a Plan
Another vital contribution that the image schemas bring to reasoning about action execution
is dealing with unexpected situations. In a kitchen, it is not unlikely that multiple humans are
present at the same time and may continuously change the state of the environment. If a person
at a table asks the robot to fetch them the milk, the robot needs to be able to reason about any
changes that might have taken place. Perhaps the originally intended path is no longer possible to
take because another person put something in front of the robot or is actively crossing the path in
that instance. Not only is it expected that the robot should stop and not run over the other human, but it should also be able to redirect its movement if the object on the path is unlikely to move anytime soon, i.e. if it does not have a movement state. If the human blocking its movement is simply walking past as part of a SOURCE_PATH_GOAL of their own, then the robot can simply take a break before continuing along the same path. However, if the person, or item, remains on the path, the robot needs to be able to reroute to still be able to successfully reach the goal of the instructions. It would do this by generating a new SOURCE_PATH_GOAL construct in which the source is no longer based at the fridge but at the location of the BLOCKAGE, and the path is no longer the fastest route (or what previously had been suggested) but a route that bypasses the blocking object.
The above discussion can be formally represented in ISL$^{FOL}$ as follows, omitting the assumption that the source and goal (S, G) are tangential proper parts (TPP) of the path. Assuming again the axiom of solid objects, and the following axiom about SPG:

$\forall A, X : \mathit{SolidObject}, \forall S, G : \mathit{Region}, \forall P : \mathit{Path}:$
$\text{SPG}(A, P, S, G) \land \mathbf{G}(PO(X, P)) \rightarrow \mathbf{F}(PO(A, X))$
That is, if a trajector $A$ follows a path $P$, it will pass through any region along the path. Then, one can show the following:

$\forall A, B : \mathit{SolidObject}, \forall S, G : \mathit{Region}, \forall P : \mathit{Path}:$
$A \neq B \land \mathbf{G}(PO(B, P)) \rightarrow \mathbf{G}(\neg\text{SPG}(A, P, S, G))$
In other words, if a path is forever blocked, it is never possible to use it to go from S to G.
As before, we wish to provide a robotic agent with some simple rules to select, or filter out,
actions. The previous result from ISL$^{FOL}$ could be approximated in EL++ thusly, assuming appropriate recognition procedures for entities such as paths and stationary objects, and appropriate controllers for actions such as moving the robot base:

$\mathit{BlockedPath} \equiv \mathit{Path} \sqcap \exists\mathit{overlappedBy}.\mathit{StaticObj}$
$\mathit{BaseMovement} \sqcap \exists\mathit{uses}.\mathit{BlockedPath} \sqsubseteq \bot$
That is, no action that involves moving the base should use a blocked path. If such a path is the
one currently used by the robot, it should search for a different one. A similar set of simple rules
might encode for the robot that it may be worth waiting if a path is blocked by a moving object:
$\mathit{BusyPath} \equiv \mathit{Path} \sqcap \exists\mathit{overlappedBy}.\mathit{MovingObj}$
$\mathit{BaseMovement} \sqcap \exists\mathit{uses}.\mathit{BusyPath} \sqsubseteq \mathit{DelayedAct}$
Such a set of action rules would be justified if one believed that objects moving away from a path
eventually do not overlap it:
$\forall B : \mathit{SolidObject}, \forall P : \mathit{Path}:\ \mathbf{G}(B \leftsquigarrow P) \rightarrow \mathbf{F}(\neg PO(B, P))$
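One possible operationalization of these rules as an action filter, sketched in Python (ours; the names and observation format are hypothetical):

```python
# Sketch: filtering base-movement actions by path status (names hypothetical).

def classify_path(path, overlapping_objects):
    """overlapping_objects: list of (obj, 'static' | 'moving') on the path."""
    if any(kind == 'static' for _, kind in overlapping_objects):
        return 'blocked'  # BaseMovement using this path is inconsistent
    if any(kind == 'moving' for _, kind in overlapping_objects):
        return 'busy'     # the action is typed DelayedAct: wait and re-check
    return 'free'

def select_base_movement(candidate_paths, observations):
    delayed = []
    for path in candidate_paths:
        status = classify_path(path, observations.get(path, []))
        if status == 'free':
            return ('move', path)
        if status == 'busy':
            delayed.append(path)
    if delayed:
        return ('wait_then_retry', delayed[0])  # moving obstacles pass eventually
    return ('replan', None)  # all paths blocked: generate a new SPG elsewhere

obs = {'p1': [('Human1', 'moving')], 'p2': [('Chair', 'static')]}
print(select_base_movement(['p2', 'p1'], obs))  # ('wait_then_retry', 'p1')
```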
4.3. Increase Adaptability through Analogy
Bypassing things like BLOCKED_MOVEMENT by redirecting routes requires the robot to rethink
its actions based on new image-schematic states of the world. This is useful, but it is also possible
to use this in a more general way by abstracting away from the actual parts of the world.
This builds on the idea of analogical reasoning, that there are underlying patterns that can
be transferred from an information rich source domain to an underspecified target domain. For
robotic actions, this offers the possibility to reuse previously learned relationships. One crucial
component for successful analogical transfer is that the source and target share the same structure.
In the settings of functional relations as the foundation to guide robotic action selection, these
patterns are also useful as a basis for generalization. For instance, if a robot has access to the image-schematic information that a closed container, such as a fridge, must first be opened for something to be taken out of it, it can use this information to reason about similar CONTAINMENT situations. It can use this generalized information to take the lasagna out of the oven (take note of how Figure 1 would be exactly the same with this example), a letter out of an envelope, and to join the masses of biological species whose evolutionary predecessors learned long ago that the nut needs to come out of its shell.
A less complex analogy, as only one object needs to be exchanged, is how it is possible to
close a door with a body part other than the hand, should the hand be preoccupied holding milk cartons
or lasagnas. In this case, what we want is to describe, image-schematically, what it means for a
solid object to push another, via some third solid object:
$\mathit{push}(R, A, L) := (R = L \lor \text{LINK}(R, L)) \land (L \rightsquigarrow A) \land EC(L, A)$
A robot using EL++ rules to select and parameterize actions might then be interested in classifying
what objects it can push with:
$\mathit{Pushing} \sqsubseteq \exists\mathit{uses}.\mathit{AffordsPushing}$
$\mathit{OwnBodyPart} \sqsubseteq \mathit{AffordsPushing}$
$\exists\mathit{linkedTo}.\mathit{OwnBodyPart} \sqsubseteq \mathit{AffordsPushing}$
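A sketch of how these axioms might drive pusher selection (ours; all names are hypothetical):

```python
# Sketch: classifying what can serve as a pusher (names hypothetical).

own_body_parts = {'Hand', 'Elbow', 'GripperLeft'}
linked_to = {'Spatula': 'Hand'}  # LINK: a held tool is linked to a body part

def affords_pushing(obj):
    # OwnBodyPart is subsumed by AffordsPushing, and so is anything
    # linkedTo an OwnBodyPart (e.g. a tool held in the hand).
    return obj in own_body_parts or linked_to.get(obj) in own_body_parts

def choose_pusher(candidates, occupied):
    """Prefer any free candidate that affords pushing."""
    for obj in candidates:
        if affords_pushing(obj) and obj not in occupied:
            return obj
    return None

# The hand is busy carrying the milk carton, so the elbow closes the door.
print(choose_pusher(['Hand', 'Spatula', 'Elbow'],
                    occupied={'Hand', 'Spatula'}))  # 'Elbow'
```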
4.4. Improve Natural Language Understanding
Another important requirement of a successful household robot system is to be able to understand
human instructions. Natural language instructions are usually underspecified and contain vast amounts of ambiguity, polysemy, and implicit information that need to be resolved and explicated in order to execute the corresponding actions appropriately. Another problem is that instructions often omit vital semantic components such as determiners, quantities and even the objects themselves [29]. For instance, in the example above, get the oat milk, neither the source nor the target locations are made explicit, on top of the omission of the addressee. Yet any adult human would be able to successfully reach the correct goal state based on this instruction. While we have not
dived deeper into the linguistic aspects of image schemas for this particular paper, the theory,
analysis and application of image schemas stem from research in cognitive linguistics.
To improve natural language understanding in robotics, with a special focus on instructions,
we employ an efficient construction-based parser [30] that produces semantic specifications as connected RDF triples that represent as much of the meaning of the instructions as is contained in the textual commands. All terms in these semantic specifications are aligned to the SOMA-SAY module [31] that is part of the larger SOMA framework [18], which rests on the DUL+D&S foundational ontology [32]. Image-schematic theories are part of the descriptive branch of SOMA
and constitute the central anchoring point of the semantic representations of the instructions given
to the robotic agents. While these OWL-DL based representations only afford limited reasoning
as compared to ISL$^{FOL}$, we ensure a seamless usage of the ensuing semantic representations
, we ensure a seamless usage of the ensuing semantic representations
by using the terms provided in SOMA as a lingua franca throughout the system. Additional
mechanisms that are part of our deep language understanding pipeline are needed for further
explicating the implicit information to arrive at executable robotic action plans. This concerns,
for example, the learning of tool selections via human computation approaches [33] or the setting of action parameters and execution variations by means of physics-based simulations that satisfy the expectation constraints provided by the given ISL$^{FOL}$ models [13].
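For illustration, a semantic specification for get the oat milk might be sketched as RDF triples using rdflib; the namespace and all term names below are hypothetical stand-ins for the actual SOMA-SAY vocabulary:

```python
# Sketch: a semantic specification for "get the oat milk" as RDF triples.
# The namespace and term names are hypothetical stand-ins for SOMA/SOMA-SAY.
from rdflib import Graph, Namespace

EX = Namespace('http://example.org/soma-say#')  # placeholder namespace
g = Graph()
g.bind('ex', EX)

g.add((EX.Instr1, EX.evokes, EX.FetchAndPlace))
g.add((EX.Instr1, EX.actsOn, EX.OatMilk))
g.add((EX.OatMilk, EX.isA, EX.PerishableSubstance))
g.add((EX.Instr1, EX.hasGoalLocation, EX.NearSpeaker))  # implicit, explicated
g.add((EX.Instr1, EX.hasSourceLocation, EX.Fridge))     # from episodic memory

print(g.serialize(format='turtle'))
```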
5. Discussion on Past and Future Work
Using commonsense reasoning to improve robotic action selection is not a novel idea; it has been a fundamental component since the beginning of formal research on intelligent systems (a comprehensive overview is given in [34]). Many researchers (e.g. [35, 8, 36]) have worked on providing robotic systems with human-like commonsense knowledge so that the agents can plan and execute their actions more efficiently.
Similar to the ideas in this paper is the work by [37], which considers activity knowledge as a means to fill the gaps in abstract instructions, but treats these gaps from a much more general point of view than the specifics found in image schemas. A more general approach to activity modeling for robotic agents is presented by the IEEE-RAS working group ORA [38]. The group has the goal of defining a standard ontology for various sub-domains of robotics, including a model for object manipulation tasks. It has defined a core ORA ontology [39], as well as additional modules for industrial tasks such as kitting [40]. In terms of methodology, we differ in the foundational
assumptions we assert, with important consequences on the structure of our ontology, modeling
workflow, and inferential power. In the case of ORA, the SUMO upper-level ontology is used
as foundational layer. Compared to SUMO, we use a richer axiomatization of entities on the
foundational layer, and put particular emphasis on the distinction between physical and social
activity context.
Unlike most previous methods, which often build action descriptors for particular actions and
scenarios, we suggest relying on the generalized information learned from the sensorimotor expe-
riences, encoded as functional relationships based on image schemas. While there exists research
on how to formalize image schemas [41] and to use them for simulation-based reasoning [42],
the role they play in active applications is not quite as thoroughly investigated. Another novel
approach is to construct hybrid reasoning pipelines that connect simulation-based reasoning with
qualitative reasoning about functional relations [13], but this needs further investigation.
At this stage, the contributions of the paper remain purely theoretical. However, the novelty of the approach and our conviction in the ideas underlying the core concepts and their contributions motivate future work.
The next steps of this research program are to further strengthen the applicability of this work by providing a more feasible connection between ISL$^{FOL}$ and simple formal languages commonly
used in robotics, such as EL++. Additionally, we intend to develop an image schema parser that
can identify and extract image-schematic relationships from the subsymbolic data of robotic
simulations and visually recorded human activity to provide automation to the system. Thirdly,
we aim to connect the formal part and the identification parser to the body of work in cognitive
linguistics to improve the robotic agents’ understanding of instructions in natural language.
Acknowledgements
The authors thank John Bateman and Fabian Neuhaus for valuable insights and constructive
feedback on the paper. The research reported in this paper has been supported by FET-Open
Project #951846 "MUHAI - Meaning and Understanding for Human-centric AI" funded by the
EU Program H2020 and the German Research Foundation DFG, as part of Collaborative Research
Center (Sonderforschungsbereich) 1320 “EASE - Everyday Activity Science and Engineering”,
University of Bremen (http://www.ease-crc.org/). The research was conducted in sub-projects
“P01 Embodied Semantics for the Language of Action and Change” and “R01 CRAM 2.0 - a 2nd
Generation Cognitive Robot Architecture for Accomplishing Everyday Manipulation Tasks”.
References
[1] M. Ghallab, D. Nau, P. Traverso, Automated Planning: Theory and Practice, Elsevier, 2004.
[2] Z. Kootbally, C. Schlenoff, C. Lawler, T. Kramer, S. Gupta, Towards robust assembly with knowledge representation for the planning domain definition language (PDDL), Robot. Comput.-Integr. Manuf. 33 (2015) 42–55.
[3] R. Reiter, A logic for default reasoning, Artificial Intelligence 13 (1980) 81–132.
[4] M. Thielscher, Introduction to the fluent calculus, Electronic Transactions on Artificial Intelligence 2 (1998) 179–192.
[5] M. Shanahan, The event calculus explained, in: Artificial Intelligence Today, Springer, 1999, pp. 409–430.
[6] J.-C. Latombe, Robot Motion Planning, volume 124, Springer Science & Business Media, 2012.
[7] L. Morgenstern, Mid-sized axiomatizations of commonsense problems: A case study in egg cracking, Studia Logica 67 (2001) 333–384.
[8] L. Kunze, M. Tenorth, M. Beetz, Putting people’s common sense into knowledge bases of
household robots, in: Annual Conf. on Artificial Intelligence, Springer, 2010, pp. 151–159.
[?] M. Beetz, D. Beßler, A. Haidu, M. Pomarlan, A. K. Bozcuoglu, G. Bartels, KnowRob 2.0 - A 2nd Generation Knowledge Processing Framework for Cognition-Enabled Robotic Agents, Proc. - IEEE Int. Conf. on Robotics and Automation (2018) 512–519.
[9] L. Shapiro, Embodied Cognition, New problems of philosophy, Routledge, London, 2011.
[10] A. Galton, The formalities of affordance, in: M. Bhatt, H. Guesgen, S. Hazarika (Eds.), Proc. of Workshop Spatio-Temporal Dynamics, 2010, pp. 1–6.
[11] M. Raubal, M. Worboys, A formal model of the process of wayfinding in built environments, in: Int. Conf. on Spatial Information Theory, Springer, 1999, pp. 381–399.
[12] D. Beßler, R. Porzel, M. Pomarlan, M. Beetz, R. Malaka, J. Bateman, A formal model of affordances for flexible robotic task execution, in: Proc. of the 24th European Conf. on Artificial Intelligence, 2020.
[13] M. Pomarlan, J. Bateman, Embodied functional relations: a formal account combining abstract logical theory with grounding in simulation, in: 11th Int. Conf. on Formal Ontology in Information Systems (FOIS), 2020.
[14] M. T. Turvey, Affordances and prospective control: An outline of the ontology, Ecological Psychology 4 (1992) 173–187.
[15] N. Chang, J. Feldman, R. Porzel, K. Sanders, Scaling cognitive linguistics: Formalisms for language understanding, in: Proc. of the First International Workshop on Scalable Natural Language Understanding, 2002.
[16] L. Talmy, Toward a Cognitive Semantics. Volume 2: Typology and Process in Concept Structuring, Language, Speech, and Communication, MIT Press, Cambridge, MA, 2000.
[17] M. M. Hedblom, Image Schemas and Concept Invention: Cognitive, Logical, and Linguistic Investigations, Cognitive Technologies, Springer Computer Science, 2020.
[18] D. Beßler, R. Porzel, M. Pomarlan, S. Hoefner, J. Bateman, R. Malaka, M. Beetz, Foundations of the Socio-physical Model of Activities (SOMA) for Autonomous Robotic Agents, in: Proc. of the Int. Conf. on Formal Ontology in Information Systems, 2021.
[19] T. Oakley, Image schema, in: D. Geeraerts, H. Cuyckens (Eds.), The Oxford Handbook of Cognitive Linguistics, Oxford University Press, 2010, pp. 214–235.
[20] M. M. Hedblom, O. Kutz, R. Peñaloza, G. Guizzardi, Image schema combinations and complex events, KI-Künstliche Intelligenz 33 (2019) 279–291.
[21] R. St. Amant, C. T. Morrison, Y.-H. Chang, P. R. Cohen, C. Beal, An image schema language, in: Int. Conf. on Cognitive Modeling (ICCM), 2006, pp. 292–297.
[22] D. A. Randell, Z. Cui, A. G. Cohn, A spatial logic based on regions and connection, in: Proc. of the 3rd Int. Conf. on Knowledge Representation and Reasoning (KR-92), 1992.
[23] G. Ligozat, Reasoning about cardinal directions, J. Vis. Lang. Comput. 9 (1998) 23–44.
[24] N. V. D. Weghe, A. G. Cohn, G. D. Tré, P. D. Maeyer, A qualitative trajectory calculus as a basis for representing moving objects in geographical information systems, Control and Cybernetics 35 (2006) 97–119.
[25] M. Reynolds, The complexity of temporal logic over the reals, Annals of Pure and Applied Logic 161 (2010) 1063–1096.
[26] M. Diab, M. Pomarlan, D. Beßler, A. Abkari, J. Rossel, J. Bateman, M. Beetz, An ontology for failure interpretation in automated planning and execution, in: Fourth Iberian Robotics Conference, ROBOT '19, Porto, Portugal, 2019.
[27] F. Baader, S. Brandt, C. Lutz, Pushing the EL envelope, in: Proc. of the Int. Joint Conf. on Artificial Intelligence (IJCAI), 2005, pp. 364–369.
[28] F. Baader, C. Lutz, S. Brandt, Pushing the EL envelope further, in: OWLED, 2008.
[29] R. Porzel, V. S. Cangalovic, J. A. Bateman, Filling constructions: Applying construction grammar in the kitchen, in: Proc. of the 11th Int. Conf. on Construction Grammar, Antwerp, Belgium, 2021.
[30] V. S. Cangalovic, R. Porzel, J. A. Bateman, Streamlining formal construction grammar, in: Proc. of the ICCG11 Workshop on Constructional Approaches in Formal Grammar, Antwerp, Belgium, 2021.
[31] R. Porzel, V. Cangalovic, What say you: An ontological representation of imperative meaning for human-robot interaction, in: Proc. of JOWO, Bolzano, Italy, 2020.
[32] A. Gangemi, P. Mika, Understanding the semantic web through descriptions and situations, in: Proc. of the ODBASE Conference, Springer, 2003.
[33] J. Pfau, R. Malaka, We asked 100 people: How would you train our robot?, in: Extended Abstracts of the 2020 Annual Symposium on Computer-Human Interaction in Play, CHI PLAY '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 335–339.
[34] A. Olivares-Alarcos, D. Beßler, A. Khamis, P. Gonçalves, M. Habib, J. Bermejo, M. Barreto, M. Diab, J. Rosell, J. Quintas, J. Olszewska, H. Nakawala, E. Pignaton de Freitas, A. Gyrard, S. Borgo, G. Alenyà, M. Beetz, H. Li, A review and comparison of ontology-based approaches to robot autonomy, The Knowledge Engineering Review 34 (2019).
[35] N. J. Nilsson, Shakey the robot, Technical Report, SRI, Menlo Park, CA, 1984.
[36] R. D. Nielsen, R. Voyles, D. Bolanos, M. H. Mahoor, W. D. Pace, K. A. Siek, W. H. Ward, A platform for human-robot dialog systems research, in: 2010 AAAI Fall Symposium Series, 2010.
[37] M. Tenorth, M. Beetz, Representations for robot knowledge in the KnowRob framework, Artificial Intelligence 247 (2017) 151–169.
[38] C. Schlenoff, E. Prestes, R. Madhavan, P. Goncalves, H. Li, S. Balakirsky, T. Kramer, E. Miguelanez, An IEEE standard ontology for robotics and automation, in: IEEE Int. Conf. on Intelligent Robots and Systems (IROS), 2012, pp. 1337–1342.
[39] E. Prestes, J. L. Carbonera, S. R. Fiorini, V. A. M. Jorge, M. Abel, R. Madhavan, A. Locoro, P. Goncalves, M. E. Barreto, M. Habib, A. Chibani, S. Gérard, Y. Amirat, C. Schlenoff, Towards a core ontology for robotics and automation, Robotics and Autonomous Systems 61 (2013) 1193–1204.
[40] S. R. Fiorini, J. L. Carbonera, P. Gonçalves, V. A. Jorge, V. F. Rey, T. Haidegger, M. Abel, S. A. Redfield, S. Balakirsky, V. Ragavan, H. Li, C. Schlenoff, E. Prestes, Extensions to the core ontology for robotics and automation, Robot. Comput.-Integr. Manuf. 33 (2015) 3–11.
[41] A. U. Frank, M. Raubal, Formal specification of image schemata – a step towards interoperability in geographic information systems, Spatial Cognition and Computation 1 (1999) 67–101.
[42] S. Nayak, A. Mukerjee, Concretizing the image schema: How semantics guides the bootstrapping of syntax, in: 2012 IEEE Int. Conf. on Development and Learning and Epigenetic Robotics, ICDL 2012, 2012.