Multimed Tools Appl
DOI 10.1007/s11042-010-0709-6
Enabling context-aware multimedia annotation
by a novel generic semantic problem-solving platform
Ruben Verborgh·Davy Van Deursen·Erik Mannens·
Chris Poppe·Rik Van de Walle
© Springer Science+Business Media, LLC 2011
Abstract Automatic generation of metadata, facilitating the retrieval of multimedia
items, potentially saves large amounts of manual work. However, the high specializa-
tion degree of feature extraction algorithms makes them unaware of the context they
operate in, which contains valuable and often necessary information. In this paper,
we show how Semantic Web technologies can provide a context that algorithms can
interact with. We propose a generic problem-solving platform that uses Web services
and various knowledge sources to find solutions to complex requests. The platform
employs a reasoner-based composition algorithm, generating an execution plan that
combines several algorithms as services. It then supervises the execution of this plan,
intervening in case of errors or unexpected behavior. We illustrate our approach by
a use case in which we annotate the names of people depicted in a photograph.
Keywords Annotation·Metadata generation·Semantic Web·
Service composition·Web services
R. Verborgh (✉) · D. Van Deursen · E. Mannens · C. Poppe · R. Van de Walle
ELIS—Multimedia Lab, Ghent University—IBBT, Gaston Crommenlaan 8 bus 201,
9050, Ledeberg-Ghent, Belgium
e-mail: ruben.verborgh@ugent.be
D. Van Deursen
e-mail: davy.vandeursen@ugent.be
E. Mannens
e-mail: erik.mannens@ugent.be
C. Poppe
e-mail: chris.poppe@ugent.be
R. Van de Walle
e-mail: rik.vandewalle@ugent.be
1 Introduction
1.1 Background
The ever-increasing multimedia production rate on the Internet cannot be harnessed
unless we have an efficient means of retrieving relevant information. There are many
algorithms for searching textual data; searching data types such as image and video,
however, is more difficult. Metadata annotations [29] facilitate retrieval by describing
each item. Unfortunately, metadata generation is a tedious task that involves a
significant amount of manual work and knowledge about the annotation domain.
For example, a person annotating press photographs needs to recognize depicted
people and situations. Algorithms for detecting and recognizing human faces exist,
but they are prone to errors and lack an understanding of the photograph in its
entirety. Furthermore, none of them are designed to handle composite problems;
instead, they are specialized for a specific task.
On the one hand, we can consider these algorithms as services on the World Wide
Web. In fact, the Web has evolved from a static document-oriented information
source to a dynamic service-oriented platform providing loosely coupled applica-
tions [10]. The main focus of Web services is to achieve interoperability between
heterogeneous, decentralized, and distributed applications. Furthermore, there is
a growing need for composing Web services into more complex services due to
increasing user demands and the inability of a single Web service to achieve a user’s goal
by itself.
On the other hand, there is the Semantic Web [4] which contains a vast amount of
information about diverse domains in extensive databases such as DBpedia [8] and
Freebase [12]. This structured data enables advanced reasoning about multimedia
item contents, if we connect it to feature extraction algorithms.
1.2 Goal
This paper describes how Semantic Web knowledge and technologies can provide
a context to feature extraction algorithms, generating multimedia annotations the
algorithms cannot discover individually. We present a generic semantic problem-solving
platform, which automatically combines Web services to achieve a predefined
task and uses the Semantic Web as knowledge source to initiate and actively maintain
the task context.
The platform composes an execution plan that answers a certain request using ser-
vices. Furthermore, it supervises the execution of this plan, handling the information
collection and the interaction between services. When errors occur, it is able to find
alternative paths that lead towards an equivalent solution. We apply this platform to
a multimedia annotation use case, indicating the added value of context.
1.3 Use case
Throughout this paper, we demonstrate the introduced concepts by means of an
image annotation use case. Take the case of a publisher of a current affairs magazine
who has a digital photo archive which needs to be annotated. Apart from the image
bitmap data, no additional information is available. As a first step, we would like
to identify the people in the photographs, who will mostly be celebrities. Annotations
should be linked to the corresponding DBpedia entities to enable semantic
searches.
A major difficulty is that the photographs are taken under varying and sometimes
poor conditions (insufficient lighting, poor resolution etc.), which has an impact on
the precision of the algorithms. Also, given a limited training set and the current
limitations of face recognition technology, the probability associated with the results
will not always cross a certain reliability threshold. Contextual information can play
an important role to generate better annotations.
Suppose we have the following algorithms at our disposal (among others):
– a face detection algorithm;
– a face recognition algorithm.
Furthermore, we assume access to the following knowledge:
– image, region, and face ontologies and rules;
– Semantic Web knowledge, particularly about celebrities, through DBpedia.
1.4 Paper outline
In Section 2, we outline the architecture of the platform and introduce its main com-
ponents, which are detailed in the following sections. The interaction modalities and
description of services are described in Section 3. Section 4 details the composition
algorithm used to combine different services into a plan, the execution and error
handling of which is discussed in Section 5. A multimedia use case forms the subject
of Section 6. Related work is listed in Section 7 and we conclude with Section 8 and
sketch future research possibilities.
2 Architecture
The architecture of the problem-solving platform, depicted in Fig. 1, implements the
blackboard architectural pattern [7] widely used in artificial intelligence applications.
It consists of the following components:
– a blackboard that contains the currently requested and the gathered information;
– a collection of services, accompanied by a description, that perform a variety of
specific tasks;
– a supervisor, which invokes the services that contribute to the solution of
the request and handles failures.
[Figure 1: services (labeled Service 1, Collaborator 2, Collaborator 3), each with a semantic capability description and semantic output; supervisor (receives the request); blackboard (request description, gathered information); composer; Semantic Web knowledge]
Fig. 1 Blackboard-based architecture of the platform
The supervisor accepts a SPARQL [24] query and the blackboard uses RDF [18]
to store supplied and gathered information while retaining all semantics. In our use
case, the query in Listing 1 could start the process on the image Loft.jpg.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <Loft.jpg> foaf:depicts ?person. }
WHERE {
  <Loft.jpg> a foaf:Image;
             foaf:depicts ?person.
}
Listing 1 SPARQL request for image annotation
The supervisor does not naively try different services, but follows an execution
plan created by a service composer. Both are assisted by formally described knowl-
edge to relate the services to the request and each other. Note that such knowledge
can either be application-specific or knowledge available in the Semantic Web, as
detailed in Section 6. An iterative process progresses towards a solution:
1. the supervisor invokes a service with the current blackboard contents;
2. the service produces a result and sends this to the blackboard;
3. the supervisor supplements the blackboard with derived knowledge, inferred
from available knowledge.
For our use case, the supervisor could invoke a face detection algorithm on the
image Loft.jpg. The algorithm would then return the coordinates of the detected
regions and the supervisor could, for example, infer that none of these regions overlap,
which would otherwise indicate a detection error.
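The overlap inference mentioned above can be illustrated with a small Python sketch. It is only an illustration under assumptions: the `Region` tuple and the function names are hypothetical, and regions are taken to be axis-aligned rectangles given as x, y, width, and height, in the spirit of the `xywh` fragment used in Listing 2. The platform itself derives such facts with rules rather than with hand-written code.

```python
from itertools import combinations
from typing import NamedTuple


class Region(NamedTuple):
    """Axis-aligned rectangle in pixels (hypothetical representation)."""
    x: int
    y: int
    w: int
    h: int


def overlaps(a: Region, b: Region) -> bool:
    """Return True if the two rectangles share a non-empty area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def overlapping_pairs(regions):
    """List every pair of detected regions that overlap."""
    return [(a, b) for a, b in combinations(regions, 2) if overlaps(a, b)]


# Two regions as a face detector might return them (made-up coordinates).
detected = [Region(5, 7, 42, 43), Region(60, 10, 40, 40)]
suspicious = overlapping_pairs(detected)
```

An empty `suspicious` list corresponds to the case in which no detection error is suspected.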
PREFIX sr: <http://ninsuna.elis.ugent.be/ontologies/arseco/sparqlrequest#>
PREFIX imreg: <http://www.w3.org/2004/02/image-regions>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <Loft.jpg> foaf:depicts ?person }
WHERE {
  [a sr:Request;
   sr:input [sr:bindsParameter "region";
             sr:boundTo <Loft.jpg#xywh=5,7,42,43>];
   sr:output [sr:bindsParameter "person";
              sr:boundTo ?person]]
}
Listing 2 Face recognition SPARQL query
Listing 3 Input and output conditions of the face recognition service in OWL-S
3 Services
Our platform requires a flexible interaction model for services, as a great variety of
different services needs to be plugged in. It is of utmost importance that the semantics
of the concepts of the blackboard are preserved when communicating with a service.
Furthermore, we need a formal description of the capabilities and requirements of
each service.
We access multimedia algorithms by invoking them as SPARQL endpoints [33].
Benefits include interoperability, flexibility in terms of inputs and outputs, and for-
mal communication with well-defined semantics. For our platform, it is specifically
interesting that input can be sent in RDF as part of the WHERE clause of the query,
and output can be retrieved as RDF by using a CONSTRUCT query. An example of a
face recognition service query is shown in Listing 2. This query is executed directly
at a service endpoint, which implements a specific face recognition algorithm. A
query-based interaction mechanism enables more complex interaction models than
simple REST services. For example, the algorithm could return multiple solution
alternatives with an associated probability, instead of a single solution [33].
The algorithms can be described formally as Web services in OWL-S [20],
complemented with formal input and output relationships described in an expression
language [33]. These descriptions should not only cover input and output types,
but should also determine the effect of the former on the latter. The description
of the use case’s face recognition service with inputs, outputs, preconditions, and
postconditions is shown in Listing 3.
4 Composition
4.1 Formal definitions
When discussing the composer, it is convenient to have a formal definition of a
service composition at hand. Firstly, we specify sets that appear in the definitions.
– The set of parameter names $\Pi$, which is the union of all possible input and output
parameter names of services (e.g., image, language).
– The set of parameter values $\Lambda$, which is the union of all possible input and output
values of services (e.g., <file.jpg>, "en-US").
– The set of variable references $\Psi$, containing composer-generated identifiers, used
as placeholders for unknowns (e.g., ?image1, ?language7).
Definition 1 A parameter mapping $\beta$ is a function $\beta: \Pi \to \Lambda \cup \Psi$ which assigns
parameter names to either a value or a variable reference. The set of all parameter
mappings is $B$. An element $(p, v)$ of $B$ is written as $p \mapsto v$ and called a parameter
assignment of $p$ to $v$.
Definition 2 A service invocation $I$ is a triple $(S, \beta_{in}, \beta_{out})$, written as $\beta_{in} \overset{S}{\Longleftarrow} \beta_{out}$,
that represents an execution of a service $S$ with input mappings $\beta_{in}$ and output
mappings $\beta_{out}$. The domains of $\beta_{in}$ and $\beta_{out}$ are the service input and output
parameter names, respectively. The parameter value for each parameter name
must be an element of the corresponding service parameter domain. The set of all
invocations is $\mathcal{I}$.
Definition 3 An invocation execution I is a process step that executes the service S
of an invocation $(S, \beta_{in}, \beta_{out})$, passing the actual values of the parameters in accordance
with $\beta_{in}$. The output values returned by the service are stored in accordance
with $\beta_{out}$.
Definition 4 A service composition C is a directed, labeled, acyclic multigraph with
– a subset $\mathcal{I}'$ of the invocation set $\mathcal{I}$ as vertex set;
– a subset $\Psi'$ of the variable reference set $\Psi$ as edge label set.
An edge with label $\psi$ from a vertex $(S_1, \beta^1_{in}, \beta^1_{out})$ to a vertex $(S_2, \beta^2_{in}, \beta^2_{out})$ is created if
and only if $\psi \in \Psi' \cap R(\beta^1_{in}) \cap R(\beta^2_{out})$. That is: if an input value of the first invocation
is a variable reference produced by the second invocation as an output value. An
edge between two invocations signifies a dependency of the first on the second. In a
complete composition, dependencies are satisfied by values or other invocation outputs:
$\forall I_S(\beta_{in}, \beta_{out}) \in \mathcal{I}' : \forall \psi \in R(\beta_{in}) \cap \Psi' : \exists I_{S'}(\beta'_{in}, \beta'_{out}) \in \mathcal{I}' : \psi \in R(\beta'_{out})$.
This means that there exists at least one invocation execution order in which all parameter
values are known at the start of each execution. A composition is partial if it does not
satisfy this requirement.
Example 1 Consider the following complete composition C0 of calculus service
invocations, which computes the value of the calculation $(1 + 2)^{(1+2)\cdot(-1+3)}$.
One possible execution of all invocations of C0 is:
1. Ia: execute Add, using 1 for termA and 2 for termB, storing the value of sum (=3)
in the variable ?s;
2. Ib: execute Add, using -1 for termA and 3 for termB, storing the value of
sum (=2) in the variable ?t;
3. Ic: execute Multiply, using ?s for factorA and ?t for factorB, storing the value
of product (=6) in the variable ?p;
4. Id: execute Exp, using ?s for base and ?p for exp, storing the value of result (=729)
in the variable ?r.
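The execution of Example 1 can be mimicked in a few lines of Python. This is a sketch under assumptions, not the platform's implementation: the dictionary encoding of invocations, the `SERVICES` table, and the helper names are hypothetical, and ordinary arithmetic functions stand in for the calculus services. The execution order is derived purely from the data dependencies between variable references, in line with Definition 4.

```python
from graphlib import TopologicalSorter
import operator

# Hypothetical stand-ins for the calculus services of Example 1.
SERVICES = {"Add": operator.add, "Multiply": operator.mul, "Exp": operator.pow}

# Each invocation: (service name, input values or variables, output variable).
# Strings starting with "?" are variable references; other entries are values.
C0 = {
    "Ia": ("Add", (1, 2), "?s"),
    "Ib": ("Add", (-1, 3), "?t"),
    "Ic": ("Multiply", ("?s", "?t"), "?p"),
    "Id": ("Exp", ("?s", "?p"), "?r"),
}


def is_variable(x):
    return isinstance(x, str) and x.startswith("?")


def execute(composition):
    """Execute all invocations in an order that respects dependencies."""
    producer = {out: name for name, (_, _, out) in composition.items()}
    # An invocation depends on the invocations that produce its input variables.
    deps = {name: {producer[a] for a in args if is_variable(a)}
            for name, (_, args, _) in composition.items()}
    binding = {}
    for name in TopologicalSorter(deps).static_order():
        service, args, out = composition[name]
        values = [binding[a] if is_variable(a) else a for a in args]
        binding[out] = SERVICES[service](*values)
    return binding


binding = execute(C0)
```

After the call, `binding` holds the values 3, 2, 6, and 729 for `?s`, `?t`, `?p`, and `?r`, matching the four steps listed above.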
4.2 Service matching
The first obstacle in composition creation is to determine whether two services
match. A start service $S_\sigma$ matches an end service $S_\epsilon$ if an invocation $I_{S_\sigma}(\beta^\sigma_{in}, \beta^\sigma_{out})$ of
$S_\sigma$ exists that enables an invocation $I_{S_\epsilon}(\beta^\epsilon_{in}, \beta^\epsilon_{out})$ of $S_\epsilon$. The first invocation implies
fulfillment of both the input conditions (necessary to allow the invocation) and the
output conditions (as a result of the invocation) of $S_\sigma$. This signifies that a match is
guaranteed when the union of the start service’s input and output conditions implies
the end service’s input conditions.
Listing 3 shows the description of a service that recognizes a face in an image
region. It shows the input conditions, consisting of the input type declarations (1)
and the preconditions (4). Similarly, the output conditions consist of the output
type declarations (2, 3) and the postconditions (5). The additional conditions (4, 5) are
expressed in the Notation3 format (N3, [2]). Input and output parameters are referred
to by variables in these expressions. Here, the preconditions state that Region
should depict the face of a person; the postconditions state that the Person’s Face
is depicted in Region. These complex expressions prevent service matchers and
composers from focusing only on data type matching. For example, there is no point in passing
a region of a chart to the recognition algorithm. Therefore, semantic matching is
required.
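The matching criterion of this section can be sketched in Python as follows. The sketch makes a strong simplifying assumption: conditions are plain sets of triple patterns and "implies" is approximated by set inclusion, whereas the platform relies on a reasoner. The condition sets and the `ex:Chart` class are hypothetical illustrations.

```python
# Conditions are modelled as sets of (subject, predicate, object) patterns.
# NOTE: treating implication as set inclusion is an assumption made for
# illustration only; real semantic matching requires a reasoner.

def matches(start_inputs, start_outputs, end_inputs):
    """A match is guaranteed when the union of the start service's input and
    output conditions contains every input condition of the end service."""
    return end_inputs <= (start_inputs | start_outputs)


detection_in = {("?image", "a", "foaf:Image")}
detection_out = {("?region", "a", "imreg:Region"),
                 ("?region", "imreg:regionDepicts", "face:Face")}
recognition_in = {("?region", "a", "imreg:Region"),
                  ("?region", "imreg:regionDepicts", "face:Face")}
# A hypothetical service whose output regions depict charts, not faces.
chart_out = {("?region", "a", "imreg:Region"),
             ("?region", "imreg:regionDepicts", "ex:Chart")}

faces_match = matches(detection_in, detection_out, recognition_in)
charts_match = matches(detection_in, chart_out, recognition_in)
```

Both candidate start services produce a value of type `imreg:Region`, so pure data type matching would accept either; only the face detector also satisfies the precondition on what the region depicts.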
4.3 Inadequacy of point-to-point matching
Service composition comprises more than simple point-to-point matching. Consider
the following services:
1. face detection service (input: an image, output: the list of detected image regions
that contain a face);
2. face recognition service (input: a region that contains a face, output: the depicted
person’s name);
Upon seeing these, we humans know that, in order to annotate people in an image,
we need to (1) detect face regions in the image and (2) recognize the faces in each
region. That is because we realize that the person names returned by service 2 are
connected to the image of service 1, even though service 2 is completely unaware
of the existence of such an image. We intuitively construct a holistic vision on the
problem by combining effects of different services on a concrete problem instance.
Composers that function by matching services point-to-point are unable to tran-
scend the individual service capabilities and, as a consequence, cannot create similar
complex compositions. Although they understand the complete functionality of
the above services and are even able to match both services, they cannot deduce
that this composition recognizes faces in an image. This occurs because they do
not “remember” the semantics across different junctions, interpreting the result of
service 2 as a person in some region, not the person in that region of the image. This
example illustrates that we require a composer with a holistic vision on the problem,
understanding that the combination of services embodies more than a simple sum of
their individual capabilities.
4.4 Reasoner-based composing process
To create a holistic composition, we employ a goal-driven reasoner on the problem
as a whole instead of solely on the junctions. The composing process consists of
three steps:
1. each service is translated into an N3 Logic rule [3] that simulates its functionality;
2. a reasoner determines whether the request can be deduced from the input;
3. the compositions are reconstructed from the rules used for deduction.
Obviously, the first step needs to be executed only once. The number of deductions
found in step 2 indicates how many possible compositions exist. The individual
steps are discussed below.
4.4.1 Translation into rules
Based on the OWL-S description, an N3 Logic rule is created, simulating the
execution of an actual service (Algorithm 1). Instead of producing actual content,
the rule creates placeholders. The conversion process translates input conditions
into antecedents and output conditions into consequents. Input parameters become
unbound variables; output parameters become placeholder variables that will be
instantiated with a dummy value upon execution of the rule.
We complemented the rule with tracking information necessary to reconstruct
the composition later on, including the service name and the parameters it was
invoked with. This was achieved by adding to the consequent of the rule a boundBy
statement, with the output mapping as subject and the service name and input
mapping as object. The parameter assignments of the input and output mapping are
formatted as a list of mappedTo statements. The automatic translation of the face
recognition service is displayed in Listing 4.
Note that some reasoners, such as Eye [9], have an option to display a proof of
the deduced knowledge, eliminating the need for tracking. However, such a proof
contains a lot of unnecessary details and is more difficult to interpret than our custom
tracking statements.
Algorithm 1 Translation of an OWL-S description into an N3 Logic rule
Require: Service(name, inputParameters, outputParameters, preconditions, postconditions)
antecedent ← preconditions
consequent ← postconditions
for all param ∈ inputParameters do
    antecedent ← antecedent ∪ “variable(param.name) rdf:type param.type.”
end for
for all param ∈ outputParameters do
    consequent ← consequent ∪ “variable(param.name) rdf:type param.type.”
end for
consequent ← consequent ∪ “mappings(outputParameters) c:boundBy [a c:Invocation;
    c:ofService name; c:withInput mappings(inputParameters)].”
return “{ antecedent } => { consequent }.”
@prefix : <http://example.org/facerecognition#>.
@prefix c: <http://example.org/composer#>.
@prefix imreg: <http://www.w3.org/2004/02/image-regions#>.
@prefix face: <http://example.org/ontologies/Face.owl#Face>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
{
  ?region a imreg:Region;
          imreg:regionDepicts [a face:Face].
}
=>
{
  ?person a foaf:Person.
  ?face a face:Face;
        face:isFaceOf ?person.
  ?region imreg:regionDepicts ?face.
  ({:Face c:mappedTo ?face.}
   {:Person c:mappedTo ?person.}) c:boundBy
    [a c:Invocation;
     c:ofService :FaceRecognitionService;
     c:withInput ({:Person c:mappedTo ?person.})].
}.
Listing 4 Automatic N3 Logic rule translation of the face recognition service description of Listing 3
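Algorithm 1 can be approximated by the following Python sketch, which builds the rule as a string. The `Param` and `Service` classes, the helper names, and the example service description are assumptions made for illustration; the actual translation starts from an OWL-S description, which is not parsed here.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str   # parameter name, e.g. "Region"
    type: str   # parameter type, e.g. "imreg:Region"


@dataclass
class Service:
    name: str
    inputs: list
    outputs: list
    preconditions: list = field(default_factory=list)   # N3 statements
    postconditions: list = field(default_factory=list)  # N3 statements


def variable(param):
    """Variable that stands for a parameter inside the rule."""
    return "?" + param.name.lower()


def mappings(params):
    """List of mappedTo statements, one per parameter."""
    items = ["{:%s c:mappedTo %s.}" % (p.name, variable(p)) for p in params]
    return "(" + " ".join(items) + ")"


def to_n3_rule(service):
    """Follow the steps of Algorithm 1 and return the rule as a string."""
    antecedent = list(service.preconditions)
    consequent = list(service.postconditions)
    antecedent += ["%s rdf:type %s." % (variable(p), p.type) for p in service.inputs]
    consequent += ["%s rdf:type %s." % (variable(p), p.type) for p in service.outputs]
    consequent.append(
        "%s c:boundBy [a c:Invocation; c:ofService :%s; c:withInput %s]."
        % (mappings(service.outputs), service.name, mappings(service.inputs)))
    return "{ %s } => { %s }." % (" ".join(antecedent), " ".join(consequent))


# Hypothetical description of a face recognition service.
recognition = Service(
    name="FaceRecognitionService",
    inputs=[Param("Region", "imreg:Region")],
    outputs=[Param("Person", "foaf:Person")],
    preconditions=["?region imreg:regionDepicts [a face:Face]."],
    postconditions=["?region imreg:regionDepicts ?face.",
                    "?face face:isFaceOf ?person."],
)
rule = to_n3_rule(recognition)
```

The resulting string contains the type declarations in the antecedent and consequent and ends with the boundBy tracking statement.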
4.4.2 Reasoner deduction
Now that we have N3 Logic rules for all services, we need one more rule
representing the request. Again, information to track the binding is added, using a
hasBinding statement. Listing 5 shows the request rule representing the query of
the use case.
Listing 5 N3 Logic rule translation of the use case request
A backward-chaining reasoner is called with the service rules, request rule, and
possibly input statements reflecting the current state of the blackboard. We ask to
deduce all possible boundBy, hasBinding and mappedTo statements, which are
then stored for composition reconstruction. The reasoner will try to use the request
rule, as this is the only way to generate hasBinding statements. This requires the
fulfillment of the rule’s antecedents, each of which can be satisfied either directly
by the inputs or by a service rule. In the latter case, the fulfillment of the rule’s
antecedents is necessary, again by inputs or a rule. The output is built up recursively
using this principle.
It is important to notice that the reasoner’s knowledge is not limited to the
N3 rules deduced from the service descriptions. Indeed, application-specific ontolo-
gies and rules, and knowledge available on the Semantic Web, can also be part of the
reasoner’s knowledge, resulting in advanced capabilities of the rule-based composer.
Moreover, we should strive to create knowledge on the highest possible level of
abstraction, so that it can be reused across many problem domains.
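The recursive, goal-driven fulfillment of antecedents described in this section can be sketched in Python. The sketch is a drastic simplification under assumptions: statements are reduced to hypothetical atomic names, each conclusion has a single rule, and there is no variable unification or cycle handling, all of which an N3 reasoner does provide.

```python
# Each conclusion maps to the premises of the single rule that derives it.
# The atom names are hypothetical simplifications of the N3 statements.
RULES = {
    "regionDepictsFace": ["image"],         # face detection service rule
    "personOfFace": ["regionDepictsFace"],  # face recognition service rule
    "hasBinding": ["personOfFace"],         # request rule
}


def prove(goal, facts, used):
    """A goal holds if it is a known input fact, or if all premises of the
    rule concluding it can be proven recursively (backward chaining)."""
    if goal in facts:
        return True
    premises = RULES.get(goal)
    if premises is None:
        return False
    if all(prove(p, facts, used) for p in premises):
        used.append(goal)  # remember which rule fired, for reconstruction
        return True
    return False


used_rules = []
solvable = prove("hasBinding", {"image"}, used_rules)
unsolvable = prove("hasBinding", set(), [])
```

The list `used_rules` plays a role loosely comparable to the tracking statements: it records which rules contributed to the deduction.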
4.4.3 Composition reconstruction
We then transform the generated statements using a three-step process:
1. find all solution bindings, indicated by hasBinding statements;
2. find all variable mappings that are unresolved; they will lead to new invocations;
3. recursively repeat step 2 to generate the entire composition graph.
The algorithm produces the correct result for the following reasons:
– Each hasBinding statement corresponds to exactly one possible composition.
The only rule that creates such a statement is the request rule, which can solely
be triggered if the solution bindings were successful.
– Each boundBy statement uniquely identifies the invocation that executed the
binding, because those statements are created only by service rules, which can
solely be triggered if their input conditions are satisfied.
– availableInvocations will not become empty until the composition is finished:
because the composition exists, a path that respects dependencies must exist
as well.
5 Supervision
The supervisor is a component responsible for solving a problem using services and
an execution plan composer. Its tasks include:
1. selecting the appropriate execution plan;
2. executing this plan;
3. recovering from unanticipated output or errors;
4. displaying the solution process progress (optional);
5. formulating a response to the request.
Note that we will not consider displaying the progress in this paper. To formulate
a response to the request, we have two options. Only the requested output could
be returned: the request parameters are bound using the variable binding and
returned; all other obtained information is discarded. Alternatively, since often
certain intermediary results are of interest as well, the supervisor could also return
all the statements available on the blackboard in addition to the response output.
In the next subsections, we elaborate on three tasks of the supervisor: selection,
execution, and recovery.
5.1 Composition selection
The supervisor first asks the composer to search for complete compositions.
If none are found, partial compositions can also be considered. The compositions
can then be evaluated by criteria such as the following (where available in the service
descriptions), which should be balanced against each other. This balance is not
predefined but depends on the application domain and expected results.
Possible evaluation criteria:
– Cost: the expected cost associated with the execution of the services. This cost
is at least the sum of the individual execution costs, but it can increase in case
of failure. It should be expressed as a mixture of different quantities, such as
processor time and amount of money, as external services and employees can be
involved.
– Accuracy: some services have a higher success rate than others, usually at the
expense of a higher cost.
– Performance: faster compositions should be allocated to urgent tasks.
– Availability: some services are not always available, which can be due to server
outage or working schedules if the service task involves people.
– Completeness: if the request cannot be solved entirely or if the proposed solution
is too expensive, other solutions that only solve part of the problem can be
included.
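One simple way to balance such criteria is a weighted sum, sketched below in Python. This is only one possible scheme and not necessarily the one used by the platform; the candidate names, scores, and weights are made up, and all scores are assumed to be normalised to [0, 1] with higher values being better (so the cost score expresses cost-efficiency).

```python
# Hypothetical, normalised scores in [0, 1]; higher is better for every
# criterion. The weights encode the application-specific balance.

def score(composition, weights):
    """Weighted sum of the criteria scores of one composition."""
    return sum(weights[c] * composition[c] for c in weights)


def select(compositions, weights):
    """Return the name of the composition with the highest weighted score."""
    return max(compositions, key=lambda name: score(compositions[name], weights))


candidates = {
    "automatic": {"cost": 0.9, "accuracy": 0.6, "performance": 0.9},
    "manual": {"cost": 0.2, "accuracy": 0.95, "performance": 0.3},
}
archive_weights = {"cost": 0.5, "accuracy": 0.2, "performance": 0.3}
critical_weights = {"cost": 0.1, "accuracy": 0.8, "performance": 0.1}

for_archive = select(candidates, archive_weights)
for_critical = select(candidates, critical_weights)
```

With these made-up numbers, a cost-sensitive bulk task favors the automatic composition, while an accuracy-critical task favors the one involving manual work.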
5.2 Composition execution
When the supervisor receives a new request, it initializes the blackboard and adds
the input. The composer transforms the blackboard and the request into a number
of possible executions, the best of which is selected. We have to keep track of these
additional items:
– the current information kept on the blackboard;
– the variable binding, a mapping of variable identifiers and values;
– the current composition, current invocation and past invocations with results.
Since a composition consists of an invocation list, its execution, in its most basic form,
comes down to the execution of these invocations, as detailed in Algorithm 2.
Algorithm 2 Execution of an invocation list
Require: invocations, binding, blackboard
for all invocation ∈ invocations do
    service ← invocation.service
    inputs ← invocation.inputMapping, replacing variables using binding
    output ← service.execute(inputs)
    blackboard ← blackboard ∪ output.statements
    binding ← binding ∪ invocation.outputMapping, replacing variables using output
end for
return blackboard
The variable binding clearly plays a crucial part in the contiguity of the execution
and deserves some explanation. Its concept is similar to that of a single-assignment
store [27] in programming languages, meaning that once a variable is assigned to,
its value cannot change. The composition is in fact a declarative program whose
execution order is solely governed by data dependencies. This declarativeness follows
naturally from the fact that a composer constructs a plan that indicates how to solve
a certain problem. In contrast, the supervisor interprets the declarative program,
determining what steps should be performed. We can take advantage of this high
level of freedom to exploit parallel or batch execution capabilities. The following
definition is analogous to Definition 1 of a parameter mapping.
Definition 5 A variable binding βυ is a function βυ: ?υ→ ?υ, which assigns
variable names to a simple value (∈ ? ∩ ?υ) or complex value (∈ ?υ). The set of
all bindings is Bυ. An element (n,v) of Bυ is called a variable assignment, assigning
v to n.
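Algorithm 2 and the single-assignment behavior of the variable binding can be sketched in Python as follows. The class and function names, the shape of a service result (a dictionary with `values` and `statements`), and the two stub services are assumptions made for illustration; real services are SPARQL endpoints, and `ex:person1` is a placeholder identifier.

```python
class SingleAssignmentStore(dict):
    """Variable binding: once a variable is bound, its value cannot change."""

    def __setitem__(self, name, value):
        if name in self:
            raise ValueError(f"variable {name} is already bound")
        super().__setitem__(name, value)


def resolve(value, binding):
    """Replace a variable reference such as "?r" by its bound value."""
    if isinstance(value, str) and value.startswith("?"):
        return binding[value]
    return value


def execute(invocations, binding, blackboard):
    """Run every invocation in list order, as in Algorithm 2."""
    for service, input_mapping, output_mapping in invocations:
        inputs = {p: resolve(v, binding) for p, v in input_mapping.items()}
        output = service(**inputs)
        blackboard |= output["statements"]
        for parameter, variable in output_mapping.items():
            binding[variable] = output["values"][parameter]
    return blackboard


# Stub services returning fixed results (hypothetical).
def detect_faces(image):
    region = image + "#xywh=5,7,42,43"
    return {"values": {"region": region},
            "statements": {(region, "imreg:regionDepicts", "_:face1")}}


def recognize_face(region):
    image = region.split("#")[0]
    return {"values": {"person": "ex:person1"},
            "statements": {(image, "foaf:depicts", "ex:person1")}}


plan = [
    (detect_faces, {"image": "Loft.jpg"}, {"region": "?r"}),
    (recognize_face, {"region": "?r"}, {"person": "?p"}),
]
binding = SingleAssignmentStore()
blackboard = execute(plan, binding, set())
```

Because the store rejects a second assignment to the same variable, a faulty plan that produces one variable twice fails immediately instead of silently overwriting an earlier result.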
5.3 Failure recovery
The composer optimistically assumes correct and successful behavior of all involved
services. If we were to withdraw this assumption, the construction of viable composi-
tions would be virtually impossible since every service can be subject to failure. The
supervisor therefore handles error recovery, a process consisting of:
1. failure detection: catching runtime errors and incomplete service output;
2. impact determination: defining the consequences of the failure;
3. plan adaptation: changing the plan to reach (possibly adjusted) goals in a
different way.
We now examine these different steps thoroughly.
5.3.1 Failure detection
We distinguish two kinds of failures: errors during service execution and normal
execution with incomplete output. Since the surrounding programming environment
usually detects errors by an exception mechanism, we assume that this task is trivial.
To detect incomplete or empty output, we make use of the invocation’s output
mapping. If certain parameters of the output mapping do not appear in the output,
or if the postconditions specified in the service’s description are not met, the output
is incomplete and we should initiate the failure recovery process.
For example, the face recognition algorithm could block because of server down-
time (error) or could fail to recognize the face (incomplete).
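The two kinds of failure can be told apart as in the following Python sketch. It is an illustration under assumptions: services are modelled as plain functions returning a dictionary of output parameters, the three stub services are hypothetical, and the check of postconditions mentioned above is omitted.

```python
def detect_failure(service, inputs, output_mapping):
    """Classify one invocation as 'ok', 'error' or 'incomplete'."""
    try:
        output = service(**inputs)
    except Exception:
        # Runtime error raised during service execution.
        return "error", None
    missing = [p for p in output_mapping if output.get(p) is None]
    if missing:
        # A parameter of the output mapping does not appear in the output.
        return "incomplete", missing
    return "ok", output


# Hypothetical stub services.
def unreachable(region):
    raise ConnectionError("server downtime")


def unsure(region):
    return {}  # the face could not be recognized


def works(region):
    return {"person": "ex:person1"}


mapping = {"person": "?p"}
inputs = {"region": "Loft.jpg#xywh=5,7,42,43"}
results = [detect_failure(s, inputs, mapping)[0] for s in (unreachable, unsure, works)]
```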
5.3.2 Impact determination
Once the point of failure is identified, we can determine the failure impact by search-
ing for the invocations that—directly or indirectly—depend on its output. At least
one invocation will be affected, since only outputs necessary for future invocations
are mapped. The failure repeatedly propagates through these invocations, eventually
reaching one or several of the solution generating invocations. The affected part of
the composition consists of all these invocations, starting at the point of failure. The
affected part in Fig. 2 spans the failed invocation and the two rightmost invocations.
The relative size of the affected part indicates whether the composition should
be adapted locally or recreated as a whole. We designate a resumption point where
normal execution is continued. Figure 2 shows the same failure with different
resumption points. In Fig. 2a, this is the second invocation from the right; in Fig. 2b,
it is the rightmost invocation. The selection of the resumption point is influenced by
the availability of an alternative plan and the history of attempted invocations. For
example, if face detection fails, then face recognition is also affected.
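Determining the affected part amounts to a graph traversal, as in the Python sketch below. The invocation names and the `dependents` dictionary (which maps an invocation to the invocations that consume its output) are hypothetical; the traversal is a standard breadth-first search.

```python
from collections import deque


def affected_part(dependents, failed):
    """Return the failed invocation plus everything that depends on it,
    directly or indirectly (breadth-first traversal)."""
    affected, queue = {failed}, deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


# Hypothetical composition: load -> detect -> recognize -> annotate.
dependents = {
    "load": ["detect"],
    "detect": ["recognize"],
    "recognize": ["annotate"],
}
after_detection_failure = affected_part(dependents, "detect")
```

Comparing the size of the returned set with the size of the whole composition gives the relative size of the affected part.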
5.3.3 Plan adaptation
To recover from failure, the supervisor asks the composer to generate compositions
for the affected part of the plan. New compositions start with the current state of the
blackboard and end in the resumption point.
Fig. 2 Different plan adaptation strategies: (a) plan adaptation with local recovery; (b) plan adaptation with global recovery. Dotted lines indicate the affected part
Prior to the generation, the supervisor deduces as many additional facts as
possible from the blackboard using application-specific knowledge and/or knowledge
available in the Semantic Web. The amount of available information is generally
larger than that at the time of the initial composition, since the partial execution
may have yielded intermediary results. As a result, new compositions that make use
of this increased knowledge are possible. This practice can be seen as a forward-
chaining reasoning approach that, together with the backward-chaining approach
used for composition, constitutes a hybrid mechanism. This brings the advantages
of forward-chaining to the execution of compositions that were created in a goal-driven
way.
We only consider compositions without previously executed invocations—failed
or successful—to avoid infinite failure recovery loops and the overhead of duplicate
invocations, whose results are already known.
Figure 2 shows two different strategies to handle the same failure. Figure 2a
depicts an adaptation which tries to correct the failure locally, substituting one
service invocation with an alternative plan. Figure 2b uses an entirely new compo-
sition instead of the old one, restarting at the inputs while retaining the blackboard
contents. Of course, for more complex compositions, various intermediate degrees
exist. The supervisor decides what strategy to use, based on the problem parameters,
the last failure point, and the possible history of failed recovery attempts.
For example, in case face recognition fails, we could try another algorithm (local)
or request human assistance for the entire task (global).
6 Use case
The framework developed so far is a general-purpose semantic problem-solver. The
employed knowledge and available services determine the problem domain in which
a framework instance operates. This section discusses a metadata generation use
case, illustrating the added value of Semantic Web technologies in metadata problem
solving.
We return to the image annotation use case introduced in Section 1.3. We start by
plugging in services relevant to the problem domain. Therefore, we transformed two
algorithms into SPARQL endpoints and described them using OWL-S.
These algorithms were:
– an implementation of the Viola-Jones face detection algorithm [35], which finds
regions in an image that contain a human face;
– an implementation of the face recognition algorithm by Verstockt et al. [34],
which recognizes a face in a well-delineated region, using a training set.
We add links to relevant ontologies and rulesets describing common facts about
images, people and faces. These include both simple and complex facts relating
different concepts, such as:
– a person has exactly one face;
– a region belongs to exactly one image;
– regions can depict faces;
– the depiction of a face of a person implies the depiction of that person;
– ...
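The effect of such a fact can be illustrated with a small Python sketch that applies one rule to a set of triples. It is a simplification under assumptions: the platform expresses this knowledge in ontologies and N3 rules handled by a reasoner, the property `ex:regionOf` and the identifiers `_:face1` and `ex:person1` are hypothetical, and only `imreg:regionDepicts` and `face:isFaceOf` are taken from Listing 4.

```python
def infer_depictions(triples):
    """Apply one rule once: if a region of an image depicts a face, and that
    face is the face of a person, then the image depicts that person."""
    facts = set(triples)
    derived = {
        (image, "foaf:depicts", person)
        for (region, p1, face) in facts if p1 == "imreg:regionDepicts"
        for (f, p2, person) in facts if p2 == "face:isFaceOf" and f == face
        for (r, p3, image) in facts if p3 == "ex:regionOf" and r == region
    }
    return facts | derived


region = "Loft.jpg#xywh=5,7,42,43"
blackboard = {
    (region, "ex:regionOf", "Loft.jpg"),         # hypothetical property
    (region, "imreg:regionDepicts", "_:face1"),
    ("_:face1", "face:isFaceOf", "ex:person1"),  # placeholder person
}
enriched = infer_depictions(blackboard)
```

The derived `foaf:depicts` triple is the kind of statement requested by the query of Listing 1.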