An Innovative Approach for Indexing and Searching Digital Rights

Walter Allasia, Francesco Gallo
EURIX
Torino, Italy
{allasia,gallo}@eurixgroup.com
Filippo Chiariglione
CEDEO.net
Torino, Italy
filippo@cedeo.net
Fabrizio Falchi
ISTI-CNR
Pisa, Italy
fabrizio.falchi@isti.cnr.it
Abstract
Our aim is to manage the metadata related to digital rights in
centralized systems or networks with indexing capabilities for both
text and similarity searches, providing the basic infrastructure that
enables private use as well as commercial exploitation. We present an
innovative approach that treats DRM metadata as metric objects,
enabling similarity search on IPR attributes between digital items.
Moreover, we show how content-based similarity search can help users
deal with a huge number of similar items with different licenses, and
help content providers detect fake copies or illegal uses.
1. Introduction
In recent years a multitude of devices able to consume and produce
good-quality digital content has become accessible to a large part of
the population, and the number of non-professional individuals using
them is growing exponentially. People from all over the world are
creating images and audio/video files, and most of the time they are
happy to share them with others by means of electronic mail, Web
sites, chats, multimedia messaging services and several distributed
systems.
Our cultural heritage is no longer made up only of videos, images and
text documents provided by “institutional” public or private bodies,
but also of content produced by every connected device.
However, people are starting to recognize the importance of
associating Intellectual Property Rights (IPR) information with the
content they create before releasing it to the public. Reasons
include, to mention just a few, obtaining correct attribution of
ownership and allowing consumption of the content only under certain
conditions (for example, limiting its usage to a restricted set of
users).
In order to be able to guarantee the preservation and ac-
cess of these digital items, we have to take into account the
Digital Rights Management (DRM) information associated
with them during the creation phase and especially during
the search.
Several approaches have been proposed so far for managing digital
rights, and many standards are available for representing them.
However, both open and trusted systems usually provide only a simple
attribute search on a single specific type of license.
We are proposing here a different approach for indexing
and searching the information related to the licenses of the
digital items, towards a more flexible and interoperable en-
vironment.
2. Background
Many network infrastructures are emerging that provide the basis for
Web sharing and searching functionalities on digital items. Most of
them are peer-to-peer networks, such as eMule [1] or BitTorrent [2]
for images and audio/video files and Joost [3] for video streaming.
Furthermore, several multimedia platforms enable automatic audio/video
processing for cataloging and indexing digital items [4]; in
combination with the network infrastructure, they will provide
powerful solutions for digital content management.
Before introducing the novel approach for dealing with
DRM at query time, a number of related concepts are dis-
cussed in the following sections.
2.1. Digital Rights Management
The digital items residing in most of the network infrastructures
mentioned above are currently not associated with IPR information.
Yet service providers and users want to be aware of the license
attached to the searched content and to know which actions are
allowed on the downloaded digital items.
Many standardization initiatives are trying to provide the basis for a
DRM infrastructure, such as MPEG [5], OMA [6], DVB [7], DMP [8],
Coral [9] and TV-Anytime [10], alongside many proprietary solutions.
The DRM landscape is still very fragmented, and it is hard to predict
today if, when and how a standard, widely adopted DRM solution will
prevail over the others, particularly in an environment as
heterogeneous as the Web.
Fortunately, the way licenses are expressed is converging. Even if,
from the point of view of copyright law, there are still some
differences between a license and the actions that a DRM system is
able to provide according to that license, some standards for
expressing licenses are emerging and are commonly used. We must
nevertheless point out the important difference between the license,
which is a statement concerning rights to use content, and the
enforcement that ensures the terms of the license are indeed followed
by the user. This article deals with the definition of the license
and, in particular, with the ability to search for a digital item by
its license. We do not provide guidelines or software for enforcing
compliance with the license, as a DRM system has to guarantee.
Instead, we provide an innovative approach for managing the license
attributes so that, given a digital object and its related license as
a query, one can search for similar (as well as dissimilar) digital
objects.
Up to now, a few solutions for expressing digital rights have been
standardized, such as MPEG-21 Rights Expression Language (REL) [11],
Open Digital Rights Language (ODRL) [12], TV-Anytime RMPI [13], Adobe
Content Manager [14], CreativeCommons (CC) [15], and Publishing
Requirements for Industry Standard Metadata (PRISM) [16].
In this paper we will concentrate on the digital items
(mainly audio/video files) produced by individuals and pro-
fessionals and made available on the Web with associated
rights information. Hence we are focusing our attention
here on the language for expressing their licenses.
In order to be able to calculate a metric distance between licenses,
a common Rights Expression Language (REL) has to be chosen. In our
analysis we adopted MPEG-21 REL for the following reasons:
1. it is an ISO Standard;
2. it is widely considered the most flexible and powerful tool to
digitally express rights;
3. several profiles of the REL standard have already been defined
within MPEG: the Mobile And optical Media (MAM) [17], the
Dissemination And Capture (DAC) [18] and the Open Release Content
(ORC) [19] profiles;
4. it is being adopted as the default language for the expression of
rights in a number of standard specifications dealing with DRM
issues, such as most Multimedia Application Formats (MAF) [20]
specifications in MPEG and the DMP Interoperable DRM Platform, as
well as by several European IST Projects such as AXMEDIS [21].
The three REL profiles mentioned above were conceived
for allowing the expression of MPEG-21 REL licenses able
to represent equivalent rights expressions such as OMA
ODRL [22], TV-Anytime RMPI and CC licenses. Therefore
it is possible to convert ODRL, RMPI and CC licenses into
equivalent MPEG-21 REL licenses when governed content
is ingested in the network considered in this paper. The
similarity search algorithm can then be applied on content
belonging to different application domains.
The full MPEG-21 REL standard has more functionali-
ties than those supported by the three considered profiles:
MAM, DAC and ORC. However, as a first step, we will
focus on a subset of the language. The MPEG-21 REL
schemas, although not having a normative value, are split
into six different namespaces: Core, Standard Extension,
Multimedia Extension, Multimedia Extension 1, Multime-
dia Extension 2 and Multimedia Extension 3. The analysis
described in this paper will be based on licenses validating
against the six namespaces mentioned above but only con-
taining the elements and complex types defined in the three
profiles considered.
2.2. Similarity Search
The notion of similarity has been studied extensively in
the field of psychology and has an important role in cogni-
tive sciences. A similarity search can be seen as a process
of obtaining data objects in order of their distance or dis-
similarity from a given query object. It is a kind of sorting,
ordering, or ranking of objects with respect to the query ob-
ject, where the ranking criterion is the distance measure.
From a database perspective, similarity search is based on gradual
rather than exact relevance. A distance between objects is used to
quantify the proximity, similarity or dissimilarity of a query object
versus the objects stored in the database to be searched. Though this
principle works for any distance measure, we restrict the possible
set of measures to metric distances. Because of the mathematical
foundations of the metric space notion, partitioning and pruning
rules can be constructed for developing efficient index structures.
Therefore, in past years research has focused on metric spaces [23].
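To illustrate the kind of pruning rule that the metric properties make possible, the following minimal Python sketch performs a range search with a single pivot object. It is not the index used by any of the cited structures; the function name `range_search_with_pivot`, the single-pivot scheme and the toy numeric data are assumptions made only for this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def range_search_with_pivot(
    query: T,
    objects: Sequence[T],
    pivot: T,
    dist: Callable[[T, T], float],
    radius: float,
) -> Tuple[List[T], int]:
    """Return the objects within `radius` of `query`, together with the
    number of query-to-object distances that had to be computed."""
    # In a real index these distances are stored at indexing time.
    pivot_dists = [dist(obj, pivot) for obj in objects]
    d_qp = dist(query, pivot)
    hits: List[T] = []
    computed = 0
    for obj, d_op in zip(objects, pivot_dists):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|.
        if abs(d_qp - d_op) > radius:
            continue  # discarded without computing d(q, o)
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


def abs_diff(a: float, b: float) -> float:
    """Toy metric on numbers, standing in for a license distance."""
    return abs(a - b)


hits, computed = range_search_with_pivot(
    5.0, [1.0, 4.0, 6.0, 20.0], 0.0, abs_diff, 1.5
)
print(hits, computed)
```

In this toy run two of the four objects are discarded using only the stored pivot distances, which is the saving that index structures for metric spaces exploit on a larger scale.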
2.3. The metric space approach
Although many similarity search approaches have been
proposed, the most generic one considers the mathematical
metric space as a suitable abstraction of similarity [23]. The
simple but powerful concept of the metric space consists of
a domain of objects and a distance function that measures
the proximity of pairs of objects.
Let \(M = (D, d)\) be a metric space defined over a domain of objects
\(D\) and a total (distance) function \(d : D \times D \to \mathbb{R}\).
The following properties always hold in \(M\), \(\forall x, y, z \in D\):
\(d(x, y) \geq 0\) (non-negativity),
\(d(x, y) = 0\) iff \(x = y\) (identity),
\(d(x, y) = d(y, x)\) (symmetry),
\(d(x, z) \leq d(x, y) + d(y, z)\) (triangle inequality).
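The four properties can be checked empirically on a finite sample of objects, which is a convenient sanity test when designing a new distance. The sketch below is only such a test, not a proof; the function name `is_metric_on_sample` and the tolerance parameter are assumptions of this example.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def is_metric_on_sample(
    sample: Sequence[T],
    dist: Callable[[T, T], float],
    tol: float = 1e-12,
) -> bool:
    """Check the four metric properties on every pair and triple of `sample`."""
    for x, y in product(sample, repeat=2):
        d_xy = dist(x, y)
        if d_xy < 0:  # non-negativity
            return False
        if (d_xy == 0) != (x == y):  # identity
            return False
        if abs(d_xy - dist(y, x)) > tol:  # symmetry
            return False
    for x, y, z in product(sample, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True


absolute = is_metric_on_sample([0.0, 1.0, 2.5, 7.0], lambda a, b: abs(a - b))
squared = is_metric_on_sample([0.0, 1.0, 2.0], lambda a, b: (a - b) ** 2)
print(absolute, squared)
```

The squared difference fails because it violates the triangle inequality on the triple (0, 1, 2), which shows that not every intuitive dissimilarity is a metric.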
The metric space approach has proved to be very important for
building efficient indexes for similarity searching. A survey of
existing approaches for centralized structures (e.g. M-tree [24] and
D-Index [25]) can be found in [23] and [26].
Very recently, scalable and distributed index structures based on
Peer-to-Peer networks have also been proposed for similarity
searching in metric spaces, e.g. GHT* [27], VPT* [28], MCAN [29] and
M-Chord [30] (see [28] for a comparison of their performance).
Many research projects are currently investigating these fields, such
as SAPIR [31], a project funded by the European Research Area in the
6th Framework Program that aims to develop cutting-edge technology
that will break the barriers and enable search engines to look for
large-scale audio-visual information by content, using the
query-by-example paradigm. SAPIR intends to propose new solutions for
an innovative technological infrastructure for next-generation
Multimedia Search Engines. This research effort should lead towards a
distributed, P2P-based search engine architecture, as opposed to
today's parallel search engines within a centralized Web data
warehouse.
3. The proposed solution of metric distance for
licenses
It is worth underlining that, in order to handle the rights
associated with digital content, they should be defined during the
creation phase of the items. Hence the GUI that a sharing system
provides has to take into account the range of available licenses.
Flickr [32], for example, currently provides at least one type of
license definition (CreativeCommons). So far, the search engine for
the rights is nothing but an attribute search, looking for the same
value of a specific attribute of the expressed license.
We suggest that for enabling efficient and distributed
queries on digital rights the metadata expressing the license
can be considered as special features in a given space, in-
stead of simple attributes. Hence we can perform the simi-
larity search on digital rights by finding out the appropriate
distance function.
A typical REL license is composed of a number of grants and an
issuer, the latter being the entity granting a right to the
principal. Every grant, in turn, may contain three main types of
information: the principal, identifying whom a right is granted to;
the right, an action that a principal may perform on the associated
resource; and the resource, the object to which the right in the
grant applies. Figure 1 shows a simplified model of a REL license
where some components are omitted for clarity.
Figure 1. A simplified model of REL license.
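The simplified model can be sketched as a small data structure. The following Python classes are an illustration only and do not reproduce the MPEG-21 REL schema: the class names, the field names, the right and condition names and the helper `rights_for` are all hypothetical. A grant without a principal is read as granting the right to anyone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical value types for conditions: flag, number, term or set.
ConditionValue = Union[bool, float, str, frozenset]


@dataclass
class Grant:
    right: str                       # action allowed on the resource
    resource: str                    # identifier of the governed object
    principal: Optional[str] = None  # None: the right is granted to anyone
    conditions: Dict[str, ConditionValue] = field(default_factory=dict)


@dataclass
class License:
    issuer: str
    grants: List[Grant]

    def rights_for(self, principal: str) -> Dict[str, Dict[str, ConditionValue]]:
        """Map each right available to `principal` to its conditions."""
        return {
            g.right: g.conditions
            for g in self.grants
            if g.principal in (None, principal)
        }


lic = License(
    issuer="issuer-001",
    grants=[
        Grant(right="play", resource="item-42"),
        Grant(right="copy", resource="item-42", principal="alice",
              conditions={"fee": 2.0}),
    ],
)
print(lic.rights_for("alice"))
print(lic.rights_for("bob"))
```

Restricting a license to the rights of one principal in this way yields a per-right view of the conditions, which is a convenient input for comparing two licenses.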
The issuer, in the most general case, could be seen as
an element not adding significant value as a parameter in a
search. However, there are some cases in which the lack
of metadata describing the content item could make the in-
formation contained within the issuer element rather im-
portant. For instance, an issuer specialised in issuing li-
censes for a specific type of content could become a target
for searches for that specific type.
The principal, as well as the right, is certainly fundamental for any
similarity search. If the target of a search is content having an
associated license that specifies a principal \(p_1\) and a right
\(r_1\), a result not matching \((p_1, r_1)\) exactly would certainly
score badly in terms of distance from the search target.
It has to be noted that in ORC licenses (those express-
ing in REL the CreativeCommons licenses) the principal is
never specified, meaning that anyone is granted the rights
specified by the license. All ORC licenses would immedi-
ately satisfy the principal requirement in a search. Instead,
for all the cases in which the principal is a well defined en-
tity (e.g. a device with a unique certificate, or a domain or
a unique user identifier contained in a Smart/SIM Card), if
this value doesn’t match the same specified value, the result
should certainly be interpreted as distant from the target. If
a principal is specified in a license, this has a high relevance.
In an open network it can be the only distinctive parameter
and in a local network (e.g. a home domain) the principal
can be used to identify one of a restricted number of users.
Moreover, a number of content items may be governed by a license
granting access only to users or devices belonging to a specific
domain. In this case, in order to obtain more meaningful results, it
is important that when a similarity search is performed the principal
parameter contains all the identities a user has, including, for
instance, any domain to which the user is subscribed. In some
circumstances, a user may have the opportunity to join a domain or
obtain a license for a content item at a later stage. Therefore, even
if the license of a content item does not grant a right specifically
to the principal being searched, the user may become a valid
principal in the future, and the result could still score well (i.e.
be assigned a small distance) even if the principal found is not the
same as the principal searched for.
The same applies to the right; if the right found is not
the same as the right searched for, the distance from the
optimal result is certainly high. However, if the user has
the chance to acquire the specific right he is looking for at
a later stage, the distance would be reduced. Those spec-
ifications supporting the super-distribution model, such as
ISO/IEC MPEG Media Streaming MAF standard [33], as
well as the DMP IDP, enable specifying in the content item
itself the location from where a license can be obtained,
therefore if this information is present, it must be consid-
ered as a key factor.
Finally, the conditions under which a principal may exercise a right
may play a decisive role, too. A right can be exercised only if all
conditions in a license are fully met. While some conditions are
deal-breaking (e.g. the geographical location in which a content item
can be played: either the user is in that area or he is not), others
may still be acceptable to the user even if they are not the
preferred choice.
We propose in Figure 2 an example of rights and con-
ditions for DAC and ORC profiles where the candidate at-
tributes to be compared are reported. Rights can be repre-
sented mainly by binary attributes, while conditions include
also numerical and textual attributes.
3.1. IPR-based distance functions
In this section we analyse the problem of defining a metric distance
(i.e. a dissimilarity measure) between two licenses. We focus on
comparing two licenses considering only the information related to
the principal expressed in the query. For simplicity we will not
consider the issuer in searching for similarity. However, it would be
easy to modify the defined distance to also take the issuer into
account, assigning a greater distance value to licenses having
different issuers. Thus, the distance is evaluated considering the
rights and conditions that are associated with a given principal in
both licenses.
Figure 2. Example of rights and conditions for DAC and ORC profiles.
Let \(D\) be the domain of metadata related to the license of any
given object. For any object \(x \in D\) we use the following
notation:
\(r_1, r_2, \ldots, r_{n_r}\) are the \(n_r\) possible rights;
\(c_1, c_2, \ldots, c_{n_c}\) are the \(n_c\) possible conditions;
\(x_{i,1}, x_{i,2}, \ldots, x_{i,n_c}\) are the \(n_c\) condition
values for the \(i\)-th right.
We define the global distance \(d(x, y) \in [0, 1]\) as the weighted
sum of the distances between the rights, i.e.
\[ d(x, y) = \frac{1}{\sum_{j=1}^{n_r} w_j} \cdot \sum_{i=1}^{n_r} w_i \cdot d_{r_i}(x, y) \qquad (1) \]
where \(d_{r_i}(x, y) \in [0, 1]\) is the distance between the two
licenses considering the \(i\)-th right, and \(w_i\) are weights used
to give more or less importance to the various rights. Note that
dividing by \(\sum_{j=1}^{n_r} w_j\) normalizes the distance between 0
and 1.
We define the distance \(d_{r_i}(x, y) \in [0, 1]\) between two
licenses \(x\) and \(y\) considering only the right \(r_i\) as:
\(d_{r_i}(x, y) = 0\), if \(r_i\) is absent from both licenses;
\(d_{r_i}(x, y) = 1\), if \(r_i\) is present in only one license;
\(d_{r_i}(x, y) = \frac{1}{\sum_{j=1}^{n_c} w_{i,j}} \cdot \sum_{j=1}^{n_c} w_{i,j} \cdot d_{c_j}(x_{i,j}, y_{i,j})\),
otherwise (i.e. if the right is present in both licenses).
\(d_{c_j} \in [0, 1]\) is the distance between the \(j\)-th conditions
for the right \(i\) in the two licenses, while \(w_{i,j}\) are weights
used to give more or less importance to the various conditions. Note
that dividing by \(\sum_{j=1}^{n_c} w_{i,j}\) normalizes the distance
between 0 and 1.
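The two levels of weighted sums can be sketched in Python as follows. This is a minimal illustration under several assumptions: a license is represented as a hypothetical mapping from right names to mappings of condition values, the right and condition names are invented, and the condition distance is passed in as a function (here a simple binary comparison that treats an unspecified condition as 0).

```python
from typing import Callable, Mapping, Optional, Sequence

CondDist = Callable[[str, object, object], float]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    total = sum(weights)
    if total == 0:
        return 0.0  # assumption: nothing to compare means no dissimilarity
    return sum(w * v for w, v in zip(weights, values)) / total


def right_distance(
    x_conds: Optional[Mapping[str, object]],
    y_conds: Optional[Mapping[str, object]],
    cond_weights: Mapping[str, float],
    cond_dist: CondDist,
) -> float:
    """Distance for one right; None means the right is absent."""
    if x_conds is None and y_conds is None:
        return 0.0  # right absent from both licenses
    if x_conds is None or y_conds is None:
        return 1.0  # right present in only one license
    names = list(cond_weights)
    values = [cond_dist(n, x_conds.get(n), y_conds.get(n)) for n in names]
    return weighted_mean(values, [cond_weights[n] for n in names])


def license_distance(x, y, right_weights, cond_weights, cond_dist) -> float:
    """Global distance: normalized weighted sum over all possible rights."""
    rights = list(right_weights)
    values = [
        right_distance(x.get(r), y.get(r), cond_weights[r], cond_dist)
        for r in rights
    ]
    return weighted_mean(values, [right_weights[r] for r in rights])


def binary_cond(name: str, a: object, b: object) -> float:
    # Unspecified (None) is treated as 0, i.e. "not required".
    return float(abs(int(bool(a)) - int(bool(b))))


right_weights = {"play": 2.0, "copy": 1.0}
cond_weights = {
    "play": {"fee_required": 1.0, "attribution": 1.0},
    "copy": {"attribution": 1.0},
}
x = {"play": {"fee_required": 1}}
y = {"play": {"fee_required": 0}, "copy": {}}
d_xy = license_distance(x, y, right_weights, cond_weights, binary_cond)
print(d_xy)
```

In this example the "play" right contributes 0.5 (one of its two equally weighted conditions differs) and the "copy" right contributes 1 (it is present only in `y`), so the global distance is (2 · 0.5 + 1 · 1) / 3 = 2/3.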
Once a general definition of a distance function is given, the
distance \(d_{c_j}(x_{i,j}, y_{i,j})\) between two values \(x_{i,j}\)
and \(y_{i,j}\) of the \(j\)-th condition must be defined considering
the specific attribute type. However, we must deal with the fact that
the given condition might not be associated with the given right in
one or both of the licenses. Thus, while the distance between two
conditions of the same type must be defined considering the type of
the condition (see next sections), all the condition distances must
follow two rules:
\(d_{c_j}(x_{i,j}, y_{i,j}) = 0\), if \(c_j\) for the right \(r_i\) is
specified in neither license \(x\) nor license \(y\);
\(d_{c_j}(x_{i,j}, y_{i,j}) = d_{c_j}(x_{i,j}, u_{c_j})\), if \(c_j\)
for the right \(r_i\) is specified in only one (\(x\)) of the two
licenses. \(u_{c_j}\) is the most unrestrictive value for condition
\(c_j\): if the condition is not present, it means that it is not
necessary.
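The two rules can be factored out into a small wrapper around any type-specific distance. In the sketch below, an unspecified condition is represented by `None`; this representation and the names `condition_distance`, `base_dist` and `unrestrictive` are assumptions of the example.

```python
from typing import Callable, Optional, TypeVar

V = TypeVar("V")


def condition_distance(
    x_val: Optional[V],
    y_val: Optional[V],
    base_dist: Callable[[V, V], float],
    unrestrictive: V,
) -> float:
    """Apply the two rules for unspecified conditions, then the
    type-specific distance `base_dist`."""
    if x_val is None and y_val is None:
        return 0.0  # rule 1: condition specified in neither license
    # Rule 2: a missing value is replaced by the most unrestrictive one.
    x = unrestrictive if x_val is None else x_val
    y = unrestrictive if y_val is None else y_val
    return base_dist(x, y)


def binary(a: int, b: int) -> float:
    return float(abs(a - b))


print(condition_distance(1, None, binary, 0))  # condition required only in x
print(condition_distance(0, None, binary, 0))  # explicit "not required" vs missing
```

With this wrapper, a license that explicitly states the most unrestrictive value is at distance 0 from a license that omits the condition, which matches the reading that an absent condition is not necessary.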
We now give some examples of distances for specific condition types.
3.1.1. Binary conditions
In many cases conditions are expressed by binary values, describing
whether or not a given condition is necessary for the given right. In
this case (i.e. \(x_{i,j}, y_{i,j} \in \{0, 1\}\)) we can use the
\(L_1\) norm, which takes binary values as well. Hence the distance is
directly computed as:
\[ d_{c_j}(x_{i,j}, y_{i,j}) = |x_{i,j} - y_{i,j}| = x_{i,j} \oplus y_{i,j}. \qquad (2) \]
Please note that a specific weight can be given to this distance by
setting \(w_{i,j}\). Obviously, for binary values \(u_{c_j} = 0\).
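For binary values the absolute difference coincides with the exclusive OR, which a few lines of Python can confirm exhaustively. The function name is an assumption of this example.

```python
def binary_condition_distance(x: int, y: int) -> int:
    """Distance between two binary condition values, computed as XOR."""
    if x not in (0, 1) or y not in (0, 1):
        raise ValueError("binary conditions must be 0 or 1")
    return x ^ y


# The XOR agrees with |x - y| on all four possible pairs.
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
agree = all(binary_condition_distance(a, b) == abs(a - b) for a, b in pairs)
print(agree)
```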
3.1.2. Numeric conditions
For conditions expressed with a number (i.e. \(x_{i,j}, y_{i,j} \in
\mathbb{R}\)), which for example can represent a fee for accessing the
digital content, it is possible to apply the \(L_1\) norm or the
Euclidean distance. However, more sophisticated metric distances
could be used for specific numerical attributes. For instance, we
suggest defining the distance between fees as
\(|\log(x_{i,j}) - \log(y_{i,j})|\), multiplied by a normalization
factor \(\alpha\):
\[ d_{c_j}(x_{i,j}, y_{i,j}) = \alpha \left| \log \frac{x_{i,j}}{y_{i,j}} \right|. \qquad (3) \]
According to Equation 3, the distance between $5 and $10 is the same
as the distance between two fees of $50 and $100 respectively. We
believe that, given a fee as query, the user is interested in the
ratio between the query fee and any given fee. Unfortunately, in this
case any non-zero fee would be at infinite distance from $0 objects.
Incidentally, in the case of fees this feature may reflect a common
attitude, because users that are looking for free digital content are
not interested in non-free items. For instance, they are not willing
(or able) to purchase items on the Web, or they are browsing for
material just because it is free.
Anyway, to avoid this problem we suggest that whenever the fee value
is smaller than a given threshold \(x_{min}\), the value used for
evaluating the distance is automatically set to \(x_{min}\) (this
value is also assigned to \(u_{c_j}\)). In this way, if for example we
assign to \(x_{min}\) the value $0.01, the distance between $0 and $1
becomes the same as the distance between $1 and $100, which is a
reasonable assumption. It is worth noting that the distance is still
a metric.
To keep the distance in the range [0, 1] we need not only a minimum
but also a maximum value \(x_{max}\) for the fees. The value
\(x_{max}\) will be assigned to any fee above the upper limit
\(x_{max}\). Therefore, only for the purpose of evaluating a distance
between two fees, we limit the fee value to a given interval
\([x_{min}, x_{max}]\). The \(\alpha\) factor is then given by
\(1 / |\log(x_{min}) - \log(x_{max})|\), so that the largest possible
distance, between \(x_{min}\) and \(x_{max}\), equals 1.
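The clamped logarithmic fee distance can be sketched as follows. The lower bound of $0.01 comes from the example in the text, while the upper bound of $10,000 and the function name are assumptions of this sketch. The factor α is taken here as 1 / |log(x_min) − log(x_max)|, the value that makes the largest possible distance equal to 1.

```python
import math


def fee_distance(x: float, y: float,
                 x_min: float = 0.01, x_max: float = 10_000.0) -> float:
    """Normalized log-ratio distance between two fees, in [0, 1]."""
    # Clamp both fees to [x_min, x_max] before comparing them.
    cx = min(max(x, x_min), x_max)
    cy = min(max(y, x_min), x_max)
    alpha = 1.0 / abs(math.log(x_max) - math.log(x_min))
    return alpha * abs(math.log(cx / cy))


print(fee_distance(5, 10), fee_distance(50, 100))   # equal ratios, equal distances
print(fee_distance(0, 1), fee_distance(1, 100))     # $0 is treated as x_min
print(fee_distance(0, 1_000_000))                   # both ends clamped
```

Because both fees are clamped before the ratio is taken, a free item is compared as if it cost `x_min`, and any two fees at or beyond opposite ends of the interval are at the maximum distance.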
It should also be noted that there are other possible approaches for
measuring differences in prices, for example the gap-ratio computed
by the following equation:
\[ d_{c_j}(x_{i,j}, y_{i,j}) = \frac{|x_{i,j} - y_{i,j}|}{y_{i,j}}. \qquad (4) \]
However, the gap-ratio is not symmetric and thus it cannot be used
for the proposed metric approach.
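A two-line numeric check makes the asymmetry concrete. The sketch assumes the gap-ratio is the absolute difference divided by the second argument; the function name is hypothetical.

```python
def gap_ratio(x: float, y: float) -> float:
    """Gap-ratio of x relative to y (assumed form: |x - y| / y)."""
    return abs(x - y) / y


forward = gap_ratio(5.0, 10.0)   # difference relative to $10
backward = gap_ratio(10.0, 5.0)  # same difference relative to $5
print(forward, backward)
```

Swapping the two fees changes the result from 0.5 to 1.0, so the symmetry property required of a metric does not hold.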
3.1.3. Term based conditions
For conditions whose value is a term from a given vocabulary, we
propose a specific approach. If the \(j\)-th attribute of the
\(i\)-th group is a term taken from a specific vocabulary of \(m\)
terms, we can define the distance \(d_{i,j}(x_{i,j}, y_{i,j})\)
between the two values according to Table 1, where \(term_0\) is used
as \(u_{c_j}\) (i.e. whenever the given condition is not specified).
In this way it is possible to define a specific distance between any
given term and an unspecified condition.
It is assumed that the values of \(\alpha\) are manually chosen
according to the semantics of the given terms. For
\(d_{i,j}(x_{i,j}, y_{i,j})\) to be a metric the matrix must be
symmetric, thus \(\alpha^{i,j}_{a,b} = \alpha^{i,j}_{b,a}\), all the
diagonal values must be 0 and
\[ \forall l, \quad \alpha^{i,j}_{a,b} \leq \alpha^{i,j}_{a,l} + \alpha^{i,j}_{l,b}. \qquad (5) \]
A user interface should help the administrator set the \(\alpha\)
values according to these requirements. A trivial solution is to set
all \(\alpha^{i,j}_{a,b} = 1\) when \(a \neq b\), in which case
textual attributes are treated as binary attributes.
          term_0               term_1               term_2               ...   term_m
term_0    0                    \alpha^{i,j}_{1,0}   \alpha^{i,j}_{2,0}   ...   \alpha^{i,j}_{m,0}
term_1    \alpha^{i,j}_{0,1}   0                    \alpha^{i,j}_{2,1}   ...   \alpha^{i,j}_{m,1}
term_2    \alpha^{i,j}_{0,2}   \alpha^{i,j}_{1,2}   0                    ...   \alpha^{i,j}_{m,2}
...       ...                  ...                  ...                  ...   ...
term_m    \alpha^{i,j}_{0,m}   \alpha^{i,j}_{1,m}   \alpha^{i,j}_{2,m}   ...   0
Table 1. Distance values for attributes taken from terms in a given
dictionary
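A user interface that helps an administrator fill in such a table could validate the entered values with a check like the one sketched below. The function name and the example matrices are assumptions; the requirement that off-diagonal entries be strictly positive is added by this sketch so that distinct terms are never at distance 0.

```python
from itertools import product
from typing import Sequence


def is_valid_term_matrix(alpha: Sequence[Sequence[float]],
                         tol: float = 1e-12) -> bool:
    """Check that a square matrix of term distances defines a metric."""
    m = len(alpha)
    for a in range(m):
        if len(alpha[a]) != m or alpha[a][a] != 0:
            return False  # must be square with a zero diagonal
        for b in range(m):
            if a != b and alpha[a][b] <= 0:
                return False  # distinct terms need a positive distance
            if alpha[a][b] != alpha[b][a]:
                return False  # symmetry
    for a, b, l in product(range(m), repeat=3):
        if alpha[a][b] > alpha[a][l] + alpha[l][b] + tol:
            return False  # triangle inequality
    return True


# Trivial solution: every pair of distinct terms is at distance 1.
trivial = [[0.0, 1.0, 1.0],
           [1.0, 0.0, 1.0],
           [1.0, 1.0, 0.0]]
# Invalid: going through term 1 is shorter than the direct distance 0 -> 2.
broken = [[0.0, 0.2, 1.0],
          [0.2, 0.0, 0.3],
          [1.0, 0.3, 0.0]]
print(is_valid_term_matrix(trivial), is_valid_term_matrix(broken))
```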
3.1.4. Distance for sets
For conditions that have sets as values we suggest using the Jaccard
distance, i.e. one minus the Jaccard coefficient (which is a metric).
Given two sets \(x_{i,j}\) and \(y_{i,j}\), it is defined as follows:
\[ d(x_{i,j}, y_{i,j}) = 1 - \frac{|x_{i,j} \cap y_{i,j}|}{|x_{i,j} \cup y_{i,j}|}. \qquad (6) \]
For conditions based on sets we use the empty set as \(u_{c_j}\).
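A minimal Python sketch of this set distance follows. The function name and the example sets of country codes are assumptions, as is the convention that two empty sets are at distance 0, since Equation 6 is undefined when the union is empty.

```python
from typing import AbstractSet, Hashable


def jaccard_distance(x: AbstractSet[Hashable], y: AbstractSet[Hashable]) -> float:
    """One minus the ratio of shared elements to all elements."""
    union = x | y
    if not union:
        return 0.0  # assumed convention: two empty sets are identical
    return 1.0 - len(x & y) / len(union)


d_sets = jaccard_distance({"IT", "FR"}, {"FR", "DE"})
d_unspecified = jaccard_distance({"IT", "FR"}, set())
print(d_sets, d_unspecified)
```

With the empty set standing for an unspecified condition, any non-empty set is at the maximum distance 1 from it.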
3.1.5. Metric distance
We now prove that if all the distances used to compare the values of
each right are metric, the proposed distance between two licenses is
still metric. In other words, a weighted sum of metric distances is a
metric distance too. In fact, \(\forall x, y, z \in D\) and for any
given right \(i = 1, \ldots, n_r\) and condition
\(j = 1, \ldots, n_c\) we get:
\[ d_{i,j}(x_{i,j}, y_{i,j}) \leq d_{i,j}(x_{i,j}, z_{i,j}) + d_{i,j}(z_{i,j}, y_{i,j}) \]
\[ \Rightarrow \sum_{j=1}^{n_c} w_{i,j} \cdot d_{i,j}(x_{i,j}, y_{i,j}) \leq \sum_{j=1}^{n_c} w_{i,j} \cdot [d_{i,j}(x_{i,j}, z_{i,j}) + d_{i,j}(z_{i,j}, y_{i,j})] \]
\[ \Rightarrow d(x, y) \leq d(x, z) + d(z, y). \]
Thus, if all the distances defined for the conditions are metric, the
proposed distance is metric too.
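The argument can also be spot-checked numerically. The sketch below is not a proof: it uses the absolute difference as a stand-in for each component metric, draws random vectors with a fixed seed, and counts triples that violate the triangle inequality for the normalized weighted sum. All names and values are assumptions of the example.

```python
import random
from itertools import product
from typing import List, Sequence


def weighted_distance(x: Sequence[float], y: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Normalized weighted sum of per-component absolute differences."""
    total = sum(weights)
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y)) / total


def count_triangle_violations(points: List[List[float]],
                              weights: Sequence[float],
                              tol: float = 1e-12) -> int:
    violations = 0
    for x, y, z in product(points, repeat=3):
        lhs = weighted_distance(x, y, weights)
        rhs = weighted_distance(x, z, weights) + weighted_distance(z, y, weights)
        if lhs > rhs + tol:
            violations += 1
    return violations


rng = random.Random(0)  # fixed seed for reproducibility
points = [[rng.random() for _ in range(3)] for _ in range(12)]
violations = count_triangle_violations(points, [3.0, 1.0, 0.5])
print(violations)
```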
4. Significant use cases
Searching for a digital item can be done by similarity
search and/or by textual attributes. For example we can look
for an image or a sound similar to what someone provided
as query to the search engine. Moreover we can add to our
query some keywords that the search engine will take into
account as specific attributes.
Figure 3. Use case diagram for similarity
search
Nowadays most of the search engines available on the Web provide
nothing but “full text” and/or “attribute” search capabilities. Even
if many research projects are developing audio and image similarity
search, the only approach concerning licenses is based on attributes.
Instead, according to our proposal, a user will be able, for example,
to search for an image similar to the one provided, considering both
the multimedia content (content-based) and the related license
(provided by the user as well). Furthermore, the user can search for
images that are similar in their multimedia content and carry a
specific kind of license defined by means of attributes.
In Figure 3 we point out that the similarity search use case includes
two distinct use cases, described below: the similarity search on
multimedia features and the similarity search on IPR features. We
consider as IPR features all the possible attributes expressing a
license. Focusing on MPEG-21 REL, the IPR features could be, for
example, all the attributes available for expressing the issuer, the
principal, the rights and the conditions. Hence the user can provide
either a REL license (conforming to the MAM, DAC or ORC profiles) or
a list of attributes expressing the search criteria to the algorithm
performing the search.
We can figure out at least the following use cases dealing
with images:
1. the user is searching for images similar to a given one
both considering its visual appearance and the pro-
vided license
2. the user is searching for images similar to a given one
but with specific license “attributes”
In the first use case, let us consider as an example a user
interested in an artistic image for his desktop. Let us also suppose
that he finds on the Web a low-resolution image of a painting whose
painter he does not know. If the user is interested in acquiring the
same picture at higher quality, he could perform a query using the
system described in this paper by providing as input the low-quality
image and the desired rights information, expressed in this case as
an ORC license. The search engine will display as a result the ranked
list of images similar to the one provided in terms of both “pixels”
and rights information. For example, one result could be a picture of
the same painting taken by a tourist and released under a
CreativeCommons license; another result, probably distant from the
previous one in the result list, could be a photograph of the
painting taken by a professional photographer, with an attached
license expressed in MPEG-21 REL stating, for example, the price for
downloading the high-resolution picture. The user can then choose
between the two results according to both the quality of the image
and the kind of license, immediately getting a feeling for how far
each result is from the query object.
In the second use case the user is able to identify objects stored in
the network that are very similar to those provided in the query but
carry, for example, a different license. Imagine a professional
photographers' agency that wants to be sure that nobody is using its
pictures illicitly. The agency can query the system by providing the
picture to be searched and the attributes of an open license or
something “similar” to an open one. If the system finds a result, it
means either that someone has taken the same picture or that someone
is sharing an unauthorized copy of the picture. This use case is
innovative because current search engines focus on content sharing
without addressing the “control” of the content itself, delegating
this feature entirely to DRM systems.
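The agency scenario can be sketched as a query that keeps only items close to the query picture in content and close to an open license in rights. The paper does not prescribe how the two distances are combined, so the thresholds, the ranking by the sum of the two distances, the function name and the toy data below are all assumptions of this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # content feature type
L = TypeVar("L")  # license type


def find_suspect_copies(
    items: Sequence[Tuple[str, F, L]],
    query_features: F,
    open_license: L,
    content_dist: Callable[[F, F], float],
    license_dist: Callable[[L, L], float],
    max_content: float = 0.1,
    max_license: float = 0.2,
) -> List[str]:
    """Return ids of items near the query picture AND near an open license."""
    suspects = []
    for item_id, features, lic in items:
        dc = content_dist(query_features, features)
        dl = license_dist(open_license, lic)
        if dc <= max_content and dl <= max_license:
            suspects.append((dc + dl, item_id))
    return [item_id for _, item_id in sorted(suspects)]


# Toy data: one number as the visual feature, a label as the license.
items = [("a", 0.02, "open"), ("b", 0.03, "paid"),
         ("c", 0.90, "open"), ("d", 0.01, "open")]
suspects = find_suspect_copies(
    items, 0.0, "open",
    content_dist=lambda p, q: abs(p - q),
    license_dist=lambda p, q: 0.0 if p == q else 1.0,
)
print(suspects)
```

Item "b" is visually close but carries a paid license, and item "c" has an open license but different content, so only "d" and "a" are reported.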
5. Conclusions and future work
We have proposed an innovative approach for indexing and searching
digital items based on their associated rights information expressed
in the MPEG-21 Rights Expression Language, according to specific
profiles of this standard. The metadata shown are taken as examples
and should be changed to fit the needs of the software infrastructure
the user has to deal with. This approach considers the IPR attributes
as special features to which a specific metric distance function can
be applied, enabling efficient querying on different IPR schema
representations and complex rights expression structures.
Although several organizations are dealing with the expression of
rights information, not much has been done so far concerning the
“retrieval” of the license associated with digital items. The
proposed approach does not claim to cover all possible rights
statements and expression languages available nowadays. However, by
selecting the union of the three MPEG-21 REL profiles mentioned
above, the solution will indeed support a large number of use cases.
Furthermore, we can also obtain a ranked list of results according to
the metric function, by defining the distance between the license
attributes and, possibly, a mapping between the license types that
the user has to manage.
Since the definition of the distance between licenses depends on
specific needs, the relative importance of the rights and/or
conditions should be configurable. If the weights are defined by the
system administrator, then, since the proposed overall distance is a
metric, the search can be performed efficiently using
state-of-the-art data structures for similarity search in metric
spaces. If a few sets of different weights are used (e.g. for
different user groups), the corresponding distances can be used to
build distinct instances of the data structures. Instead, if the
weights must be fully configurable at query time, algorithms for
complex query execution should be used, causing a degradation in
system performance.
References
[1] eMule. http://www.emule.org. Last visited on Jun 30th 2007.
[2] BitTorrent. http://www.bittorrent.com. Last visited on Jun 30th 2007.
[3] Joost. http://www.joost.com. Last visited on Jun 30th 2007.
[4] A. Messina, L. Boch, G. Dimino, W. Bailer, P. Schallauer,
W. Allasia, M. Groppo, M. Vigilante, and R. Basili. Creating rich
metadata in the TV broadcast archives environment: The PrestoSpace
project. In Automated Production of Cross Media Content for
Multi-Channel Distribution, 2006. AXMEDIS ’06. Second International
Conference on, pages 193–200, Dec. 2006.
[5] The Moving Picture Expert Group (MPEG).
http://www.chiariglione.org/mpeg/. Last visited on Jun 30th 2007.
[6] Open Mobile Alliance (OMA). http://www.openmobilealliance.org.
Last visited on Jun 30th 2007.
[7] Digital Video Broadcasting (DVB). http://www.dvb.org. Last visited
on Jun 30th 2007.
[8] The Digital Media Project (DMP). http://www.dmpf.org. Last visited
on Jun 30th 2007.
[9] The Coral Consortium. http://www.coral-interop.org. Last visited
on Jun 30th 2007.
[10] TV-Anytime. http://www.tv-anytime.org. Last visited on Jun 30th
2007.
[11] ISO/IEC Information Technology Multimedia Framework (MPEG-21),
2005. 21000-5.
[12] Open Digital Rights Language (ODRL) version 1.1.
http://www.w3.org/TR/odrl, 2002. World Wide Web Consortium, W3C Note.
[13] The TV-Anytime WG Rights Management and Protection.
http://www.tv-anytime.org/workinggroups/wg-rmp.html. Last visited on
Jun 30th 2007.
[14] Adobe Content Manager.
http://www.adobe.com/products/contentserver/. Last visited on Jun
30th 2007.
[15] CreativeCommons. http://creativecommons.org. Last visited on Jun
30th 2007.
[16] Publishing Requirements for Industry Standard Metadata (PRISM).
http://prismstandard.org. Last visited on Jun 30th 2007.
[17] ISO/IEC 21000-5 AMD 1 Rights Expression Language: the MAM (Mobile
and optical Media) profile.
[18] ISO/IEC 21000-5 AMD 2 Rights Expression Language: the DAC
(Dissemination and Capture) profile.
[19] ISO/IEC 21000-5 AMD 3 Rights Expression Language: ORC (Open
Release Content) profile.
[20] MPEG-A Multimedia Application Format Overview and Requirements.
http://www.chiariglione.org/MPEG/standards/mpeg-a/mpeg-a.htm. Last
visited on Jun 30th 2007.
[21] Automating Production of Cross Media Content for Multi-channel
Distribution (AXMEDIS). http://www.axmedis.org. Last visited on Jun
30th 2007.
[22] The Open Mobile Alliance (OMA). http://www.openmobilealliance.org.
Last visited on Jun 30th 2007.
[23] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal,
and Michal Batko. Similarity Search: The Metric Space
Approach, volume 32 of Advances in Database Systems.
Springer, 233 Spring Street, New York, NY 10013, USA,
2006.
[24] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-
tree: An efficient access method for similarity search
in metric spaces. In VLDB ’97: Proceedings of the
23rd International Conference on Very Large Data
Bases, pages 426–435, San Francisco, CA, USA,
1997. Morgan Kaufmann Publishers Inc.
[25] Vlastislav Dohnal, Claudio Gennaro, Pasquale
Savino, and Pavel Zezula. D-index: Distance search-
ing index for metric data sets. Multimedia Tools
Appl., 21(1):9–33, 2003.
[26] Hanan Samet. Foundations of Multidimensional and
Metric Data Structures. Computer Graphics and Geo-
metric Modeling. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA, 2006.
[27] Michal Batko, Claudio Gennaro, and Pavel Zezula.
Similarity grid for searching in metric spaces. In
Peer-to-Peer, Grid, and Service-Orientation in Digital
Library Architectures, 6th Thematic Workshop of the EU
Network of Excellence DELOS, Cagliari, Italy, June
24-25, 2004, Revised Selected Papers, volume 3664
of LNCS, pages 25–44, 2004.
[28] Michal Batko, David Novak, Fabrizio Falchi, and
Pavel Zezula. On scalability of the similarity search
in the world of peers. In InfoScale ’06: Proceedings
of the 1st international conference on Scalable infor-
mation systems, page 20, 2006.
[29] Fabrizio Falchi, Claudio Gennaro, and Pavel Zezula.
A content-addressable network for similarity search in
metric spaces. In DBISP2P ’05: Proceedings of the
2nd International Workshop on Databases, Information
Systems and Peer-to-Peer Computing, Trondheim,
Norway, volume 4125 of LNCS, pages 98–110.
Springer, 2005.
[30] David Novak and Pavel Zezula. M-chord: a scalable
distributed similarity search structure. In InfoScale
’06: Proceedings of the 1st international conference
on Scalable information systems, page 19, 2006.
[31] Search on Audio-visual content using Peer-to-peer
Information Retrieval (SAPIR). http://www.
sapir.eu. Last visited on Jun 30th 2007.
[32] Flickr. The best way to store, search, sort and share
your photos. http://www.flickr.com/. Last
visited on Jun 30th 2007.
[33] ISO/IEC FCD 23000-5. Media streaming player.