Conference PaperPDF Available

Hierarchical multidimensional modelling in the concept-oriented data model

Authors:

Abstract and Figures

In the paper the concept-oriented data model (COM) is described from the point of view of its hierarchical and multidimensional properties. The model consists of two levels: syntactic and semantic. At the syntactic level each element is defined as a combination of its superconcepts. At the semantic level each item is defined as a combination of its superitems. Such a definition has several general interpretations such as a hierarchical coordinate system or multidimensional categorization schema. The described approach can be applied to very different problems for dimensional modelling including database systems, knowledge based systems, ontologies, complex categorizations, knowledge sharing and semantics web.
Content may be subject to copyright.
Hierarchical Multidimensional Modelling
in the Concept-Oriented Data Model
Alexandr Savinov
Fraunhofer Institute for Autonomous Intelligent Systems
Schloss Birlinghoven, Sankt Augustin, D-53754 Germany
savinov@ais.fraunhofer.de
http://www.ais.fraunhofer.de/~savinov
http://conceptoriented.com
Abstract. In the paper the concept-oriented data model (COM) is described
from the point of view of its hierarchical and multidimensional properties. The
model consists of two levels: syntactic and semantic. Concepts are combina-
tions of superconcepts while items are combinations of superitems. It is de-
scribed how this model can be interpreted as a hierarchical coordinate system.
Two operations of projection and de-projection are used in the mechanism of
access path. Grouping and aggregation with roll up and drill down are imple-
mented via multidimensional de-projection. The described approach can be ap-
plied to very different problems for multidimensional modelling including data-
base systems, online analytical processing, knowledge based systems, ontolo-
gies, complex categorizations, knowledge sharing and semantics web.
1 Introduction
Currently there exist several general approaches to data modelling based on different
principles and main notions such as relations in the RM [5,9], entity and relationships
in the ERM [4], facts and object roles in the ORM [13], subject-predicate-object tri-
ples in the RDF [2] and many others. One of the most interesting approaches is based
on using dimension as the main notion and construct of the model and dimensionality
(degrees of freedom) as one of its main characteristics. This direction has been devel-
oped in the area of multidimensional databases [1,12,15,17,22] and online analytical
processing (OLAP) [3,17]. In this paper we describe an approach to dimensionality
modelling based on the concept-oriented data model (COM).
The described model has been proposed in [19,20] and it takes its origin from the
idea of multidimensional hierarchical space (analytical space) [18]. There are several
general assumptions about the nature of data in the COM. One of them is that the
whole model is viewed as one global construct with canonical syntax and semantics.
Analogous assumption is used in the universal relation model (URM) [6,14,16]. Us-
ing this assumption we can derive properties of elements as properties of the whole
model as well as automate many operations such logical navigation and query con-
struction. Another important assumption is that the model is based on ordering its
elements which is analogous to concept-lattices, formal concept analysis (FCA) [8]
Proc. the 3rd international conference on Concept Lattices and Their Applications (CLA 2005),
Olomouc, Czech Republic, September 7-9, 2005, 123-134.
and ontologies [7]. This means that the order of elements is of crucial importance for
the model and element position with respect to other elements determines most of its
properties. In great extent everything in the COM is about order of elements. In con-
trast to FCA where objects are grouped into concepts depending on their properties
(represented in incidence relation), items in the COM belong to concepts in advance
and then their properties depend on the relative position to other items. The third
assumption is that the hierarchical multidimensional structure of the model can be
used for automating data access and logical navigation. The mechanism of access
paths and queries in the COM is very close to that used in the functional data model
(FDM) [10,11,21].
In section 2 we formally define the COM including its physical and logical struc-
ture (section 2.1), syntactic dimensionality structure (section 2.2) and semantics (sec-
tion 2.3). In section 3 we describe how this model can be interpreted as a hierarchical
multidimensional coordinate system. In section 4 we introduce two operations of
projection and de-projection and the mechanism of access path based on them. Sec-
tion 4 describes multidimensional grouping and aggregation. We will follow a con-
vention that concept names are capitalized and written in plural while dimension
names are in singular and start from lower case letter, for example,
Products is a
concept while
product is a dimension with the domain in Products. Formulas will
be written as usual in italics while queries and examples in monotype font (Courier).
2 Model Definition
2.1 Physical and Logical Structure
In the concept-oriented model each element E is defined via other elements and this
definition consists of two parts: (i) a collection of other elements
},,{ KbaE =
,
K, , EbEa , and (ii) a combination of other elements:
= K,,VUE
,
K<< , , EVEU . Here
denotes membership in collection and < denotes member-
ship in combination. This means that each element stores and knows directly two
groups of elements which have different purpose and interpretation. Elements within
a collection are identified by reference while elements within a combination are iden-
tified by position. Collectional parts of elements describe physical structure of the
model which is assumed to be strictly hierarchical so that each element is a member
of one parent collection (Fig. 1a). It is important that this structure is fixed in the
sense that elements can be added or deleted but they cannot change their parent col-
lection. For example, element b cannot be moved from C to U (Fig. 1a). Physical
structure is used to implement references which are a mechanism of physical repre-
sentation and access. In other words, each element in the model has a reference ac-
cording to its position in the physical hierarchy of elements starting from the root.
Combinational parts of elements are intended to describe logical structure of the
model. Logical structure defines how elements are characterized by other elements. It
is important that an element may be a member of several combinations and itself
combines several other elements (Fig. 1b). Another important property of logical
structure is that it is not fixed and we can always change how any element is logically
defined.
=
cbag ,,, K
,
gcgbga <K<< , , ,
a b c
d e f
R
C U V
a b c d e f
concepts
items
model root
a) b)
},,,{ fedg K
=
,
gfgegd >K>> , , ,
>
- membership i
n
logical collection
- membership in
physical collection
g
Fig. 1. a) Physical structure is defined by a hierarchical collection where each element has only
one parent. In two-level model the root consists of concepts (syntax) and concepts consists of
items or concept instances (semantics). b) Logical structure is defined by combinations with
dual interpretation as a logical collection of elements. In two-level model each concept is a
combination of superconcepts and each item is a combination of superitems. Here element g
can be a concept (and then a, b,…,c are superconcepts and d,e,…,f are subconcepts) or an item
(and then a, b,…,c are superitems and d,e,…,f are subitems).
Elements in combination are interpreted as object properties, i.e., an element is a
combination of its properties. The dual interpretation is that an element is a logical
collection consisting of all the elements which include it into their combinational part.
Thus any element is a combination of other elements and, dually, a (logical) collec-
tion of other elements. Formally, if
=
KK ,,UE ( EU < ) then
},,{ KK EU =
( UE > ) where
>
denotes membership in logical collection. In contrast to physical
collections which are fixed and intended to implement references, logical collections
(dual combinations) are intended to represent data itself. We will follow a convention
that combination = K,,VUE is drawn below its elements K,,VU . In this case an
upward arrow connects combination with its constituents and denotes membership in
logical collection. Then each element can be represented as a combination of all ele-
ments above it and, dually, a (logical) collection of all elements below it (Fig. 1b). In
conventional terms this means that an object/record is a combination of its properties
but at the same time it is a collection/group of all objects/records where it is used as a
property. (A property is a group/category for all objects that have it.)
From the point of view of physical structure we distinguish three types of the
model: (i) one-level model has one root and a number of data items in it, (ii) two-level
model has one root, a number of concepts in it, each of them having a number of data
items, (iii) multi-level model has an arbitrary number of levels in the physical hierar-
chy. In this paper we consider only two-level model defined as consisting of the fol-
lowing elements:
[Root] One root element R is a physical collection of concepts,
},,,{
21 N
CCCR K
=
,
[Syntax] Each concept is (i) a combination of other concepts called superconcepts
(while this concept is a subconcept),
RCCCC
n
=
,,,
21
K
, and (ii) a physical
collection of data items (or concept instances),
RiiC
=
},,{
21
K
,
[Semantics] Each data item is (i) a combination of other data items called superitems
(while this item is a subitem),
Ciiii
n
=
,,,
21
K
, and (ii) empty physical collec-
tion,
{}=i
,
[Special elements] If a concept does not have a superconcept then it is referred to as
primitive and its superconcept is one common top concept; and if a concept does
not have a subconcept then it is assumed to be one common bottom concept, and
an absence of superitem is denoted by one special null item.
[Cycles] Cycles in subconcept-superconcept relation and subitem-superitem relation
are not allowed,
[Syntactic constraints] Each data item from a concept may combine only items from
its superconcepts.
2.2 Dimensions and Inverse Dimensions
Each superconcept has a uniquely identified position within a concept definition,
=
nn
CxCxCxC :,,:,:
2211
K
, which is called dimension (of rank 1 or local di-
mension). Here superconcepts
n
CCC ,,,
21
K
are called domains or ranges for di-
mensions
n
xxx ,,,
21
K , )Dom(
jj
xC
=
. Normally dimensions (concept positions
within a combination) are identified by names or integer values. The syntactic struc-
ture of concepts can be represented by a graph where nodes are concepts and edges
are dimensions connecting subconcepts with their superconcepts. Cycles are not al-
lowed in this graph so that concepts cannot be defined via themselves. Loops can be
permitted for simplicity and performance reasons. If one of two concepts is direct or
indirect parent of another then they are called syntactically dependent or parallel
concepts, otherwise the two concepts are referred to as independent or orthogonal. In
particular, all primitive concepts are orthogonal.
A dimension x of rank k is a sequence of k dimensions of rank 1 (separated by
dots) where each next dimension in the sequence belongs to the domain of the previ-
ous one:
k
xxxx ...
21
L=
, where )Dom()(Dom
1jj
xx < ,
kj ,,3,2 K
=
Dimensions will be frequently prefixed by the very first concept:
k
xxxC ....
21
L . Also
we frequently will use the terms dimension and its domain interchangeably. Each
dimension is represented by one path in the concept graph and the number of edges in
the path is its rank. A dimension with the primitive domain (direct superconcept of
the top concept) is referred to as primitive dimension. For example,
Auc-
tions.product.category
in Fig. 2 is a primitive dimension of rank 2 from the
source concept
Auctions to its superconcept Categories of rank 2. Note that there
may be several different paths (dimensions) between a concept and its superconcept.
The number of different primitive dimensions (paths) of a concept is referred to as
the concept primitive dimensionality. The model dimensionality is equal to that of the
bottom concept, that is, the number of dimension paths from the bottom concept to
primitive concepts (or, equivalently, to the top concept) is the model primitive dimen-
sionality. The length of the longest dimension (path) of a concept is referred to as
concept rank. The length of the longest dimension of the bottom concept is the model
rank. Thus each concept-oriented model is characterized by two properties: (i) model
rank describing its hierarchy (depth), and (ii) model dimensionality describing its
multidimensionality (width). Model structure can vary from the flat space with many
dimensions to one-dimensional hierarchal space. The task of the model syntactic
design consists in defining what concrete structure of dimensions is appropriate for
one or another problem domain.
Dually, each concept is a logical collection of its subconcepts:
},,,{
21 n
SSSC K= ,
j
SC <
. Inverse dimension is a dimension with opposite direc-
tion, i.e., where the dimension starts the inverse dimension has its domain, and where
the dimension ends the inverse dimension has its start. In concept graph inverse di-
mension is a path from superconcept to some its subconcept. Inverse dimension of
rank 1 is an edge in concept graph with the opposite direction (from superconcept to a
subconcept). Inverse dimensions do not have their own identifiers because they can
be produced from the corresponding dimensions by inverting their direction. We will
denote inverse dimensions by enclosing the corresponding dimension into curly
brackets. If
k
xxxC ....
21
L is a dimension of concept C with rank k then
}....{
21 k
xxxC L
is inverse dimension of concept
)Dom(
k
x
with the domain in C and
the same rank k. Thus any concept is characterized by (i) a set of dimensions leading
to superconcepts and (ii) a set of inverse dimensions leading to subconcepts. For
example,
{Auctions.product.category} is an inverse dimension of rank 2 from
superconcept
Categories to its subconcept Auctions (Fig. 1).
Prices
Users
Auctions
Top
AuctionBi
d
s
auction
Dates
Products
Categories
price
user
date
product
category
date
user
Fig. 2. An example of logical structure of concepts (syntax). Each concept is a combination of
its superconcepts and a logical collection of its subconcepts. The model syntax can be repre-
sented as a concept graph where one path is one named dimension. Dimensionality and inverse
dimensionality of this model is 6.
2.3 Semantics
Semantically each concept is a physical collection of its items,
},,{
21
KiiC
=
, where
each item is a combination of items from its superconcepts:
=
n
iiii ,,,
21
K , where
=
n
CCCCi ,,,
21
K and
njCi
jj
,,2,1 , K
=
Just like for the concept graph cycles are not permitted so that the items constitute an
acyclic graph. In such a definition two levels of logical description (syntactic and
semantic) are isolated, i.e., we can represent things either by means of concepts or by
means of items within their own acyclic graphs. A bridge between them is established
via the physical structure where each item is physically a member of one concept. The
logical structure of concepts is used to describe model dimensionality and impose
syntactic constraints on possible items. In other words, without syntactic constraints
an item can combine any items from any other concepts. In the presence of syntactic
constraints an item can combine only items from its superconcepts.
An important property of the concept-oriented semantics is its globality which
means that each item has its semantics distributed all over the model. Indeed, each
individual item taken in isolation as an instance of its concept has no its own intrinsic
properties or characteristics and the only thing that can be used to distinguish it from
other items is its (physical) reference (location in the physical hierarchy). Items in the
concept-oriented model can be only characterized by other items which in turn are
characterized by other items and so on. If we want to get some property then we need
to specify what other items we want to retrieve. However, an item representing a
property may well be also semantically empty (if it is not a primitive item) so we need
to retrieve some other item and so on. Thus any individual item semantics is distrib-
uted all over the model and finding item properties is reduced to specifying what part
of the whole model semantics (associated with this item) we want to get.
Another property of this approach is that it does not have a dedicated mechanism
of multi-valued properties. In order to model single-valued and multi-valued attrib-
utes the COM uses dimensions and inverse dimensions, respectively. In other words,
dimensions are always single-valued and return one superitem as a value while in-
verse dimensions are always multi-valued and return a collection of subitems as its
values. Single-valued attributes are upward directed while multi-valued attributes are
downward directed in the concept graph. Such a mechanism has significant conse-
quences. We cannot now simply change the multiplicity of an attribute because it
entails changing the relative position of the domain. For single-valued attributes the
domain is positioned above the source concept while for multi-valued attributes the
domain is positioned below the source concept. Thus changing the multiplicity results
in significant changes in the model semantics and syntax.
3 Hierarchical Coordinate System
Universe of discourse of a concept is the Cartesian product of its superconcepts:
},,2,1,|,,,{
2121
njCiiiiCCC
jjnn
KKK
=
=
=×××=
ω
Each syntactically possible combination
ω
of superitems is referred to as point.
Only a subset of all points from
really exists and this subset is precisely the con-
cept semantics (a set of its items). The superconcepts in the definition of a concept are
treated as axes of the multidimensional space while items in these superconcepts are
coordinates on these axes. The model can be then interpreted as a hierarchical multi-
dimensional coordinate system where each item is characterized by its position along
superconcepts as axes and, at the same time, is a coordinate taken by its subitems.
For example, in Fig. 3 the problem domain consists of 2-dimensional plane XY 4
points of which are combinations of 2 coordinates along two axes X and Y. From the
concept-oriented point of view both axes X and Y are superconcepts each with 2 su-
peritems (black circles). Their common subconcept XY potentially can consist of 4
subitems because there exist only 4 combinations of superitems. However, subcon-
cept XY has only 3 subitems in its semantics while the fourth potentially possible
subitem does not exist (dashed circles). Each item from concept XY corresponds to
one point in the 2-dimensional plane while its constituents are interpreted as coordi-
nates taken along the superconcepts as axes.
Such a coordinate system is flexible because we can change the available coordi-
nates for positioning objects by adding or removing items. This means that the coor-
dinate system is an integral part of and is described by the model itself. New axes are
introduced by defining new concepts and new possible coordinates can be added by
defining new data items within existing concepts. For example, in Fig. 3 we might
add a new coordinate Z and then data points from their subconcept XYZ could be
positioned by assigning three coordinates rather than two.
X
Y
X Y
XY
Fig. 3. Interpretation the model as a coordinate system. Plane with two axes is a subconcept
with two superconcepts.
Another advantage is that it allows the modeller to create hierarchical coordinate
systems. The thing is that all concepts have equal rights and can be treated as an axis
with coordinates for its subconcepts and as a space with objects for its superconcepts.
Thus any new subconcept defined via its superconcepts can be treated as a separate
individual axis for future subconcepts. The coordinates in such a complex axis are
complex multidimensional objects having many their own coordinates. For example,
in Fig. 3 each point in XY is a combination of two coordinates from X and Y but in the
hierarchical system each coordinate from X and Y can be itself a point with its own
coordinates taken from their superconcepts. On the other hand, each item from XY can
be used as a possible coordinate for new subconcepts added later. For example, we
might add a new subconcept A which has two dimensions in XY and Z. The primitive
dimensionality of A is 3 but we specify only two coordinates for each its item.
The problem of modelling in such a setting is formulated as designing a hierarchi-
cal coordinate system (syntax) and positioning objects in it via other objects treated as
coordinates (semantics). After defining the model syntactic structure items are added
in superconcepts and serve as coordinates for adding items in subconcepts. For exam-
ple, in Fig. 2 each auction bid from concept
AuctionBids has four coordinates along
axes
Prices, Users, Dates and Auctions. However, the last coordinate has its
own three coordinates, i.e., each auction along axes
Users, Dates and Products.
Finally each auction bid has 6 primitive coordinates because there are 6 different
paths from the concept AuctionBids to the top concept. We might equivalently
describe this model as the flat multidimensional space consisting of 6 primitive con-
cepts and one 6-dimensional concept
AuctionBids. Obviously, such a model would
have very low level and too inflexible. Having a hierarchical coordinate system al-
lows us to introduce intermediate levels, which impose syntactic constraints. The
concept
AuctionBids in this case has only 4 its own dimensions and describing its
items in this case is much more convenient.
4 Projection and De-projection
Let us assume that I is a subset of items from concept C, CI . In order to get re-
lated items from some target domain concept we need to specify a path by means of
either dimension (with the domain in a superconcept of C) or inverse dimension (with
the domain in a subconcept of C). This path (dimension or inverse dimension) is ap-
plied to the source subset of items and returns a subset of related items from the do-
main concept. If
k
dddd ...
21
L= is a dimension of C with domain in )Dom(dU =
then operation dI
is referred to as projection of I to U and returns a set of su-
peritems referenced by items from I:
} ,.|{ CIiudiUudI
==
It is important that each item from U can be included into the result collection (pro-
jection) only one time. If we want to include superitems as many times as they are
referenced then we can use dot instead of arrow, i.e., xI. includes all referenced su-
peritems of U even if they occur more than once. For example, if A is a collection of
today’s auctions then
A->product.category will return all today’s categories
(each category only once) while
A.product.category will return categories for
each auction in A (as many categories as we have auctions).
Dual operation to projection returns a set of related items from a subconcept. If
}...{}{
21 k
dddd L= is an inverse dimension of C with domain in
})Dom({dS
=
then de-projection of I to S is defined as follows:
} ,.|{}{ CIiidsSsdI
==
The result collection of de-projection
}{dI
consists of all items from subconcept
S which reference items from I via dimension d. For example, if C is a set of catego-
ries then
C->{Auctions.product.category} returns a set of all auctions with the
products having these categories. Dimension d specifying path between the source
concept and the target concept is referred to as bounding dimension.
Access path is sequence of dimensions or inverse dimensions separated either by
dot or by arrow where each next operation is applied to the result collection returned
by the previous operation. Each access path has a zigzag form in the concept graph
where dimensions move up to a superconcept while inverse dimensions move down
to a subconcept in the concept graph.
It is possible to restrict items that are included into de-projection by providing a
condition that all items from the domain subconcept have to satisfy:
} ,true)(.|{)}(|.:{ CIisfidsSssfdSsI
=
=
=
Here dS. is bounding dimension from subconcept S to the source collection I, s is
instance variable that takes all values from set S and the predicate f (separated by bar)
must be true for all items s included into the result collection (de-projection). After
the query in angle brackets we can specify a list of values returned with each item of
the new result collection. For example, access path
C->{a : Auction.product.category | a.date==today}<user,date>
will return user and date for all today’s auctions for a subset of categories from C.
Frequently we need to have aggregated characteristics of items computed from re-
lated items. This can be done by defining a derived property of concept which is a
named definition of a query returning one item or a collection for each item from this
concept. For example, we could define a derived property
allBids of concept Auc-
tions
returning a collection of all bids for one auction:
Auctions.allBids = this->{ AuctionBids.auction }
(Keyword this is an instance variable referencing the current item.) Derived proper-
ties can use other properties. For example, the maximum bid for an auction could be
defined as the following derived property:
Auctions.maxBid = max( this.allBids.price )
Here we get a set of all bids by applying existing property allBids to the current
item, then get their prices via dot operation and then find the maximum price. In the
same way we might compute the mean price for ten days for one category:
Category.meanPriceForTenDays =
avg( {ab in AuctionBids.auction.product.category |
ab.auction.date > today-10 }.price );
5 Multidimensional Grouping and Aggregation
In the previous section we assumed that there is only one path between the source
concept and the target concept with related items, which is specified by means of one
bounding dimension. However, in multidimensional case there can be more than one
path which is specified by means of several bounding dimensions. If I is a subset of
items from the source concept C,
CI
, S is some subconcept of C, and
n
ddd ,,,
21
K are different dimensions of S with the domain in C,
Cd
j
=)(Dom
,
nj ,,2,1 K=
, then multidimensional de-projection of I to S is defined as a set of
subitems Ss
that reference source items Ii
along all dimensions:
} ,...|{},,,{
2121
CIiidsidsidsSsdddI
nn
=
=
=
= KK
If we use only one dimension then we get 1-dimensional de-projection defined in the
previous section. The idea of multidimensional de-projection is that we want to get
subitems s belonging to one group item i along all specified dimensions. For example,
if we select a set of points interpreted as groups/categories then multidimensional de-
projection will return a set of all objects with coordinates in these points (projected to
these points).
The mechanism of multidimensional de-projection can be used for online analyti-
cal processing (OLAP) where the task consists in finding aggregated characteristics of
objects with the possibility to vary level of details. Let us assume that S is some con-
cept with items to be aggregated. In concept S we select a number of dimension paths
n
ppp ,,,
21
K
which will be used for analysis. A level
=
n
lllL ,,,
21
K
is a set of
numbers specifying a set of positions along dimension paths, a set of dimensions
n
ddd ,,,
21
K of concept S with ranks
n
lll ,,,
21
K with their domains
n
DDD ,,,
21
K ,
)(Dom
jj
dD =
,
nj ,,2,1 K=
. One constituent of level
j
l
is a rank of dimension
j
d
taken along path
j
p
. Operation of increasing rank
j
l
(one constituent of level)
of one dimension
j
d is referred to as roll up. Operation of decreasing rank
j
l of one
dimension
j
d is referred to as drill down. If all level constituents are 0s then we get
concept S which contains the most detailed information used for analysis. If all level
constituents are 1s then we get a set of dimensions with rank 1 and their correspond-
ing direct superconcepts of S as domains.
For each level we can define its universe of discourse or multidimensional cube as
a set of all possible points produced from the corresponding domains:
}|,,,{
2121 jjnnL
DDDD
=
=××
×
=
ω
ω
ω
ω
ω
KK , )Dom(
jj
dD
=
Projection of a set of items SI
to level L is a set of points from
L
referenced by
items from I via dimensions
n
ddd ,,,
21
K of level L:
} ,...|{
21
SIidididiLI
nL
=
=
==
ω
ω
ω
ω
K
Multidimensional de-projection of a subset of items
L
I
(where
L
is deter-
mined by level L) is a set of items from S with projection in I:
} ,...|{}{
21 Ln
IdsdsdsSsLI
=
=
==
ω
ω
ω
ω
K
Multidimensional de-projection allows us to find items from a subconcept that are
related to each point from one level. Thus items from the subconcept are broken into
groups depending on the point from the level they are projected to. For each such
group we can then find some aggregated property.
For example, let us suppose that S=
Auctions is the concept we want to analyse.
The first dimension path is p
1
=Auctions.date has only one level of details. The
second dimension path p
2
=Auctions.products.categories has two levels of
details so that we can consider either
Products (more details) or Categories (less
details) as domains for our analysis. The goal is to understand how properties of auc-
tions are distributed over these two dimensions. The universe of discourse is defined
as follows:
{Dates, Categories | having(this)}. Here we enumerate do-
mains from the level D
1
=Dates, D
2
=Categories and provide also restriction on the
points we are interested in the predicate
having(this) (analogous to HAVING
clause in SQL). Then each point from this space has to be de-projected to the target
concept S=
Auctions and this group of items can be used to find some aggregated
property. The following query will find average maximum bid for last week:
{d : Dates, c : Categories | isLastWeek(d) }<
avg(this->{Auctions.date,Auctions.product.category}.maxBid)
as averagePrice >
Here in angle brackets after the source collection we specify values computed and
returned for each point from the universe of discourse. In this example, we select only
points for last week by imposing constraint on the source 2-dimensional space is-
LastWeek(d)
. Each point from this space is de-projected to concept Auctions
using query
this->{Auctions.date, Auctions.product.category} with
two bounding dimensions. Then we add property
maxBid (defined in the previous
section) to this access path because we want to find average value for all items of this
de-projection. And finally this group of auction prices is passed as a parameter to the
aggregation function which returns one value as a computed characteristic of one
source point with the new name
averagePrice. The result is that we have a two-
dimensional space with one aggregated property (called measure in OLAP).
It is important that we can vary the level of details. For example, to get more de-
tails we can drill down along the second dimension path and consider
Products
instead of
Categories. Additionally, we might want to consider only auctions with
price higher than some threshold:
{d : Dates, c : Products | isLastWeek(d) }<
avg(this->{Auctions.date,
Auctions.product | this.maxBid > 10 }.maxBid)
as averagePrice >
This query will produce all combinations of dates and products for last week and for
each of them find a group of auctions with price higher than 10. Then this group of
prices is averaged and returned as a new aggregated property. Note that grouping is
carried out independent of any measure because we simply de-project a point and find
a set of related subitems. Measure in this case is any property of items in de-
projection (concept S). In particular, it is possible to define several measures in one or
more different subconcepts with their own bounding dimensions. It is important also
that dimension paths, measure and cubes are roles which are assigned only for certain
type of analysis while the model itself is defined only by its concept graph.
6 Conclusion
In the paper we described an approach to hierarchical multidimensional modelling
based on the concept-oriented data model. This approach separates physical structure
and logical structure by defining each element as a collection of other elements and as
a combination of other elements. The logical structure is then used to represent data.
For modelling (syntactic) dimensionality structure we use concepts which are combi-
nations of their superconcepts. A distinguishing feature is that we define the model
structure as an acyclic graph with edges as dimensions and after that this graph is
used for analysis by assigning roles to its elements such as levels, cubes, dimensions
and measures. In contrast, the conventional approaches are based on defining dimen-
sions, cubes and other elements needed for analysis as initial elements of the model.
Two important operations used for analysis are projection and de-projection which
use the mechanism of dimensions and inverse dimensions, respectively. Taking into
account simplicity and rich set of features and mechanisms this model can be applied
not only to dimensionality modelling but used in many other application areas for
modelling a wide range of practical situations and use cases.
References
1 R. Agrawal, A. Gupta and S. Sarawagi, Modeling multidimensional databases, Proc. 13th
International Conference on Data Engineering (ICDE’97), 232-243, 1997.
2 T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American, May
2001.
3 A. Berson and S.J. Smith, Data warehousing, data mining, and OLAP, New York, McGraw-
Hill, 1997.
4 Peter Pi-Shan Chen: The Entity-Relationship Model. Toward a Unified View of Data. In:
ACM Transactions on Database Systems 1/1/1976 ACM-Press ISSN 0362-5915, S. 9-36
5 E.F. Codd, A relational model of data for large shared data banks, Communications of the
ACM 13(6), 377-387, 1970.
6 R. Fagin, A.O. Mendelzon, J.D. Ullman, A Simplified Universal Relation Assumption and
Its Properties. ACM Trans. Database Syst. 7(3), 343-360, 1982.
7 D. Fensel, Ontologies: a silver bullet for knowledge management and electronic commerce.
Springer, 2004.
8 B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer,
1999.
9 H. Garcia-Molina, J. Ullman and J. Widom, Database Systems: the Complete Book, Prentice
Hall, 2003.
10 P.M.D. Gray, P.J.H. King and L. Kerschberg (eds.), Functional Approach to Intelligent
Information Systems (Special issue). Journal of Intelligent Information Systems 12, 107–
111, 1999.
11 P.M.D. Gray, L. Kerschberg, P. King, and A. Poulovassilis (eds.), The Functional Approach
to Data Management: Modeling, Analyzing, and Integrating Heterogeneous Data, Heidel-
berg, Germany, Springer, 2004.
12 M. Gyssens and L.V.S. Lakshmanan, A foundation for multi-dimensional databases, Proc.
23th VLDB '97, Athens, Creece, 106-115, 1997.
13 T.A. Halpin, Entity Relationship modeling from ORM perspective. Journal of Conceptual
Modeling (www.inconcept.com/jcm), 11, 1999.
14 W. Kent, Consequences of assuming a universal relation, ACM Trans. Database Syst., 6(4),
539-556, 1981.
15 C. Li and X.S. Wang, A data model for supporting on-line analytical processing, Proc.
Conference on Information and Knowledge Management, Baltimore, MD, 81-88, 1996.
16 D. Maier, J. D. Ullman, and M. Y. Vardi, On the foundation of the universal relation model,
ACM Trans. on Database System (TODS), 9(2), 283-308, 1984.
17 T.B. Nguyen, A.M. Tjoa and R.R. Wagner, An Object Oriented Multidimensional Data
Model for OLAP, Proc. 1st International Conference on Web-Age Information Management
(WAIM'00), Shanghai, China, June 2000. LNCS, Springer, 2000.
18 A. Savinov, Analytical space for data representation and interactive analysis, Computer
Science Journal of Moldova, 10(1), 59-80, 2002.
19 A. Savinov, Principles of the Concept-Oriented Data Model, Technical Report, Institute of
Mathematics and Informatics, http://conceptoriented.com/savinov/publicat/imi-report’04.pdf
2004.
20 A. Savinov, Logical Navigation in the Concept-Oriented Data Model. Journal of Conceptual
Modeling, http://www.inconcept.com/jcm, 2005.
21 D.W. Shipman, The Functional Data Model and the Data Language DAPLEX. ACM Trans-
actions on Database Systems, 6(1), 140–173, 1981.
22 R. Torlone, Conceptual multidimensional models, In: Multidimensional Databases: Prob-
lems and Solutions, Idea Group, 69–90, 2003.
... We address this problem by rethinking dimensions and dimensionality (degrees of freedom) as first-class notions of the model which play a primary role for describing both transactional and analytical aspects. In COM, data is thought of as originally existing in multidimensional space (Savinov, 2005a) so that we always can say what coordinates this element has and how many dimensions this schema has. Also, existing multidimensional models provide weak support for ad-hoc agile analytics because they are intended for describing predefined (OLAP) analysis scenarios expressed in terms of cubes, dimensions, measures and aggregation types. ...
... The short answer is that partial order is needed for data semantics where formal structure gets some meaning and all elements have formal definitions in terms of other elements. In particular, partial order is used for representing hierarchical multidimensional spaces (Savinov, 2005a). The meaning of an element is defined by its coordinates which in turn are treated as points with their own coordinates. ...
Technical Report
Full-text available
Concept-oriented model of data (COM) has been recently defined syntactically by means of the concept-oriented query language (COQL). In this paper we propose a formal embodiment of this model, called nested partially ordered sets (nested posets), and demonstrate how it is connected with its syntactic counterpart. Nested poset is a novel formal construct that can be viewed either as a nested set with partial order relation established on its elements or as a conventional poset where elements can themselves be posets. An element of a nested poset is defined as a couple consisting of one identity tuple and one entity tuple. We formally define main operations on nested posets and demonstrate their usefulness in solving typical data management and analysis tasks such as logic navigation, constraint propagation, inference and multidimensional analysis.
... This partial order relation leads to significant simplification of the model by unifying many existing data modeling patterns. It can be interpreted in terms of multidimensional spaces [Sav05b], containment relation (inclusion by-reference) or as relationships among ele-ments (as they are understood in the entity-relationship model). Partial order is also an important part underlying such mechanisms as logical navigation [Sav05c] and inference [Sav06]. ...
Preprint
Full-text available
The main goal of concept-oriented programming (COP) is describing how objects are represented and accessed. It makes references (object locations) first-class elements of the program responsible for many important functions which are difficult to model via objects. COP rethinks and generalizes such primary notions of object-orientation as class and inheritance by introducing a novel construct, concept, and a new relation, inclusion. An advantage is that using only a few basic notions we are able to describe many general patterns of thoughts currently belonging to different programming paradigms: modeling object hierarchies (prototype-based program-ming), precedence of parent methods over child methods (inner methods in Beta), modularizing cross-cutting con-cerns (aspect-oriented programming), value-orientation (functional programming). Since COP remains backward compatible with object-oriented programming, it can be viewed as a perspective direction for developing a simple and natural unified programming model.
... In the context of this paper, the most important property of partial order is that it can be used for representing multidimensional hierarchical spaces (Savinov, 2005a). The basic idea is that greater elements are interpreted as coordinates with respect to their lesser elements which are interpreted as points. ...
Preprint
Full-text available
In spite of its fundamental importance, inference has not been an inherent function of multidimensional models and analytical applications. These models are mainly aimed at numeric (quantitative) analysis where the notions of inference and semantics are not well defined. In this paper we argue that inference can be and should be integral part of multidimensional data models and analytical applications. It is demonstrated how inference can be defined using only multidimensional terms like axes and coordinates as opposed to using logic-based approaches. We propose a novel approach to inference in multidimensional space based on the concept-oriented model of data and introduce elementary operations which are then used to define constraint propagation and inference procedures. We describe a query language with inference operator and demonstrate its usefulness in solving complex analytical tasks.
... In COQL, product is written as a number of sets enclosed in parentheses and optionally prefixed by the CUBE or PRODUCT keyword. For example (Fig. 5), given two source collections with countries and product categories we can produce a 2-dimensional set where every element is a combination of one country and one produce category: The main application of the product operation is multidimensional analysis where the task is to group facts over cells of the cube and then compute some numeric aggregated characteristic (measure) for this group (Savinov, 2005). Grouping is performed by de-projecting a cell down to the facts. ...
... COM (2009a) is a general purpose unified data model aimed at describing many existing views and patterns of thoughts currently used in data modeling. As a unified model, its main goal is to significantly decrease differences and incongruities between various approaches to data modeling such as transactional, analytical (Savinov, 2011b), multidimensional (Savinov, 2005a), object-oriented (Savinov, 2011a), conceptual and semantic (Savinov, 2012c). The motivation behind COM and its practical benefits are similar to those for the Business Intelligence Semantic Model (BISM) (Russo, Ferrari, & Webb, 2012) introduced in Microsoft SQL Server 2012. ...
Chapter
Full-text available
... An approach described in this paper is being developed within a new paradigm which covers several branches in computer science including programming, data modelling [1,2,3] and system design. The conceptoriented paradigm is based on several general principles that distinguish it from the currently existing theories and approaches. ...
Article
Full-text available
In the paper we describe a new construct which is referred to as concept and a new concept-oriented approach to program-ming. Concept generalizes conventional classes and consists of two parts: an objects class and a reference class. Each concept has a parent concept specified via inclusion relation. Instances of reference class are passed by value and are intended to rep-resent instances of child object classes. The main role of con-cepts consists in indirecting object representation and access. In concept-oriented programming it is assumed that a system con-sists of (i) conventional target business methods (BMs), and (ii) hidden representation and access (RA) methods. If conventional classes are used to describe only BMs then concepts allow the programmer to describe both types of functionality including its hidden intermediate functions which are automatically executed when objects are being accessed.
... Currently there exist several general approaches to data modelling based on different principles and main notions such as relations in the RM [5], entity and relationships in the ERM [4], facts and object roles in the ORM [12], subject-predicate-object triples in the RDF [2] and many others. In this paper we describe a new approach to data modelling proposed in [16,17,18] and called the concept-oriented data model (COM). The COM belongs to a set of approaches based on using dimension (degree of freedom) as the main construct for data modelling. ...
Conference Paper
Full-text available
In the paper we describe the problem of grouping and aggregation in the concept-oriented data model. The model is based on ordering its elements within a hierarchical multidimensional space. This order is then used to define all its main properties and mechanisms. In particular, it is assumed that elements positioned higher are interpreted as groups for their lower level elements. Two operations of projection and de-projection are defined for one-dimensional and multidimensional cases. It is demonstrated how these operations can be used for multidimensional analysis.
Article
Full-text available
The paper describes logical navigation in the concept-oriented data model. This model explicitly and formally separates physical structure and logical structure so that each element of the model is simultaneously a collection and a combination of other elements. The physical structure is used to representing and access by elements by means of references. The logical structure is used to reflect the problem domain dependencies. The two-level model considered in the paper consists of a set of concepts and a set of items. Concept structure defines the model syntax while item structure defines its semantics. In the paper it is shown how the properties of the model can be used for logical navigation where we do not need to specify join conditions or other complicated parameters of queries.
Chapter
In the paper we describe a novel query language, called the concept-oriented query language (COQL), and demonstrate how it can be used for data modeling and analysis. The query language is based on a novel construct, called concept, and two relations between concepts, inclusion and partial order. Concepts generalize conventional classes and are used for describing domain-specific identities. Inclusion relation generalized inheritance and is used for describing hierarchical address spaces. Partial order among concepts is used to define two main operations: projection and de-projection. We demonstrate how these constructs are used to solve typical tasks in data modeling and analysis such as logical navigation, multidimensional analysis and inference.
Article
Full-text available
In the article the notion of analytical space is introduced and its application to data representation and interactive analysis is studied. Analytical space is defined through membership relation among its elements where each element is characterised by its extensional and intensional. All data element properties are derived from this fundamental relation. Inference in analytical space is thought of as finding mapping from one subspace into another and is carried out through propagation of information by means of deaggregation and aggregation operations. The general goal of data analysis is defined to be simplifying the form of representation with simultaneous retaining most of the data semantics. An interactive data analysis procedure is proposed, which is based on selecting interesting subspaces in analytical space, finding mapping from this space into the range of values by means of inference and finally visualising this mapping in an appropriate form.
Article
Full-text available
The paper describes logical navigation in the concept-oriented data model. This model explicitly and formally separates physical structure and logical structure so that each element of the model is simultaneously a collection and a combination of other elements. The physical structure is used to representing and access by elements by means of references. The logical structure is used to reflect the problem domain dependencies. The two-level model considered in the paper consists of a set of concepts and a set of items. Concept structure defines the model syntax while item structure defines its semantics. In the paper it is shown how the properties of the model can be used for logical navigation where we do not need to specify join conditions or other complicated parameters of queries.
Article
Full-text available
One problem concerning the universal relation assumption is the inability of known methods to obtain a database scheme design in the general case, where the real-world constraints are given by a set of dependencies that includes embedded multivalued dependencies. We propose a simpler method of describing the real world, where constraints are given by functional dependencies and a single join dependency. The relationship between this method of defining the real world and the classical methods is exposed. We characterize in terms of hypergraphs those multivalued dependencies that are the consequence of a given join dependency. Also characterized in terms of hypergraphs are those join dependencies that are equivalent to a set of multivalued dependencies.
Chapter
It is generally observed throughout the world that in the last two decades, while the average speed ofcomputers has almost doubled in a span of around eighteen months, the average speed of the networkhas doubled merely in a span of just eight months! In order to improve the performance, more and moreresearchers are focusing their research in the field of computers and its related technologies. DataMining is one such research area. It extracts useful information the huge amount of data present in thedatabase. The discovered knowledge can be applied in various application areas such as marketing,fraud detections and customer retention. It discovers implicit, previously unknown and potentially usefulinformation out of datasets. Recent trend in data mining include web mining where it discover knowledgefrom web based information to improve the page layout, structure and its content.
Article
Future users of large data banks must be protected from having to know how the data is [sic] organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the interned representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. Inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced.
Article
Although central to the current direction of dependency theory, the assumption of a universal relation is incompatible with some aspects of relational database theory and practice. Furthermore, the universal relation is itself ill defined in some important ways. And, under the universal relation assumption, the decomposition approach to database design becomes virtually indistinguishable from the synthetic approach.
Article
We describe how to express constraints in a functional (semantic) data model, which has a working implementation in an object database. We trace the development of such constraints from being integrity checks embedded in procedural code to being something declarative ...