Visual Topic Maps Layer between Document Collections
and Learning Material
Sachit Rajbhandari
FAO/NAiST Lab, Kasetsart
Univ.
50 Phaholyothin Rd.
Bangkok, Thailand
sachit.rajbhandari@gmail.com
Frederic Andres
NII
2-1-2 Hitotsubashi,
Chiyoda-ku
Tokyo 101-8430, Japan
andres@nii.ac.jp
Asanee Kawtrakul
NAiST Research Laboratory,
Kasetsart University
P.O. Box 1212
Bangkok, Thailand
asanee.kawtrakul@nectec.or.th
ABSTRACT
This paper introduces a semantic layer, called visual topic
maps, to build a bridge between annotated document col-
lections and the use of these documents as learning ma-
terial. The main components of a visual classification are
metadata-based topic maps attached to documents that al-
low customization according to users’ needs and profiles.
The metadata of documents and visual topic maps are based
on MPEG7 and its Semantic Descriptor including additional
attributes on instructional information. These generic meta-
data are complemented by domain specific metadata based
on the domain of the document collection. The paper ex-
plains the visual topic maps concept and sets it in context
to research on learning objects, metadata and educational
modeling languages. The structure of visual topic maps and
their metadata are then discussed in more detail and the
process of building visual topic maps and using them is out-
lined.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous;
I.7.5 [Document Capture]: Document analysis
General Terms
Management, Standardization
Keywords
Visual Semantic, Visual Topic Maps, ISO standard
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICIKM ’08 Kathmandu, Nepal
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.
1. INTRODUCTION
The context for the research presented in this paper is the
"Memory of the Past" project being carried out at the
National Institute of Informatics, Japan. This project aims at
collaborative building of a multi-lingual, multi-cultural
digital memory of the past by school pupils, teachers and
multi-disciplinary experts. A wide variety of techniques are
employed to present the history: ancient maps and drawings
are digitized, 3D animations of artefacts are constructed,
historical documents are analyzed, images and videos are
recorded to show the present condition of historical sites,
etc. The resulting document collection needs to be seman-
tically enriched and annotated using metadata to facilitate
the search for documents. Currently, the documents are
being annotated using standardized metadata schemes such as
Dublin Core¹ and LOM (2002)², and partly with
specifically developed metadata schemes such as the
ontology-based metadata schemes developed in [10]. From the
perspective of eLearning, a project such as the "Memory of
the Past" project delivers a very large archive of potential
learning objects. The challenge that each individual teacher
faces is to locate the appropriate information using a
semantic search engine. This task is compounded by several
factors: the sheer number of documents of various types
and content, the distribution of documents and their
metadata across multiple sites, the limitations of standardized
metadata, the lack of context for standardized metadata,
the restricted time available to 'surf' and search for
resources, the variety of languages connected with such a
project, and possibly the lack of domain knowledge in highly
specialized areas. The research presented in this paper aims
to address some of these issues and to facilitate access
to information in very large collections of documents
relating to a common subject area, as exemplified by
the "Memory of the Past" project. The idea
is to introduce a semantically rich layer, informally called
'visual topic maps', between the document collection and the
learning material, which links documents according to topics. One
motivation behind this approach is to add a more focused
semantic layer on top of the untargeted metadata that are
commonly used to describe single documents. In
an eLearning context, the visual topic maps build on
learning objects and become information sources and building
blocks for stories and learning material. The implementation
and practical work were carried out using a semantic extension
of TM4L [5], as TM4L is one of the most popular topic map
editors currently available. In the remainder of the paper,
Section 2 outlines the background of "Visual Topic Maps"
and a discussion emphasizes the thinking behind the
concept, whose main aspects are explained thereafter. Then,
the topic map implementation is introduced in Section 3,
followed by Section 4 on authoring and use of topic maps. A
summary and plans for future research conclude the paper
in Section 5.
¹ Dublin Core http://dublincore.org
² IEEE Learning Technology Standards Committee (LTSC)
http://ltsc.ieee.org/wg12
2. WHY VISUAL TOPICS?
Visual topics are a form of semantics in which the knowledge
base is driven by visual features. These characteristics
support our idea of providing an interpretive, semantic layer on
top of document collections that classifies these documents
according to scope, context and constraints. The term
'visual topic' also matches well the call for 'subjective topic'
or 'subjective metadata', which we will outline in later
sections of this paper. For learners [11], visual topic
maps support efficient context-based retrieval
of learning resources [1]: textual topics are semantically
limited compared with visual topics, which offer better
awareness in topic-domain browsing at a higher semantic
level. Further advantages relate to information
visualization: customized views, adaptive guidance, and
context-based feedback. For instructors, visual
topic maps improve the effectiveness of managing and
maintaining knowledge and information across several
layers [3]. They enable better personalized courseware
presentations using visual semantics. They can support distributed
courseware development and the reuse and exchange of learning
materials. Finally, collaborative authoring of visual
topic maps is possible, as shown in [4].
3. TOPIC MAPS MODEL
The Topic Maps model is defined by a resource algebra
that handles topic maps produced by semantic computing
approaches such as Latent Semantic Analysis [2]. The resource
algebra is described in the remainder of this section.
Resource Algebra: The resource algebra uses the resources'
domain data types. Resource semantic types and
functions in the topic maps are directly represented using
the appropriate data types and functions supported by
the resource algebra. The algebra has two goals.
First, it is the semantic interface between scientists, intended to
reduce the semantic gap and to strengthen the metadata
bridging between them. Second, this high-level
semantic algebra facilitates the collaborative interaction of
scientists using topic maps that integrate high-level
semantics. Let us recall the notion of a many-sorted
algebra [6]. Such an algebra consists of several sets
of values and a set of operations (functions) between
these sets. It is described by two sets of symbols called
sorts (e.g. topic, pdf, rtf, jpeg) and operators (e.g.
tm_transcribe, semantic_similarity); the function
sections constitute the signature of the algebra. A second-order
signature [7] is based on two coupled many-sorted
signatures, where the top-level signature
provides kinds (sets of types) as sorts (e.g. DATA,
RESOURCE, SEMANTIC_DATA) and type constructors
as operators (e.g. set).
To illustrate the approach, we assume the following
simplified many-sorted algebra as the topic map model
implementation:
Kinds DATA, RESOURCE, SEMANTIC_DATA, TOPIC_MAPS, SET;
Type constructors
  → DATA             topic;
  → RESOURCE         pdf, rtf, htm, xml, cvs, jpeg, tiff;
  → SEMANTIC_DATA    lsi_sm, mpeg7_sm;
  → SEMANTIC_DATA    5w1h_sm;
  → TOPIC_MAPS       tm (topic maps);
  TOPIC_MAPS → SET   set;
Unary operations
  ∀ resource in RESOURCE,
    resource → sm: SEMANTIC_DATA, tm       tm_transcribe;
  ∀ sm in SEMANTIC_DATA,
    sm set(tm) → semantic_similarity;
Binary operations
  ∀ tm in TOPIC_MAPS,
    (tm)+ → tm                             topicmaps_merging;
  ∀ sm in SEMANTIC_DATA, ∀ tm in TOPIC_MAPS,
    sm, tm → tm                            semantic_merging;
  ∀ topic in DATA, ∀ tm in TOPIC_MAPS,
    set(tm) × (topic → bool) → set(tm)     select;
The notation sm: SEMANTIC_DATA means "some type
sm in SEMANTIC_DATA". Thus there is a type mapping
associated with the tm_transcribe operator: the
operator determines the result type within the kind
SEMANTIC_DATA, depending on the given operand
resource types. The topicmaps_merging operation takes
two or more operands that are all topic map values.
The select operation takes an operand of type set(tm) and a
predicate on topics (of type topic → bool), and returns the
subset of the operand set that fulfils the predicate. From the
implementation point of view, the resource algebra is an
extensible library package providing a collection of resource
data types and operations for domain-oriented resource
computation (e.g. in the agriculture field [8]). The major research
challenge will be the formalization and
standardization of cultural resource data types and semantic
operations through ISO standardization.
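The two operations whose signatures are fully spelled out above, topicmaps_merging and select, can be sketched in Python as follows. This is only an illustrative sketch, not the resource algebra library itself: the classes Topic and TopicMap are hypothetical, a topic map is reduced to a plain set of topics, and select is assumed to keep a topic map when at least one of its topics satisfies the predicate.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Topic:
    """A value of sort `topic` (kind DATA). Hypothetical minimal form."""
    name: str


@dataclass(frozen=True)
class TopicMap:
    """A value of sort `tm` (kind TOPIC_MAPS), reduced here to a set of topics."""
    topics: FrozenSet[Topic] = frozenset()


def topicmaps_merging(first: TopicMap, *rest: TopicMap) -> TopicMap:
    """(tm)+ -> tm: merge one or more topic maps by uniting their topics."""
    return reduce(lambda a, b: TopicMap(a.topics | b.topics), rest, first)


def select(maps: Iterable[TopicMap],
           predicate: Callable[[Topic], bool]) -> List[TopicMap]:
    """set(tm) x (topic -> bool) -> set(tm).

    Assumption: a topic map fulfils the predicate when any of its topics does.
    """
    return [tm for tm in maps if any(predicate(t) for t in tm.topics)]


# Usage with made-up topics
temples = TopicMap(frozenset({Topic("temple"), Topic("stupa")}))
old_maps = TopicMap(frozenset({Topic("ancient map")}))
merged = topicmaps_merging(temples, old_maps)
found = select([temples, old_maps], lambda t: t.name == "stupa")
```

The variadic signature mirrors the (tm)+ operand list, and the predicate is passed as an ordinary function, which is the closest Python counterpart of the (topic → bool) operand.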
Representation of Topic Maps: A topic map can be
represented as a triple (T, G, r), where T is the
set of topics, G is the graph representation of T, and
r is a function called the "representation function". The
domain of r is a part of G. The range of r is T. A topic
is defined by a name, by properties, by links to learning
resources and by metadata. The name is textual data.
on what the author can write. The author can use the
narrative to tell facts, provide interpretations, make
comparisons, draw attention or similar. The narratives
of stories are structured into units and paragraphs.
Within a unit a line of argument will be preserved.
Units can be used to tell different aspects of a story.
Definition: A visual topic map is a topic map in which the
type of a topic name is not only a string but may also be a
pointer to a multimedia object; such topics are visual topics. The
semantic vectors related to resources are associated with
each topic. Furthermore, links to learning object
resources are the visual topic's resources. The purpose of
these links is to relate the visual topic closely to the
underlying documents using knowledge management
systems or ontologies [9]. The documents can
provide examples, illustrations, proof, further explanation,
additional material or similar.
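As a rough illustration of this definition, the sketch below attaches an optional multimedia pointer and a semantic vector to each topic and compares two topics by the cosine of their vectors. The class VisualTopic, its field names and the choice of cosine similarity are assumptions made for the example; the paper does not specify how semantic vectors are compared.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class VisualTopic:
    """Hypothetical topic record: a name plus an optional multimedia pointer."""
    name: str
    media_ref: Optional[str] = None  # pointer to a multimedia object, if any
    semantic_vector: List[float] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)  # links to learning objects

    @property
    def is_visual(self) -> bool:
        return self.media_ref is not None


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage with made-up vectors
stupa = VisualTopic("stupa01.jpg", media_ref="images/stupa01.jpg",
                    semantic_vector=[1.0, 0.0, 1.0])
temple = VisualTopic("temple", semantic_vector=[1.0, 0.0, 0.0])
similarity = cosine_similarity(stupa.semantic_vector, temple.semantic_vector)
```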
Figure 1: Setting the language preference in TM4L
4. AUTHORING AND USING "VISUAL TOPIC MAPS"
The concept of Visual Topic Maps was introduced in the
context of the "Memory of the Past" project to enable
organizing, maintaining, and using very large repositories of
digitized images. The project's ambitious goal of building a
rich, multi-lingual, multi-cultural digital memory of the past
by users of different nationalities, backgrounds, and levels of
expertise posed a number of requirements for the authoring
environment for visual semantics, which was designed as an
extension of the Topic Map Editor TM4L to support
visual topic maps. These requirements include implementing
the user interface in various languages, enabling the handling
of visual topics, supporting the author in building a visual
topic map, etc. The environment includes the full functionality of TM4L,
complemented with new features for supporting the creation
and use of visual topic maps.
4.1 User interface internationalization
The user interface is currently implemented in more
than 12 languages, including English, Spanish, German, French,
Traditional and Simplified Chinese, Japanese, Nepali and
Thai. While internationalization of a website or a tool
with a simple interface is straightforward using the Java
internationalization feature, as shown in Figure 1,
the complication in the case of this application comes from
the translation of the object types predefined in English,
which are used in a special way in the application: for example,
the predefined basic relationship types 'class-subclass',
'part-whole', and 'instance-of', which are used for building
the topic hierarchy in the Topics panel.
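One common way to handle such predefined types is to keep a stable internal identifier for each relationship type and translate only the label shown to the user. The Python sketch below illustrates that idea under stated assumptions: the table RELATION_LABELS, the function relation_label and the German and French labels are hypothetical and are not taken from TM4L, which is a Java application.

```python
from typing import Dict

# Stable identifiers map to per-language display labels.
# The translations are illustrative only, not TM4L's actual strings.
RELATION_LABELS: Dict[str, Dict[str, str]] = {
    "class-subclass": {"en": "class-subclass", "de": "Klasse-Unterklasse",
                       "fr": "classe-sous-classe"},
    "part-whole": {"en": "part-whole", "de": "Teil-Ganzes",
                   "fr": "partie-tout"},
    "instance-of": {"en": "instance-of", "de": "Instanz-von",
                    "fr": "instance-de"},
}


def relation_label(relation_id: str, language: str,
                   default_language: str = "en") -> str:
    """Return the display label for a predefined relationship type.

    Falls back to the default language when no translation exists, so the
    application logic can keep working with the stable identifier.
    """
    labels = RELATION_LABELS[relation_id]
    return labels.get(language, labels[default_language])


german = relation_label("part-whole", "de")
fallback = relation_label("instance-of", "ne")  # no Nepali entry in this table
```

Because the hierarchy-building logic refers only to the identifiers, changing the interface language does not affect how the topic hierarchy is constructed.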
4.2 Handling visual topics
A visual topic map contains two types of topics: 'standard'
topics and 'visual' topics. Visual topics represent concepts
as images. Thus a visual topic presents a repository
image and has the image file name as its primary name and
a resource of type 'File Path' containing the path of the
image file. In this way the primary topic name is not tied
to any specific natural language. The term for the concept
reified by that topic, translated into different languages, can
be added as additional topic names, scoped with the
corresponding language topic. The use of scopes allows displaying
only the names scoped with a theme specified by the user
when visualizing the topic map. Thus concept names can
be displayed in different languages by specifying different
scopes. Among the additions implemented in TM4L is the
treatment of topic names and resources. In addition to the
two resource types defined in the Topic Map standard, namely
internal resources that contain text included directly in the
topic map and external resources that specify URLs of web
resources, we have introduced a third type, different from
both of them. It resembles external resources in that it
is not included in the topic map and is specified by its address,
which however is not a URL but the path of a file residing on
the local machine where TM4L is installed. The type of the
file is one of JPEG, GIF, or PNG. Topics represent concepts.
In a conventional topic map, a concept is reified with
a topic, which is named with the term that is used to name
the concept. In a visual topic map, a concept is reified with
an image, which is reified with a topic having as its name the
name of the image file. Since a file name often does not reveal
the semantics of the concept (image), it is very important
for topic map authoring that the tool provides a means
of displaying the image. Thus in TM4L, topic name
information is displayed differently for standard and visual
topics. If a topic is a standard topic, then the topic name
string is displayed; if it is a visual topic, then in addition to
the name, the image represented by the topic is displayed
(see Figure 2). By seeing the image, the author can
identify the concept and subsequently add additional names
and/or annotate it.
Figure 2: View of the name information of a visual
topic
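The naming scheme described in this section can be sketched as a small data structure: the primary name is the image file name, a 'File Path' resource holds the image path, and translated names are stored per scope. This is a minimal sketch, not TM4L's implementation; the class ScopedTopic, the helper make_visual_topic and the accepted file suffixes are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Optional

# Assumed suffixes for the JPEG, GIF and PNG file types named in the text.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".png"}


@dataclass
class ScopedTopic:
    primary_name: str
    file_path: Optional[str] = None  # 'File Path' resource; None for standard topics
    scoped_names: Dict[str, str] = field(default_factory=dict)  # scope -> name

    @property
    def is_visual(self) -> bool:
        return self.file_path is not None

    def display_name(self, scope: Optional[str] = None) -> str:
        """Name scoped with the requested theme, else the primary name."""
        return self.scoped_names.get(scope, self.primary_name)


def make_visual_topic(path: str) -> ScopedTopic:
    """Create a visual topic whose primary name is the image file name."""
    image = PurePosixPath(path)
    if image.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"not a supported image file: {path}")
    return ScopedTopic(primary_name=image.name, file_path=path)


# Usage with a made-up image path
topic = make_visual_topic("images/temples/stupa01.jpg")
topic.scoped_names["en"] = "stupa"
english = topic.display_name("en")
unscoped = topic.display_name("ja")  # no Japanese name yet: falls back to file name
```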
Similarly, in the visualization of the topic map, topic names
are displayed for all 'standard' topics, while for
visual topics icons of the images that they represent are
displayed. For each icon, the corresponding full-size image
can be displayed by selecting the "View image" option of the
context-sensitive menu (see Figure 3).
Figure 3: Topic Map visualization
4.3 Support in building visual topic maps
As already mentioned, the concept of visual topic
maps was introduced to facilitate the structuring and use of
large collections of images. Since the visual topics are
represented in the topic map by the paths of the corresponding
images, creating such a map by hand is time-consuming and
tedious work for the author. On the other hand, such a
representation allows automatic extraction of a draft of
the topic map. In implementing this functionality we took into
consideration the fact that in many cases images are already
classified in subdirectories with meaningful names
(indicating the scope of the images stored there). Thus, in order to
support the author, we implemented automatic creation
of a draft topic map by recursive extraction of the structure
of a specified file directory containing image files organized
hierarchically in subdirectories. Figure 4 displays the dialog
for extracting topics from a file directory. The author
specifies the directory, the name of the relationship type to be
used in building the topic hierarchy, and the root topic to
which the extracted hierarchy should be attached. The
extracted topics are added to the current (newly created or
opened) topic map and, after it is reloaded, are displayed
in the Topic panel (see Figure 5).
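The recursive extraction described above can be sketched in Python as follows. This is a simplified illustration rather than TM4L's code: the function extract_draft_topic_map, its return format and the accepted image suffixes are assumptions, topics are identified by bare directory and file names (so equal names in different folders would collide), and the children of the chosen directory are attached directly to the root topic.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed suffixes for the JPEG, GIF and PNG file types.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".png"}

Association = Tuple[str, str, str]  # (relationship type, parent topic, child topic)


def extract_draft_topic_map(directory: str, relation_type: str,
                            root_topic: str) -> Tuple[List[str], List[Association]]:
    """Walk a directory tree and return the extracted topics and associations.

    Subdirectories become standard topics, image files become visual topics,
    and each one is linked to its parent with the given relationship type.
    """
    topics: List[str] = []
    associations: List[Association] = []

    def visit(folder: Path, parent: str) -> None:
        for entry in sorted(folder.iterdir()):
            if entry.is_dir():
                topics.append(entry.name)
                associations.append((relation_type, parent, entry.name))
                visit(entry, entry.name)
            elif entry.suffix.lower() in IMAGE_SUFFIXES:
                topics.append(entry.name)
                associations.append((relation_type, parent, entry.name))

    visit(Path(directory), root_topic)
    return topics, associations


# Usage on a small temporary directory tree with made-up names
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "temples").mkdir()
    (Path(tmp) / "temples" / "stupa01.jpg").touch()
    (Path(tmp) / "notes.txt").touch()  # ignored: not an image
    topics, associations = extract_draft_topic_map(
        tmp, "part-whole", "Memory of the Past")
```

The three parameters correspond to the three inputs the author provides in the extraction dialog: the directory, the relationship type and the root topic.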
The author can then use this draft map as a starting
point to produce the desired map. For example, the author can
delete unwanted topics, restructure the topic hierarchy (by
adding/deleting parent topics), annotate topics, and create new
relationships between topics. Figure 6 shows the
visualization of the automatically extracted topic map from Figure
5.
Figure 4: Automatic extraction of topics from a file
directory
Figure 5: Visualisation of the topic map as Tree
View
Figure 6: Visualization of the automatically
extracted topic map
5. CONCLUSIONS
The "Visual Topic Maps" concept introduces a new semantic
layer between collections of learning objects and learning
material. The topics link semantically related learning
objects. As learning objects by definition are not restricted in
size, the links from the visual topic maps refer to specific
topics within the learning objects to guide the reader precisely
to relevant sections. Furthermore, visual topic maps provide
the context and scope that are required for the specification
and use of metadata. The visual topic map descriptions can
be seen as rich metadata that annotate the referenced learning
objects. Metadata specific to visual topic maps and
domain-specific metadata allow for the customization of topic maps
according to the needs and scope of individual users. The
visual topic maps themselves form new learning objects that
provide annotation to thematically linked learning objects
stemming from large document collections. The metadata
of visual topic maps are based on MPEG7 for the links to
the multimedia document of the referenced learning objects
and on the MPEG7 Semantic Descriptor for each node of the
topic map. These metadata are complemented by attributes
for instructional information and story semantics. We
envisage that the stories will be written by domain experts and
used by teachers. The term "Visual" was chosen to express
the desire to communicate via a medium, the visual, that is
familiar to everyone and therefore can be easily
authored and easily understood. The visual topic maps provide
the mechanism required for expressing knowledge and
interpretations in a natural way, while the visual topic map metadata
deliver the precision for search and retrieval of both
topics and underlying documents. The visual topics concept
aims at classifying large document collections such as those
provided by the "Memory of the Past" project for the
preservation of knowledge and culture. Work is currently under way
to specify multi-dimensional metadata under the topic
maps model and to collect related data from experts and
pupils. The next step will be for experts and non-experts
such as pupils to enrich the visual topic maps that set the vast
number of single documents or learning objects in context,
to make them more easily accessible.
6. ACKNOWLEDGMENTS
We would like to thank NII for its International
Cooperation support and the Japanese Ministry of Education,
Science and Technology for its support of the visual topic
maps project under the Geomedia project.
7. REFERENCES
[1] H. Allert, H. Dhraief, and W. Nejdl. How are learning
objects used in learning process? Instructional role of
learning objects in LOM. In Proceedings of
EDMedia2002, pages 40–41, Denver, Colorado, USA,
2002. AACE.
[2] F. Andres and M. Naito. Dynamic topic mapping
using latent semantic indexing. In IEEE International
Conference on Information Technology and
Applications, pages 220–225. IEEE, July 2005.
[3] F. Buendia, P. Diaz, and J. V. Benlloch. A framework
for the instructional design of multi-structured
educational applications. In Proceedings of
EDMedia2002, volume 5021, pages 40–41, Denver,
Colorado, USA, 2002. AACE.
[4] C. Dichev, D. Dicheva, and L. Aroyo. Using topic maps for
web-based education. In Int. J. of Advanced
Technology for Learning, volume 1(1), pages 1–7, 2004.
[5] D. Dicheva and C. Dichev. TM4L: Creating and
browsing educational topic maps. In British Journal of
Educational Technology - BJET, volume 37(3), pages
92–109, 2006.
[6] R. H. Güting. Gral: An extensible relational database
system for geometric applications. In Proc. of the 15th
Intl. Conf. on Very Large Databases, pages 33–44,
1989.
[7] R. H. Güting. Second-order signature: a tool for
specifying data models, query processing, and
optimization. In SIGMOD ’93: Proceedings of the
1993 ACM SIGMOD international conference on
Management of data, pages 277–286, New York, NY,
USA, 1993. ACM Press.
[8] A. Kawtrakul, C. Yingsaeree, and F. Andres. Semantic
tracking in peer-to-peer topic maps management. In
DNIS 2007, pages 54–69. Springer LNCS 4777, 2007.
[9] D. Noikongka, D. Thamvijit, A. Imsombut,
M. Suktarachan, S. Rajbhandari, F. Andres, and
A. Kawtrakul. A workbench for collaborative
ontological knowledge construction and maintenance
with authoring tools. In Proceedings of Ontolex
workshop (From Text to Knowledge: The
Lexicon/Ontology Interface), pages 98–107, 2007.
[10] T. Okamura, N. Fukami, C. Robert, and F. Andres.
Digital resource semantic management of Islamic
historical buildings case study on Isfahan Islamic
architecture digital collection. In the International
Journal of Architectural Computing, volume 5 (2),
pages 356–373. Multi-Science Publishing Co Ltd, June
2007.
[11] C. Robert, F. Andres, and K. Veltman. Advances in
collaborative annotation in semantic management
environment. In IEEE/ACM ICDIM’2007, pages
339–344. IEEE, October 2007.