All content in this area was uploaded by Veslava Osinska on Aug 20, 2014
Veslava Osinska
Nicolaus Copernicus University, Torun, Poland
Joanna Dreszer-Drogorob
Nicolaus Copernicus University, Torun, Poland
Grzegorz Osinski
College of Social and Medial Culture, Torun, Poland
Michal Gawarkiewicz
Nicolaus Copernicus University, Torun, Poland
Cognitive Approach in Classification Visualization:
end-users study
Abstract: Visualization of scientific information extends the possibility of exploring how science is
organized and how it changes over time. Classified data in particular have great potential for
discovering the structure and dynamics of a specified domain. The authors applied a previously
presented and tested method of mapping the ACM CCS (Computing Classification System) onto a
sphere surface. Classified documents form patterns according to their semantic similarity. Two main
goals for the obtained visualizations were identified. First, they can be used as a multiperspective
analytical tool for the original classification and its structure. Second, the classification sphere might
be considered an ergonomic interface for exploring scientific resources and for information retrieval.
The obtained graphical representations provide quantitative material for analyzing the development
and dynamics of the classification. The authors seek reliable tools to evaluate the visualization. They
constructed an appropriate interface and surveyed distinct groups of users, who were asked about
key aspects of the visualization layout and its changes. The results of our study make it possible to
evaluate the classification visualization, and thereby to improve the proposed methodology and to
discover new semantic features and laws in the visual layout.
Keywords: visualization interface; Infoviz; mapping; classification scheme; ACM Computing
Classification System.
1. Introduction
Information Visualization is one of the stages in the data analysis process and
delivers research material in a graphical form. Data correlations and hidden
structure can be discovered using visualization layouts with more or less
complex distributions of nodes. Highly perceptible attributes such as colour, size and
location encode the main properties of the data. Is the interpretation of such graphical
patterns always coherent and correct? In any case, the author(s) play the main part
in interpretation, and therefore the final conclusions involve subjective aspects.
The question “Do you see the same as me?” relates to the problem of
evaluation in visual analysis. Computer scientists narrow down the evaluation
of visualization to a comparison of layouts created by means of different metrics
and/or mapping algorithms (Boyack et al., 2005; Börner, 2010). However, this
approach does not resolve the problem of subjective interpretation because,
again, the end-analyst is a person who knows the main research problem and
may apply his or her own assessment scheme (Szelag et al., 2010).
Meteorological maps are intended for a large audience. They usually exploit
cognitive mechanisms, so the data they represent are easily understood by
all users. Examples of science maps (Exhibit Purpose and Goals, 2007) are
helpful in studying pattern recognition by humans.
The authors’ observation of participants in the Places@Spaces exhibition leads to
the conclusion that computer scientists take more interest in scientific data
landscapes than Humanities and Social Sciences specialists do. This is probably
caused by habituation to visual communication among information technology
practitioners (Carretié et al., 2003).
Birger Hjørland (2013) underlined the role of user-based studies in Knowledge
Organization. The cognitive approach has a long tradition (since the 1970s) in
library and information science (Hjørland, 2013). In the past two decades, the
user-centered tendency has become influential in broader sectors of society, for
example technology, business organizations, markets and education.
According to Birger Hjørland, Knowledge Organization research must include
subjective aspects, and is therefore based on “collective views in discourse
communities” (Hjørland, 2013).
This paper continues the study of a visualization interface for classified
articles based on a new topology. The visualizations obtained for the computer science
classification and the related classified articles have been described from different
epistemological perspectives in a series of papers (Osinska & Bala, 2010;
Osinska, 2010). The authors discern two main practical applications of this
novel approach. The first is visual analysis of the classification, which
comprises the study of scheme evolution and dynamics and the consequent
improvement of the scheme. The second concerns a document navigation space and
semantic retrieval. Because of the specialized nature of the interface, the authors focus
on the first issue, i.e. visual analysis of the classification and its reliable evaluation.
In classification visualization, where a high level of abstraction is
involved, interpreting the graphical distribution of classes is extremely difficult.
The authors have therefore decided to include different groups of users in the
interpretation process.
2. Model of Classification Visualization
The main problem in graphical presentation is the low-dimensional topology
available on a standard computer monitor. Multidimensional data structures are
displayed on a plane; however, the human perceptual system is naturally
adapted to spherical vision (Ware, 2004). The sphere surface was chosen as the
target visualization space because of its good ergonomic properties.
A theoretical basis for the dimension reduction of large-scale data was proposed by
Thurston (1997). The analysis method applied here tries to exploit
human perception and cognition mechanisms through 3D computer graphics
and an interactive interface. This combination leads to natural and efficient
human-computer interaction. Properties of the sphere surface such as
homeomorphism preserve cohesion in topological terms. A similar topological
space is used in visualizations such as the Large Map of Science (Klavans & Boyack,
2006) and the Circos application dedicated to comparative genomics (Krzywinski et al.,
2009). Such a layout is currently a standard visualization model for large data sets.
In the presented study, a well-known cognitive problem of Cartesian
coordinates was eliminated by using nonlinear metrics (Osinska & Bala, 2010).
The datasets consisted of documents classified according to the Computing Classification System
and retrieved from the Association for Computing Machinery (ACM) Digital Library. Metadata
such as thematic categories (classes), title, abstract, subject
descriptors and keywords were used. The original conception was based on the
similarity of co-classes and on the assumption that the degree of similarity is proportional to
the number of common publications (Osinska & Bala, 2010). The total number
of classes and subclasses in the collection determined the dimension of the co-class
similarity matrix. To place the nodes of classes and documents on the
sphere surface, a multidimensional scaling technique was applied.
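The two steps described above, building a co-class similarity matrix from common publications and scaling it onto a sphere, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors’ implementation: similarity is taken as the raw count of documents classified under both classes, converted into a target angle by the hypothetical mapping pi * (1 - s / s_max), and nodes are placed by a generic stress-minimizing MDS constrained to the unit sphere (projected gradient descent on chord lengths). The paper does not specify its exact MDS variant or nonlinear metric; all function names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def coclass_similarity(docs, classes):
    """Count, for every pair of classes, the documents classified under both.

    docs: list of sets of class labels, one set per document.
    Returns a symmetric matrix S with S[i][j] = number of common publications
    of classes[i] and classes[j]; the diagonal holds each class's own count.
    """
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    s = [[0] * n for _ in range(n)]
    for labels in docs:
        ids = sorted(index[c] for c in labels if c in index)
        for i in ids:
            s[i][i] += 1
        for i, j in combinations(ids, 2):
            s[i][j] += 1
            s[j][i] += 1
    return s


def similarity_to_angle(s):
    """Assumed mapping: the most similar pair gets angle 0, unrelated pairs get pi."""
    n = len(s)
    top = max((s[i][j] for i in range(n) for j in range(n) if i != j), default=0) or 1
    return [[0.0 if i == j else math.pi * (1 - s[i][j] / top) for j in range(n)]
            for i in range(n)]


def sphere_mds(angles, iters=500, lr=0.05, seed=0):
    """Place n points on the unit sphere so that their chord lengths approximate
    the target angles (stress minimization by projected gradient descent)."""
    n = len(angles)
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        pts.append([c / norm for c in v])
    # Two unit vectors separated by angle a are 2*sin(a/2) apart in 3D.
    target = [[2 * math.sin(a / 2) for a in row] for row in angles]
    for _ in range(iters):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [pts[i][k] - pts[j][k] for k in range(3)]
                dist = math.sqrt(sum(c * c for c in diff)) or 1e-9
                # Gradient of sum over pairs of (dist - target)^2 w.r.t. point i.
                coef = 2 * (dist - target[i][j]) / dist
                for k in range(3):
                    grads[i][k] += coef * diff[k]
        for i in range(n):
            # Gradient step, then project back onto the sphere surface.
            v = [pts[i][k] - lr * grads[i][k] for k in range(3)]
            norm = math.sqrt(sum(c * c for c in v)) or 1e-9
            pts[i] = [c / norm for c in v]
    return pts


if __name__ == "__main__":
    classes = ["C", "D", "H"]  # hypothetical subset of ACM CCS main classes
    docs = [{"C", "D"}, {"C", "D"}, {"H"}, {"D", "H"}]  # toy data
    s = coclass_similarity(docs, classes)
    print(s)  # [[2, 2, 0], [2, 3, 1], [0, 1, 2]]
    for label, p in zip(classes, sphere_mds(similarity_to_angle(s))):
        print(label, [round(c, 3) for c in p])
```

With a realistic number of classes and subclasses, the learning rate and iteration count would need tuning, and an established MDS implementation would likely be preferable to this hand-written loop.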
Document nodes formed a pattern according to their thematic similarity. Each
node was coloured according to the main class it belonged to
(Figure 1). The authors investigated similar and distinct research fields as well
as the organization of clusters by means of the obtained graphical patterns. They also
analyzed the dynamics of the classification using data series for different
publication periods at 10-year intervals. The results show that visualizing
classified documents both reveals the organization of digital library content and
makes it possible to identify hierarchical thematic categories.
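Splitting the collection into 10-year publication periods and comparing class distributions between them can be sketched in a few lines of Python. This is only an illustration: the period origin (1960), the share-based change score used to rank “changeable” classes and all function names are assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict


def period_start(year, step=10, origin=1960):
    """Map a publication year to the first year of its period (assumed origin)."""
    return origin + ((year - origin) // step) * step


def class_shares_by_period(docs, step=10, origin=1960):
    """docs: iterable of (year, main_class) pairs.
    Returns {period_start: {main_class: share of that period's documents}}."""
    counts = defaultdict(Counter)
    for year, cls in docs:
        counts[period_start(year, step, origin)][cls] += 1
    shares = {}
    for period, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[period] = {c: k / total for c, k in counter.items()}
    return shares


def most_changeable(shares):
    """Rank classes by the summed absolute change of their share between
    consecutive periods (a hypothetical dynamics score), largest first."""
    periods = sorted(shares)
    classes = {c for p in periods for c in shares[p]}
    change = {}
    for c in classes:
        series = [shares[p].get(c, 0.0) for p in periods]
        change[c] = sum(abs(b - a) for a, b in zip(series, series[1:]))
    return sorted(change, key=lambda c: (-change[c], c))


if __name__ == "__main__":
    toy = [(1988, "C"), (1989, "D"), (1995, "C"), (1997, "H"), (2003, "H"), (2005, "H")]
    shares = class_shares_by_period(toy)
    print(shares)
    print(most_changeable(shares))  # ['H', 'C', 'D']
```

Ties are broken alphabetically so the ranking is deterministic; any other dynamics measure could be substituted for the summed share change.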
Figure 1: Visualization sphere – screenshot of interface for on-line classification
exploration (Application is accessible online:
http://www-users.mat.umk.pl/~garfi/vis2009v3).
3. Methodology
3.1. Application
Figure 1 shows a screenshot of the application interface. The user is
able to rotate the sphere, select different combinations of main classes and
thus analyze the graphical distribution of document nodes. The application is
accessible online in two language versions: Polish and English.
The interface is built with popular Web technologies: HyperText Markup
Language (HTML) with Cascading Style Sheets (CSS) and JavaScript. This ensures
broad browser compatibility, although up-to-date versions of modern web browsers
such as Mozilla Firefox or Google Chrome are recommended. 3D rendering is
provided by the “Canvas K3D library”.
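The rotate-and-inspect interaction can be illustrated with a small Python sketch of what a canvas renderer has to do on each frame: rotate the node coordinates about the vertical axis and map the hemisphere facing the viewer to pixel coordinates. This is a generic orthographic projection written for illustration, assuming the viewer looks along the z axis; it does not reproduce the K3D library’s API, and the function names are hypothetical.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project_visible(points, angle, radius_px, cx, cy):
    """Rotate unit-sphere nodes and project the front hemisphere (z > 0,
    facing the viewer) orthographically onto canvas pixels centred at (cx, cy).
    Canvas y grows downwards, hence the minus sign."""
    screen = []
    for p in points:
        x, y, z = rotate_y(p, angle)
        if z > 0:  # nodes behind the sphere are not drawn
            screen.append((cx + radius_px * x, cy - radius_px * y))
    return screen


if __name__ == "__main__":
    nodes = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
    for step in range(4):  # a quarter turn per frame
        print(step, project_visible(nodes, step * math.pi / 2, 100, 200, 150))
```

Hiding back-facing nodes keeps the local view readable, while rotation brings every region of the sphere to the front in turn.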
3.2. Survey research
Seventeen volunteers were divided into three groups (two student groups,
with participants aged 19 to 26 years, and one expert group),
distinguished according to the subjects’ experience in computer use. The first
group included seven undergraduate university students who had been
recruited from various humanities and social science departments. The second
group consisted of seven undergraduate university students of Mathematics
and Computer Science. All participants in both student groups had been
recruited by word of mouth or through an announcement at Nicolaus Copernicus
University in Torun (Poland), and were offered extra course credits as well
as feedback on their performance in the study tasks. The last group
consisted of three experts from the Computer Science Faculty of the same
university. Subjects from all three groups were tested individually and were
asked about key aspects of the visualization layout by means of a questionnaire.
The survey was aimed at a substantive assessment of how the graphical
representations are interpreted by both individuals and groups. To
obtain uniform quantitative output, a series of closed-ended questions
was constructed. The thematic categories, labeled with the letters A-K, formed a
predefined, exhaustive set of responses, and the multiple-choice answers referred
to these labels (categories/classes). The authors added simple instructions at the
beginning of the questionnaire form to acquaint amateur respondents with the topic.
The ten questions concerned four crucial characteristics of the graphical
distribution of document nodes in the three-dimensional layout: (1)
dynamics of change, (2) degree of clustering, (3) closeness and (4) semantic
correlation between two contemporary computing infrastructures, i.e. “cloud
computing” and “grid computing”. Dynamics was analyzed by seeking
the most changeable patterns across three different publication years. The first
four questions concerned dynamic or permanent distributions. If a category was
judged highly dynamic, it could not be selected again in the question about
non-changing patterns. The next question, “Find overlapping/separate categories
by seeking the most/least colour mixing between groups”, related to significant
clustering and even distribution, which are mutually exclusive. Some document
categories, shown in different colours, were located close to each other, while
others were more distant. The corresponding question was: “Which categories are
the most distant from each other?”. The semantic distribution could be discovered by
tracking the graphical pattern of articles on a selected topic across the years. Cloud
computing, one of the most popular technologies today, evolved from the network
(sub)classes. The last three questions were designed to test this dependence.
The final item of the questionnaire collected respondents’ ideas and comments
about the implementation of the interface.
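The answer format described above (labels A-K plus a negative option, and the rule that a category marked as highly dynamic cannot reappear as non-changing) can be captured as a simple validation step. A minimal sketch: the question identifiers `dynamic` and `stable` and the `none` label are hypothetical, since the paper does not describe how answers were recorded.

```python
CATEGORIES = frozenset("ABCDEFGHIJK")  # the 11 ACM CCS main classes, labeled A-K
NONE = "none"  # hypothetical label for the negative option


def validate_sheet(answers):
    """answers: dict mapping a question id to the set of chosen labels.
    Returns a list of problems; an empty list means the sheet is valid."""
    problems = []
    for qid, chosen in sorted(answers.items()):
        unknown = set(chosen) - CATEGORIES - {NONE}
        if unknown:
            problems.append(f"{qid}: unknown labels {sorted(unknown)}")
        if NONE in chosen and len(chosen) > 1:
            problems.append(f"{qid}: negative option combined with categories")
    # A category judged highly dynamic may not also be picked as non-changing.
    overlap = (set(answers.get("dynamic", ())) & set(answers.get("stable", ()))) - {NONE}
    if overlap:
        problems.append(f"dynamic/stable overlap: {sorted(overlap)}")
    return problems


if __name__ == "__main__":
    print(validate_sheet({"dynamic": {"C", "H"}, "stable": {"B"}}))  # []
    print(validate_sheet({"dynamic": {"C"}, "stable": {"C", "D"}}))
```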
4. Results
The evaluation was based on a comparison of the respondents’ answers with the
correct answers defined by the experts, since only the experts can properly comprehend
and interpret the evolution of the classification system. Accuracy was defined simply as the
percentage of correct answers among all answers. The wide choice of response items
in each question (11 main categories plus a negative option) influenced the
survey results to a large extent. It was observed that respondents in each
group displayed two different cognitive styles while exploring the graphical
patterns. The first involved immediate replies, while the second was
characterized by a longer time taken to select answers. In
the latter case, the choices were more considered. These distinct analytical
approaches caused discrepancies in the results and complicated the evaluation
process.
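Accuracy as defined above is a simple ratio. A minimal sketch, assuming that each respondent’s answer to a question is a set of labels and that only an exact match with the experts’ answer counts as correct (the paper does not state how partially correct multiple-choice answers were scored); the data and names are hypothetical.

```python
def accuracy(responses, key):
    """Accuracy (%) of one question: correct answers / all answers * 100.

    responses: list of answer sets, one per respondent.
    key: the answer set accepted by the experts.
    Assumed scoring rule: an answer is correct only if it matches the key
    exactly; partially matching answers count as incorrect."""
    if not responses:
        raise ValueError("no responses to score")
    key = set(key)
    correct = sum(1 for r in responses if set(r) == key)
    return 100.0 * correct / len(responses)


def accuracy_by_group(responses_by_group, key):
    """Apply the same measure separately to each respondent group."""
    return {group: accuracy(rs, key) for group, rs in responses_by_group.items()}


if __name__ == "__main__":
    key = {"C", "H"}  # hypothetical expert answer
    toy = {"humanities": [{"C", "H"}, {"C"}], "computer science": [{"H", "C"}, {"C", "H"}]}
    print(accuracy_by_group(toy, key))  # {'humanities': 50.0, 'computer science': 100.0}
```

A per-label (partial credit) score would be a natural alternative when the answer space is as wide as eleven categories plus a negative option.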
The final accuracy ranges from 50% to 86% and depends strongly on
the nature of the question. The response sequences given were compared with the
ones accepted by the experts. The lowest accuracy was observed in the two cases
related to clustering and overlapping of thematic categories. Finding clusters
was problematic because the respondents misunderstood the concept and the
question was unclearly formulated. This may explain the comparatively low
accuracy of the related responses. Semantic overlap can be revealed by
tracking the intensity of colour mixing.
According to the experts, some colour combinations make complex patterns
difficult to perceive. This problem concerns the distributions of orange
(H. Information Systems) and aqua (I. Computing Methodologies) nodes.
The highest accuracy was obtained for the following issues: dynamics
characteristics, even distribution and tracking of the cloud computing
patterns. It was noted that Computer Science and Humanities students
displayed different behaviour during interface analysis. Humanities students
focused on the practical applications (information retrieval, Web browsers,
digitization, cataloguing of library resources) and knowledge domains (Library
and Information Science, computer science, robotics, education) of the presented
visualization, while Computer Science students were more interested in the
working principle of the application and ignored the interface’s technical
weaknesses, such as long delays or the lack of a grid on the sphere.
5. Discussion and Conclusion
The obtained visualization maps might be used for a methodological study of the original
classification: its development, structure and dynamics, as well as its efficiency
for users. Classified documents form a complex pattern according to
their thematic similarity. The main assumption is that the output graphical pattern,
produced in a suitable topological space (the sphere surface in the present paper),
corresponds to the semantic structure of the classification. Professionals who are
competent in both computer science and the science of science are able to
comprehend the output visualization maps and interpret them in the most
rational manner; the experts play this role in the proposed experiment.
Amateur users are simultaneously involved in the process of visual reading.
The main objective is to compare non-professional users’ perception of the
maps with the experts’ insight. The results could provide a basis for a
quantitative evaluation of the classification visualization approach.
The accuracy of responses did not fall below 50%. The highest accuracy (86%)
was obtained for both the dynamics and the cloud/grid computing distribution analysis.
Observations of clustering and overlapping of (sub)classes showed the greatest
discrepancy (lowest accuracy). The corresponding questions were biased by insufficient
understanding and by perceptual mechanisms; for instance, the scattered nodes of an
isolated category were perceived as a cluster. In addition, distance estimation
(the proximity task) on the sphere surface is problematic because of the lack of texture,
coordinates and a reference point (something like the Greenwich meridian). The experts
also noticed the difficulty of differentiating colours on a black background.
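Where users struggled to judge proximity and clustering by eye, a reference answer could in principle be computed from node coordinates. A minimal Python sketch, assuming nodes are stored as unit vectors grouped by main class: the great-circle (arc) distance between class centroids answers the “most distant categories” question, and the mean arc distance from a class centroid gives a rough spread measure that separates compact clusters from scattered nodes. This is an illustration only, not the procedure used by the authors or the experts, and the function names are hypothetical.

```python
import math
from itertools import combinations


def arc(u, v):
    """Great-circle distance (radians) between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp rounding error


def centroid(points):
    """Spherical centroid: the normalized mean of the unit vectors."""
    s = [sum(p[k] for p in points) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in s))
    if norm == 0:
        raise ValueError("centroid undefined: vectors cancel out")
    return tuple(c / norm for c in s)


def spread(points):
    """Mean arc distance of a class's nodes from their centroid.
    Small values suggest a compact cluster, large values scattered nodes."""
    c = centroid(points)
    return sum(arc(p, c) for p in points) / len(points)


def most_distant_pair(nodes_by_class):
    """Pair of classes whose centroids are farthest apart on the sphere."""
    cents = {k: centroid(v) for k, v in nodes_by_class.items()}
    return max(combinations(sorted(cents), 2),
               key=lambda pair: arc(cents[pair[0]], cents[pair[1]]))


if __name__ == "__main__":
    toy = {  # hypothetical node positions
        "C": [(1.0, 0.0, 0.0)],
        "D": [(0.0, 1.0, 0.0)],
        "H": [(-1.0, 0.0, 0.0)],
    }
    print(most_distant_pair(toy))  # ('C', 'H')
    print(round(spread([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]), 4))  # 0.7854
```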
Apart from these weaknesses, the visualization interface was assessed positively by
users. Interactivity and the spherical configuration make exploration convenient.
By rotating the sphere, a user can see the graphical pattern created by all
categories and simultaneously investigate the local distribution of nodes.
This technique is widely used in Infoviz and is called focus+context
(Osinska, 2010; Young, 1996).
This pilot study was a first step towards answering the question of how to validate
this visualization method and interface. Distinct groups of users investigated the
classification visualization using an interactive interface, and they demonstrated distinct
cognitive styles according to their individual cognitive processes. The authors
plan to extend the experiment to a large group of users (more than one hundred).
In parallel, all of the experts’ observations and suggestions must be taken into
consideration. Planned improvements of the survey and the interface include:
- clearer formulation of concepts such as clustering, overlapping and
closeness;
- consideration of the distinct cognitive styles of users;
- sphere rendering including a grid, coordinates and reference points;
- a textured sphere surface;
- more perceptible colour combinations.
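Candidate palettes for more perceptible colour combinations could be screened before a new study. A minimal sketch using the WCAG 2.x relative-luminance contrast ratio; the hex values standing in for “orange” and “aqua” are assumptions (the paper does not give exact colours), the 3.0 threshold is an assumed cut-off, and luminance contrast is only one factor, since hue differences also affect how easily colours are told apart.

```python
from itertools import combinations


def _linear(channel):
    """sRGB channel (0-255) to linear light, as in the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_colour):
    """WCAG relative luminance of a colour given as '#RRGGBB'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast(a, b):
    """WCAG contrast ratio between two colours, from 1 (none) to 21."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def low_contrast_pairs(palette, threshold=3.0):
    """Pairs of class colours whose mutual contrast ratio is below threshold."""
    return [(x, y) for x, y in combinations(sorted(palette), 2)
            if contrast(palette[x], palette[y]) < threshold]


if __name__ == "__main__":
    palette = {"H": "#FFA500", "I": "#00FFFF"}  # assumed orange and aqua
    for cls, colour in palette.items():
        print(cls, "vs black:", round(contrast(colour, "#000000"), 2))
    print("low-contrast pairs:", low_contrast_pairs(palette))
```

Under these assumed values both colours stand out well against black, yet their contrast with each other is low, which is at least consistent with the difficulty the experts reported.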
The proposed visualization method, which provides a nonlinear exploration space,
can be very useful for librarians, classifiers, information specialists and all
scientists from interdisciplinary research fields.
References
Börner, K. (2010). Atlas of Science: Visualizing What We Know. Cambridge, MA: MIT Press.
Börner, K. (2000). Extracting and Visualizing Semantic Structures in Retrieval Results for
Browsing. In: Proceedings of the Fifth ACM Conference on Digital Libraries. New York, NY:
ACM. Available at:
ftp://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/dl/p234-borner.pdf
Boyack, K. W. et al. (2005). Mapping the backbone of science. Scientometrics, 64(3),
pp. 351-374. Available at: http://scimaps.org/exhibit/docs/05-boyack.pdf
Hjørland, B. (2013). User-based and Cognitive Approaches to Knowledge Organization: A
Theoretical Analysis of the Research Literature. Knowledge Organization, 40(1).
Carretié, L. et al. (2003). Cerebral patterns of attentional habituation to emotional visual
stimuli. Psychophysiology, 40, pp. 381-388.
Klavans, R.; Boyack, K. (2006). Quantitative Evaluation of Large Maps of Science.
Scientometrics, 68(3), pp. 475-499. Available at:
http://www.researchgate.net/publication/220365101_Quantitative_evaluation_of_large_maps_of_science/file/d912f50757fca9ec7a.pdf
Krzywinski, M. et al. (2009). Circos: an information aesthetic for comparative genomics.
Genome Research, 19(9). Available at:
http://genome.cshlp.org/content/early/2009/06/15/gr.092759.109.full.pdf+html
Osinska, V. (2010). Visual Analysis of Classification Scheme. Knowledge Organization,
37(4).
Osinska, V.; Bala, P. (2010). New Methods for Visualization and Improvement of
Classification Schemes – the case of computer science. Knowledge Organization,
37(3).
Exhibit Purpose and Goals [online] (2007- ). Places@Spaces: Mapping Science.
Available at: http://www.scimaps.org/.
Szelag, E.; Dreszer, J.; Lewandowska, M.; Medygral, J.; Osinski, G.; Szymaszek, A.
(2010). Time and Cognition from the Aging Brain Perspective: Individual Differences.
In: Personality from Biological, Cognitive and Social Perspectives. Eliot Werner
Publications, pp. 331-364.
Thurston, W. (1997). Three-Dimensional Geometry and Topology, Vol. 1. Princeton
Mathematical Series 35. Princeton, NJ: Princeton University Press.
Ware, C. (2004). Information Visualization: Perception for Design. Morgan Kaufmann,
pp. 11, 188, 273.
Young, P. (1996). Three Dimensional Information [online]. Department of Computer
Science. Available at: http://vrg.dur.ac.uk/misc/PeterYoung/pages/work/documents/lit-survey/IV-Survey/
About authors
Veslava Osinska is an assistant professor at the Institute of Information Science and Book
Studies, Nicolaus Copernicus University in Torun, where she teaches information architecture,
information visualization, ICT and computer graphics. She has a degree in physics and holds a
PhD in library and information science. Her research interest is in modern information and
knowledge domain visualization with particular interest in applications of nonlinear properties to
information organization and streaming. Veslava is also a member of the Polish Chapter of the
International Society for Knowledge Organization and the Polish Computer Science Society.
Joanna Dreszer-Drogorob holds a PhD in psychology. She is an assistant in the Multimedia Lab
at the Fine Arts Department at the Nicolaus Copernicus University in Torun where she also teaches
cognitive psychology. Her research interests are in developmental aspects in time perception,
neural basis of human cognition, intelligence, cognitive abilities. She is also interested in the
dynamics in social psychology, such as dynamics in attitudes, feelings and self-esteem.
Grzegorz Osinski is a computer scientist, neuroscientist and physicist. His current research is in
nonlinear dynamical systems in the biomedical sciences, where he applies numerical methods and
computer simulations to model the behaviour of neural correlates in different dynamical states. His
other interests are in cognitive aspects of neural activity in perception and communication
processes.
Michal Gawarkiewicz holds a degree in computer science from the Nicolaus Copernicus
University in Torun. His master’s thesis was on semantic memories in a narrow domain using
machine-readable information. His current PhD studies are on database optimizations. Michal also
teaches programming classes and works in the area of data processing and mobile technologies.