Procedia Computer Science 42 (2014) 247 – 254
Available online at www.sciencedirect.com
1877-0509 © 2014 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/3.0/).
Peer-review under responsibility of the Center for Humanoid Robots and Bio-Sensing (HuRoBs)
doi: 10.1016/j.procs.2014.11.059
ScienceDirect
International Conference on Robot PRIDE 2013-2014 - Medical and Rehabilitation Robotics and
Instrumentation, ConfPRIDE 2013-2014
Multidimensional Data Medical Dataset Using Interactive
Visualization Star Coordinate Technique
Noor Elaiza Abd Khalid^a, Marina Yusoff^b, Ezzatul Akma Kamaru-Zaman^c, Izyan Izzati Kamsani^d,*
^a,b,c,d Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, 40000, Malaysia
Abstract
The twenty-first century has seen tremendous advancement in computer and machine technologies that are able to
produce enormous amounts of data. Current software architecture, management and analysis approaches are unable to
cope with the flood of data. The challenges of understanding large and complex data include clutter,
performance, information loss and limited cognition. The medical field involves analyzing the body system and
includes many different scientists and medical professionals. Its datasets are a hybrid of databases from many
different medical areas, assembled to understand and answer the many questions about the human body. This paper
explores the capability of the interactive star coordinate visualization technique to identify clusters and
correlations between selected attributes in multi-dimensional datasets. The interactive star coordinates design
consists of four stages: Information Objects Transformation, Dimension Mapping, Interactive Features design and
Coloring. Finally, the performance of the interactive star coordinates is compared to histograms of the data of
interest. Interactive star coordinates are found to be a promising method of visualizing information cluster
patterns, which provides one of the means for fast decision making.
© 2014 The Authors. Published by Elsevier B.V.
Keywords: Big data; Clutter; Decision making; Histogram graph; Multidimensional data; Star coordinate technique; Visualization
* Corresponding author. Tel.: +60192692717; fax: +60355435501.
E-mail address: elaiza@tmsk.uitm.edu.my
1. Introduction
Recent advances in technology have produced tremendous amounts of data and information in various fields such as
business, economics, healthcare, biomedicine and bioinformatics [1][2]. Datasets in these fields are usually
multidimensional, with a significantly larger range of attributes [3]. Gaining insight into data is
not just a matter of presenting it, but of translating data into information, as observed by Spence [4]. Foraging for and
unearthing vital hidden links between the diverse variables and parameters within voluminous datasets is a slow,
painstaking, tedious and complex task. Analysing information within a pool of data may introduce complexity in terms
of clutter, performance, information loss and limited cognition [5].
Data visualization provides the means to summarize and interpret large data promptly [6]. Even though
many methods have been developed, the scientific community has yet to create and agree upon standard tools for
data visualization, manipulation and analysis. The existence of multi-dimensional datasets further complicates the
analysis. Thus, more robust visualization techniques are needed to create intelligent visualization designs such as an
interactive view [7], which allows users to view data from different angles and to manipulate axes and attributes. Applying
colors allows instant recognition of similarities or differences among large numbers of data items and expresses attribute
relationships [8]. Interactive visualization is able to represent huge amounts of information coherently and compactly from
different viewpoints, and provides several levels of detail [6]. Star coordinates allow interactive online manipulation
of attribute dimensions [9]. Kandogan [10] found the technique useful for gaining insight (not numerical analysis) into
hierarchically clustered datasets. This research aims to apply the flexibility of the interactive star coordinate data
visualization technique to uncover clusters of hidden associations and relationships between the data within a span
of attributes.
2. Proposed Method
This research work consists of four phases, as illustrated in Figure 1: data collection; interactive star
coordinates design; manipulation of the interactive features; and usability and accessibility.
Fig. 1. Design process for ViStar.
2.1. Data Collection
In this study, six hundred and ninety-nine retrospective datasets of women with early stage (Stage I and II) Estrogen
Receptor Positive (ER+) breast cancer, treated and not treated with tamoxifen monotherapy and diagnosed between 1980 and 1995,
were obtained from the GEO database [11].
2.2. Interactive Star Coordinate engine design phase
This phase consists of four stages: Information Objects Transformation, Dimension Mapping,
Interactive Features design and Coloring.
Stage 1 involves the transformation of information objects from the data file. This involves assigning numerical
values to non-numerical values. Subsequently, the data are arranged into a matrix where columns represent the
dimensions and rows hold the values of each field in a record. Figure 2 shows a matrix of information objects
$P_1, P_2, \ldots, P_n$, each associated with the six fields or attributes of the dataset,
where
$P_n$: n information objects; n = 699.
$F_i$: i attributes or dimensions; i = 6.
Fig. 2. Information objects to matrix transformation.
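The transformation in Stage 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `encode_column` and `to_matrix`, the rule that non-numerical labels are coded in sorted order, and the three sample records are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float]


def encode_column(values: Sequence[Value]) -> List[float]:
    """Return a numeric column; non-numerical values receive integer codes."""
    if all(isinstance(v, (int, float)) for v in values):
        return [float(v) for v in values]
    # Assumed coding rule: codes follow the sorted order of the distinct labels.
    labels = sorted({str(v) for v in values})
    codes = {label: k for k, label in enumerate(labels)}
    return [float(codes[str(v)]) for v in values]


def to_matrix(records: Sequence[Dict[str, Value]], fields: Sequence[str]) -> List[List[float]]:
    """Build the matrix: one row per information object, one column per field."""
    columns = [encode_column([record[f] for record in records]) for f in fields]
    return [list(row) for row in zip(*columns)]


# Hypothetical records, not taken from the GEO dataset.
records = [
    {"age": 62, "tamoxifen": "Yes", "grade": 2},
    {"age": 48, "tamoxifen": "No", "grade": 3},
    {"age": 71, "tamoxifen": "Yes", "grade": 1},
]
matrix = to_matrix(records, ["age", "tamoxifen", "grade"])
```

With the full dataset the same call would yield a matrix of 699 rows and 6 columns.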
Stage 2 involves mapping each information object onto the star coordinate axes. The axes $C_1, \ldots, C_6$ denoting the
fields or dimensions share a common origin, which in the Cartesian coordinate system may be conveniently denoted
by (0,0), as shown in Figure 3. Each field vector $f_1, \ldots, f_6$ is calculated by multiplying the distance by its
corresponding unit vector, oriented in a direction along the axis $C_j$. Subsequently, the vector $P_j(x,y)$ denoting the final
point is calculated based on Equation 1.
Fig. 3. Mapping Architecture.
Each point is calculated according to Equation 1 from Kandogan [12].
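$$P_j(x, y) = \left( \sum_{i=1}^{n} u_{xi} \cdot (d_{ji} - \min_i),\ \sum_{i=1}^{n} u_{yi} \cdot (d_{ji} - \min_i) \right) \tag{1}$$
where
$$D_j = (d_{j0}, d_{j1}, \ldots, d_{ji}, \ldots, d_{jn}), \qquad u_i = \frac{c_i}{\max_i - \min_i} \tag{2}$$
where,
$$u_i = (u_{xi}, u_{yi}) = (\cos\alpha, \sin\alpha)$$
$$\max_i = \max\{d_{ji},\ 0 \le j < \text{number\_of\_elements}\}$$
$$\min_i = \min\{d_{ji},\ 0 \le j < \text{number\_of\_elements}\}$$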
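To make the mapping in Equation 1 concrete, the sketch below computes the 2D position of every record as the sum of its per-axis contributions. It is an illustration under assumptions: the function name `project`, the choice to skip attributes whose values are all equal (to avoid dividing by zero), and the two-attribute sample data are not from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project(data: Sequence[Sequence[float]], axes: Sequence[Point]) -> List[Point]:
    """Map each record to (x, y) by summing axis vectors weighted by normalized values."""
    n_dims = len(axes)
    mins = [min(row[i] for row in data) for i in range(n_dims)]
    maxs = [max(row[i] for row in data) for i in range(n_dims)]
    points: List[Point] = []
    for row in data:
        x = y = 0.0
        for i, (cx, cy) in enumerate(axes):
            span = maxs[i] - mins[i]
            if span == 0:
                continue  # assumed handling: a constant attribute contributes nothing
            weight = (row[i] - mins[i]) / span  # (d_ji - min_i) / (max_i - min_i)
            x += cx * weight
            y += cy * weight
        points.append((x, y))
    return points


# Hypothetical data with two attributes on perpendicular axes.
axes = [(1.0, 0.0), (0.0, 1.0)]
points = project([[0, 0], [10, 5], [5, 10]], axes)
```

Each record lands at the origin when all of its attributes are at their minimum, and moves along an axis in proportion to the normalized value of the corresponding attribute.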
Stage 3 involves incorporating interactive features, including scaling and rotation, for online data
manipulation. Scaling allows users to change the length of one or more axes concurrently. The scaling feature
recalculates the contribution of attributes by multiplying the ratios into the mapping equation; the data are then
re-mapped in accordance with the new scaling factor as in Equation 3 [10].
$$\frac{c_i}{\max_i - \min_i} \times \text{scale} \tag{3}$$
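The effect of the scaling factor in Equation 3 can be sketched as below. The helper names `scale_axes` and `map_point` are hypothetical, and the input row is assumed to be already normalized to the range 0 to 1 per attribute.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scale_axes(axes: Sequence[Point], factors: Sequence[float]) -> List[Point]:
    """Multiply each axis vector by its scaling factor."""
    return [(cx * s, cy * s) for (cx, cy), s in zip(axes, factors)]


def map_point(normalized_row: Sequence[float], axes: Sequence[Point]) -> Point:
    """Sum the axis vectors weighted by the normalized attribute values."""
    x = sum(cx * t for (cx, _), t in zip(axes, normalized_row))
    y = sum(cy * t for (_, cy), t in zip(axes, normalized_row))
    return (x, y)


axes = [(1.0, 0.0), (0.0, 1.0)]
row = [1.0, 0.5]                       # hypothetical normalized record
before = map_point(row, axes)
after = map_point(row, scale_axes(axes, [0.2, 1.0]))  # shrink axis 1 to 0.2
```

Shrinking an axis reduces how far that attribute can pull a point away from the origin, so the remaining attributes dominate the layout.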
Rotation, on the other hand, lets users rotate an axis, which re-correlates the relationships
between attributes. Rotation changes the axis angle and redistributes the scatter plot as in Equation 4.
$$v_i = \left( \cos\frac{2\pi i}{m},\ \sin\frac{2\pi i}{m} \right), \quad i = 1, \ldots, m \tag{4}$$
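A small sketch of the axis layout in Equation 4, together with one way a single axis could be rotated, is given below. The function names are hypothetical, and applying the rotation as a standard 2D rotation about the origin is an assumption rather than a detail stated in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def initial_axes(m: int) -> List[Point]:
    """Place m unit axis vectors at equal angular steps, following Equation 4."""
    return [
        (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
        for i in range(1, m + 1)
    ]


def rotate_axis(axes: Sequence[Point], index: int, angle_rad: float) -> List[Point]:
    """Return a copy of the axes with one axis rotated counter-clockwise about the origin."""
    x, y = axes[index]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = list(axes)
    rotated[index] = (x * c - y * s, x * s + y * c)
    return rotated


axes = initial_axes(4)
turned = rotate_axis(axes, 3, math.pi / 2)  # turn the last axis by 90 degrees
```

After a rotation, the points would be re-mapped with the new axis vectors, so attributes whose axes point in similar directions reinforce each other.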
Stage 4 involves coloring the mapped data according to specified categories. Coloring creates another dimension of
data visualization and can be classified as an interactive feature, as users are free to select a variety of
colors to represent different attribute values in a dimension. It involves categorizing data based on similar factors
and assigning a color to each group of factors, thus enabling visualization of the data distribution or a clear
clustering effect.
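The coloring stage amounts to grouping records by category and giving each group one color. The sketch below is illustrative only: the function name `assign_colors` and the rule that categories are paired with palette entries in sorted order are assumptions.

```python
from typing import Dict, List, Sequence


def assign_colors(categories: Sequence[str], palette: Sequence[str]) -> List[str]:
    """Give every record the color of its category."""
    distinct = sorted(set(categories))
    if len(distinct) > len(palette):
        raise ValueError("palette has fewer colors than there are categories")
    # Assumed pairing: categories in sorted order take palette colors in order.
    lookup: Dict[str, str] = dict(zip(distinct, palette))
    return [lookup[c] for c in categories]


# Hypothetical treatment labels; the palette is chosen by the user.
treatment = ["Yes", "No", "Yes", "Yes"]
colors = assign_colors(treatment, ["blue", "red"])
```

Here the "No" group is drawn in blue and the "Yes" group in red; a numerical attribute such as age would first be divided into ranges, each range then being treated as a category.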
2.3. Manipulation of the interactive features phase
The initial stage of this phase involves gaining knowledge and understanding of the multidimensional data mapped
into the star coordinates. This is done by comparing the star coordinate views with the quantitative views given by
the histogram.
The next experiment involves manipulating the incorporated interactive features to produce the clustering
effect. This phase involves three processes: scaling, rotation and coloring.
Process 1 involves the scaling feature, where the length of an axis is changed based on the quantitative analysis
using histograms. In this experiment, for example, an attribute is adjusted to a 2:1 ratio relative to the other
attributes and vice versa. The ratios are then multiplied into the mapping calculation, and the data are re-mapped
according to the new scaling factor, which depends on the length of the axis.
Process 2 involves the rotation feature, which is used to determine the correlation between selected attributes. A
function is created to store the rotated angle value and pass it as a parameter to the mapping function. As the angle
changes, the mapping is recalculated and the data are re-mapped.
Process 3 involves the coloring feature. This process helps to differentiate the required
information in the visualization and can also be used to identify clusters, if any. In this process, the data are
grouped by shared factors and a color is assigned to each group. When users select the coloring feature, the data are
plotted in their corresponding colors.
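One possible way to organize the store-then-re-map behaviour described in Processes 1 and 2 is a small object that keeps the current angle and scale of every axis and recomputes the points on request. The class name `StarView`, its methods, and the assumption that rows are already normalized per attribute are hypothetical choices for this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StarView:
    """Holds the interactive state: one angle (radians) and one scale per axis."""
    angles: List[float]
    scales: List[float]

    def set_scale(self, axis: int, factor: float) -> None:
        self.scales[axis] = factor

    def set_angle(self, axis: int, angle_rad: float) -> None:
        self.angles[axis] = angle_rad

    def remap(self, normalized_rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
        """Recompute every point from the stored angles and scales."""
        points = []
        for row in normalized_rows:
            x = sum(s * math.cos(a) * t for a, s, t in zip(self.angles, self.scales, row))
            y = sum(s * math.sin(a) * t for a, s, t in zip(self.angles, self.scales, row))
            points.append((x, y))
        return points


rows = [[1.0, 0.5]]                          # hypothetical normalized record
view = StarView(angles=[0.0, math.pi / 2], scales=[1.0, 1.0])
view.set_scale(0, 2.0)                       # 2:1 ratio of axis 1 to axis 2
scaled = view.remap(rows)
view.set_angle(1, 0.0)                       # align axis 2 with axis 1
aligned = view.remap(rows)
```

Keeping the state separate from the data means each interaction only changes one stored value, after which the same mapping is run again.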
2.4. Usability & accessibility
In this phase, the clustering outcomes of the previous phase were presented to Professor Dr. Mohd Zaki Salleh
and Professor Dr. Teh Lay Kek, experts in the pharmacy domain. Since the dataset is multi-factorial, many
factors need to be considered. They suggested that highly correlated attributes be positioned next to each other to
highlight important information relationships and optimize analysis.
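The experts' suggestion could be approximated by ordering the axes so that strongly correlated attributes become neighbours. The greedy procedure below is only one possible reading of that advice, not a method described in the paper; the use of the Pearson coefficient, the function names and the sample columns are assumptions.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def order_axes(columns: Dict[str, Sequence[float]]) -> List[str]:
    """Greedy ordering: after each axis, place the unused attribute most correlated with it."""
    names = list(columns)
    order = [names[0]]
    remaining = names[1:]
    while remaining:
        last = order[-1]
        best = max(remaining, key=lambda name: abs(pearson(columns[last], columns[name])))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical attribute columns.
columns = {"a": [1, 2, 3, 4], "b": [4, 1, 3, 2], "c": [2, 4, 6, 8]}
order = order_axes(columns)
```

In this toy input, "c" is a multiple of "a", so it is placed directly after it.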
3. Result and Discussion
The discussion of results is divided into two sections: comparing star coordinates with histogram visualization of
the same attributes, and clustering multidimensional data using the interactive features.
3.1. Comparison of star coordinates color data distribution and histogram
Comparisons between star coordinates and histograms are made according to attribute clusters. Unlike a histogram,
star coordinates are limited in showing the amount of data in each cluster of numerical data. However,
they provide a better illustration of data relationships and distribution, and are able to categorize non-numerical
data, as depicted in Table 1 and Table 2. Table 1 depicts the visualization outcome of real numerical values. The
histogram provides the frequency of each attribute, while star coordinates visualize clusters during the data
mapping.
Analysis of the age attribute (numerical data) uses various colors to represent different age ranges, and the
mapping clearly shows clustered data (the 60-65 age range has the highest frequency). However, a conclusion
about the frequency of patients by age drawn from this view is only a guess. A histogram of the age ranges is able to verify this
guess quantitatively. The same goes for the tumor size analysis.
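The quantitative check that the histogram provides is essentially a count of records per value range. A minimal sketch is shown below; the function name `bin_counts`, the bin width of 5 and the list of ages are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Dict, Sequence


def bin_counts(values: Sequence[float], width: int) -> Dict[int, int]:
    """Count values per bin; each key is the lower edge of a bin of the given width."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical ages.
ages = [61, 63, 64, 58, 72, 60, 45]
counts = bin_counts(ages, 5)
busiest = max(counts, key=counts.get)  # lower edge of the most frequent range
```

Reading the largest count from such a table replaces the visual guess with a number.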
Table 1. Comparison between star coordinates with histogram to identify clusters for age and tumor size.
Table 2 depicts the visualization outcome of discrete numerical (Elston Ellis Grade) and non-numerical
(Tamoxifen Treatment) data. The histogram of the Elston Ellis Grade data shows the frequency of each discrete value,
whereas the star coordinates show the distribution of the discrete data in relation to the other attributes. Analysis of
the tamoxifen treatment attribute (categorical data) shows two clusters in star coordinates, colored differently for
the No and Yes classes. The view suggests that most patients are on medication, but it cannot confirm whether this
impression is accurate. In this case, the histogram is applied to see the frequency of patients on medication,
and it confirms that the majority of patients are on medication for breast cancer treatment.
[Table 1 layout: columns Attributes, Histogram, Star Coordinate; rows Age, Tumor Size]
Table 2. Comparison between star coordinates with histogram to identify clusters for Elston Ellis Grade and
Tamoxifen Treatment.
This analysis shows that star coordinates are suitable for illustrating non-numerical (categorical) data,
while the histogram is suited to numerical (non-categorical) data. Age and tumor size are examples of
numerical data, while Elston-Ellis Grade and tamoxifen treatment fall under non-numerical (categorical)
data. The age and tumor size attributes clearly show that star coordinates are not preferable to the histogram when
analyzing numerical data.
3.2. Interactive manipulation results
This section discusses the data clustering observed from interactive star coordinates using the scaling, rotation and
coloring interactive features.
3.2.1. Scaling results
Initially, the scale size of every attribute axis is set to 1. The data points are observed to be coarsely scattered
over each attribute. The scaling transformation allows users to change the length of the corresponding dimensional
axis. Generally, scaling may be applied to one or more axes. Scaling is crucial for observing the relationship between
dimensions in a specific range of the scale. This often results in the visualization of data similarities, when data on
different factors fall under the same cluster. In this case, some form of clustering is revealed when the scale size is set to
0.2 for axis 1, as observed in Table 3.
[Table 2 layout: columns Histogram, Star Coordinate; rows Elston Ellis Grade, Tamoxifen Treatment]
Table 3. Comparison of scaling result (before and after)
3.2.2. Rotation results
Users are allowed to rotate a particular attribute by adjusting the angle value of its axis. As the angle changes,
the mapping is recalculated and the data are mapped again. Each angle corresponds to one attribute: axis 1 (age), axis 2
(disease free survival days), axis 3 (disease metastasis free survival days), axis 4 (Elston Ellis Grade), axis 5
(tamoxifen treatment) and axis 6 (tumor size). Rotation makes the selected fields or attributes (axis 1 to axis 4) more or
less correlated with the other attributes. Figure 4 shows that visual analysis using interactive star coordinates is able to form
two clusters when four fields/attributes (axis 1 to axis 4: age, disease free survival days, disease metastasis free
survival days and Elston Ellis Grade respectively) are aligned in the same angular direction. In this case, the Age
field/attribute plays a significant role in surviving breast cancer, depending on the level of tumor grade.
Fig. 4. Star Coordinate : After rotation.
3.2.3. Coloring Results
Colors create another dimension in data visualization. The attributes used for this analysis are axis 1 (age), axis 2
(disease free survival days), axis 3 (disease metastasis free survival days), axis 4 (Elston Ellis Grade), axis 5
(tamoxifen treatment) and axis 6 (tumor size). Blue represents tamoxifen treatment (No) and red
represents tamoxifen treatment (Yes).
[Table 3 panels: Before scaling; After scaling (scale size = 0.2 for axis 1). Fig. 4 annotation: Axes are rotated.]
Fig. 5. Star Coordinate : Tamoxifen Treatment chosen as coloring attribute.
Fig. 5 shows that most patients treated with tamoxifen (Yes) and with large tumor size, in red, appear on axis 6,
while patients with small tumor size who are not treated with tamoxifen (No), in blue, appear on axis 2. Based on
this analysis, it can be concluded that patients who receive treatment at an early stage have longer disease free survival.
4. Conclusion
The interactive star coordinate technique transforms data from tabular form into a more
understandable visual view. Furthermore, the technique enhances the user's capability to
gain insight into multidimensional information sets. Its capabilities include
representing multidimensional information sets (such as breast cancer data) in
two-dimensional space and assisting in obtaining intuitive information about patients' clinical patterns. The interactive
features incorporated into star coordinates can successfully visualize the data categories as clusters that are much
easier to recognize, thus facilitating user understanding of large and multidimensional data and assisting in
decision making. However, more accurate quantitative values can be obtained using histograms.
Acknowledgement
The authors wish to thank UiTM for the facilities provided. Special gratitude goes to Prof Dr. Mohd Zaki Salleh
and Prof Teh Lay Kek from the Pharmacy faculty (UiTM) for their advice, comments, guidance and sharing of information
throughout this research.
References
[1] J. Barkai, “Using Visual Decision Making to Optimize Manufacturing Design and Development,” 2012.
[2] N. Andrienko, G. Andrienko, S. Birlinghoven, and S. Augustin, “Informed Spatial Decision Making Using Coordinated Views,” 2003.
[3] G. Dzemyda, O. Kurasova, and J. Zilinskas, Multidimensional Data Visualization. Springer Optimization and Its Applications, 2013, p. 248.
[4] R. Spence, Information Visualization. ACM Press Book, 2001.
[5] Enrico, “How do you visualize too much data,” 2011. [Online]. Available: http://fellinlovewithdata.com/guides/how-do-you-visualize-too-much-data.
[6] M. Khan and S. S. Khan, “Data and Information Visualization Methods, and Interactive Mechanisms: A Survey,” Int. J. Comput. Appl.,
vol. 34, no. 1, pp. 1–14, 2011.
[7] Q. V. Nguyen, G. Nelmes, M. L. Huang, S. Simoff, and D. Catchpoole, “Interactive Visualization for Patient-to-Patient Comparison,”
Genomics Inform., vol. 12, no. 1, pp. 21–34, Mar. 2014.
[8] D. A. Keim, “Information Visualization and Visual Data Mining,” IEEE Trans. Vis. Comput. Graph., vol. 8, no. 1, pp. 1–8, 2002.
[9] W. W. Chan, “A Survey on Multivariate Data Visualization,” 2006.
[10] E. Kandogan, H. Road, and S. Jose, “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of
Dimensions,” Proc. IEEE Inf. Vis. Symp. Vol. 650, 2000.
[11] S. Loi, B. Haibe-Kains, C. Desmedt, P. Wirapati, F. Lallemand, A. M. Tutt, C. Gillet, P. Ellis, K. Ryder, J. F. Reid, M. G. Daidone, M. A.
Pierotti, E. M. Berns, M. P. Jansen, J. A. Foekens, M. Delorenzi, G. Bontempi, M. J. Piccart, and C. Sotiriou, “Predicting prognosis using
molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen,” BMC Genomics, vol. 9, p. 239, Jan. 2008.
[12] E. Kandogan and S. Jose, “Visualizing Multi-dimensional Clusters, Trends, and Outliers using Star Coordinates,” Proc. seventh ACM
SIGKDD Int. Conf. Knowl. Discov. data Min., pp. 107–116, 2001.