Visual Analysis of Bipartite Biological Networks.
ABSTRACT: In the life sciences, the importance of complex network visualization is ever increasing. Yet existing approaches to network visualization are general-purpose techniques that are often not suited to the specific needs of life science researchers, or to the large network sizes and specific network characteristics that are prevalent in the field. Examples of such networks are biomedical ontologies and biochemical reaction networks, which are bipartite networks, a particular graph class that is rarely addressed in visualization. Our table-based approach makes it possible to visualize large bipartite networks alongside a multitude of attributes and hyperlinks to biological databases. To explore complex network motifs and perform intricate selections within the visualized network data, we introduce a new script-based brushing mechanism that integrates naturally with the interlinked, tabular representation. A prototype for exploring bipartite graphs, which uses the proposed visualization and interaction techniques, is also presented and applied to real data sets from the application domain.
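As a rough illustration of the ideas in this abstract, the following Python sketch stores a toy biochemical reaction network as a bipartite graph, flattens it into one table row per reaction, and selects rows with a small user-supplied script. All names and data here (`EDGES`, `build_tables`, `brush`, the example species and reactions) are hypothetical and chosen for illustration; this is not the prototype or the brushing language described in the paper.

```python
from collections import defaultdict

# Hypothetical toy data: a bipartite network has two node classes
# (here species and reactions) and edges only between the classes.
SPECIES = {"glucose", "ATP", "G6P", "ADP", "F6P"}
REACTIONS = {"R1", "R2"}
EDGES = [
    ("glucose", "R1"), ("ATP", "R1"),  # species -> reaction (consumed)
    ("R1", "G6P"), ("R1", "ADP"),      # reaction -> species (produced)
    ("G6P", "R2"), ("R2", "F6P"),
]


def build_tables(edges):
    """Return one row per reaction, listing its substrates and products."""
    table = defaultdict(lambda: {"substrates": set(), "products": set()})
    for src, dst in edges:
        if src in SPECIES and dst in REACTIONS:
            table[dst]["substrates"].add(src)
        elif src in REACTIONS and dst in SPECIES:
            table[src]["products"].add(dst)
        else:
            # An edge inside one node class would violate bipartiteness.
            raise ValueError(f"edge {src}->{dst} is not bipartite")
    return dict(table)


def brush(table, predicate):
    """Select the reaction rows for which a user-supplied script holds."""
    return sorted(name for name, row in table.items() if predicate(row))


reaction_table = build_tables(EDGES)
# Example brush: all reactions that consume ATP.
selected = brush(reaction_table, lambda row: "ATP" in row["substrates"])
```

In this sketch the brush is an ordinary Python predicate over a table row; a real tool would presumably also propagate the selection to the linked species table.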
ABSTRACT: Large-scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many interrelated datasets. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core, ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.
IEEE Transactions on Visualization and Computer Graphics 12/2014; 20(12):1883-1892. DOI:10.1109/TVCG.2014.2346752 · 1.92 Impact Factor
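The selection-driven highlighting and re-sorting described in the ConTour abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the relation data, the function name `resort_column`, and the choice of "number of links to the current selection" as the connectivity measure are all hypothetical and are not taken from the ConTour implementation.

```python
from collections import Counter

# Hypothetical relationships between a compound column and a gene column.
RELATIONS = [("c1", "g1"), ("c1", "g2"), ("c2", "g2"), ("c3", "g3")]
GENES = ["g1", "g2", "g3"]


def resort_column(items, relations, selection):
    """Re-sort `items` by how many selected source items link to them.

    Returns the re-sorted column and the subset to highlight
    (items with at least one link to the selection).
    """
    strength = Counter(dst for src, dst in relations if src in selection)
    # Strongest connections first; ties are broken by item name.
    ordered = sorted(items, key=lambda item: (-strength[item], item))
    highlighted = [item for item in ordered if strength[item] > 0]
    return ordered, highlighted


# Selecting compounds c1 and c2 pulls their related genes to the top.
ordered, highlighted = resort_column(GENES, RELATIONS, {"c1", "c2"})
```

Here `g2` is linked to both selected compounds, so it sorts ahead of `g1`, while the unrelated `g3` stays at the bottom and is not highlighted.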
ABSTRACT: Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are well suited either to visualizing large amounts of pathway node attributes or to representing the topology of the pathway, but do not accomplish both at the same time. To address this, we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in the context of pathways from the KEGG and the WikiPathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA), both of which contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies in which they explored data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways.
BMC Bioinformatics 11/2013; 14 Suppl 19(Suppl 19):S3. DOI:10.1186/1471-2105-14-S19-S3 · 2.67 Impact Factor
Conference Paper: Improving Co-Cluster Quality with Application to Product Recommendations
ABSTRACT: Businesses store an ever-increasing amount of historical customer sales data. Given the availability of such information, it is advantageous to analyze past sales, both to reveal dominant buying patterns and to provide more targeted recommendations to clients. In this context, co-clustering has proved to be an important data-modeling primitive for revealing latent connections between two sets of entities, such as customers and products. In this work, we introduce a new algorithm for co-clustering that is both scalable and highly resilient to noise. Our method is inspired by k-Means and agglomerative hierarchical clustering approaches: (i) first it searches for elementary co-clustering structures and (ii) then it combines them into a better, more compact solution. The algorithm is flexible, as it does not require an explicit number of co-clusters as input, and is directly applicable to large data graphs. We apply our methodology to real sales data to analyze and visualize the connections between clients and products. We showcase a real deployment of the system and show how it has been used to drive a recommendation engine. Finally, we demonstrate that the new methodology can discover co-clusters of better quality and relevance than state-of-the-art co-clustering techniques.
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China; 01/2014