HiTSEE: A Visualization Tool for Hit Selection and Analysis in High-Throughput Screening Experiments

To read the full-text of this research, you can request a copy directly from the authors.


We present HiTSEE (High-Throughput Screening Exploration Environment), a visualization tool for analyzing large chemical screens used to study biochemical processes. The tool supports the analysis of structure-activity relationships (SAR analysis) and, through a flexible interaction mechanism, the navigation of large chemical spaces. Our approach, based on the projection of one or a few molecules of interest and the expansion around their neighborhood, allows large chemical libraries to be explored without the need to create an all-encompassing overview of the whole library. We describe the requirements we collected during our collaboration with biologists and chemists, the design rationale behind the tool, and two case studies.
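
The neighborhood expansion described above can be illustrated with a minimal sketch. It assumes that each compound is represented by a set of fingerprint feature IDs and that similarity is measured with the Tanimoto coefficient; the abstract specifies neither, and the names `tanimoto` and `expand_neighborhood` as well as the toy library are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by all features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def expand_neighborhood(
    seed: str, library: Dict[str, Fingerprint], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k library compounds most similar to the seed compound."""
    seed_fp = library[seed]
    scored = [
        (name, tanimoto(seed_fp, fp))
        for name, fp in library.items()
        if name != seed
    ]
    # Sort by descending similarity, breaking ties by name for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy library (assumed data): compound name -> set of fingerprint bits.
library = {
    "hit": frozenset({1, 2, 3, 4}),
    "analog_a": frozenset({1, 2, 3, 5}),
    "analog_b": frozenset({1, 2, 6, 7}),
    "unrelated": frozenset({8, 9}),
}

neighbors = expand_neighborhood("hit", library, k=2)
```

Starting from a single molecule of interest, only its nearest neighbors need to be scored and drawn, which is why no overview of the entire library is required in this kind of scheme.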


... Recently, tools tailored to the specific needs of life scientists in the chemical biology, medicinal chemistry and pharmaceutical domain were developed. These include MONA 2 [21], Screening Assistant 2 [22], DataWarrior [23], the Chemical Space Mapper (CheS-Mapper) [16,17] and the High-Throughput Screening Exploration Environment (HiTSEE) [24,25]. The last two tools complement the workflow environment KNIME with a visualization node. ...
The era of big data is influencing how rational drug discovery and the development of bioactive molecules are performed, and versatile tools are needed to assist in molecular design workflows. Scaffold Hunter is a flexible visual analytics framework for the analysis of chemical compound data and combines techniques from several fields such as data mining and information visualization. The framework allows analyzing high-dimensional chemical compound data in an interactive fashion, combining intuitive visualizations with automated analysis methods including versatile clustering methods. Originally designed to analyze the scaffold tree, Scaffold Hunter is continuously revised and extended. We describe recent extensions that significantly increase the applicability for a variety of tasks.
Live cell imaging is an important biomedical research paradigm for studying dynamic cellular behaviour. Although phenotypic data derived from images are difficult to explore and analyse, some researchers have successfully addressed this with visualization. Nonetheless, visualization methods for live cell imaging data have been reported in an ad hoc and fragmented fashion. This leads to a knowledge gap where it is difficult for biologists and visualization developers to evaluate the advantages and disadvantages of different visualization methods, and for visualization researchers to gain an overview of existing work to identify research priorities. To address this gap, we survey existing visualization methods for live cell imaging from a visualization research perspective for the first time. Based on recent visualization theory, we perform a structured qualitative analysis of visualization methods that includes characterizing the domain and data, abstracting tasks, and describing visual encoding and interaction design. Based on our survey, we identify and discuss research gaps that future work should address: the broad analytical context of live cell imaging; the importance of behavioural comparisons; links with dynamic data visualization; the consequences of different data modalities; shortcomings in interactive support; and, in addition to analysis, the value of the presentation of phenotypic data and insights to other stakeholders.
Conference Paper
Exploration of chemical space is an important component of the drug discovery process, and its importance grows with increasing computational power, which allows larger areas of chemical space to be explored. Recently, new algorithms have been proposed to automatically generate and search for compounds (objects in chemical space) with desired properties. Although these approaches can be a big help, human interaction is usually still unavoidable in the end. Visualization of the space can help make sense of the generated data, and visualization techniques are therefore usually an integral part of any task related to chemical space exploration. Methods for visualizing chemical space currently exist, but there is no framework that supports simple development of new methods. The purpose of this paper is to introduce such a modular framework, called ViFrame. ViFrame makes it possible to implement every part of the visualization pipeline, which consists of steps such as reading and merging molecules from multiple data sources, applying transformations and, of course, visualizing the data set in 2D space. The advantage of the framework is that it provides an environment in which the user can focus on developing these tasks while the framework supports seamless integration of the developed components. The framework also incorporates an application that provides a graphical interface for manipulating modules and presenting the visualization results. So that the application can be used without implementing one's own module, several visualization methods have been implemented.
When representing 2D data points with spacious objects such as labels, overlap can occur. We present a simple algorithm which modifies the (Mani-) Wordle idea with scan-line based techniques to allow a better placement. We give an introduction to common placement techniques from different fields and compare our method to these techniques w.r.t. Euclidean displacement, changes in orthogonal ordering, and shape and size preservation. Especially in dense scenarios, our method preserves the overall shape better than known techniques and allows a good trade-off between the other measures. Applications on real-world data are given and discussed. © 2012 Wiley Periodicals, Inc.
We describe Scaffold Hunter, a highly interactive computer-based tool for navigation in chemical space that fosters intuitive recognition of complex structural relationships associated with bioactivity. The program reads compound structures and bioactivity data, generates compound scaffolds, correlates them in a hierarchical tree-like arrangement, and annotates them with bioactivity. Brachiation along tree branches from structurally complex to simple scaffolds allows identification of new ligand types. We provide proof of concept for pyruvate kinase.
Conference Paper
We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in a substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute's HIV-screening dataset.
Chemoinformatics draws upon techniques from many disciplines including computer science, mathematics, computational chemistry and data visualisation to tackle these problems. This, the first text written specifically for this field, aims to provide an introduction to the major techniques of chemoinformatics. The first part of the book deals with the representation of 2D and 3D molecular structures, the calculation of molecular descriptors and the construction of mathematical models. The second part describes other important topics including molecular similarity and diversity, the analysis of large data sets, virtual screening, and library design. Simple illustrative examples are used throughout to illustrate key concepts, supplemented with case studies from the literature. The book is aimed at graduate students, final-year undergraduates, and professional scientists. No prior knowledge is assumed other than a familiarity with chemistry and some basic mathematical concepts.
DrugViz is a Cytoscape plugin that is designed to visualize and analyze small molecules within the framework of the interactome. DrugViz can import drug–target network information in an extended SIF file format to Cytoscape and display the two-dimensional (2D) structures of small molecule nodes in a unified visualization environment. It also can identify small molecule nodes by means of three different 2D structure searching methods, namely isomorphism, substructure and fingerprint-based similarity searches. After selections, users can furthermore conduct a two-side clustering analysis on drugs and targets, which allows for a detailed analysis of the active compounds in the network, and elucidate relationships between these drugs and targets. DrugViz represents a new tool for the analysis of data from chemogenomics, metabolomics and systems biology. Availability: DrugViz and data set used in Application are freely available for download at Contact: jkshen{at}
We introduce SARANEA, an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information in compound sets of any source. SARANEA integrates various SAR and SSR analysis functions and utilizes a network-like similarity graph data structure for visualization. The program enables the systematic detection of activity and selectivity cliffs and corresponding key compounds across multiple targets. Advanced SAR analysis functions implemented in SARANEA include, among others, layered chemical neighborhood graphs, cliff indices, selectivity trees, editing functions for molecular networks and pathways, bioactivity summaries of key compounds, and markers for bioactive compounds having potential side effects. We report the application of SARANEA to identify SAR and SSR determinants in different sets of serine protease inhibitors. It is found that key compounds can influence SARs and SSRs in rather different ways. Such compounds and their SAR/SSR characteristics can be systematically identified and explored using SARANEA. The program and source code are made freely available under the GNU General Public License.
Turning the motor off: A malachite green based assay leads to the identification of BTB-1 (see picture), the first small-molecule inhibitor of the mitotic motor protein Kif18A. BTB-1 reversibly inhibits the ATPase activity of the recombinant motor domain of Kif18A in vitro (see picture; red microtubules, blue/black structure = Kif18A) and will be a valuable tool to dissect the mechanochemical properties of Kif18A.
While many data sets contain multiple relationships, depicting more than one data relationship within a single visualization is challenging. We introduce Bubble Sets as a visualization technique for data that has both a primary data relation with a semantically significant spatial organization and a significant set membership relation in which members of the same set are not necessarily adjacent in the primary layout. In order to maintain the spatial rights of the primary data relation, we avoid layout adjustment techniques that improve set cluster continuity and density. Instead, we use a continuous, possibly concave, isocontour to delineate set membership, without disrupting the primary layout. Optimizations minimize cluster overlap and provide for calculation of the isocontours at interactive speeds. Case studies show how this technique can be used to indicate multiple sets on a variety of common visualizations.
New technologies in high-throughput screening have significantly increased throughput and reduced assay volumes. Key advances over the past few years include new fluorescence methods, detection platforms and liquid-handling technologies. Screening 100,000 samples per day in miniaturized assay volumes will soon become routine. Furthermore, new technologies are now being applied to information-rich cell-based assays, and this is beginning to remove one of the key bottlenecks downstream from primary screening.
Chemical genetics is a research approach that uses small molecules as probes to study protein functions in cells or whole organisms. Here, I review the parallels between classical genetic and chemical-genetic approaches and discuss the merits of small molecules to dissect dynamic cellular processes. I then consider the pros and cons of different screening approaches and specify strategies aimed at identifying and validating cellular target proteins. Finally, I highlight the impact of chemical genetics on our current understanding of cell biology and its potential for the future.
Natural compounds are evolutionarily selected and prevalidated by Nature, displaying a unique chemical diversity and a corresponding diversity of biological activities. These features make them highly interesting for studies of chemical biology, and in the pharmaceutical industry for development of new leads. Of utmost importance, for the discovery of new biologically active compounds, is the identification and charting of the corresponding biologically relevant chemical space. The primary key to this is the coverage of the natural products' chemical space. Here we introduce ChemGPS-NP, a new tool tuned for handling the chemical diversity encountered in natural products research, in contrast to previous tools focused on the much more restricted drug-like chemical space. The aim is to provide a framework for making compound classification and comparison more efficient and stringent, to identify volumes of chemical space related to particular biological activities, and to track changes in chemical properties due to, for example, evolutionary traits and modifications in biosynthesis. Physical-chemical properties not directly discernible from structural data can be discovered, making selection more efficient and increasing the probability of hit generation when screening natural compounds and analogues.
We present structure-activity relationship (SAR) maps, a new, intuitive method for visualizing SARs targeted specifically at medicinal chemists. The method renders an R-group decomposition of a chemical series as a rectangular matrix of cells, each representing a unique combination of R-groups and thus a unique compound. Color-coding the cells by chemical property or biological activity allows patterns to be easily identified and exploited. SAR maps allow the medicinal chemist to interactively analyze complicated datasets with multiple R-group dimensions, rapidly correlate substituent structure and biological activity, assess additivity of substituent effects, identify missing analogs and screening data, and create compelling graphical representations for presentation and publication. We believe that this method fills a long-standing gap in the medicinal chemist's toolset for understanding and rationalizing SAR.
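
The matrix layout described above can be sketched as follows, assuming a series with two R-group positions whose decomposition has already been carried out. The compound records, the `build_sar_map` and `missing_analogs` helpers, and the activity values are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Assumed input: (R1 substituent, R2 substituent, activity such as pIC50).
Record = Tuple[str, str, float]


def build_sar_map(
    records: List[Record],
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Arrange compounds in a matrix: rows are R1 groups, columns are R2 groups."""
    r1_groups = sorted({r1 for r1, _, _ in records})
    r2_groups = sorted({r2 for _, r2, _ in records})
    lookup: Dict[Tuple[str, str], float] = {
        (r1, r2): activity for r1, r2, activity in records
    }
    # A cell is None when that R1/R2 combination has no measured compound.
    matrix = [[lookup.get((r1, r2)) for r2 in r2_groups] for r1 in r1_groups]
    return r1_groups, r2_groups, matrix


def missing_analogs(
    r1_groups: List[str],
    r2_groups: List[str],
    matrix: List[List[Optional[float]]],
) -> List[Tuple[str, str]]:
    """List the R-group combinations that are absent from the data set."""
    return [
        (r1, r2)
        for i, r1 in enumerate(r1_groups)
        for j, r2 in enumerate(r2_groups)
        if matrix[i][j] is None
    ]


# Toy series (assumed data).
records = [
    ("H", "Cl", 5.1),
    ("H", "OMe", 6.3),
    ("Me", "Cl", 5.8),
]

rows, cols, sar_map = build_sar_map(records)
gaps = missing_analogs(rows, cols, sar_map)
```

Color-coding would then map each filled cell value to a color, while the `None` cells correspond to the missing analogs the abstract mentions.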
W. S. Cleveland and R. McGill. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531-554, 1984.
C. Collins, G. Penn, and S. Carpendale. Bubble sets: Revealing set relations with isocontours over existing visualizations. IEEE Transactions on Visualization and Computer Graphics, 15:1009-1016, November 2009.
T. Mayer. Chemical genetics: tailoring tools for cell biology. Trends in Cell Biology, 13(5):270-277, 2003.
R. Hertzberg and A. Pope. High-throughput screening: new technology for the 21st century. Current Opinion in Chemical Biology, 4(4):445-451, 2000.
D. K. Agrafiotis, M. Shemanarev, P. J. Connolly, M. Farnum, and V. S. Lobanov. SAR maps: A new SAR visualization technique for medicinal chemists. Journal of Medicinal Chemistry, 50(24):5926-5937, 2007.
J. Larsson, J. Gottfries, S. Muresan, and A. Backlund. ChemGPS-NP: Tuned for navigation in biologically relevant chemical space. Journal of Natural Products, 70(5):789-794, 2007.
A. Leach and V. Gillet. An Introduction to Chemoinformatics.
S. Wetzel, K. Klein, S. Renner, D. Rauh, T. I. Oprea, P. Mutzel, and H. Waldmann. Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol, 5(8):581-583, 2009.
B. Xiong, K. Liu, J. Wu, D. L. Burk, H. Jiang, and J. Shen. DrugViz: a Cytoscape plugin for visualizing and analyzing small molecule drugs in biological networks. Bioinformatics, 24(18):2117-2118, 2008.