Cagatay Turkay
PhD
City, University of London · Department of Computer Science
About
83
Publications
20,680
Reads
1,561
Citations
Introduction
Skills and Expertise
Additional affiliations
January 2010 - November 2014
Publications
Publications (83)
Visual analytics requires analysts to constantly plan and execute analysis tasks based on observations and generate visualizations and support analysis for gaining insights from data. Completing this process requires significant effort, emphasizing the necessity for a more intelligent and lightweight visual analytics process. Large language model a...
Visual analytics (VA) requires analysts to iteratively propose analysis tasks based on observations and execute tasks by creating visualizations and interactive exploration to gain insights. This process demands skills in programming, data processing, and visualization tools, highlighting the need for a more intelligent, streamlined VA approach. La...
Computational modeling is a commonly used technology in many scientific disciplines and has played a noticeable role in combating the COVID-19 pandemic. Modeling scientists conduct sensitivity analysis frequently to observe and monitor the behavior of a model during its development and deployment. The traditional algorithmic ranking of sensitivit...
This paper introduces design patterns for dashboards to inform dashboard design processes. Despite a growing number of public examples, case studies, and general guidelines, there is surprisingly little design guidance for dashboards. Such guidance is necessary to inspire designs and discuss tradeoffs in, e.g., screenspace, interaction, or informati...
We report on an ongoing collaboration between epidemiological modellers and visualization researchers by documenting and reflecting upon knowledge constructs—a series of ideas, approaches and methods taken from existing visualization research and practice—deployed and developed to support modelling of the COVID-19 pandemic. Structured independent c...
This paper introduces design patterns for dashboards to inform their design processes. Despite a growing number of public examples, case studies, and general guidelines, there is surprisingly little design guidance for dashboards. Such guidance is necessary to inspire designs and discuss tradeoffs in screenspace, interaction, and information shown....
Uncertainty quantification is a formal paradigm of statistical estimation that aims to account for all uncertainties inherent in the modelling process of real-world complex systems. The methods are directly applicable to stochastic models in epidemiology; however, they have thus far not been widely used in this context. In this paper, we provide a t...
We report on an ongoing collaboration between epidemiological modellers and visualization researchers by documenting and reflecting upon knowledge constructs -- a series of ideas, approaches and methods taken from existing visualization research and practice -- deployed and developed to support modelling of the COVID-19 pandemic. Structured indepen...
In the process of developing an infrastructure for providing visualization and visual analytics (VIS) tools to epidemiologists and modeling scientists, we encountered a technical challenge for applying a number of visual designs to numerous datasets rapidly and reliably with limited development resources. In this paper, we present a technical solut...
Deep learning methods are being increasingly used for urban traffic prediction where spatiotemporal traffic data is aggregated into sequentially organized matrices that are then fed into convolution-based residual neural networks. However, the widely known modifiable areal unit problem within such aggregation processes can lead to perturbations in...
Natural language and visualization are being increasingly deployed together for supporting data analysis in different ways, from multimodal interaction to enriched data summaries and insights. Yet, researchers still lack systematic knowledge on how viewers verbalize their interpretations of visualizations, and how they interpret verbalizations of v...
A graph is a mathematical model for representing a system of pairwise relationships between entities. The term “graph” or “graph data” is quite often used to refer, actually, to a system of relationships, which can be represented as a graph, rather than to the mathematical model itself. In line with this practice, the term “graph” is used in this c...
Data scientists usually aim at building computer models. Computer-oriented modelling methods and software tools are developed in statistics, machine learning, data mining, and various specialised disciplines, such as spatial statistics, transportation research, and animal ecology. However, valid and useful computer-based models cannot be obtained by...
There are two major types of temporal data, events and time series of attribute values, and there are methods for transforming one of them into the other. For events, a general analysis task is to understand how they are distributed in time. For time series, as well as for events of diverse kinds, a general task is to understand how the attribute v...
Texts are created for humans, who are trained to read and understand them. Texts are poorly suited for machine processing; still, humans need computer help when it is necessary to gain an overall understanding of characteristics and contents of large volumes of text or to find specific information in these volumes. Computer support in text analysis...
We begin with a simple motivating example that shows how putting spatial data on a map and seeing spatial relationships can help an analyst to make important discoveries. We consider possible contents and forms of spatial data, the ways of specifying spatial locations, and how to use spatial references for joining different datasets. We discuss the...
There are different kinds of spatio-temporal phenomena, including events that occur at different locations, movements of discrete entities, changes of shapes and sizes of entities, changes of conditions at different places and overall situations across large areas. Spatio-temporal data may specify positions, times, and characteristics of spatial ev...
Visual analytics approaches combine interactive visualisations with the use of computational techniques for data processing and analysis. Combining visualisation and computation has two sides. One side is computational support to visual analysis: outcomes of computations are intended to provide input to human cognition; for this purpose, they are r...
An illustrated example of problem solving is meant to demonstrate how visual representations of data support human reasoning and deriving knowledge from data.We argue that human reasoning plays a crucial role in solving non-trivial problems. Even when the primary goal of data analysis is to create a predictive model to be executed by computers, thi...
Images and video recordings are commonly categorised as unstructured data, which means that they are not primarily suited for computer analysis. The contents of unstructured data cannot be adequately represented by numbers or symbols and require the power of human vision for extracting meaningful information. While images and video are well suited...
In this chapter, we discuss how visual analytics techniques can support you in investigating and understanding the properties of your data and in conducting common data processing tasks. We consider several examples of possible problems in data and how they may manifest in visual representations, discuss where and why data quality issues can appear...
We introduce the basic principles and rules of the visual representation of information. Any visualisation involves so-called visual variables, such as position along an axis, size, colour hue and lightness, and shape of a graphical element. The variables differ by their perceptual properties, and it is important to choose appropriate variables dep...
One very common challenge that every data scientist has to deal with is to make sense of data sets with many attributes, where “many” can sometimes be tens, sometimes hundreds, and even thousands. Whether your goal is to do exploratory analysis on the relationships between the attributes, or to build models of the underlying phenomena, working wit...
Analysis is always focused on a certain subject, which is a thing or phenomenon that needs to be understood and, possibly, modelled. The data science process involves analysis of three different subjects: data, real world phenomena portrayed in the data, and computer models derived from the data. A subject can be seen as a system composed of multip...
This chapter very briefly summarises the main ideas and principles of visual analytics, while the main goal is to show by example how to devise new visual analytics approaches and workflows using general techniques of visual analytics: abstraction, decomposition, selection, arrangement, and visual comparison.We take an example of an analysis scenar...
This textbook presents the main principles of visual analytics and describes techniques and approaches that have proven their utility and can be readily reproduced. Special emphasis is placed on various instructive examples of analyses, in which the need for and the use of visualisations are explained in detail.
The book begins by introducing the m...
Multimodal approaches where interactive visualization and natural language are used in tandem are emerging as promising techniques for data analysis. A significant barrier in effectively designing such multimodal techniques is the lack of a systematic understanding of how people verbalize visual representations of data. Motivated by these gaps, thi...
User behaviour analytics (UBA) systems offer sophisticated models that capture users' behaviour over time with an aim to identify fraudulent activities that do not match their profiles. Motivated by the challenges in the interpretation of UBA models, this paper presents a visual analytics approach to help analysts gain a comprehensive understanding...
Promoting a wider range of contribution types can facilitate healthy growth of the visualization community, while increasing the intellectual diversity of visualization research papers. In this paper, we discuss the importance of contribution types and summarize contribution types that can be meaningful in visualization research. We also propose se...
We define behavior as a set of actions performed by some agent during a period of time. We consider the problem of analyzing a large collection of behaviors by multiple agents, more specifically, identifying typical behaviors as well as spotting behavior anomalies. We propose an approach leveraging topic modeling techniques -- LDA (Latent Dirichlet...
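The entry above names LDA as the technique for surfacing typical behaviours from many agents' action logs. The following is a minimal, self-contained sketch of that idea under assumptions of my own: each agent's action sequence is treated as a "document" of action tokens, and a small collapsed Gibbs sampler (not the paper's implementation) recovers per-agent topic mixtures.

```python
# Hypothetical sketch: LDA over action sequences via collapsed Gibbs sampling.
# Function names and hyperparameters are illustrative, not from the paper.
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Return smoothed per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w2i = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # random initial topic assignment per token, plus the three count tables
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]        # document -> topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic -> word counts
    nk = [0] * n_topics                         # tokens assigned to each topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w2i[w]] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], w2i[w]
                # remove the token, resample its topic, then add it back
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    # smoothed per-document topic proportions
    return [[(ndk[d][t] + alpha) / (len(docs[d]) + n_topics * alpha)
             for t in range(n_topics)] for d in range(len(docs))]
```

With topic mixtures in hand, agents whose mixture diverges sharply from the population's typical mixtures would be candidates for the behaviour anomalies the abstract mentions.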
Data-driven stories, widely used in journalism and scientific communication, match well with the recent focus on interpretable machine learning and AI explainability. Current technologies allow authors to break away from narratives that reflect traditional analytical workflows. To support designing such types of stories, we introduce a descriptive...
Visual analytics usually deals with complex data and uses sophisticated algorithmic, visual, and interactive techniques. Findings of the analysis often need to be communicated to an audience that lacks visual analytics expertise. This requires analysis outcomes to be presented in simpler ways than those typically used in visual analytics systems...
Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining, highly depend on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compu...
In this position paper, we argue that a combination of visualization and verbalization techniques is beneficial for creating broad and versatile insights into the structure and decision-making processes of machine learning models. Explainability of machine learning models is emerging as an important area of research. Hence, insights into the inner...
Across the globe, rapid growth and urbanization are placing increasing pressure on cities and governances to make the most efficient use of their resources. Some estimates predict that 70 percent of the world's population will live in a city or suburb by 2050. One way to address this challenge is to integrate digital technology into a city's resource...
Action sequences, where atomic user actions are represented in a labelled, timestamped form, are becoming a fundamental data asset in the inspection and monitoring of user behaviour in digital systems. Although the analysis of such sequences is highly critical to the investigation of activities in cyber security applications, existing solutions fai...
The analysis of financial assets’ correlations is fundamental to many aspects of finance theory and practice, especially modern portfolio theory and the study of risk. In order to manage investment risk, in‐depth analysis of changing correlations is needed, with both high and low correlations between financial assets (and groups thereof) important...
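The entry above rests on in-depth analysis of changing correlations between assets. As a hedged, stdlib-only sketch of that underlying computation (not the paper's method), a sliding-window Pearson correlation over two return series makes the "changing" part concrete: one correlation value per window position.

```python
# Illustrative sketch: rolling Pearson correlation of two asset return series.
from statistics import fmean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def rolling_correlation(r1, r2, window):
    """Correlation of each length-`window` slice; one value per window position."""
    return [pearson(r1[i:i + window], r2[i:i + window])
            for i in range(len(r1) - window + 1)]
```

Plotting such a series over time is one simple way to expose the correlation regime changes that matter for portfolio risk.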
Visual analytics systems combine machine learning or other analytic techniques with interactive data visualization to promote sensemaking and analytical reasoning. It is through such techniques that people can make sense of large, complex data. While progress has been made, the tactful combination of machine learning and data visualization is still...
We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety and the value of data has been significantly transforming the way that scientific research is carried out and b...
Digital communication has changed human life since the invention of the internet. The growth of e-mail, social websites, and other interpersonal communication systems has in turn brought rapid development, especially in the key technological area of data analytics. Using advanced forms of analytics helps the examination of data and better informs in...
The primary purpose for which statistical models are employed in the social sciences is to understand and explain phenomena occurring in the world around us. In order to be scientifically valid and actionable, the construction of such models needs to be strongly informed by theory. To accomplish this, there is a need for methodologies that can enabl...
Fundamental to the effective use of visualization as an analytic and descriptive tool is the assurance that presenting data visually provides the capability of making inferences from what we see. This paper explores two related approaches to quantifying the confidence we may have in making visual inferences from mapped geospatial data. We adapt Wic...
Electronic discovery (E-discovery) is a legal process for investigating various events in the corporate world for the purpose of producing/obtaining evidence; one such example is an email communication (e.g., the Enron case). Investigating emails collected over a period of time manually is a strenuous process, and the tools currently available on the...
To help improve efficiency and reduce costs involved in an electronic discovery (E-discovery) process for email investigations, visualisations can be of great help, and they can change the way analysts/investigators understand contacts, messages in inboxes, and their relationships. Email data is a central resource in E-discovery processes, but...
Many datasets have multiple perspectives – for example space, time and description – and often analysts are required to study these multiple perspectives concurrently. This concurrent analysis becomes difficult when data are grouped and split into small multiples for comparison. A design challenge is thus to provide representations that enable mult...
Comparing multiple variables to select those that effectively characterize complex entities is important in a wide variety of domains – geodemographics for example. Identifying variables that correlate is a common practice to remove redundancy, but correlation varies across space, with scale and over time, and the frequently used global statistics...
In interactive data analysis processes, the dialogue between the human and the computer is the enabling mechanism that can lead to actionable observations about the phenomena being investigated. It is of paramount importance that this dialogue is not interrupted by slow computational mechanisms that do not consider any known temporal human-computer...
Small multiples enable comparison by providing different views of a single data set in a dense and aligned manner. A common frame defines each view, which varies based upon values of a conditioning variable. An increasingly popular use of this technique is to project two-dimensional locations into a gridded space (e.g. grid maps), using the underly...
We describe and discuss a visual analysis prototype to support volume crime analysis, a form of exploratory data analysis that aims to identify and describe patterns of criminality using historical and recent crime reports. Analysis requirements are relatively familiar: analysts wish to identify, define and compare sets of crime reports across mult...
Poster presented at IEEE VIS 2015.
The visual analysis of geographically referenced datasets with a large number of attributes is challenging due to the fact that the characteristics of the attributes are highly dependent upon the locations at which they are focussed, and the scale and time at which they are measured. Specialized interactive visual methods are required to help analy...
Predicting how temporally varying phenomena will evolve over time, or in other terms forecasting, is one of the fundamental tasks in time series analysis. Prediction has gained particular importance with the advent of real time data collection activities. Although there exist several sophisticated methodologies to predict time series, the success o...
Flow data is often visualized by animated particles inserted into a flow field. The velocity of a particle on the screen is typically linearly scaled by the velocities in the data. However, the perception of velocity magnitude in animated particles is not necessarily linear. We present a study on how different parameters affect relative motion perc...
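The entry above notes that perceived velocity of animated particles is not necessarily linear in the displayed speed. A hedged sketch of the kind of perceptually motivated alternative such a study might inform is a power-law mapping from data velocity to screen speed; the function name and the exponent below are illustrative assumptions, not values reported in the study.

```python
# Hypothetical power-law speed mapping: compresses the upper range so that
# mid-range velocity differences remain visually distinguishable.
def screen_speed(v, v_max, s_max, exponent=0.6):
    """Map a data velocity magnitude in [0, v_max] to a screen speed in [0, s_max]."""
    if v_max <= 0:
        raise ValueError("v_max must be positive")
    v_clamped = min(max(v, 0.0), v_max)   # clamp out-of-range magnitudes
    return s_max * (v_clamped / v_max) ** exponent
```

With an exponent below 1 the mapping is compressive: half the maximum data velocity maps to well over half the maximum screen speed, which is one plausible way to counteract a sub-linear perception of motion magnitude.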
Deployment of biometric systems in a specific environment is not straightforward. Based on pre-deployment performance test results, a decision maker needs to consider the selection of sensors and matching algorithms in terms of the cost, expected false-match and false-non-match failure rates, and the underlying quality factors, which depend on ope...
Medical cohort studies enable the study of medical hypotheses with many samples. Often, these studies acquire a large amount of heterogeneous data from many subjects. Usually, researchers study a specific data subset to confirm or reject specific hypotheses. A new approach enables the interactive visual exploration and analysis of such data, helpin...
Dual analysis uses statistics to describe both the dimensions and rows of a high-dimensional dataset. Researchers have integrated it into StratomeX, a Caleydo view for cancer subtype analysis. In addition, significant-difference plots show the elements of a candidate subtype that differ significantly from other subtypes, thus letting analysts chara...
With the advance of new data acquisition and generation technologies, the biomedical domain is becoming increasingly data-driven. Thus, understanding the information in large and complex data sets has been in the focus of several research fields such as statistics, data mining, machine learning, and visualization. While the first three fields predo...
Molecular surfaces provide a useful means for analyzing interactions between biomolecules, such as the identification and characterization of ligand binding sites to a host macromolecule. We present a novel technique which extracts potential binding sites, represented by cavities, and characterizes them by 3D graphs and by amino acids. The binding sites...
High dimensional, heterogeneous datasets are challenging for domain experts to analyze. A very large number of dimensions often pose problems when visual and computational analysis tools are considered. Analysts tend to limit their attention to subsets of the data and lose potential insight in relation to the rest of the data. Generating new hypoth...
The process of surface perception is complex and based on several influencing factors, e.g., shading, silhouettes, occluding contours, and top down cognition. The accuracy of surface perception can be measured and the influencing factors can be modified in order to decrease the error in perception. This paper presents a novel concept of how a perce...
Datasets with a large number of dimensions per data item (hundreds or more) are challenging both for computational and visual analysis. Moreover, these dimensions have different characteristics and relations that result in sub-groups and/or hierarchies over the set of dimensions. Such structures lead to heterogeneity within the dimensions. Although...
Molecular surfaces provide a suitable way to analyze and to study the evolution and interaction of molecules. The analysis is often concerned with visual identification of binding sites of ligands to a host macromolecule. We present a novel technique that is based on implicit representation, which extracts all potential binding sites and allows an...
Microarray data represents the expression levels of genes for different samples and for different conditions. It has been a central topic in bioinformatics research for a long time already. Researchers try to discover groups of genes that are responsible for specific biological processes. Statistical analysis tools and visualizations have been wide...
In this study, an information theory based framework to automatically construct analytical maps of a crowd's locomotion, called behavior maps, is presented. For these behavior maps, two distinct use cases in the crowd simulation domain are introduced: (i) automatic camera control and (ii) behavioral modeling. The first use case for behavior maps is an automati...
In this study, an information theory based framework to automatically construct analytical maps of a crowd's locomotion, called behavior maps, is presented. For these behavior maps, two distinct use cases in the crowd simulation domain are introduced: (i) automatic camera control and (ii) behavioral modeling. The first use case for behavior maps is an automati...
In many application fields, data analysts have to deal with datasets that contain many expressions per item. The effective analysis of such multivariate datasets is dependent on the user's ability to understand both the intrinsic dimensionality of the dataset as well as the distribution of the dependent values with respect to the dimensions. In thi...
Crowds must be simulated believably in terms of their appearance and behavior to improve a virtual environment's realism. Due to the complex nature of human behavior, realistic behavior of agents in crowd simulations is still a challenging problem. In this paper, we propose a novel behavioral model which builds analytical maps to control agents' be...
Cluster analysis is a useful method which reveals underlying structures and relations of items after grouping them into clusters. In the case of temporal data, clusters are defined over time intervals where they usually exhibit structural changes. Conventional cluster analysis does not provide sufficient methods to analyze these structural changes,...
Cluster analysis is a popular method for data investigation where data items are structured into groups called clusters. This analysis involves two sequential steps, namely cluster formation and cluster evaluation. In this paper, we propose the tight integration of cluster formation and cluster evaluation in interactive visual analysis in order to...
In this paper, we propose a novel behavioral model which builds analytical maps to control agents' behavior adaptively with agent-crowd interaction formulations. We introduce information theoretical concepts to construct analytical maps automatically. Our model can be integrated into crowd simulators and enhance their behavioral complexity.
Web search query logs contain valuable information which can be utilized for personalization and improvement of search engine performance. The aim in this paper is to cluster users based on their interests, and analyze the temporal dynamics of these clusters. In the proposed approach, we first apply clustering techniques to group similar users with...
Navigation and monitoring of large and crowded virtual environments is a challenging task and requires intuitive camera control techniques to assist users. In this paper, we present a novel automatic camera control technique providing a scene analysis framework based on information theory. The developed framework contains a probabilistic model...