Project

Interactive data visualization by means of controllable dimensionality reduction

Goal: The objective of this project is to link the field of dimensionality reduction (DR) with that of information visualization (IV), in order to harness the special properties of the latter within DR frameworks. Of particular interest are the properties of controllability and interactivity, which should make DR outcomes significantly more understandable and tractable for the (not-necessarily-expert) user, giving users the freedom to select the best way to represent their data. In other words, the goal of this project is to develop a DR framework that enables fast, interactive visualization of data representations, making DR outcomes more intelligible and allowing users to modify the views of their data according to their needs at an affordable effort.


Project log

Diego Peluffo
added 2 research items
This letter formally introduces the concept of interaction model (IM), which has been used either directly or tangentially in previous works but never defined. Broadly speaking, an IM consists of the use of a mixture of dimensionality reduction (DR) techniques within an interactive data visualization framework. The rationale for creating an IM is the need for simultaneously harnessing the benefits of several DR approaches to reach a data representation that is intelligible and/or fitted to a user's criterion. As a remarkable advantage, an IM naturally provides a generalized framework for designing both interactive DR approaches and ready-to-use data visualization interfaces. In addition to a comprehensive overview of the basics of data representation and dimensionality reduction, the main contribution of this manuscript is the elegant definition of the concept of IM in mathematical terms.
Broadly, the area of dimensionality reduction (DR) is aimed at providing ways to harness high-dimensional (HD) information through the generation of lower-dimensional (LD) representations, following a certain data-structure-preservation criterion. Dozens of DR techniques have been reported in the literature, and they are commonly used as a pre-processing stage within exploratory data analyses for either machine learning or information visualization (IV) purposes. Nonetheless, the selection of a proper method is a nontrivial and, very often, toilsome task. In this sense, a ready and natural way to incorporate an expert's criterion into the analysis process, while making this task more tractable, is the use of interactive IV approaches. Regarding the incorporation of experts' prior knowledge, there still exists a range of open issues. In this work, we introduce the here-named Inverse Data Visualization Framework (IDVF), which is an initial approach to making the input prior knowledge directly interpretable. Our framework is based on 2D-scatter-plot visuals and spectral kernel-driven DR techniques. To capture the user's knowledge or requirements, users are requested to move data points so that the resulting points are located wherever is most convenient according to their criterion. Next, following a kernel principal component analysis approach and a mixture of kernel matrices, our framework accordingly estimates an approximate LD space. The rationale behind the proposed IDVF is thus to adjust the resulting LD space as accurately as possible to the representation fulfilling the user's knowledge and requirements. Results are highly promising and open the possibility of novel DR-based visualization approaches.
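The abstract does not spell out the estimation step, but its core idea, fitting kernel-mixture coefficients to the layout the user produced, can be sketched as below. This is a minimal sketch under stated assumptions: estimate_mixture_weights is a hypothetical helper, and the plain least-squares fit against the Gram matrix of the edited layout (with a nonnegativity clip) stands in for the paper's actual objective. The mixed kernel would then be embedded with kernel PCA, as in the KPCA sketch further down this log.

```python
import numpy as np

def estimate_mixture_weights(kernels, y_star):
    """Fit alpha so that sum_i alpha_i * K_i approximates the Gram matrix
    of the user-edited 2-D layout y_star (an n x 2 array). Illustrative
    stand-in for the paper's objective, not the authors' exact method."""
    g_target = y_star @ y_star.T                        # Gram matrix of the desired layout
    a = np.stack([k.ravel() for k in kernels], axis=1)  # (n*n, m) design matrix
    alpha, *_ = np.linalg.lstsq(a, g_target.ravel(), rcond=None)
    alpha = np.clip(alpha, 0.0, None)                   # keep the mixture nonnegative
    return alpha / alpha.sum()                          # normalize to sum to one
```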
Diego Peluffo
added 3 research items
In recent times, an undeniable fact is that the amount of available data has increased dramatically, mainly due to the advance of new technologies allowing for the storage and communication of enormous volumes of information. In consequence, there is an important need for finding the relevant information within the raw data through the application of novel data visualization techniques that permit the correct manipulation of data. This issue has motivated the development of graphic forms for visually representing and analyzing high-dimensional data. Particularly, in this work, we propose a graphical approach that allows the combination of dimensionality reduction (DR) methods using an angle-based model, making the data visualization more intelligible. The approach is designed for ready use, so that the input parameters are interactively given by the user within a user-friendly environment. It enables users (even non-experts) to intuitively select a particular DR method or perform a mixture of methods. The experimental results prove that the interactive manipulation enabled by the proposed model, owing to its ability to display a variety of embedded spaces, makes the task of selecting an embedded space simpler and more adequately fitted to a specific need.
Dimensionality reduction (DR) is a methodology used in many fields linked to data processing, and may represent a preprocessing stage or be an essential element for the representation and classification of data. The main objective of DR is to obtain a new representation of the original data in a space of smaller dimension, such that more refined information is produced, the time of subsequent processing is decreased, and/or visual representations more intelligible for human beings are generated. Spectral DR methods involve the calculation of an eigenvalue and eigenvector decomposition, which usually demands a high computational cost, making a more dynamic and interactive user-machine integration difficult to achieve. Therefore, designing an interactive IV system based on spectral DR methods requires a strategy to reduce the computational cost of calculating eigenvectors and eigenvalues. For this purpose, the use of locally linear submatrices and spectral embedding is proposed. This allows integrating natural intelligence with computational intelligence for the representation of data interactively, dynamically and at low computational cost. Additionally, an interactive model is proposed that allows the user to dynamically visualize the data through a weighted mixture.
Diego Peluffo
added 3 research items
The large amount of data generated by different activities (academic, scientific, business and industrial, among others) contains meaningful information that allows developing processes and techniques, with scientific validity, to optimally explore such information. Doing so, we obtain new knowledge to properly make decisions. Nowadays, a new and innovative field is rapidly growing in importance: Artificial Intelligence, which brings together the computer processing power of modern machines and human reasoning. By synergistically combining them, in other words, performing an integration of natural and artificial intelligence, it is possible to discover knowledge more effectively, finding hidden trends and patterns belonging to the predictive model database, as well as allowing for new observations and considerations from beforehand-known data by using data analysis methods together with the knowledge and skills (of holistic, flexible and parallel type) from human reasoning. This work briefly reviews the basics and recent works on artificial and natural intelligence integration in order to introduce users and researchers to integration approaches in this field. Key aspects to conceptually compare them are also provided.
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions. This paper proposes an additional refinement, in the form of multiscale similarities, namely averages of softmax ratios with decreasing bandwidths. The objective is to maximize the embedding quality at all scales, with a better preservation of both local and global neighborhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments on several data sets show that this multiscale version of SNE, combined with an appropriate cost function (sum of Jensen-Shannon divergences), outperforms all previous variants of SNE.
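As a rough, hedged illustration of the ingredients named above: single-scale SNE similarities are row-normalized softmax values of negative squared distances, and a multiscale variant can average them over several bandwidths. The uniform mean over an explicit sigma list below is a simplifying assumption; the paper works with averages of softmax ratios and a principled bandwidth schedule.

```python
import numpy as np

def sne_similarities(x, sigma):
    """Row-normalized softmax similarities for one bandwidth sigma."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    p = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(p, 0.0)                  # no self-similarity
    return p / p.sum(axis=1, keepdims=True)

def multiscale_similarities(x, sigmas=(0.25, 0.5, 1.0, 2.0)):
    """Average single-scale similarity matrices over several bandwidths
    (illustrative schedule; the paper's choice differs)."""
    return np.mean([sne_similarities(x, s) for s in sigmas], axis=0)
```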
Diego Peluffo
added a research item
Dimensionality reduction (DR) methods are able to produce low-dimensional representations of input data sets which may become intelligible for human perception. Nonetheless, most existing DR approaches lack the ability to naturally provide the user with controllability and interactivity. In this connection, data visualization (DataVis) is an ideal complement. This work presents an integration of DR and DataVis through a new approach for data visualization based on a mixture of DR resultant representations while using visualization principles. Particularly, the mixture is done through a weighted sum, whose weighting factors are defined by the user through a novel interface. The interface's concept relies on the combination of color-based and geometrical perception in a circular framework, so that users have at hand several indicators (shape, color, surface size) to make a decision on a specific data representation. Besides, pairwise similarities are plotted as a non-weighted graph to include a graphic notion of the structure of the input data. Therefore, the proposed visualization approach enables the user to interactively combine DR methods, while providing information about the structure of the original data, thus making the selection of a DR scheme more intuitive.
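A minimal sketch of the weighted-sum mixture just described, assuming each DR method's output is available as an n x 2 array and the weights come from the circular interface; the per-embedding centering and rescaling is an illustrative normalization so that no single method dominates, not necessarily the paper's exact choice.

```python
import numpy as np

def mix_embeddings(embeddings, weights):
    """Combine several 2-D embeddings by a user-weighted sum."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # weights on the simplex
    mixed = np.zeros_like(embeddings[0], dtype=float)
    for wi, y in zip(w, embeddings):
        y = (y - y.mean(axis=0)) / np.abs(y).max()    # center and rescale each embedding
        mixed += wi * y
    return mixed
```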
Paul Rosero
added 2 research items
Dynamic or time-varying data analysis is of great interest in emerging and challenging research on automation and machine learning topics. In particular, motion segmentation is a key stage in the design of dynamic data analysis systems. Although several studies have addressed this issue, there still does not exist a final solution highly compatible with subsequent clustering/classification tasks. In this work, we propose a motion segmentation approach compatible with kernel spectral clustering (KSC), here termed KSC-MS, which is based on multiple kernel learning and variable ranking approaches. The proposed KSC-MS is able to automatically segment movements within a dynamic framework while providing robustness to noisy environments.
To perform an exploration process over complex structured data within unsupervised settings, the so-called kernel spectral clustering (KSC) is one of the most recommended and appealing approaches, given its versatility and elegant formulation. In this work, we explore the relationship between KSC and other well-known approaches, namely normalized cut clustering and kernel k-means. To do so, we first deduce a generic KSC model from a primal-dual formulation based on least-squares support-vector machines (LS-SVM). For experiments, KSC as well as the other considered methods are assessed on image segmentation tasks to prove their usability.
Andres Javier Anaya Isaza
added a research item
Visual data representation is a knowledge-extraction technique that provides a perception of all the information available within Big Data. To engage human attention, the data set must be represented in an intuitive way, so that the user can make adequate decisions. Consequently, the enhancement of the human-computer experience relies on the use of data analysis techniques in which computational resources are optimized. In this work, a methodology is developed through descriptive, exploratory and documentary research on the different data visualization approaches oriented to exploratory analysis for scientific discovery and the augmentation of human capabilities in support of automatic decision making. Keywords: Big Data, data mining, visual morphology, visualization.
Juan Antonio Castro Silva
added a research item
This work describes a new model for interactive data visualization following a dimensionality-reduction (DR)-based approach. Particularly, the mixture of the resulting spaces of DR methods is considered, which is carried out by a weighted sum. For the sake of user interaction, the corresponding weighting factors are given via an intuitive color-based interface. Also, to depict the DR outcomes while showing information about the input high-dimensional data space, the low-dimensional representations reached by the mixture are conveyed using scatter plots enhanced with an interactive data-driven visualization. In this connection, a constrained dissimilarity approach defines the graph to be drawn on the scatter plot.
Ana Cristina Umaquinga
added a research item
Given the exponential and vertiginous growth of the volume of data of different types (structured, semi-structured and unstructured) coming from a variety of sources, among them the web, social networks, databases, audio/video files, transactional data, sensors, and machine-to-machine (M2M) communication, the Big Data area is intended to address the challenges of information processing. The analysis of large volumes of data, Big Data Analytics (BDA), facilitates the discovery of patterns, predictions, fraud, market trends, customer behaviors and preferences, and other useful information that would not be attainable with conventional tools. BDA thus becomes a support tool for business decision making and for competitive advantage in real time, or in the shortest possible time relative to competitors, offering new levels of competitiveness, data-driven processes and business models, and risk reduction to retain and attract a greater number of customers, generating an increase in companies' revenue sources. This article is exploratory, descriptive and documentary: it presents a descriptive study of the impact of Big Data Analytics in the business field, as well as a brief tour of its trends, opportunities, difficulties and challenges. The study aims to be of use to the research community, to company staff, and to those being introduced to Big Data Analytics, for a better understanding of this field.
Paul Rosero
added a research item
Abstract. This work presents an improved interactive data visualization interface based on a mixture of the outcomes of dimensionality reduction (DR) methods. Broadly, it works as follows: the user inputs the mixture weighting factors through a visual and intuitive interface with a primary-light-colors-based model (red, green, and blue). By design, such a mixture is a weighted sum of the color tones. Additionally, the low-dimensional representation spaces produced by DR methods are graphically depicted using scatter plots powered via an interactive data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to simultaneously be drawn over the scatter plot. Our interface enables the user to interactively combine DR methods through the human perception of color, while providing information about the structure of the original data. It thus makes the selection of a DR scheme more intuitive, even for non-expert users.
Paul Rosero
added a research item
Business Intelligence (BI) is the set of strategies and tools for analyzing large volumes of data in order to find consumption patterns or trends and to establish business strategies. Achieving this goal requires services and applications such as Real-Time BI, Social BI, Cloud BI, BI 3.0, Business Analytics and Mobile BI. The whole BI process is supported by different analyses that implement machine learning algorithms over large volumes of data from different sources, regarded as Big Data. This work covers the functionalities and requirements of BI, from the initial concept through to specific concepts and tools for its implementation.
Diego Peluffo
added a research item
This work presents a new interactive data visualization approach based on a mixture of the outcomes of dimensionality reduction (DR) methods. Such a mixture is a weighted sum, whose weighting factors are defined by the user through a visual and intuitive interface. Additionally, the low-dimensional representation spaces produced by DR methods are graphically depicted using scatter plots powered via an interactive data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to be drawn on the scatter plot. Our visualization approach enables the user to interactively combine DR methods while providing information about the structure of the original data, thus making the selection of a DR scheme more intuitive.
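A hedged sketch of the similarity-powered scatter plot: pairwise Gaussian similarities computed from the high-dimensional data decide which edges are drawn over the low-dimensional points. The similarity kernel and the 0.7 threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_with_similarity_graph(y, x_high, threshold=0.7):
    """Scatter the 2-D embedding y and overlay edges between similar points."""
    d2 = np.sum((x_high[:, None, :] - x_high[None, :, :]) ** 2, axis=-1)
    s = np.exp(-d2 / d2.mean())                       # Gaussian pairwise similarities
    fig, ax = plt.subplots()
    for i, j in zip(*np.where(np.triu(s, k=1) > threshold)):
        ax.plot(y[[i, j], 0], y[[i, j], 1], lw=0.5, c="gray")  # graph edge
    ax.scatter(y[:, 0], y[:, 1], zorder=3)            # points drawn on top of edges
    return fig
```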
Diego Peluffo
added a research item
The enormous volumes of data generated by academic, scientific, business and industrial activity, among many others, contain highly valuable information, which makes it necessary to develop robust, scientifically valid processes and techniques to explore these large amounts of data optimally, with the purpose of obtaining information relevant to generating new knowledge and making sound decisions. The robustness and high computational processing capacity of modern machines are exploited by areas such as artificial intelligence; when integrated holistically with natural intelligence, that is, when sophisticated data analysis methods are synergistically combined with the knowledge, skills and flexibility of human reasoning, knowledge can be generated more effectively. Information visualization proposes efficient ways of bringing the results generated by algorithms to human understanding, making it possible to find hidden trends and patterns visually; these can form the basis of predictive models that let analysts produce new observations and considerations from existing data, improving the performance of machine learning systems, making the results more intelligible, and improving interactivity and controllability for the user. However, presenting and/or representing data in a comprehensible, intuitive and dynamic way is not a trivial task; one of the biggest problems visualization faces is the high dimensionality of data, where dimension is understood as the number of variables or attributes that characterize an object. An effective solution is dimensionality reduction (DR) methods, which represent the original high-dimensional data in dimensions intelligible to human beings (2D or 3D). Currently, kernel methods are a good DR alternative owing to their versatility and easy implementation in programming environments. This work presents a brief description and usage guide of a generalized method known as kernel principal component analysis (KPCA). Keywords: artificial intelligence, natural intelligence, kernel PCA, dimensionality reduction.
Andres Javier Anaya Isaza
added 2 research items
Nowadays, a consequence of data overload is that the world's technological capacity to collect, communicate, and store large volumes of data is increasing faster than human analysis skills. This issue has motivated the development of graphic ways to visually represent and analyze high-dimensional data. Particularly, in this work we propose a graphical interface that allows the combination of dimensionality reduction (DR) methods using a chromatic model, to make data visualization more intelligible for humans. The interface is designed for easy and interactive use, so that input parameters are given by the user via the selection of RGB values inside a given surface. The proposed interface enables (even non-expert) users to intuitively either select a concrete DR method or carry out a mixture of methods. Experimental results prove the usability of our interface, making the selection or configuration of a DR-based visualization an intuitive and interactive task for the user.
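One way to read the chromatic model is as a map from a picked RGB value to mixture weights for three DR methods; the sketch below, with its simple proportional normalization, is an assumption for illustration rather than the interface's actual mapping.

```python
def rgb_to_weights(r, g, b):
    """Map an RGB pick (0-255 channels) to weights for three DR methods.
    Hypothetical proportional mapping, not the paper's exact model."""
    total = float(r + g + b)
    if total == 0.0:
        return (1 / 3, 1 / 3, 1 / 3)   # black pick: fall back to a uniform mixture
    return (r / total, g / total, b / total)

# A strongly red pick favors the first method:
# rgb_to_weights(200, 30, 25) -> (~0.78, ~0.12, ~0.10)
```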
Today, the volume of available data is growing exponentially, introducing the emergent area called Big Data. Along with this growth, the demand for tools, techniques and devices to store, transmit and process high-dimensional (HD) data has increased. Most available methodologies for processing HD data output abstract outcomes and do not involve the user in the selection or parameter tuning of the data analysis techniques. In this work, we propose a visual analysis methodology following principles of interactivity and controllability, so that users (even non-expert ones) can intuitively select a dimensionality reduction method to generate representations intelligible for human beings. Keywords: Big Data, dimensionality reduction, visual analysis. I. INTRODUCTION: The growth of the volume of data of different types (structured, unstructured, semi-structured) is exponential, and in terms of storage it currently reaches the order of petabytes and exabytes. Such data are generated by different sources, among them human beings, machine-to-machine (M2M) communication, large transactional data, and biometric information [1], [2]. The large volume of information is due to advances in electronics and computing, such as sensors, satellites, magnetic stripes, GPS, web technologies, cloud computing, and social networks [3], [4]. One of the information-management challenges the market faces is to analyze, discover and understand beyond what traditional processes and tools report about the information [1]. Indeed, if information cannot be easily interpreted, greater consumption of technological and economic resources, time, and human talent (the required presence of data analysis experts) is incurred. Common data processing techniques cannot fully recover the hidden information, or lack the capacity to handle it; consequently, data visualization becomes indispensable in many cases, especially in the analysis stages where hypotheses are formed…
Juan Carlos Alvarado Pérez
added 2 research items
Nowadays, great amounts of data are being created by several sources from academic, scientific, business and industrial activities. Such data intrinsically contain meaningful information that allows for developing techniques, with scientific validity, to explore that information. In this connection, the aim of artificial intelligence (AI) is obtaining new knowledge to make decisions properly. AI has taken an important place in scientific and technological development communities, and recently drives computer-based processing devices for modern machines. Under the premise that the feedback provided by human reasoning, which is holistic, flexible and parallel, may enhance data analysis, the need for the integration of natural and artificial intelligence has emerged. Such an integration makes the process of knowledge discovery more effective, providing the ability to easily find hidden trends and patterns belonging to the database predictive model, as well as allowing for new observations and considerations from beforehand-known data by using both data analysis methods and the knowledge and skills of human reasoning. In this work, we review the basics and recent works on artificial and natural intelligence integration in order to introduce users and researchers to this emergent field. Key aspects to conceptually compare them are also provided.
Ana Cristina Umaquinga
added 2 research items
This work presents a comparative study of linear dimensionality reduction methods, namely Principal Component Analysis and Linear Discriminant Analysis. The study aims to determine, under objective criteria, which of these techniques achieves the best separability between classes. For experimental validation, two databases from the UC Irvine Machine Learning Repository are used, processing the attributes of each data set so as to visually confirm the quality of the obtained results. The obtained embeddings are analyzed and compared using the RNX(K) curve, whose area under the curve indicates which representation better preserves local or global topology; visualizations in a lower-dimensional space are then generated to observe the separability between classes while preserving the global structure of the data. Keywords: principal component analysis, linear discriminant analysis, machine learning, linear classification, supervised classification, dimensionality reduction methods.
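The RNX(K) curve mentioned above comes from Lee and Verleysen's co-ranking framework: QNX(K) measures the average overlap between K-ary neighborhoods in the high- and low-dimensional spaces, and RNX(K) rescales it against chance. A brute-force sketch (exact neighbor search, no tie handling) follows.

```python
import numpy as np

def rnx(x_high, y_low, k):
    """R_NX(K) = ((N-1) Q_NX(K) - K) / (N - 1 - K), with Q_NX the mean
    K-ary neighborhood overlap between the two spaces."""
    n = x_high.shape[0]

    def knn(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # exclude each point itself
        return np.argsort(d, axis=1)[:, :k]

    hd, ld = knn(x_high), knn(y_low)
    overlap = sum(len(set(a) & set(b)) for a, b in zip(hd, ld))
    q_nx = overlap / (k * n)                 # average neighborhood agreement
    return ((n - 1) * q_nx - k) / (n - 1 - k)
```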
In the field of information visualization (IV) in Big Data (also called DataVis, InfoVis, or Visual Analytics, VA), countless efforts have been made in enterprise, education and research spheres, among others. Such efforts have led to different proposals for software tools using IV interfaces and techniques. Currently, there are dozens of tools that enhance and specialize in certain visualization techniques; therefore, the choice of a particular tool is not a trivial task for users. In this work, we present a descriptive study of IV techniques encompassing several groups or types of techniques, such as geometric projection, hierarchical, interactive, and icon-based techniques, among others. To this end, a tabulation of information is performed, presenting the software tools and visualization techniques considered in this study, so that the techniques most commonly used and recommended for use in Open Source and enterprise-solution environments can be readily identified. To do this, we start from a review of the literature on IV and Visual Analytics, as well as scientific articles about Big Data analysis tools, focused on establishing software tools and visualization techniques. This review covers a total of 58 visualization techniques and 31 software tools. As a result, an assessment of visualization techniques is obtained, and key aspects and recommendations are established for selecting visualization techniques according to the user's requirements. Keywords: Big Data visualization tools, commercial software, DataVis, open source software tools, visualization techniques.
Diego Peluffo
added an update
Similarity-based approach for interactive data visualization
Diego Peluffo
added 8 research items
Dimensionality reduction methods aimed at preserving the data topology have shown to be suitable for reaching high-quality embedded data; in particular, those based on divergences, such as stochastic neighbour embedding (SNE). The big advantage of SNE and its variants is that the neighbor preservation is done by optimizing the similarities in both the high- and low-dimensional spaces. This work presents a brief review of SNE-based methods. Also, a comparative analysis of the considered methods is provided, covering important aspects such as algorithm implementation, relationships between methods, and performance. The aim of this paper is to investigate recent alternatives to SNE as well as to provide substantial results and discussion to compare them.
Dimensionality reduction is a key stage for both the design of a pattern recognition system and data visualization. Recently, there has been an increasing interest in methods aimed at preserving the data topology. Among them, Laplacian eigenmaps (LE) and stochastic neighbour embedding (SNE) are the most representative. In this work, we present a brief comparison among very recent methods that are alternatives to LE and SNE. Comparisons are made mainly on two aspects: algorithm implementation and complexity. Also, relations between methods are depicted. The goal of this work is to provide researchers in this field with some discussion as well as decision criteria to choose a method according to the user's needs and/or a good trade-off between performance and required processing time.
Paul Rosero
added a research item
This work presents a new interactive data visualization approach based on a mixture of the outcomes of dimensionality reduction (DR) methods. Such a mixture is a weighted sum, whose weighting factors are defined by the user through a visual and intuitive interface. Additionally, the low-dimensional representation spaces produced by DR methods are graphically depicted using scatter plots powered via an interactive data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to be drawn on the scatter plot. Our visualization approach enables the user to interactively combine DR methods while providing information about the structure of the original data, thus making the selection of a DR scheme more intuitive.
Diego Peluffo
added an update
Preliminary results: first MATLAB and Processing interfaces:
Interactive Interface for Data-Vis
Color-Based Model for Dimensionality Reduction
Diego Peluffo
added 2 research items
This work introduces a multiple kernel learning (MKL) approach for selecting and combining different spectral methods of dimensionality reduction (DR). From a predefined set of kernels representing conventional spectral DR methods, a generalized kernel is calculated by means of a linear combination of kernel matrices. Coefficients are estimated via a variable ranking aimed at quantifying how much each variable contributes to optimizing a variance preservation criterion. All considered kernels are tested within a kernel PCA framework. The experiments are carried out over well-known real and artificial data sets. The performance of the compared DR approaches is quantified by a scaled version of the average agreement rate between K-ary neighborhoods. The proposed MKL approach exploits the representation ability of every single method to reach a better embedding, both for more intelligible visualization and for preserving the structure of the data.
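The pipeline the abstract describes, a linear combination of precomputed kernel matrices embedded through kernel PCA, can be sketched as below. Passing the coefficients in as given is the assumption here: the paper's variable-ranking procedure for estimating them is not reproduced.

```python
import numpy as np

def kpca_embed(kernels, coeffs, dim=2):
    """Embed the mixture sum_i coeffs[i] * kernels[i] via kernel PCA."""
    k = sum(c * ker for c, ker in zip(coeffs, kernels))
    n = k.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # double-centering matrix
    kc = j @ k @ j
    w, v = np.linalg.eigh(kc)                # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # keep the leading components
    return v[:, idx] * np.sqrt(np.abs(w[idx]))
```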
This paper presents the development of a unified view of spectral clustering and unsupervised dimensionality reduction approaches within a generalized kernel framework. To do so, we propose a multipurpose latent variable model in terms of a high-dimensional representation of the input data matrix, which is incorporated into a least-squares support vector machine to yield a generalized optimization problem. After solving it via a primal-dual procedure, the final model results in a versatile projected matrix able to represent data in a low-dimensional space, as well as to provide information about clusters. Also, our formulation yields solutions for kernel spectral clustering and weighted-kernel principal component analysis.
Diego Peluffo
added a project goal
The objective of this project is to link the field of dimensionality reduction (DR) with that of information visualization (IV), in order to harness the special properties of the latter within DR frameworks. Of particular interest are the properties of controllability and interactivity, which should make DR outcomes significantly more understandable and tractable for the (not-necessarily-expert) user, giving users the freedom to select the best way to represent their data. In other words, the goal of this project is to develop a DR framework that enables fast, interactive visualization of data representations, making DR outcomes more intelligible and allowing users to modify the views of their data according to their needs at an affordable effort.