
Boris Kovalerchuk
- Prof. PhD.
- Professor (Full) at Central Washington University
About
237 Publications
38,591 Reads
2,002 Citations
Introduction
Visual Knowledge Discovery, Machine Learning, High Dimensional Data Visualization
Current institution
Additional affiliations
September 1996 - present
Publications (237)
High-risk artificial intelligence and machine learning classification tasks, such as healthcare diagnosis, require accurate and interpretable prediction models. However, classifier algorithms typically sacrifice individual case-accuracy for overall model accuracy, limiting analysis of class overlap areas regardless of task significance. The Adaptiv...
In the geological context, the analysis of multidimensional data is a very common task and requires visualization techniques for exploration. General Line Coordinates (GLC) is a technique especially intended for lossless representation among the various techniques available for the visualization of multidimensional data. In this study, the applica...
An insufficient amount of available training data is a critical challenge for both development and deployment of artificial intelligence and machine learning (AI/ML) models. This paper proposes a unified approach to both synthetic data generation (SDG) and automated data labeling (ADL) with a unified SDG-ADL algorithm. SDG-ADL uses multidimensional (...
In the geological context, the analysis of multidimensional data is a very common task and requires visualization techniques for exploration. General Line Coordinates (GLC) is a technique especially intended for lossless representation among the various techniques available for the visualization of multidimensional data. In this study, the application...
To increase the interpretability and prediction accuracy of the Machine Learning (ML) models, visualization of ML models is a key part of the ML process. Decision Trees (DTs) are essential in machine learning because they are used to understand many black box ML models including Deep Learning models. In this research, two new methods for creation a...
This study explores a new methodology for machine learning classification tasks in 2-dimensional visualization space (2-D ML) using Visual knowledge Discovery in lossless General Line Coordinates. It is shown that this is a full machine learning approach that does not require processing n-dimensional data in an abstract n-dimensional space. It enab...
This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. The graphic data representation in parallel coordinates made the concepts of hypercubes and hyperblocks (HBs) simple to understand for end users. It is suggested to use mixed and pure hyperblocks in the proposed data classifier al...
Building accurate and explainable/interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes for ML algorithms to support accurate and explainable ML models, methods for lossless visua...
Understanding black-box Machine Learning methods on multidimensional data is a key challenge in Machine Learning. While many powerful Machine Learning methods already exist, these methods are often unexplainable or perform poorly on complex data. This paper proposes Visual Knowledge Discovery approaches based on several forms of lossless General Li...
Domains with high-stakes tasks include medical diagnosis, financial decision-making, autonomous vehicles, criminal justice, disaster response, search and rescue operations, and others. In high-stakes tasks the effects of incorrect decisions or predictions can cause significant harm to individuals or society. This paper presents (1) major topics of...
A major motivation for explaining and rigorously evaluating Machine Learning (ML) models comes from high-stakes decision-making tasks, such as cancer diagnostics and self-driving cars, where wrong decisions can have catastrophic consequences. This chapter shows that visual knowledge discovery (VKD) methods, based on the General Line Coordina...
This chapter describes machine learning (ML) for financial applications with a focus on interpretable relational methods. It presents financial tasks, methodologies, and techniques in this ML area. It includes time dependence, data selection, forecast horizon, measures of success, quality of patterns, hypothesis evaluation, problem ID, method profi...
The importance of visual methods in machine learning (ML) as tools to increase the interpretability and validity of models is growing. The visual exploration of multidimensional data of all possible sizes and dimensions for knowledge discovery is a long-standing challenge. While multiple efficient methods for visual representation of high-dimensio...
To increase the interpretability and prediction accuracy of the Machine Learning (ML) models, visualization of ML models is a key part of the ML process. Decision Trees (DTs) are essential in machine learning (ML) because they are used to understand many black box ML models including Deep Learning models. In this research, two new methods for creat...
Building accurate and interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes for ML algorithms to support accurate and explainable ML models, methods for lossless visualization of...
Understanding black-box Machine Learning methods on multidimensional data is a key challenge in Machine Learning. While many powerful Machine Learning methods already exist, these methods are often unexplainable or perform poorly on complex data. This paper proposes visual knowledge discovery approaches based on several forms of lossless General Li...
Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms are not applicable to mixed data, which include numeric and non-numeric data, text, graphs and so on to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensi...
Powerful deep learning algorithms open an opportunity for solving non-image Machine Learning (ML) problems by transforming these problems into image recognition problems. The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data. Then deep learning CNN algorithms solve the learning problems...
This book is devoted to the emerging field of integrated visual knowledge discovery that combines advances in artificial intelligence/machine learning and visualization/visual analytics. A long-standing challenge of artificial intelligence (AI) and machine learning (ML) is explaining models to humans, especially for life-critical applications like h...
This paper contributes to interpretable machine learning via visual knowledge discovery in general line coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and general line coordinates are combined to create a visual self-service machine learning model. The DSC1 and DSC2 lossless multidimensional coordinate systems are pro...
Visualization of Machine Learning (ML) models is an important part of the ML process to enhance the interpretability and prediction accuracy of the ML models. This paper proposes a new method SPC-DT to visualize the Decision Tree (DT) as interpretable models. These methods use a version of General Line Coordinates called Shifted Paired Coordinates...
This volume is devoted to the emerging field of Integrated Visual Knowledge Discovery that combines advances in Artificial Intelligence/Machine Learning (AI/ML) and Visualization/Visual Analytics. Chapters included are extended versions of the selected AI and Visual Analytics papers and related symposia at the recent International Information Visua...
Integrating artificial intelligence (AI) and machine learning (ML) methods with interactive visualization is a research area that has evolved for years. With the rise of AI applications, the combination of AI/ML and interactive visualization is elevated to new levels of sophistication and has become more widespread in many domains. Such application...
Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a mod...
It is challenging for humans to perform visual knowledge discovery in data with more than 2–3 dimensions with the naked eye. This chapter explores the efficiency of discovering predictive machine learning models interactively using new Elliptic Paired Coordinates (EPC) visualizations. It is shown that EPC are capable of visualizing multidimensional data...
It is challenging for humans to perform visual knowledge discovery in data with more than 2-3 dimensions with the naked eye. This chapter explores the efficiency of discovering predictive machine learning models interactively using new Elliptic Paired Coordinates (EPC) visualizations. It is shown that EPC are capable of visualizing multidimensional data...
Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. They fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and clas...
Powerful deep learning algorithms open an opportunity for solving non-image Machine Learning (ML) problems by transforming these problems into image recognition problems. The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data. Then deep learning CNN algorithms solve the learning proble...
This paper contributes to interpretable machine learning via visual knowledge discovery in parallel coordinates. The concepts of hypercubes and hyperblocks are used because end users can easily understand them in visual form in parallel coordinates. The Hyper algorithm for classification with mixed and pure hyperblocks (HBs) is proposed to discover...
This paper proposes a new methodology for machine learning in 2-dimensional space (2-D ML) in inline coordinates. It is a full machine learning approach that does not require dealing with n-dimensional data in n-dimensional space. It allows discovering n-D patterns in 2-D space without loss of n-D information using graph representations of n-D data...
This chapter surveys and analyses visual methods of explainability of Machine Learning (ML) approaches with focus on moving from quasi-explanations that dominate in ML to actual domain-specific explanation supported by granular visuals. The importance of visual and granular methods to increase the interpretability and validity of the ML model has g...
This paper surveys visual methods of explainability of Machine Learning (ML) with focus on moving from quasi-explanations that dominate in ML to domain-specific explanation supported by granular visuals. ML interpretation is fundamentally a human activity and visual methods are more readily interpretable. While efficient visual representations of h...
The algorithm of k-fold cross validation is actively used to evaluate and compare machine learning algorithms. However, it has several important deficiencies documented in the literature along with its advantages. The advantages of quick computations are also a source of its major deficiency. It tests only a small fraction of all the possible split...
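The point about k-fold cross validation testing only a small fraction of the possible splits can be made concrete with a minimal sketch. The function names `kfold_indices` and `split_coverage` are hypothetical and are not taken from the paper; the sketch uses plain contiguous folds and, for the coverage count, assumes that the number of cases `n` is divisible by `k`.

```python
from math import comb


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    # The first n % k folds get one extra case so that all n cases are used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def split_coverage(n, k):
    """Fraction of all test sets of size n // k that a single k-fold run evaluates.

    Assumption for this illustration: n is divisible by k, so every fold
    has exactly n // k test cases.
    """
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    return k / comb(n, n // k)


if __name__ == "__main__":
    for train, test in kfold_indices(10, 5):
        print("test fold:", test)
    # One 10-fold run over 100 cases covers a tiny share of the possible test sets.
    print(split_coverage(100, 10))
```

Even for 100 cases, a single 10-fold run evaluates only 10 of the `comb(100, 10)` possible test sets of size 10, which is the sense in which the speed of the method is also its limitation.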
This tutorial covers the state-of-the-art research, development, and applications in the KDD area of interpretable knowledge discovery reinforced by visual methods to stimulate and facilitate future work. It serves the KDD mission and objectives of gaining insight from the data. The topic is interdisciplinary bridging of scientific research and app...
Preserving all multidimensional data in two-dimensional visualization is a long-standing problem in Visual Analytics, Machine Learning/Data Mining, and Multiobjective Pareto Optimization. While Parallel and Radial (Star) coordinates preserve all n-D data in two dimensions, they are not sufficient to address visualization challenges of all possible...
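As a small illustration of what "preserving all n-D data in two dimensions" means for Parallel Coordinates, the sketch below maps an n-D point to a 2-D polyline and back without loss. The function names and the `spacing` parameter are assumptions made for this example only; they are not from the paper.

```python
def to_parallel_polyline(point, spacing=1.0):
    """Map an n-D point to a 2-D polyline: vertex i sits on the i-th vertical axis."""
    return [(i * spacing, value) for i, value in enumerate(point)]


def from_parallel_polyline(polyline):
    """Recover the n-D point from the polyline by reading the height of each vertex."""
    return tuple(y for _, y in polyline)


if __name__ == "__main__":
    p = (0.2, 0.9, 0.4, 0.7)
    line = to_parallel_polyline(p)
    print(line)
    # The round trip returns the original point, so no n-D information is lost.
    print(from_parallel_polyline(line) == p)
```

The mapping is reversible because each coordinate value is stored as the height of exactly one vertex; the visualization challenges mentioned in the abstract (such as occlusion for many points) are a separate issue from this losslessness.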
Developing more efficient automated methods for interpretable machine learning (ML) is an important and long-term machine learning goal. Recent studies show that unintelligible “black” box models, such as Deep Learning Neural Networks, often outperform more interpretable “grey” or “white” box models such as Decision Trees, Bayesian networks, Logic...
While knowledge discovery and n-D data visualization procedures are often efficient, the loss of information, occlusion, and clutter continue to be a challenge. General Line Coordinates (GLC) is a rather new technique to deal with such artifacts. GLC-Linear, which is one of the methods in GLC, allows transforming n-D numerical data to their visual...
Explanation and occlusion are the major problems for interactive visual knowledge discovery, machine learning and data mining in multidimensional data. This paper proposes a hybrid method that combines the visual and analytical means to deal with these problems. This method, denoted as FSP, uses visualization of n-D data in 2-D, in a set of Shifted...
An important challenge for Machine Learning (ML) methods such as the Support Vector Machine (SVM), and others, is the selection of the structure of ML models for given data. This paper shows that the abilities of the pure analytical ML methods to address this challenge are limited. It is due to the fundamental nature of the ML methods, which rely o...
Occlusion is one of the major problems for visualization methods in finding the patterns in the n-D data. This chapter describes the methods for decreasing the occlusion, and pattern simplification in different General Line Coordinates by adjusting GLCs to the given data via shifting, relocating, and scaling coordinates. In contrast, in Parallel...
The Big Data challenge includes dealing with a large number of heterogeneous and multidimensional datasets of all possible sizes, not only with data of big size. As a result, the huge number of Machine Learning (ML) tasks that must be solved dramatically exceeds the number of data scientists who can solve these tasks. Next, many ML tasks...
The Pareto Front (PF) is a mathematically correct solution of multi-objective optimization problems with several conflicting objectives. However, it is only a partial solution for many real-world situations, where only a few of the PF alternatives can be implemented due to resource limitations and other reasons. Commonly, the Pareto Front is narrowed by linea...
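For readers unfamiliar with the term, a Pareto Front is the set of alternatives that no other alternative dominates. The following is a minimal sketch of that standard definition only, assuming every objective is to be minimized; it does not reproduce the narrowing method the abstract goes on to discuss, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumption for this illustration: all objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Each tuple is (cost, risk) for one alternative.
    alternatives = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(pareto_front(alternatives))
```

In this example `(3, 3)` is dominated by `(2, 2)`, while the other three alternatives trade one objective against the other and all remain on the front, which illustrates why the front alone is often only a partial answer.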
In this chapter, we first compare General Line Coordinates with other visualization methods that have not yet been analyzed in the previous chapters. Then we summarize some comparisons that were presented in other chapters. Next, the hybrid approach that fuses General Line Coordinates with other methods is summarized along with the outline of the futur...
This chapter presents a visual text mining approach to modeling humor within text. It includes algorithms for visualizing and discovering shifts in text interpretation as intelligent agents parse meaning from garden path jokes. Three successful text visualization methods are described to identify discrimination features for humorous and non-humorou...
This chapter describes various types of General Line Coordinates for visualizing multidimensional data in 2-D and 3-D in a reversible way. These types of GLCs include n-Gon, Circular, In-Line, Dynamic, and Bush Coordinates, which directly generalize Parallel and Radial Coordinates. Another class of GLCs described in this chapter is a class of rever...
Previous chapters demonstrated the ways of visual discovery of patterns using different General Line Coordinates. This chapter demonstrates the hybrid visual and analytical way to enhance the estimation of accuracy and errors of machine learning discovery. It focuses on improvement of k-fold cross validation. It provides: (1) a justification for the...
The analysis of data visualized with different GLCs in previous chapters shows that multiple visual features could be estimated for each individual graph. This chapter evaluates efficiency of the human visual system in discovering discriminating features for n-D data classification learning tasks in Closed Contour Paired Coordinates (traditional S...
High-dimensional data play an important and growing role in knowledge discovery, modeling, decision making, information management, and other areas. Visual representation of high-dimensional data opens the opportunity for understanding, comparing and analyzing visually hundreds of features of complicated multidimensional relations of n-D points in...
Knowledge discovery is an important aspect of human cognition. The advantage of the visual approach is the opportunity to solve easier perceptual tasks instead of complex cognitive tasks. However, for cognitive tasks such as financial investment decision making, this opportunity faces the challenge that financial data are abstract multidimens...
This chapter provides several case studies on the use of General Line Coordinates for knowledge discovery and supervised learning, mostly on data from the University of California Irvine Machine Learning repository. The real-world tasks include Health monitoring, Iris data classification, Challenger disaster and others. These case studies in...
The goal of this chapter is to present a new interactive visual machine learning system for solving supervised learning classification tasks based on a GLC-L visualization algorithm and associated interactive and automatic algorithms GLC-IL, GLC-AL and GLC-DRL for discovery of linear and non-linear relations and dimension reduction. Classification...
This chapter mathematically defines concepts that form various General Line Coordinates (GLCs). It provides relevant algorithms and statements that describe mathematical properties of GLCs and relations that GLCs represent. The theoretical basis of a GLC is considered in connection with the Johnson-Lindenstrauss Lemma.
The focus of this paper is to clarify the concepts of solutions of linear equations in interval, probabilistic, and fuzzy set settings for real-world tasks. There is a fundamental difference between formal definitions of the solutions and physically meaningful concepts of solution in applied tasks, when equations have uncertain components. For inst...
The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and ex...
In the long run the cognitive algorithms intend to make super-intelligent machines and super-intelligent humans. This paper presents a technical process to reach specific aspects of super-intelligence that are out of the current human cognitive abilities. These aspects are inabilities to discover patterns in large numeric multidimensional data with...
The goal of this paper is to investigate the use of visualization as an approach to modeling humor within text. In particular, we developed algorithmic and automated approaches to visualizing and detecting shifts in interpretation as intelligent agents parse meaning from garden path jokes. Garden path jokes can occur when a reader’s initial interpr...
Knowledge discovery is an important aspect of human cognition. The advantage of the visual approach is the opportunity to replace some complex cognitive tasks with easier perceptual tasks. However, for cognitive tasks such as financial investment decision making, this opportunity faces the challenge that financial data are abstract multidimensional a...
This work shows how dependence in many-valued logic and probability theory can be fused into one concept by using copulas and marginal probabilities. It also shows that the t-norm concept used in fuzzy logic is covered by this approach. This leads to a more general statement that axiomatic probability theory covers logic structure of fuzzy logic. T...
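The connection between t-norms and copulas mentioned above can be illustrated with three standard t-norms (minimum, product, and Łukasiewicz), each of which is also a bivariate copula. The sketch below only checks the well-known Fréchet-Hoeffding bounds, under which every bivariate copula lies between the Łukasiewicz and minimum operations. It is an illustration under these textbook facts, not the construction used in the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def t_min(u, v):
    """Minimum t-norm (upper Frechet-Hoeffding bound, comonotone dependence)."""
    return min(u, v)


def t_product(u, v):
    """Product t-norm (independence copula)."""
    return u * v


def t_lukasiewicz(u, v):
    """Lukasiewicz t-norm (lower Frechet-Hoeffding bound in two dimensions)."""
    return max(u + v - 1, 0)


def within_frechet_bounds(c, u, v):
    """Check that c(u, v) lies between the lower and upper Frechet-Hoeffding bounds."""
    return t_lukasiewicz(u, v) <= c(u, v) <= t_min(u, v)


if __name__ == "__main__":
    # Exact rational grid on [0, 1] avoids floating-point rounding in the comparison.
    grid = [Fraction(i, 10) for i in range(11)]
    for c in (t_min, t_product, t_lukasiewicz):
        ok = all(within_frechet_bounds(c, u, v) for u in grid for v in grid)
        print(c.__name__, ok)
```

The three operations differ only in the dependence they assume between the two marginal values, which is the sense in which a joint distribution built from marginals and a copula can express each of these fuzzy-logic conjunctions.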
We propose the ontological approach to Data Mining that is based on: (1) the analysis of subject domain ontology, (2) information in data that are interpretable in terms of ontology, and (3) interpretability of Data Mining methods and their results in ontology. Respectively concepts of Data Ontology and Data Mining Method Ontology are introduced. T...
The goal of the new area of Computing with Words (CWW) is to computationally solve tasks formulated in a natural language (NL). The extreme uncertainty of NL is the major challenge in meeting this ambitious goal, which requires computational approaches to handle NL uncertainties. Attempts to solve various CWW tasks lead to the methodological questions about ri...
The importance of the optimal Sensor Resource Management (SRM) problem is growing. The number of Radar, EO/IR, Overhead Persistent InfraRed (OPIR), and other sensors with best capabilities, is limited in the stressing tasking environment relative to sensing needs. Sensor assets differ significantly in number, location, and capability over time. To...
The focus of this paper is to clarify the concepts of solutions of linear equations in interval, probabilistic, and fuzzy set settings for real-world tasks. There is a fundamental difference between formal definitions of the solutions and physically meaningful concepts of solution in applied tasks, when equations have uncertain components. For inst...
Fundamental challenges and goals of the cognitive algorithms are moving super-intelligent machines and super-intelligent humans from dreams to reality. This paper is devoted to a technical way to reach some specific aspects of super-intelligence that are beyond the current human cognitive abilities. Specifically the proposed technique is to overcom...
This research is motivated by a long-standing problem of ineffective heuristic initial selection of a class of models, and its structures in modern data mining, machine learning, and other fields. Such heuristics usually are due to insufficient prior knowledge to select a class of models, and inability to represent visually and losslessly the compl...
A new field of Computing with Words (CWW) intends to “compute” with phrases that involve linguistic numbers and relations between them such as “about five apples” and “close to six PM”. CWW includes answering arithmetic questions such as: What is the number of apples that John obtained if he bought about 5 apples two times today? The common sense a...
The collaborative approach is a natural way to enhance visualization and visual analytics methods. This paper continues our long-term efforts on enhancement of visualization and visual analytics methods. The major challenges in visualization of large n-D data in 2-D are not only in providing lossless visualization by using sophisticated computation...
The major challenges in visualization of large n-D data in 2-D are in supporting the most efficient and fast usage of abilities of users to analyze visualized information and to extract patterns visually. This paper describes experimental results of a collaborative approach to support n-D data visualization based on new lossless n-D visualization m...
Zadeh posed several Computing with Words (CWW) test problems such as: "What is the probability that John is short?" These problems assume a given piece of information in the form of membership functions for linguistic terms including tall, short, young, middle-aged, and the probability density functions of age and height. This paper proposes a solu...
This paper addresses the problem of improving the relevance of search engine results in a vertical domain. The proposed algorithm is built on a structured taxonomy of keywords. The taxonomy construction process starts from the seed terms (keywords) and mines the available source domains for new terms associated with these entities. These new term...
Often multidimensional data are visualized by splitting n-D data into a set of low-dimensional data. While this is useful, it destroys the integrity of n-D data and leads to a shallow understanding of complex n-D data. To mitigate this challenge, a difficult perceptual task of assembling low-dimensional visualized pieces to the whole n-D vectors must be solved...
Although shape perception is the main information channel for the brain, it has been poorly used by recent visualization techniques. The difficulties of its modeling are key obstacles for visualization theory and application. Known experimental estimates of shape perception capabilities have been made for low data dimension, and they were usually not con...
We developed an original approach to cognition, based on the previously developed theory of neural modeling fields and dynamic logic. This approach is based on the detailed analysis and solution of the problems of artificial intelligence — combinatorial complexity and logic and probability synthesis. In this paper we interpret the theory of neural...
Correlating and fusing video frames from distributed and moving sensors is an important area of video matching. It is especially difficult for frames with objects at long distances that are visible as single pixels where the algorithms cannot exploit the structure of each object. The proposed algorithm correlates partial frames with such small objects...
Automated Feature Extraction (AFE) plays a critical role in image understanding. Often the imagery analysts extract features better than AFE algorithms do, because analysts use additional information. The extraction and processing of this information can be more complex than the original AFE task, and that leads to the "complexity trap". This can h...