Conference Paper

A Study on Labeling Network Hostile Behavior with Intelligent Interactive Tools


Abstract

Labeling a real network dataset is especially expensive in computer security, as an expert has to weigh several factors before assigning each label. This paper describes an interactive intelligent system to support the task of identifying hostile behaviors in network logs. The RiskID application uses visualizations to graphically encode features of network connections and promote visual comparison. In the background, two algorithms actively organize connections and predict potential labels: a recommendation algorithm and a semi-supervised learning strategy. Together with interactive adaptations of the user interface, these algorithms constitute a behavior recommendation. A study was carried out to analyze how the recommendation and prediction algorithms influence the workflow of labeling a dataset. The results of the study, with 16 participants, indicate that the behavior recommendation significantly improves the quality of labels. By analyzing interaction patterns, we identify a more intuitive workflow used when the behavior recommendation is available.
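The abstract does not detail the prediction algorithms behind the behavior recommendation. As a rough illustration of how a semi-supervised strategy could spread a few expert labels across similar connections, here is a minimal k-nearest-neighbour propagation sketch; the feature encoding, label names, and function are all hypothetical, not taken from RiskID:

```python
import math

def predict_labels(connections, labels, k=3):
    """Propagate expert labels to unlabelled connections via a
    k-nearest-neighbour majority vote over feature vectors.
    `connections` is a list of numeric feature tuples; `labels` maps
    an index to 'botnet' or 'normal' for the few expert-labelled rows."""
    labelled = list(labels.items())
    predictions = {}
    for i, feats in enumerate(connections):
        if i in labels:
            continue  # already labelled by the expert
        # sort the labelled rows by Euclidean distance to this connection
        nearest = sorted(labelled,
                         key=lambda kv: math.dist(connections[kv[0]], feats))[:k]
        votes = [label for _, label in nearest]
        # majority vote among the k nearest labelled connections
        predictions[i] = max(set(votes), key=votes.count)
    return predictions
```

A real system would use richer features and confidence scores, but the core idea, letting a handful of expert decisions suggest labels for visually similar connections, is the same.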


... The inclusion of a statistical learning model can be a valuable tool for helping the user during the decision process. Moreover, some of the approaches (Fan et al., 2019; Guerra et al., 2019) claim that the expertise required for using such systems is reduced. Nevertheless, the role of the expert remains a fundamental aspect for guaranteeing the quality of the labels. ...
... The same holds for assisted labeling strategies, most of which do not seem especially prepared to deal with privacy mechanisms. Only the work of Guerra et al. (2019) has considered the inclusion of anonymized network traces during the labeling process. ...
... However, most of the reviewed works considering visualization tools (Beaugnon et al., 2017; Fan et al., 2019; Koike et al., 2006; Livnat et al., 2005; Ren et al., 2005; Scott et al., 2003) have not evaluated the benefits and usefulness of the proposed visualizations. Fan et al. (2019) and Guerra et al. (2019) are among the few authors to analyze the performance of different visualization techniques used to improve pattern perception during the interactive process. The fact is that the availability and cost of conducting a validation with expert users and traffic analysts affect the evaluation process. ...
Article
In contrast to previous surveys, the present work is not focused on reviewing the datasets used in the network security field. Many of the available public labeled datasets represent network behavior only for a particular time period. Given the rate of change in malicious behavior and the serious challenge of labeling and maintaining these datasets, they quickly become obsolete. Therefore, this work focuses on the analysis of current labeling methodologies applied to network-based data. In the field of network security, the process of labeling a representative network traffic dataset is particularly challenging and costly, since very specialized knowledge is required to classify network traces. Consequently, most current traffic labeling methods are based on the automatic generation of synthetic network traces, which hides many of the essential aspects necessary for a correct differentiation between normal and malicious behavior. Alternatively, a few other methods incorporate non-expert users in the labeling process of real traffic with the help of visual and statistical tools. However, after conducting an in-depth analysis, it seems that all current labeling methods suffer from fundamental drawbacks regarding the quality, volume, and speed of the resulting dataset. This lack of consistent methods for continuously generating a representative dataset with an accurate and validated methodology must be addressed by the network security research community. Moreover, a consistent labeling methodology is a fundamental condition for fostering the acceptance of novel detection approaches based on statistical and machine learning techniques.
... Otherwise, Guerra et al. present RiskID [79,80], a modern application focused on the labeling of real traffic. Specifically, RiskID aims to create labeled datasets based on botnet and normal behaviors. ...
... Fan et al. [65] and Guerra et al. [79] are among the few authors analyzing the performance of different visualization techniques applied to help users perceive patterns during the interactive process. In particular, Guerra et al. [79] provide the community with an important user study measuring the impact of the tool on the labeling process and gathering relevant information about the labeling strategy followed by users. However, they give no information about the security expertise level of the users. ...
... In addition to the information provided by the flow-based predictors, the CTU19 dataset includes flow-related information such as source IP, destination IP, protocol, port, and the source linked with each capture. However, the present study focuses only on the information provided by the flow-based predictors, as discussed in [2,8]. Fig. 4 describes the process used for the creation of the training and testing sets according to each splitting strategy. ...
Preprint
Full-text available
Even though a randomly performed train/test split of the dataset is a common practice, it may not always be the best approach for estimating performance generalization under some scenarios. The usual machine learning methodology can sometimes overestimate the generalization error when a dataset is not representative, or when rare and elusive examples are a fundamental aspect of the detection problem. In the present work, we analyze strategies based on the predictors' variability for splitting data into training and testing sets. Such strategies aim at guaranteeing the inclusion of rare or unusual examples with a minimal loss of the population's representativeness, and provide a more accurate estimation of the generalization error when the dataset is not representative. Two baseline classifiers based on decision trees were used for testing the four splitting strategies considered. Both classifiers were applied to CTU19, a low-representativeness dataset for a network security detection problem. Preliminary results showed the importance of applying the three strategies alternative to the Monte Carlo splitting strategy in order to get a more accurate error estimation in different but feasible scenarios.
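The four concrete splitting strategies are not described in this excerpt, but the underlying idea, using the predictors' variability to guarantee that rare examples are seen during training, can be sketched. The following toy function (its name, the z-score criterion, and the threshold are illustrative assumptions, not the paper's method) forces high-variability rows into the training set and splits the representative remainder at random:

```python
import random
import statistics

def variability_split(rows, test_frac=0.3, z_thresh=2.0, seed=0):
    """Split `rows` (lists of numeric predictors) into train/test sets,
    forcing rare examples -- rows whose mean absolute z-score across
    predictors exceeds `z_thresh` -- into the training set. The
    remaining, representative rows are split at random."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard zero spread

    def z(row):
        # average standardized deviation of the row over all predictors
        return statistics.mean(abs(v - m) / s
                               for v, m, s in zip(row, means, stdevs))

    rare = {i for i, r in enumerate(rows) if z(r) > z_thresh}
    common = [i for i in range(len(rows)) if i not in rare]
    rng = random.Random(seed)
    rng.shuffle(common)
    n_test = int(len(common) * test_frac)
    test_idx = set(common[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    return train, [rows[i] for i in sorted(test_idx)]
```

In contrast to a plain Monte Carlo split, the outliers can never end up exclusively in the test set, which is the failure mode the abstract warns about for low-representativeness datasets.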
... Visualizations for network log analysis: There are already many visual analytics approaches for the analysis of network logs (see the surveys by Shiravi et al. [72] and Zhang et al. [79,80]). Some approaches focus only on the interactive visualization of these logs [1,22,46,49], while others incorporate automatic data processing and detection in some way [28,29,54]. Several publications focus on one visualization technique [4,18,77], while others combine various visualizations and different views [2,13,26,27]. ...
Preprint
In this paper, we present our design study on developing an interactive visual firewall log analysis system in collaboration with an IT service provider. We describe the human-centered design process, in which we additionally considered hedonic qualities by including the usage of personas, psychological need cards, and an interaction vocabulary. For the problem characterization we especially focus on the demands of the two main clusters of requirements, high-level overview and low-level analysis, represented by the two defined personas, namely the information security officer and the network analyst. This resulted in the prototype of a visual analysis system consisting of two interlinked parts. One part addresses the needs of rather strategic tasks while also fulfilling the need for an appealing appearance and interaction. The other part addresses the requirements of operational tasks and aims to provide a high level of flexibility. We describe our design journey, the derived domain tasks and task abstractions as well as our visual design decisions, and present our final prototypes based on a usage scenario. We also report on our capstone event, where we conducted an observed experiment and collected feedback from the information security officer. Finally, as a reflection, we propose the extension of a widely used design study process with a track for an additional focus on hedonic qualities.
Article
Data labeling is crucial in various areas, including network security, and is a prerequisite for applying statistical classification and supervised learning techniques. Therefore, developing labeling methods that ensure good performance is important. We propose a human-guided auto-labeling algorithm involving the self-supervised learning concept, with the purpose of labeling data quickly, accurately, and consistently. It consists of three processes: auto-labeling, validation, and update. A labeling scheme is proposed by considering weighted features in the auto-labeling, while the generalized extreme learning machine (GELM), which enables fast training, is applied to validate assigned labels. Two different approaches are considered in the update to label new data, in order to investigate labeling speed and accuracy. We experiment to verify the suitability and accuracy of the algorithm for network traffic, applying it to five traffic datasets, some including distributed denial of service (DDoS), DoS, BruteForce, and PortScan attacks. Numerical results show the algorithm labels unlabeled datasets quickly, accurately, and consistently, and that the GELM's learning speed enables labeling data in real time. They also show that the performances of auto-generated and conventional labels are nearly identical on datasets containing only DDoS attacks, which implies the algorithm is quite suitable for such datasets. However, the performance differences between the two labels are not negligible on datasets including various attacks. Several reasons that require further investigation can be considered, including the selected features and the reliability of conventional labels. Even with this limitation of the current study, the algorithm will provide a criterion for labeling data in real time in many areas.
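The three processes named in the abstract (auto-labeling, validation, update) can be pictured as a simple loop. The sketch below is only a schematic reading of that loop: the weighted-feature score, the labels, and the callables are hypothetical, and a plain callable stands in for the GELM validator, whose details are not given here:

```python
def human_guided_labeling(samples, score, validator, expert, threshold=0.8):
    """Schematic three-step loop: (1) auto-label each sample from a
    weighted-feature score, (2) validate the label with a trained model
    (`validator`, a stand-in for the GELM classifier), (3) hand
    disagreements to the human expert, whose answer updates the labels."""
    labels = {}
    for i, sample in enumerate(samples):
        auto = 'attack' if score(sample) >= threshold else 'benign'
        if validator(sample) == auto:
            labels[i] = auto           # model agrees: accept the auto-label
        else:
            labels[i] = expert(sample) # disagreement: ask the human
    return labels
```

The point of the design is that the expert only sees the disputed samples, which is what makes the overall process fast while keeping a human in the loop for the hard cases.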
Article
Full-text available
Supervised machine learning techniques require labelled multivariate training datasets. Many approaches address the issue of unlabelled datasets by tightly coupling machine learning algorithms with interactive visualisations. Using appropriate techniques, analysts can play an active role in a highly interactive and iterative machine learning process to label the dataset and create meaningful partitions. While this principle has been implemented either for unsupervised, semi-supervised, or supervised machine learning tasks, the combination of all three methodologies remains challenging. In this paper, a visual analytics approach is presented, combining a variety of machine learning capabilities with four linked visualisation views, all integrated within the mVis (multivariate Visualiser) system. The available palette of techniques allows an analyst to perform exploratory data analysis on a multivariate dataset and divide it into meaningful labelled partitions, from which a classifier can be built. In the workflow, the analyst can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once a dataset has been interactively labelled, the analyst can continue the workflow with supervised machine learning to assess to what degree the subsequent classifier has effectively learned the concepts expressed in the labelled training dataset. Using a novel technique called automatic dimension selection, interactions the analyst had with dimensions of the multivariate dataset are used to steer the machine learning algorithms. A real-world football dataset is used to show the utility of mVis for a series of analysis and labelling tasks, from initial labelling through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to finally producing a high-quality labelled training dataset suitable for training a classifier. 
The tool empowers the analyst with interactive visualisations including scatterplots, parallel coordinates, similarity maps for records, and a new similarity map for partitions.
Article
Full-text available
The assignment of labels to data instances is a fundamental prerequisite for many machine learning tasks. Moreover, labeling is a frequently applied process in visual interactive analysis approaches and visual analytics. However, the strategies for creating labels usually differ between these two fields. This raises the question whether synergies between the different approaches can be attained. In this paper, we study the process of labeling data instances with the user in the loop, from both the machine learning and visual interactive perspective. Based on a review of differences and commonalities, we propose the “visual interactive labeling” (VIAL) process that unifies both approaches. We describe the six major steps of the process and discuss their specific challenges. Additionally, we present two heterogeneous usage scenarios from the novel VIAL perspective, one on metric distance learning and one on object detection in videos. Finally, we discuss general challenges to VIAL and point out necessary work for the realization of future VIAL approaches.
Article
Full-text available
Anomaly-based approaches in network intrusion detection suffer from evaluation, comparison, and deployment issues that originate in the scarcity of adequate publicly available network trace datasets. Moreover, publicly available datasets are either outdated or generated in a controlled environment. Due to the ubiquity of cloud computing environments in commercial and government internet services, there is a need to assess the impact of network attacks in cloud data centers. To the best of our knowledge, there is no publicly available dataset which captures the normal and anomalous network traces in the interactions between cloud users and cloud data centers. In this paper, we present an experimental platform designed to represent a practical interaction between cloud users and cloud services, and to collect network traces resulting from this interaction to conduct anomaly detection. We use the Amazon Web Services (AWS) platform for conducting our experiments.
Thesis
Full-text available
Botnets are the technological backbone supporting a myriad of attacks, including identity theft, organizational spying, DoS, SPAM, government-sponsored attacks, and the spying of political dissidents, among others. The research community works hard on creating detection algorithms for botnet network traffic. These algorithms have been partially successful, but are difficult to reproduce and verify, and are often commercialized. However, advances in machine learning algorithms and access to better botnet datasets are starting to show promising results. The shift of detection techniques to behavioral-based models has proved to be a better approach to the analysis of botnet patterns. However, current knowledge of botnet actions and patterns does not seem deep enough to create adequate traffic models that could be used to detect botnets in real networks. This thesis proposes three new botnet detection methods and a new model of botnet behavior, based on a deep understanding of botnet behaviors in the network. First, the SimDetect method, which analyzes the structural similarities of clustered botnet traffic. Second, the BClus method, which clusters traffic according to its connection patterns and uses decision rules to detect unknown botnets in the network. Third, the CCDetector method, which uses a novel state-based behavioral model of known Command and Control channels to train a Markov Chain and detect similar traffic in unknown real networks. The BClus and CCDetector methods were compared with third-party detection methods, showing their usefulness in real environments. The core of the CCDetector method is our state-based behavioral model of botnet actions. This model is capable of representing changes in behavior over time. To support the research we use a huge dataset of botnet traffic that was captured in our Malware Capture Facility Project. The dataset is varied, large, public, real, and has Background, Normal, and Botnet labels.
The tools, dataset, and algorithms were released as free software. Our algorithms give a new high-level interface to identify, visualize, and block botnet behaviors in the network.
Conference Paper
Full-text available
Understanding the results of a multi-objective optimization process can be hard. Various visualization methods have been proposed previously, but the only consistently popular one is the 2D or 3D objective scatterplot, which cannot be extended to handle more than three objectives. Additionally, the visualization of high-dimensional parameter spaces has traditionally been neglected. We propose a new method, based on heatmaps, for the simultaneous visualization of objective and parameter spaces. We demonstrate its application on a simple 3D test function and also apply heatmaps to the analysis of real-world optimization problems. Finally, we use the technique to compare the performance of two different multi-objective algorithms.
Conference Paper
Full-text available
Anomaly detection for network intrusion detection is usually considered an unsupervised task. Prominent techniques, such as one-class support vector machines, learn a hypersphere enclosing network data, mapped to a vector space, such that points outside of the ball are considered anomalous. However, this setup ignores relevant information such as expert and background knowledge. In this paper, we rephrase anomaly detection as an active learning task. We propose an effective active learning strategy to query low-confidence observations and to expand the data basis with minimal labeling effort. Our empirical evaluation on network intrusion detection shows that our approach consistently outperforms existing methods in relevant scenarios.
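The query step of such an active learning strategy can be illustrated with the hypersphere picture the abstract uses: points far inside or far outside the ball are confidently classified, while points near the boundary are the low-confidence ones worth sending to the expert. The following sketch uses the distance to a fixed hypersphere as a stand-in for the one-class SVM decision function; the function name and parameters are illustrative, not from the paper:

```python
import math

def query_low_confidence(points, radius, centre, budget=2):
    """Return the `budget` points the model is least certain about:
    those whose distance to the hypersphere boundary (a stand-in for
    the one-class SVM decision function) is smallest."""
    def margin(p):
        # |distance from centre - radius| = distance to the boundary
        return abs(math.dist(p, centre) - radius)
    return sorted(points, key=margin)[:budget]
```

Labeling these queried points and retraining expands the data basis where it reduces model uncertainty the most, which is the "minimal labeling effort" argument of the abstract.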
Conference Paper
Full-text available
Flow-based intrusion detection has recently become a promising security mechanism in high-speed networks (1-10 Gbps). Despite the wealth of contributions in this field, benchmarking flow-based IDSs is still an open issue. In this paper, we propose the first publicly available, labeled dataset for flow-based intrusion detection. The dataset aims to be realistic, i.e., representative of real traffic and complete from a labeling perspective. Our goal is to provide such an enriched dataset for tuning, training, and evaluating ID systems. Our setup is based on a honeypot running widely deployed services and directly connected to the Internet, ensuring attack exposure. The final dataset consists of 14.2M flows, and more than 98% of them have been labeled.
Article
Full-text available
Despite the flurry of anomaly-detection papers in recent years, effective ways to validate and compare proposed solutions have remained elusive. We argue that evaluating anomaly detectors on manually labeled traces is both important and unavoidable. In particular, it is important to evaluate detectors on traces from operational networks because it is in this setting that the detectors must ultimately succeed. In addition, manual labeling of such traces is unavoidable because new anomalies will be identified and characterized from manual inspection long before there are realistic models for them. It is well known, however, that manual labeling is slow and error-prone. In order to mitigate these challenges, we present WebClass, a web-based infrastructure that adds rigor to the manual labeling process. WebClass allows researchers to share, inspect, and label traffic time-series through a common graphical user interface. We are releasing WebClass to the research community in the hope that it will foster greater collaboration in creating labeled traces and that the traces will be of higher quality because the entire community has access to all the information that led to a given label.
Conference Paper
With the exponential growth in the size of computer networks and of developed applications, the significant increase in the potential damage that can be caused by launching attacks is becoming obvious. Meanwhile, Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are among the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of adequate datasets, anomaly-based approaches in intrusion detection systems suffer in terms of accurate deployment, analysis, and evaluation. There exist a number of such datasets, including DARPA98, KDD99, ISC2012, and ADFA13, that have been used by researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study of eleven datasets available since 1998, many such datasets are out of date and unreliable to use. Some of these datasets suffer from a lack of traffic diversity and volume, some do not cover the variety of attacks, while others anonymize packet information and payload, which cannot reflect current trends, or lack feature sets and metadata. This paper produces a reliable dataset that contains benign traffic and seven common attack network flows, which meets real-world criteria and is publicly available. Consequently, the paper evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to indicate the best set of features for detecting certain attack categories.
Conference Paper
Acquiring a representative labelled dataset is a hurdle that has to be overcome to learn a supervised detection model. Labelling a dataset is particularly expensive in computer security as expert knowledge is required to perform the annotations. In this paper, we introduce ILAB, a novel interactive labelling strategy that helps experts label large datasets for intrusion detection with a reduced workload. First, we compare ILAB with two state-of-the-art labelling strategies on public labelled datasets and demonstrate it is both an effective and a scalable solution. Second, we show ILAB is workable with a real-world annotation project carried out on a large unlabelled NetFlow dataset originating from a production environment. We provide an open source implementation (https://github.com/ANSSI-FR/SecuML/) to allow security experts to label their own datasets and researchers to compare labelling strategies.
Conference Paper
Network security is a long-lasting field of research constantly encountering new challenges. Inherently, research in this field is highly data-driven. Specifically, many approaches employ supervised machine learning, which requires labelled input data. While different publicly available data sets exist, labelling information is sparse. In order to understand how our community deals with this lack of labels, we perform a systematic study of network security research accepted at top IT security conferences in 2009--2013. Our analysis reveals that 70% of the papers reviewed rely on manually compiled data sets. Furthermore, only 10% of the studied papers release their data sets after compilation. This shows that our community is facing a missing labelled data problem. In order to address this problem, we give a definition and discuss crucial characteristics of the problem. Furthermore, we reflect on and discuss roads towards overcoming it by establishing ground truth and fostering data sharing.
Chapter
Novel graphical and direct-manipulation approaches to query formulation and information visualization are now possible. A useful starting point for designing advanced graphical user interfaces is the Visual Information-Seeking Mantra: first overview, followed by zoom and filter, and then details-on-demand. This chapter offers a task by data type taxonomy with seven data types (1D, 2D, 3D data, temporal data, multi-dimensional data, tree data, and network data) and seven tasks (overview, zoom, filter, details-on-demand, relate, history, and extract). The success of direct-manipulation interfaces is indicative of the power of using computers in a more visual or graphical manner. Visual displays become even more attractive to provide orientation or context, to enable selection of regions, and to provide dynamic feedback for identifying changes (for example, a weather map). Scientific visualization has the power to make atomic, cosmic, and common 3D phenomena (for example, heat conduction in engines, airflow over wings, or ozone holes) visible and comprehensible. In the visual representation of data, users can scan, recognize, and recall images rapidly and can detect changes in size, color, shape, movement, or texture. They can point to a single pixel, even in a megapixel display, and can drag one object to another to perform an action. The novel information-exploration tools, such as dynamic queries, treemaps, fisheye views, parallel coordinates, starfields, and perspective walls, are a few of the inventions that will have to be validated.
Article
With exponential growth in the number of computer applications and the sizes of networks, the potential damage that can be caused by attacks launched over the Internet keeps increasing dramatically. A number of network intrusion detection methods have been developed, with respective strengths and weaknesses. The majority of network intrusion detection research and development is still based on simulated datasets due to the non-availability of real datasets. A simulated dataset cannot represent a real network intrusion scenario. It is important to generate real and timely datasets to ensure accurate and consistent evaluation of detection methods. In this paper, we propose a systematic approach to generate unbiased, full-feature, real-life network intrusion datasets to compensate for the crucial shortcomings of existing datasets. We establish the importance of an intrusion dataset in the development and validation process of detection mechanisms, identify a set of requirements for effective dataset generation, and discuss several attack scenarios and their incorporation in generating datasets. We also establish the effectiveness of the generated dataset in the context of several existing datasets.
Article
Systems that can learn interactively from their end-users are quickly becoming widespread. Until recently, this progress has been fueled mostly by advances in machine learning; however, more and more researchers are realizing the importance of studying users of these systems. In this article we promote this approach and demonstrate how it can result in better user experiences and more effective learning systems. We present a number of case studies that demonstrate how interactivity results in a tight coupling between the system and the user, exemplify ways in which some existing systems fail to account for the user, and explore new ways for learning systems to interact with their users. After giving a glimpse of the progress that has been made thus far, we discuss some of the challenges we face in moving the field forward.
Article
Automatic network intrusion detection has been an important research topic for the last 20 years. In that time, approaches based on signatures describing intrusive behavior have become the de facto industry standard. Alternatively, other novel techniques have been used to improve the automation of the intrusion detection process. In this regard, statistical methods, machine learning, and data mining techniques have been proposed, claiming higher automation capabilities than signature-based approaches. However, the majority of these novel techniques have never been deployed in real-life scenarios. The fact is that signature-based detection is still the most widely used strategy for automatic intrusion detection. In the present article we survey the most relevant works in the field of automatic network intrusion detection. In contrast to previous surveys, our analysis considers several features required for truly deploying each of the reviewed approaches. This wider perspective can help us identify the possible causes behind the lack of acceptance of novel techniques by network security experts.
Conference Paper
In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.
Chapter
The results of a multi-year research program to identify the factors associated with variations in subjective workload within and between different types of tasks are reviewed. Subjective evaluations of 10 workload-related factors were obtained from 16 different experiments. The experimental tasks included simple cognitive and manual control tasks, complex laboratory and supervisory control tasks, and aircraft simulation. Task-, behavior-, and subject-related correlates of subjective workload experiences varied as a function of difficulty manipulations within experiments, different sources of workload between experiments, and individual differences in workload definition. A multi-dimensional rating scale is proposed in which information about the magnitude and sources of six workload-related factors are combined to derive a sensitive and reliable estimate of workload.
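The multi-dimensional rating scale this abstract proposes combines the magnitude and weight of several workload-related factors into one estimate. In the NASA-TLX style procedure this describes, six subscale ratings (0-100) are weighted by counts from 15 pairwise comparisons; the sketch below assumes that standard procedure, since the excerpt itself does not give the formula:

```python
def workload_score(ratings, weights):
    """Overall workload as a weighted average of six subscale ratings
    (each 0-100), with weights taken from pairwise comparisons of the
    factors (in the standard procedure the weights sum to 15)."""
    assert len(ratings) == len(weights) == 6
    total = sum(weights)
    # weighted mean: each factor contributes in proportion to how often
    # the participant judged it the more important source of workload
    return sum(r * w for r, w in zip(ratings, weights)) / total
```

Because the weights come from each participant's own comparisons, the same ratings can yield different overall scores for different people, which is exactly the individual-differences effect the study reports.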
Conference Paper
A useful starting point for designing advanced graphical user interfaces is the visual information seeking Mantra: overview first, zoom and filter, then details on demand. But this is only a starting point in trying to understand the rich and varied set of information visualizations that have been proposed in recent years. The paper offers a task by data type taxonomy with seven data types (one-, two-, and three-dimensional data, temporal and multi-dimensional data, and tree and network data) and seven tasks (overview, zoom, filter, details-on-demand, relate, history, and extract).