Article

MILP approach to pattern generation in logical analysis of data


Abstract

Pattern generation methods for the Logical Analysis of Data (LAD) have traditionally been term-enumerative in nature. In this paper, we present a Mixed 0–1 Integer and Linear Programming (MILP) approach that can identify LAD patterns that are optimal with respect to various previously studied and new pattern selection preferences. Through the art of formulation, the MILP-based method can generate optimal patterns that also satisfy user-specified requirements on prevalence, homogeneity, and complexity. Considering that MILP problems with hundreds of 0–1 variables are easily solved nowadays, the proposed method presents an efficient way of generating useful patterns for LAD. With extensive experiments on benchmark datasets, we demonstrate the utility of MILP-based pattern generation.
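For illustration, the following is a minimal sketch of the kind of 0–1 MILP that generates a pure positive pattern of maximum coverage. It is written in Python with the PuLP modeling library; the toy data, the variable names (w, wbar, y), and the big-M form of the coverage constraint are illustrative assumptions based on the abstract, not the paper's exact formulation.

import pulp

# Toy binarized dataset: rows are observations over n Boolean attributes.
positives = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
negatives = [(0, 0, 1), (0, 1, 0)]
n = 3

m = pulp.LpProblem("lad_positive_pattern", pulp.LpMaximize)
w = [pulp.LpVariable(f"w_{j}", cat="Binary") for j in range(n)]        # pick literal x_j
wbar = [pulp.LpVariable(f"wbar_{j}", cat="Binary") for j in range(n)]  # pick literal !x_j
y = [pulp.LpVariable(f"y_{i}", cat="Binary") for i in range(len(positives))]

m += pulp.lpSum(y)  # objective: number of covered positive observations

def conflicts(obs):
    # literals of the pattern that the observation would violate
    return pulp.lpSum(w[j] for j in range(n) if obs[j] == 0) + \
           pulp.lpSum(wbar[j] for j in range(n) if obs[j] == 1)

for i, obs in enumerate(positives):
    m += conflicts(obs) <= n * (1 - y[i])  # covered => no violated literal
for obs in negatives:
    m += conflicts(obs) >= 1               # pure pattern: reject every negative
for j in range(n):
    m += w[j] + wbar[j] <= 1               # never pick a literal and its negation

m.solve(pulp.PULP_CBC_CMD(msg=False))
pattern = [f"x{j}" for j in range(n) if w[j].value() == 1] + \
          [f"!x{j}" for j in range(n) if wbar[j].value() == 1]
print("pattern:", " & ".join(pattern), "| positives covered:", int(sum(v.value() for v in y)))

The prevalence, homogeneity, and complexity requirements mentioned in the abstract fit the same skeleton: bound the pattern degree through a constraint on the sum of w and wbar, or relax the negative-rejection constraint to allow a controlled amount of negative coverage.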


... Ryoo et al. [39] and Guo et al. [18] proposed mathematical models that find strong patterns. In Bonates [6], the pattern generation problem is solved by a column generation technique to generate maximum α-patterns that maximize the separation gap between the positive and negative classes. ...
... Mathematical programming models are proposed in Ryoo et al. [39] to find strong, strong prime, and strong spanned patterns. The proposed models can also find different types of patterns in terms of homogeneity and prevalence. ...
... In Guo et al. [18], compact mathematical models are proposed to find strong prime and strong spanned patterns. The compact models have fewer variables than the models proposed in Ryoo et al. [39] and result in higher classification accuracy. ...
Article
Full-text available
Classification is a common task in data mining that assigns a class label to an unseen situation. It has been widely used in decision making for various applications, and many machine learning algorithms have been developed to accomplish this task. Classification becomes critical when the problem under concern is related to serious situations such as fraud detection, cancer diseases, and quality control. Learning in these situations is characterized by predetermined asymmetric costs of incorrect class prediction, or critical consequences associated with erroneous class prediction. In this paper, a novel approach to cost-sensitive learning is proposed. The approach employs the theory of logical analysis of data (LAD) to build accurate cost-sensitive classifiers. Two classifiers are proposed. The first is established by solving a proposed pattern selection model, the minimum misclassification cost model (MMCM), that aims at minimizing the asymmetric misclassification cost. The second is established by solving another proposed pattern selection model, the maximum precision–recall model (MPRM), that maximizes precision and recall, aiming for 100% accuracy. A comparative study is conducted using real datasets. The proposed MMCM has enabled LAD to realize up to a 32.22% cost reduction relative to the misclassification cost realized by the traditional implementation of LAD. Moreover, MPRM has provided up to a 19.15% increase in precision and up to a 37% increase in recall. MPRM has also enhanced the performance of LAD compared to common machine learning algorithms by providing better combinations of recall and false positive rate. This enabled LAD to come closest to the optimal point on the receiver operating characteristic (ROC) diagram when compared with existing machine learning methods. Incorporating the MMCM and MPRM models into LAD establishes a novel implementation of LAD that makes it a promising cost-sensitive classifier compared to other machine learning classifiers.
... The literature of LAD presents different approaches for solving the pattern generation problem. The problem can be represented by mathematical programming models and solved by commercial solvers: Bonates (2007), Bonates et al. (2008), Ryoo et al. (2009), Hansen et al. (2011), Guo et al. (2012), Caster et al. (2016), and Chou et al. (2017). Heuristic techniques for solving the pattern generation problem are developed in Boros et al. (2000) and Alexe et al. (2006). ...
... The algorithm employs the Blake and Quine consensus method to find the prime implicants of a Boolean function. Mixed binary integer programming models are proposed in Ryoo and Jang (2009). The purpose of these models is to generate strong prime patterns and strong spanned patterns. ...
... More compact binary models are proposed in Guo and Ryoo (2012). These models have the advantage of fewer binary variables compared to the models presented in Ryoo and Jang (2009). ...
... Despite the undoubted advancements in the area of multiclass classification, there is still room for developing new approaches to improve the effectiveness and efficiency of the methods and tools to analyze archives of historical records for the purpose of discovering hidden structural relationships in large-scale datasets. In this paper, we integrate the mixed integer linear programming based Logical Analysis of Data (LAD) approach of Ryoo and Jang (2009) with the multiclass LAD method of Avila-Herrera and Subasi (2013, 2015) to develop a new multiclass LAD algorithm, where two control parameters, homogeneity and prevalence, are incorporated to generate relaxed (fuzzy) patterns. ...
... The research area of LAD was introduced and developed by Hammer (1986), whose vision expanded the LAD methodology from theory to successful data applications in numerous biomedical, industrial, and economics case studies; see, for example, Alexe et al. (2003, 2004, 2005), Hammer et al. (1999, 2011), Hammer and Bonates (2006), Lauer et al. (2002), Reddy et al. (2008, 2009) and the references therein. The implementation of the LAD method was described in Boros et al. (1997, 2000) and Crama et al. (1988), and several further developments of the original technique were presented in Alexe et al. (2007), Alexe and Hammer (2006), Bonates et al. (2008), Guo and Ryoo (2012), Hammer et al. (2004), and Ryoo and Jang (2009). An overview of the standard LAD method can be found in Alexe et al. (2007) and Bonates et al. (2008). ...
... The second OvO-type method modifies the architecture of the pattern generation and theory formation steps in the standard LAD method, where a LAD pattern P_ij is generated for each pair of classes C_i, C_j ∈ C, i ≠ j. Mortada (2010) proposed a multiclass LAD method, integrating ideas from the second approach presented by Moreira (2000), which is based on the OvO scheme and an implementation of LAD based on mixed integer linear programming (MILP) presented by Ryoo and Jang (2009). The methodology of Mortada (2010) was applied to five multiclass benchmark datasets. ...
Article
Full-text available
An efficient and robust algorithm based on mixed integer linear programming is proposed to extend the Logical Analysis of Data (LAD) methodology to solve multiclass classification problems, where One-vs-Rest learning models are constructed to classify observations in predefined classes. The proposed algorithm uses two control parameters, homogeneity and prevalence, for identifying relaxed (fuzzy) patterns in multiclass datasets. The utility of the proposed method is demonstrated through experiments on multiclass benchmark datasets. Numerical experiments show that the efficiency and performance of the proposed multiclass LAD method with relaxed patterns is comparable to, if not better than, those of the previously developed LAD based multiclass classification as well as other well-known supervised learning methods.
... In the literature, there are three common approaches for pattern generation: enumeration-based approaches (Boros et al. 2000), heuristic approaches (Hammer and Bonates 2006), and mixed integer linear programming (MILP)-based approaches (Mortada et al. 2011; Ryoo and Jang 2009). In MILP-based approaches, the objective is to maximize the number of positive (negative) observations that are covered by the generated positive (negative) patterns, while generating patterns that are optimal with respect to certain preferences or constraints (Ryoo and Jang 2009). ...
... The MILP-based method proposed in Guo and Ryoo (2012) involves a much smaller number of 0-1 integer variables than the approach presented in Ryoo and Jang (2009), and thus requires shorter training time. The MILP-based method proposed in Mortada et al. (2011) is a modified version of the approach introduced in Ryoo and Jang (2009). ...
Technical Report
1.1. What has been done? The objective of this scientific collaboration is the application and evaluation of the Logical Analysis of Data (LAD) approach and the related software cbmLAD as a data-driven tool in the context of fault detection and isolation in complex chemical processes. Some experimentation on the accuracy and robustness of LAD was done when the method was used to exploit simulated data generated from the Tennessee Eastman Process (TEP). cbmLAD was used to analyze the data without any a priori statistical assumptions. The software was capable of discovering hidden knowledge in the form of interpretable patterns, thus relating the detected faults to their causes. The software was also used to analyze data collected from a biomass recovery boiler, a critical and complex part of the pulp and paper industry. The discovered interpretable patterns were used to build a decision model (classification rule) to classify new observations, and thus detect faults as they happen.
... There are many optimization criteria for pattern generation. The pattern optimization criteria used in this paper are degree, homogeneity, and the number of covered observations [5]. A good LAD model has a small number of features, a small number of patterns, and high-quality patterns, i.e., patterns with a small degree and high homogeneity. ...
... A Mixed 0–1 Integer and Linear Programming (MILP) approach was presented to determine LAD patterns that are optimal with respect to previously studied and new pattern selection preferences. Extensive experiments on large datasets show that the proposed approach generates LAD patterns efficiently [5]. The MILP models developed in [27] cover many different optimal and Pareto-optimal LAD patterns. ...
... Other types of patterns are maximum prime patterns and maximum spanned patterns. Through the art of formulation, the MILP method can also be made to determine optimal patterns satisfying prevalence, complexity, and homogeneity requirements specified by the user [5]. In this research, MILP is employed to improve the quality of LAD outputs. ...
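To make these measures concrete, here is a small sketch in Python that computes the degree, coverage (prevalence), and homogeneity of a conjunction over a binarized dataset; the definitions follow common LAD usage and are not necessarily the exact ones of [5].

def covers(pattern, obs):
    # pattern: dict mapping attribute index -> required value (a conjunction of literals)
    return all(obs[j] == v for j, v in pattern.items())

def pattern_quality(pattern, positives, negatives):
    pos_cov = sum(covers(pattern, o) for o in positives)
    neg_cov = sum(covers(pattern, o) for o in negatives)
    degree = len(pattern)                  # number of literals in the conjunction
    coverage = pos_cov / len(positives)    # prevalence on the target class
    homogeneity = pos_cov / (pos_cov + neg_cov) if pos_cov + neg_cov else 0.0
    return degree, coverage, homogeneity

# Example: the pattern x0 = 1 AND x2 = 0 on a toy dataset.
print(pattern_quality({0: 1, 2: 0},
                      positives=[(1, 0, 0), (1, 1, 0), (1, 0, 1)],
                      negatives=[(0, 0, 0), (1, 1, 1)]))  # (2, 0.666..., 1.0)

A pattern with small degree, high coverage, and homogeneity equal (or close) to 1 is what the quoted criteria single out as high quality.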
Article
Full-text available
Logical analysis of data (LAD) is an important subfield of supervised machine learning and data mining. It is a methodology for data analysis that uses concepts from optimization, combinatorics, and Boolean functions. LAD is a binary classification method for Boolean data with high explanatory power. Because patterns are the most important building blocks in LAD, they must be selected carefully. One of the main drawbacks of LAD, which needs to be addressed, is the quality of the generated positive and negative patterns; with high-quality patterns, new observations can be classified with high accuracy. The proposed methodology was developed to address this issue. It studies the LAD method and its refinements, and defines quality measures for pattern generation. It then improves the pattern selection procedures using Mixed Integer-Linear Programs (MILP), modeled with the General Algebraic Modelling System (GAMS) and solved with a MIP solver. Using this technique to generate an optimized set of patterns aims at selecting the most important patterns to improve pattern quality and obtain strong results with high accuracy. Experiments carried out on the SPECT dataset show the efficiency of the proposed method in minimizing the number of generated patterns and increasing the accuracy of the classification model.
... The optimal pattern generation problem can be solved by a linear program. In Ryoo and Jang [2009], integer linear programs are proposed for generating different types of patterns. ...
... For each positive observation, the authors of Ryoo and Jang [2009] impose the constraint: ...
... Replacing it by constraints (b), (c) and (d), this time we can say that a solution (x, y, z) is feasible if and only if the term generated by x is a pattern. However, with constraint (a) or with constraints (b), (c) and (d), the second theorem of Ryoo and Jang [2009] holds as soon as a pattern exists: ...
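The constraint itself is elided in the excerpt above. For orientation, a commonly used shape of the positive-coverage requirement in MILP pattern generation models is the following sketch (an assumed standard form, not necessarily the thesis's constraint (a)):

\[
\sum_{j \in N :\, a_{ij} = 0} x_j \;+\; \sum_{j \in N :\, a_{ij} = 1} \bar{x}_j \;\le\; |N|\,(1 - y_i), \qquad i \in \Omega^{+},
\]

where the binary variables x_j and \bar{x}_j select the literals of the term, a_{ij} is the value of attribute j in observation i, and y_i = 1 certifies that the term covers positive observation i (no selected literal may conflict with it).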
Thesis
Analyzing groups of binary data is a challenge today, given the quantities of data collected. It can be carried out with logical approaches. These approaches identify subsets of Boolean attributes that are relevant for characterizing the observations of a group and can help the user better understand the properties of that group. This thesis presents an approach for characterizing groups of binary data by identifying a minimal subset of attributes that distinguishes the data of different groups. We precisely define the multiple characterization problem and propose new algorithms that can be used to solve its different variants. Our data characterization approach can be extended to the search for patterns in the framework of logical analysis of data. A pattern can be seen as a partial explanation of the positive observations that can be used by practitioners, for example for diagnostic purposes. Many patterns exist, and several preference criteria can be added to focus on smaller sets of them (prime patterns, strong patterns, ...). We therefore propose a comparison between these two methodologies, as well as algorithms for generating patterns. Another objective is to study the properties of the computed solutions as a function of the topological properties of the instances. Experiments are conducted on real biological datasets.
... A non-pure pattern is allowed to cover a large proportion of observations in one class and a much smaller proportion of observations in the opposite class. The classification accuracy of LAD depends mainly on the type of extracted patterns (Ryoo & Jang, 2009). ...
... In the literature, there are three common methods for pattern generation: (1) enumeration-based methods (Boros et al., 2000), (2) heuristic and meta-heuristic methods (Hammer & Bonates, 2006; Kim & Choi, 2015a, 2015b), and (3) mixed integer and linear programming (MILP) optimization-based methods (Ryoo & Jang, 2009). Enumeration-based methods are computationally demanding and time consuming if the dataset has a large number of binary attributes. ...
... Nevertheless, their computational time is supposed to be shorter. The MILP-based method proposed in (Ryoo & Jang, 2009) generates useful patterns that are optimal with respect to various selection preferences, namely simplicity, selectivity, and evidential (Hammer, Kogan, Simeone, & Szedmák, 2004). It also generates patterns that satisfy user-specified requirements such as the degree of the pattern, the coverage, and the homogeneity (Ryoo & Jang, 2009). ...
Article
This paper applies the Logical Analysis of Data (LAD) to detect and diagnose faults in industrial chemical processes. This machine learning classification technique discovers hidden knowledge in industrial datasets by revealing interpretable patterns, which are linked to underlying physical phenomena. The patterns are then combined to build a decision model that serves to diagnose faults during the process operation, and to explain the potential causes of these faults. LAD is applied to two case studies, selected to exemplify the difficulty in interpreting faults in complex chemical processes. The first case study is the Tennessee Eastman Process (TEP), a well-known benchmark problem in the field of process monitoring and control that uses simulated data. The second one uses a real dataset from a black liquor recovery boiler in a pulp mill. The results are compared to those obtained by other common machine learning techniques, namely artificial neural networks (ANN), Decision Tree (DT), Random Forest (RF), k nearest neighbors (kNN), quadratic discriminant analysis (QDA) and support vector machine (SVM). In addition to its explanatory power, the results show that LAD's performance is comparable to the most accurate techniques.
... The idea is to build computer programs that refine the databases automatically. Among the extracted patterns, some will be trivial and uninteresting; others, on the other hand, will be general and can contribute to accurate prediction of future data (Ryoo & Jang, 2009). The patterns discovered must be meaningful and have some advantage in an economic sense. ...
... This step is essential in identifying the positive and negative patterns from the binarized dataset of positive and negative observations. The accuracy of the LAD decision model depends on the type of generated patterns (Ryoo & Jang, 2009). ...
... The set of observations covered by the pattern is denoted as ( ). A high-degree pattern is more likely to cover a small proportion of observations, while a low-degree pattern is more likely to have higher coverage (Ryoo & Jang, 2009). In the testing dataset, misclassified observations result from generating high-degree patterns, while unclassified observations result from low-degree patterns (Ryoo & Jang, 2009). ...
Article
Full-text available
This paper addresses the applicability of multi-class Logical Analysis of Data (LAD) as a face recognition technique (FRT). This classification technique has already been applied as a diagnostic technique in the fields of biomedical and mechanical engineering; however, it has never been used in the face recognition literature. We explore how Eigenfaces and Fisherfaces merged with multi-class LAD can be leveraged as a proposed FRT, and how it might be useful compared to other approaches. The aim is to build a single multi-class LAD decision model that recognizes facial images of different persons. We show that the proposed FRT can effectively deal with multiple changes in pose and facial expression, which constitute critical challenges in the literature. Comparisons are made from both analytical and practical points of view. The proposed model improves the classification of Eigenfaces and Fisherfaces with a minimum error rate.
... A pattern is a Boolean logical term that is made up of literals that are either Boolean attributes or their negated attributes, and pattern generation is a combinatorial optimization task that serves as the bottleneck as well as key operation for a successful analysis of data by LAD. LAD patterns have long been generated heuristically [4,8,10,17,35], but optimization-based approaches have recently been proposed, too [15,32,47,57]. ...
... For reference, there are different types of LAD patterns that are (Pareto-)optimal with respect to two other well-adopted pattern generation preferences [32,35,47], but the definition of LAD pattern dictates that the 0-1 MP models for all LAD patterns share the same requirement ...
... by LAD via solving smaller and stronger 0-1 MILP/IP pattern generation instances, which indicates enhanced efficacy of LAD in practice. This section tests the aforementioned utility of the new valid inequalities of this paper via extensive pattern generation experiments with 12 real-life datasets from [44]; we note that six of these datasets are well-studied in the LAD literature (e.g., [15,17,32,35,47,57]). For notational simplicity, for • ∈ {+, −}, we denote by •̄ the complementary element of • with respect to the set {+, −}. ...
Article
Full-text available
0–1 multilinear programming (MP) captures the essence of pattern generation in logical analysis of data (LAD). This paper utilizes graph theoretic analysis of data to discover useful neighborhood properties among data for data reduction and multi-term linearization of the common constraint of an MP pattern generation model in a small number of stronger valid inequalities. This means that, with a systematic way to more efficiently generating Boolean logical patterns, LAD can be used for more effective analysis of data in practice. Mathematical properties and the utility of the new valid inequalities are illustrated on small examples and demonstrated through extensive experiments on 12 real-life data mining datasets.
... (3) Mixed 0-1 Integer and Linear Programming (MILP)-based method. The MILP-based method proposed in [16] can generate useful patterns that are optimal without total enumeration. It can generate strong patterns, which make the LAD decision model generalize better on new observations. ...
... It can generate strong patterns, which make the LAD decision model generalize better on new observations. The MILP approach proposed in [13] is a modified version of the approach introduced in [16]. The modification aims at increasing the number of patterns generated from the same training data set without a significant increase in training time, thus increasing the classification power in two-class problems. ...
... In this tutorial, we explain the MILP-based pattern generation method presented in [16] in order to generate the patterns in the LAD approach. The basic task is the generation of positive and negative patterns. ...
... The accuracy of the LAD classifier depends on the characteristics of the generated patterns [21]. ...
... There are three common methods for pattern generation: enumeration-based methods, heuristic methods, and mixed integer and linear programming (MILP)-based methods [13,17,21]. The MILP-based methods are more accurate than the others, and give optimal solutions, as reported in [21]. ...
Conference Paper
This paper presents a prognostic methodology that can be implemented in a condition-based maintenance (CBM) program. The methodology estimates the remaining useful life (RUL) of a system by using a pattern-based machine learning and knowledge discovery approach called Logical Analysis of Data (LAD). The LAD approach is based on the exploration of the monitored system's database, and the extraction of useful information that describes the physics characterizing its degradation. The diagnostic information, which is updated each time new data is gathered, is combined with a non-parametric reliability estimation method in order to predict the RUL of a monitored system working under different operating conditions. In this paper, the developed methodology is compared to a known CBM prognostic technique: the Cox proportional hazards model (PHM). The methodology has been tested and validated based on the Friedman statistical test. The results of the test indicate that the proposed methodology provides an accurate RUL prediction.
... The mixed integer linear programming (MILP) based approach can be used for logic rule extraction. Ryoo and Jang (2009) proposed that the MILP-based approach generates useful rules that are optimal for various preferential choices, i.e., simplicity, selectivity, and evidential (Hammer et al., 2004). The method identifies rules that are optimal with respect to various previously studied and new rule selection preferences. ...
... for various preferential choices, i.e., simplicity, selectivity, and evidential (Hammer et al., 2004). The method identifies rules that are optimal with respect to various previously studied and new rule selection preferences. It also generates rules that satisfy user-specified requirements such as the length, coverage, and homogeneity of the rules (Ryoo & Jang, 2009). The main equations associated with the MILP algorithm are given below. ...
... This paper introduces a new model-based control chart, called Logical Analysis of Data Regression (LADR) based control chart to detect the anomalies and to perform root cause analysis for corrective actions. The approach combines LADR, which is a machine learning technique for pattern generation and regression based on combining the logical analysis of data (LAD) (Ryoo & Jang, 2009) with a control chart. The generated patterns identify the multidimensional zones that characterize different groups of observations in the original data. ...
... LADR uses the LAD approach to generate patterns that differentiate and characterize different process states. It uses the cbmLAD software (Yacout et al., 2017) to obtain strong patterns, which have high coverage (Ryoo & Jang, 2009). The following subsections show the steps to implement LADR technique as depicted in Fig. 1. ...
Article
Full-text available
Control charts are widely used as a tool in process quality monitoring to detect anomalies and to improve the quality of a process and product. Nevertheless, their limitations have increased in the face of increasingly complex manufacturing processes. They do not have the capability to handle large streams of non-normal and autocorrelated multivariate data, which is the case in most real applications. This may lead to an increase in false alarm signals and/or missed detection of anomalies. They are not designed to automatically identify the root causes of an anomaly when the process is out of control. Several machine-learning techniques were integrated with control charts to improve the sensitivity and specificity of anomaly detection. Nevertheless, some existing techniques still produce a high false alarm rate and/or missed detection. The root cause analysis is seldom performed. In this paper, we propose a new integration that combines the logical analysis of data regression technique (LADR) and the exponential weighted moving average (EWMA) as a new model-based control chart. LADR is based on the traditional LAD methodology, which is a supervised data mining technique for pattern generation. LADR transforms the original independent variables into pattern variables by using cbmLAD software to develop a regression model. The LADR–EWMA increases the sensitivity of anomaly detection in the process and uses the patterns to perform root cause analysis of that anomaly. We applied LADR–EWMA to a real application: a concrete manufacturing process. We compared its performance with linear regression, support vector regression, partial least squares regression, and multivariate adaptive regression splines. The results demonstrate that the LADR–EWMA, which is based on pattern recognition, performs better than the other techniques in terms of a reduction of false alarms and missed detection. In addition, LADR–EWMA facilitates interpretation and identification of the root cause of the detected anomaly.
... There are three common methods for pattern extraction: (1) enumeration-based methods (Boros et al., 2000), (2) heuristic and meta-heuristic methods (Hammer & Bonates, 2006; Kim & Choi, 2015a, 2015b), and (3) mixed integer and linear programming (MILP) optimization-based methods (Ryoo & Jang, 2009). Enumeration-based methods are computationally demanding and time consuming for large datasets. ...
... The heuristic-based methods give feasible solutions but not optimal ones; nevertheless, their computational time is expected to be shorter. The MILP-based method proposed in (Ryoo & Jang, 2009) generates useful patterns that are optimal with respect to various selection preferences, namely simplicity, selectivity, and evidential (Hammer, Kogan, Simeone, & Szedmák, 2004). More details on pattern extraction methods are found in (Chikalov et al., 2012; Lejeune et al., 2018). ...
... A multitude of studies have already been conducted to theoretically verify and validate the LAD approach. For more details, interested readers are referred to [21,26,27]. ...
... A pattern is strong if there is no other pattern P′ such that Cov(P) ⊂ Cov(P′) [21]. There are three common methods for pattern extraction: enumeration-based methods, heuristic methods, and mixed integer and linear programming (MILP)-based methods [24][25][26]. The key issue in this step is to extract the strong patterns, which have greater generalization and explanatory power. ...
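To make the strongness condition concrete, here is a toy check in Python (illustrative only; the coverage notation Cov is the assumed one from the excerpt):

def cov(pattern, observations):
    # indices of the observations that satisfy every literal of the pattern
    return {i for i, o in enumerate(observations)
            if all(o[j] == v for j, v in pattern.items())}

obs = [(1, 0), (1, 1), (0, 0)]
p1 = {0: 1, 1: 0}   # x0 = 1 AND x1 = 0
p2 = {0: 1}         # x0 = 1
# Cov(p1) is a proper subset of Cov(p2), so p1 is not strong.
print(cov(p1, obs) < cov(p2, obs))  # True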
Conference Paper
This paper proposes an interpretable knowledge discovery approach to detect and diagnose faults in chemical processes. The approach is demonstrated using simulated data from the Tennessee Eastman Process (TEP), as a challenging benchmark problem. The TEP is a plant-wide industrial process that is commonly used to study and evaluate a variety of topics, including the design of process monitoring and control techniques. The proposed approach is called Logical Analysis of Data (LAD). LAD is a machine learning approach that is used to discover the hidden knowledge in historical data. The discovered knowledge, in the form of extracted patterns, is employed to construct a classification rule that is capable of characterizing the physical phenomena in the TEP, wherein one can detect and identify a fault and relate it to the causes that contribute to its occurrence. To evaluate our approach, the LAD is trained on a set of observations collected from different faults, and tested against an independent set of observations. The results in this paper show that the LAD approach achieves the highest accuracy compared to two common machine learning classification techniques: Artificial Neural Networks and Support Vector Machines.
... Pattern generation approaches include: • enumeration-based approaches [16,19], • heuristic approaches [20], and • Mixed 0-1 Integer and Linear Programming (MILP)-based methods [21,22]. The cbmLAD software program [23] used for this paper generates patterns based on an optimized version of the MILP approach. ...
... [22]. If a positive pattern covers a positive observation, the dot product of the Boolean pattern vector U and the Boolean observation vector V_h of each observation covered by that pattern must be equal to the degree d of that pattern [21]. ...
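The dot-product test quoted above is easy to verify in the usual 2n-literal encoding, in which an observation vector lists each attribute followed by its negation; this encoding is an assumption for illustration, since the excerpt does not spell it out.

import numpy as np

def literal_vector(obs):
    a = np.array(obs)
    return np.concatenate([a, 1 - a])  # (x1..xn, !x1..!xn)

U = np.array([1, 0, 0, 0, 0, 1])       # pattern x1 AND !x3 over n = 3 attributes
d = U.sum()                            # degree of the pattern

for obs in [(1, 0, 0), (1, 1, 1), (0, 1, 0)]:
    covered = (U @ literal_vector(obs)) == d
    print(obs, "covered" if covered else "not covered")

Only (1, 0, 0) is covered: it is the one observation whose literal vector agrees with all d selected literals, so the dot product reaches the degree.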
Article
Full-text available
This paper deals with the application of Logical Analysis of Data (LAD) to machinery-related occupational accidents, using belt-conveyor-related accidents as an example. LAD is a pattern recognition and classification approach. It exploits the advancement in information technology and computational power in order to characterize the phenomenon under study. The application of LAD to machinery-related accident prevention is innovative. Ideally, accidents do not occur regularly, and as a result, companies have little data about them. The first objective of this paper is to demonstrate the feasibility of using LAD as an algorithm to characterize a small sample of machinery-related accidents with an adequate average classification accuracy. The second is to show that LAD can be used for prevention of machinery-related accidents. The results indicate that LAD is able to characterize different types of accidents with an average classification accuracy of 72%-74%, which is satisfactory when compared with other studies dealing with large amounts of data where such a level of accuracy is considered adequate. The paper shows that the quantitative information provided by LAD about the patterns generated can be used as a logical way to prioritize risk factors. This prioritization helps safety practitioners make decisions regarding safety measures for machines.
... For example, Bonates et al. [35] proposed an integer programming method to construct patterns with maximum coverage. Ryoo and Jang [36] introduced a Mixed 0-1 Integer and Linear Programming (MILP) approach to identify LAD patterns optimized for various preferences. Alternative pattern generation approaches have since emerged. ...
Article
Full-text available
Initially introduced by Peter Hammer, logical analysis of data (LAD) is a methodology that aims at computing a logical justification for dividing a group of data into two groups of observations, usually called the positive and negative groups. Let us consider this partition into positive and negative groups as the description of a partially defined Boolean function; the data are then processed to identify a subset of attributes, whose values may be used to characterize the observations of the positive groups against those of the negative group. LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques and has many practical applications. Nevertheless, the computation of group characterization may be costly, depending on the properties of the data instances. A major aim of our work is to provide effective tools for speeding up the computations, by computing some a priori probability that a given set of attributes does characterize the positive and negative groups. To this effect, we propose several models for representing the data set of observations, according to the information we have on it. These models, and the probabilities they allow us to compute, are also helpful for quickly assessing some properties of the real data at hand; furthermore, they may help us to better analyze and understand the computational difficulties encountered by solving methods. Once our models have been established, the mathematical tools for computing probabilities come from Analytic Combinatorics. They allow us to express the desired probabilities as ratios of generating function coefficients, which then provide a quick computation of their numerical values. A further, long-range goal of this paper is to show that the methods of Analytic Combinatorics can help in analyzing the performance of various algorithms in LAD and related fields.
... In [11], integer programming is used to generate patterns. A mixed approach using both integer and linear programming principles was presented in [12,13]. The authors in [14] described a genetic algorithm for generating patterns. ...
Conference Paper
The main stage of logical analysis of data is the generation of patterns that have the maximum coverage of sample observations and also fulfill the constraints on the homogeneity of the coverage. To reduce the effect of overfitting, these constraints should be weakened and the heterogeneity of the pattern should be allowed. However, it is a challenge to determine in advance what kind of heterogeneity is acceptable for each data set. In this paper, we consider the problem of pattern generation as a multicriteria optimization problem where the criteria are the coverage of objects of the target class and coverage of objects that are not included in the target class. To solve this problem, a modified evolutionary algorithm based on the NSGA-II algorithm has been developed. The new algorithm takes into account the representation of the solution (coding patterns) in logical analysis of data and the informativeness of patterns.
... That is, a LAD pattern is a conjunction of Boolean literals (where a literal is either a 0-1 attribute or its negation) that describes a subset of the + observations only. For background on LAD, we refer the reader to [5,6,35]. In addition, with data binarization if necessary (e.g., [6]), we assume hereafter that data under analysis are (converted into equivalent) 0-1 vectors in n Boolean attributes a_j ∈ {0, 1}, j ∈ N := {1, . . .
Article
Full-text available
Logical analysis of data (LAD) discovers useful knowledge from a set of data in the form of a Boolean pattern for classifying future data. Generating a pattern has been shown to be equivalent to solving a 0–1 multilinear program (MP). Thus, the success of LAD is tightly related to how efficiently practical instances of pattern generation MP’s can be solved. For a polyhedral relaxation of LAD pattern generation MP, this paper introduces a new notion of similarity among data that allows for simultaneously relaxing multiple terms of the objective function of MP into a single valid inequality for the Boolean MP polytope. Specifically, we present a framework for constructing three types of strong valid inequalities from cliques in multiple graph representations of data that collectively yield a tight polyhedral relaxation of MP. Furthermore, we specify conditions under which each type of the new inequalities defines a facet of the MP polytope. In comparison with methods from the literature, benefits of the new inequalities are validated through classification experiments with 8 public machine learning datasets.
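To fix ideas about the 0–1 multilinear program referred to here, a generic sketch (the paper's exact model may differ): a term with selected literal variables x ∈ {0,1}^{2n} covers observation i exactly when \prod_{j \in J_i} (1 - x_j) = 1, where J_i is the set of literals that observation i violates. A maximum-coverage pattern then solves

\[
\max \; \sum_{i \in \Omega^{+}} \prod_{j \in J_i} (1 - x_j)
\quad \text{s.t.} \quad \prod_{j \in J_i} (1 - x_j) = 0 \;\; \forall i \in \Omega^{-}, \quad x \in \{0,1\}^{2n}.
\]

The McCormick inequalities mentioned in the abstract linearize each product z_i = \prod_{j \in J_i} (1 - x_j) by z_i \le 1 - x_j for every j \in J_i, together with z_i \ge 1 - \sum_{j \in J_i} x_j and z_i \ge 0; the clique-based inequalities of the paper replace these with fewer, stronger ones.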
... Boolean or logical analysis of data has been a hot topic of research in the past few decades [1][2][3][4][5][6][7][8][9][10][11][12][13]. Of a particular interest herein is a seminal paper, in which Crama et al. [1] dealt with the problem of "identifying the small subsets of plausible causes of a given effect, among a large set of factors including all the potential causes, along with many other (irrelevant) factors." ...
Chapter
Full-text available
This chapter utilizes a modern regular and modular version of the eight-variable Karnaugh map in a systematic and exhaustive investigation of cause-effect relationships modeled by partially-defined Boolean functions (PDBF) (known also as incompletely-specified switching functions). First, we present a simple Karnaugh-map test that can decide whether a certain variable must be included in a set of supporting variables of the function, and, otherwise, might enforce the exclusion of this variable from such a set. This exclusion is attained via certain don’t-care assignments that ensure the equivalence of the Boolean quotient w.r.t. the variable, and that w.r.t. its complement, i.e., the exact matching of the half map representing the internal region of the variable, and the remaining half map representing the external region of the variable, in which case any of these two half maps replaces the original full map as a representation of the function. Such a variable exclusion might be continued w.r.t. other variables till a minimal set of supporting variables is reached. The paper addresses a dominantly-unspecified PDBF to obtain all its minimal sets of supporting variables without explicit resort to integer-programming techniques. For each of the minimal sets obtained, standard map methods for extracting prime implicants allow the construction of all irredundant disjunctive forms (IDFs). According to this scheme of first identifying a minimal set of supporting variables, we avoid the task of drawing prime-implicant loops on the initial eight-variable map, and postpone this task till the map is (usually dramatically) reduced in size. The procedure outlined herein has important ramifications for the newly-established discipline of Qualitative Comparative Analysis (QCA). These ramifications are not expected to be welcomed by the mainstream QCA community, since they clearly indicate that the too-often strong results claimed by QCA adherents need to be checked and scrutinized. In our opinion, more observations have to be made in order to narrow down the possibilities and decrease the number of candidate IDFs.
... In LAD, feature selection is an NP-hard problem, like other data-related problems (Miao et al. 2018; Cai et al. 2019; Miao et al. 2020a, b). Feature selection in LAD is usually formulated as the well-known minimum set covering (SC) problem (see, e.g., Boros et al. 2000; Alexe et al. 2007). Unlike its subsequent stage, pattern generation, which draws a lot of research effort (Alexe and Hammer 2006a, b; Bonates et al. 2008; Ryoo and Jang 2009; Guo and Ryoo 2012; Ryoo 2017a, b, 2019a, b), the literature specifically addressing the feature selection problem in LAD is relatively scarce (Bruni 2007; Ryoo and Jang 2007). To be specific, Bruni (2007) reformulated feature selection from SC into a weighted SC problem, and Ryoo and Jang (2007) designed a memory- and time-efficient procedure called implicit SC-based feature selection for large-scale datasets. ...
Article
Full-text available
Feature selection in logical analysis of data (LAD) can be cast into a set covering problem. In this paper, extending the results on feature selection for binary classification using LAD, we present a mathematical model that selects a minimum set of necessary features for multi-class datasets and develop a heuristic algorithm that is both memory and time efficient for this model correspondingly. The utility of the algorithm is illustrated on a small example and the superiority of our work is demonstrated through experiments on 6 real-life multi-class datasets from UCI repository.
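As a sketch of this set-covering formulation (the standard binary-classification version; the paper's model generalizes it to multiple classes), in Python with PuLP: choose a minimum set of binary features such that every positive/negative pair of observations differs on at least one chosen feature.

import pulp

positives = [(1, 0, 1, 0), (1, 1, 1, 0)]
negatives = [(0, 0, 1, 1), (1, 0, 0, 1)]
n = 4

m = pulp.LpProblem("lad_feature_selection", pulp.LpMinimize)
s = [pulp.LpVariable(f"s_{j}", cat="Binary") for j in range(n)]
m += pulp.lpSum(s)  # objective: number of selected features

# Each positive/negative pair is a "row" to cover: some selected
# feature must distinguish the two observations.
for p in positives:
    for q in negatives:
        m += pulp.lpSum(s[j] for j in range(n) if p[j] != q[j]) >= 1

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected features:", [j for j in range(n) if s[j].value() == 1])  # e.g. [3]

In the multi-class setting, one such covering constraint is written for every pair of observations drawn from two different classes.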
... In our study, the classes are the positive class (defect) and the negative class (healthy). In this paper, pattern generation is proposed based on the formulation and solution of a mixed-integer and linear programming (MILP) problem (Linderoth & Lodi, 2010). The advantage of this approach over the pattern generation methodology proposed in (Ryoo & Jang, 2009; Moreira, 2000) is the ability to control the discriminating power between the different classes through a user-defined parameter, which specifies the minimum number of patterns that must separate each observation of class c_i from those belonging to class c_j (Mortada et al., 2014). The second advantage of the proposed approach is that there is no limit on the number of patterns generated by the pattern generation algorithm, which creates more pattern-finding possibilities. ...
Article
Full-text available
New online fault monitoring and alarm systems, with the aid of Cyber-Physical Systems (CPS) and Cloud Technology (CT), are examined in this article within the context of Industry 4.0. The data collected from machines is used to implement maintenance strategies based on the diagnosis and prognosis of the machines' performance. As such, the purpose of this paper is to propose a Cloud Computing Platform containing three layers of technologies forming a Cyber-Physical System which receives unlabelled data to generate an interpreted online decision for the local team, as well as collecting historical data to improve the analyzer. The proposed troubleshooter is tested using unlabelled experimental data sets of rolling element bearing. Finally, the current and future Fault Diagnosis Systems and Cloud Technologies applications in the maintenance field are discussed. Keywords: Remote Fault Diagnosis System (RFDS); Logical Analysis of Data (LAD); Cyber-Physical System (CPS); Pattern recognition; Industry 4.0; Cloud computing
... There are three common methods for pattern extraction: (1) enumeration-based methods (Boros et al., 2000), (2) heuristic and meta-heuristic methods (Hammer & Bonates, 2006; H. H. Kim & Choi, 2015a, 2015b), and (3) mixed integer and linear programming (MILP) optimization-based methods (Mortada et al., 2011; Ryoo & Jang, 2009). More details about the steps of LAD through an illustrative example are found in (Ragab, El-Koujok, Poulin, Amazouz, & Yacout, 2018). ...
Chapter
This chapter provides a state-of-the-art review of the data-driven FDD methods that have been developed for complex industrial systems focusing on machine learning (ML)-based methods. Among common ML techniques, the top fault diagnosis algorithms are discussed in this chapter according to their efficiencies and widespread popularities. A number of common predictive and descriptive ML techniques have been discussed according to their pros and cons. A literature review was done on the characteristics of these methods, according to a multitude of papers and recent reviews. The chapter also presents a number of methodologies applied to real case studies in industrial plants located in Canada. Each methodology combines diversified predictive and descriptive methods integrated together. Finally, the chapter concludes the results and briefly lists some of the lessons learned through these case studies.
... One core concept behind LAD is pattern identification. A pattern is a set of literals that discriminates between or classifies a certain set of effects or a class of effects (Boros et al., 2000;Crama et al., 1988;Ryoo and Jang, 2009). Literals are either attributes, the binarized values of health condition indicators, or their negations, the absence or the complement of the attributes. ...
Article
Purpose
Condition-based maintenance (CBM) has become a central maintenance approach because it performs more efficient diagnoses and prognoses based on equipment health condition compared to time-based methods. CBM models greatly inform maintenance decisions. This research examines three CBM fault prognostics models: logical analysis of data (LAD), artificial neural networks (ANNs) and proportional hazards models (PHM). A methodology, which involves data pre-processing, formulating the models and analyzing model outputs, is developed to apply and compare these models. The methodology is applied to NASA's Turbofan Engine Degradation data set and the structural health monitoring (SHM) data set from a Nova Scotia bridge. Results are evaluated using three metrics: error, half-life error and a cost score. This paper concludes that the LAD and feedforward ANN models compare favorably to the PHM model. However, the feedback ANN does not compare favorably, and its predictions show much larger variance than the predictions from the other three methods. Based on these conclusions, the purpose of this paper is to provide recommendations on the appropriate situations in which to apply these three prognostics models.

Design/methodology/approach
LAD, ANNs and PHM methods are adopted to perform prognostics and to calculate the mean residual life (MRL) of equipment using NASA's Turbofan Engine Degradation data set and the SHM data set from a Nova Scotia bridge. Statistical testing was used to evaluate the statistical differences between the approaches based on these metrics. By considering the differences in these metrics between the models, it was possible to draw conclusions about how the models perform in specific cases.

Findings
Results were evaluated using three metrics: error, half-life error and a cost score. It was concluded that the LAD and feedforward ANN models compare favorably to the PHM model. However, the feedback ANN does not compare favorably, and its predictions show much larger variance than the predictions from the other three methods. Overall, the models predict failure after it has already occurred (negative error) when the residual life is large, and vice versa.

Practical implications
It was concluded that a good CBM prognostics model for practical implementation can be determined based on three main considerations: accuracy, run time and data type. When accuracy is a main concern, as in the case where the impacts of failure are large, LAD and the feedforward neural network are preferred. The preference changes when run time is considered. If data can be easily collected and the model is updated often, the ANNs and LAD are preferred. On the other hand, if CM data are not easily obtainable and existing data are not representative of the population's behavior, data type comes into play. In this case, PHM is preferred.

Originality/value
Previous research in the literature performed reviews of multiple independent studies on CBM techniques applied to different data sets. They concluded that it is typically harder to implement artificial intelligence models, because of difficulties in data procurement, but these approaches offer improved performance compared to more traditional model-based and statistical approaches. In this research, the authors further investigate and compare the performance and results from two major artificial intelligence models, namely ANNs and LAD, and one pioneer statistical model, PHM, over the same two real-life prognostics data sets. Such an in-depth comparison and review of major CBM techniques was missing in the current CBM literature.
... As such, the explanatory power of LAD resides within the generated patterns, and pattern generation algorithms are the main area of research on this approach (Boros et al. 2011). In general, three methods are used for pattern generation: enumeration-based techniques, mathematical programming algorithms, and heuristic techniques (Boros et al. 2000; Hammer et al. 2004; Ryoo and Jang 2009). The objective of these techniques and algorithms is to find the minimum number of patterns that characterize all of the observations in the dataset and form a robust theory capable of correctly classifying any new observation. ...
Article
Full-text available
This paper develops a prognostic technique called the logical analysis of survival curves (LASC). This technique is used to learn the degradation process of any physical asset, and consequently to predict its failure time (T). It combines the reliability information obtained from a classical Kaplan–Meier non-parametric curve with that obtained from online measurements of multiple sensed signals of degradation. An analysis of these signals by the machine learning technique, logical analysis of data (LAD), is performed to exploit the instantaneous knowledge about the state of degradation of the asset studied. The experimental results of the predictions of failure times for cutting tools are reported. The results show that LASC prognostic results are better than the results obtained by well-known machine learning techniques. Other advantages of the proposed techniques are also discussed.
... to the formulation of a 0-1 linear relaxation of (PG). Alternatively, one may aggregate the constraints in (2) with respect to j to concavify φ⁺ by m⁺ valid inequalities (e.g., [29]) ...
Article
Full-text available
0–1 multilinear program (MP) holds a unifying theory for LAD pattern generation. This paper studies a multi-term relaxation of the objective function of the pattern generation MP for a tight polyhedral relaxation in terms of a small number of stronger 0–1 linear inequalities. Toward this goal, we analyze data in a graph to discover useful neighborhood properties among a set of objective terms around a single constraint term. In brief, they yield a set of facet-defining inequalities for the 0–1 multilinear polytope associated with the McCormick inequalities that they replace. The construction and practical utility of the new inequalities are illustrated on a small example and thoroughly demonstrated through numerical experiments with 12 public machine learning datasets.
... They also describe two heuristics for an approximate solution of this problem. In [100], the authors propose a Mixed 0-1 Integer and Linear Programming (MILP) approach to identifying LAD patterns that are optimal with respect to various preferences. ...
Article
Logical Analysis of Data (LAD) is a data analysis methodology introduced by Peter L. Hammer in 1986. LAD distinguishes itself from other classification and machine learning methods by the fact that it analyzes a significant subset of combinations of variables to describe the positive or negative nature of an observation and uses combinatorial techniques to extract models defined in terms of patterns. In recent years, the methodology has tremendously advanced through numerous theoretical developments and practical applications. In the present paper, we review the methodology and its recent advances, describe novel applications in engineering, finance, health care, and algorithmic techniques for some stochastic optimization problems, and provide a comparative description of LAD with well-known classification methods.
... For more details about data binarization, see [16]. Many techniques for pattern generation have been adopted by researchers, such as enumeration [16], heuristics [14,17], and integer linear programming [18]. The last of these is adopted in this paper. ...
Article
Full-text available
In this paper, logical analysis of data (LAD) is used to predict the seismic response of building structures employing the captured dynamic responses. In order to prepare the data, computational simulations using a single degree of freedom (SDOF) building model under different ground motion records are carried out. The selected excitation records are real and have different peak ground accelerations (PGA). The sensitivity of the seismic response, in terms of floor displacements, to variations in earthquake characteristics, such as soil class, characteristic period, time step of records, peak ground displacement, and peak ground velocity, has also been considered. The dynamic equation of motion describing the building model and the applied earthquake load is presented and solved incrementally using the Runge-Kutta method. LAD then finds the characteristic patterns that are used to forecast the seismic response of building structures. The accuracy of LAD is compared to that of an artificial neural network (ANN), since the latter is the best-known machine learning technique. Based on the conducted study, the proposed LAD model has proven to be an efficient technique to learn, simulate, and blindly predict the dynamic response behaviour of building structures subjected to earthquake loads.
... A heuristic algorithm [8] selects patterns based on the supporting observations and the testing error. Another approach to finding LAD patterns adopts linear programming. Being optimization-based, the MILP approach can generate patterns of all degrees with equal ease and hence does not require redundant artificial interval variables for analyzing data [11]. ...
Article
Full-text available
Logical Analysis of Data is a technique used to find specific patterns. In data mining, this technique is applied through a learning process on an available dataset in order to extract useful patterns, which are then used in different tasks. In this paper, we apply this framework to construct a predictive model based on the useful extracted patterns, with a focus on pattern set generation. The practical work testing this predictive model is performed on a disease dataset suited to the nature of this technique, obtained from the UCI website.
... In this way, the detection of these patterns in any new dataset is an indication of the wear stage. There are many techniques for pattern generation, such as enumeration (Boros et al. 2000), heuristics (Hammer, 1986; Hammer and Bonates, 2006), and integer linear programming (Ryoo and Jang, 2009). In this article, the software cbmLAD (Yacout et al. 2012) is used; it is based on heuristic techniques for pattern generation. ...
Article
Full-text available
This article presents a new tool wear multiclass detection method. Based on the experimental data, tool wear classes are defined using the Douglas–Peucker algorithm. Logical analysis of data (LAD) is then used as a machine learning and pattern recognition technique with the dual objectives of detecting the present tool wear class based on the recent sensor readings of the time-dependent machining variables, and of deriving new information about the intercorrelation between the tool wear and the machining variables through pattern analysis. LAD is a data-driven technique that relies on combinatorial optimization and pattern recognition. The accuracy of LAD is compared to that of an artificial neural network (ANN) technique, since ANN is the most familiar machine learning technique. The proposed method is applied to experimental data that are gathered under various machining conditions. The results show that the proposed method detects the tool wear class correctly and with high accuracy.
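For concreteness, a generic sketch of the Douglas–Peucker simplification used to segment the wear curve into classes (toy wear data; the tolerance eps is a free parameter):

import math

# Hedged sketch of the Douglas-Peucker algorithm: recursively keep the
# point farthest from the chord until all deviations are within eps.
def douglas_peucker(points, eps):
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    norm = math.hypot(x2 - x1, y2 - y1) or 1e-12
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        # perpendicular distance of interior point to the chord
        d = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right

wear = [(t, 0.01 * t + (0.2 if t > 50 else 0.0)) for t in range(100)]
print(douglas_peucker(wear, eps=0.05))   # breakpoints suggest wear stages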
... A positive pattern is a pattern found in some observations of tools that experienced the physical phenomenon under study, and it has not been seen in any observations of tools that have not experienced the physical phenomenon. Many techniques have been proposed for pattern generation, such as enumeration [12], heuristics [13,14], column generation [15], and linear programming [16]. In this paper, we use the concept of Logical Analysis of Survival Data (LASD), which was developed and applied successfully in the medical field in [17]. ...
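As a toy illustration of the enumeration technique cited above, the following sketch generates all positive patterns of degree at most 2 over binarized data (hypothetical observations):

from itertools import combinations, product

# Hedged sketch of term-enumerative pattern generation: enumerate
# conjunctions of up to 2 literals and keep those covering at least
# one positive observation and no negative observation.
pos = [(1, 0, 1), (1, 1, 1)]          # toy binarized positive observations
neg = [(0, 0, 1), (1, 0, 0)]          # toy binarized negative observations

def covers(term, obs):                # term: tuple of (index, required value)
    return all(obs[i] == v for i, v in term)

patterns = []
n = len(pos[0])
for degree in (1, 2):
    for idxs in combinations(range(n), degree):
        for vals in product((0, 1), repeat=degree):
            term = tuple(zip(idxs, vals))
            if any(covers(term, o) for o in pos) and \
               not any(covers(term, o) for o in neg):
                patterns.append(term)
print(patterns)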
Conference Paper
Full-text available
This paper presents a novel approach for incorporating condition information based on historical data into the development of reliability curves. The approach uses a variation of the Kaplan-Meier (KM) estimator and degradation-based estimators of survival patterns. From a statistical perspective, the use of the KM estimator to create a reliability curve for a specific type of equipment results in a general curve that does not take into consideration the instantaneous condition of each individual piece of equipment. The proposed degradation-based estimator updates the KM estimator in order to capture the actual condition of equipment based on the detected patterns. These patterns identify interactions between condition indicators. The degradation-based reliability curves are obtained by a new methodology called Logical Analysis of Survival Data (LASD). LASD identifies interactions between condition indicators without any prior hypotheses. It generates patterns based on a machine learning and pattern recognition technique. Using this set of patterns, survival curves, which can predict the reliability of any device at any time based on its actual condition, are developed. To evaluate the LASD approach, it was applied to experimental results that represent cutting tool degradation during the turning of TiMMCs with condition monitoring. Compared to the traditional Kaplan-Meier-based reliability curve, LASD improves the reliability prediction.
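For reference, a minimal sketch of the plain Kaplan-Meier estimator that the proposed degradation-based estimator updates (toy data; the LASD adjustment itself is not reproduced here):

# Hedged sketch of the Kaplan-Meier survival estimator.
def kaplan_meier(times, events):
    # times: observed times; events: 1 = failure, 0 = right-censored
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [e for tt, e in data if tt == t]
        deaths = sum(at_t)
        if deaths:
            s *= 1.0 - deaths / n_at_risk   # multiply survival factor
            curve.append((t, s))
        n_at_risk -= len(at_t)              # remove failures and censorings
        i += len(at_t)
    return curve

print(kaplan_meier([2, 3, 3, 5, 7], [1, 1, 0, 1, 0]))
# -> [(2, 0.8), (3, 0.6), (5, 0.3)]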
... We conclude by emphasizing that while the LAD approach has been proven to be mathematically rigorous, the results depend solely on the quality and the volume of the database. Although LAD can avoid many errors that may be present in the database [17,18], the accuracy of the output depends on the accuracy of the input. Finally, the database is ideally expected to represent the whole population of interest, or at least enough historical experience, in order to be worthy of analysis. ...
Article
Full-text available
In this paper, a knowledge discovery tool called Logical Analysis of Data is used to shed light on the causal relationship, if any, between three clinical procedures, namely blood transfusion, surgery and organ transplant, and Alzheimer's disease, which is thought to be a prion-type disease of protein misfolding, capable of spreading infectiously from human to human. The Logical Analysis of Data is a data-mining artificial intelligence technique that allows the classification of phenomena based on knowledge extraction and pattern recognition, without reliance on prior hypotheses or any statistical analysis. By creating a database of clinical information obtained from a systematic review of the literature on the risk factors of Alzheimer's disease, we were able to apply the Logical Analysis of Data to reveal the patterns distinguishing cases of AD that have undergone any of the three clinical procedures from those cases that have not. Although several eye-opening patterns were revealed, the results show that there is no evidence of a relation between blood transfusion, surgery or organ transplant and the onset or development of Alzheimer's disease.
Article
Industrial processes are hybrid systems when they involve both discrete and continuous variables. Discrete variables contain valuable and easily comprehensible information, particularly crucial information about fault occurrences. To utilize this information, this study proposes a hybrid neural network model that integrates knowledge and data. A knowledge-based expert system is developed using discrete variables to assist the neural network in decision-making tasks. This study proposes an expert-system-fused neural network optimization algorithm for hybrid-system fault diagnosis in industrial processes. Continuous variables are used as learning objects for the neural network, whereas guidance for the learning process is provided by discrete-variable information. A mixed integer linear programming algorithm is employed to extract logic rules from discrete variables, and the influence of samples in the network learning algorithm is evaluated on the basis of the attributes of these logic rules. The effectiveness of the proposed algorithm is validated through experimental verification on a three-tank simulation system and a practical industrial case of coal gasification processes. Experimental results demonstrate that the proposed algorithm exhibits higher fault diagnostic performance, better generalization ability, and stronger adaptability to sparse samples than baseline methods.
Article
Purpose This study aims to address the critical issue of machine breakdowns in industrial settings, which jeopardize operational economy, worker safety, productivity and environmental compliance. It explores the efficacy of a predictive maintenance program in mitigating these risks by proactively identifying and minimizing failures, thereby optimizing maintenance activities for higher efficiency. Design/methodology/approach The article implements Logical Analysis of Data (LAD) as a predictive maintenance approach on an industrial machine maintenance dataset. The aim is to (1) detect failure presence and (2) determine specific failure modes. Data resampling is applied to address asymmetrical class distribution. Findings LAD demonstrates its interpretability by extracting patterns that facilitate failure diagnosis. Results indicate that, in the first case study, LAD exhibits a high recall value for failure records within a balanced dataset. In the second case study, involving smaller-scale datasets, enhancement across all evaluation metrics is observed when the data are balanced, and performance remains robust in the presence of imbalance, albeit with nuanced differences between the two settings. Originality/value This research highlights the importance of transparency in predictive maintenance programs. The research shows the effectiveness of LAD in detecting failures and identifying specific failure modes from diagnostic sensor data. This maintenance strategy distinguishes itself by offering explainable failure patterns to maintenance teams. The patterns facilitate failure cause-effect analysis and serve as the core for failure prediction. Hence, this program has the potential to enhance machine reliability, availability and maintainability in industrial environments.
Article
In logical analysis of data, the feature selection step only considers selecting a minimal number of features after binarization. This paper develops a nonlinear set covering model that explores the interaction between the number of selected original attributes and binarized features. Utilizing the partial derivative of pseudo-Boolean functions, we give a greedy heuristic for the nonlinear set covering problem. The efficacy of the algorithm is demonstrated through experiments on 10 public machine learning datasets from the UCI repository.
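As an illustration of the greedy principle involved, here is a sketch of the classic greedy heuristic for (linear) set covering; the paper's nonlinear variant, guided by partial derivatives of pseudo-Boolean functions, follows the same select-best-ratio loop:

# Hedged sketch of the greedy set covering heuristic, the workhorse
# behind LAD support-set selection; generic, not the paper's exact model.
def greedy_set_cover(universe, subsets, cost=None):
    cost = cost or {name: 1.0 for name in subsets}
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the subset with the best cost per newly covered element
        name = min(
            (s for s in subsets if subsets[s] & uncovered),
            key=lambda s: cost[s] / len(subsets[s] & uncovered),
        )
        chosen.append(name)
        uncovered -= subsets[name]
    return chosen

subsets = {"f1": {1, 2, 3}, "f2": {3, 4}, "f3": {4, 5}, "f4": {1, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5}, subsets))   # -> ['f1', 'f3']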
Article
Purpose In this paper, a data mining approach is proposed for monitoring the conditions leading to a rail wheel high impact load. The proposed approach incorporates logical analysis of data (LAD) and ant colony optimization (ACO) algorithms in extracting patterns of high impact loads and normal loads from historical railway records. In addition, the patterns are employed in establishing a classification model used for classifying unseen observations. A case study representing real-world impact load data is presented to illustrate the impact of the proposed approach in improving railway services. Design/methodology/approach The application of artificial intelligence and machine learning approaches has become an essential tool in improving the performance of railway transportation systems. By using these approaches, the knowledge extracted from historical data can be employed in railway asset monitoring to maintain the assets in a reliable state and to improve the service provided by the railway network. Findings Results achieved by the proposed approach provide a prognostic system used for monitoring the conditions surrounding rail wheels. Incorporating this prognostic system in surveilling the rail wheels indeed results in better railway services, as trips with no delay or no failure can be realized. A comparative study is conducted to evaluate the performance of the proposed approach versus other classification algorithms. In addition to the highly interpretable results obtained through the generated patterns, the comparative study demonstrates that the proposed approach provides classification accuracy higher than other common machine learning classification algorithms. Originality/value The methodology followed in this research employs the ACO algorithm as an artificial intelligence technique and LAD as a machine learning algorithm in analyzing wheel impact load alarm-collected datasets. This new methodology provides a promising classification model to predict future alarms and a prognostic system to guide the system in avoiding these alarms.
Article
Maritime logistics, which accounts for around 80% of international trade around the world, has been a driving force for economic growth. Increases in maritime traffic, however, lead to increased congestion in berths and terminals. This congestion in turn negatively affects the total ship turnaround time and leads to decreased efficiency in port operations and a higher demurrage rate, which refers to the proportion of vessels queuing for more than a fixed time period to load/unload out of the total number of vessels entering a port. The demurrage rate directly affects a port's operating profits; thus, this rate needs to stay as low as possible. In this study, we focus on developing a methodology to address the demurrage rate of a port. To this end, we first collect two sets of vessel data (2016 annual data for training and 2019 annual data for validation) for ships entering and leaving the Port of Ulsan in the Republic of Korea and integrate these datasets with berth data and weather data. We tailor the logical analysis of data (LAD) technique to derive the patterns from the training data that mitigate or aggravate the demurrage rate. We use these patterns to predict the demurrage rate for the validation data. The overall binary classification results demonstrate the proposed LAD technique's competitive performance, compared with other state-of-the-art machine learning methods. We then analyze the patterns to derive policy suggestions that can lower the demurrage rate at the Port of Ulsan. Our computational experiments find that the availability of tugs or pilots and port arrival times mainly affect the demurrage rate at the Port of Ulsan. Finally, our study showcases new possibilities for using patterns of demurrage and non-demurrage vessels obtained by LAD to help policymakers and port operators address the growing demurrage problem.
Article
This paper clarifies the difference between intrinsically 0–1 data and binarized numerical data for Boolean logical patterns and strengthens mathematical results and methods from the literature on Pareto-optimal LAD patterns. Toward this end, we select suitable pattern definitions from the literature and adapt them with attention given to unique characteristics of individual patterns and the disparate natures of Boolean and numerical data. Next, we propose a set of revised criteria and definitions by which useful LAD patterns are clearly characterized for both 0–1 and real-valued data. Furthermore, we fortify recent pattern generation optimization models and demonstrate how earlier results on Pareto-optimal patterns can be adapted in accordance with revised pattern definitions. A numerical study validates practical benefits of the results of this paper through optimization-based pattern generation experiments.
Article
In this paper, we present a new integrated optimization model and a greedy algorithm for generating patterns, directly derived from original data instead of binarized data, in logical analysis of data (LAD). Pattern generation, following data discretization (binarization) and support set selection to handle non-binary data, is a building block that largely influences LAD classification. These stand-alone steps are generally considered optimization problems, which are difficult to solve and make the LAD procedure very tedious. To this end, we propose a new mixed-integer linear program, in which data discretization and support set selection are integrated into a single pattern generation optimization model, aiming to generate multiple logical patterns to cover observations maximally in the original data space. Furthermore, we develop a greedy search algorithm, in which the optimization model is reduced and solved iteratively to efficiently generate patterns. We then examine the effectiveness of the generated patterns in both one-class and large-margin LAD classifiers. The computational results for simulated and real datasets demonstrate the competitive performance in terms of classification accuracy in a relatively short runtime compared with previously developed pattern generation methods and other state-of-the-art machine learning algorithms.
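As an illustration of what such a pattern generation MILP can look like, here is a hedged sketch of a simplified strong-pattern model (not the paper's integrated formulation, which also embeds discretization), written with the open-source PuLP modeler:

from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Sketch of a coverage-maximizing pattern MILP over toy binarized data.
# Literal (j, v) means "feature j equals v".
pos = [(1, 0, 1), (1, 1, 1), (0, 1, 1)]
neg = [(0, 0, 1), (1, 0, 0)]
n = len(pos[0])
lits = [(j, v) for j in range(n) for v in (0, 1)]   # candidate literals

prob = LpProblem("lad_pattern", LpMaximize)
y = {l: LpVariable(f"y_{l[0]}_{l[1]}", cat="Binary") for l in lits}
z = {i: LpVariable(f"z_{i}", cat="Binary") for i in range(len(pos))}

prob += lpSum(z.values())                       # maximize prevalence
for i, obs in enumerate(pos):                   # obs i covered only if it
    for (j, v) in lits:                         # satisfies every chosen literal
        if obs[j] != v:
            prob += z[i] + y[(j, v)] <= 1
for obs in neg:                                 # homogeneity: every negative
    prob += lpSum(y[(j, v)] for (j, v) in lits if obs[j] != v) >= 1
prob += lpSum(y.values()) >= 1                  # pattern must be nonempty

prob.solve()
print([l for l in lits if y[l].value() > 0.5],
      "covers", sum(zi.value() for zi in z.values()), "positives")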
Article
This paper presents a new rule-based classification method that partitions data under analysis into spherical patterns. The forte of the method is twofold. First, it exploits the efficiency of distance-metric-based clustering to quickly collect similar data into spherical patterns. Second, each spherical pattern is a trait shared by only one type of data, and hence the patterns can be used for classifying new data. Numerical studies with public machine learning datasets from Lichman (2013), in comparison with well-established classification methods from Boros et al. (IEEE Transactions on Knowledge and Data Engineering, 12, 292–306, 2000) and the Waikato Environment for Knowledge Analysis (http://www.cs.waikato.ac.nz/ml/weka/), demonstrate the aforementioned utilities of the new method well.
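A hedged sketch of the idea: center a ball on same-class data and cap its radius just below the distance to the nearest opposite-class point, so the resulting spherical pattern stays homogeneous (toy data; the paper's clustering-based construction is more elaborate):

import math

# Sketch of a spherical pattern: a ball around a same-class centroid
# whose radius excludes all opposite-class points.
def spherical_pattern(own, other):
    d = len(own[0])
    center = [sum(p[i] for p in own) / len(own) for i in range(d)]
    dist = lambda p: math.dist(center, p)
    radius = min(dist(p) for p in other) - 1e-9   # stay homogeneous
    covered = [p for p in own if dist(p) <= radius]
    return center, radius, covered

pos = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
neg = [(3.0, 3.0), (2.5, 0.2)]
center, r, covered = spherical_pattern(pos, neg)
print(center, round(r, 3), len(covered))   # all three positives covered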
Article
Explainable AI is in high demand nowadays for analyzing faults in industrial systems, since the expert needs to unlock the complexity of AI models that fit the system inputs to its outputs. The objective of this paper is to propose an approach for building a graphical causality analysis for fault diagnosis, named interpretable logic tree analysis (ILTA), that links model-based and data-driven methods. The novelty of the paper is to use fault tree analysis (FTA) logic for decomposing a fault into its root causes as a graphical interface that represents the fault causality knowledge discovered by the data-driven patterns. Therefore, the proposed methodology introduces an automatic construction approach for FTA that builds the tree directly from the data while minimizing expert involvement. The new hybrid FTA version will be employed in our future research for diagnosing and prognosing faults in complex systems.
Chapter
The 0–1 multilinear program (MP) provides a unifying theory for Boolean logical pattern generation. For a tighter polyhedral relaxation of MP, this note exploits cliques in the graph representation of the data under analysis to generate valid inequalities for MP that subsume all previous results and, collectively, provide a much stronger relaxation of MP. A preliminary numerical study demonstrates the strength and practical benefits of the new results.
Chapter
Data-based explanatory fault diagnosis methods are of great practical significance to modern industrial systems due to their clear elaboration of cause-and-effect relationships. Based on Boolean logic, logical analysis of data (LAD) can discover discriminative if-then rules and use them to diagnose faults. However, the traditional LAD algorithm suffers from time-consuming computation and extracts only a minimal number of rules, which makes it unsuitable for high-dimensional, large datasets and for faults that have more than one independent cause. In this paper, a novel fast LAD with multiple-rule discovery ability is proposed. The fast data binarization step reduces the dimensionality of the input Boolean vector, and the multiple independent rules are searched using modified mixed integer linear programming (MILP). A case study on the Tennessee Eastman Process (TEP) reveals the superior performance of the proposed method in reducing computation time, extracting more rules and improving classification accuracy.
Article
This paper revisits the Boolean logical requirement of a pattern and develops 0-1 multilinear programming (MP) models for (Pareto-)optimal patterns for logical analysis of data (LAD). We show that all existing and also new pattern generation models can naturally be obtained from the MP models via linearization techniques for 0-1 multilinear functions. Furthermore, 0-1 MP provides an insight for understanding how different and independently developed models for a particular type of pattern are inter-related. These show that 0-1 MP presents a unifying theory for pattern generation in LAD.
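As a concrete instance of the linearization techniques referred to above (a textbook construction, not the paper's specific models), a 0-1 multilinear term can be replaced by linear constraints as follows:

% Standard linearization of a 0-1 multilinear term y = \prod_{j \in J} x_j
% (all variables binary); the product equality is equivalent to:
\begin{align*}
  y &\le x_j && \forall j \in J,\\
  y &\ge \sum_{j \in J} x_j - (|J| - 1),\\
  y &\in \{0,1\}, \quad x_j \in \{0,1\}.
\end{align*}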
Conference Paper
In this paper, the conditional reliability function and the Remaining Useful Life (RUL) of a cutting tool are estimated as functions of the current condition's states. The RUL is estimated based on the available information obtained from condition monitoring. The cutting force measurements define the states and are considered the monitoring signals that provide a diagnosis of the tool wear state. The cutting tool is used under constant machining parameters, namely the cutting speed, the feed rate, and the depth of cut. Experimental data are collected during the turning of titanium metal matrix composites (TiMMCs), which are a new generation of materials that have proven to be viable in aerospace applications. Two modeling tools are used to model the tool's reliability and hazard functions: the Proportional Hazards Model (PHM), a statistical tool that uses the EXAKT software, and the Logical Analysis of Data (LAD), a machine learning tool that uses the cbmLAD software. A comparison between the two approaches is given, the results are presented, and the practical use of these results is discussed.
Article
Full-text available
Neural networks or connectionist models for parallel processing are not new. However, a resurgence of interest in the past half decade has occurred. In part, this is related to a better understanding of what are now referred to as hidden nodes. These algorithms are considered to be of marked value in pattern recognition problems. Because of that, we tested the ability of an early neural network model, ADAP, to forecast the onset of diabetes mellitus in a high risk population of Pima Indians. The algorithm's performance was analyzed using standard measures for clinical tests: sensitivity, specificity, and a receiver operating characteristic curve. The crossover point for sensitivity and specificity is 0.76. We are currently further examining these methods by comparing the ADAP results with those obtained from logistic regression and linear perceptron models using precisely the same training and forecasting sets. A description of the algorithm is included.
Book
Full-text available
Integer and combinatorial optimization deals with problems of maximizing or minimizing a function of many variables subject to (a) inequality and equality constraints and (b) integrality restrictions on some or all of the variables. Because of the robustness of the general model, a remarkably rich variety of problems can be represented by discrete optimization models. This chapter is concerned with the formulation of integer optimization problems, which means how to translate a verbal description of a problem into a mathematical statement of the form linear mixed-integer programming problem (MIP), linear (pure) integer programming problem (IP), or combinatorial optimization problem (CP). The chapter presents two important uses of binary variables in the modeling of optimization problems. The first concerns the representation of nonlinear objective functions of the form ∑j fj(yj) using linear functions and binary variables. The second concerns the modeling of disjunctive constraints.
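To illustrate the second use named above, the standard textbook device models a disjunction with one binary variable and a sufficiently large constant M (a generic sketch, not tied to this chapter's notation):

% Big-M modeling of the disjunction "g_1(x) <= 0 or g_2(x) <= 0":
% z = 0 enforces the first constraint, z = 1 the second.
\begin{align*}
  g_1(x) &\le M z,\\
  g_2(x) &\le M (1 - z),\\
  z &\in \{0,1\}.
\end{align*}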
Book
Full-text available
This paper presents the fundamental principles underlying tabu search as a strategy for combinatorial optimization problems. Tabu search has achieved impressive practical successes in applications ranging from scheduling and computer channel balancing to cluster analysis and space planning, and more recently has demonstrated its value in treating classical problems such as the traveling salesman and graph coloring problems. Nevertheless, the approach is still in its infancy, and a good deal remains to be discovered about its most effective forms of implementation and about the range of problems for which it is best suited. This paper undertakes to present the major ideas and findings to date, and to indicate challenges for future research. Part I of this study indicates the basic principles, ranging from the short-term memory process at the core of the search to the intermediate and long term memory processes for intensifying and diversifying the search. Included are illustrative data structures for implementing the tabu conditions (and associated aspiration criteria) that underlie these processes. Part I concludes with a discussion of probabilistic tabu search and a summary of computational experience for a variety of applications. Part II of this study (to appear in a subsequent issue) examines more advanced considerations, applying the basic ideas to special settings and outlining a dynamic move structure to insure finiteness. Part II also describes tabu search methods for solving mixed integer programming problems and gives a brief summary of additional practical experience, including the use of tabu search to guide other types of processes, such as those of neural networks. INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.
Article
Full-text available
A new type of efficient and accurate proteomic ovarian cancer diagnosis system is proposed. The system is developed by applying the combinatorics- and optimization-based methodology of logical analysis of data (LAD) to the Ovarian Dataset 8-7-02 (http://clinicalproteomics.steem.com), which updates the one used by Petricoin et al. in The Lancet 2002, 359, 572-577. This mass-spectroscopy-generated dataset contains expression profiles of 15,154 peptides, defined by their mass/charge ratios (m/z), in the serum of 162 ovarian cancer and 91 control cases. Several fully reproducible models using only 7-9 of the 15,154 peptides were constructed and shown in multiple cross-validation tests (k-folding and leave-one-out) to provide sensitivities and specificities of up to 100%. A special diagnostic system for stage I ovarian cancer patients is shown to have similarly high accuracy. Other results: (i) expressions of peptides with relatively low m/z values in the dataset are shown to be better at distinguishing ovarian cancer cases from controls than those with higher m/z values; (ii) two large groups of patients with a high degree of similarity among their formal (mathematical) profiles are detected; (iii) several peptides with a blocking or promoting effect on ovarian cancer are identified.
Article
Full-text available
The paper describes a new, logic-based methodology for analyzing observations. The key features of the Logical Analysis of Data (LAD) are the discovery of minimal sets of features necessary for explaining all observations and the detection of hidden patterns in the data capable of distinguishing observations describing positive outcome events from negative outcome events. Combinations of such patterns are used for developing general classification procedures. An implementation of this methodology is described in the paper along with the results of numerical experiments demonstrating the classification performance of LAD in comparison with the reported results of other procedures. In the final section, we describe three pilot studies on applications of LAD to oil exploration, psychometric testing, and the analysis of developments in the Chinese transitional economy. These pilot studies demonstrate not only the classification power of LAD, but also its flexibility and capability to provide solutions to various case-dependent problems.
Article
Full-text available
The "Logical Analysis of Data" (LAD) is a methodology developed since the late eightees, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD for the analysis of numerical data sets is achieved through the process of "binarization" consisting in the replacement of each numerical variable by binary "indicator" variables, each showing whether the value of the original variable is above or below a certain level. Binarization was successfully applied to the analysis of a variety of real life data sets. This paper develops the theoretical foundations of the binarization process studying the combinatorial optimization problems related to the minimization of the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop...
Article
Full-text available
The main objective of this paper is to compare the classification accuracy provided by large, comprehensive collections of patterns (rules) derived from archives of past observations with that provided by small, comprehensible collections of patterns. This comparison is carried out here on the basis of an empirical study using several publicly available datasets. The results of this study show that the use of comprehensive collections allows a slight increase in classification accuracy, and that the "cost of comprehensibility" is small.
Article
Full-text available
We consider the application of several compute-intensive classification techniques to disk drive manufacturing quality control. This application is characterized by very high dimensions, with hundreds of features and tens of thousands of cases. Two principal issues are addressed: (a) can a very expensive testing process be eliminated while still maintaining high quality throughput in disk drive manufacturing, and (b) can the manufacturing process be made more efficient by identifying bad disk drives prior to the expensive testing. Preliminary results indicate that although the expensive testing cannot be completely eliminated, a fraction of the disk drives can be determined to be faulty prior to further testing. This detection may improve the throughput of the manufacturing line.
Article
Full-text available
A linear support vector machine (SVM) is used to extract 6 features from a total of 31 features in a dataset of 253 breast cancer patients. Five features are nuclear features obtained during a non-invasive diagnostic procedure, while one feature, tumor size, is obtained during surgery. The linear SVM selected the 6 features in the process of classifying the patients into node-positive (patients with some metastasized lymph nodes) and node-negative (patients with no metastasized lymph nodes). Node-positive patients are typically those who undergo chemotherapy. The 6 features were then used in a Gaussian kernel nonlinear SVM to classify the patients into three prognostic groups: good (node-negative), intermediate (1 to 4 metastasized nodes) and poor (more than 4 metastasized nodes). Very well separated Kaplan-Meier survival curves were constructed for the three groups, with pairwise p-values of less than 0.009 based on the logrank statistic. Patients in the good prognostic group had the highest survival, while patients in the poor prognostic group had the lowest. The majority (72.8%) of the good group did not receive chemotherapy, while the majority (87.5%) of the poor group received chemotherapy. Just over half (56.7%) of the intermediate group received chemotherapy. New patients can be assigned to one of these three prognostic groups, with its associated survival curve, based only on 6 features obtained before and during surgery, but without the potentially risky procedure of removing lymph nodes to determine how many of them have metastasized.
Article
Full-text available
We investigate the application of Support Vector Machines (SVMs) in computer vision. SVM is a learning technique developed by V. Vapnik and his team (AT&T Bell Labs.) that can be seen as a new method for training polynomial, neural network, or Radial Basis Function classifiers. The decision surfaces are found by solving a linearly constrained quadratic programming problem. This optimization problem is challenging because the quadratic form is completely dense and the memory requirements grow with the square of the number of data points. We present a decomposition algorithm that guarantees global optimality and can be used to train SVMs over very large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterative values and to establish the stopping criteria for the algorithm. We present experimental results of our implementation of SVM and demonstrate the feasibility ...
Article
Full-text available
Two medical applications of linear programming are described in this paper. Specifically, linear programming-based machine learning techniques are used to increase the accuracy and objectivity of breast cancer diagnosis and prognosis. The first application to breast cancer diagnosis utilizes characteristics of individual cells, obtained from a minimally invasive fine needle aspirate, to discriminate benign from malignant breast lumps. This allows an accurate diagnosis without the need for a surgical biopsy. The diagnostic system in current operation at University of Wisconsin Hospitals was trained on samples from 569 patients and has had 100% chronological correctness in diagnosing 131 subsequent patients. The second application, recently put into clinical practice, is a method that constructs a surface that predicts when breast cancer is likely to recur in patients that have had their cancers excised. This gives the physician and the patient better information with which to plan treatment ...
Article
This paper investigates the use of Boolean techniques in a systematic study of cause-effect relationships. The model uses partially defined Boolean functions. Procedures are provided to extrapolate from limited observations, concise and meaningful theories to explain the effect under study, and to prevent (or provoke) its occurrence.
Article
This paper develops global optimization algorithms for linear multiplicative and generalized linear multiplicative programs based upon the lower bounding procedure of Ryoo and Sahinidis [30] and new greedy branching schemes that are applicable in the context of any rectangular branch-and-bound algorithm. Extensive computational results are presented on a wide range of problems from the literature, including quadratic and bilinear programs, and randomly generated large-scale multiplicative programs. It is shown that our algorithms make possible for the first time the solution of large and complex multiplicative programs to global optimality.
Article
Specializing a general framework of logical analysis of data for efficiently handling large-scale genomic data, we develop in this paper a probe design method for selecting short oligo probes for genotyping applications. When tested on genomic sequences obtained from the National Center for Biotechnology Information in various monospecific and polyspecific in silico experiments, the proposed probe design method was able to select a small number of oligo probes of length 7 or 8 nucleotides that perfectly classified all unseen testing sequences. These results demonstrate the efficacy of the proposed probe design method and illustrate the usefulness and potential of a well-designed optimization-based probe selection method in genotyping applications.
Article
An implicit Lagrangian [Math. Programming Ser. B 62 (1993) 277] formulation of a support vector machine classifier that led to a highly effective iterative scheme [J. Machine Learn. Res. 1 (2001) 161] is solved here by a finite Newton method. The proposed method, which is extremely fast and terminates in 6 or 7 iterations, can handle classification problems in very high dimensional spaces, e.g. over 28,000, in a few seconds on a Pentium II machine. The method can also handle problems with large datasets and requires no specialized software other than a commonly available solver for a system of linear equations. Finite termination of the proposed method is established in this work.
Article
In a finite dataset consisting of positive and negative observations represented as real-valued n-vectors, a positive (negative) pattern is an interval in R^n with the property that it contains sufficiently many positive (negative) observations, and sufficiently few negative (positive) ones. A pattern is spanned if it does not properly include any other interval containing the same set of observations. Although large collections of spanned patterns can provide highly accurate classification models within the framework of the Logical Analysis of Data, no efficient method for their generation is currently known. We propose in this paper an incrementally polynomial time algorithm for the generation of all spanned patterns in a dataset, which runs in linear time in the output; the algorithm closely resembles the Blake and Quine consensus method for finding the prime implicants of Boolean functions. The efficiency of the proposed algorithm is tested on various publicly available datasets. In the last part of the paper, we present the results of a series of computational experiments which show the high degree of robustness of spanned patterns.
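A minimal sketch of the spanning operation itself (the consensus-style generation algorithm is beyond a few lines): given the observations a pattern covers, the spanned pattern is the componentwise tightest interval around them. Hypothetical data:

# Hedged sketch: "spanning" an interval pattern, i.e. shrinking it to the
# smallest box containing exactly the observations it covers.
def span(covered):
    # covered: list of real-valued observation vectors covered by a pattern
    lo = [min(p[i] for p in covered) for i in range(len(covered[0]))]
    hi = [max(p[i] for p in covered) for i in range(len(covered[0]))]
    return lo, hi    # the spanned pattern [lo_i, hi_i] per coordinate

covered = [(0.2, 1.5), (0.4, 1.1), (0.3, 1.9)]
print(span(covered))   # ([0.2, 1.1], [0.4, 1.9])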
Article
The paper presents a review of the basic concepts of the Logical Analysis of Data (LAD), along with a series of discrete optimization models associated to the implementation of various components of its general methodology, as well as an outline of applications of LAD to medical problems. The combinatorial optimization models described in the paper represent variations on the general theme of set covering, including some with nonlinear objective functions. The medical applications described include the development of diagnostic and prognostic systems in cancer research and pulmonology, risk assessment among cardiac patients, and the design of biomaterials.
Article
Advanced Scout is a PC-based data mining application used by National Basketball Association (NBA) coaching staffs to discover interesting patterns in basketball game data. We describe Advanced Scout software from the perspective of data mining and knowledge discovery. This paper highlights the pre-processing of raw data that the program performs, describes the data mining aspects of the software and how the interpretation of patterns supports the process of knowledge discovery. The underlying technique of attribute focusing as the basis of the algorithm is also described. The process of pattern interpretation is facilitated by allowing the user to relate patterns to video tape.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Article
The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one)" and so are "easy to learn" (Rendell & Seshu, 1990, p.256). Similarly, Shavlik et al. (1991) report that, with certain qualifications, "the accuracy of the perceptron is hardly distinguishable from the more complicated learning algorithms" (p.134). Further evidence is provided by studies of pruning methods (e.g. Buntine & Niblett, 1992; Clark & Niblett, 1989; Mingers, 1989), where accuracy is rarely seen to decrease as pruning becomes more severe (for example, see Table 1). This is so even when rules are pruned to the extreme, as happened with the "Err-comp" pruning method in Mingers (1989). This method produced the most accurate decision trees, and in four of the five domains studied these trees had only 2 or 3 leaves.
Article
Logical Analysis of Data is a methodology of mathematical optimization based on the systematic identification of patterns or "syndromes." In this study, we used Logical Analysis of Data for risk stratification and compared it to regression techniques. Using a cohort of 9454 patients referred for exercise testing, Logical Analysis of Data was applied to identify syndromes based on 20 variables. High-risk syndromes were patterns of up to 3 findings associated with a >5-fold increase in risk of death, whereas low-risk syndromes were associated with a >5-fold decrease. Syndromes were derived on a randomly selected training set of 4722 patients and validated in 4732 others. There were 15 high-risk and 26 low-risk syndromes. A risk score was derived based on the proportion of possible high-risk and low-risk syndromes present. A value ≥0, meaning the same or a greater proportion of high-risk syndromes, was noted in 979 patients (21%) in the validation set and was predictive of 5-year death (11% versus 1%, hazard ratio 8.3, 95% CI 5.9 to 11.6, P<0.0001), accounting for 67% of events. Calibration of expected versus observed death rates based on Logical Analysis of Data and Cox regression showed that both methods performed very well. Using the Logical Analysis of Data method, we identified subsets of patients who had an increased risk and who also accounted for the majority of deaths. Future research is needed to determine how best to use this technique for risk stratification.
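For concreteness, a minimal sketch of such a score, assuming it is simply the difference of the two proportions, which is consistent with the ≥0 cutoff described above (the exact functional form in the study is not reproduced here):

# Hedged sketch of a syndrome-based risk score: difference between the
# proportions of high-risk and low-risk syndromes present in a patient.
def risk_score(present_high, n_high, present_low, n_low):
    return present_high / n_high - present_low / n_low

# e.g. a patient matching 3 of 15 high-risk and 2 of 26 low-risk syndromes
s = risk_score(3, 15, 2, 26)
print(round(s, 3), "high risk" if s >= 0 else "low risk")   # >= 0 flags risk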
Article
Objective: The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al., which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics, optimisation, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify different informative genes than those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that if considered jointly are capable of accurately distinguishing different types of lymphoma. Methods and materials: The central concept of LAD is a pattern or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all those patterns which satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns in order to extract subsets of variables, which collectively are able to distinguish between different types of disease. Results: For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set consisting also of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. Conclusion: These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study also provides a ranking by importance of the genes in the selected significant subsets as well as a library of dozens of combinatorial biomarkers (i.e. pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.
Article
Describes a new, logic-based methodology for analyzing observations. The key features of this “logical analysis of data” (LAD) methodology are the discovery of minimal sets of features that are necessary for explaining all observations and the detection of hidden patterns in the data that are capable of distinguishing observations describing “positive” outcome events from “negative” outcome events. Combinations of such patterns are used for developing general classification procedures. An implementation of this methodology is described in this paper, along with the results of numerical experiments demonstrating the classification performance of LAD in comparison with the reported results of other procedures. In the final section, we describe three pilot studies on applications of LAD to oil exploration, psychometric testing and the analysis of developments in the Chinese transitional economy. These pilot studies demonstrate not only the classification power of LAD but also its flexibility and capability to provide solutions to various case-dependent problems
Article
Patterns are the key building blocks in the logical analysis of data (LAD). It has been observed in empirical studies and practical applications that some patterns are more “suitable” than others for use in LAD. In this paper, we model various such suitability criteria as partial preorders defined on the set of patterns. We introduce three such preferences, and describe patterns which are Pareto-optimal with respect to any one of them, or to certain combinations of them. We develop polynomial time algorithms for recognizing Pareto-optimal patterns, as well as for transforming an arbitrary pattern to a better Pareto-optimal one with respect to any one of the considered criteria, or their combinations. We obtain analytical representations characterizing some of the sets of Pareto-optimal patterns, and investigate the computational complexity of generating all Pareto-optimal patterns. The empirical evaluation of the relative merits of various types of Pareto-optimality is carried out by comparing the classification accuracy of Pareto-optimal theories on several real life data sets. This evaluation indicates the advantages of “strong patterns”, i.e. those patterns which are Pareto-optimal with respect to the “evidential preference” introduced in this paper.
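A minimal sketch of the dominance test underlying such Pareto-optimality, simplified to scalar coverage and degree rather than the set-inclusion preorders used in the paper:

# Hedged sketch of Pareto dominance between patterns under two common
# criteria: larger coverage (prevalence) and smaller degree (simplicity).
def dominates(p, q):
    # p, q: dicts with a pattern's coverage count and degree
    no_worse = p["cov"] >= q["cov"] and p["deg"] <= q["deg"]
    better = p["cov"] > q["cov"] or p["deg"] < q["deg"]
    return no_worse and better

a = {"cov": 12, "deg": 2}
b = {"cov": 10, "deg": 3}
print(dominates(a, b))   # True: a covers more with fewer literals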
Article
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.
P. Murphy, UCI repository of machine learning databases: readable data repository, Department of Computer Science, University of California at Irvine, CA. Available from World Wide Web: http://www.ics.uci.edu/~mlearn/MLRepository
ILOG CPLEX Division, CPLEX 9.0 User's Manual, Incline Village, Nevada, October 2003.
J. Smith, J. Everhart, W. Dickson, W. Knowler, R. Johannes, Using the ADAP learning algorithm to forecast the onset of diabetes mellitus.
G. Alexe, S. Alexe, L. Liotta, E. Petricoin, M. Reiss, P. Hammer, Ovarian cancer detection by logical analysis of data.