Main flowchart of the GLISTER-ONLINE framework for Data Selection.

Source publication
Preprint
Full-text available
Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Secondly, real-world data is noisy and imbalanced. As a result, several recent papers try to...

Contexts in source publication

Context 1
... algorithm is iterative: it proceeds by simultaneously updating the model parameters and selecting subsets. Figure 1 gives a flowchart of GLISTER-ONLINE. Note that, for computational reasons, we perform data selection every L epochs rather than every epoch. ...
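A minimal sketch of this alternating schedule, assuming PyTorch; `greedy_select` is a hypothetical stand-in for the paper's subset-selection step, and all names and hyperparameters here are illustrative, not the authors' implementation:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def train_glister_online(model, train_set, val_set, budget, epochs, L=20, lr=0.01):
    """Alternate model updates with subset re-selection every L epochs (sketch)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # Start from a random subset of size `budget`.
    subset_idx = torch.randperm(len(train_set))[:budget].tolist()
    for epoch in range(epochs):
        if epoch % L == 0:
            # greedy_select (hypothetical) stands in for the paper's selection
            # step, which picks a subset that approximately minimizes val loss.
            subset_idx = greedy_select(model, train_set, val_set, budget)
        loader = DataLoader(Subset(train_set, subset_idx), batch_size=32, shuffle=True)
        for x, y in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```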
Context 2
... extensive analysis, we used L = 20 in our experiments. Convergence with varying r values: in this section, we analyze the effect of varying r on the convergence rate of our model on various datasets, as shown in Figure 10. From the results, it is evident that our GLISTER-ONLINE framework is highly unstable for low values of r and that its stability increases with r. ...
Context 3
... it is not always advisable to use large r values. This is also evident from the results shown in Figure 10, since our GLISTER-ONLINE framework converges faster with r = 10 than with r = 20 on the majority of the datasets. ...
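One plausible reading of the role of r is sketched below, under the assumption that r controls how many times the Taylor approximation of the validation loss is refreshed during a single greedy selection pass; `taylor_gains` is hypothetical, and this is not the authors' exact algorithm:

```python
def greedy_select(model, train_set, val_set, budget, r=10):
    """Greedy subset selection with Taylor-approximated gains that are
    refreshed r times per pass (an assumption, for illustration only)."""
    selected, remaining = [], set(range(len(train_set)))
    refresh_every = max(1, budget // r)
    gains = None
    for step in range(budget):
        if step % refresh_every == 0:
            # taylor_gains (hypothetical) re-expands the validation loss
            # around the current subset to score each candidate element.
            gains = taylor_gains(model, train_set, val_set, selected)
        best = max(remaining, key=lambda i: gains[i])
        selected.append(best)
        remaining.remove(best)
    return selected
```

Under this reading, larger r means more frequent refreshes and hence more stable but slower selection, which is consistent with the trade-off observed above.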
Context 4
... synthetic datasets include linearly separable binary data, multi-class separable data, and linearly separable binary data with slack. The linearly separable binary dataset, shown in Fig. 11a, comprises two-dimensional feature points drawn from two non-overlapping clusters belonging to class 0 and class 1. ...
Context 5
... linearly separable binary dataset, shown in Fig. 11a, comprises two-dimensional feature points drawn from two non-overlapping clusters belonging to class 0 and class 1. The multi-class separable dataset comprises two-dimensional feature points drawn from four non-overlapping clusters of classes 0, 1, 2, and 3, shown in Fig. 12a. An overlapping version of the same is shown in Fig. 14a. ...
Context 6
... multi-class separable dataset comprises two-dimensional feature points drawn from four non-overlapping clusters of classes 0, 1, 2, and 3, shown in Fig. 12a. An overlapping version of the same is shown in Fig. 14a. The linearly separable dataset with slack, by contrast, comprises two-dimensional feature points drawn from two overlapping clusters of class 0 and class 1, as shown in Fig. 13a. ...
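Datasets of this shape can be approximated with scikit-learn's `make_blobs`; the cluster centers, sample counts, and spreads below are illustrative guesses, not the paper's generation parameters:

```python
from sklearn.datasets import make_blobs

# Linearly separable binary data: two well-separated 2-D clusters (classes 0, 1).
X_bin, y_bin = make_blobs(n_samples=1000, centers=[(-3, -3), (3, 3)],
                          cluster_std=0.8, random_state=0)

# Multi-class separable data: four non-overlapping 2-D clusters (classes 0-3).
X_multi, y_multi = make_blobs(n_samples=2000,
                              centers=[(-4, -4), (-4, 4), (4, -4), (4, 4)],
                              cluster_std=0.8, random_state=0)

# Binary data with slack: a larger spread makes the two clusters overlap.
X_slack, y_slack = make_blobs(n_samples=1000, centers=[(-1.5, -1.5), (1.5, 1.5)],
                              cluster_std=2.0, random_state=0)
```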
Context 7
... overlapping version of the same is shown in Fig. 14a. The linearly separable dataset with slack, by contrast, comprises two-dimensional feature points drawn from two overlapping clusters of class 0 and class 1, as shown in Fig. 13a. Figure 11 shows the subsets selected by the various methods for the linearly separable binary dataset. ...
Context 8
... An overlapping version of the same is shown in Fig. 14a. The linearly separable dataset with slack, by contrast, comprises two-dimensional feature points drawn from two overlapping clusters of class 0 and class 1, as shown in Fig. 13a. Figure 11 shows the subsets selected by the various methods for the linearly separable binary dataset. From Fig. 11b, we can see that our GLISTER-ONLINE framework selects a subset close to the decision boundary, whereas other methods such as CRAIG (Mirzasoleiman, Bilmes, and Leskovec 2020), Random, and KNNSubmod (Wei, Iyer, and Bilmes 2015) select subsets that are representative of the training dataset, as shown in Figures 11c, 11e, and 11d respectively. ...
Context 9
... From Fig. 11b, we can see that our GLISTER-ONLINE framework selects a subset close to the decision boundary, whereas other methods such as CRAIG (Mirzasoleiman, Bilmes, and Leskovec 2020), Random, and KNNSubmod (Wei, Iyer, and Bilmes 2015) select subsets that are representative of the training dataset, as shown in Figures 11c, 11e, and 11d respectively. Similarly, Figure 12 highlights the points selected by each method for the linearly separable dataset with four classes, Figure 13 does so for the binary dataset with outliers, and Figure 14 for the overlapping dataset with four classes. ...
Context 10
... Similarly, Figure 12 highlights the points selected by each method for the linearly separable dataset with four classes, Figure 13 does so for the binary dataset with outliers, and Figure 14 for the overlapping dataset with four classes. ...
Context 11
... GLISTER-ONLINE is driven by validation data, so it can better handle situations where the test and validation data exhibit a distribution shift relative to the training data; this is known as covariate shift. To illustrate this, we use two synthetic datasets that are very similar to those shown in Figures 14a and 13a, except that their validation sets are shifted, as shown in Figures 15b and 16b respectively. Figures 15c and 15d show how effectively the methods CRAIG (Mirzasoleiman, Bilmes, and Leskovec 2020), Random, KNNSubmod (Wei, Iyer, and Bilmes 2015), and GLISTER-ONLINE reduce the validation and test loss, respectively. ...
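A covariate-shifted validation set of this kind can be simulated by sampling the validation data from translated cluster centers, so the feature distribution moves while the labeling rule does not; the shift vector below is illustrative, not the paper's value:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Training data: two overlapping 2-D clusters, as in the slack dataset.
centers = np.array([(-1.5, -1.5), (1.5, 1.5)])
X_train, y_train = make_blobs(n_samples=1000, centers=centers,
                              cluster_std=2.0, random_state=0)

# Validation/test data drawn from shifted centers (covariate shift).
shift = np.array([3.0, 0.0])  # illustrative shift, chosen for this sketch
X_val, y_val = make_blobs(n_samples=300, centers=centers + shift,
                          cluster_std=2.0, random_state=1)
```

Because GLISTER-ONLINE scores subsets by their validation performance, selection under this setup is pulled toward training points that remain informative for the shifted region.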
Context 12
... To illustrate this, we use two synthetic datasets that are very similar to those shown in Figures 14a and 13a, except that their validation sets are shifted, as shown in Figures 15b and 16b respectively. Figures 15c and 15d show how effectively the methods CRAIG (Mirzasoleiman, Bilmes, and Leskovec 2020), Random, KNNSubmod (Wei, Iyer, and Bilmes 2015), and GLISTER-ONLINE reduce the validation and test loss, respectively. Clearly, GLISTER-ONLINE outperforms the other methods. ...
Context 13
... Clearly, GLISTER-ONLINE outperforms the other methods. A similar trend is seen in Figures 16c and 16d for the binary dataset. ...

Similar publications

Chapter
Full-text available
To identify heart disease, several contributory risk factors and pieces of information about the patient must be taken into account. Diagnosing heart disease requires medical specialists, who are often unavailable in a timely manner or too costly for some patients in remote and rural areas, but also in overly populated urban areas. For this reason, over the years, new approach...