
Classifier Ensemble - Science topic

Explore the latest questions and answers in Classifier Ensemble, and find Classifier Ensemble experts.
Questions related to Classifier Ensemble
  • asked a question related to Classifier Ensemble
Question
10 answers
Can anyone suggest ensembling methods for the outputs of pre-trained models? Suppose there is a dataset containing cats and dogs, and three pre-trained models are applied, i.e., VGG16, VGG19, and ResNet50. How would you apply ensembling techniques such as bagging, boosting, or voting?
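To make the voting option concrete, here is a minimal sketch (not from the thread) of soft and hard voting over the outputs of already fine-tuned models. It assumes `models` is a list of Keras models (e.g. VGG16-, VGG19- and ResNet50-based heads fine-tuned on cats vs. dogs) and `x` is a preprocessed image batch; both names are placeholders.

import numpy as np

# `models`: list of fine-tuned Keras models that all output class
# probabilities for the same preprocessed image batch `x` (placeholders).
def soft_vote(models, x):
    # Average the predicted class probabilities across models (soft voting).
    probs = np.mean([m.predict(x) for m in models], axis=0)
    return probs.argmax(axis=1)

def hard_vote(models, x):
    # Each model casts one vote per sample; the most frequent class wins.
    votes = np.stack([m.predict(x).argmax(axis=1) for m in models], axis=1)
    return np.array([np.bincount(row).argmax() for row in votes])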
Relevant answer
  • asked a question related to Classifier Ensemble
Question
3 answers
I have implemented a stacking classifier using Decision Tree, kNN and Naive Bayes as base learners and Logistic Regression as the meta-classifier (final predictor). Stacking has increased accuracy compared to the individual classifiers. The problem is multiclass (6 classes) with a categorical target (the activity performed by the user, e.g. walking, running, standing, on the UCI-HAR dataset). Now I am unable to understand:
1. How does Logistic Regression work on the outputs/predictions of the base-level classifiers?
2. What will the final output of Logistic Regression be if model 1 predicts class 1, model 2 predicts class 2 and model 3 predicts class 3 (e.g. DT: Running; kNN: Walking; NB: Standing)? How will Logistic Regression decide the final output? If possible, kindly explain why and how.
Relevant answer
Answer
Muhammad Sakib Khan Inan, what I know is that with a voting classifier we can ensemble using hard voting (maximum votes) or soft voting (using probabilities). However, I have not used voting; I have used a stacking classifier, i.e., three classifiers at level 0 (DT, kNN, NB), and on the outputs of these individual classifiers (their predictions) I have used a meta-learner (Logistic Regression), which works as the final classifier on this new dataset of base-level predictions. This stacking also increases accuracy, but what I am not able to understand is what happens if each of these three classifiers predicts a different class (how will Logistic Regression work then)? Thank you for showing your concern.
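For illustration only, a minimal sketch of this level-0 / level-1 setup using scikit-learn's StackingClassifier, on synthetic stand-in data rather than UCI-HAR. With stack_method="predict_proba", the meta-learner receives the base classifiers' probability vectors as features, so when DT, kNN and NB disagree, Logistic Regression simply outputs the class to which it assigns the highest probability given those inputs.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with 6 classes in place of the UCI-HAR features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier()),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # level-1 features = base class probabilities
    cv=5,                          # out-of-fold predictions avoid leakage
)
stack.fit(X_train, y_train)
# Even when DT, kNN and NB each favour a different class, the meta-model sees
# all three probability vectors and outputs the class it scores highest.
print(stack.score(X_test, y_test))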
  • asked a question related to Classifier Ensemble
Question
7 answers
I was building a classification model (3 classes) for early detection of cracks in ball bearings; the dataset is limited to 120 rows and 14 features. The classifiers and their parameters are listed below. Can you please suggest which model would be best (not simply by accuracy; also consider model complexity)?
Relevant answer
Answer
It is better to use 10-fold cross-validation for calculating results and comparing the models.
Most probably, if this is vibration data, a random forest will exhibit superior performance.
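A minimal sketch of such a 10-fold comparison, using synthetic stand-in data of the same shape as the bearing dataset and a few illustrative classifiers (the actual models and parameters from the question are not reproduced here):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the 120-row, 14-feature bearing dataset (3 classes).
X, y = make_classification(n_samples=120, n_features=14, n_informative=8,
                           n_classes=3, random_state=0)

candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=3),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")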
  • asked a question related to Classifier Ensemble
Question
11 answers
What are the best techniques for geospatial datasets? Also, are there some techniques that are better suited for stacking of models than for using a single model?
Relevant answer
Answer
I have dug into CNN explainability over the past months. Most of the projects are suitable for only one purpose or are damn slow per image. I found an implementation of GradCAM, GradCAM++ and ScoreCAM from a Japanese developer on GitHub. It is awesome: no dependencies and very easy to integrate into your Python stack. These methods can be run at a larger scale, on many images, in a reasonable amount of time:
Other notes:
- SHAP APIs are not straightforward to use and the method is damn slow, so it is not very useful for quick iterations of your work.
- Backprop-based techniques have been criticised for not really working.
- LIME has received similar criticism.
- As for saliency maps, I did not find them useful at all.
Other methods are very much work in progress, with no public software, or you have to implement them yourself. The CAM-based methods mentioned above are easy to integrate, more or less working, and relatively quick per image.
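For reference, a minimal, dependency-free sketch of what Grad-CAM computes (this is not the GitHub implementation mentioned above); it assumes a functional Keras CNN and that you know the name of its last convolutional layer:

import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    # Model that maps the input to (last conv feature maps, predictions).
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)        # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))     # one weight per channel
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # heatmap in [0, 1]

# Example use (the layer name is specific to the architecture):
# model = tf.keras.applications.VGG16(weights="imagenet")
# heatmap = grad_cam(model, preprocessed_image, "block5_conv3")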
  • asked a question related to Classifier Ensemble
Question
3 answers
How do I perform cross-validation to prepare the input for the next-level classifier in a multi-layer stacking classifier in mlxtend?
Thanks in advance.
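A minimal sketch of how this is commonly done with mlxtend's StackingCVClassifier: the cv argument makes each level produce out-of-fold predictions that become the input features of the next level, and a fitted stack can itself be used as a base learner of another stack. The dataset and hyperparameters below are placeholders.

from mlxtend.classifier import StackingCVClassifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in data

# Level-1 stack: with cv=5, each level-0 model is fit on 4/5 of the data and
# its predictions on the held-out 1/5 become the meta-features.
level1 = StackingCVClassifier(
    classifiers=[DecisionTreeClassifier(), KNeighborsClassifier()],
    meta_classifier=GaussianNB(),
    cv=5, use_probas=True, random_state=0)

# Level-2 stack: the level-1 stack is itself a base learner, so its
# out-of-fold outputs feed the final logistic regression.
level2 = StackingCVClassifier(
    classifiers=[level1, GaussianNB()],
    meta_classifier=LogisticRegression(max_iter=1000),
    cv=5, use_probas=True, random_state=0)

level2.fit(X, y)
print(level2.score(X, y))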
Relevant answer
Answer
  • asked a question related to Classifier Ensemble
Question
4 answers
Assume that, after running some hyperparameter optimization technique based on the training data, possibly using cross-validation to guide the search, there are M best models available to create an ensemble.
How can the performance of the method be assessed?
Some thoughts:
  • The simplest way would be to select just one set of M models (resulting from only one optimization run), create N replications (i.e. random train/test splits) and compute the statistics. I believe this is a strongly biased approach, as it only considers the randomness of the ensemble creation, but not the randomness of the optimization technique.
  • A second alternative would be to create N replications and, for each one of them, run the entire method (from hyperparameter optimization to ensemble creation). Then we could look at the statistics of the N measures.
  • The last alternative I can think of is to create N replications and, for each one of them, (1) run the optimization algorithm and (2) create K ensembles. In the end, we could extract statistics from all K*N measures.
Since I wasn't able to find good references on this specific issue, I hope you can help me :)
Please, cite published work that supports your answer.
Thank you.
Relevant answer
Answer
Hi Vitor,
I personally always fit ensembles as I would fit any model, with k-fold cross-validation or the bootstrap, using the standard metrics (R2, MSE, Gini index, accuracy, Kappa, etc.). Do not worry about bias from optimizing the components of the ensemble - after all, they were cross-validated, right? Try it out with a toy example and compare performances.
By the way, I don't think selecting the single best model is ideal - that is not an ensemble! What you should definitely keep in check is that the M models you want to plug into the ensemble have uncorrelated CV predictions (scatterplots will help). This is because models that yield similar predictions will not contribute much to the ensemble. Good luck!
Francisco
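A minimal sketch of the correlation check Francisco suggests, using cross_val_predict to get out-of-fold predictions for a few stand-in models (the actual M tuned models would be substituted here):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

# Stand-ins for the M tuned models coming out of the optimization run.
models = {
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
}

# Out-of-fold predictions for every candidate model.
oof = {name: cross_val_predict(m, X, y, cv=5) for name, m in models.items()}

# Pairwise correlation of the CV predictions: highly correlated members
# add little to the ensemble, so prefer complementary models.
names = list(oof)
corr = np.corrcoef([oof[n] for n in names])
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: r = {corr[i, j]:.3f}")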
  • asked a question related to Classifier Ensemble
Question
5 answers
When applying Random Forest classifiers for 3D image segmentation, what is the best practice for dealing with the large size of the images in the training and test steps? Is it common practice to resize (downsample) the images?
Relevant answer
Answer
Hello Amal,
A common practice is to represent those images, or ROIs within them, with vectors comprising some statistical features. I did that with RF and SVM. If you end up with a large matrix, you can resort to dimensionality reduction techniques such as PCA.
Hope that answers your question to some extent!
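A minimal sketch of this idea (statistical feature vectors per ROI, then PCA and a Random Forest); the ROI size, the feature set and the random data below are placeholders:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

def roi_features(volume):
    """Summarise a 3D ROI with a small statistical feature vector."""
    v = volume.ravel()
    return np.array([v.mean(), v.std(), v.min(), v.max(),
                     np.percentile(v, 25), np.percentile(v, 75)])

# Placeholder data: 200 random 16x16x16 ROIs with binary labels.
rng = np.random.default_rng(0)
rois = rng.normal(size=(200, 16, 16, 16))
labels = rng.integers(0, 2, size=200)

X = np.array([roi_features(r) for r in rois])
clf = make_pipeline(PCA(n_components=5), RandomForestClassifier(random_state=0))
clf.fit(X, labels)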
  • asked a question related to Classifier Ensemble
Question
4 answers
I know how Random Forest works when there are two choices: if the apple is red go left, if the apple is green go right, and so on.
But my question concerns text data used as features. I trained the classifier on training data, and I would like to understand in depth how the algorithm splits a node: based on what? The tf-idf weight, or the word itself? In addition, how does it predict the class for each example?
I would really appreciate a detailed explanation with a text example.
Relevant answer
Answer
Hi Sultan,
I am not familiar with which implementation of Random Forest you are using. Anyway, you are a bit far from what it does.
Random Forest for classification builds multiple decision trees (not only one). Each tree outputs a label (using a process similar to the one you described with the apple). Then the final label is decided by a majority voting process.
Each tree is trained on a bootstrap sample of the initial training set, drawn at random with replacement (so roughly one third of the samples are left out of each tree, depending on the implementation you are using). Each tree also considers only a small random subset of the features at each split (e.g. if your features are frequency counts of words, it will consider just a small subset of words, randomly selected among the initial ones). Often this is on the order of the square root of the number of features, but again, it depends on the default value of your implementation.
The splitting criterion at each tree node is a condition, typically over one feature, that divides the data reaching that node into two parts. The criterion used to select which feature to consider at each node is, by default, the Gini criterion (see here to know more about this). This process stops whenever all the remaining data samples have the same label; that node becomes a leaf, which assigns labels to test samples.
Please let me know if you got everything from here.
Best,
Luis
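A minimal sketch of the usual text setup Luis describes, where each tf-idf weight is one numeric feature and every node tests a threshold on one of them (the toy corpus and labels are made up for illustration):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus as a stand-in for the real text data.
texts = ["cheap pills buy now", "meeting agenda attached",
         "win money fast", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

# Each tf-idf weight becomes one numeric feature; every tree node then tests
# a condition such as "tfidf('money') <= 0.31", chosen by the Gini criterion
# over a random subset of the features.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(texts, labels)
print(model.predict(["buy cheap money"]))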
  • asked a question related to Classifier Ensemble
Question
11 answers
The task involves predicting a binary outcome in a small data set (sample sizes of 20-70) using many (>100) variables as potential predictors. The main problem is that the number of predictors is much larger than the sample size, and there is limited or no knowledge of which predictors may be more important than others. Therefore it is very easy to "overfit" the data - i.e. to produce models which seemingly describe the data at hand very well, but in fact include spurious predictor variables.
I tried using an ensemble classification method called randomGLM (see http://labs.genetics.ucla.edu/horvath/htdocs/RGLM/#tutorials), which seeks to improve on AICc-based GLM selection using the "bagging" approach taken from random forests. I checked the results with K-fold cross-validation and ROC curves. The results look good at first sight - e.g. a GLM containing only those variables which were used in >=30 out of 100 "bags" produced a ROC curve AUC of 87%.
However, I challenged these results with the following test: several "noise" variables (formulas using random numbers from the Gaussian and other distributions) were added to the data, and the randomGLM procedure was run again. This was repeated several times with different random values for the noise variables. The noise variables actually attained non-negligible importance - i.e. they "competed" fairly strongly with the real experimental variables and were sometimes selected in as many as 30-50% of the random "bags".
To "filter out" these nonsense variables, I tried discarding all variables whose correlation coefficient was not statistically significantly different from zero (with Bonferroni correction for multiple variables) and running randomGLM on the retained variables only. This works (I checked it with simulated data), but is of course very conservative on real data - almost all variables are discarded, and the resulting classification is poor.
What would be a better way to eliminate noise variables when using ensemble prediction methods like randomGLM in R? Thank you in advance for your interest and comments!
Relevant answer
Answer
Dear Craig,
The two problems you mention, overfitting and interpretation of random forests, are partly mythological and mostly, now, historical.
About overfitting: it's almost always caused by letting the trees continue splitting to purity. There are analytical results showing that doing this is provably inconsistent. We have exactly no idea why this is a default option in random forests.
Solution, don't let any terminal nodes in a single tree contain less than some fixed small size, say around 5% of the sample size. As expected, the current best analytical results show that the terminal node size must grow, slowly, with the sample size. But we've found that just using some small fraction, like 5%, works fine, as shown by intensive replications and cross-validation and permutation testing. 
More on overfitting. Use a crowd machine or a regression collective over a large number of differently tuned machines, of any type. This way we avoid having to name a winning scheme for any data set, and get asymptotically optimal results. The larger point here is that tuning parameters of any kind, or application of any collection of machines, such as SVMs with different kernels, is no longer necessary. Researcher brainpower is better directed at other problems. 
About the black-box aspects of a model-free learning machine: we use two solutions. First, use the machine as a signal detector and as a refinement scheme to declare good predictors while driving out noise. Recurrency, mentioned in an earlier note, seems to work well for both problems. Then -- with a much smaller feature list in hand -- send it to any small, interpretable parametric model of the researcher's choice. And, of course, compare the two predictions made by the original machine and the small model; they are routinely quite close.
Another method for opening up the black box: use a risk machine. That is, let random forest, or any other machine, run as a probability machine and get the log odds for each feature. Similarly, you can get risk effects and risk differences. These all compare well with a logistic regression model, in the sense that the individual risk-machine estimates for each parameter are in agreement with those given by the logistic model. The two estimates of the individual parameter can be shown to be asymptotically equal. And importantly, using a risk machine we don't have to initially declare any parametric model, linear or logistic, over a set of features.
We have peer-reviewed publications for everything above.
Jim
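As an illustration of the minimum-terminal-node-size advice (in Python/scikit-learn rather than randomGLM, so only the idea carries over): min_samples_leaf can be given as a fraction, here 5% of the sample size, so no tree is allowed to split all the way to purity. The synthetic data merely mimics the n << p situation from the question.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: far more predictors than samples, as in the question.
X, y = make_classification(n_samples=60, n_features=150, n_informative=5,
                           random_state=0)

# Forbid tiny terminal nodes: each leaf must hold at least 5% of the samples,
# so trees cannot split to purity.
rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=0.05,
                            random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())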
  • asked a question related to Classifier Ensemble
Question
3 answers
Hello,
I need to know about the following questions regarding the feature selection problem in intrusion detection:
- The problem consists of selecting the most relevant feature subset for each class (here we have 5 classes, so the result will be 5 subsets). Then these subsets (needless to say, one at a time) are used to train a classifier.
Is it necessary to use binary classifiers for this purpose (combining 5 binary classifiers for the 5 classes)? And if we use a multiclass classifier trained on data that contains only the features relevant to a particular class, will this affect the classifier's performance?
- What is the influence of the distribution of instances in the training data sets (the number of instances of each class in the training data) on the feature selection's performance or on the classification's performance?
Thanks.
Relevant answer
Answer
Hello,
Basically, if you train your multi-class classifier on a set of data containing only some of the classes, the model will be over-fitted to those particular classes. In other words, you won't have any knowledge of the absent classes in your trained model. This can also happen if the numbers of instances from the various classes are significantly out of balance, even in the binary classification case.
Therefore, if you only have instances from some classes, you had better train binary classifiers. Moreover, try to make your positive and negative instances (or the instances from every class in the multi-class case) balanced.
Good luck!
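A minimal sketch of the per-class (one-vs-rest) setup discussed above: each class gets its own feature subset and its own binary classifier, and the final label is the class whose detector is most confident. The dataset, the number of selected features and the classifiers are placeholders, not a recommendation for any particular intrusion-detection corpus.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in for the intrusion-detection data: 5 classes, many features.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=15,
                           n_classes=5, random_state=0)
classes = np.unique(y)

# One binary (one-vs-rest) classifier per class, each with its own
# feature subset selected against that class only.
detectors = {}
for c in classes:
    y_bin = (y == c).astype(int)
    clf = make_pipeline(SelectKBest(mutual_info_classif, k=10),
                        LogisticRegression(max_iter=1000))
    detectors[c] = clf.fit(X, y_bin)

# Final decision: the class whose detector is most confident.
scores = np.column_stack([detectors[c].predict_proba(X)[:, 1] for c in classes])
pred = classes[scores.argmax(axis=1)]
print((pred == y).mean())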
  • asked a question related to Classifier Ensemble
Question
1 answer
Please give your replies with valid references.
Relevant answer
Answer
Hello dear Imane,
I have used the vote fusion method in classifier fusion as the primary technique. For voting, majority voting and max voting are the same: they consist of choosing the class label which receives the maximum number of votes from the classifier ensemble. There are also other varieties of voting, such as voting with a threshold, weighted voting, Bayesian voting and unanimity voting.
You can see more in the enclosed paper titled "Ensemble Methods in Machine Learning" by T. G. Dietterich.
N.AZIZI
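A minimal sketch of plain and weighted majority voting with an optional threshold (a "reject" option), as described above; the toy predictions are made up for illustration:

import numpy as np

def weighted_majority_vote(predictions, weights=None, threshold=None):
    """Combine hard label predictions from several classifiers.

    predictions: array of shape (n_classifiers, n_samples) with integer labels.
    weights:     optional per-classifier weights (plain majority vote if None).
    threshold:   optional minimum fraction of the total weight the winning
                 class must reach; otherwise -1 ("reject") is returned.
    """
    predictions = np.asarray(predictions)
    n_clf, n_samples = predictions.shape
    weights = np.ones(n_clf) if weights is None else np.asarray(weights)
    n_classes = predictions.max() + 1

    fused = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        tally = np.zeros(n_classes)
        for k in range(n_clf):
            tally[predictions[k, i]] += weights[k]
        winner = tally.argmax()
        if threshold is not None and tally[winner] < threshold * weights.sum():
            winner = -1  # no class reached the required share of the votes
        fused[i] = winner
    return fused

# Three classifiers, four samples.
preds = [[0, 1, 2, 1],
         [0, 1, 1, 1],
         [1, 1, 2, 0]]
print(weighted_majority_vote(preds))                     # plain majority: [0 1 2 1]
print(weighted_majority_vote(preds, weights=[1, 1, 3]))  # heavier third voter: [1 1 2 0]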
  • asked a question related to Classifier Ensemble
Question
6 answers
Molecular dynamics.
Relevant answer
Answer
You can use the LAMMPS fix ave/atom command.
  • asked a question related to Classifier Ensemble
Question
3 answers
An ensemble of classifiers (EOC) combines a set of diverse classifiers to solve a classification problem. If we combine many classifiers, what kinds of limitations might I encounter? Concept drift would be one issue in EOC formulation. The cost of training all of the individual diverse classifiers would be another problem.
Can anyone please suggest some resources regarding the limitations, and some ways we might resolve them?
Relevant answer
Answer
Concept drift is an issue for all of machine learning; I wouldn't consider it an ensemble-learning-specific problem.
The cornerstone of ensemble learning is the problem of diversity. The ideal ensemble system should be as accurate as possible while its members make errors on different parts of the input space, i.e., the classifiers should be diverse. Although this definition is logical and clear, there are a number of different measures of diversity, and their relationship with ensemble accuracy is not straightforward.
Methods of constructing ensemble systems that attempt to achieve high diversity in an intuitive way, e.g. by weakening the weights of similar classifiers as AdaBoost does, work significantly better than basic methods such as the simple majority vote. Surprisingly, experiments have shown that measuring diversity and trying to use it explicitly during the iterative construction of the ensemble has not worked nearly as well.
There are several ways to achieve high diversity and build classifiers with different decision boundaries (a small sketch of the first approach follows the list):
- Different training datasets: the training dataset is partitioned into several subsets and each subset is used to train one classifier. The partitioning can be performed through resampling techniques such as bagging or bootstrapping, which draw training subsets with replacement. Another possibility is to use methods which draw training subsets without replacement, such as the jackknife.
- Different training parameters: each type of classifier usually has some initial parameters to set. For example, in the case of multilayer perceptron neural networks, this can be the number of layers/nodes or the error goal.
- Different types of classifiers: certain types of classifiers are more or less suitable for different types of problems. However, there is usually not just one suitable type of classifier, so choosing several of them and combining their outputs can also be a way to build a set of classifiers with different decision boundaries.
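A minimal sketch of the first mechanism (different training datasets via bagging) together with a simple pairwise-disagreement diversity measure; the data and base learner are placeholders:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Different training datasets": each member is trained on a bootstrap
# sample of the training set (bagging).
rng = np.random.default_rng(0)
members = []
for _ in range(5):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # sample with replacement
    members.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

preds = np.array([m.predict(X_te) for m in members])

# Pairwise disagreement, one simple diversity measure: the fraction of test
# points on which two members predict different labels.
for i in range(len(members)):
    for j in range(i + 1, len(members)):
        print(f"members {i},{j}: disagreement = {(preds[i] != preds[j]).mean():.3f}")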
I would recommend that you study at least a few general articles about ensemble learning, e.g.:
For a deeper understanding of the principles and concepts, read this excellent book:
Hope it helps!