
Semi-Supervised Learning - Science topic

Questions related to Semi-Supervised Learning
  • asked a question related to Semi-Supervised Learning
Question
5 answers
In which application of Machine Learning ( NLP, Computer Vision, etc ) would we find maximum value with Semi-Supervised Learning and Self-Training ?
Relevant answer
Answer
  • asked a question related to Semi-Supervised Learning
Question
9 answers
I'm working on image classification using ResNet, and I have two datasets to train my model:
- 30,000 labeled images (260 classes);
- 30,000 unlabeled images.
The images contain numbers and letters, so technically I should be able to classify 260 classes (combinations of the 26 letters and 10 digits).
So I was wondering: is there any unsupervised or semi-supervised model that can help me label my images?
Relevant answer
Answer
Deep learning is beneficial here: you can test with almost raw images. Transfer learning with AlexNet, GoogLeNet, ResNet50, VGG19, etc. can be used to train on completely new datasets. You can also create your own custom CNN models in MATLAB to solve your classification problem, and sweep all possible combinations of parameters (learning rate, epochs, frequency, optimizer, etc.).
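Since the full label set is known (all 260 number/letter combinations), one option worth noting is to bootstrap labels by self-training: train on the labeled images, pseudo-label only the high-confidence unlabeled ones, and retrain. A minimal sketch on synthetic tabular data (the dataset, the logistic-regression model, and the 0.95 threshold are illustrative stand-ins, not the actual ResNet setup):

```python
# Sketch: confidence-thresholded pseudo-labeling (all names and values illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_lab, y_lab = X[:100], y[:100]           # small labeled pool
X_unlab = X[100:]                         # unlabeled pool (labels withheld)

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = clf.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.95     # keep only high-confidence predictions
X_pseudo = X_unlab[confident]
y_pseudo = clf.classes_[proba.argmax(axis=1)][confident]

# Retrain on labeled + pseudo-labeled data
clf2 = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, X_pseudo]), np.concatenate([y_lab, y_pseudo]))
```

The threshold trades coverage for label quality: lower it and more unlabeled images get pseudo-labels, but more of them will be wrong.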
  • asked a question related to Semi-Supervised Learning
Question
5 answers
I know what contrastive learning is, and I know what other traditional segmentation losses are. What I understand is that the goal of contrastive loss is basically to pull similar things together and push dissimilar things apart. But I want to know how this can guide a segmentation pipeline (e.g. semantic segmentation)? My question is pretty basic. Blogs/video links are more welcome than research paper links.
Relevant answer
Answer
I have no good idea about this and would like to hear from others.
Thanks!
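One way to see the "pull together / push apart" mechanics concretely is an InfoNCE-style loss over a handful of pixel embeddings, where same-class pixels act as positives. The vectors below are made-up stand-ins, not part of any real segmentation pipeline, and the temperature is illustrative (real pipelines often use smaller values such as 0.07):

```python
# Sketch of a contrastive (InfoNCE-style) loss at the pixel level.
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Low loss when the anchor is similar to the positive and dissimilar to negatives."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # cross-entropy; positive sits at index 0

anchor    = np.array([1.0, 0.1])   # embedding of one "road" pixel (made up)
positive  = np.array([0.9, 0.2])   # another "road" pixel
negatives = [np.array([-1.0, 0.3]), np.array([0.0, -1.0])]  # pixels of other classes
loss = info_nce(anchor, positive, negatives)
```

In a segmentation pipeline this term is typically added to the usual per-pixel loss, so embeddings of same-class pixels cluster together before the final classification layer.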
  • asked a question related to Semi-Supervised Learning
Question
4 answers
Self-training is a wrapper method for semi-supervised learning. First, a supervised learning algorithm is trained on the labeled data only. This classifier is then applied to the unlabeled data to generate more labeled examples as input for the supervised learning algorithm. One problem with self-training is that the classifier's early mistakes reinforce themselves.
Relevant answer
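The wrapper loop described in the question is implemented directly by scikit-learn's SelfTrainingClassifier, where unlabeled points are marked with the label -1. A minimal sketch (the synthetic dataset and the 0.9 threshold are illustrative):

```python
# Sketch of the self-training wrapper using scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=400, random_state=0)
y_train = y.copy()
y_train[50:] = -1                      # pretend most labels are unknown

self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.9)
self_training.fit(X, y_train)          # iteratively pseudo-labels confident points
preds = self_training.predict(X)
```

A high threshold mitigates (but does not eliminate) the self-reinforcing-mistakes problem noted above.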
  • asked a question related to Semi-Supervised Learning
Question
6 answers
After reading about a lot of semi-supervised methods, my understanding is that they work by training on a small labeled dataset and then predicting the labels of the unlabeled points incrementally. So if I want to test the algorithm on new unseen test data, does the new training data become (labeled + unlabeled with predicted labels), after which I perform supervised classification? Is it like a two-step supervised procedure?
Relevant answer
Nice topic, following.
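To make the two-step idea in the question concrete, here is a hedged sketch (synthetic data; the k-NN model and labeled-set size are arbitrary choices): the final model is an ordinary supervised classifier trained on labeled plus pseudo-labeled data, and it is evaluated on a held-out test set it has never seen.

```python
# Sketch of the two-step procedure: pseudo-label, retrain, evaluate on unseen data.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: fit on the small labeled portion, pseudo-label the rest of the train set.
n_lab = 30
base = KNeighborsClassifier().fit(X_train[:n_lab], y_train[:n_lab])
pseudo = base.predict(X_train[n_lab:])

# Step 2: ordinary supervised training on labeled + pseudo-labeled data.
final = KNeighborsClassifier().fit(
    X_train, np.concatenate([y_train[:n_lab], pseudo]))
test_acc = final.score(X_test, y_test)   # evaluated on truly unseen data
```

The key point: the unseen test set never enters training, so the evaluation remains an honest supervised one.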
  • asked a question related to Semi-Supervised Learning
Question
4 answers
A few days ago I first heard about semi-supervised learning from the amazing book "An Introduction to Statistical Learning", so I started digging and gathered some idea of what it is about. Unfortunately, everything I found either discussed semi-supervised learning on customised or unrealistic datasets, or just explained the concept.
Can anyone please give me a REAL WORLD dataset example where semi-supervised learning is applicable?
Relevant answer
Answer
In semi-supervised learning we try to leverage information from both labeled and unlabeled data to improve prediction performance. This is particularly useful when we have more unlabeled than labeled data and when the cost of labeling data (economic and time investment) is high.
For example, the problem of Automatic Prediction of Protein Function (AFP) is exactly in this situation. We have few labeled proteins for each function (e.g. GO term) and we want to use the entire protein network to make predictions. Labeling new proteins requires time for experimental validation, and even though we have high-throughput technologies, we are far from closing the gap.
Hope it helps :)
  • asked a question related to Semi-Supervised Learning
Question
4 answers
Hi,
I'm confused about one thing, so I need your help in figuring out the right way:
I have introduced a supervised machine-learning-based framework, which is easy to evaluate because you only have labeled data. I have now enhanced the same framework with semi-supervised capabilities, so I have both labeled and unlabeled data. I used self-training to classify the unlabeled data and add the predictions to the dataset whenever the prediction probability is above a threshold. My question is: how do I evaluate my semi-supervised learner and see whether it adds value over the supervised model?
I am confused because if I use the updated dataset (with data newly added through semi-supervised learning) for evaluation, wouldn't that bias my results? My classifier is not 100% accurate, and it might have added some false-positive samples during self-training.
Relevant answer
Answer
Dear Ali Masood,
I recommend that you do a simulation with fully labeled data. You can randomly select a subset of the data to represent the unlabeled data. Any time your model makes a prediction, you would check the actual label (which you know) and score accordingly. I also recommend that you perform cross-validation.
If you plan to use unlabeled data, then you would have to manually verify each prediction and score accordingly. If the class is somewhat subjective, e.g. mood or emotion, then you would have to have several people verify the predicted class and reach consensus.
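The simulation suggested above can be sketched as follows (the dataset, 70% masking rate, and 0.9 threshold are all illustrative): hide a fraction of known labels, pseudo-label them by self-training-style prediction, and score the pseudo-labels against the ground truth you secretly kept.

```python
# Sketch: simulate unlabeled data by masking known labels, then score pseudo-labels.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
hidden = rng.random(len(y)) < 0.7            # pretend 70% of labels are unknown

clf = LogisticRegression(max_iter=1000).fit(X[~hidden], y[~hidden])
proba = clf.predict_proba(X[hidden])
confident = proba.max(axis=1) >= 0.9
pseudo = clf.classes_[proba.argmax(axis=1)]

# Because the "hidden" labels are actually known, we can measure how many
# pseudo-labels above the threshold are correct.
pseudo_acc = (pseudo[confident] == y[hidden][confident]).mean()
```

This directly answers the bias worry: it quantifies how many false positives self-training would inject at a given threshold, without touching a real unlabeled pool.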
  • asked a question related to Semi-Supervised Learning
Question
4 answers
Which R packages are you using for semi-supervised machine learning? I have looked at RSSL, ssc, and AdaSampling. I am interested in PUL that doesn't use any negative labels in the training set.
Relevant answer
Answer
Somnath Paramanik True, and thanks for the response. I may need to clarify my intent. I am publishing a paper that describes a new hierarchical mixed-model semi-supervised method. I have compared the performance of our method to various other supervised and unsupervised algorithms, and even to semi-supervised methods that aren't PUL (positive and unlabeled learning) algorithms. Reviewers want me to do this comparison before acceptance.
  • asked a question related to Semi-Supervised Learning
Question
5 answers
The justification for using Positive and Unlabeled (PU) learning for satellite imagery is the high cost of labeling pixels: when we want to extract only one object, say roads, we just collect labeled data for that class. However, if we could have a small number of labeled samples for other classes (the negative class) besides the target class, then we could use less complicated methods such as semi-supervised learning instead of PU learning. But I've seen papers working on PU learning for the classification of satellite imagery. Since remote sensing is not my main focus, I was wondering if someone could help me and let me know whether there is a strong justification for using PU learning for the classification of satellite imagery? Thanks.
Relevant answer
Answer
Tom Arjannikov Thank you very much Tom.
  • asked a question related to Semi-Supervised Learning
Question
5 answers
Hi,
I'm currently using Random Forests to classify acoustic data to corresponding bat species, however I was wondering if there is a way to also incorporate unsupervised cluster analysis into my classifier (such as K-Means or Hierarchical modelling)?
Whilst I have training data for ~40% of my target species, a large proportion of my field dataset is likely to correspond to species that haven't yet been described in the supervised model, and therefore I would like the classifier to be able to assign new clusters/call types to new classes (that can hopefully be matched to a species later).
I have been looking at consensus maximisation algorithms but I'm not sure if this would be the best method.
Any help is greatly appreciated!
(I'm not a computer scientist so please forgive me if this is a very straightforward question)
Relevant answer
Answer
What do you mean by semi-supervised learning?
Reinforcement learning?
  • asked a question related to Semi-Supervised Learning
Question
6 answers
What is the right validation method to determine the accuracy of any semi-supervised machine learning algorithm?
I have implemented the expectation-maximization version of Naive Bayes; my problem is that its accuracy degrades as the training size increases.
For instance, when I apply 5-fold cross-validation the classification accuracy is 92%, and when I apply 10-fold cross-validation it drops to 84%.
I need to compare its accuracy with the Naive Bayes so what would be the valid approach for measuring the accuracy of both?
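Independent of the EM question, one way to make the comparison valid is to score both models on identical cross-validation folds rather than mixing 5-fold and 10-fold numbers. A sketch (the dataset and the two models are illustrative stand-ins, not the asker's EM Naive Bayes):

```python
# Sketch: compare two classifiers on the *same* cross-validation folds.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # identical folds

nb_scores = cross_val_score(GaussianNB(), X, y, cv=cv)
lr_scores = cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)), X, y, cv=cv)
```

With a fixed `cv` object both models see exactly the same train/test splits, so per-fold score differences reflect the models rather than the partitioning.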
  • asked a question related to Semi-Supervised Learning
Question
2 answers
What is the difference between label propagation and label spreading? An example would help.
Relevant answer
Answer
The key idea behind the two methods is essentially the same; the difference lies in how the transition matrix is built. Label propagation uses the graph Laplacian, while label spreading uses the normalized graph Laplacian and additionally soft-clamps the initial labels via a regularization parameter.
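Both methods are implemented in scikit-learn, which makes the comparison concrete. In the sketch below (synthetic two-circles data; kernel and neighbor count are illustrative), unlabeled points are marked -1 and each method propagates the few known labels over the graph:

```python
# Sketch: LabelPropagation vs LabelSpreading on the same partially labeled data.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)
y_partial = np.full_like(y, -1)                       # -1 marks unlabeled points
labeled_idx = np.concatenate([np.where(y == 0)[0][:2],
                              np.where(y == 1)[0][:2]])
y_partial[labeled_idx] = y[labeled_idx]               # keep two labels per class

lp = LabelPropagation(kernel="knn", n_neighbors=10).fit(X, y_partial)
ls = LabelSpreading(kernel="knn", n_neighbors=10, alpha=0.2).fit(X, y_partial)
lp_acc = (lp.transduction_ == y).mean()               # labels inferred for all points
ls_acc = (ls.transduction_ == y).mean()
```

`alpha` in LabelSpreading is the clamping factor: it controls how much an initially labeled point may change its label during propagation, which LabelPropagation does not allow.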
  • asked a question related to Semi-Supervised Learning
Question
5 answers
reinforcement learning
semi-supervised learning
Relevant answer
Answer
Your question reminded me of the following post, which is interesting.
In my opinion, reinforcement learning is not a semi-supervised approach. In semi-supervised approaches you make use of both labeled and unlabeled data to train a model (e.g., co-training). In reinforcement learning you don't have any labels: you use a reward to give the agent feedback on how well it has performed, but you never actually say what the correct answer was.
  • asked a question related to Semi-Supervised Learning
Question
3 answers
I'm having a concrete problem I'm trying to solve but I'm not sure in which direction I should go.
  • Goal: Identify formation of a soccer team based on a static positional data (x,y coordinates of each player) frame
  • Input: Dataframe with player positions + possible other features
  • Output: Formation for the given frame
  • Limited, predefined formations (5-10) like 5-3-2 (5 defenders, 3 midfielders, 2 strikers)
  • Possible to manually label a few examples per formation
I already tried k-means clustering on single frames, considering only the x-axis to identify defense, midfield, and offense players, which works OK but fails in some situations.
Since I don't have many labels, I'm looking for unsupervised neural network architectures (like self-organizing maps) which might solve this problem better than simple k-means clustering on single frames.
I'm looking for an architecture which could utilize the additional information I have about the problem (number and type of formations, the few already-labeled frames, ...).
Relevant answer
Answer
Dear Blust,
Please follow the papers given below:
1. Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., & Matthews, I. (2014, December). Large-scale analysis of soccer matches using spatiotemporal tracking data. In Data Mining (ICDM), 2014 IEEE International Conference on (pp. 725-730). IEEE.
2. Link, D. (2018). Data Analytics in Professional Soccer: Performance Analysis Based on Spatiotemporal Tracking Data. Springer.
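For reference, the x-axis k-means baseline mentioned in the question can be sketched in a few lines. The coordinates below are made up to form a 4-4-2 (goalkeeper excluded), so this is an illustration of the baseline, not of the situations where it fails:

```python
# Sketch: cluster outfield players by x-coordinate into defense/midfield/attack lines.
import numpy as np
from sklearn.cluster import KMeans

# x-coordinates of the 10 outfield players (illustrative, roughly a 4-4-2)
x_positions = np.array([20, 21, 22, 23, 50, 51, 52, 53, 80, 81], dtype=float)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x_positions.reshape(-1, 1))
order = np.argsort(km.cluster_centers_.ravel())       # defense (small x) first
counts = [int(np.sum(km.labels_ == c)) for c in order]
formation = "-".join(str(c) for c in counts)
```

The known failure mode is visible in the code itself: with k fixed at 3, formations with four lines (e.g. 4-2-3-1) can never be recovered, which is one reason to bring in the prior knowledge about the predefined formation set.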
  • asked a question related to Semi-Supervised Learning
Question
2 answers
I want to use embedded word vectors as features in an existing conditional random field (CRF) with gazetteer features, for a sequence-labeling task on text. One way to do this is to cluster the word vectors and take the cluster id as a feature. Is it possible to use the word vector itself as a feature? Since a CRF needs binary features and embedded word vectors are real numbers, how would one use a real-valued feature in a CRF?
There is a paper from Bengio's group, "Word representations: A simple and general method for semi-supervised learning", which does exactly this, but it is not at all clear to me.
Relevant answer
Answer
Short answer: you cannot use real-valued embedding vectors in a nominal-feature CRF.
Three ways out:
1. Nominalize the vectors using clustering as you proposed, or produce a fixed list of the highest/lowest-valued dimensions and use these as ids. This does not work so well.
2. Switch over to a neural classifier and provide your gazetteer and other features as additional 'neurons' in the last layer before the classification layer. Works better, but you'll lose the CRF capability of normalizing over the label sequence.
3. Use representations from distributional semantics that produce nominal features, such as distributional thesauri. See e.g.:
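Option 1 above (nominalizing the vectors by clustering) can be sketched as follows. The vocabulary and random vectors are stand-ins for real embeddings, and the `CLUSTER=` feature-string format is just one convention a CRF package might accept:

```python
# Sketch: turn real-valued word vectors into nominal cluster-id features for a CRF.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab = ["bank", "river", "money", "loan", "water"]   # illustrative vocabulary
vectors = rng.normal(size=(len(vocab), 50))           # stand-in embeddings

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
# Each token gets a discrete feature string alongside the gazetteer features.
cluster_feature = {w: f"CLUSTER={cid}" for w, cid in zip(vocab, km.labels_)}
```

In practice people often cluster at several granularities (e.g. 64, 256, 1024 clusters) and emit one nominal feature per granularity, which recovers some of the information lost by discretization.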
  • asked a question related to Semi-Supervised Learning
Question
4 answers
Dear colleagues, I am doing some research in data classification, and I would like to compare against some classification approaches that use label propagation (see e.g. Zhu, Ghahramani, Lafferty, Semi-supervised Learning Using Gaussian Fields and Harmonic Functions. In ICML, pp. 912–919, 2003). Do you know of any software doing this that I can download and use? I am working with the classical UCI datasets (ionosphere, pima, spam, etc.); are there published results on those datasets for those methods?
Thanks and regards,
Renato Bruni
Relevant answer
Answer
I suggest you review MEKA.
I think it may be what you need.
Best regards!
  • asked a question related to Semi-Supervised Learning
Question
5 answers
I have been reading about semi-supervised techniques and would like to ask if anyone could direct me to any semi-supervised machine learning methods capable of generating a probability distribution in a graph-based environment for discrete data, e.g. something like a Bayesian network for the semi-supervised setting.
Relevant answer
Answer
You can also try a semi-supervised graph-label propagation algorithm called Modified Adsorption.
Reference: New Regularized Algorithms for Transductive Learning [ Slides ] [ Video ]
Partha Pratim Talukdar and Koby Crammer, ECML 2009.
Here is a link to the Junto package that implements this algorithm: http://code.google.com/p/junto/
  • asked a question related to Semi-Supervised Learning
Question
14 answers
Supervised learning handles the classification problem with certainly labeled training data, and semi-supervised learning algorithms aim to improve the classifier's performance with the help of a large amount of unlabeled samples. But is there any theory or classical framework for handling training with soft class labels? These soft labels are prior knowledge about the training samples, which may be class probabilities, class beliefs, or expert experience values.
Relevant answer
Answer
I disagree here. The question is not how to train a classifier in general and which are the well known courses/softwares to (learn to) do so.
The question, as far as I understood it, is how to train a classifier when the available supervision at training time consists of soft class labels in the form of a class-probability value for each training example. In a binary classification case, this would be p(c=1|x) for each sample x in your training set (or its complement p(c=0|x) for the other class).
The fact that a standard logistic regression outputs a model that produces such probability values, once the model is trained and can be used on independent test examples, does not answer how to use such a soft supervision at training time.
In other words, a standard logreg package would typically require as inputs a  training  set in the form of concrete examples "x" together with their hard class labels e.g. (c=1) or (c=0).
Besides, the most standard way of fitting a logreg is iteratively reweighted least squares (IRLS), and it is not immediately obvious that this specific type of optimization actually "minimizes H(p, q), averaged over your training set." At the least, it would take some math to establish the (doubtful) link, or to specify for which specific distributions p and q the two optimizations would happen to be equivalent.
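One concrete point worth adding to this exchange: the cross-entropy objective used for logistic regression accepts probabilistic targets directly, and its gradient keeps the same form when the hard 0/1 labels are replaced by soft probabilities. A minimal sketch with synthetic data (the dataset, learning rate, and iteration count are illustrative; the soft labels are generated from a known model so convergence can be checked):

```python
# Sketch: fit logistic regression on *soft* labels by gradient descent
# on the cross-entropy H(p, q) averaged over the training set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
p_soft = 1.0 / (1.0 + np.exp(-(X @ true_w)))    # soft labels: class probabilities

w = np.zeros(3)
lr = 0.1
for _ in range(2000):
    q = 1.0 / (1.0 + np.exp(-(X @ w)))          # model's predicted probabilities
    grad = X.T @ (q - p_soft) / len(X)          # same form as with hard labels
    w -= lr * grad
```

This does not settle the IRLS question raised above, but it shows that gradient-based fitting with soft targets is well defined and recovers the generating weights when the soft labels are consistent with a logistic model.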
  • asked a question related to Semi-Supervised Learning
Question
2 answers
What are the practical implementations that are available? Consider this scenario: There are 100K input neurons. There are 10 hidden layers with an average of 1K neurons per hidden layer. There are only 2 neurons in the output layer.
(1) Can weights of such a neural network be learned in a distributed fashion? What I mean is:
a) can we split the NN into parts, learn the weights for individual parts, store parts separately in different machines
b) combine the results from parts (during testing) in order to get the final answer.
(2) Will the results be the same as having one huge monolithic NN?
(3) Are there advantages in terms of convergence time and sample size if we study a NN in a distributed fashion?
(4) Are there practical implementations of such systems, hopefully open-source?
Relevant answer
Answer
Thanks for the pointers, Simone. It sure looks like a nascent field that has sparked some fundamental rethinking in terms of engineering machine learning algorithms.
  • asked a question related to Semi-Supervised Learning
Question
4 answers
I was going over the co-training technique, a widely used semi-supervised method where you look at the data in two views (split it into two uncorrelated feature sets) and learn a classifier on each. To predict new test data you take into account the decisions of both learned classifiers. This sounds similar to the way random forests work. Have I interpreted it wrong?
Relevant answer
That is right, Eric. They adopt the same philosophy: diversify the classifiers to obtain more reliable predictions.
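For concreteness, a minimal co-training round might look like the sketch below. The synthetic data, the halfway feature split, the 0.95 confidence threshold, and the single round are all illustrative simplifications of the full algorithm:

```python
# Sketch: one co-training round with two feature views teaching each other.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           random_state=0)
view1, view2 = X[:, :10], X[:, 10:]          # two feature views (illustrative split)
lab = np.zeros(len(y), dtype=bool)
lab[:40] = True                              # small labeled pool
unlab = ~lab

c1 = LogisticRegression(max_iter=1000).fit(view1[lab], y[lab])
c2 = LogisticRegression(max_iter=1000).fit(view2[lab], y[lab])

# Each view's classifier teaches the other with its most confident predictions.
conf1 = c1.predict_proba(view1[unlab]).max(axis=1) >= 0.95
conf2 = c2.predict_proba(view2[unlab]).max(axis=1) >= 0.95
y1 = c1.predict(view1[unlab])
y2 = c2.predict(view2[unlab])
c2 = LogisticRegression(max_iter=1000).fit(
    np.vstack([view2[lab], view2[unlab][conf1]]),
    np.concatenate([y[lab], y1[conf1]]))
c1 = LogisticRegression(max_iter=1000).fit(
    np.vstack([view1[lab], view1[unlab][conf2]]),
    np.concatenate([y[lab], y2[conf2]]))

# A point is predicted by combining both views' probability estimates.
avg_proba = (c1.predict_proba(view1) + c2.predict_proba(view2)) / 2
pred = avg_proba.argmax(axis=1)
```

This also shows the contrast with random forests: here the two learners exchange pseudo-labels during training, whereas forest trees are trained independently and only combined at prediction time.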
  • asked a question related to Semi-Supervised Learning
Question
8 answers
Recent research showed that semi-supervised learning algorithms are not always capable of outperforming supervised learning algorithms that work on just the small amount of labeled data.
Guo, Y., Niu, X., & Zhang, H. (2010). An Extensive Empirical Study on Semi-supervised learning. Proceedings of IEEE International Conference on Data Mining, 10, 186–195.
Relevant answer
Answer
An important point when discussing the "reliability" of semi-supervised algorithms (especially those that can be initialized with very few seed labeled instances) is how the subset of labeled data is selected for a particular problem. Performance can be very sensitive to the initially selected labeled instances. This is particularly the case in minimally supervised relation-extraction settings. However, I haven't seen much "robustness" analysis of semi-supervised algorithms conducted with this aspect in mind.
  • asked a question related to Semi-Supervised Learning
Question
1 answer
My problem is multi-class document classification. Any Ideas?
Relevant answer
Answer
Have you got any evidence that Label Propagation is the best tool to use?