Machine Learning

  • Paulo Fontainha Gomes added an answer:
    What publicly available datasets of annotated social interactions exist?
    I am interested in using machine learning to recognize social interaction patterns such as disagreements, and potentially using those patterns to generate new simulated interactions. I've been working with crowdsourced descriptions of social interactions, but these are more narrative and less action-driven.

    Are you aware of publicly available datasets of annotated social interactions?
    Types of data that might be good candidates are annotated movie scripts or forum threads. Skeletal/gesture data could also be interesting.
    Paulo Fontainha Gomes · University of California, Santa Cruz

    For those of you interested in the Wikipedia dataset, it is described in:

    [Jankowski-Lorek et al., 2014] Jankowski-Lorek, M., Nielek, R., Wierzbicki, A., and Zielinski, K.(2014). Predicting controversy of wikipedia articles using the article feedback tool. In Proc. Seventh ASE International Conference on Social Computing.

    And available at:

    Although it was not yet available online at the time of this post, I suggest contacting one of the authors, Kazimierz Zielinski, to potentially get access to an author version of the document.

  • Ronán Michael Conroy added an answer:
    Mandelbrotian vs. Gaussian?

    In engineering and economic practice I encounter many applications in which the Gaussian distribution is used to model the probability of events. I know that distributions like the Gaussian have a well-defined mean and variance, which give us good information about the probability of events.

    On the other hand, Mandelbrotian or fat-tailed distributions describe processes in which outlier events are not as unlikely as in processes modeled with Gaussian, Poisson, or exponential distributions.

    I see many processes that could be considered Mandelbrotian but are treated as Gaussian (for example, in economic forecasts or some machine learning problems).

    I wonder what gives us permission to use distributions like the Gaussian, Poisson, or exponential with so much confidence in so many applications?

    Ronán Michael Conroy · Royal College of Surgeons in Ireland

    We might regard continuous distributions as being the sums of simpler distributions. In health research, if we see fat tails (or just one) we tend to think that there is a process driving the majority of observations and different processes driving the extremes. For example, in pregnancy you have high outlying blood pressures due to disorders in the response of the body to the pregnant state. Indeed, even the extremes of height are driven by conditions that do not affect most of the distribution.

    So really, finding a single distribution to fit a real-life scenario is not as good a tactic as the one proposed centuries ago by Bacon: "We learn when we establish the general rule, but we learn again when we examine the exceptions."

    Those fat tails may be trying to tell you something!
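
    The idea of one process driving most observations and another driving the extremes can be checked with a little arithmetic. The sketch below is only an illustration: the mixture weights and standard deviations are made up, and both components are assumed to be zero-mean Gaussians. A single Gaussian has kurtosis 3; the two-process mixture comes out well above that, which is what "fat tails" means here.

```python
def mixture_kurtosis(weights, sigmas):
    """Kurtosis of a mixture of zero-mean Gaussians.

    For a zero-mean Gaussian, E[x^2] = s^2 and E[x^4] = 3 s^4, and the
    moments of a mixture are the weighted sums of the component moments.
    """
    m2 = sum(w * s**2 for w, s in zip(weights, sigmas))
    m4 = sum(w * 3 * s**4 for w, s in zip(weights, sigmas))
    return m4 / m2**2

# Hypothetical numbers: 95% of observations come from a "typical" process,
# 5% from a second process with five times the spread.
single = mixture_kurtosis([1.0], [1.0])
mixed = mixture_kurtosis([0.95, 0.05], [1.0, 5.0])
print(f"single Gaussian kurtosis:     {single:.2f}")
print(f"two-process mixture kurtosis: {mixed:.2f}")
```

    Neither component is fat-tailed on its own; the heavy tails appear only because two processes are pooled into one sample.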

  • Lambert Zijp added an answer:
    What happens if the features are scaled to the range 0.1 to 1 for a linear classifier, rather than to the range 0 to 1?

    In general, datasets classified with the Liblinear classifier are scaled to the range 0 to 1. If features scaled to the range 0.1 to 1 are fed to the same classifier instead, would it affect the result?

    Lambert Zijp · Netherlands Cancer Institute

    If I remember well, what matters is that the features are all of the same order of magnitude, so such a small difference in the scaling range should not have any effect.
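
    One way to see this, as a sketch rather than a statement about any particular solver: moving a feature from [0, 1] to [0.1, 1] is the affine map x' = 0.1 + 0.9x, and a linear decision function can absorb an affine map exactly by adjusting its weights and bias. The weights, bias and sample below are hypothetical. Note that this only shows the two scalings can represent the same decision boundaries; a regularized solver penalizes the weights, so its fitted result may still differ slightly.

```python
def rescale(x, lo=0.1, hi=1.0):
    """Map a feature vector already scaled to [0, 1] onto [lo, hi]."""
    return [lo + (hi - lo) * v for v in x]

def score(w, b, x):
    """Linear decision function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def equivalent_params(w, b, lo=0.1, hi=1.0):
    """Weights and bias that give the same scores on the rescaled features."""
    w2 = [wi / (hi - lo) for wi in w]
    b2 = b - lo * sum(w2)
    return w2, b2

# Hypothetical weights, bias and sample, for illustration only.
w, b = [2.0, -1.5, 0.5], -0.25
x = [0.0, 0.4, 1.0]

w2, b2 = equivalent_params(w, b)
original = score(w, b, x)
rescaled = score(w2, b2, rescale(x))
print(original, rescaled)  # the two scores agree up to rounding
```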

  • Alireza Yousefi added an answer:
    Can someone explain degrees of freedom very clearly, with examples, in relation to the theory of machines?
    Alireza Yousefi · University of Zanjan

    in that time we are solving problems expolation core barber  hyper particles by planar motation

  • Bhavana Bharat Dalvi added an answer:
    Is there a semi-supervised method capable of producing a probability distribution?
    I have been reading about semi-supervised techniques and would like to ask if anyone could direct me to semi-supervised machine learning methods capable of generating a probability distribution in a graph-based setting for discrete data, e.g. something like a Bayesian network for the semi-supervised case.
    Bhavana Bharat Dalvi · Carnegie Mellon University

    You can also try a semi-supervised graph-label propagation algorithm called Modified Adsorption.
    Reference: Partha Pratim Talukdar and Koby Crammer, "New Regularized Algorithms for Transductive Learning", ECML 2009.

    Here is a link to the Junto package that implements this algorithm:

  • Riadh Belkebir added an answer:
    Any opinion, statement or experiences about feature representation learning in deep learning?

    Feature representation learning is not intended for feature selection but for producing a better representation of good features. Even with representation learning, bad features remain bad or become even worse.

    Riadh Belkebir · University of Science and Technology Houari Boumediene

    See "Learning Deep Architectures for AI" by Yoshua Bengio.

    It gives a good discussion of the issue.

  • Marcus Neuer added an answer:
    What is the effect of fine-tuning in a Deep Belief Network?

    In a DBN, will fine-tuning (with back-propagation) change the representation learned by the stacked RBMs very much? I have visualized the representation of the first layer after pretraining and after fine-tuning, and I cannot tell the representations apart by eye. Does that mean that fine-tuning only affects the higher levels of the DBN and mainly serves to provide error signals for classification?

    Which kind of topology did you use for your experiment?

    I am using the structure 786*1000*1000; each layer is pretrained as an RBM, and the network uses sigmoid activation functions.

    Marcus Neuer · VDEh-Betriebsforschungsinstitut

    Considering Rui Rodrigues' comment, I think you will find the attached publication helpful. It is just one work in which the effect is shown. It also contains some graphical illustrations of the impact of pretraining on the DBN. In Fig. 8 of this work you can clearly see the separation between no-pretraining and with-pretraining increasing with the number of layers.

  • Nima Teimouri added an answer:
    Which classifier is best for real-time applications?

    I would like to know which classifier is best suited for real-time applications, in terms of both performance and computation. Answers with references will be appreciated.

    Let's suppose we have good computing facilities, the application uses hybrid computing (parallel computing on CPU + GPU), and the feature vector is robust and high-dimensional. If, in addition, a delay of nanoseconds is acceptable, which classifier is best for the classification task?

    Nima Teimouri · University of Tehran

    I think that LDA and QDA classifiers are the best methods for classifying data into different classes in real-time applications. I have used them to classify data into five classes.

  • Jose Dolz added an answer:
    Is there any connection between computer vision and machine learning?

    Are machine learning concepts used in OpenCV library functions to detect objects, separate the foreground from the background, etc.?

    Jose Dolz · Aquilab

    Yes, of course there are connections. Since a lot of computer vision problems can be solved (or optimized) by classification approaches (e.g., segmentation essentially amounts to labelling pixels/voxels), machine learning approaches (ANNs, SVMs, RBFs, ...) are becoming popular in the computer vision field.

  • Theodoros Anagnostopoulos added an answer:
    For the K nearest neighbor recognition what would be the best distance metric to implement for a handwritten digit recognizer?
    This is related to machine learning: digit/object recognition using KNN, a supervised learning algorithm based on instance-based (lazy) learning without generalization. It is non-parametric in that it makes no assumption of a normal (Gaussian) distribution. I am trying to implement it in Octave, Python and Java.
    Theodoros Anagnostopoulos · National Research University of Information Technologies, Mechanics and Optics

    Are you planning to apply a distance metric to raw characters, or are you also planning to include some more context in your approach? What I mean is that you can test the following two cases: (i) Case 1: measure the distance with a variety of distance metrics on an online stream of characters, which is the straightforward approach; (ii) Case 2: cluster sets of similar characters with online stream clustering and then apply a variety of distance metrics between the character and the cluster. Here you can again use a variety of distances, including those you already used in the first case. In any case, you should apply a ceiling analysis first in order to decide whether the data you have are better handled by the first or the second case. I also think there is another efficient approach based on context spaces theory and trajectory context prediction, but to begin with please try the first two cases and, if you like, let me know how it goes.
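
    Case 1 can be sketched in a few lines. This is a minimal illustration, not a recommendation of one metric over another: the three metrics are common choices, and the tiny 3x3 "images" are made up and are not real digit data. The point is that the metric is just a function passed to the classifier, so trying several of them on your own data is cheap.

```python
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def knn_predict(train, labels, query, k=3, metric=euclidean):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(range(len(train)), key=lambda i: metric(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy 3x3 "images" flattened to 9 pixels (made up for illustration).
train = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # vertical bar -> label 1
    [0, 1, 0, 0, 1, 0, 0, 1, 1],  # label 1
    [1, 1, 0, 0, 1, 0, 0, 1, 0],  # label 1
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # ring -> label 0
    [1, 1, 1, 1, 0, 1, 1, 1, 0],  # label 0
    [0, 1, 1, 1, 0, 1, 1, 1, 1],  # label 0
]
labels = [1, 1, 1, 0, 0, 0]
query = [0, 1, 0, 0, 1, 0, 1, 1, 0]  # a slightly distorted vertical bar

predictions = {
    m.__name__: knn_predict(train, labels, query, k=3, metric=m)
    for m in (euclidean, manhattan, cosine_distance)
}
print(predictions)
```

    On real handwritten digits the metrics will not always agree, which is exactly what a held-out comparison would reveal.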

  • Kishore Gopalan added an answer:
    Can anyone help with classification from text to web page?

    I have some text data containing customer details such as name and address, along with other information that is not required. I would like to extract the required data and place it in the appropriate fields of a web page.

    Please let me know which classification method works well, the different types of rules, and whether any file conversion is involved.

    Kishore Gopalan · Infosys


    Your understanding is correct.

  • Johan A.K. Suykens added an answer:
    Do you have any experience with Least Squares Support Vector Machines (LS-SVM) for function estimation? In particular, please comment on over-fitting.

    The main results of the LS-SVM model are nearly perfect, but I'm hesitant about using LS-SVM because its results suggest that over-fitting is occurring. Do you have any similar observations? Is it safe to use LS-SVM for data modeling? I haven't had any similar problem with one-layer artificial neural networks (ANN).

    Also, I am not sure how important the sigma and gamma values are. They are very far apart from one training run to the next, but the primary results are very similar.

    Thanks for your contribution,

    with regards


    Johan A.K. Suykens ·

    Dear all,

    If you select the regularization constant and kernel parameters in a good way, there will be no overfitting. The LS-SVM model will generalize well then.

    In the LS-SVMlab software

    this is also automatically tuned for you. The use of regularization prevents overfitting.

    If you would like to learn more about all of this, please consult the book

    J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002 (ISBN 981-238-151-1)

    Best regards,

    Johan Suykens


    Prof. Johan Suykens
    Katholieke Universiteit Leuven
    Departement Elektrotechniek - ESAT-STADIUS
    Kasteelpark Arenberg 10
    B-3001 Leuven (Heverlee)

  • Najmeh Tavassoli added an answer:
    Can anybody provide me the 2D-PLS Matlab code?
    I am trying to make a prediction model according to some bright field images. Many thanks in advance.
    Najmeh Tavassoli · University of British Columbia - Vancouver


    I mean using 2-D images to make a prediction model. So we have a 2D matrix for Xi and  a number for Yi.

  • Mike Tian-Jian Jiang added an answer:
    Can you suggest tools for chunk identification in English or Chinese sentences?

    I am looking for jar files or tools for identifying chunks in sentences (even better if they are open source and can be used for Chinese sentences). For example, "data mining" and "machine learning" are chunks in the sentence "I like data mining and machine learning". Any suggestions? Thanks

    Mike Tian-Jian Jiang · Yaraku, Inc., Japan

    NLTK is elegant, but it requires an annotated corpus, and annotated Chinese corpora are limited.

    OpenNLP and StanfordNLP are similar, although the latter is better in terms of accuracy.

    The StanfordNLP parser has an online demo where you can try it out:

    Also I guess you would like to read something about it in Chinese:

  • Sumathi Rajaji added an answer:
    Is there any ontology for the psychology domain?
    If any ontology has been used to represent the knowledge of a psychologist, how can I view the particular ontology?
    Sumathi Rajaji · skpasc

    Thank you for the information

  • Bernd Ludwig added an answer:
    Can anyone help with datasets that can be naturally clustered along multiple dimensions?

    I'm looking for datasets that can be naturally clustered along multiple dimensions. For instance, a collection of movie reviews can be clustered along the dimensions of topic, sentiment, or genre (e.g., action, romantic, documentary etc.). Similarly, political blog postings can be clustered by topic as well as other dimensions like author’s stance on an issue (e.g., support, oppose), or her political affiliation.

    I have collected a few datasets, which are listed here: Do you know of other datasets that have been annotated along multiple dimensions? Such datasets would be useful for evaluating a multi-clustering system, which seeks to organize, or cluster, a set of text documents along multiple dimensions.

    Bernd Ludwig · Universität Regensburg

    I have a data set containing ratings of cocktails along the dimensions "strong", "sour", "fruity", "sweet".

  • Javed Ahmed added an answer:
    How to increase the number of inputs in ANFIS?

    I read that ANFIS can take at most six inputs. I want to optimize a problem in which I have around 37 inputs, and I want to apply all of them simultaneously. How can I increase the number of inputs?

    Javed Ahmed · National University of Sciences & Technology

    I don't remember the details right now, since I worked on fuzzy logic a long time ago. However, if there really is a limit of just one output, you can construct 8 parallel ANFIS systems, each with 1 (different) output but the same 37 inputs.

    As far as the ANN is concerned, it can help you a great deal only if you have a sufficiently large dataset covering all the combinations of input values and the corresponding output values.

  • Faton Merovci added an answer:
    Which one is best for implementing machine learning algorithms and statistical modelling, SAS, R, or MATLAB ?
    If all of the above are available, which one should be chosen to implement machine learning algorithms and statistical modelling?
    Faton Merovci · University of Prishtina

    Dear Verma,

    I would prefer MAPLE.


  • Hussein Mazaar added an answer:
    Who is using RapidMiner and for which applications?

    Who is using RapidMiner in their research?

    Which application or problem do you address with RapidMiner?

    Hussein Mazaar · Cairo University

    It is a data mining (machine learning) tool for classification, clustering, etc.

  • Chitta Behera added an answer:
    Can we use SVR for an underdetermined system?

    Underdetermined system: the number of independent variables is greater than the number of dependent variables.

    SVR: support vector regression.

    Chitta Behera · Indian Institute of Technology Gandhinagar

    Thanks to all for the nice ideas. Can SVR be reliable for noisy data? That is, suppose I have a noise-free training data set that I use for learning (i.e., to find the weights (W) and bias term (b)). I then get noisy input data and try to estimate the output (y) from it. Can SVR do well here?

  • Iman Khodadi added an answer:
    What is the difference between data mining, pattern recognition, machine learning and artificial intelligence?

    I have been working on artificial neural networks for 5 months, but so far I have not been able to figure out the difference between data mining, machine learning and artificial intelligence. Can anyone please explain the difference between them?

    And, how is statistics different from artificial intelligence?

    Iman Khodadi · Tarbiat Modares University

    Machine learning is a sub-field of artificial intelligence.

    Pattern recognition is a set of machine learning methods (with some differences and also new methods) for recognizing patterns.

    Data mining is a set of machine learning methods (with some differences and also new methods) for mining data.

    The relationship between them from general to specific is:

    Artificial intelligence -> Machine learning -> Pattern recognition -> Data mining

  • Ted Dunning added an answer:
    Classification/regression with very large dataset - any thoughts?

    I'd like to train a model with about 10^10 (10 billion) samples. The number of features is quite low (~100).

    Do you know of any machine learning method that can deal with this amount of data (within my lifetime)?


    Do you know of any dataset size reduction method that can deal with this amount of data?

    Ted Dunning · Apache Software Foundation

    For linear regression on a problem that is billions of rows by thousands of columns, you can construct the normal equation in a single pass over the data.  That, in turn, will give you a decent solution for your problem.
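
    A minimal sketch of that single pass, in plain Python so that it runs anywhere. The stream() generator is a stand-in for reading rows from disk and the tiny three-column problem is made up. Only the d x d matrix X^T X and the d-vector X^T y are kept in memory, so the memory cost does not grow with the number of rows. Because both accumulators are plain sums, separate chunks of the data can also be accumulated independently and added together afterwards.

```python
def accumulate(rows, d):
    """One streaming pass: build X^T X (d x d) and X^T y (d) row by row."""
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for x, y in rows:
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def stream():
    """Stand-in for a data source: y = 1 + 2*x1 - 3*x2, with an intercept column."""
    for k in range(1000):
        x1, x2 = (k % 17) / 17.0, (k % 5) / 5.0
        yield [1.0, x1, x2], 1.0 + 2.0 * x1 - 3.0 * x2

xtx, xty = accumulate(stream(), d=3)
weights = solve(xtx, xty)
print([round(w, 6) for w in weights])  # close to the generating coefficients
```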

    For non-linear regression, with or without regularization and for linear regression with regularization, you can use various forms of parallel stochastic gradient descent (SGD) to solve your problem.  The H2O package from 0xdata has an implementation of this that should work for you if you can fit your data into memory.  In your case, each 1000 element row of data is likely to require 0.1-8kB of memory.  Your 10 billion row example will thus require somewhere between 1 and 80TB of memory if you want it to be completely resident.  The low end of this range is very easy to arrange these days and the high end is not at all implausible if it is commercially worthwhile to solve this system.  If you can devote this much memory to your problem, then you will be able to solve your problem in a stunningly short time.

    If your data exceeds memory, then parallel SGD is likely the best answer.  It is just harder to find really excellent implementations of solvers for this.
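
    For the out-of-memory case, the core of SGD is small enough to sketch. This is a plain single-process mini-batch version on a made-up linear problem, not the H2O implementation; stream() stands in for re-reading the data from disk on each pass, and the learning rate and batch size are arbitrary choices. In a parallel variant, the per-batch gradient is typically the part that is computed on separate workers and then averaged.

```python
def batches(rows, size):
    """Group a row iterator into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def sgd_epoch(w, rows, lr=0.2, batch_size=10):
    """One pass of mini-batch SGD on squared error for a linear model."""
    for batch in batches(rows, batch_size):
        grad = [0.0] * len(w)
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(batch)  # average gradient over the batch
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def stream():
    """Stand-in for a data source: y = 1 + 2*x1 - 3*x2, with an intercept column."""
    for k in range(1000):
        x1, x2 = (k % 17) / 17.0, (k % 5) / 5.0
        yield [1.0, x1, x2], 1.0 + 2.0 * x1 - 3.0 * x2

sgd_weights = [0.0, 0.0, 0.0]
for _ in range(100):  # each epoch only ever holds one batch in memory
    sgd_weights = sgd_epoch(sgd_weights, stream())
print([round(v, 3) for v in sgd_weights])
```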

  • Mahdieh Askarian added an answer:
    How to train a classifier using probability class labeled data in machine learning?

    Supervised learning handles classification problems using training data with definite labels, and semi-supervised learning aims to improve the classifier's performance with the help of a number of unlabeled samples. Is there any theory or classical framework for training with soft class labels? These soft labels are prior knowledge about the training samples, which may be class probabilities, class beliefs, or expert-assigned values.

    Mahdieh Askarian · University of Tehran

    Hi Dingfu

    A naive Bayesian network used as a classifier handles probabilities very well. When you have no information about the classes, the prior probability can be taken to be a uniform distribution. Whenever you have additional information about the classes, the prior probability can be changed accordingly.

    P(class | feature) = P(feature | class) * P(class) / P(feature)

    BNT is a Matlab package that can be used to apply a naive Bayesian network.
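
    To make the role of the prior concrete, here is a small worked example of the formula. It is plain Python rather than BNT, and the likelihoods and priors are hypothetical numbers chosen so the arithmetic is easy to follow. The same likelihoods give opposite conclusions depending on whether the prior is uniform or informed.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(class | feature) for each class.

    likelihoods[c] = P(feature | class c), priors[c] = P(class c).
    The denominator P(feature) is the sum of the joint terms.
    """
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    evidence = sum(joint.values())  # P(feature)
    return {c: joint[c] / evidence for c in joint}

# Hypothetical likelihoods of one observed feature value under two classes.
likelihoods = {"A": 0.6, "B": 0.2}

uniform = posterior(likelihoods, {"A": 0.5, "B": 0.5})   # no prior knowledge
informed = posterior(likelihoods, {"A": 0.1, "B": 0.9})  # prior favours B
print(uniform)   # A is more probable (about 0.75 vs 0.25)
print(informed)  # B is more probable (about 0.75 vs 0.25)
```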

  • Bojan Ploj added an answer:
    What advantages does backpropagation have over the border pairs method?

    The linked article describes a border pairs method which has numerous advantages over the backpropagation algorithm.

    Has anyone observed any deficiencies?

    Bojan Ploj · School centre Ptuj, Slovenia

    Q: I noticed that you present results on noisy patterns starting from 0% noise. Is that a problem for your method?

    A: On the contrary, BPM results are better than those of the backpropagation method. BPM is even able to eliminate the noise effectively.

    Q: If you have noisy patterns throughout (always 10% noise), will that degrade the performance? Or what if the noise diminishes, which I could imagine would be a problem for any constructive algorithm?

    A: Yes, noise degrades the results, but less than with the backpropagation method. Noise can also be reduced without any particular problems for the algorithm.

  • Antonio Valerio Miceli Barone added an answer:
    What is the relationship between deep learning methods and reservoir computing (if any)?

    I have the intuition that these two types of methodologies are related but I could not find any references nor any clear explanation of this relationship besides the fact that they are 2 types of modern, novel and evolved artificial neural networks.

    Antonio Valerio Miceli Barone · Università di Pisa

    Deep learning doesn't necessarily involve recurrent neural networks. In fact, most research in deep learning is done on feed-forward neural networks.

    A feed-forward neural network is generally considered to be deep if it has more than one hidden layer.

    A recurrent neural network, when unfolded over time for an example of duration T, essentially becomes a feed-forward neural network with k*T hidden layers (where k is a constant usually equal to one).

    Since training many hidden layers using standard backpropagation techniques is difficult, extreme/reservoir computing gives up and just trains the output layer, while deep learning trains all the layers using techniques that extend standard backpropagation.
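
    The "just train the output layer" idea can be sketched as a tiny echo-state-style network. Everything here is an assumption made for illustration: the reservoir size, the weight scaling, the ridge constant and the toy task (recalling the previous input) are arbitrary. The recurrent and input weights are drawn once at random and never updated; only the linear readout is fitted, by ridge regression.

```python
import math
import random

random.seed(0)
N = 20  # reservoir size (arbitrary choice for this sketch)

# Fixed random weights: never updated after initialisation.
w_in = [random.uniform(-0.5, 0.5) for _ in range(N)]
w_res = [[random.uniform(-1, 1) * 0.9 / N for _ in range(N)] for _ in range(N)]

def run_reservoir(inputs):
    """Collect states x[t] = tanh(W_res x[t-1] + w_in u[t]), plus a bias feature."""
    state, states = [0.0] * N, []
    for u in inputs:
        state = [math.tanh(sum(w_res[i][j] * state[j] for j in range(N)) + w_in[i] * u)
                 for i in range(N)]
        states.append(state + [1.0])
    return states

def ridge_fit(features, targets, lam=1e-4):
    """Train only the linear readout: solve (F^T F + lam I) w = F^T y."""
    d = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(d)]
    for col in range(d):  # Gaussian elimination with partial pivoting
        p = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[p], b[col], b[p] = a[p], a[col], b[p], b[col]
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            b[r] -= factor * b[col]
            for c in range(col, d):
                a[r][c] -= factor * a[col][c]
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))) / a[i][i]
    return w

# Toy task (made up): reproduce the previous input u[t-1] from the state at time t.
inputs = [random.uniform(-1, 1) for _ in range(300)]
targets = [0.0] + inputs[:-1]
states = run_reservoir(inputs)
readout = ridge_fit(states, targets)

preds = [sum(w * s for w, s in zip(readout, st)) for st in states]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
mean_y = sum(targets) / len(targets)
baseline = sum((mean_y - y) ** 2 for y in targets) / len(targets)
print(f"readout MSE {mse:.4f} vs. mean-predictor MSE {baseline:.4f}")
```

    A deep learning approach would instead propagate the error back through the unfolded network and update w_res and w_in as well, which is the harder optimisation problem described above.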

  • Lisa Neef added an answer:
    Learning sources for Data Assimilation.

    Dear All,

    I am new to data assimilation and would really appreciate it if you could suggest some good sources for learning this topic. I am interested in learning through practical exercises using Matlab or Python.

    Your comments and suggestions are welcome.

    Lisa Neef · Helmholtz Centre for Ocean Research Kiel


    Check out the Data Assimilation Research Testbed at NCAR:

    It's a Fortran-based Ensemble assimilation software that includes small dynamical models (such as the Lorenz 1963 "chaos butterfly" model) where you can play with the assimilation parameters.  On their website they also have a tutorial that introduces the concept of ensemble data assimilation.  They also offer various Matlab codes for looking at the output.

    If you want to learn more about DA in general, especially from the perspective of variational DA, I recommend the ECMWF training course on DA (google it). 

  • Hossein Abedi asked a question:
    In which situations can we use machine learning - can anyone help?

    I read some chapters of the book "The Black Swan" by Nassim Taleb. In the book, events in the world are categorized as belonging to either Mediocristan or Extremistan. In Mediocristan you don't need to worry about a single anomaly having a big effect on the results in the data, but in Extremistan we may, after a long period of time and in very rare situations, run into an event with a very large influence (a Black Swan) that could not have been anticipated from the data observed beforehand.

    I want to know the major criteria for deciding whether some data belong to Mediocristan or Extremistan in machine learning.

    I presume it is wrong to use machine learning for Extremistan-type events, because a single data point can have an extreme effect and call our findings into question.

  • Jose Jairo Camacho added an answer:
    Are there rules (semantics) that can be used to derive implicit requirements given explicit requirements of a domain?
    How can we learn them (implicit requirements)? Are there mechanisms or a set of steps to follow such that if A follows those steps it produces implicit requirements, and if B follows the same steps, B ends up with similar implicit requirements?
    Jose Jairo Camacho · National University of Colombia

    I agree with Mr. Fannader: when eliciting requirements, one should focus all effort on user needs and, as stated in the requirements chapter of the SWEBOK, use a matrix to match each requirement with typical NFRs, or a requirements dependency tree, to see whether all the needs are covered.

  • Preeti Balaji added an answer:
    Does anyone have experience with SAR image classification?

    Dear all, I am trying to classify a SAR image using machine learning algorithms. I am finding difficulty in choosing an appropriate approach. Also, once a technique is chosen (SVM or Random forests or any), how do I transform the SAR image training samples into the required format suitable for the classification approach? I am trying this in Python. Any help is highly appreciated. Thanks very much!

    Preeti Balaji · University College Cork

    @Madhu Bala Myneni: Thanks very much for sharing your paper with me. It's very helpful.
