Machine Learning

  • Nuha A. S. Alwan added an answer:
    Which journals in computer science are free to publish in?
    Is there a list of free journals in computer science?
    Nuha A. S. Alwan · University of Baghdad

    IEEE Transactions on Computers

    Computers and Electrical Engineering (Elsevier)

    Journal of Computational Engineering (Hindawi)

    Computing (Springer)

    Parallel Computing (Elsevier)

  • Khelassi Abdeldjalil added an answer:
    How can I implement linear genetic programming?

    Can anyone recommend software or code to implement linear genetic programming?

    Thanks in advance.

    Khelassi Abdeldjalil · Abou Bakr Belkaid University of Tlemcen

    Try R; it contains some interesting packages such as ag, rgp, and others.

  • Hanlin Tan added an answer:
    Any recommended C++ lib/source code for sparse coding?
    Is there any recommended C++ library or source code for sparse coding, e.g. K-SVD? I want to embed sparse coding into my C++-based system. I am currently using the K-SVD MATLAB code and calling *.m functions from my C++ system, but there are some problems and it does not seem fast enough. Does anyone know of a C++ library or source code for sparse coding? Thanks in advance!
    Hanlin Tan · National University of Defense Technology

    I'm facing exactly the same problem. As far as I know, SPAMS is implemented in C++ but does not provide a C++ interface.

    I found mlpack. It provides a C++ interface for sparse coding. The drawback is that it is distributed only as source code, and it is really hard to get it compiled correctly. I'm still working on it.
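    In the meantime, the greedy core of sparse coding is small enough to prototype directly. Below is a minimal matching-pursuit sketch in Python (toy orthonormal dictionary, illustrative only; a C++ translation is mechanical):

```python
def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding: repeatedly pick the atom most correlated
    with the residual and subtract its contribution.
    Atoms are assumed to be unit-norm."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_atoms):
        best_k, best_c = None, 0.0
        for k, atom in enumerate(dictionary):
            c = sum(r * a for r, a in zip(residual, atom))
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        if best_k is None:          # residual orthogonal to all atoms
            break
        coeffs[best_k] = coeffs.get(best_k, 0.0) + best_c
        residual = [r - best_c * a
                    for r, a in zip(residual, dictionary[best_k])]
    return coeffs, residual

# toy 2-D orthonormal dictionary: coefficients recover the signal exactly
D = [[1.0, 0.0], [0.0, 1.0]]
coeffs, res = matching_pursuit([3.0, 4.0], D, 2)
```

    K-SVD alternates a sparse-coding step like this with a dictionary-update step; OMP additionally refits the selected atoms by least squares at every iteration.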

  • Alexey Mekler added an answer:
    What is an appropriate method to measure the separability of multiple clusters?

    I would like to compare two feature selection methods in an application with 10 classes. In order to be independent of the classification method (LVQ, neural networks, SVMs, etc.) I would like to focus on measuring the separability achieved by these two methods in the 10-class problem at hand. Is Dunn's index a suitable measure in this regard? Do you recommend any other measures? Looking forward to your feedback. Thank you.

    Alexey Mekler · St. Petersburg State University of Telecommunication


    I think this article of ours will be helpful for you.

    It is good for cases of nonlinear discrimination, when an ANN is used for the subsequent classification.
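    For reference, Dunn's index itself is only a few lines. The sketch below (pure Python, with toy 1-D clusters of my own invention) uses the common point-to-point form: minimum between-cluster distance divided by maximum within-cluster diameter, so higher means better separated:

```python
from itertools import combinations

def euclid(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def dunn_index(clusters):
    """Dunn's index: smallest between-cluster point distance divided by
    the largest within-cluster diameter."""
    diameters = [max((euclid(p, q) for p, q in combinations(c, 2)),
                     default=0.0)
                 for c in clusters]
    max_diam = max(diameters)
    min_sep = min(euclid(p, q)
                  for c1, c2 in combinations(clusters, 2)
                  for p in c1 for q in c2)
    return min_sep / max_diam

# two tight, well-separated 1-D clusters -> large index (about 49 here)
tight = [[(0.0,), (0.1,)], [(5.0,), (5.1,)]]
score = dunn_index(tight)
```

    For a 10-class comparison you would compute the index once per feature selection method, on the same samples, and prefer the method with the larger value.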

  • Danilo Rastovic added an answer:
    What are some good function approximation methods using fuzzy sets and logic such as fuzzy expert systems, fuzzy SVR, etc.?
    I have a project on function approximation by fuzzy decision trees and I want to compare my results with some other methods improved by fuzzy logic.
    Danilo Rastovic · Zagreb,Croatia

    You can read my papers about infinite fuzzy logic controllers (Fuzzy Sets and Systems, 1995; InterStat, 2003) if you want to work with fuzzy approximations of Lebesgue functions or fuzzy probability.

  • Bipin Kunjumon added an answer:
    Which algorithmic concept is suitable for building self-learning intelligence in a machine?

    I am using a genetic programming concept in my embedded project. So far it works fine, but I am still confused about some stages, because there are a lot of logistics operations in my project.

    Bipin Kunjumon · Amrita Vishwa Vidyapeetham

    Thank you very much, sir.

  • Shlomo Geva added an answer:
    What is the best distance measure for high dimensional data?
    Inspired by Aggarwal et al.'s paper "On the Surprising Behavior of Distance Metrics in High Dimensional Space", the question arises which distance measure is best suited for high-dimensional data. My targeted analysis tasks are: NN calculation, clustering, and the projection of data.
    Shlomo Geva · Queensland University of Technology

    I believe that the question has no good answer. There simply isn't an answer as to which distance measure is best suited for high-dimensional data, because it is an ill-defined question. It always depends on the choice of representation. Others have already commented that it first comes down to feature selection or feature engineering. Features are rarely a homogeneous set and are often mixed-mode (symbolic, discrete, continuous, etc.). How to represent objects cannot be separated from how to measure distances/similarity/dissimilarity. The two questions are not separable and must be answered simultaneously.

    But let us suppose for a moment that L1 (or some other metric) is found to be "best" in 80% of all cases ever studied in "NN calculation, clustering, and the projection of data", the target applications the question is motivated by. This L1 observation is totally useless when you come to solve a specific problem where you really want the best possible solution you can find. Will you really take the risk and not check L2, cosine similarity, or some other metric just because L1 is *usually* better? Surely you will explore quite a few options, within reason.
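    The concentration effect that Aggarwal et al. describe is easy to reproduce. The sketch below (my own toy experiment) measures the relative contrast (d_max − d_min) / d_min of distances from the origin to random points; it collapses as the dimension grows, for any fixed Lp metric:

```python
import random

def relative_contrast(dim, n_points=200, p=2, seed=0):
    """(d_max - d_min) / d_min of Lp distances from the origin to random
    points in [0, 1]^dim; shrinks as dim grows (distance concentration)."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        x = [rng.random() for _ in range(dim)]
        dists.append(sum(abs(v) ** p for v in x) ** (1.0 / p))
    return (max(dists) - min(dists)) / min(dists)

# contrast drops sharply as the dimension grows
contrasts = {d: relative_contrast(d) for d in (2, 20, 200)}
```

    The practical reading matches the answer above: no fixed p rescues you; the representation and the metric have to be chosen together for the data at hand.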

  • Ramadan Gad added an answer:
    Can anyone help with the structure of the affinity matrix in MATLAB?

    Well, I'm a novice at both MATLAB and machine learning research. Can anyone tell me how to represent the affinity matrix in the random-walk graph-matching case (i.e. what will its structure be)? E.g. for W(ia, jb), will it be a two-dimensional or a four-dimensional array (matrix)? As you can see from the example, the affinity matrix W has four indices (ia, jb). Does that mean the matrix should be four-dimensional? Thanks in advance for any kind of explanation.

    Ramadan Gad · Minoufiya University

    You could use a cell array.

    Search the MATLAB help for cell().
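    A cell array works, but the more common trick in graph-matching code is to flatten the two pair indices, so that W(ia, jb) becomes an ordinary 2-D matrix of size (n1·n2) × (n1·n2). The sketch below is Python with 0-based indices and hypothetical sizes; in MATLAB the same idea applies with the usual 1-based adjustment:

```python
def pair_index(i, a, n2):
    """Flatten the candidate assignment (node i in graph 1 -> node a in
    graph 2) into a single row/column index."""
    return i * n2 + a

n1, n2 = 3, 4                 # nodes in graph 1 and graph 2 (made up)
size = n1 * n2
# W[pair_index(i, a, n2)][pair_index(j, b, n2)] holds the affinity
# between assignments (i -> a) and (j -> b): a 2-D matrix, not 4-D.
W = [[0.0] * size for _ in range(size)]
W[pair_index(1, 2, n2)][pair_index(2, 0, n2)] = 0.7
```

    This 2-D layout is also what lets the random walk be run as an ordinary matrix-vector product over assignment scores.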

  • Is there any article that has the results of applying machine learning algorithms to the Poker Hand DataSet?

    I started working with the Poker Hand DataSet and I need to know what are the results of the state of the art to compare them with my future results. I'm thinking of using Neural Networks, Naive Bayes and Decision Trees. I would appreciate advice on other algorithms.

    Gabriel Santiago Pujol Fariña · Pontifícia Universidade Católica do Rio de Janeiro

    Thank you so much, Yaakov HaCohen-Kerner! It helped me a lot. Best regards!

    The dataset is available in:

  • Guillaume Ollivier added an answer:
    What is the best method to choose the number of topics?

    Many indicators are used to evaluate topic modeling results (log-likelihood, AIC, perplexity), but in my experience (using the R package topicmodels) their results don't converge. What can explain this? Which indicator performs best?

    Guillaume Ollivier · French National Institute for Agricultural Research

    HDP seems to be one solution, but to my knowledge there is no implementation in R. My aim is to identify the major topics, and their evolution, in a small research field.

  • Hossein Abedi added an answer:
    Mandelbrotian vs Gaussian ?

    In engineering and economic practice I encounter many applications in which the Gaussian distribution is used to model the probability of events. I know that distributions like the Gaussian have a definite mean and variance, which gives us good information about the probability of events.

    On the other hand, Mandelbrotian or fat-tailed distributions describe processes in which outlier events are not as unlikely as in processes modeled with Gaussian/Poisson/exponential distributions.

    I see many processes that could be considered Mandelbrotian but are treated as Gaussian (for example in economic predictions/forecasts or some machine learning problems).

    I wonder what gives us permission to use distributions like the Gaussian/Poisson/exponential with so much confidence in so many applications?

    Hossein Abedi · Amirkabir University of Technology

    Thank you all  for the answers.

  • Deep Patel added an answer:
    Can anyone suggest me sites where i can find good research papers?
    I am trying to find good research papers in the fields of security and machine learning, but I am unable to find enough material. Please suggest websites where I can view and download papers.
    Deep Patel · Nirma University

    At Nirma you will find them through the library; try it.

  • Mbaye Babacar Gueye added an answer:
    How can we prove the significance of features in classification?
    I have a binary classification problem. I have extracted 500 features from a set of 5000 samples using my domain knowledge. In other words, I have got hand crafted features.

    I wish to prove that these features are actually sufficient for performing the classification and that they make the two classes of samples separable, i.e. that when the samples are represented with these features, there exists a (reasonable) decision boundary.

    Please advise how I can prove this. Is there any statistically appropriate way of measuring the significance of the set of features as a whole (NOT the significance of individual features)?
    Mbaye Babacar Gueye · Pierre and Marie Curie University - Paris 6

    The simplest (and robust) way is to use PCA or KPCA for feature extraction and then link the original features to all PCs by computing the contribution of each feature to each PC. The sum of the feature contributions, weighted by the eigenvalues of the PCs, gives the importance of the feature to the whole phenomenon. This paper may help.

    The paper given by @Mahdieh can be useful.

    You can also try random forest.
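    One model-agnostic way to test the feature set as a whole (a suggestion of mine, not from the answers above) is a label-permutation test: score any cheap classifier on the real labels and on shuffled labels, and check how rarely shuffling does as well. A pure-Python sketch with a nearest-centroid classifier and made-up blob data:

```python
import random

def nearest_centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid classifier (crude but fast)."""
    classes = sorted(set(y))
    cents = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    hits = sum(1 for x, lab in zip(X, y)
               if lab == min(classes, key=lambda c: dist2(x, cents[c])))
    return hits / len(y)

def permutation_pvalue(X, y, n_perm=200, seed=0):
    """Fraction of label shufflings scoring at least as well as the real
    labels; a small p-value means the feature set separates the classes."""
    rng = random.Random(seed)
    real = nearest_centroid_accuracy(X, y)
    hits = 0
    for _ in range(n_perm):
        yp = y[:]
        rng.shuffle(yp)
        if nearest_centroid_accuracy(X, yp) >= real:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# hypothetical data: two well-separated blobs -> small p-value
rng = random.Random(1)
X = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(15)]
     + [(rng.gauss(4, 0.3), rng.gauss(4, 0.3)) for _ in range(15)])
y = [0] * 15 + [1] * 15
p = permutation_pvalue(X, y)
```

    The same scheme works with any classifier and any number of classes; the classifier only needs to be strong enough to exploit a real decision boundary when one exists.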

  • Ali M Abdulshahed added an answer:
    How to increase the number of inputs in ANFIS?

    I have read that the number of inputs in ANFIS can be at most six. I want to optimize a problem in which I have around 37 inputs, and I want to apply all the inputs simultaneously. How can I increase the number of inputs?

    Ali M Abdulshahed · University of Huddersfield


    We have solved the same problem in this paper:

    A novel approach for ANFIS modelling based on Grey system theory for thermal error compensation

  • Mahboobeh Parsapoor added an answer:
    What is the relationship between deep learning methods and reservoir computing (if any)?

    I have the intuition that these two types of methodology are related, but I could not find any references or any clear explanation of this relationship, besides the fact that they are two types of modern, novel, evolved artificial neural networks.

  • Muhammad Shahzad Cheema added an answer:
    Does anyone know the relationship between the number of support vectors and the data dimensionality in SVM?


    Does anyone know the relationship between the number of support vectors and the data dimensionality in SVM?
    Is it possible that #support vectors < #dimensions?
    If yes, will there be any bad impact on SVM operation?

    Muhammad Shahzad Cheema · University of Bonn

    "Is it possible that #support vectors < #dimensions?"

    YES. And it can happen particularly when dealing with high-dimension, low-sample-size (HDLSS) data, where attributes are sparse and the intrinsic dimensionality of the data is also high, e.g. in the classification of unconstrained videos. In such cases nearly every instance provides some critical "support" and becomes part of the decision boundary. Still, optimal SVMs may achieve performance equal to or better than other classifiers, at some extra/unnecessary computational cost.

    Now, if your problem is an HDLSS problem, have a look for some insight and a remedy in Chapter 5 of

  • Volker Lohweg added an answer:
    If we obtain many results from different models given the same input, is there any technique to select the best result corresponding to that input?

    We trained on the same input data using five different methods (machine learning models). As a result, some of them output the same results and some different ones, and we don't know which result is the best.

    Question: is there any scoring function or classification method which allows to select the best result from the obtained outputs?
    e.g., outputs A, A, A, A, B  ==> Best output = A

            outputs A, B, A, A, B ==> Best output = B


    Volker Lohweg · Hochschule Ostwestfalen-Lippe

    Try this. It will work.

    Generalisation Ability of Test Samples
    You can test the generalisation ability of an approach with the 5×2cv F cross-validation method (meaning five replications of 2-fold cross-validation [1]). The method originates in [2], where it was pointed out that after five folds the sets overlap and share many statistics, implying that new folds do not add new information.

    [1] E. Alpaydın, Introduction to Machine Learning, 2nd ed. Cambridge:
    The MIT Press, 2010.

    [2] T. G. Dietterich, “Approximate Statistical Tests for Comparing Supervised
    Classification Learning Algorithms,” Neural Computation, vol. 10,
    pp. 1895–1923, 1998.

    We have applied the aforementioned method in (the papers will be published in IEEE Xplore at the end of September or the beginning of October):

    Dörksen, Helene; Mönks, Uwe; Lohweg, Volker: Fast Classification in Industrial Big Data Environments. In: 19th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Barcelona, Spain, Sep 2014.

    Dörksen, Helene; Lohweg, Volker: Combinatorial Refinement of Feature Weighting for Linear Classification. In: 19th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Barcelona, Spain, Sep 2014.
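    For concreteness, the combined 5×2cv F statistic from [2] is straightforward to compute once you have the ten per-fold error differences between the two approaches; the difference values below are hypothetical:

```python
def five_by_two_cv_f(diffs):
    """Combined 5x2cv F statistic (Alpaydin's form of Dietterich's test).

    diffs: five pairs (d_i1, d_i2) of error-rate differences between two
    classifiers, one pair per replication of 2-fold cross-validation.
    Under the null hypothesis of equal error rates the statistic is
    approximately F-distributed with (10, 5) degrees of freedom.
    """
    assert len(diffs) == 5
    num = sum(d1 ** 2 + d2 ** 2 for d1, d2 in diffs)
    den = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        den += (d1 - mean) ** 2 + (d2 - mean) ** 2  # variance term s_i^2
    return num / (2.0 * den)

# hypothetical error differences from five 2-fold CV replications;
# compare the result with the F(10, 5) critical value, about 4.74
# at the 0.05 level
diffs = [(0.10, 0.12), (0.08, 0.11), (0.09, 0.10),
         (0.12, 0.09), (0.11, 0.13)]
f_stat = five_by_two_cv_f(diffs)
```

    With consistently one-sided differences like these the statistic lands well above the critical value, i.e. the two approaches would be judged significantly different.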

  • Volker Lohweg added an answer:
    What's the current hot direction in Data Mining, Machine Leaning and Computer Vision?

    See above. 

    Volker Lohweg · Hochschule Ostwestfalen-Lippe

    Well, this is a very broad question, therefore you get some short answers:

    1) Big Data approaches are really a hot topic in engineering, services, and banking, as well as in investigations. Lots of research is necessary.

    2) Machine learning (really difficult): deep learning is an interesting and relatively new topic. You will find interesting approaches for applications in the field of image processing. However, it is worthwhile to do research in other directions, like acoustics (speech processing) and especially semantics.

    3) In image processing, interesting topics are fusion approaches in cobotics (collaborative machines) and human-machine interaction.

    A general comment: all of the above topics are basics for Industry 4.0 and cyber-physical production system concepts (really hot in Europe).

  • Francesco Orabona added an answer:
    Why does regular SGD fail to produce sparse solutions?

    Stochastic Gradient Descent (SGD) is fast for optimizing many convex objectives. But why does it fail to produce sparse solutions? Is there an intuitive explanation?

    To further clarify my question: while coordinate descent is "naturally" suited to producing sparse solutions, why does GD lack this ability before any fix is added to it?

    Or, what is the difference between CD and GD that makes one sparsity-inducing and the other not?

    Francesco Orabona · Toyota Technological Institute at Chicago

    The rate of convergence of an algorithm usually depends on the geometry of the objective function and on whether the optimization is stochastic or not. An objective function that induces sparse solutions does not necessarily have to be a difficult/slow one to optimize. See, for example, elastic-net regularized objectives, which are strongly convex and induce sparse solutions. Also, algorithms that produce sparse intermediate solutions usually have the same or better (theoretical) speed of convergence as the ones that do not. A last point is that it might be non-trivial to produce a sparse solution from a non-sparse one while also guaranteeing the same error tolerance.

    Hence, yes it might be possible, but it would be very specific to your objective function and probably non-trivial.
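    The contrast is easy to see numerically. The toy below (my own sketch: least squares plus L1 on made-up data) runs a plain (sub)gradient step against a proximal/soft-thresholding step; only the latter produces exact zeros on the irrelevant features, because the subgradient step keeps nudging weights past zero instead of landing on it:

```python
import random

def l1_regression(X, y, lam=0.1, lr=0.1, iters=500, proximal=False):
    """Batch gradient descent on mean squared error + lam * ||w||_1.

    proximal=False: plain subgradient step on the whole objective;
                    weights hover near zero but essentially never hit it.
    proximal=True : ISTA-style step (gradient on the loss, then
                    soft-thresholding), which sets small weights to 0.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        for j in range(d):
            w[j] -= lr * grad[j]
            t = lr * lam
            if proximal:
                # soft-threshold: exact zero inside [-t, t]
                w[j] = w[j] - t if w[j] > t else w[j] + t if w[j] < -t else 0.0
            elif w[j] != 0.0:
                # subgradient of lam * |w|: a fixed push toward zero
                w[j] -= t if w[j] > 0 else -t
    return w

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [2.0 * xi[0] for xi in X]          # only feature 0 matters
w_sub = l1_regression(X, y, proximal=False)
w_prox = l1_regression(X, y, proximal=True)
```

    The same soft-thresholding idea is what proximal/truncated-gradient variants add on top of plain SGD to recover sparsity.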

  • Scott Lett added an answer:
    Is there any publicly available Hyperspectral data-set which can be used for detection of hidden/ camouflaged objects like tanks, mines etc?

    Hyperspectral data available for public is limited and I am unable to find any data for the purpose mentioned above. Any help will be greatly appreciated. 

    Scott Lett · Consulting Data Detective

    I don't speak for them, but you might try contacting Exelis Visual Information Solutions, who sell ENVI.  They may know where some freely available data sets might be located.  NASA might be another place to ask.  Good luck!

  • Chris Basta added an answer:
    What are the most effective adaptive learning techniques to train data to learn about tokens and sql selects?
    See above
    Chris Basta · Alexandria University

    Thanks, Pedro, for trying to help me. Appreciated.

  • Manjusha Kulshrestha added an answer:
    Which one is best for implementing machine learning algorithms and statistical modelling, SAS, R, or MATLAB ?
    If all of the above are available, which one should be chosen to implement machine learning algorithms and statistical modelling?
  • Prabhash Kumarasinghe added an answer:
    Where to find the original paper of Rosenblatt's Perceptron algorithm? (The perceptron, a perceiving and recognizing automaton, Rosenblatt, F., 1957)

    I would like to read the original paper about the perceptron by Rosenblatt from 1957. It was one of the pivotal points for the machine learning community. I would be much obliged if you could direct me to a database or an online link, etc.

    Thank you.


    Rosenblatt, Frank. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957.

    Prabhash Kumarasinghe · Nanyang Technological University

    Thank you very much, Simone. Appreciate your well-elaborated answer.

  • Manoj Karathiya added an answer:
    What is the difference between machine learning and data mining ?
    Is ML related to the algorithms and DM to the data?
    Manoj Karathiya · Atmiya Institute Of Technology & Science

    Data mining and machine learning used to be two cousins. They have different parents. Now they grow increasingly like each other, almost like twins. Many people even call data mining by the name machine learning.
    The field of machine learning grew out of the effort of building artificial intelligence. Its major concern is making a machine learn and adapt to new information. The origin of machine learning can be traced back to 1957, when the perceptron model was invented. It is modeled after neurons in the human brain, and it prompted the development of the neural network model, which flourished in the late 1980s. From the 1980s to the 1990s, the decision tree method became very popular, owing to the efficient C4.5 package. SVMs were invented in the mid-1990s and have since been widely used in industry. Logistic regression, an old method in statistics, has seen growing adoption in machine learning since 2001, when the book on statistical learning (The Elements of Statistical Learning) was published.
    The field of data mining grew out of knowledge discovery in databases. In 1993, a seminal paper by Rakesh Agrawal and two others proposed an efficient algorithm for mining association rules in large databases. This paper prompted many research papers on discovering frequent patterns and more efficient mining algorithms. The early work on data mining in the 1990s was linked to creating better SQL statements and working with databases directly.
    Data mining has a strong focus on working with industrial problems and getting practical solutions. Therefore it is concerned not only with data size (large data), but also with data processing speed (stream data). In addition, personalized recommender systems and network mining were developed to meet business needs, outside the machine learning field.
    The two major conferences for data mining are KDD (Knowledge Discovery and Data Mining) and ICDM (International Conference on Data Mining). The two major conferences for machine learning are ICML (International Conference on Machine Learning) and NIPS (Neural Information Processing Systems). Machine learning researchers attend both types of conferences. However, the data mining conferences have much stronger industrial links.
    Data miners typically have a strong foundation in machine learning, but also a keen interest in applying it to large-scale problems.
    Over time, we will see a deeper connection between data mining and machine learning. Could they become twins one day? Only time will tell.

  • Eren Erdal Aksoy added an answer:
    What are the current state-of-the-art methods for human detection and segmentation in static images?
    I am looking for state-of-the-art methods to capture the human segments of images (mostly clothing images that include only one person against a natural background, for example). The next stage will be the segmentation of the person's clothes. If you know of any studies touching on these topics, it would be great to know.
    Eren Erdal Aksoy · Georg-August-Universität Göttingen

    Hi Eren, 

    you can even use this web page to segment your images; it should distinguish clothes and skin...



  • Jose Luis Lopez Pino added an answer:
    How can I use R for datasets larger than the machine's RAM?

    I am implementing statistical models for my project, which involves very large data. I have used R for this and now want to apply a machine learning model, but R runs into problems loading the data into RAM, or sometimes performs operations on part of the data and then throws an error and stops working. I need a solution so that the operations can be carried out automatically. I have tried the bigmemory and ff packages as well, but they are not working. Is there any solution to this problem?

    Jose Luis Lopez Pino · Technische Universität Berlin

    Hi @Mayur

    I classify the solutions into four groups: external memory (bgml, bigmemory), divide and recombine (rmr, rhype, etc.), query languages (RMySQL and RSQLite would be here), and distributed collection manipulation (Presto, SparkR). My slides on it:

    If you want to do data mining, the query languages are more than likely not enough. I would start with the distributed collection manipulation packages.

  • Angel Kuri added an answer:
    Why is back-propagation still used extensively to train ANN while it is beaten by GA?
    I have just read the article "Training Feedforward Neural Networks Using Genetic Algorithms" by David J. Montana. Experiment 5 compares BP and GA for training an ANN; the results show that GA is faster and gives a smaller error margin than BP. If those results can be generalized, why does BP dominate ANN training?

    the link to the paper:
    Angel Kuri · Instituto Tecnológico Autónomo de México (ITAM)

    One possible advantage of GAs over gradient-descent methods is that in the former the metric is arbitrary, whereas in the latter L2 is usually applied. This does not mean that L2 is inferior, but GAs applied to training NNs can be more flexible, at least in this respect.

    Comments regarding the length (in terms of lines of code) of one approach vs. another tend to be secondary if the "larger" implementation leads to better answers.
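    As a toy illustration of the flexibility point (my own sketch, not Montana's setup), the GA below trains a linear neuron under an L1 error, a metric plain backpropagation would not use by default; the fitness function can be swapped for anything, differentiable or not:

```python
import random

def ga_train(fitness, dim, pop_size=40, generations=150, sigma=0.3, seed=0):
    """Minimal real-valued GA: elitism, blend crossover between elite
    parents, Gaussian mutation. `fitness` is minimized and can be any
    metric -- this is the flexibility argued for above."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        elite = scored[: pop_size // 4]          # keep the best quarter
        children = list(elite)                   # elitism: best survive
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)          # two elite parents
            child = [(x + y) / 2 + rng.gauss(0, sigma)
                     for x, y in zip(a, b)]      # blend + mutate
            children.append(child)
        pop = children
    return min(pop, key=fitness)

# hypothetical task: recover w = (3, -1) for y = 3*x0 - x1, under L1 error
data = [((x0, x1), 3 * x0 - x1)
        for x0 in range(-2, 3) for x1 in range(-2, 3)]
l1_loss = lambda w: sum(abs(w[0] * x0 + w[1] * x1 - y)
                        for (x0, x1), y in data)
w = ga_train(l1_loss, dim=2)
```

    Because the elite always survives, the best fitness is monotonically non-increasing over generations, regardless of which error metric is plugged in.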

  • Luis Enrique Sucar added an answer:
    What is the difference between cause/effect and feature vector/label?

    I participated in a contest a while ago where the goal was to determine the cause of an effect from a training set of causes and effects, finding the regularity between them and then using it to classify the test data.

    I wonder whether this is the same as a usual classification task or somehow different.

    Thanks in advance.

    Luis Enrique Sucar · Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)

    Although it is indeed a controversial aspect, there have recently been some interesting developments based on graphical models (see the work of J. Pearl). In principle you can distinguish correlation from a causal relation by doing interventions, that is, controlled experiments.

  • Yogesh Vasant Kadam added an answer:
    Can anyone suggest a book for learning Hadoop with minimum resources (like with a single machine)?
    Yogesh Vasant Kadam · Bharati Vidyapeeth's Institute of Technology (Poly.), Palus, Dist. Sangli

    is the best link to start with the basics of Hadoop, and none other than the "Hadoop: The Definitive Guide" book is a good resource for learning the programming concepts of Hadoop.

  • Sebastian Raschka added an answer:
    How can I increase the automatic detection of a small local feature as part of a probability density function?

    I have an interesting challenge: detecting a very small local feature in a 1-D probability density function. The task is very straightforward for a human being, but it appears to be quite cumbersome to do automatically, which is my aim (actually, I can detect it in 60% of cases, but that is very low for my application).

    I have attached an example: on the left-hand side there is nothing to detect, and on the right-hand side there is a little peak (the one I am interested in), as indicated by the arrow.

    Many thanks!

    Sebastian Raschka · Michigan State University

    If you are interested in a Python implementation for peak detection, I think this can be very useful and is probably what you are looking for:
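    If an external dependency is unwelcome, a few lines already get you far. The sketch below (pure Python, with toy density values I made up) keeps a local maximum only if it rises above the surrounding valleys by a crude prominence threshold, which is usually enough to reject ripple while keeping a small genuine bump:

```python
def find_peaks(y, min_prominence=0.0):
    """Indices of local maxima whose height exceeds the higher of the
    minima to their left and right by at least `min_prominence`
    (a crude prominence test, not the full topographic definition)."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] >= y[i + 1]:
            left = min(y[:i])          # lowest value to the left
            right = min(y[i + 1:])     # lowest value to the right
            if y[i] - max(left, right) >= min_prominence:
                peaks.append(i)
    return peaks

# toy density: one big peak plus the small bump of interest
pdf = [0.0, 0.2, 0.8, 1.0, 0.7, 0.3, 0.12, 0.15, 0.1, 0.05, 0.0]
found = find_peaks(pdf, min_prominence=0.02)
```

    Tuning `min_prominence` is the whole game: too low and noise ripple survives, too high and the small shoulder peak is lost along with it.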
