Question
What is the best algorithm for a classification task?
Classification is one of the main data mining tasks and is applied in many areas, especially in medical applications. One challenge in using this technique is selecting the appropriate algorithm for each data set. This leads to two questions: "What is the best algorithm for a classification task?" and "How do you select it?"
Popular Answers
First of all, I assume you already know this but it is always important to remember: there is no single algorithm that is better than all the others on all the problems. This is discussed by several authors. The most noteworthy is probably Wolpert:
Wolpert DH. The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation. 1996;8:1341-1390.
Therefore, for each problem, you must select the right algorithm. Your question is how to do this. If you have plenty of computational resources, you can test multiple algorithms and parameter settings. In this approach, the main question is how to estimate and compare the performance of the algorithms in a reliable way. This has also been the object of a significant amount of research in statistics, machine learning and data mining. I particularly like Dietterich's paper but, more recently, the work of Demsar is the basis for many comparisons:
Dietterich TG. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation. 1998;10(7):1895-1924.
Demsar J. Statistical Comparisons of Classifiers over Multiple Data Sets. The Journal of Machine Learning Research. 2006;7:1-30.
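To make the "test multiple algorithms" approach concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic two-blob data, the two toy learners (a majority-class baseline and 1-nearest-neighbour) and the function names. It only shows the key idea of evaluating every candidate on the same cross-validation folds, so that the per-fold scores are paired. It does not implement the formal tests discussed in the papers above (for example the 5x2cv paired t-test in Dietterich, or the Wilcoxon and Friedman tests in Demsar).

```python
import random

def make_data(n_per_class=50, seed=0):
    """Synthetic 2-D data: two Gaussian blobs (purely illustrative)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data

def majority_classifier(train):
    """Baseline learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def one_nn_classifier(train):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def predict(x):
        nearest = min(
            train,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return nearest[1]
    return predict

def cross_val_accuracies(data, learner, k=10):
    """Return the k per-fold accuracies of `learner` on `data`.

    The folds depend only on `data` and `k`, so two learners evaluated
    with the same arguments see exactly the same train/test splits.
    """
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        predict = learner(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores

if __name__ == "__main__":
    data = make_data()
    baseline = cross_val_accuracies(data, majority_classifier)
    one_nn = cross_val_accuracies(data, one_nn_classifier)
    # Paired differences: same fold, different learner.
    diffs = [b - a for a, b in zip(baseline, one_nn)]
    print("mean accuracy, baseline:", sum(baseline) / len(baseline))
    print("mean accuracy, 1-NN:    ", sum(one_nn) / len(one_nn))
    print("mean paired difference: ", sum(diffs) / len(diffs))
```

In practice you would replace the toy learners with real ones and feed the paired per-fold scores into a proper statistical test rather than just comparing means.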
If you want to predict a priori which algorithm will be the best for a given problem, then one approach you can use is metalearning. Essentially, you collect (meta)data about problems and about the performance of algorithms on those problems. By applying a learning algorithm to those (meta)data, you get a (meta)model that can be used to predict the (relative) performance of algorithms on new problems. We have been working on this problem for many years, and recently there has been growing interest in the area. There are already some books:
Brazdil P, Giraud-Carrier C, Soares C, Vilalta R. Metalearning: Applications to Data Mining. Berlin, Heidelberg: Springer; 2009:176. Available at: http://dblp.uni-trier.de/db/series/cogtech/index.html#0022052. Accessed February 6, 2012.
Jankowski N, Duch W, Grąbczewski K. Meta-Learning in Computational Intelligence. Springer; 2011:362. Available at: http://www.springer.com/engineering/computational+intelligence+and+complexity/book/978-3-642-20979-6.
and some interesting surveys have been published recently:
Serban F, Vanschoren J, Kietz JU, Bernstein A. A survey of intelligent assistants for data analysis. ACM Computing Surveys. 2012;(in press).
Smith-Miles KA. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput. Surv. 2008;41(1):1-25. Available at: http://portal.acm.org/ft_gateway.cfm?id=1456656&type=pdf&coll=Portal&dl=GUIDE&CFID=67145908&CFTOKEN=66408030.
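The metalearning idea described above can be sketched as a toy pipeline in plain Python. All names, the two synthetic problem families ("blobs" and "xor"), the two candidate learners, and the two hand-picked meta-features are assumptions for illustration only. They are not taken from the books or surveys listed here, and real metalearning systems use far richer meta-features and meta-models.

```python
import random

def make_problem(kind, rng, n=200):
    """Toy 2-D binary problem: overlapping 'blobs' or an 'xor' pattern."""
    data = []
    for _ in range(n):
        if kind == "blobs":
            y = rng.randint(0, 1)
            x = (rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0))
        else:  # xor: class 1 when the two coordinates have different signs
            x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            y = int((x[0] > 0) != (x[1] > 0))
        data.append((x, y))
    return data

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(train):
    """Learner 1: predict the class whose training centroid is closest."""
    classes = {label for _, label in train}
    cents = {y: centroid([x for x, label in train if label == y]) for y in classes}
    return lambda x: min(cents, key=lambda y: dist2(cents[y], x))

def one_nn(train):
    """Learner 2: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda item: dist2(item[0], x))[1]

LEARNERS = {"nearest_centroid": nearest_centroid, "1-nn": one_nn}

def holdout_accuracy(data, learner):
    """Train on the first half of the data, test on the second half."""
    half = len(data) // 2
    predict = learner(data[:half])
    test = data[half:]
    return sum(predict(x) == y for x, y in test) / len(test)

def meta_features(data):
    """Two hand-picked meta-features: centroid separation, class balance."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return (dist2(centroid(pos), centroid(neg)), len(pos) / len(data))

def build_meta_data(rng, n_problems=20):
    """One meta-example per problem: (meta-features, best learner name)."""
    meta = []
    for i in range(n_problems):
        data = make_problem("blobs" if i % 2 == 0 else "xor", rng)
        scores = {name: holdout_accuracy(data, l) for name, l in LEARNERS.items()}
        meta.append((meta_features(data), max(scores, key=scores.get)))
    return meta

def recommend(meta, data):
    """Meta-model: 1-NN over (unscaled) meta-features of past problems."""
    target = meta_features(data)
    return min(meta, key=lambda row: dist2(row[0], target))[1]

if __name__ == "__main__":
    rng = random.Random(0)
    meta = build_meta_data(rng)
    new_problem = make_problem("xor", rng)
    print("recommended learner:", recommend(meta, new_problem))
```

The recommendation is only as good as the meta-features: here the centroid separation happens to distinguish the two toy problem families, which is exactly the kind of problem characteristic a real system would have to discover or engineer.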
I hope this helps!
Cheers,
Carlos
There are data sets on which one algorithm works well and another does not.