Classification is a central task in data mining, and among classification algorithms, decision trees are favored by researchers for their clarity and readability. ID3, a heuristic algorithm, is a classic and popular method for inducing decision trees. The key idea of ID3 is to use information gain as the criterion for selecting test attributes. The ID3 algorithm, however, tends to choose attributes with many values as splitting nodes, and such attributes are often not the best choice. In this paper, an improved information gain, based on the dependency degree of the condition attributes on the decision attribute, is used as the heuristic for selecting the optimal splitting attribute, in order to overcome this shortcoming of the traditional ID3 algorithm. Experiments show that the decision trees generated by the improved algorithm are superior to those generated by ID3 in both tree size and classification accuracy.
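The two quantities the abstract combines can be sketched as follows. This is an illustrative sketch, not the paper's exact formulation: it computes the classic information gain used by ID3, and a rough-set-style dependency degree gamma(A, D) = |POS_A(D)| / |U| (the fraction of examples whose block under attribute A is pure with respect to the decision attribute), which is one common way to measure how strongly a condition attribute determines the decision attribute. All function and variable names here are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def partition(rows, attr, decision):
    """Group decision labels by the value of `attr` (rows are dicts)."""
    blocks = {}
    for r in rows:
        blocks.setdefault(r[attr], []).append(r[decision])
    return blocks

def info_gain(rows, attr, decision):
    """ID3's criterion: entropy reduction from splitting on `attr`."""
    base = entropy([r[decision] for r in rows])
    blocks = partition(rows, attr, decision)
    remainder = sum(len(v) / len(rows) * entropy(v) for v in blocks.values())
    return base - remainder

def dependency_degree(rows, attr, decision):
    """Rough-set dependency: fraction of rows in decision-pure blocks."""
    blocks = partition(rows, attr, decision)
    pos = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return pos / len(rows)

# Small worked example (hypothetical data): attribute 'a' determines
# the decision 'd' completely, attribute 'b' carries no information.
rows = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
]
print(info_gain(rows, "a", "d"))          # 1.0
print(dependency_degree(rows, "a", "d"))  # 1.0
print(info_gain(rows, "b", "d"))          # 0.0
```

A multi-valued attribute (e.g. a record ID, unique per row) would maximize plain information gain while being useless for prediction; weighting or adjusting the gain by a dependency-style measure, as the paper proposes, is intended to counteract that bias.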
Seventh International Conference on Natural Computation, ICNC 2011, Shanghai, China, 26-28 July, 2011; 01/2011