Article

# A MapReduce based distributed SVM algorithm for binary classification

Turkish Journal of Electrical Engineering and Computer Sciences (Impact Factor: 0.57). 12/2013;

- Citations (14)
- Cited In (0)

- [Show abstract] [Hide abstract]

**ABSTRACT:**Machine rule induction was examined on a difficult categorization problem by applying a Holland-style classifier system to a complex letter recognition task. A set of 20,000 unique letter images was generated by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The parent fonts represented a full range of character types including script, italic, serif, and Gothic. The features of each of the 20,000 characters were summarized in terms of 16 primitive numerical attributes. Our research focused on machine induction techniques for generating IF-THEN classifiers in which the IF part was a list of values for each of the 16 attributes and the THEN part was the correct category, i.e., one of the 26 letters of the alphabet. We examined the effects of different procedures for encoding attributes, deriving new rules, and apportioning credit among the rules. Binary and Gray-code attribute encodings that required exact matches for rule activation were compared with integer representations that employed fuzzy matching for rule activation. Random and genetic methods for rule creation were compared with instance-based generalization. The strength/specificity method for credit apportionment was compared with a procedure we call “accuracy/utility.”Machine Learning 01/1991; 6:161-182. · 1.47 Impact Factor -
##### Article: Principal component analysis

[Show abstract] [Hide abstract]

**ABSTRACT:**Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. Copyright © 2010 John Wiley & Sons, Inc.For further resources related to this article, please visit the WIREs website.Wiley Interdisciplinary Reviews: Computational Statistics. 06/2010; 2(4):433 - 459. -
##### Article: Are loss functions all the same?

[Show abstract] [Hide abstract]

**ABSTRACT:**In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.Neural Computation 06/2004; 16(5):1063-76. · 1.76 Impact Factor

Data provided are for informational purposes only. Although carefully collected, accuracy cannot be guaranteed. The impact factor represents a rough estimation of the journal's impact factor and does not reflect the actual current impact factor. Publisher conditions are provided by RoMEO. Differing provisions from the publisher's actual policy or licence agreement may be applicable.