Chapter Five begins with a discussion of the differences between supervised and unsupervised methods. In unsupervised methods, no target variable is identified as such. Most data mining methods are supervised methods, however, meaning that (a) there is a particular pre-specified target variable, and (b) the algorithm is given many examples where the value of the target variable is provided, so that the algorithm may learn which values of the target variable are associated with which values of the predictor variables. A general methodology for supervised modeling is provided, for building and evaluating a data mining model. The training data set, test data set, and validation data sets are discussed. The tension between model overfitting and underfitting is illustrated graphically, as is the bias-variance tradeoff. High complexity models are associated with high accuracy and high variability. The mean-square error is introduced, as a combination of bias and variance. The general classification task is recapitulated. The k-nearest neighbor algorithm is introduced, in the context of a patient-drug classification problem. Voting for different values of k are shown to sometimes lead to different results. The distance function, or distance metric, is defined, with Euclidean distance being typically chosen for this algorithm. The combination function is defined, for both simple unweighted voting and weighted voting. Stretching the axes is shown as a method for quantifying the relevance of various attributes. Database considerations, such as balancing, are discussed. Finally, k-nearest neighbor methods for estimation and prediction are examined, along with methods for choosing the best value for k.