Two dominant classes of modern approaches for the detection and classification of focal lesions are bag-of-visual-words methods and end-to-end learning machines. In this study, we reviewed and compared these approaches for lung nodule detection, colorectal polyp detection, and lung nodule classification in CT images. Specifically, we considered massive-training artificial neural networks (MTANNs) and convolutional neural networks (CNNs) as representatives of end-to-end learning machines, and Fisher vectors as a representative of the bag of visual words. We first compared CNNs with Fisher vectors in nodule detection, nodule classification, and polyp detection, and found that the best performing CNN model achieved performance comparable to that of Fisher vectors. We also analyzed the performance of CNNs of varying depth in the 3 studied applications. Our experiments showed that CNN architectures with 3 or 4 convolutional layers were more effective than shallower architectures, but we did not observe a further performance gain from deeper architectures. We then compared CNNs with MTANNs and found that MTANNs outperformed CNNs in nodule detection and classification, particularly when training data were limited. For nodule detection, the MTANNs generated 0.08 false positives per section at 100% sensitivity, significantly (p < 0.05) fewer than the 0.67 false positives per section generated by the best performing CNN model at the same sensitivity. In conclusion, the best performing CNN model performed comparably to Fisher vectors in all 3 applications, whereas MTANNs outperformed CNNs in nodule detection and classification, especially with limited training data.
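The detection results above are reported as false positives per section at 100% sensitivity. A minimal sketch of how such a figure can be computed from candidate scores is given below; it assumes a simplified setting in which each true lesion is represented by exactly one scored candidate, and the function name `fps_per_section_at_full_sensitivity` is hypothetical, not taken from the study.

```python
from typing import Sequence


def fps_per_section_at_full_sensitivity(
    scores: Sequence[float],
    is_lesion: Sequence[bool],
    num_sections: int,
) -> float:
    """Return false positives per section at the threshold giving 100% sensitivity.

    Hypothetical simplification: each true lesion corresponds to exactly one
    candidate, so sensitivity is 100% when every lesion candidate is kept.
    """
    if len(scores) != len(is_lesion):
        raise ValueError("scores and is_lesion must have the same length")
    if num_sections <= 0:
        raise ValueError("num_sections must be positive")
    lesion_scores = [s for s, y in zip(scores, is_lesion) if y]
    if not lesion_scores:
        raise ValueError("at least one true lesion is required")
    # The highest threshold that still detects every lesion is the lowest lesion score.
    threshold = min(lesion_scores)
    # Non-lesion candidates scoring at or above that threshold are false positives.
    false_positives = sum(
        1 for s, y in zip(scores, is_lesion) if not y and s >= threshold
    )
    return false_positives / num_sections


# Illustrative toy data (not from the study): 3 lesions, 3 non-lesion candidates, 10 sections.
print(fps_per_section_at_full_sensitivity(
    [0.9, 0.8, 0.3, 0.85, 0.2, 0.5],
    [True, True, True, False, False, False],
    10,
))  # prints 0.2
```

Under this sketch, a lower value at the same 100% sensitivity means the detector separates lesions from non-lesions more cleanly, which is the sense in which 0.08 is better than 0.67 false positives per section.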