This study proposes a computational video-analysis pipeline that uses OpenPose for keypoint detection, an RNN-LSTM network to construct 12 gesture classifiers, and data augmentation and early stopping for performance optimization. Using accuracy, precision, recall, and F1 score as measures, the study compares three approaches (the vanilla approach, the data-augmentation approach, and the epoch-optimization approach), each of which improves model performance over the previous one for all gesture features. The results suggest that combining data augmentation with early stopping can effectively address the class-imbalance problem faced by customized datasets and substantially increase accuracy and F1 scores by 10–20%, achieving a satisfactory accuracy of 70%–90% for most gestures.
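As an illustration of why F1 is reported alongside accuracy on an imbalanced dataset, and of what an early-stopping rule of this kind might look like, the following is a minimal sketch in plain Python. It is not the study's implementation: the function names `binary_metrics` and `early_stopping_epoch`, the `patience` parameter, and the toy labels and loss values are all assumptions made for this example.

```python
from math import inf


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for one binary gesture classifier.

    Labels are 1 (gesture present) or 0 (gesture absent). Precision, recall,
    and F1 are defined as 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch with the lowest validation loss seen before
    training stops, where training stops once the loss has failed to improve
    for `patience` consecutive epochs (hypothetical rule for illustration).
    """
    best_loss, best_epoch, wait = inf, -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch


# Toy imbalanced data (assumed): 2 positive clips out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(binary_metrics(y_true, always_negative))

# Toy validation-loss curve (assumed values).
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]))
```

In this toy case a classifier that never predicts the gesture still reaches 80% accuracy while its F1 score is 0, which is why accuracy alone can be misleading when positive examples are rare.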