Semi-supervised video anomaly detection assumes that only normal video clips are available for training. The intuitive approaches are therefore either to learn a dictionary via sparse coding or to train encoder-decoder neural networks by minimizing reconstruction errors. For the former, optimizing the sparse coefficients at test time is extremely time-consuming. For the latter, the strong generalization ability of neural networks means a larger reconstruction error is not guaranteed for abnormal data. To remedy their weaknesses and leverage their strengths, we propose a Fast Sparse Coding Network (FSCN) based on High-level Features. First, we propose a two-stream neural network that extracts Spatial-Temporal Fusion Features (STFF) in its hidden layers. With the STFF at hand, the FSCN then builds a normal dictionary. By leveraging a learned predictor to produce approximate sparse coefficients, the FSCN generates sparse coefficients within a single forward pass, which is simple and computationally efficient. Compared with traditional sparse-coding-based methods, the FSCN is hundreds or even thousands of times faster at the test stage. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance.
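The abstract does not specify the predictor's architecture, so the following is only a minimal sketch of the general idea it describes: replacing iterative sparse-coefficient optimization with a single learned forward pass (in the spirit of LISTA-style encoders), and scoring anomalies by reconstruction error under a normal dictionary. All names here (SinglePassSparseCoder, soft_threshold) and the dimensions are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, theta):
    # Elementwise soft-thresholding: the proximal operator of the L1 norm,
    # which induces sparsity in the predicted coefficients.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

class SinglePassSparseCoder:
    """Hypothetical single-pass sparse-coding predictor.

    A learned encoder W and threshold theta approximate the sparse codes
    that iterative sparse coding over a dictionary D would produce, so the
    test stage needs only one matrix multiply and one thresholding step.
    """

    def __init__(self, feature_dim, dict_size, seed=0):
        rng = np.random.default_rng(seed)
        # Normal dictionary with unit-norm atoms (learned from normal
        # training features in practice; random here for a runnable sketch).
        self.D = rng.standard_normal((feature_dim, dict_size))
        self.D /= np.linalg.norm(self.D, axis=0, keepdims=True)
        self.W = self.D.T.copy()   # encoder initialized from the dictionary
        self.theta = 0.1           # sparsity threshold (learned in practice)

    def encode(self, f):
        # One forward pass: no per-sample iterative optimization at test time.
        return soft_threshold(self.W @ f, self.theta)

    def anomaly_score(self, f):
        # Reconstruction error under the normal dictionary; a large error
        # suggests the feature is poorly represented by normal patterns.
        a = self.encode(f)
        return np.linalg.norm(f - self.D @ a)

# Usage: score a batch of (hypothetical) 512-dimensional STFF vectors.
coder = SinglePassSparseCoder(feature_dim=512, dict_size=256, seed=0)
features = np.random.default_rng(1).standard_normal((8, 512))
scores = [coder.anomaly_score(f) for f in features]
```

The speedup claimed in the abstract follows from this structure: iterative sparse coding solves an optimization problem per test sample, whereas a single-pass predictor amortizes that cost into training and leaves only a fixed, small number of operations at inference.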