Unformatted text is hard for humans to read and degrades the performance of many downstream language understanding tasks. To improve readability, this paper proposes a multitask deep neural model that restores formatting conventions, including punctuation and capitalization. Unlike prior research, which usually solved a single task or handled several tasks separately, our model employs multitask learning to perform the restoration tasks simultaneously. The model consists of a backbone network that learns language features and attention-based predictors for the two tasks. To find an efficient encoding method for unformatted text, we analyze the model's behaviour with different backbone architectures, such as convolutional neural networks (CNNs) and unidirectional and bidirectional recurrent networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show promising results for both restoration tasks and demonstrate the applicability of our model.
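
To make the two restoration tasks concrete, the sketch below frames them as per-token labeling, which is one common way to set up such a problem and is assumed here rather than taken from the paper. Each lowercase, unpunctuated token receives one capitalization label and one punctuation label, which is the kind of paired output two task-specific predictors could share a backbone to produce. The label names (`UPPER`, `LOWER`, `O`), the punctuation set, and the helper functions `to_labels` and `restore` are all hypothetical and chosen for illustration only.

```python
# Illustrative only: the label scheme and punctuation set are assumptions,
# not the paper's actual configuration.
PUNCT = {",", ".", "?", "!"}


def to_labels(formatted: str):
    """Turn formatted text into (token, cap_label, punct_label) triples."""
    triples = []
    for word in formatted.split():
        punct = "O"  # "O" means no punctuation follows this token
        if word[-1] in PUNCT:
            punct = word[-1]
            word = word[:-1]
        if not word:  # skip stray punctuation with no attached word
            continue
        # Simplification: only first-letter capitalization is recorded,
        # so all-caps words such as acronyms are not recoverable.
        cap = "UPPER" if word[0].isupper() else "LOWER"
        triples.append((word.lower(), cap, punct))
    return triples


def restore(tokens, cap_labels, punct_labels):
    """Rebuild formatted text from tokens and the two label sequences."""
    words = []
    for tok, cap, punct in zip(tokens, cap_labels, punct_labels):
        word = tok[:1].upper() + tok[1:] if cap == "UPPER" else tok
        if punct != "O":
            word += punct
        words.append(word)
    return " ".join(words)


if __name__ == "__main__":
    triples = to_labels("Hello, how are you? I am in Hanoi.")
    tokens = [t for t, _, _ in triples]
    caps = [c for _, c, _ in triples]
    puncts = [p for _, _, p in triples]
    print(" ".join(tokens))  # the unformatted input a model would see
    print(restore(tokens, caps, puncts))
```

In this framing, a trained model would predict the two label sequences from the token sequence alone. Here they are derived from already formatted text only to show the input and output shapes.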