Preprint

SC2: Towards Enhancing Content Preservation and Style Consistency in Long Text Style Transfer

Abstract

Text style transfer (TST) aims to vary the style polarity of text while preserving the semantic content. Although recent advancements have demonstrated remarkable progress in short TST, short TST remains a relatively straightforward task with limited practical applications. The more comprehensive long TST task presents two challenges: (1) existing methods encounter difficulties in accurately evaluating content attributes in multiple words, leading to content degradation; (2) the conventional vanilla style classifier loss encounters obstacles in maintaining consistent style across multiple generated sentences. In this paper, we propose a novel method SC2, where a multilayer Joint Style-Content Weighed (JSCW) module and a Style Consistency loss are designed to address the two issues. The JSCW simultaneously assesses the amounts of style and content attributes within a token, aiming to acquire a lossless content representation and thereby enhance content preservation. The multiple JSCW layers further progressively refine content representations. We design a style consistency loss to ensure that the multiple generated sentences consistently reflect the target style polarity. Moreover, we incorporate a denoising non-autoregressive decoder to accelerate training. We conduct extensive experiments, and the results show significant improvements of SC2 over competitive baselines. Our code: https://github.com/jiezhao6/SC2.
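
The full paper is not available on this page, so the exact formulations of the JSCW module and the style consistency loss are not reproduced here. As a rough, hedged illustration of the style consistency idea only (not the authors' implementation), the sketch below scores each generated sentence with a style classifier, pushes every sentence toward the target polarity, and penalizes disagreement between the sentences' style distributions; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def style_consistency_loss(sent_logits: torch.Tensor, target_style: int) -> torch.Tensor:
    """Hypothetical sketch: sent_logits has shape (num_sentences, num_styles),
    one row of style-classifier logits per generated sentence of one document.

    Two hedged ingredients:
      1) every sentence should be classified as the target style;
      2) the per-sentence style distributions should agree with one another
         (here: stay close to their mean distribution).
    """
    log_probs = F.log_softmax(sent_logits, dim=-1)            # (S, C)
    probs = log_probs.exp()

    # (1) vanilla per-sentence style loss toward the target polarity
    target = torch.full((sent_logits.size(0),), target_style,
                        dtype=torch.long, device=sent_logits.device)
    transfer_loss = F.nll_loss(log_probs, target)

    # (2) consistency: divergence between each sentence's distribution and the mean
    mean_probs = probs.mean(dim=0, keepdim=True)              # (1, C)
    consistency = F.kl_div(log_probs, mean_probs.expand_as(probs),
                           reduction="batchmean")

    return transfer_loss + consistency

# toy usage: 3 generated sentences, 2 style classes, target style = 1
logits = torch.randn(3, 2)
loss = style_consistency_loss(logits, target_style=1)
```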

Article
Text style transfer aims at converting the stylistic features of a sentence to another style while preserving its content. Despite the remarkable progress achieved in English style transfer, Chinese style transfer still relies heavily on manual processing. Taking classical and modern Chinese style transfer as an example, most existing methods cannot carry out this task due to the lack of sufficient parallel corpora for supervised learning and the special language phenomena in Chinese. In this paper, we propose an unsupervised prompt-based reinforcement learning (PBRL) framework to transfer text between classical and modern Chinese styles via an entangled approach. The PBRL framework mainly consists of two stages, i.e., a prompt-based fine-tuning stage and a bi-directional reinforcement learning stage. In the first stage, we leverage a synonym dictionary based on a priori knowledge to build a pseudo-parallel corpus for prompt learning, providing the system with a warm start. Then, the style-transfer-accuracy reward and the content-preservation reward are specially designed for bi-directional reinforcement optimization. Experimental evaluations show that our model outperforms state-of-the-art networks by a large margin.
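
As a hedged illustration of the two reward signals described in this abstract (names and weighting are assumptions, not the PBRL implementation), one could combine a style classifier's probability of the target style with a BLEU-style overlap against the source as a content-preservation proxy:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def transfer_reward(classifier_prob_target_style: float) -> float:
    # probability the generated sentence carries the target style
    return classifier_prob_target_style

def content_reward(source_tokens: list[str], output_tokens: list[str]) -> float:
    # token overlap with the source as a crude content-preservation proxy
    return sentence_bleu([source_tokens], output_tokens,
                         smoothing_function=SmoothingFunction().method1)

def total_reward(p_style: float, src: list[str], out: list[str],
                 alpha: float = 0.5) -> float:
    # weighted combination fed to the policy-gradient update (alpha is a guess)
    return alpha * transfer_reward(p_style) + (1 - alpha) * content_reward(src, out)
```
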
Conference Paper
Charge prediction aims to automatically predict the judgmental charges for legal cases. To convict a person/unit of a charge, the case description must contain matching instances of the constitutive elements (CEs) of that charge. This knowledge of CEs is a valuable guide for the judge in making final decisions. However, it is far from fully exploited for charge prediction in the literature. In this paper, we propose a novel method named Constitutive Elements-guided Charge Prediction (CECP). CECP mimics the human charge identification process to extract potential instances of CEs and generate predictions accordingly. It avoids laborious labeling of matching instances of CEs with a novel reinforcement learning module that progressively selects potentially matching sentences for CEs and evaluates their relevance. The final prediction is generated based on the selected sentences and their relevant CEs. Experiments on two real-world datasets show the superiority of CECP over competitive baselines.
Article
Standard multi-task benchmarks are essential for developing pretraining models that can generalize to various downstream tasks. Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. However, long text modeling requires many distinct abilities in contrast to short texts, such as the modeling of long-range discourse and commonsense relations, and the coherence and controllability of generation. The lack of standardized benchmarks makes it difficult to assess these abilities of a model and fairly compare different models, especially Chinese models. Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks. We construct new datasets for these tasks based on human-written Chinese stories with hundreds of words. Furthermore, we release an encoder-decoder-based Chinese long text pretraining model named LongLM with up to 1 billion parameters. We pretrain LongLM on 120G Chinese novels with two generative tasks including text infilling and conditional continuation. Extensive experiments show that LongLM outperforms similar-sized pretraining models substantially on both the understanding and generation tasks in LOT.
Article
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and has recently regained significant attention thanks to the promising performance brought by deep neural models. In this article, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task.
Conference Paper
This paper tackles the problem of disentangling the latent variables of style and content in language models. We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for label prediction and bag-of-words prediction, respectively. We show, both qualitatively and quantitatively, that the style and content are indeed disentangled in the latent space. This disentangled latent representation learning method is applied to style transfer on non-parallel corpora. We achieve substantially better results in terms of transfer accuracy, content preservation and language fluency, in comparison to previous state-of-the-art approaches.
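
A minimal sketch of how such auxiliary multi-task and adversarial objectives can be wired up, assuming a latent code already split into style and content parts; this illustrates the general recipe, not the authors' code (in practice the adversarial heads are trained with alternating updates or gradient reversal rather than the naive negated terms shown here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def disentangle_losses(z_style, z_content, heads, style_labels, bow_targets):
    """heads: dict with 'style', 'bow', 'adv_bow', 'adv_style' modules."""
    # multi-task objectives: each sub-space predicts its own attribute
    l_style = F.cross_entropy(heads["style"](z_style), style_labels)
    l_bow = F.binary_cross_entropy_with_logits(heads["bow"](z_content), bow_targets)
    # adversarial objectives: each sub-space should carry no information
    # about the other attribute
    l_adv_bow = F.binary_cross_entropy_with_logits(heads["adv_bow"](z_style), bow_targets)
    l_adv_style = F.cross_entropy(heads["adv_style"](z_content), style_labels)
    return l_style + l_bow - l_adv_bow - l_adv_style

# toy shapes: 4 examples, 16-dim style code, 32-dim content code,
# 2 style classes, vocabulary of 100 words for the bag-of-words target
heads = {"style": nn.Linear(16, 2), "bow": nn.Linear(32, 100),
         "adv_bow": nn.Linear(16, 100), "adv_style": nn.Linear(32, 2)}
loss = disentangle_losses(torch.randn(4, 16), torch.randn(4, 32), heads,
                          torch.randint(0, 2, (4,)),
                          torch.randint(0, 2, (4, 100)).float())
```
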
Conference Paper
This paper focuses on the task of sentiment transfer on non-parallel text, which modifies sentiment attributes (e.g., positive or negative) of sentences while preserving their attribute-independent content. Existing methods adopt an RNN encoder-decoder structure to generate a new sentence of a target sentiment word by word; such models are trained on a particular dataset from scratch and have limited ability to produce satisfactory sentences. When people convert the sentiment attribute of a given sentence, a simple but effective approach is to replace only the sentiment tokens of the sentence with other expressions indicative of the target sentiment, instead of building a new sentence from scratch. Such a process is very similar to the task of text infilling or cloze. With this intuition, we propose a two-step approach: Mask and Infill. In the mask step, we identify and mask the sentiment tokens of a given sentence. In the infill step, we utilize a pre-trained Masked Language Model (MLM) to infill the masked positions by predicting words or phrases conditioned on the context and the target sentiment (in this paper, "content" and "context" are used interchangeably, as are "style", "attribute", and "label"). We evaluate our model on two review datasets, Yelp and Amazon, by quantitative, qualitative, and human evaluations. Experimental results demonstrate that our model achieves state-of-the-art performance on both accuracy and BLEU scores.
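
The authors' fine-tuned model is not available here, but the general mask-then-infill step can be demonstrated with an off-the-shelf masked language model via the Hugging Face transformers fill-mask pipeline; which tokens to mask and how to condition on the target sentiment are the parts the paper contributes, so this sketch simply masks a sentiment word by hand:

```python
from transformers import pipeline

# off-the-shelf masked language model (not the paper's fine-tuned MLM)
fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "the food was terrible and the service was slow"
masked = sentence.replace("terrible", fill.tokenizer.mask_token)

# candidate infills for the masked sentiment position, with scores
for candidate in fill(masked, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```
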
Article
Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused.
Conference Paper
Deep neural networks often require copious amounts of labeled data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. Moreover, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more data from an alternate representative dataset. This can lead to adverse effects if the statistics of the representative dataset are dissimilar to our target. This predicament is due to the problem of domain shift. Data from a shifted domain might not produce bespoke features when a feature extractor from the representative domain is used. In this paper, we propose a new technique (d-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. The proposed technique is learnable end-to-end and is therefore ideally suited to train neural networks. Extensive experiments demonstrate that d-SNE outperforms the current state of the art and is robust to the variances in different datasets, even in the one-shot and semi-supervised learning settings. d-SNE also demonstrates the ability to generalize to multiple domains concurrently.
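
As a hedged sketch of the margin idea summarized above (one plausible reading of the modified-Hausdorff formulation, not the authors' implementation): for each target-domain embedding, the farthest same-class source embedding should still be closer than the nearest different-class one.

```python
import torch

def dsne_like_loss(src_emb, src_labels, tgt_emb, tgt_labels, margin: float = 1.0):
    d = torch.cdist(tgt_emb, src_emb) ** 2                    # (T, S) squared distances
    same = tgt_labels.unsqueeze(1) == src_labels.unsqueeze(0)  # (T, S) class match

    big = torch.finfo(d.dtype).max
    intra = d.masked_fill(~same, -big).max(dim=1).values   # farthest same-class source
    inter = d.masked_fill(same, big).min(dim=1).values     # nearest different-class source

    # hinge: farthest same-class distance should undercut nearest other-class distance
    return torch.relu(intra - inter + margin).mean()

# toy usage: 8-dim embeddings, 2 classes
src = torch.randn(6, 8); src_y = torch.tensor([0, 0, 0, 1, 1, 1])
tgt = torch.randn(4, 8); tgt_y = torch.tensor([0, 1, 0, 1])
print(dsne_like_loss(src, src_y, tgt, tgt_y))
```
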
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
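
The core operation behind the Transformer described above is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V; a compact reference implementation:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (..., seq_len, d_k); mask broadcasts to (..., seq_len, seq_len)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# toy usage: batch of 2 sequences, length 5, dimension 64
q = k = v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)   # (2, 5, 64)
```
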
Conference Paper
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
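
A short usage sketch of LIME's public text API (the lime Python package); the wrapped classifier is a stand-in scikit-learn pipeline, not one of the models studied in the paper:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# tiny stand-in training set and classifier
texts = ["great movie, loved it", "terrible plot and bad acting",
         "wonderful performance", "boring and predictable"]
labels = [1, 0, 1, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# explain one prediction with per-word weights
explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("the acting was great but the plot was boring",
                                 clf.predict_proba, num_features=4)
print(exp.as_list())
```
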
Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. 2017. Non-autoregressive neural machine translation. arXiv preprint arXiv:1711.02281.
Qinghong Han, Yuxian Meng, Fei Wu, and Jiwei Li. 2020. Non-autoregressive neural dialogue generation. arXiv preprint arXiv:2002.04250.
Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-shot charge prediction with discriminative legal attributes. In Proceedings of the 27th International Conference on Computational Linguistics, pages 487-498.
Saeid Motiian, Quinn Jones, Seyed Iranmanesh, and Gianfranco Doretto. 2017. Few-shot adversarial domain adaptation. Advances in neural information processing systems, 30.
Kainan Peng, Wei Ping, Zhao Song, and Kexin Zhao. 2020. Non-autoregressive neural text-to-speech. In International conference on machine learning, pages 7586-7598. PMLR.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485-5551.
Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu Natarajan, and Vasudeva Varma. 2020. Adapting language models for non-parallel author-stylized rewriting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9008-9015.
Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, and Jianfeng Xu. 2018. CAIL2018: A large-scale legal dataset for judgment prediction.
Xiaoyuan Yi, Zhenghao Liu, Wenhao Li, and Maosong Sun. 2020. Text style transfer via learning style instance supported latent space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3801-3807. International Joint Conferences on Artificial Intelligence Organization. Main track.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang, and Xiaofei He. 2023. Few-shot domain adaptation for charge prediction on unprofessional descriptions. arXiv preprint arXiv:2309.17313.