Conference Paper
Full-text available

Verb Paraphrase based on Case Frame Alignment

Authors:
  • Nobuhiro Kaji (Yahoo Japan)
  • Daisuke Kawahara
  • Sadao Kurohashi
  • Satoshi Sato

Abstract

This paper describes a method for translating the predicate-argument structure of a verb into that of an equivalent verb, which is a core component of dictionary-based paraphrasing. Our method captures the usages of a headword and of its definition heads (def-heads) in the form of case frames and aligns those case frames; this alignment amounts to acquiring word sense disambiguation rules and to detecting the appropriate equivalent together with the required case marker transformation.
In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 215-222.
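The abstract compresses the core idea into one sentence, so a rough sketch may help. Below is a minimal, assumed Python illustration of case frame alignment: a case frame is modelled as a mapping from Japanese case markers (ga, wo, ni) to sets of example nouns, the headword's case frame is matched slot-by-slot against each definition head's case frame by exhaustive alignment over permutations, and the best-scoring definition head is chosen as the paraphrase together with its case marker mapping. The Jaccard slot similarity and the toy example sets are stand-ins; the paper's actual similarity is thesaurus-based and its scoring details differ.

```python
# A minimal sketch of case frame alignment for dictionary-based verb paraphrase.
# A case frame is modelled as {case_marker: set of example nouns}; the Jaccard
# slot similarity and the frame score are illustrative stand-ins, not the
# paper's thesaurus-based similarity.
from itertools import permutations

def slot_similarity(examples_a, examples_b):
    """Jaccard overlap between the example nouns of two case slots."""
    if not examples_a or not examples_b:
        return 0.0
    return len(examples_a & examples_b) / len(examples_a | examples_b)

def align_case_frames(cf_head, cf_def):
    """Find the one-to-one slot mapping between two case frames that
    maximises the summed slot similarity (brute force; slot counts are small)."""
    slots_h, slots_d = list(cf_head), list(cf_def)
    best_score, best_map = 0.0, {}
    for perm in permutations(slots_d, min(len(slots_h), len(slots_d))):
        mapping = dict(zip(slots_h, perm))
        score = sum(slot_similarity(cf_head[h], cf_def[d]) for h, d in mapping.items())
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score / max(len(slots_h), 1), best_map

def choose_equivalent(headword_cf, defhead_cfs):
    """Pick the definition-head case frame that best matches the headword's
    case frame; the returned mapping doubles as the case marker transformation."""
    return max(
        ((name, *align_case_frames(headword_cf, cf)) for name, cf in defhead_cfs.items()),
        key=lambda t: t[1],
    )

# Toy data loosely modelled on the paper's figure (keitou -> necchuu / shitau);
# the example nouns are adjusted so that surface overlap exists for Jaccard.
keitou_1 = {"ga": {"she", "I"}, "ni": {"president"}}
candidates = {
    "necchuu_1": {"ga": {"player", "he"}, "ni": {"golf", "soccer"}},
    "shitau_2":  {"ga": {"students", "he"}, "wo": {"teacher", "president"}},
}
print(choose_equivalent(keitou_1, candidates))
```

In this toy run, keitou's ga/ni case frame aligns best with a shitau frame, and the returned mapping (ga to ga, ni to wo) is exactly the kind of case marker transformation the abstract refers to.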
[Figure: Case frames of keitou (to devote), necchuu (to be enthusiastic), and shitau (to admire), built from example nouns marked by the case particles ga, wo, and ni, together with a dictionary entry that defines keitou by necchuu and shitau. For instance, one case frame of keitou pairs {she, I...} ga with {president} ni, while necchuu pairs {player, he...} ga with {golf, soccer...} ni and shitau pairs {students, he...} ga with {teacher, professor...} wo.]
[Figure: Construction of case frames for tsumu (to load / to accumulate): a raw corpus is parsed into predicate-argument examples (e.g. {worker} ga {baggage} wo {truck} ni tsumu, {player} ga {experience} wo tsumu), the examples are turned into initial case frames, and similar initial frames are merged (similarity scores such as 1.0, 0.91, and 0.86 are shown between them) into final case frames, e.g. F1: {worker:3} ga {baggage:8} wo {car:5} ni tsumu (load) and F2: {supply:10} wo {truck:3, airplane:2} ni tsumu (load).]
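To make the pipeline in the figure concrete, here is a small, assumed Python sketch of the case frame construction step: each parsed predicate-argument example becomes an initial case frame, and initial frames are merged greedily whenever a toy similarity measure exceeds a threshold. The similarity function, the 0.7 threshold, and the example data are illustrative assumptions, not the paper's actual clustering criterion.

```python
# A rough sketch of case frame construction for tsumu ("to load / accumulate"):
# parsed predicate-argument examples become initial case frames, which are then
# merged greedily when a toy similarity measure exceeds a threshold.
from collections import defaultdict

def to_initial_frames(examples):
    """One initial case frame per example: {case_marker: {noun: count}}."""
    return [{marker: defaultdict(int, {noun: 1}) for marker, noun in args.items()}
            for args in examples]

def frame_similarity(f1, f2):
    """Overlap of case markers, weighted by shared example nouns (toy measure)."""
    shared_markers = set(f1) & set(f2)
    if not shared_markers:
        return 0.0
    shared_nouns = sum(len(set(f1[m]) & set(f2[m])) for m in shared_markers)
    return (len(shared_markers) + shared_nouns) / (len(set(f1) | set(f2)) + 1)

def merge(f1, f2):
    merged = {m: defaultdict(int, f1.get(m, {})) for m in set(f1) | set(f2)}
    for m, nouns in f2.items():
        for noun, count in nouns.items():
            merged[m][noun] += count
    return merged

def build_case_frames(examples, threshold=0.7):
    frames = to_initial_frames(examples)
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(frames)):
            for j in range(i + 1, len(frames)):
                if frame_similarity(frames[i], frames[j]) >= threshold:
                    frames[i] = merge(frames[i], frames[j])
                    del frames[j]
                    merged_any = True
                    break
            if merged_any:
                break
    return frames

examples = [
    {"ga": "worker", "wo": "baggage", "ni": "truck"},
    {"wo": "baggage", "ni": "airplane"},
    {"ga": "player", "wo": "experience"},
    {"wo": "supply", "ni": "car"},
]
for frame in build_case_frames(examples):
    print({m: dict(nouns) for m, nouns in frame.items()})
```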
... On the one hand, it provides reading aids for people with low literacy skills, such as non-native speakers, children, and patients with linguistic and cognitive disabilities [4], [5]. On the other hand, it can also serve as a downstream task that improves the performance of other NLP tasks [6]-[8]. Some sentence simplification systems [5], [9] focused on splitting a long sentence into shorter sentences, deleting less important words/phrases [10]-[12], or paraphrasing [13]-[15]. But these systems depended on manual rules and could not be trained end-to-end. ...
... In previous work, the most common operations for sentence simplification include substituting rare words with simpler words or phrases, deleting unimportant elements of the original text, or making syntactically complex structures simpler [24]. These systems usually focused on individual aspects of the simplification problem: several performed syntactic simplification only, using rules aimed at sentence splitting [5], [6], [13], [24], while others performed lexical simplification by substituting difficult words with simpler WordNet synonyms or paraphrases [14], [15], [25]. ...
... In order to investigate the influence of weak attention in our model, we also explored different β1 and β2 settings in equation (15) and ran the experiments on Newsela. The results are shown in Figure 3. ...
Article
Full-text available
Sentence simplification aims to simplify a complex sentence while retaining its main idea. It is one of the most important tasks in natural language processing. Recent work has addressed the task with sequence-to-sequence (Seq2seq) models. However, these conventional Seq2seq models are usually based on a single-stage encoder, which reads the source complex sentence only once; as a result, it is hard to extract precise representational features of the source sentence. To resolve this problem, we propose a multi-stage encoder based Seq2seq model for sentence simplification. Specifically, the encoder of the proposed model has three stages: an N-gram reading stage, a glance-over stage, and a final encoding stage. The N-gram reading stage captures N-gram feature embeddings for the other stages, and the glance-over stage extracts local and global information about the source sentence. The final encoding stage takes advantage of the information extracted by the former two stages to encode the source sentence better. We also introduce a novel attention connection method which helps the decoder make full use of the encoder's information. Experiments on three public datasets demonstrate that the proposed model outperforms state-of-the-art baseline simplification systems.
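The abstract outlines the architecture but gives no formal detail, so the following is a minimal, assumed PyTorch sketch of a three-stage encoder in that spirit: a 1-D convolution stands in for the N-gram reading stage, a bidirectional GRU for the glance-over stage, and a final GRU encodes the concatenated features. The layer types, sizes, and the way the stages are combined are guesses for illustration, not the authors' implementation.

```python
# A minimal, assumed sketch of a three-stage encoder: n-gram reading (Conv1d),
# glance-over (BiGRU), and final encoding (GRU over concatenated features).
import torch
import torch.nn as nn

class MultiStageEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, ngram=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Stage 1: n-gram reading (local n-gram features via 1-D convolution).
        self.ngram_conv = nn.Conv1d(emb_dim, hidden, kernel_size=ngram, padding=ngram // 2)
        # Stage 2: glance-over (local and global context via a BiGRU).
        self.glance = nn.GRU(emb_dim, hidden // 2, bidirectional=True, batch_first=True)
        # Stage 3: final encoding over the concatenated stage outputs.
        self.final = nn.GRU(2 * hidden, hidden, batch_first=True)

    def forward(self, token_ids):                       # (batch, seq_len)
        emb = self.embed(token_ids)                     # (batch, seq_len, emb_dim)
        ngram_feats = self.ngram_conv(emb.transpose(1, 2)).transpose(1, 2)
        glance_feats, _ = self.glance(emb)              # (batch, seq_len, hidden)
        combined = torch.cat([ngram_feats, glance_feats], dim=-1)
        states, last = self.final(combined)             # states feed the decoder's attention
        return states, last

encoder = MultiStageEncoder(vocab_size=10_000)
states, last = encoder(torch.randint(0, 10_000, (2, 12)))
print(states.shape)   # torch.Size([2, 12, 256])
```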
... Modern approaches (Zhang and Lapata 2017; Vu et al. 2018; Guo, Pasunuru, and Bansal 2018; Zhao et al. 2018) view the simplification task as monolingual text-to-text rewriting and employ the very successful encoder-decoder neural architecture (Bahdanau, Cho, and Bengio 2015; Sutskever, Vinyals, and Le 2014). In contrast to traditional methods which target individual aspects of the simplification task, such as sentence splitting (Carroll et al. 1999; Chandrasekar, Doran, and Srinivas 1996, inter alia) or the substitution of complex words with simpler ones (Devlin 1999; Kaji et al. 2002), neural models have no special-purpose mechanisms for ensuring how to best simplify text. They rely on representation learning to implicitly learn simplification rewrites from data, i.e., examples of complex-simple sentence pairs. ...
... Lexical substitution, the replacement of complex words with simpler alternatives, is an integral part of sentence simplification and has been the subject of much previous work (Specia, Jauhar, and Mihalcea 2012; Kaji et al. 2002). We enrich the encoder of the Transformer with lexical constraints by adding indicator features to each word embedding, specifying whether the token should be kept. ...
Preprint
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with sequence-to-sequence models, which have been developed assuming homogeneous target audiences. In this paper we argue that different users have different simplification needs (e.g. dyslexics vs. non-native speakers), and propose CROSS, a ContROllable Sentence Simplification model, which allows controlling both the level of simplicity and the type of simplification. We achieve this by enriching a Transformer-based architecture with syntactic and lexical constraints (which can be set or learned from data). Empirical results on two benchmark datasets show that constraints are key to successful simplification, offering flexible generation output.
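The lexical-constraint mechanism mentioned in the citing text above (indicator features added to each word embedding) can be sketched in a few lines. The snippet below is an assumed PyTorch illustration: a learned keep/replace indicator embedding is added to the token embeddings before a standard Transformer encoder. The dimensions and the additive injection are assumptions, not the CROSS implementation.

```python
# A minimal sketch of lexical constraints as indicator features: each token
# embedding is enriched with a learned keep/replace embedding before a
# standard Transformer encoder.
import torch
import torch.nn as nn

class ConstraintAwareEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.keep_embed = nn.Embedding(2, d_model)   # 0 = free to simplify, 1 = keep
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids, keep_flags):
        # keep_flags: (batch, seq_len) of 0/1 lexical constraints
        x = self.tok_embed(token_ids) + self.keep_embed(keep_flags)
        return self.encoder(x)

enc = ConstraintAwareEncoder(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (1, 8))
keep = torch.tensor([[1, 0, 0, 1, 1, 0, 0, 0]])
print(enc(tokens, keep).shape)   # torch.Size([1, 8, 256])
```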
... Several approaches use hand-crafted syntactic simplification rules aimed at splitting long and complicated sentences into several simpler ones (Carroll et al. 1999; Chandrasekar, Doran, and Srinivas 1996; Vickrey and Koller 2008; Siddharthan 2004). Other work focuses on lexical simplification and substitutes difficult words with more common WordNet synonyms or paraphrases found in a predefined dictionary (Inui et al. 2003; Kaji et al. 2002). ...
Article
Text simplification aims to rewrite text into simpler versions and thus make information accessible to a broader audience (e.g., non-native speakers, children, and individuals with language impairments). In this paper, we propose a model that simplifies documents automatically while selecting their most important content and rewriting them in a simpler style. We learn content selection rules from same-topic Wikipedia articles written in the main encyclopedia and its Simple English variant. We also use the revision histories of Simple Wikipedia articles to learn a quasi-synchronous grammar of simplification rewrite rules. Based on an integer linear programming formulation, we develop a joint model where preferences based on content and style are optimized simultaneously. Experiments on simplifying main Wikipedia articles show that our method significantly reduces the reading difficulty, while still capturing the important content.
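The joint ILP formulation can be illustrated with a toy selection problem. The sketch below, using the PuLP solver, picks sentences so that a weighted combination of (assumed) content-importance and simplicity scores is maximised under a length budget; the scores, the weights, and the single budget constraint are illustrative stand-ins for the paper's much richer set of variables and constraints.

```python
# A toy ILP in the spirit of joint content-and-style optimisation: binary
# variables select sentences to maximise a weighted score under a length budget.
import pulp

sentences = ["s1", "s2", "s3", "s4"]
content    = [0.9, 0.4, 0.7, 0.2]   # assumed content-importance scores
simplicity = [0.3, 0.8, 0.6, 0.9]   # assumed simplicity scores
length     = [25, 10, 18, 8]        # tokens per sentence
budget     = 40

prob = pulp.LpProblem("joint_content_and_style", pulp.LpMaximize)
x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(sentences))]

# Objective: weighted combination of content importance and simplicity.
prob += pulp.lpSum(0.7 * content[i] * x[i] + 0.3 * simplicity[i] * x[i]
                   for i in range(len(sentences)))
# Constraint: the selected output must fit the length budget.
prob += pulp.lpSum(length[i] * x[i] for i in range(len(sentences))) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([sentences[i] for i in range(len(sentences)) if x[i].value() == 1])
# selected sentences, e.g. ['s2', 's3', 's4']
```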
... and the summary task [17]. There are two linguistic aspects of operation for sentence simplification: (a) the lexical aspect, which replaces complex words in a sentence with simple words or idioms [2][3][4][10][13][15][40], and (b) the syntactic aspect, which compresses complex sentence structures and deletes original elements [4][8][26]. Earlier work focused on these individual aspects of the simplification problem. ...
Conference Paper
In this study, we propose sentence simplification from a non-parallel corpus with adversarial learning. In recent years, sentence simplification based on statistical machine translation frameworks and neural networks has been actively studied. However, most methods require a large parallel corpus, which is expensive to build. In this paper, our purpose is sentence simplification with a non-parallel corpus drawn from openly available English Wikipedia and Simple English Wikipedia articles. We use a style transfer framework with adversarial learning to learn from the non-parallel corpus and adapt prior work [by Barzilay et al.] to sentence simplification as a base framework. Furthermore, to improve the retention of sentence meaning, we add a pretraining reconstruction loss and a cycle consistency loss to the base framework. As a result of these extensions, the quality of the sentences output by the proposed model also improves.
... Another challenge for text simplification is generating an ample set of rewrite rules that potentially simplify an input sentence. Most early work has relied on either hand-crafted rules (Chandrasekar et al., 1996; Carroll et al., 1999; Siddharthan, 2006; Vickrey and Koller, 2008) or dictionaries like WordNet (Kaji et al., 2002; Inui et al., 2003). Other more recent studies have relied on the parallel Normal-Simple Wikipedia Corpus to automatically extract rewrite rules (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011b; Wubben et al., 2012; Narayan and Gardent, 2014; Siddharthan and Angrosh, 2014; Angrosh et al., 2014). ...
Article
Full-text available
Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
... Sometimes, instead of replacing difficult words, texts are augmented with automatically retrieved explanations or definitions (Elhadad 2006, Kaji et al. 2002, Kandula et al. 2010, which can help to improve text comprehension. When second language learners or non-native speakers are the target population, these definitions or links to dictionary entries can be provided in the native language of the readers. ...
Article
Full-text available
We discuss the design, development and evaluation of an automated lexical simplification tool for Dutch. A basic pipeline approach is used to perform both text adaptation and annotation. First, sentences are preprocessed and word sense disambiguation is performed. Then, the difficulty of each token is estimated by looking at its average age of acquisition and its frequency in a corpus of simplified Dutch. We use Cornetto to find synonyms of words that have been identified as difficult and the SONAR500 corpus to perform reverse lemmatisation. Finally, we rely on a large-scale language model to verify whether the selected replacement word fits the local context. In addition, the text is augmented with information from Wikipedia (word definitions and links). We tune and evaluate the system with sentences taken from the Flemish newspaper De Standaard. The results show that the system's adaptation component has low coverage, since it only correctly simplifies around one in five 'difficult' words, but reasonable accuracy, with no grammatical errors being introduced in the text. The Wikipedia annotations have broader coverage, but their potential for simplification needs to be further developed and more thoroughly evaluated.
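The pipeline described in this abstract (difficulty estimation from age of acquisition and frequency, synonym lookup, and a language-model context check) can be summarised in a short sketch. The Python below is an assumed illustration with hypothetical AOA, FREQ and SYNONYMS tables and a dummy lm_logprob function standing in for Cornetto, the SONAR500-derived statistics, and the large-scale language model.

```python
# A toy lexical simplification pipeline: score token difficulty from assumed
# age-of-acquisition and frequency tables, look up synonyms for difficult
# tokens, and keep a replacement only if a (dummy) language model accepts it.
AOA  = {"ameliorate": 14.0, "improve": 7.0, "comprehend": 12.0, "understand": 6.0}
FREQ = {"ameliorate": 1, "improve": 900, "comprehend": 15, "understand": 1200}
SYNONYMS = {"ameliorate": ["improve"], "comprehend": ["understand"]}

def difficulty(word):
    """Higher when acquired late and rarely seen in the simplified corpus."""
    return AOA.get(word, 8.0) / (1.0 + FREQ.get(word, 0))

def lm_logprob(sentence):
    """Placeholder for a real language-model score (here: prefer frequent words)."""
    return sum(FREQ.get(w, 1) for w in sentence)

def simplify(tokens, threshold=0.5):
    out = list(tokens)
    for i, word in enumerate(tokens):
        if difficulty(word) <= threshold:
            continue
        for candidate in SYNONYMS.get(word, []):
            trial = out[:i] + [candidate] + out[i + 1:]
            if lm_logprob(trial) >= lm_logprob(out):   # context check
                out = trial
                break
    return out

print(simplify("they tried to ameliorate the situation".split()))
# ['they', 'tried', 'to', 'improve', 'the', 'situation']
```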
... Lexica for simplification: There have been previous attempts to use manually created lexica for simplification. For example, Elhadad and Sutaria (2007) used the UMLS lexicon (Bodenreider, 2007), a repository of technical medical terms; Ehara et al. (2010) asked non-native speakers to answer multiple-choice questions corresponding to 12,000 English words to study each user's familiarity with the vocabulary; Kaji et al. (2012) and Kajiwara et al. (2013) used a dictionary of 5,404 Japanese words based on elementary school textbooks; Xu et al. (2016) used a list of the 3,000 most common English words; Lee and Yeung (2018) used an ensemble of vocabulary lists of different complexity levels. However, to the best of our knowledge, there is no previous study on manually building a large word-complexity lexicon with human judgments that has shown substantial improvements on automatic simplification systems. ...
Preprint
Full-text available
Current lexical simplification approaches rely heavily on heuristics and corpus level features that do not always align with human judgment. We create a human-rated word-complexity lexicon of 15,000 English words and propose a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase. Our model performs better than the state-of-the-art systems for different lexical simplification tasks and evaluation datasets. Additionally, we also produce SimplePPDB++, a lexical resource of over 10 million simplifying paraphrase rules, by applying our model to the Paraphrase Database (PPDB).
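The Gaussian-based feature vectorization mentioned in this abstract can be illustrated in a few lines: a scalar feature is projected onto a set of Gaussian basis functions so the ranking model can capture non-linear effects of that feature. The centres, the width, and the choice of feature in the sketch below are assumptions for illustration, not the paper's exact layer.

```python
# A minimal sketch of Gaussian-based feature vectorization: a scalar feature
# (e.g. a word's log-frequency) becomes a vector of Gaussian bin activations.
import numpy as np

def gaussian_vectorize(value, centers, sigma=0.5):
    """Project one scalar feature onto len(centers) Gaussian bins."""
    centers = np.asarray(centers, dtype=float)
    return np.exp(-((value - centers) ** 2) / (2 * sigma ** 2))

# e.g. log-frequency of a word, binned over the range [0, 6]
centers = np.linspace(0.0, 6.0, num=7)
print(np.round(gaussian_vectorize(2.3, centers), 3))
```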
Chapter
Many people are denied access to information and communication because they cannot read and understand standard texts. The translation of standard texts into easy-to-read texts can reduce these obstacles and enable barrier-free communication. Due to the lack of computer-supported tools, this is a primarily intellectual process, which is why very little information is available as easy-to-read text. Existing approaches dealing with the automation of the intralingual translation process focus in particular on the sub-process of text simplification. In our study, we look at the entire translation process with the aim of identifying the characteristics of a software system that are required to digitize the entire process as needed. We analyse the intralingual translation process and create a model. In addition, we conduct a software requirements analysis, which we use to systematically analyse and document the demands placed on the software architecture. The results of our study form the foundation for the development of a software system that can make the intralingual translation process more efficient and effective.