Figure 1: Baseline sequence-to-sequence model architecture with attention [See et al., 2017]


Source publication
Preprint
In this work, we study abstractive text summarization by exploring different models such as the LSTM encoder-decoder with attention, pointer-generator networks, coverage mechanisms, and transformers. After extensive and careful hyperparameter tuning, we compare the proposed architectures against each other on the abstractive text summarization task. Fin...
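Of the mechanisms listed in the abstract, the coverage mechanism is not elaborated elsewhere on this page. The following is a minimal NumPy sketch of how it is formulated in See et al. [2017]: a coverage vector accumulates the attention distributions of previous decoder steps, and a coverage loss penalises re-attending to already covered source positions. The sizes and variable names are illustrative assumptions, not values from the paper.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    src_len, steps = 5, 3                 # illustrative sizes
    coverage = np.zeros(src_len)          # c^t: sum of attention distributions from previous decoder steps

    for t in range(steps):
        attention = softmax(rng.normal(size=src_len))     # a^t: attention distribution at decoder step t
        # Coverage loss penalises attending again to already-covered source positions,
        # which discourages repetition in the generated summary.
        cov_loss = np.minimum(attention, coverage).sum()  # sum_i min(a_i^t, c_i^t)
        coverage += attention
        print(f"step {t}: coverage loss = {cov_loss:.3f}")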

Contexts in source publication

Context 1
... this work, as the baseline model, we consider an LSTM encoder-decoder architecture with attention, as shown in Figure 1. ...
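As a rough illustration of the attention step that Figure 1 depicts, here is a minimal NumPy sketch of additive attention over the encoder hidden states at one decoding step, in the spirit of See et al. [2017]. The dimensions, random weights, and variable names are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    hidden_dim, src_len = 4, 6                                 # illustrative sizes

    encoder_states = rng.normal(size=(src_len, hidden_dim))    # h_i, one per source token
    decoder_state  = rng.normal(size=(hidden_dim,))            # s_t, decoder state at step t

    # Additive attention scores: e_i = v^T tanh(W_h h_i + W_s s_t)
    W_h = rng.normal(size=(hidden_dim, hidden_dim))
    W_s = rng.normal(size=(hidden_dim, hidden_dim))
    v   = rng.normal(size=(hidden_dim,))

    scores    = np.array([v @ np.tanh(W_h @ h + W_s @ decoder_state) for h in encoder_states])
    attention = softmax(scores)             # attention distribution over source tokens
    context   = attention @ encoder_states  # context vector fed to the vocabulary softmax
    print(attention, context)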
Context 2
... revisit the model proposed by See et al. [2017] in the following, also compare it with a transformer-based model proposed by Miyagishima et al. [2014] for machine translation tasks, and finally use it as a feature-generation mechanism for fake news classification. Pointer-Generator Mechanism: The pointer-generator is a hybrid network that chooses, during training and testing, whether to copy words from the source via pointing or to generate words from a fixed vocabulary. Figure 2 shows the architecture of the pointer-generator mechanism, in which the decoder is modified relative to Figure 1. In Figure 1, the baseline model, only an attention distribution and a vocabulary distribution are calculated. ...
Context 3
... Mechanism: The pointer-generator is a hybrid network that chooses, during training and testing, whether to copy words from the source via pointing or to generate words from a fixed vocabulary. Figure 2 shows the architecture of the pointer-generator mechanism, in which the decoder is modified relative to Figure 1. In Figure 1, the baseline model, only an attention distribution and a vocabulary distribution are calculated. However, in the pointer-generator network a generation probability p_gen, a scalar between 0 and 1, is also calculated; it represents the probability of generating a word from the vocabulary versus copying a word from the source text. ...
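Below is a minimal sketch of how such a generation probability combines the vocabulary distribution with the attention (copy) distribution into a single distribution over an extended vocabulary, following P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of a_i over source positions i with w_i = w, as in See et al. [2017]. The toy vocabulary, source tokens, and the fixed p_gen value are illustrative assumptions; in the model, p_gen is a learned sigmoid of the context vector, decoder state, and decoder input.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "sat", "on", "mat", "[UNK]"]       # toy fixed vocabulary
    source_tokens = ["the", "fluffy", "cat"]                  # "fluffy" is out-of-vocabulary

    p_vocab   = softmax(rng.normal(size=len(vocab)))          # vocabulary distribution P_vocab
    attention = softmax(rng.normal(size=len(source_tokens)))  # attention distribution a over source tokens
    p_gen = 0.7  # scalar in (0, 1); fixed here for illustration only

    # Final distribution over the extended vocabulary (fixed vocabulary plus source OOV words):
    # P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: w_i = w} a_i
    extended = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    for a_i, w in zip(attention, source_tokens):
        extended[w] = extended.get(w, 0.0) + (1.0 - p_gen) * a_i

    print(max(extended, key=extended.get))  # most probable next word, possibly copied from the source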

Similar publications

Article
Text summarization is a process by which the most important information in a source document is precisely identified. It represents the condensed information of a longer text. Text summarization is broken down into two approaches: extractive summarization and abstractive summarization. The proposed method creates an extractive summary of a given text and generates an app...
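To make the extractive/abstractive distinction concrete, here is a tiny frequency-based extractive summarizer. This is a generic scoring heuristic for illustration only, not the method proposed in the article above.

    from collections import Counter

    def extractive_summary(sentences, k=2):
        # Score each sentence by the summed corpus frequency of its words,
        # then keep the k highest-scoring sentences in their original order.
        freq = Counter(w.lower() for s in sentences for w in s.split())
        ranked = sorted(sentences, key=lambda s: sum(freq[w.lower()] for w in s.split()), reverse=True)
        keep = set(ranked[:k])
        return [s for s in sentences if s in keep]

    print(extractive_summary([
        "Text summarization condenses a long document into a short one.",
        "Extractive methods select sentences directly from the source text.",
        "Abstractive methods generate new sentences that paraphrase the source.",
    ], k=1))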