Figure - uploaded by Yun-Zhu Song
Human evaluation results.


Source publication
Article
Full-text available
With the rapid proliferation of online media sources and published news, headlines have become increasingly important for attracting readers to news articles, since users may be overwhelmed by the sheer volume of information. In this paper, we generate inspired headlines that preserve the nature of news articles and catch the eye of the reader simultaneo...

Contexts in source publication

Context 1
... 1) Attractiveness: given two headlines generated by PORL-HG and (Chen and Bansal 2018), users are asked to choose the one they would click; 2) Relevance: given the human-written summary provided in the CNN/Daily Mail dataset and the headlines generated by different approaches, users are asked to judge whether each headline is related to the given summary, and may mark more than one headline as relevant; and 3) Grammaticality: given the generated headlines, users are asked to rate them from 1 to 5 (a higher score indicates a better result). Table 3 shows the attractiveness, relevance, and grammaticality of headlines generated by the state-of-the-art method and PORL-HG. The results show that 63.1% of users find the headlines generated by PORL-HG more attractive, while only 36.9% prefer those generated by (Chen and Bansal 2018). ...
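The reported 63.1% vs. 36.9% split is a simple pairwise-preference tally over annotator A/B clicks. A minimal sketch of that computation (the vote counts below are illustrative values chosen to reproduce the reported split, not the study's raw data):

```python
from collections import Counter

def pairwise_preference(votes):
    """Tally pairwise A/B choices into preference percentages.

    `votes` has one entry per annotator judgment, naming the chosen system.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    return {system: 100.0 * n / total for system, n in counts.items()}

# Illustrative vote counts matching the reported 63.1% / 36.9% split.
prefs = pairwise_preference(["PORL-HG"] * 631 + ["Chen&Bansal2018"] * 369)
```

Because every judgment is a forced choice between exactly two systems, the two percentages always sum to 100.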

Citations

... Besides, we find that focusing solely on attractiveness in sequence generation might compromise fidelity, as observed in studies on attractive headline generation (Song et al., 2020). To address this, we incorporate a textual entailment model to improve the faithfulness of the produced copywriting. ...
... Xu et al. (2019) train a CNN-based sensationalism scorer to make the Pointer Generator (See et al., 2017) create sensational headlines. Song et al. (2020) train a popularity predictor from user clicks and combine the popularity score with the ROUGE-L score (Lin, 2004) to improve the model. A follow-up study claims that the number of clicks may be affected by trending topics, making click rate an unsuitable popularity indicator. ...
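To make the combined reward concrete, the sketch below pairs an LCS-based ROUGE-L F1 with a popularity score via a weighted sum. The mixing weight `alpha` and the linear combination are illustrative assumptions, not the exact formulation in Song et al. (2020):

```python
def lcs_len(a, b):
    """Longest-common-subsequence length over two token lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if tok_a == tok_b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a candidate headline and a reference (Lin, 2004)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def combined_reward(popularity_score, rouge_l, alpha=0.5):
    """Blend a learned popularity score with ROUGE-L relevance.

    `alpha` is a hypothetical mixing weight, chosen here for illustration.
    """
    return alpha * popularity_score + (1 - alpha) * rouge_l
```

In an RL setup, a scalar like this would serve as the per-sample reward, trading attractiveness (popularity) against relevance to the source (ROUGE-L).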
... Another line of research treats control codes as learning targets. Much research has attempted to train scorers for target attributes and to use them as reward functions within an RL framework (Song et al., 2020; Stiennon et al., 2020) or as a sampling bias during decoding (Krause et al., 2021; Mireshghallah et al., 2022). However, for PHG, the control codes are the user preferences encapsulated in click histories. ...
Article
Full-text available
Personalized Headline Generation aims to generate unique headlines tailored to users’ browsing history. In this task, understanding user preferences from click history and incorporating them into headline generation pose challenges. Existing approaches typically rely on predefined styles as control codes, but personal style lacks explicit definition or enumeration, making it difficult to leverage traditional techniques. To tackle these challenges, we propose General Then Personal (GTP), a novel framework comprising user modeling, headline generation, and customization. We train the framework using tailored designs that emphasize two central ideas: (a) task decoupling and (b) model pre-training. With the decoupling mechanism separating the task into generation and customization, two mechanisms, i.e., information self-boosting and mask user modeling, are further introduced to facilitate the training and text control. Additionally, we introduce a new evaluation metric to address existing limitations. Extensive experiments conducted on the PENS dataset, considering both zero-shot and few-shot scenarios, demonstrate that GTP outperforms state-of-the-art methods. Furthermore, ablation studies and analysis emphasize the significance of decoupling and pre-training. Finally, the human evaluation validates the effectiveness of our approaches.
... Despite their success in attracting readers, there are several challenges in current models. First, clickbait datasets for training headline generators with sensational style transfer are commonly collected based on the amount of views or clicks, which assumes that headline popularity is always due to the writing style (Song et al., 2020). However, user reading preferences could also be motivated by trending topics or major events. ...
... Xu et al. (2019) propose auto-tuned reinforcement learning to generate sensational headlines using a pretrained sensationalism scorer; the resulting score is used as the reward to enhance the attractiveness. Although generating attractive headlines has been widely explored (Song et al., 2020;Jin et al., 2020), we focus more on fidelity to ensure that the semantics of the generated headline are faithful to the source content to avoid harmful hallucination. ...
Preprint
Full-text available
Current methods for generating attractive headlines often learn directly from data, which bases attractiveness on the number of user clicks and views. Although clicks or views do reflect user interest, they can fail to reveal how much interest is raised by the writing style and how much is due to the event or topic itself. Also, such approaches can lead to harmful inventions by over-exaggerating the content, aggravating the spread of false information. In this work, we propose HonestBait, a novel framework for solving these issues from another aspect: generating headlines using forward references (FRs), a writing technique often used for clickbait. A self-verification process is included during training to avoid spurious inventions. We begin with a preliminary user study to understand how FRs affect user interest, after which we present PANCO1, an innovative dataset containing pairs of fake news with verified news for attractive but faithful news headline generation. Automatic metrics and human evaluations show that our framework yields more attractive results (+11.25% compared to human-written verified news headlines) while maintaining high veracity, which helps promote real information to fight against fake news.
... News headline generation, conventionally considered a paradigm of text summarization, has been extensively researched (Tan et al., 2017; Goyal et al., 2022). Advances in automation range from heuristic approaches like parse-and-trim (Dorr et al., 2003) to sophisticated machine learning algorithms like recurrent neural networks (Lopyrev, 2015), the Universal Transformer (Gavrilov et al., 2019), reinforcement learning (Song et al., 2020; Xu et al., 2019), large-scale generation models trained with a distant-supervision approach (Gu et al., 2020), and large language models (Zhang et al., 2023). Zhang et al. (2023) demonstrated that news summaries generated by freelance writers or Instruct GPT-3 Davinci received an equivalent level of preference from human annotators. ...
... An online article's popularity can be defined in terms of its page views [34,40], a by-product of the regular Internet browsing activities of the global population. Similarly, within a particular online document, the popularity of an individual sentence is proportional to the number of received requests actively seeking the information it contains. ...
... For informative documents (including online news), a preferred surrogate for popularity has been the number of pageview hits [3,31,34]. Intuitively, pageviews capture the generic browsing trends of the population, not limited to social-media actions. Our task, however, requires knowledge of more fine-grained Internet browsing activity to annotate specific sentences with popularity labels. ...
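The excerpt above describes deriving sentence-level popularity labels from counts of incoming requests. A minimal, hypothetical normalization scheme (max-scaling per document; not necessarily the authors' exact procedure) might look like:

```python
def popularity_labels(query_hits):
    """Map per-sentence search-query hit counts to [0, 1] popularity labels
    by max-normalization within one document.

    This scaling scheme is illustrative only.
    """
    peak = max(query_hits)
    if peak == 0:
        # No sentence in the document received any request.
        return [0.0 for _ in query_hits]
    return [hits / peak for hits in query_hits]
```

Per-document normalization keeps labels comparable across documents with very different overall traffic, which matters when popularity is later scored as a ranking task.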
Preprint
Full-text available
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
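The abstract reports nDCG values above 0.8 for ranking sentences by forecast popularity. For reference, a small implementation of nDCG over predicted scores and graded relevance labels, following the standard definition (independent of the paper's released code):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a relevance list already in rank order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(predicted_scores, true_relevances):
    """Rank items by predicted score, then compare that ranking's DCG
    against the DCG of the ideal ordering of the true labels."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal = sorted(true_relevances, reverse=True)
    return dcg(ranked) / dcg(ideal)
```

An nDCG of 1.0 means the predicted ordering exactly matches the ideal ordering of the true popularity labels; values closer to 0 indicate that highly popular sentences were ranked low.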