October 2024
Data annotation refers to the labeling or tagging of textual data with relevant information. A large body of work has reported positive results on leveraging LLMs as an alternative to human annotators. However, existing studies focus on classic NLP tasks, and the extent to which LLMs perform as data annotators in domains requiring expert knowledge remains underexplored. In this work, we investigate comprehensive approaches across three highly specialized domains and discuss practical suggestions from a cost-effectiveness perspective. To the best of our knowledge, we present the first systematic evaluation of LLMs as expert-level data annotators.
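A minimal sketch of how such an evaluation can be framed from the cost-effectiveness angle described above: compare LLM labels against expert gold labels and compute a cost per correct label. The `llm_annotate` stub, the labels, and the cost figure are all illustrative assumptions, not details from the paper.

```python
# Sketch: agreement with expert annotators plus a cost-per-correct-label
# metric. `llm_annotate` is a placeholder for a real LLM call.

def llm_annotate(text: str) -> str:
    """Stand-in for an LLM annotation call; returns a label for `text`."""
    # A real implementation would prompt a model; here we fake a label.
    return "relevant" if "contract" in text.lower() else "irrelevant"

def agreement_and_cost(examples, gold_labels, cost_per_call=0.002):
    """Return (accuracy vs. expert labels, cost per correct label)."""
    preds = [llm_annotate(t) for t in examples]
    correct = sum(p == g for p, g in zip(preds, gold_labels))
    accuracy = correct / len(examples)
    total_cost = cost_per_call * len(examples)
    return accuracy, total_cost / max(correct, 1)

texts = ["A contract clause on liability.", "Weather report for Tuesday."]
gold = ["relevant", "irrelevant"]
acc, cost_per_correct = agreement_and_cost(texts, gold)
```

Reporting cost per *correct* label rather than per call is one simple way to make the human-vs-LLM comparison the abstract alludes to.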
September 2024
In diverse professional environments, ranging from academic conferences to corporate earnings calls, the ability to anticipate audience questions is paramount. Traditional methods, which rely on manual assessment of an audience's background, interests, and subject knowledge, often fall short, particularly when facing large or heterogeneous groups, leading to imprecision and inefficiency. While NLP has made strides in text-based question generation, its primary focus remains on academic settings, leaving the intricate challenges of professional domains, especially earnings call conferences, underserved. Addressing this gap, our paper pioneers the multi-question generation (MQG) task specifically designed for earnings call contexts. Our methodology involves an exhaustive collection of earnings call transcripts and a novel annotation technique to classify potential questions. Furthermore, we introduce a retriever-enhanced strategy to extract relevant information. With a core aim of generating a spectrum of potential questions that analysts might pose, we derive these directly from earnings call content. Empirical evaluations underscore our approach's edge, revealing notable excellence in the accuracy, consistency, and perplexity of the questions generated.
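The retrieval step of a retriever-enhanced pipeline can be sketched with a toy lexical-overlap ranker over transcript passages. This is a simplified stand-in under my own assumptions (the paper's actual retriever is not specified here); function names and example passages are illustrative.

```python
# Toy retriever: rank earnings-call passages by token overlap with an
# analyst-style query, as a stand-in for a learned retriever.

def tokenize(text: str) -> set:
    """Lowercased bag-of-words tokenization."""
    return set(text.lower().split())

def rank_passages(query: str, passages: list) -> list:
    """Return passages sorted by descending token overlap with the query."""
    q = tokenize(query)
    return sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)

passages = [
    "Revenue grew 12% driven by cloud services.",
    "We opened a new office in Austin.",
]
top = rank_passages("What drove revenue growth this quarter?", passages)
```

The top-ranked passages would then be fed to the question generator as grounding context.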
September 2024
Training large language models (LLMs) from scratch is an expensive endeavor, particularly as world knowledge continually evolves. To maintain the relevance and accuracy of LLMs, model editing has emerged as a pivotal research area. While these methods hold promise, they can also produce unintended side effects. Their underlying factors and causes remain largely unexplored. This paper delves into a critical factor, question type, by categorizing model editing questions. Our findings reveal that the extent of performance degradation varies significantly across different question types, providing new insights for experimental design in knowledge editing. Furthermore, we investigate whether insights from smaller models can be extrapolated to larger models. Our results indicate discrepancies in findings between models of different sizes, suggesting that insights from smaller models may not necessarily apply to larger models. Additionally, we examine the impact of batch size on side effects, discovering that increasing the batch size can mitigate performance drops.
September 2024
Understanding the duration of news events' impact on the stock market is crucial for effective time-series forecasting, yet this facet is largely overlooked in current research. This paper addresses this research gap by introducing a novel dataset, the Impact Duration Estimation Dataset (IDED), specifically designed to estimate impact duration based on investor opinions. Our research establishes that pre-finetuning language models with IDED can enhance performance in text-based stock movement predictions. In addition, we juxtapose our proposed pre-finetuning task with sentiment analysis pre-finetuning, further affirming the significance of learning impact duration. Our findings highlight the promise of this novel research direction in stock movement prediction, offering a new avenue for financial forecasting. We also provide the IDED and pre-finetuned language models under the CC BY-NC-SA 4.0 license for academic use, fostering further exploration in this field.
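The two-stage recipe described above (pre-finetune on impact duration, then finetune on stock movement) can be sketched abstractly. The `train` stub below only records the stage order; a real implementation would invoke an actual trainer, and the example data are invented for illustration.

```python
# Sketch of the pre-finetuning pipeline: auxiliary task first, target
# task second. `train` is a placeholder that logs which task ran.

def train(model_state: list, task: str, data) -> list:
    """Stand-in trainer: appends the task name to the model's history."""
    return model_state + [task]

ided_data = [("Fed hike already priced in", "long"),
             ("One-day earnings spike", "short")]
movement_data = [("Earnings beat estimates", "up")]

model = []                                              # pretrained checkpoint
model = train(model, "impact_duration", ided_data)      # pre-finetuning stage
model = train(model, "stock_movement", movement_data)   # target-task stage
```

The point of the sketch is simply the ordering: the impact-duration signal is learned before the movement-prediction objective, rather than jointly.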
September 2024
In the era of rapid Internet and social media platform development, individuals readily share their viewpoints online. The overwhelming quantity of these posts renders comprehensive analysis impractical. This necessitates an efficient recommendation system to filter and present significant, relevant opinions. Our research introduces a dual-pronged argument mining technique to improve recommendation system effectiveness, considering both professional and amateur investor perspectives. Our first strategy involves using the discrepancy between target and closing prices as an opinion indicator. The second strategy applies argument mining principles to score investors' opinions, subsequently ranking them by these scores. Experimental results confirm the effectiveness of our approach, demonstrating its ability to identify opinions with higher profit potential. Beyond profitability, our research extends to risk analysis, examining the relationship between recommended opinions and investor behaviors. This offers a holistic view of potential outcomes following the adoption of these recommended opinions.
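The first strategy above, using the gap between target and closing prices as an opinion indicator, lends itself to a short sketch: score each opinion by the relative discrepancy and rank by its magnitude. Field names and numbers are illustrative assumptions, not the paper's data format.

```python
# Sketch of the price-discrepancy indicator: score opinions by the
# relative gap between the stated target price and the closing price,
# then rank by the absolute score.

def discrepancy(target_price: float, closing_price: float) -> float:
    """Relative gap between target and closing price."""
    return (target_price - closing_price) / closing_price

opinions = [
    {"text": "Strong buy, big upside", "target": 150.0, "close": 100.0},
    {"text": "Fairly valued here",     "target": 104.0, "close": 100.0},
]
ranked = sorted(opinions,
                key=lambda o: abs(discrepancy(o["target"], o["close"])),
                reverse=True)
```

Ranking by absolute discrepancy surfaces strongly bullish and strongly bearish opinions alike; the paper's second, argument-mining-based scoring strategy would replace this key function.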
June 2024
In this paper, we investigate the phenomenon of "selection biases" in Large Language Models (LLMs), focusing on problems where models are tasked with choosing the optimal option from an ordered sequence. We delve into biases related to option order and token usage, which significantly impact LLMs' decision-making processes. We also quantify the impact of these biases through an extensive empirical analysis across multiple models and tasks. Furthermore, we propose mitigation strategies to enhance model performance. Our key contributions are threefold: 1) precisely quantifying the influence of option order and token usage on LLMs, 2) developing strategies to mitigate the impact of token and order sensitivity to enhance robustness, and 3) offering a detailed analysis of sensitivity across models and tasks, which informs the creation of more stable and reliable LLM applications for selection problems.
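One simple way to quantify order sensitivity, sketched under my own assumptions rather than the paper's exact protocol: present the same options in every permutation and measure how often the answer flips relative to the original ordering. The `mock_choose` stub deliberately simulates a maximally order-biased model.

```python
# Sketch: measure option-order sensitivity by permuting the options and
# counting answer flips. `mock_choose` stands in for an LLM call.
from itertools import permutations

def mock_choose(options: tuple) -> str:
    """Toy biased model: always picks the first listed option."""
    return options[0]

def order_sensitivity(options: list) -> float:
    """Fraction of orderings that flip the answer vs. the original order."""
    baseline = mock_choose(tuple(options))
    answers = [mock_choose(p) for p in permutations(options)]
    return sum(a != baseline for a in answers) / len(answers)

score = order_sensitivity(["A", "B", "C"])
```

A perfectly order-invariant model scores 0; the first-position-biased stub above flips on 4 of the 6 orderings of three options.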
May 2024
Argument mining has typically been researched for specific corpora belonging to concrete languages and domains, independently in each research work. Human argumentation, however, has domain- and language-dependent linguistic features that determine the content and structure of arguments. Also, when deploying argument mining systems in the wild, we might not be able to control some of these features. Therefore, an important aspect that has not been thoroughly investigated in the argument mining literature is the robustness of such systems to variations in language and domain. In this paper, we present a complete analysis across three different languages and three different domains that allows us to better understand how to leverage the scarce available corpora to design argument mining systems that are more robust to natural language variations.
May 2024
... have demonstrated that LLMs are sensitive to subtle perturbations in the answer choices. For example, studies by Alzahrani et al. [1], Zheng et al. [37], and Wei et al. [31] have shown that even minor changes, such as reordering of the answer options, can lead to variability in the models' predictions and, consequently, affect benchmark rankings. In contrast to these studies, our work examines a different and under-explored factor in MCQA design: the impact of including "None of the Above" (NA) as the correct option. ...
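The perturbation discussed in this excerpt can be sketched as a small dataset transformation: replace the gold option with "None of the Above" so that NA becomes the correct answer. The field layout and helper name are illustrative assumptions, not the cited benchmark's format.

```python
# Sketch: build an MCQA variant where "None of the Above" is the
# correct option, by removing the original gold answer.

def with_na_as_correct(question: dict) -> dict:
    """Swap the gold option for 'None of the Above' (placed last)."""
    options = [o for o in question["options"] if o != question["answer"]]
    options.append("None of the Above")
    return {"stem": question["stem"],
            "options": options,
            "answer": "None of the Above"}

item = {"stem": "Capital of France?",
        "options": ["Paris", "Rome", "Madrid"],
        "answer": "Paris"}
na_item = with_na_as_correct(item)
```

Comparing model accuracy on the original items against these NA variants isolates the effect the excerpt describes.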
January 2024
... thus limiting the usability of the resulting models in domains other than the ones included during training. Although some work has investigated cross-domain and cross-language argument mining (Al Khatib et al., 2016; Eger et al., 2018), this issue has never been systematically explored in depth, leaving the door open to a new challenging direction: robustness in argument mining (Ruiz-Dolz et al., 2024). Considering the relevance of language and domain in natural language argumentation, developing robust systems is a main issue if we want to be able to effectively deploy them in real-world scenarios. ...
May 2024
... The NumHG dataset was prepared to cover the headline generation task with a focus on numbers (Chen et al., 2024; Huang et al., 2023). The NumEval competition is held to evaluate different models that can understand the numbers and generate headlines according to the specified ground truth (NumEval). ...
January 2024
... GPT-4o was also more likely to select the first choice presented, albeit to a lesser degree (p=.004), and GPT-4 was actually more likely to select the last choice presented (p<.001). This is consistent with previous findings of order bias in LLMs 53,54. The fact that ...
June 2024
... Overall, there are three major areas of legal NLP: legal question-answering (Khazaeli et al., 2021; Kien et al., 2020; Ryu et al., 2023; Martinez-Gil, 2023), judgment prediction (Masala et al., 2021; Valvoda et al., 2023; Juan et al., 2023), and corpus mining (e.g., summarization, text classification, information extraction, and retrieval-related research) (Poudyal et al., 2020; Li et al., 2022; Vihikan et al., 2021; Limsopatham, 2021; de Andrade and Becker, 2023). There has also been some broad methodology work that is aimed at working on various legal tasks in general (e.g., LegalBERT (Chalkidis et al., 2020)). ...
January 2023
... The evidence reviewed here suggests that rationalization models might improve performance by prompting language models in a few-shot manner, with rationale-augmented examples. Using this approach, Chen et al. (2023) introduced ZARA, an approach for data augmentation and extractive rationalization using transformer-based models (Vaswani et al., 2017) such as RoBERTa (Liu et al., 2019b), DeBERTa (He et al., 2020), and BART (Lewis et al., 2020). Along the same lines, Zhou et al. (2023) presented a two-stage few-shot learning method that first generates rationales using GPT-3 (Brown et al., 2020), and then finetunes a smaller rationalization model, RoBERTa, with generated explanations. ...
January 2023
... Our proposed method only incurs a linearly scaling overhead due to the nature of the nearest-neighbor algorithm. Self-ICL (Chen et al., 2023) generates its own demonstrations and pseudo-labels and uses them as demonstrations. We disagree with Self-ICL's premise that even unlabeled data are hard to come by in realistic settings, and posit that unlabeled data are abundant and inexpensive to obtain (Zou et al., 2023a,b). ...
January 2023
... Further studies have extended the concept of CS to incorporate additional criteria in scoring candidates for the next token. Chen et al. (2023) investigate the effect of adding a third criterion, fidelity, beyond model confidence and degeneration penalty to enhance the coherence of the generated text. This third criterion is again weighted by a hyperparameter β that is determined from empirical results. ...
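A numeric sketch of the scoring rule this excerpt describes: contrastive search's confidence term and degeneration penalty, extended with a β-weighted fidelity term. The exact combination below is my illustrative assumption, not the cited paper's formula, and the candidate values are invented.

```python
# Sketch: contrastive-search-style candidate scoring with a third
# fidelity criterion weighted by beta. Higher score is better.

def cs_score(confidence, degeneration_penalty, fidelity,
             alpha=0.6, beta=0.2):
    """Reward confidence and fidelity; penalize repetitive candidates."""
    return ((1 - alpha) * confidence
            - alpha * degeneration_penalty
            + beta * fidelity)

candidates = {
    "novel_token":    cs_score(0.30, 0.10, 0.8),  # low similarity to context
    "repeated_token": cs_score(0.50, 0.90, 0.8),  # high degeneration penalty
}
best = max(candidates, key=candidates.get)
```

Even with lower raw confidence, the non-repetitive candidate wins because the degeneration penalty dominates; tuning α and β trades off these criteria, which matches the excerpt's note that β is set empirically.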
January 2023
... Cai et al. (2023) proposed a research contribution recognition model enhanced by semantic role annotation based on SciBERT. Similarly, Liu et al. (2023) divided scientific contributions into four types: approach, analysis, result, topic or resource. They finetuned SciBERT by manually annotating contribution type data, and the model finally achieved a Micro-F1 result of 90.5. ...
October 2023
... Within this innovative framework, they meticulously assessed the effectiveness of graph-based models in predicting function. In a related vein, Tsai et al. (2023) have developed a comprehensive citation graph, which labels citation intents and their associated supporting evidence between citing and cited papers. Their model utilizes SciBERT for processing a wide array of papers. ...