Publications (14)
July 2024 · 3 Reads
May 2024 · 1 Read
In-context learning (ICL), which performs inference conditioned on a few demonstrations, has become a widespread paradigm for stimulating LLM capabilities on downstream tasks. Due to context length constraints, it cannot be further improved even when more training data are available, and the general features obtained directly from the LLM in ICL are not adapted to the specific downstream task. In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which leverages task-adaptive features to promote inference on the downstream task, under the supervision of beyond-context samples. Specifically, it first extracts general features of beyond-context samples one by one via the LLM using the ICL input form, and then introduces a task-specific modulator that performs feature refinement and prediction after fitting the specific downstream task. We conduct extensive experiments on FADS-ICL under varying data settings (4–128 shots) and LLM scales (0.8B–70B). Experimental results show that FADS-ICL consistently outperforms previous state-of-the-art methods by a significant margin under all settings, verifying its effectiveness and superiority. For example, under the 1.5B-model, 32-shot setting, FADS-ICL achieves +14.3 average accuracy from feature adaptation over vanilla ICL across 10 datasets, and +6.2 average accuracy over the previous state-of-the-art method, and performance further improves with increasing training data. Code and data are publicly available at https://github.com/jiahaozhenbang/FADS-ICL.
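The two-stage recipe the abstract describes (extract a general feature per beyond-context sample via the LLM, then fit a small task-specific modulator on those features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM feature extractor is replaced by a hypothetical stand-in that emits separable vectors, and the modulator is a single linear softmax layer trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for LLM feature extraction: in FADS-ICL the general
# feature of a beyond-context sample comes from the LLM under the ICL input
# form; here we simulate two roughly separable classes in a 16-dim space.
def extract_general_feature(label, dim=16):
    center = np.zeros(dim)
    center[label] = 3.0
    return center + rng.normal(size=dim)

# Task-specific "modulator": a linear softmax classifier fitted on the
# extracted features of labeled beyond-context samples (a minimal sketch).
def fit_modulator(X, y, n_classes=2, lr=0.1, steps=200):
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0  # softmax cross-entropy gradient
        W -= lr * X.T @ p / len(y)
        b -= lr * p.mean(axis=0)
    return W, b

# Fit the modulator on beyond-context samples, then refine predictions
# for test inputs from their general features.
y_train = rng.integers(0, 2, size=64)
X_train = np.stack([extract_general_feature(l) for l in y_train])
W, b = fit_modulator(X_train, y_train)

y_test = rng.integers(0, 2, size=32)
X_test = np.stack([extract_general_feature(l) for l in y_test])
acc = ((X_test @ W + b).argmax(axis=1) == y_test).mean()
```

Because the modulator is trained outside the context window, adding more labeled samples keeps improving it, which is the data-scalability the abstract emphasizes.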
January 2024 · 5 Reads
January 2024 · 7 Reads
January 2024 · 4 Reads · 3 Citations
January 2024 · 1 Citation
January 2024 · 4 Reads
January 2024 · 3 Reads · 1 Citation
Signal Processing Letters, IEEE
Although pre-trained language models (PTLMs) have achieved great success on the machine reading comprehension task, they often rely on large-scale annotated data, while only a small amount of data is available in most real-world scenarios. To enhance the PTLMs' capabilities in low-resource scenarios, we propose a curriculum learning driven domain adaptation method for low-resource machine reading comprehension, the basic paradigm of which is to train a source model with sufficient data and then adapt it to our target domain. In the adaptation procedure, we introduce the curriculum learning strategy, the core idea of which is arranging training examples from easy to difficult, to bridge the gap between the source and target domains and enable the source model to adapt to the target domain progressively. Specifically, before fine-tuning the well-trained source model on target data, we first calculate the loss of each target example under the source model to evaluate the example's difficulty accurately. After that, we sample suitable batches based on an increasing sampling function at each fine-tuning step, allowing the source model to start learning from easy examples in the target domain and gradually transition to difficult ones. Experiments conducted on two public datasets demonstrate the effectiveness of our method.
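The sampling scheme described above (rank target examples by source-model loss, then draw each batch from an easy pool that grows over the course of fine-tuning) can be sketched as below. The difficulty scores here are simulated rather than computed from a real source model, and the linear growth schedule is an assumed increasing sampling function; the paper's exact schedule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical difficulty scores: in the method, the difficulty of each
# target example is its loss under the well-trained source model.
losses = rng.exponential(scale=1.0, size=1000)
order = np.argsort(losses)  # easy (low loss) -> hard (high loss)

def curriculum_batch(step, total_steps, batch_size=32, start_frac=0.2):
    """Sample a batch from the easiest c(t) fraction of target examples,
    where c(t) grows linearly from start_frac to 1 over fine-tuning."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    pool = order[: max(batch_size, int(frac * len(order)))]
    return rng.choice(pool, size=batch_size, replace=False)

early = curriculum_batch(step=0, total_steps=100)   # easy examples only
late = curriculum_batch(step=100, total_steps=100)  # full target domain
```

Early batches come exclusively from low-loss examples, so the source model sees target-domain data closest to its own distribution first and meets the hardest examples only once the pool has expanded.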
January 2023 · 49 Reads · 3 Citations
JUSTC
Question generation aims to generate meaningful and fluent questions, which can address the lack of question-answer annotated corpora by augmenting the available data. Taking unannotated text with optional answers as input, question generation can be divided into two types based on whether answers are provided: answer-aware and answer-agnostic. While generating questions with provided answers is challenging, generating high-quality questions without provided answers is even more difficult, for both humans and machines. To address this issue, we propose a novel end-to-end model called QGAE, which transforms answer-agnostic question generation into answer-aware question generation by directly extracting candidate answers. This approach effectively utilizes unlabeled data to generate high-quality question-answer pairs, and its end-to-end design makes it more convenient than a multi-stage method that requires at least two pre-trained models. Moreover, our model achieves better average scores and greater diversity. Our experiments show that QGAE achieves significant improvements in generating question-answer pairs, making it a promising approach for question generation.
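The core idea, turning answer-agnostic QG into answer-aware QG by first extracting candidate answers from the passage, can be illustrated with the toy pipeline below. Everything here is a stand-in: QGAE performs extraction and generation jointly in one end-to-end model, whereas this sketch uses a naive capitalized-span heuristic as the extractor and a cloze template in place of a learned question generator.

```python
import re

def extract_candidate_answers(passage):
    # Hypothetical extractor: capitalized spans not at sentence start.
    # The real model learns to extract answer candidates end-to-end.
    return re.findall(r"(?<=[a-z] )([A-Z][\w]*(?: [A-Z][\w]*)*)", passage)

def generate_question(passage, answer):
    # Cloze-style placeholder for a learned answer-aware QG model:
    # blank out the answer span to form a fill-in question.
    return passage.replace(answer, "____")

# Unlabeled text in, question-answer pairs out.
passage = "The QGAE model was proposed for question generation in Hefei."
pairs = [(generate_question(passage, a), a)
         for a in extract_candidate_answers(passage)]
```

This shows the data flow only: once candidate answers are available, any answer-aware generator can consume (passage, answer) pairs, which is what makes the reduction useful for mining question-answer pairs from unlabeled text.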
Citations (7)
... In addition to directly selecting examples from training data, another research trend involves utilizing LLMs to reformat the representation of existing demonstrations (Hao et al., 2022b; Liu et al., 2024a; Li et al., 2024a). [Flattened table excerpt, partially recovered: Mutual Information (Sorensen et al., 2022): human design, GPT-3; EPR (Rubin et al., 2022): human design, GPT-{J, 3}/CodeX, score-based retrieval; IDS (Qin et al., 2023): human design, GPT-3.5, iterative selection; AdaICL (Mavromatis et al., 2023): human design, GPT-{J, Neo}, selective demonstration; UDR (Li et al., 2023d): human design, GPT-Neo-2.7B, unified retrieval.] ...
Reference:
A Survey on In-context Learning
- Citing Conference Paper
January 2024
... Curriculum learning [2] has been extensively studied as a training paradigm that orders the training set by increasing difficulty to enhance stability and sample efficiency. In the context of question answering (QA), curriculum learning has been leveraged to bridge distributional gaps between pre-training and downstream fine-tuning datasets [31], mitigating domain shift and improving generalization. Recent advances in LLMs have incorporated curriculum-inspired self-improvement mechanisms [10], wherein models iteratively augment their training data with instances they can already solve, thereby facilitating generalization to more complex reasoning tasks. ...
- Citing Article
January 2024
Signal Processing Letters, IEEE
... Recent parameter-efficient approaches in KGs representation have introduced methods for reducing model complexity and embedding dimensions by utilizing only a small subset of entities [14,15]. In these methods, entities for embedding are chosen randomly beforehand, and rather than independently embedding each entity, the model leverages specific types of distinguishing information to encode all entities. ...
- Citing Conference Paper
January 2023
... Providing an appropriate context for QG is crucial in order to produce questions that are relevant to the educational material. Before LLMs appeared, related work [10], [5], [9] implemented context-specific QG models using well-known datasets. Other work focused on educational environments, such as in [4], where resources like school repositories, Wikipedia, or other websites provided context. ...
- Citing Article
January 2023
JUSTC
... Given the scarcity of parallel data (i.e., text pairs conveying the same content but differing in style) and the labor-intensive nature of annotating such pairs, existing research has predominantly focused on unsupervised TST. Recent contributions in this domain (Lee et al., 2021; Huang et al., 2021; Suzgun et al., 2022; Ramesh Kashyap et al., 2022; Han et al., 2023) have demonstrated significant progress. Despite notable success, these works primarily concentrate on the transfer of a single sentence, which we call short TST. ...
- Citing Conference Paper
January 2023
... Recently, various studies [3,29,31,41,52] have been conducted to achieve efficient data learning. Most real datasets are large and have different data difficulties. ...
- Citing Article
January 2021
IEEE/ACM Transactions on Audio Speech and Language Processing
... A series of illustrative experiments were meticulously crafted to showcase the advantages of employing curriculum strategies in both image classification and language modeling. In the field of NLP, by experimenting with several heuristics, Sachan and Xing (2016) and Xu et al. (2020) migrated the success of CL to NLU tasks. Zhou et al. (2021) improved machine translation modeling by carefully designing different curricula. ...
- Citing Conference Paper
January 2020