Figure 3 - uploaded by Pushpendre Rastogi
Source publication
We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40% better coverage when evaluated in a pr...
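As a rough illustration of the expansion recipe in this abstract (paraphrase existing lexical units, then keep only candidates that crowdworkers rate as evoking the frame), here is a minimal Python sketch; the frame, the paraphrase pairs, the crowd scores, and the 3.5 threshold are hypothetical stand-ins, not the released FrameNet+ pipeline or data.

```python
# Hypothetical sketch: expand a frame's lexical units (LUs) with paraphrases,
# keeping only candidates that pass a crowdsourced precision filter.
from collections import defaultdict

# frame -> existing lexical units
framenet_lus = {"Commerce_buy": {"buy.v", "purchase.v"}}

# word-level paraphrase candidates, e.g. mined from a PPDB-style resource
paraphrases = {"purchase.v": ["acquire.v", "procure.v", "buy up.v"]}

# mean crowdworker rating (1-5) for "does this word evoke the frame?"
crowd_scores = {
    ("Commerce_buy", "acquire.v"): 4.6,
    ("Commerce_buy", "procure.v"): 4.2,
    ("Commerce_buy", "buy up.v"): 2.1,
}

def expand_frame_lus(frames, paraphrases, scores, threshold=3.5):
    """Propose paraphrases of each LU and keep only high-precision candidates."""
    expanded = defaultdict(set)
    for frame, lus in frames.items():
        expanded[frame] |= lus
        for lu in lus:
            for cand in paraphrases.get(lu, []):
                if scores.get((frame, cand), 0.0) >= threshold:
                    expanded[frame].add(cand)
    return dict(expanded)

print(expand_frame_lus(framenet_lus, paraphrases, crowd_scores))
# Commerce_buy gains acquire.v and procure.v; the low-rated "buy up.v" is dropped.
```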
Similar publications
In this paper, combined with the Kalman filtering algorithm, the linear weighted sum of the normalized prediction information increment DI and the norm of the target filter error covariance matrix ‖P_c‖ is used to represent the pairing function of the target-sensor combination and the assignment and automatic updating of the pairing function between...
The present study, carried out in the upper Bandama watershed at Tortiya in northern Côte d’Ivoire, aims to evaluate the impact of climate variability on groundwater resources with a view to better management. It is based on the exploitation of rainfall and runoff data. Thus, application of the Hanning low-pass filter and statistical tests (Pettitt, Lee and Heg...
Deep Packet Inspection (DPI) plays a central role in modern networks. Specifically, content providers need a way to monitor the traffic that enters and leaves their datacenters. Most DPI techniques depend on using a predefined set of signatures in the packet payload. This matching process consumes a lot of memory and CPU resources, and thus, many re...
This review paper provides a comprehensive overview of microstrip passive components for energy harvesting and 5G applications. The paper covers the structure, fabrication and performance of various microstrip passive components such as filters, couplers, diplexers and triplexers. The size and performance of several 5G and energy harvester microstr...
Automatic patch generation is often described as a search problem of patch candidate space, and it has two major issues: one is search space size, and the other is navigation. An effective patch generation technique should have a large search space with a high probability that patches for bugs are included, and it also needs to locate such patches...
Citations
... In an effort to improve FrameNet's LU coverage, Pavlick et al. (2015) proposes increasing the LU vocabulary via automatic paraphrasing and crowdworker verification, without expanding the lexicographic annotations. Others address this limitation by generating annotations through lexical substitution (Anwar et al., 2023) and predicate replacement (Pancholy et al., 2021); neither leverages the generative capabilities of LLMs, however. ...
Despite the remarkable generative capabilities of language models in producing naturalistic language, their effectiveness on explicit manipulation and generation of linguistic structures remains understudied. In this paper, we investigate the task of generating new sentences preserving a given semantic structure, following the FrameNet formalism. We propose a framework to produce novel frame-semantically annotated sentences following an overgenerate-and-filter approach. Our results show that conditioning on rich, explicit semantic information tends to produce generations with high human acceptance, under both prompting and finetuning. Our generated frame-semantic structured annotations are effective for training data augmentation in frame-semantic role labeling in low-resource settings; however, we do not see benefits in higher-resource settings. Our study concludes that while generating high-quality, semantically rich data might be within reach, the downstream utility of such generations remains to be seen, highlighting the outstanding challenges with automating linguistic annotation tasks.
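The overgenerate-and-filter recipe described in this abstract can be summarized as a short loop; both inner functions below are dummy stand-ins (an LLM conditioned on the frame structure and a learned acceptability filter in the paper's setting), so this is only a sketch of the control flow, not the authors' implementation.

```python
# Skeleton of an overgenerate-and-filter loop for frame-conditioned generation.
# generate_candidates and acceptability_score are placeholders, not the paper's models.

def generate_candidates(frame, roles, n=20):
    # stand-in: a real system would sample n sentences from an LLM prompted
    # (or finetuned) on the frame name, its definition, and the role fillers
    return [f"[candidate {i} for {frame} with roles {roles}]" for i in range(n)]

def acceptability_score(sentence, frame, roles):
    # stand-in: a real filter would score whether the sentence still evokes
    # `frame` with the requested role structure
    return 0.0

def overgenerate_and_filter(frame, roles, n=20, keep_top=5):
    candidates = generate_candidates(frame, roles, n=n)
    ranked = sorted(candidates,
                    key=lambda s: acceptability_score(s, frame, roles),
                    reverse=True)
    return ranked[:keep_top]

print(overgenerate_and_filter("Motion", {"Theme": "the ball", "Goal": "the net"}))
```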
... We have used various NLP datasets to see the effectiveness of KL Regularized normalization in out-of-domain generalization. We have used the datasets referred to in [34], including SICK [35], ADD1 [42], JOCI [56], MPE [30], MNLI, SNLI, SciTail [24], and three datasets from [52], namely DPR [45], FN+ [43], and SPR [46], as well as Quora Question Pairs (QQP) interpreted as an NLI task as in [14]. We use the same split used in [51] for the experiment. ...
Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time. In light of this, a significant amount of research has been conducted on adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks and have been successfully used in a wide variety of applications. Many normalization techniques have been proposed, but the success of normalization in low-resource downstream NLP and speech tasks is limited. One of the reasons is the inability to capture expressiveness by rescaling parameters of normalization. We propose Kullback-Leibler (KL) Regularized normalization (KL-Norm), which makes the normalized data well behaved and helps in better generalization as it reduces over-fitting, generalizes well on out-of-domain distributions and removes irrelevant biases and features with negligible increase in model parameters and memory overheads. Detailed experimental evaluation on multiple low-resource NLP and speech tasks demonstrates the superior performance of KL-Norm as compared to other popular normalization and regularization techniques.
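The abstract does not spell out the regularizer, but a KL penalty of this kind is, for a diagonal Gaussian posterior against a standard-normal prior, available in closed form; the snippet below shows that standard term and its β weighting as a sketch of the general idea, not the authors' exact layer.

```python
# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), the standard penalty used to pull a
# Gaussian posterior toward a standard-normal prior; illustrative of the general
# idea behind KL-regularized normalization, not the authors' exact formulation.
import torch

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Per-example KL summed over feature dimensions, averaged over the batch."""
    # KL = 0.5 * sum( sigma^2 + mu^2 - 1 - log sigma^2 )
    kl_per_dim = 0.5 * (logvar.exp() + mu.pow(2) - 1.0 - logvar)
    return kl_per_dim.sum(dim=-1).mean()

# total_loss = task_loss + beta * kl_to_standard_normal(mu, logvar)   # beta is a tuned weight
```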
... Data augmentation in the context of FrameNet and SRL is not novel. Pavlick et al. (2015) created an expanded FN via automatic paraphrasing and crowdsourcing to confirm frame assignment of LUs. Hartmann et al. (2017) used automatically generated training data by linking FrameNet, PropBank, and VerbNet to study the differences among those resources in frame assignment and semantic role classification. ...
While FrameNet is widely regarded as a rich resource of semantics in natural language processing, a major criticism concerns its lack of coverage and the relative paucity of its labeled data compared to other commonly used lexical resources such as PropBank and VerbNet. This paper reports on a pilot study to address these gaps. We propose a data augmentation approach, which uses existing frame-specific annotation to automatically annotate other lexical units of the same frame which are unannotated. Our rule-based approach defines the notion of a sister lexical unit and generates frame-specific augmented data for training. We present experiments on frame-semantic role labeling which demonstrate the importance of this data augmentation: we obtain a large improvement to prior results on frame identification and argument identification for FrameNet, utilizing both full-text and lexicographic annotations under FrameNet. Our findings on data augmentation highlight the value of automatic resource creation for improved models in frame-semantic parsing.
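A toy sketch of the sister-LU idea from this abstract: an annotated sentence whose target evokes frame F is copied with the target swapped for another lexical unit of F, carrying the role spans over unchanged. The surface-form table and the data layout are simplified placeholders rather than the paper's actual rules.

```python
# Toy sister-lexical-unit augmentation: reuse an annotated sentence for other
# LUs of the same frame. Data structures and surface forms are placeholders.

frame_lus = {"Statement": ["say.v", "state.v", "declare.v"]}
# a real system would inflect LUs properly; here we just look up past forms
past_form = {"say.v": "said", "state.v": "stated", "declare.v": "declared"}

annotated = {
    "frame": "Statement",
    "tokens": ["She", "said", "that", "he", "left", "."],
    "target_idx": 1,                                  # token position of the LU
    "roles": {"Speaker": (0, 0), "Message": (2, 4)},  # inclusive token spans
}

def augment(example, frame_lus, forms):
    """Yield copies of the example with the target replaced by sister LUs."""
    original = example["tokens"][example["target_idx"]]
    for sister in frame_lus.get(example["frame"], []):
        form = forms[sister]
        if form == original:
            continue
        tokens = list(example["tokens"])
        tokens[example["target_idx"]] = form
        yield {**example, "tokens": tokens}   # role spans are carried over as-is

for ex in augment(annotated, frame_lus, past_form):
    print(" ".join(ex["tokens"]))
# She stated that he left .
# She declared that he left .
```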
... To evaluate out-of-domain generalization, we take NLI models trained on the medium-sized 6K subsampled SNLI and MNLI in Section 3.2 and evaluate their generalization on several NLI datasets: DPR (Rahman & Ng, 2012), FN+ (Pavlick et al., 2015), SPR (Reisinger et al., 2015), and Quora Question Pairs (QQP) interpreted as an NLI task as by Gong et al. (2017). We use the same split used in Wang et al. (2017). ...
While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of these features are inevitably irrelevant for a given target task. We propose to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and show that our method successfully reduces overfitting. Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets. Evaluation on seven low-resource datasets in different tasks shows that our method significantly improves transfer learning in low-resource scenarios, surpassing prior work. Moreover, it improves generalization on 13 out of 15 out-of-domain natural language inference benchmarks. Our code is publicly available in https://github.com/rabeehk/vibert.
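As a concrete illustration of the bottleneck idea, here is a minimal VIB-style classification head in PyTorch: the pooled encoder output is mapped to a Gaussian, a sample is drawn with the reparameterization trick, and a KL term against a standard-normal prior is added to the task loss. Layer sizes, the β weight, and the module names are illustrative and not taken from the released vibert code.

```python
# Minimal VIB-style head over a pooled sentence representation (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    def __init__(self, in_dim=768, bottleneck=128, num_labels=3, beta=1e-3):
        super().__init__()
        self.mu = nn.Linear(in_dim, bottleneck)       # posterior mean
        self.logvar = nn.Linear(in_dim, bottleneck)   # posterior log-variance
        self.classifier = nn.Linear(bottleneck, num_labels)
        self.beta = beta

    def forward(self, h, labels=None):
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        logits = self.classifier(z)
        if labels is None:
            return logits
        kl = 0.5 * (logvar.exp() + mu.pow(2) - 1.0 - logvar).sum(-1).mean()
        loss = F.cross_entropy(logits, labels) + self.beta * kl
        return loss, logits

# h would be a pooled encoder vector (e.g. BERT's [CLS] representation):
# loss, logits = VIBHead()(h, labels)
```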
... Snow et al. (2006) used hypernym predictions and coordinate term classifiers to add 10,000 new WordNet entries with high precision. FrameNet+ (Pavlick et al., 2015) tripled the size of FrameNet by substituting words from PPDB (Ganitkevitch et al., 2013), a collection of primarily word-level paraphrases obtained via bilingual pivoting. PPDB paraphrases lack sentential context; for example, "river bank", "bank account", and "data bank" are listed as paraphrases of "bank", in addition to the broader and incorrectly cased "organizations" and, less related still, "administrators", without any means of determining when one might not be a valid substitute. While the FrameNet+ expansion itself involved little cost, the lexicalized nature of their procedure failed to capture word senses in context and resulted in many false positives, requiring costly manual evaluation of every sentence. ...
... FrameNet has been used in tasks ranging from question-answering (Shen and Lapata, 2007) and information extraction (Ruppenhofer and Rehbein, 2012) to semantic role labeling (Gildea and Jurafsky, 2002) and recognizing textual entailment (Burchardt and Frank, 2006), in addition to finding utility as a lexicographic compendium. As a manually created resource, FrameNet is limited by the size of its lexical inventory and number of annotations (Shen and Lapata, 2007; Pavlick et al., 2015). ...
... We demonstrate the usefulness of our approach on downstream tasks in §6.3, where we apply our generated paraphrastic dataset to the task of Frame ID. Following Pavlick et al. (2015), we consider FrameNet as an illustrative resource motivating augmentation. In all experiments we treat each system output (paraphrase and alignment) as evoking the same frame as the original FrameNet input sentence. ...
We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus. We demonstrate our approach with experiments on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. With four days of training data collection for a span alignment model and one day of parallel compute, we automatically generate and release to the community 495,300 unique (Frame, Trigger) pairs in diverse sentential contexts, a roughly 50-fold expansion atop FrameNet v1.7. The resulting dataset is intrinsically and extrinsically evaluated in detail, showing positive results on a downstream task.
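The projection step of such a pipeline (carrying the trigger and role spans of a frame from the original sentence onto its paraphrase through a span alignment) can be sketched as below; the paraphrase, the alignment, and the frame and role names are hard-coded stand-ins for the outputs of the constrained paraphraser and the discriminative aligner.

```python
# Sketch: project frame annotations onto a paraphrase via a span alignment.
# The paraphrase, alignment, and frame/role names are illustrative stand-ins.

original = "The committee approved the proposal".split()
paraphrase = "The panel signed off on the plan".split()

# alignment: original (start, end) token span -> paraphrase span, inclusive
alignment = {
    (0, 1): (0, 1),   # "The committee" -> "The panel"
    (2, 2): (2, 4),   # "approved"      -> "signed off on"
    (3, 4): (5, 6),   # "the proposal"  -> "the plan"
}

annotation = {
    "frame": "Grant_permission",          # frame name illustrative
    "trigger": (2, 2),
    "roles": {"Grantor": (0, 1), "Action": (3, 4)},
}

def project(annotation, alignment):
    """Map the trigger and role spans through the alignment; drop unaligned spans."""
    out = {"frame": annotation["frame"],
           "trigger": alignment.get(annotation["trigger"]),
           "roles": {}}
    for role, span in annotation["roles"].items():
        if span in alignment:
            out["roles"][role] = alignment[span]
    return out

print(project(annotation, alignment))
# {'frame': 'Grant_permission', 'trigger': (2, 4),
#  'roles': {'Grantor': (0, 1), 'Action': (5, 6)}}
```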
... During the annotation process, we found some difficulties because of the coverage issue of FrameNet. The word-to-frame mapping in FrameNet has a coverage issue, and it has been widely reported in the literature (Pavlick et al., 2015; Botschen et al., 2017). ...
... To handle the first two cases (finding of a frame), we use synonyms from WordNet (Miller, 1994) and FrameNet+ data (Pavlick et al., 2015). To simplify the annotation process, we develop a tool (Figure 2) that makes the annotation process easier. ...
... Several efforts have enhanced FrameNet by mapping it to other lexicons, such as WordNet, PropBank and VerbNet (Shi and Mihalcea, 2005; Palmer, 2009; Ferrández et al., 2010). Pavlick et al. (2015) increased the lexical coverage of FrameNet through automatic paraphrasing and manual verification. Yatskar et al. (2016) introduced situation recognition, which is the problem of producing a concise summary of the situation that an image depicts. ...
... We report the detailed results per heuristic on HANS in Table 6 and the results on the mismatched hard test set of MNLI in Table 7. We additionally evaluate on DPR (Rahman & Ng, 2012), FrameNet+ (FN+) (Pavlick et al., 2015) and Semantic Proto-Roles (SPR) (Reisinger et al., 2015). We also evaluate on the hard SNLI test set (Gururangan et al., 2018), which a hypothesis-only model cannot solve easily. ...
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues may not be explicitly identified, and show a method for training models that learn to ignore these problematic correlations. Our approach relies on the observation that models with limited capacity primarily learn to exploit biases in the dataset. We can leverage the errors of such limited capacity models to train a more robust model in a product of experts, thus bypassing the need to hand-craft a biased model. We show the effectiveness of this method to retain improvements in out-of-distribution settings even if no particular bias is targeted by the biased model.
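In log space, the product-of-experts combination described above amounts to adding the log-probabilities of the frozen weak (limited-capacity) model to those of the main model and taking the cross-entropy of the combined distribution; a minimal PyTorch sketch of that loss follows, as an illustration of the technique rather than the authors' released code.

```python
# Minimal product-of-experts debiasing loss: train the main model through the
# combination of its log-probabilities with those of a frozen weak model, so it
# gains little from patterns the weak model already captures.
import torch
import torch.nn.functional as F

def poe_loss(main_logits: torch.Tensor,
             weak_logits: torch.Tensor,
             labels: torch.Tensor) -> torch.Tensor:
    # product of experts in probability space == sum of log-probabilities
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(weak_logits.detach(), dim=-1)
    # cross_entropy renormalizes the combined scores before taking the loss
    return F.cross_entropy(combined, labels)

# At test time only the main model is used: predictions = main_logits.argmax(-1)
```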