Figure - uploaded by Mausam Mausam
In SmallCorpus, SRL-IE-Lund has the highest precision. Taking the union of the SRL systems and the higher precision results from TextRunner achieves the highest recall and F-measure. Both SRL-based systems require over an order of magnitude more processing time. The bold values indicate the highest values for the metric and relation-type.
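As a rough illustration of how the metrics in this table relate, the sketch below computes precision, recall against a pooled set of correct tuples (the pseudo-recall described in Context 1 of the source publication), and F-measure, and shows why a union of systems can raise recall. The tuples, system names, and helper functions are invented toy examples, not data or code from the paper.

```python
# Toy sketch: precision, pseudo-recall, and F1 for Open IE extractors.
# All tuples and system names below are hypothetical illustrations.

def pooled_correct(system_outputs, judged_correct):
    """Union of the correct tuples produced by any system (recall denominator)."""
    pool = set()
    for tuples in system_outputs.values():
        pool |= set(tuples) & judged_correct
    return pool

def score(extracted, judged_correct, pool):
    """Return (precision, pseudo-recall, F1) for one set of extractions."""
    extracted = set(extracted)
    hits = extracted & judged_correct
    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(pool) if pool else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical (arg1, relation, arg2) tuples.
t1 = ("Ada", "wrote", "a program")
t2 = ("Ada", "worked with", "Babbage")
t3 = ("Babbage", "designed", "an engine")
x = ("a program", "wrote", "Ada")       # incorrect extraction
y = ("Babbage", "worked", "an engine")  # incorrect extraction

judged_correct = {t1, t2, t3}  # tuples a human judge marked correct
outputs = {
    "system_a": {t1, t2, x},
    "system_b": {t2, t3, y},
}

pool = pooled_correct(outputs, judged_correct)
union_output = set().union(*outputs.values())

scores_a = score(outputs["system_a"], judged_correct, pool)
scores_union = score(union_output, judged_correct, pool)

print("system_a:", scores_a)
print("union:   ", scores_union)
```

In this toy setup each system finds correct tuples the other misses, so the union recovers the whole pool and its recall rises, at some cost in precision.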
Source publication
Open Information Extraction extracts relations from text without requiring a pre-specified domain or vocabulary. While existing techniques have used only shallow syntactic features, we investigate the use of semantic role labeling techniques for the task of Open IE. Semantic role labeling (SRL) and Open IE, although developed mostly in isolation, a...
Contexts in source publication
Context 1
... we compute pseudorecall by taking the union of correct tuples from all methods as denominator. Table 2 reports the performance of the three extractors on our data sets for this traditional NLP setting. Overall, SRL-IE-Lund achieves the highest precision, and SRL-IE-UIUC achieves the highest recall and the highest F1 score. ...
Context 2
... taking a union of the SRL-based systems' output and the highest precision subset of TextRunner's extractions, we achieve the highest recall and F-measure (Table 2). We identify the highest precision subset of TextRunner's extractions by our novel locality ranking (see Figure 2). This shows the benefit of using multiple systems for extraction: they extract different tuples. ...
Citations
... It utilizes a set of patterns in order to obtain propositions but does not capture the 'context' of each clause for effective extraction. A follow-up study relies on semantic features (semantic roles) for the OIE task, demonstrating that semantic role labeling (SRL) can be used to increase the precision and recall of OIE [8]. Separately, a greedy parser, which relies on a classifier to predict the correct transition based on a small number of dense features, is treated for speedy parsing [6]. ...
Large Language Models (LLMs) have recently received considerable interest across a wide range of applications. During pre-training on massive datasets, such a model implicitly memorizes the factual knowledge of its training data in its hidden parameters. However, knowledge held implicitly in parameters is often used ineffectively by downstream applications due to the lack of common-sense reasoning. In this article, we introduce a general framework for building knowledge bases with the aid of LLMs, tailored for processing Web news. The framework applies a rule-based News Information Extractor (NewsIE) to news items to extract their relational tuples, referred to as knowledge bases, which are then graph-convoluted with the implicit knowledge facts of news items obtained by LLMs, for their classification. It involves two lightweight components: 1) NewsIE, for extracting the structural information of every news item in the form of relational tuples; 2) BERTGraph, for graph convoluting the implicit knowledge facts with the relational tuples extracted by NewsIE. We have evaluated our framework on different news-related datasets for news category classification, with promising experimental results.
... This broadly comprises two steps: identifying entities signifying specific information of interest (known as named entity recognition or NER), and inferring the relationships between identified entities (known as relation extraction or RE). Approaches underlying these tasks use grammatical structures to extract semantic frames [96,97], knowledge bases for relation inference [98], syntactic dependency parsing-based information extraction [99], semantic role labeling [100], coreference resolution [101], and so on. [110,90], process flowsheets [92,84], and many more applications of graph-based data mining as presented in [111]. ...
Process systems engineering (PSE) involves a systems-level approach to solving problems in chemical engineering related to process modeling, design, control, and optimization and involves modeling interactions between various systems (and subsystems) governing the process. This requires using a combination of mathematical methods, physical intuition, and recently machine learning techniques. Recently, language models have seen tremendous advances due to new and more efficient model architectures (such as transformers), computing power, and large volumes of training data.
Many of these language models could be appropriately adapted to solve several PSE-related problems. However, language models are inherently complex and are often characterized by several million parameters, which could only be trained efficiently in data-rich areas, unlike PSE. Moreover, PSE is characterized by decades of rich process knowledge that must be utilized during model training to avoid mismatch between process knowledge and data-driven language models.
This thesis presents a framework for building domain-informed language models for several central problems in PSE spanning multiple scales. Specifically, the frameworks presented include molecular property prediction, forward and retrosynthesis reaction outcome prediction, chemical flowsheet representation and generation, pharmaceutical information extraction, and reaction classification. Domain knowledge is integrated with language models using custom model architectures, standard and custom-built ontologies, linguistics-inspired chemistry and process flowsheet grammar, adapted problem formulations, graph theory techniques, and so on. This thesis is intended to provide a path for future developments of domain-informed language models in process systems engineering that respect domain knowledge, but leverage their computational advantages.
... VBSRL and open IE. The adoption of SRL in machine understanding is not a novelty in the Open IE panorama; in fact, its potential has been already recognized and exploited in several Open IE system implementations (see, for example, Exemplar (de Sá Mesquita et al., 2013) and SRL-IE (Christensen et al., 2010)). This constitutes an important legacy that guarantees some important benefits to the Cnosso method devised in the next section, and addresses some key points to improve the results of Open IE techniques, as pointed out in Etzioni et al. (2011). ...
... As an important subtask of information extraction and knowledge acqui- Unsupervised and rule-based systems mainly apply syntax features to designed syntactic constraints or paradigms in order to extract relationships between entities. Typical systems include TextRunner [2], ReVerb [3], SRLIE [14], ClausIE [15], RelNoun [16], PropS [17], OpenIE4 [18], MinIE [19], Graphene [20], and CALMIE [21]. For example, researchers in the ReVerb system extracted relations in the form of (arg1, relation phrase, arg2) by 1) articulating two simple but powerful constraints (that is, a syntactic constraint and a lexical constraint) and 2) expressing relation phrases via verbs in English sentences. ...
Open Relation Extraction (ORE) is the task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations, when compared with conventional systems which heavily depend on feature engineering or syntactic parsing. However, the ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are further adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show relative improvements of 20.1% and 1.3%, respectively, when compared with benchmark ORE systems. The source code is available at https://github.com/cjwen15/CORE-KE.
... Examples of first-generation Open IE systems are TextRunner (Banko et al., 2007), WOE (Wu & Weld, 2010), which uses Wikipedia as a source of training data, StatSnowBall (Zhu et al., 2009), and SRLIE (Christensen et al., 2010), based on semantic role labelling. At this stage, Open IE can operate without knowing the focus relations a priori and can extract all relations simultaneously. ...
Tenders are a powerful means of investing public funds and represent a strategic development resource. Despite the efforts made so far by governments at national and international levels to digitalise documents related to the Public Administration sector, most of the information is still available in an unstructured format only. With the aim of bridging this gap, we present OIE4PA, our latest study on extracting and classifying relations from tenders of the Public Administration. Our work focuses on the Italian language, where the availability of linguistic resources to perform Natural Language Processing tasks is considerably limited. Nevertheless, OIE4PA adopts a multilingual approach so it can be applied to several languages by providing appropriate training data. Rather than purely training a classifier on a portion of the extracted relations, the backbone idea of our learning strategy is to put a supervised method based on self-training to the test and to assess whether or not it improves the performance of the classifier. For evaluation purposes, we built a dataset composed of 2,000 triples which have been manually annotated by two human experts. The in-vitro evaluation shows that OIE4PA achieves a MacroF1 equal to 0.89 and a 91% accuracy. In addition, OIE4PA was used as the pillar of a prototype search engine, which has been evaluated through an in-vivo experiment with positive feedback from 32 final users, obtaining a SUS score equal to 83.98.
... Semantic Role Labeling (SRL) is an NLP task that aims to reveal the underlying semantic structure of a sentence by identifying predicate-argument structures and classifying their semantic roles. This process is important for understanding the meaning of natural language sentences and plays a useful role in various NLP tasks such as information extraction and machine translation systems [138,139]. ...
... The benefit of this method is that it may be used with texts from any domain. Some OpenIE systems that can extract information from free text include: KnowItAll [52], TEXTRUNNER [53], REVERB [54], SRL-IE [55], OLLIE [56], and RELNOUN [57]. ...
Information Extraction (IE) refers to the process of automatically extracting structured data from unstructured sources to enable the utilization of such data by other applications. Extracting relations from textual sources, which seeks to detect the semantic relation represented between entities referenced in the texts, is a common sub-problem. The objective of the RE task is to develop automatic extractors that can identify and extract structured, relational information from unstructured sources like natural language text. Assigning a relationship label to a pair of entities may be considered a classification problem. As a result, supervised machine learning methodologies can be employed. It is essential to pre-process the data using methods from natural language processing to organize the textual contents into meaningful data structures before extracting relations from the unprocessed text. In addition, as relations are represented between entities, it is necessary to locate the entities using an entity extraction technique, which is another information extraction sub-problem. Relation extraction methods that use entity-annotated text are called pipeline approaches. Relations can be represented between two or more than two entities which are known as binary relations and N-ary relations, respectively. This thesis limits our research to binary relations using pipeline approaches.
... Baselines. We compare the proposed methods on the CaRB and LSOIE datasets with the state-of-the-art approaches, which include: (1) non-neural models: MinIE (Gashteovski et al., 2017); ClausIE (Corro and Gemulla, 2013); OLLIE (Mausam et al., 2012); ReVerb (Fader et al., 2011); OpenIE4 (Christensen et al., 2011); OpenIE5 (Saha et al., 2017; Saha and Mausam, 2018); (2) sequence labelling based methods: RnnOIE (Stanovsky et al., 2018); SenseOIE (Roy et al., 2019); SpanOIE (Zhan and Zhao, 2020); (3) generation based methods: NeuralOIE (Cui et al., 2018); IMoJIE (Kolluru et al., 2020b); OpenIE6 (Kolluru et al., 2020a); (4) detection-based model DetIE (Vasilkovsky et al., 2022), which could also be applied to the IGL-CA model in OpenIE6 with 'simplified' texts. Please note that we utilize the provided checkpoint from DetIE to reproduce the experiment, so the results may be different from the original paper. ...
... • Stanford-OpenIE (Angeli et al., 2015) uses fourteen hand-crafted patterns defined over a dependency parse of the input text sequence in order to identify relational triples. • OpenIE5 combines four approaches, CALMIE (Saha and Mausam, 2018), BONIE (Saha et al., 2017), RelNoun (Pal and Mausam, 2016) and SRLIE (Christensen et al., 2011), to extract relational triples. It uses a combination of hand-crafted and automatically mined patterns using syntactic and surface-form information. ...
... Del Corro and Gemulla [38] propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. Also, some rule-based systems using man-made extraction rules are proposed, including verb-based [39], semantic role labeling [40], dependency parse trees [41], etc. ...
Question generation aims to generate meaningful and fluent questions, which can address the lack of question-answer type annotated corpora by augmenting the available data. Using unannotated text with optional answers as input, question generation can be divided into two types based on whether answers are provided: answer-aware and answer-agnostic. While generating questions with provided answers is challenging, generating high-quality questions without provided answers is even more difficult, for both humans and machines. In order to address this issue, we propose a novel end-to-end model called QGAE, which is able to transform answer-agnostic question generation into answer-aware question generation by directly extracting candidate answers. This approach effectively utilizes unlabeled data for generating high-quality question-answer pairs, and its end-to-end design makes it more convenient compared to a multi-stage method that requires at least two pre-trained models. Moreover, our model achieves better average scores and greater diversity. Our experiments show that QGAE achieves significant improvements in generating question-answer pairs, making it a promising approach for question generation.