Figure 1: The JAPE Debugger User Interface

Source publication
Article
Full-text available
Contents: 1 Introduction; 1.1 How to Use This Text; 1.2 Context; 1.3 Overview; 1.4 Structure of the Book; ...

Citations

... For the sake of illustration, consider a paragraph with 20 tokens, where each annotator has annotated different spans: tokens 3–10, 3–12 and 7–15, respectively. From these annotations, we get: a union span of tokens 3–15, thus ρ(U) = 13/20 = 0.65; an intersection span of tokens 7–10, thus ρ(I) = 4/20 = 0.2; and a probabilistic span containing tokens 7–10 in three annotations, tokens 3–6 and 11–12 in two annotations, and tokens 13–15 in a single annotation, thus ρ(P) = (4 × 3 + 6 × 2 + 3 × 1)/(3 × 20) = 0.45. ...
Chapter
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task in different generated datasets, considering their prediction error and also from a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument-density.
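
To make the worked density arithmetic in the excerpt above concrete, here is a minimal Java sketch (hypothetical class and method names, not code from the cited repository) that computes the union, intersection and probabilistic densities ρ(U), ρ(I) and ρ(P) from per-annotator sets of token indices.

import java.util.*;

// Minimal sketch: span densities over a paragraph, given each annotator's
// set of annotated token indices (hypothetical example data from the excerpt).
public class SpanDensity {
    public static void main(String[] args) {
        int numTokens = 20;
        List<Set<Integer>> annotations = List.of(
                range(3, 10),   // annotator 1: tokens 3-10
                range(3, 12),   // annotator 2: tokens 3-12
                range(7, 15));  // annotator 3: tokens 7-15

        Set<Integer> union = new HashSet<>();
        annotations.forEach(union::addAll);

        Set<Integer> intersection = new HashSet<>(annotations.get(0));
        annotations.forEach(intersection::retainAll);

        // Probabilistic density: each token contributes the fraction of annotators covering it.
        double weighted = 0;
        for (int t = 0; t < numTokens; t++) {
            int votes = 0;
            for (Set<Integer> a : annotations) if (a.contains(t)) votes++;
            weighted += votes / (double) annotations.size();
        }

        System.out.printf("rho(U)=%.2f rho(I)=%.2f rho(P)=%.2f%n",
                union.size() / (double) numTokens,
                intersection.size() / (double) numTokens,
                weighted / numTokens);   // prints 0.65, 0.20 and 0.45 for this example
    }

    // inclusive token range helper
    static Set<Integer> range(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i <= to; i++) s.add(i);
        return s;
    }
}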
... GATE [17]: a tool for text organization used to assist text annotation. ...
Chapter
Brainstorming is an effective technique for seeking out ideas on a specific issue that can be expressed briefly and powerfully, and then determining the best solution. As a method, it is especially popular in areas that rely on creativity, such as industry and advertising. Many solutions have been created in the service of digital brainstorming to enable better management; however, the literature still reports that these techniques offer only partial solutions in themselves. In this work, we present an architecture for a support system for brainstorming activities based on content-based information extraction and Natural Language Processing. First results show that it is possible to make decisions automatically or to effectively help the user to make the right decisions.
... For the text corpus composition we have selected those of them which synthesize the concept most fully. Then, an exemplary text corpus is compiled. For the text corpus exploration, the open-source language processing software toolkit GATE (Cunningham et al. 2018) is used. Two pipelines of GATE applications are composed: one for each individual document and one for corpus processing. ...
Conference Paper
Full-text available
Social sustainability is one of the three pillars of sustainable development. In the article, the respective domain ontology is developed as a part of the existing ontological metamodel of sustainable development. A methodology for building the ontology is proposed, including concept exploration, controlled vocabulary and thesaurus creation, and ontology formalization, verification, querying and publication. The main concepts of the ontology are extracted from a text corpus covering well-established and authoritative sources, approved articles, standards and commonly accepted regulations of the United Nations. For this purpose, manual selection and automatic text mining with the language processing toolkit GATE are performed. The ontology is formalized in OWL in the Protégé environment and, after verification, its code is published on GitHub. Keywords: social sustainable development, domain ontology, concept extraction, text mining, ontology formalization
... The case elements considered include: case title, name of court, date of judgment, judge(s), suit number, parties in court, lead judge (where there is more than one case decider), facts of the case, cause of action, etc. In particular, annotation of the selected Nigeria Supreme Court case law was performed using GATE with the A Nearly New Information Extraction System (ANNIE) components, while extractive summarisation of the annotated case law was carried out using GATE with SUMMA plug-ins [27,28]. The study made use of 72 Nigeria Supreme Court electronic case law documents. ...
... The importance of evaluation in text engineering stems from the fact that what cannot be measured and expressed in either quality or quantity is inconsequential and oftentimes cannot be relied upon. Commonly, processes and operations in IE and IR are measured, for purposes of dependability and trust, using metrics such as Precision, Recall and F-measure [28,29]. Consequently, this research paper measured the Precision, Recall and F-measure of the case law annotation for extractive summarisation using GATE, since GATE is the platform for annotation, extraction and summarisation. ...
... GATE is a complete text engineering tool, not only because it supports most text engineering processes but also because it enables the processes, artifacts and systems built on it to be evaluated for performance quality [28]. A veritable tool in GATE for evaluating annotations, including those for IE, is the Annotation Diff Tool (ADT). ...
Article
Full-text available
Legal reasoning and judicial verdicts in many legal systems are highly dependent on case law. The ever-increasing volume of case law makes the task of comprehending the case law relevant to a legal case cumbersome for legal practitioners, and this invariably stifles their efficiency. Legal reasoning and judicial verdicts would therefore be easier and faster if case law were in an abridged form that preserves its original meaning. This paper used the General Information Extraction System Architecture approach and integrated Natural Language Processing, Annotation, and Information Extraction tools to develop a software system that performs automatic extractive text summarisation of Nigeria Supreme Court case law. The summarised case law, which were about 20% of their original length, were evaluated for semantic preservation and shown to be 83% reliable.
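
For reference, the Precision, Recall and F-measure mentioned in the excerpts above are the standard annotation-evaluation metrics, usually stated in terms of correct, spurious (false-positive) and missing (false-negative) annotations:

Precision = correct / (correct + spurious)
Recall = correct / (correct + missing)
F1 = 2 × Precision × Recall / (Precision + Recall)

This is the textbook formulation; evaluation tools such as GATE's Annotation Diff can also give weight to partially correct annotations, which shifts these figures accordingly.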
... The features examined are ones that can be identified automatically from text. Most of the analysis utilized Agent99 Analyzer (Cao, Crews, Lin, Burgoon, & Nunamaker, 2003), a software tool built on the open-source General Architecture for Text Engineering (GATE; Cunningham, 2002; Cunningham et al., 2005). GATE is a Java-based, object-oriented architecture and development environment for analyzing, processing, or generating natural language. ...
Article
Full-text available
Ample scientific research has confirmed significant linguistic differences between truthful and deceptive discourse in both laboratory and field experiments. That literature is reviewed, followed by presentation of an experiment that tested the effects of veracity on a wide array of linguistic indicators and tested which effects were moderated by motivation and modality. A 2 (veracity: truthful/deceptive) × 2 (incentives: high/low) × 3 (modality: FtF/audio/text) factorial experiment revealed that linguistic indicators of quantity, immediacy, vividness/dominance, specificity, complexity, diversity, and hedging/uncertainty were all affected by veracity, and veracity interacted with motivation in the latter four cases. Only personalism and affect failed to differ between truth and deception. Modality also affected language use but did not interact with veracity. Four linguistic indicators together successfully classified 76% of text-based deception and 76% to 78% of truthful responses from text, audio, and face-to-face interaction. The importance of context in predicting linguistic patterns is emphasized.
... The authors first extracted the regulations from various document formats, such as PDF and HTML, so that they could be converted into a machine-readable format. The list of extracted regulatory obligations was then processed using GATE [40], a text engineering platform, and executable semantic rules were generated from the regulatory ontology. The authors implemented their proposed work using an industry case study that used Eudralex EU regulations. ...
Article
Full-text available
Literature on business process compliance (BPC) has predominantly focused on the alignment of regulatory rules with the design, verification and validation of business processes. Previous surveys on BPC have been conducted with a specific context in mind; however, the literature on BPC management research is largely sparse and does not accumulate a detailed understanding of the existing literature and the related issues faced by the domain. This survey provides a holistic view of the literature on existing BPC management approaches and categorises them based on different compliance management strategies in the context of formulated research questions. A systematic literature approach is used, in which search terms and pertinent keywords were used to identify literature related to the research questions in scholarly databases. From an initial 183 papers, we selected 79 papers related to the themes of this survey published between 2000 and 2015. The survey results reveal that compliance management approaches mostly centre around three distinct categories, namely design-time (28%), run-time (32%) and auditing (10%); organisational and internal-control-based compliance management frameworks account for 21% of the surveyed approaches and hybrid approaches for 9%. Furthermore, open research challenges and gaps are identified and discussed with respect to the compliance problem.
... GATE also incorporates a large number of plugins in order to integrate and leverage other projects and tools. GATE currently deals with the following languages: Arabic, Bulgarian, Cebuano, Chinese, English, French, German, Hindi, Italian, Romanian, Russian and Spanish [20,21]. GATE, by default, accepts input documents in different formats, including plain text, HTML, SGML, XML, RTF, RDF, UIMA CAS XML, and CoNLL format. ...
Chapter
Arabic Natural Language Processing (ANLP) has made significant progress in recent years. As a result, several ANLP tools and applications have been developed, such as tokenizers, Part Of Speech taggers, morphological analyzers, syntactic parsers, etc. However, most of these tools are heterogeneous and can hardly be reused in the context of other projects without modifying their source code. This problem is common to all languages, which is why advanced language-independent NLP architectures such as GATE (Cunningham et al., ACL, 2002) [1] and UIMA (Apache UIMA Manuals and Guides, 2015) [2] have emerged. These architectures have significantly changed the way NLP applications are designed and developed. They provide homogeneous structures for applications, better reusability and faster deployment. In this article, we present a comparative study of NLP architectures in order to determine which ones can suitably deal with the Arabic language and its specificities.
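
To illustrate the point about input formats in the excerpt above, here is a minimal, hedged sketch using GATE Embedded (GATE's Java API): documents in any supported format are loaded through the same factory call and converted into GATE's unified document model. The file names are placeholders.

import gate.Document;
import gate.Factory;
import gate.Gate;
import java.io.File;

public class LoadDocuments {
    public static void main(String[] args) throws Exception {
        Gate.init();  // initialise GATE Embedded before creating any resources

        // Format handling is driven by MIME type / file extension; HTML, XML,
        // plain text, etc. all end up as the same gate.Document structure.
        String[] paths = { "sample.html", "sample.xml", "sample.txt" };  // placeholder files
        for (String p : paths) {
            Document doc = Factory.newDocument(new File(p).toURI().toURL());
            System.out.println(p + " -> " + doc.getContent().size() + " characters");
            Factory.deleteResource(doc);  // release the resource when done
        }
    }
}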
... Both the construction process and its outputs are measurable and predictable. The literature of the field relates to both applications of relevant scientific results and a body of practice [15]". GATE components come in three flavours [15], as depicted in [15,16]. Figure 3 shows the JAPE rule example. ...
Conference Paper
Stakeholders exchange ideas and describe requirements of the system in natural language at the early stages of software development. These software requirements tend to be unclear, incomplete and inconsistent. However, better quality and lower cost of system development are grounded in clear, complete and consistent requirements statements. Requirements boilerplates are an effective way to minimise ambiguity in natural language requirements, but manually checking natural language requirements against boilerplates for conformance is a time-consuming and difficult task. This paper aims to automate the requirements analysis phase using a language processing tool. We propose a natural language requirement analysis model. We also present an approach based on the open-source General Architecture for Text Engineering (GATE) framework for automatically checking natural language requirements against boilerplates for conformance. The evaluation of the proposed approach shows that the GATE framework is only capable of detecting ambiguity in natural language requirements. We also present rules to minimise ambiguity, incompleteness and inconsistency.
... As the annotators are not only classifying predefined mentions but can also define different mentions, traditional IAA measures such as Cohen's Kappa are less suited to this task. Therefore, we measured the IAA in terms of precision (P), recall (R) and F-measure (F1) using exact matches, where the annotations must be correct for both the entity mentions and the entity types over the full spans (entity mention and label) [80]. Annotations that were only partially correct were considered incorrect. ...
Article
The large number of tweets generated daily provides decision makers with means to obtain insights into recent events around the globe in near real-time. The main barrier to extracting such insights is the impossibility of manually inspecting such a diverse and dynamic amount of information. This problem has attracted the attention of industry and research communities, resulting in algorithms for the automatic extraction of semantics in tweets and linking them to machine-readable resources. While a tweet is shallowly comparable to any other textual content, it hides a complex and challenging structure that requires domain-specific computational approaches for mining semantics from it. The NEEL challenge series, established in 2013, has contributed to the collection of emerging trends in the field and the definition of standardised benchmark corpora for entity recognition and linking in tweets, ensuring high-quality labelled data that facilitates comparisons between different approaches. This article reports the findings and lessons learnt through an analysis of specific characteristics of the created corpora, their limitations, lessons learnt from the different participants and pointers for furthering the field of entity recognition and linking in tweets.
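
As a small illustrative sketch (hypothetical data, not the challenge's actual scorer) of the exact-match agreement computation described in the excerpt above: mentions are treated as (start, end, type) triples, and only identical triples count as matches when computing precision, recall and F1 between two annotators.

import java.util.*;

// Exact-match inter-annotator agreement on (span, type) triples:
// a mention matches only if both the span and the entity type are identical.
public class ExactMatchIAA {
    record Mention(int start, int end, String type) {}

    public static void main(String[] args) {
        Set<Mention> annotatorA = Set.of(
                new Mention(0, 5, "Person"),
                new Mention(10, 14, "Location"),
                new Mention(20, 26, "Organization"));
        Set<Mention> annotatorB = Set.of(
                new Mention(0, 5, "Person"),
                new Mention(10, 14, "Organization"),  // same span, different type: no match
                new Mention(30, 35, "Person"));

        Set<Mention> matches = new HashSet<>(annotatorA);
        matches.retainAll(annotatorB);  // exact matches only; partial overlaps count as incorrect

        double precision = matches.size() / (double) annotatorB.size();  // B scored against A
        double recall = matches.size() / (double) annotatorA.size();
        double f1 = 2 * precision * recall / (precision + recall);

        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", precision, recall, f1);
    }
}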
... For implementing a BNF grammar over NL statements, we use JAPE (Java Annotation Patterns Engine). JAPE is a regular-expression-based pattern-matching language, available as part of the GATE NLP workbench [27]. Fig. 6 shows an example JAPE script, which checks conformance to Rupp's template. Each JAPE script consists of a set of phases, with each phase made up of a set of rules. ...
... JAPE provides various options for controlling the results of annotations when multiple rules match the same section in text, or for controlling the text segment that is annotated on a match. These options are brill, appelt, all, first, and once [27]. In our work, we make use of brill, appelt and first. ...
... Template-specific keywords, e.g., the modals and the conditional keywords, are grouped into keyword lists, often called gazetteers [27] in NLP. These lists decouple TCC rules from template-specific keywords, thus avoiding the need to change the rules when the keywords change. ...
Article
Templates are effective tools for increasing the precision of natural language requirements and for avoiding ambiguities that may arise from the use of unrestricted natural language. When templates are applied, it is important to verify that the requirements are indeed written according to the templates. If done manually, checking conformance to templates is laborious, presenting a particular challenge when the task has to be repeated multiple times in response to changes in the requirements. In this article, using techniques from Natural Language Processing (NLP), we develop an automated approach for checking conformance to templates. Specifically, we present a generalizable method for casting templates into NLP pattern matchers and reflect on our practical experience implementing automated checkers for two well-known templates in the Requirements Engineering community. We report on the application of our approach to four case studies. Our results indicate that: (1) our approach provides a robust and accurate basis for checking conformance to templates; and (2) the effectiveness of our approach is not compromised even when the requirements glossary terms are unknown. This makes our work particularly relevant to practice, as many industrial requirements documents have incomplete glossaries.
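
To ground the JAPE-related excerpts above, the following is a hedged Java sketch of how a JAPE grammar might be loaded and run with GATE Embedded. The grammar file and input document are placeholders; the grammar's phases, rules and control option (brill, appelt, all, first or once) live inside the .jape file itself, and the plugin providing the JAPE transducer must have been registered beforehand, with the exact registration call depending on the GATE version.

import gate.*;
import gate.creole.SerialAnalyserController;
import java.io.File;

public class RunJapeGrammar {
    public static void main(String[] args) throws Exception {
        Gate.init();  // initialise GATE Embedded
        // NOTE: the plugin that provides gate.creole.Transducer (ANNIE) must be registered here.

        // JAPE transducer PR, parameterised with a (placeholder) grammar file.
        FeatureMap params = Factory.newFeatureMap();
        params.put("grammarURL", new File("conformance.jape").toURI().toURL());  // placeholder grammar
        ProcessingResource jape =
                (ProcessingResource) Factory.createResource("gate.creole.Transducer", params);

        // A corpus pipeline that runs the transducer over one (placeholder) document.
        SerialAnalyserController pipeline = (SerialAnalyserController)
                Factory.createResource("gate.creole.SerialAnalyserController");
        pipeline.add(jape);

        Corpus corpus = Factory.newCorpus("requirements");
        corpus.add(Factory.newDocument(new File("requirement.txt").toURI().toURL()));
        pipeline.setCorpus(corpus);
        pipeline.execute();

        System.out.println(corpus.get(0).getAnnotations());  // annotations produced by the rules
    }
}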