Conference Paper

Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation

Article
Computational Linguistics is an interdisciplinary field of computer science and linguistics that focuses on designing computational models and algorithms for processing, analyzing, and generating human language. Over recent years, this field has made substantial progress. While its primary emphasis tends to center on widely spoken languages, it is equally important to investigate languages that are not commonly spoken but have contributed immensely to the literature, culture, and philosophy of society. This survey paper therefore examines the computational tasks undertaken for Sanskrit, an ancient language of the Indian subcontinent with a rich literary heritage. The purpose of this study is to provide an overview of the progress made thus far in the computational analysis of Sanskrit, while also reviewing the current digital infrastructure that supports these efforts. The study also identifies potential avenues for future research, serving as a reference for anyone interested in advancing work in this field.
Article
Motivation: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. Methods: We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined three levels of match for each criterion and a score for the final evaluation of the tools. Results: We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).
Article
This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, which makes them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without the need for additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is available as an on-demand service on GateCloud.net, as well as open-source for self-installation.
Conference Paper
We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time-consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software. The full paper is available here: http://aclweb.org/anthology/C18-2002
Conference Paper
We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task, showing a 15% decrease in total annotation time. BRAT is available under an open-source license from: http://brat.nlplab.org
Amba Kulkarni, Preethi Shukla, Pavankumar Satuluri, and Devanand Shukl. 2015. How free is free word order in Sanskrit. The Sanskrit Library, USA, pages 269-304.
Hiroki Nakayama, Takahiro Kubo, Junya Kamura, Yasufumi Taniguchi, and Xu Liang. 2018. doccano: Text annotation tool for human. Software available from https://github.com/doccano/doccano.
Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Volume 4, pages 142-147.
Maarten van Gompel. 2014. FoLiA linguistic annotation tool. https://github.com/proycon/flat.