Conference Paper

Automated Extraction of Normative References in Legal Texts.


Abstract

The Italian Ministry of Justice, with contributions from research centres, universities, and public bodies, is presently engaged in an effort to work out shared standards for representing legal texts. Documents standardised under uniform formats and structures make it possible to link up distinct bodies of norms, which in turn makes it easier to find and look up norms and to design tools for processing them, as when drafting legislation or producing consolidated texts. This function is enabled by marking up the different parts of a legal text: its identification data (text type, number, date of delivery, and the like), its partitions (e.g., the articles and sections that make up its layout), and the normative references it contains.


... By looking at development data, one can define a rule-based system with a set of rules that recognizes the majority of entities in texts without producing many false positives. [6] worked out a set of regular expressions to recognize references to legal documents, reporting an F-measure of 85% evaluated on a database of Italian legislation. ...
... Adding new development data is definitely more straightforward than editing contextual rules. Table 1 presents recently developed systems for detecting references in legal English, Italian, and Dutch texts:

    [6]   ITA   Regexps          Rule-based   85%
    [10]  DUT   Regexps, Lists   Rule-based   95%

The systems apply different detection techniques, such as lists, POS taggers, parsers, and regular expressions, and they belong to either hybrid or rule-based strategies. ...
... Strict and Lenient variants of the performance measures allow dealing with partially correct matches in different ways: Strict measures count all partially correct matches as incorrect (spurious, false positives), while Lenient measures count them as correct (true positives). We performed an experiment using 10-fold cross-validation. Statistical significance was computed using the corrected resampled (two-tailed) t-test [27], which is suitable for cross-validation-based experiments. ...
Conference Paper
We address the task of detecting and classifying references in Czech court decisions, focusing mainly on references to other court decisions and acts. In addition, we are interested in detecting the institutions that issued the documents under consideration. We treat these references as entities in a Named Entity Recognition task. We approach the task with machine learning methods, namely HMM and the Perceptron algorithm, and report an F-measure over 90% averaged over all entities. The results significantly outperform previously published systems.
... Extracting specific rights and obligations from legal rules permits the creation of a knowledge base, as was possible with the logic programming efforts, to model the key elements of regulations and answer directed user queries. The major deontic logic efforts include: LEGOL, a formal LEGally Orientated Language for capturing obligations [39]; ON-LINE, an ONtology-based Legal INformation Environment for capturing and analyzing legal texts as legal knowledge [41]; work establishing the legal importance of monitoring permissions as well as obligations [10]; and systems for automated extraction of normative references from legal texts [8,33]. ...
... More recent efforts include automated extraction of normative references (e.g. specific rights and obligations) detailed in a legal text, and addressed the problem of the law's evolution by tracking changes over time [8,33]. This provides for some degree of traceability, as the system maintains information on each extracted section, including its type, number, date, section and subpart headers, and the normative references [33]. However, these more recent projects were not completed, and there are few examples to illustrate the effectiveness of this approach. ...
Article
Full-text available
Legal texts, such as regulations and legislation, are increasingly playing an important role in requirements engineering and system development. Monitoring systems for requirements and policy compliance has been recognized in the requirements engineering community as a key area for research. Similarly, regulatory compliance is critical in systems that are governed by regulations and law, especially given that non-compliance can result in both financial and criminal penalties. Working with legal texts can be very challenging, however, because they contain numerous ambiguities, cross-references, domain-specific definitions and acronyms, and are frequently amended via new regulations and case law. Requirements engineers and compliance auditors must be able to identify relevant regulations, extract requirements and other key concepts, and monitor compliance throughout the software lifecycle. This paper surveys research efforts over the past 50 years in handling legal texts for systems development. These efforts include the use of symbolic logic, logic programming, first-order temporal logic, deontic logic, defeasible logic, goal modeling, and semi-structured representations. This survey can aid requirements engineers and auditors to better specify, monitor, and test software systems for compliance.
... As a remedy to these drawbacks, we present in this paper a system for managing the process whereby the documents in a normative system get consolidated. These shortcomings are overcome by providing a legal temporal model (Palmirani and Brighi, 2003) that respects legal principles and that is formalised into logic to permit automatic reasoning, and by investigating the use of the multi-agent paradigm in such a way as to allow proactivity and a larger view of the normative system. ...
... In other words, the pieces of information contained in legislative documents have to be represented and marked up using an appropriate digital language. The legislative documents to be consolidated are inserted into the system using dedicated editors, such as Norma-Editor (Palmirani, 2000), which make it possible to transform normative documents on the basis of a common XML representation language compliant with the NormeinRete project standards (Circolare 35; Circolare 40). As has been argued in (Brighi, 2004) and (Palmirani et al., 2003), the semantic markup of normative references enables intelligent agents to reason and provide advice in consolidating documents. ...
... The concepts on which agents make their decisions are defined in ontologies written in OWL or in any other XML syntax. An OWL ontology of modifications has been defined (Palmirani and Brighi, 2003). ...
... Therefore, work has started on attempting to discover these relations automatically, using parsing techniques. Earlier work on Italian sources [2] has indicated that automated detection of references can be of great help, as 85% of all references could be detected automatically, and another 9% could at least partially be detected. ...
... Thus, "member 1, article 1, Banklaw 1998" is a complete reference, whereas "member 1, article 1" is an incomplete reference. The distinction between complete and incomplete references has also been made in [2]; there, they were named "well-formed references" and "not well-formed references". ...
... A test on six very diverse Dutch laws showed an accuracy of 95-99% and hardly any false positives. In the Norme-in-Rete project, a similar result was achieved on a much larger, but less diverse, Italian corpus [2]. Their parser found 85% of the references, but only 35% could be resolved. ...
Conference Paper
Full-text available
Combining the legal content stores of different providers is usually time-, effort- and money-intensive due to the typically 'hard-wired' links between different parts of the constituting sources within those stores. In practice, users of legal content are confronted with a vendor lock-in situation and have to find workarounds when they want to combine their own content with the content provided by others. In the BSN project we developed a parser that enables the creation of a referential structure on top of a legal content store. We empirically tested the parser's effectiveness and found over 95% accuracy even for complex references.
... Extracting specific rights and obligations from legal rules permits the creation of a knowledge base, as was possible with the logic programming efforts, to model the key elements of regulations and answer directed user queries. The major deontic logic efforts include: LEGOL, a formal LEGally Orientated Language for capturing obligations [45]; ON-LINE, an ONtology-based Legal INformation Environment for capturing and analyzing legal texts as legal knowledge [47]; work establishing the legal importance of monitoring permissions as well as obligations [11]; and systems for automated extraction of normative references from legal texts [9] [38]. Deontic logic approaches have not yet met users' needs for working with regulations and ensuring compliance. ...
... The ON-LINE system was able to deal with only small sections of legislation at a time, and the usability of the ontology-based approach proved problematic during usability testing [47]. More recent efforts include automated extraction of normative references, such as specific rights and obligations, detailed in a legal text, and addressed the problem of the law's evolution by tracking changes over time [9][38]. This provides for some degree of traceability, as the system maintains information on each extracted section, including its type, number, date, section and subpart headers, and the normative references [38]. However, these more recent projects were not completed, and there are few examples to illustrate the effectiveness of this approach. ...
Conference Paper
Legal texts, such as regulations and legislation, are playing an increasingly important role in requirements engineering and system development. Monitoring systems for requirements and policy compliance has been recognized in the requirements engineering community as a key area for research. Similarly, regulatory compliance is critical in systems that are governed by regulations and law, especially given that non-compliance can result in both financial and criminal penalties. Working with legal texts can be very challenging, however, because they contain numerous ambiguities, cross-references, domain-specific definitions, and acronyms, and are frequently amended via new regulations and case law. Requirements engineers and compliance auditors must be able to identify relevant regulations, extract requirements and other key concepts, and monitor compliance throughout the software lifecycle. This paper surveys research efforts over the past 50 years in handling legal texts for systems development. These efforts include the use of symbolic logic, logic programming, first-order temporal logic, deontic logic, defeasible logic, goal modeling, and semi-structured representations. This survey can aid requirements engineers and auditors to better specify, monitor, and test software systems for compliance.
... The automatic extraction of references has been developed mainly with regard to their different formats. To this end, Adedjouma et al. [1] used gazetteers and concept markers; Palmirani et al. [17] used regular expressions; Harasta [12] used Conditional Random Fields (CRF); Leitner et al. [15] used BiLSTM neural networks. Other authors adopted methods beyond natural language processing, such as ML, deep neural networks, and CRF, to extract citations [2]. ...
... This functionality relies on the fact that the structure of these documents is generally quite uniform. Regular expressions have previously been adopted in NLP applications to extract structured information from legal sources/cases in plain text [17,20]. ...
Conference Paper
Although with some discrepancy, both in common law and in civil law systems, previous judgments play an important role with respect to future decisions. Traditional legal methodologies usually involve manual rather than automatic keyword-search mechanisms to retrace the steps of judicial decision-making. However, these methods are generally highly time-consuming and can be subject to different types of bias. In this work, we present an automated extraction pipeline to map and structure citations in rulings regarding fiscal state aid in the case-law of the Court of Justice of the European Union. In particular, by exploiting the XML data available on the EUR-Lex platform, we built an end-to-end parser based on a set of regular expressions and heuristics, which iteratively extracts all citations, finally creating a hierarchical structure of citations with their contextual information at the paragraph level. Such a data structure can be projected into a graphical representation, enabling useful visualization and exploration features and insights, such as the diachronic study of the development of specific citations and legal principles over time. Our work suggests how the exploitation and analysis of citation networks through automated means can provide significant tools to support traditional legal methodologies.
... In this work, we focus on the extraction of references to normative and regulatory documents, such as laws, government decrees, and ministerial circulars, from legal text in Vietnamese. In previous work (Italian [1,17], Spanish [15], Dutch [8], and Japanese [20]), references of this kind are known as normative references. We will use the term "reference" to refer to such normative references throughout the paper. ...
... Reference extraction from legal documents has been studied in different languages, including Italian [17], Spanish [15], Dutch [8], and Japanese [20]. Palmirani [20] introduces a framework that can extract references to sub-document targets from Japanese legal texts. ...
Conference Paper
Legal and regulatory texts are ubiquitous and important in our life. Automated processing of such documents using natural language processing and information retrieval techniques is desired. Many legal text processing problems require information extraction as a base component. In this paper, we address the task of extracting references from law and regulatory documents, which are necessary for recognition of the relations between documents and document parts, and other problems. We formulate the task as a sequence labeling problem and introduce several extraction models, consisting of both traditional (conditional random fields) and more advanced (deep neural networks) methods. In addition to features learned by deep networks, we investigate various types of manually engineered features that reflect the characteristics of legal documents. Our best model that combines bidirectional long short-term memory networks and conditional random fields achieves 95.35% in the F1 score on a corpus consisting of more than 11 thousand sentences from Vietnamese law and regulatory documents.
... Metalex [Boer et al., 2002; Boer et al., 2007; Boer et al., 2008; Boer, 2009]. AkomaNtoso [Palmirani et al., 2003] produces DTDs for the parliamentary, legislative, and judicial documents of several African countries. The AkomaNtoso XML schemas make "visible" the structure and semantics of the relevant components of digital documents, in order to support the creation of value-added information services and to increase efficiency and accountability in the parliamentary, legislative, and judicial context. ...
... Several techniques and tools have been proposed for exploiting the content of regulations [Lau, 2004; Geist, 2009; Chieze et al., 2010; Palmirani et al., 2003; Amardeilh et al., 2013]. ...
Thesis
Full-text available
A document collection is generally represented as a set of documents, but this model does not account for intertextual relations or for the context in which a document is interpreted. The classical document model reaches its limits in specialized domains, where information-access needs correspond to specific uses and where documents are linked by many types of relations. This thesis proposes two models for taking this complexity of document collections into account in information-access tools. The first model is based on formal and relational concept analysis; the second is based on semantic web technologies. Applied to documentary objects, these models make it possible to represent and query in a unified way both the content descriptors of documents and the intertextual relations between them.
... Therefore, we conclude that intersections of references in legal texts indicate that these texts deal with similar topics. Legal references can be extracted by supervised machine learning (Tran et al., 2013) or by predefined rules (Palmirani et al., 2003). We implement an approach similar to Palmirani et al. (2003) and define country-specific rules to extract references from law texts. References among legal texts are often abbreviated differently (e.g. ...
Conference Paper
Full-text available
In today’s globalized world, companies are faced with numerous and continuously changing legal requirements. To ensure that these companies are compliant with legal regulations, law and consulting firms use open legal data published by governments worldwide. With this data pool growing rapidly, the complexity of legal research is strongly increasing. Despite this fact, only few research papers consider the application of information systems in the legal domain. Against this backdrop, we propose a knowledge management (KM) system that aims at supporting legal research processes. To this end, we leverage the potentials of text mining techniques to extract valuable information from legal documents. This information is stored in a graph database, which enables us to capture the relationships between these documents and users of the system. These relationships and the information from the documents are then fed into a recommendation system which aims at facilitating knowledge transfer within companies. The prototypical implementation of the proposed KM system is based on 20,000 legal documents and is currently evaluated in cooperation with a Big 4 accounting company.
... Several approaches already exist for cross reference detection and resolution [6,11,18,26,33,34]; but, as we argue in more detail in Sect. 11, certain aspects of the problem have not been adequately addressed: - There are books and best-practice guides for drafting legal texts and cross references. ...
... Palmirani et al. [26] define cross reference patterns based on guidelines for the Italian legal corpus and apply their approach to several legal texts. However, they tackle only cross reference detection and not resolution. ...
Article
Full-text available
When identifying and elaborating compliance requirements, analysts need to follow the cross references in legal texts and consider the additional information in the cited provisions. Enabling easier navigation and handling of cross references requires automated support for the detection of the natural language expressions used in cross references, the interpretation of cross references in their context, and the linkage of cross references to the targeted provisions. In this article, we propose an approach and tool support for automated detection and resolution of cross references. The approach leverages the structure of legal texts, formalized into a schema, and a set of natural language patterns for legal cross reference expressions. These patterns were developed based on an investigation of Luxembourg’s legislation, written in French. To build confidence about their applicability beyond the context where they were observed, these patterns were validated against the Personal Health Information Protection Act (PHIPA) by the Government of Ontario, Canada, written in both French and English. We report on an empirical evaluation where we assess the accuracy and scalability of our framework over several Luxembourgish legislative texts as well as PHIPA.
... There exists some work related to this kind of research, developed for several languages such as Italian [1,14], Spanish [11], and Dutch [10]. In these studies, the authors propose rule-based approaches to recognize references. ...
... To detect references, most previous work focuses on rule-based approaches that build regular expressions or context-free grammars to recognize them [1,10,11,14]. However, the disadvantage is that the created systems must be constantly extended in order to provide rules for yet unseen cases. ...
Chapter
Full-text available
Sentences in legal texts are usually long and complicated. At the discourse level, they contain many reference phenomena, which make laws more difficult to understand. This paper investigates the task of reference resolution in the legal domain. The aim is to create a system that automatically extracts referents for references in real time. This is a new and interesting task in Legal Engineering research. It not only helps readers comprehend the law and supports lawmakers in developing and amending laws, but also supports building information systems that operate on the basis of laws. The main issues are to detect references and then resolve them to their referents. To detect references, we use a powerful machine learning technique rather than the rule-based approaches used in previous works. To resolve them, we design regular expressions to locate the positions of referents. We also build a corpus using the Japanese National Pension Law to train and test our model. Our final system achieved 91.6% in the F1 score for detecting references, 96.18% accuracy for resolving them, and 88.5% in the F1 score for the end-to-end system.
... In past works we detected some regularity in the linguistic structure of modificatory provisions [14], and we showed how this regularity, coupled with an XML markup [12], can be used by automated tools to qualify a modificatory provision [13]. In particular, our approach relies on a tree-matching technique to combine deep parsing and shallow semantic interpretation [7]. ...
... The FrameNet model described above is designed to deal with legislative texts encoded in XML format, with some elements already annotated, in a supervised manner. A parser called Norma-Editor automatically detects references and dates, and allows adding metadata to legislative texts [14]. Norma-Editor is employed to convert legal texts into an XML format based on Legal XML standards (such as Akoma Ntoso and NiR) [2]. ...
Conference Paper
In this work we illustrate a novel approach to solving an information extraction problem on legal texts. It is based on Natural Language Processing techniques and on the adoption of a formalization that allows coupling domain knowledge with syntactic information. The proposed approach is applied to extend an existing system that assists human annotators in handling normative modificatory provisions, that is, the changes made to other normative texts. This law-'versioning' problem is a hard and relevant one. We provide a linguistic and legal analysis of a particular case of modificatory provision (the efficacy suspension), show how such knowledge can be formalized in a linguistic resource such as FrameNet, and show how it is used by the semantic interpreter.
... There exists some work related to this kind of research, developed for several languages such as Italian (Bolioli et al. 2002; Palmirani et al. 2003), Spanish (Mercedes et al. 2005), and Dutch (Maat et al. 2006). In previous work, authors focus on detecting and resolving so-called normative references to distinguish them from the above references. ...
... For Italian legal texts, there are two representative studies. The first is the work of Palmirani et al. (2003), in which the authors conduct a project to work out and implement a model for recognizing, understanding, and normalizing the normative references found in legal texts, and for bringing such references under a set of common standards in order to favour interoperability between different legal information systems. The second is the work of Bolioli et al. (2002), in which the authors refer to references as citations. ...
Article
Full-text available
This paper investigates the task of reference resolution in the legal domain. This is a new and interesting task in Legal Engineering research. The goal is to create a system that can automatically detect references and then extract their referents. Previous work limits itself to detecting and resolving references at the document level. In this paper, we go a step further and try to resolve references to sub-document targets. The referents extracted are the smallest fragments of text in documents, rather than the entire documents that contain the referenced texts. Based on an analysis of the characteristics of reference phenomena in legal texts, we propose a four-step framework for the task: mention detection, contextual information extraction, antecedent candidate extraction, and antecedent determination. We also show how machine learning methods can be exploited in each step. The final system achieves 80.06% in the F1 score for detecting references, 85.61% accuracy for resolving them, and 67.02% in the F1 score for the end-to-end task on the Japanese National Pension Law corpus.
... One of the main goals of the research conducted over the last ten years on digitalization in the legal domain has been to provide techniques for detecting the linguistic legal content of texts, in order to favour consolidation ([15], [10], [3]), to help the legal drafting activity ([7], [17]), or to extract arguments for supporting logic rules and metadata ([1], [4]). ...
... The annotation of modificatory provisions, including suspensions, is a three-step process. Although these steps have been illustrated in previous work (full details are provided in [15]), we briefly recall them in order to make the paper more complete and readable. We then show how the FrameNet formalization is used in the semantic interpretation process, pointing out the benefits of encoding the knowledge about modifications in declarative form. ...
Conference Paper
Full-text available
One open problem in the AI & Law community is how to provide computers with a basic understanding of legal concepts and their relationship with legal texts and the legal lexicon. We propose to add a layer that connects the linguistic description of the provisions to syntactic patterns, using FrameNet, which can be exploited through NLP tools. A deep-parsing and shallow-semantics approach has been devised to interpret and retrieve the characterizing components of legal modificatory provisions. In this paper we single out the case of efficacy suspension and show how the FrameNet approach can be profitable, especially for isolating temporal parameters and their interpretation.
... A precondition for this analysis is extracting the references from the text. Studies have shown that this task can be addressed effectively by applying rules or by using learning models such as Named Entity Recognition (NER) [6,28,46,50]. By extracting the references, ILDE tools can show the draftsman the various effects of the text he writes on external provisions in the legal network. ...
... In [5], the authors presented a method for extracting information about referenced normative acts from legal documents. The method is tailored to the references used in the Italian judicial system. ...
Article
This paper extracts metadata from New Zealand court decisions using natural language processing techniques. Machine learning and deep learning methods are proposed for finding the regulations relevant to the decisions, and the obtained results are compared. Using the extracted references, precedents and documents resulting from appeal proceedings were identified, on the basis of which groups of related documents were formed.
... For inter- and intra-legislation links we have shown this can be done very effectively for the Dutch case [5], and others have done so for other jurisdictions (e.g. [9] for Italy; [12] for Japan). For inter-case-law links it is a bit more difficult, but it works for the Dutch case [15] [8]. ...
... The extraction of references from the Italian legislation based on regular expressions was reported by Palmirani et al. [12]. The main goal was to bring references under a set of common standards to ensure the interoperability between different legal information systems. ...
Preprint
Full-text available
In this paper, we introduce the citation data of the Czech apex courts (Supreme Court, Supreme Administrative Court and Constitutional Court). This dataset was automatically extracted from the corpus of texts of Czech court decisions - CzCDC 1.0. We obtained the citation data by building the natural language processing pipeline for extraction of the court decision identifiers. The pipeline included the (i) document segmentation model and the (ii) reference recognition model. Furthermore, the dataset was manually processed to achieve high-quality citation data as a base for subsequent qualitative and quantitative analyses. The dataset will be made available to the general public.
... The research study carried out in [9] performs metadata extraction to consolidate Italian legislative acts. Another study on Italian legal texts [10] focuses on extracting normative references from text using pattern-matching techniques. ...
Chapter
Due to the advent of computing, content digitization and processing are being widely performed across the globe. The legal domain is among the many areas that provide various opportunities for innovation and improvement by means of computational advancements. In Pakistan, for the last couple of years, courts have been reporting judgments for public consumption. This reported data is of great importance to judges, lawyers, and civilians in various respects. As this data is growing at a rapid rate, there is a dire need to process this huge amount of data to better address the needs of the respective stakeholders. Therefore, in this study, our aim is to develop a machine learning system that can automatically extract information from the public reported judgments of the Lahore High Court. This information, once extracted, can be utilized for the betterment of society and policy making in Pakistan. This study takes the first step towards this goal by extracting various entities from legal judgments. A total of ten entities are extracted, including dates, case numbers, reference cases, person names, and respondent names. In order to automatically extract these entities, the primary requirement was to construct a dataset of legal judgments. Hence, annotation guidelines were first prepared, followed by the preparation of an annotated dataset for entity extraction. Finally, various algorithms, including Markov models and Conditional Random Fields, were applied to the annotated dataset. Experiments show that these approaches achieve reasonably good results for legal data extraction. The primary contribution of this study is the development of an annotated dataset of civil judgments, followed by the training of various machine learning models to extract the potential information from a judgment.
... Biagioli et al. [17] explore automatic methodologies for helping the manual identification of the type of normative rule and its elements. Palmirani et al. [20] propose a model for recognizing, understanding, and normalizing the normative references of legal texts, and for standardizing such references to increase the interoperability of information systems. ...
Conference Paper
Full-text available
The great impact that law has on the design of software systems has been widely recognized in past years. However, little attention has been paid to the challenge that the variability characterizing the legal domain (e.g., multiple ways to comply with a given law, frequent updates to regulations, different jurisdictions, etc.) poses for the design of software systems. This position paper advocates the use of adaptation mechanisms to support regulatory compliance for software systems. First, we show an example of how Zanshin, a requirements-based adaptation framework, can be used to design a system that adapts to legal requirements to accommodate legal variability. Then we examine how legal texts can be analyzed as sources for the parameters and indicators needed to support adaptation. As a motivating running example, we consider legal situations concerning the Google driverless car and its recent legalization on the highways of Nevada and soon also in California.
... It will take some effort to convince "the legal drafter" to switch from his or her old faithful word processor and manual system of handling amendments through a combination of glue and scissors to any kind of unfamiliar text editor. In the meantime, one of the most important tools will be the converter (for an example of a converter, see Palmirani et al. 2005). ...
Article
Full-text available
This report aims at providing an overview of the state of the art and of the prospects of the application of Information and Communication Technologies (ICT) in the legislative domain, in particular concerning the management of legislative documents. After a brief introduction on legal informatics, we focus on legislative informatics and identify the challenges it faces nowadays, in the framework of the Internet and the (semantic) Web. We then describe the evolution of the ICT-based management of legislative documents and identify and evaluate the emergent approaches, focusing on those based on open standards. The report is completed by two appendixes: the first reviews initiatives pertaining to the standard-based management of legal sources, the second reviews initiatives pertaining to semantic resources for legislation. Disclaimer: The Working Papers of the Global Centre for ICT in Parliament are preliminary documents circulated in a limited number of copies and posted on the Global Centre website at http://www.ictparliament.org to stimulate discussion and critical comment. The views and opinions expressed herein are those of the author and do not necessarily reflect those of the Global Centre for ICT in Parliament. The designations and terminology employed may not conform to United Nations practice and do not imply the expression of any opinion whatsoever on the part of the Organization.
... - [2], [17] address automated extraction of normative references, such as specific rights and obligations, detailed in legal texts, and address the problem of the law's evolution by tracking changes over time. ...
Conference Paper
Full-text available
Today’s organizations are faced with the need to conform to the various laws and regulations governing their domain of activity. The information systems (IS) supporting organizational activities have to align with these enforcements as well. The obligation of compliance is particularly stressed in domains in which the legal framework determines the entire functioning of an organization. E-government is such a domain. This state-of-the-art study aims to investigate the practices of regulation analysis for extracting key information for IS engineering. The study addresses practices in any regulation domain, not only that of e-Government, and focuses on approaches aiming to achieve and maintain regulatory compliance of IS and services with given regulations.
... Evaluation of influence could also help officials estimate the impact of changes to be made in a legal text. Related research has been reported by Palmirani et al. [1] concerning the extraction of references from unstructured legal text. The initial parsing phase of that research is rather similar: as references are expressed in natural language, the research group developed a prototype that identifies references using regular expressions. ...
Conference Paper
This paper introduces visualization methods for an initial approximation of the structure of any legislative act, procedures for relevance calculation, and metrics for calculating the distance between sections. To evaluate the methodology in real-world scenarios, experimental results for the law of obligations are included and analyzed. The results of the study show that it is possible to obtain insightful and interpretable results when analyzing only the inner structure of a legislative act, as constructed by the references between its sections. The methods and metrics presented in this paper can be directly applied to current keyword-based search systems for legal texts to enhance relevance ranking.
... Since references in legal texts follow a quite regular pattern, the automatic parsing of these references is, although definitely not trivial, a feasible task [6]. The parser of references used in this paper is an adapted version of the parser proposed by de Maat et al. [7]. ...
Conference Paper
Full-text available
Organizing legislative texts into a hierarchy of legal topics enhances access to legislation. Manually placing every part of new legislative texts in the correct place in the hierarchy, however, is expensive and slow, and therefore naturally calls for automation. In this paper, we assess the ability of machine learning methods to develop a model that automatically classifies legislative texts into a legal topic hierarchy. We investigate whether such methods can generalize across different codes. In the classification process, the specific properties of legislative documents are exploited: both the hierarchical structure of legal codes and the references within the legal document collection are taken into account. We argue for closer cooperation between legal and machine learning experts as the main direction of future work.
Article
Full-text available
This paper deals with an important task in legal text processing, namely reference and relation extraction from legal documents, which includes two subtasks: 1) reference extraction; 2) relation determination. Motivated by the fact that the two subtasks are related and share common information, we propose a joint learning model that solves both subtasks simultaneously. Our model employs a Transformer-based encoder-decoder architecture with non-autoregressive decoding, which relaxes the sequentiality of traditional seq2seq models and extracts references and relations in one inference step. We also propose a method to enrich the decoder input with learnable, meaningful information and thereby improve model accuracy. Experimental results on a dataset consisting of 5031 legal documents in Vietnamese with 61,446 references show that our proposed model outperforms several strong baselines and achieves an F1 score of 99.4% on the joint reference and relation extraction task.
Article
We are witnessing a radical shift towards digitisation in many aspects of our daily life, including law, public administration, and governance. This has sometimes been done with the aim of reducing costs and human errors by improving data analysis and management, but not without raising major technological challenges. One of these challenges is certainly the need to cope with relatively small amounts of data without sacrificing performance. Indeed, cutting-edge approaches to (natural) language processing and understanding are often data-hungry, especially those based on deep learning. With this paper we seek to address the problem of data scarcity in automatic Legalese (or legal English) processing and understanding. What we propose is an ensemble of shallow and deep learning techniques called SyntagmTuner, designed to combine the accuracy of deep learning with the ability of shallow learning to work with little data. Our contribution is based on the assumption that Legalese differs from its spoken language in the way meaning is encoded by the structure of the text and the co-occurrence of words. As a result, we show with SyntagmTuner how we can perform important e-governance tasks, such as multi-label classification of United Nations General Assembly (UNGA) Resolutions or legal question answering, with datasets of roughly 100 samples or even less.
Article
The tabular structure of legal texts and the principles of their drafting result in the frequent use of various types of references, which has a negative impact on the comprehensibility of the law. As legal texts are nowadays drafted and made available in electronic format, it is reasonable to try to develop automated mechanisms for checking the correctness of the references contained in these texts. The paper shows how a particular type of automated, dedicated information-management mechanism, offered by so-called adaptive hypertexts, can be used for this purpose. The authors focus primarily on describing the specificity of this type of tool and on analyzing the possibilities, principles, and prospects of their use to improve the quality of legal texts, in particular their comprehensibility.
Article
Full-text available
Smart legal systems carry immense potential to provide the legal community and the public with valuable insights from legal data. These systems can consequently help in analyzing and mitigating various social issues. In Pakistan, for the last couple of years, courts have been reporting judgments online for public consumption. This public data, once processed, can be utilized for the betterment of society and policy making in Pakistan. This study takes the first step towards realizing a smart legal system by extracting various entities, such as dates, case numbers, reference cases, and person names, from legal judgments. To automatically extract these entities, the primary requirement is to construct a dataset of legal judgments. Hence, annotation guidelines were first prepared, followed by the preparation of an annotated dataset for the extraction of various legal entities. Experiments conducted using a variety of datasets, multiple algorithms, and annotation schemes resulted in a maximum F1-score of 91.51% using Conditional Random Fields.
Chapter
Within the OpenLaws.eu project, we attempt to suggest relevant new sources of law to users of legal portals, based on the documents they are focusing on at a certain moment in time, or those they have selected. In the future we intend to do this both on the basis of 'objective' features of the documents themselves and on 'subjective' information gathered from other users ('crowdsourcing'). At this moment we concentrate on the first method. In Sect. 10.2 I describe how we create the web of law if it is not available in machine-readable form, or extend it when necessary. Next, I present the results of experiments using analysis of the network of references or citations to suggest new documents. In Sect. 10.3 I describe two experiments where we mix the use of network analysis with similarity based on a comparison of the actual text of documents. One experiment is based on a simple bag-of-words model with normalisation; the other uses Latent Dirichlet Allocation (LDA) with added n-grams. A small formative evaluation in both experiments suggests that text similarity alone works better than network analysis alone or a combination, at least for Dutch court decisions.
Chapter
We describe an annotated corpus of 350 decisions of Czech top-tier courts, gathered for a project assessing the relevance of court decisions in Czech law. We describe two layers of processing of the corpus: every decision was annotated by two trained annotators and then manually adjudicated by one trained curator to resolve possible disagreements between annotators. The corpus was developed as training and testing material for reference recognition tasks, which will be further used in research on the assessment of legal importance. Moreover, the overall shortage of available research corpora of annotated legal texts, particularly in the Czech language, leads us to believe that other research teams may find it useful.
Chapter
The aim of this paper is to produce a Japanese legal terminology consisting of legal terms and their explanations, with accessible citations. Although we succeeded in finding over 14,000 terms with high precision, 23.1 percent of the correct explanations included citations that were inaccessible due to their context-dependent format. We propose a method for revising explanatory sentences that uses XML-tag annotation to give all citations a context-independent format. The effectiveness of this method is confirmed by our experimental results.
Conference Paper
When elaborating compliance requirements, analysts need to follow the cross references in the underlying legal texts and consider the additional information in the cited provisions. To enable easier navigation and handling of cross references, automation is necessary for recognizing the natural language patterns used in cross reference expressions (cross reference detection), and for interpreting these expressions and linking them to the target provisions (cross reference resolution). In this paper, we propose a solution for automated detection and resolution of legal cross references. We ground our work on Luxembourg's legislative texts, both for studying the natural language patterns in cross reference expressions and for evaluating the accuracy and scalability of our solution.
Article
This paper presents a new approach to building legal citation classification systems. Our approach is based on Ripple-Down Rules (RDR), an efficient knowledge acquisition methodology. The main contributions of the paper (over existing expert-system approaches) are extensions to the traditional RDR approach, introducing new automatic methods to assist in the creation of rules: using the available dataset to provide performance estimates and relevant examples, automatically suggesting and validating synonyms, and re-using exceptions in different portions of the knowledge base. We compare our system, LEXA, with baseline machine learning techniques. LEXA obtains better results on both clean and noisy subsets of our corpus. Compared to machine learning approaches, LEXA also has other advantages, such as supporting continuous extension of the rule base, and the opportunity to proceed without an annotated dataset and to validate class labels while building rules.
Article
Full-text available
This paper presents the architecture of a system designed for the efficient retrieval of Spanish law. The architecture is based on XML and records meta-information together with the legal text itself, which allows the definition of a consolidated database from which the state of a legal text on a given date can be retrieved. The designed XML structure is flexible enough to represent all the types of legal documents that exist in Spain.
Thesis
Legal informatics is the discipline which deals with the use of information and communication technology to process legal information and support legal activities. This thesis presents a new approach to ease access to legal sources and increase the usability of legislation. The problem originates from real-life needs and is based partly on the author's personal work in the public sector in close relation to litigation. The proposed method allows extracting the most meaningful parts of legal texts, visualising the content for human readers, and representing it in a way that is easily accessible to computers. The method is based on an interdisciplinary approach, combining methods from language technology, system analysis, and data analysis. The method is applied to legal texts in order to overcome some built-in weaknesses of the legislation with the help of re-engineering and restructuring. This enhances the structural exploration abilities of different users without interfering with or altering the existing legislation, and is tightly tied to it, enabling direct access to the sources of law. The method is in essence universal, being based on the extraction of meaningful words; it has many potential areas of application within, and very likely outside, the legal domain, which need to be explored further. The method is tested on a selected part of the Estonian legislation, whereby the text processing and data analysis were done automatically. An extra compressed data layer upon the legislation was created, enabling further visual and computerized analysis of the legal domain. A web-based prototype was created to demonstrate the usability of the proposed method and allow users to get personal hands-on experience. Method outcomes were analysed in cooperation with legal scientists in order to estimate the preliminary value of the visualisation and verify the method's suitability for application within the legal domain. An experimental high-level structural analysis of the chosen part of the legislation was performed, and a new structural perspective on it is proposed for further discussion. Keywords: legislation, system engineering, restructuring, language processing, visualisation, data mining.
Chapter
This chapter presents the emerging standard-based approach to the management of legislative documents, an approach that best fits the needs of a parliamentary democracy in the Internet age. A presentation of the rationale of standard-based management of legislative documents is offered, introducing the objectives and scope of standards for legislative documents and discussing the relevance they assume for different stakeholders and computer applications.
Chapter
Laws and regulations are playing an increasingly important role in requirements engineering and systems development. Monitoring systems for requirements and policy compliance has been recognized in the requirements engineering community as a key area for research. Similarly, legal compliance is critical in systems development, especially given that non-compliance can result in both financial and criminal penalties. Working with legal texts can be very challenging, however, because they contain numerous ambiguities, cross-references, domain-specific definitions, and acronyms, and are frequently amended via new statutes, regulations, and case law. Requirements engineers and compliance auditors must be able to identify relevant legal texts, extract requirements and other key concepts, and monitor compliance. This chapter surveys research efforts over the past 50 years in handling legal texts for systems development. This survey can aid requirements engineers and auditors to better specify, test, and monitor systems for compliance.
Chapter
Full-text available
Currently almost all legislative bodies throughout Europe use general purpose word-processing software for the drafting of legal documents. These regular word processors do not provide specific support for legislative drafters and parliamentarians to facilitate the legislative process. Furthermore, they do not natively support metadata on regulations. This paper describes how the MetaLex regulation-drafting environment (MetaVex) aims to meet such requirements.
Chapter
One of the main emerging challenges in legal documentation is to capture the meaning and semantics of normative content using NLP techniques, and to isolate the relevant parts of the linguistic expression. The last five years have seen an explosion of XML schemas and DTDs for modelling legal resources, whose focus was on structure. Now that the basic elements of textual descriptiveness are well formalized, we can use this knowledge to proceed with content. This paper presents a detailed methodology for classifying modificatory provisions in depth and providing all the information necessary for semi-automatically managing the consolidation process. The methodology is based on an empirical legal analysis of about 29,000 Italian acts, in which we bring out regularities in the language associated with certain modifications, and in which we define patterns of properties for each type of modificatory provision. The list of verbs and the frames inferred through this empirical legal analysis have been used by the NLP group at the University of Turin to refine a syntactic NLP parser for isolating and representing sentences as syntactic trees, and the patterns will be used by the light semantic interpreter module to identify the parameters of modificatory provisions. Keywords: Legal XML, Legal Ontology, NLP
Article
References to parts of structured documents use the documents' structure to locate the piece of the document that is the reference target. XML, meanwhile, has become an increasingly important language for structured documents. One of its most important related languages is XPath, the language that permits fragments of XML documents to be selected. In this article we present a methodology, and an application case, for automatically extracting and resolving references to fragments of structured documents. This approach combines structure manipulation and information extraction to enhance reference extraction tools by improving the precision of the references extracted. We take advantage of XML markup to locate the position within the structure at which the references are found. The use of XPath for reference resolution is original: the resolution tool automatically builds XPath expressions. This proposal is inspired by (and implemented in) our work with legislative documents.
Conference Paper
Legal terms such as “owner”, “contract”, “possession”, and “citizen” are “intermediaries” in the sense that they serve as vehicles of inference between statements of legal grounds, on the one hand, and legal consequences, on the other. After introducing our approach to the representation of a normative system, we present a theory of “intervenients”, seen as a tool for analysing intermediaries. The paper is especially concerned with the subject matter of open and closed intervenients as well as the related issue of negations of intervenients. We also introduce the idea of so-called gic-systems, where “gic” is an abbreviation of “ground-intervenient-consequence”.
Conference Paper
This essay shows how an analysis of the references found in normative texts can make it easier to find the same references and related information in a document base starting from randomly selected strings of text.