Party Identification of Legal Documents using Co-reference Resolution and Named Entity Recognition

Abstract

In the field of natural language processing, domain-specific information retrieval using given documents has been a prominent and ongoing research area. Automatic extraction of the legal parties (petitioner and defendant sets) involved in a legal case has a significant impact on the proceedings of legal cases. This is a study proposing a novel way to identify the legal parties in a given legal document. The motivation behind this study is that there are no proper existing systems to accurately identify the legal parties in a legal document. We combined several existing natural language processing annotators to achieve the goal of extracting legal parties in a given court case document. Then, our methodology was evaluated with manually labelled court case paragraphs. The outcomes of the evaluation demonstrate that our system is successful in identifying legal parties.
Chamodi Samarawickrama, Melonie de Almeida, Nisansa de Silva,
Gathika Ratnayaka, and Amal Shehan Perera
Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka
Index Terms: Legal party identification, Legal entity identification, Co-reference resolution, NER
Compared with other domains, the legal domain has seen relatively little use of technology to improve efficiency. This research therefore aims to contribute to a legal system capable of extracting information from court cases and analyzing it, giving users easy and meaningful access to the information residing in them. In particular, this research looks at identifying the legal entities that belong to legal parties in a given legal text.
Case law is law created by the courts, as opposed to law enacted by parliaments through legislation. Because previous judicial decisions can be presented as arguments under this system, lawyers and legal officers are expected to be thoroughly familiar with similar cases that can be brought forth as precedents. However, for many reasons, primarily the abundance of material relative to the limited time available for preparation, going through records and preparing for a case has become a tedious task for lawyers and legal officials. This raises the need for an automated system that can search and analyze legal records with natural language understanding. When we examined the complexities of implementing such a system, the identification of the legal parties emerged as a preliminary and most crucial component.
One of the core elements of a legal case is the set of legal parties involved in it. The entire document revolves around these parties, as it proceeds to unravel the arguments and counter-arguments concerning them. We therefore believe that correctly identifying the legal parties involved in a case is the first step in properly structuring the document so that the information in it can be retrieved and used. A legal party may consist of one or more legal entities. In this paper, we take the initial step of identifying all the legal entities that belong to legal parties in the document; at a later stage, we will separate them into the relevant petitioner and defendant sets. This seemingly easy task of identifying parties is complicated by several factors, which call for a solution beyond the already existing systems.
Example 1
The decision whether to plead guilty also involves assessing the
respective consequences of a conviction after trial and by plea. See
INS v. St. Cyr, 533 U. S. 289, 322-323. When those consequences
are, from the defendant’s perspective, similarly dire, even the smallest
chance of success at trial may look attractive. For Lee, deportation
after some time in prison was not meaningfully different from
deportation after somewhat less time; he says he accordingly would
have rejected any plea leading to deportation in favor of throwing a
“Hail Mary” at trial.
Example 1 is a paragraph extracted from Lee v. United States [1] to demonstrate the complexity of the language used in legal documents. Even though the quoted text is taken without its preceding text and therefore loses some of its context, it is clear that the language is relatively complex, with long sentences in which the writer fits multiple ideas into a single sentence. Even a human reader without prior exposure to such documents may have a hard time grasping the context, since legal documents maintain a style of writing of their own. Hence, there is a need for a system with NLP capabilities fine-tuned for the legal domain.
A party in a legal context is a single person or a group of persons that can be identified as an entity bearing accountability for the purposes of law. Typically, it is a participant in a lawsuit or other legal proceeding who has an interest in its outcome. Two parties are usually involved in a case: the plaintiff, that is, the legal entity filing the suit, and the defendant, the entity sued or charged. A party therefore need not be an actual person; it may instead be a body that has decision-making capabilities. Additionally, a person who appears in the case only as a witness is not considered a party. Thus, identifying the parties in a legal case requires much more than identifying individuals or entities that merely appear to be similar figures. This research is an attempt to automate the correct identification of the entities that belong to a party in a legal case.
Example 2
Sentence 2.1: Petitioner Jae Lee moved to the United States from
South Korea with his parents when he was 13.
Sentence 2.2: During the plea process, Lee repeatedly asked his
attorney whether he would face deportation; his attorney assured
him that he would not be deported as a result of pleading guilty.
Example 2 contains two sentences extracted from Lee v. United States [1] that demonstrate the ambiguity in deciding which entities in a legal document are legal entities. His parents in Sentence 2.1 do not qualify as a legal entity in this case; they are mentioned merely to explain the history of Petitioner Jae Lee. In contrast, the similar mention his attorney in Sentence 2.2 is one of the major legal entities in this case. By carefully analysing the entire document, a human reader can reach that understanding, but only by running the text through various mental sieves and making logical assumptions. With this paper, we propose a rudimentary model to identify the legal entities in the document.
A. NLP in Legal Domain
Many attempts have been made to introduce and utilize NLP in the field of law, owing to the increasing popularity of the technology and the ease and efficiency it can bring to a system that deals with language and human interaction with language. Extensive work in the legal domain has been carried out in areas such as ontology population [2] and semantic similarity [3].
Identifying Participant Mentions and Resolving Their Coreferences in Legal Court Judgements [4] is closely related to this research. In that work, Gupta et al. address the problem of correctly co-referencing entities that are parties in legal texts, where terms such as petitioner, defendant, and appellant appear as mentions and are missed by existing generic co-reference resolution systems.
Extensive work has been done in applications and sub-fields such as legal discourse [5], [6], document retrieval [7], shift-of-perspective [8], and ontology instance population [9], among others.
Additionally, many research projects are being conducted to capture the sentiments in these documents in order to further analyze them from a logical perspective for use as precedents. We therefore believe that our research would be of high value in supporting such research, since identifying the legal parties in a legal document is the first step in performing sentiment analysis in the legal domain [10].
B. Co-reference Resolution
Co-reference resolution is the task of finding all expressions
that refer to the same entity in a text. It is an important step
for a lot of higher-level Natural Language Processing tasks
like information extraction.
Example 3
Sentence 3.1: Petitioner Jae Lee moved to the United States from
South Korea with his parents when he was 13.
Sentence 3.2: In the 35 years he has spent in this country, he has
never returned to South Korea, nor has he become a U.S. citizen,
living instead as a lawful permanent resident.
The two sentences presented in Example 3 are also from Lee v. United States [1], where Sentence 3.1 is immediately followed by Sentence 3.2. The words his and he in Sentence 3.1 and the three occurrences of he in Sentence 3.2 all refer to the same person, Petitioner Jae Lee. Co-reference resolution creates a mapping between these words (referred to as tokens in NLP), indicating that they refer to the same entity.
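As a rough illustration of this mapping, the sketch below writes out one co-reference cluster for Example 3 by hand. The `Mention` type and the choice of the first mention as the entity label are illustrative assumptions; nothing here is the output of an actual co-reference system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    sentence: int  # 1 for Sentence 3.1, 2 for Sentence 3.2
    text: str      # surface form of the mention


# Hand-written cluster for Example 3 (assumed, not produced by a model).
cluster = [
    Mention(1, "Petitioner Jae Lee"),
    Mention(1, "his"),
    Mention(1, "he"),
    Mention(2, "he"),
    Mention(2, "he"),
    Mention(2, "he"),
]

# Map every mention to the entity it refers to; here the first mention
# of the cluster is used as the entity label.
entity = cluster[0].text
mention_to_entity = [(m.sentence, m.text, entity) for m in cluster]
```

Every pronoun in the two sentences ends up paired with the same entity label, which is the property the later steps rely on.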
The initial step in our proposed methodology is to obtain a preliminary set of entities that may or may not be legal entities, so that we can pass them through the models we have implemented to calculate the probability of each being a legal entity. We use co-reference resolution for this step, picking the headword of each coref cluster as an entity.
We mainly considered the Stanford Co-reference Resolution model [11], [12] and the Spacy co-reference resolution system for this purpose. We chose the Stanford model because it performed better in an experiment we carried out on both systems to decide which works better for our domain. For this, we evaluated both systems on 10 cases picked from legal documents and measured precision, recall, and F1. The table of results is included in the experiments section.
The Stanford Co-reference Resolution system is built upon the work of Clark and Manning [11], and neuralcoref, the Spacy system, makes use of the same research. Since our initial evaluations showed that the Stanford Co-reference Resolution system works better on our domain, we continued our work with that model. It uses reinforcement learning to directly optimize a neural mention-ranking model for co-reference evaluation metrics.
C. Named Entity Recognition
Named Entity Recognition is the task of locating named entities and classifying them into predefined categories such as persons, organizations, locations, dates, and times. The Stanford named entity recognition model and Spacy are two main tools that can be used for this task. Both can identify named entities in multiple categories such as those mentioned above. In our case, we are interested only in legal parties, and only a person or an organization can be considered a potential legal entity. The performance of the two systems in identifying persons and organizations, in terms of precision, recall, and F1 on 10 selected examples, is reported in Table II. The Stanford Named Entity Recognition model works better on our domain when the comparison is restricted to how well each model identifies persons and organizations.
D. Subject Identification
In English grammar, the subject of a sentence or clause is the entity that usually indicates what the sentence is about or what performs the action (if there is one). Identifying the subject of a sentence can indirectly provide knowledge about who has authority over whom or what. We use this idea to identify potential legal parties: a set of co-ref mentions is awarded a score each time it acts as the subject of a sentence or phrase. Dependency parsing plays a significant role in extracting the subject from a sentence or phrase. The work of Chen and Manning in this area is used in the dependency parser of Stanford CoreNLP [13].
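To make the idea of reading the subject off a dependency parse concrete, the following minimal sketch works on a hand-written list of dependency edges. The `Dep` type, the `subjects` helper, and the example parse are assumptions made for illustration; in the actual system the edges would come from the Stanford dependency parser rather than being typed in. The set of subject labels covers the `nsubj` relation and its passive variants as named in Universal Dependencies and in the older Stanford Dependencies scheme.

```python
from typing import List, NamedTuple


class Dep(NamedTuple):
    relation: str   # dependency label, e.g. "nsubj"
    head: str       # governor token
    dependent: str  # dependent token


# Labels that mark a grammatical subject (active or passive).
SUBJECT_RELATIONS = {"nsubj", "nsubj:pass", "nsubjpass"}


def subjects(edges: List[Dep]) -> List[str]:
    """Return the tokens attached to their head by a subject relation."""
    return [e.dependent for e in edges if e.relation in SUBJECT_RELATIONS]


# Hand-written parse of "Roberts filed an LHWCA claim" (illustrative only).
edges = [
    Dep("nsubj", "filed", "Roberts"),
    Dep("obj", "filed", "claim"),
    Dep("det", "claim", "an"),
    Dep("compound", "claim", "LHWCA"),
]
sentence_subjects = subjects(edges)
```

A sentence with several clauses can yield more than one subject, which is why the helper returns a list.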
Fig. 1. Legal Party Identification Model
Figure 1 shows the overall design of our model, which comprises two sub-models. The first is the Legal Entity Identification Model, which performs a rudimentary extraction of legal entities from the given legal case. Its output is passed to the second model, the Probability Calculation Model, which calculates the probability that each legal entity belongs to a legal party in the case. Once the probabilities of all entities have been calculated, the entities that belong to legal parties are selected using a pre-decided threshold. The Legal Party Identification Model is evaluated at different thresholds using a data set of paragraphs from US Supreme Court opinions with manually identified legal parties.
A. Legal Entity Identification Model
Fig. 2. Legal Entity Identification Model
This model, visualized in Figure 2, uses co-reference resolution to find the set of entities in the given text and all expressions that refer to the same entity. We use the neural co-reference system in Stanford CoreNLP for this task because of its better performance in co-reference resolution, as shown in Table II. The set of entities extracted by co-reference resolution is then passed to the next step, where an NER check identifies the person and organization entities, because only person and organization entities are considered legal entities. We use the Stanford Named Entity Recognizer [14], since it performed better on our NER test cases, as shown in Table II. The problem we encountered with that system is that it labels each word, or rather each token, of the given text separately. We tackled this issue with a rule-based algorithm that puts together a group of tokens to form a single entity with one NER tag. Algorithm 1 describes this process. The algorithm checks a sequence of tokens (generated by tokenizing the text with the Stanford annotator) for consecutive tokens that have the same NER tag and concatenates them to form a single entity with that NER tag.

Algorithm 1 NER check
Input: token list of a single sentence tokens, current word word, NER of the current word NER
Output: Concatenated tokens of the same NER
function BUILDNER(tokens, word, NER)
    if tokens is not empty and tokens[0].NER == NER then
        word = word + BUILDNER(tokens[1:], tokens[0].word, NER)
    end if
    return word
end function

Thus, using the output of the co-reference resolution model and the NER model, we acquire the co-ref mention sets that are either persons or organizations. The model outputs the headwords of these final sets as the set of legal entities; a headword is usually either the mention with the NER tag PERSON or ORGANIZATION or the first mention of the entity.
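The token-merging step of Algorithm 1 can be sketched in Python as follows. This is an iterative variant that uses `itertools.groupby` in place of the recursion, and the function name `merge_same_ner`, the `(word, tag)` input format, and the joining of tokens with a single space are assumptions made for illustration rather than details taken from the implementation.

```python
from itertools import groupby
from typing import List, Tuple


def merge_same_ner(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join runs of consecutive tokens that share an NER tag.

    `tagged` is a list of (word, ner_tag) pairs for one sentence.
    Runs of non-entity tokens (tag "O") are merged as well; callers are
    expected to filter on the tags they care about.
    """
    merged = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        merged.append((" ".join(word for word, _ in run), tag))
    return merged


# Hand-tagged tokens (assumed tags, not real annotator output).
tagged = [
    ("Petitioner", "O"), ("Jae", "PERSON"), ("Lee", "PERSON"),
    ("moved", "O"), ("to", "O"), ("the", "O"),
    ("United", "LOCATION"), ("States", "LOCATION"),
]

# Keep only the categories that can be legal entities.
entities = [
    (text, tag)
    for text, tag in merge_same_ner(tagged)
    if tag in {"PERSON", "ORGANIZATION"}
]
```

Here the two PERSON tokens are merged into the single entity "Jae Lee", while the LOCATION run is merged but then discarded by the category filter.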
Additionally, we edit the original court case document using the co-reference resolution output for further use. To do so, this model replaces every mention in a co-ref mention set with the headword of that set.
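A minimal sketch of this replacement step is given below. It assumes that each cluster is given as a headword plus a list of non-overlapping token spans `[start, end)`; the function name `replace_mentions` and this span format are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def replace_mentions(
    tokens: List[str],
    clusters: Dict[str, List[Tuple[int, int]]],
) -> List[str]:
    """Replace each mention span [start, end) with its cluster's headword."""
    spans = sorted(
        (
            (start, end, head)
            for head, mentions in clusters.items()
            for start, end in mentions
        ),
        reverse=True,
    )
    edited = list(tokens)
    # Work from right to left so that earlier indices stay valid even
    # when a headword has more tokens than the mention it replaces.
    for start, end, head in spans:
        edited[start:end] = head.split()
    return edited


tokens = "Lee repeatedly asked his attorney whether he would face deportation".split()
clusters = {"Lee": [(0, 1), (3, 4), (6, 7)]}  # hand-written spans
edited_text = " ".join(replace_mentions(tokens, clusters))
```

In this simplified form a possessive such as "his" is replaced by the bare headword, so the edited text is meant for subject counting rather than for reading.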
B. Probability Calculation Model
Fig. 3. Probability Calculation Model
This model, visualized in Figure 3, takes the set of PERSON/ORGANIZATION entities and the edited court case document as inputs and outputs the legal entities, each with the probability that it belongs to a legal party of the case. For this task, the model counts the number of times each legal entity is mentioned as the subject of a sentence in the text. The idea behind this method is that the legal entity that is the subject of the largest number of sentences is most likely a legal party, because it is the main entity being discussed in the document. The probability that any other legal entity belongs to a legal party is calculated as the number of times that entity is the subject of a sentence relative to the maximum number of times any legal entity is the subject of a sentence.
The output of this algorithm is a dictionary in which the keys ($K = \{k_1, k_2, \ldots, k_n\}$) are the identified legal entities and the values are the probabilities ($P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$) of those entities being a legal party. Algorithm 2 explains the process of this model. First, the model takes the list of legal entities as the keys ($k_i$) of a dictionary and initializes the current value ($v_i$) of each key to zero. Then the model takes each sentence ($s_j$) of the edited document ($D$) and determines the subject ($S$) of that sentence using the Stanford dependency parser [13]. If the subject is a key of the input dictionary ($S \in K$), the current value ($v_i$) corresponding to that key ($k_i$) is increased by one.
$$v_i = v_i + 1 \tag{1}$$
Algorithm 2 Probability Calculation Model
Input: set of legal entities (K), edited court case document (D)
Output: set of legal entities and their probabilities to be a legal party (setOf(k_i : p_i))
for each k_i in K do
    v_i ← 0
end for
for each s_j in D do
    S ← SubjectOf(s_j)
    for each k_i in K do
        if k_i == S then
            v_i = v_i + 1
        end if
    end for
end for
V ← Max(v_i)
for each k_i in K do
    p_i = v_i / V
end for
return setOf(k_i : p_i)
Equation 1 shows how scores are awarded when an entity is mentioned as the subject of a sentence, where $v_i$ is the current value of $k_i$. Every time a key ($k_i$) appears as the subject of a sentence in the edited document, the current value ($v_i$) of that key increases. After all the sentences ($s_j$) of the edited document ($D$) have been processed, the probabilities are calculated by dividing the current value ($v_i$) of each key ($k_i$) by the maximum current value ($V$). The model then outputs a dictionary of headwords (keys, $k_i$) and their probabilities of being an actual legal party ($p_i$).
$$p_i = \frac{v_i}{V} \tag{2}$$
Equation 2 shows how the final values of the probabilities are calculated, where $v_i$ is the current value of $k_i$, $V$ is $\max_i v_i$, and $p_i$ is the probability of $k_i$.
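The counting and normalization described by Equations 1 and 2 can be sketched as follows. The sketch assumes that the subject of each sentence has already been extracted and is given as a plain string; the function name `party_probabilities`, the example subject list, and the guard for the case where no entity is ever a subject are assumptions added for illustration.

```python
from typing import Dict, Iterable, List


def party_probabilities(
    entities: List[str],
    subjects: Iterable[str],
) -> Dict[str, float]:
    """Score each entity by how often it is a sentence subject."""
    counts = dict.fromkeys(entities, 0)  # v_i = 0 for every key k_i
    for subject in subjects:
        if subject in counts:            # S is one of the keys
            counts[subject] += 1         # Equation 1
    top = max(counts.values(), default=0)  # V = max(v_i)
    if top == 0:
        # Assumed behaviour: avoid dividing by zero when no entity
        # ever appears as a subject.
        return {entity: 0.0 for entity in counts}
    return {entity: count / top for entity, count in counts.items()}  # Equation 2


# Hypothetical subjects, one per sentence or clause of an edited document.
entities = ["Roberts", "Sea-Land", "ALJ"]
subjects = ["Roberts", "Sea-Land", "Roberts", "ALJ", "Roberts", "Roberts", "Board"]
probabilities = party_probabilities(entities, subjects)
```

Because the values are divided by the maximum count, the most frequent subject always receives a probability of 1.0 and every other entity is scored relative to it.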
After getting the output from this model where each entity
is associated with a probability of being a legal entity, we
introduce a threshold value to decide if an entity qualifies to
be a legal entity or not. For evaluation purposes, we used
threshold values of 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8 to see how
the system performs with each threshold level.
Example 4
In fiscal year 2002, petitioner Roberts was injured at an Alaska marine
terminal while working for respondent Sea-Land Services, Inc. Sea-
Land voluntarily paid Roberts benefits until fiscal year 2005. Roberts
then filed an LHWCA claim, and Sea-Land controverted. In fiscal
year 2007, an ALJ awarded Roberts benefits at the fiscal year 2002
statutory maximum rate. Roberts sought reconsideration, contending
that the award should have been set at the higher statutory maximum
rate for fiscal year 2007, when, he argued, he was “newly awarded
compensation” by order of the ALJ. The ALJ denied his motion,
and the Department of Labor’s Benefits Review Board affirmed,
concluding that the pertinent maximum rate is determined by the
date disability commences.
The final probability output for the text paragraph in Example 4 is as follows:
'petitioner Roberts': 1.0,
'Roberts benefits': 0.15384615384615385,
'respondent Sea-Land Services , Inc.': 0.46153846153846156,
'an ALJ': 0.38461538461538464
The legal entities in this text paragraph are petitioner Roberts, Sea-Land Services, and the ALJ. With a threshold of 0.3 the system captures all three, but with a threshold of 0.4 the ALJ (0.38) is no longer identified as a legal entity. Thus, although raising the threshold can improve the precision of the system, recall drops as we do so.
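The effect of the threshold on the Example 4 output can be reproduced with a few lines of Python. The scores are copied from the output above; the helper name `select_parties` and the use of an inclusive comparison (`>=`) are assumptions, since the text does not say whether an entity scoring exactly at the threshold is kept.

```python
from typing import Dict, List

# Probabilities reported for Example 4.
scores = {
    "petitioner Roberts": 1.0,
    "Roberts benefits": 0.15384615384615385,
    "respondent Sea-Land Services , Inc.": 0.46153846153846156,
    "an ALJ": 0.38461538461538464,
}


def select_parties(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the entities whose probability reaches the threshold."""
    return [entity for entity, p in scores.items() if p >= threshold]


at_03 = select_parties(scores, 0.3)  # three entities pass
at_04 = select_parties(scores, 0.4)  # "an ALJ" drops out
```

At 0.3 the spurious entity "Roberts benefits" is already excluded while all three expected entities are kept; at 0.4 the ALJ is lost, which is the recall drop described above.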
A. Setup
Natural Language Software: All experiments are run using
the Stanford CoreNLP tools. Tools for NER, co-reference
resolution and dependency parsing were specifically used
within our models to come up with the presented results.
1) Datasets: We handpicked paragraphs consisting of 2-5 sentences and created our own data set of 100 such test cases to test our system. This paragraph set accounts for roughly 200-300 separate legal parties. The paragraphs were taken from legal cases in the data set presented in the paper Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity [3], which contains a large legal text corpus, several word2vec embedding models of the words in that corpus, and a set of legal domain gazetteer lists. We used the legal text corpus in that data set to extract the paragraphs for our 100 test cases, which are used to measure the performance of our model.
2) Performance Measures: Each test case was run separately through our model, and the output was compared with the expected output.
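One way to turn such a comparison into precision, recall, and F1 is sketched below. It assumes that predicted and expected entities are compared by exact string match within a single test case; the paper does not state the matching rule or how scores are aggregated over test cases, so both the function `precision_recall_f1` and the example sets are illustrative assumptions.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], expected: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one test case."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test case: the system finds two of three expected entities.
expected = {"petitioner Roberts", "respondent Sea-Land Services , Inc.", "an ALJ"}
predicted = {"petitioner Roberts", "respondent Sea-Land Services , Inc."}
precision, recall, f1 = precision_recall_f1(predicted, expected)
```

In this example every predicted entity is correct, so precision is 1.0, while recall is 2/3 because one expected entity is missed.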
B. Performance
We tested our system at different thresholds; the results are shown in Table I.

Threshold   Precision   Recall   F1
0.3         0.84        0.68     0.75
0.4         0.87        0.63     0.73
0.5         0.88        0.62     0.72
0.6         0.88        0.56     0.69
0.7         0.90        0.54     0.67
0.8         0.89        0.52     0.65

These values were obtained by running the test paragraphs through our system. Figure 4 compares the precision, recall, and F1 scores at the different threshold levels. It shows that recall drops as the threshold is increased, while precision generally increases. Recall drops because, as the threshold is raised, the system requires a higher probability value for each legal entity before deciding that it belongs to a legal party. According to Table I, precision peaks at 0.90 when the threshold is 0.7, whereas the F1 score is highest (0.75) at the lowest threshold tested, 0.3.
Fig. 4. Precision, Recall and F1 vs Threshold
1) Co-reference Resolution Comparison: To get a clear comparison of how the two systems perform specifically on our domain, we carried out an initial evaluation of our own. For this, we chose a set of sentences taken from court cases and performed co-reference resolution on them with both systems. Table II shows the results of this initial evaluation.

             NER                  Co-reference
             Stanford   Spacy     Stanford   Spacy
Precision    1.00       0.94      0.93       0.98
Recall       0.95       0.94      0.92       0.77
F1           0.97       0.94      0.92       0.86

Since the F1 score of the Stanford system (0.92) was higher than that of Spacy (0.86), we decided to use the Stanford system for co-reference resolution. We also took recall into consideration, since in our system co-reference resolution is used in the preliminary stage of extracting entities. The entities recognized by co-reference resolution are filtered further in the next steps, so we believe that favouring recall over precision is not harmful at this stage. Figure 5 visualizes the difference between the performance of the two systems in terms of precision, recall, and F1 score.
2) NER Comparison: We carried out a similar evaluation of NER with the Stanford and Spacy systems. The results are shown in Table II. As visualized in Figure 6, Stanford performed better on all three measures. Most importantly, Stanford had excellent precision, which is crucial for us.
Fig. 5. Stanford vs Spacy Co-reference Resolution Performance Comparison
Fig. 6. Stanford vs Spacy NER Comparison
Developing a methodology to identify the legal entities that belong to legal parties in a given legal document is the main research objective of this study. This study presents a novel methodology for this task that integrates existing technologies with two algorithms to extract the legal entities in a document. We tackled the issue of identifying the NER value of an entity with multiple tokens by means of an algorithm based on logical assumptions, and, to complete the system, we introduced a scoring system that calculates the probability of an entity being a legal party, in which entities are awarded points whenever they appear as the subject of a sentence. We have demonstrated the effectiveness of the proposed system by conducting experiments on the models and evaluating them with respect to precision, recall, and F1 score.
[1] “Lee v. United States,” in US, vol. 432, no. No. 76-5187. Supreme
Court, 1977, p. 23.
[2] V. Jayawardana, D. Lakmal, N. de Silva, A. S. Perera, K. Sugathadasa,
and B. Ayesha, “Deriving a representative vector for ontology classes
with instance word vector embeddings,” in 2017 Seventh International
Conference on Innovative Computing Technology (INTECH). IEEE,
2017, pp. 79–84.
[3] K. Sugathadasa, B. Ayesha, N. de Silva, A. S. Perera, V. Jayawardana,
D. Lakmal, and M. Perera, “Synergistic union of word2vec and lexicon
for domain specific semantic similarity,” in 2017 IEEE International
Conference on Industrial and Information Systems (ICIIS). IEEE, 2017,
pp. 1–6.
[4] A. Gupta, D. Verma, S. Pawar, S. Patil, S. Hingmire, G. K. Palshikar, and
P. Bhattacharyya, “Identifying participant mentions and resolving their
coreferences in legal court judgements,” in International Conference on
Text, Speech, and Dialogue. Springer, 2018, pp. 153–162.
[5] G. Ratnayaka, T. Rupasinghe, N. de Silva, M. Warushavithana, V. Gamage, and A. S. Perera, “Identifying relationships among sentences in court case transcripts using discourse relations,” in 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2018, pp. 13–20.
[6] G. Ratnayaka, T. Rupasinghe, N. de Silva, M. Warushavithana, V. S. Gamage, M. Perera, and A. S. Perera, “Classifying sentences in court case transcripts using discourse and argumentative properties,” ICTer, vol. 12, no. 1, 2019.
[7] K. Sugathadasa, B. Ayesha, N. de Silva, A. S. Perera, V. Jayawardana,
D. Lakmal, and M. Perera, “Legal document retrieval using document
vector embeddings and deep learning,” in Science and information
conference. Springer, 2018, pp. 160–175.
[8] G. Ratnayaka, T. Rupasinghe, N. de Silva, V. S. Gamage,
M. Warushavithana, and A. S. Perera, “Shift-of-perspective identification
within legal cases,” arXiv preprint arXiv:1906.02430, 2019.
[9] V. Jayawardana, D. Lakmal, N. de Silva et al., “Word vector embeddings
and domain specific semantic based semi-supervised ontology instance
population,” International Journal on Advances in ICT for Emerging
Regions, vol. 10, no. 1, p. 1, 2017.
[10] V. Gamage, M. Warushavithana, N. de Silva, A. S. Perera, G. Ratnayaka,
and T. Rupasinghe, “Fast approach to build an automatic sentiment
annotator for legal domain using transfer learning,” arXiv preprint
arXiv:1810.01912, 2018.
[11] K. Clark and C. D. Manning, “Deep reinforcement learning for mention-
ranking coreference models,” arXiv preprint arXiv:1609.08667, 2016.
[12] K. Clark and C. D. Manning, “Improving coreference resolution by learning entity-level distributed representations,” arXiv preprint arXiv:1606.01323, 2016.
[13] D. Chen and C. D. Manning, “A fast and accurate dependency parser using neural networks,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp.
[14] J. R. Finkel, T. Grenager, and C. Manning, “Incorporating non-local information into information extraction systems by gibbs sampling,” in Proceedings of the 43rd annual meeting on association for computational linguistics. Association for Computational Linguistics, 2005, pp. 363–370.
... The legal domain is such a domain in which research interests have come up over the recent past, and many research have been carried out considering its different aspects. Party based sentiment analysis [1][2][3], extracting parties of a legal case [4][5][6], detecting important sentences of a court case and predicting the outcome of a court case are such important aspects. ...
The advancement of Natural Language Processing (NLP) is spreading through various domains in forms of practical applications and academic interests. Inherently, the legal domain contains a vast amount of data in text format. Therefore it requires the application of NLP to cater to the analytically demanding needs of the domain. Identifying important sentences, facts and arguments in a legal case is such a tedious task for legal professionals. In this research we explore the usage of sentence embeddings for multi-class classification to identify important sentences in a legal case, in the perspective of the main parties present in the case. In addition, a task-specific loss function is defined in order to improve the accuracy restricted by the straightforward use of categorical cross entropy loss.
... As the number of legal cases increases, legal professionals typically endure heavy workloads on a daily basis, and they may become overwhelmed and as a result of that, be unable to obtain quality analysis. In this analysis process, identifying advantageous and disadvantageous statements relevant to legal parties [1][2][3][4] can be considered a critical and time consuming task. By automating this task, legal officers will be able to reduce their workload significantly. ...
Full-text available
Legal information retrieval holds a significant importance to lawyers and legal professionals. Its significance has grown as a result of the vast and rapidly increasing amount of legal documents available via electronic means. Legal documents, which can be considered flat file databases, contain information that can be used in a variety of ways, including arguments, counter-arguments, justifications, and evidence. As a result, developing automated mechanisms for extracting important information from legal opinion texts can be regarded as an important step toward introducing artificial intelligence into the legal domain. Identifying advantageous or disadvantageous statements within these texts in relation to legal parties can be considered as a critical and time consuming task. This task is further complicated by the relevance of context in automatic legal information extraction. In this paper, we introduce a solution to predict sentiment value of sentences in legal documents in relation to its legal parties. The Proposed approach employs a fine-grained sentiment analysis (Aspect-Based Sentiment Analysis) technique to achieve this task. Sigmalaw PBSA is a novel deep learning-based model for ABSA which is specifically designed for legal opinion texts. We evaluate the Sigmalaw PBSA model and existing ABSA models on the SigmaLaw-ABSA dataset which consists of 2000 legal opinion texts fetched from a public online data base. Experiments show that our model outperforms the state-of-the-art models. We also conduct an ablation study to identify which methods are most effective for legal texts.
Information available in court case transcripts, which describe the proceedings of previous legal cases, is of significant importance to legal officials. Therefore, automatic information extraction from court case transcripts can be considered a task of great importance for facilitating processes in the legal domain. A sentence can be considered a fundamental textual unit of any document made up of text. Therefore, analyzing the properties of sentences can be of immense value for information extraction from machine-readable text. This paper demonstrates how the properties of sentences can be used to extract valuable information from court case transcripts. As the first task, sentence pairs were classified based on the relationship type that can be observed between the two sentences. There, we defined relationship types that can be observed between sentences in court case transcripts. A system combining a machine learning model and a rule-based approach was used to classify pairs of sentences according to the relationship type. The next classification task was performed based on whether a given sentence provides a legal argument or not. The results obtained through the proposed methodologies were evaluated using human judges. To the best of our knowledge, this is the first study where discourse relationships between sentences have been used to determine relationships among sentences in legal court case transcripts. Similarly, this study provides novel and effective approaches to identify argumentative sentences in court case transcripts.
Conference Paper
Case Law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology of applying discourse relations between sentences when processing text documents related to the legal domain. In this study, we developed a mechanism to classify the relationships that can be observed among sentences in transcripts of United States court cases. First, we defined relationship types that can be observed between sentences in court case transcripts. Then we classified pairs of sentences according to the relationship type by combining a machine learning model and a rule-based approach. The results obtained through our system were evaluated using human judges. To the best of our knowledge, this is the first study where discourse relationships between sentences have been used to determine relationships among sentences in legal court case transcripts.
An ontology defines a set of representational primitives which model a domain of knowledge or discourse. With the rise of fields such as information extraction and knowledge management, the role of ontology has become a driving factor of many modern-day systems. Ontology population, on the other hand, is an inherently problematic process, as it needs manual intervention to prevent conceptual drift. Semantic-sensitive word embedding has become a popular topic in natural language processing because of its capability to cope with semantic challenges. Incorporating domain-specific semantic similarity with word embeddings could potentially improve performance in terms of semantic similarity in specific domains. Thus, in this study we propose a novel way of semi-supervised ontology population with word embeddings and domain-specific semantic similarity as the basis. We built several models, including traditional benchmark models and new types of models based on word embeddings. Finally, we ensembled them to produce a synergistic model which outperformed the best-performing candidate model by 33%.
Selecting a representative vector for a set of vectors is a very common requirement in many algorithmic tasks. Traditionally, the mean or median vector is selected. Ontology classes are sets of homogeneous instance objects that can be converted to a vector space by word vector embeddings. This study proposes a methodology to derive a representative vector for ontology classes whose instances were converted to the vector space. We start by deriving five candidate vectors, which are then used to train a machine learning model that calculates a representative vector for the class. We show that our methodology outperforms the traditional mean and median vector representations.
Conference Paper
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
Domain-specific information retrieval has been a prominent and ongoing research area in the field of natural language processing. Many researchers have incorporated different techniques to overcome technical and domain specificity and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy dependence on domain experts, which makes the entire process time-consuming and cumbersome. In this study, we have developed three novel models, which are compared against a gold standard generated via the online repositories provided specifically for the legal domain. The three models incorporate vector space representations of the legal domain, where document vector generation was done with two different mechanisms and as an ensemble of the two. This study covers the research carried out in representing legal case documents in different vector spaces, whilst incorporating semantic word measures and natural language processing techniques. The ensemble model built in this study shows a significantly higher accuracy level, which supports the need to incorporate domain-specific semantic similarity measures into the information retrieval process. This study also shows the impact of varying the distribution of the word similarity measures against varying document vector dimensions, which can lead to improvements in the process of legal information retrieval.
Conference Paper
Semantic similarity measures are an important part of Natural Language Processing tasks. However, semantic similarity measures built for general use do not perform well within specific domains. Therefore, in this study we introduce a domain-specific semantic similarity measure that was created by the synergistic union of word2vec, a word embedding method that is used for semantic similarity calculation, and lexicon-based (lexical) semantic similarity methods. We prove that this proposed methodology outperforms word embedding methods trained on a generic corpus, as well as methods trained on a domain-specific corpus that do not use lexical semantic similarity methods to augment the results. Further, we prove that text lemmatization can improve the performance of word embedding methods.
A. Gupta, D. Verma, S. Pawar, S. Patil, S. Hingmire, G. K. Palshikar, and P. Bhattacharyya, "Identifying participant mentions and resolving their coreferences in legal court judgements," in International Conference on Text, Speech, and Dialogue. Springer, 2018, pp. 153-162.