Unary and Binary Classification Approaches
and their Implications for Authorship Verification
Oren Halvani, Christian Winter, Lukas Graner
Abstract
Retrieving indexed documents not by their topical content but by their writing style opens the
door for a number of applications in information retrieval (IR). One application is to retrieve
textual content of a certain author X, where the queried IR system is provided beforehand with
a set of reference texts of X. Authorship verification (AV), which is a research subject in the
field of digital text forensics, is suitable for this purpose. The task of AV is to determine if two
documents (i.e., an indexed and a reference document) have been written by the same author
X. Even though AV represents a unary classification problem, a number of existing approaches
consider it as a binary classification task. However, the underlying classification model of an
AV method has a number of serious implications regarding its prerequisites, evaluability, and
applicability. In our comprehensive literature review, we observed several misunderstandings
regarding the differentiation of unary and binary AV approaches that require consideration.
The objective of this paper is, therefore, to clarify these by proposing clear criteria and new
properties that aim to improve the characterization of existing and future AV approaches. Given
both, we investigate the applicability of eleven existing unary and binary AV methods as well
as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we
highlight an important issue concerning the evaluation of AV methods based on fixed decision
criteria, which has not received attention in previous AV studies.
1 Introduction
Document classification plays a vital role across numerous fields of study including library,
information and computer science, and represents a major task in IR. The categorization of documents¹
can be performed with regard to a variety of concepts such as genre, register, text type, domain,
sublanguage or style. Focusing on the (writing) style of documents allows a number of promising
applications in IR. For example, a user can provide an IR system with reference texts of an author A,
such as blog posts, and ask the system to retrieve additional content of A (e.g., product reviews,
articles or comments) based on the same writing style. Style-based IR is particularly interesting if
both reference and indexed documents stem from the same person but differ in terms of metadata
(for instance, different user names) or are not even provided with metadata at all. This might be
a helpful supplement in the context of fake news detection.
A number of research disciplines concern themselves with the analysis of writing style
(more precisely, with the authorship of documents), where the most important are authorship
attribution (AA) and authorship verification (AV). The former deals with the problem of
identifying the most likely author of an unknown document D_U, given a set of texts of candidate
authors. AV, on the other hand, focuses on the question² whether D_U was in fact written by a known
author A, where only a set D_A of reference texts of this author is given. Both disciplines are strongly
related to each other, as any AA problem can be broken down into a series of AV problems [35].
Here, an AV system must determine for each verification problem ρ = (D_U, D_A) if all involved
documents stem from the same author, based on a specific decision criterion. Breaking down an
AA problem into multiple AV problems is especially important in scenarios where the presence
of D_U's true author in the candidate set cannot be guaranteed. In contrast to AA, which represents
an n-ary (multi-class) classification problem, AV is a unary (one-class) classification³ problem, as
¹ Note that "documents" are not necessarily restricted to natural language, but might also represent source code snippets or other types of textual data.
² There are also other formulations that describe authorship verification problems (see, for example, [42]).
³ In this form, as highlighted in [17], AV is considered to be a recognition rather than a classification problem.
there is only one class (A) to learn from [17,26,34,36,42]. However, inspecting previous studies
reveals that unary classification appears to be a gray area in machine learning and, in particular, in
the context of AV, where a number of misunderstandings can be observed.
The objective of this paper is to analyze these misunderstandings in detail and to propose clear
criteria and properties that aim to close the gaps in existing definitions and attempts to
characterize AV methods, especially regarding the underlying classification models. By this, we hope
to contribute to the further development of this young¹ research field. Based on our definitions, we
investigate the applicability of eleven existing unary and binary AV methods as well as four generic
unary classification algorithms on two self-compiled corpora, which we make available for the AV
community. Furthermore, we elaborate on the implications that have to be faced for each approach
and highlight an important issue concerning the evaluation of such AV methods that are based on
fixed decision criteria.

¹ According to the literature [40], Stamatatos et al. were the first researchers who discussed AV in the context of natural language texts in their paper [41], published in 2000. AV can therefore be seen as a young field in contrast to AA, which dates back to the 19th century [16].
2 Existing Approaches
In order to design an AV method, a wide spectrum of possibilities exists including unary, binary and
n-ary classification approaches. In the following subsection, we first describe a number of generic
unary classification algorithms that can and have been used in the context of AV. Afterwards, we
present existing AV approaches that were partially motivated by these algorithms. All introduced
approaches will be assessed regarding their performance in Sec. 4, where each approach will be
categorized according to our proposed criteria and properties.
2.1 Generic Unary Classification Algorithms
As can be observed in the literature (for example, [10,11,19,32]), a number of existing AV methods
are based on unary classification algorithms. In the following, we therefore provide a brief
overview of some selected approaches, which were also considered in our evaluation.
2.1.1 One-Class Nearest Neighbor
One very simple, but quite effective, unary classification algorithm is OCNN (One-Class Nearest
Neighbor [43]). Given an unknown document D_U, the known documents D_A = {D_1, D_2, ..., D_n} and
a predefined distance function, the idea behind OCNN is that D_U is accepted as a member of the
target class A if its closest neighbor D_i within D_A is closer to D_U than the closest neighbor of D_i
within D_A \ {D_i}. A number of existing AV approaches, including [10,11,20], represent modifications
of OCNN.
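The decision rule can be expressed in a few lines. The following is a minimal sketch, assuming that all documents have already been mapped to numerical feature vectors and that dist is an arbitrary distance function (e.g., the Manhattan distance); the helper name ocnn_accept is ours:

```python
import numpy as np

def ocnn_accept(d_u, d_a_set, dist):
    """Accept D_U if its nearest neighbor D_i in D_A lies closer to D_U
    than D_i's own nearest neighbor within D_A \\ {D_i} lies to D_i."""
    # Distance from the unknown document to every known document.
    dists_to_u = [dist(d_u, d) for d in d_a_set]
    i = int(np.argmin(dists_to_u))  # index of the nearest neighbor D_i
    d_min = dists_to_u[i]
    # Nearest neighbor of D_i among the remaining known documents.
    d_i_nn = min(dist(d_a_set[i], d) for j, d in enumerate(d_a_set) if j != i)
    return d_min < d_i_nn  # True = accept (U = A)
```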
2.1.2 One-Class Support Vector Machine (OSVM)
The idea of OSVM (One-Class Support Vector Machine [23]) is to construct a hypersphere-shaped
decision boundary with minimal volume around samples of A in a specific feature space and, by
this, to distinguish A's documents from all other possible documents of unknown authorship. If D_U
falls inside this hypersphere, it is accepted (U = A); otherwise, it is rejected. In existing AV and AA
works, OSVM served as a baseline (for example, [2,26]) or as the core method [33].
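A minimal sketch using scikit-learn's OneClassSVM follows; the placeholder data, the RBF kernel and the value of nu are our assumptions, since this section describes the algorithm in general rather than a specific configuration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_known = rng.random((8, 100))    # feature vectors of the documents in D_A (placeholder)
x_unknown = rng.random((1, 100))  # feature vector of D_U (placeholder)

# nu bounds the fraction of training samples allowed outside the boundary.
osvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_known)
accepted = osvm.predict(x_unknown)[0] == 1  # +1 = inside the boundary (U = A)
```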
2.1.3 Local Outlier Factor
Another unary classification algorithm, which was originally designed for finding outliers in large
databases, is LOF (Local Outlier Factor [3]). Similarly to OCNN, it also employs nearest-neighbor
distances. LOF uses a sophisticated strategy for comparing the distances between D_U and its k-nearest
neighbors to the distances between these neighbors and their own k-nearest neighbors. The final
score LOF returns is a quotient, derived from all these distances, which becomes larger the more D_U
is an outlier.
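Again a minimal sketch, this time with scikit-learn's LocalOutlierFactor; setting novelty=True allows scoring unseen samples, and the chosen n_neighbors and metric are our assumptions (we use the Manhattan distance here to mirror Sec. 4.2):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_known = rng.random((8, 100))    # feature vectors of the documents in D_A (placeholder)
x_unknown = rng.random((1, 100))  # feature vector of D_U (placeholder)

lof = LocalOutlierFactor(n_neighbors=3, metric="manhattan", novelty=True)
lof.fit(X_known)
accepted = lof.predict(x_unknown)[0] == 1  # +1 = inlier (U = A), -1 = outlier
```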
2.1.4 Isolation Forest
Another unary classifier, which has gained much attention in recent years, is IF (Isolation Forest [29]).
Similarly to its counterpart Random Forest¹, IF builds multiple binary trees that separate the feature
space recursively. Each node splits the data based on a randomly selected feature and
threshold. Assuming that outliers are only "few and different" [29], the idea is that instances placed
deeper in a tree are less likely to be outliers. The acceptance of D_U as a member of the target class
A depends on its corresponding depth, averaged over the trees. Neal et al. [32] proposed an AV
approach that works on top of IF.

¹ Random Forest is a well-known classification algorithm, widely used for n-ary classification tasks.
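The corresponding sketch with scikit-learn's IsolationForest (placeholder data and hyperparameters as before; the library's predict maps the averaged isolation depth to +1 for inliers and -1 for outliers):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_known = rng.random((8, 100))    # feature vectors of the documents in D_A (placeholder)
x_unknown = rng.random((1, 100))  # feature vector of D_U (placeholder)

isof = IsolationForest(n_estimators=100, random_state=0).fit(X_known)
accepted = isof.predict(x_unknown)[0] == 1  # +1 = inlier (U = A), -1 = outlier
```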
2.2 Existing AV Approaches
During 2013–2015, the organizers of the PAN² workshop held three AV competitions [22,39,40],
which attracted attention among the AV community and led to a noticeable increase in proposed
approaches in this field. In the following, we give a brief overview of a number of existing
AV methods which, at least partially, achieved promising results within the PAN competitions.

² PAN is a series of scientific events and shared tasks on digital text forensics.
In 2013, Seidman [38] proposed a successful AV method named GenIM (General Impostors Method),
which is a slight variation of the well-known Impostors approach introduced by Koppel and Winter
[28]. GenIM works in two steps. First, so-called impostor documents are gathered that aim
to represent the counter class of A, namely ¬A. Second, a feature randomization technique is
applied iteratively to measure the similarity between pairs of documents. If, given this measure,
a suspect is picked out from among the impostor set with sufficient salience, then the questioned
document D_U is considered to be written by this author, otherwise not [28]. GenIM was the overall
winning approach of the PAN-2013 AV competition [22] in terms of F_1 and was ranked second in
terms of AUC. In 2014, Khonji and Iraqi [24] proposed a slightly modified version of GenIM, which
they named ASGALF³. The authors adapted a modified min-max(·)⁴ similarity measure as well as a
larger set of features including function words, word shapes, and part-of-speech tags. ASGALF was
the overall winning approach of the PAN-2014 AV competition [40].

³ ASGALF stands for "A Slightly-modified GI-based Author-verifier with Lots of Features" [24].
⁴ Also known as the Ruzicka measure.
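To make the second step concrete, here is a sketch of the core idea behind impostor-based verification (a simplification, not Seidman's exact procedure): repeatedly subsample the feature space and count how often the known author is more similar to the unknown document than every impostor. All names are ours:

```python
import random

def impostors_score(u_vec, a_vec, impostor_vecs, sim, k=100, frac=0.5, seed=0):
    """Fraction of k random feature subsets on which the known author A
    beats all impostors in similarity to the unknown document."""
    rng = random.Random(seed)
    n_feat = len(u_vec)
    wins = 0
    for _ in range(k):
        idx = rng.sample(range(n_feat), int(frac * n_feat))  # random feature subset
        sub = lambda v: [v[i] for i in idx]
        best_impostor = max(sim(sub(u_vec), sub(m)) for m in impostor_vecs)
        if sim(sub(u_vec), sub(a_vec)) > best_impostor:
            wins += 1
    return wins / k  # accept the authorship if this score is sufficiently high
```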
In 2015, Bagnall [1] proposed a method based on a character-level RNN, which was not only the
overall winning approach of the PAN-2015 AV competition [39] but also the first attempt to apply
deep learning in the context of AV. Similarly to GenIM, the approach of Bagnall also requires a
corpus C = {ρ_1, ρ_2, ..., ρ_n} with ρ_i = (D_{U_i}, D_{A_i}), where the ratio of matching (Y) and non-matching
(N) authorships must be known beforehand. The method can be roughly split up into four steps.
The first step is to train language models⁵ (LMs) for all known document sets D_{A_1}, D_{A_2}, ..., D_{A_n},
which results in LM_1, LM_2, ..., LM_n. As a second step, each unknown document D_{U_i} is attributed
against all n trained language models. For this, a score (more precisely, the mean cross entropy)
between D_{U_i} and each LM_j is calculated, which describes how "well" LM_j predicts D_{U_i}. In the
third step, all scores across the problems are normalized in order to overcome possible variances
among the learned language models. In the fourth step, the normalized scores are ranked and
transformed into similarity scores. Based on the Y/N-ratio of C, the threshold to accept or reject
unknown documents is then determined. For example, if C is balanced⁶ (which was the case for the
PAN-2015 AV corpora), the method uses the median of the similarity scores as a threshold. One
advantage of Bagnall's approach is that it considers the entire document as a sequence of characters
and automatically learns patterns to distinguish between authors. By this, defining handcrafted
features is avoided. However, since the method discriminates between multiple (known) authors, it
fits better into the category of AA than AV methods (Dwyer [6] also made this observation).
This and the fact that we were not able to reproduce this complex approach led to our decision to
exclude it from our evaluation.

⁵ Technically, the language models considered in Bagnall's method are character probability distributions.
⁶ In a balanced corpus, the verification problems with matching (Y) and non-matching (N) authorships are evenly distributed.
In 2015, Hürlimann et al. [17] proposed their novel AV approach GLAD¹, which deliberately discards
the idea of modeling an outlier class ¬A by collecting impostor documents. Instead, GLAD considers
a training corpus C = {ρ_1, ρ_2, ..., ρ_n}, where each verification problem ρ = (D_U, D_A) is labeled
either as Y or N. Given C, GLAD constructs for each ρ_i ∈ C (not for each document in ρ_i) a feature
vector consisting of 24 features. Here, the features were obtained individually from D_U, from D_A, or
simultaneously from both (Hürlimann et al. denote the latter as "joint features"). After representing
each ρ_i in the feature space, the authors train a binary SVM to separate the space such that D_U is
accepted or rejected depending on which subspace it falls into.

¹ Groningen Lightweight Authorship Detection.
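The following sketch illustrates this binary-intrinsic setup: each verification problem becomes one feature vector, labeled Y or N, and a binary SVM separates the problem space. The three joint features below are purely illustrative placeholders, not the 24 features of Hürlimann et al.:

```python
import numpy as np
from sklearn.svm import SVC

def joint_features(u_vec, a_vecs):
    """Map one verification problem (D_U, D_A) to a single feature vector."""
    a_mean = np.mean(a_vecs, axis=0)
    diff = np.abs(u_vec - a_mean)
    return [diff.mean(), diff.max(), float(np.dot(u_vec, a_mean))]

# Placeholder training corpus: 40 problems, 3 known documents each, labels Y=1 / N=0.
rng = np.random.default_rng(0)
problems = [(rng.random(50), rng.random((3, 50))) for _ in range(40)]
labels = rng.integers(0, 2, size=40)

X = np.array([joint_features(u, a) for u, a in problems])
clf = SVC(kernel="rbf").fit(X, labels)  # GLAD likewise uses an RBF kernel (see Sec. 4.4)

u_test, a_test = rng.random(50), rng.random((3, 50))
accepted = clf.predict([joint_features(u_test, a_test)])[0] == 1
```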
In 2018, Halvani et al. [10] proposed their AV method OCCAV², which also avoids the idea of
modeling a counter class ¬A and is even independent of a training corpus. OCCAV is inspired by
the unary classification algorithm OCNN. However, instead of constructing feature vectors from
the documents, all texts are represented as compressed byte streams using the Prediction by
Partial Matching (PPMd) algorithm. By this, a verification problem ρ = (D_U, D_A) is transformed
into ρ̂ = (D̂_U, D̂_A), where all involved documents are compressed. Another difference to OCNN is
that instead of using a standard distance function, OCCAV relies on the so-called Compression-Based
Cosine, which measures dissimilarities between compressed documents. Given this measure and the
compressed documents in ρ̂, the method computes the dissimilarities between D̂_U and each D̂ ∈ D̂_A.
Next, D̂_near ∈ D̂_A is selected, which has the smallest dissimilarity d_min to D̂_U. Then, the dissimilarities
between D̂_near and each D̂_j ∈ D̂_A \ {D̂_near}, as well as their average d_avg, are computed. If d_min < d_avg
holds, the unknown document D_U is assumed to be written by A.

² One-Class Compression Authorship Verifier.
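A sketch of this decision procedure follows. We substitute bz2 for the PPMd compressor used by OCCAV (PPMd has no counterpart in Python's standard library), and we assume the common formulation of the Compression-Based Cosine dissimilarity, CBC(x, y) = 1 − (C(x) + C(y) − C(xy)) / √(C(x)·C(y)), where C(·) denotes the compressed length:

```python
import bz2
import math

def c(x: bytes) -> int:
    """Compressed length; bz2 is a stand-in for the PPMd compressor."""
    return len(bz2.compress(x))

def cbc(x: bytes, y: bytes) -> float:
    """Compression-Based Cosine dissimilarity (assumed formulation)."""
    return 1.0 - (c(x) + c(y) - c(x + y)) / math.sqrt(c(x) * c(y))

def occav_accept(d_u: bytes, d_a: list) -> bool:
    # Nearest known document to D_U under CBC.
    dists = [cbc(d_u, d) for d in d_a]
    i = min(range(len(d_a)), key=dists.__getitem__)
    d_min = dists[i]
    # Average dissimilarity between D_near and the remaining known documents.
    rest = [cbc(d_a[i], d) for j, d in enumerate(d_a) if j != i]
    return d_min < sum(rest) / len(rest)
```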
3 Analysis
With the increasing number of proposed AV approaches, the wish arose for a systematic
characterization that enables a better comparison between the methods. In 2004, Koppel and Schler
[26] described, for the first time, the connection between unary classification and AV. In 2008,
Stein et al. [42] provided an overview of important algorithmic building blocks for AV where, among
others, they also formulated three AV problems as decision problems. In 2014, Potha and Stamatatos
[34] introduced specific properties that aim to characterize AV methods. However, a deeper look at
previous attempts to organize the field of AV reveals a number of misunderstandings, in particular
when it comes to drawing the borders between unary and binary AV approaches. In the following, we
analyze these misunderstandings and propose redefinitions as well as new AV properties. We show
that in fact there are not only two (unary/binary) but three possible categories of AV methods
and that their categorization depends solely on the way the acceptance criterion is determined.
3.1 Determinism of Results
A fundamental property of any AV method, especially in the context of evaluation, is whether
it behaves deterministically or non-deterministically. AV approaches such as [17,18,34] always
generate the same output for the same inputs, i.e., these methods are deterministic. In contrast,
non-deterministic AV methods as proposed in [15,26,32,35,38] involve randomness (for instance,
subsampling of the feature space or the number of impostors), which, as a consequence, might distort
the evaluation, since every run on a (training or test) corpus very likely leads to different results.
Therefore, it is indispensable to perform multiple runs and to consider the average and dispersion
of the achieved results for a reasonable and robust comparison between different AV approaches.
3.2 Optimizability
Optimizability is another property of an AV method, which affects its dependency on a training
corpus. We define an AV method as optimizable if, according to its design, it offers adjustable
hyperparameters that can be tuned against a training corpus, given an optimization method (e.g., grid
or random search). Such hyperparameters might be, for instance, the selected distance/similarity
function, the number of neurons/layers in a neural network, the chosen kernel method of an SVM,
the selected feature categories, or adjustable weights and thresholds. The majority of existing AV
methods in the literature (including [5,7,15,28,34]) belong to this category. On the other hand, if a
published AV approach involves hyperparameters that have been entirely fixed such that there is no
further possibility to improve its performance from outside (without deviating from the definitions
in the publication of the method), the method is considered to be non-optimizable. Obviously,
non-optimizable AV approaches are easier to reproduce, as we can discard the dependency on a
training corpus. Among the AV methods proposed in the respective literature, we identified only
three approaches [10,20,44] that belong to this category.
3.3 Model Category (Unary versus Binary)
Even though AV clearly represents a unary classification problem [17,26,34,36,42], one can
observe in the literature that it is sometimes interpreted as unary [19,20,32,34] and sometimes
as binary [25,28,30,44]. We denote the way an AV approach is modeled by the phrase model
category. However, before explaining this in more detail, we first have to recall what, according to
the literature, unary classification exactly represents. For this, we list the following verbatim quotes,
which characterize unary classification, as can be seen, almost identically (emphasis ours):
•“In one-class classification it is assumed that only information of one of the classes, the target
class, is available. This means that just example objects of the target class can be used
and that no information about the other class of outlier objects is present.” [43]
• "One-class classification (OCC) [...] consists in making a description of a target class
of objects and in detecting whether a new object resembles this class or not. [...]
The OCC model is developed using target class samples only." [37]
•“In one-class classification framework, an object is classified as belonging or not belonging to
a target class, while only sample examples of objects from the target class are available
during the training phase.” [19]
Note that in the context of authorship verification, the target class refers to the known author A,
such that for a document D_U of an unknown author U the task is to verify whether U = A holds.
One of the most important requirements of any existing AV method is a decision criterion, which
aims to accept or reject a questioned authorship. A decision criterion can be expressed through a
simple threshold θ or a more complex decision model θ_M. As a consequence of the above statements,
the determination of θ or θ_M has to be performed solely on D_A; otherwise, the AV method cannot be
considered to be unary. However, our literature search regarding existing AV approaches
revealed that there are uncertainties about how to precisely draw the borders between unary and binary
AV methods (for instance, [2,34,36]). Nonetheless, few attempts have been made to distinguish
both categories from another perspective. Potha and Stamatatos [36], for example, categorize AV
methods based on their characteristics as either intrinsic or extrinsic (emphasis ours):
1. "Verification models differ with respect to their view of the task. Intrinsic verification models
view it as a one-class classification task [...] Such methods [...] do not require any
external resources." [36]

2. "On the other hand, extrinsic verification models attempt to transform the verification task
to a pair classification task by considering external documents to be used as samples of
the negative class." [36]
While we agree with (2), the former statement (1) is unsatisfactory, since intrinsic verification models
are not necessarily unary. The AV approach GLAD [17], for instance, directly contradicts the
above statement. Here, the authors

"decided to cast the problem as a binary classification task where class values are Y
[A = U] and N [A ≠ U]. [...] We do not introduce any negative examples by means
of external documents, thus adhering to an intrinsic approach." [17]
A similar contradiction to the statement of Potha and Stamatatos can be observed in the paper
of Jankowska et al. [18], who introduced the so-called CNG approach, which resembles the unary
k-centers algorithm [43]. CNG is intrinsic in the sense that it considers only D_A. On the other
hand, the decision criterion, which in this specific case is a threshold θ, is determined on a set of
verification problems labeled either as Y or N ("external resources"). Therefore, CNG is in conflict
with the unary definition mentioned above. In a subsequent paper, however, the authors refined
their CNG approach and introduced an ensemble based on multiple k-centers [19]. This time, θ was
determined solely on the basis of D_A, such that the modified approach can be considered as a true
unary AV method, according to the aforementioned statements.
In 2004, Koppel and Schler [26] presented the Unmasking approach in their paper "Authorship
Verification as a One-Class Classification Problem", which, according to the authors, represents a
unary AV method. However, a closer look at the learning process of Unmasking reveals
that it is based on a binary SVM classifier, which consumes feature vectors labeled as Y and N. Here,
the task of the SVM is to classify the generated curves according to the two classes same-author
and different-author. Unmasking, therefore, cannot be considered to be unary, as the decision is not
based solely on the documents within D_A.
It should be highlighted again that these approaches are binary and intrinsic, since their decision
criteria are determined on a training corpus labeled with Y and N in a binary manner (binary decisions
regarding problems with known Y and N labels), while for the verification itself they consider, in
an intrinsic manner, only D_A. A crucial aspect, which might have led to misperceptions regarding
the model category of these approaches in the past, is the fact that two different class domains are
involved. On the one hand, there is the class domain of authors, where the task is to distinguish
A and ¬A. On the other hand, there is the elevated or lifted domain of verification problems,
each of which falls either into class Y or class N. The training phase of binary-intrinsic approaches is used
for learning to distinguish these two classes, and the verification task can be understood as putting
the verification problem as a whole into class Y or class N, whereby the class domain of authors fades
from the spotlight.
In contrast to binary-intrinsic approaches, there also exist AV approaches that are binary and
extrinsic (for example, [14,24,28,35,44]), as these methods use external documents during a
potentially existing training phase and, more importantly, during testing. In these approaches,
the decision between A and ¬A is put into focus, where the external documents aim to construct
the counter class ¬A.
Based on the observations above, we conclude that the key requirement (see the illustration in
Figure 1) for judging the model category of an AV method is solely how its decision
criterion θ or θ_M is determined:

1. An AV method is unary if and only if its decision criterion θ or θ_M is determined solely on
the basis of the target class A. As a consequence, an AV method cannot be considered to
be unary if documents not belonging to A are used to define θ or θ_M.

2. An AV method is binary-intrinsic if its decision criterion θ or θ_M is determined on a training
corpus comprising verification problems labeled either as Y or N (in other words, documents of
several authors). However, once the training is completed, a binary-intrinsic method has no
access to external documents anymore, such that the decision regarding the authorship of D_U
is made on the basis of the reference data of A as well as θ or θ_M.

3. An AV method is binary-extrinsic if its decision criterion θ or θ_M is determined on the basis
of external documents that represent the outlier class ¬A (the counterpart of A). Here, it is
not relevant whether a training corpus was used to optimize θ or θ_M. As long as the method
has access to documents of ¬A, it remains binary-extrinsic.
It should be highlighted that unary AV methods (for instance, [11,32,34]) are not excluded from being
optimizable. As long as θ or θ_M is not part of the optimization, the model category of the method
remains unary. The rationale behind this is that hyperparameters might influence the resulting
performance of a unary AV method, while the decision criterion itself remains unchanged.

[Figure 1 (diagram). Unary: learn θ or θ_M using only reference data of the target class A. Binary-intrinsic: learn θ or θ_M on a training corpus comprising verification problems labeled as Y and N. Binary-extrinsic: learn θ or θ_M given external data that explicitly models the outlier class ¬A. The diagram distinguishes a training and a testing phase.]

Figure 1: The three possible model categories of authorship verification approaches. Here, U refers
to the instance (for example, a document or a feature vector) of the unknown author. A represents
instances of the target class (known author) and ¬A the outlier class (any other author). Y and N
denote the regions of the feature space where, according to a training corpus, the authorship holds
or not. In the binary-intrinsic case, ρ denotes the verification problem (the subject of classification).
3.4 Implications
Each model category has its own implications regarding prerequisites, evaluability, and applicability.
3.4.1 Unary AV Methods
One advantage of unary AV methods is that they do not require a specific document collection
strategy to construct the counter class ¬A, which reduces their complexity. Moreover, a training
corpus is not required, at least if the method is non-optimizable (for example, OCCAV [10]). On the
downside, the choice of the underlying machine learning model of a unary AV method is restricted
to unary classification algorithms or unsupervised learning techniques, given a suitable decision
criterion. However, a far more important implication of unary AV approaches concerns their
performance assessment. Since unary classification (not necessarily AV) approaches depend on
a fixed decision criterion θ or θ_M, performance measures such as the area under the ROC¹ curve
(AUC) are meaningless. Recall that ROC analysis is used for evaluating classifiers whose
decision threshold is not finally fixed. ROC analysis requires that the classifier generates scores
that are comparable across classification problem instances. The ROC curve and the area under
this curve are then computed by considering all possible discrimination thresholds for these scores.
While unary AV approaches might produce such scores, introducing a variable θ would change
the semantics of these approaches. Since unary AV approaches have a fixed decision criterion,
they provide only a single point in the ROC space. To assess the performance of a unary AV
method, it is therefore mandatory to consider the confusion matrix that leads to this point in the
ROC space.

¹ Receiver Operating Characteristic.
3.4.2 Binary AV Methods
If we design a binary (intrinsic or extrinsic) AV method, we can choose among a variety of binary²
and n-ary³ classification models. However, if the choice falls on a binary-extrinsic method, a
strategy has to be devised to collect representative documents for the outlier class ¬A.
Methods such as [28,35,44] rely on search engines for retrieving appropriate documents, which
might refuse their service if a specified quota is exhausted. Additionally, the retrieved documents
make these methods inherently non-deterministic. Moreover, as can be observed in [22,40] (as
well as in our evaluation in Sec. 4), such methods cause relatively high runtimes. Using search
engines also requires an active Internet connection, which might not be available or even allowed in
specific scenarios. But even if we can access the Internet to retrieve documents, there is no guarantee
that the true author is not among them. With these points in mind, the applicability of binary-extrinsic
methods in real-world cases, i.e., forensic settings, remains questionable. On the other
hand, if we consider designing a binary-intrinsic AV method, it should not be overlooked that the
involved classifier learns nothing about individual authors, but only similarities or differences that
hold in general for Y and N verification problems [28].

² For example: support vector machines, logistic regression or the perceptron.
³ For example: naive Bayes, random forests or a variety of neural networks.
4 Evaluation
Based on our definitions in Sec. 3, we investigate the applicability of unary, binary-intrinsic and
binary-extrinsic AV methods. First, we describe which existing AV methods as well as generic
unary classification approaches were considered for our evaluation. Afterwards, we explain which
corpora were compiled for the task.
4.1 Existing AV Approaches
To assess the performance of AV methods based on our criteria, we reimplemented eleven existing AV
approaches that have shown their potential in previous studies as well as in the three PAN AV
competitions from 2013–2015. More precisely, we reimplemented two binary-extrinsic (GenIM [38]
and NNCD [44]), five binary-intrinsic (COAV [12], AVeer [13], GLAD [17], ProfileAV [34] and Unmasking
[26]) and four unary AV approaches (DistAV [20], CNG [19], MOCC [11] and OCCAV [10]).
Note that in the original version of both binary-extrinsic approaches, GenIM and NNCD, the authors
proposed to use search engine queries to generate impostor documents that are needed to model
the counter class ¬A. However, due to quota limits, we decided to use an alternative strategy in
our reimplementations (sketched below). Let C = {ρ_1, ρ_2, ..., ρ_n} denote a corpus. For a given
verification problem ρ_i = (D_{U_i}, D_{A_i}) ∈ C, we choose all D_{U_j} in C with i ≠ j as the impostor
set U. It should be highlighted, however, that in GenIM the number of impostors is a hyperparameter,
such that the resulting impostor set is a subset of U, whereas in NNCD all U_j ∈ U are considered.
Although our strategy is not as flexible as using a search engine, it has the advantage that the true
author of an unknown document can be assumed¹ not to be among the impostors, since in our
corpora we know the user names of those who wrote all documents.

¹ Note that we cannot be sure whether two or more user names in fact refer to the same author.
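The strategy amounts to a one-liner; a minimal sketch (function name ours):

```python
def impostor_set(corpus, i):
    """All unknown documents of the other problems in C serve as impostors
    for problem i; corpus is a list of (D_U, D_A) pairs."""
    return [d_u for j, (d_u, _) in enumerate(corpus) if j != i]
```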
4.2 Generic Unary Classification Approaches
In addition to the reimplemented unary AV methods, we also considered the four generic unary
classification algorithms OCNN, OSVM, LOF and IF (introduced in Sec. 2) and adapted them to the
AV task. To ensure a fair and equal setting, all classifiers were provided with the same set of features
which, according to the literature in AV and AA, have been proven to perform very well. The set of
features consists of character n-grams (with n ∈ {2,3,4}), punctuation marks and function words.
However, since the four algorithms require numerical feature vectors rather than raw strings as
input, we represent all extracted features according to their relative frequencies in the documents.
Instead of selecting the top most frequent features, which is the case in existing AV approaches
such as CNG [19] or ProfileAV [34], we used all features occurring in the texts. Regarding the two
distance-based methods OCNN and LOF, we decided to use the Manhattan distance, which has been
applied successfully in previous authorship analysis studies (for example, [4,8,13,21,27]).
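A minimal sketch of this feature representation and distance follows; the tiny function word and punctuation lists are placeholders, since the paper does not enumerate the exact lists used:

```python
from collections import Counter

FUNCTION_WORDS = {"the", "of", "and", "to", "in", "a", "is", "that"}  # placeholder list
PUNCTUATION = set(".,;:!?\"'()-")                                     # placeholder list

def features(text):
    """Relative frequencies of character 2-4-grams, punctuation and function words."""
    counts = Counter()
    for n in (2, 3, 4):
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    counts.update(ch for ch in text if ch in PUNCTUATION)
    counts.update(w for w in text.lower().split() if w in FUNCTION_WORDS)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def manhattan(f1, f2):
    """Manhattan distance between two sparse feature dictionaries."""
    return sum(abs(f1.get(k, 0.0) - f2.get(k, 0.0)) for k in f1.keys() | f2.keys())
```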
4.3 Corpora
As a data basis for our evaluation, we compiled two corpora¹. The first corpus is a collection
of 4,000 documents (aggregated postings, crawled from Reddit) written by 1,000 authors. The second
corpus is a collection of 7,000 documents (aggregated product reviews, extracted from the Amazon
product data corpus [31]) written by 1,400 authors. After aggregating the documents, we split both
datasets into training corpora (C_Reddit^(tr), C_Amazon^(tr)) and evaluation corpora (C_Reddit, C_Amazon)
and resampled the documents to construct balanced corpora with a larger number of verification problems.
As a result of the resampling procedure, C_Reddit^(tr), C_Reddit, C_Amazon^(tr) and C_Amazon ended up
with 600, 1,400, 800 and 2,000 verification problems, respectively.

¹ All corpora and additional material will be available after publication of this paper.
4.4 Results
After tuning the hyperparameters of all optimizable approaches on the training corpora, following the
training procedure described in the respective literature, we applied the learned models together with
the non-optimizable methods on both evaluation corpora C_Reddit and C_Amazon. The results regarding
all approaches are listed in Table 1. Since we do not limit ourselves to one specific performance
measure, we report for each method the outcomes TP, FN, FP, and TN of the corresponding confusion
matrix. However, to enable a better comparison, we also list the following "single number" evaluation
metrics: Accuracy, F_1 and Cohen's κ, where the latter is a relatively new performance measure in
the context of AV, proposed by Halvani et al. in [9].
A variety of observations can be inferred from Table 1. In particular, the majority of binary-intrinsic
AV methods tend to outperform both binary-extrinsic and unary approaches. GLAD,
which is the top-performing approach on both corpora, demonstrates that binary-intrinsic
approaches are very effective, even though the AV task itself represents a unary classification
problem. The two other binary-intrinsic methods AVeer and COAV also achieve high results, but differ
from GLAD in several important aspects. AVeer and COAV both rely on simple similarity functions
that accept or reject the authorship of unknown documents according to a scalar threshold.
GLAD, on the other hand, is based on an SVM, which is widely known to be a strong classifier. An
explanation why GLAD is superior might be that the discrimination ability of a single threshold is
not fine-grained enough compared to the hyperplane constructed by the SVM, which separates a
24-dimensional feature space in a non-linear² way. Another (or an additional) explanation could be
that, in contrast to AVeer and COAV, GLAD makes use of several joint features (see Sec. 2.2), which
might better capture differences or similarities between the documents.

² GLAD utilizes the RBF kernel.
Furthermore, we can see from Table 1 that binary-extrinsic approaches also perform very well, in
particular GenIM. This is consistent with the findings in previous studies such as [19,25,34]. The
high results regarding GenIM also indicate that using static corpora to generate impostor
documents is a suitable alternative to search engine queries.

When comparing the results of all methods on C_Reddit and C_Amazon to each other, we can also
see that the majority of the examined AV approaches perform more or less stably (GLAD, COAV,
DistAV and IF even have exactly the same ranks on both corpora). However, one exception is the
binary-extrinsic method NNCD, which performs quite well on C_Reddit but is among the worst three
approaches on the C_Amazon corpus. Unfortunately, there is no clear explanation whether this is caused by
the larger number of available impostors in C_Amazon (here, each D_{U_i} is confronted with 400 more
impostors than in C_Reddit) or by another reason. Therefore, we leave this open question for
future work.
Of the four examined unary AV approaches (DistAV, CNG, MOCC and OCCAV; excluding the
generic unary classification algorithms), OCCAV yields the best results and performs quite stably
on both corpora. Despite the fact that OCCAV, which builds on top of OCNN, belongs to the
category of non-optimizable AV approaches, it seems to generalize very well. This is particularly
important in real-world settings such as forensic cases, where training corpora with labeled data of
the suspects are not always available.
Table 1: Evaluation results for C_Reddit and C_Amazon, sorted by Accuracy in descending order.
The model category of each approach is marked as binary-intrinsic (BI), binary-extrinsic (BE) or
unary (U). Non-optimizable and non-deterministic AV methods are marked by † and ?, respectively.

C_Reddit:

Method              Cat.  Accuracy  κ       F1     TP   FN   FP   TN   Runtime (hh:mm:ss)
GLAD [17]           BI    0.826     0.653   0.827  579  121  122  578  18:06
GenIM [38] (?)      BE    0.805     0.610   0.768  451  249   24  676  5:21:54
AVeer [13]          BI    0.776     0.553   0.769  521  179  134  566  0:59
COAV [12]           BI    0.770     0.540   0.736  449  251   71  629  0:39
OCCAV [10] (†)      U     0.767     0.534   0.766  533  167  159  541  12:27
NNCD [44] (†)       BE    0.764     0.529   0.695  376  324    6  694  14:36:25
ProfileAV [34]      BI    0.728     0.457   0.732  519  181  199  501  1:55
CNG [19]            U     0.719     0.437   0.743  569  131  263  437  17:40
LOF [3]             U     0.701     0.403   0.731  568  132  286  414  46:11
MOCC [11]           U     0.683     0.366   0.624  368  332  112  588  0:57
Unmasking [26] (?)  BI    0.682     0.364   0.691  497  203  242  458  7:10
OCNN [23]           U     0.671     0.343   0.601  347  353  107  593  52:41
OSVM [23]           U     0.651     0.301   0.560  311  389  100  600  1:04:21
DistAV [20] (†)     U     0.639     0.277   0.715  634   66  440  260  0:35
IF [29] (?)         U     0.501     0.001   0.612  551  149  550  150  45:39

C_Amazon:

Method              Cat.  Accuracy  κ       F1     TP   FN   FP   TN   Runtime (hh:mm:ss)
GLAD [17]           BI    0.858     0.716   0.859  867  133  151  849  16:00
AVeer [13]          BI    0.816     0.631   0.811  790  210  159  841  1:39
GenIM [38] (?)      BE    0.784     0.567   0.761  690  310  123  877  52:32
COAV [12]           BI    0.778     0.556   0.763  716  284  160  840  2:18
LOF [3]             U     0.769     0.537   0.779  817  183  280  720  40:34
OCCAV [10] (†)      U     0.757     0.514   0.769  811  189  297  703  12:07
OCNN [23]           U     0.734     0.467   0.674  552  448   85  915  41:52
Unmasking [26] (?)  BI    0.731     0.462   0.728  719  281  257  743  8:42
ProfileAV [34]      BI    0.722     0.443   0.719  714  286  271  729  1:48
CNG [19]            U     0.713     0.426   0.750  863  137  437  563  23:25
MOCC [11]           U     0.712     0.424   0.660  559  441  135  865  1:38
OSVM [23]           U     0.677     0.353   0.560  411  589   58  942  1:39:11
NNCD [44] (†)       BE    0.604     0.208   0.349  212  788    4  996  15:52:22
DistAV [20] (†)     U     0.604     0.207   0.708  960   40  753  247  0:27
IF [29] (?)         U     0.495     -0.011  0.608  785  215  796  204  36:15
Of all generic unary classification algorithms (OCNN, OSVM, LOF and IF), LOF achieves the
highest results. One interesting point here is that LOF outperforms the closely related OCNN method,
although both rely not only on the same features but also on the same distance function. We wish
to highlight at this point that, according to the literature, this is the first time LOF has been applied
to AV, and we recommend investigating its potential in future work. Another observation that
can be made in Table 1 is that IF performs similarly to a random guess. This is noteworthy, as
Neal et al. recently proposed an AV method in [32] that is very similar to our IF implementation¹,
where the authors report a recognition accuracy exceeding 98% on the so-called CASIS² corpus. However,
since this corpus is not available online³, we cannot investigate this issue in more detail.

When comparing the unary AV approaches against the generic unary classification algorithms, there
is no clear separation between these two groups regarding their performance, since their ranks in Table 1
are interleaved. There might be a slight advantage for the dedicated AV methods compared to the
generic algorithms, since OCCAV is on average the best method within these two groups, and IF is
clearly separated at the bottom.

¹ Our IF implementation is mostly based on the scikit-learn library.
² Center for Advanced Studies in Identity Science.
³ An attempt to request the corpus directly from the authors was also not successful.
Regarding the performance measures listed in Table 1, several interesting observations can be made.
For example, when looking at the performance results in the F_1 column, we can see that the ranking
of the examined AV methods differs from that of Accuracy (and κ). The reason for this can be
explained easily when we consider the underlying formulas of both measures:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad F_1 = \frac{2\,TP}{2\,TP + FN + FP} \]

The given formula for F_1 is obtained from the given formula for Accuracy by replacing TP + TN
with 2 TP. Resulting Accuracy values will be greater than F_1 values if TN > TP and smaller if
TN < TP holds. This also answers the question why two AV methods that perform almost equally
in terms of Accuracy (for example, NNCD vs. DistAV on C_Amazon) show a significant difference
regarding their F_1 values.
The difference between Accuracy and F_1 is more than a matter of interchangeable design choices.
The design of the F_1 measure leads to the problem that resulting F_1 values can be quite misleading.
For instance, if an AV method always predicts Y (e.g., due to a weak threshold), F_1 will result in 2/3
on a balanced corpus (with TP = FP = n/2 and FN = TN = 0, we get F_1 = n / (n + n/2) = 2/3). In
contrast, Accuracy will result in 1/2, which can be interpreted as a coin toss. In the case of an AV
method that always predicts N, F_1 will be 0, while Accuracy will again result in 1/2.
Putting the discussion on a more abstract level, the problem is that the measure F_1 ignores the true
negatives (TN), in contrast to Accuracy (and κ). Ignoring TN is generally not reasonable in the
context of AV, as it must be measurable whether a method is able to correctly predict those cases where
the authorship does not hold. Based on these findings, we discourage the use of F_1 for assessing the
performance of AV methods.
Another observation can be made when comparing the columns for Accuracy and κ in Table 1 to
each other. Both measures preserve the same ranking, and a closer look reveals that κ has a linear
relationship to Accuracy on balanced corpora (such as C_Reddit and C_Amazon). The explanation
for this can be shown based on the definition of κ:

\[ n = TP + FN + FP + TN \qquad p_0 = n^{-1}(TP + TN) \]
\[ p_c = n^{-2}\bigl( (TP + FN)(TP + FP) + (FP + TN)(FN + TN) \bigr) \qquad \kappa = \frac{p_0 - p_c}{1 - p_c} \]
For balanced corpora, p_c results in 0.5, such that κ = 2 × Accuracy − 1 holds (a short derivation
is given below). However, in cases where corpora are imbalanced, it makes more sense to use κ
instead of Accuracy, as the latter favors the majority class. A visual inspection of the behavior of
both measures regarding imbalanced corpora is given in [9].
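The derivation is a direct substitution into the definition above: on a balanced corpus, both the Y problems (TP + FN) and the N problems (FP + TN) amount to n/2, and p_0 equals Accuracy, hence

```latex
p_c = n^{-2}\Bigl(\tfrac{n}{2}(TP + FP) + \tfrac{n}{2}(FN + TN)\Bigr)
    = n^{-2} \cdot \tfrac{n}{2} \cdot n = \tfrac{1}{2},
\qquad
\kappa = \frac{p_0 - \tfrac{1}{2}}{1 - \tfrac{1}{2}} = 2\,p_0 - 1
       = 2 \times \text{Accuracy} - 1 .
```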
A closer look at the last column in Table 1 also reveals a number of issues that may require some
consideration. Compared to the binary-intrinsic AV methods, the majority of the unary approaches
require noticeably more runtime. One exception here is DistAV, which needs on average ≈ 31 seconds
to process a whole test corpus. Binary-extrinsic approaches require even more runtime, compared
to almost all unary approaches. A good trade-off between performance and runtime (which might
be an important issue in the context of an IR system) can be observed for AVeer, followed by COAV.
5 Conclusion and Future Work
Based on a comprehensive literature review of numerous AV studies, we identified a number of
misunderstandings regarding the different model categories of existing AV approaches, which have
serious implications regarding their prerequisites, evaluability, and applicability. We defined clear
criteria that aim to draw precise borders between the different categories of AV approaches and
explained which challenges occur in terms of evaluation when an AV method is based on a fixed
decision criterion. Given our definitions, we reimplemented a number of existing unary, binary-intrinsic
and binary-extrinsic AV methods and assessed their performance on two large self-compiled
corpora, which we made available for the AV community. One of our observations was that specific
unary AV methods can not only outperform their binary-intrinsic and binary-extrinsic counterparts
but also perform stably across the different corpora. We have shown why the F_1 performance measure
can be misleading in the context of authorship verification and also highlighted the connection
between Accuracy and κ, which holds when the considered corpora are balanced.
Furthermore, we tested the applicability of four generic unary classification algorithms for the AV
task, where all four were given exactly the same feature vectors (and, in two cases, the same distance
function). It turned out that distance-based unary classifiers are able to outperform existing AV
methods and achieve (at least partially) promising results. For the first time, we applied LOF in
the context of AV, which not only outperformed OSVM (a commonly used baseline in existing AV
studies) but also requires less runtime. Therefore, we recommend considering LOF as a starting point
for future AV approaches.

In the near future, we will expand our evaluation to more corpora and organize the field of authorship
verification in more depth through the definition of additional properties such as reliability,
robustness and interpretability. Here, especially the latter is gaining more and more importance.
Moreover, we plan to compile additional corpora in order to investigate whether the findings
in this paper also hold for other corpora that differ in terms of topic, genre and language.
References
[1] Douglas Bagnall. Author Identification Using Multi-headed Recurrent Neural Networks. In
Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France,
September 8-11, 2015., 2015.
[2] Mohamed Amine Boukhaled and Jean-Gabriel Ganascia. Probabilistic Anomaly Detection
Method for Authorship Verification, pages 211–219. Springer International Publishing, Cham,
2014.
[3] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying
Density-Based Local Outliers. In Weidong Chen, Jeffrey F. Naughton, and Philip A. Bernstein,
editors, Proceedings of the 2000 ACM SIGMOD International Conference on Management of
Data, May 16-18, 2000, Dallas, Texas, USA., pages 93–104. ACM, 2000.
[4] Henry C. Williams, Joi N. Carter, Willie L. Campbell, Kaushik Roy, and Gerry V. Dozier.
Genetic & Evolutionary Feature Selection for Author Identification of HTML Associated with
Malware. International Journal of Machine Learning and Computing, 4:250–255, 06 2014.
[5] Daniel Castro Castro, Yaritza Adame Arcia, María Pelaez Brioso, and Rafael Muñoz Guillena.
Authorship Verification, Average Similarity Analysis. In Proceedings of the International
Conference Recent Advances in Natural Language Processing, pages 84–90. INCOMA Ltd. Shoumen,
BULGARIA, 2015.
[6] Gareth Dwyer. Novel Approaches to Authorship Attribution. Master’s thesis, University of
Groningen, 2017.
[7] Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Villaseñor Pineda. Particle Swarm
Model Selection for Authorship Verification. In Eduardo Bayro-Corrochano and Jan-Olof Eklundh,
editors, Progress in Pattern Recognition, Image Analysis, Computer Vision, and
Applications, 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Guadalajara,
Jalisco, Mexico, November 15-18, 2009. Proceedings, volume 5856 of Lecture Notes in Computer
Science, pages 563–570. Springer, 2009.
[8] Stefan Evert, Thomas Proisl, Fotis Jannidis, Isabella Reger, Steffen Pielström, Christof Schöch,
and Thorsten Vitt. Understanding and Explaining Delta Measures for Authorship Attribution.
Digital Scholarship in the Humanities, 32(suppl 2):ii4–ii16, 2017.
[9] Oren Halvani and Lukas Graner. Rethinking the Evaluation Methodology of Authorship
Verification Methods. In Patrice Bellot, Chiraz Trabelsi, Josiane Mothe, Fionn Murtagh, Jian Yun
Nie, Laure Soulier, Eric SanJuan, Linda Cappellato, and Nicola Ferro, editors, Experimental
IR Meets Multilinguality, Multimodality, and Interaction, pages 40–51. Springer International
Publishing, 2018.
[10] Oren Halvani, Lukas Graner, and Inna Vogel. Authorship Verification in the Absence of Explicit
Features and Thresholds. In Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, and Allan
Hanbury, editors, Advances in Information Retrieval, pages 454–465. Springer International
Publishing, 2018.
[11] Oren Halvani and Martin Steinebach. An Efficient Intrinsic Authorship Verification Scheme
Based on Ensemble Learning. In Ninth International Conference on Availability, Reliability and
Security, ARES 2014, Fribourg, Switzerland, September 8-12, 2014, pages 571–578, Washington,
DC, USA, 2014.
[12] Oren Halvani, Christian Winter, and Lukas Graner. On the Usefulness of Compression Models
for Authorship Verification. In Proceedings of the 12th International Conference on Availability,
Reliability and Security, ARES ’17, pages 54:1–54:10, New York, NY, USA, 2017. ACM.
[13] Oren Halvani, Christian Winter, and Anika Pflug. Authorship Verification for Different
Languages, Genres and Topics. Digit. Investig., 16(S):S33–S43, March 2016.
[14] Josué Gerardo Gutiérrez Hernández, José Casillas, Paola Ledesma, Gibran Fuentes Pineda,
and Iván Vladimir Meza Ruíz. Homotopy Based Classification for Author Verification Task:
Notebook for PAN at CLEF 2015. In Working Notes of CLEF 2015 - Conference and Labs of
the Evaluation forum, Toulouse, France, September 8-11, 2015., 2015.
[15] ´
Angel Hern´andez-Casta˜neda and Hiram Calvo. Author Verification Using a Semantic Space
Model. Computaci´on y Sistemas, 21(2), 2017.
[16] David I. Holmes. The Evolution of Stylometry in Humanities Scholarship. Literary and
Linguistic Computing, 13(3):111–117, 1998.
[17] Manuela Hürlimann, Benno Weck, Esther von den Berg, Simon Šuster, and Malvina Nissim.
GLAD: Groningen Lightweight Authorship Detection. In Working Notes of CLEF 2015 –
Conference and Labs of the Evaluation forum, Toulouse, France, September 8–11, 2015, 2015.
[18] Magdalena Jankowska, Vlado Keselj, and Evangelos E. Milios. Proximity Based One-class
Classification with Common N-Gram Dissimilarity for Authorship Verification Task Notebook
for PAN at CLEF 2013. In Working Notes for CLEF 2013 Conference , Valencia, Spain,
September 23-26, 2013., 2013.
[19] Magdalena Jankowska, Evangelos E. Milios, and Vlado Keselj. Author Verification Using
Common N-Gram Profiles of Text Documents. In Jan Hajic and Junichi Tsujii, editors, COLING
2014, 25th International Conference on Computational Linguistics, Proceedings of the
Conference: Technical Papers, August 23-29, 2014, Dublin, Ireland, pages 387–397. ACL, 2014.
[20] John Noecker Jr and Michael Ryan. Distractorless Authorship Verification. In Nicoletta
Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente
Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings
of the Eighth International Conference on Language Resources and Evaluation (LREC'12),
Istanbul, Turkey, May 2012. European Language Resources Association (ELRA).
[21] Patrick Juola, John Noecker, Ariel Stolerman, Michael Ryan, Patrick Brennan, and Rachel
Greenstadt. Towards Active Linguistic Authentication. In Gilbert Peterson and Sujeet Shenoi,
editors, Advances in Digital Forensics IX, pages 385–398, Berlin, Heidelberg, 2013. Springer
Berlin Heidelberg.
[22] Patrick Juola and Efstathios Stamatatos. Overview of the Author Identification Task at PAN
2013. In Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013,
2013.
[23] Shehroz S. Khan and Michael G. Madden. One-Class Classification: Taxonomy of Study and
Review of Techniques. The Knowledge Engineering Review, 29(3):345–374, 2014.
[24] Mahmoud Khonji and Youssef Iraqi. A Slightly-Modified GI-Based Author-Verifier with Lots of
Features (ASGALF). In Working Notes for CLEF 2014 Conference, Sheffield, UK, September
15-18, 2014., pages 977–983, 2014.
[25] Mirco Kocher and Jacques Savoy. A Simple and Efficient Algorithm for Authorship Verification.
Journal of the Association for Information Science and Technology, 68(1):259–269, 2017.
[26] Moshe Koppel and Jonathan Schler. Authorship Verification as a One-Class Classification
Problem. In Carla E. Brodley, editor, Machine Learning, Proceedings of the Twenty-first
International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004, volume 69 of ACM
International Conference Proceeding Series. ACM, 2004.
[27] Moshe Koppel and Shachar Seidman. Automatically Identifying Pseudepigraphic Texts. In
EMNLP, pages 1449–1454. ACL, 2013.
[28] Moshe Koppel and Yaron Winter. Determining if Two Documents are Written by the Same
Author. JASIST, 65(1):178–187, 2014.
[29] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. In Proceedings of the 2008
Eighth IEEE International Conference on Data Mining, ICDM ’08, pages 413–422, Washington,
DC, USA, 2008. IEEE Computer Society.
[30] Kim Luyckx and Walter Daelemans. Authorship Attribution and Verification with Many
Authors and Limited Data. In Proceedings of the 22nd International Conference on Computational
Linguistics - Volume 1, COLING ’08, pages 513–520, Stroudsburg, PA, USA, 2008. Association
for Computational Linguistics.
[31] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-Based
Recommendations on Styles and Substitutes. In Proceedings of the 38th International ACM
SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages
43–52, New York, NY, USA, 2015. ACM.
[32] Tempestt J. Neal, Kalaivani Sundararajan, and Damon L. Woodard. Exploiting Linguistic
Style as a Cognitive Biometric for Continuous Verification. In 2018 International Conference
on Biometrics, ICB 2018, Gold Coast, Australia, February 20-23, 2018, pages 270–276. IEEE,
2018.
[33] Novino Nirmal.A, Kyung-Ah Sohn, and T. Chung. A Graph Model Based Author Attribution
Technique for Single-Class e-Mail Classification. In 2015 IEEE/ACIS 14th International
Conference on Computer and Information Science (ICIS), pages 191–196, June 2015.
[34] Nektaria Potha and Efstathios Stamatatos. A Profile-Based Method for Authorship Verification.
In Artificial Intelligence: Methods and Applications: 8th Hellenic Conference on AI, SETN
2014, Ioannina, Greece, May 15–17, 2014. Proceedings, pages 313–326. Springer International
Publishing, 2014.
[35] Nektaria Potha and Efstathios Stamatatos. An Improved Impostors Method for Authorship
Verification. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th
International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September
11-14, 2017, Proceedings, pages 138–144, 2017.
[36] Nektaria Potha and Efstathios Stamatatos. Intrinsic Author Verification Using Topic Modeling.
In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, SETN 2018, Patras,
Greece, July 09-12, 2018, pages 20:1–20:7. ACM, 2018.
[37] Oxana Ye. Rodionova, Paolo Oliveri, and Alexey L. Pomerantsev. Rigorous and Compliant
Approaches to One-Class Classification. Chemometrics and Intelligent Laboratory Systems,
159:89–96, 2016.
[38] Shachar Seidman. Authorship Verification Using the Impostors Method Notebook for PAN at
CLEF 2013. In Working Notes for CLEF 2013 Conference , Valencia, Spain, September 23-26,
2013., 2013.
[39] Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López,
Martin Potthast, and Benno Stein. Overview of the Author Identification Task at PAN 2015.
In Working Notes of CLEF 2015 – Conference and Labs of the Evaluation forum, Toulouse,
France, September 8–11, 2015, 2015.
[40] Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Benno Stein, Martin Potthast,
Patrick Juola, Miguel A. Sánchez-Pérez, and Alberto Barrón-Cedeño. Overview of the Author
Identification Task at PAN 2014. In Working Notes for CLEF 2014 Conference, Sheffield,
UK, September 15–18, 2014, pages 877–897, 2014.
[41] Efstathios Stamatatos, Nikos Fakotakis, and George K. Kokkinakis. Automatic Text
Categorization in Terms of Genre and Author. Computational Linguistics, 26(4):471–495, 2000.
[42] Benno Stein, Nedim Lipka, and Sven Meyer zu Eissen. Meta Analysis within Authorship
Verification. In 19th International Workshop on Database and Expert Systems Applications
(DEXA 2008), 1-5 September 2008, Turin, Italy, pages 34–39. IEEE Computer Society, 2008.
[43] David Martinus Johannes Tax. One-Class Classification: Concept Learning In the Absence of
Counter-Examples. PhD thesis, Delft University of Technology, 2001.
[44] Cor J. Veenman and Zhenshi Li. Authorship Verification with Compression Features. In
Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, 2013.