Conference Paper

Authorship Identification of Romanian Texts with Controversial Paternity.

Conference: Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco
Source: DBLP


In this work we propose a new strategy for the authorship identification problem and test it on an example from Romanian literature: did Radu Albala find the continuation of Mateiu Caragiale's novel "Sub pecetea tainei", or did he write the continuation himself? The proposed strategy is based on the similarity of rankings of function words; we compare its results with those obtained by a learning method, namely Support Vector Machines (SVM) with a string kernel.
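As a rough illustration of what comparing rankings of function words can look like, the sketch below ranks a fixed list of function words by frequency in each text and sums the absolute rank differences. The abstract does not specify the word list, the tie-breaking rule, or the exact distance, so `FUNCTION_WORDS`, `rank_function_words`, and `ranking_distance` are hypothetical names and the measure is an assumed stand-in, not the authors' method.

```python
from collections import Counter

# Hypothetical, tiny function-word list (assumption; the paper's list is not given here).
FUNCTION_WORDS = ["si", "de", "la", "in", "cu", "pe", "un", "o", "ca", "nu"]


def rank_function_words(text, words=FUNCTION_WORDS):
    """Map each function word to its rank (1 = most frequent) in `text`.

    Ties, including words that never occur, are broken by position in
    `words`. This is a simplification chosen for the sketch.
    """
    vocabulary = set(words)
    counts = Counter(tok for tok in text.lower().split() if tok in vocabulary)
    ordered = sorted(words, key=lambda w: (-counts[w], words.index(w)))
    return {w: rank for rank, w in enumerate(ordered, start=1)}


def ranking_distance(ranks_a, ranks_b):
    """Sum of absolute rank differences over the shared word list (assumed measure)."""
    return sum(abs(ranks_a[w] - ranks_b[w]) for w in ranks_a)


if __name__ == "__main__":
    words = ["si", "de", "la"]
    a = rank_function_words("de la de si de la", words)
    b = rank_function_words("si si si la la de", words)
    print(ranking_distance(a, b))
```

A smaller distance means the two texts order their function words more similarly; in an attribution setting, the disputed text would be compared against samples from each candidate author.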



Available from: Anca Dinu
  • ABSTRACT: This paper presents two clustering methods based on rank distance. Rank distance has applications in many fields, such as computational linguistics, biology and informatics. It can be computed fast and benefits from some features of the edit (Levenshtein) distance. In [1], two clustering methods based on rank distance are described: the K-means algorithm uses the median string to represent the centroid of a cluster, while the hierarchical clustering method joins pairs of strings and replaces each pair with the median string. This paper presents two similar clustering algorithms that use the closest string instead of the median string. The new clustering algorithms are compared with those presented in [1] and with other similar clustering techniques. Experiments on mitochondrial DNA sequences extracted from several mammals are performed to compare the results of the clustering methods. Results demonstrate the clustering performance and the utility of the new algorithms.
    Conference Paper · Jan 2012
  • ABSTRACT: This paper presents a new genetic approach that uses rank distance to solve two known NP-hard problems, and compares rank distance with other distance measures for strings. The two NP-hard problems are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.
    Article · Jun 2012 · PLoS ONE