Language and gender author cohort analysis of e-mail for computer forensics

Source: OAI

ABSTRACT We describe an investigation of authorship gender and language background cohort attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for both author gender and language background cohort categorisation.
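As a rough illustration of what "topic content-free" features might look like, the sketch below computes a handful of style and structural markers from the raw text of an e-mail. The feature names, the tiny function-word list and the `style_features` helper are all assumptions made for this example; they are not the feature set used in the paper, which the abstract does not enumerate.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the paper's actual feature set is not given in the abstract).
FUNCTION_WORDS = {"a", "an", "the", "and", "but", "of", "in", "on", "to", "with"}


def style_features(email_text: str) -> dict:
    """Return a few topic-independent style/structure features for one e-mail."""
    lines = email_text.splitlines()
    words = re.findall(r"[A-Za-z']+", email_text)
    lowered = [w.lower() for w in words]
    n_words = len(words) or 1   # avoid division by zero on empty input
    n_lines = len(lines) or 1

    return {
        # Style markers: word-level statistics that ignore topic vocabulary.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in lowered) / n_words,
        "exclamation_count": float(email_text.count("!")),
        # Structural characteristics: layout of the message body.
        "blank_line_ratio": sum(not ln.strip() for ln in lines) / n_lines,
        "has_greeting": float(
            bool(re.match(r"\s*(hi|hello|dear)\b", email_text, re.IGNORECASE))
        ),
    }
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to a Support Vector Machine implementation (for example scikit-learn's `sklearn.svm.SVC`), which is omitted here to keep the sketch dependency-free.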

Available from: Malcolm Corney, May 27, 2014
  • ABSTRACT: This work addresses the problem of automatic annotation of clinical interview transcripts. We formulate this task as a supervised machine learning problem and propose highly scalable and efficient probabilistic classifiers based on generative latent variable models to solve it. Experimental results indicate that the proposed classifiers outperform some popular standard algorithms, such as Naive Bayes, and provide more interpretable results for clinicians and researchers.
    Knowledge Discovery and Data Mining, New York, New York; 08/2014
  • ABSTRACT: In this paper we investigate the potential of applying keystroke analysis, which is primarily used in user authentication, to user profiling and identification for forensic investigations. The keystroke forensics approach proposed in this paper supports user profiling through integration with the offender profiling domain. Early findings show that it was possible to identify, with significant probability, the conditions under which and the means by which a user performs typing operations.
    Proceedings of the 6th Balkan Conference in Informatics; 09/2013
  • ABSTRACT: Human communication via networks poses interesting and challenging problems for sustainable technology revolutions. Instant communication through chat media has become an attractive and effective mode of communication. Such media are text based and provide information about society's current interests and attitudes and, depending on the domain, the intentions of the speakers. Analysing and processing such conversations is therefore of great importance. This paper aims to develop a methodology that can automatically determine the gender of chatters. Because approaches to understanding the dynamics of chat conversation are limited, the need for automatic analysis is increasing. Understanding the social interactions and their conceptual topics therefore remains a genuine challenge. The experiments carried out in this paper gave promising results.