Language and Gender Author Cohort Analysis of E-mail for Computer Forensics

In Proceedings of the Digital Forensic Research Workshop, Syracuse, NY; 01/2002
Source: OAI


We describe an investigation of authorship gender and language background cohort attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for both author gender and language background cohort categorisation.
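
The abstract mentions topic-independent features such as style markers and structural characteristics. As a rough illustration only, the sketch below computes a handful of such features from a raw message body. The function name `style_features` and the particular features chosen are assumptions made for this example, not the feature set used in the paper. A vector like this could then be passed to any Support Vector Machine implementation.

```python
import re
from typing import Dict


def style_features(message: str) -> Dict[str, float]:
    """Compute a few topic-independent style/structure features.

    Hypothetical helper: the feature choice is illustrative and is
    not taken from the paper.
    """
    lines = message.splitlines()
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", message)
    n_chars = len(message)
    n_words = len(words)
    return {
        # Structural characteristics
        "chars": float(n_chars),
        "words": float(n_words),
        "blank_lines": float(sum(1 for line in lines if not line.strip())),
        # Style markers (guarded against empty input)
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in message) / n_chars if n_chars else 0.0,
        "digit_ratio": sum(c.isdigit() for c in message) / n_chars if n_chars else 0.0,
        # Vocabulary richness: distinct words divided by total words
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


sample = "Hi Bob,\n\nSee you at 10.\n"
features = style_features(sample)
```

Ratios are used instead of raw counts where possible so that long and short messages remain comparable.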


Available from: Malcolm Corney, May 27, 2014
  • Source
    • "The task of fine-grained analysis of language arises in many contexts, from sentiment analysis of on-line reviews [7] [3] to analysis of scientific literature [1] and digital forensics [2]. The performance of different classifiers (including Naive Bayes [5]) has been examined in previous work for the most common text classification tasks, and particularly for sentiment analysis [4] [6]."
    ABSTRACT: This work addresses the problem of automatic annotation of clinical interview transcripts. We formulate this task as a supervised machine learning problem and propose highly scalable and efficient probabilistic classifiers based on generative latent variable models to solve it. Experimental results indicate that the proposed classifiers outperform some popular standard algorithms, such as Naive Bayes, and provide more interpretable results for clinicians and researchers.
    Knowledge Discovery and Data Mining, New York, New York; 08/2014
  • Source
    • "In all approaches a common modus operandi was the theoretical or empirical adjustment of the underlying coefficients in order to improve the results (that is, reduce Type I and II errors) and remove potential outliers [2]. However, a portion of the work on keystroke dynamics has moved away from authentication, dealing with issues such as typing in different languages and the information that can be extracted from the user's textual artifacts [6], or identifying the gender of the author of a text [4] [1] based primarily on content (words, punctuation, emoticons, etc.) and composition (number of uppercase characters, number of digits, number of blank lines, length of the proposals, and so forth) of the text, while other work looked for relations between the psychological state of the user and the way in which she types [5]. As a general conclusion, it was established that modeling of typing is associated with many parameters and the data obtained from a typing sample can be significant."
    ABSTRACT: In this paper we investigate the potential of leveraging keystroke analysis - primarily used in user authentication - for user profiling and identification in forensic investigations. As such, the keystroke forensics approach proposed in this paper will support user profiling through integration with the offender profiling domain. Early findings show that it is possible to identify, with significant probability, the conditions under which and the means by which a user performs typing operations.
    Proceedings of the 6th Balkan Conference in Informatics; 09/2013
  • Source
    • "In addition to some well-known stylistic features, they used features like smileys and emoticons. The experiments appear to achieve reasonable accuracy in predicting gender and language background [4]. Examining email documents to predict the identity and gender of the author involves mining typical features such as message tags, signatures and vocabulary richness [2]."
    ABSTRACT: Human communication via networks poses interesting and challenging problems on the path towards sustainable technology revolutions. Instant communication by means of chat mediums has become an attractive and effective communication mode. Such mediums are text-based and provide information relevant to society's current interests and attitudes, and to the intentions of the speakers in general, depending on the domain. Analysis and processing of such conversations is of great importance. This paper aims to develop a methodology that can automatically determine the gender of the chatters. Since approaches towards understanding the dynamics of chat conversation are limited, the need for automatic analysis increases. Therefore, analysing the social interactions and their conceptual topics is a genuine challenge. Experiments carried out in this paper were quite promising.