Language and gender author cohort analysis of e-mail for computer forensics

Source: OAI

ABSTRACT: We describe an investigation into attributing the authors of e-mail text documents to gender and language background cohorts. We used an extended set of predominantly topic content-free e-mail document features, such as style markers, structural characteristics and gender-preferential language features, together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for both author gender and language background cohort categorisation.
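
As a rough illustration of what topic-independent e-mail features might look like, the following minimal sketch maps a message body to a fixed-length numeric vector of a few style markers, structural characteristics and function-word frequencies. The function name `extract_features`, the `FUNCTION_WORDS` list and the particular features chosen are assumptions made for illustration; the abstract does not specify the actual feature set, and gender-preferential features are not modelled here. Vectors of this kind could then be passed to a Support Vector Machine classifier, which is not shown.

```python
import re
import string

# Hypothetical, very small function-word list chosen for illustration only;
# the feature set used in the paper is not given in the abstract.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "i", "you", "it")


def extract_features(email_text):
    """Map an e-mail body to a fixed-length list of topic-independent features."""
    lines = email_text.splitlines()
    words = re.findall(r"[A-Za-z']+", email_text.lower())
    n_words = max(len(words), 1)        # avoid division by zero on empty input
    n_chars = max(len(email_text), 1)
    n_lines = max(len(lines), 1)

    features = [
        # Style markers
        sum(len(w) for w in words) / n_words,                         # mean word length
        len(set(words)) / n_words,                                    # type/token ratio
        sum(c in string.punctuation for c in email_text) / n_chars,   # punctuation rate
        # Structural characteristics
        float(len(lines)),                                            # number of lines
        sum(1 for ln in lines if not ln.strip()) / n_lines,           # blank-line ratio
    ]
    # Relative frequency of each function word
    features.extend(words.count(fw) / n_words for fw in FUNCTION_WORDS)
    return features
```

Every message yields a vector of the same length (5 + the number of function words), regardless of its topic or size, which is the property a classifier needs.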

  • ABSTRACT: This paper presents the results of studies on author attribution for short text documents. It briefly presents the features used in authorship identification, outlines the proposed hybrid algorithm, and describes the results of experiments that explore the conditions needed to reliably recognize author identity. Finally, it summarizes the results obtained, presents the main problems faced in authorship attribution for short text documents, and proposes improvements that, when applied, might lead to better authorship description and, in consequence, recognition.
    Human Language Technology. Challenges for Computer Science and Linguistics - 4th Language and Technology Conference, LTC 2009, Poznan, Poland, November 6-8, 2009, Revised Selected Papers; 01/2009
  • ABSTRACT: The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used for defining the chat mining problem. A term-based approach is used to investigate the user and message attributes in the context of vocabulary use while a style-based approach is used to examine the chat messages according to the variations in the authors’ writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is exploited, and the effect of author attributes on computer-mediated communications is discussed.
    Information Processing & Management; 01/2008
  • ABSTRACT: There has been tremendous growth in the information environment since the advent of the Internet and wireless networks. Just as e-mail has been the mainstay of the web in its use for personal and commercial communication, one can say that text messaging or Short Message Service (SMS) has become synonymous with communication on mobile networks. With the increased use of text messaging over the years, the amount of mobile evidence has increased as well. This has resulted in the growth of mobile forensics. A key function of digital forensics is efficient and comprehensive evidence analysis, which includes authorship attribution. Significant work on mobile forensics has focused on data acquisition from devices, and little attention has been given to the analysis of SMS. Consequently, we propose a software application called SMS Management and Information Retrieval Kit (SMIRK). SMIRK aims to deliver a fast and efficient solution for investigators and researchers to generate reports and graphs on text messaging. It also allows investigators to analyze the authorship of SMS messages.
    Digital Forensics and Cyber Crime - First International ICST Conference, ICDF2C 2009, Albany, NY, USA, September 30-October 2, 2009, Revised Selected Papers; 01/2009
