Conference Paper

Detecting Plagiarism in Text Documents

DOI: 10.1007/978-3-642-12214-9_86 Conference: Information Processing and Management - International Conference on Recent Trends in Business Administration and Information Processing, BAIP 2010, Trivandrum, Kerala, India, March 26-27, 2010. Proceedings
Source: DBLP

ABSTRACT Plagiarism detection aims at identifying the amount of information that is copied or reproduced in a modified representation of original documents. Plagiarism is quite common among students, researchers and academicians, and often goes unrecognized. Though some commercial tools exist to detect plagiarism, detecting it remains a tricky and challenging task due to the abundance of information available online. Commercially available software adopts methods such as paraphrase detection, sentence matching or keyword matching. This paper focuses on identifying key parameters that help to detect plagiarism more accurately and to report it effectively. The results appear promising and leave further scope for detecting plagiarism.
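The sentence- and keyword-matching methods the abstract attributes to existing tools can be sketched as follows. This is a minimal illustration only; the function names, the naive sentence splitter and the overlap threshold are our assumptions, not the paper's actual method.

```python
import re

def sentences(text):
    """Split text into lowercase sentences (naive splitter)."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]

def keyword_overlap(a, b):
    """Fraction of distinct words of sentence `a` that also occur in `b`."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa) if wa else 0.0

def matching_sentences(doc, source, threshold=0.8):
    """Report sentence pairs whose keyword overlap exceeds `threshold`."""
    return [(s, t) for s in sentences(doc) for t in sentences(source)
            if keyword_overlap(s, t) >= threshold]
```

As the abstract notes, such word-level matching is easily defeated by paraphrasing, which motivates the richer parameters the paper investigates.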

  • ABSTRACT: Using keyword overlaps to identify plagiarism can result in many false negatives and positives: substitution of synonyms for each other reduces the similarity between works, making it difficult to recognize plagiarism; overlap in ambiguous keywords can falsely inflate the similarity of works that are in fact different in content. Plagiarism detection based on verbatim similarity of works can be rendered ineffective when works are paraphrased even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to evaluation of similarity: documents that share linguistic elements in addition to content are more likely to be copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that information about linguistic similarities of works improves recognition of plagiarism (over tfidf-weighted keywords alone) when combined with similarity measurements based on tfidf-weighted keywords.
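The combination described above can be sketched as a weighted blend of tf-idf keyword cosine with a crude stylistic signal: function-word frequencies, which survive synonym substitution. The function names, the function-word list and the weight `alpha` are our assumptions, not the paper's actual syntactic features.

```python
import math
from collections import Counter

# Small illustrative function-word list (an assumption, not the paper's).
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "that", "is", "it"}

def _tfidf(docs):
    """tf-idf vectors over a toy corpus, with idf = 1 + log(N / df)."""
    tfs = [Counter(d.lower().split()) for d in docs]
    df = Counter(w for tf in tfs for w in set(tf))
    n = len(docs)
    return [{w: c * (1 + math.log(n / df[w])) for w, c in tf.items()} for tf in tfs]

def _cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def _style_vector(doc):
    """Relative frequencies of function words as a stylistic profile."""
    words = doc.lower().split()
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    return {w: c / len(words) for w, c in counts.items()} if words else {}

def combined_similarity(a, b, alpha=0.7):
    """alpha * tf-idf keyword cosine + (1 - alpha) * function-word cosine."""
    ta, tb = _tfidf([a, b])
    return alpha * _cosine(ta, tb) + (1 - alpha) * _cosine(_style_vector(a),
                                                           _style_vector(b))
```

Two documents that share both keywords and stylistic profile score higher than documents that share only one signal, which is the intuition behind adding the linguistic dimension.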
  • ABSTRACT: Similarity is an important and widely used concept in many applications such as Document Summarisation, Question Answering, Information Retrieval, Document Clustering and Categorisation. This paper presents a comparison of various similarity measures in comparing the content of text documents. We have attempted to find the best measure suited for finding the document similarity for newspaper reports.
    Journal of Information & Knowledge Management 11/2011; 07(01).
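Two measures such a comparison typically covers are Jaccard similarity over word sets and cosine similarity over term counts; a minimal sketch (the function names are ours, and which measures the paper actually compares is not stated here):

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity on the sets of distinct words of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a, b):
    """Cosine similarity on raw term-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The two measures can rank the same document pairs differently (Jaccard ignores repetition, cosine does not), which is precisely what makes an empirical comparison on a corpus such as newspaper reports worthwhile.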
  • ABSTRACT: As information technologies advance, the amount of data gathered on the Internet increases at an incredibly rapid pace. To cope with this data overload, people commonly use Web search engines to find what they need. However, as search engines have become efficient and effective tools, plagiarists can grab, reassemble and redistribute text content without much difficulty. In this paper, we develop an online detection system to reduce such misuse of search engines. Specifically, suspicious documents are extracted and verified through the collaboration of our plagiarism detection system and search engines. With a proper design, extracted text segments are given different priorities when they are sent to search engines to ascertain plagiarism. This greatly reduces unnecessary and repetitive work when performing plagiarism detection.
    IEEE International Conference on Information Reuse and Integration (IRI 2007); 09/2007
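One plausible reading of the prioritisation idea is to query segments dominated by rare words first, since they are the strongest evidence of copying and the most discriminative search queries. A hedged sketch under that assumption; the function names and the rarity heuristic are ours, not the paper's design:

```python
def prioritize_segments(document, corpus_freq, top_k=3):
    """Order sentence-like segments by rarity of their words, rarest first.

    `corpus_freq` maps words to their background corpus frequency; words
    absent from it are treated as maximally rare.
    """
    segments = [s.strip() for s in document.split(".") if s.strip()]

    def rarity(seg):
        words = seg.lower().split()
        # Lower corpus frequency -> rarer word -> higher priority score.
        return sum(1.0 / (1 + corpus_freq.get(w, 0)) for w in words) / len(words)

    return sorted(segments, key=rarity, reverse=True)[:top_k]
```

The top-ranked segments would then be submitted to the search engine first, so that a plagiarised passage is likely confirmed before the remaining, less informative queries are ever issued.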