Conference Proceeding

A Novel Adjustable Matrix Bloom Filter-Based Copy Detection System for Digital Libraries

10/2011; DOI: 10.1109/CIT.2011.61; pp. 518-525. In proceedings of: Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on
Source: IEEE Xplore

ABSTRACT With the growing volume of online literature on the Internet and the ease of finding and downloading data, plagiarism, the dishonest use of the findings of others, is becoming increasingly common. A copy detection system is therefore needed to address this problem efficiently. Most current systems focus on a single goal: estimating similarity with the highest possible accuracy, i.e. 100%. In some real applications, however, it can be useful to take other factors into account, such as query speed, memory usage, and security of content, at the cost of reducing accuracy by a few percentage points. In this paper, we propose an adjustable copy-paste detection system that allows these factors to be tuned according to the application requirements. The core of our design is a new extension of Bloom filters, called the Matrix Bloom Filter (MBF), which provides the adjustability of the system. A matrix Bloom filter is defined as a bit matrix in which each entry can only be set or reset. It is used to efficiently maintain all documents of libraries. To the best of our knowledge, this is the first work to use the idea behind Bloom filters to solve the copy-paste detection problem while ensuring the privacy of document content, and also the first work that aims to provide this adjustable property. The experimental results show that the proposed approach provides three main improvements: it speeds up the querying operation by up to 2.7 times, reduces the memory required, and protects the security of content, while allowing an adjustable trade-off among all of these factors.
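The abstract describes the MBF only as a bit matrix whose entries can be set or reset, so the following Python sketch is an illustration under stated assumptions and not the authors' implementation. It assumes one matrix row per document, word 3-grams as the hashed units, salted SHA-256 as the hash family, and a bitwise AND between rows as the similarity estimate. The names `MatrixBloomFilter`, `shingles`, `num_bits`, and `num_hashes` are hypothetical.

```python
import hashlib


def shingles(text, size=3):
    """Split text into overlapping word n-grams (assumed unit of comparison)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


class MatrixBloomFilter:
    """Bit matrix with one Bloom filter row per document (assumed layout)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are the hypothetical tuning knobs:
        # larger rows cost memory but lower the false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.rows = {}  # doc_id -> Python int used as a bit vector

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_document(self, doc_id, text):
        # Only hashed bit positions are stored, never the text itself.
        row = 0
        for shingle in shingles(text):
            for pos in self._positions(shingle):
                row |= 1 << pos
        self.rows[doc_id] = row

    def similarity(self, doc_a, doc_b):
        # Share of set bits the sparser row has in common with the other row.
        a, b = self.rows[doc_a], self.rows[doc_b]
        smaller = min(bin(a).count("1"), bin(b).count("1"))
        if smaller == 0:
            return 0.0
        return bin(a & b).count("1") / smaller


if __name__ == "__main__":
    mbf = MatrixBloomFilter(num_bits=1024, num_hashes=3)
    mbf.add_document("original", "the quick brown fox jumps over the lazy dog")
    mbf.add_document("copy", "the quick brown fox jumps over the lazy dog")
    mbf.add_document("other", "bloom filters trade accuracy for memory and speed")
    print(mbf.similarity("original", "copy"))
    print(mbf.similarity("original", "other"))
```

In this sketch, shrinking `num_bits` reduces memory at the price of more accidental bit collisions, which is one plausible reading of the accuracy-for-memory trade-off mentioned in the abstract. Storing only hashed positions is likewise one way the privacy property could arise.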

  • Article: Network Applications of Bloom Filters: A Survey.
    Internet Mathematics. 01/2003; 1.
  • Conference Proceeding: Space-code bloom filter for efficient per-flow traffic measurement
    ABSTRACT: Per-flow traffic measurement is critical for usage accounting, traffic engineering, and anomaly detection. Previous methodologies are either based on random sampling (e.g., Cisco's NetFlow), which is inaccurate, or only account for the "elephants". We introduce a novel technique for measuring per-flow traffic approximately, for all flows regardless of their sizes, at very high speed (say, OC768). The core of this technique is a novel data structure called space code bloom filter (SCBF). An SCBF is an approximate representation of a multiset; each element in this multiset is a traffic flow and its multiplicity is the number of packets in the flow. The multiplicity of an element in the multiset represented by SCBF can be estimated through either of two mechanisms: maximum likelihood estimation (MLE) or mean value estimation (MVE). Through parameter tuning, SCBF allows for a graceful tradeoff between measurement accuracy and computational and storage complexity. SCBF also contributes to the foundation of data streaming by introducing a new paradigm called blind streaming. We evaluate the performance of SCBF through mathematical analysis and through experiments on packet traces gathered from a tier-1 ISP backbone. Our results demonstrate that SCBF achieves reasonable measurement accuracy with very low storage and computational complexity.
    INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies; 04/2004
  • Article: CHECK: A Document Plagiarism Detection System
    ABSTRACT: Langston Hughes was among four principal writers who achieved major recognition during the Harlem Renaissance. The Renaissance was an outstanding phase of literary and artistic development of black people in the United States. Hughes wrote in every genre on sundry topics. However, for purposes of this research, Hughes's role as a social critic of his time will be discussed. The paper will begin with bibliographical facts on Hughes for the benefit of demonstrating to the students the relationship between the artist and the art. Next, I will demonstrate how Hughes drew from the historically rich period in which he lived and became in essence an artistic recorder of history. A detailed study of selected poems that reflect his attempts to protest injustice will follow.
    08/1998;

Keywords

aforesaid factors
application requirements
Bloom filters
copy detection system
copy-paste detection problem
current systems
dishonest use
downloading data
efficient way
first work
highest accuracy
increasing volume
innovative adjustable copy-paste detection system
Matrix Bloom Filter
new extension
on-line literatures
proposed approach
query speed
querying operation
real applications