-
Proceedings of the 2011 Joint International Conference on Digital Libraries, JCDL 2011, Ottawa, ON, Canada, June 13-17, 2011; 01/2011
-
Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL'11); 01/2011
-
ABSTRACT: Various approaches for plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting or style comparison. In this paper a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis, in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach is able to identify similar and plagiarized documents based on the citations used in the text. It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism. Detection rates can be improved by combining citation-based with text-based plagiarism detection.
Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'11); 01/2011
-
ABSTRACT: Extracting titles from a PDF’s full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of computing power) machine learning algorithms such as Support Vector Machines and Conditional Random Fields. In this paper we present a simple rule-based heuristic, which considers style information (font size) to identify a PDF’s title. In a first experiment we show that this heuristic delivers better results (77.9% accuracy) than a support vector machine by CiteSeer (69.4% accuracy) in an ‘academic search engine’ scenario and better run times (8:19 minutes vs. 57:26 minutes).
Keywords: header extraction, title extraction, style information, document analysis
09/2010: pages 413-416;
-
HT'10, Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada, June 13-16, 2010; 01/2010
-
HT'10, Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada, June 13-16, 2010; 01/2010
-
Journal of Scholarly Publishing. 01/2010; 41:176-190.
-
01/2010
-
Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC 2010, Suwon, Republic of Korea, January 14-15, 2010; 01/2010
-
Research and Advanced Technology for Digital Libraries, 14th European Conference, ECDL 2010, Glasgow, UK, September 6-10, 2010. Proceedings; 01/2010
-
Research and Advanced Technology for Digital Libraries, Proceedings of the 14th European Conference on Digital Libraries (ECDL'10); 01/2010
-
HT'10, Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada, June 13-16, 2010; 01/2010
-
Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09); 12/2009
-
Rio de Janeiro (Brazil); 07/2009
-
Rio de Janeiro (Brazil); 07/2009
-
Virudhunagar (India); 01/2009
-
Heidelberg (Germany); 12/2008
-
Berkeley (USA);
-
Createspace.
-
Shaker Verlag.