Conference Paper

Detecting Wikipedia vandalism using WikiTrust: Lab report for PAN at CLEF 2010

Conference: CLEF 2010 LABs and Workshops, Notebook Papers, 22-23 September 2010, Padua, Italy
Source: DBLP


WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-reputation authors gains reputation. Since vandalism on the Wikipedia is usually performed by anonymous or new users (not least because long-time vandals end up banned), and is usually reverted in a reasonably short span of time, edit quality, author reputation, and content reputation are obvious candidates as features to identify vandalism on the Wikipedia. Indeed, using the full set of features computed by WikiTrust, we have been able to construct classifiers that identify vandalism with a recall of 83.5%, a precision of 48.5%, and a false positive rate of 8%, for an area under the ROC curve of 93.4%. If we limit ourselves to the set of features available at the time an edit is made (when the edit quality is still unknown), the classifier achieves a recall of 77.1%, a precision of 36.9%, and a false positive rate of 12.2%, for an area under the ROC curve of 90.4%. Using these classifiers, we have implemented a simple Web API that provides the vandalism estimate for every revision of the English Wikipedia. The API can be used both to identify vandalism that needs to be reverted, and to select high-quality, non-vandalized recent revisions of any given Wikipedia article. These recent high-quality revisions can be included in static snapshots of the Wikipedia, or they can be used whenever tolerance to vandalism is low (as in a school setting, or whenever the material is widely disseminated).
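The recall, precision, and false positive rate quoted above are all derived from the same confusion matrix, so they constrain one another. The following is a minimal sketch, not the authors' evaluation code: the function names `confusion_metrics` and `implied_prevalence` and the example counts are hypothetical. The second function rearranges the definition of precision to estimate what fraction of edits would have to be vandalism for a given (recall, precision, false positive rate) triple to be mutually consistent; the result is a back-of-the-envelope check, not a figure reported in the abstract.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute recall, precision and false positive rate from raw counts.

    tp: vandalism flagged as vandalism     fn: vandalism that was missed
    fp: regular edits flagged as vandalism tn: regular edits left alone
    """
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / (fp + tn),
    }


def implied_prevalence(recall: float, precision: float, fpr: float) -> float:
    """Fraction of positives implied by a (recall, precision, FPR) triple.

    From precision = recall*p / (recall*p + fpr*(1 - p)) it follows that
    p / (1 - p) = precision*fpr / (recall*(1 - precision)).
    """
    odds = (precision * fpr) / (recall * (1.0 - precision))
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the definitions.
    print(confusion_metrics(tp=80, fp=40, tn=860, fn=20))

    # The two triples quoted in the abstract (full features, then
    # features available at the time of the edit).
    print(round(implied_prevalence(0.835, 0.485, 0.08), 3))
    print(round(implied_prevalence(0.771, 0.369, 0.122), 3))
```

Under these definitions both triples from the abstract imply a share of vandalism of roughly 8% in the evaluation data, which suggests the two sets of figures were measured on data with a similar class balance.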

  • Source
    • "The datasets are represented by the selected element based on their metadata information, which helps the users to visually evaluate the quality of datasets. The proposed method has been inspired by WikiTrust (Adler et al., 2010; wikitrust, 2012), which automatically assesses the credibility of content and author reputation of wiki articles, and then uses different text and text-background colors to represent this assessment to users (Figure 1): High reputation text, revised by many high-reputation colors, will appear over a white background, while low-reputation text, which has not benefitted yet from revision by multiple, high-reputation users, is displayed over an orange background; the more intense the orange, the lower the reputation of text. "
    ABSTRACT: Volunteered geographic information is constantly being added, edited or removed by users. Most of VGI users are not experts, thus formal representation of spatial data quality parameters through metadata standards does not efficiently communicate, as it may be interpreted differently by different users with different semantics. In addition, a user may not be able to decide on the relevant dataset for their in-hand application. In this paper, we propose providing VGI users with the spatial data quality parameters through simple cartographic representations, which is independent of users’ semantics. The problem is described and its implementation results for a simple case study are represented.
    Full-text · Article · Aug 2015
  • Source
    • "Notice that in practice the choice of τ depends on the preferred performance characteristic. In order to quantify the performance of a detector independent of τ, precision values are plotted over recall values, and, analogously, TP-rate values are plotted over FP-rate values, for all sensible choices of τ ∈ [0, 1]. The resulting curves are called precision-recall curve and receiver operating characteristic (ROC) curve. "

    Full-text · Conference Paper · Jan 2011
  • Source
    • "0.90351 2 0.49263 3 ↓ Adler et al. [1]
       0.89856 3 0.44756 4 ↓ Javanmardi [8]
       0.89377 4 0.56213 2 ⇈ Chichkov [3]
       0.87990 5 0.41365 7 Seaward [12] "

    Full-text · Conference Paper · Jan 2010
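
One of the excerpts above describes evaluating a detector independently of its decision threshold τ by plotting precision over recall, and TP rate over FP rate, for every sensible τ. The following is a minimal sketch of that sweep, assuming each edit comes with a vandalism score in [0, 1] and a ground-truth label (1 for vandalism, 0 otherwise). The function names and the toy data are hypothetical and are not taken from any of the cited papers; the area under the ROC curve is approximated with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # (tau, precision, recall, fp_rate)


def sweep_thresholds(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Evaluate the rule 'flag the edit if score >= tau' for every distinct score.

    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # Walk thresholds from strictest to most permissive.
    for tau in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because tau is an observed score
        recall = tp / positives     # recall is the same quantity as the TP rate
        fp_rate = fp / negatives
        points.append((tau, precision, recall, fp_rate))
    return points


def roc_auc(points: List[Point]) -> float:
    """Trapezoidal area under the (fp_rate, recall) curve, starting at (0, 0)."""
    xs = [0.0] + [p[3] for p in points]
    ys = [0.0] + [p[2] for p in points]
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


if __name__ == "__main__":
    # Toy data: five edits, two of which are vandalism.
    scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    labels = [1, 0, 1, 0, 0]
    for tau, precision, recall, fp_rate in sweep_thresholds(scores, labels):
        print(f"tau={tau:.1f} precision={precision:.2f} "
              f"recall={recall:.2f} fp_rate={fp_rate:.2f}")
    print("ROC AUC:", roc_auc(sweep_thresholds(scores, labels)))
```

Each row of the sweep is one point on both curves, so a single operating point such as "recall 83.5% at a false positive rate of 8%" corresponds to one particular choice of τ, while the area under the ROC curve summarizes all of them.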