Quality and Importance of Wikipedia Articles in
Different Languages
Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz
Poznań University of Economics and Business,
Al. Niepodległości 10, 61-875 Poznań, Poland
Abstract. This article aims to analyse the importance of Wikipedia articles in different languages (English, French, Russian, Polish) and the impact of the importance on the quality of articles. Based on the analysis of literature and our own experience we collected measures related to articles, specifying various aspects of quality that will be used to build the models of articles' importance. For each language version, the influential parameters are selected that may allow automatic assessment of the validity of the article. Links between articles in different languages offer opportunities in terms of comparison and verification of the quality of information provided by various Wikipedia communities. Therefore, the model can be used not only for a relative assessment of the content of the whole article, but also for a relative assessment of the quality of data contained in their structural parts, the so-called infoboxes.
Keywords: Wikipedia, DBpedia, information quality, data quality, WikiRank, article importance
JEL classification: C55, D8, L15, L86
1 Introduction
Currently there are 282 active Wikipedia language editions. The largest is the English version, which has more than 5 million articles. The ten biggest editions also include German, French, Russian and Polish.
This online encyclopedia has become one of the most important sources of knowledge throughout the world. In April 2016, the number of visits amounted to 282 million per day across all language versions. In the ranking of the most popular websites, Wikipedia occupies 6th place in the world.
The number of articles in each language grows every day. Articles can also be created and edited by anonymous users, and the authors do not have to formally demonstrate their skills in a specific field. Wikipedia has no central editorial board or group of reviewers who could comprehensively verify all new and existing articles. These
This is a preprint version. The original publication is available at doi:10.1007/978-3-319-46254-7_50
and other problems led to criticism of the concept of Wikipedia, in particular pointing
out the poor quality of information4.
Quality issues, however, concern the creators of Wikipedia. Practically every language version of the online encyclopedia has an award system for high-quality articles. In the English version of Wikipedia the best articles are named „Featured Articles" (FA). Articles that do not fulfill all the FA criteria but come close to that level of quality can receive the slightly lower award of „Good Article" (GA).
In order to receive an award, an article must be nominated by a user. A discussion and a vote are then carried out, in which every user can support or oppose the award for the specific article and explain their point of view. The criteria and rules for granting awards in each language version may change over time, which in turn may result in the loss of an award by some articles⁵.
In addition to the awards, in some language versions an article may receive lower grades. Such an intermediate assessment may indicate the „maturity" of the article (i.e., the degree to which it approaches the best articles). The English version of Wikipedia generally distinguishes 7 quality classes of articles (from the highest): FA, GA, A-class, B-class, C-class, Start, Stub. It is noteworthy that, unlike the higher classes FA and GA, the lower grades are assigned without a community discussion and voting – each user can set the rating by himself on the basis of the rules. Some language versions use a less developed grading scale, e.g. the Polish version, in addition to the equivalents of FA and GA, also uses the grades Czwórka, Start and Zalążek (altogether 5 classes).
In Wikipedia there is no generally accepted standard for classifying article quality across different language versions [1]. Some languages use an expanded rating scale (EN, RU), others are limited to 2-3 grades (BE, DE). In other words, each language version can have its own classification system for article quality, but all of them use at least the two highest classes – equivalents of FA and GA. However, such articles are very few – on average, in each language version their share is about 0.07%. It should also be noted that a large part of the articles is not evaluated at all; e.g. in the Polish edition the share of such articles is over 99%.
In some language versions there is also an importance scale for articles. This feature is used for rating an article's importance for a particular subject (or subjects) and is usually marked as Top-, High-, Medium- or Low-importance. It can be expected that the greater the importance of an article, the better its quality. However, the quality class should also be taken into account. Figures 1 and 2 show summary tables combining the quality and importance ratings of each assessed article in the English, French, Russian and Polish Wikipedia. In contrast to similar statistics at Wikipedia, where one article can be counted 2 or more times, we took into account only the highest quality and importance grade of each
⁵ For English Wikipedia there is a list of articles that have lost their award – https://en.
article. For this reason, for example, the class „A" has only 181 articles, although technically the number of such articles is 1593. This is because the vast majority of articles with grade „A" are additionally evaluated as FA or GA. Therefore, in our experiments we will not take the class „A" into account.
Fig. 1. Articles by quality and importance in English (on the left) and French (on the right)
Wikipedia. Source: own calculations in May 2016
Fig. 2. Articles by quality and importance in Russian (on the left) and Polish (on the right)
Wikipedia. Source: own calculations in May 2016
In the scientific literature we can find studies which offer different approaches to the automatic evaluation of the quality of Wikipedia articles. Based on the characteristics of highly rated (awarded) articles it is possible to evaluate others. Text length, number of references, number of images and other article features can help in the quality assessment.
The aim of our research is to answer the following questions:
– Does article importance affect its quality?
– What parameters can help to assess the importance of an article automatically?
– Is there a difference between importance models in different languages?
Most of the research on models for the quality of Wikipedia articles is focused
on the “largest” language – English. In this paper we consider 4 popular languages:
English (en), French (fr), Polish (pl), Russian (ru), which have introduced templates for specifying an article's importance. This allows us to build models that are able to compare article quality in different languages. Moreover, this is the first study in which we build importance models of articles and conduct a comparative analysis of these models across different languages.
2 Automatic quality assessment
Since its founding, and with the increasing popularity of Wikipedia, there have been more and more scientific publications on the quality of its information. One of the first studies showed that measuring the volume of content can help determine the degree of maturity of an article [2]. Work in this direction shows that higher-quality articles are generally longer [3], use references in a coherent way, are edited by hundreds of editors and have thousands of edits [4,5].
In addition to quantitative analysis, later research has focused on qualitative analysis of the content of articles. One of the works used the so-called FOG readability index, which determines the accessibility of a text [6]. In cases where the volume of content in articles is similar, the better article will contain more factual information [7]. The style and variety of the words used also affect the quality of an article [8,9]. Wikipedia users can include special templates in an article indicating gaps in quality; such annotations can help in assessing the quality of the article [10]. Features related to an article's popularity can also be used in assessing the quality of the information it contains [11].
Other works on the automatic quality classification of Wikipedia articles take into account user behaviour. There are models that consider users' experience and reputation. High-quality articles have a large number of edits and a large number of editors with a high level of cooperation [12,13]. It is important that such a group of editors includes at least one user with a high level of experience in editing Wikipedia content [14]. The reputation of the user who made the first edit of an article is of particular importance [15]. Reputation can be calculated on the basis of the „survival" of the text that a user contributed [16,17,18].
In this study, we decided to focus primarily on those aspects that can help improve
the quality of the article – so we consider the content of the article and its metadata.
3 Data selection and extraction
On the basis of the literature [19,2,4,12,3,20,6,8,21,10,22,11] and our own research we have chosen 85 article parameters which will be taken into account when building quality and importance models of Wikipedia articles. These parameters cover various areas such as text statistics, parts of speech, readability formulas, similarity of words, the structure of the article, edit history, network parameters, popularity of the article, and the characteristics of the discussion.
One of the most attractive methods of obtaining data from Wikipedia is the API service, which provides easy access to the data and metadata of articles using HTTP, via a URL, in a variety of formats (including XML and JSON). The API service works for every language version and is available at the address specified by the template: https://{lang}.wikipedia.org/w/api.php?action={settings}, where {lang} is the abbreviation of the language version and {settings} are the query settings⁹. The capabilities of the API are used in our specially prepared program WikiAnalyzer, which can obtain over 50 different parameters of each article.
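The API access pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' WikiAnalyzer tool: the `build_query` and `parse_response` helpers and the sample response fragment are hypothetical, although the request parameters (`action=query`, `prop=info|langlinks`) follow the public MediaWiki API.

```python
import json
from urllib.parse import urlencode

API_TEMPLATE = "https://{lang}.wikipedia.org/w/api.php"

def build_query(lang, title):
    """Build an API URL requesting page metadata and language links."""
    settings = {
        "action": "query",
        "titles": title,
        "prop": "info|langlinks",   # page info + interlanguage links
        "lllimit": "500",
        "format": "json",
    }
    return API_TEMPLATE.format(lang=lang) + "?" + urlencode(settings)

def parse_response(raw_json):
    """Extract two article parameters: page length (bytes)
    and the number of language versions."""
    pages = json.loads(raw_json)["query"]["pages"]
    page = next(iter(pages.values()))
    return {
        "length_bytes": page.get("length", 0),
        "language_versions": len(page.get("langlinks", [])),
    }

# Hypothetical response fragment whose shape follows the MediaWiki API:
sample = ('{"query": {"pages": {"736": {"length": 189000, '
          '"langlinks": [{"lang": "pl"}, {"lang": "ru"}]}}}}')
```

In a real run, the URL returned by `build_query` would be fetched over HTTP; here the parsing step is demonstrated on a static response fragment.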
Figure 3 shows the distribution of variables in articles of different quality classes. It is noticeable that an increase in each feature is accompanied by an increase in the share of higher-quality articles. We also compared the parameters of articles from the FA class only, but of varying importance. Some of them are shown in figure 4 for the English Wikipedia. A similar regularity is observed here: an increase in the value of a feature is accompanied by an increase in the share of important articles.
Fig. 3. Distribution of variables in articles with different quality class in English Wikipedia
Fig. 4. Distribution of variables in FA articles with different importance in English Wikipedia
⁹ All possible settings of the API service can be found on a special page: https://en.
3.1 Dataset ENQ
To answer the first question raised in the introduction, we decided to carry out the evaluation on articles of the English Wikipedia with an assigned quality and an assigned importance, because this version:
– is the largest language version,
– has the most developed system of quality classification of articles,
– has the greatest number of articles at the intersections of quality and importance (see figure 1).
Because the smallest number of articles at an intersection of quality and importance is 854 (for class FA and Top-importance), we decided to randomly choose 800 articles from each intersection (without the A-class, for the reasons described earlier). Altogether, there were 19200 articles in our ENQ dataset.
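The balanced sampling behind the ENQ dataset can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the `articles_by_cell` mapping and the article identifiers are hypothetical, and only the sample sizes (800 articles per intersection, 6 quality classes × 4 importance levels, 19200 in total) come from the text.

```python
import random

QUALITY = ["FA", "GA", "B", "C", "Start", "Stub"]   # A-class excluded
IMPORTANCE = ["Top", "High", "Mid", "Low"]
PER_CELL = 800

def sample_enq(articles_by_cell, per_cell=PER_CELL, seed=42):
    """Draw a balanced sample: per_cell articles from every
    (quality, importance) intersection, without replacement."""
    rng = random.Random(seed)
    dataset = []
    for q in QUALITY:
        for imp in IMPORTANCE:
            pool = articles_by_cell[(q, imp)]
            dataset.extend(rng.sample(pool, per_cell))
    return dataset

# Hypothetical pools: each cell holds at least 854 article ids,
# matching the smallest intersection reported in the paper.
pools = {(q, i): [f"{q}-{i}-{n}" for n in range(900)]
         for q in QUALITY for i in IMPORTANCE}
enq = sample_enq(pools)
```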
3.2 Datasets IMP
To answer the second and third questions, we evaluated articles of different quality with an assigned importance. Of the 4 studied language versions, the Polish Wikipedia has the least developed system of importance assessment. There, the smallest number of Top-importance articles was 489. Therefore, we decided to randomly choose 400 articles from each importance level in each language version to obtain a homogeneous distribution in the learning datasets.
4 Evaluation
In many approaches to building models a binary dependent variable was used [9,7,22,11] and quality was modelled as the probability of belonging to one of two classes:
– complete articles: FA-class and GA-class,
– incomplete articles: all others – developing articles (which should be further developed) and unassessed articles.
Our previous research has shown that with such a binary forecast variable a precision of 98-100% can be achieved (depending on the language version) [23]. Therefore, we decided to expand the number of alternatives in the dependent variable – now each quality class is a separate value of this variable. For example, for our ENQ dataset we have 6 alternatives in the dependent variable: FA, GA, B, C, Start, Stub.
Our research has shown the efficiency of the Random Forest classifier on similar tasks; therefore, in this study we also use that data mining algorithm with default settings (100 trees, cross-validation with 10 folds) using the WEKA software [23].
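The paper uses WEKA; the same setup (Random Forest, 100 trees, 10-fold cross-validation) can be reproduced in scikit-learn. This is an assumption of the sketch, not the authors' code, and the 85-feature data below is a synthetic stand-in for the article parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, n_features = 600, 85          # 85 article parameters, as in the paper
X = rng.normal(size=(n, n_features))
# Synthetic quality labels: 6 classes (FA, GA, B, C, Start, Stub)
y = rng.integers(0, 6, size=n)

# Mirrors the paper's setup: 100 trees, 10-fold cross-validation
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)
```

On real article features the per-fold scores would be averaged and reported per quality class, as in tables 1 and 2.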
Using the 85 different article parameters as independent variables and the quality class as the dependent variable, we can reach 60% precision of classification. After the inclusion of an additional feature – the importance of the article – the precision of the model increased to 61%. Therefore
we conclude that the inclusion of an additional input variable (article importance) can improve the precision of classification.
The confusion matrices for the models with quality class and importance level as dependent variables on the ENQ dataset are shown in figure 5. Tables 1 and 2 show the performance of the classifiers. It can be argued that the importance of an article affects its quality.
Fig. 5. Confusion matrix - Quality (on the left) and Importance (on the right). English Wikipedia
Table 1. Classification results per quality class in English Wikipedia using Random Forest.
Source: own study
Quality class TP Rate FP Rate Precision Recall F-Measure ROC Area
FA 0.893 0.046 0.797 0.893 0.842 0.983
GA 0.719 0.066 0.687 0.719 0.703 0.946
B 0.394 0.087 0.474 0.394 0.43 0.827
C 0.391 0.104 0.429 0.391 0.409 0.827
Start 0.542 0.109 0.499 0.542 0.52 0.859
Stub 0.778 0.045 0.775 0.778 0.777 0.964
Overall 0.62 0.076 0.61 0.62 0.613 0.901
Let us now try to answer the remaining two research questions. We use our IMP datasets, which contain the importance level as the dependent variable. Using Random Forest as the prediction model, we can obtain the most influential features affecting article importance in each language. In figure 6 we show the influence of each article parameter in the importance model in different language editions of Wikipedia (on a scale from 0 to 100, where 100 is the highest influence). As we can see, there are some differences between the models for particular languages. For example, for the English version the most influential features are the sum of visits in 30 days and the number of links to the article.
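The 0–100 influence scale of figure 6 can be obtained by rescaling Random Forest feature importances so that the strongest feature scores 100. The exact scaling used by the authors is not specified, so this normalization is an assumption; the feature names and the data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def influence_scores(X, y, feature_names):
    """Fit a Random Forest and rescale its feature importances
    so that the most influential feature scores 100."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    imp = clf.feature_importances_
    scaled = 100 * imp / imp.max()
    return dict(zip(feature_names, scaled))

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
# Make the synthetic label depend strongly on the first feature
# (a stand-in for e.g. total visits in 30 days)
y = (X[:, 0] > 0).astype(int)
names = ["visits_30d", "links_in", "page_length", "references"]
scores = influence_scores(X, y, names)
```

Because the synthetic label is driven by the first feature, its score lands at the top of the scale, analogous to the dominant visit-count feature in the English model.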
Table 2. Classification results per importance level in English Wikipedia using Random Forest.
Source: own study
Importance level TP Rate FP Rate Precision Recall F-Measure ROC Area
Top 0.662 0.158 0.583 0.662 0.62 0.852
High 0.335 0.172 0.394 0.335 0.362 0.676
Mid 0.325 0.166 0.395 0.325 0.357 0.672
Low 0.644 0.183 0.54 0.644 0.587 0.827
Overall 0.491 0.17 0.478 0.491 0.481 0.757
Fig. 6. Influence of article parameters in the importance model in different language editions of Wikipedia (descriptions of the parameter abbreviations are in table 3). Source: own study.
5 Conclusions
In this paper we have shown that the importance of an article affects the quality of the information contained in it. In our study we used ca. 80 features of articles and various data mining techniques to come up with a proposal for quality models. We have also built importance models for particular language editions of Wikipedia and shown the differences between these models.
The proposed models can help to improve the quality of Wikipedia articles by identifying the best version of a particular article. In consequence, our work can improve the quality of data in DBpedia, one of the most famous semantic databases, which is enriched by extracting facts from articles in different language versions of Wikipedia. Data mining algorithms allow us to determine the significance of the features in quality models, which can later be used to compare articles in different languages. This property is
Table 3. Description of parameters abbreviations used in figure 6
Name Description Name Description
A1 Last modified A44 The number of pictures (all)
A2 Last modified not by the bot A45 The number of unique pictures 1 lvl
A3 page length (in bytes) A46 The number of unique pictures 2 lvl
A4 informativeness 1 A47 The number of unique pictures 3 lvl
A5 informativeness 2 A48 The number of unique pictures 4 lvl
A6 Number of edits by anonymous authors for the whole time A49 The number of unique pictures 5 lvl
A7 Number of edits by anonymous for 12 months A50 The number of followers
A8 Number of edits by anonymous for 6 months A51 Number of templates (all)
A9 Number of edits by bots A52 Number of templates ns10
A10 Number of edits by bots for 12 months A53 Number of templates ns828
A11 Number of edits by bots for 6 months A54 The number of unique anonymous for 12 months
A12 Number of edits for 12 months A55 The number of unique anonymous for 6 months
A13 Number of edits for 6 months A56 Number of unique authors for 12 months
A14 Number of edits for all time A57 Number of unique authors for 6 months
A15 Number of edits for all time A58 Number of unique bots for 12 months
A16 the number of links to the article (all) A59 Number of unique bots for 6 months
A17 the number of links on the article ns0 A60 Number of unique bots for the all time
A18 the number of links on the article ns1 A61 Unique templates quality gaps
A19 the number of links on the article ns10 A62 Number of unique anonymous authors for all time
A20 the number of links on the article NS100 A63 The number of language versions
A21 the number of links on the article ns101 A64 Median of non-zero last 30 days
A22 the number of links on the article ns11 A65 The median of visits for 30 days
A23 the number of links on the article NS12 A66 The median of visits for 90 days
A24 the number of links on the article ns13 A67 Heading 1
A25 the number of links on the article ns14 A68 Heading 2
A26 the number of links on the article NS15 A69 Heading 3
A27 the number of links on the article ns2 A70 Heading 4
A28 the number of links on the article ns3 A71 Heading 5
A29 the number of links on the article ns4 A72 Heading 6
A30 the number of links on the article NS5 A73 Come visit the ost day
A31 the number of links on the article NS6 A74 Ref / Length
A32 the number of links on the article ns7 A75 Ref / Number of letters
A33 the number of links on the article ns8 A76 References unique
A34 the number of links on the article ns828 A77 all references
A35 the number of links on the article ns829 A78 Average visits for 30 days
A36 the number of links on the article ns9 A79 Average visits for 90 days
A37 the number of internal links (all) A80 Total visits for 30 days
A38 the number of good internal links A81 Total visits for 90 days
A39 the number of broken internal links A82 Noise1
A40 the number of external links A83 Noise2
A41 The number of letters A84 Unique authors for last time
A42 The number of letters without noise 1 A85 Unique authors for all time.
A43 The number of letters without noise 2
used in the design of the WikiRank service, which is used to calculate the so-called relative quality of articles.
References
1. Węcel, K., Lewoniewski, W.: Modelling the Quality of Attributes in Wikipedia Infoboxes.
In Abramowicz, W., ed.: Business Information Systems Workshops. Volume 228 of Lecture
Notes in Business Information Processing. Springer International Publishing (2015) 308–
2. Stvilia, B., Twidale, M.B., Smith, L.C., Gasser, L.: Assessing information quality of a
community-based encyclopedia. Proc. ICIQ (2005) 442–454
3. Blumenstock, J.E.: Size matters: word count as a measure of quality on wikipedia. In:
WWW. (2008) 1095–1096
4. Hu, M., Lim, E.P., Sun, A., Lauw, H.W., Vuong, B.Q.: Measuring article quality in wikipedia.
In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Manage-
ment - CIKM ’07. (2007) 243–252
5. Wöhner, T., Peters, R.: Assessing the quality of Wikipedia articles with lifecycle based
metrics. Proceedings of the 5th International Symposium on Wikis and Open Collaboration
WikiSym 09 (2009) 1
6. Dalip, D.H., Gonçalves, M.A., Cristo, M., Calado, P.: Automatic quality assessment of con-
tent created collaboratively by web communities: a case study of wikipedia. In: Proceedings
of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. (2009) 295–304
7. Lex, E., Voelske, M., Errecalde, M., Ferretti, E., Cagnina, L., Horn, C., Stein, B., Granitzer,
M.: Measuring the quality of web content using factual information. Proceedings of the 2nd
Joint WICOW/AIRWeb Workshop on Web Quality - WebQuality ’12 (2012) 7
8. Lipka, N., Stein, B.: Identifying Featured Articles in Wikipedia: Writing Style Matters.
Proceedings of the 19th International Conference on World Wide Web (2010) (2010) 1147–
9. Xu, Y., Luo, T.: Measuring article quality in Wikipedia: Lexical clue model. IEEE Sympo-
sium on Web Society (19) (2011) 141–146
10. Anderka, M.: Analyzing and Predicting Quality Flaws in User-generated Content: The Case
of Wikipedia. Phd, Bauhaus-Universitaet Weimar Germany (2013)
11. Lewoniewski, W., Węcel, K., Abramowicz, W.: Analiza porównawcza modeli jakości informacji w narodowych wersjach Wikipedii. In Porębska-Miąc, T., ed.: Systemy Wspomagania Organizacji SWO 2015. Wydawnictwo Uniwersytetu Ekonomicznego w Katowicach (2015)
12. Wilkinson, D.M., Huberman, B.A.: Cooperation and quality in Wikipedia. Proceedings of the 2007 International Symposium on Wikis - WikiSym '07 (2007) 157–164
13. Kittur, A., Kraut, R.E.: Harnessing the wisdom of crowds in wikipedia. Proceedings of the
ACM 2008 conference on Computer supported cooperative work - CSCW ’08 (2008) 37
14. Arazy, O.: Determinants of Wikipedia Quality : the Roles of Global and Local Contribution
Inequality. New York (2010) 233–236
15. Stein, K., Hess, C.: Does it matter who contributes: a study on featured articles in the german
wikipedia. HT ’07: Proceedings of the eighteenth conference on Hypertext and hypermedia
(2007) 171–174
16. Suzuki, Y., Yoshikawa, M.: Mutual Evaluation of Editors and Texts for Assessing Quality of
Wikipedia Articles. In: Proceedings of the Eighth Annual International Symposium on Wikis
and Open Collaboration. WikiSym ’12, New York, NY, USA, ACM (2012) 18:1–18:10
17. Halfaker, A., Kraut, R., Riedl, J.: A Jury of Your Peers : Quality, Experience and Ownership
in Wikipedia. WikiSym’09 (2009) 1–10
18. Adler, B.T., De Alfaro, L.: A content-driven reputation system for the wikipedia. Proceed-
ings of the 16th international conference on World Wide Web WWW 07 7(Generic) (2007)
19. Lih, A.: Wikipedia as Participatory Journalism: Reliable Sources? Metrics for evaluating
collaborative media as a news resource. 5th International Symposium on Online Journalism
(2004) 31
20. Blumenstock, J.E.: Automatically Assessing the Quality of Wikipedia Articles. Technical
report (2008)
21. Dalip, D.H., Gonçalves, M.A., Cristo, M., Calado, P.: Automatic Assessment of Document
Quality in Web Collaborative Digital Libraries. Journal of Data and Information Quality 2(3)
(2011) 1–30
22. Warncke-wang, M., Cosley, D., Riedl, J.: Tell Me More : An Actionable Quality Model for
Wikipedia. In: WikiSym 2013. (2013) 1–10
23. Lewoniewski, W., Węcel, K., Abramowicz, W.: Analiza porównawcza modeli klasyfikacyjnych w kontekście oceny jakości artykułów Wikipedii. In: VI Ogólnopolska Konferencja Naukowa. Matematyka i informatyka na usługach ekonomii im. Profesora Zbigniewa Czerwińskiego. (in press) (2016)
... Blumenstock (2008) found that the number of words in an article is a strong predictor of whether the article will be featured on Wikipedia. Similarly, a number of studies (Lewoniewski et al., 2016(Lewoniewski et al., , 2017aWarncke-Wang et al., 2013) found that the number of references in an article is a strong indicator of the articles' quality. ...
... Note that we employ multiple simple measures to compare content across different Wikipedia editions. These measures, while admittedly coarse, are simple to interpret, can be computed easily for millions of articles in Wikipedia, and have been shown to correlate well with the quantity and quality of content in Wikipedia (Blumenstock, 2008;Lewoniewski, 2017Lewoniewski, , 2018Lewoniewski et al., 2016Lewoniewski et al., , 2017bWarncke-Wang et al., 2013). It would be instructive to study what additional insights could be gained by adopting more complex content comparison techniques as part of future work. ...
... Wikipedia guidelines recommend editors to support their edits with "good references from independent sources" 14 and an article with significant number of references to support its content is considered to be a Good Article (GA). 15 The number of references in an article has been found to be one of the most important predictors of article quality (Lewoniewski et al., 2016(Lewoniewski et al., , 2017aWarncke-Wang et al., 2013). Comparing the number of references in articles in different languages on the same topic can thus provide insights into the relative quality and rigor of the content about the topic in different editions. ...
Wikipedia is the largest web‐based open encyclopedia covering more than 300 languages. Different language editions of Wikipedia differ significantly in terms of their information coverage. In this article, we compare the information coverage in English Wikipedia (most exhaustive) and Wikipedias in 8 other widely spoken languages, namely Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish, and Turkish. We analyze variations in different language editions of Wikipedia in terms of the number of topics covered as well as the amount of information discussed about different topics. Further, as a step towards bridging the information gap, we present WikiCompare—a browser plugin that allows Wikipedia readers to have a comprehensive overview of topics by incorporating missing information from Wikipedia page in other language.
... Using machine learning techniques it is possible to solve the problem of quality assessment of Wikipedia articles as a classification task. In order to build such models, various features can be taken into the account, for example length of an article, number of references, number of images or sections [30][31][32][33][34][35]. ...
... Readers of encyclopedias must be able to check where the information comes from [60]. Therefore, one of the most commonly used reliability measures is the number of references in a Wikipedia article [28,34,38,[48][49][50]56,58,[61][62][63][64]. References are related to the credibility of the article. ...
... Wikipedia articles must provide information in a fair and impartial manner. In this case, we can take into account information presented graphically-images [28,34,38,47,50,[55][56][57]61,62,65,66]. On the one hand, pictures can help to assess the objectivity of the presented material. ...
Full-text available
On Wikipedia, articles about various topics can be created and edited independently in each language version. Therefore, the quality of information about the same topic depends on the language. Any interested user can improve an article and that improvement may depend on the popularity of the article. The goal of this study is to show what topics are best represented in different language versions of Wikipedia using results of quality assessment for over 39 million articles in 55 languages. In this paper, we also analyze how popular selected topics are among readers and authors in various languages. We used two approaches to assign articles to various topics. First, we selected 27 main multilingual categories and analyzed all their connections with sub-categories based on information extracted from over 10 million categories in 55 language versions. To classify the articles to one of the 27 main categories, we took into account over 400 million links from articles to over 10 million categories and over 26 million links between categories. In the second approach, we used data from DBpedia and Wikidata. We also showed how the results of the study can be used to build local and global rankings of the Wikipedia content.
... Thus, how to ensure article quality has become a critical concern for Wikipedia, and much related work has been done on information quality assessment. Researchers have evaluated Wikipedia articles based on traditional methods (Halfaker, 2017;Lewoniewski et al., 2016;Li et al., 2015), machine learning methods (Dalip et al., 2017;Dang & Ignat, 2016a;Zielinski et al., 2018), and deep learning methods (Dang & Ignat, 2017;Wang & Li, 2020;Zhang et al., 2018). Recently, there has been some research focusing on the automatic classification of articles as being of either high or low quality. ...
Article quality has always been a major concern for Wikipedia. To improve article quality, it is critical to first identify defects. Thus, flaw classification has attracted considerable attention. To achieve this, several machine-learning-based approaches are available, including deep learning models based on either manually constructed or autoextracted features. However, adopting only features of either single type may not ensure a comprehensive description of articles. To improve flaw classification, we propose a feature fusion framework combining both handcrafted and autoextracted features. In this research, we first use a rule-based method from a previously proposed framework to extract handcrafted features. Additionally, we obtain autoextracted features using Bidirectional Encoder Representations from Transformers (BERT) and various deep learning models, including bidirectional long short-term memory (Bi LSTM), bidirectional gated recurrent unit (Bi GRU), bidirectional recurrent neural network (Bi RNN), and multihead self-attention models. Finally, the handcrafted features are standardized and concatenated with the autoextracted features. Then, the concatenated features are fed into a feedforward neural network for classification. A detailed comparison of different classifiers is conducted. We compare 12 different classifiers in terms of training performance, classification performance, and model training time. The experiments show that the proposed feature fusion framework can notably improve the effectiveness of quality flaw classification for Wikipedia articles. In particular, a Bi GRU model based on the proposed framework achieves excellent classification accuracy.
... It is known that language editions of Wikipedia differ in coverage and level of detail. Their size ranges from a few hundred to a few million articles. These differences have been analyzed with respect to various aspects, e.g., topical coverage [1, 9, 15, 28], article quality [24] and neutrality [3, 26, 51, 55], bias related to geography [2] or gender [42], and user behavior [12, 23]. ...
Public knowledge graphs such as DBpedia and Wikidata have been recognized as interesting sources of background knowledge to build content-based recommender systems. They can be used to add information about the items to be recommended and links between them. While quite a few approaches for exploiting knowledge graphs have been proposed, most of them aim at optimizing the recommendation strategy while using a fixed knowledge graph. In this paper, we take a different approach, i.e., we fix the recommendation strategy and observe changes when using different underlying knowledge graphs. Particularly, we use different language editions of DBpedia. We show that the usage of different knowledge graphs does not only lead to differently biased recommender systems, but also to recommender systems that differ in performance for particular fields of recommendations.
... Finally, we observe that one can also attain reasonably good results for term ranking based on frequency. Specifically, a commonly used metric for defining the "importance" of a Wikipedia article is the number of other articles which link to it (though many other metrics exist, e.g., Thalhammer and Rettinger (2016); Lewoniewski et al. (2016)). We find that this metric is strongly correlated with the frequency of the article's title in LNC (Spearman rank correlation 0.77). ...
One of the most impressive human endeavors of the past two decades is the collection and categorization of human knowledge in the free and accessible format that is Wikipedia. In this work we ask what makes a term worthy of entering this edifice of knowledge, and having a page of its own in Wikipedia? To what extent is this a natural product of on-going human discourse and discussion rather than an idiosyncratic choice of Wikipedia editors? Specifically, we aim to identify such "wiki-worthy" terms in a massive news corpus, and see if this can be done with no, or minimal, dependency on actual Wikipedia entries. We suggest a five-step pipeline for doing so, providing baseline results for all five, and the relevant datasets for benchmarking them. Our work sheds new light on the domain-specific Automatic Term Extraction problem, with the problem at hand being a domain-independent variant of it.
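The "number of other articles which link to it" importance metric discussed in the citing passage above is easy to compute from an edge list of wiki links. A minimal sketch follows; the tiny link list is invented purely for illustration:

```python
from collections import Counter

def inlink_importance(edges):
    """Count, for each article, how many other articles link to it.
    `edges` is an iterable of (source, target) wiki-link pairs;
    self-links are ignored so an article cannot boost itself."""
    return Counter(dst for src, dst in edges if src != dst)

# Toy link graph (invented example titles).
links = [
    ("Poland", "Warsaw"), ("Europe", "Warsaw"),
    ("Vistula", "Warsaw"), ("Warsaw", "Poland"),
    ("Warsaw", "Warsaw"),  # self-link, ignored
]
importance = inlink_importance(links)
```

On a real dump the same counter would simply be fed the full pagelinks table; the Counter's `most_common()` then gives an importance ranking directly.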
... internationalization and text input in multiple languages are supported. Although DBpedia uses a linked data approach in order to provide a multilingual mapping between URIs (Auer et al. 2007), the quality of the underlying ontology based on Wikipedia data varies depending on the language (Lewoniewski et al. 2016). This might have an effect on the text extraction quality. ...
The research area of Technology Enhanced Learning brings together the disciplines of learning sciences, pedagogy, and computer science in order to provide mechanisms and (digital) tools to support learning and teaching. The Go-Lab project aims to promote inquiry-based science education using online laboratories. It serves as a toolbox for teachers to create customized learning spaces for scientific experiments that includes a variety of applications that support the inquiry and knowledge construction processes. Research in the learning sciences has found group learning to be supportive for knowledge (co-) construction in inquiry-based learning. Particularly for group learning approaches, the terms heterogeneity and homogeneity have been stretched out in research and practice. It may be considered as common sense that heterogeneous learning groups have the highest knowledge gain. This leads to problematic policies: First, weaker learners benefit from the skills of the better performing students. Consequently, a heterogeneous grouping is almost only helpful for the weaker learners. Second, a stigmatization of weak learners leads to a less inclusive and unbalanced approach in learning and teaching. Moreover, stigmatizing learners prevents finding the reasons for the problems the learners are facing. Aronson (1978) developed the Jigsaw teaching technique to create a more inclusive learning situation, but with the goal criterion to deal with challenges of mixing ethnicities in the classroom due to the desegregation of public schools in the USA in the late 1950s. However, the formation of groups for Jigsaw relied on creating experts that have a distinct knowledge in a certain field. Managing and facilitating knowledge diversity and complementarity seems to be the key in order to create classrooms that are more inclusive. The work presented in this dissertation aims to create and convey methods that support learning and teaching in inquiry-based science education. 
Compared to traditional approaches, a more inclusive learning situation can be created by managing learners’ knowledge diversity. In order to create such support tools, computational methods and architectures from the field of learning analytics have been employed to create a technical infrastructure in Go-Lab. Using this learning analytics architecture, an analysis of the first two years of teachers and students using Go-Lab has been conducted. This analysis posed challenges and requirements for the design of support tools, which have to be integrated into the Go-Lab ecosystem. Based on this technical infrastructure, a general approach to support individual and group learning by facilitating knowledge complementarity has been developed and presented in this work. This framework uses automatic semantic extraction of concepts from learner-generated content to create a shared group knowledge model. Two applications, which facilitate knowledge diversity and complementarity using this approach, have been developed and presented. The “concept cloud app” serves as a cognitive scaffold that interactively visualizes the group knowledge as an open learner model. It uses semantic extraction of concepts from learning artifacts in order to create the model. Furthermore, the “semantic group formation” creates and uses such a shared group knowledge model to form groups with an optimal knowledge complementarity. Several empirical studies have been conducted in schools using Go-Lab and the support tools as a part of this work. As a first study, traditional approaches to form heterogeneous and homogeneous groups based on an operationalization of skills have been explored in the context of IBL. It turned out that, similar to other contexts, heterogeneous groups perform better with respect to the group result and the average learning gain. The subsequent studies have been used to explore the opportunities of knowledge-based approaches.
In a second experiment, the concept cloud app has been presented to learners. The results have shown that this app is an effective cognitive scaffold, which supports the knowledge construction in conjunction with other production tools such as concept mapping. The final study aimed to evaluate the semantic group formation. In addition to the formation, the model and the results of the group formation have been presented to learners as a cognitive group awareness tool. The results indicate that the semantic group formation creates groups with a high knowledge diversity and a relatively even distribution of scores across the groups. Finally, the presentation of knowledge complementarity as a group awareness tool supports learners in structuring their collaboration and the communication when exchanging knowledge.
... It is also important to note that in Wikipedia there are special tools for adding and editing references in articles with limited metadata [22]. There are also publications that take into account the number of references to automatically assess the quality of information in Wikipedia articles [23], including approaches that used machine learning algorithms [24,25] or a synthetic measure [26]. ...
The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to be able to recognize and assess their quality. Only then is it possible to compile a result from the available data that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discussed the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed with Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we programmed the methods and algorithms as code, but presented it in the form of pseudocode in this paper to measure the quality related to objective data quality dimensions such as completeness, correctness, consistency, and timeliness. This was prepared as a macro service so that the users can use the measurement results with the program code to make a statement about their scientific publications metadata so that the management can rely on high-quality data when making decisions.
... The articles were selected from Japanese Wikipedia with the condition of being hyperlinked at least 100 times from other articles in Wikipedia. We also considered the Goodness scoring measures mentioned in (Lewoniewski et al., 2016) to remove some of the less useful articles. The collected dataset contained 120,333 Japanese Wikipedia articles in different areas, covering 141 out of 200 ENE labels. ...
Wikipedia is a great source of general world knowledge, which can help NLP models better ground the predictions they make. We aim to create a large set of structured knowledge, usable for NLP models, from Wikipedia. The first step we take to create such a structured knowledge source is fine-grained classification of Wikipedia articles. In this work, we introduce the Shinara Dataset, a large multi-lingual and multi-labeled set of manually annotated Wikipedia articles in Japanese, English, French, German, and Farsi using the Extended Named Entity (ENE) tag set. We evaluate the dataset using the best models provided for ENE label set classification and show that the currently available classification models struggle with large datasets using fine-grained tag sets.
Online open‐source knowledge repository such as Wikipedia has become an increasingly important source for users to access knowledge. However, due to its large volume, it is challenging to evaluate Wikipedia article quality manually. To fill this gap, we propose a novel approach named “feature fusion‐based stack learning” to assess the quality of Wikipedia articles. Pre‐trained language models including BERT (Bidirectional Encoder Representations from Transformers) and ELMo (Embeddings from Language Models) are applied to extract semantic information in Wikipedia content. The feature fusion framework consisting of semantic and statistical features is built and fed into an out‐of‐sample (OOS) stacking model, which includes both machine learning and deep learning models. We compare the performance of proposed model with some existing models with different metrics extensively, and conduct ablation studies to prove the effectiveness of our framework and OOS stacking. Generally, the experiment shows that our method is much better than state‐of‐the‐art models.
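The out-of-sample stacking this abstract mentions rests on out-of-fold predictions: each base model predicts only for a fold it was not trained on, and those predictions become meta-features for the stacker. The mechanics can be sketched with a deliberately trivial base learner (a training-fold mean predictor) standing in for the BERT/ELMo pipelines; everything here is an illustrative assumption, not the paper's setup:

```python
import numpy as np

def out_of_fold_predictions(x, y, fit_predict, k=3):
    """Build stacking meta-features: for each of k folds, train the base
    learner on the other folds and predict on the held-out fold, so no
    prediction ever comes from a model that saw that example."""
    n = len(y)
    oof = np.empty(n)
    folds = np.array_split(np.arange(n), k)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        oof[held_out] = fit_predict(x[train], y[train], x[held_out])
    return oof

# Dummy base learner standing in for a real model: it ignores the
# features and predicts the training-fold mean of the target.
mean_learner = lambda x_tr, y_tr, x_te: np.full(len(x_te), y_tr.mean())

x = np.arange(6, dtype=float).reshape(6, 1)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
meta = out_of_fold_predictions(x, y, mean_learner, k=3)
```

The meta-learner is then fit on `meta` rather than on in-sample predictions, which is what keeps the stack from overfitting to its own base models.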
Social media today serve various purposes of corporate communication. Among them, public wikis such as Wikipedia are considered a resource that deserves attention. This article studies three variables (visibility, quality, and relevance) to characterize the space that Colombian companies occupy on Wikipedia. To this end, a sample was taken of articles about companies listed on the Bolsa de Valores Colombia (Colombian Stock Exchange), and a cluster-based statistical analysis and a content analysis linked to the subcategories of the RepTrak reputation study were carried out. The research concludes that, despite the availability and potential contributions of the content held in the encyclopedia, in Colombia there is not enough social appropriation in favor of the communicative process with stakeholders.
In this paper we compare the suitability of various classification models (including CART, random forest, boosting trees, C4.5, C5.0, SVM, neural networks) for automatic assessment of the quality of articles in seven language editions of Wikipedia (Belarusian, German, English, French, Polish, Russian, Ukrainian). We employed models available in STATISTICA, WEKA and R Studio. For the classification task we used over 80 different features of the articles, elaborated based on state-of-the-art analysis and our own experience. We also carried out a comparative analysis regarding the significance of the parameters having an impact on the quality of the articles in each language.
Quality of data in DBpedia depends on the underlying information provided in Wikipedia’s infoboxes. Various language editions can provide different information about a given subject with respect to the set of attributes and the values of these attributes. Our research question is which language editions provide correct values for each attribute so that data fusion can be carried out. Initial experiments proved that the quality of attributes is correlated with the overall quality of the Wikipedia article providing them. Wikipedia offers functionality to assign a quality class to an article, but unfortunately the majority of articles have not been graded by the community, or the grades are not reliable. In this paper we analyse the features and models that can be used to evaluate the quality of articles, providing a foundation for the relative quality assessment of infobox attributes, with the purpose of improving the quality of DBpedia.
In this paper, we propose a method to identify good quality Wikipedia articles by mutually evaluating editors and texts. A major approach for assessing article quality is the text survival ratio based approach: when a text survives beyond multiple edits, the text is assessed as good quality. This approach assumes that poor quality texts are deleted by editors with high probability. However, many vandals frequently delete good quality texts, so the survival ratios of good quality texts are improperly decreased by vandals. As a result, many good quality texts are unfairly assessed as poor quality. In our method, we consider editor quality when calculating text quality, and decrease the impact on text quality of vandals, who have low quality. With this improvement, the accuracy of the text quality assessment should improve. However, an inherent problem of this idea is that the editor qualities are in turn calculated from the text qualities. To solve this problem, we mutually calculate the editor and text qualities until they converge. We carried out an experimental evaluation and confirmed that the proposed method can accurately assess text qualities.
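The mutual editor/text evaluation the abstract describes is a fixed-point iteration in the spirit of HITS on the bipartite editor-text graph. A minimal sketch, assuming a simple binary edit matrix and plain L2 normalization (the paper's actual update rules may differ):

```python
import numpy as np

def mutual_quality(edits, iters=100):
    """Iteratively refine editor and text quality scores until they
    stabilize: a text is good if good editors touched it, and an
    editor is good if the texts they touched are good.
    `edits[e][t] = 1` if editor e edited text t."""
    a = np.asarray(edits, dtype=float)
    editor_q = np.ones(a.shape[0])
    for _ in range(iters):
        text_q = a.T @ editor_q                      # texts inherit editor quality
        text_q /= np.linalg.norm(text_q) or 1.0
        editor_q = a @ text_q                        # editors inherit text quality
        editor_q /= np.linalg.norm(editor_q) or 1.0
    return editor_q, text_q

# Toy history: editor 0 edits texts 0 and 1, editor 1 edits text 1,
# editor 2 (an isolated, possibly vandalistic account) edits only text 2.
edits = [[1, 1, 0],
         [0, 1, 0],
         [0, 0, 1]]
editor_q, text_q = mutual_quality(edits)
```

After convergence, text 1 (edited by both well-connected editors) scores highest, while the isolated editor-text pair decays toward zero, illustrating how the mutual calculation dampens the influence of low-quality editors.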
Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. The identification of low-quality information is an important task since for a huge number of people around the world it has become a habit to first visit Wikipedia in case of an information need. Existing research on quality assessment in Wikipedia either investigates only small samples of articles, or else deals with the classification of content into high-quality or low-quality. This thesis goes further: it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows: (1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw. (2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and, (c) the extent of flawed content. (3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing up to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved, and, (b) how the handling and the perception of flaws have changed over time.
(4) We are the first to operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
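The one-class formulation in point (4) can be illustrated with a deliberately simple stand-in: fit a centroid and a distance threshold on flawed articles only, then flag new articles that fall inside the threshold. This is not the thesis's tailored flaw model, just a sketch of the one-class idea, with synthetic feature vectors standing in for real article features:

```python
import numpy as np

def fit_one_class(flawed_features, quantile=0.95):
    """Learn a one-class model from flawed articles only: the class
    centroid plus a distance threshold covering most training points.
    No non-flawed examples are needed, which is the point of the
    one-class setting."""
    center = flawed_features.mean(axis=0)
    dists = np.linalg.norm(flawed_features - center, axis=1)
    return center, np.quantile(dists, quantile)

def has_flaw(features, center, threshold):
    """Flag an article as flawed if it lies within the learned region."""
    return np.linalg.norm(features - center) <= threshold

# Synthetic training set: 200 'flawed' articles with 4 features each.
rng = np.random.default_rng(1)
flawed = rng.normal(0.0, 1.0, size=(200, 4))
center, thr = fit_one_class(flawed)
```

Real one-class learners (e.g. one-class SVMs) learn far more flexible boundaries, but the training discipline, fitting on members of the target class only, is the same.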
Conference Paper
In this paper we address the problem of developing actionable quality models for Wikipedia, models whose features directly suggest strategies for improving the quality of a given article. We first survey the literature in order to understand the notion of article quality in the context of Wikipedia and existing approaches to automatically assess article quality. We then develop classification models with varying combinations of more or less actionable features, and find that a model that only contains clearly actionable features delivers solid performance. Lastly we discuss the implications of these results in terms of how they can help improve the quality of articles across Wikipedia.
Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.
Wikipedia is the most entry-abundant online encyclopedia. Studies published in Nature showed that the scientific entries in Wikipedia are of good quality, comparable to those in the Encyclopedia Britannica, which are mainly maintained by experts. However, the manual grading of Wikipedia articles by WikiProjects implies that high-quality articles are usually reached grade by grade through repeated revision. Much work has therefore addressed automatically measuring article quality in Wikipedia based on assumptions about the relationship between article quality and contributors' reputations, view behaviors, article status, inter-article links, and so on. In this paper, a lexical clue based measuring method is proposed to assess article quality in Wikipedia. The method is inspired by the idea that good articles have more regular statistical features of lexical usage than primary ones, owing to more revisions by more people. We select 8 lexical features derived from statistics on word usage in articles as factors that can reflect article quality in Wikipedia. A decision tree is trained based on the lexical clue model. Using the decision tree, our experiments on a well-labeled collection of 200 Wikipedia articles show that our method achieves more than 83% precision and recall.
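A couple of the lexical-usage statistics such a method might compute can be sketched in plain Python. The two features below, average sentence length and type-token ratio, are illustrative guesses at the kind of statistics involved, not the eight features the paper actually uses:

```python
import re

def lexical_features(text):
    """Compute simple lexical-usage statistics of an article:
    average sentence length in words, and the type-token ratio
    (vocabulary richness = unique words / total words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    avg_sentence_len = len(words) / max(len(sentences), 1)
    type_token_ratio = len(set(words)) / max(len(words), 1)
    return {"avg_sentence_len": avg_sentence_len,
            "type_token_ratio": type_token_ratio}

feats = lexical_features("Wikipedia is free. Wikipedia is large. It grows.")
```

Feature vectors like this, computed over a labeled article collection, are exactly what a decision tree learner would be trained on in the approach the abstract describes.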
Wikipedia is an Internet-based, user contributed encyclopedia that is collaboratively edited, and utilizes the wiki concept – the idea that any user on the Internet can change any page within the Web site, even anonymously. Paradoxically, this seemingly chaotic process has created a highly regarded reference on the Internet. Wikipedia has emerged as the largest example of participatory journalism to date – facilitating many-to-many communication among users editing articles, all working towards maintaining a neutral point of view — Wikipedia's mantra. This study examines the growth of Wikipedia and analyzes the crucial technologies and community policies that have enabled the project to prosper. It also analyzes Wikipedia's articles that have been cited in the news media, and establishes a set of metrics based on established encyclopedia taxonomies and analyzes the trends in Wikipedia being used as a source.