
Power of the Few vs. Wisdom of the Crowd: Wikipedia and the Rise of the Bourgeoisie

Aniket Kittur
University of California,
Los Angeles
Los Angeles, CA 90095 USA
nkittur@ucla.edu
Ed Chi, Bryan A. Pendleton,
Bongwon Suh
Palo Alto Research Center Inc.
Palo Alto, CA 94304 USA
{echi, bp, suh}@parc.com
Todd Mytkowicz
University of Colorado at
Boulder
Boulder, CO 80309 USA
mytkowit@colorado.edu
ABSTRACT
Wikipedia has been a resounding success story as a
collaborative system with a low cost of online participation.
However, it is an open question whether the success of
Wikipedia results from a “wisdom of crowds” type of effect
in which a large number of people each make a small
number of edits, or whether it is driven by a core group of
“elite” users who do the lion’s share of the work. In this
study we examined how the influence of “elite” vs.
“common” users changed over time in Wikipedia. The
results suggest that although Wikipedia was driven by the
influence of “elite” users early on, more recently there has
been a dramatic shift in workload to the “common” user.
We also show the same shift in del.icio.us, a very different
type of social collaborative knowledge system. We discuss
how these results mirror the dynamics found in more
traditional social collectives, and how they can influence
the design of new collaborative knowledge systems.
Author Keywords
Wikipedia, Wiki, collaboration, collaborative knowledge
systems, social tagging, delicious.
ACM Classification Keywords
H.5.3 [Information Interfaces]: Group and Organization
Interfaces - Collaborative computing, Web-based
interaction, Computer-supported cooperative work; H.3.5
[Information Storage and Retrieval]: Online Information
Systems; K.4.3 [Computers and Society]: Organizational
Impacts – Computer-supported collaborative work.
INTRODUCTION
Wikipedia is an online collaborative encyclopedia whose
most distinctive feature has been its low cost of
participation -- users do not even have to register to
contribute. This openness to new users has been cited as
both a source of strength and weakness [6]. Despite or
because of it, Wikipedia has grown exponentially in users
and information since 2002 [14] and has been highlighted
as a success story of low-cost collaborative knowledge
systems.
The distinctive openness of Wikipedia suggests that one of
its key strengths lies in attracting contributions from new
users who may make few edits. This points to a kind of
“wisdom of crowds” effect [12], in which a large number
of people making small contributions can create a quality
product.
However, many prominent Wikipedians argue that a small
number of prolific users, rather than a large crowd, are the
driving force behind the success of Wikipedia. For
example, Jimmy Wales, the founder of Wikipedia, argues
that most of the work on Wikipedia is done by a small
number of users, citing that as of December 2004, 2.5% of
the registered users on the site made half of the edits [15].
In a Sept. 4, 2006 post to his blog [11], Aaron Swartz
published the results of his study of several Wikipedia
articles, suggesting that, measured by the change in
content of each edit, less-active users of Wikipedia are
actually creating much of the text in these articles.
Swartz’s blog entry was slashdotted and only deepened
the mystery of who really writes Wikipedia. Is it the work
of a few elites or is it the wisdom of the crowd?
Who does the work in Wikipedia has important
implications both for the allocation of resources within
Wikipedia and for the design of novel collaborative
knowledge systems. Jimmy Wales has been quoted as
saying “I spend a lot of time listening to those four or five
hundred” top users [11], suggesting that the development of
tools and features within Wikipedia may be targeted at the
user groups that are most influential. Similarly, when
designing a collaborative knowledge system it is important
to predict who will be using the system for what purposes,
and to make design decisions and feature choices that
support important users.
In this study we examine the distribution of work in
Wikipedia over time to answer the question of who does the
work in Wikipedia. We examine “elite” vs. “common” user
contributions over time, with the elite defined either by
status (administrators) or by participation level (high-edit
users). Two different metrics (number of edits and change
in content) provide converging evidence.
Finally, to see whether the results found on Wikipedia
generalize, we examine del.icio.us, a very different type of
collaborative knowledge system.
RELATED WORK
A number of studies have quantified the growth of
Wikipedia as a network or graph [1][2][19]. These studies
suggest that the dynamics of Wikipedia are consistent with
those typically found in complex networks. They also find
many characteristics in common across Wikipedias in
different languages [19], and with the structure of the
World Wide Web [1][2][19].
Voss showed that the content on Wikipedia has been
growing exponentially since 2002 [14], whether measured
by articles, words, links, bytes, or users (though he only
examined two classes of users: those making more than 5
edits in a month or more than 100 edits in a month). He
also showed that the number of unique authors per article
follows a power law, as does the number of articles per
author. Interestingly, these measures also appear consistent
across Wikipedias of different languages (though with
slightly different parameters), suggesting similar underlying
generation processes.
Buriol et al. [1] found article and user growth consistent
with Voss’ findings. They additionally characterized user
edits over time, showing that the average number of edits
peaked in January 2003, and has been steadily declining
since then. However, this analysis was aggregated across
all users, precluding a more detailed breakdown.
Viegas et al. studied the edit patterns of articles through
“history flow visualizations” [13]. In this technique they
visualized how article edit histories changed, identifying
sections of articles that changed or remained constant over
time. They also examined the growth of 273 articles in
Wikipedia, showing that only 21% of edits reduced the size
of a page, with 6% reducing by more than 50 characters.
However, their data was collected using the May 2003
Wikipedia; as we shall describe below, much has changed
since then.
METHODS
In the following analyses, we used a history dump of the
English Wikipedia that was generated on 7/2/2006. The
dump included over 58 million revisions, from more than
4.7 million wiki pages, of which 2.4 million are article-
related entries in the encyclopedia. To process this data, we
imported the raw text into the Hadoop [7] distributed
computing environment running on a cluster of commodity
machines, while importing the structure into a clone of
Wikipedia’s own databases for direct analysis. The Hadoop
infrastructure allowed us to quickly explore new content
analysis techniques with minimal code optimization time,
and the database allowed us to inspect Wikipedia
statistics in their native format.
To calculate the work done while editing an article, we
calculated both the number of edits made and the change in
content between edits. We model change as the number of
words added and removed, as calculated by a traditional
“diff” operation [9]. However, we used words as units
instead of lines, allowing greater precision than previous
studies (e.g., in [13], where the change of a comma would
count an entire line as different). For both measures we
aggregated edits over all 58+ million revisions, grouping by
time and user participation level. User participation level
was calculated based on the total number of edits made by a
user.
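A word-level diff of the kind described above can be sketched with Python's standard-library `difflib` module. This is an illustration under stated assumptions, not the authors' code: the function name `word_diff_counts` is hypothetical, words are assumed to be whitespace-separated tokens, and `difflib.SequenceMatcher` uses a different matching heuristic from the classic diff algorithm cited in [9], so counts may differ slightly on long or repetitive texts.

```python
import difflib


def word_diff_counts(old_text: str, new_text: str) -> tuple[int, int]:
    """Return (words_added, words_removed) between two revisions.

    Tokenization is plain whitespace splitting, an assumption; the paper
    does not specify how words were delimited.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False stops difflib from ignoring very frequent words
    # (such as "the") when comparing long revisions.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" span counts both as removed old words and added new words.
        if tag in ("delete", "replace"):
            removed += i2 - i1
        if tag in ("insert", "replace"):
            added += j2 - j1
    return added, removed


if __name__ == "__main__":
    before = "Paris is the capital of France."
    after = "Paris is the capital and largest city of France."
    print(word_diff_counts(before, after))  # (3, 0): three words inserted
```

Because the units are words, changing `France.` to `France,` registers as one word removed and one word added, rather than an entire changed line.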
ANALYSIS
Rise and Fall of Admins’ Influence
We first examined the influence of Wikipedia
administrators (admins). Admins consist of a small group
of power users who have gone through a stringent peer
selection process and can perform more types of actions
than a regular user, such as temporarily blocking a page
from being edited. Admins typically have an established
track record of heavy editing and commitment to improving
Wikipedia. In our Wikipedia data, there were 967 admins
averaging 12,280 edits each. The admins represent an
interesting “elite” group for these reasons: there are
relatively few of them; they have a strong record of editing;
and they have been peer-selected as belonging to a class
trusted with more power than a normal user.
[Plot: proportion of total edits made by admins (y-axis, 0 to 1) by month, 2001 to 2006.]
Figure 1. Percentage of total edits made by admins.
For each month in Wikipedia’s history, we calculated
admin influence as the number of edits made by admins
divided by the total number of edits made in that month.
Figure 1 shows the percentage of edits made by admins out
of the total edits in Wikipedia. The figure shows a rise in
the percentage of total edits made by admins to a peak of
59% of total edits in late 2002. This period of high
influence lasted until 2004, at which time the data shows a
decline in the percentage of edits made by admins that
continues through the latest 2006 data, to a low of 10% of
total edits.
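The monthly admin-influence measure (admin edits divided by all edits in that month) amounts to two counters keyed by month. The sketch below assumes revisions are available as hypothetical `(month, user)` pairs with months written as `"YYYY-MM"` strings, and that admin status is a fixed set of names taken from the dump; `monthly_share` is a hypothetical name, not the authors' code.

```python
from collections import Counter
from typing import Callable, Iterable


def monthly_share(
    revisions: Iterable[tuple[str, str]],
    is_elite: Callable[[str], bool],
) -> dict[str, float]:
    """Fraction of each month's edits made by users for whom is_elite(user) holds.

    revisions: (month, user) pairs, months as "YYYY-MM" strings (assumed format).
    """
    total = Counter()
    elite = Counter()
    for month, user in revisions:
        total[month] += 1
        if is_elite(user):
            elite[month] += 1
    # "YYYY-MM" strings sort chronologically; months with no elite edits get 0.0.
    return {month: elite[month] / total[month] for month in sorted(total)}


if __name__ == "__main__":
    admins = {"alice"}  # hypothetical admin list taken from the dump
    revs = [("2002-11", "alice"), ("2002-11", "bob"),
            ("2002-11", "alice"), ("2002-12", "carol")]
    print(monthly_share(revs, lambda user: user in admins))
```

Passing the predicate as a parameter lets the same function compute the share for admins or for any edit-count class.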
Why is there such a dramatic decline in the proportion of
edits made by administrators?
Some Hypotheses for the Phenomenon
Decrease in number of admins’ edits
One possibility is that this decline in admins’ influence is
driven by a decrease in the absolute number of edits made
by admins. For example, admins may have a limited
lifespan on Wikipedia and the decline could be a result of
fewer admins making edits, or the same number of admins
making fewer edits. To answer this question we calculated
the number of edits made per month by admins. Figure 2
shows that the number of edits made by admins per month
has been steadily rising. Although there is a dropoff in the
graph toward the end in 2006, this cannot account for the
dramatic decline which began in 2004.
[Plot: total edits made by admins per month (y-axis, 0 to 800,000), 2001 to 2006.]
Figure 2. Number of edits per month made by admins.
This admin edit dropoff is an intriguing trend that merits
further study. However, we believe that it may merely
reflect the start-up time associated with becoming an
admin. That is, some of the admins whose edits would
contribute to that part of the curve will not attain admin
status until sometime in the future, and so their edits are not
yet counted in the graph. For example, a user who joined in
February 2006 is unlikely to have become an admin by July
2006, the date of our latest data. We could not count this
user’s edits as admin edits, even though she might become
an admin later.
Bots made maintenance easier
Another potential reason for the decline in admin edits is a
reduction in the maintenance workload for administrators.
There have been a number of automated bots created for
use in Wikipedia which help with maintenance functions
such as identifying and reverting vandalism and spam [18].
If these bots are taking over some of the workload that
previously had to be done by admins, that might account for
the decline in edit percentage seen in Figure 1. However,
Figure 3 shows that this is not
the case. The percentage of edits made by bots is fairly low
and does not fit the declining admin pattern. Furthermore,
the percentage of vandalism in Wikipedia does not appear
to be decreasing [4].
[Plot: proportion of total edits made by bots (y-axis, 0 to 1) by month, 2001 to 2006.]
Figure 3. Percentage of total edits made by bots.
RISE OF THE CROWD
From the data above, the rise and decline of the percentage
of edits made by admin users is a phenomenon that is not
explained by a decrease in admin editing or workload.
Instead, it suggests the hypothesis that the decline could be
due to a rise in the number of edits made by non-admins,
which would support the idea of the growing influence of
the masses. In the following, we use a different way of
analyzing the distribution of work done in Wikipedia in
order to test whether this is truly the case.
While the previous analyses dealt with the administrator
user class, there are some advantages that can be gained by
creating user classes by a different metric; specifically, the
total number of edits made by a user. First, this allows us
to verify that the rise and decline in influence found in the
admin group applies to “elite” users and is not an artifact of
being an admin. Second, this provides a data-driven metric
which is not dependent on particularities of the admin
selection process.
We classified users into one of five groups based on the
total number of edits they made in Wikipedia: more than
10,000 edits (10k+); between 5,001 and 10,000 edits (5-
10k); between 1,001 and 5,000 edits (1-5k); between 101
and 1000 edits (100-1k); and 100 or fewer edits (100-). We
then calculated the percentage of total edits that each group
made.
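The five participation classes can be expressed as a threshold function over each user's lifetime edit count. The names `edit_class` and `classify_users` are hypothetical; the labels follow the figure legends, with `<100` standing for the "100 or fewer" class defined in the text.

```python
from collections import Counter
from typing import Iterable


def edit_class(total_edits: int) -> str:
    """Map a user's lifetime edit count to one of the five classes in the text."""
    if total_edits > 10_000:
        return "10k+"
    if total_edits > 5_000:
        return "5-10k"   # 5,001 to 10,000
    if total_edits > 1_000:
        return "1-5k"    # 1,001 to 5,000
    if total_edits > 100:
        return "100-1k"  # 101 to 1,000
    return "<100"        # 100 or fewer edits


def classify_users(revision_authors: Iterable[str]) -> dict[str, str]:
    """Count each user's edits over the whole dump, then assign a class."""
    totals = Counter(revision_authors)
    return {user: edit_class(n) for user, n in totals.items()}


if __name__ == "__main__":
    authors = ["ann"] * 12_000 + ["bo"] * 250 + ["cy"]
    print(classify_users(authors))  # {'ann': '10k+', 'bo': '100-1k', 'cy': '<100'}
```

Because the class is computed from totals as of the dump date, a user who joined shortly before July 2006 can only appear in a low class; this is the same lag effect noted above for admins.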
These percentages are shown in Figure 4. Importantly, the
same pattern of rise, dominance, and decline as seen in the
admins appears for the user class with the most edits (10k+)
– the expert “elite”. The decline of the “elite” users appears
to be accompanied by an increase in the percentage of edits
made by users with fewer than 100 edits – the novice
“masses”.
[Stacked plot: % of total edits (0 to 100%) by month, 2001 to 2006, for user classes <100, 100-1k, 1-5k, 5-10k, and 10k+.]
Figure 4. Percentage of total edits made by users with
differing editing levels.
A different view of the interactions between groups can be
seen in Figure 5, which shows the raw number of edits
made by each user group per month. The number of edits
made per month by each group increases over time to 2006.
From this plot it is possible to see that the number of edits
made by users with fewer than 100 edits has been growing
much faster than the growth of the 10k+ group (or, indeed,
any other group).
[Plot: edits per month (0 to 1,800,000), 2001 to 2006, for user classes 10k+, 5-10k, 1-5k, 100-1k, and <100.]
Figure 5. Number of edits per month made by users with
differing editing levels.
Note that the decline in high-edit users’ influence cannot be
explained by a decrease in their absolute activity: their edit
rate increases from 2004 through 2006, while their proportion
of edits declines. This is consistent with the admin data
above.
The above analyses demonstrate that the rise in edits by
users with fewer than 100 edits is driving the declining
influence of high-edit users. However, what accounts for
the rise in edits by the low-edit group? Is this growth
due to an increase in the population of low-edit
users, or does it mark a shift in their editing pattern?
The editing rate for each user group is shown in Figure 6
(essentially, Figure 5 normalized by the number of users per
group per month). The average number of edits per month
for each user group appears to be relatively stable for much
of the history of Wikipedia. While the low-edit group lines
are bounded in their possible range (e.g., the group with
fewer than 100 edits could not make an average of 100 or
more edits per month), they are remarkably flat throughout.
The 10k+ group also shows a non-decreasing pattern,
providing further evidence that their decline in influence is
not a result of a decline in absolute activity.
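Figure 6 divides each group's monthly edit count by the number of users in that group. The text does not say whether every member of a group or only those active in a given month is counted; the sketch below assumes the latter (users with at least one edit that month). `edits_per_user` and its input format are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def edits_per_user(
    revisions: Iterable[tuple[str, str]],
    user_class: dict[str, str],
) -> dict[tuple[str, str], float]:
    """Average edits per active user for each (month, class) pair.

    Assumes a user counts toward a month's denominator only if they made
    at least one edit that month; the paper does not say which users it counted.
    """
    edits = defaultdict(int)
    active = defaultdict(set)
    for month, user in revisions:
        key = (month, user_class[user])
        edits[key] += 1
        active[key].add(user)
    return {key: edits[key] / len(active[key]) for key in edits}


if __name__ == "__main__":
    classes = {"a": "10k+", "b": "<100"}  # e.g. output of a classification step
    revs = [("2005-01", "a"), ("2005-01", "a"), ("2005-01", "b"), ("2005-02", "a")]
    print(edits_per_user(revs, classes))
```

Dividing by the active population rather than the whole group keeps the rate from being dragged down by members who have stopped editing; with the other definition the lines in Figure 6 could look different.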
Figure 7 shows the raw population growth for each user
group. All groups show exponential population growth,
with a small leveling out of high-edit groups in 2006 that
likely reflects the lag in a user being counted as part of that
group.
[Plot: average edits per user per month (log scale, 1 to 10,000), 2001 to 2006, for user classes 10k+, 5-10k, 1-5k, 100-1k, and <100.]
Figure 6. Average number of edits per user per month.
[Plot: number of users (log scale, 1 to 1,000,000), 2001 to 2006, for user classes 10k+, 5-10k, 1-5k, 100-1k, and <100.]
Figure 7. Population growth for each user group.
However, plotting the percentage of the total population
made up by each user group shows that the low-edit group
is increasing in size faster than the high-edit group (Figure
8). This is consistent with and accounts for the growth in
total edits made by the low-edit group, and the proportional
decline of edits made by the high-edit group.
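The argument of the last two paragraphs can be made explicit with a small decomposition (our notation, not the authors'). Let $N_g(t)$ be the number of users in group $g$ in month $t$, $r_g(t)$ their average edits per user per month (Figure 6), and $p_g(t) = N_g(t) / \sum_h N_h(t)$ the group's share of the user population (Figure 8). The group's share of all edits (Figure 4) is then:

```latex
S_g(t) = \frac{N_g(t)\, r_g(t)}{\sum_h N_h(t)\, r_h(t)}
       = \frac{p_g(t)\, r_g(t)}{\sum_h p_h(t)\, r_h(t)},
\qquad
\frac{S_{\text{10k+}}(t)}{S_{\text{<100}}(t)}
       = \frac{p_{\text{10k+}}(t)}{p_{\text{<100}}(t)}
         \cdot \frac{r_{\text{10k+}}(t)}{r_{\text{<100}}(t)}
```

If the per-user rates stay roughly flat, as Figure 6 suggests, the ratio of the two groups' edit shares moves only with the ratio of their population shares. For instance, if $p_{\text{<100}}$ doubles while $p_{\text{10k+}}$ and both rates stay fixed, $S_{\text{10k+}}/S_{\text{<100}}$ halves, even though no one edits less per person.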
[Plot: % of users (0 to 35%), 2001 to 2006, for user classes <100, 100-1k, 1-5k, 5-10k, and 10k+.]
Figure 8. Percentage of users in each user group over time.
CHANGE IN EDIT CONTENT
The previous analyses looked at the number of edits made
by different types of users. However, an issue with these
analyses is that edits themselves could differ greatly in the
amount of changes to an article. By counting each edit
instead of the length of each edit, we effectively treat, say,
the deletion of a comma as equivalent to the addition of
three paragraphs of text. Thus to characterize the amount
and kinds of work done by different user types we need to
analyze the change in content of each edit. Using
distributed processing we were able to calculate the change
in content for all 58+ million revisions on a word-by-word
basis (see Methods for more details).
We first analyzed changes in content length made by
admins. The percentage of words changed by admins out
of the total changed words is shown in Figure 9. This
shows that the share of words changed by admins peaked
in mid-2002 at 63% of all changed words, but then declined
to 13% in the current data. Thus it appears consistent with
the data on raw edits shown in Figure 1. However, if we
discount the 2006 data due to the lag effect described
earlier, it looks like the percentage of words changed by
admins during 2005 remained stable at approximately 30%.
This is in marked contrast to the percentage of total edits
made by admins, which declined steadily from about 30%
to 10% during 2005 (see Figure 1).
[Plot: proportion of changed words attributed to admins (y-axis, 0 to 1) by month, 2001 to 2006.]
Figure 9. Proportion of words changed by admins.
Figure 10 shows the reason for this difference. The number of
words changed per month by admins increased sharply in the
2005-2006 period (again, the drop in 2006 is
likely due to the lag effect). Thus, while the number of
edits made by admins did not keep pace with the number
made by other users, the average number of words changed
per month made up for it, and resulted in what looks like a
stable period.
[Plot: average changed words per month by admins (y-axis, 0 to 120,000,000), 2001 to 2006.]
Figure 10. Average words changed per month by admins.
We also analyzed the data using the data-driven breakdown
of users described earlier. Figure 11 shows the distribution
of changed words over time as a function of user editing
levels. The overall rise and decline of elite (10k+) user
influence (from a peak of about 50% to the latest level of
near 30%) is consistent with the trend found in Figure 4.
However, like the analysis of the admins above, the
percentage of work as measured by changed words remains
higher than measured by total edits, remaining stable at
about 30% during 2005.
[Stacked plot: % of changed words (0 to 100%) over time for user classes <100, 100-1k, 1-5k, 5-10k, and 10k+.]
Figure 11. Percentage of changed words in edits made by users
with differing editing levels.
The average number of words changed per month is shown
in Figure 12. Comparing this graph to Figure 5 shows that,
remarkably, the number of words changed by elite users has
kept up with changes made by novice users, even though
the number of edits made by novice users has grown
proportionately faster.
[Plot: changed words per month (0 to 1,800,000), 2001 to 2006, for user classes 10k+, 5-10k, 1-5k, 100-1k, and <100.]
Figure 12. Average words changed per month.
The above data demonstrate that the rise and decline of the
influence of elite users found above does not depend on the
type of metric used (either percentage of edits or percentage
of changed content). However, while the percentage of
edits declined sharply in the 2005-2006 period, the
percentage of changed content has remained remarkably
stable. Thus though their influence may have waned in
recent years, elite users appear to continue to contribute a
sizeable portion of the work done in Wikipedia.
Furthermore, based on the above data, edits by elite users
appear to be substantial in nature. That is, they appear to be
doing more than just fixing spelling errors or reformatting
citations. One possibility is that they simply revert more
often than other users; although a revert takes only a few
clicks, it can register as many changed words. However, an
analysis removing revert edits does not substantially change
the findings.
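The text does not describe how reverts were identified. One common approach, assumed here rather than taken from the paper, is to treat a revision as an identity revert when its full text exactly matches an earlier revision of the same page; hashing each text keeps the comparison cheap. The function name `flag_identity_reverts` is hypothetical.

```python
import hashlib


def flag_identity_reverts(page_revisions: list[str]) -> list[bool]:
    """Flag revisions whose text exactly restores an earlier revision of the page.

    page_revisions: one page's revision texts in chronological order.
    """
    seen = set()
    previous = None
    flags = []
    for text in page_revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Restoring an older version is a revert; repeating the immediately
        # preceding text changes nothing and is not counted.
        flags.append(digest in seen and digest != previous)
        seen.add(digest)
        previous = digest
    return flags


if __name__ == "__main__":
    history = ["v1", "v2 vandal", "v1", "v1 fixed"]
    print(flag_identity_reverts(history))  # [False, False, True, False]
```

Revisions flagged `True` could be dropped before computing word-level changes. The example also shows why reverts inflate word counts: the restoring revision's diff against the vandalized version touches every word the vandal changed, even though the revert took only a few clicks.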
Another question is how users at different editing levels differ
in the type of edits they make. Swartz proposed that
although elite users make many edits, novice users are the
ones contributing most of the new content [11]. In contrast,
Wales suggests that elite users drive content creation while
contributions from novice users tend to be more of the
spelling error fixing variety [11]. We examined this issue
by separately counting the total number of words added and
deleted by different user types. The ratios of words added
to words removed per revision are shown in Figure 13. As
the user participation level increases, the ratio also rises,
with novice (<100 edit) users adding 0.86 words for every
word removed, while elite and admin users have much higher
ratios (1.81 and 1.76, respectively). These data
suggest that the more experienced the user, the more
content is contributed. Indeed, novice users appear to
remove more content than they create. While this does not
mean that their contributions are not valuable (removing
unnecessary or low quality content can be an effective way
of improving quality), it does suggest that experienced
users tend to add more new content than novice users.
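A per-class added:removed ratio can be computed from per-revision word counts such as those produced by a word-level diff. The paper does not say whether it averages per-revision ratios or divides summed counts; the sketch assumes summed counts, because a per-revision ratio is undefined for edits that remove nothing. `added_removed_ratio` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable


def added_removed_ratio(
    edits: Iterable[tuple[str, int, int]],
) -> dict[str, float]:
    """Total words added divided by total words removed, per user class.

    edits: (user_class, words_added, words_removed) for each revision.
    Classes that never remove a word are omitted, since the ratio is undefined.
    """
    added = defaultdict(int)
    removed = defaultdict(int)
    for cls, words_added, words_removed in edits:
        added[cls] += words_added
        removed[cls] += words_removed
    return {cls: added[cls] / removed[cls] for cls in added if removed[cls] > 0}


if __name__ == "__main__":
    sample = [("<100", 4, 5), ("<100", 0, 2), ("10k+", 9, 5)]
    print(added_removed_ratio(sample))
```

A ratio below 1, as reported for the <100 class, means the class removes more words than it adds in aggregate.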
[Bar chart: ratio of added to removed words (0 to 2) for user classes <100, 100-1k, 1-5k, 5-10k, >10k, and Admins.]
Figure 13. Ratio of words added to words removed per
revision for different user classes.
SHIFTS IN OTHER ONLINE SYSTEMS: DEL.ICIO.US
Is the rise and decline of elite users specific to Wikipedia or
is it a more general phenomenon found in growing
collaborative knowledge systems? To address this question
we examined the distribution of work over time in another
social collaborative system: del.icio.us.
Del.icio.us is a popular site on which users bookmark web
pages using free-form tags rather than fixed categories.
A web page can carry multiple tags, and a tag can be
associated with multiple web pages, unlike in a traditional
classification scheme. The social nature of del.icio.us
arises from users’ ability to see what others have tagged.
They can also see the most popular pages overall or for
specific tags, leading to an impromptu ranking system for
highly tagged pages.
A key difference between del.icio.us and Wikipedia is that
the former does not promote direct interaction between
users; instead, its power derives from the aggregation of
many users’ individual data. As such it is an interesting
contrast case to the high degree of interaction found in
Wikipedia.
We examined the distribution of work over time in
del.icio.us as measured by the number of bookmarks added
per user. As in the earlier analysis, users were split into
classes based on their total number of bookmarks added.
Figure 14 shows the percentage of bookmarks made by
different user classes. As in Wikipedia, we see a marked
decline in the percentage of bookmarks made by the
highest-bookmark class, from a high of 78% to a low of 27%
in the latest data (June 2006). There is a corresponding rise
in the lowest-bookmark class, from a low of 3% to the current high of 31%.
Note that del.icio.us shows only a steady decline in the
influence of elite users, with no initial rise as seen in
Wikipedia. This is an intriguing difference that merits
further study.
[Stacked plot: % of total bookmarks (0 to 100%), 1/04 to 1/06, for user classes by total bookmarks: 0-161, 161-271, 271-415, 415-628, and 628+.]
Figure 14. Percentage of bookmarks made by different user
classes in del.icio.us.
Figure 15 shows the number of bookmarks per week for the
different user classes. This figure is evidence that, like
Wikipedia, the steady decline of elite user influence is not
due to a decrease in their participation: the highest-
bookmark users continue to increase in participation
throughout the years. (The dip in 2006 is likely due to lag
effects in amassing enough bookmarks to be considered
part of the elite group, just as we saw in Wikipedia. It
cannot account for the continued decline of elite influence
since 2004.) Instead, the effect appears to be driven by the
growth in low participation users. Thus although
del.icio.us, like Wikipedia, continues to grow, there is a
dramatic shift in influence from the power of the few to the
rise of the crowds.
[Plot: bookmarks per week (0 to 250,000), 1/04 to 1/06, for user classes 0-161, 161-271, 271-415, 415-628, and 628+.]
Figure 15. Number of bookmarks per week for different user
classes.
DISCUSSION
Although the population and content of Wikipedia appear to
be in continued exponential growth, a closer look revealed a
major shift in the distribution of work in the system. We
discovered an initial rise and subsequent decline in the
influence of “elite” users. This result held true whether
elite users were defined by peer-selected groups
(administrators) or data-driven groups (high-edit users).
We demonstrated that this decline was not due to a decrease
in elite user activity or to shifts in user group editing
patterns, but instead was driven by marked growth in the
population of low-edit users – the rise of the bourgeoisie.
These results were consistent whether the data were
analyzed by edit count or by the actual change in content.
We also examined del.icio.us, a social collaborative
bookmarking site which has also experienced tremendous
growth. Again we discovered a shift in the distribution of
work from the elite (high bookmark) to the novice (low
bookmark) users. This raises the intriguing hypothesis that
this change of influence over time may be a typical
phenomenon of online collaborative knowledge systems
and may occur despite what appears to be constant
continued overall growth.
One way of viewing the shift in influence from elite users
to novice users is as a process of technology adoption [10].
Elite users are the early adopters who select and refine the
technology. They are followed by a majority of novice
users who begin to be the primary users of the system.
However, collaborative products like Wikipedia are
different from traditional technology products in that the
product itself changes as a direct result of adoption. That
is, the end user who begins participating in Wikipedia
immediately has an effect on it. In this sense collaborative
products resemble dynamic social systems more than fixed
products, as they are in a state of constant change based on
the prevailing opinions of the population.
For such systems to spread, early participants must generate
sufficient utility in the system for the larger masses to find
value in low cost participation. Like the first pioneers or
the founders of a startup company, the elite few who drove
the early growth of Wikipedia generated enough utility for
it to take off as a more commons-oriented production
model; without them, it is unlikely that Wikipedia would
have succeeded. Just as the first pioneers built
infrastructure which diminished future migration costs, the
early elite users of Wikipedia built up enough content,
procedures, and guidelines to make Wikipedia into a useful
tool that promoted and rewarded participation by new users.
To carry the analogy further, as emerging social systems
grow, the influence of the early founders begins to wane.
The people who start a company are rarely the same as
those who run it; the pioneers were dwarfed by the influx of
settlers. Similarly, the influence of elite users whose
contributions drove Wikipedia until recently has been
shifting to the novice masses. With such population growth
comes the need for structure, procedure, and hierarchies.
Already there is evidence of increasing structure and
bureaucracy evolving to handle system growth. Until 2004,
the arbiter of serious disputes and the only person with the
ability to ban non-vandal users was Jimmy Wales [16];
since then an Arbitration committee has been established to
do so, as well as a Mediation committee which focuses on
helping users resolve their disputes before they reach the
level of needing arbitration. Informal structures have also
evolved, such as the Mediation Cabal (an unofficial
group of ordinary users who try to help mediate disputes)
and the Association of Members’ Advocates.
This view of Wikipedia as an emerging social system
suggests that it may be entering a critical period. The
recent massive influx of low-participation users has resulted
in a large shift in the distribution of work done in the
system. How Wikipedia reacts to this shift may be a major
determinant of its future viability and continued growth.
Future Directions and Application
These findings suggest additional avenues for further
research. First, does social stratification (the hierarchical
arrangement of social classes within a society) happen in
other social collaborative systems? There is some
anecdotal evidence that social stratification does happen in
open-source development [1], multi-player online games
[5], and bulletin board systems such as Slashdot [8].
Second, another research question is “what causes the
social stratification in the Wikipedia society?” Do
stratifications from other online communities result directly
from an increase in participation by common classes of
users? Interestingly, in sociology, social stratification is
believed by proponents of structural-functional theory to be
beneficial in stabilizing the existence of societies. Conflict
theorists such as Max Weber believe stratification occurs
due to status and power differentials [17]. Viewed from
this perspective, the creation of the admin class in
Wikipedia may have foreshadowed the stratification of the
Wiki-society. The clear subsequent shift in power among
levels of stratification is an intriguing trend that merits
study in other online social systems.
The results described here also have implications for the
design of collaborative knowledge systems. One
recommendation is that, during the early phase of a system,
resources should be allocated towards building
tools for power users and improving expert features, as this
is the population driving early growth. However, as the
population increases resources should be shifted towards
improving ease of use and effectiveness for novice users, as
well as developing structures and procedures that can
support a large influx of users. It also suggests that
designers should continue to reevaluate the user population
in anticipation of the shifts seen here.
CONCLUSION
Wikipedia’s growth as a reference tool and an online
community has caught the attention of researchers
worldwide. Little is currently known about the dynamics of
its social structure. One hotly debated question is “Who writes
Wikipedia?” Is it the work of a small group of elite users,
or does it reflect the wisdom of a large crowd?
In this paper, we show that the story is more complex than
previously offered explanations suggest. In the beginning, elite users
contributed the majority of the work in Wikipedia.
However, beginning in 2004 there was a dramatic shift in
the distribution of work to the common users, with a
corresponding decline in the influence of the elite. These
results did not depend on whether work was measured by
edits or by actual change in content, though the content
analysis showed that elite users added more words per edit
than novice users, who on average removed more words than
they added. The decline of elite user influence was also
shown to occur in del.icio.us, a social collaborative
knowledge system with a very different participation
structure from Wikipedia, suggesting that it may be a
common phenomenon in the evolution of online
collaborative knowledge systems. The data presented in
this paper suggest that user dynamics in Wiki-society merit
further study and provide insights into allocating resources
when building online collaborative knowledge systems.
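The two measures of work contrasted above, edit counts and actual change in content, can be illustrated with a short sketch. The Python example below counts the words added and removed between two revisions of a page and averages the net change per edit for each user class. It uses the standard library's difflib, whose matcher is a Ratcliff/Obershelp-style algorithm rather than the Myers O(ND) difference algorithm listed in the references [9]; the (username, old_text, new_text) revision tuples, whitespace tokenization, and the elite/common split are hypothetical choices for illustration, not this paper's exact method.

```python
import difflib


def word_delta(old_text, new_text):
    """Return (words_added, words_removed) between two revisions.

    Words are whitespace-separated tokens (a simplifying assumption).
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # words present only in the old revision
        if tag in ("replace", "insert"):
            added += j2 - j1    # words present only in the new revision
    return added, removed


def mean_net_words_per_edit(revisions, elite_users):
    """Average (added - removed) words per edit for each user class.

    revisions: iterable of (username, old_text, new_text) tuples
               (a hypothetical schema).
    elite_users: set of usernames treated as "elite" (an assumption).
    """
    sums = {"elite": 0, "common": 0}
    counts = {"elite": 0, "common": 0}
    for user, old, new in revisions:
        cls = "elite" if user in elite_users else "common"
        added, removed = word_delta(old, new)
        sums[cls] += added - removed
        counts[cls] += 1
    # Skip classes with no edits to avoid dividing by zero.
    return {cls: sums[cls] / counts[cls] for cls in sums if counts[cls]}


if __name__ == "__main__":
    print(word_delta("the cat sat", "the black cat sat down"))  # (2, 0)
```

Counting edits treats every revision equally, whereas the word-level measure weights each edit by how much text it changes; comparing the two is one way to check whether a shift in workload holds under both definitions.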
ACKNOWLEDGMENTS
We would like to thank Peter Pirolli and Stuart Card for
valuable advice and the User Interface Research Group for
engaging discussion on this topic.
REFERENCES
1. Buriol, L., Castillo, C., Donato, D., Leonardi, S., and Millozzi, S. "Temporal Evolution of the Wikigraph". To appear in Proc. of the Web Intelligence Conference, Hong Kong, December 2006. IEEE CS Press.
2. Capocci, A., Servedio, V.D.P., Colaiori, F., Buriol, L.S., Donato, D., Leonardi, S., and Caldarelli, G. "Preferential attachment in the growth of social networks: the case of Wikipedia". arXiv.org/physics/0602026 (2006).
3. Chance, Tom. The social structure of open source development. http://programming.newsforge.com/article.pl?sid=05/01/25/1859253 (Retrieved Sept 20, 2006).
4. Kittur, A., Suh, B., Chi, E., and Pendleton, B. A. "He says, she says: Conflict and coordination in Wikipedia". Submitted.
5. Koster, Raph. Small Worlds: Competitive and Cooperative Structures in Online Worlds. Talk given at GDC 2003. http://www.raphkoster.com/gaming/smallworlds.html (Retrieved Sept 21, 2006).
6. Hafner, Katie. Growing Wikipedia Refines Its 'Anyone Can Edit' Policy. New York Times, June 17, 2006. http://www.nytimes.com/2006/06/17/technology/17wiki.html
7. Hadoop Project. http://lucene.apache.org/hadoop/
8. Malda, Rob (aka CmdrTaco). Slashdot Moderation. http://slashdot.org/moderation.shtml (Retrieved Sept 21, 2006).
9. Myers, E. "An O(ND) Difference Algorithm and its Variations". Algorithmica Vol. 1 No. 2 (1986), p. 251.
10. Rogers, Everett M. (1962 and 1995). Diffusion of Innovations. New York: Free Press.
11. Swartz, Aaron. Who Writes Wikipedia? http://www.aaronsw.com/weblog/whowriteswikipedia (Blog retrieved Sept 20, 2006).
12. Surowiecki, James (2004). The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York: Doubleday.
13. Viégas, F. B., Wattenberg, M., and Dave, K. Studying cooperation and conflict between authors with history flow visualizations. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems (Vienna, Austria, April 24-29, 2004), pp. 575-582.
14. Voss, J. Measuring Wikipedia. In Proceedings of ISSI 2005 (Stockholm, Sweden, July 24-28, 2005).
15. Wales, J. Wikipedia, Emergence, and The Wisdom of Crowds. http://mail.wikipedia.org/pipermail/wikipedia-l/2005-May/039397.html (2005). (Retrieved Sept 21, 2006).
16. Wikipedia.org. Wikipedia: Arbitration Committee. http://en.wikipedia.org/wiki/Wikipedia:Arbitration (Retrieved Sept 29, 2006).
17. Wikipedia.org. Max Weber. http://en.wikipedia.org/wiki/Max_Weber (Retrieved Sept 20, 2006).
18. Wikipedia.org. Wikipedia: Registered bots. http://en.wikipedia.org/wiki/Wikipedia:Registered_bots
19. Zlatic, V., Bozicevic, M., Stefancic, H., and Domazet, M. Wikipedias: Collaborative web-based encyclopedias as complex networks. Physical Review E, vol. 74 (2006).