Why Is Stack Overflow Failing?
Preserving Sustainability in Community Question Answering
Ivan Srba and Maria Bielikova
Faculty of Informatics and Information Technologies
Slovak University of Technology in Bratislava
Ilkovičova 2, 842 16 Bratislava, Slovakia
ivan.srba@stuba.sk, maria.bielikova@stuba.sk
Abstract: An enormous amount of knowledge sharing occurs every day in Community Question Answering (CQA) sites, some of which have become popular among software developers and end users (e.g. Stack Overflow or Ask Ubuntu). In spite of their overall success, we can witness emerging problems in some CQA systems: an increasing failure and churn rate. In order to investigate this trend, we conducted a case study focused on Stack Overflow. First, we evaluated the community perception, which indicates that the emerging problems are highly related to the growing amount of low-quality content created by undesired groups of users (i.e. help vampires, noobs and reputation collectors). We then supported these findings with reproducible data analyses of content and community evolution. To face the emerging problems, we suggest providing users with novel answerer-oriented adaptive support that, in addition, involves the whole community in question answering. These approaches represent a fundamental attitude change in the existing question-answering support methods, with the aim of preserving the long-term sustainability of CQA ecosystems.
Keywords: H.3.4.e Question-answering systems; N.3.d Knowledge sharing; H.5.3.c Computer-supported cooperative work
COMMUNITY QUESTION ANSWERING
With the increasing popularity of online communities gathered in knowledge sharing systems (e.g. Wikipedia, forums, and mailing lists), new forms of such systems constantly emerge. One of the most popular is the concept of Community Question Answering (CQA) sites, such as Yahoo! Answers or Stack Overflow. Members of these communities can ask various questions that usually cannot be answered easily by standard information retrieval tools [1], while other members provide answers to them. Besides general CQA systems, various domain-specific communities have appeared. Some of them have gained high popularity among software developers and end users, such as Stack Overflow, which is even considered one of the most successful CQA systems ever [2]. Stack Overflow is particularly effective for novices seeking answers to conceptual or code review questions; moreover, it can even serve as a supplement to official software documentation when that documentation does not exist or is very sparse [3].
Currently, the existing CQA systems are perceived mainly as a successful example of collective intelligence, thanks to their high popularity, millions of answered questions, fast question answering, and universal availability. In spite of that, some CQA systems are no longer as successful as they used to be. As an example, consider the recent evolution of the content and community composition on Stack Overflow.
While the total number of new questions asked each month had been growing gradually from the beginning, we witness a noticeable change in 2014 (see Figure 1). First, there is a sharp peak in the total number of new questions around March 2014, followed by a significant drop of 13% over the next three months. At the same time, the number of questions for which the askers' information needs were not fulfilled (i.e. questions either deleted due to their poor quality or violation of community rules, or unanswered for more than one month after their posting) exceeded the number of questions for which the askers' information needs were fulfilled (i.e. questions with an accepted answer, AA). The second view, in terms of relative proportions, shows a constantly increasing share of these deleted or unanswered questions among all new questions, which we denote as the failure rate. It is growing rapidly: in 2011, three years after the site's establishment, it was only 22.45%, while in 2014 it reached 39.43%. The development of the failure rate can be precisely predicted by a linear regression with high significance (0.192 + 0.0048x, where x is the order of the month starting from January 2011; F(1,46) = 3144, p < 0.001, R2 = 0.986). This means that the failure rate increases on average by 0.48% each month.
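To illustrate, this trend model can be reproduced with a few lines of R once the monthly failure rates are exported from Query 1; the series below is a simulated stand-in generated from the reported coefficients, not the actual Stack Overflow data.

    # Minimal sketch: fit a linear trend to monthly failure rates.
    # `failure` is simulated to match the reported model; in practice it
    # would be computed from the results of Query 1.
    set.seed(1)
    month <- 1:48                                 # Jan 2011 ... Dec 2014
    failure <- 0.192 + 0.0048 * month + rnorm(48, sd = 0.01)
    fit <- lm(failure ~ month)
    summary(fit)                                  # slope of ~0.0048, i.e. +0.48% per month
    predict(fit, data.frame(month = 49))          # extrapolated failure rate for Jan 2015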
Fig. 1. Evolution of answering success of new questions posted each month, as absolute numbers and as relative proportions of all questions (series: Deleted or Unanswered, Answered without AA, Answered with AA, Total Count) [source: Query 1].
Not only the content but also the community's behavior is constantly changing. Only a small fraction of the whole community actively participates in question answering, in contrast to the majority of lurkers (non-contributing members of the community). In 2011, about 15.18% of all registered users posted at least one question or answer each month on average, while in 2014, it was only 5.05% [source: Query 2]. A detailed view of active users (Figure 2) shows that the number of active users was increasing until March 2014; since
then, it has remained rather constant. The proportion of the most important stable users (users who were and remain active) is steadily decreasing (from 41.05% in 2011 to 34.89% in 2014), while the proportion of one-time users (users who were active in only one month) is, on the contrary, increasing (from 30.80% in 2011 to 33.12% in 2014). As a result, the number of one-time users exceeded the number of stable users for the first time in November 2014. The outflow of stable users is related to a growing number of churn users (users who were active at least once during the previous three months and became inactive for the following three months; the definition was adopted from Pudipeddi et al. [4]). From April to June 2014, the number of churn users notably exceeded the number of newcomers (i.e. users who became active in the particular month, the opposite of churn users) for the first time. This negative trend is reflected in the churn rate (the proportion of users who churned in a particular month out of all active users), which increased from 12.52% in 2011 to 15.85% in 2014. Like the failure rate, the churn rate can be modelled by a linear regression (0.121 + 0.0009x, where x is the order of the month starting from January 2011; F(1,46) = 149, p < 0.001, R2 = 0.764). This means that the churn rate increases on average by 0.09% each month.
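As an illustration of the churn definition above, the following R sketch labels a user as churned in month m if they were active at least once in months m-3 to m-1 and posted nothing in months m to m+2; the toy activity log is hypothetical.

    # Toy activity log: one row per (user, month with at least one post).
    activity <- data.frame(user  = c("a", "a", "a", "b", "b", "c"),
                           month = c(1, 2, 3, 1, 8, 5))

    # TRUE if `user` churned in month m: active in (m-3 .. m-1) and
    # inactive in (m .. m+2), following the definition adopted from
    # Pudipeddi et al. [4].
    churned <- function(user, m) {
      months <- activity$month[activity$user == user]
      any(months %in% (m - 3):(m - 1)) && !any(months %in% m:(m + 2))
    }

    churned("a", 4)   # TRUE: "a" was active in months 1-3, inactive in 4-6
    churned("a", 2)   # FALSE: "a" was still active in months 2-3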
Fig. 2. Evolution of the composition of active users, as absolute numbers and as relative proportions of all active users (series: Newcomers, One-time, Stable, Churn, Total Count) [source: Query 3].
Besides the aggregate numbers, direct feedback from community members in various Internet discussions and blogs (e.g. http://michael.richter.name/blogs/why-i-no-longer-contribute-to-stackoverflow) also points to emerging problems that threaten the long-term sustainability of Stack Overflow. Despite the seriousness of this phenomenon (i.e. the increasing failure and churn rates), it has not been well described yet and, furthermore, we are not aware of any particular works aiming to face it effectively. Therefore, we conducted a case study on Stack Overflow to provide deeper insight into the evolution of CQA communities. Following the obtained results, analyses of state-of-the-art approaches, and our experience with the educational CQA system Askalot [5], we propose a shift in providing adaptive collaboration support that can help preserve the long-term sustainability of CQA ecosystems.
CHARACTERIZING CONTENT AND USERS IN CQA
The openness of CQA systems is closely connected with the diversity of users' expertise and activity levels, as well as with the quality of the created content. This diversity is fruitful for efficient knowledge sharing among people with different levels of expertise, but at the same time, it prevents CQA systems from becoming trusted archives of entirely unique, high-quality content. In general, users in CQA systems can be categorized along three dimensions: (1) by their preferred activities (askers vs. answerers); (2) by the amount of activity carried out in the system (active vs. passive persons, so-called lurkers); and (3) by their knowledge (level of expertise). Since these dimensions are orthogonal, it is possible to combine them and thus categorize users by behavior into various user stereotypes. To achieve successful question answering, it is essential that the community comprises particular types of users (e.g. active answerers with a high level of expertise). On the other hand, some types of users are not very desirable, although they represent a natural element of every community (e.g. users who ask many low-quality questions). A previous study [6] examined the dynamics of these stereotypes on the basis of data from the Super User site collected before the end of August 2011. The authors recognized that the composition of the community is constantly changing: the proportion of some stereotypes in the total number of users is stable, while the proportion of others (e.g. expert answerers and low-activity users) changes dynamically.
We suppose that the increasing failure and churn rates on Stack Overflow can be explained by the continuous evolution of the proportions of content quality and user stereotypes over time. More specifically, we hypothesize that the undesirable types of users and their content have become too widespread and have overloaded potential answerers. In spite of several studies investigating the evolution of individuals' behavior (e.g. churn prediction [4]), existing research has not yet focused on the temporal evolution of whole CQA communities and their composition. We believe that this perspective on CQA systems can provide deeper insight into the emerging problems.
CASE STUDY: EVOLUTION OF COMMUNITY ON STACK OVERFLOW
In order to verify our hypothesis, we performed a case study on Stack Overflow, where the emerging problems are probably most evident. In addition, Stack Overflow plays a model role for the Stack Exchange platform, which unites dozens of CQA sites on various topics (e.g. Ask Ubuntu, Mathematics).
Context of the Study. First, we analyzed discussions on Meta Stack Overflow, a specific part of Stack Overflow with questions related to the system itself. We manually evaluated the 150 questions with the highest number of votes provided by the community (each with a voting score of at least 7). We identified 12 questions that directly pointed to the negative development of the community. All were posted after March 2014, i.e. after the negative changes in the total number of questions and active users appeared. The importance of these problems is further highlighted by the fact that the questions ranked 1st and 3rd by number of votes (as of May 2015) concerned these topics: Why is Stack Overflow so negative of late? (http://meta.stackoverflow.com/questions/251758/) and Question quality is dropping on Stack Overflow (http://meta.stackoverflow.com/questions/252506/).
The community perception indicates that the decline of Stack Overflow is a result of its increasing popularity and openness, which has resulted in a massive influx of users with a low level of expertise. Consequently, the system was flooded with too many questions that were not interesting to other users. Furthermore, the community identified and named three groups of users who are the major source of this undesired content:
1. Help Vampires: users who ask a great number of questions without making any effort to find the required knowledge themselves (e.g. via search engines or archives of already solved questions). Consequently, the posted questions are often tedious or even duplicates. Help vampires are interested only in getting answers to their questions, and they do not give the help they receive back to the community.
2. Noobs: users with a low level of expertise who create mainly trivial questions of poor quality. Noobs overload the system with a significant amount of low-quality content and make finding unique and interesting questions very difficult.
3. Reputation Collectors: users who answer as many questions as possible (commonly regardless of their insufficient knowledge of the question's topic), primarily to gain reputation. On one side, these users contribute to the system (by relieving experts of answering uninteresting questions); on the other side, they reinforce and motivate help vampires and noobs to ask more low-quality questions.
Opposing these undesired groups of users, another part of the Stack Overflow community exists: the Care Takers, experts who want to keep the system clean and full of valuable content. Care takers regularly search for interesting questions and provide good answers to them. Their presence is essential, and it is important to provide adequate motivation to keep these users active and devoted to the community.
The community's perception points out that the shifting proportions of the different types of users forming the community have put the CQA ecosystem off balance, which also supports our hypothesis. We consider this community feedback relevant, yet we decided to support it with quantitative analyses.
Quantitative Study. We employed a dataset from Stack Overflow that contains all non-anonymous activities. The dataset is publicly available and distributed under a Creative Commons license, both as a data dump and through the Data Explorer tool, which allows the data to be investigated with SQL queries. We decided to use the Data Explorer tool to make the analyses presented in this paper easily reproducible at any time (all presented results are accompanied by references to queries that are publicly available and executable against the latest versions of the datasets at http://data.stackexchange.com/users/16409/ivan-srba/). With this setup, it is possible to apply the same methodology and continue monitoring the further evolution of Stack Overflow as well as of all CQA systems on the Stack Exchange platform.
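For instance, each query's result can be downloaded as a CSV file and loaded into R for the statistical processing described below; the file name here is assumed to be the Data Explorer's default export name, and the column names depend on the particular query.

    # Load a CSV export of a Data Explorer query for further analysis.
    monthly <- read.csv("QueryResults.csv")
    head(monthly)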
First, we investigated the evolution of content quality over time. We consider the votes provided by the community a relatively precise estimate of content quality. However, the overall score (the difference between positive and negative votes) can be influenced by the varying time lapse between content creation and the time when the analyses are performed. Thus, to preserve the reproducibility of our analyses, we take into consideration only those votes that were cast within one month after the content was posted. In addition, the number of votes can be influenced by the popularity of particular topics. Therefore, to estimate the quality of questions and answers more accurately, we normalized their score by the mean score obtained by other questions/answers with the same user-assigned tags (if a post has more than one tag assigned, the score is normalized by each of these tags separately and the average of all partially normalized scores is calculated). According to this normalized score, we divided questions and answers into four groups: low quality (negative score); neutral quality (score equal to 0); good quality (positive score below 1.5 times the average score); and high quality (positive score above 1.5 times the average score).
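The normalization and bucketing just described can be sketched in R as follows; the toy posts and tags are hypothetical, the score column is assumed to already count only votes cast within one month of posting, and the sketch assumes positive per-tag mean scores.

    # Toy data: post scores and the user-assigned tags of each post.
    posts <- data.frame(id = 1:4, score = c(-2, 0, 3, 12))
    post_tags <- data.frame(id  = c(1, 1, 2, 3, 3, 4),
                            tag = c("r", "sql", "r", "r", "sql", "sql"))

    # Mean score per tag, computed over all posts carrying that tag.
    scores_by_tag <- posts$score[match(post_tags$id, posts$id)]
    tag_mean <- tapply(scores_by_tag, post_tags$tag, mean)

    # Normalize each post's score by each of its tags' mean scores,
    # then average the partial normalizations per post.
    partial <- scores_by_tag / tag_mean[post_tags$tag]
    posts$norm <- as.numeric(tapply(partial, post_tags$id, mean)[as.character(posts$id)])

    # Bucket posts into the four quality groups defined above.
    posts$quality <- cut(posts$norm, c(-Inf, -1e-9, 1e-9, 1.5, Inf),
                         labels = c("low", "neutral", "good", "high"))
    posts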
The absolute numbers of new good-quality and high-quality questions asked each month remained stable regardless of the long-term increase in the total number of all questions (see Figure 3). As a result, their relative proportions decreased from 30.79% in 2011 to 18.43% in 2014 and from 5.96% to 2.02%, respectively. This decline is associated with growth in both the number and the proportion of low-quality questions, which constituted only 4.99% of all questions in 2011 but 16.72% in 2014 (the greatest rise, by 5.24%, occurred between April and May 2014, when the number of low-quality questions also exceeded the number of good-quality questions). The proportion of neutral questions was relatively stable over all four years (59.94% on average). This means that most of the increase in the total number of questions comes from uninteresting questions with zero or even negative score. This finding confirms our hypothesis as well as the community perception that the system is flooded with content that nobody cares about, while really interesting content is becoming rarer.
Fig. 3. Evolution of questions' quality (series: Low Quality, Neutral Quality, Good Quality, High Quality, Total Count) [source: Query 4].
The increasing number of questions is not, however, reflected in the number of answers (see Figure 4). Quite the opposite: there is a three-month decrease of 23.90% in the number of answers from April to June 2014 (there are other declines as well, e.g. by 8.73% between May and June 2013, but they are not as extensive). During this rapid decrease, the answering capacity of the Stack Overflow community returned to the average achieved in 2012 (about 280 thousand answers per month). This indicates that the community was not able to handle so many incoming (mainly low-quality and uninteresting) questions, which finally resulted in the continuous increase of the failure rate. The relative proportions reveal a decrease in good-quality and high-quality answers as well as an increase in low-quality answers; however, these changes are not as pronounced as for questions.
Fig. 4. Evolution of answers' quality (series: Low Quality, Neutral Quality, Good Quality, High Quality, Total Count) [source: Query 5].
Second, we investigated the evolution of the numbers of the user stereotypes denoted by the community. To assign stereotypes to users, we employed a set of rules based on the previously provided characteristics, additionally supported by the results of Furtado et al. [6]: (1) a help vampire is a user who asked at least two questions during a particular month and did not provide any answer; (2) a noob is a user who asked at least two questions of low or neutral quality; (3) a reputation collector is a user who provided at least three answers to low-quality questions and was systematically active during at least three days; and finally, (4) a care taker is a user who was similarly active during at least three days and provided high-quality answers to high-quality questions (a rule-based sketch of this classification follows after this paragraph). We identified a long-term outflow of care takers (see Figure 5). Despite the overall community growth, the proportion of care takers among all active users decreased from 3.70% in 2011 to 1.24% in 2014. The proportion of help vampires is relatively stable, at about 13.15% on average. On the other side, the proportion of noobs increased from 6.23% in 2011 to 10.11% in 2014. Reputation collectors also became more common, rising from 4.11% in 2011 to 5.98% in 2014. These results support the community perception that important care takers tend to leave the system. We also confirmed
the rise of noobs and reputation collectors; nevertheless, it seems that help vampires have always been a natural element of CQA communities.
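The four rules above can be expressed as a simple decision procedure; the input record is a hypothetical per-user monthly summary that would come from the Data Explorer queries, and since the rules do not specify how overlapping stereotypes are resolved, this sketch simply checks them in order.

    # u: one user's activity in one month (hypothetical field names).
    stereotype <- function(u) {
      if (u$questions >= 2 && u$answers == 0)                      return("help vampire")
      if (u$low_or_neutral_questions >= 2)                         return("noob")
      if (u$answers_to_low_q_questions >= 3 && u$active_days >= 3) return("reputation collector")
      if (u$high_q_answers_to_high_q >= 1 && u$active_days >= 3)   return("care taker")
      "none"
    }

    stereotype(list(questions = 3, answers = 0, low_or_neutral_questions = 1,
                    answers_to_low_q_questions = 0, high_q_answers_to_high_q = 0,
                    active_days = 1))   # "help vampire"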
Fig. 5. Evolution of stereotypes assigned to active users [source: Query 6].
In order to evaluate the relation between the failure/churn rates and content/user evolution numerically, we calculated the cross-correlation between the time series representing the failure/churn rates (see Figures 1 and 2) and the relative proportions of each kind of content/users (see Figures 3, 4 and 5). An R script was employed for this calculation (the cross-correlation was computed with the ccf function from R's stats package; the lag was set to -1 month to measure how content/user evolution leads the failure/churn rate; in addition, a linear trend was removed from each time series with the detrend function from the pracma library). The obtained results thus measure how well content/user development can predict changes in the failure/churn rate in the following month.
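The computation corresponds to the following R sketch; the two monthly series are simulated stand-ins, whereas the real ones come from the queries underlying the figures.

    library(pracma)   # provides detrend()

    # Simulated stand-ins for two monthly series (Jan 2011 - Dec 2014):
    # the proportion of low-quality questions and the failure rate.
    set.seed(1)
    low_q <- 0.05 + 0.003 * (1:48) + rnorm(48, sd = 0.01)
    fail  <- 0.192 + 0.0048 * (1:48) + rnorm(48, sd = 0.02)

    # Remove the linear trend from each series, then read off the
    # cross-correlation at lag -1 (content leading the failure rate
    # by one month).
    low_q_d <- as.numeric(detrend(low_q))
    fail_d  <- as.numeric(detrend(fail))
    cc <- ccf(low_q_d, fail_d, lag.max = 1, plot = FALSE)
    cc$acf[cc$lag == -1]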
The significant cross-correlations confirm that low-quality questions and low-quality answers lead to a higher failure rate (see Table 1). Moreover, the correlations obtained for the churn rate confirm that higher proportions of help vampires, noobs and reputation collectors lead to higher user churn. On the other side, good-quality and high-quality content keeps users active and devoted to the community; nevertheless, the corresponding correlations are just slightly below the significance level.
Table 1. Cross-correlations between the time series representing the failure/churn rate and the proportions of the various kinds of content/users analyzed in the quantitative study (correlations significant at the 5% level are marked with an asterisk).

                                     Failure Rate    Churn Rate
    Questions  Low Quality               0.422*        -0.112
               Neutral Quality          -0.148          0.243
               Good Quality             -0.149         -0.254
               High Quality              0.041         -0.215
    Answers    Low Quality               0.444*         0.024
               Neutral Quality          -0.035          0.121
               Good Quality             -0.093         -0.065
               High Quality              0.078         -0.270
    Users      Help Vampires             0.071          0.558*
               Noobs                     0.122          0.423*
               Reputation Collectors    -0.027          0.325*
               Care Takers               0.034         -0.239
To conclude, the results of the quantitative study confirmed the community perception that the low-quality content and the undesired user groups are closely connected with the emerging problems. Thanks to the common database structure across the whole Stack Exchange platform, the Data Explorer tool allowed us to perform the same analyses on other CQA systems as well. For those systems that have recently been undergoing a rapid expansion similar to the one Stack Overflow previously experienced (e.g. Ask Ubuntu), we identified very similar trends.
A SUGGESTION TO PRESERVE CQA SUSTAINABILITY
We identified several possibilities for dealing with the emerging problems. One option is to change community rules and restrict the overall openness of CQA systems (e.g. limit the number of questions users can ask per week). However, this solution would solve the problems only partially (as there would still be low-quality content, although its amount would be reduced) and temporarily (as restrictions often pose new, unexpected problems). We emphasize that it is not possible to get rid of low-quality content completely. Instead, we suggest that CQA systems should take the varying content quality and users' expertise into consideration.
Only very recently have approaches to the automatic detection of low-quality posts [7] and content abusers [8], [9] been proposed to help moderators delete such posts or ban such users, respectively. These solutions might be quite effective; however, in spite of their overall precision (varying from 0.70 to 0.80), they can misclassify good-quality content or innocent users, and as a result, taking such extreme actions can ultimately lead to even more undesired and antisocial behavior [9]. Therefore, we suggest solving the emerging problems in alternative and less invasive ways. First, it is possible to adjust the reputation system to reflect the value of contributions more accurately and thus motivate users to provide good answers to good questions (e.g. the reputation received for providing an answer can reflect the corresponding question's difficulty). We introduced an example of this kind of reputation system in our recent paper [10].
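As a toy illustration only (not the actual scheme from [10]), a difficulty-aware reputation rule could scale the reward for an answer by the question's estimated difficulty and the answer's quality:

    # Hypothetical rule: harder questions and better answers earn more
    # reputation; `difficulty` and `quality` are assumed normalized scores.
    reputation_gain <- function(difficulty, quality, base = 10) {
      base * difficulty * max(quality, 0)
    }
    reputation_gain(difficulty = 1.8, quality = 1.2)  # 21.6: good answer, hard question
    reputation_gain(difficulty = 0.4, quality = 1.2)  #  4.8: same quality, easy question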
Another possibility, which we consider even more promising, is to provide users with appropriate adaptive collaboration support. In the CQA domain, several approaches have already been proposed to adaptively support effective knowledge sharing (e.g. personalized recommendation of questions [11]). Unfortunately, adaptive support methods still do not reflect the emerging problems appropriately. Moreover, some of these methods even indirectly support the undesired user groups by giving preference to their goals over those of other users (e.g. care takers). To preserve the long-term sustainability of CQA communities, it is, however, necessary to change the attitude taken in providing adaptive support. We propose two basic approaches for achieving this shift.
Answerer-oriented Approaches. First, the majority of the existing adaptive support methods can be characterized as asker-oriented: they are either explicitly dedicated to askers or primarily focused on askers' goals, while answerers' preferences and expectations are suppressed.
The asker-oriented approach is especially visible in question routing methods (i.e. the recommendation of new questions to potential answerers). Most of the existing methods recommend questions to experts regardless of the real question difficulty (and the minimum level of expertise required for a proper answer). This approach is quite successful in achieving askers' goals (receiving a high-quality answer), but it completely overlooks those experts who prefer to spend their limited time answering more difficult and challenging questions. In addition, existing methods usually route questions on the same or very similar topics to a potential answerer, so answerers can easily lose motivation. In order to prevent this filter bubble and to meet answerers' expectations, diversification of the recommendations should be applied [11].
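One common way to realize such diversification is a greedy re-ranking that trades relevance off against similarity to the questions already selected (a maximal-marginal-relevance-style heuristic; this is an illustrative sketch, not the method of [11]).

    # Greedily pick k questions, balancing relevance to the answerer
    # against similarity to questions chosen so far.
    diversify <- function(relevance, sim, k = 5, lambda = 0.7) {
      chosen <- integer(0)
      candidates <- seq_along(relevance)
      for (i in seq_len(k)) {
        penalty <- if (length(chosen) > 0)
          apply(sim[candidates, chosen, drop = FALSE], 1, max) else 0
        score <- lambda * relevance[candidates] - (1 - lambda) * penalty
        pick <- candidates[which.max(score)]
        chosen <- c(chosen, pick)
        candidates <- setdiff(candidates, pick)
      }
      chosen
    }

    # Usage with random relevance scores and a random similarity matrix.
    set.seed(42)
    rel <- runif(10)
    sim <- matrix(runif(100), 10, 10); sim <- (sim + t(sim)) / 2; diag(sim) <- 1
    diversify(rel, sim, k = 3)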
Involvement of the Whole Community. Another drawback of existing adaptive methods is that they involve and motivate only a small portion of the community to actively participate in the question answering process. To maintain the CQA ecosystem, it is necessary to satisfy the expectations of all types of users [11].
As an illustration, we continue with the question routing problem. Because the existing methods prefer users with a high level of expertise, other users are involved only very rarely, while the experts are easily overloaded and the capacity of the other users is left unutilized. Trivial questions (especially those asked by help vampires or noobs) can usually be answered by users who have sufficient knowledge but are not necessarily experts with the highest level of expertise. Moreover, existing methods usually rely on users' previous activity in the CQA system and thus are not able to route questions to newcomers (due to the well-known cold-start problem) or to lurkers (due to their lack of sufficient activity).
In our previous work [12], we demonstrated a comprehensive example of how these recommendations for preserving CQA sustainability can be applied to question routing. We introduced a novel question routing method that involves all possible answerers by employing their public non-QA data (i.e. about-me descriptions and blogs) to supplement QA data (i.e. data from the question answering process in the CQA system itself), and thus provides question recommendations more precisely and to all kinds of users.
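A simplified sketch of the underlying idea (the real method in [12] is considerably more elaborate): concatenate a user's QA and non-QA texts into one profile and rank users for a new question by the cosine similarity of simple term-frequency vectors. All names and texts below are hypothetical.

    # Build a relative term-frequency profile from a set of texts.
    tokenize <- function(s) {
      w <- strsplit(tolower(gsub("[^A-Za-z ]", " ", s)), "\\s+")[[1]]
      w[nzchar(w)]
    }
    profile <- function(texts) {
      freq <- table(unlist(lapply(texts, tokenize)))
      freq / sum(freq)
    }
    cosine <- function(a, b) {
      words <- union(names(a), names(b))
      va <- as.numeric(a[words]); va[is.na(va)] <- 0
      vb <- as.numeric(b[words]); vb[is.na(vb)] <- 0
      sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
    }

    # QA activity supplemented with non-QA text (about-me, blog posts).
    user_texts <- c("answered a question about sql joins",
                    "my blog covers databases and query optimization")
    question <- "how to optimize a slow sql query"
    cosine(profile(user_texts), profile(question))   # higher = better candidate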
To sum up, the positive outcomes of CQA systems (e.g. the number of questions, the average time to the first answer and the great archives of answered questions) still outweigh the emerging problems reflected in the increasing failure and churn rates. Their negative impact has, however, been growing significantly in recent times. The set of easily executable analyses that we published in the Data Explorer tool makes it possible to monitor the future development of Stack Overflow and of the other CQA systems based on the Stack Exchange platform. The conclusion from the performed study is that the openness of CQA communities leads to increasing numbers of users who create mainly low-quality and uninteresting questions. As a result, these questions very often remain unanswered and demotivate the experts, who slowly leave the community.
We propose a change of attitude in adaptive support methods to deal with the emerging problems and thus contribute to the long-term sustainability of CQA ecosystems. This shift aims to prevent expert overloading, to make answerers more satisfied, and to optimally utilize the knowledge embedded in all community members. It can be described by two approaches:
1. instead of focusing only on askers and their goals, the preferences and expectations of answerers should be considered as well;
2. instead of involving only a subset of active and expert users, the whole community should be engaged in the question answering process.
ACKNOWLEDGEMENT
This work was partially supported by grants No. VG 1/0646/15 and KEGA 009STU-4/2014, and is a partial result of the Research & Development Operational Programme for the project Research of Methods for Acquisition, Analysis and Personalized Conveying of Information and Knowledge, ITMS 26240220039, co-funded by the ERDF.
REFERENCES
[1] Q. Liu, E. Agichtein, G. Dror, Y. Maarek, and I. Szpektor, “When Web Search Fails, Searchers Become Askers:
Understanding the Transition,” in Proceedings of the 35th International ACM SIGIR Conference on Research and
Development in Information Retrieval - SIGIR ’12, 2012, pp. 801810.
[2] L. Mamykina, B. Manoim, M. Mittal, G. Hripcsak, and B. Hartmann, “Design Lessons from the Fastest Q&A Site in the
West,” in Proceedings of the 2011 annual conference on Human factors in computing systems - CHI ’11, 2011, pp.
28572866.
[3] C. Treude, O. Barzilay, and M.-A. Storey, “How Do Programmers Ask and Answer Questions on the Web? (NIER
Track),” in Proceedings of the 33rd international conference on Software engineering - ICSE ’11, 2011, p. 804.
12
[4] J. Pudipeddi, L. Akoglu, and H. Tong, “User Churn in Focused Question Answering Sites: Characterizations and
Prediction,” in Proceedings of the 23st international conference companion on World Wide Web - WWW ’14
Companion, 2014, pp. 469474.
[5] I. Srba and M. Bielikova, “Askalot: Community Question Answering as a Means for Knowledge Sharing in an
Educational Organization,” in Proceedings of the 18th ACM Conference Companion on Computer Supported
Cooperative Work & Social Computing - CSCW’15 Companion, 2015, pp. 179182.
[6] A. Furtado, N. Andrade, N. Oliveira, and F. Brasileiro, “Contributor Profiles, their Dynamics, and their Importance in
Five Q&A Sites,” in Proceedings of the 2013 conference on Computer supported cooperative work - CSCW ’13, 2013,
pp. 12371252.
[7] L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza, and D. Fullerton, “Improving Low Quality Stack Overflow Post
Detection,” in Proceedings of IEEE International Conference on Software Maintenance and Evolution - ICSME ’14,
2014, pp. 541544.
[8] I. Kayes, N. Kourtellis, D. Quercia, A. Iamnitchi, and F. Bonchi, “The Social World of Content Abusers in Community
Question Answering,” in Proceedings of the 24th International Conference on World Wide Web - WWW ’15, 2015, pp.
570580.
[9] J. Cheng, C. Danescu-niculescu-mizil, and J. Leskovec, “Antisocial Behavior in Online Discussion Communities,” in
Proceedings of AAAI International Conference on Weblogs and Social Media - ICWSM ’15, 2015.
[10] A. Huna, I. Srba, and M. Bielikova, “Exploiting Content Quality and Question Difficulty in CQA Reputation Systems,”
in Proceedings of International Conference on Nework Science - NetSciX ’16, 2016, to appear.
[11] I. Szpektor, Y. Maarek, and D. Pelleg, “When Relevance is not Enough: Promoting Diversity and Freshness in
Personalized Question Recommendation,” in Proceedings of the 22nd international conference on World Wide Web -
WWW ’13, 2013, pp. 12491259.
[12] I. Srba, M. Grznar, and M. Bielikova, “Utilizing Non-QA Data to Improve Questions Routing for Users with Low QA
Activity in CQA,” in Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining - ASONAM ’15, 2015, pp. 129136.
Ivan Srba is currently a doctoral student at the Institute of Informatics and Software Engineering,
Slovak University of Technology in Bratislava. His research interests are in the area of web-based
systems, which utilize concepts of collaboration and collective intelligence. Contact him at
ivan.srba@stuba.sk.
Maria Bielikova is a full professor at the Institute of Informatics and Software Engineering, Slovak University of Technology in Bratislava. Her research interests are in the area of web-based systems, with special emphasis on personalization, context awareness and collaboration. She is a senior member of IEEE and its Computer Society, a senior member of ACM, and a member of the International Society for Web Engineering. Contact her at maria.bielikova@stuba.sk.
SUMMARY QUESTIONS AND HIGHLIGHTS
1. Why is Stack Overflow failing? Answering questions about sustainability in Community Question Answering.
2. Analyses of the recent evolution of Stack Overflow revealed newly emerging problems: an increasing failure and churn rate.
3. Help vampires, noobs, reputation collectors and their influence on the success of Stack Overflow.
4. Suggestions for preserving the long-term sustainability of Community Question Answering.
5. Answerer-oriented approaches and involvement of the whole community: how to sustain Community Question Answering.