Article

Wisdom in the social crowd: An analysis of Quora

... In CQA websites, after questions are posted, many answerers continuously contribute to those they find interesting. In particular, some attractive questions can attract hundreds of answers, while others may cause users to lose interest because of their long paragraphs [1]. Thus, it is hard for askers and visitors to quickly find the valuable answers, especially answers whose views and upvotes are not yet stable. ...
... We collected a dataset including 372,818 questions and 1,739,222 answers associated with topics, upvotes, timestamps, etc., from Quora using the approach described in [1]. To train a robust model, we retained only questions with more than ten answers whose highest-voted answer received over 20 upvotes. ...
... Then, answers with more than ten upvotes were treated as good answers and the rest were treated as bad ones (i.e., the ground truth label of a good answer equals 1, and 0 otherwise). For answer ranking, we sorted the answers of each question by their final upvotes and treated that ranking as the ground truth [1], [7]. Moreover, we also analyzed the distributions of the numbers of sentences and words. ...
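The filtering and labeling rules quoted in the two snippets above can be summarized in a short, hedged pandas sketch; the file name and column names (question_id, answer_id, upvotes) are illustrative assumptions, not the cited paper's code.

```python
# Illustrative sketch of the dataset filtering and labeling rules quoted above.
# The CSV path and column names are assumptions, not the cited paper's code.
import pandas as pd

answers = pd.read_csv("quora_answers.csv")  # assumed columns: question_id, answer_id, upvotes

# Keep questions with more than ten answers whose top answer has over 20 upvotes.
stats = answers.groupby("question_id")["upvotes"].agg(n_answers="count", max_upvotes="max")
kept = stats[(stats["n_answers"] > 10) & (stats["max_upvotes"] > 20)].index
answers = answers[answers["question_id"].isin(kept)].copy()

# Binary ground truth: answers with more than ten upvotes are "good" (1), others "bad" (0).
answers["good"] = (answers["upvotes"] > 10).astype(int)

# Ranking ground truth: each question's answers sorted by final upvote count.
ranking = answers.sort_values(["question_id", "upvotes"], ascending=[True, False])
```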
... Most of the answers come from a small number of users, resulting in very sparse textual information for evaluating users [13,14]. Taking user activity on Quora as an example [15,16], 90% of questions on Quora have fewer than ten answers, over 30% of registered Quora users have not answered any question, and only 16.74% of users have answered more than four questions. Second, the text-understanding capacity of currently used deep neural networks (e.g., CNNs or LSTMs) is still limited, especially when dealing with short questions. ...
... In Sections 4.2 and 4.3, we describe how to characterize the nodes in CQA heterogeneous networks more accurately from two perspectives by exploring the proximity relationship and semantic relevance. With the above formulations, we optimize the following joint objective function, which is a weighted combination of the topology-based embedding loss L_topology and the content-based embedding loss L_content: L = γ · L_topology + (1 − γ) · L_content (15), where γ ∈ [0, 1] is a parameter to balance the importance of the two loss functions. When γ increases, more structural information will be considered to characterize the nodes in the network; conversely, textual content information will account for a larger share. ...
... OSNs have become very popular in recent decades. Representative studies have been conducted on various social media platforms, including Twitter [3], Quora [6], Foursquare [7], and many others. As a representative software-development-centric OSN, GitHub has also caught the attention of researchers [8] [9] [10]. ...
... For instance, the Twitter network has an average degree of 35 [3], about five times that of graph G. Second, most users tend to participate in the graph unilaterally. Even in similar directed OSNs such as Quora, more than 50% of users have bilateral edges [6]. However, only 20% of users in graph G do. ...
Conference Paper
Full-text available
As one of the biggest online developer communities, GitHub supports interactions between millions of developers around the world. This paper focuses on detecting and analyzing the interactions on GitHub from several perspectives. First, from a global viewpoint, we build an interaction graph based on GitHub Events to investigate the general structure and attributes of the GitHub interaction network. Second, from the perspective of important users on GitHub, we pay particular attention to those who bridge social circles by spanning across communities. Concretely, we apply structural hole theory to identify and explore the structural hole spanners in this interaction network. Last but not least, we compare structural hole spanners with ordinary users in several aspects and excavate distinctions between them. To the best of our knowledge, this is the first study that applies structural hole theory to the interaction graph of a leading social network. It not only provides a unique standpoint on user interactions, but also gives a comprehensive inspection of structural hole spanners.
... In order to explore the QA and social features of SCQA, we first present a statistical analysis of our crawled data. From this analysis, we find that Zhihu has QA and social features similar to those of Quora studied in [21]. Questions and Answers. ...
... The exponential fitting parameter α for the follower count distribution is 1.84 with standard error 0.001, which is close to that of Twitter (α=2.28) [21]. The average numbers of followers and followees per user are around 12 and 43. ...
... • Quora is an innovative Q&A site with a rapidly growing user community that differs from its competitors by integrating a social network into its basic structure (Wang et al., 2013). Users sign in and can immediately start searching for answers to specific questions or topics, which are subject headings assigned by users. ...
Chapter
Full-text available
This chapter uncovers the opportunities that online media portals like content sharing and consumption sites or photography sites have for informal learning. The authors explored online portals that can provide evidence of evaluating, inferring, measuring skills, and/or contributing to the development of competencies and capabilities of the 21st century with two case studies. The first one is focused on identifying data science topical experts across the Reddit community. The second one uses online Flickr data to apply a model on the photographs to rate them as aesthetically attractive and technically sound, which can serve as a base for measuring the photography capabilities of Flickr users. The presented results can serve as a base to replicate these methodologies to infer other competencies and capabilities across online portals. This novel approach can be an effective alternative evaluation of key 21st century skills for the modern workforce with multiple applications.
... We therefore looked for social media platforms that are bound by a shared language and discuss convergent issues. Quora is one such platform, and analyses of Quora discussion forums can be found in the literature as well (Wang et al., 2013; Maity et al., 2018). So for the present work, a Quora Bangla dataset has been taken, and the spread of interactions on Quora Bangla posts amongst the Bangla-speaking community across the border is explored. ...
... The rapid development of OSNs has attracted millions of users and has produced a large amount of data for user behavior studies. A number of papers have studied user behavior by crawling one or multiple OSN sites, such as Facebook [61], Twitter [34], [59], Pinterest [21] and Quora [56]. To analyze the properties of the entire Foursquare population, we aim to obtain a dataset covering all Foursquare users. ...
Article
Full-text available
Being a leading online service providing both local search and social networking functions, Foursquare has attracted tens of millions of users all over the world. Understanding the user behavior of Foursquare is helpful to gain insights for location-based social networks (LBSNs). Most of the existing studies focus on a biased subset of users, which cannot give a representative view of the global user base. Meanwhile, although the user-generated content (UGC) is very important to reflect user behavior, most of the existing UGC studies of Foursquare are based on the check-ins. There is a lack of a thorough study on tips, the primary type of UGC on Foursquare. In this article, by crawling and analyzing the global social graph and all published tips, we conduct the first comprehensive user behavior study of all 60+ million Foursquare users around the world. We have made the following three main contributions. First, we have found several unique and undiscovered features of the Foursquare social graph on a global scale, including a moderate level of reciprocity, a small average clustering coefficient, a giant strongly connected component, and a significant community structure. Besides the singletons, most of the Foursquare users are weakly connected with each other. Second, we undertake a thorough investigation of all published tips on Foursquare. We start from counting the numbers of tips published by different users and then look into the tip contents from the perspectives of tip venues, temporal patterns, and sentiment. Our results provide an informative picture of the tip publishing patterns of Foursquare users. Last but not least, as a practical scenario to help third-party application providers, we propose a supervised machine learning-based approach to predict whether a user is influential by referring to the profile and UGC, instead of relying on the social connectivity information. Our data-driven evaluation demonstrates that our approach can reach a good prediction performance with an F1-score of 0.87 and an AUC value of 0.88. Our findings provide a systematic view of the behavior of Foursquare users and are constructive for different relevant entities, including LBSN service providers, Internet service providers, and third-party application providers.
... Question answering sites like Yahoo Answers and Google Answers existed for over a decade; however, they failed to maintain the content value [32] of their topics and answers due to the large amount of junk information posted, and their user bases declined. On the other hand, Quora is an emerging site known for quality content; launched in 2009, it was estimated to have 300 million active users as of 2019. ...
Preprint
Identifying semantically identical questions on question-and-answer social media platforms like Quora is exceptionally significant to ensure that both the quality and the quantity of content presented to users match the intent of the question, thus enriching the overall user experience. Detecting duplicate questions is a challenging problem because natural language is very expressive, and a unique intent can be conveyed using different words, phrases, and sentence structures. Machine learning and deep learning methods are known to have accomplished superior results over traditional natural language processing techniques in identifying similar texts. In this paper, taking Quora for our case study, we explored and applied different machine learning and deep learning techniques to the task of identifying duplicate questions on Quora's dataset. By using feature engineering, feature importance techniques, and experimenting with seven selected machine learning classifiers, we demonstrated that our models outperformed previous studies on this task. An XGBoost model with character-level term frequency and inverse term frequency is our best machine learning model, which also outperformed a few of the deep learning baseline models. We applied deep learning techniques to model four different deep neural networks of multiple layers consisting of GloVe embeddings, Long Short-Term Memory, convolution, max pooling, dense, batch normalization, and activation layers, plus model merging. Our deep learning models achieved better accuracy than the machine learning models. Three out of four proposed architectures outperformed the accuracy from previous machine learning and deep learning research work, two out of four models outperformed the accuracy from a previous deep learning study on Quora's question pair dataset, and our best model achieved an accuracy of 85.82%, which is close to Quora's state-of-the-art accuracy.
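As a rough, hedged illustration of the best-performing machine learning setup named above (character-level term frequency features fed to XGBoost), the following sketch shows one plausible pipeline; the file name, column names, n-gram range, and hyperparameters are assumptions, not the authors' configuration.

```python
# Sketch of a character-level TF-IDF + XGBoost duplicate-question classifier in
# the spirit of the abstract above; paths, columns, and hyperparameters are
# assumptions, not the authors' exact setup.
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

pairs = pd.read_csv("quora_question_pairs.csv")  # assumed columns: question1, question2, is_duplicate
q1 = pairs["question1"].fillna("")
q2 = pairs["question2"].fillna("")

# Character n-gram TF-IDF features for both questions of each pair.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), min_df=5)
vec.fit(pd.concat([q1, q2]))
X = hstack([vec.transform(q1), vec.transform(q2)]).tocsr()
y = pairs["is_duplicate"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```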
... A number of previous research works have investigated user interests and motivations for participating in CQA [27], [40]. Adamic et al. [1] studied the impact of CQA. ...
Article
Full-text available
Community question–answering (CQA) has become a popular method of online information seeking. Within these services, peers ask questions and create answers to those questions. For some time, content repositories created through CQA sites have widely supported general-purpose tasks; however, they can also be used as online digital libraries that satisfy specific needs related to education. Horizontal CQA services, such as Yahoo! Answers, and vertical CQA services, such as Brainly, aim to help students improve their learning process via Q&A exchanges. In addition, Stack Overflow—another vertical CQA—serves a similar purpose but specifically focuses on topics relevant to programmers. Receiving high-quality answer(s) to a posed CQA query is a critical factor to both user satisfaction and supported learning in these services. This process can be impeded when experts do not answer questions and/or askers do not have the knowledge and skills needed to evaluate the quality of the answers they receive. Such circumstances may cause learners to construct a faulty knowledge base by applying inaccurate information acquired from online sources. Though site moderators could alleviate this problem by surveying answer quality, their subjective assessments may cause evaluations to be inconsistent. Another potential solution lies in human assessors, though they may also be insufficient due to the large amount of content available on a CQA site. The following study addresses these issues by proposing a framework for automatically assessing answer quality. We accomplish this by integrating different groups of features—personal, community-based, textual, and contextual—to build a classification model and determine what constitutes answer quality. We collected more than 10 million educational answers posted by more than 3 million users on Brainly and 7.7 million answers on Stack Overflow to test this evaluation framework. The experiments conducted on these data sets show that the model using random forest achieves high accuracy in identifying high-quality answers. Findings also indicate that personal and community-based features have more prediction power in assessing answer quality. Additionally, other key metrics such as F1-score and area under ROC curve achieve high values with our approach. The work reported here can be useful in many other contexts that strive to provide automatic quality assessment in a digital repository.
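To make the feature-group idea above concrete, here is a minimal, hedged sketch of a random forest answer-quality classifier whose impurity-based importances are aggregated per feature group; the feature names and input file are purely hypothetical and this is not the study's actual pipeline.

```python
# Sketch (hypothetical feature names, not the study's pipeline) of a random
# forest answer-quality classifier over personal, community, textual, and
# contextual feature groups, with per-group importance aggregation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

groups = {
    "personal":   ["answerer_points", "answerer_answer_count"],
    "community":  ["thanks_count", "comment_count"],
    "textual":    ["answer_length", "readability"],
    "contextual": ["question_length", "response_delay"],
}
features = [f for fs in groups.values() for f in fs]

data = pd.read_csv("answers_features.csv")  # assumed: one row per answer plus a high_quality label
X_tr, X_te, y_tr, y_te = train_test_split(
    data[features], data["high_quality"], test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("F1:", f1_score(y_te, pred),
      "AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

# Aggregate impurity-based importances by feature group.
imp = pd.Series(rf.feature_importances_, index=features)
print({g: float(imp[fs].sum()) for g, fs in groups.items()})
```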
... Many of the emerging OSNs, such as Foursquare [6]/Swarm [5], Quora [18] and Pinterest [12], take advantage of their users' accounts on dominant OSNs to enhance their function-oriented services. They support a cross-site linking function [10], allowing users to link their accounts on dominant OSNs, e.g., Facebook and Twitter. ...
Chapter
Full-text available
Online social networks (OSNs) have become a commodity in our daily life. Besides dominant platforms such as Facebook and Twitter, several emerging OSNs have been launched recently, where users may generate less activity data than on the dominant ones. Identifying influential users is critical for advertising and for the initial development of emerging OSNs. In this work, we investigate the identification of potentially influential users in these emerging OSNs. We build a supervised machine learning-based system by leveraging the widely adopted cross-site linking function, which can overcome the limitations of referring to the user data of a single OSN. Based on real data collected from Twitter (a dominant OSN) and Medium (an emerging OSN), we show that our system is able to achieve an F1-score of 0.701 and an AUC of 0.755 in identifying influential users on Medium using the Twitter data only.
... In (Wang et al. 2013), the authors perform a detailed analysis of Quora. They show that heterogeneity in the user and question graphs is a significant contributor to the quality of Quora's knowledge base. ...
Preprint
Full-text available
Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. The choice of Quora is motivated by the fact that this is one of the rare social Q&A sites that allow users to explicitly post anonymous questions, and such activity in this forum has become normative rather than a taboo. Through an analysis of 5.1 million questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to feature once we "deep dive" and (topically) cluster the questions and compare the clusters that have high volumes of anonymous questions with those that have low volumes of anonymous questions. In particular, we observe that the choice to post a question as anonymous depends on the user's perception of anonymity, and users often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. We further perform personality trait analysis and observe that the anonymous group of users has positive correlations with extraversion and agreeableness, and a negative correlation with openness. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity between the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics that talk about personal and sensitive issues, which hints toward a higher degree of community support and user engagement.
Article
Full-text available
Primary research on detecting duplicate question pairs within community-based question answering systems is based on datasets made of English questions only. This research puts forward a solution to the problem of duplicate question detection by matching semantically identical questions in transliterated bilingual data. Deep learning has been implemented to analyze informal languages like Hinglish, a bilingual mix of Hindi and English, on Community Question Answering (CQA) platforms to identify duplicate questions. The proposed model works in two sequential modules. The first module is a language transliteration module which converts input questions into mono-language text. The next module takes the transliterated text, where a hybrid deep learning model implemented using multiple layers is used to detect duplicate questions in the mono-lingual data. The similarity between the question pairs is computed using this hybrid model, which combines a Siamese neural network with identical capsule networks as the subnetworks and a decision tree classifier. The Manhattan distance function is used with the Siamese network for computing the similarity between questions. The proposed model has been validated on 150 pairs of questions scraped from social media platforms such as Tripadvisor and Quora, achieving an accuracy of 87.0885% and an AUC-ROC value of 0.86.
Article
The rise of blockchain technology has brought innovations in various business domains and spawned a new form of organization—the decentralized autonomous organization (DAO). Steemit is recognized as one of the earliest blockchain-based online communities, and a typical example of DAO. By endowing community members with new roles, the decentralization of blockchain-based communities brings changes to the design of user incentive mechanisms. However, few studies have paid attention to the user incentive mechanism when users play the dual roles of social participant and community owner. On the basis of social capital theory and psychological ownership theory, this study explores Steemit's incentive mechanism by evaluating the impact of these dual roles on user active participation behavior. The study adopts a two-way fixed effect negative binomial regression to test the research model. The results show that users’ social capital, share capital, social feedback, and economic feedback positively affect their active participation behavior. At the same time, social feedback and economic feedback play moderating roles on the effects of the dual capitals. Overall, this research provides both theoretical insights and practical implications for understanding, designing, and governing blockchain-based online communities.
Chapter
Information exchange systems are the basis on which the current information age operates. It is therefore important to understand how these systems work and how we can use them to our benefit. This study aims to analyze information exchange platforms found on the internet, how they operate, what benefits they offer, and what disadvantages they pose. To more deeply understand their operation, we created our own questions-and-answers system. In later chapters, we discuss how this system is created, how users can interact while on the platform, and what best practices we used based on our previous analysis of other similar platforms. Based on our research and the system created, we can conclude that, with the right knowledge, a Q&A platform is easy to implement and acts as a very useful tool to support the free exchange of information. Especially in the field of business, such a system may be a major advantage in marketing, recruiting and customer support, or when used internally to support the exchange of ideas and experiences between employees.
Article
Full-text available
In the context of Web 2.0, the interaction between users and resources is more and more frequent in the process of resource sharing and consumption. However, the current research on resource pricing mainly focuses on the attributes of the resource itself, and does not weigh the interests of the resource sharing participants. In order to deal with these problems, a pricing mechanism for resource-user interaction evaluation based on multi-agent game theory is established in this paper. Moreover, user similarity, evaluation bias based on link analysis, and punishment of academic group cheating are also included in the model. Based on the data of 181 scholars and 509 articles from the Wanfang database, this paper conducts 5483 pricing experiments over 13 months, and the results show that this model is more effective than other pricing models: the pricing accuracy of resources is 94.2%, and the accuracy of user value evaluation is 96.4%. Besides, this model can intuitively show the relationships within users and within resources. The case study also exhibits that a user's knowledge level is not positively correlated with his or her authority. Discovering and punishing academic group cheating is conducive to objectively evaluating researchers and resources. The pricing mechanism for scientific and technological resources and users proposed in this paper is the premise of fair trade of scientific and technological resources.
Article
Informative contributions are critical for the healthy development of online Q&A communities, which have gained increasing popularity in solving personalized open-ended problems. However, little is known about whether past contribution behaviors and the corresponding community feedback received affect the characteristics of subsequent contributions. Drawing upon social cognitive theory, we examine the learning effects on users' knowledge contribution behaviors. Specifically, we focus on two types of learning effects: enactive learning from one's past contribution experience and vicarious learning from observation of others' performances in a question thread. Using a dataset collected from one of the largest online Q&A communities in China, we find that the length feature of past user contributions that garner highly positive feedback, whether through enactive or vicarious learning, influences the informativeness of subsequent contributions in the community. These learning effects are more effective for users with higher social status. The enactive learning effect is stronger for contributors with higher social status. For vicarious learning by higher-status contributors, the influence of high-vote long answers is stronger, while high-vote short answers show a weaker effect. Our research provides a deeper understanding of knowledge contribution behaviors in online knowledge communities and guidance for establishing a healthy knowledge contribution environment.
Article
Full-text available
As social media influencing becomes increasingly popular, understanding why some posts are more highly followed than others, especially from the perspective of those leading the discussion, allows us to gain insight into how followership is influenced. A qualitative study of eight participants leading active discussions on Quora was conducted using semi-structured in-depth interviews, followed by thematic analysis. The open coding method was used to iteratively code related answers to develop themes. Results suggest that copyright tactics, controversial answers and sharing new information are some of the mechanisms for influencing followership. These mechanisms are built over time through conscious, strong engagement and by consistently writing well-thought-out answers. The motivation for leading and writing answers on Quora was more intrinsic than extrinsic, and most participants believed influencing followership should not be a concern if one has the right message.
Article
As more people flock to social media to connect with others and form virtual communities, it is important to research how members of these groups interact to understand human behavior on the Web. In response to an increase in hate speech, harassment, and other antisocial behaviors, many social media companies have implemented different content and user moderation policies. On Reddit, for example, communities, i.e., subreddits, are occasionally banned for violating these policies. We study the effect of these regulatory actions as well as when a community experiences a significant external event such as a political election or a market crash. Overall, we find that most subreddit bans prompt a small, but statistically significant, number of active users to leave the platform; the effect of external events varies with the type of event. We conclude with a discussion on the effectiveness of the bans and wider implications for online content moderation.
Article
As an emerging platform, online question-answering (Q&A) communities are becoming valuable corpus sources that reflect the opinions of expert consumers on comparable entities. In this study, a novel method, Identifying Comparable entities from online Question-Answering contents (ICQA), is proposed to effectively extract comparable entities from online Q&A communities. In ICQA, candidate entities are first extracted by utilizing the advantages of pattern-based methods and supervised learning-based methods. An entity comparison network is then built by considering the credibility difference and entity relatedness in Q&A contents to analyze competitiveness between entities and extract comparable entities from the candidate entities accordingly. Thus, within the same method framework, comparable entities and their competitiveness ranks can be simultaneously identified for a given entity specified by managers or consumers. Taking the automotive industry as the experimental background, the effectiveness of ICQA is demonstrated with comprehensive experiments regarding comparable entity identification and comparable entity ranking. Experimental results demonstrate that, compared with state-of-the-art methods, ICQA can identify more accurate and broader comparable entities, find novel entities ignored by other methods, and provide a more solid comparable entity ranking, features that are deemed desirable and applicable for both managers and consumers in making rational decisions based on online Q&A contents.
Article
Full-text available
Agility gives the ability to be highly responsive to changes and make improvements in the way an agile project is documented. Just-in-time and just-barely-good-enough documentation may miss important executable specifications. Interestingly, the frequently asked 'how-to-do' questions on popular question answering (Q&A) websites like Stack Overflow are strong indicators of gaps in documentation, and respondent answers can complement conventional software documentation practices. Social interaction within these Q&A websites generates partially structured content, commonly referred to as crowd knowledge, that can offer peer-reviewed re-documentation by integrating the answers to these 'how-to-do' concerns. But finding the best, value-added answer to a question, which can contribute toward enriched and curated documentation, is computationally difficult. Moreover, query duplicates can cause seekers to spend more time finding these best answers. As a solution, this research proffers a novel question-answering crowd documentation model (QACDoc) based on a socially mediated documentation mechanism involving the dynamics of the community-based web. First, duplicate questions are detected using a Siamese neural architecture in which two identical hierarchical attention networks are used to generate vectors for similarity matching. Semantic matching is done using the Manhattan distance function, and a multi-layer perceptron is trained to output the predictions. Next, all respondents of semantically matched questions are grouped to form an intent-based crowd, and lastly, the top-k experts are identified using representative social presence features for expert ranking. The crowd documentation is then filtered to only include the answers of these identified experts.
Chapter
Online Question & Answering (Q&A) platforms facilitate instant information, but with the influx of questions and answers, response times are high and the quality of answers is compromised too. Duplicate content further corrupts the filtering mechanism. Concurrently, bilingual mash-ups, specifically Anglicization of language (that is, making or becoming English in sound, appearance, or character), are a commonly observed phenomenon on social media. This research puts forward a model for semantic matching of duplicate question pairs, where one question is in the source language (English) and the other is a mash-up (Hindi + English = Hinglish). In the proposed model, language transformation is first performed to translate the mash-up question into source-language text, and then a Siamese artificial neural network (multi-layer perceptron) is implemented to detect semantically similar question pairs, using the Manhattan distance function as the similarity measure. The encoder vector representations and their distance are given as input to a logit (logistic regression) model for binary classification as duplicate or not duplicate. The model achieves an accuracy of 70.09%.
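For readers unfamiliar with the Siamese/Manhattan-distance idea that several of the abstracts above rely on, here is a minimal, hedged PyTorch sketch; the shared encoder, input dimensions, and the exp(-L1) similarity are illustrative assumptions rather than any cited paper's exact architecture.

```python
# Minimal sketch (not the authors' code) of a Siamese multi-layer perceptron
# that scores question pairs with a Manhattan-distance similarity. Input
# features (e.g., averaged word embeddings) and dimensions are assumptions.
import torch
import torch.nn as nn

class SiameseMLP(nn.Module):
    def __init__(self, in_dim=300, hidden=128):
        super().__init__()
        # One shared encoder applied to both questions (the "identical subnetworks").
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, q1, q2):
        h1, h2 = self.encoder(q1), self.encoder(q2)
        # Manhattan (L1) distance turned into a similarity in (0, 1]:
        # exp(-||h1 - h2||_1), close to 1 for near-duplicate pairs.
        return torch.exp(-torch.sum(torch.abs(h1 - h2), dim=1))

# Toy usage: two batches of 4 question vectors of dimension 300.
model = SiameseMLP()
q1, q2 = torch.randn(4, 300), torch.randn(4, 300)
print(model(q1, q2))  # duplicate-likelihood scores, one per pair
```

The shared encoder is the defining property of a Siamese network: both questions pass through identical weights, so the learned similarity is symmetric in the two inputs.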
Conference Paper
Full-text available
Question-and-answer (Q&A) websites are one of the latest evolutions in crowdsourced knowledge aggregation. Q&A websites provide more diverse opinions, as they involve the entire community. Quora made its reputation by enhancing the traditional Q&A model with popular aspects of social media, and it incites its users to provide their names, locations, and references. This model allows higher quality control, including of anonymous content, but more importantly, it leads users to form communities based on criteria other than similar interests (e.g., profession, city). In this paper, we study the interactions among Quorans to unveil how such communities emerge. We perform both quantitative and qualitative analysis of the user-generated content and relate this content to social and demographic features. We show that being anonymous significantly affects the answers' length and subjectivity. On the other hand, most of the user interactions relate to the users' geographic locations.
Article
Large Question-and-Answer (Q&A) platforms support diverse knowledge curation on the Web. While researchers have studied user behaviour on such platforms in a variety of contexts, there is relatively little insight into important by-products of user behaviour that also encode knowledge. Here, we analyse and model the macroscopic structure of tags applied by users to annotate and catalogue questions, using a collection of 168 Stack Exchange websites that span a diversity of sizes and topics. We study the distribution of tag frequencies and also the structure of ‘co-tagging’ networks where nodes are tags and links connect tags that have been applied to the same question. We find striking similarity in tagging structure across Stack Exchange communities, even though each community evolves independently (albeit under similar guidelines). Our findings thus provide evidence that social tagging behaviour is largely driven by the Stack Exchange platform itself and not by the individual Stack Exchange communities. We also develop a simple generative model that creates random bipartite graphs of tags and questions. Our model accounts for the tag frequency distribution but does not explicitly account for co-tagging correlations. Even under these constraints, we demonstrate empirically and theoretically that our model can reproduce a number of the statistical properties that characterize co-tagging networks.
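As a toy, hedged illustration of the kind of generative model described above (random bipartite question-tag graphs whose tag frequencies follow a heavy-tailed distribution, projected into a co-tagging network), the following sketch uses networkx; the Zipf-like weights and the fixed number of tags per question are simplifying assumptions, not the paper's model.

```python
# Toy illustration (my own, not the paper's model) of generating a random
# bipartite question-tag graph from a heavy-tailed tag frequency distribution,
# then projecting it onto a co-tagging network of tags.
import numpy as np
import networkx as nx
from networkx.algorithms import bipartite

rng = np.random.default_rng(0)
n_tags, n_questions, tags_per_question = 200, 1000, 3

# Zipf-like tag popularity (assumption; the paper fits the empirical distribution).
weights = 1.0 / np.arange(1, n_tags + 1)
weights /= weights.sum()

B = nx.Graph()
B.add_nodes_from((f"t{i}" for i in range(n_tags)), bipartite=0)
B.add_nodes_from((f"q{j}" for j in range(n_questions)), bipartite=1)
for j in range(n_questions):
    chosen = rng.choice(n_tags, size=tags_per_question, replace=False, p=weights)
    B.add_edges_from((f"t{i}", f"q{j}") for i in chosen)

# Co-tagging network: two tags are linked if they annotate the same question.
tags = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0]
co_tagging = bipartite.weighted_projected_graph(B, tags)
print(co_tagging.number_of_nodes(), co_tagging.number_of_edges())
```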
Article
Full-text available
Question answering (QA) websites now play a crucial role in meeting Internet users' information needs. Quora is a growing QA platform where users get quick answers to their questions from their peers. Nonetheless, a significant number of questions remain unanswered for a long time. Questions that remain unanswered for long, are opinion-based, need a debate to be answered, or have no valid answer fall under the insincere question group. It is therefore important to weed out insincere questions in order to maintain the integrity of the site. Quora has a huge number of such questions that cannot be filtered manually. To overcome this problem, this paper proposes a multi-layer convolutional neural network model that helps to minimize insincere questions on the website. Two embeddings were created from the Quora dataset: (i) using Skip-gram, and (ii) using the Continuous Bag-of-Words model. The created embeddings and a pre-trained GloVe embedding vector were used for system development. The proposed model needs only the question text to predict whether a question is insincere or not and is hence free from manual feature engineering. The experimental results indicated that the proposed multi-layer CNN model outperforms earlier works, achieving an F1-score of 0.98 in the best case.
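A rough, hedged sketch of what a multi-layer CNN text classifier of this kind can look like in PyTorch; the kernel sizes, filter counts, and vocabulary size are assumptions, and the embedding layer would, per the abstract, be initialized from Skip-gram, CBOW, or GloVe vectors.

```python
# Rough sketch (architecture details are assumptions, not the paper's exact
# model) of a multi-layer 1-D CNN over word embeddings for classifying a
# question as insincere or not.
import torch
import torch.nn as nn

class QuestionCNN(nn.Module):
    def __init__(self, vocab_size=50_000, emb_dim=300, n_filters=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # could be loaded from GloVe/Skip-gram/CBOW vectors
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, kernel_size=k) for k in (2, 3, 4)])
        self.fc = nn.Linear(n_filters * 3, 1)

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)     # (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.sigmoid(self.fc(torch.cat(pooled, dim=1))).squeeze(1)

model = QuestionCNN()
scores = model(torch.randint(0, 50_000, (8, 40)))   # 8 questions, 40 tokens each
print(scores.shape)  # probability of being insincere, one per question
```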
Article
The social web has transformed healthcare communication as patients reach out to seek support and advice by connecting with other patients, caregivers and healthcare professionals. The influx of health-related queries and the volume of answers within medical forums is a testimony to this adoption. The scalability, natural interaction and dynamism of continuously collected and connected user-generated social big data can support health assessment, intervention and provisioning to produce the best kind of cognitive smart city. On the flip side, the use of social media for healthcare communication suffers from data deluge, lack of reliability and quality, and confidentiality and privacy (location/personal) issues. Duplicate questions, that is, queries with similar semantics (meaning), corrupt the filtering mechanism, increase the response time and compromise the quality of the answers too. This research puts forward solutions to resolve the key challenge of duplicacy within medical community question-answering sites (medical CQAs). We propose to solve the semantic question matching problem for duplicate question pair detection using a hybrid deep learning model, which combines a co-attention based Bi-Directional Long Short-Term Memory (Bi-LSTM) Siamese neural network and a multi-layer perceptron classifier to output the probability of a similarity match between two questions. The Euclidean distance function is then used to compute the similarity between questions. The proposed model is validated on 100 question pairs scraped from three featured groups, namely 'Irritable Bowel Syndrome', 'Anxiety Disorder' and 'Menopause' of the Patient.info community forum, and an accuracy of 86.375% is observed. The results obtained are comparable to Quora's state-of-the-art results for duplicate detection.
Article
Full-text available
Amputation is a growing health issue with implications for the corporeal form and sense of bodily identity. Disposal of the removed limb (the amputate) has historically been suggested to impact on patient adaptation to amputation, although understandings of limb disposal are scarce within existing research. The growth of online question and answer sites has created opportunities for social actors to post and respond to a vast array of topic areas, including those that are seen as morbid or taboo. This paper then explores the discussion of amputate disposal within threads from two popular question and answer sites. Using thematic analysis, the paper examines how perceived ownership of limbs, understanding of the amputate as ‘waste’ and recourse to grotesque humour are key means by which limb disposal is discussed within these sites. Posters then create a new knowledge around the disposal of limbs, albeit one framed by uncertainty.
Chapter
Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. Quora allows users to explicitly post anonymous questions, and such activity in this forum has become normative rather than a taboo. Through an analysis of millions of questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions posted on Quora. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to feature once we “deep dive” and (topically) cluster the questions and compare them. In particular, we observe that the choice to post a question as anonymous depends on the user's perception of anonymity, and users often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity between the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics that talk about personal and sensitive issues, which hints toward a higher degree of community support.
Conference Paper
Full-text available
Recently, there has been an increasing trend of people sharing their comments on online question and answer (QA) websites. On such websites, the ranking of answers is usually determined by several factors, such as the received upvotes and downvotes, the publishing time, and the reputation of users, so that a highly ranked answer has a much greater chance of being propagated over the Internet. Due to the popularity of these QA websites, some public relations companies try to cheat users by promoting or blocking certain answers. In this paper, we take Quora and Zhihu as two case studies to understand the impact of follower numbers, the publishing time, and the upvotes and downvotes on the ranking of answers, and to investigate how these websites are vulnerable to voting spammers. Meanwhile, we propose a scheme to estimate the list of downvotes, which is usually hidden on such websites.
Article
In the area of community question answering (CQA), answer selection and answer ranking are two tasks that help users quickly access valuable answers. Existing solutions mainly exploit the syntactic or semantic correlation between a question and its related answers (Q&A), while the multifaceted domain effects in CQA are still underexplored. In this paper, we propose a unified model, the enhanced attentive recurrent neural network (EARNN), for both answer selection and answer ranking tasks by taking full advantage of both Q&A semantics and multifaceted domain effects (i.e., topic effects and timeliness). Specifically, we develop a serialized long short-term memory to learn unified representations of Q&A, where two attention mechanisms at either sentence level or word level are designed for capturing the deep effects of topics. Meanwhile, the emphasis of Q&A can be automatically distinguished. Furthermore, we design a time-sensitive ranking function to model timeliness in CQA. To effectively train EARNN, a question-dependent pairwise learning strategy is also developed. Finally, we conduct extensive experiments on a real-world dataset from Quora. Experimental results validate the effectiveness and interpretability of our proposed EARNN model.
Conference Paper
Full-text available
Online tech support communities have become valuable channels for users to seek and provide solutions to specific problems. From the resource exchange perspective, the sustainability of a social system is contingent upon the size of its membership as well as their communication activities. To further extend the resource-based model, the current research identifies a variety of social roles in a large tech support Q&A forum and examines longitudinal changes in the community's structure based on this identification. Moreover, this study also investigates the relationship between the community's functionality and its traffic. Results suggest that the proportion of unsolved questions negatively impacts the number of future incoming questions, and that the outcome of a given question depends not only on users' interactions within the discussion, but also on the community activities preceding the question. These observations can help community managers to improve system design and task allocation.
Article
Social Q&A communities are becoming increasingly popular, but the literature on users' continued participation is still relatively limited. Based on the theory of planned behavior, this paper aims to answer two research questions: (1) What factors motivate users to continuously participate in online social Q&A communities? (2) How do the factors differ across lurkers, askers and answerers? An online survey was conducted in a Chinese social Q&A community, and motivational factors were selected from three perspectives: psychological, social and functional. Results indicate that commitment, shared language and shared vision have a positive influence on both lurkers' and answerers' attitudes towards continued participation. Concerning the functional dimension, hypotheses about the influence of network externalities on users' perceived usefulness are partially supported. Regarding the psychological dimension, motivations differ greatly between lurkers and answerers.
Article
Quora is one of the most popular community question & answer (Q&A) sites of recent times. However, with question posts increasing over time and covering a wide range of topics (unlike focused Q&A sites like Stack Overflow), not all of them get answered. Measuring answerability (i.e., whether a question will get answered or not) involves collecting expensive human judgment data that can differentiate the characteristics of an answered question from an unanswered (aka open) one. Factors for judging whether a question will remain open include its subjectivity, open-endedness, vagueness, ambiguity, and so on. It is difficult to collect such judgments for thousands of questions, requiring an automatic framework to deal with the answerability of questions. In this paper, we quantify (1) user-level and (2) question-level linguistic activities that correspond well to many of the judgment factors noted earlier, can easily be measured for each question post, and appropriately discriminate an answered question from an unanswered one. Our central finding is that the way users use language while writing the question text can be a very effective means of characterizing answerability. This characterization further helps us to predict early whether a question remaining unanswered for a specific time period t will eventually be answered or not, achieving an accuracy of 76.26% (t = 1 month) and 68.33% (t = 3 months). Notably, features representing the language use patterns of the users are the most discriminative and alone account for an accuracy of 74.18%. We also compare our method with some similar works [1], [2], achieving a maximum improvement of ~39% in terms of accuracy.
Chapter
Expert finding plays an important role in community question answering websites. Previously, most works focused on assessing user expertise scores mainly from the semantic features of their past question-answering activity. In this work, we propose a gating mechanism to dynamically combine structural and textual representations based on past question-answering behaviors. We also use user activities, including temporal behaviors, as the features that determine the gate values. We evaluate the performance of our method on the well-known question answering sites Stack Exchange and Quora. Experiments show that our approach can improve performance on expert finding tasks.
Chapter
Social questioning and answering (social Q&A or SQA) is a community-based online service on which peer users ask and answer questions to and for one another about various topics in everyday life. Social Q&A has been labeled with several variations, such as community Q&A, collaborative Q&A, and online Q&A, but it most often refers to a free and open Q&A site with dedicated users who subscribe to the service to ask and answer questions. This encourages people to bring up their various issues, to actively seek solutions and suggestions, and to share personal experiences as well as to give and receive social and emotional support. This chapter provides a literature review of the recent social Q&A research and explains the theories and methods that have been applied to conducting social Q&A research with examples from previous studies in order to show a range of diverse approaches to examining user behaviors and interactions in social Q&A.
Article
Community-based question answering platforms have attracted substantial numbers of users to share knowledge and learn from each other. With the rapid growth of community-based question answering (CQA) platforms, large numbers of overlapping questions emerge, which makes it hard for users to select a proper reference. It is therefore urgent to adopt effective automated algorithms to reuse historical questions with their corresponding answers. In this paper, we focus on the problem of question retrieval, which aims to match historical questions that are relevant or semantically equivalent to resolve a query directly. The challenges in this task are the lexical gaps between questions caused by word ambiguity and word mismatch. Furthermore, the limited number of words in queried sentences causes sparsity of word features. To alleviate these challenges, we propose a novel framework named HSIN which encodes not only the question contents but also the asker's social interactions to enhance the question embedding performance. More specifically, we apply a random walk based learning method with a recurrent neural network to match the similarities between the asker's question and historical questions proposed by other users. Extensive experiments on a large-scale dataset from the real-world CQA site Quora show that employing the heterogeneous social network information outperforms the other state-of-the-art solutions in this task.
Conference Paper
In recent times, question-answer communities have attracted much user attention and have become a major platform for knowledge sharing and discussion. Stack Exchange (SE) is one such successful community, which is a collection of various domain-specific forums, each acting as an independent community in itself. In this paper, we undertake a comparative measurement study across a large number of these domain-specific forums within Stack Exchange. We analyse a number of user activity-based features of each forum and try to cluster different forums based on their similarities in this feature set. For our study, we model Stack Exchange as an across-forum graph based on inter-forum similarity, and its individual forums as: (a) a user-to-user graph (question asker-answerer), (b) a bipartite graph between questions and answerers, and (c) a bipartite graph between questions and answers. Through these graphs we present a measurement study of Stack Exchange which focuses on the similarities and differences between various forums based on the patterns of user activity on them. The clusters obtained give a high-level idea of similar forums based on common users and content. We observe that communities can be classified as "discussion-based" and "fact-based", and we further classify forums on the basis of question answering patterns.
Article
Community question answering (CQA) platforms are collaborative online places where members ask questions for others to answer. Community members on these platforms share their expertise on various topics, from mechanical repairs to parenting. As crowd-sourced services, such platforms not only depend on user-provided questions and answers, but also rely on their users for monitoring and flagging content that violates community rules. This study focuses on user-reported flags to characterize the behavior of the good guys and bad guys in a popular community question answering site, Yahoo Answers. Conventional wisdom is to eliminate the users who receive many flags. However, our analysis of a year of traces from Yahoo Answers shows that the number of flags does not tell the full story: on one hand, users with many flags may still contribute positively to the community. On the other hand, users who never get flagged are found to violate community rules and get their accounts suspended. This analysis, however, also shows that abusive users are betrayed by their network properties: we find strong evidence of homophilous behavior and use this finding to detect abusive users who fly under the community radar. Based on our empirical observations, we build a classifier that is able to detect abusive users with an accuracy as high as 83%.