Preprint

Characterizing and Forecasting User Engagement with In-app Action Graph: A Case Study of Snapchat

Authors:
Preprints and early-stage research may not have been peer reviewed yet.
To read the file of this research, you can request a copy directly from the authors.

Abstract

While mobile social apps have become increasingly important in people's daily life, we have limited understanding on what motivates users to engage with these apps. In this paper, we answer the question whether users' in-app activity patterns help inform their future app engagement (e.g., active days in a future time window)? Previous studies on predicting user app engagement mainly focus on various macroscopic features (e.g., time-series of activity frequency), while ignoring fine-grained inter-dependencies between different in-app actions at the microscopic level. Here we propose to formalize individual user's in-app action transition patterns as a temporally evolving action graph, and analyze its characteristics in terms of informing future user engagement. Our analysis suggested that action graphs are able to characterize user behavior patterns and inform future engagement. We derive a number of high-order graph features to capture in-app usage patterns and construct interpretable models for predicting trends of engagement changes and active rates. To further enhance predictive power, we design an end-to-end, multi-channel neural model to encode temporal action graphs, activity sequences, and other macroscopic features. Experiments on predicting user engagement for 150k Snapchat new users over a 28-day period demonstrate the effectiveness of the proposed models. The prediction framework is deployed at Snapchat to deliver real world business insights. Our proposed framework is also general and can be applied to other social app platforms.

No file available

Request Full-text Paper PDF

To read the file of this research,
you can request a copy directly from the authors.

... While the design of predictive technologies tends to be agnostic to theoretical insights from the behavioral sciences, their application is closely aligned with the catchphrase that past behavior is the best predictor of future behavior [7,8]. However, the availability of fine-grained behavioral user data also opens the door for modeling approaches aimed at deeper psychological constructs, such as personality [9][10][11][12][13][14] or habits [15][16][17]. ...
... For example, user engagement on Snapchat follows habitual patterns over time, enabling new neural network architectures specifically designed to represent cyclical patterns [15]. Similarly, fine-grained in-app action sequences follow predictable patterns that can be characterized as habitual [16]. However, with the exception of time, the role of context has largely been overlooked in studies linking habits and user engagement. ...
... These findings underscore the value of contextualized representations of user behavior for predicting user engagement on social platforms. Additionally, our findings are aligned with previous research suggesting that user behavior on Snapchat follows habitual patterns over time [15,16]. More broadly, our findings are consistent with the notion that social sharing [48,96] and media consumption [17,40,46,47,49] are habit-driven. ...
Article
Full-text available
The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80.000 users, we demonstrate that patterns of active and passive use are predictable from past behavior (R2R2{R^2}=0.345) and that the integration of context features substantially improves predictive performance compared to the behavioral baseline model (R2R2{R^2}=0.522). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context is considered (R2R2{R^2}=0.442). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, highlighting the value of contextualized representations of user behavior for predicting user engagement on online social platforms.
Conference Paper
Full-text available
As online platforms are striving to get more users, a critical challenge is user churn, which is especially concerning for new users. In this paper, by taking the anonymous large-scale real-world data from Snapchat as an example, we develop ClusChurn , a systematic two-step framework for interpretable new user clustering and churn prediction, based on the intuition that proper user clustering can help understand and predict user churn. Therefore, ClusChurn firstly groups new users into interpretable typical clusters, based on their activities on the platform and ego-network structures. Then we design a novel deep learning pipeline based on LSTM and attention to accurately predict user churn with very limited initial behavior data, by leveraging the correlations among users' multi- dimensional activities and the underlying user types. ClusChurn is also able to predict user types, which enables rapid reactions to different types of user churn. Extensive data analysis and experiments show that ClusChurn provides valuable insight into user behaviors, and achieves state-of-the-art churn prediction performance. The whole framework is deployed as a data analysis pipeline, delivering real-time data analysis and prediction results to multiple relevant teams for business intelligence uses. It is also general enough to be readily adopted by any online systems with user behavior data.
Article
Full-text available
Mobile health applications that track activities, such as exercise, sleep, and diet, are becoming widely used. While these activity tracking applications have the potential to improve our health, user engagement and retention are critical factors for their success. However, long-term user engagement patterns in real-world activity tracking applications are not yet well understood. Here we study user engagement patterns within a mobile physical activity tracking application consisting of 115 million logged activities taken by over a million users over 31 months. Specifically, we show that over 75% of users return and re-engage with the application after prolonged periods of inactivity, no matter the duration of the inactivity. We find a surprising result that the re-engagement usage patterns resemble those of the start of the initial engagement period, rather than being a simple continuation of the end of the initial engagement period. This evidence points to a conceptual model of multiple lives of user engagement, extending the prevalent single life view of user activity. We demonstrate that these multiple lives occur because the users have a variety of different primary intents or goals for using the app. We find evidence for users being more likely to stop using the app once they achieved their primary intent or goal (e.g., weight loss). However, these users might return once their original intent resurfaces (e.g., wanting to lose newly gained weight). Based on insights developed in this work, including a marker of improved primary intent performance, our prediction models achieve 71% ROC AUC. Overall, our research has implications for modeling user re-engagement in health activity tracking applications and has consequences for how notifications, recommendations as well as gamification can be used to increase engagement.
Article
Full-text available
The next generation of Internet services is driven by users and user-generated content. The complex nature of user behavior makes it highly challenging to manage and secure online services. On one hand, service providers cannot effectively prevent attackers from creating large numbers of fake identities to disseminate unwanted content (e.g., spam). On the other hand, abusive behavior from real users also poses significant threats (e.g., cyberbullying). In this article, we propose clickstream models to characterize user behavior in large online services. By analyzing clickstream traces (i.e., sequences of click events from users), we seek to achieve two goals: (1) detection: to capture distinct user groups for the detection of malicious accounts, and (2) understanding: to extract semantic information from user groups to understand the captured behavior. To achieve these goals, we build two related systems. The first one is a semisupervised system to detect malicious user accounts (Sybils). The core idea is to build a clickstream similarity graph where each node is a user and an edge captures the similarity of two users’ clickstreams. Based on this graph, we propose a coloring scheme to identify groups of malicious accounts without relying on a large labeled dataset. We validate the system using ground-truth clickstream traces of 16,000 real and Sybil users from Renren, a large Chinese social network. The second system is an unsupervised system that aims to capture and understand the fine-grained user behavior. Instead of binary classification (malicious or benign), this model identifies the natural groups of user behavior and automatically extracts features to interpret their semantic meanings. Applying this system to Renren and another online social network, Whisper (100K users), we help service providers identify unexpected user behaviors and even predict users’ future actions. Both systems received positive feedback from our industrial collaborators including Renren, LinkedIn, and Whisper after testing on their internal clickstream data.
Article
Full-text available
Online crowdfunding platforms like DonorsChoose.org and Kickstarter allow specific projects to get funded by targeted contributions from a large number of people. Critical for the success of crowdfunding communities is recruitment and continued engagement of donors. With donor attrition rates above 70%, a significant challenge for online crowdfunding platforms as well as traditional offline non-profit organizations is the problem of donor retention. We present a large-scale study of millions of donors and donations on DonorsChoose.org, a crowdfunding platform for education projects. Studying an online crowdfunding platform allows for an unprecedented detailed view of how people direct their donations. We explore various factors impacting donor retention which allows us to identify different groups of donors and quantify their propensity to return for subsequent donations. We find that donors are more likely to return if they had a positive interaction with the receiver of the donation. We also show that this includes appropriate and timely recognition of their support as well as detailed communication of their impact. Finally, we discuss how our findings could inform steps to improve donor retention in crowdfunding communities and non-profit organizations.
Article
Full-text available
In the competitive environment of the internet, retaining and growing one's user base is of major concern to most web services. Furthermore, the economic model of many web services is allowing free access to most content, and generating revenue through advertising. This unique model requires securing user time on a site rather than the purchase of good which makes it crucially important to create new kinds of metrics and solutions for growth and retention efforts for web services. In this work, we address this problem by proposing a new retention metric for web services by concentrating on the rate of user return. We further apply predictive analysis to the proposed retention metric on a service, as a means for characterizing lost customers. Finally, we set up a simple yet effective framework to evaluate a multitude of factors that contribute to user return. Specifically, we define the problem of return time prediction for free web services. Our solution is based on the Cox's proportional hazard model from survival analysis. The hazard based approach offers several benefits including the ability to work with censored data, to model the dynamics in user return rates, and to easily incorporate different types of covariates in the model. We compare the performance of our hazard based model in predicting the user return time and in categorizing users into buckets based on their predicted return time, against several baseline regression and classification methods and find the hazard based approach to be superior.
Article
Full-text available
Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same "search mission". Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.
Conference Paper
Full-text available
Understanding how users behave when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In this paper, we present a rst of a kind analysis of user workloads in on- line social networks. Our study is based on detailed click- stream data, collected over a 12-day period, summarizing HTTP sessions of 37,024 users who accessed four popular social networks: Orkut, MySpace, Hi5, and LinkedIn. The data were collected from a social network aggregator web- site in Brazil, which enables users to connect to multiple social networks with a single authentication. Our analysis of the clickstream data reveals key features of the social net- work workloads, such as how frequently people connect to social networks and for how long, as well as the types and sequences of activities that users conduct on these sites. Ad- ditionally, we crawled the social network topology of Orkut, so that we could analyze user interaction data in light of the social graph. Our data analysis suggests insights into how users interact with friends in Orkut, such as how frequently users visit their friends' or non-immediate friends' pages. In summary, our analysis demonstrates the power of using clickstream data in identifying patterns in social network workloads and social interactions. Our analysis shows that browsing, which cannot be inferred from crawling publicly available data, accounts for 92% of all user activities. Con- sequently, compared to using only crawled data, considering silent interactions like browsing friends' pages increases the measured level of interaction among users.
Conference Paper
Full-text available
MillionsofusersretrieveinformationfromtheInternetusingsearch engines. Mining these user sessions can provide valuable informa- tion about the quality of user experience and the perceived quality of search results. Often search engines rely on accurate estimates of Click Through Rate (CTR) to evaluate the quality of user expe- rience. The vast heterogeneity in the user population and presence of automated software programs (bots) can result in high variance in the estimates of CTR. To improve the estimation accuracy of user experience metrics like CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our ap- proach to identify these sessions is based on detecting outliers using Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics includ- ing a novel conformance score obtained by Markov Chain analysis. Editorial results show that our approach of identifying typical and atypical sessions has a precision of about 89%. Filtering out these atypical sessions reduces the uncertainty (95% confidence inter- val) of the mean CTR by about 40%. These results demonstrate that our approach of identifying typical and atypical user sessions is extremely valuable for cleaning "noisy" user session data for in- creased accuracy in evaluating user experience.
Conference Paper
Full-text available
Web usage mining is a main research area in Web mining focused on learning about Web users and their interactions with Web sites. Main challenges in Web usage mining are the application of data mining techniques to Web data in an efficient way and the discovery of non trivial user behaviour patterns. In this paper we focus the attention on search engines analyzing query log data and showing several models about how users search and how users use search engine results.
Article
Full-text available
Classification is an important topic in data mining research. Given a set of data records, each of which belongs to one of a number of predefined classes, the classification problem is concerned with the discovery of classification rules that can allow records with unknown class membership to be correctly classified. Many algorithms have been developed to mine large data sets for classification models and they have been shown to be very effective. However, when it comes to determining the likelihood of each classification made, many of them are not designed with such purpose in mind. For this, they are not readily applicable to such problems as churn prediction. For such an application, the goal is not only to predict whether or not a subscriber would switch from one carrier to another, it is also important that the likelihood of the subscriber's doing so be predicted. The reason for this is that a carrier can then choose to provide a special personalized offer and services to those subscribers who are predicted with higher likelihood to churn. Given its importance, we propose a new data mining algorithm, called data mining by evolutionary learning (DMEL), to handle classification problems of which the accuracy of each predictions made has to be estimated. In performing its tasks, DMEL searches through the possible rule space using an evolutionary approach that has the following characteristics: 1) the evolutionary process begins with the generation of an initial set of first-order rules (i.e., rules with one conjunct/condition) using a probabilistic induction technique and based on these rules, rules of higher order (two or more conjuncts) are obtained iteratively; 2) when identifying interesting rules, an objective interestingness measure is used; 3) the fitness of a chromosome is defined in terms of the probability that the attribute values of a record can be correctly determined using the rules it encodes; and 4) the likelihood of predictions (or classifications) made are estimated so that subscribers can be ranked according to their likelihood to churn. Experiments with different data sets showed that DMEL is able to effectively discover interesting classification rules. In particular, it is able to predict churn accurately under- different churn rates when applied to real telecom subscriber data.
Article
Retaining participation is crucial for information services, online knowledge sharing services among them. We present the first comprehensive analysis of users’ activity lifespan across three predominant online knowledge-sharing communities. Extending previous work focusing on initial interactions of new users, we use survival analysis to quantify participation patterns that can be used to predict individual lifespan over the long term. We discuss how cross-site differences in user participation and the underlying factors can be related to differences in system design and culture. We conduct a longitudinal comparison of the communities’ evolvement between two distinct stages, the initial days just after the site launch and one year later. We also observe that sub-communities corresponding to different topics differ in their ability to sustain users. All results reveal the complexity and diversity in users’ engagement to a site and design implications are discussed.
Conference Paper
People have different intents in using online platforms. They may be trying to accomplish specific, short-term goals, or less well-defined, longer-term goals. While understanding user intent is fundamental to the design and personalization of online platforms, little is known about how intent varies across individuals, or how it relates to their behavior. Here, we develop a framework for understanding intent in terms of goal specificity and temporal range. Our methodology combines survey-based methodology with an observational analysis of user activity. Applying this framework to Pinterest, we surveyed nearly 6000 users to quantify their intent, and then studied their subsequent behavior on the web site. We find that goal specificity is bimodal - users tend to be either strongly goal-specific or goal-nonspecific. Goal-specific users search more and consume less content in greater detail than goal-nonspecific users: they spend more time using Pinterest, but are less likely to return in the near future. Users with short-term goals are also more focused and more likely to refer to past saved content than users with long-term goals, but less likely to save content for the future. Further, intent can vary by demographic, and with the topic of interest. Last, we show that user's intent and activity are intimately related by building a model that can predict a user's intent for using Pinterest after observing their activity for only two minutes. Altogether, this work shows how intent can be predicted from user behavior.
Conference Paper
Session identification is a common strategy used to develop metrics for web analytics and perform behavioral analyses of user-facing systems. Past work has argued that session identification strategies based on an inactivity threshold is inherently arbitrary or has advocated that thresholds be set at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user initiated events across several different domains of online activity (incl. video gaming, search, page views and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that the regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude with implications that these temporal rhythms may have for system design based on our observations and theories of goal-directed human activity.
Article
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
Conference Paper
Easy accessibility can often lead to over-consumption, as seen in food and alcohol habits. On video on-demand (VOD) services, this has recently been referred to as binge watching, where potentially entire seasons of TV shows are consumed in a single viewing session. While a user viewership model may reveal this binging behavior, creating an accurate model has several challenges, including censored data, deviations in the population, and the need to consider external influences on consumption habits. In this paper, we introduce a novel statistical mixture model that incorporates these factors and presents a first of its kind characterization of viewer consumption behavior using a real-world dataset that includes playback data from a VOD service. From our modeling, we tackle various predictive tasks to infer the consumption decisions of a user in a viewing session, including estimating the number of episodes they watch and classifying if they continue watching another episode. Using these insights, we then identify binge watching sessions based on deviation from normal viewing behavior. We observe different types of binging behavior, that binge watchers often view certain content out-of-order, and that binge watching is not a consistent behavior among our users. These insights and our findings have application in VOD revenue generation, consumer health applications, and customer retention analysis.
Conference Paper
Online e-commerce applications are becoming a primary vehicle for people to find, compare, and ultimately purchase products. One of the fundamental questions that arises in e-commerce is to characterize, understand, and model user long-term purchasing intent, which is important as it allows for personalized and context relevant e-commerce services. In this paper we study user activity and purchasing behavior with the goal of building models of time-varying user purchasing intent. We analyze the purchasing behavior of nearly three million Pinterest users to determine short-term and long-term signals in user behavior that indicate higher purchase intent. We find that users with long-term purchasing intent tend to save and clickthrough on more content. However, as users approach the time of purchase their activity becomes more topically focused and actions shift from saves to searches. We further find that purchase signals in online behavior can exist weeks before a purchase is made and can also be traced across different purchase categories. Finally, we synthesize these insights in predictive models of user purchasing intent. Taken together, our work identifies a set of general principles and signals that can be used to model user purchasing intent across many content discovery applications.
Conference Paper
Predicting the next request of a user as she visits Web pages has gained importance as Web-based activity increases. Markov models and their variations, or models based on sequence mining have been found well suited for this problem. However, higher order Markov models are extremely complicated due to their large number of states whereas lower order Markov models do not capture the entire behavior of a user in a session. The models that are based on sequential pattern mining only consider the frequent sequences in the data set, making it difficult to predict the next request following a page that is not in the sequential pattern. Furthermore, it is hard to find models for mining two different kinds of information of a user session. We propose a new model that considers both the order information of pages in a session and the time spent on them. We cluster user sessions based on their pair-wise similarity and represent the resulting clusters by a click-stream tree. The new user session is then assigned to a cluster based on a similarity measure. The click-stream tree of that cluster is used to generate the recommendation set. The model can be used as part of a cache prefetching system as well as a recommendation model.
Conference Paper
Web search engines are traditionally evaluated in terms of the relevance of web pages to individual queries. However, relevance of web pages does not tell the complete picture, since an individual query may represent only a piece of the user's information need and users may have different infor- mation needs underlying the same queries. We address the problem of predicting user search goal success by modeling user behavior. We show empirically that user behavior alone can give an accurate picture of the success of the user's web search goals, without considering the relevance of the docu- ments displayed. In fact, our experiments show that models using user behavior are more predictive of goal success than those using document relevance. We build novel sequence models incorporating time distributions for this task and our experiments show that the sequence and time distribu- tion models are more accurate than static models based on user behavior, or predictions based on document relevance.
Modeling user consumption sequences
  • Ravi Austin R Benson
  • Andrew Kumar
  • Tomkins
Austin R Benson, Ravi Kumar, and Andrew Tomkins. 2016. Modeling user consumption sequences. In WWW.
  • David M Blei
  • Andrew Y Ng
  • Michael I Jordan
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993-1022.
MoodBar: Increasing new user retention in Wikipedia through lightweight socialization
  • Giovanni Luca Ciampaglia
  • Dario Taraborelli
Giovanni Luca Ciampaglia and Dario Taraborelli. 2015. MoodBar: Increasing new user retention in Wikipedia through lightweight socialization. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 734-742.
Churn prediction in MMORPGs: A social influence based approach
  • Jaya Kawale
  • Aditya Pal
  • Jaideep Srivastava
Jaya Kawale, Aditya Pal, and Jaideep Srivastava. 2009. Churn prediction in MMORPGs: A social influence based approach. In CSE, Vol. 4. IEEE, 423-428.