
Customer segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data


Abstract

We propose a novel approach for isolating customer segments using online customer data for products that are distributed via online social media platforms. We use non-negative matrix factorization to first identify behavioral customer segments and then to identify demographic customer segments. We employ a methodology for linking the two segments to present integrated and holistic customer segments, also known as personas. Behavioral segments are generated from customer interactions with online content. Demographic segments are generated using the gender, age, and location of these customers. In addition to evaluating our approach, we demonstrate its practicality via a system leveraging these customer segments to automatically generate personas, which are fictional but accurate representations of each integrated behavioral and demographic segment. Results show that this approach can accurately identify both behavioral and demographical customer segments using actual online customer data from which we can generate personas representing real groups of people.
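
The factorization step named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the matrix layout (rows are demographic groups, columns are content items, values are interaction counts), the toy numbers, the helper names, and the choice of Lee-Seung multiplicative updates as the solver.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def squared_error(v, w, h):
    """Squared Frobenius distance between v and the product w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix v as w @ h with non-negative factors.

    Uses multiplicative updates for the squared-error objective
    (an assumed solver; the paper's implementation may differ).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def dominant_segment(weights):
    """Index of the segment with the largest weight for one group."""
    return max(range(len(weights)), key=weights.__getitem__)


if __name__ == "__main__":
    # Hypothetical interaction counts: 4 demographic groups x 4 content items.
    v = [[4, 2, 0, 0],
         [2, 1, 0, 0],
         [0, 0, 3, 6],
         [0, 0, 1, 2]]
    w, h = nmf(v, rank=2)
    for group, weights in enumerate(w):
        print("group", group, "-> segment", dominant_segment(weights))
```

In this sketch each row of `h` plays the role of a behavioral segment (a pattern of content interaction), and each row of `w` indicates how strongly a demographic group is associated with each segment. The number of segments (`rank`) has to be chosen by the analyst.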
Social Network Analysis and Mining (2018) 8:54
https://doi.org/10.1007/s13278-018-0531-0
ORIGINAL ARTICLE
JisunAn1 · HaewoonKwak1· Soon‑gyoJung1· JoniSalminen1,2· BernardJ.Jansen1
Received: 21 December 2017 / Revised: 28 April 2018 / Accepted: 5 August 2018 / Published online: 23 August 2018
© Springer-Verlag GmbH Austria, part of Springer Nature 2018
Keywords Web analytics · Social computing · Personas · Marketing · System design · Customer segmentation
1 Introduction
One use of social media and other web analytics data is customer segmentation (Jansen 2009), which is an approach for separating an overall customer population based on segment differences defined by a specific set of attributes. Customer segmentation is a common practice across many industries with the set of attributes utilized being relevant to the particular domain. Examples of such domains include marketing, advertising, education, and system design. E-commerce companies and other organizations rely on customer segmentation to target specific customer groups with content and products that the consumers within a segment would likely find relevant. Additionally, customer segmentation might also lead to a deeper understanding of customer preferences, needs, and wants by isolating what each segment finds most valuable. Based on these insights, organizations can more effectively engage with their customers, audience, or users. In software design, marketing planning, and advertising development, there are continuing efforts for identifying and assessing segments of people (i.e., customers, audience, or markets) to optimize some performance metric (e.g., the speed of task, buying preferences, or ease of use).
Major online social media platforms used for distributing content and other products present unique challenges for customer segmentation efforts attempting to rely on online customer data. The customer segmentation approach relies on identifying key attributes from which one can separate customers into segments (Cooil et al. 2008). Targeting customers via behavioral segmentation involves dividing the customer base based on their collective behavior. A behavior can be a single attribute
* Jisun An
jisun.an@acm.org
Haewoon Kwak
haewoon@acm.org
Soon-gyo Jung
sjung@hbku.edu.qa
Joni Salminen
jsalminen@hbku.edu.qa
Bernard J. Jansen
bjansen@hbku.edu.qa
1 Qatar Computing Research Institute, HBKU, Ar Rayyān, Qatar
2 Turku School of Economics, Turku, Finland
... The ability to personalize interactions not only strengthens customer loyalty but also positions businesses to anticipate and adapt to evolving consumer demands. (3) This data-driven methodology ensures that companies remain competitive, fostering stronger customer relationships and paving the way for sustainable growth in an ever-changing market landscape. Through effective analysis and actionable insights, customer personality analysis becomes a valuable tool for achieving strategic business objectives and delivering meaningful value to consumers. ...
Article
Introduction: This study focuses on analyzing customer personalities through segmentation techniques and exploratory data analysis (EDA) to better understand consumer behavior and preferences. In a highly competitive market, gaining a deeper understanding of customer needs is crucial for creating personalized marketing strategies and enhancing the overall customer experience. Objective: The research utilizes advanced segmentation approaches to group customers based on shared traits, preferences, and behaviors, while EDA is applied to uncover trends and insights within extensive datasets. Method: By identifying distinct customer segments, the study provides valuable recommendations that businesses can use to align their products, services, and marketing efforts with the unique demands of their clientele. Result: The insights derived from this research enable companies to implement data-driven strategies that not only enhance customer satisfaction but also foster long-term growth. Conclusion: By tapping into these analytical findings, organizations can optimize their decision-making processes and build stronger connections with their target audiences, ultimately positioning themselves for success in an increasingly data-oriented business landscape.
... TikTok's short-form, viral content format has gained popularity for its ability to quickly reach and engage younger audiences with creative and dynamic travel content. These platforms offer unique features and user demographics that can influence the effectiveness of marketing campaigns (An et al., 2018). ...
Chapter
Influencer marketing is a crucial strategy in the tourism industry, leveraging social media influencers to enhance destination visibility and appeal. This chapter explores its impact on consumer engagement, trust, and destination branding, providing a theoretical framework for understanding its significance. It offers practical insights for tourism marketers, discussing strategies for selecting influencers, planning campaigns, and measuring success. It also addresses challenges and ethical considerations, such as managing fake followers and ensuring regulatory compliance. Real-world examples of successful influencer campaigns in destination marketing and travel service promotion demonstrate the benefits of influencer marketing in tourism. The chapter also predicts future trends and the increasing importance of influencer marketing in the industry. This comprehensive analysis is essential for understanding and leveraging influencer marketing to achieve tourism marketing objectives.
... Meanwhile, according to Raza (2020), to expand the market, companies need to consider the importance of target markets as a competitive advantage. According to An et al. (2018), demographics are one of the most popular segmentation bases for consumer pricing because consumer needs and rates vary by their demographics. In addition, gender has a positive and significant relationship with negative access and sensitivity; moreover, age and marital status have positive and significant relationships with credibility (Wongsaichia et al. 2022). ...
Article
One of the most crucial elements for corporate success is customer happiness. Client happiness leads to customer loyalty, which in turn aids the organization's performance in terms of sales and profit. Customer happiness must be prioritized in order to ensure business excellence in the face of fierce competition among businesses. This paper reports a study on the impact of Umrah pilgrims' demographics on customer satisfaction. A questionnaire was produced and distributed to 200 respondents from a Melaka Tengah-based private travel and tour firm. The study uses correlation, ANOVA, and t-test methods to analyze the findings. SPSS 18 software was used to examine the relationships and the effect of demographics on Umrah pilgrims' satisfaction. Overall, the descriptive result demonstrates a high degree of customer satisfaction, indicating that customers are pleased with the packages supplied by a Melaka Tengah-based private travel and tour company. This finding indicates that demographics are a key component that private travel and tour companies in Melaka Tengah should emphasize in order to provide the greatest quality service for client satisfaction. According to the results, higher perceived value leads to higher customer satisfaction. Therefore, tour and travel company managers should take into consideration that clients coming to the company expect a medium or high quality of service and a safe journey. As a result, this data clearly indicates the importance of demographic factors in ensuring consumer satisfaction.
Article
Personas are hypothetical representations of real-world people used as storytelling tools to help designers identify the goals, constraints, and scenarios of particular user groups. A well-constructed persona can provide enough detail to trigger recognition and empathy while leaving room for varying interpretations of users. While a traditional persona is a static representation of a potential user group, a chatbot representation of a persona is dynamic, in that it allows designers to “converse with” the representation. Such representations are further augmented by the use of large language models (LLMs), displaying more human-like characteristics such as emotions, priorities, and values. In this paper, we introduce the term “Synthetic User” to describe such representations of personas that are informed by traditional data and augmented by synthetic data. We study the effect of one example of such a Synthetic User – embodied as a chatbot – on the designers’ process, outcome, and their perception of the persona using a between-subjects study comparing it to a traditional persona summary. While designers showed comparable diversity in the ideas that emerged from both conditions, we find in the Synthetic User condition a greater variation in how designers perceive the persona’s attributes. We also find that the Synthetic User allows novel interactions such as seeking feedback and testing assumptions. We make suggestions for balancing consistency and variation in Synthetic User performance and propose guidelines for future development.
Article
This research aimed to investigate the application of VIseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) multi-criteria decision-making to select the optimal clustering for e-commerce customer segmentation. In this context, clustering as an unsupervised machine learning method offered a way to overcome the limitations of traditional grouping, particularly by providing the ability to capture the diverse needs of consumers. A total of five different clustering methods were considered based on the behavioral data of e-commerce customers. Even though the analyzed algorithms were well-known and widely used, the comprehensive and multidirectional comparison was not trivial. Selected approaches were evaluated on the basis of twelve indicators (decision criteria), divided into four characteristics that take into account both the out-of-context aspects of clustering and the requirements arising from the context of using the clustering results. The results showed consistent outcomes from both analyzed Multi-Criteria Decision Methods, with some notable differences. The methods obtained the same ranking of the top three clustering algorithms (K-median, BIRCH, K-means). However, the TOPSIS and VIKOR sensitivity analysis recommended K-means in 87% of the cases and 60% of the variants verified, respectively. The parameterization of the decision factors had a significant impact on the final ranking of clustering options. This research demonstrated the practical application of the decision methods in selecting the best clustering for multivariate user interfaces to improve personalization in e-commerce.
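
As a rough illustration of how a TOPSIS ranking of the kind mentioned above is computed, the following sketch implements the standard textbook procedure: vector-normalize each criterion, apply weights, then score each alternative by its relative closeness to the ideal solution. The alternatives, criteria, weights, and numbers are hypothetical and do not come from the cited study, which used twelve indicators.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  -- alternatives x criteria, raw scores
    weights -- one weight per criterion
    benefit -- True if higher is better for that criterion, False if lower is better
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(matrix[0])
    # Vector normalization of each criterion column, then weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    # Ideal and anti-ideal points, taking the criterion direction into account.
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)   # distance to the ideal point
        d_minus = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_minus / (d_plus + d_minus))
    return scores


if __name__ == "__main__":
    # Hypothetical candidates A, B, C scored on three made-up criteria:
    # cluster separation (higher is better), compactness index (lower is better),
    # runtime in seconds (lower is better).
    names = ["A", "B", "C"]
    matrix = [[0.6, 0.8, 1.0],
              [0.4, 1.2, 3.0],
              [0.5, 1.0, 2.0]]
    scores = topsis(matrix, weights=[0.5, 0.3, 0.2], benefit=[True, False, False])
    for name, score in sorted(zip(names, scores), key=lambda p: p[1], reverse=True):
        print(name, round(score, 3))
```

Because the scores depend on the weights, changing the weight vector can reorder the alternatives, which is the kind of sensitivity to parameterization the abstract reports.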
Article
Dimension reduction is a challenging task in data processing, especially in high-dimensional data processing. Non-negative matrix factorization (NMF), as a classical dimension reduction method, contributes a parts-based representation owing to the non-negativity constraints in the NMF algorithm. In this paper, the NMF algorithm is introduced to extract local features for dimension reduction. Considering the problem that NMF requires the decomposition rank to be defined manually, we propose a rank-adaptive NMF algorithm, in which the affinity propagation (AP) clustering algorithm is introduced to adaptively determine the decomposition rank of NMF. Then, the rank-adaptive NMF algorithm is used to extract features from the original image. After that, a low-dimensional representation of the original image is obtained through the projection from the original images to the feature space. Finally, we use extreme learning machine (ELM) and k-nearest neighbor (KNN) as classifiers to classify those low-dimensional feature representations. The experimental results demonstrate that the decomposition rank determined by the AP clustering algorithm can reflect the characteristics of the original data. When it is combined with the classification algorithm ELM or KNN and applied to handwritten character recognition, the proposed method not only reduces the dimension of original images but also performs well in terms of classification accuracy and time consumption. A new rank-adaptive NMF algorithm is proposed based on the AP clustering algorithm and the original NMF algorithm. According to this algorithm, the low-dimensional representation of the original data can be obtained without any prior knowledge. In addition, the proposed rank-adaptive NMF algorithm combined with the ELM and KNN classification algorithms performs well.
Article
The emergence of various new technologies, especially social media, has led to their worldwide acceptance, as these technologies offer a dual benefit to employees in terms of work as well as socialization and entertainment. However, knowledge about the influence of these social media technologies on the performance of a company is limited. This study measures the influence of social media usage on firm performance, mediated by social capital. Data were collected using a survey method from a sample of 132 social media users who are community members of a large IT company. It was found that all three dimensions of social media usage (social use, hedonic use, and cognitive use) have a positive influence on firm performance (financial and market performance). Social capital was found to be a partial mediator of the relationship between social media usage and firm performance. Theoretical and managerial implications along with future research directions are also discussed.
Conference Paper
The use of personas is an interactive design technique with considerable potential for product and content development. A persona is a representation of a group or segment of users, sharing common behavioral characteristics. Although representing a segment of users, a persona is generally developed in the form of a detailed narrative about an explicit but fictitious individual that represents the collection of users possessing similar behaviors or characteristics. In order to make the fictitious individual appear as a real person to the product developers, the persona narrative usually contains a variety of both demographic and behavioral details about socioeconomic status, gender, hobbies, family members, friends, possessions, among many other data. The narrative of a persona normally also addresses the goals, needs, wants, frustrations, and other emotional aspects of the fictitious individual that are pertinent to the product being designed. However, personas have typically been viewed as fairly static. In this research, we demonstrate an approach for creating and validating personas in real time, based on automated analysis of actual user data. Our data collection site and research partner is AJ+ (http://ajplus.net/), which is a news channel from Al Jazeera Media Network that is natively digital with a presence only on social media platforms and a mobile application. Its media concept is unique in that AJ+ was designed from the ground up to serve news in the medium of the viewer, versus a teaser in one medium with a redirect to a website. In pursuit of our overall research objective of automatically generating personas in real time, for the research reported in this manuscript, we are specifically interested in understanding the AJ+ audience by identifying (1) whom they are reaching (i.e., market segment) and (2) what competitive (i.e., non-AJ+) content is associated with each market segment.
Focusing on one aspect of user behavior, we collect 8,065,350 instances of sharing of links by 54,892 users of an online news channel, specifically examining the domains these users share. We then cluster users based on similarity of domains shared, identifying seven personas based on this behavioral aspect. We conduct term frequency-inverse document frequency (tf-idf) vectorization. We remove outliers of less than 5 shares (too unique) and more than 80% of all users' shares (too popular). We use K-means++ clustering (K = 2..10), which is an advanced version of K-means that improves the selection of initial seeds, because K-means++ works effectively for a very sparse matrix (user-link). We use the "elbow" method to choose the optimal number of clusters, which is eight in this case. In order to characterize each cluster, we list the top 100 domains from each cluster and discover that there are large overlaps among clusters. We then remove from each cluster the domains that existed in another cluster in order to identify the relevant, unique, and impactful domains. This de-duplication results in the elimination of one cluster, leaving us with a set of clusters, where each cluster is characterized by domains that are shared only by users within that cluster. We note that the K-means++ clustering method can be replaced easily with other clustering methods in various situations. Demonstrating that these insights can be used to develop personas in real time, the research results provide insights into competitive marketing, topic interests, and preferred system features for the users of the online news medium. Using the description of each of the shared links, we detect their languages. 55.2% (30,294) of users share links in just one language and 44.8% of users share links in multiple languages.
The most frequently used language is English (31.98%), followed by German (5.69%), Spanish (5.02%), French (4.75%), Italian (3.46%), Indonesian (2.99%), Portuguese (2.94%), Dutch (2.94%), Tagalog (2.71%), and Afrikaans (2.69%). As there were millions of domains shared, we utilize the top one hundred domains for each cluster, resulting in 700 top domains shared by the 54,892 AJ+ users. We, as mentioned, de-duplicated, resulting in the elimination of a cluster (11,011 users, 20.06%). So, we have seven unique clusters based on sharing of domains representing 43,881 users. We then demonstrate how these findings can be leveraged to generate real-time personas based on actual user data. We stream the data analysis results into a relational database and combine the results with other demographic data that we gleaned from available sources such as Facebook and other social media accounts, using each of the seven clusters as representative of a persona. We give each persona a fictional name and use a stock photo as the face of our personas. Each persona was linked to the top alternate (i.e., non-AJ+) domains they most commonly shared, with the personas' shared links updateable with new data. Research implications are that personas can be generated in real time, instead of being the result of a laborious, time-consuming development process.
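
The de-duplication step described in this abstract (keep only the top domains that appear in a single cluster) can be sketched as follows. This is an illustrative reading, not the authors' code: the function names, the example domains, and the rule that a cluster is dropped once it has no unique domains left are all assumptions.

```python
from collections import Counter


def top_domains(shared_domains, n=100):
    """Return the n most frequently shared domains for one cluster.

    shared_domains -- iterable of domain strings, one entry per share
    """
    return [domain for domain, _ in Counter(shared_domains).most_common(n)]


def deduplicate(top_by_cluster):
    """Keep, per cluster, only the domains that appear in no other cluster.

    top_by_cluster -- dict mapping cluster id to its list of top domains
    Clusters left with no unique domain are dropped (assumed elimination rule).
    """
    # Count in how many clusters each domain occurs.
    occurrences = Counter(
        domain for domains in top_by_cluster.values() for domain in set(domains)
    )
    unique = {
        cluster: [d for d in domains if occurrences[d] == 1]
        for cluster, domains in top_by_cluster.items()
    }
    return {cluster: domains for cluster, domains in unique.items() if domains}


if __name__ == "__main__":
    # Hypothetical top-domain lists for three clusters.
    top_by_cluster = {
        0: ["news.example", "tech.example", "music.example"],
        1: ["news.example", "sports.example", "travel.example"],
        2: ["news.example", "tech.example", "sports.example"],
    }
    print(deduplicate(top_by_cluster))
```

In this toy input, cluster 2 shares every one of its top domains with another cluster, so it disappears after de-duplication, which mirrors how the abstract's eight clusters reduce to seven.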
Article
The rapid development of information technology and the fast growth of the Internet have facilitated an explosion of information, which has accentuated the information overload problem. Recommender systems have emerged in response to this problem and helped users find content that interests them. With increasingly complicated social context, how to better fulfill personalized needs has become a new trend in personalized recommendation service studies. In order to alleviate the sparsity problem of recommender systems while increasing their accuracy and diversity in complex contexts, we propose a novel recommendation method based on social networks using a matrix factorization technique. In this method, we cluster users and consider a variety of complex factors. The simulation results on two benchmark data sets and a real data set show that our method achieves superior performance to existing methods.
Article
Inferring users’ interests from their activities on social networks has been an emerging research topic in the recent years. Most existing approaches heavily rely on the explicit contributions (posts) of a user and overlook users’ implicit interests, i.e., those potential user interests that the user did not explicitly mention but might have interest in. Given a set of active topics present in a social network in a specified time interval, our goal is to build an interest profile for a user over these topics by considering both explicit and implicit interests of the user. The reason for this is that the interests of free-riders and cold start users who constitute a large majority of social network users, cannot be directly identified from their explicit contributions to the social network. Specifically, to infer users’ implicit interests, we propose a graph-based link prediction schema that operates over a representation model consisting of three types of information: user explicit contributions to topics, relationships between users, and the relatedness between topics. Through extensive experiments on different variants of our representation model and considering both homogeneous and heterogeneous link prediction, we investigate how topic relatedness and users’ homophily relation impact the quality of inferring users’ implicit interests. Comparison with state-of-the-art baselines on a real-world Twitter dataset demonstrates the effectiveness of our model in inferring users’ interests in terms of perplexity and in the context of retweet prediction application. Moreover, we further show that the impact of our work is especially meaningful when considered in case of free-riders and cold start users.
Conference Paper
We propose a novel method for generating personas based on online user data for the increasingly common situation of content creators distributing products via online platforms. We use non-negative matrix factorization to identify user segments and develop personas by adding personality such as names and photos. Our approach can develop accurate personas representing real groups of people using online user data, versus relying on manually gathered data.