
Design of shopper segmentation systems in retail. Evidence from 2 heterogeneous retail cases

Association for Information Systems
AIS Electronic Library (AISeL)
-+ ""!&*$.+#/%"-"
," &(*/"-"./-+0,+*" &.&+*0,,+-/*!
*(3/& .
Anastasia Griva
Department of Management Science and Technology, Athens University of Economics and Business, Athens Greece
Cleopatra Bardaki
ELTRUN, Department of Management Science and Technology, Athens University of Economics & Business, Greece
Katerina Pramatari
Athens University of Economics & Business
George Doukidis
Athens University of Economics and Business (AUEB)
+((+2/%&.*!!!&/&+*(2+-'./ %6,.&."(&.*"/+-$.&$!.
5&.)/"-&(&.-+0$%//+3+03/%"," &(*/"-"./-+0,+*" &.&+*0,,+-/*!*(3/& ./(" /-+*& &--3"/%.
""* ",/"!#+-&* (0.&+*&*-+ ""!&*$.+#/%"-"3),+.&0)3*0/%+-&4"!!)&*&./-/+-+#(" /-+*& &--3
"+-)+-"&*#+-)/&+*,("." +*/ / "(&--3&.*"/+-$
" +))"*!"!&//&+*
1&!"* "#-+)%"/"-+$"*"+0.-"/&( .". Proceedings of the 2018 Pre-ICIS SIGDSA Symposium
Designing shopper segmentation systems
2018 Pre-ICIS SIGDSA Symposium on Decision Analytics Connecting People, Data & Things, San Francisco 2018 1
2018 Pre-ICIS SIGDSA Symposium
Anastasia Griva, Cleopatra Bardaki, Katerina Pramatari, George Doukidis
ELTRUN, the E-Business Research Center, Department of Management Science
& Technology, Athens University of Economics & Business, Athens Greece
47A Evelpidon St. & 33 Lefkados St., 113 62, Athens, Greece
Extended abstract
Data proliferation in the retail industry enables data-driven segmentation systems that help retailers embrace more customer-centric strategies. This research is motivated by the abundance of data reflecting the buying behavior of retail shoppers and uses these data to identify shopping patterns. These patterns correspond to different shopper segments with specific preferences that may guide tailor-made services. In this context, we propose a shopper segmentation approach that highlights the shopping intentions that motivate consumers to visit the stores. Our approach offers a holistic view of consumer shopping behavior that looks beyond the consumer’s entire sales history or the associations among purchased products. We shift the attention from the purchased products to the shopping needs that motivate each shopping trip; in particular, we translate the shopping basket of each visit into the shopping intention of that visit. We adopt a broader perspective of shopping trips and delve into the product categories in shopping baskets to reveal the shopping intention behind each basket. While other researchers view shoppers merely as associations of product items, e.g. cereals → milk (e.g. Cil, 2012; Srikant & Agrawal, 1995), or as an aggregate of visits, e.g. high-spending shoppers (e.g. Aeron et al., 2012; Boone & Roehm, 2002; Han et al., 2014; Liao et al., 2011; Park et al., 2014), we aim to describe consumers’ behavior during their individual visits.
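Describing a visit by its product categories, rather than by individual items, is the step that lets a basket be read as a shopping intention. A minimal Python sketch follows, assuming a flat item-to-category lookup (the `TAXONOMY` dictionary and the item names are hypothetical) and a binary visit × category encoding; the abstract does not specify the exact representation the authors used.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical product taxonomy: item identifier -> product category.
TAXONOMY: Dict[str, str] = {
    "corn_flakes": "cereals",
    "oat_muesli": "cereals",
    "whole_milk": "milk",
    "sourdough": "bread",
    "feta": "cheese",
    "detergent": "household_cleaning",
}


def basket_to_categories(basket: List[str], taxonomy: Dict[str, str]) -> Set[str]:
    """Collapse the items of one visit into the set of categories it touches.

    Items missing from the taxonomy are skipped rather than guessed.
    """
    return {taxonomy[item] for item in basket if item in taxonomy}


def visits_to_matrix(
    baskets: List[List[str]], taxonomy: Dict[str, str]
) -> Tuple[List[str], List[List[int]]]:
    """Build a binary visit x category matrix (1 = category bought in that visit)."""
    category_sets = [basket_to_categories(b, taxonomy) for b in baskets]
    categories = sorted(set(taxonomy.values()))
    matrix = [[1 if c in cats else 0 for c in categories] for cats in category_sets]
    return categories, matrix


if __name__ == "__main__":
    baskets = [
        ["corn_flakes", "whole_milk", "sourdough"],
        ["detergent"],
        ["oat_muesli", "corn_flakes", "feta"],
    ]
    cols, rows = visits_to_matrix(baskets, TAXONOMY)
    print(cols)  # ['bread', 'cereals', 'cheese', 'household_cleaning', 'milk']
    for row in rows:
        print(row)
```

Two cereal items in the third basket still produce a single 1 in the `cereals` column: at this level the question is which needs a visit served, not how many items it contained.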
We applied and validated our approach in two heterogeneous retail cases to demonstrate its generalizability. The first concerns sales data from different channels and stores of a major fast-moving consumer goods (FMCG) retailer. The second concerns sales data obtained from the physical stores of a Fortune 500 specialty retailer of home improvement and construction products, also known as a do-it-yourself (DIY) retailer. By applying our system’s segmentation approach to these two heterogeneous retailers, we identified and assessed how different retailer features (e.g. shopping channel/place, product brand), shopper features (e.g. basket variety, volume) and data features (e.g. data variety, volume) affect the design and application of shopper segmentation systems. We highlight the features that prospective practitioners and academics should consider in order to conduct a successful shopper segmentation analysis. We detected various data characteristics (e.g. data variety, basket variety, and shopping channel) that affect both the data mining results and the translation of the shopper visit segments into shopping intentions.
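The keywords point to cluster analysis as the data mining step that turns such visit vectors into visit segments. The sketch below uses plain k-means (Lloyd’s algorithm) with a deterministic farthest-first seeding; both the algorithm choice and the function names are assumptions for illustration, not the authors’ published configuration, and the number of segments k would be chosen by the analyst.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two visit vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(rows: List[List[int]], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first visit, then repeatedly add the
    visit farthest from every centroid chosen so far."""
    centroids = [[float(v) for v in rows[0]]]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append([float(v) for v in farthest])
    return centroids


def kmeans(
    rows: List[List[int]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means on binary visit x category rows; returns (labels, centroids)."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of visits")
    centroids = farthest_first_init(rows, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each visit joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged: no visit changed segment
        labels = new_labels
        # Update step: each centroid becomes the mean of its member visits.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an empty segment keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    visits = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    labels, _ = kmeans(visits, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each centroid doubles as a segment profile: its entries are the fraction of member visits that contain each category.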
Delving deeper into the literature, we identified studies, mainly in the marketing domain (e.g. Bradlow et al., 2017), that discuss several features affecting big data analytics systems in general. However, they do not present evidence of how these features affected specific segmentation cases. In the IS literature, too, many papers perform shopper segmentation (e.g. Aeron et al., 2012; Boone & Roehm, 2002; Boztuǧ & Reutterer, 2008; Miguéis et al., 2012; Rust & Huang, 2014). However, to the best of our knowledge, these authors describe their own case rather than “the bigger” picture, i.e. how system inputs and features (e.g. data) affect and alter the segmentation process, the system, and its results; this is at most implied, and they do not discuss how different features affected their segmentation results. In our interdisciplinary study, we
identify the features that the marketing literature has highlighted for studying consumer behavior and shopping habits. Thus, this research also aims to connect marketing researchers and managers with data scientists. Both the consumer segmentation analysis and its results should be handled with the “marketing” characteristics of the shoppers and the retailers in mind. In particular, the accumulated experience and intuition of marketing managers are necessary for a reliable, meaningful interpretation of the shopper segments.
Figure 1 summarizes the features affecting each phase of shopper segmentation. As shown, the translation layer is affected by most of the features. At this phase, extracting insight from the results requires the opinion of experts who know the market. To identify shoppers’ missions and motives, experts consider not only tangible, quantitative features (e.g. value, volume) but also intangible elements such as their domain knowledge and accumulated experience. Likewise, variety appears to be the most important feature, since it affects all phases of our approach: outlier elimination, product taxonomy calibration, identification of the unit of analysis, and translation of the results into insights. Finally, the price feature did not affect our segmentation. First, it was not available in all our cases; second, even in the FMCG case, where it was available, it did not influence our results. Hence, we partially confirm existing literature suggesting that price plays an important role mainly for more particular products, e.g. cars.
Figure 1. Shopper Segmentation system
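In the translation layer, experts read each segment’s quantitative profile and name the mission behind it. One simple profile, sketched below under our own assumptions, is the share of a segment’s visits that contain each category; the `min_share` cut-off of 0.5 is a hypothetical default, and naming the mission remains a human judgment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def segment_profiles(
    categories: List[str],
    rows: List[List[int]],
    labels: List[int],
    min_share: float = 0.5,
) -> Dict[int, List[Tuple[str, float]]]:
    """Per segment, the categories bought in at least `min_share` of its visits,
    most frequent first (ties broken alphabetically)."""
    members: Dict[int, List[List[int]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    profiles: Dict[int, List[Tuple[str, float]]] = {}
    for label in sorted(members):
        seg_rows = members[label]
        # Share of this segment's visits that include each category.
        shares = [
            (cat, sum(r[i] for r in seg_rows) / len(seg_rows))
            for i, cat in enumerate(categories)
        ]
        kept = [(cat, share) for cat, share in shares if share >= min_share]
        profiles[label] = sorted(kept, key=lambda cs: (-cs[1], cs[0]))
    return profiles


if __name__ == "__main__":
    cats = ["bread", "cereals", "milk", "paint"]
    rows = [[1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
    for seg, profile in segment_profiles(cats, rows, [0, 0, 1, 1]).items():
        print(seg, profile)
```

Given this output, an expert might call segment 0 a “breakfast” visit and segment 1 a painting job; the code only ranks categories and leaves the interpretation to people who know the market.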
Moreover, the two cases revealed that the units of analysis used in the literature, i.e. the product items in a single visit or all of a shopper’s visits, are neither sufficient nor applicable in every retail context; in some cases we should examine groups of “x” sequential visits. The value of “x” depends on the domain from which the data are derived. As our DIY case showed, and as other research supports (Wolf & McQuitty, 2011), a shopper usually visits a store that sells home-improvement products many times and purchases a few materials each time. We therefore devised and tested a new unit of analysis that examines groups of x consecutive visits. This intermediate unit of analysis is dictated by the particularity of some retail domains that demand many store visits within short time windows.
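Grouping x consecutive visits sits between the two units of analysis found in the literature: with x = 1 each group is a single visit, and with x at least as large as a shopper’s visit count each group is that shopper’s whole history. The sketch below assumes non-overlapping windows per shopper, ordered by a sortable visit key, with each group reduced to the union of its categories; the function name and these choices are hypothetical, since the abstract does not state how the windows were formed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (shopper id, sortable visit key such as a date or counter, categories bought)
Visit = Tuple[str, int, Set[str]]


def group_consecutive_visits(visits: List[Visit], x: int) -> List[Tuple[str, Set[str]]]:
    """Merge each shopper's visits, in time order, into non-overlapping groups of x.

    A trailing group shorter than x is kept (an assumption; it could also be dropped).
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    by_shopper: Dict[str, List[Tuple[int, Set[str]]]] = defaultdict(list)
    for shopper, when, cats in visits:
        by_shopper[shopper].append((when, cats))
    groups: List[Tuple[str, Set[str]]] = []
    for shopper in sorted(by_shopper):
        ordered = sorted(by_shopper[shopper], key=lambda v: v[0])
        for start in range(0, len(ordered), x):
            merged: Set[str] = set()
            for _, cats in ordered[start:start + x]:
                merged |= cats  # union of categories across the window
            groups.append((shopper, merged))
    return groups


if __name__ == "__main__":
    diy_visits = [
        ("s1", 3, {"paint"}),
        ("s1", 1, {"tiles"}),
        ("s1", 2, {"grout"}),
        ("s2", 1, {"wood"}),
    ]
    for group in group_consecutive_visits(diy_visits, x=2):
        print(group)
```

Here shopper `s1`’s tiling trips (tiles, then grout) land in one group, which a single-visit unit would have split apart.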
The value of such a system becomes evident when considering the consumer-oriented business decisions it can support. Our approach could evolve into a tool for designing innovative marketing campaigns, bundled promotions, and cross-coupon programs for product categories that belong to the same shopping visit segment. Likewise, it can inform offline and online product catalogues. For instance, we detected professional women who visit a DIY store to purchase woodwork products. To promote a new collection, it could be more effective to send them catalogues tailored to their specific preferences instead of catalogues covering all products. Additionally, the extracted knowledge could be valuable for advertising purposes, e.g. targeted advertisements for breakfast products. It might also guide a redesigned store layout in which product categories in the same visit segment are placed in nearby aisles and shelves. This way, shoppers can locate products more easily and buy more in less time. Further, the store manager could reengineer store operations and replenishment strategies by ordering groups of products based on the identified visit segments (Griva et al., 2018). Finally, predicting future behaviors and missions from historical data can support operations such as product replenishment and the handling of out-of-stock situations.
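For predicting a shopper’s next mission from historical data, one simple stand-in, which is our own illustrative choice rather than a method described in the paper, is a first-order Markov chain over the sequence of mission labels assigned to a shopper’s visits. The mission labels and function names below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(sequences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often mission b directly follows mission a in each shopper's history."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions: Dict[str, Counter], current: str) -> Optional[str]:
    """Most frequent follow-up mission; ties broken alphabetically, None if unseen."""
    counts = transitions.get(current)
    if not counts:
        return None
    best = max(counts.values())
    return min(m for m, c in counts.items() if c == best)


if __name__ == "__main__":
    histories = [
        ["breakfast", "cleaning", "breakfast"],
        ["breakfast", "cleaning"],
        ["breakfast", "snacks"],
    ]
    model = fit_transitions(histories)
    print(predict_next(model, "breakfast"))  # cleaning
```

Aggregated over many shoppers, such predicted missions could feed the replenishment and out-of-stock planning mentioned above; a richer model would also condition on time between visits.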
Future research may address limitations of this study, e.g. visits whose purpose is to return items or to buy gifts. We can also use data derived from alternative technologies (e.g. Radio Frequency Identification (RFID), Global Positioning System (GPS)) to evaluate the proposed approach, for instance data that capture shoppers’ in-store movements and the product categories they interact with during a visit. By comparing the visit segments derived from point-of-sale (POS) data with those derived from Internet of Things (IoT) data, we can then identify selling gaps. From a technical perspective, further data mining techniques could be applied and their resulting visit segments compared. Other techniques, e.g. graph mining, could also be examined to analyze each resulting segment further and to address the difficulty of identifying more detailed segments in the DIY case.
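The POS-versus-IoT comparison can be approximated at visit level: a category a shopper interacted with in-store but did not purchase marks a potential selling gap. The sketch below is a simplified proxy that compares raw category sets per visit rather than full visit segments; the input format (visit id mapped to a set of categories) and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, Set


def selling_gaps(
    interacted: Dict[str, Set[str]], purchased: Dict[str, Set[str]]
) -> Counter:
    """Count, per category, visits where shoppers engaged with it (IoT) but did not buy it (POS).

    Visits absent from the POS data are treated as having no purchases (an assumption).
    """
    gaps: Counter = Counter()
    for visit_id, cats in interacted.items():
        gaps.update(cats - purchased.get(visit_id, set()))
    return gaps


if __name__ == "__main__":
    iot = {"v1": {"paint", "brushes", "tiles"}, "v2": {"paint"}, "v3": {"paint", "wood"}}
    pos = {"v1": {"paint"}, "v3": {"wood"}}
    print(selling_gaps(iot, pos).most_common())
```

Categories that top this count, such as `paint` in the toy data, would be the first candidates for a closer look at pricing, placement, or stock availability.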
Keywords: Shopper Segmentation, Retail Analytics, Shopper Behavior, Cluster Analysis, Data Mining
Acknowledgements
We would like to thank the Wharton Customer Analytics Initiative for providing the dataset regarding the Fortune 500 specialty retailer. This research has been supported by the European Commission under the H2020 project Transforming Transport (Grant Agreement No. 731932).
References
Aeron, H., Kumar, A., & Moorthy, J. 2012. "Data mining framework for customer lifetime value-based segmentation", Journal of Database Marketing and Customer Strategy Management, (19:1), pp. 17-
Boone, D. S., & Roehm, M. 2002. "Retail segmentation using artificial neural networks", International
Journal of Research in Marketing, (19:3), pp. 287-301.
Boztuǧ, Y., & Reutterer, T. 2008. "A combined approach for segment-specific market basket analysis",
European Journal of Operational Research, (187:1), pp. 294-312.
Bradlow, E. T., Gangwar, M., Kopalle, P., & Voleti, S. 2017. "The Role of Big Data and Predictive Analytics
in Retailing", Journal of Retailing, (93:1), pp. 79-95.
Cil, I. 2012. "Consumption universes based supermarket layout through association rule mining and
multidimensional scaling", Expert Systems with Applications, (39:10), pp. 8611-8625.
Griva, A., Bardaki, C., Pramatari, K., & Papakiriakopoulos, D. 2018. "Retail business analytics: Customer
visit segmentation using market basket data", Expert Systems with Applications, (100), pp. 1-
Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., Sriram, S. 2006. "Modeling customer
lifetime value", Journal of Service Research, (9:2), pp. 139-155.
Han, S., Ye, Y., Fu, X., & Chen, Z. 2014. "Category role aided market segmentation approach to convenience
store chain category management", Decision Support Systems, (57:1), pp. 296-308.
Liao, S., Chen, Y., & Hsieh, H. 2011. "Mining customer knowledge for direct selling and marketing", Expert
Systems with Applications, (38:5), pp. 6059-6069.
Miguéis, V. L., Camanho, A. S., & Falcão e Cunha, J. 2012. "Customer data mining for lifestyle
segmentation", Expert Systems with Applications, (39:10), pp. 9359-9366.
Park, C. H., Park, Y.-H., & Schweidel, D. A. 2014. "A multi-category customer base analysis", International
Journal of Research in Marketing, (31:3), pp. 266-279.
Rust, R. T., & Huang, M.-H. 2014. "The Service Revolution and the Transformation of Marketing Science",
Marketing Science, (33:2), pp. 206-221.
Srikant, R., & Agrawal, R. 1995. "Mining Generalized Association Rules", in VLDB ’95 Proceedings of the
21st International Conference on Very Large Data Bases, Umeshwar Dayal, Peter M. D. Gray,
Shojiro Nishio (eds.), Zurich, Switzerland, pp. 407-419.
Wolf, M., & McQuitty, S. 2011. "Understanding the do-it-yourself consumer: DIY motivations and
outcomes", AMS Review, (1), pp. 154-170.