Market basket analysis in insurance industry

Many organizations now analyze customer data to discover hidden patterns and maintain their competitive position, and more and more of them recognize customers as their most valuable resource. This paper studies data on 300 clients of an insurance company in the city of Anzali, Iran. The clients are analyzed with the K-means clustering method using demographic variables (gender, age, occupation, education level, marital status, place of residence and income), and the optimal number of clusters for grouping customers is determined. The study then uses the method of association rules to find hidden patterns for the insurance industry.
*Corresponding author.
E-mail addresses: kvahidi1989@yahoo.com (M. Vahidi Roodpishi)
© 2015 Growing Science Ltd. All rights reserved.
doi: 10.5267/j.msl.2015.2.004
Management Science Letters 5 (2015) 393–400
Contents lists available at GrowingScience
Management Science Letters
homepage: www.GrowingScience.com/msl
Marzieh Vahidi Roodpishi (a,*) and Reza Aghajan Nashtaei (b)
(a) M.A. of Business Management, Rasht Branch, Islamic Azad University, Rasht, Iran
(b) Department of Business Management, Rasht Branch, Islamic Azad University, Rasht, Iran
Article history:
Received 28 October 2014
Received in revised format 2 February 2015
Accepted 4 February 2015
Available online 4 February 2015
Keywords:
Insurance industry
Shopping basket analysis
Data mining
Clustering
Association rules
1. Introduction
With the advent of new technologies and the entry of private insurers, the market environment of the
insurance industry has become highly competitive. Competition between new and older firms is intense,
and firms pay more attention to ways of analyzing customer data. Considering the market basket
therefore helps insurance firms reach a better understanding of clients’ demands. Decision-making and
understanding customer behavior are critical and challenging issues for organizations that want to
maintain their position in a competitive market. Technological innovation also helps firms process
customers’ needs better, and data mining tools, which can analyze large amounts of data, support
successful decision-making. Data mining can be used to determine patterns in, and to optimize, the
dynamic behavior of transactions made by consumers when purchasing specific products. Mansur and
Kuncoro (2012) studied consumers’ purchasing behavior so that it could be used to forecast purchases
for the next period. They then used the prediction as decision support in determining the appropriate
amount of inventory for each product at Karomah Brass, a small and medium enterprise. The techniques
used in their study were Market Basket Analysis (MBA) and Artificial Neural Network (ANN) back
propagation: MBA was used to investigate customers’ buying behavior, while ANN back propagation was
applied to predict the inventory requirements for each product. They reported that customers
frequently buy products that serve as antique closet accessories and that customers who purchased
such a product would also buy similar products, in accordance with 21 rules obtained from mining the
transaction data.
Trnka (2010) explained how MBA can be implemented within the Six Sigma methodology. Data mining
techniques provide a significant number of opportunities in the market sector, and data mining
applications are becoming increasingly popular across a set of very divergent fields. Six Sigma
methodology implements several statistical techniques (Pande & Abdel-Aty, 2009). By implementing MBA,
as a part of data mining (Han et al., 2006), in Six Sigma, Trnka improved the results and changed the
Sigma performance level of the process. His survey used the General Rule Induction algorithm to
produce association rules between products in the market basket. Cheng and Chen (2009) presented a
new procedure that joins the quantitative values of RFM attributes and the K-means algorithm with
rough set theory (RS theory) to obtain meaningful rules and to identify customer characteristics in
order to strengthen CRM.
According to Russell and Petersen (2000), market basket choice is a decision process in which a
consumer chooses items from a number of product categories on the same shopping trip. The primary
feature of market basket choice is the interdependence in demand relationships across the items in the
final basket. Russell and Petersen (2000) developed a new approach to the specification of market
basket models, which allows a choice model for a basket of products to be built from a set of “local”
conditional choice models corresponding to each item in the basket. The approach yields a parsimonious
market basket model that permits any kind of demand relationship across product categories and can be
estimated using simple modifications of standard multinomial logit software.
Tang et al. (2008) presented a method to perform MBA in a multiple-store and multiple-period
environment. They first defined a time concept hierarchy and a place (location) hierarchy, based on the
application and requirements. A set of contexts was systematically extracted from the two hierarchies
by integrating the concept levels of the two hierarchies. They developed an efficient approach for
extracting association rules that meet the support and confidence requirements of all the contexts.
Using the approach, a decision maker can analyze purchasing patterns at very detailed concept levels
of time and place, and at combinations of a detailed level of one with a general level of the other.
The association rules appeared to be well organized because they were generated from the contexts
extracted from the time and place hierarchies.
Cavique (2007) presented a method to discover large itemset patterns for MBA that works on condensed
data obtained by transforming the market basket problem into a maximum-weighted clique problem. Wick
and Wagner (2006) applied MBA to integrate and motivate topics in discrete structures. Dhanabhakyam
and Punithavalli (2011) examined customer buying patterns by detecting associations among the
different items that customers place in their shopping baskets. Raorane et al. (2012) investigated
how large amounts of data can be mined to understand consumer behavior and support correct decisions
that lead to a competitive edge over rivals. Yun et al. (2006) explored the clustering of
market-basket data, which differs from traditional data.
2. The proposed model
This paper studies data on 300 clients of an insurance company in the city of Anzali, Iran, which
are analyzed using the K-means clustering method. Using demographic variables including gender, age,
occupation, education level, marital status, place of residence and clients’ income, the study
determines the optimal number of clusters in order to group customers. Next, the study uses the
method of association rules and the practice of insurance policies to find hidden patterns in the
insurance industry. Fig. 1 summarizes the proposed study.
Fig. 1. The structure of the proposed study
As Fig. 1 shows, the proposed study consists of the following steps:
The first step is to review the literature, identify relevant variables for modeling the behavior of
customers, and design the questionnaire used to collect data from the insurance companies.
The second step is K-means clustering, which uses demographic variables including gender, age,
occupation, education level, marital status, place of residence and clients’ income to determine the
optimal number of clusters.
The third step uses the method of association rules to determine hidden patterns in the market
baskets of insurance industry clients.
The fourth step is to validate the results obtained from the association rules on our data.
The study was conducted among customers of third-party car insurance in the city of Anzali, located
in the province of Gilan, Iran, in 2014. All questionnaires were distributed through six insurance
agents in Iran. The resulting data on 300 clients of the insurance company were analyzed using the
K-means clustering method.
2.1. K-means technique
The K-means technique has become a popular method for generating appropriate clustering results in
many real-world case studies. K-means clustering is a well-known data mining model that tries to
partition N observations into K clusters, where each observation is assigned to the cluster with the
closest mean. An appropriate K is normally evaluated by simultaneously minimizing the within-cluster
variation and maximizing the between-cluster variation. K-means clustering is sensitive to outliers;
therefore, outliers have to be eliminated before clustering. Edwards (2003) and Kantardzic (2011)
describe the K-means method with the following steps:
1. Select an initial partition into K clusters containing randomly chosen samples, and compute the
mean of each cluster,
2. Build a new partition by assigning each sample to the nearest cluster center,
3. Compute the new cluster centers as the means of the clusters,
4. Repeat step 2 and step 3 until the algorithm reaches its termination criterion.
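The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS Clementine procedure used in the study: the function name `k_means`, its parameters and the toy data are assumptions made for the example. As a simplification, it starts from K chosen samples as centers rather than from a random partition, and it stops when the assignment no longer changes.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, initial_centers=None, max_iter=100, seed=0):
    """Return (centers, labels) for a plain K-means run (illustrative only)."""
    rng = random.Random(seed)
    # Step 1 (simplified): use given centers, or K randomly chosen samples.
    if initial_centers is not None:
        centers = list(initial_centers)
    else:
        centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: terminate once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = mean_point(members)
    return centers, labels

# Hypothetical two-dimensional toy data with two obvious groups.
toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = k_means(toy, k=2, initial_centers=[toy[0], toy[3]])
print(labels)
print(centers)
```

Categorical demographic variables such as gender or occupation would first have to be encoded numerically before a distance-based method like this can be applied; how the study encoded them is not stated.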
2.2. Association rules
Association rule mining is an unsupervised data mining technique for establishing relationships
between items. An association rule is an expression of the form X → Y, where X and Y are disjoint
sets of items (X ∩ Y = ∅). The strength of an association rule can be expressed by its support and
confidence measures (Ngai et al., 2009).
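To make these measures concrete, the sketch below computes support, confidence and lift for a single rule using the standard textbook definitions: support is the share of records containing both X and Y, confidence is that count divided by the number of records containing X, and lift is confidence divided by the share of records containing Y. The function name `rule_measures` and the example baskets are hypothetical, and a given software package may define its reported support column differently.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    n = len(transactions)
    # Count records containing X, Y, and both X and Y.
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift

# Hypothetical baskets of insurance products held by five customers.
baskets = [
    {"car", "life"},
    {"car", "life", "accident"},
    {"car"},
    {"life"},
    {"car", "accident"},
]
support, confidence, lift = rule_measures(baskets, {"car"}, {"life"})
print(support, confidence, lift)
```

A lift above 1 indicates that the consequent is more frequent among records matching the antecedent than in the data as a whole; a lift below 1 indicates the opposite.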
3. Implementation
For the proposed study, data modeling and data mining are carried out with the SPSS Clementine data
mining software, which is widely used for modeling. We consider the following three steps:
Step One: Use the clustering method to group customers,
Step Two: Use the method of association rules to find hidden patterns in the market baskets of
insurance industry customers,
Step Three: Validate the market basket results for insurance industry clients using the association
rules.
3.1. Clustering
Clustering is based on the demographic variables, including gender, age, occupation, education
level, marital status, place of residence and clients’ income. These are mostly descriptive data, and
the proposed study uses K-means clustering. For K-means clustering, the number of clusters is
important. Therefore, we used the SSE (sum of squared errors) criterion, which assesses the quality of
a clustering, to determine the number of clusters. Because of the volume of data to compare, the
number of clusters starts from 2, and Table 1 shows the SSE for different numbers of clusters.
Table 1
SSE for different numbers of clusters
Number of clusters | 2 | 3 | 4 | 5 | 6
SSE | 1.548341 | 1.547119 | 1.600229 | 1.593326 | 1.629742
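As an illustration of the criterion, the following sketch computes an SSE value for a given assignment of points to clusters, taking SSE as the sum of squared distances between each point and the mean of its cluster. The function name `sse` and the toy points are assumptions for the example; whether the values in Table 1 were normalized or computed in exactly this way is not stated in the paper.

```python
def sse(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # Cluster center: component-wise mean of the members.
        center = [sum(c) / len(members) for c in zip(*members)]
        total += sum(
            (x - m) ** 2 for p in members for x, m in zip(p, center)
        )
    return total

# Hypothetical points and two candidate partitions (K = 1 and K = 2).
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
candidates = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
}
for k, labels in candidates.items():
    print(k, sse(points, labels))
```

Values obtained this way for each candidate number of clusters can then be compared side by side, as in Table 1.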
Characteristics of each cluster are as follows:
Cluster 1: Customers who are employees, mostly engineers, and earn a monthly income of 5×10^6 to
10×10^6 Rials.
Cluster 2: Customers who are employees or farmers, have an income of 2.5×10^6 to 5×10^6 Rials and
are mostly high school graduates.
Cluster 3: Customers aged 35-50 with 12-14 years of education and a monthly income of 5×10^6 to
20×10^6 Rials.
Cluster 4: Customers aged 35-50 with a Bachelor of Science degree and a monthly income of 10×10^6 to
20×10^6 Rials.
Cluster 5: Customers aged at least 50 with 12 years of education and a monthly income of at least
20×10^6 Rials.
3.2. Association rules and hidden patterns
Once the data for insurance customers had been collected and clustering had been completed, we
examined the association rules hidden in each cluster using a causal extraction algorithm. The
following items were investigated for validation of the data:
• Which insurance services do customers use (auto insurance, life insurance, health insurance,
accident insurance, liability insurance, engineering insurance),
• Term life insurance,
• Annual fee for insurance,
• Specifications of the month of insurance,
• Location of the insurance firm,
• Validity of the leading insurance company in the insurance services,
• Use the services of the insurance company upon the recommendation of friends and family,
• The advantage of insurance company's offer to friends and family.
Tables 2-6 present some examples of the association rules derived for clusters 1 to 5.
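For readers who want to see how rules of this kind can be enumerated inside one cluster, here is a minimal brute-force sketch. It is not the algorithm run in SPSS Clementine: the function name `mine_rules`, the thresholds and the example records are hypothetical, and support is taken here as the share of records matching both antecedent and consequent.

```python
from itertools import combinations

def mine_rules(records, min_support=0.1, min_confidence=0.8, max_antecedent=2):
    """Enumerate rules X -> y over 'attribute=value' items by brute force."""
    n = len(records)
    items = sorted(set().union(*records))
    rules = []
    for y in items:
        n_y = sum(1 for r in records if y in r)
        others = [i for i in items if i != y]
        for size in range(1, max_antecedent + 1):
            for x in combinations(others, size):
                n_xy = sum(1 for r in records if set(x) <= r and y in r)
                # Skip rules that never occur or are too rare.
                if n_xy == 0 or n_xy / n < min_support:
                    continue
                n_x = sum(1 for r in records if set(x) <= r)
                confidence = n_xy / n_x
                if confidence >= min_confidence:
                    lift = confidence / (n_y / n)
                    rules.append((x, y, n_xy / n, confidence, lift))
    return rules

# Hypothetical records for the customers of one cluster.
records = [
    {"Time=2-5", "Credit=yes", "Car=yes"},
    {"Time=2-5", "Credit=yes", "Car=yes"},
    {"Time=2-5", "Credit=no", "Car=yes"},
    {"Time=1-2", "Credit=yes", "Car=no"},
    {"Time=2-5", "Credit=yes", "Car=yes"},
]
rules = mine_rules(records, min_support=0.4, min_confidence=0.8)
for x, y, s, c, l in rules:
    print(" and ".join(x), "->", y, round(s, 3), round(c, 3), round(l, 3))
```

Brute-force enumeration is only practical for a handful of attributes; dedicated algorithms prune the search using the minimum support threshold.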
Table 2
Examples of association rules in cluster 1
Consequence | Antecedent | Support% | Confidence% | Lift
Car Insurance = Yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Life = yes and Income = 5×10^6-10×10^6 Rials | 26.596 | 80.0 | 1.106
Life Insurance = Yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Life = yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 15.957 | 80.0 | 1.016
Car Insurance = Yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Accident = yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 27.66 | 80.769 | 1.117
Car Insurance = Yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Life = yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 31.914 | 83.333 | 1.152
Accident = yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Life = yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 28.723 | 85.185 | 1.144
Accident = yes | Time = 1-2 and Month = 2nd quarter and Credit = yes and Life = yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 24.468 | 86.667 | 1.163
Engineering = Yes | Time = 2-5 and Month = 2nd quarter and Credit = yes and Accident = Yes and Income = 5×10^6-10×10^6 Rials | 24.468 | 100.0 | 1.0217
Engineering = Yes | Time = 2-5 and Car ownership = yes and Credit = Yes and Income = 5×10^6-10×10^6 Rials | 26.596 | 100.0 | 1.0217
Accident = yes | Time = 2-5 and Month = 2nd quarter and Credit = Yes and Income = 5×10^6-10×10^6 Rials and Engineering = Yes | 28.723 | 85.185 | 1.144
The results of Table 2 indicate that the cluster follows rules such as the following:
1) People who use car insurance have usually held insurance for between 1 and 5 years. They usually
renew in the second quarter, value the credit of the insurer, use life insurance and engineering
insurance services, and have incomes of 5×10^6-10×10^6 Rials.
2) People who use life insurance have usually held insurance for between 1 and 2 years. They usually
renew in the second quarter, use engineering insurance services, and have incomes of 5×10^6-10×10^6
Rials.
3) People who have held car insurance for between 1 and 5 years usually renew in the second quarter
and also use engineering insurance and life insurance services.
4) People who use car insurance and have held insurance for between 1 and 2 years usually renew in
the second quarter, use engineering insurance, and have incomes of 5×10^6-10×10^6 Rials.
5) People who use engineering insurance and have held insurance for between 2 and 5 years usually
renew in the second quarter. The credit of the insurance services and life insurance are important to
them, and their incomes are 5×10^6-10×10^6 Rials.
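When reading the rows of Table 2, it can help to recall that, under the standard definition, lift equals confidence divided by the overall share of records that satisfy the consequent. The small sketch below back-calculates that share from two rows of Table 2. It assumes the Lift column follows this standard definition, which the paper does not state explicitly; the function name is hypothetical.

```python
def implied_consequent_support(confidence_pct, lift):
    """Consequent share (%) implied by lift = confidence / support(consequent)."""
    return confidence_pct / lift

# Confidence and lift values copied from two rows of Table 2.
car_share = implied_consequent_support(80.0, 1.106)            # Car Insurance = Yes
engineering_share = implied_consequent_support(100.0, 1.0217)  # Engineering = Yes
print(round(car_share, 1), round(engineering_share, 1))
```

Under this assumption, a rule with 100% confidence and a lift close to 1 mostly reflects a consequent that nearly every customer in the cluster already satisfies, so lift is worth reading together with confidence.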
Table 3
Examples of association rules in cluster 2
Consequence | Antecedent | Support% | Confidence% | Lift
Car Insurance = Yes | Time = 2-5 and Month = 3rd quarter and Place = yes and Income < 2.5×10^6 Rials | 18.181 | 87.5 | 0.962
Car Insurance = Yes | Time = 5-10 and Month = 3rd quarter and Place = yes and Income < 2.5×10^6 Rials | 13.636 | 100.0 | 1.1
Based on the results of Table 3, the following can be concluded:
1) People who use car insurance and have held insurance for between 5 and 10 years consider the place
where the insurance is issued important.
2) People who use car insurance and have held insurance for between 5 and 10 years consider the place
where the insurance is issued important.
Table 4
Examples of association rules in cluster 3
Consequence | Antecedent | Support% | Confidence% | Lift
Life insurance = yes | Time = 2-5 and Month = 3rd quarter and Place = yes and Income = 5×10^6-10×10^6 Rials and Car Insurance = yes | 19.608 | 80.0 | 0.983
Life insurance = yes | Place = yes and Responsibility insurance = yes and Accident insurance = yes and Car Insurance = yes | 79.412 | 80.246 | 0.986
Life insurance = yes | Time = 1-2 and Month = 3rd quarter and Place = yes and Income = 5×10^6-10×10^6 Rials and Responsibility Insurance = yes | 16.667 | 82.352 | 1.012
Accident insurance = yes | Time < 1 and Life insurance = yes and Month = 3rd quarter and Credit = yes and Place = yes | 11.764 | 83.33 | 0.913
Accident insurance = yes | Time < 1 and Car insurance = yes and Month = 3rd quarter and Credit = yes and Place = yes | 14.706 | 80.0 | 0.877
Car Insurance = yes | Time < 1 and Month = 3rd quarter and Credit = yes and Income = 5×10^6-10×10^6 Rials | 12.745 | 100.0 | 1.009
According to Table 4, the following can be concluded:
1) People who use life insurance have usually held it for between 2 and 5 years. They renew their
insurance in the third quarter of the year, and the place of the insurance firm is important for them.
2) People who use life insurance have usually held an insurance plan for between 1 and 2 years. They
renew their life insurance in the third quarter of the year and also hold liability and accident
insurance.
3) For people who use life insurance, the place of the insurance company is important in choosing
their insurer, and liability insurance and car accident insurance are often used as well.
4) People who have held insurance for less than one year usually renew their insurance in the third
quarter of the year.
Table 5
Examples of association rules in cluster 4
Consequence | Antecedent | Support% | Confidence% | Lift
Accident Insurance = yes | Time = 2-5 and Credit = yes and Income = 10×10^6-20×10^6 Rials and Engineering Insurance = yes and Life Insurance = yes | 27.027 | 80.0 | 0.925
Life insurance = yes | Time = 2-5 and Month = 2nd quarter and Accident = yes and Income = 10×10^6-20×10^6 Rials and Credit = yes | 29.73 | 90.909 | 0.961
Life insurance = yes | Time = 2-5 and Month = 2nd quarter and Accident = yes and Income = 10×10^6-20×10^6 Rials and Engineering insurance = yes | 29.73 | 90.909 | 0.961
Engineering Insurance = yes | Time = 2-5 and Month = 2nd quarter and Accident = yes and Income = 10×10^6-20×10^6 Rials and Engineering insurance = yes | 62.162 | 100.0 | 1.0571
Engineering Insurance = yes | Time = 2-5 and Month = 2nd quarter and Life insurance = yes and Income = 10×10^6-20×10^6 Rials and Credit = yes | 27.027 | 100.0 | 1.0571
Accident Insurance = yes | Time = 2-5 and Life insurance = yes and Income = 10×10^6-20×10^6 Rials and Credit = yes and Engineering insurance = yes | 37.837 | 92.857 | 1.073
Accident Insurance = yes | Month = 2nd quarter and Life insurance = yes and Income = 10×10^6-20×10^6 Rials and Credit = yes | 64.864 | 95.833 | 1.108
According to Table 5, the following can be concluded:
1) People who use life insurance have held insurance for between 2 and 5 years and renew their
insurance, including engineering insurance services, during the second quarter of the year. Credit
insurance is also important for them.
2) People who have held insurance for between 1 and 2 years renew their insurance, including life
insurance and engineering insurance services, in the second quarter of the year.
4. Discussion and conclusion
The present study was designed to analyze an insurance company’s customers based on market basket
analysis. The paper studied data on 300 clients of an insurance company in the city of Anzali, Iran,
which were analyzed using the K-means clustering method. Using demographic variables including
gender, age, occupation, education level, marital status, place of residence and clients’ income, the
study determined the optimal number of clusters in order to group customers. Next, the study used the
method of association rules and the practice of insurance policies to find hidden patterns in the
insurance industry. The results of this survey could be used for targeting appropriate customers in
cities located in the northern regions of Iran. The study can also be extended to other regions of
the country, which we leave to interested researchers as future work.
Acknowledgement
The authors would like to thank the anonymous referees for their constructive comments on an earlier
version of this paper.
References
Cavique, L. (2007). A scalable algorithm for the market basket analysis. Journal of Retailing and
Consumer Services, 14(6), 400-407.
Cheng, C. H., & Chen, Y. S. (2009). Classifying the segmentation of customer value via RFM model
and RS theory. Expert Systems with Applications, 36(3), 4176-4184.
Dhanabhakyam, M., & Punithavalli, M. (2011). A survey on data mining algorithm for market basket
analysis. Global Journal of Computer Science and Technology, 11(11).
Edwards, D. (2003). Data mining: Concepts, models, methods, and algorithms. Journal of Proteome
Research, 2(3), 334-334.
Han, J., Kamber, M., & Pei, J. (2006). Data mining, Southeast Asia edition: Concepts and techniques.
Morgan Kaufmann.
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms. John Wiley & Sons.
Mansur, A., & Kuncoro, T. (2012). Product inventory predictions at small medium enterprise using
market basket analysis approach-neural networks. Procedia Economics and Finance, 4, 312-320.
Ngai, E. W., Xiu, L., & Chau, D. C. (2009). Application of data mining techniques in customer
relationship management: A literature review and classification. Expert Systems with
Applications, 36(2), 2592-2602.
Pande, A., & Abdel-Aty, M. (2009). Market basket analysis of crash data from large jurisdictions and
its potential as a decision support tool. Safety Science, 47(1), 145-154.
Raorane, A. A., Kulkarni, R. V., & Jitkar, B. D. (2012). Association rule–extracting knowledge using
market basket analysis. Research Journal of Recent Sciences, 1(2), 19-27.
Russell, G. J., & Petersen, A. (2000). Analysis of cross category dependence in market basket
selection. Journal of Retailing, 76(3), 367-392.
Tang, K., Chen, Y. L., & Hu, H. W. (2008). Context-based market basket analysis in a multiple-store
environment. Decision Support Systems, 45(1), 150-163.
Trnka, A. (2010, June). Market basket analysis with data mining methods. In Networking and Information
Technology (ICNIT), 2010 International Conference on (pp. 446-450). IEEE.
Wick, M. R., & Wagner, P. J. (2006, March). Using market basket analysis to integrate and motivate
topics in discrete structures. In ACM SIGCSE Bulletin (Vol. 38, No. 1, pp. 323-327). ACM.
Yun, C. H., Chuang, K. T., & Chen, M. S. (2006). Adherence clustering: an efficient method for mining
market-basket clusters. Information Systems, 31(3), 170-186.
... precipitation prediction [33]; -insurance risk assessment [34]; -traffic safety analysis [35][36][37]; -assessment of construction project risk [27]; -assessment of risk in construction disputes [38]; -a variety of problems in biology [39][40][41]; -preferences' discovering in social sciences [42]; -collusion detection in tender procedures [43]; -quality management problem-solving in production [44]. ...
Article
Full-text available
There are several factors influencing the time of construction project execution. The properties of the planned structure, the details of an order, and macroeconomic factors affect the project completion time. Every construction project is unique, but the data collected from previously completed projects help to plan the new one. The association analysis is a suitable tool for uncovering the rules-showing the influence of some factors appearing simultaneously. The input data to the association analysis must be preprocessed - every feature influencing the duration of the project must be divided into ranges. The number of features and the number of ranges (for each feature) create a very complicated combinatorial problem. The authors applied a metaheuristic tabu search algorithm to find the acceptable thresholds in the association analysis, increasing the strength of the rules found. The increase in the strength of the rules can help clients to avoid unfavorable sets of features, which in the past - with high confidence - significantly delayed projects. The new 7-score method can be used in various industries. This article shows its application to reduce the risk of a road construction contract delay. Importantly, the method is not based on expert opinions, but on historical data.
... Despite its original applications (they can be still found e.g., [57]) there is a spectrum of the association analysis applications. For example, it is applied in: insurance for risk assessment [63]. ...
Article
Full-text available
A high monetary value of the construction projects is one of the reasons for frequent disputes between a general contractor (GC) and a client. A construction site is a unique, one-time, and single product factory with many parties involved and dependent on each other. The organizational dependencies and their complexity make any fault or mistake propagate and influence the final result (delays, cost overruns). The constant will of the parties involved results in completing a construction object. The cost increase, over the expected level, may cause settlements between parties difficult and lead to disputes that often finish in a court. Such decision of taking a client to a court may influence the future relations with a client, the trademark of the GC, as well as, its finance. To ascertain the correctness of the decision of this kind, the machine learning tools as decision trees (DT) and artificial neural networks (ANN) are applied to predict the result of a dispute. The dataset of about 10 projects completed by an undisclosed contractor is analyzed. Based on that, a much bigger database is simulated for automated classifications onto the following two classes: a dispute won or lost. The accuracy of over 93% is achieved, and the reasoning based on results from DT and ANN is presented and analyzed. The novelty of the article is the usage of in-company data as the independent variables what makes the model tailored for a specific GC. Secondly, the calculation of the risk of wrong decisions based on machine learning tools predictions is introduced and discussed.
... In crime analysis, this algorithm also attracts attention of researchers in this area due to the ability of this algorithm in producing highly related result for future decision making actions. [20] Research in 300 data clients of insurance company by using demographic variable data which focusing on grouping customer and find hidden patterns. Using Association Rule Mining for Extracting Product Sales Patterns in Retail Store Transaction [21] Prediction of product sales trends and customer behavior. ...
... It also helps the marketing analyst to understand the behavior of customers, e.g. which products are being bought together [Kaur, Kang 2016]. Over the years, market basket analysis has started to play an increasingly important role in the analysis of financial and insurance transactions [Roodpishi, Nashtaei 2015], in telecommunications [Jaroszewicz 2008] and in the pharmaceutical industry [Cerrito 2007;Hsieh et al. 2008]. This technique was also used on a large data set of cyclone conditions to derive hypotheses regarding which particular conditions are the best predictors of cyclones [Yang et al. 2007]. ...
Article
Full-text available
Market basket analysis, which is a method of discovering co-occurrence relationships, is widely used for the purposes of marketing research and e-commerce, mainly by supermarkets and online stores. Moving beyond the traditional notion of a market basket understood as a fixed list of products, the technique can be applied for data mining in other fields of research which do not involve traditional transactions and purchases made by customers. The following article describes theoretical aspects of market basket analysis with an illustrative application based on data from the National Census of Population and Housing 2011 with respect to marital status. This is the first application of market basket analysis to census data to be conducted in Poland, in which attributes of the market basket have been replaced with respondents' demographic characteristics. This approach makes it possible to identify relationships between legal (de jure) marital status and actual (de facto) marital status, taking into account other basic socio-demographic variables available in large datasets. Using the R software to generate choropleth maps classified by province as a method of visualizing association rules, it was possible to conduct a spatial analysis of the phenomenon of interest.
Article
Full-text available
The main advantage of the structural composite material known as cement-stabilized rammed earth (CSRE) is that it can be formulated as a sustainable and cost-saving solution. The use of the aggregates collected very close to a construction site allows economizing on transportation costs. Another factor that makes sustainability higher and the costs lower is a small addition of cement to the CSRE in comparison to the regular concrete. However, the low cement content makes the compressive strength of this structural material sensitive to other factors. One of them is the composition of the aggregates. Considering the fact that they are obtained locally, without full laboratory control of their composition, achieving the required compressive strength of CSRE is a challenge. To assess the possibility of achieving a certain compressive strength of CSRE, based on its core properties, the innovative algorithm of designing CSRE is proposed. Based on 582 crash-test of CSRE samples of different composition and compaction levels, along with the use of association analysis, the spreadsheet application is created. Applying the algorithm and the spreadsheet, it is possible to design the composition of CSRE with high confidence of achieving the required compressive strength. The algorithm considers a random character of aggregates locally collected and proposes multiple possible ways of increasing the confidence. They are verified through innovatively applied association analyses in the enclosed spreadsheet.
Article
This study was conducted in order to make a Market Basket Analysis by using Association Rules. The data used in the study are the sales data of any supermarket received from the Vancouver Island University website. Data were analyzed in the Weka program using a data set containing 225 different products. Apriori and FP Growth, which are Association Rules algorithms, were tried in order. Since the data set is categorical, the Apriori algorithm did not yield any results. Therefore, the FP Growth algorithm was used and the top 10 rules were given according to the conviction value. The best rule accordingly; a customer who buys Milk, Sweet Relish and Pepperoni Pizza (Frozen) also gets eggs. Best rule with 21.06 Conviction and 1 (100%) confidence values are this rule. 24 customers who received these 3 products in the dataset received eggs. Similarly, also other rules were interpreted in this study. As a result, product placement in the supermarket can be made according to these rules. Thus, sales of these products will increase and supermarket revenue will increase directly.
Article
One of the key problems in every company, including small and medium enterprises, is how to determine the appropriate inventory level for each product sold to its customers, so as to suppress the build-up of inventory while avoiding stock-outs. This study aims to understand the behavior of consumers in purchasing products so that it can be used to predict purchases for the next period. The prediction is then used as decision support in determining the appropriate amount of inventory for each product. The study was conducted at Karomah Brass, a small and medium enterprise engaged in the sale of antique furniture accessories, which does not produce its own products but buys them from a supplier. The methods used in this study are Market Basket Analysis (MBA) and Artificial Neural Network (ANN) backpropagation. MBA is used to examine the buying behavior of customers, while ANN backpropagation is used to predict the inventory requirements for each product. The results show that customers frequently purchase products that serve as antique closet accessories, and that a customer who buys such a product will also buy similar products, in accordance with 21 rules obtained from mining the transaction data. The other result is a prediction of the product inventory requirements from one year to the next. (C) 2012 The Authors. Published by Elsevier Ltd.
Article
Market basket choice is a decision process in which a consumer selects items from a number of product categories on the same shopping trip. The key feature of market basket choice is the interdependence in demand relationships across the items in the final basket. This research develops a new approach to the specification of market basket models that allows a choice model for a basket of goods to be constructed using a set of “local” conditional choice models corresponding to each item in the basket. The approach yields a parsimonious market basket model that allows for any type of demand relationship across product categories (complementarity, independence, or substitution) and can be estimated using simple modifications of standard multinomial logit software. We analyze the choice of four grocery store categories that exhibit common cross-category brand names for both national brands and private labels. Results indicate that cross-category price elasticities are small. We argue that store traffic patterns may be more important than consumer-level demand interdependence in forecasting market basket choice.
Article
Data mining applications are becoming increasingly popular across a set of very divergent fields, and the analysis of crash data is no exception. Many data mining methodologies have been applied to crash data in the recent past. However, one particular application conspicuously missing from the traffic safety literature until recently is association analysis, or market basket analysis. The methodology is used by retailers all over the world to determine which items are purchased together. In this study, crashes are analyzed as supermarket transactions to detect interdependence among crash characteristics. The results from the analysis include simple rules that indicate which crash characteristics are associated with each other. The application is demonstrated using non-intersection crash data from the state of Florida for the year 2004. In the proposed methodology no variable needs to be assigned as the dependent variable. Hence, it is useful in identifying previously unknown patterns in the data obtained from large jurisdictions (such as the State of Florida), as opposed to the data from a single roadway or intersection. Based on the association rules discovered from the analysis, it was concluded that there is a significant correlation between lack of illumination and high severity of crashes. Furthermore, it was found that under rainy conditions straight sections with vertical curves are particularly crash prone. The results are consistent with the understanding of crash characteristics and point to the potential of this methodology for the analysis of crash data collected by state and federal agencies. The potential of this technique may be realized in the form of a decision support tool for traffic safety administrators.
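The idea of treating each crash as a "transaction" can be sketched as follows. Each record is flattened into a set of `attribute=value` items, after which ordinary co-occurrence measures such as lift apply. The attribute names, the helper names `to_transaction` and `lift`, and the records are invented for illustration and are not the Florida data or the authors' code.

```python
def to_transaction(record):
    """Flatten one crash record into a set of 'attribute=value' items."""
    return {f"{key}={value}" for key, value in record.items()}

def lift(transactions, item_a, item_b):
    """Lift of two items: P(a and b) / (P(a) * P(b))."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if item_a in t) / n
    p_b = sum(1 for t in transactions if item_b in t) / n
    p_ab = sum(1 for t in transactions if item_a in t and item_b in t) / n
    return p_ab / (p_a * p_b) if p_a and p_b else 0.0

# Invented records, for illustration only.
records = [
    {"lighting": "dark", "severity": "high"},
    {"lighting": "dark", "severity": "high"},
    {"lighting": "dark", "severity": "low"},
    {"lighting": "daylight", "severity": "low"},
    {"lighting": "daylight", "severity": "low"},
    {"lighting": "daylight", "severity": "high"},
]

crash_transactions = [to_transaction(r) for r in records]
dark_high = lift(crash_transactions, "lighting=dark", "severity=high")
print(round(dark_high, 3))
```

Because no attribute is singled out as the dependent variable, the same function can be applied to any pair of items.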
Article
The market basket is defined as an itemset bought together by a customer on a single visit to a store. The market basket analysis is a powerful tool for the implementation of cross-selling strategies. Especially in retailing it is essential to discover large baskets, since it deals with thousands of items. Although some algorithms can find large itemsets, they can be inefficient in terms of computational time. The aim of this paper is to present an algorithm to discover large itemset patterns for the market basket analysis. In this approach, the condensed data is used and is obtained by transforming the market basket problem into a maximum-weighted clique problem. Firstly, the input dataset is transformed into a graph-based structure and then the maximum-weighted clique problem is solved using a meta-heuristic approach in order to find the most frequent itemsets. The computational results show large itemset patterns with good scalability properties.
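The graph transformation described in this abstract can be sketched on a toy scale. The sketch assumes that an edge weight is the number of baskets in which two items co-occur, which is one plausible choice and not necessarily the paper's. It also replaces the meta-heuristic with an exhaustive search that is only practical for a handful of items. All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(transactions):
    """Edge (a, b) with a < b is weighted by how many baskets contain both."""
    weights = Counter()
    for basket in transactions:
        for a, b in combinations(sorted(basket), 2):
            weights[(a, b)] += 1
    return weights

def max_weight_clique(weights, min_weight=1):
    """Exhaustive search for the clique with the largest total edge weight.

    Stand-in for a meta-heuristic; exponential in the number of items.
    """
    edges = {e: w for e, w in weights.items() if w >= min_weight}
    nodes = sorted({v for e in edges for v in e})
    best, best_weight = (), 0
    for size in range(2, len(nodes) + 1):
        for candidate in combinations(nodes, size):
            pairs = list(combinations(candidate, 2))
            # A clique requires every pair of its nodes to be connected.
            if all(p in edges for p in pairs):
                total = sum(edges[p] for p in pairs)
                if total > best_weight:
                    best, best_weight = candidate, total
    return best, best_weight

# Invented toy baskets, for illustration only.
toy_baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "tea"},
]

graph = cooccurrence_graph(toy_baskets)
clique, clique_weight = max_weight_clique(graph, min_weight=2)
print(clique, clique_weight)
```

The `min_weight` threshold plays a role similar to a minimum support: edges seen in too few baskets are dropped before the search.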
Article
This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces, to extract new information for decision making. The goal of this book is to provide a single introductory source, organized in a systematic way, in which we could direct the readers in analysis of large data sets, through the explanation of basic concepts, models and methodologies developed in recent decades. If you are an instructor or professor and would like to obtain instructor's materials, please visit http://booksupport.wiley.com. If you are an instructor or professor and would like to obtain a solutions manual, please send an email to: [email protected]
Conference Paper
This paper describes how Market Basket Analysis can be implemented within the Six Sigma methodology. Data Mining methods provide many opportunities in the market sector, and Market Basket Analysis is one of them. The Six Sigma methodology uses several statistical methods. By implementing Market Basket Analysis (as a part of Data Mining) in one of the phases of Six Sigma, we can improve the results and change the Sigma performance level of the process. In our research we used the GRI (General Rule Induction) algorithm to produce association rules between products in the market basket. These associations show a variety between the products. To show the dependence between the products we used a Web plot. The last algorithm in the analysis was C5.0, which was used to build rule-based profiles.
Article
Data mining is a powerful technique that helps companies mine the patterns and trends in their customer data and thereby drive improved customer relationships; it is one of the well-known tools of customer relationship management (CRM). However, data mining tools have some drawbacks: neural networks have long training times, and genetic algorithms are a brute-force computing method. This study proposes a new procedure that joins the quantitative values of RFM attributes and the K-means algorithm with rough set theory (RS theory) to extract meaningful rules, and that can effectively mitigate these drawbacks. The study has three purposes: (1) discretize continuous attributes to enhance the rough sets algorithm; (2) cluster customer value as the output (customer loyalty), partitioned into 3, 5 and 7 classes based on a subjective view, and then see which number of classes gives the best accuracy rate; and (3) find the characteristics of customers in order to strengthen CRM. A dataset collected from C-company in Taiwan’s electronic industry is employed in an empirical case study to illustrate the proposed procedure. Referring to [Hughes, A. M. (1994). Strategic database marketing. Chicago: Probus Publishing Company], this study first uses the RFM model to yield quantitative values as input attributes; next, it uses the K-means algorithm to cluster customer value; finally, it employs rough sets (the LEM2 algorithm) to mine classification rules that help enterprises drive excellent CRM. In the analysis of the empirical results, the proposed procedure outperforms the methods listed in terms of accuracy rate regardless of whether 3, 5 or 7 classes are used on the output, and generates understandable decision rules.
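The first two steps of this procedure (RFM values as input, then K-means on those values) can be sketched as below. This is a minimal illustration under several assumptions: transactions are `(customer, day, amount)` tuples, features are left unscaled, and the K-means initialization simply takes the first `k` points. The rough set (LEM2) step is not shown, and all names and data are hypothetical.

```python
def rfm(transactions, today):
    """Return {customer: (recency, frequency, monetary)}.

    recency   = days since the most recent purchase
    frequency = number of purchases
    monetary  = total amount spent
    """
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = today - day
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))

def kmeans(points, k, max_iters=50):
    """Plain Lloyd iteration with a deterministic start (first k points)."""
    centers = [tuple(float(x) for x in points[i]) for i in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its cluster; keep empty ones.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

# Invented purchase log, for illustration only.
purchases = [
    ("A", 95, 500.0), ("A", 98, 450.0), ("A", 99, 550.0),
    ("B", 90, 480.0), ("B", 97, 520.0),
    ("C", 10, 20.0),
    ("D", 20, 30.0), ("D", 25, 10.0),
]

rfm_table = rfm(purchases, today=100)
customers = sorted(rfm_table)
centers, labels = kmeans([rfm_table[c] for c in customers], k=2)
print(dict(zip(customers, labels)))
```

Without scaling, the monetary attribute dominates the distance; in practice the attributes would usually be normalized or discretized first, as purpose (1) of the abstract suggests.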
Article
We explore in this paper the efficient clustering of market-basket data. Unlike traditional data, market-basket data are known to be of high dimensionality and sparsity. Without explicitly considering the presence of the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing with items in the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for the result representation of the clustering techniques for market-basket data. In view of the features of market-basket data, we devise in this paper a novel measurement, called the category-based adherence, and utilize this measurement to perform the clustering. With this category-based adherence measurement, we develop an efficient clustering algorithm, called algorithm k-todes, for market-basket data with the objective of minimizing the category-based adherence. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Our experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm k-todes significantly outperforms prior works in both execution efficiency and clustering quality as measured by information gain, indicating the usefulness of category-based adherence in market-basket data clustering.
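The two definitions in this abstract (item-to-cluster distance and category-based adherence) can be sketched as follows. The sketch assumes that the taxonomy is a tree stored as a child-to-parent map, that a cluster is represented by a set of taxonomy nodes (its todes), and that "number of links" means the path length in the tree. This is a literal reading of the abstract and may differ from the paper's exact definition; all names and the toy taxonomy are hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the taxonomy root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def links_between(a, b, parent):
    """Number of tree links on the path between nodes a and b."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for steps_from_a, n in enumerate(path_to_root(a, parent)):
        if n in steps_from_b:  # first common ancestor
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes are not in the same taxonomy tree")

def item_distance(item, todes, parent):
    """Distance of an item to a cluster: links to its nearest tode."""
    return min(links_between(item, t, parent) for t in todes)

def adherence(transaction, todes, parent):
    """Average item distance over the items of one transaction."""
    return sum(item_distance(i, todes, parent) for i in transaction) / len(transaction)

# Invented toy taxonomy (child -> parent), for illustration only.
taxonomy = {
    "dairy": "all", "bakery": "all",
    "milk": "dairy", "cheese": "dairy",
    "bread": "bakery", "bagel": "bakery",
}

basket = ["milk", "cheese", "bread"]
narrow = adherence(basket, {"dairy"}, taxonomy)
broad = adherence(basket, {"dairy", "bakery"}, taxonomy)
print(narrow, broad)
```

A lower adherence value means the transaction sits closer to the cluster's todes, which is consistent with the stated objective of minimizing category-based adherence.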