Market Basket Analysis in R

Introduction

A supermarket processes hundreds or thousands of transactions every day, and each transaction usually contains several products. In the database, a transaction may look like this: {Transaction1: Product1, Product3, Product4, Product8, Product9}. In a large data set of such transactions, purchase patterns (for example, products that are frequently bought together) can be uncovered with product association analysis, also called Market Basket Analysis.

Key Concepts of Market Basket Analysis

Figure: The basics of market basket analysis (source: UofT)

Market Basket Analysis rests on four key concepts:

  • Rule: a rule expresses the incidence across transactions of one set of items as a condition of another set of items, i.e. X => Y;
  • Support: the support for a set of items is the proportion of all transactions that contain the set;
  • Confidence: the support for the co-occurrence of all items in a rule, relative to the support for the left-hand set alone, i.e. support(X and Y) / support(X);
  • Lift: the support for the co-occurrence of all items in a rule, relative to the support expected if the two sides were independent, i.e. support(X and Y) / (support(X) × support(Y)).

The three measures (support, confidence, and lift) tell us different things. When we search for rules, we want each measure to exceed a minimum threshold: we look for item sets that occur relatively frequently in transactions (support), that show strong conditional relationships (confidence), and that are more common than chance (lift). A lift above 1 means the items appear together more often than independence would predict.
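To make the three measures concrete, here is a minimal sketch in base R that computes them by hand for a single rule. The baskets, the item names, and the helper function support() are all made up for illustration; they are not part of the retail data set or of any package.

```r
# Hypothetical toy baskets (illustration only, not from the retail data)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("eggs")
)

# Hypothetical helper: proportion of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule under study: {bread} => {milk}
lhs <- "bread"
rhs <- "milk"

supp.rule <- support(c(lhs, rhs), baskets)     # share of baskets with both sides
conf.rule <- supp.rule / support(lhs, baskets) # share of bread baskets that also hold milk
lift.rule <- conf.rule / support(rhs, baskets) # confidence relative to milk's base rate

print(c(support = supp.rule, confidence = conf.rule, lift = lift.rule))
```

In this toy data, bread and milk each appear in three of five baskets and together in two, so the rule has support 0.4, confidence of about 0.67, and lift of about 1.11: slightly more common together than chance would suggest.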

Example

The following is an example of how to do Market Basket Analysis using R with a large data set from a Belgian supermarket chain (http://fimi.uantwerpen.be/data/retail.dat).

Code

# Loading "arules" package for the "transactions" class and the "apriori" algorithm
# (it must be loaded before the data is converted to transactions)
library(arules)

# Loading "arulesViz" package to visualize the result
library(arulesViz)

# Loading data
retail.raw <- readLines("http://fimi.ua.ac.be/data/retail.dat")

# Preparing data: one character vector of item IDs per transaction
retail.list <- strsplit(retail.raw, " ")
names(retail.list) <- paste("Trans", seq_along(retail.list), sep="")
retail.trans <- as(retail.list, "transactions")

# Run "apriori" algorithm 
retail.rules <- apriori(retail.trans, parameter=list(supp=0.001, conf=0.4)) 

# Select the top 50 rules by "lift"
retail.hi <- head(sort(retail.rules, by="lift"), 50)

# Plot the top 50 rules
plot(retail.hi, method="graph", control=list(type="items"))

Result

Figure: Top 50 rules of market basket analysis

As shown above, items 696 and 699 form a tight set; there are item clusters for {3402, 3535, 3537}, {309, 1080, 1269, 1378, 1379, 1380}, and so forth; and item 39 appears as a key item in two sets of items that otherwise do not overlap.
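A graph shows which items cluster, but not the numbers behind each rule. As a rough sketch, the rules can also be read as a table with inspect() and quality() from the arules package. The example below uses a small made-up list of baskets (toy.list and the thresholds are assumptions for illustration) so that it runs without downloading the retail data; the same calls should apply to retail.rules.

```r
library(arules)

# Hypothetical toy baskets (illustration only, not from the retail data)
toy.list <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("bread", "butter", "milk")
)
toy.trans <- as(toy.list, "transactions")

# Mine rules with at least two items; thresholds are chosen for the toy data
toy.rules <- apriori(toy.trans,
                     parameter=list(supp=0.4, conf=0.6, minlen=2),
                     control=list(verbose=FALSE))

# Print the rules with their support, confidence, and lift, highest lift first
inspect(sort(toy.rules, by="lift"))

# The same measures as a data frame, for further filtering or export
toy.quality <- quality(toy.rules)
```

On the real data, sorting by lift and inspecting the first few rules is a quick way to check which item pairs drive the clusters seen in the plot.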

Applications

Market Basket Analysis can help supermarkets in several ways. First, supermarkets can use the output to optimize the physical layout of their stores. For example, the analysis suggests placing products 3402, 3535, and 3537 close together. Second, it can inform marketing strategies. For instance, a supermarket could run a discount promotion on product 696 while raising the price of product 699, since the two form a tight set.

Reference:
Chapman, C., & Feit, E. M. (2015). R for marketing research and analytics. Springer International Publishing.
