Association Rule Learning: Uncovering Hidden Relationships in Data

Association rule learning, a rule-based machine learning method, serves as a valuable tool for uncovering hidden patterns in large datasets. It is particularly useful in domains where understanding the relationships between different items can lead to actionable insights. This article explores the fundamental concepts, algorithms, and applications of association rule learning.

Introduction to Association Rule Learning

Association rule learning is a rule-based data mining technique for discovering interesting relations between variables in large databases. It is designed to uncover associations between categorical variables and to identify strong rules using measures of interestingness. Its goal is to dig into large amounts of data and surface relationships between attributes that would otherwise remain hidden.

Core Concepts

Association learning is based on the concept of rules, which are implications of the form X → Y, where X and Y are disjoint itemsets.

Association rules are structured as “if-then” statements. For instance, if a customer bought shaving cream, then they also bought razors. The “if-then” portions of association rules are often referred to as the antecedent (“if”) and the consequent (“then”). It is worth noting that association rules merely denote co-occurrence, not causal patterns. Based on the results of association rule mining, we cannot determine the cause of buying razors, merely that it is associated with purchasing shaving cream.

For example, a rule stating {Bread, Butter} → {Milk} indicates that customers who buy bread and butter also tend to buy milk.


Key Metrics for Evaluating Association Rules

To evaluate the importance of each rule, several metrics are commonly used. This article focuses on three: support, confidence, and lift.

  1. Support: This is the proportion of transactions in the database that contain the itemset: the number of transactions containing the itemset divided by the total number of transactions. It estimates the probability of encountering the itemset in a transaction. For example, if a rule has a support value of 5%, the items in the rule appear together in 5% of all transactions. In a retail store, 250 out of 2,000 transactions over a day might include a purchase of apples, giving apples a support of 12.5%. You can specify a required minimum support threshold when applying the Apriori algorithm.
  2. Confidence: This is a measure of the reliability of the inference made by a rule: the conditional probability of the consequent given the antecedent. It is defined as the ratio of the number of transactions containing both the antecedent and the consequent to the number of transactions containing the antecedent, i.e., confidence(X → Y) = support(X ∪ Y) / support(X). A higher confidence value indicates stronger predictive power. Extending the preceding example, assume that there are 150 transactions where apples and bananas were purchased together; since 250 transactions include apples, the rule {Apples} → {Bananas} has a confidence of 150/250 = 60%. While confidence is a good measure of likelihood, it is not a guarantee of a meaningful association: a consequent that is popular on its own can produce high confidence for almost any antecedent.
  3. Lift: This is the ratio of the observed support of X ∪ Y to the support expected if X and Y were independent, or equivalently the ratio of the rule's confidence to the support of the consequent: lift(X → Y) = confidence(X → Y) / support(Y). It measures how much more likely the consequent is when the antecedent is present, compared to the consequent's baseline rate, and thus how "interesting" the rule is. Continuing the example, if bananas appear in 12.5% of all transactions, the rule's lift is 0.60 / 0.125 = 4.8, meaning apples and bananas are purchased together 4.8 times more often than independence would predict. Lift has three regimes: lift = 1 means A and B occur independently; lift > 1 indicates a positive association between A and B; lift < 1 indicates a negative association, suggesting A may be a substitute for B.
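These three metrics can be computed directly from a transaction database. The sketch below is a minimal base-R illustration over a hypothetical five-transaction toy dataset (the item names and counts are assumptions, not data from this article):

```r
# Toy transaction database: each transaction is a character vector of items.
transactions <- list(
  c("apples", "bananas"),
  c("apples", "bananas", "milk"),
  c("apples", "milk"),
  c("bananas", "milk"),
  c("apples", "bananas")
)

# Support of an itemset: fraction of transactions containing all its items.
support <- function(itemset, db) {
  mean(sapply(db, function(t) all(itemset %in% t)))
}

# Confidence of X -> Y: support(X and Y) divided by support(X).
confidence <- function(x, y, db) {
  support(c(x, y), db) / support(x, db)
}

# Lift of X -> Y: confidence of the rule divided by the baseline support of Y.
lift <- function(x, y, db) {
  confidence(x, y, db) / support(y, db)
}

support("apples", transactions)               # 4/5 = 0.8
confidence("apples", "bananas", transactions) # 0.6/0.8 = 0.75
lift("apples", "bananas", transactions)       # 0.75/0.8 = 0.9375
```

In this toy data the lift falls just below 1: bananas appear slightly less often in baskets that contain apples than their overall rate would predict, a mild negative association.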

Common Algorithms for Association Rule Mining

Several algorithms have been designed to efficiently find association rules in data. Three of the most widely used are Apriori, FP-Growth, and Eclat.

Apriori Algorithm

The Apriori algorithm is an unsupervised machine learning algorithm used for association rule learning. It searches the data for the most frequently occurring sets of items; typically, the antecedent can contain any number of items while the consequent contains only one. Introduced in 1994 by Rakesh Agrawal and Ramakrishnan Srikant, the name "Apriori" acknowledges the prior knowledge of frequent itemsets that the algorithm uses in computation. The algorithm runs iterations over the data to identify k-itemsets, meaning k items that frequently occur together, and then uses the k-itemsets to identify candidate (k+1)-itemsets.

The Apriori algorithm relies on the insight that adding items to a frequently purchased group can only make it less frequent, not more. This is the Apriori property: if an itemset appears frequently in a dataset, all of its subsets must also be frequent. The algorithm is applicable to many kinds of datasets, especially those generated by transactional databases, and it is often used for market basket analysis to support recommendation systems. For example, on an e-commerce platform that sells clothes and shoes, a shopper who adds a pair of formal black shoes to their cart might then see the interface recommend related items, like socks.

One of the biggest advantages of the Apriori algorithm is its simplicity and adaptability. However, it is not as efficient when handling large datasets: the multi-iteration process of candidate itemset generation can become computationally expensive and memory intensive. The algorithm first identifies the unique items, sometimes referred to as 1-itemsets, in the dataset along with their frequencies. Then, it combines items that appear together with a support above a specified threshold into candidate itemsets and filters out the infrequent itemsets to reduce the compute cost in later steps.


Using the Apriori property, the algorithm then combines frequent itemsets to form larger candidate itemsets, pruning the combinations whose support falls below the threshold. It repeats these two steps until all frequent itemsets meeting the defined threshold have been generated exhaustively.
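The generate-and-prune loop described above can be sketched in a few dozen lines of base R. This is an illustrative toy implementation rather than production code; the five-transaction dataset and the 0.6 support threshold are assumptions:

```r
# Toy transaction database and an assumed minimum support threshold.
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)
min_support <- 0.6

# Support of an itemset: fraction of transactions containing all its items.
support <- function(itemset, db) {
  mean(sapply(db, function(t) all(itemset %in% t)))
}

# Step 1: find the frequent 1-itemsets.
items    <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support(s, transactions) >= min_support,
                   lapply(items, function(i) i))

all_frequent <- frequent
k <- 1
# Step 2: repeatedly join frequent k-itemsets into (k+1)-item candidates and
# prune those below the threshold. By the Apriori property, no pruned itemset
# can ever have a frequent superset, so nothing is lost by discarding it.
while (length(frequent) > 1) {
  candidates <- list()
  for (i in seq_along(frequent)) {
    for (j in seq_along(frequent)) {
      if (i < j) {
        u <- sort(union(frequent[[i]], frequent[[j]]))
        if (length(u) == k + 1) candidates <- c(candidates, list(u))
      }
    }
  }
  candidates <- unique(candidates)
  frequent <- Filter(function(s) support(s, transactions) >= min_support,
                     candidates)
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

# 6 frequent itemsets at this threshold: 3 singletons and 3 pairs
# (the triple {bread, butter, milk} is pruned at support 0.4).
print(all_frequent)
```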

FP-Growth Algorithm

The Frequent Pattern (FP) Growth algorithm is an improvement over Apriori that finds frequent itemsets without generating candidate itemsets, thus improving performance. It uses a divide-and-conquer strategy built around a special data structure, the Frequent-Pattern tree (FP-tree), which stores the database in a compressed form while retaining the itemset association information. The most frequent patterns are then extracted directly from this tree.

Eclat Algorithm

Eclat stands for Equivalence Class Transformation. While the Apriori algorithm works horizontally, imitating the breadth-first search of a graph, the Eclat algorithm works vertically, like a depth-first search of a graph, and typically executes faster than Apriori.
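Assuming the arules package is available, the sketch below runs its eclat() implementation on the Groceries dataset used later in this article (the 0.05 support threshold and maximum length are illustrative assumptions):

```r
library(arules)
data(Groceries)

# Eclat mines frequent itemsets with a depth-first search over vertical
# (transaction-ID list) representations of the items.
itemsets <- eclat(Groceries, parameter = list(supp = 0.05, maxlen = 3))

# Show the five itemsets with the highest support.
inspect(head(sort(itemsets, by = "support"), 5))
```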

Applications of Association Rule Learning

Association rule analysis is widely used in the retail, healthcare, and finance industries. The most common applications include market basket analysis, recommendation systems, medical diagnosis, and fraud detection, each discussed below. Apriori-style algorithms are also applicable to nontransactional databases.

Market Basket Analysis

Market basket analysis is one of the most popular examples of association rule learning. By identifying associations between items, it provides valuable insights into customer behavior and preferences. Retailers analyze customer purchase history and optimize store layouts by placing frequently purchased items near each other or on the same shelf. E-commerce platforms use Apriori-style algorithms to mine product relationships from user preferences and purchase patterns and build efficient recommendation systems.


Recommendation Systems

Association rules can also drive product recommendations. An e-commerce site can mine rules from its transaction data in advance, store them in a database, and then recommend the consequent items of any rule whose antecedent matches the products a customer is currently viewing or has placed in their cart.

Medical Diagnosis

Association rules are also used in medical diagnosis: understanding which symptoms tend to co-occur can help improve patient care and medicine prescription. The Apriori algorithm can find strong association rules between symptoms and diseases to improve the efficiency of diagnosis and devise targeted treatment plans, for example, identifying which patients are likely to develop diabetes or the role that diet and lifestyle play in disease.

Fraud Detection

Another common application of the Apriori algorithm is to identify fraudulent patterns in financial transactions. Banks can swiftly detect and prevent fraudulent conduct by spotting odd or suspicious trends in client transaction data.

Association Rule Mining in R

To conduct association rule mining in R (R Core Team 2023), we will use the apriori() function from the arules package (Hahsler et al. 2023). The arules package includes a dataset called Groceries, which represents one month (30 days) of transactions from a real-world grocery store: 9,835 transactions across 169 item categories.

First, we load the arules package and then the Groceries dataset. Groceries is of class "transactions", a class specific to the arules package; the functions in arules used to conduct association rule mining specifically take objects of this class. To convert an object to the class "transactions", the transactions() function can be used.

Objects of this class have a unique structure. Printing the item information of the Groceries object returns a data frame of all the items with three variables: labels, level2, and level1. The labels variable contains all of the items, level2 shows the category each item belongs to, and level1 shows the broader category that level2 is in.
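The loading steps described above might look like the following (a sketch assuming arules is installed):

```r
library(arules)
data(Groceries)

Groceries            # an object of class "transactions"
summary(Groceries)   # 9,835 transactions across 169 items
itemInfo(Groceries)  # data frame with the labels, level2, and level1 variables
```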

The apriori() function in the arules package applies the Apriori algorithm to these data. By default, it mines only rules with a minimum support value of 0.10, a minimum confidence value of 0.80, and a maximum of 10 items per rule. To prevent the function from running too long, it also times out while checking for subsets after 5 seconds. Rule lengths refer to the entire itemset, including both the antecedent and the consequent; the default minimum length of 1 means a rule could be returned containing only a consequent.

Using the default settings, apriori() returns 0 rules for these data. Let's adjust some of the defaults and see how this affects the model. Within the apriori() function we pass a named list to the argument called parameter. In this list, we adjust the minimum support with an argument called supp, the minimum confidence with an argument called conf, and the maximum rule length with an argument called maxlen. This returns a set of 15 rules.

In the output of rules2 there are columns labeled lhs (left-hand side) and rhs (right-hand side), which refer to the antecedent (lhs) and consequent (rhs). The first rule can be read as "if curd and yogurt were purchased, then so was whole milk." The way we specified the apriori() function for rules2 allowed it to search through all possible antecedent and consequent combinations. If we want, we can instead define which items may appear in the antecedent and which in the consequent. A list of possible values can be given for consideration in the antecedent; the algorithm will search through combinations ranging from no items to all items.
A list of possible values can likewise be given for consideration in the consequent; however, only one item can appear in the consequent within a single rule. The algorithm will then produce a set of rules whose consequents contain one of the items from the list provided and whose antecedents contain any combination of items from the other list. In the rules3 object, we use the appearance argument of the apriori() function to set these values. It is possible for redundant rules to be produced; in this case, no rules were returned as redundant.
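The calls described above can be sketched as follows. The thresholds used for rules2 and the item labels in the appearance list are illustrative assumptions; the article does not state the exact values it used:

```r
library(arules)
data(Groceries)

# Default settings (supp = 0.1, conf = 0.8, maxlen = 10): returns 0 rules here.
rules1 <- apriori(Groceries)

# Loosened thresholds passed as a named list to the parameter argument.
rules2 <- apriori(Groceries,
                  parameter = list(supp = 0.01, conf = 0.5, maxlen = 4))
inspect(head(rules2, 3))

# Constrain which items may appear in the antecedent (lhs) and consequent (rhs).
rules3 <- apriori(Groceries,
                  parameter  = list(supp = 0.01, conf = 0.5),
                  appearance = list(lhs = c("curd", "yogurt", "butter"),
                                    rhs = "whole milk",
                                    default = "none"))

# Drop any redundant rules before inspecting.
rules3 <- rules3[!is.redundant(rules3)]
inspect(rules3)
```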

Bootstrapping Association Rule Mining

Applying association rule mining to a sample provides insight into the association rules and pairings present in a particular sample. However, in many analyses, we are more interested in generalizing to other samples. We can apply a bootstrapping approach to assess the reproducibility of the rules we uncovered. For this example, we will generate 1,000 bootstrapped samples by randomly sampling (with replacement) from the original dataset. Then, on each bootstrapped sample we will apply the Apriori algorithm. We investigate the association rules found in each sample and retain only those rules that appeared in 90% or more of the bootstrapped samples. Association rule strength metrics (confidence, lift, and support) can then be computed on each bootstrapped sample. From this, we can calculate the mean and 95% confidence interval for these metrics.
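The resampling logic itself is independent of arules. A minimal base-R sketch of the idea, tracking the confidence of a single rule across bootstrap resamples, might look like this (the toy data and the {bread} → {butter} rule are assumptions for illustration):

```r
set.seed(42)

# Toy transaction database.
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

support <- function(itemset, db) {
  mean(sapply(db, function(t) all(itemset %in% t)))
}
confidence <- function(x, y, db) support(c(x, y), db) / support(x, db)

# 1,000 bootstrap resamples (sampling transactions with replacement);
# record the confidence of {bread} -> {butter} in each resample.
boot_conf <- replicate(1000, {
  resample <- transactions[sample(length(transactions), replace = TRUE)]
  confidence("bread", "butter", resample)
})
boot_conf <- boot_conf[is.finite(boot_conf)]  # drop resamples with no antecedent

mean(boot_conf)                       # bootstrap mean confidence
quantile(boot_conf, c(0.025, 0.975))  # 95% percentile interval
```

The same loop can wrap a full apriori() call instead of a single-rule confidence computation, which is exactly what the bootapriori() function below does.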

Bootstrapping the apriori() function

To bootstrap the apriori() function, we can set up a function called bootapriori(). This function will accept seven arguments:

  - data (dataset)
  - rhs (consequent)
  - lhs (antecedent)
  - confidence (minimum confidence)
  - minlen (minimum rule length)
  - maxlen (maximum rule length)
  - supp (minimum support value)

First, the function produces a bootstrapped sample using an embedded function called bootstrapsample(), which samples cases with replacement to create a new sample. The apriori() function is then applied to the resampled data. We could apply this function 1,000 times using a process such as a for() loop or the replicate() function, but that could take a very long time to run: a single test run found 37 rules and finished quickly, yet repeating it 1,000 times sequentially would take much longer. To speed this up, let's run the function using parallel processing.

Parallel Processing

Parallel processing allows a large task to be split into smaller tasks efficiently distributed across the CPUs of a machine. The parallel package is included with R and contains a set of functions that can parallelize most processes. To parallelize this process, we use the clusterExport() function, which requires saving all information needed by bootapriori(), including all of its arguments, into the Global Environment. We then define the number of clusters to use with the makeCluster() and detectCores() functions.

By bootstrapping association rule mining, we gain insight into which rules would replicate again and again if we kept resampling from the population. This gives a better idea of whether a rule is an association that merely happens to appear in the given data or one that actually exists in the population.

The retained rules are those that appeared in more than 90% of the bootstrapped samples. The first row indicates two rules. Since the antecedent is an empty set, we can interpret them as "no matter what other items were bought, other vegetables were bought" and "no matter what other items were bought, whole milk was bought." To avoid rules with an empty antecedent set, simply set the minimum rule length to 2. The next row gives two more rules: "if butter was bought, then other vegetables were bought" and "if butter was bought, then whole milk was bought."

Since we have bootstrapped samples, we can also calculate the mean and 95% confidence intervals for any metric of interest. (Here only the first 10 rows are displayed; to see all rows, add %>% print(n = Inf).) For the first rule, "if butter was bought, then so were other vegetables," the mean confidence is the estimated proportion of butter-containing transactions that also contain other vegetables. The confidence interval can be interpreted as follows: we are 95% confident that this true proportion falls between 36.0% and 36.2%.
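A sketch of this parallel setup follows, under the assumption that bootapriori() is defined roughly as the text describes (the stand-in definition, thresholds, and replicate count are illustrative, not the article's exact code):

```r
library(parallel)
library(arules)
data(Groceries)

# A minimal stand-in for the bootapriori() function described above.
bootapriori <- function(data, confidence, minlen, maxlen, supp) {
  boot <- data[sample(length(data), replace = TRUE)]  # resample transactions
  apriori(boot,
          parameter = list(conf = confidence, minlen = minlen,
                           maxlen = maxlen, supp = supp),
          control = list(verbose = FALSE))
}

cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("bootapriori", "Groceries"))  # ship needed objects to workers
invisible(clusterEvalQ(cl, library(arules)))      # load arules on each worker

# 100 bootstrap replicates spread across the workers (1,000 in the article).
rule_sets <- parLapply(cl, 1:100, function(i)
  bootapriori(Groceries, confidence = 0.5, minlen = 2, maxlen = 4, supp = 0.01))
stopCluster(cl)

# Keep only rules that appeared in at least 90% of the bootstrapped samples.
counts <- table(unlist(lapply(rule_sets, labels)))
stable_rules <- names(counts)[counts >= 0.9 * length(rule_sets)]
head(stable_rules)
```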

Advantages and Limitations

One of the key advantages of Association Rule Learning is its ability to uncover hidden patterns and associations that may not be apparent through traditional analysis methods. It helps businesses understand which items are frequently purchased together, enabling them to optimize their product placement, cross-selling, and promotional strategies. One of the biggest advantages of using the Apriori algorithm is its simplicity and adaptability.

However, Association Rule Learning also has its limitations. One limitation is that it can generate a large number of rules, including some that may not be meaningful or actionable. Filtering and selecting the most relevant rules become crucial to avoid information overload and ensure practicality. Apriori algorithms are not as efficient when handling large datasets. The multi-iteration process of itemset candidate generation can become computationally expensive and memory intensive.
