SQL Server Data Mining contains an Association Rules
browser, which displays the most important rules found inside a mining model.
However, for certain pattern exploration tasks, it may be useful to visualize
only those rules that point to a certain item.
An example: for a model* that predicts movie associations, extract
those rules that point to ‘Star Wars’. Even more, extract only rules with a probability
larger than .25 and containing ‘Indiana Jones’ or ‘Blade Runner’ on the left
All the rules (and item-sets) are exposed in the model
content, the rowset containing all the patterns detected by the model.
Content nodes that represent rules have a NODE_TYPE value of
8 (Item sets have NODE_TYPE = 7). Inside a rule node, the NODE_DISTRIBUTION
field contains the rule’s right hand side item (with a value type of 2, “Existing”)
as well as the identifier of the node representing the left hand side itemset
(value type of 10, NODE_UNIQUE_NAME).
Therefore, to get the results requested in the beginning, we
need a query that:
Selects the rule nodes (NODE_TYPE=8)
Filters only those that contain ‘Star Wars’ in NODE_DISTRIBUTION
Extracts the support, probability and other information about the rule
Applies a probability threshold ( >.25)
Filters only those that contain ‘Indiana Jones’ or ‘Blade Runner’
Sorts the results in descending order, by probability
And here is the query:
top 100 FROM
NODE_CAPTION AS [Rule],
<> null AND
> 0.25 AND
VBA!InStr(Rule, 'Blade Runner') > 0 OR
VBA!InStr(Rule, 'Indiana Jones') > 0
The result contains all the requested rules, with their
* The model created for this example
used the MovieClick data available on this site. The algorithm was
Association Rules with the parameters MINIMUM_PROBABILITY set to 0.1 and
MINIMUM_SUPPORT set to 3.