SQL Server Data Mining has a powerful associative prediction model allowing for easy creation of predictions based on, for example, shopping cart contents. These predictions can easily be implemented using the PredictAssociation function on a model* as such
SELECT
PredictAssociation([MovieModel].[Movies],5)
From
[MovieModel]
NATURAL PREDICTION JOIN
(SELECT (SELECT '28 Days' AS [Movie]) AS [Movies]) AS t
The statement above will always generate five predictions for the movie ’28 Days’. In this case the results are:

Now say for example, the user adds the movie “What’s New, Pussycat” to their basket. You generate a new query like this:
SELECT
PredictAssociation([MovieModel].[Movies],5)
From
[MovieModel]
NATURAL PREDICTION JOIN
(SELECT (SELECT 'What''s New, Pussycat' AS [Movie]
UNION SELECT '28 Days' AS [Movie]) AS [Movies]) AS t
You fetch the results and amazingly, you get exactly the same results as before! How is this possible? Well, let’s explain. The associative prediction engine in SQL Server Data Mining assumes that if you ask for five predictions, you want to show five predictions. Whenever the engine has no rules based on the input, or not sufficient results based on the input, it pads the results with the most popular items. In this case, the model had no rules for either “28 Days” or “What’s New, Pussycat”, so it returned the top results selected by anyone.
In many cases this behavior is perfectly acceptable. For example, if you have an empty basket, you still want some recommendations – and the most popular items are a good bet. The same may be true for rare items that nobody else purchased – you may still want to show something, so this list is better than no list.
However, in other circumstances, you may only want to show items that are definitely correlated to the input – if for no other reason, just so you know that the suggestions are actually based on the basket contents or not. In this case you can use a feature called the “PredictNodeId” to filter out the padded results.
The PredictNodeId represents the piece of mining model content that supports any particular prediction. If you were to look up the node id in the model’s content you can determine why any particular prediction was made. It turns out, if the prediction was made simply due to an item’s popularity and not based on any learned information, that PredictNodeId will be empty. Therefore, we can use a parameter of PredictAssociation called INCLUDE_NODE_ID, and filter out predictions without rules like this:
SELECT
(SELECT * FROM PredictAssociation([MovieModel].[Movies],5,
INCLUDE_NODE_ID)
WHERE $NODEID<>'')
From
[MovieModel]
NATURAL PREDICTION JOIN
(SELECT (SELECT 'What''s New, Pussycat' AS [Movie]
UNION SELECT '28 Days' AS [Movie]) AS [Movies]) AS t
Voila! Now the query above returns no results, because the model has no rules based on either input movie. Likewise, the following query, for the movie “1941”, returns one result because a rule does exist for it.
SELECT
(SELECT * FROM PredictAssociation([MovieModel].[Movies],5,
INCLUDE_NODE_ID)
WHERE $NODEID<>'')
From
[MovieModel]
NATURAL PREDICTION JOIN
(SELECT (SELECT '1941' AS [Movie]) AS t
To show that it is still possible for the “top five” to appear in a recommendations list, you can simply add “Star Wars” to the above query and you will see that “The Empire Strikes Back” – a top five movie – is returned in addition to the result generated by “1941”. If you want to find out what the actual rules are for each prediction see this tip for an explanation.
* The model created for this example used the MovieClick data available on this site. The algorithm was Association Rules with the parameters MINIMUM_PROBABILITY set to 0.1 and MINIMUM_SUPPORT set to 3.