Let Us Count the Ways
A common
customer question is “How do I get data mining results into SQL tables?” All
of the following methods work:
1.
Use the Save button from the Prediction
Query Builder in BI Development Studio. Note that this only works in the interactive,
single-user case - it cannot be used programmatically.
2.
Build an SSIS package containing
a data flow task that terminates in a SQL Server Destination, with a Data
Mining Query Transform in the middle.
3.
Create a linked server on the
target SQL Server relational database server that points to the Analysis Server
hosting your mining models.
This tip
will dive into the third way. We will show you how to use a linked server to
execute DMX queries from the SQL Server relational engine; once you can do
that, it’s simple to manipulate the data to your heart’s content using T-SQL
functions and save it to a table using SELECT-INTO.
Linking SQL Server to Analysis Services
Establish
a link to an AS server as follows:
EXEC sp_addlinkedserver
@server='LINKED_AS', -- local SQL name
given to the linked server
@srvproduct='',
-- not used (any value will do)
@provider='MSOLAP', -- Analysis
Services OLE DB provider
@datasrc='localhost', -- Analysis Server
name (machine name)
@catalog='MovieClick' -- default
catalog/database
Alternatively,
you can use SQL Server Management Studio to create the linked server:
-
Connect to your
SQL Server Database Engine instance using SQL Server Management Studio.
-
Expand the
Server Objects node under the top level node for your server in Object
Explorer.
-
Right-click on
Linked Servers and select “New Linked Server …”.
-
Enter a name for
the Linked server (“LINKED_AS”).
-
The “Other data
source” radio button should be selected by default for Server type.
-
Select the
latest version of the Microsoft OLE DB Provider for Analysis Services (10.0
for SQL Server 2008) for Provider.
-
Enter any name
in the Product name field (“DM” will work).
-
Enter the name
of the Analysis Server instance hosting your mining model(s) in the Data
source field.
-
Enter the name
of the AS database containing the model(s) you want to query in the
Catalog field.
-
Click OK.
(Note:
You can generate the “EXEC spaddlinkedserver”
statement by right-clicking on the newly-added linked server and selecting “Script
Linked Server as …”.)
Running DMX Queries from T-SQL
Now you can
do data mining queries from the SQL Server relational engine with T-SQL and insert the
results into a SQL table using OPENQUERY like this:
SELECT *
INTO DMResults FROM
OPENQUERY(LINKED_AS,
'SELECT Cluster() AS [Cluster], ClusterProbability() AS
[Prob]
FROM
[Customers - Clustering]
NATURAL
PREDICTION JOIN
OPENQUERY([Movie Click],''SELECT * FROM Customers'') AS t')
Of course,
you can also do all kinds of manipulations on the results that may not have been
possible in straight DMX. For example, finding the average cluster probability
of each cluster becomes simple. Well, almost – since the data type returned by
the Cluster() function is something that GROUP BY does
not support, you have to do some casting first. The actual query would
look like this:
SELECT Cluster,
AVG(Prob) FROM
(SELECT CAST(Cluster
AS Char(30)) AS Cluster, Prob FROM OPENQUERY(LINKED_AS,
'SELECT Cluster() AS [Cluster],
ClusterProbability() AS [Prob] FROM [Customers - Clustering]
NATURAL
PREDICTION JOIN
OPENQUERY([Movie Click],''SELECT * FROM Customers'') AS t')
) AS t
GROUP BY
Cluster
That will
give you a nice result showing you, in a way, the affinity of each cluster
based on the input set. That is, if you ran such a query against the
training data, you could say that the clusters with a higher probability are
"tighter" than the ones with low probabilities.
And this is
is just the starting point. We are sure you will unleash your creativity and
SQL skills to come up with page-long T-SQL queries on top of DMX.
Key Things to Remember
When you’re putting together the DMX to embed in
the OPENQUERY from SQL Server, keep the following in mind:
1. Double
your single quotes.
2. Flatten
nested results. Even if you are not explicitly including a sub-select or
table-returning expression in your query, remember that a simple “SELECT *”
might include a nested table. Example: “SELECT * FROM model.CONTENT”.
3. Alias
function calls like PredictProbability(). If you have multiple function
calls in your DMX statement, they will all have a default column name of
"Expression" and T-SQL will complain about that.