A common
request from the relational database gurus in the SQL Server Data Mining community
is to execute predictions from the SQL Server relational database engine – either
in batch mode or ‘on-the-fly’ while rows are being inserted into a table. In
this article, we’ll start with the basics of executing DMX queries in T-SQL and
walk you all the way to predicting values in real-time during the INSERT
operation.
Executing
prediction queries from the relational server
Before we
start, we'll state the assumptions we are making:
-
The
SQL Server 2005 Database Engine and SQL Server 2005 Analysis Services are installed as default instances on the same
machine
-
A
database named TestDB is deployed on the Analysis
Services instance
-
A
mining model named Iris is deployed in the [Test DB] Analysis Services database
-
The
mining model is intended to classify Iris flowers based on their sepal/petal
dimensions and has 5 columns:
o
Petal
Length (double, input)
o
Petal
Width (double, input)
o
Sepal
Length (double, input)
o
Sepal
Width (double, input)
o
Class
(text, predictable)
Let's now
go through the basic steps for setting up prediction query execution from T-SQL.
Step 1: Create a linked
server
The first
step is to create a linked server inside SQL Server pointing to the Analysis
Services instance. The linked server object is created with a statement like
below:
EXEC master.dbo.sp_addlinkedserver
@server =
N'DMServer',
@srvproduct=N'Analysis Services 2005',
@provider=N'MSOLAP',
@datasrc=N'localhost',
@catalog=N'TestDB'
GO
Upon
executing this statement, a linked server named DMServer
is registered on the database server. The linked server uses the MSOLAP OLE DB
provider to connect to the “localhost” instance of
Analysis Services.
SQL
Server 2005 Management Studio provides a friendly UI that allows you to create
a linked server and to set various linked server options, such as query
timeouts, impersonation rules and others. This is accessible under Server
Objects -> Linked Servers -> New Linked Server …
Step 2: Execute DMX
queries from T-SQL using OPENQUERY
Once the
linked server is created, it can be used for data mining (DMX) queries like this:
SELECT * FROM OPENQUERY(DMServer,
'select node_caption,
node_type from iris.content')
A
potential use of such queries is to use data mining to assign predicted labels
to all rows in a table and then fetch the rows together with the new label and
save them inside the relational server. Such a query typically looks like
below:
INSERT INTO
QueryResultsTable SELECT * FROM OPENQUERY(DMServer,
'select … FROM Modell
PREDICTION JOIN OPENQUERY…')
Executing prediction
queries from inside a trigger
Now that
we know how to execute basic data mining queries from the SQL Server 2005
relational server, let's look at the more advanced “on-the-fly” scenario.
For this
purpose, we’ll build a trigger that is launched whenever a row is added to a
certain table. The trigger will execute a data mining prediction and append the
prediction result to the newly inserted row.
Trigger
objects are well documented in the SQL Server 2005 books online. The only
problem for our task is executing the prediction query and using the result.
The problem resides in the fact that:
-
The
prediction query must usually be parameterized with the values of the other row
columns
-
The
OPENQUERY statement which is used for data mining queries cannot take
parameters or even string variables as the statement text
To
overcome this problem, we will use the EXEC statement.
We’ll
assume that a relational table named NewIris is
created on the SQL Server instance and the table has a set of columns matching
the mining model, i.e.:
-RowKey int (the identity column)
-PLength float (to be mapped to
model’s Petal Length column)
-PWidth float (to be mapped to
model’s Petal Width column)
-SLength float (to be mapped to
model’s Sepal Length column)
-SWidth float (to be mapped to
model’s Sepal Width column)
-PredictedClass – varchar(50) (to keep the prediction result)
The
trigger will be launched on inserting a new row and has to
a)
Collect
the values for the columns
b)
Author
a data mining prediction statement for the current row (a singleton prediction)
c)
Execute
the prediction
Inside
the trigger’s scope, the current row can be identified using the
@@identity variable.
Note that this variable is only available when the table has an identity column
(RowKey above). The column values for the current row
should be collected into a set of variables, so that they can be used later to
author the DMX query:
DECLARE
@newPLength varchar(15)
DECLARE @newPWidth
varchar(15)
DECLARE
@newSLength varchar(15)
DECLARE @newSWidth
varchar(15)
-- Select the newly added values
SELECT @newPLength =
dbo.FormatFloatForDMX(PLength),
@newPWidth = dbo.FormatFloatForDMX(PWidth),
@newSLength =
dbo.FormatFloatForDMX(SLength),
@newSWidth =
dbo.FormatFloatForDMX(SWidth)
FROM NewIris WHERE $identity=@@identity
The
$identity column represents the key column
of the table.
FormatFloatForDMX is a user-defined function which formats a float number
(or a NULL) so that it can be used inside a DMX statement. The function is
defined as shown below:
CREATE FUNCTION FormatFloatForDMX
(
@Value float
)
RETURNS varchar(20)
AS
BEGIN
DECLARE
@ResultVar varchar(20)
SELECT
@ResultVar = CASE
WHEN
@Value IS NULL
THEN
'NULL'
ELSE
CONVERT(VARCHAR(15), @value)
END
RETURN
@ResultVar
END
GO
Now, with
the column values collected, we can author the prediction query.
-- compose the Analysis Services
DMX query
DECLARE @DMQuery varchar(256)
SET @DMQuery = 'select [Class] from [Iris]
NATURAL PREDICTION JOIN' +
'(SELECT '+
@newPLength +
' AS [Petal
Length], '+
@newPWidth + ' AS [Petal Width], '+
@newSLength +
' AS [Sepal
Length], '+
@newSWidth + ' AS [Sepal Width] '+
') AS T'
The data
mining prediction query is ready to be executed against Analysis Services. Remember, the OPENQUERY
statement does not allow variables as arguments, so this query cannot be
executed directly. Besides, we need to collect the predicted result and insert
it into the NewIris table in the PredictedClass
column. To achieve this, we will use the SQL EXEC instruction.
--compose the OPENQUERY statement
DECLARE @FullQuery
varchar(512)
SET @FullQuery
=
'DECLARE
@predictedValue VARCHAR(50) ;'+
'SELECT
@predictedValue=[Class] FROM OPENQUERY(DMServer, ''' + @DMQuery +''');' +
'UPDATE
[NewIris] SET PredictedClass=@predictedValue WHERE $identity=' + CONVERT(varchar(15), @@IDENTITY)
-- execute the OPENQUERY statement
EXEC (@FullQuery)
As you
notice, the SQL statements above declare a string variable, @predictedValue, then populate the
value from the execution of an OPENQUERY statement against the DMServer linked server. After that, it sets the value of
the variable as the PredictedClass column value in
the NewIris table.
The whole
set of SQL instructions is grouped in a single string and executed in an EXEC instructions.
With
this, all the blocks required for defining the trigger are available and the
full trigger definition can be put together as shown below:
CREATE TRIGGER PredictClassForNewIris
ON NewIris
AFTER INSERT
AS
DECLARE
@newPLength varchar(15)
DECLARE @newPWidth
varchar(15)
DECLARE
@newSLength varchar(15)
DECLARE @newSWidth
varchar(15)
-- Select the newly added values
SELECT @newPLength =
dbo.FormatFloatForDMX(PLength),
@newPWidth =
dbo.FormatFloatForDMX(PWidth),
@newSLength =
dbo.FormatFloatForDMX(SLength),
@newSWidth =
dbo.FormatFloatForDMX(SWidth)
FROM NewIris WHERE $identity=@@identity
-- compose the Analysis Services query
DECLARE @DMQuery varchar(256)
SET @DMQuery = 'select [Class] from [Iris]
NATURAL PREDICTION JOIN' +
'(SELECT '+
@newPLength +
' AS [Petal
Length], '+
@newPWidth + ' AS [Petal Width], '+
@newSLength +
' AS [Sepal
Length], '+
@newSWidth + ' AS [Sepal Width] '+
') AS T'
--compose the OPENQUERY statement
DECLARE @FullQuery
varchar(512)
SET @FullQuery
=
'DECLARE
@predictedValue VARCHAR(50) ;'+
'SELECT
@predictedValue=[Class] FROM OPENQUERY(DMServer, ''' + @DMQuery +''');' +
'UPDATE
[NewIris] SET PredictedClass=@predictedValue WHERE $identity=' + CONVERT(varchar(15), @@IDENTITY)
-- execute the OPENQUERY statement
EXEC (@FullQuery)
GO
We can now
test the new trigger with the following statements:
INSERT INTO NewIris(PWIdth, SLength)
VALUES(2.5, 1)
GO
SELECT * FROM newIris
GO
And here's
the
result of the second query confirming the real-time prediction that occurs
during INSERT, as promised:
|
PLength
|
PWidth
|
SLength
|
SWidth
|
RowKey
|
PredictedClass
|
|
NULL
|
2.5
|
1
|
NULL
|
1
|
Iris-virginica
|