In the first tip of this series we explained how to use Analysis Services queries as data sources for INSERT INTO training statements. Now we'll present a way to build accuracy charts fro such models using Excel.
This article assumes that you followed the steps in the first tip of this series. Therefore, you already have a decision trees mining model based on the cases in Cluster 1 of the TargetMailCL model. We'll explain how to invoke the stored procedure that builds the accuracy chart and how to display it in Excel. With minimal changes the example may be modified to apply to last week's tip or to other models.
As the previously built model is based exclusively on cases that belong to Cluster 1, we want to display the accuracy charts only for those cases.
Overview of the GenerateLiftTableUsingDatasource System Stored Procedure
The accuracy chart stored procedure, GenerateLiftTableUsingDatasource, is a built-in stored procedure (it is available upon installation of Analysis Services). It takes as parameter a single string, but the string is expected to be the XML serialization of structures defined as below:struct LiftQueryInfo
public string ModelName;
public string PredictColumnName;
public string PredictColumnContentType;
public string PredictValue;
public string PredictMin;
public string PredictMax;
public string Query;
The most important element in this structure is the Query string. This describes the prediction query to be used in measuring the lift for this model. The stored procedure will measure how well the model predicted the target column for all the rows indicated by the query. The requirement for that query in the context of the accuracy chart stored procedure is that it should return a flattened rowset containing:
A typical query (generated by the accuracy chart designer) looks like:
- The actual (test) value for the target column, under the name _Actual
- The predicted value for the target column, under the name _Predicted
- The confidence (probability) for the prediction, under the name _Prob
T.[Bike Buyer] AS _Actual,
PredictProbability([Bike Buyer]) AS _Prob,
[Bike Buyer] AS _Predicted,
OPENQUERY(...) AS T
The OPENQUERY part of the query above can be changed to any other query, including:
- a stored procedure call
- another Analysis Services statement that returns input cases for the target the mining model
Getting the accuracy data
We will change the OPENQUERY clause described above to another prediction statement, which returns only those cases that fit into the Cluster 1 of the source model (TargetMailCL). The newly generated query is too verbose to be included here, but it can be downloaded from here.
Once you download the query, you can execute it inside SQL Server Management Studio:
The execution may take a few moments, depending on the speed of your computer.
- Double-click the AccChartQuery.dmx file (this will load the file into SQL Server Management Studio).
- Connect to the server instance where the first tip was deployed.
- Connect to the SampleProject catalog (which contains the mining model used in the first tip).
- Execute the query.
Displaying the accuracy chart in Excel
Now we get to the actual goal of this exercise. To display the accuracy chart in Excel once you get the results above, follow the next steps:
You should now see an accuracy chart that looks like this:
- Right-click in the data grid that displays the results and choose "Select All".
- Right-click again, and select "Copy".
- Launch Microsoft Excel.
- Paste the copied information in Excel (Ctrl-V, usually)
- Select the Percentile and the Value columns of the result.
- Select the Insert\Chart menu item and choose the Line chart type.
The Value line represents the actual accuracy of the model. The Percentile line, can be thought of as representing an ideal model in this chart (an ideal model would have 100% correct predictions after seeing 100% of the cases).