Lack of fit is often used to test whether a data mining model is statistically significant. This article and its accompanying download shows how to calculate the lack of fit for data mining models built with Microsoft SQL Server 2005. To apply this test, the target model should have at least one continuous predictable attribute, and it should be based on one of these algorithms: Microsoft Linear Regression, Microsoft Logistic Regression, Microsoft Decision Tree and Microsoft Neural Network.
We recommend that you read this brief overview of Lack of Fit before you proceed with the rest of this article.
How to Calculate Lack of Fit? Simple, use a stored procedure
We decided that a good way to showcase the extensibility of the SQL Server 2005 Data Mining platform was to develop a stored procedure to calculate the lack of fit metric. We have put together an accompanying package for this article that contains the following:
A sample dataset that you should attach to your SQL Server 2005 Database Engine
A sample SQL Server Analysis Services project that helps you build a linear regression model. You need to deploy the project onto your Analysis Services 2005 server
A stored procedure to calculate lack of fit for data mining models
A Windows application that demonstrates how to call the stored procedure. To build this project, you need to add a reference to Microsoft.AnalysisServices.AdomdClient.dll located in “C:\Program Files\Microsoft.NET\ADOMD.NET\90”.
Let's walk you through the process of using the pieces in the package to explore the Lack of Fit sample.
First, build and deploy the stored procedure (note that this version only supports models that allow drill through):
- Open the project in the LackOfFitStoredProcedure with Visual Studio .Net (Whidbey)
- Add a reference to the msmgdsrv.dll (Microsoft.AnalysisServices.AdomdServer.dll) class library
This class library is available in the location where Analysis Services 2005 is installed By default, this location has the form:
C:\Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin
Please note that, depending on your installation options, MSSQL.2 might be MSSQL.1 or MSSQL.3 or so
- Build the project
- Deploy the stored procedure on the server:
- Open SQL Server Workbench
- Connect to the target Analysis Services server
- In the Object Explorer, select the 'Assemblies' node at the server level
- Right-click and select 'New Assembly'
- For the FileName, field, use the Browse (...) button to navigate to the location where the stored procedure was built. A file named LackOfFitHelper.dll should be available at that location, under the bin\debug or bin\release folder. Select that file
- Assembly Name should be automatically filled with LackOfFitHelper
- Set the permission of this assembly to unrestricted
- Select OK and deploy the stored procedure
To call the stored procedure, you can execute the following command in SQL Server Management Studio:
· CALL LackOfFitHelper.LackOfFitHelper.LackOfFit.CalculateLackOfFitFromDrillThroughModel(, )
o is the name of the target model
o is the name of the target attribute
Now, go ahead and deploy the "LinearRegressionModel" Analysis Services project. This project creates a simple linear regression model with two integer attributes x and y, where x is used as input and y as output.
Finally, you can build and run the C# application CalculateLackOfFit in the package. This application demonstrates how to call the stored procedure to calculate lack of fit metric.
The following figure shows the interface of this application:
From the interface, you can specify a server name and press the Connect to server button to establish the connection to the Analysis Services server. The application will prompt a list of valid catalogs, models and target attributes on the screen. After selecting the desired model and target attribute, you can then click Calculate Lack of Fit button to check the result. The application calls the stored procedure to calculate the statistics. After fetching the result, it shows the statistics in the text boxes on the bottom of the screen.