R-Squared
(a.k.a. Coefficient of Determination) is a well-known metric that measures
the goodness fit of your regression model. Its value lies between 0 and 1. The
closer R-Squared is to 1, the better your model is with respect to the training
data.
R-Squared
can be calculated as:
R-Squared = 1 – RSS / TSS
where RSS is the
residual sum of squares, and TSS is the total sum of squares.
Another
interesting measure is Adjusted
R-Squared, which adjusts the R-Squared according to the number of
explanatory terms in a model.
This
article shows you how to compute both these measures for Microsoft mining
models using our favorite mechanism – stored procedures!
Calculate R-Squared
with a stored procedure
We
have put together an accompanying
package for this article that contains the following:
- A sample dataset that you
should attach to your SQL Server 2005 Database Engine instance.
- A sample SQL Server Analysis
Services project that helps you build a linear
regression model. You need to deploy the project to your Analysis Services
2005 server instance.
- A stored procedure to calculate
R-Squared for data mining models.
- A Windows application that
demonstrates how to call the stored procedure. To build this project, you
need to add a reference to Microsoft.AnalysisServices.AdomdClient.dll
located in “%ProgramFiles% \Microsoft.NET\ADOMD.NET\90”.
Let's walk you through the process of using the pieces in the
package to explore the R-Squared sample.
Step 1:
First,
build and deploy the stored procedure (note that this version only
supports models that allow drill through):
-
Open the project in the
RSquareHelper with
Visual Studio 2005.
-
Add a reference to the
msmgdsrv.dll (Microsoft.AnalysisServices.AdomdServer.dll) class library
This class library is available in the location where Analysis Services
2005 is installed By default, this location has the form:
C:\Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin
Please note that, depending on your installation options, MSSQL.2 might be
MSSQL.1 or MSSQL.3 or so
-
Build the project
-
Deploy the stored procedure on
the server:
-
Open SQL
Server Management Studio
-
Connect to
the target Analysis Services server instance
-
In the Object
Explorer, select the 'Assemblies' node at the server level
-
Right-click
and select 'New Assembly'
-
For the
FileName, field, use the
Browse (...) button to navigate to the location where the stored
procedure was built. A file named RSquareHelper.dll should be available
at that location, under the bin\debug or bin\release folder. Select that
file
-
Assembly Name
should be automatically filled with
RSquareHelper
-
Set the
permission of this assembly to unrestricted.
-
Select OK and
deploy the stored procedure
To
call the stored procedure, you can execute the following command in SQL Server
Management Studio:
CALL
RSquareHelper.RSquareHelper.RSquare.CalculateRSquareFromDrillThroughModel (<modelName>, <targetAttributeName>)
o
<modelName> is the name of the target model
o
<targetAttributeName> is the name of the target
attribute
Step 2:
Now, go ahead and deploy the "RSquareLinearRegression"
Analysis Services project. This project creates a simple linear
regression model with two integer attributes x and y, where x is used as input
and y as output.
Step 3:
Finally, you can build and run the C# application CalculateRSquare in
the package. This application demonstrates how to call the stored procedure to calculatethe R-Squared metric.
The
following figure shows the interface of this application:

From
the interface, you can specify a server name and press the Connect to server
button to establish the connection to the Analysis Services server. The
application will display a list of valid catalogs, models and target attributes.
After selecting the desired model and target attribute, you can then click Calculate
R-Squared button to check the result. The application calls the stored
procedure to calculate the statistics. After fetching the result,
it shows the statistics in the text boxes at the bottom of the screen.