This article describes two stored procedures for the Analysis Server. The first one is to calculate the covariance matrix between all continuous attributes in the selected data mining model. The second one is to calculate the correlation matrix between all continuous attributes in the selected data mining model. In both cases it does not matter if the attributes are inputs or outputs of the model, the stored procedures ignore that fact.
The stored procedures are to be used with any model as the only parameter. The stored procedures will not work if any of the following is true:
- One or more columns contain nested tables
- The model does not allow drill through
- The model is not processed
Even if those conditions are met the stored procedures will only return a result for the continuous attributes. If none of the attributes are continuous the resulting matrix will be empty.
The stored procedures will return a table with three columns. The first two columns indicate the name of the attributes that were used to calculate the covariance/correlation. This table is a flatten version of a square matrix on size nxn where n is the number of attributes. Since covariance/correlation matrixes are symmetric the result only contains one half of the matrix (including the diagonal). This can be done because COV(X, Y) = COV(Y, X) so we only need one result. The diagonal or COV(X, X) represents the variance of X and is included in the results.
Deploying the stored procedure
The stored procedures are created using C# as a class library project. The project contains only 1 file: “MatrixRelationships.cs”. To create the library file with the stored procedures follow the next steps:
- 1. Create a new C# class library Project called MatrixRelationships
- 2. Download the MatrixRelationships.cs file into the MatrixRelationships directory
- 3. Add the source file to the project by pressing Shift+Alt+A (Add Existing Item) and delete the default Class1.cs element from the Solution Explorer
- 4. Add msmgdsrv to the references, select Project->Add Reference->.NET->Microsoft.AnalysisServices.AdomdServer
- 5. Build the project (Ctrl+Shift+B by default)
- 6. The dll file will be inside the bin\debug directory in the project
Once the project is built and the dll created we are ready to deploy the stored procedures into the Analysis Server. To deploy the stored procedures follow the next steps:
- 1. Open the Microsoft SQL Server Management Studio
- 2. Connect to the Analysis Server where you wish to deploy
- 3. On the object explorer (F8) and expand the server branch, you will see Databases and Assemblies, right click Assemblies
- 4. Select New Assembly and the Register Server Assembly will pop up, set the parameters as follows:
- a. Type: .NET Assembly
- b. File Name: Location of dll file with stored procedures
- c. Assembly Name: You can use the default name (MatrixRelationships) or you can set a new name here (The name should not be a duplicated assembly name)
- d. Include Debug Information: Check
- e. Permissions: Safe (by default)
- f. Impersonation: Use the credentials of the current user
- 5. The stored procedures are ready to be used
Executing the stored procedure
To execute the stored procedures you need to open a new Analysis Services DMX Query. To do this select File->New->Analysis Services DMX Query.
Once the DMX query window is open type the following command:
CALL MatrixRelationships.GetCovarianceMatrix( model name)
CALL MatrixRelationships.GetCorrelationMatrix( model name)
Note: MatrixRelationships will be substituted with whatever name was selected in Assembly Name.
The list of available models will show in the DMX query window under mining model.
Debugging the stored procedure
If the user wants to debug the stored procedures follow the next steps:
- 1. Open the project in Visual Studio
- 2. Deploy the stored procedures
- 3. In Visual Studio select Debug->Attach To Process
- 4. From available processes select msmdsrv.exe
Covariance matrix algorithm is based on the following formula:
The algorithm works as follows:
- 1. Extract continuous attributes from the model
- 2. Verify that:
- a. the model is processed
- b. the model supports drill through
- c. the model does not contain nested tables
- 3. Extract mean and variance for each attribute
- 4. Obtain the data from the model (all the Xi and Yi)
- 5. Create a triangular matrix to store the partial results for each X,Y combination
- 6. For each row in the data set from the model:
- a. Update the number of rows processed
- b. Obtain the values for each column and subtract the mean
- c. Update all combinations of (X,Y) by adding the appropriate
- d. If the value of a column is null for a given row, a 0.0 will be inserted (no covariance)
- 7. Divide every combination of (X,Y) in the triangular matrix by the total number of rows processed
- 8. Flatten the triangular matrix into a 3xM matrix where M is the number of continuous attribute combinations, given by where n is the number of continuous attributes.
Correlation matrix algorithm is based on the Pearson product-moment coefficient which is given by the following formula:
The algorithm to calculate the correlation is the same as the covariance plus it divides every (X,Y) combination in the triangular matrix by the product of X and Y standard deviation and the standard deviation is the square root of the variance.