When you build a mining model using the Microsoft_Clustering algorithm, you can specify the number of clusters via the algorithm parameter CLUSTER_COUNT (set to 10 by default). You can also let the algorithm detect the number of clusters automatically by setting CLUSTER_COUNT to 0. In this case, you need to determine the number of clusters discovered by the model. This tip will demonstrate how to find the number of clusters by querying the mining model content.
As you might already know, the discovered content of a trained model can be retrieved either by fetching the MINING_MODEL_CONTENT schema rowset or by issuing the following DMX query:
SELECT * FROM [model name].CONTENT
If you browse through the tip "Layout of Clustering algorithm content", you will see that the root node in the content graph corresponds to the cluster model and its children are the nodes for the clusters themselves. This means that the cluster count is the CHILDREN_CARDINALITY of the root node, which always has a NODE_TYPE of 1. (Node types can be looked up here: [InstallDrive]:\Program Files\Microsoft SQL Server\90\SDK\Include\oledbdm.h.)
So all we need to do is modify the DMX query above to fetch just the CHILDREN_CARDINALITY column for the root node:
SELECT CHILDREN_CARDINALITY FROM [model name].CONTENT
WHERE NODE_TYPE = 1
Here is sample ADOMD.NET code for getting the cluster count:
conn = new AdomdConnection();
conn.ConnectionString = string.Format("Data Source=localhost;Initial Catalog=[Initial Catalog Name]");
conn.Open();
int clusterNumber;
using (AdomdCommand cmd = new AdomdCommand())
{
//Get the cluster number
cmd.CommandText = "Select [CHILDREN_CARDINALITY] from " +
"MyModel.Content where Node_Type=1";
AdomdDataReader rdr = cmd.ExecuteReader();
if (rdr.Read())
{
clusterNumber = rdr.GetInt32(0);
}
}