The knowledge gained by a
trained mining model can be retrieved in the form of a rowset that represents a
graph of nodes, where each row corresponds to one node. The rowset can be
fetched either by sending a Discover or schema rowset request for
DMSCHEMA_MINING_MODEL_CONTENT or by executing the "SELECT * FROM model.CONTENT"
DMX query.
Here we describe the layout of
the CONTENT rowset for the Microsoft_Clustering algorithm.
The Microsoft_Clustering algorithm
generates one root node and then one node for each cluster found.
The Cluster nodes appear as children of the root node.

The root node has NODE_TYPE = 1, as in all other algorithms.
Its description is ‘All’ and its
default caption is ‘Cluster Model’.
The CHILDREN_CARDINALITY column
contains the number of clusters found by the algorithm.
The NODE_SUPPORT column contains
the total number of cases in the training set.
The NODE_DISTRIBUTION column contains one row for each attribute/value pair,
ordered by attribute number and then by state number.
The support, probability and variance values in the distribution table
represent the attribute distribution in the training set (marginal support,
probability and variance).
The Cluster nodes have NODE_TYPE = 5 (in Adomd.Net this
corresponds to the MiningNodeType.Cluster enumeration
value).
The default NODE_CAPTION is ‘Cluster N’, where N is the 1-based index of
the cluster (the caption can be changed with UPDATE).
The NODE_PROBABILITY column contains the probability of the current cluster.
The NODE_SUPPORT column contains the hard support for the current cluster
(the number of cases in the training set that were classified as belonging to
this cluster).
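The root/cluster layout described above can be checked with a few lines of code. The following is only a minimal sketch: it assumes the CONTENT rowset has already been fetched and converted to a list of Python dictionaries keyed by column name. The helper name split_content and the sample values are hypothetical and are not taken from a real model.

```python
ROOT_NODE_TYPE = 1     # NODE_TYPE of the root node, as stated in the text
CLUSTER_NODE_TYPE = 5  # NODE_TYPE of a cluster node, as stated in the text


def split_content(rows):
    """Return (root, clusters) from CONTENT rows given as dicts.

    Hypothetical helper: raises ValueError if the rows do not follow
    the layout described in the text.
    """
    roots = [r for r in rows if r["NODE_TYPE"] == ROOT_NODE_TYPE]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root node, found {len(roots)}")
    root = roots[0]
    clusters = [r for r in rows if r["NODE_TYPE"] == CLUSTER_NODE_TYPE]
    # CHILDREN_CARDINALITY on the root holds the number of clusters found.
    if root["CHILDREN_CARDINALITY"] != len(clusters):
        raise ValueError("CHILDREN_CARDINALITY does not match the cluster count")
    return root, clusters


# Invented sample rows, for illustration only.
sample_rows = [
    {"NODE_TYPE": 1, "NODE_CAPTION": "Cluster Model",
     "CHILDREN_CARDINALITY": 2, "NODE_SUPPORT": 150},
    {"NODE_TYPE": 5, "NODE_CAPTION": "Cluster 1",
     "NODE_SUPPORT": 90, "NODE_PROBABILITY": 0.6},
    {"NODE_TYPE": 5, "NODE_CAPTION": "Cluster 2",
     "NODE_SUPPORT": 60, "NODE_PROBABILITY": 0.4},
]

root, clusters = split_content(sample_rows)
cluster_captions = [c["NODE_CAPTION"] for c in clusters]
```

With the sample rows, root is the ‘Cluster Model’ row and cluster_captions lists the two cluster captions in rowset order.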
The NODE_DISTRIBUTION contains one row for each attribute/value pair,
ordered by the attribute number then by state number. For each NODE_DISTRIBUTION
row:
- ATTRIBUTE_NAME is the name of the attribute
- ATTRIBUTE_VALUE is the state of the attribute
- VALUETYPE describes the value type
- SUPPORT is the support for the current
state/value in the current cluster
- PROBABILITY is the probability for the current
attribute state in the current cluster
- VARIANCE is the variance of the attribute inside
this cluster (defined only for continuous attributes)
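As an illustration of how the NODE_DISTRIBUTION rows of one cluster might be consumed, the sketch below groups them by ATTRIBUTE_NAME and picks the most probable state of an attribute. It assumes the nested table has already been read into a list of dictionaries. The function names and sample numbers are hypothetical, and the sample carries only the columns the sketch uses.

```python
from collections import defaultdict


def group_by_attribute(distribution):
    """Group NODE_DISTRIBUTION rows (dicts) by ATTRIBUTE_NAME, keeping row order."""
    grouped = defaultdict(list)
    for row in distribution:
        grouped[row["ATTRIBUTE_NAME"]].append(row)
    return dict(grouped)


def most_probable_state(rows):
    """Return the ATTRIBUTE_VALUE with the highest PROBABILITY among the rows."""
    return max(rows, key=lambda r: r["PROBABILITY"])["ATTRIBUTE_VALUE"]


# Invented distribution rows for a single cluster, for illustration only.
sample_distribution = [
    {"ATTRIBUTE_NAME": "Class", "ATTRIBUTE_VALUE": "Iris-versicolor",
     "SUPPORT": 5, "PROBABILITY": 0.1},
    {"ATTRIBUTE_NAME": "Class", "ATTRIBUTE_VALUE": "Iris-virginica",
     "SUPPORT": 45, "PROBABILITY": 0.9},
    {"ATTRIBUTE_NAME": "Color", "ATTRIBUTE_VALUE": "Purple",
     "SUPPORT": 30, "PROBABILITY": 0.6},
    {"ATTRIBUTE_NAME": "Color", "ATTRIBUTE_VALUE": "White",
     "SUPPORT": 20, "PROBABILITY": 0.4},
]

by_attribute = group_by_attribute(sample_distribution)
top_class = most_probable_state(by_attribute["Class"])
```

Grouping by name rather than relying on position keeps the sketch independent of the attribute/state ordering of the rows.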
The NODE_DESCRIPTION contains a natural language description of the
cluster as an enumeration of attribute states that favor that particular
cluster, in the order of importance,
such as:
PetalWidth >= 2.09,
Class = Iris-virginica,
4.6 <= PetalLength <= 6.64
The attribute descriptions are separated by a newline character.
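Since the attribute descriptions are newline-separated, a NODE_DESCRIPTION string can be split back into individual conditions. The sketch below is one possible way to do this. It assumes, based on the example above, that each line may end with a trailing comma. The function name split_description is hypothetical.

```python
def split_description(description):
    """Split a NODE_DESCRIPTION string into one condition per attribute.

    Assumes conditions are separated by newlines and may carry a
    trailing comma, as in the example in the text.
    """
    conditions = []
    for line in description.splitlines():
        text = line.strip().rstrip(",").strip()
        if text:  # skip empty lines
            conditions.append(text)
    return conditions


sample_description = (
    "PetalWidth >= 2.09,\n"
    "Class = Iris-virginica,\n"
    "4.6 <= PetalLength <= 6.64"
)
conditions = split_description(sample_description)
```

The resulting list keeps the original order, so the first entry is the attribute state that favors the cluster most.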