Under the Hood: How the Naive-Bayes Attribute Discrimination Viewer Gets Its Data
If you have ever wondered how the data mining (DM) viewers get the data they display on the screen, read on.
In many cases, what the data mining viewers display is the result of
built-in stored procedures that do the processing required for the view on
the server, so that the model content does not all have to be brought
to the client. In this "Under the hood" tip, we will dive into one of
those stored procedures.
Naive-Bayes Attribute Discrimination View
The attribute discrimination view for the Naive-Bayes algorithm shows the differences in the input attributes across the states of
an output attribute. It generally looks like this:

The viewer doesn’t download all of the correlations in the Naive-Bayes model content.
Instead, it calls the stored procedure GetAttributeDiscrimination like this:
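CALL System.GetAttributeDiscrimination(
    'Classify CollegePlans NB',   -- strModel
    '100000005',                  -- strPredictableNode
    'Plans to attend',            -- strValue1
    1,                            -- iValType1
    'All other states',           -- strValue2
    2,                            -- iValType2
    0.0005,                       -- dThreshold
    true)                         -- bNormalize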
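If you issue this call from your own code rather than typing it by hand, you need to assemble the eight arguments in the right order. The Python sketch below does that. The helper names (dmx_string, side_args, build_attribute_discrimination_call) and the constants MISSING, STATE, and ALL_OTHERS are hypothetical, and the rule that an embedded single quote is escaped by doubling it in a DMX string literal is an assumption worth checking against your server. The iValType codes follow the parameter descriptions below: 0 for the missing state, 1 for an actual state, 2 for all other states.

```python
from typing import Optional, Tuple

# Hypothetical names for the iValType codes described in this tip.
MISSING = 0      # compare against the missing state; strValue is ignored
STATE = 1        # strValue names an actual state of the attribute
ALL_OTHERS = 2   # compare against all other states; strValue is ignored


def dmx_string(text: str) -> str:
    """Quote text as a DMX string literal.

    Assumption: an embedded single quote is escaped by doubling it, as in SQL.
    """
    return "'" + text.replace("'", "''") + "'"


def side_args(state: Optional[str], val_type: int) -> Tuple[str, int]:
    """Return the (strValue, iValType) pair for one side of the comparison."""
    if val_type not in (MISSING, STATE, ALL_OTHERS):
        raise ValueError(f"iValType must be 0, 1, or 2, got {val_type}")
    if val_type == STATE and not state:
        raise ValueError("iValType 1 requires an actual state name")
    # For types 0 and 2 the server ignores strValue, so an empty string is enough.
    return (state or "", val_type)


def build_attribute_discrimination_call(
    model: str,
    predictable_node: str,
    value1: Optional[str],
    type1: int,
    value2: Optional[str],
    type2: int,
    threshold: float = 0.0005,
    normalize: bool = True,
) -> str:
    """Assemble the CALL statement with the eight parameters in order."""
    v1, t1 = side_args(value1, type1)
    v2, t2 = side_args(value2, type2)
    args = [
        dmx_string(model),              # strModel
        dmx_string(predictable_node),   # strPredictableNode (Node Unique Name)
        dmx_string(v1),                 # strValue1
        str(t1),                        # iValType1
        dmx_string(v2),                 # strValue2
        str(t2),                        # iValType2
        f"{threshold:.10f}",            # dThreshold as a plain decimal, no exponent
        "true" if normalize else "false",  # bNormalize
    ]
    return "CALL System.GetAttributeDiscrimination(" + ", ".join(args) + ")"
```

Called with the model and node from the example above, STATE for 'Plans to attend' and ALL_OTHERS (with no text) for the right-hand side, it produces the same call as the example, except that the ignored right-hand text is empty and the threshold is written as 0.0005000000.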
In order, the not-so-obvious parameters are strModel, strPredictableNode,
strValue1, iValType1, strValue2, iValType2, dThreshold, and bNormalize.
Let’s go through them:
strModel – The name of the model.
strPredictableNode – This one is a bit tricky, because it takes
the Node Unique Name of the target attribute rather than
the name you see in the viewer. The Node Unique Name identifies
the attribute in the content rowset generated by the model. You can get
the list of predictable attributes and their Node Unique Names by calling
another stored procedure, like this:
CALL System.GetPredictableAttributes('ModelName'). This stored
procedure returns two columns: one for the attribute name and one for the Node
Unique Name.
strValue1 – The value (state) you want to compare on the
left-hand side. How this parameter is used depends on the value of the
next parameter.
iValType1 – This parameter indicates how to treat strValue1.
It can be 0, 1, or 2. If it is 1, strValue1 is an actual state
of the attribute. If it is 0 or 2, strValue1 is ignored: 0 means the
left-hand value is the “missing state”, and 2 means the left-hand value
is “all other states.” In the example above, the right-hand value
“All other states” is passed with type 2, so that text is specified only
because it looks nice (and it’s easier to just drop the combo box value into
the function call, even though it will be ignored).
strValue2 – Like strValue1, but for the right-hand side.
iValType2 – Like iValType1, but for the right-hand side.
dThreshold – A threshold used to filter the results so that
small correlations don’t come back.
Usually you set it to a very small number, such as the 0.0005 in the example above.
bNormalize – Whether or not the result is normalized.
If this value is true, the scores are normalized to a maximum absolute value of
100, giving a possible range of –100 to 100. All this does is take the
largest absolute value in the result, divide 100 by it, and then multiply
every score by that factor. If it is false, you get the raw scores and
interpreting them is up to you, but the NB viewer
always sets this to true.
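The normalization arithmetic is simple enough to reproduce on the client, for example if you call with bNormalize set to false and later want scores on the viewer's scale. This minimal Python sketch (normalize_scores is a hypothetical helper, not part of the server) applies the rule described above: compute 100 divided by the largest absolute score and multiply every score by that factor, keeping signs.

```python
from typing import List


def normalize_scores(scores: List[float]) -> List[float]:
    """Rescale scores so the largest absolute value becomes 100.

    Signs are preserved, so the results fall in the range -100 to 100.
    """
    largest = max((abs(s) for s in scores), default=0.0)
    if largest == 0.0:
        # Empty input or all-zero scores: there is nothing to scale.
        return list(scores)
    factor = 100.0 / largest
    return [s * factor for s in scores]
```

For example, normalize_scores([0.5, -0.25, 0.1]) returns [100.0, -50.0, 20.0]: the largest magnitude is 0.5, so the factor is 200.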
The Results
Calling this routine returns a row for every differentiating attribute/value
pair with a score higher than the specified threshold. Each row contains
the differentiating pair along with the score and some other columns, and the
result looks somewhat like this:
The score column is the “important” one, and it is best explained by analogy
with a C-language compare routine such as
int Compare(int v1, int v2) { return v1 - v2; }.
That is, if the score is positive, it favors value1, and if it is
negative, it favors value2. The other columns are the actual counts behind
the correlations of the discriminator against the inputs. The best way to
understand them is to look at the Mining Legend as you browse the model and
click on rows.
For example, if you clicked on the first row of the result above (in either
picture), the Mining Legend would look like this:
Once you have the result set, you can use it wherever you want: in
Reporting Services, in Integration Services, or in your own custom program.
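As one example of using the result set in a custom program, the sketch below sorts rows into those that favor value1 and those that favor value2, strongest first, reading the sign of the score the same way as the Compare analogy. It assumes the rows have already been fetched by whatever client library you use. DiscriminationRow and its field names are hypothetical and leave out the count columns, and treating the optional cutoff as a bound on the absolute score is an assumption of this sketch, not documented server behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DiscriminationRow:
    # Hypothetical, simplified shape of one result row. The real rowset
    # also carries the count columns, and its column names may differ.
    attribute: str
    value: str
    score: float


def favored_side(score: float) -> str:
    """Read a score like the sign of a C-style Compare(v1, v2) = v1 - v2."""
    if score > 0:
        return "value1"
    if score < 0:
        return "value2"
    return "neither"


def split_by_favored_side(
    rows: List[DiscriminationRow], min_abs_score: float = 0.0
) -> Tuple[List[DiscriminationRow], List[DiscriminationRow]]:
    """Split rows by favored side, each list ordered strongest first.

    min_abs_score is a client-side cutoff in the spirit of dThreshold;
    applying it to abs(score) is an assumption of this sketch.
    """
    favors1: List[DiscriminationRow] = []
    favors2: List[DiscriminationRow] = []
    for row in rows:
        if abs(row.score) <= min_abs_score:
            continue  # too weak to report
        side = favored_side(row.score)
        if side == "value1":
            favors1.append(row)
        elif side == "value2":
            favors2.append(row)
        # A score of exactly 0 favors neither side and is skipped.
    favors1.sort(key=lambda r: abs(r.score), reverse=True)
    favors2.sort(key=lambda r: abs(r.score), reverse=True)
    return favors1, favors2
```

Keeping the cutoff on the client lets you re-slice the same result set at different strengths without another round trip to the server.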
Sidebar: How Do I See What Queries Are Being Sent by the Viewers?
- Run SQL Server Profiler.
- Start a New Trace from the File menu and connect to Analysis Services.
- Accept all the defaults in the Trace Properties dialog box that appears.
- Go to the data mining viewer you're interested in digging into, and browse a model.
- Profiler shows you the queries that are being sent. Click a "Query Begin" event line in the top pane to see the full query text in the bottom pane.