In a previous tip, we explored new syntax introduced in SQL Server 2005 SP2 that allows you to pass column references in place of parameters in DMX queries. One of the nice scenarios this enables is generation of accuracy reports using Reporting Services. We will show you how to build one of these in this tip.
Let’s imagine that we are working for a bank that wants to detect possibly fraudulent transactions and investigate them. It might cost the bank just a few dollars to investigate a transaction that is flagged as fraudulent. On the other hand, the bank may lose several thousand dollars on every transaction that was in fact fraudulent but was not reported as such. Our goal is to minimize the bank’s expenses by detecting the probability threshold at which transactions should be reported as fraudulent.
The Cost of False Positives and False Negatives
A prediction is considered a true-positive (TP) if the probability of the prediction is above a certain threshold and the predicted state is the correct state. In DMX terms, this can be expressed as follows:
PredictProbability([Column],@State) > @Threshold AND t.[Column] = @State
A Prediction is true-negative (TN) if the probability of the prediction is below the threshold and the column’s state is not the predicted state, that is
PredictProbability([Column],@State) <= @Threshold AND t.[Column] <> @State
A prediction is a false-positive (FP) if
PredictProbability([Column],@State) > @Threshold AND t.[Column] <> @State
And finally, a prediction is a false-negative (FN) if
PredictProbability([Column],@State) <= @Threshold AND t.[Column] = @State
Computing the Cost
The bank’s expenses are equal to:
Number of false negative predictions * Cost of false-negative prediction + Number of false positive predictions * Cost of false-positive prediction
Putting Together the Report
Let’s generate a report that shows the number of true-positive, true-negative, false-positive and false-negative predictions and the bank’s expenses associated with all of the transactions. The report allows the user to model various thresholds and pick one that minimizes the associated expenses.
The accompanying download for this tip includes the complete BI Development Studio solution for the report as well as a backup file of the Analysis Services database containing the model that’s used by the report.
The following steps walk you through the process of creating the same report from scratch.
Step 1: Restore the database backup
Go to SQL Server Management Studio, connect to your Analysis Services instance and restore the database from the BankDB.abf file that is part of the download.
Step 2: Use the Report Server Project Wizard to create the report
1. Open Visual Studio and create a project using the Report Server Project Wizard template.
2. Click Next button on the “Welcome to the Report Wizard” page.
3. To define a new data source, select “Microsoft SQL Server Analysis Services” from the list of data source types, and click the “Edit…” button. Type in the server name and select the database name. Click OK and Next.
4. Click the “Query Builder…” button on the “Design the Query” wizard page.
a. When the Query Builder is opened, click the “Query Parameters” button in the toolbar and define the following query parameters:
· The State parameter will be used to specify the state that is being predicted. Fraudulent transactions have a value of 1 in the Fraudulent column, that is why parameter value is set to 1.
· Threshold parameter will be used to specify probability threshold.
· Cost_Per_FP parameter specifies cost of investigating false-positive transactions
· Cost_Per_FN parameter specifies average bank’s losses on fraudulent transaction.
Click Ok when all query parameters are defined.
b. Click “Design Mode” button in the toolbar of the Query Builder to turn off the design mode. Copy and paste following query and click Ok.
vba!cint(PredictProbability([Fraudulent],@State)>@Threshold and t.[Fraudulent]=@State)*(-1) as [TP],
vba!cint(PredictProbability([Fraudulent],@State)<=@Threshold and t.[Fraudulent]<>@State)*(-1) AS [TN],
vba!cint(PredictProbability([Fraudulent],@State)>@Threshold and t.[Fraudulent]<>@State)*(-1) AS [FP],
vba!cint(PredictProbability([Fraudulent],@State)<=@Threshold and t.[Fraudulent]=@State)*(-1) AS [FN]
NATURAL PREDICTION JOIN
(SELECT * FROM BankModel.CASES) AS T
5. Click Next button in the wizard
6. Click Finish button twice.
Step 3: Modify the report in the report designer
1. Change the report title from “Report1” to “Bank Report”.
2. Remove the table with the TP, TN, FP and FN values; we will not need it.
3. Open the Toolbox window and drag and drop 10 textboxes. Type the text for 5 of the textboxes as shown below:
4. In the report, the number of true-positive, true-negative, false-positive and false-negative transactions can be reported as
5. Total expenses can be calculated as
6. Change the font and color of the text boxes to match your aesthetic preferences.
The Final Output
Now we can estimate the bank’s expenses by plugging in different combinations of values for the Threshold, Cost_Per_FP and Cost_Per_FN parameters. Switch to the Preview tab to see the results.