No History? No Worries!
Say you’re launching a new product and you want to
predict what sales might look like in the next few months. Classical time series prediction does not
work in this scenario because you don’t have historical sales data for the
product. However, new SQL Server 2008
extensions to the Microsoft_Time_Series algorithm and DMX allow you to easily
apply the patterns from another similarly-behaving time series to solve this
conundrum.
Predicting Post-IPO Stock Values for VISA
In this
example, we illustrate how to use the PredictTimeSeries method with the
parameter REPLACE_MODEL_CASES to obtain predictions for a time series for which
we do not have enough historic data. The power of this combination comes into
play when we have a time series with not enough historic data points to build a
realistic model, but we know that this series follows a pattern similar to
another time series for which we have enough historic data to build a model.
Here’s an Excel 2007 workbook
that has 73 historic data points representing post-IPO daily closing values for
the MasterCard stock and just 8 corresponding values for Visa (since we’re
doing this 8 days after the Visa IPO). Our goal is to use the MasterCard stock
history to derive better predictions for the Visa stock.
We will use the SQL
Server 2008 Data Mining Client Add-in for Excel to build and query the time series model:
1.
Make sure you have Excel 2007 with the DM
Client add-in installed.
2. Save the
workbook with
the MasterCard/Visa stock data to your local disk and open in Excel 2007.
3. To create a model for the MasterCard stock data
on the first worksheet, click on the “Data Mining” tab and select the
“Forecast” task.
4. Select “Next” on the first page of the forecast
wizard “Getting Started with the Forecast Wizard”. (Note: This page might not
appear if you previously selected the option to skip the welcome page.)
5. On the second page “Select Data Source”, select
the table we created previously and click on “Next” button.
6. On the “Forecasting” page, select the time stamp
column to be the first column, named “TimeStamp”.
7. In the input columns grid, de-select the
“TimeStamp” column and select the “MasterCard” column, then click “Next”.
8. On the last page of the wizard, rename the
structure “MasterCardStructure” and the model “MasterCardModel”, leave the
default selections to browse the model after it is created and to allow drill
through, and click “Finish” to end the wizard and proceed to build the model.
The MasterCard model historic data and the first 10
predicted values are illustrated in the following graph:

Now, use the same steps to create a time series model for
the Visa stock using the 8 historical data points on the second workbook sheet.
You will see right away that the model will not
generate meaningful predictions due to the lack of sufficient historic data
points. The VisaModel historic data and the next 10 predicted values are
illustrated in the following graph:

Better Predictions Using REPLACE_MODEL_CASES
A better approach is to use the knowledge that the Visa
and MasterCard stocks have a similar pattern and to use the model built for MasterCard
to obtain predictions for the Visa stock values. Here’s how (again using the
Data Mining Client Add-in for Excel):
1. Select the “Query” task from the “Data Mining”
ribbon and click "Next" on the
introductory page.
2. Select the “MasterCardModel” model and click the
“Advanced” button.
3. On the “Data Mining Advance Query Editor” page,
click on the button “Edit Query”, select Yes on the dialog asking to confirm
that “Any changes to the query text will not be preserved when you switch back
to the design mode.”
4. Type the following query:
SELECT
(SELECT $Time, [MasterCard] as
[Visa] FROM
PredictTimeSeries([MasterCardModel].[MasterCard],
10, REPLACE_MODEL_CASES)) as Predictions
From [MasterCardModel]
NATURAL PREDICTION
JOIN
(SELECT 1 AS [TimeStamp], 64.35 as
[MasterCard]
UNION
SELECT 2 AS
[TimeStamp], 62.76 as [MasterCard]
UNION
SELECT 3 AS
[TimeStamp], 64.48 as [MasterCard]
UNION
SELECT 4 AS
[TimeStamp], 66.11 as [MasterCard]
UNION
SELECT 5 AS
[TimeStamp], 69 as [MasterCard]
UNION
SELECT 6 AS
[TimeStamp], 75.1 as [MasterCard]
UNION
SELECT 7 AS
[TimeStamp], 82.75 as [MasterCard]
UNION
SELECT 8 AS
[TimeStamp], 82.86 as [MasterCard])
as t
5. Click “Finish” and select the results of the
query to be copied into a new worksheet.
The results should look like this:

When the REPLACE_MODEL_CASES parameter is used, the
PredictTimeSeries method will return the requested number of prediction
obtained by replacing the last historic points of the given model with the new
values provided in the query. In our case, the last 8 data points for the
MasterCardModel are replaced with the values we generate on the fly using the
SELECT and UNION options in the input set specified after the “NATURAL
PREDICTION JOIN” keywords. Then, the MasterCardModel equations are used to
predict the next 10 values for the Visa stock series.
To see the power of this method, we can compare the
predictions obtained using the MasterCard model (Predictions.Visa), with the
predictions generated by the VisaModel model obtained using only the limit sets
of 8 data points of the Visa stock values (Predictions.Visa2). The results are illustrated
in the following graph:

So there you go - you have a new tool in your arsenal when
you don’t have enough data to make accurate time series predictions. Enjoy!