What is data mining?
Simply put, data mining is the process of exploring large quantities of data in order to discover meaningful information about the data, in the form of patterns and rules.
In this process, various forms of analysis can be used to discern such patterns and rules in historical data for a given business scenario, and the information can then be stored as an abstract mathematical model of the historical data, referred to as a data mining model.
After a data mining model is created, new data can be examined through the model to see if it fits a desired pattern or rule. From this information, actions can be taken to improve results in the given business scenario.
What is SQL Server Data Mining?
SQL Server Data Mining is a collection of machine learning algorithms that explore your data for patterns. Once discovered, these patterns can be browsed for greater insight into your data, or they can be applied to new data to create “predictions” – which allow you to determine unknown facts about data based on data the algorithms have seen before.
SQL Server Data Mining comes with nine algorithms, plus all of the tools required to create, explore and deploy mining models in your enterprise or business applications.
Where do I find additional information about SQL Server Data Mining?
In addition to the resources under http://www.microsoft.com/sql/technologies/dm and http://msdn.microsoft.com/sql, more information on SQL Server Data Mining, including whitepapers, tips and tricks can be found at http://www.sqlserverdatamining.com. Additionally the book “Data Mining with SQL Server 2005” by Tang and MacLennan provides overviews and details of the language, algorithms and usage scenarios for Data Mining.
Where do I ask questions about SQL Server Data Mining?
Please post your product questions and comments to the SQL Server Data Mining MSDN forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=81&SiteID=1.
What do I install to use SQL Server 2005 Data Mining?
SQL Server 2005 comes with many components. This describes what components are necessary to perform data mining once you have the SQL Server 2005.
Analysis Services is the only required component to install on the server. If you want to do data mining against existing SQL Server 2000 databases or other data sources (DB2, Oracle, Access, etc), this is the component you need to install.
Install Reporting Services if you want to be able to create reports that work against your data mining models.
SQL Server Database Services
You only need to install the SQL Server relational engine if you want to use it as a data source, or if you want to use the Data Mining samples and tutorials.
SQL Server Integration Services
Installing SSIS causes the SSIS service to be installed on your server, allowing the running of scheduled packages. Install this if you want to use the integrated data mining tasks and transforms on your server.
Workstation Components, etc.
Install the Workstation Components, etc, on any client machine that will be creating mining models, authoring reports and SSIS packages, or managing Analysis Services. The Workstation Components work equally well when installed on the same machine as the server.
You will need to click the Advanced button to install the samples and sample databases. All the samples are located under Client Components/Documentation and Samples. The pertinent samples and databases to install are:
- AdventureWorksDW Sample Data Warehouse (requires you install SQL Server Database Services)
- AdventureWorks Sample OLAP
- Analysis Services Samples
- Integration Services Samples
- Note that installing Samples installs the Samples installation packages. Links to run these are found under
- Start/Microsoft SQL Server 2005/Install Samples.
What’s new for Data Mining in SQL Server 2005?
Exhaustive information can be found in Books Online and other product documentation. Below are the highlights:
- All-new integrated user interface hosted in Visual Studio 2005
- New algorithms
- Time Series
- Sequence Clustering
- Neural Network
- Naïve Bayes
- Association Rules
- Linear Regression
- Logistic Regression
- Improvements to SQL 2000 algorithms
- Decision Trees enhanced to include Regression Trees, among others
- Clustering enhanced to include auto-detection of cluster number, among others
- DMX language enhancements
- Plug-in algorithms and viewers
- Client and server-side managed object models
- Integration with SSIS
- Transforms/Tasks for DM processing/querying
- Text mining transforms
- Integration with Reporting Services
- Complete framework for building and deploying intelligent applications
What data mining algorithms are available in SQL Server 2005?
SQL Server 2005 Analysis Services includes a rich set of nine algorithms developed by the SQL Server Data Mining product team in collaboration with Microsoft Research:
- Microsoft Decision Trees
- Microsoft Clustering
- Microsoft Time Series
- Microsoft Association Rules
- Microsoft Sequence Clustering
- Microsoft Naive Bayes
- Microsoft Neural Network
- Microsoft Linear Regression
- Microsoft Logistic Regression
What additional DM features are available in Enterprise Edition?
- plug-in algorithms
- ability to tune several additional algorithm parameters
- unlimited concurrent DM queries
- parallel cluster model processing
- unlimited number of attributes for Association Rules
- Time Series cross prediction
- sequence prediction
- multiple prediction targets for Naive Bayes, Neural Network and Logistic Regression
- text mining transforms in Integration Services
What deployment options are there for SQL Server 2005 Data Mining?
SQL Server 2005 Data Mining offers unparalleled deployment options for making data mining work for you. Data Mining results can be deployed directly in reports created by SQL Server 2005 Reporting Services allowing deployment to the web, e-mail, SharePoint, and many other destinations.
Integration tasks and transformations in SQL Server 2005 Integration Services allows for a new type of deployment for the predictive analytics in SQL Server 2005 Data Mining. Through these tasks and transforms, you can deploy your models directly into your ETL pipeline, using the power of data mining to intelligently filter, enhance, or detect anomalies in your data streams.
Creating data mining models from your UDMs (cubes) allows you to create new data mining dimensions allowing you to slice your data against the patterns learned from the machine learning algorithms. These patterns are available through any standard cube browser such as Excel, ProClarity, Panorama, and many others.
Finally, SQL Server Data Mining provides a rich API allowing you to access your trained models from any client or middle-tier application – giving you the ability to create intelligent applications using data mining your way.
Can I use Data Mining as a Web Service?
Yes, SQL Server 2005 Analysis Services allows native access to all objects as a web service through XML for Analysis (XMLA). Therefore you can make any data mining model a web service.
How do I embed data mining into my applications?
SQL Server 2005 Data Mining supports many API’s that you can use to embed data mining into your applications.
From .NET (managed) code you have:
- AMO (Analysis Management Objects)
- ADOMD.NET (for client applications)
- ADOMD Server (for server-side stored procs)
- From C++ (native) code you have
OLE DB for Data Mining
From the web you can use ASP.NET and the .NET interfaces or you can use XML for Analysis (XMLA) directly over HTTP.
Can I access SQL Server Data Mining from my non-Windows clients?
Yes, while SQL Server Data Mining runs on the Windows platform, it supports XML for Analysis (XMLA), a SOAP-based protocol allowing access over HTTP from any platform.
How can I do text mining with SQL Server 2005?
Text mining is a two-step process:
First, you use the text mining transforms (Term Extraction and Term Lookup) in the Integration Services component to build your dictionary and to extract keywords/phrases from your source text data into relational tables.
Then you can apply the data mining algorithms in Analysis Services to mine the extracted data.
This tutorial at SQLServerDataMining.com describes a complete end-to-end text mining scenario: http://www.sqlserverdatamining.com/ssdm/Default.aspx?tabid=94&Id=164.
Can I add my own algorithms and/or viewers to SQL Server 2005 Data Mining (Analysis Services)?
SQL Server 2005 DM provides an extensible framework that allows you to easily plug in your own specialized algorithms as well as viewers for native (built-in) or plug-in algorithms. The following resources are available to help you get started:
Whitepaper: SQL Server Data Mining: Plug-In Algorithms
Tutorials and sample code (download includes full source code for a working plug-in algorithm):
- A Tutorial for Constructing a Plug-In Algorithm
- A Tutorial for Constructing a Plug-In Viewer
Can I write a plug-in algorithm using C#?
Yes, you can! The Managed Plug-in API Framework was shipped with the SQL Server 2005 February 2007 Feature Pack in conjunction with SP2. You can download the complete package, which includes a step-by-step tutorial, a compiled HTML help file as well a sample plug-in algorithm developed in C#, from here.
Can I embed the DM Viewers in my own application?
Yes, the SQL Server 2005 Data Mining viewer controls supplied with the product in Business Intelligence Development Studio and SQL Management Studio are now available as a redistributable web download here
Do I need to create an OLAP Cube to do data mining?
No, SQL Server Data Mining reads data from relational sources as well as SQL Server Analysis Services.
Can I mine my OLAP cubes?
Yes, SQL Server 2005 DM supports building mining models on both relational and OLAP data sources. You can also create a data mining dimension that can be used to slice the data in the source cube with a hierarchy discovered by data mining.
Can I do data mining against data in Oracle? DB2? Teradata? Other providers?
Yes, SQL Server Data Mining works against any OLE DB data source supporting standard SQL. Oracle, DB2, and Teradata are all supported providers and access to MS Access sources has been verified as well.
Can I mine against data that is not in a database, such as flat files or data on the web?
Yes, data mining transformations in SQL Server Integration Services (SSIS) allow you to perform data mining operations against data from any source readable by SSIS. This includes native support for flat files and XML sources, including web services, and additional support for custom sources via script transforms.
I don’t have access to an Analysis Server. Can I create mining models locally on my client?
Yes, you just need to install the Client Components (or just the AS2005 OLE DB provider redistributable, available as a web download here – look for the multi-platform downloads under Microsoft SQL Server 2005 Analysis Services 9.0 OLE DB Provider on the page). Local mining models can be created using the following two algorithms: Microsoft_Clustering, Microsoft_Decision_Trees.
How do I upgrade my local models from SQL 2000 to SQL 2005?
To upgrade local mining models from SQL 2000 to SQL 2005, connect using the AS2005 OLE DB provider (MSOLAP.3) and set the “Mining Location” connection string property to the name of the directory as you did in SQL 2000.
The local mining model service will read all the .dmm files in the directory and import them into a .cub file (if you don’t specify an existing or new .cub file in your “Data Source” connection property, a new file called “__MSOLAP90_Imported80MiningModels.cub” is created in the directory specified in “Mining Location” to hold the upgraded models. From this point on, all access to models in that directory will be from this file and not the files created by SQL 2000 data mining. The original files will be left on your hard drive untouched.
Can I transfer (backup/restore) individual mining models from one server to another?
Individual mining structures and models can be exported/imported using the EXPORT and IMPORT DMX statements. See Books Online for the syntax.