Mining the Movie Data

 

Once the survey data was in the correct form, mining it for interesting patterns was relatively easy.  One of the features of Yukon Data Mining is that it lets you define a mining problem once and examine it in many different ways.  A “Mining Structure” outlines the data that you want to analyze, and you can then define several different mining models within that structure to perform the analysis.  These models can use different settings, algorithms, or even different sets of columns from the structure.  When the structure is processed, all of its models are trained in parallel after reading the source data only once.  Individual models can be modified, or new models added, without returning to the data source.  In this way we trained a dozen different models on the survey data: within a few minutes we had 12 different analyses of 3200 cases over approximately 5000 attributes.

 

Mining Structure with 3 models
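
The split between a structure and its models can be pictured with a small Python sketch.  This is a conceptual analogy only, not Yukon's API: the class MiningStructure, the two "trainer" functions, and the sample survey rows are all hypothetical, and the models here are trained one after another rather than in parallel.  What it shows is the idea described above: the source is read once, each model sees only its own subset of columns, and a model added later is trained from the cached cases.

```python
import csv
import io
from collections import Counter, defaultdict


class MiningStructure:
    """Toy stand-in for a mining structure: caches cases and owns several models."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.cases = []        # filled by a single read of the source
        self.models = {}       # model name -> (columns used, trainer function)
        self.results = {}      # model name -> whatever the trainer returned
        self.source_reads = 0  # how many times the source was read

    def add_model(self, name, columns, trainer):
        unknown = set(columns) - set(self.columns)
        if unknown:
            raise ValueError(f"columns not in structure: {sorted(unknown)}")
        self.models[name] = (list(columns), trainer)
        if self.cases:  # already processed: train from the cache, not the source
            self.results[name] = self._train(name)

    def process(self, source):
        """Read the source once, then train every model defined so far."""
        self.source_reads += 1
        reader = csv.DictReader(source)
        self.cases = [{c: row[c] for c in self.columns} for row in reader]
        for name in self.models:
            self.results[name] = self._train(name)

    def _train(self, name):
        columns, trainer = self.models[name]
        return trainer([{c: case[c] for c in columns} for case in self.cases])


def value_counts(cases):
    """Hypothetical 'algorithm' 1: frequency of each value per column."""
    counts = defaultdict(Counter)
    for case in cases:
        for column, value in case.items():
            counts[column][value] += 1
    return {column: dict(values) for column, values in counts.items()}


def pair_counts(cases):
    """Hypothetical 'algorithm' 2: how often two column values occur together."""
    pairs = Counter()
    for case in cases:
        items = sorted(case.items())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                pairs[(items[i], items[j])] += 1
    return dict(pairs)


# Made-up survey rows, for illustration only.
SURVEY = (
    "age,genre,channel\n"
    "18-24,Comedy,Theater\n"
    "25-34,Drama,DVD\n"
    "18-24,Comedy,DVD\n"
)

structure = MiningStructure(["age", "genre", "channel"])
structure.add_model("genre_by_age", ["age", "genre"], pair_counts)
structure.add_model("all_columns", ["age", "genre", "channel"], value_counts)
structure.process(io.StringIO(SURVEY))

# A model added after processing is trained from the cached cases.
structure.add_model("channel_only", ["channel"], value_counts)
print(structure.source_reads)  # 1
```

Keeping the cases inside the structure is what makes the late addition cheap: only the new model is trained, and the source is never reopened.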

The next step was interpreting the models.  Since we created models using several different algorithms (specifically Association Rules, Decision Trees, Clustering, and Naïve Bayes), we needed different tools to understand what each algorithm had discovered.  Yukon Data Mining has a rich set of browsers that let the user visualize the patterns deep inside their data.  Each algorithm has its own set of viewers, tailored to expose the model content in the way needed to understand the patterns that algorithm discovers.

 

Using OLE DB for Data Mining, which defines a SQL-like syntax for accessing mining models, we were able to recreate two of the viewers using ASP.NET and DHTML so that you can explore some of the models we created.
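
To give a feel for that SQL-like syntax, the sketch below builds the kind of content query a custom viewer might send to read the nodes of a trained model.  The SELECT ... FROM [model].CONTENT form and the NODE_* column names come from the OLE DB for Data Mining content rowset; the helper functions and the model name "Movie Clusters" are hypothetical, and this is not the code behind the viewers described here.  It is written in Python purely for illustration and only assembles the query text; sending it to the server is left out.

```python
def quote_identifier(name):
    """Wrap a model name in square brackets, as the query syntax expects for names with spaces."""
    if "[" in name or "]" in name:
        # Bracket escaping is deliberately not handled in this sketch.
        raise ValueError("model names containing brackets are not supported here")
    return "[" + name + "]"


def content_query(model_name,
                  columns=("NODE_UNIQUE_NAME", "NODE_CAPTION", "NODE_SUPPORT")):
    """Build a query that returns one row per node of the model's content."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {quote_identifier(model_name)}.CONTENT"


# "Movie Clusters" is a made-up model name.
query = content_query("Movie Clusters")
print(query)
# SELECT NODE_UNIQUE_NAME, NODE_CAPTION, NODE_SUPPORT FROM [Movie Clusters].CONTENT
```

A viewer built this way would render each returned row (for example, one cluster or one tree node) as an element on the page.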