Say, from a decision tree model to predict Gender, you want to select only the nodes that have higher than 60% of Females. The distribution of a content node is available in the NODE_DISTRIBUTION column, which is a nested table column. But DMX doesn’t allow you to access any column inside NODE_DISTRIBUTION in the top-level WHERE-clause. For instance, the following DMX query will return an error:
SELECT … FROM dmm.CONTENT
WHERE Node_Distribution.Attribute_Name = ‘Female’ AND Node_Distribution.Probability > 0.6
Instead, you could do the following:
SELECT …
(SELECT … FROM Node_Distribution WHERE Attribute_Value = ‘Female’ AND Probability > 0.6) AS Dist
FROM dmm.CONTENT
However, note that the above query won’t actually filter out any nodes, but instead it will return rows for all nodes and only the nodes that satisfy WHERE-clause will have a non-NULL value in the Dist column. But, once we flatten the hierarchical rowset, the nodes that have NULL values in the Dist column will disappear.
That is, the following will return only the nodes that have higher than 60% of Females:
SELECT FLATTENED …
(SELECT … FROM Node_Distribution WHERE Attribute_Value = ‘Female’ AND Probability > 0.6) AS Dist
FROM dmm.CONTENT
See Returning flattened results from DMX queries to understand why FLATTENED eliminates the top level rows with NULL values in the nested table column.