Graphing a Hierarchy from a Common Table Expression Using TreePlot

Common table expressions are one of the most underrated tools SQL Server ships with. CTEs greatly simplify querying SQL Server in situations where other queries would have either been extremely difficult or impossible to write. One can learn more about CTEs on MSDN. Further, I’d highly recommend this write-up by SQL Lion. It’s very thorough and provides a detailed breakdown of what is happening in a common table expression.

One of the great powers of CTEs is that they trivialize hierarchical queries. A common example is querying AdventureWorks to build a list of who reports to whom. Take the following query, shamelessly yanked from SQL Lion (with a tiny modification):

WITH EmpDetails (EmployeeKey, FirstName, LastName, ParentEmployeeKey,
PFirstName, PLastName, EmployeeRank, [Status])
AS
(
SELECT
EmployeeKey,FirstName,LastName,ParentEmployeeKey
,CAST('NULL' AS VARCHAR(50)) AS PFirstName
,CAST('NULL' AS VARCHAR(50)) AS PLastName
,1 AS EmployeeRank,[Status]
FROM dbo.DimEmployee
WHERE ParentEmployeeKey IS NULL
UNION ALL
SELECT e.EmployeeKey,e.FirstName,e.LastName,e.ParentEmployeeKey
,CAST(CTE_Emp.FirstName AS VARCHAR(50)) AS PFirstName
,CAST(CTE_Emp.LastName AS VARCHAR(50)) AS PLastName
,CTE_Emp.EmployeeRank + 1 AS EmployeeRank
,e.[Status]
FROM dbo.DimEmployee e
INNER JOIN EmpDetails CTE_Emp
ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey
)
SELECT * FROM EmpDetails WHERE [Status] IS NOT NULL ORDER BY EmployeeRank

If we query the AdventureWorksDW2008R2 database, we should get a similar result to the result set below:


Reading the result set, we can see that Ken Sánchez reports to no one as his ParentEmployeeKey is NULL. Throughout the rest of the table, we can see we have individuals that have someone else they report to. In tabular format this is difficult to read and even harder to visualize.

Enter Mathematica and TreePlot[]. We want to take the essential data from the above output and create a visual representation. To start, we will want create a database connection in our notebook (.nb) file. This is done the usual way with DatabaseLink.


Take care to note that for the purposes of this example, we are connecting to the AdventureWorksDW2008R2 database and not AdventureWorks2008R2. Once connected, we will want to use the above query with a small modification.


We modified the SELECT * from the first query to SELECT EmployeeKey, ParentEmployeeKey. This is due to the fact that the other columns are extraneous to our purpose and will only create problems if we keep them in the result set. Once we run the command, we should receive a result set that appears as follows (the image is cropped):


If we look at the TreePlot documentation, we note that our list isn’t quite in a usable state so we will have to massage the data some. The first thing we will do is define a function that will take two parameters u, v, and return u -> v. To do this, we will use the Function[] function in the following manner: Unite = Function[{u,v}, u -> v]


I’ve named my function Unite. It’s a terrible name. Feel free to name yours better. Once the function is built, we need to apply it to the list. To do this, we will use Apply[].


We’ve used shorthand here with the three @ signs. The reason we’ve done so is that the default behavior of Apply would be to apply the “Unite” function at the top level of the list. We want it to apply at one level beneath that. The three @@@ signs is shorthand for doing exactly that. The documentation for Apply discusses it. Also note that the Apply function stripped the sublists out of the list, giving us a litany of relation rules ready for TreePlot. To use TreePlot, merely pass the variable for the relation rules and any options needed. In this example, we want to label our vertices.


It appears that we have four discrete trees in our hierarchy! While this would seem to be unexpected, a simple verification of the data does show that this is not inaccurate. The trees are quite small and very busy so if we select them, click, and drag, we can see them in better detail.






That’s it! We’ve just used a common table expression to query a database for hierarchical relationships and then used Mathematica to graph it. Play around with it and leave any comments you might have. Thanks for reading.