The AdventureWorks Database as a Transversal Matroid: Part I – Generating a Bipartite Graph

In previous posts, we took the foreign keys present in the AdventureWorks2008R2 database and used them to represent the database as a multigraph, a simple graph, and an incidence matrix. We even calculated properties such as graph radius and periphery. Another way to investigate relationships between objects is to present them as a matroid (or combinatorial pre-geometry).

To start, the simple graph of the finite key relationships present in the AdventureWorks database is already a cycle matroid. This is a very well known result as all finite graphs are cycle matroids. The independent sets in a cycle matroid are the edges that do not form a closed path.  Arguably, the cycle matroid derived from the AdventureWorks database graph may have some interesting properties.


Instead of investigating the properties of this cycle matroid, we will investigate the AdventureWorks database as a transversal matroid. The definition of a transversal matroid in “Matroid Theory” by James Oxley is as follows:

A way to visualize transversals is via bipartite graphs:


The AdventureWorks database actually works as a great candidate for a transversal matroid. The family of subsets of the AdventureWorks database is the schemas. The AdventureWorks2008R2 database has the following schemas: dbo, HumanResources, Person, Production, Purchasing, and Sales. It’s easy to query the database for its schemas:

SELECT DISTINCT SCHEMA_NAME(schema_id) FROM sys.tables

The expected output is shown below.


To visualize the relationship between tables and schemas, we will need to create a bipartite graph of the relationship. We will create a connection to the database the usual way. Once that is accomplished, we will need to run the following command:


The reason we use CASE in the T-SQL statement is because there’s a table named Person too. So to take care of the fact that there’s a schema and a table of the same name, we just rename it. An partial listing of the output is show below:

We will need to combine the two columns into something Mathematica will like to work with for graphing.

I tried having the SQL query seamlessly put out columns of the form “SpecialOffer” -> “Sales”. While the query executed the way one would expect in SSMS, the parser Mathematica uses choked on it. So here we are, inelegantly combining the results. It is what it is. Perhaps I’ll find the trick.

We apply the custom function Combine to the query result set.


Finally, it’s time to see something worth looking at. The command below:


Returns:


To see the entire graph, click on the image. It’s very large.

While we’ve generated a great looking image of the relationship between schemas and tables in the AdventureWorks database, we have not yet described the matroid yet. In the second part of this series, we will generate the maximal independent sets to describe this matroid. I’m looking forward to it!

Basic SQL Queries using AdventureWorks

In a previous post, we walked through installing SQL Server 2008 and the Adventure Works databases. Upon successful install, we were able to connect to our database server and expand the installed databases.

Let’s take a moment to explore our surroundings a little bit. Click the plus sign next to AdventureWorks2008R2. You’ll see a number of tables within the database. I have to assume some basic level of acquaintance with databases, but succinctly put, tables are to databases what tracks are to CDs (I wonder how long that analogy will last). If you expand a table, say HumanResources.Department, you find features that are appropriate to tables contained within. The two features you’re likely to be most interested to start with are columns and constraints.


In the above picture, we can observe that we have four columns titled, “DepartmentID”, “Name”, “GroupName”, and, “ModifiedDate”. In addition, we have a default constraint on the ModifiedDate column. Constraints are quite important in databases, you can learn about them more here as well as in Books Online (BOL), the help built into SQL Server Management Studio (SSMS). You may find the BOL entry by clicking, “Help” inside of SSMS and clicking, “search”. At that point, just type in “Constraints”, and you’ll find an entry constraints in the section on understanding tables. It’s a good read.

We can assume that this table models departments and divisions inside of the AdventureWorks organization. So let’s begin to query the database. Right click on AdventureWorks2008RS and select, “New Query”.


Let’s start by just returning all rows and all columns from with HumanResources.Department. We can do this by typing out in the new query window, “SELECT * FROM HumanResources.Department”.


It’s best practice to return only the columns you need. For example, let’s just return the Name and GroupName. To do so let’s type, “SELECT Name, GroupName FROM HumanResources.Department”.


Of course, it’s also best practice to only return the rows you need as well. There’s a lot of reasons for this, sometimes from a managing data perspective, other times from a server load perspective. In general, it just follows from a good principle of living, take only what you need and no more. To return precisely the number of rows you need, we will need use the WHERE clause in our query. Let’s suppose we need the Departments under Executive General and Administration. To do so, let’s use the following query, “SELECT Name, GroupName FROM HumanResources.Department WHERE GroupName = ‘Executive General and Administration”.


These queries work fine when we only need data out of a single table. Querying data out of a single table is a rare event. SQL Server is a relational database. Intuitively, a relational database is a technology that describes relationships between objects via one or more tables. A common example of this would be employees have departments and vice versa. What if we wanted to know what employees worked in what departments? Well, we need to figure out where the relationships lie within the database. There’s a couple of ways to do this. My favorite is to draw a picture. Many of my mathematics professors in college would say, when in doubt, draw a picture. So let’s draw a picture. Let’s right click on Database Diagrams right above the tables. It’ll ask to configure the database for diagrams. Click okay and once that’s done, you’ll get a window asking what table you want to diagram. Scroll through and find the Departments table.


Click, “Add” and then click, “Close. You’ll have a window that should look like the picture below.


Let’s right click on the Departments table and select, “Add Related Tables”.


The related table will drop right on top of the first table. Click and drag it off, right click it and select, “Add Related Tables”.


Before we move on, let’s think about what we want. We want to pull information held in two separate tables out. To do this and not have it be a Cartesian product (commonly referred to as a CROSS JOIN), we will need to find the intersecting attribute that unites the data between the tables. This means we must use a JOIN. In particular, we want the stuff that is only in both tables or sets. To this end, we’re going to employ the INNER JOIN operator.

Let’s look at this screen grab of a handy Mathematica demonstration I downloaded:


I want the rows from table A that have red circles in table B. The syntax would be as follows: SELECT A.* FROM A INNER JOIN B ON A.RedCircles = B.RedCircles. That would only give us the elements in table A that have corresponding elements in tables B.

Let’s move back into the world of AdventureWorks. I’m going to want to find out who in the Person table has a company credit card assigned to them and get the ID of that credit card. This is a great opportunity for you to dive back into the Database Diagram tool and see if you can find the column that is common between the two tables.

Let’s check the row counts on the two tables by themselves. We want uneven tables so we can see the operator for as it. SELECT * FROM Person.Person returns 19972 rows. SELECT * FROM Sales.PersonCreditCard returns 19918 rows. We have our match. To illustrate my point, I will perform an INNER JOIN on those tables.

As an aside, you may be wondering why I use all caps for certain words in my queries. It’s common practice to capitalize T-SQL syntax when writing a query and leaving non-SQL operators and clauses as lowercase. It’s a readability issue but I would suggest strongly in adopting it as a practice.

Returning to the query, let’s get all the data out of the Person.Person table and match it with the Credit card ID of the Sales.PersonCreditCard table.


Note the row count matches the row count of the Sales.PersonCreditCard table. That means there’s rows in the Person.Person table that do not have a matching entry in the Sales.PersonCreditCard table. What if we wanted those rows as well? Enter the LEFT JOIN operator! Let’s make a small adjustment to the query and see what happens.


19972 rows! We’ve seen that number before! Note the duplicate rows and note that rows without a matching BusinessEntityID in the Sales.PersonCreditCard table have NULL in the CreditCardID column. You can also use the RIGHT JOIN operator if you need all rows from the right table in your query. Here’s plenty of documentation on joins here.

So let’s get back to the original example. We need to get data that spans three tables. Not a problem? Not a problem!


As you can see, we can span INNER JOINs across tables as we need. Nifty.

So what next? Inserts, updates, and deletes? Why not? We’ve come this far, let’s just power through this last little bit and then we’ll be on to Mathematica as a front end for querying SQL Server! Understanding how to interact with databases directly will give us a background to all things we do. Fundamentals! It’s all about fundamentals.

We refer to adding new data to a database as an INSERT. The syntax is usually quite basic. You’ll often see the syntax in two forms. INSERT INTO <Table> VALUES (<Value 1>, <Value 2>, … , <Value k>). You’ll use this syntax when you’re inserting a value for every single column in the table. The database will insert each value (assuming that it doesn’t break any rules) in a manner of left to right. Sometimes, when you’re inserting data, you’ll have NULL values for a particular column. If that’s the case, you’ll want to use the following syntax, INSERT INTO <Table> (<Column 1>, <Column 2>, … , <Column k>) VALUES (<Value 1>, <Value 2>, … , <Value k>). The assumption here is that there’s a break in the sequence. Let’s do a quick example.


The DepartmentID column has an identity so it auto-incremented when we inserted the Name and GroupName and the default constraint took over and inserted a timestamp since I didn’t have a value in its place. The SELECT query shows that the values took. That’s an insert.

UPDATES follow the same in the same fashion. UPDATE < Table> SET <Column> = <Value> WHERE <Other or same column> = <Value>. The WHERE clause is really important here. Without it being present, every row would be updated. Eek.


And to verify:


You might note that I have multiple queries and I’ve highlighted one in each of the screenshots I’ve taken. In SSMS (and in many other database management tools), if you highlight a query or parts of a query and attempt to execute that query, SQL Server will only try to execute the portion of the query that’s highlighted. This is very handy for running queries and testing to ensure that you wrote the query accurately.

Finally, there’s DELETE. If you want to delete all rows, just type DELETE FROM <Table>. Bingo! Empty table. Of course, the nuclear option isn’t usually the appropriate option. So you’ll want to use a WHERE clause to ensure that you’re removing the only row(s) you wish to remove. In that case, you do something like this, DELETE FROM <Table> WHERE <Column> = <Value>.


A simple SELECT query returns no results so we know we’ve been successful.

So there it is, the ultra-basics of querying in a nut-shell. Where to next? Well, as promised here’s some fantastic resources that I’ve used and still use as well today.

There’s a lot of great SQL Server blogs and websites out there as well.

I highly suggest getting a book on SQL Server development as well. There’s really no replacing a book. Forums and searches are helpful but that’s information and information is a subset of knowledge. Books contain knowledge. Go to the source.

Installing SQL Server 2008

To get things started off, I’m going to cover some of the super basics. I remember a time when I was an early mathematics student. I had never encountered a database before in my life. In my mind, my idea of a database was something along the lines of lots of Excel spreadsheets. This first, non-salutory post is going to address that student or person like him. There’s an ocean of knowledge awaiting beneath our feet and this install guide is the start to getting at it!

The first thing in getting started with SQL Server 2008 is to get the software. I assume that as an individual, you’re likely not interested in purchasing a full version of SQL Server 2008 for personal use. That means either using SQL Server 2008 Express or SQL Server 2008 Enterprise Trial. You may find the Express version here and the trial version of Enterprise here. I’m going to assume during the set up process that you’re installing this on your workstation or in a development environment and if you’re not, then you know what you’re doing.

The install process is fairly painless. After downloading the file, choose the path you wish to save your extracted files to.  Make sure that you’re placing the files in a subfolder otherwise the install files will be placed at the same level as your saved documents and et cetera. I appended SQL2008 to the end of the path defaulted to by the extractor and the folder was created when I hit, “Ok”.

For personal purposes, I usually just go straight to the Installation section and click on, “New SQL Server stand-alone installation or add features to an existing installation”. You may wish, since this might be your first time, to poke around and look at some of the documentation linked from the Installation Center window.  Usually, I just proceed through the install wizard selecting the default options and selecting every feature. If this was a production environment, I would highly recommend not installing any unnecessary features. Normally, unless there’s a good case for it, I would recommend using Windows mode for security; however, Mathematica likes to use SQL Server authentication so you’ll want to select Mixed Mode. I am working on getting Windows authentication and Mathematica to play nice but so far, I haven’t found the silver bullet. If security and best practice is something you’d like to explore more, Paul S. Randal’sCommon SQL Server Security Issues and Solutions” is a great starting place. There’s also an MSDN article linked right from the SQL Server Installation Center window. If you’re so inclined, attempting to set up your database server like a production environment with service accounts and minimal permissions is a great way to play around and learn. Allow me to also recommend Pinal Dave’s Journey to SQL Authority. There’s a ton of great SQL Server community resources out there. SQLSkills.com and SQL Authority are great places to start. They both have a lot of links. Happy surfing!

Here’s the process step by step (each bullet represents a discrete step, usually punctuated by hitting, “Next” in the wizard):

  • Navigate to the folder you extracted the files to and double click on Setup.exe
  • Click on the installation category in the left hand pane and when the main window changes, click on “New SQL Server stand-alone installation or add features to an existing installation”.
  • SQL Server 2008 will run an environment checker to ensure that your system is ready for SQL Server 2008. Once the operation completes, you should have passed each of the prerequisites. If you’re curious, click on the, “Show Details” button. Otherwise just hit, “Ok”.
  • SQL Server 2008 Setup will then ask you for permission to install setup support files. Click, “Ok” and let it do its work.
  • The set up process will ask you what sort of installation type you want. For our purposes, select “Perform a new installation of SQL Server 2008”.
  • You’re going to be asked what edition of SQL Server 2008 you want to install. I’m installing the 180 day Enterprise Evaluation version. I’ll probably also perform an install of SQL Server Express with Advanced Services as well. I’m not very familiar with SQL Server Express on a day to day basis since I work with Standard and Enterprise at my work. I’m going to fix my lack of familiarity but for now, if you’re following along and you haven’t done this before, just select the evaluation version.
  • Accept the license terms.
  • I’ve selected the following options for my installation:

The essential features you want installed is Database Engine Services, Full Text Search, and Client Tools. It’s also my intent to work with Analysis Services and Reporting. That means I’ll want Business Intelligence Development Studio. I’m also going to want Management Tools installed as well (mine are greyed out because I’ve already installed them in another instance on this computer). Clicking on each feature will update the Description portion of the window with a summary of the feature you’ve selected.

  • You may select this installation to be the default instance or you may make it a named instance and give it a special name. The most obvious difference between the two is that for the default instance when you connect, it’ll be to <SERVERNAME> whereas with a named instance you’ll be connecting to <SERVERNAME>\<HOSTNAME>. Make sure to select the path to where you wish the root directory for your instance to live if you’re unhappy with the default path. After stepping through a summary of space usage, you’re going to find yourself on a screen asking for service account names. I love this feature of the installer. If you’re comfortable with creating custom accounts (or have some set up in your environment already), just click in and type the account name and password. Make your choices about how the services start up. On my computer, I have the service start up type set to manual because it takes long enough to get to a workable desktop in the first place. When I want to play with SQL Server, I can just go to Services in the Administrative Tools and start it manually. Pick whatever you’re most comfortable with. If you’re working at home and don’t wish to use custom accounts you may pick either NT AUTHORITY\NETWORK SERVICE or NT AUTHORITY\SYSTEM from the drop down. Picking these accounts is really against best practice and should be done only in an environment where it’s appropriate. You may read more about NT AUTHORITY\NETWORK SERVICE  here and about NT AUTHORITY\SYSTEM here. Choose wisely.
  • In the Account Provisioning tab, you may select the means by which you wish users to authenticate with SQL Server. If you select Windows authentication mode, then Windows credentials will be used. In mixed mode, you may set up SQL Server specific users that exist outside of the Windows domain. If you do this, you will need set the password for the, “sa” account. Make this a strong password with the recognition that this account is truly top dog in the environment. Since this is my home computer and I have a single sign on, I just usually keep Windows authentication mode and then I hit, “Add Current User” at the bottom of the window to add my Windows user as an administrator. For the Data Directory tab, you may select the path for each of your database and log files. Many production servers will place these files on separate drives for performance considerations; however, if you’re just play working like I am, the default paths work fine here.
  • For Analysis Services Configuration, you may wish to add your current user as an administrator here as well on the Account Provisioning tab.  You may also select the default data directories for your OLAP files.
  • For Reporting Services configuration, I select the native mode installation.
  • For Error and Usage Reporting, select the options that make the most amount of sense to you. Personally, I send this kind of data to Microsoft.
  • SQL Server 2008 Setup will next perform some last sanity checking to make sure that what you’ve selected to install will make work during installation.
  • Once you’re ready, hit “Install” and it’s off to the races!

Congratulations, you’ve installed SQL Server 2008. Once you’ve completed the installation process, I suggest running Windows Update (or Microsoft Update) and downloading any pertinent updates to SQL Server. If you think you’re done, you’d be almost right. There’s one more task we must complete before we can think about Mathematica and SQL Server putting together like peanut butter and chocolate. We need data.

In the past, SQL Server 2005 shipped with the Adventure Works database. With 2008, we will need to download and install the database ourselves. The Adventure Works database is an example database full of example data. It’s a great way to practice and learn database technology without using a production database. To get the Adventure Works 2008 databases, point your browser toward its Codeplex page, select the database examples that match your database version (likely 2008 R2) and download away. Once the executable extracts, you’ll be presented with a screen that looks like the following. Read and accept the terms and hit, “Next”.

On the next screen, you’re going to find a window that appears like so:

If you do not, then you will see buttons to the right of the databases that cannot currently be installed labeled, “More information”. If you follow them, they will take you to this page where they have short guides and links to getting you up and running. I found that once I’ve fixed the issues keeping me from installing the databases I wanted (in particular, AdventureWOrks OLTP 2008), I had to close out of the installer and restart it. I found this to be the case in particular for the FILESTREAM settings change. Once you’re happy, fire off the installer and watch it go. It’ll take a bit so if you’ve been jamming on the keyboard for a bit, now’s a good time to attend to some business.

Once the installer is done, we need to install the Analysis Services databases. Follow the appropriate, “More information” link to this page and get ready for Business Intelligence Development Studio (known from here on out as BIDS). I’ve copied and pasted its contents here for ease of use:

  • Ensure that the SQL Server Analysis Services service account for the instance you will be deploying to has access to the instance where the AdventureWorksDW2008R2 sample database is stored, and also has access to that databases, and the account is at least a member of the db_datareader role for the AdventureWorksDW2008R2 sample database.
  • Start Business Intelligence Development Studio by clicking Start / All Programs / Microsoft SQL Server 2008/ SQL Server Business Intelligence Development Studio.
  • Click File / Open / Project/Solution
  • Navigate to the appropriate folder containing the Analysis Services database solution. If the sample databases were installed to the default location:
    • For SQL Server 2008:
      • C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks 2008 Analysis Services Project\standard\Adventure Works.sln will contain the solution for SQL Server Standard.
      • C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks 2008 Analysis Services Project\enterprise\Adventure Works.sln will contain the solution for SQL Server Enterprise and SQL Server Developer.
    • For SQL Server 2008R2:
      • C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks 2008R2 Analysis Services Project\standard\Adventure Works.sln will contain the solution for SQL Server Standard.
      • C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks 2008R2 Analysis Services Project\enterprise\Adventure Works.sln will contain the solution for SQL Server Enterprise and SQL Server Developer.
    • Analysis Services databases are not supported on other editions of SQL Server.
  • After opening the solution, in the Solution Explorer double click the “Adventure Works.ds” data source. Click the Edit… button just below the Connection String box. Supply your SQL Server (database engine) server and instance name (if its a named instance). Click the Test Connection Button. If the test succeeds, click OK and OK to save the changes.
  • Right click on the solution in Solution Explorer, choose Properties. On the Adventure Works DW 2008 SE Property Pages dialogue, choose the tree item for Deployment. Change the Target – Server property to your AS server name and instance name (if it is not a default instance).
  • In the solution explorer click right on the solution (Solution ‘Adventure Works’) and click Deploy Solution

This deployment will take a bit so now’s another good chance to get up and attend to other matters or do some reading about SQL Server. Once the deployment is finished, we’re done right? Almost.  Let’s just check and make sure everything’s alright. To do that, let’s open SQL Server Management Studio (hereon known as SSMS) by going to Start/All Programs/Microsoft SQL Server 2008. You should get a window that looks like the following:

We’re going to just check and make sure that our transactional databases are set up. So make sure the server type is set to, “Database Engine”, the server name is set to your computer name, and the authentication is set to what you set it up as during the install (preferably Windows Authentication). Click, “Connect” and let’s see if we’ve done it right. Click the plus sign next to, “Databases” and you should see something like the following:

If I have more databases than you do, don’t worry. I’ve installed the Adventure Works databases for both 2008 and 2008R2. What’s important is that you have AdventureWorks2008R2 and AdventureWorksDW2008R2. Next up, we need to make sure our Analysis Services database installed too. Above your database server name, click on “Connect” and select Analysis Services. You’ll get a dialog box akin to that of when you first pulled up SSMS. Note this time that, “Server type” reads, “Analysis Services”. Click on, “Connect” and go. Expand the databases section and you should see Adventure Works DW 2008R2.

That’s it. We’re done. Next up, getting Mathematica to connect to SQL Server 2008. Not to worry, the process should not nearly be as arduous as this. If you have any particular issues that you’re unable to resolve or if I’ve left something out, please leave a comment and I’ll try to help you.