Wednesday, February 10, 2010

SQL Table Relations, Primary and Foreign Keys, and Normalization

So far in all SQL examples we had we were dealing with a single table. The truth is that in real life when dealing with databases you’ll have to work with many tables, which are interrelated. The true power of the Relational Database Management Systems is the fact that they are Relational.

The relationships in a RDBMS ensure that there is no redundant data. What is redundant data, you might ask? I’ll answer you with example. An online store, offers computers for sale and the easiest way to track the sales will be to keep them in a database. You can have a table called Product, which will hold information about each computer - model name, price and the manufacturer. You also need to keep some details about the manufacturer like their website and their support email. If you store the manufacturer details in the Product table, you will have the manufacturer contact info repeated for each computer model the manufacturer produces:

model Price Manufacturer ManufacturerWebsite ManufacturerEmail
Inspiron B120 $499 Dell http://www.dell.com support@dell.com
Inspiron B130 $599 Dell http://www.dell.com support@dell.com
Inspiron E1705 $949 Dell http://www.dell.com support@dell.com
Satellite A100 $549 Toshiba http://www.toshiba.com support@toshiba.com
Satellite P100 $934 Toshiba http://www.toshiba.com support@toshiba.com

To get rid of the redundant manufacturer data in the Product table, we can create a new table called Manufacturer, which will have only one entry (row) for each manufacturer and we can link (relate) this table to the Product table. To create this relation we need to add additional column in the Product table that references the entries in the Manufacturer table.

A relationship between 2 tables is established when the data in one of the columns in the first table matches the data in a column in the second table. To explain this further we have to understand SQL relational concepts – Primary Key and Foreign Key. Primary Key is a column or a combination of columns that uniquely identifies each row in a table.
Foreign Key is a column or a combination of columns whose values match a Primary Key in a different table. In the most common scenario the relationship between 2 tables matches the Primary Key in one of the tables with a Foreign Key in the second table. Consider the new Product and Manufacturer tables below:

Manufacturer
ManufacturerID Manufacturer ManufacturerWebsite ManufacturerEmail
1 Dell http://www.dell.com support@dell.com
2 Toshiba http://www.toshiba.com support@toshiba.com

Product
model Price ManufacturerID
Inspiron B120 $499 1
Inspiron B130 $599 1
Inspiron E1705 $949 1
Satellite A100 $549 2
Satellite P100 $934 2

The first table is Manufacturer which has 2 entries for Dell and Toshiba respectively. Each of these entries has a ManufacturerID value, which is unique integer number. Because the ManufacturerID column is unique for the Manufacturer table we can use it as a Primary Key in this table. The Product table retains the Model and the Price columns, but has a new column called ManufacturerID, which matches the values of the ManufacturerID column in the Manufacturer table. All values in the ManufacturerID column in the Product table have to match one of the values in the Manufacturer table Primary Key (for example you can’t have ManufacturerID with value of 3 in the Product table, simply because there is no manufacturer with this ManufacturerID defined in the Manufacturer table).

I’m sure you’ve noticed that we used the same name for the Primary Key in the first table as for the Foreign Key in the second. This was done on purpose to show the relationship between the 2 tables based on these columns. Of course you can call the 2 columns with different names, but if somebody sees your database for a first time it won’t be immediately clear that these 2 tables are related.

But how do we ensure that the Product table doesn’t have invalid entries like the last entry below:

model Price ManufacturerID
Inspiron B120 $499 1
Inspiron B130 $599 1
Inspiron E1705 $949 1
Satellite A100 $549 2
Satellite P100 $934 2
ThinkPad Z60t $849 3

We do not have a manufacturer with ManufacturerID of 3 in our Manufacturer table, hence this entry in the Product table is invalid. The answer is that you have to enforce referential integrity between the 2 tables. Different RDBMS have different ways to enforce referential integrity, and I will not go into more details as this is not important to understand the concept of relationship between tables.

There are 3 types of relations between tables – One-To-Many, Many-To-Many and One-To-One. The relation we created above is One-To-Many and is the most common of the 3 types. In One-To-Many relation a row in one of the tables can have many matching rows in the second table, but a row the second table can match only one row in the first table.

In our example, each manufacturer (a row in the Manufacturer table) produces several different computer models (several rows in the Product table), but each particular product (a row in the Product table) has only one manufacturer (a row in the Manufacturer table).

The second type is the Many-To-Many relation. In this relation many rows from the first table can match many rows in the second and the other way around. To define this type of relation you need a third table whose primary key is composed of the 2 foreign keys from the other 2 table. To clarify this relation lets review the following example. We have a Article table (ArticleID is primary key) and Category (CategoryID is primary key) table.
Every article published in the Article table can belong to multiple categories. To accommodate that, we create a new table called ArticleCategory, which has only 2 columns – ArticleID and CategoryID (these 2 columns form the primary key for this table). This new table called sometimes junction table defines the Many-To-Many relationship between the 2 main tables. One article can belong to multiple categories, and every category may contain more than one article.

In the One-To-One relation each row in the first table may match only one row in the second and the other way around. This relationship is very uncommon simply because if you have this type of relation you may as well keep all the info in one single table.

By dividing the data into 2 tables we successfully removed the redundant manufacturer details from the initial Product table adding an integer column referencing the new Manufacturer table instead.

The process of removing redundant data by creating relations between tables is known as Normalization. Normalization process uses formal methods to design the database in interrelated tables.

No comments:

Post a Comment

Followers