Database Normalization Part 1
When creating a database, it is often useful to learn from the mistakes of
others. Within the Relational Database Model, a set of rules has been established
to aid in the design of tables that are meant to be connected through relationships.
This set of rules is known as Normalization.
In this Tutorial we will address the first of five normal forms, and
the ways in which normalizing your database will help prevent problems as you
add to it.
The concept of database normalization is not unique to any particular Relational
Database Management System. It can be applied to any of several implications
of relational databases including Microsoft Access, dBase, Oracle, etc. The
benefits of Normalizing your database include:
- Avoiding repetitive entries</li>
- Reducing required storage space</li>
- Preventing the need to restructure existing tables to accommodate new data.</li>
- Increased speed and flexibility of queries, sorts, and summaries.</li>
There are five normal forms in all, each progressively building on its predecessor.
In order to reach peak efficiency, it is recommended that relational databases
be normalized through at least the third normal form. In order to normalize
a database, each table should have a primary key field that uniquely identifies
each record in that table. A primary key can consist of a single field (an ID
Number field for instance) or a combination of two or more fields that together
make a unique key (called a multiple field primary key).
The First Normal Form
For a table to be in first normal form, data must be broken up into the smallest
units possible. For example, the following table is not in first normal form.
| Name
| Address
| Phone
|
| Sally Singer | 123 Broadway New York, NY, 11234 | (111) 222-3345 |
| Jason Jumper | 456 Jolly Jumper St. Trenton NJ, 11547 | (222) 334-5566 |
To conform to first normal form, this table would require additional fields.
The name field should be divided into first and last name and the address should
be divided by street, city state, and zip like this.
| ID
| First
| Last
| Street
| City
| State
| Zip
| Phone
|
| 564 | Sally | Singer | 123 Broadway | New York | NY | 11234 | (111) 222-3345 |
| 565 | Jason | Jumper | 456 Jolly Jumper St. | Trenton | NJ | 11547 | (222) 334-5566 |
In addition to breaking data up into the smallest meaningful values, tables
in first normal form should not contain repetitions groups of fields such as
in the following table.
| Rep ID
| Representative
| Client 1
| Time 1
| Client 2
| Time 2
| Client 3
| Time 3
|
| TS-89 | Gilroy Gladstone | US Corp. | 14 hrs | Taggarts | 26 hrs | Kilroy Inc. | 9 hrs |
| RK-56 | Mary Mayhem | Italiana | 67 hrs | Linkers | 2 hrs | | |
The problem here is that each representative can have multiple clients not
all will have three. Some may have less as is the case in the second record,
tying up storage space in your database that is not being used, and some may
have more, in which case there are not enough fields. The solution to this is
to add a record for each new piece of information.
| Rep ID
| Rep First Name
| Rep Last Name
| Client
| Time With Client
|
| TS-89 | Gilroy | Gladstone | US Corp | 14 hrs |
| TS-89 | Gilroy | Gladstone | Taggarts | 26 hrs |
| TS-89 | Gilroy | Gladstone | Kilroy Inc. | 9 hrs |
| RK-56 | Mary | Mayhem | Italiana | 67 hrs |
| RK-56 | Mary | Mayhem | Linkers | 2 hrs |
Notice the splitting of the first and last name fields again.
This table is now in first normal form. Note that by avoiding repeating groups
of fields, we have created a new problem in that there are identical values
in the primary key field, violating the rules of the primary key. In order to
remedy this, we need to have some other way of identifying each record. This
can be done with the creation of a new key called client ID.
| Rep ID*
| Rep First Name
| Rep Last Name
| Client ID*
| Client
| Time With Client
|
| TS-89 | Gilroy | Gladstone | 978 | US Corp | 14 hrs |
| TS-89 | Gilroy | Gladstone | 665 | Taggarts | 26 hrs |
| TS-89 | Gilroy | Gladstone | 782 | Kilroy Inc. | 9 hrs |
| RK-56 | Mary | Mayhem | 221 | Italiana | 67 hrs |
| RK-56 | Mary | Mayhem | 982 | Linkers | 2 hrs |
This new field can now be used in conjunction with the Rep ID field to create
a multiple field primary key. This will prevent confusion if ever more than
one Representative were to serve a single client.
This now sets us up for our next installment, the second and third normal forms.