This is a tutorial for those who are confused about the normal forms due to the extreme confusion you find on the web about the subject. If you want to know what normalization is and why to do it, wikipedia has a great article detailing this information:
http://en.wikipedia.org/wiki/Database_normalization
1NF
1NF is arguably the most ambiguous and confusing normal form on the web. The first normal form is just about making multi-valued fields organized into multiple rows. There are two types of multi-valued fields in unnormalized tables:
This-
Id | Student | Subject 1 | Subject 2 |
1 | Harry | Charms | Potions |
2 | Ron | Charms | Potions |
And this-
Id | Student | Subjects |
1 | Harry | Charms, Potions |
2 | Ron | Charms, Potions |
Or as is shown in most examples, this-
Id | Student | Subject |
1 | Harry | Charms |
| | Potions |
2 | Ron | Charms |
| | Potions |
In both cases we are tying to shove in a list of values, that is a one-to-many or many-to-many relationship with the row, into the row itself. For reasons detailed in the wikipedia article, this should be fixed by creating a separate row for each value and repeating the values in the others fields, like so:
Id | Student | Subject |
1 | Harry | Charms |
1 | Harry | Potions |
2 | Ron | Charms |
2 | Ron | Potions |
The table is now in 1NF were it not for the primary key losing its uniqueness. The id field must be unique in order to identify each distinct student. To solve this we can do one of two things:
Either we create an additional key field for the subjects and make the primary key of the whole table be a composite key of both student and subject ids, thus making the composite key unique-
StudId | Student | SubjId | Subject |
1 | Harry | 1 | Charms |
1 | Harry | 2 | Potions |
2 | Ron | 1 | Charms |
2 | Ron | 2 | Potions |
Or we could just prepare for the other normal forms and start splitting the tables now as is usually done in examples-
StudId | Student | SubjId |
1 | Harry | 1 |
1 | Harry | 2 |
2 | Ron | 1 |
2 | Ron | 2 |
SubjId | Subject |
1 | Charms |
2 | Potions |
Notice that we still need to make the foreign key in the students table part of the primary key in order to preserve uniqueness. If you're wondering why we didn't use a weak entity (bridge table / junction table), that's because it's done in 2NF.
If we had more than one multi-valued field, we'd just create a Cartesian product, like so:
StudId | Student | SubjId | Subject | TeacId | Teacher |
1 | Harry | 1 | Charms | 1 | Filius |
1 | Harry | 2 | Potions | 2 | Slughorn |
1 | Harry | 2 | Potions | 3 | Snape |
2 | Ron | 1 | Charms | 1 | Filius |
2 | Ron | 2 | Potions | 2 | Slughorn |
2 | Ron | 2 | Potions | 3 | Snape |
Resulting in 3 tables:
StudId | Student | SubjId | TeacId |
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
SubjId | Subject |
1 | Charms |
2 | Potions |
TeacId | Teacher |
1 | Filius |
2 | Slughorn |
3 | Snape |
Complete example:
Id | Student | SubjId | Subject | Room | RoomHall | TeacId | Teacher |
1 | Harry | 1 | Charms | 101 | A | 1 | Filius |
| | 2 | Potions | 202 | B | 2 | Slughorn |
| | | | | | 3 | Snape |
2 | Ron | 1 | Charms | 101 | A | 1 | Filius |
| | 2 | Potions | 202 | B | 2 | Slughorn |
| | | | | | 3 | Snape |
Turns into:
StudId | Student | SubjId | Subject | Room | RoomHall | TeacId | Teacher |
1 | Harry | 1 | Charms | 101 | A | 1 | Filius |
1 | Harry | 2 | Potions | 202 | B | 2 | Slughorn |
1 | Harry | 2 | Potions | 202 | B | 3 | Snape |
2 | Ron | 1 | Charms | 101 | A | 1 | Filius |
2 | Ron | 2 | Potions | 202 | B | 2 | Slughorn |
2 | Ron | 2 | Potions | 202 | B | 3 | Snape |
Notice that the room is assumed to be dependant on the subject such that each subject is taught in its own room.
We may opt to leave the table as it is as it quite complex to break down into smaller tables without any guidance. However if we are to break the table down, the room information would be included in with the subject table since we said that the room is dependant on it, yielding the following:
StudId | Student | SubjId | TeacId |
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
SubjId | Subject | Room | RoomHall |
1 | Charms | 101 | A |
2 | Potions | 202 | B |
TeacId | Teacher |
1 | Filius |
2 | Slughorn |
3 | Snape |
2NF
2NF is only applicable on tables with composite keys. If a table does not have a composite key, then it is already in 2NF. To make a table in 2NF, first you make sure it is in 1NF and then you split it into separate tables depending on which part of the composite key the fields depend on. For example in the students' table above, the student name does not depend on the subject id, it depends only on part of the composite key, that is, the student id. So the student name field should go into a separate table which describes the students (together with the primary key of course).
StudId | Student |
1 | Harry |
2 | Ron |
SubjId | Subject |
1 | Charms |
2 | Potions |
And there is it, the weak entity we are so used to in many to many relationships.
Complete example (following from previous):
StudId | Student | SubjId | TeacId |
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
Turns into:
StudId | SubjId | TeacId |
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
1 | Harry |
2 | Ron |
Hence yielding:
StudId | SubjId | TeacId |
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
1 | Harry |
2 | Ron |
SubjId | Subject | Room | RoomHall |
1 | Charms | 101 | A |
2 | Potions | 202 | B |
TeacId | Teacher |
1 | Filius |
2 | Slughorn |
3 | Snape |
Notice that if in 1NF we did not break down the table, we'd result with the same set of tables by now.
3NF
3NF is the normal form we are used to. All we do is check that every field in a 2NF table depends directly on the primary key. If it doesn't or if it depends on a non-primary key field, you place it in its own table. For example if we had the following table:
SubjId | Subject | Room | RoomHall |
1 | Charms | 101 | A |
2 | Potions | 202 | B |
The RoomHall field is directly dependent on the Room field and not on the SubjId primary key field, so the RoomHall field should go into a table on its own together with the Room field. In fact the room where the subject is thought is not a direct property of the subjects entity but is an entity on its own and hence should be separated into a rooms entity and only referenced by a foreign key.
SubjId | Subject | Room |
1 | Charms | 101 |
2 | Potions | 202 |
Complete example (following from previous):
SubjId | Subject | Room | RoomHall |
1 | Charms | 101 | A |
2 | Potions | 202 | B |
Turns into:
SubjId | Subject | Room |
1 | Charms | 101 |
2 | Potions | 202 |
Room | RoomHall |
101 | A |
202 | B |
Hence yielding:
StudId | SubjId | TeacId |
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
1 | Harry |
2 | Ron |
SubjId | Subject | Room |
1 | Charms | 101 |
2 | Potions | 202 |
Room | RoomHall |
101 | A |
202 | B |
TeacId | Teacher |
1 | Filius |
2 | Slughorn |
3 | Snape |
Links
http://portal.dfpug.de/dFPUG/Dokumente/Partner/Hentzenwerke/Visual%20FoxPro%20Certification%20Exam%20Study%20Guide%20Chapter%2002.pdf
Thank you ...
ReplyDeleteWhat about normalizing tables where we do not know the information before hand. Example say a server has multiple clients that can vary in number. Each client sends a message to another through server. A db table comprising SRC and dest for message is likely to have repeated fields since clients are dynamic in connection. Meaning message with SRC Id #1can send to Dest id #1,2,3,4 each on a separate row of table. SRC Id #3 can similarly send to fest #15,2,43, etc.
ReplyDeleteIt would still be normalized as long as you're using foreign keys. Normalization does not stop every form of repetition as foreign keys would still be repeated. You can see that in the example above, the "StudId | SubjId | TeacId" table has a lot of repetition but it is still normalized because the repetition is in the foreign keys. What's important is that fields in a table which are more closely related to each other than they are to the rest of the fields should be placed in their own table.
Delete