Database Normalization
Database normalization is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.
Codd introduced the concept of normalization and what is now known as the first normal form (1NF) in 1970. Codd went on to define the second normal form (2NF) and third normal form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd normal form (BCNF) in 1974.
Some normal forms (from least normalized to most normalized) are:
- 1NF ( First Normal Form) (1970)
- 2NF (Second Normal Form) (1971)
- 3NF (Third Normal Form) (1971)
- BCNF (Boyce-Codd Normal Form) (1974)
- 4NF (Fourth Normal Form) (1977)
- 5NF (Fifth Normal Form) (1979)
- 6NF (Sixth Normal Form) (2003)
Database Normalization With Examples -
1NF (First Normal Form) -
To satisfy 1NF,
- It should only have single(atomic) valued attributes/columns i.e should not be composite or multi-valued attribute.
- Each record needs to be unique.
Example -
In the above table there are multiple subject for single row which shows multi-valued attribute. So, it is not in 1NF.
To change into 1NF we write each subject in a seperate row.
2NF (Second Normal Form) -
- Should be in first normal form.
- There should not be partial dependency.
Partial Dependency — If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.
Non-Prime Attribute — Attributes of the relation which does not exist in any of the possible candidate keys of the relation, such attributes are called non prime attributes.
Prime Attribute — Attributes of the relation which exist in any of the possible candidate keys of the relation, such attributes are called prime attributes.
Example -
Let’s assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
candidate key {TEACHER_ID, SUBJECT}
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That’s why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
3NF (Third Normal Form) -
- Should be in second normal form.
- There should not be transitive functional dependencies.
A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X –> Y
- X is a super key.
- Y is a prime attribute (each element of Y is part of some candidate key).
Transitive dependency — If A->B and B->C are two FDs then A->C is called transitive dependency.
Example -
Super key in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}….so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.
That’s why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
Boyce-Codd Normal Form (BCNF) –
A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.
Example -
In the above table Functional dependencies are as follows:
EMP_ID->EMP_COUNTRY
EMP_DEPT->{DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
Functional dependencies:
EMP_ID->EMP_COUNTRY
EMP_DEPT->{DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.