It's All About ORACLE

Oracle - The number one Database Management System. Hope this Blog will teach a lot about oracle.

Database Normalization Concepts

Description

Database normalization is the process of organizing the attributes and tables of a relational database to minimize data redundancy.

Normalization involves re-factoring a table into smaller (and less redundant) tables but without losing information; defining foreign keys in the old table referencing the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.

Normalization can be described as:

  • It is a technique for designing relational database tables to minimize duplication of information.
  • Normalization is a practice to safeguard the database against logical and structural anomalies.
  • Normalization is also termed as Canonical synthesis by the experts.
  • It is used to keep data consistent and check that no loss of data as well as integrity is there.
  • Its complexity may lead to higher degree of join operations which sometimes lead to the degraded throughput times.

Normalization Objective


  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

Database Modification Anomalies

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:


Update Anomaly:
The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each skill). If the update is not carried through successfully—if, that is, the employee's address is updated on some records but not others—then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.


An update anomaly. Employee 519 is shown as having different addresses on different records.


Insertion Anomaly:
There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code—thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.
 An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.

Deletion Anomaly:
Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member, unless we set the Course Code to null in the record itself. This phenomenon is known as a deletion anomaly.
deletion anomaly. All information about Dr. Giddens is lost if he temporarily ceases to be assigned to any courses.

First Normal Form (1NF)

As per First Normal Form, no two Rows of data must contain repeating group of information i.e each set of column must have a unique value, such that multiple columns cannot be used to fetch the same row. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
The Primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example consider a table which is not in First normal form

Student Table :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths

In First Normal Form, any row must not have a column in which more than one value is saved, like separated with commas. Rather than that, we must separate such data into multiple rows.

Student Table following 1NF will be:
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique.  

Second Normal Form (2NF)

As per the Second Normal Form there must not be any partial dependency of any column on primary key. It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second normal form.

In example of First Normal Form there are two rows for Adam, to include multiple subjects that he has opted for. While this is searchable, and follows First normal form, it is an inefficient use of space. Also in the above Table in First Normal Form, while the candidate key is {Student, Subject}, Age of Student only depends on Student column, which is incorrect as per Second Normal Form. To achieve second normal form, it would be helpful to split out the subjects into an independent table, and match them up using the student names as foreign keys

New Student Table following 2NF will be :
Student Age
Adam 15
Alex 14
Stuart 17
In Student Table the candidate key will be Student column, because all other column i.e Age is dependent on it.

New Subject Table introduced for 2NF will be :
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths

In Subject Table the candidate key will be {Student, Subject} column. Now, both the above tables qualifies for Second Normal Form and will never suffer from Update Anomalies. Although there are a few complex cases in which table in Second Normal Form suffers Update Anomalies, and to handle those scenarios Third Normal Form is there.

Third Normal Form (3NF)

Third Normal form applies that every non-prime attribute of table must be dependent on primary key. The transitive functional dependency should be removed from the table. The table must be in Second Normal form. For example, consider a table with following fields.

Student_Detail Table:
Student_id Student_name DOB Street city State Zip
In this table Student_id is Primary key, but street, city and state depends upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.

New Student_Detail Table :
Student_id Student_name DOB Zip

Address Table:
Zip Street city state
The advantage of removing transtive dependency is,
  • Amount of data duplication is reduced.
  • Data integrity achieved. 

Boyce and Codd Normal Form (BCNF)
 Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anamoly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.

Normalizing an Example Table

These steps demonstrate the process of normalizing a fictitious student table.
  1. Unnormalized table: 

    Student#AdvisorAdv-RoomClass1Class2Class3
    1022Jones412101-07143-01159-02
    4123Smith216201-01211-02214-01
  2. First Normal Form: No Repeating Groups

    Tables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design trouble. 

    Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is with a one-to-many relationship, do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:

    Student#AdvisorAdv-RoomClass#
    1022Jones412101-07
    1022Jones412143-01
    1022Jones412159-02
    4123Smith216201-01
    4123Smith216211-02
    4123Smith216214-01
  3. Second Normal Form: Eliminate Redundant Data

    Note the multiple Class# values for each Student# value in the above table. Class# is not functionally dependent on Student# (primary key), so this relationship is not in second normal form.

    The following two tables demonstrate second normal form: 

    Students:

    Student#AdvisorAdv-Room
    1022Jones412
    4123Smith216


    Registration:

    Student#Class#
    1022101-07
    1022143-01
    1022159-02
    4123201-01
    4123211-02
    4123214-01
  4. Third Normal Form: Eliminate Data Not Dependent On Key

    In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:

    Students:


    Student#Advisor
    1022Jones
    4123Smith


    Faculty:

    NameRoomDept
    Jones41242
    Smith21642

0 comments:

You Might Also Like

Related Posts with Thumbnails

Pages