A dimension is a fast changing or rapidly changing dimension if one or more of its attributes in the table changes very fast and in many rows. Handling rapidly changing dimension in data warehouse is very difficult because of many performance implications.
As you know slowly changing dimension type 2 is used to preserve the history for the changes. But the problem with type 2 is, with each and every change in the dimension attribute, it adds new row to the table. If in case there are dimensions that are changing a lot, table become larger and may cause serious performance issues. Hence, use of the type 2 may not be the wise decision to implement the rapidly changing dimensions.
For example: Consider patient dimension where there are 1000 rows in it. On average basis, each patient changes the 10 of attributes in a year. If you use the type 2 to manage this scenario, there will be 1000*10 = 10000 rows. Imagine if the table has 1 million rows, it’ll become very hard to handle the situation with type 2. Hence we use rapidly changing dimension approach.
Separate Rapidly Changing Attribute by Implementing Junk Dimension
Consider the fact table, in which not all the attributes of the table changes rapidly. There may be some attribute that may be changing rapidly and other not. The idea here is to separate the rapidly changing attribute from the slowly changing ones and move those attribute to another table called junk dimension and maintain the slowly changing attribute in same table. In this way, we can handle situation of increasing table size.
PATIENT
Patient_id
Name
Gender
Marital_status
Weight
BMI
The attribute like patient_id, Name, Gender, Marital_status will not change or changes very rarely. And attribute like weight and BMI (body mass index) changes every month based on the patient visit to hospital. So, we need to separate the weight column out of the patient table otherwise we end up filling the table if we use SCD type 2 on PATIENT dimension. We can put the weight column which is rapidly changing into junk dimension table. Below is the structure of Junk dimension table.
PATIENT_JNK_DIM
Pat_SK
Weight
BMI
Pat_SK is the surrogate key and acts as a primary key for junk dimension table. Below is how our PATIENT table looks after removing weight and BMI from it.
PATIENTPatient_idNameGenderMarital_status
Link Junk Dimension and PATIENT Table
In this step, we must link the junk dimension and patient table. Keep in mind; we cannot simply refer the junk dimension table by adding its primary key to patient table as foreign key. Because any changes made to junk dimension will have to reflect in the patient table, this obviously increases the data in patient dimension. Instead, we create one more table called mini dimension that acts as a bridge between Patient and Junk dimension. We can also add the columns such as start and end date to track the change history. Below is the structure of the mini dimension:
PATIENT_MINI_DIMPat_idPat_SKSTART_DATEEND_DATE
This table is just bridge between two tables and does not require any surrogate key in it. Below is the diagrammatic representation of the Rapidly Changing Dimension implementation.
PATIENT_MINI_DIMPat_idPat_SKSTART_DATEEND_DATE
This table is just bridge between two tables and does not require any surrogate key in it. Below is the diagrammatic representation of the Rapidly Changing Dimension implementation.
2 comments:
Animachi is really hard to recognize since you took over the website - what you made of it and I really appreciate your commitment to it
BCOM 1st Year TimeTable 2020
BCOM 2nd Year TimeTable 2020
BCOM 3rd Year TimeTable 2020
Post a Comment