CS614 Data Warehousing Assignment 4 Solution Spring 2013

Reference:

Please refer to the case study given in assignment no. 1

Scenario:

Consider the following table (Tickets) taken from second assignment solution.

PFirstName PMiddleName PLastName PCellno TicketID TicketType FlightID
irshad Ali Khan 92314713 006 b pk203
ebrahim Khan Yusafzai 92314630 052 e pk507
arif Raza Awan 92333553 090 b pk307
abdul Wahab Khan 92321553 012 e pk922
Ibrahim Khan Yusafzai 92314630 090 b pk654
muhammad Ali Raza 92334544 082 b pk507
Arshad Ali Khan 92314713 001 e pk614

You will have to apply all the steps of Basic Sorted Neighborhood Method (BSN) by using two keys to find the most significant key (among these two) along with the similar records identified using this key. Keys will be formed as follows:

Key1:

Key-1 will comprise of six (6) characters, concatenated in the following sequence:

First two characters from passenger first name, then first two characters from passenger middle name and then first two characters from passenger last name.

Key2:

Key-2 will comprise (concatenation) of six (6) characters, concatenated in the following sequence:

First two characters from passenger middle name, then first two characters from passenger last name and then first two characters from passenger first name.

Question:

Apply all the steps of BSN method to find the most significant key among the two specified above. As you know, BSN method comprises of three steps, so for each key:

a) In step-1 you will show the key value against each record. You can append extra column at the end of the table to show the key value against each record.

b) In step-2 you will show the sorted records (complete sorted table) on the basis of that key.

c) In third step you will show the similar records identified on the basis of that key.

Finally you will mention the most significant key (the key that identifies more similar records than other).

Note: Consider the window size (i.e. w) equal to two (2). Two records (passengers) are considered to be same if the PCellno is same.

Solution:

A

 PFirstName PmiddleName PLastName PCellno Ticket ID Ticket Type Flight ID Key 1 Key 2 Irshad Ali khan 92314713 006 b pk203 Iralkh Alkhir ebrahim Khan yusafzai 92314630 052 e pk507 ebkhyu Khyueb Arif Raza awan 92333553 090 b pk307 Arraaw Raawar Abdul Wahab khan 92321553 012 e pk922 Abwakh Wakhab Ibrahim Khan yusafzai 92314630 090 b pk654 Ibkhyu Khyuib muhammad Ali raza 92334544 082 b pk507 Mualra Alramu Arshad Ali khan 92314713 001 e pk614 Aralkh Alkhar

B

 PFirstName PMiddleName PLastName PCellno Ticket ID Ticket Type Flight ID Key 1 Irshad Ali khan 92314713 006 b pk203 Iralkh Arshad Ali Khan 92314713 001 e Pk614 Aralkh Ebrahim Khan Yusafzai 92314630 052 e Pk507 ebkhyu Ibrahim Khan Yousafzai 92314630 090 b Pk654 Ibkhyu Arif Raza Awan 92333553 090 b Pk307 Arraw Abdul Wahab Khan 92321553 012 e Pk922 Abwakh Muhammad Ali raza 92334544 082 b Pk507 mualara

B

 PFirstName PmiddleName PLastName PCellno Ticket ID Ticket Type Flight ID Key 2 Irshad Ali khan 92314713 006 b pk203 Alkhir Arshad Ali khan 92314713 001 e pk614 Alkhar ebrahim Khan yusafzai 92314630 052 e pk507 Khyueb Ibrahim Khan yusafzai 92314630 090 b pk654 Khyuib Arif Raza awan 92333553 090 b pk307 Raawar Abdul Wahab khan 92321553 012 e pk922 Wakhab muhammad Ali raza 92334544 082 b pk507 Alramu

C Similar record

 PFirstName PMiddleName PLastName PCellno Ticket ID Ticket Type Flight ID Key 1 Irshad Ali khan 92314713 006 b pk203 Iralkh Arshad Ali Khan 92314713 001 e Pk614 Aralkh Ebrahim Khan Yusafzai 92314630 052 e Pk507 ebkhyu Ibrahim Khan Yousafzai 92314630 090 b Pk654 Ibkhyu

C Similar record

 PFirstName PmiddleName PLastName PCellno Ticket ID Ticket Type Flight ID Key 2 Irshad Ali khan 92314713 006 b pk203 Alkhir Arshad Ali khan 92314713 001 e pk614 Alkhar ebrahim Khan yusafzai 92314630 052 e pk507 Khyueb Ibrahim Khan yusafzai 92314630 090 b pk654 Khyuib

Key 2 is the most significant