CS614 Assignment 3 Solution Feb 2015

Consider the applicant table below:

Applicant_Info



Apply all three steps of the Basic Sorted Neighborhood (BSN) method to find the duplicate records in the table. Records are considered duplicates if they have the same value in the “Applicant_id” column.

Use the following rules for the key:


The key consists of the first three characters of “Applicant_id”, followed by the first three characters of “Applicant_Name” and then the first two characters of the “father_Name” column.

The BSN method comprises the three steps given below:

a) Create key

In step-1, you will create the key for each record according to the rules given above. For this, you can add an extra column at the end of the table to show the key created for each record.
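As an illustration of the key rule, here is a minimal Python sketch. The column names come from the assignment; the function name make_key and the two sample rows are hypothetical and are not taken from the applicant table.

```python
def make_key(record):
    """Build the BSN key: 3 chars of id + 3 chars of name + 2 chars of father's name."""
    return (
        str(record["Applicant_id"])[:3]
        + record["Applicant_Name"][:3]
        + record["father_Name"][:2]
    )


# Hypothetical rows for illustration only (not the assignment's data).
records = [
    {"Applicant_id": "MC1001", "Applicant_Name": "Ali Raza", "father_Name": "Raza Khan"},
    {"Applicant_id": "BC2002", "Applicant_Name": "Hina Tariq", "father_Name": "Tariq Shah"},
]

# Add the key as an extra column on each record.
for record in records:
    record["Key"] = make_key(record)

for record in records:
    print(record["Applicant_id"], record["Key"])
```

Slicing never fails on short values, so a name with fewer than three characters simply contributes fewer characters to the key.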

b) Sort the data

In step-2, you will sort the records on the basis of the key created in step-1.
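The sorting step can be sketched in Python as follows. The keyed rows are hypothetical examples and the variable names are assumptions. A plain character-by-character (lexicographic) ordering of the key is assumed, since the assignment does not specify a collation.

```python
# Hypothetical records that already carry a key column (not the assignment's data).
keyed_records = [
    {"Applicant_id": "MC1003", "Key": "MC1SarAh"},
    {"Applicant_id": "MC1001", "Key": "MC1AliRa"},
    {"Applicant_id": "BC2002", "Key": "BC2HinTa"},
    {"Applicant_id": "MC1001", "Key": "MC1AliRa"},
]

# sorted() is stable, so records with equal keys keep their original relative order.
sorted_records = sorted(keyed_records, key=lambda r: r["Key"])

for record in sorted_records:
    print(record["Key"], record["Applicant_id"])
```

After sorting, records with identical or near-identical keys sit next to each other, which is what makes the windowed comparison in the merge step possible.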

c) Merge

In step-3, use a window size (w) equal to two (2). You are required to identify the similar records on the basis of the sorted key.
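A minimal Python sketch of the merge step is given below. It assumes the usual sliding-window reading of BSN, in which each record is compared with the w - 1 records before it in sorted order, and it applies the assignment's duplicate rule (same “Applicant_id”). The function name find_duplicates and the sample rows are hypothetical.

```python
def find_duplicates(sorted_records, w=2):
    """Return index pairs (j, i) of records in the same window with equal Applicant_id."""
    pairs = []
    for i in range(len(sorted_records)):
        # Compare record i with the previous w - 1 records in sorted order.
        for j in range(max(0, i - (w - 1)), i):
            if sorted_records[j]["Applicant_id"] == sorted_records[i]["Applicant_id"]:
                pairs.append((j, i))
    return pairs


# Hypothetical records, already sorted by key (not the assignment's data).
sorted_by_key = [
    {"Applicant_id": "BC2002", "Key": "BC2HinTa"},
    {"Applicant_id": "MC1001", "Key": "MC1AliRa"},
    {"Applicant_id": "MC1001", "Key": "MC1AliRa"},
    {"Applicant_id": "MC1003", "Key": "MC1SarAh"},
]

duplicate_pairs = find_duplicates(sorted_by_key, w=2)
print(duplicate_pairs)
```

With w = 2, only adjacent records in the sorted list are compared, so duplicates that end up more than one position apart after sorting would be missed.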