loading...

CS614 Data Warehousing GDB Solution Spring 2014

Consider the following table structure:

ProductID (Primary key, Integer, Not NULL)

ProductName (String, Not NULL)

ManufacturingDate (Date, Not NULL)

ExpiryDate (Date, Not NULL)

Category (Integer, Not NULL, Refers to Category column in Category table)

UnitPrice (Float, Not NULL)

Question: Suppose we want to create index (primary) on this table. Identify the indexing technique, you think is best to consider in this case. Justify your answer with valid reasons.

Solution: Indexing is a data structure technique to efficiently retrieve records from database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to the one we see in books.

Indexing is defined based on its indexing attributes. Indexing can be one of the following types:

  • Primary Index: If index is built on ordering ‘key-field’ of file it is called Primary Index. Generally it is the primary key of the relation.
  • Secondary Index: If index is built on non-ordering field of file it is called Secondary Index.
  • Clustering Index: If index is built on ordering non-key field of file it is called Clustering Index.

Ordering field is the field on which the records of file are ordered. It can be different from primary or candidate key of a file.

An index improves performance for data retrieval but it reduces performance for DML operations. Since a row is inserted into table, a new key must be inserted into every index on table which increases strain on databases. Hence for transaction processing system where more records are inserted, fewer indexes are preferred. Whereas in query-intensive systems such as data warehousing you can create as many indexes as per your requirement.

There are many indexing techniques but most popular is B*Tree index and bit map index. B*tree is default index in most relational database systems.

B*Tree Index :
The top most level of the index is called the root and the lowest level is called the leaf node. All other levels in between are called branches. Both the root and branch contain entries that point to the next level in the index. A leaf node consists of the index key and pointers pointing to the physical location (i.e., row ids) in which the corresponding records are stored. This index is helpful if query has ‘=’ keyword since a particular value is searched in tree structure at leaf node. But it decreases performance if keywords such as ‘>’, ‘<’ ,’>=’ , ‘<=’ exist in query because every node is searched considering these range values. Also keyword NULL leads to full table scan in B*tree indexes.

E.g. Select * from products where product name is NULL.
This query leads to full table scan. Thus B*Tree is useful where table has high cardinality (distinct columns values) .

An index improves performance for data retrieval but it reduces performance for DML operations. Since a row is inserted into table, a new key must be inserted into every index on table which increases strain on databases. Hence for transaction processing system where more records are inserted, fewer indexes are preferred. Whereas in query-intensive systems such as data warehousing you can create as many indexes as per your requirement.

There are many indexing techniques but most popular is B*Tree index and bit map index. B*tree is default index in most relational database systems.

B*Tree Index :
The top most level of the index is called the root and the lowest level is called the leaf node. All other levels in between are called branches. Both the root and branch contain entries that point to the next level in the index. A leaf node consists of the index key and pointers pointing to the physical location (i.e., row ids) in which the corresponding records are stored. This index is helpful if query has ‘=’ keyword since a particular value is searched in tree structure at leaf node. But it decreases performance if keywords such as ‘>’, ‘<’ ,’>=’ , ‘<=’ exist in query because every node is searched considering these range values. Also keyword NULL leads to full table scan in B*tree indexes.

E.g. Select * from products where product name is NULL.
This query leads to full table scan. Thus B*Tree is useful where table has high cardinality (distinct columns values) .

DOWNLOAD SOLUTION HERE
loading...