Monday, 9 February 2015

Clustered Files

What is a Clustered File?



When records that are frequently used together are placed physically together, more of the records a query needs will be in the same page.


Hence fewer pages have to be retrieved, which reduces the number of disk accesses and in turn gives better performance.

This method of storing logically related records physically together is called clustering.

E.g.: Consider the CUSTOMER table shown below.

Cust_ID   Cust_Name   Cust_City   …
1001      Raj D       Bangalore
1002
1003


If queries retrieving Customers with consecutive Cust_IDs occur frequently in the application, clustering based on Cust_ID will help improve the performance of these queries.

This can be explained as follows:

Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes).

Page:  | 128 | 128 | 128 | 128 | 128 | 128 | 128 | 128 |      128 bytes * 8 = 1024 bytes

Each cell is one record (each record with a size of 128 bytes).

If there is no clustering, it can be assumed that the Customer records are stored at random physical locations.
In the worst-case scenario, each record may be placed in a different page.
Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 1001 to 1100) will require 100 pages to be accessed, which in turn translates to 100 disk accesses.
But if the records are clustered, a page can contain 8 records.
Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13, i.e., only 13 disk accesses will be required to obtain the query results.
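
The arithmetic above can be checked with a small Python sketch. The function names are made up for illustration, and the clustered figure assumes that the first requested record starts at the beginning of a page and that records do not span pages.

```python
import math

PAGE_SIZE = 1024    # bytes, page size from the example
RECORD_SIZE = 128   # bytes, Customer record size from the example


def records_per_page(page_size: int, record_size: int) -> int:
    """Whole records that fit in one page (records do not span pages)."""
    return page_size // record_size


def pages_unclustered_worst_case(num_records: int) -> int:
    """Worst case without clustering: every record sits in a different page."""
    return num_records


def pages_clustered(num_records: int, page_size: int, record_size: int) -> int:
    """Pages read when consecutive records are packed together.

    Assumption: the first record of the range starts at a page boundary.
    """
    return math.ceil(num_records / records_per_page(page_size, record_size))


print(records_per_page(PAGE_SIZE, RECORD_SIZE))        # 8
print(pages_unclustered_worst_case(100))               # 100
print(pages_clustered(100, PAGE_SIZE, RECORD_SIZE))    # 13
```

If the range does not start at a page boundary, one extra page may have to be read, so the count should be treated as a best case for clustered storage.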

When not to use clustering?
When the record size and page size are such that a page can contain only one record, clustering cannot reduce the number of pages to be accessed.
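
A minimal Python sketch of this check follows. The helper names and the 600-byte record size are hypothetical, chosen only to show a case where a 1024-byte page holds a single record.

```python
def fits_per_page(page_size: int, record_size: int) -> int:
    """Whole records that fit in one page (records do not span pages)."""
    return page_size // record_size


def clustering_can_reduce_page_reads(page_size: int, record_size: int) -> bool:
    """Clustering can only save page reads if a page holds more than one record."""
    return fits_per_page(page_size, record_size) > 1


# 128-byte records in a 1024-byte page: 8 per page, clustering can help.
print(clustering_can_reduce_page_reads(1024, 128))   # True

# Hypothetical 600-byte records: only 1 fits per page, so every record
# costs one page access whether or not the file is clustered.
print(clustering_can_reduce_page_reads(1024, 600))   # False
```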

