What is a Clustered File?
When records that are frequently used together are placed physically together, more of them fall in the same page.
Fewer pages then have to be retrieved, which reduces the number of disk accesses and in turn gives better performance.
This method of storing logically related records physically together is called clustering.
E.g.: Consider the CUSTOMER table shown below.
Cust-ID Cust-Name Cust-City …
1001 Raj D Bangalore
1002
1003
If queries that retrieve Customers with consecutive Cust_IDs occur frequently in the application, clustering on Cust_ID will help improve the performance of these queries.
This can be explained as follows:
Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes).
Page:   | 128 | 128 | 128 | 128 | 128 | 128 | 128 | 128 |   128 bytes * 8 = 1024 bytes
Record: each cell is one record with a size of 128 bytes
If there is no clustering, it can be assumed that the Customer records are stored at random physical locations.
In the worst-case scenario, each record may be placed in a different page.
Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 1001 to 1100) will require 100 pages to be accessed, which translates to 100 disk accesses.
But, if the records are clustered, a page can contain 8 records.
Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13 (assuming the range starts at a page boundary), i.e., only 13 disk accesses will be required to obtain the query results.
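The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the example's numbers, not part of any File Manager: the function names are made up here, and the sketch assumes records do not span pages and that the clustered range starts at a page boundary.

```python
import math

PAGE_SIZE = 1024    # bytes per page, as assumed in the example
RECORD_SIZE = 128   # bytes per Customer record, as assumed in the example


def records_per_page(page_size, record_size):
    # Whole records that fit in one page (records do not span pages).
    return page_size // record_size


def pages_clustered(num_records, page_size=PAGE_SIZE, record_size=RECORD_SIZE):
    # Consecutive records are packed together, so pages are filled one by one.
    return math.ceil(num_records / records_per_page(page_size, record_size))


def pages_unclustered_worst_case(num_records):
    # Worst case without clustering: every record sits in a different page.
    return num_records


print(records_per_page(PAGE_SIZE, RECORD_SIZE))   # 8
print(pages_clustered(100))                       # 13
print(pages_unclustered_worst_case(100))          # 100
```

For the 100-record query this gives 13 page accesses with clustering against 100 in the unclustered worst case.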
When not to use clustering?
When the record size and page size are such that a page can contain only one record, clustering brings no benefit, because every record still needs its own page access.
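A rough way to check this condition, again as a hypothetical sketch (the function name and the 600-byte record size are assumptions chosen for illustration, and records are assumed not to span pages):

```python
def clustering_can_help(page_size, record_size):
    # Clustering only reduces page accesses if at least two
    # whole records fit in a single page.
    return page_size // record_size > 1


print(clustering_can_help(1024, 128))  # True: 8 records fit in a page
print(clustering_can_help(1024, 600))  # False: only 1 record fits in a page
```

With a 600-byte record and a 1024-byte page, 100 consecutive records would still need 100 page accesses, clustered or not.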