Organization of data in Clustered Files
Extensive empirical study has shown that pages are usually at about 67 percent occupancy. Thus the number of physical data pages is about 1.5B, and we use this observation in following analysis.
Scan:
The cost of a scan is 1.5B (D + RC)
because all data pages must be examined; this is similar to sorted files.
Search with Equality Selection:
If several records qualify (e.g., "Find all employees aged 18"), they are guaranteed to be
adjacent to each other due to the sorting on age, and so the cost of retrieving all such
records is the cost of locating the first such record (Dlog p1.5B + Clog R) plus the cost of reading all the qualifying records in sequential order.
Search with Range Selection:
Again assuming that the range selection matches the composite key, the first record that satisfies the selection is located as it is for search with equality.
Subsequently, data pages are sequentially retrieved (using the next and previous links at the leaf level) until a record is found that does not satisfy the range selection
S + number of matching pages
Insert:
To insert a record, we must first find the correct leaf page in the index, reading every page from root to leaf.Then, we must add the new record.
Most of the time, the leaf page has sufficient space for the new record, and all we need to do is to write out the modified leaf page.
Occasionally, the leaf is full and we need to retrieve and modify other pages, but this is sufficiently rare that we can ignore it in this simplified analysis.


No comments:
Post a Comment