Monday, 9 February 2015

Organization of data in Clustered Files

Organization of data in Clustered Files

Extensive  empirical  study  has  shown  that  pages  are  usually  at  about  67  percent occupancy.  Thus  the  number  of  physical  data  pages  is  about  1.5B,  and  we  use  this observation in following analysis.


Scan:

The  cost  of  a  scan  is 1.5B  (D  +  RC)
because  all  data  pages  must  be   examined;  this  is similar to sorted files.

Search with Equality Selection:

If several records qualify (e.g., "Find all employees aged 18"), they are  guaranteed to be
adjacent  to  each  other  due to the sorting  on  age,  and  so  the  cost of  retrieving  all  such
records is the  cost of locating  the first such record (Dlog p1.5B + Clog R) plus  the cost  of reading all the qualifying records in sequential order.


Search with Range Selection:

Again assuming that the range selection matches the composite key, the first record that satisfies the selection is located as it is for search with equality.
Subsequently, data pages are sequentially retrieved (using the next and previous links at the leaf level) until a record is found that does not satisfy the range selection

S + number of matching pages

Insert:

To  insert  a  record,  we  must  first  find  the  correct  leaf  page  in  the index,  reading  every page from root to leaf.
Then, we must add the new record.
Most of  the time, the leaf  page has sufficient  space  for the new record, and all we  need to do is to write out the modified leaf page.
Occasionally, the  leaf is full and  we  need  to retrieve and modify other  pages, but this is sufficiently rare that we can ignore it in this simplified analysis.

Delete:
We  must  search  for  the  record,  remove  the  record  from  the  page,  and  write  the modified page back.


No comments:

Post a Comment