Intro to Database Systems : Basic Perspectives on Disk and Buffer Management
The Database Management System stores information at 3 levels of the memory hierarchy:
- Primary storage - main memory (and cache) : for currently used data, it is fast and usually volatile.
- Secondary storage - magnetic (“hard”) disk : for persistent data, it is relatively slow and nonvolatile. It stores the main database.
- Tertiary storage - tape : for archival copies of older versions of the data, it is nonvolatile and the slowest of the three.
Now why can’t we store everything in the main memory, if it’s the fastest way? Because …
- it costs too much : at the prices used for this example, $100 buys 4 GB of RAM but 2000 GB of disk (500 times more)
- it is volatile : we want to save data between runs, not only at run time!
Why can’t we store everything on tape?
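- tape only allows sequential access, whereas disks allow random access: reaching a given piece of data on tape means winding past everything stored before it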
Data is transferred between disk and RAM in fixed-size units called disk blocks or pages. A disk sector is traditionally 512 bytes, while a DBMS page is usually a multiple of that, commonly a few kilobytes. We can read (from disk to RAM) or write (from RAM to disk) pages.
Seek time, the time needed to move the disk arm to the right track, is the most time-consuming part of accessing data on disk (from 1 to 20 msec). By comparison, accessing data in main memory takes on the order of nanoseconds.
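To put the gap between milliseconds and nanoseconds in numbers, here is a small back-of-the-envelope calculation. The seek times are the ones quoted above; the figure of 100 ns per memory access is an assumption chosen only for illustration, not a measurement.

```python
# Rough comparison of one disk seek with one main-memory access.
SEEK_TIMES_MS = (1, 20)    # seek time range quoted in the text
MEMORY_ACCESS_NS = 100     # assumption: about 100 ns per memory access

NS_PER_MS = 1_000_000


def seek_to_memory_ratio(seek_ms, memory_ns=MEMORY_ACCESS_NS):
    """Return how many memory accesses fit in the time of one seek."""
    return seek_ms * NS_PER_MS // memory_ns


for seek_ms in SEEK_TIMES_MS:
    ratio = seek_to_memory_ratio(seek_ms)
    print(f"{seek_ms} msec seek ~ {ratio:,} memory accesses")
```

Under that assumption a single seek costs as much as tens or hundreds of thousands of memory accesses, which is why a DBMS tries so hard to avoid going to disk.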
The lowest layer of the Database Management System manages how space is used on the disk. Higher levels, starting with the buffer manager, depend on this layer to:
- allocate/de-allocate a block of memory (page)
- read/write a block of memory (page)
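The two operations listed above can be sketched as a tiny page-level interface over an ordinary file. This is only an illustration: the class name `DiskManager`, its method names, and the 4096-byte `PAGE_SIZE` are hypothetical choices, not the API of any particular DBMS.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: page size in bytes, picked for illustration


class DiskManager:
    """Hypothetical page-level interface over a single file."""

    def __init__(self, path):
        # Create the file if it does not exist, then open it for read/write.
        if not os.path.exists(path):
            open(path, "wb").close()
        self.file = open(path, "r+b")
        self.num_pages = os.path.getsize(path) // PAGE_SIZE
        self.free_pages = []  # ids released by deallocate_page

    def allocate_page(self):
        """Return the id of a zero-filled page, reusing freed ids first."""
        if self.free_pages:
            page_id = self.free_pages.pop()
        else:
            page_id = self.num_pages
            self.num_pages += 1
        self.write_page(page_id, bytes(PAGE_SIZE))
        return page_id

    def deallocate_page(self, page_id):
        """Mark a page as reusable; its bytes stay in the file until reused."""
        self.free_pages.append(page_id)

    def read_page(self, page_id):
        self.file.seek(page_id * PAGE_SIZE)
        return self.file.read(PAGE_SIZE)

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(data)
        self.file.flush()

    def close(self):
        self.file.close()


# Usage: allocate one page, write to it, and read it back.
with tempfile.TemporaryDirectory() as tmp:
    disk = DiskManager(os.path.join(tmp, "demo.db"))
    page_id = disk.allocate_page()
    disk.write_page(page_id, b"hello".ljust(PAGE_SIZE, b"\0"))
    data = disk.read_page(page_id)
    disk.close()
```

Because every page has the same size, the position of a page in the file is simply its id multiplied by `PAGE_SIZE`, so no extra lookup structure is needed to find it.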
The buffer manager itself does 3 things:
- 1) serves requests for pages that are already in the RAM
- 2) brings pages (disk blocks) from the database on disk into the buffer cache (also called the buffer pool)
- 3) writes modified pages back to the disk.
When data has to be loaded from the disk:
- if there is an empty frame available in the pool, the buffer manager picks it
- if there is no empty frame in the pool, the buffer manager picks a frame for replacement
A frame can be chosen for replacement only if its pin counter is 0. For the purpose of loading a page, such a frame is treated like an empty one (after its old contents have been written back, if they were modified). Once the requested page has been loaded into the frame, its pin counter becomes 1.
Whenever the page held in a frame is requested again, the frame's pin counter is incremented. We can state this as a general rule: “when requesting a page that is already in the buffer, its pin counter is incremented”. After the operation is finished, we decrement it. If the page (the disk block) was modified, a dirty bit is set so that the buffer manager knows the frame must be written back to the disk (update) before it is reused.
If there are no empty frames, only unpinned pages (pin counter = 0) can be replaced by pages loaded from the disk; the replacement policy decides which of the unpinned frames is chosen. (This makes sense: if every frame is occupied and pinned, a new request has to wait until some transaction finishes with its page and decrements the pin. Only when a pin counter gets back to 0 is its frame unlocked and available again.)
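The pin counter and dirty bit rules can be put together in a minimal sketch. Everything here is an assumption made for illustration: the names `Frame`, `BufferPool`, `pin` and `unpin` are hypothetical, a plain dictionary stands in for the disk, the replacement policy is simply "first unpinned frame", and a dirty page is written back at the moment its frame is reused.

```python
class Frame:
    """One slot of the buffer pool (hypothetical structure)."""

    def __init__(self):
        self.page_id = None   # None marks an empty frame
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferPool:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk      # dict page_id -> bytes, standing in for the disk
        self.page_table = {}  # page_id -> Frame currently holding that page

    def _choose_frame(self):
        # Prefer an empty frame; otherwise take the first unpinned one.
        for frame in self.frames:
            if frame.page_id is None:
                return frame
        for frame in self.frames:
            if frame.pin_count == 0:
                return frame
        return None

    def pin(self, page_id):
        """Return the frame holding page_id, loading the page if needed."""
        frame = self.page_table.get(page_id)
        if frame is None:
            frame = self._choose_frame()
            if frame is None:
                raise RuntimeError("every frame is pinned; the request must wait")
            new_data = self.disk[page_id]  # read from "disk" before evicting
            if frame.page_id is not None:
                if frame.dirty:
                    # Write the modified page back before reusing the frame.
                    self.disk[frame.page_id] = frame.data
                del self.page_table[frame.page_id]
            frame.page_id, frame.data, frame.dirty = page_id, new_data, False
            self.page_table[page_id] = frame
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one pin; dirty=True records that the page was modified."""
        frame = self.page_table[page_id]
        if frame.pin_count == 0:
            raise ValueError("page is not pinned")
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty


# Usage: two frames, three pages. Page 1 is modified, unpinned, then evicted.
disk = {1: b"one", 2: b"two", 3: b"three"}
pool = BufferPool(num_frames=2, disk=disk)
first = pool.pin(1)
pool.pin(2)
first.data = b"ONE"
pool.unpin(1, dirty=True)
third = pool.pin(3)  # reuses the frame of page 1 after writing it back
```

In the usage above, page 3 can only be loaded because page 1 was unpinned first; had both frames still been pinned, `pin(3)` would have raised instead of evicting a page that is in use.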
A DBMS maintains its own buffer rather than relying on that of the OS so that it controls when pages leave the buffer, through the implementation of pin counters and replacement policies.