Intro to Database Systems : Indexing Part 2 - B+ trees

In the previous section, Indexing Part 1, we’ve seen that building an index for frequently used attributes considerably increases the efficiency of a query.

In this section we’ll discuss the most widely used index implementation: the B+ Tree.

b+.PNG

Each node of a B+ tree is a page, a block of data. A page is the transfer unit to disk.

We are already aware that a table spans on many blocks of data. We can picture this by having the tree analogous to the table, the nodes analogous to the blocks of data and we have to keep in mind that each block of data contains multiple rows.

For now we’ve talked about two things stored on a disk : the index and the data. The index (in blue below) points at the data (in green below). We clearly notice now that creating an index takes extra space on the disk.

btree-index1.png

Let’s analyze the leaves of a B+ tree. We notice that they are structured as a linked list with 2 pointers:

Having this tree structure (and not only a sequential linked list structure) helps for insertion and deletion complexities: they have a logarithmic running time.

Say we have a B+ tree with a height h = 2. In this case, 3 blocks of data will be accessed:

How do we recognize B+ tree?

Let’s say d is the number of references that a node has to its children.

In order to be a valid B+ tree, it has to respect the following invariants:

Let’s see how does inserting nodes works in a B+ tree. Say we have a node X. The main algorithm is the following :

Step 1: If node X has empty space, insert (key, ref) into the node.

Step 2: If node X already full:

Let’s go through a few examples. Assume we have 4 rows per page and we’ve inserted the following set of key values : 2, 3, 5, 7, 11, 17, 19, 23, 29, 31.

An empty node with 4 rows per page will look like the following. We notice the 4 empty spaces at the edge for the 4 pointers :

emptyRow.PNG

1) If we want to insert 2, and then 3, and then 5, we just follow the Step 1 of the algorithm (node has empty space).

2)full node.PNG

2) Now let’s insert 7.

3).PNG

What happened?

3) Let’s insert 11.

4).PNG

What happened?

4) Insert 17.

5).PNG

What happened?

5) Insert 19.

6).PNG

What happened?

6) Insert 23.

7).PNG

What happened?

7) Insert 29.

8).PNG

What happened?

8) Insert 31:

9).PNG

What happened?

This simulator is pretty neat for testing your own implementations of B+ trees (many thanks to Joy & Graham from New Zealand).

Now why are we using B+ trees?

We notice that, unlike traversing a linked list, accessing any part of the tree requires visiting only a few nodes. Also, increasing the number of child nodes is decreasing the depth of the tree, hence decreasing the number of “hops” (time consuming disk reads) required.

 
100
Kudos
 
100
Kudos

Now read this

Intro to Database Systems : Schema Refinement - Functional Dependencies

Schema refinement is just a fancy term for saying polishing tables. It is the last step before considering physical design/tuning with typical workloads: 1) Requirement analysis : user needs 2) Conceptual design : high-level description,... Continue →