Balanced Trees (平衡木)

http://www.sw.it.aoyama.ac.jp/2015/DA/lecture9.html

© 2009-15 Martin J. Dürst 青山学院大学

- Summary of last lecture
- Balanced trees for internal use
  - 2-3-4 tree
  - red-black-tree
  - AVL-tree
- Balanced trees for secondary storage
  - B-tree
  - B+ tree

- A *dictionary* is an ADT allowing the insertion, deletion, and search of data items using a key
- With a simplistic implementation, some operations take `O`(`n`) time
- With a binary search tree, all operations are `O`(log `n`) on average, but `O`(`n`) in the worst case
- Unlike for sorting, this cannot be improved using randomization
  (for quicksort, we can randomly select a pivot, but the order of insertions and deletions for a dictionary is determined externally)
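The worst case for a plain binary search tree can be observed directly. The following is a minimal sketch, not the lecture's code: `BstNode`, `bst_insert`, `bst_height`, and `build_bst` are hypothetical names chosen for illustration. It inserts the same 1000 keys once in sorted order and once in shuffled order, and compares the resulting heights.

```ruby
# Minimal unbalanced binary search tree (hypothetical helper names).
BstNode = Struct.new(:key, :left, :right)

# Insert a key iteratively, so that degenerate trees do not overflow the stack.
def bst_insert(root, key)
  new_node = BstNode.new(key, nil, nil)
  return new_node if root.nil?
  current = root
  loop do
    if key < current.key
      if current.left.nil?
        current.left = new_node
        break
      end
      current = current.left
    else
      if current.right.nil?
        current.right = new_node
        break
      end
      current = current.right
    end
  end
  root
end

# Height = number of nodes on the longest path from the root downwards.
def bst_height(root)
  height = 0
  level = root.nil? ? [] : [root]
  until level.empty?
    height += 1
    level = level.flat_map { |n| [n.left, n.right] }.compact
  end
  height
end

def build_bst(keys)
  keys.reduce(nil) { |root, key| bst_insert(root, key) }
end

sorted_height   = bst_height(build_bst((1..1000).to_a))
shuffled_height = bst_height(build_bst((1..1000).to_a.shuffle(random: Random.new(1))))
puts sorted_height    # 1000: the tree has degenerated into a linear list
puts shuffled_height  # much smaller; the exact value depends on the shuffle
```

Because the caller decides the insertion order, the data structure itself cannot shuffle the keys the way quicksort can pick a random pivot.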

- For the implementation of a priority queue, we
  - weakened the total order (of a binary search tree) to a local order (between parent and child only)
  - strengthened the shape of a (general) binary tree to a complete binary tree

- We have to consider strengthening or weakening invariants to improve worst-case performance of a binary search tree

- In a 2-3-4 tree, each (internal) node has 2, 3, or 4 children
- A node with `k` children stores `k`-1 keys and data items
  (if all nodes have 2 children, a 2-3-4 tree is equal to a binary search tree)
- The keys in the internal nodes separate the key ranges in the subtrees
- The tree is of *uniform height*
- In the lowest layer of the tree, the nodes have no children
  (implemented as a single unique empty node)

- Search in a 2-3-4 tree starts from the root node
- If the key being searched for is found in the current node, then return the corresponding data item
- Otherwise, select the subtree based on this node's keys, and continue recursively

(each operation on a 2-3-4 tree is a generalization of the same operation on a binary search tree)
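The search steps above can be sketched as follows. This is an illustrative sketch under assumptions, not the code from 9234tree.rb: the node layout `Node234` (parallel arrays of keys and values plus an array of children) and the function name `search234` are hypothetical, and the lowest layer is represented by an empty `children` array rather than by a single shared empty node.

```ruby
# Hypothetical node layout: keys and values are parallel sorted arrays;
# children has keys.size + 1 entries, or is empty in the lowest layer.
Node234 = Struct.new(:keys, :values, :children)

def search234(node, key)
  until node.nil?
    # Index of the first key that is >= the search key (keys.size if none).
    i = node.keys.index { |k| k >= key } || node.keys.size
    return node.values[i] if i < node.keys.size && node.keys[i] == key
    # Not found in this node: descend into the subtree whose key range
    # contains the search key (nil if there are no children).
    node = node.children[i]
  end
  nil # reached the bottom of the tree without finding the key
end

# Small hand-built example tree.
leaf = ->(keys) { Node234.new(keys, keys.map { |k| "v#{k}" }, []) }
root = Node234.new([10, 20], ["v10", "v20"],
                   [leaf.([3, 7]), leaf.([12, 15, 18]), leaf.([25])])

p search234(root, 15)  # "v15" (found in a leaf)
p search234(root, 20)  # "v20" (found in the root)
p search234(root, 8)   # nil   (not in the tree)
```

With at most three keys per node, a linear scan inside the node is enough; the cost of a search is dominated by the number of layers visited.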

- Basic insertion operation: Search downwards, insert new data item into leaf node
- If there are already 3 data items in the leaf node, this node has to be split
- If a node has to be split, a key and data item have to be inserted into the parent node
- This may trigger further splits in parents, potentially up to the root
- To avoid splits after insertion (difficult to implement), nodes with 4 children are split preemptively on the way from the root to the leaf
- This version of 2-3-4 trees is called a top-down 2-3-4 tree
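A top-down insertion with preemptive splits might look like the sketch below. It is a simplified illustration, not the lecture's 9234tree.rb: the names `TdNode`, `split_child`, and `insert234` are hypothetical, nodes store keys only (no data items), and keys are assumed to be distinct.

```ruby
# Hypothetical node: sorted keys plus children (empty array for a leaf).
TdNode = Struct.new(:keys, :children) do
  def leaf?; children.empty?; end
  def full?; keys.size == 3; end   # 3 keys = 4 children
end

# Split the full child at index i of parent: the middle key moves up,
# the outer keys become two separate nodes with one key each.
def split_child(parent, i)
  child = parent.children[i]
  left  = TdNode.new([child.keys[0]], child.children.take(2))
  right = TdNode.new([child.keys[2]], child.children.drop(2))
  parent.keys.insert(i, child.keys[1])
  parent.children[i, 1] = [left, right]
end

# Insert key and return the (possibly new) root.
def insert234(root, key)
  if root.full?
    # Splitting the root is the only way the tree gains a layer.
    root = TdNode.new([], [root])
    split_child(root, 0)
  end
  node = root
  until node.leaf?
    i = node.keys.index { |k| k >= key } || node.keys.size
    if node.children[i].full?
      split_child(node, i)            # the parent is never full here
      i += 1 if key > node.keys[i]    # continue in the right half if needed
    end
    node = node.children[i]
  end
  i = node.keys.index { |k| k >= key } || node.keys.size
  node.keys.insert(i, key)            # the leaf has room: it was not full
  root
end

# Helpers for checking the result.
def keys_in_order(node)
  return node.keys.dup if node.leaf?
  result = []
  node.children.each_with_index do |child, i|
    result.concat(keys_in_order(child))
    result << node.keys[i] if i < node.keys.size
  end
  result
end

def leaf_depths(node, depth = 1)
  return [depth] if node.leaf?
  node.children.flat_map { |c| leaf_depths(c, depth + 1) }.uniq
end

root = TdNode.new([], [])
(1..20).each { |k| root = insert234(root, k) }
p keys_in_order(root) == (1..20).to_a  # true: keys stay sorted
p leaf_depths(root).size               # 1: all leaves are at the same depth
```

Because every full node is split before the descent continues, a split never has to propagate back upwards, which is the point of the top-down variant.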

- Deletion is more complicated than insertion (as for a binary search tree)
- Find the data item to be deleted, using search
- If the item to be deleted is not in a leaf, exchange it with an item in a leaf
- Remove the item in the leaf
- If this results in a leaf node without data items, move (borrow) items from neighboring leaves
- If the situation cannot be fixed by moving items, merge some nodes
- If the situation cannot be fixed by merging, address the problem one layer higher
- If the problem cannot be solved in the top layer, reduce the number of layers

- Maximum number of data items in a 2-3-4 tree of height `h`: `n` = 4^{h} - 1
- Minimum number of data items in a 2-3-4 tree of height `h`: `n` = 2^{h} - 1
- ⇒ The height of the tree is `O`(log `n`)
- The time needed for each operation is proportional to the height of the tree and therefore `O`(log `n`)
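The two counts and the resulting height bound can be derived as follows. This is a short derivation under the assumption that `h` counts the layers of nodes that hold data items (the empty bottom layer is not counted), which matches the two formulas above.

```latex
\begin{align*}
% every node has 4 children and 3 data items
n_{\max}(h) &= 3\,(1 + 4 + \dots + 4^{h-1}) = 3 \cdot \frac{4^h - 1}{4 - 1} = 4^h - 1 \\
% every node has 2 children and 1 data item
n_{\min}(h) &= 1 + 2 + \dots + 2^{h-1} = 2^h - 1 \\
2^h - 1 \le n \le 4^h - 1 &\;\Longrightarrow\; \log_4(n+1) \le h \le \log_2(n+1)
\end{align*}
```

Both bounds on the height are logarithmic in the number of data items and differ only by a factor of 2, since $\log_2 x = 2 \log_4 x$.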

- Implementation in Ruby: 9234tree.rb, 9driver.rb
- Implementation of 2-3-4 trees is quite complicated
- Some memory (in nodes with 2 or 3 children) is unused
- Therefore, other balanced trees have been proposed

- A red-black-tree is an implementation of a 2-3-4 tree with a binary tree
- The edges of the original tree are black
- Nodes with 3 or 4 children are split into multiple nodes, coloring the internal edges red
- Two consecutive red edges are impossible/forbidden
- If this invariant is violated, *rotations* are used for restoration
- If only black edges are counted, the tree is of uniform height
- When all edges are considered, the maximum depth of a leaf is at most twice the minimum depth
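A rotation changes the shape of a subtree without changing the order of its keys. The sketch below is illustrative only: `RbNode`, `rotate_left`, `rotate_right`, and `in_order` are hypothetical names, and the `red` flag is assumed to record the color of the edge from the parent. The color handling (the new subtree root takes over the old root's color, and the old root gets a red edge) is one common convention, not necessarily the one used in the lecture.

```ruby
# Hypothetical node: red is true if the edge from the parent is red.
RbNode = Struct.new(:key, :left, :right, :red)

# Left rotation: the right child becomes the root of this subtree.
def rotate_left(node)
  pivot = node.right
  node.right = pivot.left   # the middle subtree changes its parent
  pivot.left = node
  pivot.red = node.red      # the new subtree root inherits the edge color
  node.red = true           # the old root now hangs on a red edge
  pivot
end

# Right rotation: mirror image of the left rotation.
def rotate_right(node)
  pivot = node.left
  node.left = pivot.right
  pivot.right = node
  pivot.red = node.red
  node.red = true
  pivot
end

# In-order traversal, to check that the key order is unchanged.
def in_order(node)
  return [] if node.nil?
  in_order(node.left) + [node.key] + in_order(node.right)
end

# 1 -> 2 -> 3 chained by two consecutive red edges (forbidden).
chain = RbNode.new(1, nil,
                   RbNode.new(2, nil, RbNode.new(3, nil, nil, true), true),
                   false)
root = rotate_left(chain)
p root.key                            # 2
p [root.left.key, root.right.key]     # [1, 3]
p in_order(root)                      # [1, 2, 3]
```

After the rotation, key 2 has red edges to 1 and 3, which is the binary-tree picture of a single 2-3-4 node with three keys.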

- The AVL-tree was proposed by Adelson-Velskii and Landis (Адельсон-Вельский and Ландис) in 1962
- Oldest (binary) balanced tree
- Invariant: At each internal node, the difference between the heights of the subtrees is 1 or less
- The difference between the heights of the left and the right subtrees (-1, 0, 1) is stored in each internal node and kept up to date
- The tree height is limited to about 1.44 log_{2} `n`
- Searching is slightly faster than for a red-black-tree
- Insertion and deletion are slightly more complicated than for a red-black-tree
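The AVL invariant can be checked with a few lines of code. This is a sketch for illustration: `AvlNode`, `avl_height`, `balance_factor`, and `avl_invariant?` are hypothetical names, the sign convention (right height minus left height) is an assumption, and the heights are recomputed on every call instead of being stored in the nodes and kept up to date as a real AVL-tree would do.

```ruby
# Hypothetical plain binary tree node.
AvlNode = Struct.new(:key, :left, :right)

# Height of a subtree; an empty subtree has height 0.
def avl_height(node)
  node.nil? ? 0 : 1 + [avl_height(node.left), avl_height(node.right)].max
end

# Difference between the heights of the right and the left subtree.
def balance_factor(node)
  avl_height(node.right) - avl_height(node.left)
end

# True if every node in the subtree has a balance factor of -1, 0, or 1.
def avl_invariant?(node)
  return true if node.nil?
  balance_factor(node).abs <= 1 &&
    avl_invariant?(node.left) && avl_invariant?(node.right)
end

balanced = AvlNode.new(2, AvlNode.new(1, nil, nil), AvlNode.new(3, nil, nil))
chain    = AvlNode.new(1, nil, AvlNode.new(2, nil, AvlNode.new(3, nil, nil)))

p avl_invariant?(balanced)  # true
p balance_factor(chain)     # 2
p avl_invariant?(chain)     # false
```

Storing the balance factor in each node, as described above, avoids recomputing heights and lets insertion and deletion detect a violation while walking back up the search path.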

| | Internal memory | Secondary storage (random access) | Secondary storage (linear access) |
|---|---|---|---|
| Access principle | random | random | linear |
| Technology | dynamic RAM | HD, SSD | magnetic tape |
| Unit of access | word | page | record |
| Example unit size | 32/64 bits (4/8 bytes) | 512/1024/2048/4096/... bytes | varying |
| Access speed | nanoseconds | milliseconds | seconds or minutes |

- A B-tree is a variant of a 2-3-4 tree
- Each page is a node in the tree
- The number of keys per page is maximized
- The minimum number of keys per page is about half of the maximum

[Figure: structure of a B-tree node (one page), top to bottom: ref. to subtree; key | data; ref. to subtree; key | data; ref. to subtree; ...; key | data; ref. to subtree]

Starting with a B-tree, all data (except keys) is moved to the lowest layer of the tree

⇒ The number of keys and child nodes per internal node increases

(for practical applications, the size of a key is much smaller than the size of the data)

⇒ The height of the tree shrinks

⇒ Access to data is faster

(the overall access time is dominated by the number of pages that have to be fetched from secondary storage)

[Figure: structure of an internal node of a B+ tree, top to bottom: ref. to subtree; key; ref. to subtree; key; ref. to subtree; ...; key; ref. to subtree]

[Figure: structure of a node in the lowest layer (leaf) of a B+ tree, top to bottom: key | data; key | data; key | data; ...; key | data]

- $n$: overall number of data items (example: 50,000)
- $L_p$: page size (example: 1024 bytes)
- $L_k$: key size (example: 4 bytes)
- $L_d$: data size (one item, except key) (example: 240 bytes)
- $L_{pp}$: size of page number (page reference) (example: 4 bytes)
- $\alpha_{min}$: minimum occupancy (usually 0.5)

($\lfloor a \rfloor$ is the floor function of $a$, the greatest integer smaller than or equal to $a$)

- $d_{max} = \lfloor L_p / (L_k + L_d) \rfloor$ (example: 4): maximum number of data items per leaf
- $d_{min} = \lfloor d_{max} \, \alpha_{min} \rfloor$ (example: 2): minimum number of data items per leaf
- $k_{max} = \lfloor L_p / (L_k + L_{pp}) \rfloor$ (example: 128): maximum number of children per internal node
- $k_{min} = \lfloor k_{max} \, \alpha_{min} \rfloor$ (example: 64): minimum number of children per internal node

($\lceil a \rceil$ is the ceiling function of $a$, the smallest integer greater than or equal to $a$)

- $Nd_{max} = \lceil n / d_{min} \rceil$ (example: 25,000): maximum number of leaves
- $Nd_{min} = \lceil n / d_{max} \rceil$ (example: 12,500): minimum number of leaves
- $Nk_{max} = \lceil Nd_{max} / k_{min} \rceil + \lceil Nd_{max} / k_{min}^2 \rceil + \dots$: maximum number of internal nodes
  (example: 391 + 7 + 1 = 399; height of B+tree: 4; total number of nodes: 25,399)
- $Nk_{min} = \lceil Nd_{min} / k_{max} \rceil + \lceil Nd_{min} / k_{max}^2 \rceil + \dots$: minimum number of internal nodes
  (example: 98 + 1 = 99; height of B+tree: 3; total number of nodes: 12,599)
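The example numbers can be reproduced with a short calculation. The sketch below is illustrative: the function names `ceil_div`, `internal_levels`, and `bplus_estimate` and the keyword parameter names are hypothetical, and the sum over the internal layers is assumed to stop as soon as a layer consists of a single node (the root).

```ruby
# Ceiling of a / b for positive integers, without floating point.
def ceil_div(a, b)
  (a + b - 1) / b
end

# Node counts per internal layer, from just above the leaves up to the root:
# ceil(leaves / k), ceil(leaves / k^2), ... until a single node remains.
# Assumes k >= 2.
def internal_levels(leaves, k)
  levels = []
  divisor = k
  loop do
    count = ceil_div(leaves, divisor)
    levels << count
    break if count <= 1
    divisor *= k
  end
  levels
end

# All sizes in bytes; n is the overall number of data items.
def bplus_estimate(n:, page:, key:, data:, page_ref:, alpha_min: 0.5)
  d_max = page / (key + data)          # integer division acts as floor
  d_min = (d_max * alpha_min).floor
  k_max = page / (key + page_ref)
  k_min = (k_max * alpha_min).floor
  max_leaves = ceil_div(n, d_min)      # leaves as empty as allowed
  min_leaves = ceil_div(n, d_max)      # leaves completely full
  worst = internal_levels(max_leaves, k_min)
  best  = internal_levels(min_leaves, k_max)
  {
    d_max: d_max, d_min: d_min, k_max: k_max, k_min: k_min,
    max_leaves: max_leaves, min_leaves: min_leaves,
    max_internal: worst.sum, min_internal: best.sum,
    max_height: worst.size + 1, min_height: best.size + 1
  }
end

result = bplus_estimate(n: 50_000, page: 1024, key: 4, data: 240, page_ref: 4)
p result[:max_leaves]    # 25000
p result[:max_internal]  # 399
p result[:max_height]    # 4
p result[:min_leaves]    # 12500
p result[:min_internal]  # 99
p result[:min_height]    # 3
```

Changing `page` to a larger value such as 4096 shows how quickly the fan-out grows and the height shrinks, which is why B+ trees fetch only a handful of pages per lookup.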

- 2-3-4 trees and B(+)trees increase the degree of a binary tree, but keep the tree height uniform
- Red-black-trees and AVL-trees impose limitations on the variation of the tree height
- Balanced trees make it possible to implement the basic operations on a dictionary ADT in `O`(log `n`) time
- B-trees and B+ trees are extremely important for the implementation of file systems and databases on secondary storage

- red-black-tree: 赤黒木 (あかくろぎ)
- AVL-tree: AVL 木
- secondary storage: 二次記憶装置
- B-tree: B 木
- B+ tree: B+木
- strengthen: 強化 (する)
- weaken: 緩和 (する)
- uniform: 一定 (の)
- lowest layer: 最下層
- occupancy: 占有率
- floor function: 床関数
- ceiling function: 天井関数
- degree: 次数