Balanced Trees (平衡木)

http://www.sw.it.aoyama.ac.jp/2019/DA/lecture9.html

© 2009-19 Martin J. Dürst 青山学院大学

- Leftovers and summary of last lecture
- Balanced trees for internal memory
  - 2-3-4 trees
  - Red-black trees
  - AVL trees
- Balanced trees for secondary storage
  - B-trees
  - B+ trees

- A *dictionary* is an ADT allowing the insertion, deletion, and search of data items using a key
- With a simplistic implementation, some operations take `O`(`n`) time
- With a binary search tree, all operations are `O`(log `n`) on average, but `O`(`n`) in the worst case
- Unlike for sorting, this cannot be improved using randomization:
  - For quicksort, **the algorithm** can randomly select a pivot
  - The order of insertions and deletions for a dictionary is **externally** determined


- For the implementation of a priority queue, we
  - **Weakened** the total order (of a list or array) to a local order (between parent and child only)
  - **Strengthened** the shape of a binary tree to a complete binary tree

- We have to consider strengthening or weakening invariants to improve worst-case performance of a binary search tree

- In a 2-3-4 tree, each (internal) node has 2, 3, or 4 children
- A node with `k` children stores `k`-1 keys and data items (if all nodes have 2 children, a 2-3-4 tree is equal to a full binary search tree)
- The keys in the internal nodes separate the key ranges in the subtrees
- The tree is of *uniform height*
- In the lowest layer of the tree, the nodes have no children (implemented as a single unique empty node)
- Operations are generalizations of the same operation on a binary search tree

- Search starts from the root node
- If the key being searched for is found in the current node, then return the corresponding data item
- Otherwise, select the subtree based on this node's keys, and continue recursively
- If the key being searched for is not found even in a leaf node, return "not found"
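
The search steps above can be sketched in Ruby as follows. This is a minimal illustration, not the lecture's own implementation: the `Node234` layout and the name `search234` are assumptions made for this example, and a leaf is represented by an empty `children` array rather than by a shared empty node.

```ruby
# Hypothetical node layout for illustration: `keys` and `values` are parallel
# sorted arrays (1 to 3 entries); `children` is empty for a leaf, otherwise it
# holds keys.size + 1 subtrees.
Node234 = Struct.new(:keys, :values, :children)

# Returns the data item stored under `key`, or nil for "not found".
def search234(node, key)
  return nil if node.nil?
  # Index of the first key that is >= the search key (keys.size if there is none).
  i = node.keys.index { |k| k >= key } || node.keys.size
  # Key found in the current node: return the corresponding data item.
  return node.values[i] if i < node.keys.size && node.keys[i] == key
  # Leaf reached without a match: the key is not in the tree.
  return nil if node.children.empty?
  # Otherwise continue recursively in the subtree selected by the keys.
  search234(node.children[i], key)
end

# Small hand-built tree: root with 2 keys and therefore 3 children.
leaf = ->(ks) { Node234.new(ks, ks.map { |k| "v#{k}" }, []) }
root = Node234.new([10, 20], %w[v10 v20],
                   [leaf.([3, 7]), leaf.([15]), leaf.([25, 30, 40])])

p search234(root, 15) # key stored in the middle leaf
p search234(root, 20) # key stored in the root
p search234(root, 8)  # key that is not in the tree
```

Scanning the keys linearly is sufficient here because a node never holds more than 3 keys.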

- Basic operation of insertion: Search downwards, insert new data item into leaf node
- If a leaf node is already full (3 keys, corresponding to 4 children), it has to be split
- If a node has to be split, its middle key and data item have to be inserted into the parent node
- This may trigger further splits in parents, potentially up to the root
- To avoid splits *after* insertion (difficult to implement), nodes with 4 children are split *preemptively* on the way from the root to the leaf
- This is the reason for the name *top-down* 2-3-4 tree (there are other variants)
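
The preemptive splitting described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of 9234tree.rb: the class names `Node234` and `Tree234` and the helper `split_child` are hypothetical, data items are omitted (keys only), and a leaf is a node with an empty `children` array.

```ruby
# Hypothetical node: sorted `keys`; `children` is empty for a leaf,
# otherwise it holds keys.size + 1 subtrees.
class Node234
  attr_accessor :keys, :children

  def initialize(keys = [], children = [])
    @keys = keys
    @children = children
  end

  def leaf?
    children.empty?
  end

  def full?
    keys.size == 3 # 3 keys correspond to 4 children
  end
end

class Tree234
  attr_reader :root

  def initialize
    @root = Node234.new
  end

  # Inserts key top-down; returns false if the key is already present.
  def insert(key)
    # A full root is split first; this is the only place where the height grows.
    if @root.full?
      @root = Node234.new([], [@root])
      split_child(@root, 0)
    end
    node = @root
    loop do
      i = node.keys.index { |k| k >= key } || node.keys.size
      return false if node.keys[i] == key
      if node.leaf?
        node.keys.insert(i, key) # the leaf is never full at this point
        return true
      end
      if node.children[i].full?
        split_child(node, i) # preemptive split on the way down
        next                 # re-examine this node, which gained one key
      end
      node = node.children[i]
    end
  end

  # Keys in sorted order (used here only to check the result).
  def inorder(node = @root, out = [])
    return out.concat(node.keys) if node.leaf?
    node.children.each_with_index do |child, i|
      inorder(child, out)
      out << node.keys[i] if i < node.keys.size
    end
    out
  end

  private

  # Splits the full child at index i; its middle key moves up into parent.
  def split_child(parent, i)
    child = parent.children[i]
    left  = Node234.new(child.keys.take(1), child.children.take(2))
    right = Node234.new(child.keys.drop(2), child.children.drop(2))
    parent.keys.insert(i, child.keys[1])
    parent.children[i, 1] = [left, right]
  end
end

tree = Tree234.new
[50, 20, 70, 10, 30, 60, 80, 40, 90, 25].each { |k| tree.insert(k) }
p tree.inorder
```

Because every full node met on the way down is split before descending, the parent of a node being split always has room for the middle key, so no split has to propagate back upwards.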

- Deletion is more complicated than insertion (as is also the case for a binary search tree)
- Find the data item to be deleted, using search
- If the item to be deleted is not in a leaf, exchange it with an item in a leaf
- Remove the item in the leaf
- If this results in a leaf node without data items, move (borrow) items from neighboring leaves
- If the situation cannot be fixed by moving items, merge some nodes
- If the situation cannot be fixed by merging, address the problem one layer higher
- If the problem cannot be solved in the top layer, reduce the number of layers

- Maximum number of data items in a 2-3-4 tree of height $h$: $n = 4^h - 1$
- Minimum number of data items in a 2-3-4 tree of height $h$: $n = 2^h - 1$
- ⇒ The height of the tree is `O`(log `n`)
- The time needed for each operation is proportional to the height of the tree and therefore `O`(log `n`) (even in the worst case)
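
The step from the two item counts to the height bound can be written out as follows. This is a short derivation that uses only the maximum and minimum counts given above:

```latex
2^h - 1 \le n \le 4^h - 1
\;\Longrightarrow\;
2^h \le n + 1 \le 4^h
\;\Longrightarrow\;
\log_4 (n + 1) \le h \le \log_2 (n + 1)
\;\Longrightarrow\;
h = O(\log n)
```

Since $\log_4 (n+1) = \tfrac{1}{2} \log_2 (n+1)$, the height of the fullest and of the sparsest tree differ by a factor of at most 2.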

- Implementation in Ruby: 9234tree.rb
- Implementation of 2-3-4 trees is quite complicated
- Some memory (in nodes with 2 or 3 children) is unused
- Therefore, other balanced trees have been proposed

- A red-black tree is an implementation of a 2-3-4 tree with a binary tree
- The edges of the original tree are black
- Nodes with 3 or 4 children are split into multiple nodes, coloring the internal edges red
- Two consecutive red edges are forbidden
- If this invariant is violated, *rotations* are used for restoration
- If only black edges are counted, the tree is of uniform height
- When all edges are considered, the maximum depth of a leaf is at most twice the minimum depth (so the height is `O`(log `n`))
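
The rotations mentioned above can be illustrated on a plain binary search tree. The sketch below is an assumption-laden simplification: the `BNode` struct and the helper names are hypothetical, and edge colors are left out entirely, so it shows only the restructuring step and not a complete red-black tree.

```ruby
# Hypothetical binary node; edge colors are left out of this sketch.
BNode = Struct.new(:key, :left, :right)

# Left rotation: the right child becomes the new root of the subtree.
def rotate_left(node)
  pivot = node.right
  node.right = pivot.left
  pivot.left = node
  pivot
end

# Right rotation: the mirror image of rotate_left.
def rotate_right(node)
  pivot = node.left
  node.left = pivot.right
  pivot.right = node
  pivot
end

# Keys in sorted order, to show that a rotation keeps the search order.
def inorder(node, out = [])
  return out if node.nil?
  inorder(node.left, out)
  out << node.key
  inorder(node.right, out)
end

# Number of nodes on the longest path from node down to a leaf.
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# A right-leaning chain 1 -> 2 -> 3 has height 3.
chain = BNode.new(1, nil, BNode.new(2, nil, BNode.new(3)))
puts "before: height #{height(chain)}, keys #{inorder(chain)}"

# One left rotation makes 2 the root, with 1 and 3 as its children.
balanced = rotate_left(chain)
puts "after:  height #{height(balanced)}, keys #{inorder(balanced)}"
```

A rotation changes only a constant number of references and keeps the in-order sequence of keys, which is why it can be used to restore an invariant without breaking the search tree property.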

- AVL trees were proposed by **A**delson-**V**elskii and **L**andis (Адельсон-Вельский and Ландис) in 1962
- Oldest (binary) balanced tree
- Invariant: At each internal node, the difference between the heights of the subtrees is 1 or less
- The difference between the heights of the left and the right subtrees (-1, 0, 1) is stored in each internal node and kept up to date
- The tree height is limited to about $1.44 \log_2 n$
- Searching is slightly faster than for a red-black-tree
- Insertion and deletion are slightly more complicated than for a red-black-tree
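
The AVL invariant can be checked with a few lines of Ruby. This is only a sketch for illustration: the `ANode` struct and the method names are hypothetical, the sign convention of the balance factor (right minus left) is an assumption, and the heights are recomputed on every call, whereas an actual AVL tree stores the difference in each node and updates it during insertion and deletion.

```ruby
# Hypothetical binary node for this sketch.
ANode = Struct.new(:key, :left, :right)

# Number of nodes on the longest path from node down to a leaf.
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Height difference between the subtrees (assumed convention: right minus left).
def balance(node)
  height(node.right) - height(node.left)
end

# True if the difference is -1, 0, or 1 at every node of the subtree.
def avl?(node)
  return true if node.nil?
  balance(node).abs <= 1 && avl?(node.left) && avl?(node.right)
end

balanced = ANode.new(2, ANode.new(1), ANode.new(3))
chain    = ANode.new(1, nil, ANode.new(2, nil, ANode.new(3)))

puts "balanced tree: balance #{balance(balanced)}, AVL? #{avl?(balanced)}"
puts "chain:         balance #{balance(chain)}, AVL? #{avl?(chain)}"
```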

| | Internal Memory | External (Secondary) Storage | External (Secondary) Storage |
|---|---|---|---|
| Access principle | random | random | linear |
| Technology | dynamic RAM | SSD, HD | magnetic tape |
| Unit of access | word | page/sector | record |
| Example unit size | 32/64 bits (4/8 bytes) | 512/1024/2048/4096/... bytes | varying |
| Access speed | nanoseconds | micro/milliseconds | seconds or minutes |

- A B-tree is a variant of a 2-3-4 tree
- Suited for external random access storage (SSD, HD)
- Each page is a node in the tree → Efficient access to external memory
- Maximise the number of keys per page
- The minimum number of keys per page is about half of the maximum

A B+ tree starts from a B-tree and moves all data (except keys) to the lowest layer of the tree

⇒ The number of keys and child nodes per internal node increases (for practical applications, the size of a key is much smaller than the size of the data)

⇒ The height of the tree shrinks

⇒ Access to data is faster (the overall access time is dominated by the number of pages that have to be fetched from secondary storage)

- $n$: Overall number of data items (example: 50,000)
- $L_p$: Page size (example: 1024 bytes)
- $L_k$: Key size (example: 4 bytes)
- $L_d$: Data size (one item, except key) (example: 240 bytes)
- $L_{pp}$: Size of page number (page reference) (example: 4 bytes)
- $\alpha_{min}$: Minimum occupancy (usually 0.5)

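($\lfloor a \rfloor$ is the floor function of $a$, the greatest integer smaller than or equal to $a$: $\lfloor a \rfloor \in \mathbb{Z} \land \lfloor a \rfloor \le a \land \lnot\exists b: b \in \mathbb{Z} \land \lfloor a \rfloor < b \le a$)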

- $d_{max} = \lfloor L_p / (L_k + L_d) \rfloor$ (example: 4) (maximum number of data items per leaf page)
- $d_{min} = \lfloor d_{max} \, \alpha_{min} \rfloor$ (example: 2) (minimum number of data items per leaf page)
- $k_{max} = \lfloor L_p / (L_k + L_{pp}) \rfloor$ (example: 128) (maximum number of children per internal node)
- $k_{min} = \lfloor k_{max} \, \alpha_{min} \rfloor$ (example: 64) (minimum number of children per internal node)

($\lceil a \rceil$ is the ceiling function of $a$, the smallest integer greater than or equal to $a$: $\lceil a \rceil \in \mathbb{Z} \land a \le \lceil a \rceil \land \lnot\exists b: b \in \mathbb{Z} \land a \le b < \lceil a \rceil$)

- $Nd_{max} = \lceil n / d_{min} \rceil$ (example: 25,000) (maximum number of leaf pages)
- $Nd_{min} = \lceil n / d_{max} \rceil$ (example: 12,500) (minimum number of leaf pages)
- $Nk_{max} = \lceil Nd_{max} / k_{min} \rceil + \lceil Nd_{max} / k_{min}^2 \rceil + \dots$ (maximum number of internal nodes; the sum ends with the term for the root, which is 1) (example: 391 + 7 + 1 = 399; height of B+tree: 4; total number of nodes: 25,399)
- $Nk_{min} = \lceil Nd_{min} / k_{max} \rceil + \lceil Nd_{min} / k_{max}^2 \rceil + \dots$ (minimum number of internal nodes; the sum ends with the term for the root, which is 1) (example: 98 + 1 = 99; height of B+tree: 3; total number of nodes: 12,599)
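
The example numbers above can be reproduced with a short Ruby calculation. This is a sketch for checking the arithmetic only; the variable names mirror the symbols in the text, and the helpers `ceil_div` and `internal_nodes` are hypothetical names introduced for this example.

```ruby
# Example parameters from the text.
n         = 50_000 # overall number of data items
l_p       = 1024   # page size in bytes
l_k       = 4      # key size in bytes
l_d       = 240    # data size in bytes (one item, except key)
l_pp      = 4      # size of a page reference in bytes
alpha_min = 0.5    # minimum occupancy

# Integer division of positive integers is the floor function.
d_max = l_p / (l_k + l_d)         # 4
d_min = (d_max * alpha_min).floor # 2
k_max = l_p / (l_k + l_pp)        # 128
k_min = (k_max * alpha_min).floor # 64

# Ceiling of a / b for positive integers.
def ceil_div(a, b)
  (a + b - 1) / b
end

# Sums ceil(leaves / k) + ceil(leaves / k^2) + ... until a level consists of
# a single node (the root). Returns [number of internal nodes, number of levels].
def internal_nodes(leaves, k)
  total = 0
  levels = 0
  divisor = k
  loop do
    level = ceil_div(leaves, divisor)
    total += level
    levels += 1
    return [total, levels] if level == 1
    divisor *= k
  end
end

nd_max = ceil_div(n, d_min) # 25,000 leaf pages at minimum occupancy
nd_min = ceil_div(n, d_max) # 12,500 leaf pages when every page is full

nk_max, levels_max = internal_nodes(nd_max, k_min) # 391 + 7 + 1 = 399
nk_min, levels_min = internal_nodes(nd_min, k_max) # 98 + 1 = 99

# The height counts the internal levels plus the leaf level.
puts "largest tree:  height #{levels_max + 1}, #{nd_max + nk_max} nodes"
puts "smallest tree: height #{levels_min + 1}, #{nd_min + nk_min} nodes"
```

Even in the least favorable case, any of the 50,000 items is reached by fetching 4 pages, which is the point of keeping only keys and page references in the internal nodes.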

- Balanced search trees are important for the efficient implementation of dictionary ADTs
- 2-3-4 trees and B(+)trees increase the degree of the nodes compared to a binary tree, but keep the tree height uniform
- Red-black-trees and AVL-trees impose limitations on the variation of the tree height
- Balanced trees make it possible to implement the basic operations on a dictionary ADT in `O`(log `n`) worst-case time
- B-trees and B+ trees are extremely important for the implementation of file systems and databases on secondary storage

- red-black-tree: 赤黒木 (あかくろぎ)
- AVL-tree: AVL 木
- secondary storage: 二次記憶装置
- B-tree: B 木
- B+ tree: B+木
- strengthen: 強化 (する)
- weaken: 緩和 (する)
- uniform: 一定 (の)
- lowest layer: 最下層
- occupancy: 占有率
- floor function: 床関数
- ceiling function: 天井関数
- degree: 次数