(辞書とその実装: 二分探索木など)

http://www.sw.it.aoyama.ac.jp/2019/DA/lecture8.html

© 2009-19 Martin J. Dürst 青山学院大学

- Leftovers and summary of last lecture
- Sorting algorithms faster than
`O`(`n`log`n`) - The dictionary ADT
- Binary trees and their traversal methods
- Binary search trees
- Balanced trees

- Quicksort is a very efficient algorithm for sorting
- In the worst case, quicksort is
`O`(`n`^{2}); on average,`O`(`n`log`n`) - Quicksort is a good example for the use of
*average time complexity*and*randomization*in algorithms - Implementing quicksort requires careful attention to many details
- Animation of many sorting algorithms: sort.svg
- Sorting based on pairwise comparison is
`Ω`(`n`log`n`)

- All sorting algorithms studied so far assume an
*arbitrary distribution*of values - Decisions are made by
*pairwise comparisons*of values

→depth of decision tree is at least`Ω`(`n`log`n`) - If there is some knowledge about the value distribution, improvements are possible
- Extreme example: Integers from 1 to
`n`

→Final place of data can be predicted exactly

→O(`n`) - Radix sort, bin sort

(also called bucket sort)

Example: Sorting by student number

- Separate data into 10 parts using
*most significant digit* - Apply recursively to less significant digits
- To manage memory, split separation into two phases

(`one_digit_stable_sort`

in 8binradix.rb)- Calculate size of each part
- Move data items

- Complexity is
`O`(`n``k`), where`k`is the number of digits - Implementation in Ruby:
`conceptual_bin_sort`

in 8binradix.rb

- Sort once for each digit, starting with the
*least significant digit* - No need to partition data
- A stable sorting method is necessary
- Complexity is
`O`(`n``k`), where`k`is the number of digits - Implementation in Ruby:
`radix_sort`

8binradix.rb

Bin Sort | Radix Sort | |
---|---|---|

Complexity | O(n k) |
O(n k) |

First digit sorted | most significant digit | least significant digit |

Last digit sorted | least significant digit | most significant digit |

Direction | → | ← |

Stable sort needed | No | Yes |

Data partitioning needed | Yes | No |

- Recently, computers do not get faster, but smaller and larger in numbers
- Parallelization becomes important!
- For some tasks, parallelization can be very difficult
- For some tasks, parallelization can be quite easy
- Many sorting algorithms are eazy to parallelize:
- Bubblesort
- Mergesort
- Quicksort
- Binsort

(caution: **Not** the same as a (book) dictionary)

- For each data item, there is:
- A
*key*: Used to identify the data item, e.g. during search - A
*value*: All the information besides the key (may be empty)

- A
- Operations
- Search/find
- Insert
- Delete

- Sorted array: Search is
`O`(`log``n`) (binary search), insertion/deletion is`O`(`n`) - Unordered array/linear list: Search is
`O`(`n`) - Ideally, search/insertion/deletion should all be
`O`(log`n`) or even`O`(1)- Binary search tree (this week)
- Balanced tree (next week)
- Hashing (in two weeks)

- Graph: Consisting of
*nodes*and*edges* - Tree: The
*root*does not have any*parent*; all other nodes have exactly one parent - Binary tree: Each node has ≦2 children

- Depth first
- Preorder
- Inorder
- Postorder

- Breadth first

- Binary tree
- Each node contains one data item
- For any node with key
`k`:- All the keys in the
**left**subtree will be**<**`k`(or**≦**`k`) - All the keys in the
**right**subtree will be**>**`k`(or**≧**`k`)

- All the keys in the
- What to do with multiple identical keys is
*implementation-dependent*

- Start searching from the root node
- If the search key, compared to the current node key, is:
- The same: Return the data item at the current node
- Smaller: Search in left subtree (recursion)
- Greater: Search the right subtree (recursion)
- The empty node: Terminate seach (not found!)

- Start insertion from the root node
- If the inserted key, compared to the current node key, is:
- Smaller: Insert item into left subtree (recursion)
- Greater: Insert item into right subtree (recursion)
- The same: Give up/insert into right subtree, ... (implementation dependent)

- If the current node is empty: Insert item here as a new node (with two empty nodes as children)

- Find the node to delete (same as search)
- If the number of (real, non-empty) children is:
- 0: Delete current node (replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with smallest child in the right subtree

(or largest node in left subtree)

- Share a single special node (
`NilNode`

) everywhere there is no child node - Pseudocode/implementation in Ruby: 8bintree.rb

- Execution speed depends on the height of the tree
- Best height is
`O`(log`n`) - Worst height is
`O`(`n`) - Average height is
`O`(log`n`)

(assuming that all input orders have the same probability)

- In the worst case, the shape of a general search tree is the same as the shape of a linear list
- The order of insertions/deletions cannot be changed
- Therefore, selecting the dividing item using randomization is impossible
- In the case of a complete binary tree, insertion/deletion take too much time

Solution: A tree that is to some degree (but not perfectly) balanced

(definition/invariants)

- The number of children for each node is 2, 3, or 4
- If a node has
`k`children, it stores`k`-1 keys and data items

(if the number of children is 2, then this is the same as a binary search tree node) - The keys stored in a node are the separators for the subtrees
- The tree is of
*uniform height* - In the lowest layer of the tree, the nodes have no children

*Bin sort*and*radix sort*are`O`(`n``k`)- A
*dictionary*is an ADT storing*values*that can be found using*keys* - A
*binary search tree*is a way to implement a dictionary - Operations on binary search trees are
`O`(`n`log`n`) on average, but O(`n`) in the worst case - This problem can be addressed using balanced trees

(no need to submit)

- Calculate the minimum and maximum height of a binary search tree with
`n`data items - Calculate the minimum and maximum height of a 2-3-4 tree with
`n`data items - Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm

- bin sort
- ビンソート
- most significant digit
- 最上位の桁
- radix sort
- 基数整列
- least significant digit
- 最下位の桁
- balanced tree
- 平衡木
- traversal method
- 辿り方
- binary search tree
- 二分探索木
- balanced tree
- 平衡木
- hashing
- ハッシュ法
- binary tree
- 二分木
- parallelization
- 並列化
- depth first
- 深さ優先
- preorder
- 行きがけ順
- inorder
- 通りがけ順
- postorder
- 帰りがけ順
- breadth first
- 幅優先
- key
- キー、鍵
- implementation-dependent
- 実装依存
- top-down 2-3-4 tree
- トップダウン 2-3-4 木
- uniform height
- 一定の高さ