Dictionaries and their Implementation: Binary Trees

(辞書とその実装: 二分木など)

Data Structures and Algorithms

8th lecture, November 8, 2018

http://www.sw.it.aoyama.ac.jp/2018/DA/lecture8.html

Martin J. Dürst

Today's Schedule

Leftovers and summary of last lecture
Sorting algorithms faster than O(n log n)
The dictionary ADT
Binary trees and their traversal methods
Binary search trees
Balanced trees

Leftovers of Last Lecture

Summary of Last Lecture

Quicksort is a very efficient algorithm for sorting
In the worst case, quicksort is O(n²); on average, O(n log n)
Quicksort is a good example for the use of average time complexity and randomized algorithms
Implementing quicksort requires careful attention to many details
Animation of many sorting algorithms: sort.svg
Sorting based on pairwise comparison is Ω(n log n)

Sorting Faster than O(n log n)

All sorting algorithms studied so far assume an arbitrary distribution of values
Decisions are made by binary comparisons of values
→the depth of the decision tree is Ω(n log n)
If there is some knowledge about the value distribution, improvements are possible
Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
Radix sort, bin sort
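
As a minimal sketch of the extreme example above: if the keys are exactly the integers from 1 to n, each appearing once, every item can be moved straight to its final position. The method name place_by_key and the [key, value] pair layout are assumptions made for this illustration only.

```ruby
# Sorts [key, value] pairs whose keys are exactly the integers 1..n,
# each appearing once. No comparisons between keys are needed.
def place_by_key(items)
  result = Array.new(items.size)
  # An item with key k belongs at index k - 1.
  items.each { |key, value| result[key - 1] = [key, value] }
  result
end

p place_by_key([[3, "c"], [1, "a"], [4, "d"], [2, "b"]])
# prints [[1, "a"], [2, "b"], [3, "c"], [4, "d"]]
```

Each item is touched once, so the work is O(n).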

Bin Sort

(also called bucket sort)

Example: Sorting by student number

Separate data into 10 parts using most significant digit
Apply recursively to less significant digits
To manage memory, split separation into two phases
(one_digit_stable_sort in 8binradix.rb)
1. Calculate size of each part
2. Move data items
Complexity is O(n k), where k is the number of digits
Implementation in Ruby: conceptual_bin_sort in 8binradix.rb
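
The recursive idea of bin sort can be sketched as follows. This is not the conceptual_bin_sort from 8binradix.rb; the name bin_sort_sketch and the restriction to non-negative integers with a known number of decimal digits are assumptions of this sketch. For brevity it collects the parts in separate arrays instead of using the two-phase approach.

```ruby
# Sorts non-negative integers that have at most `digits` decimal digits.
def bin_sort_sketch(data, digits)
  return data if digits == 0 || data.size <= 1
  divisor = 10**(digits - 1)
  bins = Array.new(10) { [] }
  # Separate the data into 10 parts using the current (most significant) digit.
  data.each { |x| bins[(x / divisor) % 10] << x }
  # Sort each part recursively by the less significant digits, then concatenate.
  bins.flat_map { |bin| bin_sort_sketch(bin, digits - 1) }
end

p bin_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```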

Radix Sort

Sort once for each digit, starting with the least significant digit
No need to partition data
A stable sorting method is necessary
Complexity is O(n k), where k is the number of digits
Implementation in Ruby: radix_sort in 8binradix.rb
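
A possible sketch of radix sort, using a stable one-digit sort built from the two phases "calculate the size of each part" and "move the data items". The method names stable_sort_by_digit and radix_sort_sketch are chosen for this illustration and are not the ones in 8binradix.rb; non-negative integers with at most `digits` decimal digits are assumed.

```ruby
# Stable sort of non-negative integers by one decimal digit.
def stable_sort_by_digit(data, divisor)
  # Phase 1: calculate the size of each part.
  counts = Array.new(10, 0)
  data.each { |x| counts[(x / divisor) % 10] += 1 }

  # Turn the sizes into the start position of each part.
  starts = Array.new(10, 0)
  (1..9).each { |d| starts[d] = starts[d - 1] + counts[d - 1] }

  # Phase 2: move the data items, keeping their relative order (stable).
  result = Array.new(data.size)
  data.each do |x|
    d = (x / divisor) % 10
    result[starts[d]] = x
    starts[d] += 1
  end
  result
end

# Sorts once per digit, starting with the least significant digit.
def radix_sort_sketch(data, digits)
  divisor = 1
  digits.times do
    data = stable_sort_by_digit(data, divisor)
    divisor *= 10
  end
  data
end

p radix_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```

Each of the k passes is O(n), which gives the O(n k) total.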

Bin Sort vs. Radix Sort

	Bin Sort	Radix Sort
Complexity	O(n k)	O(n k)
First digit sorted	most significant digit	least significant digit
Last digit sorted	least significant digit	most significant digit
Direction	→	←
Stable sort needed	No	Yes
Data partitioning needed	Yes	No

Parallelization of Sorting

Recently, computers are no longer getting faster; instead, they are getting smaller and more numerous
Parallelization becomes important!
For some tasks, parallelization can be very difficult
For some tasks, parallelization can be quite easy
Many sorting algorithms are easy to parallelize:
- Bubble sort
- Merge sort
- Quick sort

The Dictionary ADT

(caution: Not the same as a (book) dictionary)

For each data item, there is:
- A key: Used to identify the data item, e.g. during search
- A value: All the information besides the key (may be empty)
Operations
- Search/find
- Insert
- Delete

Simple Dictionary Implementations

Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
Unordered array/linear list: Search is O(n)
Ideally, search/insertion/deletion should all be O(log n) or even O(1)
- Binary search tree (this week)
- Balanced tree (next week)
- Hashing (in two weeks)
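
To make the trade-off of the sorted array concrete, here is a minimal sketch of a dictionary on top of a sorted Ruby Array. The class name SortedArrayDictionary and its method names are assumptions of this example; overwriting the value when a key already exists is just one possible choice.

```ruby
class SortedArrayDictionary
  def initialize
    @entries = []   # [key, value] pairs, kept sorted by key
  end

  # O(log n): binary search for the first entry with a key >= the search key.
  def find(key)
    i = @entries.bsearch_index { |k, _| k >= key }
    i && @entries[i][0] == key ? @entries[i][1] : nil
  end

  # O(n): finding the position is fast, but later entries must shift right.
  def insert(key, value)
    i = @entries.bsearch_index { |k, _| k >= key } || @entries.size
    if i < @entries.size && @entries[i][0] == key
      @entries[i][1] = value            # existing key: overwrite (assumption)
    else
      @entries.insert(i, [key, value])
    end
  end

  # O(n): later entries must shift left.
  def delete(key)
    i = @entries.bsearch_index { |k, _| k >= key }
    @entries.delete_at(i) if i && @entries[i][0] == key
  end

  def keys
    @entries.map(&:first)
  end
end

dict = SortedArrayDictionary.new
dict.insert(15, "fifteen")
dict.insert(7, "seven")
dict.insert(42, "forty-two")
p dict.keys      # prints [7, 15, 42]
p dict.find(7)   # prints "seven"
dict.delete(7)
p dict.find(7)   # prints nil
```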

Binary Tree

Graph: Consisting of nodes and edges
Tree: The root does not have any parent; all other nodes have exactly one parent
Binary tree: Each node has ≦2 children

Traversal Methods for Binary Trees

Depth first
- Preorder
- Inorder
- Postorder
Breadth first
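
The four traversal methods can be sketched like this. The Struct name BinNode and the use of nil for a missing child are assumptions of this example.

```ruby
BinNode = Struct.new(:key, :left, :right)

# Depth first, preorder: node, then left subtree, then right subtree.
def preorder(node, out = [])
  return out if node.nil?
  out << node.key
  preorder(node.left, out)
  preorder(node.right, out)
end

# Depth first, inorder: left subtree, then node, then right subtree.
def inorder(node, out = [])
  return out if node.nil?
  inorder(node.left, out)
  out << node.key
  inorder(node.right, out)
end

# Depth first, postorder: left subtree, then right subtree, then node.
def postorder(node, out = [])
  return out if node.nil?
  postorder(node.left, out)
  postorder(node.right, out)
  out << node.key
end

# Breadth first: level by level, using a queue.
def breadth_first(root)
  out = []
  queue = [root].compact
  until queue.empty?
    node = queue.shift
    out << node.key
    queue << node.left if node.left
    queue << node.right if node.right
  end
  out
end

#       4
#     2   6
#    1 3 5 7
tree = BinNode.new(4,
                   BinNode.new(2, BinNode.new(1), BinNode.new(3)),
                   BinNode.new(6, BinNode.new(5), BinNode.new(7)))

p preorder(tree)       # prints [4, 2, 1, 3, 6, 5, 7]
p inorder(tree)        # prints [1, 2, 3, 4, 5, 6, 7]
p postorder(tree)      # prints [1, 3, 2, 5, 7, 6, 4]
p breadth_first(tree)  # prints [4, 2, 6, 1, 3, 5, 7]
```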

Binary Search Tree: Invariants

Binary tree
Each node contains one data item
For any node with key k:
- All the keys in the left subtree will be <k (or ≦k)
- All the keys in the right subtree will be >k (or ≧k)
What to do with multiple identical keys is implementation-dependent

Search in a Search Tree

Start searching from the root node
If the current node is the empty node: Terminate search (not found!)
Otherwise, if the search key, compared to the key of the current node
- Is the same: Return the data item at the current node
- Is smaller: Search in the left subtree (recursion)
- Is greater: Search in the right subtree (recursion)
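
The search steps above can be sketched as a short recursive method. The Struct name Node, the method name bst_search, and the use of nil as the empty node are assumptions of this example.

```ruby
Node = Struct.new(:key, :value, :left, :right)

def bst_search(node, key)
  return nil if node.nil?          # empty node: not found
  if key == node.key
    node.value                     # same: return the data item
  elsif key < node.key
    bst_search(node.left, key)     # smaller: search the left subtree
  else
    bst_search(node.right, key)    # greater: search the right subtree
  end
end

root = Node.new(50, "fifty",
                Node.new(30, "thirty"),
                Node.new(70, "seventy"))

p bst_search(root, 70)  # prints "seventy"
p bst_search(root, 60)  # prints nil
```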

Insertion into a Search Tree

Start insertion from the root node
If the inserted key, compared to the current node
- Is smaller: Insert item into left subtree (recursion)
- Is greater: Insert item into right subtree (recursion)
- Is the same: Give up/insert into right subtree, ... (implementation dependent)
If the current node is empty: Insert item here as a new node (with two empty nodes as children)
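
Insertion can be sketched in the same style. Here nil stands in for the empty node, and a key that is already present has its value overwritten; both are assumptions of this example, since the handling of identical keys is implementation-dependent.

```ruby
Node = Struct.new(:key, :value, :left, :right)

# Returns the root of the subtree after insertion.
def bst_insert(node, key, value)
  return Node.new(key, value) if node.nil?   # empty node: insert here
  if key < node.key
    node.left = bst_insert(node.left, key, value)
  elsif key > node.key
    node.right = bst_insert(node.right, key, value)
  else
    node.value = value                       # same key: overwrite (assumption)
  end
  node
end

# Helper for checking the result: keys in inorder are in sorted order.
def inorder_keys(node)
  node.nil? ? [] : inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
end

root = nil
[50, 30, 70, 20, 40].each { |k| root = bst_insert(root, k, "item #{k}") }
p inorder_keys(root)  # prints [20, 30, 40, 50, 70]
```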

Deletion from a Search Tree

Find the node to delete (same as search)
If the number of (real, non-empty) children is
- 0: Delete current node (replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with the smallest node in the right subtree
  (or the largest node in the left subtree)
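
The three cases can be sketched as follows. This is one possible way to write it: nil is used as the empty node, and in the two-children case the key and value of the smallest node in the right subtree are copied into the current node before that node is removed from the right subtree. All names are chosen for this example.

```ruby
Node = Struct.new(:key, :value, :left, :right)

# Returns the root of the subtree after deletion.
def bst_delete(node, key)
  return nil if node.nil?                    # key not found: nothing to do
  if key < node.key
    node.left = bst_delete(node.left, key)
  elsif key > node.key
    node.right = bst_delete(node.right, key)
  elsif node.left.nil?
    return node.right                        # 0 children, or only a right child
  elsif node.right.nil?
    return node.left                         # only a left child
  else
    # 2 children: find the smallest node in the right subtree.
    smallest = node.right
    smallest = smallest.left while smallest.left
    node.key = smallest.key
    node.value = smallest.value
    node.right = bst_delete(node.right, smallest.key)
  end
  node
end

def inorder_keys(node)
  node.nil? ? [] : inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
end

root = Node.new(50, "a",
                Node.new(30, "b", Node.new(20, "c"), Node.new(40, "d")),
                Node.new(70, "e", Node.new(60, "f"), nil))

root = bst_delete(root, 50)   # two children: replaced by 60
p root.key                    # prints 60
p inorder_keys(root)          # prints [20, 30, 40, 60, 70]
```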

Implementation of Search Tree

Share a single special node (NilNode) everywhere there is no child node
Pseudocode/implementation in Ruby: 8bintree.rb
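
One way the shared special node might look is sketched below. This is not the content of 8bintree.rb; the class names EmptyNode and TreeNode, the constant NIL_NODE, and the method names are assumptions of this sketch. The point of the design is that regular nodes never need to test for a missing child, because the empty node answers the same methods.

```ruby
# The special empty node: it answers the same methods as a regular node.
class EmptyNode
  def search(_key)
    nil                          # not found
  end

  def insert(key, value)
    TreeNode.new(key, value)     # the new node takes the place of the empty node
  end

  def height
    0
  end
end

NIL_NODE = EmptyNode.new         # a single instance, shared everywhere

class TreeNode
  def initialize(key, value)
    @key = key
    @value = value
    @left = @right = NIL_NODE    # both children start as the shared empty node
  end

  def search(key)
    if key == @key then @value
    elsif key < @key then @left.search(key)
    else @right.search(key)
    end
  end

  # Returns the node that should stand at this position afterwards.
  def insert(key, value)
    if key < @key then @left = @left.insert(key, value)
    elsif key > @key then @right = @right.insert(key, value)
    else @value = value          # same key: overwrite (assumption)
    end
    self
  end

  def height
    1 + [@left.height, @right.height].max
  end
end

root = NIL_NODE
[50, 30, 70].each { |k| root = root.insert(k, "item #{k}") }
p root.search(30)  # prints "item 30"
p root.search(99)  # prints nil
p root.height      # prints 2
```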

Evaluation of Simple Search Tree

Execution speed depends on the height of the tree
Best height is O(log n)
Worst height is O(n)
Average height is O(log n)
(assuming that all input orders have the same probability)
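
A small experiment, sketched under the assumption that height means the number of nodes on the longest path from the root, shows the difference between the worst case and a typical case. The names KeyNode, insert_key, height, and build are chosen for this example; the seed 42 only makes the shuffle repeatable.

```ruby
KeyNode = Struct.new(:key, :left, :right)

def insert_key(node, key)
  return KeyNode.new(key) if node.nil?
  if key < node.key
    node.left = insert_key(node.left, key)
  else
    node.right = insert_key(node.right, key)
  end
  node
end

# Height: number of nodes on the longest path from the root (0 if empty).
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Builds a search tree by inserting the keys in the given order.
def build(keys)
  keys.reduce(nil) { |root, k| insert_key(root, k) }
end

n = 500
sorted_keys = (1..n).to_a
shuffled_keys = sorted_keys.shuffle(random: Random.new(42))

puts "sorted input:   height #{height(build(sorted_keys))}"    # height 500
puts "shuffled input: height #{height(build(shuffled_keys))}"
puts "lower bound:    height #{Math.log2(n + 1).ceil}"         # height 9
```

Inserting already sorted keys produces the linear-list shape, so the height equals n; a shuffled order gives a height much closer to the lower bound.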

Balanced Trees

In the worst case, the shape of a general search tree is the same as the shape of a linear list
Because the order of insertions/deletions cannot be changed,
using randomization (as used for quicksort) to select the dividing item is impossible
Keeping the tree a complete binary tree at all times makes insertion/deletion take too much time

Solution: A tree that is to some degree (but not perfectly) balanced

Top-down 2-3-4 Tree

(definition/invariants)

The number of children for each node is 2, 3, or 4
If a node has k children, it stores k-1 keys and data items
(if the number of children is 2, then this is the same as for a binary search tree)
The keys stored in a node are the separators for the subtrees
The tree is of uniform height
In the lowest layer of the tree, the nodes have no children
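
To illustrate how the keys in a node act as separators, here is a sketch of search in a 2-3-4 tree, together with a check of the uniform-height invariant. The Struct Node234 with the fields keys, values, and children, and the hand-built example tree, are assumptions of this example; insertion is deliberately not shown.

```ruby
# keys: 1 to 3 sorted keys; values: the data items for these keys;
# children: empty in the lowest layer, otherwise keys.size + 1 subtrees.
Node234 = Struct.new(:keys, :values, :children)

def search_234(node, key)
  # Find the first key in this node that is not smaller than the search key.
  i = 0
  i += 1 while i < node.keys.size && key > node.keys[i]
  return node.values[i] if i < node.keys.size && key == node.keys[i]
  return nil if node.children.empty?       # lowest layer: not found
  search_234(node.children[i], key)        # subtree between the separators
end

# Depths of all nodes in the lowest layer (uniform height: all are equal).
def leaf_depths(node, depth = 1)
  return [depth] if node.children.empty?
  node.children.flat_map { |child| leaf_depths(child, depth + 1) }
end

leaf = ->(keys) { Node234.new(keys, keys.map { |k| "item #{k}" }, []) }

#            [20, 40]
#   [5, 10]    [30]    [50, 60, 70]
root = Node234.new([20, 40], ["item 20", "item 40"],
                   [leaf.([5, 10]), leaf.([30]), leaf.([50, 60, 70])])

p search_234(root, 60)    # prints "item 60"
p search_234(root, 35)    # prints nil
p leaf_depths(root).uniq  # prints [2]
```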

Summary

Bin sort and radix sort are O(n k)
A dictionary is an ADT storing values that can be found using keys
A binary search tree is a way to implement a dictionary
Operations on binary search trees are O(log n) on average, but O(n) in the worst case
This problem can be addressed using balanced trees

Homework

(no need to submit)

Calculate the minimum and maximum height of a binary search tree with n data items
Calculate the minimum and maximum height of a 2-3-4 tree with n data items
Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm

Glossary

bin sort: ビンソート
most significant digit: 最上位の桁
radix sort: 基数整列
least significant digit: 最下位の桁
balanced tree: 平衡木
traversal method: 辿り方
binary search tree: 二分探索木
hashing: ハッシュ法
binary tree: 二分木
parallelization: 並列化
depth first: 深さ優先
preorder: 行きがけ順
inorder: 通りがけ順
postorder: 帰りがけ順
breadth first: 幅優先
key: キー、鍵
implementation-dependent: 実装依存
top-down 2-3-4 tree: トップダウン 2-3-4 木
uniform height: 一定の高さ