# Dictionaries and their Implementation: Binary Search Trees

(辞書とその実装: 二分探索木など)

## Data Structures and Algorithms

### 8th lecture, November 21, 2019

http://www.sw.it.aoyama.ac.jp/2019/DA/lecture8.html

### Martin J. Dürst

© 2009-19 Martin J. Dürst, Aoyama Gakuin University (青山学院大学)

# Today's Schedule

• Leftovers and summary of last lecture
• Sorting algorithms faster than O(n log n)
• Binary trees and their traversal methods
• Binary search trees
• Balanced trees

# Summary of Last Lecture

• Quicksort is a very efficient algorithm for sorting
• In the worst case, quicksort is O(n²); on average, O(n log n)
• Quicksort is a good example for the use of average time complexity and randomization in algorithms
• Implementing quicksort requires careful attention to many details
• Animation of many sorting algorithms: sort.svg
• Sorting based on pairwise comparison is Ω(n log n)

# Sorting Faster than O(n log n)

• All sorting algorithms studied so far assume an arbitrary distribution of values
• Decisions are made by pairwise comparisons of values
→ The depth of the decision tree is at least log₂(n!), which is Ω(n log n)
• If there is some knowledge about the value distribution, improvements are possible
• Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
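
As a minimal sketch of the extreme example above, assume the input contains each integer from 1 to n exactly once. The method name `sort_permutation` is made up for this illustration. Each value v can be written directly to index v - 1, so no comparisons between data items are needed.

```ruby
# Sorts an array that contains each integer from 1 to n exactly once.
# Assumption for this sketch: the input really is such a permutation.
def sort_permutation(data)
  result = Array.new(data.size)
  data.each do |value|
    result[value - 1] = value   # the final place follows from the value alone
  end
  result
end

p sort_permutation([3, 1, 4, 2])   # prints [1, 2, 3, 4]
```

The loop touches each item once, which gives the O(n) bound.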

# Bin Sort

(also called bucket sort)

Example: Sorting by student number

• Separate data into 10 parts using most significant digit
• Apply recursively to less significant digits
• To manage memory, split separation into two phases
(`one_digit_stable_sort` in 8binradix.rb)
1. Calculate size of each part
2. Move data items
• Complexity is O(n k), where k is the number of digits
• Implementation in Ruby: `conceptual_bin_sort` in 8binradix.rb
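
The recursive idea can be sketched as follows. This is not the code from 8binradix.rb: the method name `bin_sort_sketch` and the restriction to non-negative integers with at most `digits` decimal digits are assumptions made for this illustration. For simplicity, it uses growing arrays instead of the two-phase memory management described above.

```ruby
# Conceptual bin sort (most significant digit first) for non-negative
# integers that all have at most `digits` decimal digits.
# Hypothetical helper for illustration; not the code in 8binradix.rb.
def bin_sort_sketch(data, digits)
  return data if digits.zero? || data.size <= 1

  divisor = 10**(digits - 1)
  bins = Array.new(10) { [] }
  # Separate the data into 10 parts using the current digit
  data.each { |n| bins[(n / divisor) % 10] << n }
  # Apply recursively to the less significant digits, then concatenate
  bins.flat_map { |bin| bin_sort_sketch(bin, digits - 1) }
end

p bin_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```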

# Radix Sort

• Sort once for each digit, starting with the least significant digit
• No need to partition data
• A stable sorting method is necessary
• Complexity is O(n k), where k is the number of digits
• Implementation in Ruby: `radix_sort` in 8binradix.rb
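
A minimal sketch of radix sort, assuming non-negative integers with at most `digits` decimal digits. The names `stable_sort_by_digit` and `radix_sort_sketch` are hypothetical and not taken from 8binradix.rb. The stable one-digit sort uses the two phases mentioned for bin sort: first calculate the size of each part, then move the data items.

```ruby
# Stable sort on one decimal digit, in two phases:
# 1. count the size of each part, 2. move the data items.
# Hypothetical sketch; names are not taken from 8binradix.rb.
def stable_sort_by_digit(data, divisor)
  counts = Array.new(10, 0)
  data.each { |n| counts[(n / divisor) % 10] += 1 }

  # Start index of each part in the output array
  starts = Array.new(10, 0)
  (1..9).each { |d| starts[d] = starts[d - 1] + counts[d - 1] }

  result = Array.new(data.size)
  data.each do |n|
    digit = (n / divisor) % 10
    result[starts[digit]] = n   # items with equal digits keep their input order
    starts[digit] += 1
  end
  result
end

# Radix sort: one stable pass per digit, least significant digit first.
def radix_sort_sketch(data, digits)
  digits.times { |i| data = stable_sort_by_digit(data, 10**i) }
  data
end

p radix_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```

Stability matters here: a later pass on a more significant digit must not undo the order established by the earlier passes.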

# Bin Sort vs. Radix Sort

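| | Bin sort | Radix sort |
|---|---|---|
| Complexity | O(n k) | O(n k) |
| First digit sorted | most significant digit | least significant digit |
| Last digit sorted | least significant digit | most significant digit |
| Direction | most significant → least significant | least significant → most significant |
| Stable sort needed | No | Yes |
| Data partitioning needed | Yes | No |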

# Parallelization of Sorting

• In recent years, individual computers have stopped getting much faster; instead, they are getting smaller and more numerous
• Parallelization becomes important!
• For some tasks, parallelization can be very difficult
• For some tasks, parallelization can be quite easy
• Many sorting algorithms are easy to parallelize:
  • Bubblesort
  • Mergesort
  • Quicksort
  • Binsort

# Dictionary

(caution: not the same as a (book) dictionary)

• For each data item, there is:
  • A key: Used to identify the data item, e.g. during search
  • A value: All the information besides the key (may be empty)
• Operations
  • Search/find
  • Insert
  • Delete
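
As a quick illustration of these three operations, Ruby's built-in `Hash` class can be used as a ready-made dictionary. The student numbers and names below are invented for this example.

```ruby
students = {}                    # an empty dictionary
students[15_819_001] = 'Aoki'    # insert: key => value
students[15_819_002] = 'Baba'
p students[15_819_001]           # search/find: prints "Aoki"
students.delete(15_819_002)      # delete
p students.key?(15_819_002)      # prints false
```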

# Simple Dictionary Implementations

• Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
• Unordered array/linear list: Search is O(n)
• Ideally, search/insertion/deletion should all be O(log n) or even O(1)
  • Binary search tree (this week)
  • Balanced tree (next week)
  • Hashing (in two weeks)
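
The sorted-array variant can be sketched as follows. The class `SortedArrayDictionary` is a hypothetical name used only for this illustration; it relies on Ruby's `Array#bsearch_index` for the binary search. Overwriting the value when a key already exists is a design choice of this sketch.

```ruby
# Dictionary kept as an array of [key, value] pairs sorted by key.
# Hypothetical class for illustration only.
class SortedArrayDictionary
  def initialize
    @items = []
  end

  # O(log n): binary search
  def find(key)
    i = position(key)
    match?(i, key) ? @items[i][1] : nil
  end

  # O(n): later items have to be shifted to make room
  def insert(key, value)
    i = position(key)
    if match?(i, key)
      @items[i][1] = value           # same key: overwrite
    else
      @items.insert(i, [key, value])
    end
  end

  # O(n): later items have to be shifted to close the gap
  def delete(key)
    i = position(key)
    @items.delete_at(i) if match?(i, key)
  end

  private

  # Index of the first pair whose key is >= key, or the array size
  def position(key)
    @items.bsearch_index { |k, _v| k >= key } || @items.size
  end

  def match?(index, key)
    index < @items.size && @items[index][0] == key
  end
end

dict = SortedArrayDictionary.new
dict.insert(30, 'thirty')
dict.insert(10, 'ten')
dict.insert(20, 'twenty')
p dict.find(20)   # prints "twenty"
dict.delete(20)
p dict.find(20)   # prints nil
```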

# Binary Tree

• Graph: Consisting of nodes and edges
• Tree: The root does not have any parent; all other nodes have exactly one parent
• Binary tree: Each node has ≤ 2 children

# Traversal Methods

• Depth first
  • Preorder
  • Inorder
  • Postorder
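
The three depth-first orders differ only in when the current node is visited relative to its subtrees. A minimal sketch, assuming a node type `TraversalNode` (a name made up here) and `nil` for a missing child:

```ruby
TraversalNode = Struct.new(:value, :left, :right)

# Preorder: node first, then left subtree, then right subtree
def preorder(node, out = [])
  return out if node.nil?
  out << node.value
  preorder(node.left, out)
  preorder(node.right, out)
end

# Inorder: left subtree, then node, then right subtree
def inorder(node, out = [])
  return out if node.nil?
  inorder(node.left, out)
  out << node.value
  inorder(node.right, out)
end

# Postorder: left subtree, then right subtree, then node
def postorder(node, out = [])
  return out if node.nil?
  postorder(node.left, out)
  postorder(node.right, out)
  out << node.value
end

#       1
#      / \
#     2   3
#    / \
#   4   5
tree = TraversalNode.new(1,
                         TraversalNode.new(2, TraversalNode.new(4), TraversalNode.new(5)),
                         TraversalNode.new(3))
p preorder(tree)    # prints [1, 2, 4, 5, 3]
p inorder(tree)     # prints [4, 2, 5, 1, 3]
p postorder(tree)   # prints [4, 5, 2, 3, 1]
```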

# Binary Search Tree: Invariants

• Binary tree
• Each node contains one data item
• For any node with key k:
  • All keys in the left subtree are < k (or ≤ k)
  • All keys in the right subtree are > k (or ≥ k)
• What to do with multiple identical keys is implementation-dependent

# Search in a Search Tree

• Start searching from the root node
• If the search key, compared to the current node key, is:
  • The same: Return the data item at the current node
  • Smaller: Search the left subtree (recursion)
  • Greater: Search the right subtree (recursion)
• If the current node is empty: The key is not in the tree (search fails)
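
The search steps can be sketched as a short recursive method. This is not the code from 8bintree.rb: the names `SearchNode` and `bst_search` are hypothetical, and an empty subtree is represented by `nil` here.

```ruby
SearchNode = Struct.new(:key, :value, :left, :right)

# Returns the value stored under key, or nil if the key is not in the tree.
def bst_search(node, key)
  return nil if node.nil?          # empty subtree: not found
  if key == node.key
    node.value
  elsif key < node.key
    bst_search(node.left, key)     # smaller: continue in the left subtree
  else
    bst_search(node.right, key)    # greater: continue in the right subtree
  end
end

tree = SearchNode.new(50, 'fifty',
                      SearchNode.new(30, 'thirty'),
                      SearchNode.new(70, 'seventy'))
p bst_search(tree, 70)   # prints "seventy"
p bst_search(tree, 40)   # prints nil
```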

# Insertion into a Search Tree

• Start insertion from the root node
• If the inserted key, compared to the current node key, is:
  • Smaller: Insert item into left subtree (recursion)
  • Greater: Insert item into right subtree (recursion)
  • The same: Give up/insert into right subtree, ... (implementation dependent)
• If the current node is empty: Insert item here as a new node (with two empty nodes as children)
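
A minimal sketch of insertion, again with hypothetical names (`InsertNode`, `bst_insert`) and `nil` for empty nodes. For identical keys, this sketch overwrites the stored value, which is only one of the possible policies.

```ruby
InsertNode = Struct.new(:key, :value, :left, :right)

# Inserts key/value and returns the root of the resulting subtree.
def bst_insert(node, key, value)
  return InsertNode.new(key, value) if node.nil?   # empty place: new node here
  if key < node.key
    node.left = bst_insert(node.left, key, value)
  elsif key > node.key
    node.right = bst_insert(node.right, key, value)
  else
    node.value = value   # same key: overwrite (policy chosen for this sketch)
  end
  node
end

# Inorder traversal of the keys, used here to check the result
def keys_inorder(node)
  return [] if node.nil?
  keys_inorder(node.left) + [node.key] + keys_inorder(node.right)
end

root = nil
[50, 30, 70, 20, 40].each { |k| root = bst_insert(root, k, k.to_s) }
p keys_inorder(root)   # prints [20, 30, 40, 50, 70]
```

An inorder traversal of a binary search tree always yields the keys in sorted order, which makes it a convenient check.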

# Deletion from a Search Tree

• Find the node to delete (same as search)
• If the number of (real, non-empty) children is:
  • 0: Delete current node (replace with empty node)
  • 1: Replace current node with its child
  • 2: Replace current node with the smallest node in the right subtree
    (or the largest node in the left subtree), and delete that node from its original position
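
The three cases can be sketched as follows. All names (`DeleteNode`, `insert_key`, `delete_key`, `smallest`) are hypothetical, nodes store keys only, and `nil` stands for an empty node. In the two-children case, the sketch copies the key of the smallest node of the right subtree into the current node and then deletes that smallest node from the right subtree.

```ruby
DeleteNode = Struct.new(:key, :left, :right)

def insert_key(node, key)
  return DeleteNode.new(key) if node.nil?
  if key < node.key
    node.left = insert_key(node.left, key)
  elsif key > node.key
    node.right = insert_key(node.right, key)
  end
  node
end

# Smallest node of a non-empty subtree: keep going left
def smallest(node)
  node = node.left until node.left.nil?
  node
end

# Deletes key and returns the root of the resulting subtree.
def delete_key(node, key)
  return nil if node.nil?               # key not found: nothing to delete
  if key < node.key
    node.left = delete_key(node.left, key)
  elsif key > node.key
    node.right = delete_key(node.right, key)
  elsif node.left.nil?                  # 0 children, or only a right child
    return node.right
  elsif node.right.nil?                 # only a left child
    return node.left
  else                                  # 2 children
    successor = smallest(node.right)
    node.key = successor.key
    node.right = delete_key(node.right, successor.key)
  end
  node
end

def keys_inorder(node)
  node.nil? ? [] : keys_inorder(node.left) + [node.key] + keys_inorder(node.right)
end

root = nil
[50, 30, 70, 20, 40, 60, 80].each { |k| root = insert_key(root, k) }
root = delete_key(root, 20)   # 0 children
root = delete_key(root, 30)   # 1 child (40)
root = delete_key(root, 50)   # 2 children: replaced by 60
p keys_inorder(root)          # prints [40, 60, 70, 80]
```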

# Implementation of Search Tree

• Share a single special node (`NilNode`) everywhere there is no child node
• Pseudocode/implementation in Ruby: 8bintree.rb
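
The idea of a shared empty node can be sketched as below. This only illustrates the design and is an assumption about its general shape, not the actual classes in 8bintree.rb: the names `NilNodeSketch`, `TreeNodeSketch`, and `EMPTY` are made up. Because the empty node answers the same methods as a real node, the real node never has to test for a missing child.

```ruby
# Stands in for "no child here"; one shared instance is enough
# because it has no state of its own.
class NilNodeSketch
  def find(_key)
    nil                              # searching an empty tree finds nothing
  end

  def insert(key, value)
    TreeNodeSketch.new(key, value)   # an empty place turns into a real node
  end

  def height
    0
  end
end

EMPTY = NilNodeSketch.new            # the single shared empty node

class TreeNodeSketch
  def initialize(key, value)
    @key = key
    @value = value
    @left = @right = EMPTY           # both children start as the shared empty node
  end

  def find(key)
    return @value if key == @key
    (key < @key ? @left : @right).find(key)
  end

  # Returns the root of the subtree after insertion
  def insert(key, value)
    if key < @key
      @left = @left.insert(key, value)
    elsif key > @key
      @right = @right.insert(key, value)
    else
      @value = value                 # same key: overwrite (policy of this sketch)
    end
    self
  end

  def height
    1 + [@left.height, @right.height].max
  end
end

root = EMPTY
[50, 30, 70].each { |k| root = root.insert(k, k.to_s) }
p root.find(30)   # prints "30"
p root.find(99)   # prints nil
p root.height     # prints 2
```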

# Evaluation of Simple Search Tree

• Execution speed depends on the height of the tree
• Best height is O(log n)
• Worst height is O(n)
• Average height is O(log n)
(assuming that all input orders have the same probability)
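
The difference between the worst case and a typical case can be observed with a small experiment. This is a sketch with hypothetical names (`HeightNode`, `insert_key`, `height`, `build`); the height is counted in nodes on the longest path from the root.

```ruby
HeightNode = Struct.new(:key, :left, :right)

def insert_key(node, key)
  return HeightNode.new(key) if node.nil?
  if key < node.key
    node.left = insert_key(node.left, key)
  elsif key > node.key
    node.right = insert_key(node.right, key)
  end
  node
end

# Number of nodes on the longest path from this node downwards
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Builds a tree by inserting the keys in the given order
def build(keys)
  keys.reduce(nil) { |root, key| insert_key(root, key) }
end

n = 500
sorted_keys = (1..n).to_a
random_keys = sorted_keys.shuffle(random: Random.new(1))

puts height(build(sorted_keys))   # sorted input: height n, like a linear list
puts height(build(random_keys))   # random order: typically a small multiple of log n
```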

# Balanced Trees

• In the worst case, the shape of a general search tree is the same as the shape of a linear list
• The order of insertions/deletions is given from outside and cannot be changed
• Therefore, selecting the dividing item using randomization (as in quicksort) is impossible
• Keeping the tree a complete binary tree at all times would make insertion/deletion take too much time

Solution: A tree that is to some degree (but not perfectly) balanced

# Top-down 2-3-4 Tree

(definition/invariants)

• The number of children for each node is 2, 3, or 4
• If a node has k children, it stores k-1 keys and data items
(if the number of children is 2, then this is the same as a binary search tree node)
• The keys stored in a node are the separators for the subtrees
• The tree is of uniform height
• In the lowest layer of the tree, the nodes have no children
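
To make the invariants concrete, here is a minimal sketch of search in such a tree. The node layout `Node234` (sorted keys, matching values, list of children) and the method `search_234` are assumptions for this illustration only. Insertion is deliberately left out.

```ruby
# A node holds 1 to 3 sorted keys with their values. An inner node has
# keys.size + 1 children; a node in the lowest layer has none.
# Hypothetical structure for illustration.
Node234 = Struct.new(:keys, :values, :children)

def search_234(node, key)
  # Find the first stored key that is not smaller than the search key
  i = 0
  i += 1 while i < node.keys.size && key > node.keys[i]
  return node.values[i] if i < node.keys.size && key == node.keys[i]
  return nil if node.children.empty?   # lowest layer: key is not in the tree
  search_234(node.children[i], key)    # keys act as separators for the subtrees
end

leaf1 = Node234.new([10, 20], %w[a b], [])
leaf2 = Node234.new([40], %w[d], [])
leaf3 = Node234.new([60, 70, 80], %w[f g h], [])
root  = Node234.new([30, 50], %w[c e], [leaf1, leaf2, leaf3])

p search_234(root, 70)   # prints "g"
p search_234(root, 30)   # prints "c"
p search_234(root, 45)   # prints nil
```

In this example, the root has 3 children and stores 2 keys, and all leaves are at the same depth, in line with the invariants above.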

# Summary

• Bin sort and radix sort are O(n k)
• A dictionary is an ADT storing values that can be found using keys
• A binary search tree is a way to implement a dictionary
• Operations on binary search trees are O(log n) on average, but O(n) in the worst case
• This problem can be addressed using balanced trees

# Homework

(no need to submit)

• Calculate the minimum and maximum height of a binary search tree with n data items
• Calculate the minimum and maximum height of a 2-3-4 tree with n data items
• Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm

# Glossary

bin sort
ビンソート
most significant digit

least significant digit

balanced tree

traversal method

binary search tree

hashing
ハッシュ法
binary tree

parallelization

depth first

preorder

inorder

postorder