# Dictionaries and their Implementation: Binary Trees, ...

(辞書とその実装: 二分木など)

## Data Structures and Algorithms

### 8th lecture, November 17, 2016

http://www.sw.it.aoyama.ac.jp/2016/DA/lecture8.html

### Martin J. Dürst

© 2009-16 Martin J. Dürst 青山学院大学

# Today's Schedule

• Summary of last lecture
• Sorting algorithms faster than O(n log n)
• Binary trees and their traversal methods
• Binary search tree
• Balanced tree

# Summary of Last Lecture

• Quicksort is a very efficient algorithm for sorting, and a good example to learn about algorithms and their implementation
• In the worst case, quicksort is O(n²); on average, O(n log n)
• Quicksort is a good example for the use of average time complexity and randomized algorithms
• Implementing quicksort requires attention to many details
• Animation of many sorting algorithms: sort.svg
• Sorting based on pairwise comparison is Ω(n log n)

# Sorting Faster than O(n log n)

• All sorting algorithms studied so far assume an arbitrary distribution of values
• Decisions are made by comparisons of values;
the depth of the decision tree is Ω(n log n)
• If there is some knowledge about the value distribution, improvements are possible
• Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
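
The extreme example above can be sketched in a few lines of Ruby. This is only an illustration, assuming the input contains each integer from 1 to n exactly once; the method name `place_by_value` is made up for this sketch.

```ruby
# Sketch: sort an array that contains each integer from 1 to n exactly once.
# The value v belongs at index v - 1, so no comparisons between values
# are needed and the total work is O(n).
def place_by_value(values)
  result = Array.new(values.length)
  values.each { |v| result[v - 1] = v }
  result
end

p place_by_value([3, 1, 5, 2, 4])  # prints [1, 2, 3, 4, 5]
```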

# Bin Sort

(also called bucket sort)

Example: Sorting by student number

• Separate data into 10 parts using most significant digit
• Apply recursively to less significant digits
• To manage memory, split separation into two phases
(`one_digit_stable_sort` in 8binradix.rb)
1. Counting number of items in each part
2. Moving data items
• Complexity is O(n k), where k is the number of digits
• Implementation in Ruby: `conceptual_bin_sort` in 8binradix.rb
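
A minimal sketch of the idea, assuming non-negative integer keys with at most `digits` decimal digits. For readability it collects the parts in ten separate arrays instead of using the two-phase count/move approach, and it is not taken from 8binradix.rb; the name `bin_sort_sketch` is hypothetical.

```ruby
# Sketch: bin sort starting from the most significant digit.
# items:  non-negative integers
# digits: maximum number of decimal digits of any item
def bin_sort_sketch(items, digits)
  return items if digits == 0 || items.length <= 1

  divisor = 10**(digits - 1)                 # selects the current digit
  bins = Array.new(10) { [] }                # one bin per digit value 0..9
  items.each { |item| bins[(item / divisor) % 10] << item }

  # Sort each bin by the remaining, less significant digits, then concatenate
  bins.flat_map { |bin| bin_sort_sketch(bin, digits - 1) }
end

p bin_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```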

# Radix Sort

• Sort once for each digit, starting with the least significant digit
• No need to partition data
• A stable sorting method is necessary
• Complexity is O(n k), where k is the number of digits
• Implementation in Ruby: `radix_sort` in 8binradix.rb
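
The least-significant-digit-first procedure can be sketched as follows, assuming non-negative integer keys with at most `digits` decimal digits. The per-digit sort uses the two phases (counting, then moving) and is stable. The names `stable_sort_by_digit` and `radix_sort_sketch` are made up for this sketch, which may differ from the code in 8binradix.rb.

```ruby
# Sketch: stable sort of non-negative integers by one decimal digit.
# divisor selects the digit: 1 for units, 10 for tens, and so on.
def stable_sort_by_digit(items, divisor)
  # Phase 1: count the number of items in each of the 10 parts
  counts = Array.new(10, 0)
  items.each { |item| counts[(item / divisor) % 10] += 1 }

  # Turn the counts into the start position of each part
  positions = Array.new(10, 0)
  (1..9).each { |d| positions[d] = positions[d - 1] + counts[d - 1] }

  # Phase 2: move each item to the next free slot of its part.
  # Scanning in input order keeps equal digits in order (stability).
  result = Array.new(items.length)
  items.each do |item|
    digit = (item / divisor) % 10
    result[positions[digit]] = item
    positions[digit] += 1
  end
  result
end

# Sketch: sort once per digit, least significant digit first.
def radix_sort_sketch(items, digits)
  divisor = 1
  digits.times do
    items = stable_sort_by_digit(items, divisor)
    divisor *= 10
  end
  items
end

p radix_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```

Each of the k passes touches every item a constant number of times, which is where the O(n k) bound comes from.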

# Parallelization of Sorting

• These days, individual computers are no longer getting much faster; instead, they are getting smaller and more numerous
• Parallelization becomes important!
• For some tasks, parallelization can be very difficult
• For some tasks, parallelization can be quite easy
• Many sorting algorithms are easy to parallelize:
• Bubble sort
• Merge sort
• Quick sort

# The Dictionary ADT

(caution: Not exactly the same as a (book) dictionary)

• For each data item, there is:
• A key: Used to identify the data item, e.g. during search
• A value: All the information besides the key (may be empty)
• Operations
• Search/find
• Insert
• Delete

# Simple Dictionary Implementations

• Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
• Unordered array/linear list: Search is O(n)
• Ideally, search/insertion/deletion should all be O(log n) or even O(1)
• Binary search tree (this week)
• Balanced tree (next week)
• Hashing (in two weeks)
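
To make the cost of the sorted-array variant concrete, here is a minimal sketch of a dictionary kept as a sorted array of key/value pairs. The class name `SortedArrayDictionary` and its method names are assumptions made for this illustration.

```ruby
# Sketch: dictionary stored as an array of [key, value] pairs sorted by key.
class SortedArrayDictionary
  def initialize
    @entries = []
  end

  # O(log n): binary search
  def find(key)
    entry = @entries[position(key)]
    entry && entry[0] == key ? entry[1] : nil
  end

  # O(n): later entries have to be shifted to make room
  def insert(key, value)
    index = position(key)
    if @entries[index] && @entries[index][0] == key
      @entries[index][1] = value        # existing key: overwrite the value
    else
      @entries.insert(index, [key, value])
    end
  end

  # O(n): later entries have to be shifted to close the gap
  def delete(key)
    index = position(key)
    return nil unless @entries[index] && @entries[index][0] == key
    @entries.delete_at(index)[1]
  end

  private

  # Binary search: index of the first entry whose key is >= the given key
  def position(key)
    low = 0
    high = @entries.length
    while low < high
      mid = (low + high) / 2
      if @entries[mid][0] < key
        low = mid + 1
      else
        high = mid
      end
    end
    low
  end
end

dictionary = SortedArrayDictionary.new
dictionary.insert(15, "fifteen")
dictionary.insert(7, "seven")
p dictionary.find(7)    # prints "seven"
p dictionary.find(8)    # prints nil
```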

# Binary Tree

• Graph: Consisting of nodes and edges
• Tree: The root does not have any parent; all other nodes have exactly one parent
• Binary tree: Each node has ≤ 2 children

# Traversal Methods for Binary Trees

• Depth first
• Preorder
• Inorder
• Postorder
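
The three depth-first orders differ only in when the current node is visited relative to its subtrees. A minimal sketch, assuming a node is a `Struct` with `key`, `left`, and `right`, and that `nil` stands for a missing child (both are choices made for this illustration):

```ruby
Node = Struct.new(:key, :left, :right)

# Preorder: node first, then left subtree, then right subtree
def preorder(node, visited = [])
  return visited if node.nil?
  visited << node.key
  preorder(node.left, visited)
  preorder(node.right, visited)
end

# Inorder: left subtree, then node, then right subtree
def inorder(node, visited = [])
  return visited if node.nil?
  inorder(node.left, visited)
  visited << node.key
  inorder(node.right, visited)
end

# Postorder: left subtree, then right subtree, then node
def postorder(node, visited = [])
  return visited if node.nil?
  postorder(node.left, visited)
  postorder(node.right, visited)
  visited << node.key
end

#       4
#      / \
#     2   6
#    / \
#   1   3
tree = Node.new(4, Node.new(2, Node.new(1), Node.new(3)), Node.new(6))

p preorder(tree)   # prints [4, 2, 1, 3, 6]
p inorder(tree)    # prints [1, 2, 3, 4, 6]
p postorder(tree)  # prints [1, 3, 2, 6, 4]
```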

# Binary Search Tree

(definition/invariants)

• Each node contains one data item
• For any node with key k:
• All the keys in the left subtree will be < k (or ≤ k)
• All the keys in the right subtree will be > k (or ≥ k)
• What to do with multiple identical keys is implementation-dependent
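
The invariant can be checked mechanically. The sketch below assumes the strict variant (no duplicate keys) and `nil` for a missing child; `KeyNode` and `search_tree?` are names invented for this illustration.

```ruby
KeyNode = Struct.new(:key, :left, :right)

# Sketch: true if all keys in the subtree lie strictly between low and high.
# A bound of nil means "no bound on this side".
def search_tree?(node, low = nil, high = nil)
  return true if node.nil?
  return false if low && node.key <= low
  return false if high && node.key >= high
  # Keys on the left must stay below node.key, keys on the right above it
  search_tree?(node.left, low, node.key) &&
    search_tree?(node.right, node.key, high)
end

valid = KeyNode.new(4, KeyNode.new(2, KeyNode.new(1), KeyNode.new(3)), KeyNode.new(6))
# 5 sits in the left subtree of 4, which violates the invariant
invalid = KeyNode.new(4, KeyNode.new(2, KeyNode.new(1), KeyNode.new(5)), KeyNode.new(6))

p search_tree?(valid)    # prints true
p search_tree?(invalid)  # prints false
```

Comparing each node only with its direct children would not be enough: in the second tree, 5 is a correct right child of 2 but still breaks the condition for the root.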

# Search in a Search Tree

• Start searching from the root node
• If the current node is empty: Give up searching (the key is not in the tree)
• Otherwise, if the search key, compared to the key of the current node
• Is the same: Return the data item at the current node
• Is smaller: Search in the left subtree
• Is greater: Search in the right subtree
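
The search steps translate almost directly into a recursive method. This is a sketch assuming `nil` represents the empty node and that keys can be compared with `<`; `Item` and `search` are names chosen for this illustration.

```ruby
Item = Struct.new(:key, :value, :left, :right)

# Sketch: return the value stored under key, or nil if the key is absent.
def search(node, key)
  return nil if node.nil?          # empty node: give up
  if key == node.key
    node.value                     # same: found
  elsif key < node.key
    search(node.left, key)         # smaller: continue in the left subtree
  else
    search(node.right, key)        # greater: continue in the right subtree
  end
end

tree = Item.new(50, "fifty",
                Item.new(30, "thirty"),
                Item.new(70, "seventy"))

p search(tree, 70)  # prints "seventy"
p search(tree, 40)  # prints nil
```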

# Insertion into a Search Tree

• Start insertion from the root node
• If the inserted key, compared to the current node
• Is smaller: Insert item into left subtree
• Is greater: Insert item into right subtree
• Is the same: Give up/insert into right subtree, ... (implementation dependent)
• If the current node is empty: Insert item here as a new node (with two empty nodes as children)
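
A minimal sketch of insertion, again assuming `nil` for the empty node. For an identical key this version overwrites the stored value, which is just one of the implementation-dependent choices mentioned above. The names `Entry` and `insert` are made up for this illustration.

```ruby
Entry = Struct.new(:key, :value, :left, :right)

# Sketch: insert key/value into the subtree rooted at node and
# return the (possibly new) root of that subtree.
def insert(node, key, value)
  return Entry.new(key, value) if node.nil?   # empty node: new node goes here
  if key < node.key
    node.left = insert(node.left, key, value)
  elsif key > node.key
    node.right = insert(node.right, key, value)
  else
    node.value = value                        # same key: overwrite (a choice)
  end
  node
end

root = nil
[50, 30, 70, 20, 40].each { |key| root = insert(root, key, key.to_s) }

p root.key             # prints 50
p root.left.key        # prints 30
p root.left.right.key  # prints 40
```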

# Deletion from a Search Tree

• Find the node to delete (using search)
• If the number of (real, non-empty) children is
• 0: Delete current node (replace with empty node)
• 1: Replace current node with child
• 2: Replace current node with smallest node in the right subtree
(or largest node in left subtree)
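
The three cases can be sketched as follows, assuming `nil` for the empty node and nodes that store only a key (a real dictionary would move the data item along with the key). The helper names `add_key`, `smallest`, and `keys_in_order` are hypothetical.

```ruby
BstNode = Struct.new(:key, :left, :right)

def add_key(node, key)
  return BstNode.new(key) if node.nil?
  if key < node.key
    node.left = add_key(node.left, key)
  else
    node.right = add_key(node.right, key)
  end
  node
end

# Leftmost node of a non-empty subtree, i.e. the one with the smallest key
def smallest(node)
  node = node.left until node.left.nil?
  node
end

# Sketch: delete key from the subtree rooted at node; returns the new root.
def delete(node, key)
  return nil if node.nil?                    # key not found
  if key < node.key
    node.left = delete(node.left, key)
  elsif key > node.key
    node.right = delete(node.right, key)
  elsif node.left.nil?
    return node.right                        # 0 children, or only a right child
  elsif node.right.nil?
    return node.left                         # only a left child
  else
    successor = smallest(node.right)         # 2 children
    node.key = successor.key
    node.right = delete(node.right, successor.key)
  end
  node
end

def keys_in_order(node)
  node.nil? ? [] : keys_in_order(node.left) + [node.key] + keys_in_order(node.right)
end

root = [50, 30, 70, 20, 40, 60, 80].reduce(nil) { |tree, key| add_key(tree, key) }
root = delete(root, 50)   # node with two children
p root.key                # prints 60
p keys_in_order(root)     # prints [20, 30, 40, 60, 70, 80]
```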

# Implementation of Search Tree

• Share a single special node (`NilNode`) everywhere there is no child node
• Pseudocode/implementation in Ruby: 8bintree.rb
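
One possible way to realize the shared special node is sketched below. Only the name `NilNode` comes from the slide; the classes `EmptyTree` and `TreeNode` and all method names are assumptions, and the actual code in 8bintree.rb may be organized differently.

```ruby
# Sketch: the empty node answers the same messages as a real node,
# so real nodes never have to test their children for nil.
class EmptyTree
  def find(_key)
    nil                          # nothing stored here: give up
  end

  def insert(key, value)
    TreeNode.new(key, value)     # an empty place becomes a new node
  end
end

# A single instance is shared everywhere there is no child node
NilNode = EmptyTree.new

class TreeNode
  def initialize(key, value)
    @key = key
    @value = value
    @left = NilNode
    @right = NilNode
  end

  def find(key)
    if key == @key
      @value
    elsif key < @key
      @left.find(key)
    else
      @right.find(key)
    end
  end

  # Returns the root of the subtree after insertion
  def insert(key, value)
    if key < @key
      @left = @left.insert(key, value)
    elsif key > @key
      @right = @right.insert(key, value)
    else
      @value = value             # same key: overwrite (a choice)
    end
    self
  end
end

root = NilNode
[[50, "fifty"], [30, "thirty"], [70, "seventy"]].each do |key, value|
  root = root.insert(key, value)
end
p root.find(30)  # prints "thirty"
p root.find(99)  # prints nil
```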

# Evaluation of Simple Search Tree

• The execution speed depends on the height of the tree
• The best height is O(log n)
• The worst height is O(n)
• The average height is O(log n)
(assuming that all input orders have the same probability)
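
The dependence of the height on the insertion order can be observed with a small experiment. This sketch counts height as the number of nodes on the longest path from the root (an empty tree has height 0); all names in it are invented for the illustration.

```ruby
HeightNode = Struct.new(:key, :left, :right)

def insert_key(node, key)
  return HeightNode.new(key) if node.nil?
  if key < node.key
    node.left = insert_key(node.left, key)
  else
    node.right = insert_key(node.right, key)
  end
  node
end

# Build a tree by inserting the keys in the given order
def build(keys)
  keys.reduce(nil) { |root, key| insert_key(root, key) }
end

def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Reorder sorted keys so that the middle key of every range comes first
def balanced_order(keys)
  return [] if keys.empty?
  mid = keys.length / 2
  [keys[mid]] + balanced_order(keys[0...mid]) + balanced_order(keys[(mid + 1)..-1])
end

keys = (1..511).to_a
puts height(build(keys))                  # sorted input, worst case: prints 511
puts height(build(balanced_order(keys)))  # middle keys first, best case: prints 9
puts height(build(keys.shuffle))          # random order: varies from run to run
```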

# Balanced Trees

• In the worst case, the shape of a general search tree is the same as the shape of a linear list
• Because the order of insertions/deletions cannot be changed,
using randomization (as used for quicksort) to select the dividing item is impossible
• Keeping the tree a complete binary tree at all times would make insertion/deletion take too much time

Solution: A tree that is to some degree (but not perfectly) balanced

# Top-down 2-3-4 Tree

• The number of children for each node is 2, 3, or 4
• If a node has k children, it stores k-1 keys and data items
(if the number of children is 2, then this is the same as for a binary search tree)
• The keys stored in a node are the separators for the subtrees
• The tree is of uniform height (all nodes in the lowest layer are at the same depth)
• In the lowest layer of the tree, the nodes have no children
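
How the keys act as separators can be seen in a search sketch. It assumes each node holds 1 to 3 sorted keys and either no children or one more child than keys; data items are left out for brevity. `Node234`, `leaf`, and `search_234` are hypothetical names, and insertion is deliberately not shown.

```ruby
# keys:     1 to 3 keys in ascending order
# children: [] in the lowest layer, otherwise keys.length + 1 subtrees
Node234 = Struct.new(:keys, :children)

def leaf(*keys)
  Node234.new(keys, [])
end

# Sketch: true if key is stored somewhere in the subtree rooted at node.
def search_234(node, key)
  # Skip all separators smaller than the search key
  index = 0
  index += 1 while index < node.keys.length && key > node.keys[index]

  return true if index < node.keys.length && node.keys[index] == key
  return false if node.children.empty?     # lowest layer: give up
  search_234(node.children[index], key)    # descend between the separators
end

# Root with 2 keys and 3 children; the children have 2, 1, and 3 keys
root = Node234.new([20, 40], [leaf(5, 10), leaf(30), leaf(50, 60, 70)])

p search_234(root, 30)  # prints true
p search_234(root, 45)  # prints false
```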

# Summary

• Bin sort and radix sort are O(n k)
• A dictionary is an ADT storing values that can be found using keys
• A binary search tree is a way to implement a dictionary
• Operations on binary search trees are O(log n) on average, but O(n) in the worst case
• This problem can be addressed using balanced trees

# Homework

(no need to submit)

• Calculate the minimum and maximum height of a binary search tree with n data items
• Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm

# Glossary

• bin sort (ビンソート)
• most significant digit
• balanced tree
• traversal method
• binary search tree
• hashing (ハッシュ法)
• binary tree
• parallelization
• depth first
• preorder
• inorder
• postorder