# Quicksort, Average Time Complexity

(クイックソート、平均計算量)

## Data Structures and Algorithms

### 7th lecture, November 2, 2017

http://www.sw.it.aoyama.ac.jp/2017/DA/lecture7.html

### Martin J. Dürst

© 2009-17 Martin J. Dürst, Aoyama Gakuin University

# Today's Schedule

• Summary of last lecture
• About the Manual Sorting report
• Quicksort:
Concept, implementation, optimizations
• Average time complexity
• Sorting in C, Ruby, ...
• Comparing sorting algorithms using animation

# Summary of Last Lecture

• Sorting is a very important operation for Information Technology
• Simple sorting algorithms are all O(n²)
• Merge sort (O(n log n)) needs double memory
• Heap sort (O(n log n)) uses a large number of comparisons and exchanges

# Report: Manual Sorting: Problems Seen

• 218341.368 seconds (⇒about 61 hours)
• 6·10¹⁰·10³·10¹⁰ (units? way too big)
• O(60000) (how many seconds could this be)
• Calculation of actual time backwards from big-O notation (1 second/operation, n=5000, O(n²) ⇒ 25'000'000 seconds?)
• An O(n) algorithm (example: "5 seconds per page")
• For 20 people, having only one person work at the end of the algorithm
• For humans, binary sorting is constraining (sorting into 3~10 parts is better)
• Using bubble sort (868 days without including breaks or sleep)
• Prepare 10¹⁰ boxes (problem: space, cost, distance for walking)
• Forgetting time for preparation, cleanup, breaks,...
• Submitting just a program
• Report too short

# Today's Goals

Using quicksort as an example, understand

• Different ways to use divide-and-conquer for sorting
• Move from algorithmic concept to efficient implementation
• Average time complexity

# History of Quicksort

• Invented by C. A. R. Hoare in 1959
• Researched in great detail
• Extremely widely used

# Reviewing Divide and Conquer

• Heap sort: The highest priority item of the overall tree is the highest priority item of the two subtrees
• Merge sort: Split into equal-length parts, recurse, merge
• Quicksort: Use an arbitrary boundary element to partition the data, recursively

# Basic Workings of Quicksort

• Select one element as the partitioning element (pivot)
• Split the elements so that:
• Elements smaller than the pivot go to the left, and
• Elements larger than the pivot go to the right
• Apply quicksort recursively to the data on the left and right sides of the pivot

Ruby pseudocode/implementation: `conceptual_quick_sort` in 7qsort.rb
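The concept can be sketched in a few lines of Ruby (a minimal sketch; the actual `conceptual_quick_sort` in 7qsort.rb may differ in detail, e.g. in how the pivot is chosen — here the first element is used):

```ruby
# Conceptual quicksort: pick a pivot, partition into smaller/larger, recurse.
# Not in-place; allocates new arrays, which is fine for showing the concept.
def conceptual_quick_sort(array)
  return array if array.length <= 1
  pivot, *rest = array                      # first element as pivot
  smaller = rest.select { |x| x <= pivot }  # elements going to the left
  larger  = rest.select { |x| x >  pivot }  # elements going to the right
  conceptual_quick_sort(smaller) + [pivot] + conceptual_quick_sort(larger)
end
```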

# Quicksort Implementation Core

1. Use e.g. the rightmost element as the pivot
2. Starting from the right, find an element smaller than the pivot
3. Starting from the left, find an element larger than the pivot
4. Exchange the elements found in steps 2. and 3.
5. Repeat steps 2.-4. until no further exchanges are needed
6. Exchange the pivot with the element in the middle
7. Recurse on both sides

Ruby pseudocode/implementation: `simple_quick_sort` in 7qsort.rb
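The seven steps above can be sketched as an in-place Ruby method (a sketch; `simple_quick_sort` in 7qsort.rb may differ in details):

```ruby
# In-place quicksort following the steps above: rightmost element as pivot,
# scan from both ends, exchange, then put the pivot in the middle and recurse.
def simple_quick_sort(a, left = 0, right = a.length - 1)
  return a if left >= right
  pivot = a[right]                          # 1. rightmost element as pivot
  i, j = left, right - 1
  loop do
    i += 1 while a[i] < pivot               # 3. from the left, find element >= pivot
    j -= 1 while j > left && a[j] > pivot   # 2. from the right, find element <= pivot
    break if i >= j                         # 5. stop once the scans cross
    a[i], a[j] = a[j], a[i]                 # 4. exchange the elements found
    i += 1
    j -= 1
  end
  a[i], a[right] = a[right], a[i]           # 6. exchange pivot with middle element
  simple_quick_sort(a, left, i - 1)         # 7. recurse on both sides
  simple_quick_sort(a, i + 1, right)
  a
end
```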

# Worst Case Complexity

• What happens if the largest (or the smallest) element is always chosen as the pivot?
• The time complexity is QW(n) = n + QW(n-1) = Σ_{1≤i≤n} i = n(n+1)/2, which is
O(n²)
• This is the worst case complexity (worst case running time) for quick sort
• This complexity is the same as the complexity of the simple sorting algorithms
• This worst case can easily happen if the input is already sorted

# Best Case Complexity

• QB(n) = n + 1 + 2 QB(n/2)
• QB(1) = 0
• Same as merge sort
• O(n log n)
• Unclear whether this is relevant

For most algorithms (but there are exceptions):

• Worst case complexity is very important
• Best case complexity is mostly irrelevant
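The best-case recurrence is easy to evaluate numerically for powers of two (a quick check, assuming exact halving at every step):

```ruby
# QB(n) = n + 1 + 2 QB(n/2), QB(1) = 0, evaluated for powers of two
def qb(n)
  n == 1 ? 0 : n + 1 + 2 * qb(n / 2)
end

qb(1024)   # => 11263, close to n * log2(n) = 10240
```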

# Average Complexity

• Assumption: All permutations of the input values have the same probability

→ for the pivot, each position is equally probable

• QA(n) = n + 1 + 1/n Σ_{1≤k≤n} (QA(k-1) + QA(n-k))
• QA(1) = 0

# Calculating QA

(1)  QA(n) = n + 1 + 1/n Σ_{1≤k≤n} (QA(k-1) + QA(n-k))

(2)  QA(0) + QA(1) + ... + QA(n-2) + QA(n-1) =
   = QA(n-1) + QA(n-2) + ... + QA(1) + QA(0)

(3)  QA(n) = n + 1 + 2/n Σ_{1≤k≤n} QA(k-1)   [use (2) in (1)]

(4)  n QA(n) = n (n+1) + 2 Σ_{1≤k≤n} QA(k-1)   [multiply (3) by n]

(5)  (n-1) QA(n-1) = (n-1) n + 2 Σ_{1≤k≤n-1} QA(k-1)   [(4), with n replaced by n-1]

# Calculating QA (continued)

(6)  n QA(n) - (n-1) QA(n-1) = n (n+1) - (n-1) n + 2 QA(n-1)   [(4) - (5)]

(7)  n QA(n) = (n+1) QA(n-1) + 2n   [simplifying (6)]

(8)  QA(n)/(n+1) = QA(n-1)/n + 2/(n+1)   [dividing (7) by n (n+1)]

QA(n)/(n+1) =
= QA(n-1)/n + 2/(n+1) =   [repeatedly expand the right side using (8)]
= QA(n-2)/(n-1) + 2/n + 2/(n+1) =
= QA(n-3)/(n-2) + 2/(n-1) + 2/n + 2/(n+1) = ...
= QA(2)/3 + Σ_{3≤k≤n} 2/(k+1)

QA(n)/(n+1) ≈ 2 Σ_{1≤k≤n} 1/k ≈ 2 ∫₁ⁿ 1/x dx = 2 ln n   [approximating the sum by an integral]

# Result of Calculating QA

QA(n) ≈ 2n ln n ≈ 1.39 n log₂ n

O(n log n)

⇒ The number of comparisons on average is ~1.39 times the optimal number of comparisons in an optimal decision tree
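The result can also be checked numerically against the exact recurrence (a sketch; QA(0) = QA(1) = 0 as on the previous slides):

```ruby
# Evaluate QA(n) = n + 1 + 2/n Σ_{1≤k≤n} QA(k-1) with a running sum,
# then compare with the approximation 2n ln n.
def qa(n)
  q = [0.0, 0.0]          # QA(0), QA(1)
  sum = q[0]              # running sum QA(0) + ... + QA(m-2)
  (2..n).each do |m|
    sum += q[m - 1]       # now QA(0) + ... + QA(m-1)
    q << m + 1 + 2.0 / m * sum
  end
  q[n]
end

ratio = qa(10_000) / (2 * 10_000 * Math.log(10_000))
# ratio ≈ 0.89 for n = 10,000; it slowly approaches 1 as n grows
```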

# Distribution around Average

• A good average complexity is not enough if the worst case is frequently reached
• For QA, it can be shown that the standard deviation is about 0.65n
• This means that the probability of deviation from the average very quickly gets extremely small

# Complexity of Sorting

Question: What is the complexity of sorting (as a problem)?

• Many good sorting algorithms are O(n log n)
• The basic operations for sorting are comparison and movement
• For n data items, the number of different sorting orders (permutations) is n!
• With each comparison, in the best case, we can reduce the number of possible sorting orders to half
• The minimum number of comparisons necessary for sorting is log₂(n!), which is O(n log n)
• Sorting using pairwise comparison is Ω(n log n)
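The lower bound log₂(n!) can be evaluated numerically via Ruby's `Math.lgamma` (which returns ln Γ(x) with a sign, and Γ(n+1) = n!), confirming that it grows like n log n:

```ruby
# Minimum number of comparisons: log2(n!); by Stirling's formula this is
# Θ(n log n). Math.lgamma(x) returns [ln Γ(x), sign].
def log2_factorial(n)
  Math.lgamma(n + 1)[0] / Math.log(2)
end

n = 1000
min_comparisons = log2_factorial(n)   # ≈ 8530
upper = n * Math.log2(n)              # ≈ 9966, same order of growth
```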

# Pivot Selection

• The efficiency of quicksort strongly depends on the selection of the pivot
• Some solutions:
• Select rightmost element
(dangerous!)
• Use the median of three values
• Use the value at a random location
(this is an example of a randomized algorithm)
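Median-of-three selection can be written as a small helper (`median_of_three` is a hypothetical name, not taken from 7qsort.rb):

```ruby
# Median-of-three pivot selection: take the median of the leftmost,
# middle, and rightmost elements of the part being partitioned.
def median_of_three(a, left, right)
  mid = (left + right) / 2
  [a[left], a[mid], a[right]].sort[1]   # middle of the three values
end

median_of_three([9, 2, 5, 7, 1], 0, 4)  # => 5 (median of 9, 5, 1)
```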

# Implementation Improvements

• Comparison of indices
→ Use a sentinel to remove one comparison
• Stack overflow for deep recursion
→ When splitting into two, use recursion for the smaller part, and tail recursion or a loop for the larger part
• Low efficiency of quicksort for short arrays/parts
→ For parts smaller than a given size, change to a simple sort algorithm
→ With insertion sort, it is possible to do this in one go at the very end
(this needs care when testing)
→ Quicksort gets about 10% faster if the switch is made at an array size of about 10
• Duplicate keys
→ Split in three rather than two

Ruby pseudocode/implementation (excluding split in three): `quick_sort` in 7qsort.rb
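Several of these improvements can be combined in one sketch (assumed cutoff of 10; recursion only on the smaller part with a loop on the larger part, and one insertion sort pass at the very end; `quick_sort_above_cutoff` and `hybrid_sort` are hypothetical names):

```ruby
CUTOFF = 10   # assumed switch-over size, following the ~10 suggested above

# Quicksort that leaves parts smaller than CUTOFF unsorted; recursion only
# on the smaller part, loop on the larger part (limits stack depth).
def quick_sort_above_cutoff(a, left = 0, right = a.length - 1)
  while right - left >= CUTOFF
    pivot = a[right]
    i, j = left, right - 1
    loop do
      i += 1 while a[i] < pivot
      j -= 1 while j > left && a[j] > pivot
      break if i >= j
      a[i], a[j] = a[j], a[i]
      i += 1
      j -= 1
    end
    a[i], a[right] = a[right], a[i]
    if i - left < right - i             # recurse on the smaller part only,
      quick_sort_above_cutoff(a, left, i - 1)
      left = i + 1                      # loop on the larger part
    else
      quick_sort_above_cutoff(a, i + 1, right)
      right = i - 1
    end
  end
  a
end

# One insertion sort pass "in one go at the very end" over the
# almost-sorted array (every element is within CUTOFF of its place).
def insertion_sort(a)
  (1...a.length).each do |i|
    v = a[i]
    j = i - 1
    while j >= 0 && a[j] > v
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = v
  end
  a
end

def hybrid_sort(a)
  insertion_sort(quick_sort_above_cutoff(a))
end
```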

# Comparing Sorting Algorithms using Animation

• Uses Web technology: SVG (2D vector graphics) and JavaScript
• Uses special library (Narrative JavaScript) for timing adjustments
• Comparisons are shown in yellow (except for insertion sort), exchanges in blue

Watch animation: sort.svg

# Stable Sorting

• Definition: A sorting algorithm is stable if it retains the original order for two data items with the same key value
• Used for sorting with multiple criteria (e.g. sort by year and prefecture):
• First, sort using the lower priority criterion (e.g. prefecture)
• Then, sort using the higher priority criterion (e.g. year)
• The simple sorting algorithms and merge sort can easily be made stable
• Heap sort and quicksort are not stable
→ Solution 1: Sort multiple criteria together
→ Solution 2: Use the original position as a lower priority criterion
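Solution 2 can be written as a small wrapper (`stable_sort_by` is a hypothetical helper, not from the lecture code): appending the original position as the lowest-priority criterion makes the unstable underlying sort behave stably.

```ruby
# Make any sort stable: sort by [key, original position], so equal keys
# keep their original relative order.
def stable_sort_by(array)
  array.each_with_index
       .sort_by { |item, index| [yield(item), index] }
       .map { |item, _index| item }
end

names = ["Tanaka", "Suzuki", "Sato", "Ito"]
stable_sort_by(names) { |n| n.length }
# equal-length names keep their original relative order
```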

# Sorting in C and Ruby

• Sorting is provided as a library function or method
• Implementation is often based on quicksort
• Comparison of data items depends on type of data and purpose of sorting
→ Use comparison function as a function argument
• If comparison is slow
→ Precompute a value that can be used for sorting
• If exchange of data items is slow (e.g. very large data items)
→ Sort/exchange references (pointers) only

# C's `qsort` Function

```c
void qsort(
    void  *base,       // start of array
    size_t nel,        // number of elements in array
    size_t width,      // element size
    int  (*compar)(    // comparison function
        const void *,
        const void *)
);
```

# Ruby's `Array#sort`

(`Klass#method` denotes instance method `method` of class `Klass`)

`array.sort` uses `<=>` for comparison

`array.sort { |a, b| a.length <=> b.length }`
This example sorts (e.g. strings) by length

The code block (between `{` and `}`) is used as a comparison function

# Ruby's `<=>` Operator

(also called spaceship operator, similar to `strcmp` in C)

| Relationship between `a` and `b` | Return value of `a <=> b` |
|----------------------------------|---------------------------|
| `a < b` | `-1` (or another integer smaller than 0) |
| `a = b` | `0` |
| `a > b` | `+1` (or another integer greater than 0) |
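A few concrete evaluations:

```ruby
# Return values of <=> for numbers and strings
p(1 <=> 2)       # => -1
p(2 <=> 2)       # => 0
p("b" <=> "a")   # => 1
```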

# Ruby's `Array#sort_by`

`array.sort_by { |str| str.length }` or `array.sort_by(&:length)`

(sorting strings by length)

`array.sort_by { |stu| [stu.year, stu.prefecture] }`

(sorting students by year and prefecture)

This calculates the values for the sort criterion for each array element in advance
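For example:

```ruby
# sort_by computes each key once per element; with sort plus a comparison
# block, the key computation would run inside every comparison instead.
words = %w[pear fig banana apple]
words.sort_by { |w| w.length }   # => ["fig", "pear", "apple", "banana"]
```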

# Summary

• Quicksort is another application of divide and conquer
• Quicksort is a very famous algorithm, and a good example to learn about algorithms and their implementation
• Quicksort has been carefully researched and widely implemented and used
• Quicksort is a classical example of the importance of average time complexity
• Quicksort is our first example of a randomized algorithm
• Sorting based on pairwise comparison is Θ(n log n)

# Preparation for Next Time

• Think about inputs for which `conceptual_quick_sort` will fail
• Watch the animations carefully (>20 times) to deepen your understanding of sorting algorithms

# Glossary

quicksort
クイックソート
partition

partitioning element (pivot)

worst case complexity (running time)

best case complexity (running time)

average complexity (running time)

standard deviation

randomized algorithm
ランダム化アルゴリズム
median

decision tree

tail recursion

in one go

stable sorting

criterion (plural criteria)

block
ブロック