# Quicksort, Average Time Complexity

(クイックソート、平均計算量)

## Data Structures and Algorithms

### 7th lecture, October 29, 2015

http://www.sw.it.aoyama.ac.jp/2015/DA/lecture7.html

### Martin J. Dürst

© 2009-15 Martin J. Dürst, Aoyama Gakuin University

# Today's Schedule

• Summary of last lecture
• About the Manual Sorting report
• Quicksort: concept, implementation, optimizations
• Average time complexity
• Sorting in C, Ruby, ...
• Comparing sorting algorithms using animation

# Summary of Last Lecture

• Sorting is a very important operation for Information Technology
• Simple sorting algorithms are all $O(n^2)$
• Merge sort (O(n log n)) needs double memory
• Heap sort (O(n log n)) uses lots of comparisons and exchanges

# Today's Goals

Using quicksort as an example, understand

• Move from algorithmic concept to efficient implementation
• Average time complexity

# History of Quicksort

• Invented by C. A. R. Hoare in 1960
• Researched in great detail
• Extremely widely used

# Reviewing Divide and Conquer

• Heap sort: The highest priority item of the overall tree is the highest priority item of the two subtrees
• Merge sort: Split into equal-length parts, recurse, merge
• Quicksort: Use an arbitrary boundary element to partition the data, recursively

# Basic Workings of Quicksort

• Select one element as the partitioning element (pivot)
• Exchange elements so that
elements smaller than the pivot go to the left, and
elements larger than the pivot go to the right
• Apply quicksort recursively to the data on the left and right sides of the pivot

Ruby pseudocode/implementation: `conceptual_quick_sort` in 7qsort.rb
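
The three steps above can be sketched in a few lines of Ruby. This is a minimal illustration, not the `conceptual_quick_sort` from 7qsort.rb; the method name `sketch_quick_sort` and the choice of the middle element as pivot are assumptions made here. It builds new arrays instead of exchanging elements in place, and it collects elements equal to the pivot in a separate group so that repeated values are handled.

```ruby
# Hypothetical sketch of the quicksort concept (not the code in 7qsort.rb).
def sketch_quick_sort(array)
  return array if array.length <= 1          # nothing left to partition

  pivot   = array[array.length / 2]          # assumed pivot choice: middle element
  smaller = array.select { |x| x < pivot }   # goes to the left of the pivot
  equal   = array.select { |x| x == pivot }  # the pivot and its duplicates
  larger  = array.select { |x| x > pivot }   # goes to the right of the pivot

  # Recurse on both sides and concatenate the results.
  sketch_quick_sort(smaller) + equal + sketch_quick_sort(larger)
end
```

This version favors readability over efficiency: it allocates new arrays at every level, which an in-place implementation avoids.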

# Quicksort Implementation Details

1. Use the rightmost element as the pivot
2. Starting from the right, find an element smaller than the pivot
3. Starting from the left, find an element larger than the pivot
4. Exchange the elements found in steps 2. and 3.
5. Repeat steps 2.-4. until no further exchanges are needed
6. Exchange the pivot with the element in the middle
7. Recurse on both sides

Ruby pseudocode/implementation: `simple_quick_sort` in 7qsort.rb
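
Steps 1 to 7 can be sketched as an in-place Ruby method. This is one possible reading of the steps, not the `simple_quick_sort` from 7qsort.rb; the name `in_place_quick_sort` and the decision to stop both scans at elements equal to the pivot are assumptions made here.

```ruby
# Hypothetical in-place sketch of steps 1-7 (not the code in 7qsort.rb).
def in_place_quick_sort(array, left = 0, right = array.length - 1)
  return array if left >= right               # zero or one element: done

  pivot = array[right]                        # step 1: rightmost element
  i = left                                    # index scanning from the left
  j = right - 1                               # index scanning from the right
  loop do
    # step 3: from the left, stop at an element not smaller than the pivot
    i += 1 while i < right && array[i] < pivot
    # step 2: from the right, stop at an element not larger than the pivot
    j -= 1 while j > left && array[j] > pivot
    break if i >= j                           # step 5: the scans have met
    array[i], array[j] = array[j], array[i]   # step 4: exchange
    i += 1
    j -= 1
  end
  array[i], array[right] = array[right], array[i]  # step 6: pivot to its final place

  in_place_quick_sort(array, left, i - 1)     # step 7: left side
  in_place_quick_sort(array, i + 1, right)    # step 7: right side
  array
end
```

The index checks `i < right` and `j > left` keep the scans inside the current part; the sentinel technique listed under the implementation improvements aims at removing one of these checks.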

# Worst Case Complexity

• What happens if the largest (or the smallest) element is always chosen as the pivot?
• The time complexity is $Q_W(n) = n + Q_W(n-1) = \sum_{i=1}^{n} i$, which is $O(n^2)$
• This is the worst case complexity (worst case running time) for quicksort
• This is the same complexity as the simple sorting algorithms
• This worst case can easily happen if the input is already sorted
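
The effect of already sorted input can be made visible by counting comparisons. The following is a rough sketch: `count_comparisons` is a hypothetical helper that uses the rightmost element as pivot and charges one comparison per remaining element at each level, so it counts n-1 rather than n per level, a slightly different accounting from the formula above.

```ruby
# Hypothetical helper: number of pivot comparisons when the rightmost
# element is always used as the pivot.
def count_comparisons(array)
  return 0 if array.length <= 1

  pivot = array.last
  rest  = array[0...-1]                              # everything except the pivot
  smaller, larger = rest.partition { |x| x < pivot } # one comparison per element
  rest.length + count_comparisons(smaller) + count_comparisons(larger)
end

sorted   = (1..100).to_a
shuffled = sorted.shuffle(random: Random.new(42))    # fixed seed, for repeatability

worst   = count_comparisons(sorted)    # => 4950, i.e. 100 * 99 / 2
typical = count_comparisons(shuffled)  # far fewer comparisons than for sorted input
```

On sorted input every partition is maximally unbalanced (one side is empty), which is exactly the situation described by the recurrence for the worst case.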

# Best Case Complexity

• $Q_B(n) = n + 1 + 2\,Q_B(n/2)$
• $Q_B(1) = 0$
• Same as merge sort
• O(n log n)
• Unclear whether this is relevant

For most algorithms, worst case complexity is very important, and best case complexity is mostly irrelevant. But there are exceptions.

# Average Complexity

• Assumption: All permutations of the input values have the same probability
• $Q_A(n) = n + 1 + \frac{1}{n} \sum_{1 \le k \le n} \left(Q_A(k-1) + Q_A(n-k)\right)$
• $Q_A(1) = 0$

# Calculating QA

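$Q_A(n) = n + 1 + \frac{1}{n} \sum_{1 \le k \le n} \left(Q_A(k-1) + Q_A(n-k)\right)$

$Q_A(0) + \dots + Q_A(n-2) + Q_A(n-1) = Q_A(n-1) + Q_A(n-2) + \dots + Q_A(0)$

$Q_A(n) = n + 1 + \frac{2}{n} \sum_{1 \le k \le n} Q_A(k-1)$

$n\,Q_A(n) = n(n+1) + 2 \sum_{1 \le k \le n} Q_A(k-1)$

$(n-1)\,Q_A(n-1) = (n-1)\,n + 2 \sum_{1 \le k \le n-1} Q_A(k-1)$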

# Calculating QA (continued)

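$n\,Q_A(n) - (n-1)\,Q_A(n-1) = n(n+1) - (n-1)\,n + 2\,Q_A(n-1)$

$n\,Q_A(n) = (n+1)\,Q_A(n-1) + 2n$

$$\begin{aligned}
\frac{Q_A(n)}{n+1} &= \frac{Q_A(n-1)}{n} + \frac{2}{n+1} \\
&= \frac{Q_A(n-2)}{n-1} + \frac{2}{n} + \frac{2}{n+1} \\
&= \frac{Q_A(2)}{3} + \sum_{3 \le k \le n} \frac{2}{k+1}
\end{aligned}$$

$\frac{Q_A(n)}{n+1} \approx 2 \sum_{1 \le k \le n} \frac{1}{k} \approx 2 \int_1^n \frac{1}{x}\,dx = 2 \ln n$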

# Result of Calculating QA

$Q_A(n) \approx 2n \ln n \approx 1.39\, n \log_2 n$

O(n log n)

⇒ On average, quicksort needs about 1.39 times the number of comparisons of an optimal decision tree
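
The approximation can be checked numerically by evaluating the recurrence for the average case directly. The sketch below assumes $Q_A(0) = 0$, which the slides do not state explicitly; `average_cost_table` is a hypothetical helper name.

```ruby
# Hypothetical helper: table of Q_A(0..max_n) computed from
#   Q_A(n) = n + 1 + (2/n) * (Q_A(0) + ... + Q_A(n-1))
# Assumption: Q_A(0) = 0 (Q_A(1) = 0 is given on the slides).
def average_cost_table(max_n)
  q = [0.0, 0.0]
  prefix = q[0]                      # running sum Q_A(0) + ... + Q_A(n-2)
  (2..max_n).each do |n|
    prefix += q[n - 1]               # now Q_A(0) + ... + Q_A(n-1)
    q[n] = n + 1 + 2.0 * prefix / n
  end
  q
end

n = 1000
q = average_cost_table(n)
ratio = q[n] / (2 * n * Math.log(n)) # exact recurrence value vs. 2 n ln n
```

For n = 1000 the ratio comes out at roughly 0.9. It approaches 1 only slowly, because the approximation drops lower-order terms that are linear in n.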

# Complexity of Sorting

Question: What is the complexity of sorting (as a problem)?

• Most sorting algorithms are O(n log n) (except for simple sorting algorithms)
• The basic operations for sorting are comparison and movement
• For n data items, the number of different sorting orders (permutations) is n!
• With each comparison, in the best case, we can reduce the number of remaining candidate orders by half
• The minimum number of comparisons necessary for sorting is therefore $\log_2(n!)$, which is $\Theta(n \log n)$
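
The bound from the counting argument above can be computed exactly for small n. This is a small sketch; `comparison_lower_bound` is a hypothetical helper name, and the value it returns is the information-theoretic lower bound $\lceil \log_2(n!) \rceil$, which no comparison-based sort can beat in the worst case.

```ruby
# Hypothetical helper: ceil(log2(n!)), the minimum height of a decision tree
# that distinguishes all n! orders.
def comparison_lower_bound(n)
  factorial = (1..n).reduce(1, :*)   # n! as an exact integer (1 for n = 0)
  (factorial - 1).bit_length         # ceil(log2(m)) for an integer m >= 1
end

comparison_lower_bound(3)   # => 3  (3! = 6 orders)
comparison_lower_bound(4)   # => 5  (4! = 24 orders)
comparison_lower_bound(12)  # => 29 (12! = 479001600 orders)
```

Using integer arithmetic avoids rounding problems that a floating-point logarithm could introduce when the value is close to a whole number.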

# Pivot Selection

• The efficiency of quicksort strongly depends on the selection of the pivot
• Some solutions:
• Select the pivot using a random number
(this is an example of a randomized algorithm)
• Use the median of three values
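
Both pivot selection strategies can be sketched as small helpers that return the index of the chosen pivot within the part `left..right`. The names `random_pivot_index` and `median_of_three_index` are hypothetical, and taking the first, middle, and last element as the three candidates is a common convention assumed here.

```ruby
# Hypothetical helper: pick a pivot position uniformly at random.
def random_pivot_index(left, right, rng = Random.new)
  rng.rand(left..right)
end

# Hypothetical helper: index of the median of the first, middle, and last
# element of array[left..right].
def median_of_three_index(array, left, right)
  middle = (left + right) / 2
  [left, middle, right].sort_by { |index| array[index] }[1]
end

data = [9, 1, 5, 7, 3]
median_of_three_index(data, 0, data.length - 1)  # => 2 (candidates 9, 5, 3; median is 5)
```

An implementation that expects the pivot in the rightmost position can exchange the element at the returned index with the rightmost element before partitioning.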

# Implementation Improvements

• Comparison of indices
→ Use a sentinel to remove one comparison
• Stack overflow for deep recursion
→ When splitting into two, use recursion for the smaller part, and tail recursion or a loop for the larger part
• Low efficiency of quicksort for short arrays/parts
→ For parts smaller than a given size, change to a simple sort algorithm
→ With insertion sort, it is possible to do this in one go at the very end
(this needs care when testing)
→ Quicksort gets about 10% faster if the change is made at an array size of about 10
• Duplicate keys
→ Split in three rather than two

Ruby pseudocode/implementation (excluding split in three): `quick_sort` in 7qsort.rb
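
Two of the improvements above, bounding the recursion depth and switching to insertion sort for short parts, can be sketched as follows. This is not the `quick_sort` from 7qsort.rb. The names `tuned_quick_sort!`, `partition!`, `insertion_sort!`, and `CUTOFF` are hypothetical; for brevity the sketch uses a simple one-directional (Lomuto-style) partition instead of the two-sided scan, and it runs insertion sort on each short part rather than once at the very end.

```ruby
CUTOFF = 10  # assumed threshold, following the "about 10" mentioned above

# Sorts array[left..right] in place by insertion.
def insertion_sort!(array, left, right)
  (left + 1..right).each do |i|
    value = array[i]
    j = i - 1
    while j >= left && array[j] > value
      array[j + 1] = array[j]        # shift larger elements one step right
      j -= 1
    end
    array[j + 1] = value
  end
end

# Lomuto-style partition around the rightmost element; returns the pivot's
# final index.
def partition!(array, left, right)
  pivot = array[right]
  store = left
  (left...right).each do |i|
    next unless array[i] < pivot
    array[i], array[store] = array[store], array[i]
    store += 1
  end
  array[store], array[right] = array[right], array[store]
  store
end

def tuned_quick_sort!(array, left = 0, right = array.length - 1)
  while right - left + 1 > CUTOFF
    split = partition!(array, left, right)
    if split - left < right - split
      tuned_quick_sort!(array, left, split - 1)   # recurse on the smaller part
      left = split + 1                            # loop on the larger part
    else
      tuned_quick_sort!(array, split + 1, right)  # recurse on the smaller part
      right = split - 1                           # loop on the larger part
    end
  end
  insertion_sort!(array, left, right)             # short part: simple sort
  array
end
```

Because recursion is only used for the smaller part, each recursive call handles at most half of the current part, so the recursion depth stays logarithmic in n even when the partitions are unbalanced.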

# Comparing Sorting Algorithms using Animation

• Uses Web technology: SVG (2D vector graphics) and JavaScript
• Uses special library (Narrative JavaScript) for timing adjustments
• Comparisons are shown in yellow (except for insertion sort), exchanges in blue

Watch animation: sort.svg

# Stable Sorting

• Definition: A sorting algorithm is stable if it retains the original order for two data items with the same key value
• Used for sorting with multiple criteria (e.g. sort by year and prefecture):
• First, sort using the lower priority criterion (e.g. prefecture)
• Then, sort using the higher priority criterion (e.g. year)
• The simple sorting algorithms and merge sort can easily be made stable
• Heap sort and quicksort are not stable
→ Solution 1: Sort multiple criteria together
→ Solution 2: Use the original position as a lower priority criterion
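
Solution 2 can be sketched in Ruby by pairing each item with its original index and using the index as the lowest priority criterion. `stable_sort_by` is a hypothetical helper name; the sketch relies on the fact that Ruby compares arrays element by element.

```ruby
# Hypothetical helper: sorts by the key computed in the block, keeping the
# original order for items with equal keys.
def stable_sort_by(array)
  array.each_with_index
       .sort_by { |item, index| [yield(item), index] }  # index breaks ties
       .map { |item, _index| item }
end

words = %w[pear fig apple kiwi plum]
stable_sort_by(words) { |word| word.length }
# => ["fig", "pear", "kiwi", "plum", "apple"]
```

The three words of length 4 keep their original relative order (pear, kiwi, plum). Ruby's documentation does not guarantee this for a plain `sort_by`, which is why the index is added explicitly.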

# Sorting in C and Ruby

• Sorting is provided as a library function or method
• Implementation is often based on quicksort
• Comparison of data items depends on type of data and purpose of sorting
→ Use comparison function as a function argument
• If comparison is slow
→ Precompute a value that can be used for sorting
• If exchange of data items is slow (e.g. very large data items)
→ Sort/exchange references (pointers) only

# C's `qsort` Function

```c
void qsort(
    void *base,        // start of the array
    size_t nel,        // number of elements in the array
    size_t width,      // size of each element
    int (*compar)(     // comparison function
        const void *,
        const void *)
);
```

# Ruby's `Array#sort`

(`Klass#method` denotes instance method `method` of class `Klass`)

`array.sort` uses `<=>` for comparison

`array.sort { |a, b| a.length <=> b.length }`
This example sorts (e.g. strings) by length

The code block (between `{` and `}`) is used as a comparison function
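
A short example of `Array#sort` with and without a block may help. The sample data is made up for illustration; the word lengths are all different so that the result does not depend on how ties are handled.

```ruby
words = %w[banana fig cherries kiwi]

alphabetical = words.sort
# => ["banana", "cherries", "fig", "kiwi"]

by_length = words.sort { |a, b| a.length <=> b.length }
# => ["fig", "kiwi", "banana", "cherries"]

reversed = words.sort { |a, b| b <=> a }   # swapping a and b reverses the order
# => ["kiwi", "fig", "cherries", "banana"]
```

`sort` returns a new array and leaves `words` unchanged; `sort!` is the variant that sorts in place.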

# Ruby's `<=>` Operator

(also called spaceship operator)

| Relationship between `a` and `b` | Return value of `a <=> b` |
|---|---|
| `a < b` | `-1` (or other integer smaller than 0) |
| `a = b` | `0` |
| `a > b` | `+1` (or other integer greater than 0) |
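
A few concrete values of `<=>` for built-in classes, as an illustration. The array case is the one that makes sorting by several criteria possible, because arrays are compared element by element.

```ruby
1 <=> 2                                   # => -1
3 <=> 3                                   # => 0
"b" <=> "a"                               # => 1
[2015, "Tokyo"] <=> [2015, "Osaka"]       # => 1 (years are equal, "Tokyo" > "Osaka")
1 <=> "a"                                 # => nil (the values cannot be compared)
```

A `nil` result means that the two objects have no defined order; `sort` raises an `ArgumentError` when it encounters such a pair.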

# Ruby's `Array#sort_by`

`array.sort_by { |str| str.length }`

(sorting strings by length)

`array.sort_by { |stu| [stu.year, stu.prefecture] }`

(sorting students by year and prefecture)

This calculates the values for the sort criterion for each array element in advance
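
The student example can be made runnable with a small `Struct`. The `Student` class, its fields, and the sample records are made up for illustration; only `sort_by` and the `[year, prefecture]` key come from the slide.

```ruby
# Hypothetical record type and sample data for the example above.
Student = Struct.new(:name, :year, :prefecture)

students = [
  Student.new("Sato", 2, "Tokyo"),
  Student.new("Abe",  1, "Osaka"),
  Student.new("Ito",  2, "Chiba"),
  Student.new("Ono",  1, "Kyoto")
]

# The key [year, prefecture] is computed once per student, then the keys are
# compared with <=> (year first, prefecture as tie-breaker).
sorted = students.sort_by { |stu| [stu.year, stu.prefecture] }
sorted.map(&:name)
# => ["Ono", "Abe", "Ito", "Sato"]
```

Compared with `sort` and a block, `sort_by` evaluates the block only once per element, which matters when computing the key is slow.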

# Summary

• Quicksort is another application of divide and conquer
• Quicksort is a very famous algorithm, and a good example to learn about algorithms and their implementation
• Quicksort has been carefully researched and widely implemented and used
• Quicksort is a classical example of the importance of average time complexity
• Quicksort is our first example of a randomized algorithm
• Sorting based on pairwise comparison is Θ(n log n)

# Preparation for Next Time

• Think about inputs for which `conceptual_quick_sort` will fail
• Watch the animations carefully (>20 times) to deepen your understanding of sorting algorithms
• Complete and submit the report "Manual Sorting"
Deadline: November 4th, 2015 (Wednesday), 19:00

# Glossary

quicksort
クイックソート
partition

partitioning element (pivot)

worst case complexity (running time)

best case complexity (running time)

average complexity (running time)

randomized algorithm
ランドム化アルゴリズム
median

decision tree

tail recursion

in one go

stable sorting

criterion (plural criteria)

block
ブロック