Quicksort, Average Time Complexity

http://www.sw.it.aoyama.ac.jp/2015/DA/lecture7.html

© 2009-15 Martin J. Dürst 青山学院大学

- Summary of last lecture
- About the Manual Sorting report
- Quicksort: concept, implementation, optimizations
- Average time complexity
- Sorting in C, Ruby, ...
- Comparing sorting algorithms using animation

- Sorting is a very important operation for Information Technology
- Simple sorting algorithms are all O(n^2)
- Merge sort (O(n log n)) needs double memory
- Heap sort (O(n log n)) uses lots of comparisons and exchanges

Using quicksort as an example, understand

- Move from algorithmic concept to efficient implementation
- Average time complexity

Quicksort

- Invented by C. A. R. Hoare in 1960
- Researched in great detail
- Extremely widely used

- Heap sort: The highest priority item of the overall tree is the highest priority item of the two subtrees
- Merge sort: Split into equal-length parts, recurse, merge

- Quicksort: Use an arbitrary boundary element to partition the data, recursively

- Select one element as the *partitioning element* (*pivot*)
- Exchange elements so that elements smaller than the pivot go to the left, and elements larger than the pivot go to the right
- Apply quicksort recursively to the data on the left and right sides of the pivot

Ruby pseudocode/implementation: `conceptual_quick_sort` in 7qsort.rb
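The concept can be sketched in a few lines of Ruby. This is a hypothetical sketch; the actual `conceptual_quick_sort` in 7qsort.rb may differ, for example in how the pivot is chosen.

```ruby
# Conceptual quicksort: builds new arrays instead of sorting in place.
# Hypothetical sketch; the real conceptual_quick_sort in 7qsort.rb may differ.
def conceptual_quick_sort(array)
  return array if array.length <= 1
  pivot, *rest = array                              # take the first element as pivot
  smaller, larger = rest.partition { |x| x < pivot }
  conceptual_quick_sort(smaller) + [pivot] + conceptual_quick_sort(larger)
end

conceptual_quick_sort([3, 1, 4, 1, 5])   # => [1, 1, 3, 4, 5]
```

Note that this version allocates new arrays at every step; the point of the in-place implementation below is to avoid exactly that.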

1. Use the rightmost element as the pivot
2. Starting from the right, find an element smaller than the pivot
3. Starting from the left, find an element larger than the pivot
4. Exchange the elements found in steps 2 and 3
5. Repeat steps 2-4 until no further exchanges are needed
6. Exchange the pivot with the element in the middle
7. Recurse on both sides

Ruby pseudocode/implementation: `simple_quick_sort` in 7qsort.rb
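The steps above can be sketched as follows (hypothetical; the actual `simple_quick_sort` in 7qsort.rb may differ in details, and this sketch scans from the left first, which is only a cosmetic difference from the step order above):

```ruby
# In-place quicksort following the partition steps above (a sketch).
def simple_quick_sort(array, left = 0, right = array.length - 1)
  return array if left >= right
  pivot = array[right]                  # step 1: rightmost element as pivot
  i = left
  j = right - 1
  loop do
    i += 1 while array[i] < pivot                # find an element >= pivot
    j -= 1 while j > left && array[j] > pivot    # find an element <= pivot
    break if i >= j                              # step 5: pointers crossed
    array[i], array[j] = array[j], array[i]      # step 4: exchange
    i += 1
    j -= 1
  end
  array[i], array[right] = array[right], array[i]  # step 6: pivot to the middle
  simple_quick_sort(array, left, i - 1)            # step 7: recurse on both sides
  simple_quick_sort(array, i + 1, right)
  array
end
```

The pivot at `array[right]` also acts as a sentinel for the left-to-right scan, so `i` can never run past the end of the part.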

- What happens if the largest (or the smallest) element is always chosen as the pivot?
- The time complexity is Q_w(n) = n + Q_w(n-1) = Σ_{1≤i≤n} i ⇒ O(n^2)
- This is the *worst case complexity* (*worst case running time*) for quicksort
- This is the same complexity as the simple sorting algorithms
- This worst case can easily happen if the input is already sorted

- Q_B(n) = n + 1 + 2 Q_B(n/2)
- Q_B(1) = 0
- Same as merge sort ⇒ O(n log n)
- Unclear whether this is relevant

For most algorithms, worst case complexity is very important, and best case complexity is mostly irrelevant. But there are exceptions.

- Assumption: All permutations of the input values have the same probability
- Q_A(n) = n + 1 + 1/n Σ_{1≤k≤n} (Q_A(k-1) + Q_A(n-k))
- Q_A(1) = 0

Q_A(n) = n + 1 + 1/n Σ_{1≤k≤n} (Q_A(k-1) + Q_A(n-k))

The two sums run over the same values, just in opposite order:

Q_A(0) + Q_A(1) + ... + Q_A(n-1) = Q_A(n-1) + Q_A(n-2) + ... + Q_A(0)

so the recurrence simplifies to

Q_A(n) = n + 1 + 2/n Σ_{1≤k≤n} Q_A(k-1)

Multiplying both sides by n:

n Q_A(n) = n (n+1) + 2 Σ_{1≤k≤n} Q_A(k-1)

The same equation for n-1:

(n-1) Q_A(n-1) = (n-1) n + 2 Σ_{1≤k≤n-1} Q_A(k-1)

Subtracting the second equation from the first eliminates the sum:

n Q_A(n) - (n-1) Q_A(n-1) = n (n+1) - (n-1) n + 2 Q_A(n-1)

n Q_A(n) = (n+1) Q_A(n-1) + 2n

Dividing both sides by n (n+1) and repeatedly substituting:

Q_A(n)/(n+1) = Q_A(n-1)/n + 2/(n+1)
             = Q_A(n-2)/(n-1) + 2/n + 2/(n+1)
             = ...
             = Q_A(1)/2 + Σ_{3≤k≤n+1} 2/k

Approximating the sum with an integral:

Q_A(n)/(n+1) ≈ Σ_{1≤k≤n} 2/k ≈ 2 ∫_1^n 1/x dx = 2 ln n

Q_A(n) ≈ 2n ln n ≈ 1.39 n log_2 n

⇒ O(n log n)

⇒ On average, quicksort needs about 1.39 times as many comparisons as an optimal decision tree

Question: What is the complexity of sorting (as a problem)?

- Most sorting algorithms are O(n log n) (except for the simple sorting algorithms)
- The basic operations for sorting are comparison and movement
- For n data items, the number of different sorting orders (permutations) is n!
- With each comparison, in the best case, we can reduce the number of possible sorting orders to half
- The minimum number of comparisons necessary for sorting is therefore log_2(n!) ≈ O(n log n)
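The bound log_2(n!) ≈ O(n log n) can be checked numerically. This snippet is illustrative and not part of the lecture code:

```ruby
# Numerical check that log2(n!) grows like n log2 n.
def log2_factorial(n)
  (1..n).sum { |i| Math.log2(i) }   # log2(n!) = log2(1) + ... + log2(n)
end

n = 1000
puts log2_factorial(n)    # somewhat below n * log2(n), same order of growth
puts n * Math.log2(n)
```

For n = 1000 the two values are within a small constant factor of each other, consistent with Θ(n log n).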

- The efficiency of quicksort strongly depends on the selection of the pivot
- Some solutions:
  - Select the pivot using a random number (this is an example of a *randomized algorithm*)
  - Use the median of three values
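Median-of-three pivot selection might look as follows; this is one common variant, and implementations differ in details:

```ruby
# Returns the index of the median of the first, middle, and last elements,
# to be used as the pivot. One common variant; details vary.
def median_of_three(array, left, right)
  mid = (left + right) / 2
  a, b, c = array[left], array[mid], array[right]
  if a <= b
    b <= c ? mid : (a <= c ? right : left)
  else
    a <= c ? left : (b <= c ? right : mid)
  end
end

median_of_three([9, 1, 5], 0, 2)   # => 2 (array[2] = 5 is the median)
```

Besides avoiding the sorted-input worst case, the median of three gives a slightly better pivot on average than a single sample.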

- Comparison of indices
  → Use a sentinel to remove one comparison
- Stack overflow for deep recursion
  → When splitting in two, use recursion for the smaller part, and tail recursion or a loop for the larger part
- Low efficiency of quicksort for short arrays/parts
  → For parts smaller than a given size, change to a simple sorting algorithm
  → With insertion sort, it is possible to do this in one go at the very end (this needs care when testing)
  → Quicksort gets about 10% faster if the change is made at an array size of about 10
- Duplicate keys
  → Split in three rather than two

Ruby pseudocode/implementation (excluding split in three): `quick_sort` in 7qsort.rb
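Two of the optimizations above (recursing only on the smaller part, and switching to insertion sort for small parts) can be sketched as follows. This is hypothetical; the real `quick_sort` in 7qsort.rb may differ:

```ruby
CUTOFF = 10  # switch to insertion sort below this size (about 10 is often cited)

def insertion_sort(array, left, right)
  (left + 1).upto(right) do |i|
    value = array[i]
    j = i - 1
    while j >= left && array[j] > value
      array[j + 1] = array[j]
      j -= 1
    end
    array[j + 1] = value
  end
end

def quick_sort(array, left = 0, right = array.length - 1)
  while right - left >= CUTOFF
    pivot = array[right]                             # rightmost element as pivot
    i = left
    j = right - 1
    loop do
      i += 1 while array[i] < pivot
      j -= 1 while j > left && array[j] > pivot
      break if i >= j
      array[i], array[j] = array[j], array[i]
      i += 1
      j -= 1
    end
    array[i], array[right] = array[right], array[i]
    if i - left < right - i          # recurse on the smaller part,
      quick_sort(array, left, i - 1) # loop on the larger part
      left = i + 1                   # (keeps the stack depth O(log n))
    else
      quick_sort(array, i + 1, right)
      right = i - 1
    end
  end
  insertion_sort(array, left, right) # finish small parts with insertion sort
  array
end
```

This variant insertion-sorts each small part immediately; the "in one go at the very end" variant would instead simply return for small parts and run a single insertion sort over the whole array afterwards.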

- Uses Web technology: SVG (2D vector graphics) and JavaScript
- Uses special library (Narrative JavaScript) for timing adjustments
- Comparisons are shown in yellow (except for insertion sort), exchanges in blue

Watch animation: sort.svg

- Definition: A sorting algorithm is *stable* if it retains the original order of two data items with the same key value
- Used for sorting with multiple criteria (e.g. sort by year and prefecture):
  - First, sort using the lower priority criterion (e.g. prefecture)
  - Then, sort using the higher priority criterion (e.g. year)

- The simple sorting algorithms and merge sort can easily be made stable
- Heap sort and quicksort are not stable

→ Solution 1: Sort multiple criteria together

→ Solution 2: Use the original position as a lower priority criterion
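Solution 2 can be sketched in Ruby: adding the original position as the lowest-priority criterion makes the result order-deterministic even with an unstable sort. The data here is illustrative:

```ruby
# Make equal keys keep their original relative order by using the
# original index as a tiebreaker (illustrative data).
records = [["b", 2], ["a", 1], ["b", 1], ["a", 2]]
stable = records.each_with_index
                .sort_by { |(key, _), index| [key, index] }
                .map(&:first)
# equal keys ("a"s and "b"s) stay in their original relative order
```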

- Sorting is provided as a library function or method
- Implementation is often based on quicksort
- Comparison of data items depends on the type of data and the purpose of sorting
  → Pass a comparison function as a function argument
- If comparison is slow
  → Precompute a value that can be used for sorting
- If exchange of data items is slow (e.g. very large data items)
  → Sort/exchange references (pointers) only

The C library function `qsort`:

```c
void qsort(
    void *base,          /* start of the array */
    size_t nel,          /* number of array elements */
    size_t width,        /* size of one element */
    int (*compar)(       /* comparison function */
        const void *, const void *));
```

`Array#sort`

(`Klass#method` denotes instance method `method` of class `Klass`)

`array.sort` uses `<=>` for comparison.

`array.sort { |a, b| a.length <=> b.length }`

This example sorts (e.g. strings) by length. The code block (between `{` and `}`) is used as the comparison function.

The `<=>` operator (also called the spaceship operator)

| Relationship between `a` and `b` | Return value of `a <=> b` |
| --- | --- |
| `a < b` | `-1` (or another integer smaller than 0) |
| `a = b` | `0` |
| `a > b` | `+1` (or another integer greater than 0) |
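A few concrete values of the spaceship operator:

```ruby
# The spaceship operator on some built-in types:
p(1 <=> 2)            # prints -1
p(2 <=> 2)            # prints 0
p(3 <=> 2)            # prints 1
p("abc" <=> "abd")    # prints -1 (String implements <=>)
p([1, 2] <=> [1, 3])  # prints -1 (arrays compare element by element)
```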

`Array#sort_by`

`array.sort_by { |str| str.length }` (sorting strings by length)

```
array.sort_by { |stu| [stu.year, stu.prefecture] }
```

(sorting students by year and prefecture)

This calculates the value of the sort criterion for each array element in advance.
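A runnable version of the multi-criteria example; the `Student` struct and the data are hypothetical:

```ruby
# sort_by computes the key [year, prefecture] once per element,
# then sorts by year first, prefecture second (illustrative data).
Student = Struct.new(:name, :year, :prefecture)

students = [
  Student.new("Sato",   2, "Tokyo"),
  Student.new("Suzuki", 1, "Osaka"),
  Student.new("Tanaka", 1, "Aichi")
]

sorted = students.sort_by { |stu| [stu.year, stu.prefecture] }
sorted.map(&:name)   # => ["Tanaka", "Suzuki", "Sato"]
```

Because the key arrays are built only once per element, this avoids recomputing slow criteria inside every comparison.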

- Quicksort is another application of divide and conquer
- Quicksort is a very famous algorithm, and a good example to learn about algorithms and their implementation
- Quicksort has been carefully researched and widely implemented and used
- Quicksort is a classical example of the importance of average time complexity
- Quicksort is our first example of a randomized algorithm
- Sorting based on pairwise comparison is Θ(n log n)

- Think about inputs for which `conceptual_quick_sort` will fail
- Watch the animations carefully (>20 times) to deepen your understanding of sorting algorithms
- Complete and submit the report "Manual Sorting"

Deadline: November 4th, 2015 (Wednesday), 19:00

- quicksort
- クイックソート
- partition
- 分割
- partitioning element (pivot)
- 分割要素
- worst case complexity (running time)
- 最悪時の計算量
- best case complexity (running time)
- 最善時の計算量
- average complexity (running time)
- 平均計算量
- randomized algorithm
- ランダム化アルゴリズム
- median
- 中央値
- decision tree
- 決定木
- tail recursion
- 末尾再帰
- in one go
- 一括
- stable sorting
- 安定な整列法
- criterion (plural criteria)
- 基準
- block
- ブロック