(Divide and Conquer, Merge Sort)

http://www.sw.it.aoyama.ac.jp/2018/DA/lecture6.html

© 2009-18 Martin J. Dürst, Aoyama Gakuin University

- Leftovers, summary of last lecture, homework
- The importance of sorting
- Simple sorting algorithms: Bubble sort, selection sort, insertion sort
- Loops in Ruby
- Divide and conquer
- Merge sort
- Summary

- A *priority queue* is an important ADT
- Implementing a priority queue with an array or a linked list is not efficient
- In a *heap*, each parent has higher priority than its children
- In a heap, the highest priority item is at the root of a *complete binary tree*
- A heap is an efficient implementation of a priority queue
- Many data structures are defined using invariants
- The operations heapify_up and heapify_down are used to restore heap invariants
- A heap can be used for sorting, using *heap sort*

(no need to submit, but bring the sorting cards)

- Complete the report (deadline October 24, 2018 (Wednesday), 19:00)
- Cut the sorting cards, and bring them with you to the next lecture
- Shuffle the sorting cards, and try to find a fast way to sort them. Play against others (who is fastest?).
- Find five different applications of sorting (no need to submit)
- Implement joining two (normal) heaps (no need to submit)
- Add the items of the smaller heap to the bigger heap (6heapmerge.rb)
- Use a binomial heap (binomial queue)

- Think about the time complexity of creating a heap:
  `heapify_down` will be called `n`/2 times and may take up to `O`(log `n`) each time.
  Therefore, one guess for the overall time complexity is `O`(`n` log `n`).
  However, this upper bound can be improved by careful analysis.

(no need to submit)

- 218341.368 seconds (⇒ about 61 hours)
- 6^{10^{10}}·10^{3·10^{10}} (units? way too big)
- `O`(40000) (how many seconds could this be?)
- Calculation of actual time backwards from big-`O` notation: 1 second/operation, `n` = 5000, `O`(`n`^{2}) ⇒ 25'000'000 seconds?
- An `O`(`n`) algorithm (example: "5 seconds per page")
- For 12 people, having only one person work towards the end of the algorithm
- For humans, binary sorting is constraining (sorting into 3~10 parts is better)
- Using bubble sort (868 days, without including breaks or sleep)
- Preparing 10^{10} boxes (problem: space, cost, distance for walking)
- Forgetting time for preparation, cleanup, breaks, ...
- Submitting just a program
- Report too short

`heapify_all`

- How `heapify_all` works: Apply `heapify_down` starting with the lower layers
- The complexity of `heapify_down` is `O`(log `n`) or lower
- `heapify_all` may be `Θ`(`n` log `n`), but we should check
- Analysis for each layer:

| Layer | Size | Operations of `heapify_down` | Operations per layer |
|---|---|---|---|
| 0 (bottom) | `n`/2 | 0 (unnecessary) | 0·`n`/2 |
| 1 | `n`/4 | 1 | 1·`n`/4 |
| 2 | `n`/8 | 2 | 2·`n`/8 |
| 3 | `n`/16 | 3 | 3·`n`/16 |
| `i` | `n`/2^{i+1} | `i` | `i`·`n`/2^{i+1} |

Total: ∑_{0≤i<log2 n} `i`·`n`/2^{i+1} = `n`/2 ∑_{0≤i<log2 n} `i`/2^{i} ≈ `n`/2 · 2 = `n` ∈ `O`(`n`) [6heapsum.rb]

- Conclusions:
  - Time complexity can be lower than suggested by simple guessing
  - It is possible to build a heap directly in `O`(`n`) time, but adding items one-by-one will take `O`(`n` log `n`) in the worst case
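
The bottom-up construction can be sketched as follows (a minimal illustration assuming a 0-based array max-heap; this is not the lecture's actual implementation):

```ruby
# Build a max-heap in place by applying heapify_down bottom-up.
# Children of index i are at 2i+1 and 2i+2 (0-based array).
def heapify_down(array, i, size)
  loop do
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    largest = left  if left  < size && array[left]  > array[largest]
    largest = right if right < size && array[right] > array[largest]
    break if largest == i
    array[i], array[largest] = array[largest], array[i]
    i = largest
  end
end

def heapify_all(array)
  # Leaves need no work; start at the last internal node.
  (array.length / 2 - 1).downto(0) { |i| heapify_down(array, i, array.length) }
  array
end

heap = heapify_all([3, 9, 2, 1, 4, 5])
# heap[0] now holds the maximum, 9
```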

∑_{0≤i≤∞} `i`/2^{i}

= 0/1 + 1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + 7/128 + 8/256 + 9/512 + ...

< 1/2 + 2/4 + 4/8 + 4/8 + 8/32 + 8/32 + 8/32 + 8/32 + 16/512 + ...

= 1/2 + 1·2/4 + 2·4/8 + 4·8/32 + 8·16/512 + 16·32/131072 + ...

= 1/2 + 2^{1}/2^{2} + 2^{3}/2^{3} + 2^{5}/2^{5} + 2^{7}/2^{9} + 2^{9}/2^{17} + 2^{11}/2^{33} + 2^{13}/2^{65} + ...

= 1/2 + ∑_{0≤k≤∞} 2^{(2k+1)-(2^k+1)}

= 1/2 + ∑_{0≤k≤∞} 2^{2k-2^k}

< 3.254
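
The infinite sum is in fact exactly 2; a quick numeric check in the spirit of the 6heapsum.rb file mentioned above (this sketch is not that file):

```ruby
# Partial sums of i/2^i approach 2, well below the rougher bound 3.254.
sum = 0.0
(0..50).each { |i| sum += i / 2.0**i }
puts sum   # very close to 2.0
```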

- Make output easy to understand and check (search by humans)
- Group related items together
- Preparation for search (example: binary search, index in databases, ...)
- Use as component in more complicated algorithms

- Bubble sort
- Selection sort
- Insertion sort

- Compare neighboring items, exchange if not in order
- Pass through the data from start to end, repeatedly
- The number of passes needed to fully order the data is `O`(`n`)
- The number of comparisons (and potential exchanges) in each pass is `O`(`n`)
- Time complexity is `O`(`n`^{2})

Possible improvements:

- Alternatively pass back and forth
- Remember the place of the last exchange to limit the range of exchanges
- Work in parallel

Pseudocode/example implementation: 6sort.rb
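
For reference, a minimal Ruby sketch of bubble sort with the "remember the last exchange" improvement (the actual 6sort.rb may differ):

```ruby
def bubble_sort(array)
  limit = array.length - 1
  while limit > 0
    last_exchange = 0
    (0...limit).each do |i|
      if array[i] > array[i + 1]   # compare neighboring items
        array[i], array[i + 1] = array[i + 1], array[i]
        last_exchange = i
      end
    end
    limit = last_exchange   # everything after the last exchange is sorted
  end
  array
end
```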

- Looping a fixed number of times
- Looping with an index
- Many others, ...

Syntax:

```ruby
number.times do
  # some work
end
```

Example:

```ruby
(length-1).times do
  # bubble
end
```

Syntax:

```ruby
start.upto(end) do |index|
  # some work using index
end
```

Example:

```ruby
0.upto(length-2) do |i|
  # select
end
```

- Start with an unsorted array
- Find the smallest element, and exchange it with the first element
- Continue finding the smallest and exchanging it with the first element of the rest of the array
- The area at the start of the array that is fully sorted will get larger and larger
- Number of exchanges: `O`(`n`)
- Work needed to find the smallest element: `O`(`n`)
- Overall time complexity: `O`(`n`^{2})

- The number of comparisons to find the minimum of `n` elements is `n`-1
- The size of the unsorted area is initially `n` elements, at the end 2 elements
- ∑_{i=2}^{n} (`n`-`i`+1) = `n`-1 + `n`-2 + ... + 2 + 1 = `n`·(`n`-1)/2 = `O`(`n`^{2})
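
A minimal selection sort sketch in Ruby, ascending order (the lecture's reference code in 6sort.rb may differ):

```ruby
def selection_sort(array)
  0.upto(array.length - 2) do |first|
    smallest = first
    (first + 1).upto(array.length - 1) do |i|
      smallest = i if array[i] < array[smallest]
    end
    # only one exchange per position => O(n) exchanges overall
    array[first], array[smallest] = array[smallest], array[first]
  end
  array
end
```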

- Start with an unsorted array
- View the first element of the array as sorted (sorted area of length 1)
- Take the second element of the array and insert it at the right place into the sorted area → sorted area of length 2
- Continue with the following elements, making the sorted area longer and longer
- To insert an element into the already sorted area, move any elements greater than the new element to the right by one
- The (worst-case) time complexity is `O`(`n`^{2})
- Insertion sort is fast if the data is already (almost) sorted
- Insertion sort can be used if data items are added into an already sorted array

Improvement: Using a sentinel: Add a first data item that is guaranteed to be smaller than any real data item. This saves one index check per comparison.

- The number of elements to be inserted is `n`
- The maximum number of comparisons/moves when inserting data item number `i` is `i`-1
- ∑_{i=2}^{n} (`i`-1) = 1 + 2 + ... + `n`-2 + `n`-1 = `n`·(`n`-1)/2 = `O`(`n`^{2})
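
A minimal insertion sort sketch (without the sentinel; the `j >= 0` test below is exactly the index check a sentinel would save):

```ruby
def insertion_sort(array)
  1.upto(array.length - 1) do |i|
    item = array[i]
    j = i - 1
    while j >= 0 && array[j] > item   # index check a sentinel would avoid
      array[j + 1] = array[j]         # move greater elements to the right
      j -= 1
    end
    array[j + 1] = item
  end
  array
end
```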

| | Selection Sort | Insertion Sort |
|---|---|---|
| handling first item | O(n) | O(1) |
| handling last item | O(1) | O(n) |
| initial area | perfectly sorted | sorted, but some items still missing |
| rest of data | greater than any items in sorted area | any size possible |
| advantage | only O(n) exchanges | fast if (almost) sorted |
| disadvantage | always same speed | may get slower if many moves needed |

(Latin: divide et impera)

- Term of military strategy and tactics
- Problem solving method: solve a problem by dividing it into smaller problems
- Important principle for programming in general (e.g. split a bigger program into various functions)
- Important design principle for algorithms and data structures

- Split the items to be sorted into two halves
- Separately sort each half
- Combine the two halves by merging them

- Two-way merge and multi-way merge
- Create one sorted sequence from two or more sorted sequences
- Repeatedly select the smaller/smallest item from the input sequences
- When only one sequence is left, copy the rest of the items
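
A sketch of a two-way merge in Ruby (a hypothetical helper that consumes its inputs, not code from the lecture):

```ruby
def merge(left, right)
  result = []
  until left.empty? || right.empty?
    # repeatedly take the smaller of the two head items
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right   # only one sequence left: copy the rest
end

merge([1, 3, 5], [2, 4, 6])   # => [1, 2, 3, 4, 5, 6]
```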

- Recursively split the items to be sorted into two halves
- Parts with only 1 item are sorted by definition
- Combine the parts (in the reverse order of splitting them) by merging
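
These three steps can be sketched as follows (a self-contained illustration with the merge done inline; not the lecture's actual code):

```ruby
def merge_sort(array)
  return array if array.length <= 1       # 1 item is sorted by definition
  mid = array.length / 2                  # split into two halves
  left  = merge_sort(array[0...mid])
  right = merge_sort(array[mid..-1])
  merged = []                             # merge the two sorted halves
  li = ri = 0
  while li < left.length && ri < right.length
    if left[li] <= right[ri]
      merged << left[li]
      li += 1
    else
      merged << right[ri]
      ri += 1
    end
  end
  merged + left[li..-1] + right[ri..-1]   # copy the rest
end
```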

- Split is possible in `O`(1) time (index calculation only)
- Merging `n` items takes `O`(`n`) time
- Recurrence: `M`(1) = 0, `M`(`n`) = 1 + 2`M`(`n`/2)^{(*)} + `n`
- Discovering a pattern by repeated substitution:

  `M`(`n`) = 1 + 2`M`(`n`/2) + `n` =
  = 1 + 2 (1 + 2`M`(`n`/2/2) + `n`/2) + `n` =
  = 1 + 2 + 4`M`(`n`/4) + `n` + `n` =
  = 1 + 2 + 4 (1 + 2`M`(`n`/4/2) + `n`/4) + `n` + `n` =
  = 1 + 2 + 4 + 8`M`(`n`/8) + `n` + `n` + `n` =
  ...
  = 2^{k} - 1 + 2^{k}`M`(`n`/2^{k}) + `k``n`

- Using `M`(1) = 0: `n`/2^{k} = 1 ⇒ `k` = log_{2} `n`, so `M`(`n`) = `n` - 1 + `n` log_{2} `n`
- Asymptotic time complexity: `O`(`n` log `n`)

(*) More exactly, `M`(⌈`n`/2⌉) + `M`(⌊`n`/2⌋) rather than 2`M`(`n`/2)
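
A quick numeric check (an illustration, not from the lecture) that the recurrence matches the closed form when `n` is a power of two:

```ruby
# M(1) = 0; M(n) = 1 + 2*M(n/2) + n   (n a power of two)
def m(n)
  n == 1 ? 0 : 1 + 2 * m(n / 2) + n
end

[2, 4, 8, 1024].each do |n|
  closed_form = n - 1 + n * Math.log2(n)   # n - 1 + n log2 n
  raise "mismatch at n=#{n}" unless m(n) == closed_form
end
```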

- Merging means copying all elements ⇒ we need twice the memory of the original data
- Merge sort is better suited for external memory than for internal memory
- External memory:
- Punchcards
- Magnetic tapes
- Hard disks (HD)
- Solid state drives (SSD)

- Simple sorting algorithms:
  - Bubble sort (easiest to implement)
  - Selection sort (only `O`(`n`) data exchanges)
  - Insertion sort (fast when already (almost) sorted)
- Simple sorting algorithms are all `O`(`n`^{2})
- Merge sort is based on *divide and conquer*
- Merge sort is `O`(`n` log `n`) (same as heap sort)

- Using the sorting cards, play with your friends to see which algorithms may be faster.

(Example: Two players, one player uses selection sort, one player uses insertion sort, who wins?)

- bubble sort
- バブル整列法、バブルソート
- selection sort
- 選択整列法、選択ソート
- insertion sort
- 挿入整列法、挿入ソート
- sentinel
- 番兵
- index
- 添え字
- divide and conquer
- 分割統治法
- military strategy
- 軍事戦略
- tactics
- 戦術
- design principle
- 設計方針
- merge sort
- マージソート
- merge
- 併合
- 2-way merge
- 2 ウェイ併合
- multiway merge
- マルチウェイ併合
- external memory
- 外部メモリ
- internal memory
- 内部メモリ
- punchcard
- パンチカード
- magnetic tape
- 磁気テープ
- hard disk
- ハードディスク