(Divide and Conquer, Merge Sort)

http://www.sw.it.aoyama.ac.jp/2015/DA/lecture6.html

© 2009-15 Martin J. Dürst 青山学院大学

- Summary of last lecture, leftovers, homework
- The importance of sorting
- Simple sorting algorithms: Bubble sort, selection sort, insertion sort
- Loops in Ruby
- Divide and conquer
- Merge sort
- Summary

Heap sort

- A *priority queue* is an important ADT
- Implementing a priority queue with an array or a linked list is not efficient
- In a *heap*, each parent has higher priority than its children
- In a heap, the highest priority item is at the root of a *complete binary tree*
- A heap is an efficient implementation of a priority queue
- Many data structures are defined using invariants
- The operations heapify_up and heapify_down are used to restore heap invariants
- A heap can be used for sorting, using *heap sort*
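As a reminder of how restoring the heap invariant leads to heap sort, here is a minimal Ruby sketch. It assumes a max-heap stored in a 0-based array, with the children of index `i` at `2*i + 1` and `2*i + 2`. The method names and signatures (`heapify_down(items, i, size)`, `heap_sort(array)`) are chosen for illustration and may differ from the code used in the lecture.

```ruby
# Restore the max-heap invariant for the subtree rooted at index i,
# looking only at items[0...size]. (Illustrative signature.)
def heapify_down(items, i, size)
  loop do
    left = 2 * i + 1
    right = left + 1
    largest = i
    largest = left if left < size && items[left] > items[largest]
    largest = right if right < size && items[right] > items[largest]
    break if largest == i          # invariant holds, nothing more to do
    items[i], items[largest] = items[largest], items[i]
    i = largest                    # continue one level further down
  end
end

# Sort a copy of the array in ascending order using a max-heap.
def heap_sort(array)
  items = array.dup
  n = items.length
  # Build the heap: fix all parents, from the last parent up to the root.
  (n / 2 - 1).downto(0) { |i| heapify_down(items, i, n) }
  # Repeatedly move the root (largest item) behind the shrinking heap.
  (n - 1).downto(1) do |last|
    items[0], items[last] = items[last], items[0]
    heapify_down(items, 0, last)
  end
  items
end
```

For example, `heap_sort([5, 3, 8, 1])` returns a new sorted array and leaves the argument unchanged.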

November ~~3rd~~4th, 2015 (Wednesday), 19:00

[now is the best time to ask questions about this report]

`heapify_all`

- In most cases of information processing, sorting is needed before output
- As a preparation for search (Example: binary search, databases, ...)
- To group related items together
- As a component in more complicated algorithms

- Bubble sort
- Selection sort
- Insertion sort

- Compare neighboring items, exchange if not in order
- Pass through the data from start to end
- The number of passes needed to fully order the data is $O(n)$
- The number of comparisons (and potential exchanges) in each pass is $O(n)$
- Time complexity is $O(n^2)$
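The steps above can be sketched in Ruby as follows. This is a minimal version written for illustration, not necessarily the code in 6sort.rb; the method name `bubble_sort` and the choice to sort a copy rather than in place are assumptions.

```ruby
# Basic bubble sort: (length - 1) passes, each comparing all neighboring pairs.
def bubble_sort(array)
  items = array.dup                  # work on a copy (illustrative choice)
  length = items.length
  (length - 1).times do              # O(n) passes
    0.upto(length - 2) do |i|        # O(n) comparisons per pass
      if items[i] > items[i + 1]     # neighbors out of order: exchange
        items[i], items[i + 1] = items[i + 1], items[i]
      end
    end
  end
  items
end
```

The two nested loops, each running $O(n)$ times, make the $O(n^2)$ time complexity directly visible.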

Possible improvements:

- Alternately pass back and forth
- Remember the place of the last exchange to limit the range of exchanges
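The second improvement can be sketched as below; the back-and-forth variant is not shown. The idea is that everything after the position of the last exchange in a pass is already in its final place, so the next pass can stop there. The method name `bubble_sort_last_exchange` and the variable names are hypothetical.

```ruby
# Bubble sort that remembers where the last exchange happened.
def bubble_sort_last_exchange(array)
  items = array.dup
  limit = items.length - 1           # items after index limit are final
  while limit > 0
    last_exchange = 0                # stays 0 if the pass makes no exchange
    0.upto(limit - 1) do |i|
      if items[i] > items[i + 1]
        items[i], items[i + 1] = items[i + 1], items[i]
        last_exchange = i
      end
    end
    limit = last_exchange            # nothing beyond this index moved
  end
  items
end
```

On already sorted data this version stops after a single pass, although the worst case is still $O(n^2)$.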

Pseudocode/example implementation: 6sort.rb

- Looping a fixed number of times
- Looping with an index
- Many others, ...

Syntax:

    number.times do
      # some work
    end

Example:

    (length-1).times do
      # bubble
    end

Syntax:

    # finish stands for the last index value (end is a reserved word in Ruby)
    start.upto(finish) do |index|
      # some work using index
    end

Example:

    0.upto(length-2) do |i|
      # select
    end

- Start with an unsorted array
- Find the smallest element, and exchange it with the first element
- Continue finding the smallest and exchanging it with the first element of the rest of the array
- The area at the start of the array that is fully sorted will get larger and larger
- Number of exchanges: $O(n)$
- Work needed to find smallest element: $O(n)$
- Overall time complexity: $O(n^2)$
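A minimal Ruby sketch of these steps follows. The method name `selection_sort` and the decision to return a sorted copy are assumptions made for this example.

```ruby
# Selection sort: grow a fully sorted area at the start of the array.
def selection_sort(array)
  items = array.dup
  length = items.length
  0.upto(length - 2) do |i|          # i: first index of the unsorted rest
    min = i
    (i + 1).upto(length - 1) do |j|  # find the smallest item in the rest
      min = j if items[j] < items[min]
    end
    # one exchange per outer step, so O(n) exchanges overall
    items[i], items[min] = items[min], items[i]
  end
  items
end
```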

- The number of comparisons to find the minimum of $n$ elements is $n-1$
- The size of the unsorted area initially is $n$ elements, at the end 2 elements
- $\sum_{i=2}^{n} (n-i+1) = (n-1) + (n-2) + \dots + 2 + 1 = n(n-1)/2 = O(n^2)$
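The sum can be checked by simply counting the comparisons that the loop structure of selection sort performs. The helper name `selection_sort_comparisons` is hypothetical; it only counts and does not sort anything.

```ruby
# Count the comparisons selection sort makes on n items.
# The count depends only on n, not on the order of the data.
def selection_sort_comparisons(n)
  count = 0
  0.upto(n - 2) do |i|
    (i + 1).upto(n - 1) { count += 1 }   # one comparison per candidate
  end
  count
end
```

For any `n` of at least 1, the result should equal `n * (n - 1) / 2`.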

- Start with an unsorted array
- View the first element of the array as sorted (sorted area of length 1)
- Take the second element of the array and insert it at the right place into the sorted area (→ sorted area of length 2)
- Continue with the following elements, making the sorted area longer and longer
- To insert an element into the already sorted area, move any elements greater than the new element to the right by one
- The (worst-case) time complexity is $O(n^2)$
- Insertion sort is fast if the data is already (almost) sorted
- Insertion sort can be used if data items are added into an already sorted array
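The insertion step described above can be sketched in Ruby like this. The method name `insertion_sort` and the variable names are illustrative assumptions.

```ruby
# Insertion sort: items[0...i] is the sorted area, items[i] is inserted next.
def insertion_sort(array)
  items = array.dup
  1.upto(items.length - 1) do |i|
    current = items[i]
    j = i - 1
    # move greater elements one position to the right
    while j >= 0 && items[j] > current
      items[j + 1] = items[j]
      j -= 1
    end
    items[j + 1] = current           # drop the new element into the gap
  end
  items
end
```

If the data is already sorted, the inner `while` loop ends immediately for every element, which is why insertion sort is fast in that case.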

Improvement: Using a sentinel: Add a first data item that is guaranteed to be smaller than any real data items. This saves one index check.
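A sketch of the sentinel idea, under the assumption that the data is numeric so that `-Float::INFINITY` can serve as a value smaller than any real item. The method name `insertion_sort_with_sentinel` is hypothetical.

```ruby
# Insertion sort with a sentinel at index 0.
# Assumption: all items are numbers, so -Float::INFINITY is smaller than any of them.
def insertion_sort_with_sentinel(array)
  items = [-Float::INFINITY] + array  # real data starts at index 1
  2.upto(items.length - 1) do |i|
    current = items[i]
    j = i - 1
    # no "j >= 0" check needed: the sentinel always stops the loop
    while items[j] > current
      items[j + 1] = items[j]
      j -= 1
    end
    items[j + 1] = current
  end
  items.drop(1)                       # remove the sentinel again
end
```

Compared with the plain version, the inner loop condition has one comparison instead of two.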

- The number of elements to be inserted is $n$
- The maximum number of comparisons/moves when inserting data item number $i$ is $i-1$
- $\sum_{i=2}^{n} (i-1) = 1 + 2 + \dots + (n-2) + (n-1) = n(n-1)/2 = O(n^2)$
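To see the worst case and the best case side by side, the moves can be counted directly. The helper `insertion_sort_moves` is a hypothetical name; it runs a plain insertion sort on a copy and returns only the number of element moves.

```ruby
# Count how many elements insertion sort moves to the right.
def insertion_sort_moves(array)
  items = array.dup
  moves = 0
  1.upto(items.length - 1) do |i|
    current = items[i]
    j = i - 1
    while j >= 0 && items[j] > current
      items[j + 1] = items[j]
      moves += 1
      j -= 1
    end
    items[j + 1] = current
  end
  moves
end
```

Reverse-sorted input of length `n` should give `n * (n - 1) / 2` moves, while already sorted input should give none.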

| | Selection Sort | Insertion Sort |
|---|---|---|
| handling first item | $O(n)$ | $O(1)$ |
| handling last item | $O(1)$ | $O(n)$ |
| initial area | perfectly sorted | sorted, but some items still missing |
| rest of data | greater than any items in sorted area | any size possible |
| advantage | only $O(n)$ exchanges | fast if (almost) sorted |
| disadvantage | always same speed | may get slower if many moves needed |

(Latin: divide et impera)

- Term of military strategy and tactics
- Problem solving method: Solve a problem by dividing it into smaller problems
- Important principle for programming in general
- Important design principle for algorithms and data structures

- Split the items to be sorted into two halves
- Separately sort each half
- Combine the two halves by merging them

- Two-way merge and multi-way merge
- Create one sorted sequence from two or more sorted sequences
- Repeatedly select the smaller/smallest item from among the input sequences
- When only one sequence is left, copy the rest of the items
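A two-way merge along these lines might look as follows in Ruby (multi-way merge is not shown). The method name `merge` and its signature are assumptions; both inputs are expected to be sorted already.

```ruby
# Two-way merge: create one sorted array from two sorted arrays.
def merge(left, right)
  result = []
  i = 0
  j = 0
  # repeatedly select the smaller of the two front items
  while i < left.length && j < right.length
    if left[i] <= right[j]
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  # only one sequence is left: copy the rest of its items
  result.concat(left[i..-1])
  result.concat(right[j..-1])
  result
end
```

Each item is copied exactly once, so merging $n$ items in total takes $O(n)$ time.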

- Recursively split the items to be sorted into two halves
- Parts with only 1 item are sorted by definition
- Combine the parts (in the reverse order of splitting them) by merging
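Putting splitting and merging together gives a short recursive sketch. The names `merge_sort` and `merge_sorted` are illustrative; the helper is a compact two-way merge included so that the example is self-contained.

```ruby
# Compact two-way merge of two sorted arrays (helper for this example).
def merge_sorted(left, right)
  left = left.dup
  right = right.dup
  result = []
  until left.empty? || right.empty?
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right              # at most one of these is non-empty
end

# Merge sort: split recursively, then merge in the reverse order of splitting.
def merge_sort(array)
  return array.dup if array.length <= 1   # 0 or 1 items: sorted by definition
  middle = array.length / 2                # split by index calculation only
  left = merge_sort(array[0...middle])
  right = merge_sort(array[middle..-1])
  merge_sorted(left, right)
end
```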

- Split is possible in $O(1)$ time (index calculation only)
- Merging $n$ items takes $O(n)$ time
- Recurrence:

  $M(1) = 0$

  $M(n) = 1 + 2M(n/2) + n$ (more exactly, $M(\lceil n/2 \rceil) + M(\lfloor n/2 \rfloor)$ rather than $2M(n/2)$)
- Discovering a pattern by repeated substitution:

  $$
  \begin{aligned}
  M(n) &= 1 + 2M(n/2) + n \\
       &= 1 + 2\,(1 + 2M(n/2/2) + n/2) + n \\
       &= 1 + 2 + 4M(n/4) + n + n \\
       &= 1 + 2 + 4\,(1 + 2M(n/4/2) + n/4) + n + n \\
       &= 1 + 2 + 4 + 8M(n/8) + n + n + n \\
       &= 2^k - 1 + 2^k M(n/2^k) + kn
  \end{aligned}
  $$
- Using $M(1) = 0$: $n/2^k = 1 \Rightarrow k = \log_2 n$, so $M(n) = n - 1 + n \log_2 n$
- Asymptotic time complexity: $O(n \log n)$
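The closed form can be checked numerically against the recurrence. The sketch below assumes that `n` is a power of two, so that `n / 2` is always exact; the method names `m` and `m_closed` are hypothetical.

```ruby
# Recurrence M(1) = 0, M(n) = 1 + 2 M(n/2) + n, for n a power of two.
def m(n)
  return 0 if n == 1
  1 + 2 * m(n / 2) + n
end

# Closed form n - 1 + n log2(n), for n a power of two.
def m_closed(n)
  k = n.bit_length - 1               # exact log2 for powers of two
  n - 1 + n * k
end
```

For instance, `m(8)` expands to 1 + 2 · 11 + 8 = 31, and the closed form gives 8 - 1 + 8 · 3 = 31 as well.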

- Because merging two arrays means copying all elements, we need twice as much memory as the original data
- Merge sort is better suited for external memory than for internal memory
- External memory:
  - Punchcards
  - Magnetic tapes
  - Hard disks

- Simple sorting algorithms:
  - Bubble sort
  - Selection sort
  - Insertion sort
- Simple sorting algorithms are all $O(n^2)$
- Merge sort is based on divide and conquer
- Merge sort is $O(n \log n)$ (same as heap sort)

- Cut out the sorting cards and replay the algorithms you got to know in this lecture (Example: Two players, one player uses selection sort, one player uses insertion sort, who wins?)
- Work on Report: Manual Sorting; be ready to ask questions next time

- bubble sort: バブル整列法、バブルソート
- selection sort: 選択整列法、選択ソート
- insertion sort: 挿入整列法、挿入ソート
- sentinel: 番兵
- index: 指数
- divide and conquer: 分割統治法
- military strategy: 軍事戦略
- tactics: 戦術
- design principle: 設計方針
- merge sort: マージソート
- merge: 併合
- 2-way merge: 2 ウェイ併合
- multiway merge: マルチウェイ併合
- external memory: 外部メモリ
- internal memory: 内部メモリ
- punchcard: パンチカード
- magnetic tape: 磁気テープ
- hard disk: ハードディスク