Divide and Conquer, Mergesort

(分割統治法、マージソート)

Data Structures and Algorithms

6th lecture, November 3, 2016

http://www.sw.it.aoyama.ac.jp/2016/DA/lecture6.html

Martin J. Dürst

© 2009-16 Martin J. Dürst 青山学院大学

Today's Schedule

• Summary of last lecture, leftovers, homework
• The importance of sorting
• Simple sorting algorithms: Bubble sort, selection sort, insertion sort
• Loops in Ruby
• Divide and conquer
• Merge sort
• Summary

Summary of Last Lecture

• A priority queue is an important ADT
• Implementing a priority queue with an array or a linked list is not efficient
• In a heap, each parent has higher priority than its children
• In a heap, the highest priority item is at the root of a complete binary tree
• A heap is an efficient implementation of a priority queue
• Many data structures are defined using invariants
• The operations heapify_up and heapify_down are used to restore heap invariants
• A heap can be used for sorting, using heap sort

Leftovers from Last Lecture

How to use irb; other kinds of heaps; homework (except report)

Report: Manual Sorting

Deadline: November 9, 2016 (Wednesday), 19:00.

Importance of Sorting

• In most information processing tasks, data has to be sorted before it is output
• As preparation for searching (examples: binary search, indices in databases, ...)
• To group related items together
• As a component in more complicated algorithms

Simple Sorting Algorithms

• Bubble sort
• Selection sort
• Insertion sort

Bubble Sort

• Compare neighboring items, and exchange them if they are not in order
• Pass through the data from start to end
• The number of passes needed to fully order the data is O(n)
• The number of comparisons (and potential exchanges) in each pass is O(n)
• Time complexity is O(n²)

Possible improvements:

• Alternate the direction of the passes (back and forth)
• Remember the position of the last exchange, and limit the next pass to the range before it

Pseudocode/example implementation: 6sort.rb
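The lecture's own implementation is in 6sort.rb and is not reproduced here. As an independent minimal sketch, assuming the data is an array of mutually comparable items, bubble sort with the "remember the last exchange" improvement could look like this (the method name `bubble_sort` is our own choice):

```ruby
# Bubble sort sketch: each pass compares neighboring items and exchanges
# them if they are out of order. Improvement: everything after the position
# of the last exchange is already in its final place, so the next pass
# stops there.
def bubble_sort(array)
  a = array.dup                  # work on a copy, leave the input unchanged
  limit = a.length - 1           # a pass compares a[i] with a[i + 1] for i < limit
  while limit > 0
    last_exchange = 0
    0.upto(limit - 1) do |i|
      if a[i] > a[i + 1]
        a[i], a[i + 1] = a[i + 1], a[i]   # exchange neighbors
        last_exchange = i
      end
    end
    limit = last_exchange        # no exchange at all ends the loop
  end
  a
end

p bubble_sort([5, 2, 4, 1, 3])   # prints [1, 2, 3, 4, 5]
```

If a pass makes no exchange, `last_exchange` stays 0 and the loop ends, so already sorted data needs only one pass.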

Various Ways to Loop in Ruby

• Looping a fixed number of times
• Looping with an index
• Many others, ...

Looping a Fixed Number of Times

Syntax:

```ruby
number.times do
  # some work
end
```

Example:

```ruby
(length-1).times do
  # bubble
end
```
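A small runnable illustration of `Integer#times`, assuming a hypothetical array length of 5: the block runs exactly `length - 1` times, once per bubble sort pass. Called without a block, `times` returns an enumerator over the counts 0, 1, 2, ...

```ruby
length = 5          # hypothetical array length
passes = 0
(length - 1).times do
  passes += 1       # stands in for one bubble sort pass
end
puts passes         # prints 4

p 3.times.to_a      # prints [0, 1, 2]
```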

Looping with an Index

Syntax:

```ruby
start.upto(finish) do |index|   # finish: last index ("end" is a Ruby keyword)
  # some work using index
end
```

Example:

```ruby
0.upto(length-2) do |i|
  # select
end
```
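To see which indices `Integer#upto` produces, the following sketch (again assuming a hypothetical length of 5) collects them; note that the end value is included:

```ruby
length = 5            # hypothetical array length
indices = []
0.upto(length - 2) do |i|
  indices << i        # in selection sort, i is the start of the unsorted area
end
p indices             # prints [0, 1, 2, 3]
```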

Selection Sort

• Find the smallest element, and exchange it with the first element
• Continue finding the smallest and exchanging it with the first element of the rest of the array
• The fully sorted area at the start of the array grows by one element in each step
• Number of exchanges: O(n)
• Work needed to find the smallest element: O(n)
• Overall time complexity: O(n²)
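The steps above can be sketched in Ruby as follows. This is a minimal version assuming an array of comparable items; `selection_sort` is a name chosen here, not necessarily the one used in 6sort.rb:

```ruby
# Selection sort sketch: for each position i, find the smallest element in
# the unsorted area a[i..] and exchange it with a[i].
def selection_sort(array)
  a = array.dup
  0.upto(a.length - 2) do |i|
    min = i                              # index of the smallest element so far
    (i + 1).upto(a.length - 1) do |j|
      min = j if a[j] < a[min]
    end
    a[i], a[min] = a[min], a[i]          # exactly one exchange per position
  end
  a
end

p selection_sort([5, 2, 4, 1, 3])   # prints [1, 2, 3, 4, 5]
```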

Details of Time Complexity for Selection Sort

• The number of comparisons to find the minimum of n elements is n-1
• The size of the unsorted area is n elements initially and 2 elements in the last step
• $\sum_{i=2}^{n} (n-i+1) = (n-1) + (n-2) + \dots + 2 + 1 = n(n-1)/2 = O(n^2)$
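The sum can be checked by counting comparisons directly. This sketch (the counting method is a hypothetical helper) shows that selection sort makes exactly n(n-1)/2 comparisons whether the input is sorted or reversed:

```ruby
# Count the comparisons selection sort makes; the count depends only on n.
def selection_sort_comparisons(array)
  a = array.dup
  comparisons = 0
  0.upto(a.length - 2) do |i|
    min = i
    (i + 1).upto(a.length - 1) do |j|
      comparisons += 1                   # one comparison per inner step
      min = j if a[j] < a[min]
    end
    a[i], a[min] = a[min], a[i]
  end
  comparisons
end

n = 10
p selection_sort_comparisons((1..n).to_a)          # prints 45
p selection_sort_comparisons((1..n).to_a.reverse)  # prints 45
p n * (n - 1) / 2                                  # prints 45
```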

Insertion Sort

• View the first element of the array as sorted (sorted area of length 1)
• Take the second element of the array and insert it at the correct place in the sorted area (→ sorted area of length 2)
• Continue with the following elements, making the sorted area longer and longer
• To insert an element into the already sorted area, move all elements greater than the new element one place to the right
• The (worst-case) time complexity is O(n²)
• Insertion sort is fast if the data is already (almost) sorted
• Insertion sort is also suitable when data items are added to an already sorted array
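A minimal Ruby sketch of the insertion step described above, assuming an array of comparable items (`insertion_sort` is our own name):

```ruby
# Insertion sort sketch: a[0...i] is the sorted area; insert a[i] into it.
def insertion_sort(array)
  a = array.dup
  1.upto(a.length - 1) do |i|
    item = a[i]                 # element to insert into the sorted area
    j = i - 1
    while j >= 0 && a[j] > item
      a[j + 1] = a[j]           # move a larger element one place to the right
      j -= 1
    end
    a[j + 1] = item             # put the new element into the gap
  end
  a
end

p insertion_sort([5, 2, 4, 1, 3])   # prints [1, 2, 3, 4, 5]
```

Because the inner loop stops at the first element that is not greater than `item`, sorted input needs only one comparison per element.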

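Improvement: use a sentinel. Add a first data item that is guaranteed to be smaller than any real data item. The inner loop then always stops at the sentinel at the latest, which saves one index check (for reaching the start of the array) in each step.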
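A sketch of the sentinel idea, assuming numeric data so that `-Float::INFINITY` can serve as the sentinel (for other kinds of data, a different guaranteed-smallest value would be needed):

```ruby
# Insertion sort with a sentinel (numeric data assumed). a[0] is smaller than
# every real item, so the inner loop needs no "j >= 0" check.
def insertion_sort_with_sentinel(array)
  a = [-Float::INFINITY] + array   # real data is in a[1..]
  2.upto(a.length - 1) do |i|      # a[1] alone is the initial sorted area
    item = a[i]
    j = i - 1
    while a[j] > item              # stops at the sentinel at the latest
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = item
  end
  a.drop(1)                        # remove the sentinel
end

p insertion_sort_with_sentinel([5, 2, 4, 1, 3])   # prints [1, 2, 3, 4, 5]
```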

Details of Time Complexity for Insertion Sort

• The number of elements to be inserted is n-1 (the first element already forms the sorted area)
• The maximum number of comparisons/moves when inserting data item number i is i-1
• $\sum_{i=2}^{n} (i-1) = 1 + 2 + \dots + (n-2) + (n-1) = n(n-1)/2 = O(n^2)$
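To compare the worst case with the best case, this sketch (the counting method is a hypothetical helper, without a sentinel) counts comparisons and moves. Reversed input reaches the n(n-1)/2 bound; sorted input needs only n-1 comparisons and no moves:

```ruby
# Count comparisons and moves made by insertion sort (no sentinel).
def insertion_sort_counts(array)
  a = array.dup
  comparisons = 0
  moves = 0
  1.upto(a.length - 1) do |i|
    item = a[i]
    j = i - 1
    while j >= 0
      comparisons += 1
      break unless a[j] > item   # found the place for item
      a[j + 1] = a[j]
      moves += 1
      j -= 1
    end
    a[j + 1] = item
  end
  [comparisons, moves]
end

n = 10
p insertion_sort_counts((1..n).to_a)          # sorted:   prints [9, 0]
p insertion_sort_counts((1..n).to_a.reverse)  # reversed: prints [45, 45]
```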

Comparison between Selection Sort and Insertion Sort

| | Selection Sort | Insertion Sort |
|---|---|---|
| Handling the first item | O(n) | O(1) |
| Handling the last item | O(1) | O(n) |
| Initial (sorted) area | perfectly sorted | sorted, but some items still missing |
| Rest of the data | greater than any item in the sorted area | any value possible |
| Advantage | only O(n) exchanges | fast if the data is (almost) sorted |
| Disadvantage | always the same speed, even for sorted data | may get slower if many moves are needed |

Divide and Conquer

(Latin: divide et impera)

• A term from military strategy and tactics
• Problem solving method: solve a problem by dividing it into smaller problems
• Important principle for programming in general
• Important design principle for algorithms and data structures

Merge Sort (without recursion)

• Split the items to be sorted into two halves
• Sort each half separately (for example, with one of the simple sorting algorithms)
• Combine the two halves by merging them
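As a sketch of this non-recursive variant, the following splits once, sorts each half with insertion sort (an arbitrary choice; any simple sorting algorithm would do), and merges the results. The method names are hypothetical:

```ruby
# Simple sort used for each half.
def insertion_sort(a)
  a = a.dup
  1.upto(a.length - 1) do |i|
    item = a[i]
    j = i - 1
    while j >= 0 && a[j] > item
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = item
  end
  a
end

# Two-way merge of two sorted arrays.
def merge_two(left, right)
  result = []
  i = j = 0
  while i < left.length && j < right.length
    if left[i] <= right[j]           # <= keeps equal items in original order
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  result.concat(left[i..-1]).concat(right[j..-1])   # copy the leftovers
end

# Split once, sort the halves separately, merge.
def merge_sort_once(array)
  mid = array.length / 2
  merge_two(insertion_sort(array[0...mid]), insertion_sort(array[mid..-1]))
end

p merge_sort_once([5, 2, 4, 1, 3])   # prints [1, 2, 3, 4, 5]
```

Each half still takes O(n²) time to sort, so this variant alone does not improve the asymptotic complexity; that needs recursion.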

Merge

• Two-way merge and multi-way merge
• Create one sorted sequence from two or more sorted sequences
• Repeatedly select the smaller/smallest item from among the input sequences
• When only one sequence still has items left, copy its remaining items to the output
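A minimal sketch of a multi-way merge, which also covers the two-way case. It assumes each input is an already sorted array and finds the smallest front item by a simple linear scan (a heap would be faster for many sequences); `multiway_merge` is a hypothetical name:

```ruby
# Merge any number of sorted arrays into one sorted array.
def multiway_merge(sequences)
  seqs = sequences.map(&:dup).reject(&:empty?)   # work on non-empty copies
  result = []
  while seqs.length > 1
    # index of the sequence whose front item is smallest
    k = (0...seqs.length).min_by { |idx| seqs[idx].first }
    result << seqs[k].shift
    seqs.delete_at(k) if seqs[k].empty?
  end
  result.concat(seqs.first) unless seqs.empty?   # only one sequence left
  result
end

p multiway_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]])   # prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
```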

Merge Sort

• Recursively split the items to be sorted into two halves
• Parts with only 1 item are sorted by definition
• Combine the parts (in the reverse order of splitting them) by merging
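A recursive merge sort sketch in Ruby, assuming an array of comparable items (method names are our own). Unlike the index-only split assumed in the lecture, Ruby's array slicing copies each half, so splitting here costs O(n); the overall bound remains O(n log n):

```ruby
# Recursive merge sort: split into halves, sort each recursively, merge.
def merge_sort(array)
  return array.dup if array.length <= 1   # 0 or 1 item: sorted by definition
  mid = array.length / 2
  left = merge_sort(array[0...mid])
  right = merge_sort(array[mid..-1])
  merge_sorted(left, right)
end

# Two-way merge: repeatedly take the smaller front item.
def merge_sorted(left, right)
  result = []
  i = j = 0
  while i < left.length && j < right.length
    if left[i] <= right[j]
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  result.concat(left[i..-1]).concat(right[j..-1])   # copy the rest
end

p merge_sort([38, 27, 43, 3, 9, 82, 10])   # prints [3, 9, 10, 27, 38, 43, 82]
```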

Time Complexity of Merge Sort

• Split is possible in O(1) time (index calculation only)
• Merging n items takes O(n) time
• Recurrence (1 for the split, n for merging):
$M(n) = 1 + 2M(n/2) + n$ (more exactly, $M(\lceil n/2 \rceil) + M(\lfloor n/2 \rfloor)$ rather than $2M(n/2)$)
$M(1) = 0$
• Discovering a pattern by repeated substitution (assuming n is a power of 2):
$$
\begin{aligned}
M(n) &= 1 + 2M(n/2) + n \\
&= 1 + 2\,\bigl(1 + 2M(n/4) + n/2\bigr) + n \\
&= 1 + 2 + 4M(n/4) + n + n \\
&= 1 + 2 + 4\,\bigl(1 + 2M(n/8) + n/4\bigr) + n + n \\
&= 1 + 2 + 4 + 8M(n/8) + n + n + n \\
&= 2^k - 1 + 2^k M(n/2^k) + kn
\end{aligned}
$$
• Using $M(1) = 0$: $n/2^k = 1 \Rightarrow k = \log_2 n$
• $M(n) = n - 1 + n \log_2 n$
• Asymptotic time complexity: O(n log n)
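The closed form can be checked numerically. This sketch evaluates the recurrence directly for powers of 2 (the method name `m` is ours) and prints it next to $n - 1 + n \log_2 n$:

```ruby
# M(n) = 1 + 2 M(n/2) + n, M(1) = 0, for n a power of 2.
def m(n)
  n == 1 ? 0 : 1 + 2 * m(n / 2) + n
end

[1, 2, 4, 8, 16, 1024].each do |n|
  closed = n - 1 + n * Math.log2(n).round   # log2 is exact for powers of 2
  puts "#{n}: #{m(n)} #{closed}"            # e.g. the line for 8 is "8: 31 31"
end
```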

Properties of Merge Sort

• Merging copies all elements into a new array, so merge sort needs extra memory as large as the original data (twice as much memory in total)
• Merge sort reads and writes data sequentially, so it is better suited for external memory than for internal memory
• External memory:
  • Punchcards
  • Magnetic tapes
  • Hard disks

Summary

• Simple sorting algorithms:
  • Bubble sort
  • Selection sort
  • Insertion sort
• Simple sorting algorithms are all O(n²)
• Merge sort is based on divide and conquer
• Merge sort is O(n log n) (same as heap sort)

Preparation for Next Time

• Using the sorting cards, play with your friends to find out which algorithms are faster.
(Example: Two players, one player uses selection sort, one player uses insertion sort, who wins?)
• Work on Report: Manual Sorting

Glossary

bubble sort
バブル整列法、バブルソート
selection sort

insertion sort

sentinel

index

divide and conquer
分割統治法
military strategy

tactics

design principle

merge sort
マージソート
merge

2-way merge
2 ウェイ併合
multiway merge
マルチウェイ併合
external memory

internal memory

punchcard
パンチカード
magnetic tape

hard disk
ハードディスク