データ構造とアルゴリズム - 期末試験

選択整列法と挿入整列法 / Selection Sort and Insertion Sort (14 点)

選択整列法と挿入整列法は似ている点も違う点も多い。入力のデータがばらばらで、整列全体にかかる時間のちょうど半分当たりの時点を考える。その時点でそれぞれのアルゴリズムについて、データ全体のどの割合がどのように整列されているかとその理由を文章と図両方を使って説明しなさい。
Selection Sort and Insertion Sort have many similarities and differences. Under the assumtion that the input is in random order, let's look at the point in time exactly half of the overall time needed for sorting. For that point in time, for each algorithm, describe what fraction of the data is sorted, and how, and justify your conclusions using both text and diagrams.

選択整列法 / Selection Sort: 整列されてない部分から一番小さいものを選ぶ。その部分はどんどん小さくなるので、最初はあまり進まない。半分の時間ではデータの 1-1/√2 分のところまでしか進まない。しかしその部分が完全に整列され、残りは全てそれより大きいものばかりしか残ってない。

  [図欠落]

挿入整列法 / Insertion Sort: 整列されてない部分から項目を選んで、整列された部分に挿入するので、整列された部分が大きくなるにつれ作業が遅くなる。したがって半分の時間ではデータの 1/√2 分までに既に整列されているが、その整列は完璧なものではなく、そこにまだ整列されてない項目が更に導入される可能性もある。

  [図欠落]

整列と平均計算量 / Sorting and Average Time Complexity (10 点)

平均計算量と最善時の計算量が同じで、最悪時の計算量と違う整列法について、次の問いに答えなさい。/ Answer the following questions for the sorting algorithm where best case complexity and average complexity are the same, but different from worst case complexity.

アルゴリズムの名前 / Name of the algorithm: Quicksort

平均計算量 / Average time complexity: O(n log n)

最悪時の計算量 / Worst case time complexity: O(n²)

最悪時の計算量を避けるための実装上の注意の詳細
Details about precautions to take to avoid worst case time complexity (4 点):

最悪時の計算量を避けるため、分割要素の選択が大事。乱数で決めるか、三つの項目 (例: 最初、最後、真中) の中央値にする方法がある。

最悪時の計算量を無視できる理由 / Reason the worst case time complexity can be ignored (3 点):

平均計算量の標準偏差が 0.65n で、それに比べて例えば 1,000,000 項目の場合の n log n は 20n ぐらい。その二倍になるには標準偏差の30倍が必要が、その可能性が極限に低い。

ハッシュ表の拡張 / Expansion of Hash Table (16 点)

あるハッシュ表に格納されるデータ項目の数が予想できないため、拡張が必要になる場合がある。この拡張による項目の追加の計算量への影響について、拡張の仕組みを含め細かく説明しなさい。
Because it is impossible to predict how many data items will be stored in a hash table, it may have to be expanded. Please explain in detail how this expansion affects the time complexity of insertion, including details of the mechanism of expansion (5 点)

拡張には新しいハッシュ表を作って、項目を移動する必要があるので、項目数 n の場合、拡張に O(n) がかかる。しかし、拡張は項目の追加ごとのではなく、項目の数が倍になった時点で行う。合計 n 個の項目を追加するには合計拡張にかかる時間は 1 + 2 + 4 + ... + n/4 + n/2 + n で表せ、O(n) に収まる。これを項目数で割ると、1項目の追加の計算量は平均で O(1)。

項目追加の計算量の分析に使った方法の名前を書きなさい。/ Give the name of the method that you used to analyse the time complexity of inserting an item (1 点)

償却分析

ハッシュ表の衝突 (激突) 対策は主に二つの方法がある。それぞれの方法 (順不同) についてその名前を書き、拡張のための条件とその利用を説明しなさい。 To handle conflicts in hash tables, there are two main methods. For each method (order irrelevant), give its name and explain and justify the condition for expansion.

方法 A の名前 / Name of Method A (1 点): チェイン法

方法 A の場合の説明 / Explanation for Method A (4 点):

ハッシュ関数により各ビンに割り当てるデータ項目が連結リストで管理されるので、連結リストの平均の長さ (= 占有率) が長くなると効率が下がる。実装によるが、5以上になったところで拡張するのはいい目安だ。

方法 B の名前 / Name of Method B (1 点): 開番地法

方法 B の場合の説明 / Explanation for Method B (4 点):

激突が起こる場合、項目をハッシュ関数の再計算で別のビンに直接おさまるので、ハッシュ表がいっぱいに近いと効率が極端に下がる。項目の数をハッシュ表の大きさの半数以下にし、超える恐れがあるときは拡張した方がいい。

授業へのコメント / Comment about Course (6 点)

この授業で一番分かりにくかったことを書きなさい。 (決まった正解はありません。)
What was most difficult to understand in this course? (there is no definite answer)

@@@@
@@@@

この授業で一番勉強になったことを書きなさい。 (決まった正解はありません。)
What topic in this course was most interesting for you? (there is no definite answer)

@@@@
@@@@

キュー (待ち行列) を実装するためのデータ構造の選択
Selection of Data Structures for Implementing Queues (15 点)

下記の三つの事情に合わせて、キューを実装するように頼まれた。データ構造を選んで、そのデータ構造の名前、仕組み、選んだ理由、各操作の計算量を記述しなさい。/ You have been asked to implement queues for the three situations below. Select the appropriate data structure, give its name and workings, the reason for your selection, and the computational complexity of the operations.

どの時点にもキューに入っている項目の最大の数はある一定の数より大きくはならない。追加は時間が少しでもかかると困るが、取り出しは少しでけ時間がかかってもよい。/ At each moment in time, the number of items in the queue is below a certain maximum number. Insertion of items has to be extremely fast, but removal may take slightly longer.: 配列を使う。配列の大きさを最大の項目の数に合わせる。配列の先頭から出すので、後の項目をすらすのは O(n) の時間がかかるが、項目の追加は後ろの方にやるので、 O(1) で可能。
キューに納まる項目数の制限がないが、出すのも入れるのも素早く行って欲しい。/ There is no limit on the number of items in the queue, but both insertion and removal should be extremely fast.: 線形リストを使う。最初の項目と最後の項目へのポインタを持っていれば、前からの削除も後への追加も一定時間 (O(1)) で可能。線形リストだと長さの制限もない。
普通のキューのではなく、項目に優先度が付いた順位キューが必要。最大の項目数が限定されている。 We need not just a queue, but a priority queue. The maximum number of items is known.: ヒープを使います。ヒープは配列の中に二分木を埋め、k 番目の要素の子は 2k と 2k+1 番目の場所にある。親は常に子より優先だが、兄弟の間の順番は決まってないので、項目の追加、削除、優先度の変更も全て O(log n) で可能。

友達のプログラマへの概念の説明 / Explanations to a Programmer Friend (12 点)

プログラミングができるがその概念を知らない友達に下記の概念を細かく説明しなさい。
Explain the following concepts to a friend who knows programming, but who hasn't heard about the concepts.

アルゴリズム / Algorithm

アルゴリズムは問題を解決する方法です。アルゴリズムの条件が三つある。
一つ目は、対応する問題が明確であること。例：数を順番に並びなおす。
二つ目は、レシピみたいにステップ毎に分かれ、再現できるように詳細に書いていること。
三つ目は、必ず結果を出して終了すること。
プログラムにしたものはアルゴリズムのではなく、その実装である。

疑似コード / Pseudocode

疑似コードはアルゴリズムの記述方法のひとつである。構造化プログラミングと同様に
関数、繰り返し、枝分れ、配列や構造体などが使われる。しかし、記述の詳細は
コンパイラなどの実行系に厳しく制限されるのではない。アルゴリズムの本質に集中する
ため、記述者が自由に省略し、抽象化し、コメントで記述できる。変数の宣言の
必要が無く、記号 (←, ≦, ≧, ≠ など) や数式も自由に使える。
一部ではプログラミング言語が疑似コードに使われることもある。特に Ruby が向いている。

アルゴリズムの設計方針 / Algorith Design Strategies (30 点)

下記の課題のため、下記の最初の四つの部分問題のアルゴリズムの設計方針や原理を使って、それぞれアルゴリズムの概略を提案し、予想される計算量とその根拠、設計方針を使った場合の問題点について述べなさい。
For the problem explained below, in the first four subproblems propose ideas for algorithms using the respective algorithm design strategies. In each case, indicate the expected time complexity and the reason for it, and the problems when using this design strategy.

注：この問題の目的は、できるだけ最適な結果のアルゴリズムの発見よりも、設計方針の知識の応用である。
Caution: The purpose of this problem is to show you can apply your knowledge about design strategies, rather than to design the best algorithm.

課題: 所属大学で全ての新入生 (n 人、n が 5000人程度) が寮に住むことになっている。学長から部屋割りをできるだけ不満が無いように決めるように頼まれた。寮の部屋は全て二人部屋。学生の希望のデータ (x にとって、相方には y と z どどちらが好ましい) は全ての (x, y, z) に対して提供される。
Problem: In you university, all freshmen (n students, around 5000) have to live in a dormitory. The University President asks you to create a room assignment that tries to make all students reasonably happy with their room partners. All rooms are for two students. You get data about student preferences for each triple of students (x, y, z) in the form "x prefers y over z)".

動的計画法 / dynamic programming

例えば入力順で隣り合っている二人組、三人組、… の順で一番好ましい部屋割りを計算し (二人組の倍は一位に決まる)、計算の範囲を徐々に拡大し、その時にすでに計算した結果をうまく考慮する。人数が少ないうちは簡単だが、増えると難しくなる。 O(n³) やそれ以上になるが、最適解の保証がない。

シミュレーテッドアニーリング / simulated annealing

乱数を使って何個かの部屋割りを作っておく。そこから乱数を使って学生を交換しながらさらに部屋割りを増やす。最初は交換の数を多め (温度が高め) にするが、徐々に下げる。途中の解をいい方のものを中心に残し、悪い方を削除する。計算量は作った解の数に比例が、解がたくさん必要なので、遅い。

貪欲アルゴリズム / greedy algorithm

学生を例えば好きな相手の数の順に並んで、順に部屋に入る一人目を割り振る。二人目は好き動詞になるように選び、繰り返す。計算量はソートが必要なので O(n log n) やそれ以上。問題は、最初に割り振られている学生にはいい解になるが、最後に割り振られる学生は嫌な相手と一緒になる可能性もある。

分割統治法 / divide and conquer

学生を二分し、それを再帰的に繰り返す。二人組になったところでそれを部屋割りにし、全部の組を逆順に統合する。いい部屋割りになるためには分割や統合 (または両方) のときの工夫が必要が、工夫に時間をかけすぎると分割統治法の意味がなくなる。計算量が O(n log n) になる可能性が高いが、最適解の保証はない。

この問題は NP困難問題であると想定される。しかし、学長から「来週入学式なので、それまで絶対最善の部屋割りを作ってもらわないと退学者も出るかもしれません。」と言われたときの返事の概略を書いてください。 It is assumed that this problem is NP-hard. Write your answer when the University President tells you "The entrance ceremony is next week, and if you don't find the optimal solution, some students may withdraw.

大変申し訳ございませんが、部屋割り問題は NP 困難問題と知られ、この問題と同様な問題が数多く存在する。この問題を素早く (すなわち、多項式時間で) 解ける方法が万が一見つかったら世紀の大発見になり、一億円以上の賞金も貰える。多くの専門家はそれが無理だと思っているが、その証明もいまだできていません。学生に少し我慢するようにするしかないかもしれません。

木構造の走査 / Tree Traversal (24 点)

通りがけ順の疑似コード / Pseudocode for Inorder Traversal (12 点)

出力が一行あたりに一つのノードで、インデントが整っている、二分木の通りがけ順の走査の疑似コードを書きなさい。引数として木のノード、一層当たりのインデントのスペースの数、そして現在のインデントの幅を使いなさい。木のノード tree_node の左の部分木は left、右の部分木は right、空のノードは empty?、ノードのデータを表す data を返す関数・メソッドが使える。出力に必要な関数も分かりやすい名前で使ってください (定義は不要)。
Write the pseudocode for an in-order traversal of a binary tree, where the output is one line per node, nicely indented. As arguments, use a tree node, the number of spaces per level of indent, and the current size of the indent. For a tree node tree_node, you can use the functions/methods left for the left subtree, right for the right subtree, empty? to test if a node is empty, and data for a node's data. Use functions with simple names for output (no need to define these functions).

function in_order
    inputs: tree_node, single_indent (number of spaces per level),
            current_indent (number of spaces for current indent)

    output: (to standard output only)
    
    if not tree_node.empty?
        in_order(tree_node.left, single_indent, current_indent+single_indent)

        print_spaces(current_indent)
        print_data(tree_node.data)
        print_new_line

        in_order(tree_node.right, single_indent, current_indent+single_indent)
    end
end

次の幅優先のノード数 n の二分木の出力のアルゴリズムについて、下記の問いに答えなさい。層の数が s の場合、層 i (i = 0, 1,... s-1) を出力するため、木を層 i まで深さ優先で辿って、層 i だけ出力する。
When printing out a binary tree with n nodes in breadth-first order using the following algorithm, reply to the questions below. For a tree with s layers, in order to output layer i (i = 0, 1,... s-1), traverse the tree in depth-first order only down to layer i, and only ouput layer i.

アルゴリズムの最善の計算量を求め、その理由を説明しなさい。
Obtain the best-case time complexity for the algorithm, with explanation. (6 点)

n=m²-1 の完全二分木の場合、最善の計算量になる。一番下の層まで辿るにはを O(n) の時間が必要。下から二番目の層の大きさは一番使途の層の半分なので、 O(n/2) がかかる。n + n/2 + n/4 + ... ~= 2n なので、最善の計算量は O(n) となる。

アルゴリズムの最悪の計算量を求め、その理由を説明しなさい。
Obtain the worst-case time complexity for the algorithm, with explanation. (6 点)

最悪の計算量は線形の木の場合に発生する。一番下の層のためには O(n)、その一つ上の層には O(n-1) がかかる。 n + n-1 + n-2 + n-3 + ... + 1 ~= n²/2 なので、最悪の計算量は O(n²) となる。

用語の説明 / Explanation of Terms (32 点)

次の英語の用語に相当する日本語の用語を書いて、簡単に説明しなさい。日本語の用語ではカタカナをできるだけ避けなさい。
For each of the English terms below, write the corresponding Japanese term and give a short explanation. For the Japanese terms, avoid Katakana as much as possible.

dictionary: 辞書; データ項目をキーを使って探索、挿入、削除できる抽象データ型
decision problem: 決定問題; 解とが「はい」か「いえ」だけの問題; 他の問題は決定問題に置き換え可能
memoization: 履歴管理; 関数の計算結果を例えばハッシュに登録し、動的計画法の自動化に使える技
binary search: 二分探索; 整列したデータで、範囲を再帰的に二分にして O(log n) の探索方法
radix sort: 基数整列; 小さい桁から桁ごとにソートする。安定な整列法が必要
abstract datatype: 抽象データ型；データ上の操作だけが定義; 実装はいろいろ
substring: 部分文字列; 文書や文字列の連続した部分
invariant: 不変条件; データ構造など保たされる条件で、挿入や削除などで修復する必要がある
asymptotic growth: 漸近的な増加; 関数の引数の増加によって関数の結果がおよそどの勢いで増えるか
genetic algorithm: 遺伝的アルゴリズム；問題の解を遺伝的情報にたとえ、交叉と突然変異で解を近似する
sentinel: 番兵; 配列などの範囲を超えるチェックが不要になるために追加する疑似的な項目
secondary storage: 二次記憶装置、ハードディスクなど、遅いのでアルゴリズムの選定で考慮が必要
local optimum: 局所的な最適解; 山登り法という近似アルゴリズムの場合発生する問題
precomputation: 事前計算; 実際の計算の前の準備、例えばデータ構造の初期化など
recurrence relation: 漸化式; 再帰的に定義されている数学的な関数、解けにくい場合がある
reduction: 帰着; ある問題を別の問題に置き換えて解くことで問題の相対的な難しさを示す

青山学院大学

選択整列法と挿入整列法 / Selection Sort and Insertion Sort (14 点)

整列と平均計算量 / Sorting and Average Time Complexity (10 点)

ハッシュ表の拡張 / Expansion of Hash Table (16 点)

授業へのコメント / Comment about Course (6 点)

青山学院大学

キュー (待ち行列) を実装するためのデータ構造の選択
Selection of Data Structures for Implementing Queues (15 点)

友達のプログラマへの概念の説明 / Explanations to a Programmer Friend (12 点)

アルゴリズムの設計方針 / Algorith Design Strategies (30 点)

青山学院大学

木構造の走査 / Tree Traversal (24 点)

用語の説明 / Explanation of Terms (32 点)

計算量の比較 / Comparing Computational Complexities (10 点)

青山学院大学

選択整列法と挿入整列法 / Selection Sort and Insertion Sort (14 点)

整列と平均計算量 / Sorting and Average Time Complexity (10 点)

ハッシュ表の拡張 / Expansion of Hash Table (16 点)

授業へのコメント / Comment about Course (6 点)

青山学院大学

キュー (待ち行列) を実装するためのデータ構造の選択Selection of Data Structures for Implementing Queues (15 点)

友達のプログラマへの概念の説明 / Explanations to a Programmer Friend (12 点)

アルゴリズムの設計方針 / Algorith Design Strategies (30 点)

青山学院大学

木構造の走査 / Tree Traversal (24 点)

用語の説明 / Explanation of Terms (32 点)

計算量の比較 / Comparing Computational Complexities (10 点)

キュー (待ち行列) を実装するためのデータ構造の選択
Selection of Data Structures for Implementing Queues (15 点)