Hash Functions and Hash Tables

(ハッシュ関数とハッシュ表)

Data Structures and Algorithms

10th lecture, December 5, 2019

http://www.sw.it.aoyama.ac.jp/2019/DA/lecture10.html

Martin J. Dürst

AGU

© 2009-19 Martin J. Dürst 青山学院大学

Today's Schedule

 

Leftovers

Summary of Last Lecture

 

Time Complexity for Known Dictionary Implementations

Implementation               Search     Insertion  Deletion
Sorted array                 O(log n)   O(n)       O(n)
Unordered array/linked list  O(n)       O(1)       O(n)
Balanced tree                O(log n)   O(log n)   O(log n)

 

Direct Addressing

Problem: Array size, non-numeric keys

Solution: Transform key with a hash function

 

Overview of Hashing

(also called scatter storage technique)

 

Problems with Hashing

  1. Choice/design of hash function

    Example 1:
    remainder: def hf(k); k % 100; end

    students[15818000 % 100] = "I.T.Aoyama"

    Example 2:
    sum of codepoints (character numbers): def hf(k); k.codepoints.sum; end

    students["HanakoAoyama".codepoints.sum] = ...

  2. Resolution of conflicts

    What happens with the following:

    students[15818000 % 100] = "I.T.Aoyama"

    students[15718000 % 100] = "K.S.Aoyama"
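
The question above can be tried out directly. The following is a minimal sketch, assuming a plain 100-element array as the table; `hf` and `students` are taken from the slide, while `hf_str` is a name introduced here for the hash function of Example 2.

```ruby
# Hash function from Example 1: remainder of division by 100.
def hf(k)
  k % 100
end

# Hash function from Example 2: sum of codepoints.
def hf_str(k)
  k.codepoints.sum
end

students = Array.new(100)   # one slot per possible value of hf

students[hf(15818000)] = "I.T.Aoyama"
students[hf(15718000)] = "K.S.Aoyama"   # same slot: both remainders are 0

# Without conflict resolution, the second assignment silently
# overwrites the first, and "I.T.Aoyama" can no longer be found.
puts students[hf(15818000)]

# The sum of codepoints ignores character order, so any two
# anagrams receive the same hash value.
puts hf_str("listen") == hf_str("silent")
```

Both weaknesses come from the choice of hash function, but even a good hash function cannot avoid conflicts entirely, so some resolution strategy is still needed.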

 

Overview of Hash Function

Goal/step 2 is easy. Therefore, we concentrate on goal/step 1.
(often step 1 alone is called 'hash function')

 

Hash Function Example 1

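unsigned int sdbm_hash(const char key[])
{
    unsigned int hash = 0;  /* unsigned: overflow wraps around instead of being undefined */
    while (*key) {
        /* the parentheses are required: + and - bind more tightly than << */
        hash = (unsigned char)*key++ + (hash << 6) + (hash << 16) - hash;
    }
    return hash;
}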

 

Hash Function Example 2

(simplified from MurmurHash3; for 32-bit machines)

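#include <stdint.h>
#define ROTL32(n,by) (((n)<<(by)) | ((n)>>(32-(by))))
#define C1 0xcc9e2d51u  /* multiplier; example value, the c1 of MurmurHash3 (x86_32) */
#define R1 15           /* rotation amount; example value, as in MurmurHash3 (x86_32) */
uint32_t too_simple_hash(const uint32_t key[], int length)
{
    uint32_t h = 0;  /* unsigned, so that >> and the rotation shift in zero bits */
    for (int i=0; i<length; i++) {
        uint32_t k = key[i] * C1;
        h ^= ROTL32(k, R1);
    }
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    return h;
}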

Frequent operations in hash functions: Addition (+), multiplication (*), bitwise XOR (^), shift (<<, >>)

 

Evaluation of Hash Functions

 

Precautions for Hash Functions

 

Conflicts

 

Terms and Variables for Conflict Resolution

 

Chaining

 

Implementation of Chaining
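
As one possible illustration, chaining can be sketched in Ruby as follows. This is a minimal sketch, not the implementation from the lecture: the class name `ChainedTable`, the use of Ruby arrays as chains (instead of linked lists), and the fixed number of bins are all assumptions made here.

```ruby
# A minimal hash table with chaining: each bin holds an array of
# [key, value] pairs whose keys hash to that bin.
class ChainedTable
  def initialize(bins = 8)
    @bins = Array.new(bins) { [] }
  end

  # Look up the value stored for key, or nil if key is absent.
  def [](key)
    pair = bin_for(key).find { |k, _| k == key }
    pair && pair[1]
  end

  # Insert a new pair, or replace the value of an existing key.
  def []=(key, value)
    chain = bin_for(key)
    pair = chain.find { |k, _| k == key }
    if pair
      pair[1] = value
    else
      chain << [key, value]
    end
    value
  end

  # Remove key and return its value, or nil if key was absent.
  def delete(key)
    chain = bin_for(key)
    index = chain.index { |k, _| k == key }
    index && chain.delete_at(index)[1]
  end

  private

  # Ruby's built-in #hash yields an integer; the remainder by the
  # number of bins selects the chain.
  def bin_for(key)
    @bins[key.hash % @bins.size]
  end
end

table = ChainedTable.new(4)
table[15818000] = "I.T.Aoyama"
table[15718000] = "K.S.Aoyama"
puts table[15818000]
puts table[15718000]
```

Because conflicting keys are simply appended to the same chain, both students remain retrievable regardless of which bins their keys fall into.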

 

Open Addressing
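
A minimal sketch of open addressing with linear probing (one of the probing methods listed in the glossary) might look as follows. The class name `ProbingTable` and the fixed table size are assumptions; deletion and expansion are deliberately left out.

```ruby
# A minimal open-addressing table with linear probing: on a conflict,
# try the next slot, wrapping around at the end of the array.
# Deletion is omitted because simply emptying a slot would cut the
# probe sequence of keys stored after it.
class ProbingTable
  def initialize(size = 8)
    @slots = Array.new(size)   # each slot is nil or a [key, value] pair
  end

  def []=(key, value)
    i = slot_for(key)
    raise "table is full" if i.nil?
    @slots[i] = [key, value]
    value
  end

  def [](key)
    i = slot_for(key)
    i && @slots[i] && @slots[i][1]
  end

  private

  # Index of the slot holding key, or of the first empty slot on its
  # probe sequence; nil if the table is full and key is absent.
  def slot_for(key)
    start = key.hash % @slots.size
    @slots.size.times do |step|
      i = (start + step) % @slots.size
      return i if @slots[i].nil? || @slots[i][0] == key
    end
    nil
  end
end

table = ProbingTable.new(4)
table[15818000] = "I.T.Aoyama"
table[15718000] = "K.S.Aoyama"
puts table[15818000]
puts table[15718000]
```

Unlike chaining, all data lives in the table itself, so the table can fill up completely; this is why the load factor matters more for open addressing.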

Time Complexity of Hashing

(average, for chaining)

 

Expansion and Shrinking of Hash Table

 

Analysis of the Time Complexity of Expansion

(This is a simple example of amortized analysis.)
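
The amortized argument can be checked numerically. The sketch below assumes a table that starts with capacity 1 and doubles whenever it is full; `copies_for` is a name introduced here, and real implementations usually expand at some load factor threshold instead.

```ruby
# Count how many element copies n insertions cause when a full table
# is replaced by one of twice the size.
def copies_for(n)
  capacity = 1
  size = 0
  copies = 0
  n.times do
    if size == capacity
      copies += size     # move every stored element to the new table
      capacity *= 2
    end
    size += 1
  end
  copies
end

[10, 1_000, 1_000_000].each do |n|
  c = copies_for(n)
  puts "n=#{n}: #{c} copies, #{c.fdiv(n).round(2)} per insertion"
end
```

The copies add up to 1 + 2 + 4 + ..., a sum that stays below 2n, so the cost of expansion averaged over all insertions is O(1) per insertion.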

 

Special Purpose Hash Functions

 

Universal Hashing
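
As a hedged illustration, one well-known universal family for integer keys is h(k) = ((a·k + b) mod p) mod m, with a and b chosen at random. The names `PRIME` and `random_hash_function` below are introduced here, and the sketch assumes keys in the range 0 ≤ k < p.

```ruby
# A prime larger than any key we expect (2**31 - 1 is prime).
PRIME = 2**31 - 1

# Pick one member of the family h(k) = ((a*k + b) mod PRIME) mod m
# at random; m is the number of bins in the table.
def random_hash_function(m, rng = Random.new)
  a = rng.rand(1...PRIME)   # 1 <= a < PRIME
  b = rng.rand(0...PRIME)   # 0 <= b < PRIME
  ->(k) { ((a * k + b) % PRIME) % m }
end

h = random_hash_function(100)
puts h.call(15818000)
puts h.call(15718000)
```

Because a and b are chosen when the table is created, nobody can prepare in advance a set of keys that all land in the same bin.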

 

Perfect Hash Function

 

Cryptographic Hash Function

 

Evaluation of Hashing

Advantages:

Problems:

 

Comparison of Dictionary Implementations

Implementation               Search     Insertion  Deletion   Sorting
Sorted array                 O(log n)   O(n)       O(n)       O(n)
Unordered array/linked list  O(n)       O(1)       O(n)       O(n log n)
Balanced tree                O(log n)   O(log n)   O(log n)   O(n)
Hash table                   O(1)       O(1)       O(1)       O(n log n)

 

The Ruby Hash Class

(Perl: hash; Java: HashMap; Python: dict)
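
A short usage sketch of the built-in `Hash` class follows; the method name `word_counts` and the sample sentence are made up for illustration.

```ruby
# Count word frequencies with Ruby's built-in Hash.
def word_counts(text)
  counts = Hash.new(0)                      # missing keys default to 0
  text.split.each { |word| counts[word] += 1 }
  counts
end

counts = word_counts("to be or not to be")
puts counts["to"]        # search
puts counts.key?("is")   # membership test
counts["is"] = 1         # insertion
counts.delete("or")      # deletion
p counts
```

Search, insertion, and deletion each take O(1) time on average, matching the hash table row in the comparison table.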

 

Implementation of Hashing in Ruby

 

Summary

 

Glossary

direct addressing
直接アドレス表
hashing, scatter storage technique
ハッシュ法、挽き混ぜ法
hash function
ハッシュ関数
hash table
ハッシュ表
game of Go
囲碁
joseki
定石 (囲碁)
conflict
衝突
Poisson distribution
ポアソン分布
chaining
チェイン法、連鎖法
open addressing
開番地法、オープン法
load factor
占有率
linear probing
線形探査法
quadratic probing
二次関数探査法
divisor
(割り算の) 法
amortized analysis
償却分析
universal hashing
万能ハッシュ法
perfect hash function
完全ハッシュ関数
denial of service attack
DoS 攻撃、サービス拒否攻撃
cryptographic hash function
暗号技術的ハッシュ関数
electronic signature
電子署名
proximity search
近接探索
similarity search
類似探索