2018-10-29

a hash table uses data structure and a function to translate values to addresses for locations in the data structure. its special property is that it can guess memory locations to some degree, and because of that can archieve linear access time on average. the input values that are used to identify locations are usually called "keys"

- range of possible key values
- distribution to destination locations
- calculation effort
- fixed length or variable length key values

simple array indexing, pointer arithmetic. this one only works if no key is equal or greater than the size of the array

`array-start-address + integer-key`

`target-index = integer-key % last-array-index`

the hash table size should be a prime or at least odd. modulo uses division, and primes have the lowest number of divisors; the number of possible collisions is reduced. even size-values would lead to even hash-values for even keys and odd hash-values for odd keys

```
m = some-value-perhaps-arbitrary-chosen
A = (sqrt(5)-1)/2 = 0.6180339887
floor(m * (k * A - floor(k * A)))
```

- burtleburtle.net/bob/hash/index.html
- cse.yorku.ca/~oz/hash.html
- faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture11.html

so called perfect hash functions create an unique hash for every key. for the creation of such a function all possible key values have to be known and somewhat considered beforehand

because the set of possible keys is usually larger than the set of possible hashes it is possible, and likely, that the hash function creates the same hash for different keys

keys map to an array of addresses to linked lists. the linked lists are used to contain possibly several elements for array locations

when a value is about to be inserted and a collision is found, the next free location is used. every access of the value then requires a similar search

like linear probing, but the next free location is found by, if necessary repeatedly, multiplying the index with itself, wrapping the array size. depending on the key values, for example if they are nearly linear or more random, this distribution can reduce clustering of keys and can make collisions less likely. linear probing might have performance advantages through memory locality

often a combination of arrays and linked lists

big enough to store the desired number of elements, and usually bigger to reduce the likelihood of collisions

can include keys, are often structs or references to other containers

- murmur3 hash function
- klib khash hash table
- lookup3.c hash functions
- imht-set a set data structure that uses a minimalistic hash table