# about hash tables a hash table uses data structure and a function to translate values to addresses for locations in the data structure. its special property is that it can guess memory locations to some degree, and because of that can archieve linear access time on average. the input values that are used to identify locations are usually called "keys" # hash function ## considerations * range of possible key values * distribution to destination locations * calculation effort * fixed length or variable length key values ## examples ### identity function simple array indexing, pointer arithmetic. this one only works if no key is equal or greater than the size of the array ~~~ array-start-address + integer-key ~~~ ### modulo ~~~ target-index = integer-key % last-array-index ~~~ the hash table size should be a prime or at least odd. modulo uses division, and primes have the lowest number of divisors; the number of possible collisions is reduced. even size-values would lead to even hash-values for even keys and odd hash-values for odd keys ### multiplication ~~~ m = some-value-perhaps-arbitrary-chosen A = (sqrt(5)-1)/2 = 0.6180339887 floor(m * (k * A - floor(k * A))) ~~~ ### more * [burtleburtle.net/bob/hash/index.html](http://burtleburtle.net/bob/hash/index.html) * [cse.yorku.ca/~oz/hash.html](http://www.cse.yorku.ca/~oz/hash.html) * [faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture11.html](http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture11.html) so called perfect hash functions create an unique hash for every key. for the creation of such a function all possible key values have to be known and somewhat considered beforehand # collisions because the set of possible keys is usually larger than the set of possible hashes it is possible, and likely, that the hash function creates the same hash for different keys ## some ways to resolve collisions ### chaining keys map to an array of addresses to linked lists. the linked lists are used to contain possibly several elements for array locations ### linear probing when a value is about to be inserted and a collision is found, the next free location is used. every access of the value then requires a similar search ### quadratic probing like linear probing, but the next free location is found by, if necessary repeatedly, multiplying the index with itself, wrapping the array size. depending on the key values, for example if they are nearly linear or more random, this distribution can reduce clustering of keys and can make collisions less likely. linear probing might have performance advantages through memory locality # container often a combination of arrays and linked lists ## size big enough to store the desired number of elements, and usually bigger to reduce the likelihood of collisions ## content elements can include keys, are often structs or references to other containers # external ## more detailed information about the topic [course material from the university of auckland](https://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html) ## c implementations * [murmur3](https://github.com/PeterScott/murmur3) hash function * [klib khash](https://github.com/attractivechaos/klib/blob/master/khash.h) hash table * [lookup3.c](http://burtleburtle.net/bob/c/lookup3.c) hash functions * [imht-set](https://github.com/sph-mn/sph-sc-lib#imht-set) a set data structure that uses a minimalistic hash table ## perfect hash function generator libraries * [gperf/](http://www.gnu.org/software/gperf/) * [cmph](http://cmph.sourceforge.net/index.html)