Hash tables

A hash table is a data structure that:

provides a mapping from some set of keys to some set of values associated to those keys;
has no intrinsic order for the (key, value) associations it contains;
supports in-place modification as the primary means of setting the contents of a hash table; and
provides key lookup and destructive update in amortised constant time, provided that a good hash function is used.

The Kawa hash table functions follow the SRFI-69 specifiation. The Kawa implementation has been optimized for performance and better Java integration. Specifically, the default hash function uses the standard Java hashCode method.

To use the hash table functions in your Kawa program you must first:

(require 'srfi-69)

(require 'hash-table)

Type constructors and predicate

Function: make-hash-table [ equal? [ hash [ size-hint]]] → hash-table

Create a new hash table with no associations. The equal? parameter is a predicate that should accept two keys and return a boolean telling whether they denote the same key value; it defaults to the equal? function.

The hash parameter is a hash function, and defaults to an appropriate hash function for the given equal? predicate (see the Hashing section). However, an acceptable default is not guaranteed to be given for any equivalence predicate coarser than equal?, except for string-ci=?. (The function hash is acceptable for equal?, so if you use coarser equivalence than equal? other than string-ci=?, you must always provide the function hash yourself.) (An equivalence predicate c1 is coarser than a equivalence predicate c2 iff there exist values x and y such that (and (c1 x y) (not (c2 x y))).)

The size-hint parameter can be used to suggested an approriate initial size. This option is not part of the SRFI-69 specification (though it is handled by the reference implementation), so specifying that option might be unportable.

Function: hash-table? obj → boolean

A predicate to test whether a given object obj is a hash table.

Function: alist->hash-table alist [ equal? [ hash [ size-hint]]] → hash-table

Takes an association list alist and creates a hash table hash-table which maps the car of every element in alist to the cdr of corresponding elements in alist. The equal?, hash, and size-hint parameters are interpreted as in make-hash-table. If some key occurs multiple times in alist, the value in the first association will take precedence over later ones. (Note: the choice of using cdr (instead of cadr) for values tries to strike balance between the two approaches: using cadr would render this procedure unusable for cdr alists, but not vice versa.)

Reflective queries

Function: hash-table-equivalence-function hash-table

Returns the equivalence predicate used for keys of hash-table.

Function: hash-table-hash-function hash-table

Returns the hash function used for keys of hash-table.

Dealing with single elements

Function: hash-table-ref hash-table key [ thunk ] → value

This procedure returns the value associated to key in hash-table. If no value is associated to key and thunk is given, it is called with no arguments and its value is returned; if thunk is not given, an error is signalled. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Function: hash-table-ref/default hash-table key default → value

Evaluates to the same value as (hash-table-ref hash-table key (lambda () default)). Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Function: hash-table-set! hash-table key value → void

This procedure sets the value associated to key in hash-table. The previous association (if any) is removed. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Function: hash-table-delete! hash-table key → void

This procedure removes any association to key in hash-table. It is not an error if no association for the key exists; in this case, nothing is done. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Function: hash-table-exists? hash-table key → boolean

This predicate tells whether there is any association to key in hash-table. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Function: hash-table-update! hash-table key function [ thunk ] → void

Semantically equivalent to, but may be implemented more efficiently than, the following code:
(hash-table-set! hash-table key
                 (function (hash-table-ref hash-table key thunk)))

Function: hash-table-update!/default hash-table key function default → void

Behaves as if it evaluates to (hash-table-update! hash-table key function (lambda () default)).

Dealing with the whole contents

Function: hash-table-size hash-table → integer

Returns the number of associations in hash-table. This operation takes constant time.

Function: hash-table-keys hash-table → list

Returns a list of keys in hash-table. The order of the keys is unspecified.

Function: hash-table-values hash-table → list

Returns a list of values in hash-table. The order of the values is unspecified, and is not guaranteed to match the order of keys in the result of hash-table-keys.

Function: hash-table-walk hash-table proc → void

proc should be a function taking two arguments, a key and a value. This procedure calls proc for each association in hash-table, giving the key of the association as key and the value of the association as value. The results of proc are discarded. The order in which proc is called for the different associations is unspecified.

Function: hash-table-fold hash-table f init-value → final-value

This procedure calls f for every association in hash-table with three arguments: the key of the association key, the value of the association value, and an accumulated value, val. The val is init-value for the first invocation of f, and for subsequent invocations of f, the return value of the previous invocation of f. The value final-value returned by hash-table-fold is the return value of the last invocation of f. The order in which f is called for different associations is unspecified.

Function: hash-table->alist hash-table → alist

Returns an association list such that the car of each element in alist is a key in hash-table and the corresponding cdr of each element in alist is the value associated to the key in hash-table. The order of the elements is unspecified.

The following should always produce a hash table with the same mappings as a hash table h:
(alist->hash-table (hash-table->alist h)
                        (hash-table-equivalence-function h)
                        (hash-table-hash-function h))

Function: hash-table-copy hash-table → hash-table

Returns a new hash table with the same equivalence predicate, hash function and mappings as in hash-table.

Function: hash-table-merge! hash-table1 hash-table2 → hash-table

Adds all mappings in hash-table2 into hash-table1 and returns the resulting hash table. This function may modify hash-table1 destructively.

Hashing

Hashing means the act of taking some value and producing a number from the value. A hash function is a function that does this. Every equivalence predicate e has a set of acceptable hash functions for that predicate; a hash funtion hash is acceptable iff (e obj1 obj2) implies (= (hash obj1) (hash obj2)).

A hash function h is good for a equivalence predicate e if it distributes the result numbers (hash values) for non-equal objects (by e) as uniformly as possible over the numeric range of hash values, especially in the case when some (non-equal) objects resemble each other by e.g. having common subsequences. This definition is vague but should be enough to assert that e.g. a constant function is not a good hash function.

When the definition of make-hash-table above talks about an appropriate hashing function for e, it means a hashing function that gives decent performance (for the hashing operation) while being both acceptable and good for e. This definition, too, is intentionally vague.

The Kawa implementation always calls the hash functions with a single parameter, and expects the result to be within the entire (32-bit signed) <int> range, for compatibility with standard hashCode methods.

Function: hash object [ bound ] → integer

Produces a hash value for object in the range from 0 (inclusive) tp to bound (exclusive).

If bound is not given, the Kawa implementation returns a value within the range (- (expt 2 32)) (inclusive) to (- (expt 2 32) 1) (inclusive). It does this by calling the standard hashCode method, and returning the result as is. (If the object is the Java null value, 0 is returned.) This hash function is acceptable for equal?.

Function: string-hash string [ bound ] → integer

The same as hash, except that the argument string must be a string. (The Kawa implementation returns the same as the hash function.)

Function: string-ci-hash string [ bound ] → integer

The same as string-hash, except that the case of characters in string does not affect the hash value produced. (The Kawa implementation returns the same the hash function applied to teh lower-cased string.)

Function: hash-by-identity object [ bound ] → integer

The same as hash, except that this function is only guaranteed to be acceptable for eq?. Kawa uses the identityHashCode method of java.lang.System.