layout: post
title: objects as hash keys
-
{{ page.title }}
7 November 2011
what others do
Hash tables with complex keys – keys that are not strings. Ideally a hash should be able to store any object as key; well Python solves this problem by introducing tuples – read only sequences can be key to a hash; which results in the introduction of yet another entity. Perl does not bother with this problem – if you insert a Array as key, then it just casts its to scalar – takes the length of the array as key that is; not very intuitive either.
Entities that can be stored as keys to an array
- scalars; numbers. For a string key we have to make a new copy of the string when holding it in the hash entry.
- Sequences,vectors have to be copied by values, two vectors can be compare by comparing its elements; If the elements of an array are hash collections, then the hash collections have to be compared (ouch);
the problem
what to do if a hash table is inserted as key, or indirectly, if a hash table is part of an array, that is inserted as key. The problem is how to compare different hash table instances effectively, and how to compute the hash value of a hash object as key.
it can be done
to compute the hash key;
- for scalars as keys: compute the hash key
- for arrays as keys: compute the hash value of each element and combine them in sequence
- for hash tables as keys: create a sorted ordering of the keys, compute the hash value of each element and combine them in the order of the sorted keys.
How to sort an arbitrary set of values:
- all values of the same type are compared with entries of their own kind (integers with integers, strings with strings, arrays with arrays, hashes with hashes)
comparing two arrays
- two arrays are equal if both have the same number of elements, and all elements are equal.
- Array a is smaller than array b
- if a has less elements than b
- if an element a[i] is less then element b
comparison of hashes.
- If hash a has less keys then hash b then a is smaller than b
- if both hashes have same number or keys, sort the keys;
- compare keys after sorting if a.key[i] < b.key[i] then a is smaller then b
- if a.key[i] == b.key[i] then compare v1 = a.lookup( a.key[i] ) and v2 = b.lookup( b.key[i] )
- if v1 < v2 then a is smaller then b.
- if all are equal then both hash instances are equal.
is it all really that bad?
First of all the problem is mitigated if each element of the hash collection stores the hash value of the key; in this case the key has to be compared only in cases when the hash value of the given key is identical with a hash value of the collection entry. Even then most chances are that the ordered sequence of keys must not be created.
Also the sorted sequence of keys will only be computed on demand.
Is all this worth the effort ?
In a way yes; otherwise arrays as hash keys may not hold hashes; also yet another mind bending restriction is avoided as a start;