[R6RS] comments on rough draft for hash tables

Wed Mar 8 10:33:32 EST 2006

Thanks for the quick response, Will.  I've made changes in SVN in 
response to all of your points, as detailed below.

> I'd prefer hash-table-get/default,
> by analogy with the name used by SRFI-69:
> hash-table-ref/default.

Done.  For consistency, I've also renamed all the hash-table-get/xxx 
procedures to hash-table-ref/xxx, which matches SRFI-69's convention.

> We also should have something like
> 
> Procedure: hash-table-get/call hash-table key f => object
> 
>     Returns the value associated with key in the hash-table
> if the hash table contains key; otherwise tail-calls f on key.
> 
> In my opinion, hash-table-get/call is more useful and more
> efficient than hash-table-get/thunk, which often requires
> the user to obtain the thunk by writing things like
> (lambda () (f key)).

Added.

>>Procedure: hash-table-contains hash-table
> 
> 
> This should be
> 
> Procedure: hash-table-contains? hash-table key

Fixed.

> If we change /flag to /default or add the /call version,
> then we should make the corresponding change or addition
> to the update procedures.

Both done.  (Any "unnecessary" procedures can be cut later.)

>>Procedure: hash-table-fold hash-table procedure init => value
>>Procedure: hash-table-for-each procedure hash-table => unspecified
> 
> 
> What happens if the procedure adds or deletes keys from
> the hash-table?  

The issue item about "Side effects in thunks" was intended to indicate 
that this is yet to be addressed.  I've relabeled it as "Side effects in 
higher-order procedures", and changed the language to include "This 
should be addressed somehow, if only by a statement that the behavior 
caused by such procedures is unspecified."

> The specification of hash-table-clear!
> implies that this is allowed.

The code in question was a leftover from earlier work.  I've deleted it.

>>Procedure: eq-hash object => integer
>>
>>    Returns a hash value for object based on its location.
>>This procedure is the hash function used by make-eq-hash-table.
> 
> 
> No, we shouldn't guarantee that this procedure is used by
> make-eq-hash-table.  In some systems it may be, but it is
> unlikely to be the same hash function in systems whose
> garbage collectors move objects.  
...
> Hah.  The implementors of MIT Scheme did not anticipate
> garbage collectors that run 100 times per second.

Removed.  That behavior arose from a desire to have the reflection 
procedures produce something useful and consistent with the rest of the 
spec, when called on an eq or eqv hash table:

>>Procedure: hash-table-hash-function hash-table => procedure
>>
>>    Returns the hash function used by hash-table.
> 
> 
> Should these procedures be defined on hash tables that
> were created by calling make-eq-hash-table or
> make-eqv-hash-table?  

You're right that hash-table-hash-function shouldn't be, so I've added 
provisional language to that effect.  What about 
hash-table-equivalence-predicate ?

> O(1) retrieval cannot be guaranteed for arbitrary hash
> functions.  You can't even guarantee amortized O(1)
> retrieval for the system's hash functions.  The best
> you can do is to guarantee average-case amortized O(1)
> retrieval for the system's hash functions.

Good point.  I took that example from SRFI 69, which actually uses 
language such as "Given a good hash function, this operation should have 
an (amortised) complexity of O(1) with respect to the number of 
associations in hash-table."  This is now quoted in the issue text.  I 
mainly wanted to document the question of whether the proposal should 
include such constraints on implementations.

>>Concurrency: R6RS does not deal with concurrency.  Should
>>this proposal say any thing about it?
> 
> 
> We might not say anything about it, but we need to think
> about it.  Any implementation that supports concurrency is
> going to have to implement some kind of mutual exclusion
> for operations that have side effects, and some will need
> a bit of mutual exclusion even for hash-table-ref.
> 
> As specified, the updating operations are not atomic, so
> they create no new problems.  (Some implementors might try
> to implement them as atomic operations, but we can let
> them learn the hard way.)
> 
> The hash-table-fold and hash-table-for-each procedures
> already have a problem, even without concurrency.

I've added these remarks to the issue text.

--
Anton