[R6RS] Table proposal

Marc Feeley feeley
Sun Apr 24 00:12:53 EDT 2005


To followup on the hash table proposal Manuel presented in Snowbird,
I am contributing this proposal for "tables" which offer
similar functionality, but with a different interface and
with extended features.

Marc


TABLE PROPOSAL
==============

Tables are an abstract and efficient way of associating keys and
values.  Tables are conceptually related to association lists but are
abstract and mutable.  A possible efficient implementation is based on
hash tables, but they could also be implemented with other data
structures (binary search trees, ...).  This proposal is biased
towards a hash table implementation, but the proposal could be
extended in the future to allow the user to choose the implementation
when a table is created (using optional parameters).

Here is a summary of the procedures:

   (make-table [optional params])         create a table

   (table? obj)                           test if obj is a table

   (table-length table)                   number of key/value bindings

   (table-ref table key [default])        get value associated with key

   (table-set! table key [value])         set value associated with key

   (table-for-each proc table)            apply proc on all key/value 
bindings

   (table-search proc table)              search for first key/value 
binding
                                          for which proc is non false

   (table->list table)                    convert table to association 
list

   (list->table list [optional params])   convert association list to 
table

The basic interface 
(make-table/table?/table-length/table-ref/table-set!)
is analogous to the vector interface (make-vector/vector?/vector-length/
vector-ref/vector-set!).  The main difference is that tables allow any
object (the key) where the vector interface expects an index.  The
consistency in the table and vector interfaces means it is easy to
remember for the programmer.

There are also conversion procedures (table->list and list->table)
to convert between a table and its association list representation.
For list->table, when there are multiple occurences of a key, it is
the first key/value pair that takes precedence (this is consistent
with assoc/assv/assq which return the first key/value pair in the
association list that matches the key being searched).

The procedure table-ref has an optional third parameter.  If it is not
supplied and the key is not bound in the table then an exception is
raised.  If it is supplied and the key is not bound then the third
parameter is returned.  This is convenient to implement default values,
or initial values, or to test if a key is bound, i.e.:

     (table-ref table key #f)    ; <-- return #f if key not bound

     (table-set! table           ; <-- increment counter associated with 
key
                 key
                 (+ 1 (table-ref table key 0)))

     (let ((unique (list 0)))    ; <-- check if key is bound
       (if (eq? unique (table-ref table key unique))
           "key is not bound"
           "key is bound"))

The procedure table-set! mutates tables.  It can either add a new
key/value binding (if the key wasn't yet bound to a value), change
the current binding, or remove a key/value binding (when the third
parameter is absent).  The result is unspecified.

The procedure table-for-each is analogous to for-each, but iterates
over all key/value bindings in the table in an arbitrary order.
The procedure proc is dyadic (the first parameter is the key
and the second is the value).  The procedure proc is the first
parameter of table-for-each for consistency with for-each and
to allow future extensions of these procedures to more than one
table argument.

The procedure table-search iterates over the key/value bindings
similarly to table-for-each but stops at the first key/value
binding for which the procedure proc returns a non false value.
This non false value is returned by table-search.  If no key/value
binding satisfies proc, then #f is returned.

While a call to table-for-each or table-search is being performed on a
table, it is an error to call any of the following procedures to
access the table: table-ref, table-set!, table-for-each, table-search,
and table->list.  This restriction is to allow these primitives to
automatically reorganize the internal representation of the table (for
example a table-set! might add a key/value binding that causes the
table's load to exceed the maximum allowed load and require a resizing
and rehashing of the table).  Lifting this restriction could be
achieved by taking a "snapshot" of the table (for example with
table->list) and then using the snapshot to iterate over the key/value
bindings.  This would slow down these operations considerably.

The optional parameters on the make-table and list->table procedures
specify settings of the table that is created.  Here is the set of
optional parameters allowed:

   size        Exact nonnegative integer indicating the expected number 
of
               key/value bindings in the table (this information is 
purely
               advisory and can be ignored by the implementation).

   init        Value that will be returned by table-ref if the key is
               not bound and the default value is not specified.  When 
init
               is not specified when a table is created, then if the key 
is
               not bound and the default value is not specified then
               table-ref will raise an exception.

   weak-keys   Boolean indicating if the keys are held weakly.  The
               system (i.e. garbage collector) may remove key/value
               bindings when weak-keys is true and the key is only
               reachable from the roots through weak references.
               The default is weak-keys = #f.

   weak-values Boolean indicating if the values are held weakly.  The
               system may remove key/value bindings when weak-values
               is true and the value is only reachable from the roots
               through weak references.  The default is weak-values = #f.

   test        Dyadic procedure for comparing keys for equality.
               The default value of test is the equal? procedure.
               Any test procedure may be used, but it is expected that
               the implementation will treat these cases efficiently:

                  eq?
                  eqv?
                  equal?
                  string=?
                  string-ci=?

   hash        Unary procedure mapping keys found in the table to
               exact integers.  The hash procedure must be consistent
               with the test procedure, that is for any keys K1 and K2

                  (test K1 K2)  implies  (= (hash K1) (hash K2))

               The default hash procedure depends on the test procedure.
               It is expected that the implementation will provide
               efficient hash procedures for the following test
               procedures:

                  eq?
                  eqv?
                  equal?
                  string=?
                  string-ci=?

               For other test procedures the implementation may
               use a procedure that always returns the same value
               (this is consistent with any test procedure but is
               inefficient).

   min-load    Real number between 0 and 1 indicating the minimum
               acceptable load of the table (this information is purely
               advisory and can be ignored by the implementation).

   max-load    Real number between 0 and 1 indicating the maximum
               acceptable load of the table (this information is purely
               advisory and can be ignored by the implementation).

One issue is how to pass this large set of optional parameters.
Positional optional parameters are not appropriate because it is hard
to remember the parameter ordering, and it would force the user to
supply values for parameters whose default value would otherwise be
appropriate.  I believe that named parameters are needed, either named
with symbols, or self-evaluating keywords, i.e.

    (make-table 'test eq? 'weak-keys #t)

or

    (make-table test: eq? weak-keys: #t)

[I hope we can standardize on an approach for named parameters which
will be used consistently for all cases where named parameters are
advised, such as I/O procedures, GUI API's, etc.  This issue is
important but orthogonal to the table proposal, so we can discuss it
separately.]

There are other procedures we may consider for manipulating tables.
Here is a list of the most obvious.  I have no strong opinions about
these procedures except that some seem redundant and it is not always
clear what the best semantics is.

   (table-update! table key proc [default])
     Equivalent to (table-set! table key (proc (table-ref table key 
[default])))

   (table-bound? table key)
     Equivalent to (let ((unique (list 0)))
                     (not (eq? unique (table-ref table key unique))))

   (table-map proc table)
     Is the result a table, an association list, or a list of the
     results?  If a table then the table creation optional parameters
     should be permitted (but then table-map can't be extended to
     multiple tables, on the other hand it is not clear what the 
semantics
     would be for multiple tables if a key does not exist in all tables).

   (table-clear! table)
     Removes all key/value bindings from the table.  Is this truly useful
     given that it is probably faster to create a new table?

   (table-copy table)
     Returns a fresh copy of the table.  Is this useful?  After all
     vector-copy is not in R5RS (but I do think vector-copy should
     be in R6RS).

   (table-test table)
     Returns the key comparison procedure.  Other accessor procedures
     would be needed for the hash procedure, the min-load, etc.

Related work:

I've looked at various hash table specifications: Common Lisp, T,
Bigloo, Chez, MzScheme, Larceny, Scheme48, Emacs Lisp, Guile.  This
proposal is essentially what is available in Gambit 4.0.  Its features
are a superset of the features available in most of these
implementations.

A feature of Emacs Lisp not supported by this proposal is the
key-and-value weakness.  This type of weakness means that a key/value
binding can only be removed by the system when the key **and** the
value are only reachable from the roots through weak references.  I
don't think this case is particularly useful (none of the packages
distributed with Emacs uses this type of weakness).



More information about the R6RS mailing list