Formal comment #146 (defect)


Mustard must not be preposterous
Status: new
Reported by: 	William D Clinger
Version:	5.92

Quoting RFC 2119, section 5.1 says the word "must" means that a
statement is an absolute requirement of the specification.

Unfortunately, the draft's use of the word "must" repeatedly fails to
distinguish between absolute requirements that are imposed upon
programmers and absolute requirements that are imposed upon
implementations. The draft often requires implementations to perform
impractical or even undecidable checking of arguments. In some (but
not all) cases, these requirements are combined with explicit
"Implementation requirements" that are far more reasonable. It appears
that the general policy for interpreting "must" requirements, as
stated in section 5.2, does not quite say what it may have been
intended to say.

In particular, section 5.2 says:

        Descriptions of procedures may express other restrictions on
        the arguments of a procedure. Typically, such a restriction is
        formulated as a phrase of the form "x must be a ...". (or
        otherwise using the word "must.) [sic] If the description does
        not explicitly distinguish between the programmer's and the
        implementation's responsibilities, the restrictions describe
        both the programmer's responsibility, who [sic] must ensure
        that an appropriate argument is passed, and the
        implementation's responsibilities, which must check [sic] that
        the argument is appropriate.

Although implementation responsibilities are sometimes marked as such,
most of the "other restrictions" mentioned by the paragraph quoted
above do not explicitly distinguish between the programmer's and the
implementation's responsibilities. According to the above, this means
implementations "must check that the argument is appropriate".

Since these checks are often undecidable, and checking them is an
absolute requirement (because of the word "must"), the 5.92 draft
specification cannot be implemented.

I went through the LaTeX sources for the 5.92 draft, examining every
use of the word "must" to see whether I believe it is stating an
absolute requirement. Where "must" is used to describe some
restriction on the arguments to a procedure, without explicitly
distinguishing between the programmer's responsibilities and the
implementation's, I considered whether it is reasonable for the
specification to require implementations to check the
restriction. What follows is a list of the specific errors I believe I
found.

****************************************************************

Page 4, just before the acknowledgements:

    With respect to future viability, the editors have operated under
    the assumption that many more Scheme programs will be written in
    the future than exist in the present, so the future programs are
    those with which we must be most concerned.

Whether the editors are most concerned with the future is
undecidable. Should, not must.

****************************************************************

Page 10, left column, last paragraph:

    For example, indexes into data structures must be known exactly,
    as must some polynomial coefficients in a symbolic algebra system.

Implementations should not be required to determine whether anyone
knows the indexes into data structures or coefficients in a symbolic
algebra system.

****************************************************************

Page 20, semantics of the word "should":

    This word, or the adjective recommended, mean [sic] that valid
    reasons max exist in particular circumstances to ignore a
    statement, but that the implications must be understood and
    weighed before choosing a different course.

Whether the implications are understood is undecidable.

****************************************************************

Page 26, just before the rationale:

    Thus, a portable library must reference identifiers only in phases
    consistent with the declared levels, and the library's meaning
    must not depend on whether the instances of a library are
    distinguished or shared across phases or library expansions.

Whether the library's meaning depends on whether the instances of a
library are distinguished or shared across phases or library
expansions is probably undecidable.

****************************************************************

Page 29, second column, second paragraph:

    A definition in the sequence of forms must not define any
    identifier whose binding is used to determine the meaning of the
    undeferred portions of the definition or any definition that
    precedes it in the sequence of forms.

I think the macro experts decided that it was not practical for
implementations to enforce this.

****************************************************************

Page 31, specification of define-syntax:

    This binds the keyword variable to the value of expression, which
    must evaluate, at macro-expansion time, to a transformer.

Whether an expression evaluates to a transformer is
undecidable. (Whether an expression evaluates to anything at all is
already undecidable.) Should, not must.

****************************************************************

Page 33, specification of case:

    Key must be any expression.

The R6RS can reasonably require key to be an expression, but cannot
reasonably require it to be *any* expression.

****************************************************************

Page 35, specification of letrec:

    One restriction on letrec is very important: it must be possible
    to evaluate each init without assigning or referring to the value
    of any variable.

It is undecidable whether there exists an order of evaluation that can
avoid such assignments or references.

****************************************************************

Page 35, specification of letrec*:

    One restriction on letrec* is very important: it must be possible
    to evaluate each init without assigning or referring to the value
    the corresponding variable or the variable of any of the bindings
    that follow it in bindings.

It is undecidable whether there exists an order of evaluation for the
right hand sides of the bindings that can avoid such assignments or
references. The R6RS should permit implementations to evaluate
expressions in some particular order, without requiring
implementations to perform an exhaustive search of the
order-of-evaluation space at run time, complete with the overhead
necessary to undo any side effects that are performed by evaluations
that run afoul of the restriction.

****************************************************************

Page 45, specification of number->string:

    Radix must be an exact integer, either 2, 8, 10, or 16.

Page 46, specification of string->number:

    Radix must be an exact integer, either 2, 8, 10, or 16.

Many implementations of Scheme have generalized number->string and
string->number to accept other radixes. The editors may truly intend
to rule out such generalizations, but the ubiquity of mistaken mustard
made me question this one.

****************************************************************

Page 48, specification of list-tail:

    List must be a list of size at least k.

Requiring implementations to check whether list is a list contradicts
explicit language in the "Implementation responsibilities" just four
lines down.

****************************************************************

Page 48, specification of list-ref:

    List must be a list of size at least k+1.

This too contradicts explicit language in the "Implementation
responsibilities".

****************************************************************

Page 48, specifications of map and for-each:

    The lists must all have the same length. Proc must be a procedure
    that takes as many arguments as there are lists and returns a
    single value. Proc must not mutate any of the lists.

For programs that use mutable pairs, implementations cannot prevent
proc from mutating the lists, and it is unreasonable to require them
to check for such mutations. (Consider what would be necessary to
check for such mutations if the proc never returns from its first
call.) The vague language under "Implementation responsibilities"
suggests that the editors may not intend to require map even to check
that the lists are lists when proc does not terminate. Finally, it is
unreasonable to require arity checking when all of the lists are
empty.

****************************************************************

Page 54, specification of dynamic-wind:

    Before, thunk, and after must be procedures accepting zero
    arguments and returning any number of values.

Arity checking of thunk and after should not be required unless they
are actually called.

****************************************************************

Page 56, In "Binding constructs for syntactic keywords":

    A let-syntax or letrec-syntax form may also appear in an
    expression context, in which case the forms within their bodies
    must be expressions.

It is undecidable whether a form is an expression, because some
transformer that is called during expansion of the form may fail to
terminate. That particular undecidability is pervasive, but I'll
mention it just this once.

****************************************************************

Page L5, first sentence of the "Bytevectors" chapter:

    Many applications must deal with blocks of binary data by
    accessing them in various ways---extracting signed or unsigned
    numbers of various sizes.

The R6RS should not mandate the existence of applications that deal
with blocks of binary data.

****************************************************************

Page L9, specification of find:

    Proc must be a procedure that takes a single argument and returns
    a single value. Proc must not mutate list.

Arity checking should not be required unless proc is
called. Implementations should not be required to check for mutation
in any case.

The "Implementation responsibilities" say the right thing, but can't
undo the damage caused by the use of "must" in the first two sentences
of the specification. That's the last time I'll mention
"Implementation responsibilities" that contradict the language I am
quoting.

****************************************************************

Pages L9, specifications of for-all and exists:

    The lists must all have the same length, and proc must be a
    procedure that accepts n arguments and returns a single
    value. Proc must not mutate the list arguments.

Arity checking should not be required unless proc is called. If proc
is called and does not return, implementations should not be required
to determine whether the other arguments are lists or whether they all
have the same length. Implementations should not be required to
determine whether proc mutates the lists.

****************************************************************

Pages L9-10, specifications of filter and partition:

    Proc must be a procedure that accepts a single argument and
    returns a single value. Proc must not mutate list.

Arity checking should not be required unless proc is
called. Implementations should not be required to determine whether
proc mutates the list.

****************************************************************

Page L10, specifications of fold-left and fold-right:

    The lists must all have the same length. Combine must be a
    procedure that takes one more argument than there are lists and
    returns a single value. Combine must not mutate the list
    arguments.

Arity checking should not be required unless combine is called. If
combine is called and does not return, implementations should not be
required to determine whether the other arguments are lists or whether
they all have the same length. Implementations should not be required
to determine whether combine mutates the lists.

****************************************************************

Pages L10-11, specifications of remp, remove, remv, remq, memp,
member, memv, memq, assp, assoc, assv, assq:

    Proc must be a procedure that takes a single argument and returns
    a single value. Proc must not mutate the list arguments.

Arity checking and number-of-return-value-checking should not be
required unless proc is called. Implementations should not be required
to determine whether proc mutates the lists.

****************************************************************

Page L12, specifications of list-sort and vector-sort:

    Proc must be a procedure that accepts any two elements of the list or vector.

Requiring implementations to test whether proc accepts any two
elements of the list or vector turns an O(n lg n) algorithm into an
O(n2) algorithm.

****************************************************************

Page L14, specification of make-record-constructor-descriptor:

    If protocol is a procedure, it is called by record-constructor
    with a single argument p and must return a procedure that creates
    and returns an instance of the record type using p as described
    below.

That is undecidable. Implementations cannot even tell whether protocol
will return. Should, not must.

    The procedure returned by protocol may take any number of
    arguments but must call new with the number of arguments it
    expects and return the resulting record instance, as shown in the
    simple example below.

Whether protocol calls new at all is already undecidable. Should, not
must.

****************************************************************

Page L20, specification of with-exception-handler:

    Handler must be a procedure that accepts one argument.

Arity checking should not be required unless the handler is
called. (In many implementations of Scheme, there is no way for the
implementation to test the arity of a procedure without calling
it. Even in systems that can test for arity without calling a
procedure, the procedure may immediately delegate to some other
procedure that does not have the same arity.)

****************************************************************

Page L21, fourth paragraph (rationale) of section 6.2:

    Consequently, to facilitate effective handling of exceptions,
    conditions must communicate as much information as possible as
    accurately as possible, and still allow effective handling by code
    that did not precisely anticipate the nature of the exception that
    occurred.

Whether conditions communicate as much information as possible as
accurately as possible is undecidable. Should, not must.

****************************************************************

Page L27, specification of &i/o-decoding:

    If the exception handler returns, it must return a character or
    string representing the decoded text starting at the port's
    current position, and the exception handler must update the port's
    position to point past the error.

The exception handler is expected to return a character or string
representing the decoded text, and should update the port's position,
but implementations can't possibly enforce that. The specification of
&i/o-encoding gets this right.

****************************************************************

Page L30, specification of the read! argument to make-custom-binary-input-port:

    The read! procedure must return the number of bytes that it
    writes, as an exact integer.

Page L33, specification of the write! argument to
make-custom-binary-output-port:

    In any case, the write! procedure must return the number of bytes
    that it reads, as an exact integer.

Should (or "is expected to"), not must.

****************************************************************

Page L34, specification of put-datum:

    If put-datum is used to write several subsequent external
    representations to an output port, care must be taken to delimit
    them properly so they can be read back in by subsequent calls to
    get-datum.

Taking care is not an absolute requirement. Should, not must.

****************************************************************

Page L34, specification of call-with-input-file:

    Proc must be a procedure accepting a single argument.

Arity checking should not be required if an i/o exception is raised
before proc is called.

****************************************************************

Page L35, specifications of with-input-from-file and with-output-to-file:

    Thunk must be a procedure that takes no arguments.

Arity checking should not be required if an i/o exception is raised
before thunk is called.

****************************************************************

Page L44, first paragraph:

    In define-syntax (report section 9.3), let-syntax, and
    letrec-syntax forms (report section 9.20), a binding for a
    syntactic keyword must be an expression that evaluates to a
    transformer.

Whether an expression evaluates to a transformer is
undecidable. (Whether an expression evaluates to anything at all is
already undecidable.) Should, not must.

****************************************************************

Page L44, specification of make-variable-transformer:

    Proc must be a procedure that accepts one argument, a wrapped syntax object.

This too is undecidable.

****************************************************************

Page L50-51:

    A hash function is a procedure that maps keys to integers, and
    must be compatible with the equivalence function, which is a
    procedure that accepts two keys and returns true if they are
    equivalent, otherwise returns #f.

It is undecidable whether the hash function is compatible with the
equivalence function.

****************************************************************

Page L51, specification of make-hash-table:

    Hash-function will be called by other procedures described in this
    chapter with a key as argument, and must return a non-negative
    exact integer.

The make-hash-table procedure cannot tell whether hash-function would
always return a non-negative exact integer when called with a key. The
make-hash-table procedure doesn't even know what kinds of keys will be
used.

****************************************************************

Will


RESPONSE:

The issues raised in the formal comment are mostly editorial
oversights and will be addressed in the next draft of the report.