[R6RS] summary of decisions regarding multiple values

Thu Jun 22 08:59:41 EDT 2006

Since we're going to revote on this anyway, and because
I would rather light a candle than curse the darkness,
let me present two detailed proposals on which we can
vote.  The first concerns the interpretation of our vote
on exception issue 42, and the second reconsiders our
vote on general issue 1.)

The first proposal is that R6RS adopt the following
semantics for parameter passing and returning.

 1. The procedure that results from evaluating a lambda
    expression of the form (lambda (x1 ... xn) ...)
    expects exactly n arguments.  The procedure that
    results from evaluating a lambda expression of the
    form (lambda (x1 ... xn . rest) ...) expects at
    least n arguments.  The procedure that results
    from evaluating a lambda expression of the form
    (lambda x ...) expects any number of arguments.

 2. In safe mode, if a procedure is called with some
    number of arguments it does not expect, then a
    &values exception must be raised.  (In unsafe
    mode, all bets are off.)

 3. There is only one kind of continuation in Scheme.
    All continuations expect any number of arguments.
    The escape procedures created by call/cc expect
    any number of arguments.

 4. If a continuation created by argument evaluation
    receives more than one value, it ignores all but
    the first.  If it receives no values, it behaves
    as though the unspecified value had been returned.

 5. The values procedure returns its arguments to its
    continuation.

 6. The call-with-values procedure calls its second
    argument on the values returned by its first.
    (If that number of arguments is not what the
    second argument expected, then an exception
    must be raised, but that is a consequence of
    point 1 above and has nothing to do with
    continuations.)

This proposed semantics is simple, useful, tested
(Common Lisp and some implementations of Scheme),
and efficient.

The second proposal is contingent upon the first.
If the first proposal is adopted, then I propose
that set!, set-car!, etc be required to return zero
values.  (This is almost consistent with requiring
them to return the unspecified value, because the
only way to distinguish the two would be to use
call-with-values.)

Kent wrote:
> Requiring a begin continuation to accept zero values and set!, write,
> etc., to return zero values might seem more consistent, but that would
> break a lot of r5rs code.  Requiring a begin continuation to accept only a
> specific single value, the "unspecified" value, might also seem more
> consistent, but that would still break some r5rs code.

My two proposals above are perfectly consistent, and
neither would break a single line of R5RS-conforming
code.

Now, alas, I must curse some darkness.

Before Kent wrote the paragraph I quoted above,
he wrote:
> Good question.  Here's my answer.  If we're going to require that a begin
> continuation ignore a single, arbitrary value, it makes sense that it
> ignore all of the values.  In other words, using begin is essentially a
> declaration that the values don't matter and should be ignored.  For a
> continuation that doesn't ignore at least one of its values, it makes
> sense to require that it be passed the right number of values.

Those requirements make sense in the sense that they are
not complete nonsense, but not being complete nonsense is
not the only property people expect of a design.  We need
a genuine rationale.

> One reason is to detect a common programming error.  Another is to help
> ensure that programs intended to be portable don't count on
> implementation-specific behavior, like ignoring all but the first value
> passed to a single-value continuation or manufacturing some arbitrary
> single value when no values are passed to a single-value continuation.

Those are two rationales.  The first is genuine, and I
will look at it more carefully in a moment.  The second
is a genuine rationale for having some well-specified
semantics, but it is not a genuine rationale for any
particular semantics.

In particular, my first proposal resolves the portability
issue at least as well as the semantics you described.  In
a practical sense, I believe my first proposal would make
programs more portable than with the other semantics,
because four or five of the world's six and a half major
Scheme compilers are unlikely to implement the semantics
you described, even if that makes them non-compliant.

Let's examine the one and only genuine rationale that has
ever been suggested for the semantics you described: it
detects a common programming error.  To decide how common
it is, we have to understand the nature of the error, and
whether it is really an error.

In R5RS Scheme, returning anything but a single value to
a continuation that was not created by call-with-values
is definitely an error, simply because the R5RS says so.
That aspect of the R5RS is completely arbitrary, however.
A majority of the R6RS editors apparently believe it
should not be considered an error, since we have voted
to require continuations created by begin to accept any
number of arguments.

If that vote stands, then ignoring some of the values
returned to a continuation will not be considered an
error in R6RS Scheme.

Even though it is not always an error to ignore some
of the values returned to a continuation, one could
choose some particular syntactic construct and argue
that, for that particular syntactic context, it is
almost always an error to ignore some of the values,
and that we would do programmers a favor by forbidding
them to ignore values in those contexts.

Any such prohibition would introduce an irregularity
into the language and violate the principle "that a
very small number of rules for forming expressions,
with no restrictions on how they are composed" is
what Scheme is about.

That doesn't mean we should dismiss any proposed
restrictions out of hand, but it does mean those who
argue for the specific restriction we are considering
must bear the burden of establishing that programmers
seldom want to ignore any of the values that are
returned to an operand context.

That burden cannot be met.  The usefulness of ignoring
extra values has been demonstrated in Common Lisp and
Bigloo.  I have long advocated the semantics proposed
at the top of this essay, even before multiple values
were added to Scheme, and have always said this would
eventually be the semantics of multiple values in
Larceny.  (In fact, I have implemented this semantics,
and was just taking the precaution of confirming my
understanding of the R6RS semantics before checking my
implementation of it into Larceny's main code base.)

When I compare the semantics described at the top of
this essay to the semantics that Kent described, the
semantics I favor wins out with respect to simplicity
(of semantics, use, and implementation), consistency,
tradition, reusability, interoperability, and efficiency.

Some of its simplicity is obvious, but let me point
out some practical benefits of that simplicty.

My second proposal above is one immediate benefit:
procedures that return no useful value could return
zero values without breaking any R5RS-portable code.

Another benefit is that transformations from source
code into CPS and A-normal form, and optimizations
of those forms that appear throughout the research
literature, will continue to apply to Scheme.  With
the semantics Kent described, almost all of those
transformations would break, as would compilers that
use them.  Scheme would lose much of its traditional
usefulness for research into program transformations.

To understand how my proposed semantics would improve
reusability while simplifying interfaces, consider
the following arithmetic procedures from SRFI 77:

    fixnum-div+mod
    fixnum-div
    fixnum-mod
    fixnum-div0+mod0
    fixnum-div0
    fixnum-mod0
    fldiv+mod
    fldiv
    flmod
    fldiv0+mod0
    fldiv0
    flmod0
    exact-div+mod
    exact-div
    exact-mod
    exact-div0+mod0
    exact-div0
    exact-mod0
    inexact-div+mod
    inexact-div
    inexact-mod
    inexact-div0+mod0
    inexact-div0
    inexact-mod0
    div+mod
    div
    mod
    div0+mod0
    div0
    mod0
    fixnum+/carry
    fixnum-/carry
    fixnum*/carry
    fixnum+
    fixnum-
    fixnum*

With the semantics proposed at the top of this rant,
the div procedures could return both the dividend and
remainder as values, and the mod procedures could
return both the remainder and the dividend.  All of
the div+mod procedures would become equivalent to the
div procedures, and would go away.  Similarly the
fixnum+, fixnum*, and fixnum- procedures could return
both the low-order bits and the carry, making the
fixnum+/carry, fixnum*/carry, and fixnum-/carry
procedures redundant.

This would be just as efficient as having all of the
SRFI 77 procedures, because it is easy for a compiler
to recognize a known call to fixnum+ (for example)
whose context obviously ignores the carry, and turn
it into machine code that doesn't generate the carry.

> The overhead isn't significant in Chez Scheme, so I'm not convinced by the
> overhead argument, any more than I'm convinced by overhead arguments in
> other situations where we've required exceptions to be raised.

The editors have an obligation to consider the overhead
for implementations other than Chez Scheme.  In Chez
Scheme, IIRC, the values procedure must look inside its
continuation to determine the number of arguments that
the continuation expects.  This technique isn't really
practical for systems that compile to portable C code,
or for systems that must interoperate with Java or C#
continuations that cannot be examined.

The semantics I propose is much easier to implement
efficiently when compiling to C or to the JVM or CLR.
It is already implemented in Bigloo and (what I hope
will be) the next version of Larceny, and it would be
an easy semantics for Gambit to adopt.

In Larceny, the semantics Kent described would
increase the cost of non-tail calls by about 12%,
even in programs that don't use multiple values at
all.  That would violate the generally recognized
principle that programs should not pay for language
features they don't use.

Finally, the semantics I am proposing is also fast
for programs that make heavy use of multiple values,
even without the sophistication of Kent's published
algorithm for multiple values.  On a SunBlade 1500,
one iteration of a micro-benchmark that repeatedly
uses call-with-values to call an unknown procedure
that immediately returns two integer values takes
about 88 nanoseconds in Chez Scheme (opt-level 2),
with the semantics Kent described.  In Larceny,
with the semantics I am proposing, it takes about
102 nanoseconds.

Those numbers should be compared to the cost of
performing two separate calls to procedures that
return those values separately: 100 nanoseconds
in Chez Scheme, 130 nanoseconds in Larceny.

Will