[R6RS] my formal comment on the formal semantics

Sun Apr 8 13:59:57 EDT 2007

I finally finished reading the formal semantics and
wrote up my formal comment on it, which I submitted
a few minutes ago.  It may take a few days for that
formal comment to show up on r6rsDiscuss, and we're
planning to discuss the formal semantics Wednesday,
so I'm posting this duplicate of my formal comment
for the editors' consideration in advance of our
Wednesday telephone conference.

Will

--------

Submitter: William D Clinger
Email address: will at ccs.neu.edu
Issue type: Defect
Priority: Critical
Component: Formal Syntax
Report version: 5.92
Summary: the formal semantics should be a non-normative appendix

Full description of issue:

1.  The summary says that "Appendix 10 provides a formal
    semantics for a core of Scheme", but there is no
    appendix 10.  There is only a chapter 10, giving
    the impression that chapter 10 is supposed to be
    normative.

2.  Chapter 10 describes a rewriting system, not a formal
    semantics.

3.  That rewriting system is incomplete and also contains
    several errors.  If the rewriting system is taken as
    normative, then it must be completed, its errors
    repaired, and its semantic import clarified before
    the R6RS can be ratified.

4.  The rewriting system is unnecessarily complicated,
    which reduces its usefulness.

Suggestions:  Move the rewriting system into an appendix
that is clearly identified as non-normative.  Convert
the rewriting system into a semantics by using it to
describe the observable meaning of a program.  Correct
the errors and describe a semantics.  Simplify.

I will discuss these four issues, and then proceed to a
tediously detailed list of the errors I happened to spot
while reading chapter 10.

Normative vs non-normative
==========================

I would love to have a normative formal semantics for
Scheme, but the rewriting system in chapter 10 contains
too many errors and is too complicated for it or any
quick revision of it to be considered normative in the
R6RS.  It should be noted that the formal semantics of
previous reports was never considered normative, and
IEEE Std 1178-1990 was explicit about the semantics
being non-normative (and not a part of the standard).

If the formal semantics were considered normative, then
any conflict between it and the rest of the document
would render the language unimplementable in theory,
and would lead in practice to unresolvable disputes about
the semantics.  If the formal semantics is non-normative,
then any conflicts between it and the informal semantics
will be resolved in favor of the informal semantics.

Not a semantics
===============

A formal operational semantics is not just a rewriting
system; a semantics must also describe a relationship
between the rewriting system and the meanings of programs.
This is usually done by using the rewriting system to
define the observable effects and/or outcomes of programs.

Chapter 10 does not define the observable effects and/or
outcomes of programs.  Chapter 10 defines a set of possible
execution traces for each program, but the meaning of a
Scheme program should be more abstract than a set of
execution traces.  Chapter 10 should explain that
implementations of Scheme are required only to deliver
the same observable effects and/or results as one of the
execution sequences, and that programmers are responsible
for ensuring that all of the legal execution sequences
deliver observable effects and/or results that satisfy
the requirements of the program.

Incompleteness and errors
=========================

Chapter 10 makes no attempt to explain the semantics of
macro expansion or libraries.  That's fine if chapter 10
is non-normative, but a normative semantics would have
to be connected to the output of macro expansion.  That
would be difficult for R6RS Scheme, since the output of
macro expansion is explicitly implementation-dependent.

Chapter 10 omits the char, vector, and bytevector types
of (r6rs base), and omits almost all of the additional
types that are described in the standard libraries.

Chapter 10 requires several premature arity checks,
sanctions violations of the letrec* requirement, and
both over-specifies and under-specifies the semantics
of eqv?.  There are several other errors of lesser
importance, which I will list below.

Complicated
===========

Chapter 10 is overly complicated in two distinct ways:
(1) it attempts to specify the semantics of constructs
that could be (and probably would be) eliminated during
macro expansion and (2) it performs several optimizing
transformations that implementations might perform, but
are out of place in a formal semantics.

Both of these complexities have been defended on the
grounds that they make it easier for a human to follow
individual execution traces, but that misses the point
of a formal semantics.  If someone wants to understand
the execution traces of some particular program, they
will read the informal prose, or run the program in
several implementations, or employ a user interface
that presents the execution traces generated by the
rewriting system in a form that is easier for humans
to understand.  They are unlikely to look at the actual
execution traces produced by a rewriting system.

The purpose of a formal semantics is to prove theorems,
not to provide a tutorial.  The unnecessary complexity
of the rewriting system in chapter 10 makes it harder
to prove theorems.

The formal semantics should be simplified by:

  * dropping beginF forms (which will be eliminated by
    macro expansion)

  * dropping one-armed if (which can be converted into
    two-armed if by macro expansion)

  * requiring the body of a lambda expression to be
    a single expression (which the macro expander can
    easily arrange)

  * dropping the last condition of [6appN!] and dropping
    [6appN] altogether

  * dropping [6app0]

  * dropping figure 10.13 altogether

  * dropping [6beginF]

The first sentence of section 10.9, "Library Top Level",
says the semantics is limited to modelling the bodies
of libraries that don't export anything and import only
the primitives described by the semantics.  In other
words, the rewriting system does not attempt to shed
light on the problematic issues of library semantics:
macro expansion, phases, importing the same name from
different libraries, and so on.  The rewriting system
does describe a semantics for redefinition, but that
semantics is (at least partly) at odds with the prose
in chapter 6, and appears to be a mere artifact of the
formalization.  It would be better to drop section 10.9
altogether.

Detailed list of errors
=======================

Some of these have already been reported, but it is
easier for me to report them all.  They are listed in
order of occurrence, not in order of importance.  I'm
sure there are other errors I failed to notice.

page 61, paragraph 1: there's a space between the c
and o of "c ontext-".

page 61, paragraph 2, last sentence: E[ep] should be
E[e'].

page 61, paragraph 3, second display: missing curly
braces.

page 62, section 10.1, paragraph 6, first sentence:
"program" should be plural.

page 62, bullets: the bullets omit unspecified?,
raise*, and with-exception-handler.

page 63, figure 10.2, productions for PG and G:
these have no restriction on the number of values,
which may be correct but looked suspect to me.

page 63, figure 10.3: there are no productions for
(dot sym) and (dot sqv).  If the formal semantics
is supposed to use the convention that those are
equivalent to sym and sqv, it needs to say so.

page 63, last paragraph, first sentence: "are"
should be "is", since it refers to "a set".

page 64, last paragraph before 10.2, last sentence:
"are" should be "is".

page 64, section 10.4, first sentence: "workhorse"
should be "workhorses".

page 64, section 10.4, paragraph 1, sentence 3:
"is relevant" should be "are relevant".

boundary between pages 64 and 65: "P and E" should
be "E and P".

page 65, first full paragraph, sentence 1:
[6xuhee] should be [6xunee].

page 66, second full paragraph, sentence 3:
"installer" should be "installed".

page 67, section 10.6, paragraph 2, sentence 2:
it is misleading to call pp an identifier, since
the correctness of the rewriting system depends
upon pair pointers being distinct from identifiers.

page 68, figure 10.9:
[6cons] and [6pair?t] refer to pp as an individual,
but the last three rules refer to pp as a set.  The
last three rules are correct; [6cons] and [6pair?t]
should refer to a subscripted pp_i.

page 68, figure 10.10:
[6eqt] overspecifies; for example, it says that
(eqv? + +) is true.  [6eqf] underspecifies; for
example, it does not require (eqv? 3 +) to be
false.

page 69, rule [6appN!]: refers to a mysterious bp,
which must be fresh but is otherwise unexplained.
This may be a typo, or bp may have been omitted
from figure 10.1.

page 69, rule [6unarity]: why is this an arity
mismatch instead of a call to a non-procedure?

page 69, right column, line 4: "picks on" should
be "picks one".

pages 69 and 70, paragraph that spans those pages:
The two rules that have a Greek mu in their names
do not appear in any of the figures.  Presumably
they should refer to [6app] and [6arity].

page 71, section 10.7, last paragraph:  the first
sentence of that paragraph is not written in English.

page 72, figure 10.14, rules [6wind] and [6dwerr]:
both rules mandate premature arity checks.  This
is consistent with the report's prose, but the
report's prose is apparently in error also and
should be fixed in the next draft.

page 72, figure 10.14, typography:  I think the
script letters at the extreme left of the last 9
lines are intended to be T, S, and R, but it's
hard to tell because they are in a different
script from some (but not all) of the T, S, and
R that appear on the right hand sides of those
equations.

page 72, figure 10.14, specification of what I
think is supposed to be T:
the fourth of the five lines for T overlaps the
case covered by the fifth line, creating an
unintended ambiguity in the rewriting system.
This is an error, since dropping the fourth line
would not give the intended semantics.  A more
complete case dispatch is required here, alas.

page 72, figure 10.14, specification of what I
think is supposed to be S, first equation:
E_1 should be F_1.

page 72, first full paragraph, last sentence:
what is the point of guaranteeing that there
is always some result to a program, since the
rewriting system (as it stands) does not relate
said result to any observable meaning of the
program?

page 72, section 10.9, last paragraph:
there is no rule named [6setf].

page 73, figure 10.15, rule [6setd]:
this rule violates the letrec* restriction.

page 73, figure 10.15, rules [6refu] and [6setu]:
what is the point (or meaning) or "format" here?

Will