Formal comment #236 (defect) the formal semantics should be a non-normative appendix Reported by: William D Clinger Version: 5.92 Full description of issue: 1. The summary says that "Appendix 10 provides a formal semantics for a core of Scheme", but there is no appendix 10. There is only a chapter 10, giving the impression that chapter 10 is supposed to be normative. 2. Chapter 10 describes a rewriting system, not a formal semantics. 3. That rewriting system is incomplete and also contains several errors. If the rewriting system is taken as normative, then it must be completed, its errors repaired, and its semantic import clarified before the R6RS can be ratified. 4. The rewriting system is unnecessarily complicated, which reduces its usefulness. Suggestions: Move the rewriting system into an appendix that is clearly identified as non-normative. Convert the rewriting system into a semantics by using it to describe the observable meaning of a program. Correct the errors and describe a semantics. Simplify. I will discuss these four issues, and then proceed to a tediously detailed list of the errors I happened to spot while reading chapter 10. Normative vs non-normative I would love to have a normative formal semantics for Scheme, but the rewriting system in chapter 10 contains too many errors and is too complicated for it or any quick revision of it to be considered normative in the R6RS. It should be noted that the formal semantics of previous reports was never considered normative, and IEEE Std 1178-1990 was explicit about the semantics being non-normative (and not a part of the standard). If the formal semantics were considered normative, then any conflict between it and the rest of the document would render the language unimplementable in theory, and would lead in practice to unresolvable disputes about the semantics. If the formal semantics is non-normative, then any conflicts between it and the informal semantics will be resolved in favor of the informal semantics. Not a semantics A formal operational semantics is not just a rewriting system; a semantics must also describe a relationship between the rewriting system and the meanings of programs. This is usually done by using the rewriting system to define the observable effects and/or outcomes of programs. Chapter 10 does not define the observable effects and/or outcomes of programs. Chapter 10 defines a set of possible execution traces for each program, but the meaning of a Scheme program should be more abstract than a set of execution traces. Chapter 10 should explain that implementations of Scheme are required only to deliver the same observable effects and/or results as one of the execution sequences, and that programmers are responsible for ensuring that all of the legal execution sequences deliver observable effects and/or results that satisfy the requirements of the program. Incompleteness and errors Chapter 10 makes no attempt to explain the semantics of macro expansion or libraries. That's fine if chapter 10 is non-normative, but a normative semantics would have to be connected to the output of macro expansion. That would be difficult for R6RS Scheme, since the output of macro expansion is explicitly implementation-dependent. Chapter 10 omits the char, vector, and bytevector types of (r6rs base), and omits almost all of the additional types that are described in the standard libraries. Chapter 10 requires several premature arity checks, sanctions violations of the letrec* requirement, and both over-specifies and under-specifies the semantics of eqv?. There are several other errors of lesser importance, which I will list below. Complicated Chapter 10 is overly complicated in two distinct ways: (1) it attempts to specify the semantics of constructs that could be (and probably would be) eliminated during macro expansion and (2) it performs several optimizing transformations that implementations might perform, but are out of place in a formal semantics. Both of these complexities have been defended on the grounds that they make it easier for a human to follow individual execution traces, but that misses the point of a formal semantics. If someone wants to understand the execution traces of some particular program, they will read the informal prose, or run the program in several implementations, or employ a user interface that presents the execution traces generated by the rewriting system in a form that is easier for humans to understand. They are unlikely to look at the actual execution traces produced by a rewriting system. The purpose of a formal semantics is to prove theorems, not to provide a tutorial. The unnecessary complexity of the rewriting system in chapter 10 makes it harder to prove theorems. The formal semantics should be simplified by: * dropping beginF forms (which will be eliminated by macro expansion) * dropping one-armed if (which can be converted into two-armed if by macro expansion) * requiring the body of a lambda expression to be a single expression (which the macro expander can easily arrange) * dropping the last condition of [6appN!] and dropping [6appN] altogether * dropping [6app0] * dropping figure 10.13 altogether * dropping [6beginF] The first sentence of section 10.9, "Library Top Level", says the semantics is limited to modelling the bodies of libraries that don't export anything and import only the primitives described by the semantics. In other words, the rewriting system does not attempt to shed light on the problematic issues of library semantics: macro expansion, phases, importing the same name from different libraries, and so on. The rewriting system does describe a semantics for redefinition, but that semantics is (at least partly) at odds with the prose in chapter 6, and appears to be a mere artifact of the formalization. It would be better to drop section 10.9 altogether. Detailed list of errors Some of these have already been reported, but it is easier for me to report them all. They are listed in order of occurrence, not in order of importance. I'm sure there are other errors I failed to notice. page 61, paragraph 1: there's a space between the c and o of "c ontext-". page 61, paragraph 2, last sentence: E[ep] should be E[e']. page 61, paragraph 3, second display: missing curly braces. page 62, section 10.1, paragraph 6, first sentence: "program" should be plural. page 62, bullets: the bullets omit unspecified?, raise*, and with-exception-handler. page 63, figure 10.2, productions for PG and G: these have no restriction on the number of values, which may be correct but looked suspect to me. page 63, figure 10.3: there are no productions for (dot sym) and (dot sqv). If the formal semantics is supposed to use the convention that those are equivalent to sym and sqv, it needs to say so. page 63, last paragraph, first sentence: "are" should be "is", since it refers to "a set". page 64, last paragraph before 10.2, last sentence: "are" should be "is". page 64, section 10.4, first sentence: "workhorse" should be "workhorses". page 64, section 10.4, paragraph 1, sentence 3: "is relevant" should be "are relevant". boundary between pages 64 and 65: "P and E" should be "E and P". page 65, first full paragraph, sentence 1: [6xuhee] should be [6xunee]. page 66, second full paragraph, sentence 3: "installer" should be "installed". page 67, section 10.6, paragraph 2, sentence 2: it is misleading to call pp an identifier, since the correctness of the rewriting system depends upon pair pointers being distinct from identifiers. page 68, figure 10.9: [6cons] and [6pair?t] refer to pp as an individual, but the last three rules refer to pp as a set. The last three rules are correct; [6cons] and [6pair?t] should refer to a subscripted pp_i. page 68, figure 10.10: [6eqt] overspecifies; for example, it says that (eqv? + +) is true. [6eqf] underspecifies; for example, it does not require (eqv? 3 +) to be false. page 69, rule [6appN!]: refers to a mysterious bp, which must be fresh but is otherwise unexplained. This may be a typo, or bp may have been omitted from figure 10.1. page 69, rule [6unarity]: why is this an arity mismatch instead of a call to a non-procedure? page 69, right column, line 4: "picks on" should be "picks one". pages 69 and 70, paragraph that spans those pages: The two rules that have a Greek mu in their names do not appear in any of the figures. Presumably they should refer to [6app] and [6arity]. page 71, section 10.7, last paragraph: the first sentence of that paragraph is not written in English. page 72, figure 10.14, rules [6wind] and [6dwerr]: both rules mandate premature arity checks. This is consistent with the report's prose, but the report's prose is apparently in error also and should be fixed in the next draft. page 72, figure 10.14, typography: I think the script letters at the extreme left of the last 9 lines are intended to be T, S, and R, but it's hard to tell because they are in a different script from some (but not all) of the T, S, and R that appear on the right hand sides of those equations. page 72, figure 10.14, specification of what I think is supposed to be T: the fourth of the five lines for T overlaps the case covered by the fifth line, creating an unintended ambiguity in the rewriting system. This is an error, since dropping the fourth line would not give the intended semantics. A more complete case dispatch is required here, alas. page 72, figure 10.14, specification of what I think is supposed to be S, first equation: E_1 should be F_1. page 72, first full paragraph, last sentence: what is the point of guaranteeing that there is always some result to a program, since the rewriting system (as it stands) does not relate said result to any observable meaning of the program? page 72, section 10.9, last paragraph: there is no rule named [6setf]. page 73, figure 10.15, rule [6setd]: this rule violates the letrec* restriction. page 73, figure 10.15, rules [6refu] and [6setu]: what is the point (or meaning) or "format" here? Will RESPONSE: The next draft will have the rewriting system in an appendix that is clearly identified as non-normative. The remaining suggestions will be forwarded to the authors of the formal semantics appendix.