Formal comment #87 (simplification) Reduce over-specification as well as under-specification Reported by: Will Clinger Component: presentation Version: 5.91 The draft R6RS often specifies things that do not need to be specified, even as it neglects to specify related things that do need to be specified. For example, the draft R6RS often does a better job of specifying which programs are incorrect (because they raise an exception) than of specifying which programs are correct, or of how correct programs will behave. The R6RS ought to distinguish the responsibilities of programmers (clients) from the responsibilities of those who implement the language (implementors). It ought to describe those responsibilities clearly, without pretending one set of responsibilities can be inferred from the other. In particular, the R6RS ought not to pretend everything that implementations are not required to detect is legal, nor to pretend implementations must detect all illegal programs. This is a pervasive problem with the draft R6RS. The problem appears to be rooted within a tacit philosophy or ideology whose unintended effects are to increase the complexity of the draft R6RS and to interfere with the pragmatic goal of writing portable programs. This comment describes several specific examples of the problem, but systematic repair will require reconsideration and editing of the entire report. * * The reports on Scheme amount to informal axiomatic specifications of the language. Programmers can rely on any behavior that is implied by the axioms, and cannot rely on any behavior that is not implied by the axioms. Implementations of Scheme must satisfy the axioms, but may exhibit arbitrary behavior for situations not covered by the axioms. In model-theoretic words, an implementation is a model of the axiomatic specification (not the other way around). The Scheme reports are like the axioms of group theory or point set topology---and unlike the axioms of Euclidean plane geometry or the second order axioms of Peano arithmetic---in that they are intended to describe many models (implementations), not just one. Portable programs must rely only on behaviors that are guaranteed by the axioms. One of the main reasons it has been so difficult to write portable programs in Scheme is that the axiomatic specifications expressed by previous reports were too weak. It is appropriate for the R6RS to strengthen those specifications. The R6RS does not need to strengthen its specification so much that it allows only one model, or a single behavior, or a single implementation, or recursively enumerable behaviors. Unnecessary over-specification, such as specifying the unspecified value, does not contribute to portability; it is more likely to detract from portability, by wasting time on things that don't matter. Over-specification also increases the complexity of the report, which makes it harder for reviewers of the draft to spot problems with it, and makes misinterpretation of the specification by programmers and implementors alike more likely. Over-specification is counter-productive. Where previous reports have erred by under-specifying, the draft R6RS frequently errs by over-specifying, while continuing to under-specify some of the more important things. For example, the draft R6RS often provides a detailed specification of circumstances under which implementations are required to raise some exception, while failing to describe the circumstances under which implementations are forbidden to raise the exception. When writing a portable program, it is more important to know what will work (i.e. will not raise an exception) than to know what won't. In short, the draft R6RS has its priorities backwards. * * Here are six specific examples of the general problem: immutable objects (section 4.7) library phasing (section 6.2) script syntax (section 7.1.1) the unspecified value (section 9.8) list utilities (section 12) plausible lists (section 23.3) * * Immutable objects (section 4.7). This problem is inherited from the R5RS. The problem is that the "immutable objects", as defined in section 4.7, are explicitly implementation-dependent. (What's more, the antecedent of the phrase "such systems" is so unclear that neither programmers nor implementors can tell for sure which implementations even have immutable objects.) This makes it harder for programmers to figure out what they must do to write portable programs. Fixing the problem is easy. The R6RS should give an implementation-independent definition of immutable objects, as the objects that result from evaluation of a literal constant together with the strings returned by symbol->string (and perhaps a few other objects). The R6RS should say that correct programs do not try to mutate immutable objects, and that implementations should (not "must") raise an exception when mutation of an immutable object is attempted. That is how the matter was handled in the R4RS. The changes made in the R5RS may have been motivated by some ideological discomfort surrounding the fact that most existing implementations cannot distinguish the immutable objects from the mutable ones, and therefore cannot reliably raise the exception they should raise. That kind of embarrassment should not stand in the way of specifying the semantics of immutable objects. * * Library phasing (section 6.2). The last paragraph of section 6.2 talks about what implementations are allowed to do, and states that "a portable library's meaning must not depend on whether the invocations are distinguished or preserved across phases or library expansions." That paragraph is an example of what I am asking the R6RS to do more consistently: It explains the programmer's responsibility. It also says that implementations have no obligation to implement many details of the phasing semantics that had been described by the previous 22 paragraphs of section 6.2. In other words, section 6.2 contains a complex specification that implementations are not required to implement, and on which programs cannot rely. Hence programmers must establish the correctness of portable libraries under the assumption that the phasing semantics is implemented, and also under the assumption that a single set of bindings is shared by all phases. It seems to me that the complex phasing semantics, on which libraries cannot rely but with which they must nonetheless be prepared to cope, hinders the development of portable libraries more than it helps. One possible solution is for the R6RS to sketch only a vague semantics for phase levels, perhaps along the lines of the semantics given for declarations in section 9.22, instead of specifying so many details that implementations are explicitly allowed to ignore. The main thing programmers need to know about phase levels is that they matter only for procedural macros and for libraries used by procedural macros. With the draft R6RS, the easiest way to cope with the phasing problems in portable code is to avoid all use of the (r6rs syntax-case) library. If that were to remain true in the R6RS, then the R6RS ought to say so. * * Script syntax (section 7.1.1). The first line of a script is required to be: #! /usr/bin/env scheme-script with no space allowed before the . On the other hand, implementations are required to ignore the first line, even if it is not the above. The draft R6RS gives a rationale for requiring implementations to ignore the first line, and almost states a rationale for requiring all Scheme programs to contain the line that will be ignored: On Unix, the ignored line will make the program look like a shell script. Of course, an incorrect Scheme program might have something else, such as meaningless garbage, on its first line. According to the draft R6RS, such incorrect programs must nevertheless behave like programs that have the prescribed first line---even on Unix. That implies that executing a Scheme program as a Unix shell script is not allowed by the draft R6RS, except as part of some larger process that ensures Unix will (in effect) ignore the first line. Since that larger process might just as well insert the first line, the specification of that line's syntax within the draft R6RS does not serve any apparent purpose. If implementations must ignore the first line of a Scheme script, then the Unix-specific syntax of that first line should either be dropped or reduced to a mere recommendation, and the formal syntax of scripts should allow anything on the first line. As things stand, the most important role of the formal syntax is to define what is meant by "the first line." Since the draft R6RS requires implementations to accept first lines other than the one generated by the grammar, however, the formal syntax given in the draft R6RS serves no real purpose, and the definition of that first line becomes an interesting, albeit silly, question. For example, does the draft R6RS allow a first line that contains multiple line separators? That contains multiple returns? That contains multiple linefeeds? * * The unspecified value (section 9.8). "The unspecified value is not a datum value, and thus has no external representation." Since the draft R6RS does not specify or even constrain the output produced when a value that has no external representation is passed to procedures such as write, implementations are free to print the unspecified value however they like. An R6RS-conformant implementation might print the unspecified value as (), or as #f, or as 42. Specifying an unspecified value that may print the same as the empty list is no less confusing than allowing procedures to return the empty list when no result is specified. I do not believe that specifying the unspecified value will improve portability in any meaningful way. If the R6RS must specify the unspecified value, however, then the R6RS should specify an external representation for the unspecified value, so programmers will recognize it when they see it. * * List utilities (section 12). The specifications of find, forall, exists, filter, partition, fold-left, fold-right, remp, memp, assp all say that the proc argument must take a certain number of arguments, but only when the list arguments are nonempty. It would be simpler just to specify the number of arguments that the proc argument must take, independently of the length of the list arguments. No purpose is served by widening the range of proc arguments for just one boundary case, but some costs of that widening are clear. One cost is the increased complexity of specification. Another is that compilers would be forbidden to warn about things like (find vector-set! x) when the compilers can't prove that x won't be the empty list. Implementations should not be required to raise an exception when things like (find vector-set! '()) are executed, but they should be allowed to raise an exception. Similarly, the specifications of find, forall, exists, memp, member, memv, memq, assq, assoc, assv, and assq were made more complex by forbidding these procedures to traverse a list past the point at which their result could be known. It would be simpler just to require their arguments to be lists (or association lists). These procedures should all be allowed to raise an exception if any of their list arguments are not a list, even if their "natural" implementations would not be likely to detect that violation. These procedures should not be required to detect that violation, however, except perhaps when they have to traverse the entire list anyway; even then, simplicity argues against requiring the exception, as does the SRFI 1 semantics of some similar procedures. As with immutable objects, I suspect the unnecessary complexity of specification in section 12 grew out of ideological discomfort with the idea that systems might enjoy some leeway with respect to when and whether they raise exceptions. Ideological discomfort should not be allowed to complicate the semantics of Scheme. * * Plausible lists (section 23.3). The unnecessary complexity of specification in section 12 really comes home to roost when pairs are mutable, as in section 23.3. Almost all of the complexity in that section could be avoided by stipulating that programmers are responsible for making sure that all list arguments are lists, and no list argument may be mutated during the call while limiting implementations' responsibilities: implementations should raise an exception if any list argument is not a list, but implementations are not required to raise those exceptions Note that reliable detection of non-lists is impossible in systems with mutable pairs and concurrent execution, and is inefficient even with sequential execution of higher order procedures. (Most of the higher-order procedures could copy their list arguments on entry, which isn't too bad, but that wouldn't work for memp, whose result must share structure with its list argument. In any case, the copying solution doesn't always work when a concurrent process lengthens the list while it is being copied, and the inefficiency of copying matters even for sequential execution.) Note also that the complex specifications of the draft R6RS do not serve their apparent purpose of requiring detection of non-list arguments, since a plausible list may not be a list. In practice, traditional implementations of the specification sketched above would detect about as many bugs as the superficially stricter and far more complex semantics described by the draft R6RS. * * Although each of the examples presented above could be the subject of a more specific formal comment, their purpose in this formal comment is to illustrate a general tendency of the draft R6RS toward unnecessary over-specification, which often masks related under-specification. I believe it is more important to recognize and to correct the general tendency than to fix these specific examples of it. RESPONSE: We are in general agreement with the sentiment of the formal comment and will work toward revising the report draft accordingly.