[R6RS] write/read or read/write invariance
William D Clinger
will at ccs.neu.edu
Tue May 16 13:24:05 EDT 2006
> > What is intended, I believe, is that for every input port p
> > (let ((x (read p))
> >       (tmp "/tmp/foo"))
> >   (define (datum? x)
> >     (or (boolean? x)
> >         (number? x)
> >         (char? x)
> >         (string? x)
> >         (symbol? x)
> >         (list? x)
> >         (vector? x)))
> >   (if (datum? x)
> >       (equal? (begin (call-with-output-file tmp
> >                        (lambda (out) (write x out)))
> >                      (call-with-input-file tmp read))
> >               x)
> >       #t))
> Okay, so if we write out something that qualifies as a datum, it reads
> back in as the same datum. But earlier you told Anton that:
> The word "datum" means one of the external representations
> that are generated by the nonterminal <datum> in the formal
> syntax of Scheme. Since the value returned by (unspecified)
> has no external representation (aside from implementation-
> specific extensions), it is not a datum.
> So is datum an external representation (as the quoted text says) or a type
> of object that has an external representation (as your datum? predicate
My datum? predicate may suggest that, but it does not imply it.
A datum is an external representation. I apologize for not
stating the contract of the datum? predicate in my code. It
would be something like "Given a Scheme value x that was
returned by the read procedure, and has not been subject to
any side effects, returns #t if and only if x is a datum." If
we assume the read procedure has not been extended to allow
implementation-specific extensions, then the datum? procedure
could be written more simply as [define [datum? x] [not [
> If the latter, is a list or vector that contains an element
> that is something other than a datum really a datum? Similarly, is a list
> or vector x that contains an element eq? to x really a datum? I notice
> you left out improper lists, like (a . b)---was this an oversight or do
> only proper lists qualify?
Leaving out improper lists was an oversight. For the rest,
I was assuming no implementation-specific extensions, and
my attempt to provide an exhaustive list of the types of
values that the read procedure can return (exclusive of the
end-of-file object) was a failed attempt at clarity through
> I suspect that you meant to include pairs, to exclude lists or vectors
> that contain an element that is not a datum, and to exclude cyclic
> structures in general (since we opted not to include a syntax for cyclic
Yes. If the read procedure is required to raise an exception
on non-standard inputs, then it cannot return lists or vectors
that contain a non-datum element and cannot return a cyclic
structure, unless the exception is continuable and someone
writes an exception handler that returns such a thing. (If
we have to spell that out for every procedure that might
raise an exception, the R6RS will become even more verbose.)
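
The stricter reading agreed to above can be sketched as a recursive
predicate. This is only an illustration under stated assumptions: there
are no implementation-specific extensions, every symbol counts as a
datum (the point still under debate in this thread), and the name
strict-datum? is hypothetical rather than anything proposed for the
report. It does not terminate on cyclic structures, so it does not
detect them.

```scheme
;; Hypothetical helper, not part of the original message or of any report.
;; Assumes no implementation-specific extensions to read.
;; Does not terminate on cyclic structures.
(define (strict-datum? x)
  (cond ((or (boolean? x) (number? x) (char? x)
             (string? x) (symbol? x) (null? x))
         #t)
        ;; A pair qualifies when both halves do, which covers proper
        ;; lists as well as improper lists such as (a . b).
        ((pair? x)
         (and (strict-datum? (car x))
              (strict-datum? (cdr x))))
        ;; A vector qualifies when every element does.
        ((vector? x)
         (let loop ((i 0))
           (or (= i (vector-length x))
               (and (strict-datum? (vector-ref x i))
                    (loop (+ i 1))))))
        ;; Procedures, ports, and other values have no standard
        ;; external representation.
        (else #f)))
```

With this definition, (strict-datum? '(a . b)) is true, while
(strict-datum? (list 1 car)) is false because the list contains a
procedure.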
> Thus, not every list or vector we construct is a datum. Why
> then must every symbol we construct be a datum?
There is no technical reason why a symbol cannot be a datum,
but the R6RS editors have voted to require "serialization
(read-write invariance) for every datum (part of Unicode
support)". I believe the community interprets this to mean,
in part, that every symbol will be serializable; we are free
to disregard that interpretation. In any case, whether a
symbol must be a datum has naught to do with my interpretation
of read/write invariance, since every symbol that can be read
by the read procedure will in fact be a datum (modulo the
possibility of a continuable exception blah blah).
> If we're free to say what
> datum? means on our way to "write/read invariance", can't we say that if
> sym is a symbol, then (datum? sym) is true iff (symbol->string sym) can
> be parsed via the R5RS grammar for forming symbols? I'm not necessarily
> advocating this, just trying to figure out the principle, if any, behind
> "write/read invariance".
I don't know anything about write/read invariance. The only
reason I know that what I've been talking about is read/write
invariance is that I looked at the most recent status report.
I would suggest we stay far away from write/read invariance.
> > It provides external representations for every character,
> > string, and symbol that can be expressed using Unicode
> > scalar values.
> I don't see where it requires the use of any particular external
> representation. Will this be specified elsewhere in the report?
I don't believe the Unicode SRFI requires the use of any
particular external representation, although read/write
invariance combined with the potential requirement that
read must raise an exception on (certain) nonstandard
inputs would constrain the external representations.
> > > In particular, should the following return #t (assuming that there are no
> > > I/O problems)?
> > >
> > > (let ([x (string->symbol "12345")])
> > >   (with-output-to-file "foo" (lambda () (write x)))
> > >   (eq? (with-input-from-file "foo" read) x))
> > Yes.
> > > What will the file foo contain?
> > The file's contents should instead be:
> > \x31;\x32;\x33;\x34;\x35;
> Could this be
> Is one form encouraged over the other?
No. We could change that, of course, but I don't see a reason.
> Could it be
> instead? (I assume not.)
I also assume not. I also assume it is our job to specify
a lexical syntax that excludes 1\x32;345.
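
The round trip asked about above can also be tried without a file. The
sketch below assumes an implementation that provides string ports
through an (rnrs) library, which is not something this message
specifies. The name write-then-read is hypothetical. Whether the result
is #t depends on the implementation honoring the invariance under
discussion, and the text that write produces for the symbol is not
assumed here.

```scheme
(import (rnrs))

;; Hypothetical helper: write x to a string, then read that string back.
(define (write-then-read x)
  (let ((text (call-with-string-output-port
                (lambda (out) (write x out)))))
    (read (open-string-input-port text))))

;; The symbol whose name is "12345" should come back as the same symbol
;; if read/write invariance holds for symbols.
(let ((x (string->symbol "12345")))
  (eq? (write-then-read x) x))
```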
> The third bullet item still isn't clear, however, since it says only what
> happens if a slash appears in a symbol and not where slashes can appear.
> One possibility is that \x<x>...<x>; counts as an <initial>, no
> matter what character it represents.
That sounds like the right way to specify the lexical syntax.
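
As a rough illustration of that possibility, the following recognizes
an escape of the form \x<x>...<x>; at the very start of a symbol's
external representation, where it would count as an <initial>. Both
procedure names are hypothetical, and this checks only the leading
escape, not the rest of the symbol grammar.

```scheme
;; Hypothetical helpers, not part of any report or SRFI.
(define (hex-digit? c)
  (or (and (char<=? #\0 c) (char<=? c #\9))
      (and (char<=? #\a c) (char<=? c #\f))
      (and (char<=? #\A c) (char<=? c #\F))))

;; Returns the index just past a leading \x<x>...<x>; escape in s,
;; or #f if s does not start with such an escape.
(define (leading-hex-escape-end s)
  (let ((n (string-length s)))
    (and (>= n 4)
         (char=? (string-ref s 0) #\\)
         (char=? (string-ref s 1) #\x)
         (let loop ((i 2))
           (cond ((= i n) #f)                       ; no terminating semicolon
                 ((hex-digit? (string-ref s i)) (loop (+ i 1)))
                 ((and (char=? (string-ref s i) #\;) (> i 2))
                  (+ i 1))                          ; at least one hex digit seen
                 (else #f))))))
```

Here (leading-hex-escape-end "\\x31;2345") returns 5, whereas
(leading-hex-escape-end "1\\x32;345") returns #f because the text
begins with a digit rather than an escape.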