[R6RS] write/read or read/write invariance

William D Clinger will at ccs.neu.edu
Tue May 16 13:24:05 EDT 2006


Kent wrote:
> > What is intended, I believe, is that for every input port p
> >
> >    (let ((x (read p))
> >          (tmp "/tmp/foo"))
> >      (define (datum? x)
> >        (or (boolean? x) (number? x) (char? x) (string? x)
> >            (symbol? x) (list? x) (vector? x)))
> >      (if (datum? x)
> >          (equal? (begin
> >                    (call-with-output-file tmp
> >                      (lambda (out) (write x out)))
> >                    (call-with-input-file tmp read))
> >                  x)
> >          #t))
>  
> Okay, so if we write out something that qualifies as a datum, it reads
> back in as the same datum.  But earlier you told Anton that:
> 
>   The word "datum" means one of the external representations
>   that are generated by the nonterminal <datum> in the formal
>   syntax of Scheme.  Since the value returned by (unspecified)
>   has no external representation (aside from implementation-
>   specific extensions), it is not a datum.
> 
> So is datum an external representation (as the quoted text says) or a type
> of object that has an external representation (as your datum? predicate
> implies)?

My datum? predicate may suggest that, but it does not imply
it.

A datum is an external representation.  I apologize for not
stating the contract of the datum? predicate in my code.  It
would be something like "Given a Scheme value x that was
returned by the read procedure, and has not been subject to
any side effects, returns #t if and only if x is a datum."  If
we assume the read procedure has not been extended to allow
implementation-specific extensions, then the datum? procedure
could be written more simply as
(define (datum? x) (not (eof-object? x))).
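
As a minimal sketch of the read/write invariance described above, the
check can be run against string ports rather than a temporary file.
The string-port procedures (open-input-string, open-output-string,
get-output-string) come from SRFI 6 and R7RS and are an assumption
here, as are the helper names write-then-read and
read/write-invariant?; none of them appear in the original message.

```scheme
;; Simplified predicate from the text: assuming read has no
;; implementation-specific extensions, anything it returns other
;; than the end-of-file object is a datum.
(define (datum? x)
  (not (eof-object? x)))

;; Hypothetical helper: write x to a string port, then read it back.
(define (write-then-read x)
  (let ((out (open-output-string)))
    (write x out)
    (read (open-input-string (get-output-string out)))))

;; Hypothetical helper: read one value from TEXT and check that
;; writing it and reading it again yields an equal? value.
(define (read/write-invariant? text)
  (let ((x (read (open-input-string text))))
    (if (datum? x)
        (equal? (write-then-read x) x)
        #t)))
```

For example, (read/write-invariant? "(a (b . c) #(1 2 \"three\"))")
would be expected to return #t, and an empty input returns #t because
read yields the end-of-file object.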

> If the latter, is a list or vector that contains an element
> that is something other than a datum really a datum?  Similarly, is a list
> or vector x that contains an element eq? to x really a datum?  I notice
> you left out improper lists, like (a . b)---was this an oversight or do
> only proper lists qualify?

Leaving out improper lists was an oversight.  For the rest,
I was assuming no implementation-specific extensions, and
my attempt to provide an exhaustive list of the types of
values that the read procedure can return (exclusive of the
end-of-file object) was a failed attempt at clarity through
needless complexity.

> I suspect that you meant to include pairs, to exclude lists or vectors
> that contain an element that is not a datum, and to exclude cyclic
> structures in general (since we opted not to include a syntax for cyclic
> structures).

Yes.  If the read procedure is required to raise an exception
on non-standard inputs, then it cannot return lists or vectors
that contain a non-datum element and cannot return a cyclic
structure, unless the exception is continuable and someone
writes an exception handler that returns such a thing.  (If
we have to spell that out for every procedure that might
raise an exception, the R6RS will become even more verbose.)
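
To make the distinction concrete, here is a rough sketch of a
predicate over constructed values that accepts pairs (including
improper lists), rejects lists or vectors holding a non-datum
element, and rejects cyclic structures.  The name datum-value? is
hypothetical, and the sketch accepts every symbol, which is exactly
the point still under discussion.

```scheme
;; Hypothetical predicate: #t if X is built only from booleans,
;; numbers, characters, strings, symbols, the empty list, pairs,
;; and vectors, and does not contain itself.
;; SEEN holds the pairs and vectors on the path from the root to X,
;; so shared but acyclic structure is still accepted.
(define (datum-value? x)
  (let walk ((x x) (seen '()))
    (cond ((or (boolean? x) (number? x) (char? x)
               (string? x) (symbol? x) (null? x))
           #t)
          ((memq x seen) #f)            ; x contains itself: cyclic
          ((pair? x)
           (let ((seen (cons x seen)))
             (and (walk (car x) seen)
                  (walk (cdr x) seen))))
          ((vector? x)
           (let ((seen (cons x seen)))
             (let loop ((i 0))
               (or (= i (vector-length x))
                   (and (walk (vector-ref x i) seen)
                        (loop (+ i 1)))))))
          (else #f))))                  ; procedures, ports, and so on
```

With this sketch, (datum-value? '(a . b)) is #t, (datum-value? (list
1 car)) is #f, and a vector stored into one of its own slots is
rejected as cyclic.  The path check is quadratic in the worst case,
which seems acceptable for an illustration.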

> Thus, not every list or vector we construct is a datum.  Why
> then must every symbol we construct be a datum?

There is no technical reason why every symbol must be a datum,
but the R6RS editors have voted to require "serialization
(read-write invariance) for every datum (part of Unicode
support)".  I believe the community interprets this to mean,
in part, that every symbol will be serializable; we are free
to disregard that interpretation.  In any case, whether a
symbol must be a datum has naught to do with my interpretation
of read/write invariance, since every symbol that can be read
by the read procedure will in fact be a datum (modulo the
possibility of a continuable exception blah blah).

> If we're free to say what
> datum? means on our way to "write/read invariance", can't we say that if
> sym is a symbol, then (datum? sym) is true iff (symbol->string sym) can
> be parsed via the R5RS grammar for forming symbols?  I'm not necessarily
> advocating this, just trying to figure out the principle, if any, behind
> "write/read invariance".
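
For illustration only, the alternative raised in the question above
(a symbol is a datum iff its name parses as an R5RS identifier) could
be sketched as follows.  The procedure names are hypothetical, and
the sketch ignores case folding, which an R5RS reader is allowed to
perform.

```scheme
;; #t if character C occurs in the string CHARS.
(define (char-in? c chars)
  (and (memv c (string->list chars)) #t))

(define (ascii-letter? c)
  (or (and (char<=? #\a c) (char<=? c #\z))
      (and (char<=? #\A c) (char<=? c #\Z))))

(define (ascii-digit? c)
  (and (char<=? #\0 c) (char<=? c #\9)))

;; R5RS <initial>: a letter or a special initial.
(define (r5rs-initial? c)
  (or (ascii-letter? c) (char-in? c "!$%&*/:<=>?^_~")))

;; R5RS <subsequent>: an initial, a digit, or a special subsequent.
(define (r5rs-subsequent? c)
  (or (r5rs-initial? c) (ascii-digit? c) (char-in? c "+-.@")))

(define (all? pred items)
  (or (null? items)
      (and (pred (car items)) (all? pred (cdr items)))))

;; R5RS <identifier>: <initial> <subsequent>*, or one of the
;; peculiar identifiers +, -, and "...".
(define (r5rs-identifier-string? s)
  (or (and (member s '("+" "-" "...")) #t)
      (and (> (string-length s) 0)
           (r5rs-initial? (string-ref s 0))
           (all? r5rs-subsequent? (cdr (string->list s))))))

;; Hypothetical restricted predicate for symbols.
(define (symbol-datum? sym)
  (r5rs-identifier-string? (symbol->string sym)))
```

Under this reading, (symbol-datum? 'list->vector) is #t, while
(symbol-datum? (string->symbol "12345")) is #f.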

I don't know anything about write/read invariance.  The only
reason I know that what I've been talking about is read/write
invariance is that I looked at the most recent status report.

I would suggest we stay far away from write/read invariance.

> > It provides external representations for every character,
> > string, and symbol that can be expressed using Unicode
> > scalar values.
> 
> I don't see where it requires the use of any particular external
> representation.  Will this be specified elsewhere in the report?

I don't believe the Unicode SRFI requires the use of any
particular external representation, although read/write
invariance combined with the potential requirement that
read must raise an exception on (certain) nonstandard
inputs would constrain the external representations.

> > > In particular, should the following return #t (assuming that there are no
> > > I/O problems)?
> > > 
> > >   (let ([x (string->symbol "12345")])
> > >     (with-output-to-file "foo" (lambda () (write x)))
> > >     (eq? (with-input-from-file "foo" read) x))
> >
> > Yes.
> >
> > > What will the file foo contain?
> >
> > The file's contents should be:
> >
> > \x31;\x32;\x33;\x34;\x35;
> 
> Could this be
> 
>   \x31;2345
> 
> instead?

Yes.

> Is one form encouraged over the other?

No.  We could change that, of course, but I don't see a reason.
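
As a rough illustration of why either form is acceptable, a decoder
for the \x<hex>; escapes maps both spellings to the same name.  The
procedure name decode-symbol-text is hypothetical, and the sketch
only decodes: it does not enforce any rule about where an escape may
appear, and it returns #f for an unterminated or non-hexadecimal
escape.

```scheme
;; Hypothetical decoder: replace each \x<hex>; escape in TEXT with
;; the character whose scalar value is <hex>.  Returns the decoded
;; string, or #f if an escape is unterminated or not hexadecimal.
(define (decode-symbol-text text)
  (let loop ((chars (string->list text)) (acc '()))
    (cond ((null? chars)
           (list->string (reverse acc)))
          ((and (char=? (car chars) #\\)
                (pair? (cdr chars))
                (char=? (cadr chars) #\x))
           ;; Collect hex digits up to the terminating semicolon.
           (let scan ((rest (cddr chars)) (hex '()))
             (cond ((null? rest) #f)
                   ((char=? (car rest) #\;)
                    (let ((n (string->number
                              (list->string (reverse hex)) 16)))
                      (if n
                          (loop (cdr rest)
                                (cons (integer->char n) acc))
                          #f)))
                   (else
                    (scan (cdr rest) (cons (car rest) hex))))))
          (else
           (loop (cdr chars) (cons (car chars) acc))))))
```

Both (decode-symbol-text "\\x31;\\x32;\\x33;\\x34;\\x35;") and
(decode-symbol-text "\\x31;2345") yield "12345"; the doubled
backslashes are only string-literal quoting.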

> Could it be
> 
>   1\x32;345
> 
> instead?  (I assume not.)

I also assume not.  I also assume it is our job to specify
a lexical syntax that excludes 1\x32;345.

> The third bullet item still isn't clear, however, since it says only what
> happens if a backslash appears in a symbol and not where backslashes can
> appear.  One possibility is that \x<x>...<x>; counts as an <initial>, no
> matter what character it represents.

That sounds like the right way to specify the lexical syntax.
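
One way to picture that rule is a writer that emits a plain character
wherever the R5RS grammar allows it and a \x<hex>; escape everywhere
else, so that an escape stands in for an <initial> (or a
<subsequent>) regardless of the character it denotes.  This is only a
sketch under those assumptions; the procedure names are hypothetical
and nothing here is taken from a draft of the report.

```scheme
(define (char-in? c chars)
  (and (memv c (string->list chars)) #t))

(define (ascii-letter? c)
  (or (and (char<=? #\a c) (char<=? c #\z))
      (and (char<=? #\A c) (char<=? c #\Z))))

;; Characters that may start a symbol without an escape.
(define (plain-initial? c)
  (or (ascii-letter? c) (char-in? c "!$%&*/:<=>?^_~")))

;; Characters that may continue a symbol without an escape.
(define (plain-subsequent? c)
  (or (plain-initial? c)
      (and (char<=? #\0 c) (char<=? c #\9))
      (char-in? c "+-.@")))

;; \x<hex>; escape for character C.
(define (hex-escape c)
  (string-append "\\x" (number->string (char->integer c) 16) ";"))

;; Hypothetical writer: external representation of the symbol whose
;; name is NAME.  The first character is checked against
;; plain-initial?, every later one against plain-subsequent?.
(define (symbol-name->external name)
  (let loop ((chars (string->list name))
             (ok? plain-initial?)
             (acc '()))
    (if (null? chars)
        (apply string-append (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                plain-subsequent?
                (cons (if (ok? c) (string c) (hex-escape c))
                      acc))))))
```

Here (symbol-name->external "12345") produces \x31;2345, and the
excluded form 1\x32;345 can never be produced because a digit is
never emitted unescaped in the first position.  The sketch also
escapes the peculiar identifiers + and -, which a real writer would
presumably leave alone.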

Will


