[R6RS] I/O

William D Clinger will at ccs.neu.edu
Mon Jul 10 16:08:02 EDT 2006


Mike wrote:
> More confusion, I'm afraid.  The descriptor is not for communication
> between a reader and a writer.  It is for communicating between the
> constructor of a writer and a procedure that accesses some aspect of
> the writer.  There are `open-bytes-writer' and `writer-bytes', which
> communicate.

Please recall that we were discussing the specification
of make-simple-reader.  When I questioned the rationale
for the descriptor, you responded by talking about bytes
writers.  Now you say writers have no need to communicate
with readers.  Therefore, I conclude, the descriptor is
not motivated by the example you gave for that purpose.
I remain mystified by your argument.

> > The scream you hear was occasioned by my horror of
> > this semantics.  What is the rationale?
> 
> The intermittent end-of-files are a natural by-product of incremental
> or interactive data sources.  The trailing infinite sequence of
> end-of-files is from R5RS, and is consistent with that notion.

My problem is not with the trailing infinite
sequence of ends-of-file, but with the idea that data
may still have to be read after an end of file has
been read.  I do not agree that this is a natural
by-product of incremental or interactive data sources.
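To make the distinction concrete, here is a minimal Python sketch of the two semantics. The classes `IntermittentReader` and `StickyReader` and the `EOF` sentinel are hypothetical stand-ins, not R6RS names. In the first, data may follow an end of file. In the second, once an end of file has been returned every later read returns one, which is the R5RS-compatible trailing sequence.

```python
EOF = object()  # hypothetical sentinel standing in for Scheme's end-of-file object


class IntermittentReader:
    """Hypothetical port where an EOF may be followed by more data."""

    def __init__(self, items):
        self._items = list(items)  # data items, possibly interleaved with EOF
        self._pos = 0

    def read(self):
        if self._pos < len(self._items):
            item = self._items[self._pos]
            self._pos += 1
            return item
        return EOF  # past the end: a trailing infinite sequence of EOFs


class StickyReader:
    """Hypothetical port where the first EOF is final: every later read is EOF."""

    def __init__(self, items):
        self._inner = IntermittentReader(items)
        self._at_eof = False

    def read(self):
        if self._at_eof:
            return EOF
        item = self._inner.read()
        if item is EOF:
            self._at_eof = True  # never consult the underlying source again
        return item
```

With the source `[b"a", EOF, b"b"]`, the intermittent reader yields `b"a"`, EOF, `b"b"`, EOF, EOF, ...; the sticky reader yields `b"a"` followed only by EOFs.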

> > Rather than invent a side effect that can be used
> > at any time, in any way, including the most inappropriate
> > ways and times that can be conceived, why not allow
> > an input port to be opened on inputs that may describe
> > their byte order in the standard way, allowing the byte
> > order to default in the standard way?
> 
> Because there is no "standard way", as far as I can tell.  Different
> environments and conventions allow for different sets of BOMs, for
> example.  The full set of BOMs doesn't allow unambiguous determination
> of the encoding.  (Moreover, there's XML which contains the name of
> the encoding on the first ASCII line, if I understand it correctly.)

There are, in fact, several standard ways.  Why not
support them?  (Indeed, the Unicode standard appears
to require us to provide the standard support for
whatever standard ways we choose to support.  See
C12b below.)  The side effect you are proposing will
encourage Scheme programmers to create yet more
nonstandard, ad hoc approaches to this mostly solved
problem.
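A minimal Python sketch of the standard behavior C12b describes, assuming the caller declares the encoding scheme when the port is opened (only UTF-16 and UTF-32 are handled here). The function name `decode_with_bom` is hypothetical. An initial BOM selects the byte order and is not part of the text; without a BOM, both schemes default to big-endian, absent a higher-level protocol that says otherwise.

```python
def decode_with_bom(data: bytes, scheme: str) -> str:
    """Decode bytes declared to be in the UTF-16 or UTF-32 encoding scheme.

    Hypothetical helper: the BOM (if any) picks the byte order and is stripped;
    with no BOM the byte order defaults to big-endian.
    """
    if scheme == "UTF-16":
        bom_be, bom_le = b"\xfe\xff", b"\xff\xfe"
        codec_be, codec_le = "utf-16-be", "utf-16-le"
    elif scheme == "UTF-32":
        bom_be, bom_le = b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00"
        codec_be, codec_le = "utf-32-be", "utf-32-le"
    else:
        raise ValueError(f"unsupported encoding scheme: {scheme}")

    # The scheme is declared, so there is no guessing between UTF-16 and UTF-32.
    if data.startswith(bom_be):
        return data[len(bom_be):].decode(codec_be)  # strict: ill-formed input raises
    if data.startswith(bom_le):
        return data[len(bom_le):].decode(codec_le)
    return data.decode(codec_be)  # no BOM: default to big-endian
```

Because the scheme is declared rather than sniffed, the ambiguity between a UTF-32LE BOM and a UTF-16LE BOM followed by U+0000 never arises.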

> > It appears that you are requiring transcoders to
> > replace invalid, incomplete, and unsupported
> > encodings by question marks.  I question whether
> > this is consistent with Unicode conformance
> > requirements C5, C6, C7, C11, and (especially) C12a.
> 
> That's possible---I took this behavior from PLT Scheme, after
> discussing it with Matthew.  Alternative suggestions?

I would suggest conforming to these requirements of the Unicode standard:

C5  A process shall not interpret a noncharacter code
point as an abstract character.

C6  A process shall not interpret an unassigned code
point as an abstract character.

C7  A process shall interpret a coded character
representation according to the semantics established
by [the Unicode] standard, if that process does
interpret that character representation.

C11 When a process interprets a code unit sequence which
purports to be in a Unicode character encoding form, it
shall interpret that code unit sequence according to the
corresponding code point sequence.

C12 When a process generates a code unit sequence which
purports to be in a Unicode character encoding form, it
shall not emit ill-formed code unit sequences.

C12a When a process interprets a code unit sequence which
purports to be in a Unicode character encoding form, it
shall treat ill-formed code unit sequences as an error
condition, and shall not interpret such sequences as
characters.

C12b When a process interprets a byte sequence which
purports to be in a Unicode character encoding scheme,
it shall interpret that byte sequence according to the
byte order and specifications for the use of the byte
order mark established by [the Unicode] standard for
that character encoding scheme.
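For comparison with substituting question marks, here is a minimal Python sketch of a decoder that treats an ill-formed UTF-8 sequence as an error condition (C12a), plus a predicate for the noncharacters mentioned in C5. `IllFormedInput`, `decode_utf8`, and `is_noncharacter` are hypothetical names. The sketch relies on Python's strict UTF-8 codec, which rejects invalid bytes, overlong forms, and encoded surrogates.

```python
class IllFormedInput(Exception):
    """Hypothetical error condition a transcoder could signal (C12a)."""

    def __init__(self, offset: int):
        super().__init__(f"ill-formed code unit sequence at byte {offset}")
        self.offset = offset


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, signalling an error instead of substituting characters."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as e:
        # Report where the ill-formed sequence starts; do not guess a character.
        raise IllFormedInput(e.start) from None


def is_noncharacter(cp: int) -> bool:
    """C5: the 66 noncharacters are U+FDD0..U+FDEF plus every U+xxFFFE and U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
```

Noncharacters are well-formed, so `decode_utf8` accepts them; C5 only forbids interpreting them as abstract characters, which is why the check is a separate predicate.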

> >> > set-output-port-buffer-mode!:  Might there be some
> >> > inefficiency associated with requiring every output
> >> > port to support this operation?
> >>
> >> I don't think so.
> >
> > But I do.
> 
> Then you'll have to explain where you think the inefficiency is.

I see several inefficiencies.  At the very outset, the
representation of every output port will have to contain
space that is adequate for every buffer mode.  Furthermore,
each output operation will have to check the current mode,
even within loops that contain infrequent predicated calls
to unknown procedures.  Et cetera.  For some loops, we're
probably talking about a factor of 2 in performance.
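A minimal Python sketch of where the per-operation cost comes from, assuming a port whose buffer mode can be changed at any time. `OutputPort`, `BufferMode`, and `set_buffer_mode` are hypothetical names, not the proposed R6RS interface. The sketch does not attempt to measure the factor-of-2 estimate; it only shows that every port carries a buffer regardless of mode and that every write must re-examine the mode.

```python
from enum import Enum


class BufferMode(Enum):
    NONE = "none"
    LINE = "line"
    BLOCK = "block"


class OutputPort:
    """Hypothetical output port whose buffer mode is mutable at any time."""

    def __init__(self, sink: list, capacity: int = 4096):
        self.sink = sink            # list of flushed chunks, standing in for the OS
        self.buffer = []            # allocated even if the mode is later NONE
        self.capacity = capacity
        self.mode = BufferMode.BLOCK

    def set_buffer_mode(self, mode: BufferMode) -> None:
        self.flush()                # pending output is written before switching
        self.mode = mode

    def write_char(self, ch: str) -> None:
        self.buffer.append(ch)
        # Because the mode can change between any two writes, every single
        # write has to branch on it; a fixed-mode port could skip these tests.
        if self.mode is BufferMode.NONE:
            self.flush()
        elif self.mode is BufferMode.LINE and ch == "\n":
            self.flush()
        elif len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink.append("".join(self.buffer))
            self.buffer.clear()
```

A port whose mode is fixed when it is opened could instead select a specialized `write_char` once, removing the branch from inner loops.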

Will


