[R6RS] I/O

Mon Jul 10 14:20:23 EDT 2006

Mike wrote:
> This is a misunderstanding about the primary role of the Primitive I/O
> layer.  The Primitive I/O layer is mainly for people implementing
> custom data sources (and possibly doing very high-performance I/O,
> which is hard with the Ports layer).

Sorry, my bad.  I think the introduction to "Readers"
contributes to this misunderstanding by saying

    A reader object typically stands for an access
    object to a file or device, but can also represent
    the output of some algorithm, such as in the case
    of string readers.

It would be more accurate, and less misleading, to say

    Although a reader object might conceivably have
    something to do with a file or device, this is
    unlikely and programmers should never assume it.
    The purpose of reader and writer objects is to
    represent the output of arbitrary algorithms in
    a form susceptible to imperative i/o.

> > Specification of make-simple-reader:  The document
> > does not explain how a programmer is supposed to
> > lay hands on an object that can legitimately be
> > passed as the second argument (the descriptor) to
> > this procedure.  From that I conclude that this
> > procedure has no conceivable use in portable code,
> > and does not belong in the R6RS.
> 
> I'm obviously failing at describing this clearly, and I need your
> help.  Remember that the Primitive I/O layer is for implementors of
> custom data sources or sinks.  The descriptor is an optional
> communication channel between the operations of a certain kind of
> source or sink.  For example, an implementation of a bytes writer
> (which is built-in, but it wouldn't have to be) will need to provide
> `writer-bytes', given just a writer.  Thus, bytes writers keep the
> data that's being accumulated in the descriptor; the descriptor is a
> communication channel between `open-bytes-writer' and `writer-bytes'.
> None of the other procedures ignorant of what kind of reader/writer
> they get touches the descriptor.  Thus, this has nothing to do with
> portability.

The specification of make-simple-reader must describe
what kinds of objects are acceptable as the second
argument.  It appears to me that you may be trying to
describe a semantics in which any object whatsoever
is acceptable as a second argument.

Furthermore it appears to me that you are trying to
describe a semantics in which this second argument is
essentially useless, because the basic operations on
a simple reader will not have access to that object
unless they close over it.
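
To make the two readings concrete, here is a minimal sketch in Python
(not Scheme, and not the proposal's actual API).  The names
SimpleReader, open_bytes_reader and reader_position are hypothetical,
and the sketch assumes that any object is acceptable as a descriptor.
The read operation reaches its state only because it closes over it;
the auxiliary operation reaches the same state only through the
descriptor.

```python
class SimpleReader:
    """Hypothetical analogue of a simple reader: an id, an opaque
    descriptor, and a read operation supplied by the caller."""

    def __init__(self, ident, descriptor, read):
        self.ident = ident
        self.descriptor = descriptor  # stored, but never passed to `read`
        self.read = read


def open_bytes_reader(data):
    # The state is shared two ways: `read` closes over it, and the
    # reader object carries it as its descriptor.
    state = {"data": bytes(data), "pos": 0}

    def read(buf, start, count):
        # Copy up to `count` bytes into `buf` beginning at `start`.
        chunk = state["data"][state["pos"]:state["pos"] + count]
        buf[start:start + len(chunk)] = chunk
        state["pos"] += len(chunk)
        return len(chunk)

    return SimpleReader("bytes", state, read)


def reader_position(reader):
    # An auxiliary operation that has only the reader in hand, so it
    # must go through the descriptor to find the state.
    return reader.descriptor["pos"]
```

If the descriptor argument were dropped, reader_position would need
some other route to the state, for example a table keyed by reader.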

> > It may be that any object whatsoever may be passed
> > as the descriptor, inasmuch as the reader's state
> > is essentially private to the procedures that are
> > passed (read!, available, get-position, set-position!,
> > end-position, close).  In that case, make-simple-reader
> > has a purpose, but I wonder what purpose is served by
> > its descriptor argument.
> 
> The problem is that this state is hidden in closures, and more
> difficult to make available to auxiliary operations such as
> `writer-bytes'.

The writer-bytes object does not have access to a simple
reader.  Any connection between a reader and a writer
must therefore be made in ways that are not described
by the primitive i/o layer.  I am therefore mystified
by your argument.

> > Buffer modes:  In addition to none, line, and block,
> > shouldn't there be an insouciant mode?
> 
> Sure.  Could we pick a different word, though?  I'm reasonably
> proficient in English, but I had to look this one up.  (And, looking
> at the entry in Roget's, it seems to have negative connotations.)  How
> about `dont-care', `no-preference' or `never-mind'?

First let's decide whether we want this extra mode.
From your responses, I'm wondering whether I even
understand what you mean by buffering.

> > The spec speaks of "the next end of file"; do you envision input
> > ports that contain multiple ends of file?
> 
> The model is that you have a byte sequence with interleaved
> end-of-files which goes on indefinitely.  For a finite data source, it
> ends in an infinite sequence of end-of-files.  I've tried to describe
> this better.

The scream you hear was occasioned by my horror of
this semantics.  What is the rationale?
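
For reference, the model as described in the quoted paragraph can be
sketched as follows.  This is Python, not the proposal's API; the
name `stream` and the `EOF` sentinel are hypothetical, and treating
each chunk boundary as one end of file is an assumption about what
"interleaved" means.

```python
import itertools

EOF = object()  # hypothetical sentinel standing in for the eof object


def stream(chunks):
    """Yield the bytes of each chunk, an EOF after each chunk, and
    then EOF forever once a finite source is exhausted."""
    for chunk in chunks:
        yield from chunk   # iterating over bytes yields integers
        yield EOF          # an end of file interleaved with the data
    while True:
        yield EOF          # a finite source ends in endless EOFs


first_seven = list(itertools.islice(stream([b"ab", b"c"]), 7))
```

Under this reading, a reader that sees EOF cannot tell from that
alone whether more data will follow.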

> > These questions apply to several subsequent specifications as well.
> > By the way, what if UTF-8 is inconsistent with the transcoding of
> > the input port?
> 
> This last sentence I don't understand.  Could you explain?

I don't understand it either.  I think my confusion
was caused in part by the extensive use of pronouns
(e.g. it) with ambiguous antecedents, and in part by
the absence of any semantics for transcoding.

> > Specification of read-char:  This also seems
> > ambiguous in the sense that the character #\a
> > might be followed by modifiers that could be
> > composed with #\a to form a new character.  I
> > presume the intent is that no such compositions
> > be formed.
> 
> No.  The prefix of the byte sequence forms an encoding of a scalar
> value, and it's unambiguous when that sequence ends.

In other words, no such compositions are to be formed.

> > transcode-input-port!:  I don't like the side
> > effect on the port.  I assume the intention is
> > to prevent non-UTF-8 data from being written to
> > a UTF-8 port.
> 
> No.  The intention is to support reading data streams with unknown
> encodings, where the first few bytes denote the encoding.  This is
> fairly common with Unicode, with a BOM at the beginning.  (This is
> where the concept of a purely "character port" falls down, BTW.)

Rather than invent a side effect that can be used
at any time, in any way, including the most inappropriate
ways and times that can be conceived, why not allow
an input port to be opened on inputs that may describe
their byte order in the standard way, allowing the byte
order to default in the standard way?
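
A minimal sketch of that alternative, in Python rather than Scheme.
The function name decode_utf16 is hypothetical.  The sketch assumes
the input is UTF-16 and that "the standard way" means big-endian
when no byte order mark is present; the decision is made once, when
the input is opened, so no later side effect on a port is needed.

```python
import codecs


def decode_utf16(data):
    """Decode UTF-16 bytes, choosing the byte order from a leading
    BOM if there is one and defaulting to big-endian otherwise."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian (an assumption of this sketch).
    return data.decode("utf-16-be")
```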

In particular, I question whether the side-effecting
semantics can satisfy Unicode conformance requirement
C12b.

> > Specification of write-bytes: What if the bytes to be written are
> > inconsistent with the transcoder associated with the output port?
> > The same question applies to write-u8, write-string-n, write-char,
> > et cetera.
> 
> I assume you mean the situation where a non-UTF-8 byte sequence is
> written.  I've put a paragraph on this in the "Transcoders" section.
> (This was specified in the original SRFI, but the relevant section
> drifted to the Streams SRFI, I think.)

It appears that you are requiring transcoders to
replace invalid, incomplete, and unsupported
encodings by question marks.  I question whether
this is consistent with Unicode conformance
requirements C5, C6, C7, C11, and (especially) C12a.
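
The difference between replacing and signalling can be seen with
Python's own codecs, used here only as an illustration and not as a
statement of what the proposal or the Unicode standard requires.
The name try_strict is hypothetical.

```python
text = "caf\u00e9\u20ac"  # "café" followed by a euro sign

# Lossy: the euro sign has no ISO 8859-1 encoding, so the encoder
# substitutes a question mark and the original character is lost.
lossy = text.encode("iso-8859-1", errors="replace")


def try_strict(s):
    # Strict: the same input raises an error instead of being altered.
    try:
        return s.encode("iso-8859-1", errors="strict")
    except UnicodeEncodeError:
        return None


# Decoding an invalid UTF-8 byte with replacement yields U+FFFD.
decoded = b"a\xffb".decode("utf-8", errors="replace")
```

After lossy replacement, a reader of the output cannot distinguish a
substituted question mark from one that was in the original text.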

> > set-output-port-buffer-mode!:  Might there be some
> > inefficiency associated with requiring every output
> > port to support this operation?
> 
> I don't think so.

But I do.

> > Design rationale, Encoding:  The rationale claims to
> > avoid the problems that result from "associating an
> > encoding with a port", "by specifying that textual
> > I/O always uses UTF-8".  I don't follow this at all.
> > The proposal includes "predefined codecs for the ISO
> > 8859-1, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE
> > encodings"
> 
> The codecs translate between UTF-8 and the other encodings.

Then the document ought to say this somewhere.  The
document ought also to say what it means for a port
to be (appropriately or inappropriately) transcoded.

> > and provides a side-effecting procedure that associates them with a
> > port; furthermore that side effect is allowed only once, which seems
> > really ad hoc given that some data may already have been read from
> > or written to the port before that side effect is performed.
> 
> What you're writing is exactly the reason why it's only supported
> once: If the stream is un-transcoded, the buffer position easily
> corresponds to a position in the input stream, and it's trivial to do
> the transcoding *from that point*.  If it is transcoded, this mapping
> isn't easily available.

I may return to this issue after I have read some
explanation of transcoders and their semantics.

Will