[R6RS] I/O

William D Clinger will at ccs.neu.edu
Tue Jul 4 05:13:47 EDT 2006


Comments on the current draft of SRFI 79 (Primitive I/O)
========================================================

General comment:  Exposing so much low-level detail
makes it harder to construct an efficient i/o system.
To me, this primitive i/o abstraction layer looks
like an extra layer of pure overhead.

If a port were defined as a reader or writer plus a
transcoder, I could see some use to this layer, but
with the side-effecting semantics for associating
transcoders with ports, I don't.

Filenames:  Please define "octet" somewhere.

Readers and Writers:  "The objects representing I/O
descriptors are called readers for input and writers
for output."  That sentence appears to be misleading
because, if I understand this document correctly,
the word "descriptor" means something completely
different (and essentially undefined) throughout
the rest of the document; it does not mean a reader
object or a writer object.

Readers, (get-position):  "EOFs do not count as
octets."  Do you envision multiple EOFs?

Specification of make-simple-reader:  The document
does not explain how a programmer is supposed to
lay hands on an object that can legitimately be
passed as the second argument (the descriptor) to
this procedure.  From that I conclude that this
procedure has no conceivable use in portable code,
and does not belong in the R6RS.

It may be that any object whatsoever may be passed
as the descriptor, inasmuch as the reader's state
is essentially private to the procedures that are
passed (read!, available, get-position, set-position!,
end-position, close).  In that case, make-simple-reader
has a purpose, but I wonder what purpose is served by
its descriptor argument.
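
To make the point about private state concrete, here is a minimal sketch in Python (not Scheme, and not the SRFI's API). The names make_simple_reader and bytes_read_procedure and the three-argument signature are hypothetical simplifications for illustration only. The read! procedure keeps all of its state in a closure, so the descriptor value never influences the result.

```python
def make_simple_reader(name, descriptor, read):
    """Hypothetical stand-in for make-simple-reader: just bundles its arguments."""
    return {"id": name, "descriptor": descriptor, "read!": read}


def bytes_read_procedure(data):
    """Return a read! procedure whose state (data, position) lives in the closure."""
    position = 0

    def read(buffer, start, count):
        nonlocal position
        chunk = data[position:position + count]
        buffer[start:start + len(chunk)] = chunk
        position += len(chunk)
        return len(chunk)  # 0 signals end of file in this sketch

    return read


# Two readers that differ only in the descriptor they were given.
reader_a = make_simple_reader("demo", None, bytes_read_procedure(b"hello"))
reader_b = make_simple_reader("demo", object(), bytes_read_procedure(b"hello"))

buf_a, buf_b = bytearray(5), bytearray(5)
count_a = reader_a["read!"](buf_a, 0, 5)
count_b = reader_b["read!"](buf_b, 0, 5)
```

In this sketch both readers behave identically, which is the sense in which the descriptor argument appears to serve no purpose.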

Specification of make-simple-writer:  See my
comments above on make-simple-reader.


Comments on the current draft of SRFI 81 (Port I/O)
===================================================

Prerequisites:  The unspecified value should be specified
as the value returned by the unspecified procedure.
Instead of saying "strings are represented as vectors
of scalar values", which implies that the vector?
predicate is true of strings, it should say something
like "strings are analogous to vectors of scalar values".

File options:  Instead of saying that file options are
as in SRFI 79, it should say that file options are a
subset of a certain set of symbols, as in the current
draft of the primitive I/O SRFI.

Buffer modes:  In addition to none, line, and block,
shouldn't there be an insouciant mode?  The description
of buffer-mode should refer to name as a symbol, not
as an identifier.  (The buffer-mode syntax should
recognize the name as a symbol, not as the name of
a variable.  This matters when buffer-mode is used
within the scope of a variable whose name looks like
the symbol that names the mode.)

Specification of eol-style:  These forms should
evaluate to the symbols lf, crlf, and cr.

Specification of read-bytes-some:  If this procedure
is intended to hang when waiting to see whether more
bytes are forthcoming from its argument, the spec
should say so.  This applies to several subsequent
specifications also.

Specification of read-u8:  Please define octet
somewhere.  The spec speaks of "the next end of file";
do you envision input ports that contain multiple ends
of file?  How is "just past the end of file" different
from "just before the end of file"?  These questions
apply to several subsequent specifications as well.
By the way, what if UTF-8 is inconsistent with the
transcoding of the input port?

Specification of read-string:  The number of bytes
read appears to be ambiguous, since 0 bytes can
always be interpreted as a UTF-8 string and many
bytes that could follow a UTF-8 string might be
interpreted as an extension of that string.
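
A small illustration of this ambiguity, sketched in Python rather than Scheme. The helper valid_utf8_prefix_lengths is a hypothetical name, and the sample data is an assumption chosen for the example. It lists every prefix length of a byte sequence that is itself a complete UTF-8 string, so each one is a candidate answer to "how many bytes were read?".

```python
def valid_utf8_prefix_lengths(octets):
    """Return every n such that octets[:n] decodes as a complete UTF-8 string."""
    lengths = []
    for n in range(len(octets) + 1):
        try:
            octets[:n].decode("utf-8")
        except UnicodeDecodeError:
            continue  # prefix ends in the middle of a multi-byte sequence
        lengths.append(n)
    return lengths


data = "a\u00f1b".encode("utf-8")  # 4 octets: 'a', two octets for U+00F1, 'b'
prefix_lengths = valid_utf8_prefix_lengths(data)
print(prefix_lengths)  # [0, 1, 3, 4]
```

Only the prefix that splits the two-octet sequence is ruled out; the empty prefix and every longer complete prefix all qualify.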

Specification of read-char:  This also seems
ambiguous in the sense that the character #\a
might be followed by modifiers that could be
composed with #\a to form a new character.  I
presume the intent is that no such compositions
be formed.  A similar remark applies to the next
two procedures.
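
The two readings can be shown with a short Python sketch (an illustration only; the variable names are assumptions, and Python's unicodedata module stands in for whatever composition a Scheme implementation might perform). Decoding by scalar value yields #\a followed by a combining mark, whereas canonical composition (NFC) merges them into one character.

```python
import unicodedata

# 'a' followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8.
octets = "a\u0301".encode("utf-8")

# Reading scalar value by scalar value: no composition is formed.
scalar_values = list(octets.decode("utf-8"))

# Reading with canonical composition: the pair collapses to U+00E1.
composed = unicodedata.normalize("NFC", "".join(scalar_values))
```

Under the presumed intent, read-char would return the first element of scalar_values, not the composed character.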

Specification of read-string-all:  How is this
procedure different from read-string ?

Specification of port-eof?:  What if the port is
currently pointing *past* an end of file (whatever
that means)?

Specification of input-port-position:  The term
"transcoded port" has not been defined prior to
its mention in this spec.  Ditto for "truncated
stream" and "translated stream".

Specification of set-input-port-position!:  Ditto
the above, plus "terminated stream", which I assume
is something like a closed port.

transcode-input-port!:  I don't like the side
effect on the port.  I assume the intention is
to prevent non-UTF-8 data from being written to
a UTF-8 port.  In that case, I'd prefer to have
some immutable transcoding defined for all ports.
(Most of the procedures that open ports already
allow a transcoding to be specified.  It should
be possible to specify a default transcoding to
be used when none is specified, and get rid of
this side effect.)

Specification of open-bytes-input-port:  The term
"byte stream" has not been defined.  Ditto for
open-string-input-port.

Specification of write-bytes:  What if the bytes
to be written are inconsistent with the transcoder
associated with the output port?  The same question
applies to write-u8, write-string-n, write-char, et
cetera.

set-output-port-buffer-mode!:  Might there be some
inefficiency associated with requiring every output
port to support this operation?

transcode-output-port!:  See my remarks regarding
transcode-input-port!.

call-with-string-output-port:  Why does this create
a "bytes writer" instead of a character writer?  If
it's a bytes writer, programs can write sequences
of bytes that have no UTF-8 decoding, and the spec
doesn't say what's supposed to happen in that case.
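
A minimal Python sketch of the unspecified case, using io.BytesIO as a stand-in for a bytes writer (an assumption; it is not the SRFI's mechanism). Bytes that are not valid UTF-8 are accepted on output, and the problem only surfaces when the result is converted to a string.

```python
import io

sink = io.BytesIO()        # stand-in for the "bytes writer"
sink.write(b"ok ")
sink.write(b"\xff\xfe")    # neither octet can begin a UTF-8 sequence
raw = sink.getvalue()

try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True   # one possible behavior: signal an error

# Another possible behavior: substitute U+FFFD for each bad octet.
lossy = raw.decode("utf-8", errors="replace")
```

Raising an error and substituting replacement characters are both plausible behaviors; the spec would need to choose one.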

Specification of open-file-input+output-ports:  The
period at the end of the first sentence should be
outside both parentheses.  (Once again, "stream ports"
is an undefined term.)

Design rationale, Encoding:  The rationale claims to
avoid the problems that result from "associating an
encoding with a port", "by specifying that textual
I/O always uses UTF-8".  I don't follow this at all.
The proposal includes "predefined codecs for the ISO
8859-1, UTF-16LE, UTF-16BE, UTF32-LE, and UTF-32BE
encodings" and provides a side-effecting procedure
that associates them with a port; furthermore that
side effect is allowed only once, which seems really
ad hoc given that some data may already have been
read from or written to the port before that side
effect is performed.
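
To illustrate why switching the encoding after some data has been consumed is fragile, here is a Python sketch under assumed conditions (the data, the one-octet read, and the variable names are all hypothetical). One octet of UTF-16LE text is read under the default UTF-8 interpretation before the codec is changed, which leaves the remaining octets misaligned.

```python
data = "hi".encode("utf-16-le")   # b'h\x00i\x00'

# One octet is consumed as UTF-8 before the transcoding is changed.
first = data[:1].decode("utf-8")

# The next two octets are then decoded as UTF-16LE, off by one octet.
misaligned = data[1:3].decode("utf-16-le")

# Decoding the whole stream with the right codec from the start.
intended = data.decode("utf-16-le")
```

The first character happens to come out right, but the next code unit is assembled from the wrong pair of octets.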

Design rationale, display:  According to the most
recent status report, formatted output is not under
consideration for R6RS, so something like display
should remain.

Will
