[R6RS] I/O design

Wed Dec 6 16:06:26 EST 2006

Mike wrote:
> The correspondence between text and binary data is explained by a
> variant of UTF-8 (no EOL conversion, invalid encodings are replaced by
> an error character).  Thus, every byte sequence has an interpretation
> as text and vice versa.  (It's not an invertible mapping, but that
> doesn't seem to matter much.)  Anyhow, at this level, interleaving
> textual and binary I/O is no problem at all.  For un-transcoded ports,
> you also don't run into trouble with get-position and set-position!
> operations.

Please note that "every byte sequence has an interpretation
as text and vice versa" is highly misleading.  In sequences
of random bytes, about 55.5% of the bytes will not represent
the beginning of a legal UTF-8 character, which means that
Mike's proposal would replace over half of the bytes by an
error character.  To Mike, that doesn't matter much, and
"interleaving textual and binary I/O is no problem at all."

To someone who actually wishes to process binary data,
corrupting over half of the bytes is likely to matter.
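
A rough way to see the effect, sketched in Python purely for
illustration: Python's "utf-8" decoder with errors="replace"
stands in for the proposed variant of UTF-8. That equivalence
is an assumption, and the helper names are hypothetical. The
exact fraction of bytes replaced depends on how a decoder
groups invalid bytes, so the sketch does not try to reproduce
the 55.5% figure; it only shows that random bytes pick up many
error characters and do not survive a round trip.

```python
import random

def replacement_stats(data):
    """Decode data as UTF-8, substituting U+FFFD for invalid input.

    Returns (number of U+FFFD characters, total characters decoded).
    """
    text = data.decode("utf-8", errors="replace")
    return text.count("\ufffd"), len(text)

def survives_round_trip(data):
    """True if decoding and then re-encoding gives back the same bytes."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8") == data

rng = random.Random(0)  # fixed seed so runs are repeatable
sample = bytes(rng.randrange(256) for _ in range(100_000))

bad, total = replacement_stats(sample)
print(f"{bad} of {total} decoded characters are U+FFFD")
print("round trip preserved the bytes:", survives_round_trip(sample))
```

Plain ASCII input passes through unchanged, which is why the
problem is easy to overlook when testing with text files only.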

Historically, Mike's presentations of his proposal have not
been very clear about the distinction between the default
("none") transcoding and the transcoding that uses UTF-8
with no EOL conversion.  If Mike's paragraph quoted above
is actually about the default ("none") transcoding, and has
nothing to do with any transcoding that is based upon the
UTF-8 encoding, then all he is saying is that UTF-8 with no
EOL conversion is one of several completely arbitrary choices
that one could use to extend the domain of get-char (and
other textual I/O procedures) to ports that use the default
("none") transcoding; Latin-1 would be an equally valid
choice.
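
To make the comparison concrete, here is a small Python sketch
(illustrative only; the function names are hypothetical and
Python's codecs stand in for whatever a Scheme implementation
would do). Both functions give every byte sequence some
interpretation as text, so either could serve as the arbitrary
choice; they differ in that Latin-1 maps each byte to exactly
one character and back, while UTF-8 with replacement does not.

```python
ALL_BYTES = bytes(range(256))  # every possible byte value, once each

def as_latin1(data):
    """Interpret each byte as the character with the same code point."""
    return data.decode("latin-1")

def as_utf8_with_replacement(data):
    """Interpret bytes as UTF-8, substituting U+FFFD for invalid input."""
    return data.decode("utf-8", errors="replace")

latin1_text = as_latin1(ALL_BYTES)
utf8_text = as_utf8_with_replacement(ALL_BYTES)

# Latin-1 is invertible on arbitrary bytes; UTF-8 with replacement is not.
print("Latin-1 round trip exact:", latin1_text.encode("latin-1") == ALL_BYTES)
print("UTF-8 round trip exact:  ", utf8_text.encode("utf-8") == ALL_BYTES)
```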

Here is John Cowan's opinion on this [1]:

    No, no, a thousand times no!

    If a port is binary, you should be able to read nothing but bytes objects
    from it (which you can then convert to machine integers or floats using
    the procedures of Section 11).  If you want to read characters from
    a port, make it a character port with a proper transcoder.  Java 1.0
    introduced methods to treat byte sequences as strings in this crass
    fashion; those methods were deprecated already in Java 1.1, and anyone who
    uses them is in a state of sin....

> As a note on Will's proposal: I'm unclear why the rationale for
> avoiding layering in the case of Primitive I/O doesn't also apply to
> layering textual ports on top of binary ones.

With my proposal, binary ports would still be ports, so
there is no need to duplicate portions of the API that
are common to binary and textual ports: open-input-file,
port-position, port-eof?, close-port, et cetera.

Will

[1] http://lists.r6rs.org/pipermail/r6rs-discuss/2006-November/000626.html