[R6RS] I/O

William D Clinger will at ccs.neu.edu
Thu Jul 13 12:02:10 EDT 2006


Mike wrote:
> > If it's true, then stating that it is true would help.
> > If it's false, then stating that it is false would help,
> > but you would still have to explain what kind of objects
> > are acceptable as a descriptor.
> 
> I tried to do that in the latest draft.  Better?

Yes, thank you.

> >> > I would guess that you are assuming a model in which some
> >> > character, e.g. END OF TRANSMISSION, popularly known as
> >> > control-D, is interpreted as an end of file when typed.
> >>
> >> No.  I'm assuming a model where the other end sends a block of data
> >> and then pauses temporarily (i.e. stops typing).
> >
> > Wow.  Please confirm:
> 
> No, you're right....
> Your original statement quoted above is correct.  Sorry about that.

In that case, I beg you to change the specification of
read-string-all so it returns #f when no data is available,
instead of returning the end-of-file object.  As things
stand, there is no way to distinguish between:

    1.  no character available at the moment; try again later
and
    2.  true end of file, forever and ever, amen.

If situation 1 looks exactly the same as situation 2, then
programs are going to waste time and space in polling an
input port that some part of the system already knows will
never be able to deliver any characters.
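
To make the distinction concrete, here is a minimal sketch in
Python (not Scheme, and not taken from any draft) using a
non-blocking POSIX pipe.  The names read_some and EOF are
hypothetical; None plays the role that #f would play in the
proposal above.

```python
import os

EOF = object()  # hypothetical stand-in for the end-of-file object


def read_some(fd, limit=4096):
    """Return bytes if data is ready, None if nothing is available
    yet, or EOF once the writing side is closed and drained."""
    try:
        data = os.read(fd, limit)
    except BlockingIOError:
        return None                 # situation 1: try again later
    return data if data else EOF    # empty read: situation 2


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)       # reads no longer wait for data
    results = [read_some(r)]        # writer open, nothing written yet
    os.write(w, b"hi")
    results.append(read_some(r))    # data is available
    os.close(w)
    results.append(read_some(r))    # writer gone: true end of file
    os.close(r)
    return results
```

A caller that receives None knows polling again may succeed; a
caller that receives EOF can stop polling and release the port.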

Matthew wrote:
> When you type control-D in a Unix terminal, it doesn't actually insert
> a control-D into the input stream connected to the terminal. Instead,
> the terminal handles the control-D in such a way that the stream
> returns an end-of-file to the reader (i.e., the read() system call
> returns 0, which indicates an end-of-file).
> 
> In other words, at the Unix stream level, it's not a question of
> keyboard handling or interpreting characters. The underlying Unix
> stream model supports EOF results that are followed by non-EOF results.

Ah, Eunichs.  Deficient by Design (service mark).

% stty --help; stty eof 0x0; stty -a

In other words, it is entirely a matter of interpreting
characters.  The problem is (1) the usual interpretation
inserts fake EOF results that can be followed by non-EOF
results, and (2) in cooked mode, this interpretation is
performed before the Scheme program sees the input.

More from Matthew:
> Whether we want to support this facet of Unix streams is a valid
> question.

I think the right thing for us to do is to write the R6RS
in a sane manner, as though end-of-file is a permanent
condition, but to add a note warning that certain operating
systems are known to issue spurious end-of-file results
under some circumstances, so that some input ports may
still be able to deliver characters after delivering an
end-of-file object.  I think that will make it clearer
that implementations of Scheme are free to do whatever
they want about OS-specific idiocy.
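
As one illustration of what an implementation might choose to
do, here is a minimal Python sketch of a wrapper that makes
end-of-file permanent even when the underlying source would
deliver more data afterwards.  The class name StickyEOFReader
and the read_chunk callable are hypothetical, and this is only
one possible policy, not a proposed requirement.

```python
class StickyEOFReader:
    """Wrap a chunk source so that end-of-file, once seen, is permanent."""

    def __init__(self, read_chunk):
        # read_chunk: hypothetical callable returning bytes; b"" means EOF
        self._read_chunk = read_chunk
        self._at_eof = False

    def read(self):
        if self._at_eof:
            return b""              # never consult the source again
        chunk = self._read_chunk()
        if not chunk:
            self._at_eof = True     # remember the first EOF forever
        return chunk


def demo():
    # Simulated source that issues a spurious EOF between two blocks.
    raw = iter([b"abc", b"", b"def"])
    port = StickyEOFReader(lambda: next(raw))
    return [port.read() for _ in range(4)]
```

In this sketch the block b"def" is never delivered; an
implementation that prefers to expose the OS behavior would
simply omit the wrapper.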

Back to Mike:
> > I believe I suggested an alternative some time ago: provide
> > a procedure that takes an input port and a transcoding, and
> > returns an input port that uses the transcoding.
> 
> I considered that, but it is very difficult to implement efficiently.
> (At least for me.)  I believe PLT's implementation of these procedures
> effectively turns off buffering on the underlying port, forcing the
> decoding to happen character-by-character.

That's a valid objection.  On the other hand, the inefficiency
would occur only in what we expect to be a rare combination of
circumstances:

    1.  the input may be in a completely nonstandard encoding
        (else one of the standard ways of opening the port
        and determining its encoding would suffice),

    2.  the program has some special knowledge that might
        enable it to guess the completely nonstandard
        encoding, and

    3.  having determined the encoding, and having specified
        an appropriate transcoder, the program continues to
        read the input both with and without transcoding.

In my opinion, this combination of circumstances will be so
rare as not to matter.  What does matter is whether the R6RS
provides convenient ways to open ports with all of the standard
Unicode encodings, including the common (though nonstandard)
situation in which the input might begin with a byte order mark
but is not known to do so, and a modest guess is required.
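
The kind of modest guess meant here can be sketched in Python
as follows.  The function names and the UTF-8 fallback are
assumptions made for illustration; nothing here is proposed
R6RS API.

```python
import codecs

# UTF-32 marks are tested first because the UTF-32-LE mark
# begins with the same two bytes as the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(prefix, default="utf-8"):
    """Return (encoding name, length of the mark to skip)."""
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name, len(bom)
    return default, 0               # no mark found: fall back (assumption)


def decode_with_guess(data):
    name, skip = guess_encoding(data[:4])
    return data[skip:].decode(name)
```

The guess is not infallible: UTF-16-LE text whose first
character after the mark is U+0000 has the same first four
bytes as a UTF-32-LE mark.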

> > Every continuable exception must be accompanied by a protocol
> > that describes how the exception can be continued.  In this
> > case, I propose that an exception handler that wishes to
> > ignore the situation simply return zero values, and that an
> > exception handler that wishes to replace the situation with
> > some sequence of replacement characters simply return those
> > characters.
> 
> OK, that can be done.  I'd still like us to decide how we want to
> handle encoding errors before I invest the time required to specify
> and implement this protocol.

Fair enough.
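
For readers who want a feel for the handler protocol proposed
in the quoted text, Python's codec error handlers offer a rough
analogy: the handler is called at the point of the decoding
error and returns the replacement text (possibly empty) along
with the position at which to resume.  The handler names
"sketch-ignore" and "sketch-replace" are hypothetical, and this
is an analogy only, not the R6RS condition system.

```python
import codecs


def make_handler(replacement):
    """Build a decoding-error handler that substitutes `replacement`."""
    def handler(exc):
        if not isinstance(exc, UnicodeDecodeError):
            raise exc               # this sketch only continues decoding errors
        # Resume just past the offending bytes.
        return (replacement, exc.end)
    return handler


# Returning the empty string corresponds to "return zero values".
codecs.register_error("sketch-ignore", make_handler(""))
# Returning characters corresponds to "return replacement characters".
codecs.register_error("sketch-replace", make_handler("\ufffd"))


def decode(data, policy):
    return data.decode("utf-8", errors=policy)
```

For example, decode(b"a\xffb", "sketch-ignore") drops the bad
byte, while "sketch-replace" substitutes U+FFFD for it.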

> > The point is that buffering *can* be used, and will be more
> > effective if implementations are allowed to design their own
> > protocols than if they are required to support random things
> > like set-output-port-buffer-mode! on every port.
> 
> I don't know that they're random: This exact model of controlling
> buffering is common, for example, to SML's and POSIX's models, and
> the available modes are certainly all useful.

I don't object to the buffering modes per se, and I believe
your specification will allow implementations to ignore the
requested mode in some circumstances.  (For example, I do
not see how the language lawyers could object to a system
that implements the block mode as line or none.)

I object to requiring every port to support a side effect
that can change its buffering mode at any time.  If this
side effect is merely a request that implementations can
ignore, then I withdraw my objection.
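
What "merely a request" could look like is sketched below in
Python.  The class Port, its method request_buffer_mode, and
the mode names as strings are all hypothetical; the point is
only that the caller learns which mode is actually in effect
and the port is free to decline.

```python
class Port:
    """A port whose buffer mode can be requested but not forced."""

    def __init__(self, supported=("none", "line")):
        self._supported = supported
        self.buffer_mode = supported[0]

    def request_buffer_mode(self, mode):
        """Treat `mode` as a hint; return the mode actually in effect."""
        if mode in self._supported:
            self.buffer_mode = mode
        elif mode == "block" and "line" in self._supported:
            self.buffer_mode = "line"   # implement block mode as line
        # Otherwise the request is ignored and the mode is unchanged.
        return self.buffer_mode
```

A port that supports only "none" would answer every request
with "none", which is the degenerate case mentioned above.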

Will


