[R6RS] I/O issues

Tue Aug 8 08:22:51 EDT 2006

Mike wrote:
> (You say you don't understand it, but that's the reason I oppose making
> the transcoder mutable, as the simple interface you proposed makes the
> transcoder either immutable for all time or mutable for all time.)

I take it, then, that all transcoders will be immutable in
the compromise you were proposing.

> More fundamentally, ports *are* already mutable...

Point taken.

> I'll try again.  Let's say your input port has a transcoder attached
> to it, and you replace it by another one, by whatever means.
> Supposedly, the new transcoder needs to start transcoding at the
> beginning of the data that the program has read from the port.

That is what I don't get.  It's an important point, because
what you are saying implies that *all* input ports whose
transcoders can be replaced must retain *all* bytes until
the port is closed.  If a program really needs to do that,
I think the program should copy the input to a temporary
file, figure out what transcoders it wants to use for what
parts, and then reread the data from the temporary file.

If you were saying that the new transcoder needs to start
transcoding at the beginning of the data that has not yet
been transcoded by the previous transcoder, it would make
sense to me.  But that interpretation does not affect the
buffering of the underlying port.  Since the standard
transcoders you have proposed do not need any buffering
of their own beyond a small finite-state machine, I don't
see any performance problem with this model.
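
A minimal sketch of this model in Python, offered only as an
illustration and not as anything proposed in this thread.  The
class name RawBufferedPort and its methods read_char and
set_transcoder are hypothetical.  The port buffers one bounded
block of raw bytes, decodes only as many bytes as the program
asks for, and a replacement transcoder starts at the first
byte the previous one has not consumed.

```python
import codecs
import io


class RawBufferedPort:
    """Hypothetical input port: buffers raw bytes, decodes on demand."""

    def __init__(self, stream, encoding, bufsize=4096):
        self._stream = stream      # underlying binary stream
        self._bufsize = bufsize    # bounded: only one block is retained
        self._buf = b""
        self._pos = 0              # index of the next undecoded byte
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def _next_byte(self):
        # Refill the bounded buffer only when it is exhausted.
        if self._pos >= len(self._buf):
            self._buf = self._stream.read(self._bufsize)
            self._pos = 0
            if not self._buf:
                return b""
        byte = self._buf[self._pos:self._pos + 1]
        self._pos += 1
        return byte

    def read_char(self):
        # Feed bytes one at a time until the decoder yields a character.
        # The decoder's only state is a partial byte sequence.
        while True:
            byte = self._next_byte()
            if not byte:
                # Returns "" at a clean end of input; raises
                # UnicodeDecodeError if a sequence was left incomplete.
                return self._decoder.decode(b"", final=True)
            ch = self._decoder.decode(byte)
            if ch:
                return ch

    def set_transcoder(self, encoding):
        # The new decoder starts at self._pos, the first byte not yet
        # transcoded.  Bytes already consumed are never revisited.
        self._decoder = codecs.getincrementaldecoder(encoding)()


# Usage: one UTF-8 character followed by Latin-1 data.
data = "é".encode("utf-8") + "é".encode("latin-1") + b"A"
port = RawBufferedPort(io.BytesIO(data), "utf-8", bufsize=2)
first = port.read_char()
port.set_transcoder("latin-1")
rest = port.read_char() + port.read_char()
```

The sketch assumes the switch happens on a character boundary.
A switch made in the middle of a multi-byte sequence would
discard the old decoder's partial state.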

> Now, if the port is buffered, the implementation has two choices:
> 
> 1. buffer the raw data from the input port
> 2. buffer the transcoded data
> 
> Choice #1 means that only as much data gets transcoded as the program
> asks for.  That's possible, but it's not clear to me that it can be
> done as efficiently as doing it in blocks.  (You promised benchmarks.)

I offered to write benchmarks, but I thought the offer was
declined.  In any case, I can't write a useful benchmark
until I understand the nature of the performance issue.
Furthermore, I agree that unbounded buffering of all input
read so far cannot be as efficient as bounded buffering.
When I offered to write benchmarks, I did not understand
you to be claiming unbounded buffering is necessary.

> With #2, you need to maintain a correspondence between the output
> positions and the input positions as you don't know how much of the
> transcoded data sitting in the buffer the program will read.

I agree that #2 is a loser.

> Regardless of what you think is a good implementation, some
> implementors feel it's an issue:
> 
> http://srfi.schemers.org/srfi-68/mail-archive/msg00073.html

Bothner's message makes sense to me.  Please note that
Bothner does not claim, as you do, that "the new transcoder
needs to start transcoding at the beginning of the data
that the program has read from the port."  Furthermore,
Bothner explains why switching from a character transcoder
to binary is not "a big deal":  "In a sane format, the
string will either be preceded by a count, or will be
followed by a delimiter, such as a nul byte.  In that case
you can extract the string as a byte array, and then
convert it.  Buffering isn't a problem if we're using a
non-stateful encoding *and* we do our own decoding."
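
The approach Bothner describes can be sketched in Python as
follows.  The function names and the 4-byte big-endian count
are assumptions made for the example, not part of any format
discussed here.  The string is read from a binary stream as a
byte array and decoded afterwards, so no transcoder is ever
attached to the stream.

```python
import io
import struct


def read_counted_string(stream, encoding="utf-8"):
    """Read a string preceded by a byte count (assumed 4-byte big-endian)."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated count")
    (count,) = struct.unpack(">I", header)
    payload = stream.read(count)
    if len(payload) != count:
        raise EOFError("truncated string")
    # Decode only after the raw bytes have been extracted.
    return payload.decode(encoding)


def read_delimited_string(stream, encoding="utf-8", delimiter=b"\x00"):
    """Read a string followed by a one-byte delimiter such as a nul byte."""
    raw = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("missing delimiter")
        if byte == delimiter:
            break
        raw += byte
    return bytes(raw).decode(encoding)


# Usage: a counted string, a nul-terminated string, then binary data.
blob = struct.pack(">I", 3) + b"abc" + "né".encode("utf-8") + b"\x00" + b"\x01\x02"
stream = io.BytesIO(blob)
counted = read_counted_string(stream)
delimited = read_delimited_string(stream)
trailing = stream.read()
```

The delimiter version is safe only for encodings in which the
delimiter byte cannot occur inside a character, as is true of
a nul byte in UTF-8 or Latin-1 but not in UTF-16.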

Bothner then suggests the ability to change the transcoder
associated with a port, though it appears to me that his
concerns would be just as well addressed by the ability
to mutate a transcoder.

Will