[R6RS] Changing the transcoding mid-stream

Sat Aug 19 04:56:50 EDT 2006

William D Clinger <will at ccs.neu.edu> writes:

> Mike wrote:
>> > The ones that appear to perform binary i/o use the procedures that,
>> > according to you, were not intended for binary i/o.
>>
>> I think we're using different definitions of the word "binary I/O".
>> You seem to mean "untranscoded I/O" whereas I mean "I/O to and from
>> bytes objects and octets."  Is that a correct interpretation of what
>> you're asking for?
>
> I don't want to answer in the affirmative, because
> I no longer have any confidence in my understanding
> of what you mean by transcoding.  You are using that
> term to include both "compression or SSL or whatever"
> and Unicode encoding schemes, which to me are radically
> different things.

They are not to me.  The SRFI spells out its notion of transcoding
under "Encoding" in the "Design rationale" section.  Specifically:

>> This SRFI avoids this problem by specifying that textual I/O always
>> uses UTF-8. This means that, if the target or source of an I/O port
>> is to use a different encoding, a translated port needs to be used,
>> for which this SRFI offers the required facilities. This means that
>> text decoders or encoders are expressed as binary-to-binary
>> mappings, and as such compose.

Moreover, the second sentence in the section on "Text Transcoders" is:

>> A transcoder is an opaque object encapsulating a specific
>> translation from byte sequences to byte sequences.

Thus, the SRFI essentially treats textual I/O as a variant of binary
I/O, and transcoding works on the binary data.  That's also why
they're called "transcoders", implying a translation from one encoding
to another, and not "encoders", "decoders", or "codecs".
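
To illustrate the "binary-to-binary mappings that compose" idea, here is
a minimal sketch in Python. It is not the SRFI's API: the names
Transcoder, compose, latin1_to_utf8 and utf8_to_latin1 are hypothetical,
and it works on whole byte strings rather than on ports or streams.

```python
import zlib
from typing import Callable

# Hypothetical type for illustration: a transcoder is just bytes -> bytes.
Transcoder = Callable[[bytes], bytes]


def latin1_to_utf8(data: bytes) -> bytes:
    """Translate Latin-1 encoded bytes into UTF-8 encoded bytes."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Translate UTF-8 encoded bytes into Latin-1 encoded bytes."""
    return data.decode("utf-8").encode("latin-1")


def compose(*stages: Transcoder) -> Transcoder:
    """Chain byte-to-byte stages, applied left to right."""
    def run(data: bytes) -> bytes:
        for stage in stages:
            data = stage(data)
        return data
    return run


# Reading side: decompress first, then translate the encoding to UTF-8.
read_side = compose(zlib.decompress, latin1_to_utf8)
# Writing side: the inverse stages in the opposite order.
write_side = compose(utf8_to_latin1, zlib.compress)

utf8_payload = "naïve café".encode("utf-8")
wire = write_side(utf8_payload)          # compressed Latin-1 bytes
assert read_side(wire) == utf8_payload   # round trip restores the UTF-8 bytes
```

Under this view a compression stage and an encoding stage have the same
type, so they can be stacked in any order that makes sense for the data.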

This is clearly contrary to your notion:

>> The first maps from uninterpreted binary to uninterpreted binary,
>> while Unicode encoding schemes are all about interpreting binary.

I don't know how to argue with you about this---the way the SRFI deals
with transcoding is one of its distinguishing characteristics (and has
been since day 1), and I made sure to point that out every time the
I/O discussion came up.  There are tradeoffs with every approach to
this, which I considered, and I stand by the one I chose.

I don't think there's any "most programmers" notion about this, as most
programmers (including myself when I started out designing this)
haven't really considered the implications of mixed binary and textual
I/O in a multi-encoding and multi-byte encoding setting.

> I think we should either change the proposal so it can
> support what most programmers mean by mixed binary and
> textual i/o, or we should change the proposal to support
> completely separate binary and textual i/o through
> completely separate sets of i/o procedures, or we should
> give up on binary i/o for R6RS and eliminate all of the
> operations that give the misleading impression of
> performing (what most programmers mean by) binary i/o.

I don't.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla