[R6RS] Changing the transcoding mid-stream

William D Clinger will at ccs.neu.edu
Thu Aug 17 14:03:48 EDT 2006


Mike wrote:
> The optional transcoder argument would go away from the various
> open-... procedures.  Then, the signatures of the get-... procedures
> would all get an optional transcoder argument:
> 
> (get-bytes-some input-port)
> (get-bytes-some input-port transcoder)
> 
> (get-u8 input-port)
> (get-u8 input-port transcoder)
> 
> (get-bytes-n input-port n)
> (get-bytes-n input-port n transcoder)
> 
> (get-bytes-n! input-port bytes start count)
> (get-bytes-n! input-port bytes start count transcoder)
> 
> (get-bytes-all input-port)
> (get-bytes-all input-port transcoder)

I'm confused by this.  I thought the above procedures
were supposed to perform binary input, independent of any
transcoder.

> (lookahead-u8 input-port)
> (lookahead-u8 input-port transcoder)

Ditto.

> (put-bytes output-port bytes)
> (put-bytes output-port bytes start)
> (put-bytes output-port bytes start count)
> (put-bytes output-port bytes start count transcoder)
> 
> (put-u8 output-port octet)
> (put-u8 output-port octet transcoder)

Ditto.

> I'll point out the following possible flaw in the reasoning behind
> this proposal.  Suppose you're doing this:
> 
> (define t (transcoder (codec (utf-16be-codec))))
> 
> (put-u8 p #xEF t)
> (put-u8 p #xBB t)
> (put-u8 p #xBF t)
> 
> which should be equivalent to
> 
> (put-char p #\xFEFF t)
> 
> as the former is just the UTF-8 encoding of the latter.

Agreed.

> To do the
> transcoding correctly in the former case, you need to keep
> state---presumably in the port---between the invocations of `put-u8'.

I don't understand the need for that.  Since put-u8 is just
doing binary output (in my understanding), it should simply
write the octet given as its second argument to the output
port, with no transcoding and no state carried between calls.
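
To make that reading concrete, here is a minimal sketch.  It
assumes the procedure names of the later ratified R6RS libraries
(open-bytevector-output-port, string->utf16, and a put-u8 with no
transcoder argument), not the draft signatures quoted above; the
helper bytes-written-by is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical helper: run PROC on an in-memory binary output port
;; (opened with no transcoder) and return the octets it wrote.
(define (bytes-written-by proc)
  (let-values (((port extract) (open-bytevector-output-port)))
    (proc port)
    (bytevector->u8-list (extract))))

;; Three independent put-u8 calls.  Each octet is written as given;
;; the port has nothing to remember between calls.
(define raw-octets
  (bytes-written-by
   (lambda (p)
     (put-u8 p #xEF)
     (put-u8 p #xBB)
     (put-u8 p #xBF))))

;; For contrast: encoding U+FEFF as UTF-16BE is done explicitly,
;; outside the port, and yields different octets.
(define utf16-octets
  (bytevector->u8-list (string->utf16 "\xFEFF;" (endianness big))))
```

Under these assumptions raw-octets holds the three octets exactly
as written, while utf16-octets holds the two-octet UTF-16BE form.
The conversion between them is a separate, explicit step.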

Please note that while the Unicode standard proscribes the
insertion of arbitrary binary data into Unicode strings, it
does *not* forbid the placement of arbitrary binary data
adjacent to Unicode strings in records or in files.
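
As an illustration of binary data sitting next to text, the
following sketch writes a small record to a binary port.  The
record layout (tag octet, length octet, UTF-8 payload) and the
name put-record are hypothetical, and the procedure names are
again assumed from the later ratified R6RS libraries.

```scheme
(import (rnrs))

;; Hypothetical record layout: one tag octet, one length octet,
;; then the text encoded as UTF-8.
(define (put-record port tag text)
  (let ((payload (string->utf8 text)))
    (put-u8 port tag)                          ; arbitrary binary data
    (put-u8 port (bytevector-length payload))  ; assumes fewer than 256 octets
    (put-bytevector port payload)))            ; text, encoded before writing

;; #xFF can never occur inside well-formed UTF-8, yet it may sit
;; next to the encoded string in the same stream.
(define record
  (let-values (((port extract) (open-bytevector-output-port)))
    (put-record port #xFF "\x3BB;")
    (extract)))
```

A reader that knows the layout can pull the two leading octets
off as binary and decode only the payload as UTF-8.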

> Now, the code has to deal with the possibility that the transcoder in
> the second call is different from the transcoder in the first call.

Not if put-u8 doesn't take a transcoder argument, which (in
my view) it shouldn't.

> This might involve checking the state that's stored in the port for
> compatibility with the codec.

Not if ports are purely binary, which (in my view) is a
consequence of making the port and codec orthogonal.

Will


