[R6RS] Changing the transcoding mid-stream

Michael Sperber sperber at informatik.uni-tuebingen.de
Sat Aug 19 07:42:02 EDT 2006


William D Clinger <will at ccs.neu.edu> writes:

> Note, however, that the section is called "Text Transcoders",
> and the sentence preceding the one you quoted begins with the
> words "Text transcoders".  

Yes, and I already admitted that this was potentially misleading.  The
next draft will not have this wording.

> The third paragraph of that section requires transcoders to
> do something weird if they encounter an illegal encoding.
> That implies that all of the transcoders, including the UTF-8
> transcoder, will interfere with binary i/o.  Since the SRFI
> also says that "no codec" corresponds to UTF-8, it follows
> that the proposal is useless for what I mean by binary i/o.

OK, I get the misunderstanding.  More below.

> I don't want to argue with you about what most programmers
> believe or have considered.  What concerns me is whether
> the proposal can deal with what I personally consider to
> be binary and mixed binary/textual i/o.  Since you do not
> like my definition of those things,

I don't dislike your definitions; I'm still in the process of
understanding them, just as you were still in the process of
understanding mine.  I do want to address your concerns, but up to now
I didn't know what they were.

First of all, here's how to read your WAV file:

(open-file-input-port "foo.wav")

Indeed, "no transcoder" isn't the same as specifying a UTF-8
transcoder, and the sentence you quoted is highly misleading.  Sorry
about that.  As long as no transcoder is associated with the port, the
various binary I/O procedures deal with the binary data without doing
any interpretation of it as UTF-8.  Even a

(transcoder (eol-style (eol-style crlf)))

would only ever look for #x0a and #x0d bytes and translate those,
ignoring the rest.

> You couldn't use read-char to read the text fields because read-char
> assumes UTF-8.  You would have to use read-bytes-n (or similar) for
> the text fields, and then translate the bytes yourself.

No, you wouldn't.  The details depend on what mechanism we adopt for
associating a transcoder with a port and/or changing it mid-stream.
In the proposal where the transcoder is an argument, you'd do:

(define port (open-file-input-port "foo.wav"))
; binary I/O
... (get-u8 port) ...
; textual I/O
... (get-char port (transcoder (codec (latin1-codec)))) ...
; binary I/O again
... (get-u8 port) ...
                                   
If the transcoder is settable, you do:

(define port (open-file-input-port "foo.wav"))
; binary I/O
... (get-u8 port) ...
; textual I/O
(input-port-transcoder-set! port (transcoder (codec (latin1-codec))))
... (get-char port) ...
; binary I/O again
(input-port-transcoder-set! port (transcoder))
... (get-u8 port) ...

(One might make #f an alias for (transcoder) here.)
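
For comparison, here is a minimal sketch of the same mixed
binary/textual read that keeps the port binary throughout and decodes
the text fields explicitly.  It assumes the procedure names of the
published R6RS standard libraries (`get-bytevector-n',
`bytevector->string', `make-transcoder', `latin-1-codec'), which differ
from the draft names used in this message.  The helpers
`get-latin1-string' and `get-u32-le' and the sample bytes are
hypothetical, made up for illustration; an in-memory port stands in for
"foo.wav".

```scheme
(import (rnrs))

;; Hypothetical helper: read N bytes from a binary port and decode
;; them as Latin-1 text.
(define (get-latin1-string port n)
  (bytevector->string (get-bytevector-n port n)
                      (make-transcoder (latin-1-codec))))

;; Hypothetical helper: read a 32-bit little-endian unsigned integer.
(define (get-u32-le port)
  (bytevector-u32-ref (get-bytevector-n port 4) 0 (endianness little)))

;; Assumed sample data shaped like the start of a WAV file:
;; "RIFF", the number 36 in little-endian order, then "WAVE".
(define sample
  (u8-list->bytevector '(82 73 70 70 36 0 0 0 87 65 86 69)))

(define port (open-bytevector-input-port sample))

(define tag  (get-latin1-string port 4))  ; textual field
(define size (get-u32-le port))           ; binary field
(define form (get-latin1-string port 4))  ; textual field again
```

The helpers assume the port still holds enough bytes; a real reader
would check for the end-of-file object first.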

> In particular, we have no operations for translating bytes objects
> (or subsequences of bytes objects) into strings.  (We have
> open-bytes-reader, but there is no way to specify a
> translation/transcoding for it.)

No, but there's `open-bytes-input-port', which does let you specify a
transcoder.
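
A minimal sketch of translating bytes into a string this way, assuming
the published R6RS names rather than the draft ones
(`open-bytevector-input-port' with an optional transcoder argument,
`make-transcoder', `latin-1-codec', `get-string-all').  The sample bytes
are made up for illustration.

```scheme
(import (rnrs))

;; Assumed example data: two bytes that spell A-umlaut followed by
;; "n" when read as Latin-1.
(define raw (u8-list->bytevector (list #xC4 #x6E)))

;; Supplying a transcoder yields a textual port over the bytes.
(define text-port
  (open-bytevector-input-port raw (make-transcoder (latin-1-codec))))

;; Read everything the port has as one string.
(define decoded (get-string-all text-port))
```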

-- 
Cheers =8-} Mike
Peace, international understanding, and all that blah blah


