[R6RS] Changing the transcoding mid-stream

Michael Sperber sperber at informatik.uni-tuebingen.de
Thu Aug 17 12:07:34 EDT 2006


Here's my interpretation of what Will meant:

The optional transcoder argument would be removed from the various
open-... procedures.  Instead, the get-..., lookahead-..., and put-...
procedures would each take an optional transcoder argument:

(get-bytes-some input-port)
(get-bytes-some input-port transcoder)

(get-u8 input-port)
(get-u8 input-port transcoder)

(get-bytes-n input-port n)
(get-bytes-n input-port n transcoder)

(get-bytes-n! input-port bytes start count)
(get-bytes-n! input-port bytes start count transcoder)

(get-bytes-all input-port)
(get-bytes-all input-port transcoder)
   
(get-char input-port)
(get-char input-port transcoder)
   
(get-string-n input-port n)
(get-string-n input-port n transcoder)

(get-string-n! input-port string start count)
(get-string-n! input-port string start count transcoder)
   
(get-string-all input-port)
(get-string-all input-port transcoder)
   
(get-line input-port)
(get-line input-port transcoder)
    
(lookahead-u8 input-port)
(lookahead-u8 input-port transcoder)
   
(lookahead-char input-port)
(lookahead-char input-port transcoder)

(put-bytes output-port bytes)
(put-bytes output-port bytes start)
(put-bytes output-port bytes start count)
(put-bytes output-port bytes start count transcoder)

(put-u8 output-port octet)
(put-u8 output-port octet transcoder)
   
(put-string-n output-port string)
(put-string-n output-port string start)
(put-string-n output-port string start count)
(put-string-n output-port string start count transcoder)
   
(put-char output-port char)
(put-char output-port char transcoder)

I'll point out the following possible flaw in the reasoning behind
this proposal.  Suppose you're doing this:

(define t (transcoder (codec (utf-16be-codec))))

(put-u8 p #xEF t)
(put-u8 p #xBB t)
(put-u8 p #xBF t)

which should be equivalent to 

(put-char p #\xFEFF t)

as the former is just the UTF-8 encoding of the latter.  To do the
transcoding correctly in the former case, nothing can be written
after the first two calls: #xEF and #xBB are only the prefix of a
character, so they have to be kept as state (presumably in the port)
between the invocations of `put-u8'.  Now, the code has to deal with
the possibility that the transcoder in the second call is different
from the transcoder in the first call.  This might involve checking
the state that's stored in the port for compatibility with the codec
of the transcoder passed to the current call.
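
To make the state problem concrete, here is a rough sketch of what
such a port might have to remember.  Everything in it is hypothetical
and not part of any draft: `sketch-port', `sketch-put-u8!' and the
codec tag (a plain symbol standing in for a transcoder) are names made
up for illustration.  It only handles UTF-8 sequences of up to three
bytes, does not validate continuation bytes, and always emits
UTF-16BE.

```scheme
(import (rnrs))

;; Hypothetical model of a port: partial UTF-8 state plus output bytes.
(define-record-type sketch-port
  (fields (mutable acc)       ; code point bits collected so far
          (mutable pending)   ; continuation bytes still expected
          (mutable codec)     ; codec tag that started the pending char
          (mutable out)))     ; output bytes, most recent first

(define (make-empty-port) (make-sketch-port 0 0 #f '()))

;; Append one code point as UTF-16BE (BMP only; no surrogate pairs).
(define (emit-utf-16be! p cp)
  (sketch-port-out-set! p
    (cons (bitwise-and cp #xFF)
          (cons (bitwise-arithmetic-shift-right cp 8)
                (sketch-port-out p)))))

;; Start a multi-byte character and remember who started it.
(define (start-char! p bits pending codec)
  (sketch-port-acc-set! p bits)
  (sketch-port-pending-set! p pending)
  (sketch-port-codec-set! p codec))

(define (sketch-put-u8! p octet codec)
  (cond
    ((> (sketch-port-pending p) 0)
     ;; Mid-character: the stored state only makes sense for the
     ;; codec that produced it, so a different one is rejected.
     (unless (eq? codec (sketch-port-codec p))
       (error 'sketch-put-u8! "codec changed mid-character" codec))
     (sketch-port-acc-set! p
       (bitwise-ior
         (bitwise-arithmetic-shift-left (sketch-port-acc p) 6)
         (bitwise-and octet #x3F)))
     (sketch-port-pending-set! p (- (sketch-port-pending p) 1))
     (when (= (sketch-port-pending p) 0)
       (emit-utf-16be! p (sketch-port-acc p))))
    ((< octet #x80)                       ; one-byte character
     (emit-utf-16be! p octet))
    ((= (bitwise-and octet #xE0) #xC0)    ; lead byte, 2-byte sequence
     (start-char! p (bitwise-and octet #x1F) 1 codec))
    ((= (bitwise-and octet #xF0) #xE0)    ; lead byte, 3-byte sequence
     (start-char! p (bitwise-and octet #x0F) 2 codec))
    (else
     (error 'sketch-put-u8! "unsupported lead byte" octet))))

(define p (make-empty-port))
(sketch-put-u8! p #xEF 'utf-16be)   ; nothing emitted yet
(sketch-put-u8! p #xBB 'utf-16be)   ; still nothing; one byte pending
(sketch-put-u8! p #xBF 'utf-16be)   ; completes U+FEFF
(reverse (sketch-port-out p))       ; => (254 255), i.e. #xFE #xFF
```

Passing a different tag in the second or third call would hit the
"codec changed mid-character" error, which is exactly the extra check
the per-call transcoder argument seems to force on every call.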

-- 
Cheers =8-} Mike
Peace, international understanding, and blah blah in general


