[R6RS] I/O questions for everyone: encoding errors

Matthew Flatt mflatt at cs.utah.edu
Thu Jul 13 14:57:29 EDT 2006


At Thu, 13 Jul 2006 19:58:50 +0200, Michael Sperber wrote:
> 
> 1. If one of the read-char and read-string... procedures encounters an
>    invalid encoding, should it:
> 
>    a) skip the first byte of the invalid encoding and treat it as 
>       U+FFFD (REPLACEMENT CHARACTER)
>    b) skip the first byte of the invalid encoding and ignore it
>    c) raise a continuable exception that allows the handler to specify
>       what the decoding should be
>    d) do one of the above depending on an (optional) configuration
>       option specified upon opening the port.

"c", but without the "continuable" part.

Options "a" and "b" can be expressed as decodings. For example, there's
a decoding like UTF-8, except that bytes that would be bad in UTF-8 are
decoded as U+FFFD. Similarly, there's a decoding that ignores bytes
that would be bad for UTF-8.

Or maybe I mean "d", because you get to specify the transcoder for the
port.

For transcoders created with the current pre-defined codecs, I think
"c" is the right answer. To get "a"- or "b"-like behavior, we could add
some pre-defined "a"- or "b"-like codecs, or we could add an extra
argument to `transcoder'; I have no opinion on whether or how that
should be done, though.
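
To make the "options as decodings" idea concrete, here is a minimal
sketch in Python rather than Scheme. It uses Python's built-in codec
error handlers, which are only an analogy and are not the R6RS
transcoder API; the variable names are made up for illustration. The
handlers "replace" and "ignore" behave roughly like "a"- and "b"-like
decodings, while "strict" raises an ordinary (non-continuable)
exception, as in "c".

```python
# Illustration only: Python codec error handlers, not R6RS transcoders.
data = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

# Like option "a": a decoding that maps bad bytes to U+FFFD.
replaced = data.decode("utf-8", errors="replace")

# Like option "b": a decoding that silently drops bad bytes.
ignored = data.decode("utf-8", errors="ignore")

# Like option "c" without the "continuable" part: a plain exception.
try:
    data.decode("utf-8", errors="strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
```

The point of the sketch is that the caller picks the behavior when
choosing the decoding, so the reading procedures themselves need only
one failure mode.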

> 2. If one of the write-char and write-string... procedures gets 
>    passed a character that the transcoder of the port cannot encode,
>    should it:
> 
>    a) encode the U+003F (QUESTION MARK) character instead
>    b) try to encode the U+FFFD (REPLACEMENT CHARACTER), and, if that
>       fails, do one of the other options
>    c) ignore the character
>    d) raise a continuable exception that allows the handler to specify
>       what the encoding should be
>    d) do one of the above depending on an (optional) configuration
>       option specified upon opening the port.

Same reasoning, but the given choices make the answer easier: "d".
(Again without the continuable part, though.)
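
The same analogy works on the output side. This is again a minimal
sketch using Python's codec error handlers, not anything from the R6RS
draft, and the names are hypothetical. ASCII stands in for a codec
that cannot encode a given character.

```python
# Illustration only: Python codec error handlers, not R6RS transcoders.
text = "caf\u00e9"  # U+00E9 has no ASCII encoding

# Like the first option: substitute U+003F (QUESTION MARK).
with_question_mark = text.encode("ascii", errors="replace")

# Like the "ignore the character" option.
dropped = text.encode("ascii", errors="ignore")

# Like raising an exception, without the "continuable" part.
try:
    text.encode("ascii", errors="strict")
    encode_raised = False
except UnicodeEncodeError:
    encode_raised = True
```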

Matthew



