[R6RS] Codecs and endianness

Michael Sperber sperber at informatik.uni-tuebingen.de
Mon Jun 25 07:07:37 EDT 2007


Matthew Flatt <mflatt at cs.utah.edu> writes:

> At Mon, 25 Jun 2007 11:45:46 +0200, Michael Sperber wrote:
>> I only now consciously noticed that Will removed the UTF-xxBE/UTF-xxLE
>> codecs from the I/O library.  I have no idea why he did
>> this---presumably because of some interpretation of the Unicode
>> standard.  However, UTF-16BE/LE are official Unicode encoding schemes
>> (as per section 2.6 of the Unicode standard), and separate from UTF-16
>> (with or without BOM).  Therefore, I'd like to add it back, and specify
>> that the UTF-16 codec deals with an optional BOM on input (defaulting to
>> UTF-16BE if it's absent), and outputs a BOM on output.
>
> Wasn't this change a result of formal comment 68 (from Cowan)?

Good point, but his main concern was adding the UTF-16 codec.  His
argument is mainly about *reading* UTF-16 documents, but when *writing*
UTF-16 documents, it is occasionally useful to be able to specify which
variant of UTF-16 to use, especially when using `string->bytevector'
(which was added later).
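
To make the variants concrete, here is a small sketch.  It uses Python
rather than Scheme, purely as an illustration: Python's codec names
"utf-16", "utf-16-be" and "utf-16-le" only stand in for the codecs under
discussion and are not R6RS API.  It shows that the three produce
different byte sequences for the same string, and that only plain UTF-16
writes a BOM.  (After the BOM, Python uses the platform's native byte
order; that is Python's choice, not part of the proposal.)

```python
import codecs

text = "A\u00e9"  # 'A' followed by U+00E9 (e with acute accent)

# Explicit byte order, no BOM is written.
be = text.encode("utf-16-be")   # bytes 00 41 00 E9
le = text.encode("utf-16-le")   # bytes 41 00 E9 00

# Plain UTF-16: a BOM comes first, then the text in the order the BOM announces.
with_bom = text.encode("utf-16")
bom, payload = with_bom[:2], with_bom[2:]

# On input, the plain UTF-16 codec consumes the BOM and picks the order from it.
round_trip = with_bom.decode("utf-16")
```

A writer that must produce a particular byte order without a BOM has no
way to ask for it if only the plain UTF-16 codec exists.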

Not having UTF-16BE/UTF-16LE codecs creates an annoying impedance
mismatch between the string<->bytevector procedures from (rnrs io ports
(6)) and the utfxx<->string procedures from (rnrs bytevectors
(6))---i.e. the former can't be used to implement the latter.  (The
reverse direction also doesn't work, but for other reasons.)
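
A rough sketch of what "the former implementing the latter" would look
like if explicit BE/LE codecs were available.  This is Python standing in
for Scheme, and every name here (`string_to_bytes`, `string_to_utf16`,
`CODEC_FOR`) is hypothetical: they only mimic the shape of a codec-driven
conversion and an endianness-driven conversion, and are not the R6RS
procedures themselves.

```python
import codecs

# Hypothetical table: which explicit codec corresponds to which byte order.
CODEC_FOR = {"big": "utf-16-be", "little": "utf-16-le"}

def string_to_bytes(s, codec):
    """Stand-in for a codec-driven conversion (string plus codec to bytes)."""
    return s.encode(codec)

def string_to_utf16(s, endianness="big"):
    """Stand-in for an endianness-driven conversion, layered on the
    codec-driven one.  This layering needs the explicit BE/LE codecs."""
    return string_to_bytes(s, CODEC_FOR[endianness])

big = string_to_utf16("hi")               # bytes 00 68 00 69
little = string_to_utf16("hi", "little")  # bytes 68 00 69 00

# With only the plain UTF-16 codec, the result carries a BOM and an order
# the caller did not choose, so it matches neither result above.
plain = string_to_bytes("hi", "utf-16")
```

Without the two entries in `CODEC_FOR` there is nothing for
`string_to_utf16` to delegate to, which is the mismatch described above.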

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla


