[R6RS] Codecs and endianness
sperber at informatik.uni-tuebingen.de
Mon Jun 25 07:07:37 EDT 2007
Matthew Flatt <mflatt at cs.utah.edu> writes:
> At Mon, 25 Jun 2007 11:45:46 +0200, Michael Sperber wrote:
>> I only now consciously noticed that Will removed the UTF-xxBE/UTF-xxLE
>> codecs from the I/O library. I have no idea why he did
>> this---presumably because of some interpretation of the Unicode
>> standard. However, UTF-16BE/LE are official Unicode encoding schemes
>> (as per section 2.6 of the Unicode standard), and separate from UTF-16
>> (with or without BOM). Therefore, I'd like to add them back, and specify
>> that the UTF-16 codec deals with an optional BOM on input (defaulting to
>> UTF-16BE if it's absent), and outputs a BOM on output.
> Wasn't this change a result of formal comment 68 (from Cowan)?
Good point, but his main concern was adding the UTF-16 codec. His
argument is mainly about *reading* UTF-16 documents. When *writing*
UTF-16 documents, however, it is occasionally useful to be able to specify
which variant of UTF-16 to use, especially when using `string->bytevector'
(which was added later).
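To make the proposal concrete, here is a minimal Python sketch of the codec semantics described above. On input, an optional BOM selects the byte order, defaulting to big-endian when no BOM is present. On output, the plain UTF-16 variant writes a BOM, while UTF-16BE and UTF-16LE write none. The function names `decode_utf16` and `encode_utf16` and the variant strings are hypothetical, not R6RS API. The sketch also assumes that the BOM-writing variant emits big-endian code units, which the proposal leaves open. The byte-order work is delegated to Python's built-in "utf-16-be" and "utf-16-le" codecs, which neither consume nor emit a BOM.

```python
import codecs

BOM_BE = codecs.BOM_UTF16_BE  # b'\xfe\xff'
BOM_LE = codecs.BOM_UTF16_LE  # b'\xff\xfe'


def decode_utf16(data: bytes) -> str:
    """Hypothetical UTF-16 decoder: optional BOM, big-endian if absent."""
    if data.startswith(BOM_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(BOM_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian, as the proposal specifies.
    return data.decode("utf-16-be")


def encode_utf16(text: str, variant: str = "UTF-16") -> bytes:
    """Hypothetical encoder; only the plain UTF-16 variant writes a BOM."""
    if variant == "UTF-16BE":
        return text.encode("utf-16-be")
    if variant == "UTF-16LE":
        return text.encode("utf-16-le")
    if variant == "UTF-16":
        # Assumption: BOM followed by big-endian code units.
        return BOM_BE + text.encode("utf-16-be")
    raise ValueError(f"unknown variant: {variant}")


print(encode_utf16("Hi"))              # b'\xfe\xff\x00H\x00i'
print(encode_utf16("Hi", "UTF-16LE"))  # b'H\x00i\x00'
print(decode_utf16(b"\x00H\x00i"))     # Hi
```

The explicit BE/LE variants are what let a writer pick the output byte order. With only the BOM-carrying UTF-16 codec, the caller could not request BOM-less little-endian output at all.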
Not having UTF-16BE/UTF-16LE codecs creates an annoying impedance
mismatch between the string<->bytevector procedures from (rnrs io ports
(6)) and the utfxx<->string procedures from (rnrs bytevectors
(6)): the former can't be used to implement the latter. (The
reverse direction doesn't work either, but for other reasons.)
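The mismatch can be illustrated with a Python analogue. In this sketch, `bytevector_to_string` stands in for the port-level conversion that takes a codec, and `utf16_to_string` stands in for the bytevector-library procedure that takes an explicit endianness. Both names, and the `endianness_mandatory` flag with its BOM rule, are hypothetical illustrations loosely modelled on the procedures the message mentions. The sketch works only because the endianness argument maps directly onto a UTF-16BE or UTF-16LE codec. A lone BOM-sniffing UTF-16 codec that defaults to big-endian could not decode BOM-less little-endian input.

```python
import codecs


def bytevector_to_string(data: bytes, codec: str) -> str:
    """Hypothetical stand-in for a codec-driven bytevector->string."""
    return data.decode(codec)


def utf16_to_string(data: bytes, endianness: str,
                    endianness_mandatory: bool = False) -> str:
    """Hypothetical utf16->string analogue; endianness is 'big' or 'little'."""
    # The explicit endianness needs a BOM-less, fixed-order codec.
    codec = {"big": "utf-16-be", "little": "utf-16-le"}[endianness]
    if not endianness_mandatory:
        # A leading BOM overrides the endianness argument and is dropped.
        if data.startswith(codecs.BOM_UTF16_BE):
            return bytevector_to_string(data[2:], "utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return bytevector_to_string(data[2:], "utf-16-le")
    # Otherwise the caller's endianness wins; any BOM decodes as U+FEFF.
    return bytevector_to_string(data, codec)


print(utf16_to_string(b"H\x00i\x00", "little"))  # Hi
```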
Cheers =8-} Mike
Peace, international understanding, and all that blah blah