[R6RS] Source code encoding

Marc Feeley
Mon Mar 7 12:35:00 EST 2005


> I believe it's important that we specify the encoding of Scheme source
> code files in the standard for the obvious portability reasons.
> 
> Manuel mentioned in Snowbird that it might be a bad idea to just pick
> UTF-8, as the standard Unicode encoding on Windows is UTF-16 + BOM.
> 
> My suggestion is to allow UTF-8 + BOM, UTF-16 + BOM, or Latin-1.  (We
> may allow UTF-32 + BOM as well, but that seems a rare encoding in
> files.)
> 
> This allows auto-detecting the actual UTF encoding used, except for
> Latin-1 files that start with LATIN SMALL LETTER THORN, LATIN SMALL
> LETTER Y WITH DIAERESIS (or the same in opposite order).

The beauty of UTF-8 is that plain ASCII files (probably most current
Scheme files) are already valid UTF-8.  UTF-8 + BOM would require
adding a byte order mark at the beginning of every such file, which
means changing all of them and making them hard to edit and maintain
in an editor that is not Unicode-aware.  So let's not use UTF-8 + BOM.
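
To make the compatibility point concrete, here is a rough Python sketch
(not part of any proposal; the sample source line and the variable names
are made up for illustration).  It checks that an ASCII-only file is
byte-for-byte the same as its UTF-8 encoding, and shows what a BOM adds.

```python
import codecs

# Hypothetical Scheme source consisting only of ASCII characters.
source = "(define (square x) (* x x))\n"

ascii_bytes = source.encode("ascii")
utf8_bytes = source.encode("utf-8")

# A pure-ASCII file is byte-for-byte identical to its UTF-8 encoding,
# so existing files need no change to count as UTF-8.
same_bytes = ascii_bytes == utf8_bytes

# UTF-8 + BOM prepends the three bytes EF BB BF, so every file changes.
with_bom = codecs.BOM_UTF8 + utf8_bytes
bom_prefix = with_bom[:3]

# A tool that reads the file as Latin-1 sees the BOM as three stray
# characters in front of the opening parenthesis.
latin1_view = with_bom.decode("latin-1")
```

Here `latin1_view` begins with three non-ASCII characters before
`(define`, which is roughly what a non-Unicode-aware editor would show.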

I feel a better solution is to allow UTF-8 (without BOM) and
UTF-32 + BOM encodings of Scheme source files.  As for end-of-line
encodings, I propose that all three end-of-line encodings (NL, CR,
CR+NL) be treated as equivalent.
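
A minimal sketch of how a reader might implement this, written in Python
for brevity.  The function names `decode_source`, `normalize_eol` and
`read_source` are hypothetical, and mapping every line ending to NL is
just one way of making the three forms equivalent.

```python
import codecs

def decode_source(data: bytes) -> str:
    """Decode as UTF-32 if a UTF-32 BOM is present, otherwise as UTF-8."""
    if data.startswith(codecs.BOM_UTF32_BE) or data.startswith(codecs.BOM_UTF32_LE):
        # The "utf-32" codec reads the BOM, picks the byte order from it,
        # and leaves the BOM out of the decoded text.
        return data.decode("utf-32")
    return data.decode("utf-8")

def normalize_eol(text: str) -> str:
    """Map CR+NL and lone CR to NL so all three forms read the same."""
    # Replace CR+NL first so it is not counted as two line endings.
    return text.replace("\r\n", "\n").replace("\r", "\n")

def read_source(data: bytes) -> str:
    """Decode the raw bytes of a source file and normalize line endings."""
    return normalize_eol(decode_source(data))
```

The detection is unambiguous because the bytes FE and FF never occur in
valid UTF-8, so a UTF-8 file cannot begin with either UTF-32 BOM.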

Marc

