[R6RS] update to UnicodeSupport on wiki site

Matthew Flatt mflatt
Wed Dec 8 11:24:14 EST 2004


At Wed, 08 Dec 2004 17:14:26 +0100, Michael Sperber wrote:
> >>>>> "Matthew" == Matthew Flatt <mflatt at cs.utah.edu> writes:
> 
> Matthew> I added some comments on the comment. I also adjusted the proposal to
> Matthew> reflect my improved understanding of "code point", "Unicode scalar
> Matthew> value",
> 
> "Unicode Demystified" seems to say that "Unicode scalar value" is
> obsolete terminology on page 80, and also implies that this term is
> synonymous with "code point."  Is it wrong?  (I'm unclear on whether
> surrogates are code points or not, but am even less clear whether that
> difference matters in a context where chars are code points.)

The glossary 

   http://www.unicode.org/glossary/

is much clearer than _Unicode Demystified_ on the definitions. The
glossary defines "code point" so that it includes surrogates, and it
defines "Unicode scalar value" so that it excludes surrogates.
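
A minimal sketch of the two definitions may make the distinction concrete.
It assumes Python, and the helper names (is_code_point, is_surrogate,
is_scalar_value) are hypothetical, chosen only for illustration. The
numeric ranges are the ones the Unicode standard gives: code points run
from 0 to 10FFFF (hex), and surrogates occupy D800 through DFFF.

```python
# Hypothetical helpers illustrating the glossary definitions.
MAX_CODE_POINT = 0x10FFFF   # last code point in the Unicode codespace
SURROGATE_FIRST = 0xD800    # first high-surrogate code point
SURROGATE_LAST = 0xDFFF     # last low-surrogate code point


def is_code_point(n: int) -> bool:
    """Any value in the Unicode codespace, surrogates included."""
    return 0 <= n <= MAX_CODE_POINT


def is_surrogate(n: int) -> bool:
    """Code points reserved for UTF-16 surrogate pairs."""
    return SURROGATE_FIRST <= n <= SURROGATE_LAST


def is_scalar_value(n: int) -> bool:
    """A code point that is not a surrogate."""
    return is_code_point(n) and not is_surrogate(n)
```

Under these definitions D800 is a code point but not a scalar value, while
0041 ("A") is both.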

I'm not sure why "Unicode scalar value" would be considered deprecated.
I find the distinction helpful, and the specification of UTF-8 and
other encodings (see "UTF-8 Encoding Form" in the glossary for a link)
refers to "Unicode scalar value" quite a lot.

The difference is important. For example, if you invent a variant of
UTF-8 that encodes surrogate code points as if they were scalar values,
a conformant UTF-8 decoder (such as the one in iconv) will not accept
the extra encodings.
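
As a rough illustration, the same behavior can be observed with Python's
built-in strict UTF-8 codec, used here as a stand-in for iconv (an
assumption; iconv itself is not invoked). The "surrogatepass" error handler
plays the role of the invented variant that encodes a surrogate as if it
were a scalar value, and decodes_as_utf8 is a hypothetical helper name.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Encode the surrogate code point U+D800 as if it were a scalar value.
# The "surrogatepass" handler emulates the non-standard variant.
variant_bytes = "\ud800".encode("utf-8", "surrogatepass")

# An ordinary scalar value, encoded by the standard codec.
standard_bytes = "\u00e9".encode("utf-8")

print(variant_bytes)                    # b'\xed\xa0\x80'
print(decodes_as_utf8(variant_bytes))   # False
print(decodes_as_utf8(standard_bytes))  # True
```

The three-byte sequence has the shape of a UTF-8 encoding, but a strict
decoder rejects it because it would decode to a surrogate rather than to a
scalar value.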

Matthew


