[R6RS] Unicode scalar value escape sequences

Michael Sperber sperber
Thu Mar 3 10:44:01 EST 2005


>>>>> "Marc" == Marc Feeley <feeley at IRO.UMontreal.CA> writes:

>> Well, sure.  But given that you have all kinds of procedures that
>> operate on strings of length 1 only, I don't see how you're making the
>> character data type go away in any real sense---you still effectively
>> have a separate type.  It's just that the type is wedged into the
>> language in a way that, to me, makes way less sense than the current
>> setup.

Marc> How would you define (char-upcase (integer->char #x00df))?
Marc> What about (char-ci<? (integer->char #x00df) #\T)?

We've been through this a zillion times: via the standard Unicode case
mapping.
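
For anyone reading this without the earlier thread: U+00DF (LATIN SMALL LETTER SHARP S) is the usual test case because its full Unicode uppercase mapping is two characters long, so it cannot be the result of a character-to-character procedure. A minimal sketch in Python (used here only for illustration, it is not part of the discussion) shows the full mapping as exposed by str.upper:

```python
# U+00DF is the character produced by (integer->char #x00df) in the question.
sharp_s = chr(0x00DF)

# Python's str.upper applies the full (string-to-string) Unicode case mapping.
upper = sharp_s.upper()

assert upper == "SS"      # two characters, not one
assert len(upper) == 2
```

Unicode also defines simple one-to-one case mappings in UnicodeData.txt, where U+00DF has no uppercase entry and therefore maps to itself. Whether that is the mapping meant by "standard" here is an assumption, not something this message states.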

Marc> I'm against it.  I think the syntax would be strange, overly complex
Marc> and redundant.

I don't get this argument: Matthew's proposal has three different
escape sequences for scalar values.  Implicit termination incurs
complexity.  Also, specifying scalar values via the vanilla numerical
literal syntax *removes* redundancy, as you can just re-use the lexer
for the numerical literals.

Marc> Here's another proposal.  Keep the \xhh and \uhhhh syntaxes for
Marc> compatibility with C and Java (i.e. exactly 2 and 4 hexadecimals
Marc> respectively) but require a delimiter for \U and allow any number of
Marc> hexadecimals, i.e.

Marc>     "\U20;\U00000021;\Ua;"   =   " !\n"

I would consider this an improvement over the current status as well.
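
To make the delimiter point concrete, here is a rough sketch of a decoder for the three forms in Marc's proposal. It is written in Python purely for illustration. The names decode_escapes and _is_scalar_value are hypothetical, and the choices to reject surrogates and to leave malformed escapes untouched are assumptions of the sketch, not part of the proposal.

```python
import re

# Escape forms from the proposal quoted above:
#   \xhh      exactly two hex digits
#   \uhhhh    exactly four hex digits
#   \Uh...;   one or more hex digits, terminated by ';'
# An escaped backslash is matched first so that the text after it is not
# misread as the start of an escape.
_ESCAPE = re.compile(
    r"\\\\"
    r"|\\x([0-9A-Fa-f]{2})"
    r"|\\u([0-9A-Fa-f]{4})"
    r"|\\U([0-9A-Fa-f]+);"
)


def _is_scalar_value(n):
    # Unicode scalar values: 0..10FFFF, excluding the surrogate range.
    return 0 <= n <= 0x10FFFF and not (0xD800 <= n <= 0xDFFF)


def decode_escapes(text):
    # Replace well-formed escapes; anything else (for example a \U escape
    # with no ';') is left untouched. That is a simplification made here.
    def replace(match):
        digits = match.group(1) or match.group(2) or match.group(3)
        if digits is None:
            return match.group(0)  # escaped backslash, keep as written
        code = int(digits, 16)
        if not _is_scalar_value(code):
            raise ValueError(f"not a Unicode scalar value: {digits}")
        return chr(code)

    return _ESCAPE.sub(replace, text)


# The equality given in the proposal.
assert decode_escapes(r"\U20;\U00000021;\Ua;") == " !\n"
```

Because \U has an explicit terminator, the lexer never has to decide how many of the following hex digits belong to the escape, which is the complexity that implicit termination incurs.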

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

