[R6RS] Procedures that depend on Unicode character classification

Matthew Flatt mflatt at cs.utah.edu
Thu Jun 15 08:46:03 EDT 2006


At Wed, 14 Jun 2006 17:21:09 -0400, William D Clinger wrote:
> The Unicode general categories are represented by symbols
> in lower case, e.g. 'lu instead of 'Lu.  Is this really
> what we intend for a case-sensitive R6RS?

That's what I intended, at least. I have no objection to using 'Lu.

> The description of string-foldcase talks of "cased characters",
> which I assume to be Unicode general categories Lu, Ll, and Lt,

Yes.
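
To make that reading concrete, here is a minimal sketch in Python (chosen only because its standard unicodedata module exposes general categories). The helper name is_cased is made up for illustration, it encodes only the Lu/Ll/Lt reading agreed on here, and results depend on the Unicode version bundled with the interpreter:

```python
import unicodedata

# Assumption taken from this thread: a "cased" character is one whose
# Unicode general category is Lu, Ll, or Lt.
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def is_cased(ch):
    """Return True if the single character ch is in category Lu, Ll, or Lt."""
    return unicodedata.category(ch) in CASED_CATEGORIES
```

Under this reading, a titlecase digraph such as U+01C5 counts as cased, while a modifier letter such as U+02B0 (category Lm) does not.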

> but it also talks about "case-ignorable characters".  What are
> they? 

"Case-ignorable" is defined by Unicode:

 A character C is defined to be case-ignorable if C has the Unicode
 Property Word_Break=MidLetter as defined in Unicode Standard Annex
 #29, "Text Boundaries;" or the General Category of C is Nonspacing
 Mark (Mn), Enclosing Mark (Me), Format Control (Cf), Letter Modifier
 (Lm), or Symbol Modifier (Sk).
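
A rough sketch of that definition in Python follows. The general-category half can be checked directly with the standard unicodedata module. The Word_Break half cannot, so the MIDLETTER_SAMPLE set below is a hand-picked, incomplete assumption. The authoritative MidLetter list lives in the data files for Annex #29 and has changed between Unicode versions. The function name is hypothetical:

```python
import unicodedata

# General categories named in the quoted definition.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# ASSUMPTION: a small, incomplete sample of Word_Break=MidLetter characters
# (MIDDLE DOT, HYPHENATION POINT, COLON). unicodedata does not expose the
# Word_Break property, so a real implementation would load the full,
# version-specific list from the Unicode data files instead.
MIDLETTER_SAMPLE = {"\u00B7", "\u2027", "\u003A"}


def is_case_ignorable(ch):
    """Approximate the quoted definition for a single character ch."""
    if ch in MIDLETTER_SAMPLE:
        return True
    return unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
```

For example, a combining acute accent (U+0301, category Mn) and a soft hyphen (U+00AD, category Cf) are case-ignorable under this sketch, while ordinary letters and digits are not.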

> The current draft of SRFI 75 says the char-alphabetic?,
> char-numeric?, and char-whitespace? predicates are as
> defined by SRFI-14, but SRFI 14 doesn't define those
> predicates; that SRFI explicitly warns that those
> procedures "may or may not be in agreement with the
> SRFI 14 base character sets" char-set:letter,
> char-set:digit, and char-set:whitespace.  I presume
> the intent is for those predicates to be in agreement.

Yes, that was the intent.

Matthew



