[R6RS] Procedures that depend on Unicode character classification

William D Clinger will at ccs.neu.edu
Wed Jun 14 17:21:09 EDT 2006


While writing a reference implementation for the procedures
of SRFI 75 (Unicode), I've come up with a few questions.

The Unicode general categories are represented by symbols
in lower case, e.g. 'lu instead of 'Lu.  Is this really
what we intend for a case-sensitive R6RS?

The description of string-foldcase talks of "cased characters",
which I assume to be Unicode general categories Lu, Ll, and Lt,
but it also talks about "case-ignorable characters".  What are
they?  (My wild guess is Lm, Lo, Pc, Pd, Ps, Pe, Pi, Pf, and Po,
but that's probably wrong.)
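
To make the two guesses above concrete, here is a minimal sketch in Python (a stand-in, not the Scheme reference implementation) that classifies characters by general category. The set names and helper functions are hypothetical. The "cased" set is the assumption stated above, and the "case-ignorable" set is only the wild guess from this message, not a confirmed definition.

```python
import unicodedata

# Assumption from the message: "cased" means general category Lu, Ll, or Lt.
CASED = {"Lu", "Ll", "Lt"}

# The message's tentative guess for "case-ignorable"; possibly wrong.
GUESSED_CASE_IGNORABLE = {"Lm", "Lo", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}


def is_cased(ch: str) -> bool:
    """True if ch falls in one of the assumed cased categories."""
    return unicodedata.category(ch) in CASED


def is_guessed_case_ignorable(ch: str) -> bool:
    """True if ch falls in one of the guessed case-ignorable categories."""
    return unicodedata.category(ch) in GUESSED_CASE_IGNORABLE
```

Note that Python's unicodedata.category returns the mixed-case names (for example "Lu"), so the sets are written that way here.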

The current draft of SRFI 75 says the char-alphabetic?,
char-numeric?, and char-whitespace? predicates are as
defined by SRFI 14, but SRFI 14 doesn't define those
predicates; that SRFI explicitly warns that those
procedures "may or may not be in agreement with the
SRFI 14 base character sets" char-set:letter,
char-set:digit, and char-set:whitespace.  I presume
the intent is for those predicates to be in agreement.
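
As a rough illustration of what "in agreement" would mean, the sketch below (in Python, as a stand-in for Scheme) gives membership tests for the three base character sets. It assumes the SRFI 14 Unicode definitions are: letter = Lu, Ll, Lt, Lm, Lo; digit = Nd; whitespace = Zs, Zl, Zp plus U+0009 through U+000D. The function names are hypothetical, and the definitions should be checked against the SRFI 14 text before relying on them.

```python
import unicodedata

# Assumed SRFI 14 definitions of the base character sets (see lead-in).
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}
SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}


def in_char_set_letter(ch: str) -> bool:
    """Membership in the assumed char-set:letter."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def in_char_set_digit(ch: str) -> bool:
    """Membership in the assumed char-set:digit (decimal digits only)."""
    return unicodedata.category(ch) == "Nd"


def in_char_set_whitespace(ch: str) -> bool:
    """Membership in the assumed char-set:whitespace."""
    if "\u0009" <= ch <= "\u000D":
        return True
    return unicodedata.category(ch) in SEPARATOR_CATEGORIES
```

Under this reading, char-alphabetic?, char-numeric?, and char-whitespace? would return true exactly when the corresponding function above does.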

Finally, I discovered a bug in my tables that, when
fixed, will make them a little larger than my previous
estimates.

Will


