[R6RS] Procedures that depend on Unicode character classification

Michael Sperber sperber at informatik.uni-tuebingen.de
Wed Jun 14 01:14:14 EDT 2006


William D Clinger <will at ccs.neu.edu> writes:

> tables for case folding and the -ci procedures:
>     9 kbytes
> tables for char-general-property and associated predicates:
>    10 kbytes
> tables for the four normalization procedures:
>    20 kbytes

Could you give hints as to what representations you used?

The latest UnicodeData.txt describes, by my count, 237236 code points,
of which 2048 are surrogates.  So, with no compression of any kind, I
would think that even a table with just one bit of information per
code point / scalar value already takes more than 25 kbytes (237236
bits is about 29 kbytes).
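
To spell out that estimate, here is a minimal sketch in Python. It
assumes a flat, uncompressed table with one bit per entry and uses the
counts quoted above; the helper name bitmap_bytes is made up for
illustration and is not taken from any implementation.

```python
# Counts as quoted in the message (a by-hand count of UnicodeData.txt).
DESCRIBED_CODE_POINTS = 237236
SURROGATES = 2048


def bitmap_bytes(n_entries, bits_per_entry=1):
    """Bytes needed for a flat, uncompressed table, rounded up to whole bytes."""
    return (n_entries * bits_per_entry + 7) // 8


# One bit for every described code point.
all_points = bitmap_bytes(DESCRIBED_CODE_POINTS)

# One bit per scalar value, i.e. with the surrogates left out.
scalar_values = bitmap_bytes(DESCRIBED_CODE_POINTS - SURROGATES)

print(all_points, scalar_values)
```

Either way the result is a little over 29000 bytes, which is already
more than any single table size quoted above.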

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla


