[R6RS] Procedures that depend on Unicode character classification

Michael Sperber sperber at informatik.uni-tuebingen.de
Thu Jun 15 04:33:30 EDT 2006


William D Clinger <will at ccs.neu.edu> writes:

> Concerning my estimated table sizes for Unicode support,
> Mike wrote:
>> Could you give hints as to what representations you used?
>
> The following implementation of char-general-property has
> not been tested, but it should give you an idea.

That meets my definition of compression.

But anyway, Scheme 48 packs the general category, the 1:1 case
mappings, and the special-casing information into a single word per
entry.  It then represents the table as a compact array, which should
be faster than binary search in most cases: a lookup always takes
exactly two indirections.
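Here is a minimal Python sketch of a two-level compact array. The
block size of 256 scalar values is an assumption. It is inferred only
from 4352 × 256 = 0x110000, the size of the Unicode code space. The
builder shares identical blocks, which may be simpler than the
compaction Scheme 48 actually performs. The names build_compact_array
and lookup are hypothetical.

```python
BLOCK_BITS = 8                  # assumed: 2**8 = 256 scalar values per block
BLOCK_SIZE = 1 << BLOCK_BITS
BLOCK_MASK = BLOCK_SIZE - 1


def build_compact_array(values):
    """Split a dense per-code-point table into blocks, sharing identical ones.

    Returns (index, data): index[i] is the offset in data where block i starts.
    """
    if len(values) % BLOCK_SIZE:
        raise ValueError("table length must be a multiple of BLOCK_SIZE")
    index, data = [], []
    seen = {}  # block contents -> offset of its first copy in data
    for start in range(0, len(values), BLOCK_SIZE):
        block = tuple(values[start:start + BLOCK_SIZE])
        offset = seen.get(block)
        if offset is None:          # new block: append it to the data array
            offset = len(data)
            data.extend(block)
            seen[block] = offset
        index.append(offset)
    return index, data


def lookup(index, data, cp):
    # Exactly two indirections: one into the index table, one into the data.
    return data[index[cp >> BLOCK_BITS] + (cp & BLOCK_MASK)]
```

If the table covers all of 0x110000 code points, the index has 4352
entries. The data array shrinks to the number of distinct blocks
times 256.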

The compact array itself has 19494 entries (even scalar values in the
same category may differ with respect to their case mappings and so
on), and the index table has 4352 entries.  The three case mappings
take 6 bits each (18 total), the category takes 5, and the
special-casing information takes another 3, for a total of 26 bits.
That leaves 6 bits per 32-bit entry unused, which could hold other
information.  The various auxiliary tables come on top of that.
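The field widths above can be checked with a short Python sketch of
the bit packing. The order of the fields within the word is a
hypothetical choice, since only the widths are given. The post also
does not say what the 6-bit case-mapping values encode, so the code
treats them as opaque small integers. Five bits suffice for the
category because Unicode defines 30 general categories.

```python
# Hypothetical field order; only the widths come from the description.
FIELDS = (
    ("upcase", 6),
    ("downcase", 6),
    ("titlecase", 6),
    ("category", 5),
    ("special_casing", 3),
)
TOTAL_BITS = sum(width for _, width in FIELDS)   # 26
SPARE_BITS = 32 - TOTAL_BITS                     # unused bits in a 32-bit word


def pack(**fields):
    """Pack the named fields into one integer, lowest field first."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word):
    """Inverse of pack: extract each field by shifting and masking."""
    result, shift = {}, 0
    for name, width in FIELDS:
        result[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return result
```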

Is it really necessary to duplicate all that effort at this time?

-- 
Cheers =8-} Mike
Peace, understanding among nations, and all that blah blah