[R6RS] external representation for bytes objects

William D Clinger will at ccs.neu.edu
Mon Aug 14 18:07:48 EDT 2006


I have checked in some minor edits for document/bytes.tex.

I would like to propose an external representation for
bytes objects, e.g. #u8(...).

The rationale for this becomes pretty clear if you look
at the current version of unicode/normalization.sch.
That file defines eight tables, five of which are or
contain bytes objects.  The three tables that aren't
bytes objects and don't contain bytes objects can be
quoted constants; with separate compilation, they
will probably compile to some representation that is
at least as compact as their run-time representation.

The five tables that are bytes objects or contain
bytes objects are unlikely to compile to so compact
a representation.  For example,

(define canonical-compositions
  (vector
   (list
    (list->bytevector
     '(#x0 #x41 #x0 #x45 #x0 #x49 #x0 #x4e ...))
    (list->bytevector
     '(#x0 #xc0 #x0 #xc8 #x0 #xcc #x1 #xf8 ...)))
   (list
    (list->bytevector '(...))
    (list->bytevector '(...)))
   ...))

is likely to compile to a representation that
contains one pair for every byte that will end up
in a bytes object, plus some code that puts the
table together at run time.  That's a factor of 8
to 10 in space, for a large table, and it wouldn't
be necessary if we had an external representation
for bytes objects:

(define canonical-compositions
  '#((#u8(#x0 #x41 #x0 #x45 #x0 #x49 #x0 #x4e ...)
      #u8(#x0 #xc0 #x0 #xc8 #x0 #xcc #x1 #xf8 ...))
     (#u8(...)
      #u8(...))
     ...))
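
As a rough sketch of where a factor of about 8 comes from, here
is a back-of-the-envelope estimate.  Every size in it is an
assumption made for illustration, not a measurement: 4-byte
words, two words per pair, small integers stored as immediates,
and a one-word header on a bytes object.  The names word-size,
pair-size, header-size, list-space, bytes-space, and space-ratio
are hypothetical.

```scheme
;; Rough space estimate.  All sizes are assumptions, not measurements.
(define word-size 4)                 ; assumed bytes per machine word
(define pair-size (* 2 word-size))   ; assumed: a pair is two words (car, cdr)
(define header-size word-size)       ; assumed header on a bytes object

;; Bytes needed for a quoted list of n small integers:
;; one pair per element, with the integers held as immediates.
(define (list-space n)
  (* n pair-size))

;; Bytes needed for a bytes object of n elements:
;; a header plus one byte per element.
(define (bytes-space n)
  (+ header-size n))

;; Ratio of the list representation to the bytes representation.
(define (space-ratio n)
  (/ (list-space n) (bytes-space n)))
```

Under these assumptions (space-ratio n) approaches 8 as n grows.
The code that assembles the table at run time is not counted
here and would add to the list-based total.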

Will


