[R6RS] Records

Thu Sep 16 16:45:56 EDT 2004

Here are my ideas on records.

Marc

R6RS records (Marc Feeley, September 16 2004)
============

Here are some ideas for R6RS records to start the discussion.  I've
taken some inspiration from SRFI-9 and SRFI-57 and Gambit's
"define-type" special form.

Issues:

 1) Should record types be distinct from predefined types and other
    record types?

    If some existing type is reused, e.g. vectors, it is trivial to
    read/write records but code that manipulates records and vectors is
    harder to understand, analyse and optimize.

    I think record types should be distinct and there should be an
    external representation for records allowing read/write.

 2) Should it be possible to communicate records (i.e. write then read)
    between two independent programs (or equivalently write a record to
    a file and then read it back) while preserving their type (in the
    sense that the record type predicate and other operations still work
    properly)?

    This would be useful for implementing client/server systems,
    databases, snapshots, etc.  It would also be useful when the client
    and server are different implementations of Scheme (i.e. the
    external representation must be standardized).

 3) Should record reflexion be supported?

    Reflexion would allow inspecting records (accessing fields, getting
    the "name" of the record type, etc) at run time even if the record
    definition is not lexically visible.  This is particularly useful
    when records can be communicated between programs, and to write
    user-level pretty printers and inspectors.

 4) Should record subtyping be supported?

    This is useful for code factoring and extensibility.  It is **so**
    useful for implementing ASTs in compilers, exception hierarchies,
    port hierarchies (i.e. a byte port is a character port which is
    an "object" port), and many other things.

 5) Should internal definitions (a record definition other than at
    toplevel) be allowed?

    This would be more modular (i.e. define the record where needed).

 6) Should internal definitions be generative?

    Generative definitions allow types to be created dynamically.
    Non-generative internal definitions are nevertheless needed so that
    two programs can communicate records while preserving their type.

    I think both generative and non-generative definitions should be
    possible and under the control of the programmer.

    We can also wonder if toplevel definitions should be generative.
    If two distinct programs use the same module that defines a record
    at toplevel, are the types the same?  Also, what happens if you
    have a file that contains a toplevel record definition, and you
    "load" that file twice, is there a single type or two types?
    I think all record definitions should be generative by default.

 7) Should the interface to record operations (record construction,
    predicate, etc) be syntactic, procedural or a mix?

    In general, a procedural interface is more powerful.  However, a
    syntactic interface to some operations may be useful to support a
    special syntax for constructors (optional fields, named fields),
    for interfacing to a pattern matcher and to define subtypes (an
    example is shown below).

 8) Should record type definition be done with a syntax form or
    procedure or both?

    A procedural form would allow creating types at run time.

 9) Should the names of the constructor, accessors, predicate, etc be
    generated automatically?  Can the programmer override the default
    names?

    I think it is convenient to have automatically generated names but
    that the programmer should be able to specify the names explicitly.
    Moreover, th constructor should have the same name as the type, i.e.

       (point 10 20)  not  (make-point 10 20)

10) Should the programmer have control over mutability of fields?
    Should functional update operations be supported?  Should fields be
    mutable or immutable by default?

11) How are two records tested for equality (equal?)?  Which fields are
    used to test equality?  Should the programmer have control over
    which fields are tested and how?

12) Should the programmer have control over which fields are shown
    by "write"?

    This is useful for debugging cyclic data structures (doubly-linked
    lists, tree nodes that point to the parent, etc).

13) How do record definitions interact with the module system?

I will now describe Gambit's "define-type" form to show how some of
these issues can be solved.  I think the R6RS records could be based
on "define-type", modified to work well with the module system.

Simple usage
------------

The syntax of define-type is compatible with SRFI-9.  I chose to name
it define-type because I think it is best to view this as the only way
to define types (even the predefined types could be defined this way,
in principle).  The basic syntax is similar to other Scheme record
definition forms: you give the name of the type and the name of the
fields, as in this definition of 2D points:

    (define-type point x y)

This expands to:

    (begin

      (define ##type-2-point ; type descriptor of "point" (which has 2 fields)
        (##structure ##type-type        ; type descriptor of "type"
                     (##make-uninterned-symbol "##type-2-point") ; type id
                     'point             ; type name
                     8                  ; type attributes
                     #f                 ; super type descr (#f = no super type)
                     '#(x 0 #f y 0 #f)  ; field names, attributes and init vals
                     '#(1 2)))          ; fields initialized by the constructor

      (define (point p1 p2)
        (##structure ##type-2-point  ; type descriptor of "point"
                     p1              ; x field
                     p2))            ; y field

      (define (point? obj)
        (##structure-direct-instance-of? obj (##type-id ##type-2-point)))

      (define (point-x obj)
        (##structure-ref obj 1 ##type-2-point point-x))

      (define (point-x-set! obj val)
        (##structure-set! obj val 1 ##type-2-point point-x-set!))

      (define (point-y obj)
        (##structure-ref obj 2 ##type-2-point point-y))

      (define (point-y-set! obj val)
        (##structure-set! obj val 2 ##type-2-point point-y-set!)))

NOTE 1: Think of (##structure x y z) to be like (vector x y z) except the
        object's type tag flags it as a structure.

NOTE 2: The last 2 parameters of ##structure-ref and ##structure-set! are
        the type descriptor and the procedure.  This is used for type
        checking "obj" and indicating which accessor procedure signaled
        the error.

NOTE 3: (##structure-direct-instance-of? obj sym) tests if obj is
        a structure object and the type-id in its type descriptor is
        eq? to the symbol sym.  So this test is fast.

Overriding the generated procedure names
----------------------------------------

Like SRFI-9 the name of the generated procedures can be specified
explicitly.  Keywords are used to indicate the names of the
constructor and predicate procedures:

    (define-type point
      constructor: make-point
      predicate: is-a-point?
      (x get-x)                ; no setter for x field
      (y get-y set-y!))

expands to:

    (begin
      (define ##type-2-point ...)
      (define (make-point p1 p2) ...)
      (define (is-a-point? obj) ...)
      (define (get-x obj) ...)
      (define (get-y obj) ...)
      (define (set-y! obj val) ...))

Field attributes
----------------

Fields have attributes.  The field attributes supported are:

    read-write:     or read-only:       default is read-write:
    printable:      or unprintable:     default is printable:
    equality-test:  or equality-skip:   default is equality-test:

The attributes can be specified inside the field definition or
can prefix a field definition (in which case they apply to all fields
from that point):

    (define-type point      is equivalent to    (define-type point
      read-only:                                  
      x                                           (x read-only:)
      (y unprintable:))                           (y read-only: unprintable:))

they expand to:

    (begin
      (define ##type-2-point ...)  ; <-- attributes are stored in type descr
      (define (point p1 p2) ...)
      (define (point? obj) ...)
      (define (point-x obj) ...)
      (define (point-y obj) ...))

Field initializers
------------------

It is also possible to indicate which fields are initialized by the
constructor and which are initialized to a default value (which
must be a constant):

    (define-type point
       constructor: (make-point y)   ; only y is specified to the constructor
       (x init: 0)                   ; x gets the value 0
       y)

so that the call (make-point 7) evaluates to #<point #2 x: 0 y: 7>.

Defining subtypes
-----------------

Definition of subtypes is achieved with macros.  A type definition
that can be extended (i.e. for which it is possible to define
subtypes) must indicate explicitly a name for the "extender" which is
a macro defined in the expansion of the type definition.  The extender
has exactly the same syntax as "define-type", but it defines a subtype
of that type.  Here is how an extensible 2D point can be defined and a
3D point subtype:

    (define-type point
      extender: define-type-of-point
      x
      y)

    (define-type-of-point point3D
      z)

these definitions expand to

    (begin

      (define-macro (define-type-of-point . args)
        (##define-type-expand  ; macro expander for define-type family of forms
          'define-type-of-point
          '#<type #2 point>
          '##type-2-point
          args))

      (define ##type-2-point ...)

      (define (point p1 p2) ...)

      (define (point? obj)
        (##structure-instance-of? obj (##type-id ##type-2-point)))

      (define (point-x obj) ...)
      (define (point-x-set! obj val) ...)
      (define (point-y obj) ...)
      (define (point-y-set! obj val) ...)

      (define ##type-3-point3D ...) ; type descriptor of point3D

      (define (point3D p1 p2 p3)
        (##structure ##type-3-point3D p1 p2 p3))

      (define (point3D? obj)
        (##structure-direct-instance-of? obj (##type-id ##type-3-point3D)))

      (define (point3D-z obj)
        (##structure-ref obj 3 ##type-3-point3D point3D-z))

      (define (point3D-z-set! obj val)
        (##structure-set! obj val 3 ##type-3-point3D point3D-z-set!)))

NOTE: We can use ##structure-direct-instance-of? for point3D? because
      it cannot have subtypes.  However for point? we must use the slower
      ##structure-instance-of? which searches the super-types of obj to find
      a type-id that matches.

Non-generative type definitions
-------------------------------

Non-generative type definitions are obtained by explicitly specifying
a "globally unique identifier" (GUID) for the type.  This way, even
different programs which use the same type definition can agree on the
identity of that type.  This works as long as programmers are
responsible for managing the GUIDs (they must attach a new GUID if
they change anything in the type description).  There are obvious
security issues, but here I'm assuming that all parties are
trustworthy.  For example:

    (define-type point
      id: 3938F58E-07F2-11D9-9E1E-00039301BA52  ; <-- a unique symbol
      x                                         ;     generated by uuidgen
      y)

expands to

    (begin

      (define (point p1 p2)
        (##structure '#<type #2 point> ; type descriptor is now a constant
                     p1
                     p2))

      (define-macro (constant-point p1 p2) ; constant constructor macro
        (##define-type-construct-constant
         'constant-point
         '#<type #2 point>
         p1
         p2))

      (define (point? obj)
        (##structure-direct-instance-of?
         obj
         '##type-2-3938F58E-07F2-11D9-9E1E-00039301BA52)) ; literal type-id

      (define (point-x obj)
        (##structure-ref obj 1 '#<type #2 point> point-x))

      (define (point-x-set! obj val)
        (##structure-set! obj val 1 '#<type #2 point> point-x-set!))

      (define (point-y obj)
        (##structure-ref obj 2 '#<type #2 point> point-y))

      (define (point-y-set! obj val)
        (##structure-set! obj val 2 '#<type #2 point> point-y-set!)))

NOTE 1: The type descriptor is not stored in a variable because is is
        now constant and can be denoted with a literal where needed (i.e.
        the expression '#<type #2 point>).  The type-id is also known
        and used directly in the definition of point?.

NOTE 2: A "constant constructor" is also defined (as a macro), so that
        constants can be created.  All evaluations of a given constant
        constructor yield the same (eq?) object.  The arguments of the
        constant constructor must be self-evaluating or quoted constants.
        Compare the calls:

          (point (+ 1 2) (* 3 4))  ==>  the new object #<point #2 x: 3 y: 12>

        and

          (constant-point 3 '(* 3 4))) which expands to
                                         (quote #<point #2 x: 3 y: (* 3 4)>)

The definition of the constructor (and predicate) can be avoided by
specifying #f as its name.  This is useful when only one instance of
this type is needed (such as simple exception objects, the empty list, etc).

Serialization
-------------

Instances of non-generative types can be communicated (write then
read) by dumping all their fields, including the type descriptor.
Here is an example for the object (point 44 55):

    (begin

      (output-port-readtable-set!
       (current-output-port)
       (readtable-sharing-allowed?-set 
        (output-port-readtable (current-output-port))
        'serialize))

      (pretty-print (point 44 55) (current-output-port)))

prints:

    #structure(#structure(#0=#structure(#0#
                                        ##type-6
                                        type
                                        8
                                        #f
                                        #(id
                                          1
                                          #f
                                          name
                                          5
                                          #f
                                          flags
                                          5
                                          #f
                                          super
                                          5
                                          #f
                                          fields
                                          5
                                          #f
                                          cfields
                                          5
                                          #f)
                                        #(1 2 3 4 5 6))
                          ##type-2-3938F58E-07F2-11D9-9E1E-00039301BA52
                          point
                          8
                          #f
                          #(x 0 #f y 0 #f)
                          #(1 2))
               44
               55)

Notice that the object itself contains the "point" type descriptor and
the fields x and y.  The "point" type descriptor is an object that contains
the "type" type descriptor (which is circular because its type descriptor
is itself).  Each type descriptor also contains the type-id, the type's name,
the type's attributes, the super-type descriptor (#f = no super-type),
the field names, their attributes and initial value, and the list of
fields initialized by the constructor.

This external representation can be read by "read" and the object and
its type completely reconstructed (except for eq?-ness, but the type
predicate and other operations still work properly).

Other features
--------------

The define-type form also supports these type attributes:

  prefix: id        Adds "id" as a prefix to all automatically generated names.

  macros:           All operations generated are expressed as macros (except
                    that in the case of a generative type definition the
                    binding of the type descriptor to a variable is done
                    with a "define")

  implementer: id   Adds a definition of the parameterless macro "id" that
                    expands to all non-macro definitions produced by the
                    define-type.  This is useful to distinguish the two
                    binding times (compile-time and run time).  It can
                    be used to avoid duplicate definitions when a file
                    containing a define-type is included in multiple files
                    (I think this problem should be solved through the
                    module system, but this is how the problem can be solved
                    without a module system).  For example, let's say the
                    file point-def.scm contains the definition of the point
                    type, and files a.scm and b.scm need to manipulate points.
                    Then the file "point-def.scm" should be written like this:

                    ; File: "point-def.scm"
                    (define-type point
                      implementer: implement-point
                      extender: define-type-of-point ; <-- optional
                      macros:                        ; <-- optional
                      x
                      y)

                    And 3 files need to be compiled separately:

                    ; File: "point.scm"
                    (include "point-def.scm")
                    (implement-point)  ; <-- will define the procedures

                    ; File: "a.scm"
                    (include "point-def.scm")
                    (write (point 1 2))

                    ; File: "b.scm"
                    (include "point-def.scm")
                    (define-type-of-point point3D z)
                    (write (point3D 1 2 3))