[R6RS] abstractness of syntax objects

Fri Apr 28 14:41:09 EDT 2006

We have discussed two extremes for the abstractness of syntax objects. 
After working on and with a new reference implementation of syntax-case
for the better part of two weeks, I am now more strongly than ever in
favor of a particular point between the two extremes, namely the point
currently occupied by Chez Scheme's implementation of syntax-case and also
by the portable syntax-case system derived from it, which is used by
several other Scheme systems.

The two extremes are as follows:

Fully wrapped:
  A syntax object is an abstract value encapsulating a Scheme s-expression
  with contextual information about that s-expression.  When invoked by
  the expander, a transformer receives a syntax object and must return a
  syntax object.  Syntax objects are destructured using syntax-case, and
  syntax objects are constructed using (syntax template), which produces a
  syntax object representing a copy of template into which the values of
  any pattern variables appearing within template have been inserted in
  place of those pattern variables.  (This replacement is described in the
  SRFI draft, and I will not go into the details in this note.)

Mostly unwrapped:
  A syntax object is a pair or list of syntax objects, a vector of syntax
  objects, a nonlist, nonvector, nonsymbol value, or an abstract syntax
  object encapsulating a symbol.  When invoked by the expander, a
  transformer receives a syntax object and must return a syntax object. 
  Syntax objects can be destructured with syntax-case or list- and
  vector-processing operations.  Syntax objects may be constructed using
  list- or vector-processing operations or using (syntax template), which
  produces a syntax object representing a copy of template.  Within this
  copy, the values of any pattern variables appearing within template have
  been inserted in place of those pattern variables, and any symbol
  constants have been replaced by syntax objects.

Each extreme has advantages.  Option A's abstract representation frees an
implementation to choose an appropriate internal representation that
allows the implementation to provide more functionality and/or efficiency. 
Since fully wrapped syntax objects can be destructured only via
syntax-case, which automates matching and syntax checking, it encourages a
safe, high-level style of transformer code, like syntax-rules.  On the
other hand, Option B's mostly concrete representation provides more
flexibility to the programmer, who can use familiar list-processing
operations, like map, to process portions of a transformer's input.

Option B's flexibility can lead to an undisciplined style in which
appropriate matching is not done before a transformer goes grabbing for a
piece of the input or ignores extra pieces of the input that should not be
there.  Option A is much better in this regard, but the proposed
syntax->list, which is required to map transformer helpers over portions
of the input, can be applied to arbitrary portions of the input and so can
to some of the same problems.

The middle point I advocate shares the advantages of each extreme and
addresses the shortcomings of both.  This middle point can be described as
follows:

Partially unwrapped:
  A syntax object is a pair or list of syntax objects, a vector of syntax
  objects, a nonlist, nonvector, nonsymbol value, or an abstract syntax
  object encapsulating an s-expression.  When invoked by the expander, a
  transformer receives an abstract syntax object and must return a syntax
  object.  Abstract syntax objects, such as transformer input, are
  destructured using syntax-case.  The outer list and vector shells of a
  syntax object can be manipulated with list- and vector-processing
  operations.  Syntax objects may be created with list- and
  vector-processing operations or with (syntax template), which produces a
  copy of template into which the values of pattern variables have been
  inserted.  The abstractness (degree of unwrapping) of any portion of the
  template copy is defined by the following rules:

   - the copy of (t1 . t2) is a pair if t1 or t2 contain pattern variables,
   - the copy of (t ellipses) is a list if t contains pattern variables,
   - the copy of #(t1 ... tn) is a vector if any of {t1, ..., tn} contain
     pattern variables, and
   - the copy of any portion t not containing pattern variables is abstract.

The above rules are the key to this middle point between the two extremes. 
They allow the programmer to use list- and vector-processing operations,
but only on those portions of the transformer input that have been
properly matched.  For example, if a transformer's input is matched using
the following syntax-case input pattern:

  (_ ([x e] ...) b1 b2 ...)

the corresponding output expression can treat #'(x ...), #'(e ...),
#'([x e] ...), and #'(b1 b2 ...) as lists, but cannot delve into the x's,
e's, b1, or b2's without using syntax-case to further destructure them.

With this middle point, the representation of syntax objects, including
transformer input, remains sufficiently abstract to allow the
implementation great freedom, while the well-defined partial unwrapping
allows the programmer the appropriate amount of flexibility to manipulate
matched input.  In my (considerable) experience, this is all the freedom
one wants, and, IMO, it's all the freedom one should have, since we should
be encouraging the use of the high-level matching to fully check input
syntax.

Compared with the fully wrapped extreme, the middle point eliminates the
need for potentially problematic helpers like syntax->list.  Because a
transformer can return partially unwrapped output, it also eliminates the
need for corresponding output constructors like list->syntax.  Compared
with the mostly unwrapped extreme, the middle point ties the
implementation down less without unduly restricting the programmer from
using list- and vector-processing operations.  It also encourages a safer,
higher level style of transformer code.

I admit that the middle point isn't as simple as either extreme, and it
doesn't provide quite as much implementation freedom as the fully wrapped
extreme, but, IMO, it is the natural and right compromise between the two
extremes.

I would like to base the SRFI and reference implementation on this middle
point and merge the above discussion into the existing discussion of
abstractness in the issues section.  Please let me know ASAP if this is
unacceptable to you.  Of course, if community input or our own opinions
convince a majority of us that we should not go with this middle point, we
can update the SRFI or merely change the text before it goes into the
R6RS.

Kent