[R6RS] libraries

Fri Aug 18 00:43:54 EDT 2006

> Does the 10-15% estimate already take into account that no indirection
> and closure is needed for purely functional code that doesn't refer
> (transitively) to mutable variables or syntax constants?

No.  As I mentioned during our phone conversation, a sufficiently smart
compiler could probably eliminate the overhead for code that uses only
purely functional library imports.

Doing so would add substantial complexity to the compiler, however.  In
addition to the extra analysis required, it would involve building
knowledge of phased libraries into the compiler rather than taking care of
libraries at expansion time.

Why is this bad?  I have only so many hours to spend on the compiler, and
I can add only so much complexity before it becomes unmanageable, so the
time and code I'd spend overcoming phased library overhead will have to be
reflected in less time and code spent on actual optimization.  I'm
speaking for myself, of course, but I suspect that most compiler writers
operate under similar constraints.

> Would all R6RS
> functions be functional in this sense, so there's no penalty until a
> programmer starts using a module (at phase 0) that includes a mutable
> top-level variable or syntax constant?

I don't know.  I'm unsure whether exception handling, symbol->string,
dynamic-wind, or record-system state can be shared across phases.  Even if
the answer is yes in all cases, I'm unsure how the fact that this sharing
is "benign" can be expressed or determined automatically when the R6RS
libraries are built.  In the worst case, we'll be unable to detect that
the state of the exception system is benign, and we won't be able to
optimize uses of any procedure that can raise an exception, directly
or indirectly.

Fortunately, we don't have gensym or random, and we don't have standard
parameters like print-length and case-sensitive.  Most implementations do
have such features, however, and although the state won't be directly
visible in portable programs, it may be referenced behind the scenes by
some R6RS procedures.

> Meanwhile, it sounds like Larceny always has an indirection and closure
> anyway. Is that right? If so, do that mean there wouldn't be a
> performance hit at this level for Larceny? Put another way, would
> Larceny be 10-15% faster without the indirection?

I don't know much about Larceny, but Will may have been talking about
top-level procedures only, not realizing that closures would be forced
even upon internal procedures that reference library routines (i.e.,
virtually all procedures).  I'd be happy to elaborate.

> > and it is also enough to
> > inhibit some optimizations (perhaps all interlibrary optimization in less
> > sophisticated compilers). 
>
> I still don't understand this.

It will prevent some optimization in cases, where mutable state is
involved, and impede optimization in other cases simply by making it more
difficult.

> I can see how converting to `(env-ref n env)' too early in the
> compilation pipeline might obscure references to bindings in a library
> top level, and that would inhibit optimizations. But that just sounds
> like a problem in the compiler.

That's one way to look at it, but you've essentially just anticipated one
of my concerns, which is that knowledge of phased libraries has to be
built into a compiler if there is any hope of eliminating the added
overhead. 

> Doesn't the phaseless model mean that an "is defined?" check will be
> needed sometimes, or is this check easy to avoid (even in less
> sophisticated compilers)?

I don't think an "is defined?" check is needed, because the simpler
library model (Model 1) should be sufficient to guarantee that the right
libraries are loaded.  Even without this, the "is defined?" check can be
made once, just before the body of the importing library is executed, even
by less sophisticated compilers.

The bottom line for me is that there is going to be a run-time hit for
phased libraries, whether it's because we can't eliminate the overhead,
choose not to do so, or do so at the expense of time spent on other
optimizations.  Phased libraries have other costs as well, i.e., the
additional expand-time space and time overhead of visiting invoking
libraries multiple times.  I realize that phased libraries can eliminate
some build problems, but to me that's not a sufficient benefit for the
cost.

Kent