[R6RS] Scripts and toplevel code

Fri Aug 11 06:01:32 EDT 2006

Here's my take on the options for scripts.  I'm sure it's longer than it 
needs to be, but I wanted to make it available asap.

1. Summary of current proposals
===============================
The script proposal discussed in email by Kent and Mike addresses making 
an R6RS program executable by prepending a Unix script header, along 
with a few other details necessary for dealing with the script entry 
point and command line arguments.  The rationale for such scripts is the 
same as that in SRFI 22, which boils down to having a standard way to 
run an R6RS program, to support distribution of Scheme code that's 
portable across platforms and Scheme implementations.

(Note that the Unix-specific aspect here does not interfere with making 
the same scripts portable to Windows and (I believe) the Mac OSes.)

One major design decision for scripts is whether to support top-level 
code in scripts, or whether to require scripts to be implemented as R6RS 
libraries.  Mike and Kent had settled on the library approach, with the 
final proposal summarized here: 
https://r6rs.scheming.org/node/324#comment-1652 .  The following syntax 
for a script was proposed:

   <script header> <script spec> <library>

where

   <script header> -> #!/usr/bin/env scheme-script <line break> #!r6rs

   <script spec> -> (script <library name> <entry name>)

This proposal is very general and has numerous good properties (expanded 
on in the Design Factors section below).  However, it also results in a 
minimal script being somewhat heavyweight.  As I mentioned in an earlier 
note, a "Hello world" script would look like this:

   #!/usr/bin/env scheme-script
   #!r6rs
   (script hello-world hello)
   (library hello-world
     (import r6rs)
     (export hello)
     (define (hello arguments)
       (display "Hello world!")))

To allow for less heavyweight scripts, a subsequent exchange between 
Kent and me discussed supporting scripts which contain top-level code, as 
an alternative to a library.  This has resulted in the syntax proposed 
at https://r6rs.scheming.org/node/344#comment-1680 :

    #!/usr/bin/[env ]scheme-script <line break> [#!r6rs]
    (import <import-spec>*)
    <library body>
    <library>*

This allows for more concise small scripts:

   #!/usr/bin/env scheme-script
   #!r6rs
   (import r6rs)
   (display "Hello world!")

... as well as allowing for scripts which contain one or more libraries, 
which can be invoked by top-level code.
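
As an illustration only, a script of the second kind might look like the 
following sketch.  It is written in the draft syntax used in this note; the 
library name `greeter` and the procedure `greet` are hypothetical, and it 
assumes that the top-level import can refer to a library defined later in the 
same file, which the proposal does not spell out.

```scheme
#!/usr/bin/env scheme-script
#!r6rs
;; Top-level code: imports the base library plus a library
;; that is defined further down in this same script file.
(import r6rs greeter)
(greet "world")

;; Hypothetical library bundled with the script.
(library greeter
  (import r6rs)
  (export greet)
  ;; Print a greeting for NAME, followed by a newline.
  (define (greet name)
    (display "Hello ")
    (display name)
    (display "!")
    (newline)))
```

The top-level forms come first and the libraries follow, matching the order 
given in the proposed syntax above.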

The major difference between these two proposals centers around adding 
support for top-level code.  It has been proposed that the semantics for 
such top-level code be equivalent to that of a library body.

2. Rationale for supporting top-level code
==========================================

Currently, the only legal R6RS program consists purely of libraries 
(some of which may be wrapped in scripts, per the existing proposal). 
The library specification requires libraries to be explicitly named, and 
to explicitly specify imports and exports.  The script proposal adds 
further requirements, resulting in the heaviness mentioned earlier.  Not 
all programs require this level of infrastructure.

One of the traditional benefits of Scheme has been a kind of scalability 
along the dimension of source-code rigor: a program can start out as an 
experiment in the REPL, progress perhaps via cut & paste to becoming a 
relaxed script, with little structure, and later evolve into a more 
rigorously structured program.  This allows programs to evolve "from 
scripts to programs" (a catchphrase used repeatedly by the PLT group, in 
a slightly different but closely related context).  This property of 
Scheme is worth preserving, and worth addressing in the R6RS.

Scheme implementations will certainly provide ways of executing code 
outside of libraries. If the R6RS doesn't address this possibility, then 
it essentially becomes "the Scheme library specification", only 
addressing the semantics of code inside R6RS libraries.

It's also worth noting that abstracting out the verbosity of the script 
boilerplate is exactly the kind of thing that's often resolved with 
macros in Scheme.  However, in this case, that's not possible, because 
the library specification prohibits macros from generating libraries. 
This is an unusual situation in Scheme, and it provides an additional 
reason to address the issue in the R6RS.

Finally, related to the issue of a relaxed syntax, the requirement that 
all definitions precede expressions in library bodies is unnecessarily 
restrictive for many kinds of lighter-weight applications, including 
many scripting applications.  Again, many Schemes are certain to 
continue providing support for interleaved definitions and expressions 
in top-level code, and if the R6RS can address this requirement in a 
reasonable way, supporting it should be considered.

3. Design factors
=================

The following covers major choices involved in a script specification 
for the R6RS.  The Design Rationale section of SRFI 22 is also relevant, 
and not all of the points it raises will be repeated here.

The following general structure for scripts is assumed below:

   <script header> <script spec> <script body>

These components are covered individually below.  <script spec> is 
addressed last, because it depends on the choices made for <script body>.

3.1 Script Header
-----------------

The choices for <script header> are fairly constrained, and the proposal 
so far (as specified above) seems fine.  Making the #!r6rs specification 
optional would help a little for quick & dirty scripts.

3.2 Script Body
---------------

3.2.1 Script Startup
--------------------
There are two major choices for specifying how execution of a script 
should be started:

(a) By executing top level code (or code at the top level of a library 
body).

(b) By executing a particular named procedure, with a name that is either:
     (i) fixed by the specification, e.g. "main".
     or
     (ii) specified in the <script spec>, as in the current proposal.

For choice (a), a way to access the command line arguments is required, 
e.g. a procedure named 'command-line'.  For choice (b), command-line 
arguments can be passed as ordinary arguments to the script entry procedure.
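
To make the difference concrete, here is a rough sketch of both styles, with 
the script header omitted.  It assumes that `command-line` is a procedure of 
no arguments returning the command-line arguments as a list, and that under 
choice (b)(i) the fixed entry name is `main`; neither detail has been settled, 
and the library name `echo-args` is hypothetical.

```scheme
;;; Choice (a): execution starts with the top-level code itself.
;;; The arguments are fetched through an accessor procedure
;;; (assumed here to be called command-line).
(import r6rs)
(display (command-line))
(newline)

;;; Choice (b)(i): execution starts by calling a procedure whose
;;; name is fixed by the specification (assumed here to be main).
;;; The arguments arrive as an ordinary parameter.
(library echo-args
  (import r6rs)
  (export main)
  (define (main arguments)
    (display arguments)
    (newline)))
```

These are two alternative script bodies shown together for comparison, not a 
single script.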

3.2.2 Syntax of Script Body
---------------------------
There are three major choices for the syntax of the script body:

(a) An explicit library definition, as in the current proposal.  A 
variation on this could support multiple explicit libraries in a script, 
in which case some means is needed for selecting which library contains 
the script startup code, e.g. via the <script spec>.

(b) A top-level program of some kind. This would most likely closely 
resemble, or be identical to, a library body.  Depending on the details, 
this approach can complicate testing, debugging, and other kinds of 
reuse, if merely loading a script causes it to be executed.

(c) Support both (a) and (b).

3.2.2.1 Related script body issues
----------------------------------
If a script body is a library, it implies that other scripts and 
libraries can import it.  This is an advantage for debugging, testing, 
and other reuse of scripts.

If a script body can contain multiple libraries, distribution of an 
application as a set of libraries in a single script file becomes possible.

If a script body consists of top-level code, it raises the question of 
whether import of a script by other libraries and scripts should be 
supported.  To support this via the existing library import mechanism, 
it must be possible to treat the script body as a kind of library, which 
means that at a minimum, it needs a name.  One way to do this would be 
to provide a name as part of <script spec>.

Alternatively, scripts containing top-level code could remain anonymous.  
This would mean that they could not be invoked directly from other 
Scheme code, unless procedures for that purpose are provided, such as 
the 'load-script' and 'invoke-script' procedures described by Kent in: 
https://r6rs.scheming.org/node/262#comment-1552

3.2.2.2 Interleaved definitions and expressions
-----------------------------------------------
If the script body consists of top-level code, it could be specified to 
support interleaved definitions and expressions, to provide a more 
relaxed syntax, as mentioned in the rationale.

Mike has pointed out that this issue is orthogonal to that of the 
ability to portably execute scripts.  For example, a new 'begin'-style 
form could be provided to support interleaving.  However, even if such a 
form were provided, it could make sense to implement top-level script 
bodies in terms of that form.  Any conflation of concepts here seems 
harmless.
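
For example, a relaxed top-level script body might interleave definitions and 
expressions as in the sketch below (script header omitted).  This assumes a 
script body for which interleaving is permitted; the same sequence would not 
be legal as a library body under the current library specification, since a 
definition follows an expression.

```scheme
(import r6rs)

(define greeting "Hello")
(display greeting)        ; an expression ...
(display " ")

(define target "world")   ; ... followed by another definition
(display target)
(newline)
```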

3.3 Script Spec
---------------
The purpose of <script spec> is to specify the name of the library 
containing a script, and the procedure entry point within that library.

No <script spec> is needed if a script body consists either of top-level 
code or of a single library containing top-level code.  In both cases, 
evaluating the body causes the script to start executing, so no other 
entry point needs to be specified.

However, if a script contains multiple libraries, or if scripts are 
started via a procedure rather than top level code, then some means of 
specifying the appropriate library and procedure is needed.
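
As a sketch of that case, the following combines the `(script <library name> 
<entry name>)` form from the current proposal with the multiple-library 
variation mentioned in 3.2.2 (script header omitted).  The library names 
`util` and `app` and the procedures `shout` and `main` are hypothetical, and 
allowing more than one library after the <script spec> is an assumption that 
goes beyond the proposal as summarized above.

```scheme
;; Script spec: start execution by calling main from library app.
(script app main)

;; Helper library bundled in the same script file.
(library util
  (import r6rs)
  (export shout)
  ;; Print S followed by an exclamation mark and a newline.
  (define (shout s)
    (display s)
    (display "!")
    (newline)))

;; Library containing the entry point named in the script spec.
(library app
  (import r6rs util)
  (export main)
  (define (main arguments)
    (shout "Hello world")))
```

Here the <script spec> is what disambiguates between the two libraries; 
without it, nothing would indicate that `app` rather than `util` holds the 
entry point.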

3.4 Other choices
-----------------
There are some more minor choices, such as how to handle command line 
arguments.  To a large extent, such choices will be dictated by other 
more major decisions, such as whether the script entry point is a 
procedure.  Discussion of other such minor details has been omitted.

*end*