Overview of Scheme

This chapter gives an overview of Scheme’s semantics. The purpose of this overview is to explain enough about the basic concepts of the language to facilitate understanding of the subsequent chapters of the report, which are organized as a reference manual. Consequently, this overview is not a complete introduction to the language, nor is it precise in all respects or normative in any way.

Following Algol, Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.

Scheme has latent as opposed to manifest types [46]. Types are associated with values (also called objects) rather than with variables. (Some authors refer to languages with latent types as untyped, weakly typed or dynamically typed languages.) Other languages with latent types are Python, Ruby, Smalltalk, and other dialects of Lisp. Languages with manifest types (sometimes referred to as strongly typed or statically typed languages) include Algol 60, C, C#, Java, Haskell, and ML.

All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent. No Scheme object is ever destroyed. The reason that implementations of Scheme do not (usually!) run out of storage is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation. Other languages in which most objects have unlimited extent include C#, Java, Haskell, most Lisp dialects, ML, Python, Ruby, and Smalltalk.

Implementations of Scheme are required to be properly tail-recursive. This allows the execution of an iterative computation in constant space, even if the iterative computation is described by a syntactically recursive procedure. Thus with a properly tail-recursive implementation, iteration can be expressed using the ordinary procedure-call mechanics, so that special iteration constructs are useful only as syntactic sugar. See section 4.9.

Scheme was one of the first languages to support procedures as objects in their own right. Procedures can be created dynamically, stored in data structures, returned as results of procedures, and so on. Other languages with these properties include Common Lisp, Haskell, ML, Ruby, and Smalltalk.

One distinguishing feature of Scheme is that continuations, which in most other languages only operate behind the scenes, also have “first-class” status. Continuations are useful for implementing a wide variety of advanced control constructs, including non-local exits, backtracking, and coroutines. See section 9.16.

In Scheme, the argument expressions of a procedure call are evaluated before the procedure gains control, whether the procedure needs the result of the evaluation or not. C, C#, Common Lisp, Python, Ruby, and Smalltalk are other languages that always evaluate argument expressions before invoking a procedure. This is distinct from the lazy-evaluation semantics of Haskell, or the call-by-name semantics of Algol 60, where an argument expression is not evaluated unless its value is needed by the procedure.

Scheme’s model of arithmetic is designed to remain as independent as possible of the particular ways in which numbers are represented within a computer. In Scheme, every integer is a rational number, every rational is a real, and every real is a complex number. Scheme distinguishes between exact arithmetic, which corresponds to the mathematical ideal, and inexact arithmetic on approximations. Scheme implementations support exact arithmetic on at least integers, rationals, and complex numbers created from their rectangular components.

Scheme programs manipulate *values*, which are also referred
to as *objects*.
Scheme values are organized into sets of values called *types*.
This section gives an overview of the fundamentally important types of the
Scheme language. More types are described in later chapters.

Note:As Scheme is latently typed, the use of the termtypein this report differs from the use of the term in the context of other languages, particularly those with manifest typing.

A boolean value denotes a truth value, and can be either
true or false. In Scheme, the value for “false” is written
`#f`. The value “true” is written `#t`. In
most places where a truth value is expected, however, any value different from
`#f` counts as true.

Scheme supports a rich variety of numerical data types, including integers of arbitrary precision, rational numbers, complex numbers, and inexact numbers of various kinds. Chapter 2 gives an overview of the structure of Scheme’s numerical tower.

Scheme characters mostly correspond to textual characters.
More precisely, they are isomorphic to the *scalar values* of
the Unicode standard.

Strings are finite sequences of characters with fixed length and thus represent arbitrary Unicode texts.

A symbol is an object representing a string,
the symbol’s *name*.
Unlike strings, two symbols whose names are spelled the same
way are never distinguishable. Symbols are useful for many applications;
for instance, they may be used the way enumerated values are used in
other languages.

A pair is a data structure with two components. The most common use of pairs is to represent (singly linked) lists, where the first component (the “car”) represents the first element of the list, and the second component (the “cdr”) the rest of the list. Scheme also has a distinguished empty list, which is the last cdr in a chain of pairs that form a list.

Vectors, like lists, are linear data structures representing finite sequences of arbitrary objects. Whereas the elements of a list are accessed sequentially through the chain of pairs representing it, the elements of a vector are addressed by an integer index. Thus, vectors are more appropriate than lists for random access to elements.

Procedures are values in Scheme.

The most important elements of Scheme code are
*expressions*. Expressions can be
*evaluated*, producing a *value*. (Actually, any number
of values—see section 4.7.) The most
fundamental expressions are literal expressions:

23 ⇒ 23

This notation means that the expression `#t` evaluates to
`#t`, that is, the value for “true”, and that the expression
`23` evaluates to the number 23.

Compound expressions are formed by placing parentheses around their subexpressions. The first subexpression identifies an operation; the remaining subexpressions are operands to the operation:

(+ 14 (* 23 42)) ⇒ 980

`
In the first of these examples, ``+` is the name of
the built-in operation for addition, and `23` and `42` are the
operands. The expression `(+ 23 42)` reads as “the sum of 23 and
42”. Compound expressions can be nested—the second example reads
as “the sum of 14 and the product of 23 and 42”.

As these examples indicate, compound expressions in Scheme are always written using the same prefix notation. As a consequence, the parentheses are needed to indicate structure. Consequently, “superfluous” parentheses, which are often permissible in mathematical notation and also in many programming languages, are not allowed in Scheme.

As in many other languages, whitespace (including newlines) is not significant when it separates subexpressions of an expression, and can be used to indicate structure.

Scheme allows identifiers to denote locations containing values. These identifiers are called variables. In many cases, specifically when the location’s value is never modified after its creation, it is useful to think of the variable as denoting the value directly.

(y 42))

(+ x y)) ⇒ 65

In this case, the expression starting with `let` is a binding
construct. The parenthesized structure following the `let` lists
variables alongside expressions: the variable `x` alongside `23`, and the variable `y` alongside `42`. The `let`
expression binds `x` to 23, and `y` to 42. These bindings are
available in the *body* of the `let` expression, `(+ x
y)`, and only there.

The variables bound by a `let` expression
are *local*, because their bindings are visible only in the
`let`’s body. Scheme also allows creating top-level bindings for
identifiers as follows:

(define y 42)

(+ x y) ⇒ 65

(These are actually “top-level” in the body of a top-level program or library; see section 1.11 below.)

The first two parenthesized structures are *definitions*; they
create top-level bindings, binding `x` to 23 and `y` to 42.
Definitions are not expressions, and cannot appear in all places
where an expression can occur. Moreover, a definition has no value.

Bindings follow the lexical structure of the program: When several bindings with the same name exist, a variable refers to the binding that is closest to it, starting with its occurrence in the program and going from inside to outside, going all the way to a top-level binding only if no local binding can be found along the way:

(define y 42)

(let ((y 43))

(+ x y)) ⇒ 66

(let ((y 43))

(let ((y 44))

(+ x y))) ⇒ 67

While definitions are not expressions, compound expressions and definitions exhibit similar syntactic structure:

(* x 2)

`
While the first line contains a definition, and the second an
expression, this distinction depends on the bindings for ``define`
and `*`. At the purely syntactical level, both are
*forms*, and *form* is the general name for
a syntactic part of a Scheme program. In particular, `23` is a
*subform*of the form `(define x 23)`.

Definitions can also be used to define procedures:

(+ x 42))

(f 23) ⇒ 65

A procedure is, slightly simplified, an abstraction over an
expression. In the example, the first definition defines a procedure
called `f`. (Note the parentheses around `f x`, which
indicate that this is a procedure definition.) The expression `(f
23)` is a procedure call, meaning,
roughly, “evaluate `(+ x 42)` (the body of the procedure) with
`x` bound to 23”.

As procedures are regular values, they can be passed to other procedures:

(+ x 42))

(define (g p x)

(p x))

(g f 23) ⇒ 65

In this example, the body of `g` is evaluated with `p`
bound to `f` and `x` bound to 23, which is equivalent
to `(f 23)`, which evaluates to 65.

In fact, many predefined operations of Scheme are provided not by
syntax, but by variables whose values are procedures.
The `+` operation, for example, which receives
special syntactic treatment in many other languages, is just a regular
identifier in Scheme, bound to a procedure that adds numbers. The
same holds for `*` and many others:

(op x y))

(h + 23 42) ⇒ 65

(h * 23 42) ⇒ 966

Procedure definitions are not the only way to create procedures. A
`lambda` expression creates a new procedure as a value, with no
need to specify a name:

The entire expression in this example is a procedure call; `(lambda (x) (+ x 42))`, evaluates to a procedure that takes a single
number and adds 42 to it.

Whereas `(+ 23 42)`, `(f 23)`, and `((lambda (x) (+ x 42))
23)` are all examples of procedure calls, `lambda` and `let` expressions are not. This is because `let`, even though
it is an identifier, is not a variable, but is instead a *syntactic
keyword*. A form that has a
syntactic keyword as its first subexpression obeys special rules determined by
the keyword. The `define` identifier in a definition is also a
syntactic keyword. Hence, definitions are also not procedure calls.

The rules for the `lambda` keyword specify that the first
subform is a list of parameters, and the remaining subforms are the body of
the procedure. In `let` expressions, the first subform is a list
of binding specifications, and the remaining subforms are a body of
expressions.

Procedure calls can generally be distinguished from these “special forms” by
looking for a syntactic keyword in the first position of an
form: if it is not a syntactic keyword, the expression
is a procedure call.
(So-called *identifier macros* allow creating other kinds of
special forms, but are comparatively rare.)
The set of syntactic keywords of Scheme is
fairly small, which usually makes this task fairly simple.
It is possible, however, to create new bindings for syntactic keywords; see
below.

Scheme variables bound by definitions or `let` or `lambda`
forms are not actually bound directly to the values specified in the
respective bindings, but to locations containing these values. The
contents of these locations can subsequently be modified destructively
via *assignment*:

(set! x 42)

x) ⇒ 42

In this case, the body of the `let` expression consists of two
expressions which are evaluated sequentially, with the value of the
final expression becoming the value of the entire `let`
expression. The expression `(set! x 42)` is an assignment, saying
“replace the value in the location denoted by `x` with 42”.
Thus, the previous value of `x`, 23, is replaced by 42.

Many of the special forms specified in this report
can be translated into more basic special forms.
For example, `let` expressions can be translated
into procedure calls and `lambda` expressions. The following two
expressions are equivalent:

(y 42))

(+ x y)) ⇒ 65

((lambda (x y) (+ x y)) 23 42)

⇒ 65

Special forms like `let` expressions are called *derived
forms*because their semantics can be
derived from that of other kinds of forms by a syntactic
transformation. Some procedure definitions are also derived forms. The
following two definitions are equivalent:

(+ x 42))

(define f

(lambda (x)

(+ x 42)))

In Scheme, it is possible for a program to create its own derived forms by binding syntactic keywords to macros:

(syntax-rules ()

((def f (p ...) body)

(define (f p ...)

body))))

(def f (x)

(+ x 42))

The `define-syntax` construct specifies that a parenthesized
structure matching the pattern `(def f (p ...) body)`, where `f`, `p`, and `body` are pattern variables, is translated to
`(define (f p ...) body)`. Thus, the `def` form appearing in
the example gets translated to:

(+ x 42))

The ability to create new syntactic keywords makes Scheme extremely flexible and expressive, allowing many of the features built into other languages to be derived forms in Scheme.

A subset of the Scheme values called *datum
values*have a special
status in the language. These include booleans, numbers, characters, symbols,
and strings as well as lists and vectors whose elements are datums. Each
datum value may be represented in textual form as a
*syntactic datum*, which can be written out
and read back in without loss of information, giving a syntactic value
equal to the original (in the sense of `equal?`; see section 9.6).
A datum value may be represented by several different syntactic datums,
but the datum value corresponding to a syntactic datum is uniquely
determined up to equality (in the sense of `equal?`).
Moreover, each datum value
can be trivially translated to a literal expression in a program by
prepending a ` ’` to a corresponding syntactic datum:

’

’foo ⇒ foo

’(1 2 3) ⇒ (1 2 3)

’#(1 2 3) ⇒ #(1 2 3)

The ` ’` shown in the previous examples
is not needed for number or boolean literals.
The identifier

The syntactic datums form a superset of the Scheme forms. Thus, datums can be used to represent Scheme forms as data objects. In particular, symbols can be used to represent identifiers.

’(define (f x) (+ x 42))

⇒ (define (f x) (+ x 42))

This facilitates writing programs that operate on Scheme source code, in particular interpreters and program transformers.

Scheme code can be organized in components called
*libraries*. Each library contains
definitions and expressions. It can import definitions
from other libraries and export definitions to other libraries:

(export)

(import (rnrs base (6))

(rnrs i/o simple (6)))

(display "Hello World")

(newline))

A Scheme program is invoked via a *top-level program*.
Like a library, a top-level program contains definitions and
expressions, but specifies an entry point for execution. Thus a
top-level program defines, via the transitive closure of the libraries it
imports, a Scheme program.

(import (rnrs base (6))

(rnrs i/o ports (6))

(rnrs programs))

(put-bytes (standard-output-port)

(call-with-port

(open-file-input-port

(cadr (command-line)))

get-bytes-all))