Chapter 1

Overview of Scheme

This chapter gives an overview of Scheme’s semantics. The purpose of this overview is to explain enough about the basic concepts of the language to facilitate understanding of the subsequent chapters of the report, which are organized as a reference manual. Consequently, this overview is not a complete introduction to the language, nor is it precise in all respects or normative in any way.

Following Algol, Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.

Scheme has latent as opposed to manifest types [28]. Types are associated with values (also called objects) rather than with variables. (Some authors refer to languages with latent types as untyped, weakly typed or dynamically typed languages.) Other languages with latent types are Python, Ruby, Smalltalk, and other dialects of Lisp. Languages with manifest types (sometimes referred to as strongly typed or statically typed languages) include Algol 60, C, C#, Java, Haskell, and ML.

All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent. No Scheme object is ever destroyed. The reason that implementations of Scheme do not (usually!) run out of storage is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation. Other languages in which most objects have unlimited extent include C#, Java, Haskell, most Lisp dialects, ML, Python, Ruby, and Smalltalk.

Implementations of Scheme are required to be properly tail-recursive. This allows the execution of an iterative computation in constant space, even if the iterative computation is described by a syntactically recursive procedure. Thus with a properly tail-recursive implementation, iteration can be expressed using the ordinary procedure-call mechanics, so that special iteration constructs are useful only as syntactic sugar. See section 4.9.

Scheme was one of the first languages to support procedures as objects in their own right. Procedures can be created dynamically, stored in data structures, returned as results of procedures, and so on. Other languages with these properties include Common Lisp, Haskell, ML, Ruby, and Smalltalk.

One distinguishing feature of Scheme is that continuations, which in most other languages only operate behind the scenes, also have “first-class” status. Continuations are useful for implementing a wide variety of advanced control constructs, including non-local exits, backtracking, and coroutines. See section 9.16.

In Scheme, the argument expressions of a procedure call are evaluated before the procedure gains control, whether the procedure needs the result of the evaluation or not. C, C#, Common Lisp, Python, Ruby, and Smalltalk are other languages that always evaluate argument expressions before invoking a procedure. This is distinct from the lazy-evaluation semantics of Haskell, or the call-by-name semantics of Algol 60, where an argument expression is not evaluated unless its value is needed by the procedure.

Scheme’s model of arithmetic provides a rich set of numerical types and operations on them. Furthermore, it distinguishes exact and inexact number objects: Essentially, an inexact number object corresponds to a number exactly, and an inexact number objects is the result of a computation that involved rounding or other errors.

1.1 Basic types

Scheme programs manipulate values, which are also referred to as objects. Scheme values are organized into sets of values called types. This section gives an overview of the fundamentally important types of the Scheme language. More types are described in later chapters.

Note: As Scheme is latently typed, the use of the term type in this report differs from the use of the term in the context of other languages, particularly those with manifest typing.

Boolean values

A boolean value denotes a truth value, and can be either true or false. In Scheme, the value for “false” is written #f. The value “true” is written #t. In most places where a truth value is expected, however, any value different from #f counts as true.

Numbers

Scheme supports a rich variety of numerical data types, including objects representing integers of arbitrary precision, rational numbers, complex numbers, and inexact numbers of various kinds. Chapter 2 gives an overview of the structure of Scheme’s numerical tower.

Characters

Scheme characters mostly correspond to textual characters. More precisely, they are isomorphic to the scalar values of the Unicode standard.

Strings

Strings are finite sequences of characters with fixed length and thus represent arbitrary Unicode texts.

Symbols

A symbol is an object representing a string, the symbol’s name. Unlike strings, two symbols whose names are spelled the same way are never distinguishable. Symbols are useful for many applications; for instance, they may be used the way enumerated values are used in other languages.

Pairs and lists

A pair is a data structure with two components. The most common use of pairs is to represent (singly linked) lists, where the first component (the “car”) represents the first element of the list, and the second component (the “cdr”) the rest of the list. Scheme also has a distinguished empty list, which is the last cdr in a chain of pairs that form a list.

Vectors

Vectors, like lists, are linear data structures representing finite sequences of arbitrary objects. Whereas the elements of a list are accessed sequentially through the chain of pairs representing it, the elements of a vector are addressed by an integer index. Thus, vectors are more appropriate than lists for random access to elements.

Procedures

Procedures are values in Scheme.

1.2 Expressions

The most important elements of Scheme code are expressions. Expressions can be evaluated, producing a value. (Actually, any number of values—see section 4.7.) The most fundamental expressions are literal expressions:

#t ⇒ #t 23 ⇒ 23

This notation means that the expression #t evaluates to #t, that is, the value for “true”, and that the expression 23 evaluates to a number object representing the number 23.

Compound expressions are formed by placing parentheses around their subexpressions. The first subexpression identifies an operation; the remaining subexpressions are operands to the operation:

(+ 23 42) ⇒ 65 (+ 14 (* 23 42)) ⇒ 980

In the first of these examples, + is the name of the built-in operation for addition, and 23 and 42 are the operands. The expression (+ 23 42) reads as “the sum of 23 and 42”. Compound expressions can be nested—the second example reads as “the sum of 14 and the product of 23 and 42”.

As these examples indicate, compound expressions in Scheme are always written using the same prefix notation. As a consequence, the parentheses are needed to indicate structure. Consequently, “superfluous” parentheses, which are often permissible in mathematical notation and also in many programming languages, are not allowed in Scheme.

As in many other languages, whitespace (including newlines) is not significant when it separates subexpressions of an expression, and can be used to indicate structure.

1.3 Variables and binding

Scheme allows identifiers to denote locations containing values. These identifiers are called variables. In many cases, specifically when the location’s value is never modified after its creation, it is useful to think of the variable as denoting the value directly.

(let ((x 23) (y 42)) (+ x y)) ⇒ 65

In this case, the expression starting with let is a binding construct. The parenthesized structure following the let lists variables alongside expressions: the variable x alongside 23, and the variable y alongside 42. The let expression binds x to 23, and y to 42. These bindings are available in the body of the let expression, (+ x y), and only there.

1.4 Definitions

The variables bound by a let expression are local, because their bindings are visible only in the let’s body. Scheme also allows creating top-level bindings for identifiers as follows:

(define x 23) (define y 42) (+ x y) ⇒ 65

(These are actually “top-level” in the body of a top-level program or library; see section 1.11 below.)

The first two parenthesized structures are definitions; they create top-level bindings, binding x to 23 and y to 42. Definitions are not expressions, and cannot appear in all places where an expression can occur. Moreover, a definition has no value.

Bindings follow the lexical structure of the program: When several bindings with the same name exist, a variable refers to the binding that is closest to it, starting with its occurrence in the program and going from inside to outside, going all the way to a top-level binding only if no local binding can be found along the way:

(define x 23) (define y 42) (let ((y 43)) (+ x y)) ⇒ 66 (let ((y 43)) (let ((y 44)) (+ x y))) ⇒ 67

1.5 Forms

While definitions are not expressions, compound expressions and definitions exhibit similar syntactic structure:

(define x 23) (* x 2)

While the first line contains a definition, and the second an expression, this distinction depends on the bindings for define and *. At the purely syntactical level, both are forms, and form is the general name for a syntactic part of a Scheme program. In particular, 23 is a subformof the form (define x 23).

1.6 Procedures

Definitions can also be used to define procedures:

(define (f x) (+ x 42)) (f 23) ⇒ 65

A procedure is, slightly simplified, an abstraction over an expression. In the example, the first definition defines a procedure called f. (Note the parentheses around f x, which indicate that this is a procedure definition.) The expression (f 23) is a procedure call, meaning, roughly, “evaluate (+ x 42) (the body of the procedure) with x bound to 23”.

As procedures are regular values, they can be passed to other procedures:

(define (f x) (+ x 42)) (define (g p x) (p x)) (g f 23) ⇒ 65

In this example, the body of g is evaluated with p bound to f and x bound to 23, which is equivalent to (f 23), which evaluates to 65.

In fact, many predefined operations of Scheme are provided not by syntax, but by variables whose values are procedures. The + operation, for example, which receives special syntactic treatment in many other languages, is just a regular identifier in Scheme, bound to a procedure that adds number objects. The same holds for * and many others:

(define (h op x y) (op x y)) (h + 23 42) ⇒ 65 (h * 23 42) ⇒ 966

Procedure definitions are not the only way to create procedures. A lambda expression creates a new procedure as a value, with no need to specify a name:

((lambda (x) (+ x 42)) 23) ⇒ 65

The entire expression in this example is a procedure call; (lambda (x) (+ x 42)), evaluates to a procedure that takes a single number object and adds 42 to it.

1.7 Procedure calls and syntactic keywords

Whereas (+ 23 42), (f 23), and ((lambda (x) (+ x 42)) 23) are all examples of procedure calls, lambda and let expressions are not. This is because let, even though it is an identifier, is not a variable, but is instead a syntactic keyword. A form that has a syntactic keyword as its first subexpression obeys special rules determined by the keyword. The define identifier in a definition is also a syntactic keyword. Hence, definitions are also not procedure calls.

The rules for the lambda keyword specify that the first subform is a list of parameters, and the remaining subforms are the body of the procedure. In let expressions, the first subform is a list of binding specifications, and the remaining subforms are a body of expressions.

Procedure calls can generally be distinguished from these “special forms” by looking for a syntactic keyword in the first position of an form: if it is not a syntactic keyword, the expression is a procedure call. (So-called identifier macros allow creating other kinds of special forms, but are comparatively rare.) The set of syntactic keywords of Scheme is fairly small, which usually makes this task fairly simple. It is possible, however, to create new bindings for syntactic keywords; see below.

1.8 Assignment

Scheme variables bound by definitions or let or lambda forms are not actually bound directly to the values specified in the respective bindings, but to locations containing these values. The contents of these locations can subsequently be modified destructively via assignment:

(let ((x 23)) (set! x 42) x) ⇒ 42

In this case, the body of the let expression consists of two expressions which are evaluated sequentially, with the value of the final expression becoming the value of the entire let expression. The expression (set! x 42) is an assignment, saying “replace the value in the location denoted by x with 42”. Thus, the previous value of x, 23, is replaced by 42.

1.9 Derived forms and macros

Many of the special forms specified in this report can be translated into more basic special forms. For example, let expressions can be translated into procedure calls and lambda expressions. The following two expressions are equivalent:

(let ((x 23) (y 42)) (+ x y)) ⇒ 65 ((lambda (x y) (+ x y)) 23 42) ⇒ 65

Special forms like let expressions are called derived formsbecause their semantics can be derived from that of other kinds of forms by a syntactic transformation. Some procedure definitions are also derived forms. The following two definitions are equivalent:

(define (f x) (+ x 42)) (define f (lambda (x) (+ x 42)))

In Scheme, it is possible for a program to create its own derived forms by binding syntactic keywords to macros:

(define-syntax def (syntax-rules () ((def f (p ...) body) (define (f p ...) body)))) (def f (x) (+ x 42))

The define-syntax construct specifies that a parenthesized structure matching the pattern (def f (p ...) body), where f, p, and body are pattern variables, is translated to (define (f p ...) body). Thus, the def form appearing in the example gets translated to:

(define (f x) (+ x 42))

The ability to create new syntactic keywords makes Scheme extremely flexible and expressive, allowing many of the features built into other languages to be derived forms in Scheme.

1.10 Syntactic datums and datum values

A subset of the Scheme values called datum valueshave a special status in the language. These include booleans, number objects, characters, symbols, and strings as well as lists and vectors whose elements are datums. Each datum value may be represented in textual form as a syntactic datum, which can be written out and read back in without loss of information, giving a syntactic value equal to the original (in the sense of equal?; see section 9.6). A datum value may be represented by several different syntactic datums, but the datum value corresponding to a syntactic datum is uniquely determined up to equality (in the sense of equal?). Moreover, each datum value can be trivially translated to a literal expression in a program by prepending a ’ to a corresponding syntactic datum:

’23 ⇒ 23 ’#t ⇒ #t ’foo ⇒ foo ’(1 2 3) ⇒ (1 2 3) ’#(1 2 3) ⇒ #(1 2 3)

The ’ shown in the previous examples is not needed for number representations or boolean literals. The identifier foo is a syntactic datum that represents a symbol with name “foo”, and ’foo is a literal expression with that symbol as its value. (1 2 3) is a syntactic datum that represents a list with elements 1, 2, and 3, and ’(1 2 3) is a literal expression with this list as its value. Likewise, #(1 2 3) is a syntactic datum that represents a vector with elements 1, 2 and 3, and ’#(1 2 3) is the corresponding literal.

The syntactic datums form a superset of the Scheme forms. Thus, datums can be used to represent Scheme forms as data objects. In particular, symbols can be used to represent identifiers.

’(+ 23 42) ⇒ (+ 23 42) ’(define (f x) (+ x 42)) ⇒ (define (f x) (+ x 42))

This facilitates writing programs that operate on Scheme source code, in particular interpreters and program transformers.

1.11 Libraries

Scheme code can be organized in components called libraries. Each library contains definitions and expressions. It can import definitions from other libraries and export definitions to other libraries:

(library (hello) (export) (import (rnrs base (6)) (rnrs io simple (6))) (display "Hello World") (newline))

1.12 Top-level programs

A Scheme program is invoked via a top-level program. Like a library, a top-level program contains definitions and expressions, but specifies an entry point for execution. Thus a top-level program defines, via the transitive closure of the libraries it imports, a Scheme program.

#!r6rs (import (rnrs base (6)) (rnrs io ports (6)) (rnrs programs)) (put-bytes (standard-output-port) (call-with-port (open-file-input-port (cadr (command-line))) get-bytes-all))