Base library

The libraries of the Scheme standard are organized according to
projected use. Hence, the `(rnrs base (6))` library exports
procedures and syntactic abstractions that are likely to be useful for
most Scheme programs and libraries. Conversely, each of the libraries
relegated to the separate report on libraries is likely to be missing
from the imports of a substantial number of programs and libraries.
Naturally, the specific decisions about this organization and the
separation of concerns of the libraries are debatable, and represent a
best attempt of the editors.

A number of secondary criteria were also used in choosing the exports
of the base library. In particular, macro transformers defined using
the facilities of the base library are guaranteed to be hygienic;
hygiene-breaking transformers are only available through the
`(rnrs syntax-case (6))` library.
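
The hygiene guarantee can be illustrated with a small sketch. The macro name `my-or` and the variable `result` are hypothetical and chosen only for this example; the point is that the temporary `t` introduced by the expansion does not capture the user's own `t`.

```scheme
(import (rnrs))

;; `my-or` is a hypothetical macro; its expansion introduces a temporary `t`.
(define-syntax my-or
  (syntax-rules ()
    ((_) #f)
    ((_ e) e)
    ((_ e rest ...)
     (let ((t e))
       (if t t (my-or rest ...))))))

;; The user's `t` is a different binding from the macro's `t`,
;; so the second operand still refers to the value 5.
(define result
  (let ((t 5))
    (my-or #f t)))
```

With a non-hygienic expander the inner `t` would be shadowed by the macro's temporary and `result` would be `#f` instead of 5.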

Note that `(rnrs base (6))` is not a “primitive library” in the
sense that all other libraries of the Scheme standard can be
implemented portably using only its exports. Moreover, the library
organization is generally not layered from more primitive to more advanced
libraries, even though some libraries can certainly be implemented in
terms of others.
Such an organization would have little benefit for users and may not
reflect the internal organization of any particular implementation.
Instead, libraries are organized by use.

The distinction between primitive and derived features was removed from the report for similar reasons.

In library bodies and local bodies, all definitions must precede all
expressions. R^{6}RS treats bodies in top-level programs as a special
case. Allowing definitions and expressions to be mixed in top-level
programs has ugly semantics, and introduces a special case, but was
allowed as a concession to convenience when constructing programs
rapidly via cut and paste.

Definitions are not interchangeable with expressions, so definitions cannot be allowed to appear wherever expressions can appear. Composition of definitions with expressions therefore must be restricted in some way. The question is what those restrictions should be.

Historically, top-level definitions in Scheme have had a different
semantics from definitions in bodies. In a body, definitions serve as
syntactic sugar for the bindings of a `letrec` (or `letrec*`
in R^{6}RS) that is implicit at the head of every body.
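
As a minimal sketch of that reading, the two procedures below should behave identically. The names `parity/defines`, `parity/letrec*`, `ev?`, and `od?` are hypothetical and exist only for this illustration.

```scheme
(import (rnrs))

;; A body that uses internal definitions ...
(define (parity/defines n)
  (define (ev? k) (if (= k 0) #t (od? (- k 1))))
  (define (od? k) (if (= k 0) #f (ev? (- k 1))))
  (if (ev? n) 'even 'odd))

;; ... and the same body with the implicit letrec* written out.
(define (parity/letrec* n)
  (letrec* ((ev? (lambda (k) (if (= k 0) #t (od? (- k 1)))))
            (od? (lambda (k) (if (= k 0) #f (ev? (- k 1))))))
    (if (ev? n) 'even 'odd)))
```

Because the bindings have `letrec*` scope, `ev?` and `od?` can refer to each other in both versions.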

That semantics can be stretched to cover top-level programs by converting expressions to definitions of ignored variables, but does not easily generalize to allow definitions to be placed anywhere within expressions. Different generalizations of definition placement are possible; however, a survey of current Scheme code found surprisingly few places where such a generalization would be useful.

If such a generalization were adopted, programmers who are
familiar with Java and similar languages might expect definitions to
be allowed in the same kinds of contexts that allow declarations in
Java. However, Scheme
definitions have `letrec*` scope, while Java declarations (inside
a method body) have `let*` scope and cannot be used to define
recursive procedures. Moreover, Scheme's `begin` expressions do not introduce
a new scope, while Java's curly braces do introduce a new scope. Also,
flow analysis is nontrivial in higher-order languages, while Java can
use a trivial flow analysis to reject programs with undefined
variables. Furthermore, Scheme's macro expander must locate all definitions,
while Java has no macro system. And so on. Rather than explain how
those facts justify restricting definitions to appear as top-level
forms of a body, it is simpler to explain that definitions are just
syntactic sugar for the bindings of an implicit `letrec*` at the
head of each body, and to explain that the relaxation of that
restriction for top-level bodies is (like several other features of
top-level programs) an ad-hoc special case.

The `syntax-rules` and `identifier-syntax` forms are
used to create macro transformers and are thus needed only at
expansion time, i.e., meta level 1.

The identifiers `unquote`, `unquote-splicing`, `=>`, and
`else` serve as literals in the syntax of one or more
syntactic forms; e.g., `else` serves as a
literal in the syntax of `cond` and `case`.
Bindings of these identifiers are exported from the base library so
that they can be distinguished from other bindings of these identifiers
or renamed on import.
The identifiers `...`, `_`, and `set!` serve as
literals in the syntax of `syntax-rules` and
`identifier-syntax` forms and are thus exported along with those
forms with level 1.

The `let-values` and `let*-values` forms are compatible with
SRFI 11 [16].

The definition of `eqv?` allows implementations latitude in
their treatment of procedures: implementations are free either to
detect or to fail to detect that two procedures are equivalent to each
other, and can decide whether or not to merge representations of
equivalent procedures by using the same pointer or bit pattern to
represent both. Moreover, they can use implementation techniques such
as inlining and beta reduction that duplicate otherwise equivalent
procedures.

The basic reason why the behavior of `eqv?` is not specified on
NaNs is that the IEEE-754 standard does not say much about how the
bits of a NaN are to be interpreted, and explicitly allows
implementations of that standard to use most of a NaN's bits to encode
implementation-dependent semantics. The implementors of a Scheme
system should therefore decide how `eqv?` should interpret those
bits.

Arguably, R^{6}RS should require

    (let ((x <expression>))
      (or (not (number? x))
          (eqv? x x)))

to evaluate to `#t` when <expression> evaluates to a number object;
both R^{5}RS and R^{6}RS imply this for certain other types, and for
most number objects, but not for NaNs. Since IEEE 754 and draft
IEEE 754R [19] both say that the interpretation of a NaN's
payload is left up to implementations, and implementations of Scheme
often do not have much control over the implementation of IEEE
arithmetic, it would be unwise for R^{6}RS to insist upon the truth of

    (let ((x <expression>))
      (or (not (number? x))
          (eqv? x x)))

even though that expression is likely to evaluate to `#t` in most
systems. For example, a system with delayed boxing of inexact real
number objects might box the two arguments to `eqv?` separately, the boxing
process might involve a change of precision, and the two separate
changes of precision may result in two different payloads.
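
In practice this means portable code should recognize NaNs with `nan?` rather than with `eqv?`. The following is a small sketch of the distinction; the variable names are hypothetical.

```scheme
(import (rnrs))

(define x +nan.0)

;; Specified: nan? recognizes a NaN, and = is #f whenever an argument is a NaN.
(define recognized? (nan? x))
(define numerically-equal? (= x x))

;; Unspecified: an implementation may answer #t or #f here,
;; so portable code should not depend on the result.
(define same-object? (eqv? x x))
```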

When *x* and *y* are flonums represented in IEEE floating
point or similar, it is reasonable to implement `(eqv? x y)` by a bitwise comparison of the floating-point
representations. R^{6}RS should not require this, however, because

- R^{6}RS does not require that flonums be represented by a floating-point representation,
- the interpretation of a NaN's payload is explicitly implementation-dependent according to both the IEEE-754 standard and the current draft of its proposed replacement, IEEE 754R, and
- the semantics of Scheme should remain independent of bit-level representations.

For example, IEEE 754, IEEE 754R, and the draft R^{6}RS all allow the
external representation `+nan.0` to be read as a NaN whose payload
encodes the input port and position at which `+nan.0` was read.
This is no different from any other external representation such as
`()`, `#()`, or

It is usually possible to implement `eq?` much more efficiently
than `eqv?`, for example, as a simple pointer comparison instead
of as some more complicated operation. One reason is that it may not
be possible to compute `eqv?` of two number objects in constant time,
whereas `eq?` implemented as pointer comparison will always
finish in constant time. The `eq?` predicate may be used like
`eqv?` in applications using procedures to implement objects with
state since it obeys the same constraints as `eqv?`.

R^{5}RS does not require implementations to support the full numeric
tower. Consequently, writing portable R^{5}RS programs that
perform substantial arithmetic is difficult; it is unnecessarily difficult even
to write programs whose arithmetic is portable between different
implementations in the same category. The portability problems were
most easily solved by requiring all implementations to support the
full numerical tower.

As mentioned in chapter 3, the treatment of
infinities, NaNs and -0.0, if present in a Scheme implementation, is
in line with IEEE 754 [18] and IEEE 754R [19].
Analogously, the specification of branch cuts for certain
transcendental functions has been changed from R^{5}RS to conform to
the IEEE standard.

The specification of the transcendental functions follows Steele [37], which in turn cites Penfield [26]; refer to these sources for more detailed discussion of branch cuts, boundary conditions, and implementation of these functions.

The domains of the `finite?`, `infinite?`, and `nan?`
procedures could be expanded to include all number objects, or perhaps even
all objects. However, R^{6}RS restricts them to real number objects.
Expanding `nan?` to complex number objects would involve at least some
arbitrariness; not expanding its domain while expanding the domains of
the other two would introduce an irregularity into the domains of
these three procedures, which are likely to be used together. It is
easier for programmers who wish to use these procedures with complex
number objects to express their intent in terms of the real-only versions
than it would be for the editors to guess their intent.
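
For instance, a programmer who wants component-wise semantics might write something like the following sketch. The helper names `complex-nan?` and `complex-finite?` are hypothetical, and the component-wise reading is only one of several defensible choices.

```scheme
(import (rnrs))

;; Hypothetical helper: a complex number object counts as a NaN
;; if either component is a NaN.
(define (complex-nan? z)
  (or (nan? (real-part z)) (nan? (imag-part z))))

;; Hypothetical helper: a complex number object counts as finite
;; only if both components are finite.
(define (complex-finite? z)
  (and (finite? (real-part z)) (finite? (imag-part z))))
```

A different program might prefer to treat a value with one infinite component and one NaN component as infinite, which is exactly the kind of intent the report leaves to the programmer.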

Scheme's numerical types are the exactness types exact and inexact,
the tower types integer, rational, real, complex, and number, and the
Cartesian product of the exactness types with the tower types, where
⟨*t*_{1}, *t*_{2}⟩ is regarded as a subtype of both *t*_{1} and
*t*_{2}.

These types have an aesthetic symmetry to them, but they are not equally
important. In practice,
there is reason to believe that the most important numerical types are
the exact integer objects, the exact rational number objects, the
inexact real number objects, and the
inexact complex number objects. This section explores one of the reasons
those four types are important in practice, and why real number objects have an
exact zero as their imaginary part in R^{6}RS (a change from R^{5}RS).

Each of the four types mentioned above corresponds to a set of values that turns up repeatedly as the natural domain or range of the functions that are computed by Scheme's standard procedures. The reason these types turn up so often is that they are closed under certain sets of operations.

The exact integer objects, for example, are closed under the integral
operations of addition, subtraction, and multiplication. The exact
rational number objects are closed under the rational operations, which consist of
the integral operations plus division (although division by zero is a special
case). The real number objects (and inexact real number objects) are closed
under some (often inexact) interpretation of rational and irrational
operations such as `exp` and `sin`, but are not closed under operations
such as `log`, `sqrt`, and `expt`. The complex (and
inexact complex) number objects are closed under the largest set of
operations.
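
These closure properties can be observed directly. The sketch below uses hypothetical variable names and only checks the type of each result, not its exact value.

```scheme
(import (rnrs))

(define int-result (* 6 7))        ; integral operation on exact integers
(define rat-result (/ 1 3))        ; division stays within the exact rationals
(define exp-result (exp 1.0))      ; exp maps real number objects to real number objects
(define sqrt-result (sqrt -4.0))   ; sqrt leaves the reals for a negative argument
```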

A naive implementation of Scheme's arithmetic operations is slow compared to the arithmetic operations of most other languages, mainly because most operations must perform case dispatch on the representation types of their arguments. The potential for this case dispatch arises when the type of an operation's argument is represented by a union of two or more representation types, or because the operation must raise an exception when given an argument of an incorrect type. (The second reason can be regarded as a special case of the first.)

To make Scheme's arithmetic more efficient, many implementations provide sets of operations whose domain is restricted to a single representation type, and which are not expected to raise an exception when given arguments of incorrect type when used in an unsafe mode.

Alternatively, or in addition, several compilers perform a flow analysis that attempts to infer the representation types of expressions. When a single representation type can be inferred for each argument of an operation, and those types match the types expected by some representation-specific version of the operation, then the compiler can substitute the specific version for the more general version that was specified in the source code.

Flow analysis is performed by solving the type and interval constraints that arise from such things as:

- the types of literal constants, e.g., `2` is an exact integer object that is known to be within the interval [2,2]
- conditional control flow that is predicated on known inequalities, e.g., `(if (< i n) <expression_1> <expression_2>)`
- conditional control flow that is predicated on known type predicates, e.g., `(if (real? x) <expression_1> <expression_2>)`
- the closure properties of known operations (for example, `(+` *flonum*_{1} *flonum*_{2}`)` always evaluates to a flonum)

The purpose of flow analysis (as motivated in this section) is to infer a single representation type for each argument of an operation. That places a premium on predicates and closure properties from which a single representation type can be inferred.

In practice, the most important single representation types are fixnum, flonum, and compnum. (A compnum is a pair of flonums, representing an inexact complex number object.) These are the representation types for which a short sequence of machine code can be generated when the representation type is known, but for which considerably less efficient code will probably have to be generated when the representation type cannot be inferred.

The fixnum representation type is not closed under any operation of
R^{5}RS, so it is hard for flow analysis to infer the fixnum type from
portable code. Sometimes the combination of a more general type (e.g.,
exact integer object) and an interval (e.g., [0,*n*), where *n* is known to
be a fixnum) can imply the fixnum representation type. Adding
fixnum-specific operations that map fixnums to fixnums
greatly increases the number of fixnum
representation types that a compiler can infer.
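
A loop written with fixnum-specific operations might look like the sketch below. The procedure `count-zeros` is hypothetical, and the code assumes that `vector-length` returns a fixnum, which holds in typical implementations.

```scheme
(import (rnrs))

;; Hypothetical example: count the elements of v that are the exact integer 0.
;; Both the index i and the counter are only ever produced by fx+,
;; so a compiler can infer the fixnum representation for them.
(define (count-zeros v)
  (let ((n (vector-length v)))
    (let loop ((i 0) (count 0))
      (if (fx<? i n)
          (loop (fx+ i 1)
                (if (eqv? (vector-ref v i) 0) (fx+ count 1) count))
          count))))
```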

The flonum representation type is not closed under operations such as
`sqrt` and `expt`, so flow analysis tends to break down in the
presence of those operations. This is unfortunate, because those
operations are normally used only with arguments for which the result
is expected to be a flonum. Adding flonum-specific versions such as
`flsqrt` and `flexpt` improves the effectiveness of flow
analysis.
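
As a small sketch, a procedure built only from flonum-specific operations is closed over flonums, whereas the generic `sqrt` is not. The name `fl-hypot` is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical example: every intermediate result is a flonum,
;; because fl*, fl+, and flsqrt all map flonums to flonums.
(define (fl-hypot x y)
  (flsqrt (fl+ (fl* x x) (fl* y y))))
```

Applying `flsqrt` to a negative flonum still yields a flonum (possibly a NaN), while `(sqrt -1.0)` yields a complex number object that is not a flonum.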

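R^{5}RS creates a more insidious problem by defining `(real? z)` to be true if and only if `(zero? (imag-part z))` is true. Under that definition an inexact complex number object whose imaginary part is an inexact zero counts as real, so a compiler cannot conclude from `(real? x)` that `x` is represented as a flonum. R^{6}RS therefore requires the imaginary part of a real number object to be an exact zero.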

The compnum representation type is closed under virtually all operations, provided no operation that accepts two compnums as its argument ever returns a flonum. To work around the problem described in the paragraph above, several implementations automatically convert compnums with a zero imaginary part to the flonum representation. This practice virtually destroys the effectiveness of flow analysis for inferring the compnum representation, so it is not a good workaround. To improve the effectiveness of flow analysis, it is better to change the definition of Scheme's real number objects as described in the paragraph above.

Given arithmetic on exact integer objects of arbitrary precision, it is a
trivial matter to derive signed and unsigned integer types of finite
range from it by modular reduction. For example, 32-bit signed
two's-complement arithmetic behaves like computing with the residue
classes “mod 2^{32}”, where the set {−2^{31}, ...,
2^{31}−1} has been chosen to represent the residue classes.
Likewise, unsigned 32-bit arithmetic also behaves like computing “mod
2^{32}”, but with a different set of representatives {0, ...,
2^{32}−1}.

Unfortunately, the R^{5}RS operations `quotient`, `remainder`,
and `modulo` are not ideal for this purpose. In the following
example, `remainder` fails to transport the additive group
structure of the integers over to the residues modulo 3.

    (remainder (+ (remainder -2 3)
                  (remainder 3 3))
               3) ⇒ -2

In fact, `modulo` should have been used, producing residues in
{0,1,2}. For modular reduction with symmetric residues, i.e., in
{−1,0,1} in the example, it is necessary to define a more
complicated reduction altogether.

Therefore, `quotient`, `remainder`, and `modulo` have been
replaced in R^{6}RS by the `div`, `mod`, `div0`, and `mod0` procedures, which are more useful when implementing modular
reduction. The underlying mathematical functions *div*,
*mod*, *div*_{0}, and *mod*_{0} (see report
section on “Integer division”) have been
adapted from the *div* and *mod* operations by Egner
et al. [11]. They differ in the representatives
from the residue classes they return: *div* and *mod*
always compute a non-negative residue, whereas *div*_{0} and
*mod*_{0} compute a residue from a set centered on 0. The
former can be used, for example, to implement unsigned fixed-width
arithmetic, whereas the latter correspond to two's-complement arithmetic.
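
The two uses can be sketched as follows. The names `two^32`, `wrap-unsigned32`, and `wrap-signed32` are hypothetical helpers introduced for this illustration.

```scheme
(import (rnrs))

(define two^32 (expt 2 32))

;; mod always yields a representative in {0, ..., 2^32 - 1}.
(define (wrap-unsigned32 n) (mod n two^32))

;; mod0 yields a representative in {-2^31, ..., 2^31 - 1}.
(define (wrap-signed32 n) (mod0 n two^32))
```

Unlike `remainder`, `mod` also preserves the additive structure in the earlier example: `(mod (+ (mod -2 3) (mod 3 3)) 3)` evaluates to 1.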

These operations differ slightly from the *div* and
*mod* operations from Egner et al. The latter make both operations
available through a single pair of operations that distinguish
between the two cases for residues by the sign of the divisor (as well
as returning 0 for a zero divisor). Splitting the operations into
two sets of procedures avoids potential confusion.

The procedures `modulo`, `remainder`, and `quotient` from
R^{5}RS can easily be defined in terms of `div` and `mod`.
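
One possible set of definitions is sketched below. The names `sign`, `r5-quotient`, `r5-remainder`, and `r5-modulo` are hypothetical, chosen to avoid clashing with any existing bindings, and the sketch assumes exact integer arguments with a nonzero divisor.

```scheme
(import (rnrs))

;; Hypothetical helper: -1, 0, or 1 according to the sign of n.
(define (sign n)
  (cond ((negative? n) -1)
        ((positive? n) 1)
        (else 0)))

;; Truncating division: compute on magnitudes, then restore the sign.
(define (r5-quotient n1 n2)
  (* (sign n1) (sign n2) (div (abs n1) (abs n2))))

;; The result takes the sign of the dividend.
(define (r5-remainder n1 n2)
  (* (sign n1) (mod (abs n1) (abs n2))))

;; The result takes the sign of the divisor.
(define (r5-modulo n1 n2)
  (* (sign n2) (mod (* (sign n2) n1) (abs n2))))
```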

The behavior of the numerical type predicates `complex?`, `real?`, `rational?`, and `integer?` is motivated by
closure properties described in
section 11.6.6. Conversely, the procedures
`real-valued?`, `rational-valued?`, and `integer-valued?`
test whether a given number object can be coerced to the specified type
without loss of numerical accuracy.
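
The difference between the two families of predicates shows up on values whose imaginary part is an inexact zero. The variable names below are hypothetical.

```scheme
(import (rnrs))

(define z -2.5+0.0i)  ; imaginary part is an inexact zero
(define x -2.5+0i)    ; imaginary part is an exact zero

(define z-real? (real? z))                ; #f: z is not a real number object
(define z-real-valued? (real-valued? z))  ; #t: z can be coerced to a real without loss
(define x-real? (real? x))                ; #t
```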

- `round`: The `round` procedure rounds to even for consistency with the default rounding mode specified by the IEEE floating-point standard.
- `sqrt`: The behavior of `sqrt` is consistent with the IEEE floating-point standard.
- `number->string`: If *z* is an inexact number object represented using binary floating point, and the radix is 10, then the expression listed in the specification is normally satisfied by a result containing a decimal point. The unspecified case allows for infinities, NaNs, and representations other than binary floating-point.
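
Rounding to even only matters for values exactly halfway between two integers, as this brief sketch shows. The variable names are hypothetical.

```scheme
(import (rnrs))

;; Ties go to the nearest even integer rather than always upward.
(define ties (map round '(0.5 1.5 2.5 3.5)))  ; (0.0 2.0 2.0 4.0)

;; The same rule applies to exact rationals.
(define exact-tie (round 7/2))                ; 4
```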

While R^{5}RS specifies characters and strings in terms of its own,
limited character set, R^{6}RS specifies characters and strings in
terms of Unicode. The primary goal of the design change was to
improve the portability of Scheme programs that manipulate text, while
preserving a maximum of backward compatibility with R^{5}RS.

R^{6}RS defines characters to be representations of Unicode scalar
values, and strings to be indexed sequences of characters. This is a
different representation for Unicode text than the representations
chosen by some other programming languages such as Java or
C#, which use UTF-16 code units as the basis for the type
of characters.

The representation of Unicode text corresponds to the lowest semantic level of the Unicode standard: The Unicode standard specifies most semantic properties in terms of Unicode scalar values. Thus, Unicode strings in Scheme allow the straightforward implementation of semantically sensitive algorithms on strings in terms of these scalar values.

In contrast, UTF-16 is a specific encoding for Unicode text, and performing semantic manipulation on UTF-16 representations of text is awkward. Choosing UTF-16 as the basis for the string representation would have meant that a character object potentially carries no semantic information at all, as surrogates have to be combined pairwise to yield the corresponding Unicode scalar value. (As a result, Java provides some semantic operations on Unicode text in two overloadings, one for character objects and one for integers that are Unicode scalar values.)

The surrogates cover a numerical range deliberately omitted from the
set of Unicode scalar values. Hence, surrogates have no
representation as characters—they are merely an artifact of the
design of UTF-16. Including surrogates in the set of characters
introduces complications similar to the complications of using UTF-16
directly. In particular, most Unicode consortium standards and
recommendations explicitly prohibit unpaired surrogates, including the
UTF-8 encoding, the UTF-16 encoding, the UTF-32 encoding, and
recommendations for implementing the ANSI C `wchar_t` type. Even
UCS-4, which originally permitted a larger range of values that
includes the surrogate range, has been redefined to match UTF-32
exactly. That is, the original UCS-4 range was shrunk and surrogates
were excluded.
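
The following sketch shows what the scalar-value model means for a character outside the Basic Multilingual Plane, which would occupy two code units in UTF-16. The names `clef`, `s`, and `scalar-value?` are hypothetical.

```scheme
(import (rnrs))

;; U+1D11E needs a surrogate pair in UTF-16 but is a single Scheme character.
(define clef (integer->char #x1D11E))
(define s (string #\a clef #\b))

;; Hypothetical helper: the integers that correspond to characters
;; are the Unicode scalar values, which exclude the surrogate range.
(define (scalar-value? n)
  (or (<= 0 n #xD7FF)
      (<= #xE000 n #x10FFFF)))
```

Here `(string-length s)` is 3 and `(string-ref s 1)` returns the whole character, with no surrogate halves visible to the program.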

Arguably, a higher-level model for text could be used as the basis for Scheme's character and string types, such as grapheme clusters. However, no design satisfying the goals stated above was available when the report was written.

Symbols have exactly the properties needed to represent identifiers in programs, and so most implementations of Scheme use them internally for that purpose. Symbols are useful for many other applications; for instance, they may be used the way enumerated values are used in C and Pascal.

A common use of `call-with-current-continuation` is for
structured, non-local exits from loops or procedure bodies, but in fact
`call-with-current-continuation` is useful for implementing a
wide variety of advanced control structures.
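
A non-local exit from a loop can be sketched as follows. The procedure `first-negative` and the parameter name `return` are hypothetical.

```scheme
(import (rnrs))

;; Hypothetical example: leave for-each as soon as a negative element is seen.
(define (first-negative lst)
  (call-with-current-continuation
    (lambda (return)
      (for-each (lambda (x)
                  (when (negative? x)
                    (return x)))   ; escapes from the whole call/cc expression
                lst)
      #f)))                        ; reached only if no element was negative
```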

Most programming languages incorporate one or more special-purpose
escape constructs with names like `exit`, `return`, or
even `goto`. In 1965, however, Peter Landin [23]
invented a general-purpose escape operator called the J-operator. John
Reynolds [28] described a simpler but equally powerful
construct in 1972. The `catch` special form described by Sussman
and Steele in the 1975 report on Scheme is exactly the same as
Reynolds's construct, though its name came from a less general construct
in MacLisp. Several Scheme implementors noticed that the full power of the
`catch` construct could be provided by a procedure instead of by a
special syntactic construct, and the name
`call-with-current-continuation` was coined in 1982. This name is
descriptive, but opinions differ on the merits of such a long name, and
some people use the name `call/cc` instead.

The `dynamic-wind` procedure was added more recently in R^{5}RS.
It enables implementing a number of abstractions related to
continuations, such as implementing a general dynamic environment, and
making sure that finalization code runs when some dynamic extent
expires. More generally, the `dynamic-wind` procedure provides a
guarantee that

    (dynamic-wind before thunk after)

cannot call *thunk* unless *before* has been called, and it
cannot leave the dynamic extent of the call to *thunk* without
calling *after*. These evaluations are never nested. As this
guarantee is crucial for enabling many of the uses of `call-with-current-continuation` and `dynamic-wind`, both are
specified jointly.
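
The interaction between the two procedures can be sketched with an escape out of the middle of a thunk. The names `trace`, `note!`, and `result` are hypothetical.

```scheme
(import (rnrs))

(define trace '())
(define (note! x) (set! trace (cons x trace)))

;; Escaping through k leaves the dynamic extent of the thunk,
;; so the after thunk runs even though the thunk never returns normally.
(define result
  (call-with-current-continuation
    (lambda (k)
      (dynamic-wind
        (lambda () (note! 'before))
        (lambda () (note! 'during) (k 'escaped) (note! 'not-reached))
        (lambda () (note! 'after))))))
```

After this runs, `(reverse trace)` is `(before during after)` and `result` is `escaped`.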

Many computations conceptually return several results. Scheme
expressions implementing such computations can return the results as
several values using the `values` procedure. Of course, such
expressions could alternatively return the results as a single
compound value, such as a list, vector, or a record. However, values
in programs usually represent conceptual wholes; in many cases,
multiple results yielded by a computation lack this coherence.
Moreover, returning a compound value would be inefficient in many implementations, and a
compiler would need to perform significant optimization to remove the
boxing and unboxing inherent in packaging multiple results into a
single value. Most importantly, the mechanism for multiple values in
Scheme establishes a standard policy for returning several results
from an expression, which makes constructing interfaces and using them
easier.
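
A brief sketch of producing and consuming multiple values follows. The names `min+max`, `range`, and `as-list` are hypothetical.

```scheme
(import (rnrs))

;; Hypothetical example: two results with no natural compound representation.
(define (min+max lst)
  (values (apply min lst) (apply max lst)))

;; let-values binds each returned value to its own variable.
(define range
  (let-values (((lo hi) (min+max '(3 1 4 1 5))))
    (- hi lo)))

;; call-with-values passes the values as arguments to a consumer.
(define as-list
  (call-with-values (lambda () (div-and-mod 17 5)) list))
```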

R^{6}RS does not specify the semantics of multiple values completely.
In particular, it does not specify what happens when several
(or zero) values are returned to a continuation that implicitly
accepts only one value. For example:

    ((lambda (x) x) (values 1 2)) ⇒ unspecified

Whether an implementation must raise an exception when evaluating such
an expression, or should exhibit some other, non-exceptional behavior
is a contentious issue. Variations of two different and fundamentally
incompatible positions on this issue exist, each with its own merits:

- Passing the wrong number of values to a continuation is typically a violation, one that implementations ideally detect and report.
- There is no such thing as returning the wrong number of values to a continuation. In particular, continuations not created by `begin` or `call-with-values` should ignore all but the first value, and treat zero values as one unspecified value.

R^{6}RS allows an implementation to take either position. Moreover, it
allows an implementation to let `set!`, `vector-set!`, and
other effect-only operators pass zero values to their
continuations, preventing a program from making obscure use of the return
value. This causes a potential compatibility problem with R^{5}RS,
which specifies that such expressions return a single unspecified
value, but the benefits of the change were deemed to outweigh the costs.

While the first subform of <srpattern> of a <syntax rule>
in a `syntax-rules` form (see report
section on “Macro transformers”)
may be an identifier, the
identifier is not involved in the matching and is not considered a
pattern variable or literal identifier. This is actually important,
as the identifier is most often the keyword used to identify the
macro. The scope of the keyword is determined by the binding form or
syntax definition that binds it to the associated macro transformer.
If the keyword were a pattern variable or literal identifier, then the
template that follows the pattern would be within its scope regardless
of whether the keyword were bound by `let-syntax`, `letrec-syntax`, or `define-syntax`.
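
The following sketch illustrates that the identifier in the first position of a pattern plays no part in matching. The macro `swap!` and the placeholder identifier `anything` are hypothetical; `_` is the more conventional choice for that position.

```scheme
(import (rnrs))

;; The first subform of the pattern is written as `anything` rather than
;; `swap!` or `_`; the use (swap! x y) matches all the same, because that
;; position is ignored during matching.
(define-syntax swap!
  (syntax-rules ()
    ((anything a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

(define x 1)
(define y 2)
(swap! x y)
```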