[R6RS] string->number

Wed Jun 8 11:18:57 EDT 2005

This talk of identifiers and numbers has reminded me of some string->number
issues that have come up in the past, particularly regarding division
by zero but also regarding overflow or underflow.  This is relevant
to our discussion of identifiers if we define the syntax in terms of
what string->number returns, and we should nail down the semantics of
string->number in any case.

At issue is what string->number should do with inputs like "0/0", "0#/0",
and "#e1e1000".  (string->number "0/0") might return #f or it might signal
an error.  (string->number "0#/0") might return +nan.0 or #f or it might
signal an error.  (string->number "#e1e1000") might return (expt 10 1000)
or it might return #f or signal an error, since 1e1000 => +inf.0 with
64-bit floats and there is no exact representation for +inf.0.
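
A quick way to see what a given implementation does with these inputs
is a probe along the following lines.  This is only a sketch: the name
probe is made up for illustration, and no output is shown because the
results are exactly what is unspecified.  On an implementation that
signals an error instead of returning #f, the loop will stop at the
offending input.

```scheme
;; Hypothetical helper: print what string->number returns for s,
;; and hand the result back to the caller.
(define (probe s)
  (let ((r (string->number s)))
    (write s)
    (display " => ")
    (write r)
    (newline)
    r))

;; The three inputs discussed above.
(for-each probe '("0/0" "0#/0" "#e1e1000"))
```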

We can actually separate this into two issues, which are (1) whether
string->number should signal an error or return #f when handed
a syntactically valid number that cannot be represented, and (2) how
string->number should combine numbers with mixed exact and inexact
components or treat numbers that begin with the #e and #i prefixes.

With respect to the first issue, the R5RS description of string->number
is no help:

  Returns a number of the maximally precise representation expressed by
  the given string.
  [...]
  If string is not a syntactically valid notation for a number, then
  string->number returns #f.

In general, there are many syntactically valid notations for numbers
that may not be representable, like 0/0, and R5RS doesn't specify what
string->number (or read, for that matter) does in such cases.

I believe that it's more convenient for string->number to return #f
(and never signal errors), but that it's also preferable for the reader
to signal an error in such cases.  Unfortunately, this would mean that a
read implementation could not rely entirely on string->number to decide
what constitutes a syntactically valid notation for a number.
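
To make the split concrete, here is a minimal sketch of how a reader
might tell the two #f cases apart.  It assumes string->number returns
#f (rather than signaling an error) for unrepresentable numbers, and
it recognizes only unsigned decimal ratios such as "1/0"; a real
reader would need the full number grammar.  The names digit?,
all-digits?, ratio-syntax?, and token-kind are hypothetical.

```scheme
;; True for the ASCII decimal digits only.
(define (digit? c)
  (and (char>=? c #\0) (char<=? c #\9)))

;; True if cs is a nonempty list of decimal digits.
(define (all-digits? cs)
  (and (pair? cs)
       (let loop ((cs cs))
         (or (null? cs)
             (and (digit? (car cs)) (loop (cdr cs)))))))

;; Restricted syntax check: <digits>/<digits> and nothing else.
(define (ratio-syntax? s)
  (let loop ((cs (string->list s)) (num '()))
    (cond ((null? cs) #f)
          ((char=? (car cs) #\/)
           (and (all-digits? (reverse num))
                (all-digits? (cdr cs))))
          (else (loop (cdr cs) (cons (car cs) num))))))

;; Returns the number if there is one, the symbol unrepresentable if
;; the token is a well-formed ratio that string->number rejected, and
;; the symbol not-a-number otherwise.
(define (token-kind s)
  (let ((n (string->number s)))
    (cond (n n)
          ((ratio-syntax? s) 'unrepresentable)
          (else 'not-a-number))))
```

A reader built this way could signal an error for unrepresentable
and go on to try the token as an identifier for not-a-number, while
string->number itself keeps returning #f in both cases.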

I didn't find any help in R5RS for the second issue either.  In a
discussion with Matthew some time back, I identified three possible
options.  There are surely other reasonable options as well.

(A) Treat each subpart as inexact if the #i prefix is specified, or if
    the #e prefix is not specified and any subpart is inexact, i.e.,
    contains a decimal point, exponent, or # character.  Treat each
    subpart as exact if the #e prefix is specified, or if the #i prefix
    is not specified and every subpart is exact.

(B) Treat each subpart as exact or inexact in isolation and use the
    usual rules for preserving inexactness when combining the subparts.
    Apply exact->inexact to the result if #i is present and inexact->exact
    to the result if #e is present.

(C) If #e and #i are not present, treat each subpart as exact or inexact
    in isolation and use the usual rules for preserving inexactness when
    combining the subparts.  If #e is present, treat each subpart as
    exact, with # digits treated as zeros.  If #i is present, treat each
    subpart as inexact.
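
The three options differ only in how each subpart gets its exactness
before the subparts are combined.  The following sketch captures just
that classification step; it does not parse or combine anything.  It
assumes radix 10 and that the #e or #i prefix has already been
stripped and passed separately.  The names subpart-inexact? and
subpart-exactness are hypothetical.

```scheme
;; A subpart is inexact on its own if it contains a decimal point,
;; an exponent marker, or a # digit (radix 10 assumed).
(define (subpart-inexact? s)
  (let loop ((cs (string->list s)))
    (cond ((null? cs) #f)
          ((memv (char-downcase (car cs))
                 '(#\. #\# #\e #\s #\f #\d #\l))
           #t)
          (else (loop (cdr cs))))))

;; option is one of the symbols a, b, c; prefix is the symbol e or i,
;; or #f when no prefix is present; parts is a list of subpart
;; strings.  Returns one symbol, exact or inexact, per subpart.
(define (subpart-exactness option prefix parts)
  (let ((natural (map (lambda (p)
                        (if (subpart-inexact? p) 'inexact 'exact))
                      parts)))
    (define (all x) (map (lambda (p) x) parts))
    (case option
      ;; A: the prefix wins; otherwise one inexact subpart makes
      ;; every subpart inexact.
      ((a) (cond ((eq? prefix 'i) (all 'inexact))
                 ((eq? prefix 'e) (all 'exact))
                 ((memq 'inexact natural) (all 'inexact))
                 (else (all 'exact))))
      ;; B: subparts are classified in isolation; the prefix is
      ;; applied to the combined result, not here.
      ((b) natural)
      ;; C: the prefix, if any, forces every subpart; otherwise the
      ;; subparts are classified in isolation.
      ((c) (cond ((eq? prefix 'i) (all 'inexact))
                 ((eq? prefix 'e) (all 'exact))
                 (else natural)))
      (else (all 'unknown)))))
```

For "0/0#" with no prefix, for instance, option A makes both subparts
inexact, while options B and C leave the numerator exact and make
only the denominator inexact.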

Here are some examples highlighting the differences among these options.
The * entries represent cases where string->number might signal an error
or return #f, depending upon what we decide on the first issue.

                       A                B                C

0/0                     *                *                *
0/0#               +nan.0                0                0
0#/0               +nan.0                *                *
0#/0#              +nan.0           +nan.0           +nan.0
#i0/0              +nan.0                *           +nan.0
#i0/0#             +nan.0              0.0           +nan.0
#i0#/0             +nan.0                *           +nan.0
#i0#/0#            +nan.0           +nan.0           +nan.0
#e0/0                   *                *                *
#e0/0#                  *                0                *
#e0#/0                  *                *                *
#e0#/0#                 *                *                *

1/0                     *                *                *
1/0#               +inf.0           +inf.0           +inf.0
1#/0               +inf.0                *                *
1#/0#              +inf.0           +inf.0           +inf.0
#i1/0              +inf.0                *           +inf.0
#i1/0#             +inf.0           +inf.0           +inf.0
#i1#/0             +inf.0                *           +inf.0
#i1#/0#            +inf.0           +inf.0           +inf.0
#e1/0                   *                *                *
#e1/0#                  *                *                *
#e1#/0                  *                *                *
#e1#/0#                 *                *                *

1/0+1.0i          +nan.0+1.0i            *                *
1.0+1/0i           1.0+nan.0i            *                *

#e1e1000         (expt 10 1000)          *         (expt 10 1000)
#e1#e1000        (expt 10 1001)          *         (expt 10 1001)

Chez Scheme's string->number currently implements option C and returns
#f in the * cases (i.e., never signals an error), but I'm not proposing
anything at this point, just pointing out the issue.

Kent