[R6RS] draft Unicode SRFI

Thu Jun 30 23:50:19 EDT 2005

On 30-Jun-05, at 10:58 AM, Matthew Flatt wrote:

> Another draft is enclosed. See below for a few additional replies.

Nice job, Matthew!  Here are some comments.

"about Unicode 127" -> "above Unicode 127" ?

I don't fully agree with this bullet in the issues section:

     "C supports octal notation within strings and characters (as
     do some Scheme implementations). Octal notation is not included
     in this draft because the notation is no longer popular, and
     the variable-width encoding is confusing."

because the \0 notation for the nul character is quite common.
Unfortunately it would be ugly to make a special case for the nul
character (i.e. only support \0), so the complete set of octal
escapes should be supported.  Moreover the octal string escapes
exist in several other programming languages (i.e. they are "more
portable", and consequently it is possible for "write" to use
octal escapes for control characters to obtain a string literal that
would be identical in C, Java, Python, Perl, etc).  I won't oppose
removing octal escapes in strings if a majority wants them out
of R6RS.  This is actually a good issue for the SRFI discussion
to work out.  So either we add the octal string escape syntax in
the SRFI, or we add to the above bullet a mention about the
downside of not supporting octal string escapes.
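To make the trade-off concrete, here is a small sketch of how octal escapes behave in Python, one of the languages mentioned above. It only illustrates existing Python string-literal behaviour; the variable names are made up for the example, and nothing here is proposed SRFI syntax.

```python
# Octal escapes in Python string literals take one to three octal digits.
nul = "\0"          # one digit: the nul character
letter_a = "\101"   # three digits: octal 101 is decimal 65, i.e. "A"
newline = "\012"    # octal 12 is decimal 10, the newline character

# The variable width is where the confusion comes from: the escape ends
# after three octal digits, or at the first character that is not an
# octal digit, whichever comes first.
a_then_one = "\1011"    # "\101" followed by the ordinary character "1"
nul_then_eight = "\08"  # "\0" followed by "8", since 8 is not an octal digit

print([ord(c) for c in nul + letter_a + newline])
print(repr(a_then_one), len(nul_then_eight))
```

The last two literals show both sides of the argument: \0 is convenient, but how much of the following text belongs to the escape depends on what the next characters happen to be.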

Concerning \x<x><x>, I note that Python also restricts \x to exactly
two <x>s and \u<x><x><x><x> to exactly four <x>s.  On the other hand,
\U<x>... is restricted to exactly eight <x>s in Python.  But I feel OK
with the SRFI's specification because it is easy to remember
(x=2, u=4, U=6) and is not wildly incompatible with other practice.
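For comparison, a short sketch of Python's fixed-width hexadecimal escapes. This shows Python's behaviour only (two digits after \x, four after \u, eight after \U); the draft's six-digit \U form is not something Python accepts, and the variable names are just for illustration.

```python
# Python's hexadecimal escapes have fixed widths.
two_digits = "\x41"           # \x takes exactly two hex digits
four_digits = "\u0041"        # \u takes exactly four hex digits
eight_digits = "\U00000041"   # \U takes exactly eight hex digits

# Because the width is fixed, a following hex digit is never absorbed.
x_then_one = "\x411"          # "\x41" followed by the character "1"
u_then_one = "\u00411"        # "\u0041" followed by the character "1"

# \U is what reaches code points beyond the four-digit range.
beyond_bmp = "\U0001F600"

print(two_digits == four_digits == eight_digits)
print(repr(x_then_one), repr(u_then_one), hex(ord(beyond_bmp)))
```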

I oppose the generalization of here strings to the Perl
syntax, because Scheme's syntax would no longer be context-free.

In the "Locales" section:

    "and the Unicode downcase mapping should be used, for
    example, to case-normalize Scheme symbols."

I'm not sure what you are referring to, given that in R6RS
the syntax is case-sensitive.

Why should integer->char "signal an error" when given an
integer outside the Unicode scalar values?  Can't we just
say "it is an error"?
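For reference, the Unicode scalar values are the code points 0 through #xD7FF and #xE000 through #x10FFFF, i.e. everything up to #x10FFFF except the surrogates. The sketch below, in Python, shows what the "signal an error" reading would look like; both function names are hypothetical and are not taken from the draft.

```python
def is_unicode_scalar_value(n):
    """True if n is a Unicode scalar value (any code point except a surrogate)."""
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


def strict_integer_to_char(n):
    """Hypothetical strict conversion: always raises on a non-scalar value.

    This models the "signal an error" wording. Under "it is an error",
    an implementation would be free to skip the check.
    """
    if not is_unicode_scalar_value(n):
        raise ValueError("not a Unicode scalar value: %#x" % n)
    return chr(n)


print(strict_integer_to_char(0x3BB))
for bad in (0xD800, 0x110000, -1):
    try:
        strict_integer_to_char(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Python's own chr is an example of the looser behaviour: it raises for values above #x10FFFF but accepts surrogate code points such as #xD800.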

Why do you define <eol> as either Unicode 10 or Unicode 12?
Only Unicode 10 (the #\newline character) should mark an end
of line.

Something is wrong with the \<eol><whitespace>* syntax because
<whitespace> is defined as "space, newline, tab, form feed,
vertical tab, and carriage return" in SRFI 14.  Clearly newline
should be excluded.
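A rough sketch of the problem, written in Python rather than Scheme. It assumes the continuation rule means "drop the backslash, the newline, and the run of whitespace that follows"; the function and constant names are invented for the example and do not come from the draft.

```python
# Whitespace set as quoted from SRFI 14 in the paragraph above.
SRFI14_WHITESPACE = " \n\t\f\v\r"
# The same set with newline excluded, as suggested.
WHITESPACE_NO_NEWLINE = " \t\f\v\r"


def strip_continuations(text, whitespace):
    """Drop each backslash-newline pair and the whitespace run after it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == "\n":
            i += 2  # skip the backslash and the end of line
            while i < len(text) and text[i] in whitespace:
                i += 1  # skip the whitespace that follows
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# A continuation followed by a blank-looking line and then more text.
raw = "abc\\\n   \n   def"

print(repr(strip_continuations(raw, SRFI14_WHITESPACE)))
print(repr(strip_continuations(raw, WHITESPACE_NO_NEWLINE)))
```

With newline in the set, the continuation also swallows the following line break, so the two lines merge; with newline excluded, it stops at the next end of line.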

In the "Procedures" section, one might think that "char-comparator"
and "char-ci-comparator" are procedures that are specified
by this SRFI.  A note should be added that these procedures
are not "exported".

"each accept two string" => "each accept two strings"

Why should a locale be specified with a string?  Wouldn't a
symbol be more sensible?  I have no experience with this, so
please correct me if I am wrong.  A symbol also avoids the
need to create a copy of the string when current-locale is
called.

Acknowledgements: let's get it right... "Van Staaten" -> "van
Straaten".

Marc
