[R6RS] draft Unicode SRFI

Fri Jul 1 09:17:05 EDT 2005

New draft enclosed...

At Thu, 30 Jun 2005 23:50:40 -0400, Marc Feeley wrote:
> I don't fully agree with this bullet in the issues section:
> 
>      "C supports octal notation within strings and characters (as
>      do some Scheme implementations). Octal notation is not included
>      in this draft because the notation is no longer popular, and
>      the variable-width encoding is confusing."

I've revised it. 

> In the "Locales" section:
> 
>     "and the Unicode downcase mapping should be used, for
>     example, to case-normalize Scheme symbols."
> 
> I'm not sure what you are referring to, given that in R6RS
> the syntax is case-sensitive.

There was too much cut-and-paste here, now fixed.

> Why should integer->char "signal an error" when given an
> integer outside the Unicode scalar values?  Can't we just
> say "it is an error"?

Ok.

> Why do you define <eol> as either Unicode 10 or Unicode 12?
> Only Unicode 10 (the #\newline character) should mark an end
> of line.
> 
> Something is wrong with the \<eol><whitespace>* syntax because
> <whitespace> is defined as "space, newline, tab, form feed,
> vertical tab, and carriage return" in SRFI 14.  Clearly newline
> should be excluded.

Thanks for bringing this up. I had forgotten that it's an issue.

Here's my take:

The wide definition of <eol> means that a file transferred or opened
improperly across systems with different line-separator conventions
will still work.

Examples:

 * Depending on how it's created, a Mac OS file might have \r as a
   separator instead of \n. Allowing \r as <eol> means that the escape
   works in the way that a casual reader would expect, no matter how
   the file is opened.

 * Depending on how it's created, a Windows file is likely to have an
   \r\n separator. Same as above, and if \r is an <eol>, then \n should
   be allowed to continue the escape.

Of course, I'm aware of the convention for reading a file in "text"
mode, but I believe that it works badly in practice. (I frequently have
to clean up files sent to me that have \r\r\n in them.) We can design
around it effectively.
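
To make the wide <eol> rule concrete, here is a rough sketch in Python
(not Scheme, and not the draft's reference behavior). The helper name
`join_continued_lines` is hypothetical. It assumes <eol> is one of \r\n,
\r, or \n, and that only spaces and tabs are skipped after it; the draft's
<whitespace> is wider than that. It also ignores every other escape
sequence, so a literal containing an escaped backslash before a line break
would need a real tokenizer.

```python
import re

# Hypothetical helper, not part of the SRFI draft.
# Matches a backslash, then a line break in any of the three common
# conventions, then any leading spaces or tabs on the next line.
# Limiting the skipped blanks to space and tab is an assumption.
_LINE_CONTINUATION = re.compile(r"\\(?:\r\n|\r|\n)[ \t]*")


def join_continued_lines(literal_body: str) -> str:
    """Drop every backslash-<eol>-blanks sequence from a string body."""
    return _LINE_CONTINUATION.sub("", literal_body)


# The same string literal as it might arrive with three line conventions.
unix_body = "abc\\\n   def"
old_mac_body = "abc\\\r   def"
windows_body = "abc\\\r\n   def"

results = [
    join_continued_lines(body)
    for body in (unix_body, old_mac_body, windows_body)
]
```

With this rule, all three bodies read back as the same string, which is
the behavior a casual reader would expect regardless of how the file was
transferred.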

My line of reasoning doesn't work for here-strings. I predict that
widely used libraries won't have them (if the end-of-line content of
the string matters), because it won't be worth the trouble that people
create by incorrectly transferring the code. I like here-strings for
one-off scripts, though.

For now, I've just added to "Issues", but I'll change the spec if
others also prefer <newline> in place of <eol>.

> In the "Procedures" section, one might think that "char-comparator"
> and "char-ci-comparator" are procedures that are specified
> by this SRFI.  A note should be added that these procedures
> are not "exported".

I added a note.

> Why should a locale be specified with a string?  Wouldn't a
> symbol be more sensible?  I have no experience with this, so
> please correct me if I am wrong.  A symbol also avoids the
> need to create a copy of the string when current-locale is
> called.

It sounds like `with-locale' belongs in a SRFI that better standardizes
locales. Let's get rid of it and `current-locale'. I think the other
`locale' functions are still useful with the default locale.

> Acknowledgements: let's get it right... "Van Staaten" -> "van
> Straaten".

Yes, I'm hopeless.

-Matthew