[R6RS] draft Unicode SRFI

Thu Jun 30 02:25:34 EDT 2005

Good work---many thanks!

I've got a number of comments---as usual, I'm staying silent on the
stuff I like and sticking to the criticism, hopefully constructive.

Matthew Flatt <mflatt at cs.utah.edu> writes:

>  * I added the \<eol><whitespaces> and \<space> string escapes, as
>   discussed on this list.

I think I probably missed \<space>---does this mean I can only
terminate a variable-length escape sequence with a space?  If so, that
seems silly.

>  * Marc's here-string and quoted symbols are in.

> To discuss:

Whatever we don't resolve here should probably go into an issues
section.

>  * I left octal escapes for strings intact for compatibility for C.
>    (Also, I actually use them --- perhaps from spending too much time
>    with UTF-8 encoding.) There's no octal for characters, though.

Did you use octal escapes to denote UTF-8 code units or actually
scalar values?  (I'm still opposed.)
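
To make the question concrete, here is a small sketch of the two readings. It is written in Python rather than Scheme, only because Python's bytes and string literals happen to show both interpretations of the same octal escapes. The sample escapes \303\251 are chosen purely for illustration and do not come from the draft.

```python
# Two readings of the same pair of octal escapes, \303\251.

# Reading 1: each escape denotes a UTF-8 code unit (a byte).
# Decoding the two bytes yields a single character, U+00E9.
as_code_units = b"\303\251".decode("utf-8")

# Reading 2: each escape denotes a Unicode scalar value.
# The result is two characters, U+00C3 followed by U+00A9.
as_scalar_values = "\303\251"

print([hex(ord(c)) for c in as_code_units])
print([hex(ord(c)) for c in as_scalar_values])
```

The same source text gives strings of different lengths under the two readings, so a draft that keeps octal escapes has to say which one it means.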

>  * I added an extension for symbols that allows any non-whitespace
>    character above 127 where a <letter> is allowed. Is this too
>    liberal?

Yes.  Shouldn't we restrict this to Unicode letters and numbers?
At the very least, we should exclude Unicode separators.
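
A minimal sketch of such a restriction, assuming it is phrased in terms of Unicode general categories (L* for letters, N* for numbers, Z* for separators). It is in Python because its standard unicodedata module exposes the categories directly. The function name extended_letter_ok is hypothetical, and the rule is one possible reading, not wording from the draft.

```python
import unicodedata


def extended_letter_ok(ch: str) -> bool:
    """Hypothetical rule: a character above 127 may appear where a
    <letter> is allowed only if its general category is a letter (L*)
    or a number (N*). ASCII is left to the existing <letter> rule."""
    if ord(ch) <= 127:
        return False
    return unicodedata.category(ch)[0] in ("L", "N")


# A few probes: Greek lambda, an Arabic-Indic digit, NO-BREAK SPACE,
# LINE SEPARATOR, and RIGHTWARDS ARROW.
for ch in ["\u03bb", "\u0663", "\u00a0", "\u2028", "\u2192"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), extended_letter_ok(ch))
```

Note that this rule also rejects symbol characters such as U+2192, so anything beyond letters and numbers would need to be admitted explicitly.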

>    Also, should we try to allow `->' as a symbol at this
>    point?

If we talk about symbols at all, we should.  (More on that below.)

- Could the SRFI please have an issues section where the things we
  haven't agreed on are listed?  On that list, by me, are:

  o duplication and potential confusion between #\linefeed and
    #\newline
  o alternate syntax for numerical scalar values in character and
    string literals
  o Anton's (Perl's) generalization of here strings
  o \<eol><whitespaces>
  o #\o<o><o><o>

  Otherwise, we'll just run around the same block on the SRFI list as
  here needlessly.  (We'll probably run around it anyway, but this
  way, at least it doesn't drop out of the sky for the readers.
  Politically, I also think we'll get into trouble with some of the
  potential participants of the discussion if we don't at least agree
  on what the issues are.)

- I think the delimiter issue for character literals could use an
  example.  Otherwise, the point may get lost on the casual reader.

- The document says "any C string literal is also a Scheme string
  literal": I don't believe that's true anymore, as the \x syntax is
  variable-length in C.  (The sentence is literally true, I guess, but
  not in a meaningful way.)  As a result, I'm pretty confused on the
  compatibility issue---if we're not compatible with C, we could also
  make octal escapes fixed-length at least, to make the whole
  scalar-value-literal issue a little less patchwork than it seems
  now. Compatibility with C and Java should also be in the issues
  section probably.

- I think the whole symbol-syntax issue hasn't been discussed
  adequately on the list, and I think it doesn't need to be in this
  SRFI.  If it is, there should be an entry in the issues section, and
  a rationale.  Moreover, as you pointed out, there should be a
  grammar for the lexical syntax.

- What are your plans wrt the reference implementation?  In my mind,
  we could and should provide one for most of it.  I'd be happy to
  donate code.

- I don't understand how I could portably use the locale functionality
  in my code, since the document doesn't specify a single string I
  might use as a locale name.  Also, the locale stuff could (and, to
  my mind should) have a reference implementation for at least some
  locales from the Unicode standard.  (We could bum a starting point
  off Alex Shinn, I think.)

- The sentence on UnicodeData.txt should probably be expanded a little
  and include a link to
  http://www.unicode.org/Public/UNIDATA/UnicodeData.txt so that it is
  understandable by non-Unicode-wizards.

- The section on here strings should probably refer to the scsh
  manual, and possibly to the manuals of PLT Scheme and Gambit-C.
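
For the UnicodeData.txt item above, a short illustration of the file format might help readers who have never opened it. Each line is one record of 15 semicolon-separated fields. The sketch below is in Python and parses the record for U+0041. The function name parse_record and the choice of which fields to keep are assumptions made for this example.

```python
# One record from UnicodeData.txt: 15 fields separated by semicolons.
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"


def parse_record(line: str) -> dict:
    """Split one UnicodeData.txt record and keep a few of its fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")

    def code_point(text: str):
        # Mapping fields hold a hex code point or are empty.
        return int(text, 16) if text else None

    return {
        "code_point": int(fields[0], 16),          # field 0
        "name": fields[1],                         # field 1
        "general_category": fields[2],             # field 2
        "simple_uppercase": code_point(fields[12]),
        "simple_lowercase": code_point(fields[13]),
        "simple_titlecase": code_point(fields[14]),
    }


record = parse_record(SAMPLE)
print(record)
```

The general category in field 2 and the simple case mappings in fields 12 to 14 are the parts a character-classification or case-conversion procedure would most likely draw on.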

Typos:

- "charcter"

- "can be includes" -> "can be included"

- "returna" -> "return a"

- "delimitter"

- The document makes out Neil Van Dyke as an R6RS editor.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla