[R6RS] draft Unicode SRFI

Matthew Flatt mflatt
Thu Jun 30 09:08:07 EDT 2005

At Thu, 30 Jun 2005 08:24:33 +0200, Michael Sperber wrote:
> >  * I added the \<eol><whitespaces> and \<space> string escapes, as
> >   discussed on this list.
> I think I probably missed \<space>---does this mean I can only
> terminate a variable-length escape sequence with a space?

No. It's so you can terminate a \<eol> sequence and continue with spaces.
See Kent's message here for an example:


> >  * I left octal escapes for strings intact for compatibility with C.
> >    (Also, I actually use them --- perhaps from spending too much time
> >    with UTF-8 encoding.) There's no octal for characters, though.
> Did you use octal escapes to denote UTF-8 code units or actually
> scalar values?

Code units.

> >  * I added an extension for symbols that allows any non-whitespace
> >    character above 127 where a <letter> is allowed. Is this too
> >    liberal?
> Yes.  Shouldn't we at least restrict to Unicode letters and numbers?

I think we want Unicode symbols to be in Scheme symbols, for example.

> It seems to me we at least should exclude Unicode separators.

Separators are defined by SRFI-14 to be whitespace, right?

> - Could the SRFI please have an issues section where the things we
>   haven't agreed on are listed?

Ok, I'll add that.

> - I think the delimiter issue for character literals could use an
>   example.  Otherwise, the point may get lost on the casual reader.

I'll add that, too.

> - The document says "any C string literal is also a Scheme string
>   literal": I don't believe that's true anymore, as the \x syntax is
>   variable-length in C.

In that case, I favor changing \x, but...

>  (The sentence is literally true, I guess, but
>   not in a meaningful way.)  As a result, I'm pretty confused on the
>   compatibility issue---if we're not compatible with C, we could also
>   make octal escapes fixed-length at least, to make the whole
>   scalar-value-literal issue a little less patchwork than it seems
>   now. Compatibility with C and Java should also be in the issues
>   section probably.

... there seems to be more support among the editors to ditch octal and
not worry about complete compatibility with C. That's ok with me.

> - What are your plans wrt the reference implementation?  In my mind,
>   we could and should provide one for most of it.  I'd be happy to
>   donate code.

I had no plans. I'm happy to assemble code starting with yours.

> - I don't understand how I could portably use the locale functionality
>   in my code, since the document doesn't specify a single string I
>   might use as a locale name.  Also, the locale stuff could (and, to
>   my mind should) have a reference implementation for at least some
>   locales from the Unicode standard.  (We could bum a starting point
>   off Alex Shinn, I think.)

I'll investigate standards on locale names, which I've never done
before. In MzScheme, it's effectively defined as "whatever setlocale()

> - The sentence on UnicodeData.txt should probably be expanded a little
>   bit and include a link to
>   http://www.unicode.org/Public/UNIDATA/UnicodeData.txt to be
>   understandable by non-Unicode-wizards.


> - The section on here strings should probably refer to the scsh
>   manual, and possibly to the manuals of PLT Scheme and Gambit-C.


> Typos:


> - The document makes out Neil Van Dyke as an R6RS editor.

Sorry, Anton! (You, Neil, and David were faceless northeastern "Van"s
to me on the plt mailing list, at first. I don't confuse the people,
anymore, but I sometimes use the wrong name --- just like I sometimes
call my kids by the wrong names.)

