Formal comment #46 (defect)

LF should not be the only line separator

Reported by: Reinder Verlinde
Component: unicode
Version: 5.91

Section 3.2.1 contains the following production:

  --> that is not

First, a minor, textual issue: the < and > tags do not balance (a > is missing at the end of the line).

The major issue here is the choice to make "line feed" the only inter-line separator. http://en.wikipedia.org/wiki/Newline#Unicode, although far from normative, states:

  "The Unicode standard addresses the problem by defining a large number of characters that conforming applications should recognize as line terminators:

    LF: Line Feed, U+000A
    CR: Carriage Return, U+000D
    CR+LF: CR followed by LF, U+000D followed by U+000A
    NEL: Next Line, U+0085
    FF: Form Feed, U+000C
    LS: Line Separator, U+2028
    PS: Paragraph Separator, U+2029"

In my reading, this is consistent with http://www.unicode.org/reports/tr14. If a goal of the spec is to make Scheme Unicode compliant, it must follow the mandatory aspects of that page.

If this change is not made, be aware that source files will consist of a single line on some (minority) platforms (examples: Mac OS 9 and earlier and, if I read things correctly, EBCDIC-based systems).

This may mean additional changes to the grammar. I haven't completely thought it over, but I think the cleanest approach would be to define the lexical syntax as consisting of two phases, the first of which normalizes line endings (say, to LFs).

RESPONSE:

Unicode TR 14 does not seem relevant to the question of which characters are recognized as a linefeed following a backslash in a string. TR 14 is about rendering text that is represented as a sequence of Unicode scalar values.

Unicode TR 13, however, seems directly relevant. The recommendation of TR 13 is to recognize the platform-specific newline character, together with LS (U+2028) and PS (U+2029).

The R6RS grammar is meant to apply to a decoded stream.
That is, the specific bytes that represent a newline in some stream are meant to be converted to the LF character in the character stream that is parsed as a program. For example, under Mac OS 9, it is expected that the decoder used for reading ASCII-encoded source code will decode a #xD byte as an LF character. If the file is read via an R6RS port, the port should use a transcoder whose eol-style is 'cr.

Putting these pieces together, the R6RS grammar is almost consistent with Unicode TR 13. To make it more consistent, however, LS and PS should be added as alternatives for in the grammar, and we intend to make this change in the next draft.
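
The normalization phase suggested in the comment, and the decoding step described in the response, can be illustrated with a small sketch. It is written in Python rather than Scheme purely for illustration and is not part of the R6RS draft. The function names `normalize_line_endings` and `decode_cr_source` are hypothetical, and the set of terminators rewritten to LF (CR LF, CR, NEL, LS, PS) is an assumption chosen for the example, not a list taken from the grammar.

```python
import re

# Assumed set of terminators for this sketch: CR LF, CR, NEL, LS, PS.
# CR LF is listed first so the pair collapses to one LF rather than two.
_LINE_ENDING = re.compile("\r\n|[\r\u0085\u2028\u2029]")


def normalize_line_endings(text: str) -> str:
    """Phase 1: rewrite every recognized line terminator as a single LF."""
    return _LINE_ENDING.sub("\n", text)


def decode_cr_source(raw: bytes) -> str:
    """Decode ASCII source that uses CR line endings (Mac OS 9 style).

    Hypothetical helper: the #xD bytes reach the lexer as LF characters.
    """
    return normalize_line_endings(raw.decode("ascii"))
```

With this in place, `decode_cr_source(b"(define x 1)\r(display x)\r")` yields the string `"(define x 1)\n(display x)\n"`, so a grammar that only knows LF still sees two lines.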