[R6RS] Timeline for R6RS SRFIs

Marc Feeley feeley
Fri Jun 3 11:34:48 EDT 2005


On 3-Jun-05, at 10:41 AM, Manuel Serrano wrote:

> Marc wrote,
>
>
>>> As for port creation procedures, no new procedures are required.  A
>>> port (as created by open-input-file, etc) would be viewed as a stream
>>> of octets and read-u8 and write-u8 would impose a character encoding
>>> on that stream of octets (either in an implementation defined way or
>>>
>>>
>>
>> Sorry, I meant read-char and write-char impose a character encoding on
>> the octet stream.
>>
> Could you elaborate on that? I don't really understand what you mean.
> Sorry.
>
> --
> Manuel
>

What I mean is that all R6RS ports (at this point this means ports  
attached to files) are conceptually a stream of octets.  The stream  
of octets can be read with the procedure read-u8.  Now the procedure  
read-char can be implemented in terms of read-u8 like this:

    (define (read-char . other) ; implements latin1 encoding
      (let ((port (if (null? other) (current-input-port) (car other))))
        (let ((n (read-u8 port)))
          (if (eof-object? n)
              n
              (integer->char n)))))

or like this:

    (define (read-char . other) ; implements utf8 encoding
      (let ((port (if (null? other) (current-input-port) (car other))))
        (let ((a (read-u8 port)))
          (if (eof-object? a)
              a
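              (cond ((<= a #x7f) ; one octet: 0xxxxxxx
                     (integer->char a))
                    ((<= #xc2 a #xdf) ; two octets: 110xxxxx 10xxxxxx
                     (let ((b (read-u8 port)))
                       ;; the second octet must be a continuation octet
                       (if (or (eof-object? b) (not (<= #x80 b #xbf)))
                           (error "invalid utf8 encoding")
                           (integer->char (+ (* 64 (modulo a 32))
                                             (modulo b 64))))))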
                    ...etc)))))
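
To make the elided cases concrete, here is a rough sketch of a complete UTF-8 decoder. It is an illustration under assumptions, not part of the proposal: instead of a port it reads from a hypothetical thunk next-octet that returns the next octet or #f at the end of the stream, the helper names utf8-decode-char and list->octet-source are invented for this example, and it assumes an implementation whose integer->char accepts all Unicode scalar values and that provides an error procedure.

```scheme
;; Hypothetical helper: turn a list of octets into a thunk that
;; returns one octet per call, then #f once the list is exhausted.
(define (list->octet-source octets)
  (lambda ()
    (if (null? octets)
        #f
        (let ((o (car octets)))
          (set! octets (cdr octets))
          o))))

;; Decode one character from NEXT-OCTET, or return #f at end of stream.
(define (utf8-decode-char next-octet)
  ;; Read one continuation octet (10xxxxxx) and return its 6 payload bits.
  (define (continuation)
    (let ((b (next-octet)))
      (if (and b (<= #x80 b #xbf))
          (- b #x80)
          (error "invalid utf8 encoding"))))
  (let ((a (next-octet)))
    (cond ((not a) #f)                      ; end of stream
          ((<= a #x7f) (integer->char a))   ; 0xxxxxxx
          ((<= #xc2 a #xdf)                 ; 110xxxxx + 1 continuation
           (let ((b (continuation)))
             (integer->char (+ (* 64 (- a #xc0)) b))))
          ((<= #xe0 a #xef)                 ; 1110xxxx + 2 continuations
           (let* ((b (continuation))
                  (c (continuation))
                  (n (+ (* 4096 (- a #xe0)) (* 64 b) c)))
             ;; reject overlong forms and surrogate code points
             (if (or (< n #x800) (<= #xd800 n #xdfff))
                 (error "invalid utf8 encoding")
                 (integer->char n))))
          ((<= #xf0 a #xf4)                 ; 11110xxx + 3 continuations
           (let* ((b (continuation))
                  (c (continuation))
                  (d (continuation))
                  (n (+ (* 262144 (- a #xf0)) (* 4096 b) (* 64 c) d)))
             ;; reject overlong forms and values beyond U+10FFFF
             (if (or (< n #x10000) (> n #x10ffff))
                 (error "invalid utf8 encoding")
                 (integer->char n))))
          (else (error "invalid utf8 encoding")))))
```

For example, (utf8-decode-char (list->octet-source '(#xc3 #xa9))) yields the character with code 233 (U+00E9). A port-based read-char would have the same shape, with (read-u8 port) in place of the thunk and the eof object in place of #f.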

The encoding of characters which read-char and write-char use could be

0) implementation defined

1) implementation defined but the same for all file ports

2) specified by R6RS, for example utf8 (and thus the same on all
    R6RS Scheme implementations)

3) optionally specified in the call to open-input-file,
    with-input-from-file, etc
    for example: (open-input-file '(path: "foo" char-encoding: utf8))
    [if not specified it would be like one of the other options]

My preference is for option 3 with a default to option 1, but if that  
is controversial I can accept any of the other options (after all  
even option 0 conforms to R5RS).
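
As a sketch of how option 3 with a default could work, the following shows one way to pull the character encoding out of either a plain path string or a settings list. The names setting-ref and port-char-encoding are hypothetical, the choice of latin1 as the implementation default is only an assumption for the example, and the code assumes that path: and char-encoding: read as symbols (or interned keywords) that can be compared with eq?.

```scheme
;; Hypothetical helper: look up KEY in a flat settings list such as
;; (path: "foo" char-encoding: utf8); return DEFAULT if it is absent.
(define (setting-ref settings key default)
  (cond ((or (null? settings) (null? (cdr settings))) default)
        ((eq? (car settings) key) (cadr settings))
        (else (setting-ref (cddr settings) key default))))

;; Accept either an R5RS-style path string or a settings list and
;; return the character encoding that read-char should use.
(define (port-char-encoding path-or-settings)
  (if (string? path-or-settings)
      'latin1                      ; assumed implementation default
      (setting-ref path-or-settings 'char-encoding: 'latin1)))
```

With these definitions, (port-char-encoding '(path: "foo" char-encoding: utf8)) gives utf8, while (port-char-encoding "foo") and (port-char-encoding '(path: "foo")) both fall back to the assumed default, so existing R5RS calls keep working unchanged.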

What I specifically don't want is extending the set of port creation
procedures with names that indicate the character encoding or the
fact that it is for binary I/O, i.e. open-binary-input-file,
open-utf8-input-file, etc.  This is the wrong way to generalize the port
creation procedures (how would the names be extended to indicate the
end-of-line encoding? the buffering? etc).  In fact if we agree on
option 3, I would suggest adding a "direction" setting and an
"open-file" procedure so that:

    (open-input-file "foo")  = (open-file '(path: "foo" direction: input))
    (open-output-file "foo") = (open-file '(path: "foo" direction: output))

But we should keep the open-input-file and open-output-file  
procedures for backward-compatibility, and because the direction is a  
fundamental setting of a port that you always need to specify.
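
The equivalence above can be sketched on top of the existing R5RS procedures. This is only an illustration: the name my-open-file is a hypothetical stand-in for the proposed open-file (chosen so as not to clash with any built-in), setting-ref is an invented helper, all settings other than path: and direction: are ignored, and an error procedure is assumed to exist.

```scheme
;; Hypothetical helper: look up KEY in a flat settings list;
;; return DEFAULT if it is absent.
(define (setting-ref settings key default)
  (cond ((or (null? settings) (null? (cdr settings))) default)
        ((eq? (car settings) key) (cadr settings))
        (else (setting-ref (cddr settings) key default))))

;; Stand-in for the proposed open-file: dispatch on the direction:
;; setting and delegate to the R5RS port creation procedures.
(define (my-open-file settings)
  (let ((path (setting-ref settings 'path: #f))
        (direction (setting-ref settings 'direction: #f)))
    (cond ((not (string? path))
           (error "open-file: missing path: setting"))
          ((eq? direction 'input) (open-input-file path))
          ((eq? direction 'output) (open-output-file path))
          (else
           (error "open-file: direction: must be input or output")))))
```

Here the direction has no default, reflecting the point that it always needs to be specified; (my-open-file '(path: "foo" direction: input)) then behaves like (open-input-file "foo").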

Marc


