[R6RS] High-level I/O proposal

Marc Feeley
Tue May 24 21:51:33 EDT 2005


This I/O system proposal is a fairly complete subset of Gambit's I/O
system.  Some of the features are rather specialized (e.g. the
object->string procedure and process-ports) and I'm willing to
drop them from the proposal if they are too controversial.

Marc


High-level I/O
==============

1. STREAM API

Here's how I define the stream API, which is central to any I/O system:

1) The stream API is used for accessing sequential data.  The stream is
    first opened, then there is a sequence of read or write operations,
    the end of the stream can be tested (e.g. with eof-object?), and
    finally the stream is closed.

2) Read and write operations will block the process/thread until
    the operation can complete.

3) The stream API can be extended with a "seek" operation to allow
    random access to data.  This only makes sense for some sources of
    data, such as regular files.

4) Bidirectional streams are those that allow both read and write
    operations.  A bidirectional stream can be viewed as the fusion of
    two independent streams, and it may make sense to close one
    direction independently from the other.

5) Streams of one type of data can be used to represent streams of
    another type of data by adopting encoding and decoding procedures.
    For example, the Scheme procedures write and read implement the
    encoding and decoding of a Scheme datum to/from a sequence of
    characters.  Similarly, a stream of characters can be represented
    with a stream of octets by choosing a character encoding
    (e.g. latin1, utf8, utf16le, etc).

Note that there are several different types of data that can be
accessed with the stream API.  Some classical types are:

   - regular files on the filesystem (as opened with stdio's "fopen")
   - TCP sockets (as opened with Unix's "connect")
   - subprocesses (as opened with Unix's "popen")

There are other types of data for which the stream API is appropriate:

   - String ports. A read operation returns the next character.  A
     write operation adds a character to the string of characters
     accumulated by the port.  String ports can be generalized to other
     types of streams: streams of octets ("u8vector" ports) and streams
     of Scheme objects ("vector" ports).  Streams of Scheme objects are
     particularly useful as FIFOs (aka "pipes") to connect client
     threads to a server thread (the elements of the FIFO are requests
     for the server).  A read operation on an empty FIFO will block
     the thread until a client sends a request.

   - Directories (as opened with Unix's "opendir").  A read operation
     returns the next file name.  Using the stream API avoids the
     problem of a "directory->list" type API which would momentarily
     consume a large amount of space for storing the list of file names
     of a very big directory when you simply want to iterate over the
     files in the directory.

   - TCP ports for accepting connections (as with Unix's "accept").
     A read operation accepts the next connection request and
     returns a bidirectional port to interact with the client.
     The read will block if no connection request is queued on
     the IP port number associated with the TCP port.
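
As a sketch of the directory case, the loop below visits the file names
of a directory one at a time through the stream API, so the full list of
names is never built.  The procedure name open-directory is an assumption
borrowed from Gambit (it is not defined in this section), and
for-each-file-name is a hypothetical helper.

```scheme
;; Assumption: (open-directory PATH) returns an object-port whose read
;; operation yields the next file name, then the end-of-file object.
(define (for-each-file-name proc dir)
  (let ((port (open-directory dir)))
    (let loop ()
      (let ((name (read port)))
        (if (eof-object? name)
            (close-port port)          ; done: release the directory
            (begin
              (proc name)              ; handle one name at a time
              (loop)))))))

;; Hypothetical usage: print each file name in the current directory.
(for-each-file-name (lambda (name) (write name) (newline)) ".")
```

Only one file name is live at any moment, which is the space advantage
over a "directory->list" style API.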

2. ENCODING

R5RS ports only support text.  This allows streams of characters and
also streams of Scheme objects (by encoding the Scheme objects with
characters using their external representation).  Unfortunately not
all Scheme objects have an external representation with write/read
invariance, and the encoding of characters is not specified by R5RS.
R6RS should also support binary I/O.

Note that characters can be encoded with sequences of octets (using
any of the standard encodings: latin1, utf8, utf16le, ...), and that
objects can be encoded with sequences of characters (using the R5RS
write and read procedures).  So this suggests that ports can be
organized in an inheritance hierarchy such that operations possible
for a certain class of port are also possible for the subclasses.
Here are the four abstract classes proposed:

   1) An "object-port" (or simply a port) provides operations to read
      and write Scheme data (i.e. any Scheme object) to/from the port.
      It also provides operations to force output to occur, to change
      the way threads block on the port, and to close the port.  Note
      that the class of objects for which write/read invariance is
      guaranteed depends on the particular type of port.

   2) A "character-port" provides all the operations of an object-port,
      and also operations to read and write individual characters to/from
      the port.  When a Scheme object is written to a character-port, it
      is converted into the sequence of characters that corresponds to
      its external-representation.  When reading a Scheme object, an
      inverse conversion occurs.  Note that some Scheme objects do not
      have an external textual representation that can be read back.

   3) An "octet-port" provides all the operations of a character-port,
      and also operations to read and write individual octets to/from
      the port.  When a character is written to an octet-port, some
      encoding of that character into a sequence of octets will occur
      (for example, #\newline will be encoded as the 2 octets CR-LF
      when using latin1 character encoding and cr-lf end-of-line
      encoding, and a non-ASCII character will generate more than 1
      octet when using utf8 character encoding).  When reading a
      character, the corresponding decoding occurs.

   4) A "device-port" provides all the operations of an octet-port, and
      also operations to control the operating system managed device
      (file, network connection, terminal, etc) that is connected to the
      port.

The inheritance hierarchy corresponds to this tree (the classes along
the spine on the left are all abstract port classes):

       object-port____________________________________________
            |     \          \               \                \
            |    vector-port  directory-port  tcp-server-port  ...
            |
      character-port_________
            |       \        \
            |    string-port  ...
            |
        octet-port_____________
            |     \            \
            |    u8vector-port  ...
            |
       device-port___________________________
                  \          \               \
                 file-port   tcp-client-port  ...

So the result of (open-input-file "foo.txt") would be a device-port
attached to the file "foo.txt".  This port would allow file-specific
operations (such as "seek"), binary I/O (such as reading or writing an
octet or group of octets), character I/O (i.e. write-char and
read-char), and Scheme object I/O (i.e. write and read).

On the other hand, the result of (open-input-string "a 123") would be
a character-port allowing character and Scheme object I/O, but not
binary I/O.  Analogously, the result of (open-input-vector '#(a 123))
would be an object-port allowing Scheme object I/O (with complete
write/read invariance), but not character or binary I/O.
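
A small sketch of how this layering might look in practice, using the
two constructors named above.  The results in the comments follow from
the descriptions in this section rather than from a tested
implementation of the proposal.

```scheme
;; A character-port: character I/O and Scheme object I/O both apply.
(define sp (open-input-string "a 123"))
(read-char sp)        ; => #\a   (character-port operation)
(read sp)             ; => 123   (object-port operation, decodes " 123")

;; An object-port: only Scheme object I/O applies.
(define vp (open-input-vector '#(a 123)))
(read vp)             ; => a
(read vp)             ; => 123
(eof-object? (read vp))   ; => #t, the stream is exhausted
```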

3. PORT SETTINGS

Port settings are parameters specified when a port is created that
affect how I/O operations on that port behave (character encoding,
buffering, etc).  Some port settings are only valid for specific port
classes whereas some others are valid for all ports.  Port settings
that are not specified when a port is created will default to some
reasonable values.  Keyword objects are used to name the settings to
be set.  As a simple example, a device-port connected to the file
"foo" can be created using the call

      (open-input-file "foo")

This will use default settings for the character encoding, buffering,
etc.  If the utf8 character encoding is desired, then the port could be
opened using the call

      (open-input-file (list path: "foo" char-encoding: 'utf8))

Here the argument of the procedure open-input-file has been replaced
by a "port settings list" which specifies the value of each port
setting that should not be set to the default value.  Note that some
port settings have no useful default and it is therefore required to
specify a value for them, such as the "path:" in the case of the file
opening procedures.  All port creation procedures (i.e. those named
open-...) take a single argument that can either be a port settings
list or a value of a type that depends on the kind of port being
created (a path string for files, an IP port number for TCP servers,
etc).
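
The following sketch combines several settings in one list and contrasts
the two argument forms.  It is written against the API as proposed here:
the file name "settings-demo.txt" is hypothetical, and the encoding
symbols follow this proposal and may be spelled differently in an actual
implementation.

```scheme
;; Create a small file, overriding two defaults in the settings list.
(define out
  (open-output-file (list path: "settings-demo.txt"
                          char-encoding: 'utf8
                          buffering: 'line)))
(display "hello\n" out)
(close-port out)

;; A bare path string is shorthand for a settings list that only
;; specifies path:, so every other setting keeps its default.
(define p1 (open-input-file "settings-demo.txt"))
(define p2 (open-input-file (list path: "settings-demo.txt"
                                  char-encoding: 'utf8)))
(read-line p1)        ; => "hello"
(read-line p2)        ; => "hello"
(close-port p1)
(close-port p2)
```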

4. OBJECT-PORTS

4.1 Object-port settings

The following is a list of port settings that are valid for all types
of ports.

    * direction: ( input | output | input-output )

      This setting controls the direction of the port.  The symbol
      input indicates a unidirectional input-port, the symbol output
      indicates a unidirectional output-port, and the symbol
      input-output indicates a bidirectional port.  The default value
      of this setting depends on the port creation procedure.

    * buffering: ( #f | #t | line )

      This setting controls the buffering of the port.  To set each
      direction separately the keywords input-buffering: and
      output-buffering: must be used instead of buffering:.  The
      value #f selects unbuffered I/O, the value #t selects fully
      buffered I/O, and the symbol line selects line buffered I/O (the
      output buffer is drained when a #\newline character is written).
      Line buffered I/O only applies to character-ports.  The default
      value of this setting depends on the port creation procedure.

4.2 Object-port operations

  - [procedure] (input-port? OBJ)
  - [procedure] (output-port? OBJ)
  - [procedure] (port? OBJ)

      The procedure input-port? returns #t when OBJ is a
      unidirectional input-port or a bidirectional port and #f
      otherwise.

      The procedure output-port? returns #t when OBJ is a
      unidirectional output-port or a bidirectional port and #f
      otherwise.

      The procedure port? returns #t when OBJ is a port (either
      unidirectional or bidirectional) and #f otherwise.

  - [procedure] (read [PORT])

      This procedure reads and returns the next Scheme object from the
      input-port PORT.  The end-of-file object is returned when the end
      of the stream is reached.  If it is not specified, PORT defaults
      to the current input-port.

  - [procedure] (read-all [PORT [READER]])

      This procedure repeatedly calls the procedure READER with PORT as
      the sole argument and accumulates a list of each value returned up
      to the end-of-file object.  The procedure read-all returns the
      accumulated list without the end-of-file object.  If it is not
      specified, PORT defaults to the current input-port.  If it is not
      specified, READER defaults to the procedure read.

      For example:

           > (call-with-input-string "3,2,1\ngo!" read-all)
           (3 ,2 ,1 go!)
           > (call-with-input-string "3,2,1\ngo!"
                                     (lambda (p) (read-all p read-char)))
           (#\3 #\, #\2 #\, #\1 #\newline #\g #\o #\!)
           > (call-with-input-string "3,2,1\ngo!"
                                     (lambda (p) (read-all p read-line)))
           ("3,2,1" "go!")

  - [procedure] (write OBJ [PORT])

      This procedure writes the Scheme object OBJ to the output-port PORT
      and the value returned is unspecified.  If it is not specified,
      PORT defaults to the current output-port.

  - [procedure] (newline [PORT])

      This procedure writes an "object separator" to the output-port
      PORT and the value returned is unspecified.  The separator ensures
      that the next Scheme object written with the write procedure will
      not be confused with the latest object that was written.  On
      character-ports this is done by writing the character #\newline.
      On ports where successive objects are implicitly distinct (such
      as "vector ports") this procedure does nothing.

      Regardless of the class of a port P and assuming that the external
      textual representation of the object X is readable, the expression
      (begin (write X P) (newline P)) will write to P a representation
      of X that can be read back with the procedure read.  If it is
      not specified, PORT defaults to the current output-port.
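
      As a sketch of why the separator matters, the hypothetical helper
      round-trip below writes each object followed by (newline p) and
      then reads everything back.  Without the separator, 12 and 34
      would run together as 1234 on a character-port.  The call to
      call-with-output-string follows the form used elsewhere in this
      proposal.

```scheme
;; Write each object followed by an object separator, then read the
;; whole sequence back with read-all.
(define (round-trip objs)
  (call-with-input-string
    (call-with-output-string
      '()
      (lambda (p)
        (for-each (lambda (x) (write x p) (newline p)) objs)))
    read-all))

(round-trip '(12 34 foo "bar"))   ; => (12 34 foo "bar")
```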

  - [procedure] (force-output [PORT])

      The procedure force-output causes the output buffers of the
      output-port PORT to be drained (i.e. the data is sent to its
      destination).  If PORT is not specified, the current output-port
      is used.

      For example:

           > (define p (open-tcp-client
                         (list server-address: "www.iro.umontreal.ca"
                               port-number: 80)))
           > (display "GET /\n" p)
           > (force-output p)
           > (read-line p)
           "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\""

  - [procedure] (close-input-port PORT)
  - [procedure] (close-output-port PORT)
  - [procedure] (close-port PORT)

      The PORT argument of these procedures must be a unidirectional or
      a bidirectional port.  For all three procedures the value returned
      is unspecified.

      The procedure close-input-port closes the input-port side of
      PORT, which must not be a unidirectional output-port.

      The procedure close-output-port closes the output-port side of
      PORT, which must not be a unidirectional input-port.  The output
      buffers are drained before PORT is closed.

      The procedure close-port closes all sides of the PORT.  Unless
      PORT is a unidirectional input-port, the output buffers are
      drained before PORT is closed.

      For example:

           > (define p (open-tcp-client
                         (list server-address: "www.iro.umontreal.ca"
                               port-number: 80)))
           > (display "GET /\n" p)
           > (close-output-port p)
           > (read-line p)
           "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\""

  - [procedure] (input-port-timeout-set! PORT TIMEOUT [THUNK])
  - [procedure] (output-port-timeout-set! PORT TIMEOUT [THUNK])

      When a thread tries to perform an I/O operation on a port, the
      requested operation may not be immediately possible and the thread
      must wait.  For example, the thread may be trying to read a line of
      text from the console and the user has not typed anything yet, or
      the thread may be trying to write to a network connection faster
      than the network can handle.  In such situations the thread
      normally blocks until the operation becomes possible.

      It is sometimes necessary to guarantee that the thread will not
      block too long or not at all.  For this purpose, to each
      input-port and output-port is attached a "timeout" and
      "timeout-thunk".  The timeout indicates the point in time beyond
      which the thread should stop waiting on an input and output
      operation respectively.  When the timeout is reached, the thread
      calls the port's timeout-thunk.  If the timeout-thunk returns #f
      the thread abandons trying to perform the operation (in the case
      of an input operation an end-of-file is read and in the case of
      an output operation an exception is raised).  Otherwise, the
      thread will block again waiting for the operation to become
      possible (note that if the port's timeout has not changed the
      thread will immediately call the timeout-thunk again if the
      operation is still not possible).

      The procedure input-port-timeout-set! sets the timeout of the
      input-port PORT to TIMEOUT and the timeout-thunk to THUNK.  The
      procedure output-port-timeout-set! sets the timeout of the
      output-port PORT to TIMEOUT and the timeout-thunk to THUNK.  If it
      is not specified, the THUNK defaults to a thunk that returns #f.
      The TIMEOUT is either a time object indicating an absolute point
      in time (see SRFI 18), or it is a real number which indicates the
      number of seconds relative to the moment the procedure is called.
      For both procedures the value returned is unspecified.

      When a port is created the timeout is set to infinity (+inf.).
      This causes the thread to wait as long as needed for the operation
      to become possible.  Setting the timeout to a point in the past
      (-inf.) will cause the thread to attempt the I/O operation and
      never block (i.e. the timeout-thunk is called if the operation is
      not immediately possible).

      The following example shows how to cause the REPL to terminate
      when the user does not enter an expression within the next 60
      seconds.

           > (input-port-timeout-set! (repl-input-port) 60)
           >
           *** EOF again to exit

5. CHARACTER-PORTS

5.1 Character-port settings

The following is a list of port settings that are valid for
character-ports.

    * output-width: POSITIVE-INTEGER

      This setting indicates the width of the character output-port in
      number of characters.  This information could be used by a
      pretty-printing procedure.  The default value of this setting is
      80.

    [[
    If R6RS is extended in the future to support configuring the
    reader and writer with "readtable" objects, a setting for readtable
    could be added here:

    * readtable: READTABLE

      This setting determines the readtable attached to the
      character-port.  To set each direction separately the keywords
      input-readtable: and output-readtable: must be used instead of
      readtable:.  Readtables control the external textual
      representation of Scheme objects, that is the encoding of Scheme
      objects using characters.  The behavior of the read procedure
      depends on the port's input-readtable and the behavior of the
      procedures write, pretty-print, and related procedures is
      affected by the port's output-readtable.  The default value of this
      setting is the value bound to the parameter object
      current-readtable.
    ]]

5.2 Character-port operations

  - [procedure] (input-port-line PORT)
  - [procedure] (input-port-column PORT)
  - [procedure] (output-port-line PORT)
  - [procedure] (output-port-column PORT)

      The current character location of a character input-port is the
      location of the next character to read.  The current character
      location of a character output-port is the location of the next
      character to write.  Location is denoted by a line number (the
      first line is line 1) and a column number, that is the location on
      the current line (the first column is column 1).  The procedures
      input-port-line and input-port-column return the line location
      and the column location respectively of the character input-port
      PORT.  The procedures output-port-line and output-port-column
      return the line location and the column location respectively of
      the character output-port PORT.

      For example:

           > (call-with-output-string
               '()
               (lambda (p)
                 (display "abc\n123def" p)
                 (write (list (output-port-line p) (output-port-column p))
                        p)))
           "abc\n123def(2 7)"

  - [procedure] (output-port-width PORT)

      This procedure returns the width, in characters, of the character
      output-port PORT.  The value returned is the port's output-width
      setting.

      For example:

           > (output-port-width (current-output-port))
           80

  - [procedure] (read-char [PORT])

      This procedure reads the character input-port PORT and returns the
      character at the current character location and advances the
      current character location to the next character, unless the PORT
      is already at end-of-file in which case read-char returns the
      end-of-file object.  If it is not specified, PORT defaults to the
      current input-port.

  - [procedure] (peek-char [PORT])

      This procedure returns the same result as read-char but it does
      not advance the current character location of the input-port PORT.
      If it is not specified, PORT defaults to the current input-port.

  - [procedure] (write-char CHAR [PORT])

      This procedure writes the character CHAR to the character
      output-port PORT and advances the current character location of
      that output-port.  The value returned is unspecified.  If it is not
      specified, PORT defaults to the current output-port.

  - [procedure] (read-line [PORT [SEPARATOR [INCLUDE-SEPARATOR?]]])

      This procedure reads characters from the character input-port PORT
      until a specific SEPARATOR or the end-of-file is encountered and
      returns a string containing the sequence of characters read.  The
      SEPARATOR is included at the end of the string only if it was the
      last character read and INCLUDE-SEPARATOR? is not #f.  The
      SEPARATOR must be a character or #f (in which case all the
      characters until the end-of-file are read).  If it is not
      specified, PORT defaults to the current input-port.  If it is not
      specified, SEPARATOR defaults to #\newline.  If it is not
      specified, INCLUDE-SEPARATOR? defaults to #f.

      For example:

           > (define (split sep)
               (lambda (str)
                 (call-with-input-string
                   str
                   (lambda (p)
                     (read-all p (lambda (p) (read-line p sep)))))))
           > ((split #\,) "a,b,c")
           ("a" "b" "c")
           > (call-with-input-string "1,2,3\n4,5"
                                     (lambda (p)
                                       (map (split #\,)
                                            (read-all p read-line))))
           (("1" "2" "3") ("4" "5"))

  - [procedure] (read-substring STRING START END [PORT])
  - [procedure] (write-substring STRING START END [PORT])

      These procedures support bulk character I/O.  The part of the
      string STRING starting at index START and ending just before
      index END is used as a character buffer that will be the target
      of read-substring or the source of the write-substring.  Up to
      END-START characters will be transferred.  The number of
      characters transferred, possibly zero, is returned by these
      procedures.  Fewer characters will be read by read-substring if
      an end-of-file is read, or a timeout occurs before all the
      requested characters are transferred and the timeout thunk
      returns #f (see the procedure input-port-timeout-set!).  Fewer
      characters will be written by write-substring if a timeout occurs
      before all the requested characters are transferred and the
      timeout thunk returns #f (see the procedure
      output-port-timeout-set!).  If it is not specified, PORT defaults
      to the current input-port and current output-port respectively.

      For example:

           > (define s (make-string 10 #\x))
           > (read-substring s 2 5)123456789
           3
           > 456789
           > s
           "xx123xxxxx"

6. OCTET-PORTS

6.1 Octet-port settings

The following is a list of port settings that are valid for
octet-ports.

    * char-encoding: ENCODING

      This setting controls the character encoding of the octet-port.
      For bidirectional octet-ports, the setting applies to both input
      and output.  To set each direction separately the keywords
      input-char-encoding: and output-char-encoding: must be used
      instead of char-encoding:.  The default value of this setting is
      operating system dependent.  The following encodings are
      supported:

     latin1
           LATIN1 character encoding.  Each character is encoded by a
           single octet.  Only Unicode characters with a code in the
           range 0 to 255 are allowed.

     utf8
           UTF8 character encoding.  Each character is encoded by a
           sequence of one to four octets.

     utf16
           Each character is encoded by 16 or 32 bits, i.e. two or four
           octets.  Each 16 bit chunk may be encoded using little-endian
           encoding or big-endian encoding.  If the port is an
           input-port and the first two octets read are a BOM ("Byte
           Order Mark" character with hexadecimal code FEFF) then the
           BOM will be discarded and the endianness will be set
           accordingly, otherwise the endianness depends on the
           operating system.  If the port is an output-port then a BOM
           will be output at the beginning of the stream and the
           endianness depends on the operating system.

     utf16le
           UTF16 character encoding with little-endian endianness.
           Each character is encoded by 16 or 32 bits, i.e. two or four
           octets.  No BOM processing is done.

     utf16be
           UTF16 character encoding with big-endian endianness.
           Each character is encoded by 16 or 32 bits, i.e. two or four
           octets.  No BOM processing is done.

      Other encodings could be added, such as: ascii, ucs2, ucs2le,
      ucs2be, ucs4, ucs4le, ucs4be, native, and ebcdic.

    * eol-encoding: ENCODING

      This setting controls the end-of-line encoding of the octet-port.
      To set each direction separately the keywords input-eol-encoding:
      and output-eol-encoding: must be used instead of eol-encoding:.
      The default value of this setting is operating system dependent.
      Note that for output-ports the end-of-line encoding is applied
      before the character encoding, and for input-ports it is applied
      after.  The following encodings are supported:

     lf    For an output-port, writing a #\newline character outputs a
           #\linefeed character to the stream (Unicode character code
           10).  For an input-port, a #\newline character is read when
           a #\linefeed character is encountered on the stream.  Note
           that #\linefeed and #\newline are two names for the same
           character, so this end-of-line encoding is actually the
           identity function.  Text files created by UNIX applications
           typically use this end-of-line encoding.

     cr    For an output-port, writing a #\newline character outputs a
           #\return character to the stream (Unicode character code
           13).  For an input-port, a #\newline character is read when
           a #\linefeed character or a #\return character is
           encountered on the stream.  Text files created by Classic Mac
           OS applications typically use this end-of-line encoding.

     cr-lf For an output-port, writing a #\newline character outputs to
           the stream a #\return character followed by a #\linefeed
           character.  For an input-port, a #\newline character is read
           when a #\linefeed character or a #\return character is
           encountered on the stream.  Moreover, if this character is
           immediately followed by the opposite character (#\linefeed
           followed by #\return or #\return followed by #\linefeed)
           then the second character is ignored.  In other words, all
           four possible end-of-line encodings are read as a single
           #\newline character.  Text files created by DOS and
           Microsoft Windows applications typically use this
           end-of-line encoding.
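
The input side of the cr-lf rule can be modelled on a plain list of
characters.  This is only a sketch of the rule as stated above, not the
way a port would implement it; decode-cr-lf is a hypothetical helper,
and #\newline is used for the linefeed character since the two names
denote the same character.

```scheme
;; A CR or LF reads as one #\newline, and an immediately following
;; opposite character (LF after CR, or CR after LF) is ignored.
(define (decode-cr-lf chars)
  (let loop ((cs chars) (acc '()))
    (cond ((null? cs)
           (reverse acc))
          ((or (char=? (car cs) #\return)
               (char=? (car cs) #\newline))
           (let* ((c (car cs))
                  (opposite (if (char=? c #\return) #\newline #\return))
                  (tail (cdr cs)))
             (loop (if (and (pair? tail) (char=? (car tail) opposite))
                       (cdr tail)      ; skip the second half of the pair
                       tail)
                   (cons #\newline acc))))
          (else
           (loop (cdr cs) (cons (car cs) acc))))))

(decode-cr-lf (list #\a #\return #\newline #\b #\newline #\return #\c))
;; => (#\a #\newline #\b #\newline #\c)
```

Two identical characters in a row (for example CR CR) are not a pair
under this rule, so they decode to two #\newline characters.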

6.2 Octet-port operations

  When using a buffered octet-port, the read-u8 and read-subu8vector
  procedures specified in this section must be called before any use of
  the port in a character input operation (i.e. a call to the
  procedures read, read-char, peek-char, etc) because otherwise the
  character-stream and octet-stream may be out of sync due to the port
  buffering.
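
  A sketch of the safe ordering: the octets of a binary header are read
  first, and only then does the code switch to character input.  The
  init: setting, which supplies the octets of a u8vector port, is an
  assumption borrowed from Gambit and is not defined in this section.
  The result shown is what the rules above imply.

```scheme
;; Octets 2 and 0 form a hypothetical version header; 104 and 105 are
;; the latin1 codes of the characters h and i.
(call-with-input-u8vector
  (list init: '#u8(2 0 104 105)
        char-encoding: 'latin1)
  (lambda (p)
    (let* ((major (read-u8 p))      ; binary input first ...
           (minor (read-u8 p))
           (text  (read-line p)))   ; ... then character input
      (list major minor text))))
;; => (2 0 "hi")
```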

  - [procedure] (read-u8 [PORT])

      This procedure reads the octet input-port PORT and returns the
      octet at the current octet location and advances the current octet
      location to the next octet, unless the PORT is already at
      end-of-file in which case read-u8 returns the end-of-file
      object.  If it is not specified, PORT defaults to the current
      input-port.

      For example:

           > (call-with-input-u8vector
               '#u8(11 22 33 44)
               (lambda (p)
                 (let ((a (read-u8 p))) (list a (read-u8 p)))))
           (11 22)
           > (call-with-input-u8vector '#u8() read-u8)
           #!eof

  - [procedure] (write-u8 N [PORT])

      This procedure writes the octet N to the octet output-port PORT and
      advances the current octet location of that output-port.  The value
      returned is unspecified.  If it is not specified, PORT defaults to
      the current output-port.

      For example:

           > (call-with-output-u8vector '() (lambda (p) (write-u8 33 p)))
           #u8(33)

  - [procedure] (read-subu8vector U8VECTOR START END [PORT])
  - [procedure] (write-subu8vector U8VECTOR START END [PORT])

      These procedures support bulk binary I/O.  The part of the u8vector
      U8VECTOR starting at index START and ending just before index END
      is used as an octet buffer that will be the target of
      read-subu8vector or the source of the write-subu8vector.  Up
      to END-START octets will be transferred.  The number of octets
      transferred, possibly zero, is returned by these procedures.
      Fewer octets will be read by read-subu8vector if an end-of-file
      is read, or a timeout occurs before all the requested octets are
      transferred and the timeout thunk returns #f (see the procedure
      input-port-timeout-set!).  Fewer octets will be written by
      write-subu8vector if a timeout occurs before all the requested
      octets are transferred and the timeout thunk returns #f (see the
      procedure output-port-timeout-set!).  If it is not specified,
      PORT defaults to the current input-port and current output-port
      respectively.

      For example, assuming the console is using latin1 character
      encoding:

           > (define v (make-u8vector 10))
           > (read-subu8vector v 2 5)123456789
           3
           > 456789
           > v
           #u8(0 0 49 50 51 0 0 0 0 0)

7. DEVICE-PORTS

7.1 Filesystem devices

  - [procedure] (open-file PATH-OR-SETTINGS)
  - [procedure] (open-input-file PATH-OR-SETTINGS)
  - [procedure] (open-output-file PATH-OR-SETTINGS)
  - [procedure] (call-with-input-file PATH-OR-SETTINGS PROC)
  - [procedure] (call-with-output-file PATH-OR-SETTINGS PROC)
  - [procedure] (with-input-from-file PATH-OR-SETTINGS THUNK)
  - [procedure] (with-output-to-file PATH-OR-SETTINGS THUNK)

      All of these procedures create a port to interface to an
      octet-stream device (such as a file, console, serial port, named
      pipe, etc)
      whose name is given by a path of the filesystem.  The direction:
      setting will default to the value input for the procedures
      open-input-file, call-with-input-file and
      with-input-from-file, to the value output for the procedures
      open-output-file, call-with-output-file and
      with-output-to-file, and to the value input-output for the
      procedure open-file.  The procedures open-file,
      open-input-file and open-output-file return the port that is
      created.  The procedures call-with-input-file and
      call-with-output-file call the procedure PROC with the port as
      single argument, and then return the value(s) of this call after
      closing the port.  The procedures with-input-from-file and
      with-output-to-file dynamically bind the current input-port and
      current output-port respectively to the port created for the
      duration of a call to the procedure THUNK with no argument.  The
      value(s) of the call to THUNK are returned after closing the port.

      The first argument of these procedures is either a string denoting
      a filesystem path or a list of port settings which must contain a
      path: setting.  Here are the settings allowed in addition to the
      generic settings of octet-ports:

         * path: STRING

           This setting indicates the location of the file in the
           filesystem.  There is no default value for this setting.

         * append: ( #f | #t )

           This setting controls whether output will be added to the end
           of the file.  This is useful for writing to log files that
           might be open by more than one process.  The default value of
           this setting is #f.

         * create: ( #f | #t | maybe )

           This setting controls whether the file will be created when
           it is opened.  A setting of #f requires that the file exist
           (otherwise an exception is raised).  A setting of #t
           requires that the file does not exist (otherwise an exception
           is raised).  A setting of maybe will create the file if it
           does not exist.  The default value of this setting is maybe
           for output-ports and #f for input-ports and bidirectional
           ports.

         * permissions: 12-BIT-EXACT-INTEGER

           This setting controls the UNIX permissions that will be
           attached to the file if it is created.  The default value of
           this setting is #o666.

         * truncate: ( #f | #t )

           This setting controls whether the file will be truncated when
           it is opened.  For input-ports, the default value of this
           setting is #f.  For output-ports, the default value of this
           setting is #t when the append: setting is #f, and #f
           otherwise.

      For example:

           > (with-output-to-file
               (list path: "nofile"
                     create: #f)
               (lambda ()
                 (display "hello world!\n")))
           *** ERROR IN (console)@1.1 -- No such file or directory
           (with-output-to-file '(path: "nofile" create: #f) '#<procedure #2>)
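
      As a sketch of how the defaults of the append:, create: and
      truncate: settings interact, consider the following sequence.
      It is only an illustration under the defaults described above;
      the file name "demo.log" and the variable demo-lines are
      arbitrary choices, and read-line is assumed to behave as in the
      other examples of this proposal.

```scheme
;; First open: create: defaults to maybe and truncate: defaults to #t
;; (because append: is #f), so any previous content is discarded.
(call-with-output-file "demo.log"
  (lambda (port)
    (display "first\n" port)))

;; Second open: append: #t makes truncate: default to #f, so the
;; new line is added after the existing one.
(call-with-output-file (list path: "demo.log" append: #t)
  (lambda (port)
    (display "second\n" port)))

;; Read both lines back.  create: defaults to #f for input-ports,
;; so this would raise an exception if the file were missing.
;; let* is used to force the order of the two reads.
(define demo-lines
  (call-with-input-file "demo.log"
    (lambda (port)
      (let* ((a (read-line port))
             (b (read-line port)))
        (list a b)))))
```

      Under these defaults demo-lines should be the list
      ("first" "second"), regardless of what "demo.log" contained
      beforehand.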

  - [procedure] (input-port-u8-position PORT [POSITION [WHENCE]])
  - [procedure] (output-port-u8-position PORT [POSITION [WHENCE]])

      When called with a single argument these procedures return the
      octet position where the next I/O operation would take place in
      the file attached to the given PORT (relative to the beginning of
      the file).  When called with two or three arguments, the octet
      position for subsequent I/O operations on the given PORT is
      changed to POSITION, which must be an exact integer.  When WHENCE
      is omitted or is the symbol "start", the POSITION is relative to
      the beginning of the file.  When WHENCE is the symbol "current",
      the POSITION is relative to the current octet position of the
      file.  When WHENCE is the symbol "end", the POSITION is relative
      to the end of the file.  The return value is the new octet
      position.  On most operating systems, the read and write octet
      positions of a given bidirectional port are the same.

      When input-port-u8-position is called to change the octet
      position of an input-port, all input buffers will be flushed so
      that the next octet read will be the one at the given position.

      When output-port-u8-position is called to change the octet
      position of an output-port, there is an implicit call to
      force-output before the position is changed.

      For example:

           > (define p  ; p is an input-output-port
               (open-file '(path: "test" char-encoding: latin1 create: maybe)))
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (0 0)
           > (display "abcdefghij\n" p)
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (0 0)
           > (force-output p)
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (11 11)
           > (input-port-u8-position p 2)
           2
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (2 2)
           > (peek-char p)
           #\c
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (11 11)
           > (output-port-u8-position p -7 'end)
           4
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (4 4)
           > (write-char #\! p)
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (4 4)
           > (force-output p)
           > (list (input-port-u8-position p) (output-port-u8-position p))
           (5 5)
           > (input-port-u8-position p 1)
           1
           > (read p)
           bcd!fghij

7.2 Process devices

  - [procedure] (open-process PATH-OR-SETTINGS)

      This procedure starts a new operating system process and returns
      a port that allows communication with that process on its
      standard input and standard output.  The default value of the
      direction: setting is input-output, i.e. the Scheme program can
      write to the process' standard input and can read from the
      process' standard output.

      The first argument of this procedure is either a string denoting a
      filesystem path of an executable program or a list of port settings
      which must contain a path: setting.  Here are the settings
      allowed in addition to the generic settings of octet-ports:

         * path: STRING

           This setting indicates the location of the executable program
           in the filesystem.  There is no default value for this
           setting.

         * arguments: LIST-OF-STRINGS

           This setting indicates the string arguments that are passed
           to the program.  The default value of this setting is the
           empty list (i.e. no arguments).

         * environment: LIST-OF-STRINGS

           This setting indicates the set of environment variable
           bindings that the process receives.  Each element of the list
           is a string of the form "VAR=VALUE", where VAR is the
           name of the variable and VALUE is its binding.  If
           LIST-OF-STRINGS is #f, the process inherits the environment
           variable bindings of the Scheme program.  The default value
           of this setting is #f.

         * stderr-redirection: ( #f | #t )

           This setting indicates how the standard error of the process
           is redirected.  A setting of #t will redirect the standard
           error to the standard output (i.e. all output to standard
           error can be read from the process-port).  A setting of #f
           will leave the standard error as-is, which typically results
           in error messages being output to the console.  The default
           value of this setting is #f.

         * pseudo-terminal: ( #f | #t )

           This setting indicates what type of device will be bound to
           the process' standard input and standard output.  A setting
           of #t will use a pseudo-terminal device (this is a device
           that behaves like a tty device even though there is no real
           terminal or user directly involved).  A setting of #f will
           use a pair of pipes.  The difference is important for
           programs which behave differently when they are used
           interactively, for example shells.  The default value of this
           setting is #f.

      For example:

           > (define p (open-process (list path: "/bin/ls"
                                           arguments: '("../examples"))))
           > (read-line p)
           "complex"
           > (read-line p)
           "README"
           > (close-port p)
           > (define p (open-process "/usr/bin/dc"))
           > (display "2 100 ^ p\n" p)
           > (force-output p)
           > (read-line p)
           "1267650600228229401496703205376"
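
      The settings can be combined to capture everything a program
      prints.  The following is a minimal sketch, not part of the
      proposal: run-and-collect is a hypothetical helper, it assumes
      that read-line returns the end-of-file object once the process
      has closed its standard output, and it assumes that the
      output-side of a process-port can be closed on its own with
      close-output-port.

```scheme
;; Run the program at PATH with the argument strings ARGS and return
;; the lines it prints (standard output and standard error together).
(define (run-and-collect path args)
  (let ((p (open-process (list path: path
                               arguments: args
                               stderr-redirection: #t))))
    (close-output-port p)               ; nothing is sent to the process
    (let loop ((lines '()))
      (let ((line (read-line p)))
        (if (eof-object? line)
            (begin
              (close-port p)
              (reverse lines))
            (loop (cons line lines)))))))
```

      On a UNIX system, (run-and-collect "/bin/echo" '("hello")) would
      be expected to return the list ("hello").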

7.3 Network devices

  - [procedure] (open-tcp-client SETTINGS)

      This procedure opens a network connection to a TCP/IP server and
      returns a tcp-client-port (a subtype of device-port) that
      represents this connection and allows communication with that
      server.  The default value of the direction: setting is
      input-output, i.e. the Scheme program can send information to
      the server and receive information from the server.  The sending
      direction can be "shut down" using the close-output-port
      procedure and the receiving direction can be "shut down" using the
      close-input-port procedure.  The close-port procedure closes
      both directions of the connection.

      The first argument of this procedure is a list of port settings
      which must contain a server-address: setting and a
      port-number: setting.  Here are the settings allowed in addition
      to the generic settings of octet-ports:

         * server-address: STRING-OR-U8VECTOR

           This setting indicates the internet address of the server.
           It can be a string denoting a host name, which will be
           translated to an IP address by the host-info procedure, or
           a 4 or 16 element u8vector which contains the 32-bit IPv4 or
           128-bit IPv6 address respectively.  There is no default value
           for this setting.

         * port-number: 16-BIT-EXACT-INTEGER

           This setting indicates the IP port-number of the server to
           connect to (e.g. 80 for the standard HTTP server, 23 for the
           standard telnet server).  There is no default value for this
           setting.

         * keep-alive: ( #f | #t )

           This setting controls the use of the "keep alive" option on
           the connection.  The "keep alive" option will periodically
           send control packets on otherwise idle network connections to
           ensure that the server host is active and reachable.  The
           default value of this setting is #f.

         * coalesce: ( #f | #t )

           This setting controls the use of TCP's "Nagle algorithm" which
           reduces the number of small packets by delaying their
           transmission and coalescing them into larger packets.  A
           setting of #t will coalesce small packets into larger ones.
           A setting of #f will transmit packets as soon as possible.
           The default value of this setting is #f.  Note that this
           setting does not affect the buffering of the port.

      Here is an example of the client-side code that opens a connection
      to an HTTP server on port 8080 on the same computer (for the
      server-side code see the example for the procedure
      open-tcp-server):

           > (define p (open-tcp-client
                         (list server-address: '#u8(127 0 0 1)
                               port-number: 8080
                               eol-encoding: 'cr-lf)))
           > p
           #<input-output-port #2 (tcp-client #u8(127 0 0 1) 8080)>
           > (display "GET / HTTP/1.1\n" p)
           > (force-output p)
           > (read-line p)
           "<HTML>"

  - [procedure] (open-tcp-server PORT-NUMBER-OR-SETTINGS)

      This procedure sets up a socket to accept network connection
      requests from clients and returns a tcp-server-port from which
      network connections to clients are obtained.  Tcp-server-ports are
      a direct subtype of object-ports (i.e. they are not
      character-ports) and are input-ports.  Reading from a
      tcp-server-port with the read procedure will block until a
      network connection request is received from a client.  The read
      procedure will then return a tcp-client-port (a subtype of
      device-port) that represents this connection and allows
      communication with that client.  Closing a tcp-server-port with
      either the close-input-port or close-port procedures will
      cause the network subsystem to stop accepting connections on that
      socket.

      The first argument of this procedure is an IP port-number (16-bit
      nonnegative exact integer) or a list of port settings which must
      contain a port-number: setting.  Below is a list of the settings
      allowed in addition to the settings keep-alive: and coalesce:
      allowed by the open-tcp-client procedure and the generic
      settings of octet-ports.  The settings which are not listed below
      apply to the tcp-client-port that is returned by read when a
      connection is accepted and have the same meaning as if they were
      used in a call to the open-tcp-client procedure.

         * port-number: 16-BIT-EXACT-INTEGER

           This setting indicates the IP port-number assigned to the
           socket which accepts connection requests from clients.
           There is no default value for this setting.

         * backlog: POSITIVE-EXACT-INTEGER

           This setting indicates the maximum number of connection
           requests that can be waiting to be accepted by a call to
           read (technically it is the value passed as the second
           argument of the UNIX listen() function).  The default value
           of this setting is 128.

         * reuse-address: ( #f | #t )

           This setting controls whether it is possible to assign a
           port-number that is currently active.  Note that when a
           server process terminates, the socket it was using to accept
           connection requests does not become inactive immediately.
           Instead it remains active for a few minutes to ensure clean
           termination of the connections.  A setting of #f will cause
           an exception to be raised in that case.  A setting of #t
           will allow a port-number to be used even if it is active.
           The default value of this setting is #t.

      Here is an example of the server-side code that accepts
      connections on port 8080 (for the client-side code see the example
      for the procedure open-tcp-client):

           > (define s (open-tcp-server (list port-number: 8080
                                              eol-encoding: 'cr-lf)))
           > (define p (read s))  ; blocks until client connects
           > p
           #<input-output-port #2 (tcp-client 8080)>
           > (read-line p)
           "GET / HTTP/1.1"
           > (display "<HTML>\n" p)
           > (force-output p)
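
      Because read on a tcp-server-port blocks until a client
      connects, a server normally accepts connections in a loop and
      hands each one to its own thread.  Here is a hedged sketch of
      that pattern.  The names serve-forever and echo-one-line are
      hypothetical; thread-start! and make-thread are assumed to be
      available as in the open-vector-pipe example.

```scheme
;; Accept connections on PORT-NUMBER forever, calling HANDLER with
;; each tcp-client-port in a fresh thread.
(define (serve-forever port-number handler)
  (let ((s (open-tcp-server (list port-number: port-number
                                  reuse-address: #t))))
    (let loop ()
      (let ((connection (read s)))      ; blocks until a client connects
        (thread-start!
          (make-thread
            (lambda ()
              (handler connection)
              (close-port connection))))
        (loop)))))

;; Example handler: send back the first line received.
(define (echo-one-line connection)
  (let ((line (read-line connection)))
    (if (not (eof-object? line))
        (begin
          (display line connection)
          (newline connection)
          (force-output connection)))))

;; (serve-forever 8080 echo-one-line)   ; never returns
```

      Closing the connection in the thread rather than in the accept
      loop keeps a slow client from delaying the next call to read.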

7.4 Directory-ports

  - [procedure] (open-directory PATH-OR-SETTINGS)

      This procedure opens a directory of the filesystem for reading its
      entries and returns a directory-port from which the entries can be
      enumerated.  Directory-ports are a direct subtype of object-ports
      (i.e. they are not character-ports) and are input-ports.  Reading
      from a directory-port with the read procedure returns the next
      file name in the directory as a string.  The end-of-file object is
      returned when all the file names have been enumerated.  Another
      way to get the list of all files in a directory is the
      directory-files procedure which returns a list of the files in
      the directory.  The advantage of using directory-ports is that it
      allows iterating over the files in a directory in constant space,
      which is interesting when the number of files in the directory is
      not known in advance and may be large.  Note that the order in
      which the names are returned is operating-system dependent.

      The first argument of this procedure is either a string denoting a
      filesystem path to a directory or a list of port settings which
      must contain a path: setting.  Here are the settings allowed in
      addition to the generic settings of object-ports:

         * path: STRING

           This setting indicates the location of the directory in the
           filesystem.  There is no default value for this setting.

         * ignore-hidden: ( #f | #t | dot-and-dot-dot )

           This setting controls whether hidden-files will be returned.
           Under UNIX and Mac OS X hidden-files are those that start
           with a period (such as ., .., and .profile).  Under
           Microsoft Windows hidden files are the . and .. entries
           and the files whose "hidden file" attribute is set.  A
           setting of #f will enumerate all the files.  A setting of
           #t will only enumerate the files that are not hidden.  A
           setting of dot-and-dot-dot will enumerate all the files
           except for the . and .. hidden files.  The default value
           of this setting is #t.

      For example:

           > (let ((p (open-directory (list path: "../examples"
                                            ignore-hidden: #f))))
               (let loop ()
                 (let ((fn (read p)))
                   (if (string? fn)
                       (begin
                         (write fn)
                         (newline)
                         (loop)))))
               (close-input-port p))
           "."
           ".."
           "complex"
           "README"
           "simple"
           > (define x (open-directory "../examples"))
           > (read-all x)
           ("complex" "README" "simple")
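
      The constant-space property mentioned above can be illustrated
      with a small sketch.  The procedure count-entries is a
      hypothetical helper, not part of the proposal; it never holds
      more than one file name at a time.

```scheme
;; Count the entries of the directory at PATH without building a list.
(define (count-entries path)
  (let ((p (open-directory path)))
    (let loop ((n 0))
      (let ((fn (read p)))
        (if (eof-object? fn)
            (begin
              (close-input-port p)
              n)
            (loop (+ n 1)))))))
```

      With the default ignore-hidden: setting, (count-entries ".")
      should equal (length (directory-files ".")), provided the
      directory is not modified between the two calls.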

  - [procedure] (directory-files [PATH-OR-SETTINGS])

      This procedure returns the list of the files in a directory.  The
      argument PATH-OR-SETTINGS is either a string denoting a filesystem
      path to a directory or a list of settings which must contain a
      path: setting.  If it is not specified, PATH-OR-SETTINGS
      defaults to the current directory (the value bound to the
      current-directory parameter object).  Here are the settings
      allowed:

         * path: STRING

           This setting indicates the location of the directory in the
           filesystem.  There is no default value for this setting.

         * ignore-hidden: ( #f | #t | dot-and-dot-dot )

           This setting controls whether hidden-files will be returned.
           Under UNIX and Mac OS X hidden-files are those that start
           with a period (such as ., .., and .profile).  Under
           Microsoft Windows hidden files are the . and .. entries
           and the files whose "hidden file" attribute is set.  A
           setting of #f will enumerate all the files.  A setting of
           #t will only enumerate the files that are not hidden.  A
           setting of dot-and-dot-dot will enumerate all the files
           except for the . and .. hidden files.  The default value
           of this setting is #t.

      For example:

           > (directory-files)
           ("complex" "README" "simple")
           > (directory-files "../include")
           ("config.h" "config.h.in" "foo.h" "makefile" "makefile.in")
           > (directory-files (list path: "../include" ignore-hidden: #f))
           ("." ".." "config.h" "config.h.in" "foo.h" "makefile" "makefile.in")

8. VECTOR-PORTS

  - [procedure] (open-vector [VECTOR-OR-SETTINGS])
  - [procedure] (open-input-vector [VECTOR-OR-SETTINGS])
  - [procedure] (open-output-vector [VECTOR-OR-SETTINGS])
  - [procedure] (call-with-input-vector VECTOR-OR-SETTINGS PROC)
  - [procedure] (call-with-output-vector VECTOR-OR-SETTINGS PROC)
  - [procedure] (with-input-from-vector VECTOR-OR-SETTINGS THUNK)
  - [procedure] (with-output-to-vector VECTOR-OR-SETTINGS THUNK)

      Vector-ports represent streams of Scheme objects.  They are a
      direct subtype of object-ports (i.e. they are not
      character-ports).  All of these procedures create vector-ports
      that are either unidirectional or bidirectional.  The direction:
      setting will default to the value input for the procedures
      open-input-vector, call-with-input-vector and
      with-input-from-vector, to the value output for the procedures
      open-output-vector, call-with-output-vector and
      with-output-to-vector, and to the value input-output for the
      procedure open-vector.  Bidirectional vector-ports behave like
      FIFOs: data written to the port is added to the end of the stream
      that is read.  It is only when a bidirectional vector-port's
      output-side is closed with a call to the close-output-port
      procedure that the stream's end is known (when the stream's end is
      reached, reading the port returns the end-of-file object).

      The procedures open-vector, open-input-vector and
      open-output-vector return the port that is created.  The
      procedures call-with-input-vector and call-with-output-vector
      call the procedure PROC with the port as single argument, and then
      return the value(s) of this call after closing the port.  The
      procedures with-input-from-vector and with-output-to-vector
      dynamically bind the current input-port and current output-port
      respectively to the port created for the duration of a call to the
      procedure THUNK with no argument.  The value(s) of the call to
      THUNK are returned after closing the port.

      The first argument of these procedures is either a vector of the
      elements used to initialize the stream or a list of port settings.
      If it is not specified, the argument of the open-vector,
      open-input-vector, and open-output-vector procedures defaults
      to an empty list of port settings.  Here are the settings allowed
      in addition to the generic settings of object-ports:

         * init: VECTOR

           This setting indicates the initial content of the stream.
           The default value of this setting is an empty vector.

         * permanent-close: ( #f | #t )

           This setting controls whether a call to the procedure
           close-output-port will close the output-side of a
           bidirectional vector-port permanently or not.  A permanently
           closed bidirectional vector-port whose end-of-file has been
           reached on the input-side will return the end-of-file object
           for all subsequent calls to the read procedure.  A
           non-permanently closed bidirectional vector-port will return
           to its opened state when its end-of-file is read.  The
           default value of this setting is #t.

      For example:

           > (define p (open-vector))
           > (write 1 p)
           > (write 2 p)
           > (write 3 p)
           > (read p)
           1
           > (read p)
           2
           > (close-output-port p)
           > (read p)
           3
           > (read p)
           #!eof

  - [procedure] (open-vector-pipe [VECTOR-OR-SETTINGS1 [VECTOR-OR-SETTINGS2]])

      The procedure open-vector-pipe creates two vector-ports and
      returns these two ports.  The two ports are interrelated as
      follows: the first port's output-side is connected to the second
      port's input-side and the first port's input-side is connected to
      the second port's output-side.  The value VECTOR-OR-SETTINGS1 is
      used to set up the first vector-port and VECTOR-OR-SETTINGS2 is
      used to set up the second vector-port.  The same settings as for
      open-vector are allowed.  The default direction: setting is
      input-output (i.e. a bidirectional port is created).  If it is
      not specified, VECTOR-OR-SETTINGS1 defaults to the empty list.  If
      it is not specified, VECTOR-OR-SETTINGS2 defaults to
      VECTOR-OR-SETTINGS1 but with the init: setting set to the empty
      vector and with the input and output settings exchanged (e.g. if
      the first port is an input-port then the second port is an
      output-port, if the first port's input-side is non-buffered then
      the second port's output-side is non-buffered).

      For example:

           > (define (server op)
               (receive (c s) (open-vector-pipe) ; client and server ports
                 (thread-start!
                   (make-thread
                     (lambda ()
                       (let loop ()
                         (let ((request (read s)))
                           (if (not (eof-object? request))
                               (begin
                                 (write (op request) s)
                                 (newline s)
                                 (force-output s)
                                 (loop))))))))
                 c))
           > (define a (server (lambda (x) (expt 2 x))))
           > (define b (server (lambda (x) (expt 10 x))))
           > (write 100 a)
           > (write 30 b)
           > (read a)
           1267650600228229401496703205376
           > (read b)
           1000000000000000000000000000000

  - [procedure] (get-output-vector VECTOR-PORT)

      The procedure get-output-vector takes an output vector-port or a
      bidirectional vector-port as argument and removes all the objects
      currently on the output-side, returning them in a vector.  The port
      remains open and subsequent output to the port and calls to the
      procedure get-output-vector are possible.

      For example:

           > (define p (open-vector '#(1 2 3)))
           > (write 4 p)
           > (get-output-vector p)
           #(1 2 3 4)
           > (write 5 p)
           > (write 6 p)
           > (get-output-vector p)
           #(5 6)

9. STRING-PORTS

  - [procedure] (open-string [STRING-OR-SETTINGS])
  - [procedure] (open-input-string [STRING-OR-SETTINGS])
  - [procedure] (open-output-string [STRING-OR-SETTINGS])
  - [procedure] (call-with-input-string STRING-OR-SETTINGS PROC)
  - [procedure] (call-with-output-string STRING-OR-SETTINGS PROC)
  - [procedure] (with-input-from-string STRING-OR-SETTINGS THUNK)
  - [procedure] (with-output-to-string STRING-OR-SETTINGS THUNK)
  - [procedure] (open-string-pipe [STRING-OR-SETTINGS1 [STRING-OR-SETTINGS2]])
  - [procedure] (get-output-string STRING-PORT)

      String-ports represent streams of characters.  They are a direct
      subtype of character-ports.  These procedures are the string-port
      analog of the procedures specified in the vector-ports section.
      Note that these procedures are a superset of the procedures
      specified in the "Basic String Ports SRFI" (SRFI 6).
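
      As an illustration restricted to the SRFI 6 subset, here are
      two small helpers built on string-ports.  The names
      join-with-commas and read-data are hypothetical and not part of
      the proposal.

```scheme
;; Build a string by writing to an output string-port.
(define (join-with-commas items)
  (let ((out (open-output-string)))
    (let loop ((rest items) (first? #t))
      (if (pair? rest)
          (begin
            (if (not first?) (write-char #\, out))
            (display (car rest) out)
            (loop (cdr rest) #f))))
    (get-output-string out)))

;; Parse every datum contained in a string with an input string-port.
(define (read-data str)
  (let ((in (open-input-string str)))
    (let loop ((acc '()))
      (let ((datum (read in)))
        (if (eof-object? datum)
            (reverse acc)
            (loop (cons datum acc)))))))
```

      For instance (join-with-commas '(1 2 3)) returns "1,2,3" and
      (read-data "10 (a b) \"c\"") returns the list (10 (a b) "c").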

  - [procedure] (object->string OBJ [N])

      This procedure converts the object OBJ to its external
      representation and returns it in a string.  The parameter N
      specifies the maximal width of the resulting string.  If the
      external representation is wider than N, the resulting string will
      be truncated to N characters and the last 3 characters will be set
      to periods.
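
      The truncation rule can be modeled on plain strings.  The
      procedure truncate-with-ellipsis below is a hypothetical helper
      that only mirrors the rule as worded above; it is not how
      object->string is implemented, and it assumes N is at least 3.

```scheme
;; Keep S unchanged if it fits in N characters; otherwise keep the
;; first N-3 characters and end the result with three periods.
;; Assumption: N >= 3.
(define (truncate-with-ellipsis s n)
  (if (<= (string-length s) n)
      s
      (string-append (substring s 0 (- n 3)) "...")))
```

      For instance (truncate-with-ellipsis "(1 2 3 4 5 6)" 10) returns
      "(1 2 3 ...", which is the result that the rule above implies
      for (object->string '(1 2 3 4 5 6) 10).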

10. U8VECTOR-PORTS

  - [procedure] (open-u8vector [U8VECTOR-OR-SETTINGS])
  - [procedure] (open-input-u8vector [U8VECTOR-OR-SETTINGS])
  - [procedure] (open-output-u8vector [U8VECTOR-OR-SETTINGS])
  - [procedure] (call-with-input-u8vector U8VECTOR-OR-SETTINGS PROC)
  - [procedure] (call-with-output-u8vector U8VECTOR-OR-SETTINGS PROC)
  - [procedure] (with-input-from-u8vector U8VECTOR-OR-SETTINGS THUNK)
  - [procedure] (with-output-to-u8vector U8VECTOR-OR-SETTINGS THUNK)
  - [procedure] (open-u8vector-pipe [U8VECTOR-OR-SETTINGS1 [U8VECTOR-OR-SETTINGS2]])
  - [procedure] (get-output-u8vector U8VECTOR-PORT)

      U8vector-ports represent streams of octets.  They are a direct
      subtype of octet-ports.  These procedures are the u8vector-port
      analog of the procedures specified in the vector-ports section.
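
      A short sketch of octet I/O on u8vector-ports follows.  It
      assumes that read-subu8vector and write-subu8vector accept
      these ports like any other octet-port, and that
      read-subu8vector returns fewer octets than requested when the
      end of the stream is reached.  The variable names are
      arbitrary.

```scheme
;; Output side: collect a slice of a u8vector.
(define out (open-output-u8vector))
(write-subu8vector '#u8(10 20 30 40) 1 3 out)   ; writes octets 20 and 30
(force-output out)                              ; harmless if redundant
(define collected (get-output-u8vector out))

;; Input side: read at most 4 octets from a 2-octet stream.
(define in (open-input-u8vector '#u8(104 105)))
(define buf (make-u8vector 4 0))
(define n (read-subu8vector buf 0 4 in))
```

      Under these assumptions collected is #u8(20 30), n is 2, and buf
      is #u8(104 105 0 0).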


