Formal comment #68 (defect)

R6RS must provide a UTF-16 codec, because UTF-16 is an essential encoding

Reported by: John Cowan
Component: i/o
Version: 5.91

R6RS implementations are currently required to support the UTF-8, Latin-1 (ISO 8859-1), UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings. This list omits the essential UTF-16 encoding.

The difference between UTF-16 and UTF-16{BE,LE} is that in the former, the presence of a BOM (U+FEFF) character at the beginning of the input stream indicates the ordering of the bytes that make up each character. The BOM is not considered part of the content. (If no BOM is present, the environment's default ordering is used; failing that, big-endian order is used.)

In the UTF-16BE and UTF-16LE encodings, no BOM is permitted; an initial U+FEFF character has its alternative semantics of zero-width no-break space. These encodings are far less commonly used than the UTF-16 encoding.

In particular, the Windows operating system consistently creates UTF-16 documents in little-endian order (not UTF-16LE documents) whenever characters must be written that are not available in the locale-dependent encoding. In essence, Windows systems provide two different encodings at any one time: the "ANSI" (locale-dependent, 8-bit or 8/16-bit) encoding, and the UTF-16 encoding. (The MS-DOS compatibility support provides a third encoding for use by MS-DOS programs.) Failing to provide a UTF-16 codec will make it unnecessarily hard to process Unicode documents generated by Windows.

In addition, UTF-16 (not UTF-16LE or UTF-16BE) is one of the two encodings which all XML processors (parsers) are required to accept, the other being UTF-8. Depending on the predominant language of the document, UTF-16 encoding may be more or less compact than UTF-8 encoding. Failing to provide a UTF-16 codec will make a substantial range of XML documents difficult to process.

I propose that a procedure named "utf-16-codec" be added to section 15.3.3 (p. 86).

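The distinction drawn above between UTF-16 and UTF-16{BE,LE} can be illustrated with a small sketch. It is written in Python purely for illustration, since Python's standard codecs happen to expose both behaviors; it is not part of the proposal and says nothing about how an R6RS "utf-16-codec" would be implemented. The sample string and variable names are assumptions made for the example.

```python
import codecs

text = "hi"  # arbitrary sample content, chosen for the example

# A UTF-16 document of the kind described above:
# a BOM first, then the text as little-endian code units.
with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The "utf-16" codec reads the BOM, uses it to pick the byte order,
# and does not treat it as part of the content.
decoded_utf16 = with_bom.decode("utf-16")

# The "utf-16-le" codec gives an initial U+FEFF no special role,
# so it stays in the decoded text as an ordinary character.
decoded_utf16le = with_bom.decode("utf-16-le")

# With a big-endian BOM, the same "utf-16" codec switches byte order.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
decoded_be = big_endian.decode("utf-16")

print(repr(decoded_utf16))
print(repr(decoded_utf16le))
print(repr(decoded_be))
```

Decoding the same bytes with the fixed-order codec leaves a stray U+FEFF at the front of the text, which is the kind of inconvenience a program faces when only UTF-16LE and UTF-16BE codecs are available.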
I further propose that the codecs for the rarely used UTF-{16,32}{BE,LE} encodings be removed. No form of UTF-32 encoding is in common use in I/O, though UTF-32 format is sometimes convenient for internal use.

RESPONSE:

The next draft of the report will reflect these suggestions.