Chapter 2

Bytevectors

Many applications deal with blocks of binary data by accessing them in various ways, for example by extracting signed or unsigned numbers of various sizes. Therefore, the (rnrs bytevectors (6)) library provides a single type for blocks of binary data with multiple ways to access that data. It deals with integers and floating-point representations in various sizes with specified endianness, because these are the most frequent applications.

Bytevectors are objects of a disjoint type. Conceptually, a bytevector represents a sequence of 8-bit bytes. The description of bytevectors uses the term byte for an exact integer in the interval {-128, ..., 127} and the term octet for an exact integer in the interval {0, ..., 255}. A byte corresponds to its two’s complement representation as an octet.
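The correspondence between bytes and octets can be sketched with two small conversion procedures. The names octet->byte and byte->octet are hypothetical helpers introduced only for illustration; they are not part of the library.

```scheme
(import (rnrs))

;; Hypothetical helpers, not part of the library: convert between the
;; signed (byte) and unsigned (octet) readings of the same 8 bits.
(define (octet->byte o)
  (if (> o 127) (- o 256) o))

(define (byte->octet b)
  (if (< b 0) (+ b 256) b))

(octet->byte 255)   ; ⇒ -1
(octet->byte 129)   ; ⇒ -127
(byte->octet -128)  ; ⇒ 128
(byte->octet 127)   ; ⇒ 127
```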

The length of a bytevector is the number of bytes it contains. This number is fixed. A valid index into a bytevector is an exact, non-negative integer. The first byte of a bytevector has index 0; the last byte has an index one less than the length of the bytevector.

Generally, the access procedures come in different flavors according to the size of the represented integer and the endianness of the representation. The procedures also distinguish signed and unsigned representations. The signed representations all use two’s complement.

Like list and vector literals, literals representing bytevectors must be quoted:

'#vu8(12 23 123) ⇒ #vu8(12 23 123)

2.1 Endianness

Many operations described in this chapter accept an endianness argument. Endianness describes the encoding of exact integers as several contiguous bytes in a bytevector. For this purpose, the binary representation of the integer is split into consecutive bytes. The little-endian encoding places the least significant byte of an integer first, with the other bytes following in increasing order of significance. The big-endian encoding places the most significant byte of an integer first, with the other bytes following in decreasing order of significance.

This terminology also applies to IEEE-754 numbers: IEEE-754 describes how to represent a floating-point number as an exact integer, and endianness describes how the bytes of such an integer are laid out in a bytevector.
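As an illustration of the two integer encodings, the following sketch stores the same 16-bit value under each endianness and lists the resulting octets in index order. The procedure u16->octets is a hypothetical helper, and it relies on procedures described later in this chapter.

```scheme
(import (rnrs))

;; Hypothetical helper: encode n as a two-byte unsigned integer and
;; return the resulting octets in index order.
(define (u16->octets n endianness-symbol)
  (let ((b (make-bytevector 2 0)))
    (bytevector-u16-set! b 0 n endianness-symbol)
    (bytevector->u8-list b)))

(u16->octets #x0102 (endianness little)) ; ⇒ (2 1)
(u16->octets #x0102 (endianness big))    ; ⇒ (1 2)
```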

Note: Little- and big-endianness are only the most common kinds of endianness. Some architectures distinguish between the endianness at different levels of a binary representation.

2.2 General operations

(endianness <endianness symbol>) syntax

<Endianness symbol> must be a symbol describing an endianness. An implementation must support at least the symbols big and little, but may support other endianness symbols. (endianness <endianness symbol>) evaluates to the symbol named <endianness symbol>. Whenever one of the procedures operating on bytevectors accepts an endianness as an argument, that argument must be one of these symbols. It is a syntax violation for <endianness symbol> to be anything other than an endianness symbol supported by the implementation.

Note: Implementors are encouraged to use widely accepted designations for endianness symbols other than big and little.

(native-endianness) procedure

Returns the endianness symbol associated with the implementation’s preferred endianness (usually that of the underlying machine architecture). This may be any <endianness symbol>, including a symbol other than big and little.
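A brief sketch of how the endianness syntax and native-endianness might be used. The variable name native is chosen only for this example, and the value it receives depends on the implementation, so none is shown.

```scheme
(import (rnrs))

(endianness big)                    ; ⇒ big
(eq? (endianness little) 'little)   ; ⇒ #t

;; The result depends on the implementation, so no value is shown.
(define native (native-endianness))
(symbol? native)                    ; ⇒ #t
```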

(bytevector? obj) procedure

Returns #t if obj is a bytevector, otherwise returns #f.

(make-bytevector k) procedure

(make-bytevector k fill) procedure

Returns a newly allocated bytevector of k bytes.

If the fill argument is missing, the initial contents of the returned bytevector are unspecified.

If the fill argument is present, it must be an exact integer in the interval {-128, ..., 255} that specifies the initial value for the bytes of the bytevector: if fill is non-negative, it is interpreted as an octet; if it is negative, it is interpreted as a byte.

(bytevector-length bytevector) procedure

Returns, as an exact integer, the number of bytes in bytevector.

(bytevector=? bytevector₁ bytevector₂) procedure

Returns #t if bytevector₁ and bytevector₂ are equal—that is, if they have the same length and equal bytes at all valid indices. It returns #f otherwise.
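The following sketch shows that a negative fill and the corresponding octet produce equal bytevectors, since -1 read as a byte and 255 read as an octet denote the same 8 bits. The names a and b are chosen only for this example.

```scheme
(import (rnrs))

;; -1 read as a byte and 255 read as an octet are the same 8 bits.
(define a (make-bytevector 4 -1))
(define b (make-bytevector 4 255))

(bytevector-length a)                    ; ⇒ 4
(bytevector=? a b)                       ; ⇒ #t
(bytevector=? a (make-bytevector 5 -1))  ; ⇒ #f
(bytevector->u8-list a)                  ; ⇒ (255 255 255 255)
```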

(bytevector-fill! bytevector fill) procedure

The fill argument is as in the description of the make-bytevector procedure. Stores fill in every element of bytevector and returns unspecified values. Analogous to vector-fill!.

(bytevector-copy! source source-start target target-start k) procedure

Source and target must be bytevectors. Source-start, target-start, and k must be non-negative exact integers that satisfy

0 ≤ source-start ≤ source-start + k ≤ l_source
0 ≤ target-start ≤ target-start + k ≤ l_target

where l_source is the length of source and l_target is the length of target.

The bytevector-copy! procedure copies the bytes from source at indices source-start, ..., source-start + k - 1 to consecutive indices in target starting at target-start.

This must work even if the memory regions for the source and the target overlap, i.e., the bytes at the target location after the copy must be equal to the bytes at the source location before the copy.

This returns unspecified values.

(let ((b (u8-list->bytevector '(1 2 3 4 5 6 7 8))))
  (bytevector-copy! b 0 b 3 4)
  (bytevector->u8-list b)) ⇒ (1 2 3 1 2 3 4 8)

(bytevector-copy bytevector) procedure

Returns a newly allocated copy of bytevector.
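Because the copy is newly allocated, later mutation of the original does not affect it. A minimal sketch, with original and duplicate as names chosen only for this example:

```scheme
(import (rnrs))

(define original (u8-list->bytevector '(1 2 3)))
(define duplicate (bytevector-copy original))

;; Overwrite every byte of the original; the copy is unaffected.
(bytevector-fill! original 0)

(bytevector->u8-list original)   ; ⇒ (0 0 0)
(bytevector->u8-list duplicate)  ; ⇒ (1 2 3)
```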

2.3 Operations on bytes and octets

(bytevector-u8-ref bytevector k) procedure

(bytevector-s8-ref bytevector k) procedure

K must be a valid index of bytevector.

The bytevector-u8-ref procedure returns the byte at index k of bytevector, as an octet.

The bytevector-s8-ref procedure returns the byte at index k of bytevector, as a (signed) byte.

(let ((b1 (make-bytevector 16 -127))
      (b2 (make-bytevector 16 255)))
  (list
    (bytevector-s8-ref b1 0)
    (bytevector-u8-ref b1 0)
    (bytevector-s8-ref b2 0)
    (bytevector-u8-ref b2 0))) ⇒ (-127 129 -1 255)

(bytevector-u8-set! bytevector k octet) procedure

(bytevector-s8-set! bytevector k byte) procedure

K must be a valid index of bytevector.

The bytevector-u8-set! procedure stores octet in element k of bytevector.

The bytevector-s8-set! procedure stores the two’s complement representation of byte in element k of bytevector.

Both procedures return unspecified values.

(let ((b (make-bytevector 16 -127)))
  (bytevector-s8-set! b 0 -126)
  (bytevector-u8-set! b 1 246)
  (list
    (bytevector-s8-ref b 0)
    (bytevector-u8-ref b 0)
    (bytevector-s8-ref b 1)
    (bytevector-u8-ref b 1))) ⇒ (-126 130 -10 246)

(bytevector->u8-list bytevector) procedure

(u8-list->bytevector list) procedure

List must be a list of octets.

The bytevector->u8-list procedure returns a newly allocated list of the octets of bytevector in the same order.

The u8-list->bytevector procedure returns a newly allocated bytevector whose elements are the elements of list, in the same order. It is analogous to list->vector.
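A short round-trip sketch of the two conversions, which also reads two of the stored octets back as signed bytes. The names octets and b are chosen only for this example.

```scheme
(import (rnrs))

(define octets '(0 127 128 255))
(define b (u8-list->bytevector octets))

(bytevector-length b)                     ; ⇒ 4
(equal? (bytevector->u8-list b) octets)   ; ⇒ #t
;; The same bytes read as signed values:
(bytevector-s8-ref b 2)                   ; ⇒ -128
(bytevector-s8-ref b 3)                   ; ⇒ -1
```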

2.4 Operations on integers of arbitrary size

(bytevector-uint-ref bytevector k endianness size) procedure

(bytevector-sint-ref bytevector k endianness size) procedure

(bytevector-uint-set! bytevector k n endianness size) procedure

(bytevector-sint-set! bytevector k n endianness size) procedure

Size must be a positive exact integer. {k, ..., k + size - 1} must be valid indices of bytevector.

The bytevector-uint-ref procedure retrieves the exact integer corresponding to the unsigned representation of size size, in the byte order specified by endianness, at indices {k, ..., k + size - 1}.

The bytevector-sint-ref procedure retrieves the exact integer corresponding to the two’s complement representation of size size, in the byte order specified by endianness, at indices {k, ..., k + size - 1}.

For bytevector-uint-set!, n must be an exact integer in the interval {0, ..., 256^size - 1}.

The bytevector-uint-set! procedure stores the unsigned representation of n, of size size and in the byte order specified by endianness, into bytevector at indices {k, ..., k + size - 1}.

For bytevector-sint-set!, n must be an exact integer in the interval {-256^size/2, ..., 256^size/2 - 1}. The bytevector-sint-set! procedure stores the two’s complement representation of n, of size size and in the byte order specified by endianness, into bytevector at indices {k, ..., k + size - 1}.

The ...-set! procedures return unspecified values.

(define b (make-bytevector 16 -127))

(bytevector-uint-set! b 0 (- (expt 2 128) 3) (endianness little) 16)

(bytevector-uint-ref b 0 (endianness little) 16)
  ⇒ #xfffffffffffffffffffffffffffffffd

(bytevector-sint-ref b 0 (endianness little) 16) ⇒ -3

(bytevector->u8-list b)
  ⇒ (253 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255)

(bytevector-uint-set! b 0 (- (expt 2 128) 3) (endianness big) 16)

(bytevector-uint-ref b 0 (endianness big) 16)
  ⇒ #xfffffffffffffffffffffffffffffffd

(bytevector-sint-ref b 0 (endianness big) 16) ⇒ -3

(bytevector->u8-list b)
  ⇒ (255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 253)
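Size need not be a power of two. As a further sketch, the following reads three-byte integers, a width that the fixed-size procedures of the later sections do not cover. The bytevector contents are chosen only for this example.

```scheme
(import (rnrs))

(define b (u8-list->bytevector '(1 2 3 255 255 254)))

(bytevector-uint-ref b 0 (endianness big) 3)     ; ⇒ 66051    (#x010203)
(bytevector-uint-ref b 0 (endianness little) 3)  ; ⇒ 197121   (#x030201)
(bytevector-uint-ref b 3 (endianness big) 3)     ; ⇒ 16777214 (#xfffffe)
(bytevector-sint-ref b 3 (endianness big) 3)     ; ⇒ -2
```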

(bytevector->uint-list bytevector endianness size) procedure

(bytevector->sint-list bytevector endianness size) procedure

(uint-list->bytevector list endianness size) procedure

(sint-list->bytevector list endianness size) procedure

Size must be a positive exact integer. The length of bytevector or, respectively, of list must be divisible by size.

These procedures convert between lists of integers and their consecutive representations according to size and endianness in the bytevector objects in the same way as bytevector->u8-list and u8-list->bytevector do for one-byte representations.

(let ((b (u8-list->bytevector '(1 2 3 255 1 2 1 2))))
  (bytevector->sint-list b (endianness little) 2)) ⇒ (513 -253 513 513)

(let ((b (u8-list->bytevector '(1 2 3 255 1 2 1 2))))
  (bytevector->uint-list b (endianness little) 2)) ⇒ (513 65283 513 513)
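The opposite direction can be sketched in the same way: the list-to-bytevector procedures lay out each integer in size consecutive bytes. The input lists are chosen to mirror the values in the example above.

```scheme
(import (rnrs))

(bytevector->u8-list
 (uint-list->bytevector '(513 65283) (endianness little) 2))
; ⇒ (1 2 3 255)

(bytevector->u8-list
 (sint-list->bytevector '(513 -253) (endianness little) 2))
; ⇒ (1 2 3 255)

(bytevector->u8-list
 (uint-list->bytevector '(513 65283) (endianness big) 2))
; ⇒ (2 1 255 3)
```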

2.5 Operations on 16-bit integers

(bytevector-u16-ref bytevector k endianness) procedure

(bytevector-s16-ref bytevector k endianness) procedure

(bytevector-u16-native-ref bytevector k) procedure

(bytevector-s16-native-ref bytevector k) procedure

(bytevector-u16-set! bytevector k n endianness) procedure

(bytevector-s16-set! bytevector k n endianness) procedure

(bytevector-u16-native-set! bytevector k n) procedure

(bytevector-s16-native-set! bytevector k n) procedure

K must be a valid index of bytevector; so must k + 1.

These retrieve and set two-byte representations of numbers at indices k and k + 1, according to the endianness specified by endianness. The procedures with u16 in their names deal with the unsigned representation; those with s16 in their names deal with the two’s complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 2.

The ...-set! procedures return unspecified values.

(define b
  (u8-list->bytevector
    '(255 255 255 255 255 255 255 255
      255 255 255 255 255 255 255 253)))

(bytevector-u16-ref b 14 (endianness little)) ⇒ 65023

(bytevector-s16-ref b 14 (endianness little)) ⇒ -513

(bytevector-u16-ref b 14 (endianness big)) ⇒ 65533

(bytevector-s16-ref b 14 (endianness big)) ⇒ -3

(bytevector-u16-set! b 0 12345 (endianness little))

(bytevector-u16-ref b 0 (endianness little)) ⇒ 12345

(bytevector-u16-native-set! b 0 12345)

(bytevector-u16-native-ref b 0) ⇒ 12345

(bytevector-u16-ref b 0 (endianness little)) ⇒ unspecified

2.6 Operations on 32-bit integers

(bytevector-u32-ref bytevector k endianness) procedure

(bytevector-s32-ref bytevector k endianness) procedure

(bytevector-u32-native-ref bytevector k) procedure

(bytevector-s32-native-ref bytevector k) procedure

(bytevector-u32-set! bytevector k n endianness) procedure

(bytevector-s32-set! bytevector k n endianness) procedure

(bytevector-u32-native-set! bytevector k n) procedure

(bytevector-s32-native-set! bytevector k n) procedure

{k, ..., k + 3} must be valid indices of bytevector.

These retrieve and set four-byte representations of numbers at indices {k, ..., k + 3}, according to the endianness specified by endianness. The procedures with u32 in their names deal with the unsigned representation; those with s32 with the two’s complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 4.

The ...-set! procedures return unspecified values.

(define b
  (u8-list->bytevector
    '(255 255 255 255 255 255 255 255
      255 255 255 255 255 255 255 253)))

(bytevector-u32-ref b 12 (endianness little)) ⇒ 4261412863

(bytevector-s32-ref b 12 (endianness little)) ⇒ -33554433

(bytevector-u32-ref b 12 (endianness big)) ⇒ 4294967293

(bytevector-s32-ref b 12 (endianness big)) ⇒ -3
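The setters can be sketched in the same style. The following stores one unsigned and one signed 32-bit value with different endianness and then inspects the resulting bytes; the values are chosen only for this example.

```scheme
(import (rnrs))

(define b (make-bytevector 8 0))

(bytevector-u32-set! b 0 #x01020304 (endianness big))
(bytevector-s32-set! b 4 -2 (endianness little))

(bytevector->u8-list b)                       ; ⇒ (1 2 3 4 254 255 255 255)
(bytevector-u32-ref b 0 (endianness little))  ; ⇒ 67305985 (#x04030201)
(bytevector-u32-ref b 4 (endianness little))  ; ⇒ 4294967294
```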

2.7 Operations on 64-bit integers

(bytevector-u64-ref bytevector k endianness) procedure

(bytevector-s64-ref bytevector k endianness) procedure

(bytevector-u64-native-ref bytevector k) procedure

(bytevector-s64-native-ref bytevector k) procedure

(bytevector-u64-set! bytevector k n endianness) procedure

(bytevector-s64-set! bytevector k n endianness) procedure

(bytevector-u64-native-set! bytevector k n) procedure

(bytevector-s64-native-set! bytevector k n) procedure

{k, ..., k + 7} must be valid indices of bytevector.

These retrieve and set eight-byte representations of numbers at indices {k, ..., k + 7}, according to the endianness specified by endianness. The procedures with u64 in their names deal with the unsigned representation; those with s64 with the two’s complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 8.

The ...-set! procedures return unspecified values.

(define b
  (u8-list->bytevector
    '(255 255 255 255 255 255 255 255
      255 255 255 255 255 255 255 253)))

(bytevector-u64-ref b 8 (endianness little)) ⇒ 18302628885633695743

(bytevector-s64-ref b 8 (endianness little)) ⇒ -144115188075855873

(bytevector-u64-ref b 8 (endianness big)) ⇒ 18446744073709551613

(bytevector-s64-ref b 8 (endianness big)) ⇒ -3
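As a complementary sketch, storing -3 as a big-endian signed 64-bit integer produces the same eight bytes that the example above reads from indices 8 through 15.

```scheme
(import (rnrs))

(define b (make-bytevector 8 0))
(bytevector-s64-set! b 0 -3 (endianness big))

(bytevector->u8-list b)                       ; ⇒ (255 255 255 255 255 255 255 253)
(bytevector-u64-ref b 0 (endianness big))     ; ⇒ 18446744073709551613
(bytevector-s64-ref b 0 (endianness little))  ; ⇒ -144115188075855873
```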

2.8 Operations on IEEE-754 numbers

(bytevector-ieee-single-native-ref bytevector k) procedure

(bytevector-ieee-single-ref bytevector k endianness) procedure

{k, ..., k + 3} must be valid indices of bytevector. For bytevector-ieee-single-native-ref, k must be a multiple of 4.

These procedures return the inexact real that best represents the IEEE-754 single precision number represented by the four bytes beginning at index k.

(bytevector-ieee-double-native-ref bytevector k) procedure

(bytevector-ieee-double-ref bytevector k endianness) procedure

{k, ..., k + 7} must be valid indices of bytevector. For bytevector-ieee-double-native-ref, k must be a multiple of 8.

These procedures return the inexact real that best represents the IEEE-754 double precision number represented by the eight bytes beginning at index k.

(bytevector-ieee-single-native-set! bytevector k x) procedure

(bytevector-ieee-single-set! bytevector k x endianness) procedure

{k, ..., k + 3} must be valid indices of bytevector. For bytevector-ieee-single-native-set!, k must be a multiple of 4. X must be a real number.

These procedures store an IEEE-754 single precision representation of x into elements k through k + 3 of bytevector, and return unspecified values.

(bytevector-ieee-double-native-set! bytevector k x) procedure

(bytevector-ieee-double-set! bytevector k x endianness) procedure

{k, ..., k + 7} must be valid indices of bytevector. For bytevector-ieee-double-native-set!, k must be a multiple of 8. X must be a real number.

These procedures store an IEEE-754 double precision representation of x into elements k through k + 7 of bytevector, and return unspecified values.
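A minimal sketch of the IEEE-754 setters and accessors, assuming the usual bit patterns for 1.0 (#x3F800000 in single precision and #x3FF0000000000000 in double precision). Explicit big-endian layout is used so that the byte listing does not depend on the native endianness.

```scheme
(import (rnrs))

(define b (make-bytevector 12 0))

;; 1.0 is #x3F800000 in single precision and
;; #x3FF0000000000000 in double precision.
(bytevector-ieee-single-set! b 0 1.0 (endianness big))
(bytevector-ieee-double-set! b 4 1.0 (endianness big))

(bytevector->u8-list b)
; ⇒ (63 128 0 0 63 240 0 0 0 0 0 0)
(bytevector-ieee-single-ref b 0 (endianness big))  ; ⇒ 1.0
(bytevector-ieee-double-ref b 4 (endianness big))  ; ⇒ 1.0
(bytevector-u32-ref b 0 (endianness big))          ; ⇒ 1065353216 (#x3F800000)
```

The double is stored at index 4, which is not a multiple of 8; this is acceptable here because only the procedures with native in their names require aligned indices.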

2.9 Operations on strings

This section describes procedures that convert between strings and bytevectors containing Unicode encodings of those strings. When decoding bytevectors, encoding errors are handled as with the replace semantics of textual I/O (see section 8.2.4): If an invalid or incomplete character encoding is encountered, then the replacement character U+FFFD is appended to the string being generated, an appropriate number of bytes are ignored, and decoding continues with the following bytes.

(string->utf8 string) procedure

Returns a newly allocated (unless empty) bytevector that contains the UTF-8 encoding of the given string.
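For illustration, the following sketch encodes a string containing one ASCII character and one non-ASCII character. The string escape \x3BB; denotes U+03BB, which UTF-8 encodes in two bytes.

```scheme
(import (rnrs))

;; "\x3BB;" is the string escape for U+03BB (Greek small lambda),
;; which UTF-8 encodes as the two bytes #xCE #xBB.
(bytevector->u8-list (string->utf8 "A\x3BB;"))  ; ⇒ (65 206 187)
(bytevector-length (string->utf8 "abc"))        ; ⇒ 3
```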

(string->utf16 string) procedure

(string->utf16 string endianness) procedure

If endianness is specified, it must be the symbol big or the symbol little. The string->utf16 procedure returns a newly allocated (unless empty) bytevector that contains the UTF-16BE or UTF-16LE encoding of the given string (with no byte-order mark). If endianness is not specified or is big, then UTF-16BE is used. If endianness is little, then UTF-16LE is used.

(string->utf32 string) procedure

(string->utf32 string endianness) procedure

If endianness is specified, it must be the symbol big or the symbol little. The string->utf32 procedure returns a newly allocated (unless empty) bytevector that contains the UTF-32BE or UTF-32LE encoding of the given string (with no byte-order mark). If endianness is not specified or is big, then UTF-32BE is used. If endianness is little, then UTF-32LE is used.
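The effect of the optional endianness argument on string->utf16 and string->utf32 can be sketched with a one-character string; note that no byte-order mark appears in any of the results.

```scheme
(import (rnrs))

(bytevector->u8-list (string->utf16 "A"))                      ; ⇒ (0 65)
(bytevector->u8-list (string->utf16 "A" (endianness little)))  ; ⇒ (65 0)
(bytevector->u8-list (string->utf32 "A"))                      ; ⇒ (0 0 0 65)
(bytevector->u8-list (string->utf32 "A" (endianness little)))  ; ⇒ (65 0 0 0)
```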

(utf8->string bytevector) procedure

Returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector.
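A sketch of decoding, including the replace semantics described at the start of this section. The byte #xFF never occurs in valid UTF-8, so under those semantics it is expected to decode to the replacement character U+FFFD. The name decoded is chosen only for this example.

```scheme
(import (rnrs))

(utf8->string (string->utf8 "hello"))          ; ⇒ "hello"

;; #xFF can never occur in valid UTF-8, so it is replaced by U+FFFD.
(define decoded (utf8->string (u8-list->bytevector '(65 255 66))))
(string-length decoded)                        ; ⇒ 3
(char=? (string-ref decoded 1) #\xFFFD)        ; ⇒ #t
```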

(utf16->string bytevector) procedure

(utf16->string bytevector endianness) procedure

If endianness is specified, it must be the symbol big or the symbol little. The utf16->string procedure returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector. If endianness is big, then UTF-16BE is used. If endianness is little, then UTF-16LE is used. If endianness is not specified, bytevector is decoded according to UTF-16: If it starts with the bytes #xFE, #xFF (a big-endian byte-order mark), then UTF-16BE is used for the remaining contents of bytevector; if it starts with the bytes #xFF, #xFE (a little-endian byte-order mark), then UTF-16LE is used for the remaining contents of bytevector; otherwise the entire contents of bytevector are decoded according to UTF-16BE.

(utf32->string bytevector) procedure

(utf32->string bytevector endianness) procedure

If endianness is specified, it must be the symbol big or the symbol little. The utf32->string procedure returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector. If endianness is big, then UTF-32BE is used. If endianness is little, then UTF-32LE is used. If endianness is not specified, bytevector is decoded according to UTF-32: If it starts with the bytes #x00, #x00, #xFE, #xFF (a big-endian byte-order mark), then UTF-32BE is used for the remaining contents of bytevector; if it starts with the bytes #xFF, #xFE, #x00, #x00 (a little-endian byte-order mark), then UTF-32LE is used for the remaining contents of bytevector; otherwise the entire contents of bytevector are decoded according to UTF-32BE.
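As a closing sketch, the following decodes UTF-16 and UTF-32 data with an explicit endianness argument. The inputs deliberately contain no byte-order mark, so the result depends only on the endianness given.

```scheme
(import (rnrs))

(utf16->string (u8-list->bytevector '(0 104 0 105)) (endianness big))     ; ⇒ "hi"
(utf16->string (u8-list->bytevector '(104 0 105 0)) (endianness little))  ; ⇒ "hi"
(utf32->string (string->utf32 "hi" (endianness little))
               (endianness little))                                       ; ⇒ "hi"
```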