To express text as binary data, an encoding is needed. In a modern context, an encoding is understood as a two-way mapping between Unicode code points (characters) and sequences of bytes. Since English isn't the only language in the world, ASCII is not sufficient as an encoding; it can't even express em-dashes or curly quotes. Unicode was designed as a universal character set that could potentially represent everything.
Not all encodings can represent all Unicode code points, but Unicode can represent essentially every character found in other modern encodings. Some encodings are language-specific, and some can represent everything in Unicode. Though the world is moving toward Unicode and UTF-8, the reality today is that there are several encodings which must be taken into account.
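The two-way mapping can be tried directly in the listener. The sketch below assumes the encode and decode words from the io.encodings.string vocabulary, which are not mentioned in the text above; it converts a one-character string to UTF-8 bytes and back.

```factor
USING: io.encodings.string io.encodings.utf8 prettyprint ;

! Text to bytes: U+00E9 (é) takes two bytes in UTF-8.
"é" utf8 encode .             ! B{ 195 169 }

! Bytes back to text, using the same encoding.
B{ 195 169 } utf8 decode .    ! "é"
```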
Factor uses a system of encoding descriptors to denote encodings. Encoding descriptors are objects which describe encodings. Examples are utf8, utf16, and ascii. Encoding descriptors can be passed around independently. Each encoding descriptor has a method for constructing an encoded or decoded stream, and the resulting stream stores an encoding descriptor, which provides the methods for reading or writing characters.
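Because descriptors are ordinary objects, they can be passed to a word like any other value. The following is a minimal sketch: encoded-length is a hypothetical helper, not part of Factor, and encode is assumed to come from io.encodings.string.

```factor
USING: io.encodings.ascii io.encodings.string io.encodings.utf8
prettyprint sequences ;
IN: scratchpad

! Hypothetical helper: how many bytes a string occupies
! in whichever encoding the caller passes in.
: encoded-length ( string encoding -- n ) encode length ;

"héllo" utf8 encoded-length .     ! 6 (é takes two bytes)
"hello" ascii encoded-length .    ! 5
```

The word itself never names a particular encoding; the caller decides which descriptor to supply.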
Constructors for streams which deal with bytes usually take an encoding as an explicit parameter. For example, to open a text file for reading whose contents are in UTF-8, use the following:
"file.txt" utf8 <file-reader>
If there is an error in the encoded stream, a replacement character (U+FFFD) will be inserted. To throw an exception upon error, use a strict encoding as follows:
"file.txt" utf8 strict <file-reader>
In a similar way, encodings can be specified when opening a file for writing.
"file.txt" ascii <file-writer>
An encoding is also needed for some words that don't return streams, such as file-contents, for example:
"file.txt" utf16 file-contents
Encoding descriptors are also used by byte-array streams and taken by combinators like with-file-writer which deal with streams. They are not used with string streams, because these deal with abstract text.
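The contrast between the stream kinds can be sketched as follows. This assumes with-byte-writer from io.streams.byte-array and with-string-writer from io.streams.string; the file name encoding-demo.txt is a hypothetical scratch file that is deleted at the end.

```factor
USING: io io.directories io.encodings.utf8 io.files
io.streams.byte-array io.streams.string prettyprint ;

! Byte-array stream: the encoding turns written text into bytes.
utf8 [ "é" write ] with-byte-writer .    ! B{ 195 169 }

! String stream: no encoding is given, the text stays a string.
[ "é" write ] with-string-writer .       ! "é"

! File combinator: the encoding follows the path.
"encoding-demo.txt" utf8 [ "é" print ] with-file-writer
"encoding-demo.txt" utf8 file-contents .
"encoding-demo.txt" delete-file
```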
When the binary encoding is used, a byte-array is expected for writing and returned for reading, since the stream deals with bytes. All other encodings deal with strings, since they are used to represent text.
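A small sketch of the binary case, using byte-array streams so that no file is needed. It assumes with-byte-writer and with-byte-reader from io.streams.byte-array; with the binary descriptor, both writing and reading work on byte-arrays rather than strings.

```factor
USING: io io.encodings.binary io.streams.byte-array prettyprint ;

! Writing: a byte-array goes in and comes out unchanged.
binary [ B{ 1 2 3 } write ] with-byte-writer .    ! B{ 1 2 3 }

! Reading: read returns a byte-array, not a string.
B{ 1 2 3 } binary [ 2 read . ] with-byte-reader   ! B{ 1 2 }
```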