UTF-8 encoding


UTF-8 is a variable-width encoding. 7-bit ASCII characters are encoded as single bytes, and all other Unicode code points are encoded as sequences of two to four bytes.
utf8
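
As a rough illustration of the variable width, the sketch below encodes one character from each length class. It assumes the `encode` and `decode` words from the `io.encodings.string` vocabulary, which is not mentioned on this page, and it is meant to be pasted into the listener.

```factor
USING: io.encodings.string io.encodings.utf8 prettyprint ;

! A 7-bit ASCII character occupies a single byte.
"A" utf8 encode .     ! B{ 65 }

! U+00E9 (e with acute accent) needs two bytes.
"é" utf8 encode .     ! B{ 195 169 }

! U+20AC (euro sign) needs three bytes.
"€" utf8 encode .     ! B{ 226 130 172 }

! U+1F600 lies outside the Basic Multilingual Plane and needs four bytes.
"😀" utf8 encode .    ! B{ 240 159 152 128 }

! Decoding the bytes restores the original string.
B{ 226 130 172 } utf8 decode "€" = .    ! t
```

The same `utf8` descriptor can be passed to any word that takes an encoding, so the string conversion here is only the shortest way to see the byte sequences.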


Although it is not generally recommended, a UTF-8 stream can begin with a Byte-Order-Mark (BOM). The utf8-bom encoding skips a leading BOM, if one is present, when decoding and inserts a BOM when encoding.
utf8-bom ( -- utf8-bom )
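
A minimal sketch of the behavior described above, under two assumptions: that `utf8-bom` is exported from the `io.encodings.utf8` vocabulary, and that it works with the `encode` and `decode` words from `io.encodings.string`. The UTF-8 BOM is the byte sequence 239 187 191 (hexadecimal EF BB BF).

```factor
USING: io.encodings.string io.encodings.utf8 prettyprint ;

! Encoding: per the description above, the result should start
! with the BOM bytes 239 187 191, followed by the encoded text.
"hi" utf8-bom encode .

! Decoding: a leading BOM, if present, should be skipped...
B{ 239 187 191 104 105 } utf8-bom decode .

! ...and input without a BOM should decode the same way.
B{ 104 105 } utf8-bom decode .
```

If both decode calls print the same string, the BOM is being treated as optional on input, which is the point of choosing `utf8-bom` over plain `utf8` when reading files of unknown origin.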