UTF-8 is a variable-width encoding. Characters in the 7-bit ASCII range are encoded as single bytes, and all other Unicode code points are encoded as sequences of 2 to 4 bytes.
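As a quick illustration of the variable width, the following sketch uses Python's built-in `str.encode` to count the bytes produced for a few sample characters. The helper name `utf8_length` and the sample characters are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character from each width class (illustrative choices).
samples = {
    "A": utf8_length("A"),    # U+0041, ASCII range
    "é": utf8_length("é"),    # U+00E9
    "€": utf8_length("€"),    # U+20AC
    "😀": utf8_length("😀"),  # U+1F600, outside the Basic Multilingual Plane
}

for ch, n in samples.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

The ASCII letter takes one byte, and the other three samples take two, three, and four bytes respectively.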
While not generally recommended, a UTF-8 stream may begin with a byte order mark (BOM), encoded as the three bytes EF BB BF. We provide an encoding that will optionally skip a leading BOM when decoding, as well as insert a BOM when encoding.
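The BOM behavior described above can be sketched with Python's standard `utf-8-sig` codec, used here only as a stand-in; it is not the encoding this document provides, and that encoding's API may differ. The variable names are illustrative.

```python
import codecs

text = "hi"

# Encoding with utf-8-sig prepends the BOM bytes EF BB BF.
with_bom = text.encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + b"hi"

# Decoding with utf-8-sig skips a leading BOM if one is present...
decoded_with_bom = with_bom.decode("utf-8-sig")

# ...and leaves input without a BOM unchanged.
decoded_without_bom = b"hi".decode("utf-8-sig")

# A plain UTF-8 decoder keeps the BOM as the character U+FEFF.
decoded_plain = with_bom.decode("utf-8")

print(repr(decoded_with_bom), repr(decoded_without_bom), repr(decoded_plain))
```

Skipping the BOM only when it is present lets a single decoder accept input from producers that write a BOM and from those that do not.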