Unicode support

The unicode vocabulary and its sub-vocabularies implement support for the Unicode 14.0 character set.

The Unicode character set covers most of the world's writing systems. It is intended as a replacement for, and is a superset of, legacy character sets such as ASCII, Latin-1, and MacRoman. Unicode characters are called code points; Factor's strings are sequences of code points.
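
Because strings are sequences of code points, ordinary sequence words operate on code points rather than on bytes or user-perceived characters. The following is a minimal listener sketch of that idea; it assumes the `\u{...}` hexadecimal string escape and uses only `length` and `last` from the sequences vocabulary.

```factor
USING: prettyprint sequences ;

! "é" written as the single precomposed code point U+00E9
"caf\u{e9}" length .      ! 4 code points

! the same text written as "e" followed by U+0301 COMBINING ACUTE ACCENT
"cafe\u{301}" length .    ! 5 code points

! elements of a string are code points, represented as integers
"caf\u{e9}" last .        ! 233, that is, U+00E9
```

The two strings display identically but differ in length, which is one reason the normalization and grapheme-break algorithms exist.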

The Unicode character set is accompanied by several standard algorithms for common operations like encoding text in files, capitalizing a string, finding the boundaries between words, and so on.

The Unicode algorithms implemented by the unicode vocabulary are:

Case mapping

Collation and weak comparison

Unicode category syntax

Word and grapheme breaks

Unicode normalization
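
As a rough illustration of the algorithms listed above, the sketch below calls one word from several of these areas. It assumes a recent Factor release in which `>upper`, `nfc`, `>graphemes`, and `sort-strings` are all exported from the unicode vocabulary; in older releases they may live in sub-vocabularies instead, so check the linked articles for the exact names in your version.

```factor
USING: io prettyprint sequences unicode ;

! Case mapping: convert a string to upper case
"hello world" >upper print

! Normalization: compose "e" + U+0301 into canonical composed form (NFC)
"e\u{301}" nfc length .

! Grapheme breaks: split into user-perceived characters
"cafe\u{301}" >graphemes length .

! Collation: sort strings using Unicode collation order
{ "banana" "apple" "Cherry" } sort-strings .
```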

The following are mostly for internal use:

Unicode category syntax

Unicode data tables

See also
ASCII, I/O encodings