The unicode vocabulary and its sub-vocabularies implement support for the Unicode 5.2 character set.
The Unicode character set covers most of the world's writing systems. It is intended as a replacement for, and is a superset of, legacy character sets such as ASCII, Latin1, and MacRoman. Unicode characters are called code points; Factor's strings are sequences of code points.
The Unicode character set is accompanied by several standard algorithms for common operations like encoding text in files, capitalizing a string, finding the boundaries between words, and so on.
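As a minimal sketch of the points above, the following treats a string as a sequence of code points and then applies two of the standard algorithms: case conversion and encoding. It assumes that >upper is provided by the unicode.case sub-vocabulary and that encode, decode, and utf8 come from io.encodings.string and io.encodings.utf8; other Factor releases may place these words in different vocabularies.

```factor
! Assumed vocabulary layout; adjust the USING: line if your
! Factor release keeps these words elsewhere.
USING: io.encodings.string io.encodings.utf8 prettyprint
sequences unicode.case ;

! A string is a sequence of code points, so generic sequence
! words such as length and first apply to it directly.
"Factor" length .
"Factor" first .

! Case conversion on the whole string.
"Factor" >upper .

! Encoding turns code points into a byte array;
! decoding with the same encoding turns the bytes back into a string.
"Factor" utf8 encode .
"Factor" utf8 encode utf8 decode .
```

The sample text is kept to plain ASCII so that the behavior does not depend on how the source file itself is encoded.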