Parser algorithm

At the most abstract level, Factor syntax consists of whitespace-separated tokens. The parser tokenizes the input on whitespace boundaries. The parser is case-sensitive and whitespace between tokens is significant, so the following three expressions tokenize differently:
2X+
2 X +
2 x +
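
As a rough illustration of the three expressions above, here is a toy model written in Python rather than Factor. It is only a sketch of whitespace tokenization, not Factor's actual tokenizer, and the function name tokenize is made up for this example.

```python
# Toy model: split text on runs of whitespace, leaving case untouched.
def tokenize(text):
    return text.split()

examples = ["2X+", "2 X +", "2 x +"]
token_lists = [tokenize(e) for e in examples]

for source, tokens in zip(examples, token_lists):
    print(repr(source), "->", tokens)
# '2X+' -> ['2X+']
# '2 X +' -> ['2', 'X', '+']
# '2 x +' -> ['2', 'x', '+']
```

The first expression is a single token, while the second and third each yield three tokens and differ only in the case of the middle one.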

As the parser reads tokens it makes a distinction between numbers, ordinary words, and parsing words. Tokens are appended to the parse tree, the top level of which is a quotation returned by the original parser invocation. Nested levels of the parse tree are created by parsing words.

The parser iterates through the input text, checking each character in turn. Here is the parser algorithm in more detail -- some of the concepts therein will be defined shortly:
If the current character is a double-quote ("), the " parsing word is executed, causing a string to be read.
Otherwise, the next token is taken from the input. The parser searches for a word named by the token in the currently used set of vocabularies. If the word is found, one of the following two actions is taken:
  If the word is an ordinary word, it is appended to the parse tree.
  If the word is a parsing word, it is executed.
Otherwise, if the token does not represent a known word, the parser attempts to parse it as a number. If the token is a number, the number object is added to the parse tree. Otherwise, an error is raised and parsing halts.
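
The steps above can be sketched as a small loop. This is a toy model in Python under simplifying assumptions, not Factor's parser: every name in it (parse, read_string, ParseError, the words table and the MARK parsing word) is hypothetical, numbers are limited to integers, and strings have no escape sequences.

```python
# Toy model of the parser loop; all names are hypothetical, not Factor's API.

class ParseError(Exception):
    pass

def read_string(text, pos, tree):
    """Stand-in for the " parsing word: read up to the closing quote."""
    end = text.find('"', pos + 1)
    if end == -1:
        raise ParseError("unterminated string")
    tree.append(("string", text[pos + 1:end]))
    return end + 1  # position just past the closing quote

def parse(text, words):
    """words maps a name to None (ordinary word) or a callable (parsing word)."""
    tree = []  # top level of the parse tree
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch == '"':
            pos = read_string(text, pos, tree)
        else:
            # Take the next whitespace-delimited token.
            start = pos
            while pos < len(text) and not text[pos].isspace():
                pos += 1
            token = text[start:pos]
            if token in words:
                action = words[token]
                if action is None:
                    tree.append(("word", token))  # ordinary word
                else:
                    action(tree)                  # parsing word runs now
            else:
                try:
                    tree.append(("number", int(token)))
                except ValueError:
                    raise ParseError(f"no word named {token!r}") from None
    return tree

def mark(tree):
    """Hypothetical parsing word: adds a marker instead of itself."""
    tree.append(("marker", None))

words = {"+": None, "dup": None, "MARK": mark}
print(parse('2 dup + MARK "hi"', words))
# [('number', 2), ('word', 'dup'), ('word', '+'), ('marker', None), ('string', 'hi')]
```

Note that word lookup happens before the number check, mirroring the order of the steps above, so a token is only tried as a number when no word of that name is found.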

Parsing words play a key role in parsing; while ordinary words and numbers are simply added to the parse tree, parsing words execute in the context of the parser, and can do their own parsing and create nested data structures in the parse tree. Parsing words are also able to define new words.
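
To make the role of parsing words more concrete, here is another toy model in Python. It is a sketch only: the Parser class and its methods are invented for this example, input is pre-split on whitespace, stack effect declarations are omitted, and the parsing words [ and : are simplified stand-ins that only loosely resemble their Factor counterparts.

```python
# Toy model; names here are hypothetical and are not Factor's API.

class Parser:
    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0
        self.ordinary = {"+", "*", "dup"}  # known ordinary words
        self.parsing = {"[": Parser.parse_quotation,
                        ":": Parser.parse_definition}
        self.definitions = {}

    def next_token(self):
        """Return the next token, or None at end of input."""
        if self.pos >= len(self.tokens):
            return None
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_until(self, end):
        """Parse into a list until `end` is consumed (None means end of input)."""
        tree = []
        while True:
            token = self.next_token()
            if token == end:
                return tree
            if token is None:
                raise SyntaxError(f"expected {end!r}")
            if token in self.parsing:
                self.parsing[token](self, tree)  # parsing word executes
            elif token in self.ordinary:
                tree.append(token)
            else:
                tree.append(int(token))  # raises ValueError if not a number

    def parse_quotation(self, tree):
        # `[` does its own parsing and appends a nested level to the tree.
        tree.append(self.parse_until("]"))

    def parse_definition(self, tree):
        # `:` reads a name and a body, then defines a new ordinary word.
        name = self.next_token()
        self.definitions[name] = self.parse_until(";")
        self.ordinary.add(name)

p = Parser(": sq dup * ; 3 [ sq 1 + ] 2")
tree = p.parse_until(None)
print(tree)           # [3, ['sq', 1, '+'], 2]
print(p.definitions)  # {'sq': ['dup', '*']}
```

The definition adds nothing to the parse tree but makes sq available to the rest of the input, while the bracket pair produces a nested list inside the top-level one.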

While parsing words supporting arbitrary syntax can be defined, the default set is found in the syntax vocabulary and provides the basis for all further syntactic interaction with Factor.