Factor is a concatenative programming language in the spirit of Forth. What is a concatenative language?

To understand concatenative programming, imagine a world where every value is a function, and the only operation allowed is function composition. Since function composition is so pervasive, it is implicit, and functions can be juxtaposed in order to compose them. So if f and g are two functions, their composition is just f g (unlike in mathematical notation, functions are read from left to right, so this means first execute f, then execute g).

This requires some explanation, since we know functions often have multiple inputs and outputs, and it is not always the case that the output of f matches the input of g. For instance, g may need access to values computed by earlier functions. But the only thing that g can see is the output of f, so the output of f is the whole state of the world as far as g is concerned. To make this work, functions have to thread the global state, passing it to each other.

There are various ways this global state can be encoded. The most naive would use a hashmap that maps variable names to their values. This turns out to be too flexible: if every function can access any piece of global state, there is little control on what functions can do, little encapsulation, and ultimately programs become an unstructured mess of routines mutating global variables.

It works well in practice to represent the state of the world as a stack. Functions can only refer to the topmost element of the stack, so that elements below it are effectively out of scope. If a few primitives are given to manipulate a few elements on the stack (e.g., swap, that exchanges the top two elements on the stack), then it becomes possible to refer to values down the stack, but the farther the value is down the stack, the harder it becomes to refer to it.

So, functions are encouraged to stay small and only refer to the top two or three elements on the stack. In a sense, there is no distinction between local and global variables, but values can be more or less local depending on their distance from the top of the stack.

Notice that if every function takes the state of the whole world and returns the next state, its input is never used anymore. So, even though it is convenient to think of pure functions as receiving a stack as input and outputting a stack, the semantics of the language can be implemented more efficiently by mutating a single stack.