Saturday, December 24, 2022

Tiny Prime Combinations

Writing a prime number generator in lambda calculus, starting from first principles. Getting our hands dirty on syntax, term representation, term reduction, Church numbers and lists, context-free recursion and the Y-combinator, lazy evaluation and infinite data structures, and many more shenanigans. Going from zero to hero, and fitting it on a single A4.

Introduction


We first came up with this idea about three decades ago, but never really got to it. The idea: to write Eratosthenes' prime sieve in lambda calculus, building it from first principles.

Inspired by a prime number generating expression in Miranda (Haskell's predecessor, say), using list comprehension, something like:
[ p <- [1..] | #[ d <- [1..p] | p%d = 0 ] = 2 ]
And thereby convinced that it (the idea) could be done on a single A4.

Despite lambda stuff being ubiquitous on the Internet today, we persisted in taking the test, and came up with a late arrival. And yes, adorned with generous comments, it fits a single A4, as demonstrated at the end of this post under Prime Number Generation from First Principles.

By the way, we stopped pasting code snippets in the text (apart from lambda fragments, that is), but made the entire implementation available on GitLab Suskewiet.

And oh yeah, you're right. It's not Eratosthenes', but another variation ...

Prime Number Generation using List Comprehension


List comprehension is a technique where lists are formally described, typically using a fairly mathematical notation, very expressive (as in short), and with the flavour of executable specifications.

We'll explain it by illustration, using the Miranda prime number generator:
[ p <- [1..] | #[ d <- [1..p] | p%d = 0 ] = 2 ]
First of all, don't pick on the syntax; our Miranda knowledge is a little rusty.

The list is opened using a square bracket, where one can start enumerating elements, or alternatively, describe the content of the list. Over here, we start reading, and it says:

The list of all p, elements of the set of natural numbers, ...

The notation [1..] is a list enumerating its elements, starting from 1, but without an upper bound. This is indeed an infinite list, so in order to work with it, we need a computational context that supports lazy evaluation, such as lambda calculus.

With the vertical bar |, we continue reading ... for which ...

Following is a condition that constrains the elements that are part of the list. In this case, the condition is on the number of elements in a list described by a nested expression.

We continue reading from the hash symbol #, an operator giving the length of a list, ... the cardinality of the list of all d, elements of the range from 1 to p, ...

This time a range with an upper bound. Again with a limiting condition ... for which the remainder after dividing p by d is 0, ...

We refer, incorrectly, but equally clearly, to the modulo operator using the percent sign %, and to the equality operator using the equals sign =.

Consequently, the nested expression turns out to describe the list of divisors of p. We continue reading, one level up, back to the cardinality ... is 2.
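The same reading can be transliterated into a Python generator expression, which has inherited exactly this comprehension style. This is our own illustrative sketch, not part of the original Miranda code; the variable names mirror the expression above.

```python
from itertools import count, islice

# The list of all p from [1..] for which the number of divisors d of p
# (1 <= d <= p, p % d == 0) is exactly 2. count(1) plays the role of [1..],
# and the generator expression stays lazy, like the Miranda list.
primes = (p for p in count(1)
          if sum(1 for d in range(1, p + 1) if p % d == 0) == 2)

first_five = list(islice(primes, 5))
print(first_five)  # → [2, 3, 5, 7, 11]
```

Note that, as with the Miranda original, laziness is essential: `primes` ranges over an infinite sequence, and `islice` forces only as much of it as requested.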

List Comprehension Primitives


List comprehension can be implemented (transformed into executable code) using a limited number of primitives. Besides the generation of finite and infinite enumerated lists, for our prime number generator, we only need the filter function to select data, denoted by the vertical bar |.

Generically, one would also need the map function, to manipulate data, and the fold function, for aggregation, to provide maximal power and expressivity in list comprehension.
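For illustration, here are the three primitives in present-day Python, where they survive as built-ins (a sketch of ours, not tied to the lambda implementation):

```python
from functools import reduce

xs = [1, 2, 3, 4, 5]
evens   = list(filter(lambda x: x % 2 == 0, xs))  # select data
squares = list(map(lambda x: x * x, xs))          # manipulate data
total   = reduce(lambda a, b: a + b, xs, 0)       # aggregate data
print(evens, squares, total)  # → [2, 4] [1, 4, 9, 16, 25] 15
```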

Nice to see how these concepts, originating a long time ago in pure functional programming, found their way into nearly every contemporary programming language. Second best, after world dominance by Haskell.

The Grand Design


In order to implement such a prime number generator from scratch, we will need to climb up the ladder of abstraction, addressing the following topics:
  • Select a representation of natural numbers, such as the Church encoding.
  • Define a number of arithmetic and relational operations on these numbers.
  • Select a representation for boolean values and define a number of logical operations.
  • Introduce the Y combinator for the implementation of recursive functions.
  • Select a representation of lists, with basic list operations.
  • Implement the list primitives required for list comprehension.
  • Represent the infinite list of natural numbers.
  • Use all of these to implement the prime number generator.

No worries. All of these building blocks can easily be gathered from various sources on the Internet.

Getting our Hands Dirty


However, to get to the bottom of things, we will also implement lambda calculus from scratch in Java, including a tokenizer and a parser for lambda expressions and term rewriting for actual computation. This will allow us to play with the expressions, and effectively test the prime number generation.

The full source code, including unit tests and all the lambda snippets presented in this post, is publicly accessible on GitLab Suskewiet.

In what follows, we will gradually climb up the aforementioned abstraction ladder, step by step, starting from first principles, to the effective calculation of prime numbers.

First Principles


The only data type in lambda calculus is a function. The only operation is function application. Function application is denoted by juxtaposition. It associates to the left. Brackets can be used to override default association.
a b c = (a b) c ≠ a (b c)
Besides function application, abstraction can be used to lift an expression to a function. Many sources use the lambda and dot characters to denote abstraction, as in λx.e, often shortened to x.e. In this post, we will use the somewhat more programming-oriented backslash operator to denote abstraction, as in (\x e).

All of these refer to a function with formal parameter x and function body e. Application of such a function on an argument a is done by substitution of (free) occurrences of x in e by a:
(\x e) a -> e[ x/a ]
Lots of notational conventions here. Abstraction runs to the end of an expression or to the first unmatched closing bracket, hence the brackets. The arrow denotes rewriting the term, also referred to as reduction. We use square brackets to define a substitution, in this case consisting of a single replacement, of variable x by expression a, and apply that substitution to expression e by putting it after the expression.

As an example, consider the application of the identity function to an expression e:
(\x x) a -> x[ x/a ] -> a
Computation in lambda calculus is done by rewriting a term until no more rewrite rules can be applied. The term is then said to be in normal form. Some terms will never reach normal form, if reduction does not terminate, as in
(\x x x) (\x x x) -> (\x x x) (\x x x) -> ...
Many sources provide more detailed and more correct explanations of lambda terms and reduction rules. You may want to start with the Wikipedia page on Lambda Calculus.

Church Numbers


There's a lot more to it, but essentially you can capture the natural numbers by defining the number 0 and a successor function. Church encoding, named after Alonzo Church, does it like this:
zero = \f \x x
succ = \n \f \x f (n f x)
Let's see what this means for the representation of the number 1:
succ zero
-> (\n \f \x f (n f x)) zero
-> \f \x f (zero f x)
-> \f \x f ((\f \x x) f x)
-> \f \x f ((\x x) x)
-> \f \x f x
And the number 2:
succ (succ zero)
-> (\n \f \x f (n f x)) (succ zero)
-> \f \x f (succ zero f x)
-> \f \x f ((\n \f \x f (n f x)) zero f x)
-> \f \x f ((\f \x f (zero f x)) f x)
-> \f \x f ((\x f (zero f x)) x)
-> \f \x f (f (zero f x))
-> \f \x f (f ((\f \x x) f x))
-> \f \x f (f ((\x x) x))
-> \f \x f (f x)
Apparently, the number represented by the term is reflected by the number of occurrences of f in the term.

The derivation illustrates that our reduction supports making named references to expressions that have been defined previously. We will come back to this when providing some Notes on the Implementation. In what follows we will extend identifier resolution to natural numbers, merely to simplify expressions and increase readability, as in
2 -> \f \x f (f x)
We will also need the predecessor function, which is somewhat obscurely defined as:
pred = \n \f \x n (\g \h h (g f)) (\u x) (\u u)
Again, lots of material available, such as on the Wikipedia page on Church Encoding. There, you will also find the derivation of the pred function.
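Church numbers translate almost verbatim into Python lambdas, which gives a cheap way to experiment. This is our own sketch, with a hypothetical helper to_int that counts the applications of f (it is not part of the encoding itself):

```python
# Church numbers as Python lambdas.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
pred = lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)

# Hypothetical helper: convert a Church number to a Python int by
# letting the number apply an increment function n times to 0.
to_int = lambda n: n(lambda k: k + 1)(0)

two = succ(succ(zero))
print(to_int(two), to_int(pred(two)), to_int(pred(zero)))  # → 2 1 0
```

Note that pred applied to zero yields zero again, a property we will rely on later.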

Arithmetic


Undoubtedly no coincidence, but the Church encoding lends itself very well to defining arithmetic in a simple and expressive way. Consider the following definition of add, a function taking two number parameters that returns the sum of those numbers:
add = \m \n n succ m
In order to understand how it works, note that the implementation starts by applying the number n to the successor function. Let's apply the number 2 to the successor function and see what happens:
2 succ
-> (\f \x f (f x)) succ
-> \x succ (succ x)
-> ...
We interrupt the reduction at the point where it becomes apparent that the number 2, applied to the successor function, reduces to a function that applies the successor function twice. This is a general property, where the representation of a number n, when used as a function, represents the n-fold application, through composition, of its argument.

Consequently, adding 1 and 2 ...
add 1 2
-> (\m \n n succ m) 1 2
-> (\n n succ 1) 2
-> 2 succ 1
-> ...
... reduces to applying the successor function to 1 twice, reducing further to the representation of 3.

In fact, add is a function of one parameter that returns a function that in turn will consume another parameter. This is a principle referred to as currying, named after Haskell Curry, which applies across the board in the context of lambda calculus, and is very common in functional programming in general.

Completely analogously, we can define arithmetic subtraction as
sub = \m \n n pred m
Other operations can be defined with similar elegance, but with the exception of the modulo operator, we don't need them for our purpose. The modulo operator itself requires some more preparation and will be addressed after introducing the Y-combinator.
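The arithmetic also runs as-is in Python. A self-contained sketch (to_int is again our hypothetical conversion helper):

```python
# Church numbers, repeated so the snippet stands alone.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
pred = lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)
to_int = lambda n: n(lambda k: k + 1)(0)

# n succ m: apply succ to m, n times. Likewise for sub with pred.
add = lambda m: lambda n: n(succ)(m)
sub = lambda m: lambda n: n(pred)(m)

one, two, three = succ(zero), succ(succ(zero)), succ(succ(succ(zero)))
print(to_int(add(one)(two)))   # → 3
print(to_int(sub(two)(three))) # → 0, subtraction is truncated at 0
```

The last line shows that sub is truncated: since pred of 0 is 0, subtracting a larger number yields 0, which is exactly what the relational operations below will exploit.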

Booleans and Logical Operations


As for numbers, working with boolean values requires a representation of the values and some basic operations on those values. We will start with the values:
true = \a \b a
false = \a \b b
Here we see that true consumes two arguments and returns the first, while false consumes two arguments and returns the second. We assume this is clear from the definitions, but try it out if not.

As for numbers, this does not mean anything much by itself, but gets meaning in combination with the operations. The boolean operations we need are if for branching and not for logical negation:
if = \p \a \b p a b
not = \p \a \b p b a
and = \p \q p q p
The if function consumes three arguments, the condition, the then-part and the else-part. It applies the condition, assumed to reduce to a boolean value, on these parts. Knowing the effect of using boolean values as a function, you can guess what comes out ...
if true a b
-> (\p \a \b p a b) true a b
-> (\a \b true a b) a b
-> (\b true a b) b
-> true a b
-> (\a \b a) a b
-> (\b a) b
-> a
The not function is the same, but swaps the last two arguments, implicitly negating the condition.

The and function consumes two arguments, and uses the first one as a boolean applied to both arguments. If the first argument reduces to true, it returns the second argument. Otherwise, it returns the first argument. Exactly as one would expect from a logical and function.
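The boolean encoding can be tried out in Python as well. One caveat: Python is strictly evaluated, so this if_ evaluates both branches before selecting one; that is harmless for plain values, but it is not the lazy if we will rely on for recursion later. Again a sketch, with a hypothetical to_bool helper:

```python
true  = lambda a: lambda b: a
false = lambda a: lambda b: b

if_  = lambda p: lambda a: lambda b: p(a)(b)
not_ = lambda p: lambda a: lambda b: p(b)(a)
and_ = lambda p: lambda q: p(q)(p)

# Hypothetical helper: a Church boolean selects between True and False.
to_bool = lambda p: p(True)(False)

print(to_bool(if_(true)(true)(false)))   # → True
print(to_bool(and_(true)(not_(false)))) # → True
```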

Relational Operations


Next we need relational operations, on numbers, that is.
iszero = \n n (\x false) true
leq = \m \n iszero (sub m n)
eq = \m \n and (leq m n) (leq n m)
lt = \m \n not (leq n m)
To understand the iszero function, first note that (\x false) is the constant false function. Applied to any argument, this function will return false.

The iszero function applies its argument, assumed to represent a number, to the constant false function. We learnt earlier that this will repeatedly apply the constant false function, exactly n times, in this case to the value true. After a single application, the result will be false, and remain false. Only for the number 0, without any application of the constant false function, the result will be true.

The remaining functions are less-than-or-equal, equal and less-than, respectively. Their definition boils down to plain functional programming. We will assume them to be self-explaining.

With the exception of the leq function maybe, where we note that 0 is a fixed point of the pred function, making sub a truncated subtraction that never goes below 0:
pred 0
-> (\n \f \x ((n \g \h h (g f)) \u x) \u u) 0
-> \f \x ((0 \g \h h (g f)) \u x) \u u
-> \f \x (((\f \x x) \g \h h (g f)) \u x) \u u
-> \f \x ((\x x) \u x) \u u
-> \f \x (\u x) \u u
-> \f \x x
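The relational operations run in Python too. A self-contained sketch, combining the earlier encodings (to_bool is again our hypothetical conversion helper):

```python
# Church numbers and truncated subtraction.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
pred = lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)
sub  = lambda m: lambda n: n(pred)(m)

# Church booleans and logical operations.
true  = lambda a: lambda b: a
false = lambda a: lambda b: b
not_  = lambda p: lambda a: lambda b: p(b)(a)
and_  = lambda p: lambda q: p(q)(p)

# Relational operations: leq works because sub truncates at 0.
iszero = lambda n: n(lambda x: false)(true)
leq = lambda m: lambda n: iszero(sub(m)(n))
eq  = lambda m: lambda n: and_(leq(m)(n))(leq(n)(m))
lt  = lambda m: lambda n: not_(leq(n)(m))

to_bool = lambda p: p(True)(False)
one, two = succ(zero), succ(succ(zero))
print(to_bool(lt(one)(two)), to_bool(eq(two)(two)), to_bool(leq(two)(one)))  # → True True False
```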

The Issue with Recursion


The next thing we need is recursion. For all kinds of list utilities, and for some additional arithmetic.

Well, what is the issue here? We can perfectly well define the factorial function, for example, using the building blocks we have so far:
fac = \n if (leq n 1) 1 (mul n (fac (pred n)))
No, we can't. Although the definition seems to be just like anything we've done before, in fact, it isn't. The difference is in the self-reference.

Up to now, we've named expressions merely for convenience, readability. Lambda calculus in itself does not provide a naming mechanism. We've let it sneak in silently, and it went unnoticed. That is because, up to now, we could easily eliminate the names by replacing them with their definition wherever they occur. With the definition of the factorial function, due to the self-reference, this is no longer the case.

Without self-reference, all naming could be resolved statically, using a kind of let operator:
let x = a in e
Our definition of the factorial function would require dynamic substitution, during evaluation. Binding and resolving names during evaluation would jeopardize referential transparency, one of the key concepts in functional programming. We refer to the Notes on the Implementation for some practical considerations on naming and name resolution.

We can define an alternative factorial function facr, with an extra parameter to provide the function for the recursive invocation, abstracting it for the name of the recursive function:
facr = \r \n if (leq n 1) 1 (mul n (r r (pred n)))
This equally calculates the factorial, without self-reference, invoked as in
facr facr 3
However, having to provide the function as an argument twice is inconvenient.

The Y-combinator will come to the rescue.

The Y-Combinator


The Y-combinator is a lambda expression that helps in defining context-free recursive functions. It, or at least one variant, is defined as
Y = \f (\x f (x x)) (\x f (x x))
Applying it to a function, and rewriting it, we get
Y f
-> (\f (\x f (x x)) (\x f (x x))) f
-> (\x f (x x)) (\x f (x x))
-> f ((\x f (x x)) (\x f (x x)))
-> ...
The reduction sequence will never terminate. Indeed, this expression does not have a normal form.

However, somewhat out of nowhere, let's look at the following reduction:
f (Y f)
-> f ((\f (\x f (x x)) (\x f (x x))) f)
-> f ((\x f (x x)) (\x f (x x)))
-> ...
Again, no termination. But we see the same term arising as in the previous reduction sequence.

In fact, we just demonstrated that (Y f) is a fixed-point of f, for any f:
f (Y f) ~> Y f
Note the difference in notation for this reduction with the ~> arrow. Although these terms are semantically equal, we did use some rewrite rules right-to-left to prove it.

We can use this property to eliminate self-reference for recursion. Consider the original, single-argument definition of the factorial function, abstracting over the function name
facr = \r \n if (leq n 1) 1 (mul n (r (pred n)))
and then defining fac as
fac = Y facr
Now consider the reduction sequence for the factorial of 3:
fac 3
-> Y facr 3
~> facr (Y facr) 3
-> (\r \n if (leq n 1) 1 (mul n (r (pred n)))) (Y facr) 3
-> (\n if (leq n 1) 1 (mul n (Y facr (pred n)))) 3
-> if (leq 3 1) 1 (mul 3 (Y facr (pred 3)))
-> mul 3 (Y facr 2)
~> mul 3 (fac 2)
Note again the use of the alternative arrow ~> for upstream reductions. Once to reduce the Y-combinator in an appropriate way, and once to show that the recursion effectively works as expected: the factorial of 3 is 3 times the factorial of 2. Once more, we refer to the Notes on the Implementation, this time for some practical considerations on the reduction of the Y-combinator.

In summary, we define a self-referencing version of the factorial function, abstract it for recursion and apply the Y-combinator:
fac = Y \r \n if (leq n 1) 1 (mul n (r (pred n)))
Repeat, this time for our ultimate objective of generating prime numbers, where we will need the modulo operator.

We define a self-referencing version of the mod function:
mod = \n \d if (lt n d) n (mod (sub n d) d)
We abstract that function for recursion:
modr = \r \n \d if (lt n d) n (r (sub n d) d)
And we apply the Y-combinator to eliminate the extra function argument:
mod = Y modr = Y \r \n \d if (lt n d) n (r (sub n d) d)
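The same trick can be tested in Python, with one twist: under Python's strict evaluation the plain Y-combinator loops forever, so we use its eta-expanded variant, the Z-combinator, which delays the self-application behind a lambda. The sketch below uses native Python numbers and conditionals for readability; only the recursion scheme is the point here.

```python
# Z-combinator: Y adapted for strict evaluation by eta-expanding x x.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Recursion without self-reference: the extra parameter r stands for
# "the function itself", and Z ties the knot.
fac = Z(lambda r: lambda n: 1 if n <= 1 else n * r(n - 1))
mod = Z(lambda r: lambda n: lambda d: n if n < d else r(n - d)(d))

print(fac(3), mod(7)(3))  # → 6 1
```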

Lists and Basic List Operations


For the last time, we will be introducing some constructs for a data structure and related operations, this time for list representation and basic operations:
nil = false
cons = \x \y \z z x y
head = \p p true
tail = \p p false
isnil = \l l (\h \t \d false) true
Note that cons, head and tail are usually introduced as pair, first and second, respectively. Indeed, a list is represented as a pair, the first element being the head of the list and the second element being the tail of the list, with nil as a representation of the empty list.

We will quickly verify the cooperation of these elements, starting with a list with head h and tail t:
cons h t
-> (\x \y \z z x y) h t
-> (\y \z z h y) t
-> \z z h t
Checking the head function:
head \z z a b
-> (\p p true) \z z a b
-> (\z z a b) true
-> true a b
-> a
The trick is to bring true into position to be applied to the two arguments previously captured by cons, as it will return the first argument. The tail function works in exactly the same way, but will apply false to the arguments. We leave its verification as an exercise.

As for isnil, looking at the definition, and knowing that the empty list nil is represented as false, it will obviously reduce to true. Let's consider the case where the argument list is not empty:
isnil \z z a b
-> (\l l (\h \t \d false) true) \z z a b
-> (\z z a b) (\h \t \d false) true
-> (\h \t \d false) a b true
-> (\t \d false) b true
-> (\d false) true
-> false
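The pair-based lists also run directly as Python lambdas. A self-contained sketch, using Python ints as list elements for readability:

```python
true  = lambda a: lambda b: a
false = lambda a: lambda b: b

# A list is a pair: cons captures head and tail, and hands them to a
# selector z; head and tail select with true and false respectively.
nil   = false
cons  = lambda x: lambda y: lambda z: z(x)(y)
head  = lambda p: p(true)
tail  = lambda p: p(false)
isnil = lambda l: l(lambda h: lambda t: lambda d: false)(true)

to_bool = lambda p: p(True)(False)
l = cons(1)(cons(2)(nil))
print(head(l), head(tail(l)))                   # → 1 2
print(to_bool(isnil(nil)), to_bool(isnil(l)))   # → True False
```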

Some More List Operations


Now that the last data type has been introduced, the cryptic part is over. From here on, it becomes plain functional programming ...

We introduce the len function to obtain the number of elements in a list, the get function to get an element by position, and the take function to obtain a list with the first so many elements of a list.
len = Y \r \l if (isnil l) 0 (succ (r (tail l)))
get = Y \r \l \n if (leq n 0) (head l) (r (tail l) (pred n))
take = Y \r \l \n if (leq n 0) nil (cons (head l) (r (tail l) (pred n)))
Note the use of the Y-combinator, together with the extra abstraction for the function argument.

It should be fairly easy to understand the functions by reading the definition. Verification is more of the same. Check it out on GitLab Suskewiet if you feel like it.
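As a taste of what verification looks like in a strict host, here is len transliterated to Python. Because Python evaluates both branches of our encoded if, the branches are delayed behind thunks (0-argument lambdas) and forced after selection; in lazily evaluated lambda calculus the plain if from the post suffices. A sketch, with Python ints as the result type for readability:

```python
true  = lambda a: lambda b: a
false = lambda a: lambda b: b
nil   = false
cons  = lambda x: lambda y: lambda z: z(x)(y)
tail  = lambda p: p(false)
isnil = lambda l: l(lambda h: lambda t: lambda d: false)(true)

# Z-combinator, the strict-evaluation variant of Y.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# if_ selects one of two thunks, then forces it: this restores the
# laziness that the pure encoding gets for free.
if_ = lambda p: lambda a: lambda b: p(a)(b)()

length = Z(lambda r: lambda l: if_(isnil(l))(lambda: 0)(lambda: 1 + r(tail(l))))

l = cons('a')(cons('b')(cons('c')(nil)))
print(length(l))  # → 3
```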

List Comprehension


Finally, we've come to the subject of list comprehension. Not too much remaining though. We only need the filter function:
filter = Y \r \p \l if (isnil l) nil (if (p (head l)) (cons (head l) (r p (tail l))) (r p (tail l)))

The Natural Numbers


The final building block we need to define the prime number generator is the set of natural numbers. Let's start by defining the numbers function, generating a list of consecutive natural numbers from a starting value upward:
numbers = \n cons n (numbers (succ n))
Using the Y-combinator and an extra abstraction to eliminate the self-reference has become routine by now:
numbers = Y \r \n cons n (r (succ n))
Then we define the set of natural numbers, excluding 0, as
numbers1 = numbers 1
This is an infinite set, but thanks to lazy evaluation lambda calculus can work with infinite structures. We cannot request the entire structure, but we can, for example, ask for a finite part of the structure:
get numbers1 6 ~> 7
Finally, we define the range function to take the sublist of the first so many natural numbers, again excluding 0:
range = \n take numbers1 n
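In a strict host, the infinite list must be made lazy by hand. A sketch of ours, representing a stream as a (head, thunk) pair where the tail is forced only on demand; the names numbers and get mirror the lambda definitions but are otherwise our own:

```python
# An infinite stream of naturals: the tail is a 0-argument thunk,
# mimicking the laziness that the pure encoding gets for free.
def numbers(n):
    return (n, lambda: numbers(n + 1))

def get(stream, i):
    # 0-based indexing, forcing one tail per step.
    h, t = stream
    return h if i <= 0 else get(t(), i - 1)

print(get(numbers(1), 6))  # → 7
```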

Prime Number Generation


We are now ready to define the set of prime numbers, according to the same principle as the Miranda snippet in the introduction, but in strict lambda notation:
divisor = \n \d iszero (mod n d)
prime = \n eq (len (filter (divisor n) (range n))) 2
primes = filter prime numbers1
The divisor function is a predicate that checks whether its second argument is a divisor of its first argument.

The prime function is a predicate that checks whether its argument is a prime number. It does so by counting the number of elements in the set of divisors of the argument, which should be 2.

The set of primes can then simply be defined by applying prime as a filter to the set of the natural numbers. We can, for example, take the fourth (0-based) prime number:
get primes 4 ~> 11
It will return 11, though it will take over ten million reductions to do so. Real-life functional programming languages come with a primitive type for numbers.
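The whole pipeline can be mirrored on the thunk-based streams from above, as a sketch: filter_s plays the role of filter, the divisor test uses Python's native % and == for brevity, and the names are ours, not the post's terms.

```python
def numbers(n):
    return (n, lambda: numbers(n + 1))

def filter_s(p, s):
    # Keep the head if the predicate holds; either way, stay lazy by
    # wrapping the filtered tail in a thunk.
    h, t = s
    return (h, lambda: filter_s(p, t())) if p(h) else filter_s(p, t())

def get(s, i):
    h, t = s
    return h if i <= 0 else get(t(), i - 1)

divisor = lambda n: lambda d: n % d == 0
prime = lambda n: sum(1 for d in range(1, n + 1) if divisor(n)(d)) == 2
primes = filter_s(prime, numbers(1))

print(get(primes, 4))  # → 11
```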

Notes on the Implementation


As stated under Getting our Hands Dirty, the full source code related to this post, in Java, is available on GitLab Suskewiet.

In what follows, we will elaborate on a number of aspects of the implementation that might justify somewhat more explanation.

Syntax

Besides the representation of a lambda term, as an abstract syntax tree (AST), and an implementation of the rewrite rules, the repository provides a tokenizer and an operator precedence parser (OPP) for lambda calculus and combinatory logic. It also provides the conversion of lambda expressions to combinatory logic and vice versa, but those are never used here.

For this post, abstraction is denoted using the backslash operator as in (\x y). The implementation equally supports using the dot operator, as in x.y.

Application associates to the left, as explained. Abstraction takes precedence over application. That is, for a correct reading:
a \b c d = a (\b c d) ≠ (a \b c) d
Precedence may be different for other dialects.

Context

According to principles of functional programming, and in support of referential transparency, naming and name resolution should be static. The implementation does however support dynamic naming and resolution. Consequently, the self-referencing definition of the factorial function, for example, does work after all:
fac = \n if (leq n 1) 1 (mul n (fac (pred n)))
fac 3 ~> 6
However, the realization of the prime number generator neither needs nor exploits this feature. That's where the Y-combinator kicks in, precisely to make the point.

Reduction

The implementation of term rewriting features outermost reduction, also known as lazy evaluation, as opposed to innermost reduction, also known as eager evaluation. This is required for working with infinite data structures, for example. Outermost reduction is implemented as follows:
  • For an identifier:
    • If the identifier is a number, resolve it to the Church representation of that number.
    • Try resolving the identifier in the context.
  • For an application:
    • If the function of the application is an abstraction, substitute the argument of the application for the variable of that abstraction in the expression of that abstraction.
    • Try reducing the function of the application.
    • Try reducing the argument of the application.
  • For an abstraction:
    • Try reducing the expression of the abstraction.
The first rule that succeeds wins. Trying succeeds if it makes a difference.
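The rule order for applications and abstractions can be sketched as a one-step reducer over a tiny AST. This is our own Python illustration, not the Java repository's code; the identifier-resolution rule and proper capture-avoiding substitution are omitted, so it is only fit for closed example terms.

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Lam:
    var: str
    body: object

@dataclass
class App:
    fun: object
    arg: object

def subst(t, x, a):
    # Naive substitution of a for x in t (no capture avoidance).
    if isinstance(t, Var):
        return a if t.name == x else t
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, subst(t.body, x, a))
    return App(subst(t.fun, x, a), subst(t.arg, x, a))

def step(t):
    # One outermost reduction step; returns (term, changed?).
    if isinstance(t, App):
        if isinstance(t.fun, Lam):                  # beta-redex first
            return subst(t.fun.body, t.fun.var, t.arg), True
        f, ok = step(t.fun)                         # then the function
        if ok:
            return App(f, t.arg), True
        a, ok = step(t.arg)                         # then the argument
        if ok:
            return App(t.fun, a), True
    if isinstance(t, Lam):                          # then under the lambda
        b, ok = step(t.body)
        if ok:
            return Lam(t.var, b), True
    return t, False                                 # normal form

# (\x x) y  →  y
t, ok = step(App(Lam('x', Var('x')), Var('y')))
print(ok, t)  # → True Var(name='y')
```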

Now remember that the Y-combinator does not have a normal form. Reducing the Y-combinator according to the recipe for outermost reduction above will never terminate, but rather exhaust your computer's resources.

In the examples we provided, the Y-combinator was reduced in a very particular way:
Y f ~> f (Y f)
Upstream, so to speak. This then creates the opportunity for the first occurrence of f to be addressed with priority in subsequent reductions.

Implementing this behaviour generically in the reduction rules is ugly. They don't tell you this in elementary school, but the nicest way to achieve it is to create the Y-combinator as a built-in combinator with its own reduction logic.

This then boils down to adding the following reduction rule:
  • For an application:
    • If the function of the application is the Y-combinator, apply the argument of the application to the term that is being reduced.
Add it to the end, such that all other reduction rules get priority over this one. That way, the Y-combinator will reduce only after all other reduction opportunities have been exhausted, and it will reduce only once before handing control back to other reduction rules.

Obviously, you will have to remove the regular definition of the Y-combinator from the context, in order for this to work.

Testing

By now you will have understood that the naming of bound variables does not make a difference. This means for example that the identity function can be written in different ways:
\x x = \y y
Generally speaking, testing for equality of terms should make abstraction of the naming of bound variables, so long as these names are used consistently throughout the terms. This is particularly relevant for writing unit tests on the reduction of terms, as term rewriting tends to introduce new names for bound variables.

In the implementation this is done by renaming bound variables in subterms before recursively checking for equality of subterms.

As a consequence, the Java Object contract requires the hashCode of a term to make the same abstraction, as it should be consistent with the implementation of equals. This has not been done yet. We will try addressing it in due course.
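Such an equality check, commonly called alpha-equivalence, can be sketched by comparing bound variables by binder depth instead of by name, essentially de Bruijn indices computed on the fly. A Python illustration of ours, with terms as plain tuples rather than the Java AST:

```python
# Terms: ('var', name) | ('lam', name, body) | ('app', fun, arg).
def alpha_eq(t, u, env_t=(), env_u=()):
    if t[0] != u[0]:
        return False
    if t[0] == 'var':
        # Bound variables compare by binder depth, free ones by name.
        i = env_t.index(t[1]) if t[1] in env_t else None
        j = env_u.index(u[1]) if u[1] in env_u else None
        return t[1] == u[1] if i is None and j is None else i == j
    if t[0] == 'lam':
        return alpha_eq(t[2], u[2], (t[1],) + env_t, (u[1],) + env_u)
    return (alpha_eq(t[1], u[1], env_t, env_u)
            and alpha_eq(t[2], u[2], env_t, env_u))

# \x x  vs  \y y: equal up to renaming of the bound variable.
print(alpha_eq(('lam', 'x', ('var', 'x')), ('lam', 'y', ('var', 'y'))))  # → True
```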

Prime Number Generation from First Principles


Here, we bring it all together. Generating prime numbers with no prior instrumentation but the basic principles of lambda calculus: function application and abstraction.
# church numbers
zero = \f \x x
succ = \n \f \x f (n f x)
pred = \n \f \x n (\g \h h (g f)) (\u x) (\u u)

# arithmetic
add = \m \n n succ m
sub = \m \n n pred m

# booleans
true = \a \b a
false = \a \b b

# logical operations
if = \p \a \b p a b
not = \p \a \b p b a
and = \p \q p q p

# relational
iszero = \n n (\x false) true
leq = \m \n iszero (sub m n)
eq = \m \n and (leq m n) (leq n m)
lt = \m \n not (leq n m)

# the Y combinator
Y = \f (\x f (x x)) (\x f (x x))

# arithmetic continued
mod = Y \r \n \d if (lt n d) n (r (sub n d) d)

# lists
nil = false
cons = \x \y \z z x y
head = \p p true
tail = \p p false
isnil = \l l (\h \t \d false) true

# list utilities
len = Y \r \l if (isnil l) 0 (succ (r (tail l)))
get = Y \r \l \n if (leq n 0) (head l) (r (tail l) (pred n))
take = Y \r \l \n if (leq n 0) nil (cons (head l) (r (tail l) (pred n)))

# list comprehension
filter = Y \r \p \l if (isnil l) nil (if (p (head l)) (cons (head l) (r p (tail l))) (r p (tail l)))

# natural numbers
numbers = Y \r \n cons n (r (succ n))
numbers1 = numbers 1
range = \n take numbers1 n

# prime number generation
divisor = \n \d iszero (mod n d)
prime = \n eq (len (filter (divisor n) (range n))) 2
primes = filter prime numbers1