Monday, October 20, 2014

Self-Replication in a Combinatorial Soup

About the spontaneous emergence of self-replication in combinatory logic and its implementation in Java. We explore some variations of combinatory logic, such as SKI, BCKW and the one-point basis. The use of anonymous Java classes provides a simple and elegant implementation.

Introduction


Evidence for evolution of organisms in an ecosystem is ubiquitous. The power of Darwinism in explaining the variation in life forms on earth is overwhelming. Experiments in artificial life have been able to reproduce many aspects of evolution.

The remaining mystery of life on earth lies in the origins of the first living organisms. Tierra and Avida both start off with a population of "functioning" organisms and focus on the evolution of their code base, a metaphor for the genome. This post is about that first step: the transition of dead matter into the first biological agents with some properties of life.

Simulating chemistry is computationally expensive. It would also be hard to create an environment with more potential than the one we have around us. Therefore, we will be using combinatory logic as a metaphor for chemistry: more expressive, computationally simpler, and still Turing-equivalent.

Combinatory Logic


Combinatory logic refers to variable-free variants of the lambda calculus. Combinatory logic offers function application as its sole operation. Function application is denoted by juxtaposition and associates to the left. Brackets are used to specify a different association. As in lambda calculus, all computation is realized through term rewriting, also called reduction.

One variant of combinatory logic, SKI Combinator Calculus, uses an alphabet of three letters, I, K and S, also called combinators. The rewrite rules for this calculus are few and simple:
Ix -> x
Kxy -> x
Sfgx -> fx(gx)
Here, the combinator I represents the identity function, as, applied to x, it yields x. The combinator K is the constant function generator, as K applied to x is the constant x-function. Finally, the combinator S is called the steering function. It distributes its third argument over the first and the second argument, and yields the application of both results.

In order to get some familiarity with term rewriting, we will write out the reduction of SKK applied to x.
SKKx
-> Kx(Kx)
-> x
The reduction shows that SKK applied to x yields x, for any x. Hence, SKK is semantically equivalent to the I combinator, and the latter is redundant in SKI combinator calculus. This means we could make do with the combinators K and S only, without losing computational power.

Combinatorial Soup


For our experiment we will simulate a combinatorial soup, a metaphor for the primordial soup, as a pool of expressions in SKI combinator calculus. The pool will be initialized with random combinators out of the alphabet, I, K and S. Chemical interaction will be replaced by function application, meaning that for randomly chosen expressions a and b, a will be replaced by ab (a applied to b) and b will be replaced by ba (b applied to a).

In our experiment, there is no locality: interactions occur between two randomly chosen expressions, regardless of their location. One might say the soup is served very hot.
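
As an illustration, the core of such a simulation could look as follows in Java, using the Combinator classes presented at the end of this post and the normalize helper sketched there. The pool size and step count are arbitrary choices, not the exact parameters of the experiment.
List<Combinator> pool = new ArrayList<>();
Combinator[] alphabet = { new I(), new K(), new S() };
Random random = new Random();
// initialize the pool with random primitive combinators
for (int i = 0; i < 10000; i++) {
   pool.add(alphabet[random.nextInt(alphabet.length)]);
}
// let randomly chosen expressions interact by function application
for (int step = 0; step < 1000000; step++) {
   int i = random.nextInt(pool.size());
   int j = random.nextInt(pool.size());
   if (i == j) continue;
   Combinator a = pool.get(i);
   Combinator b = pool.get(j);
   pool.set(i, normalize(a.apply(b)));   // a is replaced by ab
   pool.set(j, normalize(b.apply(a)));   // b is replaced by ba
}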

Self-Replication


Given this setup, on most occasions the simulation results in a homogeneous soup, populated with specific combinatorial expressions. The shortest of these expressions, consisting of only 10 primitive combinators, is:
S(K(SSK))(K(K(SSK)))
This expression is self-replicating: when applied to any other expression, it yields a copy of itself, as illustrated by the following reduction sequence:
S(K(SSK))(K(K(SSK)))x
-> K(SSK)x(K(K(SSK))x)
-> SSK(K(K(SSK))x)
-> S(K(K(SSK))x)(K(K(K(SSK))x))
-> S(K(SSK))(K(K(K(SSK))x))
-> S(K(SSK))(K(K(SSK)))
The property of self-replication is not a trivial thing. The reduction does not merely make a copy of the original, as that would be kind of cheating. In fact, copying would be impossible, as there is no introspection in combinatory logic.

During the reduction the expression is thoroughly consumed. The fact that a copy of the expression is eventually reproduced implies that the structure of the expression is somehow encoded implicitly in the expression itself.
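
The reduction above can be checked mechanically with the Java implementation presented at the end of this post. A minimal sketch, assuming the combinator classes and the normalize helper shown there, plus a toString that renders expressions in the bracketed notation used in this post:
// build S(K(SSK))(K(K(SSK))) and apply it to an arbitrary expression
Combinator s = new S(), k = new K();
Combinator ssk = s.apply(s).apply(k);
Combinator replicator = s.apply(k.apply(ssk)).apply(k.apply(k.apply(ssk)));
Combinator result = normalize(replicator.apply(new I()));
System.out.println(result);   // prints S(K(SSK))(K(K(SSK)))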

Another expression that frequently occurs is somewhat more complex, counting 11 primitive combinators:
S(K(SIK))(K(S(K(SIK))))
As before, this expression is self-replicating, as illustrated by the following reduction sequence:
S(K(SIK))(K(S(K(SIK))))x
-> K(SIK)x(K(S(K(SIK)))x)
-> SIK(K(S(K(SIK)))x)
-> I(K(S(K(SIK)))x)(K(K(S(K(SIK)))x))
-> K(S(K(SIK)))x(K(K(S(K(SIK)))x))
-> S(K(SIK))(K(K(S(K(SIK)))x))
-> S(K(SIK))(K(S(K(SIK))))
Many larger variants arise on other occasions, all sharing the same property of self-replication.

An Alternative Alphabet


Several variants of combinatory logic have been defined, each based on a specific alphabet, its own set of primitive combinators, and its own set of rewrite rules. One such variant is the BCKW system, which uses the combinators ... indeed.

The semantics of the BCKW-system are defined by the following rewrite rules:
Bxyz -> x(yz)
Cxyz -> xzy
Kxy -> x
Wxy -> xyy
Here, B and C can be seen as restricted versions of the S combinator: B passes its third argument to the argument side only, and C to the functor side only.

Repeating the experiment with a combinatorial soup based on this alphabet yields similar results. As this variant is somewhat more expressive, the shortest expression that appears counts only 7 primitive combinators:
C(K(WC))(K(WC))
Self-replication of this expression is demonstrated by the following reduction sequence:
C(K(WC))(K(WC))x
-> K(WC)x(K(WC))
-> WC(K(WC))
-> C(K(WC))(K(WC))
A variant consisting of 9 primitive combinators is:
C(C(KW)C)(C(KW)C)
The demonstration of self-replication is left as an exercise.

One-Point Basis


We will close the experiment using a variant of combinatory logic with a one-point basis. Let's define the combinator X as:
X := λx.xSK
We are using lambda notation for the definition of X, as explained on Wikipedia under combinatory logic.

Both K and S have semantic equivalents that can be expressed in terms of X:
K := X(X(XX))
S := X(X(X(XX)))
This means that the variant of combinatory logic with only X as an alphabet is computationally equivalent to the other variants. However, we do not have a reduction rule for X that is expressed in terms of X only. More problematically, if we substitute K and S in the definition of X, we get an expression that has no normal form: the reduction never stops.

As an alternative, we will reduce expressions in X using the definition in terms of K and S, and if that reduction results in a normal form, substitute K and S by their equivalents expressed in terms of X, without any further reduction. This way, we avoid the problem of infinite reduction inherent to the definition of X and we get a kind of normal form for many expressions.
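
In code, this amounts to the single rewrite rule Xx -> xSK. A sketch of the corresponding combinator class, written in the anonymous-class style presented at the end of this post (the back-substitution of K and S into X-form is not shown):
public class X implements Combinator {

   ...

   public Combinator apply(Combinator a) {
      return new Apply(this, a) {
         public Combinator reduce() {
            // the single rewrite rule: Xx -> xSK
            return a.apply(new S()).apply(new K());
         }
      };
   }

}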

Repeating the experiment with a combinatorial soup based on this one-point basis and this single rewrite rule again yields similar results. As this variant of combinatory logic is far less concise, the shortest self-replicating expression that appears requires 45 primitive combinators:
X(X(X(XX)))(X(X(XX))(X(X(X(XX)))(X(X(X(XX))))(X(X(XX)))))(X(X(XX))(X(X(XX))(X(X(X(XX)))(X(X(X(XX))))(X(X(XX))))))
As before, the demonstration of self-replication is left as an exercise, but make sure to have enough paper at hand.

Conclusions


In terms of artificial life and the mystery of the origin of life, the results of the experiment are not convincing. The fitness function is endogenous, but with the best of intentions, the resulting "organisms" cannot be called protocells. Also, the system is not "open-ended": after a while, no new forms arise and the soup becomes stable in its composition.

In terms of playing with combinatory logic and self-organization, however, the results obtained are interesting and may be an inspiration for future work. We could add locality, using a 3D structure for the ecosystem, and limit interactions in space. Also, we might introduce occasional errors in term rewriting, as a kind of mutation. This would change the proximity of "organisms" and allow the exploration of new tracks in "animal space", as introduced by Richard Dawkins in The Blind Watchmaker. It might be a first step towards an "open-ended" system.

Another orientation might be to make the fitness function exogenous, measuring resource consumption, such as the size of expressions or the number of reduction cycles required to reach a normal form. Alternatively, we could introduce fitness scoring based on the ability to perform specific tasks, in number theory for example, and evolve useful functions.

Or we could use the ability for self-interpretation as a fitness function ...

Combinator Representation and Reduction


To complete this post we will present the essentials of the implementation of combinators in Java as it was used to run these simulations. The implementation makes a clear separation between representation and reduction, such that these aspects can be used independently of one another.

Combinators are immutable. All combinators implement the Combinator interface.
public interface Combinator {
   public Combinator apply(Combinator a);
   public Combinator reduce();
}
The apply function is used to create the representation of a function application. The reduce function implements a single step in term reduction: it applies a single rewrite rule and returns the resulting combinator. If no reduction is possible, reduce returns the combinator itself, unmodified; in that case, the expression is said to be in normal form.
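
Reducing an expression to normal form is then a matter of calling reduce until a fixpoint is reached. A sketch of such a driver, with a step limit as a pragmatic guard, since some expressions have no normal form:
public static Combinator normalize(Combinator c) {
   for (int i = 0; i < 10000; i++) {
      Combinator r = c.reduce();
      // reduce() returns the same instance when in normal form
      if (r == c) return c;
      c = r;
   }
   return c;   // give up on a (probably) diverging reduction
}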

The Apply class is a generic representation of an application. It is also the sole class used to build the structure of expressions.
public class Apply implements Combinator {

   protected Combinator f;
   protected Combinator a;
 
   public Apply(Combinator f, Combinator a) {
      this.f = f;
      this.a = a;
   }

   ...

   public Combinator reduce() {
      Combinator g = f.reduce();
      if (f!=g) return g.apply(a);
      Combinator b = a.reduce();
      if (a!=b) return f.apply(b);
      return this;
   }

}
The class provides a default implementation of the reduce function that offers the semantics of lazy evaluation, also known as outer-most reduction. It tries to reduce the functor of the application; only if that fails does it try to reduce the argument of the application.

Given these building blocks, the I combinator can be realized in a simple way.
public class I implements Combinator {

   ...

   public Combinator apply(Combinator a) {
      return new Apply(this, a) {
         public Combinator reduce() {
            return a;
         }
      };
   }

}
The apply function returns an anonymous class extending the Apply class. The anonymous class overrides the reduce function to implement the semantics of the I combinator. Read this as: apply returns a combinator that, when reduced, returns its argument.
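
For example, applying I to K yields an expression that reduces to K in a single step:
Combinator k = new K();
Combinator expression = new I().apply(k);
System.out.println(expression.reduce() == k);   // prints true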

In a similar way, the K combinator can be realized.
public class K implements Combinator {

   ...

   public Combinator apply(Combinator a) {
      return new Apply(this, a) {
         public Combinator apply(Combinator b) {
            return new Apply(this, b) {
               public Combinator reduce() {
                  return a;
               }
            };
         }
      };
   }

}
As the K combinator requires two arguments in order to be reduced, nested anonymous classes are used. In this case, the first level overrides the apply function of Apply, while the second level overrides the reduce function.

And we need an extra level of anonymous classes for the implementation of the S combinator.
public class S implements Combinator {

   ...

   public Combinator apply(Combinator a) {
      return new Apply(this, a) {
         public Combinator apply(Combinator b) {
            return new Apply(this, b) {
               public Combinator apply(Combinator c) {
                  return new Apply(this, c) {
                     public Combinator reduce() {
                        return a.apply(c).apply(b.apply(c));
                     }
                  };
               }
            };
         }
      };
   }

}
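
The same pattern carries over to other alphabets. As a sketch, the W combinator from the BCKW section, with rewrite rule Wxy -> xyy, takes two levels of anonymous classes, just like K:
public class W implements Combinator {

   ...

   public Combinator apply(Combinator a) {
      return new Apply(this, a) {
         public Combinator apply(Combinator b) {
            return new Apply(this, b) {
               public Combinator reduce() {
                  // Wxy -> xyy
                  return a.apply(b).apply(b);
               }
            };
         }
      };
   }

}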

Conclusion


The combination of a single, generic representation class for application and creating nested anonymous classes in the apply function provides an elegant way to implement primitive combinators. It also significantly simplifies the implementation of currying.

Tuesday, September 16, 2014

Tuple Calculus using Java 8 Streams

About a light-weight approach to processing arbitrary tabular data sets using Java 8 streams. The encapsulation of stream functions and utilities as operations on typed tuple streams offers a coherent basis for processing tabular data sets. The approach extends naturally into a straightforward domain specific language that is easy to explain and understand.

Introduction


Recently I came across the problem of processing arbitrary tabular data sets from various sources. Data sources included query results from relational databases, search engines and a bunch of flat files. Processing had to support filtering, calculated fields, aggregation, joining, sorting ..., basically the full computational power offered by SQL, Relational Algebra or Tuple Calculus.

This post summarizes a solution to this problem using a light-weight domain specific language on top of Java 8 streams. The approach offers an encapsulation of streams where:

  • Streamed objects are tuples, supporting field access by name and position.
  • Tuple streams are typed using schemas. Schemas are used to define stream operations too.
  • Tuple streams are defined as a composition of operations. Operations encapsulate the specifics of and variations in lambda types and stream composition.
  • These tuple stream definitions are expressed using a light-weight domain specific language.

Benford's Law


In this post I will be using Benford's law to illustrate the concepts and techniques introduced.

Benford's law refers to the frequency distribution of the first digit in real-life, numerical data. One would expect a uniform distribution of digits. Counterintuitively, the distribution is not uniform: digit 1 occurs as the first digit about 30% of the time, while digit 9 occurs as the first digit less than 5% of the time, as predicted by Benford's law.

I will use the World Bank's 2013 country surface area data set and compare the frequencies predicted by Benford's law with the actual observations. The data set provides the surface area of over 200 countries. The data has been prepared in a CSV-file, say country.csv, which starts with the following content:

country       surface:Decimal
Afghanistan   652,230
Albania       28,750
Algeria       2,381,740

The country data can be processed using the following tuple stream definition:
read('country.csv')
.aggregate(
   { digit <- integer(substr(text(surface),0,1)) },
   { observation <- count()/count(*), prediction <- ln(1+1/digit)/ln(10) }
)
.sort({ digit })
The operations that constitute the tuple stream definition are:
  • The Read generator produces the content of a CSV-file as a stream of tuples. The name of the file is provided as an argument. The header row of the CSV-file is used as the schema definition of the content of the file.
  • The Aggregate operation groups the data, based on a group schema defining the tuple used for grouping and a value schema defining the aggregated values for each group. The result schema is the concatenation of the group and value schemas.
    • The digit field in the group schema is defined as the first character of the conversion of the surface area to text, converted to an integer again.
    • The observation field in the value schema defines the relative frequency for each digit, dividing the absolute frequency by the total number of entries.
    • The prediction field in the value schema defines the theoretical relative frequency for each digit using Benford's law, i.e. the base-10 logarithm of 1+1/digit.
  • The Sort operation sorts these data on the digit field.

The resulting tuple stream generates the following frequency values:

digit:Integer   observation:Decimal   prediction:Decimal
1               0.276995              0.301030
2               0.178404              0.176091
3               0.122066              0.124939
4               0.112676              0.096910
5               0.079812              0.079181
6               0.061033              0.066947
7               0.070423              0.057992
8               0.042254              0.051153
9               0.056338              0.045757

To compare the observations with the predictions we combine them into a chart:

[Chart: observed relative frequencies per digit as blue columns, Benford predictions as a red line.]

Clearly, the observed values (blue column chart) nicely match the theoretical frequencies (red line chart). The chi-square test is left as an exercise.

The remainder of this post illustrates how the tuple calculus used in the example can be realized using Java 8 streams.

Building Blocks


Before coming to the actual Java 8 streams, I will introduce some concepts and the basics of their definition in Java.

I will be working with streams of tuples, Stream<Tuple>, where Tuple is an interface. A tuple represents an ordered collection of objects and provides access to these objects by name.
public interface Tuple extends Comparable<Tuple> {
   public Schema getSchema();
   public Object get(String code);
}
Tuples are comparable, using the ordering of the values they contain. Tuples are mutable, in the sense that the values they contain can be updated. For brevity, the setter functions have been left out.

The content of a tuple is described by a Schema. Basically, a schema is an ordered collection of fields. The Schema interface provides positional access to these fields.
public interface Schema {
   public int getSize();
   public Field getField(int i);
}
Besides the access functions mentioned here, a schema provides access to fields by name and some convenience functions to add fields and get a field iterator.

A field is defined by a name and a primitive data type. As schemas are used for multiple purposes, fields are somewhat overloaded and may specify an expression for calculated fields and a sort order for schemas used for sorting and merging streams.
public interface Field {
   public String getCode();
   public AttributeType getType();
   public Expression getExpression();
   public boolean getAscending();
}
An expression is the representation of an algebraic formula that supports the calculation of field values. I will come back to this subject when talking about a domain specific language for tuple streams.

Stream Operations


Now that we have tuples, schemas and fields, we can start talking about operations. Operations are the building blocks of tuple stream definitions. A tuple stream definition is a composition of operations.

An operation typically operates on one or more source operations. An operation that has no source operations is a generator. It can be used as the starting point for a tuple stream definition. I will be talking about generators further on.

Before addressing some examples of operations, we will have a look at the Operation interface.
public interface Operation {
   public Schema getSchema();
   public Stream<Tuple> getStream();
}
An operation provides access to instances of the stream that it defines using the getStream function. It also provides access to the schema of this stream.

As a first example, consider the Filter operation, used for selecting a subset of the tuples in a stream. The selection criterion is provided as an expression that is evaluated for each tuple.

The implementation is straightforward, using the Java 8 Stream.filter function.
public class Filter extends BaseOperation {

   protected Expression condition;

   public Filter(Operation source, String condition) {
      this.condition = StreamUtils.parseExpression(condition);
      addSourceOperation(source);
   }

   public Schema getSchema() {
      return getSourceOperation(0).getSchema();
   }

   public Stream<Tuple> getStream() {
      return getSourceOperation(0).getStream()
         .filter(tuple -> condition.eval(new TupleContext(tuple)));
   }

}
The constructor of the Filter operation takes a source operation and the filter condition as arguments. As mentioned before, I will come back to expressions when talking about a domain specific language for tuple streams. For now, note that the expression is parsed once and evaluated many times, once per tuple that is processed.

The getSchema function is very simple, as the operation works on a single source stream, and the schema of the result stream is the same as the schema of the source stream. The management of source streams is implemented at the level of the BaseOperation class.

The getStream function takes the stream of the source operation and applies a filter using a Predicate that evaluates the condition expression. The TupleContext serves as the evaluation context based on the fields of the source tuple. The evaluation context merely provides a binding of field names to field values for that particular tuple.
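
As a hypothetical usage example, combining Filter with the Read generator introduced further on keeps only the countries with a surface area above one million:
Operation large = new Filter(new Read("country.csv"), "surface > 1000000");
large.getStream().forEach(System.out::println);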

As a second example, consider the Select operation, used for selecting fields based on the fields of the source stream. The operation also supports the use of calculated fields, once again using expressions. The fields and calculated fields selected are provided as a schema.

Again, the implementation is straightforward, using the Java 8 Stream.map function.
public class Select extends BaseOperation {

   protected Schema schema;

   public Select(Operation source, Schema schema) {
      this.schema = schema;
      addSourceOperation(source);
   }

   public Schema getSchema() {
      return schema;
   }

   public Stream<Tuple> getStream() {
      return getSourceOperation(0).getStream()
         .map(tuple -> TupleUtils.select(schema, tuple));
   }

}
In this case, the schema of the result stream is the selection schema provided as a constructor argument. The stream of the operation is created based on the stream of the source operation, where the selected tuple fields are calculated out of the tuples of the source stream. The TupleUtils.select function evaluates the expressions in the schema argument based on the field values of the tuple argument.
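
A hypothetical usage example, deriving the leading digit of each surface area as a calculated field, using the SchemaUtils.create convenience function discussed further on:
Schema schema = SchemaUtils.create(
   "country",
   "digit <- integer(substr(text(surface),0,1))"
);
Operation digits = new Select(new Read("country.csv"), schema);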

As a somewhat more complex example, consider the Merge operation, used for merging a number of sorted source streams with identical schemas while preserving the sort order.

The implementation is greatly simplified by the Iterators.mergeSorted function, part of Google's Guava libraries. As a result, the implementation primarily consists of wrapping streams as iterators and vice versa.
public class Merge extends BaseOperation {

   protected Schema sortSchema;

   public Merge(Iterable<Operation> sources, Schema sortSchema) {
      this.sortSchema = sortSchema;
      for (Operation source : sources) {
         addSourceOperation(source);
      }
   }

   public Schema getSchema() {
      return getSourceOperation(0).getSchema();
   }

   public Stream<Tuple> getStream() {
      List<Iterator<Tuple>> iterators = getSourceOperationList()
         .stream()
         .map(operation -> operation.getStream().iterator())
         .collect(Collectors.toList());
      Iterator<Tuple> iterator = Iterators.mergeSorted(
         iterators,
         (a, b) -> TupleUtils.compare(sortSchema, a, b)
      );
      return StreamSupport.stream(
         Spliterators.spliteratorUnknownSize(
            iterator,
            Spliterator.ORDERED
         ),
         false
      );
   }

}
The getStream function collects the iterators of the source operation streams using stream primitives and uses Iterators.mergeSorted to merge them. The TupleUtils.compare function compares tuples based on a sort schema. The function then wraps the resulting iterator as a Spliterator and converts this to a stream.

As a last example, consider the Aggregate operation. It is used to aggregate tuples in a stream based on aggregation key and value schemas.
public class Aggregate extends BaseOperation {

   protected Schema aggregateSchema;
   protected Schema valueSchema;
   protected Schema schema;

   public Aggregate(Operation source, Schema aggregateSchema,
      Schema valueSchema) {
      this.aggregateSchema = aggregateSchema;
      this.valueSchema = valueSchema;
      addSourceOperation(source);
   }

   public Schema getSchema() {
      if (schema == null) {
         schema = SchemaUtils.concat(aggregateSchema, valueSchema);
      }
      return schema;
   }

   public Stream<Tuple> getStream() {
      Map<Tuple, Tuple> map = getSourceOperation(0).getStream()
         .collect(supplier(), accumulator(), combiner());
      return map.keySet().stream().map(
         key -> TupleUtils.concat(getSchema(), key, map.get(key))
      );
   }

}
The getSchema function simply returns the concatenation of the key and value schemas. The aggregation is implemented using a Collector based on a Map<Tuple, Tuple>, with the appropriate supplier, accumulator and combiner functions. The result stream is then created by streaming the key tuples in the map with the corresponding value tuples concatenated to them.
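
The supplier, accumulator and combiner functions are not shown above. One way they could be filled in, assuming mutable value tuples, proper equals and hashCode on the key tuples, and hypothetical TupleUtils helpers for initializing, accumulating and combining aggregate values:
private Supplier<Map<Tuple, Tuple>> supplier() {
   return HashMap::new;
}

private BiConsumer<Map<Tuple, Tuple>, Tuple> accumulator() {
   return (map, tuple) -> {
      // the group key: the aggregate schema evaluated against the tuple
      Tuple key = TupleUtils.select(aggregateSchema, tuple);
      // fold the tuple into the running values for its group
      Tuple values = map.computeIfAbsent(
         key, k -> TupleUtils.initial(valueSchema));       // hypothetical helper
      TupleUtils.accumulate(valueSchema, values, tuple);   // hypothetical helper
   };
}

private BiConsumer<Map<Tuple, Tuple>, Map<Tuple, Tuple>> combiner() {
   return (left, right) -> right.forEach((key, values) ->
      left.merge(key, values,
         (a, b) -> TupleUtils.combine(valueSchema, a, b)));   // hypothetical helper
}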

Stream Generators


As mentioned before, a generator is an operation that takes no source operations. It can be used as the starting point of a sequence of operations.

As a first example of a generator, consider the Read operation, that defines a tuple stream based on the content of a CSV-file.
public class Read extends BaseOperation {

   protected String file;
   protected Schema schema;

   public Read(String file) {
      this.file = file;
   }

   public Schema getSchema() {
      if (schema == null) {
         // read only the header line and close the file afterwards
         try (Stream<String> lines = Files.lines(Paths.get(file))) {
            schema = SchemaUtils.getSchema(lines.findFirst().get());
         } catch (IOException e) {
            throw new UncheckedIOException(e);
         }
      }
      return schema;
   }

   public Stream<Tuple> getStream() {
      try {
         return Files.lines(Paths.get(file)).skip(1).map(
            line -> TupleUtils.getTuple(getSchema(), line)
         );
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
   }

}
The schema of the Read operation is defined by the header row in the CSV-file. This line should contain the separated labels of the columns in the CSV-file. Optionally, a label may be followed by a type indication, as in
count:Integer
The stream for the Read operation is created using the file stream features present in Java 8. The first line is skipped, as it is the header line. Subsequent lines are converted to tuples using a utility function.

As a second example, consider the Generate operation, which uses a start schema to generate an initial tuple and a step schema to generate subsequent tuples out of the previous tuple.
public class Generate extends BaseOperation {

   protected Schema startSchema;
   protected Schema stepSchema;

   public Generate(Schema startSchema, Schema stepSchema) {
      this.startSchema = startSchema;
      this.stepSchema = stepSchema;
   }

   public Schema getSchema() {
      return startSchema;
   }

   public Stream<Tuple> getStream() {
      return StreamSupport.stream(
         Spliterators.spliteratorUnknownSize(
            new TupleIterator(startSchema, stepSchema),
            Spliterator.ORDERED
         ),
         false
      );
   }

}
The stream for the Generate operation is created using a TupleIterator, which will evaluate expressions in the start and step schemas to generate tuples. Once again, a Java 8 Spliterator is used to wrap this iterator as a stream.
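
The TupleIterator class itself is not shown in this post. A minimal sketch of what it could look like, assuming a hypothetical TupleUtils.evaluate function that evaluates the expressions of a schema without an input tuple:
public class TupleIterator implements Iterator<Tuple> {

   protected Schema startSchema;
   protected Schema stepSchema;
   protected Tuple current;

   public TupleIterator(Schema startSchema, Schema stepSchema) {
      this.startSchema = startSchema;
      this.stepSchema = stepSchema;
   }

   public boolean hasNext() {
      return true;   // unbounded; truncate downstream, for example with limit
   }

   public Tuple next() {
      current = (current == null)
         ? TupleUtils.evaluate(startSchema)           // hypothetical helper
         : TupleUtils.select(stepSchema, current);    // step based on the previous tuple
      return current;
   }

}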

Coding Convenience


You will have noticed the presence of some convenience classes such as StreamUtils, SchemaUtils and TupleUtils. The use of such convenience classes can be extended to simplify using operations too.

In particular, the creation of schemas can be made less cumbersome by providing a varargs SchemaUtils.create function. Using this function, schemas can be created fairly easily, as in
SchemaUtils.create(
   "radius",
   "circumference <- 2*radius*pi()",
   "surface <- sqr(radius)*pi()"
)
This schema could be used to select a radius field from a source stream and complete the result stream with the circumference and the surface area of a circle with that radius. Again we are using expressions, this time not only to specify calculated values, but also to specify their binding to schema fields.

Another convenience technique is to provide constructor shortcut functions on the BaseOperation class, as in
public class BaseOperation implements Operation {

   public Filter filter(String condition) {
      return new Filter(this, condition);
   }

}
This supports chaining of operations when defining complex tuple streams, as in
BaseOperation
   .read("country.csv")
   .filter("surface > 1000")
   .sort(SchemaUtils.create("name"))
This stream definition processes countries for which at least the name and the surface area are available in a CSV-file, retaining only those with a surface area above 1,000 and sorting the result by name.

The Sort operation uses a sort schema to sort a tuple stream using the natural sort order of tuples defined by the sort schema. I will leave the implementation as an exercise.
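
For readers who want to check their solution, a sketch of a thin wrapper around Stream.sorted, using the TupleUtils.compare function introduced with the Merge operation; note that sorted buffers the entire stream before emitting tuples:
public class Sort extends BaseOperation {

   protected Schema sortSchema;

   public Sort(Operation source, Schema sortSchema) {
      this.sortSchema = sortSchema;
      addSourceOperation(source);
   }

   public Schema getSchema() {
      return getSourceOperation(0).getSchema();
   }

   public Stream<Tuple> getStream() {
      return getSourceOperation(0).getStream()
         .sorted((a, b) -> TupleUtils.compare(sortSchema, a, b));
   }

}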

Towards a Domain Specific Language


The last example illustrates that tuple stream definitions can be formulated in an expressive way given the proper convenience classes and functions. We can simplify these definitions even more by the introduction of a domain specific language for tuple stream definitions. In fact, the expression syntax used so far already constitutes part of such a domain specific language.

Parsing these expressions is done using an operator precedence parser. Such a parser supports term notation and can be configured dynamically to add support for unary and binary operators. The parser creates an abstract syntax tree for parsed expressions. I hope to address the topic of the operator precedence parser in another post.

By adding the appropriate operator definitions, the expression syntax can be extended to:
  • Support chaining for the construction of operations using the . (dot) operator.
  • Provide a syntax for schema specifications using curly braces ({}) and the <- (arrow) operator.

More Operations


On top of the operations already mentioned in this post, the entire collection of stream functions provided by the Java 8 Streams library can be lifted to make them available as operations, including concat, distinct, limit, skip ..., each time leveraging the schema and expression features addressed in this post.
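
Most of these liftings are one-liners on top of the corresponding stream function. As a sketch, a hypothetical Limit operation:
public class Limit extends BaseOperation {

   protected long size;

   public Limit(Operation source, long size) {
      this.size = size;
      addSourceOperation(source);
   }

   public Schema getSchema() {
      return getSourceOperation(0).getSchema();
   }

   public Stream<Tuple> getStream() {
      return getSourceOperation(0).getStream().limit(size);
   }

}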

In the context of accessing data of many different source systems, one may imagine generators similar to the Read operation, such as a Sql operation to fetch data from a relational database, a Solr operation to fetch data from a Solr server, and even a Sax operation to stream the content of an XML-file as a sequence of SAX events.

Presenting aggregated results in a tabular layout can be supported by a Matrix operation that transforms a tuple stream with row labels, column labels and values into a matrix. The inverse operation Unwind (to give it a name) may convert data in a matrix layout to a tuple stream with row labels, column labels and values.

Conclusions


The encapsulation of Java 8 stream functions and utilities as operations on typed tuple streams offers a coherent basis for processing tabular data sets. Java 8 stream features can be lifted to the level of operations, and the approach adds a descriptive level for stream definitions. The focus of operations is mainly on the composition of streams, while Java 8 takes care of the actual processing and resource management.

The approach extends naturally into a straightforward domain specific language that is easy to explain and understand. The resulting formalism is light-weight, but it equals the computational power of SQL, Relational Algebra and Tuple Calculus. It provides a rich set of operations that offer excellent expressivity for data processing in a variety of contexts.