Parallel programming in PLINQ

1. September 2008 10:13 by CarlosLoria in General  //  Tags:   //   Comments (0)

In our previous post on parallel programming we mentioned some pointers to models on nested data parallelism (NDP) as offered in Intel’s Ct and Haskell derivatives in the form of a parallel list or array (Post).

We want to complement our set of references by mentioning just another quite interesting project by Microsoft, namely, the Parallel Language Integrated Query, PLINQ (J. Duffy) developed as an extension of LINQ. PLINQ is a component of FX. We base this post on the indicated reference.

We find this development particularly interesting given the potentially thin relation with pattern-matching (PM) that we might be exploiting in some particular schemas, as we suggested in our post; hence, it will be interesting to take that possibility into consideration for such a purpose. That would be the case if we eventually aim at PM based code on (massive) data sources (i.e. by means of rule-based programs).

As you might know LINQ offers uniformed and natural access to (CLR supported) heterogeneous data sources by means of an IEnumerable<T> interface that abstracts the result of the query. Query results are (lazily) consumable by means of iteration as in a foreach(T t in q) statement or directly by forcing it into other data forms. A set of query operators -main of them with a nice syntax support via extensions methods- are offered (from, where, join, etc) that can be considered phases on gathering and further filtering, joining, etc, input data sources.

For PLINQ, as we can notice, the path followed is essentially via a special data structure represented by the type IParallelEnumerable<T> (that extends IEnumerable<T>). For instance, in the following snippet model given by the author:

IEnumerable<T> data = ...;
var q = data.AsParallel().Where(x => p(x)).Orderby(x => k(x)).Select(x => f(x));
foreach (var e in q) a(e);

A call to extension method AsParallel is apparently the main requirement from the programmer point of view. But, in addition, three models of parallelism are available to in order to process query results: Pipelined, stop-and-go and inverted enumeration. Pipelined synchronizes query dedicated threads to (incrementally) feed the separated enumeration thread. This is normally the default behavior, for instance under the processing schema:

foreach (var e in q) {
    a(e);
}

Stop-and-go joins enumeration so that it waits until query threads are done. In some cases this is chosen, when the query is completely forced.

Inverted enumeration uses a lambda expression provided by the user which to be applied to each element of the query. That avoids any synchronization issue but requires using the special PLINQ ForAll extension method not (yet?) covered by the sugared syntax. As indicated by the author, in the following form:

var q = ... some query ...;
q.ForAll(e => a(e));

As in other cases, however, side-effects must be avoided under this model, but this is just programmer responsibility.