This project was stopped because no PhD student was available. However, a library has been developed that can be used by a parser or programmer: see the SPMDlib pages.
Keywords: parallel processing, programming tools, preprocessor, directives, loop optimization, virtual topologies, speedup estimation, SPMD architectures
Just as manufacturers of SMP and other architectures provide compiler
directives for loop parallelization, e.g. SGI's C$DOACROSS, the idea is to develop
a set of directives for SPMD message passing that can be expanded into full
source code by means of a translator or preprocessor.
The basic idea is that message passing almost
always involves the same code fragments, so high-level directives will
free the programmer from writing them by hand, saving a lot of time.
At this moment there are two parallelization
levels under study: independent code blocks that can be executed in parallel and
DO loops (the code blocks may contain DO loops to be executed on node sub-partitions).
The main application will be image processing, where there are many processing loops
over pixel arrays, but ParTools might prove important for other areas as well.
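As an illustration of the loop level, the following is a minimal sketch of how a loop over the rows of a pixel array could be split into contiguous blocks, one per node. It is a sequential Python simulation, not ParTools output: the names `block_bounds` and `process_rows` and the doubling of each pixel are hypothetical and only stand in for whatever a directive would expand into.

```python
def block_bounds(n_iterations, n_nodes, node_id):
    """Return the half-open range [start, stop) of loop iterations for one node."""
    base, extra = divmod(n_iterations, n_nodes)
    # The first `extra` nodes each take one additional iteration.
    start = node_id * base + min(node_id, extra)
    stop = start + base + (1 if node_id < extra else 0)
    return start, stop


def process_rows(image, n_nodes):
    """Simulate an SPMD loop over image rows: each node handles its own block."""
    result = [None] * len(image)
    # On a real machine the node blocks would run concurrently;
    # here they are visited one after the other.
    for node_id in range(n_nodes):
        start, stop = block_bounds(len(image), n_nodes, node_id)
        for row in range(start, stop):
            # Placeholder pixel operation (assumption: doubling each value).
            result[row] = [2 * pixel for pixel in image[row]]
    return result
```

A block distribution keeps each node's rows contiguous, so a node only needs one slice of the array; other distributions (e.g. cyclic) are equally possible.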
Our aim is to develop tools that semi-automatically convert original sequential
Fortran 77/90 code, annotated with preprocessor directives, into new Fortran 77/90
code that compiles and runs under EPX on the Parsytec CC and similar systems.
To this end we develop a library of preprocessor directives to be inserted
by the programmer. Aspects to be covered are:
- selection of virtual topology
- independent code block execution
- loop parallelization
- automatic variable and array updating
- selection of the number of processors to be used
- minimum speedup estimation
If you want to see how it might work, with programming examples, have a look at the f2cc simulated manpage. NOTE: this is just an idea and we are thinking hard about what to include and how... If you have any ideas, please let us know!
To be discussed...
One of the preprocessor directives to be created concerns estimating the parallelization speedup, for cases where the programmer is not certain how the cost of all communications compares with the gain from splitting loops and code blocks. For a given environment with a number of processors, a certain virtual topology and array size, the idea is to analyse the code to be executed in parallel and to estimate the speedup for different numbers of processors. Based on these estimates, a decision can be taken whether to parallelize or not.
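A minimal sketch of such an estimate is given below. It rests on a deliberately simple model that is an assumption, not part of the project: the work divides perfectly over the processors, and one node exchanges a fixed number of messages with each of the other nodes in turn. The function names and parameters are hypothetical.

```python
def estimate_speedup(t_sequential, n_procs, t_message, messages_per_proc):
    """Estimate speedup under a simple model: divisible work plus communication cost."""
    if n_procs == 1:
        return 1.0  # no communication, no gain
    t_compute = t_sequential / n_procs
    # Assumed communication model: one node exchanges `messages_per_proc`
    # messages with each of the other nodes, one after the other.
    t_comm = t_message * messages_per_proc * (n_procs - 1)
    return t_sequential / (t_compute + t_comm)


def best_processor_count(t_sequential, t_message, messages_per_proc, candidates):
    """Return (processor count, estimated speedup) for the best candidate."""
    estimates = {
        p: estimate_speedup(t_sequential, p, t_message, messages_per_proc)
        for p in candidates
    }
    best = max(estimates, key=estimates.get)
    return best, estimates[best]
```

If `best_processor_count` returns 1, the model predicts that communication outweighs the gain and the code is better left sequential. Real estimates would need measured message costs for the target machine and topology.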
Is any virtual topology better than the others for a given problem? To minimize communication overhead, it would be nice if the preprocessor could give advice on the topology the programmer has chosen in a directive, or help an indecisive user by suggesting one. The preprocessor could produce speedup estimates for e.g. star and ring topologies and ask the programmer for confirmation. Another question is whether there is a way to recognize a preferred topology for the given data structure(s). Example: determining the maximum of a big array can be done by means of a star topology, whereas adding two arrays and updating the result on all nodes can be done in a ring. These two operations could be done in a single loop using both topologies at the same time.
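The two operations in the example can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: each node is represented by one list (its block of the array), the function names are hypothetical, and the message counts simply follow from the simulated communication pattern rather than from any measurement.

```python
def star_maximum(chunks):
    """Star topology: every outer node sends its local maximum to the centre (node 0)."""
    local_maxima = [max(chunk) for chunk in chunks]
    messages = len(chunks) - 1  # one message per outer node
    return max(local_maxima), messages


def ring_add_and_update(a_chunks, b_chunks):
    """Ring topology: each node adds its own blocks, then the result blocks
    circulate around the ring until every node holds the complete sum."""
    n = len(a_chunks)
    own = [[x + y for x, y in zip(a, b)] for a, b in zip(a_chunks, b_chunks)]
    # known[i][k] is result block k as stored on node i (None until received).
    known = [[own[k] if k == i else None for k in range(n)] for i in range(n)]
    for step in range(n - 1):
        for i in range(n):
            k = (i - step) % n  # block that node i forwards in this step
            known[(i + 1) % n][k] = known[i][k]
    return known, n - 1  # n - 1 communication steps around the ring
```

In the star, the centre receives one value from each node; in the ring, every node only talks to its neighbour, and after `n - 1` steps all nodes hold the updated array.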
Please send comments to Hans du Buf.
Last update: June 2001