Project ParTools:
Tools for semi-automatic parallelization in SPMD

This project was stopped because no PhD student was available. However, a library has been developed that can be used by a parser or programmer: see the SPMDlib pages.

Keywords: parallel processing, programming tools, preprocessor, directives, loop optimization, virtual topologies, speedup estimation, SPMD architectures


Brief description

Just as manufacturers of SMP and other architectures provide compiler directives for loop parallelization, e.g. SGI's C$DOACROSS, the idea is to develop a set of directives for SPMD message passing that a translator or preprocessor can expand into full source code. The basic idea is that message passing almost always involves the same code fragments, and high-level directives would free the programmer from writing them, saving a lot of time.
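
As a rough sketch of what such an expansion might look like, the Python fragment below rewrites a Fortran DO loop that is preceded by a directive, so that each node executes only its own contiguous block of iterations. The directive spelling `C$SPMD DO`, the variables `MYID`, `NPROCS`, `NTOT`, `ILO`, `IHI` and the generated bound formula are hypothetical, invented for this illustration; they are not ParTools or f2cc syntax, and no EPX calls are generated.

```python
import re

# Hypothetical directive, invented for this sketch; not actual ParTools/f2cc syntax.
DIRECTIVE = "C$SPMD DO"
# Matches an unlabelled DO header without a step: "DO <var> = <lo>, <hi>".
DO_HEADER = re.compile(r"^(\s*)DO\s+(\w+)\s*=\s*([^,]+),\s*([^,]+?)\s*$", re.IGNORECASE)


def expand(lines):
    """Rewrite each directive-marked DO loop so every node runs only its own block.

    MYID (0-based node number) and NPROCS are hypothetical integer variables the
    generated program is assumed to obtain from the message-passing runtime;
    NTOT, ILO and IHI are hypothetical helpers it would also have to declare.
    Lines that do not match DO_HEADER are copied through unchanged.
    """
    out = []
    pending = False  # True after a directive, until the next matching DO header
    for line in lines:
        if line.strip().upper() == DIRECTIVE:
            pending = True
            continue  # the directive itself is consumed
        match = DO_HEADER.match(line) if pending else None
        if match is None:
            out.append(line)
            continue
        indent, var, lo, hi = match.groups()
        lo, hi = lo.strip(), hi.strip()
        # Contiguous block partition using Fortran integer division.
        out += [
            f"{indent}NTOT = ({hi}) - ({lo}) + 1",
            f"{indent}ILO = ({lo}) + (MYID * NTOT) / NPROCS",
            f"{indent}IHI = ({lo}) + ((MYID + 1) * NTOT) / NPROCS - 1",
            f"{indent}DO {var} = ILO, IHI",
        ]
        pending = False
    return out
```

Applied to the lines `C$SPMD DO`, `      DO I = 1, N`, `        A(I) = 2.0 * B(I)` and `      ENDDO`, the sketch replaces the loop header with the three bound computations followed by `DO I = ILO, IHI` and leaves the loop body untouched. Updating the complete array `A` on all nodes afterwards would require further generated communication code, which this sketch does not attempt.
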
At the moment, two parallelization levels are under study: independent code blocks that can be executed in parallel, and DO loops (the code blocks may themselves contain DO loops, to be executed on sub-partitions of the nodes).
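
The two levels can be illustrated with a small planning function: the nodes are first split into one sub-partition per independent code block, and each block's DO loop is then split among the nodes of its sub-partition. This is a minimal sketch assuming contiguous block partitioning at both levels; the function names `block_range` and `two_level_plan` are hypothetical and do not come from ParTools.

```python
def block_range(lo, hi, rank, nparts):
    """Inclusive index range of lo..hi assigned to part `rank` (0-based) of `nparts`.

    Blocks are contiguous and their sizes differ by at most one element;
    an empty range (first > last) means that part gets no iterations.
    """
    n = hi - lo + 1
    return lo + (rank * n) // nparts, lo + ((rank + 1) * n) // nparts - 1


def two_level_plan(nprocs, loops):
    """Map each node to (code block, first loop index, last loop index).

    `loops` holds one inclusive (lo, hi) DO-loop range per independent code block.
    Level 1 splits the nodes 0..nprocs-1 into one sub-partition per code block;
    level 2 splits each block's loop range among the nodes of its sub-partition.
    """
    nblocks = len(loops)
    if nprocs < nblocks:
        raise ValueError("need at least one node per code block")
    plan = {}
    for block, (lo, hi) in enumerate(loops):
        first, last = block_range(0, nprocs - 1, block, nblocks)
        group = range(first, last + 1)  # nodes of this sub-partition
        for local_rank, node in enumerate(group):
            i0, i1 = block_range(lo, hi, local_rank, len(group))
            plan[node] = (block, i0, i1)
    return plan
```

For example, `two_level_plan(6, [(1, 100), (1, 10)])` gives nodes 0 to 2 the first code block (iterations 1-33, 34-66 and 67-100) and nodes 3 to 5 the second (iterations 1-3, 4-6 and 7-10).
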
The main application will be image processing, where there are many processing loops over pixel arrays, but ParTools might prove important for other areas as well.

  • SPMD programming paradigm:

    Our aim is to develop tools that semi-automatically convert sequential Fortran 77/90 code annotated with preprocessor directives into new Fortran 77/90 code that compiles and runs under EPX on the Parsytec CC and similar systems. To this end we are developing a library of preprocessor directives to be inserted by the programmer. Aspects to be covered are:
    - selection of virtual topology
    - independent code block execution
    - loop parallelization
    - automatic variable and array updating
    - selection of the number of processors to be used
    - minimum speedup estimation

    For an idea of how it might work, including programming examples, have a look at the simulated f2cc manpage. NOTE: this is just an idea and we are thinking hard about what to include and how... If you have any ideas, please let us know!

  • Loop optimization:

    To be discussed...

  • Minimum speedup requirement:

    One of the preprocessor directives to be created concerns estimating the parallel speedup, for cases where the programmer is not certain whether the cost of all communications outweighs the gain from splitting loops and code blocks. For a given environment with a number of processors, a certain virtual topology and array size, the idea is to analyse the code to be executed in parallel and to estimate the speedup for different numbers of processors. Based on these estimates, a decision can be made whether or not to parallelize.
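
A minimal sketch of such an estimate, assuming a deliberately simple cost model: each node computes n/p loop iterations, after which the p - 1 other nodes send their results one after another to a master node (a star), each message costing a start-up latency `alpha` plus `beta` per element. The model, the parameter names and the values used below are assumptions for illustration, not ParTools' actual analysis and not measurements of the Parsytec CC.

```python
def estimated_speedup(n, p, t_op, alpha, beta):
    """Estimated speedup S(p) = T(1) / T(p) for a loop of n iterations on p nodes.

    Assumed cost model (illustrative only):
      T(1) = n * t_op
      T(p) = (n / p) * t_op + (p - 1) * (alpha + beta * n / p)
    t_op: time per iteration, alpha: message latency, beta: time per element sent.
    """
    t_seq = n * t_op
    t_par = (n / p) * t_op + (p - 1) * (alpha + beta * n / p)
    return t_seq / t_par


def choose_procs(n, max_procs, t_op, alpha, beta, min_speedup):
    """Return (p, S(p)) for the best p in 1..max_procs, or (1, 1.0) if even the
    best estimate falls below min_speedup, i.e. the loop stays sequential."""
    best = max(range(1, max_procs + 1),
               key=lambda p: estimated_speedup(n, p, t_op, alpha, beta))
    s = estimated_speedup(n, best, t_op, alpha, beta)
    return (best, s) if s >= min_speedup else (1, 1.0)
```

With the illustrative values n = 10000, t_op = 1e-6 s, alpha = 1e-4 s and beta = 1e-8 s per element, `choose_procs(10_000, 64, 1e-6, 1e-4, 1e-8, 2.0)` returns p = 10 with an estimated speedup of about 5.0: beyond that point the extra messages cost more than the shorter local loops save. With a very high latency the estimate never reaches the required minimum and the function advises not to parallelize.
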

  • Topology:

    Is one virtual topology better than the others for a given problem? To minimize communication overhead, it would be useful if the preprocessor could comment on the topology the programmer has chosen in a directive, or help an undecided user by suggesting one. The preprocessor could produce speedup estimates for e.g. star and ring topologies and ask the programmer for confirmation. Another question is whether there is a way to recognize a preferred topology for the given data structure(s). Example: the maximum of a large array can be determined by means of a star topology, whereas adding two arrays and updating the result on all nodes can be done in a ring. These two operations could be done in a single loop using both topologies at the same time.
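
The two examples can be simulated sequentially to show the data movement each topology implies. This is a sketch only, not message-passing code: the function names and the contiguous partition of the arrays are illustrative assumptions, not part of ParTools or EPX.

```python
def star_max(chunks):
    """Star topology: each node finds its local maximum and sends it to node 0.

    `chunks[k]` is the (non-empty) part of the array held by node k.
    Returns (global maximum, number of messages node 0 must receive).
    """
    partial = [max(chunk) for chunk in chunks]  # computed locally on every node
    result, received = partial[0], 0
    for value in partial[1:]:  # node 0 receives p - 1 messages, one after another
        result = max(result, value)
        received += 1
    return result, received


def ring_allgather(blocks):
    """Ring topology: in every step each node k passes one block to node (k + 1) % p.

    Node k starts with blocks[k]; after p - 1 steps all nodes hold every block.
    Returns (the full array as seen by each node, number of steps).
    """
    p = len(blocks)
    have = [{k: blocks[k]} for k in range(p)]  # blocks currently held by node k
    outgoing = list(range(p))                   # block node k forwards next
    for _ in range(p - 1):
        incoming = [outgoing[(k - 1) % p] for k in range(p)]  # from left neighbour
        for k in range(p):
            have[k][incoming[k]] = have[(k - 1) % p][incoming[k]]
        outgoing = incoming  # forward what was just received
    full = [[x for j in range(p) for x in have[k][j]] for k in range(p)]
    return full, p - 1


def ring_add(a, b, p):
    """Each of p nodes adds its contiguous slice of a and b, then the ring
    all-gather updates the complete sum a + b on every node."""
    n = len(a)
    blocks = [[a[i] + b[i] for i in range(r * n // p, (r + 1) * n // p)] for r in range(p)]
    return ring_allgather(blocks)
```

In the star, node 0 receives p - 1 messages in sequence and is the only node that ends up with the result. In the ring, every node sends and receives one block per step, so all links are busy at the same time, and after p - 1 steps every node holds the complete sum.
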


    Back to the Vision Laboratory


    Please send comments to Hans du Buf.
    Last update: June 2001