pgen - generate code for shared address space multiprocessors
pgen [ options ] infile outfile
The pgen program transforms SUIF code to run in parallel on shared address space multiprocessors. It restructures the code and inserts calls to the parallel runtime library. Pgen reads annotations, created by earlier passes, that tell it where and how to modify the code. These annotations fall into three categories: parallelization and scheduling; synchronization; and variable scoping. If none of these annotations are present, the input file is read and written back out unchanged.
This pass expects two file specifications on the command line, first for the input file, then for the output file.
-check-work
Generate code that checks whether the amount of work
in a parallel loop exceeds certain thresholds before
running it.
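As a rough illustration of the kind of check -check-work describes, the sketch below estimates a loop's work and decides between serial and parallel execution. The threshold MIN_PARALLEL_WORK, the cost model (trip count times per-iteration work), and the names choose_mode and exec_mode are all hypothetical; the thresholds pgen actually uses are not documented here.

```c
/* Hypothetical sketch of a -check-work style test: estimate the work in a
 * parallel loop and fall back to serial execution when it is too small.
 * MIN_PARALLEL_WORK and the cost model are assumptions, not pgen's values. */
#define MIN_PARALLEL_WORK 10000L

enum exec_mode { RUN_SERIAL, RUN_PARALLEL };

enum exec_mode choose_mode(long trip_count, long work_per_iter)
{
    long estimated = trip_count * work_per_iter;  /* simple cost estimate */
    return estimated >= MIN_PARALLEL_WORK ? RUN_PARALLEL : RUN_SERIAL;
}
```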
Setting the number of processors to 0 means the number of processors is unknown at compile-time, and will be determined at run-time. The default is 0.
begin_parallel_region
end_parallel_region
These annotations can be placed on any tree_node.
All tree_nodes between (and including) the nodes
carrying the begin_parallel_region and
end_parallel_region annotations are run in
parallel on different processors.
doall
This annotation can only be placed on TREE_FORs.
The doall annotation is shorthand for a parallel
region consisting of a single TREE_FOR. The
iterations of the loop are statically scheduled
across the processors in a blocked fashion. Any
doall annotations on TREE_FORs nested within this
TREE_FOR are stripped off (i.e., pgen does not
generate nested parallel regions). This annotation
is created by skweel. Only one of the annotations
doall, comp_decomp and loop_cyclic is allowed
within a parallel region.
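A minimal sketch of blocked static scheduling, assuming iterations 0..n-1 and a block size of ceil(n / nprocs); pgen's exact bound computation may differ. The helper names block_bounds and doall_block are hypothetical, and the loop body a[i] = 2 * i stands in for whatever the original TREE_FOR contains.

```c
/* Hypothetical helper (not pgen's actual code): compute the half-open block
 * [*lo, *hi) of iterations 0..n-1 owned by processor p of nprocs, using a
 * block size of ceil(n / nprocs). */
void block_bounds(long n, int nprocs, int p, long *lo, long *hi)
{
    long chunk = (n + nprocs - 1) / nprocs;  /* ceil(n / nprocs) */
    long start = (long)p * chunk;
    long end = start + chunk;
    *lo = start < n ? start : n;  /* trailing processors may get an empty block */
    *hi = end < n ? end : n;
}

/* What one processor runs for the original loop "a[i] = 2 * i" after
 * blocked scheduling: only the iterations in its own block. */
void doall_block(long *a, long n, int nprocs, int p)
{
    long lo, hi;
    block_bounds(n, nprocs, p, &lo, &hi);
    for (long i = lo; i < hi; i++)
        a[i] = 2 * i;
}
```

Running doall_block once for every p in 0..nprocs-1 covers each iteration exactly once, which is what makes the blocks safe to execute concurrently when the loop has no cross-iteration dependences.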
comp_decomp named_symcoeff_ineq
This annotation can only be placed on TREE_FORs.
Schedule iterations of the TREE_FOR across the processors.
The named_symcoeff_ineq specifies the
mapping from the iterations to processors. The
dependence library is called to generate the new
bounds of the TREE_FOR for the given inequality.
Only one of the annotations doall, comp_decomp
and loop_cyclic is allowed within a parallel
region.
loop_cyclic dimension offset
This annotation can only be placed on TREE_FORs.
Schedule the loop using a cyclic mapping in the
processor dimension given by the integer dimension.
Currently, only dimensions of 0, 1 or 2 are
accepted. The integer offset gives the starting
offset of the loop. Only one of the annotations
doall, comp_decomp and loop_cyclic is allowed
within a parallel region.
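For contrast with the blocked doall schedule, here is a minimal sketch of a cyclic mapping over a one-dimensional processor grid: processor p executes iterations lb + p, lb + p + nprocs, and so on. It deliberately does not model the dimension or offset parameters, whose exact effect on the generated code is an open assumption here; cyclic_body is a hypothetical name and a[i] = i * i stands in for the loop body.

```c
/* Sketch of a cyclic schedule over a 1-D processor grid (the dimension and
 * offset parameters of loop_cyclic are not modeled). Processor p executes
 * iterations lb + p, lb + p + nprocs, ... up to and including ub.
 * Returns the number of iterations this processor executed. */
long cyclic_body(long *a, long lb, long ub, int nprocs, int p)
{
    long count = 0;
    for (long i = lb + p; i <= ub; i += nprocs) {
        a[i] = i * i;  /* stand-in for the original loop body */
        count++;
    }
    return count;
}
```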
guard var+
This annotation can only be placed on TREE_FORs.
Put an IF statement around the TREE_FOR so that the
loop is executed only if the var_syms var are equal
to the processor that owns the data written within
the loop. This annotation can only be used in
conjunction with the comp_decomp annotation.
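The effect of a guard annotation can be pictured as follows, assuming a single guard variable and a one-dimensional processor numbering. The names owner, my_id, and guarded_row_update are hypothetical; my_id stands in for whatever processor identifier the runtime library provides.

```c
/* Hypothetical illustration of a guard: the loop is wrapped in an IF so
 * that only the processor whose id equals the guard variable `owner` runs
 * it. `my_id` stands in for the runtime's processor number. */
int guarded_row_update(double *row, int len, int owner, int my_id)
{
    if (owner == my_id) {            /* IF inserted around the TREE_FOR */
        for (int j = 0; j < len; j++)
            row[j] += 1.0;           /* stand-in for the original loop body */
        return 1;                    /* this processor executed the loop */
    }
    return 0;                        /* another processor owns the data */
}
```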
doacross dimension direction type
This annotation can only be placed on TREE_FORs.
Generate counter synchronization around this loop.
The integer dimension specifies the processor
dimension in which to place the synchronization.
Currently, only dimensions of 0, 1 or 2 are
accepted. The integer direction gives the offset
of the processor to wait on. The string type gives
the scheduling type of the loop. Currently, the
only type accepted is block.
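The sketch below shows counter synchronization for a block-scheduled doacross loop, using POSIX threads as stand-ins for processors and C11 atomics as the counters. It assumes a wavefront recurrence grid[i][j] = grid[i-1][j] + grid[i][j-1], with rows split into blocks, and has each processor wait on its neighbor p - 1; how pgen encodes that neighbor through direction, and the runtime calls it really emits, are not shown. All names (done_cols, worker, run_doacross) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N 8       /* rows, split into blocks across processors */
#define M 6       /* columns, processed in order */
#define NPROCS 4  /* assumes N % NPROCS == 0 */

static long grid[N][M];
static atomic_int done_cols[NPROCS];  /* counter: last column finished by each processor */

struct worker_arg { int p; };

static void *worker(void *argp)
{
    int p = ((struct worker_arg *)argp)->p;
    int lo = p * (N / NPROCS), hi = lo + N / NPROCS;  /* blocked rows */
    for (int j = 1; j < M; j++) {
        if (p > 0)  /* wait until neighbor p - 1 has finished column j */
            while (atomic_load(&done_cols[p - 1]) < j)
                ;   /* busy-wait; a real runtime might yield instead */
        for (int i = (lo == 0 ? 1 : lo); i < hi; i++)
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1];
        atomic_store(&done_cols[p], j);  /* publish progress to p + 1 */
    }
    return NULL;
}

long run_doacross(void)
{
    pthread_t tid[NPROCS];
    struct worker_arg args[NPROCS];
    for (int i = 0; i < N; i++) grid[i][0] = 1;  /* boundary column */
    for (int j = 0; j < M; j++) grid[0][j] = 1;  /* boundary row */
    for (int p = 0; p < NPROCS; p++) {
        atomic_store(&done_cols[p], 0);
        args[p].p = p;
        pthread_create(&tid[p], NULL, worker, &args[p]);
    }
    for (int p = 0; p < NPROCS; p++)
        pthread_join(tid[p], NULL);
    return grid[N - 1][M - 1];
}
```

With ones on the boundaries, grid[i][j] equals the binomial coefficient C(i + j, i), so the final corner value is C(12, 5) = 792 regardless of thread timing, provided the counters enforce the dependence.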
tile_loops depth tripsize+
This annotation can only be placed on TREE_FORs.
Tile the loop nest (the outermost loop in the nest
is the loop with the annotation). The integer depth
gives the number of loops n in the nest, and an
integer tripsize must be given for each loop in the
nest to specify the size of the tile for that loop.
A tripsize of 1 for the outermost loop causes the
tile to be coalesced. For a loop nest of depth n,
standard tiling creates 2n loops; if the tile is
coalesced, 2n-1 loops are generated.
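A minimal sketch of standard tiling for a nest of depth 2, which yields 2n = 4 loops: two loops step over tiles and two walk the iterations inside a tile. Here ti and tj play the role of the per-loop tripsize values; the coalesced case is not modeled, and tiled_fill is a hypothetical name.

```c
/* Standard tiling of a depth-2 nest (n = 2, so 2n = 4 loops). ti and tj act
 * as the per-loop tile sizes; min_l clamps partial tiles at the edges. */
static long min_l(long a, long b) { return a < b ? a : b; }

void tiled_fill(long *a, long rows, long cols, long ti, long tj)
{
    for (long ii = 0; ii < rows; ii += ti)                      /* tile loop 1 */
        for (long jj = 0; jj < cols; jj += tj)                  /* tile loop 2 */
            for (long i = ii; i < min_l(ii + ti, rows); i++)      /* element loop 1 */
                for (long j = jj; j < min_l(jj + tj, cols); j++)  /* element loop 2 */
                    a[i * cols + j] += i * cols + j;  /* each element is visited exactly once */
}
```

Because the body uses +=, starting from a zeroed array and ending with a[k] == k confirms that the tiled nest visits every original iteration exactly once, even when the tile sizes do not divide the bounds.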
lock locknum
unlock locknum
This annotation can be placed on any tree_node.
Generate a lock or unlock statement after the
tree_node carrying the annotation. The integer
locknum is the number of the lock variable.
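To show what a lock/unlock pair protects, the sketch below uses POSIX mutexes as stand-ins for pgen's numbered lock variables, assuming lock number k maps to locks[k]. The runtime library's real lock calls are not shown, and locks, shared_total, and run_locked_updates are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical stand-in for numbered lock variables: lock k is locks[k]. */
#define NLOCKS 4
static pthread_mutex_t locks[NLOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static long shared_total;

static void *add_many(void *arg)
{
    (void)arg;
    for (int k = 0; k < 100000; k++) {
        pthread_mutex_lock(&locks[1]);    /* emitted for "lock 1" */
        shared_total++;                   /* the protected update */
        pthread_mutex_unlock(&locks[1]);  /* emitted for "unlock 1" */
    }
    return NULL;
}

long run_locked_updates(int nthreads)
{
    pthread_t tid[16];
    if (nthreads > 16) nthreads = 16;     /* fixed-size thread table */
    shared_total = 0;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tid[t], NULL, add_many, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    return shared_total;
}
```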
global_barrier
This annotation can be placed on any tree_node.
Generate a barrier statement after the tree_node
carrying the annotation. This is a global barrier:
every processor waits at it until all processors
have arrived. A separate barrier is generated for
each global_barrier annotation within a parallel
region.
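A minimal sketch of two global_barrier annotations in one parallel region, with POSIX threads standing in for processors and one pthread_barrier_t per annotation, mirroring the rule that each annotation gets its own barrier. The phase arrays and function names are hypothetical; each phase reads a neighbor's result from the previous phase, which is only safe because of the barrier in between.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NTHREADS 4
static pthread_barrier_t barrier_a, barrier_b;  /* one barrier per annotation */
static int phase1[NTHREADS], phase2[NTHREADS], phase3[NTHREADS];

static void *worker(void *arg)
{
    int p = (int)(long)arg;
    phase1[p] = p + 1;                            /* node carrying global_barrier */
    pthread_barrier_wait(&barrier_a);             /* all phase1 writes are done */
    phase2[p] = phase1[(p + 1) % NTHREADS] * 10;  /* reads a neighbor's result */
    pthread_barrier_wait(&barrier_b);             /* second, distinct barrier */
    phase3[p] = phase2[(p + 1) % NTHREADS] + 1;
    return NULL;
}

int run_barrier_phases(void)
{
    pthread_t tid[NTHREADS];
    pthread_barrier_init(&barrier_a, NULL, NTHREADS);
    pthread_barrier_init(&barrier_b, NULL, NTHREADS);
    for (long p = 0; p < NTHREADS; p++)
        pthread_create(&tid[p], NULL, worker, (void *)p);
    for (int p = 0; p < NTHREADS; p++)
        pthread_join(tid[p], NULL);
    pthread_barrier_destroy(&barrier_a);
    pthread_barrier_destroy(&barrier_b);
    return phase3[0];
}
```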
reduced type var
This annotation can only be placed on TREE_FORs.
Create a private copy of the var_sym var on each
processor. After the loop, perform a global reduction
of the kind given by the string type. Currently
supported reduction types include sum, product,
max and min. Multiple reduced annotations
on a single TREE_FOR are allowed. This annotation
is created by skweel.
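The sketch below illustrates the private-copy-then-reduce pattern for sum and max, with processors simulated one after another rather than run in parallel. Each processor's private copy starts at the operator's identity, and after the loop the copies are combined with the variable's value from before the loop. The names reduce_loop, reduce_kind, and the init parameter are hypothetical; product and min would follow the same pattern with different operators and identities.

```c
#include <limits.h>

enum reduce_kind { REDUCE_SUM, REDUCE_MAX };

static long reduce_identity(enum reduce_kind k) { return k == REDUCE_SUM ? 0 : LONG_MIN; }

static long reduce_combine(enum reduce_kind k, long a, long b)
{
    return k == REDUCE_SUM ? a + b : (a > b ? a : b);
}

/* init is the variable's value before the loop. */
long reduce_loop(const long *x, long n, int nprocs, enum reduce_kind k, long init)
{
    long priv[64];                           /* one private copy per processor; assumes nprocs <= 64 */
    long chunk = (n + nprocs - 1) / nprocs;  /* blocked schedule, as for doall */
    for (int p = 0; p < nprocs; p++) {       /* sequential here; parallel in generated code */
        priv[p] = reduce_identity(k);
        for (long i = (long)p * chunk; i < n && i < (long)(p + 1) * chunk; i++)
            priv[p] = reduce_combine(k, priv[p], x[i]);
    }
    long result = init;
    for (int p = 0; p < nprocs; p++)         /* global reduction after the loop */
        result = reduce_combine(k, result, priv[p]);
    return result;
}
```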
privatized var+
This annotation can only be placed on TREE_FORs.
Create a private copy of the var_syms var on each
processor. Multiple privatized annotations on a
single TREE_FOR are allowed. This annotation is
created by skweel.
None.
After running SUIF code through pgen, the resulting program must link in the runtime library for the target machine. In addition, FORTRAN programs must link in the F77_doall and I77_doall libraries. The F77_doall and I77_doall libraries replace the F77 and I77 libraries, respectively, used for sequential FORTRAN programs.
The pgen program was written by Jennifer Anderson.