skweel - optimize loop-level parallelism and locality
skweel [ options ] infile outfile
The skweel program analyzes and transforms the code to optimize loop-level parallelism and locality.
The transformations skweel performs include a general class of loop transformations known as "unimodular transformations". These transformations include loop interchange, loop skewing, and loop reversal. Skweel can also tile loop nests for cache locality.
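The three transformations named above can each be written as an integer matrix with determinant +1 or -1 acting on the iteration vector of a loop nest. The following is a minimal sketch for a two-deep nest, not skweel's implementation; the helper names (det2, apply, compose, is_legal) are hypothetical. The legality test assumes dependences are given as constant distance vectors and uses the standard rule that every transformed dependence must stay lexicographically positive.

```python
# Illustrative sketch only; none of these names come from skweel itself.

def det2(m):
    """Determinant of a 2x2 integer matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def apply(m, v):
    """Multiply a 2x2 matrix by an iteration or dependence vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def compose(a, b):
    """Matrix product a*b: apply b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def is_legal(t, deps):
    """Legal if every transformed dependence stays lexicographically positive.
    Python compares tuples lexicographically, so '> (0, 0)' is the test."""
    return all(apply(t, d) > (0, 0) for d in deps)

INTERCHANGE = [[0, 1], [1, 0]]      # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]   # (i, j) -> (i, -j)
SKEW_INNER = [[1, 0], [1, 1]]       # (i, j) -> (i, i + j)

# Products of unimodular matrices are unimodular, so the three basic
# transformations can be combined into a single matrix.
SKEW_THEN_INTERCHANGE = compose(INTERCHANGE, SKEW_INNER)
```

With a dependence of distance (1, -1), interchange alone is rejected by is_legal because the dependence becomes (-1, 1), while skewing first turns it into (1, 0) and makes the nest safe to reorder.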
This pass expects two file specifications on the command line, first for the input file, then for the output file. When scc is run with the -parallel or -multi flags, skweel will be run. In both cases, skweel is called with the options -P -T -i.
-A n Print out the SUIF code before and/or after transformation. If the integer n is set to 1, the SUIF code is printed before transformation. If it is set to 2, then the code is printed after transformation. The code is printed both before and after transformation if n is set to 3.
privatizable var*
This annotation is placed on TREE_FOR nodes by the
scalar expansion pass of moo and oynk. Private
copies of the var_syms var can be made for each
iteration of the loop nest. This information is
used to parallelize loops.
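As an illustration of why a privatizable variable does not block parallelization, the sketch below models a loop whose scalar temporary t is always written before it is read within an iteration. The function names and the dictionary standing in for a private copy are hypothetical and only model the idea. Because each iteration gets its own t, the iterations can run in any order and still produce the same result.

```python
# Illustrative sketch only; this models the idea, not skweel's code.

def body(i, a, private):
    private["t"] = a[i] * 2      # t is written before it is read ...
    return private["t"] + 1      # ... so no value of t crosses iterations

def run(a, order):
    """Execute the loop body in the given iteration order."""
    b = [0] * len(a)
    for i in order:
        private = {}             # fresh private copy of t for this iteration
        b[i] = body(i, a, private)
    return b

data = [5, 7, 11, 13]
forward = run(data, range(len(data)))
backward = run(data, reversed(range(len(data))))
```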
need_finalization var*
This annotation is placed on TREE_FOR nodes by the
scalar expansion pass of oynk. Finalization is
needed in order for the var_syms var to be privatized.
Currently, skweel will not privatize any variables
that need finalization.
iv_live
This annotation is placed on TREE_FOR nodes by the
scalar expansion pass of oynk. The induction variable
for this loop is read after exiting the loop.
If a loop has a live induction variable, skweel
will not consider transforming the loop.
pure function
This annotation is placed on the proc_sym for FORTRAN
intrinsic functions that are known to have no
side-effects. Skweel will not parallelize loops
that contain calls to functions that do not have
pure function annotations.
reduction type var
This annotation is placed on TREE_FOR nodes and
instructions by the reduction pass. A reduction of the commutative
operation type is calculated over the
var_sym var. Currently supported reduction types
include sum, product, max and min.
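The sketch below illustrates why a recognized reduction can be run in parallel: each worker accumulates a partial result over its share of the iterations, and the partial results are combined at the end. It is an assumption-laden model rather than the code generated by any SUIF pass; the table REDUCTIONS, the function chunked_reduce, and the workers parameter are hypothetical. Integer data is used because reordering floating-point operations can change rounding.

```python
# Illustrative sketch only; names and structure are not taken from SUIF.
from functools import reduce
import operator

# Each supported reduction type: (combining operation, identity element).
REDUCTIONS = {
    "sum": (operator.add, 0),
    "product": (operator.mul, 1),
    "max": (max, float("-inf")),
    "min": (min, float("inf")),
}

def chunked_reduce(kind, values, workers=4):
    """Reduce per-worker slices independently, then combine the partials."""
    op, identity = REDUCTIONS[kind]
    # Worker w handles iterations w, w + workers, w + 2*workers, ...
    chunks = [values[w::workers] for w in range(workers)]
    partials = [reduce(op, chunk, identity) for chunk in chunks]
    return reduce(op, partials, identity)

values = [3, 1, 4, 1, 5, 9, 2, 6]
total = chunked_reduce("sum", values)
largest = chunked_reduce("max", values)
```

The regrouping is only valid because the operation is commutative and associative, which is why the annotation records the reduction type alongside the variable.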
C pragma doall
doall These annotations are generated by the front-end
from pragmas in the source code. They allow users
to explicitly parallelize loops that skweel
wouldn't otherwise parallelize. The annotations
are placed on mrk instructions. When skweel sees
one of these annotations, it puts a doall annotation
on the closest TREE_FOR following the mrk
instruction. The TREE_FOR must be in the same list
as the mrk instruction. Subsequent passes (e.g.
pgen) will now treat the loop as parallel.
begin_fully_permutable
end_fully_permutable
These annotations are placed on TREE_FOR nodes to
mark the boundaries of fully permutable loop nests.
If a loop nest is fully permutable then any permutation
of the loops within the nest is legal, and
the loop nest can be tiled.
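To make the link between full permutability and tiling concrete, the sketch below assumes dependences are given as constant distance vectors and uses the condition from the loop transformation literature that a nest is fully permutable when every component of every dependence vector is non-negative. The function names are hypothetical and this is not skweel's algorithm. It builds a tiled visiting order for a two-deep nest and checks that every dependence source is still visited before its sink.

```python
# Illustrative sketch only; names are hypothetical, not skweel's.

def fully_permutable(deps):
    """True when every component of every dependence distance is >= 0."""
    return all(c >= 0 for d in deps for c in d)

def tiled_order(n, m, tile):
    """Visit an n x m iteration space one tile at a time."""
    order = []
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    order.append((i, j))
    return order

def respects(order, deps):
    """True when each dependence source is visited before its sink."""
    pos = {p: k for k, p in enumerate(order)}
    for (i, j) in order:
        for (di, dj) in deps:
            src = (i - di, j - dj)
            if src in pos and pos[src] > pos[(i, j)]:
                return False
    return True

order = tiled_order(4, 4, 2)
ok = respects(order, [(1, 0), (0, 1)])   # fully permutable nest
bad = respects(order, [(1, -1)])         # negative component breaks tiling
```

In the second case, iteration (1, 1) needs the value produced by (0, 2), which lies in a tile that is visited later, so the tiled order is not legal for that nest.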
doall This annotation is placed on TREE_FOR nodes that
skweel determines can be legally run in parallel. The doall annotation is read by pgen.
privatized vars+
This annotation is placed on TREE_FOR nodes, and is
used in conjunction with the doall annotation.
The var_syms var must be made private in order for
the code to run correctly in parallel. The "privatized"
annotation is read by pgen.
reduced type var
This annotation is placed on TREE_FOR nodes, and is
used in conjunction with the doall annotation. A
reduction of the commutative operation type must be
calculated over the var_sym var in order for the
code to run correctly in parallel. The "reduced"
annotation is read by pgen.
small_bound
This annotation is placed on TREE_FOR nodes. The
loop was not marked with a doall annotation
because the number of iterations in the loop was
deemed too small (using the parameter specified
with the -y flag).
fixfortran(1), oynk(1), moo(1), pgen(1), reduction(1), scc(1)
Michael E. Wolf. "Improving Locality and Parallelism in Nested Loops", Ph.D. thesis, Stanford University, Computer Systems Laboratory, August, 1992.
M. E. Wolf and M. S. Lam. "A Loop Transformation Theory and An Algorithm to Maximize Parallelism", IEEE Transactions on Parallel and Distributed Systems, October, 1991.
M. E. Wolf and M. S. Lam. "A Data Locality Optimizing Algorithm", Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, June, 1991.
The original parallelism and locality optimizer for the old SUIF system was written by Michael Wolf. Jennifer Anderson translated it to new SUIF, and added some features.