Go to the first, previous, next, last section, table of contents.
In the SUIF vs. KAP tables (see section Performance Results), a number of the KAP only loops are due to SUIF parallelizing an outer loop while KAP is parallelizing an inner loop. However, there are a number of limitations in the current SUIF compiler that cause it to miss some loops that KAP is able to parallelize.
SUIF does not perform loop distribution, whereas the KAP compiler does. This accounts for more of the KAP only loops than any of the other limitations listed below. For example, the following loop from the Perfect club benchmark APS.f performs I/O in the loop and thus is not parallelizable:
DO 30 K=1,NZ UM(K)=REAL(WM(K)) VM(K)=AIMAG(WM(K)) WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K) WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K) 30 CONTINUE
However, KAP is able to parallelize some of the statements in the loop by applying loop distribution:
DO 2 K=1,NZ UM(K)=REAL(WM(K)) VM(K)=AIMAG(WM(K)) 2 CONTINUE DO 3 K=1,NZ WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K) WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K) 3 CONTINUE
The DO 2 K
loop can now be parallelized, and the DO 3 K
loop
runs sequentially.
Another reason KAP is able to parallelize loops that SUIF cannot is due to the way SUIF handles FORTRAN equivalences. SUIF carries no information about which variables could potentially refer to the same memory location. Accesses to equivalenced variables are represented as offsets from a base memory location. The parallelizer, upon seeing these offsets, assumes that any location in memory could potentially be touched, and thus cannot parallelize the loop. This problem shows up in the SPEC92 benchmark wave5.f, for example. SUIF's handling of equivalences is an artifact of our FORTRAN front end being based on the FORTRAN-to-C converter `f2c'.
Each of the remaining SUIF limitations listed below have a lesser impact than loop distribution and equivalences. They each account for only a small number of the KAP only loops.
SUIF currently does not normalize loops with symbolic step sizes. If a loop is not normalized, the parallelizer will not consider parallelizing the loop.
SUIF's dependence analysis is based on linear affine functions. It will not take the loop bounds into account if they are non-linear. For example, in the following loop from the SPEC92 benchmark wave5.f:
NS2 = (N+1)/2 NP2 = N+2 ... DO 102 K=2,NS2 KC = NP2-K XH(K) = W(K-1)*X(KC)+W(KC-1)*X(K) XH(KC) = W(K-1)*X(K)-W(KC-1)*X(KC) 102 CONTINUE
SUIF replaces NS2
with (N+1)/2
and KC
with
N+2-K
, leaving the following code:
DO 102 K=2,(N+1)/2 KC = NP2-K XH(K) = W(K-1)*X(N+2-K)+W(N-K+1)*X(K) XH(N+2-K) = W(K-1)*X(K)-W(N-K+1)*X(N+2-K) 102 CONTINUE
However, since SUIF cannot determine whether N+1
is even, the
function (N+1)/2
is non-linear. The dependence library
then does not use the bounds to determine that the accesses to
XH(K)
and X(N+2-K)
are independent.
We have extended SUIF's dependence analysis to handle simple non-linear array index functions. However, this analysis will not work when there are multiple symbolic coefficients. For example in the following loop from the Perfect club benchmark APS.f:
DO 30 K=1,NZTOP DO 20 J=1,NY DO 10 I=1,NX L=L+1 DCDX(L)=-(UX(L)+UM(K))*DCDX(L)-(VY(L)+VM(K))*DCDY(L)+Q(L) 10 CONTINUE 20 CONTINUE 30 CONTINUE
The auxiliary induction variable L
is replaced with a function of
the loop indices, and the resulting accesses to the array DCDX
become DCDX(NY * NX * (K-1) + NX * (J-1) + I - 1)
. Because
of the multiple symbolic coefficients NY
and NX
, the
dependence library is not able to analyze the expression.
SUIF's scalar analysis uses only localized information, and does not perform symbolic analysis. For example, symbolic analysis is needed for the following loop from the Perfect benchmark WSS.f:
14 KBOT = KK - 1 KTOP = KK ... code containing IF statements deleted ... DO 18 K=KK,KTOP 18 Q(K) = Q(KBOT)
KAP is able replace KBOT
with KK-1
and can thus determine
that the two accesses to array Q
are independent.
KAP is able to copy references to array elements into scalars if they
are only accessed by constant indices within a loop. It is then able to
privatize the scalars, and parallelize the loop. SUIF's porky
-scalarize
pass will only turn array elements into scalars if they are
accessed by constant indices throughout the entire program.
SUIF's FORTRAN front-end, `sf2c', converts some FORTRAN intrinsics
into procedures that are not pure functions. Thus any loops containing
calls to these procedures are not parallelized. For example, the
intrinsic EXP
(on complex values) gets translated to a procedure
with two arguments in the `F77' library.
KAP is able to handle some forms of reductions that SUIF cannot yet handle. For example, consider the following loop from the Perfect club benchmark OCS.f:
DO 30 I=2,N2P 30 WORK(1) = MAX(WORK(1), WORK(I))
Elements WORK(2)
through WORK(N2P)
are reduced into
WORK(1)
. Currently, SUIF will not find reductions of arrays into
a subsection of the array.
SUIF is not be able to recognize some reductions over complex variables.
This is due to the fact that SUIF represents complex variables as arrays
of size 2 -- one element for the complex part and one for the real part.
Also, the front-end often generates extra temporaries for these complex
variables. The array references to complex variables are turned into
scalars by the porky -scalarize
pass. However, the reduction
recognition pass cannot handle the indirect reductions through the
temporaries.
Finally, KAP performs conditional parallelization, whereas SUIF does not. KAP will duplicate a loop to create a parallel version and a sequential version. For example, consider the following loop from the SPEC92 benchmark su2cor.f:
DO 40 I=0,249 IREG(I)=IREG(I+LVEC) 40 CONTINUE
This loop is sequential only when LVEC <= 249
, and
thus the second loop below can run in parallel:
IF (LVEC .LE. 249) THEN C Sequential DO 40 I=0,249 IREG(I)=IREG(I+LVEC) 40 CONTINUE ELSE C Parallel DO 2 I=0,249 IREG(I) = IREG(I+LVEC) 2 CONTINUE END IF
Go to the first, previous, next, last section, table of contents.