next up previous
Next: 7 Planned Features Up: Blocking Techniques for HPF Previous: 5 HPF Blocking with

Subsections

6 Performance Results

This Section shows some example programs that benefit from blocking techniques.

6.1 Matrix Multiplication

The following algorithm implements a matrix multiplication for square matrices. In the three-dimensional loop nest the array A is traversed $N$ times. If the array A does not fit in the cache, a high rate of cache misses can be observed. Though prefetch techniques can be applied, the performance is not very high.

      double precision, dimension (N, N) :: A, B, C
!hpf$ distribute A(*,block)        ! one-dimensional blocking
!hpf$ distribute A(block,block)    ! two-dimensional blocking
      integer I, J, K

      C = 0.0d0
!hpf$ parallel, reduction(C)
      do J = 1, N
!hpf$ independent, on home (A(:,K))
        do K = 1, N
!hpf$ independent, on home (A(I,K))
          do I = 1, N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
          end do
        end do
      end do

By the block distribution of the array A it will be guaranteed that always one block of array A can fit in the cache. This results in much better performance.


Table 1: One-dimensional HPF blocking of matrix multiplication ($N=850$).
  WALL_TIME MFLOPS LD_INS L1_MISS L2_MISS Bus
NB ms   M M M M / s
ori 3478.6 336.73 1174.12 223.54 24.93 42.47
1 3484.0 336.21 1174.12 223.54 24.89 42.38
16 1558.2 751.85 1178.72 224.62 5.71 19.90
24 1196.0 979.69 1180.68 224.66 2.41 9.89
32 1197.5 978.53 1182.64 224.09 2.68 10.53
64 1374.5 852.82 1190.46 224.92 4.94 16.20



Table 2: Two-dimensional HPF blocking of matrix multiplication ($N=850$).
  WALL_TIME MFLOPS LD_INS L1_MISS L2_MISS Bus
NB ms   M M M M / s
ori 3478.6 336.73 1174.12 223.54 24.93 42.47
1x1 3479.3 336.66 1175.87 223.63 24.97 42.50
24x1 1476.7 793.88 1199.63 158.10 4.12 14.89
32x1 1149.5 1020.13 1203.55 157.14 1.83 8.95
1x32 1203.0 974.74 1203.55 157.46 2.24 10.59
8x8 1216.8 964.73 1235.73 161.02 1.68 8.53
8x4 1086.8 1078.90 1203.55 156.79 1.39 7.19


6.2 Gauss Algorithm

Gauss algorithm:

      integer, parameter :: N = 1000
      double precision :: A(N,N), B(N)
!hpf$ distribute (*,block) :: A
!hpf$ distribute (block) :: B
      ...
!hpf$ parallel 
      do I = 1, N-1
         Q = 1.0d0/A(I,I)
!hpf$    independent
         do J  = I+1, N
            B(J)=B(J)-B(I)*(A(J,I)*Q)
            do K=I+1,N
              A(K,J)=A(K,J)-(A(K,I)*Q)*A(I,J)
            end do
         enddo
      end do


Table 3: HPF blocking of the GAUSS algorithm.
  WALL_TIME MFLOPS LD_INS L1_MISS L2_MISS Bus
NB ms   M M M M / s
ori 3535.0 359.01 956.55 117.65 36.46 46.44
1 3677.0 345.81 957.24 117.00 36.47 44.64
8 2904.7 473.81 959.28 116.53 30.10 46.50
16 1446.7 879.14 961.55 115.97 9.87 30.71
32 829.6 1533.36 966.08 116.42 1.10 8.13
40 796.2 1597.80 968.35 116.17 0.56 5.97
48 805.7 1579.18 970.62 116.25 0.60 6.30
64 840.1 1514.76 975.15 116.94 0.76 7.75
128 960.1 1326.52 993.29 119.32 1.38 12.46
256 1200.4 1062.78 1029.58 123.08 2.66 19.14
512 1698.4 753.55 1102.14 133.30 5.20 26.41
1000 2627.9 414.91 1240.46 141.54 10.31 33.86
int 2186.7 581.50 957.02 118.66 10.08 39.71


6.3 Sieving Primes

      logical, dimension (N) :: A
      integer :: S
!hpf$ distribute A (block)
      ...
      S = 0
!hpf$ parallel, reduction(S) begin
      A = .true.
      A(1) = .false.
      do K = 2, N1
         if (A(K)) then
            where (A(K*K:N:K)) A(K*K:N:K) = .false.
         end if
      end do
!hpf$ independent
      do I = 1, N
         if (A(I)) S = S + 1
      end do
!hpf$ end parallel


Table 4: Sieving prime numbers ($N=100.000.000$).
  WALL_TIME LD_INS L1_MISS L2_MISS Bus
NB ms M M M M / s
ori 10996.8 417.08 193.89 137.95 50.37
1 10979.8 327.51 193.46 138.32 50.46
700 6194.3 602.72 210.17 50.46 32.85
800 3057.8 642.07 212.58 7.06 10.54
900 2794.2 681.43 214.75 3.67 6.78
1000 2798.7 720.79 216.87 3.23 7.16
1300 2806.7 838.88 223.25 2.82 5.22
1500 2879.4 917.61 227.40 2.60 4.50
2000 2952.2 1114.42 234.82 2.09 3.56
2500 3187.8 1311.23 245.38 1.68 2.78



next up previous
Next: 7 Planned Features Up: Blocking Techniques for HPF Previous: 5 HPF Blocking with
Thomas Brandes 2004-03-18