Results for MPI performance tests on anlspx
memcpy
Determining delivered memory performance
mpicc -o memcpy -O memcpy.c
Size (bytes) Time (sec) Rate (MB/sec)
4 0.000000 15.614326
8 0.000000 31.225788
16 0.000000 55.669409
32 0.000001 46.609725
64 0.000001 66.798314
128 0.000002 85.264213
256 0.000003 98.939649
512 0.000005 107.567651
1024 0.000009 112.469647
2048 0.000018 115.091828
4096 0.000035 116.449749
8192 0.000070 117.139274
16384 0.000144 114.014488
32768 0.000551 59.483547
65536 0.001122 58.428860
131072 0.002250 58.249905
262144 0.004509 58.140851
524288 0.009269 56.563445
1048576 0.018380 57.050923
2097152 0.036755 57.057403
Determining delivered memory performance with unaligned data
mpicc -o memcpy memcpy.c
Size (bytes) Time (sec) Rate (MB/sec)
4 0.000000 9.601692
8 0.000000 18.493497
16 0.000000 32.225968
32 0.000001 35.667154
64 0.000001 53.442891
128 0.000002 70.933621
256 0.000003 84.811636
512 0.000005 94.007904
1024 0.000010 99.396445
2048 0.000020 102.329821
4096 0.000039 103.862922
8192 0.000078 104.645513
16384 0.000161 101.834478
32768 0.000508 64.497588
65536 0.001015 64.597719
131072 0.002024 64.746096
262144 0.004064 64.498383
524288 0.008238 63.641277
1048576 0.016541 63.394360
2097152 0.033159 63.245237
pingpong
Benchmarking point to point performance
mpicc -o pingpong -O pingpong.c
Kind n time (sec) Rate (MB/sec)
Send/Recv 1 0.000104 0.076765
Send/Recv 2 0.000104 0.153357
Send/Recv 4 0.000107 0.298157
Send/Recv 8 0.000135 0.475562
Send/Recv 16 0.000135 0.947681
Send/Recv 32 0.000148 1.727645
Send/Recv 64 0.000165 3.103093
Send/Recv 128 0.000198 5.167290
Send/Recv 256 0.000261 7.832739
Send/Recv 512 0.000364 11.263187
Send/Recv 1024 0.000575 14.234888
Send/Recv 2048 0.001050 15.609011
Send/Recv 4096 0.001662 19.714225
Send/Recv 8192 0.002737 23.944902
Send/Recv 16384 0.004857 26.988012
Send/Recv 32768 0.008883 29.511789
Send/Recv 65536 0.016951 30.929187
Send/Recv 131072 0.033155 31.626827
Send/Recv 262144 0.065577 31.979914
Send/Recv 524288 0.130593 32.117468
Send/Recv 1048576 0.260617 32.187432
Benchmarking point to point performance with nonblocking operations
mpicc -o pingpong -O pingpong.c
Kind n time (sec) Rate (MB/sec)
Isend/Irecv 1 0.000114 0.070279
Isend/Irecv 2 0.000114 0.140762
Isend/Irecv 4 0.000116 0.276044
Isend/Irecv 8 0.000146 0.439235
Isend/Irecv 16 0.000147 0.873457
Isend/Irecv 32 0.000158 1.619716
Isend/Irecv 64 0.000175 2.917711
Isend/Irecv 128 0.000207 4.955367
Isend/Irecv 256 0.000272 7.518585
Isend/Irecv 512 0.000369 11.111937
Isend/Irecv 1024 0.000588 13.934637
Isend/Irecv 2048 0.001065 15.387651
Isend/Irecv 4096 0.001696 19.318619
Isend/Irecv 8192 0.002752 23.816009
Isend/Irecv 16384 0.004903 26.730636
Isend/Irecv 32768 0.008920 29.388794
Isend/Irecv 65536 0.016999 30.843011
Isend/Irecv 131072 0.033243 31.543117
Isend/Irecv 262144 0.065365 32.083935
Isend/Irecv 524288 0.130599 32.115805
Isend/Irecv 1048576 0.260572 32.193074
Benchmarking point to point performance with nonblocking operations, head-to-head
mpicc -o pingpong -O pingpong.c
Kind n time (sec) Rate (MB/sec)
head-to-head Isend/Irecv 1 0.000147 0.108858
head-to-head Isend/Irecv 2 0.000147 0.217562
head-to-head Isend/Irecv 4 0.000149 0.430426
head-to-head Isend/Irecv 8 0.000235 0.544273
head-to-head Isend/Irecv 16 0.000235 1.090261
head-to-head Isend/Irecv 32 0.000247 2.077066
head-to-head Isend/Irecv 64 0.000259 3.953973
head-to-head Isend/Irecv 128 0.000285 7.183626
head-to-head Isend/Irecv 256 0.000357 11.464556
head-to-head Isend/Irecv 512 0.000513 15.974265
head-to-head Isend/Irecv 1024 0.000862 19.014129
head-to-head Isend/Irecv 2048 0.001575 20.808714
head-to-head Isend/Irecv 4096 0.002742 23.898189
head-to-head Isend/Irecv 8192 0.004617 28.391305
head-to-head Isend/Irecv 16384 0.008475 30.932905
head-to-head Isend/Irecv 32768 0.015942 32.886442
head-to-head Isend/Irecv 65536 0.031141 33.671771
head-to-head Isend/Irecv 131072 0.061472 34.115315
head-to-head Isend/Irecv 262144 0.122062 34.362128
head-to-head Isend/Irecv 524288 0.242440 34.600716
head-to-head Isend/Irecv 1048576 0.484358 34.638043
Benchmarking point to point performance with unaligned data
mpicc -o pingpong -O pingpong.c
Kind (char) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000101 0.009899
Send/Recv 2 0.000101 0.019865
Send/Recv 4 0.000100 0.039809
Send/Recv 8 0.000101 0.079287
Send/Recv 16 0.000101 0.158201
Send/Recv 32 0.000103 0.310028
Send/Recv 64 0.000132 0.483737
Send/Recv 128 0.000132 0.972394
Send/Recv 256 0.000145 1.770249
Send/Recv 512 0.000157 3.252598
Send/Recv 1024 0.000186 5.503158
Send/Recv 2048 0.000255 8.034128
Send/Recv 4096 0.000371 11.031140
Send/Recv 8192 0.000608 13.470915
Send/Recv 16384 0.001085 15.095072
Send/Recv 32768 0.001681 19.495767
Send/Recv 65536 0.002778 23.588631
Send/Recv 131072 0.004907 26.709256
Send/Recv 262144 0.008975 29.208367
Send/Recv 524288 0.017022 30.800588
Send/Recv 1048576 0.033321 31.468546
Kind (double) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000102 0.078589
Send/Recv 2 0.000102 0.157390
Send/Recv 4 0.000104 0.306526
Send/Recv 8 0.000134 0.476883
Send/Recv 16 0.000135 0.950038
Send/Recv 32 0.000145 1.762180
Send/Recv 64 0.000163 3.146235
Send/Recv 128 0.000197 5.205945
Send/Recv 256 0.000260 7.888427
Send/Recv 512 0.000366 11.181711
Send/Recv 1024 0.000613 13.361607
Send/Recv 2048 0.001085 15.102550
Send/Recv 4096 0.001672 19.595890
Send/Recv 8192 0.002727 24.028856
Send/Recv 16384 0.004829 27.143453
Send/Recv 32768 0.008875 29.538808
Send/Recv 65536 0.016951 30.929552
Send/Recv 131072 0.033074 31.703769
Send/Recv 262144 0.065646 31.946282
Send/Recv 524288 0.130774 32.072846
Send/Recv 1048576 0.260439 32.209548
Kind (int) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000101 0.039447
Send/Recv 2 0.000102 0.078611
Send/Recv 4 0.000102 0.156829
Send/Recv 8 0.000104 0.306552
Send/Recv 16 0.000133 0.479641
Send/Recv 32 0.000134 0.952041
Send/Recv 64 0.000143 1.788469
Send/Recv 128 0.000162 3.169263
Send/Recv 256 0.000193 5.291991
Send/Recv 512 0.000258 7.933371
Send/Recv 1024 0.000373 10.974614
Send/Recv 2048 0.000615 13.312477
Send/Recv 4096 0.001082 15.135859
Send/Recv 8192 0.001713 19.126499
Send/Recv 16384 0.002748 23.850028
Send/Recv 32768 0.004868 26.923360
Send/Recv 65536 0.008943 29.312267
Send/Recv 131072 0.017087 30.682568
Send/Recv 262144 0.033327 31.463364
Send/Recv 524288 0.065625 31.956803
Send/Recv 1048576 0.130545 32.129234
Benchmarking point to point performance with contention
mpicc -o pingpong -O pingpong.c
Kind (np=2) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000103 0.077528
Send/Recv 2 0.000103 0.155322
Send/Recv 4 0.000106 0.301088
Send/Recv 8 0.000134 0.478318
Send/Recv 16 0.000135 0.947745
Send/Recv 32 0.000145 1.766161
Send/Recv 64 0.000164 3.122935
Send/Recv 128 0.000198 5.165893
Send/Recv 256 0.000261 7.840484
Send/Recv 512 0.000366 11.195081
Send/Recv 1024 0.000573 14.300114
Send/Recv 2048 0.001057 15.499556
Send/Recv 4096 0.001689 19.396951
Send/Recv 8192 0.002726 24.039763
Send/Recv 16384 0.004796 27.327446
Send/Recv 32768 0.008796 29.801621
Send/Recv 65536 0.016918 30.990318
Send/Recv 131072 0.033286 31.501788
Send/Recv 262144 0.065738 31.901664
Send/Recv 524288 0.130503 32.139417
Send/Recv 1048576 0.260385 32.216110
Kind (np=4) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000101 0.079398
Send/Recv 2 0.000101 0.158582
Send/Recv 4 0.000102 0.313930
Send/Recv 8 0.000132 0.484275
Send/Recv 16 0.000133 0.964484
Send/Recv 32 0.000146 1.757273
Send/Recv 64 0.000164 3.125636
Send/Recv 128 0.000199 5.148546
Send/Recv 256 0.000259 7.917527
Send/Recv 512 0.000355 11.543717
Send/Recv 1024 0.000569 14.399718
Send/Recv 2048 0.001066 15.366364
Send/Recv 4096 0.001708 19.185574
Send/Recv 8192 0.002733 23.981156
Send/Recv 16384 0.004823 27.174686
Send/Recv 32768 0.008872 29.547798
Send/Recv 65536 0.016968 30.898041
Send/Recv 131072 0.033270 31.517624
Send/Recv 262144 0.065520 32.007845
Send/Recv 524288 0.130516 32.136287
Send/Recv 1048576 0.260396 32.214839
Kind (np=8) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000100 0.080152
Send/Recv 2 0.000100 0.159942
Send/Recv 4 0.000102 0.315196
Send/Recv 8 0.000130 0.491317
Send/Recv 16 0.000131 0.973895
Send/Recv 32 0.000143 1.788934
Send/Recv 64 0.000162 3.164189
Send/Recv 128 0.000194 5.267974
Send/Recv 256 0.000259 7.907972
Send/Recv 512 0.000357 11.480220
Send/Recv 1024 0.000567 14.440011
Send/Recv 2048 0.001059 15.477046
Send/Recv 4096 0.001691 19.382897
Send/Recv 8192 0.002739 23.924250
Send/Recv 16384 0.004781 27.412678
Send/Recv 32768 0.008809 29.757938
Send/Recv 65536 0.016814 31.182307
Send/Recv 131072 0.033011 31.764021
Send/Recv 262144 0.065307 32.112307
Send/Recv 524288 0.130001 32.263520
Send/Recv 1048576 0.259408 32.337510
Kind (np=16) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000101 0.078902
Send/Recv 2 0.000102 0.157423
Send/Recv 4 0.000103 0.311658
Send/Recv 8 0.000133 0.479940
Send/Recv 16 0.000134 0.953829
Send/Recv 32 0.000145 1.761530
Send/Recv 64 0.000164 3.127529
Send/Recv 128 0.000195 5.251812
Send/Recv 256 0.000256 7.996487
Send/Recv 512 0.000364 11.260094
Send/Recv 1024 0.000609 13.456808
Send/Recv 2048 0.001228 13.340120
Send/Recv 4096 0.002042 16.049764
Send/Recv 8192 0.003675 17.832440
Send/Recv 16384 0.007427 17.647328
Send/Recv 32768 0.014478 18.106103
Send/Recv 65536 0.028325 18.509588
Send/Recv 131072 0.056647 18.510633
Send/Recv 262144 0.112816 18.589075
Send/Recv 524288 0.225635 18.588905
Send/Recv 1048576 0.451204 18.591620
Kind (np=32) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000105 0.076366
Send/Recv 2 0.000104 0.153164
Send/Recv 4 0.000107 0.299988
Send/Recv 8 0.000134 0.477105
Send/Recv 16 0.000135 0.945916
Send/Recv 32 0.000147 1.743323
Send/Recv 64 0.000161 3.171719
Send/Recv 128 0.000200 5.115524
Send/Recv 256 0.000259 7.895144
Send/Recv 512 0.000368 11.116459
Send/Recv 1024 0.000700 11.704738
Send/Recv 2048 0.001147 14.280638
Send/Recv 4096 0.002830 11.579004
Send/Recv 8192 0.005581 11.742646
Send/Recv 16384 0.010970 11.947800
Send/Recv 32768 0.021665 12.100017
Send/Recv 65536 0.042806 12.247974
Send/Recv 131072 0.085240 12.301449
Send/Recv 262144 0.172683 12.144521
Send/Recv 524288 0.345566 12.137488
Send/Recv 1048576 0.673554 12.454254
barrier
Benchmarking collective barrier
mpicc -o barrier -O barrier.c
Kind np time (sec)
Barrier 1 0.000002
Barrier 2 0.000148
Barrier 4 0.000306
Barrier 8 0.000524
Barrier 16 0.000755
Barrier 32 0.000992
Benchmarking collective Allreduce
mpicc -o barrier -O barrier.c
Kind np time (sec)
Allreduce 1 0.000017
Allreduce 2 0.000263
Allreduce 4 0.000476
Allreduce 8 0.000682
Allreduce 16 0.000910
Allreduce 32 0.001175
vector
Comparing the performance of MPI vector datatypes
mpicc -o vector -O vector.c
Kind n stride time (sec) Rate (MB/sec)
Vector 1000 24 0.001961 4.079713
Struct 1000 24 0.011425 0.700237
User 1000 24 0.001363 5.868871
User(add) 1000 24 0.001368 5.846339
circulate
Pipelining pitfalls
mpicc -c -O circulate.c
mpicc -o circulate -O circulate.o -lm
For n = 20000, m = 20000, T_comm = 0.012559, T_compute = 0.035706, sum = 0.048265, T_both = 0.044894
For n = 500, m = 500, T_comm = 0.000277, T_compute = 0.000885, sum = 0.001162, T_both = 0.001106
3way
Exploring the cost of synchronization delays
mpicc -c -O bad.c
mpicc -o bad -O bad.o -lm
[2] Litsize = 8, Time for first send = 0.000266, for second = 0.000098
[2] Litsize = 9, Time for first send = 0.000278, for second = 0.000097
[2] Litsize = 511, Time for first send = 0.000442, for second = 0.000304
[2] Litsize = 512, Time for first send = 0.000438, for second = 0.000298
[2] Litsize = 513, Time for first send = 0.000439, for second = 0.000303
jacobi
Jacobi Iteration - Example Parallel Mesh
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
send/recv: 6 iterations in 0.005936 secs (0.070760 MFlops); diffnorm 0.008134, m=7 n=4 np=1
send/recv: 7 iterations in 0.021281 secs (18.862072 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
send/recv: 24 iterations in 0.018214 secs (0.368939 MFlops); diffnorm 0.009895, m=7 n=10 np=4
send/recv: 25 iterations in 0.314889 secs (18.210837 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
send/recv: 25 iterations in 0.031620 secs (0.885511 MFlops); diffnorm 0.036615, m=7 n=34 np=16
send/recv: 25 iterations in 0.301708 secs (76.025757 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
send/recv: 25 iterations in 0.037695 secs (1.485607 MFlops); diffnorm 0.055291, m=7 n=66 np=32
Jacobi Iteration - Shift up and down
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
shift/sendrecv: 6 iterations in 0.002730 secs (0.153832 MFlops); diffnorm 0.008134, m=7 n=4 np=1
shift/sendrecv: 7 iterations in 0.021880 secs (18.346054 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
shift/sendrecv: 24 iterations in 0.021756 secs (0.308887 MFlops); diffnorm 0.009895, m=7 n=10 np=4
shift/sendrecv: 25 iterations in 0.268722 secs (21.339520 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
shift/sendrecv: 25 iterations in 0.034136 secs (0.820245 MFlops); diffnorm 0.036615, m=7 n=34 np=16
shift/sendrecv: 25 iterations in 0.285873 secs (80.236993 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
shift/sendrecv: 25 iterations in 0.054288 secs (1.031526 MFlops); diffnorm 0.055291, m=7 n=66 np=32
shift/sendrecv: 25 iterations in 0.300203 secs (152.814120 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Exchange head-to-head
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
head-to-head sendrecv: 6 iterations in 0.002771 secs (0.151548 MFlops); diffnorm 0.008134, m=7 n=4 np=1
head-to-head sendrecv: 7 iterations in 0.021468 secs (18.697882 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
head-to-head sendrecv: 24 iterations in 0.021695 secs (0.309754 MFlops); diffnorm 0.009895, m=7 n=10 np=4
head-to-head sendrecv: 25 iterations in 0.249516 secs (22.982075 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
head-to-head sendrecv: 25 iterations in 0.033141 secs (0.844874 MFlops); diffnorm 0.036615, m=7 n=34 np=16
head-to-head sendrecv: 25 iterations in 0.278263 secs (82.431243 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
head-to-head sendrecv: 25 iterations in 0.039664 secs (1.411877 MFlops); diffnorm 0.055291, m=7 n=66 np=32
head-to-head sendrecv: 25 iterations in 0.316599 secs (144.900061 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Nonblocking send/recv
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
irecv/isend: 6 iterations in 0.002726 secs (0.154070 MFlops); diffnorm 0.008134, m=7 n=4 np=1
irecv/isend: 7 iterations in 0.022221 secs (18.064211 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
irecv/isend: 24 iterations in 0.017761 secs (0.378368 MFlops); diffnorm 0.009895, m=7 n=10 np=4
irecv/isend: 25 iterations in 0.307407 secs (18.654102 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
irecv/isend: 25 iterations in 0.032045 secs (0.873775 MFlops); diffnorm 0.036615, m=7 n=34 np=16
irecv/isend: 25 iterations in 0.300050 secs (76.445983 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
irecv/isend: 25 iterations in 0.035585 secs (1.573682 MFlops); diffnorm 0.055291, m=7 n=66 np=32
irecv/isend: 25 iterations in 0.338172 secs (135.656221 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Nonblocking send/recv for receiver pull
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
isend/irecv: 6 iterations in 0.003004 secs (0.139797 MFlops); diffnorm 0.008134, m=7 n=4 np=1
isend/irecv: 7 iterations in 0.021650 secs (18.540700 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
isend/irecv: 24 iterations in 0.018179 secs (0.369664 MFlops); diffnorm 0.009895, m=7 n=10 np=4
isend/irecv: 25 iterations in 0.278133 secs (20.617508 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
isend/irecv: 25 iterations in 0.032580 secs (0.859411 MFlops); diffnorm 0.036615, m=7 n=34 np=16
isend/irecv: 25 iterations in 0.328086 secs (69.913382 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
isend/irecv: 25 iterations in 0.038458 secs (1.456116 MFlops); diffnorm 0.055291, m=7 n=66 np=32
isend/irecv: 25 iterations in 0.351832 secs (130.389606 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Synchronous send
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
ssend/irecv: 6 iterations in 0.006492 secs (0.064693 MFlops); diffnorm 0.008134, m=7 n=4 np=1
ssend/irecv: 7 iterations in 0.021361 secs (18.791519 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
ssend/irecv: 24 iterations in 0.035334 secs (0.190184 MFlops); diffnorm 0.009895, m=7 n=10 np=4
ssend/irecv: 25 iterations in 0.254951 secs (22.492183 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
ssend/irecv: 25 iterations in 0.043648 secs (0.641491 MFlops); diffnorm 0.036615, m=7 n=34 np=16
ssend/irecv: 25 iterations in 0.292154 secs (78.512132 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
ssend/irecv: 25 iterations in 0.053200 secs (1.052628 MFlops); diffnorm 0.055291, m=7 n=66 np=32
ssend/irecv: 25 iterations in 0.316913 secs (144.756367 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Ready send
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
rsend: 6 iterations in 0.000308 secs (1.362862 MFlops); diffnorm 0.008134, m=7 n=4 np=1
rsend: 7 iterations in 0.021335 secs (18.814552 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
/tmp/shfdEMpt0: No space left on device
/tmp/shfesNAN0: No space left on device
/tmp/shfQENHt0: No space left on device
rsend: 25 iterations in 0.279519 secs (82.061065 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
/etc/FRAMES/SP_Scheduler/Queue/tipei:102516471508:tipei:B:34:50:W not found
rsend: 25 iterations in 0.037582 secs (1.490080 MFlops); diffnorm 0.055291, m=7 n=66 np=32
rsend: 25 iterations in 0.292237 secs (156.979384 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Overlapping communication
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
isend/overlap: 6 iterations in 0.002733 secs (0.153676 MFlops); diffnorm 0.008134, m=7 n=4 np=1
isend/overlap: 7 iterations in 0.021322 secs (18.825891 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
isend/overlap: 24 iterations in 0.019064 secs (0.352500 MFlops); diffnorm 0.009895, m=7 n=10 np=4
isend/overlap: 25 iterations in 0.254702 secs (22.514132 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
isend/overlap: 25 iterations in 0.029156 secs (0.960341 MFlops); diffnorm 0.036615, m=7 n=34 np=16
isend/overlap: 25 iterations in 0.300965 secs (76.213583 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
isend/overlap: 25 iterations in 0.036606 secs (1.529791 MFlops); diffnorm 0.055291, m=7 n=66 np=32
isend/overlap: 25 iterations in 0.313245 secs (146.451511 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Overlapping communication (sends first)
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
send first/overlap: 6 iterations in 0.006601 secs (0.063622 MFlops); diffnorm 0.008134, m=7 n=4 np=1
send first/overlap: 7 iterations in 0.022270 secs (18.024425 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
send first/overlap: 24 iterations in 0.116756 secs (0.057556 MFlops); diffnorm 0.009895, m=7 n=10 np=4
send first/overlap: 25 iterations in 0.281508 secs (20.370304 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
send first/overlap: 25 iterations in 0.030827 secs (0.908292 MFlops); diffnorm 0.036615, m=7 n=34 np=16
send first/overlap: 25 iterations in 0.332135 secs (69.060976 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
send first/overlap: 25 iterations in 0.036385 secs (1.539115 MFlops); diffnorm 0.055291, m=7 n=66 np=32
send first/overlap: 25 iterations in 0.363885 secs (126.070625 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Persistent send/recv
mpicc -c -O jacobi.c
mpicc -c -O cmdline.c
mpicc -c -O setupmesh.c
mpicc -c -O exchng.c
mpicc -o jacobi -O jacobi.o cmdline.o setupmesh.o exchng.o -lm
persistent send/recv: 6 iterations in 0.000302 secs (1.390729 MFlops); diffnorm 0.008134, m=7 n=4 np=1
persistent send/recv: 7 iterations in 0.021336 secs (18.813296 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
persistent send/recv: 24 iterations in 0.016217 secs (0.414377 MFlops); diffnorm 0.009895, m=7 n=10 np=4
persistent send/recv: 25 iterations in 0.255769 secs (22.420229 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
persistent send/recv: 25 iterations in 0.028345 secs (0.987824 MFlops); diffnorm 0.036615, m=7 n=34 np=16
persistent send/recv: 25 iterations in 0.298587 secs (76.820440 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
persistent send/recv: 25 iterations in 0.035549 secs (1.575282 MFlops); diffnorm 0.055291, m=7 n=66 np=32
persistent send/recv: 25 iterations in 0.337602 secs (135.885582 MFlops); diffnorm 0.470684, m=4098 n=66 np=32