Results for MPI performance tests on anlspx (nopoll)
memcpy
Determining delivered memory performance
Target memcpy is up to date.
Size (bytes) Time (sec) Rate (MB/sec)
4 0.000000 15.613092
8 0.000000 31.221766
16 0.000000 55.669211
32 0.000001 46.609181
64 0.000001 66.798034
128 0.000002 85.263756
256 0.000003 98.939044
512 0.000005 107.566209
1024 0.000009 112.468054
2048 0.000018 115.090994
4096 0.000035 116.448907
8192 0.000070 117.139258
16384 0.000143 114.196600
32768 0.000563 58.247751
65536 0.001122 58.404997
131072 0.002246 58.348876
262144 0.004512 58.102506
524288 0.009135 57.392694
1048576 0.018317 57.246056
2097152 0.036738 57.084389
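The Rate column appears to be Size divided by Time, with 1 MB taken as 10^6 bytes. The sketch below is not part of the benchmark suite. It recomputes the rate for a few rows copied from the table above and reports where the reported rate drops most sharply. The function and variable names are illustrative assumptions. Rows whose printed time rounds to 0.000000 are left out because the time has too few digits to divide by.

```python
# Rows copied from the aligned memcpy table: (size in bytes, time in s, reported MB/s).
rows = [
    (8192, 0.000070, 117.139258),
    (16384, 0.000143, 114.196600),
    (32768, 0.000563, 58.247751),
    (2097152, 0.036738, 57.084389),
]

def rate_mb_per_sec(size_bytes, seconds):
    """Rate in MB/s, assuming 1 MB = 10**6 bytes."""
    return size_bytes / seconds / 1e6

recomputed = [rate_mb_per_sec(size, t) for size, t, _ in rows]

# Find the adjacent pair of rows with the largest drop in reported rate.
drops = [(rows[i][2] - rows[i + 1][2], rows[i][0], rows[i + 1][0])
         for i in range(len(rows) - 1)]
largest_drop = max(drops)

for (size, t, reported), calc in zip(rows, recomputed):
    print(f"{size:>8} bytes: reported {reported:8.3f}, recomputed {calc:8.3f} MB/s")
print("largest drop between sizes", largest_drop[1], "and", largest_drop[2])
```

The small differences between reported and recomputed rates come from the six-digit rounding of the printed times. The halving of the rate between 16384 and 32768 bytes would be consistent with the copy no longer fitting in a cache level, but the source does not state the cache sizes of the machine.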
Determining delivered memory performance with unaligned data
Target memcpy is up to date.
Size (bytes) Time (sec) Rate (MB/sec)
4 0.000000 9.600770
8 0.000000 18.491915
16 0.000000 32.223567
32 0.000001 35.664054
64 0.000001 53.442179
128 0.000002 70.932362
256 0.000003 84.811184
512 0.000005 94.007348
1024 0.000010 99.397056
2048 0.000020 102.330468
4096 0.000039 103.861556
8192 0.000078 104.644125
16384 0.000160 102.084971
32768 0.000494 66.357171
65536 0.000986 66.483666
131072 0.001971 66.499975
262144 0.003946 66.438735
524288 0.007998 65.555461
1048576 0.016075 65.231249
2097152 0.032440 64.647949
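To make the aligned and unaligned memcpy tables easier to compare, the following sketch computes the ratio of the two reported rates for a handful of sizes. It is an illustration only. The rows are copied from the two tables, and the helper name is an assumption.

```python
# (size in bytes, aligned MB/s, unaligned MB/s), copied from the two memcpy tables.
pairs = [
    (4, 15.613092, 9.600770),
    (1024, 112.468054, 99.397056),
    (16384, 114.196600, 102.084971),
    (32768, 58.247751, 66.357171),
    (2097152, 57.084389, 64.647949),
]

def unaligned_over_aligned(aligned, unaligned):
    """Ratio below 1.0 means the unaligned copy was slower."""
    return unaligned / aligned

ratios = {size: unaligned_over_aligned(a, u) for size, a, u in pairs}
for size, ratio in ratios.items():
    print(f"{size:>8} bytes: unaligned/aligned = {ratio:.2f}")
```

In this run the unaligned copy is slower for every size up to 16384 bytes and faster from 32768 bytes upward. The source gives no explanation for the reversal.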
pingpong
Benchmarking point to point performance
Target pingpong is up to date.
Kind n time (sec) Rate (MB/sec)
Send/Recv 1 0.000113 0.071106
Send/Recv 2 0.000112 0.142856
Send/Recv 4 0.000114 0.280479
Send/Recv 8 0.000356 0.179938
Send/Recv 16 0.000296 0.432699
Send/Recv 32 0.000239 1.072867
Send/Recv 64 0.000220 2.329859
Send/Recv 128 0.000224 4.572158
Send/Recv 256 0.000282 7.254372
Send/Recv 512 0.000386 10.606936
Send/Recv 1024 0.000586 13.985489
Send/Recv 2048 0.001079 15.186716
Send/Recv 4096 0.001693 19.350134
Send/Recv 8192 0.002841 23.071283
Send/Recv 16384 0.004899 26.757443
Send/Recv 32768 0.008930 29.356294
Send/Recv 65536 0.016871 31.076124
Send/Recv 131072 0.033063 31.714904
Send/Recv 262144 0.065363 32.084886
Send/Recv 524288 0.129922 32.283269
Send/Recv 1048576 0.259748 32.295197
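A common way to summarize a table like this is a two-parameter model, T = latency + bytes / bandwidth. The sketch below fits that model through the smallest and largest rows only, so it is a rough estimate and not the method used by the benchmark. It assumes each element is 8 bytes, which is inferred from the reported rates (n times 8 bytes divided by the time reproduces the Rate column) and is not stated in the source.

```python
BYTES_PER_ELEMENT = 8  # assumption inferred from the reported rates, not stated in the source

# (n, time in s) for the smallest and largest messages in the Send/Recv table.
small_n, small_t = 1, 0.000113
large_n, large_t = 1048576, 0.259748

def fit_two_point(n1, t1, n2, t2, elem_bytes=BYTES_PER_ELEMENT):
    """Fit T = latency + bytes / bandwidth through two measurements."""
    b1, b2 = n1 * elem_bytes, n2 * elem_bytes
    bandwidth = (b2 - b1) / (t2 - t1)   # bytes per second
    latency = t1 - b1 / bandwidth       # seconds
    return latency, bandwidth

latency, bandwidth = fit_two_point(small_n, small_t, large_n, large_t)
half_size = latency * bandwidth  # bytes at which the model reaches half the peak rate

print(f"latency   ~ {latency * 1e6:.0f} microseconds")
print(f"bandwidth ~ {bandwidth / 1e6:.1f} MB/s")
print(f"half-performance message size ~ {half_size:.0f} bytes")
```

The model is optimistic for mid-size messages: the table reaches half of the peak rate only between n = 2048 and n = 4096, and the time jumps between n = 4 and n = 8, which a straight line cannot capture.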
Benchmarking point to point performance with nonblocking operations
Target pingpong is up to date.
Kind n time (sec) Rate (MB/sec)
Isend/Irecv 1 0.000124 0.064615
Isend/Irecv 2 0.000123 0.130093
Isend/Irecv 4 0.000125 0.255099
Isend/Irecv 8 0.000352 0.181587
Isend/Irecv 16 0.000299 0.427617
Isend/Irecv 32 0.000255 1.005344
Isend/Irecv 64 0.000223 2.292965
Isend/Irecv 128 0.000234 4.370932
Isend/Irecv 256 0.000294 6.973398
Isend/Irecv 512 0.000393 10.415106
Isend/Irecv 1024 0.000597 13.728842
Isend/Irecv 2048 0.001090 15.030847
Isend/Irecv 4096 0.001777 18.437864
Isend/Irecv 8192 0.002806 23.355353
Isend/Irecv 16384 0.004897 26.766801
Isend/Irecv 32768 0.008954 29.275154
Isend/Irecv 65536 0.017182 30.514504
Isend/Irecv 131072 0.033315 31.474733
Isend/Irecv 262144 0.065434 32.049668
Isend/Irecv 524288 0.130724 32.085095
Isend/Irecv 1048576 0.260616 32.187671
Benchmarking point to point performance with nonblocking operations, head-to-head
Target pingpong is up to date.
Kind n time (sec) Rate (MB/sec)
head-to-head Isend/Irecv 1 0.000292 0.054800
head-to-head Isend/Irecv 2 0.000294 0.108787
head-to-head Isend/Irecv 4 0.000296 0.216286
head-to-head Isend/Irecv 8 0.000568 0.225397
head-to-head Isend/Irecv 16 0.000541 0.472838
head-to-head Isend/Irecv 32 0.000551 0.928446
head-to-head Isend/Irecv 64 0.000514 1.992031
head-to-head Isend/Irecv 128 0.000573 3.571522
head-to-head Isend/Irecv 256 0.000611 6.704130
head-to-head Isend/Irecv 512 0.000530 15.471194
head-to-head Isend/Irecv 1024 0.000956 17.137628
head-to-head Isend/Irecv 2048 0.001948 16.823298
head-to-head Isend/Irecv 4096 0.003492 18.765319
head-to-head Isend/Irecv 8192 0.005022 26.098782
head-to-head Isend/Irecv 16384 0.008555 30.641662
head-to-head Isend/Irecv 32768 0.016558 31.663680
head-to-head Isend/Irecv 65536 0.031551 33.233819
head-to-head Isend/Irecv 131072 0.061299 34.211791
head-to-head Isend/Irecv 262144 0.120522 34.801192
head-to-head Isend/Irecv 524288 0.239733 34.991432
head-to-head Isend/Irecv 1048576 0.477629 35.126065
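The head-to-head rates are not directly comparable with the two tables before them. The check below suggests that the head-to-head Rate column counts the bytes moving in both directions. It is a sketch, assuming 8-byte elements and 1 MB = 10^6 bytes, both inferred from the numbers and not stated in the source.

```python
BYTES_PER_ELEMENT = 8  # assumption inferred from the reported rates

def rate_mb(n, seconds, directions):
    """MB/s (1 MB = 10**6 bytes) for n elements moved in `directions` directions."""
    return directions * n * BYTES_PER_ELEMENT / seconds / 1e6

# Largest head-to-head row: n, time in s, reported MB/s.
n, t, reported = 1048576, 0.477629, 35.126065

one_way = rate_mb(n, t, directions=1)
two_way = rate_mb(n, t, directions=2)
print(f"one direction:  {one_way:.3f} MB/s")
print(f"two directions: {two_way:.3f} MB/s (reported {reported:.3f})")
```

Under that reading, each direction sustains roughly half of the reported figure for large messages, which is lower than the one-way rate in the Send/Recv and Isend/Irecv tables.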
Benchmarking point to point performance with unaligned data
Target pingpong is up to date.
Kind (char) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000112 0.008947
Send/Recv 2 0.000111 0.017974
Send/Recv 4 0.000110 0.036210
Send/Recv 8 0.000111 0.071843
Send/Recv 16 0.000112 0.143494
Send/Recv 32 0.000113 0.282777
Send/Recv 64 0.000317 0.201682
Send/Recv 128 0.000281 0.454894
Send/Recv 256 0.000226 1.134039
Send/Recv 512 0.000184 2.779777
Send/Recv 1024 0.000221 4.637421
Send/Recv 2048 0.000282 7.268531
Send/Recv 4096 0.000395 10.381119
Send/Recv 8192 0.000627 13.071383
Send/Recv 16384 0.001111 14.753384
Send/Recv 32768 0.001689 19.396668
Send/Recv 65536 0.002962 22.127271
Send/Recv 131072 0.004974 26.353215
Send/Recv 262144 0.008971 29.220618
Send/Recv 524288 0.017196 30.489439
Send/Recv 1048576 0.033199 31.584685
Kind (double) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000112 0.071378
Send/Recv 2 0.000112 0.142912
Send/Recv 4 0.000114 0.280235
Send/Recv 8 0.000341 0.187649
Send/Recv 16 0.000302 0.424163
Send/Recv 32 0.000250 1.022904
Send/Recv 64 0.000226 2.261026
Send/Recv 128 0.000223 4.598335
Send/Recv 256 0.000282 7.253088
Send/Recv 512 0.000397 10.323556
Send/Recv 1024 0.000642 12.755651
Send/Recv 2048 0.001111 14.742430
Send/Recv 4096 0.001752 18.705868
Send/Recv 8192 0.002915 22.478669
Send/Recv 16384 0.004850 27.022926
Send/Recv 32768 0.008952 29.282062
Send/Recv 65536 0.016969 30.897586
Send/Recv 131072 0.033306 31.483238
Send/Recv 262144 0.065622 31.958215
Send/Recv 524288 0.130447 32.153310
Send/Recv 1048576 0.259607 32.312778
Kind (int) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000111 0.036019
Send/Recv 2 0.000112 0.071523
Send/Recv 4 0.000112 0.143276
Send/Recv 8 0.000114 0.281623
Send/Recv 16 0.000340 0.188451
Send/Recv 32 0.000308 0.416028
Send/Recv 64 0.000239 1.072334
Send/Recv 128 0.000191 2.683564
Send/Recv 256 0.000220 4.659045
Send/Recv 512 0.000284 7.211904
Send/Recv 1024 0.000399 10.263735
Send/Recv 2048 0.000634 12.915787
Send/Recv 4096 0.001104 14.834701
Send/Recv 8192 0.001717 19.084449
Send/Recv 16384 0.002696 24.308267
Send/Recv 32768 0.004917 26.656362
Send/Recv 65536 0.009024 29.049926
Send/Recv 131072 0.017124 30.618017
Send/Recv 262144 0.033281 31.506509
Send/Recv 524288 0.065548 31.993916
Send/Recv 1048576 0.130628 32.108758
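The n column in these three tables counts elements, not bytes, so the same n means a different message size for each datatype. As a sanity check, the sketch below back-computes the element size implied by the largest row of each table, assuming the rate is n times the element size divided by the time with 1 MB = 10^6 bytes. The helper name is an assumption.

```python
# Largest row of each datatype table: n, time in s, reported MB/s.
largest_rows = {
    "char":   (1048576, 0.033199, 31.584685),
    "double": (1048576, 0.259607, 32.312778),
    "int":    (1048576, 0.130628, 32.108758),
}

def implied_element_bytes(n, seconds, rate_mb):
    """Bytes per element implied by rate = n * bytes / time, with 1 MB = 10**6 bytes."""
    return rate_mb * 1e6 * seconds / n

implied = {kind: implied_element_bytes(*row) for kind, row in largest_rows.items()}
for kind, size in implied.items():
    print(f"{kind:>6}: {size:.3f} bytes per element")
```

When comparing rows across the three tables, it is therefore more meaningful to match them by byte count (for example, char at n = 8192 against double at n = 1024) than by n.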
Benchmarking point to point performance with contention
Target pingpong is up to date.
Kind (np=2) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000115 0.069828
Send/Recv 2 0.000115 0.139731
Send/Recv 4 0.000116 0.274702
Send/Recv 8 0.000322 0.198944
Send/Recv 16 0.000313 0.408505
Send/Recv 32 0.000244 1.047987
Send/Recv 64 0.000209 2.452342
Send/Recv 128 0.000225 4.546060
Send/Recv 256 0.000285 7.184914
Send/Recv 512 0.000395 10.377175
Send/Recv 1024 0.000611 13.411372
Send/Recv 2048 0.001088 15.056573
Send/Recv 4096 0.001749 18.735811
Send/Recv 8192 0.002798 23.420980
Send/Recv 16384 0.004907 26.708847
Send/Recv 32768 0.008946 29.304034
Send/Recv 65536 0.017120 30.624456
Send/Recv 131072 0.033511 31.290677
Send/Recv 262144 0.065813 31.865467
Send/Recv 524288 0.130850 32.054178
Send/Recv 1048576 0.261148 32.122086
Kind (np=4) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000112 0.071461
Send/Recv 2 0.000112 0.143366
Send/Recv 4 0.000114 0.281136
Send/Recv 8 0.000337 0.189959
Send/Recv 16 0.000306 0.418062
Send/Recv 32 0.000258 0.992741
Send/Recv 64 0.000217 2.357736
Send/Recv 128 0.000221 4.629930
Send/Recv 256 0.000282 7.255014
Send/Recv 512 0.000382 10.724268
Send/Recv 1024 0.000593 13.813920
Send/Recv 2048 0.001095 14.962045
Send/Recv 4096 0.001778 18.428791
Send/Recv 8192 0.002671 24.534751
Send/Recv 16384 0.004856 26.992735
Send/Recv 32768 0.008814 29.742745
Send/Recv 65536 0.017050 30.750863
Send/Recv 131072 0.033173 31.609452
Send/Recv 262144 0.065441 32.046448
Send/Recv 524288 0.130619 32.110955
Send/Recv 1048576 0.260107 32.250657
Kind (np=8) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000111 0.072170
Send/Recv 2 0.000111 0.144312
Send/Recv 4 0.000114 0.280537
Send/Recv 8 0.000335 0.191142
Send/Recv 16 0.000294 0.435595
Send/Recv 32 0.000222 1.150887
Send/Recv 64 0.000214 2.397228
Send/Recv 128 0.000223 4.588328
Send/Recv 256 0.000283 7.236003
Send/Recv 512 0.000382 10.731995
Send/Recv 1024 0.000601 13.625513
Send/Recv 2048 0.001088 15.063322
Send/Recv 4096 0.001760 18.613819
Send/Recv 8192 0.002895 22.635697
Send/Recv 16384 0.004823 27.175460
Send/Recv 32768 0.008927 29.363981
Send/Recv 65536 0.016978 30.880638
Send/Recv 131072 0.032977 31.796915
Send/Recv 262144 0.065325 32.103182
Send/Recv 524288 0.130097 32.239892
Send/Recv 1048576 0.259485 32.327924
Kind (np=16) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000111 0.071839
Send/Recv 2 0.000112 0.143416
Send/Recv 4 0.000115 0.277887
Send/Recv 8 0.000346 0.184755
Send/Recv 16 0.000296 0.433012
Send/Recv 32 0.000248 1.032221
Send/Recv 64 0.000205 2.498668
Send/Recv 128 0.000221 4.625299
Send/Recv 256 0.000284 7.219210
Send/Recv 512 0.000389 10.519421
Send/Recv 1024 0.000604 13.569370
Send/Recv 2048 0.001086 15.092809
Send/Recv 4096 0.001732 18.914255
Send/Recv 8192 0.002885 22.716511
Send/Recv 16384 0.004845 27.050812
Send/Recv 32768 0.009027 29.039509
Send/Recv 65536 0.017389 30.149883
Send/Recv 131072 0.033829 30.996673
Send/Recv 262144 0.066899 31.348032
Send/Recv 524288 0.133754 31.358298
Send/Recv 1048576 0.267101 31.406183
Kind (np=32) n time (sec) Rate (MB/sec)
Send/Recv 1 0.000113 0.070770
Send/Recv 2 0.000114 0.140738
Send/Recv 4 0.000116 0.276265
Send/Recv 8 0.000346 0.184863
Send/Recv 16 0.000285 0.449718
Send/Recv 32 0.000256 1.000072
Send/Recv 64 0.000224 2.285102
Send/Recv 128 0.000222 4.609387
Send/Recv 256 0.000285 7.193642
Send/Recv 512 0.000392 10.449645
Send/Recv 1024 0.000624 13.137414
Send/Recv 2048 0.001495 10.960206
Send/Recv 4096 0.003027 10.826804
Send/Recv 8192 0.005712 11.473942
Send/Recv 16384 0.011345 11.553665
Send/Recv 32768 0.022138 11.841620
Send/Recv 65536 0.044420 11.802869
Send/Recv 131072 0.087207 12.023980
Send/Recv 262144 0.178001 11.781714
Send/Recv 524288 0.355826 11.787521
Send/Recv 1048576 0.709474 11.823697
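The effect of contention is easiest to see by normalizing the large-message rate against the np=2 run. The following sketch does that for n = 1048576, using the values copied from the five tables above. The variable names are illustrative.

```python
# Rate in MB/s for n = 1048576, copied from each np table above.
rate_at_largest = {2: 32.122086, 4: 32.250657, 8: 32.327924, 16: 31.406183, 32: 11.823697}

baseline = rate_at_largest[2]
relative = {np_count: rate / baseline for np_count, rate in rate_at_largest.items()}
for np_count, frac in relative.items():
    print(f"np={np_count:>2}: {frac:.2f} of the np=2 rate")
```

Up to np=16 the rate stays within a few percent of the np=2 value. At np=32 it falls to about 37 percent, and the np=32 table shows the degradation starting at n = 2048, while smaller messages are largely unaffected.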
barrier
Benchmarking collective barrier
Target barrier is up to date.
Kind np time (sec)
Barrier 1 0.000002
Barrier 2 0.000266
Barrier 4 0.000699
Barrier 8 0.001198
Barrier 16 0.001706
Barrier 32 0.002223
Benchmarking collective Allreduce
Target barrier is up to date.
Kind np time (sec)
Allreduce 1 0.000017
Allreduce 2 0.000287
Allreduce 4 0.000537
Allreduce 8 0.000814
Allreduce 16 0.001102
Allreduce 32 0.001469
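One way to read the two collective tables is to look at how much time each doubling of np adds. The sketch below computes those increments from the values above. It is an illustration, and the function name is an assumption.

```python
# Time in seconds keyed by np, copied from the Barrier and Allreduce tables.
barrier = {2: 0.000266, 4: 0.000699, 8: 0.001198, 16: 0.001706, 32: 0.002223}
allreduce = {2: 0.000287, 4: 0.000537, 8: 0.000814, 16: 0.001102, 32: 0.001469}

def cost_per_doubling(times):
    """Extra seconds each time np doubles, keyed by the larger np."""
    counts = sorted(times)
    return {hi: times[hi] - times[lo] for lo, hi in zip(counts, counts[1:])}

barrier_steps = cost_per_doubling(barrier)
allreduce_steps = cost_per_doubling(allreduce)
for np_count in barrier_steps:
    print(f"np={np_count:>2}: barrier +{barrier_steps[np_count] * 1e6:.0f} us, "
          f"allreduce +{allreduce_steps[np_count] * 1e6:.0f} us")
```

A roughly constant increment per doubling is what an algorithm with a logarithmic number of steps would produce. The source does not say which algorithm the MPI implementation uses, so this is only an observation about the numbers.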
vector
Comparing the performance of MPI vector datatypes
Target vector is up to date.
Kind n stride time (sec) Rate (MB/sec)
Vector 1000 24 0.002021 3.958464
Struct 1000 24 0.011546 0.692904
User 1000 24 0.001412 5.666553
User(add) 1000 24 0.001413 5.660011
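The four rows move the same amount of data, so the times can be compared directly. The sketch below expresses each time as a multiple of the fastest one, using the values from the table above. The names are illustrative.

```python
# Time in seconds for n = 1000, stride = 24, copied from the table above.
times = {"Vector": 0.002021, "Struct": 0.011546, "User": 0.001412, "User(add)": 0.001413}

fastest = min(times, key=times.get)
slowdown = {kind: t / times[fastest] for kind, t in times.items()}
for kind, factor in slowdown.items():
    print(f"{kind:>9}: {factor:.2f}x the time of {fastest}")
```

In this run the Struct row takes about eight times as long as the User row, and the Vector row about 1.4 times as long.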
circulate
Pipelining pitfalls
Target circulate is up to date.
For n = 20000, m = 20000, T_comm = 0.012439, T_compute = 0.035924, sum = 0.048363, T_both = 0.042928
For n = 500, m = 500, T_comm = 0.000311, T_compute = 0.000885, sum = 0.001196, T_both = 0.001459
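The two lines above can be read against two reference points: the serial sum T_comm + T_compute (no overlap) and max(T_comm, T_compute) (perfect overlap). The sketch below computes both, plus the fraction of the serial time that the combined run saved. It is an illustration of the arithmetic, and the function name is an assumption.

```python
# (label, T_comm, T_compute, T_both) in seconds, copied from the two lines above.
runs = [
    ("n = m = 20000", 0.012439, 0.035924, 0.042928),
    ("n = m = 500",   0.000311, 0.000885, 0.001459),
]

def overlap_summary(t_comm, t_compute, t_both):
    serial = t_comm + t_compute          # no overlap at all
    ideal = max(t_comm, t_compute)       # perfect overlap
    saving = (serial - t_both) / serial  # negative means the combined run cost time
    return serial, ideal, saving

results = {label: overlap_summary(c, p, b) for label, c, p, b in runs}
for label, (serial, ideal, saving) in results.items():
    print(f"{label}: serial {serial:.6f}, ideal {ideal:.6f}, saving {saving:+.1%}")
```

For the large case the combined run saves about 11 percent of the serial time but remains well above the ideal. For the small case it is slower than doing the two phases one after the other.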
3way
Exploring the cost of synchronization delays
Target bad is up to date.
[2] Litsize = 8, Time for first send = 0.000301, for second = 0.000119
[2] Litsize = 9, Time for first send = 0.000298, for second = 0.000118
[2] Litsize = 511, Time for first send = 0.000487, for second = 0.000343
[2] Litsize = 512, Time for first send = 0.000482, for second = 0.000423
[2] Litsize = 513, Time for first send = 0.001217, for second = 0.000347
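The first-send time is not smooth in Litsize. The sketch below finds the row-to-row step with the largest relative increase, using the five lines above. It only describes the numbers and makes no claim about the cause. The variable names are illustrative.

```python
# (Litsize, first send in s, second send in s), copied from the lines above.
samples = [
    (8,   0.000301, 0.000119),
    (9,   0.000298, 0.000118),
    (511, 0.000487, 0.000343),
    (512, 0.000482, 0.000423),
    (513, 0.001217, 0.000347),
]

# Ratio of first-send time between each pair of adjacent rows.
jumps = [(b[0], b[1] / a[1]) for a, b in zip(samples, samples[1:])]
size_of_largest_jump, factor = max(jumps, key=lambda item: item[1])
print(f"largest jump in first-send time at Litsize = {size_of_largest_jump} "
      f"({factor:.2f}x the previous row)")
```

A step of this kind between 512 and 513 would be consistent with the implementation switching message protocol at that size, but the source does not state this, so it should be treated as a hypothesis.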
jacobi
Jacobi Iteration - Example Parallel Mesh
Target jacobi is up to date.
send/recv: 6 iterations in 0.002572 secs (0.163302 MFlops); diffnorm 0.008134, m=7 n=4 np=1
send/recv: 7 iterations in 0.021552 secs (18.625460 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
send/recv: 24 iterations in 0.031917 secs (0.210547 MFlops); diffnorm 0.009895, m=7 n=10 np=4
send/recv: 25 iterations in 0.354430 secs (16.179230 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
send/recv: 25 iterations in 0.055101 secs (0.508158 MFlops); diffnorm 0.036615, m=7 n=34 np=16
send/recv: 25 iterations in 0.414272 secs (55.368390 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
send/recv: 25 iterations in 0.068831 secs (0.813589 MFlops); diffnorm 0.055291, m=7 n=66 np=32
send/recv: 25 iterations in 0.415681 secs (110.361452 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
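To see how the aggregate rate grows with the process count, the sketch below divides the reported MFlops for m = 4098 by the np = 1 value. Note that n grows with np in these runs (4, 10, 34, 66), so this is a ratio of rates on a growing problem and not a fixed-size speedup. The function name is an assumption.

```python
# Reported MFlops for m = 4098, keyed by np, copied from the send/recv lines above.
mflops = {1: 18.625460, 4: 16.179230, 16: 55.368390, 32: 110.361452}

def rate_ratio(np_count):
    """Aggregate MFlops relative to the np = 1 run."""
    return mflops[np_count] / mflops[1]

for np_count in mflops:
    ratio = rate_ratio(np_count)
    print(f"np={np_count:>2}: {ratio:.2f}x the np=1 rate, {ratio / np_count:.2f} per process")
```

The np = 4 run is slower in aggregate than the single-process run, and at np = 32 the aggregate rate is a little under six times the np = 1 rate.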
Jacobi Iteration - Shift up and down
Target jacobi is up to date.
shift/sendrecv: 6 iterations in 0.002747 secs (0.152904 MFlops); diffnorm 0.008134, m=7 n=4 np=1
shift/sendrecv: 7 iterations in 0.021888 secs (18.339139 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
shift/sendrecv: 24 iterations in 0.033650 secs (0.199700 MFlops); diffnorm 0.009895, m=7 n=10 np=4
shift/sendrecv: 25 iterations in 0.329892 secs (17.382638 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
shift/sendrecv: 25 iterations in 0.053200 secs (0.526313 MFlops); diffnorm 0.036615, m=7 n=34 np=16
shift/sendrecv: 25 iterations in 0.372010 secs (61.658632 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
shift/sendrecv: 25 iterations in 0.068540 secs (0.817045 MFlops); diffnorm 0.055291, m=7 n=66 np=32
shift/sendrecv: 25 iterations in 0.388944 secs (117.948168 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Exchange head-to-head
Target jacobi is up to date.
head-to-head sendrecv: 6 iterations in 0.002780 secs (0.151052 MFlops); diffnorm 0.008134, m=7 n=4 np=1
head-to-head sendrecv: 7 iterations in 0.022213 secs (18.070615 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
head-to-head sendrecv: 24 iterations in 0.036451 secs (0.184357 MFlops); diffnorm 0.009895, m=7 n=10 np=4
head-to-head sendrecv: 25 iterations in 0.314373 secs (18.240771 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
head-to-head sendrecv: 25 iterations in 0.058582 secs (0.477964 MFlops); diffnorm 0.036615, m=7 n=34 np=16
head-to-head sendrecv: 25 iterations in 0.370915 secs (61.840575 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
head-to-head sendrecv: 25 iterations in 0.068585 secs (0.816505 MFlops); diffnorm 0.055291, m=7 n=66 np=32
head-to-head sendrecv: 25 iterations in 0.367531 secs (124.819888 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Nonblocking send/recv
Target jacobi is up to date.
irecv/isend: 6 iterations in 0.002634 secs (0.159471 MFlops); diffnorm 0.008134, m=7 n=4 np=1
irecv/isend: 7 iterations in 0.021806 secs (18.407870 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
irecv/isend: 24 iterations in 0.031714 secs (0.211891 MFlops); diffnorm 0.009895, m=7 n=10 np=4
irecv/isend: 25 iterations in 0.307855 secs (18.626934 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
irecv/isend: 25 iterations in 0.051460 secs (0.544115 MFlops); diffnorm 0.036615, m=7 n=34 np=16
irecv/isend: 25 iterations in 0.350507 secs (65.441241 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
irecv/isend: 25 iterations in 0.060896 secs (0.919604 MFlops); diffnorm 0.055291, m=7 n=66 np=32
irecv/isend: 25 iterations in 0.368205 secs (124.591481 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Nonblocking send/recv for receiver pull
Target jacobi is up to date.
isend/irecv: 6 iterations in 0.002911 secs (0.144278 MFlops); diffnorm 0.008134, m=7 n=4 np=1
isend/irecv: 7 iterations in 0.021792 secs (18.420055 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
isend/irecv: 24 iterations in 0.033644 secs (0.199736 MFlops); diffnorm 0.009895, m=7 n=10 np=4
isend/irecv: 25 iterations in 0.332836 secs (17.228921 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
isend/irecv: 25 iterations in 0.071952 secs (0.389151 MFlops); diffnorm 0.036615, m=7 n=34 np=16
isend/irecv: 25 iterations in 0.371885 secs (61.679361 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
isend/irecv: 25 iterations in 0.062465 secs (0.896496 MFlops); diffnorm 0.055291, m=7 n=66 np=32
isend/irecv: 25 iterations in 0.406233 secs (112.928350 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Synchronous send
Target jacobi is up to date.
ssend/irecv: 6 iterations in 0.006455 secs (0.065066 MFlops); diffnorm 0.008134, m=7 n=4 np=1
ssend/irecv: 7 iterations in 0.021829 secs (18.388538 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
ssend/irecv: 24 iterations in 0.063690 secs (0.105512 MFlops); diffnorm 0.009895, m=7 n=10 np=4
ssend/irecv: 25 iterations in 0.296673 secs (19.329001 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
ssend/irecv: 25 iterations in 0.080165 secs (0.349281 MFlops); diffnorm 0.036615, m=7 n=34 np=16
ssend/irecv: 25 iterations in 0.343747 secs (66.728064 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
ssend/irecv: 25 iterations in 0.088993 secs (0.629261 MFlops); diffnorm 0.055291, m=7 n=66 np=32
ssend/irecv: 25 iterations in 0.354813 secs (129.293886 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Ready send
Target jacobi is up to date.
rsend: 6 iterations in 0.000314 secs (1.336834 MFlops); diffnorm 0.008134, m=7 n=4 np=1
rsend: 7 iterations in 0.021740 secs (18.464093 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
rsend: 24 iterations in 0.030727 secs (0.218701 MFlops); diffnorm 0.009895, m=7 n=10 np=4
rsend: 25 iterations in 0.335406 secs (17.096892 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
rsend: 25 iterations in 0.052040 secs (0.538049 MFlops); diffnorm 0.036615, m=7 n=34 np=16
rsend: 25 iterations in 0.374883 secs (61.186044 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
rsend: 25 iterations in 0.062123 secs (0.901438 MFlops); diffnorm 0.055291, m=7 n=66 np=32
rsend: 25 iterations in 0.382195 secs (120.030741 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Overlapping communication
Target jacobi is up to date.
isend/overlap: 6 iterations in 0.002704 secs (0.155321 MFlops); diffnorm 0.008134, m=7 n=4 np=1
isend/overlap: 7 iterations in 0.021670 secs (18.523908 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
isend/overlap: 24 iterations in 0.033490 secs (0.200655 MFlops); diffnorm 0.009895, m=7 n=10 np=4
isend/overlap: 25 iterations in 0.301209 secs (19.037947 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
isend/overlap: 25 iterations in 0.062912 secs (0.445069 MFlops); diffnorm 0.036615, m=7 n=34 np=16
isend/overlap: 25 iterations in 0.356756 secs (64.294906 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
isend/overlap: 25 iterations in 0.062354 secs (0.898095 MFlops); diffnorm 0.055291, m=7 n=66 np=32
isend/overlap: 25 iterations in 0.369554 secs (124.136813 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Overlapping communication (sends first)
Target jacobi is up to date.
send first/overlap: 6 iterations in 0.002756 secs (0.152399 MFlops); diffnorm 0.008134, m=7 n=4 np=1
send first/overlap: 7 iterations in 0.021358 secs (18.794137 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
send first/overlap: 24 iterations in 0.032777 secs (0.205023 MFlops); diffnorm 0.009895, m=7 n=10 np=4
send first/overlap: 25 iterations in 0.331630 secs (17.291538 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
send first/overlap: 25 iterations in 0.052683 secs (0.531483 MFlops); diffnorm 0.036615, m=7 n=34 np=16
send first/overlap: 25 iterations in 0.368361 secs (62.269384 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
send first/overlap: 25 iterations in 0.065713 secs (0.852188 MFlops); diffnorm 0.055291, m=7 n=66 np=32
send first/overlap: 25 iterations in 0.380592 secs (120.536348 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
Jacobi Iteration - Persistent send/recv
Target jacobi is up to date.
persistent send/recv: 6 iterations in 0.000302 secs (1.392919 MFlops); diffnorm 0.008134, m=7 n=4 np=1
persistent send/recv: 7 iterations in 0.021310 secs (18.836846 MFlops); diffnorm 0.006899, m=4098 n=4 np=1
persistent send/recv: 24 iterations in 0.030315 secs (0.221672 MFlops); diffnorm 0.009895, m=7 n=10 np=4
persistent send/recv: 25 iterations in 0.301785 secs (19.001604 MFlops); diffnorm 0.138820, m=4098 n=10 np=4
persistent send/recv: 25 iterations in 0.051264 secs (0.546197 MFlops); diffnorm 0.036615, m=7 n=34 np=16
persistent send/recv: 25 iterations in 0.363528 secs (63.097277 MFlops); diffnorm 0.468864, m=4098 n=34 np=16
persistent send/recv: 25 iterations in 0.061367 secs (0.912542 MFlops); diffnorm 0.055291, m=7 n=66 np=32
persistent send/recv: 25 iterations in 0.358814 secs (127.852233 MFlops); diffnorm 0.470684, m=4098 n=66 np=32
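The ten Jacobi variants can be ranked on their largest run. The sketch below sorts the reported MFlops for m = 4098, n = 66, np = 32 and prints the spread between the best and worst variant. It is an illustration. Each value comes from a single timing, so small differences between variants may be within run-to-run noise.

```python
# Reported MFlops at m = 4098, n = 66, np = 32 for each variant above.
mflops_np32 = {
    "send/recv": 110.361452,
    "shift/sendrecv": 117.948168,
    "head-to-head sendrecv": 124.819888,
    "irecv/isend": 124.591481,
    "isend/irecv": 112.928350,
    "ssend/irecv": 129.293886,
    "rsend": 120.030741,
    "isend/overlap": 124.136813,
    "send first/overlap": 120.536348,
    "persistent send/recv": 127.852233,
}

ranked = sorted(mflops_np32.items(), key=lambda item: item[1], reverse=True)
best_name, best_rate = ranked[0]
worst_name, worst_rate = ranked[-1]
spread = (best_rate - worst_rate) / worst_rate

for name, rate in ranked:
    print(f"{name:<22} {rate:8.2f} MFlops")
print(f"spread between best and worst: {spread:.1%}")
```

On this run the synchronous-send variant reports the highest rate and the plain send/recv variant the lowest, about 17 percent apart.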