mirror of https://github.com/paboyle/Grid.git synced 2024-11-14 09:45:36 +00:00
Commit Graph

362 Commits

Author SHA1 Message Date
Peter Boyle
9c4dcc5ea3 Merge branch 'master' into develop 2020-11-16 16:34:57 +01:00
Peter Boyle
e9bc748828 Useful GPU machine benchmark for GDR used to shakeout Booster at Juelich - see slack earlyaccess channel 2020-11-13 03:58:34 +01:00
Peter Boyle
f48156529b Work on 2,2,2,8 ranks 2020-11-13 03:57:58 +01:00
Peter Boyle
f16c2665f5 Host memory explicit 2020-11-12 20:29:58 +01:00
Peter Boyle
41e28015ae Volume divisible guarantee 2020-11-07 13:32:16 +01:00
Peter Boyle
3f06209720 Pretty print 2020-10-13 22:18:51 -04:00
c2b688abc9 Benchmark_IO: reducing max local volume to 32^4 2020-10-10 16:52:56 +01:00
b0d61b9687 Benchmark_IO cleaner output 2020-10-09 21:46:45 +01:00
5f893bf9af Benchmark_IO procurement sizes 2020-10-09 21:31:59 +01:00
0e17bd6597 I/O benchmark cleanup 2020-10-09 20:29:57 +01:00
22caa158cc multi-pass I/O benchmark, with statistic and robustness summary 2020-10-09 20:29:40 +01:00
Peter Boyle
992ef6e9fc more runtime 2020-10-08 22:19:20 -04:00
Peter Boyle
f32a320bc3 Single prec benchmark in double prec compile 2020-10-08 19:52:08 -04:00
Peter Boyle
5f0fe029d2 Improve memory benchmarks for GPU (avoid host mem ping pong) 2020-10-08 19:51:28 -04:00
Peter Boyle
3f9c427a3a Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-10-07 13:12:57 -04:00
Peter Boyle
d201277652 Expose Nc as a compile time configure option.
Remove precision option
2020-10-07 13:07:00 -04:00
e22d30f715 Merge branch 'develop' into feature/benchmark-io-update 2020-10-07 15:56:39 +01:00
1ba25a0d8c more I/O benchmark code cleaning 2020-10-07 15:38:41 +01:00
9ba3647bdf script to convert I/O benchmark logs to CSV 2020-10-07 15:35:03 +01:00
5ee832f738 I/O benchmark code cleaning 2020-10-07 15:31:51 +01:00
Peter Boyle
35a69a5133 SU4 x SU4 2020-10-06 21:48:35 -04:00
acac2d6938 standard C/C++ I/O in benchmark 2020-10-06 17:57:00 +01:00
Peter Boyle
81441e98f4 HIP runs sensible 2020-09-16 03:35:03 +01:00
Peter Boyle
8244caff25 Remove the asynchronous non-Stencil calls. 2020-09-03 18:52:55 -04:00
Bartosz Kostrzewa
a9b92867a8 use tabulator 2020-08-31 18:41:17 +02:00
Bartosz Kostrzewa
65920faeba correct formatting of Benchmark_wilson_sweep output 2020-08-31 18:39:27 +02:00
nmeyer-ur
337d9dc043 move barrier in Benchmark_wilson 2020-07-08 08:13:40 +02:00
nmeyer-ur
8726e94ea7 merge upstream develop 2020-07-07 20:26:47 +02:00
nmeyer-ur
a87e45ba25 SVE readme update 2020-06-18 11:23:08 +02:00
Peter Boyle
cdf0a04fc5 Merge branch 'develop' into sycl 2020-06-09 04:00:12 -04:00
Peter Boyle
14fcd0912a Merge branch 'sycl' of https://github.com/paboyle/Grid into sycl 2020-06-05 19:14:17 -04:00
Peter Boyle
3111c0bd4f Single precision hardwire 2020-06-05 19:13:27 -04:00
Peter Boyle
1a4c8c3387 Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes. 2020-06-05 18:52:35 -04:00
nmeyer-ur
5f52804907 update calculation of data 2020-05-30 10:55:17 +02:00
nmeyer-ur
936071773e correct throughput in wilson and dwf 2020-05-29 22:15:59 +02:00
nmeyer-ur
1732f9319e more mods; counters seem to work correctly 2020-05-29 18:44:00 +02:00
nmeyer-ur
5cb3530c34 enable counters in Benchmark_wilson 2020-05-29 15:44:52 +02:00
Peter Boyle
006cc8a8f1 Staggered move to accelerator 2020-05-28 08:33:06 -04:00
Peter Boyle
cf2938688a Sycl unhappy fix 2020-05-25 08:36:53 -07:00
Peter Boyle
a7abda89e2 View location & access mode 2020-05-21 16:13:59 -04:00
nmeyer-ur
015d8bb38a introduced assertions in Benchmark_wilson, removed data output from Benchmark_dwf 2020-05-15 09:15:50 +02:00
Peter Boyle
ea08f193e7 Allocator cache split into large/small pools 2020-05-10 05:24:26 -04:00
Peter Boyle
ee1de82a53 Working ITT benchmark again 2020-05-08 18:54:50 -04:00
Peter Boyle
2b576fc185 Comment dead code remove 2020-05-08 18:54:29 -04:00
Peter Boyle
6859a3e1d4 Schur operator 2020-05-08 09:19:12 -04:00
Peter Boyle
28a1fcaaff First compile against SYCL 2020-05-05 11:13:27 -07:00
u37294
59c51d2c35 Make compile if HAVE_LIME=0 2020-05-04 10:26:20 -07:00
nils meyer
64b72fc17f testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only 2020-04-19 01:25:40 +02:00
Peter Boyle
e279b2be29 Merge develop 2019-08-14 23:01:59 +01:00
Peter Boyle
48e6efc7c9 Merge branch 'develop' into feature/gpu-port
Conflicts:
	Grid/qcd/action/fermion/WilsonKernelsAsm.cc
	Grid/qcd/action/fermion/implementation/ImprovedStaggeredFermionImplementation.h
	Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
	benchmarks/Benchmark_comms.cc
2019-08-14 18:56:54 +01:00
Peter Boyle
263dcbabab Simplify the comms benchmark 2019-07-30 22:51:04 +01:00
Peter Boyle
d85dcc72df Multinode fix 2019-07-20 07:13:28 +01:00
Peter Boyle
0561c2edeb Benchmarks modified for new GPU constructs 2019-06-15 12:52:56 +01:00
Peter Boyle
3e41b1055c Remove Gpu only kernels. 2019-06-09 11:20:01 +01:00
Peter Boyle
da8d87e9da Cuda switch off 2019-06-08 17:11:38 +01:00
Peter Boyle
6d77941990 Drop the 5D vec actions 2019-06-08 13:38:05 +01:00
Peter Boyle
47c063f984 Remove Ls Vec cases from benchmarks 2019-06-04 20:45:35 +01:00
Peter Boyle
ee6f96d85c
Merge pull request #210 from grid-test-organisation/feature/gpu-port-develop
Cayley fermion functions for GPUs
2019-05-18 19:06:20 +01:00
Peter Boyle
4e9df9e93c GPU patches 2019-05-18 17:43:11 +01:00
gfilaci
e3c56fd9b3 CayleyZeroCounters before benchmark loop 2019-05-13 15:52:00 +01:00
gfilaci
d9438627d9 M5D benchmark without vector copy overhead 2019-05-02 11:10:57 +01:00
gfilaci
6da9aa9971 replace std::vector with Vector in benchmark 2019-05-02 10:56:22 +01:00
gfilaci
b52fa38f8c seed initialisation of RNG5 2019-05-02 10:36:09 +01:00
Peter Boyle
c43a2b599a GPU support 2019-01-01 15:07:29 +00:00
Peter Boyle
b57a4d32aa Merge branch 'develop' into feature/gpu-port 2018-12-13 05:11:34 +00:00
0ba3d469c7 Benchmark IO in single and double precision 2018-10-17 20:27:34 +01:00
291bc2a1f0 IO benchmark on a list of directories 2018-10-15 17:25:08 +01:00
Peter Boyle
adbdc4e65b Half comms not working on GPU yet, so disable. 2018-09-11 05:15:22 +01:00
Peter Boyle
f4bfeb835d Drop back to smaller Ls 2018-09-09 14:25:06 +01:00
a15a2dfd29 Merge branch 'develop' into feature/hadrons 2018-08-10 16:08:22 +01:00
paboyle
27cdb79063 Sha used to seed from a unique string 2018-08-10 15:11:01 +01:00
Peter Boyle
00b92a91b5 Optimising 2018-07-28 23:46:22 +01:00
paboyle
65533741f7 7 moms 2018-07-28 16:17:47 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a 2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2 Momentum loop 2018-07-27 23:00:16 +01:00
fionnoh
2679df034f Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
paboyle
71e1006ba8 Updated meson field benchmark for dirac structures 2018-07-26 09:09:29 +01:00
fionnoh
24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
Peter Boyle
21a1710b43 Verbose vector length 2018-07-23 06:08:39 -04:00
paboyle
ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6 Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
paboyle
25becc9324 GPU tweaks for benchmarking; really necessary? 2018-06-13 20:26:07 +01:00
Peter Boyle
eb921041d0 Perf count control 2018-05-12 17:57:32 -04:00
bfbf2f1fa0 no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4 Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
	benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820 Improvement 2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6 Saving the benchmarking tests for Cshift 2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50 More timers in the integrator 2018-04-26 12:01:56 +09:00
paboyle
2baf193031 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-25 00:14:03 +01:00
paboyle
362ba0443a Cshift updates 2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53 Correction of a minor bug in the su3 benchmark 2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329 Corrected Flop count in Benchmark su3 and expanded the Wilson flow output 2018-04-24 01:19:53 -07:00
paboyle
b5510427f9 physical fermion interface, cshift benchmark in SU3. 2018-04-18 01:43:29 +01:00
paboyle
276f113f28 IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
paboyle
ab6afd18ac Still compile if no LIME 2018-03-30 13:39:20 +01:00
c5a885dcd6 I/O benchmark 2018-03-29 19:57:41 +01:00
Peter Boyle
6fe9b28a82 Cosmetic 2018-03-24 19:27:14 -04:00