Peter Boyle
cf76741ec6
Intel DPCPP Gold happy now (compiles all, runs Benchmark_dwf_fp32)
2020-12-03 03:47:11 -08:00
Peter Boyle
147dc15d26
Update
2020-11-20 13:13:59 -05:00
Peter Boyle
8fcb392e24
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-11-17 04:51:31 -08:00
Peter Boyle
dd8d70eeff
Build without LIME
2020-11-17 04:41:15 -08:00
Peter Boyle
3aab983760
Flop count set as in DiRAC-ITT-2020 (mistaken 20% low, but must maintain consistency)
2020-11-16 17:13:58 +01:00
Peter Boyle
9c4dcc5ea3
Merge branch 'master' into develop
2020-11-16 16:34:57 +01:00
Peter Boyle
e9bc748828
Useful GPU machine benchmark for GDR used to shakeout Booster at Juelich - see slack earlyaccess channel
2020-11-13 03:58:34 +01:00
Peter Boyle
f48156529b
Work on 2,2,2,8 ranks
2020-11-13 03:57:58 +01:00
Peter Boyle
f16c2665f5
Host memory explicit
2020-11-12 20:29:58 +01:00
Peter Boyle
41e28015ae
Volume divisible guarantee
2020-11-07 13:32:16 +01:00
Peter Boyle
3f06209720
Pretty print
2020-10-13 22:18:51 -04:00
c2b688abc9
Benchmark_IO: reducing max local volume to 32^4
2020-10-10 16:52:56 +01:00
b0d61b9687
Benchmark_IO cleaner output
2020-10-09 21:46:45 +01:00
5f893bf9af
Benchmark_IO procurement sizes
2020-10-09 21:31:59 +01:00
0e17bd6597
I/O benchmark cleanup
2020-10-09 20:29:57 +01:00
22caa158cc
multi-pass I/O benchmark, with statistic and robustness summary
2020-10-09 20:29:40 +01:00
Peter Boyle
992ef6e9fc
more runtime
2020-10-08 22:19:20 -04:00
Peter Boyle
f32a320bc3
Single prec benchmark in double prec compile
2020-10-08 19:52:08 -04:00
Peter Boyle
5f0fe029d2
Improve memory benchmarks for GPU (avoid host mem ping pong)
2020-10-08 19:51:28 -04:00
Peter Boyle
3f9c427a3a
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-10-07 13:12:57 -04:00
Peter Boyle
d201277652
Expose Nc as a compile time configure option.
...
Remove precision option
2020-10-07 13:07:00 -04:00
e22d30f715
Merge branch 'develop' into feature/benchmark-io-update
2020-10-07 15:56:39 +01:00
1ba25a0d8c
more I/O benchmark code cleaning
2020-10-07 15:38:41 +01:00
9ba3647bdf
script to convert I/O benchmark logs to CSV
2020-10-07 15:35:03 +01:00
5ee832f738
I/O benchmark code cleaning
2020-10-07 15:31:51 +01:00
Peter Boyle
35a69a5133
SU4 x SU4
2020-10-06 21:48:35 -04:00
acac2d6938
standard C/C++ I/O in benchmark
2020-10-06 17:57:00 +01:00
Peter Boyle
81441e98f4
HIP runs sensible
2020-09-16 03:35:03 +01:00
Peter Boyle
8244caff25
Remove the asynchronous non-Stencil calls.
2020-09-03 18:52:55 -04:00
Bartosz Kostrzewa
a9b92867a8
use tabulator
2020-08-31 18:41:17 +02:00
Bartosz Kostrzewa
65920faeba
correct formatting of Benchmark_wilson_sweep output
2020-08-31 18:39:27 +02:00
nmeyer-ur
337d9dc043
move barrier in Benchmark_wilson
2020-07-08 08:13:40 +02:00
nmeyer-ur
8726e94ea7
merge upstream develop
2020-07-07 20:26:47 +02:00
nmeyer-ur
a87e45ba25
SVE readme update
2020-06-18 11:23:08 +02:00
Peter Boyle
cdf0a04fc5
Merge branch 'develop' into sycl
2020-06-09 04:00:12 -04:00
Peter Boyle
14fcd0912a
Merge branch 'sycl' of https://github.com/paboyle/Grid into sycl
2020-06-05 19:14:17 -04:00
Peter Boyle
3111c0bd4f
Single precision hardwire
2020-06-05 19:13:27 -04:00
Peter Boyle
1a4c8c3387
Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.
2020-06-05 18:52:35 -04:00
nmeyer-ur
5f52804907
update calculation of data
2020-05-30 10:55:17 +02:00
nmeyer-ur
936071773e
correct throughput in wilson and dwf
2020-05-29 22:15:59 +02:00
nmeyer-ur
1732f9319e
more mods; counters seem to work correctly
2020-05-29 18:44:00 +02:00
nmeyer-ur
5cb3530c34
enable counters in Benchmark_wilson
2020-05-29 15:44:52 +02:00
Peter Boyle
006cc8a8f1
Staggered move to accelerator
2020-05-28 08:33:06 -04:00
Peter Boyle
cf2938688a
Sycl unhappy fix
2020-05-25 08:36:53 -07:00
Peter Boyle
a7abda89e2
View location & access mode
2020-05-21 16:13:59 -04:00
nmeyer-ur
015d8bb38a
introduced assertions in Benchmark_wilson, removed data output from Benchmark_dwf
2020-05-15 09:15:50 +02:00
Peter Boyle
ea08f193e7
Allocator cache split into large/small pools
2020-05-10 05:24:26 -04:00
Peter Boyle
ee1de82a53
Working ITT benchmark again
2020-05-08 18:54:50 -04:00
Peter Boyle
2b576fc185
Comment dead code remove
2020-05-08 18:54:29 -04:00
Peter Boyle
6859a3e1d4
Schur operator
2020-05-08 09:19:12 -04:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
u37294
59c51d2c35
Make compile if HAVE_LIME=0
2020-05-04 10:26:20 -07:00
nils meyer
64b72fc17f
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
2020-04-19 01:25:40 +02:00
Peter Boyle
e279b2be29
Merge develop
2019-08-14 23:01:59 +01:00
Peter Boyle
48e6efc7c9
Merge branch 'develop' into feature/gpu-port
...
Conflicts:
Grid/qcd/action/fermion/WilsonKernelsAsm.cc
Grid/qcd/action/fermion/implementation/ImprovedStaggeredFermionImplementation.h
Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
benchmarks/Benchmark_comms.cc
2019-08-14 18:56:54 +01:00
Peter Boyle
263dcbabab
Simplify the comms benchmark
2019-07-30 22:51:04 +01:00
Peter Boyle
d85dcc72df
Multinode fix
2019-07-20 07:13:28 +01:00
Peter Boyle
0561c2edeb
Benchmarks modified for new GPU constructs
2019-06-15 12:52:56 +01:00
Peter Boyle
3e41b1055c
Remove Gpu only kernels.
2019-06-09 11:20:01 +01:00
Peter Boyle
da8d87e9da
Cuda switch off
2019-06-08 17:11:38 +01:00
Peter Boyle
6d77941990
Drop the 5D vec actions
2019-06-08 13:38:05 +01:00
Peter Boyle
47c063f984
Remove Ls Vec cases from benchmarks
2019-06-04 20:45:35 +01:00
Peter Boyle
ee6f96d85c
Merge pull request #210 from grid-test-organisation/feature/gpu-port-develop
...
Cayley fermion functions for GPUs
2019-05-18 19:06:20 +01:00
Peter Boyle
4e9df9e93c
GPU patches
2019-05-18 17:43:11 +01:00
gfilaci
e3c56fd9b3
CayleyZeroCounters before benchmark loop
2019-05-13 15:52:00 +01:00
gfilaci
d9438627d9
M5D benchmark without vector copy overhead
2019-05-02 11:10:57 +01:00
gfilaci
6da9aa9971
replace std::vector with Vector in benchmark
2019-05-02 10:56:22 +01:00
gfilaci
b52fa38f8c
seed initialisation of RNG5
2019-05-02 10:36:09 +01:00
Peter Boyle
c43a2b599a
GPU support
2019-01-01 15:07:29 +00:00
Peter Boyle
b57a4d32aa
Merge branch 'develop' into feature/gpu-port
2018-12-13 05:11:34 +00:00
0ba3d469c7
Benchmark IO in single and double precision
2018-10-17 20:27:34 +01:00
291bc2a1f0
IO benchmark on a list of directories
2018-10-15 17:25:08 +01:00
Peter Boyle
adbdc4e65b
Half comms not working on GPU yet, so disable.
2018-09-11 05:15:22 +01:00
Peter Boyle
f4bfeb835d
Drop back to smaller Ls
2018-09-09 14:25:06 +01:00
a15a2dfd29
Merge branch 'develop' into feature/hadrons
2018-08-10 16:08:22 +01:00
paboyle
27cdb79063
Sha used to seed from a unique string
2018-08-10 15:11:01 +01:00
Peter Boyle
00b92a91b5
Optimising
2018-07-28 23:46:22 +01:00
paboyle
65533741f7
7 moms
2018-07-28 16:17:47 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a
2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2
Momentum loop
2018-07-27 23:00:16 +01:00
fionnoh
2679df034f
Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
...
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
paboyle
71e1006ba8
Updated meson field benchmark for dirac structures
2018-07-26 09:09:29 +01:00
fionnoh
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
Peter Boyle
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
...
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop (single), half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6
Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
...
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
paboyle
25becc9324
GPU tweaks for benchmarking; really necessary?
2018-06-13 20:26:07 +01:00
Peter Boyle
eb921041d0
Perf count control
2018-05-12 17:57:32 -04:00
bfbf2f1fa0
no threaded stencil benchmark if OpenMP is not supported
2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c
Benchmark improvements from tesseract
2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4
Performance of CovariantCshift now non-embarrassing.
2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
...
Conflicts:
benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820
Improvement
2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6
Saving the benchmarking tests for Cshift
2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50
More timers in the integrator
2018-04-26 12:01:56 +09:00
paboyle
2baf193031
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2018-04-25 00:14:03 +01:00
paboyle
362ba0443a
Cshift updates
2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53
Correction of a minor bug in the su3 benchmark
2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329
Corrected Flop count in Benchmark su3 and expanded the Wilson flow output
2018-04-24 01:19:53 -07:00