5f52804907
update calculation of data
2020-05-30 10:55:17 +02:00
936071773e
correct throughput in wilson and dwf
2020-05-29 22:15:59 +02:00
1732f9319e
more mods; counters seem to work correctly
2020-05-29 18:44:00 +02:00
5cb3530c34
enable counters in Benchmark_wilson
2020-05-29 15:44:52 +02:00
015d8bb38a
introduced assertions in Benchmark_wilson, removed data output from Benchmark_dwf
2020-05-15 09:15:50 +02:00
64b72fc17f
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
2020-04-19 01:25:40 +02:00
e279b2be29
Merge develop
2019-08-14 23:01:59 +01:00
48e6efc7c9
Merge branch 'develop' into feature/gpu-port
...
Conflicts:
Grid/qcd/action/fermion/WilsonKernelsAsm.cc
Grid/qcd/action/fermion/implementation/ImprovedStaggeredFermionImplementation.h
Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
benchmarks/Benchmark_comms.cc
2019-08-14 18:56:54 +01:00
263dcbabab
Simplify the comms benchmark
2019-07-30 22:51:04 +01:00
d85dcc72df
Multinode fix
2019-07-20 07:13:28 +01:00
0561c2edeb
Benchmarks modified for new GPU constructs
2019-06-15 12:52:56 +01:00
3e41b1055c
Remove Gpu only kernels.
2019-06-09 11:20:01 +01:00
da8d87e9da
Cuda switch off
2019-06-08 17:11:38 +01:00
6d77941990
Drop the 5D vec actions
2019-06-08 13:38:05 +01:00
47c063f984
Remove Ls Vec cases from benchmarks
2019-06-04 20:45:35 +01:00
ee6f96d85c
Merge pull request #210 from grid-test-organisation/feature/gpu-port-develop
...
Cayley fermion functions for GPUs
2019-05-18 19:06:20 +01:00
4e9df9e93c
GPU patches
2019-05-18 17:43:11 +01:00
e3c56fd9b3
CayleyZeroCounters before benchmark loop
2019-05-13 15:52:00 +01:00
d9438627d9
M5D benchmark without vector copy overhead
2019-05-02 11:10:57 +01:00
6da9aa9971
replace std::vector with Vector in benchmark
2019-05-02 10:56:22 +01:00
b52fa38f8c
seed initialisation of RNG5
2019-05-02 10:36:09 +01:00
c43a2b599a
GPU support
2019-01-01 15:07:29 +00:00
b57a4d32aa
Merge branch 'develop' into feature/gpu-port
2018-12-13 05:11:34 +00:00
0ba3d469c7
Benchmark IO in single and double precision
2018-10-17 20:27:34 +01:00
291bc2a1f0
IO benchmark on a list of directories
2018-10-15 17:25:08 +01:00
adbdc4e65b
Half comms not working on GPU yet, so disable.
2018-09-11 05:15:22 +01:00
f4bfeb835d
Drop back to smaller Ls
2018-09-09 14:25:06 +01:00
a15a2dfd29
Merge branch 'develop' into feature/hadrons
2018-08-10 16:08:22 +01:00
27cdb79063
Sha used to seed from a unique string
2018-08-10 15:11:01 +01:00
00b92a91b5
Optimising
2018-07-28 23:46:22 +01:00
65533741f7
7 moms
2018-07-28 16:17:47 +01:00
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a
2018-07-27 23:03:42 +01:00
44f4f5c8e2
Momentum loop
2018-07-27 23:00:16 +01:00
2679df034f
Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
...
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
71e1006ba8
Updated meson field benchmark for dirac structures
2018-07-26 09:09:29 +01:00
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
ec9939c1ba
Test for faster implementation of meson field inner loop
...
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
2f776d51c6
Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
...
but a bitof (known) work.
2018-07-05 06:58:37 -04:00
25becc9324
GPU tweaks for benchmarking; really necessary?
2018-06-13 20:26:07 +01:00
eb921041d0
Perf count control
2018-05-12 17:57:32 -04:00
bfbf2f1fa0
no threaded stencil benchmark if OpenMP is not supported
2018-05-03 16:20:01 +01:00
1dddd17e3c
Benchmark improvements from tesseract
2018-04-27 11:44:46 +01:00
fa0d8feff4
Performance of CovariantCshift now non-embarrassing.
2018-04-26 17:56:27 +01:00
05b44aef6b
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
...
Conflicts:
benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
91a0a3f820
Improvement
2018-04-26 14:48:35 +01:00
8f44c799a6
Saving the benchmarking tests for Cshift
2018-04-26 14:48:03 +01:00
43f5a0df50
More timers in the integrator
2018-04-26 12:01:56 +09:00
2baf193031
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2018-04-25 00:14:03 +01:00