a15a2dfd29
Merge branch 'develop' into feature/hadrons
2018-08-10 16:08:22 +01:00
paboyle
27cdb79063
Sha used to seed from a unique string
2018-08-10 15:11:01 +01:00
Peter Boyle
00b92a91b5
Optimising
2018-07-28 23:46:22 +01:00
paboyle
65533741f7
7 moms
2018-07-28 16:17:47 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a
2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2
Momentum loop
2018-07-27 23:00:16 +01:00
fionnoh
2679df034f
Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
...
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
paboyle
71e1006ba8
Updated meson field benchmark for dirac structures
2018-07-26 09:09:29 +01:00
fionnoh
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
Peter Boyle
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
...
It should be possible to cache block this at outer levels; the global sum across nodes is not performed here
and is deferred to the caller, which can block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6
Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
...
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
paboyle
25becc9324
GPU tweaks for benchmarking; really necessary?
2018-06-13 20:26:07 +01:00
Peter Boyle
eb921041d0
Perf count control
2018-05-12 17:57:32 -04:00
bfbf2f1fa0
no threaded stencil benchmark if OpenMP is not supported
2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c
Benchmark improvements from tesseract
2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4
Performance of CovariantCshift now non-embarrassing.
2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
...
Conflicts:
benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820
Improvement
2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6
Saving the benchmarking tests for Cshift
2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50
More timers in the integrator
2018-04-26 12:01:56 +09:00
paboyle
2baf193031
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2018-04-25 00:14:03 +01:00
paboyle
362ba0443a
Cshift updates
2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53
Correction of a minor bug in the su3 benchmark
2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329
Corrected Flop count in Benchmark su3 and expanded the Wilson flow output
2018-04-24 01:19:53 -07:00
paboyle
b5510427f9
physical fermion interface, cshift benchmark in SU3.
2018-04-18 01:43:29 +01:00
paboyle
276f113f28
IO uses master boss node for metadata.
2018-03-30 16:17:05 +01:00
paboyle
ab6afd18ac
Still compile if no LIME
2018-03-30 13:39:20 +01:00
c5a885dcd6
I/O benchmark
2018-03-29 19:57:41 +01:00
Peter Boyle
6fe9b28a82
Cosmetic
2018-03-24 19:27:14 -04:00
Peter Boyle
b002587d7c
Simplify
2018-03-24 19:26:44 -04:00
Peter Boyle
6c08385782
Simplify
2018-03-24 19:26:19 -04:00
Peter Boyle
a3690071b4
Warm up GPU
2018-03-22 18:05:20 -04:00
Peter Boyle
5ac96dbdc6
Warm behaviour in SU3 benchmark
2018-03-20 07:18:31 -04:00
paboyle
aead94e9a7
View introduced
2018-03-04 16:39:29 +00:00
paboyle
36ea5f6b77
gpu friendly coordinates ; no std::vector on GPU
2018-02-24 22:20:14 +00:00
Guido Cossu
fb24e3a7d2
Adding utilities for perf profiling
2018-01-29 11:11:45 +01:00
paboyle
604c05f4b8
parallel_for elimination -> thread_loop
2018-01-28 01:01:36 +00:00
paboyle
ce4da83bc2
Zero changes, literally
2018-01-27 23:51:10 +00:00
paboyle
c4f82e072b
_grid becomes private ; use Grid()
2018-01-27 00:04:12 +00:00
paboyle
2a4a0e43c1
Hide internals
2018-01-26 23:08:27 +00:00
paboyle
f4010023ca
Warning fixes
2018-01-25 23:46:47 +00:00
paboyle
e7cba358c2
Temporary update to reflect the new dropping of std::vector in Lattice
...
Will update again to hide the internals in an interface
2018-01-25 23:31:41 +00:00
Guido Cossu
cff3bae155
Adding support for general Nc in the benchmark outputs
2018-01-25 13:46:31 +01:00
paboyle
918c105c57
NVCC warning elimination
2018-01-24 13:23:59 +00:00
paboyle
d74c21a386
Global edit for QCD namespace removal & NAMESPACE macros
2018-01-15 09:37:58 +00:00
paboyle
9b32d51cd1
Simplify comms layer proliferation
2018-01-08 11:27:14 +00:00
paboyle
4f8b6f26b4
Merge branch 'develop' into feature/dwf-multirhs
2017-10-02 11:41:49 +01:00
Peter Boyle
bfb68e6f02
Merge pull request #130 from giltirn/gparity-handunroll
...
Gparity handunroll
2017-09-21 10:11:00 +01:00