1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 02:05:37 +00:00
Commit Graph

366 Commits

Author SHA1 Message Date
u37294
59c51d2c35 Make compile if HAVE_LIME=0 2020-05-04 10:26:20 -07:00
nils meyer
64b72fc17f testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only 2020-04-19 01:25:40 +02:00
Peter Boyle
e279b2be29 Merge develop 2019-08-14 23:01:59 +01:00
Peter Boyle
48e6efc7c9 Merge branch 'develop' into feature/gpu-port
Conflicts:
	Grid/qcd/action/fermion/WilsonKernelsAsm.cc
	Grid/qcd/action/fermion/implementation/ImprovedStaggeredFermionImplementation.h
	Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
	benchmarks/Benchmark_comms.cc
2019-08-14 18:56:54 +01:00
Peter Boyle
263dcbabab Simplify the comms benchmark 2019-07-30 22:51:04 +01:00
Peter Boyle
d85dcc72df Multinode fix 2019-07-20 07:13:28 +01:00
Peter Boyle
0561c2edeb Benchmarks modified for new GPU constructs 2019-06-15 12:52:56 +01:00
Peter Boyle
3e41b1055c Remove Gpu only kernels. 2019-06-09 11:20:01 +01:00
Peter Boyle
da8d87e9da Cuda switch off 2019-06-08 17:11:38 +01:00
Peter Boyle
6d77941990 Drop the 5D vec actions 2019-06-08 13:38:05 +01:00
Peter Boyle
47c063f984 Remove Ls Vec cases from benchmarks 2019-06-04 20:45:35 +01:00
Peter Boyle
ee6f96d85c
Merge pull request #210 from grid-test-organisation/feature/gpu-port-develop
Cayley fermion functions for GPUs
2019-05-18 19:06:20 +01:00
Peter Boyle
4e9df9e93c GPU patches 2019-05-18 17:43:11 +01:00
gfilaci
e3c56fd9b3 CayleyZeroCounters before benchmark loop 2019-05-13 15:52:00 +01:00
gfilaci
d9438627d9 M5D benchmark without vector copy overhead 2019-05-02 11:10:57 +01:00
gfilaci
6da9aa9971 replace std::vector with Vector in benchmark 2019-05-02 10:56:22 +01:00
gfilaci
b52fa38f8c seed initialisation of RNG5 2019-05-02 10:36:09 +01:00
Peter Boyle
c43a2b599a GPU support 2019-01-01 15:07:29 +00:00
Peter Boyle
b57a4d32aa Merge branch 'develop' into feature/gpu-port 2018-12-13 05:11:34 +00:00
0ba3d469c7 Benchmark IO in single and double precision 2018-10-17 20:27:34 +01:00
291bc2a1f0 IO benchmark on a list of directories 2018-10-15 17:25:08 +01:00
Peter Boyle
adbdc4e65b Half comms not working on GPU yet, so disable. 2018-09-11 05:15:22 +01:00
Peter Boyle
f4bfeb835d Drop back to smaller Ls 2018-09-09 14:25:06 +01:00
a15a2dfd29 Merge branch 'develop' into feature/hadrons 2018-08-10 16:08:22 +01:00
paboyle
27cdb79063 Sha used to seed from a unique string 2018-08-10 15:11:01 +01:00
Peter Boyle
00b92a91b5 Optimising 2018-07-28 23:46:22 +01:00
paboyle
65533741f7 7 moms 2018-07-28 16:17:47 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a 2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2 Momentum loop 2018-07-27 23:00:16 +01:00
fionnoh
2679df034f Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
paboyle
71e1006ba8 Updated meson field benchmark for dirac structures 2018-07-26 09:09:29 +01:00
fionnoh
24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
Peter Boyle
21a1710b43 Verbose vector length 2018-07-23 06:08:39 -04:00
paboyle
ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6 Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bitof (known) work.
2018-07-05 06:58:37 -04:00
paboyle
25becc9324 GPU tweaks for benchmarking; really necessary? 2018-06-13 20:26:07 +01:00
Peter Boyle
eb921041d0 Perf count control 2018-05-12 17:57:32 -04:00
bfbf2f1fa0 no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4 Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
	benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820 Improvement 2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6 Saving the benchmarking tests for Cshift 2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50 More timers in the integrator 2018-04-26 12:01:56 +09:00
paboyle
2baf193031 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-25 00:14:03 +01:00
paboyle
362ba0443a Cshift updates 2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53 Correction of a minor bug in the su3 benchmark 2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329 Corrected Flop count in Benchmark su3 and expanded the Wilson flow output 2018-04-24 01:19:53 -07:00
paboyle
b5510427f9 physical fermion interface, cshift benchmark in SU3. 2018-04-18 01:43:29 +01:00
paboyle
276f113f28 IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
paboyle
ab6afd18ac Still compile if no LIME 2018-03-30 13:39:20 +01:00
c5a885dcd6 I/O benchmark 2018-03-29 19:57:41 +01:00
Peter Boyle
6fe9b28a82 Cosmetic 2018-03-24 19:27:14 -04:00
Peter Boyle
b002587d7c Simplify 2018-03-24 19:26:44 -04:00
Peter Boyle
6c08385782 Simplify 2018-03-24 19:26:19 -04:00
Peter Boyle
a3690071b4 Warm up GPu 2018-03-22 18:05:20 -04:00
Peter Boyle
5ac96dbdc6 Warm behaviour in SU3 benchmark 2018-03-20 07:18:31 -04:00
paboyle
aead94e9a7 View introduced 2018-03-04 16:39:29 +00:00
paboyle
36ea5f6b77 gpu friendly coordinates ; no std::vector on GPU 2018-02-24 22:20:14 +00:00
Guido Cossu
fb24e3a7d2 Adding utilities for perf profiling 2018-01-29 11:11:45 +01:00
paboyle
604c05f4b8 parallel_for elimination -> thread_loop 2018-01-28 01:01:36 +00:00
paboyle
ce4da83bc2 Zero changes, literally 2018-01-27 23:51:10 +00:00
paboyle
c4f82e072b _grid becomes private ; use Grid()§ 2018-01-27 00:04:12 +00:00
paboyle
2a4a0e43c1 Hide internals 2018-01-26 23:08:27 +00:00
paboyle
f4010023ca Warning fixes 2018-01-25 23:46:47 +00:00
paboyle
e7cba358c2 Temporary update to reflect the new dropping of std::vector in Lattice
Will update again to hide the internals in an interface
2018-01-25 23:31:41 +00:00
Guido Cossu
cff3bae155 Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
paboyle
918c105c57 NVCC warning elimination 2018-01-24 13:23:59 +00:00
paboyle
d74c21a386 GLobal edit for QCD namespace removal & NAMESPACE macros 2018-01-15 09:37:58 +00:00
paboyle
9b32d51cd1 Simplify comms layer proliferatoin 2018-01-08 11:27:14 +00:00
paboyle
4f8b6f26b4 Merge branch 'develop' into feature/dwf-multirhs 2017-10-02 11:41:49 +01:00
Peter Boyle
bfb68e6f02 Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
paboyle
17c5b0f152 Patching comparison point 2017-09-16 18:18:07 +01:00
Peter Boyle
b331be9101 Better reporting 2017-08-31 11:32:57 +01:00
Peter Boyle
49c20a9fa8 Patch to reporting 2017-08-31 11:32:21 +01:00
paboyle
7359df3501 Full reporting for benchmark; save robustness factor 2017-08-31 10:42:35 +01:00
Christopher Kelly
d36d2fb40d Added ability to override default Ls in Benchmark_dwf 2017-08-28 06:53:56 -07:00
Peter Boyle
5b9267e88d Cleaner comms benchmark treatment for one node runs 2017-08-27 18:24:48 -04:00
paboyle
15fd4003ef Improving presentation of results 2017-08-27 13:46:02 +01:00
paboyle
ad89abb018 Fix 2017-08-25 20:43:37 +01:00
paboyle
80c5bce5bb Merge branch 'develop' into feature/multi-communicator 2017-08-25 20:21:26 +01:00
Peter Boyle
d0f3d525d5 Optimal block size for KNL 2017-08-25 19:33:54 +01:00
Peter Boyle
3a58217405 Updated 2017-08-25 14:29:53 +01:00
Peter Boyle
c289699d9a updated from cambridge mpi3 shakeout 2017-08-25 11:41:01 +01:00
Peter Boyle
c3b1263e75 Benchmark prep 2017-08-25 09:25:54 +01:00
Christopher Kelly
edabb3577f Imported Benchmark_gparity 2017-08-23 16:54:06 -04:00
paboyle
ae56e556c6 finalise issue on new OPA revert 2017-08-20 02:53:12 +01:00
paboyle
383ca7d392 Switch off comms for now until feature/multi-communicator is merged 2017-08-20 01:27:48 +01:00
paboyle
a446d95c33 Trying to pass TeamCity and Travis 2017-08-20 01:10:50 +01:00
paboyle
be66e7dd95 Merge branch 'develop' into feature/multi-communicator 2017-08-19 23:12:38 +01:00
paboyle
bfef525ed2 New benchmark prep 2017-08-19 23:10:12 +01:00
Peter Boyle
7d88198387 Merge branch 'develop' into feature/multi-communicator 2017-08-19 13:03:35 -04:00
Peter Boyle
9e658de238 Use Vector 2017-08-19 12:52:44 -04:00
Peter Boyle
14d53e1c9e Threaded MPI calls patches 2017-07-29 13:08:10 -04:00
Peter Boyle
40e119c61c NUMA improvements worth preserving from AMD EPYC tests 2017-07-08 22:27:11 -04:00
Peter Boyle
b73bd151bb Switch off counters by default 2017-06-30 10:16:35 +01:00
Peter Boyle
694b305cab Update to reporting 2017-06-30 10:16:13 +01:00
paboyle
6f5a5cd9b3 Improved threaded comms benchmark 2017-06-28 23:27:02 +01:00
Peter Boyle
08e04b9676 Better benchmarks 2017-06-28 15:30:06 +01:00
paboyle
54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
paboyle
6ebf9f15b7 Splitting communicators first cut 2017-06-22 08:14:34 +01:00
paboyle
3bfd1f13e6 I/O improvements 2017-06-11 23:14:10 +01:00
Peter Boyle
725c513d94 Better MPI3 benchmarking 2017-05-29 16:47:32 -04:00
Guido Cossu
0ffc235741 Adding more statistics to the Benchmark_comms. Min and max 2017-05-19 10:55:04 +01:00
Guido Cossu
8e19c99c7d Adding more statistical info in the Benchmark_comms 2017-05-18 19:07:35 +01:00
Guido Cossu
a0bc0ad06f Reverting change in Bechmark_comms. Keeping 300 iterations 2017-05-18 17:48:11 +01:00
Guido Cossu
bc862ce3ab Fixing an allocation issue in Benchmark_comms 2017-05-18 14:44:56 +01:00
paboyle
751f2b9703 Better check and benchmark driving 2017-05-05 19:54:38 +01:00
Guido Cossu
20999c1370 Merge branch 'develop' into feature/hmc_generalise 2017-05-05 12:47:17 +01:00
Peter Boyle
945767c6d8 More info 2017-05-03 20:26:35 -04:00
Peter Boyle
92e364a35f Better reporting in benchmark for MPI3 2017-05-03 15:43:36 -04:00
Guido Cossu
4063238943 Adding HMC test file example for Mobius + smearing 2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1 Merge branch 'develop' into feature/hmc_generalise 2017-05-01 12:13:56 +01:00
paboyle
738c1a11c2 longer nloop 2017-04-26 08:43:20 +01:00
paboyle
ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle
c429ace748 Cleaner OpenMP use 2017-04-22 20:28:42 +01:00
Peter Boyle
1d1b225497 Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node). 2017-04-22 09:05:28 -04:00
paboyle
fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
Guido Cossu
8c540333d5 Merge branch 'develop' into feature/hmc_generalise 2017-04-05 14:41:04 +01:00
paboyle
f18f5ed926 Drop random device 2017-04-02 00:26:26 +09:00
paboyle
4b17e8eba8 Merge branch 'develop' into feature/bgq-asm
Conflicts:
	lib/qcd/action/fermion/Fermion.h
	lib/qcd/action/fermion/WilsonFermion.cc
	lib/util/Init.cc
	tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle
18bde08d1b Merge branch 'feature/staggering' into develop 2017-03-28 15:25:55 +09:00
paboyle
e099dcdae7 Merge branch 'develop' into feature/bgq-asm 2017-02-23 00:25:29 +00:00
azusayamaguchi
1c30e9a961 Verified 2017-02-21 23:01:25 +00:00
paboyle
3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
1a30455a10 1000 iters on bmark for more accurate timing 2017-02-20 17:47:01 -05:00
paboyle
aca7a3ef0a Optimisation control improvements 2017-02-10 18:22:31 -05:00
Guido Cossu
8b6a6c8236 Resolving small merge conflict 2017-02-09 16:20:24 +00:00
Guido Cossu
e0571c872b Merge branch 'develop' into feature/hmc_generalise 2017-02-09 16:12:00 +00:00
paboyle
2bf4688e83 Running on BNL KNL 2017-02-07 01:32:10 -05:00
paboyle
060da786e9 Comms benchmark improvements 2017-02-07 01:07:39 -05:00
Guido Cossu
17629b8d9e Merge branch 'develop' into feature/hmc_generalise 2017-01-25 11:33:53 +00:00
a37e71f362 New automatic implementation of gamma matrices, Meson and SeqGamma are broken 2017-01-23 19:13:43 -08:00
azusayamaguchi
05c1924819 Timing loop change 2017-01-23 10:43:45 +00:00
Peter Boyle
55cb22ad67 Z mobius bmark 2016-12-18 00:55:37 +00:00
Guido Cossu
0bd296dda4 Adding check of the Dag part in the benchmark 2016-12-14 03:15:09 +00:00
Peter Boyle
ff71a8e847 Ready for sim 2016-12-08 17:00:32 +00:00
Peter Boyle
e27c6b217c Updating 2016-12-01 12:42:53 +00:00
Peter Boyle
cd01c1dbe9 Ls 16 more relevant 2016-11-30 22:11:10 +00:00
paboyle
bd0430b34f Serialisation in malloc fixed 2016-11-29 22:27:55 +00:00
Azusa Yamaguchi
c097fd041a Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-29 13:44:17 +00:00
Azusa Yamaguchi
389e0a77bd Staggerd Fermion 5D 2016-11-29 13:13:56 +00:00
paboyle
2f92b4860b Test the full Mooee sector 2016-11-29 00:15:08 +00:00
Azusa Yamaguchi
95f43d27ae Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-22 13:49:22 +00:00
Azusa Yamaguchi
668ca57702 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-22 13:49:11 +00:00
433afd36f5 Makefile rule for simple_* objects 2016-11-19 01:33:13 +01:00
042ae5b87c generic 256bits SIMD 2016-11-15 12:16:15 +00:00
paboyle
33dc1f51b5 Final sign off commits from Cori-1 2016-11-09 04:11:03 -08:00
Azusa Yamaguchi
ee686a7d85 Compiles now 2016-11-03 16:58:23 +00:00