mirror of https://github.com/paboyle/Grid.git synced 2025-06-24 02:32:02 +01:00
Commit Graph

676 Commits

SHA1 Message Date
20d1941a45 enabled asm kernels for fixed-size A64FXFIXEDSIZE 2020-05-12 19:01:12 +02:00
bbbee5660d First compile on HIP 2020-05-10 05:28:09 -04:00
2bb2c68e15 Separate pools for small and large allocations cache 2020-05-09 22:57:21 -04:00
f8b8e00090 Systematise the accelerator primitives and relocate them to Grid/threads/Accelerator.h / Accelerator.cc 2020-05-08 06:23:55 -07:00
  Aim to reduce the amount of CUDA and other code variations floating around all over the place.
  Will move GpuInit into Accelerator.cc from Init.cc
  Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
0dd1bdfa94 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-05-08 09:21:43 -04:00
93920c4811 Remove verbose 2020-05-08 09:19:54 -04:00
42bb5f0721 assertion 2020-05-07 18:06:12 +01:00
253bcc3426 back to old version 2020-05-07 18:03:17 +01:00
591ebb6213 Merge branch 'develop' of github.com:paboyle/Grid into feature/baryonSpeedup 2020-05-07 11:13:21 +01:00
56e2f7d088 deleted test routines. cleaned up fast version. assert Ns=4,Nc=3. 2020-05-07 10:03:45 +01:00
3c6ffcb48c Merge branch 'develop' into feature/gpt 2020-05-06 15:03:35 +02:00
28a1fcaaff First compile against SYCL 2020-05-05 11:13:27 -07:00
dd3ebc2ce4 Slow compile on NVCC: switch off conserved current 2020-04-29 08:43:12 -04:00
6240e02619 added assertion to avoid potential infinite loop 2020-04-27 18:50:53 +01:00
f4033ad8cb baryon speedup by a factor 2 2020-04-27 17:46:14 +01:00
c2c3cad20d Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-04-23 04:35:42 -04:00
edec9ee2e2 Conserved current rewrite done. Zmobius working 2020-04-23 04:34:01 -04:00
39b448affb Merge remote-tracking branch 'origin/develop' into feature/a64fx-2 2020-04-22 17:34:12 +02:00
181709bba4 Merge branch 'develop' into feature/zmobius_paramcompute 2020-04-20 09:12:34 -04:00
64b72fc17f testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only 2020-04-19 01:25:40 +02:00
6fdce60492 revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0) 2020-04-16 22:43:32 +02:00
0475c46ecb Merge pull request #256 from djm2131/feature/BiCGSTAB 2020-04-16 11:45:15 -04:00
  Import BiCGSTAB solvers and tests
327da332bb Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt 2020-04-16 11:30:17 -04:00
6504a098cc 999 GiB/s Wilson; 694 GiB/s DW (DP) 2020-04-15 15:06:52 +02:00
c12a67030a 980 GiB/s Wilson; 680 GiB/s DW (DP) 2020-04-15 10:55:06 +02:00
581392f2f2 now with pf, best results so far using intrinsics+pf 2020-04-12 22:06:14 +02:00
113f277b6a enable Dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl 2020-04-11 04:55:01 +02:00
974586bedc Dslash finally works; cleaned up; uses MOVPRFX in assembly 2020-04-10 22:26:40 +02:00
8e81a811d0 Merge branch 'feature/hdcr' into develop 2020-04-10 11:14:49 -04:00
160f78c1e4 changed debug output to variable direct 3 2020-04-10 12:23:07 +02:00
7e4e1bbbc2 changed debug output to variable direct 2 2020-04-10 12:22:04 +02:00
e699b7e9f9 changed debug output to variable direct 2020-04-10 12:18:30 +02:00
a28bc0de90 debug register address test in WilsonHand 2020-04-10 12:07:45 +02:00
14d0fe4d6c added predication in WilsonHand 2020-04-10 12:04:00 +02:00
0ad2e0815c debug output in WilsonHand 2020-04-10 11:56:29 +02:00
dc9c8340bb switched to DSLASHINTRIN for A64FX Dslash intrinsics 2020-04-09 23:30:23 +02:00
19eef97503 specialized A64FX Dslash kernels 2020-04-09 23:25:25 +02:00
5cdbb7e71e fixed A64FX Dslash; compiles, but does not specialize -> assertion 2020-04-09 21:23:39 +02:00
86c9c4da8b changes 2020-04-09 16:40:06 +02:00
bd310932f7 changes 2020-04-09 16:32:31 +02:00
77fa586f6c introduced A64FX Wilson kernels 2020-04-09 13:30:06 +02:00
2c22db841a Added momentum scaling to scalar HMC theories in order to follow UKQCD/CPS conventions 2020-04-02 17:38:47 +01:00
b6cbdd2aa3 Merge pull request #1 from DanielRichtmann/feature/read-openqcd 2020-03-26 17:39:04 +01:00
  Feature/read openqcd
a2188ea875 remove debugging printf from WilsonKernelsImplementation 2020-03-26 09:12:36 -04:00
989af65807 Check in parallel reader for openqcd configs 2020-03-24 11:20:54 +01:00
c9b737a4e7 make trace,adj,transpose unary operators 2020-03-16 17:58:30 -04:00
037bb6ea73 Check in reader for openqcd configs 2020-03-16 14:28:02 +01:00
  This reader is suboptimal in the sense that it opens the entire config on every MPI rank.
7c061e20c9 All directions of Dirac operator for fast coarsening 2020-01-27 12:40:13 -05:00
e5d1c09665 Faster DhopDirAll for little dirac operator coarsening 2020-01-27 12:38:54 -05:00
8016a465ae Remove extraneous variable 2020-01-27 12:35:37 -05:00