ferben
6c6812a5ca
GB/s output
2020-05-20 12:26:57 +01:00
ferben
1f154fe652
some cleanup in BaryonUtils
2020-05-19 13:48:56 +01:00
ferben
d708c0258d
some cleanup in BaryonUtils
2020-05-19 13:48:00 +01:00
Peter Boyle
2e652431e5
Fix compile failure on Summit
2020-05-12 18:56:47 -04:00
Peter Boyle
82f71643a4
Remove the norm in MdagM
2020-05-12 17:55:53 -04:00
nmeyer-ur
20d1941a45
enabled asm kernels for fixed-size A64FXFIXEDSIZE
2020-05-12 19:01:12 +02:00
Peter Boyle
bbbee5660d
First compile on HIP
2020-05-10 05:28:09 -04:00
Peter Boyle
2bb2c68e15
Separate pools for small and large allocations cache
2020-05-09 22:57:21 -04:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Peter Boyle
0dd1bdfa94
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-05-08 09:21:43 -04:00
Peter Boyle
93920c4811
Remove verbose
2020-05-08 09:19:54 -04:00
ferben
42bb5f0721
assertion
2020-05-07 18:06:12 +01:00
ferben
253bcc3426
back to old version
2020-05-07 18:03:17 +01:00
ferben
591ebb6213
Merge branch 'develop' of github.com:paboyle/Grid into feature/baryonSpeedup
2020-05-07 11:13:21 +01:00
ferben
56e2f7d088
deleted test routines. cleaned up fast version. assert Ns=4,Nc=3.
2020-05-07 10:03:45 +01:00
Christoph Lehner
3c6ffcb48c
Merge branch 'develop' into feature/gpt
2020-05-06 15:03:35 +02:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
Peter Boyle
dd3ebc2ce4
Slow compile on NVCC; switch off conserved current
2020-04-29 08:43:12 -04:00
ferben
6240e02619
added assertion to avoid potential infinite loop
2020-04-27 18:50:53 +01:00
ferben
f4033ad8cb
baryon speedup by a factor 2
2020-04-27 17:46:14 +01:00
Peter Boyle
c2c3cad20d
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-04-23 04:35:42 -04:00
Peter Boyle
edec9ee2e2
Conserved current rewrite done. Zmobius working
2020-04-23 04:34:01 -04:00
nmeyer-ur
39b448affb
Merge remote-tracking branch 'origin/develop' into feature/a64fx-2
2020-04-22 17:34:12 +02:00
Christopher Kelly
181709bba4
Merge branch 'develop' into feature/zmobius_paramcompute
2020-04-20 09:12:34 -04:00
nils meyer
64b72fc17f
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
2020-04-19 01:25:40 +02:00
nils meyer
6fdce60492
revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0)
2020-04-16 22:43:32 +02:00
Peter Boyle
0475c46ecb
Merge pull request #256 from djm2131/feature/BiCGSTAB
Import BiCGSTAB solvers and tests
2020-04-16 11:45:15 -04:00
Christoph Lehner
327da332bb
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt
2020-04-16 11:30:17 -04:00
nils meyer
6504a098cc
999 GiB/s Wilson; 694 GiB/s DW (DP)
2020-04-15 15:06:52 +02:00
nils meyer
c12a67030a
980 GiB/s Wilson; 680 GiB/s DW (DP)
2020-04-15 10:55:06 +02:00
nils meyer
581392f2f2
now with pf, best results so far using intrinsics+pf
2020-04-12 22:06:14 +02:00
nils meyer
113f277b6a
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
2020-04-11 04:55:01 +02:00
nils meyer
974586bedc
Dslash finally works; cleaned up; uses MOVPRFX in assembly
2020-04-10 22:26:40 +02:00
Peter Boyle
8e81a811d0
Merge branch 'feature/hdcr' into develop
2020-04-10 11:14:49 -04:00
nmeyer-ur
160f78c1e4
changed debug output to variable direct 3
2020-04-10 12:23:07 +02:00
nmeyer-ur
7e4e1bbbc2
changed debug output to variable direct 2
2020-04-10 12:22:04 +02:00
nmeyer-ur
e699b7e9f9
changed debug output to variable direct
2020-04-10 12:18:30 +02:00
nmeyer-ur
a28bc0de90
debug register address test in WilsonHand
2020-04-10 12:07:45 +02:00
nmeyer-ur
14d0fe4d6c
added predication in WilsonHand
2020-04-10 12:04:00 +02:00
nmeyer-ur
0ad2e0815c
debug output in WilsonHand
2020-04-10 11:56:29 +02:00
nils meyer
dc9c8340bb
switched to DSLASHINTRIN for A64FX Dslash intrinsics
2020-04-09 23:30:23 +02:00
nils meyer
19eef97503
specialized A64FX Dslash kernels
2020-04-09 23:25:25 +02:00
nils meyer
5cdbb7e71e
fixed A64FX Dslash; compiles, but does not specialize -> assertion
2020-04-09 21:23:39 +02:00
nmeyer-ur
86c9c4da8b
changes
2020-04-09 16:40:06 +02:00
nmeyer-ur
bd310932f7
changes
2020-04-09 16:32:31 +02:00
nmeyer-ur
77fa586f6c
introduced A64FX Wilson kernels
2020-04-09 13:30:06 +02:00
2c22db841a
Added momentum scaling to scalar HMC theories in order to follow UKQCD/CPS conventions
2020-04-02 17:38:47 +01:00
Christoph Lehner
b6cbdd2aa3
Merge pull request #1 from DanielRichtmann/feature/read-openqcd
Feature/read openqcd
2020-03-26 17:39:04 +01:00
Christoph Lehner
a2188ea875
remove debugging printf from WilsonKernelsImplementation
2020-03-26 09:12:36 -04:00
Daniel Richtmann
989af65807
Check in parallel reader for openqcd configs
2020-03-24 11:20:54 +01:00