Peter Boyle
cc220abd1d
inline for HIP
2020-09-16 00:35:38 +01:00
Peter Boyle
d1c0c0197e
HipCC requires inline on definition
2020-09-16 00:35:06 +01:00
Peter Boyle
fd9424ef27
inlines required to make HIP happy
2020-09-16 00:34:32 +01:00
Peter Boyle
a5c35c4024
Make HIP / Vega happy
2020-09-16 00:33:53 +01:00
Peter Boyle
b4255140d6
Stale data member eliminated
2020-09-03 15:47:46 -04:00
Christoph Lehner
0e88bf4bff
remove Nils's default pragma
2020-07-29 10:24:35 -04:00
nmeyer-ur
bbd145382b
enable --enable-simd=A64FX in configure
2020-07-08 12:43:51 +02:00
nmeyer-ur
8726e94ea7
merge upstream develop
2020-07-07 20:26:47 +02:00
Peter Boyle
b949cf6b12
PeekLocal needs a view to keep thread safe.
...
ALLOCATION_CACHE reenable
2020-06-19 17:13:27 -04:00
Peter Boyle
1aa988b2af
Comms overlap fix UVM case
2020-06-19 01:21:14 -04:00
Peter Boyle
fd97f64612
Merge branch 'sycl' of https://github.com/paboyle/Grid into sycl
2020-06-10 12:58:13 -04:00
Peter Boyle
8720aecb80
Offload more loops
2020-06-10 12:57:55 -04:00
Peter Boyle
cdf0a04fc5
Merge branch 'develop' into sycl
2020-06-09 04:00:12 -04:00
Peter Boyle
e97f3688db
Fix the HMC issue - kernel was launching asynchronously
2020-06-08 17:01:15 -04:00
nmeyer-ur
433766ac62
revert Add/SubTimesI and prefetching in stencil
...
This reverts commit 9b2699226c.
2020-06-08 12:02:53 +02:00
Peter Boyle
1a4c8c3387
Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.
2020-06-05 18:52:35 -04:00
nmeyer-ur
5ee3ea2144
round-up after testing of prefetches in stencil close
2020-06-03 11:58:20 +02:00
nmeyer-ur
91c81cab30
some corrections; compiles on my laptop; untested
2020-05-29 18:19:22 +02:00
nmeyer-ur
38164f8480
include counters in WilsonFermionImplementation.h
2020-05-29 17:59:26 +02:00
nmeyer-ur
f013979791
add counter support in WilsonFermion.h
2020-05-29 17:13:59 +02:00
Peter Boyle
1d252d0922
Accelerator inline
2020-05-28 11:45:25 -04:00
Peter Boyle
006cc8a8f1
Staggered move to accelerator
2020-05-28 08:33:06 -04:00
Peter Boyle
7860a50f70
Make view specify where and drive data motion - first cut.
...
This is a compile time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
nmeyer-ur
9e085bd04e
guard prevents multiple A64FX build messages
2020-05-20 19:16:30 +02:00
Peter Boyle
82f71643a4
Remove the norm in MdagM
2020-05-12 17:55:53 -04:00
nmeyer-ur
20d1941a45
enabled asm kernels for fixed-size A64FXFIXEDSIZE
2020-05-12 19:01:12 +02:00
Peter Boyle
bbbee5660d
First compile on HIP
2020-05-10 05:28:09 -04:00
Peter Boyle
2bb2c68e15
Separate pools for small and large allocations cache
2020-05-09 22:57:21 -04:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Peter Boyle
93920c4811
Remove verbose
2020-05-08 09:19:54 -04:00
Christoph Lehner
3c6ffcb48c
Merge branch 'develop' into feature/gpt
2020-05-06 15:03:35 +02:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
Peter Boyle
dd3ebc2ce4
Slow compile on NVCC; switch off conserved current
2020-04-29 08:43:12 -04:00
Peter Boyle
c2c3cad20d
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-04-23 04:35:42 -04:00
Peter Boyle
edec9ee2e2
Conserved current rewrite done. Zmobius working
2020-04-23 04:34:01 -04:00
nmeyer-ur
39b448affb
Merge remote-tracking branch 'origin/develop' into feature/a64fx-2
2020-04-22 17:34:12 +02:00
Christopher Kelly
181709bba4
Merge branch 'develop' into feature/zmobius_paramcompute
2020-04-20 09:12:34 -04:00
nils meyer
64b72fc17f
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
2020-04-19 01:25:40 +02:00
nils meyer
6fdce60492
revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0)
2020-04-16 22:43:32 +02:00
Christoph Lehner
327da332bb
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt
2020-04-16 11:30:17 -04:00
nils meyer
6504a098cc
999 GiB/s Wilson; 694 GiB/s DW (DP)
2020-04-15 15:06:52 +02:00
nils meyer
c12a67030a
980 GiB/s Wilson; 680 GiB/s DW (DP)
2020-04-15 10:55:06 +02:00
nils meyer
581392f2f2
now with pf, best results so far using intrinsics+pf
2020-04-12 22:06:14 +02:00
nils meyer
113f277b6a
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
2020-04-11 04:55:01 +02:00
nils meyer
974586bedc
Dslash finally works; cleaned up; uses MOVPRFX in assembly
2020-04-10 22:26:40 +02:00
Peter Boyle
8e81a811d0
Merge branch 'feature/hdcr' into develop
2020-04-10 11:14:49 -04:00
nmeyer-ur
160f78c1e4
changed debug output to variable direct 3
2020-04-10 12:23:07 +02:00
nmeyer-ur
7e4e1bbbc2
changed debug output to variable direct 2
2020-04-10 12:22:04 +02:00
nmeyer-ur
e699b7e9f9
changed debug output to variable direct
2020-04-10 12:18:30 +02:00
nmeyer-ur
a28bc0de90
debug register address test in WilsonHand
2020-04-10 12:07:45 +02:00