Peter Boyle
a00ae981e0
Fence propagation from SYCL
2023-03-29 15:00:40 -04:00
Peter Boyle
d32b923b6c
Fencing on a stream in SYCL is needed. Didn't know that ... gulp
2022-08-02 07:58:04 -07:00
u61464
0e21adb3f6
Gives 200GF/s on SYCL/DG1 8^4, doesn't uglify develop for other platforms too badly.
...
Easy to revert to cleaner, more stylistic C++ code. There's a SYCL_HACK macro I will clean up later once dpcpp
evolves a central nervous system.
2021-03-10 05:40:51 -08:00
Peter Boyle
442336bd96
Hand unrolled to use optimised code paths on GPU for coalesced reads in Wilson case.
...
Other cases to do. This now includes comms code path.
2021-03-02 14:50:51 +01:00
Peter Boyle
2859955a03
HIP requires "inline"
2020-09-16 00:36:13 +01:00
nmeyer-ur
8726e94ea7
merge upstream develop
2020-07-07 20:26:47 +02:00
Peter Boyle
cdf0a04fc5
Merge branch 'develop' into sycl
2020-06-09 04:00:12 -04:00
Peter Boyle
e97f3688db
Fix the HMC issue - kernel was launching asynchronously
2020-06-08 17:01:15 -04:00
nmeyer-ur
433766ac62
revert Add/SubTimesI and prefetching in stencil
...
This reverts commit 9b2699226c7a3ca8d45f843f4f8e4658fa082163.
2020-06-08 12:02:53 +02:00
Peter Boyle
1a4c8c3387
Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.
2020-06-05 18:52:35 -04:00
nmeyer-ur
5ee3ea2144
round-up after testing of prefetches in stencil close
2020-06-03 11:58:20 +02:00
Peter Boyle
006cc8a8f1
Staggered move to accelerator
2020-05-28 08:33:06 -04:00
Peter Boyle
7860a50f70
Make view specify where and drive data motion - first cut.
...
This is a compile-time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Christoph Lehner
3c6ffcb48c
Merge branch 'develop' into feature/gpt
2020-05-06 15:03:35 +02:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
Peter Boyle
c2c3cad20d
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-04-23 04:35:42 -04:00
Peter Boyle
edec9ee2e2
Conserved current rewrite done. Zmobius working
2020-04-23 04:34:01 -04:00
nmeyer-ur
39b448affb
Merge remote-tracking branch 'origin/develop' into feature/a64fx-2
2020-04-22 17:34:12 +02:00
Christoph Lehner
327da332bb
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt
2020-04-16 11:30:17 -04:00
nils meyer
974586bedc
Dslash finally works; cleaned up; uses MOVPRFX in assembly
2020-04-10 22:26:40 +02:00
Christoph Lehner
a2188ea875
remove debugging printf from WilsonKernelsImplementation
2020-03-26 09:12:36 -04:00
Peter Boyle
e5d1c09665
Faster DhopDirAll for little dirac operator coarsening
2020-01-27 12:38:54 -05:00
Peter Boyle
3c3d6a94f3
Optimising the force term a bit
2020-01-04 03:16:23 -05:00
Peter Boyle
53e3ab4131
Fix force term
2019-08-11 11:06:13 +01:00
Peter Boyle
1282e1067f
Do the force term on the accelerator too. Needed particularly because comms buffers
...
are device memory.
2019-07-29 22:58:35 +01:00
Peter Boyle
fe700a183a
Getting HMC to run
2019-07-26 12:18:29 +01:00
Peter Boyle
bd155ca5c0
Overlap comms with compute now supported
2019-07-12 09:09:40 +01:00
Peter Boyle
1299225105
Accelerator loop changes
2019-06-15 09:03:46 +01:00
Peter Boyle
36f06555a2
Simplify Impl
2019-06-09 22:26:27 +01:00
Peter Boyle
d6c0e0756d
Remove GPU version
2019-06-09 11:23:42 +01:00
Peter Boyle
3e41b1055c
Remove GPU-only kernels.
2019-06-09 11:20:01 +01:00
Peter Boyle
ad2c433574
Instantiations move. Tried using Gianluca's suggestion about avoiding threadIdx but doesn't
...
seem to make a difference. Will revisit this and probably remove the lane parameter from the coalescedRead
2019-06-08 13:43:12 +01:00
Peter Boyle
0ee6e77cbc
Compiles GPU and CPU, still gives good performance on CPU
2019-06-05 13:28:16 +01:00
Peter Boyle
ec68b67d5d
Attempt at unified GPU and CPU kernel
2019-06-03 14:55:51 +01:00