mirror of https://github.com/paboyle/Grid.git synced 2025-04-12 07:00:45 +01:00

35 Commits

Author SHA1 Message Date
Peter Boyle
a00ae981e0 Fence propagation from SYCL 2023-03-29 15:00:40 -04:00
Peter Boyle
d32b923b6c Fencing on a stream in SYCL is needed. Didn't know that ... gulp 2022-08-02 07:58:04 -07:00
u61464
0e21adb3f6 Gives 200GF/s on SYCL/DG1 8^4, doesn't uglify develop for other platforms too badly.
Easy to revert to cleaner, more C++-stylistic code. There's a SYCL_HACK macro I will clean up later once dpcpp
evolves a central nervous system.
2021-03-10 05:40:51 -08:00
Peter Boyle
442336bd96 Hand unrolled to use optimised code paths on GPU for coalesced reads in Wilson case.
Other cases to do. This now includes comms code path.
2021-03-02 14:50:51 +01:00
Peter Boyle
2859955a03 HIP requires "inline" 2020-09-16 00:36:13 +01:00
nmeyer-ur
8726e94ea7 merge upstream develop 2020-07-07 20:26:47 +02:00
Peter Boyle
cdf0a04fc5 Merge branch 'develop' into sycl 2020-06-09 04:00:12 -04:00
Peter Boyle
e97f3688db Fix the HMC issue - kernel was launching asynchronously 2020-06-08 17:01:15 -04:00
nmeyer-ur
433766ac62 revert Add/SubTimesI and prefetching in stencil
This reverts commit 9b2699226c7a3ca8d45f843f4f8e4658fa082163.
2020-06-08 12:02:53 +02:00
Peter Boyle
1a4c8c3387 Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes. 2020-06-05 18:52:35 -04:00
nmeyer-ur
5ee3ea2144 round-up after testing of prefetches in stencil close 2020-06-03 11:58:20 +02:00
Peter Boyle
006cc8a8f1 Staggered move to accelerator 2020-05-28 08:33:06 -04:00
Peter Boyle
7860a50f70 Make view specify where and drive data motion - first cut.
This is a compile-time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
Peter Boyle
f8b8e00090 Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
Aim to reduce the amount of cuda and other code variations floating around all over the place.

Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Christoph Lehner
3c6ffcb48c Merge branch 'develop' into feature/gpt 2020-05-06 15:03:35 +02:00
Peter Boyle
28a1fcaaff First compile against SYCL 2020-05-05 11:13:27 -07:00
Peter Boyle
c2c3cad20d Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-04-23 04:35:42 -04:00
Peter Boyle
edec9ee2e2 Conserved current rewrite done. Zmobius working 2020-04-23 04:34:01 -04:00
nmeyer-ur
39b448affb Merge remote-tracking branch 'origin/develop' into feature/a64fx-2 2020-04-22 17:34:12 +02:00
Christoph Lehner
327da332bb Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt 2020-04-16 11:30:17 -04:00
nils meyer
974586bedc Dslash finally works; cleaned up; uses MOVPRFX in assembly 2020-04-10 22:26:40 +02:00
Christoph Lehner
a2188ea875 remove debugging printf from WilsonKernelsImplementation 2020-03-26 09:12:36 -04:00
Peter Boyle
e5d1c09665 Faster DhopDirAll for little dirac operator coarsening 2020-01-27 12:38:54 -05:00
Peter Boyle
3c3d6a94f3 Optimising the force term a bit 2020-01-04 03:16:23 -05:00
Peter Boyle
53e3ab4131 Fix force term 2019-08-11 11:06:13 +01:00
Peter Boyle
1282e1067f Do the force term on the accelerator too. Needed particularly because comms buffers
are device memory.
2019-07-29 22:58:35 +01:00
Peter Boyle
fe700a183a Getting HMC to run 2019-07-26 12:18:29 +01:00
Peter Boyle
bd155ca5c0 Overlap of comms with compute now supported 2019-07-12 09:09:40 +01:00
Peter Boyle
1299225105 Accelerator loop changes 2019-06-15 09:03:46 +01:00
Peter Boyle
36f06555a2 Simplify Impl 2019-06-09 22:26:27 +01:00
Peter Boyle
d6c0e0756d Remove GPU version 2019-06-09 11:23:42 +01:00
Peter Boyle
3e41b1055c Remove Gpu only kernels. 2019-06-09 11:20:01 +01:00
Peter Boyle
ad2c433574 Instantiations move. Tried using Gianluca's suggestion about avoiding threadIdx but doesn't
seem to make a difference. Will revisit this and probably remove the lane parameter from the coalescedRead
2019-06-08 13:43:12 +01:00
Peter Boyle
0ee6e77cbc Compiles GPU and CPU, still gives good performance on CPU 2019-06-05 13:28:16 +01:00
Peter Boyle
ec68b67d5d Attempt at unified GPU and CPU kernel 2019-06-03 14:55:51 +01:00