1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-04-23 04:05:55 +01:00

128 Commits

Author SHA1 Message Date
0baaddbe98 Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384
nodes.
More concurrency/fine grained scheduling is possible.
2025-02-04 19:27:26 +00:00
8cf809e231 Best results on Aurora so far 2025-01-31 16:14:45 +00:00
94019a922e Significantly better performance on Aurora without using pipeline mode 2025-01-30 16:36:46 +00:00
Peter Boyle
8f70cfeda9 Clean up 2024-10-18 13:56:53 -04:00
c5c67b706e cl::sycl -> SYCL 2024-10-10 22:04:12 +00:00
066544281f Deprecate UVM 2024-09-17 13:34:27 +00:00
Peter Boyle
c4b9f71357 CPU compile ordering is important 2024-05-21 02:22:32 +01:00
Peter Boyle
9a1ad6a5eb Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2024-05-17 11:33:46 -04:00
Peter Boyle
057f86c1de 2 queues works ok in performance 2024-05-07 18:42:50 +00:00
Peter Boyle
cc04dc42dc Merge branch 'develop' into feature/scidac-wp1 2024-03-06 14:55:21 -05:00
783a66b348 Deterministic reduction please 2024-03-06 00:01:37 +00:00
Peter Boyle
3f1636637d
Merge pull request #453 from dbollweg/feature/sliceSum_gpu
Feature/slice sum gpu
2024-02-28 14:04:43 -05:00
Christoph Lehner
9f89486df5 remove unnecessary code path 2024-02-28 19:56:23 +01:00
Christoph Lehner
22b43b86cb Make GPT test suite work with SYCL 2024-02-28 12:57:17 +01:00
Peter Boyle
9f40467e24 Warning squash 2024-02-27 11:40:36 -05:00
Dennis Bollweg
6cd2d8fcd5 Replace cuda/hip memcpy with Grid functions 2024-02-26 09:55:07 -05:00
Peter Boyle
5e5b471bb2 Put/Get and DEviceToDevice 2024-02-21 14:47:06 -05:00
303b83cdb8 Scaling benchmarks, verbosity and MPICH aware in acceleratorInit()
For some reason Dirichlet benchmark fails on several nodes; need to
debug this.
2024-02-13 19:48:03 +00:00
Peter Boyle
6f51b49ef8 Use stderr 2024-01-22 17:41:09 -05:00
Peter Boyle
e07cb2b9de Accelerator memory 2024-01-17 16:24:31 -05:00
Peter Boyle
d34b207eab Avoid HIP warnings 2023-10-24 10:57:04 -04:00
Peter Boyle
33097681b9 FTHMC compiled and merged to develop 2023-10-14 00:42:55 +03:00
Peter Boyle
9626a2c7c0 Asynch handling 2023-10-13 18:21:56 +03:00
Peter Boyle
c564611ba7 Annoying hack that is useful to preserve for profiling 2023-09-29 17:11:12 -04:00
Peter Boyle
b4f2ca81ff Copy queue and compute queue same as better concurrency 2023-04-11 12:18:21 -07:00
Peter Boyle
da503fef0e Name change on barrier routine 2023-04-11 12:14:04 -07:00
Peter Boyle
4a382fad3f Use distinct SYCL queue for copies 2023-04-04 07:41:41 -07:00
Peter Boyle
af64c1c6b6 Had managed to drop the accelerator_barrier() in the Wilson Compressor gather 2023-03-30 17:34:44 -04:00
Peter Boyle
866f48391a Temporary fix for develop incorrect results 2023-03-30 17:10:13 -04:00
Peter Boyle
496d04cd85 Weaken the Fence 2023-03-29 18:58:51 -04:00
Peter Boyle
b5b759df73 Merge branch 'develop' into feature/dirichlet 2023-03-21 16:05:46 -04:00
Peter Boyle
861e5d7f4c SYCL version update. Why do they keep making incompatible changes 2023-03-14 12:10:02 -07:00
Peter Boyle
03508448f8 Remove verbose 2022-10-04 11:12:15 -07:00
Peter Boyle
1177b8f661 Merge branch 'develop' into feature/dirichlet 2022-08-31 19:05:57 -04:00
Peter Boyle
95b640cb6b 10TF/s on 32^3 x 64 on single node 2022-08-04 15:43:52 -04:00
Peter Boyle
2cb5bedc15 Copy stream HIP improvements 2022-08-04 15:24:03 -04:00
Peter Boyle
188d2c7a4d PVC default, ignore ATS 2022-08-02 08:38:53 -07:00
Peter Boyle
84110166e4 Fix the fence 2022-08-02 08:00:43 -07:00
Peter Boyle
d32b923b6c Fencing on a stream in SYCL is needed. Didn't know that ... gulp 2022-08-02 07:58:04 -07:00
Peter Boyle
5f8892bf03 Mistake pointed out by Camilo 2022-07-19 09:31:51 -07:00
Peter Boyle
f14e7e51e7 Grid accelerator 2022-07-12 10:56:22 -07:00
Peter Boyle
3544965f54 Stream doesn't work 2022-07-07 17:49:20 +01:00
Peter Boyle
bd99fd608c Introduce a non-default stream for compute operatoins 2022-07-01 09:42:53 -04:00
Peter Boyle
136d843ce7 Crusher updates 2022-05-25 12:36:09 -04:00
Peter Boyle
5012adfebf Merge branch 'develop' into feature/dirichlet 2022-04-05 16:26:19 -04:00
Peter Boyle
92a83a9eb3 Performance improve for Tesseract 2022-03-16 17:14:36 +00:00
Peter Boyle
5340e50427 HMC running with new formulation 2022-03-01 17:10:25 -05:00
Peter Boyle
e16fc5b2e4 Threaded intranode comms transfer - ideally between NUMA domains 2022-03-01 11:17:24 -05:00
Julio Maia
86f4e17928 Changing thread block order and adding launch_bounds 2022-02-07 11:29:37 -06:00
Peter Boyle
7f7d06d963 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2021-12-07 09:06:42 -08:00