mirror of https://github.com/paboyle/Grid.git synced 2025-06-17 15:27:06 +01:00
Commit Graph

268 Commits

Author SHA1 Message Date
f617468e04 Update Lattice_base.h 2024-10-11 10:39:16 -04:00
2c9878fc3a Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2024-08-27 12:05:46 -04:00
3668e81c5e Extract slice working on checkerboard field for Block Lanczos 2024-08-27 11:31:30 -04:00
d66b2423cb Move slice operations to GPU for BlockCG 2024-08-27 11:28:47 -04:00
15cc78f0b6 peek/poke local site on checkerboard arrays 2024-08-27 11:23:42 -04:00
06db4ddea2 Fast init on GPU 2024-08-27 11:22:33 -04:00
a3322b470f Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2024-08-20 14:30:52 +00:00
6f1328160c Remove SVM use 2024-07-25 18:05:40 +00:00
a66973163f Device vector not UVM 2024-07-11 15:24:11 +00:00
9563238e9b Force initial to identity 2024-06-11 17:51:58 +00:00
1739146599 Property to initialise reduction 2024-06-11 16:47:35 +00:00
a49a161f8d SYCL update to use buffer on reduction variable 2024-06-08 16:05:18 +00:00
cfe1b13225 Back out zero change 2024-05-21 01:14:08 +01:00
832fc08809 Merge pull request #459 from dbollweg/sycl_slicesum_update: Sycl slicesum bugfix 2024-05-20 15:06:53 -04:00
5c3ace7c3e Merge branch 'develop' into feature/scidac-wp1 2024-04-30 05:26:06 -04:00
57552d8ca3 Assign from non-lattice made accelerator resident 2024-04-05 01:05:12 -04:00
d1e9fe50d2 Xor csum for repro testing 2024-03-22 15:42:57 +00:00
e49e95b037 Upgrade of the Britney test with flight recorder and fast xor checksum 2024-03-22 15:39:27 +00:00
fab1efb48c More britney logging improvements 2024-03-19 14:36:21 +00:00
461cd045c6 sliceSum cleanup 2024-03-13 18:18:44 -04:00
fee65d7a75 Merge branch 'paboyle:develop' into sycl_slicesum_update 2024-03-13 18:06:17 -04:00
31f9971dbf avoid PI_ERROR_OUT_OF_RESOURCES in sycl sliceSum 2024-03-13 13:39:26 -04:00
95f3d69cf9 Extra hardware test hook 2024-03-12 20:09:37 +00:00
cf8632bbac Britney test option 2024-03-12 15:15:35 +00:00
d87296f3e8 Merge branch 'develop' of https://github.com/dbollweg/Grid into develop 2024-03-06 16:54:22 -05:00
be94cf1c6f Fewer wait-calls in sycl slicesum 2024-03-06 16:53:13 -05:00
cc04dc42dc Merge branch 'develop' into feature/scidac-wp1 2024-03-06 14:55:21 -05:00
976c3e9b59 Hack for flight logging CG inner products. Can be made to work, but could put in some more serious infrastructure for repro testing and blame attribution (Britney test) if necessary 2024-03-05 23:59:57 +00:00
3f1636637d Merge pull request #453 from dbollweg/feature/sliceSum_gpu: Feature/slice sum gpu 2024-02-28 14:04:43 -05:00
22b43b86cb Make GPT test suite work with SYCL 2024-02-28 12:57:17 +01:00
3c9012676a CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case. 2024-02-27 12:41:45 -05:00
6cd2d8fcd5 Replace cuda/hip memcpy with Grid functions 2024-02-26 09:55:07 -05:00
0a816b5509 Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu 2024-02-22 21:43:06 -05:00
1c8b807c2e free malloc'd memory 2024-02-22 21:42:44 -05:00
44b466e072 Make InsertSliceFast the default at some point in future. Should I do this now? 2024-02-21 14:51:24 -05:00
66391f84f2 Merge branch 'feature/gpt' of ../Grid into develop 2024-02-21 19:05:00 +01:00
15878f7613 sliceSumReduction_cub_large now also faster than CPU on Frontier 2024-02-16 13:55:21 -05:00
6f3455900e Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs 2024-02-16 13:15:02 -05:00
09af8c25a2 Merge branch 'paboyle:develop' into feature/sliceSum_gpu 2024-02-09 13:02:59 -05:00
9514035b87 refactor slicesum: slicesum uses GPU version by default now 2024-02-09 13:02:28 -05:00
7019916294 RNG seed change safer for large volumes; this is a long term solution 2024-02-07 00:56:39 +00:00
1514b4f137 slicesum_sycl passes test 2024-02-06 19:08:44 -05:00
ab2de131bd work towards sliceSum for sycl backend 2024-02-06 13:24:45 -05:00
5af8da76d7 Fix cuda compilation of Lattice_slicesum_gpu.h 2024-02-01 18:02:30 -05:00
b8b9dc952d Async memcpy's and cleanup 2024-02-01 17:55:35 -05:00
79a6ed32d8 Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies 2024-02-01 16:41:03 -05:00
caa5f97723 Add sliceSum gpu using cub/hipcub 2024-01-31 16:50:06 -05:00
addc638856 Fast localCopyRegion, blockProjectFast 2024-01-22 17:40:38 -05:00
ca5ae8a2e6 Revert to working. 2024-01-17 16:32:05 -05:00
b7c7000d0d Don't need the numerical rounding tolerance in multigrid 2023-12-22 18:10:23 -05:00