f617468e04  2024-10-11 10:39:16 -04:00  Update Lattice_base.h
2c9878fc3a  2024-08-27 12:05:46 -04:00  Merge branch 'develop' of https://github.com/paboyle/Grid into develop
3668e81c5e  2024-08-27 11:31:30 -04:00  Extract slice working on checkerboard field for Block Lanczos
d66b2423cb  2024-08-27 11:28:47 -04:00  Move slice operations to GPU for BlockCG
15cc78f0b6  2024-08-27 11:23:42 -04:00  peek/poke local site on checkerboard arrays
06db4ddea2  2024-08-27 11:22:33 -04:00  Fast init on GPU
a3322b470f  2024-08-20 14:30:52 +00:00  Merge branch 'develop' of https://github.com/paboyle/Grid into develop
6f1328160c  2024-07-25 18:05:40 +00:00  Remove SVM use
a66973163f  2024-07-11 15:24:11 +00:00  Device vector not UVM
9563238e9b  2024-06-11 17:51:58 +00:00  Force initial to identity
1739146599  2024-06-11 16:47:35 +00:00  Property to initialise reduction
a49a161f8d  2024-06-08 16:05:18 +00:00  SYCL update to use buffer on reduction variable
cfe1b13225  2024-05-21 01:14:08 +01:00  Back out zero change
832fc08809  2024-05-20 15:06:53 -04:00  Merge pull request #459 from dbollweg/sycl_slicesum_update
                                        Sycl slicesum bugfix
5c3ace7c3e  2024-04-30 05:26:06 -04:00  Merge branch 'develop' into feature/scidac-wp1
57552d8ca3  2024-04-05 01:05:12 -04:00  Assign from non-lattice made accelerator resident
d1e9fe50d2  2024-03-22 15:42:57 +00:00  Xor csum for repro testing
e49e95b037  2024-03-22 15:39:27 +00:00  Upgrade of the Britney test with flight recorder and fast xor checksum
fab1efb48c  2024-03-19 14:36:21 +00:00  More britney logging improvements
461cd045c6  2024-03-13 18:18:44 -04:00  sliceSum cleanup
fee65d7a75  2024-03-13 18:06:17 -04:00  Merge branch 'paboyle:develop' into sycl_slicesum_update
31f9971dbf  2024-03-13 13:39:26 -04:00  avoid PI_ERROR_OUT_OF_RESOURCES in sycl sliceSum
95f3d69cf9  2024-03-12 20:09:37 +00:00  Extra hardware test hook
cf8632bbac  2024-03-12 15:15:35 +00:00  Britney test option
d87296f3e8  2024-03-06 16:54:22 -05:00  Merge branch 'develop' of https://github.com/dbollweg/Grid into develop
be94cf1c6f  2024-03-06 16:53:13 -05:00  Fewer wait-calls in sycl slicesum
cc04dc42dc  2024-03-06 14:55:21 -05:00  Merge branch 'develop' into feature/scidac-wp1
976c3e9b59  2024-03-05 23:59:57 +00:00  Hack for flight logging CG inner products.
                                        Can be made to work, but could put in some more serious infrastructure
                                        for repro testing and blame attribution (Britney test) if necessary
3f1636637d  2024-02-28 14:04:43 -05:00  Merge pull request #453 from dbollweg/feature/sliceSum_gpu
                                        Feature/slice sum gpu
22b43b86cb  2024-02-28 12:57:17 +01:00  Make GPT test suite work with SYCL
3c9012676a  2024-02-27 12:41:45 -05:00  CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case.
6cd2d8fcd5  2024-02-26 09:55:07 -05:00  Replace cuda/hip memcpy with Grid functions
0a816b5509  2024-02-22 21:43:06 -05:00  Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu
1c8b807c2e  2024-02-22 21:42:44 -05:00  free malloc'd memory
44b466e072  2024-02-21 14:51:24 -05:00  Make InsertSliceFast the default at some point in future.
                                        Should I do this now?
66391f84f2  2024-02-21 19:05:00 +01:00  Merge branch 'feature/gpt' of ../Grid into develop
15878f7613  2024-02-16 13:55:21 -05:00  sliceSumReduction_cub_large now also faster than CPU on Frontier
6f3455900e  2024-02-16 13:15:02 -05:00  Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs
09af8c25a2  2024-02-09 13:02:59 -05:00  Merge branch 'paboyle:develop' into feature/sliceSum_gpu
9514035b87  2024-02-09 13:02:28 -05:00  refactor slicesum: slicesum uses GPU version by default now
7019916294  2024-02-07 00:56:39 +00:00  RNG seed change safer for large volumes; this is a long term solution
1514b4f137  2024-02-06 19:08:44 -05:00  slicesum_sycl passes test
ab2de131bd  2024-02-06 13:24:45 -05:00  work towards sliceSum for sycl backend
5af8da76d7  2024-02-01 18:02:30 -05:00  Fix cuda compilation of Lattice_slicesum_gpu.h
b8b9dc952d  2024-02-01 17:55:35 -05:00  Async memcpy's and cleanup
79a6ed32d8  2024-02-01 16:41:03 -05:00  Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
caa5f97723  2024-01-31 16:50:06 -05:00  Add sliceSum gpu using cub/hipcub
addc638856  2024-01-22 17:40:38 -05:00  Fast localCopyRegion, blockProjectFast
ca5ae8a2e6  2024-01-17 16:32:05 -05:00  Revert to working.
b7c7000d0d  2023-12-22 18:10:23 -05:00  Don't need the numerical rounding tolerance in multigrid