D. A. Clarke
|
927f8b800e
|
Merge b02d022993 into ee1b8bbdbd
|
2024-02-28 14:08:55 -05:00 |
|
Peter Boyle
|
ee1b8bbdbd
|
Merge pull request #454 from edbennett/adjoint-broke
fix HMC for non-fundamental representations
|
2024-02-28 14:05:27 -05:00 |
|
Peter Boyle
|
3f1636637d
|
Merge pull request #453 from dbollweg/feature/sliceSum_gpu
Feature/slice sum gpu
|
2024-02-28 14:04:43 -05:00 |
|
Peter Boyle
|
2e570f5300
|
Merge pull request #457 from lehner/feature/gpt
Import GPT-related updates
|
2024-02-28 13:59:04 -05:00 |
|
Christoph Lehner
|
9f89486df5
|
remove unnecessary code path
|
2024-02-28 19:56:23 +01:00 |
|
Christoph Lehner
|
22b43b86cb
|
Make GPT test suite work with SYCL
|
2024-02-28 12:57:17 +01:00 |
|
dbollweg
|
3c9012676a
|
CUDA cub refuses to reduce vSpinColourMatrix; break it up into smaller parts, as already done for the HIP case.
|
2024-02-27 12:41:45 -05:00 |
|
Dennis Bollweg
|
b507fe209c
|
Added SpinColourMatrix case to sliceSum Test
|
2024-02-27 11:28:32 -05:00 |
|
Dennis Bollweg
|
6cd2d8fcd5
|
Replace cuda/hip memcpy with Grid functions
|
2024-02-26 09:55:07 -05:00 |
|
david clarke
|
b02d022993
|
fixed race condition (thx michael)
|
2024-02-23 17:14:28 -07:00 |
|
david clarke
|
94581e3c7a
|
accelerator_for is broken
|
2024-02-23 15:58:33 -07:00 |
|
david clarke
|
88b52cc045
|
Merge branch 'develop' into hisq_fat_links
|
2024-02-23 14:47:15 -07:00 |
|
dbollweg
|
0a816b5509
|
Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu
|
2024-02-22 21:43:06 -05:00 |
|
dbollweg
|
1c8b807c2e
|
free malloc'd memory
|
2024-02-22 21:42:44 -05:00 |
|
Christoph Lehner
|
66391f84f2
|
Merge branch 'feature/gpt' of ../Grid into develop
|
2024-02-21 19:05:00 +01:00 |
|
|
97f7a9ecb3
|
fix HMC for non-fundamental representations
|
2024-02-21 08:27:55 +00:00 |
|
Dennis Bollweg
|
15878f7613
|
sliceSumReduction_cub_large now also faster than CPU on Frontier
|
2024-02-16 13:55:21 -05:00 |
|
dbollweg
|
e0d5e3c6c7
|
Merge branch 'paboyle:develop' into feature/sliceSum_gpu
|
2024-02-16 13:16:37 -05:00 |
|
dbollweg
|
6f3455900e
|
Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arbitrarily large vobjs
|
2024-02-16 13:15:02 -05:00 |
|
david clarke
|
56827d6ad6
|
accelerator_inline bug
|
2024-02-14 13:56:57 -07:00 |
|
|
73c0b29535
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-02-13 20:19:32 +00:00 |
|
|
303b83cdb8
|
Scaling benchmarks, verbosity, and MPICH awareness in acceleratorInit()
For some reason the Dirichlet benchmark fails on several nodes; need to
debug this.
|
2024-02-13 19:48:03 +00:00 |
|
|
5ef4da3f29
|
Silence verbose
|
2024-02-13 19:47:36 +00:00 |
|
|
1502860004
|
Benchmark scripts
|
2024-02-13 19:47:02 +00:00 |
|
|
585efc6f3f
|
More benchmark scripts
|
2024-02-13 19:40:49 +00:00 |
|
|
62055e04dd
|
missing semicolon generates error with some compilers
|
2024-02-13 18:18:27 +01:00 |
|
david clarke
|
db420525b3
|
fix Simd::Nsimd typo
|
2024-02-12 15:03:53 -07:00 |
|
dbollweg
|
b5659d106e
|
more test cases
|
2024-02-09 13:37:14 -05:00 |
|
dbollweg
|
4b43307402
|
Undo include path changes for level zero api header
|
2024-02-09 13:07:56 -05:00 |
|
dbollweg
|
09af8c25a2
|
Merge branch 'paboyle:develop' into feature/sliceSum_gpu
|
2024-02-09 13:02:59 -05:00 |
|
dbollweg
|
9514035b87
|
refactor slicesum: slicesum uses GPU version by default now
|
2024-02-09 13:02:28 -05:00 |
|
david clarke
|
2da09ae99b
|
acceleration compiles and doesn't break scalar mode
|
2024-02-06 18:40:13 -07:00 |
|
david clarke
|
a38fb0e04a
|
first effort toward accelerators
|
2024-02-06 18:24:55 -07:00 |
|
|
7019916294
|
RNG seed change safer for large volumes; this is a long term solution
|
2024-02-07 00:56:39 +00:00 |
|
dbollweg
|
1514b4f137
|
slicesum_sycl passes test
|
2024-02-06 19:08:44 -05:00 |
|
|
91cf5ee312
|
Updated bench script
|
2024-02-06 23:45:10 +00:00 |
|
david clarke
|
0a6e2f42c5
|
small amount of cleanup
|
2024-02-06 16:32:07 -07:00 |
|
dbollweg
|
ab2de131bd
|
work towards sliceSum for sycl backend
|
2024-02-06 13:24:45 -05:00 |
|
|
5bfa88be85
|
Aurora MPI standalone benchmark and options that work well
|
2024-02-06 16:28:40 +00:00 |
|
Dennis Bollweg
|
5af8da76d7
|
Fix cuda compilation of Lattice_slicesum_gpu.h
|
2024-02-01 18:02:30 -05:00 |
|
Dennis Bollweg
|
b8b9dc952d
|
Async memcpy's and cleanup
|
2024-02-01 17:55:35 -05:00 |
|
Dennis Bollweg
|
79a6ed32d8
|
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
|
2024-02-01 16:41:03 -05:00 |
|
dbollweg
|
caa5f97723
|
Add sliceSum gpu using cub/hipcub
|
2024-01-31 16:50:06 -05:00 |
|
david clarke
|
4924b3209e
|
projectU3 yields a unitary matrix
|
2024-01-23 14:43:58 -07:00 |
|
david clarke
|
00f24f8765
|
already found some bugs in projection, still needs testing
|
2024-01-22 05:50:16 -07:00 |
|
david clarke
|
f5b3d582b0
|
first attempt at U3 projection
|
2024-01-22 02:49:40 -07:00 |
|
david clarke
|
981c93d67a
|
update Test_fatLinks to accept Naik
|
2024-01-21 21:09:19 -07:00 |
|
david clarke
|
c020b78e02
|
Merge branch 'develop' into hisq_fat_links
|
2024-01-21 20:21:08 -07:00 |
|
|
2a0d75bac2
|
Aurora files
|
2023-12-21 23:20:17 +00:00 |
|
Peter Boyle
|
f48298ad4e
|
Bug fix
|
2023-12-11 20:57:02 -05:00 |
|