| dbollweg | 6f3455900e | Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs | 2024-02-16 13:15:02 -05:00 |
| david clarke | 56827d6ad6 | accelerator_inline bug | 2024-02-14 13:56:57 -07:00 |
|  | 73c0b29535 | Merge branch 'develop' of https://github.com/paboyle/Grid into develop | 2024-02-13 20:19:32 +00:00 |
|  | 303b83cdb8 | Scaling benchmarks, verbosity and MPICH aware in acceleratorInit(). For some reason Dirichlet benchmark fails on several nodes; need to debug this. | 2024-02-13 19:48:03 +00:00 |
|  | 62055e04dd | missing semicolon generates error with some compilers | 2024-02-13 18:18:27 +01:00 |
| david clarke | db420525b3 | fix Simd::Nsimd typo | 2024-02-12 15:03:53 -07:00 |
| dbollweg | 4b43307402 | Undo include path changes for level zero api header | 2024-02-09 13:07:56 -05:00 |
| dbollweg | 09af8c25a2 | Merge branch 'paboyle:develop' into feature/sliceSum_gpu | 2024-02-09 13:02:59 -05:00 |
| dbollweg | 9514035b87 | refactor slicesum: slicesum uses GPU version by default now | 2024-02-09 13:02:28 -05:00 |
| david clarke | 2da09ae99b | acceleration compiles and doesn't break scalar mode | 2024-02-06 18:40:13 -07:00 |
| david clarke | a38fb0e04a | first effort toward accelerators | 2024-02-06 18:24:55 -07:00 |
|  | 7019916294 | RNG seed change safer for large volumes; this is a long term solution | 2024-02-07 00:56:39 +00:00 |
| dbollweg | 1514b4f137 | slicesum_sycl passes test | 2024-02-06 19:08:44 -05:00 |
| david clarke | 0a6e2f42c5 | small amount of cleanup | 2024-02-06 16:32:07 -07:00 |
| dbollweg | ab2de131bd | work towards sliceSum for sycl backend | 2024-02-06 13:24:45 -05:00 |
| Dennis Bollweg | 5af8da76d7 | Fix cuda compilation of Lattice_slicesum_gpu.h | 2024-02-01 18:02:30 -05:00 |
| Dennis Bollweg | b8b9dc952d | Async memcpy's and cleanup | 2024-02-01 17:55:35 -05:00 |
| Dennis Bollweg | 79a6ed32d8 | Use accelerator_for2d and DeviceSegmentedRecude to avoid kernel launch latencies | 2024-02-01 16:41:03 -05:00 |
| dbollweg | caa5f97723 | Add sliceSum gpu using cub/hipcub | 2024-01-31 16:50:06 -05:00 |
| david clarke | 4924b3209e | projectU3 yields a unitary matrix | 2024-01-23 14:43:58 -07:00 |
| david clarke | 00f24f8765 | already found some bugs in projection, still needs testing | 2024-01-22 05:50:16 -07:00 |
| david clarke | f5b3d582b0 | first attempt at U3 projection | 2024-01-22 02:49:40 -07:00 |
| david clarke | c020b78e02 | Merge branch 'develop' into hisq_fat_links | 2024-01-21 20:21:08 -07:00 |
| Peter Boyle | f48298ad4e | Bug fix | 2023-12-11 20:57:02 -05:00 |
| Peter Boyle | d1d9827263 | Integrator logging update | 2023-12-08 12:14:00 -05:00 |
| david clarke | 9cd4128833 | fix naik bug | 2023-11-03 14:11:38 -06:00 |
| david clarke | c8b17c9526 | Naik to CShift | 2023-11-02 12:43:22 -06:00 |
| david clarke | 2ae2a81e85 | attempt to fix Naik | 2023-10-31 13:54:55 -06:00 |
| david clarke | 69c869d345 | fixed stupid typo | 2023-10-30 17:41:52 -06:00 |
| david clarke | df9b958c40 | naik now returns separately | 2023-10-30 17:40:53 -06:00 |
| david clarke | 3d3376d1a3 | LePage works, trying Naik | 2023-10-27 16:26:31 -06:00 |
| Christoph Lehner | f2648e94b9 | getHostPointer added to Lattice | 2023-10-23 13:47:41 +02:00 |
| david clarke | 21ed6ac0f4 | added floating-point support | 2023-10-20 13:54:26 -06:00 |
| david clarke | 7bb8ab7000 | improve smearing templating | 2023-10-20 08:41:02 -06:00 |
| david clarke | 2c824c2641 | Merge branch 'develop' into hisq_fat_links | 2023-10-17 16:03:59 -06:00 |
| david clarke | 391fd9cc6a | try lepage term | 2023-10-17 14:57:15 -06:00 |
| Peter Boyle | 33097681b9 | FTHMC compiled and merged to develop | 2023-10-14 00:42:55 +03:00 |
| Peter Boyle | 9626a2c7c0 | Asynch handling | 2023-10-13 18:21:56 +03:00 |
| Peter Boyle | e936f5b80b | IfGridTensor shorthand | 2023-10-13 18:21:56 +03:00 |
| Peter Boyle | ffc0639cb9 | Running in HMC tests | 2023-10-13 18:21:56 +03:00 |
| Peter Boyle | c5b43b322c | traceProduct eliminates non-contributing intermediate terms | 2023-10-13 18:21:56 +03:00 |
| Peter Boyle | c9c4576237 | Improved frontier cshift | 2023-10-13 18:21:56 +03:00 |
| david clarke | bf4369f72d | clean up HISQSmear with decltypes | 2023-10-12 12:41:06 -06:00 |
| david clarke | 36600899e2 | working 7-link; Grid_log; generalShift | 2023-10-12 11:11:39 -06:00 |
| david clarke | b9c70d156b | Merge branch 'develop' into hisq_fat_links | 2023-10-10 22:44:17 -06:00 |
| david clarke | eb89579fe7 | Merge remote-tracking branch 'origin/develop' into develop | 2023-10-10 22:43:51 -06:00 |
| david clarke | 0cfd13d18b | 7-link working | 2023-10-10 22:41:52 -06:00 |
| Christoph Lehner | e6ed516052 | merged | 2023-10-08 09:00:37 +02:00 |
| Christoph Lehner | e2a3dae1f2 | Option for multiple simultaneous CartesianStencils | 2023-10-08 08:58:44 +02:00 |
| Peter Boyle | d93eac7b1c | Performance regressed and is OK in icpx 2023.2 | 2023-10-03 15:53:14 +00:00 |