Peter Boyle
969b0a3922
Rewrite lattice GPU reduction to use CUB, hipCUB, and SYCL reduction
Replace hand-rolled shared-memory reduction kernels (reduceBlock/reduceBlocks/
reduceKernel) and the global device variable retirementCount with a unified
CUB/hipCUB DeviceReduce::Reduce path for CUDA/HIP and sycl::reduction for SYCL.
No small/large split is needed: both CUB and sycl::reduction handle arbitrary
object sizes internally.
Old implementations preserved as sum_gpu_old / sumD_gpu_old etc. in the
original files for regression testing on GPU hardware.
Also add CLAUDE.md with build, test, and architecture guidance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:41:56 -04:00
Peter Boyle
c6c2834e03
Hip Happy
2026-05-15 11:30:29 -04:00
Peter Boyle
856545a1db
Support ROCM 7.0.2
2026-05-15 11:30:29 -04:00
Peter Boyle
e2d607f6c7
Merge pull request #490 from jdmaia/hip-guard-acceleratorfor2dNB
[HIP] Including kernel launch parameter guard on accelerator_for2dNB
2026-05-06 14:51:30 -04:00
Julio Maia
66da4e0657
Including guard on accelerator_for2dNB against invalid kernel configurations if GRID_HIP
2026-05-06 13:26:33 -05:00
Peter Boyle
b37390bb5a
4 node usqcd run
2026-04-27 14:40:11 -07:00
Peter Boyle
829dc8cceb
32 node
2026-04-27 14:38:02 -07:00
Peter Boyle
13cc2c39f5
FOM run
2026-04-27 14:20:49 -07:00
Peter Boyle
66ea3b271c
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2026-04-27 13:55:52 -07:00
Peter Boyle
d293b58a20
384 node baseline run
2026-04-27 13:54:40 -07:00
Peter Boyle
ce093b2bf3
rdtsc
2026-04-27 13:54:06 -07:00
Peter Boyle
e4404efe5a
Perlmutter compile update
2026-04-27 13:53:28 -07:00
Peter Boyle
5ce270f1de
Adding Claude related files
2026-04-21 10:41:18 -04:00
Peter Boyle
af43b067a0
New CLAUDE controllable visualiser
2026-04-10 11:23:25 -04:00
Quadro
34b44d1fee
New file for animation in MD time direction
2026-04-02 13:55:38 -04:00
Peter Boyle
595ceaac37
Include grid header and make the ENABLE correct
2026-03-11 17:24:44 -04:00
Peter Boyle
daf5834e8e
Fixing incorrect PR about disable fermion instantiations
2026-03-11 17:05:46 -04:00
Peter Boyle
0d8658a039
Optimised
2026-03-05 06:06:32 -05:00
Peter Boyle
095e004d01
Setup change GCR
2026-03-05 06:06:32 -05:00
Peter Boyle
0acabee7f6
Modest change
2026-03-05 06:06:32 -05:00
Peter Boyle
76fbcffb60
Improvement to 16^3 hdcg
2026-03-05 06:06:32 -05:00
Peter Boyle
a0a62d7ead
Merge pull request #478 from vataspro/PolyakovUpstream
Spatial Polyakov Loop implementation
2026-02-24 20:45:42 -05:00
Peter Boyle
c5038ea6a5
Merge pull request #483 from cmcknigh/bugfix/rocm7-rocblas-type-refactor
Adding a version check to handle rocBlas type refactor
2026-02-24 20:45:03 -05:00
Peter Boyle
a5120903eb
Merge pull request #486 from RChrHill/fix/sp4-fp32
Define Sp4 ProjectOnGeneralGroup for generic vtype
2026-02-24 20:44:08 -05:00
Peter Boyle
00b286a08a
Merge pull request #488 from RChrHill/feature/additional-ET-traces
Add ET support for Lattice spin- and colour-traces
2026-02-24 20:43:45 -05:00
Peter Boyle
24a9759353
Merge pull request #485 from edbennett/skip-fermion-instantiations
Be able to skip compiling fermion instantiations altogether
2026-02-24 20:43:20 -05:00
edbennett
1b56f6f46d
be able to skip compiling fermion instantiations altogether
2026-02-24 23:52:18 +00:00
Peter Boyle
2a8084d569
Subspace setup
2026-02-13 17:26:11 -05:00
Peter Boyle
6ff29f9d4f
Alternate multigrids
2026-02-13 17:25:45 -05:00
RChHill
c4d3e79193
Add ET support for Lattice spin- and colour-traces
2026-01-29 14:46:52 +00:00
Peter Boyle
7cd3f21e6b
preserving a bunch of experiments on setup and g5 subspace doubling
2026-01-06 05:57:39 -05:00
paboyle
4a0aaf0786
Fix issue with Aurora compilers
2025-11-21 21:41:13 +00:00
paboyle
9c3835524c
Fix compile warn
2025-11-21 21:41:12 +00:00
paboyle
549351bb8a
Stag verbose clean up
2025-11-20 18:22:57 +00:00
RChHill
b650b89682
Define Sp4 ProjectOnGeneralGroup for generic vtype
2025-11-19 13:26:52 +00:00
Peter Boyle
74e6b19f83
Looks like the reuse of xfers in staggered has bugs or corner cases depending on volume
2025-11-17 22:29:06 -05:00
Peter Boyle
2e684028de
Improvements
2025-11-14 18:12:27 -05:00
paboyle
c54d87a472
Aurora compile fix for new compiler
2025-11-06 18:17:33 +00:00
Allen McKnight
4304245c1b
Merge branch 'develop' into bugfix/rocm7-rocblas-type-refactor
2025-11-04 08:50:11 -06:00
Peter Boyle
6165931afa
Update GridStd.h
2025-10-03 14:35:37 -04:00
Your Name
1d1fd3bcaf
adding a version check to handle rocblas type change
2025-10-02 15:24:24 -05:00
paboyle
23581333e6
link cufft
2025-08-21 22:25:55 +01:00
paboyle
e5fa3d887f
Compile on CUDA
2025-08-21 22:10:27 +01:00
paboyle
583fa7bb0a
FFTW guarded after CUDA and HIP
2025-08-21 22:00:12 +01:00
Peter Boyle
fe0db53842
FFT offload to GPU and MUCH faster comms.
40x speed up on Frontier
2025-08-21 16:45:38 -04:00
Peter Boyle
76c0ada1e1
Benchmark for En Hung
2025-08-21 16:45:38 -04:00
Peter Boyle
92f49e9194
Merge pull request #482 from g-simonetti/wflow_sp2n_paboyle
Fixed Wilson flow for Nc not equal to 3
2025-08-21 09:10:25 -04:00
Peter Boyle
44c8057b5f
Merge pull request #481 from vataspro/sp-reps-fix
Only compile higher fermion representations for symplectic gauge group when requested via configure flag
2025-08-20 12:57:28 -04:00
Alexis Provatas
0ad837f595
Fix Sp representations compilation
2025-08-20 17:48:39 +01:00
Peter Boyle
bd2103c746
Merge pull request #480 from vataspro/fix-no-comms
Fix enable-comms=none
2025-08-20 12:26:47 -04:00