mirror of https://github.com/paboyle/Grid.git synced 2026-06-04 19:24:36 +01:00
Commit Graph

139 Commits

Author SHA1 Message Date
Peter Boyle 50aa51f93a debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:27:27 -04:00
Peter Boyle 79ccc81a86 tests/debug: add G=4 to hipfft fail reproducer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:21:52 -04:00
Peter Boyle 3f0fdbb597 tests/debug: test hipMemset variant before cache is populated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:10:16 -04:00
Peter Boyle ea57bd8f03 tests/debug: extend hipfft fail reproducer with hipMemset and sync variants
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:02:02 -04:00
Peter Boyle 58cc6ca9c0 tests/debug: add minimal hipfft ordering bug fail/pass pair
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 21:48:23 -04:00
Peter Boyle e5996b440d tests/debug: test plan-before-malloc vs malloc-before-plan ordering
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 21:40:17 -04:00
Peter Boyle ad9d03fd85 tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 19:19:59 -04:00
Peter Boyle 4de160ce20 tests/debug: add minimal hipfft plan-creation reproducer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 17:52:59 -04:00
Peter Boyle 068f95ad2d Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h
Remove the CUB/hipCUB direction entirely. Restore Lattice_reduction_gpu.h,
Lattice_reduction_sycl.h, and Lattice_reduction.h to the state before the
CUB rewrite (commit 969b0a39), recovering the original primary function names
(sumD_gpu_small, sumD_gpu_large, sumD_gpu, sum_gpu, sum_gpu_large) and the
hand-rolled shared-memory reduction kernel.

Delete Lattice_reduction_gpu_cub.h. Update Test_reduction to remove the
old/new comparison sections that depended on sum_gpu_old.

The lesson: CUB DeviceReduce is slower than the hand-rolled kernel for small
types, and the smem sizing problem for the extraction pass has no clean
solution within the accelerator_for abstraction. The right improvement is
a higher radix (12 then 4) in sumD_gpu_large, applied directly to the
existing hand-rolled kernel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 21:52:18 -04:00
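The commit above settles on a higher radix (12 then 4) in sumD_gpu_large as the way forward. Below is a CPU analogue of a multi-pass tree reduction with a per-pass radix. It assumes that "radix" means the number of partial sums folded into one value on each pass. `radixReduce` and `RadixReduceResult` are hypothetical names. This is not Grid's kernel, and the real sumD_gpu_large may split the work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue of a multi-pass tree reduction. "Radix" is assumed
// to mean how many partial sums are combined into one per pass; this is not
// Grid's sumD_gpu_large.
struct RadixReduceResult {
  double sum;   // final reduced value
  int passes;   // number of passes needed to reach a single value
};

// radices[k] is the radix used on pass k; the last radix is reused for any
// further passes until a single value remains.
RadixReduceResult radixReduce(std::vector<double> data,
                              const std::vector<std::size_t>& radices) {
  assert(!radices.empty());
  RadixReduceResult result{0.0, 0};
  if (data.empty()) return result;
  while (data.size() > 1) {
    std::size_t k = static_cast<std::size_t>(result.passes);
    std::size_t radix = k < radices.size() ? radices[k] : radices.back();
    assert(radix >= 2);  // radix 1 would never shrink the data
    std::size_t groups = (data.size() + radix - 1) / radix;  // ceil division
    std::vector<double> next(groups, 0.0);
    // Each output element plays the role of one thread block's partial sum.
    for (std::size_t i = 0; i < data.size(); ++i) next[i / radix] += data[i];
    data.swap(next);
    ++result.passes;
  }
  result.sum = data[0];
  return result;
}
```

With radices {12, 4}, 1000 inputs shrink 1000 → 84 → 21 → 6 → 2 → 1, which is five passes. On a GPU each pass would typically be a separate kernel launch or synchronisation step. A larger first radix therefore trades more work per thread for fewer passes.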
Peter Boyle baa70d8ec9 Test_reduction: add timing benchmark for new vs old reduction paths
Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and
sum_gpu_old (hand-rolled shared-memory) for each field type, with
5-call warmup and 100-call timed loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:31:13 -04:00
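The benchmark commit above reports us/call and GB/s, using a 5-call warmup followed by a 100-call timed loop. Here is a minimal sketch of such a harness. It assumes the bandwidth figure is the bytes read per call divided by the mean time per call. `benchmark`, `BenchResult` and `gigabytesPerSecond` are hypothetical helpers, not code taken from Test_reduction.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical timing harness: untimed warmup calls, then a timed loop,
// reporting mean microseconds per call and effective bandwidth.
struct BenchResult {
  double us_per_call;  // mean wall-clock time per call, in microseconds
  double gb_per_s;     // bytes_per_call / time, in units of 1e9 bytes/s
};

// Convert bytes moved in a given number of microseconds to GB/s (1 GB = 1e9 B).
inline double gigabytesPerSecond(double bytes, double microseconds) {
  return bytes / (microseconds * 1.0e3);
}

// If call() launches asynchronous device work, it must synchronise before
// returning, otherwise only launch overhead is timed.
template <class F>
BenchResult benchmark(F&& call, double bytes_per_call, int warmup = 5,
                      int iterations = 100) {
  for (int i = 0; i < warmup; ++i) call();  // absorbs first-call overheads
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) call();
  auto t1 = std::chrono::steady_clock::now();
  double total_us =
      std::chrono::duration<double, std::micro>(t1 - t0).count();
  double us = total_us / iterations;
  return BenchResult{us, gigabytesPerSecond(bytes_per_call, us)};
}
```

For a reduction, `bytes_per_call` would normally be the size of the field being summed, since each element is read once. A reduction that returns its result to the host already synchronises.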
Peter Boyle c0472aa0ec Test_reduction: use separate float and double grids
Float fields require a grid constructed with vComplexF::Nsimd(); using
a double grid causes grid->_gsites to undercount the sites in float
vobjF, making the constant-field expected value wrong.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:09:35 -04:00
Peter Boyle 09552cfd73 Rename scalarNorm2 to squaredSum in Test_reduction.cc
The function computes |sum|^2 — the squared magnitude of an aggregate sum —
not a norm. squaredSum makes clear that squaring is applied to the sum, not
to individual site values before summing, distinguishing it from sumOfSquares
(the squared L2 norm).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:15:11 -04:00
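The rename to squaredSum makes clear that the squaring happens after the sum. The two quantities can be contrasted in a few lines of standard C++. These are hypothetical standalone versions over `std::vector<std::complex<double>>`, not the lattice-field signatures used in Test_reduction.cc.

```cpp
#include <complex>
#include <vector>

using Complex = std::complex<double>;

// |sum_i z_i|^2 : squared magnitude of the aggregate sum.
double squaredSum(const std::vector<Complex>& z) {
  Complex s = 0.0;
  for (const Complex& zi : z) s += zi;
  return std::norm(s);  // std::norm returns |s|^2, not |s|
}

// sum_i |z_i|^2 : the squared L2 norm, squaring each value before summing.
double sumOfSquares(const std::vector<Complex>& z) {
  double acc = 0.0;
  for (const Complex& zi : z) acc += std::norm(zi);
  return acc;
}
```

For z = {1, -1}, squaredSum is 0 while sumOfSquares is 2. Cancellation between sites affects only the former.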
Peter Boyle 286c29d6fb Add Test_reduction to tests/debug
Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the
preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D,
LatticeColourMatrixF/D and LatticePropagatorF/D.

Part a) gaussian random field: checks that old and new agree to within
float/double roundoff tolerance.
Part b) constant field (= 1.0, identity-matrix init): verifies
innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero
diagonal scalar components per site (1 / Nc / Ns*Nc respectively).

Make.inc is auto-generated by scripts/filelist on bootstrap and is not
tracked; the new .cc file is all that is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:31:33 -04:00
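The Part b) expectation can be checked by hand. With every site set to the identity, each of the Ncomp nonzero diagonal components sums to V, so innerProduct(sum, sum) = Ncomp × V². The sketch below computes the closed form and a brute-force version side by side. It uses plain arrays in place of Grid lattice types, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Closed form for a constant identity-initialised field: Ncomp * V^2.
double expectedConstantFieldNorm(std::size_t ncomp, std::size_t volume) {
  double v = static_cast<double>(volume);
  return static_cast<double>(ncomp) * v * v;
}

// Brute force: every site holds an n x n identity (n = 1, Nc or Ns*Nc), the
// sites are summed into S, and S is contracted with itself as
// sum_ij conj(S_ij) * S_ij (all entries are real here).
double bruteForceConstantFieldNorm(std::size_t n, std::size_t volume) {
  std::vector<double> sum(n * n, 0.0);
  for (std::size_t site = 0; site < volume; ++site)
    for (std::size_t d = 0; d < n; ++d) sum[d * n + d] += 1.0;
  double ip = 0.0;
  for (double s : sum) ip += s * s;
  return ip;
}
```

For a 4^4 lattice (V = 256) and the propagator case (Ncomp = Ns*Nc = 12), both functions give 12 × 256² = 786432.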
Peter Boyle 0d8658a039 Optimised 2026-03-05 06:06:32 -05:00
Peter Boyle 76fbcffb60 Improvement to 16^3 hdcg 2026-03-05 06:06:32 -05:00
Peter Boyle 6ff29f9d4f Alternate multigrids 2026-02-13 17:25:45 -05:00
Peter Boyle 7cd3f21e6b preserving a bunch of experiments on setup and g5 subspace doubling 2026-01-06 05:57:39 -05:00
paboyle 9e6a4a4737 Assertion updates to macros (mostly) with backtrace.
Wilson flow to include options for DBW2, Iwasaki, Symanzik.
View logging for data assurance
2025-08-07 15:48:38 +00:00
Peter Boyle 677b4cc5b0 Make all tests compile 2025-04-24 20:33:26 -04:00
Peter Boyle 6fec3c15ca Cleaner printing 2025-04-04 18:35:06 -04:00
Peter Boyle c74d11e3d7 PVdagM MG 2025-02-01 11:04:13 -05:00
Peter Boyle 3f3661a86f Heading towards PVdagM multigrid 2025-01-17 14:33:35 +00:00
Peter Boyle 2a9cfeb9ea New files 2024-09-26 14:23:29 -04:00
Peter Boyle 575eb72182 Converges on 16^3 2024-08-27 19:20:38 +00:00
Peter Boyle 29f6b8a74a Setup 2024-08-27 12:02:49 -04:00
Peter Boyle 9779aaea33 16^3 optimise 2024-08-27 11:38:35 -04:00
Peter Boyle ec25604a67 Fastest solver for mrhs multigrid 2024-08-27 11:32:34 -04:00
Peter Boyle b461184797 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2024-07-23 09:53:58 -04:00
Peter Boyle 486412635a 8^4 test for PETSc 2024-07-22 15:25:17 -04:00
Peter Boyle 8b23a1546a Force compile temporarily 2024-07-22 15:24:56 -04:00
Peter Boyle a901e4e369 Regressed performance for paper 2024-07-22 15:24:04 -04:00
Peter Boyle 804d9367d4 Regressed performance 2024-07-22 15:23:25 -04:00
Peter Boyle 7c246606c1 Schur additional case 2024-07-10 22:04:32 +00:00
Peter Boyle 12b8be7cb9 Best so far on 96^3 350 Evecs converged on 4^4 block 2024-06-18 16:31:37 -04:00
Peter Boyle dc80b08969 96^3 test 2024-06-10 15:07:29 -04:00
Peter Boyle 0e607a55e7 Updated for 8^4 test 2024-05-26 20:53:05 +00:00
Peter Boyle ad14a82742 Working as good as possible on 48^3 in double 2024-05-16 10:55:45 -04:00
Peter Boyle 98cf247f33 prepare to switch to mixed precision 2024-04-30 05:23:45 -04:00
Peter Boyle 0cf16522d1 Refine with HDCG choice 2024-04-30 05:22:14 -04:00
Peter Boyle 5147a42818 Updated hdcg 2024-04-05 01:05:57 -04:00
Peter Boyle 5b79d51c22 Improvements 2024-04-01 14:18:40 -04:00
Peter Boyle cc04dc42dc Merge branch 'develop' into feature/scidac-wp1 2024-03-06 14:55:21 -05:00
Peter Boyle 070b61f08f Simplifying the MultiRHS solver to make it do SRHS *and* MRHS 2024-03-06 14:04:33 -05:00
Peter Boyle cd15abe9d1 Mrhs prep 2024-02-27 11:41:13 -05:00
Peter Boyle eb702f581b Running on 12 rhs on 18 nodes of frontier 2024-01-22 17:44:15 -05:00
Peter Boyle d967eb53de Working for first time 2024-01-17 16:31:12 -05:00
Peter Boyle 25f71913b7 MultiRHS coarse 2024-01-04 12:01:17 -05:00
Peter Boyle d5fd90b2f3 Add 48^3 rtest 2024-01-04 12:00:01 -05:00
Peter Boyle 22c611bd1a Delete temp file 2023-12-21 18:32:31 -05:00
Peter Boyle c9bb1bf8ea Passing new BLAS based 2023-12-21 18:31:17 -05:00