1
0
mirror of https://github.com/paboyle/Grid.git synced 2026-05-23 18:44:17 +01:00
Commit Graph

1620 Commits

Author SHA1 Message Date
Peter Boyle 068f95ad2d Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h
Remove the CUB/hipCUB direction entirely. Restore Lattice_reduction_gpu.h,
Lattice_reduction_sycl.h, and Lattice_reduction.h to the state before the
CUB rewrite (commit 969b0a39), recovering the original primary function names
(sumD_gpu_small, sumD_gpu_large, sumD_gpu, sum_gpu, sum_gpu_large) and the
hand-rolled shared-memory reduction kernel.

Delete Lattice_reduction_gpu_cub.h. Update Test_reduction to remove the
old/new comparison sections that depended on sum_gpu_old.

The lesson: CUB DeviceReduce is slower than the hand-rolled kernel for small
types, and the smem sizing problem for the extraction pass has no clean
solution within the accelerator_for abstraction. The right improvement is
a higher radix (12 then 4) in sumD_gpu_large, applied directly to the
existing hand-rolled kernel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 21:52:18 -04:00
Peter Boyle baa70d8ec9 Test_reduction: add timing benchmark for new vs old reduction paths
Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and
sum_gpu_old (hand-rolled shared-memory) for each field type, with
5-call warmup and 100-call timed loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:31:13 -04:00
Peter Boyle c0472aa0ec Test_reduction: use separate float and double grids
Float fields require a grid constructed with vComplexF::Nsimd(); using
a double grid causes grid->_gsites to undercount the sites in float
vobjF, making the constant-field expected value wrong.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:09:35 -04:00
Peter Boyle 09552cfd73 Rename scalarNorm2 to squaredSum in Test_reduction.cc
The function computes |sum|^2 — the squared magnitude of an aggregate sum —
not a norm. squaredSum makes clear that squaring is applied to the sum, not
to individual site values before summing, distinguishing it from sumOfSquares
(the squared L2 norm).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:15:11 -04:00
Peter Boyle 286c29d6fb Add Test_reduction to tests/debug
Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the
preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D,
LatticeColourMatrixF/D and LatticePropagatorF/D.

Part a) gaussian random field: checks that old and new agree to within
float/double roundoff tolerance.
Part b) constant field (= 1.0, identity-matrix init): verifies
innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero
diagonal scalar components per site (1 / Nc / Ns*Nc respectively).

Make.inc is auto-generated by scripts/filelist on bootstrap and is not
tracked; the new .cc file is all that is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:31:33 -04:00
Peter Boyle 595ceaac37 Include grid header and make the ENABLE correct 2026-03-11 17:24:44 -04:00
Peter Boyle daf5834e8e Fixing incorrect PR about disable fermion instantiations 2026-03-11 17:05:46 -04:00
Peter Boyle 0d8658a039 Optimised 2026-03-05 06:06:32 -05:00
Peter Boyle 76fbcffb60 Improvement to 16^3 hdcg 2026-03-05 06:06:32 -05:00
edbennett 1b56f6f46d be able to skip compiling fermion instantiations altogether 2026-02-24 23:52:18 +00:00
Peter Boyle 6ff29f9d4f Alternate multigrids 2026-02-13 17:25:45 -05:00
Peter Boyle 7cd3f21e6b preserving a bunch of experiments on setup and g5 subspace doubling 2026-01-06 05:57:39 -05:00
Peter Boyle 2e684028de Improvements 2025-11-14 18:12:27 -05:00
Peter Boyle fe0db53842 FFT offload to GPU and MUCH faster comms.
40x speed up on Frontier
2025-08-21 16:45:38 -04:00
Peter Boyle 76c0ada1e1 Benchmark for En Hung 2025-08-21 16:45:38 -04:00
paboyle 9e6a4a4737 Assertion updates to macros (mostly) with backtrace.
WIlson flow to include options for DBW2, Iwasaki, Symanzik.
View logging for data assurance
2025-08-07 15:48:38 +00:00
paboyle 73af020f98 improved 2025-06-27 06:08:54 +00:00
Peter Boyle 3737a24096 Updated python output 2025-06-03 14:09:29 -04:00
Peter Boyle 5364d580c9 Output chirality, eigenvector density files and python source lego plot 2025-05-13 18:44:47 -04:00
Peter Boyle 677b4cc5b0 Make all tests compile 2025-04-24 20:33:26 -04:00
Peter Boyle ab3de50d5e Merge pull request #473 from UCL-ARC/gauge_action_deriv
WilsonGagueAction deriv
2025-04-24 14:39:10 -04:00
Chulwoo Jung a957e7bfa1 Adding DWF evec Chirality measurement 2025-04-22 22:17:51 +00:00
Chulwoo Jung cee4c8ce8c Merge branch 'develop' of https://github.com/paboyle/Grid into specflow 2025-04-18 19:55:36 +00:00
Peter Boyle 6fec3c15ca Cleaner printing 2025-04-04 18:35:06 -04:00
Mashy Green d41542c64b reverted sp2n test wilsonfundfermiongauge to original 2025-03-24 08:29:15 +00:00
Mashy Green 0000d2e558 Merge branch 'develop' into gauge_action_deriv 2025-03-10 08:35:57 +00:00
Muhammad Asif b1ba209696 Latest upstream with np-su3 patch and modified Sp_WilsonFunfFermionGauge test to be small (#22)
Co-authored-by: Mashy Green <mashy@me.com>

merging no-su3 patch
2025-02-24 11:38:42 +00:00
Mashy Green 717f647418 added the WilsonFlow patch from upstream PR #471 2025-02-24 08:41:31 +00:00
Peter Boyle c74d11e3d7 PVdagM MG 2025-02-01 11:04:13 -05:00
paboyle c4fc972fec Merge branch 'feature/deprecate-uvm' into develop 2025-01-31 16:32:36 +00:00
Chulwoo Jung 570b72a47b Bugfix. Sorry! 2025-01-21 15:37:39 -05:00
Chulwoo Jung a5798a89ed Merge branch 'develop' into specflow 2025-01-21 12:13:24 -05:00
Peter Boyle 3f3661a86f Heading towards PVdagM multigrid 2025-01-17 14:33:35 +00:00
Chulwoo Jung f7e2f9a401 Checking in spectral flow and DWF/Mobius kernel eigenvalue measurement 2025-01-16 20:47:33 +00:00
Chulwoo Jung 2848a9b558 DWF Kernel lanczos working(?) 2025-01-16 01:29:56 +00:00
paboyle 8fe429346f Dslash testing for reproduce 2024-11-11 23:11:11 +00:00
Peter Boyle b91fc1b6b4 Merge branch 'feature/boosted' into feature/deprecate-uvm
Fixed boosted free field test
2024-10-28 16:53:09 -04:00
Peter Boyle eafc150034 Test fft asserts 2024-10-23 16:46:26 -04:00
Peter Boyle 1e893af775 GPU happy 2024-10-23 14:52:15 -04:00
Peter Boyle d9f430a575 Happy GPU 2024-10-23 14:51:16 -04:00
Peter Boyle 5ae77876a8 Meson field and Aslash field on GPU; some compiler warning removed 2024-10-18 19:08:06 -04:00
paboyle 5cc4f3241d Meson field test 2024-10-18 15:42:30 +00:00
Peter Boyle 6815e138b4 Boosted fermion attempt 2024-10-17 18:37:33 +01:00
paboyle 03687c1d62 Final version of test, closer to original again 2024-10-15 14:35:17 +00:00
Peter Boyle 2a9cfeb9ea New files 2024-09-26 14:23:29 -04:00
paboyle 066544281f Deprecate UVM 2024-09-17 13:34:27 +00:00
paboyle 160969a758 UVM tester, doesn't turn up anything 2024-09-10 18:09:42 +00:00
Peter Boyle 575eb72182 Converges on 16^3 2024-08-27 19:20:38 +00:00
Peter Boyle 29f6b8a74a Setup 2024-08-27 12:02:49 -04:00
Peter Boyle 9779aaea33 16^3 optimise 2024-08-27 11:38:35 -04:00