portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-07-19 00:23:28 +01:00

Author	SHA1	Message	Date
Peter Boyle and Claude Sonnet 4.6	50aa51f93a	debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 22:27:27 -04:00
Peter Boyle and Claude Sonnet 4.6	79ccc81a86	tests/debug: add G=4 to hipfft fail reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 22:21:52 -04:00
Peter Boyle and Claude Sonnet 4.6	3f0fdbb597	tests/debug: test hipMemset variant before cache is populated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 22:10:16 -04:00
Peter Boyle and Claude Sonnet 4.6	ea57bd8f03	tests/debug: extend hipfft fail reproducer with hipMemset and sync variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 22:02:02 -04:00
Peter Boyle and Claude Sonnet 4.6	58cc6ca9c0	tests/debug: add minimal hipfft ordering bug fail/pass pair Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 21:48:23 -04:00
Peter Boyle and Claude Sonnet 4.6	e5996b440d	tests/debug: test plan-before-malloc vs malloc-before-plan ordering Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 21:40:17 -04:00
Peter Boyle and Claude Sonnet 4.6	ad9d03fd85	tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 19:19:59 -04:00
Peter Boyle and Claude Sonnet 4.6	4de160ce20	tests/debug: add minimal hipfft plan-creation reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 17:52:59 -04:00
Peter Boyle and Claude Sonnet 4.6	068f95ad2d	Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h Remove the CUB/hipCUB direction entirely. Restore Lattice_reduction_gpu.h, Lattice_reduction_sycl.h, and Lattice_reduction.h to the state before the CUB rewrite (commit `969b0a39`), recovering the original primary function names (sumD_gpu_small, sumD_gpu_large, sumD_gpu, sum_gpu, sum_gpu_large) and the hand-rolled shared-memory reduction kernel. Delete Lattice_reduction_gpu_cub.h. Update Test_reduction to remove the old/new comparison sections that depended on sum_gpu_old. The lesson: CUB DeviceReduce is slower than the hand-rolled kernel for small types, and the smem sizing problem for the extraction pass has no clean solution within the accelerator_for abstraction. The right improvement is a higher radix (12 then 4) in sumD_gpu_large, applied directly to the existing hand-rolled kernel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 21:52:18 -04:00
Peter Boyle and Claude Sonnet 4.6	baa70d8ec9	Test_reduction: add timing benchmark for new vs old reduction paths Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and sum_gpu_old (hand-rolled shared-memory) for each field type, with 5-call warmup and 100-call timed loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 12:31:13 -04:00
Peter Boyle and Claude Sonnet 4.6	c0472aa0ec	Test_reduction: use separate float and double grids Float fields require a grid constructed with vComplexF::Nsimd(); using a double grid causes grid->_gsites to undercount the sites in float vobjF, making the constant-field expected value wrong. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 12:09:35 -04:00
Peter Boyle and Claude Sonnet 4.6	09552cfd73	Rename scalarNorm2 to squaredSum in Test_reduction.cc The function computes |sum|^2 — the squared magnitude of an aggregate sum — not a norm. squaredSum makes clear that squaring is applied to the sum, not to individual site values before summing, distinguishing it from sumOfSquares (the squared L2 norm). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 23:15:11 -04:00
Peter Boyle and Claude Sonnet 4.6	286c29d6fb	Add Test_reduction to tests/debug Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D, LatticeColourMatrixF/D and LatticePropagatorF/D. Part a) gaussian random field: checks that old and new agree to within float/double roundoff tolerance. Part b) constant field (= 1.0, identity-matrix init): verifies innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero diagonal scalar components per site (1 / Nc / Ns*Nc respectively). Make.inc is auto-generated by scripts/filelist on bootstrap and is not tracked; the new .cc file is all that is needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 14:31:33 -04:00
Peter Boyle	0d8658a039	Optimised	2026-03-05 06:06:32 -05:00
Peter Boyle	76fbcffb60	Improvement to 16^3 hdcg	2026-03-05 06:06:32 -05:00
Peter Boyle	6ff29f9d4f	Alternate multigrids	2026-02-13 17:25:45 -05:00
Peter Boyle	7cd3f21e6b	preserving a bunch of experiments on setup and g5 subspace doubling	2026-01-06 05:57:39 -05:00
paboyle	9e6a4a4737	Assertion updates to macros (mostly) with backtrace. Wilson flow to include options for DBW2, Iwasaki, Symanzik. View logging for data assurance	2025-08-07 15:48:38 +00:00
Peter Boyle	677b4cc5b0	Make all tests compile	2025-04-24 20:33:26 -04:00
Peter Boyle	6fec3c15ca	Cleaner printing	2025-04-04 18:35:06 -04:00
Peter Boyle	c74d11e3d7	PVdagM MG	2025-02-01 11:04:13 -05:00
Peter Boyle	3f3661a86f	Heading towards PVdagM multigrid	2025-01-17 14:33:35 +00:00
Peter Boyle	2a9cfeb9ea	New files	2024-09-26 14:23:29 -04:00
Peter Boyle	575eb72182	Converges on 16^3	2024-08-27 19:20:38 +00:00
Peter Boyle	29f6b8a74a	Setup	2024-08-27 12:02:49 -04:00
Peter Boyle	9779aaea33	16^3 optimise	2024-08-27 11:38:35 -04:00
Peter Boyle	ec25604a67	Fastest solver for mrhs multigrid	2024-08-27 11:32:34 -04:00
Peter Boyle	b461184797	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2024-07-23 09:53:58 -04:00
Peter Boyle	486412635a	8^4 test for PETSc	2024-07-22 15:25:17 -04:00
Peter Boyle	8b23a1546a	Force compile temporarily	2024-07-22 15:24:56 -04:00
Peter Boyle	a901e4e369	Regressed performance for paper	2024-07-22 15:24:04 -04:00
Peter Boyle	804d9367d4	Regressed performance	2024-07-22 15:23:25 -04:00
Peter Boyle	7c246606c1	Schur additional case	2024-07-10 22:04:32 +00:00
Peter Boyle	12b8be7cb9	Best so far on 96^3 350 Evecs converged on 4^4 block	2024-06-18 16:31:37 -04:00
Peter Boyle	dc80b08969	96^3 test	2024-06-10 15:07:29 -04:00
Peter Boyle	0e607a55e7	Updated for 8^4 test	2024-05-26 20:53:05 +00:00
Peter Boyle	ad14a82742	Working as good as possible on 48^3 in double	2024-05-16 10:55:45 -04:00
Peter Boyle	98cf247f33	prepare to switch to mixed precision	2024-04-30 05:23:45 -04:00
Peter Boyle	0cf16522d1	Refine with HDCG choice	2024-04-30 05:22:14 -04:00
Peter Boyle	5147a42818	Updated hdcg	2024-04-05 01:05:57 -04:00
Peter Boyle	5b79d51c22	Improvements	2024-04-01 14:18:40 -04:00
Peter Boyle	cc04dc42dc	Merge branch 'develop' into feature/scidac-wp1	2024-03-06 14:55:21 -05:00
Peter Boyle	070b61f08f	Simplifying the MultiRHS solver to make it do SRHS and MRHS	2024-03-06 14:04:33 -05:00
Peter Boyle	cd15abe9d1	Mrhs prep	2024-02-27 11:41:13 -05:00
Peter Boyle	eb702f581b	Running on 12 rhs on 18 nodes of frontier	2024-01-22 17:44:15 -05:00
Peter Boyle	d967eb53de	Working for first time	2024-01-17 16:31:12 -05:00
Peter Boyle	25f71913b7	MultiRHS coarse	2024-01-04 12:01:17 -05:00
Peter Boyle	d5fd90b2f3	Add 48^3 rtest	2024-01-04 12:00:01 -05:00
Peter Boyle	22c611bd1a	Delete temp file	2023-12-21 18:32:31 -05:00
Peter Boyle	c9bb1bf8ea	Passing new BLAS based	2023-12-21 18:31:17 -05:00
